
AMD Zen 5 Raises IPC, Speeds up Math and AI Processing


This article is based on preliminary information. We’ll update it if we get additional information from AMD.

AMD has disclosed Zen 5 details. The new CPU’s microarchitecture departs from that of Zen 4, which implemented many small changes to Zen 3. Clock rates are similar, and AMD claims instruction throughput (IPC) climbs 16% over Zen 4, which gained 13% over its predecessor according to AMD. For complex microarchitectures, IPC increases vary widely among applications, however. The biggest speedups will come on math-intensive code owing to Zen 5 doubling peak SIMD (AVX) throughput.
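
As a rough back-of-the-envelope check, the two generational figures AMD cites can be compounded to estimate Zen 5's IPC relative to Zen 3. This is only a sketch: it assumes the 13% and 16% claims multiply cleanly, which they may not, because AMD measured each on its own workload mix. The function and constant names are ours.

```python
# AMD-claimed generational IPC gains, taken from the paragraph above.
ZEN4_OVER_ZEN3 = 0.13
ZEN5_OVER_ZEN4 = 0.16


def compound(*gains: float) -> float:
    """Combine successive fractional gains into one cumulative fractional gain."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


# Assumption: the two claims are comparable enough to multiply.
zen5_over_zen3 = compound(ZEN4_OVER_ZEN3, ZEN5_OVER_ZEN4)
print(f"Zen 5 over Zen 3: about {zen5_over_zen3:.0%}")
```

Under that assumption, the two steps together come to roughly 31%, slightly more than the 29% a simple sum would suggest.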

AMD Zen 5 Stuffs More into a Same-Sized Sack

Die size (based on preliminary information) for an eight-core compute die (CCD) appears to be similar to that of Zen 4, although the aspect ratio changes. Our proprietary spreadsheet-based derriere-extraction technique had indicated the CPU alone should’ve grown at least 25%, including the shrink from TSMC’s N5 to N4P process. The new processors’ L2 and L3 cache sizes are unchanged, and the die-to-die (chiplet) interface is also likely the same. In Zen 4, these accounted for about half the die, thus diluting a hypothetical 25% CPU-size increase to 12%. Tighter physical design must explain the remaining density improvement. Assuming Zen 5’s N4P process is no more expensive than Zen 4’s N5, the IPC gains come at no additional manufacturing cost to AMD.
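
The dilution arithmetic above can be written out in a few lines. This is a minimal sketch that assumes the CPU logic is exactly half the die and that the caches and chiplet interface do not change size at all; the function name is ours, not AMD's.

```python
def die_growth(cpu_growth: float, cpu_share: float) -> float:
    """Fractional die growth when only the CPU portion of the die grows.

    cpu_growth: fractional growth of the CPU logic (0.25 means 25%).
    cpu_share:  fraction of the original die occupied by CPU logic.
    """
    return cpu_growth * cpu_share


# Assumption: caches and the die-to-die interface are about half of Zen 4's die,
# leaving the other half to the CPU logic.
estimate = die_growth(cpu_growth=0.25, cpu_share=0.5)
print(f"Expected die growth: {estimate:.1%}")
```

With those inputs the expected growth is 12.5%, which the text rounds to 12%. A die that stays the same size therefore implies tighter physical design, a smaller CPU-size increase than estimated, or both.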

Big AVX Vectors Accelerate Number Crunching

FP/SIMD units occupy a lot of area, especially considering their limited applicability. In the last generation, AMD saved area by executing AVX-512 instructions on a 256-bit data path, breaking each into two sequential operations. Zen 5 implements the full architectural width and doubles the number of physical registers. The new CPU should crush Zen 4 on the Linpack benchmark. Other software also speeds up; AMD touts 35% and 32% IPC uplifts on the Geekbench AES and machine-learning subtests, respectively. Moreover, to keep the wider FP/SIMD unit fed, Zen 5 widens the path to the data cache and increases the latter’s size by 50%. This may accelerate data-intensive workloads even if they don’t use AVX-512.
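
To illustrate why the full-width data path doubles peak throughput, the sketch below counts how many passes a single 512-bit operation needs on each design. It is a simplified model under stated assumptions: it ignores the number of FP pipes, clock rates, and memory bandwidth, all of which affect real performance, and the helper name is ours.

```python
import math

AVX512_BITS = 512  # architectural vector width of an AVX-512 operation


def passes_per_instruction(datapath_bits: int, vector_bits: int = AVX512_BITS) -> int:
    """Number of sequential passes needed to push one vector through the data path."""
    return math.ceil(vector_bits / datapath_bits)


zen4_passes = passes_per_instruction(256)  # 256-bit data path, per the text
zen5_passes = passes_per_instruction(512)  # full-width data path, per the text

# Assumption: everything else is equal, so peak throughput scales inversely with passes.
peak_ratio = zen4_passes / zen5_passes
print(f"Zen 4: {zen4_passes} passes, Zen 5: {zen5_passes} pass, peak ratio {peak_ratio:.0f}x")
```

The 2x figure is a ceiling. Code that is limited by cache or memory bandwidth will see less, which is why the wider path to the data cache matters.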

Aping Arm, AMD Adds ALUs

The integer path also widens in AMD Zen 5. Peak ALU operations per cycle now match or exceed those of Intel’s current P-Core and the Qualcomm Oryon but fall short of the Arm Cortex-X925. The number of units capable of branches increases, matching that of the wide Arm machine, which has dedicated branch units instead of sharing them with ALUs. Zen 5 adds a fourth address-generation unit (AGU), enabling it to compute as many load or store addresses per cycle as the class-leading Oryon. (With its M1 CPU, Apple initiated the recent trend to wide execution, and the M4 is probably comparable to the Cortex.) Overall, execution-bound workloads will run faster on Zen 5 than on its predecessor. A lot of PC benchmarks fall into this category, but servers often run memory-bound applications.

AMD Zen 5 is Big Up Top

A wider front end matches the added execution engines. Zen 5 can dispatch 33% more operations per cycle than its forebear, but its dispatch width trails that of the Intel P-Core and the Cortex-X925. More significantly, Zen 5 has eight decoders, twice as many as Zen 4; that count exceeds Intel’s, matches Oryon’s, and approaches the Cortex-X925’s. AMD describes Zen 5 as having a pair of four-wide decoders. We infer Zen 5 employs the same technique as Intel’s E-Core, which also has two decoder banks. Because x86 instruction sizes vary, finding instruction boundaries is a critical path. Finding eight at once probably takes too long. One bank, therefore, decodes four instructions along the current execution path while the other works independently on a different block, such as one starting from the next predicted branch target.
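
The boundary-finding problem can be illustrated with a toy model. The sketch below is not x86 and is not AMD's implementation: it assumes a made-up encoding in which the first byte of each instruction gives its length, and the names `CODE` and `find_starts` are hypothetical. It shows only that each instruction start depends on the previous instruction's length, so one walk is serial, whereas two walks from independent entry points can proceed side by side.

```python
# Toy encoding (NOT x86): the first byte of each instruction holds its length in bytes.
CODE = bytes([2, 0,  3, 0, 0,  1,  4, 0, 0, 0,  2, 0,  1,  3, 0, 0,  1])


def find_starts(code: bytes, entry: int, width: int = 4) -> list[int]:
    """Return up to `width` instruction start offsets, walking serially from `entry`."""
    starts = []
    pc = entry
    while len(starts) < width and pc < len(code):
        starts.append(pc)
        length = code[pc]
        if length == 0:  # malformed toy stream: stop instead of looping in place
            break
        pc += length  # the next start is unknown until this length is known
    return starts


# Bank A follows the current path; bank B starts at a hypothetical predicted
# branch target, so it does not need to wait for bank A's boundaries.
bank_a = find_starts(CODE, entry=0)
bank_b = find_starts(CODE, entry=10)
print(bank_a, bank_b)
```

In this model, each bank resolves only four boundaries in sequence, yet the two together deliver eight instruction starts, provided the second entry point is known in advance from branch prediction.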

Fair and Balanced Are the Performance Gains

Overall, performance improvements are balanced throughout the design. AMD attributes Zen 5’s 16% IPC gain to greater execution resources (34%), added data bandwidth (27%), a wider front end (27%), and improved fetching and branch prediction (12%). By contrast, Zen 4’s gains mostly came from doing a better job feeding execution units instead of adding to them.
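
AMD's attribution can be converted into approximate percentage points of IPC. This sketch assumes the four shares split the 16% gain linearly, which is a simplification because microarchitectural improvements interact; the dictionary keys paraphrase AMD's categories.

```python
IPC_GAIN_PCT = 16.0  # AMD's claimed Zen 5 IPC gain over Zen 4

# Share of the gain AMD attributes to each area, in percent (from the text above).
SHARES_PCT = {
    "execution resources": 34,
    "data bandwidth": 27,
    "front end": 27,
    "fetch and branch prediction": 12,
}

# Sanity check: the attribution should account for the whole gain.
total_share = sum(SHARES_PCT.values())

# Assumption: a linear split, so each area contributes gain * share.
points = {area: IPC_GAIN_PCT * share / 100 for area, share in SHARES_PCT.items()}

for area, value in points.items():
    print(f"{area}: about {value:.1f} percentage points")
```

On this reading, the added execution resources are worth a little over five points, and no single area accounts for even half of the gain, which is consistent with describing the improvements as balanced.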

AMD is Ahead of Intel

  • Intel, which dominates PC processors, will field its all-new CPU microarchitecture this year, starting with the TSMC-built Lunar Lake. Intel’s big cores have been more complex than AMD’s, and their better physical design has helped Intel stay competitive despite all but Meteor Lake laptop chips using older process technology. Zen 5 puts AMD’s microarchitecture ahead, but Lunar Lake and its successors should leap-frog it, at least in terms of performance. Employing a newer fabrication technology, Intel’s upcoming processors should also improve power efficiency, where they have been lagging behind Zen.
  • Power efficiency is critical when comparing with Qualcomm. Despite the focus on the AI performance of the company’s new PC processors, battery life could prove to be their key selling point. AMD implements various power-related improvements in its Zen 5 processors, but they’ll likely struggle to match Qualcomm Snapdragon’s efficiency. On desktop, however, performance rules, and Zen 5’s higher peak clock rate eliminates AMD’s competitors other than Intel.

What More Can AMD Do for OEMs?

Among PC enthusiasts, AMD has a big following, which has grown owing to the company’s excellent execution and Intel’s woes. Hungry for performance, particularly if not accompanied by absurd power levels, and unimpressed by multithread benchmark scores pumped up by E-cores, this crowd will find Zen 5 feeds their craving.

AMD has yet to win over OEMs to the same degree. They buy most PC processors; for them, the company’s consistent execution over five Zen generations is a plus. As AMD fields improved PC processors based on Zen 5, it could gain share, particularly if new Intel processors fall short of consumer expectations or reliability problems tarnish Intel’s reputation.

Bottom Line

Zen 5 delivers IPC speedups consistent with past Zen generations and improves area efficiency. Performance gains come throughout the microarchitecture, and workloads using AVX-512 will see a big speedup. In going wider, Zen 5 follows the path blazed by Apple, Arm, and Qualcomm designs. As with any new microarchitecture, some speedup opportunities are left for the next generation to address. The PC- and server-processor markets are heating up as Arm-compatible alternatives gain traction. A fast, efficient CPU is essential to success in these markets, and Zen 5 puts AMD ahead on important metrics.

