Arm promises its just-announced Cortex-X925 CPU will deliver 36% greater single-thread performance on Geekbench than the prior-generation Cortex-X4 owing to a combination of higher instructions per cycle (IPC) and a faster clock rate. The new midrange Cortex-A725 improves on its predecessor's power efficiency. To round out its client-CPU portfolio, Arm updated its Cortex-A520 and added features to its DSU-120 interconnect/cache to improve their efficiency, too.
Arm Updates Annually
Arm updates its top-end Cortex-X and midrange Cortex-A700 client designs (IP) annually and the low-end Cortex-A500 less frequently. This cadence has helped smartphone-processor suppliers deliver new high-profile flagship products every year as well. If Arm-based PCs take off, the yearly refresh will also benefit them. A consequence of the frequent updates is that microarchitecture changes, particularly for the midrange, are typically incremental.
Vector Units Lie Behind Cortex-X925 IPC Gains
The Arm Cortex-X925, however, has one large microarchitecture change: 50% more vector units than the Cortex-X4. Used by AI algorithms and even by Geekbench, this additional hardware undergirds Arm's AI-performance claims and helps the X925 raise its Geekbench 6 IPC by about 15%. In addition to speeding up vector operations, the X925 also raises matrix-multiply throughput, accelerating AI models.
Typical integer workloads, however, will see smaller IPC gains. On the industry-standard SPECint benchmark, for example, the X925 raises IPC by about 7%. Front-end-bound programs will speed up owing to several X925 changes, including a doubling of instruction-cache bandwidth. Programs bound by load bandwidth will also run faster because the X925 adds a fourth load pipeline and doubles data-cache bandwidth. Other performance enhancements include executing unconditional direct branches and sign extension wholly within the front end, freeing execution resources.
Physical Design Gooses GHz
IPC doesn't tell the whole performance story, however. Cycles per second (clock speed) is the other factor. Whereas the Cortex-X4 operated at 3.25–3.39 GHz in flagship 4 nm smartphone chips, Arm revs the X925 to 3.8 GHz. To achieve this rate, customers must license an Arm Client CSS hard macroblock instead of the usual soft design (IP) and employ a 3 nm process. Whereas Arm previously quoted performance gains assuming the same clock rate and configuration (iso-everything), this generation the company touts many gains iso-nothing, with clock rate, process, and configuration all changed. That 36% Geekbench speedup, for example, comes as much from a faster clock rate as from more IPC.
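A back-of-envelope check of that last claim, as a minimal sketch: it assumes single-thread performance is simply IPC times clock rate, so the Geekbench score scales linearly with frequency (real scores scale somewhat less, because memory latency does not shrink as the core clock rises). The only inputs are the 36% headline and the clock rates cited above; the helper name `split_speedup` is hypothetical.

```python
import math

# Back-of-envelope split of Arm's 36% Geekbench claim into clock and IPC parts.
# Assumption (not Arm's methodology): performance = IPC x clock rate.

HEADLINE_SPEEDUP = 1.36      # Arm's claimed X925 vs. X4 Geekbench gain
X925_GHZ = 3.8               # X925 clock rate cited in the text
X4_GHZ_RANGE = (3.25, 3.39)  # X4 clock rates in flagship 4 nm phones


def split_speedup(x4_ghz: float) -> tuple[float, float, float]:
    """Hypothetical helper: return (clock_ratio, implied_ipc_ratio, clock_share)."""
    clock_ratio = X925_GHZ / x4_ghz
    # Whatever the clock does not explain must come from IPC.
    implied_ipc_ratio = HEADLINE_SPEEDUP / clock_ratio
    # Clock's share of the total gain, measured on a log scale so that
    # the clock and IPC shares add up to 100%.
    clock_share = math.log(clock_ratio) / math.log(HEADLINE_SPEEDUP)
    return clock_ratio, implied_ipc_ratio, clock_share


for x4_ghz in X4_GHZ_RANGE:
    clock_ratio, ipc_ratio, share = split_speedup(x4_ghz)
    print(
        f"X4 at {x4_ghz:.2f} GHz: clock +{clock_ratio - 1:.1%}, "
        f"implied IPC +{ipc_ratio - 1:.1%}, clock share {share:.0%}"
    )
```

Against a 3.25 GHz X4, the implied IPC gain lands near Arm's roughly 15% figure and the clock supplies about half the speedup, consistent with the text. Against a 3.39 GHz X4, the same headline would, under this simple model, need a larger IPC gain than Arm cites.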
Arm Grinds Out Cortex-A700 Tweaks
The Arm Cortex-A725's improvements over its predecessor are less obvious. The company assesses power efficiency by lowering the new, faster CPU's clock and voltage until it merely matches its predecessor's peak performance. Realistically, customers will run the A725 at its own peak. Because power rises faster than performance as clock and voltage increase (the power-performance curve is nonlinear), the efficiency gain at that peak will be smaller than what Arm cites. Moreover, in some cases, Arm compares a 3 nm Cortex-A725 to a 4 nm Cortex-A720 and configures the A725 with larger caches. Customers licensing a CSS should see greater clock rates as well as power and area reductions. Like the A720, the A725 comes in an area-optimized configuration to lure licensees to upgrade from the Cortex-A78.
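To illustrate why an iso-performance comparison flatters a new core, the minimal sketch below models dynamic power as P = C·V²·f, with supply voltage rising linearly with clock. Every number in it (the voltage curve, the 10% IPC gain, the 3% capacitance increase, the 3.0 GHz peak) is hypothetical, not Arm data, and the model ignores leakage power.

```python
# Hypothetical illustration: iso-performance vs. peak-clock efficiency gains.
# Model assumptions (not Arm data): dynamic power only, P = C * V^2 * f,
# and supply voltage rising linearly with clock, V(f) = V_BASE + V_SLOPE * f.

V_BASE = 0.50   # volts, hypothetical
V_SLOPE = 0.15  # volts per GHz, hypothetical


def voltage(f_ghz: float) -> float:
    """Hypothetical linear voltage-frequency curve."""
    return V_BASE + V_SLOPE * f_ghz


def power(cap: float, f_ghz: float) -> float:
    """Dynamic power in arbitrary units: C * V^2 * f."""
    return cap * voltage(f_ghz) ** 2 * f_ghz


def perf(ipc: float, f_ghz: float) -> float:
    """Performance as IPC times clock, in arbitrary units."""
    return ipc * f_ghz


# Old core at its peak clock (all values hypothetical).
OLD_IPC, OLD_CAP, PEAK_GHZ = 1.00, 1.00, 3.0
# New core: 10% more IPC, 3% more switched capacitance, same peak clock.
NEW_IPC, NEW_CAP = 1.10, 1.03

old_eff = perf(OLD_IPC, PEAK_GHZ) / power(OLD_CAP, PEAK_GHZ)

# Iso-performance: slow the new core until it just matches the old core's peak.
iso_ghz = perf(OLD_IPC, PEAK_GHZ) / NEW_IPC
iso_eff = perf(NEW_IPC, iso_ghz) / power(NEW_CAP, iso_ghz)

# Realistic use: run the new core at its own peak clock.
peak_eff = perf(NEW_IPC, PEAK_GHZ) / power(NEW_CAP, PEAK_GHZ)

iso_gain = iso_eff / old_eff - 1
peak_gain = peak_eff / old_eff - 1
print(f"efficiency gain at iso-performance: {iso_gain:.1%}")
print(f"efficiency gain at peak clock:      {peak_gain:.1%}")
```

With these made-up inputs, the efficiency gain at the matched-performance point is more than double the gain at peak clock, because slowing the core also lets its voltage drop and dynamic power falls with the square of voltage.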
Competition
- Architecture licensees: Qualcomm has resumed developing its own CPUs instead of licensing them, joining Apple as an Arm-architecture licensee. Apple inaugurated the current era of wide brainiac CPUs, an approach Arm has since embraced with recent Cortex-X cores. Apple's designs achieved performance unavailable at the time from licensed cores, and the company further added value with proprietary x86-emulation and AI-focused matrix-math extensions. Employing some of the same people behind Apple's technology, Qualcomm is also trying to outdo Arm. The licensor, however, is steadily ratcheting up Cortex-X performance, and the CSS option eliminates physical design as a chipmaker's differentiator.
- x86 suppliers: As Arm-compatible PCs expand beyond Qualcomm-based models with proprietary CPUs to include some built on Arm's cores, the PC-benchmarking legion will pit the Cortex-X925 against the newest processors from AMD and Intel. The x86 speed demons' peak performance will be hard to match, but industry leader Intel must improve its power efficiency.
- RISC-V: The architecture is a nonfactor. A few companies offer high-performance RISC-V alternatives to the Cortex-A725 and Cortex-X925, but compatibility concerns keep RISC-V host processors out of smartphones and PCs. As for other SoC types, compatibility and business concerns favor Arm in all but some niches.
Customers
Arm has delivered steady performance and efficiency improvements to its Cortex-A700 and Cortex-X product lines. When coupled with new process technology and multicore configurations employing more high-end and fewer low-end cores, Arm's refreshed CPUs have helped smartphone processors achieve sizable performance gains. Licensees making smartphone chips, as well as those developing other SoCs, will find that Arm's 2024 lineup again delivers annual performance and power improvements, reinforcing Arm's standing as a reliable semiconductor-technology supplier.
Bottom Line
The Cortex-X started as a modified Cortex-A7x but has evolved into a separate microarchitecture that shares design elements with its Cortex-A700 sibling. This separation has enabled Arm to push performance boundaries with Cortex-X, focus on efficiency with Cortex-A500, and balance the two parameters with Cortex-A700. The Cortex-X925 should enable licensees to field processors performing similarly to those from AMD, Apple, Intel, and Qualcomm. The new Cortex-A725 delivers small improvements over the Cortex-A720, which was itself only slightly better than the Cortex-A715. As small as these changes are, they're enough to propel licensees' upgrade cycles.
This post was updated 7 June 2024.