Arm Neoverse V3 and N3 Turn Up the Heat on X86

Arm has refreshed its entire infrastructure-CPU line, delivering new Neoverse V3, N3, and E3 cores. A rung up the integration ladder, the company also offers compute subsystems (CSS) that combine these CPUs, a mesh network, cache, and interfaces to facilitate customers’ SoC and chiplet designs. Arm reports that, at the same frequency and core count, the V3 and N3 are typically about 13% and 16% faster than the V2 and N2, respectively, and almost 2× and 3× faster on XGBoost.

Notables

The speedups come partly from the updated coherent mesh network, the CMN S3 (the successor to the CMN-700), which Arm released alongside the new CPUs and which improves memory-access performance.
XGBoost is particularly sensitive to memory throughput and cache size. The outsized XGBoost gains likely come in part from mating the newer cores to larger Level Two caches.
Arm shows the older V2 and N2 outperforming AMD Epyc and Intel Xeon on database, Java, and XGBoost tests. However, the comparisons effectively match a single Arm core against a single x86 thread running on a dual-thread core. Epyc and Xeon threads would be faster per GHz if their CPUs operated in single-thread mode. In some comparisons, the Epyc processor is one of the higher-core-count models and therefore runs at a low operating frequency.
Arm has withheld microarchitecture details, but in the past has based Neoverse cores on Cortex designs. The Neoverse models add infrastructure features, such as extending physical-memory addressing and adding confidential-computing and RAS capabilities. They also support multiple SIMD pipes, with two 128-bit and four 128-bit pipes typical in the N2/N3 and V2/V3, respectively. Further, the company has the opportunity to tweak the microarchitectures, such as by enlarging branch-history tables, to improve performance on server workloads. Other differences between Neoverse and Cortex cores include how they connect to the SoC bus and more advanced system IP (e.g., SMMU and GIC).
The company likely bases the Arm Neoverse V3 (code-named Poseidon) on the Cortex-X4 (Hunter ELP), the N3 (Hermes) on the Cortex-A720 (Hunter), and the all-but-invisible E3 (Hayes) on the Cortex-A520 (Hayes). The V3 and N3 instruction-throughput gains over their predecessors are better than (but consistent with) those of their Cortex counterparts. The N3’s claimed 20% performance-per-watt improvement over the N2 is also similar to that claimed on the Cortex side. Likewise, the V3 CSS’s claimed 50% per-socket uplift over an N2 CSS is consistent with the Cortex-X4’s speedup over the Cortex-A710, the basis for the N2.
As much as Arm associates Neoverse with AI, the new CPUs are general purpose and target various workloads. There is nothing AI-specific to them. However, they can be a key ingredient in a processor chip targeting AI; for example, one with high-speed chip-to-chip (or die-to-die) interfaces can alleviate bottlenecks to an NPU.
The Nvidia Grace chip is one such processor. Arm, however, now has numerous Neoverse customers. AWS was the first with a large-scale deployment and bases its fourth-generation Graviton chip on the V2. Rival cloud company Microsoft uses the N2 in its Cobalt 100. On the embedded side, Marvell is a notable customer.
In addition to the CSS macroblocks, Arm reduces customers’ time to market with its Total Design and Chiplet System Architecture (CSA) efforts. Total Design is a collection of IP, services, and foundry companies that contribute to building a complete Neoverse-based chip. A customer, for example, can go to Socionext, Faraday, ADTechnology, or another services company to design an ASIC integrating a Neoverse CSS and other IP and take it to TSMC, Intel Foundry Services, or Samsung to fab it. The CSA is less tangible but should help chiplet technology mature.
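The caveat above about matching one Arm core against one x86 thread can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the `Result` class, the system names, and every number in it are made-up placeholders, not Arm, AMD, or Intel data. It shows how the same aggregate score can rank two systems differently depending on whether it is normalized per thread, per core, or per core per GHz. Note that per-core throughput with two threads active is not the same as single-thread-mode performance; it only shows how much the normalization basis matters.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One benchmark run. All values used below are hypothetical."""
    name: str
    score: float           # aggregate benchmark score, arbitrary units
    cores: int             # physical cores used in the run
    threads_per_core: int  # 1 = no SMT, 2 = dual-thread cores
    ghz: float             # operating frequency during the run

    def per_thread(self) -> float:
        # Score divided across every hardware thread.
        return self.score / (self.cores * self.threads_per_core)

    def per_core(self) -> float:
        # Score divided across physical cores only.
        return self.score / self.cores

    def per_core_per_ghz(self) -> float:
        # Per-core score normalized for operating frequency.
        return self.per_core() / self.ghz


# Placeholder inputs chosen only to make the arithmetic easy to follow.
arm_like = Result("arm_like", score=600.0, cores=64, threads_per_core=1, ghz=3.0)
x86_like = Result("x86_like", score=768.0, cores=64, threads_per_core=2, ghz=2.0)

for r in (arm_like, x86_like):
    print(r.name, r.per_thread(), r.per_core(), r.per_core_per_ghz())
```

With these placeholder inputs, the single-thread-per-core system leads on a per-thread basis, while the dual-thread system leads per core and per core per GHz. Real results depend on the workload and on how much throughput the second thread actually adds.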

Competition

Arm-based microprocessors are a proven alternative to AMD and Intel x86 chips, particularly for cloud operators, and can run important software. Their performance and power are competitive, allowing customers to break free from chip vendors, mitigate supplier risk, and reduce capital costs. Importantly, they also allow a degree of customization, such as by adding proprietary chip-to-chip interfaces or trading core count for single-thread performance. The latter likely motivated AMD Bergamo and Intel Sierra Forest development. The x86 companies may need to become more flexible in how they tailor and deliver technology.

For embedded infrastructure, Arm is the defender and Neoverse can stave off attacks by server-like chips (AMD Siena, Intel Xeon D); as for SoCs, Arm doesn’t have a credible challenger.

Customers

Customers choose a supplier based on their confidence in the company’s roadmap as much as on today’s products. Execution is the best way to build confidence. The new Arm Neoverse V3, Neoverse N3, and Neoverse E3 and the CMN S3 indicate Arm is performing well. We expect those already sourcing earlier versions to upgrade to the new models. The CSS and Total Design initiatives make using Neoverse easier than before, lowering adoption barriers and attracting additional customers.

Bottom Line

Arm’s position in embedded infrastructure has long been well cemented, but only in recent years can the same be said for computing infrastructure. Arm servers are no longer a question of if or when but of how widespread. The company can leverage its investment in the well-selling Cortex CPUs to develop a consistent stream of new Neoverse cores. Cloud providers and chip companies like Nvidia have the resources and motivation to employ these for homegrown processors, replacing x86 in some instances. As Neoverse becomes easier to adopt and software more prevalent, additional customers will design it in.
