Simple Oryon block diagram

Qualcomm’s Oryon CPU Challenges x86, But How Does It Stack Up Against the Arm Cortex-X925?

Break the rules and you will suffer. Follow the rules and you will suffer.

Qualcomm disclosed an unusual cache configuration and other details of the Snapdragon X’s Oryon CPUs. After several lackluster Arm-based PC attempts, Qualcomm and Microsoft are finally mounting a credible attack on x86 hegemony. The chipmaker’s acquisition of startup Nuvia equipped it with a CPU that can rival the performance of AMD’s and Intel’s processors. Qualcomm is combining its new CPU with a refreshed Windows-compatible Adreno GPU.

Qualcomm Oryon Goes Almost as Wide as Cortex to Raise IPC

Microarchitecture comparisons between vendors are parlous. Performance and power differences accrue from many factors, not just the most prominent design elements. Empirical testing will reveal how well Oryon compares with Arm Cortex-X925, AMD Zen 5, and Intel Lion Cove (in Intel Lunar Lake and Arrow Lake processors).

The Qualcomm Oryon is a wide CPU, featuring eight decoders and many execution units. However, it’s not as wide as the Cortex-X925, which has ten decoders and even more execution resources. Arm’s design falls short only in load/store hardware, where it offers two L/S plus two load-only units compared with Oryon’s four units, which all support both loads and stores. Oryon also has larger TLBs, but Arm’s CPU has a bigger reorder buffer. As for clock rate, Oryon runs at 3.8 GHz, boosting to 4.3 GHz, in a 4 nm process. Arm has talked about a 3.8 GHz rate for a 3 nm Cortex-X925. If a licensee’s design can boost to 4.3 GHz, we expect it to be about as fast as Oryon on CPU-bound integer code and to deliver greater floating-point and vector throughput.
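To put the width and clock figures side by side, the sketch below multiplies the decoder counts and clock rates quoted above to get a theoretical peak decode rate. It assumes every decoder is busy on every cycle, which real code never achieves, so treat it as a front-end ceiling rather than a performance estimate. The function name and the table layout are illustrative only.

```python
# Peak front-end decode rate = decoders x clock rate (GHz),
# giving billions of instructions per second.
# Decoder counts and clock rates are the ones quoted in the text; sustained
# throughput depends on caches, branch prediction, and back-end resources.

def peak_decode_rate(decoders: int, clock_ghz: float) -> float:
    """Upper bound on decoded instructions, in billions per second."""
    return decoders * clock_ghz

configs = {
    "Oryon at 4.3 GHz boost": (8, 4.3),
    "Cortex-X925 at 3.8 GHz": (10, 3.8),
    "Cortex-X925 at 4.3 GHz (hypothetical licensee boost)": (10, 4.3),
}

for name, (decoders, clock_ghz) in configs.items():
    rate = peak_decode_rate(decoders, clock_ghz)
    print(f"{name}: {rate:.1f} G instr/s peak")
```

Even at its lower quoted clock, the ten-wide design has the higher ceiling on paper, which is one reason width alone says little about delivered performance.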

We expect Zen 5 and Lion Cove to be wider than their predecessors. The current-gen Zen 4 and Golden/Raptor/Redwood Cove CPUs can match Oryon’s front-end throughput only when pulling micro-ops from their Level 0 caches. Back-end execution resources are similar except for the x86 CPUs’ wider vector units. The x86 processors stand apart owing to their clock rates, which peak at around 6 GHz.

Oryon Employs Unusually Sized Caches

Taking advantage of Oryon’s longer clock period, Qualcomm built first-level caches that are unusually large but don’t require more cycles. At 192 KB and 96 KB, Oryon’s instruction and data caches (respectively) exceed the 64 KB and smaller units competitors employ. The bigger L1 caches will alleviate pressure on Oryon’s second level, which shares 4 MB among four cores. Competitors employ private L2 caches that provide at least as much capacity per core. Moreover, they also have large last-level/L3 caches to further mitigate stalls waiting for off-chip memory. Oryon, by contrast, has only a 6 MB system cache, enough to buffer DRAM transactions and help manage coherency among the Snapdragon’s function units.
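The clock-period argument comes down to simple arithmetic. The sketch below converts a clock rate into a cycle time, then counts how many cycles a cache array with a fixed access time would occupy at that clock. The 0.9 ns access time is a made-up placeholder, not a measured figure for any CPU discussed here; the clock rates are the ones quoted in this article.

```python
import math

def clock_period_ns(clock_ghz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz

def cycles_for_access(access_ns: float, clock_ghz: float) -> int:
    """Whole cycles needed to cover a fixed array access time."""
    return math.ceil(access_ns * clock_ghz)

# Hypothetical access time for a large L1 array (an assumption for illustration).
HYPOTHETICAL_ACCESS_NS = 0.9

for name, clock_ghz in (("Oryon boost", 4.3), ("x86 peak", 6.0)):
    period = clock_period_ns(clock_ghz)
    cycles = cycles_for_access(HYPOTHETICAL_ACCESS_NS, clock_ghz)
    print(f"{name}: {clock_ghz} GHz, {period:.3f} ns per cycle, {cycles} cycles")
```

Under these assumptions, the same array takes four cycles at 4.3 GHz but six at 6 GHz. A design with a longer cycle time can therefore afford a physically larger, slower array without adding pipeline stages.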

Bottom Line

Qualcomm promises Oryon will enable the Snapdragon X Elite to deliver greater single-thread performance while requiring less power than current-generation AMD and Intel laptop processors. Now that reviewers can finally obtain systems, we’ll soon see third parties evaluate these claims. Meanwhile, AMD and Intel are readying new chips, which should improve performance and power efficiency, and other Arm-compatible PC processors could be available next year. Because the Arm-based chips will target thermally constrained designs, they won’t challenge the x86 speed demons for desktop dominance. That, combined with compatibility concerns, will preserve x86’s prominence in the Windows ecosystem, even if Oryon proves wildly successful.

Note: We’re inquiring about SVE support and hardware features to aid x86 emulation.




