Banner reading XPU dot pub
Photo of three apple cores, one large and two small

You Would Think There’s a Single Optimal Heterogeneous Multicore Configuration for Power, Performance, or Area

The new AMD 8500G (Phoenix 2) PC processor has a heterogeneous CPU configuration suited to mainstream users that could be a model for other processors, but such configurations reflect the processor vendor’s circumstances as much as customer requirements. Years after the multicore transition and the initial heterogeneous-CPU paper, the industry hasn’t converged on a single approach.

The 8500G has two Zen 4 and four slower Zen 4c cores, which are smaller and thus less expensive to implement. A typical multithread program can run its one or two main threads on the faster CPUs and its ancillary threads, which are more likely to idle at times, on the slower cores. In a multiprocessing scenario, processes/threads aren’t always ready to run. (Look at your computer’s CPU utilization when you’re busy doing something simple, such as writing in a word processor. It’s usually less than 10%.) The OS can assign the first ready-to-run process/thread to one of the fast cores. On the off chance that others are also ready, the OS can assign them to the small ones.
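To make the fast-cores-first idea concrete, here is a minimal Python sketch of that placement policy. It is a toy model under stated assumptions, not AMD’s design or any real OS scheduler: the `Core` class, the `assign` function, and the core names are hypothetical. Ready threads are taken in order, fill the two fast Zen 4 cores first, and spill onto the four Zen 4c cores; anything beyond six ready threads simply waits.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    fast: bool  # True for a big core (Zen 4), False for a compact one (Zen 4c)


def assign(ready_threads, cores):
    """Map ready-to-run threads to cores, filling fast cores first.

    Hypothetical toy policy: the earliest-ready threads land on the fast
    cores, the rest spill onto the small cores, and threads beyond the
    core count are left unassigned (they would wait for a free core).
    """
    # sorted() is stable, so fast cores come first and keep their order.
    ordered = sorted(cores, key=lambda c: not c.fast)
    return {thread: core.name for thread, core in zip(ready_threads, ordered)}


# Hypothetical 8500G-like layout: four Zen 4c cores and two Zen 4 cores.
cores = [Core(f"zen4c-{i}", fast=False) for i in range(4)] + [
    Core(f"zen4-{i}", fast=True) for i in range(2)
]

placement = assign(["main", "audio", "autosave"], cores)
print(placement)
```

In the common case only one thread is ready, so it always gets a fast core; the small cores come into play only in the rarer moments when more than two threads are runnable at once.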

Considering these typical usage models, it makes sense for a processor targeting mainstream users to ensure the common case runs fast by providing a couple of high-performance CPUs and to reduce cost by integrating additional smaller, albeit slower, ones for the less frequent instances that demand more horsepower. Even greater dimorphism could reduce cost further, but AMD had to go to market with the smaller core it had, the 4c. That core was quick and cheap for the company to develop; it’s simply a more compact physical implementation of the Zen 4 architecture.

Intel Alder Lake Took Heterogeneous Multicore in a Different Direction

Intel’s circumstances differed when it began building heterogeneous multicore processors. Whereas the 8500G aims to reduce the cost of a mainstream PC processor, Intel’s 12000 series (Alder Lake) was a response to the need to boost multithread benchmark scores. Adding more copies of Intel’s large high-performance CPUs (P-cores) would’ve raised die cost too much, and a quick-and-dirty retargeting à la Zen 4c was infeasible in Intel’s design methodology. However, Intel had Atom (’mont) cores in its portfolio. Much lower performing but much smaller, and quite area-efficient on balance, these little E-cores were added to boost aggregate throughput.

Intel endowed Alder Lake with several P-cores and several E-cores. The former deliver the performance required by computationally intense workloads (e.g., AAA games), and the latter boost scores on high-profile multithread benchmarks.

Smartphone Chips Established, Then Ignored, Three-Tier Canon

These examples show how heterogeneity can address different goals. Smartphone processors shed additional light on its application. Arm offers little Cortex-A5xx, big Cortex-A7xx, and extra-fast Cortex-Xx CPUs, which processor makers combine in different ways. Smartphone battery life is important and can be extended by shifting workloads to power-efficient little cores and sparingly employing the power-hungry faster cores.

For a time, we saw a three-tier CPU configuration in flagship MediaTek and Qualcomm processors: a single Cortex-X to run the main thread in performance-sensitive use cases (e.g., loading a fresh web page, starting a game), a few Cortex-A5xx for quiescent cases (e.g., background tasks), and a few Cortex-A7xx for normal interactive use.
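The three-tier canon amounts to a mapping from kind of work to CPU tier. The Python sketch below expresses that mapping as a lookup table. It is illustrative only: the `Tier` enum, the task-kind strings, the `pick_tier` function, and the choice to send unrecognized work to the little cores are all hypothetical assumptions, not how MediaTek, Qualcomm, or Android actually schedule threads.

```python
from enum import Enum


class Tier(Enum):
    PRIME = "Cortex-X"     # single extra-fast core
    BIG = "Cortex-A7xx"    # a few big cores
    LITTLE = "Cortex-A5xx" # a few power-efficient little cores


# Hypothetical task categories mirroring the three use cases in the text.
POLICY = {
    "performance_sensitive": Tier.PRIME,  # e.g., loading a fresh web page, starting a game
    "interactive": Tier.BIG,              # normal interactive use
    "background": Tier.LITTLE,            # quiescent cases, e.g., background tasks
}


def pick_tier(task_kind: str) -> Tier:
    # Assumption: unrecognized work defaults to the little cores to save battery.
    return POLICY.get(task_kind, Tier.LITTLE)


for kind in ("performance_sensitive", "interactive", "background", "unknown"):
    print(kind, "->", pick_tier(kind).value)
```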

As logical as this three-tier configuration is, it hasn’t persisted. The MediaTek Dimensity 9300, for example, has three tiers but employs only two CPU microarchitectures, Cortex-X4 and Cortex-A720, optimizing one X4 for performance, three X4s for power and area, and its four A720s for power and area. The Qualcomm Snapdragon 8 Gen 3 also departs from the three-tier canon but differently, employing three microarchitectures in four tiers. Apple, for its part, employs only two core types in its smartphone (and PC) processors, using CPUs of its own design.

No Correct Answer

In conclusion, the best heterogeneous multicore configuration depends on a processor’s cost, performance, or power targets. It also depends on the vendor’s circumstances, including the cores available to it. Even so, the diversity of designs among smartphone processors indicates that there’s no obvious single correct configuration for a given target and suggests future-generation PC processors will diverge from Phoenix 2 and Alder Lake.
