Alongside the Cortex-X925 and other client CPU updates, Arm refreshed its GPU architecture. The company reports the Arm Immortalis-G925 makes graphics and AI applications 35% faster. Behind this speedup is a combination of architecture changes and a shady comparison between a 14-core hard-macro CSS offering and a 12-core soft implementation of the prior-generation GPU. A like-for-like comparison would likely show only a 10% gain.
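To see how much of the headline figure the core-count mismatch alone could explain, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that performance scales linearly with shader-core count. The function name `per_core_speedup` is hypothetical, and the calculation is ours rather than Arm's. It does not model the hard macro's implementation advantages, which presumably account for the rest of the gap down to the roughly 10% estimated above.

```python
def per_core_speedup(total_speedup: float, new_cores: int, old_cores: int) -> float:
    """Divide the core-count ratio out of a whole-GPU speedup.

    Assumption (not from Arm): performance scales linearly with core count.
    """
    return total_speedup / (new_cores / old_cores)


claimed = 1.35  # Arm's reported 35% improvement
normalized = per_core_speedup(claimed, new_cores=14, old_cores=12)
print(f"per-core gain: {(normalized - 1) * 100:.1f}%")  # prints "per-core gain: 15.7%"
```

Under that linear-scaling assumption, the two extra cores by themselves shrink the 35% claim to under 16%, before accounting for any clock-speed or physical-design benefit of the hard macro.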
Collaborating on Game Engines
Conspicuous by its absence, DirectX support remains unavailable for Arm GPUs, indicating the company is understandably staying focused on Android and embedded Linux. We can, therefore, infer that any Arm-compatible PC processors that come to market to challenge Qualcomm Snapdragon X will integrate AMD or Nvidia GPUs, even if they employ Arm Cortex CPUs.
At the same time, the distinction between smartphone and PC gaming is blurring. Arm disclosed it is collaborating with Epic Games to port the desktop version of the Unreal Engine and Lumen ray-tracing pipeline to Immortalis. Arm is also working with Unity, including supporting the Sentis AI framework.
What’s in a Name?
Introduced two years ago, the Immortalis moniker is part of Arm’s belt-and-suspenders branding strategy. With each GPU generation, the company designates a minimum feature constellation required to use the new name. Lesser GPUs must use the older Mali brand. Simultaneously, an alphanumeric model number reinforces the distinction. This generation, the company applies the Arm Immortalis-G925 name to GPUs with 10 or more cores and ray tracing. Those with 6–9 cores are the Mali-G725. Smaller configurations are the Mali-G625.
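The naming tiers can be summarized as a simple lookup. The sketch below is illustrative only: the function name `gpu_product_name` is hypothetical, the 24-core ceiling is the largest configuration Arm cites for this generation, and the code assumes that every configuration with 10 or more cores also includes ray tracing, as the Immortalis name requires.

```python
def gpu_product_name(shader_cores: int) -> str:
    """Map a shader-core count to this generation's product name.

    Assumptions: configurations range from 1 to 24 cores, and every
    configuration with 10 or more cores includes ray tracing.
    """
    if not 1 <= shader_cores <= 24:
        raise ValueError(f"unsupported core count: {shader_cores}")
    if shader_cores >= 10:
        return "Immortalis-G925"
    if shader_cores >= 6:
        return "Mali-G725"
    return "Mali-G625"


for cores in (4, 8, 14):
    print(cores, gpu_product_name(cores))
```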
What’s in a Core?
As alluded to above, one way to raise GPU performance is to add shader cores. Each GPU generation can also update the cores’ design. Arm adapted this generation’s architecture to changing workloads, such as more AI in games and greater use of variable-rate shading, by supporting large memory pages, adding format-conversion hardware, and reworking the depth/stencil unit.
In addition to the cores, Arm GPUs have logic to manage memory, receive commands from the host and dispatch work to the cores, and divide scenes into tiles. The new design can fuse primitives in some cases, doubling tiler throughput, and it adds hardware to improve job dispatch. In conjunction with these updates, Arm has raised the largest configuration from 16 to 24 cores.
Skipping Work
However, the big change this year is the addition of a fragment prepass stage. In graphics processing, the primary technique to increase throughput is to avoid operating on anything the user won’t see because it’s offscreen or occluded. The fragment prepass operation adds to a list of techniques to assess visibility. It executes enough fragment shading to determine visibility, dropping unseen fragments and passing the rest downstream to have their geometry recomputed and then shaded. Previously, code running on the host would sort objects before sending them to the GPU. By obviating this step, the fragment prepass operation reduces CPU render-thread cycles by up to (i.e., no more than) 43%, according to Arm.
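The value of resolving visibility first can be illustrated with a toy model. This is a conceptual sketch, not a description of Arm's hardware: the `Fragment` type and both functions are hypothetical, all fragments are assumed opaque, and "shading" is reduced to a counter. It compares a pipeline that relies only on an early depth test, and so depends on submission order, with one that resolves the nearest fragment per pixel before doing any expensive shading.

```python
from collections import namedtuple

# Hypothetical fragment record: screen position and depth (smaller = nearer).
Fragment = namedtuple("Fragment", "pixel depth")


def shaded_with_early_depth_test(fragments):
    """Count full shader runs when only an early depth test is available.

    A fragment is shaded if it is the nearest seen so far at its pixel,
    so the count depends on the order in which fragments arrive.
    """
    nearest = {}
    shaded = 0
    for frag in fragments:
        if frag.depth < nearest.get(frag.pixel, float("inf")):
            nearest[frag.pixel] = frag.depth
            shaded += 1  # stands in for an expensive shader invocation
    return shaded


def shaded_with_prepass(fragments):
    """Count full shader runs when visibility is resolved first.

    Pass 1 keeps only the nearest fragment per pixel (cheap);
    pass 2 would fully shade just those survivors.
    """
    winner = {}
    for frag in fragments:
        if frag.pixel not in winner or frag.depth < winner[frag.pixel].depth:
            winner[frag.pixel] = frag
    return len(winner)


# Two pixels, three opaque layers each, submitted back to front (worst case).
back_to_front = [Fragment(p, d) for d in (0.9, 0.5, 0.1) for p in (0, 1)]
front_to_back = sorted(back_to_front, key=lambda f: f.depth)

print(shaded_with_early_depth_test(back_to_front))  # 6
print(shaded_with_early_depth_test(front_to_back))  # 2
print(shaded_with_prepass(back_to_front))           # 2
```

In this model, the prepass reaches the best-case shading count without the front-to-back sort, which is the work the host CPU would otherwise have to do.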
The Immortalis-G925 also enhances the previous-generation ray-tracing engine. Arm withholds details but reports that the frame rate goes up by 27% by default on an internal workload employing ray tracing. Alternatively, the new engine allows the developer to sacrifice accuracy, which can boost the workload’s frame rate by 52% while reducing memory traffic by 57%. The latter saves power and can keep ray tracing from constraining other memory-intensive operations.
No Contest
In smartphone designs, Arm has no competition. Apple and Qualcomm use internally developed GPUs. Samsung has employed AMD-sourced GPUs in its Exynos processors but hasn’t yet been successful. Other GPU licensors, such as Imagination Technologies and VeriSilicon (Vivante), are no longer a factor in smartphone designs. They’ll compete with Arm for other SoCs, but Arm is better positioned. Most SoCs employ Arm CPUs, enabling the company to offer an enticing bundle. Arm’s financial resources also enable it to regularly add features, raise performance, and improve area and power efficiency.
At the same time, Immortalis/Mali customer concentration is high. The largest smartphone-chip supplier, MediaTek, uses Arm GPUs. Other customers are almost too small to matter.
Bottom Line
The Arm Immortalis-G925 will help licensees deliver leading gaming performance and battery life in flagship smartphone processors, while the lesser Mali configurations will help flesh out their midrange with feature-rich GPUs. The new CSS hard-macro option should not only increase processing speed and efficiency but also reduce development time.
We would like to see Arm push ray tracing beyond high-end configurations to entice more game developers to use it. Its silicon cost inhibits adoption, but Arm’s game-engine collaborations should lead to more titles using the technology, helping to justify broader hardware adoption. Through its apples-to-oranges comparisons, Arm has muddied the new GPU’s generational speedups but nonetheless can show it’s indeed continuing to deliver better GPUs every year as required to keep the smartphone upgrade cycle humming.