Imagination Technologies has previewed its next-generation GPU architecture, the E-series. The new design adds AI capabilities and scales from 2 to 200 TOPS. It also improves graphics processing, promising 35% better frames per second per watt on many workloads. Imagination plans to deliver the first configurations by the end of the year.
New Under the Hood
Two changes distinguish the E-series from its predecessors, such as the Imagination DXT GPU. First, the company supplemented its shader units with neural cores, which are similar to Nvidia’s tensor cores and excel at matrix math. These raise raw AI throughput by an order of magnitude: whereas the regular vector/graphics throughput of a 1.6 GHz implementation is 13 FP32 or 26 FP16 TFLOPS, the neural cores deliver 100 BF16 TFLOPS or 200 INT8 TOPS. Area efficiency (TOPS/mm²) also increases 3.6×.
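The quoted figures are internally consistent, as a back-of-envelope check shows. The lane count below is inferred from the published numbers, not disclosed by Imagination, and the 2×-per-halving-of-precision scaling is an assumption based on common GPU practice.

```python
# Sanity-check the quoted throughput figures for a 1.6 GHz configuration.
CLOCK_HZ = 1.6e9
FLOPS_PER_FMA = 2  # one fused multiply-add counts as two floating-point ops

# 13 FP32 TFLOPS at 1.6 GHz implies roughly this many FP32 FMA lanes.
# (The result, ~4062, suggests 13 TFLOPS is a rounded figure, perhaps
# 13.1 TFLOPS from a power-of-two 4096 lanes -- an assumption.)
fp32_lanes = 13e12 / (CLOCK_HZ * FLOPS_PER_FMA)

# FP16 runs at twice the FP32 rate, and INT8 at twice the BF16 rate,
# assuming each precision halving doubles ops per lane:
fp16_tflops = 2 * 13    # matches the quoted 26 FP16 TFLOPS
int8_tops = 2 * 100     # matches the quoted 200 INT8 TOPS

print(f"implied FP32 lanes: {fp32_lanes:.0f}")
print(f"FP16: {fp16_tflops} TFLOPS, INT8: {int8_tops} TOPS")
```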
Second, Imagination added “burst processors” to supplement the ALUs. Working within the neural cores, these have a shallower pipeline than the ALUs; for example, a fused multiply-add (FMA) operation requires only two cycles instead of approximately ten. The burst units are also available to graphics workloads, but they can offload only FMAs and a few other operations from the regular ALUs.
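The benefit of the shallower pipeline shows up most clearly on serially dependent operations, where each FMA must wait for the previous result. The toy model below uses the article’s cycle counts (2 versus roughly 10); everything else is an illustrative assumption.

```python
# Toy latency model: a chain of dependent FMAs pays the full pipeline
# latency per operation, while independent FMAs pipeline at ~1/cycle.

def serial_cycles(n_ops: int, latency: int) -> int:
    """Each FMA consumes the previous result, so nothing overlaps."""
    return n_ops * latency

def independent_cycles(n_ops: int, latency: int) -> int:
    """Fully pipelined: one issue per cycle plus initial fill."""
    return n_ops + latency - 1

# Independent work: both pipelines approach one op per cycle.
deep_indep = independent_cycles(100, 10)   # 109 cycles
burst_indep = independent_cycles(100, 2)   # 101 cycles

# A dependent chain: the 2-cycle burst unit finishes 5x sooner.
deep_chain = serial_cycles(100, 10)        # 1000 cycles
burst_chain = serial_cycles(100, 2)        # 200 cycles

print(deep_indep, burst_indep, deep_chain, burst_chain)
```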
A shorter pipeline limits peak clock rates, but Imagination finds that constraint moot considering its targeted frequencies and advanced process technologies. A shorter pipeline, however, is easier to keep full, reduces the area and power overhead of registers between stages, and needs only a few bypass paths to feed results back to dependent instructions. Imagination’s regular deep pipeline must write results back to the GPU’s large register file; by contrast, the burst processor can skip register access and use the bypass path, further reducing power.
Power Efficiency and Flexibility
Like many others, Imagination sees opportunities in accelerating AI. Possessing GPU technology and seeing NPUs’ limitations, the company consolidated its AI initiative into its GPU efforts. It found that the tile-based deferred rendering (TBDR) that it has long employed in power-efficient graphics processing has analogs in AI processing. To decrease power, TBDR reduces data movement by increasing data locality. It divides the screen into tiles and, for each tile, processes the portions of all triangles contained therein. In addition to the power savings, computing-unit utilization increases because there’s less dead time shuffling data.
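The binning step at the heart of TBDR can be sketched in a few lines: each triangle’s bounding box determines which screen tiles reference it, so later per-tile shading touches only local data. The 32-pixel tile size and the data layout here are illustrative assumptions, not Imagination’s design.

```python
# Minimal sketch of TBDR's binning step: assign triangles to screen tiles
# by bounding box so each tile can later be shaded from local data.
from collections import defaultdict

TILE = 32  # pixels per tile edge (assumed)

def bin_triangles(triangles, width, height):
    """triangles: list of three (x, y) vertex tuples each.
    Returns {(tile_x, tile_y): [triangle indices touching that tile]}."""
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the bounding box to the screen, then convert to tile coords.
        tx0, tx1 = max(0, min(xs)) // TILE, min(width - 1, max(xs)) // TILE
        ty0, ty1 = max(0, min(ys)) // TILE, min(height - 1, max(ys)) // TILE
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)

tris = [[(0, 0), (40, 0), (0, 40)],            # spans a 2x2 block of tiles
        [(100, 100), (110, 100), (100, 110)]]  # fits in a single tile
bins = bin_triangles(tris, 640, 480)
```

A real rasterizer would test actual triangle coverage rather than bounding boxes, but the locality argument is the same: each tile’s worklist is processed start to finish before moving on, keeping pixel data on-chip.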
Compared with many edge-computing NPUs, a GPU should have greater flexibility. Some NPUs have limited capability to run customer code, providing only a fixed-function matrix unit and a SIMD-enabled CPU. A GPU, by contrast, is more programmable, distributing vector and matrix processing among shader cores. Moreover, developer tools are typically more mature and extensive for a GPU. The E-series also supports many data formats, not just one or two (e.g., INT8 and BF16) like many NPUs. For example, the company has adapted its texture unit to load and unpack FP4 data. Customers with models that use FP4 can take advantage of the format’s reduced memory requirements. Processing throughput is no greater than with FP8, but execution time will hardly be affected if a model is bandwidth limited.
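Unpacking FP4 is straightforward in principle, as the sketch below shows. We assume the E2M1 encoding (1 sign, 2 exponent, 1 mantissa bit, as in the OCP MX specification) and low-nibble-first packing; Imagination has not disclosed which 4-bit format or ordering its texture unit uses.

```python
# Sketch of unpacking FP4 weights stored two per byte, assuming the
# E2M1 format: representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6.

def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit E2M1 value (1 sign, 2 exponent, 1 mantissa bit)."""
    sign = -1.0 if nibble & 0x8 else 1.0
    exp = (nibble >> 1) & 0x3
    mant = nibble & 0x1
    if exp == 0:                 # subnormal encodings: 0 or 0.5
        return sign * mant * 0.5
    return sign * (1.0 + mant * 0.5) * 2.0 ** (exp - 1)

def unpack_fp4(data: bytes) -> list:
    """Two FP4 values per byte, low nibble first (assumed ordering)."""
    out = []
    for byte in data:
        out.append(decode_e2m1(byte & 0xF))
        out.append(decode_e2m1(byte >> 4))
    return out

print([decode_e2m1(n) for n in range(8)])  # the 8 positive magnitudes
print(unpack_fp4(b"\x21"))                 # low nibble 0x1, high nibble 0x2
```

With only 16 representable values, hardware can implement this as a small lookup table, which is presumably why retrofitting a texture unit for the job is cheap.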
Taking Graphics and AI to the Edge
Imagination targets the gamut of edge applications and has licensees in the automotive, mobile/consumer, and PC markets. The E-series scales from 2 TOPS to 200 TOPS, enabling it to serve such diverse workloads, and the company promises models with functional safety features for automotive and industrial customers. The new GPU architecture has a lead customer, but Imagination has withheld its identity.
Although customers have the option of pairing an E-series GPU with an NPU to supplement its AI capabilities, we expect most will use only a GPU for all graphics and AI processing. Mixed workloads can coexist in shader pipelines, and Imagination GPUs support virtualization (partitioning) to let workloads claim dedicated resources. Adding AI functions to graphics applications, such as upscaling, frame generation for games, and image effects, is also possible and likely much improved in the E-series compared with its predecessors.
Bottom Line
Imagination faces numerous edge-AI competitors, but its GPU-based approach better aligns it with the dominant AI-processing architecture. Moreover, by handling both graphics and AI tasks, it offers greater value. However, comparisons with competing designs must wait until the company discloses specific E-series models later this year. To back up its claims, the company must provide both graphics and AI benchmarks for typical configurations. Nonetheless, it’s the only licensor to have disclosed a GPU with a feature like neural cores. Designers requiring flexible AI acceleration will find the Imagination E-series GPU unique.