
Arm Ethos-U85 NPU Raises Edge AI Performance and Efficiency


The new Arm Ethos-U85 anchors the top end of the company’s NPU portfolio, as we noted in our Embedded World 2024 roundup. Configurable from 256 GOPS to 4 TOPS, it scales to both lower and higher performance levels than the earlier Ethos-U65, and it supports Cortex-A CPUs in addition to Cortex-M models. Arm also includes the Ethos-U85 in its Corstone-320 design, a preintegrated and preverified IP combination.

Notables

  • Improvements—The Arm Ethos-U85 does more than scale up the U65 design by enlarging the multiply-accumulate (MAC) array. The company raised performance and efficiency through numerous changes, including chaining operators in hardware instead of passing intermediate results through memory—improving resource utilization and reducing energy consumption. The U85 also updates the weight decoder, which reads and decompresses weights from memory, preparing them for the MAC array to use. In addition, the U85 executes more operators than earlier Ethos models and supports transformer networks. Arm states the TinyLlama model fully maps to the U85, with no operators falling back to the host CPU.
  • Area—Scaling down a high-performance design often fails to be as power- and area-efficient as developing a core for lower performance from the start. Arm, therefore, designed the U85 for only 4 TOPS—a lot for an MCU but less than what smartphone and PC processors deliver. As with other scalable designs, larger U85 configurations are more area efficient because they amortize the NPU’s fixed-area component over more TOPS. Arm estimates that a 2,048-MAC configuration is only seven times the size of a 128-MAC one.
  • Memory—The U85 integrates roughly the same amount of SRAM per TOPS as the previous U55 and U65, with memory capacity ranging from 29 KB to 267 KB. Whereas the U55 and U65 had two 64-bit and two 128-bit AXI ports, respectively, the U85 supports up to six 128-bit ports, providing more memory bandwidth to keep the NPU from starving for data. The at-memory Syntiant NDP250, by contrast, integrates 6 MB in its neural core, trading die area for better power and performance. Arm’s conventional memory hierarchy constrains chip cost but may raise system cost by requiring more external memory, although it may suit larger models better.
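The configuration range and port counts above lend themselves to some back-of-envelope arithmetic. A minimal sketch in Python, assuming one MAC counts as two operations per cycle and a nominal 1 GHz clock—the clock frequency is our assumption, not a figure from Arm, and actual throughput depends on process and implementation:

```python
# Back-of-envelope arithmetic for the Ethos-U85 figures cited above.
# Assumptions: 1 MAC = 2 ops (multiply + accumulate) per cycle, and a
# nominal 1 GHz clock; real silicon will vary.

CLOCK_HZ = 1e9      # assumed nominal clock frequency
OPS_PER_MAC = 2     # multiply + accumulate

def peak_tops(macs: int, clock_hz: float = CLOCK_HZ) -> float:
    """Peak throughput in TOPS for a given MAC-array size."""
    return macs * OPS_PER_MAC * clock_hz / 1e12

def axi_bandwidth_gbs(ports: int, width_bits: int,
                      clock_hz: float = CLOCK_HZ) -> float:
    """Aggregate AXI bandwidth in GB/s, one beat per port per cycle."""
    return ports * (width_bits / 8) * clock_hz / 1e9

print(peak_tops(128))              # smallest config: 0.256 TOPS (256 GOPS)
print(peak_tops(2048))             # largest config: ~4.1 TOPS
print(axi_bandwidth_gbs(6, 128))   # six 128-bit ports: 96 GB/s at 1 GHz
```

The 128-MAC and 2,048-MAC endpoints reproduce the 256-GOPS-to-4-TOPS range Arm quotes, which suggests the company assumes a clock near 1 GHz for its headline numbers.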

Competition

As an NPU licensor, Arm benefits most from its near omnipresence in SoC designs, owing to the popularity of its broad line of CPUs. We expect every Ethos-U85 to have a Cortex CPU attached and for Arm to price the combination such that the two together cost less to license and integrate than an à la carte Cortex plus a competitor’s NPU. The CPU—be it a Cortex-M or a Cortex-A core—hosts applications employing AI and is a fallback for operations unsupported by the NPU. Competitors such as Cadence, Ceva, and Synopsys, however, also license other technology—including CPU and DSP cores—and claim many customers. Consequently, we expect NPU suppliers to remain fragmented. Smaller, independent licensors must fight mightily to land customers and ecosystem partners, and the technologically superior ones could sell their operations to chip companies.

Customers

Disclosed Arm Ethos-U85 customers include Alif and Infineon. Considering the NPU’s improved efficiency, we expect adoption by other U-series licensees, which include Nuvoton, NXP, and Synaptics. The NPUs aren’t binary compatible, but Ethos-U55/65 users can recompile their models for the U85. Its added performance and better transformer support will attract additional customers across the spectrum of edge-AI applications, such as security cameras, factory automation, and smart retail systems.
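Recompilation for the U85 goes through Arm’s open-source Vela compiler, which maps quantized TFLite models onto Ethos-U NPUs. A sketch of the invocation, assuming a Vela release with U85 support; the model file name and MAC-count suffix are illustrative:

```shell
# Recompile an existing quantized TFLite model for a 256-MAC Ethos-U85.
# File name is illustrative; --accelerator-config follows Vela's
# <npu>-<macs> naming convention, assuming a release with U85 targets.
pip install ethos-u-vela
vela model_int8.tflite --accelerator-config ethos-u85-256 --output-dir ./out
```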

Bottom Line

The Arm Ethos-U85 is a significant upgrade over the previous-generation U65, supporting a broader configuration range, improving performance and power efficiency, and adding Cortex-A and transformer-network support. The latter is important as customers develop applications for generative AI at the edge, such as employing scaled-down large-language models to improve user interfaces. Having improved its NPU offering over multiple generations and being the leading CPU licensor, Arm is well positioned to enable designers to add AI acceleration to their chips.

