Ceva has released a new AI engine (NPU) for TinyML (lightweight embedded) workloads. The licensable NeuPro Nano (NPN) targets consumer and industrial SoCs and requires no separate host. It's a small DSP with NPU blocks, including the expected hardware for multiplying tensors and computing activations, as well as units for decompressing weights and quantizing data. Weight decompression reduces the memory required to store models, unpacking them on the fly; quantization facilitates the use of small data types, such as INT4, which the NPN supports. Ceva offers the design in a base configuration capable of four 32-bit MACs per cycle or 32 four-bit MACs per cycle.
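To make the data-size point concrete, the sketch below shows how symmetric INT4 quantization and nibble packing typically work. The function names, per-tensor scaling, and packing layout are illustrative assumptions rather than Ceva's implementation, but they show why INT4 halves weight storage relative to INT8 (and cuts it to one-eighth versus float32).

```python
# Generic illustration of INT4 weight quantization and packing; function
# names and the per-tensor scale are assumptions, not Ceva's scheme.
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Map float weights onto the signed INT4 range [-8, 7] with one scale per tensor."""
    scale = np.abs(weights).max() / 7.0            # 7 = largest positive INT4 value
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Store two INT4 values per byte, halving memory versus INT8."""
    nibbles = (q.reshape(-1, 2) & 0x0F).astype(np.uint8)   # two's-complement low nibbles
    return nibbles[:, 0] | (nibbles[:, 1] << 4)

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights to check the quantization error."""
    return q.astype(np.float32) * scale

w = np.random.randn(64).astype(np.float32)         # toy weight tensor (even length)
q, s = quantize_int4(w)
packed = pack_int4(q)
print(f"{w.nbytes} B float32 -> {packed.nbytes} B packed INT4")
print("max abs quantization error:", np.abs(w - dequantize_int4(q, s)).max())
```

Packed weights like these are what an on-the-fly decompression unit would expand as tensors stream into the MAC array, so the full-precision model never needs to reside in memory.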
AI functions are coming to embedded (edge) applications, including battery-powered hearing aids and headphones as well as low-cost condition-monitoring systems for industrial machines. Although these processing loads are light enough for a fast CPU to handle alone, an accelerator can be more energy and area efficient. The compact Ceva NeuPro Nano targets such designs and includes power management (e.g., DVFS) to improve battery life. Some alternatives, such as Arm's Ethos accelerators, require a host CPU; Ceva's design can eliminate the host entirely or pair with a less costly core. Other alternatives, such as the Syntiant NDP250, employ novel architectures to reduce power. While many NPUs deliver more AI-processing throughput each generation, the Ceva NeuPro Nano goes in the opposite direction, extending the company's NPU family downward in performance, power, and die area. In so doing, it will help chip developers bring AI capabilities to a different class of applications while preserving familiar von Neumann processing and CMOS technology.