As noted in our Embedded World 2024 roundup, Syntiant has introduced the NDP250 NPU (AI accelerator), which integrates the company's new Core 3 engine and more memory to quintuple its predecessor's peak performance. The chip also sports an Arm Cortex-M0 CPU and a Cadence Tensilica HiFi 3 DSP. Targeting low-power, always-on designs, the NDP250 delivers 30 GOPS of number crunching at milliwatt power levels for video and audio applications such as automotive security and smart doorbells.
Notables
- Architecture—Syntiant NPUs implement an at-memory architecture. Like other NPUs, this design employs multiple processing engines, but it distributes memory among them instead of relying on external memory. Because data isn't constantly shuttled between engines and distant memory, power drops and performance rises, as the first sketch following this list illustrates. Syntiant claims a hundredfold performance and efficiency improvement over running models on an MCU's CPU, and other features provide a further 10× improvement on streaming audio or video data. Moreover, Syntiant NPUs execute all AI-processing tasks, whereas some competing NPUs rely on a host CPU to handle activations and other functions, potentially creating a bottleneck and consuming added power.
- Other cores—Like its previous NPUs, the Syntiant NDP250 integrates a Cortex-M0 CPU and a Cadence (Tensilica) HiFi 3 DSP, but it adds memory for these cores. The M0 manages the chip and enables it to operate as a standalone SoC, serving either as a simple design's main chip or as a coprocessor that listens for wake events while its host processor powers down (a flow sketched in the second example after this list). The HiFi 3 performs audio processing, such as preparing samples for inference.
- Bigger is better—A drawback of at-memory architectures, however, is poor performance when a model doesn't entirely fit on chip. To run bigger models, therefore, the Syntiant NDP250 integrates 6 MB of memory, compared with only 1 MB in the company's previous generation; the third example after this list quantifies the difference. As before, multiple models can run simultaneously on a single chip.
- Business model—In addition to NPUs and the tools to enable them, Syntiant offers AI-modeling software, turnkey models, and design services, supporting chips other than its own. The software can target NPUs, CPUs, and DSPs; its handling of sparse data, for example, speeds up execution without impacting accuracy (see the final sketch after this list), and the company has demonstrated doubling Llama 7B's token rate. For its own models, the company maintains training data that includes both model-dependent and model-independent items to improve false-acceptance and false-rejection rates. For example, to identify cars in a scene, Syntiant trains its models on pictures not only of cars but also of other objects. The company's complementary products inform its chip designs and help it react quickly to customer requirements.
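To put a number on the architecture point, the back-of-envelope model below compares the memory power of fetching every operand from on-engine SRAM versus external DRAM. The per-access energies and the one-fetch-per-MAC rate are illustrative assumptions drawn from generic published estimates, not Syntiant figures; the takeaway is the roughly two-orders-of-magnitude gap, consistent in spirit with the company's hundredfold claim.

```c
/* Back-of-envelope model of the at-memory advantage. The per-access
 * energies are generic published ballpark figures, not Syntiant data;
 * the access rate assumes one 32-bit operand fetch per MAC (counting
 * a MAC as two ops) at the NDP250's 30 GOPS. */
#include <stdio.h>

int main(void) {
    const double pj_local_sram = 5.0;     /* assumed: 32-bit on-engine SRAM access */
    const double pj_ext_dram   = 640.0;   /* assumed: 32-bit external DRAM access  */
    const double fetches_per_s = 30e9 / 2.0;

    double sram_mw = fetches_per_s * pj_local_sram * 1e-12 * 1e3;  /* pJ/s -> mW */
    double dram_mw = fetches_per_s * pj_ext_dram   * 1e-12 * 1e3;

    printf("memory power, on-engine SRAM: %6.0f mW\n", sram_mw);
    printf("memory power, external DRAM:  %6.0f mW\n", dram_mw);
    printf("ratio: %.0fx\n", dram_mw / sram_mw);
    return 0;
}
```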
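The coprocessor arrangement described under "Other cores" reduces, in firmware, to a host that sleeps until the NDP raises an interrupt. The sketch below outlines that loop; every function in it is a hypothetical stub, not Syntiant's actual SDK, and the interrupt is simulated so the demo runs standalone.

```c
/* Sketch of the coprocessor use case: the host sleeps while the NDP250
 * listens, then resumes on a wake interrupt. A real port would map the
 * stubs below onto the vendor SDK and the host MCU's power-management
 * and interrupt registers. */
#include <stdbool.h>
#include <stdio.h>

static volatile bool wake_event = false;

static void ndp_load_model(const char *path) { (void)path; }        /* program the NPU (stub)      */
static void ndp_arm_interrupt(void)          { }                    /* arm the IRQ line (stub)     */
static void host_enter_deep_sleep(void)      { wake_event = true; } /* stub: pretend the NDP fires */
static void handle_voice_command(void)       { puts("wake event: host awake, running app"); }

int main(void) {
    ndp_load_model("wake_word.bin");     /* hypothetical model binary name */
    ndp_arm_interrupt();

    for (int i = 0; i < 3; i++) {        /* bounded so this stub demo terminates  */
        host_enter_deep_sleep();         /* host powers down; NDP keeps listening */
        if (wake_event) {                /* set by the NDP's interrupt handler    */
            wake_event = false;
            handle_voice_command();      /* heavy lifting happens only after wake */
        }
    }
    return 0;
}
```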
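The capacity jump in "Bigger is better" is easy to quantify. Assuming int8 weights plus a 20% allowance for activations and working buffers (both assumptions, not Syntiant's figures), the check below shows which model sizes fit entirely on chip at 1 MB versus 6 MB:

```c
/* Rough capacity check: which model sizes fit entirely on chip?
 * Assumes 1 byte per int8 weight plus a 20% allowance for activations
 * and buffers; both figures are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    const double mb = 1024.0 * 1024.0;
    const double budgets[] = { 1.0 * mb, 6.0 * mb };   /* previous generation vs NDP250 */
    const double params[]  = { 0.5e6, 2e6, 4e6, 8e6 }; /* candidate model sizes         */

    for (int b = 0; b < 2; b++) {
        printf("on-chip memory %.0f MB:\n", budgets[b] / mb);
        for (int p = 0; p < 4; p++) {
            double bytes = params[p] * 1.2;   /* 1 byte/weight + 20% overhead */
            printf("  %.1fM-param int8 model: %s\n", params[p] / 1e6,
                   bytes <= budgets[b] ? "fits" : "needs off-chip memory");
        }
    }
    return 0;
}
```

A 2M-parameter model, for instance, spills off a 1 MB chip but fits comfortably in 6 MB.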
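Finally, the sparsity claim under "Business model" rests on a simple principle: multiply-accumulates against zero weights contribute nothing and can be skipped. The toy matrix-vector product below counts the savings on a pruned weight matrix; production sparsity support uses compressed storage formats and hardware scheduling rather than a per-weight branch, but the arithmetic saved is the same.

```c
/* Why sparsity helps: a matrix-vector product that skips zero weights
 * does proportionally less work without changing the result. */
#include <stdio.h>

#define ROWS 4
#define COLS 8

int main(void) {
    /* Toy weight matrix with mostly zeros, as pruning might leave it. */
    float w[ROWS][COLS] = {
        {0.2f, 0,    0,    0.5f, 0,    0.1f, 0,    0},
        {0,    0.7f, 0,    0,    0.3f, 0,    0,    0.4f},
        {0.9f, 0,    0.6f, 0,    0,    0,    0.2f, 0},
        {0,    0,    0,    0.8f, 0,    0,    0,    0.5f},
    };
    float x[COLS] = {1, 2, 3, 4, 5, 6, 7, 8};
    float y[ROWS] = {0};
    int macs = 0;

    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            if (w[r][c] != 0.0f) {       /* skip zero weights entirely */
                y[r] += w[r][c] * x[c];
                macs++;
            }

    printf("MACs executed: %d of %d dense (%.0f%% saved)\n",
           macs, ROWS * COLS, 100.0 * (ROWS * COLS - macs) / (ROWS * COLS));
    return 0;
}
```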
Competition
Various companies have targeted low-power, low-cost edge-AI applications. Arm is a chief competitor, licensing its Ethos-U series NPU designs. Because it can offer these designs to its vast customer base and in conjunction with its CPUs, Arm has important business advantages. Even so, like other edge-NPU suppliers, Arm doesn't sell production-ready models, services, and modeling software as Syntiant does. Moreover, Syntiant's unusual—but by no means unique—at-memory architecture confers performance and power advantages.
Customers
Designers of video doorbells, security cameras, driver-awareness electronics, and other embedded systems are executing more AI functions locally instead of in a data center to improve responsiveness and reduce cloud costs. Compared with handling these functions on a host processor, offloading them to an NPU decreases power and increases performance. On these metrics, the Syntiant at-memory architecture beats the alternatives when an entire model fits within a single chip, and with its added capacity, the NDP250 can run bigger models, or more simultaneous models, than earlier Syntiant NPU chips. Moreover, many customers lack the expertise or resources to develop and train proprietary models, making Syntiant worth considering solely for its software and modeling businesses.
Bottom Line
As much as large language models (LLMs) have captured the public's attention and their training has driven NPU/GPU revenue, inference will drive the AI semiconductor industry. Most inferencing will transpire outside of the cloud—at the edge—to improve application responsiveness, maintain privacy, and spread cost. Low-power NPUs such as the Syntiant NDP250 also decrease energy consumption by being more efficient than data-center accelerators for small workloads and by obviating the need to transmit audio or video samples across the internet to a cloud service provider. Syntiant's combination of business offerings and NPU technology positions it to be among the few edge-AI startups to prevail.