Untether AI has closed up shop, ceasing supply and support for its hardware and software, and AMD is picking up the team, according to the AI accelerator startup. Untether shipped two product generations and delivered superior power efficiency. However, its NPUs had limitations and faced a relentless incumbent, all in the context of a rapidly evolving industry. The startup joins other data-center NPU companies that have failed and will surely not be the last.
At-Memory Computation
Like other companies developing AI accelerators de novo, Untether formulated an architecture unlike that of a GPU. The company employed “at memory” computing, distributing SRAM and processing elements throughout its design (and integrating 1,458 RISC-V CPUs in its second-generation SpeedAI240), as its Hot Chips presentation explains. Obviating the need for off-chip memory reduces power and should improve processing-element utilization. The power savings enabled Untether to fit the SpeedAI240 on a standard PCI card. The company backed its claims by publishing performance on MLPerf (but only for ResNet-50) and is one of the few companies to submit MLPerf power results, showing the SpeedAI240 could be much more efficient than an Nvidia GPU.
The original Untether NPU, the RunAI200, leaned too much into integrated memory, providing no external-memory option, which limited model size. It also supported only INT8 data. The second-gen SpeedAI240 (Boqueria) added LPDDR5 interfaces to 64 GB of external memory as well as floating-point support (FP8 and BF16). FP8 requires less power than INT8 and quadruples throughput in the Boqueria architecture. BF16’s greater precision improves accuracy, albeit at reduced throughput.
Bottom Line
These changes were a step in the right direction, but 64 GB is shy of the amount required to run a large language model (LLM) such as Llama 70B on a single chip. (Multiple SpeedAI240s ganged together could, in theory, handle bigger neural networks.) Unable to handle popular LLMs and unproven (as far as we know) for recommendation models, Untether’s AI accelerators were practically restricted to convolutional neural networks (CNNs) like ResNet-50. These have widespread application (e.g., Untether was pursuing agricultural robotics), but the CNN niche proved to be unsustainable despite Untether’s clear value proposition.
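A back-of-the-envelope calculation shows why 64 GB falls short. The sketch below is our own arithmetic, not Untether's figures; the 70-billion parameter count is Llama 70B's nominal size, and it ignores the KV cache and activations, which only widen the gap.

```python
# Rough weight-memory footprint of Llama 70B at various precisions,
# compared with the SpeedAI240's 64 GB of LPDDR5 (our estimate, not a vendor figure).
PARAMS = 70e9  # approximate parameter count of Llama 70B
BYTES_PER_PARAM = {"BF16": 2, "FP8": 1, "INT8": 1}
CAPACITY_GB = 64  # SpeedAI240 external-memory capacity

for fmt, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{fmt}: {gb:.0f} GB of weights -> fits in {CAPACITY_GB} GB: {gb <= CAPACITY_GB}")
# Even at one byte per weight, 70 GB exceeds 64 GB before counting
# the KV cache and activation memory.
```

At BF16 the weights alone need roughly 140 GB, and even the one-byte formats need about 70 GB, so a single SpeedAI240 cannot hold the model.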
The open question is whether Untether’s demise signals an overdue shakeout among data-center NPU companies. Other failures include Graphcore, Wave Computing, and Intel, but many businesses with dwindling funds, slim prospects, and unclear technical merits remain. The pecking order is established: Nvidia sits at the top, hyperscaler-proprietary designs rest below it, then comes AMD, and every other supplier is just crusher run. The best opportunity for companies in that last group (aside from sovereign-AI boondoggles) would seem to be a hyperscaler acquisition, contributing intellectual property and expertise, but AMD’s acqui-hire of the Untether team indicates even that exit is unavailable.