“If transformers go away…we’ll be hosed,” Etched CEO Gavin Uberti told Bloomberg in March. Last week Etched received $120 million to develop its Sohu NPU, promising a chip 10× faster than the Nvidia Blackwell GB200 and 20× faster than a Hopper H100 on Llama-3 70B and other transformer-based AI models. Articles online have raised the following doubts:
- Is it possible to make an NPU that much faster than Nvidia’s fastest GPUs?
- Is developing a chip optimized solely for transformers too risky?
- Couldn’t Nvidia or another big company do the same thing?
- Can 22-year-old dropouts make a big chip?
Etched Sohu’s Promised Gains Are Feasible
Etched’s funding announcement addresses its technology’s feasibility. Sohu implements a transformer-specific pipeline, forgoing the flexibility other NPUs afford. In so doing, it can:
- Occupy a greater proportion of die area with fused multiply-add (FMA) units (math hardware), raising peak flops per chip.
- Improve FMA-unit utilization, raising effective flops.
The company calculates that only 3.3% of the H100’s transistors are dedicated to matrix multiplication. Although this figure probably ignores extra logic included to improve manufacturing yields, the point stands: a denser design could theoretically achieve greater performance. Moreover, the company argues that memory access does not bottleneck inference performance. Thus, a 20× gain is, at least on paper, feasible.
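As a back-of-envelope illustration of how the two levers above compound, the sketch below multiplies a hypothetical die-area gain by a hypothetical utilization gain. All numbers are illustrative assumptions chosen for the arithmetic, not figures from Etched or Nvidia.

```python
# Back-of-envelope model: effective throughput is peak flops times
# achieved utilization. All numbers below are illustrative assumptions.

def effective_flops(peak, utilization):
    """Effective throughput = peak hardware flops x achieved utilization."""
    return peak * utilization

# Baseline GPU, normalized: peak flops = 1.0, with an assumed 30%
# utilization on LLM inference.
gpu = effective_flops(1.0, 0.30)

# Hypothetical transformer-only chip: 4x the die area devoted to matrix
# math (raising peak flops) and 90% utilization from a fixed pipeline.
npu = effective_flops(4.0, 0.90)

print(f"{npu / gpu:.1f}x")  # 12.0x under these assumed numbers
```

Under these assumptions, 4× the math hardware at 3× the utilization compounds to a 12× effective gain; a 20× claim implies still larger factors on one or both axes.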
Betting On One Model Is Risky
Developing a chip solely for transformers is indeed risky. We see transformers remaining popular, but a fixed-function chip could prove inefficient if a new model employs a vastly different activation function or makes another unanticipated low-level change.
As an example of what could happen, consider a networking-focused processor I’m familiar with. A customer required it to parse packets at a high rate, extracting five parameters, so a hard-wired pipeline was added to do this. After the chip was complete, the customer decided it wanted to extract two additional parameters, which the pipeline couldn’t handle at the required rate, negating its value. Etched could similarly find that a popular new model differs from today’s transformers in only a small way yet runs poorly on Sohu.
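To make this risk concrete, here is a toy sketch (my illustration, not Sohu’s actual design) of why a fixed-function pipeline can’t absorb even a small model change: the chip bakes in one activation function at tape-out, so a model built around a different nonlinearity has no fast path.

```python
# Toy illustration (assumed design, not Sohu's): a fixed-function pipeline
# hard-wires one activation function; anything else cannot run.
import math

def gelu(x):
    # tanh approximation of GELU, common in transformer feed-forward layers
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))

SUPPORTED_ACTIVATIONS = {"gelu": gelu}  # fixed at tape-out; cannot be extended

def run_ffn(x, activation="gelu"):
    fn = SUPPORTED_ACTIVATIONS.get(activation)
    if fn is None:
        raise NotImplementedError(f"{activation} is not wired into the pipeline")
    return fn(x)

run_ffn(1.0)              # works: the hardware supports GELU
# run_ffn(1.0, "swiglu")  # would raise: new nonlinearity, no hardware path
```

A programmable NPU would handle the new function at some efficiency cost; the fixed pipeline handles it not at all.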
Indeed, data-center NPUs designed with only CNNs in mind became obsolete when customers turned to transformer and recommendation models. The Meta MTIA, for example, targets the latter but retains enough programmability to run other workloads. The Etched Sohu differs, exclusively handling transformers.
Success Could Attract Copycats
Etched’s success could attract copycats. Slashing LLM-processing costs by 90% could prompt hyperscalers to develop proprietary transformer-specific chips. Likewise, other merchant suppliers could duplicate the startup’s approach or split the difference, adding transformer-specific hardware to otherwise flexible designs. If Etched executes well, however, it will have a head start.
Etched Has Adult Supervision
A few guys in a dorm room or garage can’t independently build a monster chip, but they can do the modeling that justifies their thesis and demonstrates an architecture’s benefits. Having been taken under Peter Thiel’s wing and now well capitalized, Etched is no longer just a few young guys; management includes experienced executives. A startup’s ability to execute is always an open question, but given investor oversight and seasoned managers, the founders’ youth and inexperience shouldn’t be a deciding factor.
Bottom Line
Etched is betting that the market will value a transformer-specific NPU over more flexible alternatives. The company can show at a high level the feasibility of big performance gains. Its approach is risky, but this risk is exactly what has kept others from blazing the same trail. At face value, Sohu certainly isn’t a bad idea. It could shake up the data-center market, unlike the numerous other Nvidia alternatives that have emerged.