Tsavorite Scalable Intelligence revealed a few details of its AI accelerator late last year and disclosed that it had booked $100 million in orders. A further bold claim is Cuda compatibility. Taking advantage of its modular architecture, the company targets diverse applications from the edge to the data center.
The company’s founders and team are seasoned Silicon Valley executives. Several most recently hail from Tanzanite, which Marvell acquired and whose technology it commercialized as the Structera CXL processors. Tsavorite plans to ship production silicon and systems this year. In case you’re wondering, tsavorite (and tanzanite) are gems found in Tanzania, where company founder and CEO Shalesh Thusoo grew up.
What We Know
- Chiplets—Tsavorite is developing two chiplets, OmniFlex for computing and SkyFlex for both computing and memory interfacing.
- Hardware scalability—Chip products combine the two chiplets in varying quantities. The smallest standard implementation has only a SkyFlex pair.
- Fabric—Thusoo calls the Tsavorite fabric the heart of the company’s innovation. It supports linking chiplets to form chips and chips to form nodes. Ethernet provides scale-out connectivity.
- Memory hierarchy—Tsavorite calls its chips OPUs. The architecture implements a flat, unified address space, supporting coherency among an undisclosed number of OPUs.
- This approach should simplify programming.
- It should also improve compute-unit utilization by streamlining data sharing.
- Tsavorite believes that its architecture can scale to support large language models (LLMs) with context lengths of 100 million tokens.
- Arm cores—Whereas many NPUs employ RISC-V cores, Tsavorite uses Arm CPUs.
- This could indicate an aim to produce a unified processor (see also Tachyum) for agentic AI, which entails NPU-based AI models calling CPU-based software.
- The company describes NPU and CPU functions as coprocessors, not accelerators, suggesting a more cooperative arrangement between these units than a typical NPU/GPU and host.
- Superlinear scaling—Tsavorite claims an eight-OPU implementation is 8.2× faster than a single OPU. The eight-OPU configuration’s larger resource pool, the fabric, and the memory hierarchy improve the statistical multiplexing of resources, raising utilization.
- FPGA—The company has supplied FPGA prototypes to customers, helping instill the customer confidence behind the $100 million in bookings.
- Software—Tsavorite bills itself as a compiler-first company. It has focused on supporting PyTorch, Triton, and Cuda.
- Cuda compatibility may be a misnomer. The OPU doesn’t run Cuda directly. Instead, Tsavorite supplies tools and libraries/APIs to automate ahead-of-time code conversion.
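A unified address space pays off in programming simplicity: a consumer reads a producer’s data in place rather than staging an explicit copy between private memories. A conceptual sketch (not Tsavorite’s API; the dictionaries stand in for device memories purely for illustration):

```python
# Partitioned model: each compute unit owns a private buffer, so sharing
# a tensor requires an explicit transfer before the consumer can use it.
dev0_mem = {"activations": [1.0, 2.0, 3.0]}
dev1_mem = {}
dev1_mem["activations"] = list(dev0_mem["activations"])  # explicit copy
result_partitioned = sum(dev1_mem["activations"])

# Unified model: all units address one coherent pool; the consumer reads
# the producer's data directly, with hardware maintaining coherency.
unified_mem = {"activations": [1.0, 2.0, 3.0]}
result_unified = sum(unified_mem["activations"])  # no copy needed

assert result_partitioned == result_unified == 6.0
```

The same property aids utilization: compute units spend less time waiting on staged transfers and more time consuming shared data.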
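The superlinear claim reduces to per-OPU efficiency exceeding 100%. The 8.2× figure is Tsavorite’s; the baseline utilization below is a hypothetical number chosen only to illustrate the arithmetic:

```python
# Tsavorite's claim: 8 OPUs deliver 8.2x the throughput of one OPU.
opus = 8
speedup = 8.2

# Per-OPU scaling efficiency; a value above 1.0 means superlinear scaling.
efficiency = speedup / opus
print(f"{efficiency:.4f}")  # 1.0250, i.e., 102.5% per-OPU efficiency

# Equivalently: if a single OPU achieved, say, 60% utilization
# (hypothetical), pooling resources would lift effective utilization.
baseline_util = 0.60
pooled_util = baseline_util * efficiency
print(f"{pooled_util:.3f}")  # 0.615
```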
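Ahead-of-time conversion, as described, is offline translation: Cuda source is transformed into code for the OPU toolchain rather than executed through a Cuda driver. As a purely hypothetical illustration of the kind of transformation such a tool performs (Tsavorite’s actual tools, targets, and APIs are undisclosed), here is a standard SAXPY kernel and a NumPy equivalent a converter might emit:

```python
import numpy as np

# Original Cuda SAXPY kernel, shown for reference:
#   __global__ void saxpy(int n, float a, float *x, float *y) {
#       int i = blockIdx.x * blockDim.x + threadIdx.x;
#       if (i < n) y[i] = a * x[i] + y[i];
#   }

def saxpy_converted(a: float, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical converter output: the per-thread kernel body becomes a
    whole-array operation the target runtime can schedule across its
    compute units."""
    return a * x + y

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
y = np.array([4.0, 5.0, 6.0], dtype=np.float32)
out = saxpy_converted(2.0, x, y)  # [6.0, 9.0, 12.0]
```

The essential point for customers is that the conversion happens at build time, so deployed binaries never depend on Nvidia’s runtime.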
Bottom Line
Having made extraordinary claims, Tsavorite is on the hook to supply extraordinary evidence. NPU startups commonly tout better performance and efficiency than Nvidia’s GPUs, but schedule slips and incomplete software ultimately hold them back. We await Tsavorite’s technology demonstrations; meanwhile, the company has convinced a few customers of its viability (if not also its superiority) and persuaded them to sign on the line that’s dotted, which is more than some NPU startups have accomplished.
For the Byrne-Wheeler Report’s discussion of Tsavorite, click on the image at the top of this page.
