
Tenstorrent Licenses Ascalon RISC-V CPU and Tensix Neo NPU IP


For a few years, Tenstorrent has licensed the CPU and AI-accelerator designs it created for its own chips to select partners. Now, the company is formalizing this business, broadly licensing its RISC-V CPU and AI-accelerator (NPU) designs as intellectual property (IP). The top-end Ascalon X CPU delivers per-cycle instruction throughput (IPC) approaching that of an Arm Cortex-X (or Arm Lumex Ultra) core. The Tensix Neo is an NPU core that customers can configure and replicate to address different performance requirements. A key selling point is that Tenstorrent employs the same IP in its chips, helping to ensure the designs deliver on their promises.

Although focused on AI chips and systems, Tenstorrent announced in 2023 that it would supply technology to consumer-electronics giant LG and automotive-semiconductor startup BOS. With the business now formalized, Tenstorrent offers IP supported by a reference chip and design resources. The Ascalon family comprises out-of-order CPUs that differ in performance, power, and area; peripheral IP for interrupts, interfaces, debugging, and power management supplements them. A reference chip, Atlantis, is an eight-core Ascalon X processor complete with I/O interfaces, a GPU, and a video decoder.

Ascalon

The Ascalon family complies with the RVA23 profile, which has emerged as the de facto standard for high-performance RISC-V CPUs. In addition to the X (extreme-performance) model, Tenstorrent offers the Ascalon H (high-performance), S (so-so-performance), and U (ultra-low-power/area) models. The company compares the H and S with the Arm Cortex-A720 (which is similar to the newer Lumex Pro) and Arm Cortex-A78 CPUs. We expect the Ascalon U to be similar to the Arm Cortex-A72.

The H, S, and U models derive from the X, leveraging Tenstorrent’s modular design methodology. Our coverage focuses on Ascalon X, shown in Figure 1. Like competing high-performance CPUs, it’s a wide design, integrating eight decoders. Execution resources include six integer ALUs (two of which also handle branches), three load/store units, two floating-point data paths, and two 256-bit vector data paths.

Figure 1. Tenstorrent Ascalon X microarchitecture. (Source: Tenstorrent.)

By comparison, the Arm-compatible Qualcomm Oryon-L has an additional decoder, the same number of integer units (although three can handle branches), two load/store plus two load units, and four FP/vector units. Because those FP/vector units are each 128 bits wide, total vector throughput is similar to that of two 256-bit units. The RVA23-compliant Condor Cuzco is eight-wide and has additional execution resources: eight integer, four branch, four load/store, and four 256-bit vector units. However, it can dispatch only eight instructions per cycle. We expect Ascalon X and Cuzco to achieve similar performance.
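The vector-width comparison above is simple arithmetic, sketched below in Python. The unit counts and widths come from the text; the class and function names are illustrative only, and the result is peak aggregate data-path width, not measured throughput. It ignores dispatch limits, latencies, and memory bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreResources:
    """Vector resources of a CPU core, as described in the text (illustrative model)."""
    name: str
    vector_units: int
    vector_width_bits: int

    @property
    def vector_bits_per_cycle(self) -> int:
        # Peak aggregate width: number of units times the width of each unit.
        return self.vector_units * self.vector_width_bits


# Unit counts and widths as stated in the article.
CORES = [
    CoreResources("Ascalon X", vector_units=2, vector_width_bits=256),
    CoreResources("Oryon-L", vector_units=4, vector_width_bits=128),
    CoreResources("Cuzco", vector_units=4, vector_width_bits=256),
]


def vector_bits_table(cores):
    """Map each core name to its peak vector bits processed per cycle."""
    return {core.name: core.vector_bits_per_cycle for core in cores}


if __name__ == "__main__":
    for name, bits in vector_bits_table(CORES).items():
        print(f"{name}: {bits} vector bits per cycle")
```

By this measure, Ascalon X and Oryon-L match at 512 bits per cycle, whereas Cuzco doubles that on paper; its eight-instruction dispatch limit is one reason the extra units may not translate directly into higher performance.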

Design Resources

To complement the Ascalon CPUs, Tenstorrent integrates them with the blocks shown in Figure 2 and an on-chip interconnect, providing customers with a complete computing subsystem. The company also offers various simulation tools, including a SystemC model, Synopsys Imperas tools, the Synopsys ZeBu Cloud emulator, and testing scripts and libraries.

Figure 2. Tenstorrent Ascalon X computing subsystem. (Source: Tenstorrent.)

The Atlantis SoC is a complete processor fabricated in a 12 nm TSMC process. Intended as the basis of a development kit, the chip integrates an eight-core Ascalon X subsystem, common peripherals (e.g., timers), low-speed interfaces (e.g., SPI), high-speed interfaces (e.g., Ethernet, USB, and PCIe), DRAM controllers, a 4K video decoder, and an Imagination GPU.

Roadmap

Tenstorrent is developing an automotive-targeted version of Ascalon X and an associated subsystem, Alexandria. The design will support ASIL-B and ASIL-D safety qualifications, including lockstep operation for the CPUs, power management, and interrupt controller.

Beyond Ascalon X, the company is developing a next-generation CPU called Callandor. Targeting a 2027 release, the design aims to almost double per-cycle throughput—a stunning gain if achieved. In the meantime, Tenstorrent may release refined Ascalon designs that boost IPC by 10% to 20%.

Tensix Neo NPU

Beyond CPUs, Tenstorrent has also begun licensing AI-accelerator (NPU) cores. The Tensix Neo is an NPU building block comprising multiple small RISC-V cores, vector and matrix engines, and other functions. It’s an evolution of the block employed in the Tenstorrent Wormhole and can be configured for different throughput, specific data types including FP4, and memory size. A simulator framework helps customers map their workloads to Tensix Neo and configure their models and hardware to meet their requirements.

Bottom Line

When we covered the Tenstorrent Grayskull, we labeled the company unfocused, noting it sold chips and boards and dabbled in IP licensing. It’s no longer dabbling. The company offers a family of CPUs and an NPU core to the broad market, not just select partners. Ascalon X is among the fastest RISC-V cores available, which Tenstorrent supports with simulators, a reference implementation, and a subsystem including peripheral IP. The market for high-end CPUs is limited, more so for RISC-V designs. However, Tenstorrent’s earlier licensing arrangements indicate that customers are out there. If demand takes off, the company has a comprehensive offering.

