Earlier this year, Semidynamics released efficiency data for its all-in-one (AIO) AI-processing element, which comprises a RISC-V CPU with vector and tensor extensions. The company reports tensor-unit utilization exceeding 70% while executing the Llama2-7B model, regardless of the underlying operations’ matrix sizes. This is good performance: we’ve noted other designs achieving utilizations of up to 65%, and 50% or less is common. However, a full assessment would require evaluating a design that integrates many instances of the Semidynamics element.
Based in Spain, Semidynamics develops customizable RISC-V cores, offering a configurator to enable customers to add instructions, I/O and memory ports, and scratchpad memory. The customization starting points include the out-of-order superscalar Atrevido and the two-wide, in-order Avispado CPUs. To extend these cores, the company offers a customizable vector unit, a tensor unit, and the Semidynamics Gazillion Misses memory-access technology.
The Semidynamics AIO AI Element Building Block
The company’s AIO AI element combines these functions with a CPU. An AI workload employs the tensor unit to accelerate matrix multiplication and the vector unit for activations and matrix transpositions. A chip can tile multiple elements to increase processing throughput.
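As a minimal illustration, the scalar C sketch below shows the two phases of a typical layer: the multiply-accumulate loops are the work the tensor unit accelerates, and the elementwise activation pass is the work the vector unit handles. The function names and the scalar formulation are ours, not a Semidynamics API.

```c
#include <stddef.h>

/* Matrix multiply: the triple loop's multiply-accumulates map to the
 * tensor unit. */
static void matmul_f32(const float *a, const float *b, float *c,
                       size_t m, size_t k, size_t n) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Activation: the elementwise pass maps to the vector unit. */
static void relu_f32(float *x, size_t len) {
    for (size_t i = 0; i < len; i++)
        x[i] = x[i] < 0.0f ? 0.0f : x[i];
}
```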
Semidynamics argues that an AIO design streamlines programming compared with integrating separate general-, matrix-, and vector-processing pools, facilitates scaling and future-proofing, and simplifies memory transfers. The last advantage stems from eliminating memory transfers when offloading computations from the CPU to the vector or tensor unit. A further advantage is that it enables a shared cache to replace the DMA transactions and SRAM blocks typical of a pooled-processing design.
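The contrast can be sketched in C as follows. The DMA routine, kernel stub, and SRAM buffers are placeholders of our own, not real Semidynamics or vendor APIs; the point is only where the copies appear.

```c
#include <stddef.h>
#include <string.h>

/* Placeholders standing in for an accelerator's private SRAM and DMA. */
static float npu_sram_a[1024], npu_sram_b[1024], npu_sram_c[1024];

static void dma_copy(float *dst, const float *src, size_t n) {
    memcpy(dst, src, n * sizeof *src);  /* stands in for a DMA transaction */
}

static void matmul_kernel(const float *a, const float *b, float *c) {
    (void)a; (void)b; (void)c;          /* matrix math elided */
}

/* Pooled design: operands must be staged into the accelerator's private
 * SRAM before the kernel runs, and results copied back afterward. */
void pooled_offload(const float *a, const float *b, float *c, size_t n) {
    dma_copy(npu_sram_a, a, n);
    dma_copy(npu_sram_b, b, n);
    matmul_kernel(npu_sram_a, npu_sram_b, npu_sram_c);
    dma_copy(c, npu_sram_c, n);
}

/* AIO element: the tensor and vector units sit behind the CPU's shared
 * cache, so the same buffers are processed in place with no staging. */
void aio_offload(const float *a, const float *b, float *c) {
    matmul_kernel(a, b, c);
}
```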
The tensor unit operates on matrices stored in vector registers, an approach the RISC-V community refers to as the integrated matrix extension (IME). An advantage of the IME is that it adds little architectural CPU state. By contrast, the attached matrix extension (AME) defines new structures to process matrices and store results independently of the RISC-V vector extensions. The SiFive XM lies between the two approaches but closer to the AME, employing vector instructions to transfer some data between the CPU core and the matrix unit.
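The state difference can be pictured as the context a kernel must save on a task switch. The following C sketch assumes hypothetical register counts and widths; under an IME, matrix operands reuse the vector register file, whereas an AME adds tile registers the OS must also preserve.

```c
#include <stdint.h>

/* Hypothetical context-save footprints; register counts and widths are
 * assumptions for illustration only. */

/* IME-style design: matrix operands live in the existing vector register
 * file, so a task switch saves nothing beyond the vector state. */
struct ime_context {
    uint8_t vregs[32][256];   /* 32 vector registers, assuming VLEN = 2,048 bits */
};

/* AME-style design: independent tile registers hold matrix state, which
 * the OS must also save and restore. */
struct ame_context {
    uint8_t vregs[32][256];
    uint8_t tiles[8][1024];   /* assumed 8 tile registers of 1KB each */
};
```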
Gazillion Misses extends the nonblocking cache design that high-performance CPUs employ, enabling each core to handle 128 outstanding cache misses. In matrix processing, for example, data can be widely and irregularly scattered in memory, necessitating many cache-line fills. In general-purpose computing, by contrast, load and store requests often go to neighboring addresses; even if a CPU can track 128 or more outstanding loads, the cache needs to handle only a few dozen misses to fill all of them. An AI-processing chip may employ many memory banks, and Gazillion helps keep them all busy.
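The two access patterns look like the following C sketch; both functions are illustrative, not drawn from any vendor code.

```c
#include <stddef.h>

/* Sequential access: consecutive loads hit the same cache lines, so a
 * handful of outstanding misses keeps the pipeline fed. */
float sum_sequential(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Gather access, common in matrix and embedding workloads: each index can
 * land on a different cache line in a different memory bank, so sustaining
 * throughput requires tracking many misses in flight at once, the case
 * Gazillion Misses targets. */
float sum_gather(const float *x, const size_t *idx, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[idx[i]];
    return s;
}
```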
Bottom Line
The RISC-V architecture has been widely adopted in AI accelerators (NPUs). The Semidynamics AIO AI element is one option for customers building anything from a data-center NPU to an SoC integrating a small accelerator. However, dozens of vendors offer RISC-V cores, and numerous open-source implementations are also available to chip designers; a startup offering yet another core isn’t viable.
Semidynamics, therefore, offers additional technologies. Beyond general computing, the company addresses vector and matrix processing, implementing units for those functions along with the complementary Gazillion Misses technology and CPU customization. Semidynamics has a differentiated offering and targets a market receptive to RISC-V. Nonetheless, it faces competition for designs and must ensure customers can adapt and deploy its IP with minimal support if it is to grow into a sustainable business.