SiFive has extended RISC-V matrix processing with its new Intelligence XM series CPUs, adding a matrix engine to its vector-enabled cores. Targeting mainly AI workloads, the company positions the augmented CPU to be used singly for low-cost, low-performance designs or replicated hundreds of times for high-throughput data-center accelerators. A further goal is to standardize a basic matrix-computation unit, providing a degree of compatibility among math accelerators and enabling chip designers to focus on SoC-level differentiation.
The RISC-V vector extensions (RVV) can already improve matrix-math throughput compared with the base scalar instructions by operating on data a row or column at a time and performing strided loads and stores. Further extensions map matrices onto the vector register file and define new matrix operations. However, additional scaling requires a dedicated engine.
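The row-at-a-time pattern that vector units accelerate can be sketched in plain Python (an illustration of the dataflow, not RISC-V code; function names are ours). The scalar baseline performs one multiply-accumulate per step, while the row-wise version updates an entire output row at once, which is the loop structure a vector fused multiply-add executes in a single instruction:

```python
def matmul_scalar(A, B):
    """Baseline: one multiply-accumulate per step, as base scalar ISAs do."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]  # one scalar MAC per iteration
    return C

def matmul_rowwise(A, B):
    """Vector-style: broadcast a scalar and update a whole output row per step."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]   # scalar broadcast into a vector register
            row = B[p]    # one contiguous (or strided) row load
            for j in range(m):        # in hardware: a single vector FMA
                C[i][j] += a * row[j]
    return C
```

Both functions produce identical results; the difference is that the inner loop of `matmul_rowwise` maps directly onto one wide vector operation.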
SiFive’s approach tiles small matrix units to build a bigger engine. To improve the engine’s utilization, multiple CPUs (four in SiFive’s example configuration) share it. Note, however, that the matrix engine handles only outer products. Functions such as activations rely on the CPUs’ vector units, which operate on results transferred from the engine’s accumulators. Likewise, to set up computations, the vector units transfer data to the engine.
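The outer-product decomposition such an engine computes can be sketched as follows (a minimal Python illustration of the math, not SiFive’s implementation). A matrix product C = A×B equals the sum, over the shared dimension, of outer products of A’s columns with B’s rows; the engine accumulates each outer product into its accumulator tile, and everything else (activations, data movement) is left to the vector units:

```python
def matmul_outer_products(A, B):
    """Compute A x B as a sum of rank-1 outer products,
    the decomposition an outer-product matrix engine accumulates."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0.0] * m for _ in range(n)]  # models the engine's accumulator tile
    for p in range(k):
        col = [A[i][p] for i in range(n)]  # column p of A
        row = B[p]                         # row p of B
        # One outer-product-accumulate step: acc += col (outer) row.
        # In hardware this is a single engine operation over the whole tile.
        for i in range(n):
            for j in range(m):
                acc[i][j] += col[i] * row[j]
    return acc
```

This structure explains the division of labor: the engine repeats one simple, highly parallel operation, while anything nonlinear must round-trip through the CPUs’ vector units.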
Involving the CPU cores and vector units should improve flexibility compared with a self-contained AI accelerator, and many designs will already have RVV-capable cores. To reduce area, however, the lowest-cost microcontroller-class designs may instead favor CPUs without vector extensions, attaching a matrix engine to perform all neural-network processing.
Bottom Line
Numerous companies already employ RISC-V CPUs in their NPUs, including the Tenstorrent Wormhole, Meta MTIA, and Untether SpeedAI. All use the CPUs differently, leaving it unclear how much is to be gained from standardizing matrix operations. Moreover, SiFive’s implementation isn’t an official RISC-V extension; the community has yet to settle even the contours of such an extension, much less implementation and opcode specifics. In the meantime, however, customers requiring fast matrix processing will find that SiFive offers a scalable, area-efficient design.