
Speedy Nuclei CPU Supports Apps and Real-Time Processing


At the RISC-V Summit North America, Nuclei disclosed its new licensable UX1030H CPU. A midrange configuration of the company’s high-end 1000-series cores, the design complies with RISC-V’s RVA23 profile and supports real-time as well as application processing. Nuclei targets the UX1030H at performance-sensitive systems, such as communications, computing, and storage infrastructure.

Based in Shanghai, Nuclei System Technology offers general-purpose CPU product lines ranging from low-cost controller IP to the high-performance 1000 series. Additional families adapt these cores for systems requiring security, safety, or AI features. Like other IP vendors, Nuclei offers several configurations of the common 1000-series microarchitecture to address different power, performance, and area (PPA) requirements. The UX1060 is the largest configuration and integrates six decoders; the UX1020, UX1030, and UX1040 have two, three, and four decoders, respectively.

The company, founded in 2018, is headed by Jianyang Peng, who previously held CPU-engineering leadership roles at Synopsys and Marvell, a background similar to that of the company's founder, Bob Hu. In August, VeriSilicon declared its intent to acquire the startup. Nuclei has more than 300 customers, which have collectively shipped more than 100 million chips containing its CPUs.

Nuclei UX1030H Core Design

The UX1030H complies with the RVA23 RISC-V profile. Having emerged as the de facto standard for application processing, the profile requires hypervisor and vector extensions. Nuclei supplements the design with peripherals and caches, including features for real-time computing and SoC integration.

Scalar Core

The integer core is a three-way out-of-order design with a 12-stage pipeline, dispatching up to six scalar and two vector instructions per cycle. Nuclei targets 1.6 GHz operation in a 22 nm process. The company rates the UX1030H at 5.35 DMIPS and 8.5 CoreMarks per megahertz. Especially considering the comparatively high CoreMark score, we believe the Dhrystone score is conservative, and we expect SPECint2006 per gigahertz to be in the 7–8 range. Scalar performance should be similar to that of the Arm Cortex-A72 or Lumex C1-Nano. Vector performance could be greater, depending on the configuration.
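To put the per-megahertz ratings in absolute terms, the short Python sketch below scales them to Nuclei's 1.6 GHz target clock. It assumes scores scale linearly with frequency, an idealization; real systems usually fall somewhat short because memory latency doesn't scale with the core clock.

```python
# Convert Nuclei's per-MHz benchmark ratings into absolute scores.
# Assumption: scores scale linearly with clock frequency (an idealization).

DMIPS_PER_MHZ = 5.35     # Nuclei's Dhrystone rating for the UX1030H
COREMARK_PER_MHZ = 8.5   # Nuclei's CoreMark rating for the UX1030H


def absolute_score(per_mhz: float, clock_ghz: float) -> float:
    """Return the absolute score at clock_ghz for a per-MHz rating."""
    return per_mhz * clock_ghz * 1000.0  # 1 GHz = 1,000 MHz


if __name__ == "__main__":
    clock_ghz = 1.6  # Nuclei's target clock in a 22 nm process
    print(f"DMIPS:    {absolute_score(DMIPS_PER_MHZ, clock_ghz):,.0f}")
    print(f"CoreMark: {absolute_score(COREMARK_PER_MHZ, clock_ghz):,.0f}")
```

At 1.6 GHz, the ratings work out to roughly 8,560 DMIPS and 13,600 CoreMark.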

Vector Unit

Vector processing in RISC-V is more flexible than in SIMD architectures such as Arm Neon and x86 AVX. The vector register length (VLEN) can be a multiple of the datapath width (DLEN), and Nuclei allows licensees to configure both dimensions to achieve their PPA targets. A popular configuration sets VLEN to twice DLEN, so each vector instruction requires two passes through an execution unit. However, implementations such as Nuclei's allow chaining consecutive operations: after one part of a vector passes through an execution unit, it can proceed to another unit before the first unit has finished the whole vector.
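The benefit of chaining can be seen in a simplified timing model. The Python sketch below is illustrative, not Nuclei's actual pipeline: it assumes each pass through a DLEN-wide unit takes one cycle, each dependent operation executes in a different unit, and no other latency applies.

```python
# Simplified timing model of multipass vector execution with and without
# chaining. Assumptions: one cycle per pass, each dependent op runs in a
# different execution unit, and no other pipeline latency.


def passes_per_instruction(vlen_bits: int, dlen_bits: int) -> int:
    """Number of trips through a DLEN-wide datapath for one full vector."""
    if vlen_bits % dlen_bits:
        raise ValueError("VLEN must be a multiple of DLEN")
    return vlen_bits // dlen_bits


def cycles_for_dependent_ops(n_ops: int, passes: int, chaining: bool) -> int:
    """Cycles to complete a chain of n_ops dependent vector operations."""
    if n_ops <= 0:
        return 0
    if chaining:
        # Each op starts on the first chunk one cycle after its predecessor,
        # so the chain finishes n_ops - 1 cycles after the first op does.
        return passes + (n_ops - 1)
    # Without chaining, each op waits until its predecessor finishes every pass.
    return n_ops * passes


if __name__ == "__main__":
    passes = passes_per_instruction(vlen_bits=256, dlen_bits=128)
    for chaining in (False, True):
        cycles = cycles_for_dependent_ops(3, passes, chaining)
        print(f"chaining={chaining}: {cycles} cycles")
```

With VLEN at twice DLEN, three dependent operations take six cycles without chaining but four with it in this model; the gap widens as VLEN/DLEN or the chain length grows.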

The UX1030H can dispatch two vector instructions per cycle to three execution units. One performs load/store operations, another only does multiplication, and the third handles other operations. Supported VLEN values are 128 and 256 bits. Like the scalar core, the vector-processing unit (VPU) can execute instructions out of order.
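As a rough illustration of why the instruction mix matters, this toy Python model counts the cycles needed to dispatch a sequence of vector instructions to the three units. The unit labels (LSU, MUL, ALU) and the rule that each unit accepts at most one instruction per cycle are our assumptions, and the model dispatches in program order with no data dependencies, whereas the real VPU executes out of order.

```python
# Toy model of dual vector dispatch to three execution units.
# Assumptions (hypothetical): at most `width` instructions per cycle, at most
# one instruction per unit per cycle, in-order dispatch, no dependencies.

# Load/store ops go to the load/store unit, multiplies to the multiply unit;
# everything else goes to the general-purpose unit.
UNIT_FOR_CLASS = {"load": "LSU", "store": "LSU", "mul": "MUL"}


def dispatch_cycles(instr_classes: list[str], width: int = 2) -> int:
    """Count cycles to dispatch instr_classes under the assumptions above."""
    cycles = 0
    i = 0
    while i < len(instr_classes):
        used_units: set[str] = set()
        issued = 0
        while i < len(instr_classes) and issued < width:
            unit = UNIT_FOR_CLASS.get(instr_classes[i], "ALU")
            if unit in used_units:
                break  # structural conflict: that unit is busy this cycle
            used_units.add(unit)
            issued += 1
            i += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(dispatch_cycles(["load", "add", "mul", "add"]))
    print(dispatch_cycles(["add", "add", "add", "add"]))
```

In this model, a stream that alternates memory or multiply operations with other arithmetic sustains two instructions per cycle, whereas a run of non-multiply arithmetic serializes on the single general-purpose unit.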

Memory Hierarchy and System IP

The UX1030H offers numerous memory-related options. The first-level caches can be as large as 64 KB, and the company offers a cluster (L2) cache of up to 4 MB that as many as 16 cores can share. At the next level, a system cache can sit on the cache-coherent interconnect. Software can reserve part of the cluster cache as a local memory (CLM) accessible by the whole SoC. At the same time, parts of the cache can be dynamically configured for access only by specific cores. For real-time processing, licensees can provide some cores in a cluster with tightly coupled local memories (ILM and DLM).
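Nuclei hasn't disclosed how software carves up the cluster cache. Way-based partitioning is one common approach, so the Python sketch below assumes a hypothetical 16-way, 4 MB cache in which the low-numbered ways are reserved as CLM and each core receives a bitmask of the ways it may allocate into. The associativity, mask scheme, and function name are illustrative only.

```python
# Hypothetical way-based partitioning of a 4 MB cluster cache.
# NUM_WAYS and the mask scheme are assumptions, not Nuclei's design.

CACHE_BYTES = 4 * 1024 * 1024  # 4 MB, the largest cluster cache cited
NUM_WAYS = 16                  # hypothetical associativity
WAY_BYTES = CACHE_BYTES // NUM_WAYS


def partition(clm_ways: int, core_way_masks: dict[int, int]) -> tuple[int, dict[int, int]]:
    """Return the CLM size and each core's allocatable capacity, in bytes.

    clm_ways: number of low-numbered ways reserved as local memory (CLM).
    core_way_masks: core ID -> bitmask of ways that core may allocate into.
    Core masks may overlap, in which case those cores share the ways.
    """
    if not 0 <= clm_ways <= NUM_WAYS:
        raise ValueError("clm_ways out of range")
    clm_mask = (1 << clm_ways) - 1
    capacity = {}
    for core, mask in core_way_masks.items():
        if mask >> NUM_WAYS:
            raise ValueError(f"core {core}: mask selects nonexistent ways")
        if mask & clm_mask:
            raise ValueError(f"core {core}: mask overlaps ways reserved for CLM")
        capacity[core] = bin(mask).count("1") * WAY_BYTES
    return clm_ways * WAY_BYTES, capacity


if __name__ == "__main__":
    clm, caps = partition(4, {0: 0x00F0, 1: 0xFF00})
    print(clm, caps)
```

Reserving four ways as CLM in this hypothetical layout yields 1 MB of local memory, leaving core 0 with 1 MB of private cache and core 1 with 2 MB.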

Further supporting real-time use, these cores can employ a physical memory protection (PMP) unit instead of the memory-management unit (MMU) that a high-level operating system such as Linux requires. Nuclei also offers system-level IP, including an IOMMU to help other SoC functions share memory securely. A cluster can also have an I/O coherency port (IOCP). This enables blocks such as an NPU (which Nuclei offers) to access a cluster's caches coherently and with greater performance than by traversing the SoC interconnect.
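A PMP entry describes a region using an address register (pmpaddr) and a configuration byte (pmpcfg). The Python sketch below encodes and decodes a naturally aligned power-of-two (NAPOT) region following the RISC-V privileged specification. It ignores the implementation's entry count, address width, and granularity, which Nuclei hasn't detailed, and the helper names are our own.

```python
# Encode/decode a RISC-V PMP NAPOT region (per the privileged spec).
# Helper names are illustrative; entry count and granularity are ignored.

PMP_R, PMP_W, PMP_X = 0x01, 0x02, 0x04  # permission bits in pmpcfg
PMP_A_NAPOT = 0x3 << 3                   # address-matching field A = NAPOT
PMP_L = 0x80                             # lock bit


def napot_pmpaddr(base: int, size: int) -> int:
    """Return the pmpaddr value for a NAPOT region of `size` bytes at `base`."""
    if size < 8 or size & (size - 1):
        raise ValueError("NAPOT size must be a power of two of at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to size")
    # pmpaddr holds address bits [..:2]; trailing ones encode the size.
    return (base >> 2) | ((size >> 3) - 1)


def decode_napot(pmpaddr: int) -> tuple[int, int]:
    """Return (base, size) for a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size


def pmpcfg_byte(read: bool, write: bool, execute: bool, locked: bool = False) -> int:
    """Build a pmpcfg byte for a NAPOT entry with the given permissions."""
    if write and not read:
        raise ValueError("R=0, W=1 is a reserved combination")
    cfg = PMP_A_NAPOT
    cfg |= PMP_R if read else 0
    cfg |= PMP_W if write else 0
    cfg |= PMP_X if execute else 0
    cfg |= PMP_L if locked else 0
    return cfg


if __name__ == "__main__":
    addr = napot_pmpaddr(0x8000_0000, 64 * 1024)  # 64 KB region, read/execute
    print(hex(addr), [hex(v) for v in decode_napot(addr)],
          hex(pmpcfg_byte(True, False, True)))
```

For a 64 KB read/execute region at 0x8000_0000, the sketch produces pmpaddr 0x20001fff and pmpcfg 0x1d. Unlike MMU page tables, these entries live in a handful of CSRs, which keeps protection checks deterministic for real-time code.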

Other system IP from Nuclei includes hardware implementing the RISC-V Advanced Interrupt Architecture (AIA). In addition, each cluster contains an interrupt controller that complies with the RISC-V PLIC standard, which RVA23 requires for application processing. For real-time processing, each CPU also contains a second interrupt controller (ECLIC) compliant with the RISC-V CLIC specification, as Figure 1 shows.

Figure 1. Nuclei UX1000-series CPU cluster and peripherals. Up to 16 UX1030H instances can reside in a cluster alongside I/O, memory, and other optional blocks. (Source: Nuclei System Technology.)
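A key CLIC feature for real-time systems is hardware-managed preemption levels: a pending interrupt at a higher level can preempt a handler that is already running. The Python sketch below is a simplified model of that selection rule as we read the CLIC specification. The highest-level pending interrupt is taken only if its level exceeds the running handler's level, and priority breaks ties within a level. It omits the bit-level encoding of levels and priorities, vectoring, and privilege modes, and it breaks any remaining ties arbitrarily rather than by the specification's rule.

```python
# Simplified model of CLIC-style level-based interrupt preemption.
# Omits bit encodings, vectoring, and privilege modes (see lead-in).
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interrupt:
    irq: int       # interrupt ID
    level: int     # preemption level: higher levels can interrupt lower ones
    priority: int  # orders pending interrupts within the same level


def select_interrupt(pending: list[Interrupt], current_level: int) -> Optional[Interrupt]:
    """Return the interrupt that would preempt the running handler, or None."""
    if not pending:
        return None
    best = max(pending, key=lambda i: (i.level, i.priority))
    # Only a strictly higher level preempts; equal levels wait their turn.
    return best if best.level > current_level else None


if __name__ == "__main__":
    timer = Interrupt(irq=7, level=3, priority=0)
    uart = Interrupt(irq=9, level=1, priority=5)
    print(select_interrupt([timer, uart], current_level=1))  # timer preempts
    print(select_interrupt([uart], current_level=1))         # no preemption
```

Separating level (whether to preempt) from priority (which to service first) lets firmware nest latency-critical handlers without raising the urgency of every interrupt.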

Competition

At higher performance levels, the number of RISC-V competitors decreases. Nonetheless, Nuclei customers have numerous alternatives. From the U.S., Akeana and Tenstorrent have higher-performance designs available in scaled-down configurations. Like Nuclei, SiFive has a broad CPU portfolio, and its RVA22-compliant P400 CPU lines up with the UX1030H. In China, the UX1030H competes with the Spacemit X100 and the Alibaba XuanTie C910. It also faces the Andes AX40 series.

The Nuclei UX1030H is a good option for customers seeking to scale up from lower performance tiers or requiring an application core in addition to control CPUs. By contrast, Akeana and Tenstorrent are better for customers seeking to scale down from higher performance. Compared with Spacemit and XuanTie, Nuclei inspires more confidence thanks to its experienced engineering leadership and in-house core designs.

Andes and SiFive provide the stiffest competition, offering CPU portfolios similar to Nuclei's, including UX1030H alternatives. The UX1030H stands out for its RVA23 compliance and its features for real-time processing. Nuclei also offers additional IP, such as I/O controllers (not discussed here) and CPU peripherals, a differentiator that would grow if the VeriSilicon deal closes.

Bottom Line

Nuclei System Technology has a low profile outside China, but its wins demonstrate that it’s a major RISC-V supplier. The UX1030H is suited to real-time and application-processing designs. Its performance is below that required of the primary CPUs in a computer or smartphone, but it’s sufficient for embedded systems. Chip designers worldwide should evaluate it for their next projects.

