Late last year, Marvell began sampling its first Structera products. The company designates Structera A models as near-memory accelerators and Structera X as memory-expansion controllers. In both cases, they bridge CXL interfaces to DRAM to raise memory bandwidth or capacity per server-processor core. Data-center workloads, such as AI recommendation models and in-memory databases, benefit from these improvements. Moreover, Structera X enables cloud service providers to populate expansion modules with DIMMs recycled from decommissioned systems.
Call It a DPU?
We’re inclined to label Structera A a processor because of its 16 Neoverse-V2 cores. The V2, based on the Cortex-X3 and succeeded by the Neoverse V3, targets infrastructure applications such as servers. Beyond Marvell’s designs, the core has found use in Nvidia’s Grace and Amazon’s AWS Graviton processors. As a unit targeting data processing, Structera A wears the DPU moniker well.
Compared with a server processor, Structera A has much more DRAM bandwidth per CPU: one DRAM interface for every four cores versus one per 10 in x86 chips such as AMD Turin (Epyc) and Intel Granite Rapids (Xeon). Customers can execute bandwidth-constrained workloads (or parts thereof) on Structera to speed up processing.
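The per-core bandwidth advantage can be sketched with back-of-the-envelope arithmetic. The channel counts, core counts, and per-channel bandwidth below are illustrative assumptions, not Marvell, AMD, or Intel specifications; only the interface-to-core ratios come from the comparison above.

```python
# Illustrative comparison of DRAM bandwidth per CPU core.
# Assumed figure: one DDR5-6400 channel delivers about 51.2 GB/s.
DDR5_CHANNEL_GBPS = 51.2

def bandwidth_per_core(channels: int, cores: int) -> float:
    """Return GB/s of DRAM bandwidth available per CPU core."""
    return channels * DDR5_CHANNEL_GBPS / cores

# Structera A: one DRAM interface for every four cores.
structera = bandwidth_per_core(channels=4, cores=16)
# Hypothetical x86 server part: roughly one interface per ten cores.
x86 = bandwidth_per_core(channels=12, cores=120)

print(f"Structera A: {structera:.1f} GB/s per core")
print(f"x86 server:  {x86:.1f} GB/s per core")
print(f"advantage:   {structera / x86:.1f}x")
```

Under these assumptions, the near-memory chip offers roughly 2.5x the per-core bandwidth, which is the headroom a bandwidth-bound kernel gains by moving off the host.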
In addition to the CPUs and interfaces, Structera A includes compression and encryption engines that operate on data passing through the chip. Compression enables more data to fit in memory, but its effectiveness varies: sparse and structured data shrinks far more than data with high entropy. Encryption improves security in multitenant scenarios because DRAM isn’t necessarily zeroed when a process releases it.
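The entropy dependence is easy to demonstrate. The sketch below uses Python’s zlib purely as a stand-in; it says nothing about the algorithm Structera’s engine actually implements, and the sample data is invented for illustration.

```python
import os
import zlib

# Repetitive, structured records compress well; random bytes do not.
structured = b'{"user_id": 12345, "score": 0.97}\n' * 4096
random_bytes = os.urandom(len(structured))  # maximum-entropy data

for name, data in [("structured", structured), ("random", random_bytes)]:
    ratio = len(data) / len(zlib.compress(data))
    print(f"{name}: {ratio:.1f}x compression")
```

The structured records shrink by orders of magnitude while the random buffer barely compresses at all, which is why a fixed capacity gain from compression can’t be promised across workloads.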
Joint Expansion
Structera X omits the 16 Neoverse cores but otherwise provides similar hardware features, offering more capabilities than a basic CXL-based DRAM controller. Whereas Structera A is available in DDR4 and DDR5 models, the X comes only in a DDR4 version. Importantly, Structera X allows its 16-lane CXL interface to divide into two ×8 ports, enabling two host processors to share a memory pool.
Because transactions funnel through the CXL port and a lower-throughput DRAM interface, host processors will find memory accessed via Structera X slower than directly attached DRAM. For workloads where the additional capacity offsets the worse throughput and latency, software can employ the mitigations developed for nonuniform-memory-access (NUMA) systems, such as those employing the now-defunct Optane memory. This will be an adoption barrier for many customers, but others will cross it to obtain Structera X’s benefits.
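One common NUMA-style mitigation is hot-page promotion: keep frequently accessed pages in fast local DRAM and let cold pages live behind the CXL link. The sketch below is a toy policy written for illustration; the class, capacity, and threshold are all invented, and real systems implement this in the OS (e.g., Linux memory tiering) rather than in application code.

```python
from collections import Counter

FAST_TIER_PAGES = 4  # assumed capacity of local DRAM, in pages

class TieredPlacer:
    """Toy two-tier placement policy: hottest pages stay local."""

    def __init__(self, fast_capacity: int = FAST_TIER_PAGES):
        self.fast_capacity = fast_capacity
        self.heat = Counter()   # access count per page
        self.fast_tier = set()  # pages currently in local DRAM

    def access(self, page: int) -> None:
        self.heat[page] += 1

    def rebalance(self) -> None:
        # Keep the N hottest pages local; demote the rest behind CXL.
        self.fast_tier = {p for p, _ in self.heat.most_common(self.fast_capacity)}

    def tier_of(self, page: int) -> str:
        return "local-DRAM" if page in self.fast_tier else "CXL-DRAM"

placer = TieredPlacer()
for page in [1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6]:
    placer.access(page)
placer.rebalance()
print(placer.tier_of(5), placer.tier_of(6))  # hot page local, cold page remote
```

The point is that the slow tier’s latency is paid only on the cold tail of accesses, which is exactly the bargain Optane-era NUMA software struck.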
Bottom Line
Hype around CXL has been swamped by AI hype, yielding an opportunity to take a level-headed look at the interface. The Structera products demonstrate CXL’s ability to raise CPU-DRAM bandwidth and to expand and pool memory, as well as the implementation barriers to doing so: explicitly moving workloads off the host processor and accepting NUMA. Hyperscaler and HPC customers have the motive and the means to employ CXL, but rank-and-file IT managers may not. A further issue is whether a CXL-chip vendor can capture value for something that could reduce to mere interface conversion. By integrating server-class CPUs plus compression and encryption offloads, Marvell’s Structera offers differentiated features for sophisticated customers.