Like a bolt from the blue, a new GPU company has emerged. Even more unexpectedly, it’s not targeting AI. Startup Bolt Graphics has unveiled its Zeus GPU. Targeting high-performance workloads, it emphasizes 64-bit operations, path/ray tracing, interchip communication, and memory capacity.
The company plans to ship Zeus dev kits by the end of the year and to offer early access to a Zeus-based server in 2026. A chiplet-based design, Zeus will be available with one, two, or four computing dice. Each die integrates DDR5 and LPDDR5X DRAM interfaces and display interfaces. A separate I/O die provides host-GPU and GPU-GPU interfaces.
Rendering
For rendering, Bolt targets animators, such as those making game cutscenes and feature-length films, and others creating 3D models, such as of buildings and products. The company aims to resolve a longstanding tradeoff: CPU-based rendering delivers the requisite quality but is slow (often hours per frame), whereas GPU-based rendering can be real time but only at reduced quality. Zeus aims to give creators a full-quality working view. Bolt estimates that the biggest Zeus configuration renders path-traced content 10× faster than the newest top-end desktop Nvidia GPU.
Called Glowstick, Bolt’s path-tracing software is nearing beta release. It includes access to commercial texture libraries and supports standards such as OpenUSD (Pixar-developed scene-description tools), OpenImageIO (image-conversion library), and Open Shading Language (originally created by Sony Pictures). It also implements a programmable shader and a post-render pipeline.
High-Performance Computing
To speed up HPC workloads, Zeus has more FP64 units and a beefier memory hierarchy than other desktop and data-center GPUs. Bolt estimates that the biggest Zeus configuration will deliver 300× the throughput of the Nvidia Blackwell B200 when simulating electromagnetic (EM) waves. Zeus should therefore be able to simulate larger and finer-grained designs. Bolt bundles EM software with the GPU.
The Bolt Zeus Design
Zeus’s memory hierarchy includes 32 KB of cache per FP32 core. Peak external-memory bandwidth per FP32 core is similar to that of a high-end desktop GPU and about 75% of the Nvidia B200’s. Off-chip memory capacity, however, is huge: the basic single-chiplet Zeus supports 160 GB, and the four-chiplet model handles up to 2.25 TB of local memory. To reduce costs, Zeus eschews HBM.
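A quick back-of-the-envelope check on the quoted capacities (assuming decimal units, as memory marketing typically uses) shows that the quad-chiplet model must carry far more DRAM per chiplet than the basic board, presumably by populating more or denser DRAM on each die's interfaces:

```python
# Capacity figures quoted above (decimal units assumed).
single_chiplet_gb = 160
quad_model_gb = 2.25 * 1000            # 2.25 TB for the four-chiplet model

per_chiplet_in_quad = quad_model_gb / 4
print(per_chiplet_in_quad)             # 562.5 GB per chiplet
print(per_chiplet_in_quad / single_chiplet_gb)  # ~3.5x the basic board's loadout
```

In other words, capacity scales superlinearly with chiplet count across the product line, not merely by replication of the basic configuration.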
As is conventional for GPUs, a ×16 PCIe port attaches Zeus to a host processor, but Zeus adds a second such port to connect to another Zeus board or to storage. Also unusual is Zeus’s 800 Gbps Ethernet port (limited to 400 GbE in PCIe-card designs), which enables GPU-to-GPU communication without traversing a host processor. In fact, a host isn’t needed at all: Bolt is developing a 2U server with four quad-chiplet GPUs in which Ethernet links the GPU sockets on the mainboard and connects multiple GPU servers, an approach reminiscent of Intel Gaudi. Alternatively, a hyperscaler operating a customized Ethernet network could swap Zeus’s I/O chiplet for a proprietary one.
Inside the compute chiplet, Bolt employs an architecture similar to that of a few NPUs, such as the Tenstorrent Wormhole. A mesh connects a grid of computing units, each containing a RISC-V CPU, a vector engine, and a cache. Another unusual aspect of Zeus is that each unit also has accelerators. Bolt withholds their details but indicates they handle operations such as path tracing for graphics rendering and math functions for HPC applications. It is one of these accelerators that gives Zeus its 300× performance advantage over Blackwell in EM simulation.
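As a rough illustration of this organization (the mesh dimensions and names below are invented for the sketch; Bolt hasn't published the real parameters), the compute chiplet can be modeled as a grid of units, each pairing a RISC-V CPU with a vector engine, a cache, and fixed-function accelerators, with nearest-neighbor mesh links:

```python
from dataclasses import dataclass

@dataclass
class ComputeUnit:
    """One mesh node: RISC-V CPU + vector engine + cache + accelerators.
    Field values are illustrative, not Bolt's actual specifications."""
    row: int
    col: int
    cache_kb: int = 32  # the article cites 32 KB of cache per FP32 core
    accelerators: tuple = ("path_tracing", "hpc_math")  # roles per the article

def build_mesh(rows, cols):
    """Return a grid of compute units plus nearest-neighbor mesh links."""
    units = {(r, c): ComputeUnit(r, c) for r in range(rows) for c in range(cols)}
    links = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                links.append(((r, c), (r, c + 1)))  # horizontal link
            if r + 1 < rows:
                links.append(((r, c), (r + 1, c)))  # vertical link
    return units, links

units, links = build_mesh(4, 4)  # a hypothetical 4x4 grid: 16 units, 24 links
```

The mesh topology matters because neighboring units exchange data directly rather than through a shared crossbar, which is how similar NPU designs scale their on-chip bandwidth.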
Bottom Line
Any new entrant challenging big, entrenched competition will face skepticism. Nvidia dominates the accelerated systems on the Top500 list, and the fastest systems use the AMD MI300. For any processor (XPU), software enablement must be available, and developers must overcome their reluctance to switch to a new environment. At first, Bolt targets customers programming general-purpose x86 (or Arm) microprocessors and seeking a speedup; they’ll use the LLVM compiler to retarget their code and then hand-tune the results for greater performance.
This approach initially limits Bolt’s market to customers that write their own code, which is more common for rendering and HPC than for other workloads but isn’t universal. Reporting orders for tens of thousands of servers, the startup is off to a good start but must still complete its silicon design. In addition to greater performance, the company promises to reduce customers’ costs, delivering more throughput per capex and opex dollar and obviating separate high-speed Ethernet cards (and, optionally, host processors). Although Bolt targets a niche, it’s one that’s underserved as established GPU suppliers focus on AI.