Microsoft is deploying its second-generation AI accelerator. The new Maia 200 NPU targets inference, delivering raw performance similar to that of the Nvidia Blackwell while requiring much less power. Maia employs only Ethernet, whereas other accelerators use a separate technology for the local (scale-up) interconnect.
Microsoft has withheld details; its statements indicate that Maia 200 (code-named Braga) shares a high-level architecture with the original Maia 100. The chip divides its resources into clusters made up of tiles. Each tile comprises a tensor (matrix) unit, vector unit, data-movement engine, control processor, and local (L1) memory. Each cluster also has a data-movement engine, control processor, and cluster-level (L2) memory. On-chip memory totals 272 MB, more than Groq’s LPU, which Nvidia recently licensed.
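To make the hierarchy concrete, the minimal sketch below models that resource tree in ordinary Python. The tile counts and per-level memory sizes are placeholders chosen only to illustrate the structure; Microsoft has not disclosed the actual split behind the 272 MB total.

```python
# Illustrative model of the cluster/tile hierarchy described above.
# All counts and sizes are assumptions, not published Maia 200 figures.
from dataclasses import dataclass, field

@dataclass
class Tile:
    l1_mb: float  # local (L1) memory; tensor unit, vector unit, DMA engine,
                  # and control processor omitted for brevity

@dataclass
class Cluster:
    l2_mb: float                       # cluster-level (L2) memory (size assumed)
    tiles: list[Tile] = field(default_factory=list)

    def on_chip_mb(self) -> float:
        return self.l2_mb + sum(t.l1_mb for t in self.tiles)

# Hypothetical example: 16 clusters of 4 tiles each, sized to reach the stated total.
clusters = [Cluster(l2_mb=13.0, tiles=[Tile(l1_mb=1.0) for _ in range(4)])
            for _ in range(16)]
print(sum(c.on_chip_mb() for c in clusters), "MB on chip (illustrative numbers only)")
```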
Processors such as DSPs and NPUs typically rely solely on data-movement (DMA) engines and local memories instead of caches and hardware-based coherency. Whereas CPU programmers prefer data movement and memory management to be automatic, DSP and NPU programmers prefer to control these functions explicitly. Microsoft highlights how this approach keeps intermediate data on chip and buffers data in flight, minimizing access latency and sustaining compute-unit utilization.
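The pattern is easy to illustrate. The sketch below shows software-managed double buffering in plain Python: while the compute stand-in works on one L1 buffer, a DMA stand-in fills the other with the next tile. Function and buffer names are illustrative only, not Maia APIs, and the DMA here is a synchronous copy rather than the asynchronous transfer real hardware would use.

```python
# Minimal sketch of explicit, software-managed double buffering between an
# off-chip memory and two on-chip (L1) buffers. Names are hypothetical.
import numpy as np

TILE = 1024       # elements per tile (illustrative)
NUM_TILES = 8

hbm = np.random.rand(NUM_TILES, TILE).astype(np.float32)           # stand-in for off-chip memory
l1 = [np.empty(TILE, np.float32), np.empty(TILE, np.float32)]      # two L1 buffers

def dma_load(dst, src):
    # In hardware this would be an asynchronous DMA descriptor; here it is a plain copy.
    np.copyto(dst, src)

def compute_tile(buf):
    # Stand-in for tensor/vector-unit work on the resident tile.
    return float(buf.sum())

results = []
dma_load(l1[0], hbm[0])                      # prime the pipeline
for i in range(NUM_TILES):
    nxt = (i + 1) % 2
    if i + 1 < NUM_TILES:
        dma_load(l1[nxt], hbm[i + 1])        # fetch the next tile while this one is processed
    results.append(compute_tile(l1[i % 2]))
print(sum(results))
```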
Maia Systems and Software
Microsoft’s system-level architecture assembles four Maias into a fully connected quad and links quads with Ethernet switches to scale to a 6,144-NPU cluster. Eliminating switches in the bottom tier reduces cost, power, and latency. Nvidia, for comparison, has 72 switch-connected GPUs at its bottom tier; Google directly attaches 64 TPUs in a 4 × 4 × 4 cube topology. Microsoft runs a proprietary protocol over Ethernet, and together they help Maia integrate into the Azure control plane.
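The arithmetic behind the bottom tier is simple: a fully connected quad needs only six direct links, and 1,536 such quads make up the 6,144-NPU cluster. The short sketch below just enumerates those quantities; the link model is an assumption used for illustration, not a disclosed wiring plan.

```python
# Count the direct links in a fully connected quad and the quads in the cluster.
from itertools import combinations

QUAD = 4
CLUSTER = 6144

quads = CLUSTER // QUAD                              # 1,536 quads
intra_links = list(combinations(range(QUAD), 2))     # 6 direct NPU-to-NPU links per quad
print(f"{quads} quads; {len(intra_links)} direct links per quad, no bottom-tier switch")
```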
Maia 200’s high-level software stack looks like its predecessor’s. Microsoft emphasizes its collective communication library (MCCL), a counterpart to Nvidia’s NCCL. This software bridges AI frameworks such as PyTorch with the underlying hardware, helping developers manage data movement and computation. The PyTorch framework and Triton compiler are Maia’s principal high-level tools.
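MCCL’s role is easiest to see from the framework side. The sketch below shows the standard PyTorch collective pattern that NCCL serves on Nvidia systems; a Maia deployment would presumably register MCCL as an equivalent backend, though that backend name is an assumption. The example uses the portable gloo backend so it runs anywhere.

```python
# All-reduce through torch.distributed; the collective library underneath
# (NCCL today, presumably MCCL on Maia) maps the call onto the interconnect.
import torch
import torch.distributed as dist

def all_reduce_demo(backend: str = "gloo"):
    # On Nvidia clusters the backend would be "nccl"; the Maia/MCCL backend
    # name is not published, so this sketch sticks to gloo on CPU tensors.
    dist.init_process_group(backend=backend)   # rank/world size come from the launcher env
    rank = dist.get_rank()
    t = torch.full((4,), float(rank + 1))
    dist.all_reduce(t, op=dist.ReduceOp.SUM)   # summed across all ranks
    print(f"rank {rank}: {t.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    all_reduce_demo()
```

Launched with, for example, `torchrun --nproc_per_node=2 allreduce_demo.py`, every rank prints the same summed tensor; swapping the backend string is the only change the application code would see.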
Bottom Line
When learning of a new product, one must first ascertain the problem it solves. This is especially important when alternatives exist. Microsoft is motivated by cost. Skyrocketing demand has bid up Nvidia GPU prices, and a more power-efficient alternative would reduce operating costs. Assuming the hardware and software work well, the Maia 200 should be cheaper to deploy and run than the Nvidia Blackwell. The GPU company claims the forthcoming Rubin will be 10 times more efficient at inference, but Maia 200’s acquisition cost should be lower. Microsoft may not attract many customers away from the popular and dominant Nvidia GPUs (which Azure also offers), but it’s likely to use Maia for its own workloads and those of its partner, OpenAI.
For more information, see Microsoft’s Maia 200 blog post.

