Picture of AMD MI300X next to a can of Diet Coke for scale.

AMD Steps Towards Becoming a Major NPU Supplier with MI300X and MI300A

In December 2023, AMD disclosed details of the MI300X and MI300A accelerators. The HPC and AI GPU is a packaging tour de force and promises more computing performance than rivals’ offerings, including Nvidia’s H100 (Hopper). The El Capitan supercomputer will employ the MI300A, and Meta, Microsoft, and Oracle will deploy the MI300X. The latter three will use it for AI, and our analysis focuses on that domain instead of HPC.

AMD MI300 Notables

  • Customer wins—AMD didn’t disclose how much customers will buy, but launching with multiple wins is positive regardless of their size.
  • Benchmarks—AMD compared the MI300 to the H100 on various tests, spurring back-and-forth with Nvidia. Unsurprisingly, AMD just can’t wring as much performance out of Nvidia’s chip as Nvidia can. AMD should’ve launched with MLPerf results.
  • Packaging—The MI300 stacks logic dice and employs other advanced technologies. Importantly, it looks as composable as Epyc. AMD varies the CPU count of its server processors by integrating different numbers of dice and plans to scale the MI300 down in similar fashion. This approach is more economical than taping out a new die or deactivating functions to reduce performance.
  • Hardware—AMD purposefully employed the OCP-compliant form factor used by some Nvidia-based designs to simplify system integration.


The MI300’s impact on Nvidia is neutral. A viable alternative to Team Green’s hegemony was inevitable, but there’s no reason to expect a wholesale defection of data-center customers to AMD. AMD’s success in server processors is instructive: even after years of fielding a better chip, and despite the relatively low cost of switching from Xeon to Epyc, AMD’s data-center business remains smaller than Intel’s.

Intel, however, is severely impacted. The company’s Gaudi 2 chip was shaping up to be the best alternative to Nvidia’s GPUs. Although Gaudi 2 couldn’t match the H100’s throughput, Intel successfully demonstrated its performance and scalability—things AMD has yet to show for the MI300. However, it’s AMD that’s claiming big-name customers. The forthcoming Gaudi 3 should retain Gaudi 2’s strengths and raise performance, but it’s the end of the road for the architecture: Intel’s next-generation data-center neural processing unit (NPU) will build on its HPC GPU instead.


The value the MI300 delivers varies by customer type. Meta, for example, is among the most sophisticated AI-processing customers, developing, training, and deploying proprietary models. Engaging with the hardware at the lowest levels, Meta’s engineers must cope with the differences between AMD’s and Nvidia’s hardware and software. At the same time, they have the skills to jump vendors. And because it buys a lot of hardware, Meta has an incentive to find a second source.

At the other extreme, customers renting time from a cloud provider and working only with canned models should be indifferent to the underlying hardware that runs them. Those developing their own models in PyTorch or other frameworks should also, in theory, care little about hardware. In practice, however, much depends on the completeness of AMD’s framework implementations.

Bottom Line

The MI300A/MI300X is an excellent step down the long path to becoming a significant supplier of data-center-class NPUs.



