Huawei is grabbing headlines with its newest data-center AI chips (NPUs), which take aim at Nvidia GPUs. For more than five years, the company has been developing its Ascend NPUs and associated systems, catering to customers hungry for a local supplier. A recently disclosed CloudMatrix system integrates 384 Ascend 910C chips, and a more powerful Ascend 910D is in development. The Wall Street Journal reports that 910D samples are due back from the fab by midyear; it also estimates Huawei will ship 800,000 NPUs this year.
A Famous Huawei Ascend Customer
Thinly sourced reports state that DeepSeek employs the Huawei Ascend 910C for inference, following a demand spike after the release of DeepSeek-R1. At a minimum, Huawei ported the open-weights model and offers it independently through its cloud service, which doesn't set the company apart from Microsoft and others that also ported the model. Other reports state that DeepSeek is employing 910-family chips to train its yet-to-be-released R2 model.
The 910’s Ascent
Disclosed in 2019, the original Huawei Ascend 910 promised greater theoretical performance than the Nvidia V100. The V100 was a couple of years old at that point, but the successor-generation A100 had yet to appear.
With the 910, Huawei targeted training, having already developed an inference-only chip. Actual performance lagged that of the V100, and the Financial Times reported that users found the 910 and its successors unsatisfactory, mainly blaming the 910's software.
Inference, however, doesn't require software support as extensive as training does. Thus, although DeepSeek employs Nvidia GPUs to train its models, executing them on other chips is feasible, and we see the Ascend family, like competing AI processors, principally as an Nvidia alternative for inference only. Since-deleted DeepSeek reports indicate the new Ascend 910C is 60% as fast as the Nvidia H100, a figure in line with the two chips' relative peak BF16 performance. Details, however, are unavailable, and the H100 also supports FP8 at twice its BF16 throughput.
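To make the 60% claim concrete, the minimal Python sketch below works through the arithmetic. The H100 figure is Nvidia's published dense BF16 tensor peak; the 910C figure is simply back-solved from the reported ratio, since Huawei hasn't published a comparable number.

```python
# Illustrative throughput comparison, not measured data. The 910C peak
# below is back-solved from the reported 60% figure, not a Huawei spec.

H100_BF16_TFLOPS = 989.0                           # Nvidia-published dense tensor peak
H100_FP8_TFLOPS = 2 * H100_BF16_TFLOPS             # FP8 runs at twice the BF16 rate
ASCEND_910C_BF16_TFLOPS = 0.6 * H100_BF16_TFLOPS   # implied by the deleted reports

print(f"BF16 ratio (910C/H100): {ASCEND_910C_BF16_TFLOPS / H100_BF16_TFLOPS:.2f}")

# For an FP8-quantized model, the H100's effective peak doubles, while the
# 910C, absent a disclosed FP8 fast path, stays at its BF16 rate:
print(f"FP8-model ratio (910C/H100): {ASCEND_910C_BF16_TFLOPS / H100_FP8_TFLOPS:.2f}")
```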
The original Ascend 910 was a chiplet design combining a single computing die, an I/O die, and multiple HBM stacks. The follow-on Ascend 910B revamps the 910's DaVinci AI cores and increases on-chip memory; we believe it raises HBM capacity and interface speed as well. Raw FP16 and INT8 performance likely grows by about 25%, in proportion to the added silicon area. Essentially putting two 910Bs in one package, the Ascend 910C employs two computing dice, each similar to the 910B's, and doubles the HBM capacity. The unannounced Ascend 920 should further scale throughput and memory.
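This generational scaling can be summarized in a few lines. In the sketch below, throughput is normalized to the original 910; only the relative factors (about 25% for the 910B, two computing dice for the 910C) come from our estimates above, and they are estimates rather than Huawei disclosures.

```python
# Rough scaling model of the Ascend 910 family, normalized to the
# original 910. The factors are our estimates, not Huawei disclosures.

BASE = 1.0  # normalized Ascend 910 FP16/INT8 throughput

family = {
    "910":  BASE,               # one computing die, one I/O die, HBM stacks
    "910B": BASE * 1.25,        # ~25% more compute from added silicon area
    "910C": BASE * 1.25 * 2,    # two 910B-class computing dice per package
}

for chip, perf in family.items():
    print(f"Ascend {chip}: {perf:.2f}x the original 910")
```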
System-Level Scaling
Scaling can also take place at the system level, and the multirack CloudMatrix 384 raises peak BF16 performance beyond that of the Nvidia NVL72 system, which employs 72 Nvidia Blackwell GPUs. The Ascend 910 family integrates a Huawei-proprietary I/O port that enables glueless multichip connections. To span 384 chips, we believe the CloudMatrix combines 48 eight-NPU shelves, fully interconnecting the Ascends with copper within each shelf and optics between shelves.
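The aggregate arithmetic works out as follows. The chip counts and 48-shelf topology follow the description above; the per-chip peaks are our assumptions (the 910C figure is back-solved from the 60%-of-H100 report, and published Blackwell estimates vary).

```python
# Back-of-envelope aggregate BF16 throughput for the two systems.
# Per-chip peaks are assumptions; only the chip counts come from the text.

SHELVES, NPUS_PER_SHELF = 48, 8
ASCEND_910C_BF16_PFLOPS = 0.59   # assumed: 60% of the H100's ~0.99 PFLOPS
BLACKWELL_BF16_PFLOPS = 2.5      # assumed dense BF16 peak per Blackwell GPU

cloudmatrix_npus = SHELVES * NPUS_PER_SHELF          # 384 NPUs
cloudmatrix_pflops = cloudmatrix_npus * ASCEND_910C_BF16_PFLOPS
nvl72_pflops = 72 * BLACKWELL_BF16_PFLOPS

print(f"CloudMatrix 384: {cloudmatrix_npus} NPUs, ~{cloudmatrix_pflops:.0f} PFLOPS BF16")
print(f"NVL72:           72 GPUs,   ~{nvl72_pflops:.0f} PFLOPS BF16")
```

Even with the lower per-chip peak, sheer chip count puts the CloudMatrix's aggregate above the NVL72's, consistent with the system-level claim, though at a correspondingly higher power draw.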
Bottom Line
Huawei is one of dozens of companies developing data-center AI processors to compete with Nvidia. Beyond the profit motive it shares with the others, Huawei is helping its home country pursue technological independence. U.S. policies implemented in the past several years only heighten China's need for a homegrown NPU.
Although it has developed the Ascend 910 family for several years, Huawei still lags behind Nvidia. The policies that restrict China's access to Nvidia GPUs also limit Huawei's access to leading-edge process nodes and HBM. On the other hand, the company should have adequate development resources, yet its software still lags, a shortcoming it must rectify to compete in training.
The CloudMatrix 384, however, shows the company can assemble an impressive system. It can’t match the power efficiency of Nvidia’s biggest design but can theoretically beat its performance on some metrics. This should enable CloudMatrix to run the biggest models, but the company must show how it can scale out to handle training.