Monday, February 10, 2025

DeepSeek’s True Training Cost

Anton Shilov:

SemiAnalysis reports that the company behind DeepSeek incurred $1.6 billion in hardware costs and has a fleet of 50,000 Nvidia Hopper GPUs, a finding that undermines the idea that DeepSeek reinvented AI training and inference with dramatically lower investments than the leaders of the AI industry.

DeepSeek operates an extensive computing infrastructure with approximately 50,000 Hopper GPUs, the report claims. This includes 10,000 H800s and 10,000 H100s, with additional purchases of H20 units, according to SemiAnalysis. These resources are distributed across multiple locations and serve purposes such as AI training, research, and financial modeling. The company’s total capital investment in servers is around $1.6 billion, with an estimated $944 million spent on operating costs, according to SemiAnalysis.

Yazhou Sun and Tom Mackenzie:

The notion that China’s DeepSeek spent under $6 million to develop its artificial intelligence system is “exaggerated and a little bit misleading,” according to Google DeepMind boss Demis Hassabis.

[…]

DeepSeek “seems to have only reported the cost of the final training round, which is a fraction of the total cost.”

4 Comments


Fishier and fishier.

How come this wasn't covered anywhere before it mysteriously hit the top charts? Am I so out of touch? Or is it the SEO scammers gaming the system who are wrong?

Very few people reported that being at the top of the App Store charts is not just about the number of downloads. There are other, deliberately opaque factors, so as usual they are difficult for legitimate developers to navigate, while people who specialize in gaming the system have no problem.

Seems more and more like a deliberate economic attack. Maybe I'm crazy. Tell me why that's crazy.


They would say that, wouldn't they?

The only moat available to protect the stupidly high valuations of OpenAI et al. is high training/inference costs. Those involved have an interest in making it seem impossible to operate on a shoestring. Nvidia has an interest in selling a lot of chips. So these claims are about maintaining the status quo for those who benefit from it.

If HighFlyer are spouting government propaganda, then perhaps it is in China's interest to make it seem as if chips aren't that important, so that it can import more of them more easily and gain the secondary benefit of harming the competition.

On the other hand, HighFlyer's papers do explain some of the tricks they use to improve efficiency, and those tricks make sense. HighFlyer provides the lowest cost per token in the industry, and they built DeepSeek very quickly. Moreover, they're now advertising for chip-packaging specialists, which suggests they're developing their own chips. Given that AI is essentially lots of matrix multiplications, activation-function lookups, and DRAM accesses, that shouldn't be too surprising: it's a much simpler task than building an x86 processor.

So I'm going to give HighFlyer the benefit of the doubt.


That February 10, 2025 4:09 PM message was mine -- I forgot to type my name in.

Another take: https://www.tanishq.ai/blog/posts/deepseek-delusions.html


"SemiAnalysis reports that the company behind DeepSeek incurred $1.6 billion in hardware costs and has a fleet of 50,000 Nvidia Hopper GPUs, a finding that undermines the idea that DeepSeek reinvented AI training and inference with dramatically lower investments than the leaders of the AI industry."

This is a nonsensical sentence. What DeepSeek reported was the actual cost of the final training run. It's tautological, but the training cost of the model is the relevant number if you want to know how much training the model cost.

How much hardware the parent company owns, on the other hand, is completely irrelevant, since they use that hardware for all kinds of different things.

"The notion that China’s DeepSeek spent under $6 million to develop its artificial intelligence system"

DeepSeek never claimed anything even remotely like that. This is a complete straw man.
