DeepSeek didn’t really train its flagship model for $294,000 • The Register

Chinese AI darling DeepSeek’s now-infamous R1 research report was published in the journal Nature this week, alongside new information on the compute resources required to train the model. Unfortunately, some people got the wrong idea about just how expensive it was to create.

The disclosures led some to believe the Chinese lab had actually managed to train the model at a cost of just $294,000, a figure far lower than previously reported. In reality, the true cost to train the model was roughly 20x that. At least.

The confusion stemmed from the supplementary information released alongside the original January paper, in which the AI model dev revealed it had used just 64 eight-way H800 boxes totaling 512 GPUs running at full tilt for 198 hours to train the preliminary R1-Zero release, and another 80 hours or so to complete it.

Along with about 5,000 GPU hours to generate the supervised fine-tuning datasets used in the training process, the entire endeavor came out to a hair under $300,000, a pretty damning figure considering the tens of billions of dollars American model devs have burned this year alone.
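For the curious, here’s how that sub-$300,000 number falls out of the disclosed GPU hours, assuming the roughly $2-per-GPU-hour H800 lease rate typically used in these estimates:

```python
# Back-of-the-envelope reconstruction of the ~$294K figure from the paper's
# supplementary information, assuming an H800 lease rate of $2 per GPU hour.
GPUS = 64 * 8             # 64 eight-way H800 boxes = 512 GPUs
R1_ZERO_HOURS = 198       # wall-clock hours for the preliminary R1-Zero run
R1_HOURS = 80             # roughly 80 more hours to complete R1
SFT_GPU_HOURS = 5_000     # GPU hours spent generating the SFT datasets
RATE = 2.00               # assumed lease rate, USD per GPU hour

gpu_hours = GPUS * (R1_ZERO_HOURS + R1_HOURS) + SFT_GPU_HOURS
print(f"{gpu_hours:,} GPU hours -> ${gpu_hours * RATE:,.0f}")
# 147,336 GPU hours -> $294,672
```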

But that’s not actually what happened. Setting aside the fact that $300,000 won’t buy you anywhere close to 512 H800s (the estimate is based on GPU lease rates, not actual hardware costs), the researchers aren’t talking about end-to-end model training.

Instead, the paper focuses on the reinforcement learning used to imbue DeepSeek’s existing V3 base model with “reasoning” or “thinking” capabilities.

In other words, they’d already done about 95 percent of the work by the time they reached the RL phase detailed in this paper.

There are several ways to approach reinforcement learning, but in a nutshell, it is a post-training process that typically involves reinforcing stepwise reasoning by rewarding models for correct answers, encouraging more accurate responses in the process.

The paper very clearly centers on the application of Group Relative Policy Optimization (GRPO), the specific reinforcement learning technique used in the model’s training. Headlines touting a $294,000 training cost, however, appear to have confused this reinforcement learning stage, which takes place after pre-training, with the far more costly pre-training process used to build DeepSeek V3.
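For readers unfamiliar with GRPO, the core idea is to sample a group of answers for each prompt, score them, and reward each answer relative to its own group rather than against a separate value model. Here’s a minimal, simplified sketch of that group-relative advantage step; the rewards are stand-ins, and the actual policy update is omitted entirely.

```python
# Minimal sketch of GRPO's group-relative advantage step: score a group of
# sampled answers to the same prompt, then normalize each reward against the
# group's mean and standard deviation. DeepSeek's reward functions and the
# full policy-gradient update are not shown here.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled answer relative to its own group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: four answers sampled for one prompt, rewarded 1.0 if correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# [1.0, -1.0, -1.0, 1.0] -- correct answers are reinforced, incorrect ones discouraged
```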

How do we know? Because DeepSeek’s research team has already disclosed how much compute it used to train the base model. According to the V3 technical report, DeepSeek V3 was trained on 2,048 H800 GPUs for approximately two months. In total, the model required 2.79 million GPU hours at an estimated cost of $5.58 million.
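Those numbers hang together if you apply the same assumed $2-per-GPU-hour rate:

```python
# Sanity check on the V3 figures: 2.79 million GPU hours spread across 2,048
# H800s works out to roughly two months of training, and at the same assumed
# $2 per GPU hour the bill lands around $5.58 million.
V3_GPU_HOURS = 2_790_000
GPUS = 2_048
RATE = 2.00  # assumed lease rate, USD per GPU hour

days = V3_GPU_HOURS / (GPUS * 24)
print(f"~{days:.0f} days of training, roughly ${V3_GPU_HOURS * RATE:,.0f}")
# ~57 days of training, roughly $5,580,000
```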

Since you can’t have R1 without first building V3, the actual cost of the model was closer to $5.87 million. Whether these figures have been intentionally understated to cast Western model devs as frivolous hype fiends remains a subject of intense debate.

It’s also worth pointing out that the cost figures are based on the assumption that those H800 GPUs could be rented for $2/hr. By our estimate, the purchase cost of the 256 eight-way H800 servers used to train the models would be somewhere north of $51 million. And that doesn’t take into account research and development, data acquisition, data cleaning, or any false starts or wrong turns on the way to making a successful model.
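That “north of $51 million” figure is consistent with a price of roughly $200,000 per eight-way H800 server, a ballpark we’re assuming purely for illustration:

```python
# Illustration of the hardware estimate above. The roughly $200,000 per
# eight-way H800 server is an assumed ballpark, not a quoted price.
SERVERS = 256                        # 2,048 GPUs / 8 per box
ASSUMED_PRICE_PER_SERVER = 200_000   # USD, hypothetical

print(f"${SERVERS * ASSUMED_PRICE_PER_SERVER:,}")
# $51,200,000
```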

Overall, the idea that DeepSeek was substantially cheaper or more efficient to train than Western models appears to be overblown. DeepSeek V3 and R1 are roughly comparable to Meta’s Llama 4 in terms of compute. Llama 4 required between 2.38 million GPU hours (Maverick) and five million (Scout) to train, but was trained on between 22 and 40 trillion tokens. DeepSeek V3 is larger than Llama 4 Maverick, but used significantly fewer training tokens at 14.8 trillion. In other words, Meta trained a slightly smaller model in slightly fewer GPU hours using significantly more training data, as the rough sums below illustrate.
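For a rough sense of the throughput gap, here is a quick sketch using only the figures quoted above; it ignores differences in model size, architecture, and hardware, so treat it as illustrative rather than a rigorous comparison.

```python
# Training tokens processed per GPU hour, using the figures cited above.
# Purely illustrative: this ignores model size, architecture, and hardware differences.
llama4_maverick = 22e12 / 2.38e6    # ~9.2 million tokens per GPU hour
deepseek_v3 = 14.8e12 / 2.79e6      # ~5.3 million tokens per GPU hour

print(f"Llama 4 Maverick: {llama4_maverick / 1e6:.1f}M tokens/GPU-hour")
print(f"DeepSeek V3:      {deepseek_v3 / 1e6:.1f}M tokens/GPU-hour")
```

By that crude measure, Meta pushed considerably more training data through each GPU hour than DeepSeek did.®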

