OpenAI’s GPT-5 is a cost-cutting exercise • The Register

Comment For all the superlative-laden claims, OpenAI’s new top model appears to be less of an advancement and more of a way to save compute costs — something that hasn’t exactly gone over well with the company’s most dedicated users.

As the flag bearer that kicked off the generative AI era, OpenAI is under considerable pressure not only to demonstrate technological advances, but also to justify its massive, multi-billion-dollar funding rounds by showing its business is growing.

To do that, OpenAI can either increase its user base, raise prices, or cut costs. Much of the industry is already aligning around its $20 and $200 a month pricing tiers. So OpenAI would need to offer something others cannot to justify a premium, or risk losing customers to competitors such as Anthropic or Google.

With the academic year about to kick off, OpenAI is sure to pick up a fresh round of subscriptions as students file back into classrooms following the summer break. While more paying customers will mean more revenues, it also means higher compute costs.

Enter the cost-cutting era.

Perhaps the best evidence of cost-cutting is the fact that GPT-5 isn’t actually one model. It’s a collection of at least two models: a lightweight LLM that can quickly respond to most requests and a heavier-duty one designed to tackle more complex topics. Which model a prompt lands in is determined by a router model, which acts a bit like an intelligent load balancer for the platform as a whole. Image prompts are handled by a completely different model, Image Gen 4o.

This is a departure from how OpenAI has operated in the past. Previously, Plus and Pro users were able to choose which model they’d like to use. If you wanted to ask o3 mundane questions that GPT-4o could have easily handled, you could.

In theory, OpenAI’s router model should allow the bulk of GPT-5’s traffic to be served by its smaller, less resource-intensive models.
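To get a feel for how that kind of routing works, here’s a toy sketch. Nothing here reflects OpenAI’s actual router — the model names, scoring heuristic, and threshold are all made up for illustration; a real router is itself a learned model, not a keyword filter.

```python
# Toy sketch of prompt routing: a crude scoring function decides whether a
# prompt goes to a cheap, fast model or an expensive reasoning one.
# All names and thresholds are hypothetical, not OpenAI's.

def complexity_score(prompt: str) -> float:
    """Crude proxy for prompt difficulty: length plus a few trigger phrases."""
    triggers = ("prove", "derive", "step by step", "debug", "optimize")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.5 * sum(phrase in prompt.lower() for phrase in triggers)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send easy prompts to the small model, hard ones to the big one."""
    if complexity_score(prompt) >= threshold:
        return "big-reasoning-model"
    return "small-fast-model"

print(route("What's the capital of France?"))
# small-fast-model
print(route("Prove the sum of two even numbers is even, step by step."))
# big-reasoning-model
```

The economics follow directly: if most traffic scores low, most tokens get generated by the cheap model, and the expensive one only spins up when it’s needed.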

We can see more evidence of cost-cutting in OpenAI’s decision to toggle reasoning on and off automatically, depending on the complexity of the prompt. Freeloaders… we mean free-tier users… don’t have the ability to toggle this themselves. The less reasoning the models are doing, the fewer tokens they generate and the less expensive they are to operate.

But while this approach may be smarter for OpenAI’s bottom line, it doesn’t seem to have made the models themselves all that much smarter. As we addressed in our launch day coverage, OpenAI’s benchmarks show rather modest gains compared to prior models. The biggest improvements were in tool calling and curbing hallucinations.

Your eyes aren’t deceiving you, GPT-5 shows only iterative improvements in math benchmarks like AIME 2025

The new system depends on the routing model to redirect prompts to the right language model, which, based on early feedback, hasn’t been going all that well for OpenAI. According to Altman, on launch day, GPT-5’s routing functionality was broken, which made the model seem “way dumber” than it actually is.

Presumably this is why GPT-5 thought that “Blueberry” has just one B. Now it appears that OpenAI has fixed that rather embarrassing mistake.

But since GPT-5’s router is a separate model, the company can, at least, improve it.

Deprecating models

The router model isn’t OpenAI’s only cost-cutting measure. During the AI behemoth’s launch event last week, execs revealed that they were so confident in GPT-5 that they were deprecating all prior models.

That didn’t go over great with users, and CEO Sam Altman later admitted that OpenAI made a mistake when it elected to remove models like GPT-4o, which, despite its lack of reasoning capability and generally poorer performance in benchmarks, is apparently quite popular with end users and enterprises.

“If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly deprecating old models that users depended on in their workflows was a mistake),” he wrote.

Nonetheless, fewer models to wrangle means more resources to go around. 

OpenAI doesn’t disclose much technical detail about its internal (non-open-source) models, but if GPT-5 is anything like the company’s open-weights models, gpt-oss-20b and gpt-oss-120b, and was quantized to MXFP4, OpenAI has good reason for wanting all those legacy GPTs gone.

As we recently explored, the data type can reduce the memory, bandwidth, and compute required by LLMs by up to 75 percent compared to using BF16.
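The arithmetic behind that figure is straightforward. BF16 spends 16 bits per weight; MXFP4 packs 4-bit values in blocks of 32 that share an 8-bit scale, which works out to about 4.25 bits per weight. A back-of-envelope calculation for a 120-billion-parameter model (the model size is illustrative):

```python
# Weight memory for a 120B-parameter model: BF16 vs MXFP4.
# MXFP4 uses 4-bit values plus one shared 8-bit scale per 32-element block,
# i.e. roughly 4.25 bits per parameter.
params = 120e9
bf16_bits = 16
mxfp4_bits = 4 + 8 / 32  # 4.25 bits/param

bf16_gb = params * bf16_bits / 8 / 1e9
mxfp4_gb = params * mxfp4_bits / 8 / 1e9
savings = 1 - mxfp4_gb / bf16_gb

print(f"BF16:  {bf16_gb:.0f} GB")    # 240 GB
print(f"MXFP4: {mxfp4_gb:.1f} GB")   # 63.8 GB
print(f"Saved: {savings:.0%}")       # 73%
```

The pure 4-bit-vs-16-bit ratio gives the headline 75 percent; the shared scales shave that to roughly 73 percent in practice, before accounting for activations and overhead.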

For now, OpenAI has restored GPT-4o for paying users, but we have no doubt that, once OpenAI figures out what makes the model so endearing and how to fold that into GPT-5, the old model will be retired for good.

Lack of context

In addition to architectural changes, OpenAI opted not to increase GPT-5’s context window, which you can think of as its working memory. Free users are still limited to an 8,000-token context while Plus and Pro users cap out at 128,000 tokens.

Compare that to Claude’s Pro plan, which Anthropic prices similarly to OpenAI’s Plus subscription, and which offers a 200,000 token context window. Google’s Gemini supports contexts up to 1 million tokens.

Larger contexts are great for searching or summarizing large volumes of text, but they also require vast amounts of memory. By sticking with smaller contexts, OpenAI can get by running its models on fewer GPUs.
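The memory cost comes largely from the KV cache, which grows linearly with context length. A rough sizing sketch — the layer count, KV-head count, and head dimension below are illustrative for a large transformer, not GPT-5’s actual (undisclosed) architecture:

```python
# Rough KV-cache sizing for a hypothetical large transformer.
# Memory scales linearly with context length; all dimensions are assumed.
def kv_cache_gb(ctx_tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Bytes for keys and values (hence the 2x), converted to GB."""
    return 2 * ctx_tokens * layers * kv_heads * head_dim * bytes_per_value / 1e9

for ctx in (8_000, 128_000, 400_000, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx):6.1f} GB per sequence")
```

Under these assumptions an 8,000-token context costs a couple of gigabytes per active conversation, while a million-token context would eat hundreds — multiplied across every concurrent user. You can see why smaller defaults keep the GPU bill down.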

If OpenAI’s claims that GPT-5 hallucinates up to 80 percent less than prior models hold up, we expect users will want larger context windows for document search all the more.

With that said, if long contexts are important to you, the version of GPT-5 available via OpenAI’s API supports context windows up to 400,000 tokens, but you’ll be paying a pretty penny if you actually want to take advantage of it.

Filling the context just once on GPT-5 will set you back about 50 cents USD, which can add up quickly if you plan to throw large documents at the model consistently.
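That 50-cent figure falls out of GPT-5’s launch API pricing of $1.25 per million input tokens:

```python
# Cost of filling GPT-5's maximum API context window with input tokens,
# at the $1.25-per-million-input-tokens launch price.
full_context = 400_000         # max API context window, in tokens
price_per_million = 1.25       # USD per 1M input tokens

cost = full_context / 1e6 * price_per_million
print(f"${cost:.2f} per full-context prompt")  # $0.50
```

And that’s input alone — reasoning and output tokens, billed at a higher rate, come on top of it.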

Altman waves his hands

Altman has been doing a fair bit of damage control in the days since GPT-5’s debut.

In addition to bringing GPT-4o back, Altman has given paid users the ability to select and adjust GPT-5’s response mode among Auto, Fast, and Thinking. He’s also boosted rate limits to 3,000 messages per week.

On Monday, Altman laid out OpenAI’s strategy for allocating compute over the next few months, which will unsurprisingly prioritize paying customers.

Once ChatGPT’s customers get their resources, Altman says, API use will take priority, at least up to currently allotted capacity. “For a rough sense, we can support about an additional ~30% new API growth from where we are today with this capacity,” he wrote in an X post.

Only then will OpenAI look at improving the quality of ChatGPT’s free tier or expanding API capacity. But worry not, if Altman is to be believed, OpenAI will have twice the compute to play with by the end of the year.

“We are doubling our compute fleet over the next 5 months (!) so this situation should get better,” he wrote. ®

