Is AI reaching the limits of its capability?
Why the biggest AI companies went quiet on AGI, and what's actually happening instead.
I’ve been doing this IT thing for more than 20 years now, and I’ve seen hype cycles like this before. There’s a pattern: a breakthrough happens, everyone extrapolates it to infinity, money floods in, and then reality starts to push back. I think we’re somewhere in that pushback phase with AI right now, and it’s worth talking about what’s actually going on, because it’s more interesting (and more hopeful) than the headlines suggest.
This isn’t a “doom” piece. I’m not here to tell you AI is a dead end. But I do think something important has shifted, and most people outside the tech bubble haven’t heard about it yet.
What shift am I talking about?
When GPT-4 landed in early 2023, the assumption in the AI world was pretty straightforward: keep making these models bigger, keep feeding them more data, and they’ll keep getting smarter. The path to AGI (artificial general intelligence, the hypothetical point where AI can do anything a human can) seemed like a straight line. Just add more.
Then GPT-4.5 happened.
OpenAI tried to do exactly that. Their internal project, reportedly codenamed Orion, attempted to go five to ten times bigger than GPT-4. It didn’t get meaningfully better. Meta ran into the same problem. So did others. Fortune magazine called it a “$19.6 billion pivot.”
Ilya Sutskever, one of OpenAI’s co-founders, put it bluntly in late 2025: “The age of scaling is over... we have but one internet.” He had already left the company by then to start his own lab. Yann LeCun, who until recently headed AI research at Meta, said something similar: “Scaling AI won’t make it smarter.” He also left, reportedly raising a billion dollars to pursue a completely different approach.
Cal Newport, a computer science professor at Georgetown University, described this shift in a recent conversation with Hank Green that I found particularly clear-headed. He said the industry hit a wall with pure scaling, and everything since then (the reasoning models, the extra computation spent at inference time, all the increasingly complicated techniques) has been “trying to find little narrow places where we could tune to get some advantage.” In his words: “It was actually the beginning of the end of pure scaling.”
Now, I should be honest here. This isn’t a clean, settled story. Some very smart people disagree. Dario Amodei, who runs Anthropic, said in March 2025 that “the Scaling Law has not hit a wall” and predicted “a radical acceleration in 2026.” Google’s Gemini 3 achieved major performance improvements earlier this year with the same number of parameters as its predecessor, through better training techniques rather than brute force. So the wall might not be about scale itself, but about how you scale. Still, the days of “just make it bigger” appear to be over.
We still don’t really understand what we’ve built
Here’s something that should probably get more attention than it does. Researchers still don’t fully understand the inner workings of large language models (LLMs). These systems process information through billions of interconnected parameters, and the honest truth is that nobody can fully explain how they arrive at the answers they give.
This matters because these models are already making decisions that affect people. Credit applications. Insurance assessments. In some places, recommendations that feed into criminal sentencing. And when the answer is wrong, or when it’s biased in ways that reflect the data it was trained on, there’s no straightforward way to open the thing up and find out why.
Hank Green described this as “algorithmic cruelty” in his recent video essay on AI concerns. His point was sharp: “If there is no way to investigate how those choices are being made, if we cannot crack open the black box and see what happened... this is a kind of cruelty that we should not accept.”
I’ve spent most of my career helping organisations understand their own systems and processes, and I’ve seen what happens when a business can’t explain how its own decisions get made. It’s a governance nightmare at even a modest scale. Apply that problem to tools used by hundreds of millions of people, making countless decisions a day and growing ever more embedded in critical infrastructure, and it’s a scary prospect.
The bigger the model, the harder it is to interpret. And we’re building our future on top of systems we can’t fully explain.
Why the industry went quiet on AGI
For a couple of years, AGI was all anyone in the AI world wanted to talk about. The next big model would be the one. Superintelligence was around the corner. Then the scaling problems started to show, and the conversation shifted... quietly.
This is where it gets interesting, because the silence tells you something.
Cal (as mentioned above) outlined three reasons why the big AI companies have every incentive to keep the AGI narrative going, even when their own evidence suggests the path isn’t as clear as they once promised.
First, it’s a distraction. If the public is focused on hypothetical future risks (Will AI take all our jobs? Will it become Terminator?), they’re paying less attention to the actual problems these products are causing right now. Things like AI-induced psychosis from sycophantic chatbots, the theft of creative intellectual property to train the models, or the environmental cost of running them.
Second, it’s a competitive weapon. Scary AI talk leads to heavy regulation. Heavy regulation costs a lot to comply with. The big companies can afford that; start-ups can’t. I like the term Cal used to describe this, “regulatory capture”: the companies help write the regulations, and the regulations protect the incumbents.
Third, it attracts money. If investors believe AGI is imminent, they’ll keep funding the companies most likely to get there first. As Cal put it: “It’s scary to think about AI automating half of jobs, but if I’m an investor, I better have my money in the company that’s going to do half of the jobs.”
Now, I want to be careful here. I don’t think this is all cynical theatre. Some of these people genuinely believe they’re building something that will change everything, and they might be right in the long run. But the incentive structures are worth understanding, because they shape what gets talked about publicly and what gets quietly shelved.
Sam Altman, the CEO of OpenAI, let something slip recently that I think is revealing. He said: “The main thing consumers want right now is not more IQ... personalisation is the real moat.” That’s a significant departure from almost everything he’s said about AGI to date.
A handful of companies
Whatever you think about the technology, there’s a structural question here that I think should concern everyone.
Right now, a small number of companies are building the systems through which billions of people will access, process, and interpret information. OpenAI, Google, Anthropic, Meta, xAI, and maybe a couple of others are shaping how the world will interact with knowledge itself.
Hank drew a historical comparison that stuck with me. He talked about how different communication technologies either concentrate or distribute power. Radio was a narrowing technology. Only a few could broadcast, and if you wanted to stage a coup, you captured the radio station. Television was even more concentrated. The internet was the opposite: suddenly anyone could publish, broadcast, participate. It fractured the old power structures.
AI, he argues, is narrowing again: “We’re headed into a world where there’s three radio stations and they control reality for everyone.” Elon Musk talks openly about using Grok to influence behaviour, like encouraging people to have more children. I’m sure I don’t need to tell you what I think about that.
The financial barriers reinforce this concentration. Meta alone has committed $115 to $135 billion in capital expenditure for 2026. In 2025, 73% of all AI investment by value went to mega-deals over $100 million, totalling $258.7 billion. You simply cannot compete at the frontier without that kind of money, which means the frontier is an exclusive club with very few members.
Whether those members have earned our trust to define how the world accesses information is a question worth sitting with.
The obvious sustainability problem
Frontier models are extraordinarily hungry for compute power. Every time someone sends a query, the model has to multiply through hundreds of billions of parameters, and every single token (roughly, every word or piece of a word in the response) requires that full calculation. It’s the most expensive computational operation we do at scale right now. I believe I’m right when I say that, in terms of raw compute, AI chat now outweighs all the supercomputer modelling (disease research, weather science, etc.) we’re doing as a species.
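To make “multiply through hundreds of billions of parameters” a bit more concrete, here’s a back-of-the-envelope sketch. It uses the common rule of thumb that a dense transformer’s forward pass costs roughly two FLOPs per parameter per generated token; the model sizes below are illustrative assumptions, not any vendor’s published figures.

```python
# Back-of-the-envelope compute per generated token for a dense transformer.
# Rule of thumb: a forward pass costs ~2 FLOPs per parameter per token.

def flops_per_token(num_parameters: float) -> float:
    """Approximate FLOPs needed to generate one token."""
    return 2 * num_parameters

frontier_params = 500e9  # hypothetical ~500B-parameter dense frontier model
slm_params = 8e9         # e.g. an 8-billion-parameter small model

print(f"Frontier: {flops_per_token(frontier_params):.1e} FLOPs per token")
print(f"SLM:      {flops_per_token(slm_params):.1e} FLOPs per token")

# A 500-token reply from the frontier model is on the order of 5e14 FLOPs,
# paid in full for every one of the hundreds of millions of daily queries.
```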
Power companies want to build more infrastructure (that’s how they make more money). AI companies want to claim they need more (to justify investment and intimidate competitors). Meanwhile, electricity is already unaffordable for a lot of people, and the additional demand from AI infrastructure is pushing prices up further. If there’s a cold snap and the grid can’t handle both the data centres and people’s heating, I hope we choose to turn off the data centres before the heating, because otherwise people will die.
To put things in perspective: processing a million conversations through a frontier LLM costs somewhere between $15,000 and $75,000. Processing the same volume through a small language model (SLM, more on those shortly) costs between $150 and $800. That’s potentially a hundred times cheaper. When the gap is that large, you have to ask whether the frontier approach is the right one for the vast majority of real-world uses.
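Taking those quoted figures at face value, the gap is easy to sanity-check:

```python
# Quoted costs in USD per million conversations (low end, high end).
frontier = (15_000, 75_000)  # frontier LLM
slm = (150, 800)             # small language model

print(f"{frontier[0] / slm[0]:.0f}x at the low end")   # 100x
print(f"{frontier[1] / slm[1]:.0f}x at the high end")  # ~94x
```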
The quiet revolution
This is the part of the story that I find genuinely exciting, and it’s the part that gets the least airtime. There’s a growing body of evidence that the future of AI doesn’t look like one massive, do-everything model. It looks like lots of smaller, specialised models, each good at something specific, coordinated by an orchestration layer that routes tasks to the right tool for the job.
Take Pluribus, a poker bot built by a researcher at Carnegie Mellon (now at OpenAI). It was the first system to beat professional poker players at Texas Hold’em with real money on the line. The interesting part: the team originally tried scaling up with massive neural networks, and it didn’t work well enough. The system that actually won could run on a laptop. It wasn’t one giant brain; it was a clever combination of smaller components (some neural network, some future prediction, some strategy simulation), each doing what it was best at.
Another example is a Diplomacy-playing bot (Diplomacy is a board game set in the run-up to the First World War, similar to Risk but with more, well, diplomacy). It had a language model, a future simulator, and a strategy engine, plus a control module written by humans in conventional code. When the researchers wanted it to not lie (lying is a core strategy in Diplomacy), they could simply tell it not to, because the system was transparent enough to enforce that constraint. Try doing that with a frontier LLM!
To quote Cal once again, “When you move away from ‘all we have is a language model,’ you have multiple components. There’s a control module written by humans in normal computer code. Nothing unknown. Nothing learned. Nothing that evolves.” That transparency is worth more than we currently give it credit for.
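To make that concrete, here’s a toy sketch of the pattern, with entirely hypothetical names (this is not the actual bot’s code). Learned components propose candidate messages; a human-written control module, in plain auditable code, filters them against a hard constraint:

```python
from dataclasses import dataclass

@dataclass
class CandidateMessage:
    text: str
    stated_intent: str   # what the message claims the bot will do
    planned_action: str  # what the strategy engine actually plans to do

def violates_honesty(msg: CandidateMessage) -> bool:
    """Human-written constraint: the bot may not say one thing while
    planning another. Nothing learned, nothing that evolves."""
    return msg.stated_intent != msg.planned_action

def control_module(candidates: list[CandidateMessage]) -> list[CandidateMessage]:
    """Filter the language model's proposals against hard constraints."""
    return [m for m in candidates if not violates_honesty(m)]

candidates = [
    CandidateMessage("I'll support your attack.", "support ally", "support ally"),
    CandidateMessage("I'll support your attack.", "support ally", "attack ally"),
]
for msg in control_module(candidates):
    print("sendable:", msg.text)  # only the honest proposal survives
```

The toy logic isn’t the point; the point is that the constraint lives in ordinary code a human can read, test, and guarantee.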
Around 40% of production AI deployments now use what’s called a “hybrid router” pattern: route the easy queries (the 80 to 95% of requests that are straightforward) to small, efficient models, and only escalate the hard 5 to 20% to the expensive frontier models.
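As a rough illustration, here’s a minimal sketch of that pattern. The difficulty heuristic and the call_slm / call_frontier functions are placeholders of my own, not any vendor’s API; real routers typically use a small classifier model or confidence scores rather than keyword matching.

```python
def call_slm(query: str) -> str:
    """Placeholder for a cheap, fast small-model endpoint."""
    return f"[SLM] answer to: {query}"

def call_frontier(query: str) -> str:
    """Placeholder for an expensive frontier-model endpoint."""
    return f"[frontier] answer to: {query}"

def estimate_difficulty(query: str) -> float:
    """Stand-in scorer; real deployments learn this instead."""
    hard_markers = ("prove", "multi-step", "legal opinion", "edge case")
    score = 0.3 + 0.2 * sum(marker in query.lower() for marker in hard_markers)
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    if estimate_difficulty(query) < threshold:
        return call_slm(query)   # cheap path: the easy 80-95% of requests
    return call_frontier(query)  # expensive path: the hard 5-20%

print(route("What's your refund policy?"))                # stays on the SLM
print(route("Draft a legal opinion on this edge case."))  # escalates
```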
The real-world examples are compelling:
Checkr, a background check company, replaced GPT-4 with a fine-tuned version of Llama-3-8B (an open-source model from Meta with 8 billion parameters). Result: 90% accuracy, five times cheaper, thirty times faster.
DoorDash achieved a 98% reduction in evaluation turnaround time and 90% fewer hallucinations by switching to SLM-powered systems.
Microsoft’s Phi-4-reasoning model, at 14 billion parameters, achieves capabilities comparable to models fifty times its size.
DeepSeek R1, a mixture-of-experts design, achieved performance on par with OpenAI’s o1 model while only activating 37 billion of its 671 billion total parameters per token. That’s roughly a 94% reduction in active compute.
Researchers are calling this trend the “Densing Law”: capability density (how much a model can do per parameter) roughly doubles every three and a half months. The same performance keeps getting achievable with fewer and fewer resources.
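Taking that quoted doubling period at face value, the implied compounding looks like this (an illustrative projection, not a measured result):

```python
# "Densing Law" as arithmetic: capability density doubling every ~3.5 months.
DOUBLING_MONTHS = 3.5

def density_multiplier(months: float) -> float:
    return 2 ** (months / DOUBLING_MONTHS)

for months in (3.5, 12, 24):
    m = density_multiplier(months)
    print(f"After {months:>4} months: {m:6.1f}x density "
          f"-> a {100 / m:5.1f}B model matches an old 100B one")
```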
To quote AI’s favourite sentence: “Why this matters”
If the future really is specialised models running on smaller hardware, the barrier to entry for building useful AI drops dramatically. You don’t need a hundred-billion-dollar data centre. You don’t need NVIDIA’s latest supercomputer cluster.
A 3-billion-parameter medical model recently outperformed open-source models up to three times its size by 25% on medical diagnostic tasks. Enterprise on-premise AI (companies running their own AI in-house rather than paying for cloud access) grew from 12% adoption in 2023 to 55% in 2025. That’s a massive shift in just two years.
The MMLU benchmark (a standard test used to compare language model capabilities) showed the gap between frontier and open-source models narrowing from 17.5 percentage points to just 0.3 in a single year. One analyst prediction from 2025 puts it starkly: “By 2027, frontier models will not be strategic assets. They will be commodities.”
That’s a world where smaller players (industry specialists, healthcare providers, legal firms, education platforms) can build AI tools that genuinely work for their specific needs without paying rent to one of three or four tech giants. That’s a more competitive, more innovative, and frankly more interesting market.
But here’s the complication
Of course, none of this has escaped the big AI companies. If the future is orchestration and specialised models, who’s actually best positioned to build that infrastructure?
Google already offers Gemini 3 Pro at the top, Flash and Flash-Lite in the middle, and Gemma (open-source, ranging from 1 billion to 27 billion parameters) at the bottom. OpenAI has o3 for the hard problems and GPT-4.1-nano with fine-tuning for the cheap ones. Anthropic’s Haiku model runs at a third of the cost of its more capable Sonnet.
I mentioned regulatory capture before, and that’s really the idea here: put the competition out of business before the pivot to smaller, lower-cost models, where the barrier to entry is lower. The big AI companies can see this shift coming, and they’re trying to lock in their advantages before it arrives.
There’s a real possibility that the orchestration layer, the thing that decides which model handles which task, becomes the new bottleneck. The models themselves might get commoditised, but the platform that coordinates them could end up just as concentrated as the current frontier.
My research brief for this article (yes, the brief I gave to a large frontier AI model) put it in terms I keep coming back to: “The technology is democratising, but the business models are consolidating. Smaller players can now build competitive AI products, but the orchestration layer might end up controlled by the same big players.”
So the opportunity is real, but it’s not guaranteed. Whether the SLM revolution actually distributes power more broadly or just reshuffles which part of the stack the big companies control... that’s still being decided. And it depends partly on choices that businesses, regulators, and users make over the next few years.
What do I think the reality will be?
Like every technology revolution, I think this one will be more insidious and nuanced than we initially imagine. The robotics revolution is quietly happening, not through each of us having a humanoid robot in our homes, but through a bunch of specialised robots: a smart vacuum, a smart mower, a washing machine that tells us when it’s done with the load. I imagine AI will keep embedding itself into our daily lives through specialist applications in the same way, without us noticing. In fact, I feel like it’s been doing that already; your Facebook/Instagram/TikTok feed algorithm is built on the same embedding-and-transformer tech that LLMs are.
Like everyone, I have my own views on where we’re at in this AI bubble and what its (inevitable?) collapse might mean. I might write about that soon.
However, I do think we’re going through a huge shift in the way we approach technology, knowledge and most importantly decision making. Do we want to have a say in how it shapes our industries, our workplaces, and the way we relate to information? Or do we let a handful of companies, led by people with very specific financial incentives, make those decisions for us (again)?
Hank Green said something that I think is worth ending on. He was talking about the tension between worrying about hypothetical future problems and dealing with what’s right in front of us. His conclusion: “What if dealing with the problems we have right now is the thing that makes the future better?”
I think he might be onto something.