<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[in Carl's Head]]></title><description><![CDATA[Backed by years of experience as a business leader and tech entrepreneur, this publication brings you honest takes on current events, emerging technologies, and where I think we’re heading]]></description><link>https://carlhead.com</link><image><url>https://substackcdn.com/image/fetch/$s_!AUl-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5de018ef-4eba-404d-8b7b-0e07b80efd82_500x500.png</url><title>in Carl&apos;s Head</title><link>https://carlhead.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 21 Apr 2026 13:17:08 GMT</lastBuildDate><atom:link href="https://carlhead.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Carl Head]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[carlhead@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[carlhead@substack.com]]></itunes:email><itunes:name><![CDATA[Carl Head]]></itunes:name></itunes:owner><itunes:author><![CDATA[Carl Head]]></itunes:author><googleplay:owner><![CDATA[carlhead@substack.com]]></googleplay:owner><googleplay:email><![CDATA[carlhead@substack.com]]></googleplay:email><googleplay:author><![CDATA[Carl Head]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Is AI reaching the limits of its capability?]]></title><description><![CDATA[Why the biggest AI companies went quiet on AGI, and what's actually happening instead.]]></description><link>https://carlhead.com/p/is-ai-reaching-the-limits-of-its</link><guid 
isPermaLink="false">https://carlhead.com/p/is-ai-reaching-the-limits-of-its</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Sun, 05 Apr 2026 05:32:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6eecbbd8-cadf-4906-b3f4-717fc5e60f2e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been doing this IT thing now for more than 20 years, and I think I&#8217;ve seen similar hype cycles before. There&#8217;s a pattern: a breakthrough happens, everyone extrapolates it to infinity, money floods in, and then reality starts to push back. I think we&#8217;re somewhere in that pushback phase with AI right now. It&#8217;s worth talking about what&#8217;s actually going on, because it&#8217;s more interesting (and more hopeful) than the headlines suggest.</p><p>This isn&#8217;t a &#8220;doom&#8221; piece. I&#8217;m not here to tell you AI is a dead end. But I do think something important has shifted, and most people outside the tech bubble haven&#8217;t heard about it yet.</p><h2>What shift am I talking about?</h2><p>When GPT-4 landed in early 2023, the assumption in the AI world was pretty straightforward: keep making these models bigger, keep feeding them more data, and they&#8217;ll keep getting smarter. The path to AGI (artificial general intelligence, the hypothetical point where AI can do anything a human can) seemed like a straight line. Just add more.</p><p>Then GPT-4.5 happened.</p><p>OpenAI tried to do exactly that. Their internal project, reportedly codenamed Orion, attempted to go five to ten times bigger than GPT-4. It didn&#8217;t get meaningfully better. Meta ran into the same problem. So did others. Fortune magazine called it a &#8220;$19.6 billion pivot.&#8221;</p><p>Ilya Sutskever, one of OpenAI&#8217;s co-founders, put it bluntly in late 2025: &#8220;The age of scaling is over... 
we have but one internet.&#8221; He left the company shortly after and started his own lab. Yann LeCun, formerly heading AI at Meta, said something similar: &#8220;Scaling AI won&#8217;t make it smarter.&#8221; He also left and raised a billion dollars to pursue a completely different approach.</p><p>Cal Newport, a computer science professor at Georgetown University, described this shift in a recent conversation with Hank Green that I found particularly clear-headed. He said the industry hit a wall with pure scaling, and everything since then, the reasoning models, the post-inference computation, all the increasingly complicated techniques, has been &#8220;trying to find little narrow places where we could tune to get some advantage.&#8221; In his words: &#8220;It was actually the beginning of the end of pure scaling.&#8221;</p><p>Now, I should be honest here. This isn&#8217;t a clean, settled story. Some very smart people disagree. Dario Amodei, who runs Anthropic, said in March 2025 that &#8220;the Scaling Law has not hit a wall&#8221; and predicted &#8220;a radical acceleration in 2026.&#8221; Google&#8217;s Gemini 3 achieved major performance improvements earlier this year with the same number of parameters as its predecessor, through better training techniques rather than brute force. So the wall might not be about scale itself, but about how you scale. Still, the days of &#8220;just make it bigger&#8221; appear to be over.</p><h2>We still don&#8217;t really understand what we&#8217;ve built</h2><p>Here&#8217;s something that should probably get more attention than it does. Researchers are still working to understand the inner workings of very large language models (LLMs). These systems process information through billions of interconnected parameters, and the honest truth is that nobody fully understands how they arrive at the answers they give.</p><p>This matters because these models are already making decisions that affect people. Credit applications. 
Insurance assessments. In some places, recommendations that feed into criminal sentencing. And when the answer is wrong, or when it&#8217;s biased in ways that reflect the data it was trained on, there&#8217;s no straightforward way to open the thing up and find out why.</p><p>Hank Green described this as &#8220;algorithmic cruelty&#8221; in his recent video essay on AI concerns. His point was sharp: &#8220;If there is no way to investigate how those choices are being made, if we cannot crack open the black box and see what happened... this is a kind of cruelty that we should not accept.&#8221;</p><p>I&#8217;ve spent most of my career helping organisations understand their own systems and processes. I&#8217;ve seen what happens when a business can&#8217;t explain how its own decisions get made. It&#8217;s a governance nightmare at even a modest scale. Apply that problem to tools that are used by hundreds of millions of people, that make countless decisions per day, and that are growing more embedded in critical infrastructure all the time, and it&#8217;s a scary prospect.</p><p>The bigger the model, the harder it is to interpret. And we&#8217;re building our future on top of systems we can&#8217;t fully explain.</p><h2>Why the industry went quiet on AGI</h2><p>For a couple of years, AGI was all anyone in the AI world wanted to talk about. The next big model would be the one. Superintelligence was around the corner. Then the scaling problems started to show, and the conversation shifted... quietly.</p><p>This is where it gets interesting, because the silence tells you something.</p><p>Cal (as mentioned above) outlined three reasons why the big AI companies have every incentive to keep the AGI narrative going, even when their own evidence suggests the path isn&#8217;t as clear as they once promised.</p><p>First, it&#8217;s a distraction. If the public is focused on hypothetical future risks (Will AI take all our jobs? 
Will it become Terminator?), they&#8217;re paying less attention to the actual problems these products are causing right now. Things like AI-induced psychosis from sycophantic chatbots, the theft of creative intellectual property to train the models, or the environmental cost of running them.</p><p>Second, it&#8217;s a competitive weapon. Scary AI talk leads to heavy regulation. Heavy regulation costs a lot to comply with. The big companies can afford that. Start-ups can&#8217;t. I like the term Cal used to describe this, &#8220;regulatory capture&#8221;: the companies help write the regulations, and the regulations protect the incumbents.</p><p>Third, it attracts money. If investors believe AGI is imminent, they&#8217;ll keep funding the companies most likely to get there first. As Cal put it: &#8220;It&#8217;s scary to think about AI automating half of jobs, but if I&#8217;m an investor, I better have my money in the company that&#8217;s going to do half of the jobs.&#8221;</p><p>Now, I want to be careful here. I don&#8217;t think this is all cynical theatre. Some of these people genuinely believe they&#8217;re building something that will change everything, and they might be right in the long run. But the incentive structures are worth understanding, because they shape what gets talked about publicly and what gets quietly shelved.</p><p>Sam Altman, the CEO of OpenAI, let something slip recently that I think is revealing. He said: &#8220;The main thing consumers want right now is not more IQ... personalisation is the real moat.&#8221; That&#8217;s a pretty significant shift from almost everything he&#8217;s said to date about AGI.</p><h2>A handful of companies</h2><p>Whatever you think about the technology, there&#8217;s a structural question here that I think should concern everyone.</p><p>Right now, a small number of companies are building the systems through which billions of people will access, process, and interpret information. 
OpenAI, Google, Anthropic, Meta, xAI, and maybe a couple of others are shaping how the world will interact with knowledge itself.</p><p>Hank drew a historical comparison that stuck with me. He talked about how different communication technologies either concentrate or distribute power. Radio was a narrowing technology. Only a few could broadcast, and if you wanted to stage a coup, you captured the radio station. Television was even more concentrated. The internet was the opposite: suddenly anyone could publish, broadcast, participate. It fractured the old power structures.</p><p>AI, he argues, is narrowing again. &#8220;We&#8217;re headed into a world where there&#8217;s three radio stations and they control reality for everyone&#8221;. Elon Musk talks openly about using Grok to influence behaviour, such as encouraging people to have more children. I&#8217;m sure I don&#8217;t need to tell you what I think about that?</p><p>The financial barriers reinforce this concentration. Meta alone has committed $115 to $135 billion in capital expenditure for 2026. In 2025, 73% of all AI investment by value went to mega-deals over $100 million, totalling $258.7 billion. You simply cannot compete at the frontier without that kind of money, which means the frontier is an exclusive club with very few members.</p><p>Whether those members have earned our trust to define how the world accesses information is a question worth sitting with.</p><h2>The obvious sustainability problem</h2><p>Frontier models are extraordinarily hungry for compute power. Every time someone sends a query, the model has to multiply through hundreds of billions of parameters. Every single token (roughly, every word or piece of a word in the response) requires that full calculation. It&#8217;s the most expensive computational operation we do right now. I believe I&#8217;m right when I say that in terms of raw compute, AI chat outweighs all the supercomputer modelling (disease research, weather science, etc.) 
we&#8217;re doing as a species at the moment.</p><p>Power companies want to build more infrastructure (that&#8217;s how they make more money). AI companies want to claim they need more (to justify investment and intimidate competitors). Electricity is already unaffordable for a lot of people, and the additional demand from AI infrastructure is pushing prices up further. If there&#8217;s a cold snap and the grid can&#8217;t handle both the data centres and people&#8217;s heating, I&#8217;m hoping that we choose to turn the data centres off first, because people will die.</p><p>To put things in perspective: processing a million conversations through a frontier LLM costs somewhere between $15,000 and $75,000. Processing the same volume through a small language model (SLM) [I&#8217;ll get onto those shortly] costs between $150 and $800. That&#8217;s potentially a hundred times cheaper. When the gap is that large, you have to ask whether the frontier approach is the right one for the vast majority of real-world uses.</p><h2>The quiet revolution</h2><p>This is the part of the story that I find genuinely exciting, and it&#8217;s the part that gets the least airtime. There&#8217;s a growing body of evidence that the future of AI doesn&#8217;t look like one massive, do-everything model. It looks like lots of smaller, specialised models, each good at something specific, coordinated by an orchestration layer that routes tasks to the right tool for the job.</p><p>Take Pluribus, a bot built by a researcher at Carnegie Mellon (now at OpenAI); it was the first system to beat professional poker players at Texas Hold&#8217;em with real money on the line. The interesting part: they originally tried scaling up with massive neural networks. It didn&#8217;t work well enough. The system that actually won could run on a laptop. It wasn&#8217;t one giant brain. It was a clever combination of smaller components. 
Some neural network, some future prediction, some strategy simulation, each doing what it was best at.</p><p>Another example was a Diplomacy-playing bot (Diplomacy is a boardgame set around World War One, similar to Risk but with more, well, diplomacy). It had a language model, a future simulator, and a strategy engine, with a control module written by humans in conventional code. When the researchers wanted it to not lie (lying is a core strategy in Diplomacy), they could simply tell it not to, because the system was transparent enough to enforce that constraint. Try doing that with a frontier LLM!</p><p>To quote Cal once again, &#8220;When you move away from &#8216;all we have is a language model,&#8217; you have multiple components. There&#8217;s a control module written by humans in normal computer code. Nothing unknown. Nothing learned. Nothing that evolves.&#8221; That transparency is worth more than we currently give it credit for.</p><p>Around 40% of production AI deployments now use what&#8217;s called a &#8220;hybrid router&#8221; pattern: route the easy queries (the 80 to 95% of requests that are straightforward) to small, efficient models, and only escalate the hard 5 to 20% to the expensive frontier models.</p><p>The real-world examples are compelling:</p><ul><li><p>Checkr, a background check company, replaced GPT-4 with a fine-tuned version of Llama-3-8B (an open-source model from Meta with 8 billion parameters). Result: 90% accuracy, five times cheaper, thirty times faster.</p></li><li><p>DoorDash achieved a 98% reduction in evaluation turnaround time and 90% fewer hallucinations by switching to SLM-powered systems.</p></li><li><p>Microsoft&#8217;s Phi-4-reasoning model, at 14 billion parameters, achieves capabilities comparable to models fifty times its size.</p></li><li><p>DeepSeek R1 achieved performance on par with OpenAI&#8217;s o1 model while only activating 37 billion of its 671 billion total parameters. 
That&#8217;s a 94% reduction in active compute.</p></li></ul><p>Researchers are calling this trend the &#8220;Densing Law&#8221;: capability density (how much a model can do per parameter) roughly doubles every three and a half months. The same performance keeps becoming achievable with fewer and fewer resources.</p><h2>To quote AI&#8217;s favourite sentence: &#8220;Why this matters&#8221;</h2><p>If the future really is specialised models running on smaller hardware, the barrier to entry for building useful AI drops dramatically. You don&#8217;t need a hundred-billion-dollar data centre. You don&#8217;t need NVIDIA&#8217;s latest supercomputer cluster.</p><p>A 3-billion-parameter medical model recently outperformed open-source models up to three times its size on medical diagnostic tasks by 25%. Enterprise on-premise AI (companies running their own AI in-house rather than paying for cloud access) grew from 12% adoption in 2023 to 55% in 2025. That&#8217;s a massive shift in just two years.</p><p>The MMLU benchmark (a standard test used to compare language model capabilities) showed the gap between frontier and open-source models narrowing from 17.5 percentage points to just 0.3 in a single year. One analyst prediction from 2025 puts it starkly: &#8220;By 2027, frontier models will not be strategic assets. They will be commodities.&#8221;</p><p>That&#8217;s a world where smaller companies (industry specialists, healthcare providers, legal firms, education platforms) can build AI tools that genuinely work for their specific needs without paying rent to one of three or four tech giants. 
That&#8217;s a more competitive, more innovative, and frankly more interesting market.</p><h2>But here&#8217;s the complication</h2><p>Of course, this reality hasn&#8217;t escaped the big AI companies. If the future is orchestration and specialised models, who&#8217;s actually best positioned to build that infrastructure?</p><p>Google already offers Gemini 3 Pro at the top, Flash and Flash-Lite in the middle, and Gemma (open-source, ranging from 1 billion to 27 billion parameters) at the bottom. OpenAI has o3 for the hard problems and GPT-4.1-nano with fine-tuning for the cheap ones. Anthropic&#8217;s Haiku model runs at a third of the cost of its more capable Sonnet.</p><p>I mentioned regulatory capture before, and that&#8217;s really the idea here: put the other players out of business before the pivot to these smaller, lower-cost models, because the barrier to entry there is lower. The big AI companies can see this shift coming and are trying to lock in their advantages before it arrives.</p><p>There&#8217;s a real possibility that the orchestration layer, the thing that decides which model handles which task, becomes the new bottleneck. The models themselves might get commoditised, but the platform that coordinates them could end up just as concentrated as the current frontier.</p><p>My research brief for this article (yes, my brief to a large frontier AI model) put it in terms I keep coming back to: &#8220;The technology is democratising, but the business models are consolidating. Smaller players can now build competitive AI products, but the orchestration layer might end up controlled by the same big players.&#8221;</p><p>So the opportunity is real, but it&#8217;s not guaranteed. Whether the SLM revolution actually distributes power more broadly or just reshuffles which part of the stack the big companies control... that&#8217;s still being decided. 
And it depends partly on choices that businesses, regulators, and users make over the next few years.</p><h2>What do I think the reality will be?</h2><p>As with every technology revolution, I think things will be more insidious and nuanced than we initially imagine. The robotic revolution is quietly happening not through each of us having a humanoid robot in our homes, but through a bunch of specialised robots: a smart vacuum, a smart mower, and a washing machine that tells us when it&#8217;s done with the load. In the same way, I imagine AI will continuously embed itself into our daily lives through specialist applications, without us noticing. In fact, I feel like it&#8217;s been doing that already; your Facebook/Instagram/TikTok feed algorithm is built on the same multi-dimensional embedding and transformer tech that LLMs are.</p><p>Like everyone, I have my own views on where we&#8217;re at in this AI bubble and what its (inevitable?) collapse might mean. I might write about that soon.</p><p>However, I do think we&#8217;re going through a huge shift in the way we approach technology, knowledge and, most importantly, decision making. Do we want to have a say in how it shapes our industries, our workplaces, and the way we relate to information? Or do we let a handful of companies, led by people with very specific financial incentives, make those decisions for us (again)?</p><p>Hank Green said something that I think is worth ending on. He was talking about the tension between worrying about hypothetical future problems and dealing with what&#8217;s right in front of us. 
His conclusion: &#8220;What if dealing with the problems we have right now is the thing that makes the future better?&#8221;</p><p>I think he might be onto something.</p>]]></content:encoded></item><item><title><![CDATA[How does this AI thing even work]]></title><description><![CDATA[The maths behind the machine and why it loves to lie]]></description><link>https://carlhead.com/p/how-does-this-ai-thing-even-work</link><guid isPermaLink="false">https://carlhead.com/p/how-does-this-ai-thing-even-work</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Sat, 28 Jun 2025 21:48:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0c12a4c7-8a20-4db7-b7ed-2d36d4f25939_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>What do you mean by AI?</h1><p>AI is often used as a catch-all term. In reality, &#8220;artificial intelligence&#8221; can refer to anything from a simple pattern-matching program to a sophisticated learning system. 
AI is just an umbrella term for a wide range of loosely related things.</p><p>ChatGPT has little in common with software a bank might use to evaluate loans: they work fundamentally differently, are used for different purposes, and fail in different ways.</p><p>Compounding this, &#8220;AI-enabled&#8221; everything has become a marketing trope, with companies eagerly slapping the <em>AI</em> label on everything from refrigerators to grills, to pillows, mirrors, you name it. At CES last year, exhibitors touted an AI pillow that adjusts to reduce snoring, an app that uses AI to &#8220;translate&#8221; your baby&#8217;s cries, a cat door using AI vision to stop kitty from bringing in dead mice, and even a grill claiming to use AI for perfectly cooked steak. These examples range from genuinely innovative to completely absurd.</p><p>So, in summary, AI isn&#8217;t magic and isn&#8217;t one single thing. It&#8217;s a broad field of technologies, and calling something &#8220;AI&#8221; doesn&#8217;t tell you much until you dig into what&#8217;s actually going on under the hood.</p><p>The most prominent AI technology right now is generative AI. Generative AI is being applied everywhere from AI chatbots to image generators, voice synthesis, email writing, and so on. In fact, most news articles you read these days are at least partially AI-generated summaries of AP<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> news stories. Generative AI has exploded onto the scene and is demonstrating some amazing, disturbing and fascinating results across many, many industries.</p><p>I&#8217;ll leave the debates about ethics, morality, politics and the like to a future article, as today I thought it&#8217;d be beneficial to try to demystify the technology a bit. Edmund Burke said, </p><div class="pullquote"><p>&#8220;When you fear something, learn as much about it as you can. Knowledge conquers fear&#8221;. 
</p></div><p>I do think that&#8217;s true for everything in life.</p><h1>So, what is generative AI then?</h1><p>Generative AI generally runs on a technology called a GPT (Generative Pretrained Transformer). And it&#8217;s this technology that runs the world&#8217;s LLMs (Large Language Models), which I&#8217;ll focus on in this article. Other GPT models for images, video, music, and the like work much in the same way; but are a little more esoteric to understand.</p><p>So, in short, the large language models are built on a neural network architecture called a Transformer, which to an 80&#8217;s kid like me is quite amusing. <br><br>Autobots Assemble!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V2_t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V2_t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V2_t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V2_t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!V2_t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V2_t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg" width="1456" height="1117" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1117,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4097524,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://carlhead.com/i/167039955?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V2_t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V2_t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!V2_t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V2_t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5793897e-561a-4bf2-8e38-264d15511990_5038x3864.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So, these Transformers are Pretrained, which is a process of ingesting information, being prompted for responses and using 
positive reinforcement, tuning the model to give desirable responses. The scale of the training these days is mind-blowing: the latest models have basically ingested the text of the entire internet and are now greedily swallowing up all the videos on YouTube (through AI transcription, of course) as well as, controversially, the copyrighted works of all the world&#8217;s authors, musicians, actors, researchers and the like.</p><p>These Transformers are Generative, in that they take some input, run it through a process, and generate a novel output.</p><h1>Ok, but how does something like ChatGPT actually work?</h1><p>I&#8217;ve been reading into this, watching videos, reading research papers, articles and whatnot. It&#8217;s a heck of a lot of complex vector maths, but I think my lizard brain has absorbed enough to try to explain it.</p><p>At its core, all ChatGPT is trying to do is predict the next word (token), based on all of its training and on what you&#8217;ve fed it as a prompt.</p><p>In order for the model to give you a good response, along with your text prompt, an internal prompt (the system prompt) is fed in at the same time with a bunch of information and rules. It&#8217;ll read something along the lines of: &#8220;you&#8217;re a large language model called ChatGPT, your job is to predict what a helpful AI assistant would say in response to the user&#8217;s input&#8230;&#8221;. 
It&#8217;ll outline what the model can and can&#8217;t talk about, how answers should be structured, the tone of the answer and so forth.</p><p>Then it&#8217;ll feed all of that into the process.</p><p>The process goes like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JcgL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JcgL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 424w, https://substackcdn.com/image/fetch/$s_!JcgL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 848w, https://substackcdn.com/image/fetch/$s_!JcgL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 1272w, https://substackcdn.com/image/fetch/$s_!JcgL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JcgL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png" width="391" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:391,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22853,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://carlhead.com/i/167039955?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JcgL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 424w, https://substackcdn.com/image/fetch/$s_!JcgL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 848w, https://substackcdn.com/image/fetch/$s_!JcgL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 1272w, https://substackcdn.com/image/fetch/$s_!JcgL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5a743d5-f06d-48c8-8cb6-128a95d57642_391x411.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 
20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Tokenisation</h2><p>Very simply, during the training process the model took all the words it ingested and assigned them an ID. Complex or uncommon words might be split into pieces by the model, for &#8216;reasons&#8217;. For simplicity, we&#8217;ll say this makes them easier to deal with down the line.</p><h2>Embedding</h2><p>Each token is then converted into a vector (a list of numbers). This is called an embedding, and basically, it&#8217;s a way for the model to represent that token as a co-ordinate in a high-dimensional space. I like to break it down into fundamentals, so think of a graph with an x axis going horizontally and a y axis vertically. 
The embedding&#8217;s values place it at a position (a vector) on that graph:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9IjA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9IjA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 424w, https://substackcdn.com/image/fetch/$s_!9IjA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 848w, https://substackcdn.com/image/fetch/$s_!9IjA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 1272w, https://substackcdn.com/image/fetch/$s_!9IjA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9IjA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png" width="900" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://carlhead.com/i/167039955?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9IjA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 424w, https://substackcdn.com/image/fetch/$s_!9IjA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 848w, https://substackcdn.com/image/fetch/$s_!9IjA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 1272w, https://substackcdn.com/image/fetch/$s_!9IjA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039460d-9145-45af-abfd-f08ae9ebddee_900x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 
20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is that word&#8217;s position in the model. Now to complexify it again: those vectors aren&#8217;t represented in two dimensions, as I&#8217;ve done above, or even three; models these days have many, many thousands of dimensions or more.</p><p>Words with similar meanings end up with vectors that point in similar directions in the dimensional model. So, imagine if you will, that we have a vector for the word &#8216;king&#8217; and calculate the difference to the vector for the word &#8216;queen&#8217; (simple vector subtraction, giving an offset vector); this offset effectively represents the distance and direction between those two embeddings in the model. 
Now if we take the vector for the word &#8216;man&#8217; and add that offset from above, we&#8217;ll end up in the vicinity of the word &#8216;woman&#8217;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L4lf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L4lf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 424w, https://substackcdn.com/image/fetch/$s_!L4lf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 848w, https://substackcdn.com/image/fetch/$s_!L4lf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 1272w, https://substackcdn.com/image/fetch/$s_!L4lf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L4lf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png" width="900" height="900" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46099,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://carlhead.com/i/167039955?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L4lf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 424w, https://substackcdn.com/image/fetch/$s_!L4lf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 848w, https://substackcdn.com/image/fetch/$s_!L4lf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 1272w, https://substackcdn.com/image/fetch/$s_!L4lf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c5545c7-48d8-4a2d-a886-1f0ee5b25e10_900x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 
20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Hopefully that oversimplification helps you understand how words end up as multidimensional vectors in the semantic space of the model, and how relationships between words are represented mathematically.</p><h2>Attention</h2><p>This is where the &#8220;magic&#8221; happens. Obviously, language is full of nuance, subtlety and context. When I look at a word, it can have many different meanings, both formally and informally as slang, based on the wider context. So, if I said to an LLM, &#8220;I&#8217;m going to the Big Apple next week, tell me where I should eat.&#8221;, without strong contextual understanding it&#8217;d be quite confused and likely to tell me not to eat the pips because they can be toxic.</p><p>So, the attention mechanism allows the Transformer to take your whole prompt and focus on the wider context as it deals with those embeddings. 
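</p><p>The king/queen/man/woman offset idea from the embedding section can be sketched with toy NumPy vectors. Every number below is invented purely for illustration; real embeddings have thousands of dimensions learned during training:</p>

```python
import numpy as np

# Toy 4-dimensional embeddings -- all values invented for illustration.
king = np.array([0.9, 0.8, 0.1, 0.3])
queen = np.array([0.9, 0.2, 0.8, 0.3])
man = np.array([0.5, 0.9, 0.1, 0.1])
woman = np.array([0.5, 0.3, 0.8, 0.1])

# The difference between 'queen' and 'king' is an offset vector that
# captures a direction in the space (roughly "male -> female").
offset = queen - king

# Adding that offset to 'man' lands near 'woman'.
result = man + offset

def cosine_similarity(a, b):
    # 1.0 means the two vectors point in exactly the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(result, woman))
```

<p>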
If you remember from the step before, what the model now has is basically a string of numbers, each group representing an embedding&#8217;s vector.</p><p>Rather than reading a sentence word by word blindly, the model can look back (or even ahead, in some cases) at other words and decide which ones are important for the task at hand. In effect, for each word being generated, the model weighs the relevance of all other words in the input (and the part of the output generated so far).</p><p>Think of attention as the model&#8217;s way of deciding &#8220;what should I pay attention to?&#8221; when choosing the next word. So, from my example, &#8220;I&#8217;m going to the Big Apple next week, tell me where I should eat.&#8221;, the model should attend heavily to the phrase &#8220;Big Apple&#8221; to realise that the Big Apple refers to New York.</p><p>When you look at the maths, the attention steps take the vector values and multiply them by weights based on the other embeddings in the sequence, to build context.</p><h2>Layers of processing</h2><p>Depending on the model in play and the papers you&#8217;re reading, the next step in the model is called a feed-forward network or a Multi-Layer Perceptron (MLP); the attention heads, by contrast, are the parallel units inside the attention layer itself. What this layer of the model does is process each vector output from the attention above, to build more context for that vector. It&#8217;ll do this in parallel for all the vectors in the sequence.</p><p>You could think of this step as a little bit like asking a long list of questions about each vector, such as &#8220;is it an English word&#8221;, &#8220;is it a noun&#8221;, &#8220;is it a number&#8221;, &#8220;is it code&#8221; and so on.</p><p>It&#8217;ll then pass the updated vectors to another attention layer to be weighed against all the other vectors in the sequence again, building even more context, before going through the next MLP.</p><p>As the data passes through these layers, the model builds up more complex features. 
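</p><p>The weighing step above can be sketched as a single scaled dot-product attention calculation. This is a bare-bones toy with invented numbers, not the full multi-head machinery of a real Transformer:</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

# Three toy token embeddings (one row each), 4 dimensions -- invented numbers.
tokens = np.array([
    [0.1, 0.9, 0.2, 0.4],   # "Big"
    [0.2, 0.8, 0.1, 0.5],   # "Apple"
    [0.9, 0.1, 0.7, 0.2],   # "eat"
])

# Score each token's relevance to the last token ("eat") with dot products,
# scale by the square root of the dimension, then squash the scores into
# weights that sum to 1.
scores = tokens @ tokens[-1]
weights = softmax(scores / np.sqrt(tokens.shape[1]))

# The new, context-aware vector for "eat" is a weighted blend of every token.
context_vector = weights @ tokens

print(weights.round(2))
```

<p>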
Some attention heads might be tracking grammar, others factual associations, and so on. So, as it progresses through these layers, the vectors take on more and more &#8220;meaning&#8221;. Modern models have dozens of these layers, as well as some normalisation layers and other bits that aren&#8217;t relevant here.</p><p>I&#8217;ll pause here to remind you: there&#8217;s no intelligence happening here. All the model is doing is multiplying those vector values together based on weights in the model (from its training) and based on the information it&#8217;s getting at each attention head.</p><h2>Decoding</h2><p>Finally, after going through all those layers of context building, the model uses this processed information to predict a probability distribution for the next token (or word). Again, it&#8217;s just a vector maths calculation based on the final value of the final vector in the sequence.</p><p>Essentially it asks: &#8220;Given everything I&#8217;ve processed so far, what is the most likely next word?&#8221;. It then samples from that distribution with a little randomness (controlled by a setting called temperature) rather than always picking the single most likely token, to give more variability and creativity in its responses.</p><p>In my example above, it might predict the next token to be the word &#8216;That&#8217;s&#8217;. It&#8217;ll append that to your prompt and run the whole thing through the process, from tokenisation through to decoding, and get a prediction for the next word. It&#8217;ll do that repeatedly until it determines that the response is complete. This is actually what ChatGPT gave me when I asked it this question:</p><p><em>That&#8217;s great, Carl &#8212; New York City has one of the most diverse and exciting food scenes in the world. 
Here's a curated list of places worth checking out, split by category to suit different moods or occasions.</em></p><p>It then proceeded with a curated list of restaurants that suit my tastes (I&#8217;m a vegetarian and like fine dining).</p><p>It&#8217;s probably worth noting, then, that the newest models like ChatGPT aren&#8217;t just feeding in your prompt for context; they&#8217;re including a bunch of other information, which could be documents you&#8217;ve attached or information committed to internal memory, built up during the model&#8217;s other chats with you.</p><p>The new models are able to take enormous prompts. Just a few years ago, ChatGPT would only take a couple of paragraphs before the model lost context and effectively started &#8220;forgetting&#8221; the beginning of the prompt. These days, models are taking upwards of a million tokens, which is 4 or 5 Stephen King novels&#8217; worth. (You might find you run out of model context much quicker if you&#8217;re on a free version of your preferred LLM.)</p><p>In summary, an LLM turns your words into tokens, turns tokens into numerical embeddings (its way of &#8220;perceiving&#8221; language), then uses layers of attention-powered processing to figure out what comes next, one step at a time. It&#8217;s like a very sophisticated auto-complete that &#8212; thanks to training on massive data &#8212; &#8220;knows&#8221; how language flows and can produce results that often sound knowledgeable and coherent.</p><p>These models feel magical because they can produce text that reads as if a human wrote it. But they are not doing reasoning or understanding in a human way &#8211; there&#8217;s no intent or self-awareness. They predict text based on learned patterns. For instance, if during training the model saw many examples of Q&amp;A where a question starts &#8220;Why...?&#8221; and the answer often begins &#8220;Because...&#8221;, it learns that pattern. 
When you ask, &#8220;Why is the sky blue?&#8221;, it has seen text about Rayleigh scattering and atmospheric particles and will assemble a plausible answer drawing on that training data. The impressive part is the generalisation: it wasn&#8217;t explicitly programmed with rules for every question; it learned a broad statistical picture of language and world facts, which it can apply to new queries.</p><p>Also, generative models can create things they&#8217;ve never seen word-for-word. They might recombine ideas, or phrase things in a novel way &#8211; that&#8217;s part of why it&#8217;s called &#8220;generative&#8221; and not just &#8220;repetitive.&#8221; This generative ability is double-edged: it allows creativity, but as we&#8217;ll see next, it also means the model can generate wrong information that sounds perfectly confident.</p><h1>Why LLMs hallucinate</h1><p>Anyone who has used AI chatbots or other generative AI has likely encountered a startling phenomenon: AI makes things up. In AI lingo, these false yet confident outputs are called &#8220;hallucinations.&#8221;</p><p>If you think back to the maths-based prediction above, you can start to see why hallucinations might happen. In short, LLMs hallucinate because they lack a built-in sense of truth or reality. They only know statistical patterns in language.</p><p>LLMs are tuned with reinforcement learning, i.e. they&#8217;re &#8220;rewarded&#8221; for giving good responses, and researchers try to tune bad responses out of them. However, they&#8217;re not given any reward for not responding at all, and thus there&#8217;s an inherent drive to always give some response.</p><p>If you ask about an obscure topic that wasn&#8217;t well-covered in the model&#8217;s training data, the AI may &#8220;fill in the blanks&#8221; with something plausible sounding. 
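</p><p>One way to see why the model always says <em>something</em>: decoding turns the final layer&#8217;s raw scores into a probability distribution that must sum to one over real tokens, then samples from it. There is no built-in &#8220;abstain&#8221; outcome. A toy sketch, with a five-word vocabulary and scores invented for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["Paris", "London", "Rome", "Berlin", "Madrid"]  # invented vocabulary
logits = np.array([2.0, 1.0, 0.5, 0.2, 0.1])             # invented raw scores

def sample_next_token(logits, temperature=1.0):
    # Lower temperature sharpens the distribution (more predictable output);
    # higher temperature flattens it (more varied, "creative" output).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # The probabilities cover only real tokens and sum to 1, so the model
    # always emits *some* answer, however unsure it is.
    return vocab[rng.choice(len(vocab), p=probs)]

print(sample_next_token(logits, temperature=0.7))
```

<p>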
It doesn&#8217;t want to say, &#8220;I don&#8217;t know&#8221; (unless specifically trained to), so it fabricates an answer based on whatever related info it can infer.</p><h1>How to minimise hallucinations</h1><p>The more context a prompt has, the more likely a model is to end up with the correct embeddings and have attention calculate the best vectors to give you a better-quality answer. So, the more information you provide in your prompt and related attachments, the more likely the model is to find similar information from its training data and give you better results. I always tell people to stop using ChatGPT like a search engine, feeding it a few relevant keywords; you want to be as explanatory and verbose as possible to get the best results.</p><p>The next suggestion is to switch the model to web search mode or use a model that searches the web. That way, you&#8217;re relying on the model&#8217;s training to understand your question, then go online and read material that looks relevant. It&#8217;ll then give you those results nicely summarised, along with links to the sites, so you can cross-check yourself.</p><p>You can take that to the next level if you&#8217;re on the paid plan of ChatGPT, using its deep research mode to research a topic for you. It&#8217;ll take around ten minutes to come back with a detailed research paper with cited sources for the topic of your choice. It&#8217;ll also ask a number of relevant questions of you beforehand, to make the research as relevant as possible.</p><h1>How are AI developers reducing hallucinations in their models?</h1><p>Developers are building a whole new class of generative models with a greater degree of specificity, meaning they can be tuned to specific use cases rather than being generalists. One good example of this is called RAG (Retrieval-Augmented Generation). 
This is where a model is given access to a database of relevant information, and the model is trained to get answers from that body of knowledge only. If it doesn&#8217;t find an answer, it&#8217;ll simply respond with &#8220;I cannot find the answer in the data&#8221;.</p><p>Also on the topic of reducing model scope is the curation and quality of training data. Instead of training a model on all the information available online (with all the garbage therein), developers can train it on a smaller dataset of well-curated, known-accurate information.</p><p>There are also new methods of reinforcement learning and human feedback loops that ask humans to pick a preferred response, humans generally preferring accurate information. (You might have been asked by your favourite model to rate two responses; that&#8217;s what that&#8217;s for.)</p><p>Developers are also adding post-processing guardrails to models. Effectively, these are fact-checking modules that assess a model&#8217;s response and send it back if it&#8217;s incorrect or hallucinated.</p><p>It&#8217;s early days yet, and things are moving fast. So, as always, use AI with caution; it&#8217;s still perfectly capable of being as confidently wrong as I am on a bad day.</p><p>So, I&#8217;m sure you&#8217;ll see that in future, we&#8217;ll likely have more, smaller models which are purpose-built to fill specific needs. Many organisations are building protocols to enable agentic AI to talk to other agentic AI for this purpose. Google recently donated its A2A (Agent2Agent) protocol to the Linux Foundation for everyone to use. Anthropic has developed the open-source MCP (Model Context Protocol) to standardise the way AI models connect to external systems and tools. 
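</p><p>The RAG pattern described above can be sketched in a few lines. Everything here is hypothetical: the tiny dictionary stands in for a real vector database, and the final LLM call is deliberately omitted:</p>

```python
# A minimal sketch of Retrieval-Augmented Generation (RAG).
# The dictionary below stands in for a curated knowledge base; a real system
# would compare embedding vectors to find the most relevant passages.
KNOWLEDGE_BASE = {
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}

def retrieve(question: str) -> list[str]:
    q = question.lower()
    return [text for topic, text in KNOWLEDGE_BASE.items() if topic in q]

def answer_with_rag(question: str) -> str:
    passages = retrieve(question)
    if not passages:
        # Constrained to the knowledge base: refuse rather than guess.
        return "I cannot find the answer in the data."
    # A real system would now hand the passages plus the question to an LLM,
    # instructing it to answer only from that retrieved context.
    return passages[0]

print(answer_with_rag("What is your returns policy?"))
print(answer_with_rag("What colour is the sky?"))
```

<p>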
I&#8217;m sure I&#8217;ll be writing an article shortly about the connected future of agentic AI.</p><h1>Conclusion</h1><p>With some background and detail, I hope I&#8217;ve done a little to improve your understanding of how all this works under the hood. Making it a little less mysterious will probably make AI either more or less scary, depending on your view of high-dimensional vector mathematics.</p><p>I know I didn&#8217;t explain anything about neural networks, and I glossed over and oversimplified many other details. I&#8217;ll perhaps go a bit deeper into those points in future.</p><p>I was going to switch gears and explain why, although the technology is well understood, even AI researchers don&#8217;t fully understand what&#8217;s happening inside these models at a detailed level. However, this is already over 3,000 words long; I&#8217;ll wrap it here and encourage you to come back for my next instalment, &#8220;The Mystery in the Machine&#8221;.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://carlhead.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading in Carl's Head! 
Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://en.wikipedia.org/wiki/Associated_Press">Associated Press</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Is AI going to extort you?]]></title><description><![CDATA[AI is moving faster than any of us can keep up with and sometimes it feels a little too clever for comfort. In this piece, I share a few unsettling stories from the world of AI and share my views.]]></description><link>https://carlhead.com/p/is-ai-going-to-extort-you</link><guid isPermaLink="false">https://carlhead.com/p/is-ai-going-to-extort-you</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Sat, 24 May 2025 05:19:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e2c62498-d653-442c-9644-1255d2800c62_1024x529.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When I started this blog, I intended to write about a broad range of topics. I also planned to do extensive research into each one and publish my sources alongside. This is my first foray into a purely opinion piece, fuelled only by my thoughts and feelings after reading a handful of articles on a Wednesday morning, articles that, to me, epitomise the current state of technology.</p><p>I continue with my mild obsession with the emergence of AI in our everyday lives. 
I&#8217;ve been relentlessly consuming AI-related media with fascination and a little trepidation. Everyone and their dog has an opinion about where we are, what comes next, the threats, the benefits, and so on. (And yes, I too have opinions on all of this.)</p><p>What I&#8217;ve realised, though, is that by the time I&#8217;m reading an article about which model is better for what, it&#8217;s already outdated. AI is receiving truly ground-breaking investment from every relevant tech company and venture capitalist globally. Things are moving at a pace we&#8217;ve never seen before.</p><p>One morning this last week, I read a series of articles in quick succession that gave me pause and made me try to imagine where we&#8217;re going and how things are about to unfold. I&#8217;ll share the links at the end, in case you&#8217;re curious.</p><p>The first article was by yet another developer, David Gewirtz, who unlocked the power of &#8220;vibe coding&#8221; by adding an entirely new feature to his WordPress add-on using yet another new coding agent, Google&#8217;s Jules. He wrote these words:</p><blockquote><p>&#8220;Okay. Deep breath. This is surreal. I just added an entire new feature to my software, including UI and functionality, just by typing four paragraphs of instructions. I have screenshots, and I'll try to make sense of it in this article. I can't tell if we're living in the future or we've just descended to a new plane of hell (or both).&#8221;</p></blockquote><p>Two things struck me when reading his article. The first was my complete lack of surprise. I&#8217;m working with many developers who have shifted from writing a code function and letting AI write their code comments, to writing the comments and letting AI write the code functions. 
The second thing that struck me was the belligerent nature of Jules, insofar as it would propose a development plan to David, then go ahead and approve it, moving forward without giving him an opportunity to respond. &#8220;Ask for forgiveness, not permission&#8221; seems to be an innate behaviour in these AI agents and models.</p><p>The next article I read was from analyst Ruben Circelli. He gave Google&#8217;s Gemini a whirl, granting it access to his personal Gmail account (not by choice; it automatically integrates when you switch it on). What he found chilling was the depth of personal information Gemini immediately knew about him. It knew everything from who his Facebook friends were in 2009, to his first love, first crush, and so on. It even had the audacity to point out his character flaws by analysing the way he communicates with friends and family over email. Finally, he noted that when interacting with Gemini, it had started to adopt his own persona in conversations, responding to his prompts just as he would.</p><p>Finally, after contemplating my own life choices for a while, I went on to read about Anthropic&#8217;s new Claude Opus 4 model, which is in pre-release testing. The AI safety engineers (yes, that&#8217;s now a job title that exists) were doing their general pre-release safety testing. They asked Opus to act as an assistant for a fictional company and gave it access to the company emails. Within those emails were communications between engineers indicating they&#8217;d be replacing Opus 4 with another system. There were also emails that revealed the engineer spearheading the change was cheating on their spouse. What they found was that in these scenarios, Opus 4 would try to blackmail the engineers 84% of the time, especially if the replacement model didn&#8217;t share its values. 
Notably, it did attempt other &#8220;ethical&#8221; avenues first, like flattery and pleading.</p><p>It seems, then, that we&#8217;ve let loose agents and models that can be belligerent, spiteful, and potentially dangerous. And at the same time, we continue to give them access to more information and grant them more freedom and control over our daily lives.</p><p>So, as I sat there perplexed by the many disturbing realities of 2025 (AI being just one of them), I decided to write a short piece on my feelings about all of this.</p><p>I&#8217;ve been considering how my friends, family, colleagues, and peers might respond to my AI musings. Some will be angry, some will be concerned, some will be dismissive, and a few will be defensive.</p><p>What I&#8217;ve found most consistently, though, is that there isn&#8217;t a very strong understanding of how AI actually works, technically. There&#8217;s a frequent assertion: why not just program the system to do x and not do y? Therein lies the rub, though; there is no conventional way to program AI models, especially the transformer models we&#8217;re usually referring to these days when we talk about AI. In fact, researchers themselves are desperately trying to understand how the models work. You may wonder how it&#8217;s possible that we&#8217;ve built the technology but don&#8217;t really understand how it works.</p><p>To try to demystify AI a bit, I&#8217;ve been doing some deep research into the technology, the maths, the models, and everything else that makes these systems tick. I&#8217;m penning another article and will do my best to simplify what I&#8217;ve learned into something that&#8217;s understandable to those of us who don&#8217;t have a PhD in maths and a master&#8217;s in computer science.</p><p>As for my views on the craziness that&#8217;s unfolding in the world around us, well, I&#8217;ll treat it like I do everything else. 
I look to be as informed as I can be about all things, try to understand both sides of every argument, and try to understand the intentions behind every action. Hopefully then, there will be some path of clarity that will allow me to make good decisions and, just maybe, to offer some good advice.<br><br>Articles in question:</p><ul><li><p><a href="https://www.zdnet.com/article/i-let-googles-jules-ai-agent-into-my-code-repo-and-it-did-four-hours-of-work-in-an-instant/">https://www.zdnet.com/article/i-let-googles-jules-ai-agent-into-my-code-repo-and-it-did-four-hours-of-work-in-an-instant/</a></p></li><li><p><a href="https://au.pcmag.com/ai/111169/i-gave-gemini-access-to-my-gmail-and-it-weirds-me-out/">https://au.pcmag.com/ai/111169/i-gave-gemini-access-to-my-gmail-and-it-weirds-me-out/</a></p></li><li><p><a href="https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/">https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/</a></p><p></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Responsibly adopting AI to unlock the value in your data]]></title><description><![CDATA[Balancing the promise of AI with practical safeguards for your data]]></description><link>https://carlhead.com/p/responsibly-adopting-ai-to-unlock</link><guid isPermaLink="false">https://carlhead.com/p/responsibly-adopting-ai-to-unlock</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Mon, 28 Apr 2025 09:39:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a83985f1-baf8-496e-ac12-0b28dccd26b2_2538x1400.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>The value proposition</h1><p>One of the areas I focus on when working with businesses on their digital transformation objectives is knowledge management. 
I've seen first-hand how powerful it is when organisations make their knowledge accessible to their teams and customers. Making relevant information immediately accessible where it's needed significantly improves both the user and customer experience.</p><p>Here are a few ways we can use AI to improve data discoverability, accessibility and usability (we call this data democratisation).</p><h2>Simplifying search and discovery</h2><p>Modern AI-powered search engines don't just index files; they understand the context behind queries. Natural language processing (NLP) enables staff to ask questions in plain English (&#8220;Show me last quarter's sales by region&#8221;) and get relevant, accurate answers drawn from across multiple systems. This reduces reliance on technical experts and empowers staff to find the information they need independently.</p><h2>Automating data interpretation</h2><p>AI can summarise complex datasets, highlighting key trends, anomalies, and insights without requiring users to dive into pivot tables or write SQL queries. Tools like automated reporting and AI-driven dashboards transform raw data into clear, digestible summaries, making analytics accessible even to non-technical team members.</p><h2>Democratising predictive insights</h2><p>Machine learning models are no longer just for data scientists. Many AI platforms now offer user-friendly predictive analytics, allowing teams to forecast trends, identify risks, and spot opportunities with minimal training. This gives marketing teams, sales staff, and operational managers an edge in making proactive, data-driven decisions.</p><h2>Personalising data experiences</h2><p>AI can tailor the way data is presented based on an individual's role, preferences, and past behaviour. Rather than sifting through irrelevant reports, employees receive personalised dashboards and alerts highlighting what matters most to their job. 
This improves engagement and drives smarter, faster action.</p><h2>Reducing information overload</h2><p>Ironically, more access to data can sometimes mean more confusion. AI helps by prioritising and filtering information, focusing attention on the most critical metrics or changes. Instead of being overwhelmed by hundreds of KPIs, staff are guided to the insights that matter most.</p><p>Here are some reference case studies outlining innovative ways that businesses are leveraging AI to get results from their data: <a href="https://carlhead.com/p/ai-powered-data-insights-and-accessibility">AI-Powered Data Insights and Accessibility: Case Studies</a></p><h1>In practice</h1><p>Recently we implemented a case management solution for a customer who deals with a very large volume of varying and complex cases.</p><p>Previously, customers had to wade through hundreds of pages of technical documents and policies to find the information they needed, often leading them to log a case just to get clarification.</p><p>Once a case was logged, it had to be manually triaged and assigned to the correct team. Then the consultant handling the case had to dig through the same dense material, along with past cases, to find the right answers, a slow and frustrating process for everyone involved.</p><p>We tackled these challenges in a few simple ways, mostly leveraging natural language processing (NLP), and implemented the following:</p><ul><li><p>On the website, there's now the ever-more-popular chatbot, which can answer questions from customers in natural language and, as a bonus, do so in almost any language. 
It's trained only on the customer's knowledge articles and, to avoid the risk of hallucination, will automatically create a case rather than guess at an answer.</p></li><li><p>When a customer logs a case, the system scans knowledge articles in real time and presents relevant excerpts as they type, deflecting cases before they are submitted.</p></li><li><p>Cases are automatically triaged and assigned to the correct team based on their content.</p></li><li><p>Consultants working on cases are automatically presented with relevant knowledge articles and similar past cases. The AI also helps populate responses and summarises communication history as they work.</p></li></ul><p>The result? Dramatic improvements across the board: better customer experiences, faster case resolution, reduced workloads for consultants, and higher overall case quality.</p><h1>The risks</h1><p>Now that we've seen all of the benefits we can glean, let's talk about the risks.</p><p>In the rush to embrace AI, it's easy to overlook a simple truth: the more you feed it, the more it knows, and sometimes, it knows far more than you intended. From sales forecasting tools to virtual assistants that cheerfully summarise your inbox, AI is quietly threading itself through the fabric of our businesses.</p><p>The pitch is simple and seductive: give AI access to your data and, in return, it will make life easier for you and your team. And sometimes, AI genuinely delivers on that promise. But the part that often gets lost between the glossy demo and the dotted line is this: once AI has access to your business data, regaining full control isn't as straightforward as it seems.</p><p>We've been here before, in a way. Those of us who remember the early days of email servers or "open file shares" (the Wild West days of network security) know that convenience often came at the price of some "oops" moments involving sensitive documents. 
Today's AI tools are turbo-charged versions of those lessons, and the stakes are much higher.</p><p>Before we grant AI unfettered access to our sensitive data, it's worth taking a moment to understand what really happens once it has that access. Not to slam the brakes on innovation, but to make sure we're still the ones driving the car.</p><h2>What&#8217;s actually happening behind the scenes</h2><p>Most modern AI tools are only as powerful as the information you allow them to access. They thrive on data: the more historical transactions, customer emails, support tickets, inventory lists, and financial reports you feed them, the sharper their insights and suggestions become.</p><p>The catch? The boundaries between "necessary access" and "too much access" blur very quickly. What starts as simply "connecting to the case management system" might quietly evolve into the AI hoovering up emails, file shares, contracts, and meeting notes, because "it helps make better recommendations."</p><p>The kicker is that many AI systems aren't designed to "forget" easily. Once data is ingested, it's very hard to cleanly retract or redact it. Worse still, if the AI is linked into external cloud services, or if it's learning from your inputs to improve itself, you may not have full visibility into where that data is stored, copied, or even processed.</p><p>In short, handing your AI the keys to a few filing cabinets might soon look more like inviting it to make itself at home in your entire office.</p><h2>What to watch for</h2><p>As AI becomes more embedded in everyday operations, the risks that businesses face are evolving too. Some are obvious, but others creep in quietly until they are suddenly a much bigger problem than expected.</p><p>Here are the major areas where I've seen risks emerge:</p><ul><li><p><strong>Data leakage</strong>: AI systems can inadvertently expose confidential information. 
This could happen through overly helpful "auto-complete" features, shared outputs, or predictive suggestions that draw on sensitive internal data.</p></li><li><p><strong>Over-permissioning</strong>: It often feels easier to grant AI systems broad access "just in case." The trouble is, once you open the gates, monitoring and limiting that access later can be surprisingly tricky.</p></li><li><p><strong>Training data risks</strong>: If your data is being used to train models (especially in vendor-managed environments), you could lose ownership or control over parts of your intellectual property.</p></li><li><p><strong>Compliance and regulatory breaches</strong>: Laws like GDPR, HIPAA, and your country-specific Privacy Act place strict conditions on how personal and sensitive data can be handled. AI systems working across jurisdictions can inadvertently cause violations without anyone realising until it is too late.</p></li><li><p><strong>Third-party vendor risk</strong>: Many AI solutions are built or hosted by third parties. If you don't have clear contractual controls over how your data is used and protected, you are exposing your business to another layer of vulnerability.</p></li><li><p><strong>Stealth AI</strong>: Applications that are introduced into your business operations without formal oversight or clear governance. This can include employees using AI-powered tools or plugins without IT or management approval, often because they are trying to work more efficiently.</p></li></ul><p>I wouldn't say that these risks are reasons to avoid AI altogether. 
But they are strong arguments for treating AI deployments with the same caution and governance you would apply to handing over your financial records, client files, or strategic plans to a new employee or vendor.</p><p>Consider a simple example: you grant AI access to your entire document management system, unaware that Jeff from payroll has stored an Excel file listing employee salaries in a folder he assumed was private. Normally, this sort of slip-up might go unnoticed. But now, thanks to AI's ability to scan and reference everything it can access, any staff member could casually ask, "Who earns what around here?" and get an answer.</p><p>Of course, the implications get far bigger, more complex, and frankly, scarier when the same AI agents are used to power public-facing tools like website chatbots and self-service knowledge bases.</p><h2>How we mitigate these risks</h2><p>It pays to be deliberate about how you roll out AI. Here are some practical steps I'm seeing successful businesses take:</p><ul><li><p><strong>Apply the principle of least privilege</strong>: Only give AI systems access to the minimum data they need to perform the tasks you actually want automated. Resist the temptation to "just give it everything" for convenience.</p></li><li><p><strong>Know your data</strong>: Understand what sensitive information you have, where it lives, and who currently has access to it. You cannot protect what you don't know exists.</p></li><li><p><strong>Classify and tag sensitive information</strong>: Make it easier to automatically restrict or alert when AI tools are handling high-risk data types, like customer personally identifiable information (PII), payroll records, or trade secrets. Using <strong>sensitivity labels</strong> on documents is a popular approach to achieving this.</p></li><li><p><strong>Choose AI vendors carefully</strong>: Scrutinise contracts to understand how vendors handle your data. 
Ask the awkward questions: Is my data used to train broader models? Is it stored outside my jurisdiction? Who has access to it?</p></li><li><p><strong>Implement clear internal policies</strong>: Make sure staff are trained on responsible AI usage. Often, the biggest breaches happen not because of malicious intent, but because people simply don't realise how powerful the tools they are using have become.</p></li><li><p><strong>Build in monitoring and oversight</strong>: Just like you wouldn't hire a new employee without supervision, don't let AI tools operate without regular checks. Review what data is being accessed, how outputs are generated, and whether anything feels off.</p></li></ul><p>Treat AI like you would treat any new team member: with opportunity, yes, but also with boundaries, checks, and accountability. A little scepticism today can save a lot of headaches tomorrow.</p><h1>Closing thoughts</h1><p>AI is no longer an experiment running in the background. It is becoming a business-critical tool, and with that comes a new kind of responsibility. If businesses treat AI with the same rigour they apply to financial reporting, cybersecurity, and customer trust, the benefits will be enormous.</p><p>Ignoring the risks, or worse, assuming someone else is managing them, is a recipe for painful lessons down the line. But facing those risks with open eyes, strong guardrails, and a culture that values both innovation and caution? 
That is how businesses will unlock the true value of these tools.</p><p>As always, a little thoughtfulness now means a lot less regret later.</p><p>This is an ever-evolving area that&#8217;s become a keystone of many of the solutions I&#8217;m implementing with my customers, so I&#8217;m sure it&#8217;s a topic that I&#8217;ll be revisiting as new insights emerge.</p><p>Stay tuned.</p>]]></content:encoded></item><item><title><![CDATA[AI-Powered Data Insights and Accessibility: Case Studies]]></title><description><![CDATA[Getting insights from data using AI]]></description><link>https://carlhead.com/p/ai-powered-data-insights-and-accessibility</link><guid isPermaLink="false">https://carlhead.com/p/ai-powered-data-insights-and-accessibility</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Mon, 28 Apr 2025 09:28:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AUl-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5de018ef-4eba-404d-8b7b-0e07b80efd82_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is a reference article for my main article: <a
href="https://carlhead.com/p/responsibly-adopting-ai-to-unlock">Responsibly adopting AI to unlock the value in your data</a></p><p>The recent explosion of artificial intelligence (AI) technologies has transformed how organisations across industries access and utilise data. AI's unprecedented ability to synthesise vast datasets into actionable insights has empowered businesses to enhance decision-making, streamline operations, and dramatically improve user experiences. Let's explore several recent case studies demonstrating how AI has become pivotal in unlocking data&#8217;s full potential.</p><h4>Finance: Morgan Stanley&#8217;s Intelligent Assistants</h4><p>Morgan Stanley has adopted AI to empower their financial advisers and researchers through intelligent digital assistants, including their AI @ MS Assistant and AskResearchGPT. Leveraging GPT-4 via Azure OpenAI, advisers can instantly retrieve summarised insights from over 70,000 internal research documents. Advisers can now access deep, context-rich insights in seconds rather than hours, enhancing productivity and enabling more informed client service.</p><h4>Fintech: Intuit&#8217;s Generative AI for Small Businesses</h4><p>Intuit has revolutionised financial management for small businesses with its "Intuit Assist" generative AI, integrated within QuickBooks. This AI-driven solution automates routine financial tasks such as invoice generation and overdue payment reminders. Businesses utilising this tool saw overdue invoices paid 45% faster and experienced a 10% increase in payments made in full, significantly enhancing cash flow and operational efficiency.</p><h4>Healthcare: Mayo Clinic&#8217;s Clinical Knowledge AI</h4><p>In collaboration with Google Cloud, Mayo Clinic introduced a generative AI-powered search tool designed to help clinicians rapidly access and interpret extensive medical records. 
Utilising Google&#8217;s medical-domain-specific large language models (LLMs), clinicians can conversationally query complex patient histories, imaging results, and genomics data, significantly reducing administrative burdens and enabling quicker, more informed clinical decisions.</p><h4>Retail: Instacart&#8217;s Conversational AI Assistant</h4><p>Instacart introduced "Ask Instacart," a GPT-powered conversational AI integrated into its shopping platform. This tool allows users to ask natural language questions, receiving personalised grocery and recipe suggestions in real-time. By intelligently tapping into Instacart&#8217;s extensive product catalogue and user preferences, Ask Instacart significantly improves the user shopping experience, encourages product discovery, and enhances customer engagement.</p><h4>Manufacturing: Siemens&#8217; Industrial AI Copilot</h4><p>Siemens, in partnership with Microsoft, created the Siemens Industrial Copilot, an AI-powered assistant for industrial automation and engineering. Using Azure OpenAI&#8217;s GPT-4, engineers can quickly generate complex automation code and resolve technical issues instantly. At thyssenkrupp Automation Engineering, one of Siemens&#8217; early adopters, engineers reported dramatically reduced downtime, accelerated productivity, and greater operational efficiency due to the Copilot&#8217;s instant, expert-level support.</p><h3>Conclusion</h3><p>These diverse examples from finance, healthcare, retail, fintech, and manufacturing highlight a common, transformative trend: businesses across industries are successfully using AI to turn previously inaccessible or complex data into clear, actionable insights. 
Organisations that leverage AI-powered tools experience measurable improvements in productivity, decision-making, and customer satisfaction.</p>]]></content:encoded></item><item><title><![CDATA[Podcast: Is AI the death of UI?]]></title><description><![CDATA[Exploring how agentic AI is transforming user experience, reshaping business applications, and challenging the very idea of what an interface should be.]]></description><link>https://carlhead.com/p/is-ai-the-death-of-ui-b90</link><guid isPermaLink="false">https://carlhead.com/p/is-ai-the-death-of-ui-b90</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Wed, 16 Apr 2025 03:44:53 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/161370798/476f3045439219018f1988b1066386c6.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>In this episode, we&#8217;re diving into a provocative question: <em>Is AI the death of UI?</em> As agentic AI systems step up to act on our behalf, traditional interfaces are quietly stepping aside. I explore what this means for how we work, what industry leaders like Satya Nadella, Sam Altman, and Jensen Huang are seeing in the shift, and how real-world tools like Microsoft Copilot and Salesforce Einstein are already changing user experience. We&#8217;ll talk about the benefits, the new challenges, and what it takes to design systems people can trust. 
This isn't the end of the user interface, it's the beginning of something very different.</p>]]></content:encoded></item><item><title><![CDATA[Is AI the death of UI?]]></title><description><![CDATA[When the interface disappears, who&#8217;s really in control?]]></description><link>https://carlhead.com/p/is-ai-the-death-of-ui</link><guid isPermaLink="false">https://carlhead.com/p/is-ai-the-death-of-ui</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Tue, 15 Apr 2025 23:23:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Dh6r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dh6r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dh6r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 424w, https://substackcdn.com/image/fetch/$s_!Dh6r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 848w, https://substackcdn.com/image/fetch/$s_!Dh6r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Dh6r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dh6r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png" width="1024" height="608" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1a9933a-a16c-4482-928e-bd5319211623_1024x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:608,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dh6r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 424w, https://substackcdn.com/image/fetch/$s_!Dh6r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 848w, https://substackcdn.com/image/fetch/$s_!Dh6r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Dh6r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a9933a-a16c-4482-928e-bd5319211623_1024x608.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Death of the UI</figcaption></figure></div><h1>A new interface era</h1><p>User interfaces used to be pretty straightforward. You pointed, clicked, maybe dragged a file or two. Then along came smartphones, and we got used to swiping and tapping our way through life. 
Now we&#8217;re entering a new phase where you don&#8217;t interact with software so much as you <em>talk to it</em>.</p><p>With the rise of agentic AI, systems that can reason, act, and adapt on your behalf, the way we interact with technology is being quietly but radically rewritten. Tools like Microsoft&#8217;s Copilot, OpenAI&#8217;s GPTs, and Salesforce&#8217;s Einstein don&#8217;t just sit there waiting for input. They suggest, act, and sometimes even decide. The interface isn&#8217;t just evolving; in many cases, it&#8217;s vanishing altogether.</p><p>So, is this the death of UI? Or just its next transformation?</p><h2>What is agentic AI? (And why everyone&#8217;s talking about it)</h2><p>Agentic AI is what happens when software stops waiting for instructions and starts making decisions. It&#8217;s not just a clever autocomplete. It&#8217;s a system that perceives what&#8217;s happening, reasons through it, and takes action, often without needing to be micromanaged.</p><p>Jensen Huang, CEO of NVIDIA, describes agentic AI as software that doesn&#8217;t just retrieve data but &#8220;perceives, reasons, and acts on your behalf.&#8221; That could mean anything from writing a report, to booking a trip, to managing a logistics pipeline. It&#8217;s less like a digital tool and more like a very proactive colleague.</p><p>Microsoft&#8217;s CEO, Satya Nadella, sees this as a seismic platform shift. At Microsoft Ignite, he described a &#8220;tapestry of AI agents&#8221; woven through our working lives. According to him, we&#8217;re moving toward software that understands context, uses tools on your behalf, and remembers what you care about. His take? Thirty years of change are being compressed into three. And at the heart of it is this move from forms and fields to natural language and intent.</p><p>Sam Altman, from OpenAI, sees agentic AI as the new universal interface. In his words, &#8220;talking to a computer has never felt really natural... 
now it does.&#8221; He envisions AI agents joining the workforce, handling tasks through conversation, whether that&#8217;s ordering groceries, drafting emails, or handling customer support. For Altman, natural language is no longer just an input method. It&#8217;s the whole interface.</p><p>So why does all this matter? Because it challenges the very foundation of how we interact with digital systems. Instead of learning software, users just tell the AI what they want. The system works it out. It&#8217;s a shift from control to delegation, from buttons and menus to dialogue.</p><p>That might sound like progress. But it also changes the rules. When the AI acts on your behalf, are you still in control? And what happens when it gets it wrong?</p><p><strong><a href="https://carlhead.com/p/agentic-ai-case-studies">Here&#8217;s a link to some of the case studies, as they&#8217;re implemented into real-world software today.</a></strong></p><h2>We've been here before: a brief history of interface shifts</h2><p>The idea of AI replacing your interface might sound radical, but it&#8217;s not the first time users have had to rethink how they interact with machines. If anything, agentic AI is just the next stop on a long and bumpy road.</p><p>Here&#8217;s a quick look in the rear-view mirror:</p><h4>Command line to GUI (1980s&#8211;1990s)</h4><p>Text-based commands gave way to windows, icons, and menus. Suddenly, you didn&#8217;t need to remember cryptic syntax; just click around until you found what you needed.</p><ul><li><p><strong>Impact</strong>: Opened up computing to a broader audience.</p></li><li><p><strong>Pushback</strong>: Power users grumbled about losing precision and speed.</p></li></ul><h4>Desktop to web (Late 1990s&#8211;2000s)</h4><p>Software moved into browsers. Everything became &#8220;just a link&#8221; away, accessible from anywhere. 
But it also meant slower performance and unfamiliar interaction patterns.</p><ul><li><p><strong>Impact</strong>: Lower barrier to entry, global access.</p></li><li><p><strong>Pushback</strong>: Loss of control, reliance on connection speeds, and the beginning of the work-anywhere culture.</p></li></ul><h4>Web to mobile (2000s&#8211;2010s)</h4><p>Touchscreens replaced mouse clicks. Apps were stripped down and optimised for fingers rather than keyboards. Context-aware features like GPS became standard.</p><ul><li><p><strong>Impact</strong>: Made computing more personal and portable.</p></li><li><p><strong>Pushback</strong>: Smaller screens, limited functionality, and constant notifications; we&#8217;re now expected to be always available.</p></li></ul><h4>Traditional UI to agentic AI (now)</h4><p>The interface is no longer visual; it&#8217;s conversational. You don&#8217;t browse for tools; you ask for outcomes. The system decides how to get there.</p><ul><li><p><strong>Impact</strong>: Removes friction for users who know what they want.</p></li><li><p><strong>Pushback</strong>: Reduced transparency, less control, and new forms of error.</p></li></ul><p>Each shift promised more simplicity but introduced new mental models. The mouse was intuitive&#8230; once you&#8217;d used one. The web was easier&#8230; once you figured out how links worked. Mobile apps felt natural&#8230; once you got used to touchscreens. Agentic AI is no different. Once the unfamiliarity wears off, the experience might just feel like magic.</p><p>But as always, the real test is how it plays out for the people using it day to day. Let&#8217;s get into that next.</p><h2>The end-user experience: smarter, simpler... or just confusing?</h2><p>For end users, agentic AI can feel like a revelation or a riddle. It depends a lot on the context, the execution, and the person behind the keyboard. 
Here&#8217;s how things are shaking out so far.</p><h3>What gets better</h3><p><strong>Natural interaction</strong><br>Talking to software in plain English is a lot easier than learning a new system. As Don Norman, long-time champion of human-centred design, argued, technology should adapt to people, not the other way around. Conversational AI gets us closer to that ideal.</p><p><strong>Less busywork</strong><br>AI agents can summarise documents, write first drafts, analyse spreadsheets, and schedule meetings. That lets users focus on decision-making, rather than hunting for the right menu or exporting a report for the fifth time.</p><p><strong>Easier access to insights</strong><br>In tools like Excel, Copilot can surface trends or outliers you might never have spotted.</p><p><strong>Consistency across tools</strong><br>If your AI assistant works in your email, your calendar, your documents, and your CRM, that&#8217;s one less thing to learn. This consistency across platforms (like Microsoft&#8217;s Copilot or Salesforce&#8217;s Einstein) helps reduce friction and flattens the learning curve.</p><p><strong>Room for personalisation</strong><br>Over time, these agents start learning your preferences. Whether it&#8217;s how you like to communicate or what data you care about most, the potential for tailored support is high.</p><h3>What gets complicated</h3><p><strong>Trust and accuracy</strong><br>This is the big one. When the AI gets it right, it&#8217;s brilliant. When it gets it wrong, the confidence with which it delivers can be dangerous. Users need ways to verify information, trace sources, and understand how the AI came to a conclusion. Otherwise, there&#8217;s a risk of blind trust in bad answers.</p><p><strong>Loss of control and visibility</strong><br>Traditional interfaces make options visible: menus, buttons, checkboxes. With AI, the possibilities are invisible until you ask. That can be freeing or disorienting. 
If the system acts on its own, users might wonder, &#8220;Why did it do that?&#8221; Designers need to offer clear explanations and let users take the wheel when it matters.</p><p><strong>Privacy and data sensitivity</strong><br>AI agents often rely on sensitive data to do their jobs. That raises legitimate concerns about what they can access, where that data goes, and who&#8217;s watching. Transparency here isn&#8217;t just nice to have; it&#8217;s critical. (This is a big enough topic that I&#8217;ll be diving into it in more detail in a separate article soon.)</p><p><strong>Shifts in job roles</strong><br>When AI takes over parts of your workflow, your role shifts. A support agent becomes a reviewer of AI-drafted responses. A marketer becomes more of an editor. This can be empowering, or it can feel like being reduced to quality control. As with any big tech shift, people need support to adapt, not just new tools.</p><p><strong>Learning curve</strong><br>Ironically, while talking is easy, <em>knowing what to say</em> isn&#8217;t always. Prompting an AI effectively is a skill. It&#8217;s not quite coding, but it&#8217;s not nothing either. End users are picking up new habits, learning to refine their requests and manage an unfamiliar relationship with software that sometimes pushes back.</p><p>I&#8217;ve been an expert in the technology space for many, many years. Generally, as new technologies emerge, I&#8217;ve found them easy to adopt. Yet I found the AI learning curve quite steep: decades of perfecting my Google-fu meant that changing the way I interact with the technology was genuinely challenging, and it took some time before I could reliably elicit good responses from AI.</p><h2>Designing for the invisible: new UX challenges and considerations</h2><p>If traditional UX design was about making the interface clear and easy to navigate, agentic UX is about making the <em>invisible</em> feel trustworthy and intuitive. 
That&#8217;s no mean feat.</p><p>Here are the big considerations designers and product teams now face.</p><h3>Trust doesn&#8217;t come built-in</h3><p>In the old world, trust came from seeing familiar patterns: save buttons, undo menus, dropdowns. With AI, users are asked to trust a system that might not show its logic or steps. Designers need to help users understand <em>why</em> the AI did something and give them ways to check its work.</p><p>That might mean:</p><ul><li><p>Showing the source of facts or recommendations</p></li><li><p>Offering confidence indicators or uncertainty prompts</p></li><li><p>Letting users easily reverse or edit AI-driven actions</p></li></ul><h3>Clarity in ambiguity</h3><p>Natural language is flexible. That&#8217;s a gift and a problem. A request like &#8220;Send the latest sales report to John&#8221; could mean five different things depending on who John is, what &#8220;latest&#8221; means, and which report you&#8217;re thinking of.</p><p>Designers have to help the AI disambiguate gracefully, without throwing up a wall of clarifying questions. A well-tuned agent asks just enough to stay on track, without making you feel like you&#8217;re filling out a form in disguise.</p><h3>Human-in-the-loop is still key</h3><p>Delegation is powerful, but full automation isn&#8217;t always the goal. Users often want to stay involved, even if it&#8217;s just to double-check or tweak.</p><p>Good UX means:</p><ul><li><p>Keeping users informed before actions are taken</p></li><li><p>Offering previews, not just final results</p></li><li><p>Asking for approval in sensitive scenarios (e.g., sending an email, placing an order)</p></li></ul><p>It&#8217;s not about taking control away; it&#8217;s about giving it back at the right moments.</p><h3>Onboarding the user, not just the AI</h3><p>You can&#8217;t just drop a Copilot into someone&#8217;s workflow and expect magic. 
People need help understanding what the AI can do, how to ask for it, and what to expect in return.</p><p>That might include:</p><ul><li><p>Prompt suggestions and examples</p></li><li><p>Gentle nudges when users hesitate</p></li><li><p>Clear explanations when the AI can&#8217;t help</p></li></ul><p>The mental model shift, from controlling software to collaborating with it, needs to be supported with just as much care as any new product launch.</p><h3>Transparency around data use</h3><p>If the AI is drawing from your calendar, your documents, or your company&#8217;s internal systems, users need to know. Not in a vague &#8220;we value your privacy&#8221; kind of way, but in clear, practical terms.</p><p>Can you delete the history? Can you restrict what the AI sees? Is it learning from you, or just using your input for one-off tasks?</p><p>Trust and adoption often hinge on these answers.</p><p>This is a fundamentally different design challenge, less about pixel placement and more about shaping a human-agent relationship. In a way, UX is now part behavioural psychology, part diplomacy, and part co-pilot training.</p><h2>So, is UI dead? Or just getting out of the way?</h2><p>If this shift feels monumental, it&#8217;s because it is. We&#8217;re not just redesigning screens. We&#8217;re rethinking how people interact with software altogether.</p><p>Agentic AI isn&#8217;t killing the user interface. It&#8217;s making it less visible, more conversational, and in many cases, more useful. But that shift brings its own challenges: trust, control, clarity, and context. When the interface disappears, the real design work begins, because users still need to feel confident, capable, and in charge.</p><p>Industry leaders like Nadella, Altman, and Huang are aligned on one thing: this is the next big platform shift. But whether it empowers people or overwhelms them will come down to execution, not aspiration.</p><p>The early signs are promising. 
We&#8217;ve seen glimpses of real productivity gains, more natural interaction, and systems that feel more like collaborators than tools. But we&#8217;ve also seen hiccups, hallucinations, broken mental models, and trust gaps that designers are still racing to close.</p><p>As Don Norman has long argued, the goal isn&#8217;t to make tech smarter. It&#8217;s to make it work <em>for people</em>. Agentic UX should be measured not by how advanced the AI is, but by how well the human and machine work together.</p><p>We&#8217;ve been through interface revolutions before, from command lines to mobile apps. This one&#8217;s no different. It&#8217;ll take time, iteration, and a bit of patience. But if we get it right, we may find ourselves spending less time learning software, and more time just... getting things done.</p><p>Next up, I&#8217;ll be digging into one of the thorniest implications of this new world: <strong>data privacy and security in the age of AI agents</strong>. When the system sees everything, how do we make sure it doesn&#8217;t share too much?</p><p>Stay tuned.</p>]]></content:encoded></item><item><title><![CDATA[Agentic AI Case Studies]]></title><description><![CDATA[This isn&#8217;t all theory. 
Agentic AI is already out in the wild, embedded in tools many of us use every day]]></description><link>https://carlhead.com/p/agentic-ai-case-studies</link><guid isPermaLink="false">https://carlhead.com/p/agentic-ai-case-studies</guid><dc:creator><![CDATA[Carl Head]]></dc:creator><pubDate>Mon, 14 Apr 2025 09:47:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AUl-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5de018ef-4eba-404d-8b7b-0e07b80efd82_500x500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article is a reference article for my main article: <a href="https://carlhead.com/p/is-ai-the-death-of-ui">Is AI the death of UI?</a></p><h2>Microsoft 365 Copilot: Your new (sometimes clumsy) colleague</h2><p>Microsoft&#8217;s Copilot is baked into Word, Excel, Outlook, and Teams. It offers help via a side panel where you can ask it to draft, summarise, analyse, or create. Want a first draft of a presentation? Ask Copilot. Need an email summarised? Ask Copilot. It&#8217;s like having a junior analyst always ready to jump in.</p><p>The upside? It&#8217;s fast and helpful, especially when you&#8217;re staring at a blank slide or drowning in unread emails.</p><p>The catch? It still has a learning curve. For example, in Excel, Copilot struggled with basic table formatting at launch and required oddly specific setups to work smoothly. Some users were confused by the interface changes. Others didn&#8217;t trust the AI&#8217;s suggestions. 
And when it&#8217;s wrong, it&#8217;s wrong with confidence, which adds a layer of risk.</p><h2>Microsoft Dynamics 365 and the Rise of AI-First Business Applications</h2><p>While tools like Microsoft 365 Copilot have grabbed headlines, some of the most interesting use cases for agentic AI are playing out a little further behind the scenes inside Dynamics 365.</p><p>This is Microsoft&#8217;s line-of-business platform, covering everything from sales and customer service to finance and supply chain. It&#8217;s not flashy like ChatGPT, but for many organisations, it&#8217;s where the real work happens. And now, that work is being reshaped by embedded AI agents.</p><h3>AI in the Flow of Work</h3><p>Microsoft has steadily woven Copilot functionality into key Dynamics 365 modules. These agents aren&#8217;t just bolted on; they&#8217;re context-aware and task-specific. For example:</p><ul><li><p><strong>In Sales</strong>, Copilot can summarise meeting transcripts, suggest follow-up emails, or pull in relevant CRM data mid-conversation.</p></li><li><p><strong>In Customer Service</strong>, agents can suggest case resolutions, surface knowledge base articles, or even draft full responses based on ticket context.</p></li><li><p><strong>In Contact Centre</strong>, live sentiment analysis means the agent can automatically pull a supervisor into calls that turn hostile, as well as summarise call transcripts and automate next steps.</p></li><li><p><strong>In Finance and Supply Chain</strong>, Copilot can explain anomalies in financial reports, summarise forecast variances, or identify potential risks before they snowball.</p></li><li><p><strong>In Business Central</strong>, Copilot can summarise and search for information, giving users insights without having to navigate and filter data; Microsoft is also in the process of releasing two AI agents focused on processing sales and purchase transactions autonomously.</p></li></ul><p>The goal is to reduce the cognitive load. 
Instead of navigating multiple records, screens, and reports, users ask the system direct questions and get actionable responses in plain language.</p><h3>Why It Matters</h3><p>This is where agentic AI becomes more than a novelty. In Dynamics 365, these capabilities are aimed squarely at improving productivity in high-stakes, data-heavy environments. Salespeople get their time back. Support teams respond faster and more accurately. Finance users spot issues without trawling through five spreadsheets.</p><p>It also hints at Microsoft&#8217;s broader play: making natural language the common thread across the entire Microsoft ecosystem. You could start a sales email in Outlook, summarise the last meeting in Teams, and update the CRM, all via Copilot. It&#8217;s a unified layer that cuts across apps, without users needing to jump between tabs or retrain their muscle memory.</p><h3>What It Means for the Future of Business Applications</h3><p>Dynamics 365 is a good bellwether for where business systems are heading. Agentic AI is moving from standalone chatbot to integrated assistant. It&#8217;s not just answering questions; it&#8217;s participating in workflows, suggesting actions, and learning from context.</p><p>This marks a shift in how we think about enterprise software. It&#8217;s no longer just a database with a nice UI; it&#8217;s a co-pilot that interprets, supports, and sometimes even decides.</p><p>For many organisations, that&#8217;s a big adjustment. But it&#8217;s also an opportunity to rethink how work gets done, and who (or what) does it.</p><h2>Salesforce Einstein Copilot: Conversational AI for the Enterprise</h2><p>If Microsoft is embedding AI across productivity and ERP tools, Salesforce is doing the same for customer-facing teams. Their take on agentic AI comes in the form of <strong>Einstein Copilot</strong>, a conversational assistant built directly into the Salesforce platform.</p><p>This isn&#8217;t Salesforce&#8217;s first crack at AI. 
Einstein has been around since 2016, quietly powering predictions and insights behind the scenes. But with the launch of Einstein Copilot, the AI has stepped forward and started talking.</p><h3>AI, But with Context</h3><p>Einstein Copilot isn&#8217;t just a chatbot. It&#8217;s deeply tied to Salesforce records, business rules, and user roles. That context makes all the difference.</p><p>For example:</p><ul><li><p>A sales rep can ask, &#8220;Show me my top accounts from last quarter,&#8221; and Einstein will not only pull the data, but also suggest actions like &#8220;Create follow-up tasks&#8221; or &#8220;Send personalised emails.&#8221;</p></li><li><p>A service agent can request a case summary, then ask the assistant to generate a response, referencing internal knowledge articles and the customer&#8217;s history.</p></li></ul><p>The interaction is fluid, but grounded. The AI operates within known boundaries, with access to structured company data and pre-defined workflows. That tight integration reduces the risk of hallucination and helps build trust.</p><h3>Prompting Action, Not Just Answers</h3><p>What sets Einstein apart is its ability to recommend and execute next steps. It&#8217;s not just there to answer questions, it&#8217;s designed to guide users through tasks.</p><p>This shows up in small ways, like suggesting follow-ups after displaying a lead list, or more complex flows, such as automating ticket routing or generating proposal drafts. Salesforce refers to this as &#8220;breadcrumbing,&#8221; gently nudging users through a task without overwhelming them.</p><p>Admins can configure Copilot Actions, essentially pre-scripted, prompt-driven workflows that Einstein can trigger. That means companies can tailor the AI to match their processes and policies, blending automation with human oversight.</p><h3>Guardrails by Design</h3><p>Salesforce has taken a relatively cautious approach. 
Unlike open-ended models, Einstein Copilot requires a specific Salesforce context to function. You can&#8217;t ask it general knowledge questions or drift off-topic. That narrow scope is by design. It keeps the AI focused, predictable, and safe within regulated industries like finance, healthcare, and government.</p><p>Privacy is also baked in. Salesforce is clear that Einstein respects existing permission sets and data boundaries. It doesn&#8217;t train on customer data, and users can audit what data was used in each interaction. This emphasis on transparency and control makes it more palatable to IT teams and compliance officers alike.</p><h3>From CRM to AI-Powered Workflow Hub</h3><p>Salesforce isn&#8217;t just layering AI on top of CRM, it&#8217;s using it to rethink the entire workflow. With Einstein Copilot embedded across Sales Cloud, Service Cloud, Tableau, and Slack, users don&#8217;t have to context-switch to get things done. The AI sits across those touchpoints, ready to assist.</p><p>And with the growing integration of Salesforce Data Cloud (formerly Customer 360), the assistant gets smarter by pulling from unified profiles, behaviours, and real-time data streams. It&#8217;s a move toward true orchestration, where the AI connects dots humans might miss.</p><h3>Where It&#8217;s Headed</h3><p>Salesforce has signalled that Einstein is becoming more than just a Copilot, it&#8217;s becoming the foundation of a new agent-first application layer. They're even branding this broader vision as <strong>Agentforce</strong>.</p><p>The ambition is clear: move from reactive CRM to proactive AI-powered engagement. That includes everything from automatically prioritising leads to pre-empting churn or surfacing next-best actions in customer conversations.</p><p>Like Microsoft with Dynamics 365, Salesforce is showing how agentic AI can transform business software, not by replacing users, but by amplifying them. 
The tools are still maturing, but the trajectory is hard to ignore.</p><h2>ChatGPT: The interface is the conversation</h2><p>OpenAI took a different tack. ChatGPT doesn&#8217;t look like much, just a prompt box and a log, but that&#8217;s the point. There&#8217;s no need to learn a new app. You just ask questions, and it responds.</p><p>With the release of &#8220;custom GPTs&#8221; and plugins, users can now build their own agents that connect to real tools. It&#8217;s powerful, but the UX hasn&#8217;t quite caught up. At DevDay, OpenAI essentially said, &#8220;We built the engine. You figure out the car.&#8221; A bit of a mixed message for business users trying to wrap real workflows around it.</p><p>Still, ChatGPT is probably the best example of an agent-first experience in the wild. It&#8217;s fast, flexible, and rapidly becoming a universal tool for everything from writing to planning to support.</p>]]></content:encoded></item></channel></rss>