
It Gets Things Wrong in a Way That Looks Right
I built something with AI recently that looked impressive.
It was a content evaluation system I wanted for my consulting business, and the AI produced a good system very quickly. If I'd been evaluating it purely on whether it built what I was looking for, I would have called it done and moved on.
But I had deliberately given the AI very simple instructions for the outcome I wanted and very few guardrails to work with. This was an experiment.
The AI had taken the fastest path to a working solution, not the most maintainable one. The way it was structured meant that every time I needed to change anything, multiple changes would be required, which increased the risk of errors.
It wasn't built to last. It was built to answer the question.
The only reason I caught it was the work I did decades ago at university and as an analyst, building systems from scratch and learning the hard way what happens when you build without good design principles and structure. I'd spent years in that foundational work, making the kinds of mistakes that teach you what good architecture looks like and what it costs when you skip it.
If I hadn't had that background, I would have accepted the output at face value. It looked right. It worked. It just wasn't right in the way that matters over time.
The pattern I keep hearing
I was talking to a senior leader in a multinational engineering organisation recently. They were telling me about reviewing a piece of work that had been put together by their team using AI. The document was well-formatted, the language was professional, and it covered the topic at hand.
But when they read it carefully, there were gaps - assumptions that hadn't been tested, conclusions that sounded confident but didn't hold up under scrutiny. The kind of issues that anyone with deep experience in that area would catch, but that the AI had papered over with polished language.
What frustrated this senior leader wasn't the AI though. It was that nobody on their team had caught it before it reached them. Several people had reviewed the document and signed off on it. Not because they were careless, but because the output looked so complete and well-structured that it didn't trigger the instinct to question it.
The problem isn't that AI gets things wrong. It's that it gets things wrong in a way that looks right.
Why polished output is harder to check than rough output
When someone hands you a rough first draft, your brain automatically shifts into evaluation mode. The imperfections signal that the thinking is still in progress, so you engage with it critically.
You question the underlying assumptions. You test the edge cases where the thinking breaks. You look for what is missing.
But when something arrives looking finished, well-written and nicely formatted, your brain processes it differently. It pattern-matches against "complete work" and your critical filter relaxes a little. You're reading to confirm rather than reading to question.
AI produces polished output by default. Every response is articulate, structured and confident, even when the underlying reasoning is flawed. And that polish means it’s easy to let our guard down and stop testing for quality.
This matters beyond individual documents or code. It matters because it changes the standard of checking that happens across a team, and because the people who can still catch these issues are often the ones who learned their craft before AI existed.
The Maths exam model
But these principles and skills can still be learned if we choose to invest the time to learn them. We’ve seen this play out in the education system with calculators.
In senior Mathematics, exams still have a calculator-free section. It’s not because calculators aren't useful, but because there's a difference between someone who can use a calculator and someone who understands the mathematics well enough to know when the calculator's answer doesn't make sense.
The calculator-free section tests whether the student actually grasps what's happening beneath the tool. Once they've demonstrated that understanding, the calculator section includes much harder problems because they've earned the right to use the technology. They've proven they understand what function they need to use, they can clearly define the inputs and can evaluate whether what comes back is reasonable.
I think this is the model that businesses need to be thinking about when it comes to AI.
Not "don't use AI." That would be like banning calculators. But "make sure you have the foundational understanding to utilise AI effectively".
The irony is these are the same skills we’ve always needed to delegate effectively to humans. It’s being able to clearly articulate the outcome you want, define the constraints and recognise when the output doesn't meet the standard. This sounds simple, but it is where expertise lives.
Because without that foundation, you can't spot the edge cases. You can't identify where the tool's assumptions break down. You can't tell the difference between a confident answer and a correct one.
The people who already have that foundation are the ones who did the work before AI made it optional. And as I explored in last week's newsletter, the pathway that created those people is the very thing AI is starting to replace.
What this means for business owners
If your team is using AI, and most teams now are in some form, two questions become very important.
First, are you investing in the foundational understanding of everyone using AI, so that you get what the business truly needs across all domains rather than surface-level AI output?
Second, is anyone checking the answer, and do the people doing the checking have the criteria and depth of experience to know what to look for?
This is not a technology problem. It's a leadership and governance question.
As I’ve mentioned before, I write this newsletter by hand, every week, for the same reason. Not because AI couldn't write something that sounds like me. It can.
But writing forces me to explore what I actually think, to test whether my reasoning holds up when I have to put it into words, and to maintain the thinking skills that I then bring to my clients.
It's the calculator-free section of my own work. And I believe it's what makes the calculator section, where I do use AI extensively, actually valuable.
So here are two questions to ponder this week:
Are you investing in the foundational skills your team needs to direct AI well - the ability to define clear outcomes, set the right constraints, and specify what good looks like before the tool starts working?
And when that output comes back looking polished and professional, is someone with the right depth of experience actually checking whether it's correct?
Until next week,
Kylie.
