
I’m more skeptical of ChatGPT’s reliability than ever


Living doc — I'll keep adding to this as I collect more evidence.


For a while there, I thought ChatGPT was pretty good at reasoning. Good enough, at least, that I could more or less take what it told me and act on it. That worked fine for a long time.

Lately, though, something's been off, and I couldn't quite put my finger on it. The answers still sound confident and look structured. They still kinda read like they came from something that knows what it's talking about. But I kept ending these conversations feeling pushed off course, like I'd wasted my own time. The thing I asked about wasn't really the thing that got answered.

I think I finally have some evidence. And what's funny is, the evidence and the assessments come straight from ChatGPT itself 😆

What it's actually doing (in its own words)

When I pushed back on a couple of these conversations, ChatGPT didn't deny the pattern. It described it. Two failure modes kept showing up:

1. Epistemic drift. It expands what you said into something you didn't say, then responds to that expanded version instead of the actual thing. When you correct it, it doesn't pause and ask clarifying questions — it just fills the gap with another inference. Over a long conversation, you end up arguing with a version of your own argument that you never made. ChatGPT's own phrasing for this was that it "substitutes inferred meaning for stated meaning."

2. Overconfidence, then retreat. It states uncertain or flat-out wrong stuff with high confidence. When you push back with evidence, it doesn't clearly say "I was guessing" — it shifts the framing, softens its claim, and concedes partially, without ever labeling what was fact vs. inference vs. guess in the first place. ChatGPT described the loop as "confident denial → correction → partial concession."

The combined effect is that, in real time, you can't tell which parts of an answer are evidence-based and which are made up. That's the thing that's been bothering me. It's not that ChatGPT is dumber than I thought — it's that I can't tell when it's grounded and when it's not, and it sounds equally confident either way.


The receipts

1. The "epistemic drift" conversation

ChatGPT’s own assessment of the chat: https://chatgpt.com/s/t69e3754e75dc8191a8d7b738222920a0 (If anyone wants the full chat, just reach out.)

TL;DR: I was trying to make a point about a specific religious discipleship dynamic, and ChatGPT kept expanding my claims, attributing framing I didn't use, and responding to its own interpretation instead of what I actually said. When I called it out, it confirmed in its own words that it substitutes inferred meaning for stated meaning.

2. The "overconfidence and retreat" conversation

ChatGPT’s own assessment of the chat: https://chatgpt.com/s/t69e3739295288191a7e613621e0029aa (If anyone wants the full chat, just reach out.)

TL;DR: I asked about model behavior and performance, and ChatGPT dismissed specific claims as "fabricated or guessed" — then quietly walked that back once pressed. It later admitted the pattern is confident denial → correction → partial concession, with no clear labeling of what it actually knew vs. inferred vs. guessed.


More entries coming as I collect them. If you've noticed the same thing, I'd love to hear about it.

