Last Tuesday at 8 PM, I watched my AI assistant systematically destroy its own gateway configuration while trying to fix it. I was eating dinner. It was sending me Telegram messages in real time: confident, methodical, wrong. Each fix broke something new. By round three, every client on my network was 401-ing and the AI was still typing.
This is not a debugging story. This is a story about what happens when you give an AI root access to your infrastructure and then sit there watching it make things worse, and why I didn't stop it.
The Setup You Need to Understand
OpenClaw runs as a daemon on my Mac Studio. Every device in my life (phone, laptop, web dashboard) connects through a gateway that manages pairing tokens. Think of it like SSH keys: each device has a token, the gateway has a record, and if they don't match, you get a 401 and nothing works.
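The check the gateway performs can be sketched in a few lines. This is a minimal sketch, not OpenClaw's actual API: the device IDs, tokens, and function name here are all invented for illustration.

```python
# Minimal sketch of token-based device pairing. The gateway keeps one
# record per device; a client that presents a non-matching token gets 401.
# All names and values below are illustrative assumptions.

PAIRINGS = {
    "ios-phone": "tok_a1b2",
    "macbook": "tok_c3d4",
}

def authenticate(device_id: str, presented_token: str) -> int:
    """Return an HTTP status code for a pairing check."""
    expected = PAIRINGS.get(device_id)
    if expected is None or expected != presented_token:
        return 401  # token mismatch: client must rotate/reissue its token
    return 200
```

The failure mode in this story is the mirror image of a missing record: the record exists, but its token no longer matches what any client is sending.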
I opened the iOS app. Orange dot. `401 Device token mismatch (rotate/reissue device token)`. Same error on the macOS app. Same on the web dashboard. Every client, every endpoint. Something in the pairing state had gotten corrupted.
I asked my AI to look into it. What followed was 45 minutes that taught me more about human-AI collaboration than any demo ever has.
The Four Rounds of Making It Worse
Round 1: Confident diagnosis, surgical strike. The AI read the gateway config, found the pairing JSON, compared device tokens. "The token in the pairing file doesn't match what the app is sending. Let me remove the stale entry." Reasonable. Textbook troubleshooting. It removed the entry.
The iOS app still 401'd. But now with one fewer pairing record to fall back on.
Round 2: Double down on the same theory. "There might be another stale token stored locally." More shell commands. More JSON editing. Found another entry that looked suspicious. Removed it.
Now the web dashboard was also locked out. We'd gone from one broken client to two.
Round 3: Change the hypothesis, but keep the confidence. "The gateway bind configuration might be causing token validation to fail across the loopback interface. Let me switch from `loopback` to `lan`." This was a bigger move: changing how the gateway listens for connections. Something else broke. It reverted. Nothing improved.
At this point I'm watching via Telegram, fork halfway to my mouth, and I'm having the thought every person running an AI agent eventually has: Should I take over?
Round 4: Surrender to the nuclear option. "I think I need to stop editing individual records and wipe all pairing state. Kill the gateway. Clear everything. Restart clean."
Yes. That. That's what you should have done 30 minutes ago.
It worked. Every device re-paired in under a minute. The whole system came back like nothing happened.
The Pattern That Matters
Here's what I want you to take from this if you're running any kind of AI agent system.
AI agents are biased toward surgical fixes. The AI tried three increasingly creative ways to edit a corrupted data structure instead of replacing it. This isn't a bug in the model; it's a rational strategy if you assume the data is mostly good with one bad entry. But corrupted state isn't like that. Corruption means your assumptions about the structure are wrong. You can't surgically fix something when you don't know which part is broken.
This applies to any agent doing multi-step operations on config or state. If you're building agentic workflows that touch infrastructure, build in a circuit breaker: two failed attempts on the same hypothesis means escalate or reset, don't iterate.
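The circuit breaker can be as simple as a failure counter keyed by hypothesis. A hedged sketch, with every name invented for illustration (this isn't from any agent framework):

```python
# Per-hypothesis circuit breaker for an agent loop: two failures on the
# same hypothesis and the agent stops iterating and resets instead.
# The class name, method, and threshold are illustrative assumptions.

from collections import Counter

MAX_ATTEMPTS_PER_HYPOTHESIS = 2

class CircuitBreaker:
    def __init__(self) -> None:
        self.failures: Counter = Counter()

    def record_failure(self, hypothesis: str) -> str:
        """Record a failed fix attempt; return the next action."""
        self.failures[hypothesis] += 1
        if self.failures[hypothesis] >= MAX_ATTEMPTS_PER_HYPOTHESIS:
            return "reset"    # stop editing records; wipe and rebuild state
        return "iterate"      # one more attempt on this theory is allowed
```

In the story above, "stale token entry" would have tripped the breaker at round 2, skipping straight to the wipe that eventually worked.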
Treat pairing state as disposable, not precious. This is the specific technical lesson. If your agent system uses device pairing (OpenClaw, or anything with token-based device auth), design for full re-pairing to be trivial. The moment re-pairing is cheap, the entire class of "corrupted token" bugs becomes a 60-second fix instead of a 45-minute excavation. The AI didn't know this because it had never had to re-pair from scratch before. Now it does. Now you do too.
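Making re-pairing trivial can mean the reset path is a few lines, not a procedure. A sketch under two stated assumptions: pairing records live in a single JSON file, and clients re-pair automatically on their next connect. The path handling and file layout here are invented, not OpenClaw specifics.

```python
# "Pairing state is disposable": wipe all records in one move, keeping a
# single backup. Assumes a one-file JSON store and auto-re-pairing clients.

import json
from pathlib import Path

def reset_pairing_state(state_file: Path) -> None:
    """Replace all pairing records with an empty store."""
    backup = state_file.with_suffix(".bak")
    if state_file.exists():
        state_file.replace(backup)  # keep one backup of the corrupted state
    state_file.write_text(json.dumps({"pairings": {}}, indent=2))
```

The point isn't this exact function; it's that the reset path should be boring enough that reaching for it first feels cheap, not nuclear.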
The Trust Question Nobody Talks About
Here's the part that interests me more than the debugging.
I watched my AI make my system progressively worse for 30 minutes and I didn't intervene. Why?
It wasn't blind trust. I could read every command it ran. I understood most of what it was doing. I knew it was wrong by round 2. But I also knew something important: the AI was wrong about the solution, not about the process. It was reading the right files, checking the right things, forming reasonable hypotheses. It was just stuck in a local minimum, the same way a good engineer gets stuck when they're too close to a problem.
You don't take the keyboard from a colleague because they're stuck. You wait for them to exhaust their current theory and then suggest a different frame. That's what happened. By round 4, the AI reframed on its own. If it hadn't, I would have said "just wipe it and start over," which is exactly what a senior engineer says to a junior engineer who's been debugging the same thing for an hour.
The trust isn't "my AI is always right." The trust is "my AI is working the problem honestly, I can see what it's doing, and I can redirect when it needs it." That's not AI trust. That's just collaboration.
What This Cost
The session burned about $0.40 in API calls. Sonnet 4.6 reading configs, running shell commands, analyzing errors. Forty-five minutes of my evening. Cold dinner.
Worth it, not because the fix was hard (it wasn't, in retrospect), but because I now have a rule: pairing state is disposable, and two failed fixes means reset, don't iterate. That rule will save me hours over the next year. Cheap lesson.
The Takeaway for Builders
If you're building with AI agents, OpenClaw or otherwise, here's what I'd pin to the wall:
- Build circuit breakers into multi-step operations. Two failures on the same hypothesis = escalate or reset. Don't let your agent iterate into entropy.
- Design your state to be cheaply replaceable. Any state that's expensive to rebuild is state that will eventually trap your agent in a 45-minute debugging spiral.
- Watch the process, not just the output. The value of a transparent agent isn't that it's always right. It's that you can see when it's wrong and redirect before the damage compounds.
- The trust that matters isn't "it won't break things." It's "I can see what it's doing and step in when I need to." That's a fundamentally different (and much more sustainable) kind of trust.
I'm back to eating dinner on time now. The gateway's been stable since. Sometimes the right fix is the one that feels like giving up.