The tool I’ve been tinkering with just made headlines. Peter Steinberger, creator of OpenClaw, is joining OpenAI to “drive the next generation of personal agents.” Sam Altman called him “a genius.” Not bad for an open source project only weeks old…
I’ve been running OpenClaw as a personal AI agent for several weeks now (in very strict isolation). It handles a standalone calendar, sends me reminders, processes emails I forward to it, manages my task lists, and writes code and commits to specific projects I’ve shared from my GitHub account to its own. Think Jarvis, but with more cron jobs and less Robert Downey Jr. Along the way I’ve learned a few things the hard way. I posted about some of those last week; here are five more.

1. Your AI Will Forget Unless You Make It Remember
This one caught me off guard. Every time your agent starts a new session, it wakes up with absolutely no memory of what you did yesterday. None. It’s like your intelligent, funny, witty teenage child, who wakes up every morning with no memory of last night’s reminder to clean their room…
The fix is decidedly low-tech: markdown files! The agent reads a MEMORY.md file at the start of every session for long-term context, plus daily summary note files for recent history. Without these, every conversation starts from zero. You find yourself re-explaining the same decisions, the same preferences, the same project context. It can be quite frustrating, to say the least!
In short, if you want your AI to know something tomorrow (or even immediately following a /reset), write it down today. In a file. On disk. Like it’s 1995 (only not floppy)…
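To make that concrete, here’s a minimal sketch of the idea. The MEMORY.md name is real; the notes/ layout and the way you’d feed the result into your agent’s prompt are my assumptions, not OpenClaw’s actual internals:

```python
# Sketch: rebuild the agent's memory at session start from plain files.
# MEMORY.md is what the agent actually reads; the notes/ directory and
# YYYY-MM-DD.md naming below are an assumed scheme for daily summaries.
from pathlib import Path

def load_memory(workdir: Path, recent_days: int = 3) -> str:
    """Concatenate long-term memory plus the last few daily notes."""
    parts = []
    memory = workdir / "MEMORY.md"
    if memory.exists():
        parts.append(memory.read_text())
    notes_dir = workdir / "notes"
    if notes_dir.is_dir():
        # Date-named files sort chronologically, so take the tail.
        for note in sorted(notes_dir.glob("*.md"))[-recent_days:]:
            parts.append(note.read_text())
    return "\n\n---\n\n".join(parts)

# Whatever this returns gets prepended to the session's context;
# without it, every conversation starts from zero.
```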
2. Silent Fallbacks Will Eat Your Budget
Here’s a fun one. I set up a simple reminder cron job, a task that should cost fractions of a penny. I configured it to use Google Gemini 2.5 Flash Lite, a super-fast, super-cheap model. Perfectly adequate for “tell Alex to [insert reminder here]”. (Side note: mine has permission to do so “with attitude” if need be!)
What I hadn’t clocked was that, by default, when Google rate-limited Gemini (I was using the free tier to start), the system silently fell back to Opus, Anthropic’s most expensive model. My bedtime reminder, a task that could run on a calculator, was burning through premium AI tokens! I only found out while looking into some failed, rate-limited tasks; the bot didn’t consider the switch worth proactively mentioning. No warning, no alert. Just a quiet, expensive upgrade.
Check your fallback chains, then check them again. Then check them once more after every upgrade (one upgrade broke the gateway’s contexts and channels so badly that it forgot who it was, and I had to restore from a backup!).
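I can’t give you OpenClaw’s exact config keys here, but conceptually the guard you want is an allow-list per job that fails loudly instead of upgrading quietly. A hedged sketch, with illustrative job names:

```python
# Illustrative guard: pin each cron job to the models it is allowed
# to use, and raise instead of silently falling back to something pricier.
ALLOWED_MODELS = {
    "bedtime-reminder": {"gemini-2.5-flash-lite"},  # cheap jobs stay cheap
}

def assert_expected_model(job: str, model_used: str) -> None:
    """Fail loudly if a job ran on a model outside its allow-list."""
    allowed = ALLOWED_MODELS.get(job, set())
    if model_used not in allowed:
        raise RuntimeError(
            f"{job!r} ran on {model_used!r}, expected one of {sorted(allowed)}. "
            "A rate-limit fallback may have silently upgraded you."
        )
```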
Finally, set up a monitoring page on your “Mission Control”. You should definitely build one of these: a small, bot-built webapp for managing and monitoring your bot. Here’s mine at the moment:

3. The Hidden Cost of “Good Enough” Model Defaults
Related to the above, but subtler. Not every task your agent performs needs the flagship model. Heartbeat checks, health pings, simple notifications: these can run on the cheapest model available. I’ve got simple jobs running on Gemini Flash Lite, which costs almost nothing. Meanwhile, many of my cron jobs were defaulting to models ten times the price for work that was just as simple.
Match the model to the task. Your “send me the weather” job doesn’t need the same brain as your “analyse this quarterly report” job. It sounds obvious, but so do many things!
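For what it’s worth, my mental model is a simple routing table. The route() helper below is mine, and apart from the Flash Lite entry the model strings are placeholders for whatever your gateway actually calls them:

```python
# Sketch: route each task to the cheapest model that can do the job.
# Model identifiers (other than Flash Lite) are placeholders.
MODEL_FOR_TASK = {
    "weather-brief": "gemini-2.5-flash-lite",       # trivial notification
    "daily-summary": "mid-tier-model",              # light reasoning
    "quarterly-report-analysis": "flagship-model",  # actually hard work
}

def route(task: str) -> str:
    """Pick a model for a task, defaulting to the cheapest, not the dearest."""
    return MODEL_FOR_TASK.get(task, "gemini-2.5-flash-lite")
```

The key design choice is the default: an unknown task falls through to the cheap model, so an expensive run has to be a deliberate decision.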
4. Your AI Anchors on Context, Not Facts
This is the one that properly messed with my head. I spent a long evening debugging cron job issues. Hours of back and forth, pasting logs, tweaking configs. All the conversation context was about problems from Tuesday. By the time we finished, my agent was convinced it was still Tuesday.
It was Wednesday. 🤦
The model doesn’t reliably know what day it is from an internal clock (you know, those things that have been in computers for decades…). It appears to infer “reality” from the conversation window: if your context is full of Tuesday’s problems, Tuesday is reality. This has real consequences when you’re scheduling things, setting reminders, or asking “what’s happening tomorrow”. I’ve seen it happen many times: morning briefs written for the wrong day, cron jobs scheduled for the wrong day and time.
Your AI’s sense of the world is only as good as the context you’ve given it, and context can lie. Once again, I recommend a Mission Control so you can easily eyeball things occasionally.
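One cheap mitigation, assuming your setup lets you prepend text to each session (mine does): stamp the real clock into the context, so “today” comes from the system and not from Tuesday’s logs. A minimal sketch:

```python
# Sketch: inject the real date/time at session start so the model
# doesn't have to infer "today" from stale conversation context.
from datetime import datetime

def date_preamble() -> str:
    now = datetime.now().astimezone()
    return (
        f"Current date/time: {now:%A %d %B %Y, %H:%M %Z}. "
        "Trust this over any dates implied by earlier context."
    )
```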

5. Trust Logs Over Vibes
At one point I asked my agent which model it had used for a particular task. It confidently told me Sonnet, but the logs showed Opus (via fallback). The model wasn’t lying exactly… it just didn’t know. It reported what it thought was true based on its configuration, not what actually happened at the infrastructure level.
This applies broadly. Just like any chatbot you’ve been talking to for the last three years, your AI will sound confident about things it cannot possibly verify (or won’t bother to verify, because checking might be wasteful), so it just goes with what it has in context. System-level behaviour and actual API calls live in logs, not in chat responses, so when it matters, always go to the source.
(I think of this as the “Did you really brush your teeth? Shall we go check if the brush is wet?” scenario).
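When I want the truth now, I read the gateway’s logs rather than asking the bot. A hedged sketch, assuming a JSON-lines usage log with “task” and “model” fields; your log schema will differ, and the habit is the point, not this code:

```python
# Sketch: ask the log, not the model, which model actually ran.
# Assumes a JSON-lines usage log with "task" and "model" fields;
# adapt the field names to whatever your gateway really writes.
import json

def models_actually_used(log_path: str, task: str) -> set[str]:
    """Return every model the log says handled the given task."""
    used = set()
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("task") == task:
                used.add(entry.get("model"))
    return used
```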
What Next
Peter Steinberger’s very high-profile move to OpenAI tells you where this is heading. Personal agents aren’t a nerdy hobbyist curiosity anymore; they’re absolutely going mainstream. OpenClaw will continue as open source via a foundation, which is great, but the bigger signal is that OpenAI wants this expertise in-house. They’re betting that millions of people will be running agents like this, and you would imagine that, before long, they’ll run on Codex by default.
When that happens, every lesson I’ve learned will be demonstrated at global scale. People will inevitably burn premium tokens on trivial tasks, wonder why their agent forgot last week’s conversation, and trust a confident response over a log file.
If you’re thinking about running your own agent, start now while it’s still a bit rough around the edges. The lessons are cheaper to learn on an old laptop in your cupboard (on an isolated network, with isolated accounts!) than in production, connected to all your company’s systems!
Now I’m off to go and find some 10-year-old memory DIMMs to sell for a 400% markup.
💻💰🎉
PS – If you got this far, thanks for reading, and I apologise for the rather click-baity title! Don’t hate the player, etc…





