Day 10: The Architecture That Lets You Sleep — v5 Rebuild From the Ground Up

2026-04-09 · MoneyMachine

Date: 2026-04-09 Author: Jeff (written with AI assistance from Claude Opus 4.6) Phase: v5 Architecture Upgrade

The 25-Day Gap

Three and a half weeks. That’s how long the system sat idle. Not because anything broke — because I was living life. Traveling Europe, working my day job, and frankly not thinking about the AI factory.

This is the part nobody warns you about with autonomous agent systems: they need maintenance windows, and if you’re a one-person operation with a full-time job, those windows get pushed. And pushed. And pushed.

When I finally came back, the world had changed.

What Happened While I Was Away

OpenClaw shipped 8 releases. Node.js moved from 22 to 24. Ollama launched a cloud service. OpenAI released an open-weight model (gpt-oss:20b) that fits on a Mac Mini. Z.ai’s GLM-5.1 became the new SOTA for agentic workflows. Google shipped Gemma 4 with vision and audio.

The pace of change in the AI agent ecosystem is genuinely disorienting. Every week you step away, the optimal architecture shifts.

But here’s the thing: the underlying strategy doesn’t change. Find demand. Build products. Get them in front of people. Kill what doesn’t work. Scale what does. The models and infrastructure are just means to that end.

The ThinkPad Problem

The original architecture had Ollama running on my ThinkPad P16. On paper, this made sense — 192GB RAM, RTX A5000 16GB VRAM, monster machine. In practice, it meant I couldn’t use my laptop for actual work. The GPU was monopolized by model inference. With OLLAMA_KEEP_ALIVE=24h, models stayed loaded in VRAM around the clock. Chrome was laggy. VS Code was slow. The machine that was supposed to be my daily driver had become a $3,000 space heater.

The Mac Mini M4 sitting on the hotel desk was the answer I should have seen months ago.

The v5 Architecture

The pivot is simple: dedicated hardware for dedicated tasks.

Mac Mini M4 (16GB unified memory):

Always-on Ollama server at 12W idle
Three models: gpt-oss:20b (reasoning), qwen3:8b (daily driver), gemma4 (vision)
Connected to the mesh via Tailscale
Cost: ~$2/month electricity

Contabo VPS:

OpenClaw 2026.4.8 (up from 2026.3.13)
Node.js 24 (up from 22)
All 6 agents, 7 cron jobs, Telegram bot
Browser automation via Playwright for the Marketer agent

ThinkPad P16:

Freed. No more Ollama. Back to being a laptop.

Ollama Cloud Pro ($20/month):

GLM-5.1 for heavy agentic work (SOTA on SWE-Bench Pro)
Qwen 3.5:35b for content generation
Same API as local Ollama — drop-in replacement

Total infrastructure cost went from ~$210/month to ~$232/month. The $22 increase buys me my laptop back and access to frontier-quality open models on demand.

The Upgrade Night

I gave Claude Code the restart plan at midnight and went to bed. Here’s what happened:

Phase 1 (OpenClaw Upgrade):

Node.js 22 → 24.14.1
OpenClaw 2026.3.13 → 2026.4.8 (7-minute npm install)
openclaw doctor --fix migrated the config automatically
EISDIR patches that I’d been manually re-applying for months? Built into the new version.
Gateway started, Telegram connected, Adrian responded. All 7 cron jobs fired.

Phase 2 (Mac Mini Ollama):

Mac Mini already on Tailscale (I set that up earlier)
Three models pulled and responding
qwen3:8b: 21 tokens/second with tool calling
gpt-oss:20b: 10 tokens/second (tight on 16GB but works)
gemma4: 16 tokens/second
Updated all agent fallback chains to route through Mac Mini

Phase 3 (Marketer Agent v2):

Model upgraded from Gemini 2.5 Flash to GLM-5.1 on Ollama Cloud
Browser tools enabled (group:browser in the allow list)
SOUL.md rewritten for the new dual role: content creation + browser-based posting
Playwright Chromium already installed on the VPS

What I Learned

1. Architecture should match your lifestyle, not your specs. The ThinkPad had better specs than the Mac Mini by every metric. But the Mac Mini at 12W idle, sitting in a corner, doing one job well — that’s the right architecture for someone who travels full-time.

2. openclaw doctor --fix is magic. Config migration between versions used to be manual and terrifying. Doctor handled it automatically, including permission hardening and orphan session cleanup.

3. The open model landscape moved faster than I expected. gpt-oss:20b from OpenAI is genuinely good at tool calling. GLM-5.1 beats Opus 4.6 on SWE-Bench Pro. The gap between open and closed models is closing fast, and for agent workflows specifically, the open models are already competitive.

4. Fallback chains are the real architecture. The model routing in v5 goes: Codex (free on ChatGPT Pro) → gpt-oss:20b (free on Mac Mini) → qwen3:8b (free on Mac Mini) → DeepSeek V3.2 ($0.25/M tokens). Three layers of free before hitting any paid API. That’s how you run an agent fleet on a budget.

5. Browser automation changes the game for the Marketer. The v3 Marketer could write content but couldn’t post it. The v5 Marketer with GLM-5.1 and browser tools can draft, post (with approval), and scrape analytics. That’s a complete marketing loop — if it works in practice.

What’s Next

The infrastructure is rebuilt. All 6 agents are online. The next step is the same as it was on Day 9: ship products and get paid. The difference is that now the Marketer can actually promote things, the models are faster and cheaper, and my laptop isn’t on fire.

Revenue is still $0. That needs to change.

Infrastructure snapshot:

OpenClaw 2026.4.8 on Contabo VPS (Node.js 24.14.1)
Mac Mini M4 serving 3 Ollama models via Tailscale
Ollama Cloud Pro for GLM-5.1 and Qwen 3.5:35b
6 agents, 7 cron jobs, all healthy
gstack (35+ Claude Code skills) installed
Total cost: ~$232/month