OpenClaw + Codex/ClaudeCode Agent Swarm: The One-Person Dev Team [Full Setup]

Sandezi's comment: As the saying goes, we each learn the Way in our own time, and every craft has its specialists. This author uses OpenClaw as his right hand and directs Codex as easily as moving his own arm, improving day by day. He has clearly glimpsed the secret of AI leverage, and the productivity of a single individual looks genuinely promising. But everything cuts both ways: sharp as this method is, one still wonders about code quality. Is it carefully polished? And what does the daily spend add up to? If the branches flourish while the roots stay shallow, and over-reliance erodes one's own judgment, isn't that dangerous? In investing, slow money beats fast money, emotion is a signal, and systems tame the monkey mind. The tool is excellent, but risk control comes first; never forget that.

> Author: @elvissun (Elvis)
> Date: 2026/2/23 21:07:46
> Original: https://x.com/elvissun/status/2025920521871716562

I don't use Codex or Claude Code directly anymore.

I use OpenClaw as my orchestration layer. My orchestrator, Zoe, spawns the agents, writes their prompts, picks the right model for each task, monitors progress, and pings me on Telegram when PRs are ready to merge.

Proof points from the last 4 weeks:

My git history looks like I just hired a dev team. In reality it's just me, going from managing Claude Code directly to managing an OpenClaw agent that manages a fleet of other Claude Code and Codex agents.

Success rate: The system one-shots almost all small to medium tasks without any intervention.

Cost: ~$100/month for Claude and $90/month for Codex, but you can start with $20.

Here's why this works better than using Codex or Claude Code directly:

Codex and Claude Code have very little context about your business.

They see code. They don't see the full picture of your business.

OpenClaw changes the equation. It acts as the orchestration layer between me and all the agents: it holds my full business context (customer data, meeting notes, past decisions, what worked, what failed) inside my Obsidian vault and translates that history into precise prompts for each coding agent. The agents stay focused on code. The orchestrator stays at the strategy level.

Here's how the system works at a high level:

Last week Stripe wrote about their background agent system called "Minions" — parallel coding agents backed by a centralized orchestration layer. I accidentally built the same thing but it runs locally on my Mac mini.

Before I tell you how to set this up, you should know WHY you need an agent orchestrator.

Why One AI Can't Do Both

Context windows are zero-sum. You have to choose what goes in.

Fill it with code → no room for business context. Fill it with customer history → no room for the codebase. This is why the two-tier system works: each AI is loaded with exactly what it needs.

OpenClaw and Codex hold drastically different context: the orchestrator carries the business side (customers, decisions, history), while each coding agent carries only the code relevant to its task.

Specialization through context, not through different models.

The Full 8-step Workflow

Let me walk through a real example from last week.

Step 1: Customer Request → Scoping with Zoe

I had a call with an agency customer. They wanted to reuse configurations they've already set up across the team.

After the call, I talked through the request with Zoe. Because all my meeting notes sync automatically to my Obsidian vault, zero explanation was needed on my end. We scoped out the feature together and landed on a template system that lets them save and edit their existing configurations.

Then Zoe does three things: writes a detailed prompt from the scoped spec, picks the right model for the task, and spawns the agent.

Step 2: Spawn the Agent

Each agent gets its own worktree (isolated branch) and tmux session:
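Roughly, the spawn step can be sketched like this. Every name and path here is an illustrative stand-in (not OpenClaw's actual files), and the commands are printed rather than executed so the sketch is safe to run anywhere:

```shell
# Dry-run spawn sketch: swap `echo` for `eval "$*"` to execute for real.
run() { echo "+ $*"; }

TASK="template-system"
WORKTREE="../worktrees/${TASK}"

# One isolated checkout per agent, so parallel agents never clobber each other
run git worktree add "$WORKTREE" -b "agent/${TASK}"

# Detached tmux session; `script` keeps a full terminal log of the run
run tmux new-session -d -s "agent-${TASK}" \
  "script -q logs/${TASK}.log codex exec 'implement the template system per prompt.md'"
```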

The agent runs in a tmux session with full terminal logging via a script.

I used to launch agents with `codex exec` or `claude -p`, but I switched to tmux recently.

tmux is far better because mid-task redirection is powerful. Is an agent heading in the wrong direction? Don't kill it; send a correction straight into its session.
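A redirect is just one `tmux send-keys` typed into the agent's session. The session name below follows a hypothetical naming convention, and the command is printed rather than executed so there's no live session required:

```shell
# Dry-run course correction; swap `echo` for `eval "$*"` against a real session.
run() { echo "+ $*"; }

run tmux send-keys -t agent-template-system \
  "Stop. Templates belong at the workspace level, not per user. Re-read the spec." Enter
```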

The task gets tracked in `.clawdbot/active-tasks.json`.
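The post doesn't show the real schema; one guess at what a registry entry could look like (every field name here is my assumption):

```json
[
  {
    "task": "template-system",
    "agent": "codex",
    "worktree": "../worktrees/template-system",
    "session": "agent-template-system",
    "status": "running",
    "pr": null,
    "started": "2026-02-20T14:05:00Z"
  }
]
```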

When complete, it updates with the PR number and check results. (More on this in step 5.)

Step 3: Monitoring in a loop

A cron job runs every 10 minutes to babysit all the agents. This functions, in effect, as an improved Ralph Loop; more on that later.

But it doesn't poll the agents directly (that would be expensive). Instead, it runs a script that reads the JSON registry and checks each task: is the tmux session still alive, is a PR open yet, and have the checks passed?

The script is 100% deterministic and extremely token-efficient:
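A sketch of what such a deterministic check could look like, wired to cron with a line like `*/10 * * * * ~/bin/check-agents.sh`. The registry path and field names are my guesses at the layout, and sample data is inlined so the sketch runs standalone:

```shell
# Deterministic monitor sketch: reads local state only, zero model calls.
REGISTRY=".clawdbot/active-tasks.json"

# Sample data so this runs standalone; the real file is written at spawn time.
mkdir -p .clawdbot
cat > "$REGISTRY" <<'EOF'
[{"task": "template-system", "session": "agent-template-system", "status": "running"}]
EOF

# Flag tasks still marked running whose tmux session has died
grep -o '"session": "[^"]*"' "$REGISTRY" | cut -d'"' -f4 | while read -r s; do
  tmux has-session -t "$s" 2>/dev/null || echo "STALLED: $s"
done

echo "$(grep -c '"status": "running"' "$REGISTRY") task(s) marked running"
```

A real version would also compare PR and CI status before deciding whether to ping a human.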

I'm not watching terminals. The system tells me when to look.

Step 4: Agent Creates PR

The agent commits, pushes, and opens a PR via `gh pr create --fill`. At this point I do NOT get notified; a PR alone isn't done.

Definition of done (very important your agent knows this):

Step 5: Automated Code Review

Every PR gets reviewed by three AI models (Codex, Claude, and Gemini), and they catch different things.

All three post comments directly on the PR.
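The post doesn't show the wiring, but the fan-out could look something like this. `gh pr diff` and `gh pr comment` are real gh subcommands; the `<model>-review` commands are stand-ins for however you invoke each model, and everything is printed rather than executed:

```shell
# Hypothetical review fan-out; swap `echo` for `eval "$*"` to run for real.
run() { echo "+ $*"; }
PR=341   # example PR number

for reviewer in codex claude gemini; do
  run "gh pr diff $PR | ${reviewer}-review > reviews/${reviewer}.md"
  run "gh pr comment $PR --body-file reviews/${reviewer}.md"
done
```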

Step 6: Automated Testing

Our CI pipeline runs a heavy battery of automated checks: builds, type checks, and the full test suite.

I added a new rule last week: if the PR changes any UI, it must include a screenshot in the PR description. Otherwise CI fails. This dramatically shortens review time — I can see exactly what changed without clicking through the preview.
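The actual rule lives in our CI config, which isn't shown here; a minimal sketch of the idea, treating any Markdown image in the PR body as a screenshot (the function and sample body are mine):

```shell
# Screenshot gate sketch: fail if a UI-touching PR has no image in its body.
has_screenshot() {
  # Markdown image syntax in the PR description counts as a screenshot
  printf '%s' "$1" | grep -qE '!\[[^]]*\]\([^)]*\)'
}

body='Adds template picker. ![before/after](shot.png)'
if has_screenshot "$body"; then
  echo "screenshot found"
else
  echo "FAIL: UI change without screenshot"
  exit 1
fi
```

In CI you would feed `has_screenshot` the real body, e.g. from `gh pr view --json body -q .body`.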

Step 7: Human Review

Now I get the Telegram notification: "PR #341 ready for review."

By this point, CI is green, all three AI reviews have passed, and any UI change has a screenshot in the PR description.

My review takes 5-10 minutes. Many PRs I merge without reading the code — the screenshot shows me everything I need.

Step 8: Merge

PR merges. A daily cron job cleans up orphaned worktrees and prunes the task registry JSON.
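The cleanup boils down to `git worktree` housekeeping. A self-contained demo in a throwaway repo (the branch name is a stand-in for a finished agent branch):

```shell
# Demo of the cleanup step, run inside a fresh temporary repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m init

git branch agent/template-system            # stand-in for a finished agent branch
git worktree add -q "$repo-wt" agent/template-system

# What the daily cron does once the PR is merged:
git worktree remove "$repo-wt"              # drop the finished agent's worktree
git worktree prune                          # clear any stale worktree records
git branch -d agent/template-system         # safe: the branch points at HEAD
git worktree list                           # only the main checkout remains
```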

The Ralph Loop V2

This is essentially the Ralph Loop, but better.

The Ralph Loop pulls context from memory, generates output, evaluates results, and saves learnings. But most implementations run the same prompt each cycle: the distilled learnings improve future retrievals, but the prompt itself stays static.

Our system is different. When an agent fails, Zoe doesn't just respawn it with the same prompt. She looks at the failure with full business context, figures out how to unblock it, and rewrites the prompt before respawning.

Zoe babysits agents through to completion. She has context the agents don't — customer history, meeting notes, what we tried before, why it failed. She uses that context to write better prompts on each retry.
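The difference from a vanilla Ralph Loop can be boiled down to a toy loop; every function below is a stand-in for the real machinery:

```shell
# Toy retry loop: the prompt is REWRITTEN from failure context each cycle,
# never replayed verbatim. All functions here are illustrative stubs.
attempt=0
evaluate() { [ "$attempt" -ge 2 ]; }                 # pretend the PR lands on try 2
rewrite_prompt() { echo "try $attempt: avoid the last failure mode"; }
spawn_agent() { echo "spawning with prompt: $1"; }

until evaluate; do
  attempt=$((attempt + 1))
  spawn_agent "$(rewrite_prompt)"
done
echo "shipped after $attempt attempts"
```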

But she also doesn't wait for me to assign tasks; she finds work proactively.

I take a walk after a customer call. Come back to Telegram: "7 PRs ready for review. 3 features, 4 bug fixes."

When agents succeed, the pattern gets logged. "This prompt structure works for billing features." "Codex needs the type definitions upfront." "Always include the test file paths."

The reward signals are: CI passing, all three code reviews passing, human merge. Any failure triggers the loop. Over time, Zoe writes better prompts because she remembers what shipped.

Choosing the Right Agent

Not all coding agents are equal. Quick reference:

| Agent | Strength | I use it for |
|---|---|---|
| Codex | Thorough reasoning across the codebase | Backend logic, complex bugs, multi-file refactors (~90% of tasks) |
| Claude Code | Speed, fewer permission issues | Frontend work, git operations |
| Gemini | Design sensibility | HTML/CSS specs for new UIs |

Codex is my workhorse. Backend logic, complex bugs, multi-file refactors, anything that requires reasoning across the codebase. It's slower but thorough. I use it for 90% of tasks.

Claude Code is faster and better at frontend work. It also has fewer permission issues, so it's great for git operations. (I used to drive more of my day-to-day work with it, but Codex 5.3 is simply better and faster now.)

Gemini has a different superpower — design sensibility. For beautiful UIs, I'll have Gemini generate an HTML/CSS spec first, then hand that to Claude Code to implement in our component system. Gemini designs, Claude builds.

Zoe picks the right agent for each task and routes outputs between them. A billing system bug goes to Codex. A button style fix goes to Claude Code. A new dashboard design starts with Gemini.

How to Set This Up

Copy this entire article into OpenClaw and tell it: "Implement this agent swarm setup for my codebase."

It'll read the architecture, create the scripts, set up the directory structure, and configure cron monitoring. Done in 10 minutes.

No course to sell you.

The Bottleneck Nobody Expects

Here's the ceiling I'm hitting right now: RAM.

Each agent needs its own worktree. Each worktree needs its own node_modules. Each agent runs builds, type checks, tests. Five agents running simultaneously means five parallel TypeScript compilers, five test runners, five sets of dependencies loaded into memory.

My Mac mini with 16GB tops out at 4-5 agents before it starts swapping, and even then I have to hope they don't all try to build at the same time.

So I bought a Mac Studio M4 Max with 128GB of RAM ($3,500) to power this system. It arrives at the end of March, and I'll share whether it's worth it.

Up Next: The One-Person Million-Dollar Company

We're going to see a ton of one-person million-dollar companies starting in 2026. The leverage is massive for those who understand how to build recursively self-improving agents.

This is what it looks like: an AI orchestrator as an extension of yourself (like what Zoe is to me), delegating work to specialized agents that handle different business functions. Engineering. Customer support. Ops. Marketing. Each agent focused on what it's good at. You maintain laser focus and full control.

The next generation of entrepreneurs won't hire a team of 10 to do what one person with the right system can do. They'll build like this — staying small, moving fast, shipping daily.

There's so much AI-generated slop right now. So much hype around agents and "mission controls" without building anything actually useful. Fancy demos with no real-world benefits.

I'm trying to do the opposite: less hype, more documentation of building an actual business. Real customers, real revenue, real commits that ship to production, and real loss too.

What am I building? Agentic PR — a one-person company taking on the enterprise PR incumbents. Agents that help startups get press coverage without a $10k/month retainer.

If you want to see how far I take this, follow along.