
Ralph Wiggum

You’ve probably been here: you ask an AI agent to implement a feature, it writes some code, declares victory, and… the tests fail. You prompt again. It tries a different approach. Still broken. Three iterations later, you’re doing it yourself.

Ralph Wiggum flips this dynamic. Instead of hoping for first-try perfection, you design for iteration. The agent keeps running until the work is actually complete: tests pass, types check, linting clears. No premature exits. No false victories.

At its simplest, Ralph is a bash loop:

while :; do cat PROMPT.md | claude ; done

That’s it. Feed the agent the same task repeatedly. Each iteration builds on the last through git history and progress tracking. The agent doesn’t need to be perfect; it just needs to be persistent.

The philosophy: Iteration beats perfection. Deterministic failures are data. Keep trying until success.

Ralph wraps the standard AI tool loop with an outer verification layer:

┌────────────────────────────────────────────────────┐
│ Ralph Loop (outer)                                 │
│ ┌────────────────────────────────────────────────┐ │
│ │ AI SDK Tool Loop (inner)                       │ │
│ │ LLM ↔ tools ↔ LLM ↔ tools ... until done       │ │
│ └────────────────────────────────────────────────┘ │
│                        ↓                           │
│ verifyCompletion: "Is the TASK actually complete?" │
│                        ↓                           │
│ No?  → Inject feedback → Run another iteration     │
│ Yes? → Return final result                         │
└────────────────────────────────────────────────────┘
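The outer loop above can be sketched in a few lines of TypeScript. Here, runAgent and verify are hypothetical stand-ins for the inner tool loop and your completion check, not a real API:

```typescript
// Sketch of the outer Ralph loop. `runAgent` stands in for the inner
// AI SDK tool loop; `verify` is your completion check.
type Verdict = { complete: boolean; reason: string };

async function ralphLoop(
  runAgent: (feedback: string) => Promise<string>,
  verify: (output: string) => Promise<Verdict>,
  maxIterations = 10,
): Promise<string> {
  let feedback = "";
  let output = "";
  for (let i = 0; i < maxIterations; i++) {
    output = await runAgent(feedback);   // inner loop runs until the LLM stops
    const verdict = await verify(output); // outer check: is it actually done?
    if (verdict.complete) return output;
    feedback = verdict.reason;           // inject feedback into the next run
  }
  return output; // iteration cap reached; return best effort
}
```

The key design choice is that the verdict's reason feeds the next iteration, so a failing check is data rather than a dead end.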

The key mechanisms:

  • Stop hook: This intercepts exit attempts and checks completion criteria before allowing the agent to stop.
  • Progress tracking: A progress.txt file tracks what’s been done, decisions made, and blockers encountered.
  • Git commits: Each iteration commits work, creating context for future iterations.
  • Feedback loops: Types, tests, and linting verify quality before continuing.
  • Verification: Custom completion criteria determine when the task is truly done.

HITL (human-in-the-loop) mode: run one iteration at a time. Watch the agent work. Intervene when needed.

This is pair programming with AI. You see every decision, catch mistakes early, and guide the direction.

Best for:

  • Learning the technique
  • Refining prompts
  • Working on risky tasks where you want eyes on every change

AFK (away-from-keyboard) mode: set a maximum iteration count and let it run. Come back later to review the results.

This is overnight work. You define clear success criteria, cap the iterations, and let the agent grind through mechanical tasks while you sleep.

Best for well-defined work like:

  • Test migrations
  • Coverage improvements
  • Large refactors with clear patterns

Critical for AFK mode: Use Docker sandboxes. You’re giving an agent autonomous access to your system. Contain it.

docker sandbox run claude

Ralph excels at tasks with clear completion criteria:

  • Large refactors: Converting class components to hooks, Jest to Vitest
  • Framework migrations: Test suite conversions with clear before/after states
  • TDD workflows: Implement features until tests pass
  • Test coverage: Add tests to uncovered code until a coverage threshold is met
  • Greenfield builds: REST APIs, complete features with defined specs
  • Mechanical cleanup: Linting fixes, duplicate removal, code smell elimination

Some tasks resist iteration:

  • Ambiguous requirements: If you can’t define β€œdone,” the loop can’t verify completion.
  • Architectural decisions: These need human judgment, not persistence.
  • Security-sensitive code: Auth, payments, and crypto require human review regardless of test results.
  • Exploration tasks: β€œFigure out why the app is slow” has no clear stopping point.
  • One-shot operations: If you need immediate results, the loop overhead isn’t worth it.

Use structured completion criteria:

{
  "category": "functional",
  "description": "New chat button creates conversation",
  "steps": ["Click button", "Verify conversation", "Check welcome state"],
  "passes": false
}

The agent knows exactly what β€œdone” means.
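A small helper can turn such criteria into a machine-checkable definition of done. This is a sketch; the field names simply mirror the JSON example above:

```typescript
// Completion criteria shaped like the JSON example above.
interface Criterion {
  category: string;
  description: string;
  steps: string[];
  passes: boolean;
}

// Descriptions of criteria that still fail; feed these back to the agent.
function remainingWork(criteria: Criterion[]): string[] {
  return criteria.filter((c) => !c.passes).map((c) => c.description);
}

// The task is done only when every criterion passes.
function isDone(criteria: Criterion[]): boolean {
  return remainingWork(criteria).length === 0;
}
```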

Maintain a progress.txt with:

  • Tasks completed
  • Decisions made and why
  • Blockers encountered
  • Files changed

This gives future iterations context about past work.
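One way to keep that file structured is to append a formatted entry at the end of each iteration. A sketch; the entry layout is an assumption, not a required format:

```typescript
import { appendFileSync } from "node:fs";

// One iteration's worth of progress (illustrative field names).
interface ProgressEntry {
  iteration: number;
  completed: string[];
  decisions: string[];
  blockers: string[];
  filesChanged: string[];
}

// Render an entry as plain text so future iterations can read it back.
function formatEntry(e: ProgressEntry): string {
  const line = (label: string, items: string[]) =>
    `${label}: ${items.length ? items.join("; ") : "none"}`;
  return [
    `== Iteration ${e.iteration} ==`,
    line("Completed", e.completed),
    line("Decisions", e.decisions),
    line("Blockers", e.blockers),
    line("Files changed", e.filesChanged),
    "",
  ].join("\n");
}

// Append to progress.txt at the end of an iteration.
function logProgress(e: ProgressEntry, path = "progress.txt"): void {
  appendFileSync(path, formatEntry(e));
}
```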

Block commits unless ALL feedback loops pass:

  • TypeScript type checking
  • Unit tests
  • E2E tests (Playwright, Cypress)
  • Linting
  • Pre-commit hooks

If any check fails, the iteration isn’t complete.
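As a sketch, the gate can be a small script that runs each check and only commits when all of them pass. The commands below are examples; substitute your project's scripts:

```typescript
import { spawnSync } from "node:child_process";

// Feedback loops to run before each commit (example commands).
const checks: [string, string[]][] = [
  ["npx", ["tsc", "--noEmit"]], // type checking
  ["npx", ["vitest", "run"]],   // unit tests
  ["npx", ["eslint", "."]],     // linting
];

// The runner is injectable so the gate is testable without spawning anything.
type Runner = (cmd: string, args: string[]) => boolean;

const shellRun: Runner = (cmd, args) =>
  spawnSync(cmd, args, { stdio: "inherit" }).status === 0;

// Commit only when every check passes; returns whether a commit happened.
function commitIfGreen(run: Runner = shellRun): boolean {
  if (!checks.every(([cmd, args]) => run(cmd, args))) return false;
  return run("git", ["commit", "-am", "ralph: iteration checkpoint"]);
}
```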

Keep it to one logical change per commit. Break large tasks into subtasks, and run feedback loops after each change. You want quality over speed.

  • HITL: Watch every iteration
  • AFK: Set max-iterations (10-20 for small tasks, 30-50 for large)
  • Never use unlimited iterations

A 50-iteration loop on a large codebase can cost $50-100+ in API credits. Start with 10-20 iterations to understand token consumption before scaling up.
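The arithmetic behind that estimate is easy to sketch. The prices and per-iteration token counts below are placeholders; check your provider's current pricing:

```typescript
// Back-of-envelope cost estimate for a Ralph run.
function estimateCostUSD(
  iterations: number,
  inputTokensPerIter: number,
  outputTokensPerIter: number,
  inputPricePerMTok: number,  // USD per million input tokens
  outputPricePerMTok: number, // USD per million output tokens
): number {
  const inputCost = (iterations * inputTokensPerIter * inputPricePerMTok) / 1e6;
  const outputCost =
    (iterations * outputTokensPerIter * outputPricePerMTok) / 1e6;
  return inputCost + outputCost;
}

// e.g. 50 iterations, 200k input + 10k output tokens each, at $5/$25 per MTok:
// estimateCostUSD(50, 200_000, 10_000, 5, 25) → 62.5
```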

Good git hygiene gives you a clean history and clear rollback points: if iteration 15 breaks something, you can revert to iteration 14.

To try Ralph in Claude Code, install the official plugin:

/plugin install ralph-loop@claude-plugins-official
/ralph-loop "Add JSDoc comments to all exported functions" --max-iterations 10

Or use the AI SDK agent directly:

npm install ralph-loop-agent ai zod
import { RalphLoopAgent, iterationCountIs } from "ralph-loop-agent";

const agent = new RalphLoopAgent({
  model: "anthropic/claude-opus-4.5",
  instructions: "You are a helpful coding assistant.",
  stopWhen: iterationCountIs(10),
  verifyCompletion: async ({ result }) => ({
    complete: result.text.includes("DONE"),
    reason: "Task completed successfully",
  }),
});
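Checking for the string "DONE" is easy for the model to game. Assuming verifyCompletion can run arbitrary async code, a sturdier verifier can shell out to the real test suite. A sketch, with an injectable test runner (the Vitest command is only an example):

```typescript
import { spawnSync } from "node:child_process";

type Verdict = { complete: boolean; reason: string };

// Build a verifier that trusts the test suite, not the model's say-so.
// `runTests` is injectable; the default shells out to Vitest as an example.
function makeTestVerifier(
  runTests: () => boolean = () =>
    spawnSync("npx", ["vitest", "run"], { stdio: "inherit" }).status === 0,
): () => Promise<Verdict> {
  return async () =>
    runTests()
      ? { complete: true, reason: "Test suite passes" }
      : { complete: false, reason: "Tests failing; keep iterating" };
}
```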

Traditional AI coding asks: β€œHow do I get the perfect prompt?”

Ralph asks: β€œHow do I design conditions where iteration leads to success?”

You stop directing the AI step-by-step and start designing loops that converge on solutions. The agent’s job is persistence. Your job is defining what β€œdone” looks like and ensuring the feedback loops catch failures.

This is continuous autonomy: the agent works until the job is actually done, not just until the LLM stops calling tools.

  • ralph-claude-code – Rate limiting, tmux dashboards, circuit breakers
  • ralph-orchestrator – Token tracking, spending limits, checkpointing

Using Ralph in production? Share your experience: what worked, what didn’t, and what you learned along the way.