Circuit Breaker — Worker On/Off Control
internal prototype · canonical JSON + Dreamborn Forge HTML
internal generated
design_doc · markdown

Circuit Breaker — Worker On/Off Control

Circuit Breaker — Worker On/Off Control Date: 2026 04 27 Branch: feature/marketing agents Status: Approved for implementation Overview Add a circuit breaker to every worker agent — a flag that stops the runner from executing without stopping the systemd timer. Combined with a global kill switch that terminates all running processes immediately. Two mechanism...

Circuit Breaker — Worker On/Off Control

Date: 2026-04-27 Branch: feature/marketing-agents Status: Approved for implementation

---

Overview

Add a circuit breaker to every worker agent — a flag that stops the runner from executing without stopping the systemd timer. Combined with a global kill switch that terminates all running processes immediately.

  • Per-agent circuit breaker — graceful pause, takes effect on next timer tick
  • Global kill switch — immediate process termination + all breakers open

---

Storage

Migration: supabase/migrations/034_circuit_breaker.sql

Add one column to agent_state:

``sql ALTER TABLE agent_state ADD COLUMN circuit_breaker TEXT NOT NULL DEFAULT 'closed' CHECK (circuit_breaker IN ('open', 'closed')); ``

'closed' = normal (agent runs). 'open' = paused (agent skips). Default 'closed' — existing agents unaffected after migration.

No separate global flag. "All paused" = all rows have circuit_breaker = 'open'. Resume = patch rows back to 'closed'.

---

Runner Check

In agents/shared/runner.py, add _check_circuit_breaker(sb) called at the very top of _run() — before fetch_inbox, fetch_work_items, or any other logic.

  • If circuit_breaker = 'open': log "circuit breaker open — skipping", set agent_state.status = 'paused', return immediately
  • If Supabase is unreachable: fail to 'closed' (agents keep running — don't block the platform on a DB hiccup)
  • If no row exists for this agent: treat as 'closed'

``python def _check_circuit_breaker(self, sb: Client) -> bool: """Returns True if OK to run (closed), False if paused (open).""" try: row = sb.table("agent_state").select("circuit_breaker") \ .eq("agent", self.AGENT_ID).limit(1).execute() if row.data: return row.data[0].get("circuit_breaker", "closed") == "closed" except Exception as e: log.warning("circuit_breaker check failed — defaulting to closed: %s", e) return True ``

_run() calls this first:

``python def _run(self, sb: Client, creds: dict) -> None: if not self._check_circuit_breaker(sb): log.info("%s circuit breaker open — skipping", self.AGENT_ID) self._update_agent_state(sb, "paused") return # ... rest of existing _run logic ``

---

Kill Endpoint

Add POST /kill-agents to vps/hcs-post-server.js.

Two-step execution: 1. Patch all rows in agent_statecircuit_breaker = 'open' 2. pkill -f "agents.shared.runner" — terminates all running runner processes

Response:

``json { "ok": true, "killed": true } ``

pkill exit code 0 = processes killed, 1 = nothing was running — both are valid. Either way, breakers are open and nothing restarts.

---

CLI Script

scripts/circuit-breaker.js — run with node --env-file=.env scripts/circuit-breaker.js.

```

Pause one agent (graceful — takes effect on next timer tick)

node scripts/circuit-breaker.js --agent quinn --open

Resume one agent

node scripts/circuit-breaker.js --agent quinn --close

Resume all agents (no process start — just clears all breakers)

node scripts/circuit-breaker.js --resume-all

Global kill (terminates all running workers NOW + opens all breakers)

node scripts/circuit-breaker.js --kill ```

  • --open / --close: patch agent_state via Supabase (REDKEY_SUPABASE_URL + REDKEY_SUPABASE_SECRET_KEY)
  • --kill: POST to http://87.99.154.64:3001/kill-agents
  • --resume-all: patch all agent_state rows to circuit_breaker = 'closed'
  • Missing --agent with --open/--close: print usage and exit 1

---

Per-agent toggle
  • Pause icon when circuit_breaker = 'closed' (agent is running normally)
  • Play icon when circuit_breaker = 'open' (agent is paused)
  • Clicking PATCHes agent_state.circuit_breaker for that agent via Supabase
  • When open, show a small orange "paused" badge on the agent row
Header buttons

Two buttons in the cockpit header:

| Button | Color | Action | |--------|-------|--------| | Resume All | Green/neutral | Patches all agent_state.circuit_breaker = 'closed' | | Kill All | Red | Confirmation dialog → POST /kill-agents |

Confirmation dialog text: *"This will terminate all running workers immediately. Continue?"*

On Kill All success: cockpit reflects all agents as paused (orange badges). Resume All clears them.

---