{
  "id": "2026-04-24-completion-verification-ee9fb7e284",
  "scope": "redkey",
  "source_of_truth": "repo",
  "source_path": "docs/specs/2026-04-24-completion-verification.md",
  "source_kind": "markdown",
  "visibility": "internal",
  "renderer_id": "design_doc.dreamborn-forge.generated.v1",
  "design_system": "dreamborn-design-system:forge",
  "generated_at": "2026-05-09T13:00:55.667Z",
  "artifact_type": "design_doc",
  "schema_version": "design_doc.generated.v1",
  "title": "Completion Verification — Runner-Owned Task Governance",
  "summary": "Completion Verification — Runner Owned Task Governance Date: 2026 04 24 Status: Design Backlog: completion verification Problem Workers currently post task.complete themselves via the hedera tools MCP. The HCS topic receives completions before anything can validate them. The Engine advances workflow state based on the claim alone — there is no verification t...",
  "format_source": "markdown",
  "sections": [
    {
      "title": "Completion Verification — Runner-Owned Task Governance",
      "level": 1,
      "body": "**Date:** 2026-04-24  \n**Status:** Design  \n**Backlog:** `completion-verification`\n\n---"
    },
    {
      "title": "Problem",
      "level": 2,
      "body": "Workers currently post `task.complete` themselves via the hedera-tools MCP. The HCS topic receives completions before anything can validate them. The Engine advances workflow state based on the claim alone — there is no verification that work was actually done, correctly, or at all.\n\nThis creates three failure modes:\n1. Worker produces no output (bug, timeout, misunderstood brief)\n2. Worker produces output that exists but is wrong (technically valid, semantically broken)\n3. Worker produces output that partially addresses the task (missing criteria)\n\nAll three advance the workflow as if complete.\n\n---"
    },
    {
      "title": "Core Principle",
      "level": 3,
      "body": "**The worker's job is to produce output. The runner's job is to verify it and own the HCS protocol.**\n\nWorkers stop knowing about `task.complete`. Their prompt says: write your output to `{output_path}` and stop. The runner handles everything from there — validation, retry, and posting to HCS.\n\nThis means **only verified completions ever reach HCS**. The on-chain record is clean.\n\n---"
    },
    {
      "title": "Completion Contract",
      "level": 3,
      "body": "Each step in a workflow template defines what \"done\" looks like. This is the completion contract. The runner reads it from the task brief and validates against it.\n\n```yaml\nsteps:\n  - role: developer\n    brief: \"implement the feature\"\n    completion:\n      type: file\n      path: \"feature/{task_id}/output.md\"\n      min_length: 100\n      evaluate: \"Must address all acceptance criteria in the brief. Code must be syntactically valid and complete.\"\n\n  - role: reviewer\n    brief: \"review and approve the PR\"\n    completion:\n      type: signal\n      signal: approved\n      evaluate: \"Must include a clear verdict (approved/rejected) and specific reasoning.\"\n\n  - role: ba\n    brief: \"run the integration tests\"\n    completion:\n      type: none          # pure side-effect — runner posts complete immediately after worker exits cleanly\n```\n\n**Completion types:**\n\n| Type | What runner checks |\n|---|---|\n| `file` | Path exists, non-empty, optionally `min_length` met |\n| `signal` | Worker output JSON contains the specified signal field |\n| `none` | Worker exits cleanly — runner posts complete immediately, no validation |\n\nThe `evaluate` field is optional. When present, Haiku reads it alongside the output to assess semantic correctness.\n\n---"
    },
    {
      "title": "Runner Verification Loop",
      "level": 3,
      "body": "```\n1. Runner assembles brief (includes completion contract)\n2. Invoke Claude CLI — worker attempt 1\n3. Mechanical check against completion contract\n   FAIL → skip Haiku, go to step 6 with mechanical failure reason\n\n4. Haiku evaluation (if evaluate field present)\n   Input: original brief + completion contract + worker output\n   Output: pass / fail + specific diagnosis\n   PASS → step 5\n   FAIL → go to step 6 with Haiku's diagnosis\n\n5. Post task.complete to HCS → done\n\n6. Build revisionist brief:\n   - Original task brief\n   - Completion contract (\"here is what done looks like\")\n   - Specific failure reason from step 3 or 4\n   - Explicit instruction: produce output that satisfies the contract\n\n7. Invoke Claude CLI — revisionist attempt\n8. Mechanical check\n   FAIL → post task.blocked (see below) → done\n\n9. Haiku evaluation (if evaluate field present)\n   PASS → post task.complete → done\n   FAIL → post task.blocked (see below) → done\n```\n\nOne attempt. If the revisionist can't fix it, block. The diagnosis quality is what makes one attempt sufficient — the worker knows exactly what it got wrong.\n\n---"
    },
    {
      "title": "Haiku Evaluator",
      "level": 3,
      "body": "Haiku is used for semantic evaluation only — it never produces task output. It receives:\n\n- The original task brief\n- The completion contract (type, path/signal, evaluate criteria)\n- The worker's output (file content or signal value)\n\nIt returns a structured verdict:\n\n```json\n{\n  \"pass\": false,\n  \"diagnosis\": \"Output addresses acceptance criteria 1 and 3 but omits error handling for the edge case described in criterion 2. No test coverage for the null input path.\"\n}\n```\n\n**Haiku is not invoked when `evaluate` is absent** — mechanical check is sufficient for those steps.\n\n**Haiku is skipped entirely when `type: none`** — no output, no evaluation.\n\nCost profile: Haiku evaluation is a small read-only call against an existing output. Negligible cost relative to the Sonnet worker invocation.\n\n---"
    },
    {
      "title": "task.blocked Payload",
      "level": 3,
      "body": "When both attempts fail, the runner posts `task.blocked` with full context:\n\n```json\n{\n  \"type\": \"task.blocked\",\n  \"task_id\": \"uuid\",\n  \"workflow_instance_id\": \"uuid\",\n  \"step_num\": 3,\n  \"agent\": \"quinn\",\n  \"completion_spec\": {\n    \"type\": \"file\",\n    \"path\": \"feature/abc/output.md\",\n    \"evaluate\": \"Must address all acceptance criteria...\"\n  },\n  \"attempts\": [\n    {\n      \"attempt\": 1,\n      \"stage\": \"haiku_evaluation\",\n      \"failure\": \"Output addresses criteria 1 and 3 but omits error handling for criterion 2. No test coverage for null input.\"\n    },\n    {\n      \"attempt\": 2,\n      \"stage\": \"mechanical\",\n      \"failure\": \"Output file missing at feature/abc/output.md\"\n    }\n  ],\n  \"ts\": \"ISO8601\"\n}\n```\n\nThe Engine escalates to Sonnet on `task.blocked`. Sonnet has enough context to decide: re-post to `roles.exec` for human review, or attempt a different resolution.\n\nThe `roles.exec` item Justin sees is specific and actionable — not \"Quinn got blocked\" but the exact criteria, the exact failures, and what each attempt produced.\n\n---"
    },
    {
      "title": "Worker Prompt Change",
      "level": 3,
      "body": "Current worker prompt includes instruction to post `task.complete` via hedera-tools MCP. This is removed.\n\nReplacement:\n\n```\nWhen your work is complete, write your output to: {output_path}\n\nDo not post any task.complete or task.blocked messages. The runner handles protocol.\nYour only job is to produce correct output at the specified path.\n```\n\nFor `type: signal` tasks, the worker writes a JSON file to `output_path` containing the signal field:\n\n```json\n{ \"signal\": \"approved\", \"reasoning\": \"...\" }\n```\n\nFor `type: none` tasks, no output instruction is needed — runner posts complete on clean exit.\n\n---"
    },
    {
      "title": "What Moves Where",
      "level": 3,
      "body": "| Responsibility | Before | After |\n|---|---|---|\n| Post `task.complete` | Worker (via MCP) | Runner |\n| Post `task.blocked` | Worker (via MCP) | Runner |\n| Validate output | Nobody | Runner (mechanical + Haiku) |\n| Retry on failure | Nobody | Runner (one revisionist attempt) |\n| Failure diagnosis | Nobody | Haiku evaluator |\n\n---"
    },
    {
      "title": "Implementation Notes",
      "level": 3,
      "body": "- Completion contract fields added to `workflow_steps` schema (migration)\n- Runner reads `completion` block from task brief at startup\n- Mechanical validation is a simple utility in `agents/shared/runner.py`\n- Haiku evaluation is a second Claude CLI invocation with a short, structured prompt\n- Revisionist brief is assembled by the runner — not a separate agent, just a re-invocation with additional context prepended\n- `task.blocked` payload extended with `completion_spec` + `attempts` array\n- Worker persona updated to remove HCS posting instructions\n\n---"
    },
    {
      "title": "What This Delivers",
      "level": 2,
      "body": "- **Clean on-chain record** — only verified completions reach HCS\n- **Self-healing by default** — most failures resolved in the runner, no human needed\n- **Actionable blocks** — when escalation is needed, Justin sees exactly why\n- **No output cases handled** — `type: none` is a first-class completion type\n- **Semantic correctness** — Haiku catches \"technically valid, functionally wrong\" before the workflow advances\n- **Cheap** — Haiku evaluation adds minimal cost; revisionist only fires on failure"
    }
  ],
  "html_path": "artifacts/2026-04-24-completion-verification-ee9fb7e284.html",
  "json_path": "artifacts/2026-04-24-completion-verification-ee9fb7e284.json"
}