{
  "id": "2026-05-08-codex-agent-candidates-98bf9c4ea4",
  "scope": "redkey",
  "source_of_truth": "repo",
  "source_path": "docs/specs/2026-05-08-codex-agent-candidates.md",
  "source_kind": "markdown",
  "visibility": "internal",
  "renderer_id": "design_doc.dreamborn-forge.generated.v1",
  "design_system": "dreamborn-design-system:forge",
  "generated_at": "2026-05-09T13:00:55.745Z",
  "artifact_type": "design_doc",
  "schema_version": "design_doc.generated.v1",
  "title": "Codex Agent Candidate Assessment",
  "summary": "Codex Agent Candidate Assessment. Date: 2026-05-08. Status: Draft. Owner: Justin / Atlas-Codex. Scope: RedKey platform agent strategy. Summary: RedKey should build Codex-native agents, but not all named RedKey agents are equally good first candidates. Codex is strongest when the work is repo-centered and verifiable: inspect files, make scoped edits, run tests, rea...",
  "format_source": "markdown",
  "sections": [
    {
      "title": "Codex Agent Candidate Assessment",
      "level": 1,
      "body": "Date: 2026-05-08  \nStatus: Draft  \nOwner: Justin / Atlas-Codex  \nScope: RedKey platform agent strategy"
    },
    {
      "title": "Summary",
      "level": 2,
      "body": "RedKey should build Codex-native agents, but not all named RedKey agents are equally good first candidates.\n\nCodex is strongest when the work is repo-centered and verifiable: inspect files, make scoped edits, run tests, read logs, update docs, open PR-ready changes, and respond to review. The first Codex agent should therefore operate inside the RedKey harness before it attempts broad product, content, communication, or external-system workflows.\n\nRecommendation:\n\n1. Build **Team OS Gardener** first as the lowest-risk persistent Codex-style agent.\n2. Build **Quinn** next as the primary implementation/slice agent.\n3. Build **B2BEA Slice Auditor** as a focused reviewer agent for Quinn and B2BEA governance work.\n\nAtlas should remain the supervisor/persona layer. Jess and Mia are valid future agents, but they are better fits for OpenAI Agents SDK workflows with Gmail, Drive, Calendar, approvals, and persistent state than for the first Codex-native repo agent."
    },
    {
      "title": "Decision Criteria",
      "level": 2,
      "body": "A good first Codex agent should have:\n\n- Repo-local work.\n- Clear input artifacts.\n- Clear file ownership.\n- Verifiable output.\n- Low external side effects.\n- Repeatable test/build/lint commands.\n- Small blast radius.\n- A natural review loop.\n- Strong fit with the RedKey Team OS and Harness standards.\n\nA weaker first Codex agent has:\n\n- Heavy dependence on external communication tools.\n- Client-visible publication risk.\n- Ambiguous subjective output.\n- Weak automated verification.\n- Many approval paths.\n- Broad authority across projects or systems."
    },
    {
      "title": "Candidate Ranking",
      "level": 2,
      "body": "| Rank | Candidate | Fit | Why |\n| --- | --- | --- | --- |\n| 1 | Team OS Gardener | Best first persistent agent | Low-risk, repo-centered, improves the harness itself |\n| 2 | Quinn | Best implementation agent | Natural Codex fit: code, tests, docs, PR-ready patches |\n| 3 | B2BEA Slice Auditor | Best reviewer agent | Focused governance checks with strong existing patterns |\n| 4 | Bezel API Harness Agent | Strong backend candidate | Good smoke/test surface and clear API invariants |\n| 5 | Priya | Useful later | Planning/artifact synthesis depends on Studio retrieval maturity |\n| 6 | Jess | Not first | External workflow agent with Gmail/Drive/Calendar and approvals |\n| 7 | Mia | Not first | Content workflow agent; subjective output and publication risks |\n| 8 | Atlas | Do not convert first | Should remain supervisor/persona/router, not first autonomous agent |"
    },
    {
      "title": "1. Team OS Gardener",
      "level": 3,
      "body": "Purpose:\n\n- Keep RedKey context legible.\n- Scan Team OS docs, project routing, state files, specs, and plans for drift.\n- Open small patches to improve cross-links, freshness, and routing clarity.\n\nWhy first:\n\n- It exercises the harness without risking product code.\n- It improves the substrate future agents depend on.\n- It has low external side effects.\n- Its output is easy for Justin or Atlas to review.\n\nInputs:\n\n- `TEAM_OS.md` once created.\n- `project-refs.yaml`.\n- `docs/state.md`.\n- `docs/specs`.\n- `docs/plans`.\n- `docs/team-os`.\n- `.codex/atlas-codex/OPERATING.md`.\n- `AGENTS.md`.\n\nAllowed actions:\n\n- Read repository docs and config.\n- Propose or patch documentation fixes.\n- Flag contradictions between state, project refs, and specs.\n- Add missing cross-links.\n- Suggest stale-state cleanup.\n\nBlocked actions without approval:\n\n- Editing Studio canonical product artifacts.\n- Changing deployment config.\n- Changing application code.\n- Rewriting active project state beyond small routing corrections.\n- Deleting docs.\n\nVerification:\n\n- Markdown lint or basic formatting checks where available.\n- Link/cross-reference checks when implemented.\n- Fresh Atlas startup can still route active projects correctly."
    },
    {
      "title": "2. Quinn Codex Agent",
      "level": 3,
      "body": "Purpose:\n\n- Execute small implementation slices from approved specs or plans.\n- Make scoped repo patches.\n- Run verification.\n- Produce PR-ready summaries.\n\nWhy second:\n\n- Quinn is the clearest Codex-native role.\n- Implementation work has natural verification loops.\n- The Team OS Gardener should improve the context Quinn depends on before Quinn becomes persistent.\n\nInitial scope:\n\n- B2BEA runtime-governance slices.\n- Bezel API hardening tasks.\n- Focused test/build/doc patches.\n\nInputs:\n\n- Approved implementation plan or Studio build-execution artifact.\n- Repo-specific `AGENTS.md`.\n- Relevant specs and tests.\n- Harness standards.\n\nAllowed actions:\n\n- Inspect code.\n- Make scoped edits.\n- Add/update tests.\n- Run focused and full verification commands.\n- Update implementation notes.\n\nBlocked actions without approval:\n\n- Deployments.\n- Secrets or environment changes.\n- Broad refactors outside the assigned slice.\n- Client-visible content changes.\n- Canonical product planning changes outside Studio.\n\nVerification:\n\n- Focused tests for touched behavior.\n- Full test/build command where feasible.\n- Governance audit if the target repo has one.\n- Browser verification for UI work when applicable."
    },
    {
      "title": "3. B2BEA Slice Auditor",
      "level": 3,
      "body": "Purpose:\n\n- Review B2BEA website slices against existing governance rules.\n- Catch drift before merge.\n- Serve as a focused reviewer for Quinn or Atlas-Codex work.\n\nWhy third:\n\n- B2BEA already has strong slice history and governance patterns.\n- Auditor behavior can be narrow and repeatable.\n- It improves quality without initially granting implementation authority.\n\nChecks:\n\n- No rogue CSS or design-system violations.\n- Runtime JavaScript is moved to governed assets when required.\n- Inline runtime migrations preserve behavior.\n- Focused tests are updated.\n- Full test/build pass is recorded.\n- Studio artifact/build-execution references are present when required.\n- Route/auth/access behavior remains aligned with policy.\n\nAllowed actions:\n\n- Read code and tests.\n- Run verification commands.\n- Produce review findings.\n- Optionally apply small doc or test-expectation fixes after approval.\n\nBlocked actions without approval:\n\n- Changing product behavior.\n- Merging PRs.\n- Deploying.\n- Rewriting slice scope."
    },
    {
      "title": "Bezel API Harness Agent",
      "level": 3,
      "body": "Good fit once RedKey wants backend reliability automation.\n\nPossible responsibilities:\n\n- Run live and local smoke tests.\n- Check OpenAPI and SDK drift.\n- Verify claim lifecycle behavior.\n- Verify event leak boundaries.\n- Inspect migration state.\n- Propose small hardening patches.\n\nThis agent should wait until Team OS Gardener and Quinn establish baseline harness discipline."
    },
    {
      "title": "Priya",
      "level": 3,
      "body": "Priya is a planning and artifact synthesis agent. Codex can support Priya-like work, but Priya should not be the first Codex agent because planning quality depends on clean Studio artifact retrieval and strict artifact routing.\n\nPriya becomes a better candidate when:\n\n- Studio artifacts have reliable read/write helpers.\n- Product specs have stable schemas.\n- Agent outputs can be validated mechanically.\n- Artifact routing is enforced strictly enough to prevent repo-doc leakage."
    },
    {
      "title": "Jess",
      "level": 3,
      "body": "Jess Podcast Coordinator is a better fit for an OpenAI Agents SDK workflow than a first Codex-native repo agent.\n\nReasons:\n\n- Needs Gmail, Calendar, Drive, guest context, and approval flows.\n- Produces client/human-facing communication.\n- Has external side effects.\n- Needs durable state across scheduling and prep workflows.\n\nJess is a good future SDK-agent candidate after approval gates and connector tools are well defined."
    },
    {
      "title": "Mia",
      "level": 3,
      "body": "Mia Content Creator is also better as a future workflow/content agent than the first Codex agent.\n\nReasons:\n\n- Output quality is subjective.\n- Publication risk is higher.\n- Verification is less mechanical.\n- Needs brand voice, content calendar, review, and approval workflows.\n\nMia should wait until content review and approval gates are encoded."
    },
    {
      "title": "Atlas",
      "level": 3,
      "body": "Atlas should not be converted into the first autonomous Codex agent.\n\nAtlas is the supervisor/router/persona layer:\n\n- Loads operating context.\n- Routes project work.\n- Applies memory protocol.\n- Helps Justin decide what to do.\n- Coordinates tools and agents.\n\nTurning Atlas into the first autonomous agent would blur supervision and execution. Atlas should remain the control layer while narrower agents do bounded work."
    },
    {
      "title": "Codex vs OpenAI Agents SDK",
      "level": 2,
      "body": "Use Codex-native agents when:\n\n- Work is repo-centered.\n- The agent needs shell, git, tests, build tools, and code review.\n- Outputs are patches, docs, tests, PRs, or verification reports.\n- The primary risk is code correctness or repo drift.\n\nUse OpenAI Agents SDK agents when:\n\n- RedKey must own orchestration, tool execution, approvals, runtime behavior, state, and storage.\n- The agent must run as a durable service.\n- The workflow spans external tools such as Gmail, Drive, Calendar, Studio, Supabase, or Bezel API.\n- Human approval points need to be part of the runtime.\n- The agent needs multi-step state outside a single coding session.\n\nLikely split:\n\n- Team OS Gardener: Codex-native first, possibly later scheduled.\n- Quinn: Codex-native.\n- B2BEA Slice Auditor: Codex-native reviewer.\n- Bezel API Harness Agent: Codex-native or SDK depending on whether it becomes persistent.\n- Priya: SDK or Studio-integrated workflow after artifact schemas mature.\n- Jess: SDK.\n- Mia: SDK."
    },
    {
      "title": "First Pilot Definition",
      "level": 2,
      "body": "The first pilot should be **Team OS Gardener**.\n\nMinimum viable workflow:\n\n1. Read `project-refs.yaml`, `docs/state.md`, `AGENTS.md`, Atlas operating docs, and Team OS docs.\n2. Identify contradictions, stale next actions, missing cross-links, and routing gaps.\n3. Produce a short report.\n4. Patch only low-risk documentation fixes.\n5. Escalate anything that changes canonical project state.\n6. Run basic verification, such as Markdown lint or formatting checks where available.\n7. Log durable lessons when the cleanup reveals a reusable rule.\n\nSuccess criteria:\n\n- Finds at least one real context/routing issue, or confirms with evidence that none exist.\n- Makes a small reviewable patch or produces a precise no-op report.\n- Does not modify canonical product or client artifacts without approval.\n- Does not inflate `docs/state.md` into a history archive.\n- Improves the next Atlas startup experience."
    },
    {
      "title": "Open Questions",
      "level": 2,
      "body": "1. Should Team OS Gardener run only on demand at first, or on a weekly cadence?\n2. Should Quinn be a Codex subagent profile, a repeatable prompt/skill, or a durable Codex automation?\n3. Should B2BEA Slice Auditor be read-only at first?\n4. What is the minimum harness every project repo must expose before Quinn can touch it?\n5. Where should Codex-native agent outputs be recorded: in `agent_memory`, Studio artifacts, or repo docs?"
    },
    {
      "title": "Decision",
      "level": 2,
      "body": "Start with Team OS Gardener as the first Codex-native agent candidate. Use it to improve context quality and harness reliability before granting broader implementation authority to Quinn.\n\nQuinn should be the first true builder agent after that. B2BEA Slice Auditor should become the first focused reviewer agent."
    }
  ],
  "html_path": "artifacts/2026-05-08-codex-agent-candidates-98bf9c4ea4.html",
  "json_path": "artifacts/2026-05-08-codex-agent-candidates-98bf9c4ea4.json"
}