Executive Summary

The primary bottleneck in autonomous software development is not model intelligence, but context management and architectural determinism. Current “Agentic” approaches fail at scale because they rely on probabilistic guidance (prompts) for deterministic engineering tasks (builds, security, state management). Furthermore, the linear cost of token consumption versus the non-linear degradation of model attention creates a “Context Trap” that prevents complex multi-phase execution.

This architecture treats the Large Language Model (LLM) not as a chatbot, but as a nondeterministic kernel process wrapped in a deterministic runtime environment. We present a five-layer architecture that enforces strict separation of concerns, enables linear scaling of complexity, and achieves “escape velocity”—where the AI system contributes net-positive value to the development lifecycle.


1. The Core Problem: The Context-Capability Paradox

Research confirms that token usage alone explains 80% of performance variance in agent tasks. This creates a fundamental paradox:

  1. To handle complex tasks, agents need comprehensive instructions (skills).
  2. Comprehensive instructions consume the context window.
  3. Consumed context reduces the model’s ability to reason about the actual task.

The Monolith vs. Distributed Architecture Problem

Legacy: The Monolith

  • Monolithic Agent contains all instructions, all tools, and full state
  • Result: Context Overflow

Platform: Distributed Architecture

  • Orchestrator Skill spawns Specialized Agent (Worker)
  • Worker loads Skill Library and MCP Tools on-demand (JIT)
  • Deterministic Hooks enforce Validation Loop that gates the Worker
  • Worker outputs Structured State

1.1 The Solution: Inverting the Control Structure

We moved from a “Thick Agent” model to a “Thin Agent / Fat Platform” architecture:

  • Agents are reduced to stateless, ephemeral workers (<150 lines).
  • Skills hold the knowledge, loaded strictly on-demand (Just-in-Time).
  • Hooks provide the enforcement, operating outside the LLM’s context.
  • Orchestration manages the lifecycle of specialized roles.

2. Agent Architecture: The “Thin Agent” Pattern

2.1 Architectural Constraints

The architecture is defined by one hard constraint: Sub-agents cannot spawn other sub-agents. This prevents infinite recursion but necessitates a flat, “Leaf Node” execution model.

2.2 The “Thin Agent” Specification

Agents are specialized workers that execute specific tasks and return results. They do not manage state or coordinate workflows.

Gold Standard Specification:

| Metric | Value |
| --- | --- |
| Line Count | Strictly <150 lines |
| Discovery Cost | ~500-1000 characters (visible to orchestrator) |
| Execution Cost | ~2,700 tokens per spawn (down from ~24,000 in early versions) |

2.3 Sub-Agent Isolation

Every agent spawn creates a fresh instance with zero shared history from previous siblings. This solves “Context Drift” where agents confuse current requirements with past attempts.

Workflow:

Orchestrator Skill → Task Tool → Spawn Sub-Agent
→ Clean Context Window → Gateway Skill (Read)
→ Library Skills (On-Demand Route) → Task Execution
→ Structured JSON Return → Destroy Context

3. Skill Architecture: Two-Tier Progressive Loading

To bypass the hard limit of ~15,000 characters for skill definitions and preserve context for thinking and execution, we implemented a two-tier (Librarian Pattern) file system architecture.

3.1 Tier 1: Core Skills (The “BIOS”)

  • Location: .claude/skills/
  • Count: 49 high-frequency skills
  • Mechanism: Registered with the LLM as executable tools
  • Purpose: Fundamental workflows (debugging-systematically, developing-with-tdd) and Gateways

3.2 Tier 2: Library Skills (The “Hard Drive”)

  • Location: .claude/skill-library/
  • Count: 304+ specialized skills
  • Mechanism: Invisible to the LLM until explicitly loaded via Read()
  • Purpose: Deep domain knowledge (e.g., optimizing-react-performance, configuring-aws-lambda)

3.3 The Gateway Pattern (The Router)

Agents do not hardcode library paths. They invoke a Gateway Skill (e.g., gateway-frontend), which acts as a dynamic router based on intent detection.

Example:

Agent: "I need to fix a React infinite loop"
│
▼
Invokes: gateway-frontend
│
├─► Detects Intent: "infinite loop", "useEffect"
│
├─► Matches Routing Pattern:
│     • preventing-react-hook-infinite-loops
│     • debugging-react-infinite-loops
│
└─► Output: "Read these specific file paths..."

This implements Intent-Based Context Loading, ensuring agents only load specific patterns relevant to their current task.
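
As a sketch, the router inside a gateway skill can be as simple as a keyword table mapping intent to skill file paths. The patterns and paths below are illustrative, not the platform's actual routing table:

```typescript
// Hypothetical gateway routing table: keyword patterns → library skill paths.
type Route = { keywords: string[]; skills: string[] };

const FRONTEND_ROUTES: Route[] = [
  {
    keywords: ["infinite loop", "useEffect"],
    skills: [
      ".claude/skill-library/preventing-react-hook-infinite-loops/SKILL.md",
      ".claude/skill-library/debugging-react-infinite-loops/SKILL.md",
    ],
  },
  {
    keywords: ["render performance", "re-render"],
    skills: [".claude/skill-library/optimizing-react-performance/SKILL.md"],
  },
];

// Returns only the skill files matching the detected intent, so the agent
// Read()s a handful of paths instead of loading the whole library.
function routeIntent(intent: string, routes: Route[] = FRONTEND_ROUTES): string[] {
  const text = intent.toLowerCase();
  const matched = routes
    .filter((r) => r.keywords.some((k) => text.includes(k.toLowerCase())))
    .flatMap((r) => r.skills);
  return Array.from(new Set(matched));
}
```

The gateway's output is just the path list; the agent then decides which of the returned files to actually read.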


4. Orchestration Architecture: The Kernel

Since agents cannot spawn agents, orchestration must occur in the Main Conversation Thread. We treat the Main Thread as the “Kernel Mode” process, and sub-agents as “User Mode” processes.

4.1 The Orchestrator-Worker Pattern

The Orchestrator is a Skill running in the main thread. It holds the global state machine.

The Tool Restriction Boundary:

| Role | Tools | Constraint |
| --- | --- | --- |
| Orchestrator (Main Thread) | Task, TodoWrite, Read | NO Edit or Write: must delegate to a worker |
| Worker (Sub-Agent) | Edit, Write, Bash | NO Task: must work, cannot delegate |

4.2 Coordinator vs. Executor Models

We enforce a strict separation between agents that plan and agents that do:

| Model | Skill | Role | Tools | Best For |
| --- | --- | --- | --- | --- |
| Coordinator | orchestrating-* | Spawns specialists | Task | Complex, multi-phase features requiring parallelization |
| Executor | executing-plans | Implements directly | Edit, Write | Tightly coupled tasks requiring frequent human oversight |

Key Insight: An agent cannot be both. If it has the Task tool (Coordinator), it is stripped of Edit permissions. If it has Edit permissions (Executor), it is stripped of Task permissions.

4.3 The Standard 16-Phase Orchestration Template

All complex workflows follow a rigorous 16-phase state machine:

| Phase | Name | Purpose |
| --- | --- | --- |
| 1 | Setup | Worktree creation, output directory, MANIFEST.yaml |
| 2 | Triage | Classify work type, select phases to execute |
| 3 | Codebase Discovery | Two-phase discovery: explore patterns, detect technologies |
| 4 | Skill Discovery | Map technologies to skills |
| 5 | Complexity | Technical assessment, execution strategy |
| 6 | Brainstorming | Design refinement with human-in-loop |
| 7 | Architecting Plan | Technical design AND task decomposition |
| 8 | Implementation | Code development |
| 9 | Design Verification | Verify implementation matches plan |
| 10 | Domain Compliance | Domain-specific mandatory patterns |
| 11 | Code Quality | Code review for maintainability |
| 12 | Test Planning | Test strategy and plan creation |
| 13 | Testing | Test implementation and execution |
| 14 | Coverage Verification | Verify test coverage meets threshold |
| 15 | Test Quality | No low-value tests, correct assertions |
| 16 | Completion | Final verification, PR, cleanup |

Compaction Gate

We enforce token hygiene programmatically. Before entering heavy execution phases (3, 8, 13), the system checks context usage:

  • < 75%: Proceed
  • 75-85%: Warning (should compact)
  • > 85%: Hard Block. The system refuses to spawn new agents until compaction runs.

Intelligent Phase Skipping

| Work Type | Criteria | Skipped Phases |
| --- | --- | --- |
| BUGFIX | Single issue, clear fix | 5, 6, 7, 9, 12 (no architecture, no brainstorming) |
| SMALL | <100 lines, single concern | 5, 6, 7, 9 (no complexity analysis) |
| MEDIUM | Multi-file, some design | None |
| LARGE | New subsystem, architectural | All 16 phases execute |

Result: A bug fix skips the design-heavy phases and completes in far fewer than 16; a new subsystem gets the full treatment.
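
The triage table above can be expressed as a small lookup. This sketch assumes the skipped-phase numbers map directly onto the 16-phase template:

```typescript
// The 16-phase template as a numbered list.
const ALL_PHASES = Array.from({ length: 16 }, (_, i) => i + 1);

// Skipped phases per work type, as listed in the triage table.
const SKIPPED: Record<string, number[]> = {
  BUGFIX: [5, 6, 7, 9, 12],
  SMALL: [5, 6, 7, 9],
  MEDIUM: [],
  LARGE: [],
};

// Resolve the concrete phase list a given work type will execute.
function phasesFor(workType: keyof typeof SKIPPED): number[] {
  const skip = new Set(SKIPPED[workType] ?? []);
  return ALL_PHASES.filter((p) => !skip.has(p));
}
```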

Programmatic Context Tracking

The compaction gate reads session transcripts to calculate usage:

# Session transcript location
~/.claude/projects/<project-hash>/<session-id>.jsonl

# Current context = cache_read + cache_create + input
# 200K context window → thresholds:
# 150K (75%) = SHOULD compact
# 160K (80%) = MUST compact
# 170K (85%) = Hook BLOCKS agent spawning
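
A minimal sketch of the gate's calculation, assuming each transcript line carries a usage object in the Anthropic API shape (the field names are an assumption); the SHOULD/MUST tiers above are collapsed into a single WARN band here:

```typescript
import * as fs from "node:fs";

const CONTEXT_WINDOW = 200_000;

// Walk the .jsonl transcript and keep the most recent usage totals:
// current context = cache_read + cache_create + input.
function contextUsage(transcriptPath: string): number {
  const lines = fs.readFileSync(transcriptPath, "utf8").split("\n").filter(Boolean);
  let latest = 0;
  for (const line of lines) {
    try {
      const usage = JSON.parse(line)?.message?.usage;
      if (usage) {
        latest =
          (usage.input_tokens ?? 0) +
          (usage.cache_read_input_tokens ?? 0) +
          (usage.cache_creation_input_tokens ?? 0);
      }
    } catch {
      // skip malformed lines
    }
  }
  return latest;
}

// Map usage to the gate decision using the 75% / 85% thresholds.
function gate(tokens: number): "PROCEED" | "WARN" | "BLOCK" {
  const pct = tokens / CONTEXT_WINDOW;
  if (pct > 0.85) return "BLOCK";
  if (pct >= 0.75) return "WARN";
  return "PROCEED";
}
```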

4.4 State Management & Locking

To survive session resets and context exhaustion, state is persisted to disk:

  1. The Process Control Block (MANIFEST.yaml):
    • Located in .claude/.output/features/{id}/
    • Tracks current phase, active agents, and validation status
    • Allows orchestration to be “resumed” across different chat sessions
  2. Distributed File Locking:
    • When multiple developer agents run in parallel
    • Utilizes lockfile mechanism (.claude/locks/{agent}.lock)
    • Prevents race conditions on shared source files
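
A sketch of lock acquisition, relying on the atomicity of exclusive file creation; the `.lock` path convention follows the text, everything else is illustrative:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Exclusive-create ('wx') fails if the file already exists, which makes
// it an atomic test-and-set on POSIX filesystems: only one agent can
// hold a given lock at a time.
function acquireLock(lockDir: string, agent: string): boolean {
  fs.mkdirSync(lockDir, { recursive: true });
  const lockPath = path.join(lockDir, `${agent}.lock`);
  try {
    fs.writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    return false; // another agent holds the lock
  }
}

function releaseLock(lockDir: string, agent: string): void {
  fs.rmSync(path.join(lockDir, `${agent}.lock`), { force: true });
}
```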

4.5 The Five-Role Development Pattern

Real-world workflows utilize a specialized five-role assembly line:

| Role | Agent | Responsibility | Output |
| --- | --- | --- | --- |
| Specialized Lead | *-lead | Architecture & Strategy. Decomposes requirements into atomic tasks. Does not write code. | Architecture Plan (JSON) |
| Specialized Developer | *-developer | Implementation. Executes specific sub-tasks from the plan. Focuses purely on logic. | Source Code |
| Specialized Reviewer | *-reviewer | Compliance. Validates code against specs and patterns. Rejects non-compliant work. | Review Report |
| Test Lead | test-lead | Strategy. Analyzes the implementation to determine what needs testing. | Test Plan |
| Specialized Tester | *-tester | Verification. Writes and runs the tests defined by the Test Lead. | Test Cases |

Workflow:

User → Orchestrator: "Add Feature X"
Orchestrator → Lead: "Design X"
Lead → Orchestrator: Architecture Plan

loop Implementation Cycle:
    Orchestrator → Dev: "Implement Task 1"
    Dev → Orchestrator: Code
    Orchestrator → Reviewer: "Review Task 1"
    Reviewer → Orchestrator: Approval/Rejection

Orchestrator → TestLead: "Plan Tests for X"
TestLead → Orchestrator: Test Strategy

loop Verification Cycle:
    Orchestrator → Tester: "Execute Test Suite"
    Tester → Orchestrator: Pass/Fail

This specialization prevents the “Jack of All Trades” failure mode.

4.6 Orchestration Skills: The Coordination Infrastructure

The Iteration Problem

Without explicit termination signals, agents either exit prematurely or loop infinitely. The iterating-to-completion skill solves this with three mechanisms:

  1. Completion Promises: An explicit string (e.g., ALL_TESTS_PASSING) that the agent outputs only when success criteria are met.
  2. Scratchpads: A persistent file where agents record accomplishments, failures, and next steps.
  3. Loop Detection: If three consecutive iterations produce outputs with >90% string similarity, the system detects a stuck state and escalates.
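
The loop-detection mechanism can be sketched as follows; the bigram Dice coefficient used here is an assumption, and any string-similarity measure with a >90% threshold would serve:

```typescript
// Count character bigrams in a string.
function bigrams(s: string): Map<string, number> {
  const m = new Map<string, number>();
  for (let i = 0; i < s.length - 1; i++) {
    const b = s.slice(i, i + 2);
    m.set(b, (m.get(b) ?? 0) + 1);
  }
  return m;
}

// Dice coefficient over shared bigrams: 1.0 = identical, 0.0 = disjoint.
function similarity(a: string, b: string): number {
  if (a === b) return 1;
  const ma = bigrams(a);
  const mb = bigrams(b);
  let shared = 0;
  ma.forEach((count, bg) => {
    shared += Math.min(count, mb.get(bg) ?? 0);
  });
  const total = a.length - 1 + (b.length - 1);
  return total <= 0 ? 0 : (2 * shared) / total;
}

// Escalate when the last three iteration outputs are >90% similar.
function isStuck(outputs: string[], threshold = 0.9): boolean {
  if (outputs.length < 3) return false;
  const [a, b, c] = outputs.slice(-3);
  return similarity(a, b) > threshold && similarity(b, c) > threshold;
}
```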

The Persistence Problem

The persisting-agent-outputs and persisting-progress-across-sessions skills provide external memory:

  • Discovery Protocol: Agents follow deterministic protocol to find write locations
  • Blocked Agent Routing: When agent returns status: "blocked", orchestrator consults routing table
  • Context Compaction: Completed phase outputs are summarized to prevent “context rot”

The Parallelization Problem

The dispatching-parallel-agents skill identifies independent failures and spawns concurrent agents:

Agent 1 (frontend-tester) → Fix auth-abort.test.ts (3 failures)
Agent 2 (frontend-tester) → Fix batch-completion.test.ts (2 failures)
Agent 3 (frontend-tester) → Fix race-conditions.test.ts (1 failure)

All three run simultaneously. Time to resolution: 1x instead of 3x.

Skill Composition

orchestrating-feature-development (16-phase workflow)
├── persisting-agent-outputs (shared workspace)
├── persisting-progress-across-sessions (cross-session resume)
├── iterating-to-completion (intra-task loops)
└── dispatching-parallel-agents (concurrent debugging)

5. The Runtime: Deterministic Hooks

While Skills provide guidance, Hooks provide enforcement. We utilize lifecycle events (PreToolUse, PostToolUse, Stop) to inject deterministic logic that the LLM cannot bypass.

5.1 Defense in Depth: Eight-Layer Enforcement

| Layer | Description |
| --- | --- |
| LAYER 1: CLAUDE.md | Full ruleset loaded at session start. Establishes norms. |
| LAYER 2: Skills | Procedural workflows invoked on-demand. “How to do X.” |
| LAYER 3: Agent Definitions | Role-specific behavior, mandatory skill lists, output formats. |
| LAYER 4: UserPromptSubmit Hooks | Inject reminders every prompt. Gateway → library skill pattern. |
| LAYER 5: PreToolUse Hooks | Block BEFORE action. Agent-first enforcement, compaction gates. |
| LAYER 6: PostToolUse Hooks | Validate agent work before completion. Output location, skill compliance. |
| LAYER 7: SubagentStop Hooks | Block premature sub-agent exit. Quality gates, iteration limits, feedback loops. |
| LAYER 8: Stop Hooks | Session-level backstop: catches anything that slipped past SubagentStop. |

Example: A developer agent writes code without spawning a reviewer:

| Layer | Mechanism | Result |
| --- | --- | --- |
| 3 | Agent definition says “reviewer validates your work” | Rationalized |
| 6 | track-modifications.sh creates feedback-loop-state.json | State initialized |
| 8 | feedback-loop-stop.sh blocks exit until review phase passes | Blocked |
| 8 | quality-gate-stop.sh provides backup check | Blocked |

5.2 Agent-First Enforcement

The platform doesn’t suggest delegation—it forces it.

# PreToolUse hook intercepts Edit/Write
agent-first-enforcement.sh

1. Parse tool_input.file_path
2. Determine domain (backend, frontend, capability, tool)
3. Check if developer agent exists for that domain
4. If yes → BLOCK with: "Spawn {domain}-developer instead"

Before (rationalized):

User: "Fix the authentication bug in login.go"
Claude: "I'll just make this quick edit myself..."
→ Writes buggy code, no review, no tests

After (enforced):

User: "Fix the authentication bug in login.go"
Claude: Attempts Edit on login.go
→ BLOCKED: "backend-developer exists. Spawn it instead."
Claude: Spawns backend-developer with clear task
→ Agent follows TDD, gets reviewed, tests pass

5.3 The Three-Level Loop System

Level 1: Intra-Task Loop (iteration-limit-stop.sh)

  • Scope: Single agent
  • Function: Prevents endless spinning on single shell command
  • Limit: Max 10 iterations (configurable)

Level 2: Inter-Phase Loop (feedback-loop-stop.sh)

  • Scope: Implementation → Review → Test cycle
  • Function: Enforces code cannot be marked complete until Reviewer and Tester pass it
  • Logic:
    1. Listens for Edit/Write tools
    2. Sets “Dirty Bit” in feedback-loop-state.json
    3. Intercepts Stop event
    4. If Dirty Bit set and tests_passed != true, BLOCK EXIT
    5. Returns JSON with block reason

Multi-Domain Feedback Loops:

{
  "active": true,
  "iteration": 2,
  "modified_domains": ["backend", "frontend"],
  "domain_phases": {
    "backend": {
      "review": { "status": "PASS", "agent": "backend-reviewer" },
      "testing": { "status": "PASS", "agent": "backend-tester" }
    },
    "frontend": {
      "review": { "status": "PASS", "agent": "frontend-reviewer" },
      "testing": { "status": "FAIL", "agent": "frontend-tester" }
    }
  }
}

Behavior:

  • Exit blocked until ALL domains pass ALL phases
  • If any domain’s tests fail, ALL domains reset for next iteration
  • Domain-specific agents ensure expertise match
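
The exit decision over that state file reduces to a single predicate; this sketch assumes the JSON shape shown above:

```typescript
type PhaseResult = { status: "PASS" | "FAIL" | "PENDING"; agent: string };
type LoopState = {
  active: boolean;
  modified_domains: string[];
  domain_phases: Record<string, Record<string, PhaseResult>>;
};

// Exit is allowed only when every modified domain passes every phase.
function canExit(state: LoopState): boolean {
  if (!state.active) return true;
  return state.modified_domains.every((domain) =>
    Object.values(state.domain_phases[domain] ?? {}).every((p) => p.status === "PASS")
  );
}
```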

Level 3: Orchestrator Loop (Skill Logic)

  • Scope: The 16-phase workflow
  • Function: Re-invokes entire phases if macro-goals are missed

5.4 Ephemeral vs. Persistent State

Dual-state architecture:

| Type | Storage | Purpose | Lifespan |
| --- | --- | --- | --- |
| Ephemeral State | feedback-loop-state.json | Runtime Enforcement (blocking exit, tracking dirty bits) | Cleared on session restart |
| Persistent State | MANIFEST.yaml | Workflow Coordination (resuming tasks, tracking phases) | Survives session restarts |

5.5 The Escalation Advisor

When an agent gets stuck in a loop, we implement an Out-of-Band Advisor:

  • Trigger: Stop event blocked > 3 times
  • Action: Hook invokes external LLM with session transcript
  • Prompt: “Analyze this loop. Why is the agent stuck? Provide a 1-sentence hint.”
  • Result: Hint injected into context as system message

5.6 Output Location Enforcement

Three-layer enforcement for output compliance:

LAYER 1: PostToolUse (Task) – FEEDBACK

  • task-skill-enforcement.sh
  • Warns if agent didn’t invoke persisting-agent-outputs
  • Non-blocking feedback to orchestrator

LAYER 2: SubagentStop – BLOCKING

  • output-location-enforcement.sh
  • Detects untracked .md files outside .claude/.output/
  • Checks skill compliance in file content
  • Analyzes git state for safe vs manual revert
  • BLOCKS completion if violations found

LAYER 3: Stop – DEFENSE IN DEPTH

  • quality-gate-stop.sh
  • Same detection as Layer 2
  • Catches anything that slipped through SubagentStop

5.7 Architectural Evolution

Synthesis of foundational patterns:

| Feature | Ralph Wiggum | Continuous-Claude-v3 | Superpowers | This Platform |
| --- | --- | --- | --- | --- |
| Scope | Single Agent Loop | Cross-Session Handoff | Skill Framework | Multi-Agent Orchestration |
| State | None (Loop only) | YAML Handoffs | Context-based | Dual (Ephemeral + Persistent) |
| Control | Prompt-based | Prompt-based | Skill-based | Deterministic Hooks |
| Feedback | None | None | Human-in-loop | Inter-Phase Loops + Escalation Advisor |

6. The Supply Chain: Lifecycle Management

Managing 350+ prompts and 39+ specialized agents leads to entropy. We treat these assets as software artifacts.

6.1 The Agent Manager

Every agent definition must pass a 9-phase Agent Audit:

  • Leanness: Strictly <150 lines (or <250 for architects)
  • Discovery: Valid “Use when” triggers for Task tool
  • Skill Integration: Proper Gateway usage instead of hardcoded paths
  • Output Standard: JSON schema compliance for structured handoffs

6.2 The Skill Manager & TDD

28-Phase Skill Audit System:

  • Structural: Frontmatter validity, file size (<500 lines)
  • Semantic: Description signal-to-noise ratio
  • Referential: Integrity of all Read() paths and Gateway linkages

Hybrid Audit Pattern:

  1. Deterministic CLI Checks: TypeScript ASTs verify file structures, link validity, syntax
  2. Semantic LLM Review: “Reviewer LLM” judges clarity, tone, utility

TDD for Prompts (Red-Green-Refactor):

  1. Red: Capture transcript where agent fails
  2. Green: Update skill/hook until behavior corrected
  3. Refactor: Run “Pressure Tests” with adversarial system prompts

6.3 The Research Orchestrator: Content Accuracy

Research-First Workflow:

  1. Intent Expansion: Break the topic into semantic interpretations
  2. Sequential Discovery: Agents dispatch to 6 sources, including:
    • Codebase: Existing patterns
    • Context7: Official documentation
    • GitHub: Community usage and issues
    • Web/Perplexity: Current best practices
  3. Synthesis: Aggregate findings, resolve conflicts, generate SKILL.md

7. Tooling Architecture: Progressive MCPs & Code Intelligence

7.1 The TypeScript Wrapper Pattern (MCP Code Execution)

Standard MCP implementations suffer from:

  1. Eager Loading: ~20k tokens injected at startup
  2. Context Rot: Every intermediate result replays back into context

Solution: On-Demand TypeScript Wrappers

  • Legacy Model: 5 MCP servers = 71,800 tokens at startup (36% of context)
  • Wrapper Model: 0 tokens at startup. Wrappers load via Gateway only when requested
  • Safety Layer: Zod schema validation on inputs, Response Filtering on outputs

Workflow:

Session Start: 0 Tokens Loaded
↓
Agent → Gateway: "I need to fetch a Linear issue"
Gateway → Agent: Returns Path: .claude/tools/linear/get-issue.ts
Agent → Wrapper: Execute(issueId: "ENG-123")
Wrapper: 1. Zod Validation
↓
Wrapper → MCP: Spawn Process & Request
MCP → Wrapper: Large JSON Response (50kb)
Wrapper: 2. Response Filtering
↓
Wrapper → Agent: Optimized JSON (500b)
Process Ends. Memory Freed.
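
Step 2 (Response Filtering) is where most of the savings come from. A sketch, with a hypothetical Linear issue shape; the Zod input-validation step is omitted here:

```typescript
// Hypothetical shape of the raw MCP payload: a few known fields plus
// arbitrary extra data (descriptions, comments, attachments, ...).
type RawIssue = Record<string, unknown> & {
  id: string;
  title: string;
  state?: { name?: string };
  assignee?: { name?: string };
};

// Reduce the large raw payload to the handful of fields the agent
// actually needs before anything re-enters the context window.
function filterIssue(raw: RawIssue) {
  return {
    id: raw.id,
    title: raw.title,
    state: raw.state?.name ?? "unknown",
    assignee: raw.assignee?.name ?? "unassigned",
  };
}
```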

7.2 Serena: Semantic Code Intelligence

The File-Reading Problem:

Traditional: Read the full file (~8,000 tokens) → find the function → generate a replacement. For 5 files: ~40,000 tokens.

Symbol-Level: Query find_symbol("processPayment") → returns the function body (~200 tokens). For 5 files: ~1,000 tokens.

Operations Comparison:

| Operation | Without Serena | With Serena |
| --- | --- | --- |
| Find function definition | Read entire file(s), regex search | find_symbol → exact location |
| Trace call hierarchy | Read all potential callers | find_referencing_symbols → direct graph |
| Insert new method | Read file, string manipulation | insert_after_symbol → surgical placement |
| Navigate dependencies | Grep + manual file traversal | find_symbol on imports → semantic resolution |

Performance: Custom Connection Pool maintains warm LSP processes, reducing query latency from ~3s cold-start to ~2ms warm.
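
To illustrate why symbol-level access is cheaper, here is a toy index lookup. The index shape is hypothetical; Serena itself delegates to a language server rather than maintaining a map like this:

```typescript
// A symbol index maps names to their file and line span, so a lookup
// returns a small slice of the file instead of the whole thing.
type SymbolEntry = { file: string; startLine: number; endLine: number };

const INDEX = new Map<string, SymbolEntry>([
  ["processPayment", { file: "billing/payments.ts", startLine: 120, endLine: 161 }],
]);

// Return only the lines spanning the requested symbol.
function findSymbol(name: string, sources: Map<string, string[]>): string[] | undefined {
  const entry = INDEX.get(name);
  if (!entry) return undefined;
  const lines = sources.get(entry.file) ?? [];
  return lines.slice(entry.startLine - 1, entry.endLine);
}
```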


8. Infrastructure Integration: Zero-Trust Secrets

8.1 The run-with-secrets Wrapper

We do not give agents API keys. We give them a tool: 1password.run-with-secrets.

Configuration:

export const DEFAULT_CONFIG = {
  account: "company.1password.com",
  serviceItems: {
    "aws-dev": "op://Private/AWS Key/credential",
    "ci-cd": "op://Engineering/CI Key/credential",
  },
};

Execution Flow:

LLM → Agent: "List S3 Buckets"
Agent → Tool: run_with_secrets("aws s3 ls")
↓
SECURE ENCLAVE (Child Process):
Tool → 1Password: Request "AWS_ACCESS_KEY"
1Password → Tool: Inject as ENV VAR
Tool → AWS: Execute Command
AWS → Tool: Output: "bucket-a, bucket-b"
↓
Tool → Agent: Return Output
Agent → LLM: "Here are the buckets..."

Security Guarantee: The secret exists only in child process environment variables. Never printed to stdout, never logged, never enters LLM context window.
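
The guarantee can be sketched as follows. Here resolveSecret stands in for the 1Password CLI lookup (not called in this sketch); the key point is that the secret enters only the child process environment, never the command string or the returned output:

```typescript
import { execFileSync } from "node:child_process";

// Stand-in for the 1Password resolution step (an assumption; the real
// wrapper shells out to the `op` CLI to dereference op:// URIs).
function resolveSecret(ref: string): string {
  return `secret-for:${ref}`;
}

// Resolve each secret reference and inject it ONLY into the child
// process environment, then return the child's stdout.
function runWithSecrets(cmd: string, args: string[], secretRefs: Record<string, string>): string {
  const env = { ...process.env };
  for (const [name, ref] of Object.entries(secretRefs)) env[name] = resolveSecret(ref);
  return execFileSync(cmd, args, { env, encoding: "utf8" });
}
```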


9. Horizontal Scaling Architecture

Traditional development is constrained by local hardware (laptop RAM/CPU), limiting analysis to 3-5 concurrent sessions.

Solution: Decouple the Control Plane (Laptop) from the Execution Plane (Cloud):

  • Local: Engineer’s laptop deploys Docker instance using DevPod
  • Remote: Development environment runs in an ephemeral Docker container within an AMI
  • Bridge: Secure SSH tunnel forwards remote terminal to laptop

9.1 DevPod Benefits

  • Isolation: Each feature runs in own container
  • Resources: Provision 128GB RAM instances for massive monorepo analysis
  • Security: Code never leaves VPC. Laptop only sees terminal pixels

10. Roadmap: Beyond Orchestration

Current platform achieves Level 3 Autonomy (Orchestrated). Roadmap targets Level 5 (Self-Evolving).

10.1 Heterogeneous LLM Routing

| Development Task | Optimal Model | Technical Advantage |
| --- | --- | --- |
| Logic & Reasoning | DeepSeek-R1 / V3 | RL-based chain-of-thought for complex inference |
| Document Processing | DeepSeek OCR 2 | 10x token efficiency with visual causal flow |
| UI/UX & Frontend | Kimi 2.5 | Native MoonViT architecture for visual debugging |
| Parallel Research | Kimi 2.5 Swarm | PARL-driven optimization across 100 agents |
| Massive Repository Mapping | DeepSeek-v4 Engram | O(1) lookup and tiered KV cache for million-token context |

10.2 Self-Annealing & Auto-Correction

When agent fails quality gate > 3 times, system triggers Self-Annealing Workflow:

  1. Diagnosis: Meta-Agent reads session transcript, identifies “Rationalization Path”
  2. Patching:
    • Skill Annealing: Modify SKILL.md to add Anti-Pattern entries
    • Hook Hardening: Update bash script logic for edge cases
    • Agent Refinement: Update agent prompt via agent-manager
  3. Verification: Run pressure-testing-skill-content against patched artifact
  4. Pull Request: Create PR labeled [Self-Annealing] for human review

This transforms the platform into an antifragile system that gets stronger with every failure.

10.3 Agent-to-Agent Negotiation (Future)

Currently agents follow rigid JSON schemas. Future agents will negotiate API contracts dynamically:

"I need X, can you provide it?"
"No, but I can provide Y which is similar."
"Agreed, proceeding with Y."

10.4 Self-Healing Infrastructure (Future)

  • Detecting “Context Starvation” and auto-archiving memory
  • Identifying “Tool Hallucination” and generating new Zod schemas

11. Conclusion

By architecting a system where:

  1. Agents are ephemeral and stateless
  2. Context is strictly curated via Gateways
  3. Workflows are enforced by deterministic Kernel hooks
  4. Tools are progressively loaded and type-safe
  5. Secrets never touch the context

We transform the LLM from a “creative assistant” into a deterministic component of the software supply chain. This allows scaling development throughput linearly with compute, untethered by cognitive limits of human attention.


Key Architectural Principles

The Five Layers

  1. Agent Layer: Stateless, ephemeral workers (<150 lines)
  2. Skill Layer: Two-tier progressive loading (Core + Library)
  3. Orchestration Layer: Kernel-mode coordinator with 16-phase state machine
  4. Hook Layer: Eight-layer defense in depth for enforcement
  5. Infrastructure Layer: Zero-trust secrets, horizontal scaling

The Paradigm Shift

  • From Thick Agent → Thin Agent / Fat Platform
  • From Probabilistic Prompts → Deterministic Hooks
  • From Monolithic Context → Progressive Loading
  • From Single-Model → Heterogeneous Routing
  • From Static Rules → Self-Annealing System

Constraint Forces Innovation

The problem with capital is that it lets you do many inefficient things very fast. Without that luxury, the system must be clever instead. Build the machine that builds the machine that enables a team to build all the things.

