Executive Summary

The primary bottleneck in autonomous software development is not model intelligence, but context management and architectural determinism. Current “Agentic” approaches fail at scale because they rely on probabilistic guidance (prompts) for deterministic engineering tasks (builds, security, state management). Furthermore, the linear cost of token consumption versus the non-linear degradation of model attention creates a “Context Trap” that prevents complex multi-phase execution.

This architecture treats the Large Language Model (LLM) not as a chatbot, but as a nondeterministic kernel process wrapped in a deterministic runtime environment. We present a five-layer architecture that enforces strict separation of concerns, enables linear scaling of complexity, and achieves “escape velocity”—where the AI system contributes net-positive value to the development lifecycle.


1. The Core Problem: The Context-Capability Paradox

Research confirms that token usage alone explains 80% of performance variance in agent tasks. This creates a fundamental paradox:

  1. To handle complex tasks, agents need comprehensive instructions (skills).
  2. Comprehensive instructions consume the context window.
  3. Consumed context reduces the model’s ability to reason about the actual task.

The Monolith vs. Distributed Architecture Problem

Legacy: The Monolith

  • Monolithic Agent contains all instructions, all tools, and full state
  • Result: Context Overflow

Platform: Distributed Architecture

  • Orchestrator Skill spawns Specialized Agent (Worker)
  • Worker loads Skill Library and MCP Tools on-demand (JIT)
  • Deterministic Hooks enforce Validation Loop that gates the Worker
  • Worker outputs Structured State

1.1 The Solution: Inverting the Control Structure

We moved from a “Thick Agent” model to a “Thin Agent / Fat Platform” architecture:

  • Agents are reduced to stateless, ephemeral workers (<150 lines).
  • Skills hold the knowledge, loaded strictly on-demand (Just-in-Time).
  • Hooks provide the enforcement, operating outside the LLM’s context.
  • Orchestration manages the lifecycle of specialized roles.

2. Agent Architecture: The “Thin Agent” Pattern

2.1 Architectural Constraints

The architecture is defined by one hard constraint: Sub-agents cannot spawn other sub-agents. This prevents infinite recursion but necessitates a flat, “Leaf Node” execution model.

2.2 The “Thin Agent” Specification

Agents are specialized workers that execute specific tasks and return results. They do not manage state or coordinate workflows.

Gold Standard Specification:

| Metric | Value |
| --- | --- |
| Line Count | Strictly <150 lines |
| Discovery Cost | ~500-1000 characters (visible to orchestrator) |
| Execution Cost | ~2,700 tokens per spawn (down from ~24,000 in early versions) |

2.3 Sub-Agent Isolation

Every agent spawn creates a fresh instance with zero shared history from previous siblings. This solves “Context Drift” where agents confuse current requirements with past attempts.

Workflow:

Orchestrator Skill → Task Tool → Spawn Sub-Agent
→ Clean Context Window → Gateway Skill (Read)
→ Library Skills (On-Demand Route) → Task Execution
→ Structured JSON Return → Destroy Context

3. Skill Architecture: Two-Tier Progressive Loading

To bypass the hard limit of ~15,000 characters for skill definitions and preserve context for thinking and execution, we implemented a two-tier (Librarian Pattern) file system architecture.

3.1 Tier 1: Core Skills (The “BIOS”)

  • Location: .claude/skills/
  • Count: 49 high-frequency skills
  • Mechanism: Registered with the LLM as executable tools
  • Purpose: Fundamental workflows (debugging-systematically, developing-with-tdd) and Gateways

3.2 Tier 2: Library Skills (The “Hard Drive”)

  • Location: .claude/skill-library/
  • Count: 304+ specialized skills
  • Mechanism: Invisible to the LLM until explicitly loaded via Read()
  • Purpose: Deep domain knowledge (e.g., optimizing-react-performance, configuring-aws-lambda)

3.3 The Gateway Pattern (The Router)

Agents do not hardcode library paths. They invoke a Gateway Skill (e.g., gateway-frontend), which acts as a dynamic router based on intent detection.

Example:

Agent: "I need to fix a React infinite loop"
│
▼
Invokes: gateway-frontend
│
├─► Detects Intent: "infinite loop", "useEffect"
│
├─► Matches Routing Pattern:
│     • preventing-react-hook-infinite-loops
│     • debugging-react-infinite-loops
│
└─► Output: "Read these specific file paths..."

This implements Intent-Based Context Loading, ensuring agents only load specific patterns relevant to their current task.
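
As a sketch, the router inside a gateway skill can be as simple as a keyword table mapping intent to skill file paths. The patterns and paths below are illustrative, not the platform's actual routing table:

```typescript
// Hypothetical gateway routing table: keyword patterns → library skill paths.
type Route = { keywords: string[]; skills: string[] };

const FRONTEND_ROUTES: Route[] = [
  {
    keywords: ["infinite loop", "useEffect"],
    skills: [
      ".claude/skill-library/preventing-react-hook-infinite-loops/SKILL.md",
      ".claude/skill-library/debugging-react-infinite-loops/SKILL.md",
    ],
  },
  {
    keywords: ["render performance", "re-render"],
    skills: [".claude/skill-library/optimizing-react-performance/SKILL.md"],
  },
];

// Returns only the skill files matching the detected intent, so the agent
// Read()s a handful of paths instead of loading the whole library.
function routeIntent(intent: string, routes: Route[] = FRONTEND_ROUTES): string[] {
  const text = intent.toLowerCase();
  const matched = routes
    .filter((r) => r.keywords.some((k) => text.includes(k.toLowerCase())))
    .flatMap((r) => r.skills);
  return Array.from(new Set(matched));
}
```

The gateway's output is just the path list; the agent then decides which of the returned files to actually read.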


4. Orchestration Architecture: The Kernel

Since agents cannot spawn agents, orchestration must occur in the Main Conversation Thread. We treat the Main Thread as the “Kernel Mode” process, and sub-agents as “User Mode” processes.

4.1 The Orchestrator-Worker Pattern

The Orchestrator is a Skill running in the main thread. It holds the global state machine.

The Tool Restriction Boundary:

| Role | Tools | Constraint |
| --- | --- | --- |
| Orchestrator (Main Thread) | Task, TodoWrite, Read | NO Edit or Write: must delegate to a worker |
| Worker (Sub-Agent) | Edit, Write, Bash | NO Task: must work, cannot delegate |

4.2 Coordinator vs. Executor Models

We enforce a strict separation between agents that plan and agents that do:

| Model | Skill | Role | Tools | Best For |
| --- | --- | --- | --- | --- |
| Coordinator | orchestrating-* | Spawns specialists | Task | Complex, multi-phase features requiring parallelization |
| Executor | executing-plans | Implements directly | Edit, Write | Tightly coupled tasks requiring frequent human oversight |

Key Insight: An agent cannot be both. If it has the Task tool (Coordinator), it is stripped of Edit permissions. If it has Edit permissions (Executor), it is stripped of Task permissions.

4.3 The Standard 16-Phase Orchestration Template

All complex workflows follow a rigorous 16-phase state machine:

| Phase | Name | Purpose |
| --- | --- | --- |
| 1 | Setup | Worktree creation, output directory, MANIFEST.yaml |
| 2 | Triage | Classify work type, select phases to execute |
| 3 | Codebase Discovery | Two-phase discovery: explore patterns, detect technologies |
| 4 | Skill Discovery | Map technologies to skills |
| 5 | Complexity | Technical assessment, execution strategy |
| 6 | Brainstorming | Design refinement with human-in-loop |
| 7 | Architecting Plan | Technical design AND task decomposition |
| 8 | Implementation | Code development |
| 9 | Design Verification | Verify implementation matches plan |
| 10 | Domain Compliance | Domain-specific mandatory patterns |
| 11 | Code Quality | Code review for maintainability |
| 12 | Test Planning | Test strategy and plan creation |
| 13 | Testing | Test implementation and execution |
| 14 | Coverage Verification | Verify test coverage meets threshold |
| 15 | Test Quality | No low-value tests, correct assertions |
| 16 | Completion | Final verification, PR, cleanup |

Compaction Gate

We enforce token hygiene programmatically. Before entering heavy execution phases (3, 8, 13), the system checks context usage:

  • < 75%: Proceed
  • 75-85%: Warning (should compact)
  • > 85%: Hard Block. The system refuses to spawn new agents until compaction runs.

Intelligent Phase Skipping

| Work Type | Criteria | Skipped Phases |
| --- | --- | --- |
| BUGFIX | Single issue, clear fix | 5, 6, 7, 9, 12 (no architecture, no brainstorming) |
| SMALL | <100 lines, single concern | 5, 6, 7, 9 (no complexity analysis) |
| MEDIUM | Multi-file, some design | None |
| LARGE | New subsystem, architectural | All 16 phases execute |

Result: A bug fix skips the design-heavy phases and completes in far fewer than 16; a new subsystem gets the full treatment.
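
The triage table above can be expressed as a small lookup. This sketch assumes the skipped-phase numbers map directly onto the 16-phase template:

```typescript
// The 16-phase template as a numbered list.
const ALL_PHASES = Array.from({ length: 16 }, (_, i) => i + 1);

// Skipped phases per work type, as listed in the triage table.
const SKIPPED: Record<string, number[]> = {
  BUGFIX: [5, 6, 7, 9, 12],
  SMALL: [5, 6, 7, 9],
  MEDIUM: [],
  LARGE: [],
};

// Resolve the concrete phase list a given work type will execute.
function phasesFor(workType: keyof typeof SKIPPED): number[] {
  const skip = new Set(SKIPPED[workType] ?? []);
  return ALL_PHASES.filter((p) => !skip.has(p));
}
```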

Programmatic Context Tracking

The compaction gate reads session transcripts to calculate usage:

# Session transcript location
~/.claude/projects/<project-hash>/<session-id>.jsonl

# Current context = cache_read + cache_create + input
# 200K context window → thresholds:
# 150K (75%) = SHOULD compact
# 160K (80%) = MUST compact
# 170K (85%) = Hook BLOCKS agent spawning
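
A minimal sketch of the gate's calculation, assuming each transcript line carries a usage object in the Anthropic API shape (the field names are an assumption); the SHOULD/MUST tiers above are collapsed into a single WARN band here:

```typescript
import * as fs from "node:fs";

const CONTEXT_WINDOW = 200_000;

// Walk the .jsonl transcript and keep the most recent usage totals:
// current context = cache_read + cache_create + input.
function contextUsage(transcriptPath: string): number {
  const lines = fs.readFileSync(transcriptPath, "utf8").split("\n").filter(Boolean);
  let latest = 0;
  for (const line of lines) {
    try {
      const usage = JSON.parse(line)?.message?.usage;
      if (usage) {
        latest =
          (usage.input_tokens ?? 0) +
          (usage.cache_read_input_tokens ?? 0) +
          (usage.cache_creation_input_tokens ?? 0);
      }
    } catch {
      // skip malformed lines
    }
  }
  return latest;
}

// Map usage to the gate decision using the 75% / 85% thresholds.
function gate(tokens: number): "PROCEED" | "WARN" | "BLOCK" {
  const pct = tokens / CONTEXT_WINDOW;
  if (pct > 0.85) return "BLOCK";
  if (pct >= 0.75) return "WARN";
  return "PROCEED";
}
```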

4.4 State Management & Locking

To survive session resets and context exhaustion, state is persisted to disk:

  1. The Process Control Block (MANIFEST.yaml):
    • Located in .claude/.output/features/{id}/
    • Tracks current phase, active agents, and validation status
    • Allows orchestration to be “resumed” across different chat sessions
  2. Distributed File Locking:
    • When multiple developer agents run in parallel
    • Utilizes lockfile mechanism (.claude/locks/{agent}.lock)
    • Prevents race conditions on shared source files
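
A sketch of lock acquisition, relying on the atomicity of exclusive file creation; the `.lock` path convention follows the text, everything else is illustrative:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Exclusive-create ('wx') fails if the file already exists, which makes
// it an atomic test-and-set on POSIX filesystems: only one agent can
// hold a given lock at a time.
function acquireLock(lockDir: string, agent: string): boolean {
  fs.mkdirSync(lockDir, { recursive: true });
  const lockPath = path.join(lockDir, `${agent}.lock`);
  try {
    fs.writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch {
    return false; // another agent holds the lock
  }
}

function releaseLock(lockDir: string, agent: string): void {
  fs.rmSync(path.join(lockDir, `${agent}.lock`), { force: true });
}
```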

4.5 The Five-Role Development Pattern

Real-world workflows utilize a specialized five-role assembly line:

| Role | Agent | Responsibility | Output |
| --- | --- | --- | --- |
| Specialized Lead | *-lead | Architecture & Strategy. Decomposes requirements into atomic tasks. Does not write code. | Architecture Plan (JSON) |
| Specialized Developer | *-developer | Implementation. Executes specific sub-tasks from the plan. Focuses purely on logic. | Source Code |
| Specialized Reviewer | *-reviewer | Compliance. Validates code against specs and patterns. Rejects non-compliant work. | Review Report |
| Test Lead | test-lead | Strategy. Analyzes the implementation to determine what needs testing. | Test Plan |
| Specialized Tester | *-tester | Verification. Writes and runs the tests defined by the Test Lead. | Test Cases |

Workflow:

User → Orchestrator: "Add Feature X"
Orchestrator → Lead: "Design X"
Lead → Orchestrator: Architecture Plan

loop Implementation Cycle:
    Orchestrator → Dev: "Implement Task 1"
    Dev → Orchestrator: Code
    Orchestrator → Reviewer: "Review Task 1"
    Reviewer → Orchestrator: Approval/Rejection

Orchestrator → TestLead: "Plan Tests for X"
TestLead → Orchestrator: Test Strategy

loop Verification Cycle:
    Orchestrator → Tester: "Execute Test Suite"
    Tester → Orchestrator: Pass/Fail

This specialization prevents the “Jack of All Trades” failure mode.

4.6 Orchestration Skills: The Coordination Infrastructure

The Iteration Problem

Without explicit termination signals, agents either exit prematurely or loop infinitely. The iterating-to-completion skill solves this with three mechanisms:

  1. Completion Promises: An explicit string (e.g., ALL_TESTS_PASSING) that the agent outputs only when success criteria are met.
  2. Scratchpads: A persistent file where agents record accomplishments, failures, and next steps.
  3. Loop Detection: If three consecutive iterations produce outputs with >90% string similarity, the system detects a stuck state and escalates.
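
The loop-detection mechanism can be sketched as follows; the bigram Dice coefficient used here is an assumption, and any string-similarity measure with a >90% threshold would serve:

```typescript
// Count character bigrams in a string.
function bigrams(s: string): Map<string, number> {
  const m = new Map<string, number>();
  for (let i = 0; i < s.length - 1; i++) {
    const b = s.slice(i, i + 2);
    m.set(b, (m.get(b) ?? 0) + 1);
  }
  return m;
}

// Dice coefficient over shared bigrams: 1.0 = identical, 0.0 = disjoint.
function similarity(a: string, b: string): number {
  if (a === b) return 1;
  const ma = bigrams(a);
  const mb = bigrams(b);
  let shared = 0;
  ma.forEach((count, bg) => {
    shared += Math.min(count, mb.get(bg) ?? 0);
  });
  const total = a.length - 1 + (b.length - 1);
  return total <= 0 ? 0 : (2 * shared) / total;
}

// Escalate when the last three iteration outputs are >90% similar.
function isStuck(outputs: string[], threshold = 0.9): boolean {
  if (outputs.length < 3) return false;
  const [a, b, c] = outputs.slice(-3);
  return similarity(a, b) > threshold && similarity(b, c) > threshold;
}
```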

The Persistence Problem

The persisting-agent-outputs and persisting-progress-across-sessions skills provide external memory:

  • Discovery Protocol: Agents follow deterministic protocol to find write locations
  • Blocked Agent Routing: When agent returns status: "blocked", orchestrator consults routing table
  • Context Compaction: Completed phase outputs are summarized to prevent “context rot”

The Parallelization Problem

The dispatching-parallel-agents skill identifies independent failures and spawns concurrent agents:

Agent 1 (frontend-tester) → Fix auth-abort.test.ts (3 failures)
Agent 2 (frontend-tester) → Fix batch-completion.test.ts (2 failures)
Agent 3 (frontend-tester) → Fix race-conditions.test.ts (1 failure)

All three run simultaneously. Time to resolution: 1x instead of 3x.

Skill Composition

orchestrating-feature-development (16-phase workflow)
├── persisting-agent-outputs (shared workspace)
├── persisting-progress-across-sessions (cross-session resume)
├── iterating-to-completion (intra-task loops)
└── dispatching-parallel-agents (concurrent debugging)

5. The Runtime: Deterministic Hooks

While Skills provide guidance, Hooks provide enforcement. We utilize lifecycle events (PreToolUse, PostToolUse, Stop) to inject deterministic logic that the LLM cannot bypass.

5.1 Defense in Depth: Eight-Layer Enforcement

| Layer | Description |
| --- | --- |
| LAYER 1: CLAUDE.md | Full ruleset loaded at session start. Establishes norms. |
| LAYER 2: Skills | Procedural workflows invoked on-demand. “How to do X.” |
| LAYER 3: Agent Definitions | Role-specific behavior, mandatory skill lists, output formats. |
| LAYER 4: UserPromptSubmit Hooks | Inject reminders every prompt. Gateway → library skill pattern. |
| LAYER 5: PreToolUse Hooks | Block BEFORE action. Agent-first enforcement, compaction gates. |
| LAYER 6: PostToolUse Hooks | Validate agent work before completion. Output location, skill compliance. |
| LAYER 7: SubagentStop Hooks | Block premature sub-agent exit. Quality gates, iteration limits, feedback loops. |
| LAYER 8: Stop Hooks | Session-level backstop: catches anything that slipped past SubagentStop. |

Example: A developer agent writes code without spawning a reviewer:

| Layer | Mechanism | Result |
| --- | --- | --- |
| 3 | Agent definition says “reviewer validates your work” | Rationalized |
| 6 | track-modifications.sh creates feedback-loop-state.json | State initialized |
| 8 | feedback-loop-stop.sh blocks exit until review phase passes | Blocked |
| 8 | quality-gate-stop.sh provides backup check | Blocked |

5.2 Agent-First Enforcement

The platform doesn’t suggest delegation—it forces it.

# PreToolUse hook intercepts Edit/Write
agent-first-enforcement.sh

1. Parse tool_input.file_path
2. Determine domain (backend, frontend, capability, tool)
3. Check if developer agent exists for that domain
4. If yes → BLOCK with: "Spawn {domain}-developer instead"

Before (rationalized):

User: "Fix the authentication bug in login.go"
Claude: "I'll just make this quick edit myself..."
→ Writes buggy code, no review, no tests

After (enforced):

User: "Fix the authentication bug in login.go"
Claude: Attempts Edit on login.go
→ BLOCKED: "backend-developer exists. Spawn it instead."
Claude: Spawns backend-developer with clear task
→ Agent follows TDD, gets reviewed, tests pass

5.3 The Three-Level Loop System

Level 1: Intra-Task Loop (iteration-limit-stop.sh)

  • Scope: Single agent
  • Function: Prevents endless spinning on single shell command
  • Limit: Max 10 iterations (configurable)

Level 2: Inter-Phase Loop (feedback-loop-stop.sh)

  • Scope: Implementation → Review → Test cycle
  • Function: Enforces code cannot be marked complete until Reviewer and Tester pass it
  • Logic:
    1. Listens for Edit/Write tools
    2. Sets “Dirty Bit” in feedback-loop-state.json
    3. Intercepts Stop event
    4. If Dirty Bit set and tests_passed != true, BLOCK EXIT
    5. Returns JSON with block reason

Multi-Domain Feedback Loops:

{
  "active": true,
  "iteration": 2,
  "modified_domains": ["backend", "frontend"],
  "domain_phases": {
    "backend": {
      "review": { "status": "PASS", "agent": "backend-reviewer" },
      "testing": { "status": "PASS", "agent": "backend-tester" }
    },
    "frontend": {
      "review": { "status": "PASS", "agent": "frontend-reviewer" },
      "testing": { "status": "FAIL", "agent": "frontend-tester" }
    }
  }
}

Behavior:

  • Exit blocked until ALL domains pass ALL phases
  • If any domain’s tests fail, ALL domains reset for next iteration
  • Domain-specific agents ensure expertise match
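
The exit decision over that state file reduces to a single predicate; this sketch assumes the JSON shape shown above:

```typescript
type PhaseResult = { status: "PASS" | "FAIL" | "PENDING"; agent: string };
type LoopState = {
  active: boolean;
  modified_domains: string[];
  domain_phases: Record<string, Record<string, PhaseResult>>;
};

// Exit is allowed only when every modified domain passes every phase.
function canExit(state: LoopState): boolean {
  if (!state.active) return true;
  return state.modified_domains.every((domain) =>
    Object.values(state.domain_phases[domain] ?? {}).every((p) => p.status === "PASS")
  );
}
```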

Level 3: Orchestrator Loop (Skill Logic)

  • Scope: The 16-phase workflow
  • Function: Re-invokes entire phases if macro-goals are missed

5.4 Ephemeral vs. Persistent State

Dual-state architecture:

| Type | Storage | Purpose | Lifespan |
| --- | --- | --- | --- |
| Ephemeral State | feedback-loop-state.json | Runtime Enforcement (blocking exit, tracking dirty bits) | Cleared on session restart |
| Persistent State | MANIFEST.yaml | Workflow Coordination (resuming tasks, tracking phases) | Survives session restarts |

5.5 The Escalation Advisor

When an agent gets stuck in a loop, we implement an Out-of-Band Advisor:

  • Trigger: Stop event blocked > 3 times
  • Action: Hook invokes external LLM with session transcript
  • Prompt: “Analyze this loop. Why is the agent stuck? Provide a 1-sentence hint.”
  • Result: Hint injected into context as system message

5.6 Output Location Enforcement

Three-layer enforcement for output compliance:

LAYER 1: PostToolUse (Task) – FEEDBACK

  • task-skill-enforcement.sh
  • Warns if agent didn’t invoke persisting-agent-outputs
  • Non-blocking feedback to orchestrator

LAYER 2: SubagentStop – BLOCKING

  • output-location-enforcement.sh
  • Detects untracked .md files outside .claude/.output/
  • Checks skill compliance in file content
  • Analyzes git state for safe vs manual revert
  • BLOCKS completion if violations found

LAYER 3: Stop – DEFENSE IN DEPTH

  • quality-gate-stop.sh
  • Same detection as Layer 2
  • Catches anything that slipped through SubagentStop

5.7 Architectural Evolution

Synthesis of foundational patterns:

| Feature | Ralph Wiggum | Continuous-Claude-v3 | Superpowers | This Platform |
| --- | --- | --- | --- | --- |
| Scope | Single Agent Loop | Cross-Session Handoff | Skill Framework | Multi-Agent Orchestration |
| State | None (Loop only) | YAML Handoffs | Context-based | Dual (Ephemeral + Persistent) |
| Control | Prompt-based | Prompt-based | Skill-based | Deterministic Hooks |
| Feedback | None | None | Human-in-loop | Inter-Phase Loops + Escalation Advisor |

6. The Supply Chain: Lifecycle Management

Managing 350+ prompts and 39+ specialized agents leads to entropy. We treat these assets as software artifacts.

6.1 The Agent Manager

Every agent definition must pass a 9-phase Agent Audit:

  • Leanness: Strictly <150 lines (or <250 for architects)
  • Discovery: Valid “Use when” triggers for Task tool
  • Skill Integration: Proper Gateway usage instead of hardcoded paths
  • Output Standard: JSON schema compliance for structured handoffs

6.2 The Skill Manager & TDD

28-Phase Skill Audit System:

  • Structural: Frontmatter validity, file size (<500 lines)
  • Semantic: Description signal-to-noise ratio
  • Referential: Integrity of all Read() paths and Gateway linkages

Hybrid Audit Pattern:

  1. Deterministic CLI Checks: TypeScript ASTs verify file structures, link validity, syntax
  2. Semantic LLM Review: “Reviewer LLM” judges clarity, tone, utility

TDD for Prompts (Red-Green-Refactor):

  1. Red: Capture transcript where agent fails
  2. Green: Update skill/hook until behavior corrected
  3. Refactor: Run “Pressure Tests” with adversarial system prompts

6.3 The Research Orchestrator: Content Accuracy

Research-First Workflow:

  1. Intent Expansion: Break the topic into semantic interpretations
  2. Sequential Discovery: Agents dispatch to 6 sources, including:
    • Codebase: Existing patterns
    • Context7: Official documentation
    • GitHub: Community usage and issues
    • Web/Perplexity: Current best practices
  3. Synthesis: Aggregate findings, resolve conflicts, generate SKILL.md

7. Tooling Architecture: Progressive MCPs & Code Intelligence

7.1 The TypeScript Wrapper Pattern (MCP Code Execution)

Standard MCP implementations suffer from:

  1. Eager Loading: ~20k tokens injected at startup
  2. Context Rot: Every intermediate result replays back into context

Solution: On-Demand TypeScript Wrappers

  • Legacy Model: 5 MCP servers = 71,800 tokens at startup (36% of context)
  • Wrapper Model: 0 tokens at startup. Wrappers load via Gateway only when requested
  • Safety Layer: Zod schema validation on inputs, Response Filtering on outputs

Workflow:

Session Start: 0 Tokens Loaded
↓
Agent → Gateway: "I need to fetch a Linear issue"
Gateway → Agent: Returns Path: .claude/tools/linear/get-issue.ts
Agent → Wrapper: Execute(issueId: "ENG-123")
Wrapper: 1. Zod Validation
↓
Wrapper → MCP: Spawn Process & Request
MCP → Wrapper: Large JSON Response (50kb)
Wrapper: 2. Response Filtering
↓
Wrapper → Agent: Optimized JSON (500b)
Process Ends. Memory Freed.
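
Step 2 (Response Filtering) is where most of the savings come from. A sketch, with a hypothetical Linear issue shape; the Zod input-validation step is omitted here:

```typescript
// Hypothetical shape of the raw MCP payload: a few known fields plus
// arbitrary extra data (descriptions, comments, attachments, ...).
type RawIssue = Record<string, unknown> & {
  id: string;
  title: string;
  state?: { name?: string };
  assignee?: { name?: string };
};

// Reduce the large raw payload to the handful of fields the agent
// actually needs before anything re-enters the context window.
function filterIssue(raw: RawIssue) {
  return {
    id: raw.id,
    title: raw.title,
    state: raw.state?.name ?? "unknown",
    assignee: raw.assignee?.name ?? "unassigned",
  };
}
```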

7.2 Serena: Semantic Code Intelligence

The File-Reading Problem:

Traditional: Read the full file (~8,000 tokens) → find the function → generate a replacement. For 5 files: ~40,000 tokens.

Symbol-Level: Query find_symbol("processPayment") → returns the function body (~200 tokens). For 5 files: ~1,000 tokens.

Operations Comparison:

| Operation | Without Serena | With Serena |
| --- | --- | --- |
| Find function definition | Read entire file(s), regex search | find_symbol → exact location |
| Trace call hierarchy | Read all potential callers | find_referencing_symbols → direct graph |
| Insert new method | Read file, string manipulation | insert_after_symbol → surgical placement |
| Navigate dependencies | Grep + manual file traversal | find_symbol on imports → semantic resolution |

Performance: Custom Connection Pool maintains warm LSP processes, reducing query latency from ~3s cold-start to ~2ms warm.
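
To illustrate why symbol-level access is cheaper, here is a toy index lookup. The index shape is hypothetical; Serena itself delegates to a language server rather than maintaining a map like this:

```typescript
// A symbol index maps names to their file and line span, so a lookup
// returns a small slice of the file instead of the whole thing.
type SymbolEntry = { file: string; startLine: number; endLine: number };

const INDEX = new Map<string, SymbolEntry>([
  ["processPayment", { file: "billing/payments.ts", startLine: 120, endLine: 161 }],
]);

// Return only the lines spanning the requested symbol.
function findSymbol(name: string, sources: Map<string, string[]>): string[] | undefined {
  const entry = INDEX.get(name);
  if (!entry) return undefined;
  const lines = sources.get(entry.file) ?? [];
  return lines.slice(entry.startLine - 1, entry.endLine);
}
```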


8. Infrastructure Integration: Zero-Trust Secrets

8.1 The run-with-secrets Wrapper

We do not give agents API keys. We give them a tool: 1password.run-with-secrets.

Configuration:

export const DEFAULT_CONFIG = {
  account: "company.1password.com",
  serviceItems: {
    "aws-dev": "op://Private/AWS Key/credential",
    "ci-cd": "op://Engineering/CI Key/credential",
  },
};

Execution Flow:

LLM → Agent: "List S3 Buckets"
Agent → Tool: run_with_secrets("aws s3 ls")
↓
SECURE ENCLAVE (Child Process):
Tool → 1Password: Request "AWS_ACCESS_KEY"
1Password → Tool: Inject as ENV VAR
Tool → AWS: Execute Command
AWS → Tool: Output: "bucket-a, bucket-b"
↓
Tool → Agent: Return Output
Agent → LLM: "Here are the buckets..."

Security Guarantee: The secret exists only in child process environment variables. Never printed to stdout, never logged, never enters LLM context window.
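
The guarantee can be sketched as follows. Here resolveSecret stands in for the 1Password CLI lookup (not called in this sketch); the key point is that the secret enters only the child process environment, never the command string or the returned output:

```typescript
import { execFileSync } from "node:child_process";

// Stand-in for the 1Password resolution step (an assumption; the real
// wrapper shells out to the `op` CLI to dereference op:// URIs).
function resolveSecret(ref: string): string {
  return `secret-for:${ref}`;
}

// Resolve each secret reference and inject it ONLY into the child
// process environment, then return the child's stdout.
function runWithSecrets(cmd: string, args: string[], secretRefs: Record<string, string>): string {
  const env = { ...process.env };
  for (const [name, ref] of Object.entries(secretRefs)) env[name] = resolveSecret(ref);
  return execFileSync(cmd, args, { env, encoding: "utf8" });
}
```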


9. Horizontal Scaling Architecture

Traditional development is constrained by local hardware (laptop RAM/CPU), limiting analysis to 3-5 concurrent sessions.

Solution: Decouple the Control Plane (Laptop) from the Execution Plane (Cloud):

  • Local: Engineer’s laptop deploys Docker instance using DevPod
  • Remote: Development environment runs in an ephemeral Docker container within an AMI
  • Bridge: Secure SSH tunnel forwards remote terminal to laptop

9.1 DevPod Benefits

  • Isolation: Each feature runs in own container
  • Resources: Provision 128GB RAM instances for massive monorepo analysis
  • Security: Code never leaves VPC. Laptop only sees terminal pixels

10. Roadmap: Beyond Orchestration

Current platform achieves Level 3 Autonomy (Orchestrated). Roadmap targets Level 5 (Self-Evolving).

10.1 Heterogeneous LLM Routing

| Development Task | Optimal Model | Technical Advantage |
| --- | --- | --- |
| Logic & Reasoning | DeepSeek-R1 / V3 | RL-based chain-of-thought for complex inference |
| Document Processing | DeepSeek OCR 2 | 10x token efficiency with visual causal flow |
| UI/UX & Frontend | Kimi 2.5 | Native MoonViT architecture for visual debugging |
| Parallel Research | Kimi 2.5 Swarm | PARL-driven optimization across 100 agents |
| Massive Repository Mapping | DeepSeek-v4 Engram | O(1) lookup and tiered KV cache for million-token context |

10.2 Self-Annealing & Auto-Correction

When agent fails quality gate > 3 times, system triggers Self-Annealing Workflow:

  1. Diagnosis: Meta-Agent reads session transcript, identifies “Rationalization Path”
  2. Patching:
    • Skill Annealing: Modify SKILL.md to add Anti-Pattern entries
    • Hook Hardening: Update bash script logic for edge cases
    • Agent Refinement: Update agent prompt via agent-manager
  3. Verification: Run pressure-testing-skill-content against patched artifact
  4. Pull Request: Create PR labeled [Self-Annealing] for human review

This transforms the platform into an antifragile system that gets stronger with every failure.

10.3 Agent-to-Agent Negotiation (Future)

Currently agents follow rigid JSON schemas. Future agents will negotiate API contracts dynamically:

"I need X, can you provide it?"
"No, but I can provide Y which is similar."
"Agreed, proceeding with Y."

10.4 Self-Healing Infrastructure (Future)

  • Detecting “Context Starvation” and auto-archiving memory
  • Identifying “Tool Hallucination” and generating new Zod schemas

11. Conclusion

By architecting a system where:

  1. Agents are ephemeral and stateless
  2. Context is strictly curated via Gateways
  3. Workflows are enforced by deterministic Kernel hooks
  4. Tools are progressively loaded and type-safe
  5. Secrets never touch the context

We transform the LLM from a “creative assistant” into a deterministic component of the software supply chain. This allows scaling development throughput linearly with compute, untethered by cognitive limits of human attention.


Key Architectural Principles

The Five Layers

  1. Agent Layer: Stateless, ephemeral workers (<150 lines)
  2. Skill Layer: Two-tier progressive loading (Core + Library)
  3. Orchestration Layer: Kernel-mode coordinator with 16-phase state machine
  4. Hook Layer: Eight-layer defense in depth for enforcement
  5. Infrastructure Layer: Zero-trust secrets, horizontal scaling

The Paradigm Shift

  • From Thick Agent → Thin Agent / Fat Platform
  • From Probabilistic Prompts → Deterministic Hooks
  • From Monolithic Context → Progressive Loading
  • From Single-Model → Heterogeneous Routing
  • From Static Rules → Self-Annealing System

Constraint Forces Innovation

The problem with capital is that it lets you do many inefficient things very fast. Without that luxury, the system must be clever instead. Build the machine that builds the machine that enables a team to build all the things.

