t: string; toolCallId?: string }>;
activeToolRegistry: ToolDefinition[];
iterationCounter: number;
maxIterations: number;
contextCompressionThreshold: number;
isCompactionPending: boolean;
executionAborted: boolean;
}
```
**Why this structure?**
- `conversationLedger` replaces naive chat history. It explicitly tracks user prompts, model responses, and tool outputs in a format the API can consume.
- `iterationCounter` and `maxIterations` enforce turn budgets, preventing infinite loops and controlling costs.
- `isCompactionPending` decouples context management from the main loop, allowing compression to trigger only when necessary.
### 2. Implement the Session Orchestrator
The orchestrator doesn't just call the model. It manages the lifecycle of each turn, validates tool calls, executes actions safely, and maintains state transitions.
```typescript
class AgentOrchestrator {
  private state: AgentSessionState;
  private modelClient: ModelAPIClient;
  private toolExecutor: ToolExecutionEngine;

  constructor(config: OrchestratorConfig) {
    this.state = {
      conversationLedger: [],
      activeToolRegistry: config.tools,
      iterationCounter: 0,
      // `??` applies the default only when the option is omitted
      maxIterations: config.maxTurns ?? 25,
      // Percentage of the context window that triggers compaction
      contextCompressionThreshold: config.compressionThreshold ?? 80,
      isCompactionPending: false,
      executionAborted: false
    };
    this.modelClient = new ModelAPIClient(config.apiKey);
    this.toolExecutor = new ToolExecutionEngine(config.permissions);
  }

  async runSession(userPrompt: string): Promise<string> {
    this.state.conversationLedger.push({ role: 'user', content: userPrompt });

    while (!this.state.executionAborted && this.state.iterationCounter < this.state.maxIterations) {
      this.state.iterationCounter++;

      // 1. Request model reasoning
      const modelResponse = await this.modelClient.generate({
        messages: this.state.conversationLedger,
        tools: this.state.activeToolRegistry
      });

      // 2. Parse intent vs. tool invocation
      if (modelResponse.toolCalls.length > 0) {
        const toolResults = await this.executeToolBatch(modelResponse.toolCalls);
        this.state.conversationLedger.push(
          { role: 'assistant', content: modelResponse.text, toolCalls: modelResponse.toolCalls },
          ...toolResults
        );
      } else {
        // No tool calls: the model has produced its final answer
        this.state.conversationLedger.push({ role: 'assistant', content: modelResponse.text });
        return modelResponse.text;
      }

      // 3. Evaluate context pressure
      if (this.estimateContextUsage() > this.state.contextCompressionThreshold) {
        this.state.isCompactionPending = true;
      }
      if (this.state.isCompactionPending) {
        await this.compressContext();
        this.state.isCompactionPending = false;
      }
    }

    // Report the actual reason the loop ended
    if (this.state.executionAborted) {
      throw new Error('Session terminated: execution aborted');
    }
    throw new Error('Session terminated: iteration budget exhausted');
  }

  // Validates and runs each tool call in order, returning one tool message per call.
  private async executeToolBatch(calls: ToolCall[]): Promise<Array<{ role: 'tool'; content: string; toolCallId: string }>> {
    const results: Array<{ role: 'tool'; content: string; toolCallId: string }> = [];
    for (const call of calls) {
      const validated = this.toolExecutor.validate(call);
      if (!validated.allowed) {
        results.push({ role: 'tool', content: `Permission denied: ${validated.reason}`, toolCallId: call.id });
        continue;
      }
      try {
        const output = await this.toolExecutor.run(call);
        results.push({ role: 'tool', content: output, toolCallId: call.id });
      } catch (err) {
        // Report the failure to the model instead of crashing the session
        const message = err instanceof Error ? err.message : String(err);
        results.push({ role: 'tool', content: `Tool execution failed: ${message}`, toolCallId: call.id });
      }
    }
    return results;
  }

  // Returns estimated context usage as a percentage of the model's window.
  private estimateContextUsage(): number {
    const totalTokens = this.state.conversationLedger.reduce((sum, msg) => sum + this.roughTokenCount(msg.content), 0);
    return (totalTokens / this.modelClient.maxContextWindow) * 100;
  }

  // Summarizes the ledger and keeps the last four messages verbatim.
  // Caveat: a fixed slice can begin on a tool result whose originating
  // assistant turn was summarized away.
  private async compressContext(): Promise<void> {
    const summary = await this.modelClient.summarize(this.state.conversationLedger);
    this.state.conversationLedger = [
      { role: 'system', content: `Previous context summary: ${summary}` },
      ...this.state.conversationLedger.slice(-4)
    ];
  }

  // Heuristic: roughly four characters per token for English text.
  private roughTokenCount(text: string): number {
    return Math.ceil(text.length / 4);
  }
}
```
### Architecture Decisions & Rationale
**State-Driven Over Function-Driven**
The loop revolves around `AgentSessionState` rather than passing parameters through nested functions. This keeps all mutable state in one place, makes debugging more predictable, and allows external systems (logging, monitoring, UI) to inspect the session at any point.
**Intent vs. Execution Separation**
The model never touches the file system or shell directly. It outputs structured tool calls. The `ToolExecutionEngine` intercepts these calls, validates permissions, runs the action, and formats the output. This creates an audit trail and keeps unvalidated commands from reaching the shell.
**Explicit Turn Budgeting**
`iterationCounter` and `maxIterations` are mandatory. Multi-turn agents without exit conditions can loop indefinitely on ambiguous tasks, burning tokens and degrading UX. The budget acts as a safety valve: once it is exhausted, `runSession` throws, and the caller can catch that error to degrade gracefully (for example, by reporting partial progress).
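
One way to make budget exhaustion degrade gracefully is to return a tagged result instead of throwing. The sketch below is a minimal illustration under that assumption; `runWithBudget`, `StepOutcome`, and `SessionResult` are hypothetical names that do not appear in the orchestrator above.

```typescript
// Hypothetical types for illustration; not part of the orchestrator above.
type StepOutcome =
  | { done: true; answer: string }
  | { done: false; note: string };

type SessionResult =
  | { status: 'completed'; answer: string; iterations: number }
  | { status: 'budget_exhausted'; partialNotes: string[]; iterations: number };

// Runs `step` until it reports completion or the turn budget is spent.
async function runWithBudget(
  step: (iteration: number) => Promise<StepOutcome>,
  maxIterations: number
): Promise<SessionResult> {
  const partialNotes: string[] = [];
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const outcome = await step(iteration);
    if (outcome.done) {
      return { status: 'completed', answer: outcome.answer, iterations: iteration };
    }
    // Keep what was learned this turn so it can be reported if the budget runs out.
    partialNotes.push(outcome.note);
  }
  // Budget spent: hand back the notes instead of throwing.
  return { status: 'budget_exhausted', partialNotes, iterations: maxIterations };
}
```

A caller can then branch on `status` and show the collected notes to the user when the agent runs out of turns.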
**Decoupled Context Compression**
Compression isn't triggered every turn. After each tool-using turn, the loop checks context pressure (`estimateContextUsage`) and runs compression only when usage crosses the configured threshold. Compression summarizes older turns and keeps the most recent messages verbatim, maintaining coherence without hitting API limits.
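
When the most recent messages are kept verbatim, a fixed-size slice can start on a tool result whose originating assistant turn has been summarized away. The following sketch shows one possible way to avoid that by moving the cut point back; `LedgerMessage` and `compactLedger` are hypothetical names, and the summary is assumed to have been produced elsewhere.

```typescript
// Hypothetical message shape, loosely mirroring the ledger entries used above.
type LedgerMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string;
};

// Replaces older messages with a summary and keeps the most recent ones verbatim.
// The cut point moves earlier while it lands on a tool result, so a tool message
// is never separated from the assistant turn that requested it.
function compactLedger(
  ledger: LedgerMessage[],
  summary: string,
  keepLast = 4
): LedgerMessage[] {
  if (ledger.length <= keepLast) return ledger;
  let cut = ledger.length - keepLast;
  while (cut > 0 && ledger[cut]?.role === 'tool') cut--;
  return [
    { role: 'system', content: `Previous context summary: ${summary}` },
    ...ledger.slice(cut),
  ];
}
```

The result may hold slightly more than `keepLast` recent messages, which seems a reasonable trade for keeping each tool result next to its request.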
## Pitfall Guide
### 1. The "Model-as-Executor" Fallacy
**Explanation:** Assuming the LLM can directly read files, run commands, or modify code. Models generate text; they don't interact with the OS.

**Fix:** Enforce a strict boundary. The model outputs structured tool calls; a host runtime validates them, executes them, and returns the results. Never execute raw model output as a shell command.
### 2. Unbounded Iteration Loops
**Explanation:** Omitting turn budgets or exit conditions. The agent can cycle through the same failed hypothesis indefinitely, consuming tokens and freezing the session.

**Fix:** Enforce `maxIterations`, back off exponentially when retrying failed calls, and define a fallback strategy for when the budget runs out. Log iteration counts and trigger alerts when thresholds are approached.
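
As a rough illustration of alerting before the budget runs out, the helper below classifies the current iteration against the turn budget. `checkBudget`, `BudgetStatus`, and the 0.8 default for `warnRatio` are assumptions made for this sketch.

```typescript
type BudgetStatus = 'ok' | 'warning' | 'exhausted';

// Classifies the current iteration against the turn budget.
// `warnRatio` is the fraction of the budget at which to start alerting
// (0.8 is an assumed default, not a recommendation from any library).
function checkBudget(
  iteration: number,
  maxIterations: number,
  warnRatio = 0.8
): BudgetStatus {
  if (iteration >= maxIterations) return 'exhausted';
  if (iteration >= Math.floor(maxIterations * warnRatio)) return 'warning';
  return 'ok';
}
```

Inside the loop, a `'warning'` status is a natural place to emit a log line or metric, and `'exhausted'` is where the fallback strategy would run.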
### 3. Raw Shell Command Injection
**Explanation:** Allowing the model to emit `bash` or `cmd` strings and executing them directly. This bypasses permission checks, creates security vulnerabilities, and makes error recovery difficult.

**Fix:** Use structured tool schemas (e.g., `FileRead`, `CommandRun`, `CodeEdit`). Each tool must have explicit input validation, permission gating, and standardized output formatting.
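
Input validation against a tool schema might look like the sketch below. It assumes a very simple schema format (field name mapped to a primitive type name, as in `{ path: 'string' }`); `ToolSchema`, `ValidationResult`, and `validateToolInput` are hypothetical names, and a production system would more likely use a full schema validator.

```typescript
// Hypothetical schema format: field name -> expected primitive type.
type FieldType = 'string' | 'number' | 'boolean';
type ToolSchema = Record<string, FieldType>;

type ValidationResult = { valid: true } | { valid: false; errors: string[] };

// Own-property check that ignores inherited keys such as "toString".
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Checks that every declared field is present with the right type and that
// the model did not pass fields the schema does not declare.
function validateToolInput(
  schema: ToolSchema,
  input: Record<string, unknown>
): ValidationResult {
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(schema)) {
    if (!hasOwn(input, field)) {
      errors.push(`missing field "${field}"`);
    } else if (typeof input[field] !== expected) {
      errors.push(`field "${field}" must be ${expected}, got ${typeof input[field]}`);
    }
  }
  for (const field of Object.keys(input)) {
    if (!hasOwn(schema, field)) errors.push(`unexpected field "${field}"`);
  }
  return errors.length === 0 ? { valid: true } : { valid: false, errors };
}
```

Rejecting undeclared fields keeps the model from smuggling extra arguments into a handler that was never written to expect them.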
### 4. Context Window Bleed
**Explanation:** Appending every tool result and model response to the conversation ledger without compression. Token costs spike, and the model loses focus on recent context.

**Fix:** Implement context pressure monitoring. Trigger summarization when usage exceeds 70–80% of the context window. Retain the last N turns verbatim and compress older history into a system prompt summary.
### 5. Silent or Unstructured Tool Failures
**Explanation:** Tools return empty strings or unstructured errors. The model misinterprets failures as success, leading to cascading mistakes.

**Fix:** Standardize tool output formats. Include status codes, error messages, and execution metadata. The orchestrator must parse these and feed structured feedback to the model.
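
A standardized result envelope could be sketched as follows. The field names (`status`, `errorCode`, `durationMs`) and the text layout are illustrative assumptions, not a format required by any particular model API.

```typescript
// Hypothetical result envelope; field names are illustrative.
type ToolResult =
  | { status: 'success'; output: string; durationMs: number }
  | { status: 'error'; errorCode: string; message: string; durationMs: number };

// Renders a result as the text placed in the ledger's tool message, so a
// failure or an empty output cannot be mistaken for success.
function formatToolResult(toolName: string, result: ToolResult): string {
  if (result.status === 'error') {
    return `[${toolName}] ERROR ${result.errorCode}: ${result.message} (${result.durationMs} ms)`;
  }
  const body = result.output.trim() === '' ? '(no output)' : result.output;
  return `[${toolName}] OK (${result.durationMs} ms)\n${body}`;
}
```

Making the empty-output case explicit matters because a command that succeeds silently and a command that never ran otherwise look identical to the model.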
### 6. Permission Bypass in Multi-Turn Flows
**Explanation:** Granting broad permissions upfront and never re-evaluating them. A tool that was safe in turn 1 may be dangerous in turn 5 after context shifts.

**Fix:** Implement per-call permission validation. Maintain a permission matrix that maps tools to allowed scopes, and re-check before each execution.
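
The per-call check can be sketched as a pure function over a permission matrix. Everything here is an assumption for illustration: the `PermissionMatrix` shape, the `ToolRequest` union, and the simplification to relative POSIX-style paths and exact-match commands.

```typescript
// Hypothetical permission matrix and request shapes.
type PermissionMatrix = {
  fileRead: { allowedPaths: string[] };
  commandRun: { allowedCommands: string[] };
};

type ToolRequest =
  | { tool: 'FileRead'; path: string }
  | { tool: 'CommandRun'; command: string };

type Decision = { allowed: true } | { allowed: false; reason: string };

// Resolves "." and ".." in a relative POSIX-style path.
// Returns null for absolute paths or paths that climb above the workspace root.
function normalizeRelative(p: string): string | null {
  if (p.startsWith('/')) return null;
  const out: string[] = [];
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out.join('/');
}

// Called before every execution, so each call is judged on its own arguments.
function checkPermission(matrix: PermissionMatrix, request: ToolRequest): Decision {
  switch (request.tool) {
    case 'FileRead': {
      // Normalize first so "./src/../.env" cannot pass a prefix check.
      const target = normalizeRelative(request.path);
      if (target === null) return { allowed: false, reason: 'path escapes the workspace' };
      const ok = matrix.fileRead.allowedPaths.some((root) => {
        const base = normalizeRelative(root);
        if (base === null) return false;
        return base === '' || target === base || target.startsWith(base + '/');
      });
      return ok
        ? { allowed: true }
        : { allowed: false, reason: `path "${request.path}" is outside the allowed paths` };
    }
    case 'CommandRun': {
      // Exact match only: "npm test && rm -rf ." must not pass just because
      // it starts with an allowlisted command.
      const ok = matrix.commandRun.allowedCommands.includes(request.command.trim());
      return ok
        ? { allowed: true }
        : { allowed: false, reason: `command "${request.command}" is not allowlisted` };
    }
  }
}
```

Because the function is stateless, the same matrix can be re-evaluated on every turn, and a denial reason can be returned to the model as a tool message.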
### 7. Ignoring Compaction Triggers
**Explanation:** Relying on fixed turn counts for compression instead of dynamic context pressure. This either compresses too early (losing critical details) or too late (hitting API limits).

**Fix:** Calculate context usage dynamically based on token estimates. Trigger compaction when pressure crosses a configurable threshold, not on arbitrary iteration counts.
## Production Bundle
### Action Checklist
### Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Simple Q&A or documentation lookup | Direct API invocation | No environmental interaction needed; single turn suffices | Low (fixed token cost) |
| Debugging failing tests or tracing errors | State-driven ReAct runtime | Requires file reads, command execution, and iterative hypothesis testing | Medium-High (turn-based, but predictable with budgets) |
| CI/CD pipeline automation | Orchestrator with strict permission gating | Needs safe, auditable tool execution across multiple environments | High (requires robust error handling and rollback logic) |
| Large codebase refactoring | ReAct runtime with aggressive context compression | Context window limits require summarization to maintain coherence | Medium (compression reduces token overhead per turn) |
### Configuration Template
```typescript
const orchestratorConfig: OrchestratorConfig = {
  apiKey: process.env.MODEL_API_KEY,
  maxTurns: 30,
  compressionThreshold: 75,
  permissions: {
    fileRead: { allowedPaths: ['./src', './tests'], denyPatterns: ['*.env', 'node_modules'] },
    commandRun: { allowedCommands: ['npm test', 'git status', 'ls'], sudo: false },
    codeEdit: { requireConfirmation: true, maxChangesPerTurn: 5 }
  },
  tools: [
    { name: 'FileRead', schema: { path: 'string' }, handler: readFileSystem },
    { name: 'CommandRun', schema: { command: 'string', cwd: 'string' }, handler: executeShell },
    { name: 'CodeEdit', schema: { filePath: 'string', search: 'string', replace: 'string' }, handler: applyPatch }
  ]
};

const agent = new AgentOrchestrator(orchestratorConfig);
const result = await agent.runSession('Diagnose why the integration tests are timing out and apply a fix.');
```
### Quick Start Guide
- Install dependencies: Add your preferred HTTP client, token estimator, and model SDK to your project.
- Define tool schemas: Create structured interfaces for file operations, command execution, and code modifications. Attach permission rules to each.
- Initialize the orchestrator: Pass your API key, turn budget, compression threshold, and tool registry to the constructor.
- Run a session: Call `runSession()` with a user prompt. The loop handles reasoning, tool execution, context management, and termination.
- Monitor telemetry: Log iteration counts, token usage, and compaction events. Adjust `maxTurns` and `compressionThreshold` based on observed performance.