
Claude Code Decoded: Smart Context Loading

Stop loading entire files into context. Build an MCP server that automatically loads only the code you need, cutting token usage by 70-90%.

Black Dog Labs Team
5/19/2025
8 min read
claude-code, ai-development, mcp, cost-optimization, productivity

When you ask Claude to "fix the calculateTax function," it doesn't need the entire 800-line tax.py file. It needs 50 lines - the function itself, maybe two dependencies, and nothing else. But by default, you're loading 4,000+ tokens when 300 would do.

This is the context tax: loading everything because finding the right subset is hard. The solution? Build an MCP server that automatically determines the minimal context needed for any task.

## The problem with file-level context

Traditional approach:

```
User: "Fix the bug in calculateTax"
Claude receives: tax.py (800 lines, 4,200 tokens)
Claude needs: calculateTax function + getTaxRate dependency (50 lines, 280 tokens)
Waste: 93% of tokens
```

Multiply this across dozens of daily tasks and you're burning thousands of dollars annually on unnecessary context.
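
To see how it adds up, here's a back-of-envelope sketch. Every number in it is an assumption - the $3 per million input tokens is Sonnet-class pricing at the time of writing, and the usage figures are illustrative:

```typescript
// Context is re-sent on every message in a session, so per-task waste compounds.
const wastedPerLoad = 4_200 - 280; // tokens loaded minus tokens actually needed
const turnsPerTask = 20;           // context re-sent each turn (assumed)
const tasksPerDay = 40;            // assumed
const workDays = 250;              // assumed
const pricePerMTok = 3;            // USD per million input tokens (assumed)

const wastedTokensPerYear = wastedPerLoad * turnsPerTask * tasksPerDay * workDays;
console.log((wastedTokensPerYear / 1e6) * pricePerMTok); // ≈ $2,352 per developer
```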

And when you hit the context limit mid-session? Everything stops. All that carefully loaded context, gone. Time to start over with a fresh session - and load all those tokens again.

## Building the smart context loader

### Core concept

An MCP server that:

  1. Parses code to extract symbols (functions, classes, methods)
  2. Loads only the requested symbol
  3. Automatically includes direct dependencies
  4. Respects a token budget
  5. Returns formatted, ready-to-use context

### Implementation

Install dependencies:

```bash
npm install @modelcontextprotocol/sdk tree-sitter tree-sitter-python
```

The code analyzer (handles parsing and extraction):

```typescript
// src/code-analyzer.ts
import Parser from "tree-sitter";
import Python from "tree-sitter-python";
import fs from "fs/promises";

export interface CodeSymbol {
  name: string;
  type: string;
  startLine: number; // tree-sitter rows are 0-based
  endLine: number;
  content: string;
}

export class CodeAnalyzer {
  private parser: Parser;

  constructor() {
    this.parser = new Parser();
    this.parser.setLanguage(Python);
  }

  async analyzeFile(filePath: string) {
    const content = await fs.readFile(filePath, "utf-8");
    const tree = this.parser.parse(content);
    return {
      content,
      symbols: this.extractSymbols(tree, content),
    };
  }

  private extractSymbols(tree: Parser.Tree, content: string) {
    const symbols: CodeSymbol[] = [];
    const cursor = tree.walk();
    const visit = () => {
      const node = cursor.currentNode;
      if (node.type === "function_definition" || node.type === "class_definition") {
        const nameNode = node.childForFieldName("name");
        if (nameNode) {
          symbols.push({
            name: content.slice(nameNode.startIndex, nameNode.endIndex),
            type: node.type,
            startLine: node.startPosition.row,
            endLine: node.endPosition.row,
            content: content.slice(node.startIndex, node.endIndex),
          });
        }
      }
      // Depth-first walk: descend, visit each sibling, then return to the parent
      if (cursor.gotoFirstChild()) {
        do { visit(); } while (cursor.gotoNextSibling());
        cursor.gotoParent();
      }
    };
    visit();
    return symbols;
  }

  extractLines(content: string, startLine: number, endLine: number) {
    return content.split("\n").slice(startLine, endLine + 1).join("\n");
  }
}
```
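
A quick way to sanity-check the analyzer before wiring up anything else - a standalone sketch, where `src/utils/tax.py` is a stand-in path:

```typescript
// scripts/list-symbols.ts - print every symbol the analyzer can see
import { CodeAnalyzer } from "../src/code-analyzer.js";

const analyzer = new CodeAnalyzer();
const { symbols } = await analyzer.analyzeFile("src/utils/tax.py"); // stand-in path

for (const s of symbols) {
  console.log(`${s.type} ${s.name} (lines ${s.startLine}-${s.endLine})`);
}
```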

The context optimizer (manages budgets and dependencies):

```typescript
// src/context-optimizer.ts
import { CodeAnalyzer, CodeSymbol } from "./code-analyzer.js";

export class ContextOptimizer {
  private analyzer = new CodeAnalyzer();

  async loadSymbol(filePath: string, symbolName: string, maxTokens = 2000) {
    const analysis = await this.analyzer.analyzeFile(filePath);
    const symbol = analysis.symbols.find(s => s.name === symbolName);
    if (!symbol) {
      throw new Error(`Symbol '${symbolName}' not found in ${filePath}`);
    }
    const result = {
      primary: {
        file: filePath,
        name: symbol.name,
        type: symbol.type,
        lines: { start: symbol.startLine, end: symbol.endLine },
        content: symbol.content,
      },
      dependencies: [] as Array<{ name: string; type: string; content: string }>,
      tokens: this.estimateTokens(symbol.content),
    };
    // Find and add dependencies within the token budget
    let tokensUsed = result.tokens;
    const deps = this.findDependencies(analysis, symbol);
    for (const dep of deps) {
      const depTokens = this.estimateTokens(dep.content);
      if (tokensUsed + depTokens <= maxTokens) {
        result.dependencies.push({
          name: dep.name,
          type: dep.type,
          content: dep.content,
        });
        tokensUsed += depTokens;
      }
    }
    result.tokens = tokensUsed;
    return result;
  }

  private findDependencies(analysis: { symbols: CodeSymbol[] }, symbol: CodeSymbol) {
    // Simple dependency detection: find symbols referenced within the target symbol
    const symbolContent = symbol.content;
    return analysis.symbols.filter(
      s => s.name !== symbol.name && symbolContent.includes(s.name)
    );
  }

  private estimateTokens(text: string) {
    // Rough estimate: 1 token ≈ 4 characters
    return Math.ceil(text.length / 4);
  }
}
```

The MCP server (exposes the tool to Claude):

```typescript
// src/index.ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";
import { ContextOptimizer } from "./context-optimizer.js";

class SmartContextServer {
  private server: Server;
  private optimizer = new ContextOptimizer();

  constructor() {
    this.server = new Server(
      { name: "smart-context", version: "1.0.0" },
      { capabilities: { tools: {} } }
    );
    this.setupHandlers();
  }

  private setupHandlers() {
    this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: [{
        name: "load_symbol",
        description: "Load a specific function or class with dependencies, respecting token budget",
        inputSchema: {
          type: "object",
          properties: {
            file_path: {
              type: "string",
              description: "Path to the Python file"
            },
            symbol_name: {
              type: "string",
              description: "Name of function or class to load"
            },
            max_tokens: {
              type: "number",
              description: "Maximum tokens to load (default: 2000)",
              default: 2000
            },
          },
          required: ["file_path", "symbol_name"],
        },
      }],
    }));

    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      if (request.params.name === "load_symbol") {
        const args = (request.params.arguments ?? {}) as Record<string, unknown>;
        const context = await this.optimizer.loadSymbol(
          args.file_path as string,
          args.symbol_name as string,
          (args.max_tokens as number) || 2000
        );
        // Format the result as markdown Claude can read directly
        let output = `# ${context.primary.name}\n\n`;
        output += `**File:** ${context.primary.file}\n`;
        output += `**Type:** ${context.primary.type}\n`;
        output += `**Lines:** ${context.primary.lines.start}-${context.primary.lines.end}\n`;
        output += `**Total tokens:** ${context.tokens}\n\n`;
        output += "```python\n" + context.primary.content + "\n```\n";
        if (context.dependencies.length > 0) {
          output += "\n## Dependencies\n\n";
          for (const dep of context.dependencies) {
            output += `### ${dep.name} (${dep.type})\n\`\`\`python\n${dep.content}\n\`\`\`\n\n`;
          }
        }
        return { content: [{ type: "text", text: output }] };
      }
      return {
        content: [{ type: "text", text: `Unknown tool: ${request.params.name}` }],
        isError: true,
      };
    });
  }

  async run() {
    const transport = new StdioServerTransport();
    await this.server.connect(transport);
    console.error("Smart Context Loader running on stdio");
  }
}

new SmartContextServer().run().catch(console.error);
```
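
Compile before wiring anything up. This assumes a stock tsconfig with `"outDir": "build"` - match whatever your project actually uses:

```bash
npx tsc   # emits build/index.js plus the analyzer and optimizer modules
```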

Configure Claude Code:

Add the server to a `.mcp.json` in your project root (Claude Desktop users can drop the same `mcpServers` block into `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "smart-context": {
      "command": "node",
      "args": ["/path/to/your/build/index.js"]
    }
  }
}
```
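
Before a full Claude session, you can smoke-test the wiring with the MCP Inspector, which speaks the same stdio protocol:

```bash
npx @modelcontextprotocol/inspector node /path/to/your/build/index.js
```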

## Real-world usage

Before:

```
You: "Fix the CA tax rate in calculateTax"
[Claude loads entire tax.py: 4,200 tokens]
```

After:

```
You: "Load calculateTax from src/utils/tax.py"
```

Claude receives just the slice that matters:

````markdown
# calculateTax

**File:** src/utils/tax.py
**Type:** function_definition
**Lines:** 45-52
**Total tokens:** 347

```python
def calculateTax(amount, state):
    """Calculate tax for amount and state"""
    rate = getTaxRate(state)
    return amount * rate
```

## Dependencies

### getTaxRate (function_definition)

```python
def getTaxRate(state):
    return TAX_RATES.get(state, 0.06)
```
````

```
You: "Fix CA rate to 0.0925"
[Claude makes a surgical edit with full context: 347 tokens vs 4,200]
```

**Savings: 92%**

## Adaptive context budgets

Different tasks need different amounts of context:

```typescript
const CONTEXT_BUDGETS = {
  bug_fix: 1000,        // Very focused, single function
  small_feature: 2000,  // Function + related code
  refactor: 3000,       // May need broader context
  architecture: 5000,   // System-wide understanding
};

// Use task type to set budget
await optimizer.loadSymbol(
  "src/billing.py",
  "processPayment",
  CONTEXT_BUDGETS.bug_fix
);
```

## Measuring effectiveness

Track these metrics to optimize your context loading:

```typescript
interface ContextMetrics {
  tokens_loaded: number;
  tokens_referenced: number; // How many tokens Claude actually used
  task_completed: boolean;
  additional_context_needed: boolean;
}

// Good context loading:
// - 80%+ utilization (tokens referenced / tokens loaded)
// - 90%+ completion rate
// - <20% need for additional context
```

If you consistently need more context, increase the budget; if utilization is low, decrease it.
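
One way to act on those numbers is a small feedback rule that nudges the budget between tasks - a sketch, not a tuned policy:

```typescript
function adjustBudget(current: number, m: ContextMetrics): number {
  const utilization = m.tokens_referenced / m.tokens_loaded;
  if (m.additional_context_needed) return Math.round(current * 1.25); // budget too tight
  if (utilization < 0.5) return Math.round(current * 0.8);            // loading dead weight
  return current;                                                      // in the sweet spot
}
```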

## Extending to other languages

The tree-sitter ecosystem supports dozens of languages. Add TypeScript support (you'll also need `npm install tree-sitter-typescript`):

```typescript
import path from "path";
import Parser from "tree-sitter";
import Python from "tree-sitter-python";
import TypeScript from "tree-sitter-typescript";

class MultiLanguageAnalyzer {
  private parsers = new Map<string, Parser>();

  constructor() {
    this.parsers.set(".py", this.createParser(Python));
    this.parsers.set(".ts", this.createParser(TypeScript.typescript));
    this.parsers.set(".tsx", this.createParser(TypeScript.tsx));
  }

  private createParser(language: any) {
    const parser = new Parser();
    parser.setLanguage(language);
    return parser;
  }

  async analyzeFile(filePath: string) {
    const ext = path.extname(filePath);
    const parser = this.parsers.get(ext);
    if (!parser) {
      throw new Error(`Unsupported file type: ${ext}`);
    }
    // Same analysis logic as CodeAnalyzer, just with the matching parser
  }
}
```

## Common pitfalls

### Over-optimization

Don't spend hours optimizing context loading for files you rarely edit. Focus on:

- Frequently modified files
- Large files (>500 lines)
- Files with expensive dependencies

### Missing dependencies

Simple string matching misses indirect calls and drags in false positives (any symbol whose name merely appears in the text). Consider:

- Import analysis
- Call graph construction
- Semantic analysis

Start simple, add sophistication as needed - the sketch below is a cheap first step.
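
A middle ground between substring matching and a real call graph: only treat `name(` call sites as dependencies, so a variable that happens to share a symbol's name doesn't get pulled in. A sketch that could replace the `includes()` check in `findDependencies`:

```typescript
// True only if symbolContent actually calls `name`, not just mentions it
function callsSymbol(symbolContent: string, name: string): boolean {
  // \b guards against partial identifier matches; \s*\( requires a call site
  return new RegExp(`\\b${name}\\s*\\(`).test(symbolContent);
}
```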

### Token estimation accuracy

The "4 characters per token" rule is approximate. For precise budgets:

```typescript
import { encode } from "gpt-tokenizer";

// Drop-in replacement for ContextOptimizer.estimateTokens
private estimateTokens(text: string) {
  return encode(text).length;
}
```

**Performance note:** the GPT tokenizer is slower than the character heuristic. Use it for budget enforcement, not exploratory analysis.
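
One pattern that follows from this: keep the cheap estimate for scanning candidates and only pay for an exact count near the budget boundary - a sketch:

```typescript
import { encode } from "gpt-tokenizer";

const estimateFast = (text: string) => Math.ceil(text.length / 4);

// Exact counting only when the fast estimate says we're close to the limit
function fitsInBudget(text: string, used: number, max: number): boolean {
  if (used + estimateFast(text) <= max * 0.8) return true; // comfortably under
  return used + encode(text).length <= max;                // near the edge: be precise
}
```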

## The broader impact

Smart context loading isn't just about saving tokens - it's about:

### Faster responses

- Less context means faster processing
- 20-40% reduction in response time

### Better results

- More relevant context improves accuracy
- Less noise in the context window
- Easier for Claude to focus on the task

File boundaries are a human construct for organizing code. They have nothing to do with what Claude needs to solve your problem.

When you load an 800-line file to fix one 50-line function, you're not being thorough - you're being wasteful. It's like photocopying an entire encyclopedia when you need one paragraph.

Smart context loading is about respecting the task, not the file structure. Parse the code, extract what matters, load dependencies within budget, and skip the rest. 70-90% token savings isn't optimization - it's just not being wasteful.

The best part? This isn't theoretical. Tree-sitter parsers exist for every major language. The MCP SDK handles the protocol. You're 200 lines of TypeScript away from never loading a full file again.

Or you could keep burning tokens and waiting for slower responses. Your call.

## What's next

Smart context loading works great within a single repo. But modern architectures span multiple repositories - microservices, shared libraries, API contracts. Next up: Multi-Repo Context Loading - intelligently loading context across repositories without burning 50,000+ tokens on duplicate dependencies.


## Series navigation

← Previous: Claude Code Decoded: The Handoff Protocol

→ Next: Claude Code Decoded: Multi-Repo Context


Building AI-powered development tools? We help teams optimize their AI workflows and build production-grade MCP servers. Let's talk about your specific needs.