Migrate LLM agents from expensive to cheap models. Automatically.
Claude Sonnet ($15/MTok) → GPT-4o-mini ($0.15/MTok) = 100x cost reduction
You built an agent that works perfectly with Claude Sonnet. It costs $0.02 per run. At 10,000 runs/month, that’s $200. With GPT-4o-mini, it could be $2.
But migrating manually means 15+ hours of:
- Rewriting prompts that “just work” on smart models
- Running hundreds of test cases
- Debugging subtle failures
- Repeat for every new cheaper model
Distill automates this entire process.
```
┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│   Your Agent              Distill             Optimized Agent    │
│   (Sonnet)                                    (GPT-4o-mini)      │
│       │                      │                      │            │
│       ▼                      ▼                      ▼            │
│  ┌─────────┐          ┌─────────────┐          ┌───────────┐     │
│  │ $0.02   │  ─────▶  │  Profile    │          │  $0.002   │     │
│  │ /run    │          │  Judge      │  ─────▶  │  /run     │     │
│  │         │          │  Optimize   │          │           │     │
│  │ 95%     │          │  Validate   │          │  96%      │     │
│  │ success │          └─────────────┘          │  success  │     │
│  └─────────┘                 │                 └───────────┘     │
│                              ▼                                   │
│                 Iterates until 95%+ success                      │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
```bash
# Clone and setup
git clone https://github.com/your-org/distill.git
cd distill
pnpm install
pnpm build

# Configure API keys
cp .env.example .env
# Add your ANTHROPIC_API_KEY and OPENAI_API_KEY
```
1. Define your agent (`agent.yaml`):

```yaml
name: "Customer Support Bot"
description: "Answers product questions accurately and concisely"
model:
  provider: anthropic
  name: claude-sonnet-4-20250514
  temperature: 0
systemPrompt: |
  You are a helpful customer support agent for TechCorp.
  Answer questions about our products accurately and concisely.
  Be professional and friendly.
objective: "Provide accurate, helpful answers to customer questions"
successCriteria:
  - "Answers are factually correct"
  - "Tone is professional and friendly"
  - "Responses are concise"
```
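For reference, the fields above could be typed roughly as follows in TypeScript. This is an illustrative sketch: `AgentConfig` is our name for it, not necessarily the repo's actual schema.

```typescript
// Illustrative typing of the agent.yaml fields; the real schema may differ.
interface AgentConfig {
  name: string;
  description: string;
  model: {
    provider: "anthropic" | "openai";
    name: string;
    temperature: number;
  };
  systemPrompt: string;
  objective: string;
  successCriteria: string[];
}

// The example config above, expressed as an object literal.
const exampleConfig: AgentConfig = {
  name: "Customer Support Bot",
  description: "Answers product questions accurately and concisely",
  model: {
    provider: "anthropic",
    name: "claude-sonnet-4-20250514",
    temperature: 0,
  },
  systemPrompt:
    "You are a helpful customer support agent for TechCorp.\n" +
    "Answer questions about our products accurately and concisely.\n" +
    "Be professional and friendly.",
  objective: "Provide accurate, helpful answers to customer questions",
  successCriteria: [
    "Answers are factually correct",
    "Tone is professional and friendly",
    "Responses are concise",
  ],
};
```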
2a. Option A: Profile with the expensive model (automatic):

Create test inputs (`test-inputs.json`):

```json
[
  "What is your return policy?",
  "How do I reset my password?"
]
```

Profile to create a gold standard:

```bash
pnpm profile -c agent.yaml -i test-inputs.json -o test-suite.json
```
2b. Option B: Use an existing gold standard (manual – no profiling cost):

If you already have gold-standard responses from production (`manual-test-cases.json`):

```json
[
  {
    "input": { "message": "What is your return policy?" },
    "expectedOutput": { "response": "Our return policy allows..." }
  },
  {
    "input": { "message": "How do I reset my password?" },
    "expectedOutput": { "response": "To reset your password..." }
  }
]
```

Create the test suite directly:

```bash
pnpm create-test-suite -i manual-test-cases.json -o test-suite.json
```
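Conceptually, creating a test suite from manual cases is a validate-and-wrap step. A minimal TypeScript sketch of the idea follows; the `toTestSuite` helper and `TestSuite` shape are illustrative, not Distill's actual internals.

```typescript
// Illustrative shapes only; Distill's real test-suite format may differ.
interface ManualTestCase {
  input: { message: string };
  expectedOutput: { response: string };
}

interface TestSuite {
  cases: ManualTestCase[];
  createdAt: string;
}

// Wrap manual gold-standard cases into a test-suite object,
// rejecting entries missing either side of the input/output pair.
function toTestSuite(cases: ManualTestCase[]): TestSuite {
  for (const c of cases) {
    if (!c.input?.message || !c.expectedOutput?.response) {
      throw new Error("each case needs input.message and expectedOutput.response");
    }
  }
  return { cases, createdAt: new Date().toISOString() };
}
```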
3. Migrate to the cheaper model:

```bash
pnpm migrate -c agent.yaml -p test-suite.json -t gpt-4o-mini -o agent.optimized.yaml
```

4. Evaluate the result:

```bash
pnpm evaluate -c agent.optimized.yaml -p test-suite.json
```
Output:

```
🔧 Loading agent from agent.yaml...
📊 Profiling source model (claude-sonnet-4-20250514)...
   Running 50 test cases...
✅ Baseline established:
   • Cost: $0.018/run
   • Success rate: 94%
   • Avg latency: 2.3s

🎯 Target: gpt-4o-mini
🚀 Starting migration...

Iteration 1/10
├─ Running evaluation... 42% success
├─ Judge feedback: "Missing context for product-specific terms"
└─ Modifier: Adding explicit product glossary to prompt

Iteration 2/10
├─ Running evaluation... 71% success
├─ Judge feedback: "Inconsistent formatting in responses"
└─ Modifier: Adding output format examples

Iteration 3/10
├─ Running evaluation... 88% success
├─ Judge feedback: "Edge cases with ambiguous questions"
└─ Modifier: Adding chain-of-thought for complex queries

Iteration 4/10
├─ Running evaluation... 96% success ✓
└─ Target achieved!

✨ Migration complete!

┌────────────────────────────────────────┐
│  Metric         Before      After      │
│  ──────────────────────────────────────│
│  Model          Sonnet      4o-mini    │
│  Cost/run       $0.018      $0.002     │
│  Success rate   94%         96%        │
│  Latency        2.3s        0.8s       │
│  ──────────────────────────────────────│
│  Monthly savings (10k runs): $160      │
└────────────────────────────────────────┘

📝 Saved: agent.optimized.yaml
```
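The savings line is plain arithmetic: (cost before − cost after) × runs per month. As a quick sketch (the function name is ours, for illustration):

```typescript
// Monthly savings from a per-run cost reduction.
function monthlySavings(
  costBeforePerRun: number,
  costAfterPerRun: number,
  runsPerMonth: number,
): number {
  return (costBeforePerRun - costAfterPerRun) * runsPerMonth;
}

// ($0.018 - $0.002) per run * 10,000 runs ≈ $160/month
```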
- Automatic Profiling: Run your agent, capture gold standard outputs
- Manual Test Suites: Use your existing gold standards from production (no profiling cost!)
- LLM-as-Judge Evaluation: Smart comparison of outputs, not just string matching
- Iterative Prompt Optimization: Automatically refines prompts based on failures
- Convergence Strategies: Choose how to stop optimization
  - `ThresholdPlusBonusRounds` (default): run extra iterations after reaching the threshold
  - `AlwaysRunMax`: always run all iterations and return the best result
  - `EarlyStoppingWithPatience`: stop early if there is no improvement
- Anti-Overfitting: Modifier learns general strategies, not specific test answers
- Multi-Provider Support: Anthropic (Claude), OpenAI (GPT)
- CLI Tools: `profile`, `migrate`, and `evaluate` commands
- LangGraph Orchestration: Flexible graph-based migration flow
- Cost Tracking: Real execution traces with token counts
On the roadmap:
- Multi-Agent Decomposition: Automatically split complex agents into specialized sub-agents
- Architecture Optimization: Router + specialized agents for different task types
- Visual Dashboard: Web UI for monitoring migrations
- CI/CD Integration: Automated regression testing
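To make the convergence strategies concrete, here is a hedged sketch of what early stopping with patience could look like over a sequence of per-iteration success rates. The `earlyStopIndex` function is illustrative, not Distill's actual implementation.

```typescript
// Walk through per-iteration success rates and report where early
// stopping with patience would halt: after `patience` consecutive
// iterations without improvement over the best rate seen so far.
function earlyStopIndex(successRates: number[], patience: number): number {
  let best = -Infinity;
  let sinceImprovement = 0;
  for (let i = 0; i < successRates.length; i++) {
    if (successRates[i] > best) {
      best = successRates[i];
      sinceImprovement = 0;
    } else {
      sinceImprovement++;
      if (sinceImprovement >= patience) return i; // stop here
    }
  }
  return successRates.length - 1; // ran all iterations
}
```

The other two strategies would swap only the stopping test: a threshold-plus-bonus rule keeps iterating a fixed number of rounds after crossing the success threshold, and always-run-max simply never stops early.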
| Scenario | Distill Helps? |
|---|---|
| Agent works on Sonnet, need to cut costs | ✅ Yes |
| Building new agent, want to start cheap | |
| Agent has consistent failures on cheap model | ✅ Yes |
| Need to migrate to new model (GPT-5, etc) | ✅ Yes |
| Agent requires vision/multimodal | 🔜 Coming in Phase 2 |
Distill is designed to evolve:
Phase 1: SingleAgent optimization

```
┌─────────────────────────┐
│  Profiler → Judge →     │
│  PromptModifier →       │
│  Validator              │
└─────────────────────────┘
```

Phase 2: MultiAgent decomposition (same interfaces)

```
┌─────────────────────────┐
│  Profiler → Judge →     │
│  ArchModifier →         │   ← New modifier, same flow
│  Validator              │
└─────────────────────────┘
```
The Agent interface abstracts over single- vs multi-agent execution:

```typescript
interface Agent {
  execute(input: Input): Promise<Output>;
  getCost(): number;
}

// Phase 1: SingleAgent implements Agent
// Phase 2: MultiAgent implements Agent (drop-in replacement)
```
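For illustration, a Phase 1 implementation could look like the following. The interface is repeated here for self-containment, and the `SingleAgent` body is our sketch, not the repo's code.

```typescript
type Input = { message: string };
type Output = { response: string };

interface Agent {
  execute(input: Input): Promise<Output>;
  getCost(): number;
}

// Illustrative SingleAgent: delegates to an injected model call and
// accumulates cost across executions for getCost().
class SingleAgent implements Agent {
  private totalCost = 0;

  constructor(
    private callModel: (prompt: string) => Promise<{ text: string; costUsd: number }>,
  ) {}

  async execute(input: Input): Promise<Output> {
    const { text, costUsd } = await this.callModel(input.message);
    this.totalCost += costUsd;
    return { response: text };
  }

  getCost(): number {
    return this.totalCost;
  }
}
```

Because callers only see `Agent`, a Phase 2 `MultiAgent` (router plus sub-agents) can replace `SingleAgent` without touching the profiler, judge, or validator.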
Distill provides convenient npm scripts for all commands:
```bash
# Profile an agent (creates gold standard)
pnpm profile -c agent.yaml -i test-inputs.json -o test-suite.json

# Create test suite from manual gold standards (no profiling cost!)
pnpm create-test-suite -i manual-test-cases.json -o test-suite.json

# Migrate to cheaper model
pnpm migrate -c agent.yaml -p test-suite.json -t gpt-4o-mini

# Evaluate results
pnpm evaluate -c agent.optimized.yaml -p test-suite.json
```
Available scripts:
- `pnpm distill` – run any CLI command
- `pnpm profile` – shortcut for the profile command
- `pnpm create-test-suite` – shortcut for the create-test-suite command
- `pnpm migrate` – shortcut for the migrate command
- `pnpm evaluate` – shortcut for the evaluate command
All commands automatically load .env file for API keys.
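A fail-fast guard for those keys might look like this (illustrative; Distill's actual startup checks may differ):

```typescript
// Throw early if required API keys are absent from the environment.
// Pass process.env (or any env-like record) as the second argument.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined>,
): void {
  const missing = names.filter((n) => !env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// e.g. requireEnv(["ANTHROPIC_API_KEY", "OPENAI_API_KEY"], process.env)
```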
```
distill/
├── packages/
│   ├── core/              # Main logic (provider-agnostic)
│   │   ├── profiler/      # Captures agent behavior
│   │   ├── judge/         # LLM-based evaluation
│   │   ├── modifier/      # Prompt & architecture optimization
│   │   ├── agents/        # Agent abstractions
│   │   └── validator/     # End-to-end testing
│   ├── cli/               # Command-line interface
│   └── web/               # Dashboard (placeholder)
├── examples/              # Real-world examples
├── docs/                  # Documentation
├── turbo.json             # Turborepo task configuration
└── pnpm-workspace.yaml    # pnpm workspaces config
```
This project uses pnpm workspaces + Turborepo for fast, cached builds:
```bash
pnpm build      # Build all packages (with caching)
pnpm dev        # Watch mode for all packages
pnpm test       # Run tests in parallel
pnpm lint       # Lint all packages
pnpm typecheck  # TypeScript validation
pnpm clean      # Clean build artifacts and cache
```
Turborepo caches build outputs – subsequent builds are near-instant if nothing changed.
LLM costs are a universal problem. Every team building with AI faces the same migration pain. By open-sourcing Distill:
- Community improvements: More edge cases, more optimizations
- Provider support: Community can add Mistral, Llama, etc.
- Trust: See exactly how your prompts are being modified
- Standards: Help establish best practices for LLM migration
We welcome contributions! See CONTRIBUTING.md for:
- Development setup
- Running tests
- Submitting PRs
- Code style guidelines
MIT License – see LICENSE
Built out of the frustration of manually migrating agents one too many times.
Documentation · Examples · Discord · Twitter