jimmc414/claude-context-fence: A design pattern for Claude Code skills that separates conversation awareness from dense reference material. Routers (context: inherit) extract user intent; recipes stores (context: fork) hold unbounded reference content at zero token cost to the main context window. Includes 22 working example skill pairs.

A design pattern for LLM tool skills that separates context awareness from dense reference material using a router/recipes architecture with context isolation.

Context windows are finite. Dense reference material (command catalogs, API references, recipe collections, decision matrices) is valuable but bloats context. You want both conversation awareness and comprehensive references, but a single skill cannot be both lightweight and comprehensive without paying the full token cost of loading everything into the main context window.

The tension is structural. Thin skills that inherit the conversation can react to what the user actually said, but they carry no depth. Fat skills loaded with hundreds of recipes have depth, but every line of reference material occupies tokens in the main context whether or not it is relevant to the current query. A 1,500-line recipes file costs roughly 15,000 tokens just to be present.
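The arithmetic behind that figure is simple. Here is a sketch that assumes the ~10 tokens per line heuristic this document uses for its token estimates. Actual counts depend on the tokenizer and on how much of the file is code.

```python
# Back-of-envelope token cost of keeping a skill file in the main context.
# TOKENS_PER_LINE is a heuristic, not a tokenizer measurement.
TOKENS_PER_LINE = 10


def estimate_tokens(lines: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Approximate tokens a file of `lines` lines occupies once loaded."""
    return lines * tokens_per_line


print(estimate_tokens(1_500))  # fat recipes file: 15000
print(estimate_tokens(130))    # thin router: 1300
```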

The question: how do you give the LLM access to dense reference material without that material permanently occupying the context window?

The solution is an inherit-to-fork boundary. Two skills work as a coordinated pair:

- **Router** (`context: inherit`): A thin skill that sees the full conversation. It extracts the user’s goal, identifies relevant file paths and constraints from prior discussion, builds a structured argument string, and invokes the recipes skill.
- **Recipes** (`context: fork`): A dense reference store that runs in complete isolation. It receives the structured argument, matches it against its reference material, and returns a result; its entire content is discarded when execution completes. None of it enters the main context.
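A toy Python (3.9+) model can make the boundary concrete. It is not Claude Code's skill runtime or API. The conversation is a list of strings, and `router_inherit` reads all of it. `recipes_fork` receives only the argument string and keeps its reference material in a local variable that disappears when it returns. The recipe table, the `goal=...;target=...` argument format and the function names are hypothetical choices for illustration.

```python
# Toy model of the inherit -> fork boundary (not Claude Code's runtime or API).
# The main context is a list of strings; a "fork" is a function that only
# receives its argument and keeps its reference material in local scope.

# Hypothetical reference material standing in for a recipes SKILL.md body.
RECIPES = {
    "pretty-print": "jq . {target}",
    "list keys": "jq 'keys' {target}",
}


def recipes_fork(argument: str) -> str:
    """context: fork. Sees the argument string, never the conversation."""
    # The dense material is loaded into this call's local scope only.
    fork_context = dict(RECIPES)
    fields = dict(part.split("=", 1) for part in argument.split(";"))
    template = fork_context.get(fields["goal"], "no recipe matches {target}")
    # Only the formatted answer is returned; fork_context is discarded here.
    return template.format(target=fields["target"])


def router_inherit(conversation: list[str]) -> None:
    """context: inherit. Reads the whole conversation, builds a structured
    argument, and appends only the fork's result to the main context."""
    text = " ".join(conversation)
    goal = "pretty-print" if "pretty" in text else "list keys"
    target = next((w for w in text.split() if w.endswith(".json")), "input.json")
    conversation.append(recipes_fork(f"goal={goal};target={target}"))


conversation = ["user: the API dump is in orders.json", "user: can you pretty-print it?"]
router_inherit(conversation)
print(conversation[-1])  # jq . orders.json
```

After the call, the main context holds only the two user turns plus one line of output. The other recipe never appears in it.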

The router is the lens. The recipes store is the library. The lens focuses; the library never leaves the shelf.

| Property | Router | Recipes |
|---|---|---|
| Context mode | `inherit` (sees conversation) | `fork` (isolated) |
| Typical size | 50-160 lines | 200-1800+ lines |
| Token cost to main context | Full (but small) | Zero |
| Contains | Context extraction logic, argument templates | Commands, recipes, decision matrices |
| Can invoke other skills | Yes (`allowed-tools: Skill`) | No (leaf node) |

```mermaid
flowchart TD
    A["User request in conversation"] --> B{"Skill routing
(description competition)"}
    B --> C["Router skill
context: inherit
~100-160 lines"]
    C --> D["Extracts: goal, file paths,
filters, structure hints"]
    D --> E["Builds structured argument"]
    E --> F["Recipes skill
context: fork
~200-1800+ lines"]
    F --> G["Matches argument to recipe,
returns result"]
    G --> H["Result enters main context"]
    F -. "Fork discarded:
recipes content never
enters main context" .-> I["(freed)"]

    style C fill:#e8f4e8,stroke:#2d6a2d
    style F fill:#e8e8f4,stroke:#2d2d6a
    style I fill:#f4e8e8,stroke:#6a2d2d,stroke-dasharray: 5 5
```

The fork boundary means recipes content, which can run to hundreds or thousands of lines, never enters the main context window. The router adds roughly 100-160 lines to context; the recipes store contributes zero. The savings scale with the density of the reference material.

Real numbers from the reference implementation:

| Skill pair | Router lines | Recipes lines | Recipes tokens (est.) | Savings vs. inline |
|---|---|---|---|---|
| task-json | 161 | 774 | ~7,700 | 83% |
| task-debug | 146 | 1,829 | ~18,300 | 92% |
| task-network | 134 | 1,524 | ~15,200 | 92% |
| task-dev | 133 | 1,581 | ~15,800 | 92% |
| All 22 pairs | ~2,900 total | ~19,000 total | ~190,000 | 87% avg |

Token estimates use ~10 tokens/line for markdown with code blocks. The 87% average savings means that for every 100 tokens of skill infrastructure in the main context, roughly 670 tokens of reference material are available on demand but absent from the window.
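The savings column can be recomputed from the line counts. This sketch assumes "savings vs. inline" means the share of a pair's lines that stays out of the main context when only the router is loaded. Under that assumption it reproduces the table to within a percentage point.

```python
# Recompute "Savings vs. inline" from line counts.
# Assumption: inline = router + recipes both loaded; pattern = router only.
TOKENS_PER_LINE = 10  # the heuristic stated above


def savings_pct(router_lines: int, recipes_lines: int) -> float:
    """Percent of the pair's lines kept out of the main context."""
    return 100 * recipes_lines / (router_lines + recipes_lines)


pairs = {
    "task-json": (161, 774),
    "task-debug": (146, 1_829),
    "task-network": (134, 1_524),
    "task-dev": (133, 1_581),
    "All 22 pairs": (2_900, 19_000),
}

for name, (router, recipes) in pairs.items():
    print(f"{name:<13} ~{recipes * TOKENS_PER_LINE:>7,} tokens kept out  "
          f"{savings_pct(router, recipes):5.1f}% saved")
```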

The router’s description is deliberately richer and more specific than the recipes description. The recipes skill marks itself as “(internal)” and uses a minimal description. Combined with a single configuration rule (“prefer the more descriptive skill”), this reliably steers requests to the router without complex dispatch logic. The LLM’s existing ability to match user intent to natural-language descriptions does the work. No routing tables, no regex matching, no priority scores.

The system handles misuse without failure. Direct recipes invocation works if the user provides explicit arguments; the fork simply processes them without conversation context. Vague router invocation triggers clarifying questions, because the router can see the conversation and recognize when information is insufficient. Trivial tasks bypass skills entirely, since the LLM can answer them directly from its training data. No invocation path produces an error that the user cannot recover from.

Generic router template:

```markdown
---
description: "Domain processing - verb1, verb2, verb3 data"
when_to_use: "Use when: scenario1, scenario2. Triggers: keyword1, keyword2, keyword3."
context: inherit
allowed-tools: Skill, Read
---

# Domain Task Router

You have access to the **full conversation context**. Build a well-formed argument for the recipes skill.

**User's request**: $ARGUMENTS

## Context Extraction

Analyze the conversation to identify:
1. **Goal** - What operation is needed?
2. **Target** - What data/file/system is involved?
3. **Constraints** - Filters, format requirements, scope limits
4. **Prior context** - Error messages, previous attempts, related discussion

## Invoke Recipes

Use the Skill tool to invoke `domain-recipes` with your constructed argument.
```

Generic recipes template:

```markdown
---
description: "Domain recipes reference (internal)"
context: fork
allowed-tools: Read
---

# Domain Recipes

**Task**: $ARGUMENTS

## Quick Reference

| Task | Command |
|------|---------|
| Common operation 1 | `command --flag` |
| Common operation 2 | `command --other-flag` |

## Category 1
[Dense reference material, commands, decision matrices...]

## Category 2
[More reference material...]
```

The router should fit in your head. The recipes store should fit in a book.

This pattern was developed and validated with 22 skill pairs covering approximately 195 CLI tools. Recipes files range from 87 to 1,829 lines. Adding a new domain is linear: write one router (~130 lines) and one recipes file (unbounded size, since it never touches the main context).
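Because each domain is just one more router/recipes pair, creating the pair can be scripted. This Python (3.9+) sketch fills trimmed copies of the generic templates for a new domain. It follows the `task-<domain>` / `task-<domain>-recipes` naming from the table of implemented pairs. The one-directory-per-skill layout with a `SKILL.md` inside, and the `scaffold_pair` / `write_pair` helpers, are assumptions for illustration.

```python
from pathlib import Path

# Trimmed generic templates with str.format placeholders.
ROUTER_TEMPLATE = """\
---
description: "{title} processing - {verbs}"
when_to_use: "Use when: {scenarios}. Triggers: {triggers}."
context: inherit
allowed-tools: Skill, Read
---

# {title} Task Router

You have access to the **full conversation context**. Build a well-formed argument for the recipes skill.

**User's request**: $ARGUMENTS

## Invoke Recipes

Use the Skill tool to invoke `task-{domain}-recipes` with your constructed argument.
"""

RECIPES_TEMPLATE = """\
---
description: "{title} recipes reference (internal)"
context: fork
allowed-tools: Read
---

# {title} Recipes

**Task**: $ARGUMENTS

## Quick Reference

| Task | Command |
|------|---------|
"""


def scaffold_pair(domain: str, verbs: str, scenarios: str, triggers: str) -> dict[Path, str]:
    """Return {relative path: file text} for one router/recipes pair."""
    title = domain.replace("-", " ").title()
    values = dict(domain=domain, title=title, verbs=verbs,
                  scenarios=scenarios, triggers=triggers)
    return {
        Path(f"task-{domain}/SKILL.md"): ROUTER_TEMPLATE.format(**values),
        Path(f"task-{domain}-recipes/SKILL.md"): RECIPES_TEMPLATE.format(**values),
    }


def write_pair(skills_root: Path, files: dict[Path, str]) -> None:
    """Write the generated files under skills_root, creating directories as needed."""
    for rel_path, text in files.items():
        target = skills_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


files = scaffold_pair("archives", "create, list, extract archives",
                      "unpacking a download, bundling a directory", "tar, zip, unzip")
for path in files:
    print(path)
```

The generated recipes file is only a skeleton. The real work of a new domain is filling its reference sections, and that part can grow without affecting the main context.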

22 implemented skill pairs

| Domain | Router | Recipes | Recipes lines |
|---|---|---|---|
| json | task-json | task-json-recipes | 774 |
| tabular | task-tabular | task-tabular-recipes | 806 |
| text-search | task-text-search | task-text-search-recipes | 879 |
| files | task-files | task-files-recipes | 834 |
| reverse-engineer | task-reverse-engineer | task-reverse-engineer-recipes | 827 |
| media | task-media | task-media-recipes | 1,038 |
| git | task-git | task-git-recipes | 850 |
| debug | task-debug | task-debug-recipes | 1,829 |
| network | task-network | task-network-recipes | 1,524 |
| containers | task-containers | (inline) | — |
| http | task-http | task-http-recipes | 1,191 |
| database | task-database | task-database-recipes | 1,292 |
| process | task-process | task-process-recipes | 811 |
| system | task-system | task-system-recipes | 1,269 |
| dev | task-dev | task-dev-recipes | 1,581 |
| api-test | task-api-test | task-api-test-recipes | 1,091 |
| crypto | task-crypto | task-crypto-recipes | 724 |
| logs | task-logs | task-logs-recipes | 685 |
| backup | task-backup | task-backup-recipes | 748 |
| archives | task-archives | task-archives-recipes | 92 |
| permissions | task-permissions | task-permissions-recipes | 87 |
| watch | task-watch | task-watch-recipes | 160 |

Descriptions indexed by user symptoms route better than descriptions indexed by tool names.

```yaml
# Generation 1: tool-indexed (weaker routing)
description: "strace, perf, valgrind, gdb profiling and debugging"

# Generation 2: symptom-indexed (stronger routing)
description: "Debugging and profiling - program crashes randomly, CPU at 100%,
  takes forever to run, eating all my RAM, why is it slow, memory leak, segfault"
```

Lead with the problem the user has, not the tool you will use. Users say “my program is slow,” not “I need strace.” Symptom-indexed descriptions match the user’s vocabulary, which is what the routing mechanism operates on. The LLM selects skills by comparing the user’s message against each skill’s description. When descriptions mirror how users phrase their problems, routing accuracy improves without any changes to the selection mechanism itself.
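A crude way to see the effect is to count the words a user's message shares with each description. This Python sketch is a toy proxy, not the selection mechanism; the model does not pick skills by word counting. It still shows that the tool-indexed description shares no vocabulary with a typical complaint, while the symptom-indexed one shares a great deal.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, description: str) -> int:
    """Number of distinct words the query and description share (toy proxy)."""
    return len(words(query) & words(description))


TOOL_INDEXED = "strace, perf, valgrind, gdb profiling and debugging"
SYMPTOM_INDEXED = ("Debugging and profiling - program crashes randomly, CPU at 100%, "
                   "takes forever to run, eating all my RAM, why is it slow, "
                   "memory leak, segfault")

query = "why is my program so slow? it's eating all my RAM"
print(overlap(query, TOOL_INDEXED), overlap(query, SYMPTOM_INDEXED))  # 0 9
```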

This is the single highest-leverage improvement available when writing skill descriptions. Tool names are implementation details. User symptoms are the query language.

Limitations and Trade-offs

Added latency. Two skill invocations instead of one. The router adds a round-trip before the recipes skill executes. For simple queries where the user provides all necessary context upfront, this overhead is noticeable.
Non-deterministic routing. The LLM exercises judgment when selecting skills. The same query may occasionally route differently across sessions. Configuration guidance and strong descriptions reduce but do not eliminate variance.
Overlapping domains require disambiguation. “Search for text in log files” could route to text-search, logs, or files. Disambiguation tables in configuration resolve common overlaps, but edge cases persist. This is inherent to any system where domain boundaries are fuzzy.
Source code availability short-circuits routing. When the user’s question can be answered by reading available source code, the LLM may bypass skills entirely. This is correct behavior, not a bug, but it means skill usage is not guaranteed even when a relevant skill exists.
Not a retrieval system. The router reads conversation context linearly. It does not perform semantic search or have access to external memory. Context extraction quality depends on how recently and clearly the relevant information appeared in conversation. Information from hundreds of messages ago may not be reliably extracted.
