SecBear/nix-sandbox-mcp: Sandboxed code execution for LLMs, powered by Nix

LLMs need to run code. Most solutions reach for Docker — heavyweight,
non-reproducible, and yet another daemon to manage. nix-sandbox-mcp uses Nix
instead: environments are declarative flake expressions, sandboxing is
jail.nix (bubblewrap + Linux
namespaces, no root required), and a planned
microvm.nix backend adds full VM
isolation when you need it. Everything runs locally — no cloud, no containers,
no image pulls.

Requirements: Linux with Nix (flakes enabled).
The sandbox uses bubblewrap + Linux namespaces for isolation — macOS and Windows
are not supported. WSL2 may work if your kernel has user namespaces enabled.
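On WSL2 or any unfamiliar kernel, you can check whether unprivileged user namespaces are likely to work before installing. The helper below is a hypothetical heuristic and is not part of nix-sandbox-mcp. It reads `/proc/sys/user/max_user_namespaces`, where 0 means disabled, and, where present, the Debian/Ubuntu-specific `/proc/sys/kernel/unprivileged_userns_clone`. It does not detect other restrictions, such as AppArmor policies on some recent Ubuntu releases.

```python
from pathlib import Path
from typing import Optional


def _read_int(path: Path) -> Optional[int]:
    """Return the integer in a sysctl-style file, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def userns_available(proc_sys: Path = Path("/proc/sys")) -> bool:
    """Heuristic (hypothetical helper): can unprivileged users create namespaces?"""
    max_ns = _read_int(proc_sys / "user" / "max_user_namespaces")
    if max_ns is None or max_ns == 0:
        return False  # counter missing (likely no userns) or explicitly disabled
    # Debian/Ubuntu kernels add an extra switch; most other kernels lack it.
    clone = _read_int(proc_sys / "kernel" / "unprivileged_userns_clone")
    return clone is None or clone != 0


if __name__ == "__main__":
    state = "likely available" if userns_available() else "unavailable"
    print("unprivileged user namespaces:", state)
```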

Add to your MCP client config:

{
  "mcpServers": {
    "nix-sandbox": {
      "command": "nix",
      "args": ["run", "github:secbear/nix-sandbox-mcp", "--", "--stdio"],
      "env": {
        "PROJECT_DIR": "/home/user/myproject"
      }
    }
  }
}

That’s it. The LLM gets three sandboxed environments (shell, python, node) with
your project mounted read-only at /project. Drop PROJECT_DIR if you don’t
need project access.

The bundled presets are a starting point. Define your own with a Nix flake:

# my-envs/flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    nix-sandbox-mcp.url = "github:secbear/nix-sandbox-mcp";
  };

  outputs = { nixpkgs, nix-sandbox-mcp, ... }:
    let pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      packages.x86_64-linux = {
        data-science = nix-sandbox-mcp.lib.mkSandbox {
          inherit pkgs;
          name = "data-science";
          interpreter_type = "python";
          packages = [
            (pkgs.python3.withPackages (ps: [ ps.numpy ps.pandas ps.requests ]))
          ];
        };

        nix-tools = nix-sandbox-mcp.lib.mkSandbox {
          inherit pkgs;
          name = "nix-tools";
          interpreter_type = "bash";
          packages = [ pkgs.ripgrep pkgs.fd pkgs.jq pkgs.yq-go pkgs.tree ];
        };
      };
    };
}

Point NIX_SANDBOX_ENVS at your flake refs. They’re built at server startup and
merged with the bundled presets:

{
  "mcpServers": {
    "nix-sandbox": {
      "command": "nix",
      "args": ["run", "github:secbear/nix-sandbox-mcp", "--", "--stdio"],
      "env": {
        "PROJECT_DIR": "/home/user/myproject",
        "NIX_SANDBOX_ENVS": "github:myorg/envs#data-science,github:myorg/envs#nix-tools"
      }
    }
  }
}
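NIX_SANDBOX_ENVS is one comma-separated string. This README does not specify how the shell wrapper parses it, so the following is a hypothetical sketch. It splits the value into flake refs, then separates each ref into its flake URL and its `#` fragment, which names the package built from the flake.

```python
from typing import List, NamedTuple, Optional


class FlakeRef(NamedTuple):
    url: str             # e.g. "github:myorg/envs"
    attr: Optional[str]  # e.g. "data-science"; None when there is no "#fragment"


def parse_sandbox_envs(value: str) -> List[FlakeRef]:
    """Split a NIX_SANDBOX_ENVS-style value into (url, attr) pairs.

    Hypothetical behaviour: surrounding whitespace and empty entries are ignored.
    """
    refs = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue
        url, sep, attr = raw.partition("#")  # split on the first "#" only
        refs.append(FlakeRef(url, attr if sep else None))
    return refs


for ref in parse_sandbox_envs("github:myorg/envs#data-science,github:myorg/envs#nix-tools"):
    print(ref.url, ref.attr)
```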

Now the LLM can use custom tools against your live codebase, fully sandboxed:

# env: "nix-tools"
rg "TODO" /project/src --type rust -c
# /project/src/main.rs:3
# /project/src/config.rs:1

# env: "data-science"
import pandas as pd
df = pd.read_csv("/project/data/results.csv")
print(df.describe())
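Each of these snippets reaches the server as an MCP `tools/call` request over stdio, sent as one JSON message per line. The sketch below builds that message for the `nix-tools` call above. It assumes the tool is named `run` and takes `code`, `env` and an optional `session`, as described later in this README. The helper name `make_run_call` is hypothetical.

```python
import json
from typing import Optional


def make_run_call(request_id: int, code: str, env: str,
                  session: Optional[str] = None) -> str:
    """Build one newline-delimited JSON-RPC message (hypothetical helper)."""
    arguments = {"code": code, "env": env}
    if session is not None:
        arguments["session"] = session  # opt in to a persistent REPL
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "run", "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message) + "\n"


print(make_run_call(1, 'rg "TODO" /project/src --type rust -c', "nix-tools"), end="")
```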

interpreter_type selects which agent REPL the sandbox uses: "python", "bash", or
"node". Pass a session ID (the run tool's session parameter) to persist
variables and imports across calls.
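A toy, in-process model can illustrate how sessions keep state. Each session ID maps to one namespace that is reused across calls, and a call without a session gets a fresh namespace that is thrown away. This is only an illustration. The real server runs sandbox_agent.py inside a bubblewrap jail, and the `ToySessions` class below is hypothetical and provides no isolation.

```python
from typing import Dict, Optional


class ToySessions:
    """Map session IDs to persistent Python namespaces (illustration only)."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, dict] = {}

    def run(self, code: str, session: Optional[str] = None) -> dict:
        if session is None:
            namespace: dict = {}  # ephemeral: discarded after this call
        else:
            namespace = self._namespaces.setdefault(session, {})
        exec(code, namespace)  # NOT sandboxed; the real agent runs inside a jail
        return namespace


sessions = ToySessions()
sessions.run("import math\nradius = 2", session="s1")
ns = sessions.run("area = math.pi * radius ** 2", session="s1")
print(round(ns["area"], 2))  # 12.57
```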

If you prefer pre-building over startup builds, nix build your sandbox into
~/.config/nix-sandbox-mcp/sandboxes/ and skip NIX_SANDBOX_ENVS entirely. The
daemon scans that directory at startup.
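The startup scan can be sketched as follows, under a hypothetical layout: each entry in the directory is one sandbox, named after the entry. An example entry would be an out-link created with `nix build github:myorg/envs#data-science --out-link ~/.config/nix-sandbox-mcp/sandboxes/data-science`. This README does not describe the real artifact format, so treat the layout and the `scan_sandboxes` name as assumptions.

```python
from pathlib import Path
from typing import Dict


def scan_sandboxes(sandbox_dir: Path) -> Dict[str, Path]:
    """Return {env_name: resolved_path} for each entry (hypothetical layout)."""
    found: Dict[str, Path] = {}
    if not sandbox_dir.is_dir():
        return found  # a missing directory simply contributes no environments
    for entry in sorted(sandbox_dir.iterdir()):
        if entry.name.startswith("."):
            continue  # skip hidden files
        if entry.is_dir():  # follows symlinks such as nix build out-links
            found[entry.name] = entry.resolve()
    return found


if __name__ == "__main__":
    default_dir = Path.home() / ".config" / "nix-sandbox-mcp" / "sandboxes"
    print(scan_sandboxes(default_dir))
```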

All runtime settings are env vars in the MCP client JSON:

Variable | Purpose | Default
PROJECT_DIR | Project directory to mount read-only | (none)
PROJECT_MOUNT | Mount point inside sandbox | /project
NIX_SANDBOX_ENVS | Comma-separated flake refs to build at startup | (none)
NIX_SANDBOX_DIR | Pre-built sandbox directory | ~/.config/nix-sandbox-mcp/sandboxes
SESSION_IDLE_TIMEOUT | Idle timeout in seconds | 300
SESSION_MAX_LIFETIME | Max session lifetime in seconds | 3600
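The sketch below shows how a process might read these variables and apply the documented defaults. The daemon itself is written in Rust, so this Python version is only an illustration. The `Settings` class and its field names are hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:  # hypothetical container for the runtime settings above
    project_dir: Optional[Path]
    project_mount: str
    sandbox_envs: List[str]
    sandbox_dir: Path
    idle_timeout_s: int
    max_lifetime_s: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    project = env.get("PROJECT_DIR")
    default_dir = Path.home() / ".config" / "nix-sandbox-mcp" / "sandboxes"
    return Settings(
        project_dir=Path(project) if project else None,  # (none) means no mount
        project_mount=env.get("PROJECT_MOUNT", "/project"),
        sandbox_envs=[r.strip() for r in env.get("NIX_SANDBOX_ENVS", "").split(",")
                      if r.strip()],
        sandbox_dir=Path(env.get("NIX_SANDBOX_DIR", str(default_dir))),
        idle_timeout_s=int(env.get("SESSION_IDLE_TIMEOUT", "300")),
        max_lifetime_s=int(env.get("SESSION_MAX_LIFETIME", "3600")),
    )
```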

Build-time settings (environment definitions, default timeouts) live in
config.example.toml. Use it to customize the bundled presets or to bake
additional environments into the server at build time.

jail.nix (namespace isolation) — the current backend. Uses bubblewrap to
create unprivileged sandboxes with separate user, PID, network, and mount
namespaces. No network access by default.
Project files are mounted read-only. Together, these measures protect against
accidental damage and opportunistic malicious code. They do not protect against
kernel exploits, because the sandbox shares the host kernel.
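To make these properties concrete, here is a rough sketch of a bubblewrap command line with the isolation described above. It is not the wrapper jail.nix actually generates. The function name and the mount list are assumptions, although every flag shown is a standard `bwrap` option. The `--unshare-user`, `--unshare-pid` and `--unshare-net` flags create the user, PID and network namespaces; bwrap always creates a new mount namespace. The `--ro-bind` mounts expose the Nix store and the project read-only.

```python
import shlex
from typing import List, Optional


def bwrap_argv(interpreter: str, project_dir: Optional[str] = None,
               project_mount: str = "/project") -> List[str]:
    """Assemble a bwrap command line (hypothetical; not jail.nix's output)."""
    argv = [
        "bwrap",
        "--unshare-user", "--unshare-pid", "--unshare-net",  # no network access
        "--die-with-parent",                      # kill the jail if the parent exits
        "--ro-bind", "/nix/store", "/nix/store",  # packages, read-only
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",                        # scratch space, gone on exit
    ]
    if project_dir is not None:
        argv += ["--ro-bind", project_dir, project_mount]  # read-only project
    argv.append(interpreter)  # first non-option argument: the command to run
    return argv


# Placeholder interpreter path; a real one would be a concrete /nix/store path.
print(shlex.join(bwrap_argv("/nix/store/...-python3/bin/python3", "/home/user/myproject")))
```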

microvm.nix (VM isolation) — planned. Separate Linux kernel per sandbox via
KVM, virtiofs for store access, vsock for communication. Full isolation
including kernel attack surface. This is the right choice for running untrusted
code from the internet.

MCP Client
  │ JSON-RPC over stdio
  ▼
Shell wrapper
  │ builds NIX_SANDBOX_ENVS, execs daemon
  ▼
Rust daemon
  ├─ ephemeral ──▶ bubblewrap jail ──▶ interpreter
  └─ session   ──▶ bubblewrap jail ──▶ sandbox_agent.py ──▶ persistent REPL

The daemon handles MCP protocol and process dispatch. Nix handles everything
else — environment resolution, package composition, sandbox wrapper generation.
Environments come from three sources (bundled presets, NIX_SANDBOX_ENVS
startup builds, pre-built artifacts in $NIX_SANDBOX_DIR) and all produce the
same artifact format. The daemon doesn’t know which source an environment came
from.
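The dispatch rule in the diagram can be sketched in a few lines. The daemon looks up the requested environment in a single registry, whichever source produced it. It then routes the call to a one-shot interpreter or to a session agent, depending on whether a session ID was given. The `Sandbox` type, the "later sources win" merge order, and the returned route strings are hypothetical. This README does not document the real precedence, and the real daemon is written in Rust.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Sandbox:
    name: str
    interpreter_type: str  # "python", "bash" or "node"
    path: str              # built artifact; same format for every source


def build_registry(*sources: Dict[str, Sandbox]) -> Dict[str, Sandbox]:
    """Merge environment sources into one table (later sources win here)."""
    registry: Dict[str, Sandbox] = {}
    for source in sources:
        registry.update(source)
    return registry


def dispatch(registry: Dict[str, Sandbox], env: str, session: Optional[str]) -> str:
    sandbox = registry.get(env)
    if sandbox is None:
        raise KeyError(f"unknown env {env!r}; available: {sorted(registry)}")
    if session is None:
        return f"ephemeral: jail -> {sandbox.interpreter_type}"
    return f"session {session}: jail -> agent -> {sandbox.interpreter_type} REPL"


bundled = {"python": Sandbox("python", "python", "<bundled python>")}
startup = {"nix-tools": Sandbox("nix-tools", "bash", "<built from flake ref>")}
registry = build_registry(bundled, startup, {})  # third source: pre-built dir
print(dispatch(registry, "nix-tools", None))
print(dispatch(registry, "python", "s1"))
```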

See CONTRIBUTING.md for build instructions, repo layout, and
internals.

MCP servers pay a token tax: every tool schema is injected into the LLM’s
context window at connection time. A server exposing 60 tools can burn ~47k
tokens before the user says anything. This matters because context is finite
and expensive — tokens spent on tool definitions are tokens unavailable for
reasoning.

Common approaches and their costs:

Approach | Init cost | Trade-off
Static loading (all tools upfront) | ~150 tokens × N tools | Context bloat scales linearly with tool count
Dynamic discovery (list → schema → call) | ~400 tokens fixed | Extra round-trips per invocation; the LLM must learn a discovery protocol
Skill/guide documents (SKILL.md) | ~800 tokens on activation | Rich guidance but heavy; separate document to maintain

Our approach: one parameterized tool.

nix-sandbox-mcp exposes a single run tool that takes an env parameter.
Adding environments (python, node, shell, custom flakes) doesn’t add tools —
it adds a value to a parameter. The fixed context cost is ~420 tokens
regardless of how many environments are configured:

Component | Tokens | What it contains
Tool schema | ~75 | Name, params (code, env, session), selection guidance
Server instructions | ~160 | Environment list, session workflow, debugging hints
Per-parameter descriptions | ~80 | Field-level usage hints via JSON Schema
Total | ~420 | Constant; does not grow with environment count
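A single parameterized definition of this kind might look like the following, written as a Python dict that mirrors MCP's `name`/`description`/`inputSchema` tool fields. The description strings are placeholders. Listing environments as an `enum` on `env` and making `env` required are assumptions; the server may instead list environments only in its instructions. Either way, adding an environment changes a value, not the number of tools.

```python
import json
from typing import Dict, List


def run_tool(envs: List[str]) -> Dict:
    """Build one parameterized tool definition (illustrative shape only)."""
    return {
        "name": "run",
        "description": "Run code in an isolated, reproducible Nix sandbox.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Code to execute."},
                "env": {"type": "string", "enum": envs,  # assumed enum listing
                        "description": "Sandbox environment to run in."},
                "session": {"type": "string",
                            "description": "Optional ID to keep REPL state."},
            },
            "required": ["code", "env"],  # assumption
        },
    }


small = run_tool(["shell", "python", "node"])
large = run_tool(["shell", "python", "node", "data-science", "nix-tools"])
print(len(json.dumps(small)), len(json.dumps(large)))  # only the enum grows
```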

Compare: if each environment were a separate tool (3 bundled + 5 custom = 8
tools), that would cost ~1,200+ tokens and grow with every environment added.
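The comparison is simple arithmetic on this README's own estimates: about 150 tokens per tool for static loading, versus a fixed cost of about 420 tokens for the single `run` tool. Both figures are approximations rather than measurements. The sketch only shows where the two cost lines cross.

```python
PER_TOOL_TOKENS = 150       # static-loading estimate from the approaches table
PARAMETERIZED_TOKENS = 420  # fixed cost of the single run tool


def static_cost(n_envs: int) -> int:
    """One tool per environment: cost grows linearly with the count."""
    return PER_TOOL_TOKENS * n_envs


def crossover() -> int:
    """Smallest environment count at which one-tool-per-env costs more."""
    n = 1
    while static_cost(n) <= PARAMETERIZED_TOKENS:
        n += 1
    return n


for n in (3, 8, 20):
    print(n, static_cost(n), PARAMETERIZED_TOKENS)
print("crossover at", crossover(), "environments")
```

With only the three bundled environments, the single tool is already slightly cheaper (about 420 vs 450 tokens). At eight environments it costs roughly a third as much as eight separate tools.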

Where guidance lives:

Rather than a separate guidance document, tool-selection and workflow hints are
embedded directly in the MCP protocol fields that LLMs already read:

  • Tool description — when to use the sandbox vs built-in shell (isolation,
    reproducibility, resource limits vs file edits, git, host commands)
  • Server instructions — available environments, session lifecycle
    (ephemeral by default, sessions for multi-step work), debugging hints
  • Parameter descriptions — per-field usage via JSON Schema description

This keeps all guidance in-band and co-located with the tool definition. No
extra documents to load, no discovery protocol to learn, no activation step.

Phase | Status | What
1 | Done | jail.nix backend, bundled presets, MCP protocol
2a | Done | Project mounting, custom flake refs in config
2b | Done | Session persistence (Python, Bash, Node REPLs)
2c | Done | Decoupled sandboxes (mkSandbox, directory scanning)
2d | Done | MCP-conventional config (env vars, NIX_SANDBOX_ENVS)
3a | Planned | microvm.nix backend for hardware-level isolation
3b | Planned | Dead interpreter recovery (restart bash/node on crash)

License: MIT