Cohesix

Cohesix is an open-source, high-assurance control-plane operating system built on the formally verified seL4 microkernel, designed to keep the trusted computing base intentionally small while enabling deterministic orchestration of edge GPU systems and auditable MLOps.

Open-source releases are available in releases/.

Tested platforms:

  • Apple Silicon M-series (macOS 26 host, Queen VM + host tools)
  • AWS g5g.xlarge (Queen VM, host tools, GPU integration, API integration)
  • AWS t4g.small (arm64 build host)
  • NVIDIA JetPack 6.2.1 (worker VM path, GPU integration)

Models Tested:

  • HuggingFaceTB/SmolVLM-500M-Instruct
  • WinKawaks/vit-small-patch16-224 (ViT-S/16)
  • WinKawaks/vit-tiny-patch16-224 (ViT-Ti/16)

Cohesix explores a specific and deliberately narrow problem space: how to build a small, auditable, and secure control plane for orchestrating distributed edge GPU systems, without inheriting the complexity, opacity, and attack surface of general-purpose operating systems.

Cohesix is a research operating system with practical goals: to test whether a formally grounded microkernel, a file-shaped control plane, and a strictly bounded userspace can handle real edge-orchestration workloads in hostile and unreliable environments. The project is informed by earlier work in film and broadcast systems where reliability, timing, and control mattered more than convenience.
In practical terms, the project would not be feasible without extensive use of AI agents for architecture review, design iteration, code synthesis, debugging assistance, and documentation refinement.

Cohesix is intentionally opinionated. It treats determinism, auditability, and security as design inputs rather than constraints, and is willing to exclude large classes of features to preserve those properties.

Cohesix has a strong MLOps fit: model lifecycle pointers, deterministic CAS updates, and bounded telemetry streams make training, rollout, and audit pipelines reproducible without introducing in-VM ML stacks.
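
As a hedged sketch of that pattern (the helper names and on-disk layout here are illustrative, assuming the model registry is reachable as ordinary files; the real /gpu/models layout is published by gpu-bridge-host), a content-addressed publish plus an atomic pointer flip looks roughly like:

  // Sketch only: content-addressed model publish + atomic "active" pointer flip.
  // Assumes the registry is reachable as ordinary files; uses the sha2 and hex crates.
  use sha2::{Digest, Sha256};
  use std::{fs, io, path::Path};

  fn publish_model(registry: &Path, artifact: &[u8]) -> io::Result<String> {
      // Content address: the blob is named by its SHA-256 digest, so the same
      // bytes always land at the same path (deterministic and idempotent).
      let digest = hex::encode(Sha256::digest(artifact));
      let dir = registry.join("available");
      fs::create_dir_all(&dir)?;
      let blob = dir.join(&digest);
      if !blob.exists() {
          fs::write(&blob, artifact)?;
      }
      Ok(digest)
  }

  fn activate_model(registry: &Path, digest: &str) -> io::Result<()> {
      // Pointer flip via write-then-rename: readers observe either the old
      // or the new pointer, never a partial write.
      let tmp = registry.join("active.tmp");
      fs::write(&tmp, digest)?;
      fs::rename(tmp, registry.join("active"))
  }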


Cohesix runs on seL4, a high-assurance capability-based microkernel with a machine-checked proof of correctness. seL4 provides strong isolation, explicit authority, and deterministic scheduling while keeping the kernel extremely small. This lets Cohesix place all policy and orchestration logic in a pure Rust userspace, minimize the trusted computing base, and enforce capability-scoped control planes without relying on POSIX semantics, in-kernel drivers, or ambient authority.


Cohesix is a minimal orchestration operating system for secure edge management, targeting a defined set of use cases around AI hives and distributed GPU workloads.

Technically, Cohesix is a pure Rust userspace running on upstream seL4 on aarch64/virt (GICv3). Userspace is shipped as a static CPIO boot payload containing the root task, the NineDoor Secure9P server, and worker roles; host tools live outside the VM.

Cohesix does not include a traditional filesystem; instead it exposes a synthetic Secure9P namespace where paths represent capability-scoped control and telemetry interfaces rather than persistent storage.

Cohesix does not provide HTTPS or TLS. Instead, it relies on an authenticated, encrypted private network (e.g. VPN or overlay) for transport security, keeping the Cohesix TCB small and focused.

All control and telemetry are file-shaped and exposed via Secure9P; the console mirrors those semantics. There are no ad-hoc RPC channels, no background daemons, and no general in-VM networking services.

Operators interact with Cohesix through two consoles:

  • a local PL011 UART console for early bring-up and recovery, and
  • a remote TCP console consumed by the cohsh shell, which mirrors serial semantics and provides the primary operational interface from Unix-like hosts.

The intended deployment target is physical ARM64 hardware booted via UEFI. Today, QEMU aarch64/virt is used for bring-up, CI, and testing, with the expectation that QEMU behaviour closely mirrors the eventual hardware profiles. Cohesix is not a general-purpose operating system and deliberately avoids POSIX semantics, libc, dynamic linking, and in-VM hardware stacks to keep the system small and analyzable.

In short, Cohesix treats orchestration itself as an operating-system problem, with authority, lifecycle, and failure handling as first-class concerns.


Plan 9 heritage and departures

Cohesix is deliberately influenced by Plan 9 from Bell Labs, but it is not a revival, clone, or generalisation of Plan 9. The influence is philosophical rather than literal, and the departures are explicit.

File-shaped control surfaces
Cohesix exposes control and observation as file operations. Paths such as /queen/ctl, /worker/<id>/telemetry, /log/*, /gpu/<id>/*, and /gpu/models/* are interfaces, not storage. This yields diffable state, append-only audit logs, and a uniform operator surface.
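
As a hedged illustration of what "file-shaped" means in practice (this assumes the namespace is reachable as ordinary files on a development host; the control verb shown is illustrative, not the documented /queen/ctl grammar):

  // Sketch only: drive the control plane with ordinary file operations.
  use std::fs::{self, OpenOptions};
  use std::io::Write;

  fn main() -> std::io::Result<()> {
      // Control: append one command line to the append-only control file.
      let mut ctl = OpenOptions::new().append(true).open("/queen/ctl")?;
      writeln!(ctl, "spawn worker-heart")?; // illustrative verb, not the real grammar

      // Observation: read state back the same way, as plain file contents.
      let telemetry = fs::read_to_string("/worker/1/telemetry")?; // "1" is an example id
      print!("{telemetry}");
      Ok(())
  }

Because state is just files, diffable state falls out for free: snapshot, diff, and audit with ordinary tools.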

Namespaces as authority boundaries
Like Plan 9’s per-process namespaces, Cohesix uses per-session, role-scoped namespaces. A namespace is not global truth; it is a capability-filtered view of the system. Authority is defined by which paths are visible and writable.
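
A minimal sketch of that idea (the policy table below is hypothetical; the real role-aware rules are enforced inside NineDoor, not in client code):

  // Sketch only: authority expressed as path visibility and writability.
  #[derive(Clone, Copy)]
  enum Role { Queen, Worker, Observer }

  fn may_write(role: Role, path: &str) -> bool {
      match role {
          Role::Queen => path == "/queen/ctl" || path.starts_with("/worker/"),
          Role::Worker => path.starts_with("/worker/") && path.ends_with("/telemetry"),
          Role::Observer => false, // read-only view of the hive
      }
  }

  fn may_read(role: Role, path: &str) -> bool {
      // A path that is not mounted into a session's namespace simply does
      // not exist for that session; visibility *is* the authority boundary.
      may_write(role, path) || path.starts_with("/log/") || path == "/proc"
  }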

Late binding of services
Services are not assumed to exist. Workers, GPU providers, and auxiliary capabilities are bound into the namespace only when required, supporting air-gapped operation, fault isolation, and minimal steady-state complexity.

Hostile networks by default
Cohesix assumes unreliable, adversarial, and partitioned networks. Every operation is bounded, authenticated, auditable, and revocable.

No single-system illusion
Partial visibility and degraded operation are normal. Cohesix explicitly rejects the idea of a seamless single-system image.

Control plane only
Secure9P is a control-plane protocol, not a universal IPC or data plane. Cohesix does not host applications, GUIs, or general user environments, and keeps heavy ecosystems outside the trusted computing base.

Explicit authority and revocation
Cohesix enforces capability tickets, time- and operation-bounded leases, and revocation-first semantics. Failure is handled by withdrawing authority, not by retries or self-healing loops.
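
A minimal sketch of those semantics (field and method names are illustrative, not Cohesix's actual types):

  // Sketch only: a time- and operation-bounded lease with revocation-first
  // semantics -- failure withdraws authority rather than triggering retries.
  use std::sync::atomic::{AtomicBool, Ordering};
  use std::time::Instant;

  struct Lease {
      expires_at: Instant,   // time bound
      ops_remaining: u32,    // operation bound
      revoked: AtomicBool,   // immediate withdrawal of authority
  }

  impl Lease {
      fn consume(&mut self) -> Result<(), &'static str> {
          if self.revoked.load(Ordering::Acquire) {
              return Err("lease revoked");
          }
          if Instant::now() >= self.expires_at {
              return Err("lease expired");
          }
          if self.ops_remaining == 0 {
              return Err("operation budget exhausted");
          }
          self.ops_remaining -= 1;
          Ok(())
      }

      fn revoke(&self) {
          self.revoked.store(true, Ordering::Release);
      }
  }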

Determinism over flexibility
Bounded memory, bounded work, and deterministic behaviour are prioritised over convenience and dynamism.
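
As a sketch of what "bounded" means concretely (illustrative only): a telemetry log whose memory is fixed at construction and whose insert cost is constant:

  // Sketch only: capacity is chosen up front; old entries are shed rather
  // than letting the buffer grow without bound.
  use std::collections::VecDeque;

  struct BoundedLog {
      entries: VecDeque<String>,
      capacity: usize,
  }

  impl BoundedLog {
      fn new(capacity: usize) -> Self {
          Self { entries: VecDeque::with_capacity(capacity), capacity }
      }

      fn push(&mut self, line: String) {
          if self.entries.len() == self.capacity {
              self.entries.pop_front(); // bounded: drop oldest, never reallocate
          }
          self.entries.push_back(line);
      }
  }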


Architecture (high level)

A single Cohesix deployment is a hive: one Queen role orchestrating multiple workers over a shared Secure9P namespace. The root task owns initial authority and scheduling, NineDoor presents the synthetic namespace, and all lifecycle actions are file-driven under /queen, /worker/<id>, /log, and /gpu.

CUDA, NVML, and other heavy stacks remain host-side. The VM never touches GPU hardware directly.

Figure 1: Cohesix concept architecture (Queen/Worker hive over Secure9P, host-only GPU bridge, dual consoles)

flowchart LR

  subgraph HOST["Host (outside Cohesix VM/TCB)"]
    OP["Operator or Automation"]:::ext
    COHSH["cohsh (host-only)<br/>Canonical shell<br/>transport tcp<br/>role and ticket attach"]:::hosttool
    GUI["SwarmUI (host-only)<br/>Speaks cohsh protocol"]:::hosttool
    WIRE["secure9p-codec/core/transport (host)<br/>bounded framing<br/>TCP transport adapter"]:::hostlib
    GPUB["gpu-bridge-host (host)<br/>CUDA and NVML here<br/>lease enforcement<br/>publishes gpu + models"]:::hosttool
  end

  subgraph TARGET["Target (QEMU aarch64 virt today; UEFI ARM64 later)"]
    subgraph K["Upstream seL4 kernel"]
      SEL4["seL4<br/>caps, IPC, scheduling<br/>formal foundation"]:::kernel
    end
    subgraph USER["Pure Rust userspace (static CPIO rootfs)"]
      RT["root-task<br/>bootstrap caps and timers<br/>cooperative event pump<br/>spawns NineDoor and roles<br/>owns side effects"]:::vm
      ND["NineDoor (Secure9P server)<br/>exports synthetic namespace<br/>role-aware mounts and policy"]:::vm
      Q["Queen role<br/>orchestrates via queen ctl"]:::role
      WH["worker-heart<br/>heartbeat telemetry"]:::role
      WG["worker-gpu (VM stub)<br/>no CUDA or NVML<br/>uses gpu files"]:::role
    end
  end

  UART["PL011 UART console<br/>bring-up and recovery"]:::console
  TCP["TCP console<br/>remote operator surface"]:::console

  subgraph NS["Hive namespace (Secure9P)"]
    PROC["Path: /proc<br/>boot and status views"]:::path
    QUEENCTL["Path: /queen/ctl<br/>append-only control<br/>spawn kill bind mount<br/>spawn gpu lease requests"]:::path
    WORKTEL["Path: /worker/ID/telemetry<br/>append-only telemetry"]:::path
    LOGS["Path: /log/*<br/>append-only streams"]:::path
    GPUFS["Path: /gpu/ID<br/>info ctl job status<br/>host-mirrored providers"]:::path
    GPUMODELS["Path: /gpu/models<br/>available + active pointers<br/>host-published registry"]:::path
  end

  SEL4 --> RT
  RT --> ND
  RT --- UART
  RT --- TCP
  OP --> COHSH
  OP --> GUI
  COHSH -->|tcp attach| TCP
  GUI -->|same protocol| TCP
  ND --> PROC
  ND --> QUEENCTL
  ND --> WORKTEL
  ND --> LOGS
  ND --> GPUFS
  ND --> GPUMODELS
  Q -->|Secure9P ops| ND
  WH -->|Secure9P ops| ND
  WG -->|Secure9P ops| ND
  QUEENCTL -->|validated then internal action| RT
  GPUB --> WIRE
  WIRE -->|Secure9P transport host-only| ND
  GPUB --> GPUFS
  GPUB --> GPUMODELS

  classDef kernel fill:#eeeeee,stroke:#555555,stroke-width:1px;
  classDef vm fill:#f7fbff,stroke:#2b6cb0,stroke-width:1px;
  classDef role fill:#f0fdf4,stroke:#15803d,stroke-width:1px;
  classDef console fill:#faf5ff,stroke:#7c3aed,stroke-width:1px;
  classDef path fill:#f8fafc,stroke:#334155,stroke-dasharray: 4 3;
  classDef hosttool fill:#fff7ed,stroke:#c2410c,stroke-width:1px;
  classDef hostlib fill:#fffbeb,stroke:#b45309,stroke-width:1px;
  classDef ext fill:#ffffff,stroke:#334155,stroke-width:1px;



Components



  • root-task — seL4 bootstrapper configuring capabilities, timers, and the cooperative event pump; hosts the serial and TCP consoles and owns all side effects.
  • nine-door — Secure9P server exporting /proc, /queen, /worker, /log, and host-mirrored /gpu namespaces with role-aware policy.
  • worker-heart — Minimal worker emitting heartbeat telemetry into /worker/<id>/telemetry.
  • worker-gpu — VM-resident stub handling GPU lease state and telemetry hooks; never touches hardware.
  • cohsh — Host-only CLI and canonical shell for the hive; GUI tooling is expected to speak the same protocol.
  • gpu-bridge-host — Host-side process that discovers or mocks GPUs, enforces leases, and publishes /gpu/<id>/, /gpu/models/*, and /gpu/telemetry/schema.json into the VM.
  • host-sidecar-bridge — Host-side publisher for /host providers (systemd, k8s, docker, nvidia) using existing Secure9P semantics and manifest-backed polling defaults.
  • secure9p-codec / secure9p-core / secure9p-transport — Secure9P codec, core policy hooks, and transport adapters for host tools.
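
To illustrate the "bounded framing" the secure9p crates refer to, here is a sketch in the style of 9P's size-prefixed messages (MAX_FRAME is an illustrative limit, not the codec's actual negotiated msize):

  // Sketch only: 9P-style framing -- size[4] little-endian, counting the
  // prefix itself -- with the bound checked before any allocation.
  use std::io::{self, Read};

  const MAX_FRAME: u32 = 64 * 1024; // illustrative limit

  fn read_frame(r: &mut impl Read) -> io::Result<Vec<u8>> {
      let mut size = [0u8; 4];
      r.read_exact(&mut size)?;
      let size = u32::from_le_bytes(size);
      // Reject out-of-bounds frames before allocating, so a hostile peer
      // cannot force unbounded buffering.
      if size < 4 || size > MAX_FRAME {
          return Err(io::Error::new(io::ErrorKind::InvalidData, "frame size out of bounds"));
      }
      let mut body = vec![0u8; (size - 4) as usize];
      r.read_exact(&mut body)?;
      Ok(body)
  }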

SwarmUI is the host-side desktop UI for Cohesix. It renders Live Hive telemetry and replays (including bounded per-worker text overlays and a detail panel), and it reuses the same console/Secure9P transports and core verbs as cohsh via an embedded console panel.

Figure 2: SwarmUI replay screenshot (Live Hive telemetry visualization)

Option A: Run a pre-built release (fastest)

Pre-built bundles are available in releases/. Each bundle includes its own QUICKSTART.md.

  1. Extract the bundle for your OS (*-MacOS or *-linux).
  2. Install runtime dependencies (QEMU + SwarmUI libs):
    ./scripts/setup_environment.sh
  3. Terminal 1: boot the VM (see the bundle's QUICKSTART.md for the boot command).
  4. Terminal 2: connect with cohsh:
    ./bin/cohsh --transport tcp --tcp-host 127.0.0.1 --tcp-port 31337 --role queen

    For non-local use, tunnel this TCP console over a VPN/overlay (no TLS inside the VM).
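
    For example (assuming SSH access to the VM host), a plain SSH local forward works, after which cohsh connects to 127.0.0.1:31337 as above:
    ssh -L 31337:127.0.0.1:31337 user@vm-host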

  5. If you plan to run non-mock PEFT flows, publish the live GPU registry so /gpu/models is visible:
    ./bin/gpu-bridge-host --publish --tcp-host 127.0.0.1 --tcp-port 31337 --auth-token changeme
  6. Optional UI (Mac or Linux desktop):
    ./bin/swarmui

    Headless Linux: xvfb-run -a ./bin/swarmui


Option B: Build from source (macOS or Linux)

You need QEMU, Rust, Python 3, and an external seL4 build that produces elfloader and kernel.elf.

macOS 26 (Apple Silicon)

./toolchain/setup_macos_arm64.sh
source "$HOME/.cargo/env"

Linux (Ubuntu 24 recommended)

sudo apt-get update
sudo apt-get install -y git cmake ninja-build clang llvm lld python3 python3-pip qemu-system-aarch64
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable
source "$HOME/.cargo/env"

If you’re on another Linux distro, install the same dependencies with your package manager (QEMU + build essentials + Rust).

Build and run (QEMU + TCP console)

  1. Build seL4 externally (upstream) for aarch64 + qemu_arm_virt. Place the build at $HOME/seL4/build or pass --sel4-build below.
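    As a rough illustration only (the exact steps depend on which upstream manifest you follow; consult the seL4 build documentation), an aarch64 project build has this shape:
    mkdir build && cd build
    ../init-build.sh -DPLATFORM=qemu-arm-virt -DAARCH64=1
    ninja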
  2. Terminal 1: build and boot:
    SEL4_BUILD_DIR=$HOME/seL4/build ./scripts/cohesix-build-run.sh \
      --sel4-build "$HOME/seL4/build" \
      --out-dir out/cohesix \
      --profile release \
      --root-task-features cohesix-dev \
      --cargo-target aarch64-unknown-none \
      --transport tcp
  3. Terminal 2: connect with cohsh:
    cd out/cohesix/host-tools
    ./cohsh --transport tcp --tcp-port 31337 --role queen

See the project documentation for detailed design, interfaces, and milestone tracking.



