BranchContext: CoW filesystem isolation for multi-sample vLLM workflows

TL;DR: We built BranchContext, an OS-level copy-on-write branching framework for AI agents. When vLLM generates N candidates via n=N, BranchContext can instantly fork the workspace into N isolated branches, test each candidate, and commit the winner. No git checkouts, no temp dirs, no cleanup. Curious if the community sees integration opportunities.


The problem

vLLM’s n= parameter is the cheapest way to get diverse candidates, since prompt tokens are charged once. But once you have N candidates (say, N code patches from DeepSeek-R1), you need to test each one against the real world: apply the patch, run the tests, check the output. Today that means either sequential apply/test/revert cycles or manually managing N temp directories.

What BranchContext does

BranchContext (paper) provides instant copy-on-write filesystem branches via FUSE. Each branch is process-isolated with optional cgroup v2 resource limits. The agent patterns sit on top:

from branching import Speculate, BestOfN, Workspace

# vllm_output is a vLLM RequestOutput; each CompletionOutput in .outputs
# carries one sampled candidate in .text. make_test wraps a candidate
# into a testable action.
candidates = [make_test(c.text) for c in vllm_output.outputs]

# Run candidates in isolated CoW branches, committing the first that passes...
outcome = Speculate(candidates, first_wins=True)(workspace)
# ...OR run all of them and commit the highest-scoring branch.
outcome = BestOfN(candidates, scores=logprob_scores)(workspace)

No candidate can corrupt the workspace or starve the system. Winner commits atomically; losers are discarded for free.

Potential integration points

  1. Multi-sample → branch → verify pipeline. The most straightforward use case. vLLM generates N outputs, BranchContext tests each in isolation. Works today with no changes on either side, just glue code. But a tighter integration (e.g., a vLLM post-processing hook) could make this seamless.

  2. Speculative decoding ↔ speculative execution. There’s an interesting parallel: vLLM’s speculative decoding is “draft model proposes tokens, target model verifies” at the token level. BranchContext’s Speculate is “agent proposes actions, tests verify” at the execution level. Both are speculate-then-verify patterns at different layers. Is there a useful abstraction that connects them?

  3. best_of with real-world scoring. vLLM’s best_of parameter already generates K sequences and returns the top N by log-probability. BranchContext extends that to real-world scoring: generate K, execute each in a CoW branch, and score by actual test results. Could this be a natural extension, i.e., a serving mode where candidates are validated before being returned to the client?
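To make point 3 concrete, here is a framework-free sketch of the selection rule. The (text, cumulative_logprob) pairing mirrors vLLM’s CompletionOutput fields; run_in_branch is an assumed callback, not a real API, standing in for executing one candidate in an isolated branch:

```python
def best_of_real_world(candidates, run_in_branch):
    """Pick a candidate by real-world score, falling back to
    log-probability only as a tie-break.

    candidates:    list of (text, cumulative_logprob) pairs
    run_in_branch: runs one candidate in isolation and returns a
                   numeric score (e.g. fraction of tests passed)
    """
    scored = [
        (run_in_branch(text), logprob, i)
        for i, (text, logprob) in enumerate(candidates)
    ]
    # Tuples compare lexicographically: real score first, then logprob.
    best_score, best_logprob, best_i = max(scored)
    return candidates[best_i][0]
```

The same shape works whether run_in_branch shells out to N temp dirs or dispatches into CoW branches; only the per-candidate cost changes.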

