TL;DR: We built BranchContext, an OS-level copy-on-write branching framework for AI agents. When vLLM generates N candidates via n=N, BranchContext can instantly fork the workspace into N isolated branches, test each candidate, and commit the winner. No git checkouts, no temp dirs, no cleanup. Curious if the community sees integration opportunities.
The problem
vLLM’s n= parameter is the cheapest way to get diverse candidates (prompt tokens are charged once). But once you have N candidates, say N code patches from DeepSeek-R1, you need to test each one against the real world: apply the patch, run the tests, check the output. Today that means either running sequential apply/test/revert cycles or managing N temp directories by hand.
What BranchContext does
BranchContext (paper) provides instant copy-on-write filesystem branches via FUSE. Each branch is process-isolated with optional cgroup v2 resource limits. The agent patterns sit on top:
```python
from branching import Speculate, BestOfN, Workspace

# One candidate per vLLM sample; each runs in its own CoW branch.
candidates = [make_test(c.text) for c in vllm_output.outputs]

# Commit the first candidate whose tests pass:
outcome = Speculate(candidates, first_wins=True)(workspace)
# OR run every candidate and commit the highest-scoring one:
outcome = BestOfN(candidates, scores=logprob_scores)(workspace)
```
No candidate can corrupt the workspace or starve the system. Winner commits atomically; losers are discarded for free.
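To illustrate the commit semantics, here is a toy, in-memory model of the fork/test/commit pattern: fork the workspace per candidate, run candidates in parallel against their own forks, commit the first branch that passes, and simply drop the rest. This is pure illustration of the semantics under a dict-as-workspace assumption; real BranchContext forks at the filesystem level with process isolation.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

def speculate_first_wins(workspace: dict, candidates, check) -> bool:
    """Toy model: candidates mutate private forks; first passing fork wins."""
    def try_one(cand):
        branch = copy.deepcopy(workspace)   # stand-in for a CoW fork
        cand(branch)                        # candidate mutates only its branch
        return branch if check(branch) else None

    with ThreadPoolExecutor() as pool:
        # pool.map preserves candidate order, so "first" is deterministic.
        for branch in pool.map(try_one, candidates):
            if branch is not None:
                workspace.clear()
                workspace.update(branch)    # commit the winning branch
                return True                 # losing forks are just dropped
    return False                            # no candidate passed; workspace untouched
```

The point of the model: the shared `workspace` is only ever written once, by the committed winner, so a misbehaving candidate can trash its own fork but nothing else.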
Potential integration points
- Multi-sample → branch → verify pipeline. The most straightforward use case: vLLM generates N outputs, BranchContext tests each in isolation. Works today with no changes on either side; only glue code is needed. But a tighter integration (e.g., a vLLM post-processing hook) could make this seamless.
- Speculative decoding ↔ speculative execution. There’s an interesting parallel: vLLM’s speculative decoding is “draft model proposes tokens, target model verifies” at the token level, while BranchContext’s Speculate is “agent proposes actions, tests verify” at the execution level. Both are speculate-then-verify patterns at different layers. Is there a useful abstraction that connects them?
- best_of with real-world scoring. vLLM’s best_of parameter already generates K sequences and returns the top N by log-probability. BranchContext extends that to real-world scoring: generate K, execute each in a CoW branch, and score by actual test results. Could this be a natural extension: a serving mode where candidates are validated before being returned to the client?
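As a sketch of what that serving mode's selection logic could look like, here is a small framework-free ranking function: prefer candidates that pass a real-world verifier, and fall back to log-probability as the tie-breaker. The function and its arguments are hypothetical, not vLLM or BranchContext API; in the integrated version, `verify` would be "run this candidate in a CoW branch and return whether its tests passed."

```python
def rank_candidates(candidates: list[str],
                    logprob_scores: list[float],
                    verify) -> list[int]:
    """Return candidate indices, best first: verified candidates outrank
    unverified ones; within each group, higher log-probability wins."""
    scored = [
        (verify(text), lp, i)
        for i, (text, lp) in enumerate(zip(candidates, logprob_scores))
    ]
    # Sort by (passed-verification, logprob), descending.
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [i for _, _, i in scored]
```

With this shape, a hypothetical validated serving mode degrades gracefully: if nothing verifies, the ordering reduces to vLLM's existing log-probability ranking.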