core / mcts.py — MCTSReasoner
Monte Carlo Tree Search for action selection with cached alternatives and backtracking.
Pipeline
- Expand — ask the LLM for N (default 3) distinct next-action candidates with description + tool + risk note.
- Simulate — for each candidate ask a worker LLM to predict outcome and rate progress / cost / risk in [0, 1].
- Score — combined score: 0.6 · progress + 0.15 · (1 - cost) + 0.25 · (1 - risk).
- Select — pick the highest-scoring candidate; push its siblings onto a backtrack stack.
- Backtrack — when the chosen action fails, backtrack() pops the next-best candidate.
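
The Score and Select steps can be sketched as follows. This is a minimal illustration and not the module's actual code: the weights come from the formula in the pipeline, while the function names `combined_score` and `select_with_backtrack`, the candidate names, and the tuple layout `(progress, cost, risk)` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Weights taken from the scoring formula in the pipeline description.
W_PROGRESS, W_COST, W_RISK = 0.6, 0.15, 0.25


def combined_score(progress: float, cost: float, risk: float) -> float:
    """Weighted score; all three inputs are expected to lie in [0, 1]."""
    return W_PROGRESS * progress + W_COST * (1 - cost) + W_RISK * (1 - risk)


def select_with_backtrack(
    ratings: Dict[str, Tuple[float, float, float]],
) -> Tuple[str, List[str]]:
    """Return the best candidate and a stack holding the remaining siblings.

    The stack is sorted by ascending score, so list.pop() yields the
    next-best candidate first.
    """
    ranked = sorted(ratings, key=lambda name: combined_score(*ratings[name]))
    best = ranked.pop()  # highest-scoring candidate
    return best, ranked


# Hypothetical ratings: (progress, cost, risk) per candidate.
ratings = {
    "run_tests": (0.8, 0.2, 0.1),
    "rewrite_module": (0.9, 0.7, 0.6),
    "ask_user": (0.3, 0.1, 0.0),
}
best, stack = select_with_backtrack(ratings)
```

With these sample ratings, `run_tests` is selected even though `rewrite_module` has higher predicted progress, because its cost and risk penalties are smaller. Popping the stack would then offer `rewrite_module` as the first fallback.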
Data structures
| Type | Fields |
|---|---|
| ActionCandidate (line 31) | description, tool_name, tool_args, simulated_outcome, score, risk_notes, selected |
| MCTSNode (line 52) | action, depth, children, visits, total_score, avg_score property |
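
For orientation, the two types might look roughly like the dataclasses below. Only the field names come from the table; the type annotations, the default values, and the definition of `avg_score` as total score divided by visits are assumptions, and the real classes may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActionCandidate:
    description: str
    tool_name: str
    tool_args: Dict[str, Any] = field(default_factory=dict)  # assumed type
    simulated_outcome: str = ""   # filled in by the Simulate step
    score: float = 0.0            # filled in by the Score step
    risk_notes: str = ""
    selected: bool = False


@dataclass
class MCTSNode:
    action: Optional[ActionCandidate] = None  # None for a root node (assumed)
    depth: int = 0
    children: List["MCTSNode"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0

    @property
    def avg_score(self) -> float:
        # Assumed definition: mean score per visit, 0.0 before any visit.
        return self.total_score / self.visits if self.visits else 0.0
```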
Constructor
MCTSReasoner(llm_client, max_candidates=3, max_depth=2)
Public methods
| Method | Purpose |
|---|---|
| async select_best_action(task, plan_state, available_tools, context) | Run the full Generate → Simulate → Score → Select pipeline (line 127). |
| async backtrack() | Pop the next-best alternative from the stack (line 162). |
| has_alternatives() -> bool | Whether the stack still has candidates (line 181). |
| clear() | Reset (line 185). |
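
A caller would typically combine these methods into a try-then-fall-back loop. The sketch below uses a hypothetical `StubReasoner` that mimics the four public methods so the example runs on its own. The return types (plain strings, `None` when the stack is empty), the `run_step` helper, and the argument values are all assumptions, not the module's documented behaviour.

```python
import asyncio
from typing import Callable, List, Optional


class StubReasoner:
    """Hypothetical stand-in exposing the same four public methods."""

    def __init__(self, ranked_actions: List[str]):
        self._ranked = ranked_actions      # best first
        self._stack: List[str] = []

    async def select_best_action(self, task, plan_state, available_tools, context) -> str:
        best, *rest = self._ranked
        self._stack = list(reversed(rest))  # so pop() yields the next-best
        return best

    async def backtrack(self) -> Optional[str]:
        return self._stack.pop() if self._stack else None

    def has_alternatives(self) -> bool:
        return bool(self._stack)

    def clear(self) -> None:
        self._stack.clear()


async def run_step(reasoner, execute: Callable[[str], bool]) -> Optional[str]:
    """Try the best action, then fall back through the cached alternatives."""
    action = await reasoner.select_best_action(
        task="fix failing test", plan_state={}, available_tools=[], context=""
    )
    while action is not None:
        if execute(action):
            reasoner.clear()  # drop stale alternatives once a step succeeds
            return action
        action = await reasoner.backtrack() if reasoner.has_alternatives() else None
    return None


reasoner = StubReasoner(["patch_file", "run_tests", "ask_user"])
result = asyncio.run(run_step(reasoner, execute=lambda a: a == "run_tests"))
```

Here the first choice fails, so the loop backtracks once and succeeds on the second candidate without another round of LLM calls.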
Diagram
Figure 6 — MCTS expand / score / select with cached siblings.
Concurrency
Fully async: asyncio.gather runs the simulation calls concurrently. When a worker pool is available, simulations are routed to it in preference, since it is cheaper and offers a different perspective.
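
The fan-out pattern can be sketched as below. The `simulate` coroutine is a hypothetical placeholder for one worker-LLM call, and the fixed ratings it returns are invented for illustration. Only the use of `asyncio.gather` reflects the description above.

```python
import asyncio
from typing import Any, Dict, List


async def simulate(candidate: str) -> Dict[str, Any]:
    """Hypothetical stand-in for one worker-LLM simulation call."""
    await asyncio.sleep(0.01)  # placeholder for network latency
    return {"candidate": candidate, "progress": 0.5, "cost": 0.5, "risk": 0.5}


async def simulate_all(candidates: List[str]) -> List[Dict[str, Any]]:
    # gather() schedules all coroutines concurrently and returns their
    # results in the same order as the inputs.
    return await asyncio.gather(*(simulate(c) for c in candidates))


ratings = asyncio.run(simulate_all(["a", "b", "c"]))
```

Because the results come back in input order, each rating can be matched to its candidate by position, and total latency is close to that of the slowest single call.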