core / mcts.py — MCTSReasoner

Monte Carlo Tree Search for action selection with cached alternatives and backtracking.

Pipeline

  1. Expand — ask the LLM for N (default 3) distinct next-action candidates with description + tool + risk note.
  2. Simulate — for each candidate ask a worker LLM to predict outcome and rate progress / cost / risk in [0, 1].
  3. Score — combined score: 0.6 · progress + 0.15 · (1 - cost) + 0.25 · (1 - risk).
  4. Select — pick highest-scoring; push siblings onto a backtrack stack.
  5. Backtrack — when the chosen action fails, backtrack() pops the next-best candidate.

Data structures

TypeFields
ActionCandidate (line 31)description, tool_name, tool_args, simulated_outcome, score, risk_notes, selected
MCTSNode (line 52)action, depth, children, visits, total_score, avg_score property

Constructor

MCTSReasoner(llm_client, max_candidates=3, max_depth=2)

Public methods

MethodPurpose
async select_best_action(task, plan_state, available_tools, context)Run the full Generate → Simulate → Score → Select pipeline (line 127).
async backtrack()Pop the next-best alternative from the stack (line 162).
has_alternatives() -> boolWhether the stack still has candidates (line 181).
clear()Reset (line 185).

Diagram

current task candidate A score 0.82 candidate B score 0.61 candidate C score 0.34 SELECTED ✓ backtrack stack: [B, C] if A fails → MCTSReasoner.backtrack() pops B and reruns the agent loop with B

Figure 6 — MCTS expand / score / select with cached siblings.

Concurrency

Fully async; asyncio.gather drives parallel simulation calls. Worker pool is preferred (cheaper, different perspective) when available.