core / mcts.py — MCTSReasoner

Monte Carlo Tree Search for action selection with cached alternatives and backtracking.

Pipeline

Expand: ask the LLM for N (max_candidates, default 3) distinct next-action candidates, each with a description, a tool, and a risk note.
Simulate: for each candidate, ask a worker LLM to predict the outcome and rate progress, cost, and risk, each in [0, 1].
Score: compute a combined score, 0.6 · progress + 0.15 · (1 - cost) + 0.25 · (1 - risk), so higher progress and lower cost and risk rank higher.
Select: pick the highest-scoring candidate and push its siblings onto a backtrack stack.
Backtrack: when the chosen action fails, backtrack() pops the next-best candidate from the stack.
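
The Score, Select, and Backtrack steps above can be sketched as a few pure functions. This is a minimal illustration rather than the module's code: `Candidate`, `combined_score`, and `select` are hypothetical names, and ordering the stack worst-to-best is an assumption. Only the weights come from the formula above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for a simulated action candidate."""
    description: str
    progress: float  # each rating is expected to lie in [0, 1]
    cost: float
    risk: float


def combined_score(progress: float, cost: float, risk: float) -> float:
    """Weighted score from the Score step; the three weights sum to 1."""
    return 0.6 * progress + 0.15 * (1 - cost) + 0.25 * (1 - risk)


def select(candidates: list[Candidate]) -> tuple[Candidate, list[Candidate]]:
    """Return the best candidate plus a backtrack stack of its siblings.

    The stack is ordered worst-to-best (an assumption), so list.pop()
    yields the next-best alternative first.
    """
    ranked = sorted(
        candidates,
        key=lambda c: combined_score(c.progress, c.cost, c.risk),
    )
    best = ranked.pop()
    return best, ranked


candidates = [
    Candidate("run tests", progress=0.8, cost=0.2, risk=0.1),
    Candidate("rewrite module", progress=0.9, cost=0.8, risk=0.6),
    Candidate("read docs", progress=0.4, cost=0.1, risk=0.0),
]
best, stack = select(candidates)  # picks "run tests"
next_best = stack.pop()           # what a backtrack would return next
```

Note that "rewrite module" has the highest progress rating but loses to "run tests" once its cost and risk are factored in, which is the point of the weighted combination.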

| Type | Fields |
| --- | --- |
| `ActionCandidate` (line 31) | description, tool_name, tool_args, simulated_outcome, score, risk_notes, selected |
| `MCTSNode` (line 52) | action, depth, children, visits, total_score, `avg_score` property |
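
For orientation, the two types could look roughly like the dataclasses below. The field names come from the table; the field types, the defaults, and the definition of `avg_score` as mean score per visit are assumptions, not taken from the source file.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ActionCandidate:
    description: str
    tool_name: str
    tool_args: dict[str, Any] = field(default_factory=dict)  # assumed type
    simulated_outcome: str = ""  # assumed type
    score: float = 0.0
    risk_notes: str = ""  # assumed type
    selected: bool = False


@dataclass
class MCTSNode:
    action: Optional[ActionCandidate] = None  # assumed: None for the root node
    depth: int = 0
    children: list["MCTSNode"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0

    @property
    def avg_score(self) -> float:
        # Assumed definition: mean score per visit, 0.0 before any visit.
        return self.total_score / self.visits if self.visits else 0.0


node = MCTSNode(action=ActionCandidate("run tests", "shell"), depth=1)
node.visits += 2
node.total_score += 1.5
```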

MCTSReasoner(llm_client, max_candidates=3, max_depth=2)

| Method | Purpose |
| --- | --- |
| `async select_best_action(task, plan_state, available_tools, context)` | Run the full Expand → Simulate → Score → Select pipeline (line 127). |
| `async backtrack()` | Pop the next-best alternative from the stack (line 162). |
| `has_alternatives() -> bool` | Whether the stack still has candidates (line 181). |
| `clear()` | Reset (line 185). |
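
A caller might combine these methods in a try-then-fall-back loop like the one sketched here. `StubReasoner` is a hypothetical stand-in that only mirrors the documented method names, and `run_step` and `execute` are illustrative helpers. The return values, the `None` result from an empty stack, and calling `clear()` after a success are all assumptions about usage, not documented behavior.

```python
import asyncio


class StubReasoner:
    """Hypothetical stand-in exposing the four documented method names."""

    def __init__(self, ranked_actions):
        # Assumed: actions arrive already ranked best-first.
        self._ranked = list(ranked_actions)
        self._stack = []

    async def select_best_action(self, task, plan_state, available_tools, context):
        best, *rest = self._ranked
        self._stack = list(reversed(rest))  # next-best ends up on top
        return best

    async def backtrack(self):
        return self._stack.pop() if self._stack else None

    def has_alternatives(self) -> bool:
        return bool(self._stack)

    def clear(self) -> None:
        self._stack.clear()


async def run_step(reasoner, execute, task):
    """Try the best action, then fall back to cached alternatives on failure."""
    action = await reasoner.select_best_action(
        task, plan_state={}, available_tools=[], context=""
    )
    while action is not None:
        if await execute(action):
            reasoner.clear()  # assumed: drop stale alternatives after success
            return action
        action = await reasoner.backtrack() if reasoner.has_alternatives() else None
    return None


async def execute(action):
    # Hypothetical executor: only "read docs" succeeds in this demo.
    return action == "read docs"


reasoner = StubReasoner(["run tests", "read docs", "grep logs"])
result = asyncio.run(run_step(reasoner, execute, "fix bug"))
```

Here the first choice fails, so the loop backtracks once and succeeds with the cached sibling without asking the LLM for new candidates.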

Figure 6 — MCTS expand / score / select with cached siblings.

The reasoner is fully async: asyncio.gather runs the per-candidate simulation calls concurrently. When a worker pool is available, it is preferred for simulation because workers are cheaper and bring a different perspective.
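
The concurrent simulation pattern can be sketched as follows. `simulate_all`, `simulate`, `FakeClient`, and the `rate` coroutine are hypothetical names for illustration, and falling back to the main client when no worker pool is given is an assumption. Only the use of `asyncio.gather` is taken from the description above.

```python
import asyncio


async def simulate(client, candidate: str) -> dict:
    """Hypothetical simulation call; `client.rate` is an assumed coroutine."""
    return await client.rate(candidate)


async def simulate_all(candidates, llm_client, worker_pool=None):
    # Prefer the worker pool when one is supplied, otherwise use the main client.
    client = worker_pool if worker_pool is not None else llm_client
    # gather runs every simulation concurrently and preserves input order.
    return await asyncio.gather(*(simulate(client, c) for c in candidates))


class FakeClient:
    """Stand-in client that returns fixed ratings after a short delay."""

    def __init__(self, name: str):
        self.name = name

    async def rate(self, candidate: str) -> dict:
        await asyncio.sleep(0.01)  # stands in for network latency
        return {
            "candidate": candidate,
            "by": self.name,
            "progress": 0.5,
            "cost": 0.5,
            "risk": 0.5,
        }


results = asyncio.run(
    simulate_all(["a", "b", "c"], FakeClient("main"), FakeClient("worker"))
)
```

Because `asyncio.gather` returns results in the order of its arguments, each rating can be matched back to its candidate by position.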