core / hypothesis.py — HypothesisTester

Parallel-debugging engine: generate competing hypotheses, test them concurrently, ask the LLM to rank survivors.

Why

Serial debugging burns 6-8 turns hopping between guesses. Parallelising the test phase collapses the same exploration into 2-3 turns: the worker pool runs every candidate test simultaneously, heuristic markers eliminate the obvious losers, and a single LLM call ranks what is left.

Data structures

Type | Fields
Hypothesis (line 21) | description · test_action · test_tool · result · consistent · confidence
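
As a rough illustration, the record could be modelled as a dataclass. Only the six field names come from the list above; the types, the defaults, and the 0.0-1.0 confidence scale are assumptions, not the module's actual definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """One candidate explanation plus the test that would confirm or refute it."""
    description: str                    # what might be wrong
    test_action: str                    # command or query to run as the test
    test_tool: str                      # tool that runs the action (assumed to be a name)
    result: Optional[str] = None        # raw test output, filled in after the test runs
    consistent: Optional[bool] = None   # None = untested (assumed default)
    confidence: float = 0.0             # assumed 0.0-1.0 scale


# Example values are made up for illustration.
h = Hypothesis(
    description="Config file is missing from the working directory",
    test_action="ls config.yaml",
    test_tool="shell",
)
```

A freshly generated hypothesis carries only the first three fields; result, consistent, and confidence would be filled in by the later phases.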

Methods

Method | Purpose
__init__(llm_client, max_candidates=3, max_depth=2) | Bind LLM client and limits.
async generate_hypotheses(problem, context, error_output) -> List[Hypothesis] | LLM produces 3-5 candidates, each with a description and a recommended test (line 88).
async test_hypotheses_parallel(hypotheses, executor) -> List[Hypothesis] | Run tests in parallel; set consistent=True when no error markers are detected (line 128).
async evaluate_results(problem, hypotheses) -> dict | LLM ranks survivors; returns the most likely root cause plus remediation (line 167).
async backtrack() -> Optional[Hypothesis] | Pop the next-best alternative when the current pick fails (line 162).
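
The backtracking step can be pictured as popping from a list of ranked survivors. The sketch below is a guess at the idea, not the module's code: BacktrackQueue is a hypothetical helper, and (description, confidence) tuples stand in for full Hypothesis objects.

```python
import asyncio
from typing import List, Optional, Tuple

# (description, confidence) pairs stand in for full Hypothesis objects.
Candidate = Tuple[str, float]


class BacktrackQueue:
    """Hypothetical helper: holds ranked survivors and hands them out best-first."""

    def __init__(self, survivors: List[Candidate]) -> None:
        # Sort ascending so list.pop() removes the highest-confidence entry.
        self._remaining = sorted(survivors, key=lambda c: c[1])

    async def backtrack(self) -> Optional[Candidate]:
        """Return the next-best candidate, or None when none are left."""
        return self._remaining.pop() if self._remaining else None


async def demo() -> List[Optional[Candidate]]:
    queue = BacktrackQueue([("stale cache", 0.4), ("bad import path", 0.7)])
    # Ask three times: two candidates, then an empty queue.
    return [await queue.backtrack() for _ in range(3)]


picks = asyncio.run(demo())
```

Returning None once the alternatives run out matches the Optional[Hypothesis] return type and gives the caller a clear signal to generate fresh hypotheses or stop.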

Heuristic consistency check

If the test result contains none of the markers error · exception · traceback · not found, the hypothesis is provisionally marked consistent and forwarded to LLM evaluation. Hypotheses whose output does contain a marker are filtered out first, so the ranking call is not spent on candidates that are clearly wrong.
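
A minimal sketch of such a check, assuming plain substring matching on lower-cased output. The marker list comes from the text above; the function name and the case-insensitive comparison are assumptions.

```python
# Markers taken from the description above.
ERROR_MARKERS = ("error", "exception", "traceback", "not found")


def is_consistent(result: str) -> bool:
    """Return True when the output contains none of the error markers."""
    lowered = result.lower()  # case-insensitive matching is an assumption
    return not any(marker in lowered for marker in ERROR_MARKERS)
```

Substring matching is cheap but coarse: an output such as "0 errors" still contains "error" and would be rejected, so a check of this kind trades some false negatives for speed.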

Concurrency

Fully async: test_hypotheses_parallel uses asyncio.gather to run the tests concurrently. The executor argument lets callers inject a sandbox-aware test runner.
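
The fan-out pattern can be sketched as follows. The executor's call shape (tool, action) -> output is hypothetical, since the real interface is not specified here, and fake_executor is a stand-in that runs nothing.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Hypothetical executor shape: takes (tool, action), returns the test output.
Executor = Callable[[str, str], Awaitable[str]]


async def run_tests(tests: List[Tuple[str, str]], executor: Executor) -> List[str]:
    """Run every (tool, action) pair concurrently; outputs keep input order."""

    async def run_one(tool: str, action: str) -> str:
        try:
            return await executor(tool, action)
        except Exception as exc:
            # Convert a failure into output text so gather still returns
            # a result for every test.
            return f"exception: {exc}"

    return await asyncio.gather(*(run_one(tool, action) for tool, action in tests))


async def fake_executor(tool: str, action: str) -> str:
    """Stand-in for a sandbox-aware runner; sleeps instead of running anything."""
    await asyncio.sleep(0.01)
    if action == "boom":
        raise RuntimeError("sandbox refused")
    return f"{tool}: ran {action}"


outputs = asyncio.run(run_tests([("shell", "ls"), ("shell", "boom")], fake_executor))
```

Catching exceptions per test is one design choice; passing return_exceptions=True to asyncio.gather is an alternative that returns the exception objects themselves instead of text.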