core / hypothesis.py — HypothesisTester
Parallel-debugging engine: generate competing hypotheses, test them concurrently, ask the LLM to rank survivors.
Why
Serial debugging burns 6-8 turns hopping between guesses. Parallelising the test phase collapses the same exploration into 2-3 turns: the worker pool runs every candidate test simultaneously, heuristic markers eliminate the obvious losers, and a single LLM call ranks what's left.
Data structures
| Type | Fields |
|---|---|
| Hypothesis (line 21) | description · test_action · test_tool · result · consistent · confidence |
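A minimal sketch of how the Hypothesis record could be declared. Only the six field names come from the table above; the use of a dataclass, the field types, and the defaults are assumptions for illustration, not the actual definition at line 21.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    # Field names follow the table; types and defaults are assumed.
    description: str                    # what might be wrong
    test_action: str                    # the check to run for this hypothesis
    test_tool: str                      # name of the tool that runs the check
    result: Optional[str] = None        # raw test output, filled in after the test
    consistent: Optional[bool] = None   # heuristic verdict, filled in after the test
    confidence: float = 0.0             # ranking score; the scale is an assumption
```

Leaving result and consistent as None until the test has run makes an untested hypothesis distinguishable from one that was tested and rejected.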
Methods
| Method | Purpose |
|---|---|
| __init__(llm_client, max_candidates=3, max_depth=2) | Bind LLM client and limits. |
| async generate_hypotheses(problem, context, error_output) -> List[Hypothesis] | LLM produces 3-5 candidates with description and recommended test (line 88). |
| async test_hypotheses_parallel(hypotheses, executor) -> List[Hypothesis] | Run tests in parallel, set consistent=True when no error markers detected (line 128). |
| async evaluate_results(problem, hypotheses) -> dict | LLM ranks survivors; returns most likely root cause + remediation (line 167). |
| async backtrack() -> Optional[Hypothesis] | Pop next-best alternative when current pick fails (line 162). |
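The call order implied by the table can be sketched as a small driver. The three method names and their arguments come from the table; debug_once, StubTester, and the shape of the returned dict are hypothetical stand-ins so the example runs without an LLM client.

```python
import asyncio
from types import SimpleNamespace


async def debug_once(tester, problem, context, error_output, executor) -> dict:
    """Hypothetical driver: generate, test in parallel, then rank."""
    hypotheses = await tester.generate_hypotheses(problem, context, error_output)
    tested = await tester.test_hypotheses_parallel(hypotheses, executor)
    # Whether evaluate_results filters survivors itself is not stated in the
    # source; here the full tested list is passed through unchanged.
    return await tester.evaluate_results(problem, tested)


class StubTester:
    """Stand-in for HypothesisTester; returns canned data, no LLM involved."""

    async def generate_hypotheses(self, problem, context, error_output):
        return [SimpleNamespace(description="h1", consistent=None),
                SimpleNamespace(description="h2", consistent=None)]

    async def test_hypotheses_parallel(self, hypotheses, executor):
        # Pretend the first test came back clean and the second did not.
        for h, verdict in zip(hypotheses, (True, False)):
            h.consistent = verdict
        return hypotheses

    async def evaluate_results(self, problem, hypotheses):
        # The dict key is an assumption; the real return shape is not documented here.
        return {"survivors": [h.description for h in hypotheses if h.consistent]}


def demo() -> dict:
    return asyncio.run(debug_once(StubTester(), "crash on start", "", "", executor=None))
```

If the top-ranked hypothesis later proves wrong, the caller would presumably invoke backtrack() to get the next-best alternative rather than rerunning the whole pipeline.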
Heuristic consistency check
If the test result contains none of the markers error · exception · traceback · not found, the hypothesis is provisionally marked consistent and forwarded to LLM evaluation. A result that contains any marker fails the check, which keeps clearly wrong hypotheses out of the LLM ranking step.
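The marker check can be sketched as a plain substring test. The four markers are the ones listed above; the function name is_consistent and the case-insensitive matching are assumptions, since the source does not say how the comparison is done.

```python
ERROR_MARKERS = ("error", "exception", "traceback", "not found")


def is_consistent(result: str) -> bool:
    """Return True when the test output contains none of the error markers."""
    lowered = result.lower()  # case-insensitive matching is an assumption
    return not any(marker in lowered for marker in ERROR_MARKERS)
```

A substring check like this is cheap but coarse: output such as "0 errors" would still trip the error marker, so under this sketch a passing test can be rejected by the heuristic.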
Concurrency
Fully async; asyncio.gather for parallel tests. The executor argument allows callers to inject a sandbox-aware test runner.
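A minimal sketch of the asyncio.gather pattern with an injected executor. The executor signature (tool, action) -> str, the helper names run_tests_parallel and fake_executor, and the trimmed-down Hypothesis record are assumptions for illustration; only the use of asyncio.gather and the injectable executor come from the text above.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional

ERROR_MARKERS = ("error", "exception", "traceback", "not found")

# Assumed executor shape: takes a tool name and an action, returns the output.
Executor = Callable[[str, str], Awaitable[str]]


@dataclass
class Hypothesis:
    description: str
    test_action: str
    test_tool: str
    result: Optional[str] = None
    consistent: Optional[bool] = None


async def run_tests_parallel(hypotheses: List[Hypothesis],
                             executor: Executor) -> List[Hypothesis]:
    """Stand-in for test_hypotheses_parallel: run every test concurrently."""

    async def run_one(h: Hypothesis) -> Hypothesis:
        h.result = await executor(h.test_tool, h.test_action)
        h.consistent = not any(m in h.result.lower() for m in ERROR_MARKERS)
        return h

    # gather schedules all coroutines at once and preserves input order.
    return list(await asyncio.gather(*(run_one(h) for h in hypotheses)))


async def fake_executor(tool: str, action: str) -> str:
    """Hypothetical sandbox runner that fails any action containing 'bad'."""
    await asyncio.sleep(0)  # yield control, as a real runner would on I/O
    return "Traceback: boom" if "bad" in action else "ok"


def demo() -> List[Hypothesis]:
    hs = [Hypothesis("config is fine", "check good.cfg", "shell"),
          Hypothesis("config is broken", "check bad.cfg", "shell")]
    return asyncio.run(run_tests_parallel(hs, fake_executor))
```

Because the executor is passed in rather than hard-coded, a caller can swap fake_executor for a sandboxed runner without touching the gather logic.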