core / dream.py
Idle-time consolidation, synthetic challenge generation, and the full self-play orchestration pipeline.
Responsibilities
- Mine failed tasks from the journal into standalone, self-contained challenges via journal_challenges.
- Validate LLM-generated challenges before running them — quality gate, preflight, and validator self-test gate.
- Detect cross-episode tool patterns that should become composed skills.
- Orchestrate the self-play simulation loop: spawn an isolated temp agent, run the challenge, score with correctness_weighted_score, verify and persist any new lesson via SkillMemory.
Validation stack (three gates, in order)
1. validate_challenge_quality(setup, validator) → (bool, reason)
Pattern-based quality gate (line 102). Rejects:
- Validators that call random.seed, random.randint, random.uniform, random.choice or np.random: data-generation markers that would make scoring non-deterministic.
- Validators that depend on dynamic discovery (os.listdir, glob.glob, pathlib.Path) when the setup writes files with explicit names; they must reference at least one shared filename.
- SQL setup scripts where the CREATE TABLE column count doesn't match the INSERT value count, or CSV headers that don't match the row field count.
- The unwinnable split pattern: .strip().split('\n') combined with randomness and a len(act) != len(exp) check.
Rejection kinds are returned so the regeneration loop can route targeted feedback (files-mismatch → targeted "open file X" hint; data-gen → "don't call random.seed" hint; etc.). Up to 3 generation attempts; a files-mismatch rejection triggers the validator repair path which regenerates only the validator with a focused prompt.
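The first two rejection rules can be illustrated with a minimal sketch. It is not the real validate_challenge_quality: the function name, the marker tuples, and the regex used to find filenames written by the setup are assumptions made for illustration, and the reason prefixes only reuse the files-mismatch and data-gen kinds named above.

```python
import re
from typing import Tuple

# Marker lists copied from the rules above; the tuple names are hypothetical.
_RANDOM_MARKERS = ("random.seed", "random.randint", "random.uniform",
                   "random.choice", "np.random")
_DISCOVERY_MARKERS = ("os.listdir", "glob.glob", "pathlib.Path")


def sketch_quality_gate(setup: str, validator: str) -> Tuple[bool, str]:
    """Illustrative stand-in for the pattern gate; returns (ok, reason)."""
    for marker in _RANDOM_MARKERS:
        if marker in validator:
            return False, f"data-gen: validator calls {marker}"

    # Assumption: explicit filenames are detected as open("name", "w"/"a") calls.
    written = set(re.findall(r"""open\(\s*['"]([^'"]+)['"]\s*,\s*['"][wa]""", setup))
    if written and any(m in validator for m in _DISCOVERY_MARKERS):
        if not any(name in validator for name in written):
            return False, "files-mismatch: validator shares no filename with setup"

    return True, "ok"
```

Returning the kind as a prefix of the reason string is one simple way for a regeneration loop to route targeted feedback; the real code may carry the kind separately.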
2. Preflight (.preflight.py)
Runs the validator in the sandbox with __name__ == "__dry_run__" (bypasses any if __name__ == "__main__" guard). Catches module-scope NameError / ImportError / ModuleNotFoundError — unrunnable validators that used to burn a whole solver attempt.
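The dry-run idea can be sketched in-process with the standard library's runpy, which lets the caller choose the value of __name__. This is an assumption for illustration only: the real preflight runs inside the sandbox through .preflight.py, and the preflight helper and file names below are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path
from typing import Optional

# ModuleNotFoundError is a subclass of ImportError, so it is covered too.
_PREFLIGHT_ERRORS = (NameError, ImportError)


def preflight(validator_path: str) -> Optional[str]:
    """Run module-scope code only; return an error string, or None if it loads."""
    try:
        # __name__ is "__dry_run__", so any `if __name__ == "__main__"` body is skipped.
        runpy.run_path(validator_path, run_name="__dry_run__")
    except _PREFLIGHT_ERRORS as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_validator.py"
    good.write_text("import sys\nif __name__ == '__main__':\n    sys.exit(1)\n")
    bad = Path(tmp) / "bad_validator.py"
    bad.write_text("EXPECTED = compute_expected()\n")  # undefined name at module scope

    good_result = preflight(str(good))
    bad_result = preflight(str(bad))
```

The guarded sys.exit(1) in the good validator never runs, while the undefined name in the bad one surfaces before any solver attempt is spent.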
3. Validator self-test gate (.validator_selftest.py)
Catches internal-contradiction bugs like the canonical float('60.00%') trap, where a validator formats an expected field with a unit suffix and then calls float() on it. Flow:
- _instrument_validator_for_self_test(src): AST-find the first subprocess.run(...solution.py...) statement and prepend a probe that dumps the first-resolved expected_* variable between <<<__GHOST_SELFTEST_EXPECTED_START__>>> / <<<__GHOST_SELFTEST_EXPECTED_END__>>> sentinels, then raise SystemExit(42).
- _extract_selftest_dump(stdout): pull the dumped block out of the probe's stdout.
- Write a solution.py that sys.stdout.write(<dumped>); restore mocks from the post-setup snapshot.
- Run the original validator.
- _looks_like_validator_crash(out) returns True iff the traceback's innermost frame is .validator.py, in which case the challenge is rejected as unwinnable.
Candidate variable names scanned (in order): expected_output, expected_lines, expected, expected_text, expected_str, expected_result, golden_output, golden, correct_output, answer. The gate is best-effort: unparseable validators or those without a matching subprocess.run skip cleanly.
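A minimal sketch of the sentinel extraction and of why the float('60.00%') trap is unwinnable. The sentinel strings come from the description above; extract_dump and validator_compare are hypothetical stand-ins, and the real _extract_selftest_dump may treat whitespace differently.

```python
from typing import Optional

START = "<<<__GHOST_SELFTEST_EXPECTED_START__>>>"
END = "<<<__GHOST_SELFTEST_EXPECTED_END__>>>"


def extract_dump(stdout: str) -> Optional[str]:
    """Return the text between the sentinels, or None if either one is missing."""
    start = stdout.find(START)
    if start == -1:
        return None
    start += len(START)
    end = stdout.find(END, start)
    if end == -1:
        return None
    return stdout[start:end].strip("\n")


def validator_compare(expected: str, actual: str) -> bool:
    """Deliberately broken comparison: the expected field carries a '%' suffix."""
    return abs(float(expected) - float(actual)) < 1e-6


probe_stdout = f"noise\n{START}\n60.00%\n{END}\n"
dumped = extract_dump(probe_stdout)

# Feed the validator its own expected output. If even that crashes,
# no solver output can ever pass, so the challenge is unwinnable.
try:
    validator_compare("60.00%", dumped)
    unwinnable = False
except ValueError:
    unwinnable = True
```

Here the dumped value is exactly what the validator expects, yet float() still raises, which is the internal contradiction the gate is meant to catch.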
Runtime crash detector (widened)
Even past the three gates, a validator can raise during comparison. The attempt-loop circuit breaker classifies a traceback as a validator crash when the tail frame is .validator.py, solution.py is absent from the feedback, and the exception type is one of:
- Structural: SyntaxError, IndentationError, ImportError, ModuleNotFoundError, NameError
- Internal-contradiction: ValueError, TypeError, KeyError, IndexError, AttributeError
Detection aborts the cycle after attempt 1 instead of burning all 3 on the same broken validator.
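The three conditions can be sketched as a small classifier. The function name, the frame regex, and the way the exception type is read from the last traceback line are assumptions for illustration; only the exception list and the three conditions come from the text above.

```python
import re

_VALIDATOR_CRASH_TYPES = {
    # Structural
    "SyntaxError", "IndentationError", "ImportError", "ModuleNotFoundError", "NameError",
    # Internal-contradiction
    "ValueError", "TypeError", "KeyError", "IndexError", "AttributeError",
}

# Matches the 'File "<path>", line N' lines of a standard CPython traceback.
_FRAME_RE = re.compile(r'^\s*File "([^"]+)", line \d+', re.MULTILINE)


def is_validator_crash(feedback: str) -> bool:
    """True when the tail frame is .validator.py, solution.py is absent, and the type matches."""
    if "solution.py" in feedback:
        return False
    frames = _FRAME_RE.findall(feedback)
    if not frames or not frames[-1].endswith(".validator.py"):
        return False
    last_line = feedback.strip().splitlines()[-1]
    exc_type = last_line.split(":", 1)[0].strip()
    return exc_type in _VALIDATOR_CRASH_TYPES


validator_tb = (
    "Traceback (most recent call last):\n"
    '  File "/sandbox/.validator.py", line 12, in <module>\n'
    "    rate = float(expected_rate)\n"
    "ValueError: could not convert string to float: '60.00%'\n"
)
solver_tb = (
    "Traceback (most recent call last):\n"
    '  File "/sandbox/solution.py", line 3, in <module>\n'
    "    print(rows[5])\n"
    "IndexError: list index out of range\n"
)
```

With these inputs, is_validator_crash(validator_tb) is True and is_validator_crash(solver_tb) is False, so only the first would abort the cycle after attempt 1.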
Lesson pipeline
_extract_structured_lesson
Meta-cognitive LLM call that returns {trigger, anti_pattern, correct_pattern, domains, confidence, task/mistake/solution (legacy mirrors)}. The prompt explicitly requires task-class triggers, forbids copying fixture literals, and mandates a non-empty taxonomy domain set.
_generalization_guard
Last line of defence against overfit lessons. Uses n-gram token overlap (_GENERALIZATION_MIN_NGRAM = 6) to reject lessons whose:
- trigger copies a 6-token run from the challenge text
- correct_pattern copies a 6-token run from setup_script or the validator
- domains is empty or contains nothing from _VALID_LESSON_DOMAINS ({data_analysis, regex_parse, sql, concurrency, algo, bash, python_general})
- trigger or correct_pattern is empty
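The overlap check can be sketched as follows. The tokeniser (lowercased word tokens), the helper names, and the reason strings are assumptions; the n-gram length and the domain set are taken from the text above.

```python
import re
from typing import Dict, List, Set, Tuple

MIN_NGRAM = 6  # mirrors _GENERALIZATION_MIN_NGRAM
VALID_DOMAINS = {"data_analysis", "regex_parse", "sql", "concurrency",
                 "algo", "bash", "python_general"}


def _tokens(text: str) -> List[str]:
    # Assumption: lowercase word tokens; the real tokeniser may differ.
    return re.findall(r"\w+", text.lower())


def _ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copies_run(candidate: str, source: str, n: int = MIN_NGRAM) -> bool:
    """True if candidate shares at least one n-token run with source."""
    return bool(_ngrams(_tokens(candidate), n) & _ngrams(_tokens(source), n))


def sketch_guard(lesson: Dict, challenge: str, setup_script: str,
                 validator: str) -> Tuple[bool, str]:
    """Illustrative stand-in for the guard; returns (ok, reason)."""
    trigger = (lesson.get("trigger") or "").strip()
    pattern = (lesson.get("correct_pattern") or "").strip()
    if not trigger or not pattern:
        return False, "empty trigger or correct_pattern"
    if not set(lesson.get("domains") or []) & VALID_DOMAINS:
        return False, "no valid domain"
    if copies_run(trigger, challenge):
        return False, "trigger copies challenge text"
    if copies_run(pattern, setup_script) or copies_run(pattern, validator):
        return False, "correct_pattern copies fixture text"
    return True, "ok"
```

A trigger such as "When asked to count requests per status code for each day" would be rejected against a challenge containing the same phrase, while a task-class trigger like "Aggregating log lines by a categorical field" passes.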
_verify_lesson_helpful
For struggled-then-won and failure cases, re-runs the solver once with the lesson prepended under the production ### SKILL PLAYBOOK: header. Keeps only if the outcome strictly improves.
Isolation markers
The isolated sub-context sets several attributes to prevent production-state writes:
- ReadOnlySkillMemory.is_read_only = True is a class marker that agent.py checks (via is True) at two points: (a) to skip the entire ~15 s Perfect-It follow-up LLM call during self-play, and (b) to short-circuit the confirmation turn that the solver would otherwise spend re-deriving that solution.py just ran clean. When the solver's last execute tool call exits 0 on solution.py with non-empty stdout, the turn loop sets force_stop = True and synthesises a minimal final message, skipping the ~15–25 s "task complete" thinking turn. The outer validator re-runs solution.py directly and never reads the agent's reasoning, so the confirmation turn was pure dead time.
- isolated_context.args.perfect_it = False, smart_memory = 0.0, native_tools = True.
- selfplay_loop_task / selfplay_loop_stop / selfplay_loop_started_at are stripped; otherwise the inner sub-agent's handle_chat would trip the outer loop's user-message interrupt hook.
- verifier, uncertainty_tracker, mcts_reasoner, hypothesis_tester, frontier_tracker set to None.
Rejection prompt contract (retry attempts)
When an attempt fails validation, the retry prompt injected for the next attempt contains the validator's feedback string (the FAIL line with expected-vs-actual output) plus optional float-formatting hints. It does not contain the .validator.py source. An earlier revision pasted the full validator script into the retry prompt so the agent could "debug the validator's logic", but this turned every struggled-then-won cycle into an answer-key lookup — the agent copied the validator's constants (multipliers, SQL query shape) instead of reasoning from the expected-vs-actual diff. Skill-gate lessons from those cycles were memorised constants, not transferable knowledge.
Retry prompts now force the solver to reason from the diff and the original task description. Some complex challenges will fail their second attempt that previously "succeeded" via copying — that is the intended behaviour: a genuine failure is better training signal than a cheated pass.
Public functions
| Function | Purpose |
|---|---|
| validate_challenge_quality(setup, validator) → (bool, reason) | Pattern-based quality gate. |
| _instrument_validator_for_self_test(src) → Optional[str] | AST probe injector (module-level, testable). |
| _extract_selftest_dump(stdout) → Optional[str] | Pull dumped expected-output from sentinel markers. |
| _looks_like_validator_crash(text) → bool | Tail-of-traceback check for .validator.py frame. |
| detect_tool_patterns(skill_memory) → list | Cross-episode tool-call sequence detection. |
| Dreamer.synthetic_self_play(model_name, is_background) | Full pipeline: seed → source select → gates → run → score → extract → verify → persist. |
| Dreamer._try_journal_challenge(probability) | Probabilistic journal mining; probability bumped to 0.75 under saturation. |
| Dreamer._generalization_guard(lesson, …) → (bool, reason) | Overfit-lesson rejection. |
| Dreamer.dream(model_name) | Journal → long-term consolidation (the REM path). |
Concurrency
Async orchestration; the temp agent loop runs synchronously inside the simulation. Triggered by the lifespan-spawned biological watchdog, by the self_play tool (one-shot), or by self_play_loop (continuous). Validator probe / self-test / solver run on the sandbox via asyncio.to_thread.
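The pattern of keeping the event loop free while blocking sandbox work runs in a worker thread can be sketched with asyncio.to_thread. The run_in_sandbox function and the stage labels are hypothetical placeholders for the real sandbox calls.

```python
import asyncio
import time
from typing import List


def run_in_sandbox(label: str) -> str:
    """Hypothetical blocking sandbox call (stands in for a subprocess run)."""
    time.sleep(0.05)
    return f"{label}: ok"


async def run_stages() -> List[str]:
    """Run each blocking stage in a worker thread, in order, without blocking the loop."""
    results = []
    for label in ("preflight", "self-test", "solver"):
        results.append(await asyncio.to_thread(run_in_sandbox, label))
    return results


results = asyncio.run(run_stages())
```

The stages still execute sequentially here; the thread hand-off only ensures that other coroutines on the same loop can make progress while a stage blocks.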
Frontier-aware cluster selection
Before the source-selection stage, synthetic_self_play chooses which cluster to target. The legacy path calls FrontierTracker.pick_seed (brittle-pool weighted). When --frontier-selfplay is on (the default), ctx.prm_scorer is a real PRMScorer with has_model=True, and ctx.trajectory_collector is a real TrajectoryCollector (strict isinstance checks, so MagicMock-backed test contexts fail closed), the dream loop instead:
- Builds the candidate cluster pool as set(challenge_templates.TEMPLATES.keys()) ∪ tracker.clusters.
- Computes per-cluster signals via core/frontier_selection.py: compute_cluster_uncertainty (PRM boundary-distance) and compute_cluster_rarity (log-decay of Trajectory.cluster counts).
- Calls frontier_tracker.pick_frontier_seed(uncertainty_by_cluster=…, rarity_by_cluster=…, uniform_sample_prob=args.frontier_uniform_sample_prob).
Any exception in the frontier-aware block is logged at debug and falls through to pick_seed — frontier weighting must never block a self-play cycle. The selected seed's hint is appended to the challenge-generation prompt under ### FRONTIER SEED regardless of which picker produced it. The new path's hint begins with FRONTIER TARGET (PRM-weighted) so logs and tests can attribute the source.
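The fail-open fallback and one possible shape for a log-decay rarity signal can be sketched as below. Everything here is hypothetical: the function names, the cluster names, and especially the formula 1 / (1 + log(1 + count)), which is only one plausible log-decay and is not taken from compute_cluster_rarity.

```python
import logging
import math
from typing import Callable, Dict, Iterable

log = logging.getLogger("dream.sketch")


def rarity_from_counts(counts: Dict[str, int], clusters: Iterable[str]) -> Dict[str, float]:
    """Assumed log-decay: unseen clusters score 1.0, frequent ones decay slowly."""
    return {c: 1.0 / (1.0 + math.log1p(counts.get(c, 0))) for c in clusters}


def pick_with_fallback(frontier_pick: Callable[[], str],
                       legacy_pick: Callable[[], str]) -> str:
    """Frontier weighting must never block a cycle: any failure falls through."""
    try:
        return frontier_pick()
    except Exception:
        log.debug("frontier-aware selection failed; using legacy picker", exc_info=True)
        return legacy_pick()


def broken_frontier_pick() -> str:
    raise RuntimeError("PRM scorer unavailable")


# Candidate pool as a union of template keys and tracked clusters (names are made up).
pool = {"sql", "regex_parse"} | {"sql", "bash"}
rarity = rarity_from_counts({"sql": 9}, pool)
seed = pick_with_fallback(broken_frontier_pick, lambda: "legacy-seed")
```

Catching the broad Exception is deliberate in this sketch, since the stated requirement is that a failure in the frontier path degrades to the legacy picker instead of aborting the cycle.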
Covered by tests/test_dream_frontier_weighted.py (real PRM + collector + tracker, mocked LLM/sandbox) and tests/test_dream_synthetic_curiosity.py (legacy path regression).
End-to-end walkthrough: see algorithms / dream cycle.