Skill acquisition
Two parallel paths: create_skill writes permanent Python tools; self-play writes retrieval-aware lessons. Both flow into the same utility-ranked retention.
Path A: create_skill — permanent Python tools
- The LLM emits create_skill(name, description, parameters_schema, python_code, test_payload).
- Entry-point normalization (2026-04-24): python_code is run through sanitize_code() BEFORE anything hits disk. This strips CDATA envelopes that leaked from the XML tool-call parser, decodes corrupt HTML entities, and extracts code from markdown fences, all AST-gated so clean Python is never perturbed. If normalization fails, the LLM gets a specific actionable error (CDATA wrapper, HTML entities, truncated stream, escaped-newline confusion) instead of a generic test failure. See acquired_skills.html.
- The normalized skill body is written TWICE: a transient copy to $GHOST_SANDBOX_DIR/test_skill.py for the TDD run, and (on success) the canonical copy to $GHOST_HOME/system/memory/acquired_skills/<name>.py. The canonical storage lives outside the sandbox bind-mount, so a docker volume rm / rm -rf $GHOST_SANDBOX_DIR does not destroy learned tools.
- TDD validation: the transient file is executed via execute with test_payload JSON-encoded as sys.argv[1]. The skill must produce EXIT CODE: 0 with non-empty stdout. If it fails, registration is rejected AND the trace's pretty_log surfaces the one-line cause (e.g. ValueError: invalid literal for int()…) so the operator doesn't need to grep the agent log.
- Persisted into skills_registry.json (co-located with the .py file under memory/acquired_skills/) with status="active", usage_count=0, failure_count=0, and a content hash.
- The description is embedded into VectorMemory with metadata.type = "acquired_skill".
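The AST gate in the normalization step can be illustrated with a minimal sketch. This is not the project's sanitize_code(); the name sanitize_code_sketch, the regular expressions, and the order of repairs are assumptions made for illustration. The idea shown is only that input which already parses is returned untouched, and a repair is accepted only if its result parses.

```python
import ast
import html
import re

FENCE = "`" * 3  # markdown code-fence delimiter
_FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)
_CDATA_RE = re.compile(r"^\s*<!\[CDATA\[(.*)\]\]>\s*$", re.DOTALL)


def _parses(code: str) -> bool:
    """Return True if `code` is syntactically valid Python."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def sanitize_code_sketch(raw: str) -> str:
    """Hypothetical stand-in for sanitize_code(); illustrative only."""
    # AST gate: clean Python is never perturbed.
    if _parses(raw):
        return raw

    candidates = []
    cdata = _CDATA_RE.match(raw)
    if cdata:
        candidates.append(cdata.group(1))   # unwrap a CDATA envelope
    fence = _FENCE_RE.search(raw)
    if fence:
        candidates.append(fence.group(1))   # extract a fenced block
    candidates.append(html.unescape(raw))   # decode HTML entities

    # Accept a repair only if the repaired text parses.
    for candidate in candidates:
        if candidate.strip() and _parses(candidate):
            return candidate
    raise ValueError(
        "normalization failed: code does not parse after "
        "CDATA, fence, and entity repairs"
    )
```

A real implementation would presumably report which specific repair failed, as the error categories listed above suggest; the single ValueError here is a simplification.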
Migration from legacy sandbox storage. If an earlier install left skills under $GHOST_SANDBOX_DIR/acquired_skills/, they are copied into the canonical memory/acquired_skills/ on first AcquiredSkillManager construction when the canonical dir is empty. Idempotent: once the canonical store is populated, subsequent constructions are a no-op. Legacy files are left in place for manual cleanup after the operator verifies the move.
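The copy-if-empty behaviour described above could look roughly like the following. The function name migrate_legacy_skills and its return value are hypothetical; the actual logic runs inside AcquiredSkillManager construction and may differ in detail.

```python
import shutil
from pathlib import Path


def migrate_legacy_skills(legacy_dir: Path, canonical_dir: Path) -> int:
    """Copy legacy skill files into the canonical store if it is empty.

    Hypothetical helper: returns the number of files copied.
    """
    canonical_dir.mkdir(parents=True, exist_ok=True)
    if any(canonical_dir.iterdir()):
        return 0  # canonical store already populated: no-op
    if not legacy_dir.is_dir():
        return 0  # nothing to migrate
    copied = 0
    for src in sorted(legacy_dir.iterdir()):
        if src.is_file():
            # Copy rather than move, so legacy files stay for manual cleanup.
            shutil.copy2(src, canonical_dir / src.name)
            copied += 1
    return copied
```

Because the emptiness check comes first, calling the helper a second time copies nothing, which is what makes the migration idempotent.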
Routing
On each turn, registry.get_active_tool_definitions(query) queries vector memory for skills semantically related to the user query and injects only those into the LLM-facing tool list. This keeps the prompt short while still giving the LLM access to the entire archive.
Tool description hardening (2026-04-24). Each advertised skill is labelled [ACQUIRED SKILL — CALL BY NAME] and the description embeds a concrete invocation example using the skill's own name, plus an explicit list of forbidden wrap patterns (python -c, import, python3 acquired_skills/X.py). This closes the 8-turn greece_top_news loop where the LLM saw the skill in its tool list but didn't realise it was a top-level callable and kept trying to execute wrapper scripts. The registry's skill-runner closure reads the canonical file from memory/acquired_skills/ and passes content= to tool_execute, so execution still happens inside the sandbox — the source-of-truth file is just safe outside it.
Telemetry
Each invocation calls manager.log_telemetry(name, success):
- Successful run: bump usage_count, reset failure_count.
- Failure: bump failure_count; after 3 consecutive failures status flips to "degraded".
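The two telemetry rules can be sketched as below. SkillRecord and the free function are stand-ins for the manager's internal state (the real method is manager.log_telemetry(name, success)). Whether a later success flips a degraded skill back to "active" is not stated here, so the sketch leaves status alone on success.

```python
from dataclasses import dataclass

DEGRADE_AFTER = 3  # consecutive failures before status flips


@dataclass
class SkillRecord:
    """Hypothetical in-memory view of one registry entry."""
    status: str = "active"
    usage_count: int = 0
    failure_count: int = 0


def log_telemetry(record: SkillRecord, success: bool) -> None:
    if success:
        record.usage_count += 1
        record.failure_count = 0  # a success breaks the failure streak
    else:
        record.failure_count += 1
        if record.failure_count >= DEGRADE_AFTER:
            record.status = "degraded"
```

Resetting failure_count on success is what makes the threshold count consecutive failures instead of lifetime failures.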
Retire
retire_degraded_skills() archives any skill matching failure_count ≥ 3 OR (failure_count ≥ 5 AND usage_count < 10). Retired skills move to acquired_skills/retired/ and are removed from both the active registry and the vector store so they no longer appear in semantic routing.
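The retirement predicate, transcribed literally from the condition above, is sketched below; should_retire and split_registry are hypothetical names. As written, the second clause is implied by the first (any failure_count ≥ 5 is also ≥ 3), so the predicate reduces to failure_count ≥ 3. The sketch keeps both clauses to mirror the text.

```python
def should_retire(failure_count: int, usage_count: int) -> bool:
    """Literal transcription of the documented retire condition."""
    return failure_count >= 3 or (failure_count >= 5 and usage_count < 10)


def split_registry(registry: dict) -> tuple:
    """Partition {name: {"failure_count": int, "usage_count": int}}
    into (active, retired) dicts. Hypothetical helper."""
    active, retired = {}, {}
    for name, rec in registry.items():
        retire = should_retire(rec["failure_count"], rec["usage_count"])
        (retired if retire else active)[name] = rec
    return active, retired
```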
Path B: Self-play lessons
Dream-cycle self-play writes lessons into the SkillMemory playbook, subject to a three-gate extraction pipeline designed to prevent overfit or unhelpful entries.
- Extract: Dreamer._extract_structured_lesson runs a meta-cognitive LLM call returning {trigger, anti_pattern, correct_pattern, domains, confidence, source_challenge_hash}. The prompt requires task-class triggers, forbids copying fixture literals, and mandates a non-empty taxonomy-compliant domain set.
- Generalization guard: Dreamer._generalization_guard uses token n-gram overlap (_GENERALIZATION_MIN_NGRAM = 6) to reject lessons that restate the challenge or copy-paste constants from the setup / validator. Empty triggers, empty correct-patterns, and off-taxonomy domains are also rejected here.
- Verification: Dreamer._verify_lesson_helpful re-runs the solver once with the lesson prepended under the production ### SKILL PLAYBOOK: header. The lesson is marked verified=True only if the outcome strictly improves (original-fail → verify-pass, or original ≥ 2 attempts → verify on attempt 1). Verified lessons pin in the playbook regardless of retrieval stats.
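The overlap test in the generalization guard and the strict-improvement rule in the verification gate can be sketched as follows. The tokenizer, the function names, and the reading of _GENERALIZATION_MIN_NGRAM as "any shared run of 6 tokens triggers rejection" are assumptions; the Dreamer methods may tokenize or score overlap differently.

```python
import re

_GENERALIZATION_MIN_NGRAM = 6


def _tokens(text: str) -> list:
    """Lowercased word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def _ngrams(tokens: list, n: int) -> set:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shares_long_ngram(lesson: str, challenge: str,
                      n: int = _GENERALIZATION_MIN_NGRAM) -> bool:
    """True if the lesson repeats any n-token run from the challenge."""
    return bool(_ngrams(_tokens(lesson), n) & _ngrams(_tokens(challenge), n))


def strictly_improves(orig_passed: bool, orig_attempts: int,
                      verify_passed: bool, verify_attempts: int) -> bool:
    """Fail -> pass, or a pass after 2+ attempts -> pass on attempt 1."""
    if not verify_passed:
        return False
    if not orig_passed:
        return True
    return orig_attempts >= 2 and verify_attempts == 1
```

A lesson that pastes a fixture filename along with its surrounding sentence trips shares_long_ngram, while a task-class rule phrased in different words does not.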
Retrieval feedback loop
- SkillMemory.get_playbook_context(query): vector search (distance < 0.45) + BM25-lite re-rank, top 5 injected under the ### SKILL PLAYBOOK: header in both the planner and execution system prompts.
- record_retrieval(trigger) fires on every injection and emits a debug log keyed by source + source_challenge_hash, which lets us answer "is this self-play lesson ever actually used?" empirically.
- record_helpful_retrieval(trigger) / credit_recent_retrievals(window_seconds=300) bump helpful_retrievals when a turn succeeds shortly after a lesson was surfaced.
- prune_low_utility(min_retrievals=5, max_drop_fraction=0.25) drops lessons that have been surfaced many times but never marked helpful (non-verified only).
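One plausible reading of the pruning rule is sketched below. The Lesson dataclass, the interpretation of max_drop_fraction as a cap on the share of the playbook removed in one pass, and the choice to drop the most-retrieved candidates first are all assumptions; only the parameter names and defaults come from the signature above.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    """Hypothetical playbook entry with retrieval counters."""
    trigger: str
    retrievals: int = 0
    helpful_retrievals: int = 0
    verified: bool = False


def prune_low_utility(lessons: list, min_retrievals: int = 5,
                      max_drop_fraction: float = 0.25) -> list:
    """Return the lessons that survive one pruning pass."""
    candidates = [
        l for l in lessons
        if not l.verified                    # verified lessons are pinned
        and l.retrievals >= min_retrievals   # surfaced often enough to judge
        and l.helpful_retrievals == 0        # never credited as helpful
    ]
    # Assumed ordering: most-surfaced, never-helpful lessons go first.
    candidates.sort(key=lambda l: l.retrievals, reverse=True)
    budget = int(len(lessons) * max_drop_fraction)
    doomed = {id(l) for l in candidates[:budget]}
    return [l for l in lessons if id(l) not in doomed]
```

Capping the drop per pass keeps a burst of unlucky retrievals from emptying the playbook in one go.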
Retention formula
utility = confidence·0.5 + hit_rate·0.8 + (0.3 if verified) + log(freq)·0.1 + stale_penalty
Playbook capped at PLAYBOOK_MAX = 50. Trim order: head pin → verified pins → highest-utility unverified.
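The formula and trim order can be made concrete with a small sketch. Several details are assumed because they are not defined here: log is taken as the natural logarithm, freq is clamped to at least 1, stale_penalty is passed in as a ready-made (presumably non-positive) number, the "head pin" is taken to be the first playbook entry, and each entry is a dict with "verified" and "utility" keys.

```python
import math

PLAYBOOK_MAX = 50


def utility(confidence: float, hit_rate: float, verified: bool,
            freq: int, stale_penalty: float = 0.0) -> float:
    """Weighted sum from the retention formula (natural log assumed)."""
    return (confidence * 0.5
            + hit_rate * 0.8
            + (0.3 if verified else 0.0)
            + math.log(max(freq, 1)) * 0.1
            + stale_penalty)


def trim(playbook: list, cap: int = PLAYBOOK_MAX) -> list:
    """Keep head pin, then verified pins, then best unverified, up to cap."""
    if len(playbook) <= cap:
        return list(playbook)
    head, rest = playbook[0], playbook[1:]
    verified = [l for l in rest if l["verified"]]
    unverified = sorted(
        (l for l in rest if not l["verified"]),
        key=lambda l: l["utility"],
        reverse=True,
    )
    return ([head] + verified + unverified)[:cap]
```

With these weights a verified lesson starts 0.3 ahead, which an unverified lesson can only make up through hit rate or confidence.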
User-facing surface
The list_lessons(scope, limit) tool exposes the playbook directly. Phrases like "what did you learn today?" / "what have you learned so far?" are routed to this tool via SYSTEM_PROMPT. Read-only surface tools (list_lessons, recall, manage_skills) discharge the meta-task compliance nudge so the agent doesn't loop writing redundant no-op skills just to satisfy the "did you call learn_skill?" check.
Composed skills
Recurring tool-call sequences are pulled into ComposedSkill macros. A composed skill defines steps with parameter templates and conditional branches, executed via execute(skill_name, executor_fn, params). Up to 50 composed skills are kept; the least-used is evicted on overflow.
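To illustrate what parameter templates and conditional branches might mean in practice, here is a minimal sketch. The step schema (the "tool", "args", and "when" keys), the use of string.Template placeholders, and the function name run_composed are hypothetical; the actual ComposedSkill format is not documented in this section.

```python
from string import Template


def run_composed(steps: list, executor_fn, params: dict) -> list:
    """Run each step through executor_fn(tool_name, args), in order.

    Hypothetical schema: a step is {"tool": str, "args": {k: template}},
    with an optional "when" key naming a param that must be truthy.
    """
    results = []
    for step in steps:
        condition = step.get("when")
        if condition is not None and not params.get(condition):
            continue  # conditional branch not taken
        # Fill "$name" placeholders from the caller's params.
        args = {key: Template(value).substitute(params)
                for key, value in step["args"].items()}
        results.append(executor_fn(step["tool"], args))
    return results
```

Passing the executor as a function keeps the macro independent of how individual tools are dispatched, which matches the executor_fn argument in the execute signature above.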