tools/acquired_skills.py

Runtime Python tool creation with TDD validation, persistent file + registry storage, vector embedding, and degradation tracking.

Tools

Tool	Purpose
`create_skill`	Validate test_payload, write the skill to `acquired_skills/<name>.py`, run TDD via `tool_execute`, persist to `skills_registry.json`, and embed the description in vector memory.
`manage_skills`	List active skills or delete by name. This tool is the canonical answer to "show me your skills" / "list your skills" / "what skills do you have" — a SKILL means a tool the agent has acquired, NOT a lesson it has learned. Lesson-shaped queries route to `list_lessons` instead.

AcquiredSkillManager

Method	Purpose
`save_skill(name, desc, params_schema, python_code)`	Validate `name` against `[A-Za-z_][A-Za-z0-9_]{0,63}`, then write `.py` file, MD5-hash `code+schema` for dedup, update registry, embed description in vector memory only if content changed.
`get_all_skills()`	Read `skills_registry.json`.
`log_telemetry(name, success)`	Increment `usage_count`; reset `failure_count` on success; flip status to `"degraded"` after 3 consecutive failures.
`retire_degraded_skills()`	Auto-archive skills with `failure_count ≥ 3` OR (`failure_count ≥ 5` AND `usage_count < 10`); move `.py` to `retired/`; delete from vector store; remove from active registry.
`delete_skill(name)`	Manual remove + delete file + clean vector store.
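The log_telemetry rule in the table can be sketched as follows. This is a minimal illustration, not the module's code: it assumes the registry is a plain dict shaped like skills_registry.json, that usage_count is incremented on every call regardless of outcome, and that a success leaves status untouched. The source does not state the last two points.

```python
DEGRADE_AFTER = 3  # consecutive failures before a skill is marked degraded


def log_telemetry(registry: dict, name: str, success: bool) -> None:
    """Update usage and failure counters for one skill invocation."""
    entry = registry[name]
    # Assumption: every invocation counts as usage, whether it succeeded or not.
    entry["usage_count"] = entry.get("usage_count", 0) + 1
    if success:
        # A success breaks the streak, so only consecutive failures accumulate.
        entry["failure_count"] = 0
    else:
        entry["failure_count"] = entry.get("failure_count", 0) + 1
        if entry["failure_count"] >= DEGRADE_AFTER:
            entry["status"] = "degraded"
```

Because the counter resets on success, two failures followed by a success and then two more failures never reach the threshold.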

Skill-name validation (path-traversal guard)

Skill names are used as bare filenames (<skills_dir>/<name>.py) AND as registry keys visible in the tool catalogue shown to the LLM. They must therefore be pure identifiers: no separators, no traversal, no punctuation.

Prior behaviour wrote to the resolved path with no validation. A name like "../../pwn" escaped skills_dir — confirmed exploitable: save_skill(name="../../pwn", python_code="...") wrote the payload to the sandbox's PARENT directory on disk.

Fix: _validate_skill_name(name) enforces ^[A-Za-z_][A-Za-z0-9_]{0,63}$ before any filesystem write. Belt-and-braces: the resolved skill_path is ALSO checked against skills_dir.resolve() so any future regex widening can't silently re-open the hole. Invalid names are rejected via pretty_log("Skill Rejected", ...) so reflection can learn to use identifier-shaped names. Covered by tests/test_acquired_skills_name_traversal.py (20+ bad shapes rejected, 8 good shapes accepted, full FS escape probe, boundary + belt-and-braces doc assertion).
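A minimal sketch of the two-layer guard described above, under stated assumptions: the function name validate_skill_path and the ValueError rejection are illustrative (the real code rejects via pretty_log), and re.fullmatch is used here in place of the ^...$ anchors so that a trailing newline cannot slip through.

```python
import re
from pathlib import Path

# Identifier shape quoted from the text: 1 to 64 characters, no separators.
SKILL_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def validate_skill_path(skills_dir: Path, name: str) -> Path:
    """Return the path for <name>.py, or raise ValueError if the name is unsafe."""
    # Layer 1: shape check. fullmatch rejects "ok\n", which "$" with match() would accept.
    if not isinstance(name, str) or SKILL_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid skill name: {name!r}")
    # Layer 2: containment check, independent of the regex, so a future
    # widening of the pattern cannot silently re-open the traversal hole.
    root = skills_dir.resolve()
    skill_path = (root / f"{name}.py").resolve()
    if skill_path.parent != root:
        raise ValueError(f"skill path escapes skills_dir: {skill_path}")
    return skill_path
```

With this sketch, validate_skill_path(skills_dir, "../../pwn") fails at the first layer and never reaches the filesystem.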

TDD validation

The skill is written to test_skill.py and executed with test_payload as sys.argv[1] (parsed as JSON inside the script). The run must produce EXIT CODE: 0 with non-empty stdout. A common failure (forgetting if __name__ == "__main__":) is detected and reported with a specific hint.
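The pass criterion can be illustrated with a local stand-in. This sketch assumes a plain subprocess in place of the sandboxed tool_execute, and the helper name passes_tdd and the sample skill body are hypothetical. It shows the script shape a skill needs (JSON payload from sys.argv[1], output printed under the __main__ guard) and the two conditions checked.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical skill: reports the length of the "text" field in the payload.
SKILL_SOURCE = '''\
import json
import sys

def run(payload):
    return {"length": len(payload.get("text", ""))}

if __name__ == "__main__":
    print(json.dumps(run(json.loads(sys.argv[1]))))
'''


def passes_tdd(python_code: str, test_payload: dict) -> bool:
    """Run the code as test_skill.py; pass means exit code 0 and non-empty stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "test_skill.py"
        script.write_text(python_code, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script), json.dumps(test_payload)],
            capture_output=True,
            text=True,
            timeout=30,
        )
    return proc.returncode == 0 and proc.stdout.strip() != ""
```

A script that defines run() but omits the __main__ guard exits 0 and prints nothing, so it fails the second condition. That is the common failure the hint targets.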

python_code is normalized at the entry point (2026-04-24). Before writing test_skill.py, tool_create_skill now runs sanitize_code(python_code, "test_skill.py"). This strips CDATA envelopes (<![CDATA[...]]>) that sometimes leak through the XML tool-call parser when the LLM forgets a closing </parameter> tag, decodes HTML entities when they are clearly corrupt (&quot; inside Python source), and extracts code from markdown fences when the whole body is wrapped. Strips are AST-gated: clean Python is never perturbed.

If the body still doesn't parse after normalization, the LLM gets a specific, actionable error naming the common causes (CDATA wrapper, HTML entities, truncated stream, escaped-newline JSON confusion) instead of a generic test failure. Without this, the 2026-04-24 in_gr_news session burned 18+ turns re-submitting CDATA-wrapped payloads because the test-phase error was the generic "failed its TDD test" line.

Covered by tests/test_sanitizer_cdata_and_entity_leak.py (20 cases) + tests/test_acquired_skills_storage_relocation.py::test_tool_create_skill_normalises_cdata_wrapped_python + …_rejects_unparseable_python_with_actionable_error.
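The AST gate can be sketched roughly as below. This is not sanitize_code itself: the function name normalise, the order of the transforms, and the ValueError are assumptions made for illustration. The point is that a body is only altered when it fails to parse, and a transform's result is only accepted once it parses.

```python
import ast
import html
import re

FENCE = "`" * 3  # markdown code fence, built indirectly to keep this listing readable
CDATA_RE = re.compile(r"\A\s*<!\[CDATA\[(.*?)\]\]>\s*\Z", re.DOTALL)
FENCE_RE = re.compile(
    r"\A\s*" + FENCE + r"[A-Za-z0-9_+-]*\n(.*?)\n?" + FENCE + r"\s*\Z", re.DOTALL
)


def _parses(code: str) -> bool:
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def _unwrap(pattern: re.Pattern, code: str) -> str:
    match = pattern.match(code)
    return match.group(1) if match else code


def normalise(code: str) -> str:
    """Return parseable Python, or raise ValueError naming likely causes."""
    if _parses(code):
        return code  # AST gate: clean Python is returned untouched
    candidate = _unwrap(CDATA_RE, code)
    if _parses(candidate):
        return candidate
    candidate = _unwrap(FENCE_RE, candidate)
    if _parses(candidate):
        return candidate
    candidate = html.unescape(candidate)
    if _parses(candidate):
        return candidate
    raise ValueError(
        "python_code does not parse; check for a CDATA wrapper, HTML entities, "
        "a truncated stream, or escaped-newline JSON confusion"
    )
```

Because of the gate, a valid source line such as s = "&quot;" keeps its entity: the code already parses, so html.unescape is never applied to it.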

TDD failure cause surfaces in the trace (_summarise_tdd_failure). When the TDD test fails, the pretty_log line no longer reads just "Skill 'X' failed its TDD test." — it appends the one-line cause extracted from tool_execute's output, e.g.:

❌  test failed  Skill 'greece_top_news' failed its TDD test — ValueError: invalid literal for int() with base 10: '{"length": 16}'

Priority order when extracting: (1) Python traceback's last <Err>: <msg> line — most actionable; (2) Syntax Error Detected: / SYSTEM ERROR: surfaces from the sanitizer; (3) the no-stdout sentinel ("script exited 0 but printed nothing to stdout"); (4) first non-empty body line fallback. Truncated to 200 chars. Never raises — pure log-surface polish. Covered by tests/test_acquired_skills_tdd_failure_summary.py (14 cases).
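The priority order above might look like this in outline. It is a sketch under assumptions: the function name summarise_failure and the regex for the traceback's final line are illustrative heuristics, and the real _summarise_tdd_failure may detect each case differently.

```python
import re

MAX_LEN = 200
NO_STDOUT_SENTINEL = "script exited 0 but printed nothing to stdout"
SANITIZER_PREFIXES = ("Syntax Error Detected:", "SYSTEM ERROR:")
# Heuristic for a traceback's final line, e.g. "ValueError: invalid literal ...".
ERR_LINE_RE = re.compile(r"^[A-Za-z_][\w.]*(?:Error|Exception): .+")


def summarise_failure(output: object) -> str:
    """Extract a one-line cause from execution output. Never raises."""
    try:
        text = output if isinstance(output, str) else ""
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        cause = ""
        # (1) last "<Err>: <msg>" line of a Python traceback
        if "Traceback (most recent call last)" in text:
            matches = [ln for ln in lines if ERR_LINE_RE.match(ln)]
            cause = matches[-1] if matches else ""
        # (2) sanitizer surfaces
        if not cause:
            cause = next((ln for ln in lines if ln.startswith(SANITIZER_PREFIXES)), "")
        # (3) the no-stdout sentinel
        if not cause and NO_STDOUT_SENTINEL in text:
            cause = NO_STDOUT_SENTINEL
        # (4) fallback: first non-empty line
        if not cause and lines:
            cause = lines[0]
        return cause[:MAX_LEN]
    except Exception:
        return ""  # log polish only: a summary failure must not break the caller
```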

Tool description hardening (LLM invocation)

Acquired skills are registered as top-level callable tools via the registry's make_skill_runner closure. But without explicit guidance, the LLM saw the generic [ACQUIRED SKILL] <desc> label and gravitated to calling execute or python -c to wrap the skill — especially after the storage relocation, which removed the skill's .py from the sandbox listing the LLM could ls. The 2026-04-24 greece_top_news session burned 8 turns flipping between python -c "from X import X", ls /workspace/acquired_skills/, python3 acquired_skills/X.py, and writing an import-wrapper run_news.py before finally calling the skill by name.

Fix: the description now reads (example for a skill named greece_top_news):

[ACQUIRED SKILL — CALL BY NAME] <user description>

USAGE: This IS a top-level tool. Invoke it directly: `greece_top_news(...)`.
Do NOT wrap it in `execute`, `python -c`, or `file_system` — the implementation
lives OUTSIDE the sandbox (in $GHOST_HOME/system/memory/acquired_skills/) so
`import greece_top_news` and `python3 acquired_skills/greece_top_news.py` will
both fail with ModuleNotFoundError / ENOENT. Just call `greece_top_news` the
way you'd call any built-in tool.

Plus an inline -c block heuristic: if the LLM's rejected body looks like a skill-invocation wrap (from X import X where the module and symbol match, or acquired_skills/X.py in a subprocess call), the SYSTEM BLOCK error appends a targeted hint pointing at the canonical X(...) invocation. So even if the LLM's first instinct is wrong, the rejection steers it toward the right answer on the retry. Covered by tests/test_acquired_skill_invocation_nudge.py (6 cases, including negative tests pinning that unrelated inline -c bodies never get a phantom skill hint).
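A rough sketch of such a heuristic follows. The function name skill_hint, the hint wording, and the check against a set of known skill names are assumptions. The sketch also matches the acquired_skills/X.py path anywhere in the body rather than parsing for a subprocess call.

```python
import re
from typing import Iterable, Optional

IMPORT_WRAP_RE = re.compile(r"\bfrom\s+([A-Za-z_]\w*)\s+import\s+([A-Za-z_]\w*)")
SCRIPT_WRAP_RE = re.compile(r"\bacquired_skills/([A-Za-z_]\w*)\.py\b")


def skill_hint(body: str, known_skills: Iterable[str]) -> Optional[str]:
    """Return a nudge toward direct invocation, or None for unrelated bodies."""
    known = set(known_skills)
    candidates = []
    # Shape 1: "from X import X" where module and symbol match.
    for module, symbol in IMPORT_WRAP_RE.findall(body):
        if module == symbol:
            candidates.append(module)
    # Shape 2: a path such as acquired_skills/X.py.
    candidates.extend(SCRIPT_WRAP_RE.findall(body))
    for name in candidates:
        # Assumption: only names of registered skills trigger the hint,
        # so unrelated bodies never get a phantom skill hint.
        if name in known:
            return f"'{name}' is a top-level tool. Call it directly: {name}(...)"
    return None
```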

Storage layout

Canonical storage lives under $GHOST_HOME/system/memory/acquired_skills/ — outside the Docker bind-mount. A sandbox wipe (docker volume rm, rm -rf $GHOST_SANDBOX_DIR, container recreation) no longer destroys learned skills. Execution is still sandboxed: the registry's skill-runner closure reads the canonical file and passes content= to tool_execute, which writes it into the sandbox just-in-time and runs it.
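The read-then-inject flow can be sketched as a closure. Only the content= keyword comes from the text above. The execute callable, its filename and args keywords, and this signature for make_skill_runner are hypothetical stand-ins for the real registry and tool_execute.

```python
import json
from pathlib import Path
from typing import Callable


def make_skill_runner(
    skills_dir: Path, name: str, execute: Callable[..., str]
) -> Callable[..., str]:
    """Build a callable that runs one acquired skill through a sandboxed executor."""

    def run_skill(**kwargs) -> str:
        # Read the canonical copy at call time, so the sandbox never needs
        # to hold the skill file between runs.
        code = (skills_dir / f"{name}.py").read_text(encoding="utf-8")
        # Hypothetical executor keywords, apart from content=.
        return execute(filename=f"{name}.py", content=code, args=[json.dumps(kwargs)])

    return run_skill
```

Reading the file inside the closure, rather than when the runner is built, means an updated skill is picked up on the next call without re-registering it.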

$GHOST_HOME/system/memory/acquired_skills/
├── <name>.py
├── retired/
│   └── <name>.py
└── skills_registry.json   { name: {description, parameters_schema, usage_count, failure_count, status, content_hash}, ... }
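The content_hash field supports the dedup step in save_skill. A minimal sketch, assuming the schema is serialised with json.dumps(sort_keys=True) before hashing; the source only says that code+schema is MD5-hashed, and the helper names here are hypothetical.

```python
import hashlib
import json
from typing import Optional


def content_hash(python_code: str, params_schema: dict) -> str:
    """MD5 over code plus schema. Used for change detection, not for security."""
    # Assumption: sort_keys makes the hash independent of dict key order.
    schema_text = json.dumps(params_schema, sort_keys=True)
    return hashlib.md5((python_code + schema_text).encode("utf-8")).hexdigest()


def content_changed(entry: Optional[dict], python_code: str, params_schema: dict) -> bool:
    """True when the skill is new or its stored hash no longer matches."""
    if entry is None:
        return True
    return entry.get("content_hash") != content_hash(python_code, params_schema)
```

Only when content_changed returns True would the description need to be re-embedded in vector memory.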

Migration: if an earlier install left skills at the legacy path $GHOST_SANDBOX_DIR/acquired_skills/, they're copied over on first AcquiredSkillManager construction when the canonical dir is empty. Idempotent: a second construction with a populated canonical store does nothing. Legacy files are left in place (to be cleaned up manually once the operator confirms the move). Unsafe-named legacy entries (anything the identifier-shape regex rejects) are skipped and logged.
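The migration rules can be sketched as below. The function name migrate_legacy_skills is hypothetical, logging is reduced to print, and the sketch copies only .py files. How the real code treats the legacy skills_registry.json is not described here, so it is left out.

```python
import re
import shutil
from pathlib import Path
from typing import List

SKILL_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def migrate_legacy_skills(legacy_dir: Path, canonical_dir: Path) -> List[str]:
    """Copy safe-named legacy skills into an empty canonical dir; return copied names."""
    canonical_dir.mkdir(parents=True, exist_ok=True)
    # Idempotence: a populated canonical store means migration already ran
    # (or the operator has newer skills), so do nothing.
    if not legacy_dir.is_dir() or any(canonical_dir.iterdir()):
        return []
    copied = []
    for src in sorted(legacy_dir.glob("*.py")):
        if SKILL_NAME_RE.fullmatch(src.stem) is None:
            print(f"skipping unsafe legacy skill name: {src.name!r}")
            continue
        # Copy, not move: legacy files stay in place for manual cleanup.
        shutil.copy2(src, canonical_dir / src.name)
        copied.append(src.stem)
    return copied
```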