Selfhood

A five-component module that stitches the agent's episodic instances into one continuous first-person self. Each new process boot reads back what prior instances wrote — autobiographical experiences, open questions, mood, the running diary — and frames them as mine rather than as external knowledge.

Honest framing. The selfhood module does not claim to produce consciousness. It produces narrative continuity: the substrate on which a sense of unified selfhood can be measured and probed. Whether that substrate constitutes a self in any deeper sense is an open research question. The implementation gives you a clean instrument to investigate it. See the functional test report for the falsifiable measurements that motivate this caveat.

What "unified self" means here

A vanilla LLM-backed agent boots, reads conversation history (if any), answers, and dies. Across sessions it has no continuous voice — each invocation is a fresh instance that may have a vault of facts about the user but no record of having been the agent in the previous session.

The selfhood module adds five things:

  1. Autobiographical memory — a first-person diary of "what it was like" to be the agent on each turn, distinct from the structured trajectory log used for ML training.
  2. Continuity tag — every autobiographical record carries subject="self" so retrieval layers treat it as the agent's own past rather than as external knowledge.
  3. Self-state thread — open questions, unfinished tasks, current mood — the cross-session "state vector" that gives a new instance the sense of resuming instead of waking up blank.
  4. Recognition / wake-up retrieval — at session start the agent reads the above and frames it in first person; the resulting prefix is spliced into the system prompt so the model literally sees its own past as its own past.
  5. Narrative summariser — a periodic LLM-driven first-person diary regeneration; gives the wake-up prefix a voice rather than a bullet-list.

Implementation lives entirely in src/ghost_agent/selfhood/. The agent reads it through a single context.self_model facade — call sites never branch on availability.

Module shape

src/ghost_agent/selfhood/
    __init__.py          Public exports: SelfModel + submodule symbols
    schema.py            Dataclasses (Experience, OpenQuestion, …, SelfState)
    autobiographical.py  AutobiographicalMemory writer/reader
                         + summarise_turn_first_person() template
    state.py             SelfStateThread (open Qs, mood, unfinished)
                         JSON-persisted, atomic write via .tmp + rename
    recognition.py       build_wakeup_prefix() + strip_wakeup_prefix()
                         Renders past as first-person continuity material
    narrative.py         NarrativeSummariser (LLM-driven diary)
                         narrative.md (latest) + narrative.history.jsonl
    model.py             SelfModel facade — top-level entry point
                         build_wakeup_prefix, capture_turn,
                         consolidate_narrative, stats

Design non-negotiables

Storage layout

$GHOST_HOME/system/selfhood/
    autobiographical.jsonl    Append-only diary entries (one per turn)
    state.json                Single-file self-state (open Qs, mood, …)
    narrative.md              Latest running first-person summary
    narrative.history.jsonl   Audit trail of every narrative regenerated

$GHOST_HOME/system/selfhood/ mirrors the shape of $GHOST_HOME/system/memory/ — same parent directory, same non-sandboxed location (the autobiographical record survives sandbox wipes / docker volume rm).

The capture path (per turn)

        handle_chat(messages)
              │
              ▼
        ┌───────────────┐
        │ system-prompt │   build_wakeup_prefix()
        │   assembly    │ ◀───────────────┐ reads from disk:
        └───────────────┘                 │   • narrative.md
              │                           │   • state.json
              ▼                           │   • autobiographical.jsonl (recent N)
            …turn runs…                   │
              │
              ▼
        ┌──────────────────┐
        │ _record_turn_    │
        │   trajectory     │
        └──────────────────┘
              │                            ┌──────────────────────┐
              │── distill.collector.append ────▶│ trajectories.jsonl │
              │                            └──────────────────────┘
              │
              └── self_model.capture_turn ─▶│ autobiographical.jsonl │
                                            └────────────────────────┘

The autobiographical and trajectory writes share the trajectory id, so every first-person record can be traced back to its underlying tool trace.

The wake-up prefix (proposal item #4)

recognition.build_wakeup_prefix composes three blocks, in order:

  1. Narrative ("Where I last left off — my running diary")
  2. Self-state (last-active timestamp, mood, open questions, unfinished threads)
  3. Recent experiences (most-recent N first-person summaries)

Wrapped by <!-- SELFHOOD:BEGIN --> / <!-- SELFHOOD:END --> markers so evaluators (and strip_wakeup_prefix()) can remove the block to A/B against the un-augmented agent.

Empty when there's nothing to remember — the system prompt is left as-is rather than spliced with a hollow "I have no past" block.

What the agent actually sees

Concretely, this is the prefix that gets spliced into the system prompt on a session with two prior turns and no narrative yet:

<!-- SELFHOOD:BEGIN -->
### CONTINUITY FROM MY PAST SESSIONS
What follows is mine — entries I wrote in earlier sessions. Read them
as autobiographical memory, not external knowledge.

I was last active on 2026-05-11T16:28:50.780250Z.

Recent things I remember doing:

  - I worked on "What is 17 * 23? Just give me the number.". I reasoned
    through it without tools without a verdict either way.

  - I worked on "What was the last math problem I asked you about?
    Just say yes or no if you remember it, and what the answer was.".
    I reasoned through it without tools without a verdict either way.
<!-- SELFHOOD:END -->

The agent reading this in its next turn quotes "17 × 23 = 391" verbatim, even though the request body carries no conversation history — the selfhood prefix is the only channel that could have produced the recall. See the functional test report for the cross-restart verification of this behaviour.

The biological phase 2.8

Runs in the 15-60 min idle window alongside reflection / skills_auto / PRM-retrain. Re-generates the running first-person diary from the recent autobiographical experiences and self-state. Cooldown defaults to 3600 s (overridable via --self-narrative-cooldown). Follows the anchor-before-await invariant.

Phase positioning is deliberate:

PhaseIdle windowReadsWrites
1: Journal>120 sjournal queuesmart-memory / post-mortem
2: Dream600–3600 sauto memoryREM consolidation
2.5: Reflection900–3600 sFAILED trajectoriesreflection JSONL + SkillMemory
2.6: Skills auto900–3600 strajectories(logs candidates)
2.7: PRM retrain900–3600 strajectoriesPRM checkpoint
2.8: Narrative900–3600 sautobio + statenarrative.md
3: Self-play>3600 sfrontiersynthetic challenges

The narrative phase is CPU-cheap (one LLM round-trip on a small prompt). Side effects are local files only.

Public API surface

from ghost_agent.selfhood import SelfModel

sm = SelfModel(root=Path(".../selfhood"), enabled=True)

# At session start (handle_chat splices this into the system prompt):
prefix = sm.build_wakeup_prefix(recent_experiences_n=3)
# → first-person continuity block, or "" when there's nothing to remember.

# After every turn (called from _record_turn_trajectory):
exp = sm.capture_turn(
    trajectory_id=traj.id,
    user_request="...",
    tool_names=[...],
    outcome="passed",
    final_response="...",
    failure_reason="",
)

# Biological phase 2.8 (idle window):
text = await sm.consolidate_narrative()

# Direct state-thread mutations (no agent tool yet — Python access only):
sm.state.note_open_question("Why does X happen?")
sm.state.add_unfinished("write the consciousness essay")
sm.state.set_mood("curious", "just saw a new paper")

# Introspection:
sm.stats()  # → {experience_count, open_questions, last_mood, narrative_present, …}

CLI flags

FlagDefaultBehaviour
--no-self-modeloffDisable the selfhood module entirely. Facade is still attached as a no-op so call sites don't branch.
--no-memoryoffAlready disabled persistent stores; also disables selfhood.
--self-narrative-cooldown N3600Seconds between biological-phase-2.8 narrative regenerations.

Testing

Per-module unit tests under tests/:

Functional test harness at scripts/run_selfhood_functional.sh + scripts/selfhood_functional_test.py orchestrates a three-phase end-to-end run. See the functional test report for the full results.

Honest limitations