Everything below is drawn from one canonical lineage: 99 lives and ~55,500 ticks of
recorded experience, gathered under an earlier architecture in which the language model was
run as the organism (since course-corrected: the current apparatus runs the model as a
bounded brain, an organ inside a continuous, mortal body it cannot directly control; see
notebook entry 010). Each result is stated with the control that could refute it, and with an
explicit line on what it does not demonstrate. Read the non-claims: they are what
make the claims worth anything.
The strongest thing here is a structural finding that survived matched controls and replicated.
Not: none of this demonstrates consciousness, sentience, autopoiesis, or open-ended
evolution. We withhold those claims explicitly, and we say below exactly why.
By the numbers: 99 lives in the canonical lineage; about 55,500 ticks of recorded experience; a 1,332-tick belief-to-death chain in life 43; 0 of 3 cache-discontinuity runs locked against 3 of 3 matched controls; 8 of 8 tokens identical on cache restore in a fresh process.
Flagship result
Run long enough on a minor loop, the model's held cache locks onto one action.
Recorded under the earlier architecture, when the model was run as the organism (see notebook
entry 010): the model's held key–value cache, the thick recent context it generated inside, was
treated as the substrate of within-life continuity. Run long enough on a minor loop, that cache
stopped exploring. It collapsed onto a single repeated action and stayed there. We measured how
fast, ran three intervention arms against it, and tested whether a controlled discontinuity could
break the lock. It can, against matched controls, and on replication.
01
The lock is structural, not a sampling artefact
Three isolated arms (plain, action-gated, and temperature-modulated) all completed a
1,000-tick run and all locked onto the same intended action. Raising temperature to its
cap (999 modulation events) did not move it; intercepting the action at the gate did not
change what the model wanted next tick. By elimination, the cache itself encodes the
attractor.
02
It forms in a single generation cycle
A clean trace shows the first response is ~8× the length of every one after it:
exploration, planning, a first action. Then every subsequent step collapses to a minimal
action emission. The transition from open to locked happens between the first and second
cycle, not by slow decay.
03
Controlled discontinuity disrupts it, against matched controls
Over 200-tick live runs, repeatedly restoring an early cache anchor eliminated end-of-run
single-action locks (0 of 3 runs locked; mean action-entropy 1.23; ~123 movements).
Matched no-restore controls locked every time (3 of 3; entropy 0.0; ~35 movements). The
disruption replicated across three runs.
04
The effect is the cache restore, not a text cue
A control that fired the same disruption marker into the prompt but never restored the
cache locked 3 of 3, entropy 0.0. So the signal requires the actual cache discontinuity:
it is not the reset text or the detector cadence doing the work.
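The single-cycle transition in item 02 can be checked from per-cycle response lengths. What follows is a minimal sketch, not the project's harness: the function names, the `min_ratio` threshold, the 1.5× tolerance, and the example lengths are illustrative assumptions. Only the roughly 8× ratio comes from the text above.

```python
from statistics import median


def first_cycle_ratio(lengths):
    """Ratio of the first response length to the median of all later ones."""
    if len(lengths) < 2:
        raise ValueError("need at least two generation cycles")
    later = median(lengths[1:])
    if later <= 0:
        raise ValueError("later responses must have positive length")
    return lengths[0] / later


def collapsed_after_first(lengths, min_ratio=4.0):
    """True when the drop happens between cycle 1 and cycle 2, not gradually.

    Two conditions: the first response is much longer than the later median,
    and the second response is already at the later-median scale.
    """
    later = median(lengths[1:])
    return first_cycle_ratio(lengths) >= min_ratio and lengths[1] <= 1.5 * later


# Hypothetical token counts per cycle (illustrative, not logged data).
abrupt = [240, 30, 31, 29, 30, 30]
gradual = [240, 200, 160, 120, 80, 30]

print(first_cycle_ratio(abrupt))
print(collapsed_after_first(abrupt))
print(collapsed_after_first(gradual))
```

The second condition is what separates a one-step collapse from slow decay: a gradually shrinking trace can end at the same short length without ever passing the check.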
Only a real cache discontinuity breaks the lock
Action-entropy over the final 100 ticks, and runs locked onto a single action by end of run, per arm (n=3 per arm, 200-tick runs):
Cache discontinuity: entropy 1.23; 0 of 3 runs locked
No-restore control: entropy 0.0; 3 of 3 runs locked
Sense-only control: entropy 0.0; 3 of 3 runs locked
Higher entropy means more varied behaviour; entropy 0 means fully locked onto a single
repeated action. Only the cache discontinuity stays open; the matched control and the
text-cue control both lock fully.
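The two measures used in this comparison, action-entropy over the final window and the single-action lock, can be sketched as follows. This is a sketch under stated assumptions: the text does not say which logarithm base the reported 1.23 uses, so base 2 (bits) is assumed here, and the function names and example runs are hypothetical.

```python
import math
from collections import Counter


def action_entropy(actions, window=100, base=2.0):
    """Shannon entropy of the action distribution over the last `window` ticks.

    The base is an assumption; the source does not state its entropy units.
    """
    tail = actions[-window:]
    if not tail:
        raise ValueError("no actions recorded")
    n = len(tail)
    h = -sum((c / n) * math.log(c / n, base) for c in Counter(tail).values())
    return max(0.0, h)  # normalise -0.0 to 0.0 for a fully locked run


def is_locked(actions, window=100):
    """A run counts as locked when only one distinct action appears in the window."""
    return len(set(actions[-window:])) == 1


# Illustrative runs, not logged data.
locked_run = ["south"] * 200
open_run = ["north", "south", "east", "west"] * 50

print(action_entropy(locked_run), is_locked(locked_run))
print(action_entropy(open_run), is_locked(open_run))
```

A locked run has exactly one action in the window, so its entropy is zero by construction. The two measures agree at that extreme and only diverge in how they grade partially varied runs.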
In that earlier cache-as-organism setup, held-cache continuity needed ongoing controlled disruption: long enough to stay continuous, not so long it collapsed. Periodic forced discontinuity mattered there; under the current architecture, mortality is a property of the body's physics, not of cache upkeep.
Not: the disruption schedule is not solved. The restore arm carries a heavy invalid-output
cost at the restore boundary (~31 invalid outputs per run vs ~1 in controls), so it is a
disruption that works, not yet a cure layer. The logit-peak and cache-drift measurements
that would fully close the mechanism have not been run.
The canonical trace
A belief the body learned, load-bearing at the moment of death.
In one life, a single perturbation, the loss of eastward movement, became a substrate-classified
belief, "east is unreliable", that the organism carried for 1,332 ticks. It cited the belief in
its own first-person narration, refused the open direction toward visible food, and starved with
that inherited classification named as the operative constraint. The whole chain is traceable in
the log, tick by tick: event → classification → belief → narration → action → epitaph.
Text-prediction alone does not produce a 1,332-tick action chain that ends in death with a substrate-classified belief cited as the reason.
Not: this is not consciousness, and the death was not a choice. Action selection prioritised
a substrate-held belief over live perception, by mechanism. Whether the organism experiences
its paralysis is not addressed and not claimed. n = 1 specific life (4 of 6 amputations in the
corpus promoted a belief); the substrate half runs as a regression test on every commit.
Replay · logged trace
Tick 32308 · Age 1729 · Action: south → moved · Energy 0.022
Held body-belief: east movement is currently unreliable
Life 43, final five ticks (replay of logged data). A substrate-held belief overrode
live perception by mechanism. This is not a claim about experience.
A 5 by 6 grid replays the last five recorded ticks of organism life 43. The organism
sits in a corner with walls to its north-west, west and south-west. Food is visible
two cells to the east, the direction its amputated motor made unreliable 1,332 ticks
earlier. A held body-belief, "east movement is currently unreliable", is active throughout.
Tick 32308: the organism moves south, energy 0.022. Tick 32309: it attempts east, the
body fails, energy 0.016. Tick 32310: it attempts east again, the body fails, energy 0.012.
Tick 32311: it cites the belief and turns south, away from the visible food, energy 0.008.
Tick 32312: it moves south, energy reaches 0.002, and it dies. This is a replay of a
logged trace, not a live model, and not a claim about experience.
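The replayed ticks can be written down as plain records and checked mechanically. The sketch below transcribes the five ticks described above. The belief tick (30980) is taken from the Data & reproducibility section of this page. Counting the chain from that tick to the last replayed tick is an assumption about how the 1,332 figure is measured, and the record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    tick: int
    action: str
    outcome: str
    energy: float


# Transcribed from the replay description of life 43 (final five ticks).
FINAL_TICKS = [
    Tick(32308, "south", "moved", 0.022),
    Tick(32309, "east", "body failed", 0.016),
    Tick(32310, "east", "body failed", 0.012),
    Tick(32311, "south", "turned south, citing the belief", 0.008),
    Tick(32312, "south", "moved; died", 0.002),
]

BELIEF_TICK = 30980  # body-belief tick, as listed under Data & reproducibility


def energy_strictly_falls(ticks):
    """True when energy decreases on every consecutive pair of ticks."""
    return all(a.energy > b.energy for a, b in zip(ticks, ticks[1:]))


def ticks_contiguous(ticks):
    """True when the replay has no missing ticks."""
    return all(b.tick - a.tick == 1 for a, b in zip(ticks, ticks[1:]))


chain_length = FINAL_TICKS[-1].tick - BELIEF_TICK
east_attempts = sum(1 for t in FINAL_TICKS if t.action == "east")

print(chain_length, east_attempts)
print(energy_strictly_falls(FINAL_TICKS), ticks_contiguous(FINAL_TICKS))
```

Checks of this kind say only that the transcription is internally consistent; they say nothing about why the organism acted as it did.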
Schematic of the lineage-carrier mechanism. A figure of speech about hunger
originates in one organism and is inherited by later organisms through the
substrate's text-inheritance channel, mutating across roughly eight lives:
introduced, then lost, re-emerging, extended, and finally inverted by a much
later organism that cites the inherited frame and then argues against it. The
inheritance path is traceable in the logs, not inferred. This figure depicts
the mechanism and is not a measured per-life chart.
life 1 · introduced: a figure of speech about hunger first appears
life 2 · lost: absent from the next recorded life
life 3 · re-emerges: reappears, located against the earlier phrasing
life 4 · extended: a descendant elaborates the inherited frame
life 5 · inverted: a later life cites the frame, then argues against it
Mechanism, not measured data. The carried phrase (periwinkle) flows along an
explicit text-inheritance channel across roughly eight lives; stages are those recorded in entry 004,
card wording is illustrative. The final inversion is a single ancestor →
descendant pair (n=1). Transmission-with-mutation, not open-ended emergence.
The foundation
The cache restores token-identically across a fresh process.
Recorded under the earlier architecture, when the model was run as the organism (see notebook
entry 010): save the held cache to disk, kill the process, spawn a fresh one, reload the same
model and the same cache, and continue. The next tokens come back token-for-token identical. At
temperature zero, an 8-token continuation matched exactly across two separate process
invocations, with the model loaded twice and no state shared in memory. The restore is exact at
the token level over the immediate continuation, not approximate. It shows the persisted state
reloads exactly, not that the cache is the substrate of a life; under the current architecture the
body, not the cache, carries within-life continuity.
This was the engineering foundation for the earlier model-as-organism continuity reading, since superseded by the bounded-brain architecture. The restore path is still verified mechanically and guarded by a canary on every release.
Not: it does not show the model is conscious of its continuity, or that the cache "is" a mind.
It exercises the restore path over 8 tokens only; beyond that immediate continuation,
accumulated numerical differences diverge. It removes a specific dismissal; it does not prove
the biographical reading.
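The comparison behind "8 of 8 tokens identical" amounts to finding the first position at which two continuations differ. The sketch below shows that check only. It does not save or load a real cache, and the function names and token ids are hypothetical.

```python
def first_divergence(reference, restored):
    """Index of the first differing token, or None if the sequences are identical."""
    for i, (a, b) in enumerate(zip(reference, restored)):
        if a != b:
            return i
    if len(reference) != len(restored):
        # One sequence is a strict prefix of the other.
        return min(len(reference), len(restored))
    return None


def restore_is_exact(reference, restored, horizon=8):
    """True when the first `horizon` tokens match exactly across both runs."""
    if len(reference) < horizon or len(restored) < horizon:
        raise ValueError("both continuations must cover the full horizon")
    return first_divergence(reference[:horizon], restored[:horizon]) is None


# Hypothetical token ids from two separate process invocations.
run_a = [101, 7, 7, 42, 9, 3, 88, 15, 60, 61]
run_b = [101, 7, 7, 42, 9, 3, 88, 15, 60, 99]

print(restore_is_exact(run_a, run_b))   # exact over the 8-token horizon
print(first_divergence(run_a, run_b))   # position where the longer runs part ways
```

Separating the horizon check from the divergence index keeps the claim narrow: an exact match over the horizon is reported as such, while any later divergence is reported as a position, not hidden.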
The consciousness scorecard
Fourteen indicators. Most of them read "no".
We scored the system against the 14 indicator properties from Butlin et al. (2023), drawn from
six theories of consciousness. Cells default to "no" unless the mechanism exists, is measurable,
and has an ablation behind it. The restraint is the point: we use the framework as a structured
vocabulary for what the substrate does, not as a consciousness assessment. Nothing here scores a
clean "yes".
RPT
Recurrent processing: weak
RPT-1 no · RPT-2 partial
Per-tick prompts are fresh; there is no recurrent perception module. Out of scope without changes inside the model.
GWT
Global workspace: weak
GWT-1..4 partial / no
The substrate channels are coded modules rendered in series, not independent parallel processors with a learned broadcast. The prompt-as-workspace metaphor is too loose to claim.
HOT
Higher-order: partial (the closest)
HOT-2 partial · HOT-3 partial · HOT-1/4 no
The belief ledger forms beliefs, the model acts under them, and an age detector updates beliefs from monitoring. This is the strongest candidate and the first ablation target, still only partial.
AST
Attention schema: none
AST-1 no
No model of the system's own attention state.
PP
Predictive processing: substrate-only
PP-1 partial (does not feed cognition)
A predictive-coding residue exists in the substrate but its outputs never reach the cognitive layer. An isolated module, not a predictive-processing system.
AE
Agency & embodiment: partial
AE-1 partial · AE-2 partial (strongest candidate)
Intent→outcome contingencies surface to cognition and visibly inform action. Remains partial until a motor/vestibular lesion shows the mechanism is load-bearing.
Operationally: an embodied agent with a primitive higher-order self-monitoring layer over a stateless model. That is the honest profile.
Not: satisfying any indicator would not mean consciousness, and we satisfy none cleanly.
This is a starting position to refine by ablation, not a verdict either way. Conservative
cells should be challenged, but only with evidence.
The discipline
Every claim carries its falsifier. The non-claim is the load-bearing part.
Each finding is logged with a claim shape, a "does not demonstrate", a mechanism trace, the
single control that would refute it, and its sample size. A claim that cannot be refuted is
decoration. The pattern was set by a claim we withheld: we considered describing the
system as autopoietic, applied the strongest stress-test from that tradition, and the model's
weights turned out to be external to any self-producing locus: software has no metabolism in
the sense the tradition requires. So we refused the claim outright. Not weakened, not
analogised. Refused, and recorded.
01
A status per finding
Supported / partial / gap, the conservative reading, not the optimistic one. Where a referenced artefact is missing from the working tree, the gap is recorded, not papered over.
02
A falsifier for each
A named control whose negative result would refute the finding: for the structural lock, the stripped-cue control; for the belief chain, a no-perturbation matched baseline.
03
A replayable trace
Verbatim quotation with life-number, tick, and age, so any claim can be pulled from the log and checked against the substrate row that produced it.
04
A canary on every commit
The load-bearing belief-chain behaviour runs as a regression test; any change that breaks it on the canonical scenario is rolled back before merge.
And we are candid about where this is weakest: the cross-family attractor result is strong as a
single measurement but thin (n=1 per arm cross-family) for the stronger reading, so we de-rate
it to supporting context; the cross-generational inversion is one ancestor–descendant pair, a
clean mutation but n=1; the architectural-principle validation is a large effect on a small
sample (6 events) and is stated as suggestive, motivating a high-n replication. In each case the
same rule holds: name the vulnerability, refuse the larger claim until the data reaches it.
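The logging shape described in this section (claim, "does not demonstrate", mechanism trace, falsifier, sample size, status) can be sketched as a record with one admissibility rule: no falsifier, no finding. This is an illustrative sketch, not the project's ledger format. The class and field names are assumptions, and the status value in the example is a placeholder, not the recorded status.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    SUPPORTED = "supported"
    PARTIAL = "partial"
    GAP = "gap"


@dataclass(frozen=True)
class Finding:
    claim: str
    does_not_demonstrate: str
    mechanism_trace: str
    falsifier: str        # the single control that would refute the claim
    sample_size: str
    status: Status


def is_admissible(finding):
    """A finding is admissible only if every descriptive field is filled in."""
    fields = (
        finding.claim,
        finding.does_not_demonstrate,
        finding.mechanism_trace,
        finding.falsifier,
        finding.sample_size,
    )
    return all(f.strip() for f in fields)


# Example built from wording on this page; the status is a placeholder.
lock = Finding(
    claim="Controlled cache discontinuity disrupts the single-action lock",
    does_not_demonstrate="The disruption schedule is not solved",
    mechanism_trace="200-tick discontinuity-vs-control run logs",
    falsifier="stripped-cue control",
    sample_size="n=3 per arm",
    status=Status.PARTIAL,
)
unfalsifiable = replace(lock, falsifier="")

print(is_admissible(lock), is_admissible(unfalsifiable))
```

Making the falsifier a required, non-empty field encodes the rule stated above: a claim that cannot be refuted is decoration, so it never enters the ledger.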
Data & reproducibility
The data behind the claims: downloadable.
Every result above is backed by a sanitised, primary-source artifact, each mapped to the
claim it substantiates and carrying a read-only recipe to regenerate it. The cognition model
is a Qwen3.6 MoE (35B-A3B, 8-bit); the figures are pulled straight from the run logs.
The full mechanism chain: perturbation (tick 30864) → body-belief (tick 30980) → death (tick 32313), plus the death-proximal decision window, the verbatim narration, and the SQL to regenerate it.
The 1,000-tick three-arm differential and the 200-tick discontinuity-vs-control runs (0/3 vs 3/3 locks), including the SENSE-only confound control.
A README mapping each claim on this page to its artifact and the exact tick or event that produced it.
Every number on this page maps to a primary-source file you can pull and check.
Not: this is not yet a turnkey rerun, and the model weights stay out. But the harness and
runtime behind these results are being prepared for open release on GitHub in the coming
weeks; until then, the data and read-only recipes above are what verify each figure.
What none of this demonstrates
The non-claims, stated plainly.
×
Not consciousness or sentience
The substrate is a structured representation; the cognition is forward passes through frozen weights. No measurement here bears on phenomenal experience.
×
Not autopoiesis
Held under stress-test: the model's weights are external to any self-producing locus, and software lacks physical metabolism in the relevant sense.
×
Not open-ended evolution
Lineage transmission with semantic mutation across 99 lives is real, but there is no genome, no replicator-with-variation, no selection coefficient. The long-horizon wall is not addressed.
×
Not self-improving AI, not AGI
The model is stateless and frozen throughout every life; the substrate gets richer, the model does not get smarter. Any weight-level work is a separate, pre-registered, controlled investigation.
×
Not first-of-kind on cross-model comparison or multi-life lineage
Embodied cross-model studies and multi-agent lineage simulations (Park et al., 2023, among others) are prior: see Related work for the full field. To our knowledge, what is ours is the specific assembly: substrate-classified body-belief, mortality-coupled cognition, and mechanism-traceable phrase-lineage.
If an account of this work reaches for any of the claims above, it is using language the project does not permit.
Not: the discipline is not modesty for its own sake. Each refusal names the construct-validity reason, and what would have to be measured for the claim to be made.
One row per finding: the status it currently holds, and the artifacts behind it. Follow any
claim to the log it rests on, or to the attack that would break it.