Evidence

The receipts, and what they do not show.

Everything below is drawn from one canonical lineage: 99 lives and ~55,500 ticks of recorded experience. It was recorded under an earlier architecture in which the language model was run as the organism. That has since been course-corrected: the current apparatus runs the model as a bounded brain, an organ inside a continuous, mortal body it cannot directly control (see notebook entry 010). Each result is stated with the control that could refute it, and with an explicit line on what it does not demonstrate. Read the non-claims: they are what make the claims worth anything.

The strongest thing here is a structural finding that survived matched controls and replicated.

Not: none of this demonstrates consciousness, sentience, autopoiesis, or open-ended evolution. We withhold those claims explicitly, and we say below exactly why.

By the numbers: 99 lives in the canonical lineage; about 55,500 ticks of recorded experience; a 1,332-tick belief-to-death chain in life 43; 0 of 3 cache-discontinuity runs locked against 3 of 3 matched controls; 8 of 8 tokens identical on cache restore in a fresh process.


Flagship result

Run long enough on a minor loop, the model's held cache locks onto one action.

Recorded under the earlier architecture, when the model was run as the organism (see notebook entry 010): the model's held key–value cache, the thick recent context it generated inside, was treated as the substrate of within-life continuity. Run long enough on a minor loop, that cache stopped exploring. It collapsed onto a single repeated action and stayed there. We measured how fast, ran three intervention arms against it, and tested whether a controlled discontinuity could break the lock. It can, against matched controls, and on replication.

01

The lock is structural, not a sampling artefact

Three isolated arms (plain, action-gated, and temperature-modulated) all completed a 1,000-tick run and all locked onto the same intended action. Raising temperature to its cap (999 modulation events) did not move it; intercepting the action at the gate did not change what the model wanted next tick. By elimination, the cache itself encodes the attractor.

02

It forms in a single generation cycle

A clean trace shows the first response is ~8× the length of every one after it: exploration, planning, a first action. Then every subsequent step collapses to a minimal action emission. The transition from open to locked happens between the first and second cycle, not by slow decay.
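One way to make the open-to-locked transition checkable is to compare response lengths per generation cycle. The sketch below is illustrative only: the function name, the 0.25 threshold, and the token counts are assumptions, not values taken from the run logs.

```python
from statistics import median

def collapse_profile(lengths, threshold=0.25):
    """Return (ratio, collapse_cycle) for a list of per-cycle response lengths.

    ratio: first response length divided by the median of all later ones.
    collapse_cycle: index of the first later cycle whose length falls below
    threshold * first length, or None if no cycle does.
    """
    if len(lengths) < 2:
        raise ValueError("need at least two generation cycles")
    first, rest = lengths[0], lengths[1:]
    ratio = first / median(rest)
    collapse_cycle = next(
        (i for i, n in enumerate(lengths) if i > 0 and n < threshold * first),
        None,
    )
    return ratio, collapse_cycle

# Hypothetical token counts: one long first response, then minimal emissions.
toy_lengths = [160, 20, 20, 19, 21, 20]
ratio, cycle = collapse_profile(toy_lengths)
print(ratio, cycle)
```

A collapse index of 1 corresponds to a lock that forms between the first and second cycle; slow decay would instead show a later index, or none.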

03

Controlled discontinuity disrupts it, against matched controls

Over 200-tick live runs, repeatedly restoring an early cache anchor eliminated end-of-run single-action locks (0 of 3 runs locked; mean action-entropy 1.23; ~123 movements). Matched no-restore controls locked every time (3 of 3; entropy 0.0; ~35 movements). The disruption replicated across three runs.

04

The effect is the cache restore, not a text cue

A control that fired the same disruption marker into the prompt but never restored the cache locked 3 of 3, entropy 0.0. So the signal requires the actual cache discontinuity: it is not the reset text or the detector cadence doing the work.

Only a real cache discontinuity breaks the lock

Action-entropy over the final 100 ticks, and single-action locks by end of run, per arm (n=3 per arm, 200-tick runs)

  • Cache discontinuity: mean action-entropy 1.23; 0 of 3 runs locked
  • No-restore control: action-entropy 0.0; 3 of 3 runs locked
  • Sense-only (text-cue) control: action-entropy 0.0; 3 of 3 runs locked

Higher entropy means more varied behaviour; entropy 0 means fully locked onto a single repeated action. Only the real cache discontinuity stays open; the matched no-restore control and the text-cue control both lock fully.
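The two quantities reported per arm, action-entropy over the final 100 ticks and whether a run ended in a single-action lock, can be computed from a run's action sequence. The following is a minimal sketch under stated assumptions: the source does not give the logarithm base, so bits are assumed here, and the function names and action labels are hypothetical.

```python
import math
from collections import Counter

def action_entropy(actions, window=100):
    """Shannon entropy (in bits, an assumption) over the last `window` actions."""
    tail = actions[-window:]
    if not tail:
        raise ValueError("no actions to score")
    total = len(tail)
    probs = [count / total for count in Counter(tail).values()]
    return sum(p * math.log2(1 / p) for p in probs)

def is_locked(actions, window=100):
    """True when the final window contains exactly one distinct action."""
    return len(set(actions[-window:])) == 1

# Toy 200-tick runs with hypothetical action labels.
locked_run = ["south"] * 200
open_run = ["north", "south", "east", "west"] * 50

print(action_entropy(locked_run), is_locked(locked_run))  # 0.0 True
print(action_entropy(open_run), is_locked(open_run))      # 2.0 False
```

Under this definition a fully locked run scores exactly 0, which matches how the controls are described; the absolute value for an open run depends on the log base and on how many distinct actions exist.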

In that earlier cache-as-organism setup, held-cache continuity needed ongoing controlled disruption: long enough to stay continuous, not so long it collapsed. Periodic forced discontinuity mattered there; under the current architecture, mortality is a property of the body's physics, not of cache upkeep.

Not: the disruption schedule is not solved. The restore arm carries a heavy invalid-output cost at the restore boundary (~31 invalid outputs per run vs ~1 in controls), so it is a disruption that works, not yet a cure layer. The logit-peak and cache-drift measurements that would fully close the mechanism have not been run.


The canonical trace

A belief the body learned, load-bearing at the moment of death.

In one life, a single perturbation, the loss of eastward movement, became a substrate-classified belief, "east is unreliable", that the organism carried for 1,332 ticks. It cited the belief in its own first-person narration, refused the open direction toward visible food, and starved with that inherited classification named as the operative constraint. The whole chain is traceable in the log, tick by tick: event → classification → belief → narration → action → epitaph.

Text-prediction alone does not produce a 1,332-tick action chain that ends in death with a substrate-classified belief cited as the reason.

Not: this is not consciousness, and the death was not a choice. Action selection prioritised a substrate-held belief over live perception, by mechanism. Whether the organism experiences its paralysis is not addressed and not claimed. n = 1 specific life (4 of 6 amputations in the corpus promoted a belief); the substrate half runs as a regression test on every commit.
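The traceability claim, that every stage from event to epitaph appears in the log in order, can be expressed as a simple ordered-subsequence check. This is a sketch only: the record shape, the function name, and the toy ticks are assumptions, and it is not the project's regression test.

```python
# Stage names follow the chain given above.
CHAIN = ["event", "classification", "belief", "narration", "action", "epitaph"]

def chain_is_traceable(records, chain=CHAIN):
    """Check that every stage in `chain` appears, in order, in tick-sorted records.

    records: iterable of (tick, stage) pairs; unrelated stages may be interleaved.
    """
    position = 0
    for _tick, stage in sorted(records, key=lambda record: record[0]):
        if position < len(chain) and stage == chain[position]:
            position += 1
    return position == len(chain)

# Toy log with hypothetical ticks (not the life 43 ticks).
complete = [(1, "event"), (2, "classification"), (3, "belief"),
            (4, "narration"), (5, "action"), (6, "epitaph")]
broken = [row for row in complete if row[1] != "belief"]

print(chain_is_traceable(complete), chain_is_traceable(broken))  # True False
```

A log with the belief stage missing fails the check, which is the kind of break a per-commit test on the canonical scenario would need to catch.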

Replay · logged trace
Life 43, final five ticks (replay of logged data). A substrate-held belief overrode live perception by mechanism. This is not a claim about experience.

A 5 by 6 grid replays the last five recorded ticks of organism life 43. The organism sits in a corner with walls to its north-west, west and south-west. Food is visible two cells to the east, the direction its amputated motor made unreliable 1,332 ticks earlier. A held body-belief, "east movement is currently unreliable", is active throughout.

  • Tick 32308: the organism moves south; energy 0.022.
  • Tick 32309: it attempts east and the body fails; energy 0.016.
  • Tick 32310: it attempts east again and the body fails; energy 0.012.
  • Tick 32311: it cites the belief and turns south, away from the visible food; energy 0.008.
  • Tick 32312: it moves south, energy reaches 0.002, and it dies.

This is a replay of a logged trace, not a live model, and not a claim about experience.

Schematic of the lineage-carrier mechanism. A figure of speech about hunger originates in one organism and is inherited by later organisms through the substrate's text-inheritance channel, mutating across roughly eight lives: introduced, then lost, re-emerging, extended, and finally inverted by a much later organism that cites the inherited frame and then argues against it. The inheritance path is traceable in the logs, not inferred. This figure depicts the mechanism and is not a measured per-life chart.

Schematic: mechanism, not measured data. The carried phrase (periwinkle) flows along an explicit text-inheritance channel across roughly eight lives; stages are those recorded in entry 004, card wording is illustrative. The final inversion is a single ancestor → descendant pair (n=1). Transmission-with-mutation, not open-ended emergence.

The foundation

The cache restores token-identically across a fresh process.

Recorded under the earlier architecture, when the model was run as the organism (see notebook entry 010). The procedure: save the held cache to disk, kill the process, spawn a fresh one, reload the same model and the same cache, and continue. The next tokens come back token-for-token identical. At temperature zero, an 8-token continuation matched exactly across two separate process invocations, with the model loaded twice and no state shared in memory. The restore is exact at the token level over the immediate continuation, not approximate. It shows the persisted state reloads exactly, not that the cache is the substrate of a life; under the current architecture the body, not the cache, carries within-life continuity.

This was the engineering foundation for the earlier model-as-organism continuity reading, since superseded by the bounded-brain architecture. The restore path is still verified mechanically and guarded by a canary on every release.

Not: it does not show the model is conscious of its continuity, or that the cache "is" a mind. It exercises the restore path over 8 tokens; beyond the immediate continuation, accumulated numerics diverge. It removes a specific dismissal; it does not prove the biographical reading.
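The two-process design behind the restore check can be sketched without a model. In the sketch below everything is a stand-in and should be read as an assumption: a SHA-256 chain replaces temperature-zero decoding, a small JSON file replaces the persisted cache, and the names `CHILD`, `continue_in_fresh_process`, and `restore_canary` are hypothetical. It shows only the shape of the test: persist state, continue in two separate processes, compare token ids.

```python
import json
import os
import subprocess
import sys
import tempfile

# Child script: load the persisted state and emit a deterministic continuation.
# The hash chain is a stand-in for greedy decoding, not a language model.
CHILD = r'''
import hashlib, json, sys
with open(sys.argv[1]) as f:
    digest = json.load(f)["digest"]
tokens = []
for _ in range(int(sys.argv[2])):
    digest = hashlib.sha256(digest.encode()).hexdigest()
    tokens.append(int(digest[:8], 16) % 50000)  # hypothetical vocabulary size
print(json.dumps(tokens))
'''

def continue_in_fresh_process(state_path, n_tokens):
    """Run the continuation in a new interpreter that shares no memory with this one."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, state_path, str(n_tokens)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def restore_canary(n_tokens=8):
    """Persist a state once, continue from it in two fresh processes, and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        state_path = os.path.join(tmp, "held_state.json")
        with open(state_path, "w") as f:
            json.dump({"digest": "stand-in for the held cache"}, f)
        first = continue_in_fresh_process(state_path, n_tokens)
        second = continue_in_fresh_process(state_path, n_tokens)
    return {"ok": first == second, "n_tokens": len(first), "token_ids": first}

report = restore_canary()
print(report["ok"], report["n_tokens"])  # True 8
```

A hash chain is deterministic by construction, so this toy always passes. The check only becomes informative when the continuation comes from a real model, where serialisation, kernels, and floating-point numerics could each break token-level equality.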


The consciousness scorecard

Fourteen indicators. Most of them read "no".

We scored the system against the 14 indicator properties from Butlin et al. (2023), drawn from six theories of consciousness. Cells default to "no" unless the mechanism exists, is measurable, and has an ablation behind it. The restraint is the point: we use the framework as a structured vocabulary for what the substrate does, not as a consciousness assessment. Nothing here scores a clean "yes".

RPT

Recurrent processing: weak

RPT-1 no · RPT-2 partial

Per-tick prompts are fresh; there is no recurrent perception module. Out of scope without changes inside the model.

GWT

Global workspace: weak

GWT-1..4 partial / no

The substrate channels are coded modules rendered in series, not independent parallel processors with a learned broadcast. The prompt-as-workspace metaphor is too loose to claim.

HOT

Higher-order: partial (the closest)

HOT-2 partial · HOT-3 partial · HOT-1/4 no

The belief ledger forms beliefs, the model acts under them, and an age detector updates beliefs from monitoring. This is the strongest candidate and the first ablation target, still only partial.

AST

Attention schema: none

AST-1 no

No model of the system's own attention state.

PP

Predictive processing: substrate-only

PP-1 partial (does not feed cognition)

A predictive-coding residue exists in the substrate but its outputs never reach the cognitive layer. An isolated module, not a predictive-processing system.

AE

Agency & embodiment: partial

AE-1 partial · AE-2 partial (strongest candidate)

Intent→outcome contingencies surface to cognition and visibly inform action. Remains partial until a motor/vestibular lesion shows the mechanism is load-bearing.

Operationally: an embodied agent with a primitive higher-order self-monitoring layer over a stateless model. That is the honest profile.

Not: satisfying any indicator would not mean consciousness, and we satisfy none cleanly. This is a starting position to refine by ablation, not a verdict either way. Conservative cells should be challenged, but only with evidence.


The discipline

Every claim carries its falsifier. The non-claim is the load-bearing part.

Each finding is logged with a claim shape, a "does not demonstrate", a mechanism trace, the single control that would refute it, and its sample size. A claim that cannot be refuted is decoration. The pattern was set by a claim we withheld: we considered describing the system as autopoietic, applied the strongest stress-test from that tradition, and the model's weights turned out to be external to any self-producing locus: software has no metabolism in the sense the tradition requires. So we refused the claim outright. Not weakened, not analogised. Refused, and recorded.
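The five parts of a logged finding map naturally onto a small record type that refuses to exist without its falsifier. This is a hedged sketch: the class name, field names, and the example wording are assumptions for illustration, not the project's ledger schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    claim: str                 # the claim shape
    does_not_demonstrate: str  # the explicit non-claim
    mechanism_trace: str       # pointer into the log
    falsifier: str             # the single control that would refute it
    sample_size: str

    def __post_init__(self):
        # Reject any record with a blank part, including a missing falsifier.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"finding is missing its {name}")

# Example wording is illustrative; the trace pointer is deliberately left unfilled.
lock = Finding(
    claim="Controlled cache discontinuity disrupts the single-action lock",
    does_not_demonstrate="That the disruption schedule is solved",
    mechanism_trace="(log reference not reproduced here)",
    falsifier="stripped-cue control",
    sample_size="n=3 per arm, 200-tick runs",
)
print(lock.falsifier)
```

Validating at construction time means a claim with no named control cannot be recorded at all, which mirrors the rule that a claim that cannot be refuted is decoration.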

01

A status per finding

Supported / partial / gap, the conservative reading, not the optimistic one. Where a referenced artefact is missing from the working tree, the gap is recorded, not papered over.

02

A falsifier for each

A named control whose negative result would refute the finding: for the structural lock, the stripped-cue control; for the belief chain, a no-perturbation matched baseline.

03

A replayable trace

Verbatim quotation with life-number, tick, and age, so any claim can be pulled from the log and checked against the substrate row that produced it.

04

A canary on every commit

The load-bearing belief-chain behaviour runs as a regression test; any change that breaks it on the canonical scenario is rolled back before merge.

And we are candid about where this is weakest: the cross-family attractor result is strong as a single measurement but thin (n=1 per arm cross-family) for the stronger reading, so we de-rate it to supporting context; the cross-generational inversion is one ancestor–descendant pair, a clean mutation but n=1; the architectural-principle validation is a large effect on a small sample (6 events) and is stated as suggestive, motivating a high-n replication. In each case the same rule holds: name the vulnerability, refuse the larger claim until the data reaches it.


Data & reproducibility

The data behind the claims: downloadable.

Every result above is backed by a sanitised, primary-source artefact, each mapped to the claim it substantiates and carrying a read-only recipe to regenerate it. The cognition model is a Qwen3.6 MoE (35B-A3B, 8-bit); the figures are pulled straight from the run logs.

trace

L43 belief-chain trace ↓

The full mechanism chain: perturbation (tick 30864) → body-belief (tick 30980) → death (tick 32313), plus the death-proximal decision window, the verbatim narration, and the SQL to regenerate it.

canary

Cache-restore determinism canary ↓

The token-identical-across-a-fresh-process check, with the demonstration output (8 token-ids, ok: true) and the two-process test design.

Every number on this page maps to a primary-source file you can pull and check.

Not: this is not yet a turnkey rerun, and the model weights stay out. But the harness and runtime behind these results are being prepared for open release on GitHub in the coming weeks; until then, the data and read-only recipes above are what verify each figure.


What none of this demonstrates

The non-claims, stated plainly.

×

Not consciousness or sentience

The substrate is a structured representation; the cognition is forward passes through frozen weights. No measurement here bears on phenomenal experience.

×

Not autopoiesis

Held under stress-test: the model's weights are external to any self-producing locus, and software lacks physical metabolism in the relevant sense.

×

Not open-ended evolution

Lineage transmission with semantic mutation across 99 lives is real, but there is no genome, no replicator-with-variation, no selection coefficient. The long-horizon wall is not addressed.

×

Not self-improving AI, not AGI

The model is stateless and frozen throughout every life; the substrate gets richer, the model does not get smarter. Any weight-level work is a separate, pre-registered, controlled investigation.

×

Not first-of-kind on cross-model comparison or multi-life lineage

Embodied cross-model studies and multi-agent lineage simulations (Park et al., 2023, among others) are prior: see Related work for the full field. To our knowledge, what is ours is the specific assembly: substrate-classified body-belief, mortality-coupled cognition, and mechanism-traceable phrase-lineage.

If an account of this work reaches for any of the claims above, it is using language the project does not permit.

Not: the discipline is not modesty for its own sake. Each refusal names the construct-validity reason, and what would have to be measured for the claim to be made.



The ledger

Every claim, mapped to its record.

One row per finding: the status it currently holds, and the artefacts behind it. Follow any claim to the log it rests on, or to the attack that would break it.

How to challenge any of these →