Method
The strongest claim that survives falsification.
Studying a bounded model wired as a brain inside a mortal body is easy to oversell and hard
to keep honest. So the discipline comes first, before any result: what each claim is allowed
to say, what it is not allowed to say, and what has to survive a control before it is written
down at all.
This page is the firewall: the part of the work that makes the rest worth reading.
The rule we work to
One sentence carries the load.
Every contributor (and there are software agents among them) works to the same line:
Make the strongest claim that survives falsification, then build the next test.
Not artificial-life rhetoric. A claim built to be attacked, pitched exactly as high as the
evidence will hold and no higher. Stated that way, it is much harder to dismiss, and much
harder for us to fool ourselves with.
The firewall
Three tiers. Claims do not leak upward.
The work lives in three tiers with a hard wall between them. Evidence, claims, and language
from a lower tier are never spent on a higher one. When in doubt, we drop down a tier.
v1
Ships: evidence today
Substrate-state cognition
The harness as it stands: a continuous, mortal body governed by its own physics,
coupled to a bounded frozen model that reads a compact summary of the body's state and
proposes only small parameter nudges (a lean toward repair, or toward seeking food).
The body lives or dies on its own physics; the brain nudges but cannot move, eat, or
act directly. Its claims are mechanically traceable (substrate event to classified
body-state summary to model read to bounded parameter nudge to the body acting on its
own physics) and reproducible under deterministic restore. This is the only language
allowed in public copy without qualification.
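A minimal Python sketch of that trace and of deterministic restore, under stated assumptions. The `Body` class, its toy energy dynamics, the `energy:low` label, and the `frozen_brain` stand-in are all hypothetical and are not the harness itself. The sketch shows only the shape of the chain (classified summary, model read, bounded nudge, body physics) and that replaying from a snapshot reproduces the same trace.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Body:
    """Toy mortal body. Every name and number here is a hypothetical stand-in."""
    energy: float = 10.0
    repair_bias: float = 0.5  # the one parameter the brain may lean on
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def summary(self) -> str:
        # Classified body-state summary: the only thing the brain reads.
        return "energy:well_fed" if self.energy > 5.0 else "energy:low"

    def step(self) -> None:
        # The body's own physics: a noisy cost, softened by the repair bias.
        cost = self.rng.uniform(0.5, 1.5)
        self.energy -= cost * (1.0 - 0.5 * self.repair_bias)

    @property
    def alive(self) -> bool:
        return self.energy > 0.0


def frozen_brain(summary: str) -> float:
    # Stand-in for the frozen model: a fixed map from summary to a small lean.
    return 0.05 if summary == "energy:low" else -0.05


def run(body: Body, steps: int) -> list[tuple]:
    """Advance the body and record one traceable row per step."""
    trace = []
    for t in range(steps):
        if not body.alive:
            break
        read = body.summary()                 # body-state summary
        nudge = frozen_brain(read)            # model read -> proposed nudge
        body.repair_bias = min(1.0, max(0.0, body.repair_bias + nudge))
        body.step()                           # the body acts on its own physics
        trace.append((t, read, nudge, round(body.energy, 6)))
    return trace


body = Body()
run(body, 3)                                  # live for a few steps
snapshot = copy.deepcopy(body)                # state and RNG position together
first = run(body, 5)
second = run(copy.deepcopy(snapshot), 5)      # restore, then replay
assert first == second                        # same snapshot, same trace
```

The snapshot has to include the random generator's position as well as the body's state. Restoring the state alone would let the two runs diverge on the first noisy step.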
v2
Builds: in development
The same principle, richer substrates
An engineering roadmap that carries the v1 pattern onto new substrates: webcam-on-laptop
rather than a grid, place-sense, person-sense, sleep-time memory organisation. The
substrate gets richer; the model does not get smarter. v2 claims no new cognitive
capacity, and carries an "in development" label until each step passes its own test.
v3
Investigates: pre-registered
A weight-level research bet
A pre-registered investigation into whether bounded sleep-time weight updates on
accumulated substrate experience produce measurable, control-separable learning. It is
a question, not a capability. Presenting a v3 hypothesis as a project capability before
its controls have spoken is the largest violation the firewall recognises.
v1 ships. v2 builds. v3 investigates.
Not: citing a v1 result as if it proves a v3 capability, or dressing a v2 engineering goal
as a v1 finding. The wall is the point.
The principle
The body acts. The brain only leans.
One division of labour holds the architecture together. The body is a continuous, mortal
substrate that lives or dies on its own physics: it detects events, accumulates state, and
governs movement, energy, and survival directly, emitting structured, classified
labels into a channel the brain reads. The frozen model is a bounded brain that reads
that compact summary of the body's state and can only propose small parameter nudges (a lean
toward repair, or toward seeking food). It cannot move, eat, or act directly, and it does not
author the organism's voice or action.
Allowed, the body emitting classified state for the brain to read:
continuity_strain=high, energy:well_fed,
motor.east:active:east_unreliable.
Forbidden, describing the model as the living organism, or letting the
brain reach past its narrow channel to move, eat, or act directly rather than propose a
bounded nudge. That collapses the body/brain division and credits the model with
maintaining itself rather than helping the body survive. It is the canonical violation.
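The narrow channel can be sketched as an interface, assuming Python and names that are purely illustrative (`Nudge`, `MAX_NUDGE`, `Body.accept`, and the parameter names `repair` and `seek_food` are hypothetical). The brain returns a proposal and nothing else, and the body bounds that proposal itself, so even a model that over-asks can only lean.

```python
from dataclasses import dataclass

MAX_NUDGE = 0.05  # hypothetical bound on any single lean


@dataclass(frozen=True)
class Nudge:
    """The only thing the brain may return: small leans on named parameters."""
    repair: float = 0.0
    seek_food: float = 0.0


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


class Body:
    def __init__(self) -> None:
        self.params = {"repair": 0.5, "seek_food": 0.5}

    def summary(self) -> tuple[str, ...]:
        # Classified labels only; the brain never sees raw state.
        return ("continuity_strain=high", "energy:well_fed")

    def accept(self, nudge: Nudge) -> None:
        # The body bounds every proposal before it touches a parameter.
        for name in ("repair", "seek_food"):
            delta = clamp(getattr(nudge, name), -MAX_NUDGE, MAX_NUDGE)
            self.params[name] = clamp(self.params[name] + delta, 0.0, 1.0)


def brain(summary: tuple[str, ...]) -> Nudge:
    # Stand-in for the frozen model. It deliberately over-asks here.
    if "continuity_strain=high" in summary:
        return Nudge(repair=10.0)
    return Nudge()


body = Body()
body.accept(brain(body.summary()))
# The oversized request is cut down to one bounded lean toward repair.
assert abs(body.params["repair"] - 0.55) < 1e-9
```

Nothing in this interface gives the brain a way to move, eat, or act: it has no handle on the body beyond the labels it reads and the `Nudge` it returns.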
The principle was found, not assumed.
Not: a design preference. With the model demoted to a bounded brain that only nudges, its
learned prior is load-bearing for the body's survival, beating all four controls across two
model families. That is not life, consciousness, or self-maintenance. It is one survival
task on two models, with no reproduction or evolution, and it is what the evidence forced.
What we do not claim
The non-claims are doctrine, not modesty.
These are stated as plainly as the claims. If a piece of copy reaches for any of them, it is
wrong and gets dropped a tier.
It runs a structured substrate through a model doing forward passes.
Not: consciousness, sentience, or phenomenal experience. Neither layer implies any of these, and we never write that it feels, experiences, or is aware.
It maintains a self-model and behaves with continuity across a life.
Not: autopoiesis, and not "alive." Put to an external stress-test, the claim was refused: software runs on hardware it does not itself produce and has no physical metabolism. Boden and Pattee draw that ceiling, and we keep it.
A phrase mutated and inverted as it passed down a lineage of lives.
Not: synthetic life in the substrate-emergence sense, and not open-ended evolution. This is striking heritable transmission, not the unbounded novelty that tradition means by the words.
v3 asks whether experience can shift the weights.
Not: self-improving AI, continual learning, biological learning, or a path to AGI. "Self-improving" is barred from v3 work until an outside reviewer applies it to a published study, if ever.
Controls before conclusions
A striking output is a suspect, not a result.
The model gets more rhetorically convincing long before any underlying claim is earned. The
method is built to distrust that feeling. Every result is pre-registered with explicit
falsifiers and anchored to the exact version that produced it; every empirical document
carries a "what this does not demonstrate" section. These are the checks that turn an
output into evidence.
01
Matched controls
A weight-level result must beat a control trained on the same volume of data, from the same window, differing only in content. Without it, any gain can be argued away as training volume rather than substrate signal.
02
Pre-registered falsifiers
The conditions under which a hypothesis is refuted are written down before the run, including a threshold below which the whole bet is abandoned. The analysis plan is fixed in advance, against unconscious metric-shopping.
03
Mechanism over rhetoric
More compelling-sounding text is explicitly insufficient. A result must move a hard, substrate-grounded metric on held-out states, and survive evaluation with the substrate removed and with it shuffled. Reading well is not enough.
04
The negative-result paper, pre-written
For the weight-level bet, the paper describing each way it could fail is drafted before the study runs. If we cannot write that paper first, we are not ready to run the study. Either outcome is publishable.
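The first two checks can be sketched together, assuming Python and hypothetical names throughout (`Dataset`, `Prereg`, `is_matched`, `verdict`, and every threshold value are illustrative, not the project's analysis plan). The plan is frozen before the run, the control must match the treatment in volume and window while differing in content, and the verdict is read off the pre-registered thresholds rather than chosen afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    n_tokens: int                 # training volume
    window: tuple[int, int]       # (start, end) of the collection window
    content_id: str               # what the data actually contains


@dataclass(frozen=True)
class Prereg:
    """Written down before the run; frozen so it cannot be edited afterwards."""
    min_gain: float               # gain over control below this refutes the hypothesis
    abandon_below: float          # gain below this abandons the whole bet


def is_matched(treatment: Dataset, control: Dataset) -> bool:
    # Same volume, same window, differing only in content.
    return (treatment.n_tokens == control.n_tokens
            and treatment.window == control.window
            and treatment.content_id != control.content_id)


def verdict(plan: Prereg, treatment: Dataset, control: Dataset,
            treatment_score: float, control_score: float) -> str:
    if not is_matched(treatment, control):
        return "invalid: control not matched"
    gain = treatment_score - control_score
    if gain < plan.abandon_below:
        return "abandon"
    if gain < plan.min_gain:
        return "refuted"
    return "survives"


plan = Prereg(min_gain=0.02, abandon_below=0.0)
substrate = Dataset(1000, (0, 100), "substrate")
shuffled = Dataset(1000, (0, 100), "shuffled")
smaller = Dataset(500, (0, 100), "shuffled")

assert verdict(plan, substrate, shuffled, 0.60, 0.50) == "survives"
assert verdict(plan, substrate, shuffled, 0.51, 0.50) == "refuted"
assert verdict(plan, substrate, shuffled, 0.49, 0.50) == "abandon"
assert verdict(plan, substrate, smaller, 0.60, 0.50) == "invalid: control not matched"
```

An unmatched control returns no verdict at all, because a gain measured against it could still be training volume rather than substrate signal.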
The "does not demonstrate" section is what makes the "does demonstrate" credible.
Not: a disclaimer bolted on at the end. If we cannot write the does-not section, we do not yet understand what we have shown. A cherry-picked compelling output is a demo, not evidence.
Why it is written down
Discipline that lives only in good intentions erodes. So it is a document with authority: any
contributor may halt work on a suspected violation, rollback is the conservative default, and
every striking result is treated as a suspect until its mechanism is traced. The understatement
is deliberate. On this subject, measured language is the more confident position.
Ash Hart