Method

The strongest claim that survives falsification.

Studying a bounded model wired as a brain inside a mortal body is easy to oversell and hard to keep honest. So the discipline comes first, before any result: what each claim is allowed to say, what it is not allowed to say, and what has to survive a control before it is written down at all. This page is the firewall: the part of the work that makes the rest worth reading.


The rule we work to

One sentence does the load-bearing.

Every contributor (and there are software agents among them) works to the same line:

Make the strongest claim that survives falsification, then build the next test.

Not artificial-life rhetoric. A claim built to be attacked, pitched exactly as high as the evidence will hold and no higher. Stated that way, it is much harder to dismiss, and much harder for us to fool ourselves with.


The firewall

Three tiers. Claims do not leak upward.

The work lives in three tiers with a hard wall between them. Evidence, claims, and language from a lower tier are never spent on a higher one. When in doubt, we drop down a tier.

v1

Ships: evidence today

Substrate-state cognition

The harness as it stands: a continuous, mortal body governed by its own physics, coupled to a bounded frozen model that reads a compact summary of the body's state and proposes only small parameter nudges (a lean toward repair, or toward seeking food). The body lives or dies on its own physics; the brain nudges but cannot move, eat, or act directly. Its claims are mechanically traceable (substrate event to classified body-state summary to model read to bounded parameter nudge to the body acting on its own physics) and reproducible under deterministic restore. This is the only language allowed in public copy without qualification.
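What "reproducible under deterministic restore" asks for can be shown with a toy. The sketch below is not the project's harness: `ToyBody`, its noisy metabolic cost, and the `energy:hungry` label are all assumptions made for illustration. It only shows that if a snapshot captures every piece of state that influences future steps, including the random generator, replaying from that snapshot yields the same trace.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyBody:
    """Hypothetical stand-in for the body; not the project's harness."""
    seed: int
    energy: float = 1.0
    trace: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # The body owns its own source of randomness.
        self.rng = random.Random(self.seed)

    def step(self) -> None:
        # Toy physics: a noisy metabolic cost drawn from the body's RNG.
        self.energy -= self.rng.uniform(0.01, 0.05)
        # Classified label (the "hungry" value is an assumed name).
        label = "energy:well_fed" if self.energy > 0.5 else "energy:hungry"
        self.trace.append((round(self.energy, 6), label))

    def snapshot(self) -> dict:
        # Capture everything that influences future steps, RNG state included.
        return {
            "energy": self.energy,
            "rng": self.rng.getstate(),
            "trace_len": len(self.trace),
        }

    def restore(self, snap: dict) -> None:
        self.energy = snap["energy"]
        self.rng.setstate(snap["rng"])
        del self.trace[snap["trace_len"]:]  # drop events after the snapshot


def run(body: ToyBody, steps: int) -> list:
    """Advance the body and return a copy of its full trace so far."""
    for _ in range(steps):
        body.step()
    return list(body.trace)


body = ToyBody(seed=7)
run(body, 5)
snap = body.snapshot()
first = run(body, 10)
body.restore(snap)
second = run(body, 10)
```

In this toy, `first` and `second` are identical. A restore that skipped the RNG state would restore the energy but not the trace.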

v2

Builds: in development

The same principle, richer substrates

An engineering roadmap that carries the v1 pattern onto new substrates: a laptop webcam rather than a grid, with place-sense, person-sense, and sleep-time memory organisation. The substrate gets richer; the model does not get smarter. v2 claims no new cognitive capacity, and carries an "in development" label until each step passes its own test.

v3

Investigates: pre-registered

A weight-level research bet

A pre-registered investigation into whether bounded sleep-time weight updates on accumulated substrate experience produce measurable, control-separable learning. It is a question, not a capability. Presenting a v3 hypothesis as a project capability before its controls have spoken is the largest violation the firewall recognises.

v1 ships. v2 builds. v3 investigates.

Not: a v1 result cited as if it proves a v3 capability, or a v2 engineering goal dressed as a v1 finding. The wall is the point.


The principle

The body acts. The brain only leans.

One division of labour holds the architecture together. The body is a continuous, mortal substrate that lives or dies on its own physics: it detects events, accumulates state, and governs movement, energy, and survival directly, emitting structured, classified labels into a channel the brain reads. The frozen model is a bounded brain that reads that compact summary of the body's state and can only propose small parameter nudges (a lean toward repair, or toward seeking food). It cannot move, eat, or act directly, and it does not author the organism's voice or action.

Allowed, the body emitting classified state for the brain to read: continuity_strain=high, energy:well_fed, motor.east:active:east_unreliable.

Forbidden, describing the model as the living organism, or letting the brain reach past its narrow channel to move, eat, or act directly rather than propose a bounded nudge. That collapses the body/brain division and credits the model with maintaining itself rather than helping the body survive. It is the canonical violation.
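The allowed and forbidden cases can be read as an interface constraint. A minimal sketch, assuming a hypothetical `Body.accept` gate, invented parameter names (`repair_lean`, `food_lean`), and an invented bound `MAX_NUDGE`: the brain can only hand over a `Nudge`, the body clamps it, and anything that targets state outside the narrow channel is refused.

```python
from dataclasses import dataclass

# Assumptions for illustration only: these names and the bound are hypothetical.
ALLOWED_PARAMS = {"repair_lean", "food_lean"}
MAX_NUDGE = 0.05


@dataclass(frozen=True)
class Nudge:
    """The only thing the brain may emit: a small proposed parameter change."""
    param: str
    delta: float


class Body:
    def __init__(self) -> None:
        self.params = {"repair_lean": 0.5, "food_lean": 0.5}
        self.position = 0  # changed only by the body's own physics

    def summary(self) -> list:
        # Classified labels in the style the body emits for the brain to read.
        return ["continuity_strain=high", "energy:well_fed"]

    def accept(self, nudge: Nudge) -> bool:
        """Apply a proposal if it stays inside the channel; clamp its size."""
        if nudge.param not in ALLOWED_PARAMS:
            return False  # e.g. an attempt to set position directly
        delta = max(-MAX_NUDGE, min(MAX_NUDGE, nudge.delta))
        value = self.params[nudge.param] + delta
        self.params[nudge.param] = max(0.0, min(1.0, value))
        return True


body = Body()
leaned = body.accept(Nudge("repair_lean", 10.0))   # oversized: clamped, then applied
reached = body.accept(Nudge("position", 1.0))      # outside the channel: refused
```

The design choice worth noting is that the clamp lives in the body, not the brain, so the bound holds whatever the model proposes.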

The principle was found, not assumed.

Not: a design preference. With the model demoted to a bounded brain that only nudges, its learned prior is load-bearing for the body's survival, beating all four controls across two model families. That is not life, consciousness, or self-maintenance; it is one survival task on two models, with no reproduction or evolution, and it is what the evidence forced.


What we do not claim

The non-claims are doctrine, not modesty.

These are stated as plainly as the claims. If a piece of copy reaches for any of them, it is wrong and gets dropped a tier.

It runs a structured substrate through a model doing forward passes.

Not: consciousness, sentience, or phenomenal experience. Neither layer implies any of these, and we never write that it feels, experiences, or is aware.

It maintains a self-model and behaves with continuity across a life.

Not: autopoiesis, and not "alive." Held under external stress-test, the claim was refused: software runs on hardware it does not itself produce and has no physical metabolism. Boden and Pattee draw that ceiling, and we keep it.

A phrase mutated and inverted as it passed down a lineage of lives.

Not: synthetic life in the substrate-emergence sense, and not open-ended evolution. This is striking heritable transmission, not the unbounded novelty that tradition means by the words.

v3 asks whether experience can shift the weights.

Not: self-improving AI, continual learning, biological learning, or a path to AGI. "Self-improving" is barred from v3 work until an outside reviewer applies it to a published study, if ever.


Controls before conclusions

A striking output is a suspect, not a result.

The model gets more rhetorically convincing long before any underlying claim is earned. The method is built to distrust that feeling. Every result is pre-registered with explicit falsifiers and anchored to the exact version that produced it; every empirical document carries a "what this does not demonstrate" section. These are the checks that turn an output into evidence.

01

Matched controls

A weight-level result must beat a control trained on the same volume of data, from the same window, differing only in content. Without it, any gain can be argued away as training volume rather than substrate signal.
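One way to make the matching condition checkable is to describe each training arm by its volume, its source window, and a fingerprint of its content. The `TrainingSet` fields below are assumptions for illustration, not the project's schema; the check simply encodes "same volume, same window, different content".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingSet:
    """Hypothetical description of one training arm."""
    n_tokens: int             # volume of data
    window: Tuple[int, int]   # (start_step, end_step) the data was drawn from
    content_hash: str         # fingerprint of the actual examples


def is_matched_control(treatment: TrainingSet, control: TrainingSet) -> bool:
    """True only when the two arms differ in content and nothing else."""
    return (
        treatment.n_tokens == control.n_tokens
        and treatment.window == control.window
        and treatment.content_hash != control.content_hash
    )


substrate_arm = TrainingSet(50_000, (1000, 2000), "a1f3")
matched_arm = TrainingSet(50_000, (1000, 2000), "9c0e")
bigger_arm = TrainingSet(80_000, (1000, 2000), "9c0e")
```

Here `matched_arm` qualifies as a control for `substrate_arm`, while `bigger_arm` does not, because a gain against it could be argued away as training volume.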

02

Pre-registered falsifiers

The conditions under which a hypothesis is refuted are written down before the run, including a threshold below which the whole bet is abandoned. The analysis plan is fixed in advance, against unconscious metric-shopping.
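A pre-registration of this kind can be held as an immutable record whose verdict is a pure function of the observed effect. This is a sketch under stated assumptions: the field names, the example metric `survival_steps`, and the numeric thresholds are invented here, and a real plan would carry more than two numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after registration
class Preregistration:
    metric: str
    refute_below: float    # an effect under this refutes the hypothesis
    abandon_below: float   # an effect under this abandons the whole bet

    def __post_init__(self) -> None:
        if self.abandon_below > self.refute_below:
            raise ValueError("abandon threshold must not exceed refute threshold")


def verdict(plan: Preregistration, observed_effect: float) -> str:
    """Map an observed effect onto the outcomes fixed before the run."""
    if observed_effect < plan.abandon_below:
        return "abandon"
    if observed_effect < plan.refute_below:
        return "refuted"
    return "survives"


plan = Preregistration(metric="survival_steps", refute_below=0.10, abandon_below=0.02)
outcomes = [verdict(plan, effect) for effect in (0.15, 0.05, 0.01)]
```

Because the thresholds are fixed in the record before any effect is observed, the verdict cannot be adjusted to fit the result, which is the point of writing them down first.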

03

Mechanism over rhetoric

More compelling-sounding text is explicitly insufficient. A result must move a hard, substrate-grounded metric on held-out states, and survive evaluation both with the substrate removed and with it shuffled, not merely read well.
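The removed and shuffled evaluations can be sketched as two extra scoring passes over the same held-out states. Everything concrete here is an assumption: the toy `policy`, the match-rate metric, and the labels and targets are placeholders, and the project's actual metric is not specified on this page. The sketch shows only the shape of the check: score with the summaries intact, with them permuted across states, and with them withheld.

```python
import random
from typing import Callable, Dict, List


def evaluate(policy: Callable[[str], str], summaries: List[str], targets: List[str]) -> float:
    """Fraction of held-out states where the policy's lean matches the target."""
    hits = sum(policy(s) == t for s, t in zip(summaries, targets))
    return hits / len(targets)


def control_scores(
    policy: Callable[[str], str],
    summaries: List[str],
    targets: List[str],
    seed: int = 0,
) -> Dict[str, float]:
    rng = random.Random(seed)
    shuffled = list(summaries)
    rng.shuffle(shuffled)              # break the pairing of summary and state
    removed = [""] * len(summaries)    # withhold the substrate signal entirely
    return {
        "intact": evaluate(policy, summaries, targets),
        "shuffled": evaluate(policy, shuffled, targets),
        "removed": evaluate(policy, removed, targets),
    }


def policy(summary: str) -> str:
    # Toy stand-in for the model: lean toward repair only under high strain.
    return "repair" if "continuity_strain=high" in summary else "seek_food"


summaries = ["continuity_strain=high", "energy:hungry",
             "continuity_strain=high", "energy:hungry"]
targets = ["repair", "seek_food", "repair", "seek_food"]
scores = control_scores(policy, summaries, targets)
```

A result that scores the same in all three conditions is not reading the substrate at all, however convincing its text.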

04

The negative-result paper, pre-written

For the weight-level bet, the paper describing each way it could fail is drafted before the study runs. If we cannot write that paper first, we are not ready to run the study. Either outcome is publishable.

The "does not demonstrate" section is what makes the "does demonstrate" credible.

Not: a disclaimer bolted on at the end. If we cannot write the does-not section, we do not yet understand what we have shown. A cherry-picked compelling output is a demo, not evidence.


Why it is written down

Discipline that lives only in good intentions erodes. So it is a document with authority: any contributor may halt work on a suspected violation, rollback is the conservative default, and every striking result is treated as a suspect until its mechanism is traced. The understatement is deliberate. On this subject, measured language is the more confident position.

Ash Hart
