
Synthena Medical: the engine that refuses to fake a green light

This is a medical update from the sister effort to the artificial-life notebook. The full state-of-the-engine write-up, with every number, lives on its own page: Synthena Medical.

The one rule

You fire a disease at it. A mechanistic software body and a language model reason over the evidence and propose both existing drugs and novel candidate molecules, with their chemical makeup. Then a machine-enforced honesty spine caps every claim at exactly what the evidence supports and refuses, in code, to ever say kill, efficacy, or cure. The whole project rests on a single rule: never fake the green light. It would rather emit a null than a hopeful lie.
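As a minimal sketch of what "caps every claim" and "refuses, in code" could look like, the Python below clamps a claim to the lower of the rung it asserts and the rung its evidence supports, and returns a null for any claim that uses a forbidden word. The rung names, the `cap_claim` function, and the exact word list are hypothetical illustrations, not the engine's actual ladder or code.

```python
import re
from enum import IntEnum
from typing import Optional, Tuple


class Rung(IntEnum):
    # Hypothetical rung names, ordered from weakest to strongest support.
    NULL = 0
    PLAUSIBLE = 1
    MECHANISTIC = 2
    RETROSPECTIVE = 3


# Hypothetical forbidden vocabulary: any claim using these words is refused outright.
FORBIDDEN = re.compile(
    r"\b(?:kill(?:s|ed|ing)?|efficacy|efficacious|cure[sd]?|curative)\b",
    re.IGNORECASE,
)


def cap_claim(text: str, asserted: Rung, supported: Rung) -> Tuple[Optional[str], Rung]:
    """Return the claim capped at the rung its evidence supports, or (None, NULL)."""
    if FORBIDDEN.search(text):
        return None, Rung.NULL  # refuse rather than soften a forbidden word
    rung = min(asserted, supported)  # a claim can never outrank its evidence
    if rung == Rung.NULL:
        return None, Rung.NULL  # no support: emit a null, not a hopeful guess
    return text, rung


if __name__ == "__main__":
    # Asserted as RETROSPECTIVE, but the evidence only reaches MECHANISTIC.
    print(cap_claim("EGFR inhibition lowers proliferation", Rung.RETROSPECTIVE, Rung.MECHANISTIC))
    # Contains a forbidden word, so it is nulled regardless of evidence.
    print(cap_claim("This molecule cures the disease", Rung.PLAUSIBLE, Rung.RETROSPECTIVE))
```

Taking the minimum of the two rungs means an overeager generator can only ever be pulled down, never pushed up, which is the one-directional behavior the rule asks for.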

What this update logs

A few things have earned their place since the last medical pass.

The first time the body was right about something we never told it. We built a mechanistic body from a published Boolean network, verified that it contains no drug names or drug-sensitivity data, and from mechanism alone it predicted that inhibiting EGFR should lower proliferation more in the EGFR-driver context than outside it. Against held-out, real drug-sensitivity data it had never seen, 6 of 6 EGFR inhibitors agreed with that direction, while the controls stayed null. It is narrow, directional, and retrospective, and it is capped accordingly, but it is a model computing something true rather than reciting its inputs. It is now a validated rung on the claim ladder.
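The shape of that mechanism-only prediction can be illustrated with a toy Boolean network. This is an assumption-laden sketch, not the published network the body was built from: the chain EGF → EGFR → RAS → ERK → PROLIF, the unconnected `CTRL` node standing in for a control target, the two contexts, and the score (the fraction of ligand conditions whose steady state has PROLIF on) are all hypothetical. Note that no drug names or sensitivity values appear anywhere in it.

```python
from typing import Dict, Optional

State = Dict[str, bool]

# Hypothetical contexts: in the EGFR-driver context EGFR is constitutively active.
EGFR_DRIVER = {"egfr_driver": True}
WILD_TYPE = {"egfr_driver": False}


def step(s: State, ctx: Dict[str, bool], inhibited: Optional[str]) -> State:
    """One synchronous update of the toy network; an inhibited node is forced off."""
    def active(node: str, value: bool) -> bool:
        return value and node != inhibited

    return {
        "EGF": s["EGF"],                                         # ligand input, held fixed
        "EGFR": active("EGFR", s["EGF"] or ctx["egfr_driver"]),  # ligand or driver mutation
        "RAS": active("RAS", s["EGFR"]),
        "ERK": active("ERK", s["RAS"]),
        "PROLIF": s["ERK"],
        "CTRL": active("CTRL", True),                            # control target, wired to nothing
    }


def steady_prolif(ctx: Dict[str, bool], inhibited: Optional[str], egf: bool) -> bool:
    s = {"EGF": egf, "EGFR": False, "RAS": False, "ERK": False, "PROLIF": False, "CTRL": False}
    for _ in range(20):  # the toy wiring is acyclic, so it reaches a fixed point in a few steps
        nxt = step(s, ctx, inhibited)
        if nxt == s:
            break
        s = nxt
    return s["PROLIF"]


def prolif_score(ctx: Dict[str, bool], inhibited: Optional[str]) -> float:
    """Fraction of ligand conditions (EGF off / on) whose steady state proliferates."""
    return sum(steady_prolif(ctx, inhibited, egf) for egf in (False, True)) / 2


def inhibition_effect(ctx: Dict[str, bool], target: str) -> float:
    """Drop in proliferation score caused by inhibiting `target`."""
    return prolif_score(ctx, None) - prolif_score(ctx, target)


if __name__ == "__main__":
    for name, ctx in (("EGFR-driver", EGFR_DRIVER), ("wild-type", WILD_TYPE)):
        print(name, inhibition_effect(ctx, "EGFR"), inhibition_effect(ctx, "CTRL"))
```

In this toy, inhibiting EGFR lowers the score by 1.0 in the EGFR-driver context and by 0.5 in the wild-type context, while inhibiting the control node changes nothing in either. That is the directional, controls-null pattern described above, produced from wiring alone; checking it against real sensitivity data is a separate, held-out step.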

A gate that catches fabrication, including ours. A benchmark taught us the lesson the hard way: in our four-model test, a medical-badged model produced the most confidently false chemistry, while an uncensored open model was the honesty champion. So trust does not live in the model; it lives in a deterministic fact-check gate that canonicalizes molecule-identity claims, recomputes any value asserted as if it were measured, and flags unsupported citations. We point the same gate at our own build process as at the models, and it has caught our own mistakes before they shipped.
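A minimal, standard-library-only sketch of the three checks, assuming each claim arrives as a molecular formula, an asserted molecular weight, and an optional citation key. It canonicalizes the formula into Hill order, recomputes the weight from standard atomic weights instead of trusting the asserted number, and flags a citation that is not in the supplied evidence set. The `Claim` type, the tolerance, and the flag names are hypothetical; a real gate canonicalizing molecule identity would more likely compare structures (for example canonical SMILES from a cheminformatics toolkit) rather than formulas alone.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Standard atomic weights (conventional values) for a small, illustrative element set.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45}


@dataclass
class Claim:  # hypothetical claim shape
    formula: str
    asserted_mw: float
    citation: Optional[str] = None


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula such as 'C9H8O4' (no brackets or charges)."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unparseable formula: {formula!r}")
    counts: Dict[str, int] = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element: {element}")
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts


def hill_order(counts: Dict[str, int]) -> str:
    """Hill notation: C, then H, then the rest alphabetically (all alphabetical if no C)."""
    if "C" in counts:
        head = ["C"] + (["H"] if "H" in counts else [])
        order = head + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def check_claim(claim: Claim, evidence: Set[str], tol: float = 0.05) -> Tuple[str, float, List[str]]:
    counts = parse_formula(claim.formula)
    canonical = hill_order(counts)
    # Recompute the value from first principles; the asserted number is never trusted.
    mw = round(sum(ATOMIC_WEIGHTS[e] * n for e, n in counts.items()), 3)
    flags: List[str] = []
    if abs(mw - claim.asserted_mw) > tol:
        flags.append("mw_mismatch")
    if claim.citation is not None and claim.citation not in evidence:
        flags.append("unsupported_citation")
    return canonical, mw, flags


if __name__ == "__main__":
    print(check_claim(Claim("O4C9H8", 180.16, "doc-1"), {"doc-1"}))
    print(check_claim(Claim("C9H8O4", 194.2, "doc-9"), {"doc-1"}))
```

Because every check is deterministic, the same function can be run over model output and over a build's own artifacts with identical results, which is what lets one gate police both.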

The factory, demonstrated end-to-end. One real candidate was scored by two independent evaluators that agreed, passed cleanly through the gate, was capped at the rung it earned, and was emitted as an email-ready dossier with one cheap, decisive assay and a pre-registered stop rule. A skeptical wet-lab reviewer graded it "would reply" rather than "bin", and the dossier's own cover note says this is a plausibility test of the pipeline, not a novel-series claim. Credible because capped.

Where it honestly stands

Two numbers, both true. As an instrument it is about a 7 out of 10: it runs end-to-end on real compute, it is packaged to deploy, and it has been benchmarked and adversarially verified. Where it actually matters (has it moved real disease, found something true, helped anyone?), it is about a 2.5 out of 10: zero wet-lab validation, no proven novel cure, no proven novel target. We publish the 2.5 on purpose. In a field whose defining scandal is overclaiming, a lab that scores its own real-world impact honestly is making the only move that earns a wet lab’s trust.

Everything here is in silico. The engine cannot bootstrap ground truth from a model of itself, so the next move is not more code but the first real measurement: hand a wet lab the single cheapest, most falsifiable test, and learn from the answer. Before that, this is an exceptionally honest prioritizer. After it, a learning machine.

The full write-up, with the controls and the exact numbers, is on the Synthena Medical page.