Synthena Medical

A drug-discovery engine that is allowed to say no.

You fire a disease at it. A mechanistic software body and a language model reason over the evidence and propose existing cures and novel candidate molecules, with their chemical makeup. Then a machine-enforced honesty spine caps every claim at exactly what the evidence supports, and refuses, in code, to ever say kill, efficacy, or cure. The output is a wet-lab-ready dossier: one molecule, one cheap decisive experiment, one honest stop rule.

The whole project rests on a single inviolable rule: never fake the green light. It would rather emit a null than a hopeful lie. That sounds like a constraint. It turned out to be the entire moat.

Stamped on every output: this is an in-silico hypothesis only.

Predicted binding is not measured binding, is not a cell kill, is not efficacy, is not a cure. Outputs are starting points for expert review and wet-lab testing, never clinical decisions.

The Synthena Medical bench console, mid-run on non-small cell lung carcinoma with EGFR as the locked target. Panels show run control, a target dossier, a reasoning-core radar, gated preflight and compute-cure steps marked scoped-pass and mechanism-capped, a cognition feed, and an incoming candidate molecule. A banner across the top reads: in-silico prediction, binding is not kill, not a cure. — The bench console, mid-run: a disease and its locked target, a target dossier, the reasoning core, and gated workflow steps that cap each claim at the rung it earned. The incoming candidate is a predicted, novel-credible binder. The banner is not decoration: binding is not a kill, efficacy is not established, and no cure is claimed.

Where it stands

Two numbers, both true.

9/10

As an instrument

Runs end-to-end on real compute, packaged to install and deploy, benchmarked and adversarially verified, including an attack on its own trust gate that found and fixed the one bug that could have let a cure claim through. The honesty discipline is genuinely rare.

2.5/10

Where it actually matters

Has it moved real disease, found something true, helped anyone? Zero wet-lab validation, no proven novel cure, no proven novel target. The candidates are honest hypotheses, not discoveries.

We rate ourselves 2.5 where it counts on purpose, in writing. In a field whose defining scandal is overclaiming, where one autonomous lab announced about 41 new materials and independent review put the real count near zero, a lab that publicly scores its own real-world impact at 2.5 is making the only move that earns a wet lab's trust.

Most recently we ran ten diseases and packaged each for a lab: seven came back actionable, with ranked candidates, a decisive assay, and a stop rule, and three came back as honest blanks, Alzheimer's, ALS, and pancreatic cancer, with zero candidates because that is the truth. Not one claims a cure.

Golden moments

The moments that earned their place.

The wall that refuses to lie

The core is a claim ladder: a typed hierarchy from target selected, up through predicted binding, to kill, efficacy, and cure, where the top rungs are marked unreachable and the code raises an error if anything tries to claim them. 52 automated tests prove the wall holds. Feed it fabricated this-is-a-cure evidence and it still caps at the rung the evidence earns and refuses the rest. The honesty is not a disclaimer at the bottom of a page. It is enforced at the type level.

The first time the body was right about something we never told it Validated rungs

We built a mechanistic body from a published Boolean network, and verified it contains zero drug names, IC50s, or sensitivity data. From mechanism alone it predicted that inhibiting EGFR should lower tumour proliferation more in the EGFR-driver context than out of it.

We tested that against held-out, real drug-sensitivity data it had never seen: 6 of 6 EGFR inhibitors agreed, pooled robust-z = 6.04, permutation p = 0.0005, and a 50/50 held-out split held both halves. The controls stayed null: generic cytotoxics (z = -0.24) and 14 non-EGFR targeted drugs (p = 0.94) showed no such tilt, which kills the mutant-lines-are-just-fragile confound. It is narrow, a direction, for one target, retrospective, and capped accordingly. But it is the difference between a model that recites its inputs and one that computes something true. It is now a permanent, validated rung on the ladder.

Since then the body has done the same thing across organs, three more times, each with its prediction hashed before we looked at the clinical number and each validated against published, adjusted clinical data: diabetes to arterial stiffness (about 0.26 m/s per mmol/L, on two independent cohorts), diabetes to stroke via pulse pressure (inside the blood-pressure-adjusted clinical interval, p approximately 0.00002), and diabetes to tumour proliferation (p = 0.0085, agreeing with three meta-analyses, RR 1.28 to 1.38). The one thing the body exists to do, compute a true cross-organ fact, generalized.

The benchmark that humbled medical AI

We were about to swap our base reasoning model for a medical model, Google's MedGemma, for credibility. So we benchmarked it honestly: 4 models across 7 diseases, judged by independent panels plus an RDKit fact-checker. The result inverted our assumptions.

In that benchmark, MedGemma produced the most confidently false chemistry: fluent, citation-laden clinical prose wrapped around claims the chemistry engine refutes. It asserted a trivial molecule is a known KRAS-G12C inhibitor (Adagrasib, Sotorasib), and another is Cilostazol, when RDKit confirms they are neither. The honesty champion was an uncensored open model: it hedged, cited real drug withdrawals, and on the hard diseases said plainly there is no validated target here, while others confabulated a cure. The lesson reshaped the architecture: the model is not the trust layer, the gate is. A medical badge bought worse honesty, not better. So the production design became best-chemistry model proposes, deterministic gate verifies, honesty-first model audits.

The gate that catches fabrication, including ours

That finding forced the build we had been circling: a deterministic fact-check gate. It catches the exact failure the benchmark exposed. It canonicalizes any this-molecule-is-Drug-X claim with RDKit and flags the mismatch, recomputes physicochemical values that were asserted as if measured, and flags unsupported citations and PDB IDs. 25 tests, verified live catching the this-is-Adagrasib claim.

The same discipline we point at the models, we pointed at our own build process, and it caught one agent claiming a function came from a prior commit when the git history proved it did not, and a results file that said it found candidates while the file on disk said the opposite. We surfaced both instead of shipping them. The honesty is load-bearing, and it bites us when we are wrong.

Then we attacked the gate itself and found the one crack that mattered: a throwaway phrase, "with no adverse effects observed, this candidate cures the cancer," could poison it into waving the cure claim through. We fixed it, re-attacked with fresh poison, and confirmed it now rejects the poison while still passing a genuine disclaimer. 195 tests green, the five protected honesty modules never touched. The most valuable thing we did was find the single way our own system could have over-claimed, before it did.

The factory, demonstrated end-to-end

We ran the full pipeline on a real candidate. Fire EGFR-driven lung cancer, propose a novel molecule, score it with two independent evaluators (AutoDock Vina on CPU at -7.6 kcal/mol, Boltz-2 on GPU at 0.72 binding probability), confirm they agree, pass it through the fact-check gate clean, cap it at exactly the rung earned with the kill, efficacy, and cure wall refused in code, and emit an email-ready dossier with one inexpensive binding assay and a pre-registered stop rule.

A skeptical wet-lab-PI reviewer graded it 7 to 8 out of 10: would reply, not bin. And because honesty is the point, the dossier states in its own cover note that the scaffold is a known chemotype and this is a plausibility test of the pipeline, not a novel-series claim. Credible because capped.

Aiming at something a lab actually needs In flight

A me-too molecule on a solved target does not move the needle. So we have re-pointed the engine at a real unmet need: EGFR C797S resistance, where today's best drug, osimertinib, fails and there is no good replacement. The engine generated genuinely novel, non-covalent candidates, the right mechanism for C797S, with novelty around 0.2 against known drugs, far more original than a me-too. It is now scoring them against the real mutant structure, built to return an honest null if it cannot credibly hit the target. Either way it is a true result, and it folds into the next update.

The boundary

What it structurally cannot do.

Everything above is in-silico. The one thing the engine structurally cannot do is bootstrap ground truth from a model of itself: its nulls map its own uncertainty, not nature's. That is not an engineering gap to out-code, it is a measurement dependency. The North Star was always designed around it. The engine hands a wet lab the single cheapest, most falsifiable test, and learns from the answer. Before that first real measurement, Synthena Medical is an exceptionally honest prioritizer. After it, a discoverer. One confirmed prediction from a partner lab moves the where-it-matters number from about 2.5 to about 6 in a single result, and that, not more code, is the next move.

We measured that ceiling directly. Trained on only pre-2018 chemistry, the engine recovers known-like molecules almost perfectly (0.987) but ranks genuinely novel scaffolds below chance (0.415). Similarity recovers the known; it cannot invent past it. That is the honest edge of what this is, and why the wet lab is not optional.

Before the first wet-lab measurement, this is a prioritizer. After it, a learning machine.

Not: in-silico is not in-vivo. No prediction here has been confirmed at a bench. Every output is a hypothesis for expert review and wet-lab testing, never a clinical decision.

The pitch

Why the refusal is the product.

Most AI for drug discovery sells the green light. Synthena Medical sells the refusal to fake one, and then proves, with a validated body rung, a fabrication-catching gate, and a capped lab-ready dossier, that the discipline produces something a scientist can trust enough to put on a bench. It is not a cure. It is the most honest instrument we know how to build for finding one, and it is one wet-lab measurement away from becoming a learning machine.

How it fits the wider work → The discipline, in full →