v0.1 — Protein Function Reasoning

// protein input

sequence MVLSPADKTNVKAAWGKVGAHAGEYGAEAL...

organism Homo sapiens

// model reasoning

Analyzing primary sequence homology... Conserved domain detected: Globin-like fold. Proximal His87 coordinates heme Fe(II); distal His58 stabilizes bound O2. Context: hemoglobin alpha chain, oxygen transport. Confidence: 0.97

// expert RL signal

→ Reasoning chain validated. Evidence trace: GO:0006810, GO:0021700

Models that think like biologists.

Multimodal reasoning LLMs trained on biology tasks with expert RL feedback. Not just predicting function — reasoning through evidence the way a PhD scientist would.

What the model does

Chain-of-thought reasoning over biological context


Sequence → Structure → Function

Combines ESM2 protein embeddings, AlphaFold structural data, and biological literature into a single reasoning context. Not three models — one chain.


Expert RL Feedback Loop

PhD biologists rate reasoning chains, not just predictions. The model learns what constitutes sound scientific thinking — not pattern-matching to training data.
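One way to picture rating a chain rather than a prediction is a reward that blends per-step expert scores with final-answer correctness. The sketch below is illustrative only: `StepRating`, `chain_reward`, the 0 to 1 rating scale, and the `step_weight` value are assumptions made for this example, not a description of the actual training signal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StepRating:
    """One expert's score for a single reasoning step (hypothetical schema)."""
    step: str
    score: float  # assumed scale: 0.0 (unsound) to 1.0 (sound)


def chain_reward(ratings: list[StepRating], prediction_correct: bool,
                 step_weight: float = 0.7) -> float:
    """Blend mean step quality with final-answer correctness.

    step_weight is an assumed hyperparameter: the share of the reward
    that comes from the reasoning itself rather than from the label.
    """
    if not ratings:
        raise ValueError("a reasoning chain needs at least one rated step")
    mean_step = sum(r.score for r in ratings) / len(ratings)
    outcome = 1.0 if prediction_correct else 0.0
    return step_weight * mean_step + (1.0 - step_weight) * outcome
```

Under these assumed weights, a correct label reached through reasoning the experts rate as unsound earns at most 0.3, so the model cannot score well on the answer alone.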


Explainable Trace Output

Every prediction comes with a human-readable reasoning trace: which domains, sequences, and evidence the model used. Auditable. Citable. Trustworthy.
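A reasoning trace of this kind could be carried as a small structured record and rendered to plain text for review. This is a minimal sketch: the `ReasoningTrace` class and its field names are hypothetical, and the values are copied from the demo at the top of this page rather than produced by a model.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningTrace:
    """Hypothetical container for one prediction and its supporting evidence."""
    conclusion: str
    confidence: float
    domains: list = field(default_factory=list)
    evidence_ids: list = field(default_factory=list)

    def render(self) -> str:
        """Return a plain-text trace that a reviewer can read and cite."""
        lines = [f"Conclusion: {self.conclusion} (confidence {self.confidence:.2f})"]
        lines += [f"Domain: {d}" for d in self.domains]
        lines += [f"Evidence: {e}" for e in self.evidence_ids]
        return "\n".join(lines)


# Values taken verbatim from the demo output on this page.
trace = ReasoningTrace(
    conclusion="hemoglobin alpha chain, oxygen transport",
    confidence=0.97,
    domains=["Globin-like fold"],
    evidence_ids=["GO:0006810", "GO:0021700"],
)
print(trace.render())
```

Keeping evidence identifiers as separate fields, rather than buried in free text, is what would let a reviewer check each one independently.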

The approach

Why current models fail at scientific reasoning

ProteinGPT, ESM, and AlphaFold are exceptional at structure and function prediction, but they are descriptive, not explanatory. They tell you what a protein does. They don't tell you why, how, or what evidence supports that conclusion.

A biologist doesn't just classify — they trace evidence across sequence, structure, literature, and mechanistic context. AxiomBio trains models to replicate that chain of reasoning, then uses expert biologist feedback to reinforce it.

modalities in reasoning chain

expert RL over standard fine-tuning


verifiable reasoning traces

reasoning pipeline

01 Protein sequence input

↓

02 ESM2 embeddings + AlphaFold structure

↓

03 Biological literature grounding

↓

04 Chain-of-thought reasoning generation

↓

05 Expert biologist RL feedback

↓

06 Reasoning-validated model weights
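Steps 01 to 04 above can be pictured as stages that each add to one shared context, which is what "one chain" means in practice. The sketch below assumes that reading. Every function name is hypothetical and every stage is a placeholder that returns dummy values; no real ESM2 or AlphaFold call is made. Steps 05 and 06 happen at training time and are left out.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Stage = Callable[[Context], Context]


def add_embeddings_and_structure(ctx: Context) -> Context:
    # Step 02 placeholder: a real system would call ESM2 and AlphaFold here.
    return {**ctx, "embedding": "<embedding>", "structure": "<structure>"}


def add_literature(ctx: Context) -> Context:
    # Step 03 placeholder: retrieved passages would be attached here.
    return {**ctx, "literature": ["<passage>"]}


def generate_reasoning(ctx: Context) -> Context:
    # Step 04 placeholder: the model would write its chain of thought
    # over everything gathered so far. Here we only list what was available.
    used = sorted(k for k in ctx if k != "sequence")
    return {**ctx, "reasoning": f"reasoned over: {', '.join(used)}"}


def run_pipeline(sequence: str, stages: List[Stage]) -> Context:
    """Step 01 seeds the context; each later stage adds to the same dict."""
    ctx: Context = {"sequence": sequence}
    for stage in stages:
        ctx = stage(ctx)
    return ctx


result = run_pipeline(
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEAL",  # truncated demo sequence from this page
    [add_embeddings_and_structure, add_literature, generate_reasoning],
)
print(result["reasoning"])
```

Because every stage reads and extends the same context, the reasoning step sees sequence, structure, and literature together instead of receiving three separate model outputs.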

vs. the field

What makes AxiomBio different

Standard PLM

Multimodal LLM

AxiomBio

Sequence + structure input

✗

✓

Chain-of-thought reasoning

✗

✓

Expert RL feedback (not crowd)

✗

✓

Human-readable reasoning traces

✗

✓

Mechanistic explanation, not just label

✗

✓

where we're headed

The virtual cell is the target. Reasoning is the path.

Protein function is the first task. The reasoning architecture we're building extends to full cellular context — predicting how a protein behaves in a specific cellular environment, how it interacts with other molecules, how it responds to perturbation.

When we can model a virtual cell with mechanistic reasoning instead of statistical pattern-matching, we change what drug discovery looks like. Every experiment faster. Every hypothesis more grounded. Every failed candidate understood before it's synthesized.

2026

Protein function reasoning

Expert RL on biological reasoning chains. First validated benchmarks.

↓

2027

Cellular context modeling

Protein-in-context: reasoning over pathways and subcellular localization.

↓

2028

Virtual cell simulation

Mechanistic reasoning across the full cellular network. In silico perturbation at scale.
