Germinal - Proto

Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


SantiagoMille/germinal

Codebase for Germinal, a broadly enabling generative pipeline for efficient generation of epitope-targeted de novo antibodies.

202 stars


Efficient generation of epitope-targeted de novo antibodies with Germinal

Luis S. Mille-Fragoso, Claudia L. Driscoll, … Xiaojing J. Gao

bioRxiv (2025)


@article{mille_fragoso_2025_germinal,
  title={Efficient generation of epitope-targeted de novo antibodies with Germinal},
  author={Mille-Fragoso, Luis S. and Driscoll, Claudia L. and Wang, John N. and Dai, Haoyu and Widatalla, Talal and Zhang, Jim L. and Zhang, Xiaowei and Rao, Bing and Feng, Liang and Hie, Brian L. and Gao, Xiaojing J.},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.09.19.677421},
  url={https://www.biorxiv.org/content/10.1101/2025.09.19.677421},
  publisher={Cold Spring Harbor Laboratory}
}


proto-bio/proto-tools/proto_tools/tools/binder_design/germinal


Function	Description
`run_germinal_design()`	De novo epitope-targeted antibody design (VHH or scFv) using the Germinal pipeline (GPU)

License: Germinal’s own code is licensed under Apache-2.0, but the pipeline depends on bundled components and model weights under separate license terms, including non-commercial or restricted-use terms. The bundled model weights are licensed under CC-BY-4.0. As a whole, the pipeline restricts commercial use and may require explicit attribution.

Bundled dependencies, each under its own license:

PyRosetta: Custom (PyRosetta Software License)
IgLM: Custom (IgLM License)

Review the code license and the model weights license before any commercial use or redistribution.

Background

Germinal produces epitope-targeted antibody binders computationally from a target structure and an epitope definition, an alternative to animal- or library-based discovery such as immunization, phage display, and hybridoma screening. The Germinal publication reports experimental binding success rates of 4 to 22 percent across the benchmarks it evaluates. Germinal combines ColabDesign with AlphaFold2-Multimer hallucination, antibody language-model gradients (IgLM, AbLang), AbMPNN sequence redesign (Dreyer et al., 2023), and structure validation against Chai-1, AlphaFold3, or Protenix, followed by PyRosetta interface scoring and a multi-stage filter cascade. Relative to earlier antibody hallucination methods, Germinal additionally applies epitope-hotspot conditioning (the optimization is constrained so the binder contacts the user-specified residues), antibody language-model guidance (biasing designs toward sequences resembling natural antibodies), an early filtering stage that discards weak candidate designs before the computationally expensive structure-prediction step, and a structure-validation model that is independent of the model used during hallucination.

Tools

Germinal Antibody Design (`germinal-design`)

Runs one complete Germinal antibody-design campaign against a single target. Given a target PDB, a target chain, and the epitope hotspot residues, it runs a fixed (version-pinned) copy of the upstream run_germinal.py script, repeating hallucination, then AbMPNN redesign, then structure validation and filtering until either max_trajectories or max_passing_designs is reached, and returns ranked designs with predicted complex structures and per-design metrics (interface pTM, pAE, pDockQ2, pLDDT, and others). Upstream configuration defaults are preserved exactly; the only change is setting the structure-validation model (structure_model) to Chai-1, because Chai-1 installs automatically.
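
The stopping rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the upstream run_germinal.py logic: `run_campaign`, `design_one`, and `toy_design` are hypothetical names, and `design_one` stands in for one full pass of hallucination, AbMPNN redesign, structure validation, and filtering. The sketch also assumes at most one passing design per trajectory, which the real pipeline need not satisfy.

```python
from typing import Callable, List, Optional


def run_campaign(
    design_one: Callable[[int], Optional[str]],
    max_trajectories: int,
    max_passing_designs: int,
) -> List[str]:
    """Repeat trajectories until either cap is reached (illustrative only)."""
    passing: List[str] = []
    for trajectory in range(max_trajectories):
        if len(passing) >= max_passing_designs:
            break  # early stop: enough designs passed the final filters
        design = design_one(trajectory)  # None means the trajectory was filtered out
        if design is not None:
            passing.append(design)
    return passing


# Toy stand-in: every third trajectory yields a passing design.
def toy_design(trajectory: int) -> Optional[str]:
    return f"design_{trajectory}" if trajectory % 3 == 0 else None


designs = run_campaign(toy_design, max_trajectories=20, max_passing_designs=4)
print(designs)
```

With these toy values the loop stops on max_passing_designs well before the trajectory cap; lowering max_trajectories below the point where four designs pass makes the trajectory cap the binding limit instead.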

API Reference


Input: GerminalInput

target_pdb

Structure

required

Target structure. Accepts a file path, raw PDB/CIF content string, Structure object, or a dict in the shape produced by Structure.model_dump(mode='json'). Must include a chain matching target_chain.

Structure fields

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

default:"unspecified"

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

target_chain

string

default:"A"

Chain ID(s) of the target. Single letter (e.g. "A") or comma-separated for multi-chain targets (e.g. "A,B").

binder_chain

string

default:"B"

Chain ID assigned to the designed binder. Default "B" (matches Germinal’s source convention in configs/target/pdl1.yaml).

hotspots

List[string]

Hotspot residues on the target in "<chain_letter><resnum>" format (e.g. ["A37", "A39", "A41"]). These are the residues the designed binder is forced to contact.

target_name

string

Short identifier for this target. Used as the Hydra target=<name> selector and as a prefix in output filenames. If None, the inference layer derives one from a hash of the PDB content.

hotspot_residue

string

Optional single residue (e.g. "W40") used as the Chai-1 contact-restraint anchor. Mirrors Germinal’s hotspot_residue field in configs/target/*.yaml.
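
Before launching a long run, it can help to check that every hotspot follows the "<chain_letter><resnum>" format and sits on a chain listed in target_chain. The helper below is a hedged sketch of such a pre-flight check. `check_hotspots` and `HOTSPOT_PATTERN` are hypothetical names and not part of the toolkit, and the pattern assumes plain positive residue numbers without insertion codes.

```python
import re
from typing import List, Set

# Assumed format: one chain letter followed by a positive residue number, e.g. "A37".
HOTSPOT_PATTERN = re.compile(r"^([A-Za-z])(\d+)$")


def check_hotspots(hotspots: List[str], target_chain: str) -> List[str]:
    """Return error messages for hotspots that are malformed or on a non-target chain."""
    # target_chain may be a single letter ("A") or comma-separated ("A,B").
    target_chains: Set[str] = {c.strip() for c in target_chain.split(",") if c.strip()}
    errors: List[str] = []
    for hotspot in hotspots:
        match = HOTSPOT_PATTERN.match(hotspot)
        if match is None:
            errors.append(f"{hotspot!r}: expected '<chain_letter><resnum>'")
        elif match.group(1) not in target_chains:
            errors.append(f"{hotspot!r}: chain {match.group(1)!r} is not in target_chain")
    return errors


print(check_hotspots(["A37", "A39", "A41"], target_chain="A"))
print(check_hotspots(["B12", "37A"], target_chain="A"))
```

An empty list means the hotspots are at least syntactically consistent with target_chain; it does not confirm that the residues exist in the PDB file.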


Config: GerminalConfig

design_type

enum

default:"vhh"

Run preset selector. Available options: vhh, scfv

max_trajectories

integer

default:"10000"

Hard cap on total trajectories before stopping.

max_hallucinated_trajectories

integer

default:"1000"

Cap on trajectories that complete the hallucination stage (before MPNN refinement).

max_passing_designs

integer

default:"100"

Stop early once this many designs pass all final filters.

structure_model

enum

default:"chai"

Cofolding backend for structure validation. Default "chai" (auto-installed); "af3" and "protenix" require user-provisioned weights/env (see README → Backend Configuration). Available options: chai, af3, protenix

plddt_threshold

number

Override final external_plddt (VHH: > 0.87, scFv: > 0.85). This is distinct from upstream’s in-loop plddt_threshold save filter; to change that filter, use germinal_overrides.

iptm_threshold

number

Override final external_iptm (preset: > 0.74).

ipae_threshold

number

Override final external_pae in Å (VHH: < 7.5, scFv: < 8).

ptm_threshold

number

Override final external_ptm (preset: > 0.84).

pdockq2_threshold

number

Override final pdockq2 (preset: > 0.23).

germinal_overrides

Dict[string, any]

Arbitrary Hydra overrides for run_germinal.py (e.g. {"logits_steps": 100, "weights_iptm": 1.0}). Applied verbatim as <key>=<value> CLI args.

filter_overrides

Dict[string, Dict[string, Dict[string, any]]]

Override filter YAML values. Schema: {"initial" | "final": {<filter_name>: {"value": <v>, "operator": <op>}}}. Merged on top of the design_type preset.

output_dir

string

Optional persistent output directory. If unset, a temp dir is used and discarded after the call.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device for the Germinal subprocess. Forced to "cuda"; CPU is not supported by the upstream pipeline.

timeout

integer

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
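
The filter_overrides schema above nests stage, then filter name, then the "value" and "operator" keys. The sketch below shows one plausible way such overrides could be merged on top of a preset. It is an assumption for illustration, not the toolkit's implementation: `merge_filter_overrides` and `example_preset` are hypothetical, the ">" operator strings are assumed spellings, and whether the real merge is per key or replaces a whole filter entry is not stated in the source. The two preset values are the VHH thresholds quoted in the field descriptions.

```python
import copy
from typing import Any, Dict

FilterSet = Dict[str, Dict[str, Dict[str, Any]]]


def merge_filter_overrides(preset: FilterSet, overrides: FilterSet) -> FilterSet:
    """Return a new filter set with overrides applied per key on top of the preset."""
    merged = copy.deepcopy(preset)  # never mutate the preset in place
    for stage, filters in overrides.items():
        if stage not in ("initial", "final"):
            raise ValueError(f"unknown filter stage: {stage!r}")
        stage_filters = merged.setdefault(stage, {})
        for filter_name, spec in filters.items():
            # Keys missing from the override keep their preset value.
            stage_filters.setdefault(filter_name, {}).update(spec)
    return merged


# Hypothetical excerpt of a VHH preset, using thresholds quoted in this reference.
example_preset: FilterSet = {
    "final": {
        "external_plddt": {"value": 0.87, "operator": ">"},
        "external_iptm": {"value": 0.74, "operator": ">"},
    }
}

merged = merge_filter_overrides(
    example_preset, {"final": {"external_plddt": {"value": 0.90}}}
)
print(merged["final"]["external_plddt"])
```

Only the overridden filter changes; external_iptm keeps its preset threshold and the preset dictionary itself is left untouched.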


Output: GerminalOutput

designs

List[GerminalDesign]

All produced designs across the accepted, redesign_candidate, and trajectory stages.

GerminalDesign fields

sequence_heavy

string

required

Heavy chain (or VHH) amino-acid sequence.

sequence_light

string

Light chain sequence (scFv only).

structure

Structure

required

Predicted binder + target complex.

metrics

GerminalDesignMetrics

required

Per-design quality metrics.

stage_passed

enum

required

Highest pipeline stage this design reached.

design_id

string

required

Germinal’s internal design identifier ("<target>_<type>_s<seed>" for trajectory-only designs, "<target>_<type>_s<seed>_abmpnn_<j>" after AbMPNN redesign).

trajectory_index

integer

required

Trajectory seed (parsed from _s<seed>). Germinal uses the seed as its unique trajectory identifier.

mpnn_index

integer

required

AbMPNN sample index (1-based; 0 for trajectory-only designs that never reached the redesign stage).

pipeline_stats

Dict[string, integer]

Per-stage counts from Germinal’s failure_counts.csv (trajectories attempted, designs accepted, and per-filter failure counts).
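
The design_id formats described above map directly onto trajectory_index and mpnn_index. The parser below is a sketch of that mapping, assuming the `<type>` segment contains no underscore and the seed and sample index are plain integers. `parse_design_id`, `ParsedDesignId`, and `DESIGN_ID` are hypothetical names, and the example IDs are made up for illustration.

```python
import re
from typing import NamedTuple

# "<target>_<type>_s<seed>" with an optional "_abmpnn_<j>" suffix.
DESIGN_ID = re.compile(
    r"^(?P<target>.+)_(?P<type>[^_]+)_s(?P<seed>\d+)(?:_abmpnn_(?P<sample>\d+))?$"
)


class ParsedDesignId(NamedTuple):
    target: str
    design_type: str
    trajectory_index: int
    mpnn_index: int


def parse_design_id(design_id: str) -> ParsedDesignId:
    """Split a design_id into its parts; mpnn_index is 0 for trajectory-only designs."""
    match = DESIGN_ID.match(design_id)
    if match is None:
        raise ValueError(f"unrecognized design_id: {design_id!r}")
    sample = match.group("sample")
    return ParsedDesignId(
        target=match.group("target"),
        design_type=match.group("type"),
        trajectory_index=int(match.group("seed")),
        mpnn_index=int(sample) if sample is not None else 0,
    )


print(parse_design_id("pdl1_vhh_s42"))
print(parse_design_id("pdl1_vhh_s42_abmpnn_3"))
```

Grouping designs by trajectory_index in this way recovers which AbMPNN samples came from the same hallucinated trajectory.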

Metrics

Metric	Type	Range	Availability
`plddt`	float	0.0 to 1.0	always
`ptm`	float	0.0 to 1.0	always
`i_ptm`	float	0.0 to 1.0	always
`i_pae`	float	≥ 0.0	always
`pae`	float	≥ 0.0	always
`loss`	float	unbounded	always
`lm_ll`	float	unbounded	always
`helix`	float	unbounded	always
`beta_strand`	float	unbounded	always
`clashes`	int	≥ 0.0	after filtering
`sc_rmsd`	float	≥ 0.0	after filtering
`binder_near_hotspot`	bool	true or false	after filtering
`cdr3_hotspot_contacts`	int	≥ 0.0	after filtering
`percent_interface_cdr`	float	0.0 to 1.0	after filtering
`interface_shape_comp`	float	0.0 to 1.0	after filtering
`interface_hbonds`	int	≥ 0.0	after filtering
`surface_hydrophobicity`	float	0.0 to 1.0	after filtering
`interface_hydrophobicity`	float	≥ 0.0	after filtering
`pdockq2`	float	0.0 to 1.0	after filtering
`external_plddt`	float	0.0 to 1.0	after validation
`external_iptm`	float	0.0 to 1.0	after validation
`external_ptm`	float	0.0 to 1.0	after validation
`external_pae`	float	≥ 0.0	after validation
`external_i_pae`	float	≥ 0.0	after validation
`external_i_plddt`	float	0.0 to 1.0	after validation
`external_plddt_binder`	float	0.0 to 1.0	after validation
`external_chain_ptm`	float	0.0 to 1.0	after validation
`external_binder_pae`	float	≥ 0.0	after validation
`external_aggregate_score`	float	unbounded	after validation
`ipsae`	float	0.0 to 1.0	after validation
`ipsae_pdockq2`	float	0.0 to 1.0	after validation
`lis_lis`	float	unbounded	after validation
`lis_lia`	float	unbounded	after validation
`binder_score`	float	unbounded	after filtering
`interface_packstat`	float	0.0 to 1.0	after filtering
`interface_dG`	float	unbounded	after filtering
`interface_dSASA`	float	≥ 0.0	after filtering
`interface_dG_SASA_ratio`	float	unbounded	after filtering
`interface_fraction`	float	0.0 to 1.0	after filtering
`interface_nres`	int	≥ 0.0	after filtering
`interface_hbond_percentage`	float	0.0 to 1.0	after filtering
`interface_delta_unsat_hbonds`	int	≥ 0.0	after filtering
`interface_delta_unsat_hbonds_percentage`	float	0.0 to 1.0	after filtering
`clashes_unrelaxed`	int	≥ 0.0	after filtering
`hydrophobic_patches_binder`	int	≥ 0.0	after filtering
`hydrophobic_patches_struct`	int	≥ 0.0	after filtering
`sap_score`	float	unbounded	after filtering
`cdr_sap`	float	unbounded	after filtering
`cdr_hotspot_contacts`	int	≥ 0.0	after filtering
`percent_interface_cdr3`	float	0.0 to 1.0	after filtering
`alpha_interface`	float	0.0 to 1.0	after filtering
`beta_interface`	float	0.0 to 1.0	after filtering
`loops_interface`	float	0.0 to 1.0	after filtering
`alpha_all`	float	0.0 to 1.0	after filtering
`beta_all`	float	0.0 to 1.0	after filtering
`loops_all`	float	0.0 to 1.0	after filtering
`n_framework_mutations`	int	≥ 0.0	after filtering
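
For post-hoc triage, the five final thresholds quoted in the Config section can be re-applied to a design's metrics. The sketch below uses the VHH preset values (external_plddt > 0.87, external_iptm > 0.74, external_pae < 7.5, external_ptm > 0.84, pdockq2 > 0.23). It assumes the metrics are available as a plain dictionary, which may not match how GerminalDesignMetrics is actually exposed; `failed_filters` and `VHH_FINAL_THRESHOLDS` are hypothetical names, and the real filter cascade includes more filters than these five.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

Comparison = Callable[[float, float], bool]

# VHH preset thresholds as quoted in the Config field descriptions.
VHH_FINAL_THRESHOLDS: Dict[str, Tuple[Comparison, float]] = {
    "external_plddt": (operator.gt, 0.87),
    "external_iptm": (operator.gt, 0.74),
    "external_pae": (operator.lt, 7.5),
    "external_ptm": (operator.gt, 0.84),
    "pdockq2": (operator.gt, 0.23),
}


def failed_filters(metrics: Dict[str, Optional[float]]) -> List[str]:
    """Return the names of thresholds a design misses; a missing metric counts as a miss."""
    failed: List[str] = []
    for name, (compare, threshold) in VHH_FINAL_THRESHOLDS.items():
        value = metrics.get(name)
        if value is None or not compare(value, threshold):
            failed.append(name)
    return failed


# Made-up metric values for illustration only.
example_metrics = {
    "external_plddt": 0.91,
    "external_iptm": 0.80,
    "external_pae": 9.0,
    "external_ptm": 0.88,
    "pdockq2": 0.30,
}
print(failed_filters(example_metrics))
```

Treating a missing metric as a failure reflects the Availability column: validation metrics are only present for designs that reached that stage.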

Applications

This tool performs de novo therapeutic antibody discovery: generating epitope-targeted VHH or scFv binders against a chosen target. It requires only a target structure and an epitope definition, and produces a ranked set of designs with predicted complex structures and per-design quality metrics, ready for selection and experimental testing.

Usage Tips

design_type selects the run preset. Choose "vhh" (single-domain nanobody, the default) or "scfv". Each loads the upstream preset with its own filter thresholds, so leave the *_threshold fields set to None to let the matching preset apply.
Reduce max_trajectories when testing. The upstream default of 10000 corresponds to a run that takes hours to days; use a small value while testing, and set max_passing_designs below max_trajectories to stop early once enough designs pass the filters.
structure_model selects the structure-validation model. Validation uses a model independent of the hallucination step. The published filter thresholds were calibrated against AlphaFold3, so acceptance rates may differ under the "chai" default and the *_threshold fields may need adjusting to match the reported rates.
germinal_overrides passes additional settings to the underlying pipeline. Any upstream configuration option that is not exposed as a dedicated field can be supplied here as a <key>=<value> pair.
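
The germinal_overrides field is documented as being applied verbatim as <key>=<value> CLI arguments. A minimal sketch of that conversion is shown below. `to_override_args` is a hypothetical helper, not a toolkit function, and how the toolkit itself formats booleans, strings with spaces, or nested values is not specified in this reference.

```python
from typing import Any, Dict, List


def to_override_args(overrides: Dict[str, Any]) -> List[str]:
    """Format each override as a '<key>=<value>' argument, preserving insertion order."""
    return [f"{key}={value}" for key, value in overrides.items()]


args = to_override_args({"logits_steps": 100, "weights_iptm": 1.0})
print(args)  # ['logits_steps=100', 'weights_iptm=1.0']
```

Printing the formatted arguments before a run is a cheap way to catch a misspelled key, since a wrong name would otherwise only surface once the subprocess starts.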

Toolkit Notes

These apply to every Germinal tool in this toolkit (germinal-design).

Requires a GPU and can run for a long time. An NVIDIA GPU with at least 40 GB of GPU memory is required (at least 80 GB for scFv mode or targets longer than 250 residues); running on CPU is not supported. A full run with default settings takes hours to days, so reduce max_trajectories when testing.
Structure-validation model setup varies. structure_model="chai" needs no manual setup. "af3" requires AlphaFold3 weights that you must request and install yourself (access is restricted; request it through DeepMind’s form) along with a container image. "protenix" requires a separate Protenix environment.
Bundled dependencies carry their own licenses. The pipeline requires PyRosetta (academic, non-profit, and government use is governed by the University of Washington CoMotion license; commercial use requires a separate license; redistribution is not permitted; consult the current PyRosetta licensing page for terms and availability) and IgLM (academic, non-commercial use only); Chai-1 is Apache-2.0 and AbLang2 is BSD-3-Clause. See the License note above and the linked terms.
One campaign per call. Each call is a single complete run against one target. To screen several targets, call the tool once per target (for example, in a loop) rather than expecting one call to process multiple targets at once.
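
The notes above suggest two small planning helpers: one for the GPU memory floor and one for looping over targets. The sketch below is illustrative only. `min_gpu_memory_gb`, `screen_targets`, and `fake_campaign` are hypothetical names, the file paths are placeholders, and `run_one` stands in for whatever function actually launches a campaign, so nothing here calls the real tool.

```python
from typing import Callable, Dict, TypeVar

Result = TypeVar("Result")


def min_gpu_memory_gb(design_type: str, target_length: int) -> int:
    """Memory floor from the note above: 80 GB for scFv or targets over 250 residues."""
    if design_type == "scfv" or target_length > 250:
        return 80
    return 40


def screen_targets(
    target_pdbs: Dict[str, str],
    run_one: Callable[[str, str], Result],
) -> Dict[str, Result]:
    """Run one campaign per target, sequentially, keyed by target name."""
    results: Dict[str, Result] = {}
    for target_name, pdb_path in target_pdbs.items():
        results[target_name] = run_one(target_name, pdb_path)
    return results


# Stand-in for a real campaign call; it only echoes its inputs.
def fake_campaign(target_name: str, pdb_path: str) -> str:
    return f"{target_name}:{pdb_path}"


print(min_gpu_memory_gb("vhh", 120))
print(screen_targets({"target_a": "target_a.pdb", "target_b": "target_b.pdb"}, fake_campaign))
```

Because each campaign can run for hours to days, a sequential loop like this is the simplest option; spreading targets across several GPUs is covered by the Parallel Execution guide.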

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.