
Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


SantiagoMille/germinal
Codebase for Germinal, a broadly enabling generative pipeline for efficient generation of epitope-targeted de novo antibodies.
202 stars
Efficient generation of epitope-targeted de novo antibodies with Germinal
Luis S. Mille-Fragoso, Claudia L. Driscoll, … Xiaojing J. Gao
bioRxiv (2025)
@article{mille_fragoso_2025_germinal,
  title={Efficient generation of epitope-targeted de novo antibodies with Germinal},
  author={Mille-Fragoso, Luis S. and Driscoll, Claudia L. and Wang, John N. and Dai, Haoyu and Widatalla, Talal and Zhang, Jim L. and Zhang, Xiaowei and Rao, Bing and Feng, Liang and Hie, Brian L. and Gao, Xiaojing J.},
  journal={bioRxiv},
  year={2025},
  doi={10.1101/2025.09.19.677421},
  url={https://www.biorxiv.org/content/10.1101/2025.09.19.677421},
  publisher={Cold Spring Harbor Laboratory}
}
proto-bio/proto-tools/proto_tools/tools/binder_design/germinal
Function                 Description
run_germinal_design()    De novo epitope-targeted antibody design (VHH or scFv) using the Germinal pipeline (GPU)
License: Germinal’s own code is licensed under Apache-2.0, and the bundled model weights are licensed under CC-BY-4.0. The pipeline also depends on bundled components under separate license terms, including non-commercial or restricted-use terms, so the pipeline as a whole carries restrictions on commercial use and may require explicit attribution when used.
Bundled dependencies, each under its own license:
  • PyRosetta: Custom (PyRosetta Software License)
  • IgLM: Custom (IgLM License)
Review the code license and the model weights license before any commercial use or redistribution.

Background

Germinal produces epitope-targeted antibody binders computationally from a target structure and an epitope definition, an alternative to animal- or library-based discovery such as immunization, phage display, and hybridoma screening. The Germinal publication reports experimental binding success rates of 4 to 22 percent across the benchmarks it evaluates. Germinal combines ColabDesign with AlphaFold2-Multimer hallucination, antibody language-model gradients (IgLM, AbLang), AbMPNN sequence redesign (Dreyer et al., 2023), and structure validation against Chai-1, AlphaFold3, or Protenix, followed by PyRosetta interface scoring and a multi-stage filter cascade. Relative to earlier antibody hallucination methods, Germinal additionally applies epitope-hotspot conditioning (the optimization is constrained so the binder contacts the user-specified residues), antibody language-model guidance (biasing designs toward sequences resembling natural antibodies), an early filtering stage that discards weak candidate designs before the computationally expensive structure-prediction step, and a structure-validation model that is independent of the model used during hallucination.

Tools

Germinal Antibody Design (germinal-design)

Runs one complete Germinal antibody-design campaign against a single target. Given a target PDB, a target chain, and the epitope hotspot residues, it runs a fixed (version-pinned) copy of the upstream run_germinal.py script, repeating hallucination, then AbMPNN redesign, then structure validation and filtering until either max_trajectories or max_passing_designs is reached, and returns ranked designs with predicted complex structures and per-design metrics (interface pTM, pAE, pDockQ2, pLDDT, and others). Upstream configuration defaults are preserved exactly; the only change is setting the structure-validation model (structure_model) to Chai-1, because Chai-1 installs automatically.
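The stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the upstream run_germinal.py logic: `run_campaign`, `CampaignResult`, and the `passes_filters` callable are hypothetical names, the whole hallucination, redesign, validation, and filtering chain is collapsed into one stand-in function, and the separate max_hallucinated_trajectories cap is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class CampaignResult:
    trajectories_run: int
    passing_designs: int


def run_campaign(max_trajectories, max_passing_designs, passes_filters):
    """Repeat trajectories until either cap is reached.

    `passes_filters` is a hypothetical stand-in for the full chain of
    hallucination, AbMPNN redesign, structure validation, and filtering.
    It receives the 1-based trajectory index and returns True or False.
    """
    trajectories = 0
    passing = 0
    while trajectories < max_trajectories and passing < max_passing_designs:
        trajectories += 1
        if passes_filters(trajectories):
            passing += 1
    return CampaignResult(trajectories, passing)


# Toy stand-in: pretend every 4th trajectory yields a passing design.
result = run_campaign(
    max_trajectories=20,
    max_passing_designs=3,
    passes_filters=lambda i: i % 4 == 0,
)
print(result)
```

In this toy run the campaign stops on the passing-design cap before the trajectory cap; with a smaller max_trajectories the trajectory cap would end the loop first.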

API Reference

target_pdb
Structure
required
Target structure. Accepts a file path, raw PDB/CIF content string, Structure object, or a dict in the shape produced by Structure.model_dump(mode='json'). Must include a chain matching target_chain.
target_chain
string
default:"A"
Chain ID(s) of the target. Single letter (e.g. "A") or comma-separated for multi-chain targets (e.g. "A,B").
binder_chain
string
default:"B"
Chain ID assigned to the designed binder. Default "B" (matches Germinal’s source convention in configs/target/pdl1.yaml).
hotspots
List[string]
Hotspot residues on the target in "<chain_letter><resnum>" format (e.g. ["A37", "A39", "A41"]). These are the residues the designed binder is forced to contact.
target_name
string
Short identifier for this target. Used as the Hydra target=<name> selector and as a prefix in output filenames. If None, the inference layer derives one from a hash of the PDB content.
hotspot_residue
string
Optional single residue (e.g. "W40") used as the Chai-1 contact-restraint anchor. Mirrors Germinal’s hotspot_residue field in configs/target/*.yaml.
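Because hotspots must follow the "<chain_letter><resnum>" format, it can help to check them before starting a long GPU run. The helper below is a sketch under stated assumptions: `parse_hotspots` is a hypothetical name, and the cross-check against target_chain is an added convenience rather than documented behavior of the tool.

```python
import re

# One chain letter followed by a residue number, e.g. "A37".
HOTSPOT_RE = re.compile(r"^([A-Za-z])(\d+)$")


def parse_hotspots(hotspots, target_chain="A"):
    """Split hotspot strings into (chain, residue_number) tuples.

    Raises ValueError for malformed entries or for chains that are not
    listed in `target_chain` (single letter or comma-separated letters).
    """
    chains = {c.strip() for c in target_chain.split(",")}
    parsed = []
    for hotspot in hotspots:
        match = HOTSPOT_RE.match(hotspot)
        if match is None:
            raise ValueError(f"Malformed hotspot: {hotspot!r}")
        chain, resnum = match.group(1), int(match.group(2))
        if chain not in chains:
            raise ValueError(f"Hotspot {hotspot!r} is not on chain(s) {sorted(chains)}")
        parsed.append((chain, resnum))
    return parsed


print(parse_hotspots(["A37", "A39", "A41"], target_chain="A"))
```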
design_type
enum
default:"vhh"
Run preset selector. Available options: vhh, scfv
max_trajectories
integer
default:"10000"
Hard cap on total trajectories before stopping.
max_hallucinated_trajectories
integer
default:"1000"
Cap on trajectories that complete the hallucination stage (before MPNN refinement).
max_passing_designs
integer
default:"100"
Stop early once this many designs pass all final filters.
structure_model
enum
default:"chai"
Cofolding backend for structure validation. Default "chai" (auto-installed); "af3" and "protenix" require user-provisioned weights/env (see README → Backend Configuration). Available options: chai, af3, protenix
plddt_threshold
number
Override final external_plddt (VHH: > 0.87, scFv: > 0.85). This is distinct from upstream’s in-loop plddt_threshold save filter; to change that one, use germinal_overrides.
iptm_threshold
number
Override final external_iptm (preset: > 0.74).
ipae_threshold
number
Override final external_pae in Å (VHH: < 7.5, scFv: < 8).
ptm_threshold
number
Override final external_ptm (preset: > 0.84).
pdockq2_threshold
number
Override final pdockq2 (preset: > 0.23).
germinal_overrides
Dict[string, any]
Arbitrary Hydra overrides for run_germinal.py (e.g. {"logits_steps": 100, "weights_iptm": 1.0}). Applied verbatim as <key>=<value> CLI args.
filter_overrides
Dict[string, Dict[string, Dict[string, any]]]
Override filter YAML values. Schema: {"initial" | "final": {<filter_name>: {"value": <v>, "operator": <op>}}}. Merged on top of the design_type preset.
output_dir
string
Optional persistent output directory. If unset, a temp dir is used and discarded after the call.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cuda"
Device for the Germinal subprocess. Forced to "cuda"; CPU is not supported by the upstream pipeline.
timeout
integer
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
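The filter_overrides schema is easiest to read as a nested merge on top of the preset. The sketch below illustrates one plausible merge, assuming that a partial override (for example only "value") keeps the preset's other keys; `merge_filter_overrides` is a hypothetical helper, and the operator spellings (">" and "<") are an assumption about the YAML format. The threshold values are the VHH final-stage values quoted in the parameter descriptions.

```python
import copy

# Final-stage VHH values quoted in the parameter descriptions.
# ASSUMPTION: operators are spelled ">" and "<" in the filter YAML.
VHH_FINAL_PRESET = {
    "external_plddt": {"value": 0.87, "operator": ">"},
    "external_iptm": {"value": 0.74, "operator": ">"},
    "external_pae": {"value": 7.5, "operator": "<"},
}


def merge_filter_overrides(preset, overrides):
    """Return a copy of `preset` with `overrides` merged on top.

    Both arguments use the shape
    {"initial" | "final": {filter_name: {"value": ..., "operator": ...}}}.
    The input preset is left unmodified.
    """
    merged = copy.deepcopy(preset)
    for stage, filters in overrides.items():
        stage_filters = merged.setdefault(stage, {})
        for name, spec in filters.items():
            stage_filters.setdefault(name, {}).update(spec)
    return merged


preset = {"initial": {}, "final": VHH_FINAL_PRESET}
overrides = {"final": {"external_iptm": {"value": 0.70}}}
merged = merge_filter_overrides(preset, overrides)
print(merged["final"]["external_iptm"])
```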
designs
List[GerminalDesign]
All produced designs across the accepted, redesign_candidate, and trajectory stages.
pipeline_stats
Dict[string, integer]
Per-stage counts from Germinal’s failure_counts.csv (trajectories attempted, designs accepted, and per-filter failure counts).
Metrics
Metric                                   Type   Range        Availability
plddt                                    float  0.0 to 1.0   always
ptm                                      float  0.0 to 1.0   always
i_ptm                                    float  0.0 to 1.0   always
i_pae                                    float  ≥ 0.0        always
pae                                      float  ≥ 0.0        always
loss                                     float  unbounded    always
lm_ll                                    float  unbounded    always
helix                                    float  unbounded    always
beta_strand                              float  unbounded    always
clashes                                  int    ≥ 0.0        after filtering
sc_rmsd                                  float  ≥ 0.0        after filtering
binder_near_hotspot                      bool   unbounded    after filtering
cdr3_hotspot_contacts                    int    ≥ 0.0        after filtering
percent_interface_cdr                    float  0.0 to 1.0   after filtering
interface_shape_comp                     float  0.0 to 1.0   after filtering
interface_hbonds                         int    ≥ 0.0        after filtering
surface_hydrophobicity                   float  0.0 to 1.0   after filtering
interface_hydrophobicity                 float  ≥ 0.0        after filtering
pdockq2                                  float  0.0 to 1.0   after filtering
external_plddt                           float  0.0 to 1.0   after validation
external_iptm                            float  0.0 to 1.0   after validation
external_ptm                             float  0.0 to 1.0   after validation
external_pae                             float  ≥ 0.0        after validation
external_i_pae                           float  ≥ 0.0        after validation
external_i_plddt                         float  0.0 to 1.0   after validation
external_plddt_binder                    float  0.0 to 1.0   after validation
external_chain_ptm                       float  0.0 to 1.0   after validation
external_binder_pae                      float  ≥ 0.0        after validation
external_aggregate_score                 float  unbounded    after validation
ipsae                                    float  0.0 to 1.0   after validation
ipsae_pdockq2                            float  0.0 to 1.0   after validation
lis_lis                                  float  unbounded    after validation
lis_lia                                  float  unbounded    after validation
binder_score                             float  unbounded    after filtering
interface_packstat                       float  0.0 to 1.0   after filtering
interface_dG                             float  unbounded    after filtering
interface_dSASA                          float  ≥ 0.0        after filtering
interface_dG_SASA_ratio                  float  unbounded    after filtering
interface_fraction                       float  0.0 to 1.0   after filtering
interface_nres                           int    ≥ 0.0        after filtering
interface_hbond_percentage               float  0.0 to 1.0   after filtering
interface_delta_unsat_hbonds             int    ≥ 0.0        after filtering
interface_delta_unsat_hbonds_percentage  float  0.0 to 1.0   after filtering
clashes_unrelaxed                        int    ≥ 0.0        after filtering
hydrophobic_patches_binder               int    ≥ 0.0        after filtering
hydrophobic_patches_struct               int    ≥ 0.0        after filtering
sap_score                                float  unbounded    after filtering
cdr_sap                                  float  unbounded    after filtering
cdr_hotspot_contacts                     int    ≥ 0.0        after filtering
percent_interface_cdr3                   float  0.0 to 1.0   after filtering
alpha_interface                          float  0.0 to 1.0   after filtering
beta_interface                           float  0.0 to 1.0   after filtering
loops_interface                          float  0.0 to 1.0   after filtering
alpha_all                                float  0.0 to 1.0   after filtering
beta_all                                 float  0.0 to 1.0   after filtering
loops_all                                float  0.0 to 1.0   after filtering
n_framework_mutations                    int    ≥ 0.0        after filtering
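To see how the validation metrics relate to the final thresholds, the sketch below checks plain metric dictionaries against the VHH preset values quoted in the API reference (external_plddt > 0.87, external_iptm > 0.74, external_pae < 7.5, external_ptm > 0.84, pdockq2 > 0.23). It is an illustration only: `failed_filters` is a hypothetical helper, the design records are made-up toy data, and real designs come back as GerminalDesign objects whose attribute layout is not shown here.

```python
import operator

# VHH final-stage thresholds as quoted in the API reference.
FINAL_VHH_THRESHOLDS = {
    "external_plddt": (operator.gt, 0.87),
    "external_iptm": (operator.gt, 0.74),
    "external_pae": (operator.lt, 7.5),
    "external_ptm": (operator.gt, 0.84),
    "pdockq2": (operator.gt, 0.23),
}


def failed_filters(metrics, thresholds=FINAL_VHH_THRESHOLDS):
    """Return the names of thresholds that `metrics` misses.

    A metric that is absent counts as a failure, since metrics marked
    "after validation" only exist for designs that reached that stage.
    """
    failed = []
    for name, (compare, cutoff) in thresholds.items():
        value = metrics.get(name)
        if value is None or not compare(value, cutoff):
            failed.append(name)
    return failed


# Toy data for illustration; not real Germinal output.
designs = [
    {"name": "d1", "external_plddt": 0.91, "external_iptm": 0.80,
     "external_pae": 5.2, "external_ptm": 0.88, "pdockq2": 0.41},
    {"name": "d2", "external_plddt": 0.89, "external_iptm": 0.70,
     "external_pae": 8.1, "external_ptm": 0.86, "pdockq2": 0.30},
]
passing = [d["name"] for d in designs if not failed_filters(d)]
print(passing)
print(failed_filters(designs[1]))
```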

Applications

This tool performs de novo therapeutic antibody discovery: generating epitope-targeted VHH or scFv binders against a chosen target. It requires only a target structure and an epitope definition, and produces a ranked set of designs with predicted complex structures and per-design quality metrics, ready for selection and experimental testing.

Usage Tips

  • design_type selects the run preset. Choose "vhh" (single-domain nanobody, the default) or "scfv". Each loads the upstream preset with its own filter thresholds, so leave the *_threshold fields set to None to let the matching preset apply.
  • Reduce max_trajectories when testing. The upstream default of 10000 corresponds to a run that takes hours to days; use a small value while testing, and set max_passing_designs below max_trajectories to stop early once enough designs pass the filters.
  • structure_model selects the structure-validation model. Validation uses a model independent of the hallucination step. The published filter thresholds were calibrated against AlphaFold3, so acceptance rates may differ under the "chai" default and the *_threshold fields may need adjusting to match the reported rates.
  • germinal_overrides passes additional settings to the underlying pipeline. Any upstream configuration option that is not exposed as a dedicated field can be supplied here as a <key>=<value> pair.
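The tips above can be combined into a small test configuration. The sketch below only builds the arguments and shows how a germinal_overrides dict would render as <key>=<value> pairs; it does not call the tool. `to_hydra_args` is a hypothetical helper, the values are illustrative, and the exact string formatting the wrapper applies (for booleans, for instance) is not specified in this page.

```python
def to_hydra_args(overrides):
    """Render a dict as Hydra-style `<key>=<value>` command-line arguments."""
    return [f"{key}={value}" for key, value in overrides.items()]


# Field names come from the API reference; the values are illustrative
# and chosen to keep a test run short.
test_run_kwargs = {
    "design_type": "vhh",
    "max_trajectories": 20,
    "max_passing_designs": 2,
    "structure_model": "chai",
    "germinal_overrides": {"logits_steps": 100, "weights_iptm": 1.0},
}

cli_args = to_hydra_args(test_run_kwargs["germinal_overrides"])
print(cli_args)
```

Keeping max_passing_designs below max_trajectories, as here, lets the run stop early once enough designs pass.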

Toolkit Notes

These apply to every Germinal tool in this toolkit (germinal-design).
  • Requires a GPU and can run for a long time. An NVIDIA GPU with at least 40 GB of GPU memory is required (at least 80 GB for scFv mode or targets longer than 250 residues); running on CPU is not supported. A full run with default settings takes hours to days, so reduce max_trajectories when testing.
  • Structure-validation model setup varies. structure_model="chai" needs no manual setup. "af3" requires AlphaFold3 weights that you must request and install yourself (access is restricted; request it through DeepMind’s form) along with a container image. "protenix" requires a separate Protenix environment.
  • Bundled dependencies carry their own licenses. The pipeline requires PyRosetta (academic, non-profit, and government use is governed by the University of Washington CoMotion license; commercial use requires a separate license; redistribution is not permitted; consult the current PyRosetta licensing page for terms and availability) and IgLM (academic, non-commercial use only); Chai-1 is Apache-2.0 and AbLang2 is BSD-3-Clause. See the License note above and the linked terms.
  • One campaign per call. Each call is a single complete run against one target. To screen several targets, call the tool once per target (for example, in a loop) rather than expecting one call to process multiple targets at once.
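The one-campaign-per-call pattern can be sketched as a plain loop. To keep the example runnable without a GPU, the design function is passed in as a parameter and replaced here by a stand-in: `screen_targets` and `fake_run_design` are hypothetical names, and the file paths and target names are placeholders. In real use the callable would be run_germinal_design, imported from wherever the toolkit exposes it.

```python
def screen_targets(targets, run_design):
    """Call `run_design` once per target and collect results by target name."""
    results = {}
    for name, kwargs in targets.items():
        results[name] = run_design(target_name=name, **kwargs)
    return results


def fake_run_design(**kwargs):
    # Hypothetical stand-in for the real design call; it only echoes
    # a summary of its inputs so the loop can be exercised offline.
    return {
        "target_name": kwargs["target_name"],
        "n_hotspots": len(kwargs["hotspots"]),
    }


# Placeholder targets; paths and hotspots are illustrative only.
targets = {
    "target_one": {
        "target_pdb": "target_one.pdb",
        "target_chain": "A",
        "hotspots": ["A37", "A39", "A41"],
    },
    "target_two": {
        "target_pdb": "target_two.pdb",
        "target_chain": "A",
        "hotspots": ["A12"],
    },
}

results = screen_targets(targets, fake_run_design)
print(results)
```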
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.