License: Boltz-2 is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

Proto is not affiliated with Boltz, the MIT Jameel Clinic, or Recursion. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.


jwohlwend/boltz
Official repository for the Boltz biomolecular interaction models
Boltz-1: Democratizing Biomolecular Interaction Modeling
Jeremy Wohlwend, Gabriele Corso, … Regina Barzilay
bioRxiv (2024)
@article{wohlwend2024boltz1,
  title={Boltz-1: Democratizing Biomolecular Interaction Modeling},
  author={Wohlwend, Jeremy and Corso, Gabriele and Passaro, Saro and Reveiz, Mateo and Leidal, Ken and Swanson, Wojtek and Turnbull, Robert and Shuaibi, Muhammed and Ahdritz, Gustaf and Getz, Gad and Jaakkola, Tommi and Barzilay, Regina},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.11.19.624167},
  publisher={Cold Spring Harbor Laboratory}
}
proto-bio/proto-tools/proto_tools/tools/structure_prediction/boltz2
Coming soon: run this tool directly in Proto with no setup required.
Function | Description
--- | ---
run_boltz2_affinity() | Predicted binding affinity (log10 IC50 μM) and binder probability for a small molecule against a … (GPU)
run_boltz2() | Multi-modal structure prediction using Boltz-2 (GPU)

Background

Boltz-2 (Passaro et al., 2025) predicts the joint 3D structure of a biomolecular assembly from the sequences and chemical components it contains. It builds on Boltz-1, one of the most widely used open-source alternatives to AlphaFold3, extending that co-folding model with a binding-affinity module, improved controllability, and additional training data. Like AlphaFold3, it uses a single model to fold complexes that mix proteins, DNA, RNA, and small-molecule ligands, predicting how those components are arranged relative to one another.

Each protein chain can be paired with a multiple-sequence alignment (MSA) of evolutionarily related sequences, whose covariation patterns supply the evolutionary signal the model uses to place residues. Architecturally, Boltz-2 closely follows AlphaFold3: it maintains a single representation of the input tokens and a pairwise representation over token pairs, refines them through an AlphaFold3-style trunk, and generates all-atom coordinates with a diffusion module that starts from noise and iteratively denoises it into a structure.

Several structures can be sampled per complex and ranked by confidence. Confidence is reported as a complex predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the relative placement of any two tokens, and predicted template-modeling (pTM) and interface predicted template-modeling (ipTM) scores that summarize overall and interface accuracy. Beyond structure, Boltz-2's binding-affinity module approaches the accuracy of physics-based free-energy perturbation (FEP) while running more than 1000 times faster.

The reference implementation is open-sourced at jwohlwend/boltz under the MIT license, covering the code, weights, and training pipeline for both academic and commercial use, with the released weights distributed as boltz-community/boltz-2. It was developed by the Boltz team at the MIT Jameel Clinic together with Recursion.


Tools

Boltz-2 Structure Prediction (boltz2-prediction)

Predicts the 3D structure of a biomolecular complex. Each input complex can combine protein, DNA, RNA, and ligand chains; the assembly is folded by Boltz-2 and returned as a predicted Structure per complex with confidence metrics: a complex pLDDT, pTM, interface pTM, per-chain and pairwise-chain pTM/ipTM, and predicted aligned error.

API Reference

Input
complexes
List[Complex]
required
List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain multiple chains of proteins, DNA, RNA, and/or ligands.
msas
array
Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.
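When supplying msas directly, it may help to picture the shape the description implies: one entry per complex, each holding per-chain MSAs keyed by chain index, where paired chains share a row count so that row k lines up across chains. The sketch below models that shape with two hypothetical dataclasses, ChainMSASketch and ComplexMSAsSketch; they are illustrative stand-ins, not the toolkit's ComplexMSAs class, whose real fields may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChainMSASketch:
    """Hypothetical stand-in for one chain's MSA; not the toolkit's class."""
    rows: List[str]       # aligned sequences; row 0 is the query chain itself
    paired: bool = False  # True: row k is taxonomy-matched to row k of other paired chains


@dataclass
class ComplexMSAsSketch:
    """Hypothetical stand-in for ComplexMSAs: per-chain MSAs keyed by chain index."""
    by_chain: Dict[int, ChainMSASketch] = field(default_factory=dict)

    def paired_rows_consistent(self) -> bool:
        # Row-wise pairing only makes sense if every paired chain has the same depth.
        depths = {len(m.rows) for m in self.by_chain.values() if m.paired}
        return len(depths) <= 1


# One entry per complex, in the same order as `complexes`.
msas = [
    ComplexMSAsSketch({
        0: ChainMSASketch(["MKTAYIAK", "MKSAYIAK"], paired=True),
        1: ChainMSASketch(["GSHMLE", "GSHLLE"], paired=True),
    }),
]
print(msas[0].paired_rows_consistent())  # True
```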
Configuration
recycling_steps
integer
default:"3"
Iterative refinement passes through the model. Higher = more accurate but slower. Default 3 (matches upstream).
sampling_steps
integer
default:"200"
Denoising steps in the diffusion process. Higher = more refined but slower. Default 200 (matches upstream).
diffusion_samples
integer
default:"1"
Independent structure samples per complex; the best by confidence is returned. Default 1 (matches upstream).
step_scale
number
default:"1.5"
Diffusion step size (typical range 1.0-2.0). Lower = more sample diversity. Default 1.5 (matches upstream).
max_msa_seqs
integer
default:"8192"
Maximum number of MSA sequences fed into the model. Lower to reduce GPU memory on deep MSAs. Default 8192.
subsample_msa
boolean
default:"False"
Randomly subsample the MSA on each run for sample diversity (loses determinism). Default False.
num_workers
integer
default:"4"
Number of CPU workers for parallel processing during prediction. Must be at least 1. Default: min(cpu_count, 4), i.e. the smaller of the available CPU core count and 4.
verbose
integer
default:"0"
Whether to print status messages during execution, including MSA generation, model loading, and prediction progress. Inherited from StructurePredictionConfig. Default: 0 (no status messages).
device
string
default:"cuda"
Device to run the model on. Options include "cuda" (NVIDIA GPU), "cpu" (CPU execution), or specific GPU devices like "cuda:0". Structure prediction is computationally intensive and strongly benefits from GPU acceleration. Default: "cuda".
timeout
integer
default:"1200"
Maximum execution time in seconds. None waits indefinitely. Default: 1200.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU floating-point noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip the cache until a seed is set.
include_pae_matrix
boolean
default:"False"
Attach the full per-token PAE matrix as the pae metric (avg_pae is always emitted). Default: False.
use_msa
boolean
default:"True"
Whether to generate and use Multiple Sequence Alignments (MSAs) for protein chains using MMseqs2 homology search. Inherited from MSAStructurePredictionConfig. Default: True.
msa_search_config
Mmseqs2HomologySearchConfig
Configuration for MMseqs2 homology search (MSA generation). Only used when use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.
pair_heterocomplex_msas
boolean
default:"True"
Whether heterocomplex protein chains should use taxonomy-paired MSA generation. Inherited from MSAStructurePredictionConfig. Default: True.
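max_msa_seqs and subsample_msa interact: the first caps how many MSA rows reach the model, the second decides whether the kept rows are chosen at random. The sketch below illustrates that trade-off; the hypothetical helper select_msa_rows, the query-first convention, and the "take the first N rows" rule are assumptions for illustration, not Boltz-2's actual selection logic.

```python
import random
from typing import List, Optional


def select_msa_rows(rows: List[str], max_msa_seqs: int = 8192,
                    subsample_msa: bool = False,
                    seed: Optional[int] = None) -> List[str]:
    """Illustrative depth cap for an MSA (hypothetical; not Boltz-2's actual logic).

    Row 0 (the query) is always kept. Without subsampling, the first
    max_msa_seqs rows are kept, so repeated calls give the same result.
    With subsampling, the other rows are drawn at random, which is
    reproducible only when a seed is supplied.
    """
    if max_msa_seqs < 1:
        raise ValueError("max_msa_seqs must be at least 1")
    if len(rows) <= max_msa_seqs:
        return list(rows)  # already within the cap; nothing to drop
    query, rest = rows[0], rows[1:]
    if not subsample_msa:
        return [query] + rest[: max_msa_seqs - 1]
    rng = random.Random(seed)  # seed=None draws fresh OS entropy on each call
    keep = sorted(rng.sample(range(len(rest)), max_msa_seqs - 1))
    return [query] + [rest[i] for i in keep]


rows = ["QUERY"] + [f"hit{i}" for i in range(10)]
print(select_msa_rows(rows, max_msa_seqs=4))  # ['QUERY', 'hit0', 'hit1', 'hit2']
```

Lowering the cap reduces memory on deep alignments at the cost of evolutionary signal; turning on subsampling trades determinism for diversity across runs.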
Output
structures
List[Structure]
required
Predicted structures, each carrying a Boltz2Metrics instance on .metrics.
Metrics (one set per structures item)
Metric | Type | Range | Availability
--- | --- | --- | ---
confidence_score | float | 0.0 to 1.0 | always
ptm | float | 0.0 to 1.0 | always
iptm | float | 0.0 to 1.0 | always
chains_ptm | list[float] | 0.0 to 1.0 | always
pair_chains_iptm | list[list[float]] | 0.0 to 1.0 | always
avg_pae | float | 0.0 to 32.0 Å | always
pae | list[list[float]] | 0.0 to 32.0 Å | when include_pae_matrix=True
ligand_iptm | float | 0.0 to 1.0 | depends on complex composition
protein_iptm | float | 0.0 to 1.0 | depends on complex composition
complex_plddt | float | 0.0 to 1.0 | depends on complex composition
complex_iplddt | float | 0.0 to 1.0 | depends on complex composition
complex_pde | float | ≥ 0.0 | depends on complex composition
complex_ipde | float | ≥ 0.0 | depends on complex composition
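In a multi-chain prediction, pair_chains_iptm is a natural place to look for the weakest interface. The sketch below assumes pair_chains_iptm[i][j] scores chain i against chain j in input order and that the list has been read from structure.metrics; that layout is an assumption, and weakest_interface is a hypothetical helper. Because the matrix need not be symmetric, it conservatively takes the lower of the two directions.

```python
from itertools import combinations
from typing import List, Tuple


def weakest_interface(pair_chains_iptm: List[List[float]]) -> Tuple[int, int, float]:
    """Return (i, j, score) for the chain pair with the lowest pairwise ipTM.

    Assumes pair_chains_iptm[i][j] scores chain i against chain j (input order).
    The matrix may be asymmetric, so each pair uses the lower of both directions.
    """
    n = len(pair_chains_iptm)
    if n < 2:
        raise ValueError("need at least two chains to form an interface")
    scored = [
        (i, j, min(pair_chains_iptm[i][j], pair_chains_iptm[j][i]))
        for i, j in combinations(range(n), 2)
    ]
    return min(scored, key=lambda t: t[2])


# Example 3-chain matrix (made-up numbers for illustration only).
iptm = [
    [0.90, 0.80, 0.30],
    [0.82, 0.88, 0.50],
    [0.35, 0.45, 0.70],
]
print(weakest_interface(iptm))  # (0, 2, 0.3)
```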

Applications

This tool predicts the structure of multi-component assemblies such as protein-DNA and protein-RNA complexes or protein-ligand binding poses. Running it on a multi-chain complex also estimates how confidently the components are placed relative to each other through interface pTM and PAE, which is informative for assessing predicted interfaces.
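With include_pae_matrix=True, the full PAE matrix can be reduced to a single interface number: the mean PAE over token pairs that span two chains, where low values mean the chains are confidently placed relative to each other. The sketch below uses the usual PAE convention (pae[i][j] is the expected error of token j after aligning on token i). The token-to-chain list token_chain and the helper interface_pae are hypothetical; how to derive that mapping from a predicted Structure is left as an assumption.

```python
from typing import List, Sequence


def interface_pae(pae: List[List[float]], token_chain: Sequence[int],
                  chain_a: int, chain_b: int) -> float:
    """Mean PAE (Å) over token pairs spanning chain_a and chain_b, both directions.

    token_chain[k] is the chain index of token k (a hypothetical input that
    the caller must build from the structure).
    """
    if len(pae) != len(token_chain):
        raise ValueError("PAE matrix size must match the number of tokens")
    wanted = {(chain_a, chain_b), (chain_b, chain_a)}
    vals = [
        pae[i][j]
        for i, ci in enumerate(token_chain)
        for j, cj in enumerate(token_chain)
        if (ci, cj) in wanted
    ]
    if not vals:
        raise ValueError("no tokens found for one of the chains")
    return sum(vals) / len(vals)


# Two tokens on chain 0 and one on chain 1 (made-up values).
pae = [
    [0.5, 2.0, 10.0],
    [2.0, 0.5, 12.0],
    [14.0, 16.0, 0.5],
]
print(interface_pae(pae, [0, 0, 1], 0, 1))  # 13.0
```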

Usage Tips

  • use_msa defaults to True. A ColabFold (MMseqs2) search generates an MSA for each protein chain; set it to False for single-sequence prediction, or supply precomputed MSAs through msas. Protein chains with no detectable homologs fall back to an empty MSA.
  • Structures come from a diffusion process. diffusion_samples (default 1) independent samples are drawn per complex and the best is kept by confidence_score; sampling_steps (default 200) sets the number of denoising steps and step_scale (default 1.5) trades accuracy for sample diversity, where lower values are more diverse.
  • recycling_steps (default 3) trades accuracy for time. More recycling iterations refine the prediction but increase runtime.
  • Confidence is reported as a complex pLDDT, pTM, ipTM, and PAE. confidence_score, the primary metric, is iptm for multi-chain complexes and ptm for a single chain; complex_plddt is on a 0 to 1 scale and PAE is in angstroms (0 to about 32). Set include_pae_matrix to attach the full per-token PAE matrix.
  • Multi-modal inputs. Protein, DNA, RNA, and ligand entities are supported; chain modifications are not.
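The best-of-N selection described in these tips can be sketched as follows, using the confidence_score rule exactly as the tip states it (ipTM for multi-chain complexes, pTM for a single chain). The tool performs this selection internally; SampleSketch, confidence_score, and pick_best here are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleSketch:
    """Hypothetical record for one diffusion sample's summary scores."""
    ptm: float
    iptm: float


def confidence_score(s: SampleSketch, num_chains: int) -> float:
    # Rule as stated in the tips: ipTM for multi-chain complexes, pTM for one chain.
    return s.iptm if num_chains > 1 else s.ptm


def pick_best(samples: List[SampleSketch], num_chains: int) -> SampleSketch:
    """Keep the sample with the highest confidence_score (one per complex)."""
    if not samples:
        raise ValueError("diffusion_samples must be at least 1")
    return max(samples, key=lambda s: confidence_score(s, num_chains))


samples = [SampleSketch(ptm=0.90, iptm=0.40), SampleSketch(ptm=0.70, iptm=0.80)]
print(pick_best(samples, num_chains=2))  # SampleSketch(ptm=0.7, iptm=0.8)
```

The same two samples pick different winners for a single chain versus a complex, which is why raising diffusion_samples mainly pays off when the ranking metric actually separates the candidates.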

Boltz-2 Affinity (boltz2-affinity)

Predicts the binding affinity of a single small-molecule ligand against a protein target. Each input complex must contain at least one protein chain and at least one ligand chain; the binder is the complex’s sole ligand (auto-detected) or the chain named by binder_chain. Each complex returns a predicted Structure with the binding pose in the CIF and the affinity scores on structure.metrics: affinity_pred_value (log10 IC50 in μM; lower is stronger binding) and affinity_probability_binary (binder probability in [0, 1]).

API Reference

Input
binder_chain
SingleChainSelection
Ligand to score; None auto-detects the sole ligand.
complexes
List[Complex]
required
Each needs >=1 protein target and >=1 ligand chain.
msas
array
Inherited per-complex MSAs; each entry is a ComplexMSAs (paired=True for taxonomy-paired heterocomplexes).
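The binder rules on this page (at least one protein and one ligand chain, auto-detection when there is exactly one ligand, an explicit binder_chain otherwise, and the 128-heavy-atom limit from the usage tips) can be summarized as a validation sketch. ChainSketch and resolve_binder are hypothetical; the real binder_chain is a SingleChainSelection rather than a plain string, and the toolkit's own checks and error messages may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChainSketch:
    """Hypothetical chain record; not the toolkit's Complex or chain classes."""
    chain_id: str
    kind: str              # "protein", "dna", "rna" or "ligand"
    heavy_atoms: int = 0   # only meaningful for ligands


MAX_BINDER_HEAVY_ATOMS = 128  # limit quoted in the usage tips


def resolve_binder(chains: List[ChainSketch],
                   binder_chain: Optional[str] = None) -> ChainSketch:
    """Pick the ligand to score, mirroring the rules described for boltz2-affinity."""
    if not any(c.kind == "protein" for c in chains):
        raise ValueError("complex needs at least one protein target chain")
    ligands = [c for c in chains if c.kind == "ligand"]
    if not ligands:
        raise ValueError("complex needs at least one ligand chain")
    if binder_chain is None:
        if len(ligands) != 1:
            raise ValueError("several ligands present; set binder_chain")
        binder = ligands[0]
    else:
        matches = [c for c in ligands if c.chain_id == binder_chain]
        if not matches:
            raise ValueError(f"binder_chain {binder_chain!r} is not a ligand chain")
        binder = matches[0]
    if binder.heavy_atoms > MAX_BINDER_HEAVY_ATOMS:
        raise ValueError("binder exceeds the heavy-atom limit")
    return binder


complex_chains = [ChainSketch("A", "protein"), ChainSketch("B", "ligand", 24)]
print(resolve_binder(complex_chains).chain_id)  # B
```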
Configuration
affinity_mw_correction
boolean
default:"False"
Apply molecular-weight correction to the affinity value head. Default: False.
sampling_steps_affinity
integer
default:"200"
Denoising steps for the affinity pass. Default: 200.
diffusion_samples_affinity
integer
default:"5"
Diffusion samples per complex for the affinity pass. Default: 5.
verbose
integer
default:"0"
Whether to print status messages during execution, including MSA generation, model loading, and prediction progress. Inherited from StructurePredictionConfig. Default: 0 (no status messages).
device
string
default:"cuda"
Device to run the model on. Options include "cuda" (NVIDIA GPU), "cpu" (CPU execution), or specific GPU devices like "cuda:0". Structure prediction is computationally intensive and strongly benefits from GPU acceleration. Default: "cuda".
timeout
integer
default:"1200"
Maximum execution time in seconds. None waits indefinitely. Default: 1200.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU floating-point noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip the cache until a seed is set.
include_pae_matrix
boolean
default:"False"
No-op for affinity; excluded from the cache key. Default: False.
use_msa
boolean
default:"True"
Inherited. Use MMseqs2 MSAs for protein chains. Default: True.
msa_search_config
Mmseqs2HomologySearchConfig
Inherited. MMseqs2 homology-search config. Default: None.
pair_heterocomplex_msas
boolean
default:"True"
Inherited. Use taxonomy-paired MSA generation for heterocomplex protein chains. Default: True.
recycling_steps
integer
default:"3"
Inherited. Refinement passes for the structure pass. Default: 3.
sampling_steps
integer
default:"200"
Inherited. Denoising steps for the structure pass. Default: 200.
diffusion_samples
integer
default:"1"
Inherited. Structure samples per complex. Default: 1.
step_scale
number
default:"1.5"
Inherited. Diffusion step size for the structure pass. Default: 1.5.
max_msa_seqs
integer
default:"8192"
Inherited. Cap on MSA depth fed into the model. Default: 8192.
subsample_msa
boolean
default:"False"
Inherited. Randomly subsample the MSA each run. Default: False.
num_workers
integer
default:"4"
Inherited. Dataloader workers for prediction. Default: min(cpu_count, 4).
Output
structures
List[Structure]
required
List of predicted structures, one per input complex. Each structure contains the 3D coordinates in CIF format along with model-specific confidence metrics. The order matches the input complexes order.
Metrics (one set per structures item)
Metric | Type | Range | Availability
--- | --- | --- | ---
affinity_pred_value | float | unbounded | always
affinity_probability_binary | float | 0.0 to 1.0 | always
affinity_pred_value1 | float | unbounded | when the ensemble emits per-model values
affinity_probability_binary1 | float | 0.0 to 1.0 | when the ensemble emits per-model values
affinity_pred_value2 | float | unbounded | when the ensemble emits per-model values
affinity_probability_binary2 | float | 0.0 to 1.0 | when the ensemble emits per-model values
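Because the per-model values (suffixes 1 and 2) appear only when the ensemble emits them, downstream code should treat them as optional. The sketch below reads them defensively and reports the gap between the two ensemble members alongside the headline scores. It takes a plain dict keyed by the metric names in the table; building that dict from structure.metrics is an assumption, and affinity_summary is a hypothetical helper.

```python
from typing import Dict, Optional


def affinity_summary(metrics: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Headline affinity scores plus the per-model spread when it is available.

    metrics is a plain dict keyed by the affinity metric names; obtaining it
    from structure.metrics is left to the caller (an assumption).
    """
    per_model = [metrics[k] for k in ("affinity_pred_value1", "affinity_pred_value2")
                 if k in metrics]
    # A large gap between ensemble members may hint the estimate is less certain.
    spread = max(per_model) - min(per_model) if len(per_model) == 2 else None
    return {
        "affinity_pred_value": metrics["affinity_pred_value"],
        "affinity_probability_binary": metrics["affinity_probability_binary"],
        "per_model_spread": spread,
    }


# Made-up values for illustration.
example = {"affinity_pred_value": -0.5, "affinity_probability_binary": 0.8,
           "affinity_pred_value1": -0.7, "affinity_pred_value2": -0.3}
print(affinity_summary(example))
```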

Applications

This tool ranks candidate ligands against a chosen protein target, pairing a predicted affinity with a predicted binding pose. It supports hit discovery, structure-activity studies, and library-screening loops over a list of SMILES.
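For a screening loop, the two headline scores can be combined into a ranking once one affinity run per ligand has finished. The sketch below filters by binder probability and then orders by probability, breaking ties with the lower (stronger) log10 IC50. The results dict, the helper rank_ligands, and the 0.5 cutoff are illustrative assumptions, not a recommended protocol.

```python
from typing import Dict, List, Tuple


def rank_ligands(results: Dict[str, Tuple[float, float]],
                 min_probability: float = 0.5) -> List[str]:
    """Rank SMILES by predicted binder probability, then by predicted affinity.

    results maps SMILES -> (affinity_pred_value, affinity_probability_binary),
    collected from one boltz2-affinity prediction per ligand. The default
    probability cutoff is an illustrative choice only.
    """
    kept = [(smi, v, p) for smi, (v, p) in results.items() if p >= min_probability]
    # Higher probability first; among ties, lower log10 IC50 (stronger binding) first.
    kept.sort(key=lambda t: (-t[2], t[1]))
    return [smi for smi, _, _ in kept]


# Made-up scores for illustration.
results = {
    "CCO": (1.2, 0.2),
    "c1ccccc1O": (-0.4, 0.9),
    "CC(=O)Nc1ccc(O)cc1": (-1.1, 0.9),
    "CCN": (0.3, 0.6),
}
print(rank_ligands(results))  # ['CC(=O)Nc1ccc(O)cc1', 'c1ccccc1O', 'CCN']
```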

Usage Tips

  • affinity_pred_value is on a log10-IC50 (μM) scale. Values below 0 (sub-μM IC50) indicate strong binders; positive values indicate weaker binding. affinity_probability_binary is an independent binder probability and can stay high even when the IC50 estimate is uncertain.
  • One binder ligand per complex. The binder is auto-detected when a complex has exactly one ligand; set binder_chain (e.g. "B") to name it when a complex has several. The binder must be a ligand chain with at most 128 heavy atoms.
  • Structure-side and affinity-side knobs are independent. recycling_steps, sampling_steps, diffusion_samples, and MSA settings control the structure pass that runs first; sampling_steps_affinity (default 200) and diffusion_samples_affinity (default 5) control the affinity pass. Set affinity_mw_correction to apply Boltz-2’s molecular-weight correction to the affinity value head.
  • Stochastic predictions. The affinity pass samples binding poses with the diffusion module (diffusion_samples_affinity), so predictions vary between runs; set seed for reproducibility.
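Since affinity_pred_value is log10 of the IC50 expressed in μM, converting it to other common units is simple arithmetic: IC50 in μM is 10 raised to the value, IC50 in nM is 10 raised to (value + 3), and because 1 μM = 10^-6 M, pIC50 = -log10(IC50 in M) = 6 - value. The helper names below are hypothetical.

```python
def ic50_um(affinity_pred_value: float) -> float:
    """IC50 in micromolar from a log10(IC50 / 1 uM) prediction."""
    return 10.0 ** affinity_pred_value


def ic50_nm(affinity_pred_value: float) -> float:
    """IC50 in nanomolar: 1 uM = 1000 nM, so add 3 to the log10 value."""
    return 10.0 ** (affinity_pred_value + 3.0)


def pic50(affinity_pred_value: float) -> float:
    """pIC50 = -log10(IC50 in molar); with 1 uM = 1e-6 M this is 6 - value."""
    return 6.0 - affinity_pred_value


# A prediction of -1.0 corresponds to an IC50 of 0.1 uM (100 nM).
print(pic50(-1.0))  # 7.0
```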

Toolkit Notes

These apply to every Boltz-2 tool in this toolkit (boltz2-prediction, boltz2-affinity).
  • Requires a GPU. Boltz-2 runs through a PyTorch backend and needs an NVIDIA GPU; CPU execution is not practical.
  • MSA-based and AlphaFold3-style. Boltz-2 optionally uses MSAs and samples structures with a diffusion process, so runs with subsample_msa=True or without a seed are intentionally non-deterministic.
  • Shared model weights. Both tools run the same bundled Boltz-2 checkpoint; the affinity head ships with it, so boltz2-affinity needs no extra download or environment.
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.