Boltz-2 - Proto

License: Boltz-2 is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

Proto is not affiliated with Boltz, MIT Jameel Clinic, or Recursion. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.


jwohlwend/boltz

Official repository for the Boltz biomolecular interaction models

3.9k stars


Boltz-1: Democratizing Biomolecular Interaction Modeling

Jeremy Wohlwend, Gabriele Corso, … Regina Barzilay

bioRxiv (2024)


@article{wohlwend2024boltz1,
  title={Boltz-1: Democratizing Biomolecular Interaction Modeling},
  author={Wohlwend, Jeremy and Corso, Gabriele and Passaro, Saro and Reveiz, Mateo and Leidal, Ken and Swanson, Wojtek and Turnbull, Robert and Shuaibi, Muhammed and Ahdritz, Gustaf and Getz, Gad and Jaakkola, Tommi and Barzilay, Regina},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.11.19.624167},
  publisher={Cold Spring Harbor Laboratory}
}


proto-bio/proto-tools/proto_tools/tools/structure_prediction/boltz2



Function	Description
`run_boltz2_affinity()`	Predicted binding affinity (log10 IC50 μM) and binder probability for a small molecule against a … (GPU)
`run_boltz2()`	Multi-modal structure prediction using Boltz-2 (GPU)

Background

Boltz-2 (Passaro et al., 2025) predicts the joint 3D structure of a biomolecular assembly from the sequences and chemical components it contains. It builds on Boltz-1, one of the most widely used open-source alternatives to AlphaFold3, extending that co-folding model with a binding-affinity module, improved controllability, and additional training data. As in AlphaFold3, a single model folds complexes that mix proteins, DNA, RNA, and small-molecule ligands and predicts how those components are arranged relative to one another.

Each protein chain can be paired with a multiple-sequence alignment (MSA) of evolutionarily related sequences, whose covariation patterns supply the evolutionary signal the model uses to place residues. Architecturally, Boltz-2 follows the AlphaFold3 design: it carries a single representation of the input tokens and a pairwise representation over token pairs, refines them through an AlphaFold3-style trunk, and generates all-atom coordinates with a diffusion module that starts from noise and iteratively denoises it into a structure. Several structures can be sampled per complex and ranked by a confidence score. Confidence is reported as a complex predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the relative placement of any two tokens, and predicted template-modeling (pTM) and interface predicted template-modeling (ipTM) scores that summarize overall and interface accuracy.

Beyond structure, Boltz-2 adds a binding-affinity module that approaches the accuracy of physics-based free-energy perturbation (FEP) while running more than 1000 times faster. The reference implementation is open-sourced at jwohlwend/boltz under the MIT license, covering the code, weights, and training pipeline for both academic and commercial use, with the released weights distributed as boltz-community/boltz-2. It was developed by the Boltz team at the MIT Jameel Clinic together with Recursion.

Learning Resources

Boltz-2: democratizing biomolecular interaction modeling (MIT Jameel Clinic and Recursion) - an accessible overview of Boltz-2, including how it extends Boltz-1 and its binding-affinity capability.

Tools

Boltz-2 Structure Prediction (`boltz2-prediction`)

Predicts the 3D structure of a biomolecular complex. Each input complex can combine protein, DNA, RNA, and ligand chains; the assembly is folded by Boltz-2 and returned as a predicted Structure per complex with confidence metrics: a complex pLDDT, pTM, interface pTM, per-chain and pairwise-chain pTM/ipTM, and predicted aligned error.

API Reference


Input: Boltz2Input

complexes

List[Complex]

required

List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain multiple chains of proteins, DNA, RNA, and/or ligands.


chains

List[Chain | Fragment]

required

Chains in the complex, in input order.

msas

array

Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.


Config: Boltz2Config

recycling_steps

integer

default:"3"

Iterative refinement passes through the model. Higher = more accurate but slower. Default 3 (matches upstream).

sampling_steps

integer

default:"200"

Denoising steps in the diffusion process. Higher = more refined but slower. Default 200 (matches upstream).

diffusion_samples

integer

default:"1"

Independent structure samples per complex; the best by confidence is returned. Default 1 (matches upstream).

step_scale

number

default:"1.5"

Diffusion step size (typical range 1.0-2.0). Lower = more sample diversity. Default 1.5 (matches upstream).

max_msa_seqs

integer

default:"8192"

Maximum number of MSA sequences fed into the model. Lower to reduce GPU memory on deep MSAs. Default 8192.

subsample_msa

boolean

default:"False"

Randomly subsample the MSA on each run for sample diversity (loses determinism). Default False.
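The interaction between max_msa_seqs and subsample_msa can be pictured with the following minimal sketch. It assumes an MSA is a list of aligned row strings with the query first; `cap_msa` is a hypothetical helper, not a proto_tools or upstream Boltz function, and the real row-selection rule may differ.

```python
import random
from typing import List, Optional


def cap_msa(rows: List[str], max_seqs: int = 8192, subsample: bool = False,
            seed: Optional[int] = None) -> List[str]:
    """Return at most max_seqs MSA rows, always keeping the query (row 0)."""
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")
    if len(rows) <= max_seqs:
        return list(rows)
    query, rest = rows[0], rows[1:]
    if subsample:
        # Random subset of the homologs; unseeded calls differ from run to run.
        rng = random.Random(seed)
        kept = rng.sample(rest, max_seqs - 1)
    else:
        # Deterministic truncation: keep the first homologs in file order.
        kept = rest[: max_seqs - 1]
    return [query] + kept
```

Lowering the cap shrinks the MSA tensor the model sees, which is where the GPU-memory saving comes from; turning on subsampling trades determinism for variety across runs.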

num_workers

integer

default:"4"

Number of CPU workers for parallel processing during prediction. Defaults to the smaller of the available CPU core count and 4; must be at least 1. Default: min(cpu_count, 4).
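The documented default for num_workers can be written out as follows; `default_num_workers` is a hypothetical name used only for illustration.

```python
import os


def default_num_workers(cap: int = 4) -> int:
    # os.cpu_count() can return None on some platforms; fall back to 1.
    cores = os.cpu_count() or 1
    # Smaller of the core count and the cap, never below the minimum of 1.
    return max(1, min(cores, cap))
```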

verbose

integer

default:"0"

Whether to print status messages during execution, including MSA generation, model loading, and prediction progress. Inherited from StructurePredictionConfig. Default: 0 (False, no status messages).

device

string

default:"cuda"

Device to run the model on. Options include "cuda" (NVIDIA GPU), "cpu" (CPU execution), or specific GPU devices like "cuda:0". Structure prediction is computationally intensive and strongly benefits from GPU acceleration. Default: "cuda".

timeout

integer

default:"1200"

Maximum execution time in seconds. None waits indefinitely. Default: 1200.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU floating-point noise (see BaseToolOutput.approx_equal), and the seed becomes part of the cache key. When None, seed-sensitive tools that would otherwise be cached bypass the cache until a seed is set.
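As an illustration of how a seed can gate caching, this sketch hashes the tool name, inputs, and config (seed included) and returns no key for an unseeded, seed-sensitive run. `cache_key` and its JSON-plus-SHA-256 scheme are assumptions for illustration; Proto's actual cache-key construction is not documented on this page.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def cache_key(tool: str, inputs: Dict[str, Any], config: Dict[str, Any],
              seed_sensitive: bool = True) -> Optional[str]:
    """Return a cache key, or None when the result should not be cached."""
    if seed_sensitive and config.get("seed") is None:
        # Unseeded stochastic runs are not reproducible, so skip the cache.
        return None
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps({"tool": tool, "inputs": inputs, "config": config},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the seed is inside the hashed config, two runs that differ only in seed never share a cached result.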

include_pae_matrix

boolean

default:"False"

Attach the full per-token PAE matrix as the pae metric; the scalar avg_pae is always emitted. Default: False.

use_msa

boolean

default:"True"

Whether to generate and use Multiple Sequence Alignments (MSAs) for protein chains using MMseqs2 homology search. Inherited from MSAStructurePredictionConfig. Default: True.

msa_search_config

Mmseqs2HomologySearchConfig

Configuration for MMseqs2 homology search (MSA generation). Only used when use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.

pair_heterocomplex_msas

boolean

default:"True"

Whether heterocomplex protein chains should use taxonomy-paired MSA generation. Inherited from MSAStructurePredictionConfig. Default: True.
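Taxonomy pairing, which sits behind paired=True and pair_heterocomplex_msas, places MSA rows for different chains side by side only when they come from the same organism, so that cross-chain covariation survives. The sketch handles two chains and assumes each hit carries a taxonomy ID and arrives sorted best-first; `pair_by_taxonomy` is hypothetical, and production pairing pipelines differ in detail (for example, they handle more than two chains).

```python
from typing import Dict, List, Tuple

# Each hit is (taxonomy_id, aligned_sequence), assumed sorted best-first.
Hits = List[Tuple[str, str]]


def pair_by_taxonomy(chain_a: Hits, chain_b: Hits) -> List[Tuple[str, str]]:
    """Build paired rows: one row per taxon present in both chains' hits."""
    # Keep only the best-ranked hit per taxon for chain B.
    best_b: Dict[str, str] = {}
    for taxon, seq in chain_b:
        best_b.setdefault(taxon, seq)
    paired: List[Tuple[str, str]] = []
    seen = set()
    for taxon, seq_a in chain_a:
        # Use the best-ranked chain-A hit per shared taxon, once.
        if taxon in best_b and taxon not in seen:
            paired.append((seq_a, best_b[taxon]))
            seen.add(taxon)
    return paired
```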


Output: Boltz2Output

structures

List[Structure]

required

Predicted structures, one per input complex, each carrying a Boltz2Metrics instance on .metrics.


structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Metrics (one set per structures item)

Metric	Type	Range	Availability
`confidence_score`	float	0.0 to 1.0	always
`ptm`	float	0.0 to 1.0	always
`iptm`	float	0.0 to 1.0	always
`chains_ptm`	list[float]	0.0 to 1.0	always
`pair_chains_iptm`	list[list[float]]	0.0 to 1.0	always
`avg_pae`	float	0.0 to 32.0	always
`pae`	list[list[float]]	0.0 to 32.0	when include_pae_matrix=True
`ligand_iptm`	float	0.0 to 1.0	depends on complex composition
`protein_iptm`	float	0.0 to 1.0	depends on complex composition
`complex_plddt`	float	0.0 to 1.0	depends on complex composition
`complex_iplddt`	float	0.0 to 1.0	depends on complex composition
`complex_pde`	float	≥ 0.0	depends on complex composition
`complex_ipde`	float	≥ 0.0	depends on complex composition

Applications

This tool predicts the structure of multi-component assemblies such as protein-DNA and protein-RNA complexes or protein-ligand binding poses. Running it on a multi-chain complex also estimates how confidently the components are placed relative to each other through interface pTM and PAE, which is informative for assessing predicted interfaces.
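With include_pae_matrix=True, the full PAE matrix can be summarized per interface. A common approach, sketched here, averages the off-diagonal blocks between two chains in both directions, since PAE is not symmetric. The sketch assumes the matrix rows and columns follow the input chain order and that the number of tokens per chain is known; `interchain_pae` is a hypothetical helper.

```python
from typing import List, Sequence


def interchain_pae(pae: Sequence[Sequence[float]], chain_lengths: List[int],
                   i: int, j: int) -> float:
    """Mean PAE (angstroms) over the chain-i/chain-j blocks in both directions."""
    if i == j:
        raise ValueError("pick two different chains")
    if len(pae) != sum(chain_lengths):
        raise ValueError("PAE size does not match the chain token counts")
    # Token offset at which each chain starts, assuming input chain order.
    starts = [sum(chain_lengths[:k]) for k in range(len(chain_lengths))]

    def block(a: int, b: int) -> List[float]:
        rows = range(starts[a], starts[a] + chain_lengths[a])
        cols = range(starts[b], starts[b] + chain_lengths[b])
        return [pae[r][c] for r in rows for c in cols]

    # PAE is asymmetric, so average both the (i, j) and (j, i) blocks.
    values = block(i, j) + block(j, i)
    return sum(values) / len(values)
```

Lower values suggest the two chains are placed confidently relative to each other.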

Usage Tips

use_msa defaults to True. A ColabFold search generates an MSA for each protein chain; set it False for single-sequence prediction, or attach precomputed MSAs to the input. Protein chains with no detectable homologs fall back to an empty MSA.
Structures come from a diffusion process. diffusion_samples (default 1) independent samples are drawn per complex and the best is kept by confidence_score; sampling_steps (default 200) sets the number of denoising steps and step_scale (default 1.5) trades accuracy for sample diversity, where lower values are more diverse.
recycling_steps (default 3) trades accuracy for time. More recycling iterations refine the prediction but increase runtime.
Confidence is reported as a complex pLDDT, pTM, ipTM, and PAE. confidence_score, the primary metric, is iptm for multi-chain complexes and ptm for a single chain; complex_plddt is on a 0 to 1 scale and PAE is in angstroms (0 to about 32). Set include_pae_matrix to attach the full per-token PAE matrix.
Multi-modal inputs. Protein, DNA, RNA, and ligand entities are supported; chain modifications are not.
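The selection rule in these tips (draw diffusion_samples structures, keep the highest confidence_score) amounts to a sort. This sketch works on plain dicts whose keys follow the metrics table; the dict representation and `rank_samples` are assumptions, not the toolkit's Metrics object.

```python
from typing import Dict, List


def rank_samples(samples: List[Dict[str, float]],
                 key: str = "confidence_score") -> List[int]:
    """Return sample indices ordered from highest to lowest confidence."""
    if not samples:
        raise ValueError("need at least one sample")
    # sorted() stays stable with reverse=True, so tied samples keep draw order.
    return sorted(range(len(samples)), key=lambda k: samples[k][key], reverse=True)
```

The first index is the structure that would be kept; the rest are runners-up worth inspecting when diffusion_samples is above 1.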

Boltz-2 Affinity (`boltz2-affinity`)

Predicts the binding affinity of a single small-molecule ligand against a protein target. Each input complex must contain at least one protein chain and at least one ligand chain; the binder is the complex’s sole ligand (auto-detected) or the chain named by binder_chain. Each complex returns a predicted Structure with the binding pose in the CIF and the affinity scores on structure.metrics: affinity_pred_value (log10 IC50 in μM; lower is stronger binding) and affinity_probability_binary (binder probability in [0, 1]).

API Reference


Input: Boltz2AffinityInput

binder_chain

SingleChainSelection

Ligand to score; None auto-detects the sole ligand.

complexes

List[Complex]

required

Each complex needs at least one protein chain (the target) and at least one ligand chain.


chains

List[Chain | Fragment]

required

Chains in the complex, in input order.

msas

array

Inherited per-complex MSAs; each entry is a ComplexMSAs (paired=True for taxonomy-paired heterocomplexes).
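The composition and binder rules for Boltz2AffinityInput can be sketched as a validation step. `Chain` and `resolve_binder` are hypothetical stand-ins, not the toolkit's Chain or SingleChainSelection types, and the 128-heavy-atom limit is left out because counting heavy atoms needs a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chain:  # hypothetical stand-in for the toolkit's chain types
    chain_id: str
    kind: str  # "protein", "dna", "rna", or "ligand"


def resolve_binder(chains: List[Chain], binder_chain: Optional[str] = None) -> Chain:
    """Apply the documented rules: >=1 protein, >=1 ligand, one binder ligand."""
    if not any(c.kind == "protein" for c in chains):
        raise ValueError("complex needs at least one protein chain")
    ligands = [c for c in chains if c.kind == "ligand"]
    if not ligands:
        raise ValueError("complex needs at least one ligand chain")
    if binder_chain is None:
        # Auto-detection only works when the ligand is unambiguous.
        if len(ligands) != 1:
            raise ValueError("several ligands: set binder_chain explicitly")
        return ligands[0]
    for c in ligands:
        if c.chain_id == binder_chain:
            return c
    raise ValueError(f"binder_chain {binder_chain!r} is not a ligand chain")
```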


Config: Boltz2AffinityConfig

affinity_mw_correction

boolean

default:"False"

Apply molecular-weight correction to the affinity value head. Default: False.

sampling_steps_affinity

integer

default:"200"

Denoising steps for the affinity pass. Default: 200.

diffusion_samples_affinity

integer

default:"5"

Diffusion samples per complex for the affinity pass. Default: 5.

verbose

integer

default:"0"

Whether to print status messages during execution, including MSA generation, model loading, and prediction progress. Inherited from StructurePredictionConfig. Default: 0 (False, no status messages).

device

string

default:"cuda"

Device to run the model on, such as "cuda", "cuda:0", or "cpu". Default: "cuda".

timeout

integer

default:"1200"

Maximum execution time in seconds. None waits indefinitely. Default: 1200.

seed

integer

Random seed. The affinity pass is stochastic, so set a seed for reproducible predictions.

include_pae_matrix

boolean

default:"False"

No-op for affinity; excluded from the cache key. Default: False.

use_msa

boolean

default:"True"

Inherited. Use MMseqs2 MSAs for protein chains. Default: True.

msa_search_config

Mmseqs2HomologySearchConfig

Inherited. MMseqs2 homology-search config. Default: None.

pair_heterocomplex_msas

boolean

default:"True"

Inherited. Use taxonomy-paired MSA generation for heterocomplex protein chains. Default: True.

recycling_steps

integer

default:"3"

Inherited. Refinement passes for the structure pass. Default: 3.

sampling_steps

integer

default:"200"

Inherited. Denoising steps for the structure pass. Default: 200.

diffusion_samples

integer

default:"1"

Inherited. Structure samples per complex. Default: 1.

step_scale

number

default:"1.5"

Inherited. Diffusion step size for the structure pass. Default: 1.5.

max_msa_seqs

integer

default:"8192"

Inherited. Cap on MSA depth fed into the model. Default: 8192.

subsample_msa

boolean

default:"False"

Inherited. Randomly subsample the MSA each run. Default: False.

num_workers

integer

default:"4"

Inherited. Dataloader workers for prediction. Default: min(cpu_count, 4).


Output: Boltz2AffinityOutput

structures

List[Structure]

required

List of predicted structures, one per input complex, in the same order as the input complexes. Each structure holds the 3D coordinates, including the binding pose, in CIF format, with the affinity scores attached on .metrics.


structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Metrics (one set per structures item)

Metric	Type	Range	Availability
`affinity_pred_value`	float	unbounded	always
`affinity_probability_binary`	float	0.0 to 1.0	always
`affinity_pred_value1`	float	unbounded	when ensemble emits per-model values
`affinity_probability_binary1`	float	0.0 to 1.0	when ensemble emits per-model values
`affinity_pred_value2`	float	unbounded	when ensemble emits per-model values
`affinity_probability_binary2`	float	0.0 to 1.0	when ensemble emits per-model values

Applications

This tool scores candidate ligands against a chosen protein target, pairing each predicted affinity with a predicted binding pose. Comparing those scores across a list of SMILES supports hit discovery, structure-activity studies, and library-screening loops.
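A library-screening loop usually ends with a ranking step like this sketch: drop candidates whose binder probability falls below a cutoff, then sort the rest by affinity_pred_value, strongest first. The input dict layout, the 0.5 cutoff, and `rank_ligands` are illustrative assumptions; any SMILES and scores fed to it in practice would come from boltz2-affinity runs.

```python
from typing import Dict, List, Tuple


def rank_ligands(results: Dict[str, Tuple[float, float]],
                 min_probability: float = 0.5) -> List[str]:
    """Order SMILES by predicted affinity, strongest first.

    results maps SMILES -> (affinity_pred_value, affinity_probability_binary).
    """
    # Filter on the binder probability before ranking by affinity.
    kept = [(smi, value) for smi, (value, prob) in results.items()
            if prob >= min_probability]
    # Lower log10(IC50 in uM) means stronger predicted binding.
    return [smi for smi, _ in sorted(kept, key=lambda item: item[1])]
```

Filtering first keeps confidently non-binding candidates from ranking high on an uncertain affinity estimate.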

Usage Tips

affinity_pred_value is on a log10-IC50 (μM) scale. Values below 0 (sub-μM IC50) indicate strong binders; positive values indicate weaker binding. affinity_probability_binary is an independent binder probability and can stay high even when the IC50 estimate is uncertain.
One binder ligand per complex. The binder is auto-detected when a complex has exactly one ligand; set binder_chain (e.g. "B") to name it when a complex has several. The binder must be a ligand chain with at most 128 heavy atoms.
Structure-side and affinity-side knobs are independent. recycling_steps, sampling_steps, diffusion_samples, and MSA settings control the structure pass that runs first; sampling_steps_affinity (default 200) and diffusion_samples_affinity (default 5) control the affinity pass. Set affinity_mw_correction to apply Boltz-2’s molecular-weight correction to the affinity value head.
Stochastic predictions. The diffusion-based affinity head is stochastic; set seed for reproducibility.
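Because affinity_pred_value is log10(IC50) with IC50 in μM, converting it to other common units is plain arithmetic: IC50 in μM is 10 raised to the value, and pIC50 (the negative log10 of IC50 in molar) is 6 minus the value. The helper names are hypothetical.

```python
def ic50_um(pred_value: float) -> float:
    """IC50 in micromolar from affinity_pred_value = log10(IC50 / 1 uM)."""
    return 10.0 ** pred_value


def ic50_nm(pred_value: float) -> float:
    """IC50 in nanomolar (1 uM = 1000 nM)."""
    return 1000.0 * ic50_um(pred_value)


def pic50(pred_value: float) -> float:
    # pIC50 = -log10(IC50 in M) = -(pred_value - 6) = 6 - pred_value
    return 6.0 - pred_value
```

For example, a value of -1 corresponds to an IC50 of 0.1 μM (100 nM), or a pIC50 of 7.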

Toolkit Notes

These apply to every Boltz-2 tool in this toolkit (boltz2-prediction, boltz2-affinity).

Requires a GPU. Boltz-2 runs through a PyTorch backend and needs an NVIDIA GPU; CPU execution is not practical.
MSA-based and AlphaFold3-style. Boltz-2 uses optional MSAs and a diffusion process. subsample_msa and unseeded runs are intentionally non-deterministic.
Shared model weights. Both tools run the same bundled Boltz-2 checkpoint; the affinity head ships with it, so boltz2-affinity needs no extra download or environment.

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
