Proto is not affiliated with Biohub. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
ESMFold2 (Candido et al., 2026) extends the ESM family from protein-only single-sequence folding to all-atom prediction of biomolecular complexes. Where the original ESMFold (Lin et al., 2023) used the ESM-2 protein language model as a learned substitute for an MSA and folded a single protein chain into a backbone, ESMFold2 supports proteins, DNA, RNA, small-molecule ligands, modified residues, and covalent bonds in a single joint prediction, comparable in scope to AlphaFold3 and Boltz-2. The model can be run in single-sequence mode, or, when an MSA is available for a protein chain, conditioned on the alignment to recover the evolutionary signal that aids prediction of difficult or sparsely-engineered targets. Architecturally, ESMFold2 conditions on representations from the frozen ESMC 6B language model, pools them into a two-dimensional pair representation refined through a stack of folding layers with a stabilized recurrent update, and concludes with a diffusion transformer that denoises directly into all-atom coordinates. Two inference-time parameters, the number of refinement loops through the folding stack and the number of diffusion sampling steps, trade computation time for accuracy and can materially improve predictions on difficult targets, especially antibody-antigen complexes. Alongside the structure, ESMFold2 reports calibrated confidence: a per-residue predicted local distance difference test (pLDDT), a predicted aligned error (PAE) for the relative placement of any two tokens, and predicted template-modeling (pTM) and interface predicted template-modeling (ipTM) scores that summarize overall and interface accuracy. Two checkpoints are available: esmfold2 is the larger, MSA-capable model recommended for difficult or long targets where alignment signal aids prediction; esmfold2-fast is an inference-optimized single-sequence variant intended for high-throughput applications. Both are distributed under the MIT license at Biohub/esm, the consolidated package that also distributes ESM3 and ESM C.
Learning Resources
- ESMFold2 model card (Biohub) - architecture details, training data, benchmark results, and intended-use guidance for the MSA-capable checkpoint.
Tools
ESMFold2 Structure Prediction (esmfold2-prediction)
Predicts the all-atom 3D structure of a biomolecular complex. Each input complex can combine protein, DNA, RNA, and ligand chains (with optional chain-level modifications and covalent bonds); the assembly is folded by ESMFold2 and returned as a predicted Structure per complex with confidence metrics: pLDDT, pTM, interface pTM (for multi-chain complexes), and predicted aligned error.
API Reference
Input: ESMFold2Input
StructurePredictionInput.ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by Config.preprocess() or supplied directly. Only consumed when Config.model_checkpoint == "esmfold2".
Config: ESMFold2Config
"esmfold2-fast". Available options: esmfold2, esmfold2-fast
None uses the upstream sampler default. Default None.
None uses the upstream sampler default. Default None.
None uses the upstream default (256.0). Default None.
False.
True is coerced to 1 and False to 0.
"cuda". Inherited.
None waits indefinitely. Default 1200.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
avg_pae is always emitted). Default False. Inherited.
model_checkpoint='esmfold2'. Default False.
use_msa=True. Inherited. Default: None.
True.
Output: ESMFold2Output
ESMFold2Metrics instance on .metrics.structures item)

| Metric | Type | Range | Availability |
|---|---|---|---|
| plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | always |
| iptm | float | 0.0 to 1.0 | depends on complex composition |
| avg_pae | float | 0.0 to 32.0 | always |
| pae | list[list[float]] | 0.0 to 32.0 | when include_pae_matrix=True |
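
As an illustration of how the full pae matrix might be summarized for an interface, the sketch below averages PAE over token pairs that sit on different chains. The helper name interface_pae and the per-token chain_ids list are assumptions made for this example (the mapping from tokens to chains is not described here), and the toy matrix is made-up data, not model output.

```python
from statistics import mean


def interface_pae(pae, chain_ids):
    """Mean PAE over token pairs (i, j) whose tokens lie on different chains.

    Hypothetical helper, not part of the toolkit. Returns None when every
    token belongs to the same chain, since there are no inter-chain pairs.
    """
    n = len(chain_ids)
    if len(pae) != n or any(len(row) != n for row in pae):
        raise ValueError("pae must be a square matrix with one row per token")
    cross = [
        pae[i][j]
        for i in range(n)
        for j in range(n)
        if chain_ids[i] != chain_ids[j]
    ]
    return mean(cross) if cross else None


# Toy 4-token complex: two tokens on chain A, two on chain B (made-up values).
pae = [
    [0.5, 1.0, 8.0, 9.0],
    [1.0, 0.5, 7.0, 8.0],
    [9.0, 8.0, 0.5, 1.0],
    [8.0, 7.0, 1.0, 0.5],
]
chain_ids = ["A", "A", "B", "B"]
print(interface_pae(pae, chain_ids))  # 8.0
```

Both directions (i, j) and (j, i) are averaged because a PAE matrix is not guaranteed to be symmetric. A low intra-chain PAE combined with a high inter-chain PAE, as in the toy data, would suggest well-folded chains whose relative placement is uncertain.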
Applications
This tool predicts the structure of multi-component assemblies such as protein-protein, protein-DNA, protein-RNA, and protein-ligand complexes, including antibody-antigen interfaces where ESMFold2 is reported to be competitive with AlphaFold3. Running it on a multi-chain complex also estimates how confidently the components are placed relative to each other through interface pTM and PAE, which is informative for assessing predicted interfaces.
Usage Tips
- model_checkpoint selects the variant. esmfold2-fast (default) is the inference-optimized single-sequence model and is appropriate for most high-throughput applications; select esmfold2 (with use_msa=True, or by attaching precomputed msas on the input) for the larger MSA-capable model on difficult or long targets. Setting use_msa=True with esmfold2-fast raises a validation error, and msas supplied with esmfold2-fast are ignored with a logged warning.
- num_loops (default 3) and num_sampling_steps (default 50) trade computation for accuracy. Both parameters materially affect prediction quality, with the largest gains on difficult targets such as antibody-antigen complexes. Increasing either improves accuracy but extends runtime; decreasing them accelerates high-throughput screens at some accuracy cost.
- Multi-modal inputs. Protein, DNA, RNA, and small-molecule ligand chains are supported; ligands can be specified by CCD code or SMILES, and chain modifications and covalent bonds are accepted. SMILES-based ligand input is supported but currently has known accuracy issues; CCD codes are recommended.
- Confidence is reported as pLDDT, pTM, ipTM, and PAE. Mean pLDDT (0 to 1) is the primary per-structure quality metric; iptm is emitted only for multi-chain complexes, and avg_pae is in angstroms (0 to about 32). Set include_pae_matrix=True to attach the full per-token PAE matrix.
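
The checkpoint and MSA rules in the first tip can be sketched as a small validation routine. This is a minimal stand-in written for illustration, not the toolkit's ESMFold2Config: the class SketchConfig, the function effective_msas, and the logger name are all hypothetical, and only the model_checkpoint and use_msa behavior described above is modeled.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("esmfold2_sketch")  # hypothetical logger name

CHECKPOINTS = ("esmfold2", "esmfold2-fast")


@dataclass
class SketchConfig:
    """Hypothetical stand-in for ESMFold2Config; not the toolkit's class."""

    model_checkpoint: str = "esmfold2-fast"
    use_msa: bool = False

    def __post_init__(self):
        if self.model_checkpoint not in CHECKPOINTS:
            raise ValueError(f"unknown checkpoint: {self.model_checkpoint!r}")
        # use_msa=True is only valid with the MSA-capable checkpoint.
        if self.use_msa and self.model_checkpoint != "esmfold2":
            raise ValueError("use_msa=True requires model_checkpoint='esmfold2'")


def effective_msas(config, msas):
    """Return the MSAs that would be consumed under the rules above."""
    if msas is None:
        return None
    if config.model_checkpoint != "esmfold2":
        # The single-sequence checkpoint ignores supplied MSAs with a warning.
        logger.warning("msas ignored: %s is single-sequence", config.model_checkpoint)
        return None
    return msas


# Made-up per-chain MSA keyed by chain index.
msas = {0: ["MKTAYIAK"]}
print(effective_msas(SketchConfig(), msas))
print(effective_msas(SketchConfig(model_checkpoint="esmfold2", use_msa=True), msas))
```

Failing fast on an invalid use_msa combination while only warning on ignored msas mirrors the asymmetry in the tip: the first is an explicit request that cannot be honored, the second is extra input that can be safely dropped.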
Toolkit Notes
These apply to every ESMFold2 tool in this toolkit (esmfold2-prediction).
- Requires a GPU. ESMFold2 runs through a PyTorch backend and needs an NVIDIA GPU; CPU execution is not practical.
- Shared biohub_esm environment. ESMFold2 is part of the consolidated Biohub/esm package and shares its standalone environment with the ESM3 and ESM C toolkits, so installing any one of them provisions the others.
- AlphaFold3-style diffusion with optional MSAs. Predictions are stochastic, so set seed for reproducibility across runs. MSAs are only consumed by the esmfold2 checkpoint; the esmfold2-fast checkpoint is single-sequence by construction.
- Structure prediction only. This toolkit provides ESMFold2’s structure prediction capability; the broader ESM family’s language-model, generation, and embedding capabilities are provided by the sibling ESM3 and ESM C toolkits.
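
To make the note on stochastic predictions concrete, the toy loop below shows why fixing a seed makes a sampling procedure repeatable. It is a one-dimensional illustration written for this page, not the ESMFold2 sampler; the function toy_sample and its update rule are assumptions chosen only to demonstrate seeded randomness.

```python
import random


def toy_sample(num_sampling_steps, seed=None):
    """Toy 1-D denoising loop; NOT the ESMFold2 sampler.

    With seed=None the generator is seeded from system entropy, so
    repeated calls generally return different values.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    for step in range(num_sampling_steps):
        # Injected noise shrinks linearly to zero at the final step.
        noise_scale = 1.0 - (step + 1) / num_sampling_steps
        # Pull the sample toward a target of 0 and add the scaled noise.
        x = 0.5 * x + noise_scale * rng.gauss(0.0, 0.1)
    return x


# Same seed, same result; a different seed gives a different sample.
print(toy_sample(50, seed=7) == toy_sample(50, seed=7))
print(toy_sample(50, seed=7) == toy_sample(50, seed=8))
```

A private random.Random instance is used rather than the module-level generator so that the seed affects only this sampler and not unrelated code.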
