Proto is not affiliated with Meta AI or Biohub. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.
Background
ESMFold (Lin et al., 2023) predicts a protein’s 3D structure directly from its amino-acid sequence, without the multiple-sequence alignment (MSA) that AlphaFold2 depends on. AlphaFold2 infers which residues are in contact by reading coevolution across an alignment of homologous sequences. ESMFold instead relies on the ESM-2 protein language model, which has already internalized those evolutionary patterns by pre-training on hundreds of millions of natural sequences, so it works from the lone sequence with no alignment built at inference time. Skipping the alignment search makes ESMFold roughly an order of magnitude faster than AlphaFold2, at some cost in accuracy on targets where a deep, diverse MSA would otherwise help.
The sequence first runs through a frozen ESM-2 transformer (the released model uses the 3-billion-parameter ESM-2), which produces a per-residue representation. A folding trunk, a simplified stand-in for AlphaFold2’s Evoformer, refines that representation, and a structure module reused essentially unchanged from AlphaFold2 then places each residue as a rigid backbone frame to produce all-atom coordinates. The whole prediction is recycled through these stages several times.
Alongside the coordinates, ESMFold reports calibrated confidence: a per-residue predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the expected error in one residue’s position when the structure is aligned on another, and a predicted template-modeling score (pTM) for overall fold confidence. Meta AI released the reference implementation as open source at facebookresearch/esm under the MIT license; the released model is the esmfold_v1 checkpoint, whose structure module is taken from the OpenFold reimplementation of AlphaFold2.
Because the language model carries the structural signal, ESM-2’s perplexity on a sequence correlates with how accurate the predicted structure will be, and accuracy continues to improve as the ESM-2 backbone is scaled up. ESMFold’s speed made its headline application possible: Meta AI folded over 600 million metagenomic protein sequences and released them as the ESM Metagenomic Atlas.
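As a small illustration of the perplexity mentioned above: perplexity is the exponential of the negative mean per-residue log-probability. The sketch below assumes the per-residue natural-log probabilities have already been obtained from the language model; the numbers are made up for illustration and the function name is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) over per-residue natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one residue log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up log-probabilities, NOT real model output.
# A model that assigns 0.9 to every true residue is confident.
confident = [math.log(0.9)] * 50
# A model that assigns 1/20 to every true residue is guessing uniformly
# over the 20 canonical amino acids.
uniform_guess = [math.log(1 / 20)] * 50

print(f"confident:     {perplexity(confident):.3f}")
print(f"uniform guess: {perplexity(uniform_guess):.3f}")
```

Lower perplexity means the model finds the sequence more predictable; a value near 20 corresponds to uniform guessing over the canonical amino acids.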
Learning Resources
- ESM Metagenomic Atlas Blog Post (Meta AI) - an overview blog post of the ESM Metagenomic Atlas, which contains structure predictions for nearly the entire MGnify database of metagenomic sequences.
Tools
ESMFold Structure Prediction (esmfold-prediction)
Predicts the 3D structure of one or more protein chains from their sequences. Each input complex (a single chain, or several chains folded together) is run through ESMFold and returned as a predicted Structure per complex with confidence metrics: per-residue pLDDT, a predicted TM-score (pTM), and predicted aligned error.
API Reference
Input: ESMFoldInput
- … StructurePredictionInput. Each complex can contain one or more protein chains. The linked length actually folded (summed chain residues plus the inter-chain chain_linker, i.e. len(chain_linker) * (num_chains - 1)) must not exceed 2,400.
- … ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.
Config: ESMFoldConfig
- … "G" * 25).
- … StructurePredictionConfig. Default: False.
- … "cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
- … None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
- … False.
Output: ESMFoldOutput
… ESMFoldMetrics instance on .metrics.structures item)

| Metric | Type | Range | Availability |
|---|---|---|---|
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | depends on model output |
| avg_pae | float | ≥ 0.0 | depends on model output |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |
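
To make the PAE entries in the table concrete, here is a minimal sketch of how a pae matrix might be summarized. It assumes avg_pae is a plain mean over the whole matrix, and that the matrix rows and columns follow the chain residues in order with no linker positions; neither is confirmed here. inter_chain_pae is a hypothetical helper, not a metric the tool reports, and the matrix values are made up.

```python
def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)


def avg_pae(pae):
    """Mean over every entry of a square PAE matrix (assumed definition)."""
    return mean(v for row in pae for v in row)


def inter_chain_pae(pae, chain_lengths):
    """Mean PAE over residue pairs that sit in different chains."""
    # Label each residue index with the index of the chain it belongs to.
    chain_of = [c for c, n in enumerate(chain_lengths) for _ in range(n)]
    if len(chain_of) != len(pae):
        raise ValueError("chain lengths do not add up to the matrix size")
    size = len(pae)
    return mean(
        pae[i][j]
        for i in range(size)
        for j in range(size)
        if chain_of[i] != chain_of[j]
    )


# Made-up 3x3 matrix: chain 0 has two residues, chain 1 has one.
pae = [
    [0.0, 1.0, 8.0],
    [1.0, 0.0, 9.0],
    [7.0, 10.0, 0.0],
]
print(avg_pae(pae))
print(inter_chain_pae(pae, [2, 1]))
```

In this toy matrix the intra-chain entries are small while the inter-chain entries are large, which is the pattern one would expect when each chain is predicted confidently but their relative placement is not.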
Applications
This tool folds a protein sequence into a 3D model for structural analysis or as input to downstream structure-based tools. Because ESMFold does not use an MSA, it is well suited to de novo or heavily engineered sequences that have no natural homologs for an alignment to capture.
Usage Tips
- No MSA or template search is used. ESMFold does not incorporate MSAs into the prediction. There is no use_msa option (unlike Boltz-2 or Protenix), and passing one raises an error; the inherited msas input, by contrast, is hidden and, if supplied, is ignored with a single logged warning.
- Multi-chain complexes are approximated with an internal glycine linker. chain_linker (default 25 glycines) joins chains before folding and is stripped from the output; this works best for homomeric assemblies and is unreliable for true hetero-complexes. Use AlphaFold3, Boltz-2, Chai-1, or Protenix for those.
- Protein sequences only, with a hard cap of 2,400 residues per complex. DNA, RNA, and ligands are not supported; X is allowed for unknown residues. The cap is enforced against the linked length actually folded: the sum of all chain residues plus the inter-chain chain_linker inserted between them (len(chain_linker) * (chains - 1), 25 residues per junction by default). A multi-chain complex whose bare residues sum to just under 2,400 can still exceed the cap.
- Confidence is reported as pLDDT, pTM, and PAE. Average pLDDT (0 to 1) is the primary per-structure quality metric; set include_pae_matrix=True to attach the full residue-by-residue PAE matrix.
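
The linked-length rule in the tips above can be checked before submitting a complex. The following is a minimal sketch of that arithmetic only; linked_length and fits_cap are hypothetical helper names, not part of the toolkit, and the constants simply restate the cap and default linker given above.

```python
MAX_LINKED_LENGTH = 2400         # cap stated in the usage tips
DEFAULT_CHAIN_LINKER = "G" * 25  # default linker: 25 glycines


def linked_length(chains, chain_linker=DEFAULT_CHAIN_LINKER):
    """Summed chain residues plus one linker per junction between chains."""
    if not chains:
        return 0
    residues = sum(len(chain) for chain in chains)
    junctions = len(chains) - 1
    return residues + len(chain_linker) * junctions


def fits_cap(chains, chain_linker=DEFAULT_CHAIN_LINKER):
    """True when the linked length stays within the 2,400-residue cap."""
    return linked_length(chains, chain_linker) <= MAX_LINKED_LENGTH


# Two placeholder chains of 1,190 residues each: 2,380 bare residues,
# but 2,405 once the 25-residue linker is counted.
dimer = ["A" * 1190, "A" * 1190]
print(linked_length(dimer), fits_cap(dimer))
```

The dimer example shows the edge case the tip warns about: the bare residues fit under the cap, but the linked length does not.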
ESMFold Gradient (esmfold-gradient)
Runs a single differentiable ESMFold confidence pass: one forward-and-backward gradient evaluation, not an iterative design loop. For one or more designated chains, a relaxed (L, 20) amino-acid distribution replaces the discrete sequence, and ESMFold folds the complex under that soft input. The resulting pLDDT, pTM, and PAE terms are combined into one weighted scalar loss, and a single backward pass returns its gradient with respect to the input logits, along with the loss value, the per-term metrics, and the predicted Structure.
API Reference
Input: ESMFoldGradientInput
- … target_chain_indices are replaced by the hard decode of logits before folding, but their lengths must match len(logits).
Config: ESMFoldGradientConfig
- … StructurePredictionConfig. Default: False.
- … "cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
- … None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: ESMFoldGradientOutput
| Metric | Type | Range | Availability |
|---|---|---|---|
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | depends on model output |
| avg_pae | float | ≥ 0.0 | depends on model output |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |
Applications
This tool supplies the loss and gradient signal that gradient-based or MCMC sequence-design loops optimize for foldability: minimizing the confidence loss pushes a relaxed sequence toward one ESMFold predicts will fold well. With compute_gradient=False it instead provides forward-only confidence scoring (loss, metrics, and predicted structure) of a candidate sequence for ranking or filtering.
Usage Tips
- One pass per call; this tool is not an optimization loop. It evaluates a single relaxed sequence. Drive it from a sequence-design optimizer, or call it repeatedly, to actually design a sequence.
- compute_gradient defaults to True. It runs a forward and backward pass and returns the gradient with respect to the input logits; set it to False for forward-only scoring (gradient=None). The loss, metrics, and predicted structure are identical in both modes.
- loss_weights selects and weights the confidence terms. Non-negative weights over plddt, ptm, and pae (default {"plddt": 1.0}); terms with weight 0.0 are skipped, and all-zero weights short-circuit to a zero gradient with loss=0.0.
- logits and the returned gradient share canonical amino-acid order ACDEFGHIKLMNPQRSTVWY. Every chain listed in target_chain_indices must have length len(logits); non-target chains fold normally with their fixed sequences.
- soft and hard trade smoothness for discreteness. The default (soft=1.0, hard=0.0) uses pure soft probabilities for smooth optimization; set hard=1.0 for a straight-through estimator (the forward pass sees argmax tokens while gradients still flow through the soft probabilities).
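
To show how the (L, 20) logits relate to a sequence and how an outer loop might consume the returned gradient, here is a minimal sketch in plain Python. It assumes the relaxed distribution is a row-wise softmax of the logits and that the hard decode is a row-wise argmax in the canonical order given above. All function names are hypothetical, the gradient values are made up, and no ESMFold call is performed.

```python
import math

AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"  # canonical column order for logits


def softmax(row):
    """Numerically stable softmax over one row of 20 logits."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_distribution(logits):
    """Row-wise softmax: the assumed soft (L, 20) amino-acid distribution."""
    return [softmax(row) for row in logits]


def hard_decode(logits):
    """Row-wise argmax mapped to amino-acid letters."""
    return "".join(
        AA_ORDER[max(range(len(row)), key=row.__getitem__)] for row in logits
    )


def logits_for(sequence, scale=2.0):
    """Build logits that favor the given sequence (illustration only)."""
    return [[scale if aa == target else 0.0 for aa in AA_ORDER]
            for target in sequence]


def gradient_step(logits, gradient, learning_rate=0.1):
    """One plain gradient-descent update on the logits."""
    return [
        [x - learning_rate * g for x, g in zip(row, grad_row)]
        for row, grad_row in zip(logits, gradient)
    ]


logits = logits_for("MKV")
# Made-up gradient standing in for the one the tool would return:
# the loss falls if position 0 moves toward alanine.
gradient = [[0.0] * len(AA_ORDER) for _ in logits]
gradient[0][AA_ORDER.index("A")] = -30.0

updated = gradient_step(logits, gradient)
print(hard_decode(logits), "->", hard_decode(updated))
```

A real design loop would replace the made-up gradient with the one returned by each call and repeat the update, since the tool itself performs only one pass.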
Toolkit Notes
These apply to every ESMFold tool in this toolkit (esmfold-prediction, esmfold-gradient).
- Requires a GPU. Both tools run ESMFold through a PyTorch backend and need an NVIDIA GPU (roughly 16 GB of VRAM or more for longer sequences); CPU execution is not practical.
- max_batch_residues is a starting cap, not a hard ceiling. On CUDA OOM the wrapper halves the cap (floor = longest single complex) and re-splits the offending sub-batch, so the default 1200 is usually fine to leave in place.
- MSA-free and single-sequence. ESMFold folds from one sequence with no alignment or template search. Accuracy is generally lower than MSA-based methods on targets where a deep, diverse MSA would help.
- num_recycles (default 4) applies to both tools. Each recycling iteration refines the structure; raising it can improve accuracy at the cost of longer runtime.
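
The retry behavior described for max_batch_residues can be pictured roughly as follows. This is a sketch only: split_into_batches and halved_cap are hypothetical helpers, not the wrapper's code. It assumes the cap counts summed residues per batch and that the floor is the longest complex in the sub-batch being re-split.

```python
def split_into_batches(lengths, cap):
    """Greedily group complexes so each batch's residue total stays within cap.

    Returns lists of indices into `lengths`. A complex longer than the cap
    still gets a batch of its own.
    """
    batches, current, total = [], [], 0
    for index, length in enumerate(lengths):
        if current and total + length > cap:
            batches.append(current)
            current, total = [], 0
        current.append(index)
        total += length
    if current:
        batches.append(current)
    return batches


def halved_cap(cap, lengths):
    """Halve the cap, but never go below the longest single complex."""
    return max(cap // 2, max(lengths))


lengths = [300, 400, 500, 200]  # residues per complex (made-up sizes)
cap = 1200                      # default starting cap
print(split_into_batches(lengths, cap))

# Suppose the first sub-batch ran out of memory: halve and re-split it.
offending = [lengths[i] for i in split_into_batches(lengths, cap)[0]]
retry_cap = halved_cap(cap, offending)
print(retry_cap, split_into_batches(offending, retry_cap))
```

Because the floor is the longest complex, repeated halving eventually leaves each complex in its own batch rather than shrinking the cap below what any single complex needs.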
