License: ESMFold is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

Proto is not affiliated with Meta AI or Biohub. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.


facebookresearch/esm
Evolutionary Scale Modeling (esm): Pretrained language models for proteins
Evolutionary-scale prediction of atomic-level protein structure with a language model
Zeming Lin, Halil Akin, … Yaniv Shmueli
Science (2023)
@article{lin2023esm2,
  title={Evolutionary-scale prediction of atomic-level protein structure with a language model},
  author={Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil, Robert and Kabeli, Ori and Shmueli, Yaniv and others},
  journal={Science},
  volume={379},
  number={6637},
  pages={1123--1130},
  year={2023},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ade2574}
}
proto-bio/proto-tools/proto_tools/tools/structure_prediction/esmfold
| Function | Description |
| --- | --- |
| run_esmfold_gradient() | Differentiable ESMFold confidence loss and gradient w.r.t. target-chain logits (GPU) |
| run_esmfold() | Protein structure prediction using ESMFold (GPU) |

Background

ESMFold (Lin et al., 2023) predicts a protein’s 3D structure directly from its amino-acid sequence, without the multiple-sequence alignment (MSA) that AlphaFold2 depends on. AlphaFold2 infers which residues are in contact by reading coevolution across an alignment of homologous sequences. ESMFold instead relies on the ESM-2 protein language model, which has already internalized those evolutionary patterns by pre-training on tens of millions of natural sequences, so it works from the lone sequence with no alignment built at inference time. Skipping the alignment search makes ESMFold roughly an order of magnitude faster than AlphaFold2, at some cost in accuracy on targets where a deep, diverse MSA would otherwise help.
The sequence first runs through a frozen ESM-2 transformer (the released model uses the 3-billion-parameter ESM-2), which produces a per-residue representation. A folding trunk, a simplified stand-in for AlphaFold2’s Evoformer, refines that representation, and a structure module reused essentially unchanged from AlphaFold2 then places each residue as a rigid backbone frame to produce all-atom coordinates. The folding trunk and structure module are rerun several times, each pass recycling the previous prediction.
Alongside the coordinates, ESMFold reports calibrated confidence: a per-residue predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the expected error in one residue’s position when the structure is aligned on another, and a predicted template-modeling score (pTM) for overall fold confidence.
Meta AI released the reference implementation as open source at facebookresearch/esm under the MIT license; the released model is the esmfold_v1 checkpoint, whose structure module is taken from the OpenFold reimplementation of AlphaFold2.
Because the language model carries the structural signal, ESM-2’s perplexity on a sequence correlates with how accurate the predicted structure will be, and accuracy continues to improve as the ESM-2 backbone is scaled up. ESMFold’s speed made its headline application possible: Meta AI folded over 600 million metagenomic protein sequences and released them as the ESM Metagenomic Atlas.

Learning Resources

  • ESM Metagenomic Atlas Blog Post (Meta AI) - an overview of the ESM Metagenomic Atlas, which contains structure predictions for nearly the entire MGnify database of metagenomic sequences.

Tools

ESMFold Structure Prediction (esmfold-prediction)

Predicts the 3D structure of one or more protein chains from their sequences. Each input complex (a single chain, or several chains folded together) is run through ESMFold and returned as a predicted Structure per complex with confidence metrics: per-residue pLDDT, a predicted TM-score (pTM), and predicted aligned error.

API Reference

Input
complexes
List[Complex]
required
List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain one or more protein chains. The linked length actually folded (summed chain residues plus the inter-chain chain_linker, i.e. len(chain_linker) * (num_chains - 1)) must not exceed 2,400.
msas
array
Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.
Config
residue_idx_offset
integer
default:"512"
Residue numbering gap between chains in multi-chain structures. Used to ensure proper chain separation in the output PDB/CIF files. Higher values create larger gaps in residue numbering between chains. Must be at least 0. Default: 512.
chain_linker
string
default:"GGGGGGGGGGGGGGGGGGGGGGGGG"
Amino acid sequence used to link chains internally for multi-chain prediction. ESMFold predicts multi-chain complexes by linking chains with a flexible linker sequence (typically glycines). The linker is removed in the final output. Default: 25 glycines ("G" * 25).
max_batch_residues
integer
default:"1200"
Starting cap on total residues per inference batch; auto-halved on CUDA OOM (floor = longest single complex). Must be at least 100. Default: 1200.
num_recycles
integer
default:"4"
Recycling iterations through the structure module.
verbose
integer
default:"0"
Whether to print status messages during execution. Inherited from StructurePredictionConfig. Default: 0 (off).
device
string
default:"cuda"
Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
include_pae_matrix
boolean
default:"False"
Whether to attach the full per-residue PAE matrix (the pae metric) to each structure. Inherited. Default: False.
Output
structures
List[Structure]
required
Predicted structures, each carrying an ESMFoldMetrics instance on .metrics.
Metrics (one set per structures item)
| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | depends on model output |
| avg_pae | float | ≥ 0.0 | depends on model output |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |

Applications

This tool folds a protein sequence into a 3D model for structural analysis or as input to downstream structure-based tools. Because ESMFold does not use an MSA, it is well suited to de novo or heavily engineered sequences that have no natural homologs for an alignment to capture.

Usage Tips

  • No MSA or template search is used. ESMFold folds from the sequence alone and does not incorporate MSAs into the prediction. Unlike Boltz-2 or Protenix, there is no use_msa option, and passing one raises an error. The inherited msas input is hidden and, if supplied, is ignored with a single logged warning.
  • Multi-chain complexes are approximated with an internal glycine linker. chain_linker (default 25 glycines) joins chains before folding and is stripped from the output; this works best for homomeric assemblies and is unreliable for true hetero-complexes. Use AlphaFold3, Boltz-2, Chai-1, or Protenix for those.
  • Protein sequences only, with a hard cap of 2,400 residues per complex. DNA, RNA, and ligands are not supported; X is allowed for unknown residues. The cap is enforced against the linked length actually folded: the sum of all chain residues plus the inter-chain chain_linker inserted between them (len(chain_linker) * (chains - 1), 25 residues per junction by default). A multi-chain complex whose bare residues sum to just under 2,400 can still exceed the cap.
  • Confidence is reported as pLDDT, pTM, and PAE. Average pLDDT (0 to 1) is the primary per-structure quality metric; set include_pae_matrix to attach the full per-residue PAE matrix.
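
The linking and length-cap rules in the tips above can be sketched in plain Python. This is an illustration only: the helper name link_chains, the constant MAX_LINKED_LENGTH, and the exact way residue_idx_offset is applied to the numbering are assumptions (loosely modeled on the upstream facebookresearch/esm approach of joining chains with a linker), not the wrapper's actual code.

```python
from typing import List, Tuple

MAX_LINKED_LENGTH = 2400  # cap quoted in the usage tips


def link_chains(
    chains: List[str],
    chain_linker: str = "G" * 25,
    residue_idx_offset: int = 512,
) -> Tuple[str, List[int], List[bool]]:
    """Join chains with a linker (hypothetical helper).

    Returns the linked sequence, one residue index per linked position,
    and a keep-mask that is False on linker positions.
    """
    linked = chain_linker.join(chains)
    if len(linked) > MAX_LINKED_LENGTH:
        raise ValueError(
            f"linked length {len(linked)} exceeds {MAX_LINKED_LENGTH}"
        )

    residue_index: List[int] = []
    keep: List[bool] = []
    position = 0
    for i, chain in enumerate(chains):
        if i > 0:
            # Linker residues sit between chain i-1 and chain i.
            for _ in chain_linker:
                residue_index.append(position + (i - 1) * residue_idx_offset)
                keep.append(False)
                position += 1
        for _ in chain:
            # Assumed numbering: each later chain is shifted by one more offset.
            residue_index.append(position + i * residue_idx_offset)
            keep.append(True)
            position += 1
    return linked, residue_index, keep


# Small example with a short linker and offset so the result is easy to read.
linked, residue_index, keep = link_chains(
    ["MKV", "ACD"], chain_linker="GG", residue_idx_offset=10
)
# Stripping the linker recovers only the original chain residues.
stripped = "".join(aa for aa, k in zip(linked, keep) if k)
```

With the default 25-glycine linker, two chains of 1,190 residues each sum to 2,380 bare residues but 2,405 linked residues, so this sketch would reject them even though the bare total is under the cap.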

ESMFold Gradient (esmfold-gradient)

Runs a single differentiable ESMFold confidence pass: one forward-and-backward gradient evaluation, not an iterative design loop. For one or more designated chains, a relaxed (L, 20) amino-acid distribution replaces the discrete sequence, and ESMFold folds the complex under that soft input. The resulting pLDDT, pTM, and PAE terms are combined into one weighted scalar loss, and a single backward pass returns its gradient with respect to the input logits, along with the loss value, the per-term metrics, and the predicted Structure.
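
As a rough sketch of how per-term confidence losses might be combined into one weighted scalar, consider the snippet below. The function name weighted_confidence_loss and the example per-term values are hypothetical, and the way each unweighted term is derived from pLDDT, pTM, or PAE is not specified here, so the sketch takes the per-term losses as given.

```python
from typing import Dict, Optional


def weighted_confidence_loss(
    term_losses: Dict[str, float],
    loss_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine unweighted per-term losses into one scalar (illustrative only)."""
    # Assumed default: only the pLDDT term contributes.
    weights = {"plddt": 1.0} if loss_weights is None else loss_weights
    if any(w < 0 for w in weights.values()):
        raise ValueError("loss weights must be non-negative")

    total = 0.0
    for name, weight in weights.items():
        if weight == 0.0:
            continue  # zero-weight terms are skipped entirely
        total += weight * term_losses[name]
    return total


# Hypothetical unweighted per-term losses for one folded complex.
term_losses = {"plddt": 0.18, "ptm": 0.30, "pae": 0.25}

default_loss = weighted_confidence_loss(term_losses)
mixed_loss = weighted_confidence_loss(
    term_losses, {"plddt": 1.0, "ptm": 0.5, "pae": 0.0}
)
zero_loss = weighted_confidence_loss(
    term_losses, {"plddt": 0.0, "ptm": 0.0, "pae": 0.0}
)
```

Here the all-zero case simply returns 0.0 without touching any term, which mirrors the short-circuit behaviour one would expect when no confidence term is selected.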

API Reference

Input
chains
List[string]
required
Complete complex chain sequences. Entries listed in target_chain_indices are replaced by the hard decode of logits before folding, but their lengths must match len(logits).
target_chain_indices
List[integer]
default:"[0]"
Chain positions that should receive the relaxed target logits. Repeated target segments in proto-language should pass each occurrence once; gradients are summed through the shared logits tensor.
logits
List[array]
required
Target-chain logits in proto amino-acid order.
temperature
number
default:"1.0"
Softmax temperature for the target-chain relaxed sequence.
Config
loss_weights
Dict[string, number]
Weights for the pLDDT, pTM, and PAE loss terms, keyed by plddt, ptm, and pae. Default: {"plddt": 1.0}.
soft
number
default:"1.0"
Soft probability blend for relaxed target sequence.
hard
number
default:"0.0"
Straight-through hard-forward blend for relaxed target sequence.
compute_gradient
boolean
default:"True"
Whether to return the gradient with respect to logits.
verbose
integer
default:"0"
Whether to print status messages during execution. Inherited from StructurePredictionConfig. Default: 0 (off).
device
string
default:"cuda"
Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
include_pae_matrix
boolean
default:"False"
Attach the full per-residue PAE matrix.
residue_idx_offset
integer
default:"512"
Residue numbering gap between linked chains.
chain_linker
string
default:"GGGGGGGGGGGGGGGGGGGGGGGGG"
Sequence inserted between chains before folding.
max_batch_residues
integer
default:"1200"
Maximum residues per ESMFold inference batch.
num_recycles
integer
default:"4"
Structure module recycling iterations.
Output
structure
Structure
required
Predicted ESMFold complex structure.
gradient
array
Gradient matrix matching the input logits shape.
loss
number
required
Scalar weighted confidence objective value.
metrics
Dict[string, any]
Confidence metrics and per-term unweighted losses.
vocab
List[string]
required
Amino-acid column ordering for logits and gradient.
Metrics
| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | depends on model output |
| avg_pae | float | ≥ 0.0 | depends on model output |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |

Applications

This tool supplies the loss and gradient signal that gradient-based or MCMC sequence-design loops optimize for foldability: minimizing the confidence loss pushes a relaxed sequence toward one ESMFold predicts will fold well. With compute_gradient=False it instead provides forward-only confidence scoring (loss, metrics, and predicted structure) of a candidate sequence for ranking or filtering.
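
To make the division of labour concrete, here is a minimal sketch of a gradient-descent design loop built around a single-pass evaluator. The function toy_evaluate is a stand-in with a toy quadratic objective, not the real tool; design and hard_decode are hypothetical helper names. In practice each call to the evaluator would be one gradient-tool invocation returning a loss and a gradient of the same shape as the logits.

```python
from typing import Callable, List, Tuple

VOCAB = "ACDEFGHIKLMNPQRSTVWY"  # column order documented for logits and gradient
Matrix = List[List[float]]


def toy_evaluate(logits: Matrix) -> Tuple[float, Matrix]:
    """Stand-in for one tool call: returns (loss, gradient w.r.t. logits).

    Toy objective only: squared distance to a target that favours
    alanine (column 0) at every position.
    """
    loss = 0.0
    gradient: Matrix = []
    for row in logits:
        grad_row = []
        for j, value in enumerate(row):
            target = 1.0 if j == 0 else 0.0
            diff = value - target
            loss += diff * diff
            grad_row.append(2.0 * diff)
        gradient.append(grad_row)
    return loss, gradient


def design(
    evaluate: Callable[[Matrix], Tuple[float, Matrix]],
    logits: Matrix,
    steps: int = 50,
    learning_rate: float = 0.1,
) -> Tuple[Matrix, List[float]]:
    """Plain gradient descent on the logits; one evaluator call per step."""
    history: List[float] = []
    for _ in range(steps):
        loss, gradient = evaluate(logits)
        history.append(loss)
        logits = [
            [v - learning_rate * g for v, g in zip(row, grad_row)]
            for row, grad_row in zip(logits, gradient)
        ]
    return logits, history


def hard_decode(logits: Matrix) -> str:
    """Pick the highest-scoring amino acid at each position."""
    return "".join(
        VOCAB[max(range(len(row)), key=row.__getitem__)] for row in logits
    )


start = [[0.0] * len(VOCAB) for _ in range(5)]
final_logits, history = design(toy_evaluate, start)
designed = hard_decode(final_logits)
```

The loop owns the optimizer state and step size; the evaluator only reports a loss and gradient for the logits it is handed, which is the same contract the tool follows.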

Usage Tips

  • One pass per call; this tool is not an optimization loop. It evaluates a single relaxed sequence. Drive it from a sequence-design optimizer, or call it repeatedly, to actually design a sequence.
  • compute_gradient defaults to True. It runs a forward and backward pass and returns the gradient with respect to the input logits; set it False for forward-only scoring (gradient=None). The loss, metrics, and predicted structure are identical in both modes.
  • loss_weights selects and weights the confidence terms. Non-negative weights over plddt, ptm, and pae (default {"plddt": 1.0}); terms with weight 0.0 are skipped, and all-zero weights short-circuit to a zero gradient with loss=0.0.
  • logits and the returned gradient share the canonical amino-acid order ACDEFGHIKLMNPQRSTVWY. Every chain listed in target_chain_indices must have length len(logits); non-target chains fold normally with their fixed sequences.
  • soft and hard trade smoothness for discreteness. The default (soft=1.0, hard=0.0) uses pure soft probabilities for smooth optimization; set hard=1.0 for a straight-through estimator (the forward pass sees argmax tokens while gradients still flow through the soft probabilities).
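
The relaxed sequence described in these tips can be illustrated with a forward-only sketch. It shows a temperature-scaled softmax over each logits row and a blend toward one-hot argmax tokens controlled by hard. The function names are hypothetical, the blend formula is an assumption, and the role of intermediate soft values is not specified here, so the sketch fixes soft at its default of 1.0.

```python
import math
from typing import List

VOCAB = "ACDEFGHIKLMNPQRSTVWY"  # documented column order


def softmax(row: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of one logits row at a given temperature."""
    scaled = [v / temperature for v in row]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot_argmax(row: List[float]) -> List[float]:
    """One-hot vector marking the largest entry of the row."""
    best = max(range(len(row)), key=row.__getitem__)
    return [1.0 if j == best else 0.0 for j in range(len(row))]


def relaxed_forward(
    logits: List[List[float]], temperature: float = 1.0, hard: float = 0.0
) -> List[List[float]]:
    """Forward-pass value of the relaxed (L, 20) sequence (assumed blend).

    hard=0.0 returns soft probabilities; hard=1.0 returns argmax tokens.
    In an autograd framework the straight-through form is usually written
    as probs + stop_gradient(one_hot - probs), so gradients follow probs.
    """
    relaxed = []
    for row in logits:
        probs = softmax(row, temperature)
        tokens = one_hot_argmax(probs)
        relaxed.append(
            [hard * t + (1.0 - hard) * p for t, p in zip(tokens, probs)]
        )
    return relaxed


# One position whose logits favour glutamate (E, column 3).
row = [0.0] * len(VOCAB)
row[3] = 2.0
soft_out = relaxed_forward([row])
sharp_out = relaxed_forward([row], temperature=0.5)
hard_out = relaxed_forward([row], hard=1.0)
```

Lowering the temperature concentrates probability on the leading amino acid, while hard=1.0 replaces the forward value with a discrete token.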

Toolkit Notes

These apply to every ESMFold tool in this toolkit (esmfold-prediction, esmfold-gradient).
  • Requires a GPU. Both tools run ESMFold through a PyTorch backend and need an NVIDIA GPU (roughly 16 GB of VRAM or more for longer sequences); CPU execution is not practical.
  • max_batch_residues is a starting cap, not a hard ceiling. On CUDA OOM the wrapper halves the cap (floor = longest single complex) and re-splits the offending sub-batch, so the default 1200 is usually fine to leave in place.
  • MSA-free and single-sequence. ESMFold folds from one sequence with no alignment or template search. Accuracy is generally lower than MSA-based methods on targets where a deep, diverse MSA would help.
  • num_recycles (default 4) applies to both tools. Each recycling iteration refines the structure; raising it improves accuracy at higher runtime.
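
The back-off behaviour described for max_batch_residues can be sketched as follows. Everything here is illustrative: OutOfMemory stands in for a CUDA out-of-memory error, fold_batch is a hypothetical callable that folds one batch, and the greedy packing strategy is an assumption rather than the wrapper's documented algorithm.

```python
from typing import Callable, List, Tuple


class OutOfMemory(RuntimeError):
    """Stand-in for a CUDA out-of-memory error."""


def split_batches(lengths: List[int], cap: int) -> List[List[int]]:
    """Greedily group complex indices so each batch holds at most `cap` residues.

    A single complex longer than the cap still gets a batch of its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, n in enumerate(lengths):
        if current and used + n > cap:
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        batches.append(current)
    return batches


def run_with_backoff(
    lengths: List[int],
    fold_batch: Callable[[List[int]], None],
    max_batch_residues: int = 1200,
) -> List[List[int]]:
    """Fold every complex, halving the cap for any sub-batch that runs out of memory."""
    floor = max(lengths)  # the cap never drops below the longest single complex
    pending: List[Tuple[List[int], int]] = [
        (batch, max_batch_residues)
        for batch in split_batches(lengths, max_batch_residues)
    ]
    done: List[List[int]] = []
    while pending:
        batch, cap = pending.pop(0)
        try:
            fold_batch(batch)
            done.append(batch)
        except OutOfMemory:
            new_cap = max(cap // 2, floor)
            if len(batch) == 1 or new_cap == cap:
                raise  # nothing left to shrink
            # Re-split only the offending sub-batch under the smaller cap.
            groups = split_batches([lengths[i] for i in batch], new_cap)
            resplit = [([batch[j] for j in group], new_cap) for group in groups]
            pending = resplit + pending
    return done


lengths = [400, 400, 300, 300]


def fake_fold(batch: List[int]) -> None:
    # Pretend the GPU can hold at most 700 residues at once.
    if sum(lengths[i] for i in batch) > 700:
        raise OutOfMemory("simulated OOM")


completed = run_with_backoff(lengths, fake_fold)
```

In this run the first batch of 1,100 residues fails, the cap drops from 1200 to 600, and the three complexes in that batch are folded one at a time while the untouched fourth complex keeps its original batch.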
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.