ESMFold - Proto

License: ESMFold is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

Proto is not affiliated with Meta AI or Biohub. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.


facebookresearch/esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins

4.0k stars


Evolutionary-scale prediction of atomic-level protein structure with a language model

Zeming Lin, Halil Akin, … Yaniv Shmueli

Science (2023)


@article{lin2023esm2,
  title={Evolutionary-scale prediction of atomic-level protein structure with a language model},
  author={Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil, Robert and Kabeli, Ori and Shmueli, Yaniv and others},
  journal={Science},
  volume={379},
  number={6637},
  pages={1123--1130},
  year={2023},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ade2574}
}


proto-bio/proto-tools/proto_tools/tools/structure_prediction/esmfold


Function	Description
`run_esmfold_gradient()`	Differentiable ESMFold confidence loss and gradient w.r.t. target-chain logits (GPU)
`run_esmfold()`	Protein structure prediction using ESMFold (GPU)

Background

ESMFold (Lin et al., 2023) predicts a protein’s 3D structure directly from its amino-acid sequence, without the multiple-sequence alignment (MSA) that AlphaFold2 depends on. AlphaFold2 infers which residues are in contact by reading coevolution across an alignment of homologous sequences. ESMFold instead relies on the ESM-2 protein language model, which internalized those evolutionary patterns during pre-training on a large corpus of natural protein sequences, so ESMFold works from the lone sequence and builds no alignment at inference time. Skipping the alignment search makes ESMFold roughly an order of magnitude faster than AlphaFold2, at some cost in accuracy on targets where a deep, diverse MSA would otherwise help.

The sequence first runs through a frozen ESM-2 transformer (the released model uses the 3-billion-parameter ESM-2), which produces a per-residue representation. A folding trunk, a simplified stand-in for AlphaFold2’s Evoformer, refines that representation, and a structure module reused essentially unchanged from AlphaFold2 then places each residue as a rigid backbone frame to produce all-atom coordinates. The folding trunk and structure module are run several times, with each pass’s output recycled as input to the next; the ESM-2 representation is computed once. Alongside the coordinates, ESMFold reports calibrated confidence: a per-residue predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for the expected error in one residue’s position when the structure is aligned on another, and a predicted template-modeling score (pTM) for overall fold confidence. Meta AI open-sources the reference implementation at facebookresearch/esm under the MIT license; the released model is the esmfold_v1 checkpoint, whose structure module is taken from the OpenFold reimplementation of AlphaFold2.

Because the language model carries the structural signal, ESM-2’s perplexity on a sequence correlates with the accuracy of the predicted structure (lower perplexity tends to go with more accurate predictions), and accuracy continues to improve as the ESM-2 backbone is scaled up. ESMFold’s speed made its headline application possible: Meta AI folded over 600 million metagenomic protein sequences and released them as the ESM Metagenomic Atlas.

Learning Resources

ESM Metagenomic Atlas Blog Post (Meta AI) - an overview of the ESM Metagenomic Atlas, which contains structure predictions for nearly the entire MGnify database of metagenomic sequences.

Tools

ESMFold Structure Prediction (`esmfold-prediction`)

Predicts the 3D structure of one or more protein chains from their sequences. Each input complex (a single chain, or several chains folded together) is run through ESMFold and returned as a predicted Structure per complex with confidence metrics: per-residue pLDDT, a predicted TM-score (pTM), and predicted aligned error.

API Reference


Input: ESMFoldInput

complexes

List[Complex]

required

List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain one or more protein chains. The linked length actually folded, that is, the summed chain residues plus len(chain_linker) * (num_chains - 1) linker residues, must not exceed 2,400.
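
A minimal sketch of how the cap is counted, assuming chains are plain one-letter strings. `linked_length` and `check_linked_length` are hypothetical helpers written for illustration, not part of the toolkit's API.

```python
"""Sketch: the 2,400-residue cap is checked against the linked length actually folded."""

from typing import List

MAX_LINKED_LENGTH = 2400          # hard cap stated for ESMFold
DEFAULT_CHAIN_LINKER = "G" * 25   # default chain_linker (25 glycines)


def linked_length(chains: List[str], chain_linker: str = DEFAULT_CHAIN_LINKER) -> int:
    """Summed chain residues plus one linker per junction between consecutive chains."""
    if not chains:
        return 0
    return sum(len(c) for c in chains) + len(chain_linker) * (len(chains) - 1)


def check_linked_length(chains: List[str], chain_linker: str = DEFAULT_CHAIN_LINKER) -> None:
    """Raise ValueError if the linked complex would exceed the cap (hypothetical helper)."""
    n = linked_length(chains, chain_linker)
    if n > MAX_LINKED_LENGTH:
        raise ValueError(f"linked length {n} exceeds the {MAX_LINKED_LENGTH}-residue cap")


# Two 1,190-residue chains: 2,380 bare residues, but 2,405 once the linker is added.
dimer = ["A" * 1190, "A" * 1190]
print(linked_length(dimer))  # 2405
```

The dimer's bare residues are under the cap, but the 25-glycine junction pushes the folded length over it.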


chains

List[Chain | Fragment]

required

Chains in the complex, in input order.

msas

array

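Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly. Inherited; ESMFold does not use MSAs, so this input is hidden and, if supplied, ignored with a single logged warning.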


Config: ESMFoldConfig

residue_idx_offset

integer

default:"512"

Residue numbering gap between chains in multi-chain structures. Used to ensure proper chain separation in the output PDB/CIF files. Higher values create larger gaps in residue numbering between chains. Must be at least 0. Default: 512.
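
The sketch below illustrates the numbering gap under an assumed scheme in which every position of chain i, plus the linker that follows it, is shifted by i * residue_idx_offset. This mirrors our reading of the reference facebookresearch/esm code, but the wrapper's exact numbering may differ, and `linked_residue_indices` is a hypothetical helper.

```python
"""Sketch: residue indices with a residue_idx_offset gap between chains (assumed scheme)."""

from typing import List


def linked_residue_indices(
    chain_lengths: List[int], chain_linker_len: int = 25, residue_idx_offset: int = 512
) -> List[int]:
    """Return one residue index per position of the linked sequence (chains plus linkers).

    Positions run consecutively through the linked sequence; every position belonging to
    chain i, and to the linker that follows it, is shifted by i * residue_idx_offset.
    """
    if residue_idx_offset < 0:
        raise ValueError("residue_idx_offset must be at least 0")
    indices: List[int] = []
    position = 0
    for i, length in enumerate(chain_lengths):
        # The last chain has no trailing linker.
        span = length + (chain_linker_len if i < len(chain_lengths) - 1 else 0)
        indices.extend(position + k + i * residue_idx_offset for k in range(span))
        position += span
    return indices


# Two 3-residue chains joined by a 2-residue linker, with an offset of 100:
print(linked_residue_indices([3, 3], chain_linker_len=2, residue_idx_offset=100))
# [0, 1, 2, 3, 4, 105, 106, 107]
```

Once the linker positions (3 and 4 here) are stripped, the first chain keeps indices 0 to 2 and the second 105 to 107, so raising residue_idx_offset widens the gap between chains.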

chain_linker

string

default:"GGGGGGGGGGGGGGGGGGGGGGGGG"

Amino acid sequence used to link chains internally for multi-chain prediction. ESMFold predicts multi-chain complexes by linking chains with a flexible linker sequence (typically glycines). The linker is removed in the final output. Default: 25 glycines ("G" * 25).
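
A minimal sketch of the link-then-strip idea, assuming a per-position mask is used to drop linker residues from per-residue outputs. `join_chains` and `strip_linker` are hypothetical helpers, not the wrapper's actual code.

```python
"""Sketch: join chains with chain_linker for folding, then drop linker positions afterwards."""

from typing import Any, List, Tuple


def join_chains(chains: List[str], chain_linker: str = "G" * 25) -> Tuple[str, List[bool]]:
    """Return the linked sequence and a per-position mask (True = real residue, False = linker)."""
    parts: List[str] = []
    keep: List[bool] = []
    for i, chain in enumerate(chains):
        if i > 0:
            parts.append(chain_linker)
            keep.extend([False] * len(chain_linker))
        parts.append(chain)
        keep.extend([True] * len(chain))
    return "".join(parts), keep


def strip_linker(per_position: List[Any], keep: List[bool]) -> List[Any]:
    """Keep only values (e.g. per-residue pLDDT or coordinates) at real-residue positions."""
    return [value for value, is_real in zip(per_position, keep) if is_real]


linked, keep = join_chains(["MKV", "LLE"], chain_linker="GGG")
print(linked)                                    # MKVGGGLLE
print("".join(strip_linker(list(linked), keep)))  # MKVLLE
```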

max_batch_residues

integer

default:"1200"

Starting cap on total residues per inference batch; auto-halved on CUDA OOM (floor = longest single complex). Must be at least 100. Default: 1200.
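
The back-off behavior can be sketched as follows. `pack`, `run_with_oom_backoff`, and the `OutOfMemory` exception are hypothetical stand-ins (a real run would catch PyTorch's CUDA out-of-memory error), and greedy packing in input order is an assumption; the toolkit may group complexes differently.

```python
"""Sketch: greedy batching under max_batch_residues, halving the cap on out-of-memory."""

from typing import Callable, Dict, List


class OutOfMemory(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def pack(lengths: List[int], cap: int) -> List[List[int]]:
    """Greedily group positions so each group's residue total stays within cap.

    A single item longer than cap still gets a group of its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0
    for idx, n in enumerate(lengths):
        if current and total + n > cap:
            batches.append(current)
            current, total = [], 0
        current.append(idx)
        total += n
    if current:
        batches.append(current)
    return batches


def run_with_oom_backoff(
    lengths: List[int],
    run_batch: Callable[[List[int]], List[str]],
    max_batch_residues: int = 1200,
) -> List[str]:
    """Run every complex, halving the cap (floor = longest complex) when a batch runs out of memory."""
    floor = max(lengths)
    cap = max_batch_residues
    results: Dict[int, str] = {}
    pending = pack(lengths, cap)
    while pending:
        batch = pending.pop(0)
        try:
            outputs = run_batch(batch)
        except OutOfMemory:
            sub_lengths = [lengths[i] for i in batch]
            while True:  # halve until the offending batch splits or the floor is reached
                cap = max(cap // 2, floor)
                sub_batches = pack(sub_lengths, cap)
                if len(sub_batches) >= 2 or cap == floor:
                    break
            if len(sub_batches) < 2:
                raise  # cannot split further: re-raise the out-of-memory error
            # pack() returned positions within `batch`; map them back to complex indices.
            pending = [[batch[j] for j in sub] for sub in sub_batches] + pending
            continue
        results.update(zip(batch, outputs))
    return [results[i] for i in range(len(lengths))]


def make_fake_runner(lengths: List[int], budget: int) -> Callable[[List[int]], List[str]]:
    """Simulated GPU: raises OutOfMemory when a batch exceeds `budget` residues."""
    def run_batch(batch: List[int]) -> List[str]:
        total = sum(lengths[i] for i in batch)
        if total > budget:
            raise OutOfMemory(f"{total} residues exceed the simulated budget of {budget}")
        return [f"structure_{i}" for i in batch]
    return run_batch


lengths = [400, 400, 300, 200]
print(run_with_oom_backoff(lengths, make_fake_runner(lengths, budget=700)))
# ['structure_0', 'structure_1', 'structure_2', 'structure_3']
```

In this sketch only the batch that failed is re-split; batches already planned under the old cap are left as they are, and a complex that cannot fit even on its own re-raises the error.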

num_recycles

integer

default:"4"

Number of recycling iterations through the folding trunk and structure module. Default: 4.

verbose

integer

default:"0"

Whether to print status messages during execution. Inherited from StructurePredictionConfig. Default: 0 (off).

device

string

default:"cuda"

Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
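
To illustrate why an unseeded, seed-sensitive call cannot be cached, here is a hypothetical seed-aware cache key. `cache_key` and its hashing scheme are assumptions for illustration, not the toolkit's caching implementation.

```python
"""Sketch: a seed-aware cache key (hypothetical, for illustration only)."""

import hashlib
import json
from typing import Any, Dict, Optional


def cache_key(
    tool_name: str,
    inputs: Dict[str, Any],
    config: Dict[str, Any],
    seed: Optional[int],
    seed_sensitive: bool,
) -> Optional[str]:
    """Return a cache key, or None when the call should bypass the cache.

    A seed-sensitive tool is cacheable only once a seed is set; the seed is then hashed
    together with the inputs and config, so different seeds never share a cache entry.
    """
    if seed_sensitive and seed is None:
        return None
    payload = {"tool": tool_name, "inputs": inputs, "config": config, "seed": seed}
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


inputs = {"chains": ["MKTAYIAK"]}
config = {"num_recycles": 4}
print(cache_key("esmfold-prediction", inputs, config, seed=None, seed_sensitive=True))  # None
```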

include_pae_matrix

boolean

default:"False"

Whether to attach the full per-residue PAE matrix to each structure’s metrics as pae. Inherited. Default: False.


Output: ESMFoldOutput

structures

List[Structure]

required

Predicted structures, each carrying an ESMFoldMetrics instance on .metrics.


structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Metrics (one set per item in structures)

Metric	Type	Range	Availability
`avg_plddt`	float	0.0 to 1.0	always
`ptm`	float	0.0 to 1.0	depends on model output
`avg_pae`	float	≥ 0.0	depends on model output
`pae`	list[list[float]]	≥ 0.0	when include_pae_matrix=True
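
As a rough illustration of how these summary metrics relate to per-residue outputs, the sketch below averages a per-residue pLDDT list and a PAE matrix and drops absent values, matching the note that None values are stripped from metrics. `summarize_confidence` is hypothetical, and taking avg_pae as the plain mean over all matrix entries is an assumption.

```python
"""Sketch: deriving summary confidence metrics from per-residue outputs (assumed definitions)."""

from typing import Dict, List, Optional


def summarize_confidence(
    plddt: List[float],
    ptm: Optional[float] = None,
    pae: Optional[List[List[float]]] = None,
    include_pae_matrix: bool = False,
) -> Dict[str, object]:
    """Build a metrics dict, dropping entries whose value is None."""
    metrics: Dict[str, object] = {
        "avg_plddt": sum(plddt) / len(plddt),  # assumes per-residue pLDDT on a 0-1 scale
        "ptm": ptm,                            # absent when the model does not report it
        "avg_pae": None,
        "pae": None,
    }
    if pae is not None:
        flat = [v for row in pae for v in row]
        metrics["avg_pae"] = sum(flat) / len(flat)  # assumed: plain mean over all N x N entries
        if include_pae_matrix:
            metrics["pae"] = pae
    return {k: v for k, v in metrics.items() if v is not None}


print(summarize_confidence([0.5, 0.75], pae=[[0.0, 2.0], [4.0, 2.0]]))
# {'avg_plddt': 0.625, 'avg_pae': 2.0}
```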

Applications

This tool folds a protein sequence into a 3D model for structural analysis or as input to downstream structure-based tools. Because ESMFold does not use an MSA, it is well suited to de novo or heavily engineered sequences that have no natural homologs for an alignment to capture.

Usage Tips

No MSA or template search is used. ESMFold does not incorporate MSAs into the prediction. There is no use_msa option (unlike Boltz-2 or Protenix), and passing one raises an error; the inherited msas input, by contrast, is hidden and, if supplied, is ignored with a single logged warning.
Multi-chain complexes are approximated with an internal glycine linker. chain_linker (default 25 glycines) joins chains before folding and is stripped from the output; this works best for homomeric assemblies and is unreliable for true hetero-complexes. Use AlphaFold3, Boltz-2, Chai-1, or Protenix for those.
Protein sequences only, with a hard cap of 2,400 residues per complex. DNA, RNA, and ligands are not supported; X is allowed for unknown residues. The cap is enforced against the linked length actually folded: the sum of all chain residues plus the inter-chain chain_linker inserted between them (len(chain_linker) * (num_chains - 1), 25 residues per junction by default). A multi-chain complex whose bare residues sum to just under 2,400 can still exceed the cap.
Confidence is reported as pLDDT, pTM, and PAE. Average pLDDT (0 to 1) is the primary per-structure quality metric; set include_pae_matrix to attach the full per-residue PAE matrix.
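
A common follow-up is to filter and rank predictions by these metrics. The sketch below does that on plain metric dicts; `rank_predictions` is hypothetical, and the cutoffs (0.7 average pLDDT, 0.5 pTM) are illustrative placeholders, not recommendations from the toolkit. Because ptm depends on model output, a missing value is not treated as a failure.

```python
"""Sketch: filtering and ranking predictions by confidence (illustrative thresholds)."""

from typing import Dict, List


def rank_predictions(
    metrics_list: List[Dict[str, float]],
    min_plddt: float = 0.7,
    min_ptm: float = 0.5,
) -> List[int]:
    """Return indices of predictions that pass the cutoffs, highest avg_plddt first."""
    passing = []
    for i, m in enumerate(metrics_list):
        if m["avg_plddt"] < min_plddt:
            continue
        ptm = m.get("ptm")  # may be absent: ptm depends on model output
        if ptm is not None and ptm < min_ptm:
            continue
        passing.append(i)
    return sorted(passing, key=lambda i: metrics_list[i]["avg_plddt"], reverse=True)


metrics = [
    {"avg_plddt": 0.82, "ptm": 0.71},
    {"avg_plddt": 0.65},               # fails the pLDDT cutoff
    {"avg_plddt": 0.90, "ptm": 0.40},  # fails the pTM cutoff
    {"avg_plddt": 0.75},               # no ptm reported, kept on pLDDT alone
]
print(rank_predictions(metrics))  # [0, 3]
```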

ESMFold Gradient (`esmfold-gradient`)

Runs a single differentiable ESMFold confidence pass: one forward-and-backward gradient evaluation, not an iterative design loop. For one or more designated chains, a relaxed (L, 20) amino-acid distribution replaces the discrete sequence, and ESMFold folds the complex under that soft input. The resulting pLDDT, pTM, and PAE terms are combined into one weighted scalar loss, and a single backward pass returns its gradient with respect to the input logits, along with the loss value, the per-term metrics, and the predicted Structure.

API Reference


Input: ESMFoldGradientInput

chains

List[string]

required

Complete complex chain sequences. Entries at the positions listed in target_chain_indices are replaced by the hard decode of logits before folding, and each must have exactly len(logits) residues.
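
A minimal sketch of the hard decode and the length check, assuming logits arrive as a list of 20-value rows in the canonical ACDEFGHIKLMNPQRSTVWY order. `hard_decode` and `prepare_chains` are hypothetical helpers, and the tie-breaking rule (first maximum wins) is an assumption.

```python
"""Sketch: hard-decoding (L, 20) logits and substituting them into the target chains."""

from typing import List

VOCAB = "ACDEFGHIKLMNPQRSTVWY"  # canonical column order for logits and gradient


def hard_decode(logits: List[List[float]]) -> str:
    """Argmax each 20-value row into a one-letter residue (first maximum wins on ties)."""
    if any(len(row) != len(VOCAB) for row in logits):
        raise ValueError("each logits row must have 20 values")
    return "".join(VOCAB[max(range(len(VOCAB)), key=row.__getitem__)] for row in logits)


def prepare_chains(
    chains: List[str], target_chain_indices: List[int], logits: List[List[float]]
) -> List[str]:
    """Replace each target chain with the decoded sequence after checking its length."""
    decoded = hard_decode(logits)
    prepared = list(chains)
    for idx in target_chain_indices:
        if len(chains[idx]) != len(logits):
            raise ValueError(f"chain {idx} has length {len(chains[idx])}, expected {len(logits)}")
        prepared[idx] = decoded
    return prepared


def one_hot_row(letter: str) -> List[float]:
    """Logits row peaked at `letter` (illustrative input)."""
    return [1.0 if aa == letter else 0.0 for aa in VOCAB]


logits = [one_hot_row(aa) for aa in "MKV"]
print(prepare_chains(["XXX", "GSGSG"], [0], logits))  # ['MKV', 'GSGSG']
```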

target_chain_indices

List[integer]

default:"[0]"

Chain positions that receive the relaxed target logits. If a target segment repeats in proto-language (for example, several copies of the same chain), pass each occurrence's index once; gradients from all occurrences are summed through the shared logits tensor.
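
Summing is what the chain rule prescribes: when one parameter feeds several places in a computation, the gradient contributions from each use add up. The sketch below checks this with a finite difference on a scalar stand-in and shows the element-wise accumulation for (L, 20) gradients; `loss_two_copies` and `accumulate_gradients` are hypothetical toy functions, not ESMFold internals.

```python
"""Sketch: gradients from repeated uses of shared logits add up (toy example)."""

from typing import Callable, List

Matrix = List[List[float]]


def loss_two_copies(z: float) -> float:
    """Toy loss in which the same parameter z is used by two 'occurrences'."""
    via_first = 3.0 * z   # derivative through the first occurrence: 3
    via_second = z ** 2   # derivative through the second occurrence: 2z
    return via_first + via_second


def numerical_grad(f: Callable[[float], float], z: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


def accumulate_gradients(per_occurrence: List[Matrix]) -> Matrix:
    """Element-wise sum of same-shape (L, 20) gradients, one per target occurrence."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*per_occurrence)]


print(round(numerical_grad(loss_two_copies, 1.5), 6))       # 6.0, i.e. 3 + 2 * 1.5
print(accumulate_gradients([[[1.0, 2.0]], [[0.5, -1.0]]]))  # [[1.5, 1.0]]
```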

logits

List[array]

required

Target-chain logits of shape (L, 20), one row per residue, with columns in proto amino-acid order (ACDEFGHIKLMNPQRSTVWY).

temperature

number

default:"1.0"

Softmax temperature for the target-chain relaxed sequence.


Config: ESMFoldGradientConfig

loss_weights

Dict[string, number]

Non-negative weights for the confidence loss terms, keyed plddt, ptm, and pae. Terms with weight 0.0 are skipped. Default: {"plddt": 1.0}.
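
The weighting rules can be sketched as below. Per-term losses are passed as zero-argument callables so that skipped terms are never evaluated. The values in the example are arbitrary placeholders, and how the tool actually derives each term from pLDDT, pTM, and PAE is not specified here. `weighted_confidence_loss` is hypothetical.

```python
"""Sketch: combining confidence loss terms with loss_weights (hypothetical helper)."""

from typing import Callable, Dict, Optional

DEFAULT_LOSS_WEIGHTS = {"plddt": 1.0}
VALID_TERMS = {"plddt", "ptm", "pae"}


def weighted_confidence_loss(
    term_fns: Dict[str, Callable[[], float]],
    loss_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return sum(weight * term) over terms with non-zero weight."""
    weights = DEFAULT_LOSS_WEIGHTS if loss_weights is None else loss_weights
    unknown = set(weights) - VALID_TERMS
    if unknown:
        raise ValueError(f"unknown loss terms: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("loss weights must be non-negative")
    total = 0.0
    for term, weight in weights.items():
        if weight == 0.0:
            continue  # zero-weight terms are skipped, so their callables never run
        total += weight * term_fns[term]()
    return total  # all-zero weights leave total at 0.0


terms = {"plddt": lambda: 0.25, "ptm": lambda: 0.5, "pae": lambda: 0.1}
print(weighted_confidence_loss(terms))                                          # 0.25
print(weighted_confidence_loss(terms, {"plddt": 1.0, "ptm": 2.0, "pae": 0.0}))  # 1.25
```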

soft

number

default:"1.0"

Soft probability blend for relaxed target sequence.

hard

number

default:"0.0"

Straight-through hard-forward blend for relaxed target sequence.

compute_gradient

boolean

default:"True"

Whether to return the gradient with respect to logits.

verbose

integer

default:"0"

Whether to print status messages during execution. Inherited from StructurePredictionConfig. Default: 0 (off).

device

string

default:"cuda"

Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

include_pae_matrix

boolean

default:"False"

Attach the full per-residue PAE matrix.

residue_idx_offset

integer

default:"512"

Residue numbering gap between linked chains.

chain_linker

string

default:"GGGGGGGGGGGGGGGGGGGGGGGGG"

Sequence inserted between chains before folding.

max_batch_residues

integer

default:"1200"

Starting cap on residues per ESMFold inference batch; auto-halved on CUDA OOM (floor = longest single complex).

num_recycles

integer

default:"4"

Recycling iterations through the folding trunk and structure module.


Output: ESMFoldGradientOutput

structure

Structure

required

Predicted ESMFold complex structure.


structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

gradient

array

Gradient matrix matching the input logits shape; None when compute_gradient=False.

loss

number

required

Scalar weighted confidence objective value.

metrics

Dict[string, any]

Confidence metrics and per-term unweighted losses.

vocab

List[string]

required

Amino-acid column ordering for logits and gradient.

Metrics

Metric	Type	Range	Availability
`avg_plddt`	float	0.0 to 1.0	always
`ptm`	float	0.0 to 1.0	depends on model output
`avg_pae`	float	≥ 0.0	depends on model output
`pae`	list[list[float]]	≥ 0.0	when include_pae_matrix=True

Applications

This tool supplies the loss and gradient signal that gradient-based or MCMC sequence-design loops optimize for foldability: minimizing the confidence loss pushes a relaxed sequence toward one ESMFold predicts will fold well. With compute_gradient=False it instead provides forward-only confidence scoring (loss, metrics, and predicted structure) of a candidate sequence for ranking or filtering.
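
The loop that drives this tool lives outside it. The sketch below runs plain gradient descent on the logits, with `toy_evaluate` standing in for one esmfold-gradient call: like the tool, it returns a loss and a gradient with respect to the logits, but its objective is a toy cross-entropy toward an arbitrary sequence, not an ESMFold confidence loss. In practice you would swap in a function that calls `run_esmfold_gradient()` and returns its loss and gradient fields; that call's exact signature is not shown here, and every name in the sketch is hypothetical.

```python
"""Sketch: an outer gradient-descent loop around a per-call (loss, gradient) evaluator."""

import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]
VOCAB = "ACDEFGHIKLMNPQRSTVWY"  # canonical column order for logits and gradient


def toy_evaluate(logits: Matrix, target: str = "MKV") -> Tuple[float, Matrix]:
    """Toy stand-in: cross-entropy of softmax(logits) toward `target`, plus its gradient."""
    loss, grad = 0.0, []
    for row, aa in zip(logits, target):
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        t = VOCAB.index(aa)
        loss -= math.log(probs[t])
        # d(-log p_t) / d logit_j = p_j - [j == t]
        grad.append([p - (1.0 if j == t else 0.0) for j, p in enumerate(probs)])
    return loss, grad


def design_loop(
    evaluate: Callable[[Matrix], Tuple[float, Matrix]],
    logits: Matrix,
    steps: int = 50,
    lr: float = 1.0,
) -> Tuple[Matrix, List[float]]:
    """Call `evaluate` once per step and take a gradient-descent step on the logits."""
    history = []
    for _ in range(steps):
        loss, grad = evaluate(logits)  # one forward + backward evaluation per call
        history.append(loss)
        logits = [[z - lr * g for z, g in zip(row, g_row)] for row, g_row in zip(logits, grad)]
    return logits, history


def decode(logits: Matrix) -> str:
    """Argmax each row back to a one-letter sequence."""
    return "".join(VOCAB[max(range(len(VOCAB)), key=row.__getitem__)] for row in logits)


start = [[0.0] * len(VOCAB) for _ in range(3)]
final, history = design_loop(toy_evaluate, start)
print(decode(final))  # MKV
```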

Usage Tips

One pass per call; this tool is not an optimization loop. It evaluates a single relaxed sequence. Drive it from a sequence-design optimizer, or call it repeatedly, to actually design a sequence.
compute_gradient defaults to True. It runs a forward and backward pass and returns the gradient with respect to the input logits; set it False for forward-only scoring (gradient=None). The loss, metrics, and predicted structure are identical in both modes.
loss_weights selects and weights the confidence terms. Non-negative weights over plddt, ptm, and pae (default {"plddt": 1.0}); terms with weight 0.0 are skipped, and all-zero weights short-circuit to a zero gradient with loss=0.0.
logits and the returned gradient share canonical amino-acid order ACDEFGHIKLMNPQRSTVWY. Every chain listed in target_chain_indices must have length len(logits); non-target chains fold normally with their fixed sequences.
soft and hard trade smoothness for discreteness. The default (soft=1.0, hard=0.0) uses pure soft probabilities for smooth optimization; set hard=1.0 for a straight-through estimator (the forward pass sees argmax tokens while gradients still flow through the soft probabilities).
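
The sketch below illustrates temperature and the straight-through hard blend for a single residue. It assumes soft=1.0 (the default), since how soft < 1 is blended is not specified here. The downstream loss is a toy linear function, L(x) = sum_j w[j] * x[j], chosen so the gradient can be checked by hand; all function names are hypothetical.

```python
"""Sketch: temperature-scaled relaxed input with a straight-through 'hard' blend (one residue)."""

import math
from typing import List


def softmax(z: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of z / temperature."""
    m = max(z)
    exps = [math.exp((v - m) / temperature) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_forward(z: List[float], temperature: float = 1.0, hard: float = 0.0) -> List[float]:
    """Value the model sees: p + hard * (onehot(argmax p) - p).

    In an autodiff framework the (onehot - p) correction would be wrapped in a stop-gradient
    (e.g. .detach() in PyTorch), so at hard=1.0 the forward pass sees the argmax token while
    gradients flow only through p.
    """
    p = softmax(z, temperature)
    top = max(range(len(p)), key=p.__getitem__)
    onehot = [1.0 if j == top else 0.0 for j in range(len(p))]
    return [pj + hard * (oj - pj) for pj, oj in zip(p, onehot)]


def grad_wrt_logits(z: List[float], w: List[float], temperature: float = 1.0) -> List[float]:
    """dL/dz for the toy loss L(x) = sum_j w[j] * x[j], differentiating only through p.

    Because L is linear, dL/dx = w wherever x is evaluated, so this is the straight-through
    gradient for every `hard` value; for a nonlinear loss dL/dx would depend on the blended x.
    """
    p = softmax(z, temperature)
    expected_w = sum(pj * wj for pj, wj in zip(p, w))
    # Softmax Jacobian: dp_j/dz_k = p_j * ([j == k] - p_k) / temperature
    return [pk * (wk - expected_w) / temperature for pk, wk in zip(p, w)]


z = [0.2, -0.1, 0.5]
w = [1.0, 2.0, -1.0]
x_hard = relaxed_forward(z, temperature=0.5, hard=1.0)  # one-hot on index 2 (the argmax), up to rounding
g = grad_wrt_logits(z, w, temperature=0.5)              # non-zero for every logit, not just the argmax
```

Lowering temperature sharpens p toward its argmax, so the soft input approaches the hard one even at hard=0.0.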

Toolkit Notes

These apply to every ESMFold tool in this toolkit (esmfold-prediction, esmfold-gradient).

Requires a GPU. Both tools run ESMFold through a PyTorch backend and need an NVIDIA GPU (roughly 16 GB of VRAM or more for longer sequences); CPU execution is not practical.
max_batch_residues is a starting cap, not a hard ceiling. On CUDA OOM the wrapper halves the cap (floor = longest single complex) and re-splits the offending sub-batch, so the default 1200 is usually fine to leave in place.
MSA-free and single-sequence. ESMFold folds from one sequence with no alignment or template search. Accuracy is generally lower than MSA-based methods on targets where a deep, diverse MSA would help.
num_recycles (default 4) applies to both tools. Each recycling iteration refines the structure; raising it can improve accuracy at the cost of longer runtime.

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
