Proto is not affiliated with Meta AI and Biohub. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.
Background
ESM-IF1 (Hsu et al., 2022) solves the inverse-folding problem: given a protein backbone, it predicts an amino-acid sequence that will fold into that structure. This is the inverse of structure prediction and a core step in protein design, where a backbone is proposed first and a sequence that encodes it is designed afterwards. Internally, ESM-IF1 is a sequence-to-sequence transformer with geometric input layers. A Geometric Vector Perceptron graph network encodes the backbone atom coordinates (N, C-alpha, C) into rotation-invariant per-residue features, and an autoregressive decoder then generates the sequence one residue at a time. Because experimentally determined structures are limited, the model was trained on roughly 12 million UniRef50 sequences whose structures were predicted with AlphaFold2, alongside experimental structures from CATH. This raised native-sequence recovery to about 51% on structurally held-out backbones, and about 72% for buried residues. ESM-IF1 also handles complexes, partially masked structures, and binding interfaces. The reference implementation is maintained by Meta AI in facebookresearch/esm and distributed in thefair-esm package.
ProteinDPO (Widatalla et al., 2024) is a variant of ESM-IF1 fine-tuned with Direct Preference Optimization (DPO) on a mega-scale experimental protein-stability dataset. It keeps the ESM-IF1 architecture but is trained to prefer stabilizing over destabilizing sequences for a given backbone, which improves both its designs and its stability scoring. ProteinDPO’s implementation is available at evo-design/protein-dpo.
Learning Resources
- ESM inverse folding examples (Meta AI) - the official notebooks and scripts for running ESM-IF1 sequence design and scoring, including multi-chain complexes.
Tools
ESM-IF1 Sampling (esm-if1-sample)
Designs new sequences for a given backbone. Each input structure is encoded once and decoded into one or more candidate sequences, each returned with its average log-likelihood under the model.API Reference
Input: InverseFoldingInput
Input: InverseFoldingInput
chains_to_redesign and fixed_positions selections.Config: ESMIF1SampleConfig
Config: ESMIF1SampleConfig
esmif, protein_dpoTrue is coerced to 1 and False to 0.None waits indefinitely.Output: ESMIF1SampleOutput
Output: ESMIF1SampleOutput
ESMIF1DesignSet per input structure, in input order.Applications
Use this to redesign a natural protein or to generate sequences for a de novo backbone, including multi-chain complexes and binding interfaces where the surrounding chains are kept as context. With the default ProteinDPO weights the designs are biased toward higher experimental stability, which suits stabilization campaigns.Usage Tips
temperaturedefaults to1.0, rather than the0.1used by the other inverse-folding tools. ESM-IF1’s reference inference samples at1.0, and this toolkit retains that default, so its designs are more diverse than those produced by the backbone-MPNN models. Lower it toward0.1for conservative, near-greedy designs, and raise it for greater variation.batch_sizetrades GPU memory against throughput. It defaults tonum_sequences_per_structure, so all requested sequences are generated in a single forward pass. Increase it to generate more sequences per pass and improve throughput. Decrease it to lower peak GPU memory when a large request or long backbone exhausts memory.- Non-redesigned chains still shape the design. Chains you do not select stay as fixed structural context rather than being ignored, so designing one chain of a complex accounts for its partners.
fixed_positionsis counted from 1, not 0 to follow biological conventions for residue selection. - Output is structured per design.
output.design_sets[i].complexes[j]is anESMIF1Design; the designed target sequence isdesign.designed_chains[0].sequenceand the log-likelihood isdesign.metrics["log_likelihood"].ESMIF1Designsare aComplexsubclass and can be passed directly to structure predictors.
ESM-IF1 Scoring (esm-if1-score)
Evaluates how well existing sequences fit a structure. Each sequence is scored against its paired structure using the full multi-chain context, returning the average log-likelihood and perplexity.API Reference
Input: ESMIF1ScoringInput
Input: ESMIF1ScoringInput
Config: ESMIF1ScoringConfig
Config: ESMIF1ScoringConfig
esmif, protein_dpoTrue is coerced to 1 and False to 0.None waits indefinitely.BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.Output: InverseFoldingScoringOutput
Output: InverseFoldingScoringOutput
Metrics subclass with scalar metrics (accessed via score.perplexity or score["perplexity"]) plus declared logits / vocab fields.scores item)| Metric | Type | Range | Availability |
|---|---|---|---|
log_likelihood | float | ≤ 0.0 | always |
avg_log_likelihood | float | ≤ 0.0 | always |
perplexity | float | ≥ 1.0 | always |
Applications
Use this to rank candidate sequences or assess point mutations by structural compatibility without generating new ones. With the default ProteinDPO weights the score also better reflects predicted stability, which is useful for prioritizing stabilizing variants.Usage Tips
- Lower perplexity is better, and it tracks the log-likelihood directly. Perplexity is
exp(-avg_log_likelihood), so the two metrics rank candidates identically. Treat the score as compatibility under the model, not a guarantee the sequence folds, and confirm shortlisted candidates with a structure predictor.
Toolkit Notes
These apply to every ESM-IF1 tool in this toolkit (esm-if1-sample, esm-if1-score).
- Requires a GPU. The geometric encoder and autoregressive decoder are not practical on CPU. Model weights download automatically on first use through the
fair-esmpackage. - ProteinDPO is the default for both tools.
esm-if1-sampleandesm-if1-scoreboth defaultweights_varianttoprotein_dpo, the stability-aligned variant, so by default designs are biased toward stability and scores reflect predicted stability rather than the original ESM-IF1 likelihood. Setweights_varianttoesmiffor the original ESM-IF1 model in either tool.

Meta AI
Biohub