Proto is not affiliated with the Institute for Protein Design. This toolkit is open source and builds on the implementation produced by that organization. Product names, logos, and trademarks are the property of their respective owners.
Background
LigandMPNN (Dauparas et al., 2025) solves the inverse-folding problem for biomolecular assemblies: given a protein backbone together with the non-protein atoms around it, it predicts an amino-acid sequence compatible with that environment. It is a direct extension of ProteinMPNN, which sees only protein backbone atoms and is therefore blind to the bound ligands, nucleic acids, and metals that strongly shape which residues fit. Internally, LigandMPNN keeps ProteinMPNN’s message-passing design model and adds a second graph over the non-protein atoms. Residues and nearby ligand atoms exchange messages, and the model reads each atom’s chemical element, which is what lets it reason about coordinating a metal or packing against a large or unusual ligand. It generates the sequence autoregressively and can also produce sidechain conformations so binding interactions can be inspected directly. On native backbones it recovers roughly 63% of the native residues that contact small molecules, 51% of those contacting nucleotides, and 78% of those coordinating metals. The reference implementation is maintained by the Institute for Protein Design at dauparas/LigandMPNN.
Learning Resources
- Introducing LigandMPNN (Institute for Protein Design) - an accessible overview of what LigandMPNN adds over ProteinMPNN and when to use it.
Tools
LigandMPNN Sampling (ligandmpnn-sample)
Designs new sequences for a backbone in the presence of its non-protein context. Each input structure is encoded once, with any ligand, nucleotide, or metal atoms included, and decoded into one or more candidate sequences with a perplexity and sequence recovery score.
API Reference
Input: InverseFoldingInput
chains_to_redesign and fixed_positions selections.
Config: LigandMPNNSampleConfig
True is coerced to 1 and False to 0.
None waits indefinitely.
Output: LigandMPNNSampleOutput
LigandMPNNDesignSet per input structure, in input order.
Applications
Use this to design or redesign binding sites, enzyme active sites, nucleic-acid-binding interfaces, and metal-coordination sites, where the identity of nearby non-protein atoms determines which residues work. It is the right choice over backbone-only ProteinMPNN whenever a ligand, cofactor, nucleic acid, or metal is part of the target.
Usage Tips
- Keep ligand_mpnn_use_atom_context enabled. It defaults to True and is the whole point of LigandMPNN: it encodes the surrounding ligand, nucleotide, and metal atoms. Turning it off makes the model effectively ligand-blind, close to plain ProteinMPNN.
- Set ligand_mpnn_use_side_chain_context to True to honor a fixed motif. It conditions on the sidechain atoms of fixed residues, which helps when redesigning around a preserved catalytic or binding motif. It defaults to False.
- fixed_positions is counted from 1, not 0, to match biological residue selection conventions. Listed positions keep their input residue, and chains or atoms you do not redesign still act as context rather than being removed.
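The 1-based convention for fixed_positions is easy to get wrong when post-processing designs in Python, where indexing starts at 0. The sketch below is a standalone illustration, not the toolkit's implementation. The helper names `designable_indices` and `sequence_recovery` are hypothetical, and it assumes a single chain numbered consecutively from 1 and that recovery is the fraction of redesigned positions whose residue matches the native one.

```python
def designable_indices(length, fixed_positions=()):
    """Return 0-based indices of residues that are free to change.

    fixed_positions uses 1-based numbering, so position 1 maps to index 0.
    """
    fixed = set(fixed_positions)
    out_of_range = sorted(p for p in fixed if not 1 <= p <= length)
    if out_of_range:
        raise ValueError(f"positions outside 1..{length}: {out_of_range}")
    return [i for i in range(length) if (i + 1) not in fixed]


def sequence_recovery(native, designed, fixed_positions=()):
    """Fraction of redesigned positions where the design matches the native."""
    if len(native) != len(designed):
        raise ValueError("native and designed sequences differ in length")
    indices = designable_indices(len(native), fixed_positions)
    if not indices:
        raise ValueError("every position is fixed; nothing was redesigned")
    matches = sum(native[i] == designed[i] for i in indices)
    return matches / len(indices)


# Hypothetical sequences: positions 1 and 2 are fixed, the rest redesigned.
native = "MKTAYIAK"
designed = "MKSAYLAK"
recovery = sequence_recovery(native, designed, fixed_positions=[1, 2])
print(f"recovery over redesigned positions: {recovery:.2f}")
```

Leaving fixed positions out of the denominator keeps preserved residues, which match by construction, from inflating the number.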
LigandMPNN Scoring (ligandmpnn-score)
Evaluates how well existing sequences fit a structure and its non-protein context, returning log-likelihood-based metrics with optional per-position logits.
API Reference
Input: LigandMPNNScoringInput
fixed_positions excluded from the metrics.
Config: LigandMPNNScoringConfig
single_aa, autoregressive
True is coerced to 1 and False to 0.
None waits indefinitely.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: InverseFoldingScoringOutput
Metrics subclass with scalar metrics (accessed via score.perplexity or score["perplexity"]) plus declared logits / vocab fields.
scores item)

| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
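The ranges in the table follow from how these metrics are conventionally defined. The sketch below assumes the usual definitions: log_likelihood is the sum of per-position natural-log probabilities, avg_log_likelihood is its mean over scored positions, and perplexity is exp(-avg_log_likelihood). The toolkit's exact computation is not stated here, and `summarize_log_probs` is a hypothetical helper written for illustration.

```python
import math


def summarize_log_probs(log_probs):
    """Aggregate per-position log-probabilities into three scalar metrics."""
    if not log_probs:
        raise ValueError("need at least one scored position")
    if any(lp > 0.0 for lp in log_probs):
        raise ValueError("log-probabilities cannot be positive")
    total = math.fsum(log_probs)
    average = total / len(log_probs)
    return {
        "log_likelihood": total,          # sum of log p, so never above 0
        "avg_log_likelihood": average,    # per-position mean, never above 0
        "perplexity": math.exp(-average), # exp of a non-negative number, so >= 1
    }


# A model guessing uniformly over 20 amino acids at 50 positions.
uniform = summarize_log_probs([math.log(1 / 20)] * 50)
print(uniform)

# A model that is certain of every residue.
certain = summarize_log_probs([0.0] * 5)
print(certain)
```

Under these definitions, perplexity reads as the effective number of equally likely residue choices per position, which is why a uniform guess over 20 amino acids lands at about 20 and a fully confident model at 1.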
Applications
Use this to rank designs or assess mutations near ligands, nucleic acids, or metals, where backbone-only scoring would miss the very interactions that matter. Lower perplexity indicates a better fit to the structure and its bound environment.
Usage Tips
- scoring_mode changes what the score means. single_aa (the default) scores each position from its own conditional probability and is order-independent, which is what you usually want for ranking. autoregressive scores along one seed-determined decoding order, so it depends on the seed.
- fixed_positions excludes residues from the aggregate score. Set it per (sequence, structure) input pair as a {chain: [positions]} selection counted from 1, not 0, to match biological residue selection conventions, so the score reflects only the residues you care about.
- return_logits (default False) has a size trade-off. Enabling it adds a per-position logit array per sequence for residue-level analysis, which dominates output size and memory for long sequences, so leave it off unless you need it.
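To see why excluding fixed_positions can change a ranking, the following sketch recomputes perplexity over only the positions that are not fixed. It is an illustration under assumptions, not the toolkit's code: `masked_perplexity` is a hypothetical helper, the per-position log-probabilities are made-up numbers, and perplexity is taken as exp of the negative mean log-probability.

```python
import math


def masked_perplexity(log_probs_by_chain, fixed_positions=None):
    """Perplexity over positions that are not listed as fixed.

    log_probs_by_chain: {chain_id: [log p per residue, in chain order]}
    fixed_positions:    {chain_id: [1-based positions to exclude]}
    """
    fixed_positions = fixed_positions or {}
    kept = []
    for chain, log_probs in log_probs_by_chain.items():
        skip = set(fixed_positions.get(chain, ()))
        # enumerate from 1 so positions line up with the 1-based selection
        kept.extend(lp for pos, lp in enumerate(log_probs, start=1)
                    if pos not in skip)
    if not kept:
        raise ValueError("every position is fixed; nothing left to score")
    return math.exp(-math.fsum(kept) / len(kept))


# Made-up per-position log-probabilities for two designs on chain A.
designs = {
    "design_a": {"A": [-0.1, -2.5, -0.2, -0.3]},
    "design_b": {"A": [-0.4, -0.1, -0.5, -0.6]},
}
fixed = {"A": [2]}  # position 2 is a preserved residue we do not want scored

ranked = sorted(designs, key=lambda name: masked_perplexity(designs[name], fixed))
for name in ranked:
    print(name, round(masked_perplexity(designs[name], fixed), 3))
```

In this toy data the poorly scored residue at position 2 dominates design_a's aggregate, so the two designs swap places depending on whether that position is excluded.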
Toolkit Notes
These apply to every LigandMPNN tool in this toolkit (ligandmpnn-sample, ligandmpnn-score).
- A GPU is recommended. LigandMPNN is a small message-passing model that also runs on CPU, but a GPU is much faster when designing or scoring many sequences.
- The non-protein context must be in the input structure. LigandMPNN only conditions on ligands, nucleotides, or metals that are present in the supplied structure; if they are absent, it behaves like backbone-only ProteinMPNN.
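Because missing context silently degrades to backbone-only behavior, a quick check of the input file before submitting can be worthwhile. The sketch below assumes the structure is in PDB format and counts HETATM records by residue name using the standard fixed-width columns. `non_protein_context` is a hypothetical helper, skipping waters is an assumption about what counts as useful context, and nucleic acids are not covered because they normally appear as ATOM records.

```python
WATER_NAMES = {"HOH", "WAT", "DOD"}


def non_protein_context(pdb_text):
    """Count HETATM atoms per residue name, ignoring waters."""
    counts = {}
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()  # PDB columns 18-20 hold the residue name
        if res_name in WATER_NAMES:
            continue
        counts[res_name] = counts.get(res_name, 0) + 1
    return counts


# Truncated example records; only the leading columns matter for this check.
example = "\n".join([
    "ATOM      1  N   MET A   1",
    "HETATM 1001 ZN    ZN A 301",
    "HETATM 1002  O   HOH A 401",
    "HETATM 1003  C1  LIG A 302",
    "HETATM 1004  O1  LIG A 302",
])

context = non_protein_context(example)
if context:
    print("non-protein context found:", context)
else:
    print("no ligand or metal atoms found; results will resemble ProteinMPNN")
```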
