License: LigandMPNN is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

Proto is not affiliated with the Institute for Protein Design. This toolkit is open source and builds on the implementation produced by that organization. Product names, logos, and trademarks are the property of their respective owners.


dauparas/LigandMPNN
Atomic context-conditioned protein sequence design using LigandMPNN
Justas Dauparas, Gyu Rie Lee, … David Baker
Nat. Methods (2025)
@ARTICLE{Dauparas2025-eg,
  title     = "Atomic context-conditioned protein sequence design using
               {LigandMPNN}",
  author    = "Dauparas, Justas and Lee, Gyu Rie and Pecoraro, Robert and An,
               Linna and Anishchenko, Ivan and Glasscock, Cameron and Baker,
               David",
  journal   = "Nat. Methods",
  publisher = "Springer Science and Business Media LLC",
  volume    =  22,
  number    =  4,
  pages     = "717--723",
  doi       = "10.1038/s41592-025-02626-1",
  month     =  apr,
  year      =  2025,
  language  = "en"
}
proto-bio/proto-tools/proto_tools/tools/inverse_folding/ligandmpnn
| Function | Description |
|---|---|
| run_ligandmpnn_sample() | Sample protein sequences using LigandMPNN (GPU) |
| run_ligandmpnn_score() | Score protein sequences using LigandMPNN (GPU) |

Background

LigandMPNN (Dauparas et al., 2025) solves the inverse-folding problem for biomolecular assemblies: given a protein backbone together with the non-protein atoms around it, it predicts an amino-acid sequence compatible with that environment. It is a direct extension of ProteinMPNN, which sees only protein backbone atoms and is therefore blind to the bound ligands, nucleic acids, and metals that strongly shape which residues fit.

Internally, LigandMPNN keeps ProteinMPNN’s message-passing design model and adds a second graph over the non-protein atoms. Residues and nearby ligand atoms exchange messages, and the model reads each atom’s chemical element, which is what lets it reason about coordinating a metal or packing against a large or unusual ligand. It generates the sequence autoregressively and can also produce sidechain conformations so binding interactions can be inspected directly.

On native backbones it recovers roughly 63% of the native residues that contact small molecules, 51% of those contacting nucleotides, and 78% of those coordinating metals. The reference implementation is maintained by the Institute for Protein Design at dauparas/LigandMPNN.
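The idea of letting each residue see its nearby non-protein atoms can be illustrated with a k-nearest-neighbour lookup. This is a simplified sketch, not LigandMPNN's actual featurization: it assumes one representative coordinate per residue (for example Cα), uses a hypothetical helper `nearest_context_atoms` with an arbitrary `k`, and leaves out the element-type features and the message passing itself.

```python
import numpy as np


def nearest_context_atoms(residue_xyz: np.ndarray, context_xyz: np.ndarray, k: int = 4):
    """Return indices and distances of the k nearest context atoms for each residue.

    residue_xyz: (N, 3) array, one representative coordinate per residue (assumption).
    context_xyz: (M, 3) array of ligand, nucleotide, or metal atom coordinates.
    """
    # Pairwise Euclidean distances between every residue and every context atom: (N, M).
    dist = np.linalg.norm(residue_xyz[:, None, :] - context_xyz[None, :, :], axis=-1)
    k = min(k, context_xyz.shape[0])  # cannot pick more atoms than exist
    # Sort each row by distance and keep the k closest atom indices.
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


# Usage: two residues on the x-axis and three context atoms.
residues = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
atoms = np.array([[1.0, 0.0, 0.0], [9.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
idx, dist = nearest_context_atoms(residues, atoms, k=2)
print(idx)   # each row lists that residue's two closest atoms
print(dist)  # and the matching distances
```

In the real model these neighbourhoods feed learned messages between residues and atoms; the lookup only shows which atoms a residue would be paired with.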

Learning Resources

  • Introducing LigandMPNN (Institute for Protein Design) - an accessible overview of what LigandMPNN adds over ProteinMPNN and when to use it.

Tools

LigandMPNN Sampling (ligandmpnn-sample)

Designs new sequences for a backbone in the presence of its non-protein context. Each input structure is encoded once, with any ligand, nucleotide, or metal atoms included, and then decoded into one or more candidate sequences, each reported with a perplexity and a sequence-recovery score.
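Sequence recovery is the fraction of positions where the designed residue matches the native one. The sketch below computes it overall and for interface residues only. The helper names are hypothetical, and treating a residue as interface when its single representative coordinate lies within the cutoff of any context atom is an assumption: the 8.0 Å default mirrors ligand_mpnn_cutoff_for_score, but the tool's exact atom selection may differ.

```python
import numpy as np


def interface_mask(residue_xyz: np.ndarray, context_xyz: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """True for residues within `cutoff` angstroms of at least one context atom.

    Assumption: one representative coordinate per residue, not all residue atoms.
    """
    dist = np.linalg.norm(residue_xyz[:, None, :] - context_xyz[None, :, :], axis=-1)
    return (dist <= cutoff).any(axis=1)


def sequence_recovery(native: str, designed: str, mask=None) -> float:
    """Fraction of (optionally masked) positions where designed matches native."""
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have the same length")
    matches = np.array([a == b for a, b in zip(native, designed)], dtype=bool)
    if mask is not None:
        matches = matches[np.asarray(mask, dtype=bool)]
    # NaN signals "no positions selected" rather than a misleading 0.0.
    return float(matches.mean()) if matches.size else float("nan")


residues = np.array([[0.0, 0, 0], [5.0, 0, 0], [20.0, 0, 0], [30.0, 0, 0]])
ligand = np.array([[2.0, 0, 0]])
near = interface_mask(residues, ligand)
print(sequence_recovery("ACDE", "ACKF"))        # overall recovery
print(sequence_recovery("ACDE", "ACKF", near))  # interface-only recovery
```

Here the two mismatches sit away from the ligand, so interface recovery is higher than overall recovery; reporting both separates binding-site quality from the rest of the design.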

API Reference

Inputs
  • inputs (List[InverseFoldingStructureInput], required): Per-structure inputs, each containing a structure plus optional chains_to_redesign and fixed_positions selections.

Parameters
  • model_type (string, default "ligand_mpnn"): LigandMPNN variant to load.
  • ligand_mpnn_use_atom_context (boolean, default True): Whether ligand-aware variants encode ligand atom context.
  • ligand_mpnn_use_side_chain_context (boolean, default False): Whether to condition on the sidechain atoms of fixed residues.
  • ligand_mpnn_cutoff_for_score (number, default 8.0): Ligand-residue distance cutoff (Å) used to select residues for the ligand-interface recovery score.
  • excluded_amino_acids (array): One-letter codes of amino acids to exclude from the designed sequences.
  • verbose (integer, default 0): Verbosity level (0 = quiet, 1 = info, 2 = debug, 3 = raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default "cuda"): Device to run the model on: ‘cuda’ (NVIDIA GPU), ‘cpu’ (CPU execution), or a specific GPU such as ‘cuda:0’.
  • timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed to use for sampling.
  • num_sequences_per_structure (integer, default 1): Total number of sequences to generate per input structure.
  • batch_size (integer, defaults to num_sequences_per_structure): Number of sequences processed simultaneously on the GPU.
  • temperature (number, default 0.1): Sampling temperature applied to the logits. Lower values concentrate sampling on the most likely residues; higher values increase sequence diversity.

Outputs
  • design_sets (List[LigandMPNNDesignSet]): One LigandMPNNDesignSet per input structure, in input order.
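The temperature and excluded_amino_acids parameters both act on the per-position logits before a residue is drawn. A minimal sketch of that step, assuming a 21-letter alphabet (the 20 amino acids plus X, as in ProteinMPNN) and a hypothetical `sample_residue` helper; the tool's internal ordering and masking may differ.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # assumption: 20 amino acids plus unknown X


def sample_residue(logits, temperature=0.1, excluded_amino_acids=(), rng=None) -> str:
    """Draw one residue from temperature-scaled logits with some residues masked out."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=float) / temperature  # new array, safe to edit
    for aa in excluded_amino_acids:
        scaled[ALPHABET.index(aa)] = -np.inf  # exp(-inf) = 0, so never sampled
    scaled -= scaled.max()  # subtract the max for a numerically stable softmax
    probs = np.exp(scaled)
    probs /= probs.sum()
    return ALPHABET[rng.choice(len(ALPHABET), p=probs)]


rng = np.random.default_rng(0)
logits = np.zeros(len(ALPHABET))
logits[ALPHABET.index("K")] = 2.0
# At T = 0.1 the logit gap of 2.0 becomes 20.0, so K is drawn almost every time;
# excluding K (and X) moves all probability onto the remaining residues.
print(sample_residue(logits, 0.1, rng=rng))
print(sample_residue(logits, 0.1, excluded_amino_acids=["K", "X"], rng=rng))
```

Dividing by a small temperature sharpens the softmax toward the argmax, which is why the default of 0.1 yields conservative sequences; values near 1.0 sample closer to the model's raw distribution.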

Applications

Use this to design or redesign binding sites, enzyme active sites, nucleic-acid-binding interfaces, and metal-coordination sites, where the identity of nearby non-protein atoms determines which residues work. It is the right choice over backbone-only ProteinMPNN whenever a ligand, cofactor, nucleic acid, or metal is part of the target.

Usage Tips

  • Keep ligand_mpnn_use_atom_context enabled. It defaults to True and is the whole point of LigandMPNN: it encodes the surrounding ligand, nucleotide, and metal atoms. Turning it off makes the model effectively ligand-blind, close to plain ProteinMPNN.
  • Set ligand_mpnn_use_side_chain_context to True to honor a fixed motif. It conditions on the sidechain atoms of fixed residues, which helps when redesigning around a preserved catalytic or binding motif. It defaults to False.
  • fixed_positions is counted from 1, not 0, to match biological residue selection conventions. Listed positions keep their input residue, and chains or atoms you do not redesign still act as context rather than being removed.
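A minimal sketch of how a 1-based {chain: [positions]} selection maps onto 0-based array indices. The helper `redesign_mask` is hypothetical, and it assumes positions count residues along each chain starting at 1 (not PDB residue numbers, which can have gaps or insertion codes); chains left out of chains_to_redesign are treated as fully fixed context.

```python
from typing import Dict, List, Optional


def redesign_mask(
    chain_lengths: Dict[str, int],
    fixed_positions: Dict[str, List[int]],
    chains_to_redesign: Optional[List[str]] = None,
) -> Dict[str, List[bool]]:
    """Per-chain masks where True means redesign and False means keep the input residue.

    Assumption: positions count residues along each chain starting at 1.
    """
    masks = {}
    for chain, length in chain_lengths.items():
        # Chains outside chains_to_redesign stay fixed but remain in the structure as context.
        redesignable = chains_to_redesign is None or chain in chains_to_redesign
        mask = [redesignable] * length
        for pos in fixed_positions.get(chain, []):
            if not 1 <= pos <= length:
                raise ValueError(f"chain {chain}: position {pos} outside 1..{length}")
            mask[pos - 1] = False  # 1-based position -> 0-based list index
        masks[chain] = mask
    return masks


print(redesign_mask({"A": 5, "B": 3}, {"A": [1, 3]}, chains_to_redesign=["A"]))
```

Validating the range up front catches the most common off-by-one mistake: passing 0-based indices, where a 0 would otherwise silently fix the last residue via Python's negative indexing.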

LigandMPNN Scoring (ligandmpnn-score)

Evaluates how well existing sequences fit a structure and its non-protein context, returning log-likelihood-based metrics with optional per-position logits.

API Reference

Inputs
  • sequence_structure_pairs (List[SequenceStructurePair], required): Sequence and structure pairs to score; each pair may carry per-pair fixed_positions, which are excluded from the metrics.

Parameters
  • return_logits (boolean, default False): Whether to include per-position logits.
  • scoring_mode (enum, default "single_aa"): Scoring mode, either single_aa (each position scored from its own conditional probability) or autoregressive (positions scored along one decoding order).
  • verbose (integer, default 0): Verbosity level (0 = quiet, 1 = info, 2 = debug, 3 = raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default "cuda"): Device to run the model on.
  • timeout (integer, default 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU floating-point noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip the cache until a seed is set.

Outputs
  • scores (List[InverseFoldingScoringMetrics]): One scoring output per input sequence-structure pair. Each entry is a Metrics subclass with scalar metrics (accessed as score.perplexity or score["perplexity"]) plus the declared logits and vocab fields.
Metrics (one set per scores item)

| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
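Assuming the standard definitions (natural-log probabilities, perplexity as the exponential of the negative average log-likelihood), the three metrics relate as in the sketch below, which also explains the ranges in the table: every log-probability is at most 0, so perplexity is at least 1. The per-position log-probabilities would come from the model under the chosen scoring_mode; `scoring_metrics` is a hypothetical helper, and its optional mask mimics excluding fixed positions.

```python
import math
from typing import Dict, Optional, Sequence


def scoring_metrics(log_probs: Sequence[float], include: Optional[Sequence[bool]] = None) -> Dict[str, float]:
    """Aggregate per-position log-probabilities (natural log assumed) of the scored residues."""
    if include is not None:
        # Drop excluded positions (e.g. fixed ones) before aggregating.
        log_probs = [lp for lp, keep in zip(log_probs, include) if keep]
    if len(log_probs) == 0:
        raise ValueError("no positions left to score")
    log_likelihood = float(sum(log_probs))             # <= 0, since each log p <= 0
    avg_log_likelihood = log_likelihood / len(log_probs)
    perplexity = math.exp(-avg_log_likelihood)         # >= 1; a uniform guess over 20 residues gives 20
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": perplexity,
    }


# Rank two hypothetical designs: lower perplexity means a better fit.
designs = {
    "design_1": [math.log(0.5), math.log(0.5), math.log(0.5)],
    "design_2": [math.log(0.9), math.log(0.8), math.log(0.1)],
}
ranked = sorted(designs, key=lambda name: scoring_metrics(designs[name])["perplexity"])
print(ranked)
```

Perplexity is easier to compare across proteins than log_likelihood because it is length-normalized: design_1 scores 2.0, while design_2 (about 2.4) is penalized for its one poorly supported position despite two confident ones.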

Applications

Use this to rank designs or assess mutations near ligands, nucleic acids, or metals, where backbone-only scoring would miss the very interactions that matter. Lower perplexity indicates a better fit to the structure and its bound environment.

Usage Tips

  • scoring_mode changes what the score means. single_aa (the default) scores each position from its own conditional probability and is order-independent, which is what you usually want for ranking. autoregressive scores along one seed-determined decoding order, so it depends on the seed.
  • fixed_positions excludes residues from the aggregate score. Set it per (sequence, structure) input pair as a {chain: [positions]} selection, counted from 1 rather than 0 to match biological residue-numbering conventions. The score then reflects only the residues you care about.
  • return_logits (default False) has a size trade-off. Enabling it adds a per-position logit array per sequence for residue-level analysis, which dominates output size and memory for long sequences, so leave it off unless you need it.
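The size cost of return_logits is easy to estimate. A back-of-the-envelope sketch, assuming one float32 value per position per vocabulary entry and a 21-token vocabulary (20 amino acids plus X); the vocab field and dtype the tool actually returns may differ, and Python object overhead is ignored.

```python
def logits_size_bytes(sequence_length: int, num_sequences: int,
                      vocab_size: int = 21, bytes_per_value: int = 4) -> int:
    """Raw size of per-position logit arrays: length x vocab x dtype size, per sequence.

    Assumptions: 21-token vocabulary and 4-byte (float32) values.
    """
    return sequence_length * vocab_size * bytes_per_value * num_sequences


# A 500-residue design scored 1,000 times: 500 * 21 * 4 * 1000 bytes.
size = logits_size_bytes(500, 1000)
print(f"{size / 1e6:.0f} MB")  # 42 MB of logits, versus three scalar metrics per sequence
```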

Toolkit Notes

These apply to every LigandMPNN tool in this toolkit (ligandmpnn-sample, ligandmpnn-score).
  • A GPU is recommended. LigandMPNN is a small message-passing model that also runs on CPU, but a GPU is much faster when designing or scoring many sequences.
  • The non-protein context must be in the input structure. LigandMPNN only conditions on ligands, nucleotides, or metals that are present in the supplied structure; if they are absent, it behaves like backbone-only ProteinMPNN.
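Because missing context silently turns LigandMPNN into a backbone-only model, it can be worth checking an input before a run. A minimal sketch for PDB-format text, assuming fixed-column ATOM/HETATM records; the helper name and residue-name lists are illustrative, waters are skipped, and modified amino acids such as MSE (also HETATM records) will be reported too, so treat the result as a quick check rather than a definitive classification.

```python
from typing import Set, Tuple

WATERS = {"HOH", "WAT", "DOD"}  # illustrative list of water residue names
NUCLEOTIDES = {"DA", "DC", "DG", "DT", "DU", "A", "C", "G", "U"}  # standard DNA/RNA names


def context_residues(pdb_text: str) -> Set[Tuple[str, str, str]]:
    """Return (chain, residue name, residue number) for non-protein context in PDB text."""
    found = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        resname = line[17:20].strip()  # columns 18-20
        chain = line[21:22]            # column 22
        resseq = line[22:26].strip()   # columns 23-26
        # Ligands and metals are HETATM records; nucleotides are usually ATOM records.
        is_ligand_or_metal = record == "HETATM" and resname not in WATERS
        is_nucleotide = record == "ATOM" and resname in NUCLEOTIDES
        if is_ligand_or_metal or is_nucleotide:
            found.add((chain, resname, resseq))
    return found


example = "\n".join([
    "ATOM      1  CA  ALA A   1",
    "ATOM      2  P    DA D   5",
    "HETATM    3 ZN    ZN B 101",
    "HETATM    4  O   HOH C 201",
])
print(sorted(context_residues(example)))
```

An empty result for a structure you expected to contain a ligand, nucleic acid, or metal usually means the context was stripped during preparation.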
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.