ProteinMPNN, selects the lowest-perplexity design, and predicts that design’s structural
ensemble with BioEmu. It illustrates inverse folding followed by ensemble prediction.
The full script sweeps four backbones. This walkthrough runs a single backbone at full sampling: 200 ProteinMPNN
sequences and a 100-conformation BioEmu ensemble. It requires a GPU.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.
The backbone
The design target is a single chain of a cached PDB structure. The bundled pdb_cache supplies
the backbone file; PDB points at 6au6.pdb and CHAIN selects chain A. ProteinMPNN is an
inverse-folding model: it reads this fixed three-dimensional backbone and proposes amino acid
sequences predicted to fold into it, redesigning the sequence of chain A while the backbone geometry stays fixed.
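The following stdlib-only sketch makes "reads this fixed backbone" concrete. It pulls one chain's sequence and its backbone atoms (N, CA, C, O, the atoms an inverse-folding model such as ProteinMPNN conditions on) out of PDB-format text. It is not how the bundled pdb_cache or the toolkit loads 6au6.pdb. The names read_chain, atom_line and DEMO_PDB are hypothetical. The parser assumes standard fixed-column ATOM records, keeps only the first model and the first alternate location, and ignores HETATM records.

```python
from typing import Dict, List, Tuple

# Standard three-letter to one-letter amino acid codes; anything else maps to "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
BACKBONE_ATOMS = ("N", "CA", "C", "O")
Coord = Tuple[float, float, float]


def read_chain(pdb_text: str, chain_id: str) -> Tuple[str, List[Dict[str, Coord]]]:
    """Hypothetical helper: return (one-letter sequence, per-residue backbone atoms) for one chain."""
    residues: Dict[Tuple[int, str], dict] = {}  # dicts keep insertion (file) order
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):  # stop after the first model
            break
        if not line.startswith("ATOM  ") or line[21] != chain_id:
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        key = (int(line[22:26]), line[26])  # residue number + insertion code
        res = residues.setdefault(key, {"name": line[17:20].strip(), "atoms": {}})
        atom_name = line[12:16].strip()
        if atom_name in BACKBONE_ATOMS:  # side-chain atoms are skipped
            res["atoms"][atom_name] = (
                float(line[30:38]), float(line[38:46]), float(line[46:54])
            )
    sequence = "".join(THREE_TO_ONE.get(r["name"], "X") for r in residues.values())
    backbone = [r["atoms"] for r in residues.values()]
    return sequence, backbone


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical demo helper: format columns 1-54 of a fixed-column ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")


# Made-up coordinates for a two-residue chain A plus one atom on chain B.
DEMO_PDB = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "MET", "A", 1, 1.458, 0.0, 0.0),
    atom_line(3, "C", "MET", "A", 1, 2.009, 1.420, 0.0),
    atom_line(4, "O", "MET", "A", 1, 1.251, 2.390, 0.0),
    atom_line(5, "CB", "MET", "A", 1, 1.988, -0.773, -1.199),
    atom_line(6, "N", "GLY", "A", 2, 3.332, 1.536, 0.0),
    atom_line(7, "CA", "GLY", "A", 2, 3.970, 2.845, 0.0),
    atom_line(8, "N", "LYS", "B", 1, 9.0, 9.0, 9.0),
])

sequence, backbone = read_chain(DEMO_PDB, "A")
print(sequence, len(backbone))  # prints: MG 2 (chain B is ignored)
```

The returned sequence length is the chain length. This is analogous to the walkthrough reading the Segment length from chain A's own sequence. On a real file you would pass the file's text instead of DEMO_PDB.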
Sample sequences with ProteinMPNN
ProteinMPNNGenerator proposes sequences conditioned on the backbone. The config passes the
structure through InverseFoldingStructureInput with chains_to_redesign=[CHAIN], so only chain
A is redesigned, and sets temperature=0.1. For this generator, temperature ranges from 0 to 1 and
controls sampling randomness: values near 0 make sampling nearly deterministic (each position close
to its most probable amino acid), while 1 samples in proportion to the model's predicted
probabilities. The Segment length is read from the chain's own sequence,
and seeding 200 proposal slots makes the generator emit 200 sequences from the single structure.
ToolInstance.persist() keeps one warm worker cached and reused across all 200 calls instead of
starting one per sample. Each proposal carries a perplexity in its generator metadata. Lower
perplexity means the model assigns the sequence a higher likelihood given the backbone, so the
lowest-perplexity design is selected.
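The toy, stdlib-only sketch below illustrates the two ideas in this step: temperature-scaled sampling and picking the lowest-perplexity design. It is not ProteinMPNNGenerator's implementation, and it makes three simplifications. The per-position logits are made up. Positions are sampled independently, whereas ProteinMPNN decodes autoregressively, conditioning each residue on those already chosen. Scoring each design under the untempered (T=1) distribution is an assumption about how the perplexity in the generator metadata is computed.

```python
import math
import random
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one logit each


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by T and normalize; T=1 reproduces the model's own distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_sequence(position_logits: Sequence[Sequence[float]], temperature: float,
                    rng: random.Random) -> Tuple[str, List[float]]:
    """Draw one residue per position; also return each choice's log-probability at T=1."""
    residues, log_probs = [], []
    for logits in position_logits:
        probs = softmax_with_temperature(logits, temperature)
        idx = rng.choices(range(len(AMINO_ACIDS)), weights=probs, k=1)[0]
        residues.append(AMINO_ACIDS[idx])
        log_probs.append(math.log(softmax_with_temperature(logits, 1.0)[idx]))
    return "".join(residues), log_probs


def perplexity(log_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood): lower means the model finds the sequence more likely."""
    return math.exp(-sum(log_probs) / len(log_probs))


# Hypothetical toy model: position k prefers the k-th amino acid (logit 3 vs 0).
toy_logits = [[3.0 if j == k else 0.0 for j in range(len(AMINO_ACIDS))] for k in range(8)]

rng = random.Random(0)
designs = [sample_sequence(toy_logits, temperature=0.1, rng=rng) for _ in range(200)]
best_seq, best_log_probs = min(designs, key=lambda d: perplexity(d[1]))
print(best_seq, round(perplexity(best_log_probs), 3))
```

At T=0.1 the scaled gap between the preferred residue and the rest is 30 instead of 3, so nearly every draw is the per-position favorite. This is the "nearly deterministic" regime described above. Rerunning with temperature=1.0 produces visibly more varied sequences.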
Predict the structural ensemble
run_bioemu samples conformations of the chosen sequence, approximating its structural ensemble
rather than a single static fold. BioEmuInput takes the design as a single-chain protein
(complexes), and BioEmuConfig sets num_samples=100 (the number of conformations to sample
per sequence), batch_size=100, and an output_dir for the raw BioEmu output files. The run
writes the sampled ensemble to that output_dir, a temporary directory in this walkthrough, and prints a completion message.
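How BioEmu batches its sampling is internal to the tool. As a hedged illustration of how num_samples and batch_size can relate, the sketch below splits a sample count into batches. It assumes that batch_size caps how many conformations are drawn per batch, which may not match BioEmuConfig's actual semantics. EnsemblePlan is a hypothetical stand-in, not BioEmuConfig, and nothing here calls BioEmu.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnsemblePlan:
    """Hypothetical stand-in mirroring two documented BioEmuConfig fields; not the real class."""
    num_samples: int = 100  # conformations to draw for one sequence
    batch_size: int = 100   # ASSUMED cap on conformations drawn per batch

    def batches(self) -> List[int]:
        """Split num_samples into consecutive batches of at most batch_size."""
        if self.num_samples < 1 or self.batch_size < 1:
            raise ValueError("num_samples and batch_size must be positive")
        full, rest = divmod(self.num_samples, self.batch_size)
        return [self.batch_size] * full + ([rest] if rest else [])


print(EnsemblePlan().batches())         # [100]
print(EnsemblePlan(100, 32).batches())  # [32, 32, 32, 4]
```

Under this assumption, the walkthrough's settings (100 and 100) fit in a single batch. A smaller batch_size typically trades more batches for less GPU memory per batch.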
Next Steps
Using Generators
The inverse-folding generator family.
Protein Hunter
Inverse folding inside a design cycle.