Protein Hunter designs a protein de novo by cycling between two models: a structure predictor proposes a fold for the current sequence, and an inverse-folding model redesigns the sequence for that fold. Repeating the cycle drives an initially unknown sequence toward one that folds well. The CyclingOptimizer coordinates the two models, using Boltz2 for structure prediction and ProteinMPNN for inverse folding. This walkthrough designs a 100-residue protein over five cycles. It requires a GPU and downloads the Boltz2 and ProteinMPNN weights on first use.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.

The design target

A Segment is the stretch of sequence being designed; a Construct groups the segments that make up one molecule. Here a single designed_protein segment is seeded with "X" * DESIGN_LENGTH, an all-X (unknown) sequence of 100 residues that the cycle fills in. The ProteinMPNNGenerator performs the inverse-folding step, designing sequences predicted to fold into a given backbone structure. temperature=0.1 sets near-deterministic sampling that favors the most likely residues, excluded_amino_acids=["C"] forbids cysteine, and generator.assign(protein) binds the generator to the segment it writes into.
python
from proto_language.core import Construct, Segment
from proto_language.generator import ProteinMPNNGenerator, ProteinMPNNGeneratorConfig

DESIGN_LENGTH = 100

protein = Segment(sequence="X" * DESIGN_LENGTH, sequence_type="protein", label="designed_protein")
construct = Construct([protein])

generator = ProteinMPNNGenerator(
    ProteinMPNNGeneratorConfig(temperature=0.1, excluded_amino_acids=["C"])
)
generator.assign(protein)
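
To make the effect of a low temperature concrete, here is a minimal sketch of temperature-scaled sampling at a single position. It is not ProteinMPNN's code: the function name `sample_distribution`, the example scores, and the choice to exclude a residue by masking its score are all assumptions made for illustration.

```python
import math


def sample_distribution(logits, temperature, excluded=()):
    """Turn per-residue scores into sampling probabilities (illustrative only)."""
    # Divide each score by the temperature; mask excluded residues with -inf.
    scaled = {
        aa: (-math.inf if aa in excluded else score / temperature)
        for aa, score in logits.items()
    }
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(s - peak) for aa, s in scaled.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Hypothetical scores for four residues at one position.
logits = {"C": 2.5, "A": 2.0, "L": 1.5, "K": 1.0}

for t in (1.0, 0.1):
    probs = sample_distribution(logits, t, excluded=("C",))
    summary = ", ".join(f"{aa}={p:.3f}" for aa, p in probs.items())
    print(f"temperature={t}: {summary}")
```

At temperature 1.0 the three allowed residues share the probability mass fairly evenly, while at 0.1 almost all of it moves to the highest-scoring allowed residue, which is why low-temperature sampling is close to deterministic.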

The conditioning function

CyclingOptimizer calls a conditioning function once per cycle with the current sequences and feeds its output into the generator’s sample(). This one wraps each sequence in a Complex, predicts a fold for it with predict_structures(complexes, "boltz2", {}), and stashes the predicted PDB on each sequence under _metadata["designed_structure_pdb"] so it can be retrieved later. It returns the list of predicted structures, which the optimizer then passes to ProteinMPNN for the inverse-folding step.
python
from proto_tools import Complex, predict_structures
from proto_language.core import Sequence


def predict_structure(sequences: list[Sequence]) -> list:
    complexes = [Complex(chains=[seq.sequence]) for seq in sequences]
    structures = predict_structures(complexes, "boltz2", {}).structures
    for seq, structure in zip(sequences, structures):
        seq._metadata["designed_structure_pdb"] = structure.structure_pdb
    return structures

Run the cycle

CyclingOptimizer alternates the conditioning function and the generator for num_steps cycles: each cycle conditions on the current result_sequences, generates proposals, and (with no constraints here) accepts every proposal as the next cycle’s input. The config sets num_steps=5 cycles and num_results=2 independent proposal trajectories, with verbose=True to print per-cycle progress. target_segment names the segment being optimized, conditioning_fn supplies the structure-prediction step defined above, and the custom_logging callback (track) appends the step number and the first proposal’s sequence to trajectory after each cycle. The Program runs the optimizer and collects the results.
python
from proto_language.core import Program
from proto_language.optimizer import CyclingOptimizer, CyclingOptimizerConfig

# Record the sequence after each cycle.
trajectory = []


def track(step, segments):
    trajectory.append((step, str(segments[0].proposal_sequences[0].sequence)))


optimizer = CyclingOptimizer(
    target_segment=protein,
    constructs=[construct],
    generators=[generator],
    constraints=[],
    config=CyclingOptimizerConfig(num_steps=5, num_results=2, verbose=True),
    conditioning_fn=predict_structure,
    custom_logging=track,
)

program = Program(optimizers=[optimizer], num_results=2)
program.run()
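
The control flow described above can be sketched without a GPU using toy stand-ins. This is not CyclingOptimizer's implementation: `toy_cycle`, `toy_condition`, and `make_toy_generator` are hypothetical names, the "structure" is reduced to a sequence length, the "inverse folding" step draws random residues, and the 1-based step counter is an assumption chosen to match the cycle numbering in the printed trajectory.

```python
import random

AMINO_ACIDS = "ADEFGHIKLMNPQRSTVWY"  # 19 residues: the standard 20 minus cysteine


def toy_condition(sequences):
    # Stand-in for structure prediction: the "structure" is just the length.
    return [len(seq) for seq in sequences]


def make_toy_generator(seed):
    rng = random.Random(seed)

    def generate(length):
        # Stand-in for inverse folding: draw a random sequence of that length.
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

    return generate


def toy_cycle(sequences, conditioning_fn, generate_fn, num_steps, on_step=None):
    """Alternate conditioning and generation, accepting every proposal."""
    current = list(sequences)
    for step in range(1, num_steps + 1):
        conditions = conditioning_fn(current)            # one condition per sequence
        current = [generate_fn(c) for c in conditions]   # no constraints: accept all
        if on_step is not None:
            on_step(step, current)
    return current


history = []
final = toy_cycle(
    ["X" * 10, "X" * 10],          # two trajectories, each starting all-X
    toy_condition,
    make_toy_generator(seed=0),
    num_steps=3,
    on_step=lambda step, seqs: history.append((step, seqs[0])),
)
print(final)
print(history)
```

The logging callback here plays the same role as track: it sees the sequences produced in each cycle and keeps only the first one.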

Inspect the result

protein.result_sequences[0] is the first trajectory’s final design. The recorded trajectory shows the sequence after each cycle, redesigned for its predicted fold each time, so the all-X start has been replaced by a concrete amino acid sequence whose length matches DESIGN_LENGTH.
python
designed = protein.result_sequences[0]

print("trajectory (the sequence is redesigned for its predicted fold each cycle):")
for step, seq in trajectory:
    print(f"  cycle {step}: {seq}")
print(f"\ndesigned sequence: {designed.sequence}")
print(f"length:            {len(designed)}")

Output:
trajectory (the sequence is redesigned for its predicted fold each cycle):
  cycle 1: MEEKEKLVKEKEEEAKKALKEYAEKAKKKLLEEAPEEKEEAEKLAEFAEKEALKGIKEGKFEEAKKKVEEFAKKIGGELAKVAEKLFKELIEAVLEAAEK
  cycle 2: AAAEAAARAARAAAARAKLDEEVDKAEKELIKANPDKKEEAKALAEFARATLERGIAEGKLEEAKEAILAKAKEVGGELGKVAEELFAKTAEAVRKAYEA
  cycle 3: AAAAAAAEAAHKAAAKKALDKEVAKAEKELIKANPKKKEEAKALAKYARDTLTEGIETGKLEEAKKKILAKAEEVGGELGKEAKKLFTKTADAVKAAYEA
  cycle 4: SMAEAAEEEKRKEAALKKLEEEVEKALKALKEANPEEKEKAEELAEFARETLTKGIETGKLDEAKKKVLAEAKKVGGELGKKAEEEFTKVAEAVKKAYEA
  cycle 5: SAAAAAAEAARKAAAKATLDTEVAKALAALKAANPDQAAQADALADFARATLTKGIETGKLDEAAAEVLARAKAVGGGLGAQAVKEFTKVAAAVKAAYEA

designed sequence: SAAAAAAEAARKAAAKATLDTEVAKALAALKAANPDQAAQADALADFARATLTKGIETGKLDEAAAEVLARAKAVGGGLGAQAVKEFTKVAAAVKAAYEA
length:            100
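
One way to read a trajectory like this is to measure how much of the sequence survives from one cycle to the next. The sketch below assumes the trajectory is a list of (step, sequence) tuples, as track records it; the helper names `identity` and `consecutive_identities` and the short example sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def consecutive_identities(trajectory):
    """Pair each cycle with its identity to the previous cycle's sequence."""
    return [
        (step, identity(prev_seq, seq))
        for (_, prev_seq), (step, seq) in zip(trajectory, trajectory[1:])
    ]


# Short made-up sequences standing in for a recorded trajectory.
toy_trajectory = [(1, "MEEK"), (2, "AEEK"), (3, "AEAK")]

for step, ident in consecutive_identities(toy_trajectory):
    print(f"cycle {step}: {ident:.0%} identical to the previous cycle")
```

Applied to the recorded trajectory, rising identity between consecutive cycles would suggest the design is settling, though identity alone says nothing about how well the sequence folds.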

Next Steps

Symmetric Protein Design

Structure-constrained protein design with MCMC.

Using Optimizers

The cycling optimizer and its siblings.