Binder Design

This program designs a protein binder against a fixed target. It uses the idiomatic two-segment pattern: a length-only binder segment that is designed, and a fixed target segment whose sequence is taken from the target structure. The rfdiffusion-proteinmpnn-binder generator docks and designs a binder against the target coordinates, and a structure-confidence constraint folds the full target+binder complex to score the interface. A RejectionSamplingOptimizer keeps the best binders.

The full script designs an 80-residue binder against PD-L1 and generates many candidates. This walkthrough instead uses a 40-residue binder, a small target (chain A of insulin), and two candidates. It requires a GPU and the RFdiffusion3, ProteinMPNN, and Boltz2 weights.

Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.

The target

One source, the target Structure, yields two artifacts: coordinates for the generator to dock against, and a chain sequence for the fixed target segment. get_chain_sequence(TARGET_CHAIN, remove_non_standard=True) reads chain A’s amino-acid sequence out of the PDB, dropping non-standard residues. The construct then holds two segments: binder is length-only (length=BINDER_LENGTH, sequence_type="protein"), so its positions are open for design, while target is fixed by passing its sequence directly. Giving target no generator is what keeps it constant while the binder is optimized.

python

from pathlib import Path
from proto_tools.entities.structures import Structure
from proto_language.core import Segment, Construct

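# Small target for this walkthrough, resolved relative to the parent of the current working directory.
TARGET_PDB = Path.cwd().parent / "germinal" / "pdbs" / "insulin.pdb"
TARGET_CHAIN = "A"
BINDER_LENGTH = 40

# One Structure supplies both the docking coordinates and the target sequence.
target_structure = Structure(structure=TARGET_PDB.read_text())
target_sequence = target_structure.get_chain_sequence(TARGET_CHAIN, remove_non_standard=True)

# Length-only segment: open for design. Segment with a sequence: held fixed.
binder = Segment(length=BINDER_LENGTH, sequence_type="protein", label="binder")
target = Segment(sequence=target_sequence, sequence_type="protein", label="target")
construct = Construct([binder, target])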
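
As a rough illustration of what reading one chain's sequence while dropping non-standard residues involves, the sketch below parses ATOM records from PDB text using only the standard library. It is not the proto_tools implementation: chain_sequence_from_pdb and atom_line are hypothetical names introduced here, and the small two-chain input is made up for the example.

```python
# Illustrative sketch only; NOT how proto_tools implements get_chain_sequence.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence_from_pdb(pdb_text: str, chain_id: str) -> str:
    """Return the one-letter sequence of one chain, skipping non-standard residues."""
    letters = []
    seen_residues = set()
    for line in pdb_text.splitlines():
        # Only ATOM records long enough to hold the residue fields are considered.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        if residue_key in seen_residues:
            continue  # every atom of a residue repeats the same key
        seen_residues.add(residue_key)
        letter = THREE_TO_ONE.get(line[17:20])  # columns 18-20: residue name
        if letter is not None:  # names outside the 20 standard ones are dropped
            letters.append(letter)
    return "".join(letters)


def atom_line(serial: int, atom: str, residue: str, chain: str, number: int) -> str:
    """Hypothetical helper: build a column-aligned ATOM record prefix for the demo."""
    return f"ATOM  {serial:5d} {atom:<4s} {residue:>3s} {chain}{number:4d}    "


demo_pdb = "\n".join([
    atom_line(1, "N", "GLY", "A", 1),
    atom_line(2, "CA", "GLY", "A", 1),
    atom_line(3, "CA", "ILE", "A", 2),
    atom_line(4, "CA", "MSE", "A", 3),  # selenomethionine: non-standard, dropped
    atom_line(5, "CA", "VAL", "A", 4),
    atom_line(6, "CA", "PHE", "B", 1),
])

print(chain_sequence_from_pdb(demo_pdb, "A"))  # GIV
```

The resulting string is what a fixed segment would receive as its sequence; the same PDB text remains the source of the coordinates.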

The binder generator

The rfdiffusion-proteinmpnn-binder generator is assigned only to the binder segment; the target reaches it as coordinates through the config. On each sample() call it diffuses binder backbones docked to the target with RFdiffusion3, then designs each backbone’s binder-chain sequence with ProteinMPNN while holding the target chains fixed as context; the binder length is taken from the assigned segment. target_structure and target_chains=[TARGET_CHAIN] tell RFdiffusion3 which coordinates to keep fixed and dock against. The per-tool settings live in nested configs: RFdiffusion3Config(device="cuda") and ProteinMPNNSampleConfig(num_sequences_per_structure=1, device="cuda"), where num_sequences_per_structure is the number of sequences designed per backbone.

python

from proto_tools import ProteinMPNNSampleConfig, RFdiffusion3Config
from proto_language.generator import (
    RFdiffusionProteinMPNNBinderGenerator,
    RFdiffusionProteinMPNNBinderGeneratorConfig,
)

generator = RFdiffusionProteinMPNNBinderGenerator(
    RFdiffusionProteinMPNNBinderGeneratorConfig(
        target_structure=target_structure,
        target_chains=[TARGET_CHAIN],
        rfdiffusion3_config=RFdiffusion3Config(device="cuda"),
        proteinmpnn_config=ProteinMPNNSampleConfig(num_sequences_per_structure=1, device="cuda"),
    )
)
generator.assign(binder)
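
The two-stage flow described above (a backbone first, then one or more sequences per backbone) can be sketched with toy stand-ins. Nothing below calls RFdiffusion3 or ProteinMPNN: diffuse_backbone, design_sequence, and sample_binders are hypothetical placeholders that only show how the number of designs follows from the number of backbones and num_sequences_per_structure, and that every design inherits the binder length.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def diffuse_backbone(binder_length: int, rng: random.Random) -> list:
    """Toy stand-in for backbone generation: one random (x, y, z) point per residue."""
    return [(rng.random(), rng.random(), rng.random()) for _ in range(binder_length)]


def design_sequence(backbone: list, rng: random.Random) -> str:
    """Toy stand-in for sequence design: one random residue per backbone position."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in backbone)


def sample_binders(num_backbones: int, num_sequences_per_structure: int,
                   binder_length: int, seed: int = 0) -> list:
    """Generate backbones, then design the requested number of sequences for each."""
    rng = random.Random(seed)
    designs = []
    for _ in range(num_backbones):
        backbone = diffuse_backbone(binder_length, rng)
        for _ in range(num_sequences_per_structure):
            designs.append(design_sequence(backbone, rng))
    return designs


designs = sample_binders(num_backbones=2, num_sequences_per_structure=1, binder_length=40)
print(len(designs), {len(d) for d in designs})
```

With num_sequences_per_structure=1, as in the walkthrough, each backbone contributes exactly one candidate sequence.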

The interface constraint and search

A structure-confidence constraint folds the full target+binder complex with Boltz2 and scores the interface. structure_iptm_constraint reads the predicted interface TM-score (ipTM), which measures the quality of the inter-chain interface in a multimeric complex, and returns 1.0 - iptm so lower scores indicate a better predicted interface. It lists both segments in inputs=[binder, target], which is why the target is a sibling segment, and weight=1.0 multiplies its raw score. The RejectionSamplingOptimizer draws independent batches of binders and keeps the best: num_samples=2 candidates are generated and scored, and num_results=1 retains the single lowest-energy result. Program(..., seed=0) makes the run reproducible: the same seed and inputs produce the same output.

python

from proto_language import StructureBasedConstraintConfig, structure_iptm_constraint
from proto_language.core import Constraint, Program
from proto_language.optimizer import RejectionSamplingOptimizer, RejectionSamplingOptimizerConfig

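# Score = weight * (1.0 - ipTM) for the folded binder+target complex; lower is better.
iptm = Constraint(
    inputs=[binder, target],
    function=structure_iptm_constraint,
    function_config=StructureBasedConstraintConfig(structure_tool="boltz2"),
    label="iptm",
    weight=1.0,
)

# Generate and score num_samples candidates, then keep the num_results lowest-energy ones.
optimizer = RejectionSamplingOptimizer(
    constructs=[construct],
    generators=[generator],
    constraints=[iptm],
    config=RejectionSamplingOptimizerConfig(num_samples=2, num_results=1),
)

# A fixed seed makes the run reproducible.
program = Program(optimizers=[optimizer], num_results=1, seed=0)
program.run()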
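
The scoring and selection logic described above can be written out as a minimal sketch: each candidate's energy is weight * (1.0 - ipTM), and the optimizer keeps the num_results candidates with the lowest energy. This is an assumption-laden toy, not the proto_language implementation. Candidate, rejection_sample, toy_iptm, and run_demo are hypothetical names, and toy_iptm is a meaningless stand-in for a real structure prediction.

```python
import random
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Candidate:
    sequence: str
    iptm: float
    energy: float


def iptm_energy(iptm: float, weight: float = 1.0) -> float:
    """Turn an ipTM in [0, 1] into an energy where lower is better."""
    return weight * (1.0 - iptm)


def rejection_sample(propose, score_iptm, num_samples: int, num_results: int,
                     weight: float = 1.0) -> list:
    """Draw num_samples independent candidates and keep the num_results best."""
    candidates = []
    for _ in range(num_samples):
        sequence = propose()
        iptm = score_iptm(sequence)
        candidates.append(Candidate(sequence, iptm, iptm_energy(iptm, weight)))
    candidates.sort(key=lambda c: c.energy)  # best (lowest energy) first
    return candidates[:num_results]


def toy_iptm(sequence: str) -> float:
    """Placeholder score in [0, 1]; it has no physical meaning."""
    return sum(residue in "AILMFVW" for residue in sequence) / len(sequence)


def run_demo(seed: int = 0, num_samples: int = 2, num_results: int = 1) -> list:
    """Seeded end-to-end run: the same seed yields the same kept candidates."""
    rng = random.Random(seed)

    def propose() -> str:
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(40))

    return rejection_sample(propose, toy_iptm, num_samples, num_results)


best = run_demo(seed=0)[0]
print(f"{best.sequence} ipTM={best.iptm:.3f} energy={best.energy:.3f}")
```

Because candidates are drawn independently rather than refined from one another, raising num_samples widens the search without changing how any single candidate is scored.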

Inspect the result

The kept binders are read back from binder.result_sequences, ordered best first by lowest energy. The generator writes each designed binder sequence onto its proposal, so best.sequence is the amino-acid sequence of the top-ranked binder; the folded target+binder complex is also stored on the result for downstream use.

python

best = binder.result_sequences[0]
print(f"designed binder: {best.sequence}")

designed binder: AIDPAQAAAAAAEAEATRAALPTAADPAAAQAHIAYVEAN
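
Before passing a design downstream, a quick sanity check on the returned string can catch obvious problems. The sketch below is one way to do that with the standard library; check_binder is a hypothetical helper, not part of proto_language, and it checks only the length, the alphabet, and the most common residues.

```python
from collections import Counter

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_binder(sequence: str, expected_length: int) -> dict:
    """Summarize basic properties of a designed binder sequence."""
    return {
        "length_ok": len(sequence) == expected_length,
        "standard_only": set(sequence) <= STANDARD_AMINO_ACIDS,
        "top_residues": Counter(sequence).most_common(3),
    }


# Sequence copied from the example output above; 40 is BINDER_LENGTH in this walkthrough.
report = check_binder("AIDPAQAAAAAAEAEATRAALPTAADPAAAQAHIAYVEAN", 40)
print(report)
```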

Next Steps

Protein Hunter

Using Generators
