Optimizers
Optimizers are the search algorithms of Proto. They coordinate generators and constraints in an iterative loop: generate proposal sequences, score them, select the best, and repeat. Different optimizers implement different search strategies, from rejection sampling to MCMC with simulated annealing.The Optimization Loop
Every optimizer follows the same fundamental loop: Energy scoring is where constraints come together. The optimizer callsscore_energy() which:
- Evaluates all filter constraints first (hard pass/fail)
- Rejects proposals that fail any filter (
energy = inf) - Evaluates scoring constraints only on surviving proposals
- Combines scores:
energy = Sigma(weight_i x score_i)
Optimizer Comparison
| MCMC | Rejection Sampling | BeamSearch | Cycling | Gradient | |
|---|---|---|---|---|---|
| Strategy | Random walk with acceptance criterion | Generate many, keep best | Beam search over token generation | Iterative conditioning cycles | Continuous relaxation with gradient descent |
| Exploration | High (temperature-controlled) | Broad (many samples) | Moderate (beam width) | Custom (user-defined) | Local (gradient-driven) |
| Exploitation | Good (annealing) | Low (no refinement) | Good (beam pruning) | Depends on pipeline | Strong (descent) |
| Best for | General optimization, refinement | Initial screening, exploration | Long DNA generation | Structure-conditioned design | Differentiable constraints |
| Inherits state? | Yes | Yes | No (starts from prompt) | Yes | Yes |
| Multi-segment? | Yes | Yes | No (single segment) | No (single segment) | No (single segment) |
Available Optimizers
MCMCOptimizer: Metropolis-Hastings Search
MCMCOptimizer: Metropolis-Hastings Search
The general-purpose optimizer. Uses Markov Chain Monte Carlo with Metropolis-Hastings acceptance to explore sequence space. Maintains one or more parallel trajectories, proposing mutations and accepting or rejecting them based on energy improvement and temperature.How it works:
- For each result sequence, generate
proposals_per_resultproposals - Score all proposals, pick the best one per result
- Accept or reject based on Metropolis-Hastings criterion:
- Always accept if energy improves
- Sometimes accept worse moves (controlled by temperature)
- Temperature anneals from
max_temperaturetomin_temperatureover the run
python
Number of parallel trajectories to maintain.
num_results=1 is standard single-chain MCMC. Higher values explore more of sequence space in parallel.Total number of MCMC steps. More steps allow better convergence but increase runtime.
Number of proposals to generate per result per step. Total proposals per step =
num_results x proposals_per_result.Starting temperature. Higher = more exploration (accepts worse moves more often).
Final temperature. Lower = more greedy (only accepts improvements).
RejectionSamplingOptimizer: Generate Many, Keep Best
RejectionSamplingOptimizer: Generate Many, Keep Best
The simplest optimizer. Generates a large number of proposals, scores them all, and keeps the top
num_results by lowest energy. No iterative refinement; just sampling with selection.How it works:- Generate proposals in batches of
proposal_batch_size - Score them all
- Track the top
num_resultsproposals seen so far (maintained in sorted order) - Repeat until
num_samplesreached (orenergy_thresholdmet)
python
Maximum number of proposals to generate.
Number of top sequences to keep (by lowest energy). Overrides the program-level
num_results when set.Number of proposal sequences to generate and evaluate per batch, capped at
num_samples.If set, enables early stopping: halts when all top-k proposals have energy below this threshold.
BeamSearchOptimizer: Autoregressive Beam Search
BeamSearchOptimizer: Autoregressive Beam Search
Optimizer for generating long sequences with autoregressive generators such as Evo2. Splits a long segment into chunks of
beam_length tokens and performs beam search at each boundary, pruning low-quality beams as the sequence grows.How it works:- Start from
promptsequence - Generate
beam_lengthtokens withproposals_per_resultvariations per beam - Score all beams with constraints
- Keep top
num_resultsbeams - Repeat until segment length is reached
python
Initial sequence to begin generation from. All beams extend from this prompt.
Number of tokens to generate per beam step.
Number of top beams to maintain at each step (K in beam search). Optional; defaults to the number of results requested by the program.
Number of proposal extensions per beam. Total proposals per step =
num_results x proposals_per_result.Aggregation method:
"mean" (average across beams, rewards consistency) or "last" (most recent beam only).Enable KV cache reuse for faster autoregressive generation across beam steps.
CyclingOptimizer: Iterative Conditioning Cycles
CyclingOptimizer: Iterative Conditioning Cycles
A generalized optimizer that alternates between a user-defined conditioning function and a generator. The conditioning function can modify generator config between iterations; for example, predicting a 3D structure from the current sequence, then using that structure to condition inverse folding for the next iteration.How it works:
- Run the conditioning function on current sequences
- The conditioning function updates generator config (e.g., sets new PDB structures)
- Generator produces new proposals conditioned on updated config
- Optionally evaluate constraints and roll back rejected proposals
- Repeat for
num_steps
protein-hunter pipeline automates the common pattern of structure prediction followed by inverse folding:python
Number of proposal trajectories to maintain across cycles.
Number of conditioning-generation cycles to run.
Named pipeline (e.g.,
"protein-hunter") for common conditioning patterns.Custom conditioning function for advanced use cases. Passed as a constructor argument to
CyclingOptimizer (not a CyclingOptimizerConfig field), and mutually exclusive with pipeline.The Gradient optimizer (continuous relaxation with differentiable constraints, paired with
PositionWeightGenerator) is summarized in the comparison table above; see its full reference at Gradient optimizer.Optimizer Decision Tree
Pool Architecture
Understanding the dual-pool system is key to understanding how optimizers work:proposal_sequences
The working pool. Generators write proposals here. Constraints evaluate sequences from here. Size = num_proposals.It acts as an inbox: new proposals arrive, are evaluated, and the best ones graduate to the result pool.result_sequences
The results pool. Contains the best sequences found so far. Size = num_results.It acts as a hall of fame: only the best-scoring sequences are retained here.Pool Initialization (Cycling)
When an optimizer starts, either fresh or after receiving results from a previous optimizer in a Program, both pools are initialized by cycling through the source sequences:Constraint Evaluation & Performance
Thescore_energy() method implements a two-pass evaluation strategy that skips expensive GPU computations on already-rejected proposals:
Pass 1, Filters: All filter constraints (those with a threshold) are evaluated first. Proposals that fail any filter are immediately rejected with energy = inf and marked with the rejecting constraint’s label. This is an AND gate; a proposal must pass every filter to survive.
Pass 2, Scoring: Scoring constraints (those with a weight) are only evaluated on proposals that passed all filters. This means expensive GPU evaluations (structure prediction, binding strength) are never run on proposals that already failed a cheap filter (homopolymer check, GC content range).
batch_size), constraints receive all passing proposals in a single call. Each tool handles its own memory internally: ESMFold splits by residue count, Boltz2 processes complexes sequentially, and so on. Users control this through tool-specific config fields (e.g., max_batch_residues for ESMFold) rather than a constraint-level parameter.
Temperature and Acceptance
Temperature controls the exploration-exploitation trade-off in MCMC. It determines how willing the optimizer is to accept a proposal that is worse than the current best:T(step) = T_max x (T_min / T_max)^((step - 1) / (num_steps - 1)), so step 1 is exactly T_max and the final step is exactly T_min.
Tool Cache Management
Constraints that call expensive bioinformatics tools (structure prediction, sequence alignment) benefit from caching. The optimizer manages a shared tool cache:python
History Tracking
Optimizers record snapshots of their state at configurable intervals for post-hoc analysis:python
Next Steps
Programs
Chain multiple optimizers into multi-stage pipelines
Generators
The models that propose candidate sequences
Constraints
The quality checklist optimizers minimize
Optimizer Reference
Full API reference for each optimizer




