Programs

While individual optimizers run a single search strategy, a Program chains multiple optimizers into a multi-stage pipeline: broad exploration followed by targeted refinement, cheap filters before expensive scoring, temperature annealing across stages. A Program runs its optimizers sequentially, automatically handling the handoff of results between stages.

Single vs Multi-Stage

For simple designs, wrap one optimizer in a Program:
python
from proto_language.core import Segment, Construct, Constraint, Program
from proto_language.optimizer import MCMCOptimizer, MCMCOptimizerConfig
from proto_language.generator import (
    RandomNucleotideGenerator, RandomNucleotideGeneratorConfig
)
from proto_language.constraint import gc_content_constraint

# Setup
segment = Segment(length=100, sequence_type="dna")
construct = Construct([segment])

generator = RandomNucleotideGenerator(
    RandomNucleotideGeneratorConfig()
)
generator.assign(segment)

constraint = Constraint(
    inputs=[segment],
    function=gc_content_constraint,
    function_config={"min_gc": 45, "max_gc": 55},
)

# Single optimizer
optimizer = MCMCOptimizer(
    constructs=[construct],
    generators=[generator],
    constraints=[constraint],
    config=MCMCOptimizerConfig(num_steps=500, num_results=5, proposals_per_result=10),
)

program = Program(optimizers=[optimizer], num_results=5)
program.run()
Diagram: Input Sequence → MCMC (500 steps) → Results (5 best)

The Handoff

When one optimizer finishes and the next begins, the Program hands the previous stage's results to the next stage:
Diagram: Stage 1 (Rejection Sampling): run optimizer → Handoff: sort result_sequences by energy (best first), initialize the next optimizer's pools by cycling through the sorted results, clear stale constraint metadata → Stage 2 (MCMC): run optimizer
After each optimizer completes: Optimizers are responsible for their own ordering. Rejection Sampling keeps result_sequences sorted by energy (best first) throughout its run. Other optimizers preserve their natural ordering.

Before the next optimizer runs:
  1. _initialize_sequence_pools() reads from the previous optimizer’s result_sequences
  2. Both pools are filled by cycling through the source results (preserving diversity when sizes differ)
  3. Stale constraint metadata is cleared so the new stage starts with a clean slate
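
The cycling in step 2 can be pictured with a short, library-independent sketch. The helper name fill_pool and the plain list of strings are assumptions made for illustration; the actual _initialize_sequence_pools() implementation may differ.

```python
from itertools import cycle, islice


def fill_pool(source, pool_size):
    """Fill a pool of pool_size slots by cycling through source in order.

    Hypothetical helper: illustrates the cycling idea only, not the
    library's _initialize_sequence_pools() implementation.
    """
    if not source:
        raise ValueError("source must contain at least one result")
    return list(islice(cycle(source), pool_size))


previous_results = ["ACGT", "GGCC", "TTAA"]

# Pool larger than the source: results repeat in order, so every result
# is used once before any result is used twice.
larger = fill_pool(previous_results, 5)

# Pool smaller than the source: only the leading results are carried over.
smaller = fill_pool(previous_results, 2)
```

Cycling, rather than copying the first result into every slot, is what keeps the distinct results represented when the next stage asks for more starting points than the previous stage produced.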

Optimizer-Specific Behavior

Not all optimizers use inherited state the same way:
| Optimizer | How It Uses Previous Results |
| --- | --- |
| Rejection Sampling | Uses as starting proposals, then generates more and keeps overall best |
| MCMC | Uses as parallel trajectories, generates proposals from each |
| Cycling | Uses as working proposals for conditioning cycles |
| BeamSearch | Ignores previous results. Always starts fresh from its prompt parameter |
BeamSearch ignores previous optimizer results by design. It always starts fresh from its configured prompt since it is built for autoregressive generation. Place it as the first stage in a pipeline, or use it standalone.

Pipeline Design Recipes

The snippets below are illustrative patterns. They assume the segment, construct, generators, and the named constraint objects (for example gc_constraint, structure_constraint, expression_constraint) have already been defined as shown in the earlier examples and the Constraints guide.

Exploration then Refinement

Rejection Sampling (broad) then MCMC (focused). Use Rejection Sampling to quickly sample thousands of proposals with cheap constraints, then hand the best ones to MCMC for detailed optimization with expensive constraints. Most common multi-stage pattern.

Progressive Constraints

MCMC (basic) then MCMC (+ structure) then MCMC (+ expression). Start with cheap sequence-level constraints, then progressively add expensive constraints. Each stage builds on the previous one’s results. Avoids wasting GPU time scoring bad sequences.

Temperature Annealing

MCMC (hot) then MCMC (warm) then MCMC (cold). Explicit temperature stages: high temperature for broad exploration, medium for narrowing, low for final polishing. More control than single-optimizer annealing. Better for rugged energy landscapes.

Generator Switching

Rejection Sampling + RandomNucleotide then MCMC + ESM2. Start with fast random mutations for initial screening, then switch to language-model-guided mutations for biologically informed refinement. Combines fast screening with language-model-guided refinement.

Exploration then Refinement

python
# Stage 1: Fast exploration with cheap constraints
gen1 = RandomNucleotideGenerator(
    RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=10))
)
gen1.assign(segment)

opt1 = RejectionSamplingOptimizer(
    constructs=[construct],
    generators=[gen1],
    constraints=[gc_filter, homopolymer_filter],
    config=RejectionSamplingOptimizerConfig(num_samples=5000, num_results=20),
)

# Stage 2: Structure-based refinement
gen2 = ESM2Generator(ESM2GeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=3)))
gen2.assign(segment)

opt2 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen2],
    constraints=[gc_constraint, plddt_constraint, rmsd_constraint],
    config=MCMCOptimizerConfig(
        num_steps=200,
        num_results=5,
        proposals_per_result=5,
        max_temperature=2.0,
    ),
)

Program(optimizers=[opt1, opt2], num_results=5).run()
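
To make the shape of this pattern concrete without any library dependency, here is a toy two-stage pipeline in plain Python. Everything in it is an assumption for illustration: the energy function is just the distance of the GC fraction from 0.5, and stage 2 is a greedy local search standing in for MCMC, not the MCMCOptimizer algorithm.

```python
import random

TARGET_GC = 0.5


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def energy(seq):
    """Toy energy: distance of the GC fraction from the target (lower is better)."""
    return abs(gc_fraction(seq) - TARGET_GC)


def explore(rng, length, num_samples, num_results):
    """Stage 1: sample random sequences and keep the lowest-energy ones."""
    samples = [
        "".join(rng.choice("ACGT") for _ in range(length))
        for _ in range(num_samples)
    ]
    return sorted(samples, key=energy)[:num_results]


def refine(rng, seeds, num_steps, num_results):
    """Stage 2: greedy single-base mutations starting from the stage 1 results."""
    pool = list(seeds)
    for _ in range(num_steps):
        for i, seq in enumerate(pool):
            pos = rng.randrange(len(seq))
            candidate = seq[:pos] + rng.choice("ACGT") + seq[pos + 1:]
            # Keep the mutation only if it does not make the energy worse.
            if energy(candidate) <= energy(seq):
                pool[i] = candidate
    return sorted(pool, key=energy)[:num_results]


rng = random.Random(0)
stage1 = explore(rng, length=40, num_samples=200, num_results=10)
stage2 = refine(rng, stage1, num_steps=50, num_results=3)
```

The point of the split is cost: stage 1 scores many candidates with a cheap function, and only its short list of survivors is passed to the slower, step-by-step stage.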

Progressive Constraints

python
# Stage 1: Sequence composition only
gen1 = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig())
gen1.assign(segment)
opt1 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen1],
    constraints=[gc_constraint_1],
    config=MCMCOptimizerConfig(num_steps=300, num_results=10, proposals_per_result=5),
)

# Stage 2: Add structure prediction
gen2 = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=9)))
gen2.assign(segment)
opt2 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen2],
    constraints=[gc_constraint_2, structure_constraint],
    config=MCMCOptimizerConfig(num_steps=200, num_results=5, proposals_per_result=5),
)

# Stage 3: Add expression constraint
gen3 = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=6)))
gen3.assign(segment)
opt3 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen3],
    constraints=[gc_constraint_3, structure_constraint_2, expression_constraint],
    config=MCMCOptimizerConfig(num_steps=100, num_results=3, proposals_per_result=5),
)

Program(optimizers=[opt1, opt2, opt3], num_results=10).run()

Temperature Annealing

python
# High temperature: broad exploration
gen1 = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=8)))
gen1.assign(segment)
opt1 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen1],
    constraints=constraints_1,
    config=MCMCOptimizerConfig(
        num_steps=500, num_results=10, proposals_per_result=10, max_temperature=5.0
    ),
)

# Medium temperature: narrowing
gen2 = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig())
gen2.assign(segment)
opt2 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen2],
    constraints=constraints_2,
    config=MCMCOptimizerConfig(
        num_steps=300, num_results=5, proposals_per_result=5, max_temperature=2.0
    ),
)

# Low temperature: polishing
gen3 = RandomNucleotideGenerator(RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=6)))
gen3.assign(segment)
opt3 = MCMCOptimizer(
    constructs=[construct],
    generators=[gen3],
    constraints=constraints_3,
    config=MCMCOptimizerConfig(
        num_steps=200, num_results=3, proposals_per_result=3, max_temperature=0.5
    ),
)

Program(optimizers=[opt1, opt2, opt3], num_results=10).run()
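
Why the hot, warm, and cold stages behave differently can be seen from the standard Metropolis acceptance rule. The sketch below is generic textbook MCMC; it assumes that max_temperature plays the role of T in such a rule, which this guide does not state, so treat it as intuition rather than a description of MCMCOptimizer internals.

```python
import math


def acceptance_probability(delta_energy, temperature):
    """Metropolis rule: always accept improvements, otherwise exp(-dE / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if delta_energy <= 0:
        return 1.0
    return math.exp(-delta_energy / temperature)


# The same uphill move (energy worsens by 1.0) at the three stage temperatures.
for temperature in (5.0, 2.0, 0.5):
    p = acceptance_probability(1.0, temperature)
    print(f"T={temperature}: accept with probability {p:.3f}")
# T=5.0: accept with probability 0.819
# T=2.0: accept with probability 0.607
# T=0.5: accept with probability 0.135
```

Under this rule a hot stage accepts most worsening moves and can escape local minima, while a cold stage accepts almost none and only polishes what it was handed.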

Running Stages Individually

Use run_stage() for fine-grained control: inspect results between stages, conditionally skip stages, or re-run a stage with different parameters.
python
program = Program(optimizers=[opt1, opt2, opt3], num_results=5)

# Run first stage
program.run_stage(0)
results = program.get_stage_results(0)

# Inspect before continuing
best = results["results"][results["best_result_idx"]]
print(f"Stage 1 best energy: {best['energy_score']:.4f}")

# Conditionally run next stage
if best["energy_score"] < 0.5:
    program.run_stage(1)
else:
    print("Stage 1 didn't converge, skipping refinement")
A previous stage can also be re-run, which resets the pipeline to that point and invalidates subsequent stages:
python
# Re-run stage 0 (invalidates stages 1 and 2)
program.run_stage(0)
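
The invalidation rule can be modeled as simple bookkeeping. The StageTracker class below is hypothetical and is not part of the library; it also assumes that a stage can only start once the stage before it has completed, which this guide implies through the handoff but does not state outright.

```python
class StageTracker:
    """Hypothetical bookkeeping for which stages currently hold valid results."""

    def __init__(self, num_stages):
        self.num_stages = num_stages
        self.completed = set()

    def run_stage(self, index):
        if not 0 <= index < self.num_stages:
            raise IndexError(f"stage {index} out of range")
        # Assumption: a stage starts from the output of the stage before it.
        if index > 0 and (index - 1) not in self.completed:
            raise RuntimeError(f"stage {index - 1} must be run first")
        # Re-running a stage discards everything downstream of it.
        self.completed = {i for i in self.completed if i < index}
        self.completed.add(index)


tracker = StageTracker(num_stages=3)
for stage in (0, 1, 2):
    tracker.run_stage(stage)
after_full_run = set(tracker.completed)   # all three stages valid

tracker.run_stage(0)                      # re-run the first stage
after_rerun = set(tracker.completed)      # only stage 0 remains valid
```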

Results and Export

Accessing Results

python
program.run()

# Final energy scores (from last optimizer)
print(program.energy_scores)  # [0.05, 0.08, 0.12, ...]

# Final sequences (from shared constructs)
for construct in program.constructs:
    for sequence in construct.joined_sequences:
        print(sequence.sequence)

# Structured results
results = program.extract_results(program.energy_scores)
for result in results["results"]:
    print(f"Result {result['result_idx']}: energy={result['energy_score']:.4f}")
    for construct in result["constructs"]:
        for seg in construct["segments"]:
            print(f"  {seg['label']}: {seg['sequence'][:50]}...")

Export Formats

python
# Export all 4 tables at once (sequences, constraints, constructs, optimization)
program.export(path="./results/", format="csv")
# Creates: results/sequences.csv, results/constraints.csv,
#          results/constructs.csv, results/optimization.csv

Stage-Specific Results

Access results from any completed stage:
python
# Results from stage 0
stage_0_results = program.get_stage_results(0)

# Export a specific stage's results (writes the 4-table folder for that stage)
program.export(path="./stage0_results/", format="csv", stage=0)

Optimizer-Level Export

Individual Optimizer instances also provide the same export methods (without the stage parameter):
python
optimizer.export(path="./results/", format="csv")
df = optimizer.to_dataframe(table="sequences")
fasta = optimizer.to_fasta()

State Serialization

Save and restore program state for long-running optimization or checkpointing:
python
import json

# Save state
state = program.serialize_state()
# Save to file, database, etc.
with open("checkpoint.json", "w") as f:
    json.dump(state, f)

# Later: restore state and continue
with open("checkpoint.json") as f:
    state = json.load(f)
program.restore_state(state, stage_index=1)
program.run_stage(1)  # Resume from stage 1
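
For long runs it can be worth writing checkpoints atomically, so that a crash in the middle of a save never leaves a truncated file behind. The helpers below are a sketch using only the standard library. They assume the serialized state is JSON-compatible, as the example above suggests, and the dictionary used here is a made-up stand-in, not the real layout returned by serialize_state().

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write state as JSON to path without leaving a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        # Atomic rename: readers see either the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    checkpoint_path = os.path.join(workdir, "checkpoint.json")
    # Stand-in for program.serialize_state(); the keys are illustrative only.
    example_state = {"current_stage": 1, "energy_scores": [0.05, 0.08]}
    save_checkpoint(example_state, checkpoint_path)
    restored = load_checkpoint(checkpoint_path)
```

The temporary file is created in the same directory as the target because os.replace is only atomic within a single filesystem.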

Important Rules

All optimizers in a Program must share the same Construct objects (by identity, not just value). This is how state persists between stages. The construct is created once and the same object is passed to all optimizers.
python
# Correct: same construct object
construct = Construct([segment])
opt1 = MCMCOptimizer(constructs=[construct], ...)
opt2 = MCMCOptimizer(constructs=[construct], ...)  # Same object

# Wrong: different construct objects (raises ValueError)
opt1 = MCMCOptimizer(constructs=[Construct([segment])], ...)
opt2 = MCMCOptimizer(constructs=[Construct([segment])], ...)  # Different object!
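
The difference between identity and value is easy to miss, so here is a small self-contained illustration. ToyConstruct and check_shared_constructs are hypothetical stand-ins written for this example; they only mimic the kind of check described above and are not the library's validation code.

```python
from dataclasses import dataclass


@dataclass
class ToyConstruct:
    """Stand-in for a Construct; dataclasses compare by value with ==."""
    sequence: str


def check_shared_constructs(per_optimizer_constructs):
    """Raise ValueError unless every optimizer holds the very same objects."""
    first = per_optimizer_constructs[0]
    for constructs in per_optimizer_constructs[1:]:
        if len(constructs) != len(first) or any(
            a is not b for a, b in zip(first, constructs)
        ):
            raise ValueError("optimizers must share the same construct objects")


shared = ToyConstruct("ACGT")
check_shared_constructs([[shared], [shared]])  # passes: one object, two references

copy_a, copy_b = ToyConstruct("ACGT"), ToyConstruct("ACGT")
equal_by_value = copy_a == copy_b   # True: same contents
same_object = copy_a is copy_b      # False: two separate objects
```

Two constructs built from the same segment look equal, but a change written to one is invisible through the other, which is why the check uses `is` rather than `==`.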
Each generator and constraint instance can only be used in one optimizer. This prevents shared mutable state bugs. Create new instances for each stage.
python
# Correct: separate generator instances per optimizer
gen1 = RandomNucleotideGenerator(config)
gen2 = RandomNucleotideGenerator(config)  # New instance, same config is fine
gen1.assign(segment)
gen2.assign(segment)

# Wrong: reusing the same generator instance (raises ValueError)
gen = RandomNucleotideGenerator(config)
gen.assign(segment)
opt1 = MCMCOptimizer(generators=[gen], ...)
opt2 = MCMCOptimizer(generators=[gen], ...)  # Same instance -- error!

Properties

| Property | Description |
| --- | --- |
| constructs | List of Construct objects being optimized (shared across all optimizers) |
| optimizers | List of Optimizer objects in sequence |
| num_results | Program-level default for the number of output sequences. Each optimizer resolves its result count as: config override > num_results > error. |
| energy_scores | Final energy scores from the last optimizer (after run()) |
| current_stage | Index of current/next stage to run |
| verbose | If True, forces verbose mode in all optimizers |
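
The resolution order for num_results can be written out as a small function. This is a sketch of the stated precedence only: the function name is hypothetical, None is assumed to mean "not set", and the exception type the library actually raises is not specified in this guide.

```python
def resolve_num_results(config_num_results, program_num_results):
    """Pick the result count: optimizer config first, then the Program default."""
    if config_num_results is not None:
        return config_num_results
    if program_num_results is not None:
        return program_num_results
    raise ValueError("num_results is set on neither the optimizer config nor the Program")


from_config = resolve_num_results(3, 10)       # config override wins
from_program = resolve_num_results(None, 10)   # falls back to the Program default
```

In the three-stage examples above, this is why a final stage configured with num_results=3 can return 3 results even though the Program was created with num_results=10.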

Next Steps

Quickstart: A complete program, from scratch

Optimizers: Deep dive into individual optimizer strategies

Constraints: Scoring functions for design objectives

Tools: The bioinformatics tools that constraints and generators call