
Generators

Generators propose candidate sequences during optimization. Where constraints define the requirements and optimizers orchestrate the search, generators determine where new candidate sequences come from. Every optimization step begins with generators proposing candidates. A generator takes the current sequences in a Segment, applies its strategy (random mutation, protein language model, structure-conditioned design), and fills the proposal_sequences pool for the optimizer to evaluate.
[Diagram: result_sequences are the input to the Generator, which samples proposal_sequences; the Optimizer scores them against the Constraints and selects the best.]
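The propose, score, select cycle described above can be sketched in plain Python. This is a toy illustration only, not Proto's implementation: the functions `propose`, `score`, and `optimization_step` are hypothetical names, the "constraint" is an arbitrary alanine-fraction score, and selection is a simple greedy pick rather than any particular optimizer's acceptance rule.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose(sequence, num_mutations, rng):
    """Toy generator: copy `sequence` and substitute random positions."""
    chars = list(sequence)
    for pos in rng.sample(range(len(chars)), num_mutations):
        chars[pos] = rng.choice(AMINO_ACIDS)
    return "".join(chars)


def score(sequence):
    """Toy constraint: fraction of alanine residues (higher is better)."""
    return sequence.count("A") / len(sequence)


def optimization_step(result_sequences, proposals_per_result, rng):
    """One step: fill a proposal pool per result, score it, keep the best."""
    next_results = []
    for current in result_sequences:
        proposal_sequences = [
            propose(current, 1, rng) for _ in range(proposals_per_result)
        ]
        # Keep the current sequence in the pool so the score never decreases.
        best = max(proposal_sequences + [current], key=score)
        next_results.append(best)
    return next_results


rng = random.Random(0)
start = "MKTAYLLIGL"
results = [start]
for _ in range(20):
    results = optimization_step(results, proposals_per_result=10, rng=rng)
```

The generator's only job here is `propose`; scoring and selection belong to the other two roles, which is the same division of labor the page describes.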

Generator Categories

Proto organizes generators by how they produce sequences. The three most common categories are below; a fourth, gradient-based generation (PositionWeightGenerator), produces differentiable position weights for the Gradient optimizer. Each category makes different trade-offs between speed, biological realism, and required prior knowledge.
Mutation generators refine existing sequences by modifying selected positions. They start from an existing sequence and introduce changes, either uniformly at random or guided by a protein language model’s uncertainty estimates. Most require a starting sequence (ESM2Generator, for example, raises an error if the segment has none); the random generators (RandomProteinGenerator, RandomNucleotideGenerator) are the exception and initialize one automatically when none is provided.
[Diagram: mutating positions turns MKTAYLLIGL… into MKTAYVLIGL…]
When to use: A starting sequence is available and the goal is to refine it. This is the most common category for iterative optimization.
python
from proto_language.generator import (
    RandomNucleotideGenerator, RandomNucleotideGeneratorConfig
)
from proto_tools.transforms.masking import MaskingStrategy

generator = RandomNucleotideGenerator(
    RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=5))
)
See the Generator Reference for all available mutation generators and their configuration options.
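To make the idea of "select positions, then change them" concrete, here is a minimal standalone sketch of uniform random mutation on a DNA string. It does not use Proto's classes: `choose_positions` and `mutate` are hypothetical helpers, and forcing each substitution to differ from the original base is an assumption made for the example, not a documented behavior of RandomNucleotideGenerator.

```python
import random

NUCLEOTIDES = "ACGT"


def choose_positions(length, num_mutations, rng):
    """Pick `num_mutations` distinct positions uniformly at random."""
    return sorted(rng.sample(range(length), num_mutations))


def mutate(sequence, positions, rng):
    """Replace the base at each chosen position with a different base."""
    chars = list(sequence)
    for pos in positions:
        # Assumption for this sketch: never re-draw the original base.
        alternatives = [n for n in NUCLEOTIDES if n != chars[pos]]
        chars[pos] = rng.choice(alternatives)
    return "".join(chars)


rng = random.Random(42)
parent = "ATGGCCATTGTAATGGGCCGC"
positions = choose_positions(len(parent), num_mutations=5, rng=rng)
child = mutate(parent, positions, rng)

# Number of positions where parent and child differ.
num_changed = sum(a != b for a, b in zip(parent, child))
```

With this substitution rule, `num_changed` always equals the requested number of mutations, which makes the step size per proposal easy to reason about.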

Assigning Generators to Segments

Before an optimizer can use a generator, it must be assigned to a specific Segment. This tells the generator which part of the construct to modify.
python
from proto_language.core import Segment
from proto_language.generator import (
    RandomNucleotideGenerator, RandomNucleotideGeneratorConfig
)
from proto_tools.transforms.masking import MaskingStrategy

# Create a segment
segment = Segment(length=100, sequence_type="dna")

# Create and assign the generator
generator = RandomNucleotideGenerator(
    RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=5))
)
generator.assign(segment)
The assign() method validates compatibility:
  • The segment’s sequence_type must be supported by the generator
  • Ligand segments cannot have generators assigned (they’re fixed)
python
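from proto_language.core import Segment
# Assumption: the ESM2 classes are exported alongside the random generators.
from proto_language.generator import ESM2Generator, ESM2GeneratorConfig

# This works: ESM2 supports protein
protein_segment = Segment(length=100, sequence_type="protein")
esm2_gen = ESM2Generator(ESM2GeneratorConfig())
esm2_gen.assign(protein_segment)

# This raises ValueError: ESM2 doesn't support DNA
dna_segment = Segment(length=100, sequence_type="dna")
esm2_gen.assign(dna_segment)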
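The two compatibility rules above can be modeled with a small standalone sketch. `ToySegment` and `ToyGenerator` are hypothetical stand-ins, not Proto classes, and representing a ligand as `sequence_type="ligand"` is an assumption made only for this illustration; the real checks may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class ToySegment:
    length: int
    sequence_type: str


class ToyGenerator:
    # Sequence types this toy generator accepts.
    supported_types = frozenset({"protein"})

    def __init__(self):
        self.segment = None

    def assign(self, segment):
        """Attach the generator to a segment after checking compatibility."""
        if segment.sequence_type == "ligand":
            raise ValueError("ligand segments are fixed and cannot have a generator")
        if segment.sequence_type not in self.supported_types:
            raise ValueError(
                f"unsupported sequence_type {segment.sequence_type!r}; "
                f"expected one of {sorted(self.supported_types)}"
            )
        self.segment = segment


gen = ToyGenerator()
gen.assign(ToySegment(length=100, sequence_type="protein"))  # accepted

try:
    gen.assign(ToySegment(length=100, sequence_type="dna"))
    rejected = False
except ValueError:
    # The failed assignment leaves the earlier, valid assignment in place.
    rejected = True
```

Validating inside the assignment call means an incompatible pairing fails immediately, before any optimizer is built around it.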

Multiple Generators

In multi-segment constructs, different generators can be assigned to different segments. Each generator independently proposes candidates for its assigned segment:
python
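from proto_language.core import Segment, Construct
from proto_language.generator import (
    RandomNucleotideGenerator, RandomNucleotideGeneratorConfig
)
from proto_language.optimizer import MCMCOptimizer, MCMCOptimizerConfig
from proto_tools.transforms.masking import MaskingStrategy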

# Two segments with different generation strategies
promoter = Segment(length=200, sequence_type="dna", label="promoter")
coding_seq = Segment(length=300, sequence_type="dna", label="cds")

construct = Construct([promoter, coding_seq])

# Assign different generators to each segment
gen_promoter = RandomNucleotideGenerator(
    RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=20))
)
gen_cds = RandomNucleotideGenerator(
    RandomNucleotideGeneratorConfig(masking_strategy=MaskingStrategy(num_mutations=6))
)

gen_promoter.assign(promoter)
gen_cds.assign(coding_seq)

optimizer = MCMCOptimizer(
    constructs=[construct],
    generators=[gen_promoter, gen_cds],
    constraints=[...],
    config=MCMCOptimizerConfig(num_steps=500, num_results=5, proposals_per_result=10),
)
Use different mutation counts for different segments. Conserved regions (like coding sequences) benefit from fewer mutations per step, while exploratory regions (like promoters) can tolerate more.
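One way to compare mutation counts across segments of different lengths is to look at the fraction of positions changed per step. The short calculation below uses the lengths and counts from the example above; `mutation_fraction` is a hypothetical helper, not part of Proto.

```python
def mutation_fraction(length, num_mutations):
    """Fraction of a segment's positions changed in one proposal."""
    return num_mutations / length


# (segment length, num_mutations) taken from the example above.
segments = {"promoter": (200, 20), "cds": (300, 6)}

fractions = {
    label: mutation_fraction(length, num_mutations)
    for label, (length, num_mutations) in segments.items()
}

for label, fraction in fractions.items():
    print(f"{label}: {fraction:.0%} of positions per step")
```

Here the promoter changes 10% of its positions per step while the coding sequence changes 2%, so each proposal perturbs the promoter five times as heavily.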

GPU Memory & Batch Size

GPU generators can process multiple proposal sequences per forward pass. The batch_size config parameter controls how many sequences are sent to the GPU at once. All generators default to batch_size=1 (sequential processing); increase it to enable batching. The framework splits the full set of proposals into chunks of at most batch_size and processes each chunk on the GPU. For example, if the optimizer requests 50 proposals and batch_size=16, the generator runs 4 forward passes (16 + 16 + 16 + 2).
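The chunking arithmetic in this example can be checked with a few lines of plain Python. `split_into_batches` is a hypothetical helper written for this illustration; it shows the splitting rule described above, not the framework's actual code.

```python
def split_into_batches(items, batch_size):
    """Split `items` into consecutive chunks of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


proposals = [f"seq_{i}" for i in range(50)]
batches = split_into_batches(proposals, batch_size=16)

batch_sizes = [len(batch) for batch in batches]
num_forward_passes = len(batches)
# batch_sizes is [16, 16, 16, 2], so num_forward_passes is 4.
```

The last chunk is simply smaller than the rest; no padding is assumed in this sketch.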
If GPU out-of-memory (OOM) errors occur, reduce the generator’s batch_size in its config. This is especially common with long sequences or large models. See the Generator Reference for per-generator configuration details and available parameters.

Next Steps

  • Constraints: Quality requirements that sequences must satisfy
  • Optimizers: Learn how optimizers coordinate generators and constraints
  • Tools: Explore the bioinformatics tools that power generators
  • Generator Reference: Full API reference for each generator
