
The interface
A generator is a class. Its __init__ stores configuration and loads any models; the
framework binds it to a segment with assign() and calls the public sample() once per
optimization step. A subclass implements the _sample() hook, which sample() invokes.
sample() operates on segment.proposal_sequences, a list of Sequence objects whose
.sequence strings are read and rewritten in place. A generator only proposes; duplication,
scoring, and selection remain the responsibility of the optimizer.
Every generator declares an input_type, a value of GeneratorInputType that determines how
the generator obtains its starting point and therefore the family to which it belongs:
STARTING_SEQUENCE (mutation), PROMPT (autoregressive), STRUCTURE (inverse folding), or
LOGITS (gradient).
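The contract described above can be sketched with a few stand-in classes. This is a minimal illustration, not the framework's code: the classes below only mimic the names used in the text (GeneratorInputType, Sequence, Segment, Generator), the enum values are placeholders, and ReverseGenerator is a hypothetical toy subclass.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class GeneratorInputType(Enum):
    # Member names come from the text; the values are placeholders.
    STARTING_SEQUENCE = auto()  # mutation
    PROMPT = auto()             # autoregressive
    STRUCTURE = auto()          # inverse folding
    LOGITS = auto()             # gradient


@dataclass
class Sequence:
    sequence: str  # stand-in for the framework's Sequence object


@dataclass
class Segment:
    proposal_sequences: list = field(default_factory=list)


class Generator:
    """Stand-in base class: public sample() delegates to the _sample() hook."""

    input_type = None  # each subclass declares one

    def assign(self, segment):
        self.segment = segment

    def sample(self):
        if getattr(self, "segment", None) is None:
            raise RuntimeError("assign() must be called before sample()")
        self._sample()

    def _sample(self):
        raise NotImplementedError


class ReverseGenerator(Generator):
    """Hypothetical toy generator: reverses each proposal in place."""

    input_type = GeneratorInputType.STARTING_SEQUENCE

    def _sample(self):
        for proposal in self.segment.proposal_sequences:
            proposal.sequence = proposal.sequence[::-1]


segment = Segment([Sequence("MKTAYIAK"), Sequence("GSHM")])
gen = ReverseGenerator()
gen.assign(segment)
gen.sample()
print([p.sequence for p in segment.proposal_sequences])
```

The toy subclass never scores or filters anything; it only rewrites the .sequence strings, which is the division of labour the text assigns to generators.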
Generator families
Mutation
Introduces point changes to an existing sequence, suited to local search around a
starting point. Examples: RandomProteinGenerator, ESM2Generator,
SemigreedyMutationGenerator.
Autoregressive
Generates a sequence token by token from a prompt, suited to de novo generation.
Examples: ProGen2Generator, Evo2Generator.
Inverse folding
Designs a sequence for a fixed three-dimensional backbone. Examples:
ProteinMPNNGenerator, LigandMPNNGenerator.
Gradient
Optimizes a differentiable representation of the sequence. Example:
PositionWeightGenerator, paired with the gradient optimizer.
Using a built-in generator
RandomProteinGenerator belongs to the mutation family: it replaces the residues at a set of
masked positions with random amino acids. The number of positions altered per call is governed
by a MaskingStrategy. Because a generator only proposes, it can be exercised on its own by
supplying a starting sequence and calling sample() directly.
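The behaviour of a random mutation generator can be approximated in plain Python. The sketch below does not use the framework: mutate_masked is a hypothetical helper, and its num_positions argument stands in for whatever a MaskingStrategy would decide.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_masked(sequence, num_positions, rng):
    """Replace residues at randomly chosen positions with random amino acids.

    Returns the new sequence and the sorted list of masked positions.
    A random draw may reproduce the original residue, so fewer than
    num_positions residues may actually differ.
    """
    k = min(num_positions, len(sequence))
    positions = rng.sample(range(len(sequence)), k=k)
    residues = list(sequence)
    for pos in positions:
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues), sorted(positions)


rng = random.Random(0)  # local RNG so the example is repeatable
start = "MKTAYIAKQRQISFVK"
proposal, masked = mutate_masked(start, num_positions=3, rng=rng)
print(start)
print(proposal, masked)
```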
Defining a custom generator
A custom generator is a Generator subclass registered with the @generator decorator. It
declares an input_type, stores its configuration in __init__, and implements _sample(),
which rewrites each proposal sequence in place.
The generator below is biologically motivated. Instead of substituting arbitrary amino acids,
it replaces a residue only with a biochemically similar one drawn from a table of BLOSUM
neighbours. Such conservative substitutions are more likely to preserve a fold than
unconstrained ones.
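The core of such a generator, the conservative substitution itself, can be sketched independently of the framework. Everything here is an assumption made for illustration: the SIMILAR table is hand-written from commonly cited residue groupings rather than derived from an actual BLOSUM matrix, and conservative_mutation is a hypothetical helper, not the framework's _sample() signature.

```python
import random

# Illustrative groups of biochemically similar residues (hand-written for
# this sketch, not computed from BLOSUM scores).
SIMILAR = {
    "I": "LVM", "L": "IVM", "V": "ILM", "M": "ILV",
    "D": "E", "E": "D",
    "K": "R", "R": "K",
    "S": "T", "T": "S",
    "N": "Q", "Q": "N",
    "F": "YW", "Y": "FW", "W": "FY",
}


def conservative_mutation(sequence, num_mutations, rng):
    """Substitute up to num_mutations residues with a similar residue."""
    # Only positions whose residue has an entry in the table can be mutated.
    candidates = [i for i, aa in enumerate(sequence) if aa in SIMILAR]
    chosen = rng.sample(candidates, k=min(num_mutations, len(candidates)))
    residues = list(sequence)
    for i in chosen:
        residues[i] = rng.choice(SIMILAR[residues[i]])
    return "".join(residues)


rng = random.Random(1)
start = "MKTAYIAKQRQISFVK"
mutated = conservative_mutation(start, num_mutations=2, rng=rng)
print(start)
print(mutated)
```

Because no residue lists itself as a neighbour, every chosen position is guaranteed to change. Inside a real generator this logic would run once per proposal sequence within _sample().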
Validating a generator
As with a constraint, a generator should be exercised in isolation before it is placed in an optimizer. Assign it to a segment, supply a known starting sequence, call sample(), and
confirm that the sequence changes without a change in length or the introduction of invalid
residues.
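The three checks named above can be collected into one helper. This is a sketch under assumptions: check_proposal is a hypothetical function, the 20-letter alphabet is assumed to be the valid residue set, and in practice the after argument would be the .sequence string read back from the segment once sample() has run.

```python
def check_proposal(before, after, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Return a list of problems found in a proposed sequence (empty if none)."""
    problems = []
    if len(after) != len(before):
        problems.append(f"length changed from {len(before)} to {len(after)}")
    invalid = sorted(set(after) - set(alphabet))
    if invalid:
        problems.append(f"invalid residues: {''.join(invalid)}")
    if after == before:
        problems.append("sequence did not change")
    return problems


print(check_proposal("MKTAYIAK", "MKTGYIAK"))  # a clean single substitution
print(check_proposal("MKTAYIAK", "MKTAYIA"))   # a deletion
print(check_proposal("MKTAYIAK", "MKTXYIAK"))  # an unknown residue
```

Since a random substitution can occasionally reproduce the original residue, an unchanged sequence on a single call is not necessarily a bug; repeating the call a few times distinguishes chance from a generator that never mutates.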
Model-backed and gradient generators
Several of the built-in generators are backed by machine-learning models, yet they do not load those models themselves. A generator such as ESM2Generator or ProteinMPNNGenerator stores
only its configuration in __init__ and, in _sample(), calls a proto-tool that runs the
model. Model loading, environment isolation, and device placement are the responsibility of the
tools layer, and its persistence mechanism can keep a model warm across calls, so the model is
not reloaded on every optimization step. The generator declares the tool it depends on through
tools_called.
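The delegation pattern can be illustrated with a stub. Nothing below is the framework's tools API: StubTool, ToolBackedGenerator, the tuple form of tools_called, and the string-reversing placeholder "model" are all assumptions chosen to show that the model is loaded once while _sample() runs many times.

```python
class StubTool:
    """Stand-in for a tool that owns a model and keeps it warm across calls."""

    def __init__(self):
        self.load_count = 0
        self._model = None

    def __call__(self, sequence):
        if self._model is None:
            # Expensive load happens only on the first call.
            self.load_count += 1
            self._model = lambda s: s[::-1]  # placeholder for a real model
        return self._model(sequence)


class ToolBackedGenerator:
    """Hypothetical generator that stores configuration and delegates to a tool."""

    tools_called = ("stub_tool",)  # declared dependency; format is an assumption

    def __init__(self, tool):
        self.tool = tool  # configuration only, no model is loaded here

    def _sample(self, proposals):
        return [self.tool(seq) for seq in proposals]


tool = StubTool()
generator = ToolBackedGenerator(tool)
proposals = ["MKTAY", "GSHM"]
for _ in range(3):  # three optimization steps
    proposals = generator._sample(proposals)
print(proposals, "model loads:", tool.load_count)
```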
A generator that does wrap a model directly should load it once in __init__, since _sample() runs on every step.
PositionWeightGenerator represents a sequence as a differentiable matrix of position weights and, paired with the
gradient optimizer, follows the gradient of differentiable objectives toward a design target.
The accompanying script applies this approach to two-stage de novo protein hallucination, in
which a continuous, soft sequence is progressively sharpened into a discrete one.
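One generic way to sharpen a soft sequence is to lower a softmax temperature over the position weights until each position is nearly one-hot, then take the argmax. The sketch below shows only that idea; it is an assumption about how sharpening could work, not a description of PositionWeightGenerator or of the accompanying script, and the four-letter alphabet and logit values are made up for readability.

```python
import math

ALPHABET = "ACDE"  # reduced alphabet, for illustration only


def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature gives a sharper output."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def decode(position_logits):
    """Pick the highest-weight residue at every position."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)]
        for row in position_logits
    )


# One row of made-up weights per sequence position.
position_logits = [
    [2.0, 1.0, 0.5, 0.1],
    [0.2, 0.3, 1.5, 0.4],
    [0.1, 2.2, 0.3, 0.9],
]

for temperature in (2.0, 1.0, 0.1):
    soft = [softmax(row, temperature) for row in position_logits]
    print(temperature, [round(max(p), 3) for p in soft])

print(decode(position_logits))
```

As the temperature falls, the largest probability at each position approaches 1, so the continuous representation converges on the discrete sequence that decode() returns.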
Practical considerations
A model-backed generator should delegate to the tools layer rather than load a model itself:
store configuration in __init__ and call the model’s tool from _sample(). The tools layer
loads the model and can keep it warm across calls, so it is not reloaded on every step. Only a
generator that wraps a model directly needs to load it once in __init__.
For reproducibility, set the seed at the program level by passing seed= to Program rather
than by calling random.seed() within a generator. The built-in generators draw from a
per-generator RNG that the framework seeds deterministically from the program seed.
Next Steps
Using Constraints
Define the objective the proposed sequences are scored against.
Using Optimizers
Combine generators and constraints into a search.
Generators concept
How generators relate to the rest of the model.
Generator reference
The complete catalog of built-in generators.