Target segment. An autoregressive
generator (Evo2) proposes candidate inserts, and two constraints fold the full construct through
chromatin-accessibility predictors (Borzoi and Enformer) and reward a signal that is high inside
the dot and dash windows and low inside the gaps. A BeamSearchOptimizer extends the insert a few
tokens at a time, keeping the candidates that best match the pattern.
The full script writes a roughly 21 kb insert that spells several letters and scores it across many
Borzoi ensemble replicates. This walkthrough uses a two-symbol pattern, a short 256 bp target, and
two beam proposals so it runs on a single GPU in a few minutes.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.
The construct
A Segment is one stretch of sequence; a Construct groups the segments that make up the
molecule. The Morse constraints take three inputs in a fixed order: a left flank, the designable
Target, and a right flank. The flanks here are fixed: passing a sequence (the trimmed ends of
two FASTA files) fills them in and leaves nothing to design, while length=256 with no sequence
leaves the target open for the optimizer to fill. The flanks supply genomic context; the constraint
pads each side with neutral A bases up to the model’s receptive field, so short flanks are
sufficient here. The target carries label="Target" because the Morse constraints route their
per-window metadata to that name (metadata_recipient="Target").
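The padding step can be pictured with a minimal sketch. It is not the constraint's actual implementation: pad_to_receptive_field is a hypothetical helper, the symmetric split of the padding is an assumption, and the receptive-field length is an illustrative number rather than a value taken from Borzoi or Enformer.

```python
def pad_to_receptive_field(left: str, target: str, right: str,
                           receptive_field: int) -> str:
    """Hypothetical helper: surround left + target + right with neutral 'A' bases.

    Assumes the padding is split evenly between the two sides; the real
    constraint may distribute it differently.
    """
    core = left + target + right
    missing = receptive_field - len(core)
    if missing < 0:
        raise ValueError("construct is longer than the receptive field")
    pad_left = missing // 2
    pad_right = missing - pad_left  # the right side takes the odd base, if any
    return "A" * pad_left + core + "A" * pad_right


# Illustrative sizes only: short flanks around a 256 bp open target.
padded = pad_to_receptive_field("CG" * 50, "N" * 256, "TT" * 50, receptive_field=1024)
print(len(padded))
```

Because the pad is neutral filler, the flanks only need to be long enough to give the target realistic local context.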
The generator
The generator proposes new sequences for the optimizer to score. Evo2Generator wraps Evo2, a
7B-parameter genomic language model that generates DNA autoregressively, token by token from left to
right. The beam search asks it for short continuations of the insert, each conditioned on a prompt
taken from the end of the left flank. temperature=1.0 and top_k=4 control sampling: temperature
sets how sharply sampling favors high-probability tokens and top_k limits each step to the four
most probable bases. batch_size=2 sets how many sequences run together on the GPU. Setting
prepend_prompt=False keeps the prompt out of the returned tokens, so only the newly generated
bases fill the target. generator.assign(target) binds the generator to the segment it fills.
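To make the two sampling controls concrete, here is a minimal sketch of temperature plus top-k sampling over a toy set of logits. It does not call Evo2 or Evo2Generator; sample_top_k and the logit values are hypothetical and only illustrate how temperature and top_k shape one sampling step.

```python
import math
import random
from typing import Dict, Optional


def sample_top_k(logits: Dict[str, float], temperature: float = 1.0,
                 top_k: int = 4, rng: Optional[random.Random] = None) -> str:
    """Sample one token after keeping only the top_k highest-logit tokens."""
    rng = rng or random.Random()
    # Keep the top_k most probable tokens; everything else gets zero probability.
    kept = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Dividing by temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [value / temperature for _, value in kept]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]  # stable softmax numerators
    tokens = [token for token, _ in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy logits (made up): four bases plus one unlikely extra token.
toy_logits = {"A": 2.0, "C": 1.0, "G": 0.5, "T": 0.1, "N": -5.0}
rng = random.Random(0)
draws = [sample_top_k(toy_logits, temperature=1.0, top_k=4, rng=rng) for _ in range(200)]
print(sorted(set(draws)))
```

With top_k=4 the fifth token can never be drawn, and with temperature=1.0 the remaining probabilities are used unchanged.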
The Morse constraints
Constraints score how well a sequence meets the objective; the optimizer searches for sequences that lower these scores (lower energy is better). Each constraint folds the full left/target/right construct through a chromatin-accessibility predictor and compares the predicted signal over the target to the Morse pattern. The pattern string uses dots and dashes with spaces between letters;
here ".-" lays out one dot window (a short accessible region) followed by one dash window (a long
one), with a gap between them. Scoring is the mean absolute difference between that target pattern
and the normalized predicted signal, so a sequence whose predicted accessibility is high inside the
dot and dash windows and low in the gaps scores best. The two predictors, Borzoi and Enformer, are
listed as separate Constraint objects each reading [left_flank, target, right_flank]; both carry
weight=1.0, so their scores are summed unweighted.
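The scoring rule can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: morse_target, normalize, and morse_energy are hypothetical names, the window widths (dot = 1 bin, dash = 3, gap = 1, letter gap = 3) follow the usual Morse timing convention rather than the constraint's real settings, and min-max scaling stands in for whatever normalization the constraint applies.

```python
from typing import List


def morse_target(pattern: str, dot: int = 1, dash: int = 3,
                 gap: int = 1, letter_gap: int = 3) -> List[float]:
    """Expand a Morse string into a 0/1 vector: 1 inside windows, 0 in gaps."""
    target: List[float] = []
    previous = None
    for symbol in pattern:
        if symbol == " ":
            previous = " "
            continue
        if symbol not in ".-":
            raise ValueError(f"unexpected symbol: {symbol!r}")
        if target:  # no gap before the very first window
            target.extend([0.0] * (letter_gap if previous == " " else gap))
        target.extend([1.0] * (dot if symbol == "." else dash))
        previous = symbol
    return target


def normalize(signal: List[float]) -> List[float]:
    """Min-max scale to [0, 1] (an assumed normalization)."""
    low, high = min(signal), max(signal)
    if high == low:
        return [0.0] * len(signal)
    return [(value - low) / (high - low) for value in signal]


def morse_energy(signal: List[float], pattern: str) -> float:
    """Mean absolute difference between the target pattern and the scaled signal."""
    target = morse_target(pattern)
    if len(signal) != len(target):
        raise ValueError("signal and target must have the same number of bins")
    scaled = normalize(signal)
    return sum(abs(s - t) for s, t in zip(scaled, target)) / len(target)


# Made-up per-bin signals standing in for the two predictors' outputs.
borzoi_like = morse_energy([5.0, 0.0, 5.0, 5.0, 5.0], ".-")   # matches the pattern
enformer_like = morse_energy([0.0, 5.0, 0.0, 0.0, 0.0], ".-")  # inverted pattern
total = 1.0 * borzoi_like + 1.0 * enformer_like  # weight=1.0 on each constraint
print(morse_target(".-"), borzoi_like, enformer_like, total)
```

A signal that is high exactly inside the windows reaches the minimum energy of 0, and the inverted signal reaches the maximum of 1 under this normalization.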
The beam search
BeamSearchOptimizer grows a single target_segment from the prompt, beam_length tokens at a
time. The segment is split into ceil(target_length / beam_length) steps; with a 256 bp target and
beam_length=128 the insert is built in two steps. At each step it asks the generator for
proposals_per_result continuations per beam, scores each proposal’s full accumulated sequence with
the two Morse constraints, and keeps the top num_results beams to seed the next step. score_by
controls how a beam’s per-step energies are aggregated when ranking: "last" uses only the most
recent step’s score. use_kv_caching=True reuses Evo2’s cached state across steps, and
prepend_prompt=False excludes the prompt from the final output. Program runs the optimizer with
a fixed seed=0 for reproducibility.
DeviceManager.configure(allow_multiple_per_device=True) permits multiple tool instances on the
same GPU so Evo2 and the scoring models can share one device. ToolInstance.persist() keeps the
tool workers (and their KV caches) alive across the run.
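The beam loop itself can be sketched without any models. The code below is a generic beam search under stated assumptions, not BeamSearchOptimizer: the function beam_search, the random proposer, and the GC-content energy are hypothetical stand-ins for Evo2 and the Morse constraints, and num_results=1 is an illustrative choice. Ranking uses only the energy of the latest accumulated sequence, which mirrors score_by="last".

```python
import math
import random
from typing import Callable, List, Tuple


def beam_search(target_length: int, beam_length: int, num_results: int,
                proposals_per_result: int,
                propose: Callable[[str, int], str],
                energy: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Grow sequences beam_length tokens at a time, keeping the lowest-energy beams."""
    steps = math.ceil(target_length / beam_length)
    beams: List[Tuple[float, str]] = [(0.0, "")]
    for _step in range(steps):
        candidates: List[Tuple[float, str]] = []
        for _score, prefix in beams:
            chunk = min(beam_length, target_length - len(prefix))
            for _proposal in range(proposals_per_result):
                extended = prefix + propose(prefix, chunk)
                # Score the full accumulated sequence, not just the new chunk.
                candidates.append((energy(extended), extended))
        candidates.sort(key=lambda pair: pair[0])  # lower energy is better
        beams = candidates[:num_results]
    return beams


rng = random.Random(0)  # fixed seed for reproducibility


def random_bases(prefix: str, length: int) -> str:
    """Toy proposer: ignores the prefix and emits random bases."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def gc_energy(sequence: str) -> float:
    """Toy energy: lower when the sequence is richer in G and C."""
    return 1.0 - (sequence.count("G") + sequence.count("C")) / len(sequence)


results = beam_search(target_length=256, beam_length=128, num_results=1,
                      proposals_per_result=2, propose=random_bases, energy=gc_energy)
best_energy, best_sequence = results[0]
print(math.ceil(256 / 128), len(best_sequence), round(best_energy, 3))
```

With a 256 bp target and beam_length=128 the loop runs for two steps, so each surviving beam is scored twice: once at 128 bp and once at the full 256 bp.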
Inspect the result
The optimizer writes its surviving beams to target.result_sequences; the first entry is the
best-ranked insert. program.energy_scores reports the final-stage energies, where lower is better,
so energy_scores[0] is the objective value for that top insert. Printing both shows the designed
target sequence alongside the energy it achieved against the combined Borzoi and Enformer Morse
objective.
Next Steps
Cell-Type-Specific DNA
Gradient design against a regulatory-activity predictor.
Using Optimizers
Beam search and the other optimization strategies.