This program designs a DNA insert whose predicted chromatin-accessibility profile spells Morse code. A fixed left flank and right flank surround a designable Target segment. An autoregressive generator (Evo2) proposes candidate inserts, and two constraints fold the full construct through chromatin-accessibility predictors (Borzoi and Enformer) and reward a signal that is high inside the dot and dash windows and low inside the gaps. A BeamSearchOptimizer extends the insert a few tokens at a time, keeping the candidates that best match the pattern. The full script writes a roughly 21 kb insert that spells several letters and scores it across many Borzoi ensemble replicates. This walkthrough uses a two-symbol pattern, a short 256 bp target, and two beam proposals so it runs on a single GPU in a few minutes.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.

The construct

A Segment is one stretch of sequence; a Construct groups the segments that make up the molecule. The Morse constraints take three inputs in a fixed order: a left flank, the designable Target, and a right flank. The flanks here are fixed: passing a sequence (the trimmed ends of two FASTA files) fills them in and leaves nothing to design, while length=256 with no sequence leaves the target open for the optimizer to fill. The flanks supply genomic context; the constraint pads each side with neutral A bases up to the model’s receptive field, so short flanks are sufficient here. The target carries label="Target" because the Morse constraints route their per-window metadata to that name (metadata_recipient="Target").
python
from pathlib import Path
from proto_language.core import Segment, Construct

DATA = Path.cwd().parent / "data"


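def read_fasta(path):
    """Return the concatenated, upper-cased sequence lines of a FASTA file, skipping headers."""
    lines = [line.strip() for line in path.read_text().splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">")).upper()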

# Short flanks; the Morse constraint pads context up to each model's receptive field.
left_seq = read_fasta(DATA / "creb_dna_design_left_flank.fasta")[-1000:]
right_seq = read_fasta(DATA / "creb_dna_design_right_flank.fasta")[:1000]

TARGET_LEN = 256  # the designable insert, large enough to hold the pattern below

left_flank = Segment(sequence=left_seq, sequence_type="dna", label="left_flank")
target = Segment(length=TARGET_LEN, sequence_type="dna", label="Target")
right_flank = Segment(sequence=right_seq, sequence_type="dna", label="right_flank")

construct = Construct([left_flank, target, right_flank], label="morse_construct")
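
The context padding described above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name `pad_to_context`, the symmetric centering, and the toy `context_len` are all assumptions made for illustration; only the idea of padding with neutral A bases comes from the text.

```python
def pad_to_context(left: str, target: str, right: str,
                   context_len: int, pad: str = "A") -> tuple[str, int]:
    """Center left+target+right in a window of context_len bases.

    Hypothetical helper: pads both sides with `pad` and returns the padded
    sequence plus the 0-based start of the target inside it.
    """
    core = left + target + right
    if len(core) > context_len:
        raise ValueError("construct is longer than the context window")
    missing = context_len - len(core)
    pad_left = missing // 2            # assumption: split the padding evenly
    pad_right = missing - pad_left
    padded = pad * pad_left + core + pad * pad_right
    return padded, pad_left + len(left)


# Toy sizes; a real predictor's receptive field is far longer.
padded, target_start = pad_to_context("ACGT" * 3, "N" * 8, "TTGG" * 3, context_len=64)
print(len(padded), target_start)
```

Returning the target offset alongside the padded sequence is a convenient way to locate the designable region in the predictor's output track afterwards.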

The generator

The generator proposes new sequences for the optimizer to score. Evo2Generator wraps Evo2, a 7B-parameter genomic language model that generates DNA autoregressively, token by token from left to right. The beam search asks it for short continuations of the insert, each conditioned on a prompt taken from the end of the left flank. temperature=1.0 and top_k=4 control sampling: temperature sets how sharply sampling favors high-probability tokens, and top_k limits each step to the four most probable tokens, which in practice are the four bases. batch_size=2 sets how many sequences run together on the GPU. Setting prepend_prompt=False keeps the prompt out of the returned tokens, so only the newly generated bases fill the target. generator.assign(target) binds the generator to the segment it fills.
python
from proto_language.generator import Evo2Generator, Evo2GeneratorConfig

evo_prompt = left_seq[-256:]  # condition Evo2 on the sequence immediately upstream of the insert

generator = Evo2Generator(
    Evo2GeneratorConfig(
        prompts=[evo_prompt],
        temperature=1.0,
        top_k=4,
        batch_size=2,
        prepend_prompt=False,
        force_prompt_threshold=1,
        stop_at_eos=False,
        device="cuda",
    )
)
generator.assign(target)
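
To make the two sampling knobs concrete, here is a minimal sketch of temperature-scaled top-k sampling over a toy vocabulary. It is a generic illustration rather than Evo2's code: the function `sample_top_k` and the logit values are invented for the example.

```python
import math
import random


def sample_top_k(logits: dict[str, float], temperature: float,
                 top_k: int, rng: random.Random) -> str:
    """Draw one token from the top_k highest logits after temperature scaling."""
    # Keep only the top_k highest-scoring tokens.
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature-scaled softmax weights (subtract the max for stability).
    scaled = [score / temperature for _, score in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    tokens = [tok for tok, _ in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical logits for one generation step.
logits = {"A": 2.0, "C": 1.0, "G": 0.5, "T": 0.1, "N": -3.0}
rng = random.Random(0)
draws = [sample_top_k(logits, temperature=1.0, top_k=4, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))
```

With top_k=4 the fifth-ranked token can never be drawn. Lowering the temperature below 1.0 concentrates the draws on the highest-scoring base, while raising it flattens the distribution.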

The Morse constraints

Constraints score how well a sequence meets the objective; the optimizer searches for sequences that lower these scores (lower energy is better). Each constraint folds the full left/target/right construct through a chromatin-accessibility predictor and compares the predicted signal over the target to the Morse pattern. The pattern string uses dots and dashes with spaces between letters; here ".-" lays out one dot window (a short accessible region) followed by one dash window (a long one), with a gap between them. Scoring is the mean absolute difference between that target pattern and the normalized predicted signal, so a sequence whose predicted accessibility is high inside the dot and dash windows and low in the gaps scores best. The two predictors, Borzoi and Enformer, are listed as separate Constraint objects, each reading [left_flank, target, right_flank]; both carry weight=1.0, so their scores contribute equally to the summed energy.
python
from proto_language.core import Constraint
from proto_language.constraint import (
    BorzoiChromatinAccessibilityMorseConfig,
    EnformerChromatinAccessibilityMorseConfig,
    borzoi_chromatin_accessibility_morse_constraint,
    enformer_chromatin_accessibility_morse_constraint,
)

PATTERN = ".-"  # one dot (short accessible window) then one dash (long accessible window)

borzoi = Constraint(
    inputs=[left_flank, target, right_flank],
    function=borzoi_chromatin_accessibility_morse_constraint,
    function_config=BorzoiChromatinAccessibilityMorseConfig(pattern=PATTERN, device="cuda"),
    label="morse_borzoi",
    weight=1.0,
)
enformer = Constraint(
    inputs=[left_flank, target, right_flank],
    function=enformer_chromatin_accessibility_morse_constraint,
    function_config=EnformerChromatinAccessibilityMorseConfig(pattern=PATTERN, device="cuda"),
    label="morse_enformer",
    weight=1.0,
)
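
The scoring idea can be sketched without any model. The example below expands a Morse string into a 0/1 target track and computes the mean absolute difference against a normalized signal. The window layout (dot = 1 unit, dash = 3 units, 1-unit gap between symbols, 3-unit gap between letters) follows standard Morse timing and is an assumption, as is the min-max normalization; the library's actual window sizes and normalization are not stated here, and all function names are hypothetical.

```python
def morse_target(pattern: str, unit: int = 1) -> list[float]:
    """Expand a Morse string into a 0/1 track (1.0 = accessible window)."""
    track: list[float] = []
    previous_was_symbol = False
    for symbol in pattern:
        if symbol == " ":                      # gap between letters
            track += [0.0] * (3 * unit)
            previous_was_symbol = False
            continue
        if symbol not in ".-":
            raise ValueError(f"unexpected symbol: {symbol!r}")
        if previous_was_symbol:                # gap between symbols in a letter
            track += [0.0] * unit
        track += [1.0] * (unit if symbol == "." else 3 * unit)
        previous_was_symbol = True
    return track


def min_max_normalize(signal: list[float]) -> list[float]:
    """Rescale a signal to [0, 1]; a flat signal maps to all zeros."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)
    return [(x - lo) / (hi - lo) for x in signal]


def morse_energy(predicted: list[float], target: list[float]) -> float:
    """Mean absolute difference between normalized signal and target."""
    if len(predicted) != len(target):
        raise ValueError("signal and target must have the same length")
    norm = min_max_normalize(predicted)
    return sum(abs(p - t) for p, t in zip(norm, target)) / len(target)


target_track = morse_target(".-")              # [1.0, 0.0, 1.0, 1.0, 1.0]
matching = [8.0, 1.0, 9.0, 9.0, 9.0]           # high in windows, low in the gap
inverted = [1.0, 9.0, 1.0, 1.0, 1.0]           # low in windows, high in the gap
energy_good = morse_energy(matching, target_track)
energy_bad = morse_energy(inverted, target_track)
print(energy_good, energy_bad)
```

A signal that follows the pattern scores near 0, and one that inverts it scores near 1, which is why the optimizer treats lower energy as better.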

The optimizer

BeamSearchOptimizer grows a single target_segment from the prompt, beam_length tokens at a time. The segment is split into ceil(target_length / beam_length) steps; with a 256 bp target and beam_length=128 the insert is built in two steps. At each step it asks the generator for proposals_per_result continuations per beam, scores each proposal’s full accumulated sequence with the two Morse constraints, and keeps the top num_results beams to seed the next step. score_by controls how a beam’s per-step energies are aggregated when ranking: "last" uses only the most recent step’s score. use_kv_caching=True reuses Evo2’s cached state across steps, and prepend_prompt=False excludes the prompt from the final output. Program runs the optimizer with a fixed seed=0 for reproducibility. DeviceManager.configure(allow_multiple_per_device=True) permits multiple tool instances on the same GPU so Evo2 and the scoring models can share one device. ToolInstance.persist() keeps the tool workers (and their KV caches) alive across the run.
python
from proto_language.core import Program
from proto_language.optimizer import BeamSearchOptimizer, BeamSearchOptimizerConfig
from proto_tools.utils import DeviceManager
from proto_tools.utils.tool_instance import ToolInstance

optimizer = BeamSearchOptimizer(
    target_segment=target,
    constructs=[construct],
    generators=[generator],
    constraints=[borzoi, enformer],
    config=BeamSearchOptimizerConfig(
        prompt=evo_prompt,
        beam_length=128,          # 256 bp target / 128 = 2 beam steps
        num_results=1,
        proposals_per_result=2,
        score_by="last",
        use_kv_caching=True,
        prepend_prompt=False,
    ),
)

DeviceManager.get_instance().configure(allow_multiple_per_device=True)
program = Program(optimizers=[optimizer], num_results=1, seed=0)

with ToolInstance.persist():
    program.run()
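
The beam loop itself is simple enough to sketch in plain Python. The version below is a toy under stated assumptions, not the BeamSearchOptimizer implementation: a random-base proposer stands in for Evo2, a GC-content energy stands in for the Morse constraints, and there is no KV caching. Ranking each step by the energy of the full accumulated sequence mirrors the score_by="last" behavior described in the text.

```python
import math
import random


def beam_search(target_length, beam_length, num_results,
                proposals_per_result, propose, score, seed=0):
    """Grow sequences chunk by chunk, keeping the lowest-energy beams."""
    rng = random.Random(seed)
    steps = math.ceil(target_length / beam_length)
    beams = [""]                                   # start from an empty insert
    for step in range(steps):
        # The last chunk may be shorter than beam_length.
        chunk = min(beam_length, target_length - step * beam_length)
        candidates = []
        for beam in beams:
            for _ in range(proposals_per_result):
                candidates.append(beam + propose(beam, chunk, rng))
        # Lower energy is better; each candidate is scored as a whole.
        candidates.sort(key=score)
        beams = candidates[:num_results]
    return [(seq, score(seq)) for seq in beams]


def random_dna(prefix, n, rng):
    """Stand-in generator: ignores the prefix and emits n random bases."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def gc_energy(seq):
    """Toy objective: lower energy for GC-rich sequences."""
    return 1.0 - (seq.count("G") + seq.count("C")) / len(seq)


results = beam_search(target_length=256, beam_length=128, num_results=1,
                      proposals_per_result=2, propose=random_dna, score=gc_energy)
best_seq, best_energy = results[0]
print(len(best_seq), round(best_energy, 3))
```

With num_results=1 and proposals_per_result=2, as in the walkthrough, only two candidates are scored per step, so the search is cheap but narrow. Raising either value widens the search and increases the number of predictor calls per step.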

Inspect the result

The optimizer writes its surviving beams to target.result_sequences; the first entry is the best-ranked insert. program.energy_scores reports the final-stage energies, where lower is better, so energy_scores[0] is the objective value for that top insert. Printing both shows the designed target sequence alongside the energy it achieved against the combined Borzoi and Enformer Morse objective.
python
best_target = target.result_sequences[0].sequence
print(f"objective energy: {program.energy_scores[0]:.4f}")
print(f"designed insert:  {best_target[:60]}...")
objective energy: 0.7788
designed insert:  CATTCACTCCCCTTCGAAGTCCTAGACTTAGGTAGCCCTATTTCCTTATTCTTCCTTAGA...

Next Steps

Cell-Type-Specific DNA

Gradient design against a regulatory-activity predictor.

Using Optimizers

Beam search and the other optimization strategies.