A SpliceTransformer constraint reads the donor and acceptor sites in their plasmid context, and an
AlphaGenome constraint predicts how strongly those splice sites are used once the cassette is
inserted into a genomic safe-harbor locus. The GT and AG boundary dinucleotides stay fixed while a
random-nucleotide generator mutates the intron core, and an MCMCOptimizer keeps the proposals
that raise the combined splice signal.
The full script sweeps several plasmid contexts and safe-harbor loci and balances on-target against
off-target cell types. This walkthrough uses one plasmid context, one safe-harbor locus (AAVS1), a
neural-cell ontology term, and two MCMC steps so it runs on a single GPU in a few minutes.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.
The intron cassette
The cassette is assembled by the base intron-design helpers: get_initial_intron seeds a GT…AG
intron and process_splice_transformer_input centers it in a 1 kb target with 4 kb of plasmid
context on each side (the 4 kb is SpliceTransformer’s required left/right CONTEXT_LENGTH). The
cassette config records the inputs: intron_length=301 (the GT, a 297 bp core, and the AG, as
the inline comment notes), the plasmid context and gene-sequence files, and the gene insertion
position. The helper returns left_context, right_context, the concatenated target_seq, and
the donor_start / acceptor_end offsets. Those offsets slice the target into three Segment
objects (left flank, designable intron core, right flank) wrapped in a Construct, with the GT
and AG dinucleotides held inside the flanks so only the core is editable. donor_eval and
acceptor_eval mark the positions each scorer reads, just before the GT and just after the AG.
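The slicing described above can be illustrated in plain Python. This is a minimal sketch, not the helpers' implementation: split_target and random_dna are hypothetical names, the random sequence stands in for the real gene sequence, and the offset convention (donor_start indexes the G of GT, acceptor_end is one past the G of AG) is an assumption made for illustration.

```python
import random

INTRON_LENGTH = 301   # GT (2) + designable core (297) + AG (2)
TARGET_LENGTH = 1000  # the 1 kb target window


def random_dna(length, rng):
    """Return a random DNA string (stand-in for real gene sequence)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def split_target(target_seq, donor_start, acceptor_end):
    """Slice a target into (left flank, intron core, right flank).

    Assumed convention: donor_start indexes the G of GT and acceptor_end
    is one past the G of AG, so both dinucleotides land in the flanks.
    """
    left_flank = target_seq[: donor_start + 2]        # ends with GT
    core = target_seq[donor_start + 2 : acceptor_end - 2]
    right_flank = target_seq[acceptor_end - 2 :]      # starts with AG
    return left_flank, core, right_flank


rng = random.Random(0)
intron = "GT" + random_dna(INTRON_LENGTH - 4, rng) + "AG"

# Center the intron in the 1 kb target (hypothetical exon padding).
donor_start = (TARGET_LENGTH - INTRON_LENGTH) // 2
acceptor_end = donor_start + INTRON_LENGTH
target_seq = (
    random_dna(donor_start, rng)
    + intron
    + random_dna(TARGET_LENGTH - acceptor_end, rng)
)

left_flank, core, right_flank = split_target(target_seq, donor_start, acceptor_end)
print(len(left_flank), len(core), len(right_flank))
```

Because the GT and AG sit inside the flanks, any edit applied to core alone cannot disturb the boundary dinucleotides.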
The generator and the SpliceTransformer boundary
A RandomNucleotideGenerator proposes mutations at masked positions. Its MaskingStrategy is set
to num_mutations=1, the exact number of positions to change per proposal, so each step edits a
single base; generator.assign(intron) binds it to the intron core, leaving the GT/AG boundaries
in the flanks untouched. The first constraint, splice_transformer_intron_boundary, concatenates
the three segments into the 1 kb target and runs SpliceTransformer over the target plus its 4 kb
flanks. It reads the donor probability at donor_pos and the acceptor probability at
acceptor_pos; the score it returns is a boundary penalty, 1 - mean(donor, acceptor), computed
from those two probabilities, so a more recognizable donor and acceptor lower the penalty.
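The two ideas in this section, a one-base proposal and the 1 - mean penalty, can be sketched without the library. The function names below are hypothetical, the probabilities are made-up numbers rather than SpliceTransformer output, and the choice to always pick a different base is an assumption about how "exactly one position changes" is realized.

```python
import random


def propose_single_mutation(core, rng):
    """Return a copy of core with exactly one base changed."""
    pos = rng.randrange(len(core))
    # Pick from the three other bases so the proposal always differs.
    new_base = rng.choice([b for b in "ACGT" if b != core[pos]])
    return core[:pos] + new_base + core[pos + 1 :]


def boundary_penalty(donor_prob, acceptor_prob):
    """Penalty 1 - mean(donor, acceptor); lower is better."""
    return 1.0 - (donor_prob + acceptor_prob) / 2.0


rng = random.Random(0)
core = "ACGT" * 10
proposal = propose_single_mutation(core, rng)

# Count positions that differ between the original and the proposal.
changed = sum(a != b for a, b in zip(core, proposal))
print(changed)

# Illustrative probabilities only: weak sites versus strong sites.
weak = boundary_penalty(0.2, 0.4)
strong = boundary_penalty(0.9, 0.95)
print(weak, strong)
```

Stronger donor and acceptor probabilities give the smaller penalty, which is the direction the optimizer minimizes.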
The AlphaGenome splice-site-usage constraint
The second constraint concatenates the same three segments into the target, wraps it with the
cassette_left_context / cassette_right_context, and integrates that cassette into the center of
genomic_context, here the AAVS1 safe-harbor locus read from alphagenome_context_aavs1.txt.
AlphaGenome then predicts splice-site usage, and the constraint reads it at the donor and acceptor
positions in splice_pos for the ontology term CL:0002319 (neural cell). With direction="max"
the score is 1 - mean(usage), so higher predicted usage at those positions lowers the score.
device="cuda" runs the model on the GPU.
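The coordinate bookkeeping behind this constraint can be shown on a toy example. This is a sketch under stated assumptions: integrate_at_center and usage_score are hypothetical helpers, the sequences are tiny placeholders, the usage track is invented rather than predicted by AlphaGenome, and the position convention (each position indexes the G of GT or the G of AG) is assumed.

```python
def integrate_at_center(genomic_context, cassette):
    """Insert cassette at the midpoint of genomic_context.

    Returns the integrated sequence and the insertion offset.
    """
    mid = len(genomic_context) // 2
    return genomic_context[:mid] + cassette + genomic_context[mid:], mid


def usage_score(usage_track, splice_pos):
    """Score 1 - mean(usage) over the splice positions; lower is better."""
    values = [usage_track[p] for p in splice_pos]
    return 1.0 - sum(values) / len(values)


left_context, target, right_context = "AAAA", "CCGTTTAGCC", "GGGG"
cassette = left_context + target + right_context
genome = "N" * 20
integrated, offset = integrate_at_center(genome, cassette)

# Positions in target coordinates shift by the insertion offset plus
# the length of the cassette's left context.
donor_in_target, acceptor_in_target = 2, 7
shift = offset + len(left_context)
splice_pos = [donor_in_target + shift, acceptor_in_target + shift]

# Made-up usage track standing in for a model prediction.
usage_track = [0.0] * len(integrated)
usage_track[splice_pos[0]] = 0.8
usage_track[splice_pos[1]] = 0.6
score = usage_score(usage_track, splice_pos)
print(splice_pos, score)
```

Raising the usage values at the two positions lowers the score, matching the direction="max" behavior described above.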
The search
MCMCOptimizer runs Metropolis-Hastings: at each of num_steps=2 steps it generates a mutated
intron core, scores it with both constraints, and accepts or rejects under a cooling temperature.
num_results=1 and proposals_per_result=1 keep a single trajectory with one proposal per step.
clear_tool_cache=True clears the tool cache each iteration. Because both scorers call GPU tools,
DeviceManager.configure(allow_multiple_per_device=True) permits multiple tool instances on one
device so SpliceTransformer and AlphaGenome can both stay resident, and the ToolInstance.persist()
block auto-caches each tool on first use and reuses the warm worker for the rest of the run,
freeing GPU memory on exit. The Program runs the optimizer and collects the result.
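The accept-or-reject loop can be sketched in a few lines of standard-library Python. This is not MCMCOptimizer itself: toy_energy is a stand-in objective that needs no GPU, the geometric cooling schedule and its temperatures are assumptions (the walkthrough only says the temperature cools), and all function names are hypothetical.

```python
import math
import random


def toy_energy(core):
    """Stand-in objective: fraction of non-T bases (lower is better)."""
    return 1.0 - core.count("T") / len(core)


def mutate(core, rng):
    """Change exactly one base of the core."""
    pos = rng.randrange(len(core))
    new_base = rng.choice([b for b in "ACGT" if b != core[pos]])
    return core[:pos] + new_base + core[pos + 1 :]


def metropolis(core, num_steps, t_start, t_end, rng):
    """Single-trajectory Metropolis search with geometric cooling."""
    current, current_e = core, toy_energy(core)
    best, best_e = current, current_e
    for step in range(num_steps):
        # Temperature decays geometrically from t_start to t_end.
        frac = step / max(num_steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        proposal = mutate(current, rng)
        proposal_e = toy_energy(proposal)
        delta = proposal_e - current_e
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = proposal, proposal_e
        if current_e < best_e:
            best, best_e = current, current_e
    return best, best_e


rng = random.Random(0)
start = "ACGA" * 10
best, best_e = metropolis(start, num_steps=200, t_start=0.1, t_end=0.001, rng=rng)
print(toy_energy(start), best_e)
```

The single-base proposal is symmetric, so the plain Metropolis acceptance rule applies. In the walkthrough the energy would come from the two constraints instead of toy_energy, and num_steps=2 keeps the run short.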
Inspect the result
program.energy_scores reports the final-stage objective energy, where lower values indicate
better solutions; this combines the two constraint scores for the best trajectory.
intron.result_sequences[0].sequence is the designed intron core, the only editable segment; the
print shows its first 60 bases.
Next Steps
Intron Design
The base SpliceTransformer-only intron program.
Cell-Type-Specific DNA
Gradient design against a regulatory-activity predictor.