GT (donor) and AG (acceptor) dinucleotides, embedded
in a 1 kb target window with 4 kb of plasmid context on each side, the input format
the SpliceTransformer model expects. The constraints reward strong donor and acceptor usage and a
tissue-specific splicing preference; an MCMCOptimizer searches the intron core.
The full script optimizes a 301 bp intron in an mScarlet reporter across several plasmid contexts
for thousands of steps. This walkthrough uses a short intron, one context, and a few steps so it
runs quickly. It requires a GPU for SpliceTransformer.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.
Build the splicing window
SpliceTransformer scores each position of a 1 kb target sequence given 4 kb of flanking context on each side (a 9 kb window in total). This cell assembles that input. A fresh intron core is generated with random.choices and wrapped in the fixed GT donor and AG acceptor
dinucleotides, then centered in the TARGET_LENGTH (1000 bp) target with plasmid sequence
filling the remaining positions. donor_pos and acceptor_pos are the zero-indexed positions
the constraints will score: SpliceTransformer scores the donor at the base just before GT and
the acceptor at the base just after AG. random.seed(0) makes the generated core
reproducible.
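The assembly described above can be sketched in plain Python with no dependency on the design library. CORE_LENGTH, the random_dna helper, and the randomly generated stand-ins for the plasmid sequence and flanking context are assumptions made for illustration; the walkthrough's own cell uses the real plasmid.

```python
import random

random.seed(0)  # reproducible core, as in the walkthrough

TARGET_LENGTH = 1000   # 1 kb target scored by the model
CONTEXT_LENGTH = 4000  # 4 kb of context on each side
CORE_LENGTH = 60       # hypothetical short intron core


def random_dna(n):
    """Return n random bases (stand-in for real sequence)."""
    return "".join(random.choices("ACGT", k=n))


# Wrap a fresh core in the fixed donor and acceptor dinucleotides.
core = random_dna(CORE_LENGTH)
intron = "GT" + core + "AG"

# Center the intron in the target; placeholder sequence fills the rest.
left_pad = (TARGET_LENGTH - len(intron)) // 2
right_pad = TARGET_LENGTH - len(intron) - left_pad
target = random_dna(left_pad) + intron + random_dna(right_pad)

# Placeholder flanking context (the real script slices this from the plasmid).
left_context = random_dna(CONTEXT_LENGTH)
right_context = random_dna(CONTEXT_LENGTH)

# Zero-indexed scoring positions inside the target:
donor_pos = left_pad - 1               # base just before GT
acceptor_pos = left_pad + len(intron)  # base just after AG
```

With these values the full model input is 4000 + 1000 + 4000 = 9000 bases, and donor_pos and acceptor_pos sit immediately outside the intron on either side.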
Segments
A Segment is a stretch of sequence; a Construct groups the segments that make up one
molecule. The target is split into three segments, each carrying a starting sequence sliced
from target and sequence_type="dna": a fixed left_flank ending in the GT donor, the
variable intron core, and a fixed right_flank beginning with the AG acceptor. Only the
intron segment is assigned a generator below, so the flanks (and the splice-site
dinucleotides) stay fixed while the core is designed. The label on each segment is the key its
results are filed under.
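The three-way split can be illustrated with ordinary string slicing. This is a sketch of the slice boundaries only: split_target and the toy target are hypothetical, and it does not reproduce the library's Segment or Construct classes.

```python
def split_target(target, donor_pos, acceptor_pos):
    """Split a target into (left_flank, core, right_flank).

    donor_pos is the base just before GT; acceptor_pos is the base
    just after AG (both zero-indexed), as described in the text.
    """
    core_start = donor_pos + 3   # first base after the GT donor
    core_end = acceptor_pos - 2  # index of the A in the AG acceptor
    left_flank = target[:core_start]     # fixed, ends in GT
    core = target[core_start:core_end]   # variable intron core
    right_flank = target[core_end:]      # fixed, begins with AG
    return left_flank, core, right_flank


# Toy 1000 bp target: 468 bp + GT + 60 bp core + AG + 468 bp.
target = "A" * 468 + "GT" + "C" * 60 + "AG" + "T" * 468
donor_pos, acceptor_pos = 467, 532

left_flank, core, right_flank = split_target(target, donor_pos, acceptor_pos)
```

Because the splice-site dinucleotides live in the flanks rather than in the core, any edit restricted to the core leaves GT and AG untouched.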
Generator and splicing constraints
The generator proposes new sequences for the optimizer to score. RandomNucleotideGenerator
substitutes random bases at masked positions; MaskingStrategy(num_mutations=2) sets the exact
number of positions mutated per call to two. generator.assign(intron) binds it to the core
segment, so only the core is mutated. Constraints score how well a sequence meets the design
objective, and the optimizer searches for sequences that raise those scores. Both constraints
here read all three segments through inputs=[left_flank, intron, right_flank], which they
concatenate into the 1 kb target. splice_transformer_intron_boundary scores donor and acceptor
prediction at the donor_pos/acceptor_pos positions; splice_transformer_specificity scores
tissue-specific splice site usage at those same positions, here with tissue="BRAIN" and
direction="max" (maximize brain splicing). Each carries the 4 kb left_context and
right_context SpliceTransformer requires, and a label keying its scores in the result
metadata.
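To make the proposal step concrete, here is a minimal stand-in for a generator that mutates exactly two positions per call. It is not the library's RandomNucleotideGenerator: the mutate function is hypothetical, and forcing each substituted base to differ from the original is an assumption (the library may allow the same base to be redrawn).

```python
import random

BASES = "ACGT"


def mutate(core, num_mutations=2, rng=random):
    """Return a copy of core with num_mutations positions substituted.

    Assumption: each chosen position receives a base different from
    the one it had, so exactly num_mutations positions change.
    """
    positions = rng.sample(range(len(core)), num_mutations)
    bases = list(core)
    for pos in positions:
        alternatives = [b for b in BASES if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


rng = random.Random(0)
core = "ACGT" * 15  # toy 60 bp core
proposal = mutate(core, num_mutations=2, rng=rng)

# Count positions that differ between the core and the proposal.
changed = sum(a != b for a, b in zip(core, proposal))
```

Only the core string is passed in, which mirrors how binding the generator to the intron segment keeps the flanks out of reach of mutation.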
Run the search
The MCMCOptimizer ties the construct, generator, and constraints together and runs
Metropolis-Hastings: at each step it generates proposals, scores them against the two splice
constraints, and accepts or rejects, always keeping improvements and accepting worse proposals
with a probability that falls as the temperature anneals from max_temperature (1.0) to
min_temperature (0.001) over num_steps. Here num_steps=3 runs a brief demonstration; the
intron core evolves while the donor and acceptor sites stay fixed in the flanks. The Program
runs the optimizer and collects the result.
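The accept/reject rule can be sketched independently of the library. Everything here is illustrative: the geometric cooling schedule is an assumption (the text gives only the start and end temperatures), the gc_fraction score is a toy replacement for the splice constraints, and temperature, accept, propose, and anneal are hypothetical helper names.

```python
import math
import random


def temperature(step, num_steps, max_temperature=1.0, min_temperature=0.001):
    """Assumed geometric cooling from max_temperature to min_temperature."""
    if num_steps <= 1:
        return max_temperature
    frac = step / (num_steps - 1)
    return max_temperature * (min_temperature / max_temperature) ** frac


def accept(current_score, proposed_score, temp, rng):
    """Metropolis rule for maximization: keep improvements, and accept
    worse proposals with probability exp(delta / temp)."""
    if proposed_score >= current_score:
        return True
    return rng.random() < math.exp((proposed_score - current_score) / temp)


def gc_fraction(seq):
    """Toy objective standing in for the splicing constraints."""
    return sum(base in "GC" for base in seq) / len(seq)


def propose(seq, rng):
    """Substitute one random position with a random base."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice("ACGT") + seq[pos + 1:]


def anneal(start, score, num_steps, rng):
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for step in range(num_steps):
        temp = temperature(step, num_steps)
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        if accept(current_score, candidate_score, temp, rng):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


rng = random.Random(0)
start = "AT" * 30
best, best_score = anneal(start, gc_fraction, num_steps=200, rng=rng)
```

Early steps, at high temperature, tolerate worse proposals and so explore broadly; by the final steps the acceptance probability for any worse proposal is close to zero and the search behaves like hill climbing.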
Inspect the result
The designed intron core is read back from intron.result_sequences[0], the optimized sequence
for that segment. program.constructs[0].joined_sequences[0] is the full cassette with the
flanks rejoined; its length confirms the assembled target is still 1000 bp, the window
SpliceTransformer expects.
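The same sanity checks can be written as a small helper operating on plain strings. This is a sketch: check_cassette is a hypothetical function, and the inputs are toy strings rather than values read from the library's result objects.

```python
def check_cassette(original_target, left_flank, designed_core, right_flank):
    """Rejoin the pieces and report whether the cassette is still valid."""
    cassette = left_flank + designed_core + right_flank
    return {
        # The rejoined cassette must keep the original target length.
        "length_ok": len(cassette) == len(original_target),
        # Splice-site dinucleotides live in the fixed flanks.
        "donor_ok": left_flank.endswith("GT"),
        "acceptor_ok": right_flank.startswith("AG"),
        # The flanks must match the original target exactly.
        "flanks_unchanged": original_target.startswith(left_flank)
        and original_target.endswith(right_flank),
    }


# Toy example: a 1000 bp target whose 60 bp core was redesigned.
left_flank = "A" * 468 + "GT"
right_flank = "AG" + "T" * 468
original_target = left_flank + "C" * 60 + right_flank
designed_core = "G" * 60

report = check_cassette(original_target, left_flank, designed_core, right_flank)
```

A length mismatch here would indicate that the core changed size during design, which would shift the scored donor and acceptor positions.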
Next Steps
Using Constraints
The splicing constraints used here.
Multi-Stage DNA Optimization
Another multi-segment DNA program.