
This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
Background
RNA splicing removes introns from pre-mRNA and joins exons, guided by sequence motifs at the donor (5’) and acceptor (3’) splice sites. Variants that create or disrupt these motifs can cause exon skipping, intron retention, or cryptic splicing, and are a major and frequently overlooked class of disease-causing mutations. SpliceAI (Jaganathan et al., 2019) is a deep dilated residual convolutional network that reads 10,000 bp of flanking context (5,000 bp per side) and outputs, for every position, the probability of being an acceptor, a donor, or neither.

For variant interpretation, SpliceAI compares predictions for the reference and alternate sequences and reports four delta scores in [0, 1] (acceptor gain DS_AG, acceptor loss DS_AL, donor gain DS_DG, and donor loss DS_DL), together with the delta positions (DP_*) of the affected sites relative to the variant. The maximum delta score is the headline number: the paper characterizes cutoffs of 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision). The shipped model is an ensemble of five models whose per-position outputs are averaged. All variant coordinates follow the 1-based VCF convention.
Learning Resources
- SpliceAI repository (Illumina): the canonical CLI, the Annotator / get_delta_scores Python API, and the bundled GENCODE annotations and ensemble weights.
- Jaganathan et al., 2019 (Cell): the original paper describing the architecture, training data, and clinical validation of delta scores.
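The delta-score definition in the Background can be made concrete with a minimal pure-Python sketch for a single-nucleotide variant. It assumes you already have reference and alternate [neither, acceptor, donor] triples for the 2 × max_distance + 1 positions centred on the variant. For each site type, it takes the position with the largest increase (gain) or decrease (loss) in probability, and reports that change together with its offset from the variant. The helper name delta_scores is hypothetical. Masking and indel alignment, which SpliceAI itself handles, are deliberately left out.

```python
from typing import Dict, List, Sequence

# Channel order of SpliceAI outputs: [neither, acceptor, donor].
ACCEPTOR, DONOR = 1, 2


def delta_scores(ref_probs: List[Sequence[float]],
                 alt_probs: List[Sequence[float]]) -> Dict[str, float]:
    """Delta scores and positions for an SNV (hypothetical helper).

    Both inputs hold one [neither, acceptor, donor] triple per position for
    the 2 * max_distance + 1 positions centred on the variant. Masking and
    indel alignment are deliberately left out.
    """
    if len(ref_probs) != len(alt_probs) or len(ref_probs) % 2 == 0:
        raise ValueError("expected equal, odd-length windows centred on the variant")
    centre = len(ref_probs) // 2
    scores: Dict[str, float] = {}
    for site, channel in (("A", ACCEPTOR), ("D", DONOR)):
        # Positive values mean the alternate allele raises the site probability.
        gain = [alt[channel] - ref[channel] for ref, alt in zip(ref_probs, alt_probs)]
        loss = [-g for g in gain]
        # max() over indices returns the first maximum, like numpy.argmax.
        i_gain = max(range(len(gain)), key=gain.__getitem__)
        i_loss = max(range(len(loss)), key=loss.__getitem__)
        scores[f"DS_{site}G"] = gain[i_gain]
        scores[f"DS_{site}L"] = loss[i_loss]
        # Delta positions are offsets from the variant (negative = earlier in the window).
        scores[f"DP_{site}G"] = i_gain - centre
        scores[f"DP_{site}L"] = i_loss - centre
    return scores
```

The maximum of the four DS_* values is the headline max delta score.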
Tools
SpliceAI Variant Scoring (spliceai-score)
Scores genetic variants (chromosome / 1-based position / ref / alt) for splice-altering effects, returning per-gene delta scores and delta positions for acceptor and donor gain/loss. Requires a reference genome FASTA and a gene annotation (the bundled grch37/grch38, or a custom file).
API Reference
Input: SpliceAIScoreInput
Config: SpliceAIScoreConfig
- None raises a MissingAssetError so un-provisioned hosts skip cleanly.
- annotation: 'grch37' or 'grch38' (GENCODE files bundled with SpliceAI) or a path to a custom tab-separated annotation file.
- max_distance: mirrors the SpliceAI -D flag.
- mask: mirrors the SpliceAI -M flag.
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal, and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: SpliceAIScoreOutput
Metrics (per results item)
| Metric | Type | Range | Availability |
|---|---|---|---|
| max_delta_score | float | 0.0 to 1.0 | present for scored variants overlapping an annotated gene |
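The same max_delta_score is what you would compute from the SpliceAI CLI's VCF INFO annotation. Its pipe-separated field order is ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL, and multiple allele/gene entries are separated by commas.

The minimal sketch below parses that string and buckets the result by the published cutoffs. A few details are assumptions rather than documented contracts:
- parse_spliceai_info and tier are hypothetical helpers.
- Treating "." as an unscored entry is our understanding of the CLI output.
- Using >= at each cutoff is a choice the paper does not pin down.

The example string uses placeholder gene names, not real data.

```python
from typing import Dict, List, Union

# Field order of the SpliceAI CLI's INFO annotation.
FIELDS = ["ALLELE", "SYMBOL", "DS_AG", "DS_AL", "DS_DG", "DS_DL",
          "DP_AG", "DP_AL", "DP_DG", "DP_DL"]
DS_FIELDS, DP_FIELDS = FIELDS[2:6], FIELDS[6:]

Value = Union[str, float, int, None]


def parse_spliceai_info(value: str) -> List[Dict[str, Value]]:
    """One dict per comma-separated allele/gene entry (hypothetical helper)."""
    records: List[Dict[str, Value]] = []
    for entry in value.split(","):
        parts = entry.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(f"unexpected SpliceAI entry: {entry!r}")
        rec: Dict[str, Value] = {"ALLELE": parts[0], "SYMBOL": parts[1]}
        # '.' is assumed to mark an unscored entry; keep it as None.
        for key, raw in zip(DS_FIELDS, parts[2:6]):
            rec[key] = None if raw == "." else float(raw)
        for key, raw in zip(DP_FIELDS, parts[6:]):
            rec[key] = None if raw == "." else int(raw)
        ds = [rec[k] for k in DS_FIELDS if rec[k] is not None]
        rec["max_delta_score"] = max(ds) if ds else None
        records.append(rec)
    return records


def tier(score: float) -> str:
    """Bucket a max delta score by the published cutoffs (>= is assumed)."""
    if score >= 0.8:
        return "high precision"
    if score >= 0.5:
        return "recommended"
    if score >= 0.2:
        return "high recall"
    return "below cutoffs"


recs = parse_spliceai_info("T|GENE1|0.01|0.00|0.91|0.08|-28|-46|-2|-31")
print(recs[0]["max_delta_score"], tier(recs[0]["max_delta_score"]))
```

Filtering a cohort at the recommended cutoff is then a comparison against 0.5 on each record's max_delta_score.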
Applications
Use this to triage candidate variants from a sequencing study for splicing impact, to annotate a VCF with SpliceAI predictions, or to prioritize variants of uncertain significance where a coding effect is absent but a splicing effect is plausible. The max_delta_score metric supports threshold-based filtering at the published 0.2 / 0.5 / 0.8 cutoffs (high recall / recommended / high precision).
Usage Tips
- reference_fasta is required and position is 1-based. SpliceAI extracts the wild-type window around each variant from the genome you supply, so the FASTA, the annotation, and each variant's chromosome must use consistent identifiers. Note that this differs from AlphaGenome, whose coordinates are 0-based.
- annotation selects the gene model. grch37 and grch38 load the GENCODE files bundled with SpliceAI; pass a path to score against a custom tab-separated annotation. Changing it restarts the worker.
- max_distance (default 50) and mask mirror the SpliceAI -D / -M flags. Widen max_distance to report splice sites farther from the variant; enable mask to suppress scores for annotated-gain and unannotated-loss positions.
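The coordinate tips above come down to a few index conversions. The sketch below shows them on a plain Python string:
- converting a 1-based VCF position to a 0-based string index, and converting 0-based coordinates back;
- checking that the FASTA base matches the ref allele;
- computing the window scored around the variant.

The window width (context + 2 × max_distance + 1, with context = 10,000) reflects our reading of SpliceAI's get_delta_scores and is an assumption, not a documented contract. All function names are hypothetical.

```python
from typing import Tuple


def vcf_to_index(position: int) -> int:
    """1-based VCF position -> 0-based Python string index."""
    if position < 1:
        raise ValueError("VCF positions start at 1")
    return position - 1


def zero_based_to_vcf(pos0: int) -> int:
    """0-based coordinate (e.g. from a 0-based tool) -> 1-based VCF position."""
    return pos0 + 1


def check_ref(chrom_seq: str, position: int, ref: str) -> bool:
    """True if the FASTA base(s) at the 1-based position match the ref allele."""
    i = vcf_to_index(position)
    return chrom_seq[i:i + len(ref)].upper() == ref.upper()


def window_bounds(position: int, max_distance: int = 50,
                  context: int = 10_000) -> Tuple[int, int]:
    """Half-open 0-based [start, end) of the window centred on the variant (assumed width)."""
    half = (context + 2 * max_distance + 1) // 2
    centre = vcf_to_index(position)
    return centre - half, centre + half + 1


seq = "ACGTACGTAC"
print(check_ref(seq, 3, "G"), window_bounds(10, max_distance=1, context=4))
```

A ref mismatch from check_ref is the usual symptom of mixing 0-based and 1-based positions or pairing a variant with the wrong genome build.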
SpliceAI Splice-Site Prediction (spliceai-predict)
Predicts per-position [neither, acceptor, donor] probabilities directly from one or more DNA sequences. No reference genome is needed: the model runs on the sequence as given, padding 5,000 bp of context per side internally.
API Reference
Input: SpliceAIPredictInput
Config: SpliceAIPredictConfig
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal, and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: SpliceAIPredictOutput
Each position is a [neither, acceptor, donor] probability triple. The outer length and order match the input sequences; each inner length equals the corresponding input sequence's length.
Applications
Use this to scan an engineered construct, a minigene, or a transcript for latent splice sites, to visualize the acceptor/donor probability landscape across a region of interest, or to compare splice-site usage between designed sequence variants without assembling a genome and annotation.
Usage Tips
- Output channels are [neither, acceptor, donor]. Index channel 1 for acceptor and channel 2 for donor probabilities; each per-sequence array has the same length as the corresponding input sequence.
- Sequences may differ in length. They are scored independently (per-item caching applies), so batching ragged sequences is fine; very short sequences still receive the full 10,000 bp of N-padded context.
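The padding and channel layout described above can be illustrated in pure Python. This is a minimal sketch under assumptions:
- The one-hot order is A, C, G, T.
- N (or any other character) encodes as all zeros, which is our understanding of SpliceAI's own encoder.
- Each side gets 5,000 N's of padding.

The model call itself is omitted. The probability triples in the usage line are made-up illustrative values. pad_and_encode, splice_sites, and the 0.5 default threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

CONTEXT = 10_000  # total flanking context, split evenly between both sides
_ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
            "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def pad_and_encode(seq: str, context: int = CONTEXT) -> List[Tuple[int, int, int, int]]:
    """N-pad each side by context // 2 and one-hot encode; unknown bases become zeros."""
    padded = "N" * (context // 2) + seq.upper() + "N" * (context // 2)
    return [_ONE_HOT.get(base, (0, 0, 0, 0)) for base in padded]


def splice_sites(probs: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> Dict[str, List[int]]:
    """0-based positions in the input sequence whose acceptor/donor probability reaches threshold."""
    return {
        "acceptor": [i for i, p in enumerate(probs) if p[1] >= threshold],  # channel 1
        "donor": [i for i, p in enumerate(probs) if p[2] >= threshold],     # channel 2
    }


print(splice_sites([[0.9, 0.05, 0.05], [0.1, 0.85, 0.05], [0.2, 0.1, 0.7]]))
```

Because padding is added per sequence, the predictions trimmed back to the input length line up one-to-one with the input bases, which is why ragged batches need no extra bookkeeping.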
Toolkit Notes
These apply to both SpliceAI tools in this toolkit (spliceai-score, spliceai-predict).
- Runs on GPU or CPU via TensorFlow. SpliceAI is the only TensorFlow tool in the catalog; the standalone env pins TensorFlow 2.15 (Keras 2) so the bundled .h5 models load, which constrains the runtime to Python 3.11. TensorFlow falls back to CPU automatically when no GPU is visible.
- Weights and annotations ship with the package. The five ensemble models and the GENCODE grch37/grch38 annotations are bundled with pip install spliceai, so no weight download is needed. The reference genome FASTA for spliceai-score is user-supplied at call time.
- Non-commercial license. SpliceAI's code is PolyForm Strict and its bundled models are CC-BY-NC-4.0, both noncommercial, so the toolkit is not hostable on Proto; commercial use requires a license from Illumina.
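Because the weights ship inside the package, loading them needs no download. The sketch below makes these assumptions:
- The bundled files sit at models/spliceai1.h5 through models/spliceai5.h5 inside the spliceai package, the layout shown in the SpliceAI README.
- The TensorFlow 2.15 / Keras 2 env described above is installed.

model_files, load_bundled_models, and ensemble_average are hypothetical helpers, and only ensemble_average runs without TensorFlow installed. The averaging step matches the ensemble description in the Background: each position's triple is the mean over the five models.

```python
from importlib import resources
from typing import List, Sequence


def model_files(n: int = 5) -> List[str]:
    """File names of the bundled ensemble members (layout per the SpliceAI README)."""
    return [f"spliceai{i}.h5" for i in range(1, n + 1)]


def load_bundled_models():
    """Load the five .h5 models shipped inside the spliceai package."""
    import tensorflow as tf  # lazy import: the other helpers run without TensorFlow

    # An empty device list means TensorFlow will run on the CPU.
    if not tf.config.list_physical_devices("GPU"):
        print("No GPU visible; TensorFlow will use the CPU.")
    root = resources.files("spliceai") / "models"
    return [tf.keras.models.load_model(str(root / name), compile=False)
            for name in model_files()]


def ensemble_average(outputs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average [neither, acceptor, donor] triples over models, position by position."""
    n = len(outputs)
    # zip(*outputs) walks positions; zip(*rows) walks the three channels.
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*outputs)]
```

Keeping the TensorFlow import inside load_bundled_models means the averaging logic can be checked on plain lists without the pinned env.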