License: AlphaGenome code is released under Apache-2.0, and the model weights are released under a custom license (AlphaGenome Terms of Use). The terms restrict commercial use and may require explicit attribution. Model weights are gated and require accepting the provider’s terms and authenticating with a Hugging Face token. Please refer to the code license and model weights license for full terms.

Proto is not affiliated with Google DeepMind. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


google-deepmind/alphagenome_research
Research code accompanying AlphaGenome
701 stars
google/alphagenome-all-folds
Advancing regulatory variant effect prediction with AlphaGenome
Žiga Avsec, Natasha Latysheva, … Pushmeet Kohli
Nature (2026)
@article{avsec2026alphagenome,
  title={Advancing regulatory variant effect prediction with AlphaGenome},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and Thomas, Raina and Dutordoir, Vincent and Perino, Matteo and De, Soham and Karollus, Alexander and Gayoso, Adam and Sargeant, Toby and Mottram, Anne and Wong, Lai Hong and Drot{\'a}r, Pavol and Kosiorek, Adam and Senior, Andrew and Tanburn, Richard and Applebaum, Taylor and Basu, Souradeep and Hassabis, Demis and Kohli, Pushmeet},
  journal={Nature},
  year={2026},
  volume={649},
  number={8099},
  pages={1206--1218},
  doi={10.1038/s41586-025-10014-0}
}
proto-bio/proto-tools/proto_tools/tools/sequence_scoring/alphagenome
| Function | Description |
| --- | --- |
| run_alphagenome_predict_intervals() | Predict genomic signals for batched intervals using AlphaGenome (GPU) |
| run_alphagenome_predict_sequences() | Predict genomic signals from batched raw DNA sequences using AlphaGenome (GPU) |
| run_alphagenome_predict_variants() | Predict variant effects in batch using AlphaGenome (GPU) |
| run_alphagenome_score_intervals() | Score genomic intervals in batch with AlphaGenome interval scorers (GPU) |
| run_alphagenome_score_ism_variants_batch() | Run batched in-silico mutagenesis with AlphaGenome variant scorers (GPU) |
| run_alphagenome_score_variants() | Score variant effects in batch with AlphaGenome variant scorers (GPU) |

Background

Gene regulation is encoded in non-coding DNA through cis-regulatory elements such as promoters, enhancers, and insulators, whose activity depends on sequence context that can extend across hundreds of kilobases. Relating a DNA sequence, or a non-coding genetic variant, to its functional consequences therefore requires models that read long stretches of sequence and predict many regulatory readouts together. AlphaGenome (Avsec et al., 2026) is a sequence-to-function model that accepts a genomic interval of up to roughly one megabase and predicts thousands of genome tracks at base or near-base resolution. The predicted assays span RNA-seq coverage, CAGE and PRO-cap transcription initiation, ATAC-seq and DNase-seq chromatin accessibility, ChIP-seq profiles for histone modifications and transcription factors, splice site positions, splice site usage and junctions, and chromatin contact maps. Because the model scores arbitrary sequence, the effect of a variant can be estimated by comparing predictions for the reference and alternate alleles, which supports interpretation of non-coding variation and systematic in silico mutagenesis of regulatory regions. Models are available for both the human and mouse genomes.

Tools

Predict Intervals (alphagenome-predict-intervals)

Predicts base-resolution regulatory signal tracks for one or more genomic intervals specified by chromosome and coordinates.

API Reference

intervals (List[AlphaGenomeInterval], required): Genomic intervals to predict. A single interval is auto-wrapped into a list.

model_version (string, default: "all_folds"): AlphaGenome Hugging Face model version.
requested_outputs (List[string], default: ['RNA_SEQ']): Output type names to request.
ontology_terms (array): Optional ontology term filters.
organism (enum, default: "human"): Organism for predictions. Available options: human, mouse.
verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default: "cuda"): Device to run inference on.
timeout (integer, default: 1800): Maximum execution time in seconds. AlphaGenome JAX compilation is slow on first run. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

results (List[AlphaGenomePredictOutput], required): Per-interval prediction outputs.

Applications

This tool surveys predicted chromatin accessibility, expression, transcription factor binding, and histone marks across a locus of interest, and supports comparison of predicted regulatory activity between cell types or tissues by restricting the prediction to chosen ontology terms. The resulting profiles also serve as references that later variant or mutagenesis analyses can be compared against.

Usage Tips

  • The requested outputs are configurable. Any combination of the available output types may be requested together in a single call.
  • Center the feature of interest. The model has the most flanking context in both directions at the center of the requested interval, so predictions are best supported there.
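The centering tip can be sketched with a small helper. The function below is hypothetical and not part of the toolkit. It assumes 0-based, half-open coordinates and a window length equal to one of the supported context lengths listed under Toolkit Notes, and it makes no attempt to handle the far end of a chromosome.

```python
# Hypothetical helper (not a toolkit function): place a feature at the
# center of a fixed-length window using 0-based, half-open coordinates.
def centered_interval(feature_pos: int, length: int) -> tuple[int, int]:
    start = feature_pos - length // 2
    if start < 0:
        # Assumption: windows may not extend before the first base.
        raise ValueError("window would start before the chromosome begins")
    return start, start + length


# A feature at position 1,000,000 inside a 131,072 bp window.
start, end = centered_interval(feature_pos=1_000_000, length=131_072)
print(start, end, end - start)
```

The returned pair can then be used as the start and end of the interval passed to the tool, so that the feature has equal flanking context on both sides.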

Predict Sequences (alphagenome-predict-sequences)

Predicts the same regulatory signal tracks directly from raw DNA sequences rather than from genome coordinates.

API Reference

sequences (List[string], required): Raw DNA sequences (A/C/G/T/N characters). A single string is auto-wrapped into a list. Each sequence must match a supported context length.

model_version (string, default: "all_folds"): AlphaGenome Hugging Face model version.
requested_outputs (List[string], default: ['RNA_SEQ']): Output type names to request.
ontology_terms (array): Optional ontology term filters.
organism (enum, default: "human"): Organism for predictions. Available options: human, mouse.
verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default: "cuda"): Device to run inference on.
timeout (integer, default: 1800): Maximum execution time in seconds. AlphaGenome JAX compilation is slow on first run. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

results (List[AlphaGenomePredictOutput], required): Per-sequence prediction outputs.

Applications

This tool scores synthetic or edited sequences, such as designed promoters and enhancers, that do not correspond to a reference genome position, and it is well suited to evaluating candidate sequences from a generative model before committing to laboratory synthesis.

Usage Tips

  • Raw sequences are not resized. Each sequence must already be exactly one of the supported context lengths.
  • Only DNA bases are accepted. Sequences may contain only the bases A, C, G, T, and N.
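Both tips can be checked locally before submitting a batch. The sketch below is a hypothetical pre-flight check, not a toolkit function. It takes the context lengths from the Toolkit Notes on this page and assumes that only uppercase characters are accepted, which the page does not state explicitly.

```python
# Context lengths as listed under Toolkit Notes on this page.
SUPPORTED_LENGTHS = (16_384, 131_072, 524_288, 1_048_576)
# Assumption: uppercase only; lowercase bases are reported as unexpected.
ALLOWED_BASES = frozenset("ACGTN")


def sequence_problems(seq: str) -> list[str]:
    """Return readable problems; an empty list means both checks pass."""
    problems = []
    if len(seq) not in SUPPORTED_LENGTHS:
        problems.append(f"length {len(seq)} is not a supported context length")
    unexpected = sorted(set(seq) - ALLOWED_BASES)
    if unexpected:
        problems.append(f"unexpected characters: {''.join(unexpected)}")
    return problems


print(sequence_problems("ACGT"))          # too short
print(sequence_problems("A" * 16_384))    # passes both checks
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a large batch of designed sequences at once.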

Predict Variants (alphagenome-predict-variants)

Predicts regulatory signal tracks for both the reference and alternate alleles of a variant within its surrounding interval.

API Reference

variants (List[AlphaGenomeVariant], required): Variants to predict. A single variant is auto-wrapped into a list.

model_version (string, default: "all_folds"): AlphaGenome Hugging Face model version.
requested_outputs (List[string], default: ['RNA_SEQ']): Output type names to request.
ontology_terms (array): Optional ontology term filters.
organism (enum, default: "human"): Organism for predictions. Available options: human, mouse.
verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default: "cuda"): Device to run inference on.
timeout (integer, default: 1800): Maximum execution time in seconds. AlphaGenome JAX compilation is slow on first run. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

results (List[AlphaGenomePredictOutput], required): Per-variant prediction outputs.

Applications

This tool shows how a non-coding variant reshapes predicted accessibility, expression, or splicing across a region, and it reveals the spatial extent of a variant’s predicted effect rather than reducing it to a single number.

Usage Tips

  • The variant must lie within the interval. A wider interval captures more of the distal regulatory consequences of the variant.
  • Use variant scoring for a ranked summary. When only effect-size magnitudes are needed, the variant scoring operation is more direct than reading the raw tracks.
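The containment requirement can be verified before a run. The helper below is a hypothetical sketch, not a toolkit function. It assumes the variant start is already expressed in the same 0-based, half-open coordinates as the interval; a 1-based position, such as one read from a VCF file, would need 1 subtracted first. The fields of AlphaGenomeVariant are not described on this page, so plain integers and strings are used instead.

```python
# Hypothetical check: does the reference allele of a variant fall entirely
# inside a 0-based, half-open interval?
def variant_in_interval(variant_start: int, ref: str,
                        interval_start: int, interval_end: int) -> bool:
    # The reference allele occupies [variant_start, variant_start + len(ref)).
    variant_end = variant_start + len(ref)
    return interval_start <= variant_start and variant_end <= interval_end


# A single-base variant at the last position of the interval is inside it,
# but a two-base reference allele starting there spills over the end.
print(variant_in_interval(16_383, "A", 0, 16_384))
print(variant_in_interval(16_383, "AT", 0, 16_384))
```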

Score Variants (alphagenome-score-variants)

Summarizes variant effects into per-track records using the model’s recommended variant scorers, comparing reference and alternate predictions.

API Reference

variants (List[AlphaGenomeVariant], required): Variants to score. A single variant is auto-wrapped into a list.

model_version (string, default: "all_folds"): AlphaGenome Hugging Face model version.
variant_scorers (array): Scorer names from the library’s RECOMMENDED_VARIANT_SCORERS. None uses all recommended.
organism (enum, default: "human"): Organism for predictions. Available options: human, mouse.
verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default: "cuda"): Device to run inference on.
timeout (integer, default: 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

results (List[AlphaGenomeScoreOutput], required): Per-variant score outputs.

Applications

This tool prioritizes candidate causal variants from a fine-mapping or genome-wide association study and annotates lists of non-coding variants with predicted effects across many assays at once.

Usage Tips

  • variant_scorers defaults to the full recommended set. Leaving it unset gives broad coverage but takes longer, while naming individual scorers restricts the analysis to the assays that matter for the question.
  • The _ACTIVE suffix changes what is measured. Standard scorers report the directional change between alleles (the log fold change of the alternate against the reference), whereas _ACTIVE scorers report the absolute activity level of the stronger allele and are non-directional.

Score Intervals (alphagenome-score-intervals)

Produces gene-level RNA-seq expression scores summarizing predicted activity across one or more genomic intervals.

API Reference

intervals (List[AlphaGenomeInterval], required): Genomic intervals to score. A single interval is auto-wrapped into a list.

model_version (string, default: "all_folds"): AlphaGenome Hugging Face model version.
interval_scorers (array): Scorer names from the library’s RECOMMENDED_INTERVAL_SCORERS. None uses all recommended.
organism (enum, default: "human"): Organism for predictions. Available options: human, mouse.
verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default: "cuda"): Device to run inference on.
timeout (integer, default: 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

results (List[AlphaGenomeScoreOutput], required): Per-interval score outputs.

Applications

This tool estimates predicted expression for genes overlapping a region of interest and compares predicted activity across a panel of intervals on a common scale.

Usage Tips

  • Interval scoring is gene-centric. It depends on a wide region of context, so intervals must be large enough to contain the surrounding gene. Very short intervals are not suitable.

Score ISM Variants Batch (alphagenome-score-ism-variants-batch)

Performs in silico mutagenesis by scoring every single-base substitution across a chosen sub-window and returning the effects as per-track records.

API Reference

requests (List[AlphaGenomeISM], required): ISM requests to process. A single request is auto-wrapped into a list.

model_version (string, default: "all_folds"): AlphaGenome Hugging Face model version.
variant_scorers (array): Scorer names from the library’s RECOMMENDED_VARIANT_SCORERS. None uses all recommended.
organism (enum, default: "human"): Organism for predictions. Available options: human, mouse.
verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device (string, default: "cuda"): Device to run inference on.
timeout (integer, default: 600): Maximum execution time in seconds. None waits indefinitely.
seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

results (List[AlphaGenomeScoreOutput], required): Per-request score outputs.

Applications

This tool maps which exact positions within a promoter or enhancer drive a predicted regulatory signal and produces per-base importance profiles suitable for visualization as sequence logos.

Usage Tips

  • Keep the mutagenesis window narrow. The sub-window must lie fully inside the surrounding interval, and because the number of scored substitutions grows with its width, windows of tens to low hundreds of bases keep each run tractable.
  • An existing variant can be applied first. A known variant may be set before mutagenesis to explore how it changes the local sensitivity landscape.
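The cost of widening the window follows from simple counting: every position with an A, C, G, or T reference base has three possible substitutions. The generator below is an illustrative sketch of that enumeration, not the toolkit's implementation. It works on a plain string, with the sub-window given as 0-based, half-open offsets into that string.

```python
from typing import Iterator

BASES = "ACGT"


def single_base_substitutions(
    seq: str, window_start: int, window_end: int
) -> Iterator[tuple[int, str, str]]:
    """Yield (position, ref_base, alt_base) for every substitution in the window."""
    if not (0 <= window_start <= window_end <= len(seq)):
        raise ValueError("sub-window must lie fully inside the sequence")
    for pos in range(window_start, window_end):
        ref = seq[pos]
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt


seq = "ACGTACGTAC"
subs = list(single_base_substitutions(seq, 2, 6))
print(len(subs))  # 4 positions x 3 alternates = 12
```

By the same count, a 100-base window yields 300 substitutions to score, which is why narrow windows keep each run tractable.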

Toolkit Notes

  • The model weights are gated. Running any tool requires accepting the AlphaGenome Terms of Use and authenticating with a Hugging Face access token, and use is restricted to non-commercial scientific research.
  • Coordinates are 0-based and half-open. Genomic intervals follow the BED convention, where the start is inclusive and the end is exclusive, so the length of an interval is its end minus its start.
  • Context lengths are fixed. AlphaGenome operates at 16,384, 131,072, 524,288, and 1,048,576 base pairs. Intervals that do not already match one of these lengths are centered and resized up to the smallest supported length that contains them, and longer windows capture more distal regulation at higher compute cost.
  • Output can be filtered by tissue and organism. The prediction tools restrict their tracks to particular cell types or tissues through UBERON ontology terms, and every tool runs against either the human or mouse genome through the organism setting.
Example notebook: See the full working example for a copy-paste-ready walkthrough.
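The resizing rule for context lengths can be sketched as follows. This is a hypothetical re-implementation for illustration, not the toolkit's code. The rounding of the center for odd widths and the behavior near chromosome boundaries are assumptions, and the result may start before position 0 for intervals close to the beginning of a chromosome.

```python
# Context lengths from the Toolkit Notes, in ascending order.
SUPPORTED_LENGTHS = (16_384, 131_072, 524_288, 1_048_576)


def resize_to_supported(start: int, end: int) -> tuple[int, int]:
    """Center a 0-based, half-open interval in the smallest supported length."""
    width = end - start
    if width <= 0:
        raise ValueError("end must be greater than start")
    for length in SUPPORTED_LENGTHS:
        if length >= width:
            break
    else:
        raise ValueError(f"width {width} exceeds the largest supported length")
    center = (start + end) // 2          # assumption: round down for odd widths
    new_start = center - length // 2
    return new_start, new_start + length


# A 50,000 bp interval grows to 131,072 bp around the same center.
print(resize_to_supported(1_000_000, 1_050_000))
# An interval that already matches a supported length is left unchanged.
print(resize_to_supported(0, 16_384))
```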

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.