AlphaGenome - Proto

License: AlphaGenome code is released under Apache-2.0. The model weights are released under a custom license (the AlphaGenome Terms of Use), which restricts commercial use and may require explicit attribution. The weights are gated: you must accept the provider’s terms and authenticate with a HuggingFace token. See the code license and the model weights license for the full terms.

Proto is not affiliated with Google DeepMind. This toolkit is open source and builds on Google DeepMind's AlphaGenome implementation. Product names, logos, and trademarks are the property of their respective owners.

google-deepmind/alphagenome_research

Research code accompanying AlphaGenome

701 stars

google/alphagenome-all-folds

Advancing regulatory variant effect prediction with AlphaGenome

Žiga Avsec, Natasha Latysheva, … Pushmeet Kohli

Nature (2026)

@article{avsec2026alphagenome,
  title={Advancing regulatory variant effect prediction with AlphaGenome},
  author={Avsec, {\v{Z}}iga and Latysheva, Natasha and Cheng, Jun and Novati, Guido and Taylor, Kyle R and Ward, Tom and Bycroft, Clare and Nicolaisen, Lauren and Arvaniti, Eirini and Pan, Joshua and Thomas, Raina and Dutordoir, Vincent and Perino, Matteo and De, Soham and Karollus, Alexander and Gayoso, Adam and Sargeant, Toby and Mottram, Anne and Wong, Lai Hong and Drot{\'a}r, Pavol and Kosiorek, Adam and Senior, Andrew and Tanburn, Richard and Applebaum, Taylor and Basu, Souradeep and Hassabis, Demis and Kohli, Pushmeet},
  journal={Nature},
  year={2026},
  volume={649},
  number={8099},
  pages={1206--1218},
  doi={10.1038/s41586-025-10014-0}
}

proto-bio/proto-tools/proto_tools/tools/sequence_scoring/alphagenome

Function	Description
`run_alphagenome_predict_intervals()`	Predict genomic signals for batched intervals using AlphaGenome (GPU)
`run_alphagenome_predict_sequences()`	Predict genomic signals from batched raw DNA sequences using AlphaGenome (GPU)
`run_alphagenome_predict_variants()`	Predict variant effects in batch using AlphaGenome (GPU)
`run_alphagenome_score_intervals()`	Score genomic intervals in batch with AlphaGenome interval scorers (GPU)
`run_alphagenome_score_ism_variants_batch()`	Run batched in-silico mutagenesis with AlphaGenome variant scorers (GPU)
`run_alphagenome_score_variants()`	Score variant effects in batch with AlphaGenome variant scorers (GPU)

Background

Gene regulation is encoded in non-coding DNA through cis-regulatory elements such as promoters, enhancers, and insulators, whose activity depends on sequence context that can extend across hundreds of kilobases. Relating a DNA sequence, or a non-coding genetic variant, to its functional consequences therefore requires models that read long stretches of sequence and predict many regulatory readouts together. AlphaGenome (Avsec et al., 2026) is a sequence-to-function model that accepts a genomic interval of up to roughly one megabase and predicts thousands of genome tracks at base or near-base resolution. The predicted assays span RNA-seq coverage, CAGE and PRO-cap transcription initiation, ATAC-seq and DNase-seq chromatin accessibility, ChIP-seq profiles for histone modifications and transcription factors, splice site positions, splice site usage and junctions, and chromatin contact maps. Because the model scores arbitrary sequence, the effect of a variant can be estimated by comparing predictions for the reference and alternate alleles, which supports interpretation of non-coding variation and systematic in silico mutagenesis of regulatory regions. Models are available for both the human and mouse genomes.

Learning Resources

AlphaGenome overview (Google DeepMind) - the official project page summarizing what AlphaGenome does, how to access it, and its model terms.
AlphaGenome: AI for better understanding the genome (Google DeepMind) - the announcement blog post introducing the model, its capabilities, and intended research uses.
AlphaGenome research code (GitHub) - the reference client and model code, with example notebooks for prediction, variant scoring, and in silico mutagenesis.
AlphaGenome model weights (HuggingFace) - the gated model card describing the released checkpoints and their terms of use.

Tools

Predict Intervals (`alphagenome-predict-intervals`)

Predicts base-resolution regulatory signal tracks for one or more genomic intervals specified by chromosome and coordinates.

API Reference

Input: AlphaGenomePredictIntervalsInput

intervals

List[AlphaGenomeInterval]

required

Genomic intervals to predict. A single interval is auto-wrapped into a list.

AlphaGenomeInterval fields:

chromosome

string

required

Chromosome identifier, e.g. 'chr1'.

interval_start

integer

required

Interval start (0-based, inclusive).

interval_end

integer

required

Interval end (0-based, exclusive).

Config: AlphaGenomePredictConfig

model_version

string

default:"all_folds"

AlphaGenome Hugging Face model version.

requested_outputs

List[string]

default:"['RNA_SEQ']"

Output type names to request.

ontology_terms

array

Optional ontology term filters.

organism

enum

default:"human"

Organism for predictions. Available options: human, mouse.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run inference on.

timeout

integer

default:"1800"

Maximum execution time in seconds. AlphaGenome JAX compilation is slow on first run. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: AlphaGenomePredictIntervalsOutput

results

List[AlphaGenomePredictOutput]

required

Per-interval prediction outputs.

AlphaGenomePredictOutput fields:

chromosome

string

required

Chromosome identifier.

interval_start

integer

required

Interval start (0-based).

interval_end

integer

required

Interval end (0-based, exclusive).

requested_outputs

List[string]

required

Output types requested.

result

Dict[string, any]

required

Serialized AlphaGenome prediction payload.

variant

Dict[string, any]

Variant metadata (variant predictions only).

tool_id

string

Unique tool identifier (e.g., "blast-search").

execution_time

number

Execution time in seconds.

timestamp

string

Execution timestamp.

success

boolean

Whether execution succeeded. True for any successful call; False only when the tool failed and PROTO_CAPTURE_ERRORS=1 is set. On the default raise path, failures raise an exception instead of returning an output. See notes/error-handling.md.

warnings

List[string]

Non-fatal warnings generated during execution.

errors

List[string]

Fatal error messages. Populated only when the tool failed and the wrapper is in capture mode; empty on success and on the default raise path. Each entry is "TypeName: message" followed by the formatted traceback. See notes/error-handling.md.

metadata

Dict[string, any]

Additional tool-specific metadata.

Applications

This tool surveys predicted chromatin accessibility, expression, transcription factor binding, and histone marks across a locus of interest, and supports comparison of predicted regulatory activity between cell types or tissues by restricting the prediction to chosen ontology terms. The resulting profiles also serve as references that later variant or mutagenesis analyses can be compared against.
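As a minimal sketch, the helper below bundles the configuration fields documented above (`requested_outputs`, `ontology_terms`, `organism`) into a plain dict and sanity-checks them before a call. The set of output names is an assumption modelled on the assays listed in the Background section and on AlphaGenome's output-type naming, not an official list, and `build_predict_config` is hypothetical rather than part of the toolkit.

```python
import re

# Assumed output type names (modelled on AlphaGenome's output types); not an official list.
KNOWN_OUTPUTS = frozenset({
    "ATAC", "CAGE", "DNASE", "RNA_SEQ", "CHIP_HISTONE", "CHIP_TF", "PROCAP",
    "SPLICE_SITES", "SPLICE_SITE_USAGE", "SPLICE_JUNCTIONS", "CONTACT_MAPS",
})
# Ontology CURIEs look like "UBERON:0002048" (lung): an uppercase prefix, a colon, digits.
CURIE = re.compile(r"^[A-Z]+:\d+$")


def build_predict_config(outputs, ontology_terms=None, organism="human"):
    """Hypothetical helper: validate and bundle prediction config fields."""
    unknown = [name for name in outputs if name not in KNOWN_OUTPUTS]
    if unknown:
        raise ValueError(f"unknown output types: {unknown}")
    terms = list(ontology_terms or [])
    malformed = [term for term in terms if not CURIE.match(term)]
    if malformed:
        raise ValueError(f"malformed ontology terms: {malformed}")
    if organism not in ("human", "mouse"):
        raise ValueError("organism must be 'human' or 'mouse'")
    config = {"requested_outputs": list(outputs), "organism": organism}
    if terms:  # omit the filter entirely when no terms are given
        config["ontology_terms"] = terms
    return config


print(build_predict_config(["RNA_SEQ", "DNASE"], ["UBERON:0002048"]))
```

Leaving out `ontology_terms` keeps every track for the requested outputs; adding terms narrows the prediction to the matching cell types or tissues.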

Usage Tips

The requested outputs are configurable. Any combination of the available output types may be requested together in a single call.
Center the feature of interest. The model has the most flanking context in both directions at the center of the requested interval, so predictions are best supported there.
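Centering can be sketched as follows. The sketch assumes the feature is a single 0-based coordinate, such as a transcription start site, and that the window length is one of the context lengths listed in the Toolkit Notes. `centered_interval` is a hypothetical helper: it returns only a dict shaped like `AlphaGenomeInterval` and does not check the chromosome length.

```python
SUPPORTED_LENGTHS = (16_384, 131_072, 524_288, 1_048_576)


def centered_interval(chromosome, center, length=131_072):
    """Return a 0-based, half-open window of `length` bp with `center` at its midpoint."""
    if length not in SUPPORTED_LENGTHS:
        raise ValueError(f"length must be one of {SUPPORTED_LENGTHS}")
    start = center - length // 2
    if start < 0:
        raise ValueError("window would extend past the chromosome start")
    # The chromosome end is not checked here; that would require chromosome sizes.
    return {"chromosome": chromosome, "interval_start": start, "interval_end": start + length}


window = centered_interval("chr1", 1_000_000, length=16_384)
print(window)
```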

Predict Sequences (`alphagenome-predict-sequences`)

Predicts the same regulatory signal tracks directly from raw DNA sequences rather than from genome coordinates.

API Reference

Input: AlphaGenomePredictSequencesInput

sequences

List[string]

required

Raw DNA sequences (A/C/G/T/N characters). A single string is auto-wrapped into a list. Each sequence must match a supported context length.

Config: AlphaGenomePredictConfig

model_version

string

default:"all_folds"

AlphaGenome Hugging Face model version.

requested_outputs

List[string]

default:"['RNA_SEQ']"

Output type names to request.

ontology_terms

array

Optional ontology term filters.

organism

enum

default:"human"

Organism for predictions. Available options: human, mouse.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run inference on.

timeout

integer

default:"1800"

Maximum execution time in seconds. AlphaGenome JAX compilation is slow on first run. None waits indefinitely.

seed

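integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.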

Output: AlphaGenomePredictSequencesOutput

results

List[AlphaGenomePredictOutput]

required

Per-sequence prediction outputs.

AlphaGenomePredictOutput fields:

chromosome

string

required

Chromosome identifier.

interval_start

integer

required

Interval start (0-based).

interval_end

integer

required

Interval end (0-based, exclusive).

requested_outputs

List[string]

required

Output types requested.

result

Dict[string, any]

required

Serialized AlphaGenome prediction payload.

variant

Dict[string, any]

Variant metadata (variant predictions only).

tool_id

string

Unique tool identifier (e.g., "blast-search").

execution_time

number

Execution time in seconds.

timestamp

string

Execution timestamp.

success

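boolean

Whether execution succeeded. True for any successful call; False only when the tool failed and PROTO_CAPTURE_ERRORS=1 is set. On the default raise path, failures raise an exception instead of returning an output. See notes/error-handling.md.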

warnings

List[string]

Non-fatal warnings generated during execution.

errors

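List[string]

Fatal error messages. Populated only when the tool failed and the wrapper is in capture mode; empty on success and on the default raise path. Each entry is "TypeName: message" followed by the formatted traceback. See notes/error-handling.md.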

metadata

Dict[string, any]

Additional tool-specific metadata.

Applications

This tool scores synthetic or edited sequences, such as designed promoters and enhancers, that do not correspond to a reference genome position, and it is well suited to evaluating candidate sequences from a generative model before committing to laboratory synthesis.

Usage Tips

Raw sequences are not resized. Each sequence must already be exactly one of the supported context lengths.
Only DNA bases are accepted. Sequences may contain only the bases A, C, G, T, and N.
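Because raw sequences are not resized, a shorter designed element has to be embedded in a full-length input before it is submitted. The sketch below assumes that symmetric padding with `N` (unknown base) is acceptable for the analysis, which is a modelling choice rather than something this page specifies. `pad_to_supported_length` is hypothetical, and it uppercases its input before checking the alphabet.

```python
SUPPORTED_LENGTHS = (16_384, 131_072, 524_288, 1_048_576)
VALID_BASES = frozenset("ACGTN")


def pad_to_supported_length(sequence):
    """Uppercase, validate, and N-pad `sequence` symmetrically to the smallest supported length."""
    sequence = sequence.upper()
    invalid = set(sequence) - VALID_BASES
    if invalid:
        raise ValueError(f"invalid characters: {sorted(invalid)}")
    target = next((n for n in SUPPORTED_LENGTHS if n >= len(sequence)), None)
    if target is None:
        raise ValueError("sequence is longer than the largest supported context")
    pad = target - len(sequence)
    left = pad // 2  # any odd base of padding goes on the right
    return "N" * left + sequence + "N" * (pad - left)


padded = pad_to_supported_length("gattaca")
print(len(padded), padded[8_185:8_198])
```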

Predict Variants (`alphagenome-predict-variants`)

Predicts regulatory signal tracks for both the reference and alternate alleles of a variant within its surrounding interval.

API Reference

Input: AlphaGenomePredictVariantsInput

variants

List[AlphaGenomeVariant]

required

Variants to predict. A single variant is auto-wrapped into a list.

AlphaGenomeVariant fields:

variant_position

integer

required

Variant genomic position (0-based).

reference_bases

string

required

Reference allele, e.g. 'A' or 'AC'.

alternate_bases

string

required

Alternate allele, e.g. 'G' or 'GTT'.

chromosome

string

required

Chromosome identifier, e.g. 'chr1'.

interval_start

integer

required

Interval start (0-based, inclusive).

interval_end

integer

required

Interval end (0-based, exclusive).

Config: AlphaGenomePredictConfig

model_version

string

default:"all_folds"

AlphaGenome Hugging Face model version.

requested_outputs

List[string]

default:"['RNA_SEQ']"

Output type names to request.

ontology_terms

array

Optional ontology term filters.

organism

enum

default:"human"

Organism for predictions. Available options: human, mouse.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run inference on.

timeout

integer

default:"1800"

Maximum execution time in seconds. AlphaGenome JAX compilation is slow on first run. None waits indefinitely.

seed

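integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.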

Output: AlphaGenomePredictVariantsOutput

results

List[AlphaGenomePredictOutput]

required

Per-variant prediction outputs.

AlphaGenomePredictOutput fields:

chromosome

string

required

Chromosome identifier.

interval_start

integer

required

Interval start (0-based).

interval_end

integer

required

Interval end (0-based, exclusive).

requested_outputs

List[string]

required

Output types requested.

result

Dict[string, any]

required

Serialized AlphaGenome prediction payload.

variant

Dict[string, any]

Variant metadata (variant predictions only).

tool_id

string

Unique tool identifier (e.g., "blast-search").

execution_time

number

Execution time in seconds.

timestamp

string

Execution timestamp.

success

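boolean

Whether execution succeeded. True for any successful call; False only when the tool failed and PROTO_CAPTURE_ERRORS=1 is set. On the default raise path, failures raise an exception instead of returning an output. See notes/error-handling.md.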

warnings

List[string]

Non-fatal warnings generated during execution.

errors

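List[string]

Fatal error messages. Populated only when the tool failed and the wrapper is in capture mode; empty on success and on the default raise path. Each entry is "TypeName: message" followed by the formatted traceback. See notes/error-handling.md.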

metadata

Dict[string, any]

Additional tool-specific metadata.

Applications

This tool shows how a non-coding variant reshapes predicted accessibility, expression, or splicing across a region, and it reveals the spatial extent of a variant’s predicted effect rather than reducing it to a single number.
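One way to turn reference and alternate tracks into a spatial extent is to find the outermost bins where the two alleles differ by more than a threshold. The sketch assumes both tracks have already been extracted from the serialized `result` payload as equal-length lists of per-bin values for a single track (the payload layout is not documented here). It also assumes `resolution` is the bin width in base pairs. `effect_extent` and its default threshold are illustrative only.

```python
def effect_extent(ref_track, alt_track, interval_start, resolution=1, threshold=0.1):
    """Return the half-open genomic span [start, end) where |alt - ref| > threshold, or None."""
    if len(ref_track) != len(alt_track):
        raise ValueError("reference and alternate tracks must have the same length")
    hits = [i for i, (ref, alt) in enumerate(zip(ref_track, alt_track)) if abs(alt - ref) > threshold]
    if not hits:
        return None
    # Convert the first and last affected bins back to genomic coordinates.
    return interval_start + hits[0] * resolution, interval_start + (hits[-1] + 1) * resolution


ref = [0.0, 0.0, 0.0, 0.0, 0.0]
alt = [0.0, 0.5, 0.0, 0.3, 0.0]
print(effect_extent(ref, alt, interval_start=1_000, resolution=128))
```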

Usage Tips

The variant must lie within the interval. A wider interval captures more of the distal regulatory consequences of the variant.
Use variant scoring for a ranked summary. When only effect-size magnitudes are needed, the variant scoring operation is more direct than reading the raw tracks.
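Variants taken from a VCF need converting before they are passed in, because the VCF `POS` column is 1-based. The sketch assumes `variant_position` follows the 0-based convention stated in its field description, and it checks that the whole reference allele lies inside the half-open interval. `vcf_to_variant` is hypothetical and does not verify the reference allele against the genome sequence.

```python
def vcf_to_variant(chrom, pos, ref, alt, interval_start, interval_end):
    """Build an AlphaGenomeVariant-shaped dict from a 1-based VCF record."""
    variant_position = pos - 1  # VCF POS is 1-based; the documented field is 0-based
    if not (interval_start <= variant_position and variant_position + len(ref) <= interval_end):
        raise ValueError("variant does not lie within [interval_start, interval_end)")
    return {
        "chromosome": chrom,
        "variant_position": variant_position,
        "reference_bases": ref,
        "alternate_bases": alt,
        "interval_start": interval_start,
        "interval_end": interval_end,
    }


print(vcf_to_variant("chr1", 65_537, "A", "G", 0, 131_072))
```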

Score Variants (`alphagenome-score-variants`)

Summarizes variant effects into per-track records using the model’s recommended variant scorers, comparing reference and alternate predictions.

API Reference

Input: AlphaGenomeScoreVariantsInput

variants

List[AlphaGenomeVariant]

required

Variants to score. A single variant is auto-wrapped into a list.

AlphaGenomeVariant fields:

variant_position

integer

required

Variant genomic position (0-based).

reference_bases

string

required

Reference allele, e.g. 'A' or 'AC'.

alternate_bases

string

required

Alternate allele, e.g. 'G' or 'GTT'.

chromosome

string

required

Chromosome identifier, e.g. 'chr1'.

interval_start

integer

required

Interval start (0-based, inclusive).

interval_end

integer

required

Interval end (0-based, exclusive).

Config: AlphaGenomeScoreVariantsConfig

model_version

string

default:"all_folds"

AlphaGenome Hugging Face model version.

variant_scorers

array

Scorer names from the library’s RECOMMENDED_VARIANT_SCORERS. None uses all recommended.

organism

enum

default:"human"

Organism for predictions. Available options: human, mouse.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run inference on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

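integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.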

Output: AlphaGenomeScoreVariantsOutput

results

List[AlphaGenomeScoreOutput]

required

Per-variant score outputs.

AlphaGenomeScoreOutput fields:

scores

List[Dict[string, any]]

required

Tidy score records. Each dict contains keys such as variant_id, scored_interval, gene_id, gene_name, output_type, variant_scorer or interval_scorer, track_name, raw_score, etc.

tool_id

string

Unique tool identifier (e.g., "blast-search").

execution_time

number

Execution time in seconds.

timestamp

string

Execution timestamp.

success

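boolean

Whether execution succeeded. True for any successful call; False only when the tool failed and PROTO_CAPTURE_ERRORS=1 is set. On the default raise path, failures raise an exception instead of returning an output. See notes/error-handling.md.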

warnings

List[string]

Non-fatal warnings generated during execution.

errors

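List[string]

Fatal error messages. Populated only when the tool failed and the wrapper is in capture mode; empty on success and on the default raise path. Each entry is "TypeName: message" followed by the formatted traceback. See notes/error-handling.md.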

metadata

Dict[string, any]

Additional tool-specific metadata.

Applications

This tool prioritizes candidate causal variants from a fine-mapping or genome-wide association study and annotates lists of non-coding variants with predicted effects across many assays at once.
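Here is a minimal sketch of prioritization over the tidy score records. For each `variant_id`, it keeps the largest absolute `raw_score` and then sorts. It assumes the records have been gathered from every `results[i].scores` list and that `output_type` holds a plain string. Restricting the ranking to one output type or one scorer matters, because different scorers measure different quantities and their raw scores are not on a common scale. `rank_variants` and the sample records are hypothetical.

```python
from collections import defaultdict


def rank_variants(records, output_type=None, top_k=10):
    """Rank variant_ids by their largest absolute raw_score, optionally within one output type."""
    best = defaultdict(float)
    for rec in records:
        if output_type is not None and rec.get("output_type") != output_type:
            continue
        variant_id = str(rec["variant_id"])
        best[variant_id] = max(best[variant_id], abs(rec["raw_score"]))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]


records = [
    {"variant_id": "v1", "output_type": "RNA_SEQ", "track_name": "t1", "raw_score": 0.2},
    {"variant_id": "v1", "output_type": "RNA_SEQ", "track_name": "t2", "raw_score": -0.9},
    {"variant_id": "v2", "output_type": "RNA_SEQ", "track_name": "t1", "raw_score": 0.5},
    {"variant_id": "v3", "output_type": "ATAC", "track_name": "t3", "raw_score": 2.0},
]
print(rank_variants(records, output_type="RNA_SEQ"))
```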

Usage Tips

variant_scorers defaults to the full recommended set. Leaving it unset gives broad coverage but takes longer, while naming individual scorers restricts the analysis to the assays that matter for the question.
The _ACTIVE suffix changes what is measured. Standard scorers report the directional change between alleles (the log fold change of the alternate against the reference), whereas _ACTIVE scorers report the absolute activity level of the stronger allele and are non-directional.
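The difference between the two kinds of scorer shows up on a single pair of aggregated signals. This is a simplified sketch. The base-2 logarithm and the pseudocount are assumptions, and the real scorers first aggregate predictions over a window or gene before they compare alleles. Both functions are hypothetical illustrations, not the library's scorers.

```python
import math


def directional_score(ref_signal, alt_signal, pseudocount=1e-3):
    """Signed log2 fold change of the alternate allele against the reference."""
    return math.log2((alt_signal + pseudocount) / (ref_signal + pseudocount))


def active_score(ref_signal, alt_signal):
    """Activity of the stronger allele; swapping the alleles leaves it unchanged."""
    return max(ref_signal, alt_signal)


# Swapping alleles flips the sign of the directional score but not the active score.
print(directional_score(2.0, 4.0, pseudocount=0.0), directional_score(4.0, 2.0, pseudocount=0.0))  # 1.0 -1.0
print(active_score(2.0, 4.0), active_score(4.0, 2.0))  # 4.0 4.0
```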

Score Intervals (`alphagenome-score-intervals`)

Produces gene-level RNA-seq expression scores summarizing predicted activity across one or more genomic intervals.

API Reference

Input: AlphaGenomeScoreIntervalsInput

intervals

List[AlphaGenomeInterval]

required

Genomic intervals to score. A single interval is auto-wrapped into a list.

AlphaGenomeInterval fields:

chromosome

string

required

Chromosome identifier, e.g. 'chr1'.

interval_start

integer

required

Interval start (0-based, inclusive).

interval_end

integer

required

Interval end (0-based, exclusive).

Config: AlphaGenomeScoreIntervalsConfig

model_version

string

default:"all_folds"

AlphaGenome Hugging Face model version.

interval_scorers

array

Scorer names from the library’s RECOMMENDED_INTERVAL_SCORERS. None uses all recommended.

organism

enum

default:"human"

Organism for predictions. Available options: human, mouse.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run inference on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

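integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.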

Output: AlphaGenomeScoreIntervalsOutput

results

List[AlphaGenomeScoreOutput]

required

Per-interval score outputs.

AlphaGenomeScoreOutput fields:

scores

List[Dict[string, any]]

required

Tidy score records. Each dict contains keys such as variant_id, scored_interval, gene_id, gene_name, output_type, variant_scorer or interval_scorer, track_name, raw_score, etc.

tool_id

string

Unique tool identifier (e.g., "blast-search").

execution_time

number

Execution time in seconds.

timestamp

string

Execution timestamp.

success

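boolean

Whether execution succeeded. True for any successful call; False only when the tool failed and PROTO_CAPTURE_ERRORS=1 is set. On the default raise path, failures raise an exception instead of returning an output. See notes/error-handling.md.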

warnings

List[string]

Non-fatal warnings generated during execution.

errors

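List[string]

Fatal error messages. Populated only when the tool failed and the wrapper is in capture mode; empty on success and on the default raise path. Each entry is "TypeName: message" followed by the formatted traceback. See notes/error-handling.md.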

metadata

Dict[string, any]

Additional tool-specific metadata.

Applications

This tool estimates predicted expression for genes overlapping a region of interest and compares predicted activity across a panel of intervals on a common scale.
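Pivoting the tidy records into a gene-by-interval table makes it easier to compare intervals on a common scale. This is a minimal sketch that uses only the record keys listed in the output description (`gene_name`, `scored_interval`, `track_name`, `raw_score`). Averaging over tracks when no `track_name` is given is an assumption. `gene_by_interval` and the gene names in the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gene_by_interval(records, track_name=None):
    """Pivot tidy records into {gene_name: {scored_interval: mean raw_score}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if track_name is not None and rec.get("track_name") != track_name:
            continue
        # str() makes the interval usable as a key whatever form it was serialized in.
        grouped[rec["gene_name"]][str(rec["scored_interval"])].append(rec["raw_score"])
    return {
        gene: {interval: mean(scores) for interval, scores in by_interval.items()}
        for gene, by_interval in grouped.items()
    }


records = [
    {"gene_name": "GENE_A", "scored_interval": "chr1:0-131072", "track_name": "t1", "raw_score": 1.0},
    {"gene_name": "GENE_A", "scored_interval": "chr1:0-131072", "track_name": "t2", "raw_score": 3.0},
    {"gene_name": "GENE_A", "scored_interval": "chr1:131072-262144", "track_name": "t1", "raw_score": 0.5},
]
print(gene_by_interval(records))
```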

Usage Tips

Interval scoring is gene-centric. It depends on a wide region of context, so intervals must be large enough to contain the surrounding gene. Very short intervals are not suitable.

Score ISM Variants Batch (`alphagenome-score-ism-variants-batch`)

Performs in silico mutagenesis by scoring every single-base substitution across a chosen sub-window and returning the effects as per-track records.

API Reference

Input: AlphaGenomeScoreISMInput

requests

List[AlphaGenomeISM]

required

ISM requests to process. A single request is auto-wrapped into a list.

AlphaGenomeISM fields:

ism_interval_start

integer

required

ISM sub-interval start (0-based, inclusive).

ism_interval_end

integer

required

ISM sub-interval end (0-based, exclusive).

variant_position

integer

Optional existing variant position to apply before ISM (0-based).

reference_bases

string

Optional existing variant ref allele.

alternate_bases

string

Optional existing variant alt allele.

chromosome

string

required

Chromosome identifier, e.g. 'chr1'.

interval_start

integer

required

Interval start (0-based, inclusive).

interval_end

integer

required

Interval end (0-based, exclusive).

Config: AlphaGenomeScoreVariantsConfig

model_version

string

default:"all_folds"

AlphaGenome Hugging Face model version.

variant_scorers

array

Scorer names from the library’s RECOMMENDED_VARIANT_SCORERS. None uses all recommended.

organism

enum

default:"human"

Organism for predictions. Available options: human, mouse.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run inference on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

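integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.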

Output: AlphaGenomeScoreISMOutput

results

List[AlphaGenomeScoreOutput]

required

Per-request score outputs.

AlphaGenomeScoreOutput fields:

scores

List[Dict[string, any]]

required

Tidy score records. Each dict contains keys such as variant_id, scored_interval, gene_id, gene_name, output_type, variant_scorer or interval_scorer, track_name, raw_score, etc.

tool_id

string

Unique tool identifier (e.g., "blast-search").

execution_time

number

Execution time in seconds.

timestamp

string

Execution timestamp.

success

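boolean

Whether execution succeeded. True for any successful call; False only when the tool failed and PROTO_CAPTURE_ERRORS=1 is set. On the default raise path, failures raise an exception instead of returning an output. See notes/error-handling.md.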

warnings

List[string]

Non-fatal warnings generated during execution.

errors

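List[string]

Fatal error messages. Populated only when the tool failed and the wrapper is in capture mode; empty on success and on the default raise path. Each entry is "TypeName: message" followed by the formatted traceback. See notes/error-handling.md.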

metadata

Dict[string, any]

Additional tool-specific metadata.

Applications

This tool maps which exact positions within a promoter or enhancer drive a predicted regulatory signal and produces per-base importance profiles suitable for visualization as sequence logos.
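Per-substitution ISM records can be collapsed into a per-position profile, as sketched below. The sketch assumes each record's `variant_id` renders as `chrom:position:REF>ALT`, which is a hypothetical format, so inspect real records first. It also assumes a directional scorer, where a positive score means the alternate allele raises the signal. It follows one common logo convention: each position's reference base receives the negated mean effect of its substitutions. `ism_importance` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def ism_importance(records, track_name):
    """Return {position: (reference_base, importance)} for one track, sorted by position."""
    effects = defaultdict(list)
    ref_bases = {}
    for rec in records:
        if rec.get("track_name") != track_name:
            continue
        _chrom, position, change = str(rec["variant_id"]).split(":")  # assumed "chr:pos:REF>ALT"
        ref, _alt = change.split(">")
        effects[int(position)].append(rec["raw_score"])
        ref_bases[int(position)] = ref
    # Importance = -mean(alt effect): positive when mutating the reference base lowers the signal.
    return {pos: (ref_bases[pos], -mean(scores)) for pos, scores in sorted(effects.items())}


records = [
    {"variant_id": "chr1:100:A>C", "track_name": "t", "raw_score": -0.5},
    {"variant_id": "chr1:100:A>G", "track_name": "t", "raw_score": -0.25},
    {"variant_id": "chr1:100:A>T", "track_name": "t", "raw_score": -0.75},
    {"variant_id": "chr1:101:C>A", "track_name": "t", "raw_score": 0.0},
    {"variant_id": "chr1:101:C>G", "track_name": "t", "raw_score": 0.5},
    {"variant_id": "chr1:101:C>T", "track_name": "t", "raw_score": 0.25},
]
print(ism_importance(records, "t"))
```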

Usage Tips

Keep the mutagenesis window narrow. The sub-window must lie fully inside the surrounding interval, and because the number of scored substitutions grows with its width, windows of tens to low hundreds of bases keep each run tractable.
An existing variant can be applied first. A known variant may be set before mutagenesis to explore how it changes the local sensitivity landscape.
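The cost of a run scales with the number of substitutions. Each non-`N` position in the window has three alternatives, so a 200 bp window yields up to 600 scored variants. The sketch below checks the window bounds and enumerates the substitutions in the 0-based, half-open convention used by the input fields. Both helpers are hypothetical and only illustrate the bookkeeping, not how the toolkit schedules the work.

```python
def check_ism_window(interval_start, interval_end, ism_start, ism_end):
    """Validate the ISM sub-window and return an upper bound on substitutions (3 per base)."""
    if not (interval_start <= ism_start < ism_end <= interval_end):
        raise ValueError("ISM window must be non-empty and lie inside the interval")
    return 3 * (ism_end - ism_start)


def ism_substitutions(reference_window, window_start):
    """Yield (position, ref, alt) for every single-base substitution; N positions are skipped."""
    for offset, ref in enumerate(reference_window.upper()):
        if ref not in "ACGT":
            continue
        for alt in "ACGT":
            if alt != ref:
                yield window_start + offset, ref, alt


print(check_ism_window(0, 131_072, 65_436, 65_636))
print(list(ism_substitutions("ACN", 500)))
```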

Toolkit Notes

The model weights are gated. Running any tool requires accepting the AlphaGenome Terms of Use and authenticating with a HuggingFace access token, and use is restricted to non-commercial scientific research.
Coordinates are 0-based and half-open. Genomic intervals follow the BED convention used throughout genome browsers, where the start is inclusive and the end is exclusive.
Context lengths are fixed. AlphaGenome operates at 16,384, 131,072, 524,288, and 1,048,576 base pairs. Intervals that do not already match one of these lengths are centered and resized up to the smallest supported length that contains them, and longer windows capture more distal regulation at higher compute cost.
Output can be filtered by tissue and organism. The prediction tools can restrict their tracks to particular cell types or tissues through ontology terms (for example UBERON tissue terms), and every tool runs against either the human or the mouse genome through the organism setting.
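The resizing rule for context lengths can be sketched as follows. The sketch assumes the toolkit keeps the interval's integer midpoint fixed, since the exact rounding it uses is not documented here. `resize_interval` is hypothetical. It raises an error for intervals wider than 1,048,576 bp. Near a chromosome start the resized window can begin before position 0, and a real pipeline would need to handle that case.

```python
SUPPORTED_LENGTHS = (16_384, 131_072, 524_288, 1_048_576)


def resize_interval(start, end):
    """Centre the half-open [start, end) in the smallest supported window containing it."""
    width = end - start
    if width <= 0:
        raise ValueError("interval must be non-empty")
    target = next((n for n in SUPPORTED_LENGTHS if n >= width), None)
    if target is None:
        raise ValueError("interval is wider than the largest supported context")
    center = (start + end) // 2
    new_start = center - target // 2
    return new_start, new_start + target


print(resize_interval(1_000_000, 1_200_000))
```

An interval that already has a supported length comes back unchanged. For example, `resize_interval(100_000, 116_384)` returns `(100_000, 116_384)`.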

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
