Proto is not affiliated with Google DeepMind. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
AlphaMissense (Cheng et al., 2023) is a deep-learning model that scores the pathogenicity of human missense variants. It is adapted from AlphaFold and fine-tuned on human and primate population variant frequencies, treating variants common in healthy populations as benign and rare variants as putatively pathogenic. For each canonical UniProt sequence it scores all 19 alternate amino acids at every position, covering every possible single missense substitution. Its classification thresholds are set to a cutoff that reaches about 90% precision on ClinVar variants. The paper reports that the model classifies 89% of all 71 million possible human missense variants, labeling 32% likely pathogenic and 57% likely benign at the default thresholds. The predictions are not computed at query time. They are precomputed by Google DeepMind and distributed as static CSV files by the AlphaFold Protein Structure Database, maintained by EMBL-EBI, keyed by UniProt accession at https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-{suffix}.csv.
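To make the "19 alternate amino acids at every position" grid concrete, the sketch below enumerates every single missense substitution for a sequence. The function name and the (position, reference, alternate) tuple layout are illustrative assumptions and are not part of the toolkit.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_grid(sequence):
    """Return every (position, ref, alt) single substitution, 1-based.

    Hypothetical helper for illustration only; it mirrors the shape of the
    protein-coordinate grid, not the toolkit's actual record type.
    """
    grid = []
    for position, ref in enumerate(sequence.upper(), start=1):
        for alt in AMINO_ACIDS:
            if alt != ref:  # skip the reference residue itself
                grid.append((position, ref, alt))
    return grid


# A 4-residue toy sequence yields 4 * 19 substitutions.
toy_grid = saturation_grid("MEEP")
print(len(toy_grid))
```

By the same arithmetic, a 393-residue protein such as TP53 gives 393 * 19 = 7,467 rows, which matches the roughly 7,500 rows quoted for the protein-coordinate grid.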
Internally, the tool strips and uppercases the supplied accession, builds the AlphaFold DB CSV URL for the requested coordinate system, and issues a single HTTP GET. The uniprot coordinate system fetches the aa-substitutions CSV, which holds the full protein-coordinate grid of every possible substitution. The hg19 and hg38 coordinate systems fetch the genomic CSVs, which cover only substitutions reachable by a single-nucleotide change (a single-nucleotide variant, SNV) and additionally carry chromosome, position, reference allele, alternate allele, and GENCODE transcript identifier. Each CSV row is parsed into one prediction record, with the genomic fields populated only in genomic mode. A 404 response means the accession is not covered and surfaces as a clear error. Predictions reflect the fixed CSV snapshot published by AlphaFold DB rather than a value recomputed per request.
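The flow described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the toolkit's implementation: the `hg19` and `hg38` file suffixes, the CSV column names (`protein_variant`, `am_pathogenicity`, `am_class`, `CHROM`, `POS`, `REF`, `ALT`, `transcript_id`), the `M1A`-style variant notation, and the `Prediction` record are all assumed for the example. Only the URL template and the `aa-substitutions` suffix come from the text.

```python
import csv
import io
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import List, Optional

URL_TEMPLATE = "https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-{suffix}.csv"

# "aa-substitutions" is named in the text; the genomic suffixes are assumed.
SUFFIX_BY_COORDINATE_SYSTEM = {
    "uniprot": "aa-substitutions",
    "hg19": "hg19",
    "hg38": "hg38",
}


@dataclass
class Prediction:  # hypothetical record type
    position: int
    ref_aa: str
    alt_aa: str
    score: float
    classification: str
    chrom: Optional[str] = None
    pos: Optional[int] = None
    ref: Optional[str] = None
    alt: Optional[str] = None
    transcript_id: Optional[str] = None


def build_csv_url(accession: str, coordinate_system: str = "uniprot") -> str:
    """Strip and uppercase the accession, then fill in the URL template."""
    if coordinate_system not in SUFFIX_BY_COORDINATE_SYSTEM:
        raise ValueError(f"unknown coordinate system: {coordinate_system!r}")
    return URL_TEMPLATE.format(
        accession=accession.strip().upper(),
        suffix=SUFFIX_BY_COORDINATE_SYSTEM[coordinate_system],
    )


def fetch_csv_text(accession: str, coordinate_system: str = "uniprot",
                   timeout: float = 30.0) -> str:
    """Issue a single HTTP GET; turn a 404 into a readable lookup error."""
    url = build_csv_url(accession, coordinate_system)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            raise LookupError(f"no AlphaMissense CSV at {url}") from exc
        raise


def parse_predictions(csv_text: str, genomic: bool = False) -> List[Prediction]:
    """Parse one record per CSV row (column names are assumptions)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        variant = row["protein_variant"]  # assumed notation, e.g. "M1A"
        record = Prediction(
            position=int(variant[1:-1]),
            ref_aa=variant[0],
            alt_aa=variant[-1],
            score=float(row["am_pathogenicity"]),
            classification=row["am_class"],
        )
        if genomic:  # genomic fields are populated only in genomic mode
            record.chrom = row["CHROM"]
            record.pos = int(row["POS"])
            record.ref = row["REF"]
            record.alt = row["ALT"]
            record.transcript_id = row["transcript_id"]
        records.append(record)
    return records
```

Keeping URL construction separate from the network call makes the normalization step easy to check offline, for example `build_csv_url(" p04637 ")` for the TP53 accession.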
Learning Resources
- AlphaFold DB FAQ (EMBL-EBI) - official documentation covering the AlphaMissense CSV files, coverage, and the genomic and protein coordinate variants.
- AlphaMissense GitHub (Google DeepMind) - the official repository with usage notes for the released code and prediction tables.
Tools
AlphaMissense DB Fetch (alphamissense-db-fetch)
Retrieves the complete AlphaMissense prediction set for a single human UniProt accession and returns every per-substitution prediction, the prediction count, the mean pathogenicity score, and the source CSV URL. The coordinate_system configuration selects the protein-coordinate grid of every possible substitution or one of the genomic-coordinate tables, which are limited to substitutions reachable by a single-nucleotide change. The genomic tables additionally populate chromosome, position, reference allele, alternate allele, and transcript identifier on each prediction.
API Reference
Input: AlphaMissenseDBFetchInput
Config: AlphaMissenseDBFetchConfig
"uniprot" (default) returns the full protein-coordinate saturation grid (~7,500 rows for TP53). "hg19" / "hg38" return SNV-accessible substitutions in genomic coordinates (~2,500 rows for TP53) and populate chrom / pos / ref / alt / transcript_id on each prediction; a protein variant reachable by multiple SNVs appears multiple times.
Available options: uniprot, hg19, hg38
True is coerced to 1 and False to 0.
None waits indefinitely.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: AlphaMissenseDBFetchOutput
sequence_length * 19 for UniProt-coordinate fetches).
predictions is empty.
Applications
Use this to pull model-based missense pathogenicity into a pipeline: triage missense variants of uncertain significance from clinical sequencing, prioritize candidate disease-causing variants from case cohorts, avoid disruptive substitutions during sequence design or optimization, or apply a per-residue pathogenicity penalty in a generative-design loop. The accession can come from the UniProt tool, which resolves a gene symbol and organism to a canonical reviewed human accession. The same accession also drives the AlphaFold DB tool, aligning per-residue pathogenicity scores with predicted backbone coordinates.
Usage Tips
- The tool always returns the entire prediction set. There is no server-side filtering on the static CSV. Filter the returned predictions list in your own code by position, score, or classification.
- Cache the output once per accession. A typical protein has roughly 7,000 to 20,000 substitution rows. Fetch once and reuse the result rather than refetching inside tight loops.
- Group by position for hotspot analysis. mean_pathogenicity over a wide region is a coarse summary. Inspect predictions grouped by residue position to surface hotspots.
- Coverage is human canonical isoforms only. Non-canonical isoforms and non-human accessions return a 404 and surface as a clear error. Resolve the accession with the UniProt tool first if the organism is uncertain.
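The filtering, caching, and grouping tips above can be sketched as follows. The example assumes each prediction is available as a plain dict with `position` and `score` keys, and that `fetch` is any callable taking an accession and a coordinate system; these shapes, the helper names, and the 0.8 threshold are illustrative assumptions rather than the toolkit's actual types or defaults.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_position(predictions):
    """Group scores by residue position and average each group."""
    by_position = defaultdict(list)
    for prediction in predictions:
        by_position[prediction["position"]].append(prediction["score"])
    return {pos: mean(scores) for pos, scores in sorted(by_position.items())}


def hotspot_positions(predictions, threshold=0.8):
    """Positions whose mean score meets an (assumed) threshold."""
    averages = mean_score_by_position(predictions)
    return [pos for pos, avg in averages.items() if avg >= threshold]


def make_cached_fetcher(fetch):
    """Wrap a fetch callable so each accession is downloaded only once."""
    cache = {}

    def cached(accession, coordinate_system="uniprot"):
        key = (accession.strip().upper(), coordinate_system)
        if key not in cache:
            cache[key] = fetch(*key)
        return cache[key]

    return cached


# Client-side filtering on the full prediction list (toy data).
toy = [
    {"position": 1, "score": 0.9},
    {"position": 1, "score": 0.8},
    {"position": 2, "score": 0.1},
]
high_scoring = [p for p in toy if p["score"] >= 0.5]
print(hotspot_positions(toy, threshold=0.5))
```

Normalizing the accession inside the cache key means differently cased or padded spellings of the same accession share one cached result.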
Toolkit Notes
These apply to every AlphaMissense DB tool in this toolkit (alphamissense-db-fetch).
- Requires network access. The tool downloads the AlphaMissense CSV from the AlphaFold Protein Structure Database. It does not run offline and keeps no local copy of the predictions.
- Subject to EMBL-EBI fair use. The CSV is an anonymous static download from AlphaFold DB with no API key or account. Observe the EMBL-EBI terms of use and space out high-volume requests.
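One simple way to space out high-volume requests is to enforce a minimum interval between consecutive downloads. The sketch below is a hypothetical helper, and the one-second default is an arbitrary assumption rather than a limit published by EMBL-EBI.

```python
import time


class MinIntervalThrottle:
    """Block until at least `min_interval_s` has passed since the last call."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the behavior can be tested
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            remaining = self._min_interval_s - elapsed
            if remaining > 0:
                self._sleep(remaining)
        # Record the time after any sleep, just before the caller proceeds.
        self._last_call = self._clock()
```

Call `wait()` immediately before each download when looping over many accessions; the first call returns at once and later calls pause only as long as needed.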