Proto is not affiliated with Google DeepMind or EMBL-EBI. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.
Background
The AlphaFold Protein Structure Database (AFDB) (Varadi et al., 2022) is a freely accessible archive of protein structures predicted by AlphaFold2 (Jumper et al., 2021), maintained by Google DeepMind and EMBL-EBI. It hosts predicted atomic coordinates for the UniProt reference proteomes. Each entry carries a per-residue confidence score (pLDDT, 0 to 100) and a pairwise predicted aligned error (pAE) matrix in angstroms. AFDB hosts AlphaFold2 single-chain predictions only. Multi-chain complexes are produced by separate pipelines and are not part of this database.
Internally, the tool issues a GET request to the AFDB prediction endpoint at https://alphafold.ebi.ac.uk/api/prediction/{accession}, which returns a JSON list of prediction records. It selects the canonical record (AF-{accession}-F1) by default, or the record matching the requested isoform, then follows the URLs carried in that record: pdbUrl or cifUrl for the structure body, plddtDocUrl for the per-residue pLDDT array, paeDocUrl for the pAE matrix, and msaUrl for the input multiple-sequence alignment (an A3M file). The mean pLDDT is read from the record’s globalMetricValue field. Records and their provenance come directly from the official AlphaFold DB REST API. Results reflect the live database, which always serves the latest version of each prediction.
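The request-and-select step described above can be sketched in a few lines of standard-library Python. This is an illustration, not the toolkit's own code: the function names are hypothetical, and the `entryId` key used to match records is an assumption about the JSON payload (the endpoint URL and the AF-{accession}-F1 naming come from the text above).

```python
import json
import urllib.request

# Endpoint root taken from the description above.
API_ROOT = "https://alphafold.ebi.ac.uk/api/prediction"


def prediction_url(accession):
    """Build the prediction endpoint URL for a UniProt accession."""
    return f"{API_ROOT}/{accession}"


def expected_entry_id(accession, isoform=None):
    """Canonical entries are AF-{accession}-F1; isoform N is AF-{accession}-N-F1."""
    if isoform is None:
        return f"AF-{accession}-F1"
    return f"AF-{accession}-{isoform}-F1"


def select_record(records, accession, isoform=None):
    """Pick the matching record from the JSON list returned by the API.

    Assumes each record exposes its identifier under the "entryId" key.
    """
    wanted = expected_entry_id(accession, isoform)
    for record in records:
        if record.get("entryId") == wanted:
            return record
    raise ValueError(f"no AFDB record {wanted!r} for accession {accession!r}")


def fetch_records(accession, timeout=30.0):
    """GET the prediction records for an accession (requires network access)."""
    with urllib.request.urlopen(prediction_url(accession), timeout=timeout) as resp:
        return json.load(resp)
```

With a record in hand, the artifact URLs (pdbUrl, cifUrl, plddtDocUrl, paeDocUrl, msaUrl) and globalMetricValue can be read as ordinary dictionary keys.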
Learning Resources
- AlphaFold DB FAQ (EMBL-EBI) - official guidance on coverage, confidence interpretation, versioning, and downloads.
- AlphaFold DB API documentation (EMBL-EBI) - the REST API specification for prediction records and artifact URLs.
- AlphaFold Protein Structure Database (EMBL-EBI Training) - a guided introduction to the database and how to interpret its predictions.
Tools
AlphaFold DB Fetch (alphafold-db-fetch)
Retrieves a single AlphaFold DB prediction record by UniProt accession and returns the predicted sequence and its 1-indexed coordinates, gene and organism metadata, mean pLDDT, the AFDB artifact URLs, the full JSON record, and an optional parsed Structure carrying per-residue pLDDT and optional pAE on structure.metrics.
API Reference
Input: AlphaFoldDBFetchInput
- isoform: None (default) returns the canonical entry (AF-{accession}-F1); 2 selects AF-{accession}-2-F1, and so on. AFDB typically exposes isoforms 2-9 for human proteins. Raises ValueError if the requested isoform doesn’t exist.
Config: AlphaFoldDBFetchConfig
- structure_format: pdb, cif
- include_structure: whether to attach the parsed Structure on the output. Set to False for metadata-only probes (URLs, mean pLDDT, gene, sequence); this saves ~100-500 KB per call, which is meaningful for batch sweeps.
- include_pae: whether to populate output.structure.metrics["pae"]. Disabled by default because PAE files can be tens of MB for long proteins. No-op when include_structure=False.
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: AlphaFoldDBFetchOutput
- mean_plddt: available regardless of include_structure; when include_structure=True it is also mirrored at structure.metrics["avg_plddt"].
- None on legacy entries that predate the bcif export.
- structure: parsed Structure (structure_format, b_factor_type=BFactorType.PLDDT) with an AlphaFoldDBMetrics metrics container carrying avg_plddt, plddt_per_residue, and (when include_pae=True) pae. None when include_structure=False.
- None when include_msa is False or when the entry has no associated MSA URL.

| Metric | Type | Range | Availability |
|---|---|---|---|
| avg_plddt | float | 0.0 to 100.0 | always |
| plddt_per_residue | list[float] | 0.0 to 100.0 | always |
| pae | list[list[float]] | ≥ 0.0 | when include_pae=True |
Applications
Use this to pull an AlphaFold-predicted structure into a pipeline when no experimental entry is needed: fetch a target by accession before inverse folding, docking, or binder design, screen accessions for AFDB coverage with metadata-only requests, or assess per-residue and pairwise confidence before structure-based work. The returned Structure feeds directly into structure-consuming tools such as TM-align, US-align, and structure scoring. The UniProt tool supplies the UniProt accession from a gene name and organism, and the PDB tool provides the experimental counterpart when one exists.
Usage Tips
- Coverage is broad but not universal. When AFDB has no prediction for an accession the tool raises ValueError. Catch that error and fall back to predicting the structure from sequence.
- A high mean_plddt can hide locally unreliable regions. Inspect the per-residue pLDDT on structure.metrics before trusting any specific residue.
- latest_version advances when AFDB refreshes a prediction. Cache it alongside any structure you persist and refetch when it moves past the cached value.
- Multiple records signal isoforms or fragments. The canonical record is selected by default and a warning lists the alternatives. To select a non-canonical isoform, pass the isoform input, and check entry_id, sequence_start, and sequence_end to confirm which record was returned.
- Low-confidence regions are usually real disorder, not a prediction error. Disordered or flexibly linked regions get very low per-residue confidence (pLDDT) and high predicted aligned error (pAE) between regions because they have no single fixed shape. Find those residue ranges from the per-residue pLDDT array and trim or down-weight just those residues. Do not throw away the whole prediction, because the confident domains are still reliable.
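The last tip, finding low-confidence residue ranges from the per-residue pLDDT array, can be sketched as a single pass over the scores. The helper name, the default threshold of 50 (AFDB's usual "very low" confidence band), and the `min_length` filter are illustrative assumptions rather than part of the toolkit; pick a cutoff that suits the downstream task.

```python
from typing import List, Sequence, Tuple


def low_confidence_ranges(
    plddt: Sequence[float],
    threshold: float = 50.0,  # assumed cutoff; tune per application
    min_length: int = 1,      # drop runs shorter than this many residues
) -> List[Tuple[int, int]]:
    """Return 1-indexed, inclusive (start, end) runs where pLDDT < threshold."""
    ranges: List[Tuple[int, int]] = []
    start = None
    for index, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = index  # a new low-confidence run begins here
        elif start is not None:
            # The run ended at the previous residue.
            if index - start >= min_length:
                ranges.append((start, index - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) + 1 - start >= min_length:
        ranges.append((start, len(plddt)))
    return ranges
```

The returned ranges use the same 1-indexed residue numbering as the rest of the output, so they can be passed to whatever trimming or down-weighting step follows.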
Toolkit Notes
These apply to every AlphaFold DB tool in this toolkit (alphafold-db-fetch).
- Requires network access. The tool calls the live AlphaFold DB REST API. It does not run offline and keeps no local copy of the database.
- Subject to AlphaFold DB rate limits. The EMBL-EBI API is unauthenticated and applies per-IP fair-use limits (EMBL-EBI Terms of Use). Space out high-volume requests.
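One simple way to space out a batch of requests is to enforce a minimum interval between consecutive items. The sketch below is a generic throttle, not a toolkit feature: the `spaced` helper is hypothetical, and the interval is a caller-chosen value, since the source does not state a numeric limit.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def spaced(
    items: Iterable[T],
    min_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items so that consecutive yields are at least min_interval seconds apart."""
    last = None
    for item in items:
        if last is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

Wrapping a list of accessions, for example `for accession in spaced(accessions, 0.5): ...`, keeps the per-item work unchanged while pacing the calls; the `clock` and `sleep` parameters exist only so the pacing can be exercised without real delays.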
