License: AlphaFold3 uses CC-BY-NC-SA-4.0 for code and a custom license (AlphaFold 3 Model Parameters Terms of Use) for model weights. These licenses restrict commercial use and may require explicit attribution. Model weights are not publicly distributed and must be requested from the provider. Please refer to the code license and model weights license for full terms.

Proto is not affiliated with Google DeepMind. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


google-deepmind/alphafold3
AlphaFold 3 inference pipeline.
7.8k stars
Accurate structure prediction of biomolecular interactions with AlphaFold 3
Josh Abramson, Jonas Adler, … Joshua Bambrick
Nature (2024)
@article{abramson2024alphafold3,
  title={Accurate structure prediction of biomolecular interactions with AlphaFold 3},
  author={Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick, Joshua and others},
  journal={Nature},
  volume={630},
  number={8016},
  pages={493--500},
  year={2024},
  publisher={Nature Publishing Group},
  doi={10.1038/s41586-024-07487-w}
}
proto-bio/proto-tools/proto_tools/tools/structure_prediction/alphafold3
| Function | Description |
| --- | --- |
| run_alphafold3() | Protein structure prediction using AlphaFold3 (GPU) |

Background

AlphaFold3 (Abramson et al., 2024) predicts the joint 3D structure of a biomolecular assembly from the sequences and chemical components it contains. It extends AlphaFold2 beyond single proteins: one model folds complexes that mix proteins, DNA, RNA, and small-molecule ligands, and predicts how those parts are arranged relative to one another. As in AlphaFold2, each protein chain is paired with a multiple-sequence alignment (MSA) of related sequences, whose covariation patterns give the model an evolutionary signal for placing residues.

Internally, AlphaFold3 represents the assembly as a set of tokens: one per amino-acid residue or nucleotide, and one per atom for ligands and modified residues. It then learns a representation of every token and of every token pair. Where AlphaFold2 leaned on the large MSA-centric Evoformer, AlphaFold3 de-emphasizes the MSA, handling it in a separate preliminary module rather than iterating it through the deep trunk, and does most of its work in the “Pairformer”, which iteratively refines the token and pair representations through geometry-inspired “triangle attention” updates. The final representations are then fed into a diffusion module that iteratively denoises all-atom coordinates starting from random noise.

Run from several random seeds, the model produces multiple candidate structures, and the highest-confidence candidate is returned as the final prediction. In addition, AlphaFold3 reports calibrated confidence metrics: the per-atom predicted local distance difference test (pLDDT) for local reliability, a predicted aligned error (PAE) for how well any two tokens are placed relative to each other, and predicted template-modeling (pTM) and interface predicted template-modeling (ipTM) scores for overall and interface accuracy.
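The tokenization rule described above can be illustrated with a minimal sketch. The `Chain` class and `count_tokens` function below are hypothetical names invented for this illustration; they are not part of AlphaFold3 or of this toolkit. The sketch only restates the rule from the text: one token per standard residue or nucleotide, and one token per atom for ligands and modified residues.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chain:
    """Hypothetical, simplified description of one chain in an assembly."""
    kind: str       # "protein", "dna", "rna", or "ligand"
    num_units: int  # residues/nucleotides for polymers, atoms for a ligand
    # Atom count of each modified residue in a polymer chain (empty if none).
    modified_residue_atoms: List[int] = field(default_factory=list)


def count_tokens(chains: List[Chain]) -> int:
    """Count tokens: one per standard residue/nucleotide, one per atom otherwise."""
    total = 0
    for chain in chains:
        if chain.kind == "ligand":
            # Ligands are tokenized atom by atom.
            total += chain.num_units
        else:
            # Standard residues contribute one token each; every modified
            # residue is expanded into one token per atom.
            standard = chain.num_units - len(chain.modified_residue_atoms)
            total += standard + sum(chain.modified_residue_atoms)
    return total


assembly = [
    Chain("protein", 100, modified_residue_atoms=[10]),  # 99 + 10 tokens
    Chain("dna", 12),                                    # 12 tokens
    Chain("ligand", 31),                                 # 31 tokens
]
n_tokens = count_tokens(assembly)
n_pairs = n_tokens * n_tokens  # size of the token-pair representation
```

Because the pair representation holds one entry per token pair, its size grows quadratically with the token count, which is why atom-level tokenization of large ligands adds noticeably to the cost of a prediction.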

Learning Resources

Tools

AlphaFold3 Structure Prediction (alphafold3-prediction)

Predicts the 3D structure of a biomolecular complex. Each input complex can combine protein, DNA, RNA, and ligand chains; the assembly is folded by AlphaFold3 and returned as a predicted Structure per complex with confidence metrics: per-residue pLDDT, pTM, interface pTM for multi-chain complexes, and predicted aligned error.

API Reference

Source
complexes
List[Complex]
required
List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain one or more sequences of proteins, DNA, RNA, or ligands.
msas
array
Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.
Source
name
string
default:"af3_job"
Name of the folding job. Default: "af3_job".
seeds
List[integer]
default:"[0]"
Seeds to use for AlphaFold3 when the common BaseConfig.seed field is unset. Default: [0]. Note: AlphaFold3 draws num_diffusion_samples (default 5) diffusion samples per seed, so a single seed is often enough. More seeds are required for complex docking tasks, such as antibody-antigen docking.
output_dir
string
Path prefix for the AlphaFold3 output directory. Appends _af3_results to the provided string. If None (default), uses a temporary directory that is automatically cleaned up after inference. If specified, creates a persistent directory at the given path that will NOT be automatically deleted. Default: None.
model_dir
string
Local path to the directory containing AlphaFold3 model parameters (a single .bin or .bin.zst file per DeepMind’s release layout). If None (default), weights are resolved from PROTO_ALPHAFOLD3_WEIGHTS_DIR, then PROTO_MODEL_CACHE, then PROTO_HOME/proto_model_cache/alphafold3/ (see notes/storage.md).
sif_path
string
Optional path to a pre-built AlphaFold3 Apptainer image (.sif). When set, the tool runs apptainer run against this image (which dispatches via the sif’s %runscript) instead of the in-env Python install. When None (default), inference.py looks for $VENV_PATH/alphafold3.sif (provisioned by setup.sh) and falls back to the env-based install if absent.
num_recycles
integer
default:"10"
Number of recycling iterations. Default: 10.
num_diffusion_samples
integer
default:"5"
Diffusion samples per seed; total candidates = len(seeds) * num_diffusion_samples.
verbose
integer
default:"0"
Whether to print status messages during execution. Inherited from StructurePredictionConfig. Default: 0 (no status messages).
device
string
default:"cuda"
Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
include_pae_matrix
boolean
default:"False"
Whether to include the full PAE matrix (the pae metric) in the output. Inherited. Default: False.
use_msa
boolean
default:"True"
Whether to generate and use Multiple Sequence Alignments (MSAs) for protein chains using MMseqs2 homology search. Inherited from MSAStructurePredictionConfig. Default: True.
msa_search_config
Mmseqs2HomologySearchConfig
Configuration for MMseqs2 homology search (MSA generation). Only used when use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.
pair_heterocomplex_msas
boolean
default:"True"
Whether heterocomplex protein chains should use taxonomy-paired MSA generation. Inherited from MSAStructurePredictionConfig. Default: True.
Source
structures
List[Structure]
required
Predicted structures, each carrying an :class:AlphaFold3Metrics instance on .metrics.
Metrics (one set per structures item)
| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| avg_plddt | float | 0.0 to 100.0 | always |
| avg_pae | float | ≥ 0.0 | always |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |
| ptm | float | 0.0 to 1.0 | depends on model output |
| iptm | float | 0.0 to 1.0 | depends on model output |
| ranking_score | float | unbounded | depends on model output |
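The ranges and availability rules in the metrics table can be turned into a small sanity check. This is a sketch only: it assumes the metrics have been copied into a plain dictionary keyed by the metric names above, and `check_metrics` is a hypothetical helper, not a function provided by the toolkit. Metrics marked "depends on model output" are treated as optional and may be absent or None.

```python
from typing import Any, Dict, List, Optional, Tuple

# (low, high) bounds taken from the metrics table; None means unbounded.
BOUNDS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "avg_plddt": (0.0, 100.0),
    "avg_pae": (0.0, None),
    "ptm": (0.0, 1.0),
    "iptm": (0.0, 1.0),
    "ranking_score": (None, None),
}

# Metrics the table lists as always available.
ALWAYS_PRESENT = {"avg_plddt", "avg_pae"}


def check_metrics(metrics: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if everything is in range."""
    problems: List[str] = []
    for name, (low, high) in BOUNDS.items():
        value = metrics.get(name)
        if value is None:
            # Optional metrics may legitimately be missing.
            if name in ALWAYS_PRESENT:
                problems.append(f"{name} is missing")
            continue
        if low is not None and value < low:
            problems.append(f"{name}={value} is below {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above {high}")
    return problems
```

A check like this is mainly useful for catching unit mix-ups, for example a pLDDT reported on a 0 to 1 scale being compared against thresholds meant for the 0 to 100 scale.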

Applications

This tool predicts the structure of multi-component assemblies such as protein-DNA and protein-RNA complexes or protein-ligand binding poses. Running it on a multi-chain complex also estimates how confidently the components are placed relative to each other through interface pTM and PAE, which is informative for assessing predicted interfaces.
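One common way to use PAE for interface assessment is to average it over the off-diagonal blocks that link two chains. The sketch below assumes the full PAE matrix is available as a nested list (the pae metric, returned only when include_pae_matrix=True) and that the token indices belonging to each chain are known; `mean_interchain_pae` is a hypothetical helper written for this illustration, not part of the toolkit, and it is not the same quantity as ipTM.

```python
from typing import Sequence


def mean_interchain_pae(
    pae: Sequence[Sequence[float]],
    chain_a: Sequence[int],
    chain_b: Sequence[int],
) -> float:
    """Average PAE over both off-diagonal blocks linking two sets of tokens.

    PAE is not symmetric, so the a->b and b->a blocks are both included.
    """
    values = [pae[i][j] for i in chain_a for j in chain_b]
    values += [pae[j][i] for i in chain_a for j in chain_b]
    return sum(values) / len(values)


# Toy 4-token example: tokens 0-1 form chain A, tokens 2-3 form chain B.
toy_pae = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [10.0, 9.0, 0.0, 1.0],
    [9.0, 8.0, 1.0, 0.0],
]
interface_pae = mean_interchain_pae(toy_pae, chain_a=[0, 1], chain_b=[2, 3])
```

In this toy matrix the within-chain errors are low while the cross-chain blocks are high, the pattern expected when each chain is folded confidently but their relative placement is uncertain.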

Usage Tips

  • use_msa defaults to True. An MSA is then generated for each protein chain by a ColabFold (MMseqs2) search; set it to False to skip the search, or attach precomputed MSAs to the input.
  • Diffusion sampling is controlled by seeds and num_diffusion_samples. AlphaFold3 draws num_diffusion_samples (default 5) structures per seed and keeps the best by ranking score, so a single seed is often enough; the total number of candidates is len(seeds) times num_diffusion_samples.
  • num_recycles (default 10) trades accuracy for time. More recycling iterations refine the prediction but increase runtime.
  • Confidence is reported as pLDDT, pTM, ipTM, and PAE. Average pLDDT (0 to 100) is the primary per-structure quality metric; ipTM is populated only for multi-chain complexes.
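The sampling arithmetic from the tips above can be sketched in a few lines. Both functions are hypothetical helpers for illustration; the ranking scores in the example are made-up numbers, and keying candidates by (seed, sample index) is an assumption of this sketch rather than the toolkit's data layout.

```python
from typing import Dict, Sequence, Tuple


def total_candidates(seeds: Sequence[int], num_diffusion_samples: int = 5) -> int:
    """Total candidate structures: len(seeds) times num_diffusion_samples."""
    return len(seeds) * num_diffusion_samples


def best_candidate(ranking_scores: Dict[Tuple[int, int], float]) -> Tuple[int, int]:
    """Return the (seed, sample_index) key with the highest ranking score."""
    return max(ranking_scores, key=ranking_scores.get)


# Default configuration: one seed, five samples.
default_total = total_candidates([0])

# A harder docking-style task with four seeds.
docking_total = total_candidates([0, 1, 2, 3])

# Illustrative scores only; real values come from the model.
scores = {(0, 0): 0.71, (0, 1): 0.83, (1, 0): 0.79}
winner = best_candidate(scores)
```

Adding seeds multiplies the number of candidates and the runtime, so it is usually worth starting with the default single seed and increasing it only for difficult interfaces.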

Toolkit Notes

These apply to every AlphaFold3 tool in this toolkit (alphafold3-prediction).
  • Requires a GPU. AlphaFold3 needs an NVIDIA GPU; CPU execution is not practical.
  • Model weights are gated. AlphaFold3 weights are not publicly distributed; access is restricted to non-commercial research and must be requested from Google DeepMind through their form, then made available to the tool before it can run.
Example notebook: See the full working example for a copy-paste-ready walkthrough.
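The lookup order given for model_dir (an explicit path first, then PROTO_ALPHAFOLD3_WEIGHTS_DIR, then PROTO_MODEL_CACHE, then PROTO_HOME/proto_model_cache/alphafold3/) can be sketched as follows. This is an illustration of the documented order, not the toolkit's actual resolver: `candidate_weight_dirs` is a hypothetical function, and it assumes that PROTO_HOME is read from the environment and that each environment variable is used as a directory path as-is, neither of which is confirmed by this page.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_weight_dirs(
    model_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[Path]:
    """List directories to check for AlphaFold3 weights, in priority order."""
    if env is None:
        env = os.environ
    # An explicit model_dir takes precedence over every fallback.
    if model_dir is not None:
        return [Path(model_dir)]
    candidates: List[Path] = []
    if env.get("PROTO_ALPHAFOLD3_WEIGHTS_DIR"):
        candidates.append(Path(env["PROTO_ALPHAFOLD3_WEIGHTS_DIR"]))
    if env.get("PROTO_MODEL_CACHE"):
        # Assumption: the cache variable is used directly, without a subfolder.
        candidates.append(Path(env["PROTO_MODEL_CACHE"]))
    if env.get("PROTO_HOME"):
        candidates.append(Path(env["PROTO_HOME"]) / "proto_model_cache" / "alphafold3")
    return candidates
```

Printing `candidate_weight_dirs()` before a run is a quick way to see where weights might be expected on a given machine, and passing model_dir explicitly avoids any ambiguity between the fallbacks.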

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.