License: AlphaFold2 uses Apache-2.0 for code and CC-BY-4.0 for model weights and may require explicit attribution when utilized. Please refer to the code license and model weights license for full terms.

Proto is not affiliated with Google DeepMind. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


google-deepmind/alphafold
Open source code for AlphaFold 2.
14.4k stars
Highly accurate protein structure prediction with AlphaFold
John Jumper, Richard Evans, … Anna Potapenko
Nature (2021)
@article{jumper2021alphafold2,
  title={Highly accurate protein structure prediction with AlphaFold},
  author={Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and others},
  journal={Nature},
  volume={596},
  number={7873},
  pages={583--589},
  year={2021},
  publisher={Nature Publishing Group},
  doi={10.1038/s41586-021-03819-2}
}
proto-bio/proto-tools/proto_tools/tools/structure_prediction/alphafold2
| Function | Description |
| --- | --- |
| run_alphafold2_gradient() | AF2 binder design against a fixed target. Returns loss, Structure, and optionally gradient. (GPU) |
| run_alphafold2() | Protein structure prediction using AlphaFold2 via ColabDesign. (GPU) |

Background

AlphaFold2 (Jumper et al., 2021) predicts a protein’s 3D structure from its amino-acid sequence, and was introduced at the CASP14 structure-prediction assessment in 2020. AF2 takes a multiple-sequence alignment (MSA) as a primary input. The MSA carries an evolutionary signal: residues that lie close together in the folded structure tend to mutate in a correlated way across related proteins. AF2 reads these covariation patterns to infer which parts of the chain are in contact. Because the signal comes from the alignment itself, accuracy scales with the depth and diversity of the MSA, and proteins with few detectable homologs are harder to fold.

Internally, AlphaFold2 maintains two representations: an MSA representation and a pairwise representation over residue pairs. The Evoformer network repeatedly exchanges information between the two, using attention together with triangle-based updates on the pairwise representation that enforce geometric consistency among the inferred residue-residue distances. A structure module then turns these representations into an explicit 3D model, placing each residue as a rigid backbone frame with its own position and orientation. This whole process is recycled through the network several times, each pass refining the previous prediction.

Along with the coordinates, AlphaFold2 emits two calibrated confidence measures: the per-residue predicted local distance difference test (pLDDT), which scores the model’s confidence in each residue’s local structure, and the predicted aligned error (PAE), which estimates the expected error in one residue’s position when the structure is aligned on another.

This toolkit runs the original AlphaFold2 model through the ColabDesign JAX implementation rather than the full DeepMind or ColabFold pipeline. There is no template-search stage, and multiple-sequence alignments are optional: they can be generated by a ColabFold search, supplied precomputed, or skipped to run in single-sequence mode.
Beyond folding, the same model exposes a per-residue gradient, which gradient-based binder-design methods use to optimize a binder sequence against a frozen target.
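
The covariation signal described above can be illustrated with a toy calculation. The sketch below computes the mutual information between pairs of alignment columns; this is a classical way to quantify co-variation and is an assumption chosen for illustration, not what AlphaFold2 computes internally (the Evoformer learns its own representation of the MSA). The alignment and the function names are made up for the example.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Return the residues found at alignment column i."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(column(msa, i))
    count_j = Counter(column(msa, j))
    count_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Toy alignment (illustrative only): columns 0 and 3 co-vary
# (A pairs with D, K pairs with E), column 1 is fully conserved,
# and column 2 varies independently of column 0.
toy_msa = ["AGLD", "AGVD", "KGLE", "KGVE"]

mi_coupled = mutual_information(toy_msa, 0, 3)
mi_independent = mutual_information(toy_msa, 0, 2)
mi_conserved = mutual_information(toy_msa, 0, 1)
```

A conserved column carries no covariation signal, which is one intuition for why shallow or low-diversity alignments give the model less to work with.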

Tools

AlphaFold2 Structure Prediction (alphafold2-prediction)

Predicts the 3D structure of one or more protein chains. Each input complex (a single chain, or several chains folded together) is run through the ColabDesign AlphaFold2 model, returning a predicted Structure per complex with confidence metrics: per-residue pLDDT, pTM, interface pTM for multi-chain complexes, and predicted aligned error.

API Reference

Source
complexes
List[Complex]
required
List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain one or more protein chains.
msas
array
Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.
Source
num_recycles
integer
default:"3"
Number of recycling iterations through the model. Higher values can improve accuracy at the cost of computation time.
model_num
integer
default:"1"
Which AlphaFold2 model parameter set to use (1-5). AF2 ships 5 independently trained parameter sets. Different sets can produce different predictions. Mutually exclusive with num_ensemble_models > 1; set one or the other. Default: 1.
num_ensemble_models
integer
default:"1"
Number of model parameter sets to run and average. Running multiple models and averaging their outputs can improve prediction quality at the cost of increased computation time. Mutually exclusive with model_num; when ensembling, models are selected from the full pool (models 1 through N). Range: 1-5. Default: 1.
verbose
integer
default:"0"
Verbosity level for status messages during execution (0 = quiet). Inherited from BaseConfig. Default: 0.
device
string
default:"cuda"
Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
include_pae_matrix
boolean
default:"False"
Whether to attach the full per-residue PAE matrix to the output metrics. Inherited. Default: False.
use_msa
boolean
default:"True"
Whether to generate and use Multiple Sequence Alignments (MSAs) for protein chains using MMseqs2 homology search. Inherited from MSAStructurePredictionConfig. Default: True.
msa_search_config
Mmseqs2HomologySearchConfig
Configuration for MMseqs2 homology search (MSA generation). Only used when use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.
pair_heterocomplex_msas
boolean
default:"True"
Whether heterocomplex protein chains should use taxonomy-paired MSA generation. Inherited from MSAStructurePredictionConfig. Default: True.
Source
structures
List[Structure]
required
Predicted structures, each carrying an AlphaFold2Metrics instance on .metrics.
Metrics (one set per structures item)
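| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | always |
| iptm | float | 0.0 to 1.0 | multi-chain input only |
| avg_pae | float | ≥ 0.0 | always |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |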
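
When include_pae_matrix=True, the pae metric is a square matrix that can be summarized directly. The sketch below is a minimal illustration, assuming the matrix is indexed by residue with the chains laid out one after another; it computes an overall mean and a mean over inter-chain entries only. How the toolkit itself derives avg_pae is not stated here, so the plain mean is an assumption, and summarize_pae is a hypothetical helper rather than a toolkit function.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def summarize_pae(pae, chain_lengths):
    """Return (overall mean PAE, mean inter-chain PAE or None for one chain)."""
    n = len(pae)
    # Map each residue index to the index of the chain that contains it.
    chain_of = [c for c, length in enumerate(chain_lengths) for _ in range(length)]
    if len(chain_of) != n:
        raise ValueError("chain lengths must sum to the matrix size")
    overall = mean(pae[i][j] for i in range(n) for j in range(n))
    inter = [
        pae[i][j]
        for i in range(n)
        for j in range(n)
        if chain_of[i] != chain_of[j]
    ]
    return overall, (mean(inter) if inter else None)


# Toy 3-residue matrix (made-up values): a 2-residue chain plus a 1-residue chain.
toy_pae = [
    [0.0, 2.0, 10.0],
    [2.0, 0.0, 12.0],
    [10.0, 12.0, 0.0],
]
overall_pae, interface_pae = summarize_pae(toy_pae, chain_lengths=[2, 1])
single_chain = summarize_pae([[0.0]], chain_lengths=[1])
```

Low values within each chain combined with high inter-chain values suggest that the individual chains are placed confidently but their relative arrangement is not.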

Applications

This tool folds a protein sequence into a 3D model for structural analysis, docking, or as input to downstream structure tools. Running it on a multi-chain complex additionally estimates how confidently the chains are placed relative to each other through the interface pTM and PAE, which is informative for assessing predicted protein-protein interfaces.

Usage Tips

  • use_msa defaults to True. An MSA is then generated by a ColabFold search for each protein chain; set it to False for single-sequence prediction (faster, usually lower accuracy), or attach precomputed MSAs to the input to skip the search.
  • model_num and num_ensemble_models are mutually exclusive. model_num (default 1) selects one of AlphaFold2’s five trained parameter sets; num_ensemble_models runs several and averages them for higher accuracy at higher cost. Setting both raises an error.
  • Confidence is reported as pLDDT, pTM, ipTM, and PAE. Average pLDDT (0 to 1) is the primary per-structure quality metric; ipTM is populated only for multi-chain complexes. Set include_pae_matrix to attach the full per-residue PAE matrix.
  • Protein sequences only. DNA, RNA, and ligands are not supported; X is allowed for unknown residues.
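
The mutual-exclusion rule between model_num and num_ensemble_models can be sketched as a small validation class. FoldConfigSketch is a hypothetical stand-in written for this example, not the toolkit's config class. In particular, treating "both set" as "ensembling requested while model_num differs from its default" is an assumption about how the conflict is detected.

```python
from dataclasses import dataclass


@dataclass
class FoldConfigSketch:
    """Hypothetical stand-in for the prediction config; not the toolkit class."""

    model_num: int = 1
    num_ensemble_models: int = 1

    def __post_init__(self):
        if not 1 <= self.model_num <= 5:
            raise ValueError("model_num must be between 1 and 5")
        if not 1 <= self.num_ensemble_models <= 5:
            raise ValueError("num_ensemble_models must be between 1 and 5")
        # Assumed rule: ensembling cannot be combined with a non-default model_num.
        if self.num_ensemble_models > 1 and self.model_num != 1:
            raise ValueError("set model_num or num_ensemble_models, not both")

    def models_to_run(self):
        """Parameter sets that would be evaluated under this configuration."""
        if self.num_ensemble_models > 1:
            # Ensembling draws from models 1 through N.
            return list(range(1, self.num_ensemble_models + 1))
        return [self.model_num]


single = FoldConfigSketch(model_num=3).models_to_run()
ensemble = FoldConfigSketch(num_ensemble_models=3).models_to_run()
```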

AlphaFold2 Gradient (alphafold2-gradient)

Scores and differentiates a binder against a frozen target structure. Given a target-plus-binder template, it runs AlphaFold2 (through ColabDesign’s binder-design path) on the binder against the fixed target and returns the design loss, the predicted Structure, and, by default, the gradient of the loss with respect to the binder sequence logits.
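
The binder is passed as relaxed sequence logits rather than a fixed sequence, and the temperature, soft, and hard parameters control how those logits become the sequence the model sees. The sketch below follows the parameter descriptions in this reference (soft: 0 = raw logits, 1 = full softmax; hard: 0 = relaxed, 1 = argmax one-hot) and shows forward values only. It is an illustration under those assumptions, not ColabDesign's code: the straight-through gradient used when hard > 0 is omitted, and a 3-letter alphabet stands in for the 20 amino acids.

```python
import math


def softmax(row, temperature=1.0):
    """Numerically stable softmax of one logits row at a given temperature."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def relax_row(row, temperature=1.0, soft=1.0, hard=0.0):
    """Blend one position's logits into the relaxed sequence representation."""
    probs = softmax(row, temperature)
    # soft blends raw logits (soft=0) with softmax probabilities (soft=1).
    pseudo = [soft * p + (1.0 - soft) * x for p, x in zip(probs, row)]
    # hard blends the relaxed row (hard=0) with a one-hot argmax (hard=1).
    best = max(range(len(pseudo)), key=pseudo.__getitem__)
    one_hot = [1.0 if k == best else 0.0 for k in range(len(pseudo))]
    return [hard * h + (1.0 - hard) * p for h, p in zip(one_hot, pseudo)]


uniform = relax_row([0.0, 0.0, 0.0])
raw = relax_row([2.0, 1.0, 0.0], soft=0.0)
discrete = relax_row([2.0, 0.0, 0.0], hard=1.0)
sharp = softmax([2.0, 1.0, 0.0], temperature=0.5)
smooth = softmax([2.0, 1.0, 0.0], temperature=1.0)
```

Lowering the temperature concentrates probability on the highest logit, which is why design schedules commonly move from a smooth relaxed sequence toward a discrete one.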

API Reference

Source
target_pdb
Structure
required
Target+binder template PDB. Accepts a file path, raw PDB/CIF content string, Structure object, or a dict in the shape produced by Structure.model_dump(mode='json').
target_chain
string
default:"A"
Chain ID(s) of the frozen target in the PDB.
target_hotspot
string
Comma-separated hotspot residue indices on the target.
binder_chain
string
Binder template chain to redesign; None (default) designs de novo.
design_positions
array
Zero-based binder residue indices for loss focus (e.g. CDR loops). Germinal backend only.
logits
List[array]
required
Inherited — relaxed sequence logits (L x 20).
temperature
number
default:"1.0"
Inherited — softmax temperature.
Source
include_pae_matrix
boolean
default:"False"
Attach full per-residue PAE matrix. Default: False.
bias_redesign
number
Persistent softmax bias toward wildtype at non-design positions. Germinal backend only.
omit_aas
array
Amino acids to ban (e.g. ["C", "W"]).
num_recycles
integer
default:"3"
AF2 recycling iterations.
recycle_mode
enum
default:"last"
Which recycle’s output is used for loss/gradient. "last" matches Germinal’s VHH default; "average" averages across recycles; "sample" picks one uniformly; "first" uses only recycle 0. Available options: last, sample, average, first.
model_num
integer
default:"1"
AF2 parameter set (1-5).
sample_models
boolean
default:"False"
Randomly sample model sets each forward pass.
use_multimer
boolean
default:"True"
Use AlphaFold multimer parameters for binder protocol.
rm_target_seq
boolean
default:"True"
Mask target template sequence in prep_inputs.
rm_target_sc
boolean
default:"False"
Mask target template side chains in prep_inputs.
rm_template_ic
boolean
default:"True"
Mask inter-chain template contacts in prep_inputs.
soft
number
default:"1.0"
ColabDesign softmax blending (0=raw logits, 1=full softmax). Passed per-step by the gradient optimizer.
hard
number
default:"0.0"
ColabDesign hard-sequence blending (0=relaxed, 1=straight-through argmax).
backend
enum
default:"base"
"base" (upstream ColabDesign) or "germinal" (Germinal fork with alpha, bias, framework contacts, extension losses). Available options: base, germinal.
compute_gradient
boolean
default:"True"
Run backward pass and return gradient; False for forward-only scoring (returns gradient=None).
starting_binder_seq
string
Optional one-letter AA string used to seed the binder before gradient updates. Germinal backend only; length must equal len(logits).
loss_weights
Dict[string, number]
Binder-objective weights. Base keys: plddt, i_plddt, pae, i_pae, con, i_con, exp_res, rmsd, dgram_cce, fape. Germinal extension keys: rg, i_ptm, NC, helix, beta_strand.
intra_contact_num
integer
default:"2"
Intra-chain contacts per residue.
intra_contact_cutoff
number
default:"14.0"
Intra-chain distance cutoff (Å).
inter_contact_num
integer
default:"1"
Interface contacts per residue.
inter_contact_cutoff
number
default:"21.6875"
Interface distance cutoff (Å).
framework_contact_offset
number
default:"1.0"
Framework contact penalty offset in the Germinal inter-chain contact loss. Germinal backend only.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cuda"
Device to run the tool on.
timeout
integer
default:"600"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
structure
Structure
required
Predicted target+binder complex from the forward pass. B-factors are at the raw 0-100 PDB scale; b_factor_type=PLDDT means Structure.per_residue_plddt normalizes them to [0, 1].
gradient
array
Gradient matrix matching the input logits shape, or None when compute_gradient=False.
loss
number
required
Scalar objective value.
metrics
Dict[string, any]
Scalar auxiliary metrics (avg_plddt, ptm, iptm, avg_pae, plus per-loss values for every weighted ColabDesign loss term).
vocab
List[string]
required
Amino-acid column ordering.
Metrics
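| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| avg_plddt | float | 0.0 to 1.0 | always |
| ptm | float | 0.0 to 1.0 | always |
| iptm | float | 0.0 to 1.0 | multi-chain input only |
| avg_pae | float | ≥ 0.0 | always |
| pae | list[list[float]] | ≥ 0.0 | when include_pae_matrix=True |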
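
The returned structure stores pLDDT in the B-factor column at the raw 0-100 scale, while avg_plddt is reported on 0.0 to 1.0. The relationship between the two scales can be sketched as below. The helper names and values are made up for the example, and treating avg_plddt as the plain mean of the per-residue values is an assumption; in the toolkit, Structure.per_residue_plddt performs the normalization.

```python
def normalize_plddt(b_factors):
    """Map raw 0-100 pLDDT values stored as B-factors onto the [0, 1] scale."""
    return [b / 100.0 for b in b_factors]


def average(values):
    return sum(values) / len(values)


# Made-up per-residue B-factor values for a 3-residue chain.
raw_b_factors = [92.5, 88.0, 47.5]
per_residue = normalize_plddt(raw_b_factors)
avg_plddt = average(per_residue)
```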

Applications

This tool supplies the loss and gradient signal that gradient-based binder-design methods optimize against a chosen target. With compute_gradient=False it instead provides forward-only scoring of a candidate binder (loss, metrics, and predicted structure) for ranking or filtering.

Usage Tips

  • Use this for protein binder objectives, not ligand generation. The input binder is an amino-acid chain and the returned losses describe a predicted protein-protein interface. For small-molecule compounds, choose chemistry-aware ligand tools instead.
  • One binder configuration per call; this tool is not an optimization loop. It evaluates a single binder against the fixed target. Drive it from a binder-design optimizer, or call it repeatedly, to actually design a binder.
  • compute_gradient defaults to True. It runs a forward and backward pass and returns the gradient with respect to the binder logits; set it to False for forward-only scoring (gradient=None). The loss, metrics, and predicted structure are identical in both modes.
  • backend selects the loss set. "base" (the default) uses the upstream ColabDesign losses; "germinal" adds the Germinal fork’s alpha, bias, framework-contact, and extension losses. starting_binder_seq is only valid with "germinal".
  • target_hotspot focuses the design on chosen target residues. Supply comma-separated residue indices on the target to bias the binder toward a specific epitope; loss_weights (only the validated keys) tunes the objective terms.
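
Because the tool evaluates one binder configuration per call, an outer loop has to apply the returned gradient to the logits. The sketch below shows the shape of such a loop with plain gradient descent. It does not call the toolkit: toy_gradient_call is a hypothetical stand-in that returns a loss and a gradient matching the logits shape for a trivial quadratic objective, and the 3-letter vocab replaces the real amino-acid ordering. A real optimizer would replace the stand-in with a call to the gradient tool and would typically use a more careful update rule.

```python
def toy_gradient_call(logits, preferred):
    """Stand-in for one tool call: returns (loss, gradient) for a toy objective."""
    loss = 0.0
    gradient = []
    for row, want in zip(logits, preferred):
        grad_row = []
        for k, value in enumerate(row):
            target = 1.0 if k == want else 0.0
            loss += (value - target) ** 2
            grad_row.append(2.0 * (value - target))
        gradient.append(grad_row)
    return loss, gradient


def optimize(logits, preferred, steps=50, learning_rate=0.1):
    """Plain gradient descent on the logits; returns final logits and loss history."""
    history = []
    for _ in range(steps):
        loss, gradient = toy_gradient_call(logits, preferred)
        history.append(loss)
        # The gradient has the same shape as the logits, so update element-wise.
        logits = [
            [v - learning_rate * g for v, g in zip(row, grad_row)]
            for row, grad_row in zip(logits, gradient)
        ]
    return logits, history


vocab = ["A", "G", "K"]  # toy column ordering, not the tool's vocab
start = [[0.0, 0.0, 0.0] for _ in range(4)]
final_logits, losses = optimize(start, preferred=[2, 0, 1, 0])

# Decode the final logits by taking the highest-scoring column at each position.
designed = "".join(
    vocab[max(range(len(row)), key=row.__getitem__)] for row in final_logits
)
```

The decoding step relies on the column ordering, which is why the tool returns vocab alongside the gradient.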

Toolkit Notes

These apply to every AlphaFold2 tool in this toolkit (alphafold2-prediction, alphafold2-gradient).
  • Requires a GPU. Both tools run AlphaFold2 through a JAX backend and need an NVIDIA GPU; CPU execution is not practical.
  • Runs the original AlphaFold2 through ColabDesign, not the full DeepMind pipeline. There is no template-search stage; multiple-sequence alignments are optional and are used only by alphafold2-prediction.
  • num_recycles (default 3) applies to both tools. Each recycling iteration refines the structure; raising it can improve accuracy at the cost of longer runtime.
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.