AlphaFold2 - Proto

License: AlphaFold2 code is released under Apache-2.0 and the model weights under CC-BY-4.0, which may require explicit attribution when used. Refer to the code license and model weights license for full terms.

Proto is not affiliated with Google DeepMind. This toolkit is open source and builds on the implementation produced by that organization. Product names, logos, and trademarks are the property of their respective owners.

google-deepmind/alphafold

Open source code for AlphaFold 2.

14.4k stars

Highly accurate protein structure prediction with AlphaFold

John Jumper, Richard Evans, … Anna Potapenko

Nature (2021)

@article{jumper2021alphafold2,
  title={Highly accurate protein structure prediction with AlphaFold},
  author={Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v{Z}}{\'\i}dek, Augustin and Potapenko, Anna and others},
  journal={Nature},
  volume={596},
  number={7873},
  pages={583--589},
  year={2021},
  publisher={Nature Publishing Group},
  doi={10.1038/s41586-021-03819-2}
}

proto-bio/proto-tools/proto_tools/tools/structure_prediction/alphafold2

Function	Description
`run_alphafold2_gradient()`	AF2 binder design against a fixed target. Returns loss, Structure, and optionally gradient. (GPU)
`run_alphafold2()`	Protein structure prediction using AlphaFold2 via ColabDesign. (GPU)

Background

AlphaFold2 (Jumper et al., 2021) predicts a protein’s 3D structure from its amino-acid sequence, and was introduced at the CASP14 structure-prediction assessment in 2020. AF2 takes a multiple-sequence alignment (MSA) as a primary input. The MSA carries an evolutionary signal: residues that lie close together in the folded structure tend to mutate in a correlated way across related proteins. AF2 reads these covariation patterns to infer which parts of the chain are in contact. Because the signal comes from the alignment itself, accuracy scales with the depth and diversity of the MSA, and proteins with few detectable homologs are harder to fold.

Internally, AlphaFold2 maintains two representations: an MSA representation and a pairwise representation over residue pairs. The Evoformer network repeatedly exchanges information between the two, using attention together with triangle-based updates on the pairwise representation that enforce geometric consistency among the inferred residue-residue distances. A structure module then turns these representations into an explicit 3D model, placing each residue as a rigid backbone frame with its own position and orientation. This whole process is recycled through the network several times, each pass refining the previous prediction.

Along with the coordinates, AlphaFold2 emits two calibrated confidence measures: the per-residue predicted local distance difference test (pLDDT), which scores the model’s confidence in each residue’s local structure, and the predicted aligned error (PAE), which estimates the expected error in one residue’s position when the structure is aligned on another.

This toolkit runs the original AlphaFold2 model through the ColabDesign JAX implementation rather than the full DeepMind or ColabFold pipeline. There is no template-search stage, and multiple-sequence alignments are optional: they can be generated by a ColabFold search, supplied precomputed, or skipped to run in single-sequence mode. Beyond folding, the same model exposes a per-residue gradient, which gradient-based binder-design methods use to optimize a binder sequence against a frozen target.
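The covariation signal described in the Background can be illustrated with a classical statistic. The sketch below computes the mutual information between two alignment columns of a made-up toy MSA. It is only an illustration of why correlated mutations are informative; AlphaFold2 learns its own representation of the alignment and does not compute this quantity, and the function name and toy sequences are assumptions introduced here.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information in bits between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b).
        ratio = c * n / (count_i[a] * count_j[b])
        mi += (c / n) * math.log2(ratio)
    return mi


# Toy alignment (made up): columns 1 and 3 always change together,
# while column 2 varies independently of column 1.
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD", "AKVE", "ARVD"]

coupled = column_mutual_information(toy_msa, 1, 3)
independent = column_mutual_information(toy_msa, 1, 2)
print(f"MI(1,3) = {coupled:.2f} bits, MI(1,2) = {independent:.2f} bits")
```

In this toy case the coupled pair carries one full bit of shared information and the independent pair carries none. A shallow alignment gives noisy estimates of such statistics, which is one way to see why proteins with few homologs are harder to fold.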

Learning Resources

AlphaFold: a solution to a 50-year-old grand challenge in biology (Google DeepMind) - a general-audience blog post explaining the protein-folding problem and how AlphaFold2 approaches it, published alongside the CASP14 result.

Tools

AlphaFold2 Structure Prediction (`alphafold2-prediction`)

Predicts the 3D structure of one or more protein chains. Each input complex (a single chain, or several chains folded together) is run through the ColabDesign AlphaFold2 model, returning a predicted Structure per complex with confidence metrics: per-residue pLDDT, pTM, interface pTM for multi-chain complexes, and predicted aligned error.

API Reference

Input: AlphaFold2Input

complexes

List[Complex]

required

List of complexes to predict structures for. Inherited from StructurePredictionInput. Each complex can contain one or more protein chains.

Complex fields:

chains

List[Chain | Fragment]

required

Chains in the complex, in input order.

msas

array

Pre-computed MSAs, one entry per complex. Each entry is a ComplexMSAs (per-chain MSAs keyed by chain index); paired=True marks rows taxonomy-aligned across chains. Populated by preprocess() or supplied directly.

Config: AlphaFold2Config

num_recycles

integer

default:"3"

Number of recycling iterations through the model. Higher values can improve accuracy at the cost of computation time.

model_num

integer

default:"1"

Which AlphaFold2 model parameter set to use (1-5). AF2 ships 5 independently trained parameter sets. Different sets can produce different predictions. Mutually exclusive with num_ensemble_models > 1; set one or the other. Default: 1.

num_ensemble_models

integer

default:"1"

Number of model parameter sets to run and average. Running multiple models and averaging their outputs can improve prediction quality at the cost of increased computation time. Mutually exclusive with model_num; when ensembling, models are selected from the full pool (models 1 through N). Range: 1-5. Default: 1.

verbose

integer

default:"0"

Verbosity level for status messages printed during execution. Inherited from BaseConfig. Default: 0 (quiet).

device

string

default:"cuda"

Device to run the model on ("cuda", "cpu"). Inherited from StructurePredictionConfig. Default: "cuda".

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

include_pae_matrix

boolean

default:"False"

Whether to attach the full per-residue PAE matrix to the output metrics. Inherited. Default: False.

use_msa

boolean

default:"True"

Whether to generate and use Multiple Sequence Alignments (MSAs) for protein chains using MMseqs2 homology search. Inherited from MSAStructurePredictionConfig. Default: True.

msa_search_config

Mmseqs2HomologySearchConfig

Configuration for MMseqs2 homology search (MSA generation). Only used when use_msa=True. Inherited from MSAStructurePredictionConfig. Default: None.

pair_heterocomplex_msas

boolean

default:"True"

Whether heterocomplex protein chains should use taxonomy-paired MSA generation. Inherited from MSAStructurePredictionConfig. Default: True.

Output: AlphaFold2Output

structures

List[Structure]

required

Predicted structures, each carrying an AlphaFold2Metrics instance on .metrics.

Structure fields:

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

Metrics (one set per structures item)

Metric	Type	Range	Availability
`avg_plddt`	float	0.0 to 1.0	always
`ptm`	float	0.0 to 1.0	always
`iptm`	float	0.0 to 1.0	multi-chain input only
`avg_pae`	float	≥ 0.0	always
`pae`	list[list[float]]	≥ 0.0	when include_pae_matrix=True
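When include_pae_matrix=True, the pae metric is a square list of lists that can be summarized further. The sketch below is a minimal example, assuming the matrix rows and columns follow the chains in input order and that a plain mean is a reasonable summary; the function name, the dictionary keys, and the toy values are hypothetical, and the toolkit's own avg_pae may be computed differently.

```python
def summarize_pae(pae, chain_lengths):
    """Mean PAE over all residue pairs and over inter-chain pairs only."""
    n = len(pae)
    if sum(chain_lengths) != n or any(len(row) != n for row in pae):
        raise ValueError("pae must be square and match the total chain length")

    # Chain index of every residue, in input order.
    chain_of = [c for c, length in enumerate(chain_lengths) for _ in range(length)]

    all_values = [v for row in pae for v in row]
    inter = [
        pae[i][j]
        for i in range(n)
        for j in range(n)
        if chain_of[i] != chain_of[j]
    ]
    return {
        "mean_pae": sum(all_values) / len(all_values),
        # None for single-chain inputs, where no inter-chain pair exists.
        "mean_interchain_pae": sum(inter) / len(inter) if inter else None,
    }


# Toy 3-residue complex (made-up values): chain 0 has 2 residues, chain 1 has 1.
toy_pae = [
    [0.0, 1.0, 8.0],
    [1.0, 0.0, 6.0],
    [7.0, 9.0, 0.0],
]
summary = summarize_pae(toy_pae, chain_lengths=[2, 1])
print(summary)
```

A low mean over the inter-chain block suggests the chains are placed confidently relative to each other, which complements iptm when judging a predicted interface.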

Applications

This tool folds a protein sequence into a 3D model for structural analysis, docking, or as input to downstream structure tools. Running it on a multi-chain complex additionally estimates how confidently the chains are placed relative to each other through the interface pTM and PAE, which is informative for assessing predicted protein-protein interfaces.

Usage Tips

use_msa defaults to True. An MSA is then generated by a ColabFold search for each protein chain; set it False for single-sequence prediction (faster, usually lower accuracy), or attach precomputed MSAs to the input to skip the search.
model_num and num_ensemble_models are mutually exclusive. model_num (default 1) selects one of AlphaFold2’s five trained parameter sets; num_ensemble_models runs several and averages them for higher accuracy at higher cost. Setting both raises an error.
Confidence is reported as pLDDT, pTM, ipTM, and PAE. Average pLDDT (0 to 1) is the primary per-structure quality metric; ipTM is populated only for multi-chain complexes. Set include_pae_matrix to attach the full per-residue PAE matrix.
Protein sequences only. DNA, RNA, and ligands are not supported; X is allowed for unknown residues.
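The mutual-exclusion rule between model_num and num_ensemble_models can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the toolkit's AlphaFold2Config: the class name ToyAlphaFold2Config and the selected_models helper are hypothetical, "setting model_num" is taken to mean choosing a value other than the default 1, and ensembling is assumed to pick models 1 through N.

```python
from dataclasses import dataclass


@dataclass
class ToyAlphaFold2Config:
    """Hypothetical stand-in mirroring three documented config fields."""

    num_recycles: int = 3
    model_num: int = 1
    num_ensemble_models: int = 1

    def __post_init__(self):
        if not 1 <= self.model_num <= 5:
            raise ValueError("model_num must be between 1 and 5")
        if not 1 <= self.num_ensemble_models <= 5:
            raise ValueError("num_ensemble_models must be between 1 and 5")
        # Assumption: a non-default model_num counts as "set".
        if self.num_ensemble_models > 1 and self.model_num != 1:
            raise ValueError("set model_num or num_ensemble_models, not both")

    def selected_models(self):
        """Parameter sets that would be run (assumed to be 1..N when ensembling)."""
        if self.num_ensemble_models > 1:
            return list(range(1, self.num_ensemble_models + 1))
        return [self.model_num]


print(ToyAlphaFold2Config(model_num=3).selected_models())
print(ToyAlphaFold2Config(num_ensemble_models=3).selected_models())
```

Validating at construction time surfaces a conflicting configuration before any GPU work starts.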

AlphaFold2 Gradient (`alphafold2-gradient`)

Scores and differentiates a binder against a frozen target structure. Given a target-plus-binder template, it runs AlphaFold2 (through ColabDesign’s binder-design path) on the binder against the fixed target and returns the design loss, the predicted Structure, and, by default, the gradient of the loss with respect to the binder sequence logits.

API Reference

Input: AlphaFold2GradientInput

target_pdb

Structure

required

Target+binder template PDB. Accepts a file path, raw PDB/CIF content string, Structure object, or a dict in the shape produced by Structure.model_dump(mode='json').

Structure fields:

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

default:"unspecified"

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

target_chain

string

default:"A"

Chain ID(s) of the frozen target in the PDB.

target_hotspot

string

Comma-separated hotspot residue indices on the target.

binder_chain

string

Binder template chain to redesign; None (default) designs de novo.

design_positions

array

Zero-based binder residue indices for loss focus (e.g. CDR loops). Germinal backend only.

logits

List[array]

required

Inherited — relaxed sequence logits (L x 20).

temperature

number

default:"1.0"

Inherited — softmax temperature.
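The logits and temperature inputs are easier to picture with a small numeric example. The sketch below applies a temperature-scaled softmax to one row of relaxed logits, assuming the usual convention of dividing the logits by the temperature before normalizing; the function name and the toy row are introduced here for illustration, and the toolkit's internal handling may differ in detail.

```python
import math


def softmax_with_temperature(row, temperature=1.0):
    """Convert one row of logits into amino-acid probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in row]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# One binder position with 20 columns; column 3 is mildly preferred (toy values).
row = [0.0] * 20
row[3] = 2.0

warm = softmax_with_temperature(row, temperature=1.0)
cold = softmax_with_temperature(row, temperature=0.5)
print(f"p[3] at T=1.0: {warm[3]:.3f}, at T=0.5: {cold[3]:.3f}")
```

Lowering the temperature sharpens the distribution toward the highest logit, while raising it flattens the distribution toward uniform.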

Config: AlphaFold2GradientConfig

include_pae_matrix

boolean

default:"False"

Attach full per-residue PAE matrix. Default: False.

bias_redesign

number

Persistent softmax bias toward wildtype at non-design positions. Germinal backend only.

omit_aas

array

Amino acids to ban (e.g. ["C", "W"]).

num_recycles

integer

default:"3"

AF2 recycling iterations.

recycle_mode

enum

default:"last"

Which recycle’s output is used for loss/gradient. "last" matches Germinal’s VHH default; "average" averages across recycles; "sample" picks one uniformly; "first" uses only recycle 0. Available options: last, sample, average, first

model_num

integer

default:"1"

AF2 parameter set (1-5).

sample_models

boolean

default:"False"

Randomly sample model sets each forward pass.

use_multimer

boolean

default:"True"

Use AlphaFold multimer parameters for binder protocol.

rm_target_seq

boolean

default:"True"

Mask target template sequence in prep_inputs.

rm_target_sc

boolean

default:"False"

Mask target template side chains in prep_inputs.

rm_template_ic

boolean

default:"True"

Mask inter-chain template contacts in prep_inputs.

soft

number

default:"1.0"

ColabDesign softmax blending (0=raw logits, 1=full softmax). Passed per-step by the gradient optimizer.

hard

number

default:"0.0"

ColabDesign hard-sequence blending (0=relaxed, 1=straight-through argmax).

backend

enum

default:"base"

"base" (upstream ColabDesign) or "germinal" (Germinal fork with alpha, bias, framework contacts, extension losses). Available options: base, germinal

compute_gradient

boolean

default:"True"

Run backward pass and return gradient; False for forward-only scoring (returns gradient=None).

starting_binder_seq

string

Optional one-letter AA string used to seed the binder before gradient updates. Germinal backend only; length must equal len(logits).

loss_weights

Dict[string, number]

Binder-objective weights. Base keys: plddt, i_plddt, pae, i_pae, con, i_con, exp_res, rmsd, dgram_cce, fape. Germinal extension keys: rg, i_ptm, NC, helix, beta_strand.

intra_contact_num

integer

default:"2"

Intra-chain contacts per residue.

intra_contact_cutoff

number

default:"14.0"

Intra-chain distance cutoff (Å).

inter_contact_num

integer

default:"1"

Interface contacts per residue.

inter_contact_cutoff

number

default:"21.6875"

Interface distance cutoff (Å).

framework_contact_offset

number

default:"1.0"

Framework contact penalty offset in the Germinal inter-chain contact loss. Germinal backend only.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Output: AlphaFold2GradientOutput

structure

Structure

required

Predicted target+binder complex from the forward pass. B-factors are at the raw 0-100 PDB scale; b_factor_type=PLDDT means Structure.per_residue_plddt normalizes them to [0, 1].

Structure fields:

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

gradient

array

Gradient matrix matching the input logits shape, or None when compute_gradient=False.

loss

number

required

Scalar objective value.

metrics

Dict[string, any]

Scalar auxiliary metrics (avg_plddt, ptm, iptm, avg_pae, plus per-loss values for every weighted ColabDesign loss term).

vocab

List[string]

required

Amino-acid column ordering.
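The gradient and vocab outputs are meant to be read together: vocab names the amino acid behind each gradient column. The sketch below is a minimal illustration, assuming a plain gradient-descent update in which the most negative gradient entry marks the logit that a step would raise the most. The function name, the 4-letter vocab, and the gradient values are made up; a real vocab has one entry per amino-acid column.

```python
def most_favored_residues(gradient, vocab):
    """For each position, the residue whose logit a descent step raises the most."""
    favored = []
    for row in gradient:
        if len(row) != len(vocab):
            raise ValueError("each gradient row needs one entry per vocab item")
        # Descent subtracts the gradient, so the most negative entry gains the most.
        best = min(range(len(row)), key=lambda k: row[k])
        favored.append(vocab[best])
    return favored


toy_vocab = ["A", "R", "N", "D"]  # hypothetical shortened vocab
toy_gradient = [
    [0.2, -0.5, 0.1, 0.0],
    [-0.1, 0.3, 0.0, -0.4],
]
print(most_favored_residues(toy_gradient, toy_vocab))
```

Reading columns through vocab rather than assuming a fixed alphabet order keeps downstream code correct if the ordering ever differs from what is expected.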

Metrics

Metric	Type	Range	Availability
`avg_plddt`	float	0.0 to 1.0	always
`ptm`	float	0.0 to 1.0	always
`iptm`	float	0.0 to 1.0	multi-chain input only
`avg_pae`	float	≥ 0.0	always
`pae`	list[list[float]]	≥ 0.0	when include_pae_matrix=True
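Because the returned structure stores pLDDT in the B-factor column on the 0-100 scale while avg_plddt is reported on 0 to 1, it can help to see the conversion spelled out. The sketch below parses the fixed-width ATOM records of a PDB string and divides the C-alpha B-factors by 100. It is an illustration only and not the implementation of Structure.per_residue_plddt; the function names and the two-residue demo structure are assumptions.

```python
def plddt_from_pdb(pdb_text, scale=100.0):
    """Map (chain_id, residue_number) to pLDDT in [0, 1] using CA B-factors."""
    plddt = {}
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, chain 22, residue number 23-26,
        # B-factor 61-66 (1-based), hence the 0-based slices below.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            res_seq = int(line[22:26])
            plddt[(chain_id, res_seq)] = float(line[60:66]) / scale
    return plddt


def _demo_ca_line(serial, res_name, chain_id, res_seq, b_factor):
    """Build one correctly aligned CA ATOM record with dummy coordinates."""
    return (
        f"ATOM  {serial:5d}  CA  {res_name:3s} {chain_id}{res_seq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}"
    )


demo_pdb = "\n".join(
    [
        _demo_ca_line(1, "ALA", "A", 1, 87.5),
        _demo_ca_line(2, "GLY", "B", 1, 42.0),
        "END",
    ]
)
print(plddt_from_pdb(demo_pdb))
```

Keeping the scale explicit avoids mixing 0-100 values read from a file with the 0 to 1 values reported in the metrics table.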

Applications

This tool supplies the loss and gradient signal that gradient-based binder-design methods optimize against a chosen target. With compute_gradient=False it instead provides forward-only scoring of a candidate binder (loss, metrics, and predicted structure) for ranking or filtering.
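How an optimizer might consume the loss and gradient can be sketched without a GPU. In the example below, toy_loss_and_gradient is a hypothetical stand-in for a call to the gradient tool: it returns a cross-entropy loss that prefers one column at every position, plus the exact gradient with respect to the logits. The loop, the learning rate, and the step count are illustrative assumptions, not a recommended design protocol.

```python
import math


def _softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_loss_and_gradient(logits, preferred=0):
    """Stand-in scorer: mean cross-entropy toward one column, with its gradient."""
    n = len(logits)
    loss = 0.0
    gradient = []
    for row in logits:
        probs = _softmax(row)
        loss -= math.log(probs[preferred]) / n
        # d(-log softmax_t)/d(logit_k) = p_k - 1[k == t]
        gradient.append(
            [(p - (1.0 if k == preferred else 0.0)) / n for k, p in enumerate(probs)]
        )
    return loss, gradient


def optimize_logits(logits, score_fn, steps=25, learning_rate=1.0):
    """Plain gradient descent on the logits, one scorer call per step."""
    logits = [list(row) for row in logits]
    history = []
    for _ in range(steps):
        loss, gradient = score_fn(logits)
        history.append(loss)
        logits = [
            [x - learning_rate * g for x, g in zip(row, g_row)]
            for row, g_row in zip(logits, gradient)
        ]
    return logits, history


start = [[0.0] * 4 for _ in range(3)]  # 3 positions, 4 toy columns
final_logits, losses = optimize_logits(start, toy_loss_and_gradient)
print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
```

In a real run the scorer would be replaced by one gradient-tool call per step, and the forward-only mode would be enough for ranking the final candidates.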

Usage Tips

Use this for protein binder objectives, not ligand generation. The input binder is an amino-acid chain and the returned losses describe a predicted protein-protein interface. For small-molecule compounds, choose chemistry-aware ligand tools instead.
One binder configuration per call; this tool is not an optimization loop. It evaluates a single binder against the fixed target. Drive it from a binder-design optimizer, or call it repeatedly, to actually design a binder.
compute_gradient defaults to True. It runs a forward and backward pass and returns the gradient with respect to the binder logits; set it False for forward-only scoring (gradient=None). The loss, metrics, and predicted structure are identical in both modes.
backend selects the loss set. "base" (the default) uses the upstream ColabDesign losses; "germinal" adds the Germinal fork’s alpha, bias, framework-contact, and extension losses. starting_binder_seq is only valid with "germinal".
target_hotspot focuses the design on chosen target residues. Supply comma-separated residue indices on the target to bias the binder toward a specific epitope. loss_weights tunes the objective terms and accepts only the validated keys listed in its config entry.
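The soft and hard config fields can be read as two blending weights applied in sequence. The sketch below follows the field descriptions literally (0=raw logits and 1=full softmax for soft, 0=relaxed and 1=argmax for hard) and shows forward values only. The straight-through part of hard affects the backward pass and is not represented here; the function name is hypothetical, and the exact ColabDesign computation may differ.

```python
import math


def blend_row(logits_row, temperature=1.0, soft=1.0, hard=0.0):
    """Forward value of one sequence position under soft/hard blending."""
    scaled = [x / temperature for x in logits_row]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # soft blends between the raw logits (0) and the softmax probabilities (1).
    relaxed = [soft * p + (1.0 - soft) * x for p, x in zip(probs, logits_row)]

    # hard blends between the relaxed row (0) and its one-hot argmax (1).
    best = max(range(len(relaxed)), key=lambda k: relaxed[k])
    one_hot = [1.0 if k == best else 0.0 for k in range(len(relaxed))]
    return [hard * h + (1.0 - hard) * r for h, r in zip(one_hot, relaxed)]


row = [2.0, 0.5, -1.0, 0.0]  # toy 4-column row
raw = blend_row(row, soft=0.0, hard=0.0)
probabilities = blend_row(row, soft=1.0, hard=0.0)
discrete = blend_row(row, soft=1.0, hard=1.0)
print(raw, probabilities, discrete)
```

Gradient-based design schedules commonly move from the relaxed end toward the discrete end so that the final sequence corresponds to actual amino acids.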

Toolkit Notes

These apply to every AlphaFold2 tool in this toolkit (alphafold2-prediction, alphafold2-gradient).

Requires a GPU. Both tools run AlphaFold2 through a JAX backend and need an NVIDIA GPU; CPU execution is not practical.
Runs the original AlphaFold2 through ColabDesign, not the full DeepMind pipeline. There is no template-search stage; multiple-sequence alignments are optional and are used only by alphafold2-prediction.
num_recycles (default 3) applies to both tools. Each recycling iteration refines the structure; raising it can improve accuracy at the cost of longer runtime.
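For the gradient tool, recycle_mode decides which recycling pass feeds the loss. The sketch below applies the four documented modes to a list of per-recycle values, taking the option descriptions at face value. The function name and the toy losses are hypothetical, and how many passes a given num_recycles produces is left out on purpose.

```python
import random


def select_recycle_value(per_recycle, mode="last", rng=None):
    """Pick or combine per-recycle values according to a recycle mode."""
    if not per_recycle:
        raise ValueError("need at least one recycle value")
    if mode == "last":
        return per_recycle[-1]
    if mode == "first":
        return per_recycle[0]
    if mode == "average":
        return sum(per_recycle) / len(per_recycle)
    if mode == "sample":
        # Uniform choice; pass a seeded random.Random for reproducibility.
        return (rng or random).choice(per_recycle)
    raise ValueError(f"unknown recycle_mode: {mode!r}")


toy_losses = [4.0, 3.0, 2.0, 1.0]  # made-up loss after each pass
for mode in ("last", "first", "average"):
    print(mode, select_recycle_value(toy_losses, mode))
print("sample", select_recycle_value(toy_losses, "sample", rng=random.Random(0)))
```

Using the last pass scores the most refined prediction, while averaging or sampling spreads the objective across intermediate passes.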

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
