pDockQ2
License: pDockQ2 has an AGPL-3.0 license and may require explicit attribution when utilized. Please refer to the license for full terms.

This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes
Wensi Zhu, Aditi Shenoy, … Arne Elofsson
Bioinformatics (2023)
@article{zhu_2023_pdockq2,
  title={Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes},
  author={Zhu, Wensi and Shenoy, Aditi and Kundrotas, Petras and Elofsson, Arne},
  journal={Bioinformatics},
  volume={39},
  number={7},
  pages={btad424},
  year={2023},
  doi={10.1093/bioinformatics/btad424},
}

@article{bryant_2022_pdockq,
  title={Improved prediction of protein-protein interactions using AlphaFold2},
  author={Bryant, Patrick and Pozzati, Gabriele and Elofsson, Arne},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={1265},
  year={2022},
  doi={10.1038/s41467-022-28865-w},
}
proto-bio/proto-tools/proto_tools/tools/structure_scoring/pdockq2
Function        Description
run_pdockq2()   Score a cofolded protein complex with pDockQ2 (Zhu 2023), using pLDDT + PAE to summarize interfac…

Background

DockQ (Basu and Wallner, 2016) is a continuous interface-quality measure for protein-protein docking models that combines the CAPRI quality indicators (fraction of native contacts, interface RMSD, and ligand RMSD) into a single score in the range 0 to 1. The published thresholds approximate the CAPRI quality classes of Acceptable (DockQ ≥ 0.23), Medium (DockQ ≥ 0.49), and High (DockQ ≥ 0.80). DockQ requires a known reference complex and cannot be computed when only the predicted structure is available.

pDockQ (Bryant, Pozzati, and Elofsson, 2022) was introduced as a predicted version of DockQ that uses only AlphaFold2 outputs, with no reference complex required. It estimates DockQ for a dimer from the mean pLDDT of interface residues together with the logarithm of the number of interface contacts, calibrated against ground-truth DockQ values on a benchmark of heterodimers.

pDockQ2 (Zhu, Shenoy, Kundrotas, and Elofsson, 2023) generalises pDockQ to larger multi-chain complexes and replaces the contact-count term with the Predicted Aligned Error (PAE) matrix, which captures pairwise residue-position uncertainty across chains. For each interface, the score combines the contact-weighted mean interface pLDDT with the mean of 1 / (1 + (PAE / 10)²) over interface residue pairs, then passes the product through a logistic sigmoid whose parameters were fit against ground-truth DockQ values on the AlphaFold-Multimer benchmark. The published analysis demonstrates that pDockQ2 estimates DockQ for each interface in a multimer rather than only for a single dimer.
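
The pDockQ2 combination described above can be sketched in a few lines of Python. This is an illustrative sketch, not the toolkit's code: the function names are hypothetical, and the four sigmoid constants are recalled from the Elofsson Lab reference implementation and should be checked against that source before use.

```python
import math

# ASSUMPTION: sigmoid parameters recalled from the Elofsson Lab reference
# implementation; verify them against that source before relying on them.
SIG_L, SIG_X0, SIG_K, SIG_B = 1.31034849, 84.7326239, 0.0747157696, 0.00501886443
D0 = 10.0  # PAE normalisation constant from the 1 / (1 + (PAE / 10)^2) term


def normalised_pae(pair_paes, d0=D0):
    """Mean of 1 / (1 + (PAE / d0)^2) over interface residue pairs."""
    return sum(1.0 / (1.0 + (p / d0) ** 2) for p in pair_paes) / len(pair_paes)


def pdockq2_interface(pair_plddts, pair_paes):
    """Hypothetical helper: score one interface.

    pair_plddts holds one pLDDT (0 to 100) per interface contact pair,
    pair_paes holds the PAE in Å for the same pairs.
    """
    if not pair_plddts or not pair_paes:
        return 0.0  # no contacts, so no interface to score
    mean_plddt = sum(pair_plddts) / len(pair_plddts)
    x = mean_plddt * normalised_pae(pair_paes)
    # Logistic sigmoid applied to the product of the two confidence terms.
    return SIG_L / (1.0 + math.exp(-SIG_K * (x - SIG_X0))) + SIG_B
```

A confident interface (high pLDDT, low PAE) gives a product near 100 and a score close to 1, while a PAE of 10 Å alone halves the product before the sigmoid is applied.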

Learning Resources

  • ElofssonLab/afm-benchmark (Elofsson Lab, Stockholm University). Reference implementation of pDockQ2 and the benchmark data from the original publication.
  • bjornwallner/DockQ (Wallner Lab, Linköping University). Reference implementation of the underlying DockQ measure that pDockQ2 estimates.

Tools

pDockQ2 Interface Quality (pdockq2)

Scores the per-interface quality of a cofolded protein complex by computing pDockQ2 for each chain pair and aggregating the per-chain scores into a single overall score. The tool takes a Structure with per-residue pLDDT in the B-factor column and the PAE matrix attached at structure.metrics["pae"], identifies CA-CA contacts between every pair of chains within a configurable distance cutoff, applies the published sigmoid, and returns the overall score together with a per-chain interface breakdown.
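
The contact-detection step can be illustrated with a small pure-Python sketch. The input layout (a dict of CA coordinates per chain), the function name, and the inclusive comparison at the cutoff are all assumptions made for the example; the tool's internal representation may differ.

```python
import math


def interface_contacts(ca_coords, chain_a, chain_b, distance_cutoff=10.0):
    """Return (i, j) residue-index pairs whose CA atoms lie within the cutoff.

    ca_coords maps a chain ID to a list of (x, y, z) CA coordinates in Å.
    This layout is an assumption made for the sketch.
    """
    pairs = []
    for i, a in enumerate(ca_coords[chain_a]):
        for j, b in enumerate(ca_coords[chain_b]):
            # Inclusive comparison is an assumption; the tool may use "<".
            if math.dist(a, b) <= distance_cutoff:
                pairs.append((i, j))
    return pairs


# Toy two-chain complex: chain B's first residue sits 6 Å from chain A's second.
toy = {
    "A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "B": [(3.8, 6.0, 0.0), (3.8, 30.0, 0.0)],
}
contacts = interface_contacts(toy, "A", "B", distance_cutoff=8.0)
```

The nested loop is the all-against-all comparison, so the cost grows with the product of the two chain lengths.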

API Reference

Source
  • structure (Structure, required): Cofolded complex with per-residue pLDDT in the B-factor column (b_factor_type must be PLDDT or NORMALIZED_PLDDT) and the PAE matrix attached at structure.metrics['pae'] as a square list[list[float]] whose dimension matches the structure’s total residue count.
  • binder_chain (SingleChainSelection, required): Single-character chain ID of the binder (e.g. VHH).
  • target_chains (ChainSelection, required): Target chain IDs (single character each).
Source
  • distance_cutoff (number, default: 10.0): CA-CA distance cutoff in Å for interface residue detection. Defaults to 10.0, matching germinal’s pDockQ.pDockQ2 wrapper default.
  • verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cpu"): Device to run the tool on.
  • timeout (integer, default: 600): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Source
  • metrics (PDockQ2Metrics, required): Scalar pDockQ2 metrics plus per-chain interface breakdown.
Metrics
Metric                   Type    Range          Availability
pdockq2                  float   0.0 to 1.0     always
avg_interface_plddt      float   0.0 to 100.0   always
avg_interface_pae        float   0.0 to 1.0     always
num_interface_contacts   int     ≥ 0            always

Applications

This tool is appropriate for filtering and ranking cofolded complexes from structure-prediction tools such as AlphaFold-Multimer, AlphaFold 3, Chai-1, Boltz-2, and Protenix. Representative applications include gating candidate protein binders from a design pipeline by predicted interface quality, ranking the most promising poses in a multi-chain prediction ensemble, and screening large sets of predicted complexes before committing to more expensive downstream analyses.

Usage Tips

  • The PAE matrix is required and must be attached at structure.metrics["pae"] as a square list[list[float]] whose dimension matches the total residue count of the structure. The input is rejected when the matrix is missing, not square, or of the wrong dimension.
  • Per-residue pLDDT must be supplied via the B-factor column. Structure predictors in proto-tools return the correct b_factor_type automatically, and Structure.from_file() auto-detects it for AlphaFold DB and ModelArchive files. For manually provided structures from other sources, pass b_factor_type=BFactorType.PLDDT (raw 0 to 100) or BFactorType.NORMALIZED_PLDDT (0 to 1) explicitly. The input is rejected when b_factor_type is any other value, since the published sigmoid was fit on a 0 to 100 pLDDT scale.
  • Interpret scores with the thresholds of the underlying DockQ measure (Basu and Wallner, 2016): a pDockQ2 score of at least 0.23 corresponds to the “Acceptable” DockQ quality class, at least 0.49 to “Medium”, and at least 0.80 to “High”. Scores below 0.23 typically reflect either low interface pLDDT or high cross-chain PAE.
  • The overall score is the mean of pmidockq over target chains that contact the binder chain. When no target chain in target_chains is within the distance cutoff of binder_chain, the overall score is set to 0.0, num_interface_contacts is reported as 0, and a warning is logged. Verify the chain identifiers and the cutoff before interpreting an all-zero result as a poor interface.
  • distance_cutoff controls the CA-CA contact distance used to define interface residues. The wrapper default of 10.0 Å is more permissive than the 8.0 Å default used by the Elofsson Lab reference implementation against which the published sigmoid was calibrated. The qualitative DockQ-quality interpretation still applies at 10.0 Å, but quantitative scores will not exactly match the published values. Set distance_cutoff=8.0 for scores that match the original pDockQ2 calibration. The PAE normalisation constant in the 1 / (1 + (PAE / 10)²) term is a separate quantity: it is fixed at 10 Å per the published formula and is not affected by this setting.
  • The interface pLDDT is contact-pair weighted, not residue-deduplicated. A residue that contacts k cross-chain partners contributes its pLDDT k times to the interface mean. This matches the published pDockQ2 definition and is preserved by the wrapper.
  • The per-chain breakdown is available on result.metrics.interfaces. Each InterfacePDockQ2 entry exposes chain_id, neighbor_chains, if_plddt (0 to 100 pLDDT scale), norm_pae (0 to 1 normalised confidence, higher is more confident), and pmidockq (0 to 1 DockQ-scale prediction) for one chain. Inspect this list when debugging multi-chain targets or when the overall mean masks variation across interfaces.
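
Several of the tips above (PAE validation, contact-pair weighting, the zero-contact fallback, and the quality thresholds) can be made concrete with a short sketch. All function names here are hypothetical and do not belong to the toolkit, the label "Below Acceptable" is a placeholder for scores under 0.23, and the inclusive thresholds follow the DockQ classes quoted in the Background section.

```python
def validate_pae(pae, n_residues):
    """Raise ValueError unless pae is an n_residues x n_residues matrix."""
    if pae is None:
        raise ValueError("PAE matrix is missing")
    if len(pae) != n_residues or any(len(row) != n_residues for row in pae):
        raise ValueError("PAE matrix must be square and match the residue count")


def contact_weighted_plddt(plddt_by_residue, contact_pairs):
    """Mean pLDDT over contact pairs; a residue in k pairs counts k times."""
    if not contact_pairs:
        raise ValueError("no contact pairs to average over")
    values = [plddt_by_residue[i] for i, _ in contact_pairs]
    return sum(values) / len(values)


def overall_score(per_chain_scores):
    """Mean per-chain score over contacting target chains, or 0.0 if none."""
    if not per_chain_scores:
        return 0.0
    return sum(per_chain_scores) / len(per_chain_scores)


def quality_class(score):
    """Map a score onto the DockQ quality classes (inclusive thresholds)."""
    if score >= 0.80:
        return "High"
    if score >= 0.49:
        return "Medium"
    if score >= 0.23:
        return "Acceptable"
    return "Below Acceptable"  # placeholder label, not from the source


# Residue 0 contacts two partners, so its pLDDT of 90 is counted twice:
# (90 + 90 + 60) / 3 = 80, whereas a residue-deduplicated mean would be 75.
weighted = contact_weighted_plddt({0: 90.0, 1: 60.0}, [(0, 0), (0, 1), (1, 0)])
```

Keeping the zero-contact case as an explicit 0.0 mirrors the tip about verifying chain identifiers: a score of exactly 0.0 signals "no interface found" rather than a scored but poor interface.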

Toolkit Notes

These apply to every pDockQ2 tool in this toolkit (pdockq2).
  • Outputs are returned as typed metric objects. Each PDockQ2Metrics result carries the overall pdockq2 score (0 to 1), avg_interface_plddt (0 to 100 pLDDT scale), avg_interface_pae (0 to 1 normalised confidence), and num_interface_contacts (integer count) together with a per-chain interfaces breakdown. The headline primary_metric is pdockq2, and results can be exported to JSON through the standard export method.
  • The tool implementation runs entirely in-process and uses CPU only. The scoring formula is re-implemented in pure Python with numpy, and no standalone environment or separate program is invoked. Per-call runtime is sub-second for typical complex sizes and scales quadratically with the total residue count because of the all-against-all CA-CA distance computation.
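
To show how the typed result might be inspected and exported, the sketch below uses plain dataclasses as hypothetical stand-ins for PDockQ2Metrics and InterfacePDockQ2. Only the field names come from this page; the class names, the numeric values, and the use of dataclasses.asdict for export are assumptions, and the toolkit's own export method should be preferred in practice.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InterfaceEntry:  # hypothetical stand-in for InterfacePDockQ2
    chain_id: str
    neighbor_chains: list
    if_plddt: float   # 0 to 100 pLDDT scale
    norm_pae: float   # 0 to 1, higher is more confident
    pmidockq: float   # 0 to 1 DockQ-scale prediction


@dataclass
class MetricsSketch:  # hypothetical stand-in for PDockQ2Metrics
    pdockq2: float
    avg_interface_plddt: float
    avg_interface_pae: float
    num_interface_contacts: int
    interfaces: list = field(default_factory=list)


# Illustrative values only: the overall mean of 0.45 hides one weak chain.
metrics = MetricsSketch(
    pdockq2=0.45,
    avg_interface_plddt=80.0,
    avg_interface_pae=0.70,
    num_interface_contacts=42,
    interfaces=[
        InterfaceEntry("B", ["A"], 88.0, 0.85, 0.70),
        InterfaceEntry("C", ["A"], 72.0, 0.55, 0.20),
    ],
)

# Find the chain that drags the overall mean down.
weakest = min(metrics.interfaces, key=lambda entry: entry.pmidockq)

# asdict recurses into nested dataclasses, so the result is JSON-serialisable.
payload = json.dumps(asdict(metrics), indent=2)
```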
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.