pDockQ2 - Proto

License: pDockQ2 has an AGPL-3.0 license and may require explicit attribution when utilized. Please refer to the license for full terms.

This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes

Wensi Zhu, Aditi Shenoy, … Arne Elofsson

Bioinformatics (2023)

@article{zhu_2023_pdockq2,
  title={Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes},
  author={Zhu, Wensi and Shenoy, Aditi and Kundrotas, Petras and Elofsson, Arne},
  journal={Bioinformatics},
  volume={39},
  number={7},
  pages={btad424},
  year={2023},
  doi={10.1093/bioinformatics/btad424},
}

@article{bryant_2022_pdockq,
  title={Improved prediction of protein-protein interactions using AlphaFold2},
  author={Bryant, Patrick and Pozzati, Gabriele and Elofsson, Arne},
  journal={Nature Communications},
  volume={13},
  number={1},
  pages={1265},
  year={2022},
  doi={10.1038/s41467-022-28865-w},
}

proto-bio/proto-tools/proto_tools/tools/structure_scoring/pdockq2

Function	Description
`run_pdockq2()`	Score a cofolded protein complex with pDockQ2 (Zhu 2023), using pLDDT + PAE to summarize interfac…

Background

DockQ (Basu and Wallner, 2016) is a continuous interface-quality measure for protein-protein docking models that combines the CAPRI quality indicators (fraction of native contacts, interface RMSD, and ligand RMSD) into a single score in the range 0 to 1. The published thresholds approximate the CAPRI quality classes of Acceptable (DockQ ≥ 0.23), Medium (DockQ ≥ 0.49), and High (DockQ ≥ 0.80). DockQ requires a known reference complex and cannot be computed when only the predicted structure is available.

pDockQ (Bryant, Pozzati, and Elofsson, 2022) was introduced as a predicted version of DockQ that uses only AlphaFold2 outputs, with no reference complex required. It estimates DockQ for a dimer from the mean pLDDT of interface residues together with the logarithm of the number of interface contacts, calibrated against ground-truth DockQ values on a benchmark of heterodimers.

pDockQ2 (Zhu, Shenoy, Kundrotas, and Elofsson, 2023) generalises pDockQ to larger multi-chain complexes and replaces the contact-count term with the Predicted Aligned Error (PAE) matrix, which captures pairwise residue-position uncertainty across chains. For each interface, the score combines the contact-weighted mean interface pLDDT with the mean of 1 / (1 + (PAE / 10)²) over interface residue pairs, then passes the product through a logistic sigmoid whose parameters were fit against ground-truth DockQ values on the AlphaFold-Multimer benchmark. The published analysis demonstrates that pDockQ2 estimates DockQ for each interface in a multimer rather than only for a single dimer.
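
The per-interface computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the toolkit's implementation: the function names are hypothetical, the inputs are assumed to be one pLDDT value (0 to 100) and one PAE value (Å) per interface contact pair, and the four sigmoid constants are assumed to match the ElofssonLab/afm-benchmark reference script, so they should be checked against that repository before exact values are relied upon.

```python
import math

# Sigmoid constants: ASSUMED to match the ElofssonLab/afm-benchmark reference
# script. Verify against that repository before relying on exact values.
L_MAX, X0, K, B = 1.31034849, 84.7326239, 0.0747157696, 0.00501886443


def pae_confidence(pae: float, d0: float = 10.0) -> float:
    """Map a PAE value in Å to a 0-1 confidence: 1 / (1 + (PAE / d0)^2)."""
    return 1.0 / (1.0 + (pae / d0) ** 2)


def interface_pdockq2(contact_plddt, contact_pae):
    """Estimate DockQ for one interface from per-contact pLDDT and PAE lists."""
    if not contact_plddt or len(contact_plddt) != len(contact_pae):
        raise ValueError("need equally sized, non-empty per-contact lists")
    # Contact-weighted mean pLDDT: one entry per contact pair.
    mean_plddt = sum(contact_plddt) / len(contact_plddt)
    # Mean normalised PAE confidence over the same contact pairs.
    mean_conf = sum(pae_confidence(p) for p in contact_pae) / len(contact_pae)
    # Logistic sigmoid of the product, shifted by the offset B.
    x = mean_plddt * mean_conf
    return L_MAX / (1.0 + math.exp(-K * (x - X0))) + B


confident = interface_pdockq2([95.0, 95.0], [1.0, 1.0])
uncertain = interface_pdockq2([50.0, 50.0], [20.0, 20.0])
```

In this sketch a confident interface (high pLDDT, low PAE) scores well above an uncertain one, because both factors of the product shrink together when the model is unsure.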

Learning Resources

ElofssonLab/afm-benchmark (Elofsson Lab, Stockholm University). Reference implementation of pDockQ2 and the benchmark data from the original publication.
bjornwallner/DockQ (Wallner Lab, Linköping University). Reference implementation of the underlying DockQ measure that pDockQ2 estimates.

Tools

pDockQ2 Interface Quality (`pdockq2`)

Scores the per-interface quality of a cofolded protein complex by computing pDockQ2 for each chain pair and aggregating the per-chain scores into a single overall score. The tool takes a Structure with per-residue pLDDT in the B-factor column and the PAE matrix attached at structure.metrics["pae"], identifies CA-CA contacts between every pair of chains within a configurable distance cutoff, applies the published sigmoid, and returns the overall score together with a per-chain interface breakdown.
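
The contact-identification step can be illustrated with a small standalone sketch. It assumes a hypothetical input layout (`ca_coords`, a mapping from chain ID to a list of CA coordinates) rather than the toolkit's Structure object, and it treats the cutoff as inclusive, which is also an assumption.

```python
import math
from itertools import combinations


def cross_chain_contacts(ca_coords, cutoff=10.0):
    """Return {(chain_a, chain_b): [(i, j), ...]} for CA-CA pairs within cutoff Å.

    ca_coords maps chain ID -> list of (x, y, z) CA coordinates
    (a hypothetical layout chosen for this example).
    """
    contacts = {}
    for a, b in combinations(sorted(ca_coords), 2):
        # All-against-all distance check between the two chains.
        pairs = [
            (i, j)
            for i, p in enumerate(ca_coords[a])
            for j, q in enumerate(ca_coords[b])
            if math.dist(p, q) <= cutoff  # inclusive cutoff is an assumption
        ]
        if pairs:
            contacts[(a, b)] = pairs
    return contacts


# Toy complex: chain B has residues 9 Å and 12 Å away from the single residue of A.
toy = {"A": [(0.0, 0.0, 0.0)], "B": [(9.0, 0.0, 0.0), (12.0, 0.0, 0.0)]}
contacts_10 = cross_chain_contacts(toy, cutoff=10.0)
contacts_8 = cross_chain_contacts(toy, cutoff=8.0)
```

With the toy coordinates, the 10.0 Å cutoff finds one contact and the 8.0 Å cutoff finds none, which shows how the choice of cutoff changes the set of interface residues that enter the score.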

API Reference

Input: PDockQ2Input

structure

Structure

required

Cofolded complex with per-residue pLDDT in the B-factor column (b_factor_type must be PLDDT or NORMALIZED_PLDDT) and the PAE matrix attached at structure.metrics['pae'] as a square list[list[float]] whose dimension matches the structure’s total residue count.

structure

string

required

Raw structure content in PDB or CIF format.

structure_format

string

Format of the content string (auto-detected if omitted).

b_factor_type

BFactorType

default:"unspecified"

What the B-factor column represents.

source

string

Optional source identifier (filepath or tool name).

metrics

Metrics

Associated metrics (e.g., pLDDT, pTM scores, per-chain lists, pairwise matrices). None values are stripped at construction.

binder_chain

SingleChainSelection

required

Single-character chain ID of the binder (e.g. VHH).

chain

string

required

The selected chain ID.

target_chains

ChainSelection

required

Target chain IDs (single character each).

chains

List[string]

required

Chain IDs in the selection.
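
The requirements on the structure field (a square PAE matrix sized to the residue count, and pLDDT on a known scale) can be pre-checked before calling the tool. The helpers below are a hypothetical sketch of such checks written against plain Python lists; they are not part of proto-tools and do not reproduce its error messages.

```python
def check_pae(pae, n_residues):
    """Raise ValueError unless pae is a square n_residues x n_residues matrix."""
    if not pae:
        raise ValueError("PAE matrix is missing")
    if any(len(row) != len(pae) for row in pae):
        raise ValueError("PAE matrix is not square")
    if len(pae) != n_residues:
        raise ValueError(
            f"PAE is {len(pae)}x{len(pae)}, expected {n_residues}x{n_residues}"
        )


def to_raw_plddt(values, normalized):
    """Return pLDDT on the 0-100 scale; normalized=True means input is 0-1."""
    return [v * 100.0 for v in values] if normalized else list(values)


check_pae([[0.0, 1.0], [1.0, 0.0]], n_residues=2)  # passes silently
raw = to_raw_plddt([0.5, 0.25], normalized=True)
```

Running checks like these up front gives a clearer failure than discovering a mismatched matrix after a long prediction run.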

Config: PDockQ2Config

distance_cutoff

number

default:"10.0"

CA-CA distance cutoff in Å for interface residue detection. Defaults to 10.0, matching germinal’s pDockQ.pDockQ2 wrapper default.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cpu"

Device to run the tool on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: PDockQ2Output

metrics

PDockQ2Metrics

required

Scalar pDockQ2 metrics plus per-chain interface breakdown.

interfaces

List[InterfacePDockQ2]

Per-target-chain breakdown kept as a Pydantic field so it stays out of metric iteration.

primary_metric

string

Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.

Metrics

Metric	Type	Range	Availability
`pdockq2`	float	0.0 to 1.0	always
`avg_interface_plddt`	float	0.0 to 100.0	always
`avg_interface_pae`	float	0.0 to 1.0	always
`num_interface_contacts`	int	≥ 0	always

Applications

This tool is appropriate for filtering and ranking cofolded complexes from structure-prediction tools such as AlphaFold-Multimer, AlphaFold 3, Chai-1, Boltz-2, and Protenix. Representative applications include gating candidate protein binders from a design pipeline by predicted interface quality, ranking the most promising poses in a multi-chain prediction ensemble, and screening large sets of predicted complexes before committing to more expensive downstream analyses.
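
Gating and ranking of this kind reduce to a filter and a sort once each candidate has a score. The sketch below assumes the scores have already been collected into a plain dictionary; the candidate names and values are made up, and the label for scores under 0.23 is a placeholder chosen for this example.

```python
# DockQ quality-class lower bounds quoted in the Background section.
QUALITY_CLASSES = [(0.80, "High"), (0.49, "Medium"), (0.23, "Acceptable")]


def quality_class(score):
    """Return the first quality class whose lower bound the score reaches."""
    for threshold, label in QUALITY_CLASSES:
        if score >= threshold:
            return label
    return "Below acceptable"  # placeholder label, not taken from the source


def gate_and_rank(scores, min_score=0.23):
    """Keep candidates scoring at least min_score, best first."""
    kept = [(name, s) for name, s in scores.items() if s >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up pDockQ2 scores for three hypothetical designs.
candidates = {"design_01": 0.62, "design_02": 0.12, "design_03": 0.85}
ranked = gate_and_rank(candidates)
labels = {name: quality_class(s) for name, s in candidates.items()}
```

Raising `min_score` to 0.49 would keep only Medium-or-better candidates, which may be preferable when downstream analyses are expensive.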

Usage Tips

The PAE matrix is required and must be attached at structure.metrics["pae"] as a square list[list[float]] whose dimension matches the total residue count of the structure. The input is rejected when the matrix is missing, not square, or of the wrong dimension.
Per-residue pLDDT must be supplied via the B-factor column. Structure predictors in proto-tools return the correct b_factor_type automatically, and Structure.from_file() auto-detects it for AlphaFold DB and ModelArchive files. For manually provided structures from other sources, pass b_factor_type=BFactorType.PLDDT (raw 0 to 100) or BFactorType.NORMALIZED_PLDDT (0 to 1) explicitly. The input is rejected when b_factor_type is any other value, since the published sigmoid was fit on a 0 to 100 pLDDT scale.
A pDockQ2 score of 0.23 or higher corresponds to the “Acceptable” DockQ quality class. The thresholds derive from the underlying DockQ measure (Basu and Wallner, 2016): scores of 0.49 or higher correspond to “Medium” quality and scores of 0.80 or higher to “High” quality. Scores below 0.23 typically reflect either low interface pLDDT or high cross-chain PAE.
The overall score is the mean of pmidockq over target chains that contact the binder chain. When no target chain in target_chains is within the distance cutoff of binder_chain, the overall score is set to 0.0, num_interface_contacts is reported as 0, and a warning is logged. Verify the chain identifiers and the cutoff before interpreting an all-zero result as a poor interface.
distance_cutoff controls the CA-CA contact distance used to define interface residues. The wrapper default of 10.0 Å is more permissive than the 8.0 Å default used by the Elofsson Lab reference implementation against which the published sigmoid was calibrated. The qualitative DockQ-quality interpretation still applies at 10.0 Å, but quantitative scores will not exactly match the published values. Set distance_cutoff=8.0 for scores that match the original pDockQ2 calibration. The PAE normalisation constant in 1 / (1 + (PAE / 10)²), which is applied before the sigmoid, is independently fixed at 10 Å per the published formula and is not affected by this setting.
The interface pLDDT is contact-pair weighted, not residue-deduplicated. A residue that contacts k cross-chain partners contributes its pLDDT k times to the interface mean. This matches the published pDockQ2 definition and is preserved by the wrapper.
The per-chain breakdown is available on result.metrics.interfaces. Each InterfacePDockQ2 entry exposes chain_id, neighbor_chains, if_plddt (0 to 100 pLDDT scale), norm_pae (0 to 1 normalised confidence, higher is more confident), and pmidockq (0 to 1 DockQ-scale prediction) for one chain. Inspect this list when debugging multi-chain targets or when the overall mean masks variation across interfaces.
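
Two of the tips above involve simple arithmetic that is easy to misread: contact-pair weighting of the interface pLDDT, and the overall score falling back to 0.0 when nothing is in contact. The following sketch works through both with toy numbers. The function names and the contact representation (a list of `(residue_in_this_chain, residue_in_partner_chain)` index pairs) are assumptions made for this example, not the wrapper's internals.

```python
from statistics import mean


def contact_weighted_plddt(contacts, plddt):
    """Mean pLDDT over contact pairs: a residue in k contacts counts k times."""
    return mean(plddt[i] for i, _ in contacts)


def deduplicated_plddt(contacts, plddt):
    """Mean pLDDT over unique interface residues, shown only for comparison."""
    return mean(plddt[i] for i in {i for i, _ in contacts})


def overall_score(per_chain_scores):
    """Mean of per-chain scores, or 0.0 when no target chain is in contact."""
    return mean(per_chain_scores) if per_chain_scores else 0.0


plddt = {0: 90.0, 1: 60.0}
# Residue 0 touches three partner residues; residue 1 touches one.
contacts = [(0, 10), (0, 11), (0, 12), (1, 10)]
weighted = contact_weighted_plddt(contacts, plddt)  # (3 * 90 + 60) / 4
dedup = deduplicated_plddt(contacts, plddt)         # (90 + 60) / 2
```

Here the contact-weighted mean is 82.5 while the deduplicated mean is 75.0, so heavily contacted residues pull the interface pLDDT toward their own confidence. `overall_score([])` returns 0.0, which mirrors the all-zero result described above for a binder with no target chain within the cutoff.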

Toolkit Notes

These apply to every pDockQ2 tool in this toolkit (pdockq2).

Outputs are returned as typed metric objects. Each PDockQ2Metrics result carries the overall pdockq2 score (0 to 1), avg_interface_plddt (0 to 100 pLDDT scale), avg_interface_pae (0 to 1 normalised confidence), and num_interface_contacts (integer count) together with a per-chain interfaces breakdown. The headline primary_metric is pdockq2, and results can be exported to JSON through the standard export method.
The tool implementation runs entirely in-process and uses CPU only. The scoring formula is re-implemented in pure Python with numpy, and no standalone environment or separate program is invoked. Per-call runtime is sub-second for typical complex sizes and scales quadratically with the total residue count because of the all-against-all CA-CA distance computation.

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
