
This toolkit is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
Background
The tool assigns secondary structure using the P-SEA algorithm (Labesse, Colloc’h, Pothier, and Mornon, 1997) implemented in Biotite, which classifies each protein residue as alpha helix, beta sheet, or loop from the Cα-atom trace alone using distance and angle patterns. The reported helix, sheet, and loop percentages summarise the overall secondary-structure composition of the structure, while the longest contiguous alpha-helix length is a separate scalar that captures the longest single helix without averaging across the structure.
Radius of gyration is the mass-weighted root-mean-square distance of all atoms from the centre of mass and is a standard scalar measure of overall structural compactness used in small-angle X-ray scattering, polymer physics, and protein structural analysis. For proteins of a given length, compact native folds produce smaller gyration radii than disordered or partially folded conformations, which is what makes the metric useful as an artifact filter for predicted structures.
Both metrics are useful as inexpensive sanity checks on structures produced by sequence-based predictors such as ESMFold, AlphaFold, Chai, Boltz, and Protenix. Predictors can default to extended helical bundles for low-confidence regions, and failed folds frequently appear as extended conformations with elevated gyration radii. The Biotite Python library (Kunzmann et al., 2023) provides the underlying secondary-structure annotation and gyration-radius implementations used by this tool.
Learning Resources
- Biotite documentation (TU Darmstadt). API reference for the secondary-structure annotation and gyration-radius computations used by this tool.
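As a rough illustration of the radius of gyration described in the Background, the definition can be written out directly. This is a minimal standard-library sketch, not the Biotite implementation the tool uses; the function name `gyration_radius_sketch` and the plain tuple inputs are assumptions made for the example.

```python
import math


def gyration_radius_sketch(coords, masses=None):
    """Mass-weighted RMS distance of atoms from their centre of mass.

    coords: sequence of (x, y, z) positions in Å.
    masses: optional per-atom masses; equal masses are assumed if omitted.
    Hypothetical helper for illustration only.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    if len(coords) == 0 or len(coords) != len(masses):
        raise ValueError("coords and masses must be non-empty and equal in length")

    total_mass = sum(masses)
    # Centre of mass, one component at a time.
    com = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted sum of squared distances from the centre of mass.
    weighted_sq = sum(
        m * sum((c[axis] - com[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Two equal-mass atoms 10 Å apart: each sits 5 Å from the centre of mass.
rg_pair = gyration_radius_sketch([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)])

# Unequal masses pull the centre of mass toward the heavier atom.
rg_weighted = gyration_radius_sketch(
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], masses=[3.0, 1.0]
)
```

For a fixed number of atoms, spreading them further from the centre of mass increases the value, which is why extended or unfolded conformations score higher than compact ones.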
Tools
Structure Quality Metrics (structure-metrics)
Computes five quality metrics for each input structure: helix_pct, sheet_pct, loop_pct (secondary-structure composition on the 0 to 100 scale), longest_alpha_helix (residue count of the longest contiguous alpha-helical segment), and gyration_radius (radius of gyration in Å). Inputs are passed as a list of structures and results are returned in the same order.
API Reference
Input: StructureMetricsInput
- […] Structure objects per item.
Config: StructureMetricsConfig
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- […] BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: StructureMetricsOutput
- […] inputs.structures.metrics item)
| Metric | Type | Range | Availability |
|---|---|---|---|
| longest_alpha_helix | int | ≥ 0.0 | always |
| gyration_radius | float | ≥ 0.0 | always |
| helix_pct | float | 0.0 to 100.0 | always |
| sheet_pct | float | 0.0 to 100.0 | always |
| loop_pct | float | 0.0 to 100.0 | always |
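The composition metrics and the longest-helix count in the table above all follow from a single per-residue annotation. The sketch below shows that derivation on a plain label string. It assumes the one-letter convention 'a' (alpha helix), 'b' (beta sheet), 'c' (coil or loop), and the function name `summarise_sse` is hypothetical; it is not the tool's own code.

```python
from itertools import groupby


def summarise_sse(sse):
    """Derive composition percentages and longest helix from per-residue labels.

    sse: string or sequence of labels, assumed to be 'a' (helix),
    'b' (sheet), or 'c' (loop). Hypothetical helper for illustration.
    """
    n_residues = len(sse)
    if n_residues == 0:
        raise ValueError("annotation must contain at least one residue")

    def percentage(label):
        # Share of residues carrying this label, on the 0 to 100 scale.
        return 100.0 * sum(1 for s in sse if s == label) / n_residues

    # Length of the longest uninterrupted run of helix labels (0 if none).
    longest_helix = max(
        (sum(1 for _ in run) for label, run in groupby(sse) if label == "a"),
        default=0,
    )
    return {
        "helix_pct": percentage("a"),
        "sheet_pct": percentage("b"),
        "loop_pct": percentage("c"),
        "longest_alpha_helix": longest_helix,
    }


# 20 residues: helices of length 6 and 3, one 4-residue strand, 7 loop residues.
summary = summarise_sse("ccaaaaaaccbbbbccaaac")
```

Because the percentages average over the whole structure while the run length does not, two structures with the same helix_pct can differ widely in longest_alpha_helix.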
Applications
This tool is appropriate as a fast first-pass filter for batch screening of predicted protein structures. Representative applications include flagging predicted structures with unrealistically long alpha helices that often arise as artifacts on low-confidence regions, identifying extended or disordered conformations that fail to fold compactly, summarising the secondary-structure composition of a designed protein, and ranking generated structures by structural plausibility before more expensive downstream analyses.
Usage Tips
- Inputs accept a list of Structure objects, file paths, or raw PDB or mmCIF content strings. A single bare input is automatically wrapped in a list. Each item is coerced to a Structure before analysis.
- All five metrics are computed over every chain of the input structure. There is no per-chain breakdown at the tool level. To analyse a specific chain, extract that chain into its own Structure using Structure.select_chain() before passing it in.
- Filter thresholds depend on the protein family. A 50-residue alpha helix is a strong artifact signal for a typical globular protein but is normal for coiled-coil and fibrous proteins. A gyration radius above 45 Å indicates failed folding for a 1000-residue protein but is expected for naturally elongated proteins. Calibrate thresholds against known structures of the protein family being screened.
- The secondary_structure_percentages summary and longest_alpha_helix use the same P-SEA assignment. The helix percentage and longest contiguous helix length are derived from the same per-residue annotation, so the two can be read together: in a structure of about 250 residues, helix_pct=80 with longest_alpha_helix=200 means that all helical residues sit in one continuous helix spanning most of the structure.
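The threshold and consistency tips above can be sketched as a small screening helper. Everything here is illustrative: the `Metrics` dataclass is a hypothetical stand-in for the tool's output entry, the default cut-offs simply reuse the example figures from the tips (50 residues, 45 Å) and would need calibrating per protein family, and `n_residues` is assumed to be known from elsewhere because it is not one of the five reported metrics.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical stand-in carrying the five reported metric fields."""
    longest_alpha_helix: int
    gyration_radius: float  # Å
    helix_pct: float
    sheet_pct: float
    loop_pct: float


def flag_artifacts(m, max_helix_len=50, max_rg=45.0):
    """Return a list of warning labels; thresholds are example values only."""
    flags = []
    if m.longest_alpha_helix >= max_helix_len:
        flags.append("long_helix")
    if m.gyration_radius > max_rg:
        flags.append("extended")
    return flags


def single_helix_fraction(m, n_residues):
    """Fraction of all helical residues that lie in the longest helix."""
    helix_residues = m.helix_pct * n_residues / 100.0
    if helix_residues == 0:
        return 0.0
    return m.longest_alpha_helix / helix_residues


# A 250-residue structure that is 80% helix with a 200-residue longest helix.
suspect = Metrics(
    longest_alpha_helix=200, gyration_radius=30.0,
    helix_pct=80.0, sheet_pct=0.0, loop_pct=20.0,
)
suspect_flags = flag_artifacts(suspect)
suspect_fraction = single_helix_fraction(suspect, n_residues=250)

# A compact mixed alpha/beta structure passes both example checks.
compact = Metrics(
    longest_alpha_helix=18, gyration_radius=17.5,
    helix_pct=35.0, sheet_pct=25.0, loop_pct=40.0,
)
compact_flags = flag_artifacts(compact)
```

A fraction close to 1 means nearly every helical residue belongs to one segment, which is the single-long-helix pattern the tips describe.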
Toolkit Notes
These apply to every Structure Metrics tool in this toolkit (structure-metrics).
- Outputs are returned as typed metric objects. Each StructureQualityMetrics entry carries longest_alpha_helix (integer residue count), gyration_radius (Å), helix_pct, sheet_pct, and loop_pct (all on the 0 to 100 scale). Results can be exported to CSV or JSON through the standard export method.
- The tool implementation runs entirely in-process and uses CPU only. Computation is performed in pure Python through Biotite, with no standalone environment or separate program invoked. Per-structure runtime is sub-second for typical protein sizes and scales linearly with the number of input structures.
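For readers who want to post-process results outside the toolkit, the sketch below serialises the five fields with the Python standard library. It is not the toolkit's export method, whose signature is not described here; the plain dictionaries and the helper names `rows_to_csv` and `rows_to_json` are assumptions for illustration.

```python
import csv
import io
import json

# Field names as listed for each metrics entry.
FIELDS = [
    "longest_alpha_helix",
    "gyration_radius",
    "helix_pct",
    "sheet_pct",
    "loop_pct",
]


def rows_to_csv(rows):
    """Serialise a list of metric dictionaries to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows):
    """Serialise the same rows to a JSON array (hypothetical helper)."""
    return json.dumps(rows, indent=2)


# Made-up example values, one dictionary per input structure, in input order.
example_rows = [
    {"longest_alpha_helix": 18, "gyration_radius": 17.5,
     "helix_pct": 35.0, "sheet_pct": 25.0, "loop_pct": 40.0},
    {"longest_alpha_helix": 200, "gyration_radius": 30.0,
     "helix_pct": 80.0, "sheet_pct": 0.0, "loop_pct": 20.0},
]
csv_text = rows_to_csv(example_rows)
json_text = rows_to_json(example_rows)
```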