Protein Globularity
License: This constraint can use multiple tools, each under its own license. See the Tools Used tab and each tool’s page for license details.

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


proto-bio/proto-language/proto_language/constraint/protein_structure/protein_globularity_constraint.py
Encourage compact, globular protein structures using ESMFold. This constraint predicts protein 3D structures with ESMFold and evaluates their compactness from the spatial distribution of backbone atoms. Globularity is measured as the standard deviation of distances from backbone atoms (N, CA, C, O) to the structure’s geometric centroid. Lower values indicate more compact, spherical structures characteristic of well-folded globular proteins, while higher values indicate extended, elongated, or poorly folded structures.

Each input tuple is folded as one complex with an arbitrary number of protein chains. DNA chains are first resolved with ORFipy by scanning both strands for canonical ATG-to-stop ORFs and selecting the longest ORF as that chain’s translated CDS.

Structure prediction is GPU-intensive and may take several minutes per protein depending on length and hardware.
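As a rough illustration of the measure described above, the sketch below computes the standard deviation of point-to-centroid distances for a list of 3D coordinates. The function name raw_globularity_sketch is hypothetical, and the use of a population (rather than sample) standard deviation is an assumption; the source does not say which one the constraint uses.

```python
import math


def raw_globularity_sketch(coords):
    """Std of distances from each (x, y, z) point to the geometric centroid.

    Assumption: population standard deviation (divide by n, not n - 1).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one atom coordinate")

    # Geometric centroid: unweighted mean of the coordinates.
    centroid = tuple(sum(p[axis] for p in coords) / n for axis in range(3))

    # Distance of every atom to the centroid, then the spread of those distances.
    dists = [math.dist(p, centroid) for p in coords]
    mean = sum(dists) / n
    return math.sqrt(sum((d - mean) ** 2 for d in dists) / n)


# Six points equidistant from the centroid: no spread in the distances.
shell = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Four points along a line: an extended arrangement with a larger spread.
line = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (30, 0, 0)]

print(raw_globularity_sketch(shell))  # 0.0
print(raw_globularity_sketch(line))   # 5.0
```

In the real constraint the coordinates would be the N, CA, C, and O atoms of the ESMFold prediction, in Ångströms.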

API Reference

Config: ProteinGlobularityConfig
Configuration for the protein globularity constraint. This class defines the parameters for evaluating protein structural compactness with ESMFold structure prediction. Globularity measures how compact and spherical a structure is, based on the spatial distribution of backbone atoms around the structure’s centroid: more globular proteins have backbone atoms clustered tightly around the centroid, while extended structures show higher dispersion. It is computed as the standard deviation of distances from backbone atoms (N, CA, C, O) to the centroid, so lower values indicate more compact, spherical structures. The score is normalized by dividing by max_globularity (default 20.0 Ångströms) and capped at 1.0.
max_globularity
number
default:"20.0"
Maximum standard deviation (Å) of backbone-atom distances to the structure centroid; values above this are treated as unfolded.
esmfold_config
ESMFoldConfig
ESMFold configuration for structure prediction.
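The normalization controlled by max_globularity can be sketched as a divide-and-cap step. The helper name normalize_globularity_sketch is hypothetical; only the division by max_globularity and the cap at 1.0 come from the description above.

```python
def normalize_globularity_sketch(raw_globularity, max_globularity=20.0):
    """Map a raw globularity (in Å) to a score in [0.0, 1.0]; lower = more compact."""
    if max_globularity <= 0:
        raise ValueError("max_globularity must be positive")
    # Divide by the configured maximum, then cap at 1.0 so that anything
    # at or above max_globularity gets the worst score.
    return min(raw_globularity / max_globularity, 1.0)


print(normalize_globularity_sketch(8.5))   # 0.425
print(normalize_globularity_sketch(35.0))  # 1.0
```

With the default of 20.0 Å, a raw value of 8.5 Å maps to 0.425, which matches the example values shown in the Usage section.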
Returns: ConstraintOutput
Per-proposal score in [0.0, 1.0] (lower = more compact). The predicted complex Structure is attached to slot 0. The metadata carries:
  • avg_plddt: Float average pLDDT score for structure confidence (0.0-1.0)
  • ptm: Float predicted TM-score for structure accuracy (0.0-1.0)
  • pdb_output: String PDB format structure file content
  • esmfolded_sequence: String colon-separated protein-chain representation
  • raw_globularity: Float standard deviation of backbone-to-centroid distances in Ångströms (lower = more compact)
  • normalized_globularity: Float normalized globularity score (0.0-1.0, capped by max_globularity)
  • dna_chain_orfs: Per-DNA-chain ORFipy metadata when DNA chains are present
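Because pdb_output is PDB-format text, the backbone coordinates can in principle be recovered from it for independent inspection. The sketch below is a minimal parser under the assumption that the text follows the standard fixed-column PDB ATOM record layout; the function name backbone_coords_sketch is hypothetical, and the constraint’s own extraction code may differ.

```python
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def backbone_coords_sketch(pdb_text):
    """Return (x, y, z) tuples for backbone atoms found in ATOM records.

    Assumes standard PDB fixed columns: atom name in columns 13-16,
    x/y/z in columns 31-38, 39-46, and 47-54 (1-based).
    """
    coords = []
    for line in pdb_text.splitlines():
        # Only ATOM records; HETATM, TER, END, etc. are skipped.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() not in BACKBONE_ATOMS:
            continue
        coords.append(
            (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        )
    return coords
```

A typical call would pass results[0].metadata["pdb_output"]; the returned coordinates are in Ångströms and cover all chains in the predicted complex.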

Usage

Evaluating protein structural compactness:
python
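>>> from proto_language.core import Sequence, SequenceType
>>> # Import path below is inferred from the source file location shown above; adjust if the package exposes these names elsewhere.
>>> from proto_language.constraint.protein_structure.protein_globularity_constraint import ProteinGlobularityConfig, protein_globularity_constraint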
>>> seq = Sequence("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSF", "protein")
>>> config = ProteinGlobularityConfig()
>>> results = protein_globularity_constraint([(seq,)], config)
>>> print(results[0].score)  # e.g., 0.425 (normalized score, lower = more compact)
>>> print(results[0].metadata["raw_globularity"])  # e.g., 8.5 (raw Ångströms)
>>> print(results[0].metadata["normalized_globularity"])  # e.g., 0.425
>>> print(results[0].metadata["avg_plddt"])  # e.g., 0.85 (also available)
Evaluating a DNA sequence (with automatic ORF selection):
python
>>> dna_seq = Sequence("ATGGTACTGAGCCCAGCG...", "dna")
>>> config = ProteinGlobularityConfig()
>>> results = protein_globularity_constraint([(dna_seq,)], config)
>>> print(results[0].score)  # Normalized score (0.0-1.0)
>>> # Single-DNA-chain proposals also flatten selected-CDS metadata.
>>> print(results[0].metadata["orfipy_orf_count"])  # e.g., 2
>>> print(results[0].metadata["selected_cds"]["amino_acid_length"])  # longest ORF length
>>> # Multi-chain proposals carry per-chain CDS metadata.
>>> print(results[0].metadata["translated_cds_by_chain"][0]["amino_acid_length"])
>>> print(results[0].metadata["raw_globularity"])  # e.g., 7.8 Å
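To make the ORF selection step more concrete, the following is a simplified pure-Python sketch of "scan both strands for ATG-to-stop ORFs and keep the longest". It is not ORFipy and does not reproduce its options or output. The function names, the stop-codon set (TAA, TAG, TGA), and the tie-breaking rule (first candidate found wins) are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq):
    """Yield ATG-to-stop ORFs (stop codon included) in all three frames."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                yield seq[start:i + 3]
                start = None  # look for the next ORF in this frame


def longest_orf_sketch(dna):
    """Longest ATG-to-stop ORF over both strands, or None if there is none."""
    dna = dna.upper()
    candidates = list(orfs_in_strand(dna))
    candidates += list(orfs_in_strand(reverse_complement(dna)))
    return max(candidates, key=len, default=None)


# The forward strand holds a 9-nt ORF; the reverse strand holds a 15-nt ORF.
print(longest_orf_sketch("ATGAAATAGTCAGGCGGCGGCCAT"))  # ATGGCCGCCGCCTGA
```

ORFs that run off the end of the sequence without a stop codon are ignored here, in keeping with the ATG-to-stop wording above.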

Metadata

Property         Value
Key              protein-globularity
Function         protein_globularity_constraint
Category         protein_structure
Mode             discrete
Uses GPU         True
Supported Types  dna, protein