
License: This constraint can use multiple tools, each under its own license. See the Tools Used tab and each tool’s page for license details.
This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
Use the Constraint(threshold=...) parameter for pass/fail filtering.
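As a rough illustration of threshold-based filtering, the sketch below assumes that a sequence passes when its score is at or below the threshold, since scores run from 0.0 (best) to 1.0 (worst). The passes_threshold helper and the direction of the comparison are assumptions made for illustration, not the library's API.

```python
def passes_threshold(score: float, threshold: float) -> bool:
    """Hypothetical helper: lower scores are better, so pass when score <= threshold."""
    return score <= threshold


# Example scores for three sequences (illustrative values only).
scores = [0.0, 0.25, 0.6]

# Keep only the scores that satisfy the assumed pass rule.
kept = [s for s in scores if passes_threshold(s, threshold=0.3)]
print(kept)  # [0.0, 0.25]
```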
API Reference
Configuration for the overall protein quality constraint.

This configuration class orchestrates multiple protein quality sub-constraints
that can be enabled or disabled individually. It provides a flexible framework
for comprehensive protein quality assessment by combining various metrics
including sequence length, structural complexity, repetitiveness, amino acid
diversity, and balanced amino acid representation.

The configuration uses a nested structure where all sub-constraint parameters
are exposed through a single protein_quality_config attribute of type
ProteinQualitySubConfig. This design allows for easy serialization in
UI/API schemas while maintaining clear organization of constraint-specific
parameters.

At least one sub-constraint must be enabled for the configuration to be valid.
This is enforced through a model validator that runs after initialization.

The nested protein_quality_config provides access to:

- Length constraint: Validates protein length against min/max range or target value
- Complexity constraint: Detects low-complexity regions using segmasker
- Repetitiveness constraint: Identifies repeated k-mer patterns
- Diversity constraint: Ensures adequate amino acid type diversity
- Balanced amino acids constraint: Checks for underrepresented amino acid types

See the ProteinQualitySubConfig documentation for complete parameter details.

For more details, see:

- ProteinQualitySubConfig: Detailed documentation of all sub-constraint parameters and configuration options
- overall_protein_quality_constraint: The constraint function that uses this configuration
- SequenceLengthConfig: Configuration for length constraint
- ProteinComplexityConfig: Configuration for complexity constraint
- ProteinRepetitivenessConfig: Configuration for repetitiveness constraint
- ProteinDiversityConfig: Configuration for diversity constraint
- BalancedAaConfig: Configuration for balanced amino acids constraint
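The nested layout and the "at least one sub-constraint enabled" rule can be sketched with plain dataclasses. This is a minimal stand-in, not the library's classes: the class names ending in Sketch and the enable_* field names are hypothetical, and the real parameters are documented under ProteinQualitySubConfig.

```python
from dataclasses import dataclass


@dataclass
class ProteinQualitySubConfigSketch:
    """Stand-in for ProteinQualitySubConfig; all field names are hypothetical."""
    enable_length: bool = True
    enable_complexity: bool = True
    enable_repetitiveness: bool = True
    enable_diversity: bool = True
    enable_balanced_aas: bool = True

    def enabled(self) -> list[str]:
        # Collect the names of the sub-constraints that are switched on.
        return [name.removeprefix("enable_") for name, on in vars(self).items() if on]


@dataclass
class OverallProteinQualityConfigSketch:
    """Stand-in for the top-level config with a single nested attribute."""
    protein_quality_config: ProteinQualitySubConfigSketch

    def __post_init__(self) -> None:
        # Validation runs after initialization: reject an all-disabled config.
        if not self.protein_quality_config.enabled():
            raise ValueError("At least one sub-constraint must be enabled")


# Disable one check and list what remains enabled.
cfg = OverallProteinQualityConfigSketch(
    ProteinQualitySubConfigSketch(enable_diversity=False)
)
print(cfg.protein_quality_config.enabled())
```

Constructing the sketch with every enable_* flag set to False raises ValueError, which mirrors the validity rule described above.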
protein_quality_config: Nested configuration for protein quality checks
Returns ConstraintOutput
One result per sequence. Scores range from 0.0 (best)
to 1.0 (worst) and represent the average of all enabled sub-constraint
scores, clipped to [0.0, 1.0]. For DNA sequences, the score reflects
the average quality across all predicted proteins. The metadata field carries:

For DNA sequences:

- prodigal_proteins: List of dicts of predicted proteins from Prodigal, each with protein ID, sequence, length, etc. (or None if no ORFs were predicted)
- prodigal_protein_count: Integer count of predicted ORFs
- predicted_protein_count: Integer count of proteins (same as prodigal_protein_count)
- avg_constraint_score: Float average quality score across all predicted proteins
- protein_quality_details: List of dictionaries, one per predicted protein, each containing:
  - protein_id: String identifier from Prodigal
  - length: Integer protein length in amino acids
  - avg_constraint_score: Float average across enabled constraints
  - quality_scores: Dictionary mapping constraint names to scores
  - metadata: Dictionary of additional constraint-specific metadata

For protein sequences:

- protein_quality_scores: Dictionary mapping constraint names (e.g., “length”, “complexity”, “repetitiveness”, “diversity”, “balanced_aas”) to their individual scores
- avg_constraint_score: Float average across all enabled constraints
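The scoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: aggregate_score and dna_score are hypothetical names, and because the score for a DNA sequence with no predicted proteins is not specified here, the sketch does not handle that case.

```python
from statistics import fmean


def aggregate_score(sub_scores: dict[str, float]) -> float:
    """Average the enabled sub-constraint scores and clip to [0.0, 1.0]."""
    if not sub_scores:
        raise ValueError("at least one sub-constraint score is required")
    return min(1.0, max(0.0, fmean(sub_scores.values())))


def dna_score(per_protein_scores: list[dict[str, float]]) -> float:
    """Average the per-protein aggregate scores for one DNA sequence.

    Assumes at least one predicted protein; the empty case is left unhandled.
    """
    return fmean(aggregate_score(s) for s in per_protein_scores)


# Illustrative sub-constraint scores for one protein (0.0 is best).
protein = {"length": 0.0, "complexity": 0.5, "repetitiveness": 0.25, "diversity": 0.25}
print(aggregate_score(protein))  # 0.25

# Two predicted proteins from one DNA sequence, with different enabled checks.
print(dna_score([protein, {"length": 1.0, "complexity": 0.5}]))  # 0.5
```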
Usage
Using all available constraints with custom thresholds:
Metadata
| Property | Value |
|---|---|
| Key | overall-protein-quality |
| Function | overall_protein_quality_constraint |
| Category | protein_quality |
| Mode | discrete |
| Uses GPU | False |
| Supported Types | dna, protein |

