Constraints
A constraint encodes a biological requirement, such as a GC-content range, protein folding confidence, structural similarity, or motif presence, as a scoring function that the optimizer minimizes. Every constraint answers one question about a proposal sequence: how far is it from the requirement? The answer is a score between0.0 (perfect) and 1.0 (worst). The optimizer combines all constraint scores into a single energy value and searches for sequences that minimize it.
The Scoring Model
Proto uses a unified scoring model where all constraints return values on the same[0.0, 1.0] scale:
0.0 satisfies all constraints perfectly.
python
Two Modes: Scoring vs Filtering
Constraints operate in one of two mutually exclusive modes:Scoring Mode (Soft)
Usesweight to control relative importance. Returns a float score that contributes to the total energy.- Guides optimization toward better solutions
- Allows trade-offs between constraints
- Default mode (
weight=1.0)
python
Filter Mode (Hard)
Usesthreshold to create a binary pass/fail gate: a proposal passes when score <= threshold, otherwise it is rejected.- Proposals that fail are immediately rejected
- Rejected proposals skip all scoring constraints
- Saves compute on expensive evaluations
python
The Evaluation Pipeline
When the optimizer callsscore_energy(), constraints are evaluated in a specific order designed to reject bad proposals early and save expensive computation:
Key insight: Filter constraints run before scoring constraints. Rejected proposals skip expensive scoring entirely (like GPU-based structure prediction). Use cheap filters to screen out bad proposals before expensive scoring kicks in.
Creating Constraints
python
Key Parameters
The segments this constraint evaluates. Single-segment constraints get one sequence per proposal. Multi-segment constraints get a tuple of sequences (one per segment), enabling cross-segment evaluations like protein-protein interactions.
The scoring function from the constraint registry. Must accept
(input_sequences: list[tuple[Sequence, ...]], config) and return list[ConstraintOutput], one per proposal, each with a score in [0.0, 1.0].Configuration for the scoring function. Can be a dictionary (auto-validated against the function’s Pydantic config class) or a Pydantic model instance.
Label used for metadata tracking and result export. Defaults to the function name.
Multiplier for the raw score in energy calculation. Only used in scoring mode. Mutually exclusive with
threshold.If set, converts this to a filter constraint. Proposals with
score <= threshold pass; others are rejected. Mutually exclusive with weight.Constraint Categories
Proto provides built-in constraints organized by what they measure:Sequence Composition
GC content, k-mer frequency, homopolymer runs, and moreDNA/RNA sequence properties
Protein Quality
Length, complexity, amino acid balance, and moreProtein sanity checks
Protein Structure
Folding confidence, structural similarity, binding strength, and more3D folding and structural similarity
Sequence Annotation
Motif search, promoter strength, sequence similarity, and moreFunctional element detection
RNA Secondary Structure
Structure similarity, property matching, and moreRNA folding patterns
RNA Splicing
Splicing prediction and tissue specificitySplicing prediction
Sequence Alignment
Alignment quality metricsAlignment scoring
Common Constraint Patterns
- DNA Construct
- Protein Design
- Multi-Segment
A typical DNA construct optimization with sequence composition constraints:
python
Metadata Propagation
After evaluation, constraints write detailed results back to each sequence’s metadata. This shows why a sequence got its score:python
Full Metadata Structure
Full Metadata Structure
The metadata structure varies by constraint mode and number of input segments:Single-segment scoring constraint:Multi-segment constraint (additional linking info):The
python
python
data field contains constraint-specific metrics that vary by function. It exposes the actual measured values (GC percentage, pLDDT score, RMSD in angstroms) rather than just the normalized score.Custom Constraints
A constraint function can be defined without using the registry decorator:python
Custom constraint functions must return a
list[ConstraintOutput], one per input tuple, each with score in [0.0, 1.0] (0 = perfect) plus optional metadata. Add from proto_language import ConstraintOutput. Returning bare floats raises a TypeError at evaluation, and a non-finite score raises a ValueError.Next Steps
Optimizers
Learn how optimizers use constraints to search for optimal sequences
Tools
The bioinformatics tools that power constraint evaluation
Constraint Reference
Full API reference for every built-in constraint
Programs
Compose multi-stage pipelines with progressive constraints


























































