
This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.
- PWM score: Position weight matrix score based on conservation probabilities
- Match count: Simple count of consensus matches (out of 12 positions)
- Spacer length: Deviation from optimal 17 bp spacer
API Reference
Configuration for sigma-70 promoter similarity constraint.

This class defines configuration parameters for evaluating bacterial promoter
similarity using a position weight matrix (PWM) model of E. coli sigma-70 promoters.
The model scores promoter elements based on similarity to consensus sequences
for the -35 and -10 boxes, the spacer distance between them, and
the total number of matches to consensus. This approach is based on RegulonDB
experimental data for E. coli sigma-70-dependent promoters.

The scoring combines three components:
- PWM score: Similarity to consensus sequences weighted by conservation
- Match count: Number of exact matches to consensus (out of 12 positions)
- Spacer length: Distance between -35 and -10 boxes
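
The match count and spacer deviation listed above are simple enough to sketch directly. The helper names `count_matches` and `spacer_deviation` are hypothetical and not part of the documented API. The consensus strings and the 17 bp optimum are the typical values named in this reference.

```python
CONSENSUS_35 = "TTGACA"  # -35 box consensus named in this reference
CONSENSUS_10 = "TATAAT"  # -10 box (Pribnow box) consensus named in this reference


def count_matches(box35: str, box10: str) -> int:
    """Count positions (0 to 12) where the two 6 bp boxes agree with consensus."""
    if len(box35) != 6 or len(box10) != 6:
        raise ValueError("both boxes must be exactly 6 bp")
    observed = box35.upper() + box10.upper()
    expected = CONSENSUS_35 + CONSENSUS_10
    return sum(1 for base, ref in zip(observed, expected) if base == ref)


def spacer_deviation(spacer_len: int, optimal: int = 17) -> int:
    """Absolute deviation of a spacer from the optimal length, in bp."""
    return abs(spacer_len - optimal)
```

A perfect consensus pair gives 12 matches, and a 19 bp spacer deviates from the optimum by 2 bp. The PWM score is not sketched here because it depends on the RegulonDB conservation probabilities, which this reference does not list.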
The constraint scans sequences to find the best-scoring promoter within
the allowed spacer range. For sequences ≤32 bp, it treats the entire
sequence as a single promoter (first 6 bp = -35, last 6 bp = -10). For
longer sequences, it scans all possible positions.

The final penalty combines the three component penalties in two steps:
- Box penalty = (1 - match_weight) * PWM_penalty + match_weight * match_penalty
- Total penalty = (1 - spacer_weight) * box_penalty + spacer_weight * spacer_penalty
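
The scanning rule and the two weighted sums above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the names `candidate_promoters` and `total_penalty` are hypothetical, and the handling of inputs that yield no window (the sketch simply yields nothing) is a guess at what the "too_short" and "invalid_spacer" results correspond to.

```python
from typing import Iterator, Tuple

BOX_LEN = 6  # both boxes are 6 bp in this reference


def candidate_promoters(
    seq: str, min_spacer: int, max_spacer: int
) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (pos, box35, box10, spacer_len) for each window to score."""
    n = len(seq)
    if n <= 32:
        # Short input: the whole sequence is treated as one promoter.
        spacer = n - 2 * BOX_LEN
        if n >= 2 * BOX_LEN and min_spacer <= spacer <= max_spacer:
            yield 0, seq[:BOX_LEN], seq[-BOX_LEN:], spacer
        return
    # Longer input: try every start position and every allowed spacer.
    for pos in range(n - 2 * BOX_LEN - min_spacer + 1):
        for spacer in range(min_spacer, max_spacer + 1):
            start10 = pos + BOX_LEN + spacer
            if start10 + BOX_LEN > n:
                break  # larger spacers would also run past the end
            yield pos, seq[pos:pos + BOX_LEN], seq[start10:start10 + BOX_LEN], spacer


def total_penalty(
    pwm_penalty: float,
    match_penalty: float,
    spacer_penalty: float,
    match_weight: float,
    spacer_weight: float,
) -> float:
    """Combine the three component penalties using the two formulas above."""
    box_penalty = (1 - match_weight) * pwm_penalty + match_weight * match_penalty
    return (1 - spacer_weight) * box_penalty + spacer_weight * spacer_penalty
```

With both weights in the 0-1 range and each component penalty in 0.0-1.0, the combined value stays in 0.0-1.0, which matches the documented score range.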
-35 box consensus sequence (6 bp, typically TTGACA for E. coli sigma-70)
-10 box consensus sequence (6 bp Pribnow box, typically TATAAT for E. coli sigma-70)
Position-specific conservation probabilities for -35 box (6 values). From RegulonDB.
Position-specific conservation probabilities for -10 box (6 values). From RegulonDB.
Optimal spacer length between -35 and -10 boxes in base pairs (typically 17±1 bp)
Standard deviation for spacer length penalty. Lower values = stricter spacing requirement.
Weight (0-1) for spacer penalty in total score. Higher = spacing more important.
PWM score exponent for non-linearity. Lower values = more sensitive to mismatches.
Optimal number of matches to consensus (out of 12 total positions)
Standard deviation for match count penalty
Weight (0-1) for match count penalty in total score
Minimum acceptable spacer length in bp
Maximum acceptable spacer length in bp
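
The standard-deviation parameters above suggest a bell-shaped penalty around each optimum, but this reference does not give the formula. The sketch below assumes a Gaussian form. Both the function name `gaussian_penalty` and the exact shape are assumptions, so treat it only as a way to build intuition for how a smaller standard deviation makes the requirement stricter.

```python
import math


def gaussian_penalty(value: float, optimal: float, sd: float) -> float:
    """Assumed penalty shape: 0.0 at the optimum, approaching 1.0 far from it."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (value - optimal) / sd
    return 1.0 - math.exp(-0.5 * z * z)
```

Under this assumption, a spacer one base pair off the optimum is penalized more heavily with a standard deviation of 0.5 than with 2.0, consistent with "lower values = stricter spacing requirement".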
Returns: ConstraintOutput
One result per sequence. Score ranges from 0.0 (perfect
promoter, exact consensus with optimal spacer) to 1.0 (poor/no promoter).
metadata carries a single sigma70 dict with the following fields:

For valid promoters found:
- sigma70_score: Float overall penalty score (0.0-1.0)
- pos: Integer start position of the -35 box in the sequence
- box35: String sequence of the -35 box (6 bp)
- box10: String sequence of the -10 box (6 bp)
- spacer_len: Integer spacer length between boxes (bp)
- total_matches: Integer total matches to consensus (out of 12)
- pwm_penalty: Float PWM-based penalty component (0.0-1.0)
- match_penalty: Float match count penalty component (0.0-1.0)
- spacer_penalty: Float spacer length penalty component (0.0-1.0)

For results with reason “too_short”:
- sigma70_score: Float 1.0 (maximum penalty)
- reason: String “too_short”

For results with reason “invalid_spacer”:
- sigma70_score: Float 1.0 (maximum penalty)
- reason: String “invalid_spacer”
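
A consumer of the sigma70 dict needs to distinguish the valid-promoter shape from the two failure shapes. A minimal sketch, assuming the dict has already been extracted from the result's metadata (how to access it on ConstraintOutput is not shown here, and `summarize_sigma70` is a hypothetical helper):

```python
def summarize_sigma70(meta: dict) -> str:
    """Render a one-line summary of a sigma70 metadata dict."""
    if "reason" in meta:
        # Failure shape: only sigma70_score and reason are documented.
        return f"no promoter scored ({meta['reason']}), penalty {meta['sigma70_score']:.2f}"
    # Valid-promoter shape: report the boxes, spacer, matches, and overall penalty.
    return (
        f"-35 {meta['box35']} at {meta['pos']}, -10 {meta['box10']}, "
        f"spacer {meta['spacer_len']} bp, {meta['total_matches']}/12 matches, "
        f"penalty {meta['sigma70_score']:.2f}"
    )
```

Checking for the reason key first avoids a KeyError on the failure shapes, which carry none of the box or spacer fields.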
Usage
Evaluating a canonical sigma-70 promoter:
Metadata
| Property | Value |
|---|---|
| Key | sigma70-promoter |
| Function | sigma70_promoter_constraint |
| Category | sequence_annotation |
| Mode | discrete |
| Uses GPU | False |
| Supported Types | dna |