Sequence Length

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source
proto-bio/proto-language/proto_language/constraint/sequence_composition/sequence_length_constraint.py
Evaluates sequence length against a target value or an acceptable range. The constraint supports two modes: range mode (an acceptable length window) and target mode (exact length matching).

API Reference

Config: SequenceLengthConfig
Configuration for the sequence length constraint. This class defines the parameters for evaluating sequence length in DNA, RNA, or protein sequences. It supports two mutually exclusive modes:
  1. Range mode: Specify both min_length and max_length to define an acceptable length range. Sequences within this range receive score 0.0, while those outside are penalized based on distance from the range.
  2. Target mode: Specify target_length for exact length matching. Sequences exactly matching the target receive score 0.0, while deviations are penalized based on proportional distance from the target.
Parameters:
  • min_length (integer): Minimum acceptable length (use with max_length for range mode)
  • max_length (integer): Maximum acceptable length (use with min_length for range mode)
  • target_length (integer): Target length for exact matching (alternative to the min/max range)
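The mutually exclusive modes described above can be illustrated with a small stand-alone sketch. The dataclass below is a hypothetical stand-in, not the actual SequenceLengthConfig: the `mode` helper and the choice to raise `ValueError` on invalid combinations are assumptions, since this page does not say how the library handles them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LengthConfigSketch:
    """Hypothetical stand-in for SequenceLengthConfig (illustration only)."""

    min_length: Optional[int] = None
    max_length: Optional[int] = None
    target_length: Optional[int] = None

    def mode(self) -> str:
        """Return "range" or "target"; raise on invalid settings (assumed behaviour)."""
        has_min = self.min_length is not None
        has_max = self.max_length is not None
        has_target = self.target_length is not None

        # Range mode needs both bounds; a single bound is ambiguous.
        if has_min != has_max:
            raise ValueError("range mode needs both min_length and max_length")
        # The two modes cannot be combined.
        if has_min and has_target:
            raise ValueError("range mode and target mode are mutually exclusive")
        if has_min:
            if self.min_length > self.max_length:
                raise ValueError("min_length must not exceed max_length")
            return "range"
        if has_target:
            return "target"
        raise ValueError("set min_length and max_length, or target_length")
```

With this sketch, `LengthConfigSketch(min_length=4, max_length=10).mode()` selects range mode and `LengthConfigSketch(target_length=8).mode()` selects target mode, while supplying all three fields is rejected.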
Returns: ConstraintOutput
One result per sequence. A score of 0.0 indicates the sequence meets the length requirement (within range or at target). Higher scores indicate greater deviation:
  • Range mode: Score is 0.0 if the length lies within [min_length, max_length]; otherwise the penalty grows linearly with the distance outside the range.
  • Target mode: Normalized penalty as |actual - target| / target. For example, 10% deviation from target yields score ~0.1.
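The two scoring rules can be sketched as plain functions. The target-mode formula follows the description above directly. The range-mode scaling is an assumption: this page only says the penalty is linear in the distance outside the range, so dividing by the violated bound is an illustrative choice, not the library's documented behaviour. Both function names are hypothetical.

```python
def target_score(length: int, target: int) -> float:
    """Normalized deviation |length - target| / target, as described above."""
    if target <= 0:
        raise ValueError("target must be positive")
    return abs(length - target) / target


def range_score(length: int, min_length: int, max_length: int) -> float:
    """0.0 inside [min_length, max_length]; linear in the distance outside it.

    ASSUMPTION: the distance is divided by the violated bound so that the
    result is comparable to target mode. The real scaling is not documented here.
    """
    if min_length <= length <= max_length:
        return 0.0
    if length < min_length:
        # Too short: measure how far below the lower bound we are.
        return (min_length - length) / max(min_length, 1)
    # Too long: measure how far above the upper bound we are.
    return (length - max_length) / max(max_length, 1)
```

Under these assumptions, a sequence of length 9 against a target of 10 scores 0.1, matching the 10% example above, and a sequence of length 12 against the range [4, 10] scores 0.2.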
metadata carries:
For range mode:
  • length: Integer actual sequence length
  • length_mode: String “range”
  • length_min: Integer minimum acceptable length
  • length_max: Integer maximum acceptable length
For target mode:
  • length: Integer actual sequence length
  • length_mode: String “target”
  • length_target: Integer target length
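For reference, the metadata layout listed above can be mirrored with a small helper. This is a sketch only: `length_metadata` is a hypothetical function, and it takes a plain string rather than the library's Sequence type. Only the key names and value types come from the lists above.

```python
from typing import Any, Dict, Optional


def length_metadata(
    sequence: str,
    *,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    target_length: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a metadata dict with the keys documented above (hypothetical helper)."""
    meta: Dict[str, Any] = {"length": len(sequence)}
    if target_length is not None:
        # Target mode: report the single target value.
        meta["length_mode"] = "target"
        meta["length_target"] = target_length
    else:
        # Range mode: report both bounds.
        meta["length_mode"] = "range"
        meta["length_min"] = min_length
        meta["length_max"] = max_length
    return meta
```

For the usage inputs on this page, `length_metadata("MVLSP", min_length=4, max_length=10)` reports a length of 5 in range mode, and `length_metadata("ATCGATCG", target_length=8)` reports a length of 8 in target mode.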

Usage

Range mode (protein):
python
>>> seqs = [(Sequence("MVLSP", "protein"),)]
>>> cfg = SequenceLengthConfig(min_length=4, max_length=10)
>>> results = sequence_length_constraint(seqs, cfg)
Target mode (DNA):
python
>>> seqs = [(Sequence("ATCGATCG", "dna"),)]
>>> cfg = SequenceLengthConfig(target_length=8)
>>> results = sequence_length_constraint(seqs, cfg)

Metadata

Property         Value
Key              sequence-length
Function         sequence_length_constraint
Category         sequence_composition
Mode             discrete
Uses GPU         False
Supported Types  dna, rna, protein