Protein Length

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source
proto-bio/proto-language/proto_language/constraint/protein_quality/protein_length_constraint.py
Evaluate whether protein sequence lengths fall within an acceptable range. Sequences shorter than min_length or longer than max_length are penalized, and the penalty scales linearly with the distance outside the acceptable range. This is useful for filtering out spurious ORF predictions.

API Reference

Config: ProteinLengthConfig
Configuration object for the protein length constraint. It defines the acceptable length range for protein sequences, which is useful for filtering proteins that are too short or too long. The penalty scales linearly with the distance outside the acceptable range. For example, a protein 10 amino acids below min_length receives a proportionally smaller penalty than one 50 amino acids below.
min_length (integer, required): Minimum acceptable protein length below which sequences are penalized
max_length (integer, required): Maximum acceptable protein length above which sequences are penalized
Returns: ConstraintOutput
One result per sequence. A score of 0.0 indicates length is within the acceptable range [min_length, max_length] and higher values indicate greater deviation from the acceptable range. Penalties scale linearly: a sequence 10 amino acids outside the range receives half the penalty of one 20 amino acids outside. metadata carries:
  • protein_length: Integer length of the protein sequence in amino acids
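The linear scoring described above can be sketched in a few lines of standalone Python. This is an illustration only, not the library's implementation: `LengthRange`, `linear_length_penalty`, and the `per_residue` slope are hypothetical names, and the actual scale of the penalty in `protein_length_constraint` is not stated here, so a slope of 1.0 per amino acid is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthRange:
    """Hypothetical stand-in for ProteinLengthConfig (not the library class)."""
    min_length: int
    max_length: int


def linear_length_penalty(protein: str, bounds: LengthRange,
                          per_residue: float = 1.0) -> tuple:
    """Return (score, metadata) for one protein sequence.

    The score is 0.0 inside [min_length, max_length] and grows linearly
    with the number of amino acids outside that range. The slope
    `per_residue` is an assumption made for this sketch.
    """
    length = len(protein)
    if length < bounds.min_length:
        distance = bounds.min_length - length   # too short
    elif length > bounds.max_length:
        distance = length - bounds.max_length   # too long
    else:
        distance = 0                            # within range
    return per_residue * distance, {"protein_length": length}


bounds = LengthRange(min_length=10, max_length=500)

in_range, in_range_meta = linear_length_penalty("M" * 37, bounds)
short_by_5, _ = linear_length_penalty("M" * 5, bounds)
long_by_10, _ = linear_length_penalty("M" * 510, bounds)
long_by_20, _ = linear_length_penalty("M" * 520, bounds)

# Keep only candidates whose length falls inside the range (score of 0.0).
candidates = ["M" * 5, "M" * 37, "M" * 520]
kept = [s for s in candidates if linear_length_penalty(s, bounds)[0] == 0.0]
```

Under the assumed slope, the sequence 10 residues over the maximum scores half of the one 20 residues over, matching the proportionality described above. Only the ratio between penalties should be read from this sketch, since the real constraint may scale or normalize its scores differently.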

Usage

Evaluating protein length within range:
python
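>>> from proto_language.core import Sequence, SequenceType
>>> # Import path below is inferred from the source file location listed above; adjust if the package exposes these names elsewhere.
>>> from proto_language.constraint.protein_quality.protein_length_constraint import ProteinLengthConfig, protein_length_constraint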
>>> config = ProteinLengthConfig(min_length=10, max_length=500)
>>> seq = Sequence("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSF", "protein")
>>> results = protein_length_constraint([(seq,)], config)
>>> print(results[0].score)  # 0.0
>>> print(results[0].metadata["protein_length"])  # 37

Metadata

Property         Value
Key              protein-length
Function         protein_length_constraint
Category         protein_quality
Mode             discrete
Uses GPU         False
Supported Types  protein