Specific K-mer Frequency

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source
proto-bio/proto-language/proto_language/constraint/sequence_composition/specific_kmer_constraint.py
Evaluate frequency or usage deviation of a specific k-mer. Supports two scoring modes:
  1. Frequency mode: Raw k-mer frequency (count / total k-mer positions).
  2. Usage deviation mode: Observed/expected ratio using a zero-order Markov model (product of individual character frequencies).
Metadata varies by mode:
Frequency mode:
  • {kmer}_frequency: Float frequency value
Usage deviation mode:
  • {kmer}_usage_deviation: Float observed/expected ratio
  • {kmer}_count: Integer observed count
  • {kmer}_expected: Float expected count
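
The two scoring modes and their metadata keys can be illustrated with a minimal sketch. This is not the library's implementation: the function name `kmer_metadata` is hypothetical, and the sketch assumes that occurrences are counted over overlapping windows and that the expected count is the product of the individual character frequencies multiplied by the number of k-mer positions.

```python
from collections import Counter


def kmer_metadata(sequence: str, kmer: str, scoring_mode: str = "frequency") -> dict:
    """Hypothetical helper: compute per-mode k-mer metadata for one sequence."""
    k = len(kmer)
    positions = len(sequence) - k + 1  # number of overlapping k-length windows
    if k == 0 or positions <= 0:
        raise ValueError("sequence must be at least as long as a non-empty k-mer")

    # Observed count over every overlapping window (an assumption of this sketch).
    count = sum(1 for i in range(positions) if sequence[i:i + k] == kmer)

    if scoring_mode == "frequency":
        # Frequency mode: count / total k-mer positions.
        return {f"{kmer}_frequency": count / positions}

    if scoring_mode == "usage_deviation":
        # Zero-order Markov model: probability of the k-mer is the product
        # of the individual character frequencies in the sequence.
        char_counts = Counter(sequence)
        probability = 1.0
        for ch in kmer:
            probability *= char_counts[ch] / len(sequence)
        expected = probability * positions
        # Guard against division by zero when a character never occurs.
        deviation = count / expected if expected > 0 else 0.0
        return {
            f"{kmer}_usage_deviation": deviation,
            f"{kmer}_count": count,
            f"{kmer}_expected": expected,
        }

    raise ValueError(f"unknown scoring_mode: {scoring_mode!r}")
```

For the sequence `ACGCGT` and the k-mer `CG`, there are five 2-mer positions and two matches, so frequency mode gives 2/5. In usage deviation mode, C and G each have frequency 2/6, so the expected count is (1/9) × 5 and the observed/expected ratio is 3.6.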

API Reference

Config: SpecificKmerConfig
Configuration for evaluating a single specific k-mer. For evaluating all k-mers of a given length, use KmerFrequencyConstraint.
kmer
string
required
The specific k-mer to evaluate (e.g., ‘CG’, ‘GATC’, ‘ATG’)
scoring_mode
enum
default:"frequency"
Scoring mode: ‘frequency’ for the raw k-mer frequency, ‘usage_deviation’ for the observed/expected ratio. Options: frequency, usage_deviation
min_value
number
required
Minimum acceptable value: a frequency or a usage deviation ratio, depending on scoring_mode
max_value
number
required
Maximum acceptable value: a frequency or a usage deviation ratio, depending on scoring_mode
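
As a rough illustration of how min_value and max_value bound the measured quantity, the sketch below checks whether a value falls inside the acceptable range. The helper name `within_range` is hypothetical, and inclusive bounds are an assumption; how the constraint itself converts the measurement into a score is not described here.

```python
def within_range(value: float, min_value: float, max_value: float) -> bool:
    """Hypothetical check: is the measured frequency or deviation acceptable?"""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    # Inclusive bounds are an assumption of this sketch.
    return min_value <= value <= max_value
```

For example, a CG frequency of 0.4 would fall outside a range of 0.0 to 0.05, while a usage deviation of 1.0 would fall inside a range of 0.8 to 1.2.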
Returns: ConstraintOutput
One result per input sequence. The metadata field carries per-mode k-mer data (see the per-mode metadata keys listed above).

Usage

python
from proto_language.core import Constraint
from proto_language.constraint import specific_kmer_constraint, SpecificKmerConfig

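# `segment` is assumed to be a sequence segment defined earlier in your program.
constraint = Constraint(
    inputs=[segment],
    function=specific_kmer_constraint,
    function_config=SpecificKmerConfig(
        kmer="CG",                 # required: the k-mer to evaluate
        scoring_mode="frequency",  # or "usage_deviation"; assumed to accept the option name as a string
        min_value=0.0,             # required: illustrative lower bound
        max_value=0.05,            # required: illustrative upper bound
    ),
)

# Evaluate the constraint; one result is returned per input sequence.
scores = constraint.evaluate()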

Metadata

Property         Value
Key              specific-kmer-frequency
Function         specific_kmer_constraint
Category         sequence_composition
Mode             discrete
Uses GPU         False
Supported Types  dna, rna, protein