Sequence Motif Match

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source
proto-bio/proto-language/proto_language/constraint/sequence_annotation/seq_motif_constraint.py
Score DNA sequences against sequence motifs using MEME.

This constraint function uses MEME Suite’s Find Individual Motif Occurrences (FIMO) tool to search DNA sequences for motifs represented as position weight matrices (PWMs). It evaluates whether sequences contain desired motifs (wanted) or unwanted motifs (not_wanted). The scoring strategy penalizes sequences based on motif presence:
  • Unwanted motifs: Strong matches (low p-values) result in high penalties, encouraging sequences without these binding sites
  • Wanted motifs: Strong matches result in low penalties (rewards), while missing wanted motifs result in high penalties
  • No motif specification: Any motif matches are penalized (novelty constraint)
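As a rough illustration of the search step, the sketch below reduces FIMO's tab-separated text output to the best (lowest) p-value per motif, which is the shape of the found dictionary reported in the constraint's metadata. The column names are assumed from FIMO's TSV output in recent MEME Suite releases, the sample rows are made-up data, and best_p_values is a hypothetical helper rather than part of proto_language.

```python
import csv

# Made-up sample rows; the header is assumed to match FIMO's TSV columns.
_ROWS = [
    ["motif_id", "motif_alt_id", "sequence_name", "start", "stop",
     "strand", "score", "p-value", "q-value", "matched_sequence"],
    ["MOTIF_A", "", "seq1", "4", "9", "+", "12.1", "1.0e-06", "", "GGCGGG"],
    ["MOTIF_A", "", "seq1", "15", "20", "-", "8.3", "4.0e-04", "", "TAATAT"],
    ["MOTIF_B", "", "seq1", "10", "15", "+", "9.7", "3.0e-05", "", "ATCGTA"],
]
SAMPLE_FIMO_TSV = "\n".join("\t".join(row) for row in _ROWS) + "\n"


def best_p_values(fimo_tsv: str) -> dict:
    """Map each motif_id to the lowest p-value among its matches."""
    # Drop blank lines and '#' comment lines before parsing.
    lines = [
        line for line in fimo_tsv.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    best = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        motif = row["motif_id"]
        p_value = float(row["p-value"])
        if motif not in best or p_value < best[motif]:
            best[motif] = p_value
    return best


print(best_p_values(SAMPLE_FIMO_TSV))
```

A motif that never appears in the output has no entry in the result, which corresponds to the "missing" case for wanted motifs and the "absent" case for unwanted ones.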

API Reference

Config: SeqMotifConfig
Configuration for the sequence motif constraint using MEME. This class defines configuration parameters for evaluating DNA sequences against known transcription factor binding motifs using MEME Suite’s Find Individual Motif Occurrences (FIMO) tool. The constraint searches sequences for position weight matrix motifs and can either encourage specific motifs (wanted) or discourage them (not_wanted), enabling design of sequences with controlled binding sites.
Motif names must match exactly with the names in the MEME file (case-sensitive). Use the MOTIF lines in the .meme file to identify available motif names.
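One way to check which names are available is to scan the MOTIF lines directly. The sketch below assumes the standard MEME text format, where each motif starts with a line of the form "MOTIF identifier [alternate name]". list_meme_motifs is a hypothetical helper and the sample file content is made up. Whether the constraint matches on the identifier or on the alternate name is not stated here, so the helper returns both.

```python
from typing import List, Optional, Tuple

# Made-up minimal MEME-format content for illustration.
SAMPLE_MEME = """MEME version 4

ALPHABET= ACGT

MOTIF motif_one ALT1
letter-probability matrix: alength= 4 w= 2
0.25 0.25 0.25 0.25
0.10 0.70 0.10 0.10

MOTIF motif_two
letter-probability matrix: alength= 4 w= 1
0.90 0.05 0.03 0.02
"""


def list_meme_motifs(meme_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (identifier, alternate_name) for every MOTIF line."""
    motifs = []
    for line in meme_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "MOTIF":
            alt_name = parts[2] if len(parts) >= 3 else None
            motifs.append((parts[1], alt_name))
    return motifs


names = {identifier for identifier, _ in list_meme_motifs(SAMPLE_MEME)}
# Matching is case-sensitive, so "Motif_One" is not the same as "motif_one".
print("motif_one" in names, "Motif_One" in names)
```

For a real file, pass the contents of the .meme file read from motifs_path instead of the sample string.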
motifs_path
string
required
Path to MEME format motif file (.meme) containing PWMs.
meme_bin_path
string
required
Path to directory containing MEME Suite binaries (must include fimo).
wanted
array
Motifs that should be present: ‘all’ (all motifs), ‘none’ (no requirement), or a list of motif names.
not_wanted
array
Motifs that should NOT be present: ‘all’ (reject all), ‘none’ (allow all), or a list of motif names.
scale
number
default:"1.0"
Scaling factor to adjust penalty magnitude (>1 = stricter, <1 = more lenient). Example: 1.0
exclusive
boolean
default:"True"
If True, automatically sets the unwanted motifs to the complement of the wanted motifs.
aggregation
enum
default:"smart"
How to aggregate penalties: ‘smart’ (adaptive), ‘average’, ‘max’ (strictest), or ‘percentile’. Options: smart, average, max, percentile
percentile_value
number
default:"95.0"
Which percentile to use when aggregation=‘percentile’ (0-100).
unwanted_focus
boolean
default:"True"
When both wanted and unwanted motifs exist, weight unwanted motifs more heavily in final score
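To make the aggregation options concrete, the sketch below combines a list of per-motif penalties with the ‘average’, ‘max’, and ‘percentile’ methods. This is an illustration under assumptions, not the library's implementation: aggregate and percentile are hypothetical helpers, the linear-interpolation percentile is one common definition, and the ‘smart’ method and unwanted_focus weighting are left out because their behavior is not specified here.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in 0-100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def aggregate(penalties, method="average", percentile_value=95.0):
    """Combine per-motif penalties (each in 0.0-1.0) into one value."""
    if not penalties:
        return 0.0  # assumption: nothing evaluated means no penalty
    if method == "average":
        return sum(penalties) / len(penalties)
    if method == "max":
        return max(penalties)
    if method == "percentile":
        return percentile(penalties, percentile_value)
    raise ValueError(f"method not covered by this sketch: {method!r}")


penalties = [0.0, 0.2, 0.4, 1.0]
for method in ("average", "max", "percentile"):
    print(method, round(aggregate(penalties, method), 3))
```

With one severe violation among several mild ones, ‘max’ reports the worst case, ‘average’ dilutes it, and a high percentile sits between the two.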
Returns: ConstraintOutput
One result per sequence. Score ranges from 0.0 (all criteria satisfied) to 1.0 (severe violations). metadata carries a single motif_constraint dict:
  • penalty: Float overall penalty score (0.0-1.0)
  • wanted: Sorted list of wanted motif names
  • not_wanted: Sorted list of unwanted motif names
  • found: Dictionary mapping motif names to their best (lowest) FIMO p-values
  • details: Dictionary with per-motif scoring details including:
    • penalty: Individual motif penalty
    • status: “wanted_found”, “wanted_missing”, “unwanted”, or “unwanted_absent”
    • p_value: FIMO p-value if motif was found
  • aggregation_info: Dictionary with aggregation statistics:
    • method: Aggregation method used
    • unwanted_count: Number of unwanted motif evaluations
    • wanted_count: Number of wanted motif evaluations
    • unwanted_matches: Number of unwanted motifs found
    • wanted_matches: Number of wanted motifs found
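The sketch below shows one way to read this structure by grouping motif names by their status. The example_metadata dict is hand-written with made-up values to mirror the documented keys, and motifs_by_status is a hypothetical helper. Omitting p_value for a missing motif is an assumption, so the helper avoids relying on that key.

```python
# Hand-written example mirroring the documented keys; all values are made up.
example_metadata = {
    "motif_constraint": {
        "penalty": 0.5,
        "wanted": ["MOTIF_A", "MOTIF_B"],
        "not_wanted": ["MOTIF_C"],
        "found": {"MOTIF_A": 1e-8, "MOTIF_C": 2e-5},
        "details": {
            "MOTIF_A": {"penalty": 0.0, "status": "wanted_found", "p_value": 1e-8},
            "MOTIF_B": {"penalty": 1.0, "status": "wanted_missing"},
            "MOTIF_C": {"penalty": 0.6, "status": "unwanted", "p_value": 2e-5},
        },
        "aggregation_info": {
            "method": "average",
            "unwanted_count": 1,
            "wanted_count": 2,
            "unwanted_matches": 1,
            "wanted_matches": 1,
        },
    }
}


def motifs_by_status(metadata):
    """Group motif names by the status recorded in the details dict."""
    grouped = {}
    for name, info in metadata["motif_constraint"]["details"].items():
        grouped.setdefault(info["status"], []).append(name)
    return {status: sorted(names) for status, names in grouped.items()}


print(motifs_by_status(example_metadata))
```

On a real result, the same helper would be called with results[0].metadata.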

Usage

Requiring specific transcription factor binding sites:
python
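>>> from proto_language.core import Sequence
>>> # Import path below is assumed from the source file location listed above.
>>> from proto_language.constraint.sequence_annotation.seq_motif_constraint import (
...     SeqMotifConfig,
...     seq_motif_constraint,
... )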
>>> promoter_seq = Sequence("ATCGGCGGGATCGTAATATAGCATGC", "dna")
>>> config = SeqMotifConfig(
...     motifs_path="/data/jaspar_vertebrates.meme",
...     meme_bin_path="/usr/local/meme/bin",
...     wanted=["SP1", "lacI"],
...     aggregation="average",
... )
>>> results = seq_motif_constraint([(promoter_seq,)], config)
>>> print(results[0].score)  # e.g., 0.15
>>> print(results[0].metadata["motif_constraint"]["found"])  # e.g., {"SP1": 1e-8}

Metadata

Property         Value
Key              seq-motif
Function         seq_motif_constraint
Category         sequence_annotation
Mode             discrete
Uses GPU         False
Supported Types  dna