Sequence Motif Match

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

Source

proto-bio/proto-language/proto_language/constraint/sequence_annotation/seq_motif_constraint.py

Score DNA sequences against sequence motifs using MEME.

This constraint function uses MEME Suite’s Find Individual Motif Occurrences (FIMO) tool to search DNA sequences for motifs represented as position weight matrices (PWMs). It evaluates whether sequences contain desired motifs (wanted) or unwanted motifs (not_wanted). The penalty assigned to a sequence depends on how each motif is classified:

Unwanted motifs: Strong matches (low p-values) result in high penalties, encouraging sequences without these binding sites
Wanted motifs: Strong matches result in low penalties (rewards), while missing wanted motifs result in high penalties
No motif specification: Any motif matches are penalized (novelty constraint)

API Reference

Config: SeqMotifConfig

Configuration for sequence motif constraint using MEME.

This class defines configuration parameters for evaluating DNA sequences against known transcription factor binding motifs using MEME Suite’s Find Individual Motif Occurrences (FIMO) tool. The constraint searches sequences for position weight matrix (PWM) motifs and can either encourage specific motifs (wanted) or discourage them (not_wanted), enabling the design of sequences with controlled binding sites.

Motif names are case-sensitive and must exactly match the names in the MEME file. Use the MOTIF lines in the .meme file to identify available motif names.
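To see which names a motif file offers, the MOTIF lines can be read directly. The sketch below is a minimal illustration and not part of the library: `list_motif_names` is a hypothetical helper, and it assumes the usual MEME text layout in which each record starts with `MOTIF <identifier> [alternate name]`. This page does not state whether the constraint matches on the identifier or the alternate name, so the helper returns both.

```python
from pathlib import Path


def list_motif_names(meme_path):
    """Return (identifier, alternate_name) pairs from the MOTIF lines of a MEME file.

    Hypothetical helper for illustration; alternate_name is None when absent.
    """
    motifs = []
    for line in Path(meme_path).read_text().splitlines():
        fields = line.split()
        # A motif record starts with "MOTIF <identifier> [alternate name]".
        if len(fields) >= 2 and fields[0] == "MOTIF":
            alternate = fields[2] if len(fields) > 2 else None
            motifs.append((fields[1], alternate))
    return motifs
```

Calling `list_motif_names(config_path)` on the same file passed as `motifs_path` gives the exact, case-sensitive spellings to use in `wanted` and `not_wanted`.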

motifs_path

string

required

Path to MEME format motif file (.meme) containing PWMs.

meme_bin_path

string

required

Path to directory containing MEME Suite binaries (must include fimo).
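Because the directory must contain `fimo`, it can be worth checking the path before building a config. The following is a sketch under that single assumption; `find_fimo` is a hypothetical helper, not a function of the library, and the constraint may perform its own validation.

```python
import os
from pathlib import Path


def find_fimo(meme_bin_path):
    """Return the path to an executable `fimo` inside meme_bin_path, or raise.

    Hypothetical pre-flight check for illustration only.
    """
    fimo = Path(meme_bin_path) / "fimo"
    # Require a regular file that the current user is allowed to execute.
    if not (fimo.is_file() and os.access(fimo, os.X_OK)):
        raise FileNotFoundError(f"no executable 'fimo' in {meme_bin_path}")
    return fimo
```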

wanted

array

Motifs that should be present: ‘all’ (all motifs), ‘none’ (no requirement), or list of motif names.

not_wanted

array

Motifs that should NOT be present: ‘all’ (reject all), ‘none’ (allow all), or list of motif names.

scale

number

default:"1.0"

Scaling factor to adjust penalty magnitude (>1 = stricter, <1 = more lenient). Example: 1.0

exclusive

boolean

default:"True"

If True, automatically sets the unwanted motifs to the complement of the wanted motifs.
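One plausible reading of "complement" is every motif in the file that is not listed as wanted. The sketch below illustrates that reading only; it is an assumption about the semantics, `complement_motifs` is a hypothetical helper, and `MOTIF_X` is a made-up motif name.

```python
def complement_motifs(all_motifs, wanted):
    """Return, sorted, every name in all_motifs that is not in wanted.

    Illustrates one interpretation of exclusive=True; not the library's code.
    """
    return sorted(set(all_motifs) - set(wanted))


# "MOTIF_X" is a hypothetical name used only for this example.
unwanted = complement_motifs(["SP1", "lacI", "MOTIF_X"], wanted=["SP1"])
```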

aggregation

enum

default:"smart"

How to aggregate penalties: ‘smart’ (adaptive), ‘average’, ‘max’ (strictest), or ‘percentile’.

Options: smart, average, max, percentile

percentile_value

number

default:"95.0"

Which percentile to use when aggregation=‘percentile’ (0-100)
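The three non-adaptive aggregation modes can be pictured with a short sketch. This is an illustration of the general idea, not the library's implementation: `aggregate_penalties` is a hypothetical function, the percentile uses linear interpolation between ranks (an assumption), an empty list is treated as zero penalty (also an assumption), and ‘smart’ is left out because its rule is not described here.

```python
import statistics


def aggregate_penalties(penalties, method="average", percentile_value=95.0):
    """Combine per-motif penalties (each in [0, 1]) into one score.

    Hypothetical sketch; 'smart' is intentionally unsupported.
    """
    if not penalties:
        return 0.0  # assumption: nothing evaluated means no penalty
    if method == "average":
        return statistics.fmean(penalties)
    if method == "max":
        return max(penalties)
    if method == "percentile":
        if not 0.0 <= percentile_value <= 100.0:
            raise ValueError("percentile_value must be between 0 and 100")
        ordered = sorted(penalties)
        # Fractional rank into the sorted list, then interpolate linearly.
        rank = (percentile_value / 100.0) * (len(ordered) - 1)
        lower = int(rank)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])
    raise ValueError(f"unsupported aggregation method: {method!r}")
```

With this sketch, ‘max’ lets a single bad motif dominate the score, while ‘average’ dilutes it across all evaluated motifs; a high percentile sits between the two.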

unwanted_focus

boolean

default:"True"

When both wanted and unwanted motifs exist, weight unwanted motifs more heavily in final score

Returns: ConstraintOutput

One result per sequence. Score ranges from 0.0 (all criteria satisfied) to 1.0 (severe violations). metadata carries a single motif_constraint dict:

penalty: Float overall penalty score (0.0-1.0)
wanted: Sorted list of wanted motif names
not_wanted: Sorted list of unwanted motif names
found: Dictionary mapping motif names to their best (lowest) FIMO p-values
details: Dictionary with per-motif scoring details including:
- penalty: Individual motif penalty
- status: “wanted_found”, “wanted_missing”, “unwanted”, or “unwanted_absent”
- p_value: FIMO p-value if motif was found
aggregation_info: Dictionary with aggregation statistics:
- method: Aggregation method used
- unwanted_count: Number of unwanted motif evaluations
- wanted_count: Number of wanted motif evaluations
- unwanted_matches: Number of unwanted motifs found
- wanted_matches: Number of wanted motifs found
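The details dictionary makes it easy to see why a sequence was penalized. The sketch below groups motifs by their status; `summarize_motif_metadata` is a hypothetical helper, and it assumes that details is keyed by motif name, which the field list above suggests but does not state outright.

```python
def summarize_motif_metadata(motif_constraint):
    """Group motif names by the status recorded in the details dictionary.

    Hypothetical helper; assumes details maps motif name -> {"status": ..., ...}.
    """
    by_status = {}
    for name, info in motif_constraint["details"].items():
        by_status.setdefault(info["status"], []).append(name)
    # Sort names so the summary is stable across runs.
    return {status: sorted(names) for status, names in by_status.items()}
```

With a real result, the argument would be `results[0].metadata["motif_constraint"]`; motifs listed under "wanted_missing" or "unwanted" are the ones driving the penalty up.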

Usage

Requiring specific transcription factor binding sites:

python

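>>> from proto_language.core import Sequence, SequenceType
>>> # Import path below is inferred from the source file location listed under Source.
>>> from proto_language.constraint.sequence_annotation.seq_motif_constraint import (
...     SeqMotifConfig,
...     seq_motif_constraint,
... )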
>>> promoter_seq = Sequence("ATCGGCGGGATCGTAATATAGCATGC", "dna")
>>> config = SeqMotifConfig(
...     motifs_path="/data/jaspar_vertebrates.meme",
...     meme_bin_path="/usr/local/meme/bin",
...     wanted=["SP1", "lacI"],
...     aggregation="average",
... )
>>> results = seq_motif_constraint([(promoter_seq,)], config)
>>> print(results[0].score)  # e.g., 0.15
>>> print(results[0].metadata["motif_constraint"]["found"])  # e.g., {"SP1": 1e-8}

Metadata

Property	Value
Key	`seq-motif`
Function	`seq_motif_constraint`
Category	`sequence_annotation`
Mode	`discrete`
Uses GPU	`False`
Supported Types	`dna`
