Protein Max Identity
License: This constraint can use multiple tools, each under its own license. See the Tools Used tab and each tool’s page for license details.

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


proto-bio/proto-language/proto_language/constraint/protein_quality/protein_max_identity_constraint.py
Require top-hit MMseqs2 identity to remain below a maximum threshold.

API Reference

Config: ProteinMaxIdentityConfig
Configuration for maximum identity to a protein reference set.
mmseqs_db (string, required)
    Path to MMseqs2 target database (FASTA file or MMseqs2 createdb output).
max_identity (number, default: 90.0)
    Maximum allowed percent identity (0-100, inclusive) to the top reference hit.
pass_no_hits (boolean, default: True)
    If True, proposals with no MMseqs2 hit pass as novel sequences.
reference_fasta (string)
    Optional FASTA file for recovering the top hit sequence by target ID.
mmseqs_config (Mmseqs2SearchProteinsConfig)
    Advanced MMseqs2 protein search configuration.
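
The `reference_fasta` option implies a lookup from a target ID to its sequence. The following is a minimal, standalone sketch of such a lookup, not the constraint's own code. The helper names `read_fasta` and `lookup_target` are hypothetical, and treating the first whitespace-delimited token of each header as the record ID is an assumption about how IDs are matched.

```python
import io
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record_id: sequence} mapping.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' on the header line.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)  # sequence may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def lookup_target(records: Dict[str, str], target_id: str) -> Optional[str]:
    """Return the sequence for target_id, or None when it is absent."""
    return records.get(target_id)


# Hypothetical two-record reference set, held in memory for illustration.
example = io.StringIO(">ref1 example protein\nMKTAYIAK\nQRQISFVK\n>ref2\nMSDNE\n")
references = read_fasta(example)
```

With a real file, `read_fasta(open(path))` would work the same way, since the function only iterates over lines. Returning `None` for a missing ID mirrors the "when available" wording used for the target sequence in the output metadata.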
Returns: ConstraintOutput
One output per proposal. A score of 0.0 passes and 1.0 fails. Metadata contains top-hit identity, target ID, target sequence when available, and selected ORF details for DNA.
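
The pass/fail rule described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation: `score_top_hit` and the metadata key names are hypothetical, and the sketch assumes that an identity exactly equal to `max_identity` still passes, which this page does not state explicitly.

```python
from typing import Any, Dict, Optional


def score_top_hit(
    top_hit_identity: Optional[float],
    max_identity: float = 90.0,
    pass_no_hits: bool = True,
    target_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Map a top-hit percent identity to a 0.0 (pass) / 1.0 (fail) score.

    top_hit_identity is None when the search returned no hit.
    """
    if not 0.0 <= max_identity <= 100.0:
        raise ValueError("max_identity must be within 0-100")

    if top_hit_identity is None:
        # No hit: the outcome is decided by pass_no_hits alone.
        passed = pass_no_hits
    else:
        # Assumption: identity equal to the threshold counts as passing.
        passed = top_hit_identity <= max_identity

    return {
        "score": 0.0 if passed else 1.0,
        # Hypothetical metadata keys, chosen for illustration only.
        "metadata": {
            "top_hit_identity": top_hit_identity,
            "target_id": target_id,
        },
    }
```

For example, `score_top_hit(72.5)` yields a score of 0.0 under the default threshold of 90.0, `score_top_hit(95.0)` yields 1.0, and `score_top_hit(None, pass_no_hits=False)` yields 1.0 because a proposal without hits is then treated as failing.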

Usage

python
from proto_language.core import Constraint
from proto_language.constraint import protein_max_identity_constraint, ProteinMaxIdentityConfig

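# `segment` is assumed to be a DNA or protein segment defined earlier.
constraint = Constraint(
    inputs=[segment],
    function=protein_max_identity_constraint,
    function_config=ProteinMaxIdentityConfig(
        mmseqs_db="path/to/mmseqs_db",  # required; placeholder path
        max_identity=90.0,  # percent identity threshold (default)
        pass_no_hits=True,  # proposals with no hit pass (default)
    ),
)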

scores = constraint.evaluate()

Metadata

Property         Value
Key              protein-max-identity
Function         protein_max_identity_constraint
Category         protein_quality
Mode             discrete
Uses GPU         False
Supported Types  dna, protein