Protein Max Identity

License: This constraint can use multiple tools, each under its own license. See the Tools Used tab and each tool’s page for license details.

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

Source: proto-bio/proto-language/proto_language/constraint/protein_quality/protein_max_identity_constraint.py

Require top-hit MMseqs2 identity to remain below a maximum threshold.

API Reference

Config: ProteinMaxIdentityConfig

Configuration for maximum identity to a protein reference set.

mmseqs_db (string, required)

Path to the MMseqs2 target database (FASTA file or MMseqs2 createdb output).

max_identity (number, default: 90.0)

Maximum allowed percent identity (0-100, inclusive) to the top reference hit.

pass_no_hits (boolean, default: True)

If True, proposals with no MMseqs2 hit pass as novel sequences.

reference_fasta (string)

Optional FASTA file for recovering the top hit sequence by target ID.

mmseqs_config (Mmseqs2SearchProteinsConfig)

Advanced MMseqs2 protein search configuration.
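The two file-based fields, `mmseqs_db` and `reference_fasta`, can be prepared ahead of time. The sketch below is illustrative only and is not part of proto_language: `build_createdb_command` and `load_fasta_by_id` are hypothetical helpers. It assumes the `mmseqs` binary is on your PATH if you actually run the command, and that a target ID matches the first whitespace-delimited token of a FASTA header, which may not hold for every header style.

```python
def build_createdb_command(fasta_path: str, db_path: str) -> list[str]:
    """Return the argument list for `mmseqs createdb <fasta> <db>`."""
    return ["mmseqs", "createdb", fasta_path, db_path]


def load_fasta_by_id(fasta_text: str) -> dict[str, str]:
    """Map each record ID (assumed: first header token) to its sequence."""
    records: dict[str, str] = {}
    current_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


# Hypothetical file names; pass the result to subprocess.run(cmd, check=True)
# to build the database.
cmd = build_createdb_command("references.fasta", "references_db")

# Look up a hit sequence by target ID from in-memory FASTA text.
example = ">ref1 example protein\nMKT\nAYI\n>ref2\nGGS\n"
sequences = load_fasta_by_id(example)
print(sequences["ref1"])  # MKTAYI
```

Keeping the lookup keyed by ID mirrors how the description of `reference_fasta` frames it: the search reports a target ID, and the FASTA file is only consulted to recover the matching sequence.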

Returns: ConstraintOutput

One output per proposal. A score of 0.0 passes and 1.0 fails. Metadata contains top-hit identity, target ID, target sequence when available, and selected ORF details for DNA.
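The pass/fail rule can be pictured with a minimal sketch. This is not the library's implementation: `score_top_hit` is a hypothetical function, and it assumes that an identity exactly equal to `max_identity` still passes. The page describes the threshold both as a value to stay "below" and as a "maximum allowed" value, so the real boundary behavior may differ.

```python
from typing import Optional


def score_top_hit(
    top_hit_identity: Optional[float],
    max_identity: float = 90.0,
    pass_no_hits: bool = True,
) -> float:
    """Return 0.0 (pass) or 1.0 (fail) for one proposal.

    top_hit_identity is the percent identity (0-100) of the best hit,
    or None when the search returned no hit.
    """
    if not 0.0 <= max_identity <= 100.0:
        raise ValueError("max_identity must be within 0-100")
    if top_hit_identity is None:
        # No hit: treated as novel only when pass_no_hits is True.
        return 0.0 if pass_no_hits else 1.0
    # Assumption: identity equal to the threshold still passes.
    return 0.0 if top_hit_identity <= max_identity else 1.0


print(score_top_hit(72.5))                      # 0.0
print(score_top_hit(95.0))                      # 1.0
print(score_top_hit(None))                      # 0.0
print(score_top_hit(None, pass_no_hits=False))  # 1.0
```

Because the output is a binary score, lowering `max_identity` only changes which proposals fail, not how strongly they fail.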

Usage

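```python
from proto_language.core import Constraint
from proto_language.constraint import protein_max_identity_constraint, ProteinMaxIdentityConfig

# `segment` is a DNA or protein segment defined earlier in your program.
constraint = Constraint(
    inputs=[segment],
    function=protein_max_identity_constraint,
    function_config=ProteinMaxIdentityConfig(
        mmseqs_db="path/to/reference_db",  # required; placeholder path
        max_identity=90.0,  # default
        pass_no_hits=True,  # default
    ),
)

# One output per proposal: a score of 0.0 passes, 1.0 fails.
scores = constraint.evaluate()
```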

Metadata

Property	Value
Key	`protein-max-identity`
Function	`protein_max_identity_constraint`
Category	`protein_quality`
Mode	`discrete`
Uses GPU	`False`
Supported Types	`dna`, `protein`
