Protein Nearest-Neighbor Gap Gini
License: This constraint can use multiple tools, each under its own license. See the Tools Used tab and each tool’s page for license details.

This constraint is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


proto-bio/proto-language/proto_language/constraint/protein_quality/protein_nearest_neighbor_gap_gini_constraint.py
Require low gap Gini against the nearest reference protein.

API Reference

Config: ProteinNearestNeighborGapGiniConfig
Configuration for gap Gini against nearest protein neighbors.
mmseqs_db (string, required)
Path to MMseqs2 target database (FASTA file or MMseqs2 createdb output).

reference_fasta (string, required)
FASTA file used to recover top-hit sequences by target ID.

max_gap_gini (number, default: 0.1)
Maximum acceptable gap Gini (0-1, inclusive) against the nearest reference protein.

pass_no_hits (boolean, default: True)
If True, proposals with no MMseqs2 hit pass as novel sequences.

trim_alignment (boolean, default: True)
Center-crop the pairwise alignment and strip end gaps before computing gap Gini.

mmseqs_config (Mmseqs2SearchProteinsConfig)
Advanced MMseqs2 protein search configuration.

mafft_config (MafftConfig)
Advanced MAFFT pairwise alignment configuration.
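
This page does not define how the gap Gini statistic is computed, so the following is only an illustrative sketch of one plausible reading: strip end gaps from a pairwise alignment, collect the lengths of the remaining gap runs, and take the standard Gini coefficient of those lengths. The helper names (`gini`, `strip_end_gaps`, `gap_run_lengths`) and the choice of gap-run lengths as the input are assumptions made for illustration, not the library's implementation, and the center-crop step of `trim_alignment` is not modeled.

```python
from itertools import groupby
from typing import List, Tuple


def gini(values: List[float]) -> float:
    """Gini coefficient of non-negative values (0.0 means perfectly even)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # Rank-weighted form: G = sum((2i - n - 1) * x_i) / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return weighted / (n * total)


def strip_end_gaps(query: str, target: str, gap: str = "-") -> Tuple[str, str]:
    """Drop leading and trailing columns in which either row has a gap."""
    if len(query) != len(target):
        raise ValueError("aligned rows must have equal length")
    cols = [i for i in range(len(query)) if query[i] != gap and target[i] != gap]
    if not cols:
        return "", ""
    start, end = cols[0], cols[-1] + 1
    return query[start:end], target[start:end]


def gap_run_lengths(row: str, gap: str = "-") -> List[int]:
    """Lengths of consecutive gap runs in one aligned row."""
    return [len(list(g)) for is_gap, g in groupby(row, key=lambda c: c == gap) if is_gap]


# Hypothetical aligned pair (made-up sequences, for illustration only).
q, t = strip_end_gaps("--MKT-AY---LLG-", "AAMKTQAYGGGLLGA")
runs = gap_run_lengths(q) + gap_run_lengths(t)
value = gini(runs)
```

Under this reading, an alignment with no internal gaps, or with a single gap run, yields a value of 0.0, while a mix of short and long gap runs pushes the value up.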
Returns: ConstraintOutput
One output per proposal. A score of 0.0 passes and 1.0 fails. Metadata contains the nearest target ID, nearest hit sequence, and computed gap Gini.
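
The pass/fail convention described above can be sketched as a small decision function. This is a minimal illustration, not the library's code: `score_proposal` is a hypothetical helper, `None` stands in for "no MMseqs2 hit", and two behaviors are assumed rather than documented here, namely that the threshold comparison is inclusive and that a proposal with no hit fails when `pass_no_hits` is False.

```python
from typing import Optional


def score_proposal(
    gap_gini: Optional[float],
    max_gap_gini: float = 0.1,
    pass_no_hits: bool = True,
) -> float:
    """Return 0.0 for a pass and 1.0 for a fail, following the documented convention."""
    if gap_gini is None:
        # No nearest neighbor was found; the outcome depends on pass_no_hits.
        return 0.0 if pass_no_hits else 1.0
    # Assumed inclusive comparison against the configured maximum.
    return 0.0 if gap_gini <= max_gap_gini else 1.0
```

For example, with the default settings a gap Gini of 0.05 would score 0.0 (pass) and a gap Gini of 0.25 would score 1.0 (fail).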

Usage

python
from proto_language.core import Constraint
from proto_language.constraint import protein_nearest_neighbor_gap_gini_constraint, ProteinNearestNeighborGapGiniConfig

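# `segment` is assumed to be a segment (protein or DNA) defined earlier in your program.
constraint = Constraint(
    inputs=[segment],
    function=protein_nearest_neighbor_gap_gini_constraint,
    function_config=ProteinNearestNeighborGapGiniConfig(
        mmseqs_db="path/to/target_db",  # required; placeholder path
        reference_fasta="path/to/reference.fasta",  # required; placeholder path
        max_gap_gini=0.1,  # optional; the default, shown explicitly
    ),
)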

scores = constraint.evaluate()

Metadata

Property         Value
Key              protein-nearest-neighbor-gap-gini
Function         protein_nearest_neighbor_gap_gini_constraint
Category         protein_quality
Mode             discrete
Uses GPU         False
Supported Types  dna, protein