MSA Generator

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source
proto-bio/proto-language/proto_language/generator/msa_generator.py
Generator that samples mutations from MSA position-specific distributions. This generator computes empirical probability distributions for each position in a multiple sequence alignment, then mutates proposal sequences by sampling from these distributions.
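The two steps described above (build a per-position distribution, then sample from it) can be sketched in plain Python. This is a minimal illustration, not the code in msa_generator.py: the helper names `position_distributions` and `mutate` are hypothetical, gap handling is left out, and the sketch assumes the proposal sequence has the same length as the alignment. It also allows a sampled residue to match the one already present, which may or may not be how the generator behaves.

```python
import random
from collections import Counter


def position_distributions(aligned_sequences):
    """Return one {residue: probability} dict per alignment column.

    Hypothetical helper: counts every character in the column as-is.
    """
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    distributions = []
    for column in zip(*aligned_sequences):
        counts = Counter(column)
        total = sum(counts.values())
        distributions.append({res: n / total for res, n in counts.items()})
    return distributions


def mutate(sequence, distributions, num_mutations=1, rng=None):
    """Resample `num_mutations` distinct positions from their column distributions."""
    if rng is None:
        rng = random.Random()
    # Pick distinct positions uniformly at random (an assumption of this sketch).
    count = min(num_mutations, len(distributions))
    positions = rng.sample(range(len(distributions)), k=count)
    residues = list(sequence)
    for i in positions:
        options = list(distributions[i].keys())
        weights = list(distributions[i].values())
        residues[i] = rng.choices(options, weights=weights, k=1)[0]
    return "".join(residues)


dists = position_distributions(["MVLS", "AVLS", "MVLS"])
mutated = mutate("MVLS", dists, num_mutations=1, rng=random.Random(0))
```

With this alignment only position 0 varies, so `mutated` is either "MVLS" or "AVLS" regardless of which position is chosen.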

API Reference

Config: MSAGeneratorConfig
Configuration object for MSAGenerator. This class defines configuration parameters for the MSA generator, which samples mutations from position-specific probability distributions derived from a multiple sequence alignment.
msa (MSA, required)
    Multiple sequence alignment (list of aligned sequences).
num_mutations (integer, default: 1)
    Number of positions to mutate per sample.
include_gaps (boolean, default: False)
    Whether to include gaps when calculating position probabilities.
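
To make the effect of include_gaps concrete, the sketch below computes the distribution for a single alignment column both ways. It is an illustration under stated assumptions, not the library's implementation: `column_distribution` is a hypothetical helper, the gap symbol is assumed to be "-", and excluded gaps are assumed to be dropped before normalizing.

```python
from collections import Counter


def column_distribution(column, include_gaps=False):
    """Empirical residue probabilities for one alignment column.

    Hypothetical helper; assumes "-" marks a gap.
    """
    counts = Counter(c for c in column if include_gaps or c != "-")
    total = sum(counts.values())
    # A column made up entirely of excluded gaps yields an empty distribution.
    return {res: n / total for res, n in counts.items()} if total else {}


column = ["-", "V", "V"]  # one gapped sequence, two with V
without_gaps = column_distribution(column)
with_gaps = column_distribution(column, include_gaps=True)
```

Under these assumptions, the default setting gives V a probability of 1, while include_gaps=True gives the gap 1/3 and V 2/3, so a sampled mutation could introduce a gap at that position.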

Usage

python
>>> from proto_language.generator import MSAGenerator, MSAGeneratorConfig
>>> from proto_language.core import Segment
>>> from proto_tools.entities.msa import MSA
>>> config = MSAGeneratorConfig(
...     msa=MSA(aligned_sequences=["MVLS", "AVLS", "MVLS"]),
...     num_mutations=1,
... )
>>> gen = MSAGenerator(config)
>>> segment = Segment(sequence="MVLS", sequence_type="protein")
>>> gen.assign(segment)
>>> gen.sample()  # Position 0 has 2/3 chance of M, 1/3 chance of A

Metadata

| Property | Value |
| --- | --- |
| Key | msa |
| Class | MSAGenerator |
| Category | mutation |
| Input Type | starting_sequence |
| Uses GPU | False |
| Supported Sequence Types | All |
| Allows Empty Start | False |