Semigreedy Mutation Generator

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source: proto-bio/proto-language/proto_language/generator/semigreedy_mutation_generator.py
Introduce single-point mutations guided by a PSSM derived from seq.logits. Each call to sample() selects one position per proposal sequence and replaces the amino acid there by sampling from the softmax distribution over logits (with the current residue optionally excluded). Position selection is controlled by position_weighting:
  • "uniform": every position is equally likely.
  • "entropy": positions with higher Shannon entropy in the PSSM are more likely, targeting the most uncertain residues.
  • "plddt": positions are weighted by (1 - pLDDT) read from the canonical proposal.structure.per_residue_plddt property when a Structure with pLDDT B-factors is present, so structurally uncertain residues are mutated more frequently; otherwise it falls back to uniform.
frozen_positions hard-excludes the listed indices from position selection, so the residue at each listed index is never changed (a deterministic counterpart to sequence_bias).
This generator implements Germinal's design_semigreedy phase when paired with MCMCOptimizer at near-zero temperature and proposals_per_result > 1.
clear_logits=False (default) requires upstream GradientOptimizer logits at runtime; clear_logits=True runs as pure sequence-only mutation.
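The position-selection rules described above can be sketched in plain NumPy. This is an illustrative approximation, not the generator's actual code: the helper names softmax and position_weights are hypothetical, pLDDT is assumed to be on a 0 to 1 scale, and the fallback to uniform weights when every weight is zero is an assumption of this sketch.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def position_weights(logits, mode="uniform", plddt=None, frozen_positions=()):
    """Return a probability vector over positions (hypothetical helper)."""
    logits = np.asarray(logits, dtype=float)
    length = logits.shape[0]

    if mode == "entropy":
        # Shannon entropy of each PSSM row: uncertain positions get more weight.
        pssm = softmax(logits)
        weights = -(pssm * np.log(np.clip(pssm, 1e-12, None))).sum(axis=-1)
    elif mode == "plddt" and plddt is not None:
        # Assumes pLDDT in [0, 1]; low-confidence residues get more weight.
        weights = np.clip(1.0 - np.asarray(plddt, dtype=float), 0.0, None)
    else:
        # "uniform", or "plddt" without structure data, falls back to uniform.
        weights = np.ones(length)

    # Frozen positions can never be selected.
    selectable = np.ones(length, dtype=bool)
    selectable[np.asarray(frozen_positions, dtype=int)] = False
    weights = np.where(selectable, weights, 0.0)

    if weights.sum() <= 0.0:
        # Assumption of this sketch: degenerate weights fall back to uniform.
        weights = selectable.astype(float)
    if weights.sum() <= 0.0:
        raise ValueError("every position is frozen")
    return weights / weights.sum()


rng = np.random.default_rng(0)
example_logits = rng.normal(size=(5, 20))
w = position_weights(example_logits, mode="entropy", frozen_positions=[0])
position = rng.choice(len(w), p=w)  # index 0 is frozen, so it is never drawn
```

Setting the weight of a frozen index to zero before normalizing is what makes the exclusion hard rather than a soft bias.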

API Reference

Config: SemigreedyMutationGeneratorConfig
Configuration for semigreedy single-point mutation sampling. Converts seq.logits (from a preceding gradient-based optimizer) to a PSSM via softmax and samples single-point mutations from it. Stage 2 of the Germinal pipeline: paired with MCMCOptimizer at near-zero temperature for greedy/semigreedy discrete refinement.
position_weighting (enum, default: "uniform")
Options: uniform, entropy, plddt. "uniform" picks a position uniformly at random; "entropy" favors high-entropy positions; "plddt" favors low-pLDDT positions.

temperature (number, default: 1.0)
Softmax temperature applied to the logits when building the PSSM. Must be > 0; values below 1 sharpen the distribution and values above 1 flatten it.

exclude_current (boolean, default: True)
Zero out the current amino acid before sampling to guarantee a mutation.

sequence_bias (SequenceLogitBiasConfig)
Optional declarative sequence-symbol bias applied before amino acid sampling.

clear_logits (boolean, default: False)
When True, ignore proposal logits and sample the replacement from sequence_bias (or uniformly if unset).

frozen_positions (array)
Position indices to keep untouched during mutation (Python-style, zero-based).
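To make the interaction of temperature and exclude_current concrete, here is a minimal NumPy sketch of how replacement probabilities for a single position might be computed. It is not taken from the library: replacement_probs is a hypothetical helper, and the 20-letter column order in AMINO_ACIDS is an assumption, since the documentation does not state how logit columns map to residues.

```python
import numpy as np

# Assumed column order for the 20 logit columns (not confirmed by the docs).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def replacement_probs(logits_row, current, temperature=1.0, exclude_current=True):
    """Softmax over one row of logits, optionally masking the current residue."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Division creates a new array, so the caller's logits are not modified.
    z = np.asarray(logits_row, dtype=float) / temperature
    if exclude_current:
        # A logit of -inf becomes probability 0, which guarantees a change.
        z[AMINO_ACIDS.index(current)] = -np.inf
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


rng = np.random.default_rng(0)
row = rng.normal(size=20)
probs = replacement_probs(row, current="A", temperature=0.5)
new_residue = AMINO_ACIDS[rng.choice(20, p=probs)]  # never "A" here
```

In this sketch an all-zero row of logits yields a uniform distribution over the remaining 19 residues, which mirrors the documented clear_logits=True behavior when no sequence_bias is set.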

Usage

python
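>>> # Import path below is inferred from the source file location, not confirmed by the docs
>>> from proto_language.generator.semigreedy_mutation_generator import (
...     SemigreedyMutationGenerator,
...     SemigreedyMutationGeneratorConfig,
... )
>>> from proto_language.core import Segment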
>>> segment = Segment(sequence="ACDEF", sequence_type="protein")
>>> gen = SemigreedyMutationGenerator(SemigreedyMutationGeneratorConfig(position_weighting="entropy"))
>>> gen.assign(segment)
>>> # Normally logits come from a GradientOptimizer; here we set them manually:
>>> import numpy as np
>>> segment.proposal_sequences[0].logits = np.random.randn(5, 20)
>>> gen.sample()
>>> # Exactly one position differs from "ACDEF"
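The "exactly one position differs" property from the usage example can be checked with a small standalone helper. The function name mutated_positions and the mutated string "ACWEF" are hypothetical and only serve as an illustration; how the mutated sequence is read back from the generator is not shown in this page.

```python
def mutated_positions(before, after):
    """Return the zero-based indices at which two equal-length sequences differ."""
    if len(before) != len(after):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


# "ACWEF" is a made-up result standing in for one semigreedy mutation of "ACDEF".
diff = mutated_positions("ACDEF", "ACWEF")
assert len(diff) == 1
```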

Metadata

Property                   Value
Key                        semigreedy-mutation
Class                      SemigreedyMutationGenerator
Category                   mutation
Input Type                 starting_sequence
Uses GPU                   False
Supported Sequence Types   protein
Allows Empty Start         False