Proto is not affiliated with Proto. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
Random Protein Sampling performs random mutagenesis at the protein level: it takes a protein sequence, determines which positions are designable, and replaces each with an amino acid sampled from the distribution implied by a codon scheme. It generates protein-sequence diversity without any learned model, which makes it the simplest possible baseline against which model-guided designers can be compared. Internally, designable positions are either the _ characters already present in the input or, when none are present, positions chosen by the configured masking strategy. The codon scheme is expanded to its concrete codons, and each amino acid’s sampling weight is set proportional to the number of codons in the scheme that encode it, with stop codons excluded by default. UNIFORM instead assigns equal weight to all twenty standard amino acids. Each masked position is filled independently by a weighted random draw. With a fixed seed the output is deterministic.
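The weighting and filling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's actual code: the names scheme_weights and fill_masks are hypothetical, the standard genetic code is assumed, and the seeded random.Random generator is a stand-in for whatever random source the tool uses.

```python
import random
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate bases needed for the schemes NNN, NNK, NNS, NDT, DBK, NRT.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
         "K": "GT", "S": "CG", "D": "AGT", "B": "CGT", "R": "AG"}


def scheme_weights(scheme, allow_stop=False):
    """Hypothetical helper: map each symbol to its sampling weight."""
    if scheme == "UNIFORM":
        symbols = sorted(set(AMINO) - {"*"})
        if allow_stop:
            symbols.append("*")  # stop becomes an equally weighted 21st symbol
        return {aa: 1 for aa in symbols}
    # Expand the degenerate codon into concrete codons and count per amino acid.
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in scheme)))
    counts = Counter(CODON_TABLE[c] for c in codons)
    if not allow_stop:
        counts.pop("*", None)  # stops are excluded unless explicitly allowed
    return dict(counts)


def fill_masks(seq, scheme="UNIFORM", seed=None, allow_stop=False):
    """Hypothetical helper: fill each '_' independently by a weighted draw."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    weights = scheme_weights(scheme, allow_stop)
    symbols = sorted(weights)
    w = [weights[s] for s in symbols]
    return "".join(
        rng.choices(symbols, weights=w)[0] if ch == "_" else ch for ch in seq
    )


if __name__ == "__main__":
    print(scheme_weights("NNK"))
    print(fill_masks("MK_LV_", scheme="NNK", seed=7))
```

Under these assumptions NNK expands to 32 codons, one of which (TAG) is a stop, so the default distribution spreads 31 codon counts over the twenty amino acids.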
This tool is original proto-tools code maintained by Proto.
Tools
Random Protein Sampling (random-protein-sample)
Fills every masked position in each input sequence with a random amino acid drawn from the configured codon scheme, returning one filled sequence per input.
API Reference
Input: MaskedModelInput
Config: RandomProteinSampleConfig
codon_scheme: "UNIFORM" gives equal weight to all 20 amino acids; other schemes (NNK, NNS, NDT, etc.) weight amino acids by the number of codons encoding them. Available options: UNIFORM, NNN, NNK, NNS, NDT, DBK, NRT
allow_stop_codons: When True, "*" is included in the sampling distribution. For degenerate schemes it is weighted by its stop-codon count; for "UNIFORM" it is an equally weighted 21st symbol. Default: False (stops never sampled).
True is coerced to 1 and False to 0.
None waits indefinitely.
BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: RandomProteinSampleOutput
Applications
Use this to build randomized protein libraries that mimic experimental degenerate-codon mutagenesis, for example NNK saturation at chosen positions for directed-evolution and combinatorial screening. It also serves as an unbiased random baseline for judging whether a model-guided designer beats chance.
Usage Tips
- codon_scheme (default UNIFORM) sets the amino-acid distribution. UNIFORM draws all twenty amino acids equally; degenerate schemes (NNK, NNS, NDT, DBK, NRT) weight each amino acid by how many of the scheme’s codons encode it, so under NNK, for example, residues such as leucine, serine, and arginine (three codons each) appear more often than methionine or tryptophan (one codon each).
- NDT gives an even 12-amino-acid library. It encodes twelve amino acids with no codon redundancy, so each is equally likely; useful for small focused libraries.
- Stop codons are excluded by default. Set allow_stop_codons to True to include the stop symbol * in the distribution: for degenerate schemes it is weighted by its stop-codon count, and for UNIFORM it is an equally weighted 21st symbol.
- _ masks override the masking strategy. If an input already contains _, exactly those positions are filled and masking_strategy is ignored; remove the _ characters to let the strategy choose positions instead.
- masking_strategy.fixed_positions are 1-indexed. Positions listed there are never mutated; they are specified using 1-based indexing to match biological residue selection conventions.
- Set seed for reproducibility. Sampling is otherwise nondeterministic; a fixed seed makes the filled sequences reproducible across runs.
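The mask-precedence and 1-based indexing rules in the tips above can be illustrated with a short sketch. The function designable_positions is hypothetical and is not part of the toolkit. The fallback used here (every position that is not fixed becomes designable) is only an assumed stand-in for a masking strategy, whose real selection rules are not described in this document.

```python
def designable_positions(seq, fixed_positions=(), mask_char="_"):
    """Hypothetical helper: return the 0-based indices that would be filled.

    fixed_positions uses 1-based residue numbering, as in the tips above.
    """
    masked = [i for i, ch in enumerate(seq) if ch == mask_char]
    if masked:
        # Explicit masks win: the strategy (including fixed positions) is ignored.
        return masked
    # Assumed stand-in strategy: everything except the fixed residues.
    fixed = {p - 1 for p in fixed_positions}  # convert 1-based to 0-based
    return [i for i in range(len(seq)) if i not in fixed]


if __name__ == "__main__":
    # Input with masks: exactly the '_' positions are selected.
    print(designable_positions("MK_LV_", fixed_positions=[1]))
    # Input without masks: residues 1 and 4 (1-based) are protected.
    print(designable_positions("MKLV", fixed_positions=[1, 4]))
```

Converting to 0-based indices in one place keeps the 1-based convention at the configuration boundary and avoids off-by-one errors further down.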
Toolkit Notes
These apply to every Random Protein Sampling tool in this toolkit (random-protein-sample).
- Runs on CPU. The sampler is pure Python with no model and no external dependencies; execution is near-instant.
- Deterministic only with a seed. Without a seed the filled positions differ every run; set one when you need reproducible libraries.
