Proto is not affiliated with Proto. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
Random Nucleotide Sampling performs random mutagenesis at the nucleotide level: it takes a DNA or RNA sequence, determines which positions are designable, and replaces each with a base drawn uniformly from a chosen IUPAC degenerate-base pool. It generates nucleotide diversity without any learned model, which makes it the simplest possible baseline against which model-guided generators can be compared. Internally, designable positions are either the _ characters already present in the input or, when none are present, positions chosen by the configured masking strategy. Each masked position is filled independently by drawing one base uniformly at random from the pool that the IUPAC code expands to: N expands to A/C/G/T, R to A/G, S to G/C, and so on. Sampling is uniform within the pool, with no frequency weighting. When the input is RNA, sampled T bases are converted to U. With a fixed seed the output is deterministic.
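The fill step described above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's actual code: the helper name fill_masked and its parameters are hypothetical, and the masking strategy used for inputs without _ characters is left out.

```python
import random

# IUPAC degenerate codes and the DNA bases each one expands to.
IUPAC_POOLS = {
    "N": "ACGT", "R": "AG", "Y": "CT", "S": "GC", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}

def fill_masked(sequence, scheme="N", is_rna=False, seed=None):
    """Hypothetical helper: replace each '_' with a uniform draw from the pool."""
    pool = IUPAC_POOLS[scheme]
    rng = random.Random(seed)  # private RNG, so a fixed seed gives repeatable output
    filled = []
    for ch in sequence:
        if ch == "_":
            base = rng.choice(pool)  # uniform and independent per position
            if is_rna and base == "T":
                base = "U"  # RNA output writes sampled T as U
            filled.append(base)
        else:
            filled.append(ch)  # unmasked positions pass through unchanged
    return "".join(filled)

# Fill three masked positions with purines only (A or G).
print(fill_masked("ATG___TAA", scheme="R", seed=0))
```

Only the masked positions change; the flanking ATG and TAA are returned untouched, and the three filled positions are restricted to the R pool.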
This tool is original proto-tools code maintained by Proto.
Tools
Random Nucleotide Sampling (random-nucleotide-sample)
Fills every masked position in each input sequence with a random base from the configured IUPAC substitution pool, returning one filled sequence per input.
API Reference
Input: RandomNucleotideSampleInput
_ at positions to mutate. Accepts a single string or a list.
Config: RandomNucleotideSampleConfig
"N" = any base (ACGT); "R" = purines (AG); "Y" = pyrimidines (CT); etc.Available options: N, R, Y, S, W, K, M, B, D, H, V"auto" detects DNA vs RNA by presence of U; "dna" or "rna" forces the type.Available options: auto, dna, rnaTrue is coerced to 1 and False to 0.None waits indefinitely.BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.Output: RandomNucleotideSampleOutput
Output: RandomNucleotideSampleOutput
Applications
Use this to build randomized nucleotide libraries: degenerate positions in promoters, ribosome binding sites, UTRs, or coding regions for directed-evolution and combinatorial-screening campaigns. It also serves as an unbiased random baseline for judging whether a model-guided generator produces better-than-chance sequences.
Usage Tips
- substitution_scheme (default N) sets the substitution alphabet. N allows any base for maximum diversity; restrict it to bias the library, for example R for purines (A/G), S for strong pairs (G/C), or W for weak pairs (A/T).
- _ masks override the masking strategy. If an input already contains _, exactly those positions are filled and masking_strategy is ignored; remove the _ characters to let the strategy choose positions instead.
- sequence_type (default auto) controls RNA handling. auto treats the sequence as RNA only when it contains U; force it with dna or rna. In RNA mode sampled T bases are written as U. A fully masked input contains no U for auto to detect, so set rna explicitly in that case.
- masking_strategy.fixed_positions are 1-indexed. Positions listed there are never mutated; they are specified using 1-based indexing to match biological residue selection conventions.
- Set seed for reproducibility. Sampling is otherwise nondeterministic; a fixed seed makes the filled sequences reproducible across runs.
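Two of the tips above are easy to get wrong, so here is a small sketch of the rules as described. The function names are hypothetical and this is not the toolkit's implementation; in particular, matching U case-insensitively is an assumption made for the example.

```python
def resolve_sequence_type(sequence, sequence_type="auto"):
    """Hypothetical resolver: 'auto' means RNA only if a U is present."""
    if sequence_type in ("dna", "rna"):
        return sequence_type  # an explicit setting always wins
    # Assumption: the check ignores case.
    return "rna" if "U" in sequence.upper() else "dna"

def designable_indices(length, fixed_positions):
    """Convert 1-based fixed positions into the 0-based indices left mutable."""
    fixed = {p - 1 for p in fixed_positions}
    return [i for i in range(length) if i not in fixed]

print(resolve_sequence_type("AUG__UAA"))         # rna (a U is present)
print(resolve_sequence_type("________"))         # dna (no U to detect)
print(resolve_sequence_type("________", "rna"))  # rna (forced)
print(designable_indices(6, [1, 2, 3]))          # [3, 4, 5]
```

The second call shows why a fully masked RNA input needs sequence_type set to rna, and the last call shows that fixed position 1 protects the first base, not the second.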
Toolkit Notes
These apply to every Random Nucleotide Sampling tool in this toolkit (random-nucleotide-sample).
- Runs on CPU. The sampler is pure Python with no model and no external dependencies; execution is near-instant.
- Deterministic only with a seed. Without a seed the filled positions differ every run; set one when you need reproducible libraries.
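The seed behaviour noted above can be illustrated with Python's standard random module. This is a sketch under the assumption that sampling uses a seeded pseudo-random generator; sample_bases is a hypothetical stand-in, not a function from the toolkit.

```python
import random

def sample_bases(n, pool="ACGT", seed=None):
    """Hypothetical sampler: n uniform draws from pool using a private RNG."""
    # With seed=None, random.Random seeds itself from OS entropy or the clock.
    rng = random.Random(seed)
    return "".join(rng.choice(pool) for _ in range(n))

first = sample_bases(20, seed=42)
second = sample_bases(20, seed=42)
print(first == second)  # True: the same seed reproduces the same draws

# Unseeded calls are independent, so two runs will almost always differ.
print(sample_bases(20), sample_bases(20))
```

Using a private generator instead of the module-level functions keeps the result independent of any other code that draws random numbers in the same process.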
