Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
Evo1 (Nguyen et al., 2024) is a 7-billion-parameter DNA language model trained with an autoregressive objective: during training the model learns to predict the next nucleotide given all preceding nucleotides. Training used the OpenGenome dataset, roughly 2.7 million prokaryotic and phage genomes, so the model’s predictions are most reliable for bacterial, archaeal, and phage sequences and are not expected to transfer well to eukaryotic genomes. It uses the StripedHyena architecture, a sequence model that combines convolutional state-space layers with a smaller number of attention layers. This design lets it process long stretches of DNA (up to 131,072 nucleotides for the long-context checkpoint) without the memory cost a pure attention model would incur at that length. The autoregressive objective yields two capabilities directly. Sampling from the predicted next-nucleotide distributions produces new candidate sequences, and reading off the probabilities the model assigns to an existing sequence gives a likelihood score that reflects how closely the sequence matches the patterns seen during training. Alongside the base checkpoints, the authors released variants fine-tuned on CRISPR loci and on transposable elements for generating those sequence types. Evo1 is the first model in the Evo family; Evo2 extends the approach to eukaryotic genomes and longer context.
Learning Resources
- Learning from DNA: a grand challenge in biology (Hazy Research, Stanford) - an accessible introduction to Evo from the authors, covering the motivation for genomic language modeling and how the model is trained and used.
- Evo: DNA foundation modeling from molecular to genome scale (Arc Institute) - an overview of Evo’s capabilities, including genome-scale generation and the StripedHyena architecture.
Tools
Evo1 Sampling (evo1-sample)
Generates DNA sequences by autoregressive sampling. Given one or more prompt sequences, the model extends each prompt nucleotide by nucleotide, drawing each new nucleotide from the model’s predicted distribution under the configured temperature, top_k, and top_p settings, until max_new_tokens new nucleotides have been produced. Optionally returns a per-sequence likelihood score (log-likelihood, average log-likelihood, and perplexity) for the generated sequences.
API Reference
Input: CausalModelSampleInput
Config: Evo1SampleConfig
- model_name: evo-1-8k-* variants use an 8,192-token context, evo-1-131k-base extends to 131,072 tokens, and -crispr/-transposon are domain fine-tunes. Available options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon
- top_k: defaults to 4 (one per DNA base).
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
- prepend_prompt: when False (the default), only newly generated tokens are returned.
Applications
This tool produces candidate DNA sequences for downstream design and screening, including synthetic genes, regulatory regions, CRISPR systems (using the evo-1-8k-crispr checkpoint), and transposable elements (using the evo-1-8k-transposon checkpoint). The prompt sets the biological context for what follows, for example a start codon or a promoter region.
Usage Tips
- Match the checkpoint to the task. evo-1-8k-base (the default) is the general prokaryotic and phage DNA model, and evo-1-131k-base is its genome-scale, long-context counterpart. evo-1-8k-crispr and evo-1-8k-transposon are task-specific variants of evo-1-8k-base for generating CRISPR-Cas systems and IS200/IS605 transposons; use them when generating those systems and a base checkpoint otherwise.
- top_k defaults to 4, the size of the DNA alphabet. It exists mainly to keep generation on the four bases rather than other byte tokens, so it is not the diversity knob. Control diversity with temperature instead (lower values stay close to the training distribution, higher values explore further from it), and leave top_p at its default unless you specifically want nucleus sampling.
- Output excludes the prompt by default. prepend_prompt=False returns only the newly generated nucleotides, not the prompt joined to its continuation; set it to True if you need the full sequence back.
- Prompt length plus max_new_tokens must fit within the checkpoint’s context window (8,192 nucleotides for the 8k checkpoints; use evo-1-131k-base for longer sequences). The model cannot attend beyond that window, so a long prompt directly reduces how much can be generated.
- Generated sequences are candidates. Validate them with downstream tools (for example ORF detection, structure prediction, or homology search) before drawing biological conclusions.
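The sampling controls and the context-window rule above can be illustrated with a small, self-contained sketch of a single sampling step. It assumes the model's output at one position is available as a mapping from token to logit, and it applies temperature, then top-k, then top-p in that order; this is a common convention, not necessarily the toolkit's exact implementation. CONTEXT_WINDOW, sample_next, and fits_context are hypothetical helpers for illustration, not part of the toolkit's API; the window sizes come from the checkpoint descriptions above.

```python
from __future__ import annotations

import math
import random

# Hypothetical lookup: context windows (in nucleotides) from the checkpoint descriptions.
CONTEXT_WINDOW = {
    "evo-1.5-8k-base": 8_192,
    "evo-1-8k-base": 8_192,
    "evo-1-8k-crispr": 8_192,
    "evo-1-8k-transposon": 8_192,
    "evo-1-131k-base": 131_072,
}


def fits_context(prompt: str, max_new_tokens: int, model_name: str) -> bool:
    """True if the prompt plus the requested continuation fits the checkpoint's window."""
    return len(prompt) + max_new_tokens <= CONTEXT_WINDOW[model_name]


def sample_next(
    logits: dict[str, float],
    temperature: float = 1.0,
    top_k: int = 4,
    top_p: float = 1.0,
    rng: random.Random | None = None,
) -> str:
    """Draw one token: temperature scaling, then top-k, then top-p (nucleus) filtering."""
    if rng is None:
        rng = random.Random()
    # Temperature divides the logits: below 1 sharpens, above 1 flattens.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Top-k keeps the k highest-scoring tokens, sorted best first.
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors, subtracting the max logit for numerical stability.
    best = ranked[0][1]
    weights = [math.exp(value - best) for _, value in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept_tokens, kept_probs, mass = [], [], 0.0
    for (tok, _), p in zip(ranked, probs):
        kept_tokens.append(tok)
        kept_probs.append(p)
        mass += p
        if mass >= top_p:
            break
    # random.choices normalizes the weights, so the kept mass need not sum to 1.
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]
```

With toy logits such as {"A": 2.0, "C": 1.0, "G": 0.5, "T": 0.1, "N": -1.0}, the default top_k=4 drops the non-DNA token N entirely, while temperature only changes how evenly the remaining four bases are drawn. A full generation would repeat this step max_new_tokens times, appending each draw to the context.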
Evo1 Scoring (evo1-score)
Scores existing DNA sequences under the Evo1 model. For each sequence, it computes the model’s predicted probability of every nucleotide given the preceding nucleotides and aggregates these into a log-likelihood, an average log-likelihood per nucleotide, and a perplexity. Optionally returns the per-position logits and the token vocabulary.
API Reference
Input: CausalModelScoringInput
Config: Evo1ScoringConfig
- model_name options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: CausalModelScoringOutput
Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.
Metrics (per scores item):

| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
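Assuming the standard definitions with natural logarithms, the three metrics in the table relate as follows. Here p_θ(x_t | x_{<t}) is the probability the model assigns to nucleotide x_t given the preceding nucleotides, and the sum runs over the set S of scored positions, with N = |S|. Whether the toolkit scores the first position (which has no preceding context) is not stated here, so N should be read as the number of positions actually scored.

$$
\begin{aligned}
\mathrm{log\_likelihood} &= \sum_{t \in S} \log p_\theta(x_t \mid x_{<t}) \\
\mathrm{avg\_log\_likelihood} &= \frac{1}{N}\,\mathrm{log\_likelihood} \\
\mathrm{perplexity} &= \exp\left(-\,\mathrm{avg\_log\_likelihood}\right)
\end{aligned}
$$

Each log-probability is at most 0, which gives the ≤ 0.0 bounds in the table, and exponentiating a non-negative quantity gives perplexity ≥ 1.0. A perplexity of 4 matches the score of a model that assigns probability 1/4 to every base.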
Applications
This tool measures how well a DNA sequence matches the patterns the model learned from natural prokaryotic and phage genomes. Lower perplexity means the sequence is more consistent with that training distribution. Use it to rank or filter candidate sequences (including the output of evo1-sample), to compare variants of a sequence, or to flag sequences that fall far outside the model’s training domain.
Usage Tips
- Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_name values are hard to compare directly.
- return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by a 512-token vocabulary).
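The length-normalization advice can be checked with a short sketch that turns per-position logits into the three scalar metrics. It assumes that row i of the logits predicts nucleotide i + 1 (the usual next-token alignment), so the first nucleotide is not scored, and that the logits are unnormalized scores over the given vocabulary. score_sequence is a hypothetical helper; the toolkit's own alignment and handling of the first position may differ.

```python
from __future__ import annotations

import math


def log_softmax(row: list[float]) -> list[float]:
    """Convert one row of logits into natural-log probabilities."""
    top = max(row)
    log_norm = top + math.log(sum(math.exp(v - top) for v in row))
    return [v - log_norm for v in row]


def score_sequence(logits: list[list[float]], vocab: list[str], seq: str) -> dict[str, float]:
    """Aggregate next-token logits into log_likelihood, avg_log_likelihood, perplexity.

    Assumes logits[i] is the prediction for seq[i + 1], so seq[0] is not scored.
    """
    if len(seq) < 2:
        raise ValueError("need at least two nucleotides to score a transition")
    index = {tok: i for i, tok in enumerate(vocab)}
    # Log-probability of each observed nucleotide under the preceding position's logits.
    token_log_probs = [
        log_softmax(logits[pos - 1])[index[seq[pos]]] for pos in range(1, len(seq))
    ]
    log_likelihood = sum(token_log_probs)
    avg = log_likelihood / len(token_log_probs)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg,
        "perplexity": math.exp(-avg),
    }
```

With uniform logits over a four-letter vocabulary, a 4-nt and a 40-nt sequence both get perplexity 4.0, while the longer sequence's total log_likelihood is far more negative. That is why totals should not be compared across sequences of different lengths.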
Toolkit Notes
These apply to every Evo1 tool in this toolkit (evo1-sample, evo1-score).
- Requires a GPU. An NVIDIA GPU with at least 24 GB of memory is recommended; CPU execution is possible but very slow and not practical for typical use.
- batch_size trades memory for throughput across both tools. It sets how many prompts (evo1-sample) or sequences (evo1-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences, and reduce it back toward the default of 1 if generation or scoring runs out of GPU memory.
- Trained on prokaryotic and phage genomes. Predictions are most reliable within that domain. For eukaryotic genomes or longer context, use Evo2.
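The batch_size trade-off can be sketched as a simple back-off loop that halves the batch whenever a batch fails with an out-of-memory error. This is an illustrative pattern, not toolkit behavior: process is a hypothetical stand-in for whatever runs one forward pass over a batch, and MemoryError is a stand-in for the framework's out-of-memory exception (in recent PyTorch versions, torch.cuda.OutOfMemoryError).

```python
from __future__ import annotations

import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def forward_passes(n_items: int, batch_size: int = 1) -> int:
    """Number of forward passes needed to cover n_items at a given batch_size."""
    return math.ceil(n_items / batch_size)


def run_with_backoff(
    items: Sequence[T],
    process: Callable[[Sequence[T]], list[R]],
    batch_size: int = 8,
    oom_error: type[BaseException] = MemoryError,
) -> list[R]:
    """Run process over items in batches, halving batch_size after each OOM."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: list[R] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(process(batch))
        except oom_error:
            if batch_size == 1:
                raise  # a single item does not fit; nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry the same items with a smaller batch
        start += len(batch)
    return results
```

For example, with a process that only fits two sequences at a time, run_with_backoff(["A", "AC", "ACG", "ACGT", "ACGTA"], process, batch_size=8) falls back from 8 to 4 to 2 and still returns one result per input, at the cost of more forward passes (forward_passes(5, 2) is 3, versus 1 at batch_size=8).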
