Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
Evo2 (Brixi et al., 2026) is a DNA language model trained with an autoregressive objective: during training the model learns to predict the next nucleotide given all preceding nucleotides. Training used the OpenGenome2 dataset, which spans bacterial, archaeal, eukaryotic, and phage genomes across all domains of life, so the model is not restricted to any single clade. It is available at several scales, the largest being 40 billion parameters, and uses the StripedHyena 2 architecture, a sequence model that combines convolutional state-space layers with a smaller number of attention layers. This design lets the model process very long stretches of DNA, up to roughly one million nucleotides for the long-context checkpoints, without the memory cost a pure attention model would incur at that length. Several checkpoints are also offered with shorter context windows for lower memory use, and one variant is trained specifically on Microviridae phage genomes.

The autoregressive objective yields two capabilities directly. Sampling from the predicted next-nucleotide distributions produces new candidate sequences, and reading off the probabilities the model assigns to an existing sequence gives a likelihood score that reflects how closely the sequence matches the patterns seen during training. Evo2 is the second model in the Evo family; the earlier Evo1 was trained only on prokaryotic and phage genomes, whereas Evo2 extends to eukaryotic genomes and longer context.
Learning Resources
- The Illustrated Evo 2 (NVIDIA Research) - a visual walkthrough of the Evo 2 architecture and how the model processes and generates DNA.
- Evo 2 Mechanistic Interpretability (Arc Institute) - an interactive look at the internal features Evo 2 learns, built with sparse autoencoders to surface interpretable genomic patterns.
Tools
Evo2 Sampling (evo2-sample)
Generates DNA sequences by autoregressive sampling. Given one or more prompt sequences in Evo2’s prompt format, the model extends each prompt nucleotide by nucleotide, drawing each new nucleotide from the model’s predicted distribution under the configured temperature, top_k, and top_p settings, until max_new_tokens new nucleotides have been produced or an end-of-sequence token is sampled. A key-value cache makes long generations efficient and can be carried forward to continue a generation.
API Reference
Input: CausalModelSampleInput
Config: Evo2SampleConfig
- model_checkpoint: one of evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae.
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- […] BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
- prepend_prompt: when False, only newly generated tokens are returned.
Output: Evo2SampleOutput
Applications
This tool produces candidate DNA sequences for downstream design and screening, including genes, regulatory regions, and longer multi-gene segments. Because Evo2 is trained across all domains of life, it can be prompted with eukaryotic as well as prokaryotic and phage context, unlike the prokaryote-and-phage-only Evo1. The prompt sets the biological context for what follows.
Usage Tips
- Match the checkpoint to the task. evo2_7b (the default), evo2_20b, and evo2_40b are the 1M-context models in increasing size and capability. The evo2_7b_base, evo2_40b_base, and evo2_1b_base checkpoints are 8K-context counterparts (evo2_1b_base is the smallest); evo2_7b_262k is a 262K-context variant; evo2_7b_microviridae is a 7B model adapted on Microviridae genomes for generating that bacteriophage family.
- Prompts use Evo2’s prompt format. Prompt strings follow Evo2’s special tokenization (for example a leading +~ before DNA); see the upstream Evo2 documentation for the conventions.
- top_k defaults to 4, the size of the DNA alphabet. It exists mainly to keep generation on the four bases rather than other byte tokens, so it is not the diversity knob; control diversity with temperature (lower stays near the training distribution, higher explores it) and leave top_p at its default unless you specifically want nucleus sampling.
- Output includes the prompt by default. prepend_prompt=True (the default for this toolkit) returns the prompt joined to its continuation; set it to False to receive only the newly generated nucleotides.
- Prompt length plus max_new_tokens (default 32) must fit the checkpoint’s context window. The model cannot attend beyond that window, so a long prompt directly reduces how much can be generated; pick a longer-context checkpoint when the combined length is large.
- stop_at_eos ends generation early when the model emits an end-of-sequence token; set it to False to always produce the full max_new_tokens.
- Generated sequences are candidates. Validate them with downstream tools (for example ORF detection, structure prediction, or homology search) before drawing biological conclusions.
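The interaction of temperature, top_k, and top_p described in the tips above can be illustrated with a single toy sampling step. This is a minimal sketch in plain Python, not the toolkit's or Evo2's implementation: the helper names (filter_distribution, sample_next_base), the order in which the filters are applied, and the example logits are all assumptions made for illustration.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=4, top_p=1.0):
    """Return a renormalized {token: probability} dict after
    temperature scaling, top-k truncation, and top-p (nucleus) truncation.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    # top-k: keep only the k most probable tokens.
    ranked = ranked[:top_k]
    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        mass += prob
        if mass >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


def sample_next_base(logits, rng, **settings):
    """Draw one token from the filtered distribution."""
    dist = filter_distribution(logits, **settings)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Made-up logits: the four bases plus one unlikely non-DNA byte token.
toy_logits = {"A": 2.0, "C": 1.0, "G": 0.5, "T": 0.1, "#": -3.0}

default_dist = filter_distribution(toy_logits, top_k=4)
sharp_dist = filter_distribution(toy_logits, temperature=0.5, top_k=4)
nucleus_dist = filter_distribution(toy_logits, top_k=4, top_p=0.5)
next_base = sample_next_base(toy_logits, random.Random(0), top_k=4)
```

With these toy values, top_k=4 removes the non-DNA token, lowering the temperature shifts more mass onto the most likely base, and a small top_p shrinks the candidate set further. That matches the advice above to treat temperature, not top_k, as the diversity control.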
Evo2 Scoring (evo2-score)
Scores existing DNA sequences under the Evo2 model. For each sequence, it computes the model’s predicted probability of every nucleotide given the preceding nucleotides and aggregates these into a log-likelihood, an average log-likelihood per nucleotide, and a perplexity. Optionally returns the per-position logits and the token vocabulary.
API Reference
Input: CausalModelScoringInput
Config: Evo2ScoringConfig
- model_checkpoint: one of evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae.
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- […] BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: CausalModelScoringOutput
Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.

Metrics (per scores item):

| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
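The three scalar metrics are linked by simple arithmetic. The sketch below assumes the conventional definitions: natural-log probabilities, the average taken over scored positions, and perplexity as the exponential of the negative average log-likelihood. The source states only the ranges, which are consistent with these definitions, so the exact formulas are an assumption. summarize_log_probs is a hypothetical helper, not a toolkit function.

```python
import math


def summarize_log_probs(token_log_probs):
    """Aggregate per-position log-probabilities into the three scalar metrics.

    Assumes natural logarithms and the conventional perplexity definition.
    """
    if not token_log_probs:
        raise ValueError("need at least one scored position")
    log_likelihood = sum(token_log_probs)
    avg_log_likelihood = log_likelihood / len(token_log_probs)
    perplexity = math.exp(-avg_log_likelihood)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": perplexity,
    }


# A model that is uniform over the four bases assigns log(0.25) everywhere,
# which corresponds to a perplexity of about 4 regardless of length.
uniform_short = summarize_log_probs([math.log(0.25)] * 8)
uniform_long = summarize_log_probs([math.log(0.25)] * 800)
```

Under these definitions the total log-likelihood grows in magnitude with sequence length, while the average and the perplexity do not.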
Applications
This tool measures how well a DNA sequence matches the patterns the model learned during training across all domains of life. Lower perplexity means the sequence is more consistent with that distribution. Use it to rank or filter candidate sequences (including the output of evo2-sample), to compare variants of a sequence, or to assess sequences from organisms outside the prokaryotic and phage range that Evo1 covers.
Usage Tips
- Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_checkpoint values are hard to compare directly; a lower perplexity means the sequence is more consistent with that checkpoint’s training distribution.
- return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by a 512-token vocabulary).
- prepend_bos adds a beginning-of-sequence token before scoring; leave it False unless matching a specific upstream convention.
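Made-up numbers show why the length-normalized metrics are the safer ranking key. In this sketch the records are hypothetical dictionaries that only mimic the metric names used by this tool; they are not real Evo2 outputs. make_record and rank_by_perplexity are illustrative helpers, and perplexity is assumed to be the exponential of the negative average log-likelihood.

```python
import math


def make_record(name, length, avg_log_likelihood):
    """Build a fake score record from a length and a per-nucleotide average."""
    return {
        "name": name,
        "length": length,
        "log_likelihood": avg_log_likelihood * length,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),
    }


def rank_by_perplexity(records):
    """Best (lowest perplexity) first."""
    return sorted(records, key=lambda record: record["perplexity"])


candidates = [
    make_record("short_poor_fit", length=100, avg_log_likelihood=-1.30),
    make_record("long_good_fit", length=1000, avg_log_likelihood=-1.00),
]

# Ranking by total log-likelihood favors the short sequence simply because
# it has fewer positions contributing negative terms.
by_total = sorted(candidates, key=lambda r: r["log_likelihood"], reverse=True)

# Ranking by perplexity compares per-nucleotide fit instead.
by_perplexity = rank_by_perplexity(candidates)
```

Here the short sequence wins on total log-likelihood even though each of its nucleotides is, on average, less well predicted; the perplexity ranking reverses the order.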
Toolkit Notes
These apply to every Evo2 tool in this toolkit (evo2-sample, evo2-score).
- Requires a high-memory GPU; memory scales with model size and context length. The 7B checkpoint needs a high-memory NVIDIA GPU; the 20B and 40B models and the 1M-context checkpoints need substantially more. CPU execution is not practical.
- batch_size trades memory for throughput across both tools. It sets how many prompts (evo2-sample) or sequences (evo2-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
- Trained across all domains of life. Evo2 covers prokaryotic, eukaryotic, archaeal, and phage genomes. For prokaryote-and-phage-only generation with a smaller model, Evo1 is an alternative.
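The effect of batch_size can be pictured as simple chunking of the input list. This sketch assumes inputs are split into consecutive groups of at most batch_size items; the source does not describe how the toolkit actually schedules work, and iter_batches is a hypothetical helper rather than part of its API.

```python
def iter_batches(items, batch_size=1):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sequences = ["ACGT", "GGCC", "TTAA", "ACGTACGT", "CG"]

# With the default of 1, every sequence gets its own forward pass.
one_at_a_time = list(iter_batches(sequences))

# A larger batch_size means fewer passes, each holding more data in memory.
in_pairs = list(iter_batches(sequences, batch_size=2))
```

Fewer, larger batches reduce the number of forward passes but raise peak memory per pass, which is the trade-off the note above describes.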

Arc Institute