License: Evo2 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


arcinstitute/evo2
Genome modeling and design across all domains of life
3.8k stars
Genome modelling and design across all domains of life with Evo 2
Garyk Brixi, Matthew G Durrant, … Brian L Hie
Nature (2026)
Genome modeling and design across all domains of life with Evo 2
G. Brixi, Matthew G. Durrant, … Brian L. Hie
bioRxiv (2025)
@ARTICLE{Brixi2026-jn,
  title     = "Genome modelling and design across all domains of life with Evo 2",
  author    = "Brixi, Garyk and Durrant, Matthew G and Ku, Jerome and
               Naghipourfar, Mohsen and Poli, Michael and Sun, Gwanggyu and
               Brockman, Greg and Chang, Daniel and Fanton, Alison and Gonzalez,
               Gabriel A and King, Samuel H and Li, David B and Merchant, Aditi
               T and Nguyen, Eric and Ricci-Tam, Chiara and Romero, David W and
               Schmok, Jonathan C and Taghibakhshi, Ali and Vorontsov, Anton and
               Yang, Brandon and Deng, Myra and Gorton, Liv and Nguyen, Nam and
               Wang, Nicholas K and Pearce, Michael T and Simon, Elana and
               Adams, Etowah and Amador, Zachary J and Ashley, Euan A and
               Baccus, Stephen A and Dai, Haoyu and Dillmann, Steven and Ermon,
               Stefano and Guo, Daniel and Herschl, Michael H and Ilango, Rajesh
               and Janik, Ken and Lu, Amy X and Mehta, Reshma and Mofrad,
               Mohammad R K and Ng, Madelena Y and Pannu, Jaspreet and R{\'e},
               Christopher and St John, John and Sullivan, Jeremy and Tey,
               Joseph and Viggiano, Ben and Zhu, Kevin and Zynda, Greg and
               Balsam, Daniel and Collison, Patrick and Costa, Anthony B and
               Hernandez-Boussard, Tina and Ho, Eric and Liu, Ming-Yu and
               McGrath, Thomas and Powell, Kimberly and Pinglay, Sudarshan and
               Burke, Dave P and Goodarzi, Hani and Hsu, Patrick D and Hie,
               Brian L",
  journal   = "Nature",
  publisher = "Springer Science and Business Media LLC",
  pages     = "1--13",
  doi       = "10.1038/s41586-026-10176-5",
  month     =  mar,
  year      =  2026,
  language  = "en"
}
proto-bio/proto-tools/proto_tools/tools/causal_models/evo2
Function            Description
run_evo2_sample()   Sample DNA sequences using Evo2 language model (GPU)
run_evo2_score()    Score DNA sequences using Evo2 language model (GPU)

Background

Evo2 (Brixi et al., 2026) is a DNA language model trained with an autoregressive objective: during training the model learns to predict the next nucleotide given all preceding nucleotides. Training used the OpenGenome2 dataset, which spans bacterial, archaeal, eukaryotic, and phage genomes across all domains of life, so the model is not restricted to any single clade.

It is available at several scales, the largest being 40 billion parameters, and uses the StripedHyena 2 architecture, a sequence model that combines convolutional state-space layers with a smaller number of attention layers. This design lets the model process very long stretches of DNA, up to roughly one million nucleotides for the long-context checkpoints, without the memory cost a pure attention model would incur at that length. Several checkpoints are also offered with shorter context windows for lower memory use, and one variant is trained specifically on Microviridae phage genomes.

The autoregressive objective yields two capabilities directly. Sampling from the predicted next-nucleotide distributions produces new candidate sequences, and reading off the probabilities the model assigns to an existing sequence gives a likelihood score that reflects how closely the sequence matches the patterns seen during training.

Evo2 is the second model in the Evo family; the earlier Evo1 was trained only on prokaryotic and phage genomes, whereas Evo2 extends to eukaryotic genomes and longer context.
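The link between the autoregressive objective and the two capabilities can be illustrated with a toy stand-in for the model. The sketch below is not Evo2: TOY_MODEL is an invented lookup table that conditions only on the previous base, and the function names are hypothetical. It only shows that the same next-nucleotide distribution supports both sampling and scoring.

```python
import math
import random

BASES = "ACGT"

# Toy stand-in for a trained model: P(next base | previous base), in A, C, G, T order.
# These numbers are invented for illustration and are not Evo2 outputs.
TOY_MODEL = {
    "A": [0.4, 0.1, 0.4, 0.1],
    "C": [0.1, 0.4, 0.1, 0.4],
    "G": [0.25, 0.25, 0.25, 0.25],
    "T": [0.1, 0.2, 0.3, 0.4],
}

def next_base_probs(prefix):
    """Return P(next base | prefix); this toy model only looks at the last base."""
    return TOY_MODEL[prefix[-1]]

def sample_continuation(prompt, n_new, rng):
    """Capability 1: extend the prompt by drawing from the predicted distribution."""
    seq = prompt
    for _ in range(n_new):
        seq += rng.choices(BASES, weights=next_base_probs(seq), k=1)[0]
    return seq

def log_likelihood(seq):
    """Capability 2: sum log P(base_i | preceding bases) over an existing sequence.

    The first base has no preceding context in this toy setup, so it is not scored.
    """
    return sum(
        math.log(next_base_probs(seq[:i])[BASES.index(seq[i])])
        for i in range(1, len(seq))
    )

generated = sample_continuation("A", n_new=8, rng=random.Random(0))
print(generated, log_likelihood(generated))
```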

Learning Resources

  • The Illustrated Evo 2 (NVIDIA Research) - a visual walkthrough of the Evo 2 architecture and how the model processes and generates DNA.
  • Evo 2 Mechanistic Interpretability (Arc Institute) - an interactive look at the internal features Evo 2 learns, built with sparse autoencoders to surface interpretable genomic patterns.

Tools

Evo2 Sampling (evo2-sample)

Generates DNA sequences by autoregressive sampling. Given one or more prompt sequences in Evo2’s prompt format, the model extends each prompt nucleotide by nucleotide, drawing each new nucleotide from the model’s predicted distribution under the configured temperature, top_k, and top_p settings, until max_new_tokens new nucleotides have been produced or an end-of-sequence token is sampled. A key-value cache makes long generations efficient and can be carried forward to continue a generation.
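A single sampling step under temperature, top_k, and top_p can be sketched in plain Python as follows. This is an illustration of the general technique, not the toolkit's implementation: the order in which the filters are applied, the six-token toy vocabulary, and the helper names filtered_probs and sample_token are all assumptions.

```python
import math
import random

# Toy vocabulary: the four bases plus two non-base tokens (invented for illustration).
VOCAB = ["A", "C", "G", "T", "N", "<eos>"]

def filtered_probs(logits, temperature=1.0, top_k=4, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k, and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # top-k: keep only the k most probable token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # top-p (nucleus): keep the smallest leading set whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the surviving tokens.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

def sample_token(logits, rng, **settings):
    """Draw one token id from the filtered distribution."""
    dist = filtered_probs(logits, **settings)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

step_logits = [2.0, 1.0, 0.5, 0.0, -3.0, -4.0]
default = filtered_probs(step_logits)                # top_k=4 drops the two non-base tokens
cold = filtered_probs(step_logits, temperature=0.5)  # lower temperature sharpens the distribution
token_id = sample_token(step_logits, random.Random(0), temperature=0.8)
print(VOCAB[token_id], default, cold)
```

In this toy setup the default top_k of 4 removes the two non-base tokens, and lowering the temperature shifts more probability onto the most likely base.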

API Reference

Inputs

  • prompts (List[string], required): Prompt sequences to condition generation on. Can be provided as a single string or a list of strings.

Configuration

  • model_checkpoint (enum, default: "evo2_7b"): Evo2 weights variant. Available options: evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae.
  • local_path (string): Override HuggingFace download with a local weights directory.
  • top_k (integer, default: 4): Limit sampling to the top-k most probable tokens at each step.
  • max_new_tokens (integer, default: 32): Maximum number of new tokens to generate per prompt (excludes prompt).
  • cached_generation (boolean, default: True): Use the model’s per-call KV cache during generation.
  • force_prompt_threshold (integer): Tokens to prefill in parallel before switching to autoregressive prompt forcing.
  • max_seqlen (integer): Maximum sequence length the KV cache will be sized for.
  • skip_special_tokens (boolean, default: False): Filter EOS/PAD bytes from the detokenized output.
  • stop_at_eos (boolean, default: True): Stop generation when an EOS (id=0) token is sampled.
  • old_kv_cache (Evo2KVCacheRef): Worker-local KV cache handle returned by a previous persistent-worker generation call.
  • return_kv_cache (boolean, default: False): Return worker-local KV cache handles for continued generation.
  • return_logits (boolean, default: False): Include per-position logits in the output.
  • verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cuda"): Device to run the model on.
  • timeout (integer, default: 1800): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
  • prepend_prompt (boolean, default: True): Include the input prompt at the start of each generated sequence; when False, only newly generated tokens are returned.
  • temperature (number, default: 1.0): Sampling temperature controlling randomness.
  • top_p (number, default: 1.0): Nucleus sampling threshold over per-position token probabilities.
  • batch_size (integer, default: 1): Number of sequences to process simultaneously.

Outputs

  • logits (array): Per-position logits for each generated sequence (shape: [num_sequences, num_generated_tokens, vocab_size]).
  • kv_caches (array): Worker-local cache handles for continued generation inside the same persistent worker.
  • sequences (List[string], required): Generated DNA sequences.

Applications

This tool produces candidate DNA sequences for downstream design and screening, including genes, regulatory regions, and longer multi-gene segments. Because Evo2 is trained across all domains of life, it can be prompted with eukaryotic as well as prokaryotic and phage context, unlike the prokaryote-and-phage-only Evo1. The prompt sets the biological context for what follows.

Usage Tips

  • Match the checkpoint to the task. evo2_7b (the default), evo2_20b, and evo2_40b are the 1M-context models in increasing size and capability. The evo2_7b_base, evo2_40b_base, and evo2_1b_base checkpoints are 8K-context counterparts (evo2_1b_base is the smallest); evo2_7b_262k is a 262K-context variant; evo2_7b_microviridae is a 7B model trained specifically on Microviridae genomes, for generating sequences from that bacteriophage family.
  • Prompts use Evo2’s prompt format. Prompt strings follow Evo2’s special tokenization (for example a leading +~ before DNA); see the upstream Evo2 documentation for the conventions.
  • top_k defaults to 4, the size of the DNA alphabet. It exists mainly to keep generation on the four bases rather than other byte tokens, so it is not the diversity knob. Control diversity with temperature (lower values concentrate sampling on the most probable nucleotides, higher values spread it across more alternatives) and leave top_p at its default of 1.0 unless you specifically want nucleus sampling.
  • Output includes the prompt by default. prepend_prompt=True (the default for this toolkit) returns the prompt joined to its continuation; set it to False to receive only the newly generated nucleotides.
  • Prompt length plus max_new_tokens (default 32) must fit the checkpoint’s context window. The model cannot attend beyond that window, so a long prompt directly reduces how much can be generated; pick a longer-context checkpoint when the combined length is large.
  • stop_at_eos ends generation early when the model emits an end-of-sequence token; set it to False to always produce the full max_new_tokens.
  • Generated sequences are candidates. Validate them with downstream tools (for example ORF detection, structure prediction, or homology search) before drawing biological conclusions.
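The context-window tip above amounts to a simple budget check, sketched below. The exact window sizes are assumptions: the text gives them only as 1M, 262K, and 8K, and no window is stated for evo2_7b_microviridae, so it is left out. The helper name generation_budget is hypothetical, and the check assumes one token per prompt character.

```python
# Nominal context windows in tokens. The text gives these only as "1M", "262K",
# and "8K"; the exact values below are assumptions for illustration.
CONTEXT_WINDOW = {
    "evo2_7b": 1_048_576,
    "evo2_20b": 1_048_576,
    "evo2_40b": 1_048_576,
    "evo2_7b_262k": 262_144,
    "evo2_7b_base": 8_192,
    "evo2_40b_base": 8_192,
    "evo2_1b_base": 8_192,
}

def generation_budget(prompt, model_checkpoint="evo2_7b", max_new_tokens=32):
    """Check whether prompt + max_new_tokens fits, assuming one token per character."""
    window = CONTEXT_WINDOW[model_checkpoint]
    prompt_tokens = len(prompt)
    return {
        "fits": prompt_tokens + max_new_tokens <= window,
        "room_for_new_tokens": max(window - prompt_tokens, 0),
    }

long_prompt = "ACGT" * 2_000  # 8,000 nucleotides
print(generation_budget(long_prompt, "evo2_1b_base", max_new_tokens=500))
print(generation_budget(long_prompt, "evo2_7b_262k", max_new_tokens=500))
```

Under these assumed sizes, the 8,000-nucleotide prompt leaves room for only 192 new tokens on an 8K checkpoint, so the 500-token request fits only on a longer-context checkpoint.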

Evo2 Scoring (evo2-score)

Scores existing DNA sequences under the Evo2 model. For each sequence, it computes the model’s predicted probability of every nucleotide given the preceding nucleotides and aggregates these into a log-likelihood, an average log-likelihood per nucleotide, and a perplexity. Optionally returns the per-position logits and the token vocabulary.
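The relationship between the three aggregate metrics can be sketched as follows. The sketch assumes natural logarithms and the standard definition of perplexity as the exponential of the negative average log-likelihood; whether the tool uses exactly these conventions is not confirmed here, and scoring_metrics is a hypothetical helper rather than a toolkit function.

```python
import math

def scoring_metrics(per_position_log_probs):
    """Aggregate log P(nucleotide_i | preceding nucleotides) into summary metrics."""
    n = len(per_position_log_probs)
    log_likelihood = sum(per_position_log_probs)
    avg_log_likelihood = log_likelihood / n
    return {
        "log_likelihood": log_likelihood,          # sum over positions, always <= 0
        "avg_log_likelihood": avg_log_likelihood,  # per-nucleotide average, always <= 0
        "perplexity": math.exp(-avg_log_likelihood),  # always >= 1
    }

# A model that is maximally unsure among the four bases assigns 0.25 everywhere.
uniform = scoring_metrics([math.log(0.25)] * 10)
# A more confident model assigns higher probability to the observed bases.
confident = scoring_metrics([math.log(0.7)] * 10)
print(uniform["perplexity"], confident["perplexity"])
```

Under these definitions, a perplexity of 4 corresponds to guessing uniformly among the four bases, which gives a useful reference point when reading scores.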

API Reference

Inputs

  • sequences (List[string], required): Sequences to score. Can be provided as a single string or a list of strings.

Configuration

  • model_checkpoint (enum, default: "evo2_7b"): Evo2 weights variant. Available options: evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae.
  • local_path (string): Override HuggingFace download with a local weights directory.
  • prepend_bos (boolean, default: False): Prepend a beginning-of-sequence token before scoring.
  • verbose (integer, default: 0): Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
  • device (string, default: "cuda"): Device to run the model on.
  • timeout (integer, default: 1800): Maximum execution time in seconds. None waits indefinitely.
  • seed (integer): Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
  • batch_size (integer, default: 1): Number of sequences to process simultaneously on GPU. Larger batches improve throughput but use more GPU memory; reduce if encountering out-of-memory errors.
  • return_logits (boolean, default: False): Include per-position logits in the output.

Outputs

  • scores (List[CausalModelScoringMetrics], required): List of scoring outputs, one per input sequence. Each entry is a Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.
Metrics (one set per scores item)

Metric               Type    Range    Availability
log_likelihood       float   ≤ 0.0    always
avg_log_likelihood   float   ≤ 0.0    always
perplexity           float   ≥ 1.0    always

Applications

This tool measures how well a DNA sequence matches the patterns the model learned during training across all domains of life. Lower perplexity means the sequence is more consistent with that distribution. Use it to rank or filter candidate sequences (including the output of evo2-sample), to compare variants of a sequence, or to assess sequences from organisms outside the prokaryotic and phage range that Evo1 covers.
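Ranking and filtering candidates by score might look like the sketch below. The candidate names, lengths, log-likelihoods, and the 3.8 cutoff are invented for illustration and are not real Evo2 scores. Perplexity is recomputed from the total log-likelihood using the standard definition, which is assumed to match the tool's.

```python
import math

# Hypothetical results: (name, sequence length, total log_likelihood).
# Values are invented for illustration and are not real Evo2 scores.
results = [
    ("short_candidate", 100, -130.0),
    ("long_candidate", 1000, -1100.0),
    ("noisy_candidate", 100, -139.0),
]

def perplexity(length, log_likelihood):
    """Length-normalized score: exp of the negative average log-likelihood."""
    return math.exp(-log_likelihood / length)

# Ranking by total log-likelihood favors short sequences regardless of quality.
by_total = [name for name, _, ll in sorted(results, key=lambda r: r[2], reverse=True)]

# Ranking by perplexity (lower is better) compares sequences per nucleotide.
ranked = sorted(results, key=lambda r: perplexity(r[1], r[2]))
by_perplexity = [name for name, _, _ in ranked]

# Filter with an arbitrary, illustrative cutoff.
kept = [name for name, n, ll in ranked if perplexity(n, ll) < 3.8]
print(by_total, by_perplexity, kept)
```

Here the long candidate ranks last by total log-likelihood but first by perplexity, which is why a length-normalized metric is the safer choice for ranking.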

Usage Tips

  • Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so avoid comparing scores across model_checkpoint values; a lower perplexity only means the sequence is more consistent with that particular checkpoint’s training distribution.
  • return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by a 512-token vocabulary).
  • prepend_bos adds a beginning-of-sequence token before scoring; leave it False unless matching a specific upstream convention.
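To get a feel for how large the logits output can become, a back-of-the-envelope estimate follows. It uses the 512-token vocabulary mentioned above and assumes 4 bytes per value (float32); the actual dtype returned by the tool is not stated, so treat the numbers as rough. logits_bytes is a hypothetical helper.

```python
VOCAB_SIZE = 512  # vocabulary size stated in the tip above

def logits_bytes(seq_len, num_sequences=1, bytes_per_value=4):
    """Size in bytes of a [num_sequences, seq_len, VOCAB_SIZE] logits array.

    bytes_per_value=4 assumes float32, which is an assumption about the output dtype.
    """
    return num_sequences * seq_len * VOCAB_SIZE * bytes_per_value

for n in (8_192, 262_144, 1_000_000):
    print(f"{n:>9} nt -> {logits_bytes(n) / 2**20:,.0f} MiB")
```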

Toolkit Notes

These apply to every Evo2 tool in this toolkit (evo2-sample, evo2-score).
  • Requires a high-memory NVIDIA GPU. Memory use scales with model size and context length: the 7B checkpoint already needs a high-memory GPU, and the 20B and 40B models and the 1M-context checkpoints need substantially more. CPU execution is not practical.
  • batch_size trades memory for throughput across both tools. It sets how many prompts (evo2-sample) or sequences (evo2-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
  • Trained across all domains of life. Evo2 covers bacterial, archaeal, eukaryotic, and phage genomes. For prokaryote-and-phage-only generation with a smaller model, Evo1 is an alternative.
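The batch_size note above describes grouping inputs so that several are processed together. A minimal sketch of that grouping follows; it is an illustration only, and batched is a hypothetical helper rather than how the toolkit schedules work internally.

```python
import math

def batched(items, batch_size=1):
    """Yield consecutive groups of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sequences = ["ACGT", "GGCC", "TTAA", "ACGTACGT", "CCGG"]
groups = list(batched(sequences, batch_size=2))

# Larger batches mean fewer groups to process, at the cost of more memory per group.
print(len(groups), groups)
```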
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.