Evo2 - Proto

License: Evo2 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.

arcinstitute/evo2

Genome modeling and design across all domains of life

3.8k stars

Genome modelling and design across all domains of life with Evo 2

Garyk Brixi, Matthew G Durrant, … Brian L Hie

Nature (2026)

Genome modeling and design across all domains of life with Evo 2

Garyk Brixi, Matthew G. Durrant, … Brian L. Hie

bioRxiv (2025)

@ARTICLE{Brixi2026-jn,
  title     = "Genome modelling and design across all domains of life with Evo 2",
  author    = "Brixi, Garyk and Durrant, Matthew G and Ku, Jerome and
               Naghipourfar, Mohsen and Poli, Michael and Sun, Gwanggyu and
               Brockman, Greg and Chang, Daniel and Fanton, Alison and Gonzalez,
               Gabriel A and King, Samuel H and Li, David B and Merchant, Aditi
               T and Nguyen, Eric and Ricci-Tam, Chiara and Romero, David W and
               Schmok, Jonathan C and Taghibakhshi, Ali and Vorontsov, Anton and
               Yang, Brandon and Deng, Myra and Gorton, Liv and Nguyen, Nam and
               Wang, Nicholas K and Pearce, Michael T and Simon, Elana and
               Adams, Etowah and Amador, Zachary J and Ashley, Euan A and
               Baccus, Stephen A and Dai, Haoyu and Dillmann, Steven and Ermon,
               Stefano and Guo, Daniel and Herschl, Michael H and Ilango, Rajesh
               and Janik, Ken and Lu, Amy X and Mehta, Reshma and Mofrad,
               Mohammad R K and Ng, Madelena Y and Pannu, Jaspreet and R{\'e},
               Christopher and St John, John and Sullivan, Jeremy and Tey,
               Joseph and Viggiano, Ben and Zhu, Kevin and Zynda, Greg and
               Balsam, Daniel and Collison, Patrick and Costa, Anthony B and
               Hernandez-Boussard, Tina and Ho, Eric and Liu, Ming-Yu and
               McGrath, Thomas and Powell, Kimberly and Pinglay, Sudarshan and
               Burke, Dave P and Goodarzi, Hani and Hsu, Patrick D and Hie,
               Brian L",
  journal   = "Nature",
  publisher = "Springer Science and Business Media LLC",
  pages     = "1--13",
  doi       = "10.1038/s41586-026-10176-5",
  month     =  mar,
  year      =  2026,
  language  = "en"
}

proto-bio/proto-tools/proto_tools/tools/causal_models/evo2

Function	Description
`run_evo2_sample()`	Sample DNA sequences using the Evo2 language model (GPU)
`run_evo2_score()`	Score DNA sequences using the Evo2 language model (GPU)

Background

Evo2 (Brixi et al., 2026) is a DNA language model trained with an autoregressive objective: during training the model learns to predict the next nucleotide given all preceding nucleotides. Training used the OpenGenome2 dataset, which spans bacterial, archaeal, eukaryotic, and phage genomes across all domains of life, so the model is not restricted to any single clade.

Evo2 is available at several scales, the largest being 40 billion parameters, and uses the StripedHyena 2 architecture, a sequence model that combines convolutional state-space layers with a smaller number of attention layers. This design lets the model process very long stretches of DNA, up to roughly one million nucleotides for the long-context checkpoints, without the memory cost a pure attention model would incur at that length. Several checkpoints are also offered with shorter context windows for lower memory use, and one variant is trained specifically on Microviridae phage genomes.

The autoregressive objective yields two capabilities directly. Sampling from the predicted next-nucleotide distributions produces new candidate sequences, and reading off the probabilities the model assigns to an existing sequence gives a likelihood score that reflects how closely the sequence matches the patterns seen during training. Evo2 is the second model in the Evo family; the earlier Evo1 was trained only on prokaryotic and phage genomes, whereas Evo2 extends to eukaryotic genomes and longer context.

Learning Resources

The Illustrated Evo 2 (NVIDIA Research) - a visual walkthrough of the Evo 2 architecture and how the model processes and generates DNA.
Evo 2 Mechanistic Interpretability (Arc Institute) - an interactive look at the internal features Evo 2 learns, built with sparse autoencoders to surface interpretable genomic patterns.

Tools

Evo2 Sampling (`evo2-sample`)

Generates DNA sequences by autoregressive sampling. Given one or more prompt sequences in Evo2’s prompt format, the model extends each prompt nucleotide by nucleotide, drawing each new nucleotide from the model’s predicted distribution under the configured temperature, top_k, and top_p settings, until max_new_tokens new nucleotides have been produced or an end-of-sequence token is sampled. A key-value cache makes long generations efficient and can be carried forward to continue a generation.
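
The sampling loop described above can be illustrated with a small, self-contained sketch. It is not the Evo2 or Proto implementation: `sample_next`, `generate`, and `toy_logits` are hypothetical names, the "model" is a toy function over five token ids, and the order of filtering (temperature, then top_k, then top_p) is an assumption based on common practice.

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=4, top_p=1.0, rng=random):
    """Draw one token id from raw logits after temperature, top-k, and top-p filtering."""
    # Temperature divides the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    # top_k keeps only the k highest-scoring token ids.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax weights over the survivors (subtracting the max for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    # top_p keeps the smallest high-probability prefix whose mass reaches top_p.
    kept_ids, kept_weights, mass = [], [], 0.0
    for i, w in zip(order, weights):
        kept_ids.append(i)
        kept_weights.append(w)
        mass += w / total
        if mass >= top_p:
            break
    return rng.choices(kept_ids, weights=kept_weights, k=1)[0]


def generate(prompt_ids, next_logits, max_new_tokens=32, eos_id=0, stop_at_eos=True, **sampling):
    """Extend prompt_ids one token at a time until the budget or EOS is reached."""
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_next(next_logits(tokens), **sampling)
        if stop_at_eos and token == eos_id:
            break
        tokens.append(token)
    return tokens


# Toy stand-in for the model (an assumption for illustration only):
# ids 1-4 play the role of A, C, G, T and id 0 is EOS.
def toy_logits(tokens):
    return [5.0 if len(tokens) >= 6 else -5.0, 1.0, 0.5, 0.5, 1.0]


print(generate([1, 2], toy_logits, max_new_tokens=32, rng=random.Random(0)))
```

With `top_k=4`, the low-scoring EOS id cannot be drawn while four higher-scoring ids exist, which mirrors the idea of restricting generation to the four bases. Once EOS becomes one of the top-scoring ids it can be sampled, and generation stops if `stop_at_eos` is true.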

API Reference

Input: CausalModelSampleInput

prompts

List[string]

required

Prompt sequences to condition generation on. Can be provided as a single string or a list of strings.

Config: Evo2SampleConfig

model_checkpoint

enum

default:"evo2_7b"

Evo2 weights variant. Available options: evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae

local_path

string

Override HuggingFace download with a local weights directory.

top_k

integer

default:"4"

Limit sampling to the top-k most probable tokens at each step.

max_new_tokens

integer

default:"32"

Maximum number of new tokens to generate per prompt (excludes prompt).

cached_generation

boolean

default:"True"

Use the model’s per-call KV cache during generation.

force_prompt_threshold

integer

Tokens to prefill in parallel before switching to autoregressive prompt forcing.

max_seqlen

integer

Maximum sequence length the KV cache will be sized for.

skip_special_tokens

boolean

default:"False"

Filter EOS/PAD bytes from the detokenized output.

stop_at_eos

boolean

default:"True"

Stop generation when an EOS (id=0) token is sampled.

old_kv_cache

Evo2KVCacheRef

Worker-local KV cache handle returned by a previous persistent-worker generation call.

return_kv_cache

boolean

default:"False"

Return worker-local KV cache handles for continued generation.

return_logits

boolean

default:"False"

Include per-position logits in the output.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"1800"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

prepend_prompt

boolean

default:"True"

Include the input prompt at the start of each generated sequence; when False, only newly generated tokens are returned.

temperature

number

default:"1.0"

Sampling temperature controlling randomness.

top_p

number

default:"1.0"

Nucleus sampling threshold over per-position token probabilities.

batch_size

integer

default:"1"

Number of sequences to process simultaneously.

Output: Evo2SampleOutput

logits

array

Per-position logits for each generated sequence (shape: [num_sequences, num_generated_tokens, vocab_size]).

kv_caches

array

Worker-local cache handles for continued generation inside the same persistent worker.

sequences

List[string]

required

Generated DNA sequences.

Applications

This tool produces candidate DNA sequences for downstream design and screening, including genes, regulatory regions, and longer multi-gene segments. Because Evo2 is trained across all domains of life, it can be prompted with eukaryotic as well as prokaryotic and phage context, unlike the prokaryote-and-phage-only Evo1. The prompt sets the biological context for what follows.

Usage Tips

Match the checkpoint to the task. evo2_7b (the default), evo2_20b, and evo2_40b are the 1M-context models in increasing size and capability. The evo2_7b_base, evo2_40b_base, and evo2_1b_base checkpoints are 8K-context counterparts (evo2_1b_base is the smallest); evo2_7b_262k is a 262K-context variant; evo2_7b_microviridae is a 7B model adapted on Microviridae genomes for generating that bacteriophage family.
Prompts use Evo2’s prompt format. Prompt strings follow Evo2’s special tokenization (for example a leading +~ before DNA); see the upstream Evo2 documentation for the conventions.
top_k defaults to 4, the size of the DNA alphabet. It exists mainly to keep generation on the four bases rather than other byte tokens, so it is not the diversity knob. Control diversity with temperature instead (lower values concentrate sampling on the most probable nucleotides, higher values spread it across more of the distribution), and leave top_p at its default of 1.0 unless you specifically want nucleus sampling.
Output includes the prompt by default. prepend_prompt=True (the default for this toolkit) returns the prompt joined to its continuation; set it to False to receive only the newly generated nucleotides.
Prompt length plus max_new_tokens (default 32) must fit the checkpoint’s context window. The model cannot attend beyond that window, so a long prompt directly reduces how much can be generated; pick a longer-context checkpoint when the combined length is large.
stop_at_eos ends generation early when the model emits an end-of-sequence token; set it to False to always produce the full max_new_tokens.
Generated sequences are candidates. Validate them with downstream tools (for example ORF detection, structure prediction, or homology search) before drawing biological conclusions.
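
The context-window tip can be checked before launching a job. The sketch below is a hypothetical helper, not part of the toolkit. The token limits are assumptions read off the "1M", "8K", and "262K" labels in the tips; the exact limits may differ. evo2_7b_microviridae is left out because its context length is not stated here. The helper also assumes one token per nucleotide, so special prompt tokens are not accounted for.

```python
# Assumed context sizes (tokens), inferred from the "1M", "8K", and "262K" labels.
ASSUMED_CONTEXT = {
    "evo2_7b": 1_048_576,
    "evo2_20b": 1_048_576,
    "evo2_40b": 1_048_576,
    "evo2_7b_base": 8_192,
    "evo2_40b_base": 8_192,
    "evo2_1b_base": 8_192,
    "evo2_7b_262k": 262_144,
}


def generation_budget(prompt, checkpoint, max_new_tokens=32):
    """Return how many new tokens fit after the prompt, capped at max_new_tokens."""
    window = ASSUMED_CONTEXT[checkpoint]
    # Assumes one token per character of the prompt string.
    room = window - len(prompt)
    if room <= 0:
        raise ValueError(
            f"prompt of {len(prompt)} tokens leaves no room in the "
            f"{window}-token window assumed for {checkpoint}"
        )
    return min(max_new_tokens, room)


# An 8,000-nucleotide prompt leaves little room in an 8K window.
print(generation_budget("A" * 8000, "evo2_7b_base", max_new_tokens=500))
print(generation_budget("A" * 8000, "evo2_7b_262k", max_new_tokens=500))
```

When the returned budget is smaller than the requested max_new_tokens, a longer-context checkpoint or a shorter prompt is needed.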

Evo2 Scoring (`evo2-score`)

Scores existing DNA sequences under the Evo2 model. For each sequence, it computes the model’s predicted probability of every nucleotide given the preceding nucleotides and aggregates these into a log-likelihood, an average log-likelihood per nucleotide, and a perplexity. Optionally returns the per-position logits and the token vocabulary.
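
As an illustration of how the three metrics relate, the sketch below derives them from per-position logits using the standard definitions. It makes several assumptions: `score` and `log_softmax` are hypothetical helpers, the first token is treated as unscored context, and perplexity is taken as exp(-avg_log_likelihood). The tool's exact conventions are not stated here and may differ.

```python
import math


def log_softmax(row):
    """Convert one row of logits into log-probabilities."""
    peak = max(row)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_norm for x in row]


def score(token_ids, logits):
    """Aggregate next-token log-probabilities; logits[i] predicts token_ids[i + 1]."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to score a next-token prediction")
    per_position = [
        log_softmax(logits[i])[token_ids[i + 1]] for i in range(len(token_ids) - 1)
    ]
    log_likelihood = sum(per_position)
    avg_log_likelihood = log_likelihood / len(per_position)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),
    }


# A model that is uniform over four bases has perplexity 4 on any sequence.
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(score([0, 1, 2, 3], uniform))
```

Under these definitions log_likelihood and avg_log_likelihood are never positive and perplexity is never below 1, which is consistent with the ranges the tool reports.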

API Reference

Input: CausalModelScoringInput

sequences

List[string]

required

Sequences to score. Can be provided as a single string or a list of strings.

Config: Evo2ScoringConfig

model_checkpoint

enum

default:"evo2_7b"

Evo2 weights variant. Available options: evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae

local_path

string

Override HuggingFace download with a local weights directory.

prepend_bos

boolean

default:"False"

Prepend a beginning-of-sequence token before scoring.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"1800"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

batch_size

integer

default:"1"

Number of sequences to process simultaneously on GPU. Larger batches improve throughput but use more GPU memory; reduce if encountering out-of-memory errors.

return_logits

boolean

default:"False"

Include per-position logits in the output.

Output: CausalModelScoringOutput

scores

List[CausalModelScoringMetrics]

required

List of scoring outputs, one per input sequence. Each entry is a Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.

CausalModelScoringMetrics

logits

array

Per-position logits array (seq_len, vocab_size). None unless return_logits=True.

vocab

array

Token ordering for logits.

primary_metric

string

Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.

Metrics (one set per scores item)

Metric	Type	Range	Availability
`log_likelihood`	float	≤ 0.0	always
`avg_log_likelihood`	float	≤ 0.0	always
`perplexity`	float	≥ 1.0	always

Applications

This tool measures how well a DNA sequence matches the patterns the model learned during training across all domains of life. Lower perplexity means the sequence is more consistent with that distribution. Use it to rank or filter candidate sequences (including the output of evo2-sample), to compare variants of a sequence, or to assess sequences from organisms outside the prokaryotic and phage range that Evo1 covers.
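
Ranking and filtering candidates by perplexity might look like the sketch below. Plain dictionaries stand in for the scoring entries, and the perplexity values are made up for illustration. The real entries are CausalModelScoringMetrics objects whose access pattern may differ, and `rank_by_perplexity` is a hypothetical helper.

```python
def rank_by_perplexity(sequences, scores, max_perplexity=None):
    """Return sequences ordered from lowest to highest perplexity, optionally filtered."""
    paired = sorted(zip(sequences, scores), key=lambda pair: pair[1]["perplexity"])
    if max_perplexity is not None:
        paired = [pair for pair in paired if pair[1]["perplexity"] <= max_perplexity]
    return [sequence for sequence, _ in paired]


# Illustrative values only; these are not real Evo2 scores.
candidates = ["ATGAAACGT", "ATGCCCGGA", "ATGTTTAAC"]
metrics = [{"perplexity": 3.2}, {"perplexity": 1.8}, {"perplexity": 2.5}]

print(rank_by_perplexity(candidates, metrics))
print(rank_by_perplexity(candidates, metrics, max_perplexity=2.6))
```

Any cutoff such as `max_perplexity` has to be chosen per checkpoint, since scores from different checkpoints are not on a common scale.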

Usage Tips

Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_checkpoint values are hard to compare directly; a lower perplexity means the sequence is more consistent with that checkpoint’s training distribution.
return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by a 512-token vocabulary).
prepend_bos adds a beginning-of-sequence token before scoring; leave it False unless matching a specific upstream convention.
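
To see why return_logits is off by default, a rough size estimate helps. The sketch below multiplies sequence length by the 512-token vocabulary and an assumed 4 bytes per value (float32). The dtype the tool actually returns is not stated here, so treat the numbers as order-of-magnitude only. `logits_bytes` is a hypothetical helper.

```python
def logits_bytes(seq_len, vocab_size=512, bytes_per_value=4):
    """Estimate the size in bytes of one sequence's logits array."""
    return seq_len * vocab_size * bytes_per_value


for length in (8_192, 262_144, 1_048_576):
    mib = logits_bytes(length) / 2**20
    print(f"{length:>9} positions -> {mib:,.0f} MiB of logits")
```

Under these assumptions an 8K-position sequence yields about 16 MiB of logits and a one-million-position sequence about 2 GiB, per sequence scored.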

Toolkit Notes

These apply to every Evo2 tool in this toolkit (evo2-sample, evo2-score).

Requires a high-memory GPU; memory scales with model size and context length. The 7B checkpoint needs a high-memory NVIDIA GPU; the 20B and 40B models and the 1M-context checkpoints need substantially more. CPU execution is not practical.
batch_size trades memory for throughput across both tools. It sets how many prompts (evo2-sample) or sequences (evo2-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
Trained across all domains of life. Evo2 covers bacterial, archaeal, eukaryotic, and phage genomes. For prokaryote-and-phage-only generation with a smaller model, Evo1 is an alternative.
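
The effect of batch_size can be pictured as splitting the inputs into consecutive groups, one group per forward pass. The sketch below is a simplified illustration under that assumption. `batches` is a hypothetical helper, and the toolkit may group inputs differently, for example by length.

```python
def batches(items, batch_size=1):
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sequences = ["ACGT", "GGCA", "TTAG", "CATG", "AACC"]

# batch_size=1 means five forward passes; batch_size=2 means three.
print(len(list(batches(sequences, batch_size=1))))
print(list(batches(sequences, batch_size=2)))
```

Fewer, larger groups mean fewer forward passes but more sequences held in GPU memory at once, which is the trade-off the note above describes.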

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.