Evo2 DNA Language Model

License: Evo2 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

proto-bio/proto-language/proto_language/generator/evo2_generator.py

@ARTICLE{Brixi2026-jn,
  title     = "Genome modelling and design across all domains of life with Evo 2",
  author    = "Brixi, Garyk and Durrant, Matthew G and Ku, Jerome and
               Naghipourfar, Mohsen and Poli, Michael and Sun, Gwanggyu and
               Brockman, Greg and Chang, Daniel and Fanton, Alison and Gonzalez,
               Gabriel A and King, Samuel H and Li, David B and Merchant, Aditi
               T and Nguyen, Eric and Ricci-Tam, Chiara and Romero, David W and
               Schmok, Jonathan C and Taghibakhshi, Ali and Vorontsov, Anton and
               Yang, Brandon and Deng, Myra and Gorton, Liv and Nguyen, Nam and
               Wang, Nicholas K and Pearce, Michael T and Simon, Elana and
               Adams, Etowah and Amador, Zachary J and Ashley, Euan A and
               Baccus, Stephen A and Dai, Haoyu and Dillmann, Steven and Ermon,
               Stefano and Guo, Daniel and Herschl, Michael H and Ilango, Rajesh
               and Janik, Ken and Lu, Amy X and Mehta, Reshma and Mofrad,
               Mohammad R K and Ng, Madelena Y and Pannu, Jaspreet and R{\'e},
               Christopher and St John, John and Sullivan, Jeremy and Tey,
               Joseph and Viggiano, Ben and Zhu, Kevin and Zynda, Greg and
               Balsam, Daniel and Collison, Patrick and Costa, Anthony B and
               Hernandez-Boussard, Tina and Ho, Eric and Liu, Ming-Yu and
               McGrath, Thomas and Powell, Kimberly and Pinglay, Sudarshan and
               Burke, Dave P and Goodarzi, Hani and Hsu, Patrick D and Hie,
               Brian L",
  journal   = "Nature",
  publisher = "Springer Science and Business Media LLC",
  pages     = "1--13",
  doi       = "10.1038/s41586-026-10176-5",
  month     =  mar,
  year      =  2026,
  language  = "en"
}

Sequence generator that uses the Evo2 genomic language model to generate DNA. It uses the Evo2 7B-parameter model to extend prompt sequences autoregressively, and supports top-k, top-p (nucleus), and temperature sampling, KV caching for efficiency, and batch generation. The generator category is "autoregressive": sequences are generated token by token, from left to right. The number of tokens to generate is calculated automatically from the assigned segment's sequence_length, the prompt length, and the prepend_prompt setting.
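The token-count rule can be sketched in a few lines. This is a minimal illustration, not the generator's actual code: the helper name `tokens_to_generate` is hypothetical, and the rule that a prepended prompt counts toward the segment length is an assumption inferred from the usage example on this page (a 3-base prompt, a 1003-base segment, and `max_new_tokens = 1003` when `prepend_prompt` is False).

```python
def tokens_to_generate(sequence_length: int, prompt_length: int,
                       prepend_prompt: bool = False) -> int:
    """Hypothetical helper: how many new tokens are needed to fill a segment."""
    # Assumption: a prepended prompt occupies part of the segment,
    # so only the remainder has to be generated.
    if prepend_prompt:
        budget = sequence_length - prompt_length
    else:
        budget = sequence_length
    if budget < 0:
        raise ValueError("prompt is longer than the segment")
    return budget


print(tokens_to_generate(1003, 3))                       # 1003
print(tokens_to_generate(1003, 3, prepend_prompt=True))  # 1000
```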

API Reference

Config: Evo2GeneratorConfig

Configuration object for Evo2Generator. This class defines configuration parameters for the Evo2 generator, which uses a 7B parameter genomic language model to generate DNA sequences autoregressively from prompt sequences.

All prompts must have identical lengths for batched generation. For detailed information on Evo2 parameters, see: https://github.com/arcinstitute/evo2
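A pre-flight check for the equal-length requirement might look like the sketch below. The function name `check_prompts` is hypothetical and is not part of proto_language. The sketch only encodes the two constraints stated on this page: at least one prompt is required, and all prompts must share one length.

```python
from typing import List


def check_prompts(prompts: List[str]) -> int:
    """Hypothetical validator: return the shared prompt length or raise."""
    if not prompts or any(len(p) == 0 for p in prompts):
        # The generator metadata lists "Allows Empty Start" as False.
        raise ValueError("at least one non-empty prompt is required")
    lengths = {len(p) for p in prompts}
    if len(lengths) != 1:
        raise ValueError(
            f"prompts must share one length for batched generation, got {sorted(lengths)}"
        )
    return lengths.pop()


print(check_prompts(["ATG", "GTG"]))  # 3
```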

prompts

List[string]

required

Prompt sequences for DNA sequence generation (single prompt or multiple)

model_checkpoint

enum

default:"evo2_7b"

Evo2 model variant to load (currently only evo2_7b). Options: evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae

local_path

string

Path to local checkpoint weights for custom or finetuned models

device

string

default:"cuda"

GPU device to run Evo2 on (e.g. ‘cuda’ or ‘cuda:0’).

top_k

integer

default:"4"

Limits sampling to the top-k most probable tokens at each generation step.
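As a rough illustration of top-k filtering in general (not Evo2's internal implementation), the sketch below keeps the k most probable tokens and renormalises. The function name and the toy distribution are made up for the example.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalise them to sum to 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


# Toy next-token distribution (illustrative values only).
probs = {"A": 0.4, "C": 0.3, "G": 0.2, "T": 0.1}
print(sorted(top_k_filter(probs, k=2)))  # ['A', 'C']
```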

top_p

number

default:"1"

Nucleus sampling cutoff. Restricts sampling to the smallest set of tokens whose cumulative probability is ≥ top_p.
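Nucleus filtering can be sketched as follows. This shows the general technique under the definition given above, not Evo2's own code, and the function name and toy distribution are illustrative.

```python
from typing import Dict


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest high-probability token set with cumulative mass >= top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


probs = {"A": 0.5, "C": 0.3, "G": 0.15, "T": 0.05}
print(sorted(top_p_filter(probs, top_p=0.9)))  # ['A', 'C', 'G']
```

With `top_p` at its default of 1, the loop keeps every token, so the filter has no effect.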

temperature

number

default:"1.0"

Sharpness of sampling. Below 1 favors high-probability tokens; above 1 increases diversity.
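The effect of temperature can be seen with a plain softmax over made-up logits. This is a generic sketch of temperature scaling, not the model's sampling code.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0, -1.0]  # illustrative values
for t in (0.5, 1.0, 2.0):
    top = max(softmax_with_temperature(logits, t))
    print(f"temperature={t}: top-token probability {top:.3f}")
```

The top-token probability shrinks as the temperature rises, which is the diversity increase described above.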

force_prompt_threshold

integer

Optional number of tokens to prefill in parallel before switching to prompt forcing.

max_seqlen

integer

Optional maximum sequence length to generate. Determines the max size of the cache if larger.

stop_at_eos

boolean

default:"True"

Whether to stop at end-of-sequence token

batched

boolean

default:"True"

Generate all prompts together in a single batched forward pass. Required for multiple prompts.

batch_size

integer

default:"1"

Number of sequences to process simultaneously on GPU
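One way to picture `batch_size` is as a chunk size over the list of prompts. The helper below is a hypothetical sketch of that idea. How Evo2Generator actually schedules batches, and how `batch_size` interacts with `batched`, is not described on this page.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


prompts = ["ATG", "GTG", "TTG"]
print(list(chunked(prompts, batch_size=2)))  # [['ATG', 'GTG'], ['TTG']]
```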

cached_generation

boolean

default:"True"

Whether to reuse KV-cache state across decoding steps to avoid recomputation.

store_kv_cache

boolean

default:"False"

Retain and expose the per-sequence KV-cache after generation so downstream callers can continue generating from it.

prepend_prompt

boolean

default:"False"

Whether to prepend the prompt to the generated sequence.

verbose

boolean

default:"False"

Whether to print verbose output

Usage

python

>>> from proto_language.generator import Evo2Generator, Evo2GeneratorConfig
>>> from proto_language.core import Segment, SequenceType
>>> config = Evo2GeneratorConfig(prompts=["ATG"], temperature=0.8)  # prompts is documented as List[string]
>>> gen = Evo2Generator(config)
>>> # Segment length determines how many tokens to generate
>>> segment = Segment(length=1003, sequence_type="dna")
>>> gen.assign(segment)  # prepend_prompt defaults to False, so max_new_tokens = 1003
>>> gen.sample()  # Generates DNA sequences

Metadata

Property	Value
Key	`evo2`
Class	`Evo2Generator`
Category	`autoregressive`
Input Type	`prompt`
Uses GPU	`True`
Supported Sequence Types	`dna`
Allows Empty Start	`False`

