Evo2 DNA Language Model
License: Evo2 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


proto-bio/proto-language/proto_language/generator/evo2_generator.py
@ARTICLE{Brixi2026-jn,
  title     = "Genome modelling and design across all domains of life with Evo 2",
  author    = "Brixi, Garyk and Durrant, Matthew G and Ku, Jerome and
               Naghipourfar, Mohsen and Poli, Michael and Sun, Gwanggyu and
               Brockman, Greg and Chang, Daniel and Fanton, Alison and Gonzalez,
               Gabriel A and King, Samuel H and Li, David B and Merchant, Aditi
               T and Nguyen, Eric and Ricci-Tam, Chiara and Romero, David W and
               Schmok, Jonathan C and Taghibakhshi, Ali and Vorontsov, Anton and
               Yang, Brandon and Deng, Myra and Gorton, Liv and Nguyen, Nam and
               Wang, Nicholas K and Pearce, Michael T and Simon, Elana and
               Adams, Etowah and Amador, Zachary J and Ashley, Euan A and
               Baccus, Stephen A and Dai, Haoyu and Dillmann, Steven and Ermon,
               Stefano and Guo, Daniel and Herschl, Michael H and Ilango, Rajesh
               and Janik, Ken and Lu, Amy X and Mehta, Reshma and Mofrad,
               Mohammad R K and Ng, Madelena Y and Pannu, Jaspreet and R{\'e},
               Christopher and St John, John and Sullivan, Jeremy and Tey,
               Joseph and Viggiano, Ben and Zhu, Kevin and Zynda, Greg and
               Balsam, Daniel and Collison, Patrick and Costa, Anthony B and
               Hernandez-Boussard, Tina and Ho, Eric and Liu, Ming-Yu and
               McGrath, Thomas and Powell, Kimberly and Pinglay, Sudarshan and
               Burke, Dave P and Goodarzi, Hani and Hsu, Patrick D and Hie,
               Brian L",
  journal   = "Nature",
  publisher = "Springer Science and Business Media LLC",
  pages     = "1--13",
  doi       = "10.1038/s41586-026-10176-5",
  month     =  mar,
  year      =  2026,
  language  = "en"
}
Sequence generator that uses the Evo2 genomic language model to generate DNA. It runs the Evo2 7B-parameter model autoregressively, extending each prompt sequence one token at a time from left to right, which is why the generator category is "autoregressive". It supports top-k, top-p, and temperature sampling, KV caching for efficiency, and batch generation. The number of tokens to generate is calculated automatically from the assigned segment’s sequence_length, the prompt length, and the prepend_prompt setting.
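The token-count rule described above can be sketched as follows. This is an illustration only: `tokens_to_generate` is a hypothetical helper, not part of proto_language, and it assumes one token per nucleotide and that enabling prepend_prompt makes the prompt count toward the segment length. Only the prepend_prompt=False case is confirmed by the usage example on this page.

```python
def tokens_to_generate(segment_length: int, prompt: str,
                       prepend_prompt: bool = False) -> int:
    """Hypothetical helper: estimate how many new tokens are requested.

    Assumes one token per nucleotide. The prepend_prompt=True branch is an
    assumption about how the prompt is budgeted against the segment.
    """
    if not prepend_prompt:
        # The prompt is not part of the output, so the whole segment is new.
        return segment_length

    # Assumed: the prompt occupies the start of the segment, so only the
    # remainder has to be generated.
    remaining = segment_length - len(prompt)
    if remaining < 0:
        raise ValueError("prompt is longer than the segment")
    return remaining


print(tokens_to_generate(1003, "ATG"))
print(tokens_to_generate(1003, "ATG", prepend_prompt=True))
```

Under these assumptions, a 1003-nucleotide segment with the prompt "ATG" requests 1003 new tokens when prepend_prompt is False, and 1000 when it is True.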

API Reference

Config: Evo2GeneratorConfig
Configuration object for Evo2Generator. This class defines configuration parameters for the Evo2 generator, which uses a 7B-parameter genomic language model to generate DNA sequences autoregressively from prompt sequences.
All prompts must have identical lengths for batched generation. For detailed information on Evo2 parameters, see: https://github.com/arcinstitute/evo2
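Because batched generation requires prompts of identical length, it can help to check the prompt list before building the config. The sketch below is a standalone pre-check; `check_prompt_lengths` is a hypothetical helper and not part of proto_language, and how the library itself reports a mismatch is not documented here.

```python
from typing import List


def check_prompt_lengths(prompts: List[str]) -> int:
    """Hypothetical pre-check: return the shared prompt length.

    Raises ValueError if the list is empty or the prompts differ in length.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")

    lengths = {len(p) for p in prompts}
    if len(lengths) > 1:
        raise ValueError(
            f"prompts must have identical lengths, got {sorted(lengths)}"
        )
    return lengths.pop()


print(check_prompt_lengths(["ATGAAA", "ATGCCC", "ATGGGG"]))
```

Prompts that fail this check would need to be trimmed or padded to a common length, or generated in separate calls.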
prompts
List[string]
required
Prompt sequences for DNA sequence generation (single prompt or multiple)
model_checkpoint
enum
default:"evo2_7b"
Evo2 model variant to load (currently only evo2_7b). Options: evo2_7b, evo2_20b, evo2_40b, evo2_7b_base, evo2_40b_base, evo2_1b_base, evo2_7b_262k, evo2_7b_microviridae
local_path
string
Path to local checkpoint weights for custom or finetuned models
device
string
default:"cuda"
GPU device to run Evo2 on (e.g. ‘cuda’ or ‘cuda:0’).
top_k
integer
default:"4"
Limits sampling to the top-k most probable tokens at each generation step.
top_p
number
default:"1"
Nucleus sampling cutoff. Restricts sampling to the smallest set of tokens whose cumulative probability is ≥ top_p.
temperature
number
default:"1.0"
Sharpness of sampling. Below 1 favors high-probability tokens; above 1 increases diversity.
force_prompt_threshold
integer
Optional number of tokens to prefill in parallel before switching to prompt forcing.
max_seqlen
integer
Optional maximum sequence length to generate. Determines the max size of the cache if larger.
stop_at_eos
boolean
default:"True"
Whether to stop at end-of-sequence token
batched
boolean
default:"True"
Generate all prompts together in a single batched forward pass. Required for multiple prompts.
batch_size
integer
default:"1"
Number of sequences to process simultaneously on GPU
cached_generation
boolean
default:"True"
Whether to reuse KV-cache state across decoding steps to avoid recomputation.
store_kv_cache
boolean
default:"False"
Retain and expose the per-sequence KV-cache after generation so downstream callers can continue.
prepend_prompt
boolean
default:"False"
Whether to prepend prompt to generation
verbose
boolean
default:"False"
Whether to print verbose output
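To make the effect of temperature, top_k, and top_p concrete, here is a minimal pure-Python sketch of the three filters applied to a toy next-token distribution over nucleotides. It is not Evo2's implementation: the function name `filter_distribution`, the example logits, and the order in which the filters are applied are all assumptions made for illustration.

```python
import math
import random
from typing import Dict


def filter_distribution(logits: Dict[str, float], top_k: int = 4,
                        top_p: float = 1.0,
                        temperature: float = 1.0) -> Dict[str, float]:
    """Illustrative temperature / top-k / top-p filtering (assumed order)."""
    # Temperature: dividing logits by T < 1 sharpens the distribution,
    # T > 1 flattens it.
    scaled = {tok: value / temperature for tok, value in logits.items()}

    # Numerically stable softmax.
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability >= top_p.
    kept = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Made-up logits for a single generation step.
toy_logits = {"A": 2.0, "C": 1.0, "G": 0.0, "T": -1.0}
dist = filter_distribution(toy_logits, top_k=4, top_p=0.9, temperature=0.8)
next_token = random.choices(list(dist), weights=list(dist.values()))[0]
print(dist, next_token)
```

With a four-letter DNA alphabet, the default top_k of 4 leaves every nucleotide available, so temperature and top_p are the settings that actually shape diversity in this toy setup.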

Usage

python
>>> from proto_language.generator import Evo2Generator, Evo2GeneratorConfig
>>> from proto_language.core import Segment, SequenceType
>>> config = Evo2GeneratorConfig(prompts="ATG", temperature=0.8)
>>> gen = Evo2Generator(config)
>>> # Segment length determines how many tokens to generate
>>> segment = Segment(length=1003, sequence_type="dna")
>>> gen.assign(segment)  # prepend_prompt defaults to False, so max_new_tokens = 1003
>>> gen.sample()  # Generates DNA sequences

Metadata

Property                   Value
Key                        evo2
Class                      Evo2Generator
Category                   autoregressive
Input Type                 prompt
Uses GPU                   True
Supported Sequence Types   dna
Allows Empty Start         False