Evo1 DNA Language Model

License: Evo1 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

Source: proto-bio/proto-language/proto_language/generator/evo1_generator.py

@article{nguyen2024evo,
  title={Sequence modeling and design from molecular to genome scale with Evo},
  author={Nguyen, Eric and Poli, Michael and Durrant, Matthew G and Kang, Brian and Katrekar, Dhruva and Li, David B and Bartie, Liam J and Thomas, Armin W and King, Samuel H and Brixi, Garyk and Sullivan, Jeremy and Ng, Madelena Y and Lewis, Ashley and Lou, Aaron and Ermon, Stefano and Baccus, Stephen A and Hernandez-Boussard, Tina and R{\'e}, Christopher and Hsu, Patrick D and Hie, Brian L},
  journal={Science},
  volume={386},
  number={6723},
  pages={eado9336},
  year={2024},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ado9336}
}


Sequence generator built on the Evo1 genomic language model. It supports multiple checkpoints, including CRISPR and transposon fine-tuned variants. The number of tokens to generate is computed automatically from the assigned segment’s sequence_length.

API Reference

Config: Evo1GeneratorConfig

Configuration object for Evo1Generator.

prompts

List[string]

required

Prompt sequences for DNA generation (single prompt or multiple)

model_checkpoint

enum

default:"evo-1-8k-base"

Evo1 model variant to load (e.g. evo-1-8k-base). Options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon

top_k

integer

default:"4"

At each step, restrict sampling to the k most probable tokens.

temperature

number

default:"1.0"

Sharpness of the sampling distribution. Below 1 sharpens; above 1 increases diversity. Must be > 0.

prepend_prompt

boolean

default:"False"

Whether to prepend the prompt to the generated sequence.

device

string

default:"cuda"

GPU device to run Evo1 on (e.g. "cuda" or "cuda:0").

batch_size

integer

default:"1"

Number of sequences to process simultaneously on GPU

verbose

boolean

default:"False"

Whether to print verbose output
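The top_k and temperature options describe standard sampling controls. As a rough illustration of how the two interact, the sketch below applies top-k filtering and temperature scaling to a list of logits in plain Python. The function sample_top_k and its arguments are hypothetical and are not part of the Evo1Generator API; the sampling code inside Evo1 may differ in detail.

```python
import math
import random


def sample_top_k(logits, top_k=4, temperature=1.0, rng=random):
    """Hypothetical helper: draw one token index from `logits`.

    Illustrates top-k filtering plus temperature scaling; it is not
    the Evo1 implementation.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    if top_k < 1:
        raise ValueError("top_k must be >= 1")

    # Keep only the indices of the k highest-scoring tokens.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]

    # Divide by temperature: values below 1 sharpen the distribution,
    # values above 1 flatten it. Subtract the max for numerical stability.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # random.choices normalizes the weights internally.
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.5, -1.0, -3.0]
    print([sample_top_k(logits, top_k=2, temperature=0.7, rng=rng) for _ in range(10)])
```

With top_k=1 the draw is always the single most probable token, regardless of temperature; larger top_k values let temperature have more influence on diversity.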

Usage

python

>>> config = Evo1GeneratorConfig(
...     prompts=["ATG"],  # prompts is documented as List[string]
...     model_checkpoint="evo-1-8k-crispr",
...     temperature=1.0,
... )
>>> gen = Evo1Generator(config)
>>> segment = Segment(length=1003, sequence_type="dna")
>>> gen.assign(segment)  # prepend_prompt defaults to False, so max_new_tokens = 1003
>>> gen.sample()
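The comment in the usage example states that with prepend_prompt=False the generator produces as many new tokens as the segment is long. A minimal sketch of that arithmetic follows. The helper max_new_tokens is hypothetical and not part of the library. The prepend_prompt=True branch is an assumption: it supposes the prompt counts toward the segment length and that each nucleotide is one token, which the documentation above does not confirm.

```python
def max_new_tokens(segment_length: int, prompt: str, prepend_prompt: bool = False) -> int:
    """Hypothetical helper: how many tokens to generate for a segment."""
    if not prepend_prompt:
        # Documented case: the prompt is not part of the output,
        # so the whole segment is filled with generated tokens.
        return segment_length

    # ASSUMPTION (not confirmed by the source): the prompt occupies the
    # start of the segment, one token per nucleotide, and only the
    # remainder is generated.
    remaining = segment_length - len(prompt)
    if remaining < 0:
        raise ValueError("prompt is longer than the segment")
    return remaining


if __name__ == "__main__":
    print(max_new_tokens(1003, "ATG"))                       # documented case
    print(max_new_tokens(1003, "ATG", prepend_prompt=True))  # assumed case
```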

Metadata

Property	Value
Key	`evo1`
Class	`Evo1Generator`
Category	`autoregressive`
Input Type	`prompt`
Uses GPU	`True`
Supported Sequence Types	`dna`
Allows Empty Start	`False`
