Evo1 - Proto

License: Evo1 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


evo-design/evo

Biological foundation modeling from molecular to genome scale


Sequence modeling and design from molecular to genome scale with Evo

Eric Nguyen, Michael Poli, … Brian L Hie

Science (2024)


@article{nguyen2024evo,
  title={Sequence modeling and design from molecular to genome scale with Evo},
  author={Nguyen, Eric and Poli, Michael and Durrant, Matthew G and Kang, Brian and Katrekar, Dhruva and Li, David B and Bartie, Liam J and Thomas, Armin W and King, Samuel H and Brixi, Garyk and Sullivan, Jeremy and Ng, Madelena Y and Lewis, Ashley and Lou, Aaron and Ermon, Stefano and Baccus, Stephen A and Hernandez-Boussard, Tina and R{\'e}, Christopher and Hsu, Patrick D and Hie, Brian L},
  journal={Science},
  volume={386},
  number={6723},
  pages={eado9336},
  year={2024},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ado9336}
}


proto-bio/proto-tools/proto_tools/tools/causal_models/evo1



Function	Description
`run_evo1_sample()`	Sample DNA sequences using the Evo1 language model (GPU)
`run_evo1_score()`	Score DNA sequences using the Evo1 language model (GPU)

Background

Evo1 (Nguyen et al., 2024) is a 7-billion-parameter DNA language model trained with an autoregressive objective: during training the model learns to predict the next nucleotide given all preceding nucleotides. Training used the OpenGenome dataset, roughly 2.7 million prokaryotic and phage genomes, so the model’s predictions are most reliable for bacterial, archaeal, and phage sequences and are not expected to transfer well to eukaryotic genomes. It uses the StripedHyena architecture, a sequence model that combines convolutional state-space layers with a smaller number of attention layers. This design lets it process long stretches of DNA, up to 131,072 nucleotides for the long-context checkpoint, without the memory cost a pure attention model would incur at that length. The autoregressive objective yields two capabilities directly. Sampling from the predicted next-nucleotide distributions produces new candidate sequences, and reading off the probabilities the model assigns to an existing sequence gives a likelihood score that reflects how closely the sequence matches the patterns seen during training. Alongside the base checkpoints, the authors released specialized variants trained on CRISPR loci and on transposable elements for those sequence types. Evo1 is the first model in the Evo family; Evo2 extends the approach to eukaryotic genomes and longer context.

Learning Resources

Learning from DNA: a grand challenge in biology (Hazy Research, Stanford) - an accessible introduction to Evo from the authors, covering the motivation for genomic language modeling and how the model is trained and used.
Evo: DNA foundation modeling from molecular to genome scale (Arc Institute) - an overview of Evo’s capabilities, including genome-scale generation and the StripedHyena architecture.

Tools

Evo1 Sampling (`evo1-sample`)

Generates DNA sequences by autoregressive sampling. Given one or more prompt sequences, the model extends each prompt nucleotide by nucleotide, drawing each new nucleotide from the model’s predicted distribution under the configured temperature, top_k, and top_p settings, until max_new_tokens new nucleotides have been produced. Optionally returns a per-sequence likelihood score (log-likelihood, average log-likelihood, and perplexity) for the generated sequences.
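
The interaction of temperature, top_k, and top_p in a single sampling step can be sketched in plain Python. This is an illustrative sketch, not the toolkit's implementation: the function name `sample_next_token`, the order in which the filters are applied, and the toy logits are all assumptions made for the example.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=4, top_p=1.0, rng=random):
    """Draw one token index from raw logits (hypothetical helper, not the Evo1 API)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:top_k]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    # (Assumption: the nucleus is measured on the pre-filter probabilities.)
    nucleus, cumulative = [], 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalises the surviving weights before drawing.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


# Toy usage: made-up logits over a 6-token vocabulary whose first four
# entries stand for the DNA bases and whose last two are unlikely extras.
BASES = "ACGT"
toy_logits = [2.0, 1.0, 0.5, 0.1, -4.0, -4.0]
token = sample_next_token(toy_logits, temperature=0.8, top_k=4)
print(BASES[token])
```

With top_k=4 the two low-probability extra tokens can never be drawn, which mirrors the role the default plays in keeping generation on the four bases.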

API Reference


Input: CausalModelSampleInput

prompts

List[string]

required

Prompt sequences to condition generation on. Can be provided as a single string or a list of strings.


Config: Evo1SampleConfig

model_name

enum

default:"evo-1-8k-base"

Evo1 weights variant; evo-1-8k-* variants use an 8,192-token context, evo-1-131k-base extends to 131,072 tokens, and -crispr/-transposon are domain fine-tunes. Available options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon

top_k

integer

default:"4"

Limit sampling to the top-k most probable tokens at each step. Defaults to 4 (one per DNA base).

max_new_tokens

integer

default:"100"

Maximum number of new tokens to generate per prompt (excludes prompt).

cached_generation

boolean

default:"True"

Use the KV cache for autoregressive generation.

force_prompt_threshold

integer

default:"128"

Number of tokens to prefill in parallel before switching to autoregressive prompt forcing; lower values reduce peak memory.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"1800"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

prepend_prompt

boolean

default:"False"

Prepend the input prompt to each generated sequence; when False (the default), only newly generated tokens are returned.

temperature

number

default:"1.0"

Softmax temperature; lower values are more deterministic.

top_p

number

default:"1.0"

Nucleus sampling threshold over per-position token probabilities.

batch_size

integer

default:"1"

Number of prompts to process simultaneously on GPU.


Output: Evo1SampleOutput

scores

array

Scoring metrics per sequence, including log_likelihood, avg_log_likelihood, and perplexity.

sequences

List[string]

required

Generated DNA sequences.

Applications

This tool produces candidate DNA sequences for downstream design and screening, including synthetic genes, regulatory regions, CRISPR systems (using the evo-1-8k-crispr checkpoint), and transposable elements (using the evo-1-8k-transposon checkpoint). The prompt sets the biological context for what follows, for example a start codon or promoter region.

Usage Tips

Match the checkpoint to the task. evo-1-8k-base (the default) is the general prokaryotic and phage DNA model and evo-1-131k-base is its genome-scale, long-context counterpart. evo-1-8k-crispr and evo-1-8k-transposon are task-specific variants of evo-1-8k-base for generating CRISPR-Cas systems and IS200/IS605 transposons; use them when generating those systems and a base checkpoint otherwise.
top_k defaults to 4, the size of the DNA alphabet. It exists mainly to keep generation on the four bases rather than other byte tokens, so it is not the diversity knob. Control diversity with temperature (lower values concentrate sampling on the most probable bases, higher values spread it across less probable ones) and leave top_p at its default unless you specifically want nucleus sampling.
Output excludes the prompt by default. prepend_prompt=False returns only the newly generated nucleotides, not the prompt joined to its continuation; set it to True if you need the full sequence back.
Prompt length plus max_new_tokens must fit the checkpoint’s context window (8,192 nucleotides for the 8k checkpoints, 131,072 for evo-1-131k-base). The model cannot attend beyond that window, so a long prompt directly reduces how much can be generated.
Generated sequences are candidates. Validate them with downstream tools (for example ORF detection, structure prediction, or homology search) before drawing biological conclusions.
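
The context-window tip above amounts to simple arithmetic, sketched below. The helper names are hypothetical, the sketch assumes one token per nucleotide, and the window for evo-1.5-8k-base is inferred from the "8k" in its name rather than stated on this page.

```python
# Context windows in tokens, taken from the model_name description above.
CONTEXT_WINDOW = {
    "evo-1-8k-base": 8192,
    "evo-1-8k-crispr": 8192,
    "evo-1-8k-transposon": 8192,
    "evo-1-131k-base": 131072,
    "evo-1.5-8k-base": 8192,  # assumption: inferred from the checkpoint name
}


def remaining_budget(prompt, model_name="evo-1-8k-base"):
    """Largest max_new_tokens that still fits after the prompt (hypothetical helper)."""
    # Assumes one token per nucleotide, so len(prompt) is the prompt's token count.
    return max(0, CONTEXT_WINDOW[model_name] - len(prompt))


def fits_context(prompt, max_new_tokens, model_name="evo-1-8k-base"):
    """True when prompt plus requested continuation fits the checkpoint's window."""
    return len(prompt) + max_new_tokens <= CONTEXT_WINDOW[model_name]


prompt = "ATG" * 1000  # a 3,000-nucleotide toy prompt
print(remaining_budget(prompt))
print(fits_context(prompt, max_new_tokens=6000))
print(fits_context(prompt, max_new_tokens=6000, model_name="evo-1-131k-base"))
```

A check like this before submitting a job makes it obvious when a request needs the long-context checkpoint instead of an 8k one.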

Evo1 Scoring (`evo1-score`)

Scores existing DNA sequences under the Evo1 model. For each sequence, it computes the model’s predicted probability of every nucleotide given the preceding nucleotides and aggregates these into a log-likelihood, an average log-likelihood per nucleotide, and a perplexity. Optionally returns the per-position logits and the token vocabulary.
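
How the three summary metrics relate to the per-nucleotide probabilities can be sketched as follows. This is a minimal illustration under stated assumptions: natural logarithms, perplexity defined as the exponential of the negative average log-likelihood, and a hypothetical helper name. The tool's exact conventions (for example, whether the first position is scored) are not specified here.

```python
import math


def summarize_token_probs(token_probs):
    """Aggregate p(x_t | x_<t) values into summary metrics (hypothetical helper)."""
    if not token_probs:
        raise ValueError("need at least one scored position")
    # Sum of per-position log-probabilities; always <= 0 because each p <= 1.
    log_likelihood = sum(math.log(p) for p in token_probs)
    # Length-normalised version, comparable across sequence lengths.
    avg_log_likelihood = log_likelihood / len(token_probs)
    # Assumed definition: exp of the negative average, so always >= 1.
    perplexity = math.exp(-avg_log_likelihood)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": perplexity,
    }


# Toy input: a model that is maximally unsure among four bases at every
# position assigns 0.25 each time, which gives a perplexity of about 4.
print(summarize_token_probs([0.25] * 8))
```

Under these definitions a perplexity near 4 means the model is doing no better than guessing uniformly among the four bases, and values closer to 1 mean each nucleotide was predicted with high confidence.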

API Reference


Input: CausalModelScoringInput

sequences

List[string]

required

Sequences to score. Can be provided as a single string or a list of strings.


Config: Evo1ScoringConfig

model_name

enum

default:"evo-1-8k-base"

Evo1 weights variant. Available options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"1800"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

batch_size

integer

default:"1"

Number of sequences to process simultaneously on GPU. Larger batches improve throughput but use more GPU memory.

return_logits

boolean

default:"False"

Include per-position logits in the output.


Output: CausalModelScoringOutput

scores

List[CausalModelScoringMetrics]

required

One scoring result per input sequence. Each entry is a Metrics subclass carrying the scalar metrics (log_likelihood, avg_log_likelihood, perplexity) plus optional per-position list values whose names end in _pp. logits and vocab are declared fields that hold raw model outputs.

CausalModelScoringMetrics

logits

array

Per-position logits array (seq_len, vocab_size). None unless return_logits=True.

vocab

array

Token ordering for logits.

primary_metric

string

Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.
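
When return_logits=True, the logits and vocab fields could be combined into per-position log-probabilities roughly as sketched below. The sketch rests on two assumptions that are not confirmed on this page: that row t of logits is the prediction for the token at position t + 1 (the usual convention for causal models), and that vocab is a list of token strings in which each base appears as a single-character entry. The 4-token vocabulary and zero logits are toy stand-ins for the real arrays.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_norm for x in row]


def per_position_log_probs(logits, vocab, sequence):
    """Log-probability of each observed token after the first (hypothetical helper).

    Assumes logits[t] scores the token at position t + 1.
    """
    index = {token: i for i, token in enumerate(vocab)}
    values = []
    for t in range(len(sequence) - 1):
        row = log_softmax(logits[t])
        values.append(row[index[sequence[t + 1]]])
    return values


# Toy stand-ins: a 4-token vocabulary and all-zero logits (a uniform model).
toy_vocab = ["A", "C", "G", "T"]
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(per_position_log_probs(toy_logits, toy_vocab, "ACG"))
```

Checking the alignment assumption against the scalar metrics returned for the same sequence is a sensible first step before relying on values derived this way.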

Metrics (one set per scores item)

Metric	Type	Range	Availability
`log_likelihood`	float	≤ 0.0	always
`avg_log_likelihood`	float	≤ 0.0	always
`perplexity`	float	≥ 1.0	always

Applications

This tool measures how well a DNA sequence matches the patterns the model learned from natural prokaryotic and phage genomes. Lower perplexity means the sequence is more consistent with that training distribution. Use it to rank or filter candidate sequences (including the output of evo1-sample), to compare variants of a sequence, or to flag sequences that fall far outside the model’s training domain.

Usage Tips

Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_name values should not be compared directly; a lower perplexity only indicates closer agreement with that particular checkpoint’s training distribution.
return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by a 512-token vocabulary).
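
The first tip can be illustrated with a small ranking sketch. The score records below are plain dictionaries standing in for the tool's metrics objects, built from made-up numbers rather than real model output, and the perplexity formula (exponential of the negative average natural-log likelihood) is an assumption.

```python
import math


def toy_score(length, avg_log_likelihood):
    """Build a made-up score record using the three documented metric names."""
    return {
        "log_likelihood": avg_log_likelihood * length,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),  # assumed definition
    }


def rank_candidates(names, scores, key="perplexity"):
    """Return candidate names ordered best-first under the chosen metric."""
    # Lower is better for perplexity; higher is better for log-likelihoods.
    reverse = key != "perplexity"
    order = sorted(range(len(names)), key=lambda i: scores[i][key], reverse=reverse)
    return [names[i] for i in order]


names = ["short_candidate", "long_candidate"]
# The long candidate is better per base but has a much lower total.
scores = [toy_score(10, -1.3), toy_score(100, -1.0)]

by_perplexity = rank_candidates(names, scores, key="perplexity")
by_total = rank_candidates(names, scores, key="log_likelihood")
print(by_perplexity)
print(by_total)
```

The two rankings disagree: total log_likelihood favors the short sequence simply because it has fewer positions to penalize, while the length-normalized metric favors the sequence that is better predicted per base.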

Toolkit Notes

These apply to every Evo1 tool in this toolkit (evo1-sample, evo1-score).

Requires a GPU. An NVIDIA GPU with at least 24 GB of memory is recommended; CPU execution is possible but very slow and not practical for typical use.
batch_size trades memory for throughput across both tools. It sets how many prompts (evo1-sample) or sequences (evo1-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
Trained on prokaryotic and phage genomes. Predictions are most reliable within that domain. For eukaryotic genomes or longer context, use Evo2.
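
Conceptually, batch_size partitions the inputs into consecutive groups, one per forward pass. The sketch below shows that partitioning only. The helper name is hypothetical, and the tools handle batching internally, so this is for intuition about how many passes a given setting implies rather than something callers need to write.

```python
import math


def iter_batches(items, batch_size=1):
    """Yield consecutive groups of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


prompts = ["ATG", "ATGAAA", "ATGCCC", "ATGGGG", "ATGTTT"]
for batch in iter_batches(prompts, batch_size=2):
    print(batch)

# Number of forward-pass groups implied by a given batch size.
print(math.ceil(len(prompts) / 2))
```

Larger groups mean fewer passes but more sequences held in GPU memory at once, which is the trade-off the note above describes.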

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.