License: Evo1 is open source and free for academic and commercial use under an Apache-2.0 license. Please refer to the license for full terms.

Proto is not affiliated with Arc Institute. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.


evo-design/evo
Biological foundation modeling from molecular to genome scale
Sequence modeling and design from molecular to genome scale with Evo
Eric Nguyen, Michael Poli, … Brian L Hie
Science (2024)
@article{nguyen2024evo,
  title={Sequence modeling and design from molecular to genome scale with Evo},
  author={Nguyen, Eric and Poli, Michael and Durrant, Matthew G and Kang, Brian and Katrekar, Dhruva and Li, David B and Bartie, Liam J and Thomas, Armin W and King, Samuel H and Brixi, Garyk and Sullivan, Jeremy and Ng, Madelena Y and Lewis, Ashley and Lou, Aaron and Ermon, Stefano and Baccus, Stephen A and Hernandez-Boussard, Tina and R{\'e}, Christopher and Hsu, Patrick D and Hie, Brian L},
  journal={Science},
  volume={386},
  number={6723},
  pages={eado9336},
  year={2024},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ado9336}
}
proto-bio/proto-tools/proto_tools/tools/causal_models/evo1
| Function | Description |
| --- | --- |
| run_evo1_sample() | Sample DNA sequences using Evo1 language model (GPU) |
| run_evo1_score() | Score DNA sequences using Evo1 language model (GPU) |

Background

Evo1 (Nguyen et al., 2024) is a 7-billion-parameter DNA language model trained with an autoregressive objective: during training the model learns to predict the next nucleotide given all preceding nucleotides. Training used the OpenGenome dataset, roughly 2.7 million prokaryotic and phage genomes, so the model’s predictions are most reliable for bacterial, archaeal, and phage sequences and are not expected to transfer well to eukaryotic genomes.

It uses the StripedHyena architecture, a sequence model that combines convolutional state-space layers with a smaller number of attention layers. This design lets it process long stretches of DNA, up to 131,072 nucleotides for the long-context checkpoint, without the memory cost a pure attention model would incur at that length.

The autoregressive objective yields two capabilities directly. Sampling from the predicted next-nucleotide distributions produces new candidate sequences, and reading off the probabilities the model assigns to an existing sequence gives a likelihood score that reflects how closely the sequence matches the patterns seen during training. Alongside the base checkpoints, the authors released specialized variants trained on CRISPR loci and on transposable elements for those sequence types. Evo1 is the first model in the Evo family; Evo2 extends the approach to eukaryotic genomes and longer context.

Tools

Evo1 Sampling (evo1-sample)

Generates DNA sequences by autoregressive sampling. Given one or more prompt sequences, the model extends each prompt nucleotide by nucleotide, drawing each new nucleotide from the model’s predicted distribution under the configured temperature, top_k, and top_p settings, until max_new_tokens new nucleotides have been produced. Optionally returns a per-sequence likelihood score (log-likelihood, average log-likelihood, and perplexity) for the generated sequences.
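
The sampling loop described above can be sketched in plain Python. This is an illustration of the general technique, not the tool's implementation: `filter_distribution`, `sample_continuation`, and `toy_next_logits` are hypothetical names, the toy function stands in for the real model, and applying top_k before top_p is an assumption about ordering that the page does not state.

```python
import math
import random

def filter_distribution(logits, temperature=1.0, top_k=4, top_p=1.0):
    """Return {token: probability} after temperature, top-k, and top-p filtering."""
    # Temperature (must be > 0) rescales logits before the softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    # top-k: keep only the k most probable tokens (assumed to run before top-p).
    ranked = ranked[:top_k]
    # top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}

def toy_next_logits(context):
    # Hypothetical stand-in for the model: mildly favors repeating the last base
    # and gives a non-base token ("N") a low score.
    logits = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0, "N": -2.0}
    if context:
        logits[context[-1]] += 1.0
    return logits

def sample_continuation(next_logits, prompt, max_new_tokens, rng,
                        temperature=1.0, top_k=4, top_p=1.0):
    """Extend the prompt one token at a time; return only the new tokens."""
    generated = []
    for _ in range(max_new_tokens):
        context = prompt + "".join(generated)
        probs = filter_distribution(next_logits(context), temperature, top_k, top_p)
        tokens, weights = zip(*probs.items())
        generated.append(rng.choices(tokens, weights=weights, k=1)[0])
    return "".join(generated)

continuation = sample_continuation(toy_next_logits, "ATG", 20, random.Random(0))
```

With top_k=4 the low-scoring "N" token is never drawn in this toy setup, which mirrors the role the default plays in keeping generation on the four bases.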

API Reference

prompts
List[string]
required
Prompt sequences to condition generation on. Can be provided as a single string or a list of strings.
model_name
enum
default:"evo-1-8k-base"
Evo1 weights variant; evo-1-8k-* variants use an 8,192-token context, evo-1-131k-base extends to 131,072 tokens, and -crispr/-transposon are domain fine-tunes. Available options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon
top_k
integer
default:"4"
Limit sampling to the top-k most probable tokens at each step. Defaults to 4 (one per DNA base).
max_new_tokens
integer
default:"100"
Maximum number of new tokens to generate per prompt (excludes prompt).
cached_generation
boolean
default:"True"
Use the KV cache for autoregressive generation.
force_prompt_threshold
integer
default:"128"
Number of tokens to prefill in parallel before switching to autoregressive prompt forcing; lower values reduce peak memory.
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cuda"
Device to run the model on.
timeout
integer
default:"1800"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
prepend_prompt
boolean
default:"False"
Prepend the input prompt to each generated sequence; when False (the default), only newly generated tokens are returned.
temperature
number
default:"1.0"
Softmax temperature; lower values are more deterministic.
top_p
number
default:"1.0"
Nucleus sampling threshold over per-position token probabilities.
batch_size
integer
default:"1"
Number of prompts to process simultaneously on GPU.
Outputs
scores
array
Scoring metrics per sequence, including log_likelihood, avg_log_likelihood, and perplexity.
sequences
List[string]
required
Generated DNA sequences.

Applications

This tool produces candidate DNA sequences for downstream design and screening, including synthetic genes, regulatory regions, CRISPR systems (using the evo-1-8k-crispr checkpoint), and transposable elements (using the evo-1-8k-transposon checkpoint). The prompt sets the biological context for what follows, for example a start codon or promoter region.

Usage Tips

  • Match the checkpoint to the task. evo-1-8k-base (the default) is the general prokaryotic and phage DNA model and evo-1-131k-base is its genome-scale, long-context counterpart. evo-1-8k-crispr and evo-1-8k-transposon are task-specific variants of evo-1-8k-base for generating CRISPR-Cas systems and IS200/IS605 transposons; use them when generating those systems and a base checkpoint otherwise.
  • top_k defaults to 4, the size of the DNA alphabet. It exists mainly to keep generation on the four bases rather than other byte tokens, so it is not the diversity knob. Control diversity with temperature (lower values concentrate sampling on the model’s most probable nucleotides, higher values spread it across less probable ones) and leave top_p at its default unless you specifically want nucleus sampling.
  • Output excludes the prompt by default. prepend_prompt=False returns only the newly generated nucleotides, not the prompt joined to its continuation; set it True if you need the full sequence back.
  • Prompt length plus max_new_tokens must fit the checkpoint’s context window (8,192 nucleotides for the 8k checkpoints, 131,072 for evo-1-131k-base). The model cannot attend beyond that window, so a long prompt directly reduces how much can be generated.
  • Generated sequences are candidates. Validate them with downstream tools (for example ORF detection, structure prediction, or homology search) before drawing biological conclusions.
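
The context-window tip above reduces to simple arithmetic, sketched here as a pre-flight check. The helper names and the `CONTEXT_WINDOW` table are hypothetical, and the sketch assumes one token per nucleotide, as the tip implies. evo-1.5-8k-base is left out because this page does not state its window explicitly.

```python
# Context windows as stated on this page (tokens, assumed one per nucleotide).
CONTEXT_WINDOW = {
    "evo-1-8k-base": 8192,
    "evo-1-8k-crispr": 8192,
    "evo-1-8k-transposon": 8192,
    "evo-1-131k-base": 131072,
}

def max_generation_budget(prompt, model_name="evo-1-8k-base"):
    """Largest max_new_tokens that still fits alongside the prompt."""
    return max(0, CONTEXT_WINDOW[model_name] - len(prompt))

def fits_context(prompt, max_new_tokens, model_name="evo-1-8k-base"):
    """True when the prompt plus the requested continuation fits the window."""
    return len(prompt) + max_new_tokens <= CONTEXT_WINDOW[model_name]

prompt = "ATG" * 1000                      # 3,000-nucleotide prompt
budget = max_generation_budget(prompt)     # room left in the 8k window
```

Running a check like this before submitting a job makes it clear when a prompt is long enough to call for evo-1-131k-base instead.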

Evo1 Scoring (evo1-score)

Scores existing DNA sequences under the Evo1 model. For each sequence, it computes the model’s predicted probability of every nucleotide given the preceding nucleotides and aggregates these into a log-likelihood, an average log-likelihood per nucleotide, and a perplexity. Optionally returns the per-position logits and the token vocabulary.
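
To make the three aggregates concrete, here is a minimal sketch that derives them from per-position probabilities. It uses the standard definitions (natural logarithm, perplexity as the exponential of the negative average log-likelihood); the page does not state the tool's exact conventions, so treat the log base and the function name `score_from_probabilities` as assumptions.

```python
import math

def score_from_probabilities(token_probs):
    """Aggregate per-position probabilities P(x_i | x_<i) into sequence scores."""
    log_likelihood = sum(math.log(p) for p in token_probs)
    avg_log_likelihood = log_likelihood / len(token_probs)
    perplexity = math.exp(-avg_log_likelihood)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": perplexity,
    }

# A model that is indifferent among the four bases assigns 0.25 everywhere.
short = score_from_probabilities([0.25] * 10)
long = score_from_probabilities([0.25] * 20)
```

In this example the total log-likelihood of the longer sequence is twice that of the shorter one, while the average log-likelihood and the perplexity (4, up to floating-point rounding) are the same for both. That is why the length-normalized metrics are the ones to compare across sequences of different lengths.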

API Reference

sequences
List[string]
required
Sequences to score. Can be provided as a single string or a list of strings.
model_name
enum
default:"evo-1-8k-base"
Evo1 weights variant. Available options: evo-1.5-8k-base, evo-1-8k-base, evo-1-131k-base, evo-1-8k-crispr, evo-1-8k-transposon
verbose
integer
default:"0"
Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.
device
string
default:"cuda"
Device to run the model on.
timeout
integer
default:"1800"
Maximum execution time in seconds. None waits indefinitely.
seed
integer
Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
batch_size
integer
default:"1"
Number of sequences to process simultaneously on GPU. Larger batches improve throughput but use more GPU memory.
return_logits
boolean
default:"False"
Include per-position logits in the output.
Outputs
scores
List[CausalModelScoringMetrics]
required
List of scoring outputs, one per input sequence. Each entry is a Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.
Metrics (one set per scores item)
| Metric | Type | Range | Availability |
| --- | --- | --- | --- |
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |

Applications

This tool measures how well a DNA sequence matches the patterns the model learned from natural prokaryotic and phage genomes. Lower perplexity means the sequence is more consistent with that training distribution. Use it to rank or filter candidate sequences (including the output of evo1-sample), to compare variants of a sequence, or to flag sequences that fall far outside the model’s training domain.

Usage Tips

  • Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Each checkpoint learns its own distribution and the scores are not calibrated to a common scale, so do not directly compare values obtained with different model_name settings.
  • return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by a 512-token vocabulary).
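
A rough size estimate shows why the logits are worth leaving off by default. The sketch below multiplies sequence length by the 512-token vocabulary mentioned above; the 4 bytes per value (float32) is an assumption, since the page does not say which dtype the tool returns, and `logits_size_bytes` is a hypothetical helper.

```python
VOCAB_SIZE = 512  # vocabulary size stated in the tip above

def logits_size_bytes(sequence_length, bytes_per_value=4):
    """Estimated size of one sequence's logits; float32 (4 bytes) is assumed."""
    return sequence_length * VOCAB_SIZE * bytes_per_value

size_8k = logits_size_bytes(8192)        # a full 8k-context sequence
size_131k = logits_size_bytes(131072)    # a full long-context sequence
```

Under the float32 assumption this comes to 16 MiB for an 8,192-nucleotide sequence and 256 MiB for a 131,072-nucleotide one, per sequence, so the cost adds up quickly when scoring many sequences.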

Toolkit Notes

These apply to every Evo1 tool in this toolkit (evo1-sample, evo1-score).
  • Use a GPU. An NVIDIA GPU with at least 24 GB of memory is recommended; CPU execution is possible but too slow for typical use.
  • batch_size trades memory for throughput across both tools. It sets how many prompts (evo1-sample) or sequences (evo1-score) are processed per GPU forward pass. Raise it for higher throughput on many short sequences; lower it (default 1) if generation or scoring runs out of GPU memory.
  • Trained on prokaryotic and phage genomes. Predictions are most reliable within that domain. For eukaryotic genomes or longer context, use Evo2.
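
As a rough picture of what batch_size controls, the sketch below groups inputs the way a batched run would consume them. It illustrates the grouping only; `batched` is a hypothetical helper, and the tools handle batching internally when given batch_size.

```python
def batched(items, batch_size):
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sequences = ["ATGAAA", "ATGCCC", "ATGGGG", "ATGTTT", "ATGACG"]
groups = list(batched(sequences, 2))  # three groups: sizes 2, 2, and 1
```

Larger groups mean fewer trips to the GPU but more memory held at once, which is the trade-off described in the batch_size note above.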
Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.