ESM3 - Proto

License: ESM3 code is released under a custom license (Cambrian Open License Agreement) and the model weights under a separate custom license (Cambrian Non-Commercial License Agreement). These licenses restrict commercial use and may require explicit attribution. Model weights are gated: access requires accepting the provider’s terms and authenticating with a HuggingFace token. Please refer to the code license and model weights license for full terms.

Proto is not affiliated with Biohub. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.

GitHub: evolutionaryscale/esm (2.3k stars)

HuggingFace: EvolutionaryScale/esm3-sm-open-v1

Simulating 500 million years of evolution with a language model

Thomas Hayes, Roshan Rao, … Marius Wiber

Science (2025)

@article{hayes2025esm3,
  title={Simulating 500 million years of evolution with a language model},
  author={Hayes, Thomas and Rao, Roshan and Akin, Halil and Sofroniew, Nicholas J and Oktay, Deniz and Lin, Zeming and Verkuil, Robert and Tran, Vincent Q and Deaton, Jonathan and Wiber, Marius and others},
  journal={Science},
  volume={387},
  number={6735},
  pages={eads0018},
  year={2025},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ads0018}
}

Tool source: proto-bio/proto-tools/proto_tools/tools/masked_models/esm3

Function	Description
`run_esm3_embeddings()`	Extract protein sequence embeddings and logits using ESM3 (GPU)
`run_esm3_sample()`	Sample masked positions in protein sequences using ESM3 language model (GPU)
`run_esm3_score()`	Score protein sequences using ESM3 language model (GPU)

Background

In 2025, Hayes et al. introduced ESM3, a generative model from EvolutionaryScale that departs from the encoder-only design of the ESM-1/ESM-2 line. ESM3 is a masked generative transformer that represents a protein across three simultaneous tracks: amino-acid sequence, discrete structure tokens, and function annotation. Training masks spans across all three tracks, so a single model can be prompted with any combination of partial sequence, structure, and function and asked to complete the rest. The flagship 98B-parameter model (esm3-large-2024-03) is available through the EvolutionaryScale Forge API under closed-beta access (also offered via AWS SageMaker); the publicly released open checkpoint, esm3_sm_open_v1, is the small 1.4B-parameter variant.

ESM3 is the multimodal successor to ESM-2 (Lin et al., 2023). Where ESM-2 is a sequence-only masked language model, ESM3 adds structure and function tracks and a generative objective. For pure sequence-embedding workloads ESM-2 remains lighter and faster; ESM3 is the choice when masked generative editing matters. This toolkit exposes only the sequence-track operations (embeddings, masked sampling, scoring) over supplied sequences.

Tools

ESM3 Embeddings (`esm3-embedding`)

Runs a single forward pass over ESM3 and mean-pools the per-residue hidden states into a fixed-length sequence descriptor. Per-position amino-acid logits are returned on request.
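The pooling step can be pictured with a minimal sketch, assuming the per-residue hidden states arrive as a list of vectors together with a binary attention mask like the one in the output schema. `mean_pool` is a hypothetical helper written for illustration, not a toolkit function, and the toolkit's own handling of special tokens may differ.

```python
from typing import List


def mean_pool(hidden: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average per-residue vectors over positions where attention_mask == 1."""
    if len(hidden) != len(attention_mask):
        raise ValueError("hidden states and mask must have the same length")
    # Keep only real positions; padding rows (mask == 0) are ignored.
    valid = [vec for vec, keep in zip(hidden, attention_mask) if keep == 1]
    if not valid:
        raise ValueError("no valid positions to pool")
    dim = len(valid[0])
    return [sum(vec[d] for vec in valid) / len(valid) for d in range(dim)]


# Toy hidden states: two real residues and one padding row.
hidden_states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = mean_pool(hidden_states, [1, 1, 0])
print(pooled)  # [2.0, 3.0]
```

The output length depends only on the hidden dimension, which is why sequences of different lengths end up with descriptors of the same size.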

API Reference

Input: MaskedModelInput

sequences

List[string]

required

Protein sequence(s) to process. Can be provided as:

Config: ESM3EmbeddingsConfig

model_checkpoint

string

default:"esm3_sm_open_v1"

ESM3 weights variant. Currently "esm3_sm_open_v1" is the only public open-weights checkpoint.

return_logits

boolean

default:"False"

Include per-position logits in the output (large; disable to save memory).

repr_layer

integer

default:"-1"

Transformer layer index for embeddings. -1 returns the post-norm output of the last block (matching ESM-2/ESMC -1 semantics); other indices select pre-norm per-block hidden states. Both are captured via a forward hook on model.transformer because ESM3.forward discards them.

verbose

integer

default:"0"

Print status messages during model execution.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

Random seed. When set, tools run reproducibly up to small GPU float noise (see BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

batch_size

integer

default:"1"

Number of sequences to process in parallel. Larger batches improve throughput but require more GPU memory.

Output: ESM3EmbeddingsOutput

results

List[SequenceEmbedding]

required

Per-sequence embedding results. Each SequenceEmbedding contains:

mean_embedding

List[number]

required

Mean-pooled embedding vector for one sequence.

attention_mask

List[integer]

required

Binary mask indicating valid positions (1) vs padding (0).

logits

array

Optional per-position amino acid logits for one sequence.

projection

Projection2D

Optional 2D coordinate from a UMAP projection of all embeddings in the same call. Populated when n_sequences >= 4; None otherwise (single-point or 2-3-point UMAP is meaningless).

Applications

The mean-pooled embedding is a learned protein representation for downstream supervised tasks such as clustering, classification, and property regression, and powers similarity search through cosine similarity on the mean vector.
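As a rough illustration of the similarity-search use case, the sketch below ranks a small library of mean embeddings against a query by cosine similarity. The vectors are toy values and `rank_by_similarity` is a hypothetical helper; in practice the vectors would come from the `mean_embedding` field of each result.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-D "embeddings" standing in for real mean-pooled vectors.
library = {"a": [2.0, 0.0], "b": [0.0, 3.0], "c": [1.0, 1.0]}
ranked = rank_by_similarity([1.0, 0.0], library)
print([name for name, _ in ranked])  # ['a', 'c', 'b']
```

Cosine similarity ignores vector magnitude, so `a` scores 1.0 against the query even though it is twice as long.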

Usage Tips

repr_layer selects which transformer layer is mean-pooled. The default -1 returns the post-norm output of the last block (matching ESM-2/ESMC -1 semantics); other indices select pre-norm per-block hidden states, captured via a forward hook because ESM3.forward discards them.
Per-position logits are large. Enabling return_logits adds a per-position vocabulary-sized float tensor per sequence, dominating wall time and memory on long inputs. Leave it False unless the per-position distribution is needed.
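To get a feel for the size difference, here is a back-of-the-envelope sketch. It assumes 4-byte floats, uses the vocabulary size of 20 quoted for the sampling output, and assumes a hidden width of 1536 for the pooled vector; all three numbers are illustrative assumptions, so check them against the arrays you actually receive.

```python
def logits_bytes(seq_len: int, vocab_size: int, bytes_per_value: int = 4) -> int:
    """Size of a (seq_len, vocab_size) logits array for one sequence."""
    return seq_len * vocab_size * bytes_per_value


def mean_embedding_bytes(embed_dim: int, bytes_per_value: int = 4) -> int:
    """Size of one mean-pooled vector; independent of sequence length."""
    return embed_dim * bytes_per_value


# Assumed values for illustration only.
per_sequence_logits = logits_bytes(seq_len=1000, vocab_size=20)
per_sequence_embedding = mean_embedding_bytes(embed_dim=1536)
print(per_sequence_logits, per_sequence_embedding)  # 80000 6144
```

The logits term grows linearly with sequence length while the pooled vector stays constant, which is the reason the flag is off by default.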

ESM3 Sampling (`esm3-sample`)

Selects positions via a configurable masking strategy, masks them, and resamples them from ESM3’s predicted distribution. Alternatively, pre-mask positions directly with _ in the input string, in which case the masking strategy is skipped. single_pass fills every masked position in one forward pass; iterative_refinement dispatches to ESM3’s native batch_generate for multi-round commitment.
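Pre-masking with _ is plain string manipulation. The sketch below is one way to build such an input; `premask` is a hypothetical helper, and its 1-indexed positions are an assumption chosen to match the convention used by fixed_positions.

```python
from typing import Iterable

MASK_CHAR = "_"  # mask symbol accepted in input strings, per the description above


def premask(sequence: str, positions: Iterable[int]) -> str:
    """Replace the residues at the given 1-indexed positions with the mask character."""
    chars = list(sequence)
    for pos in positions:
        if not 1 <= pos <= len(chars):
            raise ValueError(f"position {pos} is outside 1..{len(chars)}")
        chars[pos - 1] = MASK_CHAR
    return "".join(chars)


masked = premask("MKTAYIAK", [2, 5])
print(masked)  # M_TA_IAK
```

The resulting string can be passed in `sequences` like any other input.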

API Reference

Input: MaskedModelInput

sequences

List[string]

required

Protein sequence(s) to process. Can be provided as:

Config: ESM3SampleConfig

masking_strategy

MaskingStrategy

Positions to mask before sampling.

MaskingStrategy fields (method through the nested model_checkpoint):

method

enum

default:"random"

Scoring method for position selection. "random": uniform random, "entropy": highest model uncertainty, "max-logit": lowest model confidence. Available options: random, entropy, max-logit

num_mutations

integer

Exact number of positions to mask per sequence.

mask_fraction

number

Fraction of designable positions to mask (e.g. 0.15 for ~15%).

fixed_positions

array

1-indexed positions that must NOT be masked. Applied uniformly to all sequences.

temperature

number

default:"1.0"

Temperature for position selection. < 1.0 is greedy, 1.0 uses scores as-is, > 1.0 is more uniform.

model_name

string

Which masked model to use for scoring. Defaults to the sampling tool’s model when unset.

model_checkpoint

string

Model checkpoint override (uses tool default if None).

model_checkpoint

string

default:"esm3_sm_open_v1"

ESM3 weights variant.

sampling_method

enum

default:"single_pass"

"single_pass" fills every mask in one forward pass; "iterative_refinement" dispatches to model.batch_generate and uses the five GenerationConfig knobs below. Available options: single_pass, iterative_refinement

temperature

number

default:"1.0"

Softmax temperature.

top_p

number

default:"1.0"

Nucleus threshold (iterative only).

num_steps

integer

default:"20"

Refinement steps (iterative only).

schedule

enum

default:"cosine"

Unmask schedule (iterative only). Available options: cosine, linear

strategy

enum

default:"random"

Per-round commit selection (iterative only). Available options: random, entropy

temperature_annealing

boolean

default:"True"

Anneal toward 0 across rounds (iterative only).

return_logits

boolean

default:"False"

Include per-position logits.

verbose

integer

default:"0"

Verbosity level (0=quiet, 1=info, 2=debug, 3=raw subprocess stderr). True is coerced to 1 and False to 0.

device

string

default:"cuda"

Device to run on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

batch_size

integer

default:"1"

Sequences per GPU forward pass.
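The interaction between num_steps and schedule in the config above can be pictured with a small sketch. It uses one common formulation of cosine and linear unmasking schedules; this is an assumption for illustration and is not confirmed to match what batch_generate does internally. `masked_fraction` and `commits_per_round` are hypothetical names.

```python
import math
from typing import List


def masked_fraction(step: int, num_steps: int, schedule: str = "cosine") -> float:
    """Fraction of the originally masked positions still masked after `step` rounds."""
    progress = step / num_steps
    if schedule == "cosine":
        return math.cos(math.pi / 2 * progress)
    if schedule == "linear":
        return 1.0 - progress
    raise ValueError(f"unknown schedule: {schedule}")


def commits_per_round(num_masked: int, num_steps: int, schedule: str = "cosine") -> List[int]:
    """How many positions are committed (unmasked) in each round."""
    remaining = num_masked
    commits = []
    for step in range(1, num_steps + 1):
        target = round(num_masked * masked_fraction(step, num_steps, schedule))
        if step == num_steps:
            target = 0  # the final round commits everything that is left
        target = min(target, remaining)
        commits.append(remaining - target)
        remaining = target
    return commits


cosine_plan = commits_per_round(20, 4, "cosine")
linear_plan = commits_per_round(20, 4, "linear")
print(cosine_plan, linear_plan)
```

Under this formulation the cosine schedule commits few positions early and more later, while the linear schedule commits an equal share each round.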

Output: ESM3SampleOutput

logits

array

Per-position logits for each sequence. Shape is (num_sequences, seq_len, vocab_size=20). Only present if return_logits=True in config.

sequences

List[string]

required

Sampled or mutated protein sequences. Each sequence is a string of amino acid characters and is a modified version of the input sequence with masked positions changed to model-predicted alternatives.

Applications

This tool drives guided point mutation, variant generation, and infilling at designable sites. Resampling masked positions from a protein language model is the core operation behind directed-evolution proposals and antibody affinity maturation. Which positions are resampled is set by the masking strategy; see its README for the available selection methods and tuning knobs.
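To make the idea of a masking strategy concrete, the sketch below picks positions from per-position scores (for example, entropies) while honouring mask_fraction, fixed_positions, and a selection temperature. It is an illustrative assumption about how such a selector could work, not the toolkit's implementation; `select_mask_positions` and its softmax weighting are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def select_mask_positions(
    scores: Sequence[float],
    mask_fraction: float,
    fixed_positions: Sequence[int] = (),
    temperature: float = 1.0,
    seed: Optional[int] = None,
) -> List[int]:
    """Pick 1-indexed positions to mask, favouring positions with high scores."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    fixed = set(fixed_positions)
    # Designable positions are everything not explicitly protected.
    pool = [pos for pos in range(1, len(scores) + 1) if pos not in fixed]
    count = min(len(pool), round(mask_fraction * len(pool)))
    if count <= 0:
        return []
    # Softmax over temperature-scaled scores; low temperature sharpens the choice.
    scaled = [scores[pos - 1] / temperature for pos in pool]
    top = max(scaled)
    weights = [math.exp(value - top) for value in scaled]
    rng = random.Random(seed)
    chosen: List[int] = []
    for _ in range(count):  # weighted sampling without replacement
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(index))
        weights.pop(index)
    return sorted(chosen)


entropy_scores = [0.1, 0.2, 1.5, 0.3, 2.0, 0.4, 1.8, 0.2, 0.1, 0.6]
positions = select_mask_positions(
    entropy_scores, mask_fraction=0.25, fixed_positions=[1, 2], temperature=0.5, seed=0
)
print(positions)
```

Passing identical scores for every position reduces this to uniform random selection, which corresponds to the "random" method.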

Usage Tips

iterative_refinement produces more coherent joint samples than single_pass. It runs ESM3’s batch_generate over num_steps rounds (cosine or linear unmask schedule) instead of filling every mask independently in one pass; it is roughly num_steps× slower. Default to it when masking more than a handful of sites.
masking_strategy controls which positions get masked before sampling. See the masking strategy README for the available selection methods and tuning knobs. As an alternative to passing a strategy, pre-mask exact positions with _ directly in the input string and the masking strategy is skipped entirely.
temperature scales the per-position logits before sampling. Values of 0.5 to 0.7 yield conservative mutations close to the input; values above 1.0 broaden exploration of the model’s distribution.
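The effect of temperature on the sampling distribution can be seen with a few lines of arithmetic. The sketch below applies a temperature-scaled softmax to toy logits; the three-token vocabulary and the logit values are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]
conservative = softmax_with_temperature(toy_logits, temperature=0.5)
neutral = softmax_with_temperature(toy_logits, temperature=1.0)
exploratory = softmax_with_temperature(toy_logits, temperature=2.0)

# Probability assigned to the top-scoring token at each temperature.
print(round(conservative[0], 3), round(neutral[0], 3), round(exploratory[0], 3))
```

Lower temperatures push probability mass onto the top-scoring residue, which is why they tend to stay close to the input; higher temperatures flatten the distribution.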

ESM3 Scoring (`esm3-score`)

Computes masked-language-model pseudo-perplexity for each input sequence. Each position is masked individually and the model’s log-probability of the true amino acid under bidirectional context is recorded, then aggregated into per-sequence log-likelihood, average log-likelihood, and perplexity.
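The aggregation step can be sketched as follows, assuming the standard definitions (natural-log probabilities, mean over positions, perplexity as the exponential of the negative mean). The per-position values here are toy numbers; in the tool they come from masking each position in turn and reading off the model's log-probability for the true residue.

```python
import math
from typing import Dict, Sequence


def pseudo_perplexity(true_token_log_probs: Sequence[float]) -> Dict[str, float]:
    """Aggregate per-position log-probabilities into sequence-level metrics."""
    if not true_token_log_probs:
        raise ValueError("need at least one scored position")
    log_likelihood = sum(true_token_log_probs)
    avg_log_likelihood = log_likelihood / len(true_token_log_probs)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),
    }


# Toy log-probabilities for a four-residue sequence.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
metrics = pseudo_perplexity(log_probs)
print(metrics)
```

Because every log-probability is at most zero, the log-likelihood terms are non-positive and the perplexity is at least 1, matching the ranges listed for the metrics.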

API Reference

Input: MaskedModelInput

sequences

List[string]

required

Protein sequence(s) to process. Can be provided as:

Config: ESM3ScoringConfig

model_checkpoint

string

default:"esm3_sm_open_v1"

ESM3 weights variant.

verbose

integer

default:"0"

Print status messages during scoring.

device

string

default:"cuda"

Device to run the model on.

timeout

integer

default:"600"

Maximum execution time in seconds. None waits indefinitely.

seed

integer

batch_size

integer

default:"1"

Masked variants per forward pass, pooled across all input sequences. Larger batches improve throughput but use more memory.

return_logits

boolean

default:"False"

Include per-position logits in the output (large; disable to save memory).

Output: MaskedModelScoringOutput

scores

List[MaskedModelScoringMetrics]

required

List of scoring outputs, one per input sequence. Each entry is a Metrics subclass with scalar metrics (accessed via score.perplexity or score["perplexity"]) plus declared logits / vocab fields that carry raw model outputs when requested.

MaskedModelScoringMetrics fields:

logits

array

Per-position logits array (seq_len, vocab_size). None unless return_logits=True.

vocab

array

Token ordering for logits.

primary_metric

string

Name of the metric that best summarizes the result overall (e.g. "avg_plddt" for AlphaFold2). Used by downstream UI and reporting to pick a headline value.

Metrics (one set per scores item)

Metric	Type	Range	Availability
`log_likelihood`	float	≤ 0.0	always
`avg_log_likelihood`	float	≤ 0.0	always
`perplexity`	float	≥ 1.0	always

Applications

ESM3 pseudo-perplexity is a fitness proxy for ranking variants, filtering generated sequences for naturalness, or comparing engineered constructs against wild type. The masked log-likelihood difference between wild-type and mutant residues is a zero-shot baseline for variant-effect prediction.
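The wild-type versus mutant comparison reduces to a difference of two log-probabilities at the masked position. The sketch below assumes one row of logits plus the matching vocab ordering, as returned when return_logits is enabled; `mutation_score`, the three-letter vocabulary, and the sign convention (negative means the mutant is less likely than wild type) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(value - top) for value in logits))
    return [value - log_sum for value in logits]


def mutation_score(
    logits_at_position: Sequence[float],
    vocab: Sequence[str],
    wild_type: str,
    mutant: str,
) -> float:
    """log p(mutant) - log p(wild type) at a single masked position."""
    log_probs = dict(zip(vocab, log_softmax(logits_at_position)))
    return log_probs[mutant] - log_probs[wild_type]


# Toy logits over a three-letter vocabulary.
score = mutation_score([2.0, 0.5, 1.0], ["A", "G", "L"], wild_type="A", mutant="G")
print(round(score, 3))  # -1.5
```

The normalising term cancels in the difference, so the score equals the raw logit gap between the two residues.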

Usage Tips

Pseudo-perplexity is a relative score, not an absolute fitness. It is measured against the model’s training distribution and is sensitive to length, so it is most useful for comparing closely related sequences of similar length.
Ambiguous residues are excluded. Perplexity is computed only over the 20 canonical amino acids; X, B, Z, and similar are dropped from both the log-likelihood sum and the position count.
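The exclusion rule in the second tip can be sketched as a filter applied before aggregation. The helper names below are hypothetical and the log-probabilities are toy values; the point is only that ambiguous positions drop out of both the sum and the count.

```python
import math
from typing import List, Sequence

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 canonical amino acids


def scoreable_positions(sequence: str) -> List[int]:
    """0-indexed positions whose residue is a canonical amino acid."""
    return [i for i, residue in enumerate(sequence) if residue.upper() in CANONICAL]


def perplexity_over_canonical(sequence: str, log_probs: Sequence[float]) -> float:
    """Perplexity using only canonical positions in both the sum and the count."""
    keep = scoreable_positions(sequence)
    if not keep:
        raise ValueError("sequence has no canonical residues to score")
    total = sum(log_probs[i] for i in keep)
    return math.exp(-total / len(keep))


sequence = "MKXBA"
toy_log_probs = [math.log(0.5), math.log(0.5), math.log(0.001), math.log(0.001), math.log(0.5)]
kept = scoreable_positions(sequence)
ppl = perplexity_over_canonical(sequence, toy_log_probs)
print(kept, round(ppl, 3))  # [0, 1, 4] 2.0
```

Without the filter, the two very unlikely ambiguous positions would inflate the perplexity well beyond 2.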

Toolkit Notes

These apply to every ESM3 tool in this toolkit (esm3-embedding, esm3-sample, esm3-score).

ESM3 is a gated model and requires a HuggingFace token. The open checkpoint lives behind a gated HuggingFace repo (EvolutionaryScale/esm3-sm-open-v1). Set the HF_TOKEN environment variable using an account that has accepted the model license; otherwise every tool raises an error before loading the model.
One open checkpoint is available. esm3_sm_open_v1 is the only public open-weights checkpoint; larger ESM3 models are EvolutionaryScale API-only and not wrapped here.
ESM3 is larger than many ESM-2 variants. For sequence-embedding-only workloads, smaller ESM-2 variants are faster; consider reaching for ESM3 when you want masked generative editing. This toolkit takes only amino-acid sequences as input and does not expose the structure or function tracks.
batch_size controls memory usage across the toolkit. Lower it if you OOM; raise it for short-sequence throughput. For esm3-score, batch_size counts masked variants pooled across all input sequences rather than sequences themselves (each input contributes one masked variant per position).
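For esm3-score, the batch_size note above implies a simple cost estimate. The sketch assumes one masked variant per residue and that variants are packed into full batches across sequences; it ignores excluded ambiguous residues and any special tokens, and the function names are hypothetical.

```python
import math
from typing import Sequence


def masked_variant_count(sequences: Sequence[str]) -> int:
    """One masked variant per position, summed over all input sequences."""
    return sum(len(sequence) for sequence in sequences)


def forward_passes(sequences: Sequence[str], batch_size: int) -> int:
    """Number of batched forward passes needed to score every masked variant."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(masked_variant_count(sequences) / batch_size)


# Two toy sequences of length 120 and 80 give 200 masked variants.
inputs = ["A" * 120, "C" * 80]
print(masked_variant_count(inputs), forward_passes(inputs, batch_size=16))  # 200 13
```

Scoring cost therefore scales with total residue count rather than with the number of sequences.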

Example notebook: See the full working example for a copy-paste-ready walkthrough.

Infrastructure Guides

The following guides cover how to run tools efficiently and at scale.

Tool Persistence

Keep a tool’s model warm across calls instead of reloading it every invocation.

Device Management

How GPUs are allocated to tools and how to target specific devices.

Parallel Execution

Fan a batch of inputs out across multiple GPUs.

Cloud Inference

Run tools on managed cloud infrastructure with no local setup.
