Proto is not affiliated with Biohub. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
| Function | Description | |
|---|---|---|
| run_esm3_embeddings() | Extract protein sequence embeddings and logits using ESM3 (GPU) | Docs Source |
| run_esm3_sample() | Sample masked positions in protein sequences using ESM3 language model (GPU) | Docs Source |
| run_esm3_score() | Score protein sequences using ESM3 language model (GPU) | Docs Source |
Background
In 2025, Hayes et al. introduced ESM3, a generative model from EvolutionaryScale that departs from the encoder-only design of the ESM-1/ESM-2 line. ESM3 is a masked generative transformer that represents a protein across three simultaneous tracks (amino-acid sequence, discrete structure tokens, and function annotation). Training masks spans across all three tracks, so a single model can be prompted with any combination of partial sequence, structure, and function and asked to complete the rest. The flagship 98B-parameter model (esm3-large-2024-03) is available through the EvolutionaryScale Forge API under closed-beta access (also offered via AWS SageMaker); the publicly released open checkpoint, esm3_sm_open_v1, is the small 1.4B-parameter variant.
ESM3 is the multimodal successor to ESM-2 (Lin et al., 2023). Where ESM-2 is a sequence-only masked language model, ESM3 adds structure and function tracks and a generative objective. For pure sequence-embedding workloads ESM-2 remains lighter and faster; ESM3 is the choice when masked generative editing matters. This toolkit exposes only the sequence-track operations (embeddings, masked sampling, scoring) over supplied sequences.
Tools
ESM3 Sampling (esm3-sample)
Selects positions via a configurable masking strategy, masks them, and resamples from ESM3’s predicted distribution. single_pass fills every masked position in one forward pass; iterative_refinement dispatches to ESM3’s native batch_generate for multi-round commitment. Instead of using a masking strategy, positions can be pre-masked directly with _ in the input string.
API Reference
Input: MaskedModelInput
Config: ESM3SampleConfig
- … model.batch_generate and uses the five GenerationConfig knobs below. Available options: single_pass, iterative_refinement
- Available options: cosine, linear
- Available options: random, entropy
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: ESM3SampleOutput
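The cosine and linear schedule options can be pictured with a small sketch. It assumes the usual masked-generation convention in which the fraction of positions still masked after a fraction t of the rounds is cos(πt/2) for cosine and 1 - t for linear. The helper names masked_fraction and commits_per_round are hypothetical, and ESM3's batch_generate may round or order its commitments differently.

```python
import math


def masked_fraction(step: int, num_steps: int, schedule: str = "cosine") -> float:
    """Fraction of the initially masked positions still masked after `step` rounds."""
    progress = step / num_steps
    if schedule == "cosine":
        return math.cos(0.5 * math.pi * progress)
    if schedule == "linear":
        return 1.0 - progress
    raise ValueError(f"unknown schedule: {schedule!r}")


def commits_per_round(num_masked: int, num_steps: int, schedule: str = "cosine") -> list[int]:
    """How many masked positions get committed in each round (illustrative only)."""
    remaining = [
        round(num_masked * masked_fraction(step, num_steps, schedule))
        for step in range(num_steps + 1)
    ]
    remaining[-1] = 0  # guard against floating-point residue at the final step
    return [remaining[i] - remaining[i + 1] for i in range(num_steps)]


if __name__ == "__main__":
    for name in ("cosine", "linear"):
        print(name, commits_per_round(num_masked=12, num_steps=4, schedule=name))
```

Under this assumption the cosine schedule commits few positions in early rounds and more in later ones, while the linear schedule commits an equal share each round.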
Applications
This tool drives guided point mutation, variant generation, and infilling at designable sites. Resampling masked positions from a protein language model is the core operation behind directed-evolution proposals and antibody affinity maturation. Which positions are resampled is set by the masking strategy; see its README for the available selection methods and tuning knobs.
Usage Tips
- iterative_refinement produces more coherent joint samples than single_pass. It runs ESM3’s batch_generate over num_steps rounds (cosine or linear unmask schedule) instead of filling every mask independently in one pass; it is roughly num_steps× slower. Default to it when masking more than a handful of sites.
- masking_strategy controls which positions get masked before sampling. See the masking strategy README for the available selection methods and tuning knobs. As an alternative to passing a strategy, pre-mask exact positions with _ directly in the input string and the masking strategy is skipped entirely.
- temperature scales the per-position logits before sampling. Values of 0.5 to 0.7 yield conservative mutations close to the input; values above 1.0 broaden exploration of the model’s distribution.
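To make pre-masking and temperature concrete, here is a minimal, model-free sketch. It marks positions with _ and then samples each masked position independently from temperature-scaled logits, in the spirit of single_pass. The helpers premask, softmax_with_temperature, and fill_masks are hypothetical names, the logits are made-up numbers standing in for model output, and dividing logits by the temperature is the standard convention assumed here, not a confirmed detail of the toolkit.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"


def premask(sequence: str, positions: list[int]) -> str:
    """Replace the residues at the given 0-based positions with the mask character."""
    chars = list(sequence)
    for i in positions:
        chars[i] = MASK
    return "".join(chars)


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then normalise to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fill_masks(masked: str, logits_at: dict[int, list[float]],
               temperature: float, rng: random.Random) -> str:
    """Sample one residue per masked position, each independently of the others."""
    out = list(masked)
    for i, ch in enumerate(masked):
        if ch == MASK:
            probs = softmax_with_temperature(logits_at[i], temperature)
            out[i] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    masked = premask("MKTAYIAK", [2, 5])
    # Made-up logits standing in for the model's per-position output.
    toy_logits = {i: [rng.gauss(0.0, 2.0) for _ in AMINO_ACIDS] for i in (2, 5)}
    print(masked, "->", fill_masks(masked, toy_logits, temperature=0.6, rng=rng))
```

Lower temperatures concentrate probability on the highest-logit residue, which is why values below 1.0 tend to give conservative proposals.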
ESM3 Scoring (esm3-score)
Computes masked-language-model pseudo-perplexity for each input sequence. Each position is masked individually and the model’s log-probability of the true amino acid under bidirectional context is recorded, then aggregated into per-sequence log-likelihood, average log-likelihood, and perplexity.
API Reference
Input: MaskedModelInput
Config: ESM3ScoringConfig
- None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.

Output: MaskedModelScoringOutput
Each scores item is a Metrics subclass with scalar metrics (accessed via score.perplexity or score["perplexity"]) plus declared logits / vocab fields that carry raw model outputs when requested.

| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
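The three metrics are related by simple aggregation, sketched below. The sketch assumes natural-log probabilities and the standard definition of perplexity as exp of the negative average log-likelihood, which is consistent with the ranges in the table but is not confirmed by the toolkit's source here. aggregate_scores is a hypothetical helper, and the per-position log-probabilities would come from the model.

```python
import math

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def aggregate_scores(sequence: str, true_residue_log_probs: list[float]) -> dict[str, float]:
    """Aggregate per-position log-probabilities of the true residue into sequence metrics.

    Positions holding non-canonical residues (X, B, Z, ...) are skipped in both
    the sum and the position count.
    """
    if len(sequence) != len(true_residue_log_probs):
        raise ValueError("need exactly one log-probability per residue")
    kept = [lp for aa, lp in zip(sequence, true_residue_log_probs) if aa in CANONICAL]
    if not kept:
        raise ValueError("sequence has no canonical residues to score")
    log_likelihood = sum(kept)
    avg_log_likelihood = log_likelihood / len(kept)
    return {
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": math.exp(-avg_log_likelihood),
    }


if __name__ == "__main__":
    # Toy probabilities for a three-residue sequence; the X position is ignored.
    toy = [math.log(0.5), math.log(0.25), math.log(0.1)]
    print(aggregate_scores("ACX", toy))
```

Because every log-probability is at most 0, the sum and average are at most 0 and the perplexity is at least 1, matching the table.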
Applications
ESM3 pseudo-perplexity is a fitness proxy for ranking variants, filtering generated sequences for naturalness, or comparing engineered constructs against wild type. The masked log-likelihood difference between wild-type and mutant residues is a zero-shot baseline for variant-effect prediction.
Usage Tips
- Pseudo-perplexity is a relative score, not an absolute fitness. It is measured against the model’s training distribution and is sensitive to length, so it is most useful for comparing closely related sequences of similar length.
- Ambiguous residues are excluded. Perplexity is computed only over the 20 canonical amino acids; X, B, Z, and similar are dropped from both the log-likelihood sum and the position count.
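The wild-type versus mutant comparison mentioned under Applications can be sketched as a masked-marginal score. This is an illustrative sketch, not the toolkit's API: masked_marginal_score and log_softmax are hypothetical helpers, the logits are toy values for a single masked position, and the sign convention (mutant minus wild type, so positive means the model prefers the mutant) is an assumption.

```python
import math


def log_softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert per-residue logits at one position into log-probabilities."""
    peak = max(logits.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return {aa: v - log_z for aa, v in logits.items()}


def masked_marginal_score(position_logits: dict[str, float],
                          wild_type: str, mutant: str) -> float:
    """Log-probability of the mutant minus that of the wild type at a masked position."""
    log_probs = log_softmax(position_logits)
    return log_probs[mutant] - log_probs[wild_type]


if __name__ == "__main__":
    # Toy logits over a reduced alphabet, standing in for model output at one site.
    toy_logits = {"A": 2.0, "G": 1.0, "P": -1.0}
    print(masked_marginal_score(toy_logits, wild_type="A", mutant="G"))
```

Since both log-probabilities share the same normaliser, the score reduces to the difference of the two logits; it compares residues at one site and inherits the caveat above about being a relative score.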
Toolkit Notes
These apply to every ESM3 tool in this toolkit (esm3-embedding, esm3-sample, esm3-score).
- ESM3 is a gated model and requires a HuggingFace token. The open checkpoint lives behind a gated HuggingFace repo (EvolutionaryScale/esm3-sm-open-v1). Set the HF_TOKEN environment variable with an account that has accepted the model license, or every tool raises before loading.
- One open checkpoint is available. esm3_sm_open_v1 is the only public open-weights checkpoint; larger ESM3 models are EvolutionaryScale API-only and not wrapped here.
- ESM3 is larger than many ESM-2 variants. For sequence-embedding-only workloads, smaller ESM-2 variants are faster; consider reaching for ESM3 when you want masked generative editing. This toolkit takes only amino-acid sequences as input and does not expose the structure or function tracks.
- batch_size controls memory usage across the toolkit. Lower it if you OOM; raise it for short-sequence throughput. For esm3-score, batch_size counts masked variants pooled across all input sequences rather than sequences themselves (each input contributes one masked variant per position).
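As a rough aid for choosing batch_size with esm3-score, the sketch below estimates how many forward batches a scoring call implies. It assumes exactly one masked variant per residue, as described above, and ignores any special handling of ambiguous residues; plan_score_batches is a hypothetical helper, not part of the toolkit.

```python
import math


def plan_score_batches(sequences: list[str], batch_size: int) -> tuple[int, int]:
    """Return (total masked variants, number of batches) for a scoring call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # One masked variant per position, pooled across all input sequences.
    total_variants = sum(len(seq) for seq in sequences)
    return total_variants, math.ceil(total_variants / batch_size)


if __name__ == "__main__":
    variants, batches = plan_score_batches(["MKTAYIAK", "ACDE"], batch_size=5)
    print(f"{variants} masked variants in {batches} batches")
```

Under these assumptions, scoring cost grows with the total number of residues, so a few long sequences can dominate a call even when the sequence count is small.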
