ESM3 Protein Language Model
License: ESM3 code is released under a custom license (Cambrian Open License Agreement), and the model weights under a separate custom license (Cambrian Non-Commercial License Agreement). These licenses restrict commercial use and may require explicit attribution. Model weights are gated: you must accept the provider’s terms and authenticate with a Hugging Face token. Refer to the code license and model weights license for full terms.

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


proto-bio/proto-language/proto_language/generator/esm3_generator.py
@article{hayes2025esm3,
  title={Simulating 500 million years of evolution with a language model},
  author={Hayes, Thomas and Rao, Roshan and Akin, Halil and Sofroniew, Nicholas J and Oktay, Deniz and Lin, Zeming and Verkuil, Robert and Tran, Vincent Q and Deaton, Jonathan and Wiggert, Marius and others},
  journal={Science},
  volume={387},
  number={6735},
  pages={eads0018},
  year={2025},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ads0018}
}
Protein sequence mutation/refinement generator using the open-source ESM3 protein language model. It refines existing protein sequences through iterative mutation: it masks positions according to the configured masking strategy, then samples biologically plausible amino acids at those positions. The generator category is "mutation", meaning it refines proposal sequences through targeted mutations.

API Reference

Config: ESM3GeneratorConfig
Configuration object for ESM3Generator. This class defines configuration parameters for the ESM3 generator, which uses the open-source ESM3 protein language model to refine existing protein sequences through iterative mutation of masked positions. In Proto Language, ESM3 is registered as a mutation-category generator that edits the supplied starting sequence; the segment must carry a sequence (directly or from a prior optimizer stage).
ESM3 is the open-source version of EvolutionaryScale’s protein language model.
model_checkpoint
string
default:"esm3_sm_open_v1"
ESM3 model variant to load.
masking_strategy
MaskingStrategy
Controls which positions to mask for sampling. Default: random 30%.
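To make the default concrete, here is a minimal plain-Python sketch of what masking a random 30% of positions can look like. It does not use `MaskingStrategy`; the helper name `random_mask_positions` and its rounding rule are assumptions made for illustration only.

```python
import random


def random_mask_positions(length, fraction=0.3, seed=None):
    """Pick round(length * fraction) distinct positions to mask.

    Hypothetical helper: the real masking logic lives in MaskingStrategy
    and may round or sample differently.
    """
    rng = random.Random(seed)
    k = round(length * fraction)
    # sample() draws without replacement, so no position is masked twice.
    return sorted(rng.sample(range(length), k))


positions = random_mask_positions(100, fraction=0.3, seed=0)
print(len(positions))  # number of masked positions in a 100-residue sequence
```

Only the selected positions would be re-sampled by the model; all other residues of the starting sequence stay fixed.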
sampling_method
enum
default:"single_pass"
‘single_pass’ fills all masks in one forward pass; ‘iterative_refinement’ uses ESM3 batch generation. Options: single_pass, iterative_refinement
temperature
number
default:"1.0"
Scales the randomness of sampling by adjusting probability distribution sharpness.
top_p
number
default:"1.0"
Nucleus sampling threshold; 1.0 disables nucleus sampling.
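The interaction between `temperature` and `top_p` can be sketched with the standard definitions of temperature-scaled softmax and nucleus (top-p) filtering. This is a generic illustration in plain Python, not the generator's internal code; the function names and the toy logits are assumptions.

```python
import math


def apply_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (assumes temperature > 0)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p=1.0):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0, -1.0]  # toy scores for four candidate residues
sharp = apply_temperature(logits, temperature=0.5)  # more peaked
flat = apply_temperature(logits, temperature=2.0)   # more uniform
filtered = nucleus_filter(apply_temperature(logits), top_p=0.9)
```

Lower temperatures concentrate probability on the top candidate, while a `top_p` below 1.0 removes the low-probability tail before sampling.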
num_steps
integer
default:"20"
Number of iterative-refinement decoding steps; returns diminish above 20.
schedule
enum
default:"cosine"
Unmask schedule across rounds; ‘cosine’ commits more positions in later rounds. Options: cosine, linear
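The difference between the two schedules can be sketched by counting how many masked positions get committed in each round. The formulas below follow the common MaskGIT-style convention (masked fraction `cos(pi/2 * t)` versus `1 - t`); this is an assumption for illustration, since the exact formula used by the generator is not stated here, and the function names are hypothetical.

```python
import math


def masked_fraction(step, num_steps, schedule="cosine"):
    """Fraction of positions still masked after `step` of `num_steps` rounds."""
    t = step / num_steps
    if schedule == "cosine":
        return math.cos(0.5 * math.pi * t)
    if schedule == "linear":
        return 1.0 - t
    raise ValueError(f"unknown schedule: {schedule}")


def commits_per_round(num_masked, num_steps, schedule="cosine"):
    """Number of positions committed (unmasked) in each round."""
    commits, remaining = [], num_masked
    for step in range(1, num_steps + 1):
        target = round(num_masked * masked_fraction(step, num_steps, schedule))
        commits.append(remaining - target)
        remaining = target
    return commits


cosine = commits_per_round(30, 5, "cosine")
linear = commits_per_round(30, 5, "linear")
print("cosine:", cosine)
print("linear:", linear)
```

Under these assumed formulas, the cosine schedule commits few positions in early rounds and more in later ones, whereas the linear schedule commits a roughly equal share every round.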
strategy
enum
default:"random"
Position selection per round; ‘entropy’ commits the most confident positions first. Options: random, entropy
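As a rough sketch of the two selection modes, the snippet below ranks masked positions by the Shannon entropy of their predicted residue distribution, treating low entropy as high confidence. The function names, the dictionary input format, and the toy probabilities are all assumptions for illustration; the generator's internal selection code may differ.

```python
import math
import random


def shannon_entropy(probs):
    """Entropy in nats; lower means the model is more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_positions(position_probs, k, strategy="entropy", rng=None):
    """Choose k masked positions to commit this round.

    position_probs maps a sequence position to the predicted probability
    distribution over candidate residues at that position.
    """
    positions = list(position_probs)
    if strategy == "entropy":
        ranked = sorted(positions, key=lambda pos: shannon_entropy(position_probs[pos]))
        return ranked[:k]
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(positions, k)
    raise ValueError(f"unknown strategy: {strategy}")


position_probs = {
    3: [0.97, 0.01, 0.01, 0.01],   # very confident
    7: [0.25, 0.25, 0.25, 0.25],   # maximally uncertain
    12: [0.70, 0.10, 0.10, 0.10],  # moderately confident
}
chosen = select_positions(position_probs, k=2, strategy="entropy")
print(chosen)  # → [3, 12]
```

Committing confident positions first leaves the uncertain ones masked, so later rounds can condition on more context when filling them.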
temperature_annealing
boolean
default:"True"
Anneal temperature toward 0 across rounds.
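One simple way to picture annealing is a linear decay from the configured `temperature` toward 0 over `num_steps` rounds. The linear shape, the small positive floor, and the function name are assumptions for this sketch; the generator may use a different decay curve.

```python
def annealed_temperature(initial, step, num_steps, floor=1e-3):
    """Linearly decay the temperature from `initial` toward 0.

    `step` is the 0-based round index. The floor keeps the value strictly
    positive so that dividing logits by it stays well defined.
    """
    return max(initial * (1.0 - step / num_steps), floor)


temps = [annealed_temperature(1.0, step, num_steps=20) for step in range(20)]
print(temps[0], temps[-1])  # first and last round temperatures
```

Early rounds then sample more diversely, while late rounds behave almost greedily.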
device
string
default:"cuda"
GPU device to run ESM3 on (e.g. ‘cuda’ or ‘cuda:0’).
batch_size
integer
default:"1"
Number of sequences to process simultaneously on the GPU.

Usage

python
>>> from proto_language.generator import ESM3Generator, ESM3GeneratorConfig
>>> from proto_language.core import Segment
>>> from proto_tools.transforms.masking import MaskingStrategy
>>> config = ESM3GeneratorConfig(
...     temperature=1.0,
...     masking_strategy=MaskingStrategy(num_mutations=5),
... )
>>> gen = ESM3Generator(config)
>>> segment = Segment(sequence="M" * 100, sequence_type="protein")
>>> gen.assign(segment)
>>> gen.sample()  # Re-samples 5 randomly masked positions

Metadata

Property: Value
Key: esm3
Class: ESM3Generator
Category: mutation
Input Type: starting_sequence
Uses GPU: True
Supported Sequence Types: protein
Allows Empty Start: False