ESM2 Protein Language Model
License: ESM2 is open source and free for academic and commercial use under an MIT license. Please refer to the license for full terms.

This generator is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


proto-bio/proto-language/proto_language/generator/esm2_generator.py
@article{lin2023esm2,
  title={Evolutionary-scale prediction of atomic-level protein structure with a language model},
  author={Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil, Robert and Kabeli, Ori and Shmueli, Yaniv and others},
  journal={Science},
  volume={379},
  number={6637},
  pages={1123--1130},
  year={2023},
  publisher={American Association for the Advancement of Science},
  doi={10.1126/science.ade2574}
}
Protein sequence mutation and refinement generator built on the ESM2 protein language model. It refines an existing protein sequence through iterative mutation: it masks positions according to the configured masking strategy, then samples biologically plausible amino acids at those positions. The generator category is "mutation", meaning it edits proposal sequences through targeted mutations rather than generating them from scratch.
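
The mask-then-resample idea described above can be illustrated with a toy sketch. This is not the generator's implementation: `uniform_predictor` is a hypothetical stand-in for the ESM2 model, and `pick_mask_positions` and `mutate` are illustrative names, assuming a fixed-count random mask.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pick_mask_positions(length, mask_fraction=0.3, rng=random):
    """Choose a random subset of positions to mask (fixed count, no repeats)."""
    count = min(length, max(1, round(length * mask_fraction)))
    return sorted(rng.sample(range(length), count))


def uniform_predictor(sequence, position):
    """Hypothetical stand-in for the language model: equal weight per residue."""
    return [1.0] * len(AMINO_ACIDS)


def mutate(sequence, predictor=uniform_predictor, mask_fraction=0.3, rng=random):
    """Mask a subset of positions and resample each from the predictor's weights."""
    residues = list(sequence)
    for pos in pick_mask_positions(len(residues), mask_fraction, rng):
        weights = predictor(sequence, pos)
        residues[pos] = rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return "".join(residues)
```

A real language model would return position-specific weights conditioned on the unmasked context, which is what makes the sampled residues biologically plausible.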

API Reference

Config: ESM2GeneratorConfig
Configuration object for ESM2Generator. This class defines configuration parameters for the ESM2 generator, which uses a protein language model to refine existing protein sequences through iterative mutation of masked positions. In Proto Language, ESM2 is registered as a mutation-category generator that edits the supplied starting sequence; the segment must carry a sequence (directly or from a prior optimizer stage).
model_checkpoint
enum
default:"esm2_t33_650M_UR50D"
ESM-2 model variant to load (e.g. esm2_t33_650M_UR50D).
Options: esm2_t6_8M_UR50D, esm2_t12_35M_UR50D, esm2_t30_150M_UR50D, esm2_t33_650M_UR50D, esm2_t36_3B_UR50D, esm2_t48_15B_UR50D
masking_strategy
MaskingStrategy
Controls which positions are masked for sampling. Default: a random 30% of positions.
sampling_method
enum
default:"single_pass"
‘single_pass’ fills all masks in one forward pass; ‘iterative_refinement’ runs a MaskGIT-style loop.
Options: single_pass, iterative_refinement
temperature
number
default:"1.0"
Sharpness of sampling. Values below 1 sharpen the distribution toward the most likely amino acid; values above 1 increase diversity.
top_p
number
default:"1.0"
Nucleus sampling cumulative probability cutoff used in iterative refinement. 1.0 disables it.
num_steps
integer
default:"20"
Number of iterative-refinement rounds. Returns diminish above 20.
schedule
enum
default:"cosine"
Per-round unmask rate. ‘cosine’ commits more positions late; ‘linear’ commits the same number each round.
Options: cosine, linear
strategy
enum
default:"random"
How positions are picked each round. ‘entropy’ takes the most confident (lowest-entropy) positions first; ‘random’ picks uniformly.
Options: random, entropy
temperature_annealing
boolean
default:"True"
Anneal the temperature toward 0 across rounds.
device
string
default:"cuda"
GPU device to run ESM2 on (e.g. ‘cuda’ or ‘cuda:0’).
batch_size
integer
default:"1"
Number of sequences to process simultaneously on the GPU.
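
To make the temperature and top_p parameters above concrete, here is a minimal standard-library sketch of how such settings are commonly applied to one position's logits. The function names are illustrative assumptions, not part of the Proto Language API, and the generator's internal sampling may differ.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=1.0):
    """Keep the smallest top-ranked set whose mass reaches top_p, then renormalize."""
    if top_p >= 1.0:
        return list(probs)  # 1.0 disables nucleus filtering
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_residue(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one amino acid letter from a 20-entry logits list."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
```

Lowering temperature and lowering top_p both reduce diversity, but in different ways: temperature reshapes the whole distribution, while top_p removes the low-probability tail outright.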

Usage

python
>>> from proto_language.generator import ESM2Generator, ESM2GeneratorConfig
>>> from proto_language.core import Segment
>>> from proto_tools.transforms.masking import MaskingStrategy
>>> config = ESM2GeneratorConfig(
...     temperature=1.0,
...     masking_strategy=MaskingStrategy(num_mutations=5),
... )
>>> gen = ESM2Generator(config)
>>> segment = Segment(sequence="M" * 100, sequence_type="protein")
>>> gen.assign(segment)
>>> gen.sample()  # Re-samples 5 randomly masked positions
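
For the iterative_refinement settings in the API reference (num_steps, schedule, temperature_annealing), the sketch below shows one way per-round commit counts and an annealed temperature could be computed. It follows the usual MaskGIT-style formulation and is an assumption about the general technique, not a copy of the generator's code; all function names are hypothetical.

```python
import math


def masked_fraction(progress, schedule="cosine"):
    """Fraction of positions still masked after `progress` (0..1) of the rounds."""
    if schedule == "cosine":
        return math.cos(0.5 * math.pi * progress)
    if schedule == "linear":
        return 1.0 - progress
    raise ValueError(f"unknown schedule: {schedule!r}")


def commits_per_round(num_masked, num_steps, schedule="cosine"):
    """Number of masked positions committed (unmasked) in each round."""
    commits = []
    remaining = num_masked
    for step in range(1, num_steps + 1):
        if step == num_steps:
            target = 0  # the final round commits everything that is left
        else:
            fraction = masked_fraction(step / num_steps, schedule)
            target = min(remaining, round(num_masked * fraction))
        commits.append(remaining - target)
        remaining = target
    return commits


def annealed_temperature(base_temperature, step, num_steps):
    """Linear decay toward 0; `step` runs from 0 to num_steps - 1 (one possible rule)."""
    return base_temperature * (1.0 - step / num_steps)
```

With 100 masked positions and 4 rounds, the linear schedule commits 25 positions per round, while the cosine schedule commits progressively more in later rounds, matching the description of the schedule option.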

Metadata

Property                    Value
Key                         esm2
Class                       ESM2Generator
Category                    mutation
Input Type                  starting_sequence
Uses GPU                    True
Supported Sequence Types    protein
Allows Empty Start          False