Proto is not affiliated with Profluent. This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
Background
ProGen3 (Bhatnagar et al., 2025) is a family of generative protein language models from Profluent. ProGen3 models employ a sparse mixture-of-experts (MoE) architecture, which routes model activations in the transformer feed-forward layers to smaller specialized MLPs to make each forward pass more computationally tractable. The published family spans 112 million to 46 billion parameters; this toolkit exposes the progen3-112m through progen3-3b checkpoints. Pre-training used roughly 1.5 trillion amino-acid tokens sampled from the Profluent Protein Atlas, a curated collection of full-length natural proteins.
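As a rough illustration of the routing idea described above, the following sketch sends one token vector to its top-scoring experts and mixes their outputs. It is a generic top-k MoE toy, not ProGen3's actual router: the value of top_k, the softmax gate, the renormalisation step, and the helper names (make_expert, moe_layer) are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Stand-in for a small expert MLP: an element-wise scaling (hypothetical)."""
    return lambda x: [scale * v for v in x]


def moe_layer(x, router_weights, experts, top_k=2):
    """Route one token vector x to its top_k experts and mix their outputs."""
    # One router logit per expert: dot product of x with that expert's router row.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    gate = softmax(logits)
    # Keep the top_k experts by gate weight and renormalise over the kept ones.
    chosen = sorted(range(len(experts)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # only the chosen experts are evaluated
        out = [o + (gate[i] / norm) * yi for o, yi in zip(out, y)]
    return out, chosen


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
out, chosen = moe_layer([1.0, 0.0], router_weights, experts, top_k=2)
print(chosen, out)
```

Only two of the four experts run for this token, which is the source of the compute saving: parameter count grows with the number of experts, while per-token work grows only with top_k.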
Unlike a strictly left-to-right model, ProGen3 is trained autoregressively in both directions: forward predicts each residue from the N-terminus toward the C-terminus, and reverse predicts from the C-terminus toward the N-terminus. Generation runs in a chosen direction, and scoring combines both directions into a single per-residue likelihood. Two capabilities follow from this objective. Sampling from the predicted next-residue distributions produces new candidate protein sequences, and the likelihood the model assigns to an existing sequence provides a zero-shot proxy-fitness score with no additional task-specific training.
Learning Resources
- ProGen3 showcase (Profluent) - an accessible overview of ProGen3, the Profluent Protein Atlas training data, and downstream applications such as antibody design and compact gene editors.
Tools
ProGen3 Sampling (progen3-sample)
Generates protein sequences by autoregressive sampling. Given one or more prompt sequences, the model extends each prompt one amino acid at a time in the chosen direction, drawing each residue from the model’s predicted distribution under the configured temperature and top_p settings. Generation produces at least min_new_tokens and at most max_new_tokens new residues.
API Reference
Input: CausalModelSampleInput
Config: ProGen3SampleConfig
- model_checkpoint: one of progen3-112m, progen3-219m, progen3-339m, progen3-762m, progen3-1b, progen3-3b.
- direction: "forward" generates N→C, "reverse" generates C→N. Available options: forward, reverse.
- … True is coerced to 1 and False to 0.
- … None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
- prepend_prompt: if True, returned sequences include the prompt and newly generated residues; if False, only the newly generated residues.
Applications
This tool performs de novo protein design, generating novel sequences that resemble natural proteins, optionally conditioned on a prompt. Because generation can run in reverse (C-terminus toward N-terminus), a C-terminal fragment can be used as the prompt and the rest of the sequence grown toward the N-terminus, which a strictly left-to-right model cannot do.
Usage Tips
- direction chooses which terminus is generated. "forward" (the default) continues a prompt from the N-terminus toward the C-terminus; "reverse" treats the prompt as a C-terminal fragment and generates toward the N-terminus. In reverse mode, new residues are added to the left of the prompt, so the sequence grows at its N-terminal end. Regardless of direction, provide all prompt sequences in the usual N→C orientation.
- Sampling defaults are conservative. temperature defaults to 0.2 and top_p to 0.95, which keep generations close to natural-looking sequences; raise temperature for more diverse but riskier designs. This tool exposes only nucleus (top_p) sampling for ProGen3; there is no top-k cutoff.
- max_new_tokens and min_new_tokens bound the generated length. They count only newly generated residues (defaults 256 and 1, respectively), separate from the prompt length.
- Output includes the prompt by default. prepend_prompt=True (the toolkit default) returns the prompt joined to its continuation; set it to False to receive only the newly generated residues.
- Generated sequences are candidates. Validate them with downstream tools (for example structure prediction, function annotation, or homology search) before drawing biological conclusions.
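To make the interaction between temperature, top_p, direction, and prepend_prompt in the tips above concrete, here is a minimal self-contained sketch. It does not call the toolkit or ProGen3: toy_logits is a hypothetical stand-in for the model, generate and top_p_filter are illustrative names, and min_new_tokens is omitted because the toy model has no stop token.

```python
import math
import random


def top_p_filter(logits, temperature=0.2, top_p=0.95):
    """Temperature-scale logits, then keep the smallest top-ranked set with mass >= top_p."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((e / total, tok) for tok, e in exps.items()), reverse=True)
    kept, mass = [], 0.0
    for prob, tok in ranked:
        kept.append((tok, prob))
        mass += prob
        if mass >= top_p:
            break
    norm = sum(prob for _, prob in kept)  # renormalise over the kept tokens
    return {tok: prob / norm for tok, prob in kept}


def toy_logits(context):
    """Hypothetical model stand-in: fixed logits that ignore the context."""
    return {"A": 2.0, "G": 1.0, "L": 0.5, "K": 0.0}


def generate(prompt, direction="forward", max_new_tokens=5,
             prepend_prompt=True, temperature=0.2, top_p=0.95, seed=0):
    """Grow a prompt (always written N->C) by sampling one residue at a time."""
    rng = random.Random(seed)
    new = []  # residues in the order they were generated
    for _ in range(max_new_tokens):
        dist = top_p_filter(toy_logits(prompt), temperature, top_p)
        tokens, weights = zip(*dist.items())
        new.append(rng.choices(tokens, weights=weights)[0])
    # Reverse generation emits residues C->N, so flip them back to N->C order.
    grown = "".join(new) if direction == "forward" else "".join(reversed(new))
    if not prepend_prompt:
        return grown
    return prompt + grown if direction == "forward" else grown + prompt


print(generate("MKT"))                       # new residues to the right of the prompt
print(generate("MKT", direction="reverse"))  # new residues to the left of the prompt
print(top_p_filter(toy_logits(""), temperature=1.0, top_p=0.9))
```

At the default temperature of 0.2 the toy distribution collapses onto its most likely residue, which mirrors why low temperatures give conservative outputs; at temperature 1.0 several residues survive the nucleus cutoff.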
ProGen3 Scoring (progen3-score)
Scores existing protein sequences under ProGen3 using bidirectional likelihood. For each sequence it runs both a forward (N→C) and a reverse (C→N) pass, averages the per-position log-likelihoods into a single bidirectional value, and aggregates these into a log-likelihood, an average log-likelihood per residue, and a perplexity. It also exposes the forward, reverse, and bidirectional per-position values, and optionally the per-position logits.
API Reference
Input: CausalModelScoringInput
Config: ProGen3ScoringConfig
- model_checkpoint: one of progen3-112m, progen3-219m, progen3-339m, progen3-762m, progen3-1b, progen3-3b.
- … True is coerced to 1 and False to 0.
- … None waits indefinitely.
- … BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
- … per_position_metrics (forward/reverse/bidirectional log-likelihoods).
Output: CausalModelScoringOutput
Metrics subclass with scalar metrics (log_likelihood, avg_log_likelihood, perplexity) and optional per-position _pp-suffixed list extras; logits and vocab are declared fields for raw model outputs.
Metrics reported for each scores item:
| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
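The relationship between the three scalar metrics can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's code: it assumes natural-log per-position values, that log_likelihood is the sum of the averaged (bidirectional) per-position values, and that perplexity is exp(-avg_log_likelihood). The function name aggregate and the bidirectional_pp key are hypothetical.

```python
import math


def aggregate(forward_pp, reverse_pp):
    """Combine per-position log-likelihoods from the forward and reverse passes."""
    if len(forward_pp) != len(reverse_pp) or not forward_pp:
        raise ValueError("need two non-empty lists of equal length")
    # Average the two directions position by position.
    bidirectional_pp = [(f + r) / 2 for f, r in zip(forward_pp, reverse_pp)]
    log_likelihood = sum(bidirectional_pp)                   # assumed: sum over residues
    avg_log_likelihood = log_likelihood / len(bidirectional_pp)
    perplexity = math.exp(-avg_log_likelihood)               # assumed: standard definition
    return {
        "bidirectional_pp": bidirectional_pp,
        "log_likelihood": log_likelihood,
        "avg_log_likelihood": avg_log_likelihood,
        "perplexity": perplexity,
    }


# Made-up per-position values for a three-residue sequence.
metrics = aggregate([-1.0, -2.0, -3.0], [-3.0, -2.0, -1.0])
print(metrics)
```

Under these assumptions the ranges in the table follow directly: log-probabilities are never positive, so both log-likelihood values are ≤ 0.0 and the perplexity is ≥ 1.0.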
Applications
This tool gives a zero-shot measure of how consistent a protein sequence is with ProGen3’s training distribution, usable as a fitness or plausibility signal without additional task-specific training. Because it uses both directions, every residue is scored with full surrounding context rather than left context only. Use it to rank or filter candidate sequences (including the output of progen3-sample), to compare variants of a sequence, or to flag sequences far from the model’s training distribution.
Usage Tips
- Scores are bidirectional, not a single-direction log-likelihood. The reported log_likelihood, avg_log_likelihood, and perplexity are derived from the averaged forward and reverse per-position values, so they are not directly comparable to a one-directional model’s scores.
- Compare length-normalized scores within one checkpoint. Total log_likelihood scales with sequence length, so use perplexity or avg_log_likelihood when comparing sequences of different lengths. Different checkpoints learn different distributions that are not calibrated to a common scale, so scores from different model_checkpoint values are hard to compare directly; a lower perplexity means the sequence is more consistent with that checkpoint’s training distribution.
- return_logits defaults to False. Leave it off unless you need the per-position distributions, since the logits tensor is large (sequence length by the token vocabulary).
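The length effect mentioned in the tips above is easy to see with a small ranking example. The records below are made-up numbers, not real ProGen3 output, and the dictionary layout is a hypothetical stand-in for whatever structure the scoring output is loaded into; only the metric names come from this page.

```python
# Illustrative scores for two candidates scored with the same checkpoint.
candidates = [
    {"name": "short", "length": 50, "log_likelihood": -100.0, "avg_log_likelihood": -2.0},
    {"name": "long", "length": 300, "log_likelihood": -450.0, "avg_log_likelihood": -1.5},
]

# Ranking by total log-likelihood favours short sequences simply because
# fewer residues contribute negative terms.
by_total = sorted(candidates, key=lambda c: c["log_likelihood"], reverse=True)

# Ranking by the per-residue average removes that length bias.
by_average = sorted(candidates, key=lambda c: c["avg_log_likelihood"], reverse=True)

print([c["name"] for c in by_total])
print([c["name"] for c in by_average])
```

The two orderings disagree: the longer candidate has the worse total but the better per-residue score, which is why the length-normalized metrics are the safer choice for mixed-length sets.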
Toolkit Notes
These apply to every ProGen3 tool in this toolkit (progen3-sample, progen3-score).
- Requires a GPU; memory scales with checkpoint size. This toolkit exposes the progen3-112m through progen3-3b checkpoints; larger checkpoints are more capable but need substantially more GPU memory. CPU execution is not practical.
- batch_size trades memory for throughput across both tools. It sets how many same-length prompts (progen3-sample) or sequences (progen3-score) are processed per GPU forward pass. Raise it above the default of 1 for higher throughput on many short sequences; lower it again if generation or scoring runs out of GPU memory.
- model_checkpoint selects the model size. The default is progen3-762m; smaller checkpoints (progen3-112m, progen3-219m, progen3-339m) are faster and lighter, while progen3-1b and progen3-3b are more capable at higher memory cost.
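One way to picture what batch_size controls is to group inputs by length and cut each group into chunks. The sketch below is an assumption about the general idea only; this page does not document how the toolkit actually forms batches, and same_length_batches is a hypothetical helper.

```python
from collections import defaultdict


def same_length_batches(sequences, batch_size=1):
    """Group sequences by length, then split each group into chunks of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    by_length = defaultdict(list)
    for seq in sequences:
        by_length[len(seq)].append(seq)
    batches = []
    for length in sorted(by_length):
        group = by_length[length]
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    return batches


seqs = ["MKT", "MAA", "MKTAY", "MGG", "MLLAV"]
print(same_length_batches(seqs, batch_size=1))  # one sequence per forward pass
print(same_length_batches(seqs, batch_size=2))  # fewer, larger passes
```

With these five inputs, batch_size=2 needs three passes instead of five. A larger batch_size only helps when several inputs share a length, and each pass then holds more sequences in GPU memory at once.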
