Proto is not affiliated with Oxford Protein Informatics Group (OPIG). This toolkit is open source and builds on the implementation produced by this organization. Product names, logos, and trademarks are the property of their respective owners.
| Function | Description | |
|---|---|---|
| run_ablang_embeddings() | Extract antibody sequence embeddings using AbLang (GPU) | Docs Source |
| run_ablang_gradient() | Compute AbLang masked pseudo-log-likelihood gradient for relaxed antibody sequences (GPU) | Docs Source |
| run_ablang_sample() | Restore masked antibody sequence positions using AbLang (GPU) | Docs Source |
| run_ablang_score() | Score antibody sequences using AbLang language model (GPU) | Docs Source |
Background
AbLang (Olsen, Moal, and Deane, 2022) is a BERT-style masked language model trained exclusively on antibody variable-domain sequences from the OAS database. The published work demonstrates that AbLang restores residues missing from antibody sequence reads more accurately than germline-based imputation or the general-purpose ESM-1b protein language model, and runs approximately seven times faster than ESM-1b. Two single-chain checkpoints are provided, ablang1-heavy and ablang1-light, each with a 768-dimensional hidden representation.
AbLang-2 (Olsen, Moal, and Deane, 2024) is trained on both unpaired and paired antibody sequence data and addresses a germline-residue bias observed in earlier antibody language models that overweighted germline positions during training. The published analysis shows that AbLang-2 suggests a diverse set of valid mutations with high cumulative probability and provides paired-chain context for antibody design. The ablang2-paired checkpoint exposed by this toolkit has a 480-dimensional hidden representation.
Learning Resources
- oxpig/AbLang (OPIG, University of Oxford). Official AbLang repository, source code, and reference implementation of the heavy- and light-chain checkpoints.
- oxpig/AbLang2 (OPIG, University of Oxford). Official AbLang-2 repository for the paired heavy-plus-light checkpoint.
- Observed Antibody Space (OPIG). Public antibody sequence database used to train the AbLang models.
Tools
AbLang Sampling (ablang-sample)
Restores masked positions in antibody sequences using the AbLang masked-language-model head. Positions to be restored are marked with an underscore (_) in the input sequence, and the tool samples a replacement amino acid at each masked position from the model’s predicted distribution. The sampling temperature is configurable, and greedy argmax decoding is selected by setting temperature=0.
API Reference
Input: AbLangSampleInput
- …_ at positions to restore.
Config: AbLangSampleConfig
- temperature == 0 selects greedy argmax decoding (equivalent to ablang’s native restore mode). temperature == 1 samples from the unscaled model distribution; higher values flatten the distribution toward uniform, lower values sharpen toward greedy.
- …likelihood-mode forward pass per batch.
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- …BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Applications
This tool is appropriate for completing antibody sequences with missing residues, a common need when working with B-cell receptor sequencing reads that drop the first several N-terminal residues. Representative applications include filling sequencing-dropout positions before downstream structural prediction, exploring single-position substitutions in CDR or framework regions, and generating antibody-context-aware variants for humanisation or affinity-maturation campaigns.
Usage Tips
- Use the underscore (_) as the mask character. Other placeholders such as *, X, or <mask> are not recognised. Each underscore in the input sequence is replaced with a sample drawn from the model distribution at that position.
- temperature controls the sampling stochasticity. The default of 1.0 samples from the unscaled model distribution, producing different sequences across repeated calls. Set temperature=0 for greedy argmax decoding, which matches AbLang’s native restore mode and produces deterministic output. Lower positive values sharpen toward the top prediction, higher values flatten toward uniform. Use seed to make stochastic runs reproducible.
- Set align=True to extend unknown-length termini. When the input sequence is shorter than expected, enabling ANARCI-based alignment lets AbLang restore residues at the N or C terminus as well as in the middle of the sequence. Setting align=True forces greedy decoding regardless of the temperature setting, since the ANARCI alignment is incompatible with stochastic sampling.
- Set return_logits=True to recover the per-position amino-acid distribution. When enabled, the output carries a per-position logit matrix of shape (num_sequences, seq_len, 20) alongside the sampled sequence, which is useful for downstream re-ranking or post-hoc analysis. The default omits the logits to keep the response small.
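The temperature behaviour described in the tips above can be sketched in plain Python. This is a minimal illustration, not the toolkit's implementation: `sample_residue` and `fill_masks` are hypothetical helpers, and the per-position logits are supplied by the caller rather than produced by AbLang.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # canonical 20-letter order


def sample_residue(logits, temperature=1.0, rng=None):
    """Pick one amino acid from a list of 20 logits (hypothetical helper)."""
    if len(logits) != len(AMINO_ACIDS):
        raise ValueError("expected one logit per amino acid")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy argmax decoding: deterministic, no random draw.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return AMINO_ACIDS[best]
    if rng is None:
        rng = random.Random()
    # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]  # unnormalised softmax weights
    return rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]


def fill_masks(sequence, logits_per_position, temperature=1.0, seed=None):
    """Replace each underscore with a sampled residue; keep other positions."""
    rng = random.Random(seed)  # a fixed seed makes stochastic runs repeatable
    restored = []
    for residue, logits in zip(sequence, logits_per_position):
        if residue == "_":
            restored.append(sample_residue(logits, temperature, rng))
        else:
            restored.append(residue)
    return "".join(restored)
```

With `temperature=0` the same input always yields the same output, while a positive temperature needs a fixed `seed` to be repeatable.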
AbLang Scoring (ablang-score)
Computes per-sequence scores under the AbLang masked-language-model head. The scoring_mode configuration field selects between pseudo-log-likelihood ("pseudo_log_likelihood") and confidence ("confidence") scoring.
API Reference
Config: AbLangScoringConfig
- "pseudo_log_likelihood" masks each position individually (accurate, O(L) passes); "confidence" is a single-pass confidence proxy (faster, less accurate). Available options: pseudo_log_likelihood, confidence
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- …BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
- …likelihood-mode forward pass per batch.
Output: MaskedModelScoringOutput
- Metrics subclass with scalar metrics (accessed via score.perplexity or score["perplexity"]) plus declared logits / vocab fields that carry raw model outputs when requested.
- …scores item)

| Metric | Type | Range | Availability |
|---|---|---|---|
| log_likelihood | float | ≤ 0.0 | always |
| avg_log_likelihood | float | ≤ 0.0 | always |
| perplexity | float | ≥ 1.0 | always |
Applications
This tool is appropriate for ranking antibody sequences by how “natural” they look under the model. Representative applications include selecting humanisation candidates closer to natural human antibody repertoires, flagging candidate sequences with low predicted naturalness for redesign, and ranking ProteinMPNN- or design-pipeline-generated sequences by pseudo-log-likelihood before more expensive downstream analyses.
Usage Tips
- Pseudo-log-likelihood scores from different checkpoints sit on different scales and are not directly comparable. Each of ablang1-heavy, ablang1-light, and ablang2-paired was trained independently and produces scores on its own scale, so heavy-chain scores cannot be compared against light-chain scores and single-chain scores cannot be compared against paired-chain scores. Only compare antibodies that were scored with the same model variant.
- Higher pseudo-log-likelihood corresponds to a more probable sequence under AbLang. Use scores comparatively across variants of the same antibody rather than as an absolute developability or affinity score. A high score reflects sequence likeness to the training distribution, not predicted experimental performance.
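The pseudo-log-likelihood mode masks one position at a time, which is why it costs O(L) passes. The sketch below shows that loop and how the three reported metrics could relate to each other. It rests on assumptions: `predict_masked` is a hypothetical stand-in for a model call, `toy_model` is a uniform placeholder, and perplexity is taken to be exp(-avg_log_likelihood), the usual convention, which the documentation above does not spell out.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_log_likelihood(sequence, predict_masked):
    """Sum log p(true residue | rest of sequence), masking one position per pass.

    predict_masked(masked_sequence, position) is a hypothetical callable that
    returns a dict mapping each amino acid to its predicted probability.
    """
    total = 0.0
    for i, residue in enumerate(sequence):
        masked = sequence[:i] + "_" + sequence[i + 1:]  # hide only position i
        probs = predict_masked(masked, i)
        total += math.log(probs[residue])
    return total


def score_metrics(sequence, predict_masked):
    """Derive the three scalar metrics (assumed definitions, see lead-in)."""
    log_likelihood = pseudo_log_likelihood(sequence, predict_masked)
    avg_log_likelihood = log_likelihood / len(sequence)
    return {
        "log_likelihood": log_likelihood,          # always <= 0
        "avg_log_likelihood": avg_log_likelihood,  # always <= 0
        "perplexity": math.exp(-avg_log_likelihood),  # always >= 1
    }


def toy_model(masked_sequence, position):
    """Placeholder model: uniform over the 20 amino acids."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
```

Under the uniform placeholder every sequence gets a perplexity of about 20, which is the worst case for a 20-letter alphabet. A real model assigns lower perplexity to sequences it finds more probable.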
AbLang Gradient (ablang-gradient)
Computes the gradient of the AbLang masked pseudo-log-likelihood objective with respect to a relaxed antibody-logit input. The tool accepts an AntibodyLogits object whose heavy_chain and light_chain fields are per-position logit or probability matrices, masks each amino-acid position in turn, scores the bidirectional-context prediction with cross-entropy against the input distribution, and returns the gradient matrix together with the loss value and auxiliary metrics.
API Reference
Input: AbLangGradientInput
- …softmax(input / temperature) before computing the gradient. When None (default), the input is used as-is.
Config: AbLangGradientConfig
- …False, uses soft blended embeddings directly.
- …False for forward-only log-likelihood scoring (e.g. MCMC proposal ranking).
- None auto-selects a per-model default (lower if OOM, higher for throughput).
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- …BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: AbLangGradientOutput
- None when compute_gradient=False (forward-only scoring).
- …log_likelihood, avg_log_likelihood, perplexity, sequence_length, model_choice, objective.
Applications
This tool is appropriate for differentiable antibody-design pipelines that update a continuous sequence representation by gradient descent. Representative applications include relaxed-logit hallucination for antibody design, joint optimisation of AbLang likelihood together with structure-based losses such as AlphaFold2 hallucination, and incorporating an antibody-specific naturalness term into broader binder-design objectives.
Usage Tips
- Input logits use the canonical protein order ACDEFGHIKLMNPQRSTVWY. The tool implementation internally maps to AbLang’s vocabulary order before the forward pass and returns the gradient in the same canonical order, so the user does not need to handle the AbLang-specific token order separately.
- Set temperature to apply a softmax before scoring. When temperature is set, the tool implementation applies softmax(input / temperature) to the input logits before the forward pass. Leave temperature=None (the default) when the user already provides a normalised probability distribution.
- Use the Straight-Through Estimator option for discrete-token gradients. Setting use_ste=True substitutes hard one-hot tokens in the forward pass while allowing gradients to flow through the soft probabilities, which can produce sharper update directions for some discrete-design loops. The default (use_ste=False) uses soft blended embeddings.
- Set compute_gradient=False for forward-only scoring. This skips the backward pass and returns gradient=None together with the loss value, which is useful for ranking candidates from a Monte Carlo proposal without paying the backward-pass cost.
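The first two tips describe two preprocessing steps: an optional softmax(input / temperature) and a re-indexing between the canonical order and the model's vocabulary order. The sketch below illustrates both for a single position. It is not the toolkit's code: all function names are hypothetical, and `MODEL_ORDER` is an illustrative permutation only, not a statement of AbLang's actual token order.

```python
import math

CANONICAL = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical model-side residue order, used only to illustrate re-indexing.
MODEL_ORDER = "MRHKDESTNQCGPAVIFYWL"


def softmax_with_temperature(row, temperature):
    """Return softmax(row / temperature) for one position."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in row]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reorder(row, src, dst):
    """Re-index a length-20 row from alphabet order src to alphabet order dst."""
    index = {aa: i for i, aa in enumerate(src)}
    return [row[index[aa]] for aa in dst]


def prepare_row(row, temperature=None):
    """Canonical-order input row -> model-order row, with optional softmax."""
    probs = row if temperature is None else softmax_with_temperature(row, temperature)
    return reorder(probs, CANONICAL, MODEL_ORDER)


def gradient_to_canonical(grad_row):
    """Map a model-order gradient row back to canonical order."""
    return reorder(grad_row, MODEL_ORDER, CANONICAL)
```

Because the two re-indexing steps are inverse permutations, a caller working in canonical order never sees the model-side order. Lower temperatures concentrate more probability mass on the largest logit.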
Toolkit Notes
These apply to every AbLang tool in this toolkit (ablang-embedding, ablang-gradient, ablang-sample, ablang-score).
- All four tools route automatically among the three AbLang checkpoints based on the chains provided. Providing only a heavy chain selects ablang1-heavy, providing only a light chain selects ablang1-light, and providing both selects the paired ablang2-paired checkpoint. At least one chain must be set on each input.
- Every antibody in a batched call must use the same chain configuration. The embedding, scoring, and sampling tools accept a list of antibodies in a single call, and every antibody in that list must provide the same combination of heavy and light chains so that all entries route to the same checkpoint. Mixed lists are rejected at input construction with a clear error.
- AbLang is appropriate for antibody variable-domain sequences only. Non-antibody proteins should be analysed with a general-purpose protein language model such as ESM2 rather than AbLang, which was trained exclusively on antibody sequences and produces unreliable scores or embeddings outside that distribution.
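The routing and batch rules above can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's code: `select_checkpoint` and `route_batch` are hypothetical names, and each antibody is modelled as a plain `(heavy_chain, light_chain)` tuple instead of the toolkit's input objects.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the toolkit's antibody input: (heavy_chain, light_chain).
Antibody = Tuple[Optional[str], Optional[str]]


def select_checkpoint(heavy: Optional[str], light: Optional[str]) -> str:
    """Choose a checkpoint name from the chains that are present."""
    if heavy and light:
        return "ablang2-paired"
    if heavy:
        return "ablang1-heavy"
    if light:
        return "ablang1-light"
    raise ValueError("at least one chain must be set")


def route_batch(antibodies: List[Antibody]) -> str:
    """Return the single checkpoint for a batch, rejecting mixed configurations."""
    checkpoints = {select_checkpoint(heavy, light) for heavy, light in antibodies}
    if len(checkpoints) != 1:
        raise ValueError(
            "every antibody in a batch must provide the same chain combination; "
            f"got {sorted(checkpoints)}"
        )
    return checkpoints.pop()
```

Validating the whole list before any model call matches the described behaviour of rejecting mixed lists at input construction, so no GPU work is wasted on a batch that cannot run.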
