Proto is not affiliated with Broad Institute, The Jackson Laboratory, or Yale University. This toolkit is open source and builds on the implementations produced by these organizations. Product names, logos, and trademarks are the property of their respective owners.
Background
Malinois is the regulatory sequence model used in CODA (Gosai et al., 2024) for machine-guided design of cell-type-targeting cis-regulatory elements. The model adapts the Basset-style convolutional architecture to MPRA data and predicts activity from a fixed 200-nucleotide insert after adding the assay flanks expected by the published checkpoint. It returns one raw activity value for each supported cell context: K562, HepG2, and SK-N-SH. The scoring wrapper averages forward and reverse-complement predictions and returns the selected raw outputs. The gradient wrapper applies max/min sigmoid objective terms to these raw scores and backpropagates through relaxed A, C, G, T logits, matching the Fast SeqProp-style design path used for regulatory DNA optimization.
Tools
Malinois Score (malinois-score)
Scores one or more 200 bp DNA inserts and returns raw Malinois predictions keyed by requested cell type.
API Reference
Input: MalinoisScoreInput
seq_length is checked at run time.
Config: MalinoisScoreConfig
- artifact_url into the managed weights cache.
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: MalinoisScoreOutput
scores.results item)
| Metric | Type | Range | Availability |
|---|---|---|---|
| K562 | float | unbounded | when requested |
| HepG2 | float | unbounded | when requested |
| SKNSH | float | unbounded | when requested |
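As an illustration of consuming these outputs, the sketch below ranks candidates by one requested cell type. It assumes each result has already been reduced to a plain dictionary keyed by K562, HepG2, and SKNSH. The rank_by_cell_type helper and the example values are hypothetical and not part of the toolkit.

```python
from typing import Dict, List, Tuple

# Hypothetical per-sequence outputs keyed by the canonical cell type names.
# The numbers are invented for illustration; they are not real predictions.
results: List[Dict[str, float]] = [
    {"K562": 1.2, "HepG2": 0.3, "SKNSH": -0.1},
    {"K562": 0.4, "HepG2": 2.1, "SKNSH": 0.0},
    {"K562": 2.5, "HepG2": 2.4, "SKNSH": 2.2},
]


def rank_by_cell_type(
    rows: List[Dict[str, float]], target: str
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs sorted from highest to lowest target score.

    Rows that do not contain the target key are skipped, since a score is
    only available when that cell type was requested.
    """
    scored = [(i, row[target]) for i, row in enumerate(rows) if target in row]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_cell_type(results, "K562")
```

Because the raw scores are unbounded, a ranking like this is only meaningful relative to other candidates scored the same way.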
Applications
Use this tool to rank regulatory DNA designs by predicted activity in K562, HepG2, or SK-N-SH cells, screen MPRA insert candidates, or compare candidate designs before selecting sequences for downstream validation.
Usage Tips
- Sequence length is fixed by default. Inputs must match seq_length, which defaults to 200 bp.
- Cell type keys are canonical. Request outputs as K562, HepG2, and SKNSH; SKNSH maps to the SK-N-SH model output.
- Batch size affects throughput. Increase batch_size for many same-length inserts when GPU memory allows.
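The length check and the forward/reverse-complement averaging described for this tool can be sketched in plain Python. This is a minimal illustration, not the toolkit's implementation: validate_insert, rc_averaged, and toy_predict are hypothetical names, the restriction to uppercase A, C, G, T is an assumption, and the assay flanks that the real wrapper adds are omitted.

```python
from typing import Callable, Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def validate_insert(seq: str, seq_length: int = 200) -> str:
    """Uppercase the insert and check its length and alphabet (assumed rules)."""
    seq = seq.upper()
    if len(seq) != seq_length:
        raise ValueError(f"expected {seq_length} bp, got {len(seq)}")
    bad = set(seq) - set(COMPLEMENT)
    if bad:
        raise ValueError(f"unsupported bases: {sorted(bad)}")
    return seq


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def rc_averaged(
    predict: Callable[[str], Dict[str, float]], seq: str, cell_types: List[str]
) -> Dict[str, float]:
    """Average forward and reverse-complement predictions per requested key."""
    fwd = predict(seq)
    rev = predict(reverse_complement(seq))
    return {cell: 0.5 * (fwd[cell] + rev[cell]) for cell in cell_types}


def toy_predict(seq: str) -> Dict[str, float]:
    """Stand-in for the model: counts bases in the first 10 positions."""
    head = seq[:10]
    return {
        "K562": float(head.count("A")),
        "HepG2": float(head.count("C")),
        "SKNSH": float(head.count("G")),
    }


seq = validate_insert("A" * 100 + "C" * 100)
scores = rc_averaged(toy_predict, seq, ["K562", "SKNSH"])
```

The stand-in predictor is deliberately strand-sensitive, so the averaged values differ from either single-strand prediction.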
Malinois Gradient (malinois-gradient)
Computes a weighted differentiable activity objective and, by default, returns the gradient with respect to batched relaxed DNA logits.
API Reference
Config: MalinoisGradientConfig
- True is coerced to 1 and False to 0.
- None waits indefinitely.
- BaseToolOutput.approx_equal), and the seed participates in cache keys. When None, cacheable seed-sensitive tools skip cache until seeded.
Output: MalinoisGradientOutput
None when compute_gradient=False.
sample_metrics.
| Metric | Type | Range | Availability |
|---|---|---|---|
| loss | float | ≥ 0.0 | always |
| K562 | float | unbounded | when requested |
| HepG2 | float | unbounded | when requested |
| SKNSH | float | unbounded | when requested |
Applications
Use this tool inside gradient-based DNA design loops to maximize activity in an on-target cell type while minimizing activity in off-target cell types. It is designed for optimizer calls rather than final biological validation.
Usage Tips
- Logits are batched. Pass logits with shape B x L x 4 in A, C, G, T order; use B=1 for a single candidate.
- Directions are per term. direction="max" minimizes 1 - sigmoid(raw) and direction="min" minimizes sigmoid(raw) after centering and scaling.
- Soft/hard mixing controls relaxation. soft=1.0, hard=0.0 is fully soft; increasing hard uses a straight-through hard-forward estimator.
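The per-term directions can be made concrete with a small pure-Python sketch of the loss value. It assumes the terms are combined as a weighted sum and that centering and scaling mean (raw - center) / scale. The names objective, center, and scale are illustrative and may not match the tool's configuration fields.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def objective(
    raw: Dict[str, float],
    terms: List[Tuple[str, str, float]],
    center: float = 0.0,  # assumed centering constant
    scale: float = 1.0,   # assumed scaling constant
) -> float:
    """Weighted sum of per-term penalties; lower is better.

    Each term is (cell_type, direction, weight). A "max" term contributes
    1 - sigmoid(z) and a "min" term contributes sigmoid(z), where z is the
    centered and scaled raw score.
    """
    loss = 0.0
    for cell_type, direction, weight in terms:
        s = sigmoid((raw[cell_type] - center) / scale)
        if direction == "max":
            loss += weight * (1.0 - s)
        elif direction == "min":
            loss += weight * s
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return loss


# Hypothetical raw scores: favor K562, penalize HepG2 and SK-N-SH activity.
raw = {"K562": 3.0, "HepG2": -1.0, "SKNSH": 0.0}
terms = [("K562", "max", 1.0), ("HepG2", "min", 0.5), ("SKNSH", "min", 0.5)]
loss = objective(raw, terms)
```

With non-negative weights every term lies between 0 and its weight, which is consistent with the loss metric being reported as ≥ 0.0.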
Toolkit Notes
These apply to every Malinois tool in this toolkit (malinois-score, malinois-gradient).
- Requires a GPU. Both tools load a PyTorch Malinois checkpoint and run most practically on CUDA.
- Weights are provisioned automatically. By default, the standalone worker downloads the CODA Zenodo artifact into the managed model cache and verifies its MD5 checksum.
- The gradient tool is a single evaluation. It returns one loss and optional gradient for the provided logits; run it from an optimizer for iterative design.
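Because each call is a single evaluation, an outer loop has to own the update rule. The sketch below runs plain gradient descent over B x L x 4 logits in pure Python. The evaluate function is a toy stand-in for a malinois-gradient call, not the toolkit API: it returns a cross-entropy loss toward a fixed target sequence together with its analytic gradient, so the loop runs without the model or a GPU.

```python
import math
from typing import List, Tuple

BASES = "ACGT"
Logits = List[List[List[float]]]  # shape B x L x 4, A,C,G,T order


def softmax(row: List[float]) -> List[float]:
    """Stable softmax over the four base logits at one position."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(logits: Logits, target: str) -> Tuple[float, Logits]:
    """Toy stand-in: mean cross-entropy to `target` and its gradient."""
    n = len(logits) * len(target)
    loss = 0.0
    grad: Logits = []
    for seq in logits:
        seq_grad = []
        for row, base in zip(seq, target):
            probs = softmax(row)
            k = BASES.index(base)
            loss -= math.log(probs[k]) / n
            # d(loss)/d(logit_j) = (softmax_j - onehot_j) / n
            seq_grad.append(
                [(p - (1.0 if j == k else 0.0)) / n for j, p in enumerate(probs)]
            )
        grad.append(seq_grad)
    return loss, grad


def step(logits: Logits, grad: Logits, lr: float) -> Logits:
    """One gradient descent update applied elementwise."""
    return [
        [[v - lr * g for v, g in zip(row, grow)] for row, grow in zip(seq, gseq)]
        for seq, gseq in zip(logits, grad)
    ]


def decode(logits: Logits) -> List[str]:
    """Take the highest-scoring base at each position."""
    return [
        "".join(BASES[max(range(4), key=row.__getitem__)] for row in seq)
        for seq in logits
    ]


target = "ACGTAC"
logits: Logits = [[[0.0] * 4 for _ in target]]  # B=1 candidate
losses = []
for _ in range(50):
    loss, grad = evaluate(logits, target)
    losses.append(loss)
    logits = step(logits, grad, lr=1.0)
designs = decode(logits)
```

In a real design loop the stand-in would be replaced by the tool call, and an adaptive optimizer could replace the fixed-step update.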
Infrastructure Guides
The following guides cover how to run tools efficiently and at scale.
- Tool Persistence
- Device Management
- Parallel Execution
- Cloud Inference
Additional Information
References
- Gosai, S. J. et al. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 634, 1211-1220 (2024). DOI: 10.1038/s41586-024-08070-z
- CODA/BODA2 repository: sjgosai/boda2
- CODA supplemental data and resources: Zenodo record 10698014
