GradientOptimizer to shape the sequence: the objective maximizes predicted K562 activity while
minimizing activity in HepG2 and SKNSH, yielding a K562-specific enhancer.
This runs three hundred gradient steps and returns twenty designs. It requires a GPU for the
Malinois model.
Runtime: this walkthrough runs real models on a GPU and takes several minutes to complete. The first run is slower because it builds the tool environment and downloads model weights.
The design segment and generator
A Segment is the stretch of sequence being designed; a Construct groups the segments that make up one molecule. The segment is seeded with "A" * SEQ_LENGTH, a placeholder that fixes the length (200 bp), while sequence_type="dna" fixes the vocabulary; the optimizer overwrites the actual bases. Instead of holding discrete characters, the GradientOptimizer represents the sequence as differentiable per-position logits and updates them by gradient descent. PositionWeightGenerator is the only component that maps those continuous logits back to a discrete sequence; with sampling_mode="argmax" it decodes the most likely base at each position. generator.assign(segment) binds the generator to the segment it decodes.
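To make the logits-to-sequence step concrete, here is a minimal NumPy sketch of argmax decoding. It is not the library's PositionWeightGenerator: the `decode_argmax` helper, the `ALPHABET` ordering, and the logit values are assumptions chosen for illustration.

```python
import numpy as np

SEQ_LENGTH = 200      # insert length used in this walkthrough
ALPHABET = "ACGT"     # assumed base ordering, for illustration only


def decode_argmax(logits: np.ndarray) -> str:
    """Map a (length, 4) logit matrix to the most likely base per position."""
    return "".join(ALPHABET[i] for i in logits.argmax(axis=1))


# Logit equivalent of the "A" * SEQ_LENGTH seed: A dominates at every position.
logits = np.zeros((SEQ_LENGTH, len(ALPHABET)))
logits[:, 0] = 5.0
seed = decode_argmax(logits)

# An optimizer update is arithmetic on the logits; here position 3 is pushed toward G.
logits[3, 2] += 10.0
updated = decode_argmax(logits)

print(seed[:8], updated[:8])
```

Because decoding only reads the largest logit, the optimizer can move the logits smoothly for many steps before the decoded base at a position changes.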
The specificity objective
Constraints score how well a sequence meets the design objective; the optimizer searches for sequences that lower the combined loss. Malinois predicts MPRA activity for 200 bp DNA inserts in the K562, HepG2, and SK-N-SH (SKNSH) cell contexts, and each constraint maps the requested cell-type score to a bounded objective: direction="max" rewards higher predicted activity, and direction="min" rewards lower activity. The helper builds three constraints over the same segment: maximize K562 activity, minimize HepG2 activity, and minimize SKNSH activity, each with seq_length=SEQ_LENGTH and weight=1.0. Each Constraint carries a label, which is the key under which its diagnostics appear in the result metadata. All three back-propagate through the Malinois model, which is what lets the GradientOptimizer use them.
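The way three directional constraints combine into one specificity loss can be sketched in plain Python. The logistic squash in `bounded_loss` is a hypothetical stand-in for the library's bounded mapping, which the text does not specify, and the activity scores are made-up inputs rather than Malinois predictions.

```python
import math


def bounded_loss(score: float, direction: str) -> float:
    """Squash a raw activity score into (0, 1); lower is better.

    Hypothetical mapping (logistic), not necessarily what the library uses.
    """
    squashed = 1.0 / (1.0 + math.exp(-score))
    if direction == "max":
        return 1.0 - squashed   # high activity gives a low loss
    if direction == "min":
        return squashed         # low activity gives a low loss
    raise ValueError(f"unknown direction: {direction!r}")


def combined_loss(scores: dict, spec: list) -> float:
    """Weighted sum of the per-constraint bounded losses."""
    return sum(w * bounded_loss(scores[cell], d) for cell, d, w in spec)


# (cell type, direction, weight), mirroring the three constraints described above.
SPEC = [("K562", "max", 1.0), ("HepG2", "min", 1.0), ("SKNSH", "min", 1.0)]

# Made-up scores: one K562-specific profile, one broadly active profile.
specific = combined_loss({"K562": 4.0, "HepG2": -1.0, "SKNSH": -1.0}, SPEC)
broad = combined_loss({"K562": 4.0, "HepG2": 4.0, "SKNSH": 4.0}, SPEC)
print(f"specific={specific:.3f}  broad={broad:.3f}")
```

Both profiles have the same K562 score, but the broadly active one pays a penalty on the two "min" terms, so only the cell-type-specific profile reaches a low combined loss.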
Run the gradient optimization
The GradientOptimizer runs continuous gradient descent on the segment’s per-position logits: at each step it backpropagates every constraint into a logit gradient, merges the per-constraint gradients, and updates the logits. The config sets num_results=20 parallel trajectories over num_steps=300 steps with base learning rate lr=0.5. Updates use ml_optimizer="adam", and the per-constraint gradients are combined with merger="weighted_sum" (each scaled by its weight). lr_schedule="cosine" with scale_lr_by_temperature=True anneals the learning rate on a cosine curve across the run. gumbel_logit_init=True adds Gumbel noise to the initial logits so the 20 trajectories diverge, and save_best=True returns each trajectory’s lowest-loss design rather than its final step. The optional custom_logging callback fires at tracked steps; here track records each snapshot’s sequence and its K562 raw activity (read from the k562_max constraint metadata) into trajectory. Program(..., seed=0) makes the run deterministic.
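The pieces of this configuration can be illustrated with a toy NumPy loop. This is a sketch under stated assumptions, not the GradientOptimizer: it uses plain gradient descent instead of Adam, omits temperature scaling, and replaces the Malinois constraints with a toy loss (the expected number of A/T bases) whose gradient is written by hand. All helper names are hypothetical.

```python
import numpy as np

NUM_RESULTS, NUM_STEPS, SEQ_LENGTH, BASE_LR = 20, 300, 200, 0.5
ALPHABET = "ACGT"
AT_COST = np.array([1.0, 0.0, 0.0, 1.0])  # toy loss: probability mass on A and T


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cosine_lr(step, num_steps, base_lr):
    """Anneal from base_lr down to 0 along half a cosine wave."""
    return base_lr * 0.5 * (1.0 + np.cos(np.pi * step / max(num_steps - 1, 1)))


def loss_and_grad(logits):
    """Toy loss per trajectory and its analytic gradient w.r.t. the logits."""
    p = softmax(logits)                                   # (traj, length, 4)
    per_pos = (p * AT_COST).sum(axis=-1, keepdims=True)   # A/T mass per position
    loss = per_pos.sum(axis=(1, 2))                       # expected A/T count
    grad = p * (AT_COST - per_pos)                        # softmax chain rule
    return loss, grad


rng = np.random.default_rng(0)                            # fixed seed
logits = rng.gumbel(size=(NUM_RESULTS, SEQ_LENGTH, 4))    # Gumbel-noise init
initial_loss, _ = loss_and_grad(logits)

best_loss = np.full(NUM_RESULTS, np.inf)
best_logits = logits.copy()
for step in range(NUM_STEPS):
    loss, grad = loss_and_grad(logits)
    improved = loss < best_loss                           # save-best bookkeeping
    best_loss[improved] = loss[improved]
    best_logits[improved] = logits[improved]
    logits = logits - cosine_lr(step, NUM_STEPS, BASE_LR) * grad

designs = ["".join(ALPHABET[i] for i in row) for row in best_logits.argmax(axis=-1)]
gc_fraction = sum(d.count("G") + d.count("C") for d in designs) / (NUM_RESULTS * SEQ_LENGTH)
print(f"mean loss {initial_loss.mean():.1f} -> {best_loss.mean():.1f}, GC {gc_fraction:.2f}")
```

The independent Gumbel draws give each trajectory a different starting point, and the save-best bookkeeping keeps the lowest-loss logits per trajectory even if a later step were to make things worse.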
Inspect the result
segment.result_sequences holds the returned designs; with save_best=True the first entry is the lowest-loss design. The first block prints a few evenly spaced snapshots from trajectory, each showing the step number, the K562 raw activity at that step, and the start of the sequence, so you can watch the activity rise as the logits are optimized. The final lines print the top design’s full sequence and its predicted K562 activity, read back from the k562_max constraint’s malinois_raw_score metadata.
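Picking a few evenly spaced snapshots from a recorded trajectory can be sketched as follows. The record layout (`step`, `k562`, `sequence`) and the values in it are synthetic placeholders invented for this example; they are not output from the run, and the real trajectory entries may be structured differently.

```python
import numpy as np


def evenly_spaced(items, count):
    """Return up to `count` items spread evenly from the first to the last."""
    if not items:
        return []
    num = min(count, len(items))
    idx = np.linspace(0, len(items) - 1, num=num).round().astype(int)
    return [items[i] for i in idx]


# Synthetic placeholder records, one every 10 steps (NOT real model output).
trajectory = [
    {"step": s, "k562": 0.01 * s, "sequence": "ACGT" * 50}
    for s in range(0, 300, 10)
]

snapshots = evenly_spaced(trajectory, 5)
for snap in snapshots:
    print(f"step {snap['step']:>3}  K562 {snap['k562']:.2f}  {snap['sequence'][:24]}...")
```

Always including the first and last records makes the start and end of the run visible regardless of how many snapshots are requested.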
Next Steps
Gradient Protein Hallucination
The gradient optimizer applied to proteins.
Using Optimizers
The gradient optimizer in context.