[Figure: each step: backprop ∇ → merge → update logits. Labels: loss contours, gradient step, minimum.]

This optimizer is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.


Source
proto-bio/proto-language/proto_language/optimizer/gradient_optimizer.py
Continuous gradient descent on per-position logits of a single segment. The optimization variable is seq.logits: an (L, |vocab|) matrix carried on each of num_results parallel proposal Sequences (one independent trajectory per result). The companion PositionWeightGenerator is the only thing that maps those continuous logits back to a discrete sequence; it never proposes, it only decodes (argmax or categorical) at tracked steps so that snapshots, result_sequences, and the next-stage handoff carry a real sequence. Logits are seeded once (zeros, or initial_logits/sequence_bias, optionally plus per-trajectory Gumbel noise so parallel trajectories diverge) and then mutated in place every step; they are never re-proposed.
Each step:
(1) interpolate soft (relaxed↔hard blend), hard (straight-through), softmax temperature, and learning rate from their start→end configs/schedules with progress = step / num_steps;
(2) ask every compiled gradient provider to backpropagate its differentiable (gradient-mode or compiler-backed) constraint into a per-trajectory logit gradient and per-trajectory loss, given the current temperature/soft/hard and the step’s effective weight; non-finite gradients raise.
Then, per trajectory:
(3) align per-constraint gradient norms (norm_alignment), scale each by its effective weight, and merge them with the configured merger (weighted_sum/pcgrad/mgda);
(4) zero fixed_positions and optionally normalize the merged gradient;
(5) take one ml_optimizer step (SGD or Adam) at the effective learning rate (_effective_lr optionally scales it by (1 - soft) + soft * temp, floored at min_lr_scale).
After updating, energy_scores is set to the summed weighted constraint losses, which is the exact objective being minimized. At tracking_interval steps (and the last step) the generator decodes logits, proposals sync to results, and a snapshot is saved.
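The learning-rate scaling in step (5) can be illustrated with a few lines of plain Python. This is a minimal sketch built only from the formula quoted above; the function name `effective_lr` and its argument names are hypothetical stand-ins, not the library's `_effective_lr` signature, and the sketch ignores `lr_schedule`.

```python
def effective_lr(base_lr, soft, temperature,
                 scale_lr_by_temperature=False, min_lr_scale=0.0):
    """Hypothetical stand-in for the optimizer's internal LR scaling."""
    if not scale_lr_by_temperature:
        # Scaling is opt-in; by default the base learning rate is used as-is.
        return base_lr
    # Blend between 1 (fully hard, soft=0) and the temperature (fully soft, soft=1).
    scale = (1.0 - soft) + soft * temperature
    # Floor the scale so very low temperatures cannot stall the updates.
    return base_lr * max(scale, min_lr_scale)


unscaled = effective_lr(0.5, soft=1.0, temperature=0.1)
scaled = effective_lr(0.5, soft=1.0, temperature=0.1, scale_lr_by_temperature=True)
floored = effective_lr(0.5, soft=1.0, temperature=0.01,
                       scale_lr_by_temperature=True, min_lr_scale=0.2)
```

With `soft=1.0` the scale equals the temperature, so a low-temperature phase takes proportionally smaller steps unless `min_lr_scale` puts a floor under it.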
Per-constraint weights can ramp over steps via constraint_weight_schedules (a ConstraintWeightSchedule keyed by Constraint.label); unknown labels trigger a warning and are ignored. With save_best=True (default), the lowest-loss logits per trajectory are restored and re-decoded at the end instead of returning the logits from the final step.
Requirements: a single target segment only; exactly one PositionWeightGenerator; every constraint must support gradient evaluation. For multi-phase pipelines, chain stages in a Program (for example, a logit-relaxation phase via germinal_logit_preset followed by a softmax-annealing phase via germinal_softmax_preset).
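The save_best behaviour can be pictured as a per-trajectory running minimum. The following NumPy sketch is an illustration only: `update_best` is a hypothetical helper, the losses are made-up numbers, and whether the library records the loss before or after each update is not stated here.

```python
import numpy as np


def update_best(best_loss, best_logits, loss, logits):
    """Keep, for each trajectory, the logits with the lowest loss seen so far.

    Shapes: best_loss and loss are (K,); best_logits and logits are (K, L, V).
    """
    improved = loss < best_loss
    best_loss = np.where(improved, loss, best_loss)
    best_logits = np.where(improved[:, None, None], logits, best_logits)
    return best_loss, best_logits


K, L, V = 2, 3, 4
best_loss = np.full(K, np.inf)
best_logits = np.zeros((K, L, V))

# Made-up losses: 3 steps (rows) for K = 2 trajectories (columns).
losses = np.array([[3.0, 2.0], [1.0, 5.0], [2.0, 0.5]])
for step, loss in enumerate(losses, start=1):
    logits = np.full((K, L, V), float(step))  # dummy logits tagged with the step number
    best_loss, best_logits = update_best(best_loss, best_logits, loss, logits)
```

Trajectory 0 keeps its step-2 logits and trajectory 1 keeps its step-3 logits, so each trajectory is restored independently of the others.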

How It Works

The gradient optimizer relaxes the discrete sequence into a continuous logit matrix of shape L×|V|, one per trajectory, and takes gradient steps that lower the constraint loss, optionally sharpening the relaxation from soft to hard before decoding. Each step interpolates the relaxation parameters, backpropagates every differentiable constraint into a per-trajectory gradient, merges the gradients, and applies an SGD or Adam update:
progress = step / num_steps
soft, hard, τ   interpolate  start → end  with progress
E_k = Σ_i  w_i(step) · L_i(logits_k)          (weighted constraint losses)
logits_k ← Optimizer(logits_k, ∇E_k, lr)      (SGD or Adam)
Gradients merge by weighted_sum, pcgrad, or mgda; fixed_positions stay frozen (gradient set to 0). With save_best, the lowest-loss logits seen across all steps are decoded at the end through the PositionWeightGenerator (argmax by default, or categorical sampling).
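The update above can be sketched for a single trajectory in NumPy. This is a toy under stated assumptions, not the library's implementation: the "constraints" are cross-entropy losses against one-hot targets with an analytic gradient (standing in for real differentiable constraints), the merge is weighted_sum, the normalization is unit L2, and the update is plain SGD. All function names are hypothetical.

```python
import numpy as np


def softmax(logits, temperature):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_loss_and_grad(logits, target, temperature):
    """Cross-entropy of softmax(logits / T) against a one-hot target, plus d(loss)/d(logits)."""
    p = softmax(logits, temperature)
    loss = -np.sum(target * np.log(p + 1e-12))
    grad = (p - target) / temperature
    return loss, grad


def gradient_step(logits, targets, weights, lr, temperature, fixed_positions=()):
    results = [toy_loss_and_grad(logits, t, temperature) for t in targets]
    # E = sum_i w_i * L_i, evaluated here at the logits before the update.
    energy = sum(w * loss for w, (loss, _) in zip(weights, results))
    merged = sum(w * grad for w, (_, grad) in zip(weights, results))  # weighted_sum merge
    frozen = np.asarray(list(fixed_positions), dtype=int)
    merged[frozen] = 0.0  # frozen positions receive no update
    norm = np.linalg.norm(merged)
    if norm > 0.0:
        merged = merged / norm  # unit L2 normalization of the merged gradient
    return logits - lr * merged, energy  # SGD update


vocab = 4
target_a = np.eye(vocab)[[0, 1, 2, 3]]  # toy constraint A, weight 1.0
target_b = np.eye(vocab)[[0, 1, 2, 0]]  # toy constraint B, weight 0.25
logits = np.zeros((4, vocab))
energies = []
for _ in range(50):
    logits, energy = gradient_step(
        logits, [target_a, target_b], [1.0, 0.25],
        lr=0.5, temperature=1.0, fixed_positions=[0],
    )
    energies.append(energy)
decoded = logits.argmax(axis=-1)  # argmax decode, as the generator does by default
```

Position 0 is frozen, so its logits stay at their initial values, while the remaining positions move toward the higher-weighted target.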

API Reference

Config: GradientOptimizerConfig
Configuration for gradient-based sequence optimization. Each GradientOptimizer runs one mode (fixed or ramping soft, with optional temperature annealing). Chain multiple in a Program for multi-phase pipelines (e.g. logit phase → softmax phase).
Ramps use progress = step / num_steps with step starting at 1, so step 1 evaluates to start + (end - start) / num_steps (not exactly start); step num_steps evaluates exactly to end.
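The off-by-one in the ramp is easiest to see with numbers. The sketch below assumes a linear curve (other curves such as cosine or exponential are not reproduced here), and `linear_ramp` is a hypothetical helper name.

```python
def linear_ramp(start, end, step, num_steps):
    """Linear interpolation with progress = step / num_steps, step counted from 1."""
    progress = step / num_steps
    return start + (end - start) * progress


# Ramp a value from 1.0 down to 0.0 over 4 steps.
values = [linear_ramp(1.0, 0.0, step, 4) for step in range(1, 5)]
```

Here `values` is `[0.75, 0.5, 0.25, 0.0]`: the first step is already one increment past `start`, and the last step lands exactly on `end`.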
num_results
integer
Candidate designs for this optimizer. Overrides program-level count.
num_steps
integer
default:"1"
Number of gradient descent steps.
lr
number
default:"0.05"
Base learning rate for gradient updates.
sequence_bias
SequenceLogitBiasConfig
Per-position logit bias for the target vocabulary; added to initial logits to seed the search.
soft_start
number
default:"1.0"
Soft sampling weight at the first step. 0 uses hard logits; 1 uses the full softmax over logits.
soft_end
number
default:"1.0"
Soft sampling weight at the final step. 0 uses hard logits; 1 uses the full softmax.
hard_start
number
default:"0.0"
Straight-through blend at step 1. 0 is fully relaxed; 1 is argmax forward + relaxed gradient.
hard_end
number
default:"0.0"
Straight-through blend at the final step. 0 is fully relaxed; 1 is argmax forward + relaxed gradient.
temperature_start
number
default:"1.0"
Softmax temperature at the first step. Lower values produce sharper distributions.
temperature_end
number
default:"1.0"
Softmax temperature at the final step. Lower values produce sharper distributions.
softmax_schedule
enum
default:"constant"
Curve interpolating the softmax temperature from start to end across optimization steps. Options: constant, cosine, exponential, hinge, linear, quadratic
lr_schedule
enum
default:"constant"
LR curve over the temperature endpoints; only active when scale_lr_by_temperature=True. Options: constant, cosine, exponential, hinge, linear, quadratic
merger
enum
default:"weighted_sum"
Strategy for merging gradients from multiple constraints. Options: weighted_sum, pcgrad, mgda
ml_optimizer
enum
default:"sgd"
Gradient update rule applied each step. Options: sgd, adam
adam_config
AdamConfig
Beta and epsilon parameters used when the update algorithm is ‘adam’.
norm_alignment
enum
default:"none"
How per-constraint gradients are rescaled before merging: as-is, unit-normalized, or match-first. Options: none, unit, match_first
zero_norm_eps
number
default:"0.0"
In match_first mode, zero out gradients with norm below this threshold.
normalize_gradients
boolean
default:"True"
Normalize the merged gradient before each update.
normalize_mode
enum
default:"unit"
‘unit’ rescales the gradient to unit L2 norm; ‘sqrt_length’ scales magnitude by sqrt(length). Options: unit, sqrt_length
fixed_positions
array
Zero-based positions to freeze during optimization. Pair with sequence_bias to anchor each position.
scale_lr_by_temperature
boolean
default:"False"
Multiply LR by a blend of soft weight and softmax temperature; slows updates as sharpness rises.
min_lr_scale
number
default:"0.0"
Lower bound on the learning-rate scale factor when temperature scaling is enabled.
save_best
boolean
default:"True"
Return the lowest-loss result instead of the last iteration.
constraint_weight_schedules
array
Per-constraint weight schedules that override the constraint’s static weight at each step.
gumbel_logit_init
boolean
default:"False"
Add Gumbel noise to default-init logits (frozen positions excluded) to diverge trajectories.
gumbel_init_alpha
number
default:"1.0"
Divisor for the default-path Gumbel init noise. 1.0 = unscaled; larger shrinks it.
initial_logits
array
Base logit matrix (rows=positions, cols=vocab) that replaces default initialization.
softmax_init_positions
array
Zero-based positions perturbed with Gumbel noise and passed through a softmax over initial logits.
seed
integer
Random seed for reproducible optimization, generator, and constraint tool streams.
tracking_interval
integer
default:"1"
Save history and log progress every N steps. Step 0 and final step always saved.
track_proposals
boolean
default:"False"
Save granular per-proposal results (accept/reject) in history snapshots.
verbose
boolean
default:"False"
Emit per-step debug information about proposals, scores, and acceptance through the logger.
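To make the gumbel_logit_init, gumbel_init_alpha, and fixed_positions entries above concrete, here is a rough NumPy sketch of seeding parallel trajectories with Gumbel noise. It rests on assumptions: default initialization is taken to be all zeros, the noise is standard Gumbel divided by alpha, and `gumbel_init` is a hypothetical helper rather than a library function.

```python
import numpy as np


def gumbel_init(num_results, length, vocab_size, alpha=1.0,
                fixed_positions=(), seed=0):
    """Zero logits plus per-trajectory Gumbel noise, skipping frozen positions."""
    rng = np.random.default_rng(seed)
    logits = np.zeros((num_results, length, vocab_size))
    # Standard Gumbel noise via inverse transform sampling: -log(-log(U)).
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    noise = -np.log(-np.log(u)) / alpha  # alpha acts as a divisor: larger means smaller noise
    frozen = np.asarray(list(fixed_positions), dtype=int)
    noise[:, frozen, :] = 0.0  # frozen positions are excluded from the perturbation
    return logits + noise


init = gumbel_init(num_results=3, length=5, vocab_size=4, fixed_positions=[0, 2])
half = gumbel_init(num_results=3, length=5, vocab_size=4, alpha=2.0, fixed_positions=[0, 2])
```

Each trajectory starts from a different point in logit space, which is what lets otherwise identical gradient descents diverge, while a fixed seed keeps the run reproducible.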

Usage

python
>>> from proto_language.constraint import MalinoisActivityConfig, malinois_activity_constraint
>>> from proto_language.core import Constraint, Construct, Program, Segment
>>> from proto_language.generator import PositionWeightGenerator, PositionWeightGeneratorConfig
>>> from proto_language.optimizer import GradientOptimizer, GradientOptimizerConfig
>>> seg = Segment(sequence="A" * 200, sequence_type="dna", label="enhancer")
>>> gen = PositionWeightGenerator(PositionWeightGeneratorConfig(sampling_mode="argmax"))
>>> gen.assign(seg)
>>> on_target = Constraint(  # differentiable path
...     inputs=[seg],
...     function=malinois_activity_constraint,
...     function_config=MalinoisActivityConfig(cell_type="K562", direction="max"),
...     label="malinois_k562_max",
...     weight=1.0,
... )
>>> off_target = Constraint(
...     inputs=[seg],
...     function=malinois_activity_constraint,
...     function_config=MalinoisActivityConfig(cell_type="HepG2", direction="min"),
...     label="malinois_hepg2_min",
...     weight=1.0,
... )
>>> optimizer = GradientOptimizer(
...     target_segment=seg,
...     constructs=[Construct([seg])],
...     generators=[gen],
...     constraints=[on_target, off_target],
...     config=GradientOptimizerConfig(
...         num_results=20,
...         num_steps=300,
...         lr=0.5,
...         ml_optimizer="adam",
...         merger="weighted_sum",
...         gumbel_logit_init=True,  # diverge the 20 trajectories
...     ),
... )
>>> program = Program([optimizer], num_results=20)
>>> # program.run()  # needs GPU

Metadata

| Property | Value |
| --- | --- |
| Key | gradient |
| Class | GradientOptimizer |
| Targets Single Segment | True |
| Uses GPU | False |
| Required Constraint Mode | gradient |
| Compatible Generators | position-weight |