Gradient Optimizer

This optimizer is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

Source

proto-bio/proto-language/proto_language/optimizer/gradient_optimizer.py

Continuous gradient descent on per-position logits of a single segment. The optimization variable is seq.logits: an (L, |vocab|) matrix carried on each of num_results parallel proposal Sequences (one independent trajectory per result). The companion PositionWeightGenerator is the only thing that maps those continuous logits back to a discrete sequence: it never proposes; it merely decodes (argmax or categorical) at tracked steps so snapshots, result_sequences, and the next-stage handoff carry a real sequence. Logits are seeded once (zeros, or initial_logits/sequence_bias, optionally plus per-trajectory Gumbel noise so parallel trajectories diverge) and then mutated in place every step; they are never re-proposed.

Each step:

(1) Interpolate soft (relax↔hard blend), hard (straight-through), softmax temperature, and learning rate from their start→end configs/schedules with progress = step / num_steps.
(2) Ask every compiled gradient provider to backpropagate its differentiable (gradient-mode or compiler-backed) constraint into a per-trajectory logit gradient and per-trajectory loss, given the current temperature/soft/hard and the step’s effective weight; non-finite gradients raise.

Then, per trajectory:

(3) Align per-constraint gradient norms (norm_alignment), scale each by its effective weight, and merge them with the configured merger (weighted_sum/pcgrad/mgda).
(4) Zero fixed_positions and optionally normalize the merged gradient.
(5) Take one ml_optimizer step (SGD or Adam) at the effective learning rate (_effective_lr optionally scales it by (1 - soft) + soft * temp, floored at min_lr_scale).

After updating, energy_scores is set to the summed weighted constraint losses, the exact objective being minimized. At tracking_interval steps (and the last step) the generator decodes logits, proposals sync to results, and a snapshot is saved.

Per-constraint weights can ramp over steps via constraint_weight_schedules (a ConstraintWeightSchedule keyed by Constraint.label); unknown labels warn and are ignored. With save_best=True (default) the lowest-loss logits per trajectory are restored and re-decoded at the end instead of returning the final step.

Constraints: single target segment only; exactly one PositionWeightGenerator; every constraint must support gradient evaluation. Chain stages in a Program for multi-phase pipelines (logit-relaxation phase via germinal_logit_preset → softmax-annealing phase via germinal_softmax_preset).
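The decode step described above (argmax or categorical) can be illustrated with a small standalone sketch. The `decode` function below is hypothetical and is not the PositionWeightGenerator API; it only shows how an (L, |vocab|) logit matrix might be mapped back to a discrete sequence under the two modes.

```python
import math
import random


def decode(logits, vocab, mode="argmax", rng=None):
    """Hypothetical helper: map an (L, |vocab|) logit matrix to a string."""
    rng = rng or random.Random(0)
    tokens = []
    for row in logits:
        if mode == "argmax":
            # Pick the highest-scoring token at this position.
            idx = max(range(len(row)), key=row.__getitem__)
        elif mode == "categorical":
            # Sample from softmax(row); shift by the max for stability.
            top = max(row)
            weights = [math.exp(x - top) for x in row]
            idx = rng.choices(range(len(row)), weights=weights, k=1)[0]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        tokens.append(vocab[idx])
    return "".join(tokens)


logits = [[0.0, 2.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
print(decode(logits, "ACGT"))  # argmax over each row
```

Argmax decoding is deterministic, while categorical decoding depends on the random stream, which is why a seed matters for reproducibility in that mode.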

How It Works

The gradient optimizer relaxes the discrete sequence into a continuous logit matrix of shape L×|V|, one per trajectory, and takes gradient steps that lower the constraint loss, sharpening the relaxation from soft to hard before decoding. Each step sharpens a softmax relaxation, backpropagates every differentiable constraint into a per-trajectory gradient, merges the gradients, and applies an SGD or Adam update:

progress = step / num_steps
soft, hard, τ   interpolate  start → end  with progress
E_k = Σ_i  w_i(step) · L_i(logits_k)          (weighted constraint losses)
logits_k ← Optimizer(logits_k, ∇E_k, lr)      (SGD or Adam)

Gradients merge by weighted_sum, pcgrad, or mgda; fixed_positions stay frozen (gradient set to 0). With save_best, the lowest-loss logits seen across all steps are decoded at the end through the PositionWeightGenerator (argmax by default, or categorical sampling).
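To make the update loop concrete, here is a minimal pure-Python sketch of one trajectory under stated assumptions: a toy cross-entropy loss toward a fixed target stands in for a real differentiable constraint, the temperature ramps linearly, and the update is plain SGD. The helper names (`toy_loss_and_grad`, `sgd_step`) are illustrative only and do not come from the library.

```python
import math


def softmax(row, temp):
    top = max(row)
    exps = [math.exp((x - top) / temp) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_loss_and_grad(logits, target, temp):
    """Cross-entropy toward `target` indices (stand-in for a constraint)."""
    loss, grad = 0.0, []
    for row, t in zip(logits, target):
        p = softmax(row, temp)
        loss -= math.log(p[t])
        # d(-log p_t) / d(logit_j) = (p_j - 1[j == t]) / temp
        grad.append([(pj - (1.0 if j == t else 0.0)) / temp
                     for j, pj in enumerate(p)])
    return loss, grad


def sgd_step(logits, grad, lr, fixed_positions=()):
    """One SGD update; rows listed in fixed_positions are left untouched."""
    fixed = set(fixed_positions)
    return [
        row if i in fixed else [x - lr * g for x, g in zip(row, g_row)]
        for i, (row, g_row) in enumerate(zip(logits, grad))
    ]


logits = [[0.0] * 4 for _ in range(3)]  # L = 3 positions, |V| = 4
target = [0, 1, 2]
num_steps = 20
losses = []
for step in range(1, num_steps + 1):
    progress = step / num_steps
    temp = 1.0 + (0.5 - 1.0) * progress  # linear ramp from 1.0 to 0.5
    loss, grad = toy_loss_and_grad(logits, target, temp)
    losses.append(loss)
    logits = sgd_step(logits, grad, lr=0.5, fixed_positions=[2])

print(round(losses[0], 3), round(losses[-1], 3))
```

Position 2 is frozen, so its row stays at the initial zeros while the other rows move toward their targets and the loss falls.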

API Reference

Config: GradientOptimizerConfig

Configuration for gradient-based sequence optimization. Each GradientOptimizer runs one mode (fixed or ramping soft, with optional temperature annealing). Chain multiple in a Program for multi-phase pipelines (e.g. logit phase → softmax phase).

Ramps use progress = step / num_steps with step starting at 1, so step 1 evaluates to start + (end - start) / num_steps (not exactly start); step num_steps evaluates exactly to end.
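A short worked example of that ramp rule, assuming plain linear interpolation (the `linear_ramp` helper is illustrative, not a library function):

```python
def linear_ramp(start, end, step, num_steps):
    """Linear ramp with progress = step / num_steps and step starting at 1."""
    progress = step / num_steps
    return start + (end - start) * progress


# Ramp from 1.0 to 0.0 over four steps.
values = [linear_ramp(1.0, 0.0, step, 4) for step in range(1, 5)]
print(values)  # [0.75, 0.5, 0.25, 0.0]
```

Step 1 already sits one increment away from the start value, and only the last step lands exactly on the end value.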

num_results

integer

Number of candidate designs (one independent trajectory each) for this optimizer. Overrides the program-level count.

num_steps

integer

default:"1"

Number of gradient descent steps.

lr

number

default:"0.05"

Base learning rate for gradient updates.

sequence_bias

SequenceLogitBiasConfig

Per-position logit bias for the target vocabulary; added to initial logits to seed the search.

soft_start

number

default:"1.0"

Soft sampling weight at the first step. 0 uses hard logits; 1 uses the full softmax over logits.

soft_end

number

default:"1.0"

Soft sampling weight at the final step. 0 uses hard logits; 1 uses the full softmax.

hard_start

number

default:"0.0"

Straight-through blend at the first step. 0 is fully relaxed; 1 is argmax forward + relaxed gradient.

hard_end

number

default:"0.0"

Straight-through blend at the final step. 0 is fully relaxed; 1 is argmax forward + relaxed gradient.
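The following sketch shows one plausible reading of how soft and hard combine in the forward pass. It is an assumption rather than the library's code: soft is taken to blend raw logits (soft = 0) with softmax probabilities (soft = 1), and hard is taken to blend that result with a one-hot argmax. Pure Python has no autodiff, so the straight-through gradient is described only in a comment.

```python
import math


def softmax(row, temp=1.0):
    top = max(row)
    exps = [math.exp((x - top) / temp) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_row(row, soft, hard, temp):
    """Forward value for one position (illustrative helper, assumed form)."""
    probs = softmax(row, temp)
    # Assumption: soft=0 passes raw logits, soft=1 passes softmax probabilities.
    blended = [soft * p + (1.0 - soft) * x for p, x in zip(probs, row)]
    best = max(range(len(row)), key=row.__getitem__)
    one_hot = [1.0 if j == best else 0.0 for j in range(len(row))]
    # In an autodiff framework the one-hot branch would typically be written
    # as probs + stop_gradient(one_hot - probs), so that the forward value is
    # discrete while the gradient flows through the relaxed branch.
    return [hard * o + (1.0 - hard) * b for o, b in zip(one_hot, blended)]


row = [2.0, 0.0, 0.0, 0.0]
print(relaxed_row(row, soft=1.0, hard=0.0, temp=1.0))  # softmax probabilities
print(relaxed_row(row, soft=1.0, hard=1.0, temp=1.0))  # one-hot argmax
```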

temperature_start

number

default:"1.0"

Softmax temperature at the first step. Lower values produce sharper distributions.

temperature_end

number

default:"1.0"

Softmax temperature at the final step. Lower values produce sharper distributions.

softmax_schedule

enum

default:"constant"

Curve interpolating the softmax temperature from start to end across optimization steps. Options: constant, cosine, exponential, hinge, linear, quadratic

lr_schedule

enum

default:"constant"

LR curve over the temperature endpoints; only active when scale_lr_by_temperature=True. Options: constant, cosine, exponential, hinge, linear, quadratic

merger

enum

default:"weighted_sum"

Strategy for merging gradients from multiple constraints. Options: weighted_sum, pcgrad, mgda
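For intuition on how pcgrad differs from weighted_sum, the sketch below implements the projection idea usually associated with PCGrad: when two gradients conflict (negative dot product), each is projected onto the normal plane of the other before summing. This is a generic illustration on flattened, already weight-scaled gradients; the library's merger may differ in details such as projection order (commonly randomized).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum_merge(grads):
    """Baseline: element-wise sum of the (already weighted) gradients."""
    return [sum(col) for col in zip(*grads)]


def pcgrad_merge(grads):
    """Project away pairwise conflicts, then sum (fixed order for clarity)."""
    merged = [0.0] * len(grads[0])
    for i, g in enumerate(grads):
        proj = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(proj, other)
            norm_sq = dot(other, other)
            if d < 0.0 and norm_sq > 0.0:
                # Remove the component of proj that opposes `other`.
                proj = [p - (d / norm_sq) * o for p, o in zip(proj, other)]
        merged = [m + p for m, p in zip(merged, proj)]
    return merged


g1, g2 = [1.0, 0.0], [-1.0, 1.0]  # conflicting: dot(g1, g2) = -1
print(weighted_sum_merge([g1, g2]))  # [0.0, 1.0]
print(pcgrad_merge([g1, g2]))        # [0.5, 1.5]
```

With non-conflicting gradients the two mergers agree, so the projection only matters when constraints pull the logits in opposing directions.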

ml_optimizer

enum

default:"sgd"

Gradient update rule applied each step. Currently ‘sgd’ or ‘adam’. Options: sgd, adam

adam_config

AdamConfig

Beta and epsilon parameters used when the update algorithm is ‘adam’.

norm_alignment

enum

default:"none"

How per-constraint gradients are rescaled before merging: as-is, unit-normalized, or match-first. Options: none, unit, match_first

zero_norm_eps

number

default:"0.0"

In match_first mode, zero out gradients with norm below this threshold.
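The three norm_alignment modes can be sketched as follows. The exact semantics of match_first are an assumption here: each gradient is rescaled to the L2 norm of the first constraint's gradient, and gradients whose norm falls below zero_norm_eps are zeroed instead. The `align_norms` helper is illustrative and operates on flattened gradients.

```python
import math


def l2_norm(g):
    return math.sqrt(sum(x * x for x in g))


def align_norms(grads, mode="none", zero_norm_eps=0.0):
    """Rescale per-constraint gradients before merging (assumed semantics)."""
    if mode == "none":
        return [list(g) for g in grads]
    aligned = []
    target = l2_norm(grads[0])  # only used by match_first
    for g in grads:
        n = l2_norm(g)
        if mode == "unit":
            aligned.append([x / n for x in g] if n > 0.0 else list(g))
        elif mode == "match_first":
            if n == 0.0 or n < zero_norm_eps:
                # Too small to rescale reliably: drop this gradient.
                aligned.append([0.0] * len(g))
            else:
                scale = target / n
                aligned.append([x * scale for x in g])
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return aligned


grads = [[3.0, 4.0], [0.0, 10.0]]
print(align_norms(grads, "match_first"))  # second gradient rescaled to norm 5
```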

normalize_gradients

boolean

default:"True"

Normalize the merged gradient before each update.

normalize_mode

enum

default:"unit"

‘unit’ rescales the gradient to unit L2 norm; ‘sqrt_length’ scales magnitude by sqrt(length). Options: unit, sqrt_length

fixed_positions

array

Zero-based positions to freeze during optimization. Pair with sequence_bias to anchor each position.

scale_lr_by_temperature

boolean

default:"False"

Multiply LR by a blend of soft weight and softmax temperature; slows updates as sharpness rises.

min_lr_scale

number

default:"0.0"

Lower bound on the learning-rate scale factor when temperature scaling is enabled.
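The learning-rate scaling can be written out directly from the description of _effective_lr: the scale factor is (1 - soft) + soft * temp, floored at min_lr_scale. The function below is a standalone sketch of that arithmetic, not the library method, and it ignores any additional lr_schedule curve.

```python
def effective_lr(lr, soft, temp, min_lr_scale=0.0, scale_lr_by_temperature=True):
    """Sketch of temperature-aware LR scaling (illustrative helper)."""
    if not scale_lr_by_temperature:
        return lr
    scale = (1.0 - soft) + soft * temp
    return lr * max(scale, min_lr_scale)


print(effective_lr(0.1, soft=1.0, temp=0.5))   # fully soft: lr scaled by temp
print(effective_lr(0.1, soft=0.5, temp=0.5))   # half soft: scale is 0.75
print(effective_lr(0.1, soft=1.0, temp=0.01, min_lr_scale=0.1))  # floor applies
```

As the temperature drops and the distribution sharpens, the step size shrinks with it, and min_lr_scale keeps the updates from stalling entirely.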

save_best

boolean

default:"True"

Return the lowest-loss result instead of the last iteration.
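A minimal sketch of the bookkeeping that save_best implies for a single trajectory. The `track_best` helper and the record layout are hypothetical. Because logits are mutated in place every step, the best state is copied rather than referenced.

```python
import copy


def track_best(best, step, loss, logits):
    """Keep a copy of the lowest-loss logits seen so far (illustrative)."""
    if best is None or loss < best["loss"]:
        return {"step": step, "loss": loss, "logits": copy.deepcopy(logits)}
    return best


# (step, loss, logits) for one trajectory; the loss rises again at step 3.
history = [(1, 3.0, [[0.1]]), (2, 1.5, [[0.4]]), (3, 2.0, [[0.9]])]
best = None
for step, loss, logits in history:
    best = track_best(best, step, loss, logits)

print(best["step"], best["loss"])  # 2 1.5
```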

constraint_weight_schedules

array

Per-constraint weight schedules that override the constraint’s static weight at each step.

gumbel_logit_init

boolean

default:"False"

Add Gumbel noise to default-init logits (frozen positions excluded) to diverge trajectories.

gumbel_init_alpha

number

default:"1.0"

Divisor for the default-path Gumbel init noise. 1.0 = unscaled; larger shrinks it.
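The sketch below illustrates the idea behind Gumbel-noise initialization: each trajectory draws independent Gumbel(0, 1) samples via -log(-log(u)), divides them by alpha, and leaves frozen positions at the default zeros. The `gumbel_init` function and its arguments are illustrative assumptions, not the library's initializer.

```python
import math
import random


def gumbel_init(length, vocab_size, num_trajectories, alpha=1.0,
                fixed_positions=(), seed=0):
    """Seed zero logits with Gumbel noise / alpha, skipping frozen rows."""
    rng = random.Random(seed)
    fixed = set(fixed_positions)
    trajectories = []
    for _traj in range(num_trajectories):
        logits = []
        for pos in range(length):
            if pos in fixed:
                logits.append([0.0] * vocab_size)
                continue
            row = []
            for _tok in range(vocab_size):
                # Clamp u away from 0 and 1 so both logs stay finite.
                u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
                row.append(-math.log(-math.log(u)) / alpha)
            logits.append(row)
        trajectories.append(logits)
    return trajectories


trajs = gumbel_init(length=5, vocab_size=4, num_trajectories=3,
                    fixed_positions=[0])
print(trajs[0][0])  # frozen position keeps its default zeros
```

A larger alpha shrinks the noise, so trajectories start closer together; the shared seed makes the whole set reproducible.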

initial_logits

array

Base logit matrix (rows=positions, cols=vocab) that replaces default initialization.

softmax_init_positions

array

Zero-based positions perturbed with Gumbel noise and passed through a softmax over initial logits.

seed

integer

Random seed for reproducible optimization, generator, and constraint tool streams.

tracking_interval

integer

default:"1"

Save history and log progress every N steps. Step 0 and final step always saved.

track_proposals

boolean

default:"False"

Save granular per-proposal results (accept/reject) in history snapshots.

verbose

boolean

default:"False"

Emit per-step debug information about proposals, scores, and acceptance through the logger.

Usage

python

>>> from proto_language.constraint import MalinoisActivityConfig, malinois_activity_constraint
>>> from proto_language.core import Constraint, Construct, Program, Segment
>>> from proto_language.generator import PositionWeightGenerator, PositionWeightGeneratorConfig
>>> from proto_language.optimizer import GradientOptimizer, GradientOptimizerConfig
>>> seg = Segment(sequence="A" * 200, sequence_type="dna", label="enhancer")
>>> gen = PositionWeightGenerator(PositionWeightGeneratorConfig(sampling_mode="argmax"))
>>> gen.assign(seg)
>>> on_target = Constraint(  # differentiable path
...     inputs=[seg],
...     function=malinois_activity_constraint,
...     function_config=MalinoisActivityConfig(cell_type="K562", direction="max"),
...     label="malinois_k562_max",
...     weight=1.0,
... )
>>> off_target = Constraint(
...     inputs=[seg],
...     function=malinois_activity_constraint,
...     function_config=MalinoisActivityConfig(cell_type="HepG2", direction="min"),
...     label="malinois_hepg2_min",
...     weight=1.0,
... )
>>> optimizer = GradientOptimizer(
...     target_segment=seg,
...     constructs=[Construct([seg])],
...     generators=[gen],
...     constraints=[on_target, off_target],
...     config=GradientOptimizerConfig(
...         num_results=20,
...         num_steps=300,
...         lr=0.5,
...         ml_optimizer="adam",
...         merger="weighted_sum",
...         gumbel_logit_init=True,  # diverge the 20 trajectories
...     ),
... )
>>> program = Program([optimizer], num_results=20)
>>> # program.run()  # needs GPU

Metadata

Property	Value
Key	`gradient`
Class	`GradientOptimizer`
Targets Single Segment	`True`
Uses GPU	`False`
Required Constraint Mode	`gradient`
Compatible Generators	`position-weight`
