Beam Search Optimizer

This optimizer is open source. Any third-party models, product names, or trademarks referenced are the property of their respective owners, and Proto is not affiliated with them.

Source

proto-bio/proto-language/proto_language/optimizer/beam_search_optimizer.py

Beam search optimizer for sequence generation. This optimizer runs beam search over a single target segment. It maintains K beams (running sequences) and generates K x N total proposals at each step by producing N variations per beam. After constraint evaluation on the full accumulated sequence, only the top K sequences by energy are retained for the next step.

The segment is split into ceil(sequence_length / beam_length) steps. Each step asks the single autoregressive generator for beam_length new tokens per proposal (the last step is truncated to the remaining tokens) and scores each proposal on its full accumulated sequence. Before ranking, any beam left with fewer than proposals_per_result valid proposals is resampled up to max_resample_attempts times; a RuntimeError is raised if a beam still falls short.

Within a beam, proposals are kept by their most recent step energy. Across beams, the top num_results survivors are ranked by score_by (“mean” averages a beam’s per-step energies, “last” uses only the most recent), become the next step’s parent beams, and seed the final result_sequences.

prepend_prompt controls whether the prompt is included in the output, and use_kv_caching reuses generator cache state across steps (requires a KV-cache-capable generator).

Use this optimizer for long autoregressive design under sequence-level constraints. It targets a single segment and requires a protein/DNA language-model generator (Evo1/Evo2/ProGen2), not a CPU generator.
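The step schedule described above can be sketched in a few lines. This is an illustrative helper only, not part of proto_language; the name step_lengths is hypothetical, and it assumes the last step simply takes whatever tokens remain.

```python
import math


def step_lengths(sequence_length: int, beam_length: int) -> list[int]:
    """Tokens requested at each beam-search step (hypothetical helper)."""
    if sequence_length <= 0 or beam_length <= 0:
        raise ValueError("sequence_length and beam_length must be positive")
    num_steps = math.ceil(sequence_length / beam_length)
    lengths = [beam_length] * num_steps
    # The last step is truncated to the tokens that remain in the segment.
    lengths[-1] = sequence_length - beam_length * (num_steps - 1)
    return lengths


print(step_lengths(10000, 2000))  # [2000, 2000, 2000, 2000, 2000]
print(step_lengths(10000, 3000))  # [3000, 3000, 3000, 1000]
```

When beam_length does not divide the segment length evenly, only the final step is shorter, so every earlier re-ranking happens at a regular interval.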

How It Works

Beam search grows one segment left to right, pruning weak beams as it goes. The segment is split into num_beams = ⌈ L / beam_length ⌉ steps, where L is the segment length. At each step, each of the K = num_results beams is extended by beam_length tokens in proposals_per_result variations, each candidate is scored on its full accumulated sequence, and only the top K beams survive:

num_beams = ⌈ L / beam_length ⌉
score_agg(beam) = mean(beam_scores)   if score_by = "mean"
                = beam_scores[-1]      if score_by = "last"
keep the K beams with smallest score_agg
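As a worked example of the aggregation rule, the sketch below ranks three beams under both score_by settings. The helper names (score_agg, keep_top_k) and the energy values are made up for illustration and are not taken from the library.

```python
from statistics import fmean


def score_agg(beam_scores, score_by="mean"):
    """Aggregate a beam's per-step energies (lower is better)."""
    if score_by == "mean":
        return fmean(beam_scores)
    if score_by == "last":
        return beam_scores[-1]
    raise ValueError(f"unknown score_by: {score_by!r}")


def keep_top_k(beams, k, score_by="mean"):
    """Return the names of the k beams with the smallest aggregated energy."""
    return sorted(beams, key=lambda name: score_agg(beams[name], score_by))[:k]


# Hypothetical per-step energies for three beams after two steps.
beams = {"a": [4.0, 1.0], "b": [2.0, 2.0], "c": [1.0, 5.0]}

print(keep_top_k(beams, 2, "mean"))  # ['b', 'a']  (means: a=2.5, b=2.0, c=3.0)
print(keep_top_k(beams, 2, "last"))  # ['a', 'b']  (last: a=1.0, b=2.0, c=5.0)
```

The two settings can disagree on ordering: "mean" rewards beams that were consistently good, while "last" rewards beams whose latest extension scored well regardless of history.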

Beams left with too few valid proposals are resampled (up to max_resample_attempts). Beam search starts from prompt and ignores upstream results; use_kv_caching reuses the generator’s KV cache across steps.
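The loop described in this section can be illustrated with a toy, self-contained version. Everything here is a stand-in: toy_generate replaces the language-model generator with random bases, toy_energy replaces the constraints with a GC-balance score, and the function names are hypothetical. It also simplifies the ranking to a single global sort and always keeps the prompt in the output, so treat it as a sketch of the idea rather than the optimizer's actual implementation.

```python
import math
import random


def toy_generate(prefix, n_tokens, rng):
    """Stand-in generator: extends prefix with n_tokens random DNA bases."""
    return prefix + "".join(rng.choice("ACGT") for _ in range(n_tokens))


def toy_energy(seq):
    """Stand-in constraint: distance of the GC fraction from 0.5 (lower is better)."""
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return abs(gc_fraction - 0.5)


def aggregate(step_energies, score_by):
    """'mean' averages the per-step energies; 'last' uses the most recent one."""
    if score_by == "last":
        return step_energies[-1]
    return sum(step_energies) / len(step_energies)


def toy_beam_search(prompt, sequence_length, beam_length, num_results,
                    proposals_per_result, max_resample_attempts=3,
                    score_by="mean", seed=0):
    rng = random.Random(seed)
    beams = [(prompt, [])]  # each beam: (accumulated sequence, per-step energies)
    num_steps = math.ceil(sequence_length / beam_length)
    for step in range(num_steps):
        # The last step is truncated to the tokens that remain.
        n_new = min(beam_length, sequence_length - step * beam_length)
        candidates = []
        for seq, energies in beams:
            valid = []
            # One initial round plus up to max_resample_attempts retries.
            for _attempt in range(1 + max_resample_attempts):
                for _ in range(proposals_per_result - len(valid)):
                    proposal = toy_generate(seq, n_new, rng)
                    energy = toy_energy(proposal)  # scored on the full sequence
                    if math.isfinite(energy):
                        valid.append((proposal, energies + [energy]))
                if len(valid) == proposals_per_result:
                    break
            else:
                raise RuntimeError("beam still short of valid proposals")
            candidates.extend(valid)
        # Keep the num_results candidates with the smallest aggregated energy.
        candidates.sort(key=lambda c: aggregate(c[1], score_by))
        beams = candidates[:num_results]
    return beams


results = toy_beam_search("ATCG", sequence_length=10, beam_length=4,
                          num_results=2, proposals_per_result=3)
for seq, energies in results:
    print(seq, energies)
```

Because toy_energy is always finite, the retry path never fires in this example; it is included only to show where resampling and the RuntimeError would sit in the loop.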

API Reference

Config: BeamSearchOptimizerConfig

Configuration object for BeamSearchOptimizer. This class defines configuration parameters for the beam search optimizer, which generates a single long segment by splitting it into beams of beam_length tokens and performing beam search at each beam boundary.

prompt

string

required

Non-empty seed sequence that every beam begins from and extends (e.g. ‘ATCG’ for DNA).

num_results

integer

Number of beams (top-K by energy) retained at each beam boundary. Overrides program-level count.

proposals_per_result

integer

required

Number of proposals to generate per result sequence at each beam step.

beam_length

integer

required

Tokens generated per beam-search step before re-ranking; the segment is split into ceil(sequence_length / beam_length) steps.

score_by

enum

default:"mean"

‘mean’ averages a beam’s per-step energies across all steps; ‘last’ uses only the most recent. Options: mean, last

prepend_prompt

boolean

default:"True"

Whether to prepend the prompt to the generated sequence in the output.

use_kv_caching

boolean

default:"False"

Reuse cached KV state across beam steps to speed up generation; needs a KV-capable generator.

max_resample_attempts

integer

default:"3"

Maximum number of times to resample beams with invalid (inf/NaN) energies before giving up.

seed

integer

Random seed for reproducible optimization, generator, and constraint tool streams.

tracking_interval

integer

default:"1"

Save history and log progress every N steps. Step 0 and final step always saved.

track_proposals

boolean

default:"False"

Save granular per-proposal results (accept/reject) in history snapshots.

verbose

boolean

default:"False"

Emit per-step debug information about proposals, scores, and acceptance through the logger.
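To make the tracking_interval rule concrete, here is one possible reading of it as code. This is an assumption, not the library's logic: it treats steps as 0-indexed and "every N steps" as "step index divisible by N", with the final step always included. The helper name tracked_steps is hypothetical.

```python
def tracked_steps(num_steps: int, tracking_interval: int = 1) -> list[int]:
    """Step indices whose history would be saved, under the stated assumptions."""
    if num_steps <= 0 or tracking_interval <= 0:
        raise ValueError("num_steps and tracking_interval must be positive")
    last = num_steps - 1
    # Step 0 is divisible by any interval; the final step is added explicitly.
    return [s for s in range(num_steps) if s % tracking_interval == 0 or s == last]


print(tracked_steps(5, 2))  # [0, 2, 4]
print(tracked_steps(6, 4))  # [0, 4, 5]
```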

Usage

python

>>> from proto_language.constraint import gc_content_constraint
>>> from proto_language.core import Constraint, Construct, Segment
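>>> from proto_language.generator import Evo2Generator, Evo2GeneratorConfig
>>> from proto_language.optimizer.beam_search_optimizer import (
...     BeamSearchOptimizer,
...     BeamSearchOptimizerConfig,
... )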
>>>
>>> segment = Segment(length=10000, sequence_type="dna")
>>> generator = Evo2Generator(Evo2GeneratorConfig(prompts="ATCG"))
>>> gc = Constraint(
...     inputs=[segment], function=gc_content_constraint, function_config={"min_gc": 40, "max_gc": 60}
... )
>>> beam_search = BeamSearchOptimizer(
...     target_segment=segment,
...     constructs=[Construct([segment])],
...     generators=[generator],
...     constraints=[gc],
...     config=BeamSearchOptimizerConfig(
...         prompt="ATCG", beam_length=2000, num_results=5, proposals_per_result=10
...     ),
... )
>>> # beam_search.run() drives the loop
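For a rough sense of the work this configuration implies, the arithmetic below counts steps and proposals for the values used in the example above. It assumes every step evaluates the full K x N proposals and ignores any resampling, so the totals are an estimate rather than a guarantee.

```python
import math

# Values taken from the usage example.
sequence_length, beam_length = 10_000, 2_000
num_results, proposals_per_result = 5, 10

num_steps = math.ceil(sequence_length / beam_length)       # 5 steps
proposals_per_step = num_results * proposals_per_result    # 50 proposals per step
total_proposals = num_steps * proposals_per_step           # 250 proposals overall

print(num_steps, proposals_per_step, total_proposals)      # 5 50 250
```

Each proposal is scored on its full accumulated sequence, so constraint evaluation gets more expensive at later steps as the sequences grow.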

Metadata

Property	Value
Key	`beam-search`
Class	`BeamSearchOptimizer`
Targets Single Segment	`True`
Uses GPU	`False`
Compatible Generators	`evo1`, `evo2`, `progen2`