cobre-sddp
alpha
cobre-sddp implements the Stochastic Dual Dynamic Programming (SDDP) algorithm
(Pereira & Pinto, 1991) for long-term hydrothermal dispatch and energy planning.
It is the first algorithm vertical in the Cobre ecosystem: a training loop that
iteratively improves a piecewise-linear approximation of the value function for
multi-stage stochastic linear programs.
For the mathematical foundations — including the Benders decomposition, cut coefficient derivation, and risk measure theory — see the methodology reference.
This crate depends on cobre-core for system data types, cobre-stochastic for
inflow scenario generation and load noise parameters, cobre-solver for LP
subproblem solving, and cobre-comm for distributed communication.
Iteration lifecycle
Each training iteration follows a fixed eight-step sequence. The ordering ensures the lower bound is evaluated after the backward pass and cut synchronization, not during forward synchronization.
┌─────────────────────────────────────────────────────────────────────────┐
│ Step 1 Forward pass │
│ Each rank simulates config.forward_passes scenarios through │
│ all stages, solving the LP at each (scenario, stage) pair with │
│ the current FCF approximation. │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 2 Forward sync │
│ allreduce (sum + broadcast) aggregates local UB statistics into │
│ a global mean, standard deviation, and 95% CI half-width. │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 3 State exchange │
│ allgatherv gathers all ranks' trial point state vectors so │
│ every rank can solve the backward pass at ALL trial points. │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 4 Backward pass │
│ Sweeps stages T-2 down to 0, solving the successor LP under │
│ every opening from the fixed tree, extracting LP duals to form │
│ Benders cut coefficients, and inserting one cut per trial point │
│ per stage into the Future Cost Function (FCF). │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 5 Cut sync │
│ allgatherv shares each rank's newly generated cuts so that all │
│ ranks maintain an identical FCF at the end of each iteration. │
│ │
│ Step 5a Cut management pipeline (optional, two stages) │
│ S1: Strategy-based selection (Level1/LML1/Dominated) — │
│ runs at multiples of check_frequency. Dynamic (DCS) is a │
│ per-solve lazy loop that ignores check_frequency. │
│ S2: Budget enforcement — hard cap on active cuts per stage, │
│ runs every iteration when max_active_per_stage is set. │
│ │
│ Step 5b LB evaluation │
│ Rank 0 solves the stage-0 LP for every opening in the tree │
│ and aggregates the objectives via the stage-0 risk measure. │
│ The scalar lower bound is broadcast to all ranks. │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 6 Convergence check │
│ The ConvergenceMonitor updates bound statistics and evaluates │
│ the configured stopping rules to determine whether to stop. │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 7 Checkpoint │
│ The FlatBuffers policy checkpoint infrastructure is │
│ implemented in cobre-io (write_policy_checkpoint). The CLI │
│ writes a final snapshot after training completes. Periodic │
│ in-loop writes via checkpoint_interval are not yet wired │
│ into the training loop. │
├─────────────────────────────────────────────────────────────────────────┤
│ Step 8 Event emission │
│ TrainingEvent values are sent to the optional event channel │
│ for real-time monitoring by the CLI or TUI layer. │
└─────────────────────────────────────────────────────────────────────────┘
The convergence gap is computed as:
gap = (UB - LB) / max(1.0, |UB|)
The max(1.0, |UB|) guard prevents division by zero when the upper bound is
near zero.
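The guard can be illustrated with a small standalone function. This is a minimal sketch of the formula above; the name convergence_gap is hypothetical and is not claimed to be part of the cobre-sddp API.

```rust
/// Relative gap between upper and lower bound.
/// The denominator is clamped so it never drops below 1.0.
fn convergence_gap(ub: f64, lb: f64) -> f64 {
    (ub - lb) / ub.abs().max(1.0)
}
```

For UB = 0.5 and LB = 0.2 the denominator is clamped to 1.0, so the gap is 0.3 rather than 0.6.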
Module overview
| Module | Responsibility |
|---|---|
| training | train: the top-level loop orchestrator; wires all steps together |
| forward | run_forward_pass, sync_forward: step 1 and step 2 |
| state_exchange | ExchangeBuffers: step 3 allgatherv of trial point state vectors |
| backward | run_backward_pass: step 4 Benders cut generation with work-stealing parallelism |
| cut_sync | CutSyncBuffers: step 5 allgatherv of new cut wire records |
| cut_selection | CutSelectionStrategy, CutMetadata, CutActivityUpdates: step 5a Stage 1 pool pruning |
| lower_bound | evaluate_lower_bound: step 5b risk-adjusted LB computation (parallelized across openings) |
| convergence | ConvergenceMonitor: step 6 bound tracking and stopping rule evaluation |
| cut | CutPool, FutureCostFunction, CutRowMap, WARM_START_ITERATION: append-only cut storage with RHS-toggle deactivation, wire format, and LP row mapping |
| basis_reconstruct | reconstruct_basis: slot-tracked warm-start basis reconstruction; reconciles stored cut rows by slot identity and classifies newly added cuts at the capture-time state |
| config | TrainingConfig: algorithm parameters |
| context | StageContext, TrainingContext: hot-path argument bundles that absorb parameters into context structs |
| stopping_rule | StoppingRule, StoppingRuleSet, MonitorState: termination criteria |
| risk_measure | RiskMeasure, BackwardOutcome: risk-neutral and CVaR aggregation |
| horizon_mode | HorizonMode: finite vs. cyclic stage traversal (only Finite currently) |
| indexer | StageIndexer, EquipmentCounts, FphaColumnLayout: LP column/row offset arithmetic for stage subproblems |
| lp_builder | build_stage_templates, StageTemplates, PatchBuffer: stage template construction, LP scaling, and row-bound patch arrays |
| workspace | SolverWorkspace, WorkspacePool, BasisStore, CapturedBasis: per-worker solver instances with pre-allocated scratch buffers and slot-tracked basis storage |
| trajectory | TrajectoryRecord: forward pass LP solution record (primal, dual, state, cost) |
| noise | Noise-to-RHS-patch logic shared across forward, backward, and simulation passes; includes accumulate_and_shift_lag_state for sub-monthly lag accumulation |
| lag_transition | precompute_stage_lag_transitions: builds per-stage StageLagTransition configs from stage dates and lag period boundaries; accumulator seeding from RecentObservation for mid-season starts |
| solver_stats | SolverStatsEntry, SolverStatsDelta, aggregate_solver_statistics: per-phase solver statistics delta computation and cross-worker aggregation |
| scaling_report | ScalingReport, StageScalingReport, CoefficientRange: LP prescaling diagnostics written to JSON |
| simulation | Full simulation pipeline with stage-major loop; all result types (SimulationHydroResult, etc.); simulate, aggregate_simulation |
| error | SddpError: unified error type aggregating solver, comm, stochastic, and I/O errors |
| fpha_fitting | FPHA fitting pipeline: computes piecewise-linear hydroelectric production hyperplanes from reservoir geometry |
| hydro_models | prepare_hydro_models, EvaporationModel, FphaPlane, ResolvedProductionModel: hydro model preprocessing at initialization |
| generic_constraints | Generic constraint row entries: user-defined linear constraints with 20 variable types |
| inflow_method | InflowNonNegativityMethod: Truncation, Penalty, TruncationWithPenalty, and None strategies |
| estimation | EstimationReport, StdRatioDivergence: PAR estimation pipeline outputs |
| provenance | ModelProvenanceReport, build_provenance_report: round-trip audit trail for model preprocessing |
| stochastic_summary | StochasticSummary, build_stochastic_summary: human-readable summary of stochastic preprocessing |
| visited_states | VisitedStatesArchive: forward-pass trial point storage for cut selection and policy diagnostics |
| policy_export | Policy checkpoint writing (FlatBuffers cuts, basis, states, metadata) |
| policy_load | build_basis_cache_from_checkpoint, validate_policy_compatibility, load_boundary_cuts, inject_boundary_cuts: policy loading for warm-start, resume, and terminal boundary cut injection from external checkpoints |
| training_output | build_training_output: assembles all training results for the output writers |
| conversion | Type conversion utilities between internal and I/O representations |
| setup | StudySetup, StudyParams, prepare_stochastic: pre-built study state; holds four optional scenario libraries (historical_library, external_inflow_library, external_load_library, external_ncs_library) built conditionally from per-class SamplingScheme selections |
Configuration
TrainingConfig
TrainingConfig controls the training loop parameters. All fields are public
and must be set explicitly — there is no Default implementation, preventing
silent misconfigurations.
| Field | Type | Description |
|---|---|---|
| forward_passes | u32 | Scenarios per rank per iteration (must be >= 1) |
| max_iterations | u64 | Safety bound on total iterations; also sizes the row pool |
| checkpoint_interval | Option<u64> | Write checkpoint every N iterations; None = disabled |
| warm_start_cuts | Vec<u32> | Per-stage pre-loaded cut counts from a policy file |
| event_sender | Option<Sender<TrainingEvent>> | Channel for real-time monitoring events; None = silent |
| cut_selection | Option<CutSelectionStrategy> | Stage 1 cut selection strategy; None = no selection |
| budget | Option<u32> | Stage 2 max active cuts per stage; None = no budget |
StoppingRuleSet
The stopping rule set composes one or more termination criteria. Every set
must include an IterationLimit rule as a safety bound against infinite loops.
| Rule variant | Trigger condition |
|---|---|
| IterationLimit | iteration >= limit |
| TimeLimit | wall_time_seconds >= seconds |
| BoundStalling | Relative LB improvement over a sliding window falls below tolerance |
| SimulationBased | Periodic Monte Carlo simulation costs stabilize |
| GracefulShutdown | External SIGTERM / SIGINT received (always evaluated first) |
The mode field controls how multiple rules combine:
- StoppingMode::Any (OR): stop when any rule triggers.
- StoppingMode::All (AND): stop when all rules trigger simultaneously.
use cobre_sddp::stopping_rule::{StoppingMode, StoppingRule, StoppingRuleSet};
let stopping_rules = StoppingRuleSet {
rules: vec![
StoppingRule::IterationLimit { limit: 500 },
StoppingRule::BoundStalling {
tolerance: 0.001,
iterations: 20,
},
StoppingRule::GracefulShutdown,
],
mode: StoppingMode::Any,
};
RiskMeasure
RiskMeasure controls how per-opening backward pass outcomes are aggregated
into Benders cuts and how the lower bound is computed.
| Variant | Description |
|---|---|
| Expectation | Risk-neutral expected value. Weights equal opening probabilities. |
| CVaR | Convex combination (1 - λ)·E[Z] + λ·CVaR_α[Z]. alpha ∈ (0, 1], lambda ∈ [0, 1]. |
alpha = 1 with CVaR is equivalent to Expectation. lambda = 0 with
CVaR is also equivalent to Expectation. One RiskMeasure value is
assigned per stage from the stages.json configuration field risk_measure.
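The convex combination can be checked numerically on a small discrete distribution. The following is a minimal sketch, not the crate's implementation: the function name risk_adjusted_value is hypothetical, and it assumes that alpha is the probability mass of the worst (highest-cost) tail, which is the reading consistent with alpha = 1 reducing to the expectation.

```rust
/// Computes (1 - lambda) * E[Z] + lambda * CVaR_alpha[Z] for a discrete
/// cost distribution. Assumption: alpha is the mass of the highest-cost tail.
fn risk_adjusted_value(costs: &[f64], probs: &[f64], alpha: f64, lambda: f64) -> f64 {
    let expectation: f64 = costs.iter().zip(probs).map(|(c, p)| c * p).sum();

    // Visit outcomes from highest to lowest cost.
    let mut order: Vec<usize> = (0..costs.len()).collect();
    order.sort_by(|&a, &b| costs[b].total_cmp(&costs[a]));

    // Consume probability mass from the worst outcomes until alpha is used up.
    let mut remaining = alpha;
    let mut tail = 0.0;
    for i in order {
        if remaining <= 0.0 {
            break;
        }
        let mass = probs[i].min(remaining);
        tail += mass * costs[i];
        remaining -= mass;
    }
    let cvar = tail / alpha;

    (1.0 - lambda) * expectation + lambda * cvar
}
```

With four equiprobable costs 1, 2, 3, 4, alpha = 0.5 and lambda = 0.5, the expectation is 2.5, the tail average is 3.5, and the combined value is 3.0. Setting alpha = 1 or lambda = 0 returns 2.5 in both cases.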
CutSelectionStrategy
Cut selection is optional. When configured, it forms Stage 1 of the two-stage cut management pipeline that also includes budget enforcement (Stage 2). See the user-facing Performance Accelerators guide for the full pipeline description.
| Variant | Selection mechanism |
|---|---|
| Level1 | Deactivates cuts below tie_tolerance of the per-state max at every visited state |
| Lml1 | Deactivates cuts that are not the oldest eligible within tie_tolerance at any visited state |
| Dominated | Deactivates cuts below threshold of the per-state max at every visited state (all populated cuts) |
| Dynamic | Lazy incremental scheme (DCS): adds at most nadic cuts per inner re-solve round (the inner loop repeats up to max_inner_iterations rounds per backward solve) that violate the LP solution by more than epsilon_viol; never deactivates cuts from the pool |
Level1, Lml1, and Dominated respect a check_frequency parameter:
selection only runs at iterations that are multiples of check_frequency
and never at iteration 0. Stage 0 is always exempt.
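The scheduling rule in the paragraph above can be written as a single predicate. This is a sketch under stated assumptions: the function name selection_due is hypothetical, and treating check_frequency == 0 as "never run" is a choice made here to avoid a division by zero, not documented behavior.

```rust
/// Returns true when strategy-based selection should run for this
/// (iteration, stage) pair under the rule described above.
fn selection_due(iteration: u64, stage: usize, check_frequency: u64) -> bool {
    // Stage 0 is exempt; iteration 0 never runs selection.
    // Assumption: a zero frequency disables selection.
    if stage == 0 || iteration == 0 || check_frequency == 0 {
        return false;
    }
    iteration % check_frequency == 0
}
```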
Level1, Lml1, and Dominated share a single value-evaluation kernel
(select_for_stage in cut_selection.rs) that performs
O(|populated cuts| x |visited states|) work per stage per check.
The VisitedStatesArchive is always collected during training when any
of these three variants is enabled; the archive feeds the kernel for
Level1, Lml1, and Dominated alike. Dominated uses its threshold
field as the tie tolerance; Level1 and Lml1 use tie_tolerance
(default 1e-10).
Dynamic (Dynamic Cut Selection, DCS) operates differently: it is a
per-solve lazy selection loop that adds cuts on demand. It never invokes
the value-evaluation kernel and does not respect check_frequency. The
initial active set is seeded from the active_window most recent
iterations. See the
Performance Accelerators
guide for the full description and the
cut_selection reference for
all DCS parameters.
Key data structures
StudySetup
StudySetup is constructed once by StudySetup::new from a validated System and Config. It owns all precomputed state — stage templates, stochastic context, FCF, indexer, initial state, risk measures, and entity counts — and holds it across async boundaries as owned (non-borrowed) data.
Four optional library fields are built conditionally based on per-class SamplingScheme selections:
| Field | Type | Built when |
|---|---|---|
| historical_library | Option<HistoricalScenarioLibrary> | inflow_scheme == SamplingScheme::Historical |
| external_inflow_library | Option<ExternalScenarioLibrary> | inflow_scheme == SamplingScheme::External |
| external_load_library | Option<ExternalScenarioLibrary> | load_scheme == SamplingScheme::External |
| external_ncs_library | Option<ExternalScenarioLibrary> | ncs_scheme == SamplingScheme::External |
Callers borrow StudySetup to construct TrainingContext and StageContext; the public accessor methods (historical_library(), external_inflow_library(), etc.) return Option<&T> and are None for sampling schemes that do not use those libraries.
FutureCostFunction
The Future Cost Function (FCF) holds one CutPool per stage. Each CutPool
is an append-only flat array of cut slots. Cuts are inserted deterministically
by (iteration, forward_pass_index) to guarantee bit-for-bit identical FCF
state across all MPI ranks. Once a slot is populated it retains a stable
integer index for the lifetime of the run — no slot is ever reused or removed.
The FCF is built once before training begins. Total slot capacity is
warm_start_cuts + max_iterations * forward_passes per stage.
Cut deactivation is applied via set_active(stage, slot, false). An inactive
cut remains in storage and in the stage LP; only its row bounds are toggled to
[-f64::INFINITY, +f64::INFINITY], making the constraint trivially satisfied
without affecting the slot index or LP row index. The LP row index of each
cut slot is therefore stable across iterations, including after cut-selection
deactivation.
Two aggregate metrics are available per stage and are written to
training/metadata.json under the row_pool object: cuts_in_lp counts the
rows in the stage LP (active and inactive sentinel rows together, equal to
populated_count, the high-water mark of cuts ever inserted at that stage);
cuts_active counts only the currently active subset.
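The append-only storage and bound-toggle deactivation described in this section can be sketched as follows. SketchCutPool and all of its methods are hypothetical simplifications (coefficients are omitted), and the active row bounds [intercept, +inf) are an assumption about how a cut row is oriented in the LP; only the inactive bounds and the capacity formula come from the text above.

```rust
/// Hypothetical, simplified stand-in for one stage's cut pool.
struct SketchCutPool {
    intercepts: Vec<f64>,
    active: Vec<bool>,
    capacity: usize,
}

impl SketchCutPool {
    /// Capacity follows warm_start_cuts + max_iterations * forward_passes.
    fn new(warm_start_cuts: usize, max_iterations: usize, forward_passes: usize) -> Self {
        let capacity = warm_start_cuts + max_iterations * forward_passes;
        Self {
            intercepts: Vec::with_capacity(capacity),
            active: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Appends a cut and returns its stable slot index, or None when full.
    fn insert(&mut self, intercept: f64) -> Option<usize> {
        if self.intercepts.len() == self.capacity {
            return None;
        }
        self.intercepts.push(intercept);
        self.active.push(true);
        Some(self.intercepts.len() - 1)
    }

    /// Toggles activity; the slot itself is never removed or reused.
    fn set_active(&mut self, slot: usize, is_active: bool) {
        self.active[slot] = is_active;
    }

    /// Row bounds for the LP row of this slot.
    /// Assumption: an active cut is a ">= intercept" row.
    fn row_bounds(&self, slot: usize) -> (f64, f64) {
        if self.active[slot] {
            (self.intercepts[slot], f64::INFINITY)
        } else {
            (f64::NEG_INFINITY, f64::INFINITY)
        }
    }

    /// Rows present in the LP: every slot ever populated.
    fn cuts_in_lp(&self) -> usize {
        self.intercepts.len()
    }

    /// Currently active subset.
    fn cuts_active(&self) -> usize {
        self.active.iter().filter(|&&a| a).count()
    }
}
```

Deactivating slot 0 in a pool holding two cuts leaves cuts_in_lp at 2 and drops cuts_active to 1, while slot 1 keeps its index.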
Cut pool memory and LP shape
The stage LP grows monotonically: each stage LP carries
base_rows + populated_count rows, where base_rows is the fixed structural
row count and populated_count is the number of cut slots ever populated at
that stage. Sentinel rows for inactive cuts occupy a row in the LP permanently
but contribute no binding constraint.
The worst-case coefficient storage per rank is bounded by:
populated_per_stage × state_dimension × 8 bytes × num_stages
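As a worked instance of this bound, the sketch below evaluates the product for illustrative inputs. The function name and the example sizes are hypothetical and are not taken from any real case.

```rust
/// Worst-case coefficient storage per rank, in bytes:
/// populated_per_stage * state_dimension * 8 * num_stages.
fn worst_case_cut_bytes(populated_per_stage: u64, state_dimension: u64, num_stages: u64) -> u64 {
    populated_per_stage * state_dimension * 8 * num_stages
}
```

For example, 5,000 populated cuts per stage, a state dimension of 200, and 60 stages give 480,000,000 bytes of coefficients per rank.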
Inactive cuts still consume pricing time during the LP solve: the row coefficients participate in dual-simplex scanning even when the row bounds are at the infinity sentinel. This is a deliberate tradeoff: stable row indices enable allocation-free iteration and correct basis warm-start across cut-set changes, at the cost of a proportionally larger LP for runs that deactivate many cuts.
The cuts_in_lp and cuts_active fields in training/metadata.json under
row_pool expose this tradeoff quantitatively: cuts_in_lp is the total LP
row count (active + inactive), and cuts_active is the active subset. Both
fields are u64 and default to 0 when deserialising older manifests that
lack them.
PatchBuffer
A PatchBuffer holds the pre-allocated row-bound and column-bound arrays
consumed by the LP solver’s set_row_bounds and set_col_bounds calls.
It carries two regions:
- Row-bound region: sized for N + M*B + N patches (N hydros, M stochastic load buses, B max blocks), holding Categories 3, 4, and 5:
  - Category 3 [0, N): noise innovation (water-balance RHS at scenario noise).
  - Category 4 [N, N + M*B_active): load balance row patches (equality constraint at stochastic load demand per bus per block; optional, empty when n_load_buses == 0).
  - Category 5 [N + M*B, 2N + M*B): z-inflow definition RHS.
- Column-bound region: sized for N*(1+L) + A*K entries (A anticipated thermals, K max lead stages), holding Categories 1, 2, and 6:
  - Category 1: incoming storage columns, with col_lower[h] == col_upper[h] == state[h] for each hydro h.
  - Category 2: AR lag columns, with tight bounds at each lag state value.
  - Category 6: anticipated-state columns, with tight bounds at each ring-buffer slot.
State pinning (Categories 1, 2, 6) is applied exclusively via column bounds
(fill_col_state_patches); there are no equality rows for state fixing.
The backward pass writes only the column-bound region; noise innovations come
from the fixed opening tree and are written to the row-bound region via
fill_forward_patches.
The forward pass writes both regions (fill_forward_patches,
fill_col_state_patches, and optionally fill_load_patches).
When n_load_buses == 0, Category 4 is empty and forward_patch_count
returns N unchanged, so load noise adds no patch entries when absent.
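The row-bound index ranges listed above can be computed directly from N, M, and B. The following is a sketch of that arithmetic only; RowPatchLayout and row_patch_layout are hypothetical names, and b_active stands for the B_active of the Category 4 range.

```rust
use std::ops::Range;

/// Hypothetical view of the three row-bound categories as index ranges.
struct RowPatchLayout {
    noise: Range<usize>,    // Category 3: [0, N)
    load: Range<usize>,     // Category 4: [N, N + M*B_active)
    z_inflow: Range<usize>, // Category 5: [N + M*B, 2N + M*B)
}

/// n = hydros, m = stochastic load buses, b_max = max blocks,
/// b_active = blocks active at the current stage (b_active <= b_max).
fn row_patch_layout(n: usize, m: usize, b_max: usize, b_active: usize) -> RowPatchLayout {
    assert!(b_active <= b_max);
    let z_start = n + m * b_max;
    RowPatchLayout {
        noise: 0..n,
        load: n..n + m * b_active,
        z_inflow: z_start..z_start + n,
    }
}
```

With n = 3, m = 2, b_max = 4, and b_active = 3, the ranges are 0..3, 3..9, and 11..14, and the region length is 2N + M*B = 14. With m = 0 the load range is empty.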
ExchangeBuffers and CutSyncBuffers
Both types pre-allocate all communication buffers once at construction time and reuse them across all stages and iterations. This keeps the per-stage exchange allocation-free on the hot path.
ExchangeBuffers handles the state vector allgatherv (step 3):
- Send buffer: local_count * n_state floats.
- Receive buffer: local_count * num_ranks * n_state floats (rank-major order).
CutSyncBuffers handles the cut wire allgatherv (step 5):
- Send buffer: max_cuts_per_rank * cut_wire_size(n_state) bytes.
- Receive buffer: max_cuts_per_rank * num_ranks * cut_wire_size(n_state) bytes.
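To make the rank-major receive layout concrete, the sketch below locates one trial point's state vector inside a flat receive buffer. The function names are hypothetical, and the layout assumes every rank contributes exactly local_count vectors stored contiguously.

```rust
/// Offset of the state vector for (rank, local_idx) in a rank-major buffer.
fn recv_offset(rank: usize, local_idx: usize, local_count: usize, n_state: usize) -> usize {
    (rank * local_count + local_idx) * n_state
}

/// Borrows the state vector for (rank, local_idx) without copying.
fn trial_point(
    recv: &[f64],
    rank: usize,
    local_idx: usize,
    local_count: usize,
    n_state: usize,
) -> &[f64] {
    let start = recv_offset(rank, local_idx, local_count, n_state);
    &recv[start..start + n_state]
}
```

With 2 ranks, local_count = 2, and n_state = 3, the vector for rank 1, local index 0 starts at offset 6.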
Load noise integration
When load_seasonal_stats.parquet is present in the case directory, the
cobre-io loader populates a PrecomputedNormal (from cobre-stochastic)
alongside the PAR model. This object stores the per-stage, per-bus mean and
standard deviation for stochastic bus demand and the per-block load factors
derived from the seasonal statistics.
The forward and backward passes apply stochastic load noise as follows:
- Noise drawing: for each stochastic load bus i at stage t, the pass draws a standard normal variate eta (from the shared noise vector whose first n_hydros entries are inflow innovations and next n_load_buses entries are load innovations). The realized demand is:
  load_rhs[i * K + blk] = max(0, mean(t, i) + std(t, i) * eta) * block_factor(t, i, blk)
  The max(0, ...) clamp prevents negative demand. block_factor scales the base realization by the per-block load profile.
- Load patching: fill_load_patches writes each load_rhs entry into Category 4 of the PatchBuffer, targeting the load balance row for that bus and block. Row indices are provided by load_balance_row_starts (one per stage) and load_bus_indices (position of each stochastic bus within the LP row layout).
- State independence: load noise realizations do not produce additional state variables. The Benders cut coefficients cover only the hydro state dimensions (storage volumes and AR lags). Load noise enters the subproblem purely as a right-hand side perturbation of the bus power balance constraints.
Load noise follows the same PAR(p) framework used for inflow noise — the
combined noise vector [inflow_noise | load_noise] is drawn from the
correlated multivariate normal defined by the StochasticContext. For details
on the PAR(p) model and correlation structure, see the cobre-stochastic
crate page.
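The realized-demand formula from the noise drawing step can be exercised for a single bus. This is a minimal sketch with a hypothetical function name; it takes the mean, standard deviation, and per-block factors as plain arguments rather than reading them from PrecomputedNormal.

```rust
/// Per-block load RHS for one bus at one stage:
/// max(0, mean + std * eta) * block_factor[blk].
fn realized_load_rhs(mean: f64, std: f64, eta: f64, block_factors: &[f64]) -> Vec<f64> {
    // Clamp the base realization so demand is never negative.
    let base = (mean + std * eta).max(0.0);
    block_factors.iter().map(|factor| base * factor).collect()
}
```

With mean = 100, std = 10, and eta = -1.5 the base realization is 85, so block factors 1.0 and 0.5 give 85 and 42.5. With mean = 5, std = 10, and eta = -1 the clamp applies and every block receives 0.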
Convergence monitoring
ConvergenceMonitor tracks bound statistics and evaluates stopping rules. It
is constructed once before the loop begins and updated at the end of each
iteration via update(lb, &sync_result).
use cobre_sddp::convergence::ConvergenceMonitor;
use cobre_sddp::forward::SyncResult;
use cobre_sddp::stopping_rule::{StoppingMode, StoppingRule, StoppingRuleSet};
// A single iteration-limit rule is enough to construct the monitor.
let rule_set = StoppingRuleSet {
rules: vec![StoppingRule::IterationLimit { limit: 100 }],
mode: StoppingMode::Any,
};
let mut monitor = ConvergenceMonitor::new(rule_set);
// Synchronized upper-bound statistics, as produced by step 2.
let sync = SyncResult {
global_ub_mean: 110.0,
global_ub_std: 5.0,
ci_95_half_width: 2.0,
sync_time_ms: 10,
};
// One update with LB = 100; the limit of 100 iterations is not reached yet.
let (stop, results) = monitor.update(100.0, &sync);
assert!(!stop);
assert_eq!(monitor.iteration_count(), 1);
// gap = (110 - 100) / max(1.0, 110.0) = 10/110
assert!((monitor.gap() - 10.0 / 110.0).abs() < 1e-10);
Accessor methods on ConvergenceMonitor:
| Method | Returns |
|---|---|
| lower_bound() | Latest LB value |
| upper_bound() | Latest UB mean |
| upper_bound_std() | Latest UB standard deviation |
| ci_95_half_width() | Latest 95% CI half-width |
| gap() | Convergence gap: (UB - LB) / max(1.0, abs(UB)) |
| iteration_count() | Number of completed update calls |
| set_shutdown() | Signal a graceful shutdown before next update |
Event system
The training loop emits TrainingEvent values (from cobre-core) at each
lifecycle step boundary when config.event_sender is Some. Events carry
structured data for real-time display in the TUI or CLI layers.
Key events emitted during training:
| Event variant | When emitted |
|---|---|
| ForwardPassComplete | After step 1 completes for all local scenarios |
| ForwardSyncComplete | After step 2 global UB statistics are merged |
| BackwardPassComplete | After step 4 row generation for all trial points |
| PolicySyncComplete | After step 5 policy-row allgatherv |
| PolicySelectionComplete | After step 5a Stage 1 selection (when strategy is set) |
| PolicyBudgetEnforcementComplete | After step 5a Stage 2 budget enforcement (when budget is set) |
| ConvergenceUpdate | After step 6 stopping rules evaluated |
| IterationSummary | At the end of each iteration (LB, UB, gap, timing) |
| TrainingFinished | When a stopping rule triggers |
Quick start (pseudocode)
The following shows the shape of a train call. All arguments must be built
from the upstream pipeline (cobre-io for system data, cobre-stochastic for
the opening tree, cobre-solver for the LP solver instance).
use cobre_sddp::{
FutureCostFunction, HorizonMode, RiskMeasure, StageIndexer,
TrainingConfig, TrainingResult,
stopping_rule::{StoppingMode, StoppingRule, StoppingRuleSet},
train,
};
// Build the FCF for num_stages stages, n_state state dimensions,
// forward_passes scenarios per rank, max_iterations iterations.
let mut fcf = FutureCostFunction::new(num_stages, n_state, forward_passes, max_iterations, &vec![0; num_stages]);
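let config = TrainingConfig {
forward_passes: 10,
max_iterations: 500,
checkpoint_interval: None,
warm_start_cuts: vec![0; num_stages], // per-stage counts; no pre-loaded cuts
event_sender: None,
cut_selection: None, // no Stage 1 cut selection
budget: None, // no Stage 2 budget
};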
let stopping_rules = StoppingRuleSet {
rules: vec![
StoppingRule::IterationLimit { limit: 500 },
StoppingRule::GracefulShutdown,
],
mode: StoppingMode::Any,
};
let horizon = HorizonMode::Finite { num_stages };
let result: TrainingResult = train(
&mut solver, // SolverInterface impl (e.g., HiGHS)
config,
&mut fcf,
&templates, // one StageTemplate per stage
&base_rows, // AR dynamics base row index per stage
&indexer, // StageIndexer from StageIndexer::new(n_hydro, max_par_order)
&initial_state, // known initial storage volumes
&opening_tree, // from cobre_stochastic::build_stochastic_context
&stochastic, // StochasticContext
&horizon,
&risk_measures, // one RiskMeasure per stage
stopping_rules,
None, // no cut selection in this example
None, // no external shutdown flag
&comm, // Communicator (LocalBackend or FerrompiBackend)
)?;
println!(
"Converged in {} iterations: LB={:.2}, UB={:.2}, gap={:.4}",
result.iterations, result.final_lb, result.final_ub, result.final_gap
);
Per-phase configuration
cobre-sddp defines three algorithmic phases and associates a HighsProfile
with each one. This lets the LP solver be tuned differently for training and
simulation without modifying call sites.
Phase enum
pub enum Phase {
Forward,
Backward,
Simulation,
}
| Variant | When it runs |
|---|---|
| Forward | Forward sweep: solving LPs from stage 1 to T to sample trajectories. |
| Backward | Backward sweep: solving LPs from stage T to 1 to generate Benders cuts. |
| Simulation | Policy simulation: evaluating the trained policy on out-of-sample scenarios. |
Phase is Copy + Eq, so it can be used in match patterns and stored
cheaply by value. Phase::profile() returns the HighsProfile that should be
applied when entering that phase.
Named profile constants
Three pub const values define the per-phase solver configurations:
| Constant | Applied during |
|---|---|
| FORWARD_PROFILE | Phase::Forward entry |
| BACKWARD_PROFILE | Phase::Backward entry |
| SIMULATION_PROFILE | Phase::Simulation entry |
In the current release FORWARD_PROFILE and SIMULATION_PROFILE equal
HighsProfile::default() field-for-field, while BACKWARD_PROFILE overrides
simplex_price_strategy to 2 (RowHyperSparse) to exploit sparsity on the
backward LPs; all other backward fields match the default. Compile-time
assertions in solver_phase.rs catch any future drift between the constants
and their documented values.
Further tuning — particularly of BACKWARD_PROFILE to reduce backward-pass
load imbalance — would update these constants without changing the call sites
or the Phase API.
Orchestrator call sites
Profiles are applied once per phase at the point where a solver workspace is first acquired for that phase:
- Forward sweep: applied in forward_pass_state.rs when a worker enters the forward pass.
- Backward sweep: applied in backward_pass_state.rs when a worker enters the backward pass.
- Simulation: applied in simulation/state.rs when the simulation pool worker is initialized.
Each call site invokes ProfiledSolver::set_profile with the result of
Phase::Forward.profile(), Phase::Backward.profile(), or
Phase::Simulation.profile(). Because ProfiledSolver skips FFI calls when
the requested profile matches the current one, re-entering the same phase
within a run incurs no overhead.
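The skip-when-unchanged behavior can be sketched with a small wrapper that remembers the last applied profile. Every type here is a hypothetical stand-in: SketchProfile carries a single optional field (None meaning "leave the solver default"), and ffi_calls is a counter that takes the place of the real calls into the solver. The actual HighsProfile fields and ProfiledSolver internals are not shown in this document.

```rust
/// Hypothetical one-field profile; None means "keep the solver default".
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct SketchProfile {
    simplex_price_strategy: Option<i32>,
}

const FORWARD: SketchProfile = SketchProfile { simplex_price_strategy: None };
const BACKWARD: SketchProfile = SketchProfile { simplex_price_strategy: Some(2) };

/// Hypothetical wrapper that caches the last applied profile.
struct SketchProfiledSolver {
    current: Option<SketchProfile>,
    ffi_calls: usize, // stand-in for real option-setting calls
}

impl SketchProfiledSolver {
    fn new() -> Self {
        Self { current: None, ffi_calls: 0 }
    }

    /// Applies the profile only when it differs from the cached one.
    fn set_profile(&mut self, profile: SketchProfile) {
        if self.current == Some(profile) {
            return; // same profile: nothing to push to the solver
        }
        self.ffi_calls += 1;
        self.current = Some(profile);
    }
}
```

Calling set_profile(FORWARD) twice in a row counts one application; switching to BACKWARD counts a second one.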
Error handling
All fallible operations return Result<T, SddpError>. The error type is
Send + Sync + 'static and can be propagated across thread boundaries or
wrapped by anyhow.
| SddpError variant | Trigger |
|---|---|
| Solver | LP solve failed for numerical or timeout reasons |
| Communication | MPI collective operation failed |
| Stochastic | Scenario generation or PAR model validation failed |
| Io | Case directory loading or validation failed |
| Validation | Algorithm configuration is semantically invalid |
| Infeasible | LP has no feasible solution (stage, iteration, scenario) |
| Simulation | Simulation phase error (LP failure, I/O, policy issue) |
Performance notes
For a comprehensive user-facing guide to all performance optimizations, see the Performance Accelerators chapter.
Pre-allocation discipline
The training loop makes no heap allocations on the hot path inside the iteration loop. All workspace buffers are allocated once before the loop:
- WorkspacePool: one SolverWorkspace per thread (solver + PatchBuffer + ScratchBuffers + Basis).
- TrajectoryRecord flat vec: forward_passes * num_stages records.
- PatchBuffer: N * (2 + L) + M * max_blocks entries per worker.
- ExchangeBuffers: local_count * num_ranks * n_state floats.
- CutSyncBuffers: max_cuts_per_rank * num_ranks * cut_wire_size(n_state) bytes.
- ScratchBuffers: noise, inflow, lag matrix, PAR, eta, load, z-inflow buffers per worker.
- BasisStore: forward_passes * num_stages basis slots.
Backward pass work-stealing
The inner trial-point loop in the backward pass uses atomic counter
work-stealing (AtomicUsize::fetch_add(1, Relaxed)) instead of static
partitioning. Staged cuts are sorted by trial_point_idx after the parallel
region to preserve bit-for-bit determinism across thread counts.
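The pattern can be sketched with scoped threads from the standard library. This is an illustration under stated assumptions, not the crate's code: the function name is hypothetical, the per-point work is a placeholder computation, and results are staged through a shared Mutex for brevity where a real hot path would more likely use per-worker buffers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Processes trial points 0..n_points on n_workers threads using an atomic
/// counter, then sorts the staged results by trial point index.
fn process_trial_points(n_points: usize, n_workers: usize) -> Vec<(usize, f64)> {
    let next = AtomicUsize::new(0);
    let staged = Mutex::new(Vec::with_capacity(n_points));

    thread::scope(|scope| {
        for _ in 0..n_workers {
            scope.spawn(|| loop {
                // Claim the next unprocessed trial point.
                let idx = next.fetch_add(1, Ordering::Relaxed);
                if idx >= n_points {
                    break;
                }
                // Placeholder for solving the backward LPs at this point.
                let value = (idx as f64).sqrt();
                staged.lock().unwrap().push((idx, value));
            });
        }
    });

    // Arrival order depends on scheduling; sorting restores a fixed order.
    let mut results = staged.into_inner().unwrap();
    results.sort_by_key(|&(idx, _)| idx);
    results
}
```

Because each index is claimed exactly once and the output is sorted by index, the result is the same for one worker and for four.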
Model persistence and incremental cuts
CutRowMap provides O(1) slot-to-row lookup so the append path skips cuts
that are already present in a given LP.
Both the stage LP and the LB LP are append-only: cuts are added but never
removed. The stage LP toggles an inactive cut's row bounds to
[-f64::INFINITY, +f64::INFINITY] (trivially satisfied) rather than dropping
the row; the LB LP does not toggle activity at all (it never deactivates cuts).
Cut row positions are stable across iterations in both LPs, and the lower
bound remains monotonically non-decreasing because the LB LP accumulates
every cut ever generated.
Cut wire format
The cut wire format used by CutSyncBuffers is at version 1
(CUT_WIRE_VERSION = 1). Every record is a cut record. Each record carries
a version byte at offset 0 and a record-tag byte at offset 13
(RECORD_TAG_CUT = 0, zeroed padding reserved for future tag dispatch):
- Cut record: a 25-byte fixed header (1 version byte + 24 bytes of fields: slot index, iteration, forward pass index, 3 padding bytes, intercept) followed by n_state * 8 bytes of coefficients. The total record size is cut_wire_size(n_state) = 25 + n_state * 8 bytes.
Receivers reject any record whose version byte does not equal
CUT_WIRE_VERSION. No compatibility shim is provided; redeploy all nodes
when upgrading.
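The size formula and the version check can be sketched as below. CUT_WIRE_VERSION and cut_wire_size follow the names used in this section; CUT_HEADER_BYTES, check_record, and the String error type are hypothetical and chosen only for the illustration. The header fields beyond the version byte are not decoded here.

```rust
const CUT_WIRE_VERSION: u8 = 1;
/// Hypothetical name for the 25-byte fixed header length.
const CUT_HEADER_BYTES: usize = 25;

/// Total record size: fixed header plus one f64 coefficient per state entry.
fn cut_wire_size(n_state: usize) -> usize {
    CUT_HEADER_BYTES + n_state * 8
}

/// Rejects records with the wrong length or an unexpected version byte.
fn check_record(record: &[u8], n_state: usize) -> Result<(), String> {
    let expected = cut_wire_size(n_state);
    if record.len() != expected {
        return Err(format!("expected {expected} bytes, got {}", record.len()));
    }
    // The version byte sits at offset 0.
    if record[0] != CUT_WIRE_VERSION {
        return Err(format!("unsupported cut wire version {}", record[0]));
    }
    Ok(())
}
```

For n_state = 3 a record is 49 bytes long; a record of that length whose first byte is 2 is rejected.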
Basis cache wire format
CapturedBasis owns the pack/unpack layout for broadcasting a stored
basis via to_broadcast_payload and try_from_broadcast_payload. Each
stage’s payload is either a 0_i32 absent-sentinel or a 1_i32
present-sentinel followed by five length fields, the col_status and
row_status slices, the cut_row_slots indices cast to i32, and the
state_at_capture values carried in a separate f64 buffer.
broadcast_basis_cache in training issues four broadcasts per
transfer — i32 length, i32 payload, f64 length, f64 payload — wrapping
the single-stage serialisation in a stage-major loop.
Communication-free parallelism
Forward pass noise is generated without inter-rank communication. Each rank
independently derives its noise seed from (base_seed, iteration, scenario, stage_id)
using deterministic SipHash-1-3 seed derivation from cobre-stochastic. The opening tree is
pre-generated once before training and shared read-only across all iterations.
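The idea of deriving a seed from the tuple alone, with no shared state between ranks, can be sketched with the standard library hasher. This is a stand-in, not the derivation used by cobre-stochastic: std's DefaultHasher does not guarantee a specific algorithm and its output may change between Rust releases, so it only demonstrates determinism within one build. The function name derive_seed is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a seed from the identifying tuple only, so any rank can
/// recompute it independently.
fn derive_seed(base_seed: u64, iteration: u64, scenario: u64, stage_id: u64) -> u64 {
    // DefaultHasher::new() uses fixed keys, so equal inputs hash equally.
    let mut hasher = DefaultHasher::new();
    (base_seed, iteration, scenario, stage_id).hash(&mut hasher);
    hasher.finish()
}
```

Two ranks calling derive_seed with the same four values obtain the same seed without exchanging any message.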
Solver statistics instrumentation
Per-call, per-phase timing and counting of all solver operations is tracked
in SolverStatistics and written to training/solver/iterations.parquet
and training/solver/retry_histogram.parquet. In multi-threaded runs,
per-worker statistics are aggregated via aggregate_solver_statistics() which
sums all fields across workers.
Testing
cargo test -p cobre-sddp
The crate requires no external system libraries beyond what is needed by the
workspace (HiGHS is always available; MPI is optional via the mpi feature
of cobre-comm).
Test suite overview
The test suite covers:
- Unit tests for each module's core logic.
- Integration tests using LocalBackend (single-rank) for the communication-involving modules (forward, backward, cut_sync, state_exchange, lower_bound, training).
- Doc-tests for all public types and functions with constructible examples.
Feature flags
cobre-sddp has no optional feature flags of its own. Feature flag propagation
from cobre-comm (the mpi feature) controls whether MPI-based distributed
training is available at link time.
# Cargo.toml
cobre-sddp = { version = "0.1" }