Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

cobre-sddp

alpha

cobre-sddp implements the Stochastic Dual Dynamic Programming (SDDP) algorithm (Pereira & Pinto, 1991) for long-term hydrothermal dispatch and energy planning. It is the first algorithm vertical in the Cobre ecosystem: a training loop that iteratively improves a piecewise-linear approximation of the value function for multi-stage stochastic linear programs.

For the mathematical foundations — including the Benders decomposition, cut coefficient derivation, and risk measure theory — see the methodology reference.

This crate depends on cobre-core for system data types, cobre-stochastic for inflow scenario generation and load noise parameters, cobre-solver for LP subproblem solving, and cobre-comm for distributed communication.

Iteration lifecycle

Each training iteration follows a fixed eight-step sequence. The ordering ensures the lower bound is evaluated after the backward pass and cut synchronization, not during forward synchronization.

┌─────────────────────────────────────────────────────────────────────────┐
│  Step 1  Forward pass                                                   │
│          Each rank simulates config.forward_passes scenarios through     │
│          all stages, solving the LP at each (scenario, stage) pair with  │
│          the current FCF approximation.                                  │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 2  Forward sync                                                   │
│          allreduce (sum + broadcast) aggregates local UB statistics into │
│          a global mean, standard deviation, and 95% CI half-width.      │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 3  State exchange                                                 │
│          allgatherv gathers all ranks' trial point state vectors so     │
│          every rank can solve the backward pass at ALL trial points.    │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 4  Backward pass                                                  │
│          Sweeps stages T-2 down to 0, solving the successor LP under    │
│          every opening from the fixed tree, extracting LP duals to form  │
│          Benders cut coefficients, and inserting one cut per trial point  │
│          per stage into the Future Cost Function (FCF).                  │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 5  Cut sync                                                       │
│          allgatherv shares each rank's newly generated cuts so that all  │
│          ranks maintain an identical FCF at the end of each iteration.  │
│                                                                         │
│  Step 5a Cut management pipeline (optional, two stages)                 │
│          S1: Strategy-based selection (Level1/LML1/Dominated) —         │
│              runs at multiples of check_frequency. Dynamic (DCS) is a   │
│              per-solve lazy loop that ignores check_frequency.          │
│          S2: Budget enforcement — hard cap on active cuts per stage,    │
│              runs every iteration when max_active_per_stage is set.     │
│                                                                         │
│  Step 5b LB evaluation                                                  │
│          Rank 0 solves the stage-0 LP for every opening in the tree    │
│          and aggregates the objectives via the stage-0 risk measure.    │
│          The scalar lower bound is broadcast to all ranks.              │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 6  Convergence check                                              │
│          The ConvergenceMonitor updates bound statistics and evaluates   │
│          the configured stopping rules to determine whether to stop.    │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 7  Checkpoint                                                     │
│          The FlatBuffers policy checkpoint infrastructure is             │
│          implemented in cobre-io (write_policy_checkpoint). The CLI     │
│          writes a final snapshot after training completes. Periodic     │
│          in-loop writes via checkpoint_interval are not yet wired       │
│          into the training loop.                                        │
├─────────────────────────────────────────────────────────────────────────┤
│  Step 8  Event emission                                                 │
│          TrainingEvent values are sent to the optional event channel    │
│          for real-time monitoring by the CLI or TUI layer.              │
└─────────────────────────────────────────────────────────────────────────┘

The convergence gap is computed as:

gap = (UB - LB) / max(1.0, |UB|)

The max(1.0, |UB|) guard prevents division by zero when the upper bound is near zero.

Module overview

ModuleResponsibility
trainingtrain: the top-level loop orchestrator; wires all steps together
forwardrun_forward_pass, sync_forward: step 1 and step 2
state_exchangeExchangeBuffers: step 3 allgatherv of trial point state vectors
backwardrun_backward_pass: step 4 Benders cut generation with work-stealing parallelism
cut_syncCutSyncBuffers: step 5 allgatherv of new cut wire records
cut_selectionCutSelectionStrategy, CutMetadata, CutActivityUpdates: step 5a Stage 1 pool pruning
lower_boundevaluate_lower_bound: step 5b risk-adjusted LB computation (parallelized across openings)
convergenceConvergenceMonitor: step 6 bound tracking and stopping rule evaluation
cutCutPool, FutureCostFunction, CutRowMap, WARM_START_ITERATION: append-only cut storage with RHS-toggle deactivation, wire format, and LP row mapping
basis_reconstructreconstruct_basis: slot-tracked warm-start basis reconstruction — reconciles stored cut rows by slot identity and classifies newly added cuts at the capture-time state
configTrainingConfig: algorithm parameters
contextStageContext, TrainingContext: hot-path argument bundles that absorb parameters into context structs
stopping_ruleStoppingRule, StoppingRuleSet, MonitorState: termination criteria
risk_measureRiskMeasure, BackwardOutcome: risk-neutral and CVaR aggregation
horizon_modeHorizonMode: finite vs. cyclic stage traversal (only Finite currently)
indexerStageIndexer, EquipmentCounts, FphaColumnLayout: LP column/row offset arithmetic for stage subproblems
lp_builderbuild_stage_templates, StageTemplates, PatchBuffer: stage template construction, LP scaling, and row-bound patch arrays
workspaceSolverWorkspace, WorkspacePool, BasisStore, CapturedBasis: per-worker solver instances with pre-allocated scratch buffers and slot-tracked basis storage
trajectoryTrajectoryRecord: forward pass LP solution record (primal, dual, state, cost)
noiseNoise-to-RHS-patch logic shared across forward, backward, and simulation passes; includes accumulate_and_shift_lag_state for sub-monthly lag accumulation
lag_transitionprecompute_stage_lag_transitions: builds per-stage StageLagTransition configs from stage dates and lag period boundaries; accumulator seeding from RecentObservation for mid-season starts
solver_statsSolverStatsEntry, SolverStatsDelta, aggregate_solver_statistics: per-phase solver statistics delta computation and cross-worker aggregation
scaling_reportScalingReport, StageScalingReport, CoefficientRange: LP prescaling diagnostics written to JSON
simulationFull simulation pipeline with stage-major loop; all result types (SimulationHydroResult, etc.); simulate, aggregate_simulation
errorSddpError: unified error type aggregating solver, comm, stochastic, and I/O errors
fpha_fittingFPHA fitting pipeline — computes piecewise-linear hydroelectric production hyperplanes from reservoir geometry
hydro_modelsprepare_hydro_models, EvaporationModel, FphaPlane, ResolvedProductionModel: hydro model preprocessing at initialization
generic_constraintsGeneric constraint row entries — user-defined linear constraints with 20 variable types
inflow_methodInflowNonNegativityMethod: Truncation, Penalty, TruncationWithPenalty, and None strategies
estimationEstimationReport, StdRatioDivergence: PAR estimation pipeline outputs
provenanceModelProvenanceReport, build_provenance_report: round-trip audit trail for model preprocessing
stochastic_summaryStochasticSummary, build_stochastic_summary: human-readable summary of stochastic preprocessing
visited_statesVisitedStatesArchive: forward-pass trial point storage for cut selection and policy diagnostics
policy_exportPolicy checkpoint writing (FlatBuffers cuts, basis, states, metadata)
policy_loadbuild_basis_cache_from_checkpoint, validate_policy_compatibility, load_boundary_cuts, inject_boundary_cuts: policy loading for warm-start, resume, and terminal boundary cut injection from external checkpoints
training_outputbuild_training_output: assembles all training results for the output writers
conversionType conversion utilities between internal and I/O representations
setupStudySetup, StudyParams, prepare_stochastic: pre-built study state; holds four optional scenario libraries (historical_library, external_inflow_library, external_load_library, external_ncs_library) built conditionally from per-class SamplingScheme selections

Configuration

TrainingConfig

TrainingConfig controls the training loop parameters. All fields are public and must be set explicitly — there is no Default implementation, preventing silent misconfigurations.

FieldTypeDescription
forward_passesu32Scenarios per rank per iteration (must be >= 1)
max_iterationsu64Safety bound on total iterations; also sizes the row pool
checkpoint_intervalOption<u64>Write checkpoint every N iterations; None = disabled
warm_start_cutsVec<u32>Per-stage pre-loaded cut counts from a policy file
event_senderOption<Sender<TrainingEvent>>Channel for real-time monitoring events; None = silent
cut_selectionOption<CutSelectionStrategy>Stage 1 cut selection strategy; None = no selection
budgetOption<u32>Stage 2 max active cuts per stage; None = no budget

StoppingRuleSet

The stopping rule set composes one or more termination criteria. Every set must include an IterationLimit rule as a safety bound against infinite loops.

Rule variantTrigger condition
IterationLimititeration >= limit
TimeLimitwall_time_seconds >= seconds
BoundStallingRelative LB improvement over a sliding window falls below tolerance
SimulationBasedPeriodic Monte Carlo simulation costs stabilize
GracefulShutdownExternal SIGTERM / SIGINT received (always evaluated first)

The mode field controls how multiple rules combine:

  • StoppingMode::Any (OR): stop when any rule triggers.
  • StoppingMode::All (AND): stop when all rules trigger simultaneously.
use cobre_sddp::stopping_rule::{StoppingMode, StoppingRule, StoppingRuleSet};

let stopping_rules = StoppingRuleSet {
    rules: vec![
        StoppingRule::IterationLimit { limit: 500 },
        StoppingRule::BoundStalling {
            tolerance: 0.001,
            iterations: 20,
        },
        StoppingRule::GracefulShutdown,
    ],
    mode: StoppingMode::Any,
};

RiskMeasure

RiskMeasure controls how per-opening backward pass outcomes are aggregated into Benders cuts and how the lower bound is computed.

VariantDescription
ExpectationRisk-neutral expected value. Weights equal opening probabilities.
CVaRConvex combination (1 - λ)·E[Z] + λ·CVaR_α[Z]. alpha ∈ (0, 1], lambda ∈ [0, 1].

alpha = 1 with CVaR is equivalent to Expectation. lambda = 0 with CVaR is also equivalent to Expectation. One RiskMeasure value is assigned per stage from the stages.json configuration field risk_measure.

CutSelectionStrategy

Cut selection is optional. When configured, it forms Stage 1 of the two-stage cut management pipeline that also includes budget enforcement (Stage 2). See the user-facing Performance Accelerators guide for the full pipeline description.

VariantSelection mechanism
Level1Deactivates cuts below tie_tolerance of the per-state max at every visited state
Lml1Deactivates cuts that are not the oldest eligible within tie_tolerance at any visited state
DominatedDeactivates cuts below threshold of the per-state max at every visited state (all populated cuts)
DynamicLazy incremental scheme (DCS): adds at most nadic cuts per inner re-solve round (the inner loop repeats up to max_inner_iterations rounds per backward solve) that violate the LP solution by more than epsilon_viol; never deactivates cuts from the pool

Level1, Lml1, and Dominated respect a check_frequency parameter: selection only runs at iterations that are multiples of check_frequency and never at iteration 0. Stage 0 is always exempt.

Level1, Lml1, and Dominated share a single value-evaluation kernel (select_for_stage in cut_selection.rs) that performs O(|populated cuts| x |visited states|) work per stage per check. The VisitedStatesArchive is always collected during training when any of these three variants is enabled; the archive feeds the kernel for Level1, Lml1, and Dominated alike. Dominated uses its threshold field as the tie tolerance; Level1 and Lml1 use tie_tolerance (default 1e-10).

Dynamic (Dynamic Cut Selection, DCS) operates differently: it is a per-solve lazy selection loop that adds cuts on demand. It never invokes the value-evaluation kernel and does not respect check_frequency. The initial active set is seeded from the active_window most recent iterations. See the Performance Accelerators guide for the full description and the cut_selection reference for all DCS parameters.

Key data structures

StudySetup

StudySetup is constructed once by StudySetup::new from a validated System and Config. It owns all precomputed state — stage templates, stochastic context, FCF, indexer, initial state, risk measures, and entity counts — and holds it across async boundaries as owned (non-borrowed) data.

Four optional library fields are built conditionally based on per-class SamplingScheme selections:

FieldTypeBuilt when
historical_libraryOption<HistoricalScenarioLibrary>inflow_scheme == SamplingScheme::Historical
external_inflow_libraryOption<ExternalScenarioLibrary>inflow_scheme == SamplingScheme::External
external_load_libraryOption<ExternalScenarioLibrary>load_scheme == SamplingScheme::External
external_ncs_libraryOption<ExternalScenarioLibrary>ncs_scheme == SamplingScheme::External

Callers borrow StudySetup to construct TrainingContext and StageContext; the public accessor methods (historical_library(), external_inflow_library(), etc.) return Option<&T> and are None for sampling schemes that do not use those libraries.

FutureCostFunction

The Future Cost Function (FCF) holds one CutPool per stage. Each CutPool is an append-only flat array of cut slots. Cuts are inserted deterministically by (iteration, forward_pass_index) to guarantee bit-for-bit identical FCF state across all MPI ranks. Once a slot is populated it retains a stable integer index for the lifetime of the run — no slot is ever reused or removed.

The FCF is built once before training begins. Total slot capacity is warm_start_cuts + max_iterations * forward_passes per stage.

Cut deactivation is applied via set_active(stage, slot, false). An inactive cut remains in storage and in the stage LP; only its row bounds are toggled to [-f64::INFINITY, +f64::INFINITY], making the constraint trivially satisfied without affecting the slot index or LP row index. The LP row index of each cut slot is therefore stable across iterations, including after cut-selection deactivation.

Two aggregate metrics are available per stage and are written to training/metadata.json under the row_pool object: cuts_in_lp counts the rows in the stage LP (active inactive sentinel rows together — equal to populated_count, the high-water mark of cuts ever inserted at that stage); cuts_active counts only the currently active subset.

Cut pool memory and LP shape

The stage LP grows monotonically: each stage LP carries base_rows + populated_count rows, where base_rows is the fixed structural row count and populated_count is the number of cut slots ever populated at that stage. Sentinel rows for inactive cuts occupy a row in the LP permanently but contribute no binding constraint.

The worst-case coefficient storage per rank is bounded by:

populated_per_stage × state_dimension × 8 bytes × num_stages

Inactive cuts still consume pricing time during the LP solve: the row coefficients participate in dual-simplex scanning even when the RHS is at the infinity sentinel. This is a deliberate tradeoff — stable row indices enable allocation-free iteration and correct basis warm-start across cut-set changes, at the cost of a proportionally larger LP for runs that deactivate many cuts.

The cuts_in_lp and cuts_active fields in training/metadata.json under row_pool expose this tradeoff quantitatively: cuts_in_lp is the total LP row count (active + inactive), and cuts_active is the active subset. Both fields are u64 and default to 0 when deserialising older manifests that lack them.

PatchBuffer

A PatchBuffer holds the pre-allocated row-bound and column-bound arrays consumed by the LP solver’s set_row_bounds and set_col_bounds calls. It carries two regions:

  • Row-bound region — sized for N + M*B + N patches (N hydros, M stochastic load buses, B max blocks), holding Categories 3, 4, and 5:

    • Category 3 [0, N) — noise innovation: water-balance RHS at scenario noise.
    • Category 4 [N, N + M*B_active) — load balance row patches: equality constraint at stochastic load demand per bus per block (optional; empty when n_load_buses == 0).
    • Category 5 [N + M*B, 2N + M*B) — z-inflow definition RHS.
  • Column-bound region — sized for N*(1+L) + A*K entries (A anticipated thermals, K max lead stages), holding Categories 1, 2, and 6:

    • Category 1 — incoming storage columns: col_lower[h] == col_upper[h] == state[h] for each hydro h.
    • Category 2 — AR lag columns: tight bounds at each lag state value.
    • Category 6 — anticipated-state columns: tight bounds at each ring-buffer slot.

State pinning (Categories 1, 2, 6) is applied exclusively via column bounds (fill_col_state_patches); there are no equality rows for state fixing. The backward pass writes only the column-bound region; noise innovations come from the fixed opening tree and are written to the row-bound region via fill_forward_patches. The forward pass writes both regions (fill_forward_patches, fill_col_state_patches, and optionally fill_load_patches).

When n_load_buses == 0, Category 4 is empty and forward_patch_count returns N unchanged, so load noise adds no patch entries when absent.

ExchangeBuffers and CutSyncBuffers

Both types pre-allocate all communication buffers once at construction time and reuse them across all stages and iterations. This keeps the per-stage exchange allocation-free on the hot path.

ExchangeBuffers handles the state vector allgatherv (step 3):

  • Send buffer: local_count * n_state floats.
  • Receive buffer: local_count * num_ranks * n_state floats (rank-major order).

CutSyncBuffers handles the cut wire allgatherv (step 5):

  • Send buffer: max_cuts_per_rank * cut_wire_size(n_state) bytes.
  • Receive buffer: max_cuts_per_rank * num_ranks * cut_wire_size(n_state) bytes.

Load noise integration

When load_seasonal_stats.parquet is present in the case directory, the cobre-io loader populates a PrecomputedNormal (from cobre-stochastic) alongside the PAR model. This object stores the per-stage, per-bus mean and standard deviation for stochastic bus demand and the per-block load factors derived from the seasonal statistics.

The forward and backward passes apply stochastic load noise as follows:

  1. Noise drawing: for each stochastic load bus i at stage t, the pass draws a standard normal variate eta (from the shared noise vector whose first n_hydros entries are inflow innovations and next n_load_buses entries are load innovations). The realized demand is:

    load_rhs[i * K + blk] = max(0, mean(t, i) + std(t, i) * eta) * block_factor(t, i, blk)
    

    The max(0, ...) clamp prevents negative demand. block_factor scales the base realization by the per-block load profile.

  2. Load patching: fill_load_patches writes each load_rhs entry into Category 4 of the PatchBuffer, targeting the load balance row for that bus and block. Row indices are provided by load_balance_row_starts (one per stage) and load_bus_indices (position of each stochastic bus within the LP row layout).

  3. State independence: load noise realizations do not produce additional state variables. The Benders cut coefficients cover only the hydro state dimensions (storage volumes and AR lags). Load noise enters the subproblem purely as a right-hand side perturbation of the bus power balance constraints.

Load noise follows the same PAR(p) framework used for inflow noise — the combined noise vector [inflow_noise | load_noise] is drawn from the correlated multivariate normal defined by the StochasticContext. For details on the PAR(p) model and correlation structure, see the cobre-stochastic crate page.

Convergence monitoring

ConvergenceMonitor tracks bound statistics and evaluates stopping rules. It is constructed once before the loop begins and updated at the end of each iteration via update(lb, &sync_result).

#![allow(unused)]
fn main() {
use cobre_sddp::convergence::ConvergenceMonitor;
use cobre_sddp::forward::SyncResult;
use cobre_sddp::stopping_rule::{StoppingMode, StoppingRule, StoppingRuleSet};

let rule_set = StoppingRuleSet {
    rules: vec![StoppingRule::IterationLimit { limit: 100 }],
    mode: StoppingMode::Any,
};

let mut monitor = ConvergenceMonitor::new(rule_set);

let sync = SyncResult {
    global_ub_mean: 110.0,
    global_ub_std: 5.0,
    ci_95_half_width: 2.0,
    sync_time_ms: 10,
};

let (stop, results) = monitor.update(100.0, &sync);
assert!(!stop);
assert_eq!(monitor.iteration_count(), 1);
// gap = (110 - 100) / max(1.0, 110.0) = 10/110
assert!((monitor.gap() - 10.0 / 110.0).abs() < 1e-10);
}

Accessor methods on ConvergenceMonitor:

MethodReturns
lower_bound()Latest LB value
upper_bound()Latest UB mean
upper_bound_std()Latest UB standard deviation
ci_95_half_width()Latest 95% CI half-width
gap()Convergence gap: (UB - LB) / max(1.0, abs(UB))
iteration_count()Number of completed update calls
set_shutdown()Signal a graceful shutdown before next update

Event system

The training loop emits TrainingEvent values (from cobre-core) at each lifecycle step boundary when config.event_sender is Some. Events carry structured data for real-time display in the TUI or CLI layers.

Key events emitted during training:

Event variantWhen emitted
ForwardPassCompleteAfter step 1 completes for all local scenarios
ForwardSyncCompleteAfter step 2 global UB statistics are merged
BackwardPassCompleteAfter step 4 row generation for all trial points
PolicySyncCompleteAfter step 5 policy-row allgatherv
PolicySelectionCompleteAfter step 5a Stage 1 selection (when strategy is set)
PolicyBudgetEnforcementCompleteAfter step 5a Stage 2 budget enforcement (when budget is set)
ConvergenceUpdateAfter step 6 stopping rules evaluated
IterationSummaryAt the end of each iteration (LB, UB, gap, timing)
TrainingFinishedWhen a stopping rule triggers

Quick start (pseudocode)

The following shows the shape of a train call. All arguments must be built from the upstream pipeline (cobre-io for system data, cobre-stochastic for the opening tree, cobre-solver for the LP solver instance).

use cobre_sddp::{
    FutureCostFunction, HorizonMode, RiskMeasure, StageIndexer,
    TrainingConfig, TrainingResult,
    stopping_rule::{StoppingMode, StoppingRule, StoppingRuleSet},
    train,
};

// Build the FCF for num_stages stages, n_state state dimensions,
// forward_passes scenarios per rank, max_iterations iterations.
let mut fcf = FutureCostFunction::new(num_stages, n_state, forward_passes, max_iterations, &vec![0; num_stages]);

let config = TrainingConfig {
    forward_passes: 10,
    max_iterations: 500,
    checkpoint_interval: None,
    warm_start_cuts: 0,
    event_sender: None,
};

let stopping_rules = StoppingRuleSet {
    rules: vec![
        StoppingRule::IterationLimit { limit: 500 },
        StoppingRule::GracefulShutdown,
    ],
    mode: StoppingMode::Any,
};

let horizon = HorizonMode::Finite { num_stages };

let result: TrainingResult = train(
    &mut solver,        // SolverInterface impl (e.g., HiGHS)
    config,
    &mut fcf,
    &templates,         // one StageTemplate per stage
    &base_rows,         // AR dynamics base row index per stage
    &indexer,           // StageIndexer from StageIndexer::new(n_hydro, max_par_order)
    &initial_state,     // known initial storage volumes
    &opening_tree,      // from cobre_stochastic::build_stochastic_context
    &stochastic,        // StochasticContext
    &horizon,
    &risk_measures,     // one RiskMeasure per stage
    stopping_rules,
    None,               // no cut selection in this example
    None,               // no external shutdown flag
    &comm,              // Communicator (LocalBackend or FerrompiBackend)
)?;

println!(
    "Converged in {} iterations: LB={:.2}, UB={:.2}, gap={:.4}",
    result.iterations, result.final_lb, result.final_ub, result.final_gap
);

Per-phase configuration

cobre-sddp defines three algorithmic phases and associates a HighsProfile with each one. This lets the LP solver be tuned differently for training and simulation without modifying call sites.

Phase enum

pub enum Phase {
    Forward,
    Backward,
    Simulation,
}
VariantWhen it runs
ForwardForward sweep: solving LPs from stage 1 to T to sample trajectories.
BackwardBackward sweep: solving LPs from stage T to 1 to generate Benders cuts.
SimulationPolicy simulation: evaluating the trained policy on out-of-sample scenarios.

Phase is Copy + Eq, so it can be used in match patterns and stored cheaply by value. Phase::profile() returns the HighsProfile that should be applied when entering that phase.

Named profile constants

Three pub const values define the per-phase solver configurations:

ConstantApplied during
FORWARD_PROFILEPhase::Forward entry
BACKWARD_PROFILEPhase::Backward entry
SIMULATION_PROFILEPhase::Simulation entry

In the current release FORWARD_PROFILE and SIMULATION_PROFILE equal HighsProfile::default() field-for-field, while BACKWARD_PROFILE overrides simplex_price_strategy to 2 (RowHyperSparse) to exploit sparsity on the backward LPs; all other backward fields match the default. Compile-time assertions in solver_phase.rs catch any future drift between the constants and their documented values.

Further tuning — particularly of BACKWARD_PROFILE to reduce backward-pass load imbalance — would update these constants without changing the call sites or the Phase API.

Orchestrator call sites

Profiles are applied once per phase at the point where a solver workspace is first acquired for that phase:

  • Forward sweep — applied in forward_pass_state.rs when a worker enters the forward pass.
  • Backward sweep — applied in backward_pass_state.rs when a worker enters the backward pass.
  • Simulation — applied in simulation/state.rs when the simulation pool worker is initialized.

Each call site invokes ProfiledSolver::set_profile with the result of Phase::Forward.profile(), Phase::Backward.profile(), or Phase::Simulation.profile(). Because ProfiledSolver skips FFI calls when the requested profile matches the current one, re-entering the same phase within a run incurs no overhead.

Error handling

All fallible operations return Result<T, SddpError>. The error type is Send + Sync + 'static and can be propagated across thread boundaries or wrapped by anyhow.

SddpError variantTrigger
SolverLP solve failed for numerical or timeout reasons
CommunicationMPI collective operation failed
StochasticScenario generation or PAR model validation failed
IoCase directory loading or validation failed
ValidationAlgorithm configuration is semantically invalid
InfeasibleLP has no feasible solution (stage, iteration, scenario)
SimulationSimulation phase error (LP failure, I/O, policy issue)

Performance notes

For a comprehensive user-facing guide to all performance optimizations, see the Performance Accelerators chapter.

Pre-allocation discipline

The training loop makes no heap allocations on the hot path inside the iteration loop. All workspace buffers are allocated once before the loop:

  • WorkspacePool: one SolverWorkspace per thread (solver + PatchBuffer + ScratchBuffers + Basis).
  • TrajectoryRecord flat vec: forward_passes * num_stages records.
  • PatchBuffer: N * (2 + L) + M * max_blocks entries per worker.
  • ExchangeBuffers: local_count * num_ranks * n_state floats.
  • CutSyncBuffers: max_cuts_per_rank * num_ranks * cut_wire_size(n_state) bytes.
  • ScratchBuffers: noise, inflow, lag matrix, PAR, eta, load, z-inflow buffers per worker.
  • BasisStore: forward_passes * num_stages basis slots.

Backward pass work-stealing

The inner trial-point loop in the backward pass uses atomic counter work-stealing (AtomicUsize::fetch_add(1, Relaxed)) instead of static partitioning. Staged cuts are sorted by trial_point_idx after the parallel region to preserve bit-for-bit determinism across thread counts.

Model persistence and incremental cuts

CutRowMap provides O(1) slot-to-row lookup so the append path skips cuts that are already present in a given LP.

Both the stage LP and the LB LP are append-only: cuts are added but never removed. The stage LP toggles inactive cuts’ RHS to [-f64::INFINITY, +f64::INFINITY] (trivially satisfied) rather than dropping the row; the LB LP does not toggle activity at all (it never deactivates cuts). Cut row positions are stable across iterations in both LPs, and the lower bound remains monotonically non-decreasing because the LB LP accumulates every cut ever generated.

Cut wire format

The cut wire format used by CutSyncBuffers is at version 1 (CUT_WIRE_VERSION = 1). Every record is a cut record. Each record carries a version byte at offset 0 and a record-tag byte at offset 13 (RECORD_TAG_CUT = 0, zeroed padding reserved for future tag dispatch):

  • Cut record: a 25-byte fixed header (1 version byte + 24 bytes of fields: slot index, iteration, forward pass index, 3 padding bytes, intercept) followed by n_state * 8 bytes of coefficients. The total record size is cut_wire_size(n_state) = 25 + n_state * 8 bytes.

Receivers reject any record whose version byte does not equal CUT_WIRE_VERSION. No compatibility shim is provided; redeploy all nodes when upgrading.

Basis cache wire format

CapturedBasis owns the pack/unpack layout for broadcasting a stored basis via to_broadcast_payload and try_from_broadcast_payload. Each stage’s payload is either a 0_i32 absent-sentinel or a 1_i32 present-sentinel followed by five length fields, the col_status and row_status slices, the cut_row_slots indices cast to i32, and the state_at_capture values carried in a separate f64 buffer. broadcast_basis_cache in training issues four broadcasts per transfer — i32 length, i32 payload, f64 length, f64 payload — wrapping the single-stage serialisation in a stage-major loop.

Communication-free parallelism

Forward pass noise is generated without inter-rank communication. Each rank independently derives its noise seed from (base_seed, iteration, scenario, stage_id) using deterministic SipHash-1-3 seed derivation from cobre-stochastic. The opening tree is pre-generated once before training and shared read-only across all iterations.

Solver statistics instrumentation

Per-call, per-phase timing and counting of all solver operations is tracked in SolverStatistics and written to training/solver/iterations.parquet and training/solver/retry_histogram.parquet. In multi-threaded runs, per-worker statistics are aggregated via aggregate_solver_statistics() which sums all fields across workers.

Testing

cargo test -p cobre-sddp

The crate requires no external system libraries beyond what is needed by the workspace (HiGHS is always available; MPI is optional via the mpi feature of cobre-comm).

Test suite overview

The test suite covers:

  • Unit tests for each module’s core logic.
  • Integration tests using LocalBackend (single-rank) for the communication-involving modules (forward, backward, cut_sync, state_exchange, lower_bound, training).
  • Doc-tests for all public types and functions with constructible examples.

Feature flags

cobre-sddp has no optional feature flags of its own. Feature flag propagation from cobre-comm (the mpi feature) controls whether MPI-based distributed training is available at link time.

# Cargo.toml
cobre-sddp = { version = "0.1" }