Stochastic Modeling
Hydrothermal dispatch is inherently uncertain. Reservoir inflows depend on rainfall and snowmelt that cannot be known in advance, and electrical load varies in ways that are predictable in aggregate but noisy at any given moment. A dispatch policy that ignores uncertainty will systematically under-prepare for dry periods and over-commit thermal capacity in wet years.
Cobre addresses this by treating inflows and loads as stochastic processes. During training, the solver samples many scenario trajectories and builds a policy that performs well across the distribution of possible futures — not just for a single forecast. The stochastic layer is responsible for generating those scenario trajectories in a statistically sound, reproducible way.
The stochastic models are driven by historical statistics provided by the user
in the scenarios/ directory of the case. If no scenarios/ directory is
present, Cobre falls back to white-noise generation using only the stage
definitions in stages.json. White noise does not reflect real inflow dynamics,
so for any study with real hydro plants you should provide historical inflow
statistics. They give the PAR(p) model the seasonal means, standard
deviations, and AR structure it needs.
The scenarios/ Directory
The scenarios/ directory sits alongside the other input files in the case
directory:
```text
my_study/
  config.json
  stages.json
  ...
  scenarios/
    inflow_seasonal_stats.parquet
    load_seasonal_stats.parquet
    inflow_ar_coefficients.parquet      (when PAR model order > 0)
    inflow_history.parquet              (alternative to pre-computed stats)
    non_controllable_stats.parquet      (stochastic NCS availability)
    external_inflow_scenarios.parquet   (per-class external inflow)
    external_load_scenarios.parquet     (per-class external load)
    external_ncs_scenarios.parquet      (per-class external NCS)
    correlation.json
    noise_openings.parquet              (user-supplied opening tree, optional)
```
The directory is optional. When it is absent, Cobre generates independent standard-normal noise at each stage for each hydro plant and scales it by a default standard deviation — effectively treating all uncertainty as white noise. This is sufficient for verifying a case loads correctly, but is not representative of real inflow dynamics.
When scenarios/ is present, Cobre reads the Parquet files and fits a
Periodic Autoregressive (PAR(p)) model for each hydro plant and each bus.
The fitted model generates correlated, seasonally varying inflow and load
trajectories that reflect the historical statistics you supply.
Inflow Statistics
inflow_seasonal_stats.parquet provides the seasonal distribution of
historical inflows for every (hydro plant, stage) pair.
Schema
| Column | Type | Nullable | Description |
|---|---|---|---|
| hydro_id | INT32 | No | Hydro plant identifier (matches id in hydros.json) |
| stage_id | INT32 | No | Stage identifier (matches id in stages.json) |
| mean_m3s | DOUBLE | No | Seasonal mean inflow in m³/s (must be finite) |
| std_m3s | DOUBLE | No | Seasonal standard deviation in m³/s (must be >= 0) |
The file must contain exactly one row per (hydro_id, stage_id) pair.
Every hydro plant defined in hydros.json must have a row for every stage
defined in stages.json. The validator will reject the case if any
combination is missing. The AR model order (number of lags) is determined
from the inflow_ar_coefficients.parquet file when present, not from this file.
For the 1dtoy example, the file has 4 rows — one for each of the four
monthly stages — for the single hydro plant UHE1 (hydro_id = 0).
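The completeness and range rules above can be checked with a few lines of plain Python. This is a minimal sketch over an in-memory list of rows rather than the Parquet file; the function name check_inflow_stats and its messages are hypothetical and do not reproduce Cobre's validator. The stage ids 0 to 3 in the usage note are likewise assumed for illustration.

```python
import math


def check_inflow_stats(rows, hydro_ids, stage_ids):
    """Return a list of problems found in inflow_seasonal_stats rows (hypothetical helper).

    rows: iterable of dicts with keys hydro_id, stage_id, mean_m3s, std_m3s.
    hydro_ids / stage_ids: the ids defined in hydros.json / stages.json.
    """
    problems = []
    seen = set()
    for row in rows:
        key = (row["hydro_id"], row["stage_id"])
        if key in seen:
            problems.append(f"duplicate row for {key}")
        seen.add(key)
        if not math.isfinite(row["mean_m3s"]):
            problems.append(f"non-finite mean_m3s for {key}")
        if not row["std_m3s"] >= 0:  # the negated comparison also rejects NaN
            problems.append(f"std_m3s must be >= 0 for {key}")
    # Every (hydro, stage) combination must be present.
    for h in hydro_ids:
        for s in stage_ids:
            if (h, s) not in seen:
                problems.append(f"missing row for {(h, s)}")
    return problems
```

For a 1dtoy-like case with one plant and four stages, `check_inflow_stats(rows, [0], range(4))` returns an empty list when all four rows are present and valid.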
Inspecting the file
```python
# Polars
import polars as pl
df = pl.read_parquet("scenarios/inflow_seasonal_stats.parquet")
print(df)
```

```python
# Pandas
import pandas as pd
df = pd.read_parquet("scenarios/inflow_seasonal_stats.parquet")
print(df)
```

```sql
-- DuckDB
SELECT * FROM read_parquet('scenarios/inflow_seasonal_stats.parquet');
```

```r
# R with arrow
library(arrow)
df <- read_parquet("scenarios/inflow_seasonal_stats.parquet")
print(df)
```
Load Statistics
load_seasonal_stats.parquet provides the seasonal distribution of
electrical demand at each bus. It drives the stochastic load model used
during training and simulation.
Schema
| Column | Type | Nullable | Description |
|---|---|---|---|
| bus_id | INT32 | No | Bus identifier (matches id in buses.json) |
| stage_id | INT32 | No | Stage identifier (matches id in stages.json) |
| mean_mw | DOUBLE | No | Seasonal mean load in MW (must be finite) |
| std_mw | DOUBLE | No | Seasonal standard deviation in MW (must be >= 0, 0 = deterministic) |
One row per (bus_id, stage_id) pair is required. Every bus in buses.json
must have a row for every stage. The load mean and standard deviation determine
both the expected demand level and how much it varies across scenarios in each
stage. A std_mw of 0.0 indicates deterministic load for that bus-stage pair.
The PAR(p) Model
PAR(p) stands for Periodic Autoregressive model of order p. It is the standard model for hydro inflow time series in long-term hydrothermal planning because inflows have two key properties the model captures well: seasonal patterns (wet seasons and dry seasons recur predictably each year) and autocorrelation (a wet month tends to be followed by another wet month, and vice versa).
What the AR order controls
The AR order (number of autoregressive lags) is determined by the
inflow_ar_coefficients.parquet file. If the file is absent or contains
no coefficients for a given (hydro_id, stage_id), the model defaults to
white noise (order 0). When estimated from history, the order is selected
automatically via PACF (see Estimation from History).
Order 0 — white noise. The inflow at each stage is drawn independently from a normal distribution with the specified mean and standard deviation. There is no memory between stages: knowing last month’s inflow tells you nothing about this month’s. This is the simplest setting and appropriate when you lack historical data to fit AR coefficients, or when the inflow series shows very little autocorrelation.
Order > 0 — periodic autoregressive. The inflow at each stage depends on the inflows at the preceding p stages, weighted by coefficients that reflect the seasonal autocorrelation structure. A wet period therefore tends to be followed by another wet period, to a degree set by the coefficients. Higher AR orders capture longer-range dependencies: order 1 captures month-to-month persistence, order 2 adds two-month memory, and so on. Monthly inflow series often show strong order-1 or order-2 autocorrelation; validate against your data.
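To make the difference between order 0 and order 1 concrete, here is a minimal Python sketch of a PAR(1) generator in standardized form. It is not Cobre's implementation: the name simulate_par1, the use of Python's random module, and the residual scale sqrt(1 - φ²) (chosen so every standardized value keeps unit variance) are assumptions for illustration. Setting every φ to 0 recovers the white-noise case.

```python
import math
import random


def simulate_par1(means, stds, phis, n_stages, seed=0):
    """Sample one inflow trajectory from a PAR(1) model (illustrative sketch).

    means[m], stds[m]: seasonal mean and standard deviation for season m.
    phis[m]: lag-1 coefficient linking season m to the previous stage (|phi| <= 1).
    Assumption: residual std is sqrt(1 - phi**2), keeping each z at unit variance.
    """
    rng = random.Random(seed)
    n_seasons = len(means)
    z_prev = rng.gauss(0.0, 1.0)  # standardized state before the first stage
    inflows = []
    for t in range(n_stages):
        m = t % n_seasons
        phi = phis[m]
        eps = rng.gauss(0.0, 1.0)
        # Standardized AR(1) step: carry over phi of the previous deviation.
        z = phi * z_prev + math.sqrt(1.0 - phi * phi) * eps
        inflows.append(means[m] + stds[m] * z)
        z_prev = z
    return inflows
```

With φ = 0.9, consecutive values are strongly correlated; with φ = 0 each stage is an independent draw around its seasonal mean.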
AR coefficients file
When a non-trivial AR model is desired, Cobre requires an
inflow_ar_coefficients.parquet file in the scenarios/ directory. This
file contains the fitted AR coefficients in standardized form (as produced
by the periodic Yule-Walker equations). The schema and the fitting procedure
are documented in the Case Format Reference.
The 1dtoy example has no AR coefficients file, so all inflows use white
noise (order 0).
When to use higher AR orders
In general:
- Use order 0 when historical data is short or when you want to establish a baseline with the simplest possible model.
- Use order 1 for most real hydro systems. Monthly inflows have strong one-month autocorrelation, and a first-order model captures the bulk of it.
- Use order 2 or higher when the inflow series shows multi-month persistence (common in systems with large upstream catchments or snowmelt storage). Validate with autocorrelation plots of your historical data.
- AR coefficients require std_m3s > 0 in the corresponding seasonal statistics; zero variance makes the model non-identifiable.
For the theoretical derivation of the PAR(p) model, see Stochastic Modeling and PAR(p) Autoregressive Models in the methodology reference.
Annual component (PAR(p)-A)
Some hydro systems show persistence that spans more than one or two months — the kind of year-long memory that a standard PAR(p) model cannot capture with a few short lags. The annual component extension (PAR(p)-A) addresses this by adding one extra term to the autoregressive equation: the rolling 12-month average of the inflow series, which acts as a slow-moving background signal.
When to use it. Enable the annual component when your historical inflow series displays multi-year persistence or when a standard PAR model leaves significant residual autocorrelation at annual lags. It is most useful for systems with large upstream catchments where wet or dry conditions accumulate over an entire hydrological year.
How to enable it. Set "order_selection": "pacf_annual" in the estimation block
of config.json. No other configuration change is required; Cobre detects the setting
and extends the estimation pipeline automatically.
What it produces. In addition to the standard estimation outputs, Cobre writes
inflow_annual_component.parquet to the output directory. This file has five
columns (hydro_id, stage_id, annual_coefficient, annual_mean_m3s, and
annual_std_m3s), with one row per (hydro, stage) pair. The AnnualComponent type on
InflowModel carries the three non-key values (coefficient, mean, and standard
deviation) at runtime.
For the mathematical derivation of the PAR(p)-A model, see PAR(p) Autoregressive Models in the methodology reference.
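The annual regressor can be sketched in a few lines. This assumes that the 12-month window covers the twelve observations preceding stage t, and that the term enters the standardized AR equation as annual_coefficient × (A_t - annual_mean_m3s) / annual_std_m3s. Both the window alignment and this functional form are assumptions, as are the helper names; the methodology reference has the exact equation.

```python
def lagged_annual_mean(inflows, window=12):
    """For each stage t, the mean of the `window` observations before t.

    Stages with fewer than `window` predecessors get None.
    """
    out = []
    for t in range(len(inflows)):
        if t < window:
            out.append(None)
        else:
            out.append(sum(inflows[t - window:t]) / window)
    return out


def annual_term(a_t, annual_coefficient, annual_mean_m3s, annual_std_m3s):
    """Standardized annual contribution to the AR equation (assumed form)."""
    return annual_coefficient * (a_t - annual_mean_m3s) / annual_std_m3s
```

Because the window spans a full year, A_t moves slowly, which is what lets a single extra coefficient carry year-long persistence.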
Estimation from History
Instead of supplying pre-computed seasonal statistics in
inflow_seasonal_stats.parquet, you can provide raw historical inflow
observations and let Cobre estimate the PAR(p) parameters for you.
Input: inflow_history.parquet
Place inflow_history.parquet in the scenarios/ directory. The schema
and required column types are documented in the
Case Format Reference. Each row represents
one historical observation of inflow at a given hydro plant and stage.
What Cobre estimates
When inflow_history.parquet is present, Cobre performs the following
estimation steps automatically before building the scenario model:
- Seasonal statistics: mean and standard deviation are computed from the historical observations for each (hydro plant, stage) pair. These replace the values you would otherwise provide in inflow_seasonal_stats.parquet.
- History classification: each (hydro plant, stage) observation series is classified before fitting. Constant or near-constant series, saturating caps, and series dominated by a single modal value are detected automatically and routed to a degenerate fit (order 0) so that downstream stages do not over-fit a structurally uninformative bucket. Series with more than 10% strictly negative observations are flagged for diagnostics but otherwise fitted normally.
- AR order selection: Cobre evaluates candidate orders and selects the best fit per (hydro plant, stage) using the periodic partial autocorrelation function (PACF) with a 95% significance threshold. This avoids overfitting in series with little autocorrelation and captures meaningful persistence where it exists. Two extensions over the classical PACF rule cover corner cases the classical rule leaves implicit: (i) a structural-zero short-circuit forces the model to order 0 when the lag-1 conditional PACF is exactly zero (degenerate covariance), and (ii) a minimum-order-1 default keeps an AR(1) base whenever the lag-1 PACF is well defined but no lag exceeds the threshold.
- AR coefficients: coefficients for the selected order are estimated by solving the periodic Yule-Walker matrix system, which correctly accounts for the non-Toeplitz covariance structure of periodic autoregressive processes.
- Maceira-Damazio iterative order reduction: after the initial fit, the recursively composed contributions of each lag through the periodic monthly chain are computed. If any contribution is negative (a signal that the lag's cumulative influence opposes the expected persistence direction and would propagate as an unstable Benders cut), the offending season's AR ceiling is reduced and the Yule-Walker fit is re-run at the new ceiling. The reduction iterates across all seasons until every season's contribution recursion yields non-negative entries.
- Spatial correlation: the contemporaneous correlation between hydro plants is estimated from the historical residuals after AR fitting. The resulting correlation matrix is used by the spectral noise generator in exactly the same way as a manually specified correlation.json.
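The seasonal-statistics and order-selection steps can be sketched in plain Python. This is an illustration, not Cobre's estimator: seasonal_stats and select_order are hypothetical names, the sample standard deviation is an assumed choice, PACF values are taken as given rather than computed, and the rule shown is the "largest significant lag" variant with the common large-sample 95% bound 1.96/√N (N = number of years), plus the two extensions listed above.

```python
import math
from statistics import mean, stdev


def seasonal_stats(history):
    """Mean and sample std per (hydro_id, season) from (hydro_id, season, value) tuples."""
    buckets = {}
    for hydro_id, season, value in history:
        buckets.setdefault((hydro_id, season), []).append(value)
    return {key: (mean(vals), stdev(vals)) for key, vals in buckets.items()}


def select_order(pacf, n_years):
    """Pick an AR order from periodic PACF values (pacf[0] is lag 1).

    Hypothetical rule: largest lag whose |PACF| exceeds 1.96 / sqrt(n_years).
    """
    if not pacf or pacf[0] == 0.0:
        return 0  # structural-zero short-circuit: degenerate lag-1 covariance
    threshold = 1.96 / math.sqrt(n_years)
    order = 0
    for lag, value in enumerate(pacf, start=1):
        if abs(value) > threshold:
            order = lag
    return max(order, 1)  # minimum-order-1 default when lag 1 is well defined
```

With 30 years of history the threshold is about 0.36, so a season whose only notable PACF values are 0.1 and 0.05 still receives the AR(1) base.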
History vs. pre-computed stats
inflow_history.parquet and inflow_seasonal_stats.parquet serve different
roles in the inflow model. When only inflow_history.parquet is present
(and inflow_seasonal_stats.parquet is absent), Cobre activates the
estimation path and derives seasonal statistics and AR coefficients from the
historical data. When inflow_seasonal_stats.parquet is present, its μ and σ are
used directly regardless of whether inflow_history.parquet is also present. If
history is present but inflow_ar_coefficients.parquet is not, the history is
still used to fit the AR coefficients and the correlation matrix (see Inflow
Source Resolution).
Use history-based estimation when raw observations are available and you want
Cobre to handle the statistical fitting; use pre-computed stats when you have
already fitted the model externally or when you need precise control over the
parameters.
Inflow Source Resolution
The PAR(p) inflow model is built from up to five files in scenarios/. Three
of them — inflow_history.parquet, inflow_seasonal_stats.parquet, and
inflow_ar_coefficients.parquet — drive path resolution: their
presence/absence selects which of seven estimation paths Cobre executes. The
remaining two — correlation.json and inflow_annual_component.parquet — layer
orthogonally on top of that path.
Path-driver flags
| Symbol | File | Role |
|---|---|---|
| H | scenarios/inflow_history.parquet | Raw observations for fitting |
| S | scenarios/inflow_seasonal_stats.parquet | User-supplied μ, σ per (hydro, stage) |
| R | scenarios/inflow_ar_coefficients.parquet | User-supplied AR coefficients ψ[ℓ] |
The seven estimation paths
For each combination of (H, S, R), Cobre selects exactly one path and resolves
each model output as follows:
| # | H | S | R | Path | Seasonal stats μ, σ | AR coefficients ψ[ℓ] | Annual component (PAR-A) | Correlation Σ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | Deterministic | no PAR model | none | n/a | identity, unless correlation.json provided |
| 2 | 0 | 1 | 0 | UserStatsWhiteNoise | user file | order-0 (white noise) | user file (if provided), else none | identity, unless correlation.json provided |
| 3 | 0 | 1 | 1 | UserProvidedNoHistory | user file | user file | user file (if provided), else none | identity, unless correlation.json provided |
| 4 | 1 | 0 | 0 | FullEstimation | fitted from H | fitted from H (PACF + Yule-Walker + Maceira-Damazio) | fitted from H iff order_selection = "pacf_annual" ¹ | estimated from H residuals, unless correlation.json provided |
| 5 | 1 | 0 | 1 | UserArHistoryStats | fitted from H | user file | always empty ² | estimated from H residuals using user ψ, unless correlation.json provided |
| 6 | 1 | 1 | 0 | PartialEstimation | user file (fitting stats used only for the YW solve) | fitted from H | fitted from H iff pacf_annual ¹ | estimated from H residuals using fitting stats, unless correlation.json provided |
| 7 | 1 | 1 | 1 | UserProvidedAll | user file | user file | user file (if provided), else none | identity, unless correlation.json provided ³ |
¹ When order_selection ≠ "pacf_annual", the fitted annual component is empty
even on paths 4 and 6.
² Path 5 explicitly discards any user-supplied
inflow_annual_component.parquet.
³ History is not re-consumed on path 7; correlation falls back to identity
unless correlation.json is supplied.
The one remaining combination, R=1 with H=0 and S=0, is invalid and collapses
to Deterministic (row 1): AR coefficients alone cannot drive estimation.
The two orthogonal layers
correlation.json — wins on every path
When correlation.json is present, Cobre uses it verbatim regardless of which
of the seven paths runs. When absent, behavior splits:
- Estimation paths (4, 5, 6): Σ is estimated from PAR residuals on H.
- Pass-through paths (1, 2, 3, 7): Σ defaults to identity (independent noise).
This is the only file in the inflow stack that behaves as a true global override.
inflow_annual_component.parquet — only honored on pass-through paths
The user file is loaded by cobre-io and threaded into assemble_inflow_models,
but the estimation paths overwrite it:
| Path | User-supplied annual component is … |
|---|---|
| Deterministic | n/a (no inflow models) |
| UserStatsWhiteNoise | honored |
| UserProvidedNoHistory | honored |
| FullEstimation | overwritten by fitted values |
| UserArHistoryStats | silently dropped (replaced by vec![]) |
| PartialEstimation | overwritten by fitted values |
| UserProvidedAll | honored |
To ship a hand-crafted PAR-A annual file, supply S and R so the run lands on a
pass-through path: path 3 (UserProvidedNoHistory) without history, or path 7
(UserProvidedAll) with history.
Decision tree
```text
inflow_history.parquet present?
├── yes: seasonal_stats present?
│   ├── yes: ar_coeffs?
│   │   ├── yes → UserProvidedAll (7)
│   │   └── no  → PartialEstimation (6)
│   └── no:  ar_coeffs?
│       ├── yes → UserArHistoryStats (5)
│       └── no  → FullEstimation (4)
└── no:  seasonal_stats present?
    ├── yes: ar_coeffs?
    │   ├── yes → UserProvidedNoHistory (3)
    │   └── no  → UserStatsWhiteNoise (2)
    └── no  → Deterministic (1)
```
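The decision tree, the correlation precedence, and the annual-component rules can be transcribed into a small lookup. The following Python sketch mirrors the tables in this section; Resolution and resolve are hypothetical names and are not the EstimationPath::resolve API in cobre-sddp.

```python
from typing import NamedTuple


class Resolution(NamedTuple):
    path: int
    name: str
    correlation: str        # where the correlation matrix comes from
    annual_from_user: bool  # whether a user annual-component file is honored


# (H, S, R) -> (path number, path name), transcribed from the seven-path table.
_PATHS = {
    (False, False, False): (1, "Deterministic"),
    (False, True, False): (2, "UserStatsWhiteNoise"),
    (False, True, True): (3, "UserProvidedNoHistory"),
    (True, False, False): (4, "FullEstimation"),
    (True, False, True): (5, "UserArHistoryStats"),
    (True, True, False): (6, "PartialEstimation"),
    (True, True, True): (7, "UserProvidedAll"),
}
_ESTIMATION_PATHS = {4, 5, 6}
_HONORS_USER_ANNUAL = {2, 3, 7}


def resolve(h: bool, s: bool, r: bool, has_correlation_json: bool) -> Resolution:
    # R alone (H=0, S=0) is not in the table and collapses to Deterministic.
    path, name = _PATHS.get((h, s, r), (1, "Deterministic"))
    if has_correlation_json:
        corr = "correlation.json"  # wins on every path
    elif path in _ESTIMATION_PATHS:
        corr = "estimated from history residuals"
    else:
        corr = "identity"
    return Resolution(path, name, corr, path in _HONORS_USER_ANNUAL)
```

For example, `resolve(True, True, True, False)` yields path 7 with identity correlation, matching footnote ³.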
Practical recipes
| Goal | Files to provide | Path landed |
|---|---|---|
| Smoke-test the LP without stochasticity | (no scenarios files) | 1 |
| Seasonal μ, σ with white noise, no autoregression | inflow_seasonal_stats.parquet | 2 |
| Fully user-specified PAR(p) without raw observations | inflow_seasonal_stats.parquet, inflow_ar_coefficients.parquet | 3 |
| Hands-off: fit everything from raw observations | inflow_history.parquet | 4 |
| Fit stats from history, override the AR structure | inflow_history.parquet, inflow_ar_coefficients.parquet | 5 |
| Override the levels (μ, σ) but let Cobre fit the AR | inflow_history.parquet, inflow_seasonal_stats.parquet | 6 |
| Provide every parameter, including the PAR-A annual term | All three of H, S, R (and optionally annual file) | 7 |
| Pin a custom spatial correlation on any path | Add correlation.json | any |
The canonical implementation lives in crates/cobre-sddp/src/stochastic/estimation.rs —
EstimationPath::resolve and the dispatch in estimate_from_history — with the
per-path fitting logic in run_estimation (path 4), run_partial_estimation
(path 6), and run_user_ar_estimation (path 5).
Multi-Resolution Studies
Cobre supports studies that mix stages at different temporal resolutions — for example, weekly stages within a month followed by monthly stages, or monthly stages transitioning to quarterly stages. Three mechanisms handle the stochastic implications of these layouts automatically.
Noise Sharing
When multiple SDDP stages share the same season_id (for example, four weekly
stages all assigned to the April season), Cobre automatically shares PAR noise
draws across those stages. Each group of same-season_id stages within a
calendar period receives identical noise realizations, so that sub-monthly
stages follow a single inflow trajectory that is consistent with the
monthly PAR model they were fitted from.
This sharing is controlled by a noise_group_id precomputed for each stage at
case load time. Uniform monthly studies assign a unique group to each stage, so
noise sharing has no effect and zero runtime overhead for standard studies. The
mechanism is seed-deterministic: identical tree_seed values produce identical
grouped noise assignments across runs and across MPI ranks.
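A sketch of how grouped noise could be precomputed and reused. It assumes, hypothetically, that a noise group is a run of consecutive stages sharing a season_id, and it uses Python's random.Random as a stand-in for Cobre's generator. noise_group_ids and stage_noise are illustrative names, not Cobre functions.

```python
import random


def noise_group_ids(season_ids):
    """Assign a group to each stage: consecutive stages with the same season_id share it.

    Hypothetical rule for illustration; Cobre precomputes its own noise_group_id.
    """
    groups = []
    current = -1
    prev = None
    for sid in season_ids:
        if sid != prev:
            current += 1
            prev = sid
        groups.append(current)
    return groups


def stage_noise(season_ids, tree_seed):
    """One standard-normal draw per group, repeated for every stage in the group."""
    groups = noise_group_ids(season_ids)
    rng = random.Random(tree_seed)  # seeded, so identical seeds give identical draws
    draws = [rng.gauss(0.0, 1.0) for _ in range(max(groups, default=-1) + 1)]
    return [draws[g] for g in groups]
```

In a uniform monthly study every stage starts a new run, so each stage gets its own group and the sharing has no effect.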
Observation Aggregation
When the study uses a Custom cycle type with seasons of different durations
(for example, 12 monthly seasons followed by 4 quarterly seasons), Cobre
aggregates fine-grained historical observations into coarser season buckets
before PAR fitting. A user who provides monthly inflow_history.parquet for a
study that includes quarterly stages does not need to pre-aggregate the data:
Cobre calls aggregate_observations_to_season internally using
duration-weighted averaging to derive one observation per (hydro, season, year)
at the appropriate resolution for each PAR model.
The coarsening direction is mandatory — aggregating monthly to quarterly is supported; disaggregating quarterly to monthly is not and returns an error. Monthly-uniform studies bypass this step entirely.
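Duration-weighted averaging can be sketched as follows. aggregate_to_season and monthly_to_quarterly are hypothetical helpers (not the signature of aggregate_observations_to_season), and day counts stand in for stage durations.

```python
def aggregate_to_season(values, durations):
    """Duration-weighted average of fine-grained observations in one coarse bucket."""
    if not values or len(values) != len(durations):
        raise ValueError("need one duration per observation")
    total = sum(durations)
    return sum(v * d for v, d in zip(values, durations)) / total


def monthly_to_quarterly(monthly, days):
    """Collapse 12 monthly observations into 4 quarterly ones (coarsening only)."""
    return [
        aggregate_to_season(monthly[3 * q:3 * q + 3], days[3 * q:3 * q + 3])
        for q in range(4)
    ]
```

For January to March inflows of 100, 130, and 90 m³/s over 31, 28, and 31 days, the quarterly value is 9530 / 90 ≈ 105.9 m³/s, slightly below the unweighted mean of 106.7 because February is shorter.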
Lag Resolution Transition
For studies that transition from monthly to quarterly stages, the PAR lag state must change resolution at the boundary. During the monthly phase, each monthly inflow is accumulated into a ring buffer indexed by the downstream (quarterly) lag. When the first quarterly stage is reached, the ring buffer contains a complete set of duration-weighted monthly contributions and the lag state is rebuilt from those values.
This transition is implemented in StageLagTransition via downstream
accumulation fields and is transparent to the LP and the cut representation.
The transition introduces no state variables in the LP; the lag state is an
internal solver variable updated in the hot-path functions. For
uniform-resolution studies, the downstream accumulation fields are unused and
the transition is a no-op.
For the full technical background — including the ring buffer design, frozen-lag
semantics, and the noise group precomputation algorithm — consult the temporal-resolution-debts design document in docs/design/.
Correlation
Hydro plants that share a watershed tend to have correlated inflows: when the upstream basin receives heavy rainfall, all plants along the river benefit simultaneously. Ignoring this correlation can cause the optimizer to underestimate the risk of a system-wide dry spell. Correlation can also be configured between load buses and between NCS entities.
Default behavior: independent noise
When no correlation configuration is provided, Cobre treats each entity’s
noise as independent of all others. Each entity draws its own noise
realization at each stage without any coupling. This is the correct setting
for the 1dtoy example, which has only one hydro plant.
Configuring spatial correlation
For multi-entity systems, Cobre supports spectral spatial correlation.
A correlation model is specified in correlation.json in the case directory
and defines named correlation groups, each with a symmetric correlation matrix.
The spectral method (eigendecomposition + matrix square root) is preferred
because, unlike a Cholesky factorization, it does not require the matrix to be
strictly positive-definite: estimated matrices and rank-deficient matrices are
handled naturally.
```json
{
  "method": "spectral",
  "profiles": {
    "default": {
      "correlation_groups": [
        {
          "name": "basin_south",
          "entities": [
            { "type": "inflow", "id": 0 },
            { "type": "inflow", "id": 1 }
          ],
          "matrix": [
            [1.0, 0.7],
            [0.7, 1.0]
          ]
        }
      ]
    }
  }
}
```
Backward compatibility: "method": "cholesky" is accepted for existing case files and behaves identically to "spectral" as of v0.4.0.
Valid entity types
The "type" field in each entity reference must be one of:
- "inflow": hydro inflow series (entity id matches id in hydros.json)
- "load": stochastic load demand (entity id matches id in buses.json)
- "ncs": non-controllable source availability (entity id matches id in non_controllable_sources.json)
Same-type enforcement
All entities within a single correlation group must share the same entity
type. Mixing entity types — for example, placing an "inflow" entity and a
"load" entity in the same group — is not supported and produces a
StochasticError::InvalidCorrelation error at case load time. Correlation
between classes (for example, inflow with load) therefore cannot be expressed
within a group; define a separate group for each class, keeping in mind that
each group couples only entities of its own type.
Entities not listed in any group retain independent noise. Multiple profiles can be defined and scheduled to activate for specific stages (for example, using a wet-season correlation structure in January through March and a dry-season structure for the remaining months). Detailed correlation configuration documentation will be added with future multi-plant example cases.
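The group-level rules (valid types, same-type enforcement, and a square symmetric matrix) can be checked on the parsed JSON. This is a hypothetical sketch over Python dicts; InvalidCorrelation here is only a stand-in for StochasticError::InvalidCorrelation, and the exact checks Cobre performs may differ.

```python
class InvalidCorrelation(ValueError):
    """Hypothetical stand-in for StochasticError::InvalidCorrelation."""


VALID_TYPES = {"inflow", "load", "ncs"}


def check_group(group):
    """Raise InvalidCorrelation if a correlation group breaks the rules above."""
    name = group["name"]
    types = {e["type"] for e in group["entities"]}
    unknown = types - VALID_TYPES
    if unknown:
        raise InvalidCorrelation(f"{name}: unknown entity type(s) {sorted(unknown)}")
    if len(types) > 1:
        raise InvalidCorrelation(f"{name}: mixed entity types {sorted(types)}")
    n = len(group["entities"])
    m = group["matrix"]
    if len(m) != n or any(len(row) != n for row in m):
        raise InvalidCorrelation(f"{name}: matrix must be {n}x{n}")
    for i in range(n):
        for j in range(n):
            if m[i][j] != m[j][i]:
                raise InvalidCorrelation(f"{name}: matrix is not symmetric")
```

The basin_south group from the example passes; replacing one of its entities with a "load" entry raises an error.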
Stochastic Load
Electrical load at each bus can be modeled as a stochastic process in
addition to, or independently of, inflow uncertainty. When
load_seasonal_stats.parquet is present in the scenarios/ directory,
Cobre applies a noise model to bus demand during training and simulation.
How load noise works
Load noise uses the same PAR(p) framework as inflows. For each bus and each
stage, Cobre draws a noise realization scaled by the bus’s mean_mw and
std_mw values from load_seasonal_stats.parquet. This realization is then
applied as a multiplicative factor on the base demand for that bus and stage:
the sampled load replaces the deterministic demand value during scenario
generation.
A bus with std_mw = 0 gets deterministic demand at each stage; a bus with
std_mw > 0 gets demand noise proportional to the standard deviation.
Optional: deterministic loads without the file
load_seasonal_stats.parquet is entirely optional. When the file is absent,
Cobre treats all bus demands as deterministic: the demand at each bus and
stage is the fixed value from the case data, with no noise applied. This is
the correct setting for studies where load uncertainty is negligible or where
you want to study inflow uncertainty in isolation.
Stochastic NCS Availability
Non-controllable sources (wind, solar, run-of-river) can have stochastic
available generation. When scenarios/non_controllable_stats.parquet is
present, Cobre samples a per-scenario availability factor for each NCS entity
and applies it to the entity’s max_generation_mw.
Schema
The file provides one row per (ncs_id, stage_id) pair:
| Column | Type | Nullable | Description |
|---|---|---|---|
| ncs_id | INT32 | No | NCS entity ID (matches id in non_controllable_sources.json) |
| stage_id | INT32 | No | Stage identifier (matches id in stages.json) |
| mean | DOUBLE | No | Mean availability factor (dimensionless, must be in [0, 1]) |
| std | DOUBLE | No | Standard deviation of availability factor (must be >= 0) |
How it works
For each forward and backward pass scenario, Cobre draws a standard normal
noise value η from the opening tree and computes:
A_r = max_generation_mw × clamp(mean + std × η, 0, 1)
The result A_r is then multiplied by the per-block factor from
scenarios/non_controllable_factors.json (default 1.0) to produce the
final NCS column upper bound:
col_upper = A_r × block_factor
With std = 0, the availability is deterministic at mean × max_generation_mw,
making the stochastic pipeline a strict generalization of the deterministic
ncs_bounds.parquet approach.
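The two formulas above translate directly into code. This is a minimal Python transcription for checking numbers by hand; the function name ncs_upper_bound is hypothetical.

```python
def ncs_upper_bound(max_generation_mw, mean, std, eta, block_factor=1.0):
    """col_upper = max_generation_mw * clamp(mean + std * eta, 0, 1) * block_factor."""
    factor = min(max(mean + std * eta, 0.0), 1.0)  # clamp the availability to [0, 1]
    a_r = max_generation_mw * factor
    return a_r * block_factor
```

For a 100 MW plant with mean 0.9 and std 0.2, a draw of η = 1.0 would give 1.1 before clamping, so the bound saturates at 100 MW; with std = 0 the bound is mean × max_generation_mw for every η.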
Optional: deterministic NCS without the file
When non_controllable_stats.parquet is absent, NCS availability is
deterministic: the LP column upper bound comes from constraints/ncs_bounds.parquet
(or defaults to max_generation_mw). No per-scenario variation occurs.
Seeds and Reproducibility
num_scenarios in stages.json
Each stage in stages.json has a num_scenarios field that controls how
many scenario branches are pre-generated for the opening scenario tree used
during the backward pass. A larger value gives the backward pass more
diverse inflow realizations to evaluate cuts against, at the cost of a
proportionally larger opening tree in memory. For the 1dtoy example this
is set to 10.
forward_passes in config.json
The forward_passes field in config.json controls how many scenario
trajectories are sampled during each training iteration’s forward pass.
This is distinct from num_scenarios: the forward pass draws new
trajectories on each iteration using a deterministic per-iteration seed,
while num_scenarios controls the pre-generated backward-pass tree.
Dual-Seed Architecture
Cobre uses two independent seeds, each controlling a different part of the stochastic pipeline:
training.tree_seed in config.json — the base seed for the opening
scenario tree. This seed governs all backward-pass openings and, when the
sampling scheme is in_sample (the default), also governs the forward-pass
scenario selection. When the same case is run with the same tree_seed, the
opening tree is bitwise identical across runs, regardless of the number of MPI
ranks.
training.scenario_source.seed in config.json — the forward seed used
when the sampling scheme is out_of_sample, historical, or external. This
seed controls the noise generated on-the-fly during each forward pass. It is
completely independent of tree_seed: changing it does not affect the
backward-pass tree, and changing tree_seed does not affect the forward pass.
tree_seed is optional: when omitted, Cobre uses a default seed of 42
(deterministic but arbitrary). scenario_source.seed is required when any
class uses out_of_sample, historical, or external; it is unused (and
may be omitted) when all classes use in_sample. To make a run fully
reproducible, specify both seeds explicitly in config.json:

```json
{
  "training": {
    "tree_seed": 42,
    "forward_passes": 50,
    "stopping_rules": [{ "type": "iteration_limit", "limit": 200 }],
    "scenario_source": {
      "seed": 99,
      "inflow": { "scheme": "out_of_sample" },
      "load": { "scheme": "in_sample" },
      "ncs": { "scheme": "in_sample" }
    }
  }
}
```
Setting tree_seed to null in config.json is equivalent to omitting it: Cobre
uses the default seed of 42. Set tree_seed explicitly to make the choice
intentional. Likewise, a null scenario_source.seed is only valid when all
classes use in_sample (where no forward-pass noise is generated); leaving it
null with any other scheme triggers a validation error.
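The seed rules can be expressed as a small validation function over an already-parsed training block. This is a hypothetical sketch, not Cobre's validator; it assumes that a class whose scheme is not given defaults to in_sample, as stated above.

```python
def check_seeds(training):
    """Return (tree_seed, forward_seed) under the rules above, or raise ValueError."""
    tree_seed = training.get("tree_seed")
    if tree_seed is None:
        tree_seed = 42  # documented default when omitted or null
    source = training.get("scenario_source") or {}
    schemes = {
        (source.get(cls) or {}).get("scheme", "in_sample")  # assumed default scheme
        for cls in ("inflow", "load", "ncs")
    }
    forward_seed = source.get("seed")
    if forward_seed is None and schemes != {"in_sample"}:
        raise ValueError("scenario_source.seed is required unless every class uses in_sample")
    return tree_seed, forward_seed
```

Applied to the training block in the example above, it returns (42, 99); dropping "seed" while inflow stays out_of_sample raises an error.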
Noise Methods
The sampling_method field in each stage entry of stages.json controls
how noise vectors are generated within that stage when building the opening
scenario tree. This is orthogonal to the sampling scheme (see
Sampling Schemes below), which controls where the
forward-pass noise comes from. The noise method controls the algorithm;
the sampling scheme controls the source.
All methods produce standardized η ~ N(0,1) vectors. Everything downstream — the spectral correlation transform, the PAR model, and the LP constraint patching — is identical regardless of which method produced the noise. Switching from SAA to Sobol is a one-field configuration change.
The default method is "saa" when sampling_method is omitted.
SAA — Sample Average Approximation
SAA (Sample Average Approximation) is pure Monte Carlo sampling. Each opening
draws an independent sequence of standard-normal values from a Pcg64
generator seeded deterministically from the stage and opening index. There
is no coordination between openings; each is drawn without knowledge of the
others.
SAA is the simplest and most general method. It works for any dimension count
and any branching factor, and it has no restrictions on num_scenarios. Use
SAA as your baseline when you are uncertain which method to choose, or when
your branching factor is small (fewer than 50 scenarios per stage).
Configure SAA by setting "sampling_method": "saa" (or by omitting the
field, since SAA is the default).
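Seeding each opening from (seed, stage, opening) is what makes SAA openings independent of generation order and of how work is split across MPI ranks. This sketch illustrates the idea with Python's random.Random (Mersenne Twister) and a SHA-256 seed derivation, both stand-ins for Cobre's Pcg64 scheme; opening_seed and saa_opening are hypothetical names.

```python
import hashlib
import random


def opening_seed(base_seed, stage, opening):
    """Derive a per-(stage, opening) seed from the base seed (hypothetical scheme)."""
    digest = hashlib.sha256(f"{base_seed}:{stage}:{opening}".encode()).digest()
    return int.from_bytes(digest[:8], "little")


def saa_opening(base_seed, stage, opening, dim):
    """Independent standard-normal vector for one opening (plain Monte Carlo)."""
    rng = random.Random(opening_seed(base_seed, stage, opening))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

Two workers that each generate half of the openings reproduce exactly the vectors a single worker would produce, because no generator state is shared between openings.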
LHS — Latin Hypercube Sampling
LHS (Latin Hypercube Sampling) is stratified sampling. For a stage with
N = num_scenarios openings, each dimension is divided into N
equal-probability strata [k/N, (k+1)/N) for k = 0, …, N-1. Exactly one
sample is placed within each stratum, and a Fisher-Yates shuffle independently
assigns strata to openings for every dimension. The result is marginal
uniformity: when you project all N noise vectors onto any single dimension,
each of the N equal-probability strata of the standard-normal distribution
contains exactly one sample, so no stratum is left empty.
LHS reduces the variance of sample-average estimates compared to SAA for the
same N, which typically means a better-converged backward-pass cut
approximation for the same computational budget. It is well-suited to moderate
branching factors and works for any dimension count.
Configure LHS by setting "sampling_method": "lhs" in the stage entry.
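The stratify-then-shuffle procedure can be sketched with the Python standard library: random.shuffle performs a Fisher-Yates shuffle and statistics.NormalDist.inv_cdf maps each stratified uniform to N(0,1). This is an illustration, not Cobre's generator; lhs_normal is a hypothetical name, and clamping u away from 0 and 1 is an added safeguard so the inverse CDF stays finite.

```python
import random
from statistics import NormalDist


def lhs_normal(n, dim, seed=0):
    """Latin Hypercube sample: n openings x dim dimensions of N(0, 1) noise.

    Each dimension gets exactly one point in every stratum [k/n, (k+1)/n);
    a Fisher-Yates shuffle assigns strata to openings independently per dimension.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    columns = []
    for _ in range(dim):
        # One uniform point inside each stratum, clamped away from 0 and 1.
        u = [min(max((k + rng.random()) / n, 1e-12), 1.0 - 1e-12) for k in range(n)]
        rng.shuffle(u)  # Fisher-Yates shuffle
        columns.append([std_normal.inv_cdf(x) for x in u])
    # Transpose so that openings[i][d] is dimension d of opening i.
    return [list(row) for row in zip(*columns)]
```

Mapping any single dimension back through the normal CDF recovers exactly one point per stratum, which is the marginal-uniformity property described above.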
QMC-Sobol
QMC-Sobol uses Sobol quasi-random sequences, which are low-discrepancy
sequences that fill the unit hypercube more evenly than independent random
draws. Cobre implements the Joe-Kuo 2010 direction number dataset with
Matousek linear scrambling. The scrambling applies an affine transformation
x' = a·x + b (mod 2^32) with seed-derived parameters to each dimension,
breaking correlations between dimensions while preserving the low-discrepancy
property. The batch generator uses a Gray-code recurrence for O(1) updates
per point.
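The Gray-code recurrence can be sketched for the first Sobol dimension alone, whose direction numbers are plain powers of two. Other dimensions would take their direction numbers from the Joe-Kuo table. The Matousek scrambling step is also left out, so treat this as an illustration of the update rule rather than Cobre's generator. Within each block of 2^m points, Gray-code order yields the same point set as the standard order, only permuted.

```rust
/// Sobol points for the first dimension via the Gray-code (Antonov-Saleev)
/// recurrence: x_n = x_{n-1} XOR v[c], where c = trailing zeros of n.
/// Each new point costs one XOR. Scrambling and higher dimensions omitted.
fn sobol_first_dim(n_points: usize) -> Vec<f64> {
    // Dimension-1 direction numbers in 32-bit fixed point: v[k] = 2^(31-k).
    // Other dimensions would read theirs from the Joe-Kuo table.
    let v: Vec<u32> = (0..32u32).map(|k| 1u32 << (31 - k)).collect();
    let mut x: u32 = 0;
    let mut points = Vec::with_capacity(n_points);
    for n in 0..n_points {
        if n > 0 {
            x ^= v[n.trailing_zeros() as usize];
        }
        points.push(f64::from(x) / 4_294_967_296.0); // x / 2^32
    }
    points
}
```

For example, sobol_first_dim(8) yields 0, 0.5, 0.75, 0.25, 0.375, 0.875, 0.625, 0.125. These 8 points land exactly one per interval [k/8, (k+1)/8), a small instance of the power-of-2 property that makes such branching factors attractive.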
QMC-Sobol offers an asymptotically faster convergence rate than both SAA and
LHS for smooth integrands, meaning that a smaller branching factor can achieve
equivalent policy quality. The convergence benefit is strongest when
num_scenarios is a power of 2 (32, 64, 128, 256, …), because the first 2^m
points of a Sobol sequence form a (t, m, s)-net in base 2: every elementary
dyadic box of volume 2^(t-m) contains exactly 2^t points. You can use other
values of num_scenarios, but the theoretical convergence advantage is reduced.
QMC-Sobol supports up to 21,201 dimensions. If your system dimension (the total number of hydro plants, load buses, and NCS entities) exceeds 21,201, Cobre returns an error and refuses to run. In practice, hydrothermal planning models stay far below this limit.
Configure QMC-Sobol by setting "sampling_method": "qmc_sobol".
QMC-Halton
QMC-Halton uses Halton sequences, another family of low-discrepancy
sequences. Each dimension uses a distinct prime base: dimension 1 uses
base 2, dimension 2 uses base 3, dimension 3 uses base 5, and so on. The
prime bases are computed at initialization time using the sieve of
Eratosthenes (sieve_primes). Cobre applies Owen-style random digit
scrambling to each dimension: a random permutation table is applied to each
digit position in each dimension, breaking the correlation artifacts that
affect plain Halton sequences at high dimensions (sometimes called the
“Halton curse”). Permutation tables are derived deterministically from the
stage seed.
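Below is a sketch of a digit-permuted Halton point. It assumes caller-supplied permutation tables in place of Cobre's seed-derived ones: one table per digit position, cycled when there are fewer tables than digits. The helper first_primes is hypothetical and only mirrors the role of sieve_primes. For simplicity each table must map digit 0 to 0, which keeps the infinite tail of zero digits at zero. A production scrambler must deal with that tail explicitly, and this sketch sidesteps it.

```rust
/// First `count` primes via the sieve of Eratosthenes, doubling the sieve
/// limit until enough are found (hypothetical helper, not sieve_primes).
fn first_primes(count: usize) -> Vec<u64> {
    let mut limit = 16usize;
    loop {
        let mut is_prime = vec![true; limit + 1];
        is_prime[0] = false;
        is_prime[1] = false;
        let mut i = 2;
        while i * i <= limit {
            if is_prime[i] {
                let mut j = i * i;
                while j <= limit {
                    is_prime[j] = false;
                    j += i;
                }
            }
            i += 1;
        }
        let primes: Vec<u64> = (2..=limit).filter(|&k| is_prime[k]).map(|k| k as u64).collect();
        if primes.len() >= count {
            return primes[..count].to_vec();
        }
        limit *= 2;
    }
}

/// Radical inverse of `index` in `base`, permuting each digit with the table
/// for its position. Identity tables give the plain Halton coordinate.
fn scrambled_radical_inverse(mut index: u64, base: u64, perms: &[Vec<u64>]) -> f64 {
    let inv_base = 1.0 / base as f64;
    let mut scale = inv_base;
    let mut value = 0.0;
    let mut position = 0;
    while index > 0 {
        let digit = (index % base) as usize;
        let table = &perms[position % perms.len()];
        value += table[digit] as f64 * scale;
        index /= base;
        scale *= inv_base;
        position += 1;
    }
    value
}

/// One Halton point: dimension d uses the d-th prime as its base.
fn halton_point(index: u64, primes: &[u64], perms: &[Vec<Vec<u64>>]) -> Vec<f64> {
    primes
        .iter()
        .zip(perms)
        .map(|(&base, tables)| scrambled_radical_inverse(index, base, tables))
        .collect()
}
```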
QMC-Halton has no dimension limit — it can handle arbitrarily many dimensions by sieving as many primes as needed. This makes it a good alternative to QMC-Sobol for very high-dimensional cases, though in practice the dimension limit of QMC-Sobol (21,201) is rarely reached. The convergence properties of QMC-Halton are similar to QMC-Sobol but the scrambling approach differs; some integrands favor one over the other.
Configure QMC-Halton by setting "sampling_method": "qmc_halton".
HistoricalResiduals
HistoricalResiduals uses standardized noise values derived from actual
historical inflow observations rather than from synthetic distributions. For
each opening in the stage, Cobre selects a historical year (a “window”) from
the HistoricalScenarioLibrary and reads the pre-computed PAR residuals for
that year and stage directly into the noise vector. No random number generator
is invoked; the noise is determined entirely by which historical year is
selected.
This method requires inflow_history.parquet in the scenarios/ directory.
Cobre inverts the PAR(p) model for every valid (window, stage, hydro) triple
at case load time, computing:
eta = (obs - mu - sum(psi[l] * lag[l])) / sigma
where obs is the raw historical inflow, mu and sigma are the seasonal
mean and standard deviation, and sum(psi[l] * lag[l]) is the AR contribution
of the preceding lags, with psi[l] the coefficient applied to lag l. The
resulting eta values are stored once and reused across training runs.
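The inversion is a one-line computation per (window, stage, hydro). This sketch evaluates the formula exactly as written above. It also adds the forward direction obs = mu + sum(psi[l] * lag[l]) + sigma * eta to show that the two round-trip. The function names are assumptions, and so is the lag ordering (lags[0] is the most recent season).

```rust
/// eta = (obs - mu - sum_l psi[l] * lag[l]) / sigma
fn par_residual(obs: f64, mu: f64, sigma: f64, psi: &[f64], lags: &[f64]) -> f64 {
    assert_eq!(psi.len(), lags.len(), "one lag value per AR coefficient");
    let ar: f64 = psi.iter().zip(lags).map(|(p, x)| p * x).sum();
    (obs - mu - ar) / sigma
}

/// Forward direction of the same model: obs = mu + AR term + sigma * eta.
fn par_inflow(eta: f64, mu: f64, sigma: f64, psi: &[f64], lags: &[f64]) -> f64 {
    let ar: f64 = psi.iter().zip(lags).map(|(p, x)| p * x).sum();
    mu + ar + sigma * eta
}
```

For example, with obs = 120, mu = 100, sigma = 10, psi = [0.5] and lag = [20], the AR term is 10 and eta = (120 − 100 − 10) / 10 = 1.0.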
Window selection. For each opening, the window index is chosen deterministically using a hash of the base seed, the opening index, and the stage ID:
window_idx = derive_opening_seed(seed, opening, stage) % n_windows
Selection is with replacement, so the same historical year can appear in
multiple openings of the same stage. When n_windows < branching_factor, the
opening count for that stage is clamped to n_windows and Cobre emits a
warning. Having fewer historical windows than the branching factor is
acceptable, but the stage then has fewer openings than requested, and the
policy quality is limited by the size of the historical record.
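The following sketches with-replacement window selection and the clamp. derive_opening_seed_sketch is a hypothetical SplitMix64-style mixer standing in for Cobre's derive_opening_seed, whose actual hash differs. Only the seed % n_windows reduction and the min(branching_factor, n_windows) clamp follow the text.

```rust
/// SplitMix64 finalizer, used here only to scramble bits.
fn mix64(mut z: u64) -> u64 {
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for derive_opening_seed(seed, opening, stage).
fn derive_opening_seed_sketch(seed: u64, opening: u32, stage: u32) -> u64 {
    let key = (u64::from(opening) << 32) | u64::from(stage);
    mix64(seed ^ mix64(key))
}

/// Window index for each opening of one stage, drawn with replacement.
/// The opening count is clamped to n_windows (Cobre also emits a warning).
fn select_windows(seed: u64, stage: u32, branching_factor: usize, n_windows: usize) -> Vec<usize> {
    assert!(n_windows > 0, "no historical windows available");
    let n_openings = branching_factor.min(n_windows);
    (0..n_openings)
        .map(|opening| {
            let h = derive_opening_seed_sketch(seed, opening as u32, stage);
            (h % n_windows as u64) as usize
        })
        .collect()
}
```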
Correlation handling. HistoricalResiduals skips the spectral correlation step that all other noise methods apply after generation. Because each window corresponds to a real historical year, the joint distribution of eta values across hydro plants already reflects the empirical spatial correlation from that year. Applying a synthetic correlation transform on top of real residuals would distort rather than improve the representation.
Non-hydro slots. Only the hydro segment of the noise vector is filled from the historical library. Load and NCS slots are zeroed; those entities use their own noise sources as configured by the sampling scheme.
Configure HistoricalResiduals by setting
"sampling_method": "historical_residuals" in the stage entry of
stages.json:
{
"id": 0,
"start_date": "2024-01-01",
"end_date": "2024-02-01",
"blocks": [{ "id": 0, "name": "SINGLE", "hours": 744 }],
"num_scenarios": 50,
"sampling_method": "historical_residuals"
}
Use HistoricalResiduals when you want the backward-pass opening tree to be grounded in real historical sequences rather than synthetic draws. This is particularly useful when the historical record contains unusual events (severe droughts, extreme wet years) that are difficult to represent faithfully with a parametric distribution.
Selective (Reserved)
The "selective" method is reserved for future use. It is intended to
support representative scenario selection (clustering-based methods), but
the required infrastructure is not yet implemented. If you configure a stage
with "sampling_method": "selective", the opening-tree generator returns an
error. In the out-of-sample forward pass, Cobre instead falls back to SAA and
emits a diagnostic warning.
Comparison
The following diagrams illustrate how each method distributes samples. SAA shows random clumps and gaps; LHS guarantees one sample per stratum; Sobol and Halton fill the space with low-discrepancy sequences.
| Method | Convergence rate | Dimension limit | Scenario count | Best for |
|---|---|---|---|---|
| SAA | O(N^{-1/2}) | None | Any | General use, small branching factors |
| LHS | Lower variance than SAA (same order) | None | Any | Moderate scenario counts, any dimension |
| QMC-Sobol | O(N^{-1} log^d N) | 21,201 | Powers of 2 preferred | Faster asymptotic convergence for smooth integrands, low-to-medium dimension |
| QMC-Halton | O(N^{-1} log^d N) | None | Any | High-dimension alternative to Sobol |
| HistoricalResiduals | N/A (empirical) | None | Limited by history length | Preserving empirical correlation, short history |
| Selective | N/A | N/A | N/A | Not implemented; reserved for future use |
Per-Stage Method Configuration
The sampling_method field is set per stage in stages.json. Different
stages in the same study can use different methods. This is useful when you
want a high-quality low-discrepancy method for the near-term stages (where
policy quality matters most) while using the simpler SAA for distant stages
where decisions are less sensitive to sampling quality.
The following example configures a two-stage study where stage 0 uses LHS and stage 1 uses QMC-Sobol:
{
"policy_graph": { "type": "finite_horizon", "annual_discount_rate": 0.12 },
"stages": [
{
"id": 0,
"start_date": "2024-01-01",
"end_date": "2024-02-01",
"blocks": [{ "id": 0, "name": "SINGLE", "hours": 744 }],
"num_scenarios": 100,
"sampling_method": "lhs"
},
{
"id": 1,
"start_date": "2024-02-01",
"end_date": "2024-03-01",
"blocks": [{ "id": 0, "name": "SINGLE", "hours": 696 }],
"num_scenarios": 128,
"sampling_method": "qmc_sobol"
}
]
}
Mixed configurations are fully supported. Cobre applies each stage’s method independently when building the opening tree.
Sampling Schemes
The sampling scheme controls where the forward-pass noise comes from. This is a different concept from the noise method: the noise method controls the algorithm used to generate noise vectors for the opening tree, while the sampling scheme controls whether the forward pass reuses the pre-generated tree, generates fresh noise on-the-fly, replays historical observations, or reads from an externally supplied file.
Each entity class — inflow, load, and NCS — independently specifies its
forward-pass noise source. The sampling scheme is configured in config.json
under training.scenario_source using a per-class format:
// config.json
{
"training": {
"forward_passes": 50,
"stopping_rules": [{ "type": "iteration_limit", "limit": 200 }],
"scenario_source": {
"seed": 42,
"inflow": { "scheme": "in_sample" },
"load": { "scheme": "in_sample" },
"ncs": { "scheme": "in_sample" }
}
}
}
All three class keys ("inflow", "load", "ncs") default to
"in_sample" when absent. The "seed" field is shared across all classes
and is required when any class uses "out_of_sample", "historical", or
"external".
Independent simulation sampling:
simulation.scenario_source in config.json can be set independently of training.scenario_source. When simulation.scenario_source is absent, the simulation phase falls back to the scheme configured under training.scenario_source. This lets you train with in-sample noise and simulate with out-of-sample or historical noise without changing the training configuration.
InSample (default)
With "scheme": "in_sample", the forward pass reuses the pre-generated
opening tree. At each (iteration, scenario, stage) triple, the solver
selects one opening from the tree using a deterministic per-iteration hash
derived from tree_seed. The backward pass and the forward pass see the
same set of noise realizations: the same scenarios that were used to build
cuts are the scenarios against which the forward trajectories are evaluated.
InSample is the default when training.scenario_source is absent from config.json.
It is simple to configure, requires no additional seed, and is appropriate for
most studies. The main limitation is that the forward pass cannot evaluate the
policy on noise realizations outside the opening tree, which can lead to an
optimistic bias when the branching factor is small.
OutOfSample
With "scheme": "out_of_sample", the forward pass generates fresh noise
on-the-fly at each (iteration, scenario, stage) triple. The fresh noise is
drawn from the same distribution as the opening tree but is independent of it
— the forward pass never looks at the tree. Each call derives a unique noise
vector from training.scenario_source.seed, the iteration index, the scenario
index, and the stage ID. The per-stage sampling_method controls which
algorithm (SAA, LHS, QMC-Sobol, or QMC-Halton) is used to generate the fresh
noise.
OutOfSample requires training.scenario_source.seed to be set. Configure it as follows:
// config.json
{
"training": {
"forward_passes": 50,
"stopping_rules": [{ "type": "iteration_limit", "limit": 200 }],
"scenario_source": {
"seed": 99,
"inflow": { "scheme": "out_of_sample" },
"load": { "scheme": "in_sample" },
"ncs": { "scheme": "in_sample" }
}
}
}
OutOfSample is preferred when you want to evaluate policy quality on scenarios that are independent of the scenarios used to build the policy. This avoids the in-sample optimism that arises with small branching factors, where the policy has effectively “seen” all the noise realizations during training. OutOfSample is especially useful during simulation, where you want an unbiased estimate of the policy’s expected cost on new scenarios.
Historical
With "scheme": "historical", the forward pass replays standardized noise
derived from historical inflow observations stored in inflow_history.parquet.
This allows you to evaluate the policy against actual historical sequences —
what would the policy have done during the drought of 1953 or the wet year of
1974?
Historical sampling applies only to the inflow class. The load and NCS classes configure their own schemes independently and are unaffected by the inflow class using Historical.
Window discovery
A “window” is a starting year y for which every hydro plant in the study has
a complete sequence of historical observations covering the entire study period
(plus the p pre-study seasons needed to seed the AR state, where p is the PAR
model lag order). Cobre discovers valid windows by scanning inflow_history.parquet and
checking completeness for every candidate starting year.
When historical_years is absent from training.scenario_source, Cobre
auto-discovers all valid windows from the history file. If the history file covers years 1940
through 2010 and the study spans 12 monthly stages, then every year for which
the history is complete (accounting for the required pre-window lag seasons)
becomes a valid window.
Configuring historical_years
To restrict the pool of candidate windows, set historical_years in
scenario_source. Two forms are supported:
Explicit list — specify the exact starting years to use:
// config.json
{
"training": {
"forward_passes": 50,
"stopping_rules": [{ "type": "iteration_limit", "limit": 200 }],
"scenario_source": {
"seed": 7,
"inflow": { "scheme": "historical" },
"load": { "scheme": "in_sample" },
"ncs": { "scheme": "in_sample" },
"historical_years": [1940, 1953]
}
}
}
Inclusive range — specify a contiguous span of starting years:
// config.json
{
"training": {
"forward_passes": 50,
"stopping_rules": [{ "type": "iteration_limit", "limit": 200 }],
"scenario_source": {
"seed": 7,
"inflow": { "scheme": "historical" },
"load": { "scheme": "in_sample" },
"ncs": { "scheme": "in_sample" },
"historical_years": { "from": 1940, "to": 2010 }
}
}
}
In both forms, Cobre validates each candidate year against the history file
and silently discards years for which the data is incomplete. If no valid
windows remain after filtering, Cobre returns a StochasticError::InsufficientData
error. When the number of valid windows is smaller than forward_passes, a
diagnostic warning is emitted and windows are repeated across forward passes.
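Window validation can be sketched as a completeness check over an absolute season counter. The sketch assumes one stage per season and a study that begins at start_season of the window year. It also assumes observations are keyed as (hydro_id, year, season_id). For auto-discovery, candidate_years would be every year in the file; otherwise it would be the historical_years list or range. This is an illustration, not Cobre's implementation.

```rust
use std::collections::HashSet;

/// Candidate years that are valid windows: every hydro has an observation
/// for each season from `lag_order` seasons before the study start through
/// the last study stage.
fn valid_windows(
    observed: &HashSet<(u32, i32, u32)>, // (hydro_id, year, season_id)
    hydro_ids: &[u32],
    candidate_years: &[i32],
    seasons_per_year: u32,
    start_season: u32,
    n_stages: u32,
    lag_order: u32,
) -> Vec<i32> {
    let spy = i64::from(seasons_per_year);
    candidate_years
        .iter()
        .copied()
        .filter(|&year| {
            // Absolute season counter: year * seasons_per_year + season.
            let start = i64::from(year) * spy + i64::from(start_season);
            let first = start - i64::from(lag_order); // pre-study lag seasons
            let last = start + i64::from(n_stages) - 1;
            (first..=last).all(|t| {
                let y = t.div_euclid(spy) as i32;
                let s = t.rem_euclid(spy) as u32;
                hydro_ids.iter().all(|&h| observed.contains(&(h, y, s)))
            })
        })
        .collect()
}
```

With monthly data for 1940 through 1942, 12 stages starting in season 0 and lag order 1, the sketch rejects 1940. That year would need December 1939 as its lag season.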
Lag seeding (apply_initial_state)
For PAR models with order > 0, the first stage of each forward pass requires historical inflow values from the stages immediately before the window’s start year — the “pre-study” lags. Historical sampling uses the raw historical observations at those pre-window stages directly as the PAR state vector. This means the AR dynamics of the first forward stage are initialized from the real historical record rather than from a generated value, preserving the continuity invariant between pre-window history and the replayed scenario.
How the HistoricalScenarioLibrary is used
At case load time, Cobre constructs a HistoricalScenarioLibrary by inverting
the PAR(p) model for every valid (window, stage) pair: it computes the
standardized noise value η = (obs − deterministic_base − Σ ψ[ℓ]·lag[ℓ]) / σ
using the raw historical inflow as lags. The resulting eta values are stored in
a flat buffer indexed by (window, stage, hydro). During the forward pass, the
ClassSampler::Historical variant selects a window deterministically from the
seed and iteration/scenario indices, then retrieves the pre-computed eta slice
for each stage without any per-step recomputation.
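The flat (window, stage, hydro) buffer amounts to row-major indexing. The EtaBuffer type and its layout are assumptions for illustration. In this layout the hydro index varies fastest, so all hydros of one (window, stage) sit next to each other. That contiguity is what would let a forward pass read one stage's eta slice directly.

```rust
/// Hypothetical flat eta buffer indexed by (window, stage, hydro).
struct EtaBuffer {
    n_stages: usize,
    n_hydros: usize,
    data: Vec<f64>,
}

impl EtaBuffer {
    fn new(n_windows: usize, n_stages: usize, n_hydros: usize) -> Self {
        EtaBuffer { n_stages, n_hydros, data: vec![0.0; n_windows * n_stages * n_hydros] }
    }

    /// Start of the contiguous hydro block for one (window, stage).
    fn offset(&self, window: usize, stage: usize) -> usize {
        (window * self.n_stages + stage) * self.n_hydros
    }

    fn set(&mut self, window: usize, stage: usize, hydro: usize, eta: f64) {
        let i = self.offset(window, stage) + hydro;
        self.data[i] = eta;
    }

    /// Eta values for every hydro at one (window, stage), with no copying.
    fn stage_slice(&self, window: usize, stage: usize) -> &[f64] {
        let start = self.offset(window, stage);
        &self.data[start..start + self.n_hydros]
    }
}
```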
Scenario selection: random without replacement
Historical, External, and LHS all use the same underlying mechanism to select items from a pool without repetition: a seed-derived Fisher-Yates permutation. Each forward-pass scenario gets a unique window (or external trajectory, or LHS stratum) within each round, with no inter-worker communication required.
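The seed-derived permutation idea can be sketched as follows. Forward scenarios are grouped into rounds of n_items, where the items are windows, external trajectories, or strata. Each round gets its own Fisher-Yates permutation seeded from (seed, iteration, round). Any worker can then compute the item for its own scenario index without communicating. The round-seed mixing and the SplitMix64 generator are hypothetical stand-ins.

```rust
/// SplitMix64 step (hypothetical stand-in for Cobre's RNG).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Seed-derived Fisher-Yates permutation of 0..n_items.
fn permutation(seed: u64, n_items: usize) -> Vec<usize> {
    let mut state = seed;
    let mut p: Vec<usize> = (0..n_items).collect();
    for i in (1..n_items).rev() {
        let j = (splitmix64(&mut state) % (i as u64 + 1)) as usize;
        p.swap(i, j);
    }
    p
}

/// Item used by one forward scenario. Items never repeat within a round
/// of n_items consecutive scenarios.
fn item_for_scenario(seed: u64, iteration: u64, scenario: usize, n_items: usize) -> usize {
    let round = (scenario / n_items) as u64;
    let mut s = seed
        ^ iteration.wrapping_mul(0x9E37_79B9_7F4A_7C15)
        ^ round.wrapping_mul(0xD1B5_4A32_D192_ED03);
    let round_seed = splitmix64(&mut s);
    permutation(round_seed, n_items)[scenario % n_items]
}
```

Recomputing the permutation on every call costs O(n_items). A real implementation would likely cache it per round.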
External
With "scheme": "external", the forward pass reads pre-generated scenario
realizations from per-class Parquet files in the scenarios/ directory. This
enables integration with external scenario generation tools — for example, a
climate model, a market forecast engine, or a bespoke sampling framework — and
injects their output directly into the Cobre forward pass.
Each entity class that uses External sampling requires its own file. The three files and their schemas are:
external_inflow_scenarios.parquet
| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | INT32 | No | Stage identifier (matches id in stages.json) |
| scenario_id | INT32 | No | Zero-based scenario index (0 to n_scenarios − 1) |
| hydro_id | INT32 | No | Hydro plant ID (matches id in hydros.json) |
| value_m3s | DOUBLE | No | Inflow realization in m³/s for this (stage, scenario, hydro) |
external_load_scenarios.parquet
| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | INT32 | No | Stage identifier (matches id in stages.json) |
| scenario_id | INT32 | No | Zero-based scenario index (0 to n_scenarios − 1) |
| bus_id | INT32 | No | Bus ID (matches id in buses.json) |
| value_mw | DOUBLE | No | Load realization in MW for this (stage, scenario, bus) |
external_ncs_scenarios.parquet
| Column | Type | Nullable | Description |
|---|---|---|---|
| stage_id | INT32 | No | Stage identifier (matches id in stages.json) |
| scenario_id | INT32 | No | Zero-based scenario index (0 to n_scenarios − 1) |
| ncs_id | INT32 | No | NCS entity ID (matches id in non_controllable_sources.json) |
| value | DOUBLE | No | Availability realization for this (stage, scenario, NCS) |
External standardization
Cobre does not use the raw values from external files directly. Before the forward pass can use them, each value is converted to the same standardized noise space (eta) that the PAR model and the opening tree use internally:
- Inflow. Full PAR(p) inversion via solve_par_noise: the observed value is converted to η = (obs − deterministic_base − Σ ψ[ℓ]·lag[ℓ]) / σ using the fitted PAR model coefficients and seasonal statistics.
- Load. Simple z-score normalization, η = (value − mean) / std, using the mean_mw and std_mw from load_seasonal_stats.parquet.
- NCS. Simple z-score normalization, η = (value − mean) / std, using the mean and std from non_controllable_stats.parquet.
The resulting eta values are stored in an ExternalScenarioLibrary — one
per class — and the ClassSampler::External variant retrieves them by
(stage, scenario) index during the forward pass.
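The load and NCS standardization is a per-entity z-score. This sketch standardizes one (stage, scenario) row of external load values against the matching mean_mw and std_mw values, given in bus order. The function name is hypothetical. Rejecting a non-positive standard deviation is an assumption, since the text does not say how Cobre handles that case.

```rust
/// eta_b = (value_b - mean_b) / std_b for every bus b in one row.
fn standardize_row(values: &[f64], means: &[f64], stds: &[f64]) -> Result<Vec<f64>, String> {
    if values.len() != means.len() || values.len() != stds.len() {
        return Err("length mismatch between values and statistics".to_string());
    }
    values
        .iter()
        .zip(means)
        .zip(stds)
        .map(|((v, m), s)| {
            // Assumed guard: a zero or negative std cannot standardize.
            if *s > 0.0 {
                Ok((v - m) / s)
            } else {
                Err(format!("non-positive std {s}"))
            }
        })
        .collect()
}
```

For example, a bus with mean_mw = 1000 and std_mw = 50 that receives an external value of 1100 MW maps to η = 2.0.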
Configuring External sampling
// config.json
{
"training": {
"forward_passes": 50,
"stopping_rules": [{ "type": "iteration_limit", "limit": 200 }],
"scenario_source": {
"seed": 1,
"inflow": { "scheme": "external" },
"load": { "scheme": "external" },
"ncs": { "scheme": "in_sample" }
}
}
}
Each class is configured independently. In the example above, inflow and load use external files while NCS uses the in-sample opening tree.
User-Supplied Opening Trees
By default, Cobre generates the backward-pass opening tree internally using
SipHash-derived seeds and the spatial correlation spectral factor. If you need
to supply your own noise realizations — for cross-tool comparison, sensitivity
analysis, or round-trip replay — you can place scenarios/noise_openings.parquet
in the case directory before running.
When the file is present, Cobre loads the opening tree from it instead of calling the internal generator. When the file is absent, the default generator runs as usual.
Schema
The file has exactly four columns:
| Column | Type | Required | Description |
|---|---|---|---|
| stage_id | INT32 | Yes | Zero-based stage index (0 to n_stages − 1) |
| opening_index | UINT32 | Yes | Zero-based opening index within the stage (0 to openings_per_stage − 1) |
| entity_index | UINT32 | Yes | Zero-based entity index in system dimension order |
| value | DOUBLE | Yes | Noise realization for this (stage, opening, entity) triple |
Entity ordering
The entity_index column follows the system dimension convention:
- Hydro entities, sorted by canonical ID (ascending)
- Load buses, sorted by canonical ID (ascending)
- NCS entities, sorted by canonical ID (ascending)
This matches the ordering used by Cobre’s internal opening tree generator. The file stores only indices, not entity identifiers, so an incorrect ordering causes silent value misassignment rather than a schema error. Double-check the entity ordering when constructing the file externally.
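When building noise_openings.parquet outside Cobre, it is safer to derive entity_index programmatically than by hand. Here is a minimal sketch of the ordering above. It assumes the bus list holds only the load buses that carry noise in your case, so check which buses apply. The Class enum and entity_order are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Class {
    Hydro,
    Bus,
    Ncs,
}

/// System-dimension order: hydros, then load buses, then NCS entities,
/// each sorted by ascending ID. Position in the result is entity_index.
fn entity_order(hydro_ids: &[i32], bus_ids: &[i32], ncs_ids: &[i32]) -> Vec<(Class, i32)> {
    let mut order = Vec::new();
    for (class, ids) in [(Class::Hydro, hydro_ids), (Class::Bus, bus_ids), (Class::Ncs, ncs_ids)] {
        let mut sorted = ids.to_vec();
        sorted.sort_unstable();
        order.extend(sorted.into_iter().map(|id| (class, id)));
    }
    order
}
```

With hydros 12 and 3, bus 7, and NCS entities 40 and 5, the order is hydro 3, hydro 12, bus 7, NCS 5, NCS 40. NCS 40 therefore gets entity_index 4.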
Use cases
- Cross-tool comparison. Generate a set of noise realizations in an external tool and inject them into Cobre to compare policy quality on identical scenarios.
- Sensitivity analysis. Construct an extreme scenario (for example, all hydros at minimum inflow for the entire study) and evaluate how the policy responds.
- Round-trip replay. Export the opening tree that Cobre used in a training run with exports.stochastic: true in config.json, copy output/stochastic/noise_openings.parquet to scenarios/, and re-run to reproduce the exact same backward-pass context. See Exporting Stochastic Artifacts for the complete workflow.
Interaction with tree_seed
The training.tree_seed field in config.json remains required even when a
user-supplied opening tree is present. The opening tree and forward-pass noise
are independent: tree_seed governs the forward-pass scenario sampling
performed by sample_forward(), which uses SipHash seeds derived independently
of the opening tree. Supplying a custom opening tree has no effect on forward-pass
noise.
Limitations
- Partial-stage override is not supported. You must supply openings for all study stages. If you want to replace a subset of stages while keeping the rest internally generated, you must supply a complete tree and duplicate the internally generated values for the unmodified stages.
- User-supplied noise is used as-is. The spectral spatial correlation factor is not applied again. You are responsible for any spatial correlation structure encoded in the values you supply.
The file schema and validation rules are documented in the noise_openings.rs module.
Inflow Non-Negativity
Normal distributions used in PAR(p) models have unbounded support: even with a positive mean, there is a non-zero probability of drawing a negative noise realisation that, after applying the AR dynamics, produces a negative inflow value. Negative inflow has no physical meaning and, if uncorrected, would violate water balance constraints in the LP.
Cobre provides three methods for handling negative inflow realisations,
controlled by the modeling.inflow_non_negativity.method field in config.json.
Penalty method (default)
The penalty method adds a high-cost slack variable to each water balance
row. When the solver encounters a scenario where the inflow would be negative,
it draws on this virtual inflow at the penalty cost rather than violating the
balance constraint. The penalty cost is configurable via the
inflow_non_negativity field in the case configuration; the default keeps it
high enough that the slack is used only when necessary.
In practice, the penalty is rarely activated in well-specified studies. It acts as a backstop for low-probability tail realisations. It is the default method.
Truncation method
Available since v0.1.1, the truncation method evaluates the full inflow
value before constructing the LP and clamps any negative result to zero. The
water balance row receives the clamped inflow directly; no slack variable is
added and no penalty cost is incurred. To enable truncation, set the method
field in config.json:
{
"modeling": {
"inflow_non_negativity": {
"method": "truncation"
}
}
}
Truncation eliminates the penalty cost for tail realisations at the expense of introducing a small bias: scenarios where the true inflow would be slightly negative are treated as zero-inflow scenarios, which is conservative but physically interpretable. For most well-specified studies, both methods produce similar results because negative realisations are rare.
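A small calculation shows the direction and size of the truncation bias. Clamping can only raise an inflow, so the mean of the clamped values is never below the mean of the raw values. This minimal sketch applies the clamp to a hypothetical set of raw inflow realizations, using function names chosen for illustration.

```rust
/// Truncation: a negative inflow becomes zero before entering the LP.
fn truncate_inflow(raw: f64) -> f64 {
    raw.max(0.0)
}

/// Mean of truncated inflows minus mean of raw inflows. Never negative,
/// and zero when no realization is negative.
fn truncation_bias(raw_inflows: &[f64]) -> f64 {
    assert!(!raw_inflows.is_empty(), "need at least one realization");
    let n = raw_inflows.len() as f64;
    let raw_mean = raw_inflows.iter().sum::<f64>() / n;
    let clamped_mean = raw_inflows.iter().map(|&q| truncate_inflow(q)).sum::<f64>() / n;
    clamped_mean - raw_mean
}
```

For the raw values 120, −30, 50 and 10 m³/s, the raw mean is 37.5 and the clamped mean is 45, so truncation shifts the mean by 7.5 m³/s.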
Truncation with penalty
A combined truncation with penalty method is available, configured by
setting method to "truncation_with_penalty" in config.json:
{
"modeling": {
"inflow_non_negativity": {
"method": "truncation_with_penalty"
}
}
}
This method applies both truncation and a bounded slack variable: the inflow
is clamped to zero and a slack penalised by
penalties.json::hydro.inflow_nonnegativity_cost is added, providing a smooth
backstop for extreme tail realisations.
For the mathematical theory behind all three methods, see the Inflow Non-Negativity page in the methodology reference, or Oliveira et al. (2022), Energies 15(3):1115.
Temporal Resolution and PAR
The PAR(p) model is parameterized by season_id. Every stage in stages.json
carries a season_id that selects its PAR parameters — mean (mu), standard
deviation (sigma), and autoregressive coefficients (psi) — from the fitted
model. When multiple stages share the same season_id, they receive identical
stochastic parameters.
This design choice reflects a fundamental data-resolution constraint. If the
historical observations are at monthly resolution, the fitted PAR parameters
describe the distribution of monthly inflows. Applying those parameters to
sub-monthly stages (for example, four weekly stages all assigned
season_id = 3 for April) does not create additional information — it
reproduces the same monthly-scale noise for each week.
Why sub-monthly stages share noise. Sub-monthly stages sharing a
season_id receive the same PAR parameters and, for the HistoricalResiduals
noise method, the same noise realizations. This is not a limitation of the
implementation — it is an honest representation of what monthly-resolution
data can tell you. Monthly history cannot support independent weekly noise
draws; doing so would fabricate variability that does not exist in the record.
Users who need true sub-monthly variability should supply it through External
scenarios from a dedicated short-term model.
Recommended pattern for weekly decision granularity. When weekly dispatch decisions matter but external weekly scenarios are not available, the recommended approach is to use a monthly SDDP stage with chronological blocks rather than multiple weekly SDDP stages:
{
"id": 0,
"start_date": "2024-01-01",
"end_date": "2024-02-01",
"season_id": 0,
"blocks": [
{ "id": 0, "name": "WEEK1", "hours": 168 },
{ "id": 1, "name": "WEEK2", "hours": 168 },
{ "id": 2, "name": "WEEK3", "hours": 168 },
{ "id": 3, "name": "WEEK4", "hours": 240 }
],
"num_scenarios": 50
}
One monthly stage with four weekly chronological blocks provides weekly dispatch granularity in the LP while keeping one noise realization per month, consistent with the data resolution. The month has a single stage boundary, so its Benders cuts are built at monthly resolution. This avoids both the fabricated weekly variability and the lag-accumulation complications that arise with four independent weekly SDDP stages.
For the full technical background on temporal resolution design, including
applicability matrices for different study patterns, consult the temporal-resolution-debts design document in docs/design/.
Validation Rules
Cobre validates the consistency of temporal resolution settings at case load
time. The following rules apply when season_definitions is present in
stages.json and inflow_history.parquet is the active estimation source.
Rule 27 (error): season_id range coverage.
Every stage season_id must reference a season defined in
season_definitions. If a stage has season_id = 5 but the season map only
defines seasons 0–11, Cobre emits a BusinessRuleViolation error and
refuses to build the stochastic model.
- Triggers when: a stage’s season_id is not present in season_definitions.seasons[].id.
- Resolution: Add the missing season to season_definitions, or correct the season_id in the stage entry.
Rule 28 (warning): observation coverage.
When a season has no inflow observations in inflow_history.parquet and the
inflow sampling scheme is not external, PAR estimation for that season will
have no data. Cobre emits a ModelQuality warning. This is not an error
because External-only seasons legitimately have no history requirement.
- Triggers when: a season defined in season_definitions has zero observations in inflow_history.parquet and the inflow scheme is not external.
- Resolution: Provide historical observations for the season, switch the inflow scheme to external for that study, or remove the season if it is unused.
Rule 29 (error): resolution consistency.
All stages sharing the same season_id must have durations within 7 days of
each other. A stage group where one member is a monthly stage (28–31 days)
and another is a quarterly stage (89–92 days) indicates conflicting PAR model
parameterisations for the same season, and Cobre emits a
BusinessRuleViolation error.
- Triggers when: the maximum and minimum durations among stages in the same season_id group differ by more than 7 days.
- Resolution: Assign distinct season_id values to stages at different temporal resolutions (e.g., monthly stages use IDs 0–11, quarterly stages use IDs 12–15 in a custom SeasonMap).
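Rule 29 reduces to a min/max check per season group. This sketch assumes stage durations have already been computed in days from start_date and end_date. The function name and input shape are hypothetical.

```rust
use std::collections::HashMap;

/// Season ids whose stage durations (in days) span more than 7 days.
/// Input: one (season_id, duration_days) pair per stage.
fn resolution_violations(stages: &[(u32, u32)]) -> Vec<u32> {
    let mut range: HashMap<u32, (u32, u32)> = HashMap::new();
    for &(season, days) in stages {
        let entry = range.entry(season).or_insert((days, days));
        entry.0 = entry.0.min(days);
        entry.1 = entry.1.max(days);
    }
    let mut bad: Vec<u32> = range
        .into_iter()
        .filter(|(_, (lo, hi))| hi - lo > 7)
        .map(|(season, _)| season)
        .collect();
    bad.sort_unstable();
    bad
}
```

For example, a season group mixing a 28-day stage with a 91-day stage spans 63 days and is flagged.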
Rule 30 (warning): contiguity.
A season defined in season_definitions but not referenced by any stage will
have no PAR parameters and no observations. Cobre emits a ModelQuality
warning for each such season. This catches accidental gaps in the season ID
space (e.g., defining seasons 0–11 but stages only using 0–9).
- Triggers when: a season defined in season_definitions is not referenced by any stage’s season_id.
- Resolution: Remove the unreferenced season from season_definitions, or assign it to at least one stage.
Rule 31 (error): observation-to-season alignment.
If any (hydro_id, season_id, year) triple has more than one observation in
inflow_history.parquet, the observation data has finer temporal resolution
than the season definitions. The PAR estimation pipeline expects exactly one
observation per (hydro, season, year). Multiple observations distort
parameter estimates. Cobre emits a BusinessRuleViolation error.
- Triggers when: a hydro plant has two or more observations in inflow_history.parquet that map to the same (season_id, year) pair (for example, daily observations paired with monthly seasons, or two monthly entries for the same hydro-season-year).
- Resolution: Aggregate the finer-resolution observations to match the season resolution before providing the file. Provide exactly one row per (hydro_id, season_id, year) in inflow_history.parquet.
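You can check Rule 31 before handing the file to Cobre with a single pass over the (hydro_id, season_id, year) keys. The sketch below assumes the rows have already been mapped to season_id. It does not show the mapping from observation dates to seasons, and the function name is hypothetical.

```rust
use std::collections::HashSet;

/// Triples that appear more than once, each reported once, sorted.
/// Input: one (hydro_id, season_id, year) key per observation row.
fn duplicate_observations(rows: &[(i32, u32, i32)]) -> Vec<(i32, u32, i32)> {
    let mut seen = HashSet::new();
    let mut dups = HashSet::new();
    for &key in rows {
        // insert returns false when the key was already present.
        if !seen.insert(key) {
            dups.insert(key);
        }
    }
    let mut out: Vec<(i32, u32, i32)> = dups.into_iter().collect();
    out.sort_unstable();
    out
}
```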
Related Pages
- Anatomy of a Case: introductory walkthrough of the scenarios/ directory and Parquet schemas
- Configuration: full documentation of config.json fields including tree_seed and forward_passes
- cobre-stochastic: internal architecture of the stochastic crate (PAR preprocessing, spectral correlation, opening tree, and seed derivation)