cobre-io

alpha

cobre-io is the case directory loader for the Cobre ecosystem. It provides the load_case function, which reads a case directory from disk and produces a fully-validated [cobre_core::System] ready for use by downstream solver and analysis crates.

The crate owns the entire input path: JSON and Parquet parsing, layered validation, three-tier penalty and bound resolution, scenario model assembly, and optional parameter estimation from historical data. No other crate reads input files. Every crate downstream of cobre-io receives a structurally sound System with all foreign keys resolved and all domain rules verified.

Module overview

Module	Purpose
`config`	`Config` struct and `parse_config` — reads `config.json`
`system`	Entity parsers for buses, lines, hydros, thermals, energy contracts, pumping stations, and non-controllable sources
`extensions`	Hydro production model extensions — FPHA hyperplane loading, production model configuration parsing, and hydro geometry parsing
`scenarios`	Inflow and load statistical model loading, assembly, history-based estimation, and per-class external scenario loading (`external_inflow_scenarios.parquet`, `external_load_scenarios.parquet`, `external_ncs_scenarios.parquet`)
`constraints`	Stage-varying bound and penalty override loading from Parquet
`penalties`	Global penalty defaults parser (`penalties.json`)
`stages`	Stage sequence and policy graph loading (`stages.json`), per-class scenario source parsing (`ScenarioSource`), and backward-incompatibility detection for removed fields
`initial_conditions`	Reservoir initial storage loading
`validation`	Layered validation pipeline and `ValidationContext`
`resolution`	Three-tier penalty and bound resolution into O(1) lookup tables
`pipeline`	Orchestrator that wires all layers into a single `load_case` call
`report`	Structured validation report generation
`broadcast`	System serialization and deserialization for MPI broadcast
`output`	Output result types for simulation and training data; `output::hydro_models` exports fitted FPHA hyperplane coefficients to Parquet

`load_case`

pub fn load_case(path: &Path) -> Result<System, LoadError>

Loads a power system case directory and returns a fully-validated System.

path must point to the case root directory. That directory must contain config.json, penalties.json, stages.json, initial_conditions.json, the system/ subdirectory, the scenarios/ subdirectory, and the constraints/ subdirectory. See Case directory structure for the full layout.

load_case executes the following sequence:

Layer 1 — Structural validation. Checks that all required files exist on disk and records which optional files are present. Missing required files produce [LoadError::ConstraintError] entries. Missing optional files are silently noted in the file manifest without error.
Layer 2 — Schema validation. Parses every present file, verifies required fields, types, and value ranges. Returns [LoadError::IoError] for read failures and [LoadError::ParseError] for malformed JSON or invalid Parquet. Schema violations produce [LoadError::ConstraintError] entries.
Layer 3 — Referential integrity. Verifies that every cross-entity ID reference resolves to a known entity. Dangling foreign keys produce [LoadError::ConstraintError] entries.
Layer 4 — Dimensional consistency. Checks that optional per-entity files provide coverage for every entity that needs them (for example, that inflow statistical parameters exist for every hydro plant, and that load seasonal statistics cover every bus for every stage). Coverage gaps produce [LoadError::ConstraintError] entries.
Layer 5 — Semantic validation. Enforces domain business rules: acyclic hydro cascade topology, penalty ordering (lower tiers may not exceed upper), PAR model stationarity, stage count consistency, estimation prerequisites, and other invariants. Violations produce [LoadError::ConstraintError] entries.
Resolution. After all validation layers pass, three-tier penalty and bound resolution is performed. The result is pre-resolved lookup tables embedded in the System for O(1) solver access.
Scenario assembly. Inflow and load statistical models are assembled from the parsed seasonal statistics and autoregressive coefficients. When inflow_history.parquet is present and inflow_seasonal_stats.parquet is absent, the estimation pipeline derives seasonal statistics and AR coefficients from the historical data before assembly.
System construction. SystemBuilder::build() is called with the fully resolved data. Any remaining structural violations (duplicate IDs, broken cascade) surface as a final [LoadError::ConstraintError].

All validation diagnostics across Layers 1 through 5 are collected by ValidationContext before failing. When load_case returns an error, the error message contains every problem found, not just the first one.
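
The collect-all behaviour described above can be pictured with a small sketch. The `Diagnostics` type and its methods are hypothetical stand-ins written for this illustration, not the actual `ValidationContext` API. The sketch only shows problems being accumulated across layers and joined into a single report.

```rust
/// Hypothetical stand-in for a collect-all validation context.
/// The real `ValidationContext` in cobre-io may look quite different.
#[derive(Default)]
struct Diagnostics {
    errors: Vec<String>,
}

impl Diagnostics {
    /// Record a problem without aborting the current layer.
    fn error(&mut self, layer: u8, file: &str, message: &str) {
        self.errors.push(format!("[layer {layer}] {file}: {message}"));
    }

    /// Turn everything collected so far into a single result.
    fn into_result(self) -> Result<(), String> {
        if self.errors.is_empty() {
            Ok(())
        } else {
            Err(self.errors.join("\n"))
        }
    }
}

fn main() {
    let mut ctx = Diagnostics::default();
    ctx.error(1, "system/lines.json", "required file is missing");
    ctx.error(3, "system/hydros.json", "hydro 'H1' references an unknown bus");

    // Both problems are reported together instead of only the first one.
    match ctx.into_result() {
        Ok(()) => println!("case is valid"),
        Err(report) => eprintln!("{report}"),
    }
}
```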

Minimal example

use cobre_io::load_case;
use std::path::Path;

let system = load_case(Path::new("path/to/my_case"))?;
println!("Loaded {} buses, {} hydros", system.n_buses(), system.n_hydros());

Return type

On success, load_case returns a cobre_core::System — an immutable, Send + Sync container holding all entity registries, topology graphs, pre-resolved penalty and bound tables, scenario models, and the stage sequence. All entity collections are in canonical ID-sorted order.

On failure, load_case returns a LoadError. See Error handling for the full set of variants and when each occurs.

Case directory structure

A valid case directory has the following layout:

my_case/
├── config.json                          # Solver configuration (required)
├── penalties.json                       # Global penalty defaults (required)
├── stages.json                          # Stage sequence and policy graph (required)
├── initial_conditions.json              # Reservoir storage at study start (required)
├── system/
│   ├── buses.json                       # Electrical buses (required)
│   ├── lines.json                       # Transmission lines (required)
│   ├── hydros.json                      # Hydro plants (required)
│   ├── thermals.json                    # Thermal plants (required)
│   ├── non_controllable_sources.json    # Intermittent sources (optional)
│   ├── pumping_stations.json            # Pumping stations (optional)
│   ├── energy_contracts.json            # Bilateral contracts (optional)
│   ├── hydro_geometry.parquet           # Reservoir geometry tables (optional)
│   ├── hydro_production_models.json     # FPHA production function configs (optional)
│   └── fpha_hyperplanes.parquet         # FPHA hyperplane coefficients (optional)
├── scenarios/
│   ├── inflow_seasonal_stats.parquet    # PAR model seasonal statistics (optional)
│   ├── inflow_ar_coefficients.parquet   # PAR autoregressive coefficients (optional)
│   ├── inflow_history.parquet           # Historical inflow series (optional)
│   ├── load_seasonal_stats.parquet      # Load model seasonal statistics (optional)
│   ├── load_factors.json                # Load scaling factors (optional)
│   ├── correlation.json                 # Cross-series correlation model (optional)
│   ├── external_inflow_scenarios.parquet    # External inflow scenarios (optional)
│   ├── external_load_scenarios.parquet      # External load scenarios (optional)
│   └── external_ncs_scenarios.parquet       # External NCS scenarios (optional)
└── constraints/
    ├── hydro_bounds.parquet             # Stage-varying hydro bounds (optional)
    ├── thermal_bounds.parquet           # Stage-varying thermal bounds (optional)
    ├── line_bounds.parquet              # Stage-varying line bounds (optional)
    ├── pumping_bounds.parquet           # Stage-varying pumping bounds (optional)
    ├── contract_bounds.parquet          # Stage-varying contract bounds (optional)
    ├── generic_constraints.json         # User-defined LP constraints (optional)
    ├── generic_constraint_bounds.parquet # Bounds for generic constraints (optional)
    ├── exchange_factors.json             # Block exchange factors (optional)
    ├── penalty_overrides_hydro.parquet  # Stage-varying hydro penalty overrides (optional)
    ├── penalty_overrides_bus.parquet    # Stage-varying bus penalty overrides (optional)
    ├── penalty_overrides_line.parquet   # Stage-varying line penalty overrides (optional)
    └── penalty_overrides_ncs.parquet    # Stage-varying NCS penalty overrides (optional)

For the full JSON and Parquet schemas for each file, see the Case Format Reference.

Validation pipeline

The validation pipeline layers run in sequence. Earlier layers gate later ones: if Layer 1 finds a missing required file, the file is not parsed in Layer 2. All diagnostics across all layers are collected before returning.

Case directory
      │
      ▼
┌─────────────────────────────────────────────────┐
│  Layer 1 — Structural                           │
│  Does each required file exist on disk?         │
│  Records optional-file presence in FileManifest.│
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│  Layer 2 — Schema                               │
│  Parse JSON and Parquet. Check required fields, │
│  types, and value ranges. Collect schema errors.│
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│  Layer 3 — Referential integrity                │
│  All cross-entity ID references must resolve.   │
│  (e.g., hydro.bus_id must exist in buses list)  │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│  Layer 4 — Dimensional consistency              │
│  Optional per-entity files must cover every     │
│  entity that needs them. Load cross-validation  │
│  checks bus coverage when load stats present.   │
└────────────────────┬────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────┐
│  Layer 5 — Semantic                             │
│  Domain business rules: acyclic cascade,        │
│  penalty ordering, PAR stationarity, stage      │
│  count consistency, estimation prerequisites,   │
│  and other invariants.                          │
└────────────────────┬────────────────────────────┘
                     │
                     ▼ (all layers pass)
              Resolution + Assembly
              System construction
                     │
                     ▼
              Ok(System)

What each layer checks

Layer 1 (Structural): Verifies that the four root-level required files (config.json, penalties.json, stages.json, initial_conditions.json) and the four required entity files (system/buses.json, system/lines.json, system/hydros.json, system/thermals.json) exist. Optional files are noted in the FileManifest but their absence is not an error. The FileManifest is passed to Layer 2 so that optional-file parsers are only called when the files are present.
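
As a rough illustration of the Layer 1 check, the sketch below tests the eight required paths named above and returns all that are absent. The function name `missing_required_files` is hypothetical, and the real layer also records optional files in the `FileManifest`, which this sketch omits.

```rust
use std::path::{Path, PathBuf};

/// The eight files the documentation lists as required, relative to the case root.
const REQUIRED_FILES: [&str; 8] = [
    "config.json",
    "penalties.json",
    "stages.json",
    "initial_conditions.json",
    "system/buses.json",
    "system/lines.json",
    "system/hydros.json",
    "system/thermals.json",
];

/// Return every required file that is absent, rather than stopping at the first.
fn missing_required_files(case_dir: &Path) -> Vec<PathBuf> {
    REQUIRED_FILES
        .iter()
        .map(|rel| case_dir.join(rel))
        .filter(|path| !path.is_file())
        .collect()
}

fn main() {
    for path in missing_required_files(Path::new("path/to/my_case")) {
        eprintln!("missing required file: {}", path.display());
    }
}
```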

Layer 2 (Schema): Parses every file found by Layer 1. For JSON files, deserialization uses serde with strict field requirements: serde rejects a missing required field by default, and every input type applies #[serde(deny_unknown_fields)], so unrecognised keys also surface immediately as a hard parse error rather than being silently ignored. For Parquet files, column presence and data types are verified. Post-deserialization checks catch domain range violations (for example, negative capacity values) that serde cannot express. All parse and schema errors are collected by ValidationContext.

Layer 3 (Referential integrity): Checks all cross-entity foreign-key references. Examples: every hydro.bus_id must name a bus in the bus registry; every line.source_bus_id and line.target_bus_id must resolve; every pumping_station.source_hydro_id and destination_hydro_id must resolve; every bound override row’s entity ID must match a known entity. All broken references are collected before returning.

Layer 4 (Dimensional consistency): Verifies cross-file entity coverage. When scenarios/inflow_seasonal_stats.parquet is present, every hydro plant must have at least one row of statistics. When scenarios/inflow_ar_coefficients.parquet is present, the AR order must be consistent with the number of coefficient rows.

Load file cross-validation: When scenarios/load_seasonal_stats.parquet is present, every bus in the system must have a row for every study stage. A bus that is present in buses.json but missing from load_seasonal_stats.parquet for any stage produces a DimensionMismatch error. This ensures that the load model covers the full spatial and temporal extent of the case before any downstream model is built.

Other coverage checks ensure that optional per-entity Parquet files do not silently omit entities.
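
The load cross-validation rule amounts to a set-difference over (bus, stage) pairs. The following is a minimal sketch under assumptions: IDs are modelled as `i32`, the Parquet rows are reduced to their (bus_id, stage_id) keys, and `missing_load_coverage` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Return every (bus_id, stage_id) pair that has no row in the load statistics.
/// `rows` holds the (bus_id, stage_id) keys found in the Parquet file.
fn missing_load_coverage(
    bus_ids: &[i32],
    stage_ids: &[i32],
    rows: &[(i32, i32)],
) -> Vec<(i32, i32)> {
    let present: HashSet<(i32, i32)> = rows.iter().copied().collect();
    let mut missing = Vec::new();
    for &bus in bus_ids {
        for &stage in stage_ids {
            if !present.contains(&(bus, stage)) {
                missing.push((bus, stage));
            }
        }
    }
    missing
}

fn main() {
    let buses = [1, 2];
    let stages = [0, 1, 2];
    // Bus 2 has no row for stage 1.
    let rows = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 2)];
    for (bus, stage) in missing_load_coverage(&buses, &stages, &rows) {
        println!("bus {bus} has no load statistics for stage {stage}");
    }
}
```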

Layer 5 (Semantic): Enforces domain invariants that span multiple files or require reasoning about the system as a whole:

Acyclic cascade. The hydro downstream_id graph must be a directed forest (no cycles). A topological sort detects cycles.
Penalty ordering. Violation penalty tiers must be ordered: lower-tier penalties may not exceed upper-tier penalties for the same entity.
PAR model stationarity. Seasonal inflow statistics must satisfy the stationarity requirements of the PAR(p) model.
Stage count consistency. The number of stages must match across stages.json, scenario data, and any stage-varying Parquet files.
Estimation prerequisites. When the estimation path is active (see Estimation pipeline), three additional rules are enforced:
- season_definitions must be present in stages.json so that historical observations can be grouped by season for fitting.
- Every hydro plant in hydros.json must have at least one observation in inflow_history.parquet; hydros with no history cannot be estimated (BusinessRuleViolation).
- Each (hydro, season) group is checked for a minimum number of observations (configurable via estimation.min_observations_per_season); groups below the threshold produce a ModelQuality warning.
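
The acyclic-cascade rule can be checked with a topological sort. The sketch below uses Kahn's algorithm on (hydro_id, downstream_id) pairs; it is one possible way to detect cycles, not necessarily the one cobre-io uses. The `i32` ID type and the function name are assumptions.

```rust
use std::collections::HashMap;

/// Returns true when the downstream links contain no cycle.
/// Each entry is (hydro_id, optional downstream hydro_id).
fn cascade_is_acyclic(links: &[(i32, Option<i32>)]) -> bool {
    let downstream: HashMap<i32, Option<i32>> = links.iter().copied().collect();

    // In-degree = number of plants that discharge into this one.
    let mut in_degree: HashMap<i32, usize> =
        links.iter().map(|&(id, _)| (id, 0)).collect();
    for &(_, down) in links {
        if let Some(d) = down {
            *in_degree.entry(d).or_insert(0) += 1;
        }
    }

    // Kahn's algorithm: repeatedly remove plants with no remaining upstream plant.
    let mut ready: Vec<i32> = in_degree
        .iter()
        .filter(|&(_, &deg)| deg == 0)
        .map(|(&id, _)| id)
        .collect();
    let mut visited = 0;
    while let Some(id) = ready.pop() {
        visited += 1;
        if let Some(Some(d)) = downstream.get(&id) {
            let deg = in_degree.get_mut(d).expect("downstream ID was counted above");
            *deg -= 1;
            if *deg == 0 {
                ready.push(*d);
            }
        }
    }

    // Any plant never reached sits on a cycle or downstream of one.
    visited == in_degree.len()
}

fn main() {
    let chain = [(1, Some(2)), (2, Some(3)), (3, None)];
    let looped = [(1, Some(2)), (2, Some(1))];
    println!("chain acyclic: {}", cascade_is_acyclic(&chain));
    println!("loop acyclic: {}", cascade_is_acyclic(&looped));
}
```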

Estimation pipeline

When scenarios/inflow_history.parquet is present in the case directory and scenarios/inflow_seasonal_stats.parquet is absent, load_case activates the estimation path. In this mode, the seasonal statistics and AR coefficients required by the scenario model are derived automatically from the historical inflow series rather than being read from pre-computed Parquet files.

The trigger condition is checked after Layers 1 through 5 complete:

inflow_history.parquet present
    AND inflow_seasonal_stats.parquet absent
        → estimation path active

When the estimation path is inactive (explicit stats files are provided), inflow_history.parquet is loaded and stored on ScenarioData.inflow_history but does not influence model assembly. This allows downstream consumers to access the raw historical series without re-triggering estimation.
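
The trigger condition is a simple predicate over file presence. A minimal sketch, assuming hypothetical helper names (`estimation_path_active`, `estimation_active_for`) and a direct file-system check in place of the `FileManifest`:

```rust
use std::path::Path;

/// The documented rule: history present AND seasonal stats absent.
fn estimation_path_active(has_history: bool, has_seasonal_stats: bool) -> bool {
    has_history && !has_seasonal_stats
}

/// Apply the rule to a case directory by checking the two files on disk.
fn estimation_active_for(case_dir: &Path) -> bool {
    let scenarios = case_dir.join("scenarios");
    estimation_path_active(
        scenarios.join("inflow_history.parquet").is_file(),
        scenarios.join("inflow_seasonal_stats.parquet").is_file(),
    )
}

fn main() {
    let active = estimation_active_for(Path::new("path/to/my_case"));
    println!("estimation path active: {active}");
}
```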

Estimation configuration types

The config.json file accepts an optional "estimation" section that controls the fitting procedure. All fields have defaults and the section may be omitted entirely.

Field	Type	Default	Description
`max_order`	`u32`	`6`	Maximum autoregressive lag order considered during model selection
`order_selection`	`"pacf"` or `"pacf_annual"`	`"pacf"`	Criterion for selecting the AR order: PACF significance testing, optionally augmented with an annual component
`min_observations_per_season`	`u32`	`30`	Minimum observations required per `(entity, season)` group

The estimation configuration is accessible at config.estimation after parse_config. The min_observations_per_season threshold is used both during Layer 5 validation (to emit a ModelQuality warning for sparse groups) and during the fitting procedure itself (to skip groups below the threshold).
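
To make the defaults concrete, the sketch below mirrors the three documented fields in a plain Rust struct with a `Default` implementation. It is an illustration only: the type and enum names are assumed, and the crate's actual `EstimationConfig` definition (including its serde attributes) is not reproduced here.

```rust
/// AR order selection criterion, mirroring the two documented string values.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum OrderSelection {
    Pacf,
    PacfAnnual,
}

/// Illustrative mirror of the documented "estimation" fields.
#[derive(Debug, Clone, PartialEq)]
struct EstimationConfig {
    max_order: u32,
    order_selection: OrderSelection,
    min_observations_per_season: u32,
}

impl Default for EstimationConfig {
    /// Defaults taken from the table above.
    fn default() -> Self {
        Self {
            max_order: 6,
            order_selection: OrderSelection::Pacf,
            min_observations_per_season: 30,
        }
    }
}

fn main() {
    // Override one field and keep the documented defaults for the rest.
    let cfg = EstimationConfig { max_order: 4, ..EstimationConfig::default() };
    println!("{cfg:?}");
}
```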

Season map requirement

The estimation path groups historical observations by season in order to fit season-specific AR models. This requires the season_definitions field to be present in stages.json. If season_definitions is absent when estimation is active, Layer 5 emits a BusinessRuleViolation before fitting begins.

Penalty and bound resolution

After all five validation layers pass, load_case resolves the three-tier penalty and bound cascades into flat lookup tables embedded in the System.

Three-tier cascade

Penalty and bound values follow a three-tier precedence cascade:

Tier 1 — Global defaults (penalties.json)
    ↓ overridden by
Tier 2 — Entity-level overrides (system/*.json fields)
    ↓ overridden by
Tier 3 — Stage-varying overrides (constraints/penalty_overrides_*.parquet)

Tier-1 and tier-2 resolution happen during entity parsing (Layer 2). By the time the resolution step runs, each entity struct already holds its tier-2 resolved value in the relevant penalty or bound field.

The resolution step applies tier-3 stage-varying overrides from the optional Parquet files. For each (entity, stage) pair, the resolved value is:

The tier-3 override from the Parquet row, if a row exists for that pair.
Otherwise, the tier-2 value already stored in the entity struct.

Sparse expansion

Tier-3 overrides are stored sparsely: a Parquet row only needs to exist for stages where the override differs from the entity-level value. At load time, the resolution step expands this sparse representation into a dense [n_entities × n_stages] array so that every later solver lookup is O(1).
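
The expansion can be sketched as follows. This assumes a row-major layout and that entity and stage indices have already been validated as in range. `expand_overrides` and `lookup` are hypothetical names, not the crate's API.

```rust
/// Expand sparse tier-3 overrides into a dense, row-major [n_entities x n_stages] table.
/// `tier2[e]` is the entity-level value; each override is (entity_idx, stage_idx, value).
/// Indices are assumed to be in range (validation has already run).
fn expand_overrides(
    tier2: &[f64],
    n_stages: usize,
    overrides: &[(usize, usize, f64)],
) -> Vec<f64> {
    // Start from the tier-2 value repeated for every stage of each entity.
    let mut dense: Vec<f64> = tier2
        .iter()
        .flat_map(|&v| std::iter::repeat(v).take(n_stages))
        .collect();
    // Overwrite only the (entity, stage) cells that have a tier-3 row.
    for &(entity, stage, value) in overrides {
        dense[entity * n_stages + stage] = value;
    }
    dense
}

/// O(1) positional lookup: no hashing and no tier logic.
fn lookup(dense: &[f64], n_stages: usize, entity: usize, stage: usize) -> f64 {
    dense[entity * n_stages + stage]
}

fn main() {
    // Two entities, three stages, one override for entity 1 at stage 2.
    let dense = expand_overrides(&[10.0, 20.0], 3, &[(1, 2, 99.0)]);
    println!("entity 1, stage 2 -> {}", lookup(&dense, 3, 1, 2));
    println!("entity 1, stage 0 -> {}", lookup(&dense, 3, 1, 0));
}
```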

Result

Resolution produces two pre-resolved tables stored on System:

ResolvedPenalties — per-(entity, stage) penalty values for buses, hydros, lines, and non-controllable sources.
ResolvedBounds — per-(entity, stage) upper and lower bound values for hydros, thermals, lines, pumping stations, and energy contracts.

Both tables use dense flat arrays with positional entity indexing (entity position in the canonical ID-sorted slice becomes its array index).

`Config` struct

Config is the in-memory representation of config.json. Use parse_config to load it independently of load_case:

use cobre_io::config::parse_config;
use std::path::Path;

let cfg = parse_config(Path::new("my_case/config.json"))?;
println!("forward_passes = {:?}", cfg.training.forward_passes);

Config has seven sections:

Section	Type	Default	Purpose
`modeling`	`ModelingConfig`	`{}`	Inflow non-negativity treatment method and cost
`training`	`TrainingConfig`	(required)	Iteration count, stopping rules, cut selection
`upper_bound_evaluation`	`UpperBoundEvaluationConfig`	`{}`	Inner approximation upper-bound evaluation settings
`policy`	`PolicyConfig`	fresh mode	Policy directory path, warm-start / resume mode
`simulation`	`SimulationConfig`	disabled	Post-training simulation scenario count and output
`exports`	`ExportsConfig`	all on	Flags controlling which output files are written
`estimation`	`EstimationConfig`	`{}`	AR model fitting settings for history-based estimation

Mandatory fields

Two fields in training have no defaults and must be present in config.json. parse_config returns LoadError::SchemaError if either is absent:

training.forward_passes — number of scenario trajectories per iteration (integer, >= 1)
training.stopping_rules — list of stopping rule entries (must include at least one iteration_limit rule)

Stopping rules

The training.stopping_rules array accepts four rule types, identified by the "type" field:

Type	Required fields	Stops when
`iteration_limit`	`limit: u32`	Iteration count reaches `limit`
`time_limit`	`seconds: f64`	Wall-clock time exceeds `seconds`
`bound_stalling`	`iterations: u32`, `tolerance: f64`	Lower bound improvement falls below tolerance
`simulation`	`replications`, `period`, `bound_window`, `distance_tol`, `bound_tol`	Policy and bound have both stabilized

Multiple rules combine according to training.stopping_mode: "any" (default, OR semantics — stop when any rule triggers) or "all" (AND semantics — stop only when all rules trigger simultaneously).
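
The "any" and "all" semantics map directly onto iterator combinators. The sketch below models only two of the four rule types and uses hypothetical types (`StoppingRule`, `Progress`, `StoppingMode`) that are not taken from the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StoppingMode {
    Any,
    All,
}

/// Two of the four documented rule types, enough to show how rules combine.
enum StoppingRule {
    IterationLimit { limit: u32 },
    TimeLimit { seconds: f64 },
}

/// Hypothetical snapshot of training progress.
struct Progress {
    iteration: u32,
    elapsed_seconds: f64,
}

impl StoppingRule {
    fn triggered(&self, p: &Progress) -> bool {
        match *self {
            // Stops when the iteration count reaches the limit.
            StoppingRule::IterationLimit { limit } => p.iteration >= limit,
            // Stops when wall-clock time exceeds the budget.
            StoppingRule::TimeLimit { seconds } => p.elapsed_seconds > seconds,
        }
    }
}

/// Combine rules with OR ("any") or AND ("all") semantics.
/// The rule list is assumed non-empty, as the config requires an iteration_limit rule.
fn should_stop(rules: &[StoppingRule], mode: StoppingMode, p: &Progress) -> bool {
    match mode {
        StoppingMode::Any => rules.iter().any(|r| r.triggered(p)),
        StoppingMode::All => rules.iter().all(|r| r.triggered(p)),
    }
}

fn main() {
    let rules = [
        StoppingRule::IterationLimit { limit: 100 },
        StoppingRule::TimeLimit { seconds: 60.0 },
    ];
    let p = Progress { iteration: 100, elapsed_seconds: 10.0 };
    println!("any: {}", should_stop(&rules, StoppingMode::Any, &p));
    println!("all: {}", should_stop(&rules, StoppingMode::All, &p));
}
```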

Policy modes

The policy.mode field controls warm-start behavior:

Mode	Behavior
`"fresh"`	(default) Start from scratch; no policy files are read
`"warm_start"`	Load existing cuts and states from `policy.path` as a starting approximation
`"resume"`	Resume an interrupted run from the last checkpoint

When mode is "warm_start" or "resume", load_case also validates policy compatibility: the stored policy’s hydro count, stage count, cut dimension, and entity identity hash must match the current case. Mismatches return LoadError::PolicyIncompatible.
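
A compatibility check of this kind could look like the sketch below. It is written under assumptions: the `Fingerprint` and `Mismatch` types are hypothetical, and reporting only the first failing check is a choice made for the example, not documented behaviour.

```rust
/// Hypothetical summary of the properties compared between a stored policy and a case.
#[derive(Debug, Clone, PartialEq)]
struct Fingerprint {
    hydro_count: usize,
    stage_count: usize,
    cut_dimension: usize,
    entity_hash: u64,
}

/// Carries the same three pieces of information as `LoadError::PolicyIncompatible`.
#[derive(Debug, PartialEq)]
struct Mismatch {
    check: String,
    policy_value: String,
    system_value: String,
}

/// Report the first failing check (an assumption of this sketch).
fn check_policy(policy: &Fingerprint, system: &Fingerprint) -> Result<(), Mismatch> {
    let checks = [
        ("hydro count", policy.hydro_count.to_string(), system.hydro_count.to_string()),
        ("stage count", policy.stage_count.to_string(), system.stage_count.to_string()),
        ("cut dimension", policy.cut_dimension.to_string(), system.cut_dimension.to_string()),
        ("entity identity hash", policy.entity_hash.to_string(), system.entity_hash.to_string()),
    ];
    for (check, policy_value, system_value) in checks {
        if policy_value != system_value {
            return Err(Mismatch { check: check.to_string(), policy_value, system_value });
        }
    }
    Ok(())
}

fn main() {
    let system = Fingerprint { hydro_count: 4, stage_count: 12, cut_dimension: 4, entity_hash: 7 };
    let policy = Fingerprint { stage_count: 24, ..system.clone() };
    match check_policy(&policy, &system) {
        Ok(()) => println!("policy is compatible"),
        Err(m) => eprintln!("{} mismatch: policy {}, system {}", m.check, m.policy_value, m.system_value),
    }
}
```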

Error handling

All errors returned by load_case and its internal parsers are variants of LoadError:

`IoError`

I/O error reading {path}: {source}

Occurs when a required file exists in the file manifest but cannot be read from disk (file not found, permission denied, or other OS-level I/O failure). Fields: path: PathBuf (the file that failed) and source: std::io::Error (the underlying error).

When it occurs: Layer 1 or Layer 2, when std::fs::read_to_string or a Parquet reader returns an error for a required file.

`ParseError`

parse error in {path}: {message}

Occurs when a file is readable but its content is malformed — invalid JSON syntax, unexpected end of input, or an unreadable Parquet column header. Fields: path: PathBuf and message: String (description of the parse failure).

When it occurs: Layer 2, during initial deserialization of JSON or Parquet files before any field-level validation runs.

`SchemaError`

schema error in {path}, field {field}: {message}

Occurs when a file parses successfully but a field violates a schema constraint: a required field is missing, a value is outside its valid range, or an enum discriminator names an unknown variant. Fields: path: PathBuf, field: String (dot-separated path to the offending field, e.g., "hydros[3].bus_id"), and message: String.

When it occurs: Layer 2, during post-deserialization validation. Also returned by parse_config when training.forward_passes or training.stopping_rules is absent.

`CrossReferenceError`

cross-reference error: {source_entity} in {source_file} references
non-existent {target_entity} in {target_collection}

Occurs when an entity ID field references an entity that does not exist in the expected registry. Fields: source_file: PathBuf, source_entity: String (e.g., "Hydro 'H1'"), target_collection: String (e.g., "bus registry"), and target_entity: String (e.g., "BUS_99").

When it occurs: Layer 3 (referential integrity). All broken references across all entity types are collected before returning.

`ConstraintError`

constraint violation: {description}

A catch-all for collected validation errors from any validation layer, and for SystemBuilder::build() rejections. The description field contains all error messages joined by newlines, each prefixed with its [ErrorKind], source file, optional entity identifier, and message text.

When it occurs: After any validation layer collects one or more error-severity diagnostics, or when SystemBuilder::build() finds duplicate IDs or a cascade cycle in the final construction step.

`PolicyIncompatible`

policy incompatible: {check} mismatch — policy has {policy_value},
system has {system_value}

Occurs when a warm-start or resume policy file is structurally incompatible with the current case. The four compatibility checks are: hydro count, stage count, cut dimension, and entity identity hash. Fields: check: String (name of the failing check), policy_value: String, and system_value: String.

When it occurs: After all five validation layers pass, when policy.mode is "warm_start" or "resume" and the stored policy fails a compatibility check.
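
Callers will typically match on the variant to decide how to react. The sketch below uses a local stand-in enum that mirrors the variants and fields documented in this section; the real `LoadError` lives in cobre-io and may differ in detail. The `advice` function is hypothetical.

```rust
use std::path::PathBuf;

/// Local stand-in mirroring the documented variants and fields of `LoadError`.
#[allow(dead_code)]
#[derive(Debug)]
enum LoadError {
    IoError { path: PathBuf, source: std::io::Error },
    ParseError { path: PathBuf, message: String },
    SchemaError { path: PathBuf, field: String, message: String },
    CrossReferenceError {
        source_file: PathBuf,
        source_entity: String,
        target_collection: String,
        target_entity: String,
    },
    ConstraintError { description: String },
    PolicyIncompatible { check: String, policy_value: String, system_value: String },
}

/// Suggest a next step for each kind of failure.
fn advice(err: &LoadError) -> String {
    match err {
        LoadError::IoError { path, source } => {
            format!("check that {} is readable ({source})", path.display())
        }
        LoadError::ParseError { path, .. } => format!("fix the syntax of {}", path.display()),
        LoadError::SchemaError { path, field, .. } => {
            format!("fix field {field} in {}", path.display())
        }
        LoadError::CrossReferenceError { source_entity, target_entity, .. } => {
            format!("{source_entity} points at missing {target_entity}")
        }
        LoadError::ConstraintError { description } => {
            format!("fix the reported problems:\n{description}")
        }
        LoadError::PolicyIncompatible { check, .. } => {
            format!("policy does not match the case ({check})")
        }
    }
}

fn main() {
    let err = LoadError::SchemaError {
        path: PathBuf::from("system/hydros.json"),
        field: "hydros[3].bus_id".to_string(),
        message: "required field is missing".to_string(),
    };
    eprintln!("{}", advice(&err));
}
```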

Design notes

Collect-all validation. Unlike parsers that short-circuit on the first error, all five validation layers collect diagnostics into a shared ValidationContext before failing. When load_case returns a ConstraintError, the description field contains every problem found in a single report. This avoids the fix-one-error-re-run-repeat cycle on large cases.

File-format split. Entity identity data (IDs, names, topology, static parameters) lives in JSON. Time-varying and per-stage data (bounds, penalty overrides, statistical parameters, scenarios) lives in Parquet. JSON is easy to read and edit by hand; Parquet handles large numeric tables efficiently.

Resolution separates concerns. The three-tier cascade is resolved once at load time into dense arrays, not at every solver call. Downstream solver crates call system.penalties().hydro(entity_idx, stage_idx) and get an f64 with no branching, no hash lookups, and no tier logic. The complexity of the cascade is entirely contained in cobre-io.

Declaration-order invariance. All entity collections are sorted by ID before SystemBuilder::build() is called. Any System built from the same entities, regardless of the order they appear in the input files, produces a structurally identical result with identical pre-resolved tables.
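
Declaration-order invariance follows from sorting before construction. A minimal sketch, assuming a hypothetical `Bus` struct with an `i32` ID; the position in the sorted slice is what a dense table would use as the entity index.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Bus {
    id: i32,
    name: String,
}

/// Sort into canonical ID order so the declaration order in the input file does not matter.
fn canonicalize(mut buses: Vec<Bus>) -> Vec<Bus> {
    buses.sort_by_key(|b| b.id);
    buses
}

/// Position in the canonical slice, usable as a dense-table index.
/// Requires `buses` to be sorted by ID, as returned by `canonicalize`.
fn position_of(buses: &[Bus], id: i32) -> Option<usize> {
    buses.binary_search_by_key(&id, |b| b.id).ok()
}

fn main() {
    let north = Bus { id: 2, name: "north".to_string() };
    let south = Bus { id: 1, name: "south".to_string() };

    // Two input files listing the same buses in different orders.
    let a = canonicalize(vec![north.clone(), south.clone()]);
    let b = canonicalize(vec![south, north]);

    println!("identical after sorting: {}", a == b);
    println!("index of bus 2: {:?}", position_of(&a, 2));
}
```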

Estimation as a loading mode. The estimation path is triggered by the presence of inflow_history.parquet combined with the absence of inflow_seasonal_stats.parquet. This design allows callers to switch between the explicit-stats path (provide pre-computed files) and the estimation path (provide raw history) without any code changes — only the files present in the case directory determine which path runs.
