Understanding Results

After cobre run completes, the output directory contains three categories of artifacts: training convergence data, a saved policy checkpoint, and simulation dispatch results. This page explains how to read each category and how to query the results programmatically using cobre report.

If you have not yet run the quickstart, complete Quickstart first — this page references the my_first_study/results/ directory produced by that walkthrough.

The Post-Run Summary

When cobre run finishes, it prints a summary block to stderr. The 1dtoy run from the quickstart produces output similar to:

Training complete in 0.5s (128 iterations, iteration_limit)
  Lower bound:  1.55955e7 $/stage
  Upper bound:  5.79592e5 +/- 0.00000e0 $/stage
  Gap:          -2590.8% (started at 70.5%)
  Policy rows:  384 active / 384 generated
  LP solves:    5632 (5632 first-try, 0 retried, 0 failed)

Simulation complete in 0.6s (100 scenarios)
  Completed: 100  Failed: 0

Output written to my_first_study/results/

Exact numerical values vary across runs because scenario sampling is stochastic. The values shown on this page are representative of the 1dtoy example; your run will differ slightly.

Line	What it means
`Training complete in 0.5s (128 iterations, iteration_limit)`	Training ran for 128 iterations (the limit set in `config.json`) and stopped because the iteration limit was reached, not because a convergence criterion was met.
`Lower bound: 1.55955e7 $/stage`	The optimizer’s best proven lower bound on the minimum expected cost per stage. As training progresses this value rises and stabilizes.
`Upper bound: 5.79592e5 +/- 0.00000e0 $/stage`	A statistical estimate of the true expected cost, computed from the forward-pass scenarios in the final iteration. The `+/-` term is the standard deviation across those scenarios. With `forward_passes: 1` this is a single-scenario estimate, so the reported standard deviation is zero even though the estimate itself varies widely from run to run.
`Gap: -2590.8% (started at 70.5%)`	The relative distance between the lower and upper bounds expressed as a percentage. The large negative value is expected with `forward_passes: 1`: a single forward-pass scenario is a noisy upper-bound estimate that can land far below the lower bound. Increasing `forward_passes` produces a stable, well-behaved gap.
`Policy rows: 384 active / 384 generated`	The total number of optimality cut rows in the policy pool. All 384 are currently active; none were deactivated (the 1dtoy config does not enable cut selection).
`LP solves: 5632 (5632 first-try, 0 retried, 0 failed)`	Total number of linear programs solved across all stages and iterations, with a breakdown by outcome.
`Simulation complete in 0.6s (100 scenarios)`	The post-training simulation evaluated the trained policy over 100 independently sampled scenarios.
`Completed: 100 Failed: 0`	All 100 scenarios completed without solver errors.
`Output written to my_first_study/results/`	Root path of the output directory.

Lower bound vs. upper bound. The lower bound is the best proven bound from below on the minimum achievable expected cost. The upper bound is the average cost observed when running the current policy over sampled scenarios, so it is a statistical estimate rather than a proven bound. When the gap is small, the policy is near-optimal. When the gap is large, running more iterations will typically narrow it.
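The 1dtoy figures are consistent with a gap computed as (upper - lower) / |upper|. That formula is inferred from the printed numbers, not taken from the cobre documentation, so the sketch below is a sanity check rather than the definition. `relative_gap_percent` is a hypothetical helper name.

```python
def relative_gap_percent(lower_bound: float, upper_bound: float) -> float:
    """Gap as a percentage of the upper bound.

    Assumed formula, inferred from the 1dtoy output; not confirmed
    against the cobre source.
    """
    if upper_bound == 0.0:
        raise ValueError("upper bound is zero; relative gap is undefined")
    return 100.0 * (upper_bound - lower_bound) / abs(upper_bound)


# Full-precision bounds from the 1dtoy example on this page.
lower = 15595518.381798675
upper = 579592.1986224408

gap = relative_gap_percent(lower, upper)
print(f"{gap:.1f}%")
```

Under this assumption the gap is negative whenever the sampled upper-bound estimate falls below the lower bound, which matches the behavior described for `forward_passes: 1`.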

Termination reasons. The parenthetical after the iteration count explains why training stopped:

iteration_limit — the maximum iteration count was reached (the 1dtoy default).
converged at iter N — a convergence criterion was met at iteration N and training stopped early. This appears when you configure a bound_stalling or similar rule in config.json.

Theory reference: For the mathematical definition of lower and upper bounds, optimality gap, and stopping criteria, see Convergence in the methodology reference.

Output Directory Structure

All artifacts are written under the results directory you specified with --output. The 1dtoy run produces:

my_first_study/results/
  training/
    metadata.json           Run metadata: configuration, convergence, row-pool, bounds, solve stats
    convergence.parquet     Per-iteration convergence metrics (lower bound, upper bound, gap)
    dictionaries/
      codes.json            Integer-to-string code mappings for entity categories
      state_dictionary.json State variable definitions and units
      entities.csv          Entity registry (id, name, type)
      variables.csv         LP variable registry
      bounds.parquet        LP variable bound definitions
    timing/
      iterations.parquet    Per-iteration wall-clock timing broken down by phase
  policy/
    cuts/
      stage_000.bin         FlatBuffers-encoded optimality cuts for stage 0
      stage_001.bin         ... stage 1
      stage_002.bin         ... stage 2
      stage_003.bin         ... stage 3
    basis/
      stage_000.bin         LP basis checkpoints for warm-starting
      stage_001.bin
      stage_002.bin
      stage_003.bin
    metadata.json           Policy metadata: stage count, cut counts per stage
  simulation/
    metadata.json           Run metadata: scenario counts, cost statistics, solve stats
    buses/
      scenario_id=0000/data.parquet
      scenario_id=0001/data.parquet
      ...                   One partition per scenario
    costs/
      scenario_id=0000/data.parquet
      ...
    hydros/
      scenario_id=0000/data.parquet
      ...
    thermals/
      scenario_id=0000/data.parquet
      ...
    inflow_lags/            Inflow lag state data used to initialize scenario chains

The three top-level subdirectories have distinct roles:

training/ — everything produced during the training loop: convergence history, timing, and the dictionaries needed to interpret LP variable indices.
policy/ — the trained policy checkpoint. These binary files encode the optimality cuts built during training. They can be used to resume or extend a study.
simulation/ — the dispatch results from evaluating the trained policy over 100 simulation scenarios.
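Before post-processing, it can be useful to confirm that a results directory contains the files listed in the tree above. The following is a minimal sketch using only the Python standard library. `missing_artifacts` and `EXPECTED_FILES` are hypothetical names, and the list covers only a few of the files shown in the tree.

```python
from pathlib import Path

# Relative paths taken from the directory tree above; extend as needed.
EXPECTED_FILES = (
    "training/metadata.json",
    "training/convergence.parquet",
    "policy/metadata.json",
    "simulation/metadata.json",
)


def missing_artifacts(results_dir: str) -> list[str]:
    """Return the expected files that are absent under results_dir."""
    root = Path(results_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For a run that completed both training and simulation, `missing_artifacts("my_first_study/results")` would be expected to return an empty list.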

Training Results

Reading `training/metadata.json`

The training metadata file is the canonical record of what happened during training. The 1dtoy run produces:

{
  "cobre_version": "0.9.1",
  "hostname": "<hostname>",
  "solver": "highs",
  "solver_version": "<solver version>",
  "started_at": "<timestamp>",
  "completed_at": "<timestamp>",
  "duration_seconds": 0.15,
  "status": "complete",
  "configuration": {
    "seed": null,
    "max_iterations": 128,
    "forward_passes": 1,
    "stopping_mode": "any",
    "policy_mode": "fresh"
  },
  "problem_dimensions": {
    "num_stages": 4,
    "num_hydros": 1,
    "num_thermals": 2,
    "num_buses": 1,
    "num_lines": 0
  },
  "iterations": {
    "completed": 128,
    "converged_at": null
  },
  "convergence": {
    "achieved": false,
    "final_gap_percent": -2590.77437875556,
    "termination_reason": "iteration_limit"
  },
  "row_pool": {
    "total_generated": 384,
    "total_active": 384,
    "peak_active": 384,
    "cuts_active": 384,
    "rows_in_lp_total": 0,
    "rows_in_lp_solve_count": 0,
    "rows_in_lp_max": 0
  },
  "bounds": {
    "final_lower_bound": 15595518.381798675,
    "final_upper_bound": 579592.1986224408,
    "final_upper_bound_std": 0.0
  },
  "solve_stats": {
    "total_lp_solves": 5632,
    "first_try": 5632,
    "retried": 0,
    "failed": 0,
    "forward_solve_seconds": 0.016,
    "backward_solve_seconds": 0.079,
    "parallelism": 1
  },
  "distribution": {
    "backend": "local",
    "world_size": 1,
    "ranks_participated": 1,
    "num_nodes": 1,
    "threads_per_rank": 1,
    "hosts": [{ "hostname": "<hostname>", "ranks": [0] }]
  }
}

The key fields are:

Field	Meaning
`cobre_version`	The cobre binary version that produced this output. Useful for auditing results from different releases.
`solver`	LP backend used: `"highs"` or `"clp"`.
`status`	`"complete"` when the training run finished normally.
`iterations.completed`	Number of training iterations that were executed.
`iterations.converged_at`	If training stopped early due to a convergence criterion, the iteration number where it stopped. `null` for an iteration-limit stop.
`convergence.achieved`	`true` if a convergence stopping rule was satisfied, `false` if the iteration limit was reached first.
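`convergence.final_gap_percent`	The gap between lower and upper bounds at the end of training, as a percentage. A large positive value indicates the bounds have not tightened sufficiently; a negative value (as in the 1dtoy case) means the sampled upper-bound estimate fell below the lower bound, which is typical when `forward_passes` is small.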
`convergence.termination_reason`	Machine-readable reason for stopping. Common values: `"iteration_limit"`, `"bound_stalling"`.
`row_pool.total_generated`	Total optimality cut rows created across all stages over the entire training run.
`row_pool.total_active`	Cut rows still active in the pool at the end of training.
`row_pool.peak_active`	Highest number of simultaneously active cut rows observed during training.
`row_pool.cuts_active`	Cut rows currently active in the LP at termination.
`row_pool.rows_in_lp_total`	Sum of resident rows-in-LP over every lazy-selection solve. Zero when no lazy selection ran.
`row_pool.rows_in_lp_solve_count`	Number of lazy-selection solves in the run. Zero when no lazy selection ran.
`row_pool.rows_in_lp_max`	Largest resident rows-in-LP over any single lazy-selection solve. Zero when no lazy selection ran.
`bounds.final_lower_bound`	Final proven lower bound on the minimum expected cost at termination.
`bounds.final_upper_bound`	Final upper bound estimate at termination. `null` when upper-bound evaluation is disabled.
`distribution.backend`	Communication backend: `"local"` for single-process, `"mpi"` for distributed runs.
`distribution.world_size`	Number of processes involved in the run. `1` for single-process runs.
`distribution.threads_per_rank`	Number of rayon worker threads per process.

What “converged” means in practice. In a converged run (convergence.achieved: true), a stopping rule determined that continuing would not meaningfully improve the policy. The 1dtoy case hits its 128-iteration budget before a convergence rule fires, so achieved is false. For larger studies, configure a bound_stalling or gap_threshold stopping rule in config.json to stop automatically when the gap stabilizes.
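The fields described above can be checked programmatically. The sketch below reads `training/metadata.json` and returns a list of warnings. The helper names are hypothetical, and the check that the three solve counters add up to `total_lp_solves` is an assumption based on the 1dtoy numbers (5632 = 5632 + 0 + 0).

```python
import json
from pathlib import Path


def load_training_metadata(results_dir: str) -> dict:
    """Read training/metadata.json from a results directory."""
    path = Path(results_dir) / "training" / "metadata.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def training_issues(meta: dict) -> list[str]:
    """Return human-readable warnings derived from the metadata fields."""
    issues = []
    if meta["status"] != "complete":
        issues.append(f"status is {meta['status']!r}")
    conv = meta["convergence"]
    if not conv["achieved"]:
        issues.append(f"no convergence rule fired ({conv['termination_reason']})")
    stats = meta["solve_stats"]
    if stats["failed"] > 0:
        issues.append(f"{stats['failed']} LP solves failed")
    # Assumption: first_try + retried + failed partition total_lp_solves.
    outcomes = stats["first_try"] + stats["retried"] + stats["failed"]
    if outcomes != stats["total_lp_solves"]:
        issues.append("LP solve counters do not add up")
    return issues


# Subset of the 1dtoy metadata shown above.
sample = {
    "status": "complete",
    "convergence": {
        "achieved": False,
        "final_gap_percent": -2590.77437875556,
        "termination_reason": "iteration_limit",
    },
    "solve_stats": {
        "total_lp_solves": 5632,
        "first_try": 5632,
        "retried": 0,
        "failed": 0,
    },
}
print(training_issues(sample))
```

For the 1dtoy sample, the only warning is that no convergence rule fired, which is the expected outcome of an iteration-limit stop.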

Simulation Results

Hive-Partitioned Layout

The simulation output uses Hive partitioning: results are split into one data.parquet file per scenario, stored in a directory named scenario_id=NNNN/. This layout is natively understood by Polars, Pandas (via PyArrow), R’s arrow package, and DuckDB — they can read the entire simulation/costs/ directory as a single table and filter by scenario_id at the storage layer without loading all data into memory.
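To make the naming convention concrete, the sketch below extracts `key=value` components from a partition path using only the standard library. The readers listed above do this automatically, so `parse_hive_partitions` is a hypothetical helper shown for illustration, and the scenario number in the example path is arbitrary.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Extract key=value directory components from a Hive-style path."""
    partitions = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


path = "my_first_study/results/simulation/costs/scenario_id=0042/data.parquet"
keys = parse_hive_partitions(path)
scenario_id = int(keys["scenario_id"])  # zero-padded text converted to an integer
```

Because the partition value is stored in the directory name, a reader can skip whole files when filtering on `scenario_id` without opening them.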

The four main result categories are:

Directory	Contents
`buses/`	Power balance results: load, generation injections, deficit, and excess at each bus per stage and block.
`hydros/`	Hydro dispatch: turbined flow, spillage, reservoir storage levels, inflows, and generation per plant per stage and block.
`thermals/`	Thermal dispatch: generation output per unit per cost segment per stage and block.
`costs/`	Objective cost breakdown: total cost, thermal cost, hydro cost, penalty cost, and discount factor per stage.

Results are in Parquet format. To read them, use any columnar data tool:

# Polars — reads all 100 scenarios at once
import polars as pl
df = pl.read_parquet("my_first_study/results/simulation/costs/")
print(df.head())

# Pandas + PyArrow
import pandas as pd
df = pd.read_parquet("my_first_study/results/simulation/costs/")
print(df.head())

-- DuckDB — filter to a single scenario
SELECT * FROM read_parquet('my_first_study/results/simulation/costs/**/*.parquet')
WHERE scenario_id = 0;

# R with arrow
library(arrow)
ds <- open_dataset("my_first_study/results/simulation/costs/")
dplyr::collect(dplyr::filter(ds, scenario_id == 0))

Querying Results with `cobre report`

cobre report reads the JSON metadata files and prints a structured JSON summary to stdout. Use it with jq to extract specific metrics in scripts or CI pipelines.

# Print the full report
cobre report my_first_study/results

The output has this top-level shape:

{
  "output_directory": "/abs/path/to/results",
  "status": "complete",
  "bounds": { "final_lower_bound": ..., "final_upper_bound": ... },
  "training": { "iterations": {}, "convergence": {}, "row_pool": {}, "bounds": {}, "configuration": {}, "problem_dimensions": {} },
  "cost": { "mean_cost": ..., "std_cost": ... } | null,
  "simulation": { "scenarios": {}, "cost": {} } | null
}

Practical `jq` queries

# Extract the final convergence gap
cobre report my_first_study/results | jq '.training.convergence.final_gap_percent'

# Check how many iterations ran
cobre report my_first_study/results | jq '.training.iterations.completed'

# Check simulation scenario counts
cobre report my_first_study/results | jq '.simulation.scenarios'

# Use the status in a CI script: exit non-zero if training failed
status=$(cobre report my_first_study/results | jq -r '.status')
if [ "$status" != "complete" ]; then
  echo "Run did not complete successfully: $status" >&2
  exit 1
fi

# Check convergence was achieved (returns true or false)
cobre report my_first_study/results | jq '.training.convergence.achieved'
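The same checks can be written in Python when `jq` is not available. This is a minimal sketch: it assumes only the `cobre report <dir>` invocation and the JSON fields shown above, and the helper names `load_report` and `report_ok` are hypothetical.

```python
import json
import subprocess


def load_report(results_dir: str) -> dict:
    """Run `cobre report <results_dir>` and parse the JSON printed to stdout."""
    proc = subprocess.run(
        ["cobre", "report", results_dir],
        check=True,  # raise CalledProcessError if the command exits non-zero
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def report_ok(report: dict, require_convergence: bool = False) -> bool:
    """Mirror the jq checks above: status must be complete, convergence optional."""
    if report.get("status") != "complete":
        return False
    if require_convergence:
        return bool(report["training"]["convergence"]["achieved"])
    return True
```

In a CI script, `report_ok(load_report("my_first_study/results"))` can gate later steps. Pass `require_convergence=True` only for studies that configure a convergence stopping rule, since an iteration-limit run such as 1dtoy reports `achieved: false`.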

For the complete cobre report documentation and all available JSON fields, see CLI Reference.

For a detailed description of every field in every output file, see Output Format Reference.
