Parser Structure and Inputs
This page explains how PowerTableDataParser.jl turns a directory of spreadsheet-style inputs into a PowerSystemTableData object. It is meant to be read once to build a mental model of the pieces involved; the Parse Tabular Data from .csv Files how-to covers the step-by-step mechanics.
The four kinds of input
Building a PowerSystemTableData requires four kinds of input:
- CSV files — one per component category (
bus.csv,gen.csv, ...). These carry the actual numerical and string data for each component. - A user descriptor YAML (
user_descriptors.yaml) — tells the parser how your column names and units map onto the PowerSystems-standard field names and per-unit conventions. - A generator mapping YAML (
generator_mapping.yaml) — tells the parser which concreteGeneratorsubtype to instantiate for each(fuel, unit_type)pair ingen.csv. - A time-series pointer file (
timeseries_pointers.jsonor.csv, optional) — associates components with time-series data on disk.
In addition, the package ships a built-in default descriptor JSON (power_system_inputs.json) that defines the PowerSystems-standard schema. You almost never edit this file directly: the user descriptor YAML overrides entries in it.
The parsing flow
At a high level, construction of PowerSystemTableData(directory, base_power, user_descriptor_file; ...) proceeds in four phases:
┌─────────────────┐ ┌──────────────────┐ ┌──────────────────────┐
│ directory/ │ │ user_descriptors │ │ generator_mapping │
│ ├─ bus.csv │ │ .yaml │ │ .yaml │
│ ├─ branch.csv │ │ (your column │ │ (fuel, type) → │
│ ├─ gen.csv │ │ names / units) │ │ Generator subtype │
│ └─ ... │ └──────────────────┘ └──────────────────────┘
└────────┬────────┘ │ │
│ (1) discover & │ │
│ read CSVs │ │
▼ │ │
Dict{String, DataFrame} │ │
│ │ │
│ (2) attach │ │
│ descriptors ◄─────┘ │
│ │
│ (3) load generator mapping ◄───────────────────┘
▼
┌─────────────────────────────────────────────────────────────────────┐
│ PowerSystemTableData │
│ base_power | category_to_df | user_descriptors | descriptors | │
│ generator_mapping | timeseries_metadata_file | directory │
└─────────────────────────────────────────────────────────────────────┘
│
│ (4) later: PowerSystems.System(data; ...)
▼
System1. Discover and read the CSV files
The constructor walks directory with two regular expressions:
^[A-Za-z]+$— treats a plain alphabetic sub-directory as a grouping folder and recurses into it for its own*.csvfiles.(.*?)\.csv— matches CSV files indirectoryitself.
Each matching file is read with CSV.File into a DataFrames.DataFrame and stored under its stem name (bus.csv becomes key "bus"). The raw layout is a Dict{String, Any} whose keys are either a top-level category name ("bus", "gen", ...) or a sub-folder name mapping to a nested dict.
If no CSVs are found, the constructor errors. If bus.csv is missing, the next phase raises DataFormatError("key 'bus' not found in input data"): bus data is the only required input.
2. Associate DataFrames with categories
The dictionary of DataFrames is re-keyed by InputCategory enum values (BUS, BRANCH, GEN, LOAD, DC_BRANCH, RESERVE, STORAGE). The mapping from CSV name to enum is the INPUT_CATEGORY_NAMES constant in src/power_system_table_data.jl. Categories that are not present in the directory are simply omitted from category_to_df (so, for example, a system with no storage just has no STORAGE entry).
At the same time, the two descriptor inputs are loaded:
- The default descriptor JSON (always loaded from the package) defines the canonical field names, default values, expected units, and per-unit conventions.
- The user descriptor YAML is loaded and re-keyed by uppercased category symbol (
:BUS,:GENERATOR, ...). The special case"reserves"is renamed to"reserve"to match the enum.
The user descriptors effectively overlay the defaults: for each PowerSystems field the user YAML can provide a custom_name (the column name as it appears in your CSV), a unit, a unit_system, or a base_reference. Fields left blank in the user YAML fall back to the default descriptor's entry.
3. Load the generator mapping
get_generator_mapping(generator_mapping_file) reads generator_mapping.yaml and returns a Dict{NamedTuple, String} where each key is (fuel = "...", unit_type = "...") and the value is the name of a concrete Generator subtype (e.g. ThermalStandard, RenewableDispatch). A bundled default lives at src/generator_mapping_cdm.yaml.
Duplicate (fuel, type) entries in the YAML raise an error — the mapping must be unambiguous. Legacy GenericBattery entries are silently translated to EnergyReservoirStorage with a warning.
4. Resolve the time-series pointer file
The constructor accepts a timeseries_metadata_file keyword (default joinpath(directory, "timeseries_pointers")). If that path does not exist as given, the parser tries appending .json and then .csv; if neither exists, the field is set to nothing and no time series will be attached when the system is built. This makes time-series data strictly opt-in.
The pointer file itself is not parsed here — it is stored on the PowerSystemTableData and consumed later by PowerSystems.jl when you call System(data; ...).
How the YAML files are used
The two YAML inputs play very different roles:
user_descriptors.yaml — schema translation
This file answers the question: "What do my CSV columns mean in PowerSystems terms?" A minimal snippet for the generator category might look like:
generator:
- name: name
custom_name: GEN UID
- name: bus_id
custom_name: Bus ID
- name: fuel
custom_name: Fuel
- name: active_power
custom_name: MW
unit: MW
unit_system: natural_unitsThe name field is the PowerSystems-standard name (must exist in the default descriptor JSON); custom_name is the column header in your CSV; unit and unit_system declare the physical units and per-unit convention of your data. When PowerSystems.jl later iterates over rows of this category, the parser uses these mappings to:
- pull the right column from the DataFrame, even if you renamed it;
- convert your values into the per-unit convention PowerSystems expects (system base, device base, or natural units);
- apply simple unit conversions (
GW↔MW,radian↔degree, ...).
Entries you do not override are taken verbatim from the default descriptor JSON, so you only need to list the columns you actually want to customize.
generator_mapping.yaml — type dispatch
This file answers the question: "Given a fuel and type in gen.csv, which concrete Generator subtype should I construct?" The bundled default looks like:
HydroDispatch:
- {fuel: HYDRO, type: ROR}
RenewableDispatch:
- {fuel: SOLAR, type: PV}
- {fuel: WIND, type: WIND}
ThermalStandard:
- {fuel: COAL, type: null}
- {fuel: NG, type: null}
- {fuel: NUCLEAR, type: null}A null type means "match any type for this fuel". At parse time the parser looks up (fuel, unit_type) in this dictionary and creates the corresponding struct. Any row whose (fuel, type) pair is not in the mapping will raise an error — if you introduce a new technology, you extend this YAML.
Supported CSV categories
The package currently recognizes the categories listed in INPUT_CATEGORY_NAMES:
| CSV file | Category enum | Required? |
|---|---|---|
bus.csv | InputCategory.BUS | yes |
branch.csv | InputCategory.BRANCH | no |
dc_branch.csv | InputCategory.DC_BRANCH | no |
gen.csv | InputCategory.GENERATOR | no |
load.csv | InputCategory.LOAD | no |
reserves.csv | InputCategory.RESERVE | no |
storage.csv | InputCategory.STORAGE | no |
Two conventions on bus.csv are worth calling out explicitly:
- Columns named
areaandzoneare promoted intoAreaandLoadZoneobjects automatically. - Columns named
max_active_power/max_reactive_powercreatePowerLoadobjects for nonzero rows and contribute to thepeak_active_power/peak_reactive_powerof the correspondingLoadZone.
Other enum values (SIMULATION_OBJECTS, FACTS, DCBRTYPE, DCBRSTATUS, TICT) exist in the schema for forward compatibility with PowerSystems, but there is no corresponding top-level CSV file name wired up in INPUT_CATEGORY_NAMES today.
The PowerSystemTableData object
After construction, PowerSystemTableData is simply a container holding the pieces above:
| Field | What it holds |
|---|---|
base_power | System base MVA (defaults to 100.0 if not provided) |
category_to_df | Dict{Symbol, DataFrame}, keyed by InputCategory symbol |
user_descriptors | Parsed user_descriptors.yaml (keyed by uppercased symbol) |
descriptors | Parsed default power_system_inputs.json |
generator_mapping | Dict{NamedTuple, String} from (fuel, type) to subtype |
timeseries_metadata_file | Absolute path to pointer file, or nothing |
directory | Original input directory (used to resolve relative paths) |
Nothing in this object depends on PowerSystems.jl; the handoff happens only when you pass data to PowerSystems.System(data; ...), which iterates the DataFrames row-by-row using the descriptors and mapping to build concrete system components.
Deprecation notice
The tabular parser — originally part of PowerSystems.jl's CDM subsystem — is in long-term maintenance mode. PowerSystems.jl is moving toward a database-backed data layer; users starting a new dataset are encouraged to write a small custom Julia importer rather than depend on this parser. This package exists to give existing CDM-based workflows a stable home while that transition plays out.