Parse Tabular Data from .csv Files
This guide walks through parsing a directory of CSV files into a PowerSystemTableData object and handing it to PowerSystems.jl to build a System. For the conceptual background on how the parser combines CSV, YAML, and time-series inputs, see Parser Structure and Inputs.
Minimal usage
If your CSV files already follow the PowerSystems-standard column names and units, a minimal invocation looks like this:
```julia
using PowerTableDataParser
using PowerSystems

data_dir = "/data/my-data-dir"
base_power = 100.0
descriptors = joinpath(data_dir, "user_descriptors.yaml")

data = PowerSystemTableData(data_dir, base_power, descriptors)
sys = System(data; time_series_in_memory = true)
```

This call will:

- Discover and read every `*.csv` in `data_dir` (and one level of sub-directories).
- Require that `bus.csv` exists; other category files are optional.
- Load `user_descriptors.yaml` and merge it with the built-in default descriptor `power_system_inputs.json`.
- Load the bundled default `generator_mapping_cdm.yaml` to resolve `(fuel, type)` pairs in `gen.csv` to concrete `Generator` subtypes.
- Look for a `timeseries_pointers.json` (or `.csv`) in `data_dir`; if found, attach it so `System` can load time series.
Full usage with overrides
To override both the generator mapping and the time-series pointer file:
```julia
data_dir = "/data/my-data-dir"
base_power = 100.0
descriptors = "./user_descriptors.yaml"
timeseries_metadata_file = "./timeseries_pointers.json"
generator_mapping_file = "./generator_mapping.yaml"

data = PowerSystemTableData(
    data_dir,
    base_power,
    descriptors;
    generator_mapping_file = generator_mapping_file,
    timeseries_metadata_file = timeseries_metadata_file,
)
sys = System(data; time_series_in_memory = true)
```

Example configuration files can be found in the RTS-GMLC repository.
Supported categories
Components for each category must be defined in their own CSV file. The following categories are currently supported:
- `bus.csv` (required)
  - Columns named `area` and `zone` create a corresponding set of `Area` and `LoadZone` objects.
  - Columns named `max_active_power` or `max_reactive_power` create `PowerLoad` objects when nonzero values are encountered, and contribute to the `peak_active_power` / `peak_reactive_power` of the corresponding `LoadZone`.
- `branch.csv`
- `dc_branch.csv`
- `gen.csv`
- `load.csv`
- `reserves.csv`
- `storage.csv`
All of these files must reside in the directory passed to `PowerSystemTableData`, or within a single level of sub-directories inside it.
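A hypothetical layout that satisfies these rules (only `bus.csv` is strictly required; the other file names are placeholders for optional inputs):

```
my-data-dir/
├── bus.csv                   # required
├── branch.csv
├── gen.csv
├── load.csv
├── user_descriptors.yaml
└── timeseries_pointers.json
```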
CSV data configurations
Custom construction of generators
PowerTableDataParser constructs concrete subtypes of Generator based on the fuel and type columns in gen.csv and the generator_mapping_file. The default file is src/generator_mapping_cdm.yaml; override it by passing your own via the generator_mapping_file keyword.
Each top-level key in the YAML is a target Generator subtype, and its value is a list of (fuel, type) pairs that should map to that subtype:
```yaml
ThermalStandard:
  - {fuel: COAL, type: null}
  - {fuel: NG, type: null}
RenewableDispatch:
  - {fuel: SOLAR, type: PV}
  - {fuel: WIND, type: WIND}
```

A `null` type acts as a wildcard for any `type` value with that fuel. Duplicate `(fuel, type)` entries raise an error.
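As an illustration (the column layout is abbreviated and the names are placeholders), `gen.csv` rows like the following would resolve under the mapping above:

```
name,bus,fuel,type
gen-101,101,NG,CT
solar-309,309,SOLAR,PV
```

`gen-101` matches the `{fuel: NG, type: null}` wildcard and becomes a `ThermalStandard`; `solar-309` becomes a `RenewableDispatch`.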
Column names
PowerTableDataParser provides an input-mapping layer so you can keep your own column names. For example, when parsing raw data for a generator the parser expects a column called name. If the raw data instead defines that column as GEN UID, set the custom_name field under the generator category in your user_descriptors.yaml:
```yaml
generator:
  - name: name
    custom_name: GEN UID
```

To build a complete `user_descriptors.yaml` from scratch, start from the defaults defined in `src/power_system_inputs.json` and copy the entries you need, adding `custom_name`, `unit`, or `unit_system` overrides as appropriate. The user-specific customizations are intentionally kept in YAML rather than JSON to make them easier to edit by hand. Do not edit the default JSON file.
Per-unit conversion
PowerTableDataParser defines whether it expects a column value to be per-unit system base, per-unit device base, or in natural units via the unit_system field in power_system_inputs.json. If it expects a per-unit convention that differs from your values, set unit_system in user_descriptors.yaml and the parser will automatically convert the values.
For example, if you have a max_active_power column stored in natural units (MW) but power_system_inputs.json specifies unit_system: device_base, add unit_system: natural_units in user_descriptors.yaml and the parser will divide the value by the entry in the column identified by the base_reference field in power_system_inputs.json. You can also override base_reference by adding base_reference: My Column to make the device-base per-unit conversion divide by My Column instead. System-base per-unit conversions always divide by the base_power passed to the PowerSystemTableData constructor.
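A sketch of that override in `user_descriptors.yaml`, assuming the column belongs to the generator category and that the entry and column names shown here exist in your data (check `power_system_inputs.json` for the actual defaults):

```yaml
generator:
  - name: max_active_power
    custom_name: Max MW           # raw column holds MW, not per-unit values
    unit_system: natural_units    # override the per-unit expectation from the JSON defaults
    base_reference: Rating MVA    # optional: divide by this column instead of the default base_reference
```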
Unit conversion
The parser supports a limited set of unit conversions. For example, if power_system_inputs.json indicates a value's unit is degree but your values are in radians, set unit: radian in your YAML file. Other valid unit entries include GW, GWh, MW, MWh, kW, and kWh.
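A similar sketch for the degree/radian case, assuming a bus angle entry whose default unit is degree (the entry and column names are illustrative):

```yaml
bus:
  - name: angle
    custom_name: Va Rad   # raw values are in radians; the defaults expect degrees
    unit: radian
```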
Attaching time series
PowerSystems.jl requires a metadata file that associates components with their time-series data. PowerTableDataParser accepts either a JSON or CSV pointer file via the timeseries_metadata_file keyword; the default search path is joinpath(directory, "timeseries_pointers"), with .json and .csv tried in that order.
Each entry in the pointer file must provide:
- `simulation`: user description of the simulation
- `resolution`: resolution of the time series in seconds
- `module`: module that defines the abstract type of the component
- `category`: component type (`Bus`, `ElectricLoad`, `Generator`, `LoadZone`, `Reserve`)
- `component_name`: name of the component
- `name`: user-defined name for the time-series data
- `normalization_factor`: `1.0` for pre-normalized data, `"Max"` to divide by the column maximum, or a numeric scaling factor
- `scaling_factor_multiplier_module`: module that defines the scaling factor accessor
- `scaling_factor_multiplier`: accessor function name
- `data_file`: path to the time-series data file
The module, category, and component_name entries must be valid arguments to get_component(${module}.${category}, sys, ${component_name}). The scaling_factor_multiplier_module and scaling_factor_multiplier entries must be sufficient to return the scaling factor data via ${scaling_factor_multiplier_module}.${scaling_factor_multiplier}(component).
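A hedged sketch of a single pointer entry, with hypothetical component names and file paths:

```json
[
  {
    "simulation": "DAY_AHEAD",
    "resolution": 3600,
    "module": "PowerSystems",
    "category": "Generator",
    "component_name": "solar-309",
    "name": "max_active_power",
    "normalization_factor": "Max",
    "scaling_factor_multiplier_module": "PowerSystems",
    "scaling_factor_multiplier": "get_max_active_power",
    "data_file": "./timeseries_data/DAY_AHEAD_solar.csv"
  }
]
```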
See RTS-GMLC for a worked example.
By default PowerSystems.jl stores time-series data in HDF5 files and reads them on demand. Pass time_series_in_memory = true to System when your data fits in memory; pass time_series_directory = X to point the HDF5 store at a specific directory, or set the environment variable SIENNA_TIME_SERIES_DIRECTORY.
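For example, a minimal sketch that keeps the default on-disk storage but points it at a scratch volume (the path is a placeholder):

```julia
# Keep time series on disk, but place the HDF5 backing file on a large scratch volume.
sys = System(data; time_series_directory = "/scratch/ts-store")
```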
Extending the tabular parser
This section is for developers who want to teach the parser about new columns. It assumes familiarity with the sections above.
The key rule is: do not read hard-coded column names out of DataFrames. Use the descriptor layer so PowerSystems-standard names stay decoupled from whatever the user happens to call their column.
Procedure
1. Add an entry to the array of parameters for your category in `src/power_system_inputs.json`, following these rules:
   - Use `snake_case` for `name`.
   - `name` and `description` are required.
   - Prefer a name that is generic and not dataset-specific.
   - Define `unit` when applicable.
   - If the parser should treat the value as system per-unit, set `system_per_unit: true`.
2. If you maintain widely-used user descriptor files (e.g. the RTS-GMLC SIIP config), update them and submit pull requests so downstream users pick up the new field.
3. Consume the new column in your parsing code like this:
```julia
function demo_bus_csv_parser!(data::PowerSystemTableData)
    for bus in iterate_rows(data, InputCategory.BUS)
        @show bus.name, bus.max_active_power, bus.max_reactive_power
    end
end
```

`iterate_rows` returns a `NamedTuple` whose fields are the `name` entries defined in `power_system_inputs.json`, already translated from the user's column names and unit conventions.
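For step 1, a hypothetical new descriptor entry might look like the following (the field itself and its metadata are made up for illustration; copy the structure of the existing entries in `src/power_system_inputs.json`):

```json
{
  "name": "pump_load",
  "description": "Pumping load of the storage device in MW",
  "unit": "MW",
  "system_per_unit": true
}
```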
The tabular parser is in long-term maintenance mode. PowerSystems.jl will eventually move to a database-backed data layer, and new datasets are encouraged to ship a small custom Julia importer rather than depend on this parser. This package exists to keep existing CDM-based workflows working while that transition proceeds.