PSML: A Multi-scale Time-series Dataset for Machine Learning in Decarbonized Energy Grids

Overview

The electric grid is a key enabling infrastructure for the ambitious transition towards carbon neutrality as we grapple with climate change. With deepening penetration of renewable energy resources and electrified transportation, the reliable and secure operation of the electric grid becomes increasingly challenging. In this paper, we present PSML, a first-of-its-kind open-access multi-scale time-series dataset, to aid in the development of data-driven machine learning (ML) based approaches towards reliable operation of future electric grids. The dataset is generated through a novel transmission + distribution (T+D) co-simulation designed to capture the increasingly important interactions and uncertainties of the grid dynamics, containing electric load, renewable generation, weather, voltage and current measurements at multiple spatio-temporal scales. Using PSML, we provide state-of-the-art ML baselines on three challenging use cases of critical importance to achieve: (i) early detection, accurate classification and localization of dynamic disturbance events; (ii) robust hierarchical forecasting of load and renewable energy with the presence of uncertainties and extreme events; and (iii) realistic synthetic generation of physical-law-constrained measurement time series. We envision that this dataset will enable advances for ML in dynamic systems, while simultaneously allowing ML researchers to contribute towards carbon-neutral electricity and mobility.

Dataset Navigation

The full dataset is hosted on Zenodo. Download and unzip it to a location of your choice; it is needed later to reproduce the benchmark results, load data, and evaluate the performance of proposed methods.

wget -O PSML.zip 'https://zenodo.org/record/5130612/files/PSML.zip?download=1'
7z x PSML.zip -o./

Minute-level Load and Renewable

  • File Name
    • ISO_zone_#.csv: For example, CAISO_zone_1.csv contains minute-level load, renewable and weather data from 2018 to 2020 for zone 1 of CAISO.
  • Field Description
    • Field time: Timestamp at minute resolution.
    • Field load_power: Normalized load power.
    • Field wind_power: Normalized wind turbine power.
    • Field solar_power: Normalized solar PV power.
    • Field DHI: Diffuse horizontal irradiance.
    • Field DNI: Direct normal irradiance.
    • Field GHI: Global horizontal irradiance.
    • Field Dew Point: Dew point in degrees Celsius.
    • Field Solar Zenith Angle: The angle between the sun's rays and the vertical direction, in degrees.
    • Field Wind Speed: Wind speed (m/s).
    • Field Relative Humidity: Relative humidity (%).
    • Field Temperature: Temperature in degrees Celsius.
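
As a minimal sketch of how a zone file could be read, the snippet below parses a few made-up rows using only the Python standard library. The sample values, the timestamp format and the helper name read_zone_rows are assumptions for illustration; only the column names come from the field list above (a subset is shown).

```python
import csv
import io
from datetime import datetime

# Made-up rows mimicking a subset of the documented columns (not real data).
SAMPLE_CSV = """time,load_power,wind_power,solar_power,Temperature
2018-01-01 00:00:00,0.52,0.10,0.00,9.0
2018-01-01 00:01:00,0.53,0.11,0.00,9.0
2018-01-01 00:02:00,0.51,0.12,0.00,9.1
"""

# Assumption: timestamps look like "YYYY-MM-DD HH:MM:SS"; adjust if they differ.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_zone_rows(handle):
    """Yield one dict per minute, with `time` parsed and other fields as floats."""
    for row in csv.DictReader(handle):
        parsed = {"time": datetime.strptime(row["time"], TIME_FORMAT)}
        for name, value in row.items():
            if name != "time":
                parsed[name] = float(value)
        yield parsed


rows = list(read_zone_rows(io.StringIO(SAMPLE_CSV)))
mean_load = sum(r["load_power"] for r in rows) / len(rows)
print(len(rows), "rows, mean normalized load:", round(mean_load, 2))
```

For a real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline=""), where path points to the zone file inside the unzipped dataset.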

Minute-level PMU Measurements

  • File Name
    • case #: For example, the case 0 folder contains all data for scenario setting #0.
      • pf_input_#.txt: Selected load, renewable and solar generation for the simulation.
      • pf_result_#.csv: Voltage at nodes and power on branches in the transmission system, obtained via T+D simulation.
  • Field Description
    • Field time: Timestamp at minute resolution.
    • Field Vm_###: Voltage magnitude (p.u.) at bus ### in the simulated model.
    • Field Va_###: Voltage angle (rad) at bus ### in the simulated model.
    • Field P_#_#_#: Active power flow on a branch. For example, P_3_4_1 is the active power flowing on branch #1 from bus 3 to bus 4.
    • Field Q_#_#_#: Reactive power flow on a branch. For example, Q_5_20_1 is the reactive power flowing on branch #1 from bus 5 to bus 20.
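
The naming scheme above can be decoded mechanically. The following is a small sketch, not part of the PSML package: the names BranchFlow, BusVoltage and parse_column are hypothetical, and it assumes bus numbers and branch IDs are always plain integers.

```python
import re
from typing import NamedTuple


class BranchFlow(NamedTuple):
    kind: str      # "P" (active power) or "Q" (reactive power)
    from_bus: int
    to_bus: int
    circuit: int   # branch ID; assumed to be an integer


class BusVoltage(NamedTuple):
    kind: str      # "Vm" (magnitude, p.u.) or "Va" (angle, rad)
    bus: int


_BRANCH = re.compile(r"^(P|Q)_(\d+)_(\d+)_(\d+)$")
_BUS = re.compile(r"^(Vm|Va)_(\d+)$")


def parse_column(name):
    """Return a BranchFlow, a BusVoltage, or None for unrecognized names."""
    m = _BRANCH.match(name)
    if m:
        return BranchFlow(m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)))
    m = _BUS.match(name)
    if m:
        return BusVoltage(m.group(1), int(m.group(2)))
    return None


for column in ["time", "Vm_101", "P_3_4_1", "Q_5_20_1"]:
    print(column, "->", parse_column(column))
```

Grouping columns this way makes it easier to select, say, all active power flows leaving a given bus before building model inputs.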

Millisecond-level PMU Measurements

  • File Name
    • Forced Oscillation: The folder contains all forced oscillation cases.
      • row_#: The folder contains all data for disturbance scenario #.
        • dist.csv: Three-phase voltage at nodes in the distribution system, obtained via T+D simulation.
        • info.csv: The start time, end time, location and type of the disturbance.
        • trans.csv: Voltage at nodes and power on branches in the transmission system, obtained via T+D simulation.
    • Natural Oscillation: The folder contains all natural oscillation cases.
      • row_#: The folder contains all data for disturbance scenario #.
        • dist.csv: Three-phase voltage at nodes in the distribution system, obtained via T+D simulation.
        • info.csv: The start time, end time, location and type of the disturbance.
        • trans.csv: Voltage at nodes and power on branches in the transmission system, obtained via T+D simulation.
  • Field Description

    trans.csv

    • Field Time(s): Time in seconds, at millisecond resolution.
    • Field VOLT ###: Voltage magnitude (p.u.) at bus ### in the transmission model.
    • Field POWR ### TO ### CKT #: Active power flow on a branch. For example, POWR 151 TO 152 CKT '1 ' is the active power flowing on branch #1 from bus 151 to bus 152.
    • Field VARS ### TO ### CKT #: Reactive power flow on a branch. For example, VARS 151 TO 152 CKT '1 ' is the reactive power flowing on branch #1 from bus 151 to bus 152.

    dist.csv

    • Field Time(s): Time in seconds, at millisecond resolution.
    • Field ####.###.#: For example, 3005.633.1 is the per-unit voltage magnitude of phase A at bus 633 of the distribution grid connected to bus 3005 in the transmission system.
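
The header conventions of trans.csv and dist.csv can be decoded with a couple of regular expressions. This is only a sketch: the function names and dictionary keys are hypothetical. The field descriptions confirm that phase index 1 means phase A, while the mapping of 2 and 3 to phases B and C is an assumption.

```python
import re

_FLOW = re.compile(r"^(POWR|VARS)\s+(\d+)\s+TO\s+(\d+)\s+CKT\s+(.+)$")
_VOLT = re.compile(r"^VOLT\s+(\d+)")
_DIST = re.compile(r"^(\d+)\.([^.]+)\.(\d+)$")

# Only 1 -> A is stated in the field description; 2 -> B and 3 -> C are assumed.
PHASES = {"1": "A", "2": "B", "3": "C"}


def parse_trans_column(name):
    """Decode a trans.csv header such as "POWR 151 TO 152 CKT '1 '"."""
    m = _FLOW.match(name)
    if m:
        quantity = "active_power" if m.group(1) == "POWR" else "reactive_power"
        # The circuit ID may be wrapped in quotes and padded with spaces.
        circuit = m.group(4).strip().strip("'").strip()
        return {"quantity": quantity, "from_bus": int(m.group(2)),
                "to_bus": int(m.group(3)), "circuit": circuit}
    m = _VOLT.match(name)
    if m:
        return {"quantity": "voltage_magnitude", "bus": int(m.group(1))}
    return None


def parse_dist_column(name):
    """Decode a dist.csv header such as "3005.633.1"."""
    m = _DIST.match(name)
    if not m:
        return None
    return {"transmission_bus": int(m.group(1)),
            "distribution_bus": m.group(2),
            "phase": PHASES.get(m.group(3), m.group(3))}


print(parse_trans_column("POWR 151 TO 152 CKT '1 '"))
print(parse_dist_column("3005.633.1"))
```

Headers that match neither pattern, such as Time(s), return None, so the time column can be handled separately.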

Installation

  • Install PSML from source.
git clone https://github.com/tamu-engineering-research/Open-source-power-dataset.git
  • Create and activate anaconda virtual environment
conda create -n PSML python=3.7.10
conda activate PSML
  • Install required packages
pip install -r ./Code/requirements.txt

Package Usage

We provide standard data loader and evaluator interfaces for all three time-series tasks:

(1) Data loaders

We provide the following PyTorch data loaders, which include both data processing and splitting. Loading data for a different task takes only a few lines: simply change the task parameter.

from Code.dataloader import TimeSeriesLoader

loader = TimeSeriesLoader(task='forecasting', root='./PSML') # assumes the raw dataset has been downloaded and unzipped under Open-source-power-dataset
train_loader, test_loader = loader.load(batch_size=32, shuffle=True)

(2) Evaluators

We also provide evaluators to support fair comparison among different approaches. The evaluator takes a dictionary input_dict (the key and value formats for each task are specified in evaluator.expected_input_format) and returns another dictionary holding the performance measured by task-specific metrics (the keys and values are explained in evaluator.expected_output_format).

from Code.evaluator import TimeSeriesEvaluator
evaluator = TimeSeriesEvaluator(task='classification', root='./PSML') # assumes the raw dataset has been downloaded and unzipped under Open-source-power-dataset
# learn the appropriate format of input_dict
print(evaluator.expected_input_format) # expected input_dict format
print(evaluator.expected_output_format) # expected output dict format
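# prepare input_dict; classification, localization and detection are placeholders
# for your own results, formatted as described in evaluator.expected_input_format
input_dict = {
    'classification': classification,
    'localization': localization,
    'detection': detection,
}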
result_dict = evaluator.eval(input_dict)
# sample output: {'#samples': 110, 'classification': 0.6248447204968943, 'localization': 0.08633372048006195, 'detection': 42.59349593495935}

Code Navigation

Detailed explanations and comments are provided in each subfolder.

  • BenchmarkModel
    • EventClassification: baseline models for event detection, classification and localization
    • LoadForecasting: baseline models for hierarchical load and renewable point forecasts and prediction intervals
    • Synthetic Data Generation: baseline models for synthetic generation of physical-law-constrained PMU measurement time series
  • Joint Simulation: Python code for joint steady-state and transient simulation between transmission and distribution systems
  • Data Processing: Python code for collecting the real-world load and weather data

License

The PSML dataset is published under the CC BY-NC 4.0 license, meaning anyone can use it for non-commercial research purposes.

Suggested Citation

  • Please cite the following paper when you use this data hub:
    X. Zheng, N. Xu, L. Trinh, D. Wu, T. Huang, S. Sivaranjani, Y. Liu, and L. Xie, "PSML: A Multi-scale Time-series Dataset for Machine Learning in Decarbonized Energy Grids." (2021).

Contact

Please contact us if you need further technical support or are interested in collaboration. Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Email contact: Le Xie, Yan Liu, Xiangtian Zheng, Nan Xu, Dongqi Wu, Loc Trinh, Tong Huang, S. Sivaranjani.

Releases(v1.0.0)
  • v1.0.0(Nov 10, 2021)

Owner
Texas A&M Engineering Research