Energy Datasets for NILMTK

Energy consumption datasets are the outcome of measurement campaigns in representative households and industrial facilities. They are used to train and test NILM algorithms. NILMTK in particular provides features to import such datasets and work with them in a few commands. It supports a number of dataset converter functions that convert a dataset and its metadata into a single H5 file, which is very convenient! In this manual, we discuss which energy datasets are available in H5 form and show where they can be obtained. Our main sources are the NILMTK project page, the dataset page of NILM.EU, NILM.ca, and information gathered from Oli Parson’s blog.
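As a rough illustration of those "few commands", the sketch below converts REDD and opens the resulting H5 file. It assumes NILMTK is installed and that the raw REDD files have already been downloaded. The wrapper name convert_and_open and the default file name redd.h5 are hypothetical and only serve this example.

```python
def convert_and_open(redd_path, output_filename="redd.h5"):
    """Convert raw REDD files to a single H5 file, then open it with NILMTK."""
    # Imported lazily so that this snippet can be loaded without NILMTK installed.
    from nilmtk import DataSet
    from nilmtk.dataset_converters import convert_redd

    # One-off conversion: raw data and metadata end up in one H5 file.
    convert_redd(redd_path, output_filename, format="HDF")

    # Open the converted file and select the meters of building 1.
    dataset = DataSet(output_filename)
    elec = dataset.buildings[1].elec
    return dataset, elec
```

Calling convert_and_open with the folder that holds the raw REDD files should return the opened dataset together with the meter group of the first building. The other converters listed below follow the same pattern, each with its own arguments.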

AMPds: Almanac of Minutely Power dataset (R2013)

AMPds contains electricity, water, and natural gas measurements at one minute intervals, a total of 1,051,200 readings per meter for 2 years of monitoring. Weather data from Environment Canada’s YVR weather station has also been added. This hourly weather data covers the same period of time as AMPds and includes a summary of climate normals observed between 1981 and 2010. Utility billing data is also included for cost analyses. Source: AMPds site
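The reading count quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes two 365-day years and no gaps in the recording; the function name readings_per_meter is made up for this example.

```python
MINUTES_PER_DAY = 24 * 60


def readings_per_meter(years, days_per_year=365, interval_minutes=1):
    """Number of readings one meter produces at a fixed sampling interval."""
    return years * days_per_year * MINUTES_PER_DAY // interval_minutes


print(readings_per_meter(2))  # 1051200
```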

Download from Harvard Dataverse

NILMTK Function: convert_ampds(input_path, output_filename, format='HDF')

Access the corresponding research paper here

COMBED

COMBED contains a month of smart meter data collected from different sensing points in IIITD’s academic building. COMBED was one of the first energy-related data sets from a commercial building in which data is sampled more than once a minute. The data set comes with a loader that allows it to be easily plugged into NILMTK. Source: COMBED project page

Download from Github

NILMTK Function: convert_combed(combed_path, output_filename, format='HDF')

Access the corresponding research paper here

Dataport

Dataport’s vast database of original and curated data includes ERCOT market operations, minute-interval appliance-level customer electricity use from nearly 1,000 houses and apartments in Pecan Street’s multi-state residential electricity use research, and minute-interval gas and water use from hundreds of homes in Pecan Street’s gas and water research testbeds as well as energy and climate-related federal datasets. Source: Dataport.cloud

Download from Dataport.cloud or via NILMTK

NILMTK Function: download_dataport('username', 'password', '/path/output_filename.h5', periods_to_load={26: ('2014-04-01', '2014-05-01')})
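The periods_to_load argument in the call above is a dictionary that maps a building ID to a (start, end) pair of date strings. The sketch below builds such a pair for one calendar month. The helper month_period is hypothetical and not part of NILMTK; building 26 and the dates are taken from the call above.

```python
from datetime import date


def month_period(year, month):
    """Return (first day of the month, first day of the next month) as ISO strings."""
    start = date(year, month, 1)
    # Roll over to January of the following year after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (start.isoformat(), end.isoformat())


# Same structure as in the download_dataport call shown above.
periods_to_load = {26: month_period(2014, 4)}
print(periods_to_load)  # {26: ('2014-04-01', '2014-05-01')}
```

The resulting dictionary can then be passed as periods_to_load together with the Dataport credentials and the output file name.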

Access the corresponding research paper here

ECO data set (Electricity Consumption & Occupancy)

The ECO data set is a comprehensive data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides 1 Hz aggregate consumption data, where each measurement contains data on current, voltage, and phase shift for each of the three phases in the household; 1 Hz plug-level data measured from selected appliances; and occupancy information measured through a tablet computer (manual labeling) and a passive infrared sensor (in some of the households). Source: ECO data set website

Download ECO

NILMTK Function: convert_eco(dataset_loc, hdf_filename, timezone)

Access the corresponding research paper here

iAWE

The Indraprastha Institute of Information Technology recently released the iAWE data set, which contains aggregate and sub-metered electricity and gas data from 33 household sensors at 1 second resolution. The data set covers 73 days of a single house in Delhi, India. Each individual channel of the data can be downloaded separately in either SQL or CSV format from the download section at the bottom of the webpage. Source: Oli Parson’s blog

Download iAWE

NILMTK Function: convert_iawe(iawe_path, output_filename, format="HDF")

Access the corresponding research paper here

REDD

REDD contains both household-level and circuit-level data from 6 US households, over various durations (between a few weeks and a few months). Each house has two-phase mains input, and 10-25 individually monitored circuits. High-frequency (kHz) current and voltage data are available for both mains circuits, while low-frequency power measurements (3-4 second intervals) are available for the appliance circuits. This data set was collected primarily for the evaluation of non-event based NIALM methods. Access to the data set is password protected so that the authors can keep track of its usage. Source: Oli Parson’s blog

Download REDD

NILMTK Function: convert_redd(redd_path, output_filename, format='HDF')

Access the corresponding research paper here

REFIT

The REFIT Electrical Load Measurements dataset includes cleaned electrical consumption data in Watts for 20 households at aggregate and appliance level, timestamped and sampled at 8 second intervals. This dataset is intended to be used for research into energy conservation and advanced energy services, ranging from non-intrusive appliance load monitoring, demand response measures, tailored energy and retrofit advice, appliance usage analysis, consumption and time-use statistics and smart home/building automation. Source: REFIT website

Download REFIT

NILMTK Function: convert_refit(input_path, output_filename, format='HDF')

Access the corresponding research paper here

UK-DALE

Jack Kelly released the first version of UK-DALE in January 2015. The data set contains 16 kHz current and voltage aggregate meter readings and 6 second sub-metered power data from individual appliances across 3 UK homes, as well as 1 second aggregate and 6 second sub-metered power data for 2 additional homes. An update to the data set, released in August 2015, expanded the data available for house 1 to 2.5 years. Low frequency data is available to download in CSV or NILMTK HDF5 format, while high frequency data can be downloaded in FLAC file format. Source: Oli Parson’s blog

Download from Jack-Kelly.com

NILMTK Function: convert_ukdale(ukdale_path, output_filename, format='HDF')

Access the corresponding research paper here

Further we recommend

AMPds 2

Version 2 of the AMPds dataset has been released to help load disaggregation/NILM and eco-feedback researchers test their algorithms, models, systems, and prototypes. The dataset is intended to be a multi-year capture of the consumption of the dataset author’s house. It contains electricity, water, and natural gas measurements at one minute intervals, a total of 1,051,200 readings per meter for 2 years of monitoring (from April 2012 to March 2014). There are a total of 21 power meters, 2 water meters (with additional appliance usage annotations), and 2 natural gas meters. Weather data from Environment Canada’s YVR weather station has also been added. This hourly weather data covers the same period of time as AMPds and includes a summary of climate normals observed between 1981 and 2010. Billing data from utility companies is also included for cost/benefit analysis. Source: NILM.ca

Access the corresponding research paper here

BLUED

The BLUED data set contains high-frequency (12 kHz) household-level data from a single US household over a period of approximately 8 days. The data set also contains an event list of each time an appliance within the household changes state (e.g. microwave turns on). This data set was collected primarily for the evaluation of event based NIALM methods. Access to the data set is also password protected so that the authors can keep track of its usage. Source: Oli Parson’s blog

Access the corresponding research paper here

RAE

The Rainforest Automation Energy (RAE) dataset captures smart meter and sub-meter data from houses located in and around Vancouver, Canada. It is intended to help smart grid researchers test algorithms that make use of smart meter data. RAE contains 72 days of 1 Hz data from a residential house’s mains and 24 sub-meters, resulting in 6.2 million samples for each sub-meter. In addition to power data, environmental and sensor data from the house’s thermostat is included. Sub-meter data includes heat pump and rental suite captures, which are of interest to power utilities. Source: NILM.ca
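The figure of 6.2 million samples per sub-meter follows directly from the recording length and the sampling rate. A quick check, assuming 72 full days without gaps (the variable names are only for this example):

```python
SECONDS_PER_DAY = 24 * 60 * 60

days = 72
sampling_rate_hz = 1

# Samples per sub-meter for a gap-free recording.
samples_per_submeter = days * SECONDS_PER_DAY * sampling_rate_hz
print(samples_per_submeter)  # 6220800
```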

Access the corresponding research paper here

Tracebase

The tracebase repository contains individual appliance data with the intention of creating a database for training NIALM algorithms. The repository contains a total of 1883 days of power readings, taken at 1 second intervals, for 158 appliance instances, of 43 different appliance types. Since the aim is to create an appliance database, no aggregate measurements are collected. Source: Oli Parson’s blog

Access the corresponding research paper here

To be extended…

We will frequently update and extend this manual. Did we forget any important aspect? Feel free to leave a comment below!
