Analysing Energy Datasets with NILMTK

Energy consumption datasets result from measurement campaigns in representative households and industrial facilities. They are used to train and test non-intrusive load monitoring (NILM) algorithms. Before working with such a dataset extensively, it is a good idea to explore it and collect some basic information. NILMTK provides a set of useful functions for this purpose, which we discuss briefly in this blog post using the AMPds dataset. Our main source is the NILMTK Documentation.

@article{makonin2016electricity,
  title={Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014},
  author={Makonin, Stephen and Ellert, Bradley and Baji{\'c}, Ivan V and Popowich, Fred},
  journal={Scientific data},
  volume={3},
  pages={160037},
  year={2016},
  publisher={Nature Publishing Group}
}

As a first step, we import the AMPds dataset and print its metadata.

IN[1]:

from nilmtk import DataSet
from nilmtk.utils import print_dict

ddir = '/Users/christoph/.../nilmtk/data/'
amp = DataSet(ddir +'AMPds.h5')

print_dict(amp.metadata)

OUT[1]:

  • name: AMPds
  • long_name: The Almanac of Minutely Power Dataset
  • creators:
    • Makonin, Stephen
    • Popowich, Fred
    • Bartram, Lyn
    • Gill, Bob
    • Bajic, Ivan V.
  • publication_date: 2013, 2014
  • institution: Simon Fraser University (SFU)
  • contact: stephen@makonin.com
  • description: a dataset consisting of electricity, water, and natural gas consumption for 2 years
  • subject: First dataset from Canada
  • number_of_buildings: 1
  • timezone: America/Vancouver
  • geo_location:
    • locality: Burnaby
    • country: CA
    • latitude: 49.269
    • longitude: -122.992
  • related_documents:
  • schema: https://github.com/nilmtk/nilm_metadata/tree/v0.2
  • meter_devices:
    • PS18:
      • model: PowerScout 18
      • manufacturer: DENT Instruments
      • manufacturer_url: http://www.dentinstruments.com
      • description: Branch curcuit power meter (18)
      • sample_period: 60
      • max_sample_period: 60
      • measurements:
        • {'physical_quantity': 'voltage', 'type': 'apparent', 'upper_limit': 270, 'lower_limit': 0}
        • {'physical_quantity': 'current', 'type': 'apparent', 'upper_limit': 400, 'lower_limit': 0}
        • {'physical_quantity': 'frequency', 'type': 'apparent', 'upper_limit': 70, 'lower_limit': 0}
        • {'physical_quantity': 'power factor', 'type': 'apparent', 'upper_limit': 1, 'lower_limit': 0}
        • {'physical_quantity': 'power factor', 'type': 'real', 'upper_limit': 1, 'lower_limit': 0}
        • {'physical_quantity': 'power', 'type': 'active', 'upper_limit': 96000, 'lower_limit': 0}
        • {'physical_quantity': 'cumulative energy', 'type': 'active', 'upper_limit': 500000000, 'lower_limit': 0}
        • {'physical_quantity': 'power', 'type': 'reactive', 'upper_limit': 96000, 'lower_limit': 0}
        • {'physical_quantity': 'cumulative energy', 'type': 'reactive', 'upper_limit': 500000000, 'lower_limit': 0}
        • {'physical_quantity': 'power', 'type': 'apparent', 'upper_limit': 96000, 'lower_limit': 0}
        • {'physical_quantity': 'cumulative energy', 'type': 'apparent', 'upper_limit': 500000000, 'lower_limit': 0}
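
The printed metadata can also be queried programmatically. The following is a minimal sketch that assumes the metadata behaves like a plain nested Python dict. To stay runnable without the dataset, it works on a hand-copied fragment of the output above; the variable names are made up for illustration.

```python
# Fragment of the metadata printed above, copied by hand for illustration.
# With the dataset loaded, amp.metadata would take the place of this dict
# (assumption: it is a plain nested dict).
metadata = {
    "name": "AMPds",
    "timezone": "America/Vancouver",
    "meter_devices": {
        "PS18": {
            "sample_period": 60,
            "max_sample_period": 60,
            "measurements": [
                {"physical_quantity": "voltage", "type": "apparent",
                 "upper_limit": 270, "lower_limit": 0},
                {"physical_quantity": "power", "type": "active",
                 "upper_limit": 96000, "lower_limit": 0},
                {"physical_quantity": "power", "type": "reactive",
                 "upper_limit": 96000, "lower_limit": 0},
                {"physical_quantity": "power", "type": "apparent",
                 "upper_limit": 96000, "lower_limit": 0},
            ],
        }
    },
}

meter = metadata["meter_devices"]["PS18"]

# Collect the AC types for which the meter reports power.
power_types = sorted(
    m["type"] for m in meter["measurements"] if m["physical_quantity"] == "power"
)

print("Sample period in seconds:", meter["sample_period"])
print("Power types:", power_types)
```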

To explore the dataset, we need the MeterGroup object of the building. This NILMTK-specific object contains one ElecMeter for the site meter and one for each appliance-level meter.

IN[2]:

elec = amp.buildings[1].elec
elec

OUT[2]:

MeterGroup(meters=
  ElecMeter(instance=1, building=1, dataset='AMPds', site_meter, appliances=[])
  ElecMeter(instance=2, building=1, dataset='AMPds', appliances=[Appliance(type='light', instance=1)])
  ElecMeter(instance=3, building=1, dataset='AMPds', appliances=[Appliance(type='light', instance=2)])
  ElecMeter(instance=4, building=1, dataset='AMPds', appliances=[Appliance(type='light', instance=3)])
  ElecMeter(instance=5, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=1)])
  ElecMeter(instance=6, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=2)])
  ElecMeter(instance=7, building=1, dataset='AMPds', appliances=[Appliance(type='sockets', instance=1)])
  ElecMeter(instance=8, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=3)])
  ElecMeter(instance=9, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=4)])
  ElecMeter(instance=10, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=5)])
  ElecMeter(instance=11, building=1, dataset='AMPds', appliances=[Appliance(type='fridge', instance=1)])
  ElecMeter(instance=12, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=6)])
  ElecMeter(instance=13, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=7)])
  ElecMeter(instance=14, building=1, dataset='AMPds', appliances=[Appliance(type='heat pump', instance=1)])
  ElecMeter(instance=15, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=8)])
  ElecMeter(instance=16, building=1, dataset='AMPds', appliances=[Appliance(type='light', instance=4)])
  ElecMeter(instance=17, building=1, dataset='AMPds', appliances=[Appliance(type='sockets', instance=2)])
  ElecMeter(instance=18, building=1, dataset='AMPds', appliances=[Appliance(type='unknown', instance=9)])
  ElecMeter(instance=19, building=1, dataset='AMPds', appliances=[Appliance(type='television', instance=1)])
  ElecMeter(instance=20, building=1, dataset='AMPds', appliances=[Appliance(type='sockets', instance=3)])
  ElecMeter(instance=21, building=1, dataset='AMPds', appliances=[Appliance(type='electric oven', instance=1)])
)

With the MeterGroup at hand, we can check the dropout rate, i.e. the proportion of samples that are missing, and a few basic properties of the measurements.

IN[3]:

elec.dropout_rate()

OUT[3]:

0.0

IN[4]:

print("Sample Period: " + str( elec.sample_period() ))
print("Available physical quantities: " + str( elec.available_physical_quantities() ))
print("Available AC types :" + str(elec.available_ac_types('power')))
print("Duration of the measurement campaign: "+ str(elec.uptime()))

OUT[4]:

Sample Period: 60
Available physical quantities: ['voltage', 'frequency', 'current', 'power factor', 'power', 'cumulative energy']
Available AC types :['reactive', 'apparent', 'active']
Duration of the measurement campaign: 729 days 23:59:00
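
The numbers above can be cross-checked with a little arithmetic. The sketch below makes two assumptions that are not taken from NILMTK itself: the reported duration spans the first to the last timestamp inclusively, and a dropout rate can be read as one minus the ratio of received to expected samples. The function name is chosen for illustration only.

```python
from datetime import timedelta

sample_period = timedelta(seconds=60)                  # from OUT[4]
duration = timedelta(days=729, hours=23, minutes=59)   # from OUT[4]

# Assumption: the duration runs from the first to the last timestamp,
# so one extra sample has to be counted for the first timestamp.
expected_samples = duration // sample_period + 1


def dropout_rate(n_samples, n_expected):
    """Simplified reading: share of expected samples that are missing."""
    return 1 - n_samples / n_expected


print("Expected samples per meter:", expected_samples)
print("Dropout rate without gaps:", dropout_rate(expected_samples, expected_samples))
```

Under these assumptions, each meter would hold two full years of minutely readings.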

Next, we plot the good sections of the dataset.

IN[5]:

elec.plot_good_sections()

OUT[5]:

<matplotlib.axes._subplots.AxesSubplot at 0xa1aeadb38>

[Figure: good sections of each meter, as plotted by elec.plot_good_sections()]
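
To make the idea of a good section more tangible: it can be read as a stretch of data in which no gap between consecutive samples exceeds max_sample_period (60 seconds for the PS18 meter, according to the metadata). The sketch below is a simplified illustration on synthetic timestamps, not NILMTK's implementation; the function good_sections is a hypothetical helper.

```python
from datetime import datetime, timedelta


def good_sections(timestamps, max_sample_period):
    """Split sorted timestamps into (start, end) sections without large gaps."""
    if not timestamps:
        return []
    sections = []
    start = prev = timestamps[0]
    for ts in timestamps[1:]:
        # A gap larger than max_sample_period closes the current section.
        if ts - prev > max_sample_period:
            sections.append((start, prev))
            start = ts
        prev = ts
    sections.append((start, prev))
    return sections


t0 = datetime(2013, 12, 12)
# Synthetic data: five minutely samples, a ten-minute gap, three more samples.
timestamps = [t0 + timedelta(minutes=m) for m in (0, 1, 2, 3, 4, 14, 15, 16)]
sections = good_sections(timestamps, timedelta(seconds=60))

for start, end in sections:
    print(start, "to", end)
```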

The plot indicates that the majority of sections are all right. Nice! A very handy function of NILMTK is the wiring graph. This graph shows the wiring of a particular building and helps in getting an overview of how the meters and appliances are connected.

IN[6]:

elec.draw_wiring_graph()

OUT[6]:

(<networkx.classes.digraph.DiGraph at 0xa1aff1ef0>,
 <matplotlib.axes._axes.Axes at 0xa1aface80>)

[Figure: wiring graph of building 1]

For various reasons, some appliances cannot be equipped with a measurement plug during a measurement campaign. As a result, the exact power signal of these appliances is unknown. The function proportion_of_energy_submetered estimates the share of the total energy that is covered by submeters.

IN[7]:

elec.proportion_of_energy_submetered()

OUT[7]:

Running MeterGroup.proportion_of_energy_submetered...

0.9146573111854486

AMPds reaches a value of 0.91, meaning that roughly 91% of the energy seen by the site meter is also captured by submeters, which is quite remarkable. Next, we look at the fraction of energy recorded by each meter.

IN[8]:

elec.fraction_per_meter()

OUT[8]:

21/21 ElecMeter(instance=21, building=1, dataset='AMPds', appliances=[Appliance(type='electric oven', instance=1)])

(1, 1, AMPds)     0.514922
(2, 1, AMPds)     0.003918
(3, 1, AMPds)     0.012565
(4, 1, AMPds)     0.016394
(5, 1, AMPds)     0.033271
(6, 1, AMPds)     0.002704
(7, 1, AMPds)     0.000577
(8, 1, AMPds)     0.006046
(9, 1, AMPds)     0.007568
(10, 1, AMPds)    0.021275
(11, 1, AMPds)    0.022731
(12, 1, AMPds)    0.067881
(13, 1, AMPds)    0.025865
(14, 1, AMPds)    0.074558
(15, 1, AMPds)    0.005128
(16, 1, AMPds)    0.019010
(17, 1, AMPds)    0.000431
(18, 1, AMPds)    0.111456
(19, 1, AMPds)    0.021547
(20, 1, AMPds)    0.028022
(21, 1, AMPds)    0.004130
dtype: float64
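
Note that the listing above includes the site meter (instance 1), and the values add up to roughly 1 across all 21 meters. The sketch below copies the printed fractions by hand, drops the site meter, and rescales the rest to shares of the submetered energy. It is plain Python for illustration, not a NILMTK function.

```python
# Fractions copied from OUT[8], keyed by meter instance.
# Instance 1 is the site meter.
fractions = {
    1: 0.514922, 2: 0.003918, 3: 0.012565, 4: 0.016394, 5: 0.033271,
    6: 0.002704, 7: 0.000577, 8: 0.006046, 9: 0.007568, 10: 0.021275,
    11: 0.022731, 12: 0.067881, 13: 0.025865, 14: 0.074558, 15: 0.005128,
    16: 0.019010, 17: 0.000431, 18: 0.111456, 19: 0.021547, 20: 0.028022,
    21: 0.004130,
}

# Keep the submeters only and rescale so that their shares sum to 1.
submeters = {k: v for k, v in fractions.items() if k != 1}
total = sum(submeters.values())
shares = {k: v / total for k, v in submeters.items()}

# Rank the submeters by their share of the submetered energy.
top3 = sorted(shares, key=shares.get, reverse=True)[:3]
for instance in top3:
    print(f"Meter {instance}: {shares[instance]:.1%} of submetered energy")
```

In this ranking, meters 18, 14 and 12 come out on top; 14 is the heat pump, while 18 and 12 are labelled 'unknown' in the MeterGroup listing.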

Fractions are one thing but there is nothing better than the good old pie chart ;)

IN[9]:

import matplotlib.pyplot as plt
from pylab import rcParams
%matplotlib inline
rcParams['figure.figsize'] = (13, 6)

fraction = elec.submeters().fraction_per_meter().dropna()

labels = elec.get_labels(fraction.index)
plt.figure(figsize=(6,6))
fraction.plot(kind='pie', labels=labels);

OUT[9]:

20/20 ElecMeter(instance=21, building=1, dataset='AMPds', appliances=[Appliance(type='electric oven', instance=1)])

[Figure: pie chart of the energy fraction per submeter]

By setting the observation time window of the dataset object, we can easily plot a specific section of the measurement campaign.

IN[10]:

amp.set_window(start='2013-12-12 00:00:00', end='2013-12-19 00:00:00')
elec.plot();
plt.xlabel("Time");

OUT[10]:

[Figure: power of all meters from 2013-12-12 to 2013-12-19]

Last but not least, we’d like to mention the power histogram function. By looking at this specific kind of histogram, we can identify the most common power values of the dataset.

IN[11]:

elec.plot_power_histogram()

OUT[11]:

Loading data for meter ElecMeterID(instance=21, building=1, dataset='AMPds')     
Done loading data all meters for this chunk.

<matplotlib.axes._subplots.AxesSubplot at 0xa1b36fef0>

[Figure: power histogram]
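
What such a histogram counts can be sketched in a few lines of plain Python. The readings below are synthetic and the bin width of 50 W is an arbitrary choice; the helper power_histogram is hypothetical and only illustrates the binning idea, not how NILMTK computes its plot.

```python
from collections import Counter


def power_histogram(readings_watts, bin_width=50):
    """Count readings per power bin; keys are the lower bin edges in watts."""
    counts = Counter(int(p // bin_width) * bin_width for p in readings_watts)
    return dict(sorted(counts.items()))


# Synthetic readings: standby around a few watts, an appliance cycling
# near 120 W, and a single high-power event.
readings = [2, 3, 5, 4, 120, 125, 118, 3, 2, 1500, 122, 4]

hist = power_histogram(readings)
most_common_bin = max(hist, key=hist.get)

for edge, count in hist.items():
    print(f"{edge:>5} W to {edge + 50:>5} W: {count}")
print("Most common bin starts at", most_common_bin, "W")
```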

To be extended…

We will frequently update and extend this manual. Did we forget about any important aspect? Feel free to leave a comment below!
