Simple Disaggregation on SynD with NILMTK

In this tutorial, we use NILMTK to train and test two of its built-in benchmark algorithms, Combinatorial Optimisation (CO) and the Factorial Hidden Markov Model (FHMM), on SynD. The source code of this tutorial is based on material released by the creators of NILMTK. Thanks for sharing!

Remarks on this tutorial:

  1. We use an older version of NILMTK in this tutorial, namely nilmtk <= 0.3.0.
  2. FHMM and CO are older disaggregators rather than state-of-the-art ones. We chose them deliberately: this tutorial aims to be a simple introduction to NILM, not a showcase of cutting-edge techniques.

Step 1: Import everything we need!

IN[1]:

from __future__ import print_function, division
import sys
from matplotlib import rcParams
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import warnings
from six import iteritems

from sklearn.metrics import mean_squared_error

from nilmtk import DataSet, TimeFrame, MeterGroup, HDFDataStore
from nilmtk.disaggregate import CombinatorialOptimisation, FHMM, MLE

from nilmtk.elecmeter import ElecMeterID

Step 2: Let’s define performance metrics and the prediction procedure!
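For reference, here is what compute_RMSE and compute_MNE calculate for a single appliance, written with ground-truth power y_t and predicted power ŷ_t over the T time steps that remain after aligning both series. MNE is a normalised squared error: the total squared error divided by the sum of the squared ground-truth values.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2},
\qquad
\mathrm{MNE} = \frac{\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{T} y_t^{2}}
```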

IN[2]:

def compute_RMSE(gt, pred):
    rms_error = {}
    for appliance in gt.columns:
        rms_error[appliance] = np.sqrt(mean_squared_error(gt[appliance], pred[appliance]))
    return pd.Series(rms_error)


def compute_MNE(gt, pred):
    mne = {}
    for appliance in gt.columns:
        mne[appliance] = np.sum(abs(gt[appliance] - pred[appliance])**2) / np.sum(gt[appliance]**2)
    return pd.Series(mne)


def predict(clf, test_elec, sample_period, timezone):
    pred = {}
    gt = {}

    for i, chunk in enumerate(test_elec.mains().load(sample_period=sample_period)):
        chunk_drop_na = chunk.dropna()
        try:
            pred[i] = clf.disaggregate_chunk(chunk_drop_na)
        except RuntimeError:
            continue
        gt[i] = {}

        for meter in test_elec.submeters().meters:
            # Load ground truth for every submeter; meters we did not train on
            # are dropped below when the columns are matched to the predictions
            gt[i][meter] = next(meter.load(sample_period=sample_period))
        gt[i] = pd.DataFrame({k: v.squeeze() for k, v in iteritems(gt[i]) if len(v)},
                             index=next(iter(gt[i].values())).index).dropna()

    # If everything can fit in memory
    gt_overall = pd.concat(gt)
    gt_overall.index = gt_overall.index.droplevel()
    pred_overall = pd.concat(pred)
    pred_overall.index = pred_overall.index.droplevel()

    # Having the same order of columns
    gt_overall = gt_overall[pred_overall.columns]

    # Intersection of index
    gt_index_utc = gt_overall.index.tz_convert("UTC")
    pred_index_utc = pred_overall.index.tz_convert("UTC")
    common_index_utc = gt_index_utc.intersection(pred_index_utc)

    common_index_local = common_index_utc.tz_convert(timezone)
    # .loc instead of .ix, which was deprecated and removed in pandas 1.0
    gt_overall = gt_overall.loc[common_index_local]
    pred_overall = pred_overall.loc[common_index_local]
    appliance_labels = [m for m in gt_overall.columns.values]
    gt_overall.columns = appliance_labels
    pred_overall.columns = appliance_labels
    return gt_overall, pred_overall
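The index-alignment part of predict() is easiest to follow on a small example. The sketch below applies the same steps to two made-up DataFrames that overlap in only three time stamps. The toy data and the timezone "Europe/Vienna" (standing in for train.metadata['timezone']) are assumptions for illustration, not values read from SynD.

```python
import pandas as pd

TZ = "Europe/Vienna"  # assumption: placeholder for the dataset's timezone metadata

# Hypothetical ground truth and predictions (watts) that only partly overlap in time
gt = pd.DataFrame(
    {"fridge": [80.0, 82.0, 0.0, 0.0]},
    index=pd.date_range("2020-02-07 00:00:00", periods=4, freq="10s", tz=TZ),
)
pred = pd.DataFrame(
    {"fridge": [75.0, 0.0, 5.0]},
    index=pd.date_range("2020-02-07 00:00:10", periods=3, freq="10s", tz=TZ),
)

# Same steps as in predict(): intersect both indexes in UTC, convert back to
# local time, then select the shared rows with label-based .loc
common_utc = gt.index.tz_convert("UTC").intersection(pred.index.tz_convert("UTC"))
common_local = common_utc.tz_convert(TZ)
gt_aligned = gt.loc[common_local]
pred_aligned = pred.loc[common_local]

print(gt_aligned.join(pred_aligned, lsuffix="_gt", rsuffix="_pred"))
```

Only the three shared time stamps survive, so both frames end up with identical indexes and the metric functions can compare them row by row.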

Step 3: Define settings and create variables!

IN[3]:

################## SETTINGS ##################

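sample_period = 10  # resampling interval in seconds

d_dir = '/Users/christoph/datasets/SynD-release/'  # adjust to your local copy of SynD

################## VARS ##################

# Open the same file twice: set_window() restricts a whole DataSet to a time
# window, so training and test data each need their own DataSet object
train = DataSet(d_dir+'SynD.h5')
test = DataSet(d_dir+'SynD.h5')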

train.set_window(end="2020-02-07")
test.set_window(start="2020-02-07")

train_elec = train.buildings[1].elec
test_elec = test.buildings[1].elec

top_5_train_elec = train_elec.submeters().select_top_k(k=5)

OUT[3]:

21/21 ElecMeter(instance=22, building=1, dataset='SynD', appliances=[Appliance(type='kettle', instance=1)])
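The settings above do two things to the data: sample_period resamples every meter to a fixed 10-second interval, and the two set_window calls split the recording chronologically at 2020-02-07. The sketch below mimics both steps with plain pandas on a made-up 1 Hz signal. It is not NILMTK code: the toy signal, the timezone "Europe/Vienna", and the assumption that resampling averages each 10-second bin are ours, and NILMTK's exact handling of the window boundary is not reproduced.

```python
import numpy as np
import pandas as pd

TZ = "Europe/Vienna"  # assumption: placeholder timezone

# Hypothetical 1 Hz mains signal spanning the train/test cut-off
index = pd.date_range("2020-02-06 23:59:00", "2020-02-07 00:00:59", freq="1s", tz=TZ)
mains = pd.Series(np.arange(len(index), dtype=float), index=index)

# Roughly what sample_period=10 asks for (assumption: mean over each 10-second bin)
mains_10s = mains.resample("10s").mean()

# Chronological split: everything before the cut-off trains, the rest tests
cutoff = pd.Timestamp("2020-02-07", tz=TZ)
train_part = mains_10s[mains_10s.index < cutoff]
test_part = mains_10s[mains_10s.index >= cutoff]

print(len(train_part), "training bins,", len(test_part), "test bins")
```

Splitting by time rather than at random keeps the test period strictly after the training period, so the disaggregators are evaluated on data they have not seen.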

Step 4: Train and predict!

IN[4]:

################## DISAGGREGATE ##################
predictions = {}

classifiers = {'CO':CombinatorialOptimisation(), 'FHMM':FHMM()}

for clf_name, clf in classifiers.items():
    print("*"*20)
    print(clf_name)
    print("*" *20)
    clf.train(top_5_train_elec, sample_period=sample_period)
    gt, predictions[clf_name] = predict(clf, test_elec, sample_period, train.metadata['timezone'])


OUT[4]:

********************
CO
********************
Training model for submeter 'ElecMeter(instance=2, building=1, dataset='SynD', appliances=[Appliance(type='fridge', instance=1)])'
Training model for submeter 'ElecMeter(instance=4, building=1, dataset='SynD', appliances=[Appliance(type='electric space heater', instance=1)])'
Training model for submeter 'ElecMeter(instance=3, building=1, dataset='SynD', appliances=[Appliance(type='dish washer', instance=1)])'
Training model for submeter 'ElecMeter(instance=9, building=1, dataset='SynD', appliances=[Appliance(type='clothes iron', instance=1)])'
Training model for submeter 'ElecMeter(instance=5, building=1, dataset='SynD', appliances=[Appliance(type='washing machine', instance=1)])'
Done training!
Estimating power demand for 'ElecMeter(instance=2, building=1, dataset='SynD', appliances=[Appliance(type='fridge', instance=1)])'
Estimating power demand for 'ElecMeter(instance=4, building=1, dataset='SynD', appliances=[Appliance(type='electric space heater', instance=1)])'
Estimating power demand for 'ElecMeter(instance=3, building=1, dataset='SynD', appliances=[Appliance(type='dish washer', instance=1)])'
Estimating power demand for 'ElecMeter(instance=9, building=1, dataset='SynD', appliances=[Appliance(type='clothes iron', instance=1)])'
Estimating power demand for 'ElecMeter(instance=5, building=1, dataset='SynD', appliances=[Appliance(type='washing machine', instance=1)])'
********************
FHMM
********************
Training model for submeter 'ElecMeter(instance=2, building=1, dataset='SynD', appliances=[Appliance(type='fridge', instance=1)])'
Training model for submeter 'ElecMeter(instance=4, building=1, dataset='SynD', appliances=[Appliance(type='electric space heater', instance=1)])'
Training model for submeter 'ElecMeter(instance=3, building=1, dataset='SynD', appliances=[Appliance(type='dish washer', instance=1)])'
Training model for submeter 'ElecMeter(instance=9, building=1, dataset='SynD', appliances=[Appliance(type='clothes iron', instance=1)])'
Training model for submeter 'ElecMeter(instance=5, building=1, dataset='SynD', appliances=[Appliance(type='washing machine', instance=1)])'
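To give an intuition for what CO does at prediction time, here is a minimal pure-Python sketch of combinatorial optimisation. Each appliance is modelled by a handful of discrete power states; for every mains reading, the algorithm tries all combinations of states and keeps the one whose total is closest to the reading. In NILMTK's CO the states are learned from the submeter data during training (by clustering the power readings); the appliances and wattages below are hypothetical, and this is not NILMTK's implementation.

```python
from itertools import product

# Hypothetical appliance models: discrete power states in watts (made-up values)
STATES = {
    "fridge": [0.0, 90.0],
    "kettle": [0.0, 2000.0],
    "heater": [0.0, 1000.0, 2000.0],
}


def co_disaggregate(mains_readings, states):
    """For each mains reading, pick the combination of appliance states whose
    summed power is closest to the reading (ties: first combination found)."""
    names = list(states)
    combos = list(product(*(states[n] for n in names)))  # every joint state
    totals = [sum(c) for c in combos]
    result = {n: [] for n in names}
    for reading in mains_readings:
        best = min(range(len(combos)), key=lambda j: abs(totals[j] - reading))
        for name, power in zip(names, combos[best]):
            result[name].append(power)
    return result


estimate = co_disaggregate([95.0, 2080.0, 3100.0], STATES)
print(estimate)
```

The second reading shows a weakness of CO: 2080 W lies 10 W from two combinations totalling 2090 W (fridge plus kettle, or fridge plus heater at 2000 W), and the sketch simply keeps the first one it finds. The number of combinations is also the product of the state counts, so the search grows exponentially with the number of appliances.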

Finally: Check performance of FHMM and CO

IN[5]:

rmse = {}
mne = {}

for clf_name in classifiers.keys():
    rmse[clf_name] = compute_RMSE(gt, predictions[clf_name])
    mne[clf_name] = compute_MNE(gt, predictions[clf_name])

print('\n\n+++++ RESULTS +++++')

print('\n++ RMSE ++')
print(pd.DataFrame(rmse).round(1))
print('\n++ MNE ++')
print(pd.DataFrame(mne).round(2))

OUT[5]:



+++++ RESULTS +++++

++ RMSE ++
                                                       CO   FHMM
ElecMeter(instance=2, building=1, dataset='SynD...  106.0   22.7
ElecMeter(instance=4, building=1, dataset='SynD...  280.4  248.5
ElecMeter(instance=3, building=1, dataset='SynD...  163.8  116.6
ElecMeter(instance=9, building=1, dataset='SynD...  159.0  113.4
ElecMeter(instance=5, building=1, dataset='SynD...  229.1  222.2

++ MNE ++
                                                       CO  FHMM
ElecMeter(instance=2, building=1, dataset='SynD...  11.22  0.51
ElecMeter(instance=4, building=1, dataset='SynD...   0.54  0.42
ElecMeter(instance=3, building=1, dataset='SynD...   0.36  0.18
ElecMeter(instance=9, building=1, dataset='SynD...   0.39  0.20
ElecMeter(instance=5, building=1, dataset='SynD...   1.13  1.06
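The MNE values are easiest to read against two reference points: a perfect prediction scores 0, and predicting zero power all the time scores exactly 1, because the numerator then equals the denominator. Values above 1, such as CO's 11.22 on meter instance 2 (the fridge) and both algorithms on instance 5 (the washing machine), therefore mean the estimate is worse than always predicting zero. The short check below uses the same formula as compute_MNE on a hypothetical appliance signal.

```python
import numpy as np

# Hypothetical ground truth for one appliance (watts): mostly off, briefly on
gt = np.array([0.0, 0.0, 100.0, 100.0, 0.0])


def mne(gt, pred):
    """Same formula as compute_MNE: squared error over sum of squared ground truth."""
    return np.sum((gt - pred) ** 2) / np.sum(gt ** 2)


all_zero = mne(gt, np.zeros_like(gt))         # reference point: always 1
perfect = mne(gt, gt.copy())                  # reference point: 0
always_on = mne(gt, np.full_like(gt, 100.0))  # wrong on the three 'off' samples

print(all_zero, perfect, always_on)
```

Here the always-on guess is wrong on three of five samples, which already pushes MNE to 1.5.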

That’s all for today. As you can see, SynD can be used like any other NILMTK dataset. Please note that NILMTK has since received a major revision with new APIs, so the code above targets the older interface and may need changes to run on current releases.

best,

Christoph
