In this manual, we set up a Conda environment for scikit-learn. Please note that this manual covers macOS and Linux systems only. The installation procedure for Windows might differ from the one presented here.
Virtual environments make it super easy to organise your Python packages. We suggest Conda for this purpose. Why? Well:
With over 6 million users, the open source Anaconda Distribution is the easiest way to do Python data science and machine learning. It includes hundreds of popular data science packages and the conda package and virtual environment manager for Windows, Linux, and MacOS. Conda makes it quick and easy to install, run, and upgrade complex data science and machine learning environments like Scikit-learn, TensorFlow, and SciPy. Anaconda Distribution is the foundation of millions of data science projects as well as Amazon Web Services’ Machine Learning AMIs and Anaconda for Microsoft on Azure and Windows.
Open a terminal window and start the installation from the command line:
cd Downloads/
sudo bash Anaconda3-5.2.0-Linux-x86_64.sh -u
This will initiate the installation process, which will guide you through several steps. Install Conda and test the installation by executing the command conda in the command prompt.
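If you prefer a scripted check over typing conda by hand, here is a minimal sketch. It looks up conda on the PATH and asks it for its version. The helper name conda_version is made up for this post; everything else comes from the Python standard library.

```python
import shutil
import subprocess


def conda_version():
    """Return the text printed by `conda --version`, or None if conda is unusable."""
    exe = shutil.which("conda")
    if exe is None:
        return None  # conda is not on PATH
    try:
        result = subprocess.run(
            [exe, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # The launcher exists but cannot be executed (e.g. a broken interpreter line)
        return None
    # Read stdout first and fall back to stderr, in case the version is printed there
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    version = conda_version()
    print(version if version else "conda not found or not runnable")
```

A return value of None is a hint that the PATH or the interpreter line of the conda launcher needs attention.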
In case the command conda results in a bad interpreter error, apply the following fix:
cd /home/user/
nano .bashrc
add the line:
Finally, execute the command:
Now create a new Conda environment:
conda create --name eninf
Next, activate the freshly created environment:
conda activate eninf
Finally, we begin with installing software packages:
conda install scikit-learn matplotlib pandas jupyter
Alternatively, open Anaconda Navigator and use the graphical user interface. Please note that the screenshot is merely illustrative: our environment is named eninf, and there is no nilmtk.yml in this context.
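Whichever route you take, a quick sanity check does not hurt. The sketch below, run with the environment's Python, reports the active Conda environment and lists any packages that cannot be imported. The module list is my own mapping from package names to import names (scikit-learn is imported as sklearn, and ipykernel stands in for the Jupyter install here), and missing_modules is a hypothetical helper.

```python
import importlib.util
import os

# Import names, not package names: scikit-learn is imported as sklearn
REQUIRED_MODULES = ["sklearn", "matplotlib", "pandas", "ipykernel"]


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment name
    print("Active environment:", os.environ.get("CONDA_DEFAULT_ENV"))
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the active environment is not eninf, or a module is reported missing, repeat the activation and installation steps before moving on.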
Basically, the core of what we need is now installed on your computer. The next step involves Jupyter notebooks. Working with Jupyter opens up many possibilities and is often said to be a must-have feature, so we add the environment to Jupyter.
Most of our desired packages are now installed and the eninf environment is ready to roll! To register it as a Jupyter kernel, execute:
python -m ipykernel install --user --name eninf --display-name "Python (eninf)"
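To confirm that the kernel was registered, you can ask Jupyter for its list of kernels. The sketch below parses the output of jupyter kernelspec list --json and looks for eninf. The function names are made up for this post, and the parsing assumes the usual layout with a top-level "kernelspecs" key.

```python
import json
import shutil
import subprocess


def kernel_display_names(kernelspec_json):
    """Map kernel names to display names, given `jupyter kernelspec list --json` text."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return {
        name: info.get("spec", {}).get("display_name", "")
        for name, info in specs.items()
    }


def registered_kernels():
    """Return the registered kernels, or an empty dict if Jupyter cannot be queried."""
    jupyter = shutil.which("jupyter")
    if jupyter is None:
        return {}
    result = subprocess.run(
        [jupyter, "kernelspec", "list", "--json"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return {}
    return kernel_display_names(result.stdout)


if __name__ == "__main__":
    kernels = registered_kernels()
    print(kernels.get("eninf", "kernel 'eninf' is not registered"))
```

If everything went well, the script prints the display name chosen above, which is also the name to look for in the notebook's kernel menu.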
Testing your Installation
The time has come to check your installation. Create a new folder and open Jupyter.
mkdir lab
jupyter notebook lab/
Create a new notebook and don’t forget to use “Python (eninf)”. Fill in the Vector Quantization example:
print(__doc__)

# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

from sklearn import cluster

try:  # SciPy >= 0.16 have face in misc
    from scipy.misc import face
    face = face(gray=True)
except ImportError:
    face = sp.face(gray=True)

n_clusters = 5
np.random.seed(0)

X = face.reshape((-1, 1))  # We need an (n_sample, n_feature) array
k_means = cluster.KMeans(n_clusters=n_clusters, n_init=4)
k_means.fit(X)
values = k_means.cluster_centers_.squeeze()
labels = k_means.labels_

# create an array from labels and values
face_compressed = np.choose(labels, values)
face_compressed.shape = face.shape

vmin = face.min()
vmax = face.max()

# original face
plt.figure(1, figsize=(3, 2.2))
plt.imshow(face, cmap=plt.cm.gray, vmin=vmin, vmax=256)

# compressed face
plt.figure(2, figsize=(3, 2.2))
plt.imshow(face_compressed, cmap=plt.cm.gray, vmin=vmin, vmax=vmax)

# equal bins face
regular_values = np.linspace(0, 256, n_clusters + 1)
regular_labels = np.searchsorted(regular_values, face) - 1
regular_values = .5 * (regular_values[1:] + regular_values[:-1])  # mean
regular_face = np.choose(regular_labels.ravel(), regular_values, mode="clip")
regular_face.shape = face.shape
plt.figure(3, figsize=(3, 2.2))
plt.imshow(regular_face, cmap=plt.cm.gray, vmin=vmin, vmax=vmax)

# histogram
plt.figure(4, figsize=(3, 2.2))
plt.clf()
plt.axes([.01, .01, .98, .98])
plt.hist(X, bins=256, color='.5', edgecolor='.5')
plt.yticks(())
plt.xticks(regular_values)
values = np.sort(values)
for center_1, center_2 in zip(values[:-1], values[1:]):
    plt.axvline(.5 * (center_1 + center_2), color='b')

for center_1, center_2 in zip(regular_values[:-1], regular_values[1:]):
    plt.axvline(.5 * (center_1 + center_2), color='b', linestyle='--')

plt.show()
Here is what you should see now:
Congrats! You have made it.