Setting up scikit-learn with Conda


In this post, we set up a Conda environment for scikit-learn. Please note that it covers macOS and Linux only; the installation procedure on Windows might differ from the one presented here.


Virtual environments make it super easy to organise your Python packages. We suggest Conda for this purpose. Why? Well:

With over 6 million users, the open source Anaconda Distribution is the easiest way to do Python data science and machine learning. It includes hundreds of popular data science packages and the conda package and virtual environment manager for Windows, Linux, and MacOS. Conda makes it quick and easy to install, run, and upgrade complex data science and machine learning environments like Scikit-learn, TensorFlow, and SciPy. Anaconda Distribution is the foundation of millions of data science projects as well as Amazon Web Services’ Machine Learning AMIs and Anaconda for Microsoft on Azure and Windows.

Get Anaconda for Python here.

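Open a terminal window and start the installation from the command line. Replace <downloaded-installer>.sh with the file name of the installer you downloaded:

cd Downloads/
sudo bash <downloaded-installer>.sh -u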

This starts the installer, which guides you through several steps. Once it has finished, test the installation by running the command conda in the terminal.

In case the command conda results in a bad interpreter error, apply the following fix:

cd /home/user/
nano .bashrc

Add the line: export PATH=~/anaconda3/bin:$PATH

Finally, execute the command:

source .bashrc

Now create a new Conda environment:

conda create --name eninf

Next, activate the freshly created environment:

conda activate eninf

Then install the software packages:

conda install scikit-learn matplotlib pandas jupyter

Alternatively, open Anaconda Navigator and use the graphical user interface. Please note that the screenshot is only illustrative: our environment is named eninf, and there is no nilmtk.yml in this context.
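To confirm that the packages actually landed in the eninf environment, a short check from Python can help. This is only a sketch: the helper name missing_packages is made up for illustration, and it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation. Note that the import names differ from the conda package names (scikit-learn is imported as sklearn).

```python
import importlib.util
import os
import sys


def missing_packages(import_names):
    """Return the names from import_names that Python cannot find for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Import names, not conda package names: scikit-learn is imported as sklearn.
required = ["sklearn", "matplotlib", "pandas"]

# CONDA_DEFAULT_ENV is set by "conda activate"; it is None outside conda.
print("conda environment:", os.environ.get("CONDA_DEFAULT_ENV"))
print("interpreter prefix:", sys.prefix)
print("missing packages:", missing_packages(required) or "none")
```

Run it with python inside the activated environment. If the interpreter prefix does not point into the eninf environment, the packages were probably installed somewhere else.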


The core of what we need is now installed on your computer. The next step involves Jupyter notebooks, which open up many possibilities for interactive work. To use the environment from a notebook, register it as a Jupyter kernel:

python -m ipykernel install --user --name eninf --display-name "Python (eninf)"
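The command jupyter kernelspec list shows the registered kernels. The same information can be read from disk, since each kernel is a folder containing a kernel.json file. The sketch below assumes the default user-level kernel folders on Linux and macOS; list_kernels is a hypothetical helper, not part of Jupyter.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Map kernel name -> display name for each kernel spec under kernels_dir."""
    kernels = {}
    root = Path(kernels_dir)
    if not root.is_dir():
        return kernels
    for spec_file in sorted(root.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text(encoding="utf-8"))
        kernels[spec_file.parent.name] = spec.get("display_name", "")
    return kernels


# Assumed default user-level locations; adjust if Jupyter is configured differently.
candidates = [
    Path.home() / ".local/share/jupyter/kernels",  # Linux
    Path.home() / "Library/Jupyter/kernels",       # macOS
]
for directory in candidates:
    for name, display_name in list_kernels(directory).items():
        print(f"{name}: {display_name}")
```

After the registration step, the output should contain an entry named eninf with the display name Python (eninf).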

Testing your Installation

The time has come to check your installation. Create a new folder and open Jupyter.

mkdir lab
jupyter notebook lab/

Create a new notebook and don’t forget to select the “Python (eninf)” kernel. Paste in the Vector Quantization example:



# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

from sklearn import cluster

try:  # SciPy >= 1.10 ships face in scipy.datasets (needs the pooch package)
    from scipy.datasets import face
except ImportError:  # older SciPy releases keep it in scipy.misc
    from scipy.misc import face
face = face(gray=True)

n_clusters = 5

X = face.reshape((-1, 1))  # We need an (n_sample, n_feature) array
k_means = cluster.KMeans(n_clusters=n_clusters, n_init=4)
k_means.fit(X)  # fit before reading cluster_centers_ and labels_
values = k_means.cluster_centers_.squeeze()
labels = k_means.labels_

# create an array from labels and values
face_compressed = np.choose(labels, values)
face_compressed.shape = face.shape

vmin = face.min()
vmax = face.max()

# original face
plt.figure(1, figsize=(3, 2.2))
plt.imshow(face, cmap=plt.cm.gray, vmin=vmin, vmax=256)

# compressed face
plt.figure(2, figsize=(3, 2.2))
plt.imshow(face_compressed, cmap=plt.cm.gray, vmin=vmin, vmax=vmax)

# equal bins face
regular_values = np.linspace(0, 256, n_clusters + 1)
regular_labels = np.searchsorted(regular_values, face) - 1
regular_values = .5 * (regular_values[1:] + regular_values[:-1])  # mean
regular_face = np.choose(regular_labels.ravel(), regular_values, mode="clip")
regular_face.shape = face.shape
plt.figure(3, figsize=(3, 2.2))
plt.imshow(regular_face, cmap=plt.cm.gray, vmin=vmin, vmax=vmax)

# histogram
plt.figure(4, figsize=(3, 2.2))
plt.axes([.01, .01, .98, .98])
plt.hist(X, bins=256, color='.5', edgecolor='.5')
values = np.sort(values)
for center_1, center_2 in zip(values[:-1], values[1:]):
    plt.axvline(.5 * (center_1 + center_2), color='b')

for center_1, center_2 in zip(regular_values[:-1], regular_values[1:]):
    plt.axvline(.5 * (center_1 + center_2), color='b', linestyle='--')

plt.show()

Here is what you should see now:

Congrats! You have made it.
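If the face image cannot be loaded, for example because the machine is offline, a smaller check on synthetic data exercises the same KMeans call. This is a minimal sketch rather than part of the original example: the quantize helper and the three synthetic clusters around 10, 100 and 200 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def quantize(values, n_clusters=3, seed=0):
    """Replace each value by the centre of its k-means cluster."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)  # (n_samples, n_features)
    k_means = KMeans(n_clusters=n_clusters, n_init=4, random_state=seed).fit(X)
    centers = k_means.cluster_centers_.squeeze()
    return centers[k_means.labels_], np.sort(centers)


# Synthetic 1-D data: three well-separated groups of 200 samples each.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(loc, 1.0, 200) for loc in (10, 100, 200)])

compressed, centers = quantize(samples)
print("cluster centres:", centers)
print("distinct values after quantization:", len(np.unique(compressed)))
```

The printed centres should lie close to 10, 100 and 200, and the compressed array should contain exactly three distinct values.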