Static Computation Graphs

Week 14 day 1: Static and Dynamic Computation Graphs

Objectives:

  • Quick note about Numpy temporaries
  • Cover a couple of common ML frameworks
  • Study automatic differentiation methods

While Google's TensorFlow and Facebook's PyTorch are popular libraries, and (with some caveats) are available in the default Anaconda channel, they are not installed by default. Let's look briefly at several libraries, then we will install and play with those two.

Install

Let's set up a Conda environment with the necessary libraries. While Anaconda does have PyTorch, it only has it for Linux, so let's add the pytorch channel and install pytorch from there.

conda create -n mlwork python==3.6 anaconda ipykernel tensorflow pytorch -c pytorch

You can replace the metapackage anaconda with the list of packages you will be using. If you are on OSC, logging out then back in should be enough to set this up for Jupyter. On other systems using the latest Anaconda:

conda activate mlwork
python -m ipykernel install --user --name <pick_name_here>
conda deactivate

Or, an older Anaconda:

source activate mlwork
python -m ipykernel install --user --name <pick_name_here>
source deactivate

(Skip source on Windows)
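
To check that the environment works (a quick sanity check of my own, not part of the original setup), both libraries should import cleanly inside the new kernel:

# Run in the new kernel; both imports should succeed
import tensorflow as tf
import torch

print(tf.__version__, torch.__version__)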

ML Libraries

  • Theano: A research project, and one of the very first static graph systems. Killed off by newer frameworks, except for its use in PyMC3.
  • Tensorflow: Powers Android's ML. Rapidly became the most popular framework. Often used as a backend to a framework rather than directly (the popular Keras framework is now built in). Shifting from static to dynamic graphs as the default in 2.0 to look more like PyTorch (easier to learn and debug). Still no Python 3.7 support, so use Python 3.6.
  • PyTorch: Comes from the old Lua-based Torch. Very easy to debug. Similar to Numpy, though not identical. Still heading toward version 1.0, which will support production use (Facebook currently has Caffe2, which is production ready). Very popular for such a young library. Has a built-in ML framework. Amazing documentation. The best way to learn TensorFlow?!
  • Chainer: The original basis of PyTorch's design. Stays as close as possible to Numpy (CuPy was developed to power the GPU side of this framework).
  • CNTK: Microsoft's offering, specializes in natural language processing.

All the libraries have CPU/GPU support, decent performance, etc. There are lots more; these are just the most popular.

ML libraries without ML?

You may have noticed that I'm covering ML libraries before covering ML. That's because we have already covered fitting, and ML is mostly fitting. Let's review fitting:

  1. You have a sample of an underlying distribution
  2. You build a model of that distribution
  3. There are (many) parameters that describe the model
  4. We select a metric that compares the model to the data
  5. We adjust the parameters to minimize the metric, so the model gives a good description of the data

What's different in ML? We usually use a different metric (a "loss") instead of the NLL, and the models are larger but made of simpler parts. That's about it! So how can we improve fitting?

Automatic Differentiation

Why use a framework instead of just writing plain Numpy? A few possible reasons:

  • Avoid temporaries: Numpy spends extra memory and time on temporaries. It would be nice to avoid them (note: Numba does this)
  • Performance: Numpy runs a separate piece of compiled code for each calculation. Some of these could be combined (note: Numba does this)
  • GPU: Many systems have GPUs, and GPUs are great for massive but "simple" calculations. Numpy does not directly support GPUs (but CuPy, Numba, and others do).
  • Differentiation: You can avoid a lot of calculations if you can get gradients easily! This is a big deal in most ML frameworks; see the finite-difference sketch just below for what you save.
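
For contrast, getting gradients numerically costs at least one extra evaluation of the model per parameter. A minimal forward-difference sketch (the helper num_grad is my own illustration, not part of any framework):

def num_grad(f, params, eps=1e-8):
    """Forward-difference gradient: one extra call to f per parameter."""
    base = f(params)
    grads = []
    for i, p in enumerate(params):
        shifted = list(params)
        shifted[i] = p + eps
        grads.append((f(shifted) - base) / eps)
    return grads

# Example: ∂/∂x (x²y) = 2xy = 12 and ∂/∂y (x²y) = x² = 4 at (x=2, y=3)
print(num_grad(lambda p: p[0]**2 * p[1], [2.0, 3.0]))

An autodiff framework produces all of these partial derivatives in a single backward pass, at roughly the cost of one extra forward pass, with no eps to tune.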

Quick aside: Temporaries

Numpy classically had issues with temporaries. It's much better now (at least on Numpy 1.13+ on Linux and macOS)

import numpy as np

np.random.seed(42)
N = 1_000_000
a = np.random.rand(N)
b = np.random.rand(N)
c = np.random.rand(N)

%%timeit
s = a + b + c # How many arrays are in memory? (classic: 5, 1.13+: 4 on some systems)

%%timeit
ab = a + b
s = ab + c # Right here, how many arrays are in memory? (5)
del ab

%%timeit
s = a + b
s += c # (4)

Depending on your system, the first time should look like one of the other two times.
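
If you want to manage temporaries by hand, most Numpy ufuncs take an out= argument so you can reuse a preallocated buffer. A small sketch of the same sum:

s = np.empty_like(a)   # preallocate the output once
np.add(a, b, out=s)    # a + b written directly into s
np.add(s, c, out=s)    # add c in place; no temporaries at all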

Fitting revisit

Let's make a small sample of data to fit.

from scipy.optimize import minimize
np.random.seed(42)
dist = np.random.normal(loc=1, scale=2., size=100_000)
def gaussian(x, μ, σ):
    return 1/np.sqrt(2*np.pi*σ**2) * np.exp(-(x-μ)**2/(2*σ**2))

def nll(params, dist):
    mean, sigma = params
    return -np.sum(np.log(gaussian(dist, mean, sigma)))
minimize(nll, (.5, 1.), args=(dist,))
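
If you capture the OptimizeResult (the name res here is my own), its x should land near the true loc=1, scale=2 used to generate the sample:

res = minimize(nll, (.5, 1.), args=(dist,))
print(res.x)  # expect roughly [1, 2]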

Dynamic graphs: PyTorch

When converting to Torch, let's notice a couple of quirks compared to Numpy:

  • The basic "ndarray" in ML frameworks is usually called a tensor. No, it is not a true mathematical tensor.
  • Torch loves 32-bit floats. You'll need to explicitly request 64-bit for every tensor.
  • Use the function tensor to make tensors, not the constructor Tensor (see the illustration below).
  • Use math functions from torch rather than Numpy. Numpy 1.13+ lets a custom library hook into its functions, but Torch does not (yet?) use that mechanism.
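
A quick illustration of the tensor-versus-Tensor trap (the printed values are what I'd expect; worth verifying on your version):

import torch

print(torch.tensor([1.0, 2.0]).dtype)  # torch.float32: floats default to 32-bit
print(torch.tensor(3))                 # tensor(3): a 0-d tensor holding the value 3
print(torch.Tensor(3))                 # a length-3 *uninitialized* float tensor!
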
import torch

tdist = torch.tensor(dist, dtype=torch.float64)
tmean = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
tsigma = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
    
def tgaussian(x, μ, σ):
    return 1/torch.sqrt(2*np.pi*σ**2) * torch.exp(-(x-μ)**2/(2*σ**2))

result = -torch.sum(torch.log(tgaussian(tdist, tmean, tsigma)))
print(result.item())
result.backward()
print(tmean.grad.item(), tsigma.grad.item())

Unfortunately, this is not trivial to drop into minimize: the gradients live on the graph attached to result, and that graph is rebuilt from scratch every time this code runs, so you would need a wrapper that shuttles values and gradients between Torch and scipy.
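
One alternative is to skip minimize entirely and let PyTorch do the optimization with one of its built-in optimizers. A minimal, untuned sketch using torch.optim.Adam (the learning rate and step count are my own guesses):

opt = torch.optim.Adam([tmean, tsigma], lr=0.01)
for step in range(1000):
    opt.zero_grad()   # gradients accumulate by default; clear them first
    nll_t = -torch.sum(torch.log(tgaussian(tdist, tmean, tsigma)))
    nll_t.backward()  # build the graph for this pass and backpropagate
    opt.step()        # update tmean and tsigma in place
print(tmean.item(), tsigma.item())  # should approach the true 1 and 2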

Static Graphs: TensorFlow

TensorFlow does not make any pretense about looking or acting like Numpy.

  • The basic "ndarray" is again a Tensor, though there are specialized tensors for different uses: placeholders, constants, and more.
  • Use math functions from TensorFlow rather than Numpy. It's far too different for Numpy's hooks; you are merely "scheduling" an operation, not performing one.
  • TensorFlow has an "Eager Execution" mode that acts like PyTorch and will become the default in 2.0. We'll stay with static graphs for the moment, though.
import tensorflow as tf
# Make the distribution a constant Tensor;
# it does not change between iterations, so TensorFlow can optimize for that.
x = tf.constant(dist)

# Make placeholders for values we are going to "feed" in
# (0D tensor == scalar)
tf_mean = tf.placeholder(dtype=tf.float64)
tf_sigma = tf.placeholder(dtype=tf.float64)
# tf_gaussian is a Tensor graph (like a function) that can compute this expression!
tf_gaussian = 1/tf.sqrt(2*np.pi*tf_sigma**2) * tf.math.exp(-(x-tf_mean)**2/(2*tf_sigma**2))
# This is still just a "description of what to do", no computation has been done yet
tf_nll = -tf.reduce_sum(tf.math.log(tf_gaussian))
# We can compute symbolic derivatives with the graph, as well
tf_grads = tf.gradients(tf_nll, [tf_mean, tf_sigma])
with tf.Session() as sess:

    loss_value = sess.run(tf_nll,
                          feed_dict={tf_mean:0.5,
                                     tf_sigma:0.5})
    
    grads = sess.run(tf_grads,
                     feed_dict={tf_mean:0.5,
                                tf_sigma:0.5})

    print(loss_value, grads)

Notes:

  • You have to run inside a session. Sessions are slow to start/stop, so loops should be inside the session (sess.run is fast). A sketch follows these notes.
  • The actual computation is quite fast, and can happen on the GPU.
  • TensorFlow is a bit verbose and tricky to set up, but can be amazingly clear.
  • (Not shown) TensorFlow has a great graph visualization tool, TensorBoard (though using it is tricky)
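
As an illustration of the first note, keeping the loop inside one session, here is a minimal plain-gradient-descent sketch reusing the graph above (the learning rate and step count are untuned choices of my own):

with tf.Session() as sess:
    mean, sigma = 0.5, 0.5
    lr = 1e-6  # tiny, because the summed NLL has very large gradients
    for step in range(1000):
        g_mean, g_sigma = sess.run(tf_grads,
                                   feed_dict={tf_mean: mean, tf_sigma: sigma})
        mean -= lr * g_mean
        sigma -= lr * g_sigma
    print(mean, sigma)  # should drift toward the true values 1 and 2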