While Google's TensorFlow and Facebook's PyTorch are popular libraries, and (with some caveats) are available in the default Anaconda channel, they are not installed by default. Let's look briefly at several of the libraries, then install and play with those two.
Install
Let's set up a Conda environment with the necessary libraries. While Anaconda does have PyTorch, it only has it for Linux, so let's add the pytorch channel and install pytorch from there.
conda create -n mlwork python==3.6 anaconda ipykernel tensorflow pytorch -c pytorch
You can replace the metapackage anaconda with the list of packages you will be using. If you are on OSC, logging out and then back in should be enough to set this up for Jupyter. On other systems, using the latest Anaconda:
conda activate mlwork
python -m ipykernel install --user --name <pick_name_here>
conda deactivate
Or, with an older Anaconda:
source activate mlwork
python -m ipykernel install --user --name <pick_name_here>
source deactivate
(Skip source on Windows)
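To confirm that the new environment actually works, a quick import check along these lines should run without errors (just a sanity check; the exact versions will differ):
# Run this inside the mlwork environment (or the new Jupyter kernel)
import tensorflow as tf
import torch
print(tf.__version__, torch.__version__)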
ML Libraries
- Theano: A research project, and one of the very first static-graph systems. Killed off by newer frameworks, except for its use in PyMC3.
- TensorFlow: Powers Android's ML. Rapidly became the most popular framework. Often used as a backend to a framework rather than directly (the popular Keras framework is now built in). Shifting from static graphs to dynamic graphs as the default in 2.0 to look more like PyTorch (easier to learn and debug). Still no Python 3.7 support, so use Python 3.6.
- PyTorch: Comes from the old Lua-based Torch. Very easy to debug. Similar to Numpy, but not quite the same. Still heading to version 1.0, which will support production use (Facebook currently has Caffe2 for production). Very popular for such a young library. Has a built-in ML framework. Amazing documentation. Best way to learn TensorFlow?!
- Chainer: The original basis of PyTorch's design. Stays as close as possible to Numpy (and CuPy was developed to power the GPU side of this framework).
- CNTK: Microsoft's offering, specializes in natural language processing.
All the libraries have CPU/GPU support, decent performance, etc. There are lots more; these are just the most popular.
Recall how a classic (likelihood-style) fit works:
- You have a sample of an underlying distribution
- You build a model of that distribution
- There are (many) parameters that describe the model
- We select a metric that compares the model to the data
- We adjust the parameters to minimize that metric, giving a good description of the data
What's different in ML? We usually use a different metric instead of the NLL, and the models are larger but made of simpler parts. That's about it! So how can we improve fitting?
Automatic Differentiation
Why use a framework instead of just writing plain Numpy? A few possible reasons:
- Avoid temporaries: Numpy takes extra memory and time with temporaries. It would be nice to avoid them (note: Numba does this)
- Performance: Numpy runs separate bits of compiled code for each calculation. Some of these could be combined (note: Numba does this)
- GPU: Many systems have GPUs, and GPUs are great for massive but "simple" calculations. Numpy does not directly support GPUs (but CuPy, Numba, and others do).
- Differentiation: You can avoid a lot of calculation if you can get gradients easily! This is a big deal for most ML frameworks.
import numpy as np
np.random.seed(42)
N = 1_000_000
a = np.random.rand(N)
b = np.random.rand(N)
c = np.random.rand(N)
%%timeit
s = a + b + c # How many arrays are in memory? (classic: 5, 1.13+: 4 on some systems)
%%timeit
ab = a + b
s = ab + c # Right here, how many arrays are in memory? (5)
del ab
%%timeit
s = a + b
s += c # (4)
Depending on your system, the first time should look like one of the other two times.
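As an aside, this is what "Numba does this" means above: a jitted function fuses the three additions into one pass over the data, with no temporary arrays. A minimal sketch, assuming Numba is installed (add3 is just an illustrative name):
import numba
@numba.njit
def add3(x, y, z):
    # Single loop over the data: no temporary arrays are allocated
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = x[i] + y[i] + z[i]
    return out
s = add3(a, b, c)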
from scipy.optimize import minimize
np.random.seed(42)
dist = np.random.normal(loc=1, scale=2., size=100_000)
def gaussian(x, μ, σ):
    return 1/np.sqrt(2*np.pi*σ**2) * np.exp(-(x-μ)**2/(2*σ**2))
def nll(params, dist):
    mean, sigma = params
    return -np.sum(np.log(gaussian(dist, mean, sigma)))
minimize(nll, (.5, 1.), args=(dist,))
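As a quick cross-check (not part of the original cell), the fitted parameters should land close to the sample mean and standard deviation of dist:
res = minimize(nll, (.5, 1.), args=(dist,))
print(res.x)                    # fitted (mean, sigma)
print(dist.mean(), dist.std())  # should be close to (1, 2)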
Dynamic graphs: PyTorch
When converting to Torch, notice a few quirks compared to Numpy:
- The basic "ndarray" in ML frameworks is usually called a tensor. No, it is not a true mathematical tensor.
- Torch loves 32-bit floats. You'll need to request 64 for every tensor.
- Use the function tensor to make tensors, not the constructor Tensor
- Use math functions from torch rather than Numpy. Numpy 1.13+ has the ability to call a custom library's functions, but Torch does not (yet?) use it.
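A quick illustration of the dtype and constructor quirks (assuming a recent PyTorch):
import torch
# torch.tensor infers the dtype from its input; Python floats become 32-bit
print(torch.tensor([1.0, 2.0]).dtype)                       # torch.float32
print(torch.tensor([1.0, 2.0], dtype=torch.float64).dtype)  # torch.float64
# The constructor torch.Tensor always hands back a float32 tensor
print(torch.Tensor([1, 2]).dtype)                           # torch.float32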
import torch
tdist = torch.tensor(dist, dtype=torch.float64)
tmean = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
tsigma = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
def tgaussian(x, μ, σ):
    return 1/torch.sqrt(2*np.pi*σ**2) * torch.exp(-(x-μ)**2/(2*σ**2))
result = -torch.sum(torch.log(tgaussian(tdist, tmean, tsigma)))
print(result.item())
result.backward()
print(tmean.grad.item(), tsigma.grad.item())
Unfortunately, this is not trivial to plug into minimize, since autograd needs the graph that produced result, and that graph is rebuilt every time the expression runs.
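One workaround is to skip minimize entirely and let one of PyTorch's built-in optimizers drive the fit; a minimal sketch (the learning rate and step count are illustrative, not tuned):
# Fit by iterating an optimizer instead of calling scipy's minimize
opt = torch.optim.Adam([tmean, tsigma], lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = -torch.sum(torch.log(tgaussian(tdist, tmean, tsigma)))
    loss.backward()
    opt.step()
print(tmean.item(), tsigma.item())  # should approach the true (1, 2)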
Static Graphs: TensorFlow
TensorFlow does not make any pretense about looking or acting like Numpy.
- The basic "ndarray" is a Tensor again. Though you have customized tensors for different uses: placeholders, constants, and more.
- Use math functions from TensorFlow rather than Numpy. It's far too different; you are simply "scheduling" an operation, not performing one.
- TensorFlow has an "Eager Evaluation" mode that acts like PyTorch, and will become the "default" in 2.0. We'll stay with static graphs at the moment, though.
import tensorflow as tf
# Make the distribution a constant Tensor;
# it does not change in iterations so TensorFlow can optimize for that.
x = tf.constant(dist)
# Make placeholders for values we are going to "feed" in
# (0D tensor == scalar)
tf_mean = tf.placeholder(dtype=tf.float64)
tf_sigma = tf.placeholder(dtype=tf.float64)
# tf_gaussian is a Tensor graph (like a function) that can compute this expression!
tf_gaussian = 1/tf.sqrt(2*np.pi*tf_sigma**2) * tf.math.exp(-(x-tf_mean)**2/(2*tf_sigma**2))
# This is still just a "description of what to do", no computation has been done yet
tf_nll = -tf.reduce_sum(tf.math.log(tf_gaussian))
# We can compute symbolic derivatives with the graph, as well
tf_grads = tf.gradients(tf_nll, [tf_mean, tf_sigma])
with tf.Session() as sess:
    loss_value = sess.run(tf_nll,
                          feed_dict={tf_mean: 0.5,
                                     tf_sigma: 0.5})
    grads = sess.run(tf_grads,
                     feed_dict={tf_mean: 0.5,
                                tf_sigma: 0.5})
print(loss_value, grads)
Notes:
- You have to run inside a session. Sessions are slow to start/stop, so loops should be inside the session (sess.run is fast); see the sketch after these notes.
- The actual computation is quite fast, and can happen on the GPU.
- TensorFlow is a bit verbose and tricky to set up, but can be amazingly clear.
- (Not shown) TensorFlow has a great graph visualization tool (TensorBoard), though using it is a bit tricky.
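For example, the graph above can be wrapped in a small helper and handed straight to scipy's minimize, keeping the whole loop inside one session (a sketch; nll_and_grad is a hypothetical helper, not part of the original code):
with tf.Session() as sess:
    def nll_and_grad(params):
        # One sess.run per evaluation; the graph itself is built only once
        loss, grad = sess.run([tf_nll, tf_grads],
                              feed_dict={tf_mean: params[0], tf_sigma: params[1]})
        return loss, np.asarray(grad)
    print(minimize(nll_and_grad, (0.5, 1.0), jac=True))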