
Week 6 Day 1: Vectorization

Objectives:

  • Become comfortable with arrays > 1D
  • Work on avoiding loops (even nested ones)
import numpy as np
import matplotlib.pyplot as plt

Vectorized calculations

We will be using the term "vectorized" rather loosely to mean array-at-a-time computations. True (hardware-level) vectorization runs a small batch of calculations simultaneously on your device - this may or may not happen in our array computations, and it isn't important at the moment.

a = np.array([1,2,3,4])
b = np.array([5,6,7,8])

# Classic loops:
c1 = np.empty_like(a)
for i in range(len(a)):
    c1[i] = a[i] + b[i]
    
# Array-at-a-time, or "vectorized":
c2 = a + b
print(c1)
print(c2)
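As a rough illustration of why array-at-a-time computation matters, here is a timing sketch using time.perf_counter (the exact numbers are machine-dependent, and the array names are made up for the demo):

```python
import time
import numpy as np

xs = np.random.rand(1_000_000)
ys = np.random.rand(1_000_000)

# Classic loop:
start = time.perf_counter()
out_loop = np.empty_like(xs)
for i in range(len(xs)):
    out_loop[i] = xs[i] + ys[i]
loop_time = time.perf_counter() - start

# Array-at-a-time:
start = time.perf_counter()
out_vec = xs + ys
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.6f} s")
```

On a typical machine the loop is orders of magnitude slower, since every iteration goes through the Python interpreter.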

Let's try in two dimensions:

Note: since I'm explicitly looking at the output values, I'm okay using square matrices. It's often a nice check to use non-square matrices if you have the choice; it catches mismatches in dimensions!

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

# Classic loops:
c1 = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        c1[i,j] = a[i,j] + b[i,j]
    
# Array-at-a-time, or "vectorized":
c2 = a + b

print(c1)
print(c2)

Broadcasting reminder:

We've seen 1D broadcasting quite a bit: scalars can be mixed with arrays:

np.array([1,2]) + 1

That expands to multiple dimensions like this:

(a, b, ..., z) OP (a, b, ..., z) -> element wise
(a, b, ..., z) OP (a, 1, ..., z) -> length 1 instead of b -> duplicated b times
(a, b, ..., z) OP (b, ..., z) -> prepend a dimension of length 1 -> see above

The third rule explains scalars, but is a bit problematic. Try to use the second rule when possible instead.

print(np.ones([2,3]) + np.ones([2, 3]))
print(np.ones([2,3]) + np.ones([1, 3]))
print(np.ones([2,3]) + np.ones([3]))
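Following the advice above, here is a sketch of making the length-1 axis explicit (with np.newaxis and keepdims) so that the second rule applies instead of the third:

```python
import numpy as np

a = np.ones([2, 3])
col = np.array([10.0, 20.0])          # shape (2,)

# (2,3) + (2,) would fail: rule 3 prepends, trying to match (1,2) with (2,3).
# Making the length-1 axis explicit uses rule 2 instead:
print(a + col[:, np.newaxis])         # (2,3) + (2,1) -> (2,3)

# keepdims=True keeps reductions broadcast-ready:
means = a.mean(axis=1, keepdims=True) # shape (2,1) instead of (2,)
print(a - means)                      # subtracts each row's mean
```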

Memory layout

Computer memory is linear. It might be fragmented a bit, but it is not 2D or 3D (and if it were, it would be even harder to manage). So, all objects with more than one dimension have to be laid out in some fashion. There are three choices for 2D arrays:

  • Pointers to 1D arrays. Requires look ups to traverse - bad for performance and memory.
  • Row-major "C" order. Default in NumPy. Similar to the way we write.
  • Column-major "F" order (from Fortran). Default in MATLAB, optional in NumPy.

Law: whenever there are two ways to do something, it will always be done both ways.

A = np.array([[1,2],[3,4]], order="C")
print(A)
print(A.flatten("K"))
A = np.array([[1,2],[3,4]], order="F")
print(A)
print(A.flatten("K"))
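Another way to see the two layouts is the strides attribute - the number of bytes to step to move along each axis. A small sketch (the exact byte counts depend on the dtype's itemsize, so they are written in terms of it here):

```python
import numpy as np

A_c = np.array([[1, 2], [3, 4]], order="C")
A_f = np.array([[1, 2], [3, 4]], order="F")

# C order: stepping one row skips a full row of elements.
print(A_c.strides)  # (2 * itemsize, itemsize)
# F order: stepping one column skips a full column of elements.
print(A_f.strides)  # (itemsize, 2 * itemsize)
```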

This matters when we make loops - how do we want the computer to look up the items in the list?

A = np.array([[1,2],[3,4]])

print('Contiguous:', end=' ')
for i in range(2):
    print(A[0, i], end=' ')
    
print()
print('Not contiguous:', end=' ')
for i in range(2):
    print(A[i, 0], end=' ')

NumPy hides the details whenever it can. Even if you use Fortran-style arrays, everything will still "just work"; unless you try hard, even flatten works the same way - it produces a C-style flat array even if that's not how the data is laid out in memory (pass "K" to get the memory order instead). Any method or function that depends on the order will have an order= argument. Ufuncs will loop in the best order automatically. And transposes are basically no-ops - they just change from C style to F style and vice versa!

print(np.isfortran(A))
print(np.isfortran(A.T))
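A quick way to confirm that the transpose is basically free - it is a view sharing the same memory, with the strides swapped (a small sketch):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = A.T

print(B.base is A)             # True: a view, not a copy
print(np.shares_memory(A, B))  # True: same underlying buffer
print(A.strides, B.strides)    # the strides are simply swapped

B[0, 1] = 99                   # writing through the view...
print(A[1, 0])                 # ...changes the original: 99
```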

Note: we'll discuss why A.T is the transpose of a matrix in a minute.

Questions:

  • Why is np.ones([1,3]).reshape([2]) not allowed?
  • Why is np.ones([3]).reshape([1,3]) a no-op?

np.ones([3]).flatten('K')

Adding a 1 length dimension is basically a "no-op" - the memory of the array does not change, just the interpretation of it.
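We can check this directly: adding the length-1 axis gives a view onto the same memory, whether we use reshape or np.newaxis (a small sketch):

```python
import numpy as np

a = np.ones([3])
b = a.reshape([1, 3])     # add a leading length-1 axis
c = a[np.newaxis, :]      # same effect via indexing

print(b.base is a, c.base is a)  # both are views: no copy was made
b[0, 0] = 7.0
print(a[0])                      # 7.0 - same underlying memory
```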

So why is the auto up-scaling broadcasting (rule 3 above) irritating to use?

Given these two broadcasting expressions:

$$ \left[ \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right] \times \left[ \begin{matrix} 1 & 2 \end{matrix} \right] $$

$$ \left[ \begin{matrix} 1 & 2 \\ 3 & 4 \end{matrix} \right] \times \left[ \begin{matrix} 1 \\ 2 \end{matrix} \right] $$

What happens if the second one is just a 1D array?

a = np.array([[1,2],[3,4]])

print(a * np.array([[1,2]]))
print(a * np.array([[1],[2]]))

print(a * np.array([1,2]))

A = np.array([[1,2],[3,4]])
A.T

Aside:

Python objects (since Python 2.2, at least) provide properties - those are methods that look like members!

So the following:

value = object.member

can secretly call a function! (Think object.member(), though it's technically calling object.member.__get__().) A similar trick exists for setting, as well.

Why would you use them?

  • You can turn a real member into a property as your class gets more complex
  • You can avoid some () (great for chaining)
  • You can take advantage of the nice set syntax

When would you use them?

  • Not for long calculations - a user does not expect a program to hang when accessing or setting a member
  • Only for simple properties that do not have any options
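A minimal sketch of a property that fits these guidelines - a short, option-free computation (the Vector class here is made up for the demo):

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def magnitude(self):
        # Accessed like a member, but secretly calls this method.
        return (self.x ** 2 + self.y ** 2) ** 0.5

v = Vector(3, 4)
print(v.magnitude)   # 5.0 - no parentheses needed
```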

You can access the transpose of a matrix:

a.T

Since a 1D transpose does nothing, the following expression does not do what you might naively expect:

# print(a * np.array([1,2]).T)

The solution? Make this an explicit row or column vector. Generally, when doing linear algebra, you should be using 2D arrays. This does not mean that Python is wrong for having 1D arrays - it just adds flexibility.

Quick way to make a column vector:

print(np.array([[1,2]]).T)
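If the data is already a 1D array, np.newaxis (or reshape) gives the same column vector without retyping the values (a small sketch):

```python
import numpy as np

v = np.array([1, 2])
col = v[:, np.newaxis]    # shape (2, 1): an explicit column vector
print(col)

a = np.array([[1, 2], [3, 4]])
print(a * col)            # scales the rows: [[1, 2], [6, 8]]
```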

Interesting note: The transpose actually just converts an array between C ordering and F ordering!

A.flags
A.T.flags

Vectorization (array computing) with 2D arrays

It can be a little tricky to vectorize code, but the speed is worth it, and it can be easier to read in some cases. Let's take an example:

We have points in 3D space with a weight W. What is the weighted mean?

state = np.random.RandomState(42)
x = state.rand(10)
y = state.rand(10)
z = state.rand(10)
w = state.rand(10)
xsum = 0
ysum = 0
zsum = 0
wsum = 0

for i in range(len(x)):
    xsum += x[i] * w[i]
    ysum += y[i] * w[i]
    zsum += z[i] * w[i]
    wsum += w[i]
    
xsum /= wsum
ysum /= wsum
zsum /= wsum

fig, axs = plt.subplots(1,3, figsize=(14,4))
axs[0].scatter(x,y, s=w*1000)
axs[0].plot(xsum, ysum, 'rx')
axs[1].scatter(x,z, s=w*1000)
axs[1].plot(xsum, zsum, 'rx')
axs[2].scatter(y,z, s=w*1000)
axs[2].plot(ysum, zsum, 'rx')
plt.show()

#state = np.random.RandomState(42)
#M = state.rand(4, 10)
M = np.stack([x,y,z])
M.shape
Wsum = np.sum(w)
Msum = np.sum(M * w, axis=1) / Wsum
Msum.shape
print(Msum == np.array([xsum, ysum, zsum]))

Question: Why is this true?

M1 = np.stack([x,y,z,w])
state = np.random.RandomState(42)
M2 = state.rand(4, 10)

print(M1 == M2)

We can also do this with a 4x10 array.

Wsum2 = np.sum(M2[3])
Msum2 = np.sum(M2[:3] * M2[3], axis=1) / Wsum2
Msum2.shape

Msum2 == np.array([xsum, ysum, zsum])

This (as we've seen) is nicer to generate, and keeps our number of variables down, but has the issue that it becomes less clear to read. We can use one of two tricks:

  • Set w = M2[3] and M = M2[:3]. These will still be the same memory space (no copy is made - they are just "views"), so this does not come at a penalty.
  • Use a record array, or Pandas. We'll cover this (Pandas, at least) later. If the types of the columns do not match, this is the only way to do it.
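The first trick can be sketched like this, verifying that no copies are made:

```python
import numpy as np

state = np.random.RandomState(42)
M2 = state.rand(4, 10)

w = M2[3]     # the weights row
M = M2[:3]    # the x, y, z rows

# Both are views into M2 - no data is copied:
print(w.base is M2, M.base is M2)
print(np.shares_memory(M2, w))

# The readable names make the calculation self-documenting:
weighted_mean = np.sum(M * w, axis=1) / np.sum(w)
print(weighted_mean.shape)   # (3,)
```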