Matrix multiplication and the inner product

The inner product is a very general object in mathematics, applying to far more than just linear algebra. However, for our purposes, it is also called the dot product and is simply a way to combine two vectors and get a number out of them. It can give useful information, such as how similar two vectors are. A dot product of zero means two vectors are perpendicular, while a dot product of one means two normalized vectors are parallel. For our purposes not much of this is important; we’ll just need to know how to calculate the inner product in order to calculate the product of two matrices.

Dot product

The dot product for us is defined by $$r\cdot s = \sum_{i=0}^n r_is_i,\quad n=\textrm{length}(r)=\textrm{length}(s).$$ (Strictly, the dot product is only defined when \(r\) is a row and \(s\) is a column, but we don’t need to worry about that just now). So the dot product is the sum of the products of corresponding elements of the vectors. Since we want to write this in code without worrying about whether we’re using columns or rows, we will write a helper function called flatten(). This will put all of the elements of any matrix into a single flat list, making it much easier to write the dot() function. It will also be useful for other things later on.

def flatten(vector):
    return [[element for row in vector for element in row]]

We haven’t used the indexing convention this time, because it doesn’t really get us anything. Notice that both of our for loops are within the same row of the new matrix, so this will just produce a single long row, composed of all the elements of that matrix in the correct order (since lists are an ordered datatype in Python). We enclose the whole thing in a second pair of seemingly redundant braces to ensure that it is a valid matrix, just in case we want to do something else with it later. Here we find what I consider to be one of the few weaknesses of the list comprehension syntax; these loops are evaluated in the reverse order to how they’re written. It would be much more natural if we could write [[element for element in row for row in vector]], but unfortunately Python is built the other way – it works through the second loop first – and we’ll just have to live with it.

Given that, our dot product function is easy (I’ll call it dot() in case we want a more powerful/general inner product later on).

def dot(vector1, vector2):
    vec1 = flatten(vector1)
    vec2 = flatten(vector2)
    length = range(columns(vec1))
    return sum([vec1[i] * vec2[i] for i in length])

We’ve taken advantage of the sum() function in Python which can take an iterable, such as a list, as input and returns the sum of all the elements. Because we have pre-flattened the vectors we don’t need to care about whether they are row or column vectors. We choose to trust that whoever provides the inputs knows what they are doing.

We’ll also want to write a couple of helper functions to get specific rows and columns of a given matrix; if we’re going to dot a row with a column we need to get that row and column first.

def get_row(matrix, row_num):
    return [matrix[row_num]]

That one was fairly easy. So easy in fact that you might wonder why we even bothered, since it would be easy to do that on a case by case basis. The reason is that the column getter is a little more complex, to the point that writing it every time would become tiresome and almost certainly introduce easily avoidable errors. If we need get_column() we might as well define get_row() as well for consistency.

def get_column(matrix, col_num):
    length = range(rows(matrix))
    return [[matrix[i][col_num]] for i in length]

I won’t explain this one. It uses a lot of the same logic we’ve used before. If you don’t understand it, it might be a good idea to go back a little and make sure everything so far has cemented properly in your mind.

Multiplying matrices

Onto the crux of the matter. If you’ve ever done a course on linear algebra you will remember how much of a pain it was learning to multiply matrices, and I’d bet you still avoid doing it by hand whenever possible. So I’m very sorry, but I will give a brief reminder of how it’s done! Just for now, I’m going to introduce the notation \(A^{(n)}\) to refer to the \(n\)th row of a matrix, and the notation \(A_{(n)}\) to refer to the \(n\)th column. This is just so that we can write the definition of the matrix product in a compact way. It is $$AB = \begin{bmatrix}A^{(0)}B_{(0)} & A^{(0)}B_{(1)} & \dots \\ A^{(1)}B_{(0)} & A^{(1)}B_{(1)} & \dots \\ \vdots & \vdots & \ddots\end{bmatrix}$$ What I’m saying here is that the \((i,j)\) element of the solution is the dot product of row \(i\) of the first matrix, and column \(j\) of the second matrix. Importantly, matrices don’t need to be identical in size to be multiplied, but there is a constraint we need to be aware of. If a matrix of size \((m,n)\) is written as \(A_{m\times n}\), then $$A_{m\times n}B_{n\times k} = C_{m\times k}.$$ The product is not defined unless \(n\) is the same for both matrices, and the resultant matrix has \(m\) rows and \(k\) columns.

I won’t go into any greater detail here, but I encourage you to be sure you understand this point before moving on. I would strongly recommend sites like Khan Academy for practice.

Since this is probably the most complex function we’ve written so far, I’ll break ranks here and initially write it using traditional for loops, which we’ll then translate into a list comprehension.

def matrix_matrix(matrix1, matrix2):
    m = range(rows(matrix1))
    k = range(columns(matrix2))
    new_matrix = zeroes(m, k)
    for i in m:
        for j in k:
            new_matrix[i][j] = dot(get_row(matrix1, i), get_column(matrix2, j))

We first initialized a matrix of zeroes so that we can easily alter the values of particular elements, rather than use slightly more fiddly methods like list.append(). Then we simply grab the \(i\)th row and \(j\)th column of matrix1 and matrix2 respectively, and set new_matrix[i][j] equal to their dot product.

Converting this into a list comprehension is fairly trivial. One nice thing about it is that we won’t need to pre-initialize new_matrix. Remembering that loops follow the rule ‘top-down \(\equiv\) out-in’, we get

def matrix_matrix(matrix1, matrix2):
    m = range(rows(matrix1))
    k = range(columns(matrix2))
    return [[dot(get_row(matrix1, i), get_column(matrix2, j) for j in k] for i in m]

Mathematicians, who know from experience how annoying the actual procedure of multiplying matrices can be, may be pleasantly surprised at how simple this function is. We’ve made no assumptions about the size of the input matrices (other then them being valid multiplicable matrices), and that lack of assumptions produces a powerfully general function. It would be easy to think we’d need special cases for square and long/tall rectangular matrices, but not so.

Summary

So far our functional repertoire is:

  • rows(matrix): count the number of rows in a matrix.
  • columns(matrix): count the number of columns in a matrix.
  • add(matrix1, matrix2): compute the element-wise sum of two matrices.
  • subtract(matrix1, matrix2): compute the element-wise difference of two matrices.
  • zeroes(num_rows, num_cols=-1): generate a matrix of the given size filled with zeroes.
  • ones(num_rows, num_cols=-1): generate a matrix of the given size filled with ones.
  • eye(size): generate a square identity matrix of the given size.
  • scalar_matrix(scalar, matrix): compute the product of a scalar value and a matrix.
  • get_row(matrix, row_num): grab a particular matrix row and return it as a valid matrix.
  • get_column(matrix, col_num): grab a particular matrix column and return it as a valid matrix.
  • matrix_matrix(matrix1, matrix2): compute the standard product of two matrices.

Previous: Some stock functions
Next: Normalization and unitary matrices

Leave a Reply