Matrix Multiplication
In this post, we’ll explore matrix multiplication using NumPy and understand how it’s used in Machine Learning applications. We’ll cover the mathematical definition, practical implementation, and important considerations about matrix dimensions and NumPy’s broadcasting behavior.
Definition of Matrix Multiplication
If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, the matrix product $C = AB$ (denoted without multiplication signs or dots) is defined to be the $m \times p$ matrix such that:
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \ldots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}$$

where $a_{ik}$ are the elements of matrix $A$, $b_{kj}$ are the elements of matrix $B$, the summation index $k$ runs over $1, \ldots, n$, and $i = 1, \ldots, m$, $j = 1, \ldots, p$.
In other words: $c_{ij}$ is the dot product of the $i$-th row of $A$ and the $j$-th column of $B$.
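To make the definition concrete, here is a minimal pure-Python sketch of this formula (a plain triple loop over $i$, $j$, and $k$; the vectorized NumPy routines below compute the same thing far more efficiently):

def matmul_naive(A, B):
    # A is m x n, B is n x p, both given as lists of lists; returns the m x p product
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):              # row i of A
        for j in range(p):          # column j of B
            for k in range(n):      # accumulate the dot product of row i and column j
                C[i][j] += A[i][k] * B[k][j]
    return C

For the matrices $A$ and $B$ defined below, matmul_naive returns [[89, 107], [47, 49], [40, 44]], matching the NumPy results.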
Matrix Multiplication in Python
As with the dot product, there are several ways to perform matrix multiplication in Python. As we learned in Linear Algebra 104, calculations are more efficient in vectorized form. Let’s explore the most commonly used NumPy functions.
First, define two matrices:
import numpy as np
A = np.array([[4, 9, 9], [9, 1, 6], [9, 2, 3]])
print("Matrix A (3 by 3):\n", A)
B = np.array([[2, 2], [5, 7], [4, 4]])
print("Matrix B (3 by 2):\n", B)
Output:
Matrix A (3 by 3):
[[4 9 9]
[9 1 6]
[9 2 3]]
Matrix B (3 by 2):
[[2 2]
[5 7]
[4 4]]
Using np.matmul()
You can multiply matrices $A$ and $B$ using the NumPy function np.matmul():
np.matmul(A, B)
This will output a $3 \times 2$ matrix:
array([[ 89, 107],
       [ 47,  49],
       [ 40,  44]])
Using the @ Operator
Python’s @ operator also works for matrix multiplication, giving the same result:
A @ B
Output:
array([[ 89, 107],
       [ 47,  49],
       [ 40,  44]])
Matrix Convention and Broadcasting
Dimension Requirements
Mathematically, matrix multiplication is only defined if the number of columns of matrix $A$ equals the number of rows of matrix $B$. Otherwise, the dot products between rows and columns won’t be defined.
In our example above, changing the order to $BA$ will not work because the dimension rule doesn’t hold:
try:
    np.matmul(B, A)
except ValueError as err:
    print(err)
Output:
matmul: Input operand 1 has a mismatch in its core dimension 0,
with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 2)
The same error occurs with the @ operator:
try:
    B @ A
except ValueError as err:
    print(err)
Important: When using matrix multiplication, you must carefully consider the dimensions. The number of columns in the first matrix must match the number of rows in the second matrix. This is crucial for understanding how Neural Networks work!
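As a quick sanity check while developing, you can compare the shapes directly before multiplying. The helper below is a hypothetical convenience for illustration, not part of NumPy:

def shapes_compatible(A, B):
    # A @ B is defined only if A has as many columns as B has rows
    return A.shape[1] == B.shape[0]

print(shapes_compatible(A, B))  # True:  (3, 3) @ (3, 2) is defined
print(shapes_compatible(B, A))  # False: (3, 2) @ (3, 3) is not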
NumPy’s Vector Shortcut
For vector multiplication, NumPy provides a convenient shortcut. You can define two vectors $x$ and $y$ of the same size (which can be understood as two $3 \times 1$ matrices):
x = np.array([1, -2, -5])
y = np.array([4, 3, -1])
print("Shape of vector x:", x.shape)
print("Number of dimensions of vector x:", x.ndim)
print("Shape of vector x, reshaped to a matrix:", x.reshape((3, 1)).shape)
print("Number of dimensions of vector x, reshaped to a matrix:", x.reshape((3, 1)).ndim)
Output:
Shape of vector x: (3,)
Number of dimensions of vector x: 1
Shape of vector x, reshaped to a matrix: (3, 1)
Number of dimensions of vector x, reshaped to a matrix: 2
Automatic Transposition
By strict matrix convention, multiplying a $3 \times 1$ matrix by another $3 \times 1$ matrix is not defined. You would expect an error, but observe what happens:
np.matmul(x, y)
Output:
3
No error! The result is the dot product $x \cdot y$. When np.matmul receives two 1-D arrays, it promotes the first argument to a row vector ($1 \times 3$) and the second to a column vector ($3 \times 1$), computes the matrix product $x^T y$, and then strips the added dimensions, leaving a scalar.
While this is very convenient, you need to be aware of this NumPy behavior to avoid using it incorrectly. The following will return an error:
try:
    np.matmul(x.reshape((3, 1)), y.reshape((3, 1)))
except ValueError as err:
    print(err)
Output:
matmul: Input operand 1 has a mismatch in its core dimension 0,
with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 1)
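To stay within strict matrix convention instead of relying on the shortcut, reshape $x$ into a row and $y$ into a column yourself. Note that the result is then a $1 \times 1$ matrix rather than a scalar:

np.matmul(x.reshape((1, 3)), y.reshape((3, 1)))

Output:

array([[3]])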
Does np.dot() Work for Matrix Multiplication?
You might wonder: does np.dot() also work for matrix multiplication?
np.dot(A, B)
Output:
array([[ 89, 107],
       [ 47,  49],
       [ 40,  44]])
Yes, it works! For two 2-D arrays, np.dot performs standard matrix multiplication: it takes the dot product of each row of $A$ with each column of $B$, exactly as in the definition above, so it is equivalent to np.matmul here. (For arrays with more than two dimensions the two functions behave differently, so np.matmul or @ is generally preferred for matrices.)
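A quick check confirms that all three approaches agree on this example:

print(np.array_equal(np.dot(A, B), A @ B))      # True
print(np.array_equal(np.matmul(A, B), A @ B))   # True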
Broadcasting with Scalars
A related NumPy convenience is broadcasting, which stretches arrays of different shapes so that element-wise operations still work. For example:
A - 2
Output:
array([[ 2, 7, 7],
[ 7, -1, 4],
[ 7, 0, 1]])
Mathematically, subtraction of a scalar from a $3 \times 3$ matrix $A$ is not defined, but NumPy broadcasts the scalar, effectively expanding it to a $3 \times 3$ array and performing the subtraction element by element.
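Broadcasting is not limited to scalars. A length-3 vector, for example, is broadcast across every row of $A$ (the values here are just for illustration):

A - np.array([1, 2, 3])

Output:

array([[ 3,  7,  6],
       [ 8, -1,  3],
       [ 8,  0,  0]])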
Practical Applications
A practical example of matrix multiplication can be seen in linear regression models. In linear regression:
- Input features are typically represented as a matrix $X$ (where each row is a sample, each column is a feature)
- Model parameters (weights) are represented as a vector $w$
- The prediction is computed as $\hat{y} = Xw$
Matrix multiplication allows us to compute predictions for all samples simultaneously in a vectorized, efficient manner.
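As a minimal sketch with made-up numbers (and ignoring the bias term), predictions for all samples come from a single matrix-vector product:

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])   # 4 samples, 2 features each (hypothetical data)
w = np.array([0.5, -1.0])    # one weight per feature

y_hat = X @ w                # shape (4,): one prediction per sample
print(y_hat)                 # [-1.5 -2.5 -3.5 -4.5]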
Similarly, in Neural Networks:
- Each layer performs matrix multiplication between inputs and weights
- Understanding dimension compatibility is crucial for designing network architectures
- Broadcasting enables efficient batch processing
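As a hedged sketch of a single dense layer (the random weights and the ReLU activation here are illustrative assumptions, not a full network):

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))    # batch of 4 inputs, 3 features each
W = rng.normal(size=(3, 5))    # layer weights: 3 inputs -> 5 units
b = np.zeros(5)                # bias, broadcast across the batch

H = np.maximum(X @ W + b, 0)   # ReLU(XW + b); H has shape (4, 5)
print(H.shape)                 # (4, 5)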
Summary
Key takeaways about matrix multiplication:
- Definition: Element $c_{ij}$ is the dot product of row $i$ of $A$ and column $j$ of $B$
- Dimension rule: Number of columns in first matrix must equal number of rows in second matrix
- NumPy functions: Both np.matmul() and the @ operator work for matrix multiplication
- Vector shortcut: NumPy automatically promotes 1-D vectors to rows or columns for convenient multiplication
- Broadcasting: NumPy broadcasts operations for convenience, but be careful with dimensions
- Applications: Matrix multiplication is fundamental to linear regression and neural networks
Understanding these concepts is essential for working with Machine Learning algorithms, where matrix operations form the computational backbone of most models.
Pro tip: Always be mindful of matrix dimensions when building ML models. Many bugs in neural network code stem from dimension mismatches in matrix operations!