Few things are as ubiquitous in game and graphics programming as a matrix. In this installment of the math primer, we take a look at these structures, investigating not only their numerical significance, but also what they represent visually. Next time, we’ll see how to combine them with vectors and with other matrices to form complex transformations, which we rely heavily on in game code.
When we refer to a coordinate system, we represent the system with a central point (the origin) and two (for 2D) or three (for 3D) linearly independent vectors, also called axes. Look back to the earlier primer articles for a refresher if you need it. To measure a point with respect to this coordinate system, we use an ordered 2- or 3-tuple, such as (3, 5, 1), which represents starting at the origin, then moving 3 units along the first vector's direction, 5 units along the second vector’s direction, and finally 1 unit along the third vector’s direction. The three vectors used in this way are called a basis. A basis is the set of linearly independent vectors used to define a coordinate system (or more accurately, a coordinate frame or frame of reference). If we label these vectors u, v, and w, then we can write the 3-tuple above as 3u + 5v + 1w. We can even define left or right handedness of the basis by looking at the way we orient the third vector with respect to the first two. Specifically, third here means the one that we write last when expressing the tuple. When someone mentions the standard basis, they’re referring to the set of vectors (normally labeled i, j, and k) whose values are (1, 0, 0), (0, 1, 0), and (0, 0, 1) respectively, normally anchored at the origin O (0, 0, 0). Let’s take a look at Figure 1 for a more visual representation.
Something important to note here is that we can express any basis in terms of another basis. In particular, we can express any basis in terms of the standard basis. If we look at Figure 1 a little more closely, we see that even though u, v, and w are vectors making up a basis, they’re still just vectors, and that means we can express them in terms of i, j, and k. For instance, if we use the basis consisting of u, v, and w as our frame of reference, then u is simply (1, 0, 0), since it is one unit along itself and zero units along v and w. However, using the standard basis as our point of reference, we might measure u to be some vector such as (0.6, 0.6, 0). This is a key concept to keep in mind: basis vectors can all be measured and represented in terms of another basis, including the standard one.
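To make this concrete, here is a minimal Python sketch. The values chosen for u, v, and w are made up for illustration (they are not the vectors from Figure 1), `combine` is a hypothetical helper name, and both frames are assumed to share the same origin.

```python
# Hypothetical basis vectors, each measured in the standard basis (i, j, k).
u = (1.0, 1.0, 0.0)
v = (-1.0, 1.0, 0.0)
w = (0.0, 0.0, 1.0)

def combine(coords, basis):
    """Return the sum of each coordinate times its matching basis vector."""
    result = [0.0, 0.0, 0.0]
    for c, b in zip(coords, basis):
        for axis in range(3):
            result[axis] += c * b[axis]
    return tuple(result)

# The tuple (3, 5, 1) measured in the u, v, w frame means 3u + 5v + 1w.
point_in_standard = combine((3, 5, 1), (u, v, w))
print(point_in_standard)  # (-2.0, 8.0, 1.0)

# In its own frame u is (1, 0, 0); in the standard frame that is just u.
print(combine((1, 0, 0), (u, v, w)))  # (1.0, 1.0, 0.0)
```

The same point has two different tuples, (3, 5, 1) and (-2, 8, 1), depending on which basis we measure it against.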
It’s getting a little tedious to keep referring to a basis in a manner such as “the basis consisting of basis vectors u, v, and w”. Wouldn’t it be nice if there were a succinct notation to capture that? Well, there is! We can write the basis vectors in block form, like:
Where u, v, and w are expressed as vectors in terms of the standard basis. This is a matrix, and the expanded form looks like this:
Now, we need to clarify a few things here before continuing. The way I’ve laid out the matrix above is not the only correct way to lay out a basis. Just as coordinate systems have left handed and right handed variations, which are just a matter of preference and convention, matrices too have some convention considerations to make. Matrices may be used in what’s called row-major or column-major form, which directly correlates with how you wish to write vectors. Vectors can be written as row-vectors or column-vectors. See Figure 2 for a comparison of the two. It’s important to realize that just as picking one handedness of coordinate system over another had no bearing on the result, just the shape of the formulae, the same is true of how we express our vectors and matrices.
Figure 2: The row-vector on the left is written with the 3 components horizontal. The column vector on the right is written with the 3 components laid out vertically.
How we write out matrices is directly related to what kind of vectors we use. If we use row vectors, then we lay down our basis vectors as rows of the matrix. If we are using column vectors, then we lay down the basis vectors as columns in the matrix. See Figure 3 below:
Figure 3: The row-major matrix on the left consists of the basis vectors written as rows of the matrix. The column-major matrix on the right consists of the basis vectors written as columns in the matrix.
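As a rough sketch of the two layouts, the snippet below builds both matrices from the same three basis vectors. The vector values are hypothetical, and storing a matrix as a Python list of rows is just a choice made for this example.

```python
# Hypothetical basis vectors expressed in the standard basis.
u = (1, 1, 0)
v = (-1, 1, 0)
w = (0, 0, 1)

# Row-vector convention: each basis vector becomes a row of the matrix.
row_form = [list(u), list(v), list(w)]

# Column-vector convention: each basis vector becomes a column of the matrix.
col_form = [[u[r], v[r], w[r]] for r in range(3)]

for row in row_form:
    print(row)
# [1, 1, 0]
# [-1, 1, 0]
# [0, 0, 1]

for row in col_form:
    print(row)
# [1, -1, 0]
# [1, 1, 0]
# [0, 0, 1]
```

Both matrices hold exactly the same nine numbers; only their arrangement differs.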
Again, which we choose doesn’t make a difference as long as we are consistent with our convention and form our formulae appropriately for the form we’ve chosen. In mathematics, and in most texts on mathematics and graphics, column form is more widespread. However, in the actual game and graphics world things are a little more divided. For instance, DirectX and XNA both use row form, but OpenGL uses column form. You could write an entire library in column form and still use DirectX or XNA to render; you just need to convert between the forms at the appropriate places. The conversion between the two, called a transpose, will be covered a bit later in this post.
I will choose to use column vector form even though most of my coding examples will likely be either DirectX or XNA. You might think that seems counterintuitive, but I believe writing out mathematics in a manner which is more consistent with academia is more natural, and will match what you find in research papers, math books, and other references around the web. My coding examples are just that: examples. I don’t think it’s justified to form my mathematical explanations around my examples’ choice of graphics API.
NOTE: It is important to realize that this discussion of row versus column major matrix is only relevant when talking about vectors. As we’ll see in a moment, matrices can be used for many other things besides vectors, and in those cases there is only a single form of a matrix. In other words: row major and column major matrices are still the same matrix, we’ve just chosen to impose a convention on how we write vectors in terms of matrices.
While it’s convenient to write vectors as single row or column matrices, and basis vectors as the rows or columns of a 9 element square matrix, these are far from the only uses of a matrix. In fact, most formal definitions of a matrix are something along the lines of “a rectangular arrangement of numbers, organized into rows and columns”, which says nothing about vectors.
Let’s try and define matrices a little more generally now. We know that they look like blocks of numbers, and that they are considered to have rows and columns, so let’s add that we can refer to the number of rows and columns as “n by m”, or n x m, where n is the number of rows and m is the number of columns. This is summarized in Figure 4.
Figure 4: Matrices are labeled using row x column. From left to right, the matrix sizes are: 2x3, 1x2, 3x1, and 3x3.
We can see from the first matrix in the figure that elements within the matrix are always labeled using the row number, then the column number, and that both are 1-based. The subscripts in the first matrix show the row and column numbers of element a_ij, where i is the row and j is the column. The second and third matrices could be interpreted as row and column vectors, respectively. The final matrix in the figure leads us to a few more definitions.
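One practical wrinkle: most programming languages index from 0, while the mathematical labels are 1-based. Here is a small sketch of the bookkeeping, assuming a matrix stored as a list of rows. The matrix values and the helper names `shape` and `element` are made up for this example.

```python
# A 2x3 matrix stored as a list of rows (values are arbitrary).
A = [
    [1, 2, 3],
    [4, 5, 6],
]

def shape(m):
    """Return (rows, columns) for a list-of-rows matrix."""
    return (len(m), len(m[0]))

def element(m, i, j):
    """Return a_ij using the 1-based row and column labels from the text."""
    return m[i - 1][j - 1]

print(shape(A))          # (2, 3)
print(element(A, 2, 1))  # 4, the element in row 2, column 1
```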
If the number of rows and columns in a matrix are the same, the matrix is said to be a square matrix. Since the number of rows and columns are the same, we can refer to square matrices with a single size dimension. For instance, we can say something is a dimension 3 square matrix, meaning it’s a 3x3. The set of all elements in the matrix for which the row and column are the same (a_11, a_22, etc.) is called the diagonal of the matrix. For example, in the rightmost matrix above, the diagonal is the set of numbers (4, 3, 8). If the only nonzero elements in a matrix are within the diagonal, then the matrix is called a diagonal matrix. So, to summarize, our final matrix above can be called a square, diagonal matrix. Finally, as a matter of notation, I’ll use capital, bold, italic letters to represent a matrix, for example, a matrix M. This should make it easier to tell them apart from vectors (bold lowercase) and points (italic capital).
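These definitions translate almost directly into code. The sketch below assumes a list-of-rows representation, and the matrix D is an assumed stand-in for the rightmost matrix in Figure 4: a 3x3 with (4, 3, 8) on the diagonal and zeros everywhere else. The function names are hypothetical.

```python
def is_square(m):
    """True when every row is as long as the number of rows."""
    return all(len(row) == len(m) for row in m)

def diagonal(m):
    """Elements whose row and column index match (square matrices only)."""
    return [m[i][i] for i in range(len(m))]

def is_diagonal(m):
    """True when m is square and every off-diagonal element is zero."""
    return is_square(m) and all(
        m[r][c] == 0
        for r in range(len(m))
        for c in range(len(m))
        if r != c
    )

D = [
    [4, 0, 0],
    [0, 3, 0],
    [0, 0, 8],
]
print(is_square(D), diagonal(D), is_diagonal(D))  # True [4, 3, 8] True
```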
There is a special matrix, called the identity matrix, which is a square, diagonal matrix with only 1s in the diagonal. It can be any size and is normally written as I or I_n (where n is the size). We’ll see in a moment that it acts as the multiplicative identity for matrices (hence the name).
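Building an identity matrix of any size takes only a line or two. This is a minimal sketch using a list of rows, and `identity` is a name chosen for this example.

```python
def identity(n):
    """Build the n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

print(identity(3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```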
Now that we have notation and terminology out of the way, let’s start looking at operations we can do with matrices!
The most trivial operation we can perform on a matrix is taking its transpose. This swaps the rows of the matrix with its columns: row i of the original becomes column i of the transpose. If the matrix begins as an n x m matrix, then the transpose is an m x n. The transpose of M is written as M^T. Examples:
There are a few important observations to make. Firstly, if we look back to our discussion on row and column vectors and matrices, then we can see clearly now that to convert between the two we take the transpose. Secondly, it’s important to note that the diagonal of a matrix remains the same after taking its transpose. This will always be the case. In fact, many like to think of taking the transpose as reflecting the non-diagonal elements across the diagonal.
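Here is a short sketch of the transpose in Python, assuming matrices stored as lists of rows. The example values are arbitrary, and `transpose` is simply a helper written for this illustration.

```python
def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*m)]

# A 2x3 matrix becomes a 3x2 matrix.
M = [
    [1, 2, 3],
    [4, 5, 6],
]
print(transpose(M))  # [[1, 4], [2, 5], [3, 6]]

# For a square matrix the diagonal (1, 4) stays put; 2 and 3 trade places.
S = [
    [1, 2],
    [3, 4],
]
print(transpose(S))  # [[1, 3], [2, 4]]
```

Transposing twice gets us back where we started, which is what we would expect from a reflection.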
Addition and subtraction of matrices don’t come up quite as often in games, but no discussion of matrix operations would be complete without them. Matrix addition can only be done between two matrices of exactly the same dimensions, and is a simple matter of summing the elements at the same location in each:
Subtraction is done in the exact same way, again requiring the matrices be of the same dimension and shape.
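As a minimal sketch, element-wise addition and subtraction might look like this in Python. The helper names and the sample values are made up for illustration, and the shape check mirrors the same-dimensions requirement described above.

```python
def check_same_shape(a, b):
    """Raise if the two list-of-rows matrices differ in dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")

def add(a, b):
    check_same_shape(a, b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def subtract(a, b):
    check_same_shape(a, b)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
print(add(A, B))       # [[11, 22], [33, 44]]
print(subtract(B, A))  # [[9, 18], [27, 36]]
```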
Scalar multiplication is as straightforward as can be. Multiplying a matrix M by a scalar s just multiplies each element of the matrix by s.
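In code this is a one-liner. The sketch below assumes a list-of-rows matrix, and `scale` is a name picked for this example.

```python
def scale(m, s):
    """Multiply every element of a list-of-rows matrix by the scalar s."""
    return [[s * x for x in row] for row in m]

M = [[1, -2], [0, 4]]
print(scale(M, 3))  # [[3, -6], [0, 12]]
```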
Multiplying, also called concatenating, two matrices is by far the most common operation done on matrices in game code. We’ll explore why when we talk about transformations in the next post. Multiplying matrices isn’t as straightforward as adding or subtracting, but with a little help visualizing what it is we’re doing, it’s not too bad. Let’s take the following two matrices, A and B.
Multiplication of matrices requires that the number of columns of the first matrix match the number of rows of the second. We can refer to this dimension as d. We can see that our matrices A and B meet that requirement, with d = 3. The resulting matrix from multiplication will have dimensions of n x m, where n is the number of rows in A and m is the number of columns in B. The formal definition of matrix multiplication then is:
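In standard textbook notation, assuming a_ik and b_kj denote the elements of A and B (labels introduced here for illustration), the element in row i and column j of the product reads:

```latex
(AB)_{ij} = \sum_{k=1}^{d} a_{ik}\, b_{kj},
\qquad 1 \le i \le n, \quad 1 \le j \le m
```

Each element of the result is a sum of d products, pairing the i-th row of A with the j-th column of B.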
Using our example matrices A and B from above:
By this definition, matrix multiplication is not commutative. Reversing the order may violate the column and row requirement entirely: if A is n x d and B is d x m, then BA is only defined when m = n. Even when both products are defined, AB is n x n while BA is d x d, so the two results have the same size only when A and B are square matrices of the same dimension, and even then the results generally differ.
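A quick experiment shows the order dependence even for small square matrices. This is a sketch only: `matmul` is a hand-rolled helper for list-of-rows matrices, and the two matrices are arbitrary examples rather than the A and B used above.

```python
def matmul(a, b):
    """Multiply list-of-rows matrices; a's column count must match b's row count."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must match rows of b")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]
```

With this particular B, multiplying on the right swaps the columns of A, while multiplying on the left swaps its rows.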
While we can certainly think of multiplication in this way, I find it far easier to visualize the multiplication in a more vector-oriented way. If we imagine the rows of the first matrix as vectors (like a row major basis), and the columns of the second matrix as vectors (like a column major basis), then what we’re really doing is taking the dot product of each possible pair of vectors, with the dot product of the i-th row of the first matrix and the j-th column of the second matrix making up the element m_ij in the resulting matrix. In other words:
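The dot-product view maps neatly onto code. In this sketch the matrices are hypothetical stand-ins (a 2x3 and a 3x2, so d = 3), not the A and B from the example above, and `dot` and `matmul_by_dots` are names invented for the illustration.

```python
def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(x, y))

def matmul_by_dots(a, b):
    """Element (i, j) is the dot product of row i of a and column j of b."""
    rows = a                               # rows of the first matrix
    cols = [list(c) for c in zip(*b)]      # columns of the second matrix
    return [[dot(r, c) for c in cols] for r in rows]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2x3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3x2

print(matmul_by_dots(A, B))  # [[58, 64], [139, 154]]
```

For instance, the top-left element is the first row of A dotted with the first column of B: 1*7 + 2*9 + 3*11 = 58.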
We can now see the purpose of the identity matrix, I. Multiplying any matrix by an identity matrix of the appropriate size yields the original matrix:
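Written out for a general n x m matrix A, with the subscript on I giving the size of the identity matrix needed on each side, the property is:

```latex
I_n A = A I_m = A
```

When A is square, the same identity matrix works on both sides.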
Multiplying a vector by a matrix is called transforming the vector. We’ll look at transformations in the next post in this series, but let’s understand the math behind it first. This is where our discussion of column versus row vectors becomes most relevant. To multiply a vector and a matrix, we treat the vector as a single row or column matrix and multiply as usual. This implies that which side of the matrix the vector goes on is important, and must satisfy the row and column requirements of the multiplication. If our vector is a row vector, then we could write a 3-vector and 3x3 matrix multiplication as:
We could similarly write a column vector multiplied by the same matrix as:
Notice that the vector is on the other side of the matrix. This is required to make the multiplication work. Looking at the expressions above, the product of the multiplication would be different in each case. However, we know intuitively that transforming a vector by a matrix can only have a single answer, and that our choice to use row or column vectors shouldn’t impact the result. In order to move the vector from one side of the matrix to the other to satisfy the multiplication requirements, we must also transpose our matrix to ensure that the result remains the same. If we do that, the product of the multiplication will be the same regardless of whether we choose row vectors or column vectors. This is exactly what we were talking about up above in the basis section. So the correct form for the second equation becomes:
Which will ensure we get the same result as the first case.
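We can check this numerically with a small sketch. The matrix and vector values are arbitrary, and the helper functions are written just for this example using plain Python lists.

```python
def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(c) for c in zip(*m)]

def row_vector_times_matrix(v, m):
    """v M: treat v as a 1x3 matrix on the left of m."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]

def matrix_times_column_vector(m, v):
    """M v: treat v as a 3x1 matrix on the right of m."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
v = [1, 0, 2]

row_result = row_vector_times_matrix(v, M)
naive = matrix_times_column_vector(M, v)             # same M, vector moved
fixed = matrix_times_column_vector(transpose(M), v)  # transposed M

print(row_result)  # [15, 18, 23]
print(naive)       # [7, 16, 27]
print(fixed)       # [15, 18, 23]
```

Simply moving the vector to the other side changes the answer, while moving it and transposing the matrix gives the same three numbers (laid out as a column instead of a row).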
There are many more operations we can do with matrices, and I’ll cover some as we go through the next few blog posts. For now, we’ve covered enough to start looking at transformations, and the next installment will begin that exploration with linear transformations. I hope you enjoyed this introduction to matrices, and as always, let me know if there’s anything you’d like to see me explain in more detail.