Linear Algebra
This material provides some bare definitions and simple examples using the Python module numpy. It is hoped that the reader will also come to appreciate the important geometric interpretations of these concepts. These notes are adapted from Professor Bill Willson's notes.
from numpy import array as vector
from numpy import matrix, dot, cross, inner, outer
Vectors
A vector may be viewed as a sequence of real numbers.
Dimensionality of a Vector
The dimension of the vector is the number of real numbers in the sequence.
v = vector((3.2, 4.7, -2.4, 0.0))
print(v.size)
#. Output: 4
So v is a vector of dimension 4, and the 4 items in the sequence are referred to as the components of the vector. Note that a vector has a single dimension (its number of components), whereas a matrix has two (its number of rows and its number of columns).
Vector Notation
It is often useful to work with vectors whose components are symbols representing numbers (pro-numerals). Thus (x1, x2, x3, x4) is a vector of symbols. Sometimes (more often than not) it is convenient to use a symbol to refer to the whole vector - a pro-vector, if you like. Often bold face symbols are used for this purpose. So we might write x = (x1, x2, x3, x4), or y = (y1, y2, y3, y4).
Normal Vectors
A normal vector is a vector of unit length; i.e., length 1. Using the vector length | x | defined later in these notes, x is normal when...
| x | = √(x • x) = 1
Vector Addition (and Subtraction)
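Vectors of the same dimension can be added together by adding the corresponding components; subtraction works the same way, component by component. Thus...
(2, 7, 4, 0) + (1, 3, 5, 9) = (3, 10, 9, 9)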
In code...
v = vector((2, 7, 4, 0)) + vector((1, 3, 5, 9))
print(v)
#. Output: [ 3 10  9  9]
Vector Multiplication (and Division)
A vector can also be multiplied by a scalar (a real number) by multiplying each component of the vector by the real number; division by a non-zero scalar works the same way. For example...
10 × (3, 10, 9, 9) = (30, 100, 90, 90)
In code...
print(v)
print(10*v)
#. Output...
[ 3 10  9  9]
[ 30 100  90  90]
There is no particular reason why the components have to be real numbers. In some applications (but probably not in the neural networks course for which these revision notes were written) the components might be complex numbers, or members of a finite field such as the whole numbers with arithmetic done modulo p, where p is a prime number.
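To be explicit, the component-wise multiplication (and division) described here applies only between a vector and a scalar. Products of two vectors, such as the cross product and the inner product, are separate operations and are defined below.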
Vector Cross Products
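The cross product of two orthonormal vectors u and v in 3-D space gives a vector n which is orthogonal to both u and v and is also a unit vector; i.e. u, v and n together form an orthonormal set.
The cross product is anticommutative; swapping the operands flips the sign of the result:
u×v = -v×u ∀u, v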
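The following is a small sketch of the statements above, using numpy's cross function. The choice of the standard basis vectors for u and v is an assumption made purely for illustration.

```python
import numpy as np

# Two orthonormal vectors in 3-D space (chosen for illustration).
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

n = np.cross(u, v)   # vector orthogonal to both u and v
print(n)

# n is orthogonal to u and to v (inner products are 0) and has unit length.
print(np.dot(n, u), np.dot(n, v), np.dot(n, n))

# Anticommutativity: u x v equals -(v x u).
print(np.array_equal(np.cross(u, v), -np.cross(v, u)))   # True
```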
Vector Inner (Dot) Products
The inner product (dot product) of two vectors of the same dimension is obtained by multiplying together the corresponding components, and taking the sum of the products. The inner product of vectors x and y is written x • y, hence the alternative name dot-product. For example...
(1, 2, 3, 4) • (5, 6, 7, 8) = 5 + 12 + 21 + 32 = 70
Symbolically...
$w \bullet x = \left[ w_{1}, w_{2}, w_{3}, w_{4} \right] \bullet \left[ x_{1}, x_{2}, x_{3}, x_{4} \right] = w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + w_{4} \cdot x_{4}$
And in code...
v1 = vector((1, 2, 3, 4))
v2 = vector((5, 6, 7, 8))
print(dot(v1, v2))
#. Output... 70
Inner products are the same as dot products for vectors, but this is not the case for matrices. Strictly speaking, a vector is also not exactly the same as either a single-row or a single-column matrix. The reason is discussed below.
If the dot product of two vectors is less than zero, then the angle between them is greater than 90°.
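The claim about the sign of the dot product can be checked numerically. This sketch relies on the standard identity x • y = | x | | y | cos θ, which these notes do not state explicitly, and on the vector length defined later in the notes. The helper name angle_degrees is hypothetical.

```python
import numpy as np

def angle_degrees(x, y):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clipping guards against rounding that lands slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

a = np.array([1.0, 0.0])
b = np.array([-1.0, 1.0])

print(np.dot(a, b))          # negative dot product
print(angle_degrees(a, b))   # roughly 135 degrees, i.e. greater than 90
```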
Orthogonal Vectors
Two vectors are said to be orthogonal if their inner product is 0. For example...
(1, 1, 1, 1) • (1, –1, 1, –1) = 1 + –1 + 1 + –1 = 0.
Vectors in two and three dimensions are orthogonal, in the sense just defined, if they are at right angles to each other.
Vector Length
The length of a vector x is defined to be the square root of its inner product with itself...
| x | = √(x • x)
where | x | denotes the length of x.
Given x = (1, –1, 1, –1), then...
| x | = √((1×1) + (–1×–1) + (1×1) + (–1×–1)) = √4 = 2.
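The same calculation can be sketched in code. The first form follows the definition directly; the second uses numpy.linalg.norm, which computes the same quantity for a 1-dimensional array.

```python
import numpy as np

x = np.array([1, -1, 1, -1])

# Square root of the inner product of x with itself.
length = np.sqrt(np.dot(x, x))
print(length)              # 2.0

# Library shortcut for the same quantity.
print(np.linalg.norm(x))   # 2.0
```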
Normal Vectors
A vector is said to be normal if it has length 1; such a vector is also referred to as a unit vector. For example, the vector (0.5, –0.5, 0.5, –0.5) is normal. If x is any non-zero vector, then x/| x | is normal.
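As a minimal sketch, dividing a non-zero vector by its length produces a unit vector. The helper name normalise is hypothetical, and rejecting the zero vector is a design choice made here, since its length is 0 and the division is undefined.

```python
import numpy as np

def normalise(x):
    """Return x scaled to unit length; x must not be the zero vector."""
    length = np.linalg.norm(x)
    if length == 0:
        raise ValueError("the zero vector cannot be normalised")
    return x / length

x = np.array([1.0, -1.0, 1.0, -1.0])   # has length 2
unit = normalise(x)
print(unit)                   # components 0.5, -0.5, 0.5, -0.5
print(np.linalg.norm(unit))   # 1.0
```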
Orthonormal Vectors
A set of vectors is orthonormal if the vectors are all of unit length and are perpendicular (orthogonal) to one another.
It makes no sense to describe a single vector as being orthonormal, or orthogonal.
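One way to test a whole set at once is to collect all pairwise inner products into a single matrix and compare it with the identity matrix. The sketch below does this; the helper name is_orthonormal and the tolerance are assumptions for illustration.

```python
import numpy as np

def is_orthonormal(vectors, tol=1e-12):
    """True if all vectors have unit length and are mutually orthogonal."""
    rows = np.array(vectors, dtype=float)
    # gram[i, j] is the inner product of vector i with vector j.
    gram = np.dot(rows, rows.T)
    # Orthonormal means: 1 on the diagonal (unit length), 0 elsewhere (orthogonal).
    return bool(np.allclose(gram, np.eye(len(rows)), atol=tol))

print(is_orthonormal([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))                 # True
print(is_orthonormal([(0.5, 0.5, 0.5, 0.5), (0.5, -0.5, 0.5, -0.5)]))   # True
print(is_orthonormal([(1, 1, 1, 1), (1, -1, 1, -1)]))   # False: orthogonal, but each has length 2
```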
Matrices
Dimensionality of a Matrix
Where a vector is a 1-dimensional arrangement of numbers (each component is located by a single index), a matrix has 2 (and less commonly, more than 2) dimensions.
Alternatively, you can think of a matrix as a rectangular array of numbers. Again, they don't have to be numbers, but for our purposes they will be. Here is an example of a matrix...
m = matrix(((1, 2, 3, 4), (5, 4, 3, 2), (0, 1, 0, -1)))
print(m)
#. Output...
[[ 1  2  3  4]
 [ 5  4  3  2]
 [ 0  1  0 -1]]
This is a 3×4 matrix (3 rows and 4 columns); the size of the matrix is 3×4. As with vectors, matrices of the same size can be added together by adding corresponding components.
Matrix Notation
When matrices are written in symbolic form, they look like this:
[ a11 a12 a13 a14 ]
[ a21 a22 a23 a24 ]
[ a31 a32 a33 a34 ]
Each entry, aij, has two subscripts - the first one, i in aij, indicates which row the entry belongs to, and the second one, j in aij, indicates which column the entry belongs to. For example, a32 belongs to the third row and second column of the matrix.
Matrices as a whole are often symbolised using a capital letter, like A. So aij is the prototypical entry in the matrix A. If A is an m×n matrix, then we say that m is the row dimension of the matrix (or just the number of rows) and n is the column dimension (or number of columns).
Matrix Transpose
The transpose A^T of a matrix A is obtained by interchanging its rows and columns. The transpose of the 3×4 matrix m defined above is illustrated in code...
print(m)
print(m.transpose())
#. Output...
[[ 1  2  3  4]
 [ 5  4  3  2]
 [ 0  1  0 -1]]
[[ 1  5  0]
 [ 2  4  1]
 [ 3  3  0]
 [ 4  2 -1]]
The transpose of an m×n matrix is an n×m matrix.
Notice that a vector with n components can be viewed as a 1×n matrix; however, there are subtle differences between a vector and a 1×n matrix.
Matrix Transpose, the Inner Product vs the Dot Product, and the Outer Product
First of all, matrices have an associated orientation (or alignment) which vectors lack. For example, a matrix of size 1×n is not the same as a matrix of size n×1, even if the component values are the same (in which case they are transposes of one another). No such orientation exists in vectors. The terms column-vector and row-vector are often used; these are effectively vectors with an alignment/orientation, and hence they are actually matrices. A few examples follow to illustrate why it is important to understand the difference between row/column-vectors (1×n or n×1 matrices) and vectors.
The computation of the inner product, and dot product of two vectors is exactly the same, because the transpose of a vector is the same as the original vector; this does not hold for matrices.
Here are a few examples, starting with vectors...
v1 = vector((1, 2, 3, 4))
v2 = v1.transpose()
print(v1 == v2)
print((dot(v1, v1), dot(v1, v2), dot(v2, v1), dot(v2, v2)))
print((inner(v1, v1), inner(v1, v2), inner(v2, v1), inner(v2, v2)))
#. Output...
[ True  True  True  True]
(30, 30, 30, 30)
(30, 30, 30, 30)
That is, as a result of having no associated alignment, a vector's transpose is identical to the original, and hence, taking the inner product is identical to taking the dot product.
In matrices however, there is a difference...
m1 = matrix((1, 2, 3, 4))
m2 = m1.transpose()
print(m1)
print(m2)
#. Output...
[[1 2 3 4]]
[[1]
 [2]
 [3]
 [4]]
Hence the transpose of a row vector matrix is a column-vector matrix. Now comes the distinction between the dot product and the inner product of two matrices...
dot(m1, m1)
dot(m2, m2)
#. Output...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: objects are not aligned
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: objects are not aligned
So how do we make them aligned? By transposing one of course, but which one? Well, depending on which one of the two we transpose, we would effectively be calculating either the inner product, or the outer product...
dot(m1.transpose(), m1)
dot(m1, m1.transpose())
#. Output...
matrix([[ 1,  2,  3,  4],
        [ 2,  4,  6,  8],
        [ 3,  6,  9, 12],
        [ 4,  8, 12, 16]])
matrix([[30]])
Note that numpy's dot(a, b) multiplies in the same order as written on paper: dot(m1, m2) computes M1 • M2. What can seem counter-intuitive is which product is which. Because m1 is a row vector (a 1×4 matrix), m1^T • m1 is a 4×1 matrix times a 1×4 matrix, giving the 4×4 outer product, while m1 • m1^T is a 1×4 matrix times a 4×1 matrix, giving the 1×1 inner product. For a column vector the two roles are swapped.
Either way, you can clearly see that the alignment or orientation in matrices is very important.
Of course, numpy provides the inner and outer functions, so there is no need to explicitly transpose your matrices all the time...
print(inner(m1, m1))
print(outer(m1, m1))
#. Output...
[[30]]
[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]]
Matrix Multiplication (Dot Product)
It is sometimes possible to multiply two matrices together. This is done, not by multiplying corresponding entries together as with matrix addition, but by computing an inner product for each entry of the product matrix.
Two matrices can be multiplied only when the number of columns of the first matrix is equal to the number of rows of the second matrix, and the resulting matrix has the dimensions defined as follows... A (m×n) • B (n×p) = C (m×p)
For example...
m1 = (
    (1, 2, 3),
    (5, 4, 3),
)
m2 = (
    (3, 0),
    (1, 1),
    (1, 0),
)
print(dot(m1, m2))
print(dot(m2, m1))
#. Output
[[ 8  2]
 [22  4]]
[[3 6 9]
 [6 6 6]
 [1 2 3]]
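To make the rule concrete, here is a short sketch that checks the shapes before multiplying and confirms that one entry of the product is the inner product of a row of the first matrix with a column of the second. The names A, B and C are used here only for illustration and hold the same values as the example above.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [5, 4, 3]])    # 2x3
B = np.array([[3, 0],
              [1, 1],
              [1, 0]])       # 3x2

# The number of columns of A must equal the number of rows of B.
assert A.shape[1] == B.shape[0]
C = np.dot(A, B)
print(C.shape)               # (2, 2)

# Entry (i, j) of the product is the inner product of row i of A
# with column j of B.
i, j = 1, 0
print(C[i, j], np.dot(A[i, :], B[:, j]))   # 22 22

# Shapes that do not line up cannot be multiplied.
try:
    np.dot(A, A)             # 2x3 times 2x3: 3 columns against 2 rows
except ValueError as err:
    print("cannot multiply:", err)
```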
Notice that it is sometimes possible to multiply a matrix by a vector (viewed as a matrix). If the matrix is m×n and the vector has n components, then you can view the vector as a column vector (n×1 matrix) and multiply them, to get an m×1 result - a column vector of dimension m...
m1 = matrix(((1, 2, 3), (3, 4, 1), (0, 1, 2)))
v1 = vector((1, 2, 1))
print(dot(m1, v1))
#. Output...
[[ 8 12  4]]
You could call this post-multiplication by the vector, as the vector appears post - or after - the matrix. Pre-multiplication by a row vector is sometimes possible too - an m-dimensional row vector (i.e. a 1×m matrix) can be multiplied by an m×n matrix to give a 1×n resultant matrix - another row-vector, but of dimension n.
print(dot(v1, m1))
#. Output...
[[ 7 11  7]]
Matrix Facts
- (A.B)^T = B^T.A^T
- If A^T = A^-1 (that is, A is an orthogonal matrix), then A^T.A = I, where I is the identity matrix
- A.A^T = S ∀A, where S is a symmetric matrix
- That S is symmetric means S^T = S, proof...
- S^T = (A.A^T)^T = (A^T)^T.A^T = A.A^T = S
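These facts can be spot-checked numerically. The sketch below uses arbitrary example matrices, and a 2-D rotation matrix as an assumed example of a matrix whose transpose is its inverse. A numerical check on particular matrices is an illustration, not a proof.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [5.0, 4.0, 3.0]])   # 2x3
B = np.array([[3.0, 0.0],
              [1.0, 1.0],
              [1.0, 0.0]])        # 3x2

# (A.B)^T equals B^T.A^T
print(np.allclose(np.dot(A, B).T, np.dot(B.T, A.T)))   # True

# A.A^T is symmetric, i.e. equal to its own transpose.
S = np.dot(A, A.T)
print(np.allclose(S, S.T))                              # True

# A rotation matrix is orthogonal: its transpose is its inverse,
# so R^T.R is the identity matrix.
theta = 0.3                       # arbitrary angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(np.dot(R.T, R), np.eye(2)))           # True
```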
Hyperplanes
The best way to describe a hyperplane is perhaps to start by example...
- A hyperplane in a 2-D space is a line
- A hyperplane in a 3-D space is a plane
- A hyperplane in an N-D space is not possible to illustrate graphically, but the concept follows by analogy from the previous two examples.
The general equation of a plane in 3-space is...
ax + by + cz = d
...where x, y, and z are variables and a, b, c and d are constant coefficients. That is, the plane is the set of all points (x, y, z) such that ax + by + cz = d holds.
Thinking in a vector and inner-product sense, the equation of a plane in 3-space can be rewritten as follows...
(a, b, c) • (x, y, z) = d
If we write a for (a, b, c) and x for (x, y, z) instead, the equation becomes...
a • x = d.
Note that any (but not all) of a, b and c can be zero, and d can be zero as well. So examples of equations of planes are:
a) 2x + 3y + 4z = 5
b) x + y = 3
c) y + z = 7
d) x + 3z = 4
e) x = 2
f) y = 1
g) z = 43.2
So, a plane is a hyperplane in 3-D space. Similarly, a line is a hyperplane in 2-space, and it has a general equation of the form a • x = d, where this time, a = (a, b), and x = (x, y). Notice that equations (b) through to (g), above, are equations of planes in 3-D space, but also are equations of lines in 2-D space. How can this be?
Consider (b): x + y = 3.
- In 2-D space, this is the set of all points (x, y) such that the equation holds.
- In 3D-space, it is the set of all points (x, y, z) such that the equation holds - i.e. x + y = 3, and z can have any value at all.
Similarly, with (e), in 2-D space, this is the line consisting of all points (x, y) such that x = 2, and y can have any value at all, and in 3-D space, it is the plane consisting of all points (x, y, z) such that x = 2, and y and z can have any value at all.
In the case of a line in 2-D space, the line is one-dimensional, and 2-D space is of course 2-dimensional. For a plane in 3-D space, the plane is 2-dimensional and 3-D space is 3-dimensional. In general, in an n-dimensional space, (i.e. a space of vectors with n components), a hyperplane is a (n–1)-dimensional object. As before, its general equation is of the form a • x = d, however this time, a and x are n-dimensional vectors. So a = (a1, a2, ..., an) and x = (x1, x2, ..., xn).
In the case of a line in 2-D space (ax + by = d), it is easy to check that, for a point (x, y) not on the line, the side of the line on which the point lies is determined by whether ax + by – d is positive or negative. For example, with the line x + y = 1, the point (x, y) = (0, 0) is to the left of the line, and x + y – 1 = –1 is negative, while the point (x, y) = (1, 1) is to the right of the line and x + y – 1 = +1 is positive.
You can check this for planes in 3-D space, too, with a little more difficulty. It holds in general, that for a point x in n-D space, the point lies on one "side" of the hyperplane if a • x – d is negative, and on the other side if a • x – d is positive.
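The sign test can be sketched as a small function that works in any number of dimensions. The helper name side_of_hyperplane is hypothetical; it returns -1 or +1 for the two sides and 0 for a point lying on the hyperplane itself.

```python
import numpy as np

def side_of_hyperplane(a, d, x):
    """Return -1, 0 or +1 according to the sign of a . x - d."""
    return int(np.sign(np.dot(a, x) - d))

# The line x + y = 1 in 2-D space.
a, d = np.array([1.0, 1.0]), 1.0
print(side_of_hyperplane(a, d, np.array([0.0, 0.0])))   # -1
print(side_of_hyperplane(a, d, np.array([1.0, 1.0])))   # 1
print(side_of_hyperplane(a, d, np.array([0.5, 0.5])))   # 0, the point is on the line

# The same function works unchanged for the plane 2x + 3y + 4z = 5 in 3-D space.
plane = np.array([2.0, 3.0, 4.0])
print(side_of_hyperplane(plane, 5.0, np.array([1.0, 1.0, 1.0])))   # 1
```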
Eigenvectors & Eigenvalues
When you pre-multiply a column vector x by a matrix A: x → A⋅x, you transform the vector to another vector.
Let us consider a particular matrix A. For some vectors x, A⋅x will be a multiple of x. For example...
Let $x = \left[ 0, 1, 0 \right]^{T}$ and let $A = \left[ \begin{array}{ccc} 2 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 3 \end{array} \right]$
In this case, it is easy to check that A⋅x = 7x.
When this happens, we say that...
- x is an eigenvector of A
- The multiple - in this case 7 - is an eigenvalue of A
A matrix can have several eigenvalues (and hence several eigenvectors). For example, our sample matrix A has three eigenvalues: 2, 7, and 3.
In the generic case of Ax = λx, x is the eigenvector, and λ is the eigenvalue.
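The example above can be reproduced in code. This sketch first checks A⋅x = 7x directly, then uses numpy.linalg.eig, which returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors. The order in which the eigenvalues are returned should not be relied upon.

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 7.0, 0.0],
              [0.0, 0.0, 3.0]])
x = np.array([0.0, 1.0, 0.0])

# A.x is a multiple of x: here 7 times x.
print(np.allclose(np.dot(A, x), 7 * x))   # True

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)

# Check A.v = lambda * v for every eigenvalue/eigenvector pair.
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]                # the k-th column is an eigenvector
    print(np.allclose(np.dot(A, v), eigenvalues[k] * v))   # True
```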