2. Linear Algebra#
2.1. Overview#
Linear algebra is one of the most useful branches of applied mathematics for economists to invest in.
For example, many applied problems in economics and finance require the solution of a linear system of equations, such as

$$
\begin{aligned}
    y_1 &= a x_1 + b x_2 \\
    y_2 &= c x_1 + d x_2
\end{aligned}
$$

or, more generally,

$$
\begin{aligned}
    y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k \\
    &\vdots \\
    y_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k
\end{aligned}
\tag{2.1}
$$

The objective here is to solve for the “unknowns” $x_1, \ldots, x_k$ given $a_{11}, \ldots, a_{nk}$ and $y_1, \ldots, y_n$.
When considering such problems, it is essential that we first consider at least some of the following questions
Does a solution actually exist?
Are there in fact many solutions, and if so how should we interpret them?
If no solution exists, is there a best “approximate” solution?
If a solution exists, how should we compute it?
These are the kinds of topics addressed by linear algebra.
In this lecture we will cover the basics of linear and matrix algebra, treating both theory and computation.
We admit some overlap with this lecture, where operations on NumPy arrays were first explained.
Note that this lecture is more theoretical than most, and contains background material that will be used in applications as we go along.
Let’s start with some imports:
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from scipy.linalg import inv, solve, det, eig
2.2. Vectors#
A vector of length $n$ is just a sequence (or array, or tuple) of $n$ numbers, which we write as $x = (x_1, \ldots, x_n)$ or $x = [x_1, \ldots, x_n]$.
We will write these sequences either horizontally or vertically as we please.
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two)
The set of all $n$-vectors is denoted by $\mathbb R^n$.

For example, $\mathbb R^2$ is the plane, and a vector in $\mathbb R^2$ is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner
fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax.set(xlim=(-5, 5), ylim=(-5, 5))
ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.7,
width=0.5))
ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()

2.2.1. Vector Operations#
The two most common operators for vectors are addition and scalar multiplication, which we now describe.
As a matter of definition, when we add two vectors, we add them element-by-element:

$$
x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} +
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} :=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
$$

Scalar multiplication is an operation that takes a number $\gamma$ and a vector $x$ and produces

$$
\gamma x :=
\begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}
$$
Scalar multiplication is illustrated in the next figure
fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax.set(xlim=(-5, 5), ylim=(-5, 5))
x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=1,
width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')
scalars = (-2, 2)
x = np.array(x)
for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()

In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6)
, but is more commonly
represented as a NumPy array.
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax
x = np.ones(3) # Vector of three ones
y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y
array([3., 5., 7.])
4 * x
array([4., 4., 4.])
2.2.2. Inner Product and Norm#
The inner product of vectors $x, y \in \mathbb R^n$ is defined as

$$
x' y := \sum_{i=1}^n x_i y_i
$$
Two vectors are called orthogonal if their inner product is zero.
The norm of a vector $x$ represents its “length” (i.e., its distance from the zero vector) and is defined as

$$
\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^n x_i^2 \right)^{1/2}
$$

The expression $\| x - y \|$ can be thought of as the “distance” between $x$ and $y$.
Continuing on from the previous example, the inner product and norm can be computed as follows
np.sum(x * y) # Inner product of x and y
12.0
np.sqrt(np.sum(x**2)) # Norm of x, take one
1.7320508075688772
np.linalg.norm(x) # Norm of x, take two
1.7320508075688772
2.2.3. Span#
Given a set of vectors $A := \{a_1, \ldots, a_k\}$ in $\mathbb R^n$, it’s natural to think about the new vectors we can create by performing linear operations.

New vectors created in this manner are called linear combinations of $A$.

In particular, $y \in \mathbb R^n$ is a linear combination of $A := \{a_1, \ldots, a_k\}$ if

$$
y = \beta_1 a_1 + \cdots + \beta_k a_k
\quad \text{for some scalars } \beta_1, \ldots, \beta_k
$$

In this context, the values $\beta_1, \ldots, \beta_k$ are called the coefficients of the linear combination.

The set of linear combinations of $A$ is called the span of $A$.

The next figure shows the span of $A = \{a_1, a_2\}$ in $\mathbb R^3$.
The span is a two-dimensional plane passing through these two points and the origin.
ax = plt.figure(figsize=(10, 8)).add_subplot(projection='3d')
x_min, x_max = -5, 5
y_min, y_max = -5, 5
α, β = 0.2, 0.1
ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),
xticks=(0,), yticks=(0,), zticks=(0,))
gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)
# Fixed linear function, to generate a plane
def f(x, y):
return α * x + β * y
# Vector locations, by coordinate
x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)
# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)
# Draw the plane
grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
linewidth=0, antialiased=True, alpha=0.2)
plt.show()

2.2.3.1. Examples#
If $A$ contains only one vector $a_1 \in \mathbb R^2$, then its span is just the scalar multiples of $a_1$, which is the unique line passing through both $a_1$ and the origin.

If $A = \{e_1, e_2, e_3\}$ consists of the canonical basis vectors of $\mathbb R^3$, that is

$$
e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$

then the span of $A$ is all of $\mathbb R^3$, because, for any $x = (x_1, x_2, x_3) \in \mathbb R^3$, we can write

$$
x = x_1 e_1 + x_2 e_2 + x_3 e_3
$$

Now consider $A_0 = \{e_1, e_2, e_1 + e_2\}$.

If $y = (y_1, y_2, y_3)$ is any linear combination of these vectors, then $y_3 = 0$ (check it).

Hence $A_0$ fails to span all of $\mathbb R^3$.
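As a quick numerical check of these claims, we can test whether a vector lies in the span of a given set of vectors by comparing matrix ranks (the helper in_span below is only for illustration)

e1, e2, e3 = np.eye(3)                       # canonical basis vectors of R^3
A0 = np.column_stack((e1, e2, e1 + e2))      # the set A0 as columns of a matrix

def in_span(y, columns):
    "Return True if y lies in the span of the given columns."
    rank_before = np.linalg.matrix_rank(columns)
    rank_after = np.linalg.matrix_rank(np.column_stack((columns, y)))
    return rank_after == rank_before

print(in_span(np.array((2.0, 3.0, 0.0)), A0))   # True: third coordinate is zero
print(in_span(np.array((2.0, 3.0, 1.0)), A0))   # False: A0 fails to span R^3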
2.2.4. Linear Independence#
As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that many vectors can be described by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what’s called linear independence.
In particular, a collection of vectors $A := \{a_1, \ldots, a_k\}$ in $\mathbb R^n$ is said to be

- linearly dependent if some strict subset of $A$ has the same span as $A$
- linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span and linearly dependent otherwise.
To illustrate the idea, recall the figure that showed the span of vectors $\{a_1, a_2\}$ in $\mathbb R^3$ as a plane through the origin.

If we take a third vector $a_3$ and form the set $\{a_1, a_2, a_3\}$, this set will be

- linearly dependent if $a_3$ lies in the plane
- linearly independent otherwise

As another illustration of the concept, since $\mathbb R^n$ can be spanned by $n$ vectors (see the discussion of canonical basis vectors above), any collection of $m > n$ vectors in $\mathbb R^n$ must be linearly dependent.
The following statements are equivalent to linear independence of $A := \{a_1, \ldots, a_k\} \subset \mathbb R^n$

1. No vector in $A$ can be formed as a linear combination of the other elements.
2. If $\beta_1 a_1 + \cdots + \beta_k a_k = 0$ for scalars $\beta_1, \ldots, \beta_k$, then $\beta_1 = \cdots = \beta_k = 0$.

(The zero in the first expression is the origin of $\mathbb R^n$.)
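One simple way to check linear independence numerically is via np.linalg.matrix_rank: a set of vectors is linearly independent exactly when the matrix formed from them (as columns) has rank equal to the number of vectors

a1 = np.array((3.0, 4.0, 0.6))
a2 = np.array((3.0, -4.0, 0.2))

independent = np.column_stack((a1, a2))
dependent = np.column_stack((a1, a2, a1 + 2 * a2))   # third vector is redundant

print(np.linalg.matrix_rank(independent))  # 2 -> linearly independent
print(np.linalg.matrix_rank(dependent))    # 2 < 3 -> linearly dependent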
2.2.5. Unique Representations#
Another nice thing about sets of linearly independent vectors is that each element in the span has a unique representation as a linear combination of these vectors.
In other words, if $A := \{a_1, \ldots, a_k\} \subset \mathbb R^n$ is linearly independent and

$$
y = \beta_1 a_1 + \cdots + \beta_k a_k
$$

then no other coefficient sequence $\gamma_1, \ldots, \gamma_k$ will produce the same vector $y$.

Indeed, if we also have $y = \gamma_1 a_1 + \cdots + \gamma_k a_k$, then

$$
(\beta_1 - \gamma_1) a_1 + \cdots + (\beta_k - \gamma_k) a_k = 0
$$

Linear independence now implies $\gamma_i = \beta_i$ for all $i$.
2.3. Matrices#
Matrices are a neat way of organizing data for use in linear operations.
An $n \times k$ matrix is a rectangular array $A$ of numbers with $n$ rows and $k$ columns:

$$
A =
\begin{bmatrix}
    a_{11} & a_{12} & \cdots & a_{1k} \\
    a_{21} & a_{22} & \cdots & a_{2k} \\
    \vdots & \vdots &        & \vdots \\
    a_{n1} & a_{n2} & \cdots & a_{nk}
\end{bmatrix}
$$
Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this lecture.
For obvious reasons, the matrix $A$ is also called a vector if either $n = 1$ or $k = 1$.

In the former case, $A$ is called a row vector, while in the latter it is called a column vector.

If $n = k$, then $A$ is called square.

The matrix formed by replacing $a_{ij}$ by $a_{ji}$ for every $i$ and $j$ is called the transpose of $A$ and denoted $A'$ or $A^\top$.

If $A = A'$, then $A$ is called symmetric.

For a square matrix $A$, the $n$ elements of the form $a_{ii}$ for $i = 1, \ldots, n$ are called the principal diagonal, and $A$ is called diagonal if the only nonzero entries are the elements of the principal diagonal.

If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then $A$ is called the identity matrix and denoted by $I$.
2.3.1. Matrix Operations#
Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:

$$
\gamma A =
\gamma
\begin{bmatrix}
    a_{11} & \cdots & a_{1k} \\
    \vdots & & \vdots \\
    a_{n1} & \cdots & a_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
    \gamma a_{11} & \cdots & \gamma a_{1k} \\
    \vdots & & \vdots \\
    \gamma a_{n1} & \cdots & \gamma a_{nk}
\end{bmatrix}
$$

and

$$
A + B =
\begin{bmatrix}
    a_{11} & \cdots & a_{1k} \\
    \vdots & & \vdots \\
    a_{n1} & \cdots & a_{nk}
\end{bmatrix}
+
\begin{bmatrix}
    b_{11} & \cdots & b_{1k} \\
    \vdots & & \vdots \\
    b_{n1} & \cdots & b_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
    a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\
    \vdots & & \vdots \\
    a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk}
\end{bmatrix}
$$
In the latter case, the matrices must have the same shape in order for the definition to make sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above and is designed to make multiplication play well with basic linear operations.
If $A$ and $B$ are two matrices, then their product $AB$ is formed by taking as its $i,j$-th element the inner product of the $i$-th row of $A$ and the $j$-th column of $B$.
There are many tutorials to help you visualize this operation, such as this one, or the discussion on the Wikipedia page.
If $A$ is $n \times k$ and $B$ is $j \times m$, then to multiply $A$ and $B$ we require $k = j$, and the resulting matrix $AB$ is $n \times m$.

As perhaps the most important special case, consider multiplying $n \times k$ matrix $A$ and $k \times 1$ column vector $x$.

According to the preceding rule, this gives us an $n \times 1$ column vector

$$
y = Ax :=
\begin{bmatrix}
    a_{11} x_1 + \cdots + a_{1k} x_k \\
    \vdots \\
    a_{n1} x_1 + \cdots + a_{nk} x_k
\end{bmatrix}
\tag{2.2}
$$

Note

$AB$ and $BA$ are not generally the same thing.
Another important special case is the identity matrix.
You should check that if $A$ is $n \times k$ and $I$ is the $k \times k$ identity matrix, then $AI = A$.

If $I$ is the $n \times n$ identity matrix, then $IA = A$.
2.3.2. Matrices in NumPy#
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix operations (see footnote 1).
You can create them manually from tuples of tuples (or lists of lists) as follows
A = ((1, 2),
(3, 4))
type(A)
tuple
A = np.array(A)
type(A)
numpy.ndarray
A.shape
(2, 2)
The shape attribute is a tuple giving the number of rows and columns — see here for more discussion.
To get the transpose of A, use A.transpose() or, more simply, A.T.
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.) — see here.
Since operations are performed elementwise by default, scalar multiplication and addition have very natural syntax
A = np.identity(3)
B = np.ones((3, 3))
2 * A
array([[2., 0., 0.],
[0., 2., 0.],
[0., 0., 2.]])
A + B
array([[2., 1., 1.],
[1., 2., 1.],
[1., 1., 2.]])
To multiply matrices we use the @ symbol.

In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multiplication, as the short example below illustrates.

See here for more discussion.
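For example, with the two small matrices below, the two operations give different results

A = np.array(((1, 2),
              (3, 4)))
B = np.array(((0, 1),
              (1, 0)))

print(A @ B)   # matrix product: rows of A times columns of B
print(A * B)   # elementwise product: a_ij * b_ij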
2.3.3. Matrices as Maps#
Each $n \times k$ matrix $A$ can be identified with a function $f(x) = Ax$ that maps $x \in \mathbb R^k$ into $y = Ax \in \mathbb R^n$.

These kinds of functions have a special property: they are linear.

A function $f \colon \mathbb R^k \to \mathbb R^n$ is called linear if, for all $x, y \in \mathbb R^k$ and all scalars $\alpha, \beta$, we have

$$
f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)
$$

You can check that this holds for the function $f(x) = Ax + b$ when $b$ is the zero vector and fails when $b$ is nonzero.

In fact, it’s known that $f$ is linear if and only if there exists a matrix $A$ such that $f(x) = Ax$ for all $x$.
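As a small sanity check, we can verify the linearity property numerically for a particular matrix and particular scalars (the matrix and vectors below are arbitrary)

A = np.array(((2.0, 1.0),
              (1.0, 3.0)))

def f(x):
    return A @ x

x, y = np.array((1.0, -2.0)), np.array((0.5, 4.0))
α, β = 0.3, -1.2

lhs = f(α * x + β * y)
rhs = α * f(x) + β * f(y)
print(np.allclose(lhs, rhs))   # True: f is linear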
2.4. Solving Systems of Equations#
Recall again the system of equations (2.1).
If we compare (2.1) and (2.2), we see that (2.1) can now be written more conveniently as

$$
y = Ax
\tag{2.3}
$$

The problem we face is to determine a vector $x \in \mathbb R^k$ that solves (2.3), taking $y$ and $A$ as given.

This is a special case of a more general problem: Find an $x$ such that $y = f(x)$.

Given an arbitrary function $f$ and a $y$, is there always an $x$ such that $y = f(x)$?

If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
def f(x):
return 0.6 * np.cos(4 * x) + 1.4
xmin, xmax = -1, 1
x = np.linspace(xmin, xmax, 160)
y = f(x)
ya, yb = np.min(y), np.max(y)
fig, axes = plt.subplots(2, 1, figsize=(10, 10))
for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),
yticks=(), xticks=())
ax.plot(x, y, 'k-', lw=2, label='$f$')
ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
ax.text(0.04, -0.3, '$0$', fontsize=16)
ax = axes[0]
ax.legend(loc='upper right', frameon=False)
ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
ax.text(z, -0.2, f'$x_{i}$', fontsize=16)
ax = axes[1]
ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)
plt.show()

In the first plot, there are multiple solutions, as the function is not one-to-one, while
in the second there are no solutions, since $y$ lies outside the range of $f$.

Can we impose conditions on $A$ in (2.3) that rule out these problems?

In this context, the most important thing to recognize about the expression $Ax$ is that it corresponds to a linear combination of the columns of $A$.

In particular, if $a_1, \ldots, a_k$ are the columns of $A$, then

$$
Ax = x_1 a_1 + \cdots + x_k a_k
$$

Hence the range of $f(x) = Ax$ is exactly the span of the columns of $A$.

We want the range to be large so that it contains arbitrary $y$.

As you might recall, the condition that we want for the span to be large is linear independence.

A happy fact is that linear independence of the columns of $A$ also gives us uniqueness.

Indeed, it follows from our earlier discussion that if $\{a_1, \ldots, a_k\}$ are linearly independent and $y = Ax = x_1 a_1 + \cdots + x_k a_k$, then no $z \not= x$ satisfies $y = Az$.
2.4.1. The Square Matrix Case#
Let’s discuss some more details, starting with the case where $A$ is $n \times n$.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary $y \in \mathbb R^n$, we hope to find a unique $x \in \mathbb R^n$ such that $y = Ax$.

In view of the observations immediately above, if the columns of $A$ are linearly independent, then their span, and hence the range of $f(x) = Ax$, is all of $\mathbb R^n$.

Hence there always exists an $x$ such that $y = Ax$.
Moreover, the solution is unique.
In particular, the following are equivalent
1. The columns of $A$ are linearly independent.
2. For any $y \in \mathbb R^n$, the equation $y = Ax$ has a unique solution.
The property of having linearly independent columns is sometimes expressed as having full column rank.
2.4.1.1. Inverse Matrices#
Can we give some sort of expression for the solution?
If $y$ and $A$ are scalar with $A \not= 0$, then the solution is $x = A^{-1} y$.

A similar expression is available in the matrix case.

In particular, if square matrix $A$ has linearly independent columns, then it possesses a multiplicative inverse matrix $A^{-1}$, with the property that $A A^{-1} = A^{-1} A = I$.

As a consequence, if we pre-multiply both sides of $y = Ax$ by $A^{-1}$, we get $x = A^{-1} y$.
This is the solution that we’re looking for.
2.4.1.2. Determinants#
Another quick comment about square matrices is that to every such matrix we assign a unique number called the determinant of the matrix — you can find the expression for it here.
If the determinant of $A$ is not zero, then we say that $A$ is nonsingular.

Perhaps the most important fact about determinants is that $A$ is nonsingular if and only if $A$ is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be inverted.
2.4.2. More Rows than Columns#
This is the $n \times k$ case with $n > k$.

This case is very important in many settings, not least in the setting of linear regression (where $n$ is the number of observations and $k$ is the number of explanatory variables).

Given arbitrary $y \in \mathbb R^n$, we seek an $x \in \mathbb R^k$ such that $y = Ax$.
In this setting, the existence of a solution is highly unlikely.
Without much loss of generality, let’s go over the intuition focusing on the case where the columns of $A$ are linearly independent.

It follows that the span of the columns of $A$ is a $k$-dimensional subspace of $\mathbb R^n$.

This span is very “unlikely” to contain arbitrary $y \in \mathbb R^n$.

To see why, recall the figure above, where $k = 2$ and $n = 3$.

Imagine an arbitrarily chosen $y \in \mathbb R^3$, located somewhere in that three-dimensional space.

What’s the likelihood that $y$ lies in the span of $\{a_1, a_2\}$ (i.e., the two-dimensional plane through these points)?

In a sense, it must be very small, since this plane has zero “thickness”.

As a result, in the $n > k$ case we usually give up on seeking an exact solution.

However, we can still seek the best approximation, for example, an $x$ that makes the distance $\| y - Ax \|$ as small as possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be $\hat x = (A'A)^{-1} A' y$.
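Here is a small numerical illustration of this formula on an overdetermined system with linearly independent columns (the particular A and y are arbitrary); note that the residual $y - A \hat x$ is orthogonal to the columns of $A$

A = np.array(((1.0, 0.0),
              (0.0, 1.0),
              (1.0, 1.0)))   # 3 equations, 2 unknowns
y = np.array((1.0, 2.0, 0.0))

x_hat = np.linalg.solve(A.T @ A, A.T @ y)   # (A'A)^{-1} A'y via the normal equations
residual = y - A @ x_hat

print(x_hat)
print(A.T @ residual)   # ~0: residual is orthogonal to the columns of A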
2.4.3. More Columns than Rows#
This is the $n \times k$ case with $n < k$, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness never holds.
For example, consider the case where $k = 3$ and $n = 2$.

Thus, the columns of $A$ consist of 3 vectors in $\mathbb R^2$.

This set can never be linearly independent, since it is possible to find two vectors that span $\mathbb R^2$.
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, let’s say that $a_1 = \alpha a_2 + \beta a_3$.

Then if $y = Ax = x_1 a_1 + x_2 a_2 + x_3 a_3$, we can also write

$$
y
= x_1 (\alpha a_2 + \beta a_3) + x_2 a_2 + x_3 a_3
= (x_1 \alpha + x_2) a_2 + (x_1 \beta + x_3) a_3
$$
In other words, uniqueness fails.
2.4.4. Linear Equations with SciPy#
Here’s an illustration of how to solve linear equations with SciPy’s linalg
submodule.
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code.
A = ((1, 2), (3, 4))
A = np.array(A)
y = np.ones((2, 1)) # Column vector
det(A) # Check that A is nonsingular, and hence invertible
-2.0
A_inv = inv(A) # Compute the inverse
A_inv
array([[-2. , 1. ],
[ 1.5, -0.5]])
x = A_inv @ y # Solution
A @ x # Should equal y
array([[1.],
[1.]])
solve(A, y) # Produces the same solution
array([[-1.],
[ 1.]])
Observe how we can solve for $x = A^{-1} y$ either via inv(A) @ y, or by using solve(A, y).
The latter method uses a different algorithm (LU decomposition) that is numerically more stable, and hence should almost always be preferred.
To obtain the least-squares solution $\hat x = (A'A)^{-1} A' y$, use scipy.linalg.lstsq(A, y).
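For example, applying it to a small overdetermined system recovers the same approximation as the normal-equations formula shown earlier (lstsq returns the solution together with the residues, rank and singular values)

from scipy.linalg import lstsq

A = np.array(((1.0, 0.0),
              (0.0, 1.0),
              (1.0, 1.0)))
y = np.array((1.0, 2.0, 0.0))

x_hat, residues, rank, sv = lstsq(A, y)   # least-squares solution
print(x_hat)
print(np.allclose(x_hat, np.linalg.solve(A.T @ A, A.T @ y)))   # True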
2.5. Eigenvalues and Eigenvectors#
Let $A$ be an $n \times n$ square matrix.

If $\lambda$ is scalar and $v$ is a non-zero vector in $\mathbb R^n$ such that

$$
A v = \lambda v
$$

then we say that $\lambda$ is an eigenvalue of $A$, and $v$ is an eigenvector.

Thus, an eigenvector of $A$ is a vector such that when the map $f(x) = Ax$ is applied, $v$ is merely scaled.

The next figure shows two eigenvectors (blue arrows) and their images under $A$ (red arrows).

As expected, the image $Av$ of each $v$ is just a scaled version of the original.
A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]
fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax.grid(alpha=0.4)
xmin, xmax = -3, 3
ymin, ymax = -3, 3
ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))
# Plot each eigenvector
for v in evecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.6,
width=0.5))
# Plot the image of each eigenvector
for v in evecs:
v = A @ v
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.6,
width=0.5))
# Plot the lines they run through
x = np.linspace(xmin, xmax, 3)
for v in evecs:
a = v[1] / v[0]
ax.plot(x, a * x, 'b-', lw=0.4)
plt.show()

The eigenvalue equation is equivalent to $(A - \lambda I) v = 0$, and this has a nonzero solution $v$ only when the columns of $A - \lambda I$ are linearly dependent.

This in turn is equivalent to stating that the determinant is zero.

Hence to find all eigenvalues, we can look for $\lambda$ such that the determinant of $A - \lambda I$ is zero.

This problem can be expressed as one of solving for the roots of a polynomial in $\lambda$ of degree $n$.

This in turn implies the existence of $n$ solutions in the complex plane, although some might be repeated.
Some nice facts about the eigenvalues of a square matrix $A$ are as follows

1. The determinant of $A$ equals the product of the eigenvalues.
2. The trace of $A$ (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues.
3. If $A$ is symmetric, then all of its eigenvalues are real.
4. If $A$ is invertible and $\lambda_1, \ldots, \lambda_n$ are its eigenvalues, then the eigenvalues of $A^{-1}$ are $1/\lambda_1, \ldots, 1/\lambda_n$.
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues are nonzero.
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evals
array([ 3.+0.j, -1.+0.j])
evecs
array([[ 0.70710678, -0.70710678],
[ 0.70710678, 0.70710678]])
Note that the columns of evecs
are the eigenvectors.
Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check it), the eig routine normalizes the length of each eigenvector to one.
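We can use this output to confirm the first two facts listed above, namely that the determinant equals the product of the eigenvalues and the trace equals their sum

print(np.prod(evals).real, det(A))      # product of eigenvalues vs determinant
print(np.sum(evals).real, np.trace(A))  # sum of eigenvalues vs trace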
2.5.1. Generalized Eigenvalues#
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices $A$ and $B$, seeks generalized eigenvalues $\lambda$ and eigenvectors $v$ such that

$$
A v = \lambda B v
$$
This can be solved in SciPy via scipy.linalg.eig(A, B)
.
Of course, if $B$ is the identity matrix, then the generalized eigenvalue problem is just the ordinary eigenvalue problem.
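For example, here is a small illustration with an arbitrarily chosen $B$ (the same $A$ as above)

B = np.array(((2.0, 0.0),
              (0.0, 1.0)))

evals, evecs = eig(A, B)   # solves A v = λ B v
evals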
2.6. Further Topics#
We round out our discussion by briefly mentioning several other important topics.
2.6.1. Series Expansions#
Recall the usual summation formula for a geometric progression, which states
that if $|a| < 1$, then

$$
\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}
$$
A generalization of this idea exists in the matrix setting.
2.6.1.1. Matrix Norms#
Let $A$ be a square matrix and let

$$
\| A \| := \max_{\| x \| = 1} \| A x \|
$$
The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm — in this case, the so-called spectral norm.
For example, for a square matrix $S$, the condition $\| S \| < 1$ means that $S$ is contractive, in the sense that it pulls all vectors towards the origin (see footnote 2).
2.6.1.2. Neumann’s Theorem#
Let $A$ be a square matrix and let $A^k := A A^{k-1}$ with $A^1 := A$.

In other words, $A^k$ is the $k$-th power of $A$.

Neumann’s theorem states the following: If $\| A^k \| < 1$ for some $k \in \mathbb{N}$, then $I - A$ is invertible, and

$$
(I - A)^{-1} = \sum_{k=0}^{\infty} A^k
\tag{2.4}
$$
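The following quick check illustrates the theorem by comparing a truncated version of the series with the inverse (the matrix A below is chosen so that the condition holds)

A = np.array(((0.1, 0.4),
              (0.3, 0.2)))

# Truncated Neumann series: sum of A^k for k = 0, ..., 49
series_sum = sum(np.linalg.matrix_power(A, k) for k in range(50))

print(np.allclose(series_sum, inv(np.identity(2) - A)))   # True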
2.6.1.3. Spectral Radius#
A result known as Gelfand’s formula tells us that, for any square matrix $A$,

$$
\rho(A) = \lim_{k \to \infty} \| A^k \|^{1/k}
$$

Here $\rho(A)$ is the spectral radius, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of $A$.

As a consequence of Gelfand’s formula, if all eigenvalues are strictly less than one in modulus, then there exists a $k$ with $\| A^k \| < 1$.

In which case (2.4) is valid.
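The next snippet computes the spectral radius directly from the eigenvalues and shows $\| A^k \|^{1/k}$ approaching it as $k$ grows (the matrix A is arbitrary)

A = np.array(((0.4, 0.9),
              (0.1, 0.6)))

rho = max(abs(np.linalg.eigvals(A)))   # spectral radius

for k in (1, 5, 20, 100):
    # Spectral norm of A^k, raised to the power 1/k
    print(k, np.linalg.norm(np.linalg.matrix_power(A, k), 2)**(1/k))

print(rho)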
2.6.2. Positive Definite Matrices#
Let $A$ be a symmetric $n \times n$ matrix.

We say that $A$ is

1. positive definite if $x' A x > 0$ for every $x \in \mathbb R^n \setminus \{0\}$
2. positive semi-definite or nonnegative definite if $x' A x \geq 0$ for every $x \in \mathbb R^n$
Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if $A$ is positive definite, then all of its eigenvalues are strictly positive, and hence $A$ is invertible (with positive definite inverse).
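For a symmetric matrix, positive definiteness can be checked numerically by inspecting the eigenvalues, as the following sketch illustrates (the example matrix is arbitrary)

A = np.array((( 2.0, -1.0),
              (-1.0,  2.0)))   # symmetric

# A symmetric matrix is positive definite iff all eigenvalues are strictly positive
evals = np.linalg.eigvalsh(A)
print(evals, np.all(evals > 0))

# Spot-check the quadratic form x'Ax at a few random nonzero x
for _ in range(3):
    x = np.random.randn(2)
    print(x @ A @ x > 0)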
2.6.3. Differentiating Linear and Quadratic Forms#
The following formulas are useful in many economic contexts. Let

- $z$, $x$ and $a$ all be $n \times 1$ vectors
- $A$ be an $n \times n$ matrix
- $B$ be an $m \times n$ matrix and $y$ be an $m \times 1$ vector

Then

1. $\frac{\partial a'x}{\partial x} = a$
2. $\frac{\partial A x}{\partial x} = A'$
3. $\frac{\partial x'A x}{\partial x} = (A + A') x$
4. $\frac{\partial y'B z}{\partial y} = B z$
5. $\frac{\partial y'B z}{\partial B} = y z'$
Exercise 2.1 below asks you to apply these formulas.
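If you want to convince yourself of, say, the third formula numerically, a finite-difference check along these lines works (the step size h and the example matrix are arbitrary)

A = np.array(((1.0, 2.0),
              (0.5, 3.0)))
x = np.array((0.7, -1.2))

def quad(x):
    return x @ A @ x   # the quadratic form x'Ax

# Central finite-difference approximation of the gradient
h = 1e-6
grad_fd = np.array([(quad(x + h * e) - quad(x - h * e)) / (2 * h)
                    for e in np.identity(2)])

print(grad_fd)
print((A + A.T) @ x)   # analytical formula: (A + A')x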
2.6.4. Further Reading#
The documentation of the scipy.linalg
submodule can be found here.
Chapters 2 and 3 of the Econometric Theory contain a discussion of linear algebra along the same lines as above, with solved exercises.
If you don’t mind a slightly abstract approach, a nice intermediate-level text on linear algebra is [Jänich, 1994].
2.7. Exercises#
Exercise 2.1
Let $x$ be a given $n \times 1$ vector and consider the problem

$$
v(x) = \max_{y,u} \left\{ - y' P y - u' Q u \right\}
$$

subject to the linear constraint

$$
y = A x + B u
$$

Here

- $P$ is an $n \times n$ matrix and $Q$ is an $m \times m$ matrix
- $A$ is an $n \times n$ matrix and $B$ is an $n \times m$ matrix
- both $P$ and $Q$ are symmetric and positive semidefinite

(What must the dimensions of $y$ and $u$ be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian

$$
\mathcal L = - y' P y - u' Q u + \lambda' \left[ A x + B u - y \right]
$$

where $\lambda$ is an $n \times 1$ vector of Lagrange multipliers.
Try applying the formulas given above for differentiating quadratic and linear forms to obtain the first-order conditions for maximizing $\mathcal L$ with respect to $y, u$ and minimizing it with respect to $\lambda$.

Show that these conditions imply that

1. $\lambda = - 2 P y$.
2. The optimizing choice of $u$ satisfies $u = - (Q + B' P B)^{-1} B' P A x$.
3. The function $v$ satisfies $v(x) = - x' \tilde P x$ where $\tilde P = A' P A - A' P B (Q + B' P B)^{-1} B' P A$.
As we will see, in economic contexts Lagrange multipliers often are shadow prices.
Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint into the objective function, and then just maximize $-(Ax + Bu)' P (Ax + Bu) - u' Q u$ with respect to $u$. You can verify that this leads to the same maximizer.
Solution to Exercise 2.1
We have an optimization problem:

$$
v(x) = \max_{y,u} \{ -y' P y - u' Q u \}
$$

s.t.

$$
y = A x + B u
$$

with primitives

- $P$ be a symmetric and positive semidefinite $n \times n$ matrix
- $Q$ be a symmetric and positive semidefinite $m \times m$ matrix
- $A$ an $n \times n$ matrix
- $B$ an $n \times m$ matrix

The associated Lagrangian is:

$$
L = -y' P y - u' Q u + \lambda' \left[ A x + B u - y \right]
$$
Step 1.

Differentiating the Lagrangian w.r.t. $y$ and setting its derivative equal to zero yields

$$
\frac{\partial L}{\partial y} = - (P + P') y - \lambda = - 2 P y - \lambda = 0
$$

since $P$ is symmetric.

Accordingly, the first-order condition for maximizing $L$ w.r.t. $y$ implies

$$
\lambda = - 2 P y
$$
Step 2.

Differentiating the Lagrangian w.r.t. $u$ and setting its derivative equal to zero yields

$$
\frac{\partial L}{\partial u} = - (Q + Q') u + B' \lambda = - 2 Q u + B' \lambda = 0
$$

Substituting $\lambda = - 2 P y$ gives

$$
Q u + B' P y = 0
$$

Substituting the linear constraint $y = A x + B u$ into the above equation gives

$$
Q u + B' P (A x + B u) = 0
$$

$$
(Q + B' P B) u + B' P A x = 0
$$

which is the first-order condition for maximizing $L$ w.r.t. $u$.

Thus, the optimal choice of $u$ must satisfy

$$
u = - (Q + B' P B)^{-1} B' P A x
$$

which follows from the definition of the first-order conditions for the Lagrangian.
Step 3.

Rewriting our problem by substituting the constraint into the objective function, we get

$$
v(x) = \max_{u} \{ -(A x + B u)' P (A x + B u) - u' Q u \}
$$

Since we know the optimal choice of $u$ satisfies $u = -(Q + B' P B)^{-1} B' P A x$, then

$$
v(x) = -(A x + B u)' P (A x + B u) - u' Q u
\quad \text{with} \quad
u = -(Q + B' P B)^{-1} B' P A x
$$

To evaluate the function

$$
\begin{aligned}
v(x) &= -(A x + B u)' P (A x + B u) - u' Q u \\
&= -(x' A' + u' B') P (A x + B u) - u' Q u \\
&= - x' A' P A x - 2 u' B' P A x - u' (Q + B' P B) u
\end{aligned}
$$

For simplicity, denote by $S := (Q + B' P B)^{-1} B' P A$, so that $u = - S x$.

Regarding the second term $- 2 u' B' P A x$,

$$
- 2 u' B' P A x = 2 x' S' B' P A x = 2 x' A' P B (Q + B' P B)^{-1} B' P A x
$$

Notice that the term $(Q + B' P B)^{-1}$ is symmetric as both $P$ and $Q$ are symmetric.

Regarding the third term $- u' (Q + B' P B) u$,

$$
- u' (Q + B' P B) u = - x' S' (Q + B' P B) S x = - x' A' P B (Q + B' P B)^{-1} B' P A x
$$

Hence, the summation of the second and third terms is $x' A' P B (Q + B' P B)^{-1} B' P A x$.

This implies that

$$
\begin{aligned}
v(x) &= - x' A' P A x - 2 u' B' P A x - u' (Q + B' P B) u \\
&= - x' A' P A x + x' A' P B (Q + B' P B)^{-1} B' P A x \\
&= - x' \left[ A' P A - A' P B (Q + B' P B)^{-1} B' P A \right] x
\end{aligned}
$$

Therefore, the solution to the optimization problem is $v(x) = - x' \tilde P x$, where we denote $\tilde P := A' P A - A' P B (Q + B' P B)^{-1} B' P A$.
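As a numerical sanity check of the algebra above, the following sketch draws random matrices satisfying the stated assumptions, compares the analytical maximizer with the output of a generic numerical optimizer, and confirms that the maximized value equals $- x' \tilde P x$ (the variable names are ours)

from scipy.optimize import minimize

np.random.seed(0)
n, m = 3, 2

# Random primitives with the required symmetry / positive semidefiniteness
P = np.random.randn(n, n); P = P @ P.T
Q = np.random.randn(m, m); Q = Q @ Q.T
A = np.random.randn(n, n)
B = np.random.randn(n, m)
x = np.random.randn(n)

def objective(u):
    y = A @ x + B @ u
    return -(- y @ P @ y - u @ Q @ u)   # negate: minimizing this maximizes the original

# Analytical solution derived above
u_star = -solve(Q + B.T @ P @ B, B.T @ P @ A @ x)
P_tilde = A.T @ P @ A - A.T @ P @ B @ solve(Q + B.T @ P @ B, B.T @ P @ A)

u_num = minimize(objective, np.zeros(m)).x

print(np.allclose(u_star, u_num, atol=1e-3))             # the two maximizers agree
print(np.isclose(-objective(u_star), -x @ P_tilde @ x))  # v(x) = -x' P̃ x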
- 1
Although there is a specialized matrix data type defined in NumPy, it’s more standard to work with ordinary NumPy arrays. See this discussion.
- 2
Suppose that $\| S \| < 1$. Take any nonzero vector $x$, and let $r := \| x \|$. We have $\| S x \| = r \| S (x/r) \| \leq r \| S \| < r = \| x \|$. Hence every point is pulled towards the origin.