Appendix A — Linear algebra
Matrices come to us from a branch of mathematics known as linear algebra. A full course in linear algebra would provide a lot more detail than we can cover here. Our goal is merely to provide enough background to make sense of the material in the text that uses matrices. We focus on some basic details such as what matrices are, some of the operations we can perform with them, and some useful results.
In the canonical setting in econometrics, we have \(n\) observations on \(k\) variables. Each observation might be a person or a firm, or even a firm at a particular point in time. In earlier chapters of the book, we considered the possibility that the variables for observation \(i\) are related in the following way:
\[ y_i = \beta_0 + x_{1i} \beta_1 + x_{2i} \beta_2 + \dots + x_{ki} \beta_{k} + \epsilon_i \] For example, \(y_i\) might represent the profitability of a firm in a given year and the various \(x\) variables are factors assumed to affect that profitability, such as capital stock and market concentration. The error term (\(\epsilon_i\)) allows the equation to hold when the variables \(x_{1i}\) through \(x_{ki}\) do not determine the exact value of \(y_i\). Given we have \(n\) observations, we actually have \(n\) equations.
\[ \begin{aligned} y_1 &= \beta_0 + x_{11} \beta_1 + x_{21} \beta_2 + \dots + x_{k1} \beta_{k} + \epsilon_1 \\ y_2 &= \beta_0 + x_{12} \beta_1 + x_{22} \beta_2 + \dots + x_{k2} \beta_{k} + \epsilon_2 \\ &\;\;\vdots \\ y_n &= \beta_0 + x_{1n} \beta_1 + x_{2n} \beta_2 + \dots + x_{kn} \beta_{k} + \epsilon_n \end{aligned} \] As we shall see, matrices allow us to write this system of equations succinctly and to manipulate it concisely.
A.1 Vectors
For an observation, we might have data on sales, profit, R&D spending, and fixed assets. We can arrange these data as a vector: \(y = ( \textrm{sales}, \textrm{profit}, \textrm{R\&D}, \textrm{fixed assets})\). This \(y\) is an \(n\)-tuple (here \(n = 4\)), which is a finite ordered list of elements. A more generic representation of a vector \(y\) would be \(y = (y_1, y_2, \dots, y_n)\).
A.1.1 Operations on vectors
Suppose we have two vectors \(x = (x_1, x_2, \dots, x_n)\) and \(y = (y_1, y_2, \dots, y_n)\).
Vectors of equal length can be added:
\[x + y = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n)\]
and subtracted:
\[x - y = (x_1 - y_1, x_2 - y_2, \dots, x_n - y_n)\] Vectors can also be multiplied by a real number \(\lambda\):
\[\lambda y = (\lambda y_1, \lambda y_2, \dots, \lambda y_n)\]
Definition A.1 (Dot product) The dot product (or scalar product) of two \(n\)-vectors \(x\) and \(y\) is denoted as \(x \cdot y\) and is defined as:
\[ x \cdot y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = \sum_{i=1}^n x_i y_i. \]
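These operations map directly onto array operations in NumPy (a minimal sketch, assuming Python with NumPy; the particular numbers are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

x + y          # element-wise addition: array([5., 7., 9.])
x - y          # element-wise subtraction: array([-3., -3., -3.])
2.5 * y        # multiplication by a real number: array([10. , 12.5, 15. ])
np.dot(x, y)   # dot product: 1*4 + 2*5 + 3*6 = 32.0
```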
A.2 Matrices
A matrix is a rectangular array of real numbers.
\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mk} \end{bmatrix} \]
Matrices are typically denoted with capital letters (e.g., \(A\)) and the generic element of a matrix is denoted as \(a_{ij}\). We can also express a matrix in terms of its generic element and its dimensions as \(\left[ a_{ij} \right]_{m \times k}\).
Two important matrices are the null matrix, 0, which contains only zeros, and the identity matrix of size \(n\), \(I_n\) or simply \(I\), which has diagonal elements equal to one (\(i_{jj} = 1\) for all \(j\)) and all other elements equal to zero (\(i_{jk} = 0\) for all \(j \neq k\)). Provided that \(I\) and \(A\) are conformable for multiplication (e.g., they are both \(n \times n\) square matrices), we have \(AI = A\) and \(IA = A\), hence the term identity matrix (in some ways, \(I\) is the matrix equivalent of the number \(1\)).
Each row or column of a matrix can be considered a vector, so that an \(m \times k\) matrix can be viewed as \(m\) \(k\)-vectors (the rows) or \(k\) \(m\)-vectors (the columns).
A.2.1 Operations on matrices
Suppose we have two matrices \(A = \left[ a_{ij} \right]_{m \times k}\) and \(B = \left[ b_{ij} \right]_{m \times k}\). Then we can add these matrices:
\[ A + B = \left[ a_{ij} + b_{ij} \right]_{m \times k} \] We can multiply a matrix by a real number \(\lambda\):
\[ \lambda A = \left[ \lambda a_{ij} \right]_{m \times k} \]
Matrix multiplication is defined for two matrices if the number of columns for the first is equal to the number of rows of the second.
Given matrices \(A = \left[ a_{ij} \right]_{m \times l}\) and \(B = \left[ b_{jk} \right]_{l \times n}\), the \(m \times n\) matrix \(AB\) with typical element \(c_{ik}\) is defined as
\[ AB = \left[ {c}_{ik} := \sum_{j=1}^l a_{ij} b_{jk} \right]_{m \times n}\] Alternatively, \(c_{ik} = a_i \cdot b_k\), where \(a_i\) is the \(i\)-th row of \(A\) and \(b_k\) is the \(k\)-th column of \(B\). Note that multiplication of \(A\) and \(B\) requires that they be *conformable* for multiplication. In particular, the number of columns of \(A\) must equal the number of rows of \(B\) for \(AB\) to exist. If the number of rows of \(A\) does not equal the number of columns of \(B\), then \(BA\) will not exist (let alone equal \(AB\)).
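As an illustration (a sketch using NumPy; the matrices are arbitrary), the @ operator implements the product defined above and fails when the matrices are not conformable:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])           # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # 3 x 2

A @ B                                     # 2 x 2 product: array([[ 4.,  5.], [10., 11.]])
np.allclose(np.eye(2) @ (A @ B), A @ B)   # True: I(AB) = AB

# A @ np.ones((2, 2)) raises an error: A has 3 columns but
# np.ones((2, 2)) has 2 rows, so the matrices are not conformable.
```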
Definition A.2 (Transpose) The matrix \(B = \left[ b_{ij} \right]_{n \times m}\) is called the transpose of a matrix \(A = \left[ a_{ij} \right]_{m \times n}\) (and denoted \(A^{\mathsf{T}}\)) if \(b_{ij} = a_{ji}\) for all \(i \in \{1, 2, \dots, n\}\) and all \(j \in \{1, 2, \dots, m\}\).
Definition A.3 (Square matrix) A matrix that has the same number of rows and columns is called a square matrix.
Definition A.4 (Symmetric) A square matrix is symmetric if \(a_{ij} = a_{ji}, \forall i, j\). Clearly if \(A\) is a symmetric matrix, then \(A = A^{\mathsf{T}}\).
Definition A.5 (Matrix inverse) An \(m \times m\) square matrix \(A\) has an inverse, if there exists a matrix denoted as \(A^{-1}\) such that \(A^{-1} A = I_m\) and \(A A^{-1} = I_m\), where \(I_m\) denotes the \(m \times m\) identity matrix. A matrix that has an inverse is said to be invertible or non-singular.
Some properties of inverses include:
- If an inverse of \(A\) exists, it is unique.
- If \(\alpha \neq 0\) and \(A\) is invertible, then \((\alpha A)^{-1} = (1/\alpha) A^{-1}\).
- If \(A\) and \(B\) are both invertible \(m \times m\) matrices, then \((AB)^{-1} = B^{-1} A^{-1}\).
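These properties are easy to check numerically; the following is a sketch using NumPy with arbitrary invertible matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
alpha = 4.0

A_inv = np.linalg.inv(A)
np.allclose(A_inv @ A, np.eye(2))                           # A^{-1} A = I
np.allclose(np.linalg.inv(alpha * A), (1 / alpha) * A_inv)  # (alpha A)^{-1} = (1/alpha) A^{-1}
np.allclose(np.linalg.inv(A @ B),
            np.linalg.inv(B) @ A_inv)                       # (AB)^{-1} = B^{-1} A^{-1}
```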
Here we show a couple of useful results about transposes. First, for two \(n \times n\) matrices \(A\) and \(B\), we have \((AB)^{\mathsf{T}} = B^{\mathsf{T}} A^{\mathsf{T}}\) (in fact, this result holds for any matrices \(A\) and \(B\) that are conformable for multiplication).
\[ \begin{aligned} (AB)^{\mathsf{T}} &= [ab_{ij}]^{\mathsf{T}} \\ &= \left[ ab_{ji} \right] \\ &= \left[ \sum_{k=1}^n a_{jk} b_{ki} \right] \\ &= \left[ \sum_{k=1}^n (B^{\mathsf{T}})_{ik} (A^{\mathsf{T}})_{kj} \right] \\ &= B^{\mathsf{T}} A^{\mathsf{T}} \end{aligned} \]
Second, for a square, invertible matrix \(A\), we have \(\left(A^{\mathsf{T}}\right)^{-1} = \left(A^{-1}\right)^{\mathsf{T}}\):
\[ \begin{aligned} A A^{-1} &= I \\ \left(A^{-1}\right)^{\mathsf{T}} A^{\mathsf{T}} &= I \\ \left(A^{-1}\right)^{\mathsf{T}} A^{\mathsf{T}} \left(A^{\mathsf{T}}\right)^{-1}&= \left(A^{\mathsf{T}}\right)^{-1} \\ \left(A^{-1}\right)^{\mathsf{T}} &= \left(A^{\mathsf{T}}\right)^{-1} \end{aligned} \]
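Both results can likewise be verified numerically (a sketch with NumPy; the matrices below are arbitrary invertible examples):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0]])

np.allclose((A @ B).T, B.T @ A.T)                    # (AB)^T = B^T A^T
np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)  # (A^T)^{-1} = (A^{-1})^T
```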
Definition A.6 (Diagonal matrix) A square matrix \(A\) is a diagonal matrix if \(a_{ij} = 0, \forall i \neq j\). In words, all off-diagonal elements of a diagonal matrix are zero.
Definition A.7 (Linear independence) Let \(\{x_1, x_2, \dots, x_r\}\) be a set of \(n \times 1\) vectors. We say that these vectors are linearly independent if and only if \[ \alpha_1 x_1 + \alpha_2 x_2 + \dots + \alpha_r x_r = 0 \tag{A.1}\] implies that \(\alpha_1 = \alpha_2 = \dots = \alpha_r = 0\). If Equation A.1 holds for a set of scalars that are not all zero, then \(\{x_1, x_2, \dots, x_r\}\) is linearly dependent.
Definition A.8 (Matrix rank) Let \(A\) be an \(m \times k\) matrix. The rank of a matrix \(A\) is the maximum number of linearly independent columns of \(A\). If \(A\) is \(m \times k\) and the rank of \(A\) is \(k\), then \(A\) has full column rank. If \(A\) is \(m \times k\) and \(m \geq k\), then its rank can be at most \(k\).
Some properties of rank include:
- The rank of \(A\) equals the rank of \(A^{\mathsf{T}}\).
- If \(A\) is a \(k \times k\) square matrix with rank \(k\), then it is non-singular.
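NumPy can compute the rank of a matrix directly. In the sketch below (an arbitrary example), the third column is the sum of the first two, so the columns are linearly dependent and the matrix does not have full column rank:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0],
              [1.0, 1.0, 2.0]])   # 4 x 3; column 3 = column 1 + column 2

np.linalg.matrix_rank(A)          # 2 < 3: not of full column rank
np.linalg.matrix_rank(A.T)        # 2: the rank of A equals the rank of A^T
```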
Definition A.9 (Idempotent) A matrix \(A\) is idempotent if it has the property that \(A A = A\).
Definition A.10 (Projection matrix) Given a matrix \(X\) such that \(X^{\mathsf{T}} X\) is invertible, the projection matrix for \(X\) is denoted as \(P_X\) and is defined as \[ P_X = X(X^{\mathsf{T}} X)^{-1}X^{\mathsf{T}} \]
The following shows that \(P_X\) is idempotent:
\[ P_X P_X = X(X^{\mathsf{T}} X)^{-1}X^{\mathsf{T}} X(X^{\mathsf{T}} X)^{-1}X^{\mathsf{T}} = X(X^{\mathsf{T}} X)^{-1}X^{\mathsf{T}} = P_X \]
Note also that \(P_X\) is symmetric, which means that it and its transpose are equal, as the following demonstrates:
\[ \begin{aligned} P_X^{\mathsf{T}} &= \left(X(X^{\mathsf{T}} X)^{-1}X^{\mathsf{T}}\right)^{\mathsf{T}} \\ &= X \left((X^{\mathsf{T}} X)^{-1}\right)^{\mathsf{T}} X^{\mathsf{T}} \\ &= X \left((X^{\mathsf{T}} X)^{\mathsf{T}}\right)^{-1} X^{\mathsf{T}} \\ &= X \left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} \\ &= P_X \end{aligned} \]
In this derivation, we used the two results about transposes shown above.
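Both properties of \(P_X\) can be confirmed numerically; here is a sketch with NumPy, using an arbitrary \(X\) of full column rank:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))            # arbitrary 100 x 3 matrix (full column rank)

P_X = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix for X

np.allclose(P_X @ P_X, P_X)              # idempotent: P_X P_X = P_X
np.allclose(P_X, P_X.T)                  # symmetric: P_X = P_X^T
```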
A.3 The OLS estimator
The classical linear regression model assumes that the data-generating process is \(y = X \beta + \epsilon\) with \(\epsilon \sim IID(0, \sigma^2 I)\), where \(y\) and \(\epsilon\) are \(n\)-vectors, \(X\) is an \(n \times k\) matrix (including the constant term), \(\beta\) is a \(k\)-vector, and \(I\) is the \(n \times n\) identity matrix.1
As discussed in Chapter 3, the ordinary least-squares (OLS) estimator is given by:
\[ \hat{\beta} = \left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} y \] Here we can see that we can only calculate \(\hat{\beta}\) if \(X^{\mathsf{T}} X\) is invertible, which requires that \(X\) have full column rank (i.e., rank \(k\)). This in turn requires that no column of \(X\) be a linear combination of the other columns of \(X\).
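In code, \(\hat{\beta}\) can be computed directly from this formula. The following sketch uses NumPy with simulated data and arbitrary parameter values; in practice, solving the normal equations (or using a routine such as np.linalg.lstsq) is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 1000
beta = np.array([1.0, 2.0, -0.5])                    # arbitrary "true" coefficients

X = np.column_stack([np.ones(n),                     # constant term
                     rng.normal(size=(n, 2))])
y = X @ beta + rng.normal(size=n)                    # epsilon with mean zero

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y          # textbook formula
beta_hat_solve = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations instead

np.allclose(beta_hat, beta_hat_solve)                # True
```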
Assuming \(\mathbb{E}[\epsilon | X] = 0\), we can derive the following result:
\[ \begin{aligned} \mathbb{E}\left[\hat{\beta} | X \right] &= \mathbb{E}\left[\left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} (X\beta + \epsilon) | X \right] \\ &= \mathbb{E}\left[\left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} X\beta | X \right] + \mathbb{E}\left[\left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} \epsilon | X \right] \\ &= \beta + \left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}} \mathbb{E}\left[ \epsilon | X \right] \\ &= \beta \end{aligned} \] By the law of iterated expectations, \(\mathbb{E}[\hat{\beta}] = \mathbb{E}\left[\mathbb{E}[\hat{\beta} | X]\right] = \mathbb{E}[\beta] = \beta\). This demonstrates that \(\hat{\beta}\) is unbiased given these assumptions. But note that the assumption that \(\mathbb{E}[\epsilon | X] = 0\) can be a strong one in some situations. For example, Davidson and MacKinnon point out that “in the context of time-series data, [this] assumption is a very strong one that we may often not feel comfortable making.” As such, many textbook treatments replace \(\mathbb{E}[\epsilon | X] = 0\) with weaker assumptions and focus on the asymptotic property of consistency instead of unbiasedness.
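A small simulation illustrates unbiasedness (a sketch; sample size, number of replications, and parameter values are arbitrary): with \(X\) held fixed and errors drawn so that \(\mathbb{E}[\epsilon | X] = 0\), the average of \(\hat{\beta}\) across replications is close to \(\beta\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_reps = 200, 5000
beta = np.array([1.0, 2.0, -0.5])                  # arbitrary "true" coefficients

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design matrix
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)                   # (X^T X)^{-1} X^T

estimates = np.empty((n_reps, len(beta)))
for r in range(n_reps):
    epsilon = rng.normal(size=n)                   # E[epsilon | X] = 0
    y = X @ beta + epsilon
    estimates[r] = XtX_inv_Xt @ y                  # OLS estimate for this sample

estimates.mean(axis=0)                             # approximately [1.0, 2.0, -0.5]
```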
A.4 Further reading
This appendix barely scratches the surface of matrices and linear algebra. Many econometrics textbooks have introductory sketches of linear algebra that go beyond what we have provided here. Chapter 1 of Davidson and MacKinnon (2004) and Appendix D of Wooldridge (2000) cover the results provided here and more. Standard introductory texts for mathematical economics, such as Chiang (1984) and Simon and Blume (1994), provide introductions to linear algebra.
Here \(IID(0, \sigma^2 I)\) means independent and identically distributed with mean \(0\) and variance \(\sigma^2 I\).