Linear Algebra

### Introduction

Linear algebra is the language of quantum computing. It is therefore crucial to develop a good understanding of the basic mathematical concepts that linear algebra is built upon, in order to arrive at many of the amazing and interesting constructions seen in quantum computation. The goal of this section is to create a foundation of introductory linear algebra knowledge, upon which the reader can build during their study of quantum computing.

### Vectors and Vector Spaces

We will start our investigation into introductory linear algebra by first discussing one of the most important mathematical quantities in quantum computation: the vector.

Formally, a vector $|v\rangle$ is defined as elements of a set known as a vector space. A more intuitive and geometric definition is that a vector "is a mathematical quantity with both direction and magnitude". For instance, consider a vector with $x$ and $y$ components of the form $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$. This vector can be visualized as an arrow pointing in the direction of $3$ units down the $x$ axis and $5$ units up the $y$ axis:

Note that "tail" of the vector doesn't have to be positioned at the origin; it only needs to point in the correct direction.

In quantum computing, we often deal with state vectors, which are simply vectors that point to a specific point in space that corresponds to a particular quantum state. This can be visualized using a Bloch sphere. For instance, a vector representing the state of a quantum system could look something like this arrow, enclosed inside the Bloch sphere, which is the so-called "state space" of all possible points to which our state vectors can "point":

This particular state corresponds to an even superposition between $|0\rangle$ and $|1\rangle$ (the arrow is halfway between $|0\rangle$ at the top and $|1\rangle$ at the bottom of the sphere). Our vectors are allowed to rotate anywhere on the surface of the sphere, and each of these points represents a different quantum state.

Let's revisit our more formal definition of a vector, which is that a vector is an element of a vector space. We must now define a vector space. A vector space $V$ over a field $F$ is a set of objects (vectors), where two conditions hold. Firstly, vector addition of two vectors $|a\rangle, \ |b\rangle \ \in \ V$ will yield a third vector $|a\rangle \ + \ |b\rangle \ = \ |c\rangle$, also contained in $V$. The second condition is that scalar multiplication between some $|a\rangle \ \in \ V$ and some $n \ \in \ F$, denoted by $n|a\rangle$, is also contained within $V$.

We will now clarify this previous definition by working through a basic example. Let us demonstrate that the set $\mathbb{R}^2$ over the field $\mathbb{R}$ is a vector space. We assert that

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \ + \ \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} \ = \ \begin{pmatrix} x_1 \ + \ x_2 \\ y_1 \ + \ y_2 \end{pmatrix}$$

is contained within $\mathbb{R}^2$. This is evidently the case, as the sum of two real numbers is a real number, making both components of the newly-formed vector real numbers; thus, the vector is contained in $\mathbb{R}^2$ by definition. We also assert that:

$$n |v\rangle \ = \ \begin{pmatrix} nx \\ ny \end{pmatrix} \ \in \ V \ \ \ \ \forall n \ \in \ \mathbb{R}$$

This is true as well, as the product of a real number and a real number is a real number, making the entire new vector real, and thus proving this statement.

### Matrices and Matrix Operations

Let's turn our attention to another fundamental concept: a matrix. Matrices are mathematical objects that transform vectors into other vectors:

$$|v\rangle \ \rightarrow \ |v'\rangle \ = \ M |v\rangle$$

Generally, matrices are written as "arrays" of numbers, looking something like this:

$$M \ = \ \begin{pmatrix} 1 & -2 & 3 \\ 1 & 5i & 0 \\ 1 \ + \ i & 7 & -4 \end{pmatrix}$$

We can "apply" a matrix to a vector by performing matrix multiplication. In general, matrix multiplication between two matrices involves taking the first row of the first matrix, and multiplying each element by its "partner" in the first column of the second matrix (the first number of the row is multiplied by the first number of the column, second number of the row and second number of column, etc.). The sum of these new numbers becomes the first element of the first row of the new matrix. To fill in the rest of the first row, we repeat this process for the second, third, etc. columns of the second matrix. Then we take the second row of the first matrix, and repeat the process for each column of the second matrix, to produce the second row. We perform this process until we have used all rows of the first matrix. The resulting matrix is our new matrix. Here is an example:

$$\begin{pmatrix} 2 & 0 \\ 5 & -1 \end{pmatrix} \begin{pmatrix} -3 & 1 \\ 2 & 1 \end{pmatrix} \ = \ \begin{pmatrix} (2)(-3) + (0)(2) & (2)(1) \ + \ (0)(1) \\ (5)(-3) + (-1)(2) & (5)(1) \ + \ (-1)(1) \end{pmatrix} \ = \ \begin{pmatrix} -6 & 2 \\ -17 & 4 \end{pmatrix}$$

To perform a quantum computation, we have some quantum state vector we manipulate by applying a matrix to that vector. A vector is simply a matrix with one column. To apply a matrix to a vector, therefore, we follow the same matrix multiplication procedure described above. We manipulate qubits in our quantum computer by applying sequences of quantum gates. Each quantum gate can be expressed as a matrix that can be applied to state vectors, thus changing the state. For instance, a commonly seen quantum gate is the Pauli-X gate, which is represented by the following matrix:

$$\sigma_x \ = \ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

This gate acts similarly to the classical NOT logic gate. It maps the computational basis state $|0\rangle$ to $|1\rangle$ and $|1\rangle$ to $|0\rangle$ (it "flips" the state). We write the two basis states as column vectors:

$$|0\rangle \ = \ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \ \ \ \ \ \ \ |1\rangle \ = \ \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

When we apply this matrix to each of the vectors:

$$\sigma_x |0\rangle \ = \ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \ = \ \begin{pmatrix} (0)(1) \ + \ (1)(0) \\ (1)(1) \ + \ (0)(0) \end{pmatrix} \ = \ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \ = \ |1\rangle$$

$$\sigma_x |1\rangle \ = \ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \ = \ \begin{pmatrix} (0)(0) \ + \ (1)(1) \\ (1)(0) \ + \ (0)(1) \end{pmatrix} \ = \ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \ = \ |0\rangle$$

The matrix acts on the state vectors as expected.

Within quantum computation, we often encounter two important types of matrices: Hermitian and Unitary matrices. The former is more important in the study of quantum mechanics, but is still necessary to discuss in a study of quantum computation. The latter is of unparalleled importance in both quantum mechanics and quantum computation. If you take away only one concept from this section on linear algebra, it should be the concept of a unitary matrix.

A Hermitian matrix is simply a matrix that is equal to its conjugate transpose (denoted with a $\dagger$ symbol). This means that flipping the sign of a Hermitian matrix's imaginary components, then reflecting its entries along its main diagonal (from the top left to bottom right corners), produces an equal matrix. For instance, the Pauli-Y matrix, commonly used in quantum computation, is Hermitian:

$$\sigma_y \ = \ \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \ \Rightarrow \ \sigma_y^{\dagger} \ = \ \begin{pmatrix} 0 & -(i) \\ -(-i) & 0 \end{pmatrix} \ = \ \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \ = \ \sigma_y$$

Notice how we switched the places of the $i$ and the $-i$ (as we reflect across the main diagonal, the zeroes remain unchanged), and then flipped the sign.

A unitary matrix is very similar. Specifically, it is a matrix such that the inverse matrix is equal to the conjugate transpose of the original matrix.

The inverse of some matrix $A$, denoted as $A^{-1}$, is a matrix such that:

$$A^{-1} A \ = \ A A^{-1} \ = \ \mathbb{I}$$

where $\mathbb{I}$ is the identity matrix. The identity matrix has $1$s along the main diagonal (top left to bottom right), and $0$s in all other places. It is called the identity matrix because it acts trivially on any other matrix (it has no effect). You can prove this on your own by multiplying an identity matrix by any other matrix.

When matrices get larger than $2 \ \times \ 2$, calculating the inverse becomes sufficiently complicated that it is usually left to computers to calculate. For a $2 \ \times \ 2$ matrix, the inverse is defined as:

$$A \ = \ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \ \Rightarrow \ A^{-1} \ = \ \frac{1}{\text{det} \ A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

where $\text{det} \ A$ is the determinant of the matrix. In the $2 \ \times \ 2$ case, $\text{det} \ A \ = \ ad \ - \ bc$.

Calculating inverse matrices is rarely important in quantum computing. Since most of the matrices we encounter are unitary, we can assume that the inverse is simply given by taking the conjugate transpose.

Let's look at a basic example. The Pauli-Y matrix, in addition to being Hermitian, is also unitary (it is equal to its conjugate transpose, which is also equal to its inverse; therefore, the Pauli-Y matrix is its own inverse!). We can verify that this matrix is in fact unitary:

$$\sigma_y \ = \ \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \ \ \ \ \ \sigma_y^{\dagger} \ = \ \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \ \Rightarrow \ \sigma_y^{\dagger} \sigma_y \ = \ \begin{pmatrix} (0)(0) + (-i)(i) & (0)(-i) \ + \ (-i)(0) \\ (i)(0) \ + \ (0)(i) & (i)(-i) \ + \ (0)(0) \end{pmatrix} \ = \ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \ = \ \mathbb{I}$$

The reason unitary matrices are important will become more apparent in the section on Hilbert spaces, and more so in the quantum mechanics subtopic of this textbook. The basic idea is that evolution of a quantum state by application of a unitary matrix "preserves" the norm (magnitude) of the quantum state.

### Spanning Sets, Linear Dependence, and Bases

We are now in a position to discuss the construction of vector spaces. Consider some vector space $V$. We say that some set of vectors $S$ spans a subspace $V_S \ \subset \ V$ (subset closed under vector space operations) of the vector space, if we can write any vector in the subspace as a linear combination of vectors contained within the spanning set.

A linear combination of some collection vectors $|v_1\rangle, \ ..., \ |v_n\rangle$ in some vector space over a field $F$ is defined as an arbitrary sum of these vectors (which of course will be another vector that we will call $|v\rangle$):

$$|v\rangle \ = \ f_1 |v_1\rangle \ + \ f_2 |v_2\rangle \ + \ ... \ + \ f_n |v_n\rangle \ = \ \displaystyle\sum_{i} \ f_i |v_i\rangle$$

where each $f_i$ is some element of $F$. If we have a set of vectors that spans a space, we are saying that any other vector in the vector space can be written as a linear combination of these vectors.

A set of vectors $|v_1\rangle, \ ..., \ |v_n\rangle$ is said to be linearly dependent if there exist corresponding coefficients for each vector, $b_i \ \in \ F$, such that:

$$b_1 |v_1\rangle \ + \ b_2 |v_2\rangle \ + \ ... \ + \ b_n |v_n\rangle \ = \ \displaystyle\sum_{i} \ b_i |v_i\rangle \ = \ 0,$$

where at least one of the $b_i$ coefficients is non-zero. This is equivalent to the more intuitive statement that "the set of vectors can be expressed as linear combinations of each other". For example, let us have the set $\{|v_1\rangle, \ ..., \ |v_n\rangle \}$ along with the corresponding coefficients $\{b_1, \ ..., \ b_n \}$, such that the linear combination is equal to $0$. Since there is at least one vector with a non-zero coefficient, we choose a term in the linear combination $b_a |v_a\rangle$:

$$\displaystyle\sum_{i} \ b_i |v_i\rangle \ = \ b_a |v_a\rangle \ + \ \displaystyle\sum_{i, \ i \ \neq \ a} \ b_i |v_i\rangle \ = \ 0 \ \Rightarrow \ |v_a\rangle \ = \ - \displaystyle\sum_{i, \ i \ \neq \ a} \ \frac{b_i}{b_a} |v_i\rangle \ = \ \displaystyle\sum_{i, \ i \ \neq \ a} \ c_i |v_i\rangle$$

In the case that $b_a$ is the only non-zero coefficient, it is necessarily true that $|v_a\rangle$ is the null vector, automatically making the set linearly dependent. If this is not the case, $|v_a\rangle$ has been written as a linear combination of non-zero vectors, as was shown above. To prove the converse, we assume that there exists some vector $|v_a\rangle$ in the subspace $|v_1\rangle, ..., \ |v_n\rangle$ that can be written as a linear combination of other vectors in the subspace. This means that:

$$|v_a\rangle \ = \ \displaystyle\sum_{s} b_s |v_s\rangle$$

where $s$ is an index that runs over a subset of the subspace. It follows that:

$$|v_a\rangle \ - \ \displaystyle\sum_{s} b_s |v_s\rangle \ = \ |v_a\rangle \ - \ (b_1|v_{s_1}\rangle \ + \ ... \ + \ b_r|v_{s_r}\rangle) \ = \ 0$$

For all vectors in the subspace that are not included in the subset indexed by $s$, we set their coefficients, indexed by $q$, equal to $0$. Thus,

$$|v_a\rangle \ - \ (b_1|v_{s_1}\rangle \ + \ ... \ + \ b_r|v_{s_r}\rangle) \ + \ (0)(|v_{q_1}\rangle \ + \ ... \ + \ |v_{q_t}\rangle) \ = \ 0$$

which is a linear combination of all elements in the subspace $|v_1\rangle, \ ..., \ |v_n\rangle$. This is equal to $0$, thus completing the proof that the two definitions of linear dependence imply each other.

Let's now consider a basic example. Consider the set of two vectors in $\mathbb{R}^2$, consisting of $|a\rangle \ = \ \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|b\rangle \ = \ \begin{pmatrix} 2 \\ 0 \end{pmatrix}$. If we choose the field over our vector space to be $\mathbb{R}$, then we can create a linear combination of these vectors that equates to $0$. For example:

$$2|a\rangle \ - \ |b\rangle \ = \ 0$$

A set of vectors is said to be linearly independent if there is no vector in the set that can be expressed as a linear combination of all the others.

The notion of a basis is simply a linearly independent spanning set. In this sense, the basis of a vector space is the minimal possible set that spans the entire space. We call the size of the basis set the dimension of the vector space.

Bases and spanning sets are important because they allow us to "shrink down" vector spaces and express them in terms of only a few vectors. We can come to certain conclusions about our basis set that we can generalize to the entire vector space, simply because we know every vector in the space is just a linear combination of the basis vectors.

In quantum computation, one of the bases that we often encounter is $|0\rangle, \ |1\rangle$. We can write any other qubit state as a linear combination of these basis vectors. For instance, the linear combination

$$\frac{|0\rangle \ + \ |1\rangle}{\sqrt{2}}$$

represents a superposition between the $|0\rangle$ and $|1\rangle$ basis state, with equal probability of measuring the state to be in either one of the basis vector states (this is intuitive, as the "weight" or the "amount of each basis vector" in the linear combination is equal, both being scaled by $1/\sqrt{2}$).

### Hilbert Spaces, Orthonormality, and the Inner Product

Hilbert Spaces are one of the most important mathematical constructs in quantum mechanics and quantum computation. A Hilbert space can be thought of as the state space in which all quantum state vectors "live". The main difference between a Hilbert space and any random vector space is that a Hilbert space is equipped with an inner product, which is an operation that can be performed between two vectors, returning a scalar.

In the context of quantum mechanics and quantum computation, the inner product between two state vectors returns a scalar quantity representing the amount to which the first vector lies along the second vector. From this, the probabilities of measurement in different quantum states (among other things) can be calculated (this will be discussed more in the quantum mechanics subtopic).

For two vectors $|a\rangle$ and $|b\rangle$ in a Hilbert space, we denote the inner product as $\langle a | b \rangle$, where $\langle a |$ is equal to the conjugate transpose of $|a\rangle$, denoted $|a\rangle^{\dagger}$. Thus, the inner product between two vectors of the Hilbert space looks something like:

$$\langle a | b \rangle \ = \ \begin{pmatrix} a_1^{*} & a_2^{*} & ... & a_n^{*} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ . \\ . \\ . \\ b_n \end{pmatrix} \ = \ a_1^{*} b_1 \ + \ a_2^{*} b_2 \ + \ ... \ + \ a_n^{*} b_n$$

where $*$ denotes the complex conjugate of the vector.

One of the most important conditions for a Hilbert space representing a quantum system is that the inner product of a vector with itself is equal to one: $\langle \psi | \psi \rangle \ = \ 1$. This is the so-called normalization condition, which states that the length of the vector squared (each component of the vector is squared and summed together, by definition of the inner product) must be equal to one. The physical significance of this is that the length of a vector in a particular direction is representative of the "probability amplitude" of the quantum system with regards to measurement in that particular state. Obviously, the probability of the quantum system being measured in the state that it is in must be $1$ (after all, the sum of the probabilities of finding the quantum system in any particular state must equal $1$).

Let's consider the Bloch sphere:

The surface of this sphere, along with the inner product between qubit state vectors, is a valid Hilbert space. In addition, the normalization condition holds true, as the radius of the Bloch sphere is $1$, and thus the length squared of each vector must also equal $1$.

A final note regarding Hilbert spaces and the inner product is their relationship to unitary matrices. Unitary matrices are important in quantum computation because they preserve the inner product, meaning that no matter how you transform a vector under a sequence of unitary matrices, the normalization condition still holds true. This can be demonstrated in the following short proof:

$$\langle \psi | \psi \rangle \ = \ 1 \ \Rightarrow \ |\psi\rangle \ \rightarrow \ U |\psi\rangle \ = \ |\psi'\rangle \ \Rightarrow \ \langle \psi' | \psi' \rangle \ = \ (U |\psi\rangle)^{\dagger} U|\psi\rangle \ = \ \langle \psi | U^{\dagger} U |\psi\rangle \ = \ \langle \psi | \psi \rangle \ = \ 1$$

This means that unitary evolution sends quantum states to other valid quantum states. For a single-qubit Hilbert space, represented by the Bloch sphere, unitary transformations correspond to rotations of state vectors to different points on the sphere, not changing the length of the state vector in any way.

## References

[1] Cayley, Arthur. “A Memoir on the Theory of Matrices.” Philosophical Transactions of the Royal Society of London, vol. 148, 1858, pp. 17–37. JSTOR.

[2] A New Branch of Mathematics: The Ausdehnungslehre of 1844 and Other Works: Hermann Grassmann, Lloyd C. Kannenberg: 9780812692761