The LookAt function was originally part of the OpenGL Utility library (GLU), an utility library for OpenGL that is now outdated. The function was used to update the internal model-view matrix using the LookAt camera model. With the release of version 3.0 in 2008, OpenGL moved away from the fixed function pipeline, which is often referred to as legacy OpenGL now, in favor of a programmable, shader based, pipeline. This meant most concepts and elements concerning the fixed function pipeline were declared deprecated, including the LookAt function from the GLU library.

In a programmable pipeline there is no internal model-view matrix, we have to manage the matrices ourselves and load them to our shaders when necessary. This means we need to create our own LookAt function if we want to use the LookAt camere model. The source code for the function is quite short as shown below.

The function takes three vectors as arguments that together describe the
position and orientation of a camera in world space. The **eye** vector defines the
position of the camera, the **at** vector is the position where the camera is
looking at and the **up** vector specifies the up direction of the camera. Using
these three vectors we can construct a set of orthonormal basis vectors that
define the orientation of the camera coordinate system relative to the world
coordinate system. Using the orientation and position, which is the defined by
the *eye* vector, we can create an affine transformation matrix that transforms
vertices from camera space to world space. To do the reverse transformation,
moving the vertices from world space to camera space, we have to compute the inverse
of this affine transformation matrix.

We will be using a right handed coordinate system for the world and camera
space. Although with the programmable pipeline we are free to use whichever
coordinate system we prefer, the convention in OpenGL is to use a right handed
coordinate system in world and camera space, as this was the default setting
in legacy OpenGL. In world space the *x*-axis will be pointing east, the *y*-axis
up and the *z*-axis will be pointing south. In camera space the *x*-axis will be
pointing to the right, when looking in the direction of the camera, the *y*-axis
up and the *z*-axis will be pointing in the opposite direction of the camera, so
the camera will be looking in the negative *z* direction.

First we create the *z*-axis basis vector by subtracting the *eye* vector from the
*at* vector. We now have a vector that defines the displacement from the position
where the camera is looking at to the position of the camera. All we need to do
now is normalize the vector to get the first basis vector.

The *z*-axis basis vector we computed is pointing in the direction of the camera,
but as we stated above it should be pointing in the opposite direction. This
means we have to negate the *z*-axis basis vector we computed. However we will do
this after computing the other two basis vectors. In effect we are create a left
handed coordinate system with the *x*-axis pointing to the right, *y*-axis up and
*z*-axis in the direction of the camera. Finally we will transform this left
handed coordinate system into a right handed one by flipping the *z*-axis.

To compute the next basis vector we will use the cross product, this yields
a vector that is **perpendicular** to the original two vectors, which is exactly
what we need. The order of the vectors is important as this determines in
which direction the perpendicular vector will point. We can determine in which
direction the vector will point by using the right hand rule, right hand because
we are working in a right handed coordinate system. We also need to normalize
this vector because the up vector might not be a unit vector.

We will compute the third basis vector by taking the cross product of the *x*-axis
and *z*-axis basis vectors. It’s possible that this vector is the same as the up
vector that was passed as argument, but only if it is a unit vector orthogonal
to the vector from *eye* to *at*.

Now we can negate the *z*-axis to become a right handed coordinate system with
the camera pointing in the negative *z* direction.

With the orientation and the position of the camera specified, we can start construction the affine transformation matrix that will transform vertices from world space to camera space. An affine transformation is a linear transformation followed by a translation. We can separate the linear transformation and translations portions as shown below.

\[\begin{align*} M = TR &= \begin{bmatrix} 1 & 0 & 0 & \Delta x\\ 0 & 1 & 0 & \Delta y\\ 0 & 0 & 1 & \Delta z\\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r^{11} & r^{21} & r^{31} & 0\\ r^{12} & r^{22} & r^{32} & 0\\ r^{13} & r^{23} & r^{33} & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} r^{11} & r^{21} & r^{31} & \Delta x\\ r^{12} & r^{22} & r^{32} & \Delta y\\ r^{13} & r^{23} & r^{33} & \Delta z\\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*}\]It’s important to keep in mind that the order of multiplication matters. OpenGL uses column vectors by default, this means we need to read matrix vector multiplication from right to left. So in the above example the linear transformation would be applied first and then the translation.

Separating the affine transformation in a linear transformation and translation portion makes it easier for us to compute its inverse. We will have to reverse the order of the matrices, as the inverse of a matrix product is equal to the product of the inverses of the matrices, taken in reverse order.

\[(TR)^{-1} = R^{-1}T^{-1}\]First we’ll compute the inverse of the linear transformation portion. In our case this is the set of orthonormal basis vectors we just computed. Because a set of orthonormal basis vectors form an orthogonal matrix, its inverse is equal to transposing the matrix.

\[\begin{align*} \mathbf R \ is \ orthogonal\\ \Leftrightarrow\\ \mathbf R^{-1} &= \mathbf R^T\\ &= \begin{bmatrix} r^{11} & r^{21} & r^{31} & 0\\ r^{12} & r^{22} & r^{32} & 0\\ r^{13} & r^{23} & r^{33} & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}^{T}\\ &= \begin{bmatrix} r^{11} & r^{12} & r^{13} & 0\\ r^{21} & r^{22} & r^{23} & 0\\ r^{23} & r^{32} & r^{33} & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*}\]Because we’re using column vectors, the original transformation matrix contains the basis vectors as columns of the matrix. By taking the transpose of this matrix the basis vectors become the rows of the matrix.

Next up is the translation matrix. To compute its inverse we only need to negate the translation components.

\[\begin{align*} \mathbf T^{-1} &= \begin{bmatrix} 1 & 0 & 0 & \Delta x\\ 0 & 1 & 0 & \Delta y\\ 0 & 0 & 1 & \Delta z\\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}\\ &= \begin{bmatrix} 1 & 0 & 0 & -\Delta x\\ 0 & 1 & 0 & -\Delta y\\ 0 & 0 & 1 & -\Delta z\\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*}\]Now we can combine the inverse of the linear transformation and the inverse of the translation in one affine transformation. Because we have to reverse the order of the matrix product when taking the inverse, we now need to multiply the rows of the linear transformation matrix, the basis vectors, with the columns of the translation matrix. For the linear transformation portion this has no consequences, it remains unchanged. However for the right most column, the translation component, we now need to multiply each basis vector with the translation vector. Which is equal to taking the dot product of each basis vector with the translation vector.

From a geometric point of view this makes sense as the dot product can be
interpreted as a **projection**. This means the dot product can be used to measure
displacement in a particular direction. In our case we measure the displacement
of the translation vector into the direction of each of the basis vectors.

We now have enough information to create the view matrix.

\[\begin{align*} &n \ = \ x\text{-}axis\\ &u \ = \ y\text{-}axis\\ &v \ = \ z\text{-}axis\\ &e \ = \ eye\\ \end{align*}\] \[\begin{align*} M^{-1} &= (TR)^{-1}\\ &= R^{-1}T^{-1} \\ &= \begin{bmatrix} n_{x} & u_{x} & v_{x} & 0\\ n_{y} & u_{y} & v_{y} & 0\\ n_{z} & u_{z} & v_{z} & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 0 & 0 & e_{x}\\ 0 & 1 & 0 & e_{y}\\ 0 & 0 & 1 & e_{z}\\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}\\ &= \begin{bmatrix} n_{x} & n_{y} & n_{z} & 0\\ u_{x} & u_{y} & u_{z} & 0\\ v_{x} & v_{y} & v_{z} & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -e_{x}\\ 0 & 1 & 0 & -e_{y}\\ 0 & 0 & 1 & -e_{z}\\ 0 & 0 & 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} n_{x} & n_{y} & n_{z} & -(n \cdot e)\\ u_{x} & u_{y} & u_{z} & -(u \cdot e)\\ v_{x} & v_{y} & v_{z} & -(v \cdot e)\\ 0 & 0 & 0 & 1 \end{bmatrix}\\ \end{align*}\]Finally, in code this becomes:

This concludes our overview of the LookAt function. If you have any questions feel free to leave a comment below!