Section 4.6 Higher Order Derivatives
The partial derivative of a function \(f\) defined on \(D\subset\mathbb R^N\) is again a function on \(D\text{.}\) Hence we can again take partial derivatives, if they exist. We call such derivatives higher order partial derivatives. We write
\begin{align*}
\frac{\partial^2}{\partial x_j\partial x_i}f(\vect a)
\amp:=\frac{\partial^2f}{\partial x_j\partial x_i}(\vect a)\\
\amp:=\frac{\partial}{\partial x_j}
\Bigl(\frac{\partial f}{\partial x_i}\Bigr)(\vect a)\text{,}
\end{align*}
and
\begin{align*}
\frac{\partial^2}{\partial x_i^2}f(\vect a)
\amp:=\frac{\partial^2f}{\partial x_i^2}(\vect a)\\
\amp:=\frac{\partial}{\partial x_i}
\Bigl(\frac{\partial f}{\partial x_i}\Bigr)(\vect a)
\end{align*}
for the second order partial derivatives. The exponent indicates how many derivatives we take. A fourth order partial derivative is, for instance, given by
\begin{equation*}
\frac{\partial^4f}{\partial x_1\partial x_3\partial x_2\partial x_1}(\vect a)
:=\frac{\partial}{\partial x_1}
\Bigl(\frac{\partial}{\partial x_3}
\Bigl(\frac{\partial}{\partial x_2}
\Bigl(\frac{\partial f}{\partial x_1}
\Bigr)\Bigr)\Bigr)(\vect a)\text{.}
\end{equation*}
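To illustrate, consider for instance \(g(x_1,x_2,x_3):=x_1^2x_2x_3\) for all \(\vect x\in\mathbb R^3\text{.}\) Taking the derivatives in the indicated order gives
\begin{align*}
\frac{\partial g}{\partial x_1}(\vect x) \amp =2x_1x_2x_3 \amp
\frac{\partial}{\partial x_2}\bigl(2x_1x_2x_3\bigr) \amp =2x_1x_3\\
\frac{\partial}{\partial x_3}\bigl(2x_1x_3\bigr) \amp =2x_1 \amp
\frac{\partial}{\partial x_1}\bigl(2x_1\bigr) \amp =2\text{,}
\end{align*}
so that
\begin{equation*}
\frac{\partial^4g}{\partial x_1\partial x_3\partial x_2\partial x_1}(\vect x)=2
\end{equation*}
for all \(\vect x\in\mathbb R^3\text{.}\)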
Example 4.37.
Let \(f(x,y):=x^3-3x^2y\) for all \((x,y)\in\mathbb R^2\text{.}\) Then the first order partial derivatives are
\begin{align*}
\frac{\partial f}{\partial x}(x,y) \amp =3x^2-6xy\\
\frac{\partial f}{\partial y}(x,y) \amp =-3x^2\text{.}
\end{align*}
Hence the second order partial derivatives are
\begin{align*}
\frac{\partial^2 f}{\partial x^2}(x,y) \amp =6x-6y \amp
\frac{\partial^2 f}{\partial y\partial x}(x,y) \amp =-6x\\
\frac{\partial^2 f}{\partial x\partial y}(x,y) \amp =-6x \amp
\frac{\partial^2 f}{\partial y^2}(x,y) \amp =0\text{.}
\end{align*}
In the above example we see that
\begin{equation*}
\frac{\partial^2 f}{\partial y\partial x}
=\frac{\partial^2 f}{\partial x\partial y}\text{,}
\end{equation*}
that is, interchanging the order of the partial derivatives leads to the same answer. This is not accidental, as the following proposition shows; its proof will be omitted.
Proposition 4.38. Symmetry of second partial derivatives.
Suppose that \(f\text{,}\) defined on \(D\subset\mathbb R^N\text{,}\) has continuous second order partial derivatives. Then
\begin{equation*}
\frac{\partial^2 f}{\partial x_j\partial x_i}
=\frac{\partial^2 f}{\partial x_i\partial x_j}
\end{equation*}
for all \(i,j=1,\dots,N\text{.}\)
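The continuity assumption in Proposition 4.38 cannot simply be dropped. A standard counterexample is the function given by \(f(x,y):=xy(x^2-y^2)/(x^2+y^2)\) for \((x,y)\neq(0,0)\) and \(f(0,0):=0\text{.}\) One can check that
\begin{equation*}
\frac{\partial^2 f}{\partial y\partial x}(0,0)=-1
\qquad\text{whereas}\qquad
\frac{\partial^2 f}{\partial x\partial y}(0,0)=1\text{;}
\end{equation*}
here the mixed second order partial derivatives exist everywhere but are not continuous at the origin.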
As a natural generalisation of partial derivatives we studied directional derivatives. We now want to look at higher order directional derivatives. Given a function \(f\) defined on \(D\subset\mathbb R^N\) and a unit vector \(\vect v={(v_1,\dots,v_N)}\) we set
\begin{align*}
\frac{\partial^2}{\partial\vect v^2}f(\vect x)
\amp:=\frac{\partial}{\partial\vect v}
\Bigl(\frac{\partial f}{\partial\vect v}\Bigr)(\vect x)\\
\frac{\partial^3}{\partial\vect v^3}f(\vect x)
\amp:=\frac{\partial}{\partial\vect v}
\Bigl(\frac{\partial}{\partial\vect v}
\Bigl(\frac{\partial f}{\partial\vect v}\Bigr)\Bigr)(\vect x)
\end{align*}
etc.
In Proposition 4.28 we derived a formula for the directional derivative. We found that
\begin{equation*}
\frac{\partial f}{\partial\vect v}(\vect x)
=\bigl(\grad f(\vect x)\bigr)\cdot\vect v
\end{equation*}
if \(\grad f\) is continuous at \(\vect x\text{.}\) To compute the second directional derivative we can apply the same formula to the function \(\vect x\mapsto\bigl(\grad f(\vect x)\bigr)\cdot\vect v\text{.}\) Doing so we get
\begin{equation*}
\frac{\partial^2}{\partial\vect v^2}f(\vect x)
=\grad\bigl(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\bigr)
\cdot\vect v\text{.}
\end{equation*}
To derive a more explicit formula for the above expression we compute the partial derivatives of \(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\text{:}\)
\begin{align*}
\frac{\partial}{\partial x_i}\Bigl(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\Bigr)
\amp=\frac{\partial}{\partial x_i}
\sum_{j=1}^N\frac{\partial f}{\partial x_j}(\vect x)v_j\\
\amp=\sum_{j=1}^N
\frac{\partial^2 f}{\partial x_i\partial x_j}(\vect x)v_j\text{.}
\end{align*}
Therefore,
\begin{align*}
\frac{\partial^2}{\partial\vect v^2}f(\vect x)
\amp=\grad\bigl(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\bigr)
\cdot\vect v\\
\amp=\sum_{i=1}^N\sum_{j=1}^N
\frac{\partial^2 f}{\partial x_i\partial x_j}(\vect x)v_iv_j\text{.}
\end{align*}
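For instance, in the case \(N=2\text{,}\) and assuming continuous second order partial derivatives so that Proposition 4.38 applies, the two mixed terms of the double sum can be combined, giving
\begin{equation*}
\frac{\partial^2}{\partial\vect v^2}f(\vect x)
=\frac{\partial^2 f}{\partial x_1^2}(\vect x)v_1^2
+2\frac{\partial^2 f}{\partial x_1\partial x_2}(\vect x)v_1v_2
+\frac{\partial^2 f}{\partial x_2^2}(\vect x)v_2^2\text{.}
\end{equation*}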
If we set
\begin{equation}
H_f(\vect x):=
\begin{bmatrix}
\frac{\partial^2}{\partial x_1^2}f(\vect x) \amp \dots \amp
\frac{\partial^2}{\partial x_1\partial x_N}f(\vect x) \\
\vdots \amp \ddots \amp \vdots \\
\frac{\partial^2}{\partial x_N\partial x_1}f(\vect x) \amp \dots \amp
\frac{\partial^2}{\partial x_N^2}f(\vect x)
\end{bmatrix}\text{,}\tag{4.5}
\end{equation}
and
\begin{equation*}
\vect v=
\begin{bmatrix}
v_1 \\ \vdots \\ v_N
\end{bmatrix}
\quad\text{and}\quad
\vect v^T=
\begin{bmatrix}
v_1 \amp \dots \amp v_N
\end{bmatrix}\text{,}
\end{equation*}
then, using matrix multiplication, we can rewrite the second directional derivative as
\begin{equation*}
\frac{\partial^2}{\partial\vect v^2}f(\vect x)
=\vect v^TH_f(\vect x)\vect v\text{.}
\end{equation*}
This motivates the following definition.
Definition 4.40. Hessian matrix.
The matrix \(H_f(\vect x)\) given by (4.5) is called the Hessian matrix of \(f\) at \(\vect x\text{.}\)
Example 4.42.
For the function \(f(x,y):=x^3-3x^2y\) from Example 4.37 the Hessian matrix is
\begin{equation*}
H_f(x,y)=
\begin{bmatrix}
6x-6y \amp -6x \\
-6x \amp 0
\end{bmatrix}\text{.}
\end{equation*}
Note that the matrix is symmetric, as guaranteed by Proposition 4.38, since the second order partial derivatives of \(f\) are continuous.
Let us summarise what we just found.
Proposition 4.43.
Suppose that \(\vect v\) is a unit vector, and that \(f\) has continuous first and second order partial derivatives at \(\vect x\text{.}\) Then the second directional derivative in the direction of \(\vect v\) is given by
\begin{equation*}
\frac{\partial^2}{\partial\vect v^2}f(\vect x)
=\vect v^TH_f(\vect x)\vect v\text{,}
\end{equation*}
where \(H_f(\vect x)\) is the Hessian matrix of \(f\) at \(\vect x\) defined by (4.5).
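For instance, for the function \(f(x,y):=x^3-3x^2y\) from Example 4.37 and the unit vector \(\vect v=\frac{1}{\sqrt2}(1,1)\text{,}\) the Hessian matrix from Example 4.42 gives
\begin{equation*}
\frac{\partial^2}{\partial\vect v^2}f(x,y)
=\frac{1}{2}
\begin{bmatrix}
1 \amp 1
\end{bmatrix}
\begin{bmatrix}
6x-6y \amp -6x \\
-6x \amp 0
\end{bmatrix}
\begin{bmatrix}
1 \\ 1
\end{bmatrix}
=\frac{1}{2}\bigl((6x-6y)-6x-6x+0\bigr)
=-3(x+y)\text{.}
\end{equation*}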
In principle we could continue to apply (4.3) to compute the third, fourth and higher directional derivatives. However, for later purposes we only need the second derivative. We next want to use what we learnt to find `Taylor polynomials' for functions of several variables.