
Section 4.6 Higher Order Derivatives

The partial derivative of a function \(f\) defined on \(D\subset\mathbb R^N\) is again a function on \(D\text{.}\) Hence we can again take partial derivatives, if they exist. We call such derivatives higher order partial derivatives. We write
\begin{align*} \frac{\partial^2}{\partial x_j\partial x_i}f(\vect a) \amp:=\frac{\partial^2f}{\partial x_j\partial x_i}(\vect a)\\ \amp:=\frac{\partial}{\partial x_j} \Bigl(\frac{\partial f}{\partial x_i}\Bigr)(\vect a)\text{,} \end{align*}
and
\begin{align*} \frac{\partial^2}{\partial x_i^2}f(\vect a) \amp:=\frac{\partial^2f}{\partial x_i^2}(\vect a)\\ \amp:=\frac{\partial}{\partial x_i} \Bigl(\frac{\partial f}{\partial x_i}\Bigr)(\vect a) \end{align*}
for the second order partial derivatives. The exponent indicates how many derivatives we take. A fourth order derivative is for instance given by
\begin{equation*} \frac{\partial^4f}{\partial x_1\partial x_3\partial x_2\partial x_1}(\vect a) :=\frac{\partial}{\partial x_1} \Bigl(\frac{\partial}{\partial x_3} \Bigl(\frac{\partial}{\partial x_2} \Bigl(\frac{\partial f}{\partial x_1} \Bigr)\Bigr)\Bigr)(\vect a)\text{.} \end{equation*}
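To illustrate how this notation is read, take the function \(g(x_1,x_2,x_3):=x_1^2x_2x_3\) on \(\mathbb R^3\) (chosen here purely as an illustration). The derivatives are taken from right to left, so
\begin{align*} \frac{\partial g}{\partial x_1}(\vect x) \amp=2x_1x_2x_3, \amp \frac{\partial^2 g}{\partial x_2\partial x_1}(\vect x) \amp=2x_1x_3,\\ \frac{\partial^3 g}{\partial x_3\partial x_2\partial x_1}(\vect x) \amp=2x_1, \amp \frac{\partial^4 g}{\partial x_1\partial x_3\partial x_2\partial x_1}(\vect x) \amp=2\text{.} \end{align*}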

Example 4.37.

Let \(f(x,y):=x^3-3x^2y\) for all \((x,y)\in\mathbb R^2\text{.}\) Then the first order partial derivatives are
\begin{align*} \frac{\partial f}{\partial x}(x,y) \amp =3x^2-6xy\\ \frac{\partial f}{\partial y}(x,y) \amp =-3x^2\text{.} \end{align*}
Hence the second order partial derivatives are
\begin{align*} \frac{\partial^2 f}{\partial x^2}(x,y) \amp =6x-6y \amp \frac{\partial^2 f}{\partial y\partial x}(x,y) \amp =-6x\\ \frac{\partial^2 f}{\partial x\partial y}(x,y) \amp =-6x \amp \frac{\partial^2 f}{\partial y^2}(x,y) \amp =0\text{.} \end{align*}
In the above example we see that
\begin{equation*} \frac{\partial^2 f}{\partial y\partial x} =\frac{\partial^2 f}{\partial x\partial y}, \end{equation*}
that is, interchanging the order of differentiation leads to the same answer. This is not accidental, as the following proposition shows; its proof is omitted.

Note 4.39.

There are examples showing that the assumption that the second order partial derivatives be continuous is essential for the above result to hold.
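A standard example of this kind is the function given by
\begin{equation*} f(x,y):= \begin{cases} \dfrac{xy(x^2-y^2)}{x^2+y^2} \amp \text{if }(x,y)\neq(0,0)\text{,}\\ 0 \amp \text{if }(x,y)=(0,0)\text{.} \end{cases} \end{equation*}
Using the definition of the partial derivative one finds that \(\frac{\partial f}{\partial x}(0,y)=-y\) for all \(y\) and \(\frac{\partial f}{\partial y}(x,0)=x\) for all \(x\text{.}\) Hence
\begin{equation*} \frac{\partial^2 f}{\partial y\partial x}(0,0)=-1 \neq 1=\frac{\partial^2 f}{\partial x\partial y}(0,0)\text{,} \end{equation*}
so the mixed second order partial derivatives exist at \((0,0)\) but are not equal. They are not continuous at \((0,0)\text{.}\)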
As a natural generalisation of partial derivatives we studied directional derivatives. We now want to look at higher order directional derivatives. Given a function \(f\) defined on \(D\subset\mathbb R^N\) and a unit vector \(\vect v={(v_1,\dots,v_N)}\) we set
\begin{align*} \frac{\partial^2}{\partial\vect v^2}f(\vect x) \amp:=\frac{\partial}{\partial\vect v} \Bigl(\frac{\partial f}{\partial\vect v}\Bigr)(\vect x)\\ \frac{\partial^3}{\partial\vect v^3}f(\vect x) \amp:=\frac{\partial}{\partial\vect v} \Bigl(\frac{\partial}{\partial\vect v} \Bigl(\frac{\partial f}{\partial\vect v}\Bigr)\Bigr)(\vect x) \end{align*}
etc.
In Proposition 4.28 we derived a formula for the directional derivative. We found that
\begin{equation*} \frac{\partial f}{\partial\vect v}(\vect x) =\bigl(\grad f(\vect x)\bigr)\cdot\vect v \end{equation*}
if \(\grad f\) is continuous at \(\vect x\text{.}\) To compute the second directional derivative we can apply the same formula to the function \(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\text{.}\) Doing that we get
\begin{equation*} \frac{\partial^2}{\partial\vect v^2}f(\vect x) =\grad\bigl(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\bigr) \cdot\vect v\text{.} \end{equation*}
To derive a more explicit formula for the above expression we compute the partial derivatives of \(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\text{:}\)
\begin{align*} \frac{\partial}{\partial x_i}\Bigl(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\Bigr) \amp=\frac{\partial}{\partial x_i} \sum_{j=1}^N\frac{\partial f}{\partial x_j}(\vect x)v_j\\ \amp=\sum_{j=1}^N \frac{\partial^2 f}{\partial x_i\partial x_j}(\vect x)v_j\text{.} \end{align*}
Therefore,
\begin{align*} \frac{\partial^2}{\partial\vect v^2}f(\vect x) \amp=\grad\bigl(\bigl(\grad f(\vect x)\bigr)\cdot\vect v\bigr) \cdot\vect v\\ \amp=\sum_{i=1}^N\sum_{j=1}^N \frac{\partial^2 f}{\partial x_i\partial x_j}(\vect x)v_iv_j\text{.} \end{align*}
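For instance, if \(N=2\) the double sum consists of four terms, namely
\begin{equation*} \frac{\partial^2}{\partial\vect v^2}f(\vect x) =\frac{\partial^2 f}{\partial x_1^2}(\vect x)v_1^2 +\frac{\partial^2 f}{\partial x_1\partial x_2}(\vect x)v_1v_2 +\frac{\partial^2 f}{\partial x_2\partial x_1}(\vect x)v_2v_1 +\frac{\partial^2 f}{\partial x_2^2}(\vect x)v_2^2\text{.} \end{equation*}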
If we set
\begin{equation} H_f(\vect x):= \begin{bmatrix} \frac{\partial^2}{\partial x_1^2}f(\vect x) \amp \dots \amp \frac{\partial^2}{\partial x_1\partial x_N}f(\vect x) \\ \vdots \amp \ddots \amp \vdots \\ \frac{\partial^2}{\partial x_N\partial x_1}f(\vect x) \amp \dots \amp \frac{\partial^2}{\partial x_N^2}f(\vect x) \end{bmatrix}\text{,}\tag{4.5} \end{equation}
and
\begin{equation*} \vect v= \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix} \quad\text{and}\quad \vect v^T= \begin{bmatrix} v_1 \amp \dots \amp v_N \end{bmatrix}\text{,} \end{equation*}
then, using matrix multiplication, we can rewrite the second directional derivative as
\begin{equation*} \frac{\partial^2}{\partial\vect v^2}f(\vect x) =\vect v^TH_f(\vect x)\vect v\text{.} \end{equation*}
This motivates the following definition.

Definition 4.40. Hessian matrix.

The matrix \(H_f(\vect x)\) given by (4.5) is called the Hessian matrix of \(f\) at \(\vect x\text{.}\)

Remark 4.41.

It follows from Proposition 4.38 that the Hessian matrix \(H_f(\vect x)\) is symmetric if the second order partial derivatives of \(f\) are continuous at \(\vect x\text{.}\)

Example 4.42.

The Hessian matrix of the function in Example 4.37 is
\begin{equation*} H_f(x,y)= \begin{bmatrix} 6x-6y \amp -6x \\ -6x \amp 0 \end{bmatrix}\text{.} \end{equation*}
Note that the matrix is symmetric.
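As an illustration, the second directional derivative of this function in the direction of the unit vector \(\vect v=\frac{1}{\sqrt 2}(1,1)\) (a direction chosen here only as an example) is
\begin{equation*} \frac{\partial^2}{\partial\vect v^2}f(x,y) =\vect v^TH_f(x,y)\vect v =\frac12\bigl((6x-6y)-6x-6x+0\bigr) =-3x-3y\text{.} \end{equation*}
The following Python sketch compares this value at the point \((1,2)\) with a central second difference of \(t\mapsto f(\vect x+t\vect v)\) at \(t=0\text{.}\) The helper names, the sample point and the step size `h` are assumptions made for illustration only.

```python
import math

def f(x, y):
    # Function from Example 4.37: f(x, y) = x^3 - 3 x^2 y
    return x**3 - 3 * x**2 * y

def hessian(x, y):
    # Hessian matrix of f, built from its second order partial derivatives
    return [[6 * x - 6 * y, -6 * x],
            [-6 * x, 0.0]]

def quadratic_form(H, v):
    # Computes v^T H v as the double sum over i and j of H[i][j] v_i v_j
    n = len(v)
    return sum(H[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

def second_directional_fd(g, point, v, h=1e-3):
    # Central second difference of t -> g(point + t v) at t = 0
    # (hypothetical helper; h is an assumed step size)
    x, y = point
    forward = g(x + h * v[0], y + h * v[1])
    backward = g(x - h * v[0], y - h * v[1])
    return (forward - 2 * g(x, y) + backward) / h**2

point = (1.0, 2.0)
v = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit vector

exact = quadratic_form(hessian(*point), v)
approx = second_directional_fd(f, point, v)
print(exact, approx)
```

Both printed values should agree closely with \(-3\cdot 1-3\cdot 2=-9\text{;}\) the finite difference is only an approximation and is sensitive to the choice of `h`.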
Let us summarise what we just found.
In principle we could continue to apply (4.3) to compute the third, fourth and higher directional derivatives. However, for later purposes we only need the second derivative. We next want to use what we learnt to find ‘Taylor polynomials’ for functions of several variables.