If \(g\) is a differentiable function of one variable defined on an open interval, then we know that \(g'(t)=0\) at every point \(t\) where it attains a (local) maximum or minimum. To locate the local maxima and minima we therefore solve the equation \(g'(t)=0\) to obtain the candidates. We then need criteria which guarantee that \(g\) in fact attains a maximum or minimum at such a point. One such criterion is that \(g''(t)\lt 0\) for a maximum, and \(g''(t)\gt 0\) for a minimum.
Subsection 4.8.1 Critical Points: The First Derivative Test
Using the above facts we want to derive conditions for maxima and minima of functions of several variables. To do so assume that \(f\) is defined on a domain \(D\) in \(\mathbb R^N\text{.}\) Suppose that \(f\) attains a (local) maximum (or minimum) at some point \(\vect a=(a_1,\dots,a_N)\text{.}\) For every \(i=1,\dots, N\) we can then define a function of one variable, \(t\text{,}\) by setting
\begin{equation*}
g_i(t)=f(a_1,\dots,a_{i-1},a_i+t,a_{i+1},\dots,a_N)\text{.}
\end{equation*}
The graph of this function is a cross-section of the graph of \(f\) in the \(i\)-th coordinate direction as seen in Figure 4.1 for a function of two variables. If \(f\) attains a maximum (or minimum) at \(\vect a\) then \(g_i\) attains a (local) maximum (or minimum) at \(t=0\) for all \(i=1,\dots,N\text{.}\) If \(f\) has partial derivatives in all directions then by the criterion for single variable functions and the definition of the partial derivatives we have
\begin{equation*}
0=g_i'(0)
=\frac{\partial}{\partial x_i}f(\vect a)
\end{equation*}
for all \(i=1,\dots,N\text{.}\) We therefore have the following theorem.
Theorem 4.47. Maxima and minima.
Suppose that \(f\) is a function defined on a domain \(D\) in \(\mathbb R^N\) attaining a maximum or minimum at the interior point \(\vect a\) of \(D\text{.}\) Then
\begin{equation*}
\grad f(\vect a)=\vect 0\text{.}
\end{equation*}
A point \(\vect a\) in the domain of \(f\) is called a critical point of \(f\) if \(\grad f(\vect a)=\vect 0\text{.}\)
Hence, all local maxima and minima are among the critical points of a function, but, as in the case of functions of one variable, not all critical points are maxima or minima. As mentioned at the start of the section, a function \(g\) of one variable attains a local maximum in an open interval if \(g'(t)=0\) and \(g''(t)\lt 0\text{.}\) Next we look in more detail at what happens in several dimensions.
Subsection 4.8.2 The Hessian Matrix: The Second Derivative Test
As with the first derivative test for maxima and minima we can look at cross-sections of the graph of the function \(f\) in the coordinate directions. If \(\grad f(\vect a)=\vect 0\text{,}\) then
\begin{equation*}
\frac{\partial^2}{\partial x_i^2}f(\vect a)\lt 0
\qquad\text{for all } i=1,\dots,N
\end{equation*}
implies that the cross-sections in the coordinate directions have a maximum at \(\vect a\text{.}\) The question is whether this guarantees that \(f\) attains a maximum at \(\vect a\text{.}\) Unfortunately, the answer is NO! The reason is that we only look at cross-sections of the graph in the coordinate directions. The condition only tells us that the function attains a maximum along these cross-sections; in other directions it may be increasing. As an example look at the function \(f(x,y)=4x^2y^2-x^4-y^4\text{,}\) whose graph is displayed in Figure 4.49. Along the cross-sections parallel to the axes through \((0,0)\) the function attains a maximum, but along the cross-sections in the diagonal directions it attains a minimum. You can easily verify this formally.
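Indeed, restricting \(f\) to the axes and to the diagonal \(y=x\) gives
\begin{align*}
f(x,0)\amp =-x^4\leq 0=f(0,0),\\
f(x,x)\amp =4x^4-2x^4=2x^4\geq 0=f(0,0),
\end{align*}
so \((0,0)\) is a maximum along the coordinate axes but a minimum along the diagonal.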
Hence we have to make sure that we have a maximum along every cross-section. This means that all first directional derivatives must be zero, and all second directional derivatives of \(f\) at \(\vect a\) must be negative in order to ensure a maximum. Hence let \(\vect v=(v_1,\dots,v_N)\) be an arbitrary unit vector, and define
\begin{equation*}
g_{\vect v}(t)=f(\vect a+t\vect v)\text{.}
\end{equation*}
Assume that all first and second order partial derivatives of \(f\) are continuous at \(\vect a\text{.}\) According to the previous reasoning \(f\) attains a maximum at \(\vect a\) if \(g_{\vect v}'(0)=0\) and \(g_{\vect v}''(0)\lt 0\) for every unit vector \(\vect v\text{.}\) By Proposition 4.28 we have
\begin{equation*}
g_{\vect v}'(0)=\grad f(\vect a)\cdot\vect v\text{.}
\end{equation*}
Hence \(g_{\vect v}'(0)=0\) for every unit vector \(\vect v\) if and only if \(\grad f(\vect a)=\vect 0\text{.}\) Now let us look at the second order derivative. By Proposition 4.43 we know that
\begin{equation*}
g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\text{,}
\end{equation*}
where \(H_f(\vect a)\) is the Hessian matrix of \(f\) at \(\vect a\text{.}\) Hence, if
\begin{equation*}
g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\lt 0
\end{equation*}
for all unit vectors \(\vect v\text{,}\) then \(f\) attains a maximum at \(\vect a\text{.}\) The last conclusion is not obvious; an additional argument is required. Before we formulate a theorem we introduce some definitions from linear algebra.
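To see how the second directional derivative depends on the direction, consider \(f(x,y)=x^2-y^2\) at \(\vect a=(0,0)\text{.}\) For a unit vector \(\vect v=(\cos\theta,\sin\theta)\) we get
\begin{equation*}
g_{\vect v}''(0)=\vect v^TH_f(0,0)\vect v
=2\cos^2\theta-2\sin^2\theta=2\cos 2\theta\text{,}
\end{equation*}
which is positive for some directions and negative for others, so \((0,0)\) is neither a maximum nor a minimum.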
Definition 4.50. Positive definite.
A symmetric \(N\times N\)-matrix \(A\) is called positive definite if there exists a constant \(c\gt 0\) such that
\begin{equation*}
\vect v^TA\vect v\geq c\|\vect v\|^2
\end{equation*}
for all \(\vect v\in\mathbb R^N\text{,}\) and negative definite if
\begin{equation*}
\vect v^TA\vect v\leq -c\|\vect v\|^2
\end{equation*}
for all \(\vect v\in\mathbb R^N\text{.}\) If
\begin{equation*}
\vect v^TA\vect v\geq 0\qquad(\text{or }\vect v^TA\vect v\leq 0)
\end{equation*}
for all \(\vect v\in\mathbb R^N\) then \(A\) is called positive (or negative) semi-definite. If \(A\) is none of the above then \(A\) is called indefinite.
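For a diagonal matrix these conditions are easily checked. For instance,
\begin{equation*}
\vect v^T
\begin{bmatrix}
1 \amp 0 \\ 0 \amp 2
\end{bmatrix}
\vect v=v_1^2+2v_2^2\geq\|\vect v\|^2
\end{equation*}
shows that this matrix is positive definite with \(c=1\text{,}\) while
\(\begin{bmatrix}
1 \amp 0 \\ 0 \amp 0
\end{bmatrix}\)
is only positive semi-definite because \(\vect v^TA\vect v=v_1^2\) vanishes for \(\vect v=(0,1)\text{,}\) and
\(\begin{bmatrix}
1 \amp 0 \\ 0 \amp -1
\end{bmatrix}\)
is indefinite.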
With this terminology we can formulate our main result about maxima and minima.
Theorem 4.51.
Suppose that all first and second order partial derivatives of \(f\) are continuous at the interior point \(\vect a\) of the domain of \(f\text{.}\) If \(\grad f(\vect a)=\vect 0\) then
(i) \(f\) attains a maximum at \(\vect a\) if \(H_f(\vect a)\) is negative definite;
(ii) \(f\) attains a minimum at \(\vect a\) if \(H_f(\vect a)\) is positive definite;
(iii) \(f\) has a saddle point at \(\vect a\) if \(H_f(\vect a)\) is indefinite.
Proof.
We can give a proof of the above using Taylor's Theorem 4.45. If \(H_f(\vect a)\) is positive definite, then there exists \(c\gt 0\) such that
\begin{equation*}
f(\vect a+\vect h)-f(\vect a)\geq\frac{c}{4}\|\vect h\|^2\gt 0
\end{equation*}
for \(\vect h\neq\vect 0\) with \(\|\vect h\|\) small enough. Hence, if \(\grad f(\vect a)=\vect 0\) and \(H_f(\vect a)\) is positive definite, then \(f\) has a local minimum at \(\vect a\text{.}\) This proves (ii). Now (i) follows from (ii) by applying it to \(-f\text{.}\)
If \(H_f(\vect a)\) is indefinite there exist unit vectors \(\vect v,\vect w\) such that \(g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\lt 0\) and \(g_{\vect w}''(0)=\vect w^TH_f(\vect a)\vect w\gt 0\text{.}\) This means that along some cross-sections through the graph of \(f\) we have a maximum, and along others a minimum, so \(f\) attains neither a maximum nor a minimum at \(\vect a\text{.}\) This proves (iii).
Remark 4.52.
A point at which (iii) in the above theorem is satisfied is usually called a saddle point. Examples are shown in Figure 2.7 and Figure 4.49.
To decide whether a matrix is positive definite we need to use some linear algebra. Recall that every symmetric \(N\times N\) matrix \(A\) is diagonalisable by means of an orthogonal matrix. An orthogonal matrix is an invertible matrix \(P\) such that \(P^{-1}=P^T\text{,}\) that is, \(P^TP=PP^T=I\text{.}\) You learn in linear algebra that for every symmetric matrix \(A\) there exists an orthogonal matrix \(P\) such that
\begin{equation*}
A=PDP^T\text{,}
\end{equation*}
where \(D\) is the diagonal matrix with the eigenvalues \(\lambda_1,\dots,\lambda_N\) of \(A\) on the diagonal. If \(\lambda_{\min}\) and \(\lambda_{\max}\) denote the smallest and largest of these eigenvalues, then
\begin{align*}
\vect x^TA\vect x
\amp =\vect x^TPDP^T\vect x
\geq\lambda_{\min}\|P^T\vect x\|^2
=\lambda_{\min}\|\vect x\|^2,\\
\vect x^TA\vect x
\amp =\vect x^TPDP^T\vect x
\leq\lambda_{\max}\|P^T\vect x\|^2
=\lambda_{\max}\|\vect x\|^2
\end{align*}
for all \(\vect x\in\mathbb R^N\text{.}\) Note that there is equality if \(\vect x\) happens to be an eigenvector corresponding to \(\lambda_{\min}\) and \(\lambda_{\max}\text{,}\) respectively. This leads to the following criterion for positive and negative definite matrices.
Proposition 4.53. Eigenvalues of positive definite matrices.
Suppose that \(A\) is a symmetric \(N\times N\) matrix with smallest and largest eigenvalues \(\lambda_{\min}\) and \(\lambda_{\max}\text{.}\) Then
\(A\) is positive definite if and only if \(\lambda_{\min}\gt 0\text{;}\)
\(A\) is negative definite if and only if \(\lambda_{\max}\lt 0\text{;}\)
\(A\) is indefinite if and only if \(\lambda_{\min}\lt 0\) and \(\lambda_{\max}\gt 0\text{.}\)
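For example, the matrix
\begin{equation*}
A=\begin{bmatrix}
2 \amp 1 \\ 1 \amp 2
\end{bmatrix}
\end{equation*}
has characteristic polynomial \((2-\lambda)^2-1=(\lambda-1)(\lambda-3)\text{,}\) so \(\lambda_{\min}=1\gt 0\) and \(A\) is positive definite.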
From the above we can get a simple criterion in the case \(N=2\text{.}\)
Corollary 4.54.
Suppose that \(A=
\begin{bmatrix}
a \amp c \\ c\amp b
\end{bmatrix}\) is a symmetric matrix. Then
\(A\) is positive definite if and only if \(a\gt 0\) and \(\det A\gt 0\text{;}\)
\(A\) is negative definite if and only if \(a\lt 0\) and \(\det A\gt 0\text{;}\)
\(A\) is indefinite if \(\det A\lt 0\text{.}\)
Proof.
To compute the eigenvalues we determine the characteristic polynomial of \(A\text{,}\) given by
\begin{equation*}
\det(A-\lambda I)=(a-\lambda)(b-\lambda)-c^2
=\lambda^2-(a+b)\lambda+ab-c^2\text{.}
\end{equation*}
If \(\lambda_1,\lambda_2\) are the eigenvalues of \(A\text{,}\) then comparing coefficients gives
\begin{align}
\lambda_1+\lambda_2\amp =a+b,\tag{4.7}\\
\lambda_1\lambda_2\amp =ab-c^2=\det A\text{.}\tag{4.8}
\end{align}
If \(\det A\gt 0\text{,}\) then (4.8) shows that either both eigenvalues are positive or both are negative. Moreover, \(ab\gt c^2\geq 0\text{,}\) so \(a\) and \(b\) have the same sign. By (4.7) the eigenvalues are both positive if \(a\gt 0\) (or \(b\gt 0\)) and both negative if \(a\lt 0\text{.}\) If \(\det A\lt 0\text{,}\) then (4.8) shows that the eigenvalues must have opposite signs. To finish the proof we use Proposition 4.53.
Example 4.55.
Let \(A=
\begin{bmatrix}
9 \amp -1 \\-1\amp 2
\end{bmatrix}
\text{.}\)
Then \(a=9\gt 0\) and \(\det A=18-1=17\gt 0\text{,}\) so \(A\) is positive definite.
Let \(A=
\begin{bmatrix}
-2 \amp 3 \\3\amp 1
\end{bmatrix}
\text{.}\)
Then \(\det A=-2-9=-11\lt 0\text{,}\) so \(A\) is indefinite.
Let \(A=
\begin{bmatrix}
-5 \amp 2 \\2\amp -1
\end{bmatrix}
\text{.}\)
Then \(a=-5\lt 0\) and \(\det A=5-4=1\gt 0\text{,}\) so \(A\) is negative definite.
With the above we can give a convenient criterion for maxima and minima of functions of two variables.
Theorem 4.56.
Suppose that all first and second partial derivatives of \(f\) defined on \(D\subset\mathbb R^2\) are continuous at the interior point \(\vect a\in D\text{,}\) and that \(\grad f(\vect a)=0\text{.}\) Then
\(f\) attains a (local) maximum at \(\vect a\) if \(\dfrac{\partial^2 f}{\partial x_1^2}(\vect a)\lt 0\) and \(\det H_f(\vect a)\gt 0\text{;}\)
\(f\) attains a (local) minimum at \(\vect a\) if \(\dfrac{\partial^2 f}{\partial x_1^2}(\vect a)\gt 0\) and \(\det H_f(\vect a)\gt 0\text{;}\)
\(f\) has a saddle point at \(\vect a\) if \(\det H_f(\vect a)\lt 0\text{;}\)
the test is inconclusive if \(\det H_f(\vect a)=0\text{.}\)
The last assertion can be shown by constructing suitable examples: If we set \(f(x,y)=\alpha x^4+\beta y^4\) then \(\grad f(x,y)=4(\alpha x^3,\beta y^3)\text{,}\) and thus \((0,0)\) is a critical point. The Hessian matrix is
\begin{equation*}
H_f(x,y)=
\begin{bmatrix}
12\alpha x^2 \amp 0 \\
0 \amp 12\beta y^2
\end{bmatrix}\text{,}
\end{equation*}
so \(\det H_f(0,0)=0\) for every choice of \(\alpha\) and \(\beta\text{.}\)
However, it is rather obvious that \((0,0)\) is a minimum if \(\alpha=\beta=1\text{,}\) a maximum if \(\alpha=\beta=-1\text{,}\) and a saddle point if \(\alpha=1\) and \(\beta=-1\text{.}\) Sketch the graphs to see this!
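Formally, for \(\alpha=1\) and \(\beta=-1\) we have
\begin{equation*}
f(x,0)=x^4\geq 0=f(0,0)
\qquad\text{and}\qquad
f(0,y)=-y^4\leq 0=f(0,0)\text{,}
\end{equation*}
so \(f\) attains neither a maximum nor a minimum at \((0,0)\text{.}\)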
Example 4.57.
Determine the critical points of \(f(x,y)=x^3+3xy^2-3x^2-3y^2+4\text{,}\) and decide whether \(f\) attains a maximum, minimum or a saddle point at each of them.
Solution.
To determine the critical points we have to solve the system of equations
\begin{align*}
\frac{\partial f}{\partial x}\amp =3x^2+3y^2-6x=0,\\
\frac{\partial f}{\partial y}\amp =6xy-6y=6y(x-1)=0.
\end{align*}
From the second equation we get that \(y=0\) or \(x=1\text{.}\) If \(y=0\) then from the first equation \(x(3x-6)=0\) and so \(x=0\) or \(x=2\text{.}\) Hence \((0,0)\) and \((2,0)\) are critical points. If \(x=1\) then from the first equation \(3+3y^2-6=0\text{,}\) and so \(y=\pm 1\text{.}\) Hence the critical points of \(f\) are
\begin{equation*}
(0,0),\quad(2,0),\quad(1,1),\quad(1,-1)\text{.}
\end{equation*}
The Hessian matrix of \(f\) is
\begin{equation*}
H_f(x,y)=6
\begin{bmatrix}
x-1 \amp y \\
y \amp x-1
\end{bmatrix}\text{.}
\end{equation*}
We next look at \(H_f(x,y)\) at every critical point:
We have \(H_f(0,0)=6
\begin{bmatrix}
-1 \amp 0 \\ 0\amp -1
\end{bmatrix}
\text{,}\) and so \(\dfrac{\partial^2 f}{\partial x^2}(0,0)=-6\lt 0\) and \(\det H_f(0,0)=36\gt 0\text{,}\) showing that \(f\) attains a maximum at \((0,0)\text{.}\)
Next we have \(H_f(2,0)=6
\begin{bmatrix}
1 \amp 0 \\ 0\amp 1
\end{bmatrix}
\text{,}\) and so \(\dfrac{\partial^2 f}{\partial x^2}(2,0)=6\gt 0\) and \(\det H_f(2,0)=36\gt 0\text{,}\) showing that \(f\) attains a minimum at \((2,0)\text{.}\)
Next we have \(H_f(1,1)=6
\begin{bmatrix}
0 \amp 1 \\ 1\amp 0
\end{bmatrix}
\text{,}\) and so \(\det H_f(1,1)=-36\lt 0\text{,}\) showing that \(f\) has a saddle point at \((1,1)\text{.}\)
Finally we have \(H_f(1,-1)=6
\begin{bmatrix}
0 \amp -1 \\ -1\amp 0
\end{bmatrix}
\text{,}\) and so \(\det H_f(1,-1)=-36\lt 0\text{,}\) showing that \(f\) has another saddle point at \((1,-1)\text{.}\)
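As a final check we can compute the extreme values themselves:
\begin{equation*}
f(0,0)=4
\qquad\text{and}\qquad
f(2,0)=8+0-12-0+4=0\text{,}
\end{equation*}
the local maximum and local minimum values of \(f\text{,}\) respectively.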