Section 4.8 Maxima and Minima

If \(g\) is a differentiable function of one variable defined on an open interval, then we know that \(g'(t)=0\) at every point \(t\) where it attains a (local) maximum or minimum. To find the local maxima and minima we therefore solve the equation \(g'(t)=0\) to find the possible candidates for maxima and minima. We then need criteria which guarantee that \(g\) attains in fact a maximum or minimum at such a point. One possible such criterion is that \(g''(t)\lt 0\) for a maximum, and \(g''(t)\gt 0\) for a minimum.

Subsection 4.8.1 Critical Points: The First Derivative Test

Using the above facts we want to derive conditions for maxima and minima of functions of several variables. To do so assume that \(f\) is defined on a domain \(D\) in \(\mathbb R^N\text{.}\) Suppose that \(f\) attains a (local) maximum (or minimum) at some point \(\vect a=(a_1,\dots,a_N)\text{.}\) For every \(i=1,\dots, N\) we can then define a function of one variable, \(t\text{,}\) by setting
\begin{equation*} g_i(t)=f(a_1,\dots,a_{i-1},a_i+t,a_{i+1},\dots,a_N)\text{.} \end{equation*}
The graph of this function is a cross-section of the graph of \(f\) in the \(i\)-th coordinate direction as seen in Figure 4.1 for a function of two variables. If \(f\) attains a maximum (or minimum) at \(\vect a\) then \(g_i\) attains a (local) maximum (or minimum) at \(t=0\) for all \(i=1,\dots,N\text{.}\) If \(f\) has partial derivatives in all directions then by the criterion for single variable functions and the definition of the partial derivatives we have
\begin{equation*} 0=g_i'(0) =\frac{\partial}{\partial x_i}f(\vect a) \end{equation*}
for all \(i=1,\dots,N\text{,}\) that is, \(\grad f(\vect a)=\vect 0\text{.}\) Every local maximum or minimum of \(f\) is therefore attained at a point where the gradient of \(f\) vanishes. Points with this property are given a name.

Definition 4.48. Critical point.

A point \(\vect a\) in the domain of \(f\) is called a critical point of \(f\) if \(\grad f(\vect a)=\vect 0\text{.}\)
Hence, all local maxima and minima are among the critical points of a function, but as in the case of functions of one variable, not all critical points are maxima or minima. For instance, \(g(t)=t^3\) has a critical point at \(t=0\) which is neither a maximum nor a minimum. As mentioned at the start of the section, a function \(g\) of one variable attains a local maximum in an open interval if \(g'(t)=0\) and \(g''(t)\lt 0\text{.}\) Next we look in more detail at what happens in several dimensions.

Subsection 4.8.2 The Hessian Matrix: The Second Derivative Test

As with the first derivative test for maxima and minima we can look at cross-sections of the graph of the function \(f\) in the coordinate directions. If \(\grad f(\vect a)=\vect 0\text{,}\) then
\begin{equation*} \frac{\partial^2}{\partial x_i^2}f(\vect a)\lt 0 \end{equation*}
implies that the cross-sections in the coordinate directions have a maximum at \(\vect a\text{.}\) The question is whether this guarantees that \(f\) attains a maximum at \(\vect a\text{.}\) Unfortunately, the answer is NO! The reason is that we only look at cross-sections of the graph in the coordinate directions. The condition only tells us that the function attains a maximum along these cross-sections. In other directions it may well be increasing as we move away from \(\vect a\text{.}\) As an example look at the function \(f(x,y)=4x^2y^2-x^4-y^4\text{,}\) whose graph is displayed in Figure 4.49. Along the cross-sections parallel to the axes through \((0,0)\) the function attains a maximum, but along the cross-sections in the diagonal directions it attains a minimum. You can easily verify this formally.
Figure 4.49. Not a maximum or a minimum at \((0,0)\text{.}\)
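As a quick check of the last claim, along the coordinate axes and along the diagonals \(y=\pm x\) we find
\begin{equation*} f(t,0)=f(0,t)=-t^4 \quad\text{and}\quad f(t,\pm t)=4t^4-t^4-t^4=2t^4\text{,} \end{equation*}
so the cross-sections along the axes have a strict maximum at \(t=0\text{,}\) whereas the cross-sections along the diagonals have a strict minimum there.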
Hence we have to make sure that we have a maximum along every cross-section. This means that all first directional derivatives must be zero, and all second directional derivatives of \(f\) at \(\vect a\) must be negative in order to ensure a maximum. Hence let \(\vect v=(v_1,\dots,v_N)\) be an arbitrary unit vector, and define
\begin{equation*} g_{\vect v}(t):=f(\vect a+t\vect v)\text{.} \end{equation*}
Assume that all first and second order partial derivatives of \(f\) are continuous at \(\vect a\text{.}\) According to the previous reasoning \(f\) attains a maximum at \(\vect a\) if \(g_{\vect v}'(0)=0\) and \(g_{\vect v}''(0)\lt 0\) for every unit vector \(\vect v\text{.}\) By Proposition 4.28 we have
\begin{equation*} g_{\vect v}'(0)=\bigl(\grad f(\vect a)\bigr)\cdot\vect v\text{.} \end{equation*}
Hence \(g_{\vect v}'(0)=0\) for every unit vector \(\vect v\) if and only if \(\grad f(\vect a)=\vect 0\text{.}\) Now let us look at the second order derivative. By Proposition 4.43 we know that
\begin{equation*} g_{\vect v}''(0) =\frac{\partial^2f}{\partial\vect v^2}(\vect a) =\vect v^TH_f(\vect a)\vect v\text{,} \end{equation*}
where \(H_f(\vect a)\) is the Hessian matrix of \(f\) at \(\vect a\) defined by (4.5). Hence, if
\begin{equation*} \vect v^TH_f(\vect a)\vect v\lt 0 \end{equation*}
for all unit vectors \(\vect v\text{,}\) then \(f\) attains a maximum at \(\vect a\text{.}\) The last conclusion is not obvious; an additional argument is required. Before we formulate a theorem we introduce some definitions from linear algebra.

Definition 4.50. Positive definite.

A symmetric \(N\times N\)-matrix \(A\) is called positive definite if there exists a constant \(c\gt 0\) such that
\begin{equation*} \vect v^TA\vect v\geq c\|\vect v\|^2 \end{equation*}
for all \(\vect v\in\mathbb R^N\text{.}\) The matrix \(A\) is called negative definite if \(-A\) is positive definite. If
\begin{equation*} \vect v^TA\vect v\geq 0\text{ (or }\leq 0\text{)} \end{equation*}
for all \(\vect v\in\mathbb R^N\) then \(A\) is called positive (or negative) semi-definite. If \(A\) is none of the above then \(A\) is called indefinite.
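As simple illustrations (these matrices are examples of our own, not taken from the discussion above), the matrix \(A= \begin{bmatrix} 1 \amp 0 \\ 0 \amp 0 \end{bmatrix}\) satisfies \(\vect v^TA\vect v=v_1^2\geq 0\) for all \(\vect v\text{,}\) so it is positive semi-definite, but it is not positive definite because \(\vect v^TA\vect v=0\) for \(\vect v=(0,1)\text{.}\) The matrix \(A= \begin{bmatrix} 2 \amp 0 \\ 0 \amp -2 \end{bmatrix}\) is indefinite, since \(\vect v^TA\vect v=2v_1^2-2v_2^2\) is positive for \(\vect v=(1,0)\) and negative for \(\vect v=(0,1)\text{.}\)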
With this terminology we can formulate our main result about maxima and minima.

Theorem 4.51.

Suppose that the second order partial derivatives of \(f\) are continuous near \(\vect a\) and that \(\grad f(\vect a)=\vect 0\text{.}\)
  (i) If \(H_f(\vect a)\) is negative definite, then \(f\) attains a local maximum at \(\vect a\text{.}\)
  (ii) If \(H_f(\vect a)\) is positive definite, then \(f\) attains a local minimum at \(\vect a\text{.}\)
  (iii) If \(H_f(\vect a)\) is indefinite, then \(f\) attains neither a local maximum nor a local minimum at \(\vect a\text{.}\)

Proof.

We can give a proof of the above using Taylor’s Theorem 4.45. If \(H_f(\vect a)\) is positive definite, then there exists \(c\gt 0\) such that
\begin{align*} f(\vect a+\vect h)-f(\vect a)\amp =\grad f(\vect a)\cdot\vect h+\vect h^TH_f(\vect a)\vect h+R_2(\vect h)\\ \amp =\vect h^TH_f(\vect a)\vect h+R_2(\vect h) \geq c\|\vect h\|^2+R_2(\vect h) \end{align*}
for \(\|\vect h\|\) small enough. Dividing by \(\|\vect h\|^2\) we get
\begin{equation*} \frac{1}{\|\vect h\|^2}\Bigl(f(\vect a+\vect h)-f(\vect a)\Bigr) \geq c+\frac{R_2(\vect h)}{\|\vect h\|^2}\text{.} \end{equation*}
By Taylor’s Theorem 4.45 \(R_2(\vect h)/\|\vect h\|^2\to 0\) as \(\|\vect h\|\to 0\text{.}\) Since \(c\gt 0\) we get
\begin{equation*} \frac{1}{\|\vect h\|^2}\Bigl(f(\vect a+\vect h)-f(\vect a)\Bigr) \geq c/2\gt 0 \end{equation*}
for \(\|\vect h\|\) small enough. Hence, if \(\grad f(\vect a)=\vect 0\) and \(H_f(\vect a)\) is positive definite, then \(f\) has a local minimum at \(\vect a\text{.}\) This proves (ii). Now (i) follows from (ii) by applying it to \(-f\text{.}\)
If \(H_f(\vect a)\) is indefinite there exist unit vectors \(\vect v,\vect w\) such that \(g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\lt 0\) and \(g_{\vect w}''(0)=\vect w^TH_f(\vect a)\vect w\gt 0\text{.}\) This means that along some cross-sections through the graph of \(f\) we have a maximum, and along others a minimum. Hence \(f\) attains neither a local maximum nor a local minimum at \(\vect a\text{,}\) which proves (iii).

Remark 4.52.

A point such that (iii) in the above theorem is satisfied is usually called a saddle point. Examples are shown in Figure 2.7 and Figure 4.49.
To decide whether a matrix is positive definite we need to use some linear algebra. Recall that every symmetric \(N\times N\) matrix \(A\) is diagonalisable by means of an orthogonal matrix. An orthogonal matrix is an invertible matrix \(P\) such that \(P^{-1}=P^T\text{,}\) that is, \(P^TP=PP^T=I\text{.}\) You learn in linear algebra that for every symmetric matrix \(A\) there exists an orthogonal matrix \(P\) such that
\begin{equation*} P^TAP=D:= \begin{bmatrix} \lambda_1 \amp 0 \amp \dots \amp 0 \\ 0 \amp \lambda_2 \amp \amp \vdots \\ \vdots \amp \amp \ddots \amp 0 \\ 0 \amp \dots \amp 0 \amp \lambda_N \end{bmatrix} \end{equation*}
is diagonal and \(\lambda_1,\dots,\lambda_N\) are the eigenvalues of \(A\text{.}\) By definition of the Euclidean norm we have
\begin{equation*} \|P\vect x\|^2 =(P\vect x)^TP\vect x =\vect x^TP^TP\vect x =\vect x^T\vect x =\|\vect x\|^2\text{.} \end{equation*}
Similarly we get \(\|P^T\vect x\|^2=\|\vect x\|^2\text{.}\) If \(\vect y=(y_1,\dots,y_N)\text{,}\) then a simple calculation shows that
\begin{equation*} \vect y^TD\vect y=\sum_{k=1}^N\lambda_ky_k^2 \end{equation*}
and hence
\begin{equation*} \vect y^TD\vect y\geq\lambda_{\text{min}}\|\vect y\|^2,\qquad \vect y^TD\vect y\leq\lambda_{\text{max}}\|\vect y\|^2, \end{equation*}
where
\begin{equation*} \lambda_{\text{min}}:=\min\{\lambda_1,\dots,\lambda_N\} \quad\text{and}\quad \lambda_{\text{max}}:=\max\{\lambda_1,\dots,\lambda_N\}\text{.} \end{equation*}
Combining all of the above we get
\begin{align*} \vect x^TA\vect x \amp =\vect x^TPDP^T\vect x \geq\lambda_{\min}\|P^T\vect x\|^2 =\lambda_{\min}\|\vect x\|^2\\ \vect x^TA\vect x \amp =\vect x^TPDP^T\vect x \leq\lambda_{\max}\|P^T\vect x\|^2 =\lambda_{\max}\|\vect x\|^2 \end{align*}
for all \(\vect x\in\mathbb R^N\text{.}\) Note that there is equality if \(\vect x\) happens to be an eigenvector corresponding to \(\lambda_{\min}\) or \(\lambda_{\max}\text{,}\) respectively. This leads to the following criterion for positive and negative definite matrices.

Proposition 4.53.

A symmetric \(N\times N\)-matrix \(A\) is positive definite if and only if all its eigenvalues are positive, and negative definite if and only if all its eigenvalues are negative. If \(A\) has both a positive and a negative eigenvalue, then \(A\) is indefinite.
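The diagonalisation used above is easy to carry out numerically. The following short Python sketch is only an illustration and assumes the NumPy library; the matrix \(A\) below is our own example. The routine numpy.linalg.eigh returns the eigenvalues of a symmetric matrix together with an orthogonal matrix of eigenvectors, which lets us check the identity \(P^TAP=D\) and the eigenvalue criterion for positive definiteness.

import numpy as np

# A symmetric matrix chosen purely for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, P = np.linalg.eigh(A)         # eigenvalues (ascending) and orthogonal P
print(np.round(P.T @ P, 10))       # identity matrix: P is orthogonal
print(np.round(P.T @ A @ P, 10))   # the diagonal matrix D = diag(lam)
print("positive definite:", bool(np.all(lam > 0)))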
From the above we can get a simple criterion in the case \(N=2\text{.}\)

Corollary 4.54.

Let \(A= \begin{bmatrix} a \amp c \\ c \amp b \end{bmatrix}\) be a symmetric \(2\times 2\)-matrix. If \(\det A\gt 0\) and \(a\gt 0\text{,}\) then \(A\) is positive definite. If \(\det A\gt 0\) and \(a\lt 0\text{,}\) then \(A\) is negative definite. If \(\det A\lt 0\text{,}\) then \(A\) is indefinite.

Proof.

To compute the eigenvalues we determine the characteristic polynomial of \(A\) given by
\begin{align*} \det(A-\lambda I) \amp =\det \begin{bmatrix} a-\lambda \amp c \\ c \amp b-\lambda \end{bmatrix}\\ \amp =(a-\lambda)(b-\lambda)-c^2\\ \amp =\lambda^2-(a+b)\lambda+ab-c^2\\ \amp =\lambda^2-(a+b)\lambda+\det A\text{.} \end{align*}
Hence the eigenvalues of \(A\) are
\begin{align*} \lambda_1\amp=\frac{1}{2}\Bigl(a+b+\sqrt{(a+b)^2-4\det A}\Bigr)\\ \lambda_2\amp=\frac{1}{2}\Bigl(a+b-\sqrt{(a+b)^2-4\det A}\Bigr), \end{align*}
and therefore,
\begin{gather} \lambda_1+\lambda_2=a+b\tag{4.7}\\ \lambda_1\lambda_2=\det A\text{.}\tag{4.8} \end{gather}
If \(\det A\gt 0\text{,}\) then (4.8) shows that either both eigenvalues are positive or both are negative. By (4.7) they are both positive if \(a\gt 0\) (or \(b\gt 0\)) and both negative if \(a\lt 0\text{.}\) If \(\det A\lt 0\text{,}\) then (4.8) shows that the eigenvalues must have different sign. To finish the proof we use Proposition 4.53.

Example 4.55.

  1. Let \(A= \begin{bmatrix} 9 \amp -1 \\-1\amp 2 \end{bmatrix} \text{.}\)
    Then \(a=9\gt 0\) and \(\det A=18-1=17\gt 0\text{,}\) so \(A\) is positive definite.
  2. Let \(A= \begin{bmatrix} -2 \amp 3 \\3\amp 1 \end{bmatrix} \text{.}\)
    Then \(\det A=-2-9=-11\lt 0\text{,}\) so \(A\) is indefinite.
  3. Let \(A= \begin{bmatrix} -5 \amp 2 \\2\amp -1 \end{bmatrix} \text{.}\)
    Then \(a=-5\lt 0\) and \(\det A=5-4=1\gt 0\text{,}\) so \(A\) is negative definite.
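The classification in Example 4.55 can be double-checked numerically. The sketch below again assumes NumPy; the helper classify is a name of our own choosing and simply implements the determinant criterion of Corollary 4.54, while numpy.linalg.eigvalsh provides the eigenvalues for comparison with Proposition 4.53.

import numpy as np

def classify(A):
    # Determinant criterion for a symmetric 2x2 matrix (Corollary 4.54).
    a, det = A[0, 0], np.linalg.det(A)
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "criterion gives no information (det A = 0)"

for A in (np.array([[9.0, -1.0], [-1.0, 2.0]]),
          np.array([[-2.0, 3.0], [3.0, 1.0]]),
          np.array([[-5.0, 2.0], [2.0, -1.0]])):
    print(classify(A), "eigenvalues:", np.round(np.linalg.eigvalsh(A), 3))

In each case the signs of the two eigenvalues agree with the classification obtained from the determinant.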
With the above we can give a convenient criterion for maxima and minima of functions of two variables.

Theorem 4.56.

Suppose that \(f\) is a function of two variables whose second order partial derivatives are continuous near a critical point \(\vect a\text{.}\)
  (i) If \(\det H_f(\vect a)\gt 0\) and \(\frac{\partial^2 f}{\partial x^2}(\vect a)\gt 0\text{,}\) then \(f\) attains a local minimum at \(\vect a\text{.}\)
  (ii) If \(\det H_f(\vect a)\gt 0\) and \(\frac{\partial^2 f}{\partial x^2}(\vect a)\lt 0\text{,}\) then \(f\) attains a local maximum at \(\vect a\text{.}\)
  (iii) If \(\det H_f(\vect a)\lt 0\text{,}\) then \(f\) has a saddle point at \(\vect a\text{.}\)
  (iv) If \(\det H_f(\vect a)=0\text{,}\) then no conclusion can be drawn: \(f\) may have a maximum, a minimum or a saddle point at \(\vect a\text{.}\)

Proof.

The first three assertions are a direct consequence of Theorem 4.51 and Corollary 4.54.
The last assertion can be shown by constructing suitable examples: If we set \(f(x,y)=\alpha x^4+\beta y^4\) then \(\grad f(x,y)=4(\alpha x^3,\beta y^3)\text{,}\) and thus \((0,0)\) is a critical point. The Hessian matrix is
\begin{equation*} H_f(x,y)= \begin{bmatrix} 12\alpha x^2 \amp 0 \\ 0 \amp 12\beta y^2 \end{bmatrix}\text{,} \end{equation*}
and so
\begin{equation*} \det H_f(0,0)=\det \begin{bmatrix} 0 \amp 0 \\ 0 \amp 0 \end{bmatrix} =0\text{.} \end{equation*}
However, it is rather obvious that \((0,0)\) is a minimum if \(\alpha=\beta=1\text{,}\) a maximum if \(\alpha=\beta=-1\text{,}\) and a saddle point if \(\alpha=1\) and \(\beta=-1\text{.}\) Sketch the graphs to see this!
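Indeed, no sketch is strictly needed: if \(\alpha=\beta=1\) then \(f(x,y)=x^4+y^4\geq 0=f(0,0)\) for all \((x,y)\text{,}\) so \((0,0)\) is a minimum, and if \(\alpha=\beta=-1\) then \(f(x,y)=-x^4-y^4\leq 0=f(0,0)\text{,}\) so it is a maximum. If \(\alpha=1\) and \(\beta=-1\text{,}\) then \(f(x,0)=x^4\gt 0\) and \(f(0,y)=-y^4\lt 0\) for \(x,y\neq 0\text{,}\) so \(f\) takes values above and below \(f(0,0)=0\) arbitrarily close to \((0,0)\text{,}\) and there is neither a maximum nor a minimum.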

Example 4.57.

Determine the critical points of \(f(x,y)=x^3+3xy^2-3x^2-3y^2+4\text{,}\) and decide whether \(f\) attains a maximum, minimum or a saddle point at each of them.
Solution.
To determine the critical points we have to solve the system of equations
\begin{align*} \frac{\partial f}{\partial x}(x,y) \amp =3x^2+3y^2-6x=0\\ \frac{\partial f}{\partial y}(x,y) \amp =6xy-6y=0\text{.} \end{align*}
From the second equation we get that \(y=0\) or \(x=1\text{.}\) If \(y=0\) then from the first equation \(x(3x-6)=0\) and so \(x=0\) or \(x=2\text{.}\) Hence \((0,0)\) and \((2,0)\) are critical points. If \(x=1\) then from the first equation \(3+3y^2-6=0\text{,}\) and so \(y=\pm 1\text{.}\) Hence the critical points of \(f\) are
\begin{equation*} (0,0),\quad(2,0),\quad(1,1)\quad\text{and}\quad(1,-1)\text{.} \end{equation*}
To decide the behaviour of \(f\) at these points we determine its Hessian matrix:
\begin{align*} \frac{\partial^2f}{\partial x^2}(x,y) \amp =\frac{\partial}{\partial x}(3x^2+3y^2-6x)=6x-6\\ \frac{\partial^2f}{\partial y^2}(x,y) \amp =\frac{\partial}{\partial y}(6xy-6y)=6x-6\\ \frac{\partial^2f}{\partial y\partial x}(x,y) \amp =\frac{\partial^2f}{\partial x\partial y}(x,y) =\frac{\partial}{\partial x}(6xy-6y) =6y\text{.} \end{align*}
Hence the Hessian matrix is
\begin{equation*} H_f(x,y)=6 \begin{bmatrix} x-1 \amp y \\ y \amp x-1 \end{bmatrix}\text{.} \end{equation*}
We next look at \(H_f(x,y)\) at every critical point:
  • We have \(H_f(0,0)=6 \begin{bmatrix} -1 \amp 0 \\ 0\amp -1 \end{bmatrix} \text{,}\) and so the top left entry \(-6\lt 0\) and \(\det H_f(0,0)=36\gt 0\text{,}\) showing that \(f\) attains a maximum at \((0,0)\text{.}\)
  • Next we have \(H_f(2,0)=6 \begin{bmatrix} 1 \amp 0 \\ 0\amp 1 \end{bmatrix} \text{,}\) and so the top left entry \(6\gt 0\) and \(\det H_f(2,0)=36\gt 0\text{,}\) showing that \(f\) attains a minimum at \((2,0)\text{.}\)
  • Next we have \(H_f(1,1)=6 \begin{bmatrix} 0 \amp 1 \\ 1\amp 0 \end{bmatrix} \text{,}\) and so \(\det H_f(1,1)=-36\lt 0\text{,}\) showing that \(f\) has a saddle point at \((1,1)\text{.}\)
  • Finally we have \(H_f(1,-1)=6 \begin{bmatrix} 0 \amp -1 \\ -1\amp 0 \end{bmatrix} \text{,}\) and so \(\det H_f(1,-1)=-36\lt 0\text{,}\) showing that \(f\) has another saddle point at \((1,-1)\text{.}\)
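The computation in Example 4.57 can also be reproduced symbolically. The following Python sketch is illustrative only and assumes the SymPy library; it solves the system \(\grad f=\vect 0\text{,}\) evaluates the Hessian matrix at each critical point and applies the determinant criterion used above.

import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 + 3*x*y**2 - 3*x**2 - 3*y**2 + 4

grad = [sp.diff(f, v) for v in (x, y)]   # the two partial derivatives
H = sp.hessian(f, (x, y))                # the Hessian matrix of f

for pt in sp.solve(grad, [x, y], dict=True):   # the critical points
    Hp = H.subs(pt)
    det, fxx = Hp.det(), Hp[0, 0]
    if det > 0:
        kind = "minimum" if fxx > 0 else "maximum"
    elif det < 0:
        kind = "saddle point"
    else:
        kind = "no conclusion"
    print(pt, kind)

This reproduces the four critical points found above together with their classification.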