
Section 4.8 Maxima and Minima

If \(g\) is a differentiable function of one variable defined on an open interval, then we know that \(g'(t)=0\) at every point \(t\) where it attains a (local) maximum or minimum. To find the local maxima and minima we therefore solve the equation \(g'(t)=0\) to find the possible candidates for maxima and minima. We then need criteria which guarantee that \(g\) in fact attains a maximum or minimum at such a point. One possible such criterion, for twice differentiable \(g\), is that \(g''(t)<0\) for a maximum, and \(g''(t)>0\) for a minimum.
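The one-variable test is easy to check numerically with finite differences. The following sketch uses a sample function \(g(t) = -t^2 + 4t\) of our own choosing (not from the text), whose only critical point is \(t = 2\).

```python
# Numerical check of the one-variable test on a sample function
# (g and the point t = 2 are our own choices, not from the text).
def g(t):
    return -t**2 + 4*t

def dg(t, h=1e-6):
    # central difference approximation of g'(t)
    return (g(t + h) - g(t - h)) / (2*h)

def d2g(t, h=1e-4):
    # central difference approximation of g''(t)
    return (g(t + h) - 2*g(t) + g(t - h)) / h**2

t0 = 2.0                      # g'(t) = -2t + 4 vanishes here
print(abs(dg(t0)) < 1e-6)     # g'(2) = 0: critical point
print(d2g(t0) < 0)            # g''(2) = -2 < 0: local maximum
```

Here \(g''(2) = -2 < 0\), so \(g\) attains a local maximum at \(t = 2\), matching the criterion above.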

Subsection 4.8.1 Critical Points: The First Derivative Test

Using the above facts we want to derive conditions for maxima and minima of functions of several variables. To do so assume that \(f\) is defined on a domain \(D\) in \(\mathbb{R}^N\). Suppose that \(f\) attains a (local) maximum (or minimum) at some point \(a=(a_1,\dots,a_N)\). For every \(i=1,\dots,N\) we can then define a function of one variable, \(t\), by setting
\[
g_i(t) = f(a_1,\dots,a_{i-1},\,a_i+t,\,a_{i+1},\dots,a_N).
\]
The graph of this function is a cross-section of the graph of \(f\) in the \(i\)-th coordinate direction as seen in Figure 4.1 for a function of two variables. If \(f\) attains a maximum (or minimum) at \(a\), then \(g_i\) attains a (local) maximum (or minimum) at \(t=0\) for all \(i=1,\dots,N\). If \(f\) has partial derivatives in all directions, then by the criterion for single variable functions and the definition of the partial derivatives we have
\[
0 = g_i'(0) = \frac{\partial f}{\partial x_i}(a)
\]
for all \(i=1,\dots,N\). In other words, \(\operatorname{grad} f(a) = 0\) at every interior local maximum or minimum. This motivates the following definition.
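As a quick numerical illustration that all partial derivatives vanish at an interior maximum, here is a sketch with a sample function of our own choosing (not from the text) that attains its maximum at \(a = (1,2)\); central differences approximate the partials.

```python
# At an interior maximum all partial derivatives vanish.
# (f and the maximiser (1, 2) are our own sample, not from the text.)
def f(x, y):
    return -(x - 1)**2 - (y - 2)**2   # maximum at a = (1, 2)

def partial(f, a, i, h=1e-6):
    # central difference for the i-th partial derivative at the point a
    p, m = list(a), list(a)
    p[i] += h
    m[i] -= h
    return (f(*p) - f(*m)) / (2*h)

a = (1.0, 2.0)
print(all(abs(partial(f, a, i)) < 1e-8 for i in range(2)))
```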

Definition 4.48. Critical point.

A point \(a\) in the domain of \(f\) is called a critical point of \(f\) if \(\operatorname{grad} f(a) = 0\).
Hence, all local maxima and minima are among the critical points of a function, but as in the case of functions of one variable, not all critical points are maxima or minima. As mentioned at the start of the section, a function \(g\) of one variable attains a local maximum in an open interval if \(g'(t)=0\) and \(g''(t)<0\). Next we look in more detail at what happens in several dimensions.

Subsection 4.8.2 The Hessian Matrix: The Second Derivative Test

As with the first derivative test for maxima and minima we can look at cross-sections of the graph of the function \(f\) in the coordinate directions. If \(\operatorname{grad} f(a) = 0\), then
\[
\frac{\partial^2 f}{\partial x_i^2}(a) < 0
\]
implies that the cross-sections in the coordinate directions have a maximum at \(a\). The question is whether this guarantees that \(f\) has a maximum at \(a\). Unfortunately, the answer is NO! The reason is that we only look at cross-sections of the graph in the coordinate directions. The condition only tells us that the function attains a maximum along these cross-sections. In other directions it may still increase. As an example look at the function \(f(x,y) = 4x^2y^2 - x^4 - y^4\), whose graph is displayed in Figure 4.49. Along the cross-sections parallel to the axes through \((0,0)\) the function attains a maximum, but along the cross-sections in the diagonal directions it attains a minimum. You can easily verify this formally.
Figure 4.49. Not a maximum or a minimum at (0,0).
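The claim about the cross-sections can also be verified numerically. In this sketch the sample points are our own choice; along the axes \(f(t,0) = -t^4 < 0\), while along the diagonal \(f(t,t) = 2t^4 > 0\).

```python
# Cross-sections of f(x,y) = 4x^2 y^2 - x^4 - y^4 through (0,0).
def f(x, y):
    return 4*x**2*y**2 - x**4 - y**4

# Along the coordinate axes f is negative away from the origin: a maximum at (0,0).
axis_vals = [f(t, 0) for t in (-0.5, -0.1, 0.1, 0.5)]
print(all(v < 0 for v in axis_vals) and f(0, 0) == 0)

# Along the diagonal x = y it is positive away from the origin: a minimum at (0,0).
diag_vals = [f(t, t) for t in (-0.5, -0.1, 0.1, 0.5)]
print(all(v > 0 for v in diag_vals))
```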
Hence we have to make sure that we have a maximum along every cross-section. This means that all first directional derivatives must be zero, and all second directional derivatives of \(f\) at \(a\) must be negative in order to ensure a maximum. Hence let \(v=(v_1,\dots,v_N)\) be an arbitrary unit vector, and define
\[
g_v(t) := f(a + tv).
\]
Assume that all first and second order partial derivatives of \(f\) are continuous at \(a\). According to the previous reasoning, \(f\) attains a maximum at \(a\) if \(g_v'(0)=0\) and \(g_v''(0)<0\) for every unit vector \(v\). By Proposition 4.28 we have
\[
g_v'(0) = \operatorname{grad} f(a) \cdot v.
\]
Hence \(g_v'(0)=0\) for every unit vector \(v\) if and only if \(\operatorname{grad} f(a) = 0\). Now let us look at the second order derivative. By Proposition 4.43 we know that
\[
g_v''(0) = \frac{\partial^2 f}{\partial v^2}(a) = v^T H_f(a) v,
\]
where \(H_f(a)\) is the Hessian matrix of \(f\) at \(a\) defined by (4.5). Hence, if
\[
v^T H_f(a) v < 0
\]
for all unit vectors \(v\), then \(f\) attains a maximum at \(a\). The last conclusion is not obvious; an additional argument is required. Before we formulate a theorem we introduce some definitions from linear algebra.

Definition 4.50. Positive definite.

A symmetric \(N\times N\) matrix \(A\) is called positive definite if there exists a constant \(c>0\) such that
\[
v^T A v \ge c\|v\|^2
\]
for all \(v\in\mathbb{R}^N\). The matrix \(A\) is called negative definite if \(-A\) is positive definite. If
\[
v^T A v \ge 0 \quad (\text{or } \le 0)
\]
for all \(v\in\mathbb{R}^N\), then \(A\) is called positive (or negative) semi-definite. If \(A\) is none of the above, then \(A\) is called indefinite.
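Definiteness can be explored numerically by sampling \(v^T A v\) over unit vectors. In this sketch the symmetric matrix is our own sample; sampling can only suggest definiteness, while the eigenvalue criterion developed below decides it.

```python
import math

# Sampling v^T A v over unit vectors for a sample symmetric 2x2 matrix.
def quad_form(A, v):
    (a, c), (_, b) = A
    return a*v[0]**2 + 2*c*v[0]*v[1] + b*v[1]**2  # v^T A v for symmetric A

A = [[2.0, 1.0], [1.0, 3.0]]   # a sample symmetric matrix (our choice)
values = []
for k in range(360):
    theta = 2*math.pi*k/360
    v = (math.cos(theta), math.sin(theta))   # unit vector
    values.append(quad_form(A, v))

c = min(values)
print(c > 0)   # v^T A v >= c > 0 on all sampled unit vectors
```

The sampled minimum is close to the smallest eigenvalue of \(A\), which is \((5-\sqrt5)/2 \approx 1.38\), consistent with \(A\) being positive definite.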
With this terminology we can formulate our main result about maxima and minima.

Theorem 4.51. Second derivative test.

Suppose that all second order partial derivatives of \(f\) are continuous at \(a\), and that \(\operatorname{grad} f(a) = 0\).
(i) If \(H_f(a)\) is negative definite, then \(f\) attains a local maximum at \(a\).
(ii) If \(H_f(a)\) is positive definite, then \(f\) attains a local minimum at \(a\).
(iii) If \(H_f(a)\) is indefinite, then \(f\) attains neither a local maximum nor a local minimum at \(a\).

Proof.

We can give a proof of the above using Taylor's Theorem 4.45. If \(H_f(a)\) is positive definite, then there exists \(c>0\) such that \(\tfrac12 h^T H_f(a) h \ge c\|h\|^2\) for all \(h\in\mathbb{R}^N\). Since \(\operatorname{grad} f(a) = 0\),
\[
f(a+h) - f(a) = \operatorname{grad} f(a)\cdot h + \tfrac12 h^T H_f(a) h + R_2(h)
= \tfrac12 h^T H_f(a) h + R_2(h) \ge c\|h\|^2 + R_2(h)
\]
for \(h\) small enough. Dividing by \(\|h\|^2\) we get
\[
\frac{1}{\|h\|^2}\bigl(f(a+h) - f(a)\bigr) \ge c + \frac{R_2(h)}{\|h\|^2}.
\]
By Taylor's Theorem 4.45, \(R_2(h)/\|h\|^2 \to 0\) as \(h\to 0\). Since \(c>0\) we get
\[
\frac{1}{\|h\|^2}\bigl(f(a+h) - f(a)\bigr) \ge \frac{c}{2} > 0
\]
for \(h\) small enough. Hence, if \(\operatorname{grad} f(a) = 0\) and \(H_f(a)\) is positive definite, then \(f\) has a local minimum at \(a\). This proves (ii). Now (i) follows from (ii) by applying it to \(-f\).
If \(H_f(a)\) is indefinite, there exist unit vectors \(v, w\) such that \(g_v''(0) = v^T H_f(a) v < 0\) and \(g_w''(0) = w^T H_f(a) w > 0\). This means that along some cross-sections through the graph of \(f\) we have a local maximum, and along others a local minimum, so \(f\) attains neither a maximum nor a minimum at \(a\). This proves (iii).

Remark 4.52.

A point at which condition (iii) of the above theorem is satisfied is usually called a saddle point. Examples are shown in Figure 2.7 and Figure 4.49.
To decide whether a matrix is positive definite we need to use some linear algebra. Recall that every symmetric \(N\times N\) matrix \(A\) is diagonalisable by means of an orthogonal matrix. An orthogonal matrix is an invertible matrix \(P\) such that \(P^{-1} = P^T\), that is, \(P^T P = P P^T = I\). You learn in linear algebra that for every symmetric matrix \(A\) there exists an orthogonal matrix \(P\) such that
\[
P^T A P = D :=
\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_N
\end{bmatrix}
\]
is diagonal and \(\lambda_1,\dots,\lambda_N\) are the eigenvalues of \(A\). By definition of the Euclidean norm we have
\[
\|Px\|^2 = (Px)^T P x = x^T P^T P x = x^T x = \|x\|^2.
\]
Similarly we get \(\|P^T x\|^2 = \|x\|^2\). If \(y = (y_1,\dots,y_N)\), then a simple calculation shows that
\[
y^T D y = \sum_{k=1}^N \lambda_k y_k^2,
\]
and hence
\[
y^T D y \ge \lambda_{\min}\|y\|^2, \qquad y^T D y \le \lambda_{\max}\|y\|^2,
\]
where
\[
\lambda_{\min} := \min\{\lambda_1,\dots,\lambda_N\}
\quad\text{and}\quad
\lambda_{\max} := \max\{\lambda_1,\dots,\lambda_N\}.
\]
Combining all the above we get
\[
x^T A x = x^T P D P^T x \ge \lambda_{\min}\|P^T x\|^2 = \lambda_{\min}\|x\|^2,
\qquad
x^T A x = x^T P D P^T x \le \lambda_{\max}\|P^T x\|^2 = \lambda_{\max}\|x\|^2
\]
for all \(x\in\mathbb{R}^N\). Note that there is equality if \(x\) happens to be an eigenvector corresponding to \(\lambda_{\min}\) or \(\lambda_{\max}\), respectively. This leads to the following criterion for positive and negative definite matrices.

Proposition 4.53.

A symmetric matrix \(A\) is positive definite if and only if all of its eigenvalues are positive, and negative definite if and only if all of its eigenvalues are negative. It is indefinite if and only if it has eigenvalues of both signs.
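The two-sided bound \(\lambda_{\min}\|x\|^2 \le x^T A x \le \lambda_{\max}\|x\|^2\) can be verified numerically. In this sketch the symmetric \(2\times 2\) matrix and the test vectors are our own sample; its eigenvalues are computed from the trace and determinant.

```python
import math

# Check lambda_min ||x||^2 <= x^T A x <= lambda_max ||x||^2
# for a sample symmetric 2x2 matrix A = [[a, c], [c, b]] (our own choice).
a, b, c = 4.0, 1.0, 1.0
disc = math.sqrt((a + b)**2 - 4*(a*b - c*c))
lam_max = (a + b + disc) / 2      # largest eigenvalue
lam_min = (a + b - disc) / 2      # smallest eigenvalue

for x in [(1.0, 0.0), (0.3, -0.7), (2.0, 5.0), (-1.0, 1.0)]:
    qa = a*x[0]**2 + 2*c*x[0]*x[1] + b*x[1]**2   # x^T A x
    n2 = x[0]**2 + x[1]**2                       # ||x||^2
    assert lam_min*n2 - 1e-9 <= qa <= lam_max*n2 + 1e-9
print("bounds verified")
```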
From the above we can get a simple criterion in the case \(N=2\).

Corollary 4.54.

Let
\[
A = \begin{bmatrix} a & c \\ c & b \end{bmatrix}
\]
be a symmetric \(2\times 2\) matrix.
(i) If \(\det A > 0\) and \(a > 0\), then \(A\) is positive definite.
(ii) If \(\det A > 0\) and \(a < 0\), then \(A\) is negative definite.
(iii) If \(\det A < 0\), then \(A\) is indefinite.

Proof.

To compute the eigenvalues we determine the characteristic polynomial of \(A\), given by
\[
\det(A - \lambda I)
= \det\begin{bmatrix} a-\lambda & c \\ c & b-\lambda \end{bmatrix}
= (a-\lambda)(b-\lambda) - c^2
= \lambda^2 - (a+b)\lambda + ab - c^2
= \lambda^2 - (a+b)\lambda + \det A.
\]
Hence the eigenvalues of \(A\) are
\[
\lambda_1 = \frac12\Bigl(a + b + \sqrt{(a+b)^2 - 4\det A}\Bigr),
\qquad
\lambda_2 = \frac12\Bigl(a + b - \sqrt{(a+b)^2 - 4\det A}\Bigr),
\]
and therefore,
\[
\lambda_1 + \lambda_2 = a + b, \tag{4.7}
\]
\[
\lambda_1 \lambda_2 = \det A. \tag{4.8}
\]
If \(\det A > 0\), then (4.8) shows that either both eigenvalues are positive or both are negative. By (4.7) they are both positive if \(a>0\) (or \(b>0\)), and both negative if \(a<0\). If \(\det A < 0\), then (4.8) shows that the eigenvalues must have opposite signs. To finish the proof we use Proposition 4.53.
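The relations (4.7) and (4.8) are easy to confirm numerically. In this sketch the matrix entries \(a=9\), \(b=2\), \(c=1\) are a sample of our own choosing.

```python
import math

# Eigenvalues of a symmetric 2x2 matrix [[a, c], [c, b]]: the roots of
# lambda^2 - (a+b) lambda + det A (sample entries are our own choice).
def eigenvalues(a, b, c):
    det = a*b - c*c
    disc = math.sqrt((a + b)**2 - 4*det)   # equals sqrt((a-b)^2 + 4c^2) >= 0
    return (a + b + disc)/2, (a + b - disc)/2

l1, l2 = eigenvalues(9, 2, 1)
print(round(l1 + l2, 10))   # sum of eigenvalues = a + b = 11, as in (4.7)
print(round(l1 * l2, 10))   # product of eigenvalues = det A = 17, as in (4.8)
```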

Example 4.55.

  1. Let \(A = \begin{bmatrix} 9 & 1 \\ 1 & 2 \end{bmatrix}\).
    Then \(a = 9 > 0\) and \(\det A = 18 - 1 = 17 > 0\), so \(A\) is positive definite.
  2. Let \(A = \begin{bmatrix} -2 & 3 \\ 3 & 1 \end{bmatrix}\).
    Then \(\det A = -2 - 9 = -11 < 0\), so \(A\) is indefinite.
  3. Let \(A = \begin{bmatrix} -5 & 2 \\ 2 & -1 \end{bmatrix}\).
    Then \(a = -5 < 0\) and \(\det A = 5 - 4 = 1 > 0\), so \(A\) is negative definite.
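The determinant criterion of Corollary 4.54 is easy to turn into a small classifier; here is a sketch applied to the three matrices of Example 4.55 (for \(\det A = 0\) the matrix is semi-definite and the strict tests do not apply).

```python
# Classifying symmetric 2x2 matrices [[a, c], [c, b]] by the determinant criterion.
def classify(A):
    (a, c), (_, b) = A
    det = a*b - c*c
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "semi-definite"   # det = 0: one eigenvalue is zero

print(classify([[9, 1], [1, 2]]))    # positive definite
print(classify([[-2, 3], [3, 1]]))   # indefinite
print(classify([[-5, 2], [2, -1]]))  # negative definite
```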
With the above we can give a convenient criterion for maxima and minima of functions of two variables.

Theorem 4.56. Second derivative test in two variables.

Suppose that all second order partial derivatives of \(f\) are continuous at a critical point \((x_0,y_0)\) of \(f\).
(i) If \(\det H_f(x_0,y_0) > 0\) and \(\frac{\partial^2 f}{\partial x^2}(x_0,y_0) < 0\), then \(f\) attains a local maximum at \((x_0,y_0)\).
(ii) If \(\det H_f(x_0,y_0) > 0\) and \(\frac{\partial^2 f}{\partial x^2}(x_0,y_0) > 0\), then \(f\) attains a local minimum at \((x_0,y_0)\).
(iii) If \(\det H_f(x_0,y_0) < 0\), then \(f\) has a saddle point at \((x_0,y_0)\).
(iv) If \(\det H_f(x_0,y_0) = 0\), then no conclusion can be drawn: \(f\) may attain a maximum or a minimum, or have a saddle point at \((x_0,y_0)\).

Proof.

The first three assertions are a direct consequence of Theorem 4.51 and Corollary 4.54.
The last assertion can be shown by constructing suitable examples: if we set \(f(x,y) = \alpha x^4 + \beta y^4\), then \(\operatorname{grad} f(x,y) = 4(\alpha x^3, \beta y^3)\), and thus \((0,0)\) is a critical point. The Hessian matrix is
\[
H_f(x,y) = \begin{bmatrix} 12\alpha x^2 & 0 \\ 0 & 12\beta y^2 \end{bmatrix},
\]
and so
\[
\det H_f(0,0) = \det\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0.
\]
However, it is rather obvious that \((0,0)\) is a minimum if \(\alpha = \beta = 1\), a maximum if \(\alpha = \beta = -1\), and a saddle point if \(\alpha = 1\) and \(\beta = -1\). Sketch the graphs to see this!
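The three degenerate cases can be illustrated numerically by sampling \(f = \alpha x^4 + \beta y^4\) near the origin; the sample points in this sketch are our own choice, and sampling of course only suggests the behaviour.

```python
# Behaviour of f(x,y) = alpha*x^4 + beta*y^4 near the critical point (0,0),
# where the determinant test gives det H_f(0,0) = 0 and is inconclusive.
def behaviour(alpha, beta):
    pts = [(0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.1)]
    vals = [alpha*x**4 + beta*y**4 for (x, y) in pts]
    if all(v > 0 for v in vals):
        return "minimum"
    if all(v < 0 for v in vals):
        return "maximum"
    return "saddle-like"     # sampled values of both signs

print(behaviour(1, 1))     # minimum
print(behaviour(-1, -1))   # maximum
print(behaviour(1, -1))    # saddle-like
```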

Example 4.57.

Determine the critical points of \(f(x,y) = x^3 + 3xy^2 - 3x^2 - 3y^2 + 4\), and decide whether \(f\) attains a maximum, a minimum or a saddle point at each of them.
Solution.
To determine the critical points we have to solve the system of equations
\[
\frac{\partial f}{\partial x}(x,y) = 3x^2 + 3y^2 - 6x = 0,
\qquad
\frac{\partial f}{\partial y}(x,y) = 6xy - 6y = 0.
\]
From the second equation we get that \(y=0\) or \(x=1\). If \(y=0\), then from the first equation \(x(3x-6)=0\) and so \(x=0\) or \(x=2\). Hence \((0,0)\) and \((2,0)\) are critical points. If \(x=1\), then from the first equation \(3 + 3y^2 - 6 = 0\), and so \(y = \pm 1\). Hence the critical points of \(f\) are
\[
(0,0), \quad (2,0), \quad (1,1) \quad\text{and}\quad (1,-1).
\]
To decide the behaviour of \(f\) at these points we determine its Hessian matrix:
\[
\frac{\partial^2 f}{\partial x^2}(x,y) = \frac{\partial}{\partial x}\bigl(3x^2 + 3y^2 - 6x\bigr) = 6x - 6,
\qquad
\frac{\partial^2 f}{\partial y^2}(x,y) = \frac{\partial}{\partial y}\bigl(6xy - 6y\bigr) = 6x - 6,
\]
\[
\frac{\partial^2 f}{\partial y\,\partial x}(x,y) = \frac{\partial^2 f}{\partial x\,\partial y}(x,y) = \frac{\partial}{\partial x}\bigl(6xy - 6y\bigr) = 6y.
\]
Hence the Hessian matrix is
\[
H_f(x,y) = 6\begin{bmatrix} x-1 & y \\ y & x-1 \end{bmatrix}.
\]
We next look at \(H_f(x,y)\) at every critical point:
  • We have \(H_f(0,0) = 6\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}\), and so \(\frac{\partial^2 f}{\partial x^2}(0,0) = -6 < 0\) and \(\det H_f(0,0) = 36 > 0\), showing that \(f\) attains a maximum at \((0,0)\).
  • Next we have \(H_f(2,0) = 6\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\), and so \(\frac{\partial^2 f}{\partial x^2}(2,0) = 6 > 0\) and \(\det H_f(2,0) = 36 > 0\), showing that \(f\) attains a minimum at \((2,0)\).
  • Next we have \(H_f(1,1) = 6\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\), and so \(\det H_f(1,1) = -36 < 0\), showing that \(f\) has a saddle point at \((1,1)\).
  • Finally we have \(H_f(1,-1) = 6\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}\), and so \(\det H_f(1,-1) = -36 < 0\), showing that \(f\) has another saddle point at \((1,-1)\).
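The classification of the four critical points can be replicated numerically; the following sketch hard-codes the Hessian entries \(f_{xx} = f_{yy} = 6x-6\) and \(f_{xy} = 6y\) computed above and applies the determinant test.

```python
# Classification of the critical points of f(x,y) = x^3 + 3xy^2 - 3x^2 - 3y^2 + 4,
# using the Hessian entries f_xx = f_yy = 6x - 6 and f_xy = 6y derived above.
def classify_point(x, y):
    fxx, fyy, fxy = 6*x - 6, 6*x - 6, 6*y
    det = fxx*fyy - fxy**2          # det H_f(x, y)
    if det < 0:
        return "saddle"
    if det > 0:
        return "maximum" if fxx < 0 else "minimum"
    return "inconclusive"           # det = 0: the test gives no information

for p in [(0, 0), (2, 0), (1, 1), (1, -1)]:
    print(p, classify_point(*p))
```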