If \(g\) is a differentiable function of one variable defined on an open interval, then we know that \(g'(t)=0\) at every point \(t\) where it attains a (local) maximum or minimum. To locate the local maxima and minima we therefore solve the equation \(g'(t)=0\) to obtain the candidates. We then need criteria which guarantee that \(g\) in fact attains a maximum or minimum at such a point. One such criterion is that \(g''(t)\lt 0\) for a maximum, and \(g''(t)\gt 0\) for a minimum.
Subsection 4.8.1 Critical Points: The First Derivative Test
Using the above facts we want to derive conditions for maxima and minima of functions of several variables. To do so assume that \(f\) is defined on a domain \(D\) in \(\mathbb R^N\text{.}\) Suppose that \(f\) attains a (local) maximum (or minimum) at some point \(\vect a=(a_1,\dots,a_N)\text{.}\) For every \(i=1,\dots, N\) we can then define a function of one variable, \(t\text{,}\) by setting
\begin{equation*}
g_i(t)=f(a_1,\dots,a_{i-1},a_i+t,a_{i+1},\dots,a_N)\text{.}
\end{equation*}
The graph of this function is a cross-section of the graph of \(f\) in the \(i\)-th coordinate direction as seen in Figure 4.1 for a function of two variables. If \(f\) attains a maximum (or minimum) at \(\vect a\) then \(g_i\) attains a (local) maximum (or minimum) at \(t=0\) for all \(i=1,\dots,N\text{.}\) If \(f\) has partial derivatives in all directions then by the criterion for single variable functions and the definition of the partial derivatives we have
\begin{equation*}
0=g_i'(0)
=\frac{\partial}{\partial x_i}f(\vect a)
\end{equation*}
for all \(i=1,\dots,N\text{.}\) We therefore have the following theorem.
Theorem 4.47. Maxima and minima.
Suppose that \(f\) is a function defined on a domain \(D\) in \(\mathbb R^N\) attaining a maximum or minimum at the interior point \(\vect a\) of \(D\text{.}\) Then
\begin{equation*}
\grad f(\vect a)=\vect 0\text{.}
\end{equation*}
A point \(\vect a\) in the domain of \(f\) is called a critical point of \(f\) if \(\grad f(\vect a)=\vect 0\text{.}\)
Hence, all local maxima and minima are among the critical points of a function, but, as in the case of functions of one variable, not all critical points are maxima or minima. As mentioned at the start of the section, a function \(g\) of one variable attains a local maximum in an open interval if \(g'(t)=0\) and \(g''(t)\lt 0\text{.}\) Next we look in more detail at what happens in several dimensions.
Subsection 4.8.2 The Hessian Matrix: The Second Derivative Test
As with the first derivative test for maxima and minima we can look at cross-sections of the graph of the function \(f\) in the coordinate directions. If \(\grad f(\vect a)=\vect 0\text{,}\) then
\begin{equation*}
\frac{\partial^2}{\partial x_i^2}f(\vect a)\lt 0
\qquad\text{for all } i=1,\dots,N
\end{equation*}
implies that the cross-sections in the coordinate directions have a maximum at \(\vect a\text{.}\) The question is whether this guarantees that \(f\) attains a maximum at \(\vect a\text{.}\) Unfortunately, the answer is NO! The reason is that we only look at cross-sections of the graph in the coordinate directions. The condition only tells us that the function attains a maximum along these cross-sections; in other directions it may be increasing. As an example look at the function \(f(x,y)=4x^2y^2-x^4-y^4\text{,}\) whose graph is displayed in Figure 4.49. Along the cross-sections parallel to the axes through \((0,0)\) the function attains a maximum, but along the cross-sections in the diagonal directions it attains a minimum. You can easily verify this formally.
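Indeed, restricting \(f\) to the axes and to the diagonal \(y=x\) gives
\begin{align*}
f(x,0)\amp =-x^4\leq 0=f(0,0),\\
f(x,x)\amp =4x^4-2x^4=2x^4\geq 0=f(0,0),
\end{align*}
so \((0,0)\) is a maximum along the coordinate axes but a minimum along the diagonal.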
Hence we have to make sure that we have a maximum along every cross-section. This means that all first directional derivatives must be zero, and all second directional derivatives of \(f\) at \(\vect a\) must be negative in order to ensure a maximum. Hence let \(\vect v=(v_1,\dots,v_N)\) be an arbitrary unit vector, and define
\begin{equation*}
g_{\vect v}(t)=f(\vect a+t\vect v)\text{.}
\end{equation*}
Assume that all first and second order partial derivatives of \(f\) are continuous at \(\vect a\text{.}\) According to the previous reasoning \(f\) attains a maximum at \(\vect a\) if \(g_{\vect v}'(0)=0\) and \(g_{\vect v}''(0)\lt 0\) for every unit vector \(\vect v\text{.}\) By Proposition 4.28 we have
\begin{equation*}
g_{\vect v}'(0)=\grad f(\vect a)\cdot\vect v\text{.}
\end{equation*}
Hence \(g_{\vect v}'(0)=0\) for every unit vector \(\vect v\) if and only if \(\grad f(\vect a)=\vect 0\text{.}\) Now let us look at the second order derivative. By Proposition 4.43 we know that
\begin{equation*}
g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\text{,}
\end{equation*}
where \(H_f(\vect a)\) is the Hessian matrix of \(f\) at \(\vect a\text{.}\) Hence, if
\begin{equation*}
g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\lt 0
\end{equation*}
for all unit vectors \(\vect v\text{,}\) then \(f\) attains a maximum at \(\vect a\text{.}\) The last conclusion is not obvious; an additional argument is required. Before we formulate a theorem we introduce some definitions from linear algebra.
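To see how the second directional derivative depends on the direction, consider \(f(x,y)=x^2-y^2\) at \(\vect a=(0,0)\text{.}\) For a unit vector \(\vect v=(\cos\theta,\sin\theta)\) we get
\begin{equation*}
g_{\vect v}''(0)=\vect v^TH_f(0,0)\vect v
=2\cos^2\theta-2\sin^2\theta=2\cos 2\theta\text{,}
\end{equation*}
which is positive for some directions and negative for others, so \((0,0)\) is neither a maximum nor a minimum.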
Definition 4.50. Positive definite.
A symmetric \(N\times N\)-matrix \(A\) is called positive definite if there exists a constant \(c\gt 0\) such that
\begin{equation*}
\vect v^TA\vect v\geq c\|\vect v\|^2
\end{equation*}
for all \(\vect v\in\mathbb R^N\text{,}\) and negative definite if
\begin{equation*}
\vect v^TA\vect v\leq -c\|\vect v\|^2
\end{equation*}
for all \(\vect v\in\mathbb R^N\text{.}\) If
\begin{equation*}
\vect v^TA\vect v\geq 0\qquad(\text{or }\vect v^TA\vect v\leq 0)
\end{equation*}
for all \(\vect v\in\mathbb R^N\) then \(A\) is called positive (or negative) semi-definite. If \(A\) is none of the above then \(A\) is called indefinite.
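For a diagonal matrix these conditions are easily checked. For instance,
\begin{equation*}
\vect v^T
\begin{bmatrix}
1 \amp 0 \\ 0 \amp 2
\end{bmatrix}
\vect v=v_1^2+2v_2^2\geq\|\vect v\|^2
\end{equation*}
shows that this matrix is positive definite with \(c=1\text{,}\) while
\(\begin{bmatrix}
1 \amp 0 \\ 0 \amp 0
\end{bmatrix}\)
is only positive semi-definite because \(\vect v^TA\vect v=v_1^2\) vanishes for \(\vect v=(0,1)\text{,}\) and
\(\begin{bmatrix}
1 \amp 0 \\ 0 \amp -1
\end{bmatrix}\)
is indefinite.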
With this terminology we can formulate our main result about maxima and minima.
Theorem 4.51.
Suppose that all first and second order partial derivatives of \(f\) are continuous at the interior point \(\vect a\) of the domain of \(f\text{.}\) If \(\grad f(\vect a)=\vect 0\) then
(i) \(f\) attains a maximum at \(\vect a\) if \(H_f(\vect a)\) is negative definite;
(ii) \(f\) attains a minimum at \(\vect a\) if \(H_f(\vect a)\) is positive definite;
(iii) \(f\) has a saddle point at \(\vect a\) if \(H_f(\vect a)\) is indefinite.
Proof.
We can give a proof of the above using Taylor's Theorem 4.45. If \(H_f(\vect a)\) is positive definite, then there exists \(c\gt 0\) such that
\begin{equation*}
f(\vect a+\vect h)-f(\vect a)\geq\frac{c}{4}\|\vect h\|^2\gt 0
\end{equation*}
for \(\vect h\neq\vect 0\) with \(\|\vect h\|\) small enough. Hence, if \(\grad f(\vect a)=\vect 0\) and \(H_f(\vect a)\) is positive definite, then \(f\) has a local minimum at \(\vect a\text{.}\) This proves (ii). Now (i) follows from (ii) by applying it to \(-f\text{.}\)
If \(H_f(\vect a)\) is indefinite there exist unit vectors \(\vect v,\vect w\) such that \(g_{\vect v}''(0)=\vect v^TH_f(\vect a)\vect v\lt 0\) and \(g_{\vect w}''(0)=\vect w^TH_f(\vect a)\vect w\gt 0\text{.}\) This means that along some cross-sections through the graph of \(f\) we have a maximum, and along others a minimum, so \(f\) attains neither a maximum nor a minimum at \(\vect a\text{.}\) This proves (iii).
Remark 4.52.
A point at which (iii) in the above theorem is satisfied is usually called a saddle point. Examples are shown in Figure 2.7 and Figure 4.49.
To decide whether a matrix is positive definite we need to use some linear algebra. Recall that every symmetric \(N\times N\) matrix \(A\) is diagonalisable by means of an orthogonal matrix. An orthogonal matrix is an invertible matrix \(P\) such that \(P^{-1}=P^T\text{,}\) that is, \(P^TP=PP^T=I\text{.}\) You learn in linear algebra that for every symmetric matrix \(A\) there exists an orthogonal matrix \(P\) such that
\begin{equation*}
A=PDP^T\text{,}
\end{equation*}
where \(D\) is the diagonal matrix with the eigenvalues \(\lambda_1,\dots,\lambda_N\) of \(A\) on the diagonal. If \(\lambda_{\min}\) and \(\lambda_{\max}\) denote the smallest and largest of these eigenvalues, then
\begin{align*}
\vect x^TA\vect x
\amp =\vect x^TPDP^T\vect x
\geq\lambda_{\min}\|P^T\vect x\|^2
=\lambda_{\min}\|\vect x\|^2,\\
\vect x^TA\vect x
\amp =\vect x^TPDP^T\vect x
\leq\lambda_{\max}\|P^T\vect x\|^2
=\lambda_{\max}\|\vect x\|^2
\end{align*}
for all \(\vect x\in\mathbb R^N\text{.}\) Note that there is equality if \(\vect x\) happens to be an eigenvector corresponding to \(\lambda_{\min}\) and \(\lambda_{\max}\text{,}\) respectively. This leads to the following criterion for positive and negative definite matrices.
Proposition 4.53. Eigenvalues of positive definite matrices.
Suppose that \(A\) is a symmetric \(N\times N\) matrix with smallest and largest eigenvalues \(\lambda_{\min}\) and \(\lambda_{\max}\text{.}\) Then
\(A\) is positive definite if and only if \(\lambda_{\min}\gt 0\text{;}\)
\(A\) is negative definite if and only if \(\lambda_{\max}\lt 0\text{;}\)
\(A\) is indefinite if and only if \(\lambda_{\min}\lt 0\) and \(\lambda_{\max}\gt 0\text{.}\)
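For example, the matrix
\begin{equation*}
A=\begin{bmatrix}
2 \amp 1 \\ 1 \amp 2
\end{bmatrix}
\end{equation*}
has characteristic polynomial \((2-\lambda)^2-1=(\lambda-1)(\lambda-3)\text{,}\) so \(\lambda_{\min}=1\gt 0\) and \(A\) is positive definite.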
From the above we can get a simple criterion in the case \(N=2\text{.}\)
Corollary 4.54.
Suppose that \(A=
\begin{bmatrix}
a \amp c \\ c\amp b
\end{bmatrix}\) is a symmetric matrix. Then
\(A\) is positive definite if and only if \(a\gt 0\) and \(\det A\gt 0\text{;}\)
\(A\) is negative definite if and only if \(a\lt 0\) and \(\det A\gt 0\text{;}\)
\(A\) is indefinite if \(\det A\lt 0\text{.}\)
Proof.
To compute the eigenvalues we determine the characteristic polynomial of \(A\text{,}\) given by
\begin{equation*}
\det(A-\lambda I)=(a-\lambda)(b-\lambda)-c^2
=\lambda^2-(a+b)\lambda+ab-c^2\text{.}
\end{equation*}
If \(\lambda_1,\lambda_2\) are the eigenvalues of \(A\text{,}\) then comparing coefficients gives
\begin{align}
\lambda_1+\lambda_2\amp =a+b,\tag{4.7}\\
\lambda_1\lambda_2\amp =ab-c^2=\det A\text{.}\tag{4.8}
\end{align}
If \(\det A\gt 0\text{,}\) then (4.8) shows that either both eigenvalues are positive or both are negative. Moreover, \(ab\gt c^2\geq 0\text{,}\) so \(a\) and \(b\) have the same sign. By (4.7) the eigenvalues are both positive if \(a\gt 0\) (or \(b\gt 0\)) and both negative if \(a\lt 0\text{.}\) If \(\det A\lt 0\text{,}\) then (4.8) shows that the eigenvalues must have opposite signs. To finish the proof we use Proposition 4.53.
Example 4.55.
Let \(A=
\begin{bmatrix}
9 \amp -1 \\-1\amp 2
\end{bmatrix}
\text{.}\)
Then \(a=9\gt 0\) and \(\det A=18-1=17\gt 0\text{,}\) so \(A\) is positive definite.
Let \(A=
\begin{bmatrix}
-2 \amp 3 \\3\amp 1
\end{bmatrix}
\text{.}\)
Then \(\det A=-2-9=-11\lt 0\text{,}\) so \(A\) is indefinite.
Let \(A=
\begin{bmatrix}
-5 \amp 2 \\2\amp -1
\end{bmatrix}
\text{.}\)
Then \(a=-5\lt 0\) and \(\det A=5-4=1\gt 0\text{,}\) so \(A\) is negative definite.
With the above we can give a convenient criterion for maxima and minima of functions of two variables.
Theorem 4.56.
Suppose that all first and second partial derivatives of \(f\) defined on \(D\subset\mathbb R^2\) are continuous at the interior point \(\vect a\in D\text{,}\) and that \(\grad f(\vect a)=0\text{.}\) Then
\(f\) attains a (local) maximum at \(\vect a\) if \(\dfrac{\partial^2 f}{\partial x_1^2}(\vect a)\lt 0\) and \(\det H_f(\vect a)\gt 0\text{;}\)
\(f\) attains a (local) minimum at \(\vect a\) if \(\dfrac{\partial^2 f}{\partial x_1^2}(\vect a)\gt 0\) and \(\det H_f(\vect a)\gt 0\text{;}\)
\(f\) has a saddle point at \(\vect a\) if \(\det H_f(\vect a)\lt 0\text{;}\)
the test is inconclusive if \(\det H_f(\vect a)=0\text{.}\)
The last assertion can be shown by constructing suitable examples: If we set \(f(x,y)=\alpha x^4+\beta y^4\) then \(\grad f(x,y)=4(\alpha x^3,\beta y^3)\text{,}\) and thus \((0,0)\) is a critical point. The Hessian matrix is
\begin{equation*}
H_f(x,y)=
\begin{bmatrix}
12\alpha x^2 \amp 0 \\
0 \amp 12\beta y^2
\end{bmatrix}\text{,}
\end{equation*}
so \(\det H_f(0,0)=0\) for every choice of \(\alpha\) and \(\beta\text{.}\)
However, it is rather obvious that \((0,0)\) is a minimum if \(\alpha=\beta=1\text{,}\) a maximum if \(\alpha=\beta=-1\text{,}\) and a saddle point if \(\alpha=1\) and \(\beta=-1\text{.}\) Sketch the graphs to see this!
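Formally, for \(\alpha=1\) and \(\beta=-1\) we have
\begin{equation*}
f(x,0)=x^4\geq 0=f(0,0)
\qquad\text{and}\qquad
f(0,y)=-y^4\leq 0=f(0,0)\text{,}
\end{equation*}
so \(f\) attains neither a maximum nor a minimum at \((0,0)\text{.}\)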
Example 4.57.
Determine the critical points of \(f(x,y)=x^3+3xy^2-3x^2-3y^2+4\text{,}\) and decide whether \(f\) attains a maximum, minimum or a saddle point at each of them.
Solution.
To determine the critical points we have to solve the system of equations
\begin{align*}
\frac{\partial f}{\partial x}\amp =3x^2+3y^2-6x=0,\\
\frac{\partial f}{\partial y}\amp =6xy-6y=6y(x-1)=0.
\end{align*}
From the second equation we get that \(y=0\) or \(x=1\text{.}\) If \(y=0\) then from the first equation \(x(3x-6)=0\) and so \(x=0\) or \(x=2\text{.}\) Hence \((0,0)\) and \((2,0)\) are critical points. If \(x=1\) then from the first equation \(3+3y^2-6=0\text{,}\) and so \(y=\pm 1\text{.}\) Hence the critical points of \(f\) are
\begin{equation*}
(0,0),\quad(2,0),\quad(1,1),\quad(1,-1)\text{.}
\end{equation*}
The Hessian matrix of \(f\) is
\begin{equation*}
H_f(x,y)=6
\begin{bmatrix}
x-1 \amp y \\
y \amp x-1
\end{bmatrix}\text{.}
\end{equation*}
We next look at \(H_f(x,y)\) at every critical point:
We have \(H_f(0,0)=6
\begin{bmatrix}
-1 \amp 0 \\ 0\amp -1
\end{bmatrix}
\text{,}\) and so \(\dfrac{\partial^2 f}{\partial x^2}(0,0)=-6\lt 0\) and \(\det H_f(0,0)=36\gt 0\text{,}\) showing that \(f\) attains a maximum at \((0,0)\text{.}\)
Next we have \(H_f(2,0)=6
\begin{bmatrix}
1 \amp 0 \\ 0\amp 1
\end{bmatrix}
\text{,}\) and so \(\dfrac{\partial^2 f}{\partial x^2}(2,0)=6\gt 0\) and \(\det H_f(2,0)=36\gt 0\text{,}\) showing that \(f\) attains a minimum at \((2,0)\text{.}\)
Next we have \(H_f(1,1)=6
\begin{bmatrix}
0 \amp 1 \\ 1\amp 0
\end{bmatrix}
\text{,}\) and so \(\det H_f(1,1)=-36\lt 0\text{,}\) showing that \(f\) has a saddle point at \((1,1)\text{.}\)
Finally we have \(H_f(1,-1)=6
\begin{bmatrix}
0 \amp -1 \\ -1\amp 0
\end{bmatrix}
\text{,}\) and so \(\det H_f(1,-1)=-36\lt 0\text{,}\) showing that \(f\) has another saddle point at \((1,-1)\text{.}\)
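As a final check we can compute the extreme values themselves:
\begin{equation*}
f(0,0)=4
\qquad\text{and}\qquad
f(2,0)=8+0-12-0+4=0\text{,}
\end{equation*}
the local maximum and local minimum values of \(f\text{,}\) respectively.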