
Section 4.9 Maxima and Minima with Constraints

In this section we consider extremal value problems as in the previous section, but assume that a side condition (constraint) has to be satisfied. This will lead to the Lagrange multiplier rule. As examples of such problems, consider the following four.
  1. Find the rectangle inscribed in the ellipse \(x_1^2+2x_2^2=1\) with the largest area, as shown in Figure 4.58.
    Figure 4.58. A rectangle inscribed in an ellipse
  2. A footpath on the mountain \(x_3=4x_1x_2\) lies over the curve \(x_1^2+2x_2^2=1\) as shown in Figure 4.59.(a). Where is its highest point?
  3. Find the largest value of \(4x_1x_2\) on the curve \(x_1^2+2x_2^2=1\text{.}\)
  4. Find all level curves of \(f(x_1,x_2)=4x_1x_2\) which are tangent to the ellipse \(x_1^2+2x_2^2=1\text{,}\) and determine the points of tangency. The contour lines and the ellipse are shown in Figure 4.59.(b).
(a) A path on a mountain
(b) Contour lines of \(f\) with ellipse
Figure 4.59. Contour lines of \(f\) and the ellipse
The area of a rectangle inscribed in the ellipse \(x_1^2+2x_2^2=1\) with one corner at \((x_1,x_2)\) is \(4x_1x_2\text{.}\) Hence the first three problems are the same mathematical problem. It turns out that the fourth is the same as well. Intuitively, we can see this as follows. We walk along the curve on the graph in Figure 4.59.(a). When we reach a maximum or a minimum we just touch the maximal level: we are lower before and after. Projected onto the plane below, this means that the curve touches the corresponding contour line at one point only without crossing it. Crossing would mean that we keep going up (or down), so we cannot be at an extremal point. Hence the contour line and the curve must be tangential at every maximum and minimum along the curve. The contour map of \(f(x_1,x_2)=4x_1x_2\) and the ellipse \(x_1^2+2x_2^2=1\) are shown in Figure 4.59.(b). One can clearly see where they are tangential.
If two curves are tangential, then the vectors perpendicular to the common tangent must obviously be multiples of each other. We saw in Theorem 4.31 that the gradient of a function is perpendicular to the contour line at every point. Hence if \(f\) attains a maximum (or minimum) subject to the condition \(g(\vect x)=0\) at a point \(\vect x_0\) we must have \(\grad f(\vect x_0)=\lambda\grad g(\vect x_0)\) for some \(\lambda\in\mathbb R\text{.}\) This is called the Lagrange multiplier rule, and \(\lambda\) a Lagrange multiplier. Note however, that tangency of the curve and a contour line of \(f\) does not guarantee that we are at a maximum or a minimum. We could just be at a point where the path levels out, but keeps going up or down.
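The parallelism of the two gradients can be checked symbolically for the running example. Below is a minimal sketch using Python's sympy library (an assumption of this sketch; the text itself uses no software). The gradients \(\grad f\) and \(\grad g\) are parallel exactly when their \(2\times 2\) determinant vanishes, and it does so at the maximiser \((1/\sqrt{2},1/2)\) found in Example 4.62:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 4*x1*x2                 # objective from the running example
g = x1**2 + 2*x2**2 - 1     # constraint g(x) = 0

# grad f is parallel to grad g exactly when this determinant vanishes
cross = sp.diff(f, x1)*sp.diff(g, x2) - sp.diff(f, x2)*sp.diff(g, x1)

# Evaluate at the maximiser (1/sqrt(2), 1/2) found in Example 4.62
print(cross.subs({x1: 1/sp.sqrt(2), x2: sp.Rational(1, 2)}))  # prints 0
```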

Theorem 4.60. Lagrange multiplier rule.

Let \(f\) and \(g\) be continuously differentiable functions of two variables, and suppose that \(f\) attains a maximum or minimum subject to the constraint \(g(\vect x)=0\) at a point \(\vect x_0\) with \(\grad g(\vect x_0)\neq(0,0)\text{.}\) Then there exists \(\lambda\in\mathbb R\) such that
\begin{equation} \grad f(\vect x_0)=\lambda\grad g(\vect x_0)\text{.}\tag{4.9} \end{equation}
Written componentwise, (4.9) means that
\begin{align} \frac{\partial}{\partial x_1}f(\vect x_0)\amp=\lambda\frac{\partial}{\partial x_1}g(\vect x_0),\notag\\ \frac{\partial}{\partial x_2}f(\vect x_0)\amp=\lambda\frac{\partial}{\partial x_2}g(\vect x_0)\text{.}\tag{4.10} \end{align}
Proof.

We only give a proof for maxima; the case of minima is completely analogous. Assume that \(f\) attains a maximum at \(\vect x_0\) subject to \(g(\vect x)=0\text{.}\) By assumption \(\grad g(\vect x_0)\neq(0,0)\text{.}\) By interchanging \(x_1\) and \(x_2\) if necessary, we can assume without loss of generality that
\begin{equation} \frac{\partial}{\partial x_2}g(\vect x_0)\neq 0\text{.}\tag{4.11} \end{equation}
As discussed in Remark 4.34, near \(\vect x_0\) the set of solutions of \(g(\vect x)=0\) is the graph of a continuously differentiable function \(h\) defined in an interval \(I\) centred at \(x_{01}\text{.}\) The situation is shown in Figure 4.61.
Figure 4.61. The fat part of \(g(\vect x)=0\) near \(\vect x_0\) is the graph of a function.
Hence \(g(x_1,h(x_1))=0\) for \(x_1\in I\text{.}\) By our assumption \(f\) attains a maximum at \(\vect x_0\) on the curve \(g(\vect x)=0\text{.}\) This means that the function \(x_1\mapsto f(x_1,h(x_1))\) attains a maximum at \(x_{01}\) in \(I\text{.}\) But this is a function of one variable. We know that at a maximum the derivative of such a function must be zero. Using the chain rule (see Theorem 4.17) we therefore get
\begin{align} 0\amp=\left.\frac{d}{dx_1}f(x_1,h(x_1))\right|_{x_1=x_{01}}\notag\\ \amp=\frac{\partial}{\partial x_1}f(x_{01},h(x_{01}))+\frac{\partial}{\partial x_2}f(x_{01},h(x_{01}))h'(x_{01})\notag\\ \amp=\frac{\partial}{\partial x_1}f(\vect x_0)+\frac{\partial}{\partial x_2}f(\vect x_0)h'(x_{01})\text{.}\tag{4.12} \end{align}
We next compute \(h'(x_{01})\text{.}\) To do so we use the identity
\begin{equation*} g(x_1,h(x_1))=0\qquad\text{for }x_1\in I\text{.} \end{equation*}
Applying the chain rule as before we get
\begin{equation} \frac{d}{dx_1}g(x_1,h(x_1)) =\frac{\partial}{\partial x_1}g(x_1,h(x_1)) +\frac{\partial}{\partial x_2}g(x_1,h(x_1))h'(x_1) =0\text{.}\tag{4.13} \end{equation}
Taking into account (4.11) we have
\begin{align*} h'(x_{01}) \amp=-\Bigl[\frac{\partial}{\partial x_2} g(x_{01},h(x_{01}))\Bigr]^{-1} \frac{\partial}{\partial x_1}g(x_{01},h(x_{01}))\\ \amp=-\Bigl[\frac{\partial}{\partial x_2}g(\vect x_0)\Bigr]^{-1} \frac{\partial}{\partial x_1}g(\vect x_0)\text{.} \end{align*}
Substituting this into (4.12) we get
\begin{equation*} \frac{\partial}{\partial x_1}f(\vect x_0) -\frac{\partial}{\partial x_2}f(\vect x_0) \Bigl[\frac{\partial}{\partial x_2}g(\vect x_0)\Bigr]^{-1} \frac{\partial}{\partial x_1}g(\vect x_0) =0\text{.} \end{equation*}
Setting \(\lambda :=\dfrac{\partial}{\partial x_2}f(\vect x_0) \Bigl[\dfrac{\partial}{\partial x_2}g(\vect x_0)\Bigr]^{-1}\text{,}\) the first equation in (4.10) holds. By definition of \(\lambda\) we finally have
\begin{equation*} \frac{\partial}{\partial x_2}f(\vect x_0) =\lambda\frac{\partial}{\partial x_2}g(\vect x_0), \end{equation*}
which is the second equation in (4.10). Hence (4.9) holds, completing the proof of the theorem.
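The relation \(h'(x_1)=-\bigl[\partial g/\partial x_2\bigr]^{-1}\partial g/\partial x_1\) obtained from (4.13) can be verified symbolically for the ellipse of the introductory example, whose upper half is the graph of \(h(x_1)=\sqrt{(1-x_1^2)/2}\text{.}\) A minimal sketch using sympy (an assumption of this sketch, not part of the text):

```python
import sympy as sp

x1 = sp.symbols('x1', real=True, positive=True)
h = sp.sqrt((1 - x1**2) / 2)        # upper branch of x1^2 + 2*x2^2 - 1 = 0

u, v = sp.symbols('u v', real=True)
g = u**2 + 2*v**2 - 1               # constraint from the example
gu = sp.diff(g, u)                  # dg/dx1 = 2*x1
gv = sp.diff(g, v)                  # dg/dx2 = 4*x2

# h'(x1) should equal -(dg/dx2)^(-1) * (dg/dx1), evaluated at (x1, h(x1))
lhs = sp.diff(h, x1)
rhs = (-gu / gv).subs({u: x1, v: h})
print(sp.simplify(lhs - rhs))  # prints 0
```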
The previous theorem guarantees that all points where \(f\) attains a maximum or minimum subject to \(g(\vect x)=0\) are among the solutions of the system
\begin{equation} \begin{aligned} \frac{\partial}{\partial x_1}f(x_1,x_2) \amp =\lambda\frac{\partial}{\partial x_1}g(x_1,x_2), \\ \frac{\partial}{\partial x_2}f(x_1,x_2) \amp =\lambda\frac{\partial}{\partial x_2}g(x_1,x_2), \\ g(x_1,x_2) \amp =0. \end{aligned}\tag{4.14} \end{equation}
The unknown variables in that system are \(x_1\text{,}\) \(x_2\) and \(\lambda\text{.}\) To find the maxima and minima of \(f\) subject to \(g(\vect x)=0\text{,}\) we first solve the above system to find all possible candidates. As mentioned before, not every solution of (4.14) needs to correspond to a maximum or minimum. We hence need to find other criteria to decide whether a given point corresponds to a maximum, a minimum or neither. We illustrate this by two examples.
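The candidate-finding step just described can also be automated symbolically. Below is a minimal sketch using sympy (an assumption; the text itself uses no software), with the helper name `lagrange_candidates` chosen here purely for illustration. It is applied to the introductory problem, which is solved by hand in Example 4.62:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)

def lagrange_candidates(f, g):
    """Solve system (4.14): grad f = lambda * grad g together with g = 0."""
    eqs = [sp.diff(f, x1) - lam * sp.diff(g, x1),
           sp.diff(f, x2) - lam * sp.diff(g, x2),
           g]
    return sp.solve(eqs, [x1, x2, lam], dict=True)

# Maximise 4*x1*x2 on the ellipse x1^2 + 2*x2^2 = 1
sols = lagrange_candidates(4*x1*x2, x1**2 + 2*x2**2 - 1)
for s in sols:
    print(s[x1], s[x2])   # the four candidate points
```

As the theorem warns, the solver only returns candidates; deciding which are maxima or minima still requires a separate argument, such as evaluating \(f\) at each point.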

Example 4.62.

We solve the problem from the beginning of the section: Find the maxima and minima of \(f(x_1,x_2)=4x_1x_2\) subject to the condition \(g(x_1,x_2)=x_1^2+2x_2^2-1=0\text{.}\)
Solution.
To do so we write down system (4.14) for our problem:
\begin{align*} 4x_2\amp =2\lambda x_1\\ 4x_1\amp =4\lambda x_2\\ x_1^2+2x_2^2-1\amp =0\text{.} \end{align*}
From the second equation we get \(x_1=\lambda x_2\text{.}\) Substituting this into the first equation we have \(2x_2=\lambda^2x_2\text{.}\) Note that \(x_2\neq 0\text{,}\) since otherwise \(x_1=\lambda x_2=0\text{,}\) contradicting the third equation. Hence \(\lambda^2=2\text{,}\) and so \(\lambda=\pm\sqrt{2}\text{.}\) Assume that \(\lambda=\sqrt{2}\text{.}\) Then from the second and the last equation \(1=2x_2^2+2x_2^2=4x_2^2\text{.}\) Hence \(x_2=\pm 1/2\text{,}\) and from \(x_1=\sqrt{2}x_2\) we get \(x_1=\pm 1/\sqrt{2}\) with matching signs. Hence \((1/\sqrt{2},1/2)\) and \((-1/\sqrt{2},-1/2)\) are candidates for maxima or minima. Proceeding similarly with \(\lambda=-\sqrt{2}\) we see that the other possible candidates are \((1/\sqrt{2},-1/2)\) and \((-1/\sqrt{2},1/2)\text{.}\) Now
\begin{equation*} f(1/\sqrt{2},1/2)=f(-1/\sqrt{2},-1/2)=\sqrt{2} \end{equation*}
and
\begin{equation*} f(1/\sqrt{2},-1/2)=f(-1/\sqrt{2},1/2)=-\sqrt{2}\text{.} \end{equation*}
The first two points correspond to maxima, and the second two to minima. These are indeed the only possibilities: \(f\) is continuous and the ellipse is closed and bounded, so \(f\) must attain a maximum and a minimum along the ellipse, and the above points are the only candidates.
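As a numerical sanity check of this example (an addition, not part of the text), one can parametrise the ellipse by \(x_1=\cos t\text{,}\) \(x_2=\sin t/\sqrt{2}\text{,}\) so that \(f=4x_1x_2=\sqrt{2}\sin 2t\text{,}\) and scan \(f\) on a fine grid:

```python
import math

# On the ellipse x1^2 + 2*x2^2 = 1 set x1 = cos t, x2 = sin t / sqrt(2)
def f_on_ellipse(t):
    return 4 * math.cos(t) * (math.sin(t) / math.sqrt(2))

ts = [k * 2 * math.pi / 100000 for k in range(100000)]
fmax = max(f_on_ellipse(t) for t in ts)
fmin = min(f_on_ellipse(t) for t in ts)
print(fmax, fmin)   # close to sqrt(2) and -sqrt(2)
```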

Example 4.63.

Find the maxima and minima of \(xy^2\) subject to the condition \(x^2+y^2=1\text{.}\)
Solution.
To solve this problem we first write down the system (4.14) for our situation:
\begin{align*} y^2 \amp =2\lambda x\\ 2xy \amp =2\lambda y\\ x^2+y^2-1 \amp =0\text{.} \end{align*}
Multiplying the first equation by \(2x\) and the second equation by \(y\) we get \(2xy^2=4\lambda x^2=2\lambda y^2\text{.}\) Hence
\begin{equation*} \lambda y^2=2\lambda x^2, \end{equation*}
which implies that either \(\lambda=0\) or \(y^2=2x^2\text{.}\) In the latter case we get from the third equation
\begin{equation*} x^2+2x^2=3x^2=1, \end{equation*}
and so \(x=\pm 1/\sqrt{3}\text{.}\) Hence we get \(y=\pm\sqrt{2x^2}=\pm\sqrt{2/3}\text{,}\) and therefore
\begin{equation*} (1/\sqrt{3},\sqrt{2/3}),\quad (-1/\sqrt{3},-\sqrt{2/3}),\quad (1/\sqrt{3},-\sqrt{2/3}),\quad (-1/\sqrt{3},\sqrt{2/3}) \end{equation*}
are candidates for maxima and minima. We now consider the case \(\lambda=0\text{.}\) Then from the first equation \(y=0\text{,}\) and from the third \(x^2=1\text{,}\) so
\begin{equation*} (1,0)\quad\text{and}\quad(-1,0) \end{equation*}
are other possible points for maxima and minima. We finally need to decide whether \(f(x,y)=xy^2\) attains a maximum, minimum or neither at the above points. We have
\begin{align*} f(1/\sqrt{3},\sqrt{2/3})\amp =f(1/\sqrt{3},-\sqrt{2/3})=\frac{2}{3\sqrt{3}}\\ f(-1/\sqrt{3},\sqrt{2/3})\amp =f(-1/\sqrt{3},-\sqrt{2/3})=-\frac{2}{3\sqrt{3}}\\ f(1,0)\amp =f(-1,0)=0\text{.} \end{align*}
Hence \(f\) attains a (global) maximum at \((1/\sqrt{3},\sqrt{2/3})\) and at \((1/\sqrt{3},-\sqrt{2/3})\text{,}\) and a (global) minimum at \((-1/\sqrt{3},\sqrt{2/3})\) and \((-1/\sqrt{3},-\sqrt{2/3})\) on the circle \(x^2+y^2=1\text{.}\) As \((1,0)\) lies between two maxima, \(f\) attains a (local) minimum there. Likewise, as \((-1,0)\) lies between two minima, \(f\) must attain a (local) maximum at that point on the circle.
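This example, too, can be sanity-checked numerically (an addition, not part of the text) by parametrising the circle as \(x=\cos t\text{,}\) \(y=\sin t\text{,}\) so that \(f=\cos t\,\sin^2 t\text{,}\) and scanning on a fine grid:

```python
import math

# On the circle x^2 + y^2 = 1 set x = cos t, y = sin t
def f_on_circle(t):
    return math.cos(t) * math.sin(t) ** 2

ts = [k * 2 * math.pi / 100000 for k in range(100000)]
fmax = max(f_on_circle(t) for t in ts)
fmin = min(f_on_circle(t) for t in ts)
print(fmax, 2 / (3 * math.sqrt(3)))   # the two values agree
```

The scan confirms the global extrema \(\pm 2/(3\sqrt{3})\text{;}\) the local extrema at \((\pm 1,0)\) are invisible to a global scan and require the between-extrema argument above.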