Section 4.3 The Chain Rule
If \(f,g\) are differentiable functions of one variable and \(g(x)\) is in the domain of \(f\text{,}\) then the chain rule asserts that
\begin{equation*}
(f\circ g)'(x)=f'(g(x))g'(x),
\end{equation*}
as you know from first year calculus. We now want to generalise the above formula to the case where \(f\) is a function of several variables. One key requirement was that \(g\) takes its values in the domain of \(f\text{.}\) Hence, if \(f\) is defined on a subset \(D\subset\mathbb R^N\text{,}\) then \(g\) must be a function with values in \(\mathbb R^N\text{,}\) that is, a vector valued function. In general \(\vect g\) could be a function of several variables as well.
In what follows we assume that \(f\) is a function of \(\vect x=(x_1,\dots,x_N)\text{,}\) and that \(\vect g\) is a vector valued function with values in \(\mathbb R^N\) of the variables \(\vect y=(y_1,\dots,y_k)\text{.}\) As usual, we denote the component functions of \(\vect g\) by \(g_i\text{,}\) \(i=1,\dots,N\text{.}\) We are now in a position to state one possible version of the chain rule in vector calculus.
Theorem 4.17. Chain rule.
Suppose that \(f\text{,}\) \(\vect g\) are as above. Moreover, assume that \(\vect y\) is an interior point of the domain of \(f\circ\vect g\text{,}\) that \(\vect g\) has a partial derivative with respect to \(y_j\) at \(\vect y\text{,}\) and that \(\grad f\) exists and is continuous at \(\vect g(\vect y)\text{.}\) Then
\begin{align*}
\frac{\partial}{\partial y_j}f(\vect g(\vect y))
\amp=\grad f(\vect g(\vect y))
\cdot\frac{\partial}{\partial y_j}\vect g(\vect y)\\
\amp=\sum_{i=1}^N\frac{\partial f}{\partial x_i}(\vect g(\vect y))
\frac{\partial}{\partial y_j}g_i(\vect y)\text{.}
\end{align*}
If \(\vect g\) is a function of one variable, \(t\text{,}\) then
\begin{align*}
\frac{d}{dt}f(\vect g(t))
\amp=\grad f(\vect g(t))\cdot\vect g'(t)\\
\amp=\sum_{i=1}^N\frac{\partial f}{\partial x_i}(\vect g(t))
\frac{d}{dt}g_i(t)\text{.}
\end{align*}
Proof.
By definition of the partial derivative of a function we treat all variables but one as constants. Hence the proof of the first formula reduces to the proof of the second one. The general proof is quite tedious, so we only illustrate the main ideas in the case \(N=2\text{.}\) We need to compute the limit as \(h\to 0\) of
\begin{equation*}
D(h):=\frac{1}{h}\bigl(f(g_1(t+h),g_2(t+h))-f(g_1(t),g_2(t))\bigr)\text{.}
\end{equation*}
To do so we rewrite the expression as
\begin{align*}
D(h)\amp =\frac{f(g_1(t+h),g_2(t+h))-f(g_1(t),g_2(t+h))}{g_1(t+h)-g_1(t)}\quad\frac{g_1(t+h)-g_1(t)}{h}\\
\amp \qquad+\frac{f(g_1(t),g_2(t+h))-f(g_1(t),g_2(t))}{g_2(t+h)-g_2(t)}\quad \frac{g_2(t+h)-g_2(t)}{h}
\end{align*}
adding and subtracting the term \(f(g_1(t),g_2(t+h))\text{.}\) (For simplicity we assume that \(g_i(t+h)\neq g_i(t)\) for all sufficiently small \(h\neq 0\text{,}\) so that the above quotients are defined.) As \(g_i\) is differentiable at \(t\) it follows that
\begin{equation*}
\lim_{h\to 0}\frac{g_i(t+h)-g_i(t)}{h}
=g_i'(t)
\end{equation*}
for \(i=1,2\text{.}\) As a differentiable function of one variable is continuous, we have that \(g_i(t+h)-g_i(t)\to 0\) as \(h\to 0\text{.}\) Hence, by definition of partial derivatives,
\begin{equation*}
\frac{f(g_1(t),g_2(t+h))-f(g_1(t),g_2(t))}{g_2(t+h)-g_2(t)}
\xrightarrow{h\to 0}
\frac{\partial f}{\partial x_2}(\vect g(t))\text{.}
\end{equation*}
To deal with the remaining term we apply the mean value theorem to the function \(x_1\mapsto f(x_1,g_2(t+h))\text{.}\) By assumption, that function is differentiable and its derivative is continuous at \(g_1(t)\text{.}\) Thus, by the mean value theorem there exists \(c_h\in\mathbb R\) such that
\begin{align*}
f(g_1(t+h),g_2(t+h))\amp -f(g_1(t),g_2(t+h))\\
\amp=\bigl(g_1(t+h)-g_1(t)\bigr)\frac{\partial f}{\partial x_1}(c_h,g_2(t+h))
\end{align*}
with \(\bigl|c_h-g_1(t)\bigr|\leq\bigl|g_1(t+h)-g_1(t)\bigr|\text{.}\) As \(g_1\) is continuous at \(t\) it follows that \(c_h\to g_1(t)\) as \(h\to 0\text{.}\) Finally, by continuity of \(\partial f/\partial x_1\) at \(\vect g(t)\) we conclude that
\begin{equation*}
\frac{f(g_1(t+h),g_2(t+h))-f(g_1(t),g_2(t+h))}{g_1(t+h)-g_1(t)}
\xrightarrow{h\to 0}
\frac{\partial f}{\partial x_1}(\vect g(t))
\end{equation*}
Putting everything together, it follows that
\begin{align*}
\lim_{h\to 0}D(h)
\amp=\frac{\partial f}{\partial x_1}(\vect g(t))g_1'(t)
+\frac{\partial f}{\partial x_2}(\vect g(t))g_2'(t)\\
\amp=\grad f(\vect g(t))\cdot\vect g'(t)
\end{align*}
as required. For general \(N\) there are more terms to add and subtract, but the basic ideas stay the same.
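The identity in Theorem 4.17 can also be checked symbolically for concrete choices of \(f\) and \(\vect g\text{.}\) The following sketch uses the SymPy library with \(N=2\text{;}\) the particular functions are illustrative choices, not part of the theorem.
\begin{verbatim}
import sympy as sp

t, x1, x2 = sp.symbols('t x1 x2')

# Illustrative choices of f and g = (g1, g2); any smooth functions work.
f = x1**2 * sp.sin(x2)
g1, g2 = sp.cos(t), t**2

# Left-hand side: differentiate the composition directly.
lhs = sp.diff(f.subs({x1: g1, x2: g2}), t)

# Right-hand side: grad f evaluated at g(t), dotted with g'(t).
rhs = (sp.diff(f, x1).subs({x1: g1, x2: g2}) * sp.diff(g1, t)
       + sp.diff(f, x2).subs({x1: g1, x2: g2}) * sp.diff(g2, t))

assert sp.simplify(lhs - rhs) == 0  # both sides agree
\end{verbatim}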
If \(\vect f\) is a vector valued function then the chain rule applies to every component function. We want to use this to derive a formula for the derivative of a composition \(\vect f\circ\vect g\text{,}\) which is a vector valued function. Before we do so we introduce the `Jacobian matrix' of a function.
Definition 4.18. Jacobian matrix.
Suppose that \(\vect f\) is a function defined on a subset of \(\mathbb R^N\) with values in \(\mathbb R^k\text{.}\) If all entries exist, the matrix
\begin{equation*}
J_{\vect f}(\vect x):=
\begin{bmatrix}
\frac{\partial}{\partial x_1}f_1(\vect x) \amp \dots \amp
\frac{\partial}{\partial x_N}f_1(\vect x) \\
\vdots \amp \ddots \amp \vdots \\
\frac{\partial}{\partial x_1}f_k(\vect x) \amp \dots \amp
\frac{\partial}{\partial x_N}f_k(\vect x)
\end{bmatrix}
\end{equation*}
is called the Jacobian matrix of \(\vect f\) at \(\vect x\text{.}\)
Note that \(J_f(\vect x)=\grad f(\vect x)\) if \(f\) is scalar valued, that is, \(k=1\text{.}\) If we apply Theorem 4.17 to every component function of \(\vect f\) then we get the following formula for the Jacobian matrix of a composition of functions.
Corollary 4.19.
Suppose that every component function of \(\vect f\) and \(\vect g\) satisfies the assumptions of Theorem 4.17. Then
\begin{equation*}
J_{\vect f\circ\vect g}(\vect y)
=J_{\vect f}(\vect g(\vect y))J_{\vect g}(\vect y)\text{.}
\end{equation*}
In other words, the Jacobian matrix of a composition of vector valued functions is the matrix product of the individual Jacobian matrices.
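Corollary 4.19, too, can be verified symbolically for concrete functions. The sketch below, again using SymPy, compares the Jacobian matrix of a composition with the product of the individual Jacobian matrices; the functions \(\vect f\) and \(\vect g\) used here are illustrative choices.
\begin{verbatim}
import sympy as sp

s, t, x1, x2 = sp.symbols('s t x1 x2')

# Illustrative choices: f and g both map R^2 to R^2.
f = sp.Matrix([x1 + x2**2, x1*x2])
g = sp.Matrix([s*t, s - t])

# Jacobian of the composition f(g(s,t)), computed directly.
J_comp = f.subs({x1: g[0], x2: g[1]}).jacobian([s, t])

# Product J_f(g(s,t)) * J_g(s,t) of the individual Jacobians.
J_f = f.jacobian([x1, x2]).subs({x1: g[0], x2: g[1]})
J_g = g.jacobian([s, t])

assert sp.simplify(J_comp - J_f * J_g) == sp.zeros(2, 2)
\end{verbatim}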
Example 4.20.
Let \(f(x,y,z)=x^2-z\cos(y)\) and \(\vect g(t):=(t,t^2,1/t)\text{.}\) To compute \((f\circ\vect g)'\) we first note that \(\grad f(x,y,z)=(2x,z\sin(y),-\cos(y))\) and \(\vect g'(t)=(1,2t,-1/t^2)\text{.}\) Then by the chain rule
\begin{align*}
\frac{d}{dt}f\circ\vect g(t)\amp =\grad f(t,t^2,1/t)\cdot\vect g'(t)\\
\amp =(2t, t^{-1}\sin(t^2), -\cos(t^2))\cdot(1,2t,-1/t^2)\\
\amp =2t+2\sin(t^2)+\frac{\cos(t^2)}{t^2}\text{.}
\end{align*}
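This computation is easy to double-check symbolically, for instance with SymPy (a sketch, assuming \(t\neq 0\)):
\begin{verbatim}
import sympy as sp

t, x, y, z = sp.symbols('t x y z')

f = x**2 - z*sp.cos(y)
composed = f.subs({x: t, y: t**2, z: 1/t})

# Derivative of the composition should equal the result above.
expected = 2*t + 2*sp.sin(t**2) + sp.cos(t**2)/t**2
assert sp.simplify(sp.diff(composed, t) - expected) == 0
\end{verbatim}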
Example 4.21.
Let \(r(\vect x):=\|\vect x\|=\sqrt{x_1^2+x_2^2}\) and \(\vect g(t):=(2\cos(t),3\sin(t))\) for \(t\in(0,2\pi)\text{.}\) Compute \((r\circ\vect g)'\text{.}\)
Solution.
We start by computing the gradient of \(r\text{:}\)
\begin{align*}
\frac{\partial r}{\partial x_1}(x_1,x_2)
\amp =\frac{\partial }{\partial x_1}\sqrt{x_1^2+x_2^2}\\
\amp=\frac{2x_1}{2\sqrt{x_1^2+x_2^2}}\\
\amp=\frac{x_1}{\sqrt{x_1^2+x_2^2}}
\end{align*}
and
\begin{align*}
\frac{\partial r}{\partial x_2}(x_1,x_2)
\amp =\frac{\partial }{\partial x_2}\sqrt{x_1^2+x_2^2}\\
\amp=\frac{2x_2}{2\sqrt{x_1^2+x_2^2}}\\
\amp=\frac{x_2}{\sqrt{x_1^2+x_2^2}}\text{.}
\end{align*}
Hence
\begin{equation*}
\grad r(\vect x)
=\frac{1}{\sqrt{x_1^2+x_2^2}}(x_1,x_2)
=\frac{\vect x}{r(\vect x)}\text{.}
\end{equation*}
Next note that \(\vect g'(t)=(-2\sin(t),3\cos(t))\text{.}\) By the chain rule
\begin{align*}
\frac{d}{dt}r\circ\vect g(t)
\amp=\grad r(\vect g(t))\cdot\vect g'(t)\\
\amp =\frac{(2\cos(t),3\sin(t))}{\sqrt{4\cos^2(t)+9\sin^2(t)}}
\cdot(-2\sin(t),3\cos(t))\\
\amp =\frac{-4\sin(t)\cos(t)+9\sin(t)\cos(t)}{\sqrt{4\cos^2(t)+9\sin^2(t)}}
=\frac{5\sin(t)\cos(t)}{\sqrt{4+5\sin^2(t)}}\text{.}
\end{align*}
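As in the previous example, the result can be confirmed symbolically; the following SymPy sketch relies on \texttt{simplify} to handle the trigonometric identity in the denominator.
\begin{verbatim}
import sympy as sp

t, x1, x2 = sp.symbols('t x1 x2')

r = sp.sqrt(x1**2 + x2**2)
composed = r.subs({x1: 2*sp.cos(t), x2: 3*sp.sin(t)})

# Derivative of r(g(t)) should match the result above.
expected = 5*sp.sin(t)*sp.cos(t)/sp.sqrt(4 + 5*sp.sin(t)**2)
assert sp.simplify(sp.diff(composed, t) - expected) == 0
\end{verbatim}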
Example 4.22.
Compute the Jacobian matrix of
\begin{equation*}
\vect g(s,t):=\Bigl(\frac{-t}{s^2+t^2},\frac{s}{s^2+t^2}\Bigr)\text{.}
\end{equation*}
Solution.
We need both partial derivatives for every component function:
\begin{align*}
\frac{\partial}{\partial s}\frac{-t}{s^2+t^2}
\amp =\frac{2st}{(s^2+t^2)^2}\\
\frac{\partial}{\partial t}\frac{-t}{s^2+t^2}
\amp =-\frac{s^2+t^2-2t^2}{(s^2+t^2)^2}
=\frac{t^2-s^2}{(s^2+t^2)^2}\\
\frac{\partial}{\partial s}\frac{s}{s^2+t^2}
\amp =\frac{s^2+t^2-2s^2}{(s^2+t^2)^2}
=\frac{t^2-s^2}{(s^2+t^2)^2}\\
\frac{\partial}{\partial t}\frac{s}{s^2+t^2}
\amp =-\frac{2st}{(s^2+t^2)^2}\text{.}
\end{align*}
Hence the Jacobian matrix of \(\vect g\) is
\begin{equation*}
J_{\vect g}(s,t)=\frac{1}{(s^2+t^2)^2}
\begin{bmatrix}
2st \amp t^2-s^2 \\
t^2-s^2 \amp -2st
\end{bmatrix}\text{.}
\end{equation*}
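The matrix just computed can be confirmed with SymPy's built-in \texttt{jacobian} method (a sketch):
\begin{verbatim}
import sympy as sp

s, t = sp.symbols('s t')

g = sp.Matrix([-t/(s**2 + t**2), s/(s**2 + t**2)])
J = g.jacobian([s, t])

# Expected Jacobian matrix from the computation above.
expected = sp.Matrix([[2*s*t, t**2 - s**2],
                      [t**2 - s**2, -2*s*t]]) / (s**2 + t**2)**2
assert sp.simplify(J - expected) == sp.zeros(2, 2)
\end{verbatim}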
Example 4.23.
Consider the function \(f(x,t):=xe^{-t}\text{.}\) Then the partial derivatives are
\begin{align*}
\frac{\partial}{\partial x}f(x,t) \amp =e^{-t}\\
\frac{\partial}{\partial t}f(x,t) \amp =-xe^{-t}\text{.}
\end{align*}
If we assume that \(x\) is a function of \(t\) as well, then by the chain rule
\begin{align*}
\frac{d}{dt}f(x(t),t)\amp =\frac{\partial}{\partial x}f(x(t),t)x'(t)+\frac{\partial}{\partial t}f(x(t),t)\frac{dt}{dt}\\
\amp =e^{-t}x'(t)-x(t)e^{-t}\\
\amp=\bigl(x'(t)-x(t)\bigr)e^{-t}.
\end{align*}
Hence,
\begin{equation*}
\frac{\partial}{\partial t}f(x,t)=-xe^{-t}
\end{equation*}
and
\begin{equation*}
\frac{d}{dt}f(x,t)=(x'-x)e^{-t}
\end{equation*}
are not the same. The first is the partial derivative, and the second is called the total derivative of \(f\) with respect to \(t\text{.}\) (See also Warning 4.3.)
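The distinction can be made explicit symbolically by treating \(x\) once as an independent symbol and once as a function of \(t\text{.}\) A SymPy sketch (names chosen for illustration):
\begin{verbatim}
import sympy as sp

t, x = sp.symbols('t x')
xt = sp.Function('x')  # x regarded as a function of t

# Partial derivative: x is held fixed.
partial = sp.diff(x*sp.exp(-t), t)
assert partial == -x*sp.exp(-t)

# Total derivative: x = x(t) varies with t.
total = sp.diff(xt(t)*sp.exp(-t), t)
expected = (sp.diff(xt(t), t) - xt(t))*sp.exp(-t)
assert sp.simplify(total - expected) == 0
\end{verbatim}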