
Section 4.3 The Chain Rule

If $f,g$ are differentiable functions of one variable and $g(x)$ is in the domain of $f$, then the chain rule asserts that
$$[f\circ g(x)]'=f'(g(x))\,g'(x),$$
as you know from first-year calculus. We now want to generalise this formula to the case where $f$ is a function of several variables. One key requirement is that $g$ takes its values in the domain of $f$. Hence, if $f$ is defined on a subset $D\subseteq\mathbb{R}^N$, then $g$ must be a function with values in $\mathbb{R}^N$, that is, a vector-valued function. In general, $g$ could be a function of several variables as well.
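In what follows we assume that $f$ is a function of $x=(x_1,\dots,x_N)$, and that $g$ is a vector-valued function with values in $\mathbb{R}^N$ of the variables $y=(y_1,\dots,y_k)$. As usual, we denote the component functions of $g$ by $g_i$, $i=1,\dots,N$. We are now in a position to state one possible version of the chain rule in vector calculus.

Theorem 4.17.

Suppose that $f$ has continuous partial derivatives on an open set $D\subseteq\mathbb{R}^N$ and that $g$ takes its values in $D$. If, for some $j\in\{1,\dots,k\}$, the partial derivatives $\frac{\partial g_i}{\partial y_j}(y)$ exist for $i=1,\dots,N$, then
$$\frac{\partial}{\partial y_j}(f\circ g)(y)=\sum_{i=1}^{N}\frac{\partial f}{\partial x_i}(g(y))\,\frac{\partial g_i}{\partial y_j}(y)=\operatorname{grad}f(g(y))\cdot\frac{\partial g}{\partial y_j}(y).$$
In particular, if $g$ is a function of one variable $t$ and is differentiable at $t$, then
$$(f\circ g)'(t)=\operatorname{grad}f(g(t))\cdot g'(t).$$
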
Proof.

By definition of the partial derivative, we treat all variables as constant except for one. Hence the proof of the first formula can be reduced to the proof of the second one. The general proof is quite tedious, so we only illustrate the main ideas for $N=2$. We need to compute the limit as $h\to 0$ of
$$D(h):=\frac{1}{h}\Bigl(f\bigl(g_1(t+h),g_2(t+h)\bigr)-f\bigl(g_1(t),g_2(t)\bigr)\Bigr).$$
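To do so, we add and subtract the term $f(g_1(t),g_2(t+h))$ and rewrite the expression as
$$D(h)=\frac{f(g_1(t+h),g_2(t+h))-f(g_1(t),g_2(t+h))}{g_1(t+h)-g_1(t)}\,\frac{g_1(t+h)-g_1(t)}{h}+\frac{f(g_1(t),g_2(t+h))-f(g_1(t),g_2(t))}{g_2(t+h)-g_2(t)}\,\frac{g_2(t+h)-g_2(t)}{h},$$
where, to keep the illustration simple, we assume that $g_i(t+h)\neq g_i(t)$ for all small $h\neq 0$, so that no denominator vanishes. As $g_i$ is differentiable at $t$, it follows that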
$$\lim_{h\to 0}\frac{g_i(t+h)-g_i(t)}{h}=g_i'(t)$$
for $i=1,2$. As a differentiable function of one variable is continuous, we have $g_i(t+h)-g_i(t)\to 0$ as $h\to 0$. Hence, by definition of partial derivatives,
$$\frac{f(g_1(t),g_2(t+h))-f(g_1(t),g_2(t))}{g_2(t+h)-g_2(t)}\xrightarrow{h\to 0}\frac{\partial f}{\partial x_2}(g(t)).$$
To deal with the remaining term, we apply the mean value theorem to the function $x_1\mapsto f(x_1,g_2(t+h))$. By assumption, that function is differentiable and its derivative is continuous at $g_1(t)$. Thus, by the mean value theorem, there exists $c_h\in\mathbb{R}$ such that
$$f(g_1(t+h),g_2(t+h))-f(g_1(t),g_2(t+h))=\bigl(g_1(t+h)-g_1(t)\bigr)\frac{\partial f}{\partial x_1}\bigl(c_h,g_2(t+h)\bigr)$$
with $|c_h-g_1(t)|\leq|g_1(t+h)-g_1(t)|$. As $g_1$ is continuous at $t$, it follows that $c_h-g_1(t)\to 0$ as $h\to 0$. Finally, since $(c_h,g_2(t+h))\to g(t)$ and $\partial f/\partial x_1$ is continuous at $g(t)$, we conclude that
$$\frac{f(g_1(t+h),g_2(t+h))-f(g_1(t),g_2(t+h))}{g_1(t+h)-g_1(t)}=\frac{\partial f}{\partial x_1}\bigl(c_h,g_2(t+h)\bigr)\to\frac{\partial f}{\partial x_1}(g(t))$$
as $h\to 0$. If we put everything together, it follows that
$$\lim_{h\to 0}D(h)=\frac{\partial f}{\partial x_1}(g(t))\,g_1'(t)+\frac{\partial f}{\partial x_2}(g(t))\,g_2'(t)=\operatorname{grad}f(g(t))\cdot g'(t)$$
as required. For general N there are more terms to add and subtract, but the basic ideas stay the same.
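The limit computed in the proof can also be watched numerically. The following Python sketch uses hypothetical test functions of our own, $f(x_1,x_2)=x_1^2x_2+\sin(x_2)$ and $g(t)=(\cos t,t^2)$, chosen only for illustration. It evaluates the difference quotient $D(h)$ for shrinking $h$ and compares it with $\operatorname{grad}f(g(t))\cdot g'(t)$; the helper names `D`, `grad_f` and `chain_rule` are likewise our own.

```python
import math


# Hypothetical test function (not from the text): f(x1, x2) = x1**2 * x2 + sin(x2)
def f(x1, x2):
    return x1**2 * x2 + math.sin(x2)


def grad_f(x1, x2):
    """Gradient (df/dx1, df/dx2), computed by hand."""
    return (2 * x1 * x2, x1**2 + math.cos(x2))


# Hypothetical curve g(t) = (cos t, t**2) and its derivative
def g(t):
    return (math.cos(t), t**2)


def g_prime(t):
    return (-math.sin(t), 2 * t)


def D(t, h):
    """The difference quotient D(h) = (f(g(t+h)) - f(g(t))) / h from the proof."""
    return (f(*g(t + h)) - f(*g(t))) / h


def chain_rule(t):
    """grad f(g(t)) . g'(t), the claimed value of the limit."""
    df = grad_f(*g(t))
    dg = g_prime(t)
    return df[0] * dg[0] + df[1] * dg[1]


t = 0.7
for h in (1e-1, 1e-3, 1e-5):
    print(f"h = {h:g}: D(h) = {D(t, h):.8f}, chain rule = {chain_rule(t):.8f}")
```

As $h$ shrinks, the printed values of $D(h)$ should approach the chain-rule value; for this one-sided quotient the error is roughly proportional to $h$.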
If $f$ is a vector-valued function, then the chain rule applies to every component function. We want to use this to derive a formula for the derivative of a composition $f\circ g$, which is a vector-valued function. Before we do so, we introduce the Jacobian matrix of a function.

Definition 4.18. Jacobian matrix.

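Suppose that $f$ is a function defined on a subset of $\mathbb{R}^N$ with values in $\mathbb{R}^k$. If all entries exist, the matrix
$$J_f(x):=\begin{bmatrix}\frac{\partial}{\partial x_1}f_1(x)&\cdots&\frac{\partial}{\partial x_N}f_1(x)\\ \vdots&\ddots&\vdots\\ \frac{\partial}{\partial x_1}f_k(x)&\cdots&\frac{\partial}{\partial x_N}f_k(x)\end{bmatrix}$$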
is called the Jacobian matrix of f at x.
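Each entry of $J_f(x)$ is an ordinary partial derivative, so the matrix can be approximated column by column with difference quotients. The Python sketch below is a minimal illustration, assuming $f$ is given as a function that takes a list of $N$ coordinates and returns a sequence of $k$ values; the helper `jacobian_fd`, the step size `h` and the test function are hypothetical choices of ours.

```python
import math


def jacobian_fd(f, x, h=1e-6):
    """Approximate the k x N Jacobian matrix of f at the point x.

    Row i corresponds to the component f_i, column j to the variable x_j.
    Each entry uses the central difference (f_i(x + h e_j) - f_i(x - h e_j)) / (2h).
    """
    n = len(x)
    k = len(f(x))
    J = [[0.0] * n for _ in range(k)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(k):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Hypothetical example with N = 2 and k = 3:
# f(x1, x2) = (x1 * x2, x1 + x2**2, sin(x1))
def f(x):
    x1, x2 = x
    return (x1 * x2, x1 + x2**2, math.sin(x1))


J = jacobian_fd(f, [1.0, 2.0])
for row in J:
    print([round(entry, 6) for entry in row])
```

By hand, the exact Jacobian of this $f$ at $(1,2)$ has rows $(2,1)$, $(1,4)$ and $(\cos 1,0)$, which the printed rows should match to about six decimal places.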
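Note that $J_f(x)=\operatorname{grad}f(x)$ if $f$ is scalar valued, that is, $k=1$. If we apply Theorem 4.17 to every component function of $f$, then we get the following formula for the Jacobian matrix of a composition of functions.

Theorem 4.19.

Suppose that $f$ is defined on an open set $D\subseteq\mathbb{R}^N$ with values in $\mathbb{R}^m$ and that each component function $f_1,\dots,f_m$ has continuous partial derivatives on $D$. If $g$ is a function of $y=(y_1,\dots,y_k)$ with values in $D$ and all partial derivatives of $g$ exist at $y$, then
$$J_{f\circ g}(y)=J_f(g(y))\,J_g(y).$$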
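The formula $J_{f\circ g}(y)=J_f(g(y))\,J_g(y)$ for the Jacobian of a composition can be checked numerically. The sketch below assumes two hypothetical maps of ours, $f(x_1,x_2)=(x_1^2-x_2,\,x_1x_2)$ and $g(s,t)=(s+t,\,st)$, whose Jacobians are entered by hand; it compares the matrix product with a central-difference approximation of $J_{f\circ g}$. The helpers `matmul` and `jacobian_fd` are our own.

```python
def f(x):
    x1, x2 = x
    return (x1**2 - x2, x1 * x2)


def J_f(x):
    """Exact Jacobian of the hypothetical f."""
    x1, x2 = x
    return [[2 * x1, -1.0], [x2, x1]]


def g(y):
    s, t = y
    return (s + t, s * t)


def J_g(y):
    """Exact Jacobian of the hypothetical g."""
    s, t = y
    return [[1.0, 1.0], [t, s]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def jacobian_fd(F, y, h=1e-6):
    """Central-difference approximation of the Jacobian of F at y."""
    k, n = len(F(y)), len(y)
    J = [[0.0] * n for _ in range(k)]
    for j in range(n):
        yp, ym = list(y), list(y)
        yp[j] += h
        ym[j] -= h
        Fp, Fm = F(yp), F(ym)
        for i in range(k):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J


y = [0.5, -1.2]
chain = matmul(J_f(g(y)), J_g(y))            # J_f(g(y)) J_g(y)
direct = jacobian_fd(lambda v: f(g(v)), y)   # numerical approximation of J_{f o g}(y)
print(chain)
print(direct)
```

Both printed matrices should agree up to rounding; computing by hand at $y=(0.5,-1.2)$ gives the rows $(-0.2,-1.9)$ and $(0.24,-0.95)$.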

Example 4.20.

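Let $f(x,y,z)=x^2-z\cos(y)$ and $g(t):=(t,t^2,1/t)$. To compute $(f\circ g)'$, we first note that $\operatorname{grad}f(x,y,z)=(2x,\,z\sin(y),\,-\cos(y))$ and $g'(t)=(1,2t,-1/t^2)$. Then, by the chain rule,
$$\frac{d}{dt}f\circ g(t)=\operatorname{grad}f(t,t^2,1/t)\cdot g'(t)=\bigl(2t,\,t^{-1}\sin(t^2),\,-\cos(t^2)\bigr)\cdot\bigl(1,2t,-1/t^2\bigr)=2t+2\sin(t^2)+\frac{\cos(t^2)}{t^2}.$$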
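As a sanity check on Example 4.20, the following Python sketch compares the closed-form derivative $2t+2\sin(t^2)+\cos(t^2)/t^2$ with a central difference quotient of $t\mapsto f(g(t))$ at a few sample points. The helper names, sample points and step size are our own choices.

```python
import math


def f(x, y, z):
    return x**2 - z * math.cos(y)


def g(t):
    return (t, t**2, 1 / t)


def derivative_formula(t):
    """Result of Example 4.20: 2t + 2 sin(t^2) + cos(t^2) / t^2."""
    return 2 * t + 2 * math.sin(t**2) + math.cos(t**2) / t**2


def central_difference(F, t, h=1e-5):
    """Symmetric difference quotient approximating F'(t)."""
    return (F(t + h) - F(t - h)) / (2 * h)


for t in (0.5, 1.0, 2.0):
    numeric = central_difference(lambda s: f(*g(s)), t)
    print(t, derivative_formula(t), numeric)
```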

Example 4.21.

Let $r(x):=|x|=\sqrt{x_1^2+x_2^2}$ and $g(t):=(2\cos(t),3\sin(t))$ for $t\in(0,2\pi)$. Compute $(r\circ g)'$.
Solution.
We start by computing the gradient of r:
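$$\frac{\partial r}{\partial x_1}(x_1,x_2)=\frac{\partial}{\partial x_1}\sqrt{x_1^2+x_2^2}=\frac{2x_1}{2\sqrt{x_1^2+x_2^2}}=\frac{x_1}{\sqrt{x_1^2+x_2^2}}$$
and
$$\frac{\partial r}{\partial x_2}(x_1,x_2)=\frac{\partial}{\partial x_2}\sqrt{x_1^2+x_2^2}=\frac{2x_2}{2\sqrt{x_1^2+x_2^2}}=\frac{x_2}{\sqrt{x_1^2+x_2^2}}.$$
Hence
$$\operatorname{grad}r(x)=\frac{1}{\sqrt{x_1^2+x_2^2}}(x_1,x_2)=\frac{x}{r(x)}.$$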
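Next note that $g'(t)=(-2\sin(t),3\cos(t))$. By the chain rule,
$$\frac{d}{dt}r\circ g(t)=\operatorname{grad}r(g(t))\cdot g'(t)=\frac{(2\cos(t),3\sin(t))}{\sqrt{4\cos^2(t)+9\sin^2(t)}}\cdot(-2\sin(t),3\cos(t))=\frac{-4\sin(t)\cos(t)+9\sin(t)\cos(t)}{\sqrt{4\cos^2(t)+9\sin^2(t)}}=\frac{5\sin(t)\cos(t)}{\sqrt{4+5\sin^2(t)}}.$$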
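The result of Example 4.21 can be confirmed by differentiating $r(g(t))=\sqrt{4\cos^2(t)+9\sin^2(t)}$ numerically. The Python sketch below compares three values: $\operatorname{grad}r(g(t))\cdot g'(t)$, the closed form $5\sin(t)\cos(t)/\sqrt{4+5\sin^2(t)}$, and a central difference quotient. The function names, sample points and step size are our own.

```python
import math


def r(x1, x2):
    return math.hypot(x1, x2)  # sqrt(x1**2 + x2**2)


def grad_r(x1, x2):
    """grad r(x) = x / r(x), valid for x != 0."""
    n = r(x1, x2)
    return (x1 / n, x2 / n)


def g(t):
    return (2 * math.cos(t), 3 * math.sin(t))


def g_prime(t):
    return (-2 * math.sin(t), 3 * math.cos(t))


def via_chain_rule(t):
    """grad r(g(t)) . g'(t)"""
    a, b = grad_r(*g(t))
    c, d = g_prime(t)
    return a * c + b * d


def closed_form(t):
    return 5 * math.sin(t) * math.cos(t) / math.sqrt(4 + 5 * math.sin(t) ** 2)


def central_difference(F, t, h=1e-5):
    return (F(t + h) - F(t - h)) / (2 * h)


for t in (0.3, 1.0, 2.5, 4.0):
    numeric = central_difference(lambda s: r(*g(s)), t)
    print(t, via_chain_rule(t), closed_form(t), numeric)
```

Since $|g(t)|\geq 2$ for all $t$, the gradient of $r$ is never evaluated at the origin here.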

Example 4.22.

Compute the Jacobian matrix of
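$$g(s,t):=\Bigl(\frac{-t}{s^2+t^2},\,\frac{s}{s^2+t^2}\Bigr).$$
Solution.
We need both partial derivatives for every component function:
$$\begin{aligned}\frac{\partial}{\partial s}\frac{-t}{s^2+t^2}&=\frac{2st}{(s^2+t^2)^2}, & \frac{\partial}{\partial t}\frac{-t}{s^2+t^2}&=-\frac{s^2+t^2-2t^2}{(s^2+t^2)^2}=\frac{t^2-s^2}{(s^2+t^2)^2},\\ \frac{\partial}{\partial s}\frac{s}{s^2+t^2}&=\frac{s^2+t^2-2s^2}{(s^2+t^2)^2}=\frac{t^2-s^2}{(s^2+t^2)^2}, & \frac{\partial}{\partial t}\frac{s}{s^2+t^2}&=\frac{-2st}{(s^2+t^2)^2}.\end{aligned}$$
Hence the Jacobian matrix of $g$ is
$$J_g(s,t)=\frac{1}{(s^2+t^2)^2}\begin{bmatrix}2st&t^2-s^2\\ t^2-s^2&-2st\end{bmatrix}.$$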
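A quick numerical check of the Jacobian in Example 4.22: the Python sketch below evaluates the closed-form matrix at a sample point away from the origin and compares each entry with a central difference quotient of the corresponding component of $g$. The helper names, step size and sample point $(1.5,-0.5)$ are our own.

```python
def g(s, t):
    d = s**2 + t**2
    return (-t / d, s / d)


def J_g(s, t):
    """Closed-form Jacobian from Example 4.22 (valid for (s, t) != (0, 0))."""
    c = 1 / (s**2 + t**2) ** 2
    return [[c * 2 * s * t, c * (t**2 - s**2)],
            [c * (t**2 - s**2), c * (-2 * s * t)]]


def J_fd(s, t, h=1e-6):
    """Central-difference Jacobian: column 0 varies s, column 1 varies t."""
    ds = [(a - b) / (2 * h) for a, b in zip(g(s + h, t), g(s - h, t))]
    dt = [(a - b) / (2 * h) for a, b in zip(g(s, t + h), g(s, t - h))]
    return [[ds[0], dt[0]], [ds[1], dt[1]]]


print(J_g(1.5, -0.5))
print(J_fd(1.5, -0.5))
```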

Example 4.23.

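Consider the function $f(x,t):=xe^{-t}$. Then the partial derivatives are
$$\frac{\partial}{\partial x}f(x,t)=e^{-t},\qquad\frac{\partial}{\partial t}f(x,t)=-xe^{-t}.$$
If we assume that $x$ is a function of $t$ as well, then, by the chain rule,
$$\frac{d}{dt}f(x(t),t)=\frac{\partial}{\partial x}f(x(t),t)\,x'(t)+\frac{\partial}{\partial t}f(x(t),t)\,\frac{dt}{dt}=e^{-t}x'(t)-x(t)e^{-t}=\bigl(x'(t)-x(t)\bigr)e^{-t}.$$
Hence,
$$\frac{\partial}{\partial t}f(x,t)=-xe^{-t}$$
and
$$\frac{d}{dt}f(x,t)=(x'-x)e^{-t}$$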
are not the same. The first is the partial derivative, and the second is called the total derivative of f with respect to t. (See also Warning 4.3.)
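To see the difference between the partial and the total derivative numerically, the Python sketch below takes $f(x,t)=xe^{-t}$ and, purely as a hypothetical choice of ours, $x(t)=\sin(t)$. It compares $\frac{\partial f}{\partial t}(x(t),t)=-x(t)e^{-t}$ and the total derivative $(x'(t)-x(t))e^{-t}$ with a central difference quotient of $t\mapsto f(x(t),t)$.

```python
import math


def f(x, t):
    return x * math.exp(-t)


# Hypothetical choice of x as a function of t (not from the text)
def x(t):
    return math.sin(t)


def x_prime(t):
    return math.cos(t)


def partial_t(t):
    """Partial derivative of f with respect to t, evaluated at (x(t), t)."""
    return -x(t) * math.exp(-t)


def total_derivative(t):
    """d/dt f(x(t), t) = (x'(t) - x(t)) e^{-t}, by the chain rule."""
    return (x_prime(t) - x(t)) * math.exp(-t)


def central_difference(F, t, h=1e-5):
    return (F(t + h) - F(t - h)) / (2 * h)


t = 1.0
numeric = central_difference(lambda s: f(x(s), s), t)
print("partial derivative:", partial_t(t))
print("total derivative:  ", total_derivative(t))
print("numerical d/dt:    ", numeric)
```

Only the total derivative should agree with the numerical derivative of $t\mapsto f(x(t),t)$; the partial derivative ignores the dependence of $x$ on $t$, and here the two differ by $x'(t)e^{-t}=\cos(1)e^{-1}$.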