Section 4.3 The Chain Rule

If

f, g

are differentiable functions of one variable and

g (x)

is in the domain of

f

then the chain rule asserts that

(f \circ g)^{'} (x) = f^{'} (g (x)) g^{'} (x),

as you know from first year calculus. We now want to generalise the above formula to the case where

f

is a function of several variables. One key requirement was that

g

has its values in the domain of

f .

Hence, if

f

is defined on a subset

D \subset R^{N},

then

g

must be a function with values in

R^{N},

that is, a vector valued function. In general

g

could be a function of several variables as well.

In what follows we assume that

f

is a function of

x = (x_{1}, \dots, x_{N}),

and that

g

is a vector valued function with values in

R^{N}

of the variables

y = (y_{1}, \dots, y_{k}) .

As usual, we denote the component functions of

g

by

g_{i},

i = 1, \dots, N .

We are now in a position to state one possible version of the chain rule in vector calculus.

Theorem 4.17. Chain rule.

Suppose that

f,

g

are as above. Moreover, assume that

y

is an interior point of the domain of

f \circ g,

that

g

has a partial derivative with respect to

y_{j},

and that

grad f (g (y))

exists and is continuous at

y .

Then

\begin{aligned} \frac{\partial}{\partial y_{j}} f (g (y)) & = grad f (g (y)) \cdot \frac{\partial}{\partial y_{j}} g (y) \\ & = \sum_{i = 1}^{N} \frac{\partial f}{\partial x_{i}} (g (y)) \frac{\partial}{\partial y_{j}} g_{i} (y) . \end{aligned}

If

g

is a function of one variable,

t,

then

\begin{aligned} \frac{d}{d t} f (g (t)) & = grad f (g (t)) \cdot g^{'} (t) \\ & = \sum_{i = 1}^{N} \frac{\partial f}{\partial x_{i}} (g (t)) \frac{d}{d t} g_{i} (t) . \end{aligned}
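
As a quick illustration of the one-variable formula, consider the following worked example. The functions are our own choice and are not taken from the text: we take N = 2, f (x_{1}, x_{2}) = x_{1}^{2} x_{2} and g (t) = (\cos t, \sin t).

```latex
\begin{aligned}
grad f (x) & = (2 x_{1} x_{2}, \; x_{1}^{2}), \qquad g^{'} (t) = (- \sin t, \; \cos t), \\
\frac{d}{d t} f (g (t)) & = 2 \cos t \, \sin t \cdot (- \sin t) + \cos^{2} t \cdot \cos t \\
& = \cos^{3} t - 2 \cos t \, \sin^{2} t .
\end{aligned}
```

Differentiating f (g (t)) = \cos^{2} t \, \sin t directly with the product rule gives the same expression, as the chain rule predicts.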

Proof.

By definition of the partial derivative of a function we treat all variables as constant except for one. Hence the proof of the first formula can be reduced to the proof of the second one. The general proof is quite tedious, so we only illustrate the main ideas in the case

N = 2 .

We need to compute the limit as

h \to 0

of

D (h) := \frac{1}{h} (f (g_{1} (t + h), g_{2} (t + h)) - f (g_{1} (t), g_{2} (t))) .

To do so we rewrite the expression as

\begin{aligned} D (h) & = \frac{f (g_{1} (t + h), g_{2} (t + h)) - f (g_{1} (t), g_{2} (t + h))}{g_{1} (t + h) - g_{1} (t)} \frac{g_{1} (t + h) - g_{1} (t)}{h} \\ & \quad + \frac{f (g_{1} (t), g_{2} (t + h)) - f (g_{1} (t), g_{2} (t))}{g_{2} (t + h) - g_{2} (t)} \frac{g_{2} (t + h) - g_{2} (t)}{h} \end{aligned}

by adding and subtracting a term (assuming, for simplicity, that the denominators are nonzero). As

g_{i}

is differentiable at

t

it follows that

\lim_{h \to 0} \frac{g_{i} (t + h) - g_{i} (t)}{h} = g_{i}^{'} (t)

for

i = 1, 2 .

As a differentiable function of one variable is continuous, we have that

g_{i} (t + h) - g_{i} (t) \to 0

as

h \to 0 .

Hence, by definition of partial derivatives,

\frac{f (g_{1} (t), g_{2} (t + h)) - f (g_{1} (t), g_{2} (t))}{g_{2} (t + h) - g_{2} (t)} \overset{h \to 0}{\to} \frac{\partial f}{\partial x_{2}} (g (t)) .

To deal with the remaining term we apply the mean value theorem to the function

x_{1} \mapsto f (x_{1}, g_{2} (t + h)) .

By assumption, that function is differentiable and its derivative is continuous at

g_{1} (t) .

Thus, by the mean value theorem there exists

c_{h} \in R

such that

\begin{aligned} & f (g_{1} (t + h), g_{2} (t + h)) - f (g_{1} (t), g_{2} (t + h)) \\ & \quad = (g_{1} (t + h) - g_{1} (t)) \frac{\partial f}{\partial x_{1}} (c_{h}, g_{2} (t + h)) \end{aligned}

with

| c_{h} - g_{1} (t) | \leq | g_{1} (t + h) - g_{1} (t) | .

As

g_{1}

is continuous at

t

it follows that

c_{h} - g_{1} (t) \to 0

as

h \to 0 .

Finally, by continuity of

\partial f / \partial x_{1}

at

g (t)

we conclude that

\frac{f (g_{1} (t + h), g_{2} (t + h)) - f (g_{1} (t), g_{2} (t + h))}{g_{1} (t + h) - g_{1} (t)} \overset{h \to 0}{\to} \frac{\partial f}{\partial x_{1}} (g (t)) .

If we put everything together it follows that

\begin{aligned} \lim_{h \to 0} D (h) & = \frac{\partial f}{\partial x_{1}} (g (t)) g_{1}^{'} (t) + \frac{\partial f}{\partial x_{2}} (g (t)) g_{2}^{'} (t) \\ & = grad f (g (t)) \cdot g^{'} (t) \end{aligned}

as required. For general

N

there are more terms to add and subtract, but the basic ideas stay the same.
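
The statement of the theorem can also be checked numerically. The following is a minimal sketch, not part of the original text: the functions f and g, the point y and the step size h are hypothetical choices made purely for illustration, and the gradient of f and the partial derivatives of g are computed by hand. We compare the right-hand side of the chain rule with a central difference approximation of the left-hand side.

```python
import math


def f(x1, x2):
    # Hypothetical example function of N = 2 variables.
    return x1**2 * x2 + math.sin(x2)


def grad_f(x1, x2):
    # Gradient of f, computed by hand.
    return (2.0 * x1 * x2, x1**2 + math.cos(x2))


def g(y1, y2):
    # Hypothetical vector valued function of k = 2 variables with values in R^2.
    return (y1 * y2, y1 + y2**2)


def dg(y1, y2, j):
    # Partial derivative of g with respect to y_j (j = 1 or 2), computed by hand.
    if j == 1:
        return (y2, 1.0)
    return (y1, 2.0 * y2)


def chain_rule(y1, y2, j):
    # Right-hand side of the theorem: grad f(g(y)) . (d/dy_j) g(y).
    gf = grad_f(*g(y1, y2))
    d = dg(y1, y2, j)
    return gf[0] * d[0] + gf[1] * d[1]


def finite_difference(y1, y2, j, h=1e-6):
    # Central difference approximation of (d/dy_j) f(g(y)).
    if j == 1:
        return (f(*g(y1 + h, y2)) - f(*g(y1 - h, y2))) / (2.0 * h)
    return (f(*g(y1, y2 + h)) - f(*g(y1, y2 - h))) / (2.0 * h)


y = (0.7, -1.3)  # arbitrary test point
for j in (1, 2):
    print(j, chain_rule(*y, j), finite_difference(*y, j))
```

For each j the two printed values should agree up to the error of the finite difference approximation. Such a check is of course no substitute for the proof, but it is a convenient way to catch mistakes in a gradient computed by hand.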

If

f

is a vector valued function then the chain rule applies to every component function. We want to use this to derive a formula for the derivative of a composition