Khayyam Math

The chain rule: differentiating composite functions

Differentiate f(g(x)) by multiplying f'(g(x)) by g'(x). It is among the most frequently used rules in calculus.

[Diagram: input x → inner g(x) = 3x² + 1 → outer f(u) = u⁵ → output y. dy/dx = f'(g(x)) · g'(x) = 5·(3x² + 1)⁴ · 6x = 30x · (3x² + 1)⁴: the outer derivative at the inner value, times the inner derivative.]


What this shows

A composite function f(g(x)) feeds the output of one function into another. The chain rule says: differentiate the outer function with the inner function still inside, then multiply by the derivative of the inner function. Symbolically:

    d/dx [ f(g(x)) ] = f'(g(x)) · g'(x)

For y = (3x² + 1)⁵, identify the layers:

    outer  f(u) = u⁵        →   f'(u) = 5u⁴
    inner  g(x) = 3x² + 1   →   g'(x) = 6x

Apply the rule:

    dy/dx = f'(g(x)) · g'(x)
          = 5·(3x² + 1)⁴ · 6x
          = 30x · (3x² + 1)⁴

Notice that the inner function 3x² + 1 stays intact inside f'. A common mistake is to replace or differentiate it there, for example writing 5x⁴ · 6x or 5·(6x)⁴. The expression (3x² + 1) must remain, because f' is evaluated at the inner value g(x), not at x.
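As a sanity check, the result above can be compared with a numerical estimate of the slope. The sketch below is illustrative only: the helper name `central_difference`, the step size `h`, and the test point `x0 = 0.5` are assumptions chosen for the example, not part of the original page.

```python
def g(x):
    """Inner function g(x) = 3x^2 + 1."""
    return 3 * x**2 + 1


def f(u):
    """Outer function f(u) = u^5."""
    return u**5


def y(x):
    """Composite y = f(g(x)) = (3x^2 + 1)^5."""
    return f(g(x))


def dy_dx(x):
    """Chain-rule result: f'(g(x)) * g'(x) = 5*(3x^2 + 1)^4 * 6x."""
    return 5 * g(x)**4 * 6 * x


def central_difference(func, x, h=1e-6):
    """Hypothetical helper: symmetric finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 0.5  # arbitrary test point
exact = dy_dx(x0)
approx = central_difference(y, x0)
print(f"chain rule: {exact:.6f}   finite difference: {approx:.6f}")
```

The two numbers should agree to several decimal places. Writing 5x⁴ · 6x in `dy_dx` instead would make the comparison fail, which is a quick way to catch the mistake described above.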

Where it shows up

The chain rule underlies much of applied calculus. Related rates (a balloon's volume changing through its radius), implicit differentiation (dy/dx for y² + x² = 1), and the gradient of a deep-learning loss function (the gradients that gradient descent follows are computed by applying the chain rule many times) all rely on it.

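Generalisations include the multivariate chain rule (for functions of several variables) and its matrix form in terms of Jacobians (the basis of back-propagation in neural networks). The derivative of an inverse function is a consequence rather than a generalisation: applying the chain rule to f(g(x)) = x, where g is the inverse of f, gives g'(x) = 1 / f'(g(x)) wherever f'(g(x)) ≠ 0.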

Frequently asked questions

What if there are more than two layers?

Apply the chain rule repeatedly, from the outside in. For h(g(f(x))) you get h'(g(f(x))) · g'(f(x)) · f'(x). Each layer contributes its derivative evaluated at whatever was inside it.
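The repeated rule can be sketched for any number of layers. This is a minimal illustration, assuming each layer is supplied together with its derivative. The function name `chain_derivative` and the example layers (square, exp, sin) are hypothetical choices for the sketch. The loop runs from the innermost layer outwards, which gives the same product as working from the outside in, because scalar multiplication commutes.

```python
import math


def chain_derivative(layers, x):
    """Evaluate a composition and its derivative at x.

    layers: list of (function, derivative) pairs, innermost first.
    Returns (value, derivative) of the full composition.
    """
    value = x
    derivative = 1.0
    for func, dfunc in layers:
        # Each layer contributes its derivative evaluated at whatever is inside it.
        derivative *= dfunc(value)
        value = func(value)
    return value, derivative


# Example composition: sin(exp(x^2)), written innermost first.
layers = [
    (lambda x: x**2, lambda x: 2 * x),  # f(x) = x^2,   f'(x) = 2x
    (math.exp, math.exp),               # g(v) = e^v,   g'(v) = e^v
    (math.sin, math.cos),               # h(w) = sin w, h'(w) = cos w
]

x0 = 0.7  # arbitrary test point
value, slope = chain_derivative(layers, x0)

# Closed form h'(g(f(x))) * g'(f(x)) * f'(x) for comparison.
closed_form = math.cos(math.exp(x0**2)) * math.exp(x0**2) * 2 * x0
print(slope, closed_form)
```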

Why does it work?

From the definition of the derivative: with u = g(x) and y = f(u), Δy/Δx = (Δy/Δu)·(Δu/Δx) whenever Δu ≠ 0. If g is differentiable at x and f is differentiable at g(x), then letting Δx → 0 forces Δu → 0, so both ratios tend to derivatives and the product becomes the chain rule. A rigorous proof handles the case Δu = 0 separately to avoid dividing by zero.
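The limiting argument can be watched numerically. The sketch below reuses the page's example y = (3x² + 1)⁵ and shrinks Δx. The test point `x0 = 0.5` and the step sizes are assumptions chosen so that Δu is never zero. It illustrates the argument and is not a proof.

```python
def g(x):
    """Inner function u = g(x) = 3x^2 + 1."""
    return 3 * x**2 + 1


def f(u):
    """Outer function y = f(u) = u^5."""
    return u**5


x0 = 0.5                          # arbitrary test point
exact = 30 * x0 * g(x0)**4        # chain-rule derivative 30x * (3x^2 + 1)^4

errors = []
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    du = g(x0 + dx) - g(x0)               # change in the inner value
    dy = f(g(x0 + dx)) - f(g(x0))         # change in the output
    product = (dy / du) * (du / dx)       # (Δy/Δu) · (Δu/Δx), valid since du != 0
    errors.append(abs(product - exact))
    print(f"dx={dx:g}  product of ratios={product:.6f}  error={errors[-1]:.6f}")
```

The error shrinks with Δx, which is what the limit in the argument above predicts.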

How is it related to back-propagation in neural networks?

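A neural network is a deeply nested composition of layers, each parameterised by weights. Back-propagation computes the gradient of the loss with respect to each weight by repeated application of the chain rule, working backwards from the loss layer to the input layer. The order of multiplication is what makes it efficient: because the loss is a single scalar, starting at the loss end lets one backward pass produce the gradient for all n weights at a cost comparable to the forward pass, whereas differentiating with respect to one weight at a time would take about n passes.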
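A toy example can make the backward sweep concrete. The sketch below is an illustration under stated assumptions, not how any particular library implements back-propagation. The network is a hypothetical chain of scalar layers a_k = tanh(w_k · a_{k-1}) with a squared-error loss, and the names `forward`, `backward`, and `loss` are invented for the example.

```python
import math


def forward(weights, x):
    """Return all activations: a_0 = x, a_k = tanh(w_k * a_{k-1})."""
    activations = [x]
    for w in weights:
        activations.append(math.tanh(w * activations[-1]))
    return activations


def loss(weights, x, target):
    """Squared error between the final activation and the target."""
    return (forward(weights, x)[-1] - target) ** 2


def backward(weights, x, target):
    """Gradient of the loss with respect to every weight in one backward sweep."""
    activations = forward(weights, x)
    grads = [0.0] * len(weights)
    upstream = 2 * (activations[-1] - target)      # dL/d(final activation)
    for k in reversed(range(len(weights))):
        a_in, a_out = activations[k], activations[k + 1]
        local = 1 - a_out**2                       # tanh'(z) = 1 - tanh(z)^2
        grads[k] = upstream * local * a_in         # dL/dw_k
        upstream = upstream * local * weights[k]   # dL/d(input of layer k)
    return grads


weights = [0.5, -1.2, 0.8]   # arbitrary example weights
x0, target = 0.3, 0.1        # arbitrary input and target
grads = backward(weights, x0, target)
print(grads)
```

The single `upstream` value is reused by every layer, so the whole gradient costs one forward and one backward pass. Estimating each entry by perturbing one weight at a time would instead re-run the forward pass for every weight.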

What's the multivariate version?

If z = f(x, y) where x = g(t) and y = h(t), then dz/dt = (∂f/∂x)·(dx/dt) + (∂f/∂y)·(dy/dt). Each path from t to z contributes a product of partial derivatives along the path; sum over all paths.
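The two-path sum can be checked on a concrete case. The choice f(x, y) = x²y with x = cos t and y = sin t is an assumption made for illustration, as is the test point `t0`.

```python
import math


def z_of_t(t):
    """z = f(x, y) = x^2 * y with x = cos t, y = sin t."""
    x, y = math.cos(t), math.sin(t)
    return x**2 * y


def dz_dt(t):
    """Multivariate chain rule: (df/dx)(dx/dt) + (df/dy)(dy/dt)."""
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    df_dx = 2 * x * y    # partial derivative of x^2 * y with respect to x
    df_dy = x**2         # partial derivative of x^2 * y with respect to y
    return df_dx * dx_dt + df_dy * dy_dt   # one term per path from t to z


t0 = 0.4   # arbitrary test point
h = 1e-6
numeric = (z_of_t(t0 + h) - z_of_t(t0 - h)) / (2 * h)
print(dz_dt(t0), numeric)
```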

Related topics