What the Standard Proof of Cauchy MVT is Hiding
The standard proof of the Cauchy Mean Value Theorem defines an auxiliary function that conveniently satisfies the hypotheses of Rolle’s theorem. It’s algebraically tidy, but it hides the most salient idea:
The Cauchy MVT is just the Lagrange MVT applied to a parametric curve.
Lagrange is a special case of Cauchy, and once you see the geometry, the auxiliary function writes itself.
The theorem
Cauchy Mean Value Theorem. Let $f$ and $g$ be continuous on $[a, b]$ and differentiable on $(a, b)$, with $g'(t) \neq 0$ for all $t \in (a, b)$. Then there exists $c \in (a, b)$ such that
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.$$
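As a concrete check, take the pair used in the figure below, $f(t) = t^2$ and $g(t) = t^3$ on $[0.5, 1.5]$, where $g'(t) = 3t^2 \neq 0$. Both sides of the theorem can be computed by hand:
$$ \frac{f(1.5) - f(0.5)}{g(1.5) - g(0.5)} = \frac{2.25 - 0.25}{3.375 - 0.125} = \frac{8}{13}, \qquad \frac{f'(c)}{g'(c)} = \frac{2c}{3c^2} = \frac{2}{3c}. $$
Setting the two equal gives $c = 13/12 \approx 1.083$, which does lie in $(0.5, 1.5)$.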
The rest of this article is about where that statement comes from geometrically.
Recalling how we proved Lagrange
In the article on the Mean Value Theorem, we worked with a function $f(x)$ on $[a, b]$. The key observation was that the secant line $L(x)$ captured all the “linear growth” of $f$ across the interval. Subtracting it from $f$, and recentering at $a$ with a horizontal shift, gave us an auxiliary function that vanished at both endpoints, making Rolle’s theorem applicable.
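For comparison with what follows, here is that auxiliary function written out in its standard form (the earlier article's exact notation may differ slightly):
$$ \phi(x) = f(x) - L(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}\,(x - a), $$
which satisfies $\phi(a) = \phi(b) = 0$.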
We’re going to do exactly the same thing here. The only difference is that our curve now lives in the plane and is parametrised by $t$.
The parametric picture
Suppose $f$ and $g$ are continuous on $[a, b]$, differentiable on $(a, b)$, and $g'(t) \neq 0$ throughout $(a, b)$.
Instead of thinking about $f$ and $g$ separately, consider the parametric curve
$$ t \;\longmapsto\; \bigl(g(t),\, f(t)\bigr), \qquad t \in [a, b]. $$
As $t$ runs from $a$ to $b$, this traces a curve in the $(X, Y)$-plane, starting at $(g(a), f(a))$ and ending at $(g(b), f(b))$.
![The parametric curve $(g(t), f(t))$ for $f(t) = t^2$, $g(t) = t^3$ on $[0.5, 1.5]$. The secant $L$ connects the two endpoints of the curve.](/images/cauchy_mvt_parametric.png)
Notice in the figure that the secant is a straight line in the $(X, Y)$-plane connecting the two endpoints of the curve; its slope is the ratio $r$ computed below. The whole game is to find a point on the curve where the tangent has that same slope, which is exactly what the theorem guarantees.
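To make "tangent slope" precise: at parameter $t$ the curve's velocity vector is $(g'(t), f'(t))$. Because $g'(t) \neq 0$, this vector is never vertical, so the tangent has a well-defined slope:
$$ \text{slope of the tangent at } t = \frac{f'(t)}{g'(t)}. $$
The theorem is the claim that this slope equals the secant slope for at least one $c \in (a, b)$.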
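The slope of the secant line joining the two endpoints is
$$ r = \frac{f(b) - f(a)}{g(b) - g(a)}. $$
The denominator is nonzero: if $g(b) = g(a)$, Rolle’s theorem applied to $g$ would give a point in $(a, b)$ where $g'(t) = 0$, contradicting the hypothesis.
Removing the trend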
The secant line through $(g(a), f(a))$ with slope $r$, expressed as a function of $t$, is
$$ L(t) = f(a) + r\,\bigl(g(t) - g(a)\bigr). $$
One thing worth pausing on: in general, $L(t)$ is not a linear function of $t$; it is linear in $g(t)$. That is the correct notion of linearity here, because the horizontal axis of our picture is $g(t)$, not $t$ itself. The term $g(a)$ is the horizontal shift that pins $L$ to the correct starting point $(g(a), f(a))$. You can verify directly: $L(a) = f(a)$ and $L(b) = f(a) + r\,(g(b)-g(a)) = f(b)$.
Now subtract $L(t)$ from $f(t)$ and define the auxiliary function
$$ \boxed{\phi(t) = f(t) - L(t) = f(t) - f(a) - r\,\bigl(g(t) - g(a)\bigr).} $$
By construction:
$$ \phi(a) = f(a) - f(a) - r\cdot 0 = 0, \qquad \phi(b) = f(b) - f(a) - r\,(g(b)-g(a)) = 0. $$
So $\phi(a) = \phi(b) = 0$. Moreover, $\phi$ is built from $f$, $g$, and constants, so it is continuous on $[a, b]$ and differentiable on $(a, b)$, which is exactly what Rolle’s theorem needs.
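The construction can be checked with exact rational arithmetic. This is a minimal sketch, assuming the figure's example $f(t) = t^2$, $g(t) = t^3$ on $[1/2, 3/2]$ and using Python's standard `fractions` module; the function names simply mirror the symbols above and are illustrative.

```python
from fractions import Fraction

# Assumed example (the one in the figure): f(t) = t^2, g(t) = t^3 on [1/2, 3/2].
a, b = Fraction(1, 2), Fraction(3, 2)


def f(t):
    return t ** 2


def g(t):
    return t ** 3


# Slope of the secant in the (X, Y) = (g(t), f(t)) plane.
r = (f(b) - f(a)) / (g(b) - g(a))


def L(t):
    """Secant trend: linear in g(t), not (in general) in t."""
    return f(a) + r * (g(t) - g(a))


def phi(t):
    """Auxiliary function: f with its secant trend removed."""
    return f(t) - L(t)


print("r =", r)                                # r = 8/13
print("phi(a) =", phi(a), "phi(b) =", phi(b))  # phi(a) = 0 phi(b) = 0
print("phi(1) =", phi(Fraction(1)))            # phi(1) = 11/52
```

Exact fractions make the endpoint values come out as exactly `0` rather than floating-point residue, while the nonzero value at $t = 1$ shows that $\phi$ is not identically zero.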
Applying Rolle’s theorem
By Rolle’s theorem, there exists a $c \in (a, b)$ such that $\phi'(c) = 0$. Differentiating:
$$ \phi'(t) = f'(t) - r\, g'(t). $$
Setting $\phi'(c) = 0$:
$$ f'(c) - r\, g'(c) = 0 \implies r = \frac{f'(c)}{g'(c)}, $$
where dividing by $g'(c)$ is legitimate because $g'(c) \neq 0$ by hypothesis. Substituting back the definition of $r$:
$$ \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}. $$
This is the Cauchy Mean Value Theorem: there exists $c \in (a, b)$ where the slope of the secant in the parametric plane equals the ratio of the instantaneous rates of change.
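The Rolle step can also be carried out numerically by searching for a root of $\phi'(t) = f'(t) - r\,g'(t)$. The sketch below uses plain bisection; the helper name `cauchy_point` and the tolerance are hypothetical choices for illustration, and bisection assumes $\phi'$ has opposite signs at $a$ and $b$, which Rolle’s theorem by itself does not promise. For the figure's example, solving $2c = r \cdot 3c^2$ by hand with $r = 8/13$ gives $c = 13/12$, so the search should land there.

```python
# Assumed example pair from the figure, with derivatives written out by hand.
def f(t):
    return t ** 2


def fp(t):
    return 2 * t


def g(t):
    return t ** 3


def gp(t):
    return 3 * t ** 2


def cauchy_point(f, fp, g, gp, a, b, tol=1e-12):
    """Find c in (a, b) with phi'(c) = f'(c) - r * g'(c) = 0 by bisection.

    Assumes phi' has opposite signs at a and b. Rolle's theorem alone does
    not guarantee such a bracket, so a ValueError is raised if it fails.
    """
    r = (f(b) - f(a)) / (g(b) - g(a))  # secant slope in the (g, f)-plane

    def dphi(t):
        return fp(t) - r * gp(t)

    lo, hi = a, b
    if dphi(lo) * dphi(hi) > 0:
        raise ValueError("phi' does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dphi(lo) * dphi(mid) <= 0:
            hi = mid  # a sign change (or root) lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2, r


c, r = cauchy_point(f, fp, g, gp, 0.5, 1.5)
print(c)                 # approximately 1.0833 (= 13/12)
print(fp(c) / gp(c), r)  # tangent slope and secant slope agree (about 0.6154)
```

Bisection was chosen only because it needs nothing beyond a sign change; any one-dimensional root finder would serve the same purpose.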
Lagrange as a special case
Set $g(t) = t$. Then $g'(t) = 1$, and the Cauchy MVT immediately collapses to
$$ \frac{f(b) - f(a)}{b - a} = f'(c), $$
which is exactly the Lagrange MVT. The parametric picture reduces to the ordinary $(x, f(x))$ graph you’re already familiar with.
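For instance, with $f(t) = t^2$ on $[0.5, 1.5]$ and $g(t) = t$:
$$ \frac{f(1.5) - f(0.5)}{1.5 - 0.5} = 2.25 - 0.25 = 2 = 2c \implies c = 1, $$
which is the midpoint of the interval, as it is for any quadratic.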