Factorize Difference of Two Powers
This article teaches how to think in order to derive the factorization of the difference of powers without memorization.
This article teaches how to think in order to derive the factorization of the difference of powers without memorization.
The standard proof of the Cauchy Mean Value Theorem defines an auxiliary function that conveniently satisfies the hypotheses of Rolle’s theorem. It’s algebraically tidy, but it hides the most salient idea:
The Cauchy MVT is just the Lagrange MVT applied to a parametric curve.
Lagrange is a special case of Cauchy, and once you see the geometry, the auxiliary function writes itself.
Cauchy Mean Value Theorem. Let $f$ and $g$ be continuous on $[a, b]$ and differentiable on $(a, b)$, with $g'(t) \neq 0$ for all $t \in (a, b)$. Then there exists $c \in (a, b)$ such that
We look at what it means to subtract the linear trend from a function, why the secant line is the natural choice, and how this simple idea unlocks the Mean Value Theorem from Rolle’s theorem.
You’re probably well acquainted with the $\sin(2x)$ function.
What transformation will transform our traditional $\sin(2x)$ into this:

$\sin(2x) + 0.4x$ in blue, its linear trend $0.4x$ in red, and $\sin(2x)$ recovered in teal after removing the trend.
Let $[a,b] \subset \mathbb{R}$ where $a \neq b$.

We partition that interval using points $x_0, x_1, \dots, x_n$ where $x_0 = a$ and $x_n = b$.

We want an expression for the total length covered by summing all the partition pieces. Each piece contributes $x_{i+1} - x_i$, so the total is
A scalar function takes a point in space as input and assigns a real number to it. It is the perfect tool to quantify properties across 3d space. For example, let $T(x,y,z)$ be a temperature function that takes a position in 3d space $(x,y,z)$ and returns a scalar temperature. Nothing constrains us to temperature of course. Scalar functions describe temperature, height, density, and much more. When we apply $\nabla$ to a scalar function $f$, we obtain $\nabla f$, which we define as the gradient of $f$. We often read that the gradient is the direction of steepest change, and the motivation usually given links to the directional derivative as the direction which maximises change. I propose an alternative approach that builds a more visual intuition. If
A function is a mapping from the elements of one set to another. The most popular functions map generally from a set of numbers to another. For example, the function $f : \mathbb{R} \to \mathbb{R}$ where $f(x) = x^2$ maps a real number to the square of that real number.
But a set is simply a collection of distinct objects, and nothing limits those objects to numbers alone. As wild as the idea may sound, one can define a function that maps to another function. We call this an operator.