Understanding the formal definition of Big-O
This is the third in a three post series. The first post explains Big-O from a self-taught programmer's perspective. The second post talks about how to calculate Big-O.
In my original Big-O piece, I’ve made a bunch of simplifying assumptions and told you a bunch of half truths. It might seem a bit disingenuous, but really.. these half-truths are helpful because (unless you’re going after a formal CS degree) they don’t really matter.
But for those who are looking to get a formal CS degree, these sorts of differences matter to professors and the like. For that, let’s actually parse what the formal definition means.
The formal definition via wikipedia:
Let f and g be two functions defined on some subset of the real numbers. One writes
f(x) = O(g(x)) as x -> infinity
if and only if there is a positive constant M such that for all sufficiently large values of x, the absolute value of f(x) is at most M multiplied by the absolute value of g(x). That is, f(x) = O(g(x)) if and only if there exists a positive real number M and real number x0 such that |f(x)| <= M * |g(x)| for all x >= x0.
Ugh. Let’s parse that.
We have two undefined functions called f and g. Our f function is the thing we’re trying to find the Big-O of. It’s a block of code.
The g function is an attempt to classify our
f function as
O, but we’ll use
O(n) for example purposes).
Following this definition, we’ll know that our f function is
and only if we can graph the two of these and (starting at an x value
of your choosing) the lines never cross one another.
You get to choose the x value to start at in order to account for the
step in calculating Big-O where we drop all of the constants. No one
cares if the Big-O is “~O(45n)~”. We drop that because it’s the same
shape of a curve, though a bit steeper. Because the definition above
says we can choose our own constant (
M), we could just choose 46. 46n
will always be above 45n and never cross once n=1.
The same can’t be true for smaller Big-O functions. No matter what
real-number (e.g. not infinity) constant we choose to multiply
O(n) will always cross it at some point.
As always, graphing makes this a bit clearer. Here is an example of our 45n vs 46n example above.
set title "45n vs 46n" set xrange[0:10] set yrange[0:500] plot 45*x title "45n", 46*x title "46n"
You’ll see that once n=1, they never cross each other again. If we try
the 1,000,000 *
log(n) function versus a simple
O(n), we’ll see they
cross each other around the x=16,000,000 mark.
set title "log n vs n" set xrange[10000000:20000000] set yrange[10000000:20000000] set label "n approx 16.5 million" at 16500000,16000000 plot 1000000 * log(x) title "O(1,000,000 * log(n))", x title "O(n)"
Because the red line is crossed, it means the green line doesn’t “bound” the red line (i.e. it is not a boundary).
You might notice that this definition means that the
O(n³) because it will also never cross. This is true, but not