Big-O notation explained by a self-taught programmer
This is the first in a three post series. The second post talks about how to calculate Big-O. The third article talks about understanding the formal definition of Big-O.
Big-O notation used to be a really scary concept for me. I thought this is how "real" programmers talked about their code. It was all the more scary because the academic descriptions (such as Wikipedia) made very little sense to me. This is frustrating because the underlying concepts aren't actually that hard.
Simply put, Big-O notation is how programmers talk about algorithms. Algorithms are another scary topic which I'll cover in another post, but for our purposes, let's say that "algorithm" means a function in your program (which isn't too far off). A function's Big-O notation is determined by how it responds to different inputs. How much slower is it if we give it a list of 1000 things to work on instead of a list of 1 thing?
Consider this code:
def item_in_list(to_check, the_list): for item in the_list: if to_check == item: return True return False
So if we call this function like
item_in_list(2, [1,2,3]), it should
be pretty quick. We loop over each thing in the list and if we find
the first argument to our function, return True. If we get to the end
and we didn't find it, return False.
The "complexity" of this function is
O(n). I'll explain what this
means in just a second, but let's break down this mathematical
O(n) is read "Order of N" because the
O function is also
known as the Order function. I think this is because we're doing
approximation, which deals in "orders of magnitude".
"Orders of magnitude" is YET ANOTHER mathematical term which basically tells the difference between classes of numbers. Think the difference between 10 and 100. If you picture 10 of your closest friends and 100 people, that's a really big difference. Similarly, the difference between 1,000 and 10,000 is pretty big (in fact, its the difference between a junker car and a lightly used one). It turns out that in approximation, as long as you're within an order of magnitude, you're pretty close. If you were to guess the number of gumballs in a machine, you'd be within an order of magnitude if you said 200 gumballs. 10,000 gumballs would be WAY off.
Back to dissecting
O(n), this says that if we were to graph the time
it takes to run this function with different sized inputs (e.g. an
array of 1 item, 2 items, 3 items, etc), we'd see that it
approximately corresponds to the number of items in the array. This is
called a linear graph. This means that the line is basically straight
if you were to graph it.
Some of you may have caught that if, in the code sample above, our item was always the first item in the list, our code would be really fast! This is true, but Big-O is all about the approximate worst-case performance of doing something. The worst case for the code above is that the thing we're searching for isn't in the list at all. (Note: The math term for this is "upper bound", which means its talking about the mathematic limit of awfulness).
If you wanted to see a graph for these functions, you ignore the
function and change the variable
x. You can then type that
into Wolfram Alpha as "plot x" which will show you a diagonal
line. The reason you swap out n for x is that their graphing program
x as its variable name because it corresponds to the x
axis. The x-axis getting bigger from left to right corresponds to
giving bigger and bigger arrays to your function. The y-axis
represents time, so the higher the line, the slower it is.
So what are some other examples of this?
Consider this function:
def is_none(item): return item is None
This is a bit of a silly example, but bear with me. This function is
O(1) which is called "constant time". What this means is no
matter how big our input is, it always takes the same amount of time
to compute things. If you go back to Wolfram and
plot 1, you'll see
that it always stays the same, no matter how far right we go. If you
pass in a list of 1 million integers, it'll take about the same time
as if you were going to pass in a list of 1 integer. Constant time is
considered the best case scenario for a function.
Consider this function:
def all_combinations(the_list): results =  for item in the_list: for inner_item in the_list: results.append((item, inner_item)) return results
This matches every item in the list with every other item in the
list. If we gave it an array
[1,2,3], we'd get back
(1,3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part
of the field of combinatorics (warning: scary math terms!), which is
the mathematical field which studies combinations of things. This
function (or algorithm, if you want to sound fancy) is considered
O(n^2). This is because for every item in the list (aka
n for the
input size), we have to do
n more operations. So
n * n == n^2.
Below is a comparison of each of these graphs, for reference. You can
see that an
O(n^2) function will get slow very quickly where as
something that operates in constant time will be much better. This is
particularly useful when it comes to data structures, which I'll post
This is a pretty high level overview of Big-O notation, but hopefully gets you acquainted with the topic. There's a coursera course which can give you a more in depth view into this topic, but be warned that it will hop into mathematic notation very quickly. If anything here doesn't make sense, send me an email.
Update: I've also written about how to calculate Big-O.
I'm thinking of writing a book on this topic. If this is something you'd like to see, please express your interest in it here.