Big-O notation explained by a self-taught programmer
Big-O is easy to calculate, if you know how

Introduction

Hi there. I'm Justin Abrahms. I'm a self-taught programmer, as you likely are. I'm writing this book because of a tweet-sized statement that really hit me at the end of 2012.

"Remember the version of you from like 5 years ago? There are people like that right now. Do what you can to help them out."

5 years is a long time. In 2007, I was struggling with the practical ramifications of not understanding computer science concepts. I had just written a backend CMS for someone, and it had some awful scaling issues. I went with the requirements that were laid out by the architect without questioning them. Three months into this endeavor, the system was at a point where we could put data in and play around. Crushingly, it was dog slow. Turns out, I found myself doing O(N^2) queries on one of the main pages. Don't know what that means? Don't worry, neither did I.

To ensure it wouldn't happen again, I learned a bit more about the fundamentals of computer science. One of the nice things is that it doesn't take much learning before you see strong gains in your daily work. I went on to work for Google for a few years, where I got a much broader understanding of computer science. It's amazing how much you learn when you work with incredibly talented and smart people!

I don't want you to have to go through the painful process of needing to know something, but not having it in mental storage. That's where I was 5 years ago. By finishing this book, you should have enough of an understanding of what's going on under the hood of your code that you'll be able to talk about why something might be slow at a conceptual level. You'll know of some ways to speed up your code by using the correct tool (data structure) for the job.

This book is broken up into two sections. In the first section, we talk about Big-O. Big-O is how we talk about the speed of an "algorithm". That's not as scary as it might sound. We'll cover it all nice and slow.

In the second section, we'll talk about data structures and algorithms. Data structures are basically containers that hold data. If you've written more than 5 lines of code, you've certainly used one. Algorithms can be thought of as things we do to data structures. We'll go through a few of the most common data structures, what their relative merits are, and how they work (at a high level). At the end of this section, you'll have written a little bit of code. The language you choose doesn't matter much; they're all about the same for this. That's one of the great things about working with theory!

Big-O

This is the first in a three-post series. The second post talks about how to calculate Big-O. The third article talks about understanding the formal definition of Big-O.

Big-O notation used to be a really scary concept for me. I thought this was how "real" programmers talked about their code. It was all the more scary because the academic descriptions (such as Wikipedia) made very little sense to me. This is frustrating because the underlying concepts aren't actually that hard.

Simply put, Big-O notation is how programmers talk about algorithms. Algorithms are another scary topic which I'll cover in another post, but for our purposes, let's say that "algorithm" means a function in your program (which isn't too far off). A function's Big-O notation is determined by how it responds to different inputs. How much slower is it if we give it a list of 1000 things to work on instead of a list of 1 thing?

Consider this code:

def item_in_list(to_check, the_list):
    for item in the_list:
        if to_check == item:
            return True
    return False

So if we call this function like item_in_list(2, [1,2,3]), it should be pretty quick. We loop over each thing in the list and if we find the first argument to our function, return True. If we get to the end and we didn't find it, return False.

The "complexity" of this function is O(n). I'll explain what this means in just a second, but let's break down this mathematical syntax. O(n) is read "Order of N" because the O function is also known as the Order function. I think this is because we're doing approximation, which deals in "orders of magnitude".

"Orders of magnitude" is YET ANOTHER mathematical term which basically tells the difference between classes of numbers. Think the difference between 10 and 100. If you picture 10 of your closest friends and 100 people, that's a really big difference. Similarly, the difference between 1,000 and 10,000 is pretty big (in fact, its the difference between a junker car and a lightly used one). It turns out that in approximation, as long as you're within an order of magnitude, you're pretty close. If you were to guess the number of gumballs in a machine, you'd be within an order of magnitude if you said 200 gumballs. 10,000 gumballs would be WAY off.

Figure 1: A gumball machine whose number of gumballs is probably within an order of magnitude of 200.

Back to dissecting O(n), this says that if we were to graph the time it takes to run this function with different sized inputs (e.g. an array of 1 item, 2 items, 3 items, etc), we'd see that it approximately corresponds to the number of items in the array. This is called a linear graph. This means that the line is basically straight if you were to graph it.

Some of you may have caught that if, in the code sample above, our item was always the first item in the list, our code would be really fast! This is true, but Big-O is all about the approximate worst-case performance of doing something. The worst case for the code above is that the thing we're searching for isn't in the list at all. (Note: the math term for this is "upper bound", which means it's talking about the mathematical limit of awfulness.)
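
If you want to see that linear growth on your own machine, here's a minimal sketch (it isn't part of the original example, and the exact numbers will depend on your hardware) that times the worst case, searching for something that isn't in the list, on bigger and bigger inputs:

import timeit

def item_in_list(to_check, the_list):
    for item in the_list:
        if to_check == item:
            return True
    return False

# Worst case: -1 is never in the list, so every element gets checked.
for size in [1_000, 10_000, 100_000]:
    the_list = list(range(size))
    seconds = timeit.timeit(lambda: item_in_list(-1, the_list), number=100)
    print(f"{size:>7} items: {seconds:.4f} seconds")

Each row should take roughly ten times longer than the one before it, which is exactly the straight-line growth described above.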

If you want to see a graph for these functions, you ignore the O() function and change the variable n to x. You can then type that into Wolfram Alpha as "plot x", which will show you a diagonal line. The reason you swap out n for x is that their graphing program wants x as its variable name, because it corresponds to the x-axis. The x-axis getting bigger from left to right corresponds to giving bigger and bigger arrays to your function. The y-axis represents time, so the higher the line, the slower it is.

Figure 2: Runtime characteristics of an O(n) function

So what are some other examples of this?

Consider this function:

def is_none(item):
    return item is None

This is a bit of a silly example, but bear with me. This function is O(1), which is called "constant time". What this means is that no matter how big our input is, it always takes the same amount of time to compute. If you go back to Wolfram and plot 1, you'll see that the line always stays the same, no matter how far right we go. If you pass in a list of 1 million integers, it'll take about the same time as if you passed in a list of 1 integer. Constant time is considered the best-case scenario for a function.

Figure 3: Runtime characteristics of an O(1) function
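
To convince yourself that the input size really doesn't matter here, here's a minimal sketch (not part of the original example; the absolute numbers depend on your machine) that times is_none on a single integer and on a list of a million integers:

import timeit

def is_none(item):
    return item is None

small = 1
big = list(range(1_000_000))

# Both timings should be roughly the same, because is_none never
# looks inside its argument.
print(timeit.timeit(lambda: is_none(small), number=1_000_000))
print(timeit.timeit(lambda: is_none(big), number=1_000_000))

The two timings should be nearly identical, which is exactly what the flat line in the graph above is showing.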

Consider this function:

def all_combinations(the_list):
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results

This matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we'd get back [(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)]. This is part of combinatorics (warning: scary math term!), the mathematical field that studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (n being the input size), we have to do n more operations. So n * n == n^2.
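
Here's a minimal sketch (not from the original article) that makes the quadratic growth visible: doubling the input size roughly quadruples the number of pairs produced, and therefore the work done:

# Same definition as above, repeated so this snippet runs on its own.
def all_combinations(the_list):
    results = []
    for item in the_list:
        for inner_item in the_list:
            results.append((item, inner_item))
    return results

for size in [10, 20, 40]:
    pairs = all_combinations(list(range(size)))
    print(f"{size} items -> {len(pairs)} pairs")  # 100, 400, 1600

Doubling the input quadruples the output, which is the signature of quadratic growth.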

Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly whereas something that operates in constant time will be much better. This is particularly useful when it comes to data structures, which I'll post about soon.

Figure 4: Comparison of O(n^2) vs O(n) vs O(1) functions
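
If you'd rather generate a comparison like this yourself instead of typing it into Wolfram Alpha, here's a minimal sketch that does it with matplotlib (assuming you have it installed; it isn't something the original article uses):

import matplotlib.pyplot as plt

ns = list(range(1, 51))
plt.plot(ns, [1 for n in ns], label="O(1)")       # constant
plt.plot(ns, [n for n in ns], label="O(n)")       # linear
plt.plot(ns, [n * n for n in ns], label="O(n^2)") # quadratic
plt.xlabel("input size (n)")
plt.ylabel("time")
plt.legend()
plt.show()

The shape should match Figure 4: the quadratic curve quickly dwarfs the other two.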

This is a pretty high-level overview of Big-O notation, but hopefully it gets you acquainted with the topic. There's a Coursera course which can give you a more in-depth view of this topic, but be warned that it will hop into mathematical notation very quickly. If anything here doesn't make sense, send me an email.

Update: I've also written about how to calculate Big-O.

I'm thinking of writing a book on this topic. If this is something you'd like to see, please express your interest in it here.

Calculating Big-O

This is the second in a three-post series. The first post explains Big-O from a self-taught programmer's perspective. The third article talks about understanding the formal definition of Big-O.

So now that we know what Big-O is, how do we calculate the Big-O classification of a given function? It's just as easy as following along with your code and counting along the way.

  def count_ones(a_list):
      total = 0
      for element in a_list:
          if element == 1:
              total += 1
      return total

The above code is a classic example of an O(n) function. This is because we have to loop over every element that we get in order to complete our calculation. We can trace this by following along the code and counting. We're doing a few calculations here.

  1. We're setting total to zero.
  2. We loop through a_list.
  3. We check if element is equal to 1.
  4. We increment total by one a few times.

So setting a counter to zero is a constant time operation. If you think about what's happening inside the computer, we're setting a chunk of memory to some new value. Because we've hard-coded the zero here, it happens in constant time. There's no way to call this function, or to alter global state, that would change how long this operation takes. This is an O(1) operation.

Next, we loop through the list. So we have to look at each item in the list. This number of operations changes depending on the size of the list. If it's a list of 10 things, we do it 10 times. If it's 75, we do 75 operations. In mathematical terms, this means that the time it takes to do something increases linearly with its input. We use a variable to represent the size of the input, which everyone in the industry calls n. So the "loop over the list" function is O(n) where n represents the size of a_list.

Checking whether an element is equal to 1 is an O(1) operation. A way to prove to ourselves that this is true is to think about it as a function. If we had an is_equal_to_one() function, would it take longer if you passed in more values (e.g. is_equal_to_one(4) vs is_equal_to_one([1,2,3,4,5]))? It wouldn't, because of the fixed number 1 in our comparison. It took me a while of conferring with other people about why this was true before I was convinced. We decided that binary comparisons are constant time, and that's what a comparison to 1 eventually boils down to.
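
That is_equal_to_one function is hypothetical (it doesn't appear in the code above), but a minimal sketch of it makes the point concrete:

def is_equal_to_one(value):
    # One comparison against the fixed value 1, no matter what we're given.
    return value == 1

print(is_equal_to_one(4))          # False
print(is_equal_to_one(1))          # True
print(is_equal_to_one([1, 2, 3]))  # False, and no slower for the bigger argument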

Next, we increment total by 1. This is like setting total to zero (but you have to do addition first). Addition of one, like equality, is also constant time.

So where are we? We have \(O(1) + O(n) * (O(1) + O(1))\). This is because we do 1 up-front, constant-time operation, then (potentially) 2 more constant-time things for each item in the list. In math terms, that reduces to \(O(2n) + O(1)\). In Big-O, we only care about the biggest "term" here. "Term" is the mathematical word that means "portion of an algebraic statement".

To figure out which term is biggest, if you don't remember the order, you can just cheat and graph them. From the previous article, we know that to graph these things you just replace n with x.

Figure 5: If you don't remember if 2x is bigger than 1, you can always graph them to be sure.

So, as I was saying, in calculating Big-O we're only interested in the biggest term: O(2n). Because Big-O only deals in approximation, we drop the 2 entirely: the difference between 2n and n isn't fundamental. The growth is still linear, it's just a faster-growing linear function. You drop the constants multiplying the variables because they change the rate of growth, but not the type of growth. If you want a visual indication of this, compare the graph of x^2 vs 2x vs x. x^2 is a different type of growth from the other two (quadratic, which is just a math-y way of saying "squared", vs linear), just like 2n is a different type of growth from 1 in the graph above (linear vs constant). Even if you pick a really high multiplier like 10,000, x^2 can still beat it, because eventually the line for x^2 will be higher if we go far enough to the right.
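
Here's a minimal sketch (not from the original article) that finds exactly where x^2 overtakes 10,000x:

# Walk right along the x axis until the quadratic curve passes the linear one.
x = 1
while x * x <= 10_000 * x:
    x += 1
print(f"x^2 first exceeds 10,000x at x = {x}")  # prints x = 10001

After that point, the quadratic function is larger forever, which is why the multiplier doesn't change the classification.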

To sum it up, this function has an O(n) runtime (or a linear runtime). It runs slower the more things you give it, but it slows down at a predictable rate.
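
Written out as one chain of reductions, using the same notation as above, the whole argument looks like this:

\[
O(1) + O(n) \cdot \big(O(1) + O(1)\big) \;=\; O(2n) + O(1) \;\rightarrow\; O(2n) \;\rightarrow\; O(n)
\]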

Determining the Big-O of a function really is just as simple as counting. Enumerate the different operations your code does (being careful to understand what happens when you call out into another function!), then determine how they relate to your inputs. From there, it's just simplifying to the biggest term and dropping the multipliers.

If you found this interesting, I'm thinking about writing up a longer-form book on this topic. Head over to Leanpub and express your interest in hearing more. If you have any questions, please feel free to email me.

Thanks to Wraithan, Chris Dickinson and Cory Kolbeck for their review.

Big-O of things other than computational complexity.

Data structures & Algorithms

Data Structure: List

Tell me about the abstract concept of lists.

LinkedList

List Algorithms

Insert
Remove
Interlude: Why do people care about sorting?

Alternative Implementation: ArrayList

Algorithm: Ordering & Sorting

The Map

HashMap

TreeMap

Algorithm: Searching

The Set

Hacky Set (hint: use a map)

Hash Set

Tree Set
