Thinking like a programmer: Inputs & outputs
One of the difficulties in learning to program is not the specific syntax it takes to accomplish a goal, but the thinking behind it. As a coworker of mine once said, "I understand what you did, but I wouldn't have thought of doing it that way." One way to think about a problem is to consider the inputs and outputs of a program or function.
Let's talk about the inputs and outputs of a function. Functions are a good example, but the same ideas apply to modules or distributed system components too.
It's important to think about this because a system's inputs and outputs determine how it behaves. There are a few principles that go along with this. The main one is that the inputs and outputs should be as close to the function as possible, meaning visible in its arguments and return value. First, let's back up and ask, "What is an input?"
At the function level, the inputs are simply the arguments of the function. Given the add function below, the inputs are a and b, and the output is the sum of those two values.
def add(a, b):
    return a + b
However, there can be inputs that aren't arguments. One example would be a function that calculates whether today is October 15th.
import datetime

OCTOBER = 10

def isOct15th():
    today = datetime.date.today()  # implicit input: the system clock
    return today.day == 15 and today.month == OCTOBER
This operates by pulling the current date from your system, then using that value to see if today is Oct. 15th. The function has 0 arguments and 1 return value, but it still has the input of the current time. That is to say, functions can have inputs that aren't in their argument list. They could be global variables, system state, network connections or anything along those lines.
Let's talk about outputs. Outputs come in a few flavors. The simplest is the return value of the method. There are others as well, and when an output isn't the return value, it's called a "side effect". The term is often used with a negative connotation, but side effects aren't always bad. An example of a side effect might be printing the string "Hello, World!". That is output from the program: it affects something on the screen, but it is separate from the return value.
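As a small illustration of the difference, here is a sketch that contrasts a return value with a side effect. The function names greeting and print_greeting are made up for this example.

```python
def greeting(name):
    # Explicit output: the string is handed back to the caller.
    return "Hello, " + name + "!"


def print_greeting(name):
    # Implicit output (side effect): text appears on the screen,
    # and the function itself returns None.
    print(greeting(name))


message = greeting("World")
result = print_greeting("World")
```

After this runs, message holds the greeting string, while result is None because everything print_greeting produced went to the screen.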
You can see from this that there are 2 types of inputs and outputs. There are the explicit inputs/outputs, which are the method parameters and the return value, and there are the implicit inputs/outputs, which are things like global system state and printing to the terminal.
One of the key pieces in thinking about these is that you want to move as many of these inputs/outputs as possible to explicit roles. The reason for this is that it makes testing easier.
Testing is essentially validating that, given some inputs, a function will return some expected outputs. Using our add method from above, we give it a known input of 1,2 and we validate that it returns an expected output of 3. From here, you can build a big test suite of various inputs and the expected outputs for that function.
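A minimal sketch of what such a test suite could look like, using plain assert statements. The extra cases beyond 1,2 are assumptions chosen for illustration, and test_add is a hypothetical name.

```python
def add(a, b):
    return a + b


def test_add():
    # Each case pairs known inputs with the output we expect.
    cases = [
        ((1, 2), 3),
        ((0, 0), 0),
        ((-1, 1), 0),
        ((2.5, 0.5), 3.0),
    ]
    for (a, b), expected in cases:
        assert add(a, b) == expected


test_add()
```

Because add has only explicit inputs and outputs, growing the suite is just a matter of adding rows to the list.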
This gets really difficult when you have these global inputs. Going back to the date example, to write a test for that function you'd have to convince your system that it's actually October 15th in order to check that it returns true when it's October 15th. That's difficult to do without digging into the internals of your system. Another way to write that function would be to take the global time and move it to an explicit method parameter. Now the function becomes:
OCTOBER = 10

def isOct15th(date):
    return date.month == OCTOBER and date.day == 15
Now we can give it various date objects ranging from October 15th to Feb 29th and a bunch of other conditions which would be unfeasible if we were using the system clock.
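Here is a sketch of what those checks might look like. The specific years are arbitrary choices for the example, and OCTOBER is defined as a plain constant so the block runs on its own.

```python
import datetime

OCTOBER = 10


def isOct15th(date):
    return date.month == OCTOBER and date.day == 15


assert isOct15th(datetime.date(2023, 10, 15))
assert not isOct15th(datetime.date(2023, 10, 14))   # day before
assert not isOct15th(datetime.date(2023, 11, 15))   # right day, wrong month
assert not isOct15th(datetime.date(2024, 2, 29))    # leap day

# Real callers can still use the system clock; they just pass it in.
is_today = isOct15th(datetime.date.today())
```

The system clock hasn't gone away. It has moved to the caller, and the function under test no longer depends on it.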
There's an accepted vocabulary around these concepts. A function that doesn't talk to global state, like the file system or global variables, is considered a "pure function". Pure functions operate only on their explicit input values and communicate only through their return value; they don't read or change global or system state. These are super testable!
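To make the distinction concrete, here is a small sketch with an impure function next to a pure version of it. TAX_RATE, total_impure and total_pure are hypothetical names invented for this illustration.

```python
TAX_RATE = 0.08  # hypothetical global setting


def total_impure(price):
    # Implicit input: the result depends on the global TAX_RATE,
    # which any other part of the program could change.
    return price * (1 + TAX_RATE)


def total_pure(price, tax_rate):
    # Every input is an argument, so the result depends only on them.
    return price * (1 + tax_rate)


total = total_pure(100, 0.5)
```

A test for total_pure only needs to pick a price and a rate. A test for total_impure also has to know about, and possibly reset, the global.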
There is another qualifier for pure functions, which is idempotence (pronounced eye-dem-POE-tence). Idempotence is the ability to call a function a bunch of times and always end up with the same result as calling it once. A good example of this would be the add method from earlier. Adding 1 and 2 will always result in 3, no matter how many times you call it. An example of a non-idempotent method call would be the append method on a list. Appending 1 to an empty list leaves the list as [1], and appending 1 again leaves it as [1,1], so this is a non-idempotent operation. To tag onto this example, if you instead used a set, the operation would be idempotent, because the values in a set are unique. Adding 1 will result in set([1]), and doing it again will result in the same thing, because a set doesn't add duplicates. Calling a pure function repeatedly with the same inputs must always give the same result, so idempotence in this sense is a necessary qualification for being a pure function.
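The list and set behavior described above can be checked directly. This is a sketch; the variable names are made up, and copies are taken after each step so the intermediate states can be compared.

```python
numbers_list = []
numbers_list.append(1)
list_after_one = list(numbers_list)   # copy of the list after one append
numbers_list.append(1)
list_after_two = list(numbers_list)   # copy of the list after two appends

numbers_set = set()
numbers_set.add(1)
set_after_one = set(numbers_set)      # copy of the set after one add
numbers_set.add(1)
set_after_two = set(numbers_set)      # copy of the set after two adds

# Repeating append changes the list; repeating add leaves the set alone.
append_is_idempotent = list_after_one == list_after_two
add_is_idempotent = set_after_one == set_after_two
```

Note that append and add both return None. The "result" in each case is the state of the collection afterwards.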
The goal of all of this has been to describe a bit about how programmers think about things, especially as it relates to inputs and outputs. It's another way of talking about the testability of something. It is much easier to reason about a pure function because everything you need to keep in your head is in the function itself. You don't have to worry that 3 files over you have some weird code that might affect the thing you're working on. For this reason, you should try to write your functions as purely as possible.
There are reasons you might not want to write only pure functions. It turns out side effects are actually very useful. They are things like saving data to your disk or printing to a console. They are implicit outputs of your function. So the goal isn't to have no side effects, but rather to constrain them to small areas of your code base. This increases the testability of larger portions of your program.
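One way this can look in practice is sketched below: a pure function does the deciding, and a small main function holds the clock read and the print. The names daily_message and main, and the messages themselves, are assumptions made up for this example.

```python
import datetime


def daily_message(name, today):
    # Pure: the result depends only on the arguments.
    if today.month == 10 and today.day == 15:
        return "Happy October 15th, " + name + "!"
    return "Hello, " + name + "."


def main():
    # All implicit inputs and outputs live here, in one small place.
    today = datetime.date.today()            # implicit input: system clock
    print(daily_message("World", today))     # implicit output: the screen


if __name__ == "__main__":
    main()
```

Everything interesting can be tested through daily_message with hand-picked dates, and main stays small enough to check by reading it.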