I really need to stop saying when the next one of these is likely to be out. In Part 2, I say that I hope it doesn’t take me a year and a half to write it and how long has it been? Almost two years.
Anyway, on to business.
map() and reduce()
The key to understanding the significance of these functions lies in two things:
First, when you have a list and you want to process it item-by-item, what do you do? You probably write a loop… but that’s boilerplate and some languages don’t have loops. Anyway, the computer should know what a loop looks like. All you want is to tell it to perform some task on each item in the list. That’s their original purpose… as a clean way to say “use this function on that list to transform/aggregate the data within”.
Second, whether you’re running it on your little home PC or Google’s massive server cluster, the loop you wrote will do the same thing… use one processor to do exactly what you told it to. If you want to scale up, you need to either write or find some code to spread your task out across multiple processors.
That’s what Google needed to do too… and they realized that map()
and reduce()
were a very clean way to talk to such a library. In languages like Python, these functions may be little more than syntactic sugar around a loop, but scalable MapReduce libraries like Google’s can use the exact same “apply this function to each item in that list” interface to farm your operation out over what may be thousands of processors just as easily as with one.
As for why there’s two of them? map()
takes a list and returns a new list of the same length. reduce()
takes a list and uses your function to merge each item into some running aggregate. You’d probably use a parallel map()
to do multi-core thumbnailing; Google uses it to toss your search out to several dozen servers in their cluster. I don’t have a handy reduce()
example for home use, but Google uses it to merge the results from map()
into one unified list. (The Wikipedia page gives a few more examples.)
Functional languages also tend to have filter()
, which takes a list of things and only keeps the ones where your function returns True
. In Python, rather than using the built-in map()
or filter()
you’re advised to use List Comprehensions since they’re cleaner and more concise.
Closures
Closures or, more specifically, lexical closures, rely on the way lexical scoping and anonymous functions work in many modern languages (eg. JavaScript, Python, PHP 5.3+, etc.). This is easiest to explain with an example, so I’ll give one:
def makeAnnotateFunction(prefix): def do(text): return str(prefix) + str(text) return do error = makeAnnotateFunction('ERROR: ') warning = makeAnnotateFunction('WARNING: ')
Because do()
has access to the environment it was created in, even after makeAnnotateFunction() exits, this works as you’d expect it to. The example is slightly contrived , but it’s in the same vein as some of the tricks I’ve used it for in test suites and logging systems. do()
acts as a template for a bunch of similar functions I need.
Another common use for closures is in asynchronous programming situations like JavaScript. For this one, I can give a more concrete example using jQuery:
var foo = function(selector) { var node = jQuery(selector); jQuery.get('http://www.example.com/', function(data, i, j) { node.html(data) }); };
If you’re not familiar with JavaScript and jQuery, here’s what’s going on:
- We define a function named
foo
which takes a CSS selector. - When called, the function uses jQuery to find matching elements on the current page.
- It then performs an HTTP GET request for http://www.example.com/ and, when that returns, inserts the data it got into the selected elements.
The key is understanding that jQuery.get()
is asynchronous. It returns immediately, rather than freezing up the page while waiting for the HTTP request to complete. If you want to do something with the result of the request, you have to provide a callback… but how do you make sure that the callback can see node
so it can call node.html()
?
That’s what closures do. The callback definition is executed within the parent function, so it keeps a reference to the parent function’s local scope, no matter what part of the browser may end up calling it later on.
Metaclasses
Let’s think about our metaphors for object-oriented programming. A class is a blueprint for building objects. So… what if you need to specify a blueprint for building classes? Hence, the metaclass.
That level of abstraction can be difficult to explain (I’m not even sure I fully understand it) but their most visible use in practice is redefining the behaviour of class definition to implement a domain-specific language such as Django ORM’s model definitions. They give the ORM a very clean way to let you write a class for each table in your database without also requiring you to manually hook up the “magic” that does the translation.
…or, if you’re familiar with function and class decorators, think “like those, but you can have inheritance”.
There’s an in-depth Python explanation on StackOverflow.
Bonus Features
Matrix Transposition
Matrix transpose is an operation that can be explained in two ways:
- Given a grid of data stored as list of rows, restructure it to be stored as a list of columns (or vice-versa. same thing.)
- Imagine you have a grid of numbers printed on something rigid and clear like a piece of plexiglass. Using one hand, pick it up by pinching the top-left and bottom-right corners. Now, with your other hand, spin it 180 degrees so it’s facing away from you. (Ignore that the individual numbers within the grid are now mirrored and rotated 90 degrees.)
When you’re doing it with a grid stored as a list of lists, Python and Ruby call it zip()
and it has a lot of uses.
For example, suppose you’re storing scores in a game and each player gets their own list.
>>> scores = [ ... [8.7, 5.8, 9.4, 7.4, 6.5, 6.5, 8.8, 8.9, 5.8, 8.9], # Dave ... [6.8, 9.9, 7.2, 5.4, 7.9, 6.7, 9.7, 9.3, 8.5, 9.5], # Anne ... [7.7, 6.6, 8.7, 6.4, 9.7, 5.9, 9.3, 7.0, 7.1, 9.6], # Ellen ... [7.2, 8.9, 6.8, 7.8, 6.0, 5.4, 7.7, 6.1, 5.3, 5.2], # Mike ...]
…and now you want them round-by-round. Do you loop through them manually? Nope. Just transpose.
>>> scores_by_round = zip(*scores) >>> from pprint import pprint >>> pprint(scores_by_round) [(8.7, 6.8, 7.7, 7.2), (5.8, 9.9, 6.6, 8.9), (9.4, 7.2, 8.7, 6.8), (7.4, 5.4, 6.4, 7.8), (6.5, 7.9, 9.7, 6.0), (6.5, 6.7, 5.9, 5.4), (8.8, 9.7, 9.3, 7.7), (8.9, 9.3, 7.0, 6.1), (5.8, 8.5, 7.1, 5.3), (8.9, 9.5, 9.6, 5.2)]
Now Dave is column 1 and you can access the scores for round 1 as the first row. (Just keep in mind that it will truncate all of the sub-lists to the length of the shortest one unless you pad them out.)
This is also very useful for tabular display of numbers at the command-line. Just generate each column as a list and zip()
them together before you print row-by-row.
Why Programmers Should Study Computer Science
When it all comes down to it, if you can Google for everything you need, why would you want to study Computer Science… especially if you want to write programs to solve real-world problems?
That’s what I used to think. The thing I didn’t realize was that, no matter how good you are, you can’t Google for things you never even imagined could exist. That’s what an education in computer science brings to a programmer. It expands your horizons and gives you new ways to look at what you already know.
(And, honestly, I think it may even be better to learn by doing first, so you can better understand the relevance and significance of what you’re learning when you go to school.)
Functional Programming Concepts for the Lay Programmer – Part 3 by Stephan Sokolow is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Pingback: Functional Programming Concepts for the Lay Pro...