Functional Programming Concepts for the Lay Programmer – Part 3

I really need to stop saying when the next one of these is likely to be out. In Part 2, I say that I hope it doesn’t take me a year and a half to write it and how long has it been? Almost two years.

Anyway, on to business.

map() and reduce()

The key to understanding the significance of these functions lies in two things:

First, when you have a list and you want to process it item-by-item, what do you do? You probably write a loop… but that’s boilerplate and some languages don’t have loops. Anyway, the computer should know what a loop looks like. All you want is to tell it to perform some task on each item in the list. That’s their original purpose… as a clean way to say “use this function on that list to transform/aggregate the data within”.

Second, whether you’re running it on your little home PC or Google’s massive server cluster, the loop you wrote will do the same thing… use one processor to do exactly what you told it to. If you want to scale up, you need to either write or find some code to spread your task out across multiple processors.

That’s what Google needed to do too… and they realized that map() and reduce() were a very clean way to talk to such a library. In languages like Python, these functions may be little more than syntactic sugar around a loop, but scalable MapReduce libraries like Google’s can use the exact same “apply this function to each item in that list” interface to farm your operation out over what may be thousands of processors just as easily as with one.

As for why there’s two of them? map() takes a list and returns a new list of the same length. reduce() takes a list and uses your function to merge each item into some running aggregate. You’d probably use a parallel map() to do multi-core thumbnailing; Google uses it to toss your search out to several dozen servers in their cluster. I don’t have a handy reduce() example for home use, but Google uses it to merge the results from map() into one unified list. (The Wikipedia page gives a few more examples.)

Functional languages also tend to have filter(), which takes a list of things and only keeps the ones where your function returns True. In Python, rather than using the built-in map() or filter() you’re advised to use List Comprehensions since they’re cleaner and more concise.

Closures

Closures or, more specifically, lexical closures, rely on the way lexical scoping and anonymous functions work in many modern languages (eg. JavaScript, Python, PHP 5.3+, etc.). This is easiest to explain with an example, so I’ll give one:

def makeAnnotateFunction(prefix):
    def do(text):
        return str(prefix) + str(text)

    return do

error = makeAnnotateFunction('ERROR: ')
warning = makeAnnotateFunction('WARNING: ')

Because do() has access to the environment it was created in, even after makeAnnotateFunction() exits, this works as you’d expect it to. The example is slightly contrived , but it’s in the same vein as some of the tricks I’ve used it for in test suites and logging systems. do() acts as a template for a bunch of similar functions I need.

Another common use for closures is in asynchronous programming situations like JavaScript. For this one, I can give a more concrete example using jQuery:

 var foo = function(selector) {
    var node = jQuery(selector);

    jQuery.get('http://www.example.com/', function(data, i, j) {
        node.html(data)
    });
};

If you’re not familiar with JavaScript and jQuery, here’s what’s going on:

  1. We define a function named foo which takes a CSS selector.
  2. When called, the function uses jQuery to find matching elements on the current page.
  3. It then performs an HTTP GET request for http://www.example.com/ and, when that returns, inserts the data it got into the selected elements.

The key is understanding that jQuery.get() is asynchronous. It returns immediately, rather than freezing up the page while waiting for the HTTP request to complete. If you want to do something with the result of the request, you have to provide a callback… but how do you make sure that the callback can see node so it can call node.html()?

That’s what closures do. The callback definition is executed within the parent function, so it keeps a reference to the parent function’s local scope, no matter what part of the browser may end up calling it later on.

Metaclasses

Let’s think about our metaphors for object-oriented programming. A class is a blueprint for building objects. So… what if you need to specify a blueprint for building classes? Hence, the metaclass.

That level of abstraction can be difficult to explain (I’m not even sure I fully understand it) but their most visible use in practice is redefining the behaviour of class definition to implement a domain-specific language such as Django ORM’s model definitions. They give the ORM a very clean way to let you write a class for each table in your database without also requiring you to manually hook up the “magic” that does the translation.

…or, if you’re familiar with function and class decorators, think “like those, but you can have inheritance”.

There’s an in-depth Python explanation on StackOverflow.

Bonus Features

Matrix Transposition

Matrix transpose is an operation that can be explained in two ways:

  1. Given a grid of data stored as list of rows, restructure it to be stored as a list of columns (or vice-versa. same thing.)
  2. Imagine you have a grid of numbers printed on something rigid and clear like a piece of plexiglass. Using one hand, pick it up by pinching the top-left and bottom-right corners. Now, with your other hand, spin it 180 degrees so it’s facing away from you. (Ignore that the individual numbers within the grid are now mirrored and rotated 90 degrees.)

When you’re doing it with a grid stored as a list of lists, Python and Ruby call it zip() and it has a lot of uses.

For example, suppose you’re storing scores in a game and each player gets their own list.

>>> scores = [
...    [8.7, 5.8, 9.4, 7.4, 6.5, 6.5, 8.8, 8.9, 5.8, 8.9], # Dave
...    [6.8, 9.9, 7.2, 5.4, 7.9, 6.7, 9.7, 9.3, 8.5, 9.5], # Anne
...    [7.7, 6.6, 8.7, 6.4, 9.7, 5.9, 9.3, 7.0, 7.1, 9.6], # Ellen
...    [7.2, 8.9, 6.8, 7.8, 6.0, 5.4, 7.7, 6.1, 5.3, 5.2], # Mike
...]

…and now you want them round-by-round. Do you loop through them manually? Nope. Just transpose.

>>> scores_by_round = zip(*scores)

>>> from pprint import pprint
>>> pprint(scores_by_round)
[(8.7, 6.8, 7.7, 7.2),
 (5.8, 9.9, 6.6, 8.9),
 (9.4, 7.2, 8.7, 6.8),
 (7.4, 5.4, 6.4, 7.8),
 (6.5, 7.9, 9.7, 6.0),
 (6.5, 6.7, 5.9, 5.4),
 (8.8, 9.7, 9.3, 7.7),
 (8.9, 9.3, 7.0, 6.1),
 (5.8, 8.5, 7.1, 5.3),
 (8.9, 9.5, 9.6, 5.2)]

Now Dave is column 1 and you can access the scores for round 1 as the first row. (Just keep in mind that it will truncate all of the sub-lists to the length of the shortest one unless you pad them out.)

This is also very useful for tabular display of numbers at the command-line. Just generate each column as a list and zip() them together before you print row-by-row.

Why Programmers Should Study Computer Science

When it all comes down to it, if you can Google for everything you need, why would you want to study Computer Science… especially if you want to write programs to solve real-world problems?

That’s what I used to think. The thing I didn’t realize was that, no matter how good you are, you can’t Google for things you never even imagined could exist. That’s what an education in computer science brings to a programmer. It expands your horizons and gives you new ways to look at what you already know.

(And, honestly, I think it may even be better to learn by doing first, so you can better understand the relevance and significance of what you’re learning when you go to school.)

Posted in Geek Stuff | 1 Comment

A Disingenuous Mistake in Dan Ariely’s Talk About Dishonesty

I ran across this post draft while clearing out old notes and, since I don’t think I’ve ever blogged about the instinctual underpinnings of “piracy” before, here you go…

A few days ago, I saw RSA Animate’s video for The (Honest) Truth About Dishonesty  by Dan Ariely. On the whole, it’s an interesting talk, as pretty much anything chosen for RSA Animate is… but there was one problem which I found myself dwelling on.

While Dan Ariely makes an excellent point about the importance of incorporating forgiveness into society to improve social behaviour, his argument is greatly crippled because one of his examples elicits dismissive behaviour far in excess of any benefit he may gain from using it. Specifically, his example of illegal music downloading as extreme rationalization.

Why does this cause his credibility to take such a hit among certain viewers? Because he gives no indication that he understands the deep psychological underpinnings of the distinction between eating and running and downloading a file.

For the vast majority of human existence, music was free. It was a flowing, living cultural artifact similar to gossip or telling jokes. Then, we gained the ability to record it and each of those wax cylinders or plastic discs took effort to make, so we charged for them… but going back to the other model was inevitable.

As soon as computers made it possible for us to copy files en masse at vanishingly small costs, a different set of instincts took over. Files are like jokes and recipes, not balls and ice-cream cones. People need to be taught to share a scarce item like a ball, but they laugh at you and then become outraged when they realize you’re serious if you tell them they can’t re-tell a joke without paying.

As human beings, we are terrible at fighting our own instincts. That’s why so many of us have trouble saving money or stopping eating or getting up the confidence to talk to a stranger we’ll probably never meet again. We’re still wired as tribal apes and we’re terrible at fighting those very same instincts that tell us to share culture like mad. (Speaking of which, borrow Mean Genes by Terry Burnham and Jay Phelan next time you stop by your local library.)

By dismissing this distinction, Mr. Ariely comes across as ignorant, which calls his position as an authority on other psychology-related matters into question… matters on which his argument depends.

Posted in Web Wandering & Opinion | Leave a comment

Emptying “Deleted” Files

A few days ago, I woke to find my home partition 100% full and programs erroring out because of it.

“No problem,” I thought and fired up ncdu Install via APT to figure out what had eaten up the 12GiB I’d had free the night before. (Nothing beats ncdu for efficient workflow though FileLight Install via APT is a good alternative if you’re willing to trade keyboard shortcuts for a GUI)

Nothing showed up so, on a hunch, I tried using lsof Install via APT to see if it was a deleted file being held in existence by an open file handle. Bingo!

…it was the ~/.xsession-errors that I’d deleted weeks ago, it was being held open by half the apps on my desktop (including my session manager), and I had in-progress tasks that I couldn’t interrupt to restart my session. Here’s where it got tricky. I had to somehow get rid of a file that was already “deleted” when I no longer had a name to reference it by.

Thankfully, after a fair bit of searching, I found a solution. The key lies in how, on Linux, you can access all open files via /proc if you know the process’s PID, even if they’ve been deleted. (It’s the second column in the lsof output. You’ll also need the file descriptor number, but you can trial-and-error that using something like less)

Once you’ve got that, you can use a root shell to truncate the file to a length of zero.

you@host:~# > /proc/<PID>/fd/<FD>

(Yes, that’s using a non-appending redirect without providing a command to redirect output from)

I hope this helps if you ever find yourself in a similar situation.

Posted in Geek Stuff | Leave a comment

A little tool for command-line playlist building

For the last few years, I’ve been amassing a collection of little scripts I use every day to build playlists, both for Audacious and for MPlayer.

About a week ago, I realized that they’d started to duplicate each others’ functionality and refactored them into a single script which has different default behaviours depending on what it’s called as (but you can also use command-line flags).

So far, here are the behaviours it implements based on what you call it as (either via MPRIS [1] [2] or using a command provided via --exec such as mplayer):

aq
Add the given paths to the playlist.
ap
Like aq but start the first one playing too.
laq
Like aq but use locate -i to search for the first argument, filter for known media types and filter for the following arguments, then display a chooser.
lap
Like laq but start the first one playing too.
raq
Randomly select -n NUM songs (default: 10) from the paths provided (default: XDG_MUSIC_DIR) and add them to the playlist.
rap
Like rap but start the first one playing too.

…and all of these are just aliases to options listed in --help.

Until two days ago, the chooser was a simple little thing where you were given a numbered list of filenames and you typed in a sequence of numbers, optionally including Python-style start:stop slice syntax.

Now, if you’ve got urwid installed, you get a nice ncurses-style pseudo-GUI, complete with mouse interaction… though the scroll wheel support is preliminary and I’m still trying to figure out how to set up Tab-based focus switching.

Here’s a screenshot:

I haven’t had time to fully bring the codebase up to spec, write the last few new features, and write a test suite for it, but it’s good enough for my day-to-day use so, if you want to give it a try, it’s on GitHub in ssokolow/profile as home/bin/lap. (It has its own GitHub repo now.)

The queue option in the chooser can be toggled via the Q key or Meta+Q and, once you’ve selected all the tracks you want using Spacebar or the mouse, press Enter to commit.

In case I ever move it to its own repo and forget to update the link, you can also grab the revision at the time of this posting.

Posted in Geek Stuff | 6 Comments

Trifles Make Perfection

TL;DR: When you’re doing something creative, look for details which, with only minimal change, could greatly broaden your work’s appeal and staying power. (Also includes examples.)

Every now and then, I run across a creative work which frustrates me, not because it’s bad, but because the author made one tiny “unimportant”  decision that severely cripples either the story’s longevity or its appeal.

The most common example I run across is fanfiction where, aside from a few too many mentions of something like MySpace (as opposed to more generic terms), the story could easily last at least a decade or more without feeling strange. (Though breaking the atmosphere is also a common problem with mentioning things we recognize as contemporary brands in fanfiction since the source material generally occupies a different mental universe from what we encounter in day-to-day life.)

Hence, I appeal to people. When you’re doing something creative, please take a few minutes to think about what other people might care more about than you do. Ask friends to help if you can spare the time. I just don’t want to see good things fading into obscurity because of a minor oversight.

How about a more direct example. Two days ago, I was discussing musical tastes with a friend and, to clarify what she meant by “fan-song”, she sent me a YouTube link to a My Little Pony song.

Now, in order to get my point across, I need to be clear about my opinions. I have nothing against My Little Pony: Friendship is Magic but I’m not a fan either and neither of us approve of the Brony subculture. I treat it the same way I treat Care Bears… and some of my fondest childhood nostalgia relates to Care Bears but I don’t still watch it. MLP:FiM is a well-done show for people much younger than I am and, if my young self had access to it, I’d probably be nostalgic for it too.

Despite that, I’ve been listening to “Lullaby for a Princess” a lot. It’s a truly beautiful song and the lyrics are very cleverly phrased at times. (eg. “but such is the way of the limelight, it sweetly takes hold of the mind of its host”)

It’s relevant to this topic because, were it not for four uses of the word “pony” where other two-syllable words could readily have been substituted, it could easily transcend and outlive its fandom as a masterful expression of a common literary theme.

For those who haven’t had a need to google up the wikia page yet, here’s a little explanatory backstory: A thousand years before the first episode of MLP:FiM, Princess Celestia and Princess Luna ruled together, the elder sister with domain over the day and the younger over the night, both immortals.

Living in her sister’s shadow, Luna ends up feeling unappreciated and, one day, can take it no longer. Consumed by her pain and corrupted by how her emotions have fed into her magic, she refuses to allow the morning to come, determined to force the kingdom to appreciate the beauty of her domain. (as covered by “The Moon Rises“, which is also beautiful but was recorded before the fan group in question found a female member to do the vocals.)

Celestia is forced to choose her responsibility to her subjects over her love for her sister. With Luna refusing to back down, Celestia takes over dominion of night and banishes her sister to the moon for a thousand years to allow her emotionally charged self-transformation time to weaken. (“Lullaby for a Princess” is sung following this event, exploring Celestia’s guilt at her own blindness toward her sister’s pain and the responsibility she feels for how things turned out.)

Obviously, not all songs can be made more generally appealing in this manner. For example, “The Moon Rises” has a point where the lyrics include “the ponies of Equestria” and, even if I were to suggest a replacement for “ponies”, I can’t imagine an easy way to remove the kingdom’s name without weakening the lyrics.

However, in Lullaby for a Princess, the four uses of the word “pony” can be replaced very naturally. In the context of the song’s narrative, Celestia is just as much a “princess” as a “pony” and, with only a few seconds of thought, I can see that “no pony” could easily be replaced with “nobody”.

In fact, I think that such changes would make the lyrics more powerful for two reasons:

  • The song is about a sovereign regretting that she neglected her sister to the point where she had to banish her to protect the kingdom whose “limelight” she’d been blinded by. Referring to her as “a princess” rather than “a pony” supports the theme more strongly.
  • Compared to something like “nobody”, the one use of “no pony” feels a bit superfluous and impedes the immersiveness of the story for non-fans. (Sort of like fanfiction authors who include random bits of Japanese, not because they lack good English equivalents or because it aids comprehension, but because they think it’s cool.)

That’s not to say it’s not a beautiful song nonetheless. I’d still recommend it to anyone who can appreciate it despite its context, but it could have been so much more broadly appealing with such a small change.

To quote Michelangelo, trifles make perfection, and perfection is no trifle.

P.S. For anyone who likes those two songs, another fan did a cover of “Lullaby for a Princess” with modified lyrics named “Luna’s Reply“. Her voice isn’t quite as suited to the melody and, from what I read on wikia, I’m not sure if Luna’s un-corrupted personality would still exist in a state distinct enough from Nightmare Moon for it to be canon-compatible, but the lyrics are still excellent and the song is nice to listen to.

In case of dead links:

Update: Perfect Picture (MP3) by StormWolf is an excellent example of doing this well.

It’s nowhere near as deep as the ones I mentioned before, but it’s not supposed to be. It’s a catchy techno-styled musical summary of a famous fashion photographer who just happens to be a pony.

There’s nothing specifically restricting it to the MLP fandom because the lyrics focus on her character, rather than needlessly pointing out the fact that, like everyone else in the setting, she is non-human. From the lyrics to the musical styling, it’s a catchy musical summary of a character which also works as a catchy summary of a type of person if you have no idea who Photo Finish is.

That means it has a good shot at getting shared well beyond the fandom and surviving as “another catchy song” when MLP eventually wanes.

Posted in Otaku Stuff, Writing | 2 Comments

Making Coveralls work with PyPy and skip Python 2.5 on Travis-CI

I have a couple of projects that get tested on Travis-CI and I just discovered Coveralls, a tool which integrates with it to provide a code coverage badge to go alongside your build status badge. (and is also free for open-source projects)

Unit Test Status Coverage Status

Unfortunately, it seems that there’s a bug in the current Python client that causes it to error out when submitting coverage data from PyPy builds. (It tries to grab source for compiled PyPy modules with no source available and then dies when it can’t).

Here’s the boilerplate I use to skip it on Python 2.5 (which it doesn’t support), make it complete successfully on PyPy, and omit things like the unit test framework itself from the report:

Posted in Geek Stuff | Leave a comment

How To Invert Your X11 NumLock LED

TL;DR: numlockx off; xdotool key Num_Lock

I like to leave NumLock on all the time but, with my current keyboard, the indicator light is a blindingly bright blue. I noticed that the numlock light would sometimes get confused, so its behaviour was reversed (only on when NumLock was turned off) but I couldn’t figure out why until now.

Here’s the deal:

First, under modern versions of X.org with MPX support, the XInput system lets you have one or more master keyboard-pointer pairs, each which can be mapped to one or more physical keyboards and mice, called slaves. (You can examine this by running xinput in a terminal)

Here’s an example of what xinput outputs on my system:

⎡ Virtual core pointer                          id=2    [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer                id=4    [slave  pointer  (2)]
⎜   ↳ ATI Remote Wonder II                      id=12   [slave  pointer  (2)]
⎜   ↳ Logitech USB Gaming Mouse                 id=13   [slave  pointer  (2)]
⎜   ↳ Wacom BambooFun 6x8 stylus                id=14   [slave  pointer  (2)]
⎜   ↳ Wacom BambooFun 6x8 eraser                id=15   [slave  pointer  (2)]
⎜   ↳ Wacom BambooFun 6x8 cursor                id=16   [slave  pointer  (2)]
⎜   ↳ Wacom BambooFun 6x8 pad                   id=17   [slave  pointer  (2)]
⎣ Virtual core keyboard                         id=3    [master keyboard (2)]
    ↳ Virtual core XTEST keyboard               id=5    [slave  keyboard (3)]
    ↳ Power Button                              id=6    [slave  keyboard (3)]
    ↳ Power Button                              id=7    [slave  keyboard (3)]
    ↳ Guest Barcode Reader                      id=8    [slave  keyboard (3)]
    ↳ CYKB23 USB Keyboard                       id=9    [slave  keyboard (3)]
    ↳ CYKB23 USB Keyboard                       id=10   [slave  keyboard (3)]
    ↳ zc3xx                                     id=11   [slave  keyboard (3)]

(For the curious, zc3xx is the button on the side of my USB webcam which triggers the Logitech software under Windows by sending a keycode that maps to the XF86WebCam keysym, same as on a multimedia keyboard)

Noteworthy implications of this design:

  • It’s possible to create more masters and move devices over to them. If you do, you get extra mouse pointers on your screen and, if your window manager supports it, you can have multiple people using several different applications at the same time on the same desktop.
  • Every master has a virtual slave called XTEST, which is used to inject events.
  • Each slave (physical keyboard) can have its own LEDs, but the mod2 modifier that actually controls NumLock behaviour is part of the master.
  • X11 doesn’t yet know how to force all the slave keyboards on a given master to share the same LED states. (Because, if you want to support all the different kinds of keyboards out there, it’s very complicated)

This means that, if you press NumLock on one keyboard, it will turn NumLock on/off for the whole master/slave grouping, but the light will only change on that one keyboard.

The problem I was having was that older versions of the numlockx command were faking a NumLock keypress on the XTEST virtual keyboard, rather than my physical one, and then pressing NumLock on the physical keyboard just toggled the LED and the modifier while assuming they were still in sync.

numlockx has now been fixed, but you can still trigger this mismatch manually if, like me, you want the light to come on when NumLock is turned off. Just use this command:

numlockx off; xdotool key Num_Lock

The first half puts both your NumLock modifier and LED into a known state (off) and then the call to xdotool presses NumLock on the virtual XTEST keyboard, which turns on the modifier and the invisible/imaginary LED on the virtual keyboard, while leaving the LED on the physical keyboard untouched.

As long as you only change your numlock state by pressing the button, they’ll stay out of sync until the end of your desktop session. Enjoy!

(Thanks to Peter Hutterer for explaining things)

Posted in Geek Stuff | 2 Comments