Splitting CamelCase without Regex lookahead/lookbehind

I’ve been working to port the brains of my game launcher experiment from Python to Rust for easier refactoring and to eventually offer language bindings other than Python itself and, at the moment, I’m working on the “filename to title” heuristic, because that’s got the best unit test suite.

Surprise, surprise. The biggest time-sink has been redesigning things where the Python version uses regular expressions with lookbehind and lookahead expressions.

Well, the CamelCase-processing regular expression faked me out. If you search for answers, it seems every language is using some variation of this regex:


…and, since Rust’s Regex crate uses a DFA to avoid pathological performance, this wasn’t an option.

I wasted an afternoon trying to implement such a parser manually before it occurred to me that you don’t actually need lookahead/lookbehind to parse camelcase!

It seems that someone originally wrote that regex with the goal of getting accurate string indexes for the starts of words, then everyone else copy-pasted and translated it without recognizing that it was stricter than necessary.

UPDATE: There is one situation where the lack of lookahead has to be worked around which I somehow overlooked… any situation where you need to insert a space before and after the same character. (ie. words like “I” and “A” in the middle of a phrase.) The solution there is just to run the regex twice.

So, if you just want to insert spaces (without duplicating ones that are already there), and you want to enhance the algorithm to handle numbers and ampersands reasonably and you want it to work in Rust’s regex engine, here’s how you do it:

r"([a-z&])([&A-Z0-9])|([^ ])([A-Z][a-z])|([0-9])([a-z])"

…and then use this as your replacement string:

"${1}${3}${5} ${2}${4}${6}"

It passes every test in my unit test suite for CamelCase splitting and improves the accuracy score on the integration test corpus for the filename_to_name function which relies on it.

UPDATE #2: I got to playing with the Regex crate’s support for unicode character classes and I have two things to report:

First, you actually do need a fourth pair of capture groups to split “A&b” because of the “ampersand, followed by lowercase” situation.

Second, for anyone who wants it, here’s a fully commented Rust regexp which will process CamelCase written using any mix of what the Unicode database considers to be letters (lowercase, uppercase, or titlecase), numbers (decimal digits, letters, or other), and the four different kinds of ampersands I was able to find:

 # == Ampersand (definitely end) followed by anything not already whitespace ==
 # ('Ampersand' or 'Small Ampersand', or 'Fullwidth Ampersand' or...
 # 'Heavy Ampersand Ornament')
 # (Something not 'Separator, Space')

 # == OR ==

 # == Lower/titlecase (possible end) followed by upper/titlecase/number (possible start) ==
 # ('Letter, Lowercase' or 'Letter, Titlecase', 'Ampersand' or 'Small Ampersand', or...
 # 'Fullwidth Ampersand' or 'Heavy Ampersand Ornament')
 # ('Letter, Uppercase' or 'Number, Decimal Digit' or 'Number, Letter' or...
 # 'Number, Other' or 'Ampersand' or 'Small Ampersand' or...
 # 'Fullwidth Ampersand' or 'Heavy Ampersand Ornament')

 # == OR ==

 # == Number followed by an un-capitalized word ==
 # ('Number, Decimal Digit' or 'Number, Letter' or 'Number, Other')
 # ('Letter, Lowercase')

 # == OR ==

 # == Anything not whitespace, followed by ampersand or unambiguous beginnings of word ==
 # (Something not 'Separator, Space')
 # ('Letter, Titlecase' or ['Letter, Uppercase' followed by 'Letter, Lowercase'] or...
 # 'Ampersand' or 'Small Ampersand' or 'Fullwidth Ampersand' or ...
 # 'Heavy Ampersand Ornament')
 (\p{Lt} | \p{Lu}\p{Ll} | [\x{26}\x{FE60}\x{FF06}\x{1F674}])


"${1}${3}${5}${7} ${2}${4}${6}${8}"

Before you get scared, it’s actually just the Unicode version of the tiny regexp with a ton of explanatory comments and a little more ampersand matching. It’s very simple once you throw out the comments and understand the structure.

The regular expression matches any of four situations where a space should be inserted:

  • An ampersand followed by anything that isn’t whitespace (this is the one I split out)
  • A lowercase or titlecase (upper+lower in one character) letter followed by an uppercase/titlecase letter or a number.
  • A number followed by an un-capitalized letter.
  • Anything not whitespace, followed by an ampersand, titlecase letter, or <UPPER><lower> pair.

Once you ignore the comments, most of the verbosity comes from having to combine multiple unicode character literals or classes to say things like…

  • “any ampersand” ([\x{26}\x{FE60}\x{FF06}\x{1F674}])
  • “any type of number” ([\p{Nd}\p{Nl}\p{No}])

The advantage is that, because it delegates the lower-level details to the Unicode lookup tables for everything but the ampersands, any failure should result from the language being so alien that the concept of CamelCase itself is difficult to apply to it.

UPDATE #3: Replace chained OR tokens with compound character classes where possible for minor cache pressure savings in the regexp engine.

It’s still a heavy regexp (I was chatting with burntsushi on IRC when the idea occurred to me and, if anyone should have the expertise to judge that, it’s the author of the Regex crate), but it meets my needs, it’s easy to tweak as I further refine my heuristic guesser, and if I ever need something more performant, writing this has taught me enough about the problem that I should be able to write a manual solution.

Posted in Geek Stuff | Leave a comment

Getting Your Cheap Chinese USB Foot Pedal Doing Useful Things on Linux

Well, I have a Chinese USB foot pedal that I picked up on eBay for $11 CA to play around with but, for ages, it’s been worth bupkiss because it’s mapped to the “b” key on the keyboard and I was too busy to look into how to remap a key on just one device on Linux… that ends here and now.

Try 1: Remapping in XKB

The most obvious solution would be to remap the thing by applying a custom X11 keyboard layout to just the one device. (ie. Redefine how keycodes from evdev map to X11 keysyms.)

OK, doesn’t seem too hard and, Google to the rescue, someone even wrote some example code for it. (See also: XKB Remapping)

The gist is:

  1. Use xinput list to find the ID for the device you want to limit the remapping to
  2. Use setxkbmap -device <the ID> -print to dump the keymap
  3. Write a custom supplemental XKB keymap which redefines the key in question
  4. sed the dumped keymap to append the supplemental keymap to the stack of what gets applied.
  5. Use xkbcomp to re-apply it.

…but, no matter what I tried, I couldn’t get my Ubuntu 14.04 LTS system to acknowledge the change.

Verdict: Failure on *buntu 14.04 LTS

Try 2: Remapping in XKB (hackier but more foolproof)

I was getting errors from the previous version and I didn’t know whether any of them were spurious, so I decided to try the old “hacky but foolproof” standby described in this ArchWiki page … dump the keymap, hand-edit the stupid thing, and then re-apply it.

xkbcomp -i <the ID> $DISPLAY ~/out.xkb
vim ~/out.xkb
xkbcomp -i <the ID> ~/out.xkb $DISPLAY

First, let me caution you that it has to be -i ID and not -iID. The latter form doesn’t work.

That said, it ran without complaint and even showed the expected results when I re-dumped the keymap fresh from the X server… it just had no effect.

(I think it might be because, when you look at the USB HID descriptor that evtest dumps on startup, the stupid pedal announces that it’s a 162-key keyboard, plus a mouse with scroll wheel, all in one device, which causes xinput list to file it as a mouse rather than a keyboard.)

Verdict: Failure on *buntu 14.04 LTS

(Naturally, I didn’t continue on to research how to make it survive restarts.)

Try 3: Remapping via udev

OK, no big deal. My ATi Remote Wonder II didn’t need remapping because it sent keycodes that were already mapped to the XF86AudioPlay and XF86AudioPause keysyms. Let’s drop below X11 and redefine how the kernel maps hardware scan codes from the pedal to evdev keycodes.

Now, since we need to redefine it for only one input device, that means we can’t use the setkeycodes command, so udev it is. (Besides, I’ve heard that setkeycodes doesn’t work with USB devices.)

According to these StackExchange answers [1] [2] and these ArchWiki pages [1] [2], I need to put together a definition like this, and store it at /etc/udev/hwdb.d/some-unique-name.hwdb, then refresh the database:


Oh, and make sure you only indent the second line one space. More will cause a silent failure and the first guide I read not only didn’t mention that, but actually used two spaces in the example!

To keep things simple, here’s the overview of the process:

  1. Create something in the above format as a /etc/udev/hwdb.d/whatever.hwdb file.
  2. The numbers in the file are as follows:
    1. The b0003 means “Bus type: USB”
    2. The v0c45 and p7403 are the USB vendor and product IDs from lsusb
    3. The 70005 is the MSC_SCAN value from pressing the pedal while evtest is listening.
    4. The playpause is the kernel-internal name for the desired keycode. They’re defined in a header I don’t seem to have (linux/input-event-codes.h) but the ArchWiki pages have a link to an online list of them.
  3. After putting the file in place, run sudo udevadm hwdb --update (older systems) or sudo systemd-hwdb update (newer systems) to merge it into the hardware database.
  4. Finally, either reboot your system or run sudo udevadm trigger to apply the changes. (Un-plugging and re-plugging the thing may also work)

…so, did it work? Well… no. It turns out *buntu 14.04 is using the last version of udev before that feature was added (14.04 has udev 204 and it needs 205 or higher).

(Also, note that whether you use the keyboard: prefix or the evdev: prefix will depend on the udev version you use. I tried both, but this site explains what to use for which version.)

Verdict: Failure on *buntu 14.04 LTS

NOTE: This AskUbuntu answer that I found after hacking together my own solution suggests that it might actually be possible, but that 14.04 includes a version of udev with some funny requirements for applying the changes.

Try 4: Remapping via udev (the old way)

*sigh* Ok, so we can’t use the clean, easy solution everyone’s listing because 14.04 is too old… so what’s the old solution?

Well, according to this page and this AskUbuntu answer, versions of udev up to and including 204 do key remapping by using the old familiar udev rules.d system and a helper utility named /lib/udev/keymap. Cool.

To paraphrase Jurassic Park, “It’s a rules.d system, I know this.” (From fixing permissions on various devices, to getting my Chinese NES controller adapter to announce itself as a joystick, I have a fair bit of experience writing rules.d files.)

OK, so I write this /etc/udev/keymaps/microdia-foot-switch file, using the same values from the previous attempt:

0x70005 playpause

…and this /etc/udev/rules.d/99-microdia-foot-switch.rules file:

ACTION!="remove", SUBSYSTEMS=="usb", ENV{ID_VENDOR_ID}=="0c45", ENV{ID_MODEL_ID}=="7403", RUN+="keymap $name /etc/udev/keymaps/microdia-foot-switch"

…and unplug and replug the foot pedal… and nothing happens. I check `dmesg` and whadda ya know, `/lib/udev/keymap` isn’t installed.

(NOTE: While it didn’t affect me, your distro may have a bug in keymap which forces you to use the keycode itself instead of the playpause name alias for it. If you happen to have another device which can emit the target keycode, you can use sudo showkey --keycodes to get it really easily.)

So, anyway, I pop over to packages.ubuntu.com to search up the package to install… and it turns out it was in udev in Precise and got pulled (with no replacement) in Trusty… way to jump the gun, guys!

Of course, this isn’t the only botch in 14.04 (it also shipped with a buggy version of Geeqie), so this is old-hat.

  1. Pop over to the precise-updates version of the udev package on packages.ubuntu.com, scroll to the bottom, and download whichever architecture matches your hardware. (Probably amd64, in this day and age)
  2. Open it up with file-roller (because it’s smart enough to hide the fact that a .deb is an archive containing more archives) and copy out /lib/udev/keymap.
  3. sudo cp keymap /lib/udev/keymap && chmod +x /lib/udev/keymap

Now, try unplugging and re-plugging the thing again and it should work.

Verdict: Success on *buntu 14.04 LTS

…but wait, there’s more!

While evtest and xev were showing the right keysym, my Audacious Media Player wasn’t listening to the resulting XF86AudioPlay presses.

Apparently, not all XF86AudioPlays are created equal, because adding a second mapping from “Pause/Resume” to XF86AudioPlay by stepping on the pedal fixed it.

(My hypothesis is that, while Audacious is displaying the keysym names, it’s binding using the underlying keycodes and the one from my ATi Remote Wonder II began life as a play keypress while the one from the pedal began life as playpause.)

Posted in Geek Stuff | 1 Comment

Home-made tamper-evident security seals for kids and adults alike

Suppose you need to keep siblings, roommates, children, or even friends with wandering hands out of something, but you can’t use a lock. Maybe you’re worried they’ll find the key, maybe you need something that has no metallic parts, or maybe you’re a kid and your parents are worried you will lose the key.

This is the kind of thing numbered security seals are good for… but they’re expensive. (The cheapest I’ve found are roughly 50¢ each in packs of 100 or more)

…so I decided to do a little experimentation and came up with a cheap-but-reliable solution for homemade tamper seals that’s so simple and safe that even children can do it.


Step 1: Making the seals

The most important thing about a security seal is that it’s unique, so it can’t be replaced, and either can’t be reclosed after being opened or you’ll notice if it is.

To satisfy these requirements with paper, we need to print or write something on the paper which other people can’t reproduce and which will look wrong if someone cuts and re-glues the seal.

If you’ve got a printer, you could print out a random image from the Internet, delete the file and empty your browser history, then cut strips from it to use as seals, but Inkjet ink tends to cost a fortune and most people don’t have laser printers.

What I recommend is this: Cut strips from a piece of paper, then sign and doodle all over both sides of them so any cuts or complete replacement will be obvious. (You do it on both sides, so that you can easily see if someone mends a cut by glueing another piece of paper to the inside of the loop.)

Don’t forget to use your scissors to make the ends of the strip round, rather than square, so it’s harder for someone to attempt to pick at the joint if they want to try to peel it apart.

Step 2: Applying the seals

To apply the seal, you just need to pass it through or around whatever you want to seal (eg. cabinet handles) and then glue it into a loop. (a tight loop, if you’re securing knob-shaped cabinet handles, so it can’t be lifted off without breaking it.)

If you want to seal a box, try gluing several strips together into a shape similar to the ribbon on a Christmas present.

It’s the details which make it tamper-evident:

  1. Get a bottle of Polyvinyl Acetate glue (A.K.A. white glue, school glue, PVA).
  2. Apply a enough glue to one end of the strip that, when squeezed, it won’t leave unglued corners.
  3. Close the loop and squeeze the ends together for at least five seconds, as hard as you can.
  4. Wait at least 5 minutes, but ideally 10 minutes.

Step 3: Making the seals tougher

Supposing you’re trying to keep really clever people from sneaking access, there are two more tricks you should get an adult to do:

  1. Paint the glued part with clear nail polish to waterproof it, so nobody can try to invisibly open the seal by steaming the glue. (This also makes it harder to try to pick apart.)
  2. Cut some diagonal stripes into the glued spot using a utility knife to absolutely guarantee that any attempt to peel apart the glue will result in obvious tearing.

The “Why”

I was very specific in my instructions, because I actually did a lot of testing. If you want to repeat my tests yourself, here’s what I did:


  1. A successful seal must not have the glue fail when pulling on both ends of the glued joint.
  2. A successful seal must show obvious damage when someone tries to pick and peel at the glued joint.
  3. A successful seal must have the paper visibly fail before the glue if a solvent is applied to the glued joint.


  1. Cut a bunch of paper strips, approximately 3″ long.
  2. Glue pairs of strips into longer strips, using various test glues.
  3. On each glued strip, write the time the process was finished, so drying time can be considered.
  4. For each drying time tested, grip the ends of a test strip and pull evenly outward as with a Christmas cracker.
    The test is a success if the paper breaks outside the glued patch. (If it fails far enough from the join, repeat until there’s nothing left to grip)
  5. For each drying time tested, attempt to peel apart the two strips of paper.
    The test is a success if the paper is visibly damaged in a manner that wouldn’t be hidden by glueing the joint back together.
  6. Repeat each test at least once, to account for variations in the process.
  7. Prepare another set of test strips and allow the glued spot to soak in a drop of mineral or vegetable oil for 30 minutes. Repeat the tests.
    The glue passes if it wasn’t weakened by the oil. (In my tests, the breakage still consistently occurred in the dry portions of the paper, indicating that the oil had not weakened it significantly.)
  8. Prepare another set of test strips and paint the glued patch thoroughly with clear nail polish. Repeat the tests.
    The glue passes if the solvent in the nail polish didn’t weaken it. (I have already confirmed in previous experiments that nail polish will waterproof paper without weakening it and, in these tests, the breakage still consistently occurred in the un-painted portions of the paper.)


I performed these tests with the following dollar-store adhesives:

Double-sided tape
I tested two different kinds of foamless double-sided tape. Both were too weak to be suitable, losing their grip on the paper during the pull test with barely any visible effect.
Roll-on glue tape
This contact adhesive passed the pull test, but was not tamper-evident in the face of picking and peeling at the joint and was so thoroughly weakened by the oil test that it was trivial to pull open without harming the paper.
Non-frosted Scotch/Sellotape
From personal experience, I know that the frosted variety is meant to be possible to peel off without harm if you’re careful. I tested the un-frosted kind and, while it just barely passed the pull test, it was too easy to peel off without evidence. Also, it was severely weakened by exposure to oil and I suspect this to be a trait common to all readily available contact adhesives.
Roll-on glue tape, plus Scotch/Sellotape
Didn’t perform significantly better Scotch/Sellotape on its own.
Glue stick
Passed the dry and oil tests, but whether it passed the peel test depended very heavily on exactly how I applied it, so I can’t recommend it. It also got severely degraded by the solvent in the clear nail polish.
White/School/Elmer’s Glue (PVA/Polyvinyl Acetate)
I was surprised how well this performed… though I probably shouldn’t have been, given that it’s used as a bookbinding glue. PVA dries quickly enough that, after 5 minutes, the still-damp paper next to the joint breaks under test and, after 10 minutes, the paper is back to normal. It is so resistant to peeling that the paper tends to tear, and neither the oil nor the nail polish solvent had a measurable effect on the joint’s strength.

I did, however, notice that peeling at a corner produced damage that was easier to overlook… thus my recommendation to round off the ends of the strip before gluing.

I would have moved on to testing other avenues, such as hot glue, super glue, contact cement, and so on, but I don’t need to test them to know that, at best, they’d match the performance of white glue for this application.  (They’re harder to work with, more expensive, no faster to dry, and PVA already produces a bond stronger than and just as water-resistant as the paper around it.)

(I did, however, test super glue on some scraps of projector transparencies meant for black-and-white photocopiers. It passed the pull test but was trivial to peel apart without leaving evidence of tampering.)

Posted in Geek Stuff | Leave a comment

How To Extend A JavaScript API Based On Promises/A+

I was writing a little unsigned “dump my tabs”  WebExtension for my Firefox Developer Edition when I realized that, outside the jQuery world, I’d only ever used old-style callbacks before.

“No biggie”, I thought. It shouldn’t be too hard to google up the basic DOs and DON’Ts of writing your own “links in the then chain”. Well, it turns out that, no matter how hard I searched, it seemed like everyone was either trying to convince me to use them (already done), or explaining how jQuery does them (I’m not using jQuery), or teaching me how to use ready-made promise APIs, or ignoring caveats I specifically needed to know about.

An introduction to the underlying design pattern behind all of these Promises/A+-compliant APIs was conspicuously absent so, after stumbling around the web for an hour or two and having to resort to IRC channels to get answers to some of my problems, I decided this really needed to be written down. I’m still not sure I feel confident, but that’s more a symptom of being a perfectionist coding in JavaScript than anything else.

I’ll start with a simple synchronous example:

function mongleData(input_data) {
    return new Promise(function(resolve, reject) {
        try {          
            // Do stuff to produce output_data from input_data
        } catch (e) {

Now I’ll explain:

  1. We use a closure to match the “take data, return a Promise” pattern that other APIs use.
  2. We do everything within the body of the promise’s callback. (Most visibly, so we have access to the resolve and reject functions at all times.)
  3. We wrap everything in a try/catch block so we can redirect all errors into reject. (Yes, resolve(foo(x)) will reject any errors thrown by foo, but what about errors thrown while preparing x? Nearly every article I found just ignored that point.

With this basic API, we can now do things like browser.tabs.query({}).then(mongleData).then(somethingElse)

…so what about a promise around an asynchronous API? Well, let’s implement somethingElse:

function somethingElse(input_data) {
    return new Promise(function(resolve, reject) {
        try {
            let success = function(data) { resolve(data); }
            let failure = function(error) { reject(error); }

            doSomethingAsync(input_data, success, failure);
        } catch (e) {

Again, simple once you get the hang of it. The key is in understanding that resolve and reject basically mean “call the next step in the success/failure chain with this argument”.

The try/catch block isn’t strictly necessary, but it’s a good idea to get in the habit so that, if you do have something that can fail, the failure will get propagated into the promise system.

IMPORTANT: If you do further processing inside the success or failure callbacks, don’t forget to add more try/catch.

Finally, what if you want an elegant way to perform an asynchronous operation on each item in an array? For that, we use Promise.all:

function asyncPromiseForArray(data) {
    return Promise.all(data.map(asyncPromiseForItem))


It’s just that simple. Here’s how it works:

  1. then() calls asyncPromiseForArray(data)
  2. asyncPromiseForArray then…
    1. uses Array.prototype.map to call asyncPromiseForItem on each entry in the data Array.
    2. Feeds the resulting array of promises to Promise.all to produce one promise which waits on all of the entries.

Suppose you want to write a promise which takes a list of URLs and retrieves their contents using XMLHttpRequest. Just write an asyncPromiseForItem that does it for one URL (See my previous “asynchronous promise” example) and let this code extend it to a list of URLs.

IMPORTANT: Promise.all fails if any of the promises in the list fail, so, in some circumstances, you may want to intentionally write promises which ignore certain errors (calling resolve rather than reject).

So, where to next? Well, I’d recommend reading these too:

Posted in Geek Stuff | Leave a comment

A couple of simple proofs, made easier to grasp

I’ve always felt a certain undercurrent of awe at how elegant and beautiful the idea of a mathematical proof is.

Don’t get me wrong, I’m not a math geek, but I can’t help but admire the beauty of the proving process. (While, on paper, I fit the bill for a math geek, I’m too results oriented and learned to resent theoretical math because it was “drudgework my computer is supposed to be doing but the teacher would mark me down for that”.)

That said, I wanted to see if I could help people feel that sense of minor awe without all of the “First, you must complete three quests learn more math” that turns people off… so, here we go. I’ve decided to put my communication skills to the test on a couple of simple proofs. (Sorry for the lack of proper math typesetting)

If I’ve done my job, even someone who didn’t finish high school and doesn’t remember their high-school math courses should have a fair shot at understanding this.

TL;DR: If you’re not willing to try reading this, at least scroll down to the bottom where I’ve provided some very short, but awe-inspiring YouTube videos.

Both of these proofs use a technique called “proof by contradiction”. The idea is that, if you can make math that’s the right kind of nonsense, you can say “Statement X must be false because, if it was true, we’d get nonsense”.

It’s a lot easier to explain by example, so let’s get started:

1. Why you cannot divide by zero

Dividing by zero is one of those things everyone seems to wonder about, so let’s use it to demonstrate proof by contradiction.

  1. Suppose that anything divided by zero is zero. (Makes intuitive sense. Cut something into zero pieces and you have none of it.)
  2. That would mean that 1/0 = 0 and 2/0 = 0.
  3. Ok, an equals sign is just a fancy way of saying “these two things are the same, so let’s use that zero to combine the two equations:
    1/0 = 0 = 2/0
  4. …which means that 1/0 = 2/0
  5. Now, since we can do anything to an equation as long as we do the same thing to both sides, let’s get rid of that “divided by zero” by multiplying both sides by zero. (Remember, 5 * 2 / 2 is the same as 5. The twos cancel each other out.)
  6. Oops. Now we’ve got 1 = 2

Well, the only thing we assumed was that something divided by zero equals zero, so that must be what caused us to get nonsense.

That’s proof by contradiction. Assume one (and only one) thing, then show that it causes nonsense.

You also can’t use infinity (or anything else where you always get the same result) but infinity has the additional problem that it isn’t a number in the same way that “east” and “west” aren’t places.

If you’re still curious, I recommend the Numberphile video Problems with Zero, which shows multiple ways to explore this “dividing by zero” thing as well as other math quirks zero has. (And, unlike me, they draw graphs too)

2. How we know that √2 is irrational

One of the things you might have been told by your high-school math teacher is that √2 (the square root of 2) is irrational.

Refresher (square root):
If n*n = x, then “n” is the square root of “x“. For example, 2*2 = 4, so 2 is the square root of 4.
Refresher (irrational number):
When a number can be represented as a ratio (fraction), such as 0.25 = 1/4, we call it “rational”. An irrational number is a number that can’t be represented as a ratio (fraction). Irrational numbers have an infinite number of decimals which don’t get stuck in a repeating loop. (Pi is the most famous irrational number)

Here’s how you prove that √2 is irrational (that the number you have to multiply by itself to get 2 has an infinite number of decimal places without repeating, like Pi):

  1. For the purpose of our proof, lets assume that √2 is rational.
  2. If √2 is rational, that means that we can find some ratio (fraction) a/b that’s equal to √2.
  3. Before we start, we’ll say that a and b have already been put in simplest terms. (eg. There’s no number that both a and be can be divided by without getting decimals)
    √2 = a/b
  4. Algebra says that you can do anything you want to an equation as long as you make the same change to both sides of the equals sign, so they stay equal.
  5. Let’s multiply both sides by b
    √2 * b = a/b * b
  6. Dividing by b and then multiplying by b is the same as doing nothing, so a / b * b is the same as a
    √2 * b = a
  7. Now, let’s get rid of that symbol. First, since √n * √n = n (that’s the whole point of square roots), we’ll just square both sides (multiply both sides by themselves):
    (√2 * b) * (√2 * b) = a * a
  8. Now, to actually get rid of the symbol, we use another rule that’s already been proven called the associative property, which says that you can reorder the numbers and move the parentheses around in a sequence that’s all multiplication or all addition:
    (√2 * √2) * (b * b) = a * a
    2 * (b * b) = a * a
  9. This can also be written:
    2 * b² = a²

    Now for the clever bits:

  10. Since the left-hand side is 2 times something and both sides are equal, the right hand side must also be 2 times something (eg 2*8 = 4*4 is the same as 2*8 = 2*2*4).
  11. Let’s say that a = 2 * c and replace a in the equation. (We can do this because we’re not adding any new limits on what a can be. We’re just saying that c is 1/2 of whatever a is.)
    2 * b² = (2 * c)²
  12. OK, now we use the definition of what an exponent is (2² = 2 * 2) and the associative property to get rid of those brackets:
    2 * b² = (2 * c) * (2 * c)
    2 * b² = (2 * 2) * (c * c)
    2 * b² = 2² * c²
    2 * b² = 4 * c²
  13. Now let’s divide both sides by 2
    b² = 2 * c²
  14. OK, we’re half-way around to step 9. One side can be divided by 2, so the other side must also be divisible by 2. If we were to repeat steps 10, 11, and 12, we’d be back to step 9 exactly.
  15. The only way you get this behaviour is if a and b are even numbers, which means that a and b are both divisible by 2.
  16. …but, in step 3, we already assumed we’d divided them until they weren’t divisible by 2 anymore!

This means that we’ve got a nonsense equation and the only thing we assumed without proving it was that √2 is rational (that there exist integers (numbers without decimals) a and b where √2 = a/b)

Therefore, √2 must not be rational.

And, indeed, if you’ve got an irrational number, that would explain why step 3 failed. If you’ve got an infinitely complex number, then you’d have to divide a and b by 2 an infinite number of times to prepare it for this proof and, as I said earlier, infinity is a direction, not a destination.

UPDATE 2017-06-03: If you’d like some visual examples, check out 5 Unusual Proofs by PBS Infinite Series.

3. Awe-Inspiring YouTube Videos

So, yeah. If you really don’t want to read the proofs, get your dose of awe from these instead, please:

Posted in Geek Stuff | Leave a comment

Things You Might Not Know About robots.txt

While bringing one of my old sites up to spec, I realized that I’d never actually looked into robots.txt beyond copy-pasting ready-made directives.

So, without further ado, here’s a list of everything I could find about it which isn’t obvious if you also learned about in an ad-hoc fashion:

  1. Crawlers will obey only the directives from the most specific User-agent rule which matches. (I assume this is because the Allow directive is much younger than the Disallow directive.)
  2. Paths must start with a leading slash or they’ll be ignored.
  3. Longer paths (defined by character count) win over shorter paths when both Disallow and Allow match.
  4. Allow is younger and less likely to be supported by crawlers than Disallow.
  5. Crawlers will compare against the percent-encoded form of URLs when checking rules.
  6. Match patterns aren’t regexes, substring matches, literal matches, or globs… they’re literal prefix matches augmented with support for two metacharacters:
    1. The * to match any string of characters.
    2. The $ to match the end of the URL (It has no special meaning when not at the end of the pattern, so it can be escaped by using $* instead)
  7. robots.txt won’t prevent links from appearing in Google.
    1. Google will still show excluded pages if linked from allowed pages… the listings will just be bare URLs without page titles or excerpts.
    2. Pages covered by robots.txt can’t contribute their PageRank to your site.
    3. Bottom line: robots.txt is for controlling resource consumption. Use the HTML noindex/nofollow meta tags and X-Robots-Tag HTTP headers for hiding content for other reasons.
  8. Don’t exclude GoogleBot from your CSS and JavaScript. Google actually renders your pages in order to find more content than competitors and you’ll be penalized for this under their “don’t show GoogleBot different content than real users” policy because it could be interfering with the ability to retrieve AJAX-loaded content or detect a paywall.
  9. I shouldn’t have to say this, but robots.txt is advisory.
    1. Use it to hide pages like your shopping cart page.
    2. Use it to prevent search engines from wasting their time in your spambot honeypots.
    3. Use it to keep search engines from walking a big tree of dynamically-generated filter pages which ultimately terminate at pages you’ve indexed in a more static fashion elsewhere in the site.
    4. Use it to opt out of aggressive prefetching extensions like Fasterfox
    5. …just don’t think it has any benefits for security or secrecy.
  10. Historically, some search crawlers have been finicky, so be strict in your structure:
    1. Order your directives “User-agent, Disallow, Allow, Sitemap, Crawl-delay, Host
    2. Only put one path pattern per Disallow/Allow line.
    3. If you must, comments begin with # but I advise against them.
    4. Avoid blank lines when feasible.
  11. The non-standard Host directive allows you to tell Yandex.ru (which powers DuckDuckGo at the moment) that domains X, Y, and Z are mirrors, with X being the authoritative source.
  12. Google does not honour Crawl-delay. You need to set it in the Google Webmaster Tools.
  13. Use the Google Search Console in Google Webmaster Tools to keep an eye out for robots.txt mistakes hiding pages you actually want crawled.
  14. Make sure your site is replying with Error 400 if query parameters fail to parse.
    1. Google will sometimes generate search queries to try to tease out hidden content and, as one of my sites discovered.
    2. On one of my sites, I have a query parameter that’s used to filter a listing of titles by their first character. (ie A-Z or a number, like a pocket phone directory)
    3. Despite it not being tied to a search field anywhere, GoogleBot concluded it was a search field and started spamming it with irrelevant crud.
    4. If GoogleBot receives Error 404 after it received a 200 OK for other values of the same query parameter, it apparently concludes that Error 404 means “No results. Try another.”
    5. Error 400 is the HTTP response for “malformed request”. It’s typically used for things like JSON APIs, but it applies equally well to “Validator expected a single alphanumeric character. Received a GoogleBot-generated query string.”
    6. Sending error 400 for any malformed URL causes GoogleBot to quickly learn to confine its guessing to actual search fields.

For more, the SEOBook.com Robots.txt Tutorial is the best “from beginning to reference charts” introduction I found while catching up my knowledge.

P.S. While not specifically a robots.txt thing, I learned that Google will honour an `hreflang` attribute on <link> tags and Link headers and it’s always a good thing to give GoogleBot more information to make informed crawling decisions with.

Posted in Web Wandering & Opinion | Leave a comment

Checking if daemons have been restarted

TL;DR: Use this script.

I’ve been working on an ansible script to set up web hosting (because cheap VPSes are cheaper than cheap shared hosting for the features I want) and, since I don’t like having to put in effort for maintenance and debugging, I’ve been trying to make it as robust as possible.

As part of that, I wrote a little python helper script (tested under 2.7 and 3.3) which checks that all processes matching a given name were started after the modification date of a given config file. (In other words, it checks if the config changes have been applied)

I just thought I’d share it in case anyone else finds it useful. Unlike many other approaches I’ve seen for looking up when a process was started, it doesn’t read the system time twice to convert from “seconds since boot” to “seconds since the epoch”, so it should give perfectly deterministic results.

Posted in Geek Stuff | Leave a comment