A Better Linkifying Regex

From time to time, I run across situations where the linkifying Greasemonkey script I use mistakenly includes a closing parenthesis in what it considers to be a URL.

Given that I can’t remember a single situation where I needed to linkify a URL with nested unescaped parentheses but URLs inside parentheses have bitten me repeatedly, I decided to solve the problem in a way that’ll work with any regex grammar.

const urlRegex = /\b((ht|f)tps?:\/\/[^\s+\"\<\>()]+(\([^\s+\"\<\>()]*\)[^\s+\"\<\>()]*)*)/ig;

Basically, it matches:

  1. http, https, ftp, or ftps followed by ://
  2. an alternating sequence of “stuff” and balanced pairs of parentheses containing “stuff”

…where “stuff” refers to a sequence of zero or more non-whitespace, non-parenthesis characters (and, in this linkify.user.js version, no plus signs, double quotes, or angle brackets either).

Embarrassingly, aside from two corrections and a few extra characters in the blacklists that I kept from the original linkify.user.js regex, this is a direct translation of something I wrote for http://ssokolow.com/scripts/ years ago… I’d just never remembered the problem in a situation where I could spare the time and willpower to do something about it.

Here’s the corrected Python original.

hyperlinkable_url_re = re.compile(r"""((?:ht|f)tps?://[^\s()]+(?:\([^\s()]*\)[^\s()]*)*)""", re.IGNORECASE | re.UNICODE)

The corrections made were:

  1. Allow the pairs of literal parentheses to be empty
  2. Move a grouping parenthesis so that a(b)c(d)e will be matched as readily as a(b)(c)d.

Markdown source code works especially well to demonstrate the difference.

Naive Linkifying Regex My Linkifying Regex
[FreeDOS](http://freedos.org). [FreeDOS](http://freedos.org).
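The difference is also easy to check from a Python prompt. This is only a minimal sketch: hyperlinkable_url_re is the regex shown earlier, while naive_url_re and find_urls are my own assumptions for illustration (the naive pattern is a guess at what a typical linkifier does, not the actual linkify.user.js regex).

```python
import re

# The corrected regex from this post.
hyperlinkable_url_re = re.compile(
    r"""((?:ht|f)tps?://[^\s()]+(?:\([^\s()]*\)[^\s()]*)*)""",
    re.IGNORECASE | re.UNICODE)

# Assumed stand-in for a naive linkifier: everything up to the next whitespace.
naive_url_re = re.compile(r"https?://\S+", re.IGNORECASE)


def find_urls(regex, text):
    """Return every URL the given regex would linkify in text."""
    return [match.group(0) for match in regex.finditer(text)]


markdown = "[FreeDOS](http://freedos.org)."
wiki = "(see https://en.wikipedia.org/wiki/Rust_(programming_language))"

print(find_urls(naive_url_re, markdown))          # swallows the trailing ")."
print(find_urls(hyperlinkable_url_re, markdown))  # stops before the ")"
print(find_urls(hyperlinkable_url_re, wiki))      # keeps the balanced pair
```

Note that neither pattern does anything about trailing punctuation outside parentheses, such as a sentence-ending period directly after a URL.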

Theoretically, look-ahead/behind assertions are enough of an extension to regexp syntax to allow real HTML parsing, so I could probably also support nested parens, but I’m just not in the mood to self-nerd-snipe right now.

Posted in Geek Stuff | Leave a comment

Mixed Feelings on Cloanto and Amiga/C64 Forever

UPDATE: I’ve received a response from Cloanto and, after talking to a real human about this, I’m convinced that this is mostly, if not entirely, a pile of unfortunate mistakes that they sincerely want to get fixed. I’ve added notes to clarify things.

As someone who prefers to take the high ground, when I was offered the opportunity to get Amiga Forever and C64 Forever at a big discount, I jumped at it. My first PC was an original IBM PC and I’d missed out on those famous platforms entirely… here was a chance to get into them without compromising my principles.

I also loved how, as a Linux user, Cloanto seems to be walking a balance with their cross-platform support page… admitting that their digital download releases are MSI installers, but providing what I’ll call “relaxed support” for other platforms from CD/DVD versions and clarifying that users have confirmed the ability to generate them from the MSIs using Wine or equivalent.

However, after deciding to purchase, I noticed that their website design wasn’t the only thing that, to be kind, felt a bit dated.

First, their purchase process. Is it really necessary to ask users for their shipping address if they’ve selected a digital download item and PayPal payment? I could easily see that driving away some on-the-fence buyers who value their privacy.

UPDATE: We’re still talking, but this looks to me like one of those “their merchant services provider doesn’t understand this market segment” issues… they’re already working on a new site design to help remedy that problem as much as possible.

Second, the post-purchase e-mails. Where do I start?

  1. Is it really necessary for me to receive seven e-mails in response to a successfully completed transaction?
  2. What’s the point in sending me an e-mail, just to tell me to log into my e-mail account to follow the instructions in the e-mail I’m about to receive? (No joke.) I’m not going to see it until I’ve done what it’s telling me to do!
  3. An anti-fraud measure involving them asking me to confirm my PayPal e-mail? What’s wrong with just asking PayPal if I’m a verified user? (Oh well, if Cloanto or Avangate start spamming me, I can just move PayPal to a new alias and delete the old one.)
  4. Is it really necessary to send three different confirmation e-mails for different stages of the process, rather than just waiting a couple of seconds and sending one combined e-mail?

Oh well… on to the next problem.

UPDATE: The seven e-mails are all from their merchant services provider (Avangate) and Cloanto didn’t dispute that they’re excessive. They’ve passed on my concerns via the B2B communication channels available to them.

Third, the registration keys.

As soon as I saw those, I immediately worried that maybe Amiga Forever and C64 Forever were online-activated products and the installers would stop working if Cloanto went out of business.

(Thankfully, the “Forever” in the title does appear to be accurate, as they installed without complaint on the quarantined Windows XP retro-PC that does double-duty as an online activation tester. No need to demand a refund.)

UPDATE: We’re still talking, but I’ve suggested, at minimum, that an explanation of the key’s purpose (unlocking the paid content in a multi-role offline install package) be provided either with or before showing the keys. I also made suggestions for a longer-term strategy.

Fourth, when you’re selling a game-related product in the era of services like Steam and GOG, this can easily trigger buyer’s remorse:

You have 50 downloads remaining.
Link expires on: November 16, 2017.

There are many reasons this is a problem:

  1. This is retro-emulation stuff with the bulk of it being more than 20 years old. It’s already easy enough and tempting enough for people to pirate it without adding an expiry message so they can’t rationalize it as paying for a booklet of 50 off-site backup coupons.
  2. In the era of cheap hosting like Amazon S3 and “re-download is better than backup” services like Steam/Origin/uPlay/GOG/etc., is it really necessary to make people with flaky connections worry about whether their download manager’s resume feature will chew up most of their redownloads?
  3. The installer acts as if the same download is offered for both trial and paid copies, depending on whether you enter a registration code. Again, why am I made to agonize over a redownload limit and expiry counter on this thing?

All in all, this screams “Danger, Will Robinson! Danger!” because this kind of out-of-touchness makes me worry about whether they’ll remain competitive enough in the market to avoid going under.

*sigh* Ok, I’ve paid for the damn thing and, as much as it hurts, I’ve already spent far too much money on taking the moral high ground for other platforms (eg. I use a Retrode and buy actual cartridges, rather than being locked into Nintendo’s Virtual Console). What’s next?

UPDATE: They’ve actually been trying to get Avangate to understand this for a while and providing their own accounts system to resolve this is part of the reason they’re working on a new site design.

Fifth, the download speed.

An average download speed of 80KiB/s because it gives me spikes of full speed alternated with several seconds of nothing… ’nuff said.

UPDATE: Avangate serves the files.

Sixth, the license agreement.

  1. Make up your mind. The website I can see before I pay seems friendly and willing to allow me to install on any platform I have the know-how for, but the license says that installing it on any platform other than Windows, MacOS, or GNU/Linux (eg. FreeBSD) will terminate my license.
  2. I can only install it on two machines? Dammit, I forgot to pay attention to whether that “wait at least 6 months” rule was only for the evaluation version.
    Does my “moral high ground” rule mean that I can’t install it on both my Linux desktop and my Linux handheld until 6 months after I remove it from the XP machine I used for testing?

UPDATE: I had to clarify my concerns in my response. I’m waiting for a reply.

Seventh, the RP9 files.

Not strictly Cloanto’s fault, but there are no Google results which mention that you can get more broadly compatible disk images from an RP9 using 7-Zip. I just figured it out by accident when I right-clicked one on the test machine.

UPDATE: I’ve suggested some minor adjustments to the knowledge base page which shows up in Google and/or the “RP9 Toolbox” software to draw more attention to the link to the RP9 spec which I missed.

Eighth, the games themselves.

Ok, so, dude, I bought this pack because I want to stay legal. Ya dig? …so why am I seeing a cracking group intro when I fire up B.C.’s Quest for Tires on the C64?

I seriously doubt the rightsholder for the game got permission to use the cracking group’s intellectual property and just because it’s an unauthorized derivative work doesn’t magically cause the rights to be forfeit.

I’m now stuck in one of those BS situations where I’m only “legal” because the guys I sided with have the bigger stick, not because they’re actually in the moral right.

…so, what did I pay for then? Kickstart ROMs and disk images that would have fallen into the public domain by now if Copyright hadn’t become corrupted and the warm, fuzzy feeling of having a slightly lighter wallet?

I’m really starting to understand why the GOG.com user base considers Cloanto to be at fault for GOG.com failing to negotiate a deal for the Kickstart ROMs so they could include Amiga games in their catalogue.

I’d say “I give up”, but that might be taken as “I’m going to start pirating” when, really, it just means that I’m probably going to buy fewer retro-games. I already have

They wonder why people pirate things when, even if you spend hours and tie yourself in knots trying to stay compliant with the letter of copyright law, your upstream suppliers are unilaterally deciding that a cracking group’s IP deserves no protection because it’s an unauthorized derivative work. It’s simply flat-out impossible to enjoy early cultural artifacts in the world of gaming and retain the moral high ground in a world of bit-rotting floppy discs. 🙁

UPDATE: They’ve actually brought this “We got the rights to the games, but what about the copyright on the code the crackers wrote?” issue up with the US Copyright Office multiple times.

Also, on the “GOG failed to negotiate a deal” front, Cloanto is apparently aiming to eventually get the Amiga/C64 IP to the point where it can be spun off as a non-profit… it’s just not as simple as I make it sound.

Finally, the convenience (or lack thereof).

I can only assume that Cloanto is mostly trying to compete with pirates based on convenience (like Steam does quite well)… but does this seem convenient to you?

  1. Install both things on my quarantined XP machine so I can be sure they’re not phoning home.
  2. Ask both tools to generate the promised ISO versions (and printable covers, since I’m going to this effort anyway) because using p7zip to unpack the installers on my Linux desktop without running them produces unhelpful filenames.
  3. Put everything including the ISOs, my purchase invoice, and a text file containing the registration keys into another DVD ISO so I know everything can be kept together nicely.
  4. Run all three of the aforementioned ISOs (official C64, official Amiga, combined backup) through dvdisaster to augment the raw ISO filesystem with forward error correction in case the discs start to bit-rot after my download links have expired.
  5. Burn all three to discs from the stockpile of Taiyo Yuden T02 DVD+R media that I use for archival (which, by the way, they no longer make).
  6. Write the order number and my name on all three discs so that they won’t look pirated if Cloanto goes out of business and their records become unavailable.
  7. Write the registration keys on the official media, since they won’t have them in the burned data.

UPDATE: Already addressed as a side-effect of addressing the earlier concerns.

…and no, I couldn’t just rely on pirated copies as my off-site backup. Those bits have the wrong colour.

Posted in Geek Stuff | Leave a comment

Simple Alarm Clock Script For Linux

TL;DR: Install python-dateutil, pytimeparse, and this script, then see the --help output for more details.

For a while, I’d been using the at command to schedule alarms when I needed to wake up in the morning, but I found that it was a fragile solution because of how MPlayer and its descendants interacted with PulseAudio’s session-centric setup and the presence or absence of a video output.

…and you really don’t want a fragile solution for your alarm clock, so I decided to write a little helper script that could run inside my quake-style terminal in my user session so it would Just Work™.

You’ll want to edit the hard-coded media player command it uses to actually play the alarm, but, otherwise, it should be pretty polished for something I just hacked together for my own use.

It’ll accept arguments in two forms:

  • wakeme at 6am
  • wakeme in 3 hours

(It accepts a great many formats for times and durations, so I’ll just point you at the docs for dateutil.parser.parse() (times) and pytimeparse (durations) for the complete list.)

Either one will cause it to echo back its interpretation of what you asked for (so you can double-check that it understood properly) and then sleep until it’s time to wake you.
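To illustrate the “at” versus “in” split, here is a minimal Python 3 sketch. It is not the wakeme script itself: parse_duration and parse_clock_time are deliberately tiny, hypothetical stand-ins for pytimeparse and dateutil.parser.parse() so the example runs with only the standard library, and wake_time is a name I made up for the dispatch step.

```python
import re
from datetime import datetime, timedelta

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def parse_duration(text):
    """Stand-in for pytimeparse: handles '3 hours', '90 minutes', '45 seconds'."""
    match = re.fullmatch(r"(\d+)\s*(second|minute|hour)s?", text.strip(),
                         re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    return timedelta(
        seconds=int(match.group(1)) * UNIT_SECONDS[match.group(2).lower()])


def parse_clock_time(text, now):
    """Stand-in for dateutil.parser.parse(): handles '6am' and '6:30 pm'."""
    match = re.fullmatch(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)", text.strip(),
                         re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognized time: %r" % text)
    hour, minute = int(match.group(1)), int(match.group(2) or 0)
    if not (1 <= hour <= 12 and minute < 60):
        raise ValueError("unrecognized time: %r" % text)
    # Convert the 12-hour clock to 24-hour (12am -> 0, 12pm -> 12).
    hour = hour % 12 + (12 if match.group(3).lower() == "pm" else 0)
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # A time that has already passed today is taken to mean tomorrow.
    if target <= now:
        target += timedelta(days=1)
    return target


def wake_time(args, now):
    """Turn ['at', '6am'] or ['in', '3', 'hours'] into a datetime."""
    if len(args) < 2 or args[0] not in ("at", "in"):
        raise ValueError("expected 'at <time>' or 'in <duration>'")
    rest = " ".join(args[1:])
    if args[0] == "at":
        return parse_clock_time(rest, now)
    return now + parse_duration(rest)


now = datetime(2020, 1, 1, 22, 0)
print(wake_time(["at", "6am"], now))         # 2020-01-02 06:00:00
print(wake_time(["in", "3", "hours"], now))  # 2020-01-02 01:00:00
```

Echoing the computed datetime before sleeping is what lets you catch a misparse while you’re still awake; the sleep itself could then be as simple as time.sleep() on the difference between the target and datetime.now().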

Installation is as simple as:

  1. Make sure Python 2.x is installed (I haven’t tested 3.x)
  2. Install python-dateutil
  3. Install pytimeparse
  4. Put wakeme in your PATH

Posted in Geek Stuff | Leave a comment

What Disney Has Forgotten About Classic Donald Duck

Who hasn’t seen at least one of the classic Donald Duck cartoons from the 1930s, 40s, and 50s? You know, the ones where, in the later cartoons, the theme says “Who never never starts an argument?”

Sadly, it seems that Disney has forgotten what made those so special. About 5 years ago, when I had the opportunity to borrow the Chronological Donald box set [1] [2], I saw that it ended with an example of a modern Donald Duck cartoon from Mickey Mouse Works which Leonard Maltin referred to as proof that Donald was still alive and well.

The problem was that, in that cartoon, we see Donald at the zoo, trying to take a picture of the Aracuan Bird for Daisy, while the Aracuan Bird keeps tormenting him… I felt sorry for Donald and that’s not supposed to happen!

What Disney seems to have forgotten is that classic Donald Duck cartoons are supposed to be a caricature of our own failings. That short felt more like a Warner Brothers cartoon in disguise.

I gave it some thought and I managed to come up with three rules:

1. Donald is the maker of his own misfortune

This rule is satirized right in the opening song. They sing about how Donald never loses his temper and so on, and Donald is listening and agreeing, but we all know how false that is.

Look at Donald trying to make waffles. What goes wrong? He leaves his scrapbooking rubber cement out and mistakenly uses it in the recipe… and why was he using rubber cement for scrapbooking in the first place?

What about getting into a fight with Huey, Dewey, and Louie? Same verdict. He starts it by smashing up their snowman with his toboggan and then refuses to concede defeat as things escalate.

Getting into a fight with a robot butler in a museum of modern marvels over whether he can keep his hat on? Not only is it stupid to argue with a machine, the jerk cheated the turnstile with a coin on a string!

2. Animals are innocent

Look at the cartoons. Whenever Donald gets into a fight with an animal, he always throws the first punch. For example, look at when he goes on a picnic. Sure, the ants show up to cart off his food, but that’s just ants being ants! How does Donald respond? He provokes them by playing mean-spirited pranks on them.

Donald the Beekeeper? He keeps escalating the fight when the bees don’t take kindly to having their honey stolen. Donald the highway-builder, assigned to remove Chip and Dale’s tree? He pranks them when they mistake his steam shovel for a dragon.

This is the biggest mistake that the modern cartoon made. Warner Brothers has antagonistic slapstick between two characters who, if you really think about it, tend to be jerks. Disney is supposed to do better. (eg. Classic Pluto cartoons are caricatures of things you can easily imagine your dog actually doing. Classic Goofy cartoons began as parodies of wholesome father-and-son shows, then became parodies of “how to” videos.)

3. Inanimate objects are antagonistic

While you probably haven’t gotten mocked by a clock spring or taunted by a steam piston (“So!” “Ssssoooo what?”), we’ve all had those moments when we felt so frustrated because “Why won’t you just work!?”.

These moments are Donald Duck’s bread and butter, with inanimate objects like rocks, bike pumps, and machinery doing the wildly improbable or, sometimes, even the impossible in order to produce the most frustrating outcome for Donald Duck.

From a pebble under his camping cot getting thrown and landing in the one place where it can trigger a rock slide, to his rubber cemented waffle batter behaving in a very familiar way to anyone who’s ever used a not-brand-new tube of rubber cement, Donald Duck spent a ton of time being a send-up of all of the little frustrations of day-to-day life.

So… how would I have done it?

The key is to make it completely clear that Donald’s troubles are his own damn fault.

After Daisy asked Donald for the picture, I’d have started with Donald getting frustrated with the view and deciding he knows better than the posted zoo rules… maybe climbing a tree or fence to get a better angle for the photo.

Then, when he encounters the inevitable pratfall, the Aracuan Bird can laugh along with the audience, egging Donald on further. From this point on, it’s clear that Donald’s at fault.

The episode could then take on that Roadrunner and Coyote-esque quality that happens when Donald provokes an innocent creature, except more focused on antagonistic inanimate objects since the Aracuan hasn’t truly been given cause to fight Donald.

Posted in Writing | Leave a comment

A more formal way to think about validity of input data

I’ve begun to port one of my hobby projects from Python to Rust and, while setting up the clap argument parser, I found myself having to bind to the access(2) libc function myself.

Yes, it exposes you to a race condition exploit if you’re not careful, because the permissions could change between checking and depending on them. Yes, it’s a documented fact that it may be more permissive than actually attempting to access the filesystem. (I believe the situation I’m remembering was “access() doesn’t consider ACLs when evaluating permissions”) …but how else am I to implement a “fail early” check for “Can I create files in this directory?” when there exist real in-the-wild examples of filesystems (eg. AFS) having been configured to allow the creation of a hypothetical test file, but not the subsequent deletion?

That said, despite my intent to use Rust to ensure I handle every recoverable error case, there’s still a certain appeal to being able to point to a spot and say “beyond this point, this piece of data is trustworthy”.

Thinking about this made me realize a nice, simple way to think about handling input data: by analogy to passing by value (with deep copying) or by reference.

NOTE: While my examples will all use command-line arguments, this applies to any kind of input data.

Value Arguments

If a command-line argument cannot become invalid after being validated, then it’s a value argument. Examples of this include:

  • Boolean flags like “mirror this print job”
  • Integers representing things like the number of copies of a document to print
  • Strings which can’t experience any kind of namespace collision

You can validate value arguments once and then trust that they’ll stay valid.

Reference Arguments

If an argument depends on something outside your control to determine its validity, then a validity check only applies to the instant you perform it. Common examples of “reference arguments” include:

  • Filesystem paths (Between the check and use, permissions could change, a creation/deletion/rename could invalidate the path, etc.)
  • File descriptors (Even a supposedly local file descriptor could be on a network-mounted drive which goes away)
  • Strings used to create filenames (someone could create a file with that name which you lack the permissions to manipulate)
  • Network addresses
  • Cached results of arbitrary checks

This means that you need to be prepared for the unexpected every time you use a reference argument and you can only check separately from using them if the following conditions are met:

  1. The check has no security implications and can be safely removed
  2. You accept that the check could fail but the attempt could still succeed
  3. You accept that the check could succeed but the attempt could still fail
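As a minimal Python sketch of a check that satisfies all three conditions (the function names and the directory/filename split are assumptions for illustration; os.access() is Python’s wrapper around the same access(2) call):

```python
import os


def looks_writable(directory):
    """Advisory check only: the answer may be stale or wrong by write time."""
    return os.access(directory, os.W_OK | os.X_OK)


def save_report(directory, filename, text):
    """The real attempt. Every failure is handled here, not in the check."""
    path = os.path.join(directory, filename)
    try:
        # Mode "x" fails instead of overwriting if the name has been taken.
        with open(path, "x") as handle:
            handle.write(text)
    except OSError as err:
        return None, err
    return path, None
```

Deleting every call to looks_writable() changes nothing about correctness, because save_report() never assumes the check ran or that its answer still holds. All the check buys is an earlier, friendlier error message.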


Type: Boolean, Value
Why? Nothing external to the program will invalidate this.

(The only way this could be a reference is if there were some kind of wrapper which detected the orientation of pre-punched cardstock in the printer and then did or didn’t pass this flag. The user could invalidate it by flipping/rotating the cardstock before the print job actually begins.)

Type: Boolean, Reference
Why? The flag implies that either the user or the code detected a rewritable CD/DVD, but the user could swap in a non-rewritable disc before it actually gets used if the script does something long-running first, like generating an ISO in /tmp.

Because you can only erase a rewritable disc, this must be validated as late as possible. (ie. After the drive tray has been locked and right before the operation would take place)

Argument: Number of copies to print
Type: Integer, Value
Why? The only relevant detail which can change is how much paper is in the printer, and, if there isn’t enough, the proper solution isn’t to reduce the size of the print job.

Argument: File descriptor
Type: Integer, Reference
Why? The descriptor could be pointing at a resource on a network-attached device that goes away.

Argument: Document Title
Type: String, Either
Why? Whether to treat this as a reference depends on where it will end up and how you handle failure.

If you’re converting an eBook with ebook-convert from Calibre, then it’s a value because the output filename is specified separately and whether your title will override the source file’s metadata is not up for debate.

Argument: Output Filename
Type: String, Reference
Why? No matter how many times you validate, it’s possible that a read-only file will have taken that name by the time you call open().

The Takeaway

  • Think in terms of how one piece of data depends on another and don’t forget that dependencies can extend outside of your program.
  • Whether a piece of data can be validated once and then trusted is unrelated to its data type or how it’s passed within your code. (You can pass a filename or URL by value but it’s still a reference to an external resource. A network filesystem will subvert your expectations for how reliable it is to hold an open file descriptor. etc. etc. etc.)
  • The definition of “valid” for a piece of data may depend on how your program is intended to be used. (A human might specify a filename and re-run your tool if it’s already taken. From your perspective, that means it’s valid even if it causes the process to abort. A GUI frontend, on the other hand, probably won’t know how to detect that kind of failure and retry. Expose a more foolproof API by using something like mkstemp or mkdtemp and then returning the newly-created path.)
  • Functions like access which check the validity of a reference are unreliable and should only be used to catch obvious mistakes early so the user doesn’t have to waste their time waiting for a failure that could have been anticipated. If it’s unsafe to comment them out, you’re doing it wrong.
    (eg. You can use access to detect read-only target directories before you know the exact output filename… with the caveat that they could be made read-only between the check and the attempt to actually write the file.)
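The mkstemp suggestion above could be sketched like this in Python. The function name, prefix, and suffix are hypothetical; tempfile.mkstemp() and os.fdopen() are standard-library calls.

```python
import os
import tempfile


def write_unique_output(directory, text, prefix="output-", suffix=".txt"):
    """Create a brand-new file in directory and return the path actually used."""
    # mkstemp picks an unused name and creates the file in one step, so there
    # is no gap between choosing a name and claiming it.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix, dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path
```

Because the caller receives the path rather than supplying it, a GUI frontend never has to detect a “name already taken” failure and retry.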

Posted in Geek Stuff | Leave a comment

On Making Steam Machines Successful

TL;DR: Provide a summarized representation of system requirements, make it easier to decide between different models, partner with YouTube and/or NetFlix to make the device more valuable, spin the cost of a Steam Machine as an investment in cheaper per-game costs and long-term compatibility, and appoint/hire a hype management expert.

With Steam Machines, Valve is quite possibly the first company to have a viable idea for a non-traditional gaming console. However, there are still several ways in which they don’t seem to be learning from history.

One of the greatest strengths consoles have always had (and, with PCs taking the lead on hardware innovation, their main strength) is their appliance-like simplicity. Conversely, the greatest weakness of personal computers is that they do an inherently complex job and attempts to reduce them to mere appliances have always crippled them to the point of irrelevance.

However, as the millennial generation grew up with computers, the definition of an acceptably simple console shifted closer and closer to what Steam now provides, growing a simplified operating system, game browser, online store, and a menu analogous to the Steam overlay.

While I’m not a fan of online DRM, I can’t help but approve of the money and effort Valve has been pouring into getting people to make Linux builds of games (making a build without Steamworks CEG for sites like GOG and Humble is easy once you get that far), so I thought I’d point out the main mistakes Valve seems not to be learning from the tales of other “licensed hardware” consoles, like the 3DO, the Philips CDi, and the Nuon.

The Problems

First, price. Without the ability to sell the console as a loss-leader or enjoy the massive economies of scale for a single model, these consoles always lose out on price.

Second, confusion. To put it bluntly, “If I wanted to do research, I’d be a PC gamer”. Picking the right Steam Machine is a serious issue and at odds with the “console-like simplicity” niche that everything else about the Steam Machine has been aiming for.

Valve should also keep in mind that a glut of choice without clear and obvious determinants was one of the big contributing factors to the video game crash of 1983. The difference being that, here, the answer is “Stick to Nintendo/Sony/Microsoft” rather than “Shy away from the entire market”.

Third, inertia. Whenever you look at technologies which failed to live up to their potential, one of the recurring themes is that they fell off the wave of excitement they were building and it died away. Steam Machines have suffered the same problem, which makes future marketing efforts much more difficult.

That said, the biggest problem the 3DO, CDi, and Nuon had was their poor game libraries… something Valve has been doing an excellent job of solving. This is why I firmly believe Steam Machines have a chance.

The Solutions

First and foremost, Valve needs to simplify buying decisions. I strongly suggest the following:

  1. Summarize system requirements into numbered hardware classes and focus promotional efforts on three at a time, representing entry, mid, and high-level hardware:
    • Class 1: Entry Level on release day
    • Class 2: Mid-level on release day, Entry Level when Class 4 is announced
    • Class 3: High-level on release day, Mid-level when Class 4 is announced
    • Class 4: High-level when announced
  2. Set up something like the Windows Logo Program through which hardware partners are approved for “Class 1/2/3” badges to use on their hardware and packaging.
  3. Use class badges to summarize the required and recommended system requirements on each game in the Steam store and add support for filtering by them.
  4. By default, Steam Machines should filter by required class. (Completely. Even front-page promotions which don’t run on the filtered class should be eliminated or collapsed into a “deals for your other devices” bar on Steam Machines incapable of playing them.)
  5. Produce a prominent “What Steam Machine Is Right For Me?” comparison matrix based, not on system requirements or features, but on comparing which games will run on which of the three classes currently being promoted. I’d suggest the following three columns with “and more…” hyperlinked to an appropriate catalogue search:
    1. Class 2: <list of popular games> and more…
    2. Class 3: Everything in Class 2, plus <list of popular games> and more…
    3. Class 4: Everything in Classes 2 and 3, plus <list of popular games> and more…

If users start thinking of the classes in terms of “a Class 1 game” rather than “a Class 1 machine”, so much the better for marketing purposes. (It could be leveraged into a tool for evaluating current desktop PC hardware or possibly planned purchases for their suitability to gaming, thus helping Steam as a whole.)

Second, Valve needs to change the conversation about price. When you buy a modern console, everyone knows that you’ll need more than one because, at best, you’ll get one generation of backwards compatibility and you may have to re-buy your games for that.

When you buy a Class 5 Steam Machine, the Steam Runtime guarantees compatibility all the way back to Class 1 and, when you buy a Class 8 Steam Machine, you don’t have to re-buy anything. Furthermore, games are never locked to your console the way they are with the Wii Virtual Console.

Also, as everyone on PC knows, Steam sales allow you to build your library much more cheaply than on a regular console.

A Steam Machine is an investment in spending a lot less on the actual games.

Third, Valve needs to build on that “fewer pieces of hardware” angle. If a Steam Machine is maximally backwards compatible, why can’t they also partner with Google and/or NetFlix to include a tweaked copy of Chrome and ChromeOS apps for YouTube and NetFlix? I know for a fact that NetFlix has one.

Failing that, they could pour some effort into Shashlik to get the YouTube and NetFlix Android apps running on SteamOS in a polished way. Hell, if they’re not planning to compete with the Google Play Store, that’d allow them to add “plays select Android games” to the less emphasized portion of the Steam Machine feature list.

(“less emphasized” because it’d probably be tricky to get the go-ahead to preload the Google Play Store app and “select games” because, last I checked, most Android games used native ARM machine code and I’m unsure what Intel would want for their libhoudini ARM-to-x86 emulation layer for Android.)

Finally, Valve needs to manage their marketing better. Allowing the excitement surrounding the Steam Machine to bleed away into “Valve Time” [2] [3] will cripple any attempt to break into an existing market. Find someone who knows how to walk the hype tightrope and listen to their advice very closely.

However, on this front, Valve actually has advantages that the 3DO, CDi, and Nuon didn’t: Steam is an established, successful brand, Steam is the leader in PC game sales, they have a lot of money, and they’ve already proven a capacity for long-term thinking with Steam itself… they have the resources to succeed where falling this far off the hype train was a death blow to 3DO and friends.

…or I could be wrong and Valve made a conscious decision to put Steam Machines on the back burner when the Windows Store failed to materialize as an active and growing threat. Either way, had Valve followed this advice in the early days, the Steam Machine concept would be in a much stronger position now.

Posted in Web Wandering & Opinion | Leave a comment

A Compromise Between Substring and Prefix Matching

A.K.A.: How to write what human intuition actually expects substring matching to be

While the changes aren’t yet ready to be pushed, I’ve been working on one of my hobby projects quite a bit over the last few days and I just thought I’d share a little something I stumbled upon while implementing a result filter box.

Systems with advanced string searching will often let you choose between prefix or substring matching, but I’ve found that both have glaring flaws when you’re implementing something like a “find as you type” launcher, where the goal is a fast match that’s “good enough”.

With substring matching, you quickly realize that computers are much better than humans at finding substrings in the darndest of places, making substring matching very counter-intuitive. (I get the impression that it has to do with humans thinking in syllables while computers don’t, so it’d be interesting to see how the effect changes in non-alphabetic writing systems, like Kanji or Hangul.)

By contrast, prefix matching is often overly specific and ill-suited to situations where many titles may begin with the same article (A, The, etc.) or the name of a series with many entries. Unfortunately, splitting off the articles, then moving them to the end, as Steam does, also has the potential to trip people up, so there’s no perfect solution.

The solution I developed, almost by accident, is essentially a half-way point between prefix matching and the full-blown keyword-based approach a search engine takes:

Use case-insensitive matching and require that substring matches begin at a word boundary.

This has the following desirable characteristics for a find-as-you-type solution:

  • It minimizes the need to press modifier keys, which require costly muscle synchronization:
    • It’s case-insensitive
    • There’s no need for users to quote literals to avoid them being reordered as would be necessary with a full-blown keyword search grammar (ie. “pirates of” won’t match “of pirates”)
  • It’s robust against variations in title formatting:
    • A search for “bri” will match both “The Bridge” and “Bridge, The” without also returning spurious results like “Abrix the robot”.
    • A search for “pir” will return “Space Quest III: The Pirates of Pestulon” without concern for how many Space Quest games sort earlier in the results, whether the title was transcribed using “3”, “III”, or “]|[”, whether the subtitle begins with “The”, or whether the separator is “: ” or “ – ”.
  • It lacks the over-broadness that you find with substring matching, where “pir” will match “Drascula: The Vampire Strikes Back” and “Spirits”.

It’s also simple to implement:

  • For typical regexp searching, just prepend \b to the pattern and set the case-insensitive flag. (If your engine lacks \b, then use (^|\s) instead.)
  • For literal string matching on top of a regexp engine, just escape the pattern and follow my instructions for a regexp search.
  • For CMD.EXE-style wildcard matching, escape the pattern, then replace \? with . and \* with .* before prepending the \b.
  • For a manual implementation of literal-string matching on titles with normalized whitespace, just check whether it matches at the beginning (eg. title.lower().startswith(pattern.lower())) and then prepend a space and search within. (eg. title.lower().find(' ' + pattern.lower()) >= 0, since Python’s index() raises ValueError rather than returning -1 when there’s no match)

UPDATE 2016-10-02: The \b word boundary token doesn’t consider parentheses to be part of a word, which I’ve found to be a confusing surprise in day-to-day use, so you’ll want to use (^|\b|\s) instead of \b. This will allow both “(Eng” and “Eng” to match “(English)” in typical usage for maximum intuitiveness.

In case you want to play around with this, here’s a quick sampling of how to regex-escape a string in various popular environments:
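For Python, re.escape() does the escaping. A minimal sketch that combines it with the (^|\b|\s) prefix from the update above (boundary_regex, matches, and the wildcards flag are names I made up for illustration, not part of any library):

```python
import re


def boundary_regex(pattern, wildcards=False):
    """Compile a case-insensitive search anchored at a word boundary."""
    escaped = re.escape(pattern)
    if wildcards:
        # CMD.EXE-style wildcards: undo the escaping of ? and * only.
        escaped = escaped.replace(r"\?", ".").replace(r"\*", ".*")
    # Non-capturing version of the (^|\b|\s) prefix.
    return re.compile(r"(?:^|\b|\s)" + escaped, re.IGNORECASE)


def matches(pattern, title, wildcards=False):
    """Return True if pattern matches title starting at a word boundary."""
    return boundary_regex(pattern, wildcards).search(title) is not None


print(matches("bri", "Bridge, The"))      # True
print(matches("bri", "Abrix the robot"))  # False
print(matches("(Eng", "Game (English)"))  # True
```

Other regex engines have their own escaping helpers, so treat the re.escape() call as the only Python-specific part of this.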

Posted in Geek Stuff | Leave a comment