There’s an annoying little problem that sometimes happens when you save stuff off the web, and the popularity of emoji has brought it into the spotlight: some tools still assume Unicode code points will fit within 16 bits, and they break on characters outside the Basic Multilingual Plane (BMP for short).
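To make the 16-bit assumption concrete, here’s a small Python 3 check. The helper names `is_astral` and `utf16_units` are just illustrative. Python 3 counts a non-BMP character like 😀 as one code point, but anything that works in 16-bit units has to represent it as a two-unit surrogate pair. That pair is what trips up code expecting one unit per character.

```python
def is_astral(ch: str) -> bool:
    """True if the single character `ch` lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF


def utf16_units(s: str) -> int:
    """Number of 16-bit UTF-16 code units needed to encode `s`."""
    # Little-endian without a BOM: every code unit is exactly 2 bytes.
    return len(s.encode("utf-16-le")) // 2


emoji = "\U0001F600"  # U+1F600 GRINNING FACE
print(is_astral(emoji), utf16_units(emoji))  # True 2
print(is_astral("é"), utf16_units("é"))      # False 1
```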
I’ve seen this happen with git gui, where I had to change the test suites for some Unicode-processing code to use character escapes instead. In this case, though, the problem is astral (non-BMP) characters in filenames.
If you’ve ever used something like Ctrl+S on a Tumblr page, or youtube-dl on a video with emoji in its title, you might have discovered that CD/DVD-burning GUIs like K3b run into errors from genisoimage when you try to save such files onto a typical Joliet + Rock Ridge ISO.
It used to be just one or two files, so I’d rename them manually within K3b before burning the disc. Now I’m starting to see a lot of them… so I wrote a quick little script that recurses through one or more folders and renames away any codepoints above U+FFFF (i.e. anything outside the BMP). I forgot to add the usual “Is this a file? Skip os.walk and go straight to the file handler” code, so give it folders, not file paths.
I’m not sure which Windows tools might break on non-BMP codepoints in filenames, but I habitually steer clear of anything I know will complicate making scripts portable, so it should work anywhere you’ve got Python 3 installed.
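The script itself isn’t reproduced here, so what follows is a minimal sketch of the same approach rather than the exact code. It assumes each astral codepoint is replaced with `_`, a hypothetical choice since I haven’t said above what the characters get renamed to. `REPLACEMENT`, `strip_astral`, and `rename_tree` are likewise illustrative names. It uses only the standard library (`os`, `sys`).

```python
#!/usr/bin/env python3
"""Rename files and folders whose names contain characters outside the BMP.

A minimal sketch: the replacement string and all names here are hypothetical.
"""
import os
import sys

REPLACEMENT = "_"  # hypothetical: what each codepoint above U+FFFF becomes


def strip_astral(name: str, replacement: str = REPLACEMENT) -> str:
    """Return `name` with every codepoint above U+FFFF replaced."""
    return "".join(replacement if ord(ch) > 0xFFFF else ch for ch in name)


def rename_tree(root: str) -> None:
    """Walk `root` bottom-up, renaming entries that contain non-BMP characters."""
    # topdown=False yields a folder's contents before the folder itself, so
    # renaming a subfolder never invalidates a path os.walk still has to visit.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = strip_astral(name)
            if new_name == name:
                continue  # nothing outside the BMP; leave it alone
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.exists(new_path):
                # Two names can collapse to the same result; don't clobber.
                print(f"Skipping {old_path!r}: {new_path!r} already exists",
                      file=sys.stderr)
                continue
            os.rename(old_path, new_path)


if __name__ == "__main__":
    for folder in sys.argv[1:]:
        rename_tree(folder)
```

Save it as, say, `rename_astral.py` (again, a made-up name) and run `python3 rename_astral.py ~/Downloads/tumblr ~/Videos`. Like the script described above, it only handles folders. Handing os.walk a plain file just yields nothing, so file paths are silently ignored. The existence check matters on POSIX systems, where `os.rename` would otherwise silently overwrite a file that happens to have the cleaned-up name.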