Why you should ALWAYS practice defensive programming

Take a look at this Rust snippet for a moment and tell me whether you can find the reason it’s not safe to run.

After all, Rust is a language that makes comfortable safety its claim to fame, so it should be pretty obvious, right?

if direntry.path_is_symlink() {
    return Entry::Symlink {
        path: direntry.path().to_owned(),
        target: read_link(direntry.path()).unwrap()
    }
}

Surprisingly enough, there are actually two ways this can fail, and both can be insulated against by this change… assuming it’s acceptable for the Symlink variant to store an Option<PathBuf> rather than a PathBuf:

if direntry.path_is_symlink() {
    let target = read_link(direntry.path()).ok();
    if target.is_none() {
        println!("path_is_symlink() but read_link() failed: {}",
                 direntry.path().display());
    }

    return Entry::Symlink {
        path: direntry.path().to_owned(),
        target: target
    }
}

(In this case, the error resolution strategy is “surface the error in an e-mail from cron via println! so I can fix it when I wake up and send a ‘best effort’ entry to the index that this code populates”.)

The hint lies within that .unwrap() or, more correctly, in the nature of what’s being unwrapped.

Problem 1: Rust’s ownership model can only protect you from misusing things it can control

The first code snippet is susceptible to a race condition because there’s nothing Rust can do to stop another program on the system from deleting the symlink in between the entry.path_is_symlink() and the read_link(entry.path()).

No matter how much validation you do, paths are inherently weak references to resources you don’t have an exclusive lock on. Either handle potential errors each time you dereference them (ie. every time you feed them to a system call), or, if possible in your situation, open a file handle, which the kernel can impose stronger guarantees on.

In my case, opening a handle to share between the two calls was not possible, because the initial query is done by the ignore crate, but I then have to call read_link myself. (I’ll probably file a feature request for this, since it seems like an odd oversight in ignore‘s metadata support.)

(This is what makes proper use of the access(2) function in the C standard library such a niche thing. File permissions can change between when you check and when you try to take advantage of them, so it’s only really proper to use it as a way to bail out early when, otherwise, you’re at risk of performing some heavy or time-consuming task, only to fail for lack of permissions when it comes time to make use of the results.)

I’ve actually written a more thorough blog post on this approach to looking at types of data.

Problem 2: Programmers are fallible

The entry object comes from a WalkDir iterator from the ignore crate, and, for some reason, in my project, the root of the tree always returns true for entry.path_is_symlink() even though it’s not.

Likewise, my very first test with an EXE parser named goblin panicked, because I threw an EXE at it that the author hadn’t anticipated being possible. (I’ve reported the bug and it has since been fixed.)

No matter how good a programming language is, you and the people who write your dependencies are still using the good old Human Brain, which has been stuck at version 1.0, bugs and all, for at least 25,000 years.

As of this writing, I haven’t yet determined whether the symlink bug is in my code or ignore, but it does present a perfect example of why I wrap each unit of processing in except Exception as err: (Python) or std::panic::catch_unwind(|| { ... }) (Rust) in my projects, even if I feel confident that I’ve handled or prevented each failure case with more specific code.

In short, you want a thorough belt-and-suspenders approach to safety:

  1. Expected Errors: Study up on ways that calls can fail, so you can avoid unnecessarily throwing away progress or giving vague diagnostic information with a coarse-grained recovery strategy. (I actually tripped over just this problem with serde_json in the same project, where it was difficult to diagnose and report a bug because the panic message didn’t contain enough information.)Despite that, this still is one of Rust’s biggest strengths. For example, I never realized that getcwd could fail in over a decade of using it in Python.
  2. Unexpected Errors: Identify your transactional unit of work (eg. a step in a processing pipeline that writes its intermediate products to disk, a single thumbnail in a thumbnailer, a single file in a downloader, etc.) and wrap it up in the most general error handler you can write.

As your knowledge grows, the scope of expected errors will grow (filesystem race conditions being a good example of something people don’t usually learn early on), but you’ll never anticipate everything. (Nor should you. There’s a curve of diminishing returns at play, and you need to balance robustness against programmer time.)

Suppose you want to leave your computer to thumbnail thousands of images overnight. You don’t want to spend weeks perfecting code for a single night of use, but would you rather wake up to find thousands of thumbnails and a handful of error messages in the log, or a few thumbnails and a panic message dating to not long after you fell asleep?

UPDATE: It turns out that the bug is in ignore and I’ve reported it.

CC BY-SA 4.0 Why you should ALWAYS practice defensive programming by Stephan Sokolow is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

This entry was posted in Geek Stuff. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

By submitting a comment here you grant this site a perpetual license to reproduce your words and name/web site in attribution under the same terms as the associated post.

All comments are moderated. If your comment is generic enough to apply to any post, it will be assumed to be spam. Borderline comments will have their URL field erased before being approved.