Open Source: The Scientific Community in Technology

tl;dr Free/open-source software is to producing software as science is to accumulating knowledge… science just had a “different upbringing” and more time to build up formal structures.

Note: This was originally written in October of 2010 as an essay for my scientific reasoning course. The citation format has been modified, moving URLs into hyperlinks, and additional hyperlinks have been added to elaborate on terms discussed primarily in my textbooks but, otherwise, this is identical to what was submitted and marked.

When most people look at the open source software community, their reactions typically take one of two forms: Some react with puzzlement at how such a thing can possibly work while others react with wonder at how this marvellous new form of collaboration arose, seemingly out of nothing. The truth is that, while the expression is new, the open source community itself is merely a new manifestation of the same drives, behaviours, and goals already present in a much older and more formalized institution: the scientific community.

Perhaps the biggest obstacle to recognizing this relationship is the fundamental difference in goals. While, in science, the goal is to gain the most accurate representation of the universe possible, the purpose of software development is simply to satisfy the target user group as well as possible. This difference is crucial because it completely changes participants’ motivations for seeking generality. While scientists seek generality because it improves their ability to comprehend the universe, programmers seek general appeal for their creations because it spreads the maintenance burden across a larger group of volunteers and, in the absence of formal awards, gives them an avenue by which to gain community acclaim. This matters because it means that, at some point, the project reaches a “terminal velocity” where the effort saved by a larger developer group is balanced by the effort expended managing social friction among developers with varying personalities and potentially conflicting visions. Furthermore, from a purely goal-oriented standpoint, perfect generality is impossible in the world of software as an unavoidable side-effect of human nature. Name any design element for any piece of software, and you’ll be certain to find users who disagree over it. This pattern occurs at all scales, and large-scale disagreements akin to those between supporters of the caloric and kinetic theories of heat have occurred frequently enough in the history of software to gain a typically tongue-in-cheek geek moniker from less tribal users: holy wars.[i]

To illustrate the commonalities between the scientific community and the open-source world, consider, for a moment, the open source phenomenon known as forking. Forking occurs when, all other avenues of conflict resolution having failed, a portion of the developer base for a project exercises their license-granted rights and leaves to form a competing project, taking a copy of the source code with them. Because of the sudden requirement to duplicate all activities performed by the existing project, forking is not undertaken lightly, but it serves as a necessary safety mechanism, ensuring that development on a software project behaves like theoretical research, tied to field-leaders only so long as their benefit to the field outweighs the cost of dealing with them.[ii]

As an example, compare the history of heliocentrism with that of the GNU Compiler Collection, one of the core components of any free operating system.[1] By 1997, Richard Stallman had been shepherding his GNU project for over a decade and what he desired more than anything else was a stable compiler that the Free Software Foundation (FSF) could show to the world. This conservatism frustrated various developers who wanted to implement more experimental improvements to the compiler and, in the end, several nascent forks were begun.[iii] These forks quickly coalesced into a single fork named EGCS (Experimental/Enhanced GNU Compiler System) which, like the heliocentric model of the solar system, was initially less useful, but attracted many new participants interested in the plethora of opportunities to explore and experiment. By April 1999, EGCS had proved so successful that the FSF officially halted development on its own branch, accepted EGCS as the new, official GCC, and adopted EGCS’s more open model for contributions.[iv]

It should, however, be noted that forking does not always end this way. Just as general relativity did not stop us from using Newton’s equations for calculating motion on Earth, not all forks end with one branch withering or being absorbed into the other. If the community for a project is large enough and the circumstances are right, a fork may lead to two stable projects with common ancestry serving slightly different user bases with neither group willing to expend the effort to reconcile the two ever-diverging codebases. This is the fundamental difference between science’s slowly-unifying tree and open source’s more Darwinian ever-branching, extinction-pruned one.

Typically, forks which don’t unify are formed due to irreconcilable differences between the management of the original project and a contributor or group of contributors large enough to easily maintain their own fork. Probably the most well-known example of this is the fork between the GNU Emacs and XEmacs programmers’ text editors, begun when GNU Emacs refused contributions Lucid Inc. developed to make GNU Emacs suitable as the base for one of their products.[v][vi] More recent examples include the forking of the Joomla web content management system from Mambo CMS[vii] and the LibreOffice project from OpenOffice.org[viii] when their respective developer communities felt that Mambo Inc. and Oracle weren’t acting in the best interests of those communities. As the OpenOffice-LibreOffice fork is still young, reconciliation à la GCC/EGCS is still a possibility.

This illustrates another commonality between the scientific and open source communities: their social mores and the personality traits necessary for a good project leader. In the open source world, a good project manager is expected to be willing to consider new ideas and accept worthy contributions, yet have the vision and drive to complete the project alone if necessary and the wisdom to only accept contributions which won’t detract from the whole.[ix] In this sense, projects are more akin to fields of study than individual research projects. This is due to the amount of effort necessary to refactor existing work to fit into new projects. A program is a long-term endeavour and, just as science is ill-suited to fad-like behaviour, open source software development has yet to find a solution to the problem of producing “disposable software” like video games which cannot be refined over the course of several years of use and adjustment.

Furthermore, open source has its own “pseudoscience” to which participants react poorly. Just as work done in isolation without input from the greater scientific community is one of the most common indicators of pseudoscience, so too are “patch dumps” an indication of bad actors in the open source community. Simply put, a patch dump is a large set of modifications to a project’s code, dumped on the management without warning… often accompanied by a “take it or leave it” attitude.[x] Most patch dumps are simply ignored, but occasionally, one comes along which can’t be accepted, but also can’t be simply rejected. The most recent example of this is probably Google’s changes to the Linux kernel for their Android smartphone platform, which languished in the so-called staging tree until eventually being removed for lack of maintainership.[xi] Attempts are ongoing to reconcile the two branches, but the Linux maintainers see Google’s non-trivial changes as a flawed solution to the problem being addressed and Google, having the development muscle to maintain their own codebase, has made little effort to reach a compromise. Given the tremendous interest in maintaining a single kernel against which everyone can work, a solution is inevitable, but it may take a very long time.[xii]

Possibly less visible but just as wasteful in the long run are users who, whether due to hubris or a lack of confidence, keep their projects secret like Darwin did until it’s almost too late. Unlike the events surrounding the theory of natural selection, keeping an eventual open-source project secret serves no useful purpose because public exposure takes the place of empirical testing… a flaw revealed recently with the Diaspora project, an attempt to produce a decentralized alternative to Facebook which, having raised sufficient funding for a summer of full-time work, only released their source at the end of the season, long after many flawed assumptions about system requirements had already been incorporated into the code.[xiii] For the world of software, the effects Darwin experienced are still possible, but come from publishing early yet being unable to commit to an implementation. Projects like GNU HURD[xiv] and Duke Nukem Forever[xv] have come to epitomize the concept of “vaporware”, software which is promised but never delivered… in both cases, because the project leads insist on chasing a moving target. Of course, the traditional danger of being beaten to publishing also still applies as, despite the ability of the software market to support multiple contenders, network effects tend to heavily favour earlier arrivals.

With that, we come to peer review. Perhaps surprisingly, this familiar idea helps to demonstrate that many of the most familiar formalisms of science are only incidental to the process. Rather, scientific formalisms as we know them are, like so much else, organically grown responses to a specific set of circumstances: in this case, a procedural carapace protecting the search for a single, unified truth from attacks, both external (religion, pseudoscience) and internal (hubris, expectations, bias). Just as there is no single scientific method, peer review is, when examined, equally ephemeral. To the world of open source software, peer review means public development and frequent releases. The former protects against nefarious contributions and improves the chances of bugs being noticed and fixed, while the latter ensures that user feedback on new changes is swift and varied.[xvi] If you’ve ever tried software while still in beta, you’ve helped to peer review it. The court of public opinion is a fickle master, even among those who don’t report bugs, and a developer’s skills in quality control and release management are essential parts of their reputation.

Reputation also takes on a surprising role in the world of open source software as, in concert with copyright, experience, social pressures, and the aforementioned threat of forking, it composes the core of open source development’s current analogue to tenure. This relationship cuts both ways. Outspoken, abrasive people with vision and skill like Theo de Raadt have had funding cut, only to be defended by the community and rescued by group funding[xvii] while, simultaneously, people who write good code have been cut down for still being a detriment to the project as a whole. This further reinforces the analogical relationship between fields of study and programming projects. This change in reputation can be extremely sudden, as when the XFree86[2] management, spooked by a failed fork named Xouvert, changed the license to something incompatible with the GNU General Public License[3] and, within mere weeks, found themselves deserted by the majority of their developers, who left to form X.org. X.org X11 has now supplanted XFree86 as the option of choice for graphics on Linux. However, change can also be slow, as with the current migration by various projects from GLIBC[4] to a variant[5] named EGLIBC, prompted by the borderline abusive behaviour of GLIBC’s lead developer and maintainer, Ulrich Drepper, when refusing patches he feels are unnecessary[xviii][xix] (including one to fix a shuffling function so it no longer gives skewed distributions).
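For readers wondering how a shuffling function can give skewed distributions at all, here is a minimal Python sketch. It is not the GLIBC code or the rejected patch; the helper name outcome_counts and both swap rules are illustrative assumptions. It walks through every equally likely sequence of swap choices for a three-item list, first under a naive rule (swap each position with any position) and then under the Fisher-Yates rule (swap each position only with itself or a later one), and tallies the resulting orderings.

```python
from collections import Counter
from itertools import product


def outcome_counts(n, pick_range):
    """Tally final orderings over every equally likely sequence of picks.

    pick_range(i) returns the positions that step i is allowed to swap with.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    # Enumerate every possible sequence of swap targets, one target per step.
    for picks in product(*(pick_range(i) for i in range(n))):
        items = list(range(n))
        for i, j in enumerate(picks):
            items[i], items[j] = items[j], items[i]
        counts[tuple(items)] += 1
    return counts


# Naive rule: every step may swap with ANY position (3 * 3 * 3 sequences).
naive = outcome_counts(3, lambda i: range(3))

# Fisher-Yates rule: step i may only swap with positions i..n-1 (3 * 2 * 1).
fair = outcome_counts(3, lambda i: range(i, 3))

print("naive:", dict(naive))
print("fair: ", dict(fair))
```

The naive rule has 27 equally likely pick sequences to spread over only 6 possible orderings, and 27 is not divisible by 6, so some orderings must come up more often than others. The Fisher-Yates rule has exactly 6 pick sequences, one per ordering.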

This focus on sharing over personal gain also expresses itself in the ideological and political facets of the ecosystem to significant effect. The Software Freedom Law Center, a pro bono legal organization, has litigated many cases over violations of the GPL family of licenses and, in public appearances, its chairman, Eben Moglen, has noted that their “we don’t want money, we just want compliance” stance on violations has resulted in a significant reduction in legal deadlock. Moglen has also clarified his stance by analogizing software licensing with a hypothetical “math licensing” regime under which companies must pay a per-seat fee for each field of mathematics they need, always having to budget and ration.[xx][xxi] Even the GNU GPL, itself a legal document, is written to be just as much a philosophical statement as a legal one, enshrining what Richard Stallman refers to as “the four software freedoms” (the freedoms to use the software for any purpose, to study and customize the program, to help your neighbours by sharing, and to help the community by distributing any improvements you make).[xxii]

However, perhaps the most illustrative example of the deep commonalities between the scientific and open source communities is the history of the open source movement itself. Proposed by Christine Peterson at a strategy session in February 1998, the term “open source” was intended to describe Netscape’s release of the source code to their waning Navigator web browser without the moral baggage and linguistic ambiguity of the existing term, “free software”.[xxiii] Free software as a term and a movement, in turn, has its roots in Richard Stallman’s GNU project, an attempt to produce a free clone of the UNIX operating system with the express purpose of bringing back the academic culture of sharing Stallman had grown accustomed to at the MIT computer labs… a culture under threat in the late 1970s and early 1980s from corporate software vendors like Microsoft,[xxiv] whose position was most famously stated in 1976 when a young Bill Gates accused the majority of computer hobbyists of “stealing their software” in his Open Letter to Hobbyists.[xxv]

Fundamentally, the open source ethos is the scientific ethos. Software freedom is peer review, collaborative research, and open data, stripped of their formalisms and opened to all willing participants. Open source is the world’s first large-scale experiment in organic, ad hoc, massively collaborative problem-solving and, while it may have its flaws, it has succeeded beyond our wildest dreams despite still being in its infancy. The core principles of the scientific community are alive and well, and companies like Google and IBM are already recognizing the value of funding grant and mentorship programs like the Summer of Code and hiring full-time programmers to work on non-proprietary projects. While the formalisms that will develop as the community matures are yet to be determined, open source has proven that scientific principles aren’t just for scientific problems and that many of the most fundamental elements of the scientific community may be an emergent property of human nature itself.

Footnotes

1. A compiler translates human-readable source code into executable machine code. Without a compiler, modern computer programming would be impossible.

2. The graphical subsystem used by all Linux distributions which offer a desktop interface.

3. The GNU General Public License (GPL for short) is the most popular license used for open source software and using a GPL-incompatible license is considered equivalent to patenting your research and enforcing your patents very actively.

4. The GNU implementation of the standard library of functions for the C programming language, required by every program written in C… which happens to be the most common language used to write free software.

5. A variant is similar to a fork, but attempts to retain as much commonality with the parent project as possible. Variants are usually started when a set of important contributions are rejected but the parent project is otherwise progressing in a healthy fashion.

References

i Raymond, Eric. holy wars. (December 29, 2003). The Jargon File version 4.4.7. Retrieved October 9, 2010.

ii Hill, Benjamin. (August 7, 2005). To Fork or Not To Fork. Retrieved October 9, 2010.

iii A Brief History of GCC. (January 1, 2008). GCC Wiki. Retrieved October 9, 2010.

iv Bezroukov, Nikolai. The Short History of GCC development. Portraits of Open Source Pioneers. Retrieved October 9, 2010.

v Zeth. History of Emacs and XEmacs. (March 25, 2007). Command Line Warriors. Retrieved October 9, 2010.

vi Stallman, Richard. The FSF Point of View. XEmacs vs. GNU Emacs. Retrieved October 9, 2010.

vii Joomla. Wikipedia. Retrieved October 9, 2010.

viii Nitot, Tristan. Welcome to Document Foundation and LibreOffice. (September 28, 2010). Standblog. Retrieved October 9, 2010.

ix Srijith, Krishnan. (October 2002). Study on Management of Open Source Software Projects [PDF]. Retrieved October 9, 2010.

x Collins-Sussman, Ben. The Risks of Distributed Version Control. (November 10, 2005). iBanjo. Retrieved October 9, 2010.

xi Kroah-Hartman, Greg. Android and the Linux kernel community. (February 2, 2010). linux kernel monkey log. Retrieved October 9, 2010.

xii Vaughan-Nichols, Steven. Android/Linux kernel fight continues. (September 7, 2010). ComputerWorld Blogs. Retrieved October 9, 2010.

xiii Zer-Aviv, Mushon. Diaspora’s Kickstarter $$$,$$$ success endangers both Diaspora, Kickstarter & you. (May 14, 2010). Mushon.com Networking Loose Ends. Retrieved October 9, 2010.

xiv Hillesley, Richard. GNU HURD – Altered visions and lost promise. (June 30, 2010). The H Open Source. Retrieved October 9, 2010.

xv Kuchera, Ben. The death and rebirth of Duke Nukem Forever: a history. (September 7, 2010). Ars Technica. Retrieved October 9, 2010.

xvi Raymond, Eric. Release Early, Release Often. (September 11, 2000). The Cathedral and the Bazaar. Retrieved October 9, 2010.

xvii Brockmeier, Joe. DARPA Cancels OpenBSD Funding. (April 23, 2003). Linux Weekly News. Retrieved October 9, 2010.

xviii Jarno, Aurelien. Debian is switching to EGLIBC. (May 5, 2009). Aurelien’s weblog. Retrieved October 9, 2010.

xix Ark Linux switches to eglibc. (May 13, 2009). Ark Linux: Development and planning of Ark Linux. Retrieved October 9, 2010.

xx Moglen, Eben. (November 21, 2006). Software and Community in the Early 21st Century. Retrieved October 9, 2010.

xxi Glass, Geof. Eben Moglen on Free Software and Social Justice. (November December 10, 2006). a whole minute. Retrieved October 9, 2010.

xxii The Free Software Definition, version 1.92. (October 10, 2010). Philosophy of the GNU Project. Retrieved October 9, 2010.

xxiii History of the OSI. Retrieved October 9, 2010.

xxiv Stallman, Richard. (October 3, 2010). The GNU Project. Retrieved October 9, 2010.

xxv Gates, Bill. (February 3, 1976). An Open Letter to Hobbyists. Retrieved October 9, 2010.

CC BY-SA 4.0 Open Source: The Scientific Community in Technology by Stephan Sokolow is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
