overview

Advanced

Myths Open Source Developers Tell Ourselves

Posted by archive 
Published on The O'Reilly Network (http://www.oreillynet.com/)
Source

Myths Open Source Developers Tell Ourselves
by chromatic
11 December 2003

One persistent misfeature of open source development is thoughtless mimicry, copying the behaviors of other projects without considering if they work or if there are better options under the current circumstances. At best, these practices are conventional wisdom, things that everybody believes even if nobody really remembers why. At worst, they're lies we tell ourselves.

Perhaps "lies" is too strong a word. "Myths" is better; these ideas may not be true, but we don't intend to deceive ourselves. We may not even be dogmatic about them, either. Ask any experienced open source developer if his users really want to track the latest CVS sources. Chances are, he doesn't really believe that.

In practice, though, what we do is more important than what we say. Here's the problem. Many developers act as if these myths are true. Maybe it's time to reconsider our ideas about open source development. Are they true today? Were they ever true? Can we do better?

Some of these myths also apply to proprietary software development. Both proprietary and open models have much room to improve in reliability, accessibility of the codebase, and maturity of the development process. Other myths are specific to open source development, though most stem from treating code as the primary artifact of development (not binaries), not from any relative immaturity in its participants or practices.

Not every open source developer believes every one of these ideas, either. Many experienced coders already have good discipline and well-reasoned habits. The rest of us should learn from their example, understanding when and why certain practices work and don't work.
Publishing your Code Will Attract Many Skilled and Frequent Contributors

Myth: Publicly releasing open source code will attract flurries of patches and new contributors.

Reality: You'll be lucky to hear from people merely using your code, much less those interested in modifying it.

While user (and developer) feedback is an advantage of open source software, it's not required by most licenses, nor is it guaranteed by any social or technical means. When was the last time you reported a bug? When was the last time you tried to fix a bug? When was the last time you produced a patch? When was the last time you told a developer how her work solved your problem?

Some projects grow large and attract many developers. Many more projects have only a few developers. Most of the code in a given project comes from one or a few developers. That's not bad — most projects don't need to be huge to be successful — but it's worth keeping in mind.

The problem may be the definition of success. If your goal is to become famous, open source development probably isn't for you. If your goal is to become influential, open source development probably isn't for you. Those may both happen, but it's far more important to write and to share good software. Success is also hard to judge by other traditional means. It's difficult to count the number of people using your code, for example.

It's far more important to write and to share good software. Be content to produce a useful program of sufficiently high quality. Be happy to get a couple of patches now and then. Be proud if one or two developers join your project. There's your success.

This isn't a myth because it never happens. It's a myth because it doesn't happen as often as we'd like.
Feature Freezes Help Stability

Myth: Stopping new development for weeks or months to fix bugs is the best way to produce stable, polished software.

Reality: Stopping new development for awhile to find and fix unknown bugs is fine. That's only a part of writing good software.

The best way to write good software is not to write bugs in the first place. Several techniques can help, including test-driven development, code reviews, and working in small steps. All three ideas address the concept of managing technical debt: entropy increases, so take care of small problems before they grow large.

Think of your project as your home. If you put things back when you're done with them, take out the trash every few days, and generally keep things in order, it's easy to tidy up before having friends over. If you rush around in the hours before your party, you'll end up stuffing things in drawers and closets. That may work in the short term, but eventually you'll need something you stashed away. Good luck.

By avoiding bugs where possible, keeping the project clean and working as well as possible, and fixing things as you go, you'll make it easier for users to test your project. They'll probably find smaller bugs, as the big ones just won't be there. If you're lucky (the kind of luck attracted by clear-thinking and hard work), you'll pick up ideas for avoiding those bugs next time.

Another goal of feature freezes is to solicit feedback from a wider range of users, especially those who use the code in their own projects. This is a good practice. At best, only a portion of the intended users will participate. The only way to get feedback from your entire audience is to release your code so that it reaches as many of them as possible.

Many of the users you most want to test your code before an official release won't. The phrase "stable release" has special magic that "alpha," "beta," and "prelease" lack. The best way to get user feedback is to release your code in a stable form.

Make it easy to keep your software clean, stable, and releasable. Make it a priority to fix bugs as you find them. Seek feedback during development, but don't lose momentum for weeks on end as you try to convince everyone to switch gears from writing new code to finding and closing old bugs.

This isn't a myth because it's bad advice. It's only a myth because there's better advice.
The Best Way to Learn a Project is to Fix its Bugs and Read its Code

Myth: New developers interested in the project will best learn the project by fixing bugs and reading the source code.

Reality: Reading code is difficult. Fixing bugs is difficult and probably something you don't want to do anyway. While giving someone unglamorous work is a good way to test his dedication, it relies on unstructured learning by osmosis.

Learning a new project can be difficult. Given a huge archive of source code, where do you start? Do you just pick a corner and start reading? Do you fire up the debugger and step through? Do you search for strings seen in the user interface?

While there's no substitute for reading the code, using the code as your only guide to the project is like mapping the California coast one pebble at a time. Sure, you'll get a sense of all the details, but how will you tell one pebble from the next? It's possible to understand a project by working your way up from the details, but it's easier to understand how the individual pieces fit together if you've already seen them from ten thousand feet.

Writing any project over a few hundred lines of code means creating a vocabulary. Usually this is expressed through function and variable names. (Think of "interrupts," "pages," and "faults" in a kernel, for example.) Sometimes it takes the form of a larger metaphor. (Python's Twisted framework uses a sandwich metaphor.)

Your project needs an overview. This should describe your goals and offer enough of a roadmap so people know where development is headed. You may not be able to predict volunteer contributions (or even if you'll receive any), but you should have a rough idea of the features you've implemented, the features you want to implement, and the problems you've encountered along the way.

If you're writing automated tests as you go along (and you certainly should be), these tests can help make sense of the code. Customer tests, named appropriately, can provide conceptual links from working code to required features.

Keep your overview and your tests up-to-date, though. Outdated documentation can be better than no documentation, but misleading documentation is, at best, annoying and unpleasant.

This isn't a myth because reading the code and fixing bugs won't help people understand the project. It's a myth because the code is only an artifact of the project.
Packaging Doesn't Matter

Myth: Installation and configuration aren't as important as making the source available.

Reality: If it takes too much work just to get the software working, many people will silently quit.

Potential users become actual users through several steps. They hear about the project. Next, they find and download the software. Then they must brave the installation process. The easier it is to install your software, the sooner people can play with it. Conversely, the more difficult the installation, the more people will give up, often without giving you any feedback.

Granted, you may find people who struggle through the installation, report bugs, and even send in patches, but they're relatively few in number. (I once wrote an installation guide for a piece of open source software and then took a job working on the code several months later. Sometimes it's worth persisting.)

Difficulties often arise in two areas: managing dependencies and creating the initial configuration. For a good example of installation and customization, see Brian Ingerson's Kwiki. The amount of time he put into making installation easier has paid off by saving many users hours of customization. Those savings, in turn, have increased the number of people willing to continue using his code. It's so easy to use, why not set up a Kwiki for every good idea that comes along?

It's OK to expect that mostly programmers will use development tools and libraries. It's also OK to assume that people should skim the details in the README and INSTALL files before trying to build the code. If you can't easily build, test, and install your code on another machine, though, you have no business releasing it to other people.

It's not always possible, nor advisable, to avoid dependencies. Complex web applications likely require a database, a web server with special configurations (mod_perl, mod_php, mod_python, or a Java stack). Meta-distributions can help. Apache Toolbox can take out much of the pain of Apache configuration. Perl bundles can make it easier to install several CPAN modules. OS packages (RPMs, debs, ebuilds, ports, and packages) can help.

It takes time to make these bundles and you might not have the hardware, software, or time to write and test them on all possible combinations. That's understandable; source code is the real compatibility layer on the free Unix platforms anyway.

At a minimum, however, you should make your dependencies clear. Your configuration process should detect as many dependencies as possible without user input. It's OK to require more customization for more advanced features. However, users should be able to build and to install your software without having to dig through a manual or suck down the latest and greatest code from CVS for a dozen other projects.

This isn't a myth because people really believe software should be difficult to install. It's a myth because many projects don't make it easier to install.



It's Better to Start from Scratch

Myth: Bad or unappealing code or projects should be thrown away completely.

Reality: Solving the same simple problems again and again wastes time that could be applied to solving new, larger problems.

Writing maintainable code is important. Perhaps it's the most important practice of software development. It's secondary, though, to solving a problem. While you should strive for clean, well-tested, and well-designed code that's reasonably easy to modify as you add features, it's even more important that your code actually works.

Throwing away working code is usually a mistake. This applies to functions and libraries as well as entire programs. Sometimes it seems as if most of the effort in writing open source software goes to creating simple text editors, weblogs, and IRC clients that will never attract more than a handful of users.

Many codebases are hard to read. It's hard to justify throwing away the things the code does well, though. Software isn't physical — it's relatively easy to change, even at the design level. It's not a building, where deciding to build four stories instead of two means digging up the entire foundation and starting over. Chances are, you've already solved several problems that you'd need to rediscover, reconsider, re-code, and re-debug if you threw everything away.

Every new line of code you write has potential bugs. You will spend time debugging them. Though discipline (such as test-driven development, continual code review, and working in small steps) mitigates the effects, they don't compare in effectiveness to working on already-debugged, already-tested, and already-reviewed code.

Too much effort is spent rewriting the simple things and not enough effort is spent reusing existing code. That doesn't mean you have to put up with bad (or simply different) ideas in the existing code. Clean them up as you go along. It's usually faster to refine code into something great than to wait for it to spring fully formed and perfect from your mind.

This isn't a myth because rewriting bad code is wrong. It's a myth because it can be much easier to reuse and to refactor code than to replace it wholesale.
Programs Suck; Frameworks Rule!

Myth: It's better to provide a framework for lots of people to solve lots of problems than to solve only one problem well.

Reality: It's really hard to write a good framework unless you're already using it to solve at least one real problem.

Which is better, writing a library for one specific project or writing a library that lots of projects can use?

Software developers have long pursued abstraction and reuse. These twin goals have driven the adoption of structured programming, object orientation, and modern aspects and traits, though not exactly to roaring successes. Whether proprietary code, patent encumbrances, or not-invented-here stubbornness, there may be more people producing "reusable" code than actually reusing code.

Part of the problem is that it's more glamorous (in the delusive sense of the word) to solve a huge problem. Compare "Wouldn't it be nice if people had a fast, cross-platform engine that could handle any kind of 3D game, from shooter to multiplayer RPG to adventure?" to "Wouldn't it be nice to have a simple but fun open source shooter?"

Big ambitions, while laudable, have at least two drawbacks. First, big goals make for big projects — projects that need more resources than you may have. Can you draw in enough people to spend dozens of man-years on a project, especially as that project only makes it possible to spend more time making the actual game? Can you keep the whole project in your head?

Second, it's exceedingly difficult to know what is useful and good in a framework unless you're actually using it. Is one particular function call awkward? Does it take more setup work than you need? Have you optimized for the wrong ideas?

Curiously, some of the most portable and flexible open source projects today started out deliberately small. The Linux kernel originally ran only on x86 processors. It's now impressively portable, from embedded processors to mainframes and super-computer clusters. The architecture-dependent portions of the code tend to be small. Code reuse in the kernel grew out of refining the design over time.

Solve your real problem first. Generalize after you have working code. Repeat. This kind of reuse is opportunistic.

This isn't a myth because frameworks are bad. This is a myth because it's amazingly difficult to know what every project of a type will need until you have at least one working project of that type.
I'll Do it Right *This* Time

Myth: Even though your previous code was buggy, undocumented, hard to maintain, or slow, your next attempt will be perfect.

Reality: If you weren't disciplined then, why would you be disciplined now?

Widespread Internet connectivity and adoption of Free and Open programming languages and tools make it easy to distribute code. On one hand, this lowers the barriers for people to contribute to open source software. On the other hand, the ease of distribution makes finding errors less crucial. This article has been copyedited, but not to the same degree as a print book; it's very easy to make corrections on the Web.

It's very easy to put out code that works, though it's buggy, undocumented, slow, or hard to maintain. Of course, imperfect code that solves a problem is much better than perfect code that doesn't exist. It's OK (and even commendable) to release code with limitations, as long as you're honest about its limitations — though you should remove the ones that don't make sense.

The problem is putting out bad code knowingly, expecting that you'll fix it later. You probably won't. Don't keep bad code around. Fix it or throw it away.

This may seem to contradict the idea of not rewriting code from scratch. In conjunction, though, both ideas summarize to the rule of "Know what's worth keeping." It's OK to write quick and dirty code to figure out a problem. Just don't distribute it. Clean it up first.

Develop good coding habits. Training yourself to write clean, sensible, and well-tested code takes time. Practice on all code you write. Getting out of the habit is, unfortunately, very easy.

If you find yourself needing to rewrite code before you publish it, take notes on what you improve. If a maintainer rejects a patch over cleanliness issues, ask the project for suggestions to improve your next attempt. (If you're the maintainer, set some guidelines and spend some time coaching people along as an investment. If it doesn't immediately pay off to your project, it may help along other projects.) The opportunity for code review is a prime benefit of participating in open source development. Take advantage of it.

This isn't a myth because it's impossible to improve your coding habits. This is a myth because too few developers actually have concrete, sensible plans to improve.
Warnings Are OK

Myth: Warnings are just warnings. They're not errors and no one really cares about them.

Reality: Warnings can hide real problems, especially if you get used to them.

It's difficult to design a truly generic language, compiler, or library partially because it's impossible to imagine all of its potential uses. The same rule applies to reporting warnings. While you can detect some dangerous or nonsensical conditions, it's possible that users who really know what they are doing should be able to bypass those warnings. In effect, it's sometimes very useful to be able to say, "I realize this is a nasty hack, but I'm willing to put up with the consequences in this one situation."

Other times, what you consider a warnable or exceptional condition may not be worth mentioning in another context. Of course, the developer using the tool could just ignore the warnings, especially if they're nonfatal and are easily shunted off elsewhere (even if it is /dev/null). This is a problem.

When the "low oil pressure" or "low battery" light comes on in a car, the proper response is to make sure that everything is running well. It's possible that the light or a sensor is malfunctioning, but ignoring the real problem — whether bad light or oil leak — may exacerbate further problems. If you assume that the light has malfunctioned but never replace it, how will you know if you're really out of oil?

Similarly, an error log filled with trivial, fixable warnings may hide serious problems. Any well-designed tool generates warnings for a good reason: you're doing something suspicious.

When possible, purge all warnings from your code. If you expect a warning to occur — and if you have a good reason for it — disable it in the narrowest possible scope. If it's generated by something the user does and if the user is privy to the warning, make it clear how to avoid that condition.

Running a program that spews crazy font configuration questions and null widget access messages to the console is noisy and incredibly useless to anyone who'd rather run your software than fix your mess. Besides that, it's much easier to dig through error logs that only track real bugs and failures. Anything that makes it easier to find and fix bugs is nice.

This isn't a myth because people really ignore warnings. It's a myth because too few people take the effort to clean them up.
End Users Love Tracking CVS

Myth: Users don't mind upgrading to the latest version from CVS for a bugfix or a long-awaited feature.

Reality: If it's difficult for you to provide important bugfixes for previous releases, your CVS tree probably isn't very stable.

It's tricky to stay abreast of a project's latest development sources. Not only do you have to keep track of the latest check-ins, you may have to guess when things are likely to spend more time working than crashing and build binaries right then. You can waste a lot of time watching compiles fail. That's not much fun for a developer. It's even less exciting for someone who just wants to use the software.

Building software from CVS also likely means bypassing your distribution's usual package manager. That can get tangled very quickly. Try to keep required libraries up-to-date for only two applications you compiled on your own for awhile. You'll gain a new appreciation for people who make and test packages.

There are two main solutions to this trouble.

First, keep your main development sources stable and releasable. It should be possible for a dedicated user (or, at least, a package maintainer for a distribution) to check out the current development sources and build a working program with reasonable effort. This is also in your best interests as a developer: the easier the build and the fewer compile, build, and installation errors you allow to persist, the easier it is for existing developers to continue their work and for new developers to start their work.

Second, release your code regularly. Backport fixes if you have to fix really important bugs between releases; that's why tags and branches exist in CVS. This is much easier if you keep your code stable and releasable. Though there's no guarantee users will update every release, working on a couple of features per release tends to be easier anyway.

This isn't a myth because developers believe that development moves too fast for snapshots. It's a myth because developers aren't putting out smaller, more stable, more frequent releases.
Common Sense Conclusions

Again, these aren't myths because they're never true. There are good reasons to have a feature freeze. There are good reasons to invite new developers to get to know a project by looking through small or simple bug reports. Sometimes, it does make sense to write a framework. They're just not always true.

It's always worth examining why you do what you do. What prevents you from releasing a new stable version every month or two? Can you solve that problem? Solve it. Would building up a good test suite help you cut your bug rates? Build it. Can you refactor a scary piece of code into something saner in a series of small steps? Refactor it.

Making your source code available to the world doesn't make all of the problems of software development go away. You still need discipline, intelligence, and sometimes, creative solutions to weird problems. Fortunately, open source developers have more options. Not only can we work with smart people from all over the world, we have the opportunity to watch other projects solve problems well (and, occasionally, poorly).

Learn from their examples, not just their code.

chromatic is the Technical Editor of the O'Reilly Network.