The birth of Samba (Open Source)

Special report: The birth of Samba

October 23, 2003

In this ZDNet Australia special report, the creator of the open source file-sharing software Samba, Andrew Tridgell, explains how he came to write the software that has earned him Bulletin magazine's Smartest 100 award in the ICT sector for 2003.

I did my honours degree in theoretical physics, building a computer simulation of a proposed interferometer-based gravity wave observatory. That was a lot of fun, [Editor's note: for some, maybe] but I wanted something a little different for my PhD, so after looking around a bit I ended up studying automatic speech recognition in the Computer Sciences Laboratory at the Australian National University (ANU).

As so often happens in research the really novel ideas that came out of my work were a long way from the original project, so after a few years I ended up changing topics to reflect the direction that my research was taking. My thesis ended up being on efficient methods of sorting and transmitting data, with perhaps the best known result being the development of the 'rsync algorithm' for efficient remote update of files.

I released a free implementation of this algorithm as an open source application called rsync. The fact that it has become so popular, especially in the UNIX community, has been a source of immense gratification to me.

The way Samba got started is a testament to the power of procrastination in software engineering. It was 1991 and I was working on my PhD in speech recognition when an opportunity came up to beta test a program from Digital called "Wind-X" (later renamed to "Excursion"). Like many PhD students I was happy to dive off into any distraction that came my way, so I signed up for the beta test program and started evaluating Wind-X, which provided the ability to display X-Windows applications from Unix servers to Windows PCs.

At the time I was using a program called PC-NFS to share files between my Windows desktop and the departmental Sun4 server. If you did any networking with Windows back in 1991 then you will probably remember that each vendor had their own TCP/IP implementation, rather than using a single stack provided with the operating system. PC-NFS and Wind-X were no exception to this, which meant that they could not co-exist, which in turn meant that while testing Wind-X I could no longer share files between Windows and the Sun Unix server.

I noticed that Wind-X came with its own file sharing system called "Pathworks", but that didn't work with Sun servers; it only worked with Digital's own operating systems (which were Ultrix and VMS at the time). I thought that the protocols that Pathworks used wouldn't be too hard to work out, so I set out to build a Pathworks compatible server for our Sun server, so that I could still use the Sun server while evaluating Wind-X. I really had caught the procrastination bug rather badly!

I captured lots of data going between my Windows desktop and a local Ultrix Pathworks server and stared at it for hours on end to work out the protocol. After that the first version of my very rough "Pathworks server for SunOS" was finished in about a week of all-night programming. In a moment of complete lack of inspiration I called it "server 0.1". It really wasn't a very good piece of code, and it certainly wasn't very reliable, but the important thing is that I then decided to release it to the world for free. That was the original basis for what is now known as Samba.

I first heard of Linux when Dan Shearer wrote to me in November 1992. He wrote to ask me about my free "Pathworks server" program, and told me that there had been some interest in my program in a Linux discussion group. I asked him about Linux, and pretty soon I was hooked. Here was a complete operating system that I was able to play with the internals of on my own PC. I have been a keen Linux user and developer ever since.

When I first released "server" for free I didn't use the GPL. Instead, I added a simple note that said "use this software at your own risk" and left it at that. I never really expected anyone to want to use it anyway, so the license conditions didn't seem important.

After I started using Linux I noticed it was released under this much more complex license called the GNU GPL. I didn't really understand it at first, but soon came to realise just how good it was for free software development. Once I started to understand the GPL I changed the license on Samba to it. If I hadn't done that then I am almost certain that Samba would never have enjoyed the success that it has.

That first bit of code I wrote around Christmas of 1991 was really a terrible bit of programming that wasn't put together with the sort of attention to detail that is needed for decent networking software. It was the community development process that goes along with the free software movement that really allowed Samba to succeed. As I got code contributions, constructive criticism, bug reports and encouragement from a growing number of users I was able to take Samba from a very rough beginning into something that is used by just about every large company in the world.

During the lifetime of Samba several proprietary software products that essentially do the same thing have been developed by various companies, including companies with deep pockets and plenty of experienced developers. The proprietary equivalents have never enjoyed the same success and it's not just because of the price. Open source development has turned Samba into a high-quality, feature-rich piece of software. It's not simply a good choice because it's "cheap".

Free software and sharing source code have a longer history than the relatively new notion of proprietary software. Many people don't realise that when the computing age started it was just considered natural to share your source code with other people, just like the academic community shares ideas via journals and scientific collaboration.

Unfortunately some people then discovered that it was possible to make your fortune by "hoarding" software and the age of proprietary software was born. What most people don't think about is the huge cost to society that this new way of developing software has left us with. We now have large numbers of programmers reinventing poorly designed bits of software, most of which will eventually be discarded and lost forever.

Thanks to the pioneering efforts of people like Richard Stallman the free software community is breathing life back into the old methods of software development where sharing your source code is the norm and making your fortune is less important. The result is a set of software which forms the basic building blocks of the Internet and reaches into just about every area of computing.

The nice thing is that many programmers like myself are still able to make a good living when developing free software, as there are plenty of companies who understand the benefits that free software brings them and are willing to pay top dollar for improving the software further. That allows nearly everyone to win, except those companies that refuse to think about anything except proprietary development methods.

The first version of Samba was not anywhere near good enough to sell commercially, and I would not have been able to develop it to anywhere near the level that it is at now without the help of the free software community and the excellent set of development tools that the GNU project provided. I wouldn't have wanted to even if I could have -- community-based development is just so much more satisfying.

I joined IBM in January of this year as a research staff member in the Almaden research lab in San Jose, California, working remotely from Canberra. The work that I am doing for them is in many ways rather similar to the work I have done for several other companies over the last few years - I help to improve Samba. The big difference with my job at IBM is that the research focus allows me to take on much more ambitious projects without having to worry as much about meeting next month's product deadlines.

The group in IBM Canberra that I work with is nearly exactly the same group of people that I worked with in my first job after leaving the academic world; it's just that we all now work for a different company. In 1999 I joined a small start-up company called Linuxcare that was based in San Francisco, and helped to build a group of really talented Linux developers in Canberra, which we called "OzLabs". Nearly all of that group now works for IBM, and it's a fantastic place to work.

I am quite used to working remotely with other programmers. The Samba Team is spread out all over the world, and we only meet once or twice a year at most, so I find it perfectly normal to be working on a programming project with someone on the other side of the world.

At the moment I'm working on Samba version 4, which is a rather major rewrite aimed at cleaning up some of the poor structure in Samba that is left over from the early days of its development. I'm also working on a number of other non-Samba projects related to delta compression (such as rsync) and a new algorithm for fighting the tide of SPAM that is inundating everyone's e-mail these days. There is always plenty to do!