Debian GNU/Linux: The Past, the Present and the Future

This material was presented at the Free Software Symposium 2002 on October 22, 2002 at 15:30 at the Japan Education Center.

Additional Material:

  1. View slides of the Magicpoint presentation online
  2. Tarball of the presentation and this text
  3. Some pictures from the conference courtesy of Takashi Okamoto.

1. Introduction

Debian GNU/Linux is the largest Linux distribution that exists. Often little is known about Debian, though, because Debian is not a commercial entity but a non-commercial organization run by volunteers. There is basically no commercial advertising for Debian. Debian has a budget of 10-30k/year or so. No one profits from the sale of Debian. Complete sources and binaries can be downloaded for free from the Internet. A CD burner and a high speed Internet connection are all that is needed to start making and distributing Debian CDs to other people. With just an Internet connection you can download Debian and install it with no strings attached.

Currently Debian contains over 9000 open source packages. Debian 3.0 aka "woody" is available on 11 different architectures. The Intel architecture is the most frequently used one and most folks only know about Debian because it is available for their PCs. But Debian also runs on the IBM mainframe (S/390) as well as on palmtops. Debian is the most flexible Linux distribution that there is. The power of Debian originates in its aim to be a "Free" software distribution. We understand that to mean "Free" as in "Freedom" and not "Free" as in free beer. The freedom is the ability to modify, enhance and change the software at will to fit our needs. That in turn has led to a large number of contributors. Debian has around 1000 developers on file and numerous volunteers contributing to Debian in other ways. The Debian Free Software Guidelines (DFSG) prescribe that the software in Debian GNU/Linux must satisfy the following criteria:

  1. No restrictions on the redistribution of the software.
  2. The source code must be included and distribution of the source must also not be restricted.
  3. It must be possible to modify the software and redistribute the modifications.
  4. No discrimination: The license of the software must not restrict use by field of endeavor or by persons and groups of people.

Frequently software in the Debian Project has been released under the GPL, the GNU General Public License.

Debian has been a prime mover in many areas of the Open Source world. Many of the other distributions have copied software that was first developed under Debian GNU/Linux. The package management (dpkg and apt) is known to be the most sophisticated in the open source world and the upgradability and stability of Debian GNU/Linux are legendary. The very term Open Source has its origins in discussions in the Debian Project. The Open Source Initiative, which certifies that licenses are compliant with the Open Source Definition, was founded by members of the Debian project.

2. The past

Debian was founded through an initiative of the Free Software Foundation. Legend has it that Richard Stallman was concerned about the rise of commercial Linux distributions (SLS, Slackware, Red Hat) and wanted to make sure that a completely free (as in freedom) Linux distribution would come into being. He offered a grant for someone to develop a Linux distribution that would be done in the spirit of the Free Software Movement and where all software would be available under licensing of the Free Software Foundation. Ian Murdock saw that ad in a magazine and responded to it. He began developing a Linux distribution and named it Deb-ian after his wife's first name, DEBra, and his own first name, IAN. Thus the name of Debian was created.

A merry band of co-conspirators gathered around Ian and began helping him to develop a new distribution. The grant ran out after a while and Ian gradually dropped out of the Debian Project. The merry band of co-conspirators continued the project, and it kept growing into what the Debian Project is today. More details on the history of Debian can be found by following one of the URLs in the reference section.

2.1 History of Debian

Overview of Debian Releases

Version       Year  Packages  Developers  Arches  Milestones
0.93R6        1995       250          60       2  First port to the m68k arch
1.1 (Buzz)    1996       474          90       2  ELF and Linux 2.0 Kernel. The number of maintainers here is an interpolation
1.2 (Rex)     1996       848         120       2
1.3 (Bo)      1997       974         200       2
2.0 (hamm)    1998      1500         400       2  GLIBC
2.1 (slink)   1999      2250         410       4  Add Alpha / SPARC Architectures. The Apt tool is included. Work on Hurd begins. The number of maintainers is an interpolation.
2.2 (potato)  2000      3900         450       6  Add Power PC / Arm architectures
3.0 (woody)   2002      9000        1000      11  Debconf and more architectures (S/390 IBM Mainframe)

I just hope that you are not surprised by these numbers. Debian releases are named after characters from the movie Toy Story. The successor to Ian Murdock was Bruce Perens, who used to work for Pixar, the company that made the movie. 0.93R6 was released under Ian Murdock and therefore does not have such a name.

Buzz was the first release with a 2.0 Linux kernel. The 2.0 kernel had for the first time a full implementation of a typical Unix kernel. At that point the format of the binaries was changed to ELF, which made shared libraries easier to use. With Rex Debian saw a doubling of the number of packages and a significant increase in the number of developers. Bo continued that development.

With Hamm a new C library (GLIBC) became available that required significant changes to a lot of software. The number of maintainers had become very high, and at that point efforts began to organize the project in an official way by founding a non-profit organization. Policy documents were written and the general process of packaging software in Debian was formalized. The first packaging tools were developed to make packaging easier, which caused a significant increase in the number of packages. Some of the early packaging tools (deb-make, alien, deb-sums etc.) were written by me in that time frame.

Slink saw the addition of apt by Jason Gunthorpe, which tremendously simplified the management of dependencies between Debian packages and automated the whole process of downloading and installing software. It was now possible to install applications with a single command. Apt would take care of all dependencies or conflicts and modify other packages in such a way that the installation became possible without user intervention. This was a tremendous boost to the project. Apt is still the most sophisticated packaging tool available today.

We also entered a phase in which we became concerned about the quality of developers joining the project, and there was a freeze on accepting new developers. That is one reason why the increase in developers slowed in the following years. A process was implemented to ensure that the identity of the developers was known and that they knew about the aims and the policies of the Debian project before getting access to Debian machines. Sponsors introduced newcomers to the project.

Potato again meant a huge increase in the number of software packages. New architectures were added as the build process for multiple architectures was automated. Software could now be released and build daemons would automatically build binaries for the other supported architectures.

The current release is Woody, released just a few months ago. The new maintainer process works and we doubled the number of developers again. The number of packages more than doubled. Quality concerns caused the release of woody to be delayed for several months. It is becoming difficult to maintain software consistently across all the different architectures that Debian supports. The project has become huge and has developed a degree of inertia that is difficult to manage at times.

The numbers for Woody are certainly proof that Debian is the largest distribution that exists. No other Linux distribution is available for as many platforms as Debian. Debian is growing and growing. Here is a chart showing the development of the packages and arches from the earliest public release of Debian (0.93R6) until today:

Note that the number of packages seems to be growing exponentially. (The X-axis of the diagram shows the years of the Debian releases. Diagrams with proper scaling of years can be found in chapter 4.)

The following diagram shows the growth of the number of maintainers until today. The period in which we were not able to accept new maintainers is clearly visible between 1998 and 2000.

Note again that the X-axis of the diagram shows the years of the Debian releases.

3. The present

Today we have a huge distribution. Do not ask me "Does Debian support this and that?" Debian supports everything. Debian is the largest distribution out there. With a single command any of 9000 packages can be installed. The maintenance process is very well formalized. Procedures exist for everything, from the application to become a developer to security fixes. Debian has a huge number of mailing lists. In my first years as a member I was able to know exactly what was going on in the project, but I lost that sometime in 1998. The big problem today is communication. It is very difficult to get everyone on the same page and working on the same issue. There are some core concepts that are written down in our Social Contract and a variety of documents on www.debian.org. The Social Contract and the DFSG function sort of like a constitution for Debian. Other documents regulate the details. Debian is a very well established project and has a strong presence in the Open Source movement.

Debian is a fascinating entity since it is a community of developers that interacts only through the Internet. It is rare that Debian developers meet and have a face-to-face encounter like we have here today. Communication is happening mostly through IRC (Internet Relay Chat), mailing lists and websites.

With that online communication comes the lack of personal encounters. This means that the emotional component of communication must be imagined. This can often lead to misunderstandings and conflicts that otherwise would not develop. One area of concern has always been the amount of "flaming" on the mailing lists and on the IRC channels. Debian developers are known to have strong convictions and it is easy to get into some old argument when the buttons of one group or another are pressed. With the large group of developers it is more and more difficult to maintain personal contacts. Cliques develop that deal with some aspects of the project. Decisions are frequently made in those small groups rather than by the project as a whole. That is unavoidable given the nature of the project, but it often leads to complaints because another group or person was not consulted or not aware of coming changes.

3.1 Key Debian Tools

I will try to describe three key technologies in Debian in the following sections. Hopefully I will be able to present them in the context in which they arose so that the purpose and the rationale are clear.

Debian packaging is done by modifying the original source archive of the software that is to be integrated into Debian. If there is a package sed-3.58.tar.gz, for example, then we unpack that archive and make the necessary modifications to it. A debian directory is created in that source tree that contains Debian-specific information about the package, including a control file with metadata for dpkg and the build instructions in a rules file. The maintainer then tests the package. When the maintainer is satisfied with it, a diff against the original tarball is generated. A Debian package always consists of the original tarball (e.g. sed-3.58.orig.tar.gz) and a diff file with the changes made by the developer (e.g. sed-3.58-1.diff.gz). Note that another item was added to the version number: Debian appends its own delta version (here 1), which is used to track the versions of the maintainers' diffs against the upstream sources.
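How such a diff between the pristine upstream file and the maintainer's edited copy comes about can be sketched with Python's standard difflib module. This is only an illustration: the real diff is taken over the whole source tree by the packaging tools, and the file contents and the helper name make_patch below are made up for the example.

```python
import difflib

# Hypothetical contents of one upstream file and the maintainer's edited copy.
upstream = ["prefix = /usr/local\n", "bindir = $(prefix)/bin\n"]
modified = ["prefix = /usr\n", "bindir = /bin\n"]


def make_patch(original, changed, name):
    """Return a unified diff that turns `original` into `changed`."""
    return "".join(
        difflib.unified_diff(
            original,
            changed,
            fromfile=f"sed-3.58.orig/{name}",
            tofile=f"sed-3.58/{name}",
        )
    )


patch = make_patch(upstream, modified, "Makefile.in")
print(patch)
```

Lines starting with "-" in the printed patch come from the upstream file and lines starting with "+" are the maintainer's replacements; applying the patch to the original tarball reproduces the Debian source tree.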

3.2 List of files in the Debian archive for sed

sed_3.02-8.diff.gz
sed_3.02-8.dsc
sed_3.02.orig.tar.gz
The three files above contain all information about the Debian package. The *.orig.tar.gz file contains the original archive with the sources from the sed developers. It is the original and has the upstream version number (3.02). The *.diff.gz contains the modifications that Debian maintainers have made in order to integrate the package into the distribution. It has both an upstream version and a Debian delta (8). The *.diff.gz is a patch that is applied to the *.orig.tar.gz when the package is built. The *.dsc file is a short description of the source package that lists the other two files together with their checksums.
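The naming scheme of these files can be taken apart mechanically. The following is a minimal sketch, assuming only the three suffixes shown above; the regular expression is a simplification for illustration and not Debian's complete version grammar, and the function name split_source_file is hypothetical.

```python
import re

# "<package>_<upstream version>[-<Debian delta>].<suffix>"
# Simplified on purpose: epochs and other corner cases are ignored.
SOURCE_FILE = re.compile(
    r"^(?P<package>[a-z0-9][a-z0-9+.-]*)_"
    r"(?P<upstream>[^_]+?)"
    r"(?:-(?P<revision>[^-_]+))?"
    r"\.(?P<kind>orig\.tar\.gz|diff\.gz|dsc)$"
)


def split_source_file(name):
    """Split a source archive file name into its components."""
    match = SOURCE_FILE.match(name)
    if match is None:
        raise ValueError(f"not a recognized source file name: {name}")
    return match.groupdict()


for filename in ("sed_3.02-8.diff.gz", "sed_3.02-8.dsc", "sed_3.02.orig.tar.gz"):
    print(filename, split_source_file(filename))
```

The original tarball carries no Debian delta, so its revision comes back as None, while the diff and the dsc file carry both the upstream version and the delta.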

3.3 List of files in the sed source directory after unpacking and applying the diff

The following is the content of the sed source directory. All of the files in there are typically the original files from the sed upstream site. The diff might have changed some files. It is typical to make modifications to the Makefile in order to place files into their proper locations.
christoph@melchi:~/devel/sed-3.02$ l
total 304
-rw-r--r--    1 christop clameter      420 Aug  1  1998 ANNOUNCE
-rw-r--r--    1 christop clameter      169 Jul 21  1998 AUTHORS
-rw-r--r--    1 christop clameter     2652 Aug  1  1998 BUGS
-rw-r--r--    1 christop clameter    17996 Jul 15  1996 COPYING
-rw-r--r--    1 christop clameter    49910 Aug  1  1998 ChangeLog
-rw-r--r--    1 christop clameter     7831 Apr  9  1998 INSTALL
-rw-r--r--    1 christop clameter      163 Jul 21  1998 Makefile.am
-rw-r--r--    1 christop clameter    10820 Oct 15 10:58 Makefile.in
-rw-r--r--    1 christop clameter     1499 Aug  1  1998 NEWS
-rw-r--r--    1 christop clameter      838 Aug  1  1998 README
-rw-r--r--    1 christop clameter     1319 May 12  1998 README.boot
-rw-r--r--    1 christop clameter     1343 Jul 21  1998 THANKS
-rw-r--r--    1 christop clameter      945 May 31  1998 TODO
-rw-r--r--    1 christop clameter      286 Jul  2  1998 acconfig.h
-rw-r--r--    1 christop clameter        0 Aug  1  1998 acinclude.m4
-rw-r--r--    1 christop clameter     4397 Oct 15 10:58 aclocal.m4
-rwxr-xr-x    1 christop clameter     1594 May 13  1998 bootstrap.sh
-rw-r--r--    1 christop clameter     3395 Oct 15 10:58 config_h.in
-rwxr-xr-x    1 christop clameter    83494 Oct 15 10:58 configure
-rw-r--r--    1 christop clameter     3060 Oct 15 10:58 configure.in
-rwxr-xr-x    1 christop clameter     8682 Oct 15 10:58 dc.sed
drwxr-xr-x    3 christop clameter     4096 Oct 15 10:58 debian
drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 djgpp
drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 doc
-rwxr-xr-x    1 christop clameter     5584 Apr  9  1998 install-sh
drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 lib
-rwxr-xr-x    1 christop clameter     6274 Apr  9  1998 missing
-rwxr-xr-x    1 christop clameter      732 Apr  9  1998 mkinstalldirs
drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 sed
-rw-r--r--    1 christop clameter       10 Aug  1  1998 stamp-h.in
drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 testsuite

One special directory with the name debian was added. The files in the debian directory contain information on how to build the package and possibly extra files that are not part of the upstream sed release.

3.4 List of files in the debian directory

christoph@melchi:~/devel/sed-3.02$ l debian
total 36
-rw-r--r--    1 christop clameter     4984 Oct 15 10:58 changelog
-rw-r--r--    1 christop clameter      464 Oct 15 10:58 control
-rw-r--r--    1 christop clameter      636 Oct 15 10:58 copyright
drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 my
-rw-r--r--    1 christop clameter      287 Oct 15 10:58 postinst
-rw-r--r--    1 christop clameter       66 Oct 15 10:58 preinst
-rw-r--r--    1 christop clameter      183 Oct 15 10:58 prerm
-rwxr-xr-x    1 christop clameter     2028 Oct 15 10:58 rules
The most important file is the control file. It controls the basic parameters for the build. The changelog describes all the changes made by the Debian maintainer. The copyright file contains information about licensing and about changes that the Debian maintainer has made to the upstream sources. The my directory was added by the maintainer and holds additional files. The rules file describes how to build the package so that a .deb file is generated. The postinst, preinst and prerm files are scripts that are run at various phases of the installation of the package. They are typically used to set up the integration of this package with other tools.

3.5 The "control" file of Sed

christoph@melchi:~/devel/sed-3.02$ cat debian/control
Source: sed
Section: base
Priority: required
Maintainer: Robert van der Meulen 
Standards-Version: 3.1.1
Build-Depends: texinfo, debhelper

Package: sed
Architecture: any
Essential: yes
Pre-Depends: ${shlibs:Pre-Depends}
Description: The GNU sed stream editor.
 sed reads the specified files or the standard input if no
 files are specified, makes editing changes according to a
 list of commands, and writes the results to the standard
 output.
The control file describes who is responsible for this package and which version of policy the package follows. It also describes what other packages are needed to build this one: texinfo and debhelper need to be installed in order to build this package.

The Architecture setting determines for which architectures the package can be built. any means that sed is buildable on any platform. The package is Essential: it must be present on all Debian installations and the system will not work properly without sed.

The Pre-Depends field lists dependencies on other packages. The strange ${shlibs:Pre-Depends} means that a pass will be made over the package and the library dependencies of the binaries will determine the dependencies.
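The layout of the control file is simple enough to read with a few lines of code: stanzas are separated by blank lines, each field is a "Name: value" line, and lines starting with whitespace continue the previous field. The following sketch assumes exactly that and nothing more; parse_control is a hypothetical helper, not part of dpkg, and the sample text is an abbreviated copy of the listing above.

```python
def parse_control(text):
    """Parse control-file text into a list of stanzas (field -> value dicts)."""
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Indented lines continue the previous field (e.g. Description).
            if last_field is None:
                raise ValueError("continuation line without a field")
            current[last_field] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


CONTROL = """\
Source: sed
Section: base
Priority: required
Standards-Version: 3.1.1
Build-Depends: texinfo, debhelper

Package: sed
Architecture: any
Essential: yes
Pre-Depends: ${shlibs:Pre-Depends}
Description: The GNU sed stream editor.
 sed reads the specified files or the standard input if no
 files are specified.
"""

source, binary = parse_control(CONTROL)
build_deps = [dep.strip() for dep in source["Build-Depends"].split(",")]
print(build_deps)
print(binary["Package"], binary["Architecture"], binary["Essential"])
```

The first stanza describes the source package and the second the binary package built from it, which mirrors the split between the Source: and Package: paragraphs in the listing.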

3.6 Rules file of Sed

The rules file contains the build instructions for the package.
#! /usr/bin/make -f

# Debian package information
package         = sed
docdir          = /usr/share/doc/$(package)
tmpdir          = $(shell pwd)/debian/tmp

# C compiler information
CC              = gcc
CFLAGS          = -g -O2
LDFLAGS         = -s

all build: Makefile
        make $(MFLAGS) CC="$(CC)" CFLAGS="$(CFLAGS)" LDFLAGS="$(LDFLAGS)"
        make check
        touch build

clean:
        dh_clean
        rm -f build config.log config.cache
        -make distclean

Makefile: Makefile.in
        ./configure --prefix=/usr \
                        --exec-prefix=/ \
                        --datadir=/usr/share \
                        --mandir=/usr/share/man \
                        --infodir=/usr/share/info \
                        --with-regex=


binary: binary-indep binary-arch

binary-indep:

binary-arch: build checkroot
        -rm -rf debian/tmp debian/{files,substvars}
        install -d -o root -g root -m 755 $(tmpdir)$(docdir)/examples

# Install sed
        make DESTDIR=`pwd`/debian/tmp install
        strip --remove-section=.comment --remove-section=.note \
                --strip-unneeded debian/tmp/bin/sed
        gzip -9 $(tmpdir)/usr/share/man/man1/*
        gzip -9 $(tmpdir)/usr/share/info/sed.info

# Install some documentation
        install -p -o root -g root -m 644 ANNOUNCE AUTHORS BUGS README THANKS \
                TODO NEWS $(tmpdir)$(docdir)
        install -p -o root -g root -m 644 ChangeLog $(tmpdir)$(docdir)/changelog
        install -p -o root -g root -m 644 debian/changelog \
                $(tmpdir)$(docdir)/changelog.Debian
        # We expect an error here for the examples-subdir
        -gzip -9 $(tmpdir)$(docdir)/*
        install -p -o root -g root -m 644 debian/copyright $(tmpdir)$(docdir)
        install -p -o root -g root -m 644 dc.sed $(tmpdir)$(docdir)/examples/
        install -p -o root -g root -m 644 debian/my/sedfaq.txt $(tmpdir)$(docdir)

# Install Debian-specific stuff
        install -d -o root -g root -m 755 $(tmpdir)/DEBIAN
        install -p -o root -g root -m 755 debian/preinst $(tmpdir)/DEBIAN
        install -p -o root -g root -m 755 debian/postinst $(tmpdir)/DEBIAN
        install -p -o root -g root -m 755 debian/prerm $(tmpdir)/DEBIAN

# Build the package
        dpkg-shlibdeps -dPre-Depends $(tmpdir)/bin/sed
        dpkg-gencontrol -isp
        dpkg --build debian/tmp ..

checkroot:
        test root = "`whoami`"
Basically the rules file is a Makefile with special targets that are invoked by dpkg. The real action happens in the binary-arch target. It installs the binaries into a temporary debian/tmp tree, makes various modifications and then generates the .deb by calling dpkg --build debian/tmp.

3.7 Postinst of SED

With debian/rules we have built a binary .deb. The postinst, preinst etc. scripts describe what happens when the binary is deployed on a system. Here is the postinst of sed, which describes what to do after the binaries have been installed.
#! /bin/sh -e

pkg=sed

if [ ! "$1" = "configure" ]; then
        exit 0
fi


install-info --quiet --section "General commands" "General commands" \
                /usr/share/info/sed.info

if [ -d /usr/doc -a ! -e /usr/doc/$pkg -a -d /usr/share/doc/$pkg ] ; then
        ln -s ../share/doc/$pkg /usr/doc/$pkg
fi
Sed registers a sed.info file with install-info into a directory of installed info files. Then some maintenance is done to install a symlink to the documentation.

The process of writing these files is very involved and lots of things need to be synchronized in order to come up with a working final package.

3.8 DEBHELPER: Building Debian packages easily

Initially the files were all created manually. More and more functionality was added and more files became necessary. In order to cope with that we moved on to the use of templates. As the complexity increased even more and policies were established on exactly how to place files, what permissions to give them, when they should be compressed etc., the task got even more complicated and errors easily slipped in. Scripts were written to generate the infrastructure automatically (my deb-make and related software were the first to do that). Today debhelper by Joey Hess is the tool that is most frequently used to generate all the files in the correct locations with the correct permissions and the correct compression.

DEBHELPER can:

  1. Control the generation of the binary debian packages
  2. Set up the locations and permission for files to be conformant to Debian Policy
  3. Compress files and format files to be conformant to Debian Policy
  4. Manage the deployment of SYSV init scripts so that services can be started on bootup
  5. Calculate dependencies
  6. Build MD5 checksums for file integrity verification

3.8 APT: The package management tool

As the number of packages increased it became a burden to manage all the relationships between packages, and the tools that we had dealt mainly with individual packages. Situations can arise where a whole group of packages has to be upgraded in order to get additional functionality. Packages might have to be removed in order to enable other packages to be installed. Packages also need to be kept up to date. We needed an intelligent tool that dealt with all the complexities of interrelated packages when doing an upgrade. A package should not be installed if it depends on other packages that are not yet available. And a package should not be upgraded if that causes breakage of software already on the system, and so forth. A tool for package management called APT (A Packaging Tool) was developed, mainly by Jason Gunthorpe.

APT also allows the automatic retrieval from the net of all software needed for a specific action. This feature was new when apt arrived, and APT has since been ported to other Linux distributions that tried to do the same thing. Apt can only work because we have a centralized archive of all our software and everything works together due to our policies and procedures. The rpm ports of apt face the challenge of a variety of archives with different software that often does not fit together.
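The core idea, installing dependencies first and refusing to install a package whose dependencies are not available, can be illustrated with a toy resolver. This is a sketch of the general technique (a depth-first ordering of the dependency graph) and not APT's actual algorithm; the package names and the function install_order are invented for the example.

```python
def install_order(target, depends, available):
    """Return packages in an order where every dependency comes first.

    depends:   dict mapping a package to the list of packages it depends on
    available: set of package names that exist in the archive
    """
    order, visiting, done = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg not in available:
            raise LookupError(f"{pkg} is not available in the archive")
        if pkg in visiting:
            raise ValueError(f"dependency cycle involving {pkg}")
        visiting.add(pkg)
        for dep in depends.get(pkg, []):
            visit(dep)
        visiting.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order


# Hypothetical archive: "game" needs a library that nobody has uploaded.
ARCHIVE = {
    "editor": ["libui", "libc"],
    "libui": ["libc"],
    "libc": [],
    "game": ["libsound"],
}

print(install_order("editor", ARCHIVE, set(ARCHIVE)))

try:
    install_order("game", ARCHIVE, set(ARCHIVE))
except LookupError as error:
    print("refusing to install:", error)
```

A real package manager additionally has to handle version constraints, conflicts and removals, which is where most of the complexity described above comes from.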

3.9 DEBCONF: Configuration control

Debconf is a tool that can store package configurations. Maintainer scripts can then access the database to configure software as it is installed. A variety of front ends (user interfaces) can interact with Debconf-capable packages to determine the overall system configuration. In the past it was necessary to answer long lists of questions when packages were installed. With Debconf these questions can be asked before the installation and the installation can then proceed without any user interaction. Configuration values can be saved and preset in a file so that the replication of an existing configuration becomes possible.

Currently the following Front ends are available:

  1. Dialog: Menu driven character based user interface
  2. Readline: Command line type user interface.
  3. Noninteractive: Simply take the defaults to any questions. Allows non-interactive installations.
  4. Gnome: Graphical Interface based on Gnome
  5. Editor: Use a text editor for entering configuration values
  6. Web: Configuration via a web interface.

The granularity of the questions asked can be controlled so that long tedious question and answer sessions are avoided. It is possible though to get into the details of the configuration for a special package if needed.
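The interplay of stored answers, preset values and question granularity can be modelled in a few lines. The following is a toy model of the idea only, not the Debconf protocol or its API: the class ConfigDB, the question names and the front end function are all hypothetical.

```python
# Importance levels from least to most important.
PRIORITIES = ["low", "medium", "high", "critical"]


class ConfigDB:
    """Toy configuration database with preset answers and a priority cutoff."""

    def __init__(self, min_priority="high", preseed=None):
        self.min_priority = min_priority
        self.answers = dict(preseed or {})
        self.asked = []

    def get(self, question, default, priority, ask=None):
        # A stored or preset answer always wins; nothing is asked again.
        if question in self.answers:
            return self.answers[question]
        important = PRIORITIES.index(priority) >= PRIORITIES.index(self.min_priority)
        if important and ask is not None:
            self.asked.append(question)
            value = ask(question, default)
        else:
            # Unimportant question or non-interactive run: take the default.
            value = default
        self.answers[question] = value
        return value


def front_end(question, default):
    """Stand-in for a real user interface; always gives the same answer."""
    return "relay.example.org"


db = ConfigDB(min_priority="high", preseed={"mail/hostname": "mail.example.org"})
hostname = db.get("mail/hostname", "localhost", "critical", ask=front_end)
queue_size = db.get("mail/queue-size", "100", "low", ask=front_end)
relay = db.get("mail/relay", "none", "high", ask=front_end)
print(hostname, queue_size, relay, db.asked)
```

Passing no front end at all corresponds to a non-interactive installation in which every question falls back to its default or to a preset value.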

4. The future

With all these comfortable tools we have integrated a large number of open source packages into Debian. New people are continually joining the Debian Project, which continually increases our productivity. I rarely have had time to look at the numbers and I was quite surprised when I did that two weeks ago. Here is a graph showing only the development of the number of packages:

It is evident that this is exponential growth. When I fed these numbers into a spreadsheet and tried to use the tools to predict the number of packages we would have in 2004 or 2006 I ended up with numbers in the millions of packages. So I extrapolated the growth of the number of packages manually and came up with some more reasonable numbers. Here is the diagram depicting what I think would be more reasonable scenarios:

The question is: does that much open source software exist? Debian already includes almost every open source package that I know of. Given the growth that we have seen so far we should expect to have 100000 (one hundred thousand!) packages by the year 2006. The ratio of packages to maintainers started with something like 6 packages per maintainer. In woody we are approaching 10 packages per maintainer. If we had 100000 packages in Debian then we would need 10000 (ten thousand!) package maintainers.
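One simple way to extrapolate such growth is a least-squares fit of a straight line to the logarithm of the package counts. The sketch below applies that to the figures from the release overview table. It is not the method used for the diagrams in this text, so its result need not match the numbers quoted above; the function names fit_exponential and predict are made up for the example.

```python
import math

# (release year, number of packages) from the release overview table.
RELEASES = [(1995, 250), (1996, 474), (1996, 848), (1997, 974),
            (1998, 1500), (1999, 2250), (2000, 3900), (2002, 9000)]


def fit_exponential(points):
    """Least-squares fit of ln(packages) = a + b * year; returns (a, b)."""
    n = len(points)
    xs = [year for year, _ in points]
    ys = [math.log(count) for _, count in points]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(year, a, b):
    """Number of packages the fitted curve gives for `year`."""
    return math.exp(a + b * year)


a, b = fit_exponential(RELEASES)
print(f"growth rate per year: {b:.2f}")
print(f"doubling time in years: {math.log(2) / b:.2f}")
for year in (2004, 2006):
    print(year, round(predict(year, a, b)))
```

The fitted rate b is the slope of the line on a logarithmic scale, so ln(2) / b is the time it takes for the number of packages to double if the trend continues.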

Note that there is a strange drop in the ratio from 1996 to 1998. I wonder why that happened?
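The ratio can be recomputed directly from the release overview table. The sketch below does only that arithmetic; keep in mind that some of the developer counts in the table are interpolations, so the computed ratios are rough. The helper names are made up for the example.

```python
import math

# (version, packages, developers) from the release overview table.
RELEASES = [
    ("0.93R6", 250, 60),
    ("1.1 (Buzz)", 474, 90),
    ("1.2 (Rex)", 848, 120),
    ("1.3 (Bo)", 974, 200),
    ("2.0 (hamm)", 1500, 400),
    ("2.1 (slink)", 2250, 410),
    ("2.2 (potato)", 3900, 450),
    ("3.0 (woody)", 9000, 1000),
]


def packages_per_maintainer(releases):
    """Map each release to its ratio of packages to developers."""
    return {version: packages / developers
            for version, packages, developers in releases}


def maintainers_needed(packages, ratio):
    """Maintainers required for `packages` at `ratio` packages each."""
    return math.ceil(packages / ratio)


ratios = packages_per_maintainer(RELEASES)
for version, ratio in ratios.items():
    print(f"{version:13} {ratio:5.1f}")

print(maintainers_needed(100000, 10))
```

The drop mentioned above shows up in the rows for Rex, Bo and hamm, the period in which the number of developers grew faster than the number of packages.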

4.1 The Challenges

Given the tremendous growth of Debian there are a variety of challenges ahead for us.

4.1.1 Complexity

It is very difficult today for a single person to understand how all of the Debian features work together. It seems that dpkg, the package manager, is becoming more difficult to maintain. A huge body of policies and procedures has accumulated. It is no longer easy to package software because there is almost certainly one regulation or another that was not followed. Debhelper helps somewhat in that area.

4.1.2 People Management

The Debian project used to be able to communicate easily when we were just a few hundred. Now with a thousand developers it is quite difficult to get everyone on the same page. Most of the effective work is happening in small groups of developers. But they can only deal with isolated areas. It is very difficult to change an overall method used in the project.

Being one among 1000 developers also makes the individual rather anonymous. The attraction in the past for many developers was the personal relationships that develop in the project. We need to reorganize the project into smaller groups where these significant relationships can develop.

4.1.3 Project Inertia

All of this results in inertia in the Debian project. Lots of work is spent on maintaining existing packages and the existing infrastructure and less work is spent on innovative approaches. Debian is known to have a very slow release cycle. We have often argued that this is because of the thoroughness of the testing. We just won't release buggy software and we are proud of the stability of the releases that we have made. On the other hand I feel that we need to keep up with the other distributions, which has been difficult and gets more difficult as the project grows larger and testing gets more and more extensive.

4.1.4 Difficulty of maintaining huge amount of diffs and files depending on upstream releases

Our basic technique is to produce a patch to the upstream sources. The patches need to be adapted to each new release of the open source project. If the patches are kept simple then it is easy to get the new release working, but if extensive modifications have been made then the patches will break and will cause lots of work reimplementing features that we already had in prior releases. This is generally avoided by working closely with the open source projects and making sure that our features make it into their next release. Where this does not work major delays occur, such as with the X release from 4.1 to 4.2. Debian carries a significant number of patches for X servers on the variety of platforms supported.

4.2 Source based distributions

4.2.1 From building binaries to building from source

A variety of distributions are emerging based on a different way of doing open source software. Debian is a project whose aim it is to distribute binaries. Debian developers take the upstream source code and compile it for the various platforms that Debian supports. Scripts and patches are made to increase the usefulness of those binaries. The end-user of Debian then downloads the binaries from the Debian mirrors to use them.

The source based distributions that are now emerging are moving away from distributing binaries and instead provide packaging that allows the package manager to retrieve source code and then customize the source code for the particular system.

A while ago documentation called "Linux From Scratch" was published which gave instructions on how to build a system from plain sources, starting from nothing. Today this is hosted on http://www.linuxfromscratch.com. There is an open source macho image associated with it. If you are a "REAL" man then you will build your own Linux from nothing. You do not really need a package manager. The source based distributions arose from this project by trying to simplify the work of maintaining a source based distribution. Some of those projects are SourceMage, Lunar Linux and Gentoo Linux.

The source based distributions typically retrieve patches and source code from the net and then have some kind of description of how to bring all the components together. The advantage of such an approach is that the source code could theoretically be built in a custom way for your system. The users of the packages are no longer limited by the options that the Debian or Red Hat developer put in when the binary package was created. This implies a degree of flexibility that was not known before. The other big issue is that binaries are typically built for 386 CPUs and the 386 instruction set. Today we have much more powerful CPUs like the Athlon or the Pentium 4 that have additional instructions that cannot be used by the binary distributions, since the binaries have to be created for the least common denominator, which is still frequently seen as being the 386.

Advantages and Disadvantages of Source Based Distributions

Customizability
  Binary: The developer builds from source and configures the package.
  Source based: The developer provides a recipe describing how the software can be built from source. That process is dynamic and can change based on the global system configuration set by the end user or on the desire to have a package built in a special way.

Deployment speed
  Binary: Fast. The binary only has to be downloaded and installed.
  Source based: Can be very slow for big packages like X or GCC or GLIBC. The source code must be retrieved and then the source must be configured and compiled. This can take hours if not days. But as faster hardware becomes available the time to build software will decrease. Small and medium sized packages already build faster than the process of downloading binaries would take.

Efficiency
  Binary: Cannot be tuned to the actual machine the binary is deployed on.
  Source based: The binary can be built in an optimized way for the CPU and other characteristics of the target environment. Resources can be used more effectively. Programs run faster and are smaller.

Reliability
  Binary: Binaries are tested, and established mechanisms exist to check dependencies and ensure that the binary will work.
  Source based: This is new territory. Procedures for verifying source dependencies and source configurations are still in flux. As a result the build process is rather fragile.

Bandwidth use
  Binary: Requires the maintenance of huge archives of binaries for all the different architectures supported. Complete binaries have to be downloaded for every update.
  Source based: Only requires an archive that stores the recipes. Multiple arches are easily supported. Bandwidth use is limited since a minor update just requires the update of the recipe instead of a full download of a binary. Patches can be easily distributed using source based schemes. Typically upstream source code is cached so that a minor fix just requires a few kilobytes of bandwidth. The time spent rebuilding the package might be an issue though.

The existing source based distributions are pretty immature. They typically use the same package dependency schemes (or minor variations thereof) already known from binary distributions and try to manage package relationships with them. The relationships between source based packages can be much more complex since a package can be built with several options using a variety of tools. When I looked into these source based distributions I saw that they are rather fragile due to this issue. Builds would frequently abort. It was necessary to fix things by hand. This puts the source based distributions out of the reach of the average person who wants to use software. A binary distribution such as Debian is much easier to handle and does not require knowledge about the build process in order to use the software.

4.3 The Micro-Package Manager

I have written a new package manager, the Micro-Package Manager (uPM), that can handle both binary and source packages. It is a simple C program and does not require a lot of tools surrounding it. In order to produce reliable packages uPM tracks all build characteristics and is able to rebuild all components affected by a global configuration change.
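The idea of tracking build characteristics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `BUILT_WITH`, its contents and the function `affected_packages` are hypothetical and do not describe how uPM stores or evaluates this information internally.

```python
# Hypothetical record of the global options each package was built with.
# Packages that do not care about an option simply do not list it.
BUILT_WITH = {
    "sed": {"NLS": True},
    "grep": {"NLS": True},
    "zlib": {},
}


def affected_packages(built_with, option, new_value):
    """Return the sorted names of packages that must be rebuilt.

    A package is affected when it recorded a value for `option` at build
    time and that value differs from `new_value`.
    """
    return sorted(
        name
        for name, options in built_with.items()
        if option in options and options[option] != new_value
    )


if __name__ == "__main__":
    # Switching national language support off globally affects every
    # package that was built with it.
    print(affected_packages(BUILT_WITH, "NLS", False))
```

Because the build options are recorded per package, a global change translates directly into a list of packages to rebuild instead of a rebuild of the whole system.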

4.3.1 Reduction of the amount of information maintained per package

uPM was designed to simplify packaging. It basically integrates an essential subset of the features from dpkg, apt-get and a build daemon. It reduces the meta information necessary to build a package to the absolute minimum.

Meta information about packages is difficult to maintain and has to change frequently. If this meta information is encapsulated in scripts then those scripts have to be manually reworked again and again. uPM puts the meta information in a rigid formal structure that makes the information machine processable. With that minimal information it is very easy to integrate a new package into a distribution.

4.3.2 Increasing the number of packages per maintainer

By reducing the complexity of packaging and simplifying it as much as possible the ratio of packages per maintainer can be radically increased. The work of maintaining meta data is reduced by a factor of around 100: for every 100 lines changed and edited under the classic Debian scheme only one line is needed using uPM. Over the years I packaged around 150 packages for Debian. I did 350 packages with uPM in the last 2 months while developing uPM.

4.3.3 Simplify... Simplify... Simplify

uPM works by simplifying all aspects of maintaining a package. The attempt was made to automate all elements of packaging that classically required the writing of scripts. For example the registration of info files using dpkg requires a maintainer script. uPM supports triggers instead. Whenever a file is placed in or removed from the /usr/share/info directory uPM will execute a predefined script that updates the info index. This means that a package providing an info file does not need to deal with the registration of info files anymore. Analogous methods can be used in many other situations.
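The trigger mechanism can be sketched as a table that maps watched directories to actions. This is a minimal Python illustration, not uPM code: the class `TriggerTable` and its methods are hypothetical names, and the action here is a stand-in for the predefined script that updates the info index.

```python
import posixpath


class TriggerTable:
    """Maps watched directories to actions run when a file there changes."""

    def __init__(self):
        self._actions = {}

    def register(self, directory, action):
        """Run `action(directory)` whenever a file in `directory` changes."""
        self._actions[posixpath.normpath(directory)] = action

    def file_changed(self, path):
        """Call this when a file is installed or removed.

        Returns True if a trigger fired for the directory holding `path`.
        """
        directory = posixpath.dirname(posixpath.normpath(path))
        action = self._actions.get(directory)
        if action is None:
            return False
        action(directory)
        return True


if __name__ == "__main__":
    fired = []
    triggers = TriggerTable()
    # Stand-in for the script that rebuilds the info index.
    triggers.register("/usr/share/info", fired.append)
    triggers.file_changed("/usr/share/info/sed.info.gz")
    triggers.file_changed("/usr/bin/sed")
    print(fired)
```

The package that ships the info file never mentions the trigger. The package manager alone decides that a change in the watched directory calls for an index update.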

I hope you remembered all the information necessary for package integration that we discussed before. Here is the recipe to integrate sed using uPM:

W Super sed - stream editor
S ftp://alpha.gnu.org/gnu/sed/{PV}-fixed.tar.gz
H http://www.gnu.org/software/sed/{P}.html
A LICENSE GPL-2
R BUILD c-build
R BUILD {NLS}
A CONFOPT {NLS?:"--disable-nls"}
A CHECK make check
A DOC COPYING NEWS README* THANKS TODO AUTHORS BUGS
E POSTINST mv usr/bin/sed bin/sed-{V}
Y bin/sed sed-{V} 50

This is not the regular sed but an enhanced sed version. It has the ability to support national languages ({NLS}), which can be configured on a global scale for all packages. This sed package also provides an alternative for /bin/sed through the Y record.
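Because every line of the recipe starts with a one letter record type, the format is easy to process by machine. The following Python sketch shows one possible way to read such records. It is an illustration only: the function names are hypothetical, the meaning assumed for {P}, {V} and {PV} (package name, version, name plus version) is inferred from the recipe above, and the version number used is a made up placeholder. Conditional forms such as {NLS?:...} are deliberately left untouched because their exact semantics are not described here.

```python
import re

# First lines of the sed recipe shown above.
RECIPE = """\
W Super sed - stream editor
S ftp://alpha.gnu.org/gnu/sed/{PV}-fixed.tar.gz
H http://www.gnu.org/software/sed/{P}.html
A LICENSE GPL-2
"""


def parse_records(text):
    """Split each non-empty line into (record type letter, remaining text)."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, _, rest = line.partition(" ")
        records.append((kind, rest))
    return records


def expand(value, variables):
    """Replace {NAME} with variables[NAME]; leave unknown placeholders alone."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: variables.get(match.group(1), match.group(0)),
        value,
    )


if __name__ == "__main__":
    # Assumed variable meanings and a placeholder version, for illustration.
    variables = {"P": "sed", "V": "1.0", "PV": "sed-1.0"}
    for kind, rest in parse_records(RECIPE):
        print(kind, expand(rest, variables))
```

The rigid one record per line structure is what makes this possible without any scripting inside the recipe itself.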

Well, that is basically a cursory overview. I sure wish that there were some way to lighten the burden of work for Debian developers using some of the ideas in uPM. You can find out more about the uPM design and get the code at http://telemetrybox.org/upm.

Thank you for your time. If you want to know more about this topic then look at the online resources for this presentation at http://telemetrybox.org/tokyo. The web page there also contains lots of references to source material used for this presentation.

5. References

  1. The Debian Project: www.debian.org
  2. A short history of Debian: http://www.debian.org/doc/manuals/project-history
  3. Debian Planet: www.debianplanet.com
  4. The Micro Package Manager: http://telemetrybox.org/upm
  5. Telemetry/Linux: http://ibiblio.org/telemetry
  6. The GNU General Public License: http://www.gnu.org/licenses/licenses.html#GPL
  7. The Open Source Initiative: http://www.opensource.org
  8. The Debian Free Software Guidelines: http://www.debian.org/intro/free
  9. Linux From Scratch: http://www.linuxfromscratch.org
  10. Source Mage GNU/Linux: http://www.sourcemage.org
  11. Lunar Linux: http://lunar-linux.org
  12. Gentoo/Linux: http://gentoo.org

6. Author

Christoph Lameter is a faculty member of the University of Phoenix. He is currently teaching graduate and undergraduate classes at the San Jose extension of the University of Phoenix. Subjects taught range from Critical Thinking (PHL/251), Introduction to Unix (POS/420), Programming Concepts using C++ (POS/370) and Networks and Telecommunications (NTC/410) to a class on the management of programming teams, Managing Programming (CMGT/576).

Christoph has been a developer of the Debian Project since 1996. In 1997 Christoph was elected to the Board of Directors of the Debian Project.
Christoph has contributed more than 150 packages to Debian, among them alien (package format converter) and deb-make (rapid packaging tool, precursor to helper). Since 1994 Christoph has contributed various patches to utilities and the Linux kernel to the Open Source community and has started some open source projects. Christoph has been speaking at various conferences on technical and nontechnical issues since 1999.
Christoph has been serving as the representative of the Debian Project on the Advisory Council of the Linux Professional Institute since 2000.

Christoph is currently working on a temporary basis at the University of Phoenix and is looking for permanent opportunities for teaching and/or employment. You can reach Christoph at Christoph@Lameter.com.

7. Academic Credentials

Master of Computer Science, University of Bremen, Germany, 1986 with a thesis on Compiler construction titled A transpiler from ADA to Pascal using LALR(1)-Syntax transformation.
Master of Divinity, Fuller Theological Seminary, Pasadena, California, 1994
Associate Fellow of CRIS (Azusa Pacific University, California), 1999
Ph.D. Candidate, Fuller Theological Seminary, 2003(?). Divine action in the context of Scientific Thinking: From Quantum Mechanics to Divine Action.