Additional Material:
Currently Debian contains over 9000 open source packages. Debian 3.0, aka "woody", is available on 11 different architectures. The Intel architecture is the most frequently used one, and most people only know about Debian because it is available for their PCs. But Debian also runs on the IBM mainframe (S/390) as well as on palmtops. Debian is the most flexible Linux distribution there is. The power of Debian originates in its aim to be a "Free" software distribution. We understand that to mean "Free" as in "Freedom" and not "free" as in free beer. The freedom is the ability to modify, enhance and change the software at will to fit our needs. That in turn has led to a large number of contributors: Debian has around 1000 developers on file and numerous volunteers contributing to Debian in other ways. The Debian Free Software Guidelines (DFSG) prescribe that the software in Debian GNU/Linux must satisfy the following criteria:
Debian has been a prime mover in many areas of the Open Source world. Many of the other distributions have copied software that was first developed under Debian GNU/Linux. The package management (dpkg and apt) is known to be the most sophisticated in the open source world, and the upgradability and stability of Debian GNU/Linux are legendary. The very term Open Source has its origins in discussions in the Debian Project. The Open Source Initiative, which certifies that licenses comply with the Open Source Definition, was founded by members of the Debian Project.
A merry band of co-conspirators gathered around Ian and began helping him to develop a new distribution. The grant ran out after a while and Ian gradually dropped out of the Debian Project. The merry band of co-conspirators continued the project, and it kept growing into what the Debian Project is today. More details on the history of Debian can be found by following one of the URLs in the reference section.
Version | Year | Packages | Developers | Arches | Milestones |
---|---|---|---|---|---|
0.93R6 | 1995 | 250 | 60 | 2 | First port to the m68k arch |
1.1 (Buzz) | 1996 | 474 | 90 | 2 | ELF and Linux 2.0 Kernel. The number of maintainers here is an interpolation |
1.2 (Rex) | 1996 | 848 | 120 | 2 | |
1.3 (Bo) | 1997 | 974 | 200 | 2 | |
2.0 (hamm) | 1998 | 1500 | 400 | 2 | GLIBC |
2.1 (slink) | 1999 | 2250 | 410 | 4 | Add Alpha / SPARC Architectures. The Apt tool is included. Work on Hurd begins. The number of maintainers is an interpolation. |
2.2 (potato) | 2000 | 3900 | 450 | 6 | Add Power PC / Arm architectures |
3.0 (woody) | 2002 | 9000 | 1000 | 11 | Debconf and more architectures (S/390 IBM Mainframe) |
I just hope that you are not surprised by these numbers. Debian releases are named after characters from the movie Toy Story. The successor to Ian Murdock was Bruce Perens, who used to work for Pixar, the company that made the movie. 0.93R6 was released under Ian Murdock and therefore does not have such a name.

Buzz was the first release with a 2.0 Linux kernel. The 2.0 kernel had for the first time a full implementation of a typical Unix kernel. At that point the format of the binaries was changed to ELF, which meant better support for shared libraries. With Rex, Debian saw a doubling of the number of packages and a significant increase in the number of developers. Bo continued that development.

With Hamm a new C library became available that required significant changes to a lot of software. The number of maintainers was becoming very high, and at that point efforts began to organize the project in an official way by founding a non-profit organization. Policy documents were written and the general process of packaging software in Debian was formalized. The first packaging tools were developed to make packaging easier, which caused a significant increase in the number of packages. Some of the early packaging tools (deb-make, alien, deb-sums etc.) were written by me in that time frame.

Slink saw the addition of apt by Jason Gunthorpe, which tremendously simplified the management of dependencies between Debian packages and automated the whole process of downloading and installing software. It was now possible to install applications with a single command. Apt would take care of all dependencies and conflicts and modify other packages in such a way that the installation became possible without user intervention. This was a tremendous boost to the project. Apt is still the most sophisticated packaging tool available today.
We also entered a phase in which we became concerned about the quality of developers joining the project, and there was a freeze on accepting new developers. That is one reason why the growth in the number of developers slowed in the following years. A process was implemented to ensure that the identity of the developers was known and that they knew about the aims and the policies of the Debian Project before they got access to Debian machines. Sponsors initiated newcomers into the project.
Potato again meant a huge increase in the number of software packages. New architectures were added as the build process for multiple architectures was automated. Software could now be released for one architecture, and build daemons would automatically build binaries for the other supported architectures.
The current release is Woody, released just a few months ago. The new maintainer process works and we doubled the number of developers again. The number of packages more than doubled. Quality concerns caused the release of Woody to be delayed for several months. It has become difficult to maintain software consistently over all the different architectures that are supported by Debian. The project has become huge and has developed a degree of inertia that is difficult to manage at times.
The numbers for Woody are certainly proof that Debian is the largest distribution that exists. No other Linux distribution is available for as many platforms as Debian. Debian is growing and growing. Here is a chart showing the development of the packages and arches from the earliest public release of Debian (0.93R6) until today:
Note that the number of packages seems to be growing exponentially. (The X-axis of the diagram shows the years of Debian releases. Diagrams with proper scaling of years can be found in chapter 4.)
The following diagram shows the growth of the number of maintainers until today. The period in which we were not able to accept new maintainers is clearly visible between 1998 and 2000.
Note again that the X-axis of the diagram shows the years of Debian releases.
Debian is a fascinating entity since it is a community of developers that interacts only through the Internet. It is rare that Debian developers meet and have a face-to-face encounter like we have here today. Communication is happening mostly through IRC (Internet Relay Chat), mailing lists and websites.
With that online communication comes the lack of personal encounters. This means that the emotional component of communication must be imagined, which can often lead to misunderstandings and conflicts that otherwise would not develop. One area of concern has always been the amount of "flaming" on the mailing lists and on the IRC channels. Debian developers are known to have strong convictions, and it is easy to get into some old argument when the buttons of one group or another are pressed. With the large group of developers it is more and more difficult to maintain personal contacts. Cliques develop that deal with some aspects of the project. Decisions are frequently made in those small groups rather than by the project as a whole. That is unavoidable given the nature of the project, but it often leads to complaints because another group or person was not consulted or was not aware of coming changes.
Debian packaging is done by modifying the original source archive of the software that is to be integrated into Debian. If there is a package sed-3.58.tar.gz, for example, then we unpack that archive and make the necessary modifications to it. A debian directory is created in that source tree. It contains Debian-specific information about the package, including a control file with metadata for dpkg and the build instructions in a rules file. The maintainer then tests the package. When he is satisfied with it, a diff against the original tarball is generated. A Debian source package always consists of the original tarball (e.g. sed-3.58.orig.tar.gz) and a diff file with the changes made by the developer (e.g. sed-3.58-1.diff.gz). Note that another component was added to the version: Debian appends its own delta version, which is used to track the versions of the maintainers' diffs against the upstream sources.
    sed_3.02-8.diff.gz
    sed_3.02-8.dsc
    sed_3.02.orig.tar.gz

The three files above contain all the information about the Debian package. The *.orig.tar.gz file contains the original archive with the sources from the sed developers. It is the original and has the upstream version number (3.02). The *.diff.gz contains the modifications that Debian maintainers have made in order to integrate the package into the distribution. It has both an upstream version and a Debian delta (8). The *.diff.gz is a patch that is applied to the *.orig.tar.gz when the package is built.
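The naming scheme can be made concrete with a small sketch that splits such a file name into package name, upstream version and Debian delta. This is only an illustration based on the three file names shown above; the function name `parse_source_file_name` and the regular expression are assumptions made for this example, not part of dpkg, and the real Debian version syntax has more cases than are handled here.

```python
import re
from typing import NamedTuple, Optional


class SourceFileName(NamedTuple):
    package: str
    upstream_version: str
    debian_delta: Optional[str]  # None for the pristine upstream tarball
    kind: str                    # "orig.tar.gz", "diff.gz" or "dsc"


# Hypothetical pattern covering only the three file types shown in the text.
_PATTERN = re.compile(
    r"^(?P<package>[a-z0-9][a-z0-9+.-]*)_"
    r"(?P<version>[^_]+?)"
    r"\.(?P<kind>orig\.tar\.gz|diff\.gz|dsc)$"
)


def parse_source_file_name(name: str) -> SourceFileName:
    """Split e.g. 'sed_3.02-8.diff.gz' into its components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized source package file: {name!r}")
    version = match.group("version")
    # The Debian delta is whatever follows the last hyphen, if there is one.
    upstream, hyphen, delta = version.rpartition("-")
    if not hyphen:
        upstream, delta = version, None
    return SourceFileName(match.group("package"), upstream, delta,
                          match.group("kind"))


if __name__ == "__main__":
    for file_name in ("sed_3.02-8.diff.gz", "sed_3.02-8.dsc",
                      "sed_3.02.orig.tar.gz"):
        print(parse_source_file_name(file_name))
```

The original tarball carries no delta because it is shipped unmodified; only the files produced by the Debian maintainer carry one.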
    christoph@melchi:~/devel/sed-3.02$ l
    total 304
    -rw-r--r--    1 christop clameter      420 Aug  1  1998 ANNOUNCE
    -rw-r--r--    1 christop clameter      169 Jul 21  1998 AUTHORS
    -rw-r--r--    1 christop clameter     2652 Aug  1  1998 BUGS
    -rw-r--r--    1 christop clameter    17996 Jul 15  1996 COPYING
    -rw-r--r--    1 christop clameter    49910 Aug  1  1998 ChangeLog
    -rw-r--r--    1 christop clameter     7831 Apr  9  1998 INSTALL
    -rw-r--r--    1 christop clameter      163 Jul 21  1998 Makefile.am
    -rw-r--r--    1 christop clameter    10820 Oct 15 10:58 Makefile.in
    -rw-r--r--    1 christop clameter     1499 Aug  1  1998 NEWS
    -rw-r--r--    1 christop clameter      838 Aug  1  1998 README
    -rw-r--r--    1 christop clameter     1319 May 12  1998 README.boot
    -rw-r--r--    1 christop clameter     1343 Jul 21  1998 THANKS
    -rw-r--r--    1 christop clameter      945 May 31  1998 TODO
    -rw-r--r--    1 christop clameter      286 Jul  2  1998 acconfig.h
    -rw-r--r--    1 christop clameter        0 Aug  1  1998 acinclude.m4
    -rw-r--r--    1 christop clameter     4397 Oct 15 10:58 aclocal.m4
    -rwxr-xr-x    1 christop clameter     1594 May 13  1998 bootstrap.sh
    -rw-r--r--    1 christop clameter     3395 Oct 15 10:58 config_h.in
    -rwxr-xr-x    1 christop clameter    83494 Oct 15 10:58 configure
    -rw-r--r--    1 christop clameter     3060 Oct 15 10:58 configure.in
    -rwxr-xr-x    1 christop clameter     8682 Oct 15 10:58 dc.sed
    drwxr-xr-x    3 christop clameter     4096 Oct 15 10:58 debian
    drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 djgpp
    drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 doc
    -rwxr-xr-x    1 christop clameter     5584 Apr  9  1998 install-sh
    drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 lib
    -rwxr-xr-x    1 christop clameter     6274 Apr  9  1998 missing
    -rwxr-xr-x    1 christop clameter      732 Apr  9  1998 mkinstalldirs
    drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 sed
    -rw-r--r--    1 christop clameter       10 Aug  1  1998 stamp-h.in
    drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 testsuite
One special directory with the name debian was added. The files in the debian directory contain information on how to build the package and possibly extra files that are not part of the upstream sed release.
    christoph@melchi:~/devel/sed-3.02$ l debian
    total 36
    -rw-r--r--    1 christop clameter     4984 Oct 15 10:58 changelog
    -rw-r--r--    1 christop clameter      464 Oct 15 10:58 control
    -rw-r--r--    1 christop clameter      636 Oct 15 10:58 copyright
    drwxr-xr-x    2 christop clameter     4096 Oct 15 10:58 my
    -rw-r--r--    1 christop clameter      287 Oct 15 10:58 postinst
    -rw-r--r--    1 christop clameter       66 Oct 15 10:58 preinst
    -rw-r--r--    1 christop clameter      183 Oct 15 10:58 prerm
    -rwxr-xr-x    1 christop clameter     2028 Oct 15 10:58 rules

The main file of importance is the control file. It controls the basic parameters for the build. The changelog describes all the changes made by the Debian maintainer. The copyright file contains information about licensing and about changes that the Debian maintainer has made to the upstream sources. The my directory was added by the maintainer and holds additional files. The rules file describes how to build the package so that a .deb file is generated. The postinst, preinst and prerm files are scripts that are run at various phases of the installation of the package. They are typically used to set up the integration of this package with other tools.
    christoph@melchi:~/devel/sed-3.02$ cat debian/control
    Source: sed
    Section: base
    Priority: required
    Maintainer: Robert van der Meulen
    Standards-Version: 3.1.1
    Build-Depends: texinfo, debhelper

    Package: sed
    Architecture: any
    Essential: yes
    Pre-Depends: ${shlibs:Pre-Depends}
    Description: The GNU sed stream editor.
     sed reads the specified files or the standard input if no
     files are specified, makes editing changes according to a
     list of commands, and writes the results to the standard
     output.

The control file describes who is responsible for this package and which version of policy the package follows. It also describes what other packages are needed to build this one: texinfo and debhelper need to be installed in order to build this package.
The Architecture setting determines for which architectures the package can be built. any means that sed is buildable on any platform. The package is Essential: it must be present on all Debian installations, and the system will not work properly without sed.
The Pre-Depends field lists dependencies on other packages. The strange ${shlibs:Pre-Depends} means that a pass will be made over the binaries in the package (by dpkg-shlibdeps, as invoked in the rules file) and their shared library dependencies will determine the package dependencies.
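To show how regular the control file format is, here is a minimal sketch of a reader for it. It assumes the usual layout of "Field: value" lines, continuation lines that start with whitespace, and a blank line between the source stanza and the binary package stanza. The function `parse_control` and the abbreviated `SAMPLE` text are made up for this illustration; this is not the parser that dpkg uses.

```python
from typing import Dict, List

# Abbreviated copy of the sed control file, for illustration only.
SAMPLE = """\
Source: sed
Section: base
Priority: required
Standards-Version: 3.1.1
Build-Depends: texinfo, debhelper

Package: sed
Architecture: any
Essential: yes
Pre-Depends: ${shlibs:Pre-Depends}
Description: The GNU sed stream editor.
 sed reads the specified files or the standard input if no
 files are specified.
"""


def parse_control(text: str) -> List[Dict[str, str]]:
    """Return one dict of fields per blank-line separated stanza."""
    stanzas: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Continuation of the previous field (e.g. long Description).
            if last_field is None:
                raise ValueError("continuation line without a field")
            current[last_field] += "\n" + line.strip()
        else:
            name, colon, value = line.partition(":")
            if not colon:
                raise ValueError(f"malformed line: {line!r}")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


if __name__ == "__main__":
    source, binary = parse_control(SAMPLE)
    print("build needs:", source["Build-Depends"].split(", "))
    print("architecture:", binary["Architecture"])
```

The value of Pre-Depends is still the unexpanded placeholder at this point; it is only replaced with real library dependencies while the package is being built.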
    #! /usr/bin/make -f

    # Debian package information
    package = sed
    docdir = /usr/share/doc/$(package)
    tmpdir = $(shell pwd)/debian/tmp

    # C compiler information
    CC = gcc
    CFLAGS = -g -O2
    LDFLAGS = -s

    all build: Makefile
            make $(MFLAGS) CC="$(CC)" CFLAGS="$(CFLAGS)" LDFLAGS="$(LDFLAGS)"
            make check
            touch build

    clean:
            dh_clean
            rm -f build config.log config.cache
            -make distclean

    Makefile: Makefile.in
            ./configure --prefix=/usr \
                --exec-prefix=/ \
                --datadir=/usr/share \
                --mandir=/usr/share/man \
                --infodir=/usr/share/info \
                --with-regex=

    binary: binary-indep binary-arch

    binary-indep:

    binary-arch: build checkroot
            -rm -rf debian/tmp debian/{files,substvars}
            install -d -o root -g root -m 755 $(tmpdir)$(docdir)/examples
            # Install sed
            make DESTDIR=`pwd`/debian/tmp install
            strip --remove-section=.comment --remove-section=.note \
                --strip-unneeded debian/tmp/bin/sed
            gzip -9 $(tmpdir)/usr/share/man/man1/*
            gzip -9 $(tmpdir)/usr/share/info/sed.info
            # Install some documentation
            install -p -o root -g root -m 644 ANNOUNCE AUTHORS BUGS README THANKS \
                TODO NEWS $(tmpdir)$(docdir)
            install -p -o root -g root -m 644 ChangeLog $(tmpdir)$(docdir)/changelog
            install -p -o root -g root -m 644 debian/changelog \
                $(tmpdir)$(docdir)/changelog.Debian
            # We expect an error here for the examples-subdir
            -gzip -9 $(tmpdir)$(docdir)/*
            install -p -o root -g root -m 644 debian/copyright $(tmpdir)$(docdir)
            install -p -o root -g root -m 644 dc.sed $(tmpdir)$(docdir)/examples/
            install -p -o root -g root -m 644 debian/my/sedfaq.txt $(tmpdir)$(docdir)
            # Install Debian-specific stuff
            install -d -o root -g root -m 755 $(tmpdir)/DEBIAN
            install -p -o root -g root -m 755 debian/preinst $(tmpdir)/DEBIAN
            install -p -o root -g root -m 755 debian/postinst $(tmpdir)/DEBIAN
            install -p -o root -g root -m 755 debian/prerm $(tmpdir)/DEBIAN
            # Build the package
            dpkg-shlibdeps -dPre-Depends $(tmpdir)/bin/sed
            dpkg-gencontrol -isp
            dpkg --build debian/tmp ..

    checkroot:
            test root = "`whoami`"

Basically, a rules file is a Makefile with special targets that are invoked by dpkg. The real action happens in the binary-arch target. It installs the binaries into a temporary debian/tmp tree, makes various modifications and then generates the .deb by calling dpkg --build debian/tmp.
    #! /bin/sh -e

    pkg=sed

    if [ ! "$1" = "configure" ]; then
        exit 0
    fi

    install-info --quiet --section "General commands" "General commands" \
        /usr/share/info/sed.info

    if [ -d /usr/doc -a ! -e /usr/doc/$pkg -a -d /usr/share/doc/$pkg ] ; then
        ln -s ../share/doc/$pkg /usr/doc/$pkg
    fi

The postinst script registers the sed.info file with install-info in the directory of installed info files. It then does some maintenance to install a symlink to the documentation.
The process of writing these files is very involved and lots of things need to be synchronized in order to come up with a working final package.
Initially the files were all created manually. More and more functionality was added and more files became necessary. In order to cope with that we moved on to the use of templates. As the complexity increased even more and policies were established on exactly where to place files, what permissions to give them, when they should be compressed and so on, the task got even more complicated and errors easily slipped in. Scripts were written to generate the infrastructure automatically (my deb-make and related software was the first to do that). Today debhelper by Joey Hess is the tool that is most frequently used to generate all the files in the correct locations with the correct permissions and the correct compression.
DEBHELPER can:
As the number of packages increased, it became a burden to manage all the relationships between packages, and the tools that we had dealt mainly with individual packages. Situations can arise where a whole group of packages has to be upgraded in order to get additional functionality. Packages might have to be removed in order to enable other packages to be installed. Packages also need to be kept up to date. We needed an intelligent tool that dealt with all the complexities of interrelated packages when doing an upgrade. A package should not be installed if it depends on other packages that are not yet available. And a package should not be upgraded if that breaks software already on the system, and so forth. Such a tool for package management was developed, mainly by Jason Gunthorpe, and is called APT (Advanced Package Tool).
APT also allows the automatic retrieval from the net of all software needed for a specific action. This feature was new when APT arrived, and APT has since been ported to other Linux distributions that tried to do the same thing. APT can only work because we have a centralized archive of all our software and everything works together due to our policies and procedures. The rpm ports of APT face the challenge of a variety of archives with different software that often does not fit together.
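The core idea, installing dependencies before the package that needs them and refusing when a dependency is not available, can be sketched in a few lines. This is a deliberately simplified illustration and not how APT is implemented: it ignores versions, conflicts and removals. The function `install_order` and the package names in `ARCHIVE` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical archive: package name -> names of packages it depends on.
ARCHIVE: Dict[str, List[str]] = {
    "editor": ["libui", "libc"],
    "libui": ["libc"],
    "libc": [],
    "broken": ["missing-lib"],
}


def install_order(target: str,
                  archive: Dict[str, List[str]],
                  installed: Iterable[str] = ()) -> List[str]:
    """Return the packages to install, dependencies first.

    Raises LookupError if a needed package is not in the archive and
    ValueError if the dependencies form a cycle.
    """
    done: Set[str] = set(installed)   # already satisfied
    visiting: Set[str] = set()        # packages on the current path
    order: List[str] = []

    def visit(pkg: str) -> None:
        if pkg in done:
            return
        if pkg not in archive:
            raise LookupError(f"{pkg} is not available")
        if pkg in visiting:
            raise ValueError(f"dependency cycle through {pkg}")
        visiting.add(pkg)
        for dep in archive[pkg]:
            visit(dep)
        visiting.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order


if __name__ == "__main__":
    print(install_order("editor", ARCHIVE, installed={"libc"}))
    try:
        install_order("broken", ARCHIVE)
    except LookupError as error:
        print("refused:", error)
```

Checking the whole dependency chain before anything is installed is what allows a tool to refuse an action up front instead of leaving the system half upgraded.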
Currently the following Front ends are available:
It is evident that this is exponential growth. When I fed these numbers into a spreadsheet and tried to use its tools to predict the number of packages we would have in 2004 or 2006, I ended up with numbers in the millions of packages. So I extrapolated the growth of the number of packages manually and came up with some more reasonable numbers. Here is the diagram depicting what I think would be more reasonable scenarios:
The question is: does that much open source software exist? Debian already includes almost every open source package that I know of. Given the growth that we have seen so far, we should expect to have 100000 (one hundred thousand!) packages by the year 2006. The ratio of packages to maintainers started at something like 6 packages per maintainer. In woody we are approaching 10 packages per maintainer. If we had 100000 packages in Debian, then we would need 10000 (ten thousand!) package maintainers.
Note that there is a strange drop in the ratio from 1996 to 1998. I wonder why that happened?
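The ratio can be recomputed directly from the release table earlier in this talk. The following sketch does just that; the figures are copied from the table (several developer counts there are marked as interpolated), while the function names are made up for this example.

```python
import math
from typing import List, Tuple

# (release, year, packages, developers), copied from the release table.
RELEASES: List[Tuple[str, int, int, int]] = [
    ("0.93R6", 1995, 250, 60),
    ("1.1 (Buzz)", 1996, 474, 90),
    ("1.2 (Rex)", 1996, 848, 120),
    ("1.3 (Bo)", 1997, 974, 200),
    ("2.0 (hamm)", 1998, 1500, 400),
    ("2.1 (slink)", 1999, 2250, 410),
    ("2.2 (potato)", 2000, 3900, 450),
    ("3.0 (woody)", 2002, 9000, 1000),
]


def packages_per_developer(releases):
    """Return (release, year, packages / developers) for every release."""
    return [(name, year, packages / developers)
            for name, year, packages, developers in releases]


def developers_needed(packages: int, ratio: float) -> int:
    """Developers required to maintain `packages` at `ratio` packages each."""
    return math.ceil(packages / ratio)


if __name__ == "__main__":
    for name, year, ratio in packages_per_developer(RELEASES):
        print(f"{year}  {name:<13} {ratio:5.2f} packages per developer")
    print("developers for 100000 packages at 10 each:",
          developers_needed(100000, 10))
```

With the table's figures the ratio falls from about 7.1 for Rex to 3.75 for hamm. In that period the number of developers more than tripled (120 to 400) while the number of packages did not even double (848 to 1500), which may account for much of the drop.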
Being one among 1000 developers also makes the individual rather anonymous. The attraction for many developers in the past was the personal relationships that develop in the project. We need to reorganize the project into smaller groups where these significant relationships can develop.
The source based distributions that are now emerging are moving away from distributing binaries. Instead they provide packaging that allows the package manager to retrieve source code and then customize the source code for the particular system.
A while ago, documentation called "Linux From Scratch" was published, which gave instructions on how to build a system from plain sources, starting from nothing. Today this is hosted on http://www.linuxfromscratch.com. There is an open source macho image associated with it: if you are a "REAL" man then you will build your own Linux from nothing, and you do not really need a package manager. The source based distributions arose from this project by trying to simplify the work of maintaining a source based distribution. Some of those projects are SourceMage, Lunar Linux and Gentoo Linux.
The source based distributions typically retrieve patches and source code from the net and then have some kind of description that describes how to bring all the components together. The advantage of such an approach is that the source code could theoretically be built in a custom way for your system. The users of the packages are no longer limited by the options that the Debian or RedHat developer chose when the binary package was created. This implies a degree of flexibility that was not known before. The other big issue is that binaries are typically built for 386 CPUs and the 386 instruction set. Today we have much more powerful CPUs, like the Athlon or the Pentium 4, with additional instructions that cannot be used by the binary distributions, since the binaries have to be created for the least common denominator, which is still frequently seen as being the 386.
Issue | Binary | Source based |
---|---|---|
Customizability | The developer builds from source and configures the package | The developer provides a recipe describing how the software can be built from source. That process is dynamic and can change based on global system configuration by the end user or on the desire to have a package built in a special way. |
Deployment Speed | Fast: The binary only has to be downloaded and installed. | Can be very slow for big packages like X or GCC or GLIBC. The source code must be retrieved and then configured and compiled. This can take hours if not days. But as faster hardware becomes available the time to build software will decrease. Small and medium sized packages already build faster than the process of downloading binaries would take. |
Efficiency | Cannot be tuned to the actual machine the binary is deployed on | The binary can be built in an optimized way for the CPU and other characteristics of the target environment. Resources can be used more effectively. Programs run faster and are smaller. |
Reliability | Binaries are tested, and established mechanisms exist to check dependencies and ensure that the binary will work | This is new territory. Procedures for verifying source dependencies and source configurations are still in flux. As a result the build process is rather fragile |
Bandwidth use | Requires the maintenance of huge archives of binaries for all the different architectures supported. Complete binaries have to be downloaded for every update. | Only requires an archive that stores the recipes. Multiple arches are easily supported. Bandwidth use is limited since a minor update just requires the update of the recipe instead of a full download of a binary. Patches can be easily distributed using source based schemes. Typically upstream source code is cached so that a minor fix just requires a few kilobytes of bandwidth. The time spent rebuilding the package might be an issue though. |
The existing source based distributions are pretty immature. They typically use the same package dependency schemes (or minor variations thereof) already known from binary distributions and try to manage package relationships with them. The relationships between source based packages can be much more complex, since a package can be built with several options using a variety of tools. When I looked into these source based distributions I saw that they are rather fragile due to this issue. Builds would frequently abort, and it was necessary to fix things by hand. This puts the source based distributions out of the reach of the average person who wants to use software. A binary distribution such as Debian is much easier to handle and does not require knowledge about the build process in order to use the software.
Meta information about packages is difficult to maintain and has to change frequently. If this meta information is encapsulated in scripts then those scripts have to be manually reworked again and again. uPM puts the meta information in a rigid formal structure that makes the information machine processable. With that minimal information it is very easy to integrate a new package into a distribution.
I hope you remembered all the information necessary for package integration that we discussed before. Here is the recipe to integrate sed using uPM:
    W Super sed - stream editor
    S ftp://alpha.gnu.org/gnu/sed/{PV}-fixed.tar.gz
    H http://www.gnu.org/software/sed/{P}.html
    A LICENSE GPL-2
    R BUILD c-build
    R BUILD {NLS}
    A CONFOPT {NLS?:"--disable-nls"}
    A CHECK make check
    A DOC COPYING NEWS README* THANKS TODO AUTHORS BUGS
    E POSTINST mv usr/bin/sed bin/sed-{V}
    Y bin/sed sed-{V} 50

This is not the regular sed but an enhanced sed version. It has the ability to support national languages ({NLS}), which can be configured on a global scale for packages. This sed package also provides an alternative for /bin/sed in the Y record.
Well, that is basically a cursory overview. I sure wish that there were some way to lighten the burden of work for Debian developers using some of the ideas in uPM. You can find out more about the uPM design and get the code at http://telemetrybox.org/upm.
Thank you for your time. If you want to know more about this topic then look at the online resources for this presentation at http://telemetrybox.org/tokyo. The web page there also contains lots of references to source material used for this presentation.
Christoph has been a developer of the Debian Project since 1996.
In 1997 Christoph was elected to the Board of Directors of the Debian Project.
Christoph has contributed more than 150 packages to Debian among those alien (Package format converter), deb-make (rapid packaging tool, precursor to helper).
Since 1994 Christoph has contributed to the Open Source community various patches to utilities
and the Linux kernel as well as started some open source projects. Christoph has been speaking at various
conferences on technical and nontechnical issues since 1999.
Christoph has been serving as the representative of the Debian Project on the Advisory Council of the Linux Professional Institute since 2000.
Christoph is currently working on a temporary basis at the University of Phoenix and is looking for permanent opportunities for teaching and/or employment. You can reach Christoph at Christoph@Lameter.com.