Packaging RFC
There are many different people and organizations trying to use Globus (and succeeding). The Globus project wants to satisfy the needs of its users, but the reality is that different communities have different needs. Ideally, a tailored distribution could be constructed to meet the specific needs of an individual community. Right now, Globus' ability to be easily customized is minimal, and difficult to work with, at best. The goal of the packaging effort is to construct a framework and a set of Globus packages which can be used to create tailored Globus distributions that fit the specific needs of the organizations using Globus. Another goal of the packaging effort is to assist the process of developing the Globus toolkit, by allowing individual pieces of what is now a monolithic Globus distribution to be released on separate schedules.
The current monolithic Globus toolkit which consists of many components is inflexible in that it is difficult to distribute subsets of Globus. To alleviate this problem the Globus components will be divided into packages. Using the new packaging framework, it will be possible for organizations to construct both source and binary distributions of the selected packages they are interested in. No longer will users be forced to build and configure components which are of no interest to them.
Under the new packaging framework, it will be possible to release updates to an individual component without necessarily releasing updates to all other components. Additionally, an individual component can be built and tested without configuring and building unrelated components. Thus the packaging framework substantially increases the efficiency of the development and release process.
This paper is an attempt to lay out the design of a packaging framework that can be used to enable the packaging of Globus components. We will introduce packaging concepts, and discuss strategies and policies for dividing large pieces of software into logical packages (think Globus, but what we say is applicable in general). We will describe the different types of binary packages, as well as the distinctions between source packages and binary packages. We will also describe the mechanism by which the packaging system manages complex build environments, and what it means to have a Globus installation in the context of the packaging system. Finally, we will describe the metadata needed by the packaging system, for each package type, to allow the system to effectively manage the packages.
We plan to provide a simple portable framework for creating and managing Globus packages. Here is a list of the features that will be supported.
Install / Uninstall / Upgrade
The installation management of binary packages is the responsibility of a tool (or tools, collectively) known as the Package Manger. The Package Manager is the interface through which the person installing a package will install, uninstall, or upgrade it. An example of a package manager that is probably known to many of our readers is RPM, the RedHat Package Manager. On a RedHat Linux system, one typically uses RPM to install, uninstall, or upgrade any binary package.
Dependency tracking
There are three types of inter-package dependencies:
- Compile dependencies
A compile (or compile-time) dependency exists when a package requires
another package in order to compile. This usually means that the
dependent package is including a header from the package on which it
depends.
- Link dependencies
A link (or link-time) dependency exists when a program or library in a
package requires a library from another package in order to link. In
the case of programs linked with shared libraries, the packaging system
has to verify that the link dependencies have been fulfilled in a given
installation, as otherwise the programs will not run.
- Runtime dependencies
A runtime dependency exists when a program, script, or library requires
some resource from another package at runtime.
Versioning
Whenever more than one separately released pieces of software need to interact, a need arises to ensure that the software, as installed, is interoperable. The time-tested way of doing this is to assign a version number to each release of each piece of software, so that any given version is readily identifiable. This allows the encoding of version numbers into package dependencies, as you may know that your package foo 2.11 requires package bar 2.0 or greater.
Flavored binaries
Certain compile time options used when creating a library, must also be used when a program is linked that uses that library. Otherwise, linking errors will occur. For example, it can be important that the same compiler be used, or that the same threads package be used. Additionally, it is very important whether or not we are compiling 32 bit or 64 bit. We refer to such sets of compile time options as flavors.
Relocatable binaries
It is important for ease of installation that the binaries not be tied to specific directories. (i.e. a program should not insist on being installed in /usr/local/bin). However, given large sets of inter-package dependencies, especially with scripts calling programs, it is a relatively hard problem to enable the easy installation of packages into totally arbitrary locations. A reasonable compromise, which we make for Globus Toolkit packaging, is to insist that all packages comprising a given installation be installed into a single installation tree, but that tree can be rooted anywhere it is desired.
External programs and libraries
When distributing any software onto systems where the underlying operating system is not packaged using the same packaging system (i.e. every system onto which Globus is installed, unless someone makes a Linux or BSD distribution using our packaging system sometime in the future), there will be programs and libraries that do not have packaging system metadata associated with them. There are, in general, two ways of dealing with such external programs and libraries. One is to make special exceptions for them in the tools that check dependencies. The other, which is by no means mutually exclusive with the first, is to create "virtual packages" which consist of appropriate metadata for the external "package", and, if necessary, links from the install tree to the actual location of the external programs and/or libraries.
Compatible with existing package managers
The packaging system described in this document will have its own accompanying package manager, but the system has been designed with compatibility in mind. That is, it should be relatively simple to take a set of binary packages created for this system, and convert them into binary RPMS, for example. We are providing our own package manager so that we can ensure that a package manager is available on all platforms that we support, but organizations that will be creating distributions of Globus packages may wish to use their own package manager, for ease of installation management.
(A distribution of Globus packages is a set of Globus packages, along with the runtime configuration files (or tools to generate the runtime configuration files) needed for a particular set of users. RedHat linux is a good example of the distinction between packages and a distribution; each piece of software is a package, in an RPM, but the RedHat Linux Distribution includes tools (such as their install tool/bootdisk) that sets up the system for a particular configuration.)
Runtime Configuration files (vs. static data files) and who manages them (RPM vs. Linuxconf)
Programs, scripts, and possibly libraries, may require some information provided to them at runtime, per machine or per user, in order to function as desired. We will refer to files containing such information as runtime configuration files (we will always use this term instead of simply using "configuration files" to avoid confusing runtime configuration files with files used by configure when building a package). In this packaging system, binary packages may require that some runtime configuration files exist in order to function, but the package itself shall not install the actual files. This allows organizations that wish to create personalized distributions to create runtime configuration packages that can be installed and managed separately from the packages they relate to. This makes the process of upgrading a package without changing its runtime configuration files extremely easy--it is the default.
We can look to the Linux world for an example of this division. Linuxconf is a wizard that allows the user to relatively easily manipulate the runtime configuration files for various different packages that might be installed on a linux system. However, since RPM allows packages to install and manipulate their own runtime configuration files, there is no consistent method for ensuring that you retain your old runtime configuration when upgrading a package.
To illustrate this say that you have a globus package foo which has a runtime configuration file "foo.conf" that is modified by the user using the Linuxconf like GUI wizard "gui_fee". For this illustration the concept ownership is defined in such a way that when a package is responsible for installing, uninstalling, or updating a file it "owns" the file.
If foo.conf is owned by package foo then several problems arise. First any modifications by the user for gui_fee are lost whenever package foo is uninstalled, reinstalled, or updated. Second, foo.conf will be re-installed every time a new version of package foo is released even though the format of foo.conf most likely did not change. Some packagers have tried to resolve these problems by introducing pre and post install/uninstall scripts that are run during an action on package foo but this introduces an unacceptable amount of complexity to our packaging framework design.
The same problems occur when foo.conf is owned by the package gui_fee. In addition, gui_fee probably manages the runtime configurations of several packages not all of which have to be installed. Finally not all globus installations will be able to run a GUI wizard but will still need to have foo.conf.
The only acceptable solution is to have foo.conf in its own package freeing it from the actions needed for the other packages.
The Globus Toolkit has some requirements that dictate what the strategy for
splitting up the toolkit will be:
- The Toolkit has a complex network of dependencies between its various
components. Many of these dependencies are circular unless Globus is
split into atomic sized packages (ie. 1 package = 1 library). This
network of dependencies also dictate that the toolkit has to be treated
as a distribution of packages rather than a loose collection of packages.
- The development portion of the toolkit is built with a number of compiler
flavors. A compiler flavor is defined as the set of build environment
variables (compiler choice, linker choice, compiler/linker flags etc.)
which every binary that will be linked together has to use. Flavors
cannot be mixed which means that for any given source dependency tree
there is an equivalent binary dependency tree for every flavor.
- A significant portion of Globus is not compiled at all. Instead it
consist of scripts and data files which are not flavor specific.
- The installed files that are the product of Globus source code can be used
in a number of different ways as discussed in the Overview. None of these
uses require the exact same set of installed files. For example, consider
a source code package that installs the program foo, the static and shared
versions of the library libfoo, and the header foo.h. Program foo is needed
at runtime. The shared library libfoo.so is need at runtime if other
programs which link to it are installed. Both static and shared libraries
libfoo.a and libfoo.so as well as the header file are needed when the
package foo is used in a development tree.
- Several globus components use files that have to be modified after
installation. These files have to be treated special so that the Globus
user does not lose the modifications when packages are re-installed or
updated. (These files are by definition runtime configuration files, as
mentioned above.)
- Because of the complex dependencies anticipated between globus
packages a flexible versioning scheme is needed. This
scheme needs to allow both the package maintainer and the package
users to individually express compatibility among the package
versions. For example, the maintainer for package foo is releasing
an update. This maintainer needs to be able to express that the
update can be safely used in place of the previous
version. Conversely, the maintainer needs to be able to express that
the update is incompatible with previous versions of foo. In the
same way, the users of package foo need to be able to express which
versions of foo their package depends on.
The following design decisions were made to satisfy these requirements.
Small Packages
The idea behind package types is that smaller is better because it provides
flexibility. For example, dependency checking between packages becomes
simple if the contents of a particular package serve one consistent purpose.
If dependency checking is simple, then it can be automated through package
manager convenience tools. From this, users can manage a large number of
packages in groups rather than individually.
Small Source Packages
The source code contained in a source code package shall always be released
together. There is no provision in our current packaging framework for
different binary package types generated from the same source package to
have different version numbers. In other words all of the source code in a
source code package uses the same name and version number. Thus, putting
the source to a library, and to programs using that library into the same
source package implies that you will never want to release a new version of
the library without releasing a new version of the programs, and vice
versa.
Multiple Binary Package Generation
One source code package shall generate several binary packages. For
example, if the package generates binaries than each build flavor these
binaries were built with would need to be in a seperate binary package.
These packages will use the name and version of the source package and add
their own unique extension. This allows dependency checking to be tailored
to a particular user requirement (ie. run-time vs development tree) which
will simplify the checking.
Binary Package Types
All of the installed files of a given source code package build shall be
contained in several different binary package types depending on how the
files are used. No file shall belong to more than one binary package.
Consider the example discussed in requirement 4 in the previous section.
Here is how the installed files would be divided into seperate binary
packages:
- Dynamically linked program foo. This is used if the runtime
environment supports shared libraries.
- Statically linked program foo. This is used if the runtime
environment does not support shared libraries.
- Shared library libfoo.so. This is used when other dynamically
linked programs are linked with this library.
- Static library libfoo.a and foo.h. Used in a development tree.
Flavored Binary Packages
Any binary package that contains compiled code or files configured by
flavor shall be tagged with a flavor name. Any binary package that does
not contain these types of files will be tagged as a "noflavor" package.
Expressing Dependency data in packages
Each package shall store only it's direct dependencies. For example, if
the source code in package foo had an include statement which referenced
a header file in package fum then package foo has a direct dependency to
package fum. On the other hand if the header file in fum includes headers
from other packages those dependency belong to package fum not package foo.
A packaging system will have to examine all of the dependent packages in
order to obtain the entire dependency tree.
Circular Dependencies
A circular dependency is defined by the situation where dependencies
between two or more packages can only be accommodated if all of the
packages are installed simultaneously. For example if packages foo and
fum depend on each other then a PM install cannot install foo before fum.
Nor can it install fum before foo. This situation shall be resolved by
splitting up foo and fum in such a way that the dependency tree becomes
a directed acyclic graph.
Runtime Configuration Files
Files required by a globus package after installation will not be include
in the package as defined by this framework. Globus distributions are
encouraged to distribute these files in separate packages.
Libtool Versioning Scheme
We have adopted a variation of libtool's version numbering scheme.
In libtool's scheme, each version number consists of three fields,
major version number, minor version number, and a "compatibility
range" number. The major version number is bumped for any
interface change, the minor version represents gets bumped for
bugfixes, etc in a given interface, and the third number
represents the range for which the first number is backwardly
compatible. For example, 5.4.3 is backwardly compatible with
2.x.x and up (through 5), since 5-3=2. We will refer to this
versioning scheme as "aging version".
Dependency Version Specifications
To provide flexibility in specifying versions as a part of a
package dependency the following shall be done. A packager will
be able to specify the version of a dependency as either one
version number or a range. If if one version is specified, then
the framework will use the compatibility range mentioned
previously to determine whether a dependency is met or not. If a
range is specified then the packaging framwork will look for
versions only within that range. In the source package a packager
will be able to specify a list of ranges and versions. This list
shall be in order of preference. We will refer to this list as a
"version specification".
As an example consider the installed package foo which has a
version number of 5.3. As was mentioned in the previous
section, this specifies a compatibility range of 2 to 5. Now
we want to install package fum which depends on foo. The
following table shows how the versioning works:
Version specification for
Dependency foo to fum |
Dependency is met? |
1 |
No |
1 to 4 |
No |
4 to 4 |
No |
3 |
Yes |
3 to 6 |
Yes |
Binary packages will only have the first version specification
that was met when the source package was built. So for
specifications were listed in the example, binary package foo
would have the version specification of 2.3.
Supported Package Types:
Source Package
Dynamically Linked Program Binary Package
Statically Linked Program Binary Package
Non-Flavored Headers Package
Development Binary Package
Data Binary Package
Document Binary Package
This package consists of source code, scripts, and documents which are
configured and built to produce binary packages. One source package will
produce one or more binary packages each of which is a different package type.
Source packages have two sources of dependencies to consider. The first source
are the compile and link dependencies that are present when the source code is
being built. The second source is the run-time dependencies that need to be
stored in the binary packages when they are generated.
Source packages are different from all of the other package types, in that
they are not managed by the package manager. Source packages are not
installed into the installation tree, so they do not need to include metadata
for the purpose of their own uninstallation. Rather, the metadata included in
a source package is necessary for ensuring that the compile and link
dependencies are satisfied when building the binary packages, and for
generating the metadata necessary for each binary package being produced.
The metadata can also be used by a convenience tool which builds/installs/
generates binary packages from multiple source package ordered by their
dependencies.
Source packages shall have the following metadata:
- Name of the source package.
- Aging version of the source package.
- Types of binary packages produced.
- A flag indicating whether the package is built with flavors or not.
- A version specification of a configuration specification package
if the package requires run-time configuration files.
- Compile Dependencies: Packages needed for compilation (headers etc.).
Each dependency comes with a list of version specifications
- Linking Dependencies: Packages containing the libraries needed for
linking. Note that this dependency ties to two different binary
package types dev and rtl because the user will have a choice on
whether to link against the shared or static binary of a package's
library. Each dependency comes with a list of version specifications
- Build Env. CFLAGS line containing defines needed to use the header
files. LIBS line containing external libraries need to link with
the libraries in this package. And various other build flags.
- Runtime Dependencies: Packages containing programs, scripts, and data
files needed to run programs or scripts in this package. These
dependencies will be transferred to the binary packages that are
built from this source packages. Each dependency comes with a list of
version specifications
This package contains dynamically linked executables and scripts. It will
always be generated from a source package and will share the source package's
name and version. If the package contains executables it shall also have a
flavor as part of its identity. If the package contains only scripts then it
can be designated as "noflavor".
Program packages can have run-time dependencies if their executables and
scripts call executables and scripts in other program packages. They could
also have runtime dependencies on data files and documents.
A program package can also have runtime linking dependencies if its
executables are linked with libraries from rtl packages. For example, say
that an executable links with libfoo in package fum. If the executable is
linked to the shared library libfoo.so then the linking dependency translates
to the fum_rtl package which will have to be installed before the program
package.
Dynamically linked program binary package metadata:
- Name of the package.
- Aging version of the package.
- Package type
- Flavor the package was built with (or noflavor). All of the libraries
listed in the link dependency list will have the same flavor.
- A version specification of a configuration specification package
if the package requires run-time configuration files.
- Runtime dependencies (including a version specification for each
dependency) with other pgm, data, and doc packages.
- Runtime linking dependencies (including a version specification for
each dependency) to rtl packages if the executables were linked with
shared libraries.
This package contains statically linked executables. It will always be
generated from a source package and will share the source package's name and
version. The package shall also have a flavor as part of its identity.
Static program packages can have run-time dependencies if their executables
call executables and scripts in other program packages. They could also have
runtime dependencies on data files and documents. In addition these packages
absorb the runtime dependencies of the static libraries they are linked with.
For example, consider a program foo that statically links with a library
libfee.a that has a system call to still another program fum. The library
libfee has a runtime dependency to the program fum. The program foo will
have to absorb this dependency so that program fum is installed before program
foo is installed.
A program package can also have regeneration dependencies if its executables
are linked with libraries from other packages. For example, say that an
executable links with libfoo in package fum. If the library was linked
statically to libfoo.a then the dependency is translated to the fum_dev
package. In this case the program package will have to be regenerated any time
fum_dev is updated. A build number will be updated to reflect the
regeneration.
None of the executables in a program package shall ever be built with a
mixture of static and shared package libraries because this complicates the
compatibility checks needed at runtime to make sure that all of the libraries
are compatible.
Statically linked program binary package metadata:
- Name of the package.
- Aging version of the package.
- Package type
- Flavor the package was built with. All of the libraries listed in
the package regeneration dependency list will have the same flavor.
- A version specification of a configuration specification package
if the package requires run-time configuration files.
- Package regeneration dependencies (including a version specification
for each dependency) to dev packages if the executables were linked
with static libraries. This list is the same as the runtime linking
dependencies list of the pgm packages except for the package type.
- Runtime dependencies (including a version specification for each
dependency) with other pgm, data, and doc packages. In addition to
the runtime dependencies of the executables and scripts, this list
also contains the runtime dependencies of the static libraries for the
entire regeneration dependency tree.
This package contains flavored header files, static libraries, and
libtool library files. It will always be generated from a source
package and will share the source package's name and version. The
package shall always have a flavor as part of its identity.
Development packages can have run-time dependencies if their libraries call
executables and scripts in other program packages. They could also have
runtime dependencies on data files and documents.
Even though development packages are not installed for run-time they can still
have run-time dependencies with other pgm, data, and doc packages if the
libraries access files or programs in these packages. The run-time
dependencies of a static library will have to be absorbed by a pgm_static
package if it contains an executable that was linked with the library.
A development package can have a compile dependency to another dev package if
it contains a header file that includes headers from the other package.
A development package can also have linking dependencies if its libraries use
symbols from libraries contained in other packages. These dependencies are
contained here so that the dependency tree for an executable (from a
pgm_static) can be recursively extracted when the executable is built.
Development binary package metadata:
- Name of the package.
- Aging Version of the package.
- Package type
- Flavor the package was built with. All of the libraries
listed in the link dependency list will have the same flavor.
- A version specification of a configuration specification package
if the package requires run-time configuration files.
- Compile dependencies if files from the package include headers from
other packages. In most cases this list is the same as the linking
dependencies.
- Linking dependencies (including a version specification for each
dependency) to other dev packages if its libraries use symbols
from other libraries.
- Build Env. CFLAGS line containing defines needed to use the header
files. LIBS line containing the libraries provided by this package,
the external libraries needed to link with the libraries in this
package, and various other build flags.
- Runtime dependencies (including a version specification for each
dependency) with other pgm, data, and doc packages.
This package contains header files. It will always be generated
from a source package and will share the source package's name and
version. The package contains only header files which are not
configured for a flavor and so is assumed to be "noflavor".
A non-flavored headers package can have a compile dependency to
another dev package if it contains a header file that includes
headers from the other package.
Non-Flavored Headers binary package metadata:
- Name of the package.
- Aging version of the package.
- Package type
- Compile dependencies if header files from the package include headers
from other packages.
This package contains libraries used at run-time by programs and scripts. It
will always be generated from a source package and will share the source
package's name and version. If the package contains binaries it shall also
have a flavor as part of its identity. Otherwise it is a noflavor.
Runtime packages can have run-time dependencies if their libraries call
executables and scripts in other program packages. They could also have
runtime dependencies on data files and documents.
Runtime packages have linking dependencies which are needed at runtime. For
example when a program using shared library foo starts execution, it needs to
load libfoo.so as well as all of the shared libraries that libfoo depends on
for symbols.
Runtime library binary package metadata:
- Name of the package.
- Aging version of the package.
- Package type
- Flavor the package was built with. All of the libraries listed in the
link dependency list will have the same flavor.
- A version specification of a configuration specification package
if the package requires run-time configuration files.
- Linking dependencies (including a version specification for each
dependency) to other rtl packages if its libraries use symbols from
other libraries.
- Runtime dependencies (including a version specification for each
dependency) with other pgm, data, and doc packages.
This package contains data files which cannot be modified by users. It will
always be generated from a source package and will share the source package's
name and version. If the package shall also have a flavor as part of its
identity if any data files are configured for flavored. Otherwise it will be
noflavored.
Data packages have run-time dependencies if data files include files from
other data packages.
Data binary package metadata:
- Name of the package.
- Aging version of the package.
- Package type
- Flavor the package was built with or noflavor.
- Runtime dependencies (including a version specification for each
dependency) to other data packages if its data files include files
from other dev packages.
This package contains documents. It will always be generated from a source
package and will share the source package's name and version. It will always
be noflavored.
Document packages have run-time dependencies if document files include files
from other doc packages.
Document binary package metadata:
- Name of the package
- Version of the package
- Package type
- Flavor the package was built with or noflavor
- Runtime dependencies (including a version specification for each
dependency) to other document packages if its document files include
files from other dev packages
Globus Installations
A globus installation is defined as a set of packages installed in one
location whose dependencies are completely resolved with other packages
installed in the same location.
A platform can (should) have multiple globus installations. The packages in
any one of these installations shall have no knowledge of the other
installations.
Users shall be able to switch between installations by using the environment
variable $GLOBUS_INSTALL_PATH.
Only one version of a package shall be installed in any particular globus
installation.
Multiple flavors and binary package types for a particular package can be
installed in the same globus installation.
Only one flavor of a pgm or pgm_static package can be installed. This is
because executables are not tagged with the flavor name.
Other than programs, any installed file that is configured for a flavor shall
have the flavor name appended to its filename. For example libfoo.so compiled
with the flavor sweet shall be installed as libfoo_sweet.so. The exception is
flavored header files which keep their name but are installed in a flavored
subdirectory. In other words a header foo.h which is configured for the
flavor sweet is installed as $GLOBUS_INSTALL_PATH/include/sweet/foo.h.
Every package shall install packaging metadata in $GLOBUS_INSTALL_PATH/etc so
that the data can be used for package management tasks, for building other
globus packages, and for building applications which use globus components.
Globus Core
Globus core shall be used to define the build environment for a particular
flavor. The flavor is defined when the core is configured. Specifics for a
flavor can be passed into configure using --with-* and --enable-* options.
The flavor name is specified using the --with-flavor= option.
Globus core shall install a flavor specific header file and a flavor specific
script initializer which shall be used to build all other packages for that
particular flavor.
Globus core treats the name of a flavor as an arbitrary label as long as it
can be used in directory and file names.
The Globus group shall establish a flavor naming policy that allows packages
from different globus distributions to be interoperable. This policy shall
not be part of globus core.
Globus core shall provide a script which will create a run time configuration
file that locates various scripting tools needed to run globus scripts and
build globus components.