4.12. Binary Formats

Typically, when a command is passed to the shell, the shell arranges for an executable file to be loaded into memory and for a new process to be created. An executable file is either a binary file (usually produced by the linker as part of compiling a program) or a script (a text file to be interpreted by a binary such as sh(1) or perl(1)). The file(1) command can usually determine what is inside a file.
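The distinction between a binary and a script can often be made from the first few bytes of the file alone. The following is a minimal sketch of that idea, not a description of how file(1) is actually implemented. It assumes only two well-known signatures: the four-byte ELF magic number and the "#!" marker that opens an interpreted script. The function names classify_executable and classify_path are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"   # first four bytes of every ELF file
SHEBANG = b"#!"          # first two bytes of an interpreted script


def classify_executable(header: bytes) -> str:
    """Guess the kind of executable from the leading bytes of a file."""
    if header.startswith(ELF_MAGIC):
        return "ELF binary"
    if header.startswith(SHEBANG):
        # The rest of the first line names the interpreter, e.g. /bin/sh.
        first_line = header[2:].split(b"\n", 1)[0]
        interpreter = first_line.strip().decode("utf-8", errors="replace")
        return f"script for {interpreter}"
    return "unknown"


def classify_path(path: str) -> str:
    """Read the first 128 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return classify_executable(f.read(128))
```

Calling classify_path("/bin/sh") on a modern FreeBSD system would be expected to report an ELF binary. Other formats, such as a.out, carry their own signatures and are simply reported as unknown by this sketch.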

Binary files need a well-defined format so that the system can use them properly. Part of the file is the executable machine code (the instructions that tell the CPU what to do), part is data space with pre-defined values, part is data space with no pre-defined values, and so on. Over time, different binary file formats have evolved.
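As an illustration of what "well-defined" means in practice, the sketch below decodes a few fields from the 16-byte identification block that begins every ELF file. The byte offsets and values used (class at offset 4, data encoding at offset 5, OS ABI at offset 7, with 9 meaning FreeBSD) come from the ELF specification. The tables are deliberately partial, and the names ElfIdent and parse_e_ident are hypothetical.

```python
from typing import NamedTuple

# Partial lookup tables for fields of the ELF identification block.
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "little-endian", 2: "big-endian"}
ELFOSABI = {0: "System V", 3: "Linux", 9: "FreeBSD"}


class ElfIdent(NamedTuple):
    word_size: str
    byte_order: str
    os_abi: str


def parse_e_ident(e_ident: bytes) -> ElfIdent:
    """Decode selected fields from the first 16 bytes of an ELF file."""
    if len(e_ident) < 16 or e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF identification block")
    return ElfIdent(
        word_size=EI_CLASS.get(e_ident[4], "unknown"),
        byte_order=EI_DATA.get(e_ident[5], "unknown"),
        os_abi=ELFOSABI.get(e_ident[7], "other"),
    )
```

Because every field sits at a fixed offset with a fixed meaning, the loader can decide how to interpret the rest of the file before reading any of it.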

To understand why FreeBSD uses the elf(5) format, it helps to look at the three currently dominant executable formats for UNIX®, namely a.out(5), COFF, and ELF, and at how they came about.

FreeBSD comes from the "classic" camp and used the a.out(5) format, a technology tried and proven through many generations of BSD releases, until the beginning of the 3.X branch. Though it was possible to build and run native ELF binaries and kernels on a FreeBSD system for some time before that, FreeBSD initially resisted the push to switch to ELF as the default format. Why? When Linux made its painful transition to ELF, the cause was its inflexible jump-table based shared library mechanism, which made the construction of shared libraries difficult for vendors and developers. Since the ELF tools offered a solution to the shared library problem and were generally seen as the way forward, the migration cost was accepted as necessary and the transition was made. FreeBSD's shared library mechanism is based more closely on the SunOS™ style shared library mechanism and is easy to use, so FreeBSD did not face the same pressure to switch.

So, why are there so many different formats? Back in the PDP-11 days when simple hardware supported a simple, small system, a.out was adequate for the job of representing binaries. As UNIX® was ported, the a.out format was retained because it was sufficient for the early ports of UNIX® to architectures like the Motorola 68k or VAXen.

Then some hardware engineer decided that if he could force software to do some sleazy tricks, a few gates could be shaved off the design and the CPU core could run faster. a.out was ill-suited for this new kind of hardware, known as RISC. Many formats were developed to get better performance from this hardware than the limited, simple a.out format could offer. COFF, ECOFF, and a few others were invented and their limitations explored before settling on ELF.

In addition, program sizes were getting huge while disks and physical memory were still relatively small, so the concept of a shared library was born. The virtual memory system became more sophisticated. While each advancement was made using the a.out format, its usefulness was stretched with each new feature. People also wanted to dynamically load things at run time, or to junk parts of their program after the init code had run in order to save core memory and swap space. Languages became more sophisticated, and people wanted code to be called automatically before the main() function. Lots of hacks were made to the a.out format to allow all of these things to happen, and they basically worked for a time. Eventually, a.out was not up to handling all these problems without an ever-increasing overhead in code and complexity. While ELF solved many of these problems, it would be painful to switch from a system that basically worked. So ELF had to wait until it was more painful to remain with a.out than it was to migrate to ELF.

As time passed, the tools from which FreeBSD derived its build tools, especially the assembler and loader, evolved in two parallel trees. The FreeBSD tree added shared libraries and fixed some bugs. The GNU folks who originally wrote these programs rewrote them and added simpler support for building cross compilers and plugging in different formats. Those who wanted to build cross compilers targeting FreeBSD were out of luck, since the older sources that FreeBSD had for as(1) and ld(1) were not up to the task. The new GNU tool chain (binutils) supports cross compiling, ELF, shared libraries, and C++ extensions. In addition, many vendors release ELF binaries, and FreeBSD should be able to run them.

ELF is more expressive than a.out and allows more extensibility in the base system. The ELF tools are better maintained and offer cross compilation support. ELF may be a little slower than a.out, but the difference is hard to measure. There are also numerous details that differ between the two, such as how they map pages and handle init code.
