[[meta copyright="Copyright © 2001 Marcus Brinkmann"]] [[meta license="Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved."]] [[meta title="The Hurd, a presentation by Marcus Brinkmann"]]
This talk about the Hurd was written by Marcus Brinkmann for
When we talk about free software, we usually refer to the free software licenses. We also need relief from software patents, so our freedom is not restricted by them. But there is a third type of freedom we need, and that's user freedom.
Expert users don't take a system as it is. They like to change the configuration, and they want to run the software that works best for them. That includes window managers as well as your favourite text editor. But even on a GNU/Linux system consisting only of free software, you cannot easily use the filesystem format, network protocol or binary format you want without special privileges. In traditional unix systems, user freedom is severely restricted by the system administrator.
The Hurd removes these restrictions from the user. It provides a user-extensible system framework without giving up POSIX compatibility and the unix security model. Throughout this talk, we will see that this brings further advantages besides freedom.
The Hurd is a POSIX compatible multi-server system operating on top of the GNU Mach microkernel. Topics:
|
The Hurd is a POSIX compatible multi-server system operating on top of the GNU Mach Microkernel.
I will have to explain what GNU Mach is, so we start with that. Then I will talk about the Hurd's architecture. After that, I will give a short overview of the Hurd libraries. Finally, I will tell you how the Debian project is related to the Hurd.
|
When Richard Stallman founded the GNU project in 1983, he wanted to write an operating system consisting only of free software. Very soon, a lot of the essential tools were implemented, and released under the GPL. However, one critical piece was missing: The kernel.
After considering several alternatives, it was decided not to write a new kernel from scratch, but to start with the Mach microkernel. This was in 1988, and it was not until 1991 that Mach was released under a license allowing the GNU project to distribute it as a part of the system.
In 1998, I started the Debian GNU/Hurd project, and in 2001 the number of available GNU/Hurd packages fills three CD images.
Microkernel:
Monolithic kernel:
|
Microkernels were very popular in the scientific world around that time. They don't implement a full operating system, but only the infrastructure needed to enable other tasks to implement most features. In contrast, monolithic kernels like Linux contain program code for device drivers, network protocols, process management, authentication, file systems, POSIX compatible interfaces and much more.
So what are the basic facilities a microkernel provides? In general, this is resource management and message passing. Resource management, because the kernel task needs to run in a special privileged mode of the processor, to be able to manipulate the memory management unit and perform context switches (also to manage interrupts). Message passing, because without a basic communication facility the other tasks could not interact to provide the system services. Some rudimentary hardware device support is often necessary to bootstrap the system. So the basic jobs of a microkernel are enforcing the paging policy (the actual paging can be done by an external pager task), scheduling, message passing and probably basic hardware device support.
Mach was the obvious choice back then, as it provides a rich set of interfaces to get the job done. Besides a rather brain-dead device interface, it provides tasks and threads, a messaging system allowing synchronous and asynchronous operation, and a complex interface for external pagers. It's certainly not one of the sexiest microkernels that exist today, but more like a big old mama. The GNU project maintains its own version of Mach, called GNU Mach, which is based on Mach 4.0. In addition to the features contained in Mach 4.0, the GNU version contains many of the Linux 2.0 block device and network card drivers.
A complete treatment of the differences between a microkernel and a monolithic kernel design cannot be provided here. But a couple of advantages of a microkernel design are fairly obvious.
Microkernel
Monolithic kernel
|
Because the system is split up into several components, clean interfaces have to be developed, and the responsibilities of each part of the system must be clear.
Once a microkernel is written, it can be used as the base for several different operating systems. Those can even run in parallel, which makes debugging easier. When porting to new hardware, most of the hardware-dependent code is confined to the kernel.
Much of the code that doesn't need to run in the special kernel mode of the processor is not part of the kernel, so stability increases because there is simply less code to break.
New features are not added to the kernel, so there is no need to set the barrier for new operating system features particularly high.
Compare this to a monolithic kernel, where you either suffer from creeping featuritis or become intolerant of new features (we see both in the Linux kernel).
Because in a monolithic kernel all parts of the kernel can access all data structures in other parts, it is more likely that shortcuts are taken to avoid the overhead of a clean interface. This leads to a simple speed-up of the kernel, but also makes it less comprehensible and more error prone. A small change in one part of the kernel can break seemingly unrelated parts.
Single Server
Multi Server
A single-server system is comparable to a monolithic kernel system. It has similar advantages and disadvantages. |
There exist a couple of operating systems based on Mach, but they all have the same disadvantages as a monolithic kernel, because those operating systems are implemented in one single process running on top of the kernel. This process provides all the services a monolithic kernel would provide. This doesn't make a whole lot of sense (the only advantage is that you can probably run several such isolated single servers on the same machine). Those systems are also called single-server systems. The Hurd is the only usable multi-server system on top of Mach. In the Hurd, there are many server programs, each one responsible for a unique service provided by the operating system. These servers run as Mach tasks, and communicate using the Mach message passing facilities. Each of them provides only a small part of the functionality of the system, but together they build up a complete and functional POSIX compatible operating system.
A multi-server system has several advantages over a single-server system:
|
Using several servers has many advantages, if done right. If a file system server for a mounted partition crashes, it doesn't take down the whole system. Instead the partition is "unmounted", and you can try to start the server again, perhaps debugging it this time with gdb. The system is less prone to errors in individual components, and overall stability increases. The functionality of the system can be extended by writing and starting new servers dynamically. (Developing these new servers is easier for the reasons just mentioned.)
But even in a multi-server system the barrier between the system and the users remains, and special privileges are needed to cross it. We have not achieved user freedom yet.
The Hurd goes beyond all this, and allows users to write and run their servers, too!
|
To quote Thomas Bushnell, BSG, from his paper [[``Towards_a_New_Strategy_of_OS_design''_(1996)|hurd-paper]]:
The GNU Hurd, by contrast, is designed to make the area of system code as limited as possible. Programs are required to communicate only with a few essential parts of the kernel; the rest of the system is replaceable dynamically. Users can use whatever parts of the remainder of the system they want, and can easily add components themselves for other users to take advantage of. No mutual trust need exist in advance for users to use each other's services, nor does the system become vulnerable by trusting the services of arbitrary users.
So the Hurd is a set of servers running on top of the Mach micro-kernel, providing a POSIX compatible and extensible operating system. What servers are there? What functionality do they provide, and how do they cooperate?
Ports are message queues which can be used as one-way communication channels.
MIG provides remote procedure calls on top of Mach IPC. RPCs look like function calls to the user. |
Inter-process communication in Mach is based on the ports concept. A port is a message queue, used as a one-way communication channel. In addition to a port, you need a port right, which can be a send right, receive right, or send-once right. Depending on the port right, you are allowed to send messages to the server, receive messages from it, or send just one single message.
For every port, there exists exactly one task holding the receive right, but there can be any number of senders, including none. The send-once right is useful for clients expecting a response message. They can pass a send-once right to a reply port along with their message. The kernel guarantees that at some point a message will be received on the reply port (this can be a notification that the server destroyed the send-once right).
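As a small illustration, here is a minimal C sketch (not taken from the Hurd sources) of how a task can create a port for which it holds the receive right, and then derive a send right that could be handed to other tasks as a one-way channel back to it. Only the standard Mach calls mach_port_allocate and mach_port_insert_right are used:

    #include <mach.h>
    #include <stdio.h>

    int
    main (void)
    {
      mach_port_t port;
      kern_return_t err;

      /* Create a new port; the calling task gets the receive right.  */
      err = mach_port_allocate (mach_task_self (),
                                MACH_PORT_RIGHT_RECEIVE, &port);
      if (err != KERN_SUCCESS)
        return 1;

      /* Derive a send right from the receive right, so the port can
         also be given to other tasks that want to send us messages.  */
      err = mach_port_insert_right (mach_task_self (), port, port,
                                    MACH_MSG_TYPE_MAKE_SEND);
      if (err != KERN_SUCCESS)
        return 1;

      printf ("port name: %u\n", (unsigned int) port);
      return 0;
    }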
You don't need to know much about the format a message takes to be able to use the Mach IPC. The Mach interface generator mig hides the details of composing and sending a message, as well as receiving the reply message. To the user, it just looks like a function call, but in truth the message could be sent over a network to a server running on a different computer. The set of remote procedure calls a server provides is the public interface of this server.
Traditional Mach:
The Hurd:
|
So how does one get a port to a server? You need something like a phone book for server ports, or otherwise you can only talk to yourself. In the original Mach system, a special nameserver is dedicated to that job. A task could get a port to the nameserver from the Mach kernel and ask it for a port (with send right) to a server that registered itself with the nameserver at some earlier time.
In the Hurd, there is no nameserver. Instead, the filesystem is used as the server namespace. This works because there is always a root filesystem in the Hurd (remember that the Hurd is a POSIX compatible system); this is an assumption the people who developed Mach couldn't make, so they had to choose a different strategy. You can use the function hurd_file_name_lookup, which is part of the C library, to get a port to the server belonging to a filename. Then you can start to send messages to the server in the usual way.
mach_port_t identity;
mach_port_t pwserver;
kern_return_t err;

pwserver = hurd_file_name_lookup ("/servers/password");

err = password_check_user (pwserver, 0 /* root */,
                           "supass", &identity);
|
As a concrete example, the special filename /servers/password can be used to request a port to the Hurd password server, which is responsible for checking user-provided passwords.
(explanation of the example)
Task: Lookup /mnt/readme.txt where /mnt has a mounted filesystem.
|
The C library itself does not have a full list of all available servers. Instead, pathname resolution is used to traverse a tree of servers. In fact, filesystems themselves are implemented by servers (let us ignore the chicken and egg problem here). So all the C library can do is ask the root filesystem server about the filename provided by the user (assuming that the user wants to resolve an absolute path), using the dir_lookup RPC. If the filename refers to a regular file or directory on the filesystem, the root filesystem server just returns a port to itself and records that this port corresponds to the file or directory in question. But if a prefix of the full path matches the path of a server the root filesystem knows about, it returns to the C library a port to this server and the remaining part of the pathname that couldn't be resolved. The C library then has to retry and query the other server about the remaining path components. Eventually, the C library will either know that the remaining path can't be resolved by the last server in the list, or get a valid port to the server in question.
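To make the retry loop concrete, here is a heavily simplified sketch of what such a resolution loop could look like. This is not the C library's actual code (the real implementation lives in glibc and also handles reauthentication, symbolic links, magic retries and errors); it only shows the dir_lookup/retry pattern described above, starting from a port to the root directory as obtained with getcrdir(). The function name lookup_sketch is made up for this example:

    #include <hurd.h>
    #include <hurd/fs.h>
    #include <hurd/hurd_types.h>
    #include <string.h>

    /* Simplified sketch of pathname resolution: keep asking the current
       server to resolve as much of NAME as it can, and retry on the port
       it hands back until nothing is left to resolve.  */
    static file_t
    lookup_sketch (const char *path)
    {
      file_t dir = getcrdir ();         /* port to the root directory */
      file_t result = MACH_PORT_NULL;
      retry_type do_retry;
      string_t retry_name;
      char name[1024];

      strncpy (name, path, sizeof name - 1);
      name[sizeof name - 1] = '\0';

      for (;;)
        {
          error_t err = dir_lookup (dir, name, 0, 0,
                                    &do_retry, retry_name, &result);
          if (err)
            return MACH_PORT_NULL;

          /* Resolution is complete: RESULT is the port we were after.  */
          if (do_retry == FS_RETRY_NORMAL && retry_name[0] == '\0')
            return result;

          /* Otherwise RESULT is a port to another server; retry there
             with the path components that are still unresolved.  */
          dir = result;
          strcpy (name, retry_name);
        }
    }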
|
It should by now be obvious that the port returned by the server can be used to query the file's status, content and other information from the server, if suitable remote procedure calls to do that are defined and implemented by it. This is exactly what happens. Whenever a file is opened using the C library's open() call, the C library uses the above pathname resolution to get a port to a server providing the file. Then it wraps a file descriptor around it. So in the Hurd, for every open file descriptor there is a port to a server providing this file. Many other C library calls like read() and write() just call a corresponding RPC using the port associated with the file descriptor.
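You can observe this from an ordinary program. The following sketch uses the Hurd-specific C library function getdport() to obtain the port hiding behind a file descriptor (the file /etc/motd is just an example; any existing file will do):

    #include <hurd.h>
    #include <mach.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main (void)
    {
      int fd = open ("/etc/motd", O_RDONLY);
      if (fd < 0)
        return 1;

      /* getdport returns a send right to the server providing this file;
         the file descriptor is little more than a wrapper around it.  */
      mach_port_t server_port = getdport (fd);
      printf ("fd %d is backed by port %u\n", fd, (unsigned int) server_port);

      /* Release our extra reference to the port and close the descriptor.  */
      mach_port_deallocate (mach_task_self (), server_port);
      close (fd);
      return 0;
    }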
|
So we don't have a single phone book listing all servers, but rather a tree of servers keeping track of each other. That's really like calling your friend and asking for the phone number of the blond girl at the party yesterday. He might refer you to a friend who hopefully knows more about it. Then you have to retry.
This mechanism has huge advantages over a single nameserver. First, note that standard unix permissions on directories can be used to restrict access to a server (this requires that the filesystems providing those directories behave). You just have to set the permissions of a parent directory accordingly and provide no other way to get a server port.
But there are much deeper implications. Most of all, a pathname never directly refers to a file, it refers to a port of a server. That means that providing a regular file with static data is just one of the many options the server has to service requests on the file port. A server can also create the data dynamically. For example, a server associated with /dev/random can provide new random data on every io_read() on the port to it. A server associated with /dev/fortune can provide a new fortune cookie on every open().
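Assuming such a fortune translator is installed at /dev/fortune, a client that prints a cookie is plain POSIX code. Nothing in it is Hurd-specific; the client neither knows nor cares that a server generates the data on the fly:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main (void)
    {
      char buf[256];

      /* An ordinary open/read; behind the scenes, each call becomes an
         RPC to whatever server sits at /dev/fortune.  */
      int fd = open ("/dev/fortune", O_RDONLY);
      if (fd < 0)
        return 1;

      ssize_t n = read (fd, buf, sizeof buf - 1);
      if (n > 0)
        {
          buf[n] = '\0';
          fputs (buf, stdout);
        }
      close (fd);
      return 0;
    }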
While a regular filesystem server will just serve the data as stored in a filesystem on disk, there are servers providing purely virtual information, or a mixture of both. It is up to the server to behave and provide consistent and useful data on each remote procedure call. If it does not, the results may not match the expectations of the user and confuse him.
A footnote from the Hurd info manual:
(1) You are lost in a maze of twisty little filesystems, all alike....
Because a server installed in the filesystem namespace translates all filesystem operations that go through its root path, such a server is also called an "active translator". You can install translators using the settrans command with the -a option.
Active Translators:
|
Many translator settings remain constant for a long time. It would be very lame to repeat the same couple of dozen settrans calls manually or at boot time. So the Hurd provides a filesystem extension that allows translator settings to be stored inside the filesystem, and lets the filesystem servers do the work of starting those servers on demand. Such translator settings are called "passive translators". A passive translator is really just a command line string stored in an inode of the filesystem. If, during a pathname resolution, a server encounters such a passive translator, and no active translator already exists for this node, it will use this string to start up a new translator for this inode, and then let the C library continue with the path resolution as described above. Passive translators are installed with settrans using the -p option (which is already the default).
Passive Translators:
|
So passive translators also provide a sort of automounting feature, because no manual interaction is required. The server start-up is deferred until the service is needed, and it is transparent to the user.
When a passive translator is started up, it runs as a normal process with the same user and group id as those of the underlying inode. Any user is allowed to install passive and active translators on inodes that he owns. This way the user can install new servers into the global namespace (for example, in his home or tmp directory) and thus extend the functionality of the system (recall that servers can implement other remote procedure calls besides those used for files and directories). A careful design of the trusted system servers makes sure that no permissions leak out.
In addition, users can provide their own implementations of some of the system servers instead of the system defaults. For example, they can use their own exec server to start processes. The user-specific exec server could, for example, start java programs transparently (without invoking the interpreter manually). This is done by setting the environment variable EXECSERVERS. The system's default exec server will evaluate this environment variable and forward the RPC to each of the servers listed in turn, until some server accepts it and takes over. The system default exec server will only do this if there are no security implications. (There are other ways to start new programs than by using the system exec server. Those are still available.)
Let's take a closer look at some of the Hurd servers. It was already mentioned that only a few system servers are mandatory for users. To establish your identity within the Hurd system, you have to communicate with the trusted system's authentication server, auth. To keep the system administrator in control of the system components, the process server does some global bookkeeping.
But even these servers can be ignored. However, registration with the authentication server is the only way to establish your identity towards other system servers. Likewise, only tasks registered as processes with the process server can make use of its services.
A user identity is just a port to an auth server. The auth server stores four sets of ids for it:
Basic properties:
|
The Hurd auth server is used to establish the identity of a user for a server. Such an identity (which is just a port to the auth server) consists of a set of effective user ids, a set of effective group ids, a set of available user ids and a set of available group ids. Any of these sets can be empty.
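A program can look at its own identity. The sketch below (not part of the Hurd sources) uses the Hurd-specific C library extensions getauth(), which returns the identity port, and geteuids(), which lists the set of effective user ids carried by that identity. As described above, the set may contain several ids or none at all:

    #include <hurd.h>
    #include <mach.h>
    #include <stdio.h>
    #include <sys/types.h>

    int
    main (void)
    {
      /* The identity of this process is just a port to the auth server.  */
      auth_t auth = getauth ();
      printf ("auth port: %u\n", (unsigned int) auth);

      /* First ask how many effective user ids the identity carries,
         then fetch and print them.  */
      int n = geteuids (0, NULL);
      if (n > 0)
        {
          uid_t uids[n];
          n = geteuids (n, uids);
          for (int i = 0; i < n; i++)
            printf ("effective uid: %d\n", (int) uids[i]);
        }

      mach_port_deallocate (mach_task_self (), auth);
      return 0;
    }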
The auth server provides the following operations on ports:
|
If you have two identities, you can merge them and request from the auth server an identity consisting of the unions of the respective sets. You can also create a new identity consisting only of subsets of an identity you already have. What you cannot do is extend your sets, unless you are the superuser, which is denoted by having the user id 0.
|
Finally, the auth server can establish the identity of a user for a server. This is done by exchanging a server port and a user identity if both match the same rendezvous port. The server port will be returned to the user, while the server is informed about the id sets of the user. The server can then serve or reject subsequent RPCs by the user on the server port, based on the identity it received from the auth server.
Anyone can write a server conforming to the auth protocol, but of course all system servers use a trusted system auth server to establish the identity of a user. If the user is not using the system auth server, matching the rendezvous port will fail and no server port will be returned to the user. Because this practically requires all programs to use the same auth server, the system auth server is minimal in every respect, and additional functionality is moved elsewhere, so user freedom is not unnecessarily restricted.
The password server /servers/password runs as root and returns a new authentication port in exchange for a unix password. The ids corresponding to the authentication port match the unix user and group ids. Support for shadow passwords is implemented here. |
The password server sits at /servers/password and runs as root. It can hand out ports to the auth server in exchange for a unix password, matching it against the password or shadow file. Several utilities make use of this server, so they don't need to be setuid root.
The superuser must remain in control of user tasks, so:
Optionally, user tasks can store:
Also implemented in the proc server:
|
The process server is responsible for some global bookkeeping. As such, it has to be trusted and is not replaceable by the user. However, a user is not required to use any of its services. In that case, the user will not be able to take advantage of the POSIXish appearance of the Hurd.
Mach tasks are not as heavyweight as POSIX processes. For example, there is no concept of process groups or sessions in Mach. The proc server fills this gap. It provides a PID for all Mach tasks, and also stores the argument line, environment variables and other information about a process (if the Mach tasks provide them, which is usually the case if you start a process with the default fork()/exec()). A process can also register a message port with the proc server, which can then be requested by anyone. So the proc server also functions as a nameserver, using the process id as the name.
The proc server also stores some other miscellaneous information not provided by Mach, like the hostname, hostid and system version. Finally, it provides facilities to group processes and their ports together, as well as to convert between pids, process server ports and Mach task ports.
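The conversion facilities can be used directly. The following sketch (not taken from any Hurd utility) obtains a port to the proc server with the C library function getproc() and asks it, via the proc_task2pid RPC, which PID corresponds to the caller's own Mach task port:

    #include <hurd.h>
    #include <hurd/process.h>
    #include <mach.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main (void)
    {
      /* A port to the system proc server.  */
      process_t proc = getproc ();

      /* Translate our own task port into the PID the proc server assigned.  */
      pid_t pid;
      error_t err = proc_task2pid (proc, mach_task_self (), &pid);
      if (err)
        return 1;

      printf ("proc server says our pid is %d, getpid() says %d\n",
              (int) pid, (int) getpid ());

      mach_port_deallocate (mach_task_self (), proc);
      return 0;
    }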
User tasks not registering themselves with proc only have a PID assigned. Users can run their own proc server in addition to the system default, at least for those parts of the interface that don't require superuser privileges. |
Although the system default proc server can't be avoided (all Mach tasks spawned by users will get a pid assigned, so the system administrator can control them), users can run their own additional process servers if they want, implementing the features not requiring superuser privileges.
Store based filesystems
Network file systems
Miscellaneous
|
We already talked about translators and the file system service they provide. Currently, we have translators for the ext2, ufs and iso9660 filesystems. We also have an nfs client and an ftp filesystem. The latter in particular is intriguing, as it provides transparent access to ftp servers in the filesystem. Programs can start to move away from implementing a plethora of network protocols, as the files are directly available in the filesystem through the standard POSIX file interface.
Over a dozen libraries support the development of new servers. For special server types, highly specialized libraries require only the implementation of a number of callback functions.
|
The Hurd server protocols are complex enough to allow for the implementation of a POSIX compatible system with GNU extensions. However, a lot of code can be shared by all or at least similar servers. For example, all storage based filesystems need to be able to read and write a storage medium split into blocks. The Hurd comes with several libraries which make it easy to implement new servers. Also, there are already a lot of examples of different server types in the Hurd. This makes writing a new server easier.
libdiskfs is a library that supports writing store based filesystems like ext2fs or ufs. It is not very useful for filesystems which are purely virtual, like /proc or files in /dev.
libnetfs is intended for filesystems which provide a rich directory hierarchy, but don't use a backing store (for example ftpfs, nfs).
libtrivfs is intended for filesystems which just provide a single inode or directory. Most servers which are not intended to provide a filesystem but other services (like /servers/password) use it to provide a dummy file, so that file operations on the server's node will not return errors. But it can also be used to provide meaningful data in a single file, like a device store or a character device.
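To give a feel for how little is needed, here is a condensed sketch of a read-only translator built on libtrivfs, modeled on the classic "Hello, world!" example from the Hurd hacking documentation. It is a sketch under the assumption that the usual libtrivfs conventions apply; a real translator would track the file pointer per open and do proper error handling:

    /* hello.c -- sketch of a minimal read-only translator using libtrivfs.  */
    #define _GNU_SOURCE 1
    #include <hurd/trivfs.h>
    #include <mach.h>
    #include <errno.h>
    #include <error.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    static const char hello[] = "Hello, world!\n";

    /* Global configuration variables libtrivfs expects us to define.  */
    int trivfs_fstype = FSTYPE_MISC;
    int trivfs_fsid = 0;
    int trivfs_support_read = 1;
    int trivfs_support_write = 0;
    int trivfs_support_exec = 0;
    int trivfs_allow_open = O_READ;

    void
    trivfs_modify_stat (struct trivfs_protid *cred, io_statbuf_t *st)
    {
      st->st_size = sizeof hello - 1;     /* advertise our size */
    }

    error_t
    trivfs_goaway (struct trivfs_control *cntl, int flags)
    {
      exit (EXIT_SUCCESS);
    }

    /* Serve the constant string on every io_read RPC.  (A real translator
       would honor the per-open file pointer instead of ignoring it.)  */
    kern_return_t
    trivfs_S_io_read (struct trivfs_protid *cred,
                      mach_port_t reply, mach_msg_type_name_t replytype,
                      char **data, mach_msg_type_number_t *datalen,
                      loff_t offs, mach_msg_type_number_t amount)
    {
      const size_t size = sizeof hello - 1;

      if (! cred)
        return EOPNOTSUPP;
      if (! (cred->po->openmodes & O_READ))
        return EBADF;

      if (offs < 0 || offs > size)
        offs = 0;                         /* simplification */
      if (amount > size - offs)
        amount = size - offs;

      if (amount > 0)
        {
          if (*datalen < amount)          /* get a bigger reply buffer */
            *data = mmap (0, amount, PROT_READ | PROT_WRITE,
                          MAP_ANON | MAP_PRIVATE, 0, 0);
          memcpy (*data, hello + offs, amount);
        }
      *datalen = amount;
      return 0;
    }

    int
    main (void)
    {
      mach_port_t bootstrap;
      struct trivfs_control *fsys;
      error_t err;

      /* The filesystem that starts us hands us a bootstrap port.  */
      task_get_bootstrap_port (mach_task_self (), &bootstrap);
      if (bootstrap == MACH_PORT_NULL)
        error (1, 0, "must be started as a translator");

      err = trivfs_startup (bootstrap, 0, 0, 0, 0, 0, &fsys);
      if (err)
        error (2, err, "trivfs_startup");

      /* Serve RPCs on our node until we are asked to go away.  */
      ports_manage_port_operations_one_thread (fsys->pi.bucket,
                                               trivfs_demuxer, 0);
      return 0;
    }

Such a translator is linked against libtrivfs and libports and installed on a filesystem node with the settrans command mentioned earlier.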
Another very useful library is libstore, which is used by all store based filesystems. It provides a store media abstraction. A store consists of a store class and a name (which itself can sometimes contain stores). Primitive store classes:
|
Composed store classes:
Wanted: A similar abstraction for streams (based on channels), which can be used by network and character device servers. |
libstore provides a store abstraction, which is used by all store based filesystems. A store is determined by a type and a name, but some store types modify another store rather than providing a new one, and thus stores can be stacked. For example, the device store type expects a Mach device, while the remap store expects a list of blocks to pick from another store, like remap:1+:device:hd2, which would pick all blocks from hd2 but the first one, which is skipped. Because this functionality is provided in a library, all filesystems using libstore support many different store kinds, and adding a new store type is enough to make all store based filesystems support it.
Goal:
Constraints:
Side Goal:
|
The Debian distribution of the GNU Hurd that I started in 1998 is supposed to become a complete binary distribution of the Hurd that is easy to install.
See http://buildd.debian.org/stats/graph.png for the most current version of the statistic.
Plus:
Minus:
|
While good compatibility can be achieved at the source level, the binary packages cannot always express their relationship to the available architectures sufficiently.
For example, the Linux version of makedev is binary-all, where a binary-all-linux relationship would be more appropriate.
More work has to be done here to fix the tools.
Common pitfalls are POSIX incompatibilities:
|
Most packages are POSIX compatible and can be compiled without changes on the Hurd. The maintainers of the Debian source packages are usually very kind, responsive and helpful.
The Turtle autobuilder software (http://turtle.sourceforge.net) builds the Debian packages on the Hurd automatically.
Upstream benefits:
Debian benefits:
GNU/Hurd benefits:
|
The sheet lists the advantages for all groups involved.
Join us at |
List of contacts.