From 29cbd769fc8bb025856a24f9868df89f979cc903 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Thu, 27 Mar 2008 19:56:46 +0100
Subject: community/gsoc/project_ideas: Various amendments and improvements.
---
 community/gsoc/project_ideas.mdwn | 309 ++++++++++++++++++++------------------
 1 file changed, 166 insertions(+), 143 deletions(-)
diff --git a/community/gsoc/project_ideas.mdwn b/community/gsoc/project_ideas.mdwn
index f283155b..8e7a032f 100644
--- a/community/gsoc/project_ideas.mdwn
+++ b/community/gsoc/project_ideas.mdwn
@@ -210,7 +210,7 @@ This task will require digging into parts of the code to understand how file
 locking works on the Hurd. Only general programming skills are required.
-## procfs
+## `procfs`
 Although there is no standard (POSIX or other) for the layout of the `/proc`
 pseudo-filesystem, it turned out a very useful facility in GNU/Linux and other
@@ -316,7 +316,7 @@ discussing this topic, from a last year's GSoC application.
-## dtrace Support
+## `dtrace` Support
 One of the main problems of the current Hurd implementation is very poor
 performance. While we have a bunch of ideas what could cause the performance
@@ -343,11 +343,11 @@ with integrating existing components as well as low-level programming.
 ## Hurdish TCP/IP Stack
-The Hurd presently uses a TCP/IP stack based on code from an old Linux version.
+The Hurd presently uses a [[TCP/IP_stack|hurd/translator/pfinet]] based on code from an old Linux version.
 This works, but lacks some rather important features (like PPP/PPPoE), and the
 design is not hurdish at all.
-A true hurdish network stack will use a set of stack of translator processes,
+A true hurdish network stack will use a set of stack of [[hurd/translator]] processes,
 each implementing a different protocol layer. This way not only the
 implementation gets more modular, but also the network stack can be used way
 more flexibly. Rather than just having the standard socket interface, plus some
@@ -362,6 +362,8 @@ layers, it's up to the student to design and implement the various interfaces
 at each layer. This task requires understanding the Hurd philosophy and
 translator programming, as well as good knowledge of TCP/IP.
+This is [[GNU_Savannah_task 5469]].
+
 ## Improved NFS Implementation
@@ -377,18 +379,19 @@ a previous unfinished GSoC project can serve as a starting point.
 Both client and server parts need work, though the client is probably much more
 important for now, and shall be the major focus of this project.
-The task has no special prerequisites besides general programming skills, and
+This task, [[GNU_Savannah_task 5497]], has no special prerequisites besides general programming skills, and
 an interest in file systems and network protocols.
-## Fix libdiskfs Locking Issues
+## Fix `libdiskfs` Locking Issues
 Nowadays the most often encountered cause of Hurd crashes seems to be lockups
-in the ext2fs server. One of these could be traced recently, and turned out to
-be a lock inside libdiskfs that was taken and not released in some cases. There
-is reason to believe that there are more faulty paths causing these lockups.
+in the [[hurd/translator/ext2fs]] server. One of these could be traced
+recently, and turned out to be a lock inside [[hurd/libdiskfs]] that was taken
+and not released in some cases. There is reason to believe that there are more
+faulty paths causing these lockups.
-The task is systematically checking the libdiskfs code for this kind of locking
+The task is systematically checking the [[hurd/libdiskfs]] code for this kind of locking
 issues. To achieve this, some kind of test harness has to be implemented: For
 exmple instrumenting the code to check locking correctness constantly at
 runtime. Or implementing a unit testing framework that explicitely checks
@@ -426,7 +429,8 @@ One possible option is creating a wrapper that implements the cthreads
 interfaces on top of pthreads, to ease the transition -- but it might very well
 turn out that it's easier to just change all the existing code to use pthreads
 directly. This is up to the student. Such a wrapper has been proposed as
-[[GNU_Savannah_task 7895]].
+[[GNU_Savannah_task 7895]] and its implementation would be a useful
+starting-point.
 This project requires relatively little Hurd-specific knowledge. Experience
 with multithreaded programming in general and pthreads in particular is
@@ -435,11 +439,12 @@ required, though.
 ## Sound Support
-The Hurd presently has no sound support. Fixing this requires two steps: One is
-to port kernel drivers so we can get access to actual sound hardware. The
-second is to implement a userspace server (translator), that implements an
-interface on top of the kernel device that can be used by applications --
-probably OSS or maybe ALSA.
+The Hurd presently has no sound support. Fixing this, [[GNU_Savannah_task
+5485]], requires two steps: the first is to port some other kernel's drivers to
+[[GNU_Mach|microkernel/mach/gnumach]] so we can get access to actual sound
+hardware. The second is to implement a userspace server ([[hurd/translator]]),
+that implements an interface on top of the kernel device that can be used by
+applications -- probably OSS or maybe ALSA.
 Completing this task requires porting at least one driver (e.g. from Linux) for
 a popular piece of sound hardware, and the basic userspace server. For the
@@ -452,6 +457,9 @@ time for porting more drivers, or implementing a more sophisticated userspace
 infrastructure. The latter requires good understanding of the Hurd philosophy,
 to come up with an appropriate design.
+Another option would be to evaluate whether a driver that is completely running
+in user-space is feasible.
+
 ## Disk I/O Performance Tuning
@@ -461,13 +469,15 @@ systems like GNU/Linux, is very slow harddisk access.
 The reason for this slowness is lack and/or bad implementation of common
 optimisation techniques, like scheduling reads and writes to minimalize head
 movement; effective block caching; effective reads/writes to partial blocks;
-reading/writing multiple blocks at once; and read-ahead. The ext2 filesystem
-driver might also need some optimisations at a higher logical level.
+reading/writing multiple blocks at once; and read-ahead. The
+[[ext2_filesystem_server|hurd/translator/ext2fs]] might also need some
+optimisations at a higher logical level.
 The goal of this project is to analyze the current situation, and implement/fix
 various optimisations, to achieve significantly better disk performance. It
 requires understanding the data flow through the various layers involved in
-disk acces on the Hurd (filesystem, pager, driver), and general experience with
+disk acces on the Hurd ([[filesystem|hurd/virtual_file_system]],
+[[pager|hurd/libpager]], driver), and general experience with
 optimising complex systems. That said, the killing feature we are definitely
 missing is the read-ahead, and even a very simple implementation would bring
 very big performance speedups.
@@ -475,7 +485,7 @@ very big performance speedups.
 ## VM Tuning
-Hurd/Mach presently make very bad use of the available physical memory in the
+Hurd/[[microkernel/Mach]] presently make very bad use of the available physical memory in the
 system. Some of the problems are inherent to the system design (the kernel
 can't distinguish between important application data and discardable disk
 buffers for example), and can't be fixed without fundamental changes. Other
@@ -490,100 +500,109 @@ implementation to other systems, implementing any worthwhile improvements, and
 general optimisation/tuning. It requires very good understanding of the Mach
 VM, and virtual memory in general.
+This project is related to [[GNU_Savannah_task 5489]].
+
-## mtab
+## `mtab`
 In traditional monolithic system, the kernel keeps track of all mounts; the
-information is available through /proc/mounts (on Linux at least), and in a
-very similar form in /etc/mtab.
+information is available through `/proc/mounts` (on Linux at least), and in a
+very similar form in `/etc/mtab`.
-The Hurd on the other hand has a totally decentralized file system. There is no
-single entity involved in all mounts. Rather, only the parent file system to
-which a mountpoint (translator) is attached is involved. As a result, there is
-no central place keeping track of mounts.
+The Hurd on the other hand has a totally
+[[decentralized_file_system|hurd/virtual_file_system]]. There is no single
+entity involved in all mounts. Rather, only the parent file system to which a
+mountpoint ([[hurd/translator]]) is attached is involved. As a result, there
+is no central place keeping track of mounts.
 As a consequence, there is currently no easy way to obtain a listing of all
-mounted file systems. This also means that commands like "df" can only work on
+mounted file systems. This also means that commands like `df` can only work on
 explicitely specified mountpoints, instead of displaying the usual listing.
 One possible solution to this would be for the translator startup mechanism to
-update the mtab on any mount/unmount, like in traditional systems. However,
-there are same problems with this approach. Most notably: What to do with
-passive translators, i.e. translators that are not presently running, but set
-up to be started automatically whenever the node is accessed? Probably these
-should be counted an among the mounted filesystems; but how to handle the mtab
-updates for a translator that is not started yet? Generally, being centralized
-and event-based, this is a pretty unelegant, non-hurdish solution.
-
-A more promising approach is to have mtab exported by a special translator,
-which gathers the necessary information on demand. This could work by
+update the `mtab` on any `mount`/`unmount`, like in traditional systems.
+However, there are same problems with this approach. Most notably: what to do
+with passive translators, i.e., translators that are not presently running, but
+set up to be started automatically whenever the node is accessed? Probably
+these should be counted an among the mounted filesystems; but how to handle the
+`mtab` updates for a translator that is not started yet? Generally, being
+centralized and event-based, this is a pretty unelegant, non-hurdish solution.
+
+A more promising approach is to have `mtab` exported by a special translator,
+which gathers the necessary information on demand. This could work by
 traversing the tree of translators, asking each one for mount points attached
-to it. (Theoretically, it could also be done by just traversing *all* nodes,
-checking each one for attached translators. That would be very inefficient,
-though. Thus a special interface is probably required, that allows asking a
+to it. (Theoretically, it could also be done by just traversing *all* nodes,
+checking each one for attached translators. That would be very inefficient,
+though. Thus a special interface is probably required, that allows asking a
 translator to list mount points only.)
-There are also some other issues to keep in mind. Traversing arbitrary
+There are also some other issues to keep in mind. Traversing arbitrary
 translators set by other users can be quite dangerous -- and it's probably not
 very interesting anyways what private filesystems some other user has mounted.
-But what about the global /etc/mtab? Should it list only root-owned
-filesystems? Or should it create different listings depending on what user
+But what about the global `/etc/mtab`? Should it list only root-owned
+filesystems? Or should it create different listings depending on what user
 contacts it?...
-That leads to a more generic question: Which translators should be actually
-listed? There are all kinds of translators: Ranging from traditional
-filesystems (disks and other actual stores), but also purely virtual
-filesystems like ftpfs or unionfs, and even things that have very little to do
-with a traditional filesystem, like gzip translator, mbox translator, xml
-translator, or various device file translators... Listing all of these in
-/etc/mtab would be pretty pointless, so some kind of classification mechanism
-is necessary. By default it probably should list only translators that claim to
-be real filesystems, though alternative views with other filtering rules might
-be desirable.
+That leads to a more generic question: which translators should be actually
+listed? There are different kinds of translators: ranging from traditional
+filesystems ([[disks|hurd/libdiskfs]] and other actual
+[[stores|hurd/translator/storeio]]), but also purely virtual filesystems like
+[[hurd/translator/ftpfs]] or [[hurd/translator/unionfs]], and even things that
+have very little to do with a traditional filesystem, like a
+[[gzip_translator|hurd/translator/storeio]],
+[[mbox_translator|hurd/translator/mboxfs]],
+[[xml_translator|hurd/translator/xmlfs]], or various device file translators...
+Listing all of these in `/etc/mtab` would be pretty pointless, so some kind of
+classification mechanism is necessary. By default it probably should list only
+translators that claim to be real filesystems, though alternative views with
+other filtering rules might be desirable.
 After taking decisions on the outstanding design questions, the student will
-implement both the actual mtab translator, and the necessery interface(s) for
-gathering the data. It requires getting a good understanding of the translator
-mechanism and Hurd interfaces in general.
+implement both the actual [[mtab_translator|hurd/translator/mtabfs]], and the
+necessery interface(s) for gathering the data. It requires getting a good
+understanding of the translator mechanism and Hurd interfaces in general.
 ## GNU Mach Code Cleanup
 Although there are some attempts to move to a more modern microkernel
-alltogether, the current Hurd implementation is based on gnumach, which is only
-a slightly modified variant of the original CMU Mach.
+alltogether, the current Hurd implementation is based on
+[[GNU_Mach|microkernel/mach/gnumach]], which is only a slightly modified
+variant of the original CMU [[microkernel/Mach]].
 Unfortunately, Mach was created about two decades ago, and is in turn based on
-even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms
-like processes and signals etc. -- were ripped out (to be implemented in
-userspace servers instead); while other mechanisms were added to allow
-implementing stuff in userspace. (Pager interface, IPC etc.)
+even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms
+like processes and signals, etc. -- were ripped out (to be implemented in
+[[userspace_servers|hurd/translator]] instead); while other mechanisms were
+added to allow implementing stuff in userspace.
+([[Pager_interface|microkernel/mach/external_pager_mechanism]],
+[[microkernel/mach/IPC]], etc.)
 Also, Mach being a research project, many things were tried, adding lots of
 optional features not really needed.
 The result of all this is that the current code base is in a pretty bad shape.
 It's rather hard to make modifications -- to make better use of modern hardware
-for example, or even to fix bugs. The goal of this project is to improve the
+for example, or even to fix bugs. The goal of this project is to improve the
 situation.
-The task starts out easy, with fixing compiler warnings. Later it moves on to
-more tricky things: Removing dead or unneeded code paths; restructuring code
+The task starts out easy, with fixing compiler warnings. Later it moves on to
+more tricky things: removing dead or unneeded code paths; restructuring code
 for readability and maintainability.
 This task requires good knowledge of C, and experience with working on a large
-existing code base. Previous kernel hacking experience is an advantage, but not
-really necessary.
+existing code base. Previous kernel hacking experience is an advantage, but
+not really necessary.
-## xmlfs
+## `xmlfs`
-Hurd translators allow presenting underlying data in a different format. This
-is a very powerful ability: It allows using standard tools on all kinds of
-data, and combining existing components in new ways, once you have the
-necessary translators.
+Hurd [[translators|hurd/translator]] allow presenting underlying data in a
+different format. This is a very powerful ability: it allows using standard
+tools on all kinds of data, and combining existing components in new ways, once
+you have the necessary translators.
-A typical example for such a translator would be xmlfs: A translator that
+A typical example for such a translator would be xmlfs: a translator that
 presents the contents of an underlying XML file in the form of a directory
 tree, so it can be studied and edited with standard filesystem tools, or using
 a graphical file manager, or to easily extract data from an XML file in a
@@ -598,57 +617,59 @@ Ideally, the translation should be reversible, so that another, complementary
 translator applied on the expanded directory tree would yield the original XML
 file again; and also the other way round, applying the complementary translator
 on top of some directory tree and xmlfs on top of that would yield the original
-directory again. However, with the different semantics of directory trees and
-XML files, it might not be possible to create such a universal mapping. Thus it
-is a desirable goal, but not a strict requirement.
+directory again. However, with the different semantics of directory trees and
+XML files, it might not be possible to create such a universal mapping. Thus
+it is a desirable goal, but not a strict requirement.
 The goal of this project is to create a fully usable XML translator, that
-allows both reading and writing any XML file. Implementing the complementary
+allows both reading and writing any XML file. Implementing the complementary
 translator also would be nice if time permits, but is not mandatory part of the
 task.
 The
 [[existing_partial_(read-only)_xmlfs_implementation|hurd/translator/xmlfs]] can
 serve as a starting point.
-This task requires pretty good designing skills. Good knowledge of XML is also
-necessary. Learning translator programming will obviously be necessary to
+This task requires pretty good designing skills. Good knowledge of XML is also
+necessary. Learning translator programming will obviously be necessary to
 complete the task.
-## Allow Using unionfs Early at Boot
+## Allow Using `unionfs` Early at Boot
 In UNIX systems, traditionally most software is installed in a common directory
 hierachy, where files from various packages live beside each other, grouped by
-function: User-invokable executables in /bin, configuration files in /etc,
-architecture specific static files in /lib, variable data in /var and so on. To
-allow clean installation, deinstallation, and upgrade of software packages,
-GNU/Linux distributions usually come with a package manager, which keeps track
-of all files upon installation/removal in some kind of central database.
-
-An alternative approach is the one implemented by GNU Stow: Each package is
-actually installed in a private directory tree. The actual standard directory
+function: user-invokable executables in `/bin`, system-wide configuration files
+in `/etc`, architecture specific static files in `/lib`, variable data in
+`/var`, and so on. To allow clean installation, deinstallation, and upgrade of
+software packages, GNU/Linux distributions usually come with a package manager,
+which keeps track of all files upon installation/removal in some kind of
+central database.
+
+An alternative approach is the one implemented by GNU Stow: each package is
+actually installed in a private directory tree. The actual standard directory
 structure is then created by collecting the individual files from all the
-packages, and presenting them in the common /bin, /lib etc. locations.
+packages, and presenting them in the common `/bin`, `/lib`, etc. locations.
 While the normal Stow package (for traditional UNIX systems) uses symlinks to
 the actual files, updated on installation/deinstallation events, the Hurd
-translator mechanism allows a much more elegant solution: Stowfs (which is
-actually a special mode of unionfs) creates virtual directories on the fly,
-composed of all the files from the individual package directories.
+[[hurd/translator]] mechanism allows a much more elegant solution:
+[[hurd/translator/stowfs]] (which is actually a special mode of
+[[hurd/translator/unionfs]]) creates virtual directories on the fly, composed
+of all the files from the individual package directories.
 The problem with this approach is that unionfs presently can be launched only
 once the system is booted up, meaning the virtual directories are not available
-at boot time. But the boot process itself already needs access to files from
-various packages. So to make this design actually usable, it is necessary to
+at boot time. But the boot process itself already needs access to files from
+various packages. So to make this design actually usable, it is necessary to
 come up with a way to launch unionfs very early at boot time, along with the
 root filesystem.
 Completing this task will require gaining a very good understanding of the Hurd
-boot process and other parts of the design. It requires some design skills also
-to come up with a working mechanism.
+boot process and other parts of the design. It requires some design skills
+also to come up with a working mechanism.
-## Fix tmpfs
+## Fix `tmpfs`
 In some situations it is desirable to have a file system that is not backed by
 actual disk storage, but only by anonymous memory, i.e. lives in the RAM (and
@@ -656,25 +677,26 @@ possibly swap space).
 A simplistic way to implement such a memory filesystem is literally creating a
 ramdisk, i.e. simply allocating a big chunck of RAM (called a memory store in
-Hurd terminology), and create a normal filesystem like ext2 on that. However,
+Hurd terminology), and create a normal filesystem like ext2 on that. However,
 this is not very efficient, and not very convenient either (the filesystem
-needs to be recreated each time the ramdisk is invoked). A nicer solution is
-having a real tmpfs, which creates all filesystem structures directly in RAM,
-allocating memory on demand.
+needs to be recreated each time the ramdisk is invoked). A nicer solution is
+having a real [[hurd/translator/tmpfs]], which creates all filesystem
+structures directly in RAM, allocating memory on demand.
-The Hurd has had such a tmpfs for a long time. However, the existing
+The Hurd has had such a tmpfs for a long time. However, the existing
 implementation doesn't work anymore -- it got broken by changes in other parts
 of the Hurd design.
-There are several issues. The most serious known problem seems to be
-that for technical reasons it receives RPCs from two different sources on one
-port, and gets mixed up with them. Fixing this is non-trivial, and requires a
-good understanding of the involved mechanisms.
+There are several issues. The most serious known problem seems to be that for
+technical reasons it receives [[microkernel/mach/RPC]]s from two different
+sources on one [[microkernel/mach/port]], and gets mixed up with them. Fixing
+this is non-trivial, and requires a good understanding of the involved
+mechanisms.
 The goal of this project to get a fully working, full featured tmpfs
-implementation. It requires digging into some parts of the Hurd, incuding the
-pager interface and translator programming. This task probably doesn't require
-any design work, only good debugging skills.
+implementation. It requires digging into some parts of the Hurd, incuding the
+[[pager_interface|hurd/libpager]] and [[hurd/translator]] programming. This
+task probably doesn't require any design work, only good debugging skills.
 ## Lexical `..` Resolution
@@ -704,41 +726,41 @@ See also [[GNU_Savannah_bug 17133]].
 ## Secure `chroot` implementation
 As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a
-chroot() system call. However, the current implementation is not really good,
-as it allows easily escaping the chroot, for example by use of passive
-translators.
+`chroot()` system call. However, the current implementation is not really
+good, as it allows easily escaping the `chroot`, for example by use of
+[[passive_translators|hurd/translator]].
 Many solutions have been suggested for this problem -- ranging from simple
-workaround changing the behaviour of passive translators in a chroot; changing
-the context in which passive translators are exectuted; changing the
+workaround changing the behaviour of passive translators in a `chroot`;
+changing the context in which passive translators are exectuted; changing the
 interpretation of filenames in a chroot; to reworking the whole passive
-translator mechanism. Some involving a completely different approch to chroot
-implementation, using a proxy instead of a special system call in the
+translator mechanism. Some involving a completely different approch to
+`chroot` implementation, using a proxy instead of a special system call in the
 filesystem servers.
 The task is to pick and implement one approach for fixing chroot.
-This task is pretty heavy: It requires a very good understanding of file name
+This task is pretty heavy: it requires a very good understanding of file name
 lookup and the translator mechanism, as well as of security concerns in general
 -- the student must prove that he really understands security implications of
 the UNIX namespace approach, and how they are affected by the introduction of
-new mechanisms. (Translators.) More important than the acualy code is the
-documentation of what he did: He must be able to defend why he chose a certain
+new mechanisms. (Translators.) More important than the acualy code is the
+documentation of what he did: he must be able to defend why he chose a certain
 approach, and explain why he believes this approach really secure.
 ## Hurdish Package Manager for the GNU System
 Most GNU/Linux systems use pretty sophisticated package managers, to ease the
-management of installed software. These keep track of all installed files, and
-various kinds of other necessary information, in special databases. On package
+management of installed software. These keep track of all installed files, and
+various kinds of other necessary information, in special databases. On package
 installation, deinstallation, and upgrade, scripts are used that make all kinds
 of modifications to other parts of the system, making sure the packages get
 properly integrated.
-This approach creates various problems. For one, *all* management has to be
+This approach creates various problems. For one, *all* management has to be
 done with the distribution package management tools, or otherwise they would
-loose track of the system state. This is reinforced by the fact that the state
+loose track of the system state. This is reinforced by the fact that the state
 information is stored in special databases, that only the special package
 management tools can work with.
@@ -747,31 +769,32 @@ Also, as changes to various parts of the system are made on certain events
 transitions becomes very complex and bug-prone.
 For the official (Hurd-based) GNU system, a different approach is intended:
-Making use of Hurd translators -- more specifically their ability to present
-existing data in a different form -- the whole system state will be created on
-the fly, directly from the information provided by the individual packages. The
-visible system state is always a reflection of the sum of packages installed at
-a certain moment; it doesn't matter how this state came about. There are no
-global databases of any kind. (Some things might require caching for better
-performance, but this must happen transparently.)
-
-The core of this approach is formed by stowfs, which creates a traditional unix
-directory structure from all the files in the individual package directories.
-But this only handles the lowest level of package management. Additional
-mechanisms are necessary to handle stuff like dependencies on other packages.
+making use of Hurd [[translators|hurd/translator]] -- more specifically their
+ability to present existing data in a different form -- the whole system state
+will be created on the fly, directly from the information provided by the
+individual packages. The visible system state is always a reflection of the
+sum of packages installed at a certain moment; it doesn't matter how this state
+came about. There are no global databases of any kind. (Some things might
+require caching for better performance, but this must happen transparently.)
+
+The core of this approach is formed by [[hurd/translator/stowfs]], which
+creates a traditional unix directory structure from all the files in the
+individual package directories. But this only handles the lowest level of
+package management. Additional mechanisms are necessary to handle stuff like
+dependencies on other packages.
 The goal of this task is to create these mechanisms.
 ## Port the Debian Installer to the Hurd
-The primary means of distributing the Hurd is through Debian GNU/Hurd. However,
-the installation CDs presently use an ancient, non-native installer. The
-situation could be much improved by making sure that the newer Debian Installer
-works on the Hurd.
+The primary means of distributing the Hurd is through Debian GNU/Hurd.
+However, the installation CDs presently use an ancient, non-native installer.
+The situation could be much improved by making sure that the newer *Debian
+Installer* works on the Hurd.
 Some preliminary work has been done, see
-http://wiki.debian.org/DebianInstaller/Hurd .
+<http://wiki.debian.org/DebianInstaller/Hurd>.
-The goal is to have the Debian Installer fully working on the Hurd. It requires
-relatively little Hurd-specific knowledge.
+The goal is to have the Debian Installer fully working on the Hurd. It
+requires relatively little Hurd-specific knowledge.
--
cgit v1.2.3