Reordered tasks -- by quality of description, importance, and suitability

author: antrik <antrik@users.sf.net> 2008-03-18 02:17:26 +0100
committer: antrik <antrik@users.sf.net> 2008-03-18 02:32:17 +0100
commit: f0841136ec67bb0b1063bac155e4d2efe318d430 (patch)
tree: 1025bef89c90435c6be24ff0b49dcf6134e52203 /community/gsoc/project_ideas.mdwn
parent: 1dcbe9e9f63af0f1f0682ae9a6808f72f2e5bc34 (diff)
1 files changed, 354 insertions, 354 deletions
diff --git a/community/gsoc/project_ideas.mdwn b/community/gsoc/project_ideas.mdwn
index 67ae9918..d845f9ba 100644
--- a/community/gsoc/project_ideas.mdwn
+++ b/community/gsoc/project_ideas.mdwn
@@ -17,157 +17,109 @@ If you have questions regarding the projects, or if there is more than one that
 you are interested in and you are unsure which to choose, don't hesitate to
 contact us -- on [[IRC]] or using [[mailing_lists]].
 
-* sound support
-
-The Hurd presently has no sound support. Fixing this requires two steps: One is
-to port kernel drivers so we can get access to actual sound hardware. The
-second is to implement a userspace server (translator), that implements an
-interface on top of the kernel device that can be used by applications --
-probably OSS or maybe ALSA.
-
-Completing this task requires porting at least one driver (e.g. from Linux) for
-a popular piece of sound hardware, and the basic userspace server. For the
-driver part, previous experience with programming kernel drivers is strongly
-advisable. The userspace part requires some knowledge about programming Hurd
-translators, but shouldn't be too hard.
-
-Once the basic support is working, it's up to the student to use the remaining
-time for porting more drivers, or implementing a more sophisticated userspace
-infrastructure. The latter requires good understanding of the Hurd philosophy,
-to come up with an appropriate design.
-
-* hurdish TCP/IP stack
-
-The Hurd presently uses a TCP/IP stack based on code from an old Linux version.
-This works, but lacks some rather important features (like PPP/PPPoE), and the
-design is not hurdish at all.
-
-A true hurdish network stack will use a set of stack of translator processes,
-each implementing a different protocol layer. This way not only the
-implementation gets more modular, but also the network stack can be used way
-more flexibly. Rather than just having the standard socket interface, plus some
-lower-level hooks for special needs, there are explicit (perhaps
-filesystem-based) interfaces at all the individual levels; special application
-can just directly access the desired layer. All kinds of packet filtering,
-routing, tunneling etc. can be easily achieved by stacking compononts in the
-desired constellation.
-
-While the general architecture is pretty much given by the various network
-layers, it's up to the student to design and implement the various interfaces
-at each layer. This task requires understanding the Hurd philosophy and
-translator programming, as well as good knowledge of TCP/IP. 
-
-* new driver glue code
-
-Although a driver framework in userspace would be desirable, presently the Hurd
-uses kernel drivers in the microkernel, gnumach. (And changing this would be
-far beyond a GSoC project...)
-
-The problem is that the drivers in gnumach are presently old Linux drivers
-(mostly from 2.0.x) accessed through a glue code layer. This is not an ideal
-solution, but works quite OK, except that the drivers are very old. The goal of
-this project is to redo the glue code, so we can use drivers from current Linux
-versions, or from one of the free BSD variants.
-
-This is a doable, but pretty involved project. Experience with driver
-programming under Linux (or BSD) is a must. (No Hurd-specific knowledge is
-required, though.)
-
-* server overriding mechanism
-
-The main idea of the Hurd is that every user can influence almost all system
-functionality, by running private Hurd servers that replace or proxy the global
-default implementations.
+* Lisp, Python, ... bindings
 
-However, running such a cumstomized subenvironment presently is not easy,
-because there is no standard mechanism to easily replace an individual standard
-server, keeping everything else. (Presently there is only the subhurd method,
-which creates a completely new system instance with a completely independant
-set of servers.)
+The main idea of the Hurd design is giving users the ability to easily
+modify/extend the system's functionality. This is done by creating filesystem
+translators, or sometimes other kinds of Hurd servers.
 
-The goal of this project is to provide a simple method for overriding
-individual standard servers, using environment variables, or a special
-subshell, or something like that.
+However, in practice this is not as easy as it should be, because creating
+translators and other servers is quite involved -- the interfaces for doing
+that are not exactly simple, and available only for C programs. Being able to
+easily create simple translators in RAD languages is highly desirable, to
+really be able to reap the advantages of the Hurd architecture.
 
-Various approaches for such a mechanism has been discussed before.
-Probably the easiest (1) would be to modify the Hurd-specific parts of glibc,
-which are contacting various standard servers to implement certain system
-calls, so that instead of always looking for the servers in default locations,
-they first check for overrides in environment variables, and use these instead
-if present.
+Originally Lisp was meant to be the second system language besides C in the GNU
+system; but that doesn't mean we are bound to Lisp. Bindings for any popular
+high-level language that helps in quickly creating simple programs are highly
+welcome.
 
-A somewhat more generic solution (2) could use some mechanism for arbitrary
-client-side namespace overrides. The client-side part of the filename lookup
-mechanism would have to check an override table on each lookup, and apply the
-desired replacement whenever a match is found.
+Several approaches are possible when creating such bindings. One way is simply
+to provide wrappers to all the available C libraries (libtrivfs, libnetfs
+etc.). While this is easy (it requires relatively little consideration), it may
+not be the optimal solution. It is preferable to hook in at a lower level, thus
+being able to create interfaces that are specially adapted to make good use of
+the features available in the respective language.
 
-Another approach would be server-side overrides. Again there are various
-variants. The actual servers themself could provide a mechanism to redirect to
-other servers on request. (3) Or we could use some more generic server-side
-namespace overrides: Either all filesystem servers could provide a mechanism to
-modify the namespace they export to certain clients (4), or proxies could be
-used that mirror the default namespace but override certain locations. (5)
+These more specialised bindings could hook in at some of the lower level
+library interfaces (libports, glibc, etc.); use the mig-provided RPC stubs
+directly; or even create native stubs directly from the interface definitions.
 
-Variants (4) and (5) are the most powerful. They are intimately related to
-chroots: (4) is like the current chroot implementation works in the Hurd, and
-(5) has been proposed as an alternative. The generic overriding mechanism could
-be implemented on top of chroot, or chroot could be implemented on top of the
-generic overriding mechanism. But this is out of scope for this project...
+The task is to create easy to use Hurd bindings for a language of the student's
+choice, and some example servers to prove that it works well in practice. This
+project will require gaining a very good understanding of the various Hurd
+interfaces. Skills in designing nice programming interfaces are a must.
 
-In practice, probably a mix of the different approaches would prove most useful
-for various servers and use cases. It is strongly recommended that the student
-starts with (1) as the simplest approach, perhaps augmenting it with (3) for
-certain servers that don't work with (1) because of indirect invocation.
+* virtualization using Hurd mechanisms
 
-This tasks requires some understanding of the Hurd internals, especially a good
-understanding of the file name lookup mechanism. It's probably not too heavy on
-the coding side.
+The main idea behind the Hurd design is to allow users to replace almost any
+system functionality. Any user can easily create a subenvironment using some
+custom servers instead of the default system servers. This can be seen as an
+[advanced lightweight
+virtualization](http://tri-ceps.blogspot.com/2007/10/advanced-lightweight-virtualization.html)
+mechanism, which allows implementing all kinds of standard and nonstandard
+virtualization scenarios.
 
-* secure chroot implementation
+However, though the basic mechanisms are there, currently it's not easy to make
+use of these possibilities, because we lack tools to automatically launch the
+desired constellations.
 
-As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a
-chroot() system call. However, the current implementation is not really good,
-as it allows easily escaping the chroot, for example by use of passive
-translators.
+The goal is to create a set of powerful tools for managing at least one
+desirable virtualization scenario. One possible starting point could be the
+subhurd/neighbour Hurd mechanism, which allows a second almost totally
+independent instance of the Hurd in parallel to the main one. The current
+implementation has serious limitations though. A subhurd can only be started by
+root. There are no communication channels between the subhurd and the main one.
+There is no mechanism for safe sharing of hardware devices. Fixing these issues
+could turn subhurds into a very powerful solution for lightweight
+virtualization using so-called logical partitions. (Similar to Linux-vserver,
+OpenVZ etc.)
 
-Many solutions have been suggested for this problem -- ranging from simple
-workaround changing the behaviour of passive translators in a chroot; changing
-the context in which passive translators are exectuted; changing the
-interpretation of filenames in a chroot; to reworking the whole passive
-translator mechanism. Some involving a completely different approch to chroot
-implementation, using a proxy instead of a special system call in the
-filesystem servers.
+While subhurds allow creating a complete second system instance, with its own set
+of Hurd servers and UNIX daemons and all, there are also situations where it is
+desirable to have a smaller subenvironment, living within the main system and
+using most of its facilities -- similar to a chroot environment. A simple way
+to create such a subenvironment with a single command would be very helpful.
 
-The task is to pick and implement one approach for fixing chroot.
+It might be possible to implement (perhaps as a prototype) a wrapper using
+existing tools (chroot and unionfs); or it might require more specific tools,
+like some kind of unionfs-like filesystem proxy that mirrors other parts of the
+filesystem, but allows overriding individual locations, in conjunction with
+either chroot or some similar mechanism to create a subenvironment with a
+different root filesystem.
 
-This task is pretty heavy: It requires a very good understanding of file name
-lookup and the translator mechanism, as well as of security concerns in general
--- the student must prove that he really understands security implications of
-the UNIX namespace approach, and how they are affected by the introduction of
-new mechanisms. (Translators.) More important than the acualy code is the
-documentation of what he did: He must be able to defend why he chose a certain
-approach, and explain why he believes this approach really secure.
+It's also desirable to have a mechanism allowing a user to set up such a custom
+environment in a way that it will automatically get launched on login --
+practically allowing the user to run a customized operating system in his own
+account.
 
-* lexical dot-dot resolution
+Yet another interesting scenario would be a subenvironment -- using some kind
+of special filesystem proxy again -- in which the user serves as root, being
+able to create local sub-users and/or sub-groups.
 
-For historical reasons, UNIX filesystems have a real (hard) .. link from each
-directory pointing to its parent. However, this is problematic, because the
-meaning of "parent" really depends on context. If you have a symlink for
-example, you can reach a certain node in the filesystem by a different path. If
-you go to .. from there, UNIX will traditionally take you to the hard-coded
-parent node -- but this is usually not what you want. Usually you want to go
-back to the logical parent from which you came. That is called "lexical"
-resolution.
+This would allow the user to run "dangerous" applications (web browser, chat
+client etc.) in a confined fashion, allowing them access to only a subset of the
+user's files and other resources. (This could be done either using a lot of
+groups for individual resources, and lots of users for individual applications;
+adding a user to a group would give the corresponding application access to the
+corresponding resource -- an advanced ACL mechanism. Or leave out the groups,
+assigning the resources to users instead, and use the Hurd's ability for a
+process to have multiple user IDs, to equip individual applications with sets
+of user IDs giving them access to the necessary resources -- basically a
+capability mechanism.)
 
-Some application already use lexical resolution internally for that reason. It
-is generally agreed that many problems could be avoided if the standard
-filesystem lookup calls used lexical resolution as well. The compatibility
-problems probably would be negligable.
+The student will have to pick (at least) one of the described scenarios -- or
+come up with some other one in a similar spirit -- and implement all the tools
+(scripts, translators) necessary to make it available to users in an
+easy-to-use fashion. While the Hurd by default already offers the necessary
+mechanisms for that, these are not perfect and could be further refined for
+even better virtualization capabilities. Should need or desire for specific
+improvements in that regard come up in the course of this project, implementing
+these improvements can be considered part of the task.
 
-The goal of this project is to modify the filename lookup mechanism in the Hurd
-to use lexical resolution, and to check that the system is still fully
-functional afterwards. This task requires understanding the filename resolution
-mechanism. It's probably a relatively easy task.
+Completing this project will require gaining a very good understanding of the
+Hurd architecture and spirit. Previous experience with other virtualization
+solutions would be very helpful.
 
 * namspace based translator selection
 
@@ -229,50 +181,112 @@ programming; but the implementation should not be too hard. Perhaps the hardest
 part is finding a convenient, flexible, elegant, hurdish method for mapping the
 special extensions to actual translators...
 
-* gnumach code cleanup
+* fix file locking
 
-Although there are some attempts to move to a more modern microkernel
-alltogether, the current Hurd implementation is based on gnumach, which is only
-a slightly modified variant of the original CMU Mach.
+Over the years, UNIX has acquired a host of different file locking mechanisms.
+Some of them work on the Hurd, while others are buggy or only partially
+implemented. This breaks many applications.
 
-Unfortunately, Mach was created about two decades ago, and is in turn based on
-even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms
-like processes and signals etc. -- were ripped out (to be implemented in
-userspace servers instead); while other mechanisms were added to allow
-implementing stuff in userspace. (Pager interface, IPC etc.)
+The goal is to make all file locking mechanisms work properly. This requires
+finding all existing shortcomings (through systematic testing and/or checking
+for known issues in the bug tracker and mailing list archives), and fixing
+them.
 
-Also, Mach being a research project, many things were tried, adding lots of
-optional features not really needed.
+This task will require digging into parts of the code to understand how file
+locking works on the Hurd. Only general programming skills are required.
 
-The result of all this is that the current code base is in a pretty bad shape.
-It's rather hard to make modifications -- to make better use of modern hardware
-for example, or even to fix bugs. The goal of this project is to improve the
-situation.
+* procfs
 
-The task starts out easy, with fixing compiler warnings. Later it moves on to
-more tricky things: Removing dead or unneeded code paths; restructuring code
-for readability and maintainability.
+Although there is no standard (POSIX or other) for the layout of the /proc
+pseudo-filesystem, it has turned out to be a very useful facility in GNU/Linux and other
+systems, and many tools concerned with process management use it. (ps, top,
+htop, gtop, killall, pkill, ...)
 
-This task requires good knowledge of C, and experience with working on a large
-existing code base. Previous kernel hacking experience is an advantage, but not
-really necessary.
+Instead of porting all these tools to use libps (Hurd's official method for
+accessing process information), they could be made to run out of the box, by
+implementing a Linux-compatible /proc filesystem for the Hurd.
 
-* fix libdiskfs locking issues
+The goal is to implement all /proc functionality needed for the various process
+management tools to work. (On Linux, the /proc filesystem is used also for
+debugging purposes; but this is highly system-specific anyways, so there is
+probably no point in trying to duplicate this functionality as well...)
 
-Nowadays the most often encountered cause of Hurd crashes seems to be lockups
-in the ext2fs server. One of these could be traced recently, and turned out to
-be a lock inside libdiskfs that was taken and not released in some cases. There
-is reason to believe that there are more faulty paths causing these lockups.
+The existing partially working procfs implementation from the hurdextras
+repository can serve as a starting point, but needs to be largely
+rewritten. (It should use libnetfs rather than libtrivfs; the data format needs
+to change to be more Linux-compatible; and it needs adaptation to newer system
+interfaces.)
 
-The task is systematically checking the libdiskfs code for this kind of locking
-issues. To achieve this, some kind of test harness has to be implemented: For
-exmple instrumenting the code to check locking correctness constantly at
-runtime. Or implementing a unit testing framework that explicitely checks
-locking in various code paths. (The latter could serve as a template for
-implementing unit checks in other parts of the Hurd codebase...)
+This project requires learning translator programming, and understanding some
+of the internals of process management in the Hurd. It should not be too hard
+coding-wise; and the task is very nicely defined by the existing Linux /proc
+interface -- no design considerations necessary.
 
-This task requires experience with debugging locking issues in multithreaded
-applications.
+* new driver glue code
+
+Although a driver framework in userspace would be desirable, presently the Hurd
+uses kernel drivers in the microkernel, gnumach. (And changing this would be
+far beyond a GSoC project...)
+
+The problem is that the drivers in gnumach are presently old Linux drivers
+(mostly from 2.0.x) accessed through a glue code layer. This is not an ideal
+solution, but works quite OK, except that the drivers are very old. The goal of
+this project is to redo the glue code, so we can use drivers from current Linux
+versions, or from one of the free BSD variants.
+
+This is a doable, but pretty involved project. Experience with driver
+programming under Linux (or BSD) is a must. (No Hurd-specific knowledge is
+required, though.)
+
+* server overriding mechanism
+
+The main idea of the Hurd is that every user can influence almost all system
+functionality, by running private Hurd servers that replace or proxy the global
+default implementations.
+
+However, running such a customized subenvironment presently is not easy,
+because there is no standard mechanism to easily replace an individual standard
+server, keeping everything else. (Presently there is only the subhurd method,
+which creates a completely new system instance with a completely independent
+set of servers.)
+
+The goal of this project is to provide a simple method for overriding
+individual standard servers, using environment variables, or a special
+subshell, or something like that.
+
+Various approaches for such a mechanism have been discussed before.
+Probably the easiest (1) would be to modify the Hurd-specific parts of glibc,
+which are contacting various standard servers to implement certain system
+calls, so that instead of always looking for the servers in default locations,
+they first check for overrides in environment variables, and use these instead
+if present.
+
+A somewhat more generic solution (2) could use some mechanism for arbitrary
+client-side namespace overrides. The client-side part of the filename lookup
+mechanism would have to check an override table on each lookup, and apply the
+desired replacement whenever a match is found.
+
+Another approach would be server-side overrides. Again there are various
+variants. The actual servers themselves could provide a mechanism to redirect to
+other servers on request. (3) Or we could use some more generic server-side
+namespace overrides: Either all filesystem servers could provide a mechanism to
+modify the namespace they export to certain clients (4), or proxies could be
+used that mirror the default namespace but override certain locations. (5)
+
+Variants (4) and (5) are the most powerful. They are intimately related to
+chroots: (4) is how the current chroot implementation works in the Hurd, and
+(5) has been proposed as an alternative. The generic overriding mechanism could
+be implemented on top of chroot, or chroot could be implemented on top of the
+generic overriding mechanism. But this is out of scope for this project...
+
+In practice, probably a mix of the different approaches would prove most useful
+for various servers and use cases. It is strongly recommended that the student
+starts with (1) as the simplest approach, perhaps augmenting it with (3) for
+certain servers that don't work with (1) because of indirect invocation.
+
+This task requires some understanding of the Hurd internals, especially a good
+understanding of the file name lookup mechanism. It's probably not too heavy on
+the coding side.
 
 * dtrace support
 
@@ -298,39 +312,26 @@ in their Mach-based kernel might be helpful here...)
 This project requires ability to evaluate possible solutions, and experience
 with integrating existing components as well as low-level programming.
 
-* disk I/O performance tuning
-
-The most obvious reason for the Hurd feeling slow compared to mainstream
-systems like GNU/Linux, is very slow harddisk access.
-
-The reason for this slowness is lack and/or bad implementation of common
-optimisation techniques, like scheduling reads and writes to minimalize head
-movement; effective block caching; effective reads/writes to partial blocks;
-reading/writing multiple blocks at once. The ext2 filesystem driver might also
-need some optimisation at a higher logical level.
-
-The goal of this project is to analyze the current situation, and implement/fix
-various optimisations, to achieve significantly better disk performance. It
-requires understanding the data flow through the various layers involved in
-disk acces on the Hurd (filesystem, pager, driver), and general experience with
-optimising complex systems.
+* hurdish TCP/IP stack
 
-* VM tuning
+The Hurd presently uses a TCP/IP stack based on code from an old Linux version.
+This works, but lacks some rather important features (like PPP/PPPoE), and the
+design is not hurdish at all.
 
-Hurd/Mach presently make very bad use of the available physical memory in the
-system. Some of the problems are inherent to the system design (the kernel
-can't distinguish between important application data and discardable disk
-buffers for example), and can't be fixed without fundamental changes. Other
-problems however are an ordinary lack of optimisation, like extremely crude
-heuristics when to start paging. Many parameters are based on assumptions from
-a time when typical machines had like 16 MiB of RAM, or simply have been set to
-arbitrary values and never tuned for actual use.
+A true hurdish network stack will use a stack of translator processes,
+each implementing a different protocol layer. This way not only the
+implementation gets more modular, but also the network stack can be used way
+more flexibly. Rather than just having the standard socket interface, plus some
+lower-level hooks for special needs, there are explicit (perhaps
+filesystem-based) interfaces at all the individual levels; special applications
+can just directly access the desired layer. All kinds of packet filtering,
+routing, tunneling etc. can be easily achieved by stacking components in the
+desired constellation.
 
-The goal of this project is to bring the virtual memory management in Hurd/Mach
-closer to that of modern mainstream kernels (Linux, FreeBSD), by comparing the
-implementation to other systems, implementing any worthwhile improvements, and
-general optimisation/tuning. It requires very good understanding of the Mach
-VM, and virtual memory in general.
+While the general architecture is pretty much given by the various network
+layers, it's up to the student to design and implement the various interfaces
+at each layer. This task requires understanding the Hurd philosophy and
+translator programming, as well as good knowledge of TCP/IP. 
 
 * improved NFS implementation
 
@@ -349,117 +350,75 @@ important for now, and shall be the major focus of this project.
 The task has no special prerequisites besides general programming skills, and
 an interest in file systems and network protocols.
 
-* fix file locking
-
-Over the years, UNIX has aquired a host of different file locking mechanisms.
-Some of them work on the Hurd, while others are buggy or only partially
-implemented. This breaks many applications.
-
-The goal is to make all file locking mechanisms work properly. This requires
-finding all existing shortcomings (through systematic testing and/or checking
-for known issues in the bug tracker and mailing list archives), and fixing
-them.
-
-This task will require digging into parts of the code to understand how file
-locking works on the Hurd. Only general programming skills are required.
-
-* virtualization using Hurd mechanisms
-
-The main idea behind the Hurd design is to allow users to replace almost any
-system functionality. Any user can easily create a subenvironment using some
-custom servers instead of the default system servers. This can be seen as an
-[advanced lightweight
-virtualization](http://tri-ceps.blogspot.com/2007/10/advanced-lightweight-virtualization.html)
-mechanism, which allows implementing all kinds of standard and nonstandard
-virtualization scenarios.
-
-However, though the basic mechanisms are there, currently it's not easy to make
-use of these possibilities, because we lack tools to automatically launch the
-desired constellations.
+* fix libdiskfs locking issues
 
-The goal is to create a set of powerful tools for managing at least one
-desirable virtualization scenario. One possible starting point could be the
-subhurd/neighbour Hurd mechanism, which allows a second almost totally
-independant instance of the Hurd in parallel to the main one. The current
-implementation has serious limitations though. A subhurd can only be started by
-root. There are no communication channels between the subhurd and the main one.
-There is no mechanism for safe sharing of hardware devices. Fixing this issues
-could turn subhurds into a very powerful solution for lightweight
-virtualization using so-called logical partitions. (Similar to Linux-vserver,
-OpenVZ etc.)
+Nowadays the most often encountered cause of Hurd crashes seems to be lockups
+in the ext2fs server. One of these could be traced recently, and turned out to
+be a lock inside libdiskfs that was taken and not released in some cases. There
+is reason to believe that there are more faulty paths causing these lockups.
 
-While subhurd allow creating a complete second system instance, with an own set
-of Hurd servers and UNIX daemons and all, there are also situations where it is
-desirable to have a smaller subenvironment, living withing the main system and
-using most of its facilities -- similar to a chroot environment. A simple way
-to create such a subenvironment with a single command would be very helpful.
+The task is to systematically check the libdiskfs code for this kind of locking
+issue. To achieve this, some kind of test harness has to be implemented: For
+example instrumenting the code to check locking correctness constantly at
+runtime. Or implementing a unit testing framework that explicitly checks
+locking in various code paths. (The latter could serve as a template for
+implementing unit checks in other parts of the Hurd codebase...)
 
-It might be possible to implement (perhaps as a prototype) a wrapper using
-existing tools (chroot and unionfs); or it might require more specific tools,
-like some kind of unionfs-like filesytem proxy that mirrors other parts of the
-filesystem, but allows overriding individual locations, in conjuction with
-either chroot or some similar mechanism to create a subenvironment with a
-different root filesystem.
+This task requires experience with debugging locking issues in multithreaded
+applications.
 
-It's also desirable to have a mechanism allowing a user to set up such a custom
-environment in a way that it will automatically get launched on login --
-practically allowing the user to run a customized operating system in his own
-account.
+* sound support
 
-Yet another interesting scenario would be a subenvironment -- using some kind
-of special filesystem proxy again -- in which the user serves as root, being
-able to create local sub-users and/or sub-groups.
+The Hurd presently has no sound support. Fixing this requires two steps: One is
+to port kernel drivers so we can get access to actual sound hardware. The
+second is to implement a userspace server (translator), that implements an
+interface on top of the kernel device that can be used by applications --
+probably OSS or maybe ALSA.
 
-This would allow the user to run "dangerous" applications (webbrowser, chat
-client etc.) in a confined fashin, allowing it access to only a subset of the
-user's files and other resources. (This could be done either using a lot of
-groups for individual resources, and lots of users for individual applications;
-adding a user to a group would give the corresponding application access to the
-corresponding resource -- an advanced ACL mechanism. Or leave out the groups,
-assigning the resources to users instead, and use the Hurd's ability for a
-process to have multiple user ID's, to equip individual applications with set's
-of user ID's giving them access to the necessary resources -- basically a
-capability mechanism.)
+Completing this task requires porting at least one driver (e.g. from Linux) for
+a popular piece of sound hardware, and the basic userspace server. For the
+driver part, previous experience with programming kernel drivers is strongly
+advisable. The userspace part requires some knowledge about programming Hurd
+translators, but shouldn't be too hard.
 
-The student will have to pick (at least) one of the described scenarios -- or
-come up with some other one in a similar spirit -- and implement all the tools
-(scripts, translators) necessary to make it available to users in an
-easy-to-use fashion. While the Hurd by default already offers the necessary
-mechanisms for that, these are not perfect and could be further refined for
-even better virtualization capabilities. Should need or desire for specific
-improvements in that regard come up in the course of this project, implementing
-these improvements can be considered part of the task.
+Once the basic support is working, it's up to the student to use the remaining
+time for porting more drivers, or implementing a more sophisticated userspace
+infrastructure. The latter requires good understanding of the Hurd philosophy,
+to come up with an appropriate design.
 
-Completing this project will require gaining a very good understanding of the
-Hurd architecture and spirit. Previous experience with other virtualization
-solutions would be very helpful.
+* disk I/O performance tuning
 
-* procfs
+The most obvious reason for the Hurd feeling slow compared to mainstream
+systems like GNU/Linux is very slow hard disk access.
 
-Although there is no standard (POSIX or other) for the layout of the /proc
-pseudo-filesystem, it turned out to be a very useful facility in GNU/Linux and other
-systems, and many tools concerned with process management use it. (ps, top,
-htop, gtop, killall, pkill, ...)
+The reason for this slowness is the lack and/or poor implementation of common
+optimisation techniques, like scheduling reads and writes to minimise head
+movement; effective block caching; effective reads/writes to partial blocks;
+reading/writing multiple blocks at once. The ext2 filesystem driver might also
+need some optimisation at a higher logical level.
 
-Instead of porting all these tools to use libps (Hurd's official method for
-accessing process information), they could be made to run out of the box, by
-implementing a Linux-compatible /proc filesystem for the Hurd.
+The goal of this project is to analyze the current situation, and implement/fix
+various optimisations, to achieve significantly better disk performance. It
+requires understanding the data flow through the various layers involved in
+disk access on the Hurd (filesystem, pager, driver), and general experience with
+optimising complex systems.
 
-The goal is to implement all /proc functionality needed for the various process
-management tools to work. (On Linux, the /proc filesystem is used also for
-debugging purposes; but this is highly system-specific anyways, so there is
-probably no point in trying to duplicate this functionality as well...)
+* VM tuning
 
-The existing partially working procfs implementation from the hurdextras
-repository can serve as a starting point, but needs to be largely
-rewritten. (It should use libnetfs rather than libtrivfs; the data format needs
-to change to be more Linux-compatible; and it needs adaptation to newer system
-interfaces.)
+Hurd/Mach presently make very bad use of the available physical memory in the
+system. Some of the problems are inherent to the system design (the kernel
+can't distinguish between important application data and discardable disk
+buffers for example), and can't be fixed without fundamental changes. Other
+problems however are an ordinary lack of optimisation, like extremely crude
+heuristics for when to start paging. Many parameters are based on assumptions
+from a time when typical machines had about 16 MiB of RAM, or simply have been
+set to arbitrary values and never tuned for actual use.
 
-This project requires learning translator programming, and understanding some
-of the internals of process management in the Hurd. It should not be too hard
-coding-wise; and the task is very nicely defined by the existing Linux /proc
-interface -- no design considerations necessary.
+The goal of this project is to bring the virtual memory management in Hurd/Mach
+closer to that of modern mainstream kernels (Linux, FreeBSD), by comparing the
+implementation to other systems, implementing any worthwhile improvements, and
+general optimisation/tuning. It requires very good understanding of the Mach
+VM, and virtual memory in general.
 
 * mtab
 
@@ -516,6 +475,34 @@ implement both the actual mtab translator, and the necessary interface(s) for
 gathering the data. It requires getting a good understanding of the translator
 mechanism and Hurd interfaces in general.
 
+* gnumach code cleanup
+
+Although there are some attempts to move to a more modern microkernel
+altogether, the current Hurd implementation is based on gnumach, which is only
+a slightly modified variant of the original CMU Mach.
+
+Unfortunately, Mach was created about two decades ago, and is in turn based on
+even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms
+like processes and signals etc. -- were ripped out (to be implemented in
+userspace servers instead); while other mechanisms were added to allow
+implementing stuff in userspace. (Pager interface, IPC etc.)
+
+Also, Mach being a research project, many things were tried, adding lots of
+optional features not really needed.
+
+The result of all this is that the current code base is in a pretty bad shape.
+It's rather hard to make modifications -- to make better use of modern hardware
+for example, or even to fix bugs. The goal of this project is to improve the
+situation.
+
+The task starts out easy, with fixing compiler warnings. Later it moves on to
+more tricky things: Removing dead or unneeded code paths; restructuring code
+for readability and maintainability.
+
+This task requires good knowledge of C, and experience with working on a large
+existing code base. Previous kernel hacking experience is an advantage, but not
+really necessary.
+
 * xmlfs
 
 Hurd translators allow presenting underlying data in a different format. This
@@ -554,6 +541,38 @@ This task requires pretty good designing skills. Good knowledge of XML is also
 necessary. Learning translator programming will obviously be necessary to
 complete the task.
 
+* allow using unionfs early at boot
+
+In UNIX systems, traditionally most software is installed in a common directory
+hierarchy, where files from various packages live beside each other, grouped by
+function: User-invokable executables in /bin, configuration files in /etc,
+architecture specific static files in /lib, variable data in /var and so on. To
+allow clean installation, deinstallation, and upgrade of software packages,
+GNU/Linux distributions usually come with a package manager, which keeps track
+of all files upon installation/removal in some kind of central database.
+
+An alternative approach is the one implemented by GNU Stow: Each package is
+actually installed in a private directory tree. The actual standard directory
+structure is then created by collecting the individual files from all the
+packages, and presenting them in the common /bin, /lib etc. locations.
+
+While the normal Stow package (for traditional UNIX systems) uses symlinks to
+the actual files, updated on installation/deinstallation events, the Hurd
+translator mechanism allows a much more elegant solution: Stowfs (which is
+actually a special mode of unionfs) creates virtual directories on the fly,
+composed of all the files from the individual package directories.
+
+The problem with this approach is that unionfs presently can be launched only
+once the system is booted up, meaning the virtual directories are not available
+at boot time. But the boot process itself already needs access to files from
+various packages. So to make this design actually usable, it is necessary to
+come up with a way to launch unionfs very early at boot time, along with the
+root filesystem.
+
+Completing this task will require gaining a very good understanding of the Hurd
+boot process and other parts of the design. It requires some design skills also
+to come up with a working mechanism.
+
 * fix tmpfs
 
 In some situations it is desirable to have a file system that is not backed by
@@ -582,37 +601,51 @@ implementation. It requires digging into some parts of the Hurd, including the
 pager interface and translator programming. This task probably doesn't require
 any design work, only good debugging skills.
 
-* allow using unionfs early at boot
+* lexical dot-dot resolution
 
-In UNIX systems, traditionally most software is installed in a common directory
-hierarchy, where files from various packages live beside each other, grouped by
-function: User-invokable executables in /bin, configuration files in /etc,
-architecture specific static files in /lib, variable data in /var and so on. To
-allow clean installation, deinstallation, and upgrade of software packages,
-GNU/Linux distributions usually come with a package manager, which keeps track
-of all files upon installation/removal in some kind of central database.
+For historical reasons, UNIX filesystems have a real (hard) .. link from each
+directory pointing to its parent. However, this is problematic, because the
+meaning of "parent" really depends on context. If you have a symlink for
+example, you can reach a certain node in the filesystem by a different path. If
+you go to .. from there, UNIX will traditionally take you to the hard-coded
+parent node -- but this is usually not what you want. Usually you want to go
+back to the logical parent from which you came. That is called "lexical"
+resolution.
 
-An alternative approach is the one implemented by GNU Stow: Each package is
-actually installed in a private directory tree. The actual standard directory
-structure is then created by collecting the individual files from all the
-packages, and presenting them in the common /bin, /lib etc. locations.
+Some applications already use lexical resolution internally for that reason. It
+is generally agreed that many problems could be avoided if the standard
+filesystem lookup calls used lexical resolution as well. The compatibility
+problems probably would be negligible.
 
-While the normal Stow package (for traditional UNIX systems) uses symlinks to
-the actual files, updated on installation/deinstallation events, the Hurd
-translator mechanism allows a much more elegant solution: Stowfs (which is
-actually a special mode of unionfs) creates virtual directories on the fly,
-composed of all the files from the individual package directories.
+The goal of this project is to modify the filename lookup mechanism in the Hurd
+to use lexical resolution, and to check that the system is still fully
+functional afterwards. This task requires understanding the filename resolution
+mechanism. It's probably a relatively easy task.
 
-The problem with this approach is that unionfs presently can be launched only
-once the system is booted up, meaning the virtual directories are not available
-at boot time. But the boot process itself already needs access to files from
-various packages. So to make this design actually usable, it is necessary to
-come up with a way to launch unionfs very early at boot time, along with the
-root filesystem.
+* secure chroot implementation
 
-Completing this task will require gaining a very good understanding of the Hurd
-boot process and other parts of the design. It requires some design skills also
-to come up with a working mechanism.
+As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a
+chroot() system call. However, the current implementation is not really good,
+as it allows easily escaping the chroot, for example by use of passive
+translators.
+
+Many solutions have been suggested for this problem -- ranging from simple
+workarounds changing the behaviour of passive translators in a chroot; changing
+the context in which passive translators are executed; changing the
+interpretation of filenames in a chroot; to reworking the whole passive
+translator mechanism. Some involve a completely different approach to chroot
+implementation, using a proxy instead of a special system call in the
+filesystem servers.
+
+The task is to pick and implement one approach for fixing chroot.
+
+This task is pretty heavy: It requires a very good understanding of file name
+lookup and the translator mechanism, as well as of security concerns in general
+-- the student must prove that he really understands security implications of
+the UNIX namespace approach, and how they are affected by the introduction of
+new mechanisms. (Translators.) More important than the actual code is the
+documentation of what he did: He must be able to defend why he chose a certain
+approach, and explain why he believes this approach is really secure.
 
 * hurdish package manager for the GNU system
 
@@ -648,36 +681,3 @@ But this only handles the lowest level of package management. Additional
 mechanisms are necessary to handle stuff like dependencies on other packages.
 
 The goal of this task is to create these mechanisms.
-
-* Lisp, Python, ... bindings
-
-The main idea of the Hurd design is giving users the ability to easily
-modify/extend the system's functionality. This is done by creating filesystem
-translators, or sometimes other kinds of Hurd servers.
-
-However, in practice this is not as easy as it should be, because creating
-translators and other servers is quite involved -- the interfaces for doing
-that are not exactly simple, and available only for C programs. Being able to
-easily create simple translators in RAD languages is highly desirable, to
-really be able to reap the advantages of the Hurd architecture.
-
-Originally Lisp was meant to be the second system language besides C in the GNU
-system; but that doesn't mean we are bound to Lisp. Bindings for any popular
-high-level language, that helps quickly creating simple programs, are highly
-welcome.
-
-Several approaches are possible when creating such bindings. One way is simply
-to provide wrappers to all the available C libraries (libtrivfs, libnetfs
-etc.). While this is easy (it requires relatively little consideration), it may
-not be the optimal solution. It is preferable to hook in at a lower level, thus
-being able to create interfaces that are specially adapted to make good use of
-the features available in the respective language.
-
-These more specialised bindings could hook in at some of the lower level
-library interfaces (libports, glibc, etc.); use the mig-provided RPC stubs
-directly; or even create native stubs directly from the interface definitions.
-
-The task is to create easy to use Hurd bindings for a language of the student's
-choice, and some example servers to prove that it works well in practice. This
-project will require gaining a very good understanding of the various Hurd
-interfaces. Skills in designing nice programming interfaces are a must.
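
Note on the "lexical dot-dot resolution" task in the diff above: the difference between lexical and physical resolution can be illustrated with a small string-level sketch. This is only an illustration under stated assumptions, not the Hurd's lookup code: the function name `lexical_normalize` and its signature are hypothetical, and the sketch never touches the filesystem, so symlinks and translators are not consulted at all. It treats `..` as "drop the previous path component", which is what the task description calls lexical resolution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a Hurd or glibc interface): resolve "." and ".."
   purely lexically, without looking at the filesystem.
   Returns 0 on success, -1 if the result does not fit into out. */
int lexical_normalize(const char *path, char *out, size_t outsz)
{
    assert(path != NULL && out != NULL);

    int absolute = (path[0] == '/');
    size_t len = 0;   /* bytes written to out so far */
    size_t floor;     /* ".." must not pop below this offset */

    if (outsz < 2)
        return -1;    /* need room for "/" or "." plus the terminator */
    if (absolute)
        out[len++] = '/';
    floor = len;

    const char *p = path;
    while (*p) {
        while (*p == '/')            /* skip repeated slashes */
            p++;
        if (!*p)
            break;
        const char *start = p;       /* start of the next component */
        while (*p && *p != '/')
            p++;
        size_t n = (size_t)(p - start);
        int dotdot = (n == 2 && start[0] == '.' && start[1] == '.');

        if (n == 1 && start[0] == '.')
            continue;                /* "." never changes the result */

        if (dotdot) {
            if (len > floor) {
                /* Drop the previous component: this is the lexical parent. */
                while (len > floor && out[len - 1] != '/')
                    len--;
                if (len > floor)
                    len--;           /* drop the separating slash too */
                continue;
            }
            if (absolute)
                continue;            /* ".." at the root stays at the root */
            /* Relative path with nothing left to pop: keep the "..". */
        }

        int need_sep = (len > 0 && out[len - 1] != '/');
        if (len + n + (size_t)need_sep + 1 > outsz)
            return -1;
        if (need_sep)
            out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
        if (dotdot)
            floor = len;             /* a kept ".." can not be popped later */
    }

    if (len == 0)
        out[len++] = '.';            /* empty relative result means "." */
    out[len] = '\0';
    return 0;
}
```

For example, if `/home/user/link` were a symlink to `/srv/data`, the path `/home/user/link/../file` normalizes lexically to `/home/user/file`, whereas traditional physical resolution would follow the symlink first and end up at `/srv/file`. A real implementation would have to apply the same rule inside the actual filename lookup mechanism rather than on plain strings.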