We offer a wide range of possible projects to choose from. If you have an idea not listed here, we'd love to hear about it!

In either case, we encourage you to contact us (on IRC and/or our developer mailing lists), so we can discuss your idea, or help you pick a suitable task -- we will gladly explain the tasks in more detail, if the descriptions are not clear enough.

In fact, we suggest you discuss your choice with us even if you have no trouble finding a task that suits you: as explained in the introduction section of the student application form, we ask all students to get into regular communication with us for the application to be considered complete. Talking about your project choice is a good start :-)

(We strongly suggest that you generally take a look at the student application form right now -- the sooner you know what we expect, the better you can cater to it :-) )

Many of the project descriptions suggest some "exercise". The reason is that for the application to be complete, we require you to make a change to the Hurd code, and send us the resulting patch. (This is also explained in the student application form.) If possible, the change should make some improvement to the code you will be working on during the summer, or to some related code.

The "exercise" bit in the project description is trying to give you some ideas what kind of change this could be. In most cases it is quite obvious, though: Try to find something to improve in the relevant code, by looking at known issues in the Savannah bug tracker; by running the code and testing stuff; and by looking through the code. If you don't find anything, try with some related code -- if your task involves translator programming, make some improvement to an existing translator; if it involves glibc hacking, make an improvement to glibc; if it involves driver hacking, make an improvement to the driver framework; and so on... Makes sense, doesn't it? :-)

Sometimes it's hard to come up with a useful improvement to the code in question, that isn't too complicated for the purposes of the application. In this case, we need to find a good alternative. You could for example make an improvement to some Hurd code that is not directly related to your project: this way you won't get familiar with working on the code you will actually need for the task, but at least you can show that you are able to work with the Hurd code in general.

Another possible alternative would be making some change to the code in question, that isn't really a useful improvement, while still making sense in some way -- this could suffice to prove that you are able to work with the code.

Don't despair if you can't come up with anything suitable by yourself. Contact us, and we will think of something together :-)

In either case, we strongly suggest that you talk to us about the change you want to make up front, to be sure that it is something that will get our approval -- especially if the idea is not directly taken from the project description.

Also, don't let this whole patch stuff discourage you from applying! As explained in the student application form, it's not a problem if you do not yet have all the necessary knowledge to do this alone -- we don't expect that. After all, the purpose of GSoC is to introduce you to free software development :-) We only want to see that you are able to obtain the necessary knowledge before the end of the application process, with our help -- contact us, and we will assist you as well as we can.

Here is a list of project ideas, followed by all project ideas inlined.

Porting Rust

Virtualization Using Hurd Mechanisms

Secure chroot Implementation

Hurdish Package Manager for the GNU System, GNU Guix

New Driver Framework

Bindings to Other Programming Languages

Hurdish TCP/IP Stack

Improved NFS Implementation

Disk I/O Performance Tuning

GNU Mach Code Cleanup

xmlfs

Allow Using unionfs Early at Boot

Lexical .. Resolution

Use Internet Protocol Translators (ftpfs etc.) as Backends for Other Programs

Fixing Programs Using PATH_MAX et al Unconditionally

Stub Implementations of Hardware Specific Libraries

Implement CD Audio Grabbing

Improving Perl or Python Support

Fix Compatibility Problems Exposed by Testsuites, Implement Missing Interfaces in glibc for GNU Hurd

Automated Testing Framework

Implementing libcap

Porting Valgrind to the Hurd

Fix libdiskfs Locking Issues

Additional ideas have been posted in id:"87zjkyhp5f.fsf@kepler.schwinge.homeip.net", but have not yet been integrated here and elsewhere. Keywords: bootstrap-vz, buildbot, ceph, clang, cloud, continuous integration, debian, eudyptula challenge, gcc front end, gdb, grub, guile, learning system, llvm, lttng, rump kernels, samba, sbcl, smbfs, steel bank common lisp, subhurd, systemtap, teaching system, tracing, virtio, x.org, xen, xorg. As well as any other ideas you might have, these are likewise applicable for projects.

All project ideas inlined:

/!\ Obsolete /!\

Vedant Tewari has been working on this as a Google Summer of Code 2023 project.

The goal of this project is to make the Rust language available on GNU/Hurd.

Presumably less work will be needed on the language's frontend itself, but rather on the supporting libraries.

The Rust language is being used more and more widely, and notably in rather fundamental libraries such as librsvg or python-cryptography. It is thus more and more pressing for GNU/Hurd to have a compiler for Rust.

The Rust compiler itself is quite portable, but its runtime library, libstd, needs to be ported to the GNU/Hurd system. This essentially consists in telling Rust how the standard C library functions can be called, i.e. the C ABI.

There was an initial attempt against rustc 1.30 that was enough to get rustc to crossbuild, but it was missing most of what libstd needs. It is most probably very outdated by now, but it gives an idea of what is involved.

An example of the main part of the libstd port can be seen in the VxWorks port.

The bulk of such a file can mostly be generated from the libc C headers with the bindgen tool. It then needs to be cleaned up and integrated into the Rust build infrastructure; some preliminary investigation has already been done on that part.
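To make "expressing the C ABI" a bit more concrete, here is a minimal sketch in Python rather than Rust. With ctypes, much like in Rust's libc bindings, the caller has to declare the argument and return types of each C function by hand. The sketch assumes a Unix-like host where `ctypes.CDLL(None)` exposes the C library; the helper name `c_strlen` is made up for illustration, and nothing here is Hurd-specific or part of the actual port.

```python
import ctypes

# Load the symbols already linked into the running process; on a
# Unix-like system this includes the standard C library (assumption).
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
# Without these declarations ctypes would assume int arguments and an
# int return value. Spelling out such details for every function and
# type is what a language port has to do.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen() on a NUL-terminated byte string."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"GNU/Hurd"))
```

A real libstd port has to do the same for structures, constants and errno values, which is why generating most of it from the C headers is attractive.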

A good level of C programming will be welcome to understand the questions of ABI and the libc C functions being bound.

Knowing the Rust language is not required: it can be learnt along the way, and this project can be a good occasion to do so.

Possible mentors: Samuel Thibault (youpi) samuel.thibault@gnu.org

For somebody who already has a very good level of C programming and good Rust programming skills, this can probably be a 175-hour project. Otherwise it will be a good occasion to learn, and can then be a 350-hour project.

It is expected to be of medium difficulty: a fair part of the port is about mimicking other ports. Deciding how to mimic them can be discussed with the community anyway. The other part is about expressing the C ABI. This requires properly understanding what that means, but once that is understood, it should be relatively straightforward to implement.

You can contact the bug-hurd@gnu.org mailing list to discuss this project.

Bonding exercise: Build the Debian rustc package on Debian GNU/Linux. Build some Debian package (not rustc) on Debian GNU/Hurd. Have a look at the initial attempt against rustc 1.30 to get an idea of what it looks like and how the ABI gets expressed in Rust. Then build the cross-compiler with:

DEB_BUILD_OPTIONS=parallel=8 dpkg-buildpackage -B -ahurd-i386 -Ppkg.rustc.dlstage0,nocheck -nc

Rust language

How to build and run Rust

One can check out the result with:

git clone https://github.com/sthibaul/getrandom.git
git clone https://github.com/sthibaul/nix.git -b r0.26-hack
git clone https://github.com/sthibaul/rust_libloading.git -b v0.7-hack
git clone https://github.com/sthibaul/rust_libloading.git -b hack rust_libloading-0.8.0
git clone https://github.com/sthibaul/socket2.git -b v0.4.x
git clone https://github.com/Vtewari2311/libc.git -b libc-hurd-latest-hack
git clone https://github.com/Vtewari2311/rust.git -b mod-hurd-latest-hack

(yes, rust imposes checking out libloading several times...)

To build from GNU/Hurd, you will need existing rustc, cargo and rustfmt binaries; you can for example fetch the tarball from Samuel. Then you can add the following to rust/config.toml:

[build]
rustc = "/path/to/your/rust-hurd/usr/local/bin/rustc"
rustfmt = "/path/to/your/rust-hurd/usr/local/bin/rustfmt"
cargo = "/path/to/your/rust-hurd/usr/local/bin/cargo"

And then run from rust/

./x build
DESTDIR=/where/you/want ./x install
DESTDIR=/where/you/want ./x install cargo rustfmt

Expect about 20GB of disk usage and a build time of several hours. You also need a fair amount of RAM; 4GB may be needed.

One can run the basic testsuites with

./x test tests/ui
./x test library/core
./x test library/std

One can run tests remotely, to test with crossbuilding:

./x build src/tools/remote-test-server --target i686-unknown-hurd-gnu

One can run it on the target:

./remote-test-server -v --bind 0.0.0.0:12345

and run the testsuite with

export TEST_DEVICE_ADDR="1.2.3.4:12345"
./x test tests/ui --target i686-unknown-hurd-gnu

Currently these tests are known to fail:

tests/ui:

[ui] tests/ui/abi/homogenous-floats-target-feature-mixup.rs
[ui] tests/ui/associated-consts/issue-93775.rs
[ui] tests/ui/env-funky-keys.rs
[ui] tests/ui/issues/issue-74564-if-expr-stack-overflow.rs
[ui] tests/ui/macros/macros-nonfatal-errors.rs
[ui] tests/ui/modules/path-no-file-name.rs
[ui] tests/ui/parser/mod_file_with_path_attr.rs
[ui] tests/ui/process/no-stdio.rs#mir
[ui] tests/ui/process/no-stdio.rs#thir
[ui] tests/ui/process/println-with-broken-pipe.rs
[ui] tests/ui/sse2.rs
[ui] tests/ui/traits/object/print_vtable_sizes.rs

notably because we have not enabled SSE2 by default, and because our errno values differ from those on Linux, etc.

library/std:

net::tcp::tests::double_bind
net::tcp::tests::test_read_timeout
net::tcp::tests::test_read_with_timeout
net::tcp::tests::timeouts
net::udp::tests::test_read_timeout
net::udp::tests::test_read_with_timeout
net::udp::tests::timeouts
os::unix::net::tests::basic
os::unix::net::tests::test_read_timeout
os::unix::net::tests::test_read_with_timeout
os::unix::net::tests::test_unix_datagram_connect_to_recv_addr

because pfinet currently allows double binds on IPv6 addresses and does not yet support SO_SNDTIMEO / SO_RCVTIMEO, and because pflocal does not support getpeername.

To cross-build (e.g. from Linux), you need to set up a cross-build toolchain; a simple way is to use the build-many-glibcs.py script as described on the glibc page. Note that to produce something that can run even on the current latest Debian, you should use the 2.37/master branch. You also need to comment out the #define ED in sysdeps/mach/hurd/bits/errno.h.

You also need to unpack a Hurd build of OpenSSL; a simple way is to take the Debian packages libssl3 and libssl-dev from debian-ports and unpack them with:

dpkg-deb -x libssl3_3.0.10-1_hurd-i386.deb /where/you/want
dpkg-deb -x libssl-dev_3.0.10-1_hurd-i386.deb /where/you/want
mv /where/you/want/usr/include/i386-gnu/openssl/* /where/you/want/usr/include/openssl/
mv /where/you/want/usr/lib/i386-gnu/* /where/you/want/usr/lib/

Then you can add the following to rust/config.toml

[llvm]
download-ci-llvm = false

[target.i686-unknown-hurd-gnu]
cc = "/path/to/your/build-glibc/install/compilers/i686-gnu/bin/i686-glibc-gnu-gcc"
cxx = "/path/to/your/build-glibc/install/compilers/i686-gnu/bin/i686-glibc-gnu-g++"
linker = "/path/to/your/build-glibc/install/compilers/i686-gnu/bin/i686-glibc-gnu-gcc"

And then run from rust/

export I686_UNKNOWN_HURD_GNU_OPENSSL_DIR=/where/you/want/usr
./x build --stage 0 compiler library
./x build --host i686-unknown-hurd-gnu --target i686-unknown-hurd-gnu compiler library cargo rustfmt
DESTDIR=/where/you/want ./x install --host i686-unknown-hurd-gnu --target i686-unknown-hurd-gnu
DESTDIR=/where/you/want ./x install --host i686-unknown-hurd-gnu --target i686-unknown-hurd-gnu cargo rustfmt

Expect about 25GB of disk usage and a build time of several hours. You also need a fair amount of RAM; 4GB may be needed. Take care if you have many cores and threads: the parallel build of LLVM can be quite demanding, so you may want to reduce the number of available processors (e.g. by disabling SMT, prefixing your commands with hwloc-bind --no-smt all --), so that you have more memory per core.

Note that you will have a usable cross-compiler in rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc and a native Hurd compiler in rust/build/i686-unknown-hurd-gnu/stage2/bin/rustc

Posted 2023-02-08 00:27:13 CET


The main idea behind the Hurd design is to allow users to replace almost any system functionality (extensible system). Any user can easily create a subenvironment using some custom servers instead of the default system servers. This can be seen as an advanced lightweight virtualization mechanism, which allows implementing all kinds of standard and nonstandard virtualization scenarios.

However, though the basic mechanisms are there, currently it's not easy to make use of these possibilities, because we lack tools to automatically launch the desired constellations.

The goal is to create a set of powerful tools for managing at least one desirable virtualization scenario. One possible starting point could be the subhurd/neighborhurd mechanism, which allows running a second, almost totally independent instance of the Hurd in parallel to the main one.

While subhurds allow creating a complete second system instance, with its own set of Hurd servers and UNIX daemons and all, there are also situations where it is desirable to have a smaller subenvironment, living within the main system and using most of its facilities -- similar to a chroot environment. A simple way to create such a subenvironment with a single command would be very helpful.

It might be possible to implement (perhaps as a prototype) a wrapper using existing tools (chroot and unionfs); or it might require more specific tools, like some kind of unionfs-like filesystem proxy that mirrors other parts of the filesystem, but allows overriding individual locations, in conjunction with either chroot or some similar mechanism to create a subenvironment with a different root filesystem.
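The "filesystem proxy that mirrors other parts of the filesystem, but allows overriding individual locations" can be modelled in a few lines. This is only a sketch of the lookup rule, in plain Python with no translator code involved; the class name `OverlayNamespace` and the example paths are made up for illustration.

```python
import posixpath


class OverlayNamespace:
    """Mirror the main namespace, except for explicitly overridden locations."""

    def __init__(self, overrides):
        # Maps a location inside the subenvironment to the real location
        # that should be served in its place (both absolute paths).
        self.overrides = {
            posixpath.normpath(virtual): posixpath.normpath(real)
            for virtual, real in overrides.items()
        }

    def resolve(self, path):
        """Return the real path that `path` refers to in the subenvironment."""
        if not path.startswith("/"):
            raise ValueError("only absolute paths are handled in this sketch")
        path = posixpath.normpath(path)
        probe = path
        while True:
            if probe in self.overrides:
                # The deepest matching override wins.
                rest = path[len(probe):].lstrip("/")
                target = self.overrides[probe]
                return posixpath.join(target, rest) if rest else target
            if probe == "/":
                return path  # nothing overridden: mirror the main system
            probe = posixpath.dirname(probe)


if __name__ == "__main__":
    env = OverlayNamespace({"/etc": "/home/alice/env/etc"})
    print(env.resolve("/etc/passwd"))
    print(env.resolve("/usr/bin/ls"))
```

A real implementation would of course proxy file system RPCs rather than rewrite strings, and would have to deal with symlinks and translators on the way.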

It's also desirable to have a mechanism allowing a user to set up such a custom environment in a way that it will automatically get launched on login -- practically allowing the user to run a customized operating system in their own account.

Yet another interesting scenario would be a subenvironment -- using some kind of special filesystem proxy again -- in which the user serves as root, being able to create local sub-users and/or sub-groups.

This would allow the user to run "dangerous" applications (web browser, chat client etc.) in a confined fashion, allowing them access to only a subset of the user's files and other resources. (This could be done either by using lots of groups for individual resources, and lots of users for individual applications; adding a user to a group would give the corresponding application access to the corresponding resource -- an advanced ACL mechanism. Or leave out the groups, assigning the resources to users instead, and use the Hurd's ability for a process to have multiple user IDs, to equip individual applications with sets of user IDs giving them access to the necessary resources -- basically a capability mechanism.)
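The second variant (resources assigned to sub-users, applications holding sets of user IDs) boils down to a very small access rule. Here is a sketch of just that rule; the class names and numeric IDs are hypothetical, and the real Hurd auth server and file permission checks involve considerably more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    owner_uid: int  # the sub-user this resource is assigned to


@dataclass(frozen=True)
class Process:
    name: str
    uids: frozenset  # a Hurd process can hold several user IDs at once


def may_access(process: Process, resource: Resource) -> bool:
    """Grant access if any of the process's user IDs owns the resource."""
    return resource.owner_uid in process.uids


if __name__ == "__main__":
    downloads = Resource("downloads", owner_uid=1001)
    mail = Resource("mail", owner_uid=1002)
    browser = Process("browser", frozenset({1001}))
    print(may_access(browser, downloads), may_access(browser, mail))
```

Handing an application one more user ID then acts like handing it one more capability.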

The student will have to pick (at least) one of the described scenarios -- or come up with some other one in a similar spirit -- and implement all the tools (scripts, translators) necessary to make it available to users in an easy-to-use fashion. While the Hurd by default already offers the necessary mechanisms for that, these are not perfect and could be further refined for even better virtualization capabilities. Should need or desire for specific improvements in that regard come up in the course of this project, implementing these improvements can be considered part of the task.

Completing this project will require gaining a very good understanding of the Hurd architecture and spirit. Previous experience with other virtualization solutions would be very helpful.

Possible mentors: Justus Winter (teythoon)

Exercise: Currently, when issuing 'reboot' in Subhurds, 'boot' exits. Make it reboot the Subhurd instead.

Posted 2009-03-05 19:20:56 CET

License:

GFDL 1.2+

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.


As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a chroot() system call. However, the current implementation is not really good, as it makes it easy to escape the chroot, for example by use of passive translators.

Many solutions have been suggested for this problem -- ranging from a simple workaround that changes the behavior of passive translators in a chroot; changing the context in which passive translators are executed; changing the interpretation of filenames in a chroot; to reworking the whole passive translator mechanism. Some involve a completely different approach to chroot implementation, using a proxy instead of a special system call in the filesystem servers.

See http://tri-ceps.blogspot.com/2007/07/theory-of-filesystem-relativity.html for some suggestions, as well as the followup discussions on http://lists.gnu.org/archive/html/gnu-system-discuss/2007-09/msg00118.html and http://lists.gnu.org/archive/html/bug-hurd/2008-03/msg00089.html.

The task is to pick and implement one approach for fixing chroot.
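To give a feeling for what "changing the interpretation of filenames in a chroot" could mean, here is one possible reading as a small sketch: resolve ".." purely lexically, so that a path can never name anything above the chroot's root. The function name `confine` is made up, and the sketch deliberately ignores symlinks and translators, which are exactly the hard part of the real problem.

```python
import posixpath


def confine(root: str, path: str) -> str:
    """Resolve `path` lexically so that it always stays below `root`."""
    parts = []
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if parts:
                parts.pop()
            # At the top of the chroot, ".." simply stays at the top.
            continue
        parts.append(part)
    return posixpath.join(root, *parts)


if __name__ == "__main__":
    print(confine("/srv/jail", "/../../etc/passwd"))
    print(confine("/srv/jail", "/a/b/../c"))
```

Whether such a rule is enough to call a chroot secure is part of what the student would have to argue.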

This task is pretty heavy: it requires a very good understanding of file name lookup and the translator mechanism, as well as of security concerns in general -- the student must prove that they really understand the security implications of the UNIX namespace approach, and how these are affected by the introduction of new mechanisms such as translators. More important than the actual code is the documentation of what was done: the student must be able to defend the choice of a certain approach, and explain why they believe this approach is really secure.

Possible mentors: Justus Winter (teythoon)

Exercise: It's hard to come up with a relevant exercise, as there are so many possible solutions... Probably best to make an improvement to one of the existing translators -- if possible, something touching name resolution or the like, e.g. implementing file_reparent() in a translator that doesn't support it yet.

2016-02-14, Justus Winter

I have factored out the proxying-bits from fakeroot so that it can be shared. The most simple chrooting translator is the identity translator, which proxies RPCs without really modifying them. Combining the identity translator with settrans --chroot gives us chroot(8). With a little more work, I believe that can be used to implement chroot(2). Whether or not that is secure remains to be seen, maybe that is even an ill-conceived goal.

Posted 2009-03-05 19:20:56 CET


Most GNU/Linux systems use pretty sophisticated package managers, to ease the management of installed software. These keep track of all installed files, and various kinds of other necessary information, in special databases. On package installation, deinstallation, and upgrade, scripts are used that make all kinds of modifications to other parts of the system, making sure the packages get properly integrated.

This approach creates various problems. For one, all management has to be done with the distribution package management tools, or otherwise they would lose track of the system state. This is reinforced by the fact that the state information is stored in special databases, that only the special package management tools can work with.

Also, as changes to various parts of the system are made on certain events (installation/deinstallation/update), managing the various possible state transitions becomes very complex and bug-prone.

For the official (Hurd-based) GNU system, a different approach is intended: making use of Hurd translators -- more specifically their ability to present existing data in a different form -- the whole system state will be created on the fly, directly from the information provided by the individual packages. The visible system state is always a reflection of the sum of packages installed at a certain moment; it doesn't matter how this state came about. There are no global databases of any kind. (Some things might require caching for better performance, but this must happen transparently.)

The core of this approach is formed by stowfs. GNU Guix, GNU's package manager, installs each package in its own directory. Each user has a profile, which is the union of some of these packages. On GNU/Linux, this union is implemented as a symlink tree; on GNU/Hurd, stowfs would offer a more elegant solution. Stowfs creates a traditional Unix directory structure from all the files in the individual package directories. This handles the lowest level of package management.
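The idea that the visible system state is simply "the sum of packages installed at a certain moment" can be sketched as a pure function from package contents to a directory view. This is a toy model in plain Python, not how stowfs or unionfs are implemented; the function name `union_view` and the package names are made up.

```python
def union_view(packages):
    """Compute the visible tree from the installed packages.

    `packages` maps a package name to the list of files it ships
    (relative paths). The result maps each visible absolute path to
    the package providing it. Nothing is stored between calls: the
    view is always derived from the current set of packages.
    """
    view = {}
    for name in sorted(packages):
        for rel in packages[name]:
            path = "/" + rel.lstrip("/")
            if path in view:
                raise ValueError(
                    f"{path} is provided by both {view[path]} and {name}"
                )
            view[path] = name
    return view


if __name__ == "__main__":
    installed = {
        "hello": ["bin/hello"],
        "sed": ["bin/sed", "share/man/man1/sed.1"],
    }
    for path, package in sorted(union_view(installed).items()):
        print(path, "<-", package)
```

Removing a package is then just a matter of dropping it from the input and looking again; there is no installation history to keep consistent.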

The goal of this task is to exploit Hurd features in GNU Guix.

Discussion

Java

IRC, freenode, #hurd, 2013-12-19

<antrik_> teythoon_: I don't think wrapping libtrivfs etc. for guile
  bindings is really desirable... for the lisp bindings, we agreed that
  it's better to hook in at a lower level, and build more lispish
  abstractions
<antrik> trivfs is a C framework; it probably doesn't map very well to
  other languages -- especially non-imperative ones...
<antrik> (it is arguable whether trivfs is really a good abstraction even
  for C... but that's another discussion :-) )
<antrik> ArneBab: same for Python bindings. when I suggested ignoring
  libtrivfs etc., working around the pthread problem was just a side effect
  -- the real goal has always been nicer abstraction
<anatoly> antrik: agree with you
<anatoly> antrik: about nicer abstractions
<teythoon_> antrik: I agree too, but wrapping libtrivfs is much easier
<teythoon_> otherwise, one needs to reimplement lots of stuff to get some
  basic functionality
<teythoon_> like a mig that emits your language
<braunr> i agree with antrik too
<braunr> yes, the best would be mig handling multiple languages

open issue mig.

<antrik> teythoon_: not exactly. for dynamic languages, code generation is
  silly. just handle the marshalling on the fly. that's what the Lisp
  bindings are doing (AFAIK)
<teythoon> antrik: ok, but you'd still need to parse the rpc definitions,
  no?
<antrik> teythoon: yeah, you still need to parse the .defs -- unless we add
  reflection to RPC interfaces...
<antrik> err, I mean introspection

Posted 2009-03-05 19:20:56 CET Tags: open issue mig


The Hurd presently uses a TCP/IP stack based on code from an old Linux version. This works, but lacks some rather important features (like PPP/PPPoE), and the design is not hurdish at all. Recently lwip, a userspace TCP/IP library, was ported to the Hurd. If you are only using an Ethernet connection, it is possible to use lwip as a complete replacement for pfinet. However, lwip uses the netdde device drivers for wireless chips, which are old drivers from an old version of Linux. To use lwip for a wifi connection on more modern hardware, one would also need modern device drivers. The most promising approach to this is using a rump kernel, which is essentially the New Driver Framework Google Summer of Code project idea. Hopefully, the Hurd project will one day completely replace pfinet with lwip.

A true hurdish network stack will use a set of translator processes, each implementing a different protocol layer. This way not only does the implementation get more modular, but the network stack can also be used far more flexibly. Rather than just having the standard socket interface, plus some lower-level hooks for special needs, there are explicit (perhaps filesystem-based) interfaces at all the individual levels; special applications can just directly access the desired layer. All kinds of packet filtering, routing, tunneling etc. can be easily achieved by stacking components in the desired constellation.
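As a toy illustration of stacking components, here is a sketch in which each layer is an independent object that only knows the layer below it. The bracketed "headers" are placeholders rather than real protocol framing, and the class names `Layer` and `DropFilter` are made up; in the hurdish design each of these would be a separate translator process.

```python
class Layer:
    """One protocol layer: wraps the payload and hands it to the layer below."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower

    def send(self, payload: bytes):
        framed = b"[" + self.name.encode() + b"]" + payload
        if self.lower is None:
            return framed  # bottom of the stack: this goes on the wire
        return self.lower.send(framed)


class DropFilter(Layer):
    """A filter inserted between two layers; drops matching packets."""

    def __init__(self, blocked: bytes, lower):
        super().__init__("filter", lower)
        self.blocked = blocked

    def send(self, payload: bytes):
        if self.blocked in payload:
            return None  # packet dropped
        return self.lower.send(payload)  # passed through unchanged


if __name__ == "__main__":
    eth = Layer("eth")
    ip = Layer("ip", eth)
    tcp = Layer("tcp", ip)
    print(tcp.send(b"hello"))
    # A special application can talk to the IP layer directly.
    print(ip.send(b"ping"))
```

Adding packet filtering is a matter of putting a `DropFilter` between two existing layers; none of the other layers has to change.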

Implementing a complete modular network stack is not feasible as a GSoC project, though. Instead, the task is to take some existing user space TCP/IP implementation, and make it run as a single Hurd server for now, so it can be used in place of the existing pfinet. The idea is to split it up into individual layers later. The initial implementation, and the choice of a TCP/IP stack, should be done with this in mind -- it needs to be modular enough to make such a split later on feasible.

This is GNU Savannah task #5469.

Possible mentors: youpi

Exercise: You could try making some improvement to the existing pfinet implementation; or you could work towards running some existing userspace TCP/IP stack on Hurd. (As a normal program for now, not a proper Hurd server yet.)

Posted 2009-03-05 19:20:56 CET



The Hurd has both NFS server and client implementations, which work, but not very well: File locking doesn't work properly (at least in conjunction with a GNU/Linux server), and performance is extremely poor. Some of the problems may be due to the fact that only NFSv2 is supported so far.

This project encompasses implementing NFSv3 support, fixing bugs and performance problems -- the goal is to have good NFS support. The work done in a previous unfinished GSoC project can serve as a starting point.

Both client and server parts need work, though the client is probably much more important for now, and shall be the major focus of this project. One could probably use libnfs for the client portion. You should probably talk to Sergey, who has an in-development 9P port to the Hurd.

Some discussion of NFS improvements has been done for a former GSoC application -- it might give you some pointers. But don't take any of the statements made there for granted -- check the facts yourself!

A bigger subtask is the libnetfs: io map issue.

This task, GNU Savannah task #5497, has no special prerequisites besides general programming skills, and an interest in file systems and network protocols.

Possible mentors: ?

Exercise: Look into one of the existing issues in the NFS code. It's quite possible that you will not be able to fix any of the visible problems before the end of the application process; but you might discover something else you could improve in the code while working on it :-)

If you can't find anything suitable, talk to us about possible other exercise tasks.

Posted 2009-03-05 19:20:56 CET


The most obvious reason for the Hurd feeling slow compared to mainstream systems like GNU/Linux is low I/O performance, in particular very slow hard disk access.

The reason for this slowness is the lack or poor implementation of common optimization techniques, like scheduling reads and writes to minimize head movement; effective block caching; effective reads/writes to partial blocks; reading/writing multiple blocks at once; and read-ahead. The ext2 filesystem server might also need some optimizations at a higher logical level.
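For readers who have not met it before, "scheduling reads and writes to minimize head movement" usually means some variant of the elevator algorithm. Below is a minimal sketch of one such variant (serve everything ahead of the head in ascending order, then sweep back), together with a helper that measures head travel. Block numbers stand in for disk positions; the function names are made up and this is not code from any Hurd component.

```python
def elevator_order(pending, head):
    """Order requests: first those at or beyond the head, then sweep back."""
    ahead = sorted(block for block in pending if block >= head)
    behind = sorted((block for block in pending if block < head), reverse=True)
    return ahead + behind


def head_travel(order, head):
    """Total distance the head moves when serving `order` from `head`."""
    total = 0
    for block in order:
        total += abs(block - head)
        head = block
    return total


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    start = 53
    print("arrival order: ", head_travel(queue, start))
    print("elevator order:", head_travel(elevator_order(queue, start), start))
```

For this request queue the head travels 640 blocks when requests are served in arrival order, and 299 blocks in elevator order.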

The goal of this project is to analyze the current situation, and implement/fix various optimizations, to achieve significantly better disk performance. It requires understanding the data flow through the various layers involved in disk access on the Hurd (filesystem, pager, driver), and general experience with optimizing complex systems. That said, the killer feature we are definitely missing is read-ahead, and even a very simple implementation would bring a very big speedup.
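To show why even a very simple read-ahead helps, here is a toy model: a cache that, on a miss, fetches a whole window of consecutive blocks with a single request to the underlying "disk". The class name `ReadAheadCache` and the window size are arbitrary choices for this sketch; the real work would happen in the pager and filesystem layers, and a real cache would also need eviction.

```python
class ReadAheadCache:
    """Block cache that fetches `window` consecutive blocks on each miss."""

    def __init__(self, device, window=8):
        self.device = device  # a list of blocks standing in for a disk
        self.window = window
        self.cache = {}       # unbounded here; a real cache would evict
        self.requests = 0     # number of requests sent to the "disk"

    def read(self, block):
        if not 0 <= block < len(self.device):
            raise IndexError(f"block {block} out of range")
        if block not in self.cache:
            end = min(block + self.window, len(self.device))
            self.requests += 1  # one request brings in the whole window
            for index in range(block, end):
                self.cache[index] = self.device[index]
        return self.cache[block]


if __name__ == "__main__":
    disk = list(range(64))
    for window in (1, 8):
        cache = ReadAheadCache(disk, window)
        for block in range(len(disk)):
            cache.read(block)
        print(f"window={window}: {cache.requests} disk requests")
```

Reading 64 blocks sequentially costs 64 requests with a window of 1, but only 8 with a window of 8.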

Here are some real testcases:

binutils ld 64ksec;
running the Git testsuite which is mostly I/O bound;
using TopGit on a non-toy repository.

Possible mentors: Samuel Thibault (youpi)

Exercise: Look through all the code involved in disk I/O, and try to find something easy to improve. It's quite likely though that you will find nothing obvious -- in this case, please contact us about a different exercise task.

Posted 2009-03-05 19:20:56 CET Tags: open issue hurd


Although there are some attempts to move to a more modern microkernel altogether, the current Hurd implementation is based on GNU Mach, which is only a slightly modified variant of the original CMU Mach.

Unfortunately, Mach was created about two decades ago, and is in turn based on even older BSD code. Parts of the BSD kernel -- file systems, UNIX mechanisms like processes and signals, etc. -- were ripped out (to be implemented in userspace servers instead); while other mechanisms were added to allow implementing stuff in user space. (Pager interface, IPC, etc.)

Also, Mach being a research project, many things were tried, adding lots of optional features not really needed.

The result of all this is that the current code base is in pretty bad shape. It's rather hard to make modifications -- to make better use of modern hardware for example, or even to fix bugs. The goal of this project is to improve the situation.

There are various things you can do here: Fixing compiler warnings; removing dead or unneeded code paths; restructuring code for readability and maintainability etc. -- a glance at the source code should quickly give you some ideas.

This task requires good knowledge of C, and experience with working on a large existing code base. Previous kernel hacking experience is an advantage, but not really necessary.

Possible mentors: Samuel Thibault (youpi)

Exercise: You should have no trouble finding something to improve when looking at the gnumach code, or even just at compiler warnings. For instance, "implicit declaration of function" and "format ‘%lu’ expects argument of type..." warnings are easy to start with.

Posted 2009-03-05 19:20:56 CET


Hurd translators allow presenting underlying data in a different format. This is a very powerful ability: it allows using standard tools on all kinds of data, and combining existing components in new ways, once you have the necessary translators.

A typical example of such a translator would be xmlfs: a translator that presents the contents of an underlying XML file in the form of a directory tree, so it can be studied and edited with standard filesystem tools or a graphical file manager, or so that data can easily be extracted from an XML file in a script.

The exported directory tree should represent the DOM structure of the document, or implement XPath/XQuery, or both, or some combination thereof (perhaps XPath/XQuery could be implemented as a second translator working on top of the DOM one) -- whatever works well, while sticking to XML standards as much as possible.
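As a rough idea of what a DOM-style mapping might look like, the following sketch turns an XML document into a flat list of paths. The naming scheme (XPath-like `tag[n]` directories, `@attr` files for attributes, a `#text` file for character data) is just one assumption for illustration, not the layout used by the existing xmlfs, and the function name `xml_to_paths` is made up.

```python
import xml.etree.ElementTree as ET


def xml_to_paths(xml_text):
    """Map an XML document to {path: content}, one entry per 'file'."""
    entries = {}

    def walk(elem, path):
        for key, value in elem.attrib.items():
            entries[f"{path}/@{key}"] = value
        text = (elem.text or "").strip()
        if text:
            entries[f"{path}/#text"] = text
        seen = {}
        for child in elem:
            # Number siblings with the same tag, XPath style.
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    root = ET.fromstring(xml_text)
    walk(root, "/" + root.tag)
    return entries


if __name__ == "__main__":
    doc = '<catalog><book id="b1"><title>Hurd</title></book><book id="b2"/></catalog>'
    for path, content in sorted(xml_to_paths(doc).items()):
        print(path, "=", content)
```

Note that this mapping already loses information (whitespace, text following child elements, comments), so it could not be reversed exactly.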

Ideally, the translation should be reversible, so that another, complementary translator applied on the expanded directory tree would yield the original XML file again; and also the other way round, applying the complementary translator on top of some directory tree and xmlfs on top of that would yield the original directory again. However, with the different semantics of directory trees and XML files, it might not be possible to create such a universal mapping. Thus it is a desirable goal, but not a strict requirement.

The goal of this project is to create a fully usable XML translator, that allows both reading and writing any XML file. Implementing the complementary translator also would be nice if time permits, but is not a mandatory part of the task.

The existing partial (read-only) xmlfs implementation can serve as a starting point.

This task requires pretty good design skills. Very good knowledge of XML is also necessary. Learning translator programming will obviously be necessary to complete the task.

Possible mentors: Olaf Buddenhagen (antrik)

Exercise: Make some improvement to the existing xmlfs, or some other existing Hurd translator. (Especially those in hurdextras are often quite rudimentary -- it shouldn't be hard to find something to improve...)

Posted 2009-03-05 19:20:56 CET

License:

GFDL 1.2+


In UNIX systems, traditionally most software is installed in a common directory hierarchy, where files from various packages live beside each other, grouped by function: user-invokable executables in /bin, system-wide configuration files in /etc, architecture specific static files in /lib, variable data in /var, and so on. To allow clean installation, deinstallation, and upgrade of software packages, GNU/Linux distributions usually come with a package manager, which keeps track of all files upon installation/removal in some kind of central database.

An alternative approach is the one implemented by GNU Stow and GNU Guix: each package is actually installed in a private directory tree. The actual standard directory structure is then created by collecting the individual files from all the packages, and presenting them in the common /bin, /lib, etc. locations.

While the normal Stow or Guix package (for traditional UNIX systems) uses symlinks to the actual files, updated on installation/deinstallation events, the Hurd translator mechanism allows a much more elegant solution: stowfs (which is actually a special mode of unionfs) creates virtual directories on the fly, composed of all the files from the individual package directories.

The problem with this approach is that unionfs presently can be launched only once the system is booted up, meaning the virtual directories are not available at boot time. But the boot process itself already needs access to files from various packages. So to make this design actually usable, it is necessary to come up with a way to launch unionfs very early at boot time, along with the root filesystem.

Completing this task will require gaining a very good understanding of the Hurd boot process and other parts of the design. It also requires some design skills to come up with a working mechanism.

Possible mentors: Carl Fredrik Hammar (cfhammar)

Posted 2009-03-05 19:20:56 CET


For historical reasons, UNIX filesystems have a real (hard) .. link from each directory pointing to its parent. However, this is problematic, because the meaning of "parent" really depends on context. If you have a symlink for example, you can reach a certain node in the filesystem by a different path. If you go to .. from there, UNIX will traditionally take you to the hard-coded parent node -- but this is usually not what you want. Usually you want to go back to the logical parent from which you came. That is called "lexical" resolution.

Some applications already use lexical resolution internally for that reason. It is generally agreed that many problems could be avoided if the standard filesystem lookup calls used lexical resolution as well. The compatibility problems probably would be negligible.

The goal of this project is to modify the filename lookup mechanism in the Hurd to use lexical resolution, and to check that the system is still fully functional afterwards. This task requires understanding the filename resolution mechanism.
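To make the difference concrete, here is a minimal sketch of lexical resolution applied to a plain path string. It is only an illustration, not how the Hurd's lookup code is structured: lexical_normalize is a hypothetical helper that handles absolute paths only and never touches the filesystem. If /tmp/link is a symlink to /usr/lib, physical resolution of /tmp/link/.. ends up in /usr, while the lexical result is /tmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Hurd: collapse "." and ".." in an
   absolute path purely textually.  Returns 0 on success, or -1 if the
   result does not fit into out (outlen bytes including the NUL). */
int lexical_normalize(const char *path, char *out, size_t outlen)
{
    size_t len = 0;              /* length of the result built so far */

    assert(path != NULL && out != NULL && path[0] == '/');
    if (outlen < 2)
        return -1;               /* not even "/" would fit */

    while (*path != '\0') {
        const char *end;
        size_t n;

        while (*path == '/')     /* skip one or more separators */
            path++;
        end = path;
        while (*end != '\0' && *end != '/')
            end++;
        n = (size_t)(end - path);

        if (n == 0 || (n == 1 && path[0] == '.')) {
            /* empty or "." component: nothing to do */
        } else if (n == 2 && path[0] == '.' && path[1] == '.') {
            /* "..": drop the previous component; stay at "/" if none */
            while (len > 0 && out[--len] != '/')
                ;
        } else {
            if (len + 1 + n + 1 > outlen)
                return -1;
            out[len++] = '/';
            memcpy(out + len, path, n);
            len += n;
        }
        path = end;
    }

    if (len == 0)
        out[len++] = '/';        /* everything collapsed to the root */
    out[len] = '\0';
    return 0;
}
```

With this helper, "/tmp/link/.." becomes "/tmp" no matter where the symlink points. The real task is of course harder, because the lookup is spread over glibc and the filesystem servers rather than being a single string operation.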

Fix Compatibility Problems Exposed by Testsuites

A number of software packages come with extensive testsuites. Some notable ones are glibc, gnulib, Perl, Python, GNU Coreutils, and glib. While these testsuites were written mostly to track regressions in the respective packages, some of the tests fail on the Hurd in general.

There is also the Open POSIX Testsuite which is more of a whole system interface testing suite.

Then, there is the File System Exerciser which we can use to test our file system servers for conformity.

While in some cases these might point to wrong usage of system interfaces, most of the time such failures are actually caused by shortcomings in Hurd's implementation of these interfaces. These shortcomings are often not obvious in normal use, but can be directly or indirectly responsible for all kinds of failures. The testsuites help in isolating such problems, so they can be tracked down and fixed.

This task thus consists in running some of the mentioned testsuites (and/or any other ones that come to mind), and looking into the causes of failures. The goal is to analyze all failures in one or more of the listed testsuites, to find out what shortcomings in the Hurd implementation cause them (if any), and to fix at least some of these shortcomings.

Note that this task somewhat overlaps with the Perl/Python task. Here the focus however is not on improving the support for any particular program, but on fixing general problems in the Hurd.

A complementary task is adding a proper unit testing framework to the GNU Hurd's code base, and related packages.

Implement Missing Interfaces in glibc for GNU Hurd

A related project is to implement missing interfaces for GNU Hurd (glibc wiki), primarily in glibc.

In glibc's Linux kernel port, most simple POSIX interfaces are in fact just forwarded to (implemented by) Linux kernel system calls. In contrast, in the GNU Hurd port, the POSIX (and other) interfaces are actually implemented in glibc on top of the Hurd RPC protocols. A few examples: getuid, open, rmdir, setresuid, socketpair.

When new interfaces are added to glibc (new editions of POSIX and similar standards, support for new editions of C/C++ standards, new GNU-specific extensions), generally ENOSYS stubs are added, which are then used as long as there is no real implementation, and often these real implementations are only done for the Linux kernel port, but not GNU Hurd. (This is because most of the contributors are primarily interested in using glibc on Linux-based systems.) Also, there is quite a backlog of missing implementations for GNU Hurd.

In coordination with the GNU Hurd developers, you'd work on implementing such missing interfaces.
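From the caller's point of view, such a stub looks roughly like the sketch below. The interface names are hypothetical and are not real glibc functions, and real glibc stubs differ in detail; the sketch only shows the visible behaviour (return -1 with errno set to ENOSYS) and a small probe that tells an unimplemented interface apart from a working one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical interface, for illustration only: behaves like a stub
   that has no real implementation behind it yet. */
int example_new_interface(int fd)
{
    (void)fd;
    errno = ENOSYS;              /* "Function not implemented" */
    return -1;
}

/* Hypothetical interface with a trivial "real" implementation. */
int example_implemented_interface(int fd)
{
    return fd;
}

/* Returns 1 if calling fn(arg) fails with ENOSYS, 0 otherwise. */
int is_unimplemented(int (*fn)(int), int arg)
{
    assert(fn != NULL);
    errno = 0;
    return fn(arg) == -1 && errno == ENOSYS;
}
```

A probe of this kind can help when going through a list of interfaces to see which ones are still stubs, before looking at how each one could be implemented on top of the Hurd RPCs.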

These are very flexible tasks: while less experienced students should be able to tackle at least a few of the easier problems, other issues will be challenging even for experienced hackers. No specific previous knowledge is required; only fairly decent C programming skills. While tracking down the various issues, the student will be digging into the inner workings of the Hurd, and thus gradually gaining the knowledge required for Hurd development in general.

Possible mentors: Samuel Thibault (youpi)

Exercise: Take a stab at one of the testsuite failures, or missing implementation, and write a minimal testcase exposing the underlying problem. Actually fixing it would be a bonus of course -- but as it's hard to predict which issues will be easy and which will be tricky, we will already be satisfied if the student makes a good effort. (We hope to see some discussion of the problems in this case though :-) )

Posted 2010-03-27 17:31:56 CET


Hurd development would benefit greatly from automated tests. Unit tests should be added for all the major components (Mach; Hurd servers and libraries). Also, functional tests can be performed on various levels: Mach; individual servers; and interaction of several servers.

(The highest level would actually be testing libc functionality, which in turn uses the various Hurd mechanisms. glibc already comes with a test suite -- though it might be desirable to add some extra tests for exercising specific aspects of the Hurd...)

Our page on automated testing collects some relevant material.

The goal of this task is to provide testing frameworks that allow automatically running tests as part of the Hurd and Mach build processes. The student will have to create the necessary infrastructure, and a couple of sample tests employing it. Ideally, all the aspects mentioned above should be covered. At least some have to be ready for use and upstream merging before the end of the summer.

(As a bonus, in addition to these explicit tests, it would be helpful to integrate some methods for testing locking validity, performing static code analysis etc.)

This task probably requires some previous experience with unit testing of C programs, as well as dealing with complex build systems. No in-depth knowledge about any specific parts of the Hurd should be necessary, but some general understanding of the Hurd architecture will have to be acquired to complete this project. This makes it a very good project to get started on Hurd development :-)

Possible mentors: ?

Exercise: Create a program performing some simple test(s) on the Hurd or Mach code. It doesn't need to be integrated in the build process yet -- a standalone program with its own Makefile is fine for now.
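A standalone test program could start out as small as the following sketch. Everything in it is an assumption made for illustration: the test_case table, the run_tests runner and the sample tests are hypothetical names and not an existing Hurd facility. The sample test merely exercises a libc interface (socketpair) and does not yet test Hurd or Mach internals directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

typedef int (*test_fn)(void);    /* a test returns 0 on success */

struct test_case {
    const char *name;
    test_fn     run;
};

/* Run every test, print one line per test, return the number of failures. */
int run_tests(const struct test_case *tests, size_t count)
{
    int failures = 0;

    assert(tests != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        int rc = tests[i].run();

        printf("%s: %s\n", rc == 0 ? "PASS" : "FAIL", tests[i].name);
        if (rc != 0)
            failures++;
    }
    return failures;
}

/* Sample test: data written to one end of a socketpair must come out of
   the other end unchanged.  A non-zero result identifies the failing step. */
int test_socketpair_roundtrip(void)
{
    static const char msg[] = "ping";
    char buf[sizeof msg];
    int sv[2];
    int result = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 1;
    if (write(sv[0], msg, sizeof msg) != (ssize_t)sizeof msg)
        result = 2;
    else if (read(sv[1], buf, sizeof buf) != (ssize_t)sizeof msg)
        result = 3;
    else if (memcmp(buf, msg, sizeof msg) != 0)
        result = 4;
    close(sv[0]);
    close(sv[1]);
    return result;
}

/* Sample test that always fails, to show how a failure is reported. */
int test_always_fails(void)
{
    return 1;
}
```

A main function would build an array of test_case entries, pass it to run_tests, and use the returned failure count as the exit status, so that a Makefile "check" target can tell whether anything failed.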

Posted 2011-03-24 22:55:40 CET


libcap is a library providing the API to access POSIX capabilities. These allow giving various kinds of specific privileges to individual users, without giving them full root permissions.

Although the Hurd design should facilitate implementing such features in a quite natural fashion, there is no support for POSIX capabilities yet. As a consequence, libcap is not available on the Hurd, and thus various packages using it cannot be easily built in Debian GNU/Hurd.

The first goal of this project is implementing a dummy libcap, which doesn't actually do anything useful yet, but returns appropriate status messages, so programs using the library can be built and run on Debian GNU/Hurd.
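A very rough sketch of what such a dummy could look like for a handful of entry points is given below. The function names and signatures follow libcap's public interface as we understand it (cap_init, cap_free, cap_get_proc, cap_set_proc); everything about their behaviour here is an assumption made for illustration, in particular reporting an empty capability set and refusing changes with ENOSYS. Which status each call should really return is one of the things to work out in the project.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Opaque capability state.  A real libcap keeps the flag sets in here;
   this dummy keeps nothing. */
struct _cap_struct {
    int unused;
};
typedef struct _cap_struct *cap_t;

/* Allocate an empty capability state; returns NULL if allocation fails. */
cap_t cap_init(void)
{
    return calloc(1, sizeof(struct _cap_struct));
}

/* Release an object previously returned by this library. */
int cap_free(void *obj)
{
    free(obj);
    return 0;
}

/* Assumption: report an empty capability set for the calling process. */
cap_t cap_get_proc(void)
{
    return cap_init();
}

/* Assumption: refuse to change anything, and say so with ENOSYS, so that
   callers can detect the missing support instead of believing it worked. */
int cap_set_proc(cap_t caps)
{
    assert(caps != NULL);
    errno = ENOSYS;
    return -1;
}
```

Programs that merely query their capabilities would then build and run, while programs that depend on actually dropping or raising capabilities get a clear error.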

Having this, actual support for at least some of the capabilities should be implemented, as time permits. This will require some digging into Hurd internals.

Some knowledge of POSIX capabilities will need to be obtained, and for the latter part also some knowledge about the Hurd architecture. This project is probably doable without previous experience with either, though.

David Hedberg applied for this project in 2010, and though he didn't go through with it, he fleshed out many details.

Possible mentors: Samuel Thibault (youpi)

Exercise: Make libcap compile on Debian GNU/Hurd. It doesn't need to actually do anything yet -- just make it build at all for now.

Posted 2009-03-24 18:57:47 CET


Valgrind is an extremely useful debugging tool for memory errors. (And some other kinds of hard-to-find errors too.) Aside from being useful for program development in general, a Hurd port will help finding out why certain programs segfault on the Hurd, although they work on Linux. Even more importantly, it will help finding bugs in the Hurd servers themselves.

To keep track of memory use, however, Valgrind needs to know how each system call affects the validity of memory regions. This knowledge is highly kernel-specific, and thus Valgrind needs to be explicitly ported to every system.

Such a port involves two major steps: making Valgrind understand how kernel traps work in general on the system in question; and how all the individual kernel calls affect memory. The latter step is where most of the work is, as the behaviour of each single system call needs to be described.

Compared to Linux, Mach (the microkernel used by the Hurd) has very few kernel traps. Almost all system calls are implemented as RPCs instead -- either handled by Mach itself, or by the various Hurd servers. All RPCs use a pair of mach_msg() invocations: one to send a request message, and one to receive a reply. However, while all RPCs use the same mach_msg() trap, the actual effect of the call varies greatly depending on which RPC is invoked -- similar to the ioctl() call on Linux. Each request thus must be handled individually.

Unlike ioctl(), the RPC invocations have explicit type information for the parameters though, which can be retrieved from the message header. By analyzing the parameters of the RPC reply message, Valgrind can know exactly which memory regions are affected by that call, even without specific knowledge of the RPC in question. Thus implementing a general parser for the reply messages will already give Valgrind a fairly good approximation of memory validity -- without having to specify the exact semantic of each RPC by hand.

While this should make Valgrind quite usable on the Hurd already, it's not perfect: some RPCs might return a buffer that is only partially filled with valid data; or some reply parameters might be optional, and only contain valid data under certain conditions. Such specific semantics can't be deduced from the message headers alone. Thus for a complete port, it will still be necessary to go through the list of all known RPCs, and implement special handling in Valgrind for those RPCs that need it. Reading the source code of the rpctrace tool would probably be useful to understand how the RPC message can be parsed.
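To give an idea of what the generic reply parser has to do, here is a heavily simplified sketch. It does not use the real Mach headers: the item_desc structure is a made-up stand-in for Mach's type descriptors (which are bit-packed and also have a long form), and the 4-byte alignment rule and the callback interface are assumptions for illustration. In a real port, the callback would be whatever Valgrind primitive marks memory as defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for a Mach type descriptor. */
struct item_desc {
    uint32_t size_bits;          /* size of one element, in bits */
    uint32_t count;              /* number of elements */
    uint32_t is_inline;          /* 1: data follows; 0: a pointer follows */
};

typedef void (*region_fn)(const void *addr, size_t len, void *ctx);

/* Walk the typed items in body[0..body_len) and report every memory
   region the reply carries.  Returns the number of items, or -1 if the
   message is truncated. */
int walk_reply(const unsigned char *body, size_t body_len,
               region_fn report, void *ctx)
{
    size_t off = 0;
    int items = 0;

    assert(body != NULL && report != NULL);
    while (off < body_len) {
        struct item_desc d;
        size_t bytes;

        if (body_len - off < sizeof d)
            return -1;
        memcpy(&d, body + off, sizeof d);
        off += sizeof d;
        bytes = (size_t)d.size_bits / 8 * d.count;

        if (d.is_inline) {
            if (bytes > body_len - off)
                return -1;
            report(body + off, bytes, ctx);      /* data is in the message */
            off += bytes;
            off = (off + 3) & ~(size_t)3;        /* next item is aligned */
        } else {
            const void *ptr;

            if (body_len - off < sizeof ptr)
                return -1;
            memcpy(&ptr, body + off, sizeof ptr);
            report(ptr, bytes, ctx);             /* data lives elsewhere */
            off += sizeof ptr;
        }
        items++;
    }
    return items;
}

/* Sample callback: only counts regions and bytes. */
struct region_total {
    size_t regions;
    size_t bytes;
};

void count_region(const void *addr, size_t len, void *ctx)
{
    struct region_total *t = ctx;

    (void)addr;
    t->regions++;
    t->bytes += len;
}
```

The point of the sketch is that the walker needs no knowledge of which RPC produced the reply. The special cases mentioned above, such as partially filled buffers or optional parameters, are exactly what such a generic walker cannot see.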

The goal of this task is at minimum to make Valgrind grok Mach traps, and to implement the generic RPC handler. Ideally, specific handling for RPCs needing it should also be implemented.

Completing this project will require digging into Valgrind's handling of system calls (in C), and into Hurd RPCs. It is really not an easy task, but a fairly predictable one -- there shouldn't be any unexpected difficulties, and no major design work is necessary. It doesn't require any specific previous knowledge: only very good programming skills in general. On the other hand, the student will obtain a good understanding of Hurd RPCs while working on this task, and thus perfect qualifications for Hurd development in general :-)

Possible mentors: Samuel Thibault (youpi)

Exercise: As a starter, students can try to teach Valgrind a couple of Linux ioctls, as this will show them how to use Valgrind's read/write primitives.

Note: Some work exists in https://github.com/sprkv5/valgrind-hurd/tree/main and would welcome rebasing on a more recent version of valgrind.

Posted 2009-12-17 22:47:56 CET Tags: open issue gnumach open issue hurd


Nowadays the most often encountered cause of Hurd crashes seems to be lockups in the ext2fs server. One of these could be traced recently, and turned out to be a lock inside libdiskfs that was taken and not released in some cases. There is reason to believe that there are more faulty paths causing these lockups.

The task is to systematically check the libdiskfs code for this kind of locking issue. To achieve this, some kind of test harness has to be implemented: for example, instrumenting the code to check locking correctness constantly at runtime, or implementing a unit testing framework that explicitly checks locking in various code paths. (The latter could serve as a template for implementing unit tests in other parts of the Hurd codebase...)
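The runtime instrumentation idea could be sketched along the following lines. This is not libdiskfs code: the checked_lock wrapper, the per-thread counter and the buggy_lookup function are all hypothetical, and the sketch assumes POSIX mutexes and the GCC __thread extension. It shows how a lock that is taken but not released on one code path becomes visible as soon as the function returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper around a mutex that counts, per thread, how many
   checked locks are currently held. */
struct checked_lock {
    pthread_mutex_t mutex;
    const char     *name;        /* for diagnostics only */
};

static __thread int locks_held;  /* __thread is a GCC extension */

void checked_lock_acquire(struct checked_lock *l)
{
    assert(l != NULL);
    pthread_mutex_lock(&l->mutex);
    locks_held++;
}

void checked_lock_release(struct checked_lock *l)
{
    assert(l != NULL);
    assert(locks_held > 0);      /* catches unlock without lock */
    locks_held--;
    pthread_mutex_unlock(&l->mutex);
}

/* Number of checked locks the calling thread still holds. */
int checked_locks_held(void)
{
    return locks_held;
}

/* Example of the bug class described above: the error path returns
   without releasing the lock. */
int buggy_lookup(struct checked_lock *l, int key)
{
    checked_lock_acquire(l);
    if (key < 0)
        return -1;               /* BUG: lock is still held here */
    checked_lock_release(l);
    return 0;
}
```

A test, or a check at the end of each RPC handler, can then verify that checked_locks_held() is back to the value it had on entry. After buggy_lookup fails with a negative key, the count stays at 1, which points directly at the faulty path.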

(A systematic code review would probably suffice to find the existing locking issues; but it wouldn't document the work in terms of actual code produced, and thus it's not suitable for a GSoC project...)

This task requires experience with debugging locking issues in multithreaded applications.

Tools have been written for automated code analysis; these can help to locate and fix such errors.

Possible mentors: Samuel Thibault (youpi)

Exercise: If you could actually track down and fix one of the existing locking errors before the end of the application process, that would be excellent. This might be rather tough though, so probably you need to talk to us about an alternative exercise task...

Posted 2009-03-05 19:20:56 CET Tags: open issue hurd
