summaryrefslogtreecommitdiff
path: root/community/gsoc/project_ideas.mdwn
blob: 8e7a032fb5298ea0761ee968952deea3b6c0772f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
[[meta copyright="Copyright © 2008 Free Software Foundation, Inc."]]

[[meta license="""[[toggle id="license" text="GFDL 1.2+"]][[toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled
[[GNU_Free_Documentation_License|/fdl]]."]]"""]]

This list offers a wide range of tasks: From rather straightforward to pretty
involved ones; from low-level kernel hacking (drivers, VM) to high-level
translators; from fixing bugs, implementing new functionality based on existing
interfaces, to creating totally new mechanisms.

If you have questions regarding the projects, or if there is more than one that
you are interested in and you are unsure which to choose, don't hesitate to
[[contact_us|communication]].


## Bindings to Other Programming Languages

The main idea of the Hurd design is giving users the ability to easily
modify/extend the system's functionality ([[extensible_system|extensibility]]).
This is done by creating [[filesystem_translators|hurd/translator]] and other
kinds of Hurd servers.

However, in practice this is not as easy as it should, because creating
translators and other servers is quite involved -- the interfaces for doing
that are not exactly simple, and available only for C programs. Being able to
easily create simple translators in RAD languages is highly desirable, to
really be able to reap the advantages of the Hurd architecture.

Originally Lisp was meant to be the second system language besides C in the GNU
system; but that doesn't mean we are bound to Lisp. Bindings for any popular
high-level language, that helps quickly creating simple programs, are highly
welcome.

Several approaches are possible when creating such bindings. One way is simply
to provide wrappers to all the available C libraries ([[hurd/libtrivfs]], [[hurd/libnetfs]]
etc.). While this is easy (it requires relatively little consideration), it may
not be the optimal solution. It is preferable to hook in at a lower level, thus
being able te create interfaces that are specially adapted to make good use of
the features available in the respective language.

These more specialised bindings could hook in at some of the lower level
library interfaces ([[hurd/libports]], [[hurd/glibc]], etc.); use the
[[microkernel/mach/MIG]]-provided [[microkernel/mach/RPC]] stubs directly; or
even create native stubs directly from the interface definitions.

The task is to create easy to use Hurd bindings for a language of the student's
choice, and some example servers to prove that it works well in practice. This
project will require gaining a very good understanding of the various Hurd
interfaces. Skills in designing nice programming interfaces are a must.

There has already been some [earlier work on Python
bindings](http://www.sigill.org/files/pytrivfs-20060724-ro-test1.tar.bz2), that
perhaps can be re-used.  Also some work on [Perl
bindings](http://www.nongnu.org/hurdextras/#pith) is availabled.


## Virtualization Using Hurd Mechanisms

The main idea behind the Hurd design is to allow users to replace almost any
system functionality ([[extensible_system|extensibility]]). Any user can easily
create a subenvironment using some custom [[servers|hurd/translator]] instead
of the default system servers. This can be seen as an
[[advanced_lightweight_virtualization|hurd/virtualization]] mechanism, which
allows implementing all kinds of standard and nonstandard virtualization
scenarios.

However, though the basic mechanisms are there, currently it's not easy to make
use of these possibilities, because we lack tools to automatically launch the
desired constellations.

The goal is to create a set of powerful tools for managing at least one
desirable virtualization scenario. One possible starting point could be the
[[hurd/subhurd]]/[[hurd/neighborhurd]] mechanism, which allows a second almost totally
independant instance of the Hurd in parallel to the main one. The current
implementation has serious limitations though. A subhurd can only be started by
root. There are no communication channels between the subhurd and the main one.
There is no mechanism for safe sharing of hardware devices. Fixing this issues
could turn subhurds into a very powerful solution for lightweight
virtualization using so-called logical partitions. (Similar to Linux-vserver,
OpenVZ etc.)

While subhurd allow creating a complete second system instance, with an own set
of Hurd servers and UNIX daemons and all, there are also situations where it is
desirable to have a smaller subenvironment, living withing the main system and
using most of its facilities -- similar to a chroot environment. A simple way
to create such a subenvironment with a single command would be very helpful.

It might be possible to implement (perhaps as a prototype) a wrapper using
existing tools (chroot and [[hurd/translator/unionfs]]); or it might require more specific tools,
like some kind of unionfs-like filesytem proxy that mirrors other parts of the
filesystem, but allows overriding individual locations, in conjuction with
either chroot or some similar mechanism to create a subenvironment with a
different root filesystem.

It's also desirable to have a mechanism allowing a user to set up such a custom
environment in a way that it will automatically get launched on login --
practically allowing the user to run a customized operating system in his own
account.

Yet another interesting scenario would be a subenvironment -- using some kind
of special filesystem proxy again -- in which the user serves as root, being
able to create local sub-users and/or sub-groups.

This would allow the user to run "dangerous" applications (webbrowser, chat
client etc.) in a confined fashion, allowing it access to only a subset of the
user's files and other resources. (This could be done either using a lot of
groups for individual resources, and lots of users for individual applications;
adding a user to a group would give the corresponding application access to the
corresponding resource -- an advanced [[ACL]] mechanism. Or leave out the groups,
assigning the resources to users instead, and use the Hurd's ability for a
process to have multiple user IDs, to equip individual applications with sets
of user IDs giving them access to the necessary resources -- basically a
[[capability]] mechanism.)

The student will have to pick (at least) one of the described scenarios -- or
come up with some other one in a similar spirit -- and implement all the tools
(scripts, translators) necessary to make it available to users in an
easy-to-use fashion. While the Hurd by default already offers the necessary
mechanisms for that, these are not perfect and could be further refined for
even better virtualization capabilities. Should need or desire for specific
improvements in that regard come up in the course of this project, implementing
these improvements can be considered part of the task.

Completing this project will require gaining a very good understanding of the
Hurd architecture and spirit. Previous experience with other virtualization
solutions would be very helpful.


## Namspace-based Translator Selection

The main idea behind the Hurd is to make (almost) all system functionality
user-modifiable ([[extensible_system|extensibility]]).  This includes a
user-modifiable filesystem: the whole filesystem is implemented decentrally, by
a set of filesystem servers forming the directory tree together, a
[[hurd/virtual_file_system]].  These filesystem servers are called
[[translators|hurd/translator]], and are the most visible feature of the Hurd.

The reason they are called translators is because when you set a translator on
a filesystem node, the underlying node(s) are hidden by the translator, but the
translator itself can access them, and present their contents in a different
format -- translate them.  A simple example is a
[[gunzip_translator|hurd/translator/storeio]], which can be set on a gzipped
file, and presents a virtual file with the uncompressed contents.  Or the other
way around.  Or a translator that presents an
[[XML_file_as_a_directory_tree|hurd/translator/xmlfs]].  Or an mbox as a set of
individual files for each mail ([[hurd/translator/mboxfs]]); or ever further
breaking it down into headers, body, attachements...

This gets even more powerful when translators are used as building blocks for
larger applications: A mail reader for example doesn't need backends for
understanding various mailbox formats anymore. All formats can be parsed by
special translators, and the mail reader gets the data as a uniform, directly
usable filesystem structure. Translators can also be stacked: If you have a
compressed mailbox for example, first apply a gunzip translator, and then an
mbox translator on top of that.

There are a few problems with the way translators are set, though. For one,
once a translator is set on a node, you always see the translated content. If
you need the untranslated contents again, to do a backup for example, you first
need to remove the translator again. Also, having to set a translator
explicitely before accessing the contents is pretty cumbersome, making this
feature almost useless.

A possible solution is implementing a mechanism for selecting translators
through special filename attributes.  For example you could use
`index.html.gz,,+` and `index.html.gz,,-` to choose between translated and
untranslated versions of a file.  Or you could use `index.html.gz,,u` to get
the contents of the file with a gunzip translator applied automatically.  You
could also use attributes on whole directory trees: `.,,0/` would give you a
directory tree corresponding to the current directory, but with any translators
disabled, for doing a backup.  And `site,,u/*.html.gz` would present a whole
directory tree of compressed HTML files as uncompressed files.

One benefit of the Hurd's flexibility is that it should be possible to
implement such a mechanism without touching the existing Hurd components:
Rather, just implement a special proxy, that mirrors the normal filesystem, but
is able to interpret the special extensions and present transformed files in
place of the original ones.

In the long run it's probably desirable to have the mechanism implemented in
the standard name lookup mechanism, so it will be available globally, and avoid
the overhead of a proxy; but for the beginnig the proxy solution is much more
flexible.

The goal of this project is implementing a prototype proxy; perhaps also a
first version of the global variant as proof of concept, if time permits. It
requires good understanding of the name lookup mechanism, and translator
programming; but the implementation should not be too hard. Perhaps the hardest
part is finding a convenient, flexible, elegant, hurdish method for mapping the
special extensions to actual translators...


## Fix File Locking

Over the years, UNIX has aquired a host of different file locking mechanisms.
Some of them work on the Hurd, while others are buggy or only partially
implemented. This breaks many applications.

The goal is to make all file locking mechanisms work properly. This requires
finding all existing shortcomings (through systematic testing and/or checking
for known issues in the bug tracker and mailing list archives), and fixing
them.

This task will require digging into parts of the code to understand how file
locking works on the Hurd. Only general programming skills are required.


## `procfs`

Although there is no standard (POSIX or other) for the layout of the `/proc`
pseudo-filesystem, it turned out a very useful facility in GNU/Linux and other
systems, and many tools concerned with process management use it.  (`ps`, `top`,
`htop`, `gtop`, `killall`, `pkill`, ...)

Instead of porting all these tools to use [[hurd/libps]] (Hurd's official method for
accessing process information), they could be made to run out of the box, by
implementing a Linux-compatible `/proc` filesystem for the Hurd.

The goal is to implement all `/proc` functionality needed for the various process
management tools to work.  (On Linux, the `/proc` filesystem is used also for
debugging purposes; but this is highly system-specific anyways, so there is
probably no point in trying to duplicate this functionality as well...)

The [[existing_partially_working_procfs_implementation|hurd/translator/procfs]]
can serve as a starting point, but needs to be largely rewritten.  (It should
use [[hurd/libnetfs]] rather than [[hurd/libtrivfs]]; the data format needs to
change to be more Linux-compatible; and it needs adaptation to newer system
interfaces.)

This project requires learning [[hurd/translator]] programming, and
understanding some of the internals of process management in the Hurd.  It
should not be too hard coding-wise; and the task is very nicely defined by the
exising Linux `/proc` interface -- no design considerations necessary.


## New Driver Glue Code

Although a driver framework in userspace would be desirable, presently the Hurd
uses kernel drivers in the microkernel,
[[GNU_Mach|microkernel/mach/gnumach]]. (And changing this would be far beyond a
GSoC project...)

The problem is that the drivers in GNU Mach are presently old Linux drivers
(mostly from 2.0.x) accessed through a glue code layer. This is not an ideal
solution, but works quite OK, except that the drivers are very old. The goal of
this project is to redo the glue code, so we can use drivers from current Linux
versions, or from one of the free BSD variants.

This is a doable, but pretty involved project. Experience with driver
programming under Linux (or BSD) is a must. (No Hurd-specific knowledge is
required, though.)

An alternative approach would be to use Xen's high-level driver interface
ala mini-os.  In the long term, it is likely this is easiest to maintain.

This is [[GNU_Savannah_task 5488]].


## Server Overriding Mechanism

The main idea of the Hurd is that every user can influence almost all system
functionality ([[extensible_system|extensibility]]), by running private Hurd
servers that replace or proxy the global default implementations.

However, running such a cumstomized subenvironment presently is not easy,
because there is no standard mechanism to easily replace an individual standard
server, keeping everything else.  (Presently there is only the [[hurd/subhurd]]
method, which creates a completely new system instance with a completely
independent set of servers.)

The goal of this project is to provide a simple method for overriding
individual standard servers, using environment variables, or a special
subshell, or something like that.

Various approaches for such a mechanism has been discussed before.
Probably the easiest (1) would be to modify the Hurd-specific parts of [[hurd/glibc]],
which are contacting various standard servers to implement certain system
calls, so that instead of always looking for the servers in default locations,
they first check for overrides in environment variables, and use these instead
if present.

A somewhat more generic solution (2) could use some mechanism for arbitrary
client-side namespace overrides. The client-side part of the filename lookup
mechanism would have to check an override table on each lookup, and apply the
desired replacement whenever a match is found.

Another approach would be server-side overrides. Again there are various
variants. The actual servers themself could provide a mechanism to redirect to
other servers on request. (3) Or we could use some more generic server-side
namespace overrides: Either all filesystem servers could provide a mechanism to
modify the namespace they export to certain clients (4), or proxies could be
used that mirror the default namespace but override certain locations. (5)

Variants (4) and (5) are the most powerful. They are intimately related to
chroots: (4) is like the current chroot implementation works in the Hurd, and
(5) has been proposed as an alternative. The generic overriding mechanism could
be implemented on top of chroot, or chroot could be implemented on top of the
generic overriding mechanism. But this is out of scope for this project...

In practice, probably a mix of the different approaches would prove most useful
for various servers and use cases. It is strongly recommended that the student
starts with (1) as the simplest approach, perhaps augmenting it with (3) for
certain servers that don't work with (1) because of indirect invocation.

This tasks requires some understanding of the Hurd internals, especially a good
understanding of the file name lookup mechanism. It's probably not too heavy on
the coding side.

This is [[GNU_Savannah_task 6612]].  Also there are quite a bit of emails
discussing this topic, from a last year's GSoC application.  <!-- TODO.  Link
to those.  -->


## `dtrace` Support

One of the main problems of the current Hurd implementation is very poor
performance. While we have a bunch of ideas what could cause the performance
problems, these are mostly just guesses. Better understanding what really
causes bad performance is necessary to improve the situation.

For that, we need tools for performance measurements. While all kinds of more
or less specific profiling tools could be convieved, the most promising and
generic approach seems to be a framework for logging certain events in the
running system (both in the microkernel and in the Hurd servers). This would
allow checking how much time is spent in certain modules, how often certain
situations occur, how things interact, etc. It could also prove helpful in
debugging some issues that are otherwise hard to find because of complex
interactions.

The most popular framework for that is Sun's dtrace; but there might be others.
The student has to evaluate the existing options, deciding which makes most
sense for the Hurd; and implement that one. (Apple's implementation of dtrace
in their Mach-based kernel might be helpful here...)

This project requires ability to evaluate possible solutions, and experience
with integrating existing components as well as low-level programming.


## Hurdish TCP/IP Stack

The Hurd presently uses a [[TCP/IP_stack|hurd/translator/pfinet]] based on code from an old Linux version.
This works, but lacks some rather important features (like PPP/PPPoE), and the
design is not hurdish at all.

A true hurdish network stack will use a set of stack of [[hurd/translator]] processes,
each implementing a different protocol layer. This way not only the
implementation gets more modular, but also the network stack can be used way
more flexibly. Rather than just having the standard socket interface, plus some
lower-level hooks for special needs, there are explicit (perhaps
filesystem-based) interfaces at all the individual levels; special application
can just directly access the desired layer. All kinds of packet filtering,
routing, tunneling etc. can be easily achieved by stacking compononts in the
desired constellation.

While the general architecture is pretty much given by the various network
layers, it's up to the student to design and implement the various interfaces
at each layer. This task requires understanding the Hurd philosophy and
translator programming, as well as good knowledge of TCP/IP. 

This is [[GNU_Savannah_task 5469]].


## Improved NFS Implementation

The Hurd has both NFS server and client implementations, which work, but not
very well: File locking doesn't work properly (at least in conjuction with a
GNU/Linux server), and performance is extremely poor. Part of the problems
could be owed to the fact that only NFSv2 is supported so far.

This project encompasses implementing NFSv3 support, fixing bugs and
performance problems -- the goal is to have good NFS support. The work done in
a previous unfinished GSoC project can serve as a starting point.

Both client and server parts need work, though the client is probably much more
important for now, and shall be the major focus of this project.

This task, [[GNU_Savannah_task 5497]], has no special prerequisites besides general programming skills, and
an interest in file systems and network protocols.


## Fix `libdiskfs` Locking Issues

Nowadays the most often encountered cause of Hurd crashes seems to be lockups
in the [[hurd/translator/ext2fs]] server.  One of these could be traced
recently, and turned out to be a lock inside [[hurd/libdiskfs]] that was taken
and not released in some cases.  There is reason to believe that there are more
faulty paths causing these lockups.

The task is systematically checking the [[hurd/libdiskfs]] code for this kind of locking
issues. To achieve this, some kind of test harness has to be implemented: For
exmple instrumenting the code to check locking correctness constantly at
runtime. Or implementing a unit testing framework that explicitely checks
locking in various code paths. (The latter could serve as a template for
implementing unit checks in other parts of the Hurd codebase...)

This task requires experience with debugging locking issues in multithreaded
applications.


## Convert Hurd Libraries and Servers to pthreads

The Hurd was originally created at a time when the [pthreads
standard](http://www.opengroup.org/onlinepubs/009695399/basedefs/pthread.h.html)
didn't exist yet.  Thus all Hurd servers and libraries are using the old
[[cthreads|hurd/libcthreads]] package that came with [[microkernel/Mach]],
which is not compatible with [[pthreads|hurd/libpthread]].

Not only does that mean that people hacking on Hurd internals have to deal with
a non-standard thread package, which nobody is familiar with. Although a
pthreads implementation for the Hurd was created in the meantime, it's not
possible to use both cthreads and pthreads in the same program. Consequently,
pthreads can't presently be used in any Hurd servers -- including translators.

Some work already has been done once on converting the Hurd servers and
libraries to use pthreads, but that work hasn't been finished.  It is available
as [[GNU_Savannah_task 5487]] and can of course be used to base the new work
upon.

The goal of this project is to have all the Hurd code use pthreads. Should any
limitations in the existing pthreads implementation turn up that hinder this
transition, they will have to be fixed as well.

One possible option is creating a wrapper that implements the cthreads
interfaces on top of pthreads, to ease the transition -- but it might very well
turn out that it's easier to just change all the existing code to use pthreads
directly.  This is up to the student.  Such a wrapper has been proposed as
[[GNU_Savannah_task 7895]] and its implementation would be a useful
starting-point.

This project requires relatively little Hurd-specific knowledge. Experience
with multithreaded programming in general and pthreads in particular is
required, though.


## Sound Support

The Hurd presently has no sound support.  Fixing this, [[GNU_Savannah_task
5485]], requires two steps: the first is to port some other kernel's drivers to
[[GNU_Mach|microkernel/mach/gnumach]] so we can get access to actual sound
hardware.  The second is to implement a userspace server ([[hurd/translator]]),
that implements an interface on top of the kernel device that can be used by
applications -- probably OSS or maybe ALSA.

Completing this task requires porting at least one driver (e.g. from Linux) for
a popular piece of sound hardware, and the basic userspace server. For the
driver part, previous experience with programming kernel drivers is strongly
advisable. The userspace part requires some knowledge about programming Hurd
translators, but shouldn't be too hard.

Once the basic support is working, it's up to the student to use the remaining
time for porting more drivers, or implementing a more sophisticated userspace
infrastructure. The latter requires good understanding of the Hurd philosophy,
to come up with an appropriate design.

Another option would be to evaluate whether a driver that is completely running
in user-space is feasible.  <!-- TODO.  Elaborate.  -->


## Disk I/O Performance Tuning

The most obvious reason for the Hurd feeling slow compared to mainstream
systems like GNU/Linux, is very slow harddisk access.

The reason for this slowness is lack and/or bad implementation of common
optimisation techniques, like scheduling reads and writes to minimalize head
movement; effective block caching; effective reads/writes to partial blocks;
reading/writing multiple blocks at once; and read-ahead.  The
[[ext2_filesystem_server|hurd/translator/ext2fs]] might also need some
optimisations at a higher logical level.

The goal of this project is to analyze the current situation, and implement/fix
various optimisations, to achieve significantly better disk performance. It
requires understanding the data flow through the various layers involved in
disk acces on the Hurd ([[filesystem|hurd/virtual_file_system]],
[[pager|hurd/libpager]], driver), and general experience with
optimising complex systems.  That said, the killing feature we are definitely
missing is the read-ahead, and even a very simple implementation would bring
very big performance speedups.


## VM Tuning

Hurd/[[microkernel/Mach]] presently make very bad use of the available physical memory in the
system. Some of the problems are inherent to the system design (the kernel
can't distinguish between important application data and discardable disk
buffers for example), and can't be fixed without fundamental changes. Other
problems however are an ordinary lack of optimisation, like extremely crude
heuristics when to start paging. Many parameters are based on assumptions from
a time when typical machines had like 16 MiB of RAM, or simply have been set to
arbitrary values and never tuned for actual use.

The goal of this project is to bring the virtual memory management in Hurd/Mach
closer to that of modern mainstream kernels (Linux, FreeBSD), by comparing the
implementation to other systems, implementing any worthwhile improvements, and
general optimisation/tuning. It requires very good understanding of the Mach
VM, and virtual memory in general.

This project is related to [[GNU_Savannah_task 5489]].


## `mtab`

In traditional monolithic system, the kernel keeps track of all mounts; the
information is available through `/proc/mounts` (on Linux at least), and in a
very similar form in `/etc/mtab`.

The Hurd on the other hand has a totally
[[decentralized_file_system|hurd/virtual_file_system]].  There is no single
entity involved in all mounts.  Rather, only the parent file system to which a
mountpoint ([[hurd/translator]]) is attached is involved.  As a result, there
is no central place keeping track of mounts.

As a consequence, there is currently no easy way to obtain a listing of all
mounted file systems. This also means that commands like `df` can only work on
explicitely specified mountpoints, instead of displaying the usual listing.

One possible solution to this would be for the translator startup mechanism to
update the `mtab` on any `mount`/`unmount`, like in traditional systems.
However, there are same problems with this approach.  Most notably: what to do
with passive translators, i.e., translators that are not presently running, but
set up to be started automatically whenever the node is accessed?  Probably
these should be counted an among the mounted filesystems; but how to handle the
`mtab` updates for a translator that is not started yet?  Generally, being
centralized and event-based, this is a pretty unelegant, non-hurdish solution.

A more promising approach is to have `mtab` exported by a special translator,
which gathers the necessary information on demand.  This could work by
traversing the tree of translators, asking each one for mount points attached
to it.  (Theoretically, it could also be done by just traversing *all* nodes,
checking each one for attached translators.  That would be very inefficient,
though.  Thus a special interface is probably required, that allows asking a
translator to list mount points only.)

There are also some other issues to keep in mind.  Traversing arbitrary
translators set by other users can be quite dangerous -- and it's probably not
very interesting anyways what private filesystems some other user has mounted.
But what about the global `/etc/mtab`?  Should it list only root-owned
filesystems?  Or should it create different listings depending on what user
contacts it?...

That leads to a more generic question: which translators should be actually
listed?  There are different kinds of translators: ranging from traditional
filesystems ([[disks|hurd/libdiskfs]] and other actual
[[stores|hurd/translator/storeio]]), but also purely virtual filesystems like
[[hurd/translator/ftpfs]] or [[hurd/translator/unionfs]], and even things that
have very little to do with a traditional filesystem, like a
[[gzip_translator|hurd/translator/storeio]],
[[mbox_translator|hurd/translator/mboxfs]],
[[xml_translator|hurd/translator/xmlfs]], or various device file translators...
Listing all of these in `/etc/mtab` would be pretty pointless, so some kind of
classification mechanism is necessary.  By default it probably should list only
translators that claim to be real filesystems, though alternative views with
other filtering rules might be desirable.

After taking decisions on the outstanding design questions, the student will
implement both the actual [[mtab_translator|hurd/translator/mtabfs]], and the
necessery interface(s) for gathering the data.  It requires getting a good
understanding of the translator mechanism and Hurd interfaces in general.


## GNU Mach Code Cleanup

Although there are some attempts to move to a more modern microkernel
alltogether, the current Hurd implementation is based on
[[GNU_Mach|microkernel/mach/gnumach]], which is only a slightly modified
variant of the original CMU [[microkernel/Mach]].

Unfortunately, Mach was created about two decades ago, and is in turn based on
even older BSD code.  Parts of the BSD kernel -- file systems, UNIX mechanisms
like processes and signals, etc. -- were ripped out (to be implemented in
[[userspace_servers|hurd/translator]] instead); while other mechanisms were
added to allow implementing stuff in userspace.
([[Pager_interface|microkernel/mach/external_pager_mechanism]],
[[microkernel/mach/IPC]], etc.)

Also, Mach being a research project, many things were tried, adding lots of
optional features not really needed.

The result of all this is that the current code base is in a pretty bad shape.
It's rather hard to make modifications -- to make better use of modern hardware
for example, or even to fix bugs.  The goal of this project is to improve the
situation.

The task starts out easy, with fixing compiler warnings.  Later it moves on to
more tricky things: removing dead or unneeded code paths; restructuring code
for readability and maintainability.

This task requires good knowledge of C, and experience with working on a large
existing code base.  Previous kernel hacking experience is an advantage, but
not really necessary.


## `xmlfs`

Hurd [[translators|hurd/translator]] allow presenting underlying data in a
different format.  This is a very powerful ability: it allows using standard
tools on all kinds of data, and combining existing components in new ways, once
you have the necessary translators.

A typical example for such a translator would be xmlfs: a translator that
presents the contents of an underlying XML file in the form of a directory
tree, so it can be studied and edited with standard filesystem tools, or using
a graphical file manager, or to easily extract data from an XML file in a
script etc.

The exported directory tree should represent the DOM structure of the document,
or implement XPath, or both, or some combination thereof (perhaps XPath could
be implemented as a second translator working on top of the DOM one) --
whatever works well, while sticking to XML standards as much as possible.

Ideally, the translation should be reversible, so that another, complementary
translator applied on the expanded directory tree would yield the original XML
file again; and also the other way round, applying the complementary translator
on top of some directory tree and xmlfs on top of that would yield the original
directory again.  However, with the different semantics of directory trees and
XML files, it might not be possible to create such a universal mapping.  Thus
it is a desirable goal, but not a strict requirement.

The goal of this project is to create a fully usable XML translator, that
allows both reading and writing any XML file.  Implementing the complementary
translator also would be nice if time permits, but is not mandatory part of the
task.

The [[existing_partial_(read-only)_xmlfs_implementation|hurd/translator/xmlfs]]
can serve as a starting point.

This task requires pretty good designing skills.  Good knowledge of XML is also
necessary.  Learning translator programming will obviously be necessary to
complete the task.


## Allow Using `unionfs` Early at Boot

In UNIX systems, traditionally most software is installed in a common directory
hierachy, where files from various packages live beside each other, grouped by
function: user-invokable executables in `/bin`, system-wide configuration files
in `/etc`, architecture specific static files in `/lib`, variable data in
`/var`, and so on.  To allow clean installation, deinstallation, and upgrade of
software packages, GNU/Linux distributions usually come with a package manager,
which keeps track of all files upon installation/removal in some kind of
central database.

An alternative approach is the one implemented by GNU Stow: each package is
actually installed in a private directory tree.  The actual standard directory
structure is then created by collecting the individual files from all the
packages, and presenting them in the common `/bin`, `/lib`, etc. locations.

While the normal Stow package (for traditional UNIX systems) uses symlinks to
the actual files, updated on installation/deinstallation events, the Hurd
[[hurd/translator]] mechanism allows a much more elegant solution:
[[hurd/translator/stowfs]] (which is actually a special mode of
[[hurd/translator/unionfs]]) creates virtual directories on the fly, composed
of all the files from the individual package directories.

The problem with this approach is that unionfs presently can be launched only
once the system is booted up, meaning the virtual directories are not available
at boot time.  But the boot process itself already needs access to files from
various packages.  So to make this design actually usable, it is necessary to
come up with a way to launch unionfs very early at boot time, along with the
root filesystem.

Completing this task will require gaining a very good understanding of the Hurd
boot process and other parts of the design.  It requires some design skills
also to come up with a working mechanism.


## Fix `tmpfs`

In some situations it is desirable to have a file system that is not backed by
actual disk storage, but only by anonymous memory, i.e. lives in the RAM (and
possibly swap space).

A simplistic way to implement such a memory filesystem is literally creating a
ramdisk, i.e. simply allocating a big chunck of RAM (called a memory store in
Hurd terminology), and create a normal filesystem like ext2 on that.  However,
this is not very efficient, and not very convenient either (the filesystem
needs to be recreated each time the ramdisk is invoked).  A nicer solution is
having a real [[hurd/translator/tmpfs]], which creates all filesystem
structures directly in RAM, allocating memory on demand.

The Hurd has had such a tmpfs for a long time.  However, the existing
implementation doesn't work anymore -- it got broken by changes in other parts
of the Hurd design.

There are several issues.  The most serious known problem seems to be that for
technical reasons it receives [[microkernel/mach/RPC]]s from two different
sources on one [[microkernel/mach/port]], and gets mixed up with them.  Fixing
this is non-trivial, and requires a good understanding of the involved
mechanisms.

The goal of this project to get a fully working, full featured tmpfs
implementation.  It requires digging into some parts of the Hurd, incuding the
[[pager_interface|hurd/libpager]] and [[hurd/translator]] programming.  This
task probably doesn't require any design work, only good debugging skills.


## Lexical `..` Resolution

For historical reasons, UNIX filesystems have a real (hard) `..` link from each
directory pointing to its parent. However, this is problematic, because the
meaning of "parent" really depends on context. If you have a symlink for
example, you can reach a certain node in the filesystem by a different path. If
you go to `..` from there, UNIX will traditionally take you to the hard-coded
parent node -- but this is usually not what you want. Usually you want to go
back to the logical parent from which you came. That is called "lexical"
resolution.

Some application already use lexical resolution internally for that reason. It
is generally agreed that many problems could be avoided if the standard
filesystem lookup calls used lexical resolution as well. The compatibility
problems probably would be negligable.

The goal of this project is to modify the filename lookup mechanism in the Hurd
to use lexical resolution, and to check that the system is still fully
functional afterwards. This task requires understanding the filename resolution
mechanism. It's probably a relatively easy task.

See also [[GNU_Savannah_bug 17133]].


## Secure `chroot` implementation

As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a
`chroot()` system call.  However, the current implementation is not really
good, as it allows easily escaping the `chroot`, for example by use of
[[passive_translators|hurd/translator]].

Many solutions have been suggested for this problem -- ranging from simple
workaround changing the behaviour of passive translators in a `chroot`;
changing the context in which passive translators are exectuted; changing the
interpretation of filenames in a chroot; to reworking the whole passive
translator mechanism.  Some involving a completely different approch to
`chroot` implementation, using a proxy instead of a special system call in the
filesystem servers.

The task is to pick and implement one approach for fixing chroot.

This task is pretty heavy: it requires a very good understanding of file name
lookup and the translator mechanism, as well as of security concerns in general
-- the student must prove that he really understands security implications of
the UNIX namespace approach, and how they are affected by the introduction of
new mechanisms.  (Translators.)  More important than the acualy code is the
documentation of what he did: he must be able to defend why he chose a certain
approach, and explain why he believes this approach really secure.


## Hurdish Package Manager for the GNU System

Most GNU/Linux systems use pretty sophisticated package managers, to ease the
management of installed software.  These keep track of all installed files, and
various kinds of other necessary information, in special databases.  On package
installation, deinstallation, and upgrade, scripts are used that make all kinds
of modifications to other parts of the system, making sure the packages get
properly integrated.

This approach creates various problems.  For one, *all* management has to be
done with the distribution package management tools, or otherwise they would
loose track of the system state.  This is reinforced by the fact that the state
information is stored in special databases, that only the special package
management tools can work with.

Also, as changes to various parts of the system are made on certain events
(installation/deinstallation/update), managing the various possible state
transitions becomes very complex and bug-prone.

For the official (Hurd-based) GNU system, a different approach is intended:
making use of Hurd [[translators|hurd/translator]] -- more specifically their
ability to present existing data in a different form -- the whole system state
will be created on the fly, directly from the information provided by the
individual packages.  The visible system state is always a reflection of the
sum of packages installed at a certain moment; it doesn't matter how this state
came about.  There are no global databases of any kind.  (Some things might
require caching for better performance, but this must happen transparently.)

The core of this approach is formed by [[hurd/translator/stowfs]], which
creates a traditional unix directory structure from all the files in the
individual package directories.  But this only handles the lowest level of
package management.  Additional mechanisms are necessary to handle stuff like
dependencies on other packages.

The goal of this task is to create these mechanisms.


## Port the Debian Installer to the Hurd

The primary means of distributing the Hurd is through Debian GNU/Hurd.
However, the installation CDs presently use an ancient, non-native installer.
The situation could be much improved by making sure that the newer *Debian
Installer* works on the Hurd.

Some preliminary work has been done, see
<http://wiki.debian.org/DebianInstaller/Hurd>.

The goal is to have the Debian Installer fully working on the Hurd.  It
requires relatively little Hurd-specific knowledge.