community/gsoc/project_ideas.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278

[[meta copyright="Copyright © 2008 Free Software Foundation, Inc."]]

[[meta license="""[[toggle id="license" text="GFDL 1.2+"]][[toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled
[[GNU_Free_Documentation_License|/fdl]]."]]"""]]

* translator stacking mechanism

One of the major strengths of the Hurd design is that not only it uses a
modular (multiserver) architecture internally, but also encourages a similar
modular architecture at application level. Complex applications can be created
intuitively from simple components by combining them through translator
stacking -- similar to traditional UNIX pipes, but much more suitable for
complex interaction and structured data.

The downside is that communication between the components with filesystem- or
other RPC interfaces imposes a relatively high overhead: Not only are the
actual IPC operations relatively slow in Mach, but also communication is
generally more complicated because of the constraints of an RPC interface.

In some cases, like network stacks for example, the overhead might prove a
serious problem; a mechanism that allows cutting down on it by more or less
transparently, taking a shortcut in certain situations to avoid actual RPC, is
highly desirable.

The work on libchannel as a special-purpose translator stacking mechanism as
last year's GSoC project yielded some very interesting ideas for a more generic
translator stacking framework.

(links)

The task here is to follow up on the previous work, creating a framework based
on these ideas, and implementing or porting some translators based on this
framework to prove it's applicability in practice. It is up to the student to
decide whether it's better to start with the previous libchannel code, turning
it into something more generic; or starting from scratch, using libchannel (and
libstore) only for reference.

This task is pretty involved. The architecture of the stacking framework has
been discussed before; but the student needs to design the various actual
interfaces for the framework, and implement all of them.

* sound support

The Hurd presently has no sound support. Fixing this requires two steps: One is
to port kernel drivers so we can get access to actual sound hardware. The
second is to implement a userspace server (translator), that implements an
interface on top of the kernel device that can be used by applications --
probably OSS or maybe ALSA.

Completing this task requires porting at least one driver (e.g. from Linux) for
a popular piece of sound hardware, and the basic userspace server. For the
driver part, previous experience with programming kernel drivers is strongly
advisable. The userspace part requires some knowledge about programming Hurd
translators, but shouldn't be too hard.

Once the basic support is working, it's up to the student to use the remaining
time for porting more drivers, or implementing a more sophisticated userspace
infrastructure. The latter requires good understanding of the Hurd philosophy,
to come up with an appropriate design.

(links)

* hurdish TCP/IP stack

The Hurd presently uses a TCP/IP stack based on code from an old Linux version.
This works, but lacks some rather important features (like PPP/PPPoE), and the
design is not hurdish at all.

A true hurdish network stack will use a set of stack of translator processes,
each implementing a different protocol layer. This way not only the
implementation gets more modular, but also the network stack can be used way
more flexibly. Rather than just having the standard socket interface, plus some
lower-level hooks for special needs, there are explicit (perhaps
filesystem-based) interfaces at all the individual levels; special application
can just directly access the desired layer. All kinds of packet filtering,
routing, tunneling etc. can be easily achieved by stacking compononts in the
desired constellation.

While the general architecture is pretty much given by the various network
layers, it's up to the student to design and implement the various interfaces
at each layer. This task requires understanding the Hurd philosophy and
translator programming, as well as good knowledge of TCP/IP. 

(links)

* new driver glue code

Although a driver framework in userspace would be desirable, presently the Hurd
uses kernel drivers in the microkernel, gnumach. (And changing this would be
far beyond a GSoC project...)

The problem is that the drivers in gnumach are presently old Linux drivers
(mostly from 2.0.x) accessed through a glue code layer. This is not an ideal
solution, but works quite OK, except that the drivers are very old. The goal of
this project is to redo the glue code, so we can use drivers from current Linux
versions, or from one of the free BSD variants.

This is a doable, but pretty involved project. Experience with driver
programming under Linux (or BSD) is a must. (No Hurd-specific knowledge is
required, though.)

(links)

* server overriding mechanism

The main idea of the Hurd is that every user can influence almost all system
functionality, by running private Hurd servers that replace or proxy the global
default implementations.

However, running such a cumstomized subenvironment presently is not easy,
because there is no standard mechanism to easily replace an individual standard
server, keeping everything else. (Presently there is only the subhurd method,
which creates a completely new system instance with a completely independant
set of servers.)

The goal of this project is to provide a simple method for overriding
individual standard servers, using environment variables, or a special
subshell, or something like that.

Various approaches for such a mechanism has been discussed before. (links)
Probably the easiest (1) would be to modify the Hurd-specific parts of glibc,
which are contacting various standard servers to implement certain system
calls, so that instead of always looking for the servers in default locations,
they first check for overrides in environment variables, and use these instead
if present.

A somewhat more generic solution (2) could use some mechanism for arbitrary
client-side namespace overrides. The client-side part of the filename lookup
mechanism would have to check an override table on each lookup, and apply the
desired replacement whenever a match is found.

Another approach would be server-side overrides. Again there are various
variants. The actual servers themself could provide a mechanism to redirect to
other servers on request. (3) Or we could use some more generic server-side
namespace overrides: Either all filesystem servers could provide a mechanism to
modify the namespace they export to certain clients (4), or proxies could be
used that mirror the default namespace but override certain locations. (5)

Variants (4) and (5) are the most powerful. They are intimately related to
chroots: (4) is like the current chroot implementation works in the Hurd, and
(5) has been proposed as an alternative. The generic overriding mechanism could
be implemented on top of chroot, or chroot could be implemented on top of the
generic overriding mechanism. But this is out of scope for this project...

In practice, probably a mix of the different approaches would prove most useful
for various servers and use cases. It is strongly recommended that the student
starts with (1) as the simplest approach, perhaps augmenting it with (3) for
certain servers that don't work with (1) because of indirect invocation.

This tasks requires some understanding of the Hurd internals, especially a good
understanding of the file name lookup mechanism. It's probably not too heavy on
the coding side.

* secure chroot implementation

As the Hurd attempts to be (almost) fully UNIX-compatible, it also implements a
chroot() system call. However, the current implementation is not really good,
as it allows easily escaping the chroot, for example by use of passive
translators.

Many solutions have been suggested for this problem -- ranging from simple
workaround changing the behaviour of passive translators in a chroot; changing
the context in which passive translators are exectuted; changing the
interpretation of filenames in a chroot; to reworking the whole passive
translator mechanism. Some involving a completely different approch to chroot
implementation, using a proxy instead of a special system call in the
filesystem servers. (links)

The task is to pick and implement one approach for fixing chroot.

This task is pretty heavy: It requires a very good understanding of file name
lookup and the translator mechanism, as well as of security concerns in general
-- the student must prove that he really understands security implications of
the UNIX namespace approach, and how they are affected by the introduction of
new mechanisms. (Translators.) More important than the acualy code is the
documentation of what he did: He must be able to defend why he chose a certain
approach, and explain why he believes this approach really secure.

* lexical dot-dot resolution

For historical reasons, UNIX filesystems have a real (hard) .. link from each
directory pointing to its parent. However, this is problematic, because the
meaning of "parent" really depends on context. If you have a symlink for
example, you can reach a certain node in the filesystem by a different path. If
you go to .. from there, UNIX will traditionally take you to the hard-coded
parent node -- but this is usually not what you want. Usually you want to go
back to the logical parent from which you came. That is called "lexical"
resolution.

Some application already use lexical resolution internally for that reason. It
is generally agreed that many problems could be avoided if the standard
filesystem lookup calls used lexical resolution as well. The compatibility
problems probably would be negligable.

The goal of this project is to modify the filename lookup mechanism in the Hurd
to use lexical resolution, and to check that the system is still fully
functional afterwards. This task requires understanding the filename resolution
mechanism. It's probably a relatively easy task.

* namspace based translator selection

The main idea behind the Hurd is to make (almost) all system functionality
user-modifiable. This includes a user-modifiable filesystem: The whole
filesystem is implemented decentrally, by a set of filesystem servers forming
the directory tree together. These filesystem servers are called translators,
and are the most visible feature of the Hurd.

The reason they are called translators is because when you set a translator on
a filesystem node, the underlying node(s) are hidden by the translator, but the
translator itself can access them, and present their contents in a different
format -- translate them. A simple example is a gunzip translator, which can be
set on a gzipped file, and presents a virtual file with the uncompressed
contents. Or the other way around. Or a translator that presents an XML file as
a directory tree. Or an mbox as a set of individual files for each mail; or
ever further breaking it down into headers, body, attachements...

This gets even more powerful when translators are used as building blocks for
larger applications: A mail reader for example doesn't need backends for
understanding various mailbox formats anymore. All formats can be parsed by
special translators, and the mail reader gets the data as a uniform, directly
usable filesystem structure. Translators can also be stacked: If you have a
compressed mailbox for example, first apply a gunzip translator, and then an
mbox translator on top of that.

There are a few problems with the way translators are set, though. For one,
once a translator is set on a node, you always see the translated content. If
you need the untranslated contents again, to do a backup for example, you first
need to remove the translator again. Also, having to set a translator
explicitely before accessing the contents is pretty cumbersome, making this
feature almost useless.

A possible solution is implementing a mechanism for selecting translators
through special filename attributes. For example you could use index.html.gz,,+
and index.html.gz,,- to choose between translated and untranslated versions of
a file. Or you could use index.html.gz,,u to get the contents of the file with
a gunzip translator applied automatically. You could also use attributes on
whole directory trees: .,,0/ would give you a directory tree corresponding to
the current directory, but with any translators disabled, for doing a backup.
And site,,u/*.html.gz would present a whole directory tree of compressed HTML
files as uncompressed files.

One benefit of the Hurd's flexibility is that it should be possible to
implement such a mechanism without touching the existing Hurd components:
Rather, just implement a special proxy, that mirrors the normal filesystem, but
is able to interpret the special extensions and present transformed files in
place of the original ones.

In the long run it's probably desirable to have the mechanism implemented in
the standard name lookup mechanism, so it will be available globally, and avoid
the overhead of a proxy; but for the beginnig the proxy solution is much more
flexible.

The goal of this project is implementing a prototype proxy; perhaps also a
first version of the global variant as proof of concept, if time permits. It
requires good understanding of the name lookup mechanism, and translator
programming; but the implementation should not be too hard. Perhaps the hardest
part is finding a convenient, flexible, elegant, hurdish method for mapping the
special extensions to actual translators...

* gnumach code cleanup
* fix libdiskfs locking issues
* dtrace support
* I/O performance tuning
* VM performance tuning
* improved NFS implementation
* fix file locking
* virtualization based on Hurd mechanisms
* procfs
* mtab
* xmlfs
* fix tmpfs
* allow using unionfs early at boot
* hurdish package manager