# Java for Hurd (and vice versa)
*[Draft] I'll finish this later today and send an email to bug-hurd when I'm done*
Contact information:
* Full name: Jérémie Koenig
* Email: jk@jk.fr.eu.org
* IRC: jkoenig on Freenode and OFTC
## Introductions
I am a first-year M.Sc. student
in Computer Science at the University of Strasbourg (France).
My interests include capability-based security,
programming languages and formal methods (in particular, proof-carrying code).
### Previous involvement
Although I have known the Hurd for some time,
I did not contribute until last summer,
during which I participated in Google Summer of Code
as a student for the Debian project.
I worked on porting Debian-Installer to Hurd.
This project was mostly a success,
although we still have to use a special mirror for installation
with a few modified packages
and tweaked priorities
to work around some uninstallable packages
with Priority: standard.
Shortly afterwards,
I rewrote the procfs translator
to fix some issues with memory leaks,
make it more reliable,
and improve compatibility with Linux-based tools
such as `procps` or `htop`.
Although I have not had as much time
as I would have liked to dedicate to the Hurd
since that time,
I have continued to maintain the mirror in question,
and I have started to work
on implementing POSIX threads signal semantics in glibc.
### Project-related skills and interests
I have used Java mostly for university assignments.
This includes non-trivial projects
using threads and distributed programming frameworks
such as Java RMI or CORBA.
I have also used it to experiment with
Google App Engine
(web applications)
and Google Web Toolkit
(a compiler from Java to JavaScript which helps with AJAX code),
and I have some limited experience with JNI
(the Java Native Interface, to link Java with C code).
My knowledge of the Hurd and Debian GNU/Hurd is reasonable,
as the Debian-Installer and procfs projects
gave me the opportunity to fiddle with many parts of the system.
## Improve Java support
### Justification
Java is a popular language and platform used by many desktop and web
applications (mostly on the server side). As a consequence, competitive Java
support is important for any general-purpose operating system.
### Current situation
Java is currently supported on Hurd with the GNU Java suite:
* [GCJ](http://gcc.gnu.org/java/),
the GNU Compiler for Java, is part of GCC and can compile Java
source code to Java bytecode, and both source code and bytecode to
native code;
* libgcj is the implementation of the Java runtime which GCJ uses.
It is based on [GNU Classpath](http://www.gnu.org/software/classpath/).
It includes a bytecode interpreter which enables
Java applications compiled to native code to dynamically load and execute
Java bytecode from class files.
* The gij command is a wrapper around the above-mentioned virtual machine
functionality of libgcj and can be used as a replacement for the java
command.
However, GCJ does not work flawlessly on Hurd.
For instance, some parts of libgcj rely on
the POSIX threads signal semantics, which are not yet implemented.
In particular, this makes ant hang waiting for child processes,
which makes some packages fail to build on Hurd
(“ant” is the “make” of the Java world).
### Tasks
* **Finish implementing POSIX threads signal semantics** in glibc (high priority).
According to POSIX, signal dispositions should be global to a process,
while signal blocking masks should be thread-specific. Signals sent to the
process as a whole are to be delivered to any thread which does not block
them. By contrast, Hurd has per-thread signal dispositions and signals
sent to a process are delivered to the main thread only. I have been
working on refactoring the glibc signal code and implementing the POSIX
semantics as a per-thread option. However, due to lack of time I have not
yet been able to test and debug my code properly. Finishing this work
would be my first task.
* **Fix further problems with GCJ on Hurd** (high priority). While I’m not
aware of any other problems with GCJ at the moment, I suspect some might
turn up as I progress with the other tasks. Fixing these problems would
also be a high-priority task.
* **Port OpenJDK 6** (medium priority). While GCJ is fine, it is not yet
100% complete. It is also slower than OpenJDK on architectures where a
just-in-time compiler is available. Porting OpenJDK would therefore
improve Java support on Hurd in scope and quality. Besides, it would also
be a good way to test GCJ, which is used for bootstrapping by the Debian
OpenJDK packages. Also note that OpenJDK 6 is now the default Java
Runtime Environment on all released Linux-based Debian architectures;
bringing Hurd in line with this would probably be a good thing.
* **Port Eclipse and other Java applications** (low priority). Eclipse is a
popular, state-of-the-art IDE and tool suite used for Java and other
languages. It is a dependency of the Joe-E verifier (see part 3 of this
proposal). Porting Eclipse would be a good opportunity to test GCJ and
OpenJDK.
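
The difference between the two delivery rules described in the first task can be modelled in a few lines of Java. This is only a toy model under simplifying assumptions, not glibc code: `SignalRoutingSketch`, `ThreadState`, `posixTarget()` and `mainThreadTarget()` are names made up for the illustration, and the signal number used is an arbitrary placeholder.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Optional;

/** Toy model of process-directed signal routing; illustrative only, not glibc code. */
final class SignalRoutingSketch {
    /** Per-thread state: under POSIX only the blocking mask is thread-specific. */
    static final class ThreadState {
        final String name;
        final BitSet blocked = new BitSet();
        ThreadState(String name) { this.name = name; }
    }

    /** Threads of the modelled process; index 0 plays the role of the main thread. */
    final List<ThreadState> threads = new ArrayList<>();

    /** POSIX rule: deliver to any thread that does not block the signal. */
    Optional<ThreadState> posixTarget(int signo) {
        for (ThreadState t : threads) {
            if (!t.blocked.get(signo)) {
                return Optional.of(t);
            }
        }
        return Optional.empty(); // every thread blocks it: the signal stays pending
    }

    /** Main-thread-only rule, a simplification of the behaviour described above. */
    Optional<ThreadState> mainThreadTarget(int signo) {
        if (threads.isEmpty() || threads.get(0).blocked.get(signo)) {
            return Optional.empty();
        }
        return Optional.of(threads.get(0));
    }

    public static void main(String[] args) {
        final int sig = 20; // arbitrary placeholder signal number
        SignalRoutingSketch process = new SignalRoutingSketch();
        ThreadState mainThread = new ThreadState("main");
        ThreadState worker = new ThreadState("worker");
        mainThread.blocked.set(sig); // the main thread blocks the signal
        process.threads.add(mainThread);
        process.threads.add(worker);

        System.out.println(process.posixTarget(sig).map(t -> t.name).orElse("pending"));
        System.out.println(process.mainThreadTarget(sig).map(t -> t.name).orElse("pending"));
    }
}
```

In this model, a signal blocked by the main thread still reaches the worker under the POSIX rule, while the main-thread-only rule leaves it pending.
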
## Create Java bindings for the Hurd interfaces
### Justification
Java is a popular language, used for many applications and often taught to
introduce object-oriented programming. The fact that Java is a
garbage-collected language makes it easier to use, especially for the less
experienced programmers. Besides, the object-oriented nature of Java is a
natural fit for the capability-based design of Hurd.
Advantages over other garbage-collected, object-oriented languages include
performance, type safety and the possibility to compile a Java translator to
native code and
[link it statically](http://gcc.gnu.org/wiki/Statically_linking_libgcj)
using GCJ, should anyone want to use a
translator written in Java for booting. Note that Java is
[being](http://oss.readytalk.com/avian/)
[used](http://www.linuxjournal.com/article/8757)
in this manner for embedded development.
Java bindings would lower the bar for newcomers
to begin experimenting with what makes Hurd unique
without being faced right away with the complexity of
low-level systems programming.
### Approach
One approach used previously to interface programming languages with the Hurd
has been to create bindings for helper libraries such as libtrivfs. Instead,
for Java I would like to take a lower-level approach by providing access to
Mach primitives and extending MIG to generate Java code from the interface
description files.
This approach would be initially more involved, and would introduce several
issues related to overcoming the "impedance mismatch" between Java and Mach.
However, once an initial implementation is done it would be easier to maintain
in the long run and we would be able to provide Java bindings for a large
percentage of the Hurd’s interfaces.
### Design goals
FIXME: to be completed
* Give maximum flexibility to the Java code while maintaining the memory
safety of Java, and using a minimum amount of C code.
* Provide higher-level interfaces as well, ...
* Hurd objects would map to Java objects.
* Local objects can be accessed directly,
remote objects can be accessed transparently over IPC.
### Bindings for Mach system calls
In this low-level approach, my intention is to enable Java code to use Mach
system calls (in particular, mach_msg) more or less directly. This would
ensure full access to the system from Java code, but it raises a number of
issues:
* the Java code must be able to manipulate Mach-level entities, such as port
rights or page-aligned buffers mapped outside of the garbage-collected
heap (for out-of-line transfers);
* putting together IPC messages requires control of the low-level
representation of data.
In order to address these concerns, classes would encapsulate these
low-level entities so that they can be referenced through normal, safe objects
from standard Java code. Bindings for Mach system calls can then be provided
in terms of these classes. Their implementation would use C code through the
Java Native Interface (JNI).
More specifically, this functionality would be provided by the `org.gnu.mach`
package, which would contain at least the following classes:
* `MachPort` would encapsulate a `mach_port_t`. (Some of) its constructors
would act as an interface for the `mach_port_allocate()` system call.
`MachPort` objects would also be instantiated from other parts of the JNI
C code to represent port rights received through IPC. The `deallocate()`
method would call `mach_port_deallocate()` and replace the encapsulated
port name with `MACH_PORT_DEAD`. We would recommend that users call it
when a port is no longer used, but the finalizer would also deallocate the
port when the `MachPort` object is garbage collected.
* `Buffer` would represent a page-aligned buffer allocated outside of the
Java heap, to be transferred (or having been received) as out-of-line
memory. The JNI code would provide methods to read and write data at
an arbitrary offset (but within bounds) and would use `vm_allocate()` and
`vm_deallocate()` in the same spirit as for `MachPort` objects.
* `Message` would allow Java code to put together Mach messages. The
constructor would allocate a `byte[]` member array of a given size.
Additional methods would be provided to fill in or query the information
in the message header and additional data items, including `MachPort` and
`Buffer` objects which would be translated to the corresponding port names
and out-of-line pointers.
A global map from port names to the corresponding `MachPort` object
would probably be needed to ensure that there is a one-to-one
correspondence.
* `Syscall` would provide static JNI methods for performing system calls not
covered by the above classes, such as `mach_msg()` or
`mach_thread_self()`. These methods would accept or return `MachPort`,
`Buffer` and `Message` objects when appropriate. The associated C code
would access the contents of such objects directly in order to perform the
required unsafe operations, such as constructing `MachPort` and `Buffer`
objects directly from port names and C pointers.
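
As a rough illustration of the `deallocate()` behaviour and of the global map mentioned above, here is a pure-Java sketch. It is not the proposed implementation: the class name `MachPortSketch`, the `forName()` factory and the `DEAD_NAME` constant are assumptions made for the example, and the JNI calls are replaced by comments so that the code runs anywhere.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a pure-Java stand-in for the proposed org.gnu.mach.MachPort. */
final class MachPortSketch {
    /** Stand-in for MACH_PORT_DEAD; the real constant comes from the Mach headers. */
    static final int DEAD_NAME = -1;

    /** Global map from port names to wrappers, so each name has a single wrapper. */
    private static final Map<Integer, MachPortSketch> REGISTRY = new HashMap<>();

    private int name;

    private MachPortSketch(int name) {
        this.name = name;
    }

    /** Entry point the native code would use for names received through IPC. */
    static MachPortSketch forName(int name) {
        synchronized (REGISTRY) {
            return REGISTRY.computeIfAbsent(name, MachPortSketch::new);
        }
    }

    /** Releases the port once; calling it again is a harmless no-op. */
    void deallocate() {
        synchronized (REGISTRY) {
            if (name == DEAD_NAME) {
                return;
            }
            // The real class would call mach_port_deallocate() through JNI here.
            REGISTRY.remove(name);
            name = DEAD_NAME;
        }
    }

    boolean isDead() {
        synchronized (REGISTRY) {
            return name == DEAD_NAME;
        }
    }

    public static void main(String[] args) {
        MachPortSketch first = MachPortSketch.forName(42);
        MachPortSketch second = MachPortSketch.forName(42);
        System.out.println(first == second); // one wrapper per port name
        first.deallocate();
        System.out.println(first.isDead());
    }
}
```

One design point this sketch glosses over: a map holding strong references would keep every wrapper alive, so the finalizer-based cleanup described above could never run. A real registry would probably need weak references.
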
Note that careful consideration should be given to the interfaces of these
classes to avoid “safety leaks” which would compromise the safety guarantees
provided by Java. Potential problematic scenarios include the following
examples:
* It must not be possible to write an integer at some position in a
`Message` object, and to read it back as a `MachPort` or `Buffer` object,
since this would allow unsafe access to arbitrary memory addresses and
Mach port names.
* Providing the `mach_task_self()` system call would also provide access to
arbitrary addresses and ports by using the `vm_*` family of RPC operations
with the returned `MachPort` object. This means that the relevant task
operations should be provided by the `Syscall` class instead.
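
The first scenario can be avoided if each item in a message remembers what kind of value it holds. The sketch below shows the idea in plain Java; `TypedMessageSketch`, its nested `Port` class and the `add*`/`read*` method names are hypothetical and do not correspond to the final `Message` interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: message items keep their kind, so they cannot be reinterpreted. */
final class TypedMessageSketch {
    /** Opaque stand-in for the proposed MachPort class. */
    static final class Port {
        private final int name; // never exposed to Java callers
        Port(int name) { this.name = name; }
    }

    private final List<Object> items = new ArrayList<>();

    /** Appends an integer item and returns its index. */
    int addInt(int value) {
        items.add(Integer.valueOf(value));
        return items.size() - 1;
    }

    /** Appends a port right item and returns its index. */
    int addPort(Port port) {
        items.add(port);
        return items.size() - 1;
    }

    int readInt(int index) {
        Object item = items.get(index);
        if (!(item instanceof Integer)) {
            throw new IllegalStateException("item " + index + " is not an integer");
        }
        return (Integer) item;
    }

    Port readPort(int index) {
        Object item = items.get(index);
        if (!(item instanceof Port)) {
            throw new IllegalStateException("item " + index + " is not a port right");
        }
        return (Port) item;
    }

    public static void main(String[] args) {
        TypedMessageSketch msg = new TypedMessageSketch();
        int slot = msg.addInt(0x1234);
        try {
            msg.readPort(slot); // attempt to forge a port from an integer
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the real bindings the `Port` constructor would only be reachable from the JNI code, so Java code could not build a port from a raw name either.
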
Finally, access should be provided to the initial ports and file descriptors
in `_hurd_ports` and `FIXME`, for instance through static methods such as
`getCRDir()`, `getCWDir()`, `getProc()`, ... in a dedicated class such as
`org.gnu.hurd.InitPorts`.
A realistic example of code based on such interfaces would be:

    import org.gnu.mach.MsgType;
    import org.gnu.mach.MachPort;
    import org.gnu.mach.Buffer;
    import org.gnu.mach.Message;
    import org.gnu.mach.Syscall;
    import org.gnu.hurd.InitPorts;

    public class Hello
    {
        public static void main(String argv[])
            /* Parent class for all Mach-related exceptions */
            throws org.gnu.mach.MachException
        {
            /* Allocate a reply port */
            MachPort reply = new MachPort();

            /* Allocate an out-of-line buffer */
            Buffer data = new Buffer(MsgType.CHAR, 13);
            data.writeString(0, "Hello, World!");

            /* Craft an io_write message */
            Message msg = new Message(1024);
            msg.setRemotePort(InitPorts.getdport(1));
            msg.setLocalPort(reply, Message.Type.MAKE_SEND_ONCE);
            msg.setId(21000);
            msg.addBuffer(data);

            /* Make the call, MACH_SEND_MSG | MACH_RCV_MSG */
            Syscall.machMsg(msg, true, true, reply);

            /* Extract the returned value */
            msg.assertId(21100);
            int retCode = msg.readInt(0);
            int amount = msg.readInt(1);
        }
    }

Should this paradigm prove insufficient,
more ideas could be borrowed from the
[`org.vmmagic`](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.151.5253&rep=rep1&type=pdf)
package used by [Jikes RVM](http://jikesrvm.org/),
a research Java virtual machine itself written in Java.
### Generating Java stubs with MIG
Once the basic machinery is in place to interface with Mach, Java programs
have more or less equal access to the system functionality without resorting
to more JNI code. However, as illustrated above, this access is far from
convenient.
As a solution I would modify MIG to add the option to output Java code. MIG
would emit a Java interface, a client class able to implement the interface
given a Mach port send right, and a server class which would be able to handle
incoming messages. The class diagram below, although it is by no means
complete or free of problems, illustrates the general idea:
[[gsoc2011_classes.png]]
This structure is somewhat reminiscent of
[Java RMI](http://en.wikipedia.org/wiki/Java_remote_method_invocation)
or similar systems,
which aim to provide more or less transparent access to remote objects.
The exact way the Java code would be generated still needs to be determined,
but basically:
* An interface, corresponding to the header files generated by MIG, would
enumerate the operations listed in a given `.defs` file. Method names would
be transformed to adhere to Java conventions (for instance,
`some_random_identifier` would become `someRandomIdentifier`).
* A user class, corresponding to the `*User.c` files,
would implement this interface by doing RPC over a given MachPort object.
* A server class, corresponding to `*Server.c`, would be able to handle
incoming messages using a user-provided implementation of the interface.
(Possibly, a skeleton class providing methods which would raise
`NotImplementedException`s would be provided as well.
Users would derive from this class and override the relevant methods.
This would allow them not to implement some operations,
and would prevent pre-existing code from breaking when new operations are
introduced.)
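
The name transformation mentioned in the first item is simple enough to sketch directly. The helper below is only an illustration; `MigNames` and `toJavaMethodName()` are hypothetical names, and the treatment of leading or repeated underscores is an assumption the text above does not settle.

```java
/** Sketch only: converts MIG-style identifiers to Java method names. */
final class MigNames {
    private MigNames() {}

    /** Drops each underscore and capitalizes the letter that follows it. */
    static String toJavaMethodName(String migName) {
        StringBuilder out = new StringBuilder();
        boolean upperNext = false;
        for (char c : migName.toCharArray()) {
            if (c == '_') {
                // Assumption: a leading underscore is dropped without capitalizing.
                upperNext = out.length() > 0;
                continue;
            }
            out.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "someRandomIdentifier"
        System.out.println(toJavaMethodName("some_random_identifier"));
    }
}
```
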
In order to help with the implementation of servers, some kind of library
would be needed to associate Mach receive rights with server objects and to
handle incoming messages on dedicated threads, in the spirit of libports.
This would probably require support for port sets at the level of the Mach
primitives described in the previous section.
When possible, operations involving the transmission of send rights
of some kind would be expressed in terms of the MIG-generated interfaces
instead of `MachPort` objects.
Upon reception of a send right,
a `FooUser` object would be created
and associated with the corresponding `MachPort` object.
If the received send right corresponds to a local port
to which a server object has been associated,
this object would be used instead.
This way,
subsequent operations on the received send right
would be handled as direct method calls
instead of going through RPC mechanisms.
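
The lookup described above could work roughly as follows. This is a sketch under assumptions: ports are reduced to integer names, `SendRightResolver`, `export()` and `resolve()` are made-up names, and the nested `Io` and `IoUser` types merely stand in for the MIG-generated ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: chooses between a local server object and an RPC stub. */
final class SendRightResolver {
    /** Stand-in for a MIG-generated interface. */
    interface Io {
        int write(byte[] data, int offset);
    }

    /** Stand-in for a MIG-generated client stub doing RPC over a port. */
    static final class IoUser implements Io {
        final int portName;
        IoUser(int portName) { this.portName = portName; }
        @Override public int write(byte[] data, int offset) {
            throw new UnsupportedOperationException("RPC is not modelled in this sketch");
        }
    }

    /** Local ports (by name) that have a server object attached. */
    private final Map<Integer, Io> localServers = new HashMap<>();

    /** Associates a server object with a local port. */
    void export(int portName, Io implementation) {
        localServers.put(portName, implementation);
    }

    /** Called when a send right is received: prefer the local object if any. */
    Io resolve(int portName) {
        Io local = localServers.get(portName);
        return local != null ? local : new IoUser(portName);
    }

    public static void main(String[] args) {
        SendRightResolver resolver = new SendRightResolver();
        Io server = (data, offset) -> data.length; // pretends to write everything
        resolver.export(7, server);
        System.out.println(resolver.resolve(7) == server);         // direct call path
        System.out.println(resolver.resolve(8) instanceof IoUser); // RPC stub path
    }
}
```
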
Some issues will still need to be solved regarding how MIG will convert
interface description files to Java interfaces. For instance:
* `.defs` files are not explicitly associated with a type. For instance in
the example above, MIG would have to somehow infer that `io_t` corresponds
to `this` in the `Io` interface.
* More generally, a correspondence between MIG and Java types would have
to be determined. Ideally this would be automated and not hardcoded
too much.
* Initially, reply port parameters would be ignored. However they may be
needed for some applications.
These details would need to be fleshed out during the community bonding
period and as the implementation progresses. However I’m confident that a
satisfactory solution can be designed.
Using these new features, the example above could be rewritten as:

    import org.gnu.hurd.InitPorts;
    import org.gnu.hurd.Io;
    import org.gnu.hurd.IoUser;

    class Hello {
        public static void main(String argv[]) throws ...
        {
            Io stdout = new IoUser(InitPorts.getdport(1));
            String hello = "Hello, World!\n";
            int amount = stdout.write(hello.getBytes(), -1);
            /* (A retCode corresponding to an error
               would be signalled as an exception.) */
        }
    }

An example of server implementation would be:

    import org.gnu.hurd.Io;
    import java.util.Arrays;

    class HelloIo implements Io {
        final byte[] contents = "Hello, World!\n".getBytes();

        /* Methods implementing an interface must be public. */
        public int write(byte[] data, int offset) {
            return SOME_ERROR_CODE;
        }

        public byte[] read(int offset, int amount) {
            /* The end index of copyOfRange() is exclusive. */
            return Arrays.copyOfRange(contents, offset,
                                      offset + amount);
        }

        /* ... */
    }

A new server object could then be created with `new IoServer(new HelloIo())`,
and associated with some receive right at the level of the ports management
library.
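
The skeleton class suggested earlier for the server side could look like the sketch below. Everything in it is an assumption made for illustration: the nested `Io` interface stands in for the MIG-generated one, `NotImplementedException` is defined locally because the standard Java library has no class of that name, and `IoSkeleton` and `ReadOnlyHello` are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch only: a skeleton class lets servers implement a subset of operations. */
final class SkeletonDemo {
    /** Hypothetical exception type; not part of the standard Java library. */
    static class NotImplementedException extends RuntimeException {
        NotImplementedException(String operation) {
            super(operation + " is not implemented");
        }
    }

    /** Stand-in for a MIG-generated interface. */
    interface Io {
        int write(byte[] data, int offset);
        byte[] read(int offset, int amount);
    }

    /** Default behaviour: every operation is refused. */
    static class IoSkeleton implements Io {
        @Override public int write(byte[] data, int offset) {
            throw new NotImplementedException("write");
        }
        @Override public byte[] read(int offset, int amount) {
            throw new NotImplementedException("read");
        }
    }

    /** A server that only overrides what it supports. */
    static class ReadOnlyHello extends IoSkeleton {
        private final byte[] contents =
            "Hello, World!\n".getBytes(StandardCharsets.UTF_8);

        @Override public byte[] read(int offset, int amount) {
            // Clamp the requested range to the available contents.
            int from = Math.min(Math.max(offset, 0), contents.length);
            int to = (int) Math.min((long) from + Math.max(amount, 0), contents.length);
            return Arrays.copyOfRange(contents, from, to);
        }
    }

    public static void main(String[] args) {
        Io hello = new ReadOnlyHello();
        System.out.println(new String(hello.read(0, 5), StandardCharsets.UTF_8));
        try {
            hello.write(new byte[0], 0);
        } catch (NotImplementedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If a new operation were later added to `Io` and to `IoSkeleton`, `ReadOnlyHello` would keep compiling unchanged.
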
### Base classes for common types of translators
Once MIG can target Java code, and a libports equivalent is available,
creating new translators in Java would be greatly facilitated. However,
we would probably want to introduce basic implementations of filesystem
translators in the spirit of libtrivfs or libnetfs. They could take the form
of base classes implementing the relevant MIG-generated interfaces which
would then be derived by users,
or could define a simpler interface
which would then be used by adapter classes
to implement the required ones.
I would draw inspiration from libtrivfs and libnetfs
to design and implement similar solutions for Java.
### Packaging and long-term maintenance
The Java libraries resulting from this work
(including any MIG support classes),
as well as the class files built from the MIG-generated code
for the Mach and Hurd interface definition files,
would be provided as a single `hurd-java` package for
Debian GNU/Hurd.
This package would be separate from both Hurd and Mach,
so as not to impose unreasonable build dependencies on them.
I expect I would be able to act as its maintainer in the foreseeable future,
either as an individual or as a part of the Hurd team.
Hopefully,
my code would be claimed by the Hurd project as their own,
and consequently the modifications to MIG
(which would at least conceptually depend on the Mach Java package)
could be integrated upstream.
Since by design,
the Java code would use only a small number of stable interfaces,
it would not be subject to excessive amounts of bitrot.
Consequently,
maintenance would primarily consist of
fixing bugs as they are reported,
and adding new features as they are requested.
A large number of such requests
would mean the package is useful,
so I expect that the overall amount of work
would be correlated with the willingness of more people
to help with maintenance
should I become overwhelmed or get hit by a bus.
## Timeline
The dates listed are deadlines for the associated tasks.
* *Community bonding period.*
Discuss, refine and complete the design of the Java bindings
(in particular the MIG and "libports" parts).
* *May 23.*
Coding starts.
* *May 30.*
Finish implementing pthread signal semantics.
* *June 5.*
Port OpenJDK.
* *June 19 (two weeks).*
Fix the remaining problems with GCJ and/or OpenJDK,
possibly port Eclipse or other big Java packages.
* *June 26.*
Create the bindings for Mach.
* *July 3.*
Work on some kind of basic Java libports
to handle receive rights.
* *July 10.*
Test, write some documentation and examples.
* *July 24 (two weeks).*
Add the Java target to MIG.
* *July 31.*
Test, write some documentation and examples.
* *August 7.*
Try to write a basic but non-trivial translator
to evaluate the performance and ease of use of the result,
rectify any rough edges this would uncover.
* *Last two weeks.*
Polish the code,
do the packaging.
## Conclusion
## Appendix: potential applications of object-capability languages
The work discussed in this last part would have
fewer immediate benefits for the Hurd project
and has more of a research orientation.
It is also unlikely that there would be any time remaining
to work on it at the end of the summer.
(Though it could work as some kind of reward
if I somehow managed to do a perfect job of all the rest
within the allocated time :-) ).
As a consequence,
I don't really consider this a part of my application.
This being said,
to some extent the project discussed here
will inform the way I design the Java bindings,
since it depends on them
and I intend to work on this at some point in the future.
I also believe it touches on some interesting ideas,
and a Summer of Code application is probably
as good an occasion as any to discuss them.
### Justification
The primary advantage of multi-server operating systems is the ability to
break what used to be the kernel into small pieces which can be isolated
from each other. This makes sense from an engineering perspective, as
smaller components can be swapped with different implementations and reduce
the impact of bugs.
A capability-based approach also ensures that the
authority wielded by components is clear and reduced to the minimum required
for them to function.
These properties are crucial to the Hurd's agenda of user freedom,
since in order to allow users to plug their own code into the system
[FIXME: expand]
However, this flexibility has a cost. In a system where the isolation of
components relies on running them inside different address spaces,
communication between them must be done through IPC calls.
This introduces a trade-off between the size of the modules
and performance as well as practicality,
which imposes a limit to the granularity with which the system
can be decomposed and the principle of least authority applied
(to the code within a given process, a Mach port is ambient authority).
Another issue is that of the threading structure of the system as a whole.
In systems based on a monolithic kernel, user threads execute the kernel
code themselves, which is then intrinsically concurrent. By contrast, in a
system based on a “client-server” paradigm, each server must be explicitly
multi-threaded if it is to serve requests concurrently.
### Object-capability languages
An object-capability language is an object-oriented language which is
restricted enough so that object references are themselves capabilities.
One such language is Joe-E (FIXME: link),
which is an object-capability subset of Java:
global state and static methods are mostly forbidden;
careful white-listing of the objects and methods
provided by the Java standard library
ensures that compliant code cannot access ambient authority.
Ways in which object references can be transferred
are restricted to the most obvious ones
(for instance, exceptions are carefully restricted).
As a result, untrusted Joe-E code can be executed without any further
isolation and its authority can be controlled by carefully limiting the
object references which are passed to it.
This would make it possible to load and execute translators written in Joe-E
in a single address space.
### Bundling translators into a single process
[mechanism for transmitting the Joe-E code
and the initial ports to the server]
[emulation of the different tasks]
### Challenges and further work
[proof-carrying code / typed assembly,
resource accounting (review the design of Viengoos?)]