summaryrefslogtreecommitdiff
path: root/user/jkoenig/gsoc2011_proposal.mdwn
diff options
context:
space:
mode:
authorJeremie Koenig <jk@jk.fr.eu.org>2011-04-08 13:56:56 +0200
committerJeremie Koenig <jk@jk.fr.eu.org>2011-04-08 14:06:04 +0200
commit4c63187abaef32037edad63c6b82b31763584bff (patch)
tree9d42fc06a1e642174b7e8a5ab5c9431c91320769 /user/jkoenig/gsoc2011_proposal.mdwn
parentfd5b568b614fb930517681c0dfc630d499c6398e (diff)
gsoc2011 (java): Initial import of the proposal
Diffstat (limited to 'user/jkoenig/gsoc2011_proposal.mdwn')
-rw-r--r--user/jkoenig/gsoc2011_proposal.mdwn567
1 files changed, 567 insertions, 0 deletions
diff --git a/user/jkoenig/gsoc2011_proposal.mdwn b/user/jkoenig/gsoc2011_proposal.mdwn
new file mode 100644
index 00000000..666733c1
--- /dev/null
+++ b/user/jkoenig/gsoc2011_proposal.mdwn
@@ -0,0 +1,567 @@
+
+# Java for Hurd (and vice vera)
+
+Contact information:
+
+ * Full name: Jérémie Koenig
+ * Email: jk@jk.fr.eu.org
+ * IRC: jkoenig on Freenode and OFTC
+
+## Introductions
+
+I am a first year M.Sc. student
+in Computer Science at University of Strasbourg (France).
+My interests include capability-based security,
+programming languages and formal methods (in particular, proof-carrying code).
+
+### Previous involvement
+
+Although I have known the Hurd for some time,
+I did not contribute until last summer,
+during which I participated to Google Summer of Code
+as a student for the Debian project.
+I worked on porting Debian-Installer to Hurd.
+This project was mostly a success,
+although we still have to use a special mirror for installation
+with a few modified packages
+and tweaked priorities
+to work around some uninstallable packages
+with Priority: standard.
+
+Shortly afterwards,
+I rewrote the procfs translator
+to fix some issues with memory leaks,
+make it more reliable,
+and improve comptibility with Linux-based tools
+such as `procps` or `htop`.
+
+Although I have not had as much time
+as I would have liked to dedicate to the Hurd
+since that time,
+I have continued to maintain the mirror in question,
+and I have started to work
+on implementing POSIX threads signal semantics in glibc.
+
+### Project-related skills and interests
+
+I have used Java mostly for university assignements.
+This includes non-trivial projects
+using threads and distributed programming frameworks
+such as Java RMI or CORBA.
+I have also used it to experiment with
+Google App Engine
+(web applications)
+and Google Web Toolkit
+(a compiler from Java to Javascript which helps with AJAX code),
+and I have some limited experience with JNI
+(the Java Native Interface, to link Java with C code).
+
+My knowledge of the Hurd and Debian GNU/Hurd is reasonable,
+as the Debian-Installer and procfs projects
+gave me the opportunity to fiddle with many parts of the system.
+
+
+## Improve Java support
+
+### Justification
+
+Java is a popular language and platform used by many desktop and web
+applications (mostly on the server side). As a consequence, competitive Java
+support is important for any general-purpose operating system.
+
+### Current situation
+
+Java is currently supported on Hurd with the GNU Java suite:
+
+ * GCJ, the GNU Compiler for Java, is part of GCC and can compile Java
+ source code to Java bytecode, and both source code and bytecode to
+ native code;
+ * libgcj is the implementation of the Java runtime which GCJ uses. It is
+ based on GNU Classpath. It includes a bytecode interpreter which enables
+ Java applications compiled to native code to dynamically load and execute
+ Java bytecode from class files.
+ * The gij command is a wrapper around the above-mentioned virtual machine
+ functionality of libgcj and can be used as a replacement for the java
+ command.
+
+However, GCJ does not work flawlessly on Hurd.r
+For instance, some parts of libgcj relies on
+the POSIX threads signal semantics, which are not yet implemented.
+In particular, this makes ant hang waiting for child processes,
+which makes some packages fail to build on Hurd
+(“ant” is the “make” of the Java world).
+
+### Tasks
+
+ * **Finish implementing POSIX thread semantics** in glibc (high priority).
+ According to POSIX, signal dispositions should be global to a process,
+ while signal blocking masks should be thread-specific. Signals sent to the
+ process as a whole are to be delivered to any thread which does not block
+ them. By contrast, Hurd has per-thread signal dispositions and signals
+ sent to a process are delivered to the main thread only. I have been
+ working on refactoring the glibc signal code and implementing the POSIX
+ semantics as a per-thread option. However, due to lack of time I have not
+ yet been able to test and debug my code properly. Finishing this work
+ would be my first task.
+ * **Fix further problems with GCJ on Hurd** (high priority). While I’m not
+ aware of any other problems with GCJ at the moment, I suspect some might
+ turn up as I progress with the other tasks. Fixing these problems would
+ also be a high-priority task.
+ * **Port OpenJDK 6** (medium priority). While GCJ is fine, it is not yet
+ 100% complete. It is also slower than OpenJDK on architectures where a
+ just-in-time compiler is available. Porting OpenJDK would therefore
+ improve Java support on Hurd in scope and quality. Besides, it would also
+ be a good way to test GCJ, which is used for bootstrapping by the Debian
+ OpenJDK packages. Also note that OpenJDK 6 is now the default Java
+ Runtime Environment on all released Linux-based Debian architectures;
+ bringing Hurd in line with this would probably be a good thing.
+ * **Port Eclipse and other Java applications** (low priority). Eclipse is a
+ popular, state-of-the-art IDE and tool suite used for Java and other
+ languages. It is a dependency of the Joe-E verifier (see part 3 of this
+ proposal). Porting Eclipse would be a good opportunity to test GCJ and
+ OpenJDK.
+
+
+## Create Java bindings for the Hurd interfaces
+
+### Justification
+
+Java is a popular language, used for many applications and often taught to
+introduce object-oriented programming. The fact that Java is a
+garbage-collected language makes it easier to use, especially for the less
+experienced programmers. Besides, the object-oriented nature of Java is a
+natural fit for the capability-based design of Hurd.
+
+Advantages over other garbage-collected, object-oriented languages include
+performance, type safety and the possibility to compile a Java translator to
+native code and link it statically using GCJ, should anyone want to use a
+translator written in Java for booting. Note that Java is being used in this
+manner for embedded development.
+
+Java bindings would lower the bar for newcomers
+to begin experimenting with what makes Hurd unique
+without being faced right away with the complexity of
+low-level systems programming.
+
+### Approach
+
+One approach used previously to interface programming languages with the Hurd
+has been to create bindings for helper libraries such as libtrivfs. Instead,
+for Java I would like to take a lower-level approach by providing access to
+Mach primitives and extending MIG to generate Java code from the interface
+description files.
+
+This approach would be initially more involved, and would introduces several
+issues related to overcoming the "impedence mismatch" between Java and Mach.
+However, once an initial implementation is done it would be easier to maintain
+in the long run and we would be able to provide Java bindings for a large
+percentage of the Hurd’s interfaces.
+
+### Design goals
+
+FIXME: a completer
+
+ * Give maximum flexibility to the Java code while maintaining the memory
+ safety of Java, and using a minimum amount of C code.
+ * Provide higher-level interfaces as well, ...
+ * Hurd objects would map to Java objects.
+ * Local objects can be accessed directly,
+ remote objects can be accessed transparently over IPC.
+
+### Bindings for Mach system calls
+
+In this low-level approach, my intention is to enable Java code to use Mach
+system calls (in particular, mach_msg) more or less directly. This would
+ensure full access to the system from Java code, but it raises a number of
+issues:
+
+ * the Java code must be able to manipulate Mach-level entities, such as port
+ rights or page-aligned buffers mapped outside of the garbage-collected
+ heap (for out-of-line transfers);
+ * putting together IPC messages requires control of the low-level
+ representation of data.
+
+In order to address these concerns, classes would be encapsulating these
+low-level entities so that they can be referenced through normal, safe objects
+from standard Java code. Bindings for Mach system calls can then be provided
+in terms of these classes. Their implementation would use C code through the
+Java Native Interface (JNI).
+
+More specifically, this functionality would be provided by the `org.gnu.mach`
+package, which would contain at least the following classes:
+
+ * `MachPort` would encapsulate a `mach_port_t`. (Some of) its constructors
+ would act as an interface for the `mach_port_allocate()` system call.
+ `MachPort` objects would also be instanciated from other parts of the JNI
+ C code to represent port rights received through IPC. The `deallocate()`
+ mehod would call `mach_port_deallocate()` and replace the encapsulated
+ port name with `MACH_PORT_DEAD`. We would recommend that users call it
+ when a port is no longer used, but the finalizer would also deallocate the
+ port when the `MachPort` object is garbage collected.
+ * `Buffer` would represent a page-aligned buffer allocated outside of the
+ Java heap, to be transferred (or having been received) as out-of-line
+ memory. The JNI code would would provide methods to read and write data at
+ an arbitrary offset (but within bounds) and would use `vm_allocate()` and
+ `vm_deallocate()` in the same spirit as for `MachPort` objects.
+ * `Message` would allow Java code to put together Mach messages. The
+ constructor would allocate a `byte[]` member array of a given size.
+ Additional methods would be provided to fill in or query the information
+ in the message header and additional data items, including `MachPort` and
+ `Buffer` objects which would be translated to the correponding port names
+ and out-of-line pointers.
+ A global map from port names to the corresponding `MachPort` object
+ would probably be needed to ensure that there is a one-to-one
+ correspondance.
+ * `Syscall` would provide static JNI methods for performing system calls not
+ covered by the above classes, such as `mach_msg()` or
+ `mach_thread_self()`. These methods would accept or return `MachPort`,
+ `Buffer` and `Message` objects when appropriate. The associated C code
+ would access the contents of such objects directly in order to perform the
+ required unsafe operations, such as constructing `MachPort` and `Buffer`
+ objects directly from port names and C pointers.
+
+Note that careful consideration should be given to the interfaces of these
+classes to avoid “safety leaks” which would compromise the safety guarantees
+provided by Java. Potential problematic scenarios include the following
+examples:
+
+ * It must not be possible to write an integer at some position in a
+ `Message` object, and to read it back as a `MachPort` or `Buffer` object,
+ since this would allow unsafe access to arbitrary memory addresses and
+ mach port names.
+ * Providing the `mach_task_self()` system call would also provide acces to
+ arbitrary addresses and ports by using the `vm_*` family of RPC operations
+ with the returned `MachPort` object. This means that the relevant task
+ operations should be provided by the `Syscall` class instead.
+
+Finally, access should be provided to the initial ports and file descriptors
+in `_hurd_ports` and `FIXME`, for instance through static methods such as
+`getCRDir()`, `getCWDir()`, `getProc()`, ... in a dedicated class such as
+`org.gnu.hurd.InitPorts`.
+
+A realistic example of code based on such interfaces would be:
+
+ import org.gnu.mach.MsgType;
+ import org.gnu.mach.MachPort;
+ import org.gnu.mach.Buffer;
+ import org.gnu.mach.Message;
+ import org.gnu.mach.Syscall;
+ import org.gnu.hurd.InitPorts;
+
+ public class Hello
+ {
+ public static main(String argv[])
+ /* Parent class for all Mach-related exceptions */
+ throws org.gnu.mach.MachException
+ {
+ /* Allocate a reply port */
+ MachPort reply = new MachPort();
+
+ /* Allocate an out-of-line buffer */
+ Buffer data = new Buffer(MsgType.CHAR, 13);
+ data.writeString(0, "Hello, World!");
+
+ /* Craft an io_write message */
+ Message msg = new Message(1024);
+ msg.setRemotePort(InitPorts.getdport(1));
+ msg.setLocalPort(reply, Message.Type.MAKE_SEND_ONCE);
+ msg.setId(21000);
+ msg.addBuffer(data);
+
+ /* Make the call, MACH_MSG_SEND | MACH_MSG_RECEIVE */
+ Syscall.machMsg(msg, true, true, reply);
+
+ /* Extract the returned value */
+ msg.assertId(21100);
+ int retCode = msg.readInt(0);
+ int amount = msg.readInt(1);
+ }
+ }
+
+### Generating Java stubs with MIG
+
+Once the basic machinery is in place to interface with Mach, Java programs
+have more or less equal access to the system functionality without resorting
+to more JNI code. However, as illustrated above, this access is far from
+convenient.
+
+As a solution I would modify MIG to add the option to output Java code. MIG
+would emit a Java interface, a client class able to implement the interface
+given a Mach port send right, an a server class which would be able to handle
+incoming messages. The class diagram below, although it is by no means
+complete or exempt of any problem, illustrates the general idea:
+
+[[gsoc2011_classes.png]]
+
+This structure is somewhat reminiscent of Java RMI or similar systems,
+which aim to provide more or less transparent access to remote objects.
+The exact way the Java code would be generated still needs to be determined,
+but basically:
+
+ * An interface, corresponding to the header files generated by MIG, would
+ enumerate the operations listed in a given .defs files. Method names would
+ be transformed to adhere to Java conventions (for instance,
+ `some_random_identifier` would become `someRandomIdentifier`).
+ * A user class, corresponding to the `*User.c` files,
+ would implement this interface by doing RPC over a given MachPort object.
+ * A server class, corresponding to `*Server.c`, would be able to handle
+ incoming messages using a user-provided implementation of the interface.
+ (Possibly, a skelton class providing methods which would raise
+ `NotImplementedException`s would be provided as well.
+ Users would derive from this class and override the relevant methods.
+ This would allow them not to implement some operations,
+ and would avoid pre-existing code from breaking when new operations are
+ introduced.)
+
+In order to help with the implementation of servers, some kind of library
+would be needed to associate Mach receive rights with server objects and to
+handle incoming messages on dedicated threads, in the spirit of libports.
+This would probably require support for port sets at the level of the Mach
+primitives described in the previous section.
+
+When possible, operations involving the transmission of send rights
+of some kind would be expressed in terms of the MIG-generated interfaces
+intead of `MachPort` objects.
+Upon reception of a send right,
+a `FooUser` object would be created
+and associated with the corresponding `MachPort` object.
+If the received send right corresponds to a local port
+to which a server object has been associated,
+this object would be used instead.
+This way,
+subsequent operations on the received send right
+would be handled as direct method calls
+instead of going through RPC mechanisms.
+
+Some issues will still need to be solved regarding how MIG will convert
+interface description files to Java interfaces. For instance:
+
+ * `.defs` files are not explicitely associated with a type. For instance in
+ the example above, MIG would have to somehow infer that io_t corresponds
+ to `this` in the `Io` interface.
+ * More generally, a correspondance between MIG and Java types would have
+ to be determined. Ideally this would be automated and not hardcoded
+ too much.
+ * Initially, reply port parameters would be ignored. However they may be
+ needed for some applications.
+
+So the details would need to be flushed out during the community bonding
+period and as the implementation progresses. However I’m confident that a
+satisfactory solution can be designed.
+
+Using these new features, the example above could be rewritten as:
+
+ import org.gnu.hurd.InitPorts;
+ import org.gnu.hurd.Io;
+ import org.gnu.hurd.IoUser;
+
+ class Hello {
+ static void main(String argv[]) throws ...
+ {
+ Io stdout = new IoUser(InitPorts.getdport(1));
+ String hello = “Hello, World!\n”;
+
+ int amount = stdout.write(hello.getBytes(), -1);
+
+ /* (A retCode corresponding to an error
+ would be signalled as an exception.) */
+ }
+ }
+
+An example of server implementation would be:
+
+ import org.gnu.hurd.Io;
+ import java.util.Arrays;
+
+ class HelloIo implements Io {
+ final byte[] contents = “Hello, World!\n”.getBytes();
+
+ int write(byte[] data, int offset) {
+ return SOME_ERROR_CODE;
+ }
+
+ byte[] read(int offset, int amount) {
+ return Arrays.copyOfRange(contents, offset,
+ offset + amount - 1);
+ }
+
+ /* ... */
+ }
+
+A new server object could then be created with `new IoServer(new HelloIo())`,
+and associated with some receive right at the level of the ports management
+library.
+
+### Base classes for common types of translators
+
+Once MIG can target Java code, and a libports equivalent is available,
+creating new translators in Java would be greatly facilitated. However,
+we would probably want to introduce basic implementations of filesystem
+translators in the spirit of libtrivfs or libnetfs. They could take the form
+of base classes implementing the relevant MIG-generated interfaces which
+would then be derived by users,
+or could define a simpler interface
+which would then be used by adapter classes
+to implement the required ones.
+
+I would draw inspiration from libtrivfs and libnetfs
+to design and implement similar solutions for Java.
+
+### Packaging and long-term maintenance
+
+The Java libraries resulting from this work
+(including any MIG support classes),
+as well as the class files built from the MIG-generated code
+for the Mach and Hurd interface definition files,
+would be provided as single `hurd-java` package for
+Debian GNU/Hurd.
+This package would be separate from both Hurd and Mach,
+so as not to impose unreasonable build dependencies on them.
+
+I expect I would be able to act as its maintainer in the forseeable future,
+either as an individual or as a part of the Hurd team.
+Hopefully,
+my code would be claimed by the Hurd project as their own,
+and consequently the modifications to MIG
+(which would at least conceptually depend on the Mach Java package)
+could be integrated upstream.
+
+Since by design,
+the Java code would use only a small number of stable interfaces,
+it would not be subject to excessive amounts of bitrot.
+Consequenty,
+maintenance would primarily consist in
+fixing bugs as they are reported,
+and adding new features as they are requested.
+A large number of such requests
+would mean the package is useful,
+so I expect that the overall amount of work
+would be correlated with the willingness of more people
+to help with maintenance
+should I become overwhelmed or get hit by a bus.
+
+
+## Timeline
+
+The dates listed are deadlines for the associated tasks.
+
+ * *Community bonding period.*
+ Discuss, refine and complete the design of the Java bindings
+ (in particular the MIG and "libports" parts)
+ * *May 23.*
+ Coding starts.
+ * *May 30.*
+ Finish implementing pthread signal semantics.
+ * *June 5.*
+ Port OpenJDK
+ * *June 19 (two weeks).*
+ Fix the remaining problems with GCJ and/or OpenJDK,
+ possibly port Eclipse or other big Java packages.
+ * *June 26.*
+ Create the bindings for Mach.
+ * *July 3.*
+ Work on some kind of basic Java libports
+ to handle receive rights.
+ * *July 10.*
+ Test, write some documentation and examples.
+ * *July 24 (two weeks).*
+ Add the Java target to MIG.
+ * *July 31.*
+ Test, write some documentation and examples.
+ * *August 7.*
+ Try to write a basic but non-trivial translator
+ to evaluate the performance and ease of use of the result,
+ rectify any rough edges this would uncover.
+ * *Last two weeks.*
+ Polish the code,
+ do the packaging.
+
+
+## Conclusion
+
+
+
+
+## Appendix: potential applications of object-capability languages
+
+The work discussed is this last part would have
+fewer immediate benefits for the Hurd project
+and has more of a research orientation.
+It is also unlikely that there would be any time remaining
+to work on it at the end of the summer.
+(Though it could work as some kind of reward
+if I somehow managed to do a prefect job of all the rest
+within the allocated time :-) ).
+As a consequence,
+I don't really consider this a part of my application.
+
+This being said,
+to some extent the project discussed here
+will informed the way I design the Java bindings,
+since it depends on them
+and I intend to work on this at some point in the future.
+I also believe it touches on some interesting ideas,
+and a Summer of Code application is probably
+as good an occasion as any to discuss them.
+
+### Justification
+
+The primary advantage of multi-server operating systems is the ability to
+break what used to be the kernel into small pieces which can be isolated
+from each others. This makes sense from an engineering perspective, as
+smaller components can be swapped with different implementations and reduce
+the impact of bugs.
+A capability-based approach also ensures that the
+authority wielded by components is clear and reduced to the minimum required
+for them to function.
+These properties are crucial to the Hurd's agenda of user freedom,
+since in order to allow them to plug their own code into the system
+[FIXME: développer]
+
+However, this flexibility has a cost. In a system where the isolation of
+components relies on running them inside different address spaces,
+communication between them must be done through IPC calls.
+This introduces a trade-off between the size of the modules
+and performance as well as practicality,
+which imposes a limit to the granularity with which the system
+can be decomposed and the principle of least authority applied
+(to the code within a given process, a Mach port is ambient authority).
+
+Another issue is that of the threading structure of the system as a whole.
+In systems based on a monolithic kernel, user threads execute the kernel
+code themselves, which is then intrinsically concurrent. By contrast, in a
+system based on a “client-server” paradigm, each server must be explicitly
+multi-threaded if it is to serve requests concurrently.
+
+### Object-capability languages
+
+An object-capability language is an object-oriented language which is
+restricted enough so that object references are themselves capabilities.
+
+One such language is Joe-E (FIXME: lien),
+which is an object-capability subset of Java:
+global state and static methods are mostly forbidden;
+careful white-listing of the objects and methods
+provided by the Java standard library
+ensures that compliant code cannot not access ambient autority.
+Ways in which object references can be transferred
+are restricted to the most obvious ones
+(for instance, exceptions are carefully restricted).
+
+As a result, untrusted Joe-E code can be executed without any further
+isolation and its autority can be controlled by carefully limiting the
+object references which are passed to it.
+This would allow to load and execute translators written in Joe-E
+in a single address space.
+
+### Bundling translators into a single process
+
+[mechanisme pour transmettre le code Joe-E
+et les port initiaux au serveur]
+[émulation des différentes tâches]
+
+### Challenges and further work
+
+[proof-carrying code / typed assembly,
+resource accounting (passer en revue la conception de Viengoos?)]
+