The Hurd currently uses its ihash library (libihash) as a generic container for various objects. While it does its job, it has been reported to suffer from high collision rates. In addition, the "one size fits all" approach contributes to slow things down. One particular use case is looking up an object from a Mach port name, which basically translates to getting the file or socket associated with a file descriptor in traditional Unix systems. It's particular because there are actually two lookups for each object, the first being finding the Mach port from a client port name, which is done in the GNU Mach kernel, and the second being finding the server object from a server port name. The best strategy would probably be to directly associate the address of an object to the receive right of its port, eliminating the need to look up again, but this is quite an intrusive change in the code base. For the time being, optimizing lookups would already be an improvement.
The goal of this project is to increase system performance by speeding up object lookups, with a particular focus on name-to-object lookups. Note that there is little room for improvement in the kernel name-to-port lookups because of the various optimizations IPC has received in the past. Looking up server objects from port names could use an algorithm highly tuned for this task, perhaps with better locking (shared/exclusive instead of always mutually exclusive for example). Then, the libihash algorithm could be replaced with a better one, not necessarily a hash based one, to improve all the other users.
This task requires proper knowledge of data structure algorithms, taking into account machine properties such as processor caches, as well as the appropriate skills in C and assembly to check the generated code. Being able to perform accurate measurements in a system that lacks modern profiling tools would also be helpful.
Possible mentors: Richard Braun
IRC, freenode, #hurd, 2013-09-18
In context of
<teythoon> braunr: (wrt the gnumach HACK) funny, I was thinking about doind the same for userspace servers, renaming ports to the address of the associated object, saving the need for the hash table... <braunr> teythoon: see http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/object_lookups/ <braunr> teythoon: my idea is to allow servers to set a label per port, obtained at mesage recv time <braunr> because, yes, looking up an object twice is ridiculous <braunr> you normally still want port names to be close to 0 because it allows some data structure optimizations <teythoon> braunr: yes, I feared that ports should normally be smallish integers and contigious at best <teythoon> braunr: interesting that you say there that libihash suffers from high collision rates <teythoon> I've a theory to why that is, libihash doesn't do any hashing at all <pinotree> there are notes about that in the open_issues section of the wiki <teythoon> but I figured that this is probably ok for port names, as they are small and contigious <neal> braunr: That's called protected payload. <neal> braunr: The idea is that the kernel appends data to the message in flight.