[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!meta title="Improved System Object Lookups"]]

The Hurd currently uses its ihash library ([[hurd/libihash]]) as a generic container for various objects. While it does its job, it has been reported to suffer from high collision rates. In addition, the "one size fits all" approach contributes to slowing things down.

One particular use case is looking up an object from a Mach port name, which roughly corresponds to getting the file or socket associated with a file descriptor in traditional Unix systems. It is special because each object actually requires two lookups: the first finds the Mach port from a client port name and is done in the GNU Mach kernel, and the second finds the server object from a server port name.

The best strategy would probably be to directly associate the address of an object with the receive right of its port, eliminating the need for the second lookup, but this is quite an intrusive change in the code base. For the time being, optimizing lookups would already be an improvement.

The goal of this project is to increase system performance by speeding up object lookups, with a particular focus on name-to-object lookups. Note that there is little room for improvement in the kernel name-to-port lookups because of the various optimizations IPC has received in the past.
Looking up server objects from port names could use an algorithm highly tuned for this task, perhaps with better locking (for example shared/exclusive instead of always mutually exclusive). Then, the libihash algorithm could be replaced with a better one, not necessarily hash-based, to improve all the other users.

This task requires proper knowledge of data structure algorithms, taking into account machine properties such as processor caches, as well as the appropriate skills in C and assembly to check the generated code. Being able to perform accurate measurements in a system that lacks modern profiling tools would also be helpful.

Possible mentors: Richard Braun

# IRC, freenode, #hurd, 2013-09-18

In context of [[!message-id "20130918081345.GA13789@dalaran.sceen.net"]].

    braunr: (wrt the gnumach HACK) funny, I was thinking about doing the same for userspace servers, renaming ports to the address of the associated object, saving the need for the hash table...
    teythoon: see http://darnassus.sceen.net/~hurd-web/community/gsoc/project_ideas/object_lookups/
    teythoon: my idea is to allow servers to set a label per port, obtained at message recv time
    because, yes, looking up an object twice is ridiculous
    you normally still want port names to be close to 0 because it allows some data structure optimizations
    braunr: yes, I feared that ports should normally be smallish integers and contiguous at best
    braunr: interesting that you say there that libihash suffers from high collision rates
    I've a theory to why that is, libihash doesn't do any hashing at all
    there are notes about that in the open_issues section of the wiki
    but I figured that this is probably ok for port names, as they are small and contiguous
    braunr: That's called protected payload.
    braunr: The idea is that the kernel appends data to the message in flight.
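To make the protected payload idea concrete, here is a minimal user-space sketch in C. It does not use any real Mach or Hurd interface: `demo_object`, `demo_port`, `demo_msg` and the three `demo_*` functions are hypothetical names, and the "kernel" is only a function that copies a per-port label into the message before delivery. Under those assumptions, it shows how a server could recover its object from a received message without a table lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical server-side object, standing in for a file or socket. */
struct demo_object {
    int id;
};

/* Hypothetical kernel-side port: the receiver stores an opaque label once. */
struct demo_port {
    uint32_t name;    /* port name as seen by the receiver */
    uintptr_t label;  /* opaque value chosen by the receiver */
};

/* Hypothetical received message: the label travels with it. */
struct demo_msg {
    uint32_t local_port;
    uintptr_t label;
};

/* Receiver side: attach the object's address to the port. */
static void demo_port_set_label(struct demo_port *port,
                                struct demo_object *obj)
{
    port->label = (uintptr_t)obj;
}

/* "Kernel" side: fill in the label while the message is in flight. */
static void demo_kernel_deliver(const struct demo_port *port,
                                struct demo_msg *msg)
{
    msg->local_port = port->name;
    msg->label = port->label;
}

/* Receiver side: no hash table, the label is the object address. */
static struct demo_object *demo_server_object(const struct demo_msg *msg)
{
    assert(msg->label != 0);  /* a label must have been set beforehand */
    return (struct demo_object *)msg->label;
}
```

A real implementation would need kernel and message-format support to carry the label; this sketch only shows the effect on the receiving side.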
## IRC, freenode, #hurd, 2013-10-24

    and, with some effort, getting rid of the hash table lookup by letting the kernel provide the address of the object (iirc neil knew the proper term for that)
    teythoon: that is a big interface change
    how so
    optimizing libihash and libpthread should already be a good start
    well how do you intend to add this information ?
    ok, "big" is overstatement, but still, it's a low level interface change that would probably break a lot of things
    store a pointer in the port structure in gnumach, make that accessible somehow
    yes but how ?
    interesting question indeed
    my plan for x15 is to make this "label" part of received messages
    which means you need to change the format of messages
    that is what i call a big change
    ok, so we need to provide an update path
    but once done, the change to hurd will be minimal, patching libports should cover most of that
    normally yes
    so this amounts to messing with gnumach and mig and designing a clever way to make the update process safe
    libihash is known to show high collision rates
    right, libihash
    it could use an integer hash function on the keys to distribute them better
    i think that's already what it tries to do
    so merely using a better hash algorithm such as murmur should do the job
    or use another data structure altogether
    no, it does no hashing of its own on the keys
    are you sure ?
    well, it uses only prime numbers as sizes, and computes key % size
    well that's hashing .. :)
    but this is not really a good hash
    yes
    isn't that what i said ?
    right
    ok, I didn't get that ;)
    also, the sizes start quite small, 3, 7, 19...
    and each time the hash table is grown, all items will have to be updated
    which is why we could consider another data structure
    or, for starters, to thin out that list of sizes
    my personal preference being radix trees
    I assume you have an implementation handy?
    yes
    cool :D
    but good hashing is excellent too
    radix trees have their own issues
    braunr: http://burtleburtle.net/bob/hash/integer.html
    i use thomas wang's hashing function in x15
    or rather, my own personal c utility library, since x15 doesn't hash anything currently
    but murmur is better
    we prefer distribution over hashing performances
    https://131002.net/siphash/
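
The hashing discussion in this log can be illustrated with a small sketch in C. It compares bucket selection by plain `key % size`, which the log attributes to libihash, with the same reduction applied after an integer mixing step. The mixing function is the 32-bit finalizer of MurmurHash3, swapped in here as one example of a stronger integer hash; it is not claimed to be what libihash or x15 uses. The helper names (`mix32`, `bucket_plain`, `bucket_mixed`, `count_collisions`) and the 64-bucket limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit finalizer of MurmurHash3: spreads input bits over the word. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Bucket selection with no hashing at all: key modulo table size. */
static size_t bucket_plain(uint32_t key, size_t size)
{
    assert(size != 0);
    return key % size;
}

/* Bucket selection after mixing the key bits. */
static size_t bucket_mixed(uint32_t key, size_t size)
{
    assert(size != 0);
    return mix32(key) % size;
}

/* Count keys that land in an already occupied bucket (size <= 64). */
static size_t count_collisions(const uint32_t *keys, size_t n, size_t size,
                               size_t (*bucket)(uint32_t, size_t))
{
    unsigned char used[64];
    size_t collisions = 0;

    assert(size <= sizeof(used));
    memset(used, 0, sizeof(used));

    for (size_t i = 0; i < n; i++) {
        size_t b = bucket(keys[i], size);

        if (used[b])
            collisions++;
        else
            used[b] = 1;
    }

    return collisions;
}
```

With a table of size 7, the contiguous keys 0 to 6 occupy seven distinct buckets under plain modulo, which matches the remark that small contiguous port names are handled acceptably. With a table of size 19, the keys 19, 38, 57 and 76 all land in bucket 0 under plain modulo. A mixing step is meant to make the distribution less dependent on such key patterns, at the cost of a few extra instructions per lookup.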