[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

There is a [[!FF_project 268]][[!tag bounty]] on this task.

IRC, freenode, #hurd, 2011-04-23

    youpi: is there any use of the port renaming facility ?
    I don't know
    at least, did you see such use ?
    i wonder why mach mach_port_insert_right() lets the caller specify the port name
    ../hurd-debian/hurd/serverboot/default_pager.c: kr = mach_port_rename( default_pager_self,
    mach_port_rename() is used only once, in the default pager
    so it's not that important
    but mach_port_insert_right() lets userspace task decide the port name value
    just to repeat myself again, I don't know port stuff very much :)
    well you know that a port denotes a right, which denotes a port
    yes, but I don't have any real experience with it
    err port name
    the only reason I see is that the caller, say /hurd/exec running a fork()
    hm no, i don't even see the reason here
    port names should be allocated by the kernel only, like file descriptors
    you can choose file descriptor values too
    really ?
    with dup2, yes
    oh
    hm
    what's the data structure in current unices to store file descriptors ?
    a hash table ?
    I don't know
    i'll have to look at that
    FYI, i'm asking these questions because i'm thinking of reworking ipc spaces
    i believe the use of splay trees completely destroys performance of tasks with many many port names
    such as the root file system
    that can be a problem yes
    since there are 3 ports per opened file, and like 3 per thread too
    + the page cache
    with a few thousand opened files and threads, that makes a lot
    by "opened file" I meant page cache actually
    i saw numbers up to 30k
    ok
    on buildds I easily see 100k ports
    for a single task ?
    wow
    yes
    the page cache is 4k files
    so that's definitely worth the try
    so that already makes 12k ports
    and 4k is not so big
    it's limited to 4K ?
    I haven't been able to check where the 100k come from yet
    braunr: yas
    could be leaks :/
    yes
    omg, a hard limit on the page cache ..
    vm/vm_object.c:int vm_object_cached_max = 4000; /* may be patched*/
    mach is really old :(
    I've raised it
    before it was 200
    ...
    oO
    I tried to drop the limit, but then I was lacking memory
    which I believe have fixed the other day, but I have to test again
    that implementation doesn't know how to deal with memory pressure
    yes
    i saw your recent changes about adding warnings in such cases
    so, back to ipc spaces
    i think splay trees 1/ can get very unbalanced easily
    which isn't hard to imagine
    and 2/ make poor usage of the cpu caches because they're BST and write a lot to memory
    maybe you could write a patch which would dump statistics on that?
    that's part of the job i'm assigning to myself
    ok
    i'd like to try replacing splay trees with radix trees
    I can run it on the buildds
    buildds are very good stress-tests :)
    :)
    22h building -> 77k ports
    26h building -> 97k ports
    the problem is that when I add leak debugging (backtraces), I'm getting out of memory :)
    that will be a small summer of code outside the gsoc :p
    :/
    backtraces are very consuming
    but that's only because of hardcoded limits
    I'll have to test again with bigger limits
    again ..
    evil hard limits
    well, actually we could as well just drop them
    but we'd also need to easily get statistics on zone/vm_maps usage
    because else we don't see leaks
    (except that the machine eventually crashes)
    hm
    i haven't explained why i was asking my questions actually
    so, i want radix trees, because they're nice
    they reduce the paths lengths
    they don't get too unbalanced (they're invariant wrt the order of operations)
    they don't need to write to memory on lookups
    the only drawback is that they can create much overhead if their usage pattern isn't appropriate
    elements in such a structure should be close, so that they share common nodes
    the common usage pattern in ext2fs is a big bunch of ever-open ports :)
    if there is one entry per node, it's a big waste
    yes
    there are 3, actually
    but the port names have low values
    they're allocated sequentially, beginning at 0 (or 1 actually)
    which is perfect for radix trees
    yes
    97989: send
    but if anyone can rename
    this introduces a new potential weakness
    ah, if it's just a weakness it's probably not a problem
    I thought it was even a no-go
    i think so
    I guess port rename is very seldom
    but in a future version, it would be nice not to allow port renaming
    unless there are similar issues in current unix kernels
    in which case i'd say it's acceptable
    there are
    of that order ?
    and it'd be useful for e.g. processing tracing/debugging/tweaking/whatever
    it's also used to hide fds from a process
    port renaming you mean ?
    you allocate them very high
    yes
    ok
    choosing your port name, generally
    to match what the process expects for instance
    then it would be a matter of resource limiting (which we totally lack afaik)
    along the number of maximum open files, you would have a number of maximum rights
    does that seem fine to you ?
    if done through rlimits, sure
    something similar yes
    (_no_ PORTS_MAX ;) )
    oh and, in addition, i remember gnumach has a special configuration of the processor in which caching is limited
    like write-through only
    didn't I fix that recently ?
    i don't know :)
    CR0=e001003b
    i don't think it's fixed
    I mean, in the git
    ah
    not in the debian package
    didn't try the git version yet
    last time i tried (which was a long time ago), it made the kernel crash
    have you figured why ?
    I'm not aware of that
    anyway, splay trees write a lot, and most trees write a lot even at insertion/removal to rebalance
    braunr: Mmm, there's no clearance of CD in the kernel actually
    with radix trees, even if caching can't be fully enabled, it would make much better use of it
    so if port renaming isn't a true issue, i'll choose that data structure
    that'd probably be better yes
    I'm surprised by the CD, I do remember fixing something like this lately
    there are several levels where CD can be set
    the processor ORs all those if i'm right
    to determine if caching is enabled
    I know
    ok
    but in my memory that was at the CR* level, precisely
    maybe for xen only ?
    no
    well good luck if you hunt that one, i'm off, see you :)
    braunr: ah, no, it was the PGE flag that I had fixed
    braunr: explicit port naming is used for example to pass some initial ports to a new task at well-known places IIRC
    braunr: but these tend to be low numbers, so I don't see a problem there
    (I'm not familiar with radix trees... why would high numbers be a problem?)
    braunr: iirc the ipc space is limited to ~192k ports
    antrik: in most cases i've seen, the insert_right() call is used on task_self()
    and if there really are special ports (like the bootstrap or device ports), they should have special names
    IIRC, these ports are given through command line expansion by the kernel at boot time
    but it seems reasonable to think of port renaming as a potentially useful feature
    antrik: the problem with radix trees isn't them being high, it's them being sparse
    you get the most efficient trees when entries have keys that are close to each other
    because radix trees are a type of tries (the path in the tree is based on the elements composing the key)
    so the more common prefixes you have, the fewer external nodes you need
    here, keys are port names, but they can be memory addresses or offsets in memory objects (like in the page cache)
    the radix algorithm takes a few bits, say 4 or 6, at a time from a key, and uses that as an index in a node
    if keys are sparse, there can be as little as one entry per node
    IIRC, the worst case (one entry per node with the maximum possible number of nodes for a 32-bits key) is 2% entries
    the rest being null entries and almost-empty nodes containing them
    so if you leave the ability to give port rights the names you want, you can create such worst case trees
    which may consume several MiB of memory per tree
    tens of MiB i'd say
    on the other hand, in the current state, almost all hurd applications use sequentially allocated port names, close to 0 (which allows a nice optimization)
    so a radix tree would be the most efficient
    well, if some processes really feel they must use random numbers for port names, they *ought* to be penalized ;-)
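
The sparseness argument at the end of the log can be illustrated with a small sketch. This is not GNU Mach code: every name here (`radix_tree`, `radix_insert`, `radix_lookup`, `radix_nodes_needed`) is hypothetical, and the sketch assumes a fixed-height tree that consumes 4 bits of a 32-bit key per level, with none of the height or memory optimizations a real implementation would likely have. It only counts how many nodes a given set of port names forces the tree to allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RADIX_BITS   4u                   /* bits of the key consumed per level */
#define RADIX_SLOTS  (1u << RADIX_BITS)   /* 16 slots per node */
#define RADIX_LEVELS (32u / RADIX_BITS)   /* 8 levels for a 32-bit key */

struct radix_node {
    void *slots[RADIX_SLOTS];   /* child nodes, or entries at the last level */
};

struct radix_tree {
    struct radix_node *root;
    size_t nr_nodes;            /* nodes allocated so far */
};

static struct radix_node *radix_node_create(struct radix_tree *tree)
{
    struct radix_node *node = calloc(1, sizeof(*node));

    assert(node != NULL);
    tree->nr_nodes++;
    return node;
}

/* Walk from the most significant bits down, creating missing nodes. */
static void radix_insert(struct radix_tree *tree, uint32_t key, void *entry)
{
    struct radix_node *node;

    if (tree->root == NULL)
        tree->root = radix_node_create(tree);

    node = tree->root;

    for (unsigned level = 0; level + 1 < RADIX_LEVELS; level++) {
        unsigned shift = (RADIX_LEVELS - 1 - level) * RADIX_BITS;
        unsigned index = (key >> shift) & (RADIX_SLOTS - 1);

        if (node->slots[index] == NULL)
            node->slots[index] = radix_node_create(tree);

        node = node->slots[index];
    }

    node->slots[key & (RADIX_SLOTS - 1)] = entry;
}

/* Lookups only read memory; nothing is rebalanced or rewritten. */
static void *radix_lookup(const struct radix_tree *tree, uint32_t key)
{
    const struct radix_node *node = tree->root;

    for (unsigned level = 0; node != NULL && level + 1 < RADIX_LEVELS; level++) {
        unsigned shift = (RADIX_LEVELS - 1 - level) * RADIX_BITS;

        node = node->slots[(key >> shift) & (RADIX_SLOTS - 1)];
    }

    return (node != NULL) ? node->slots[key & (RADIX_SLOTS - 1)] : NULL;
}

static void radix_free(struct radix_node *node, unsigned level)
{
    if (node == NULL)
        return;

    /* Only non-final levels hold child nodes; final slots hold entries. */
    if (level + 1 < RADIX_LEVELS)
        for (unsigned i = 0; i < RADIX_SLOTS; i++)
            radix_free(node->slots[i], level + 1);

    free(node);
}

/* Build a tree from the given keys and report how many nodes it needed. */
size_t radix_nodes_needed(const uint32_t *keys, size_t nr_keys)
{
    static int dummy_entry;     /* placeholder standing in for a port right */
    struct radix_tree tree = { NULL, 0 };
    size_t nr_nodes;

    for (size_t i = 0; i < nr_keys; i++)
        radix_insert(&tree, keys[i], &dummy_entry);

    for (size_t i = 0; i < nr_keys; i++)
        assert(radix_lookup(&tree, keys[i]) == &dummy_entry);

    nr_nodes = tree.nr_nodes;
    radix_free(tree.root, 0);
    return nr_nodes;
}
```

Under these assumptions, 1024 sequentially allocated names (0 to 1023) need 74 nodes, whereas 1024 names spaced 2^20 apart (`i << 20`) need 5189 nodes, since from the fourth level down each name ends up with a private chain of nodes. That is the kind of blow-up arbitrary port naming would make possible.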