[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

[[!toc]]


# IRC, freenode, #hurd, 2011-05-07

    <braunr> things that are referred to as "system calls" in glibc are
      actually RPCs to the kernel or other tasks, those RPCs have too lookup
      port rights
    <braunr> the main services have tens of thousands of ports, looking up one
      is slow

There is a [[!FF_project 268]][[!tag bounty]] on this task.


# IRC, freenode, #hurd, 2011-04-23

    <braunr> youpi: is there any use of the port renaming facility ?
    <youpi> I don't know
    <braunr> at least, did you see such use ?
    <braunr> i wonder why mach mach_port_insert_right() lets the caller specify
      the port name
    <youpi> ../hurd-debian/hurd/serverboot/default_pager.c:	kr =
      mach_port_rename(	default_pager_self,
    <braunr> mach_port_rename() is used only once, in the default pager
    <braunr> so it's not that important
    <braunr> but mach_port_insert_right() lets userspace task decide the port
      name value
    <youpi> just to repeat myself again, I don't know port stuff very much :)
    <braunr> well you know that a port denotes a right, which denotes a port
    <youpi> yes, but I don't have any real experience with it
    <braunr> err
    <braunr> port name
    <braunr> the only reason I see is that the caller, say /hurd/exec running a
      fork()
    <braunr> hm
    <braunr> no, i don't even see the reason here
    <braunr> port names should be allocated by the kernel only, like file
      descriptors
    <youpi> you can choose file descriptor values too
    <braunr> really ?
    <youpi> with dup2, yes
    <braunr> oh
    <braunr> hm
    <braunr> what's the data structure in current unices to store file
      descriptors ?
    <braunr> a hash table ?
    <youpi> I don't know
    <braunr> i'll have to look at that
    <braunr> FYI, i'm asking these questions because i'm thinking of reworking
      ipc spaces
    <braunr> i believe the use of splay trees completely destroys performance
      of tasks with many many port names such as the root file system
    <youpi> that can be a problem yes
    <youpi> since there are 3 ports per opened file, and like 3 per thread too
    <braunr> + the page cache
    <youpi> with a few thousand opened files and threads, that makes a lot
    <youpi> by "opened file" I meant page cache actually
    <braunr> i saw numbers up to 30k
    <braunr> ok
    <youpi> on buildds I easily see 100k ports
    <braunr> for a single task ?
    <braunr> wow
    <youpi> yes
    <youpi> the page cache is 4k files
    <braunr> so that's definitely worth the try
    <youpi> so that already makes 12k ports
    <youpi> and 4k is not so big
    <braunr> it's limited to 4K ?
    <youpi> I haven't been able to check where the 100k come from yet
    <youpi> braunr: yas
    <braunr> could be leaks :/
    <youpi> yes
    <braunr> omg, a hard limit on the page cache ..
    <youpi> vm/vm_object.c:int		vm_object_cached_max = 4000;	/* may
      be patched*/
    <braunr> mach is really old :(
    <youpi> I've raised it
    <youpi> before it was 200
    <youpi> ...
    <braunr> oO
    <youpi> I tried to dro pthe limit, but then I was lacking memory
    <youpi> which I believe have fixed the other day, but I have to test again
    <braunr> that implementation doesn't know how to deal with memory pressure
    <youpi> yes
    <braunr> i saw your recent changes about adding warnings in such cases
    <braunr> so, back to ipc spaces
    <braunr> i think splay trees 1/ can get very unbalanced easily
    <braunr> which isn't hard to imagine
    <braunr> and 2/ make poor usage of the cpu caches because they're BST and
      write a lot to memory
    <youpi> maybe you could write a patch which would dump statistics on that?
    <braunr> that's part of the job i'm assigning to myself
    <youpi> ok
    <braunr> i'd like to try replacing splay trees with radix trees
    <youpi> I can run it on the buildds
    <youpi> buildds are very good stress-tests :)
    <braunr> :)
    <youpi> 22h building -> 77k ports
    <youpi> 26h building -> 97k ports
    <youpi> the problem is that when I add leak debugging (backtraces), I'm
      getting out of memory :)
    <braunr> that will be a small summer of code outside the gsoc :p
    <braunr> :/
    <braunr> backtraces are very consuming
    <youpi> but that's only because of hardcoded limits
    <youpi> I'll have to test again with bigger limits
    <braunr> again ..
    <braunr> evil hard limits
    <youpi> well, actually we could as well just drop them
    <youpi> but we'd also need to easily get statistics on zone/vm_maps usage
    <youpi> because else we don't see leaks
    <youpi> (except that the machine eventually crashes)
    <braunr> hm
    <braunr> i haven't explained why i was asking my questions actually
    <braunr> so, i want radix trees, because they're nice
    <braunr> they reduce the paths lengths
    <braunr> they don't get too unbalanced (they're invariant wrt the order of
      operations)
    <braunr> they don't need to write to memory on lookups
    <braunr> the only drawback is that they can create much overhead if their
      usage pattern isn't appropriate
    <braunr> elements in such a structure should be close, so that they share
      common nodes
    <youpi> the common usage pattern in ext2fs is a big bunch of ever-open
      ports :)
    <braunr> if there is one entry per node, it's a big waste
    <braunr> yes
    <youpi> there are 3, actually
    <braunr> but the port names have low values
    <braunr> they're allocated sequentially, beginning at 0
    <braunr> (or 1 actually)
    <braunr> which is perfect for radix trees
    <youpi> yes
    <youpi>  97989: send
    <braunr> but if anyone can rename
    <braunr> this introduces a new potential weakness
    <youpi> ah, if it's just a weakness it's probably not a problem
    <youpi> I thought it was even a no-go
    <braunr> i think so
    <youpi> I guess port rename is very seldom
    <braunr> but in a future version, it would be nice not to allow port
      renaming
    <braunr> unless there are similar issues in current unix kernels
    <braunr> in which case i'd say it's acceptable
    <youpi> there are
    <braunr> of that order ?
    <youpi> and it'd be useful for e.g. processing
      tracing/debugging/tweaking/whatever
    <youpi> it's also used to hide fds from a process
    <braunr> port renaming you mean ?
    <youpi> you allocate them very high
    <youpi> yes
    <braunr> ok
    <youpi> choosing your port name, generally
    <youpi> to match what the process expects for instance
    <braunr> then it would be a matter of resource limiting (which we totally
      lack afaik)
    <braunr> along the number of maximum open files, you would have a number of
      maximum rights
    <braunr> does that seem fine to you ?
    <youpi> if done throught rlimits, sure
    <braunr> something similar yes
    <youpi> (_no_ PORTS_MAX ;) )
    <braunr> oh and, in addition, i remember gnumach has a special
      configuration of the processor in which caching is limited
    <braunr> like write-through only
    <youpi> didn't I fix that recently ?
    <braunr> i don't know :)
    <braunr> CR0=e001003b
    <braunr> i don't think it's fixed
    <youpi> I mean, in the git
    <braunr> ah
    <youpi> not in the debian package
    <braunr> didn't tried the git version yet
    <braunr> last time i tried (which was a long time ago), it made the kernel
      crash
    <braunr> have you figured why ?
    <youpi> I'm not aware of that
    <braunr> anyway, splay trees write a lot, and most trees write a lot even
      at insertion/removal to rebalance
    <youpi> braunr: Mmm, there's no clearance of CD in the kernel actually
    <braunr> with radix trees, even if caching can't be fully enabled, it would
      make much better use of it
    <braunr> so if port renaming isn't a true issue, i'll choose that data
      structure
    <youpi> that'd probably be better yes
    <youpi> I'm surprised by the CD, I do remember fixing something like this
      lately
    <braunr> there are several levels where CD can be set
    <braunr> the processors ORs all those if i'm right
    <braunr> to determine if caching is enabled 
    <youpi> I know
    <braunr> ok
    <youpi> but in my memory that was at the CR* level, precisely
    <braunr> maybe for xen only ?
    <youpi> no
    <braunr> well good luck if you hunt that one, i'm off, see you :)
    <youpi> braunr: ah, no, it was the PGE flag that I had fixed

    <antrik> braunr: explicit port naming is used for example to pass some
      initial ports to a new task at well-known places IIRC
    <antrik> braunr: but these tend to be low numbers, so I don't see a problem
      there
    <antrik> (I'm not familiar with radix trees... why would high numbers be a
      problem?)

    <youpi> braunr: iirc the ipc space is limited to ~192k ports

    <braunr> antrik: in most cases i've seen, the insert_right() call is used
      on task_self()
    <braunr> and if there really are special ports (like the bootstrap or
      device ports), they should have special names
    <braunr> IIRC, these ports are given through command line expansion by the
      kernel at boot time
    <braunr> but it seems reasonable to think of port renaming as a potentially
      useful feature
    <braunr> antrik: the problem with radix trees isn't them being high, it's
      them being sparse
    <braunr> you get the most efficient trees when entries have keys that are
      close to each other
    <braunr> because radix trees are a type of tries (the path in the tree is
      based on the elements composing the key)
    <braunr> so the more common prefixes you have, the less external nodes you
      need
    <braunr> here, keys are port names, but they can be memory addresses or
      offsets in memory objects (like in the page cache)
    <braunr> the radix algorithm takes a few bits, say 4 or 6, at a time from a
      key, and uses that as an index in a node
    <braunr> if keys are sparse, there can be as little as one entry per node
    <braunr> IIRC, the worst case (on entry per node with the maximum possible
      number of nodes for a 32-bits key) is 2% entries
    <braunr> the reste being null entries and almost-empty nodes containing
      them
    <braunr> so if you leave the ability to give port rights the names you
      want, you can create such worst case trees
    <braunr> which may consume several MiB of memory per tree
    <braunr> tens of MiB i'd say
    <braunr> on the other hand, in the current state, almost all hurd
      applications use sequentially allocated port names, close to 0 (which
      allows a nice optimization)
    <braunr> so a radix ree would be the most efficient
    <antrik> well, if some processes really feel they must use random numbers
      for port names, they *ought* to be penalized ;-)


# IRC, freenode, #hurd, 2011-04-27

    <braunr> antrik: remember when you asked why high numbers would be a
      problem with radix trees ?
    <braunr> here is a radix tree with one entry, which key is around 5000
    <braunr> [  656.296412] tree height: 3
    <braunr> [  656.296412] index:  0, level:  0, height:  3, count:  1,
      bitmap: 0000000000000002
    <braunr> [  656.296412] index:  1, level:  1, height:  2, count:  1,
      bitmap: 0000000000004000
    <braunr> [  656.296412] index: 14, level:  2, height:  1, count:  1,
      bitmap: 0000000000000080
    <braunr> three levels, each with an external node (dynamically allocated),
      for one entry
    <braunr> so in the worst case of entries with keys close to the highest
      values, the could be many external nodes with higher paths lengths than
      when keys are close to 0
    <braunr> which also brings the problem of port name allocation
    <braunr> can someone with access to a buildd which has an uptime of at
      least a few days (and did at least one build) show me the output of
      portinfo 3 | tail ?
    <braunr> port names are allocated linearly IIRC, like PIDs, and some parts
      of the kernel may rely on them not being reused often
    <braunr> but for maximum effifiency, they should be
    <braunr> efficiency*
    <braunr> 00:00 < braunr> can someone with access to a buildd which has an
      uptime of at least a few days (and did at least one build) show me the
      output of portinfo 3 | tail ?
    <braunr> :)
    <youpi> it's almost like wc -l
    <youpi>   4905: receive
    <youpi> vs 4647
    <youpi> for /
    <youpi>  52902: receive
    <youpi> vs 52207
    <youpi> for the chroot
    <braunr> even after several builds ?
    <braunr> and several days ?
    <youpi> that's after 2 days
    <youpi> it's not so many builds
    <youpi> rossini is not so old
    <youpi> (7h)
    <youpi> but many builds
    <youpi> 70927: send
    <youpi> vs 70938
    <braunr> ok
    <braunr> so it seems port names are reused
    <braunr> good
    <youpi> yes they are clearly
    <braunr> i think i remember a comment about why the same port name
      shouldn't be reused too soon
    <youpi> well, it could help catching programming errors
    <braunr> that it helped catch bugs in applications that could
      deallocate/reallote quickly
    <braunr> reallocate*
    <braunr> without carefuly synchronization
    <braunr> careful
    <braunr> damn, i'm tired :/
    <youpi> but that's about debugging
    <youpi> so we don't care about performance there
    <braunr> yes
    <braunr> i'll try to improve allocation performance too
    <braunr> using e.g. bitmaps in each external node back to the root so that
      unused slots are quickly found
    <braunr> i thknk that's what idr does in linux
    <antrik> braunr: idr?
    <braunr> antrik: a data structure used to map integers to pointers
    <braunr> http://fxr.watson.org/fxr/source/lib/idr.c?v=linux-2.6


# IRC, freenode, #hurd, 2011-06-08

    <braunr> hm, reverse space/port to name lookups also suck
    <braunr> having separate types for simple ipc entries and splay tree
      entries really makes many parts of the ipc code complicated
    <braunr> and a global hash table for these operations is scary


# IRC, freenode, #hurd, 2011-06-09

    <braunr> hm nice, my radix tree code runs inside gnumach, along with the
      original splay tree code and assertions making sure results are the same
    <braunr> there is this "collision" thing i'm not sure to understand but
      once this is solved, replacing the splay trees should be easy

    <braunr> youpi: is there a way to easily know the number of send rights
      associated to a port ?
    <youpi> portinfo ?
    <braunr> portinfo gives information in a space
    <braunr> but this is specific to a port
    <braunr> is there an option for that ?
    <youpi> -v
    <braunr> hm ok
    <youpi>     25: send (refs: 550)
    <braunr> nice
    <braunr> youpi: if you have time, could you give me the min/max/avg numbers
      of send rights referring to the same port on buildds ?
    <braunr> i'm trying to estimate if it's better to have space->list_of_ports
      or port->list_of_spaces to replace the global ipc hash table
    <braunr> the latter seems better but there could be unexpected cases on
      machines using large amounts of resources like the buildds
    <youpi> max is 64k
    <youpi> min is 1 of course :)
    <braunr> 64k
    <braunr> then it's not what i'm looking for
    <youpi> avg is 55
    <braunr> isn't this the number of urefs ?
    <youpi> I don't know
    <braunr> hmm
    <braunr> what i'm looking for is the number of *pure send rights* for the
      same port
    <braunr> i don't think portinfo can give it
    <braunr> there can only be one such send right per task for the same port
    <braunr> 64k would mean there are 64k tasks
    <youpi> ok, so it's more difficult
    <youpi> it means using -t
    <braunr> ahh
    <youpi> and run n^2 portinfo over the n processes
    <braunr> i see
    <youpi> Mmm, that will however still show any duplicate send right
    <youpi> but then by min/max/avg, you mean, over time ?
    <braunr> i'll change the source code, simpler
    <youpi> e.g. min would be right after boot?
    <braunr> min is 1
    <youpi> 1 what ?
    <braunr> 1 send right to a port
    <youpi> ah, 1 for a given port
    <braunr> yes
    <youpi> ok, it becomes really hairy to compute, I don't hav ethe time :)
    <braunr> avg and max are more interesting :)
    <braunr> no worries
    <youpi> braunr: I wouldn't be surprised that max is the number of tasks
    <youpi> e.g. for a send port to the proc server for instance
    <braunr> youpi: it is, but i'm not looking for potential numbers
    <youpi> I'm not talking about a potential number, but an actual number
      that's almost always true
    <braunr> for one port, yes
    <braunr> but yes, ok for max
    <braunr> this makes choosing an appropriate data structure difficult

    <antrik> braunr: actually, min number of send rights to a port is 0... but
      I'm sure you know that already :-)

    <antrik> youpi: normally each client gets a separate port. I'm not sure
      there are any ports with send rights distributed over many tasks...

    <jkoenig> antrik, what about / ?

    <youpi> antrik: not necessarily

    <antrik> jkoenig: not sure... isn't the "/" port authenticated to the
      specific user?

    <jkoenig> antrik, I guess so, but a single user could still have many
      tasks.
    <jkoenig> (wrt /)
    <antrik> jkoenig: well, in theory the tasks having exactly the same UIDs
      and GITs could probably share an auth token... but that's not how things
      are handled in general
    <antrik> at least I don't think so
    <antrik> tasks are authenticated, not users
    <antrik> err... GIDs :-)
    <jkoenig> antrik, still, my quick glance to the fork() code seemed to
      indicate the port is inherited as-is, maybe authentication happens only
      when something is actually looked up?
    <jkoenig> hmm "rpctrace ls -d /" does not show any authentication calls,
      only a lookup("") on the root which returns a different port
    <jkoenig> so I guess the root port is "deauthenticated" or something when
      the uid of a process is changed.
    <antrik> too bad cfhammer isn't around, he digged into all this stuff...
    <antrik> I know that there is a mechanism which reauths all FDs when the
      IDs of a process change
    <antrik> but I'm not sure the "/" port uses this mechanism

    <braunr> antrik: but the radix tree codee is really used as is, which means
      no locking, no preloading before locking, all of this because simple
      locks don't exist on UP, and because the kernel isn't preemptible

    <braunr> antrik: and yes, min number is 0, but in that case you don't need
      (space, port) -> right lookups :)
    <braunr> antrik: or put in another way, whichever reasonable structure you
      use, when it's empty, you don't care much
    <braunr> which also means that the min number has actually no value here
    <braunr> because the same applies to 1

    <braunr> then what seems to take most time is forks
    <braunr> and i hope my upcoming kernel changes will help the situation a
      bit
    <pinotree> what are your incoming gnumach changes about?
    <braunr> the ipc translation layer speed
    <braunr> which basically means operating on port names (looking them up,
      inserting, removing, renaming, looking up send rights to a specific
      ports) will be faster
    <braunr> and i believe forks are (one of) the most demanding use cases wrt
      ipc space manipulation
    <braunr> i'm really surprised how badly the splay trees are used
    <braunr> the worst case for this data structure is traversal
    <braunr> and this is done in many situations
    <braunr> leaving the tree in its worst case shape
    <braunr> and i didn't mentioned the bunch of memory writes occurring, event
      for things like lookups or traversals
    <braunr> this is slow and can disrupt many cpu cache lines
    <braunr> and when there are 10k to 100+k (e.g. in some ext2fs instances on
      buildds), just imagine the number of operations involved in those
      operations
    <braunr> a simple traversal_next involves a rotation *gasp*
    <braunr> this means potentially writing on 3 different cache lines, for
      *one* next operation
    <pinotree> what are you replacing that splay tree with?
    <braunr> radix trees
    <braunr> much shorter paths
    <braunr> extremely few memory writes
    <braunr> locality of reference when traversing
    <braunr> good cache usage (as many of the top nodes are reused)
    <braunr> the two drawbacks are 1/ memory allocation for external nodes,
      which means the tree must be preloaded before locking
    <braunr> and 2/ high memory overhead if the keys are sparse
    <braunr> but this isn't the case with port names, unless someone messes it
      up by assigning random names to many rights

    <antrik> braunr: so, when will we see the first performance comparision?
      :-)
    <braunr> antrik: that's a topic of itself, how to compare ?
    <braunr> antrik: the thing is, my current gnumach patches only makes
      assertions
    <braunr> this is the best way i found to test my tree in real life
      conditions
    <braunr> much cleanup is needed
    <braunr> and what i'd like to do is to completely replace all teh
      translation layer structures with it
    <braunr> which means removing much code, making sure it still works as
      expected
    <braunr> this is tedious
    <braunr> so one month at least
    <antrik> braunr: comparing shouldn't be too hard... the average configure
      script does a lot of forking, which should be a good benchmark according
      to your observations
    <braunr> rough estimates are easy, yes
    <braunr> but my observations my be wrong :p
    <antrik> braunr: well, we don't really need precise numbers...
    <antrik> unless you need to do some kind of fine-tuning?
    <braunr> i don't know yet


# IRC, freenode, #hurd, 2011-06-18

    < braunr> hmm, i'm having a problem with integrating my radix tree code in
      gnumach
    < braunr> inserting into such a tree can trigger memory allocation
    < braunr> so commonly, the tree i loaded with nodes before insertion,
      usually if it requires strong locking
    < braunr> ipc spaces are locked using "simple locks" (which are spin locks)
    < braunr> but spin locks are noops on UP, and gnumach is always UP ..
    < braunr> so, should i still include preloading code, even if it'll end up
      dead code ?
    < antrik> hm... I think we discussed this before; but isn't gnumach
      supposed to be SMP-capable, minus bugs?...
    < braunr> it is
    < braunr> but ofc, if i choose not to include preloading, i'll write
      #errors so that the day gnumach is built for SMP again, such support will
      be included
    < antrik> oh, sorry, I think I misread. what is UP?
    < braunr> uniprocessor
    < antrik> well, if it's just bugs forcing the current UP state, I think
      saying that gnumach is always UP is a stretch...
    < braunr> sure, but it's a practical consideration
    < antrik> does the locking complicate stuff? or is it just performance
      considerations?
    < braunr> no it's about correctness and completeness
    < braunr> if you don't preload a tree before locking
    < braunr> and memory allocation occurs while you're holding a simple lock
    < braunr> and memory allocation requires the kernel to sleep
    < braunr> you're screwed
    < braunr> but i hate the idea of including code that won't be used and
      which won't be easy to test
    < braunr> so i'm wondering if it's ok for now to just put this in a TODO
      comment and write it when the time is right
    < braunr> or if i should spens the week adding this and tweaking the
      userspace implementation to "emulate" spin locks
    < antrik> well, it's tricky situation. on one hand, it seems stupid to
      include handling for something that presently isn't used, and it's not
      clear when it will. on the other hand, I'd rather not see additional
      problems introduced that will make fixing SMP even harder...
    < braunr> that's why i'm asking here
    < antrik> of course, you could resolve this question by fixing SMP
      first... ;-)
    < braunr> ew
    < antrik> well, I guess it would be best first to make the code work... and
      we can still decide about the locking thing before it goes mainline I'd
      say?
    < braunr> "make the code work" ?
    < antrik> I mean make gnumach work with your radix tree code
    < braunr> without preloading then
    < antrik> yeah... as a first step... I guess adding it later won't be any
      harder than adding it right now?
    < braunr> not much
    < braunr> testing is what requires time really


# IRC, freenode, #hurd, 2011-06-27

    < braunr> ok, here is the radix tree code:
      http://git.sceen.net/rbraun/libbraunr.git/
    < braunr> the preloading stuff will be added in the kernel only, as it's
      really pointless and not easily doable in userspace
    < youpi> preloading?
    < braunr> youpi: yes, preloading
    < braunr> radix trees allocate external nodes
    < youpi> well, providing a url at some random time of some random day is
      not a great way  to get eyes on it :)
    < braunr> and ipc spaces are locked when inserting/allocating names
    < braunr> we normally don't need preloading in gnumach
    < braunr> since there is no preemption nor SMP
    < braunr> but in case someone changes that, i'd like the code to be mostly
      ready
    < braunr> and correctly handle those ugly simple locks
    < braunr> youpi: is what i say clear enough or do you need more background
      on what is done ?
    < youpi> about preloading?
    < braunr> yes
    < youpi> I guess it means allocating nodes in advance?
    < braunr> yes
    < youpi> k
    < braunr> before locking the ipc spaces


# IRC, freenode, #hurd, 2011-06-28

    < braunr> antrik: i think i won't write the code for the preloading stuff
      actually
    < braunr> antrik: it's not very difficult, but i really hate the idea of
      not being able to reliably test it
    < braunr> antrik: and i'd rather concentrate on integrating the radix tree
      code in gnu mach now
    < braunr> (i've already removed much code, even some files which weren't
      actually used before my changes !)
    < braunr> hmm, i won't be able not to write the preloading code after all
    < antrik> braunr: not able not to write? how's that?
    < braunr> antrik: it's actually required
    < braunr> there are three functions, ipc_entry_get, ipc_entry_alloc, and
      ipc_entry_grow_table
    < braunr> ipc_entry_get cannot allocate memory
    < braunr> if it fails, ipc_entry_grow_table is called, which will allocate
      memory
    < braunr> ipc_entry_alloc calls both of them depending on the result of
      ipc_entry_get
    < braunr> this is the equivalent of the preloading thing i had in mind
    < braunr> not a bad thing after all
    < braunr> the only thing i'm afraid of are the "optimized" version of those
      ipc functions in te so-called fast paths
    < braunr> i'm afraid if i don't deal right with those, the kernel may end
      up using mostly slow paths
    < braunr> but considering the purpose of those fast paths was merely to
      avoid the overhead of function calls and some locking functions, it
      shouldn't be that bad
    < braunr> this is such a mess eh
    < antrik> hurray microoptimisations ;-)
    < braunr> there, the preload functions are done, easy :)
    < antrik> braunr: seems you spent less time implementing it than pondering
      whether you should implement it ;-)
    < braunr> well, i couldn't implement it correctly before knowing what
      should have been done exactly
    < braunr> and there are still other problems :/
    < braunr> and the other problems make me reconsider if this was useful at
      all eh
    < braunr> youpi: i'm unable to find where ipc tree entries are released
      except in ipc_entry_alloc_name(), which could mean they're leaked ...
    < braunr> youpi: would you have time to take a look ?
    < youpi> they aren't in ipc_entry_dealloc() ?
    < braunr> no .....
    < youpi> it's not so unprobable that they're only freed when the task quits
    < braunr> i don't see that either
    < braunr> i only see them released in ipc_entry_alloc_name()
    < braunr> so maybe they're reused
    < braunr> but i'm not sure about that when reading the code
    < braunr> oh wait, yes, they are :/
    < braunr> my bad
    < youpi> in the ipc_splay_tree_* fucntions I guess?
    < braunr> yes
    < braunr> it's just surprsing to see them allocated outside the tree code
      only
    < braunr> but released in both the entry and the splay tree code ...


# IRC, freenode, #hurd, 2011-06-29

    < braunr> hmm i missed an important thing :/
    < braunr> and it's scary
    < braunr> it looks like the splay tree is mainly used when names are
      provided
    < braunr> whereas the entry table is used when names are allocated
    < braunr> which means the table is the main ipc data structure, even for
      tasks with lots of rights
    < braunr> i can make my root ext2fs have more than 10k rights, and i see
      the ipc table table grow along that number ...
    < braunr> now thetable has 15k+ entries
    < braunr> IOW there is no point to put the radix tree code in gnumach :(
    < antrik> braunr: what do you mean by "provided" and "allocated"?
    < antrik> and what is that table you are talking about?
    < braunr> antrik: provided means the user space tasks gives the name of the
      new right
    < braunr> antrik: allocated means the kernel generates it
    < braunr> antrik: the table i'm talking about is is_table in struct
      ipc_space
    < braunr>  55  *      Every space has a non-NULL is_table with
      is_table_size entries.
    < braunr>  56  *      A space may have a NULL is_tree.  is_tree_small
      records the
    < braunr>  57  *      number of entries in the tree that, if the table were
      to grow
    < braunr>  58  *      to the next larger size, would move from the tree to
      the table.
    < braunr> here is the description which mislead me (in addition of the
      obscure code)
    < braunr>  50  *      Spaces hold capabilities for ipc_object_t's (ports
      and port sets).
    < braunr>  51  *      Each ipc_entry_t records a capability.  Most
      capabilities have
    < braunr>  52  *      small names, and the entries are elements of a table.
    < braunr>  53  *      Capabilities can have large names, and a splay tree
      holds
    < braunr>  54  *      those entries.  The cutoff point between the table
      and the tree
    < braunr>  55  *      is adjusted dynamically to minimize memory
      consumption.
    < antrik> ah, so the rights with a low name are in a linear table, and only
      those with "random" high names are stored in the splay tree instead?
    < antrik> seems a rather complex design... I guess though there isn't much
      room for further optimisation there :-(
    < antrik> (well, except for code size optimisation -- which could in fact
      make a considerable difference...)
    < braunr> well there are problems with this approach, but most don't
      concern performance
    < braunr> when the table gets big (close to the page size or more), it gets
      remapped when reallocated
    < braunr> which will incur some penalty because of the tlb
    < braunr> but it's annoying even for small tables
    < braunr> the initial table size is 4 entries, and from what i can see,
      most tables are 128 entries wide when tasks are destroyed
    < braunr> an obvious simple optimization is to set a larger default size
    < braunr> the same applies for the dead name tables
    < braunr> those reallocations are a pain, and they're due to this design
    < braunr> they can also fail because of fragmentation
    < braunr> there would be a point to radix trees if they would replace all
      that, and not just the splay tree
    < braunr> but this would cause a lot of changes in a lot of places, and in
      particular the "optimized" fast paths i mentioned yesterday
    < braunr> we'll see how they perform in x15 :>
    < braunr> there is a slight noticeable improvement when increasing the
      initial size of the entry table
    < antrik> braunr: well, if you use them in a completely different
      implementation, there will be no way of telling whether they make a
      difference
    < antrik> how did you test the improvement?
    < braunr> antrik: no actually it's completely negligeable
    < braunr> hm
    < braunr> is that a valid word ? :)
    < braunr> negligible
    < braunr> youpi: did you see my comments about the ipc stuff this earlier
      today ?
    < braunr> youpi: well to make things short, when port names are allocated,
      the right they refer to is allocated from the ipc table
    < braunr> youpi: the splay tree is only used for user provided names
    < braunr> youpi: i had tables as large as the number of rights in a space
      (i could easily reach 20k)
    < braunr> youpi: whereas the splay trees had at most ~40 entries ..