summaryrefslogtreecommitdiff
path: root/open_issues/rework_gnumach_ipc_spaces.mdwn
diff options
context:
space:
mode:
Diffstat (limited to 'open_issues/rework_gnumach_ipc_spaces.mdwn')
-rw-r--r--open_issues/rework_gnumach_ipc_spaces.mdwn728
1 files changed, 0 insertions, 728 deletions
diff --git a/open_issues/rework_gnumach_ipc_spaces.mdwn b/open_issues/rework_gnumach_ipc_spaces.mdwn
deleted file mode 100644
index 20ae126d..00000000
--- a/open_issues/rework_gnumach_ipc_spaces.mdwn
+++ /dev/null
@@ -1,728 +0,0 @@
-[[!meta copyright="Copyright © 2011, 2013 Free Software Foundation, Inc."]]
-
-[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
-id="license" text="Permission is granted to copy, distribute and/or modify this
-document under the terms of the GNU Free Documentation License, Version 1.2 or
-any later version published by the Free Software Foundation; with no Invariant
-Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
-is included in the section entitled [[GNU Free Documentation
-License|/fdl]]."]]"""]]
-
-[[!tag open_issue_gnumach]]
-
-[[!toc]]
-
-
-# IRC, freenode, #hurd, 2011-05-07
-
- <braunr> things that are referred to as "system calls" in glibc are
- actually RPCs to the kernel or other tasks, those RPCs have too lookup
- port rights
- <braunr> the main services have tens of thousands of ports, looking up one
- is slow
-
-There is a [[!FF_project 268]][[!tag bounty]] on this task.
-
-
-# IRC, freenode, #hurd, 2011-04-23
-
- <braunr> youpi: is there any use of the port renaming facility ?
- <youpi> I don't know
- <braunr> at least, did you see such use ?
- <braunr> i wonder why mach mach_port_insert_right() lets the caller specify
- the port name
- <youpi> ../hurd-debian/hurd/serverboot/default_pager.c: kr =
- mach_port_rename( default_pager_self,
- <braunr> mach_port_rename() is used only once, in the default pager
- <braunr> so it's not that important
- <braunr> but mach_port_insert_right() lets userspace task decide the port
- name value
- <youpi> just to repeat myself again, I don't know port stuff very much :)
- <braunr> well you know that a port denotes a right, which denotes a port
- <youpi> yes, but I don't have any real experience with it
- <braunr> err
- <braunr> port name
- <braunr> the only reason I see is that the caller, say /hurd/exec running a
- fork()
- <braunr> hm
- <braunr> no, i don't even see the reason here
- <braunr> port names should be allocated by the kernel only, like file
- descriptors
- <youpi> you can choose file descriptor values too
- <braunr> really ?
- <youpi> with dup2, yes
- <braunr> oh
- <braunr> hm
- <braunr> what's the data structure in current unices to store file
- descriptors ?
- <braunr> a hash table ?
- <youpi> I don't know
- <braunr> i'll have to look at that
- <braunr> FYI, i'm asking these questions because i'm thinking of reworking
- ipc spaces
- <braunr> i believe the use of splay trees completely destroys performance
- of tasks with many many port names such as the root file system
- <youpi> that can be a problem yes
- <youpi> since there are 3 ports per opened file, and like 3 per thread too
- <braunr> + the page cache
- <youpi> with a few thousand opened files and threads, that makes a lot
- <youpi> by "opened file" I meant page cache actually
- <braunr> i saw numbers up to 30k
- <braunr> ok
- <youpi> on buildds I easily see 100k ports
- <braunr> for a single task ?
- <braunr> wow
- <youpi> yes
- <youpi> the page cache is 4k files
- <braunr> so that's definitely worth the try
- <youpi> so that already makes 12k ports
- <youpi> and 4k is not so big
- <braunr> it's limited to 4K ?
- <youpi> I haven't been able to check where the 100k come from yet
- <youpi> braunr: yas
- <braunr> could be leaks :/
- <youpi> yes
- <braunr> omg, a hard limit on the page cache ..
- <youpi> vm/vm_object.c:int vm_object_cached_max = 4000; /* may
- be patched*/
- <braunr> mach is really old :(
- <youpi> I've raised it
- <youpi> before it was 200
- <youpi> ...
- <braunr> oO
- <youpi> I tried to dro pthe limit, but then I was lacking memory
- <youpi> which I believe have fixed the other day, but I have to test again
- <braunr> that implementation doesn't know how to deal with memory pressure
- <youpi> yes
- <braunr> i saw your recent changes about adding warnings in such cases
- <braunr> so, back to ipc spaces
- <braunr> i think splay trees 1/ can get very unbalanced easily
- <braunr> which isn't hard to imagine
- <braunr> and 2/ make poor usage of the cpu caches because they're BST and
- write a lot to memory
- <youpi> maybe you could write a patch which would dump statistics on that?
- <braunr> that's part of the job i'm assigning to myself
- <youpi> ok
- <braunr> i'd like to try replacing splay trees with radix trees
- <youpi> I can run it on the buildds
- <youpi> buildds are very good stress-tests :)
- <braunr> :)
- <youpi> 22h building -> 77k ports
- <youpi> 26h building -> 97k ports
- <youpi> the problem is that when I add leak debugging (backtraces), I'm
- getting out of memory :)
- <braunr> that will be a small summer of code outside the gsoc :p
- <braunr> :/
- <braunr> backtraces are very consuming
- <youpi> but that's only because of hardcoded limits
- <youpi> I'll have to test again with bigger limits
- <braunr> again ..
- <braunr> evil hard limits
- <youpi> well, actually we could as well just drop them
- <youpi> but we'd also need to easily get statistics on zone/vm_maps usage
- <youpi> because else we don't see leaks
- <youpi> (except that the machine eventually crashes)
- <braunr> hm
- <braunr> i haven't explained why i was asking my questions actually
- <braunr> so, i want radix trees, because they're nice
- <braunr> they reduce the paths lengths
- <braunr> they don't get too unbalanced (they're invariant wrt the order of
- operations)
- <braunr> they don't need to write to memory on lookups
- <braunr> the only drawback is that they can create much overhead if their
- usage pattern isn't appropriate
- <braunr> elements in such a structure should be close, so that they share
- common nodes
- <youpi> the common usage pattern in ext2fs is a big bunch of ever-open
- ports :)
- <braunr> if there is one entry per node, it's a big waste
- <braunr> yes
- <youpi> there are 3, actually
- <braunr> but the port names have low values
- <braunr> they're allocated sequentially, beginning at 0
- <braunr> (or 1 actually)
- <braunr> which is perfect for radix trees
- <youpi> yes
- <youpi> 97989: send
- <braunr> but if anyone can rename
- <braunr> this introduces a new potential weakness
- <youpi> ah, if it's just a weakness it's probably not a problem
- <youpi> I thought it was even a no-go
- <braunr> i think so
- <youpi> I guess port rename is very seldom
- <braunr> but in a future version, it would be nice not to allow port
- renaming
- <braunr> unless there are similar issues in current unix kernels
- <braunr> in which case i'd say it's acceptable
- <youpi> there are
- <braunr> of that order ?
- <youpi> and it'd be useful for e.g. processing
- tracing/debugging/tweaking/whatever
- <youpi> it's also used to hide fds from a process
- <braunr> port renaming you mean ?
- <youpi> you allocate them very high
- <youpi> yes
- <braunr> ok
- <youpi> choosing your port name, generally
- <youpi> to match what the process expects for instance
- <braunr> then it would be a matter of resource limiting (which we totally
- lack afaik)
- <braunr> along the number of maximum open files, you would have a number of
- maximum rights
- <braunr> does that seem fine to you ?
- <youpi> if done throught rlimits, sure
- <braunr> something similar yes
- <youpi> (_no_ PORTS_MAX ;) )
- <braunr> oh and, in addition, i remember gnumach has a special
- configuration of the processor in which caching is limited
- <braunr> like write-through only
- <youpi> didn't I fix that recently ?
- <braunr> i don't know :)
- <braunr> CR0=e001003b
- <braunr> i don't think it's fixed
- <youpi> I mean, in the git
- <braunr> ah
- <youpi> not in the debian package
- <braunr> didn't tried the git version yet
- <braunr> last time i tried (which was a long time ago), it made the kernel
- crash
- <braunr> have you figured why ?
- <youpi> I'm not aware of that
- <braunr> anyway, splay trees write a lot, and most trees write a lot even
- at insertion/removal to rebalance
- <youpi> braunr: Mmm, there's no clearance of CD in the kernel actually
- <braunr> with radix trees, even if caching can't be fully enabled, it would
- make much better use of it
- <braunr> so if port renaming isn't a true issue, i'll choose that data
- structure
- <youpi> that'd probably be better yes
- <youpi> I'm surprised by the CD, I do remember fixing something like this
- lately
- <braunr> there are several levels where CD can be set
- <braunr> the processors ORs all those if i'm right
- <braunr> to determine if caching is enabled
- <youpi> I know
- <braunr> ok
- <youpi> but in my memory that was at the CR* level, precisely
- <braunr> maybe for xen only ?
- <youpi> no
- <braunr> well good luck if you hunt that one, i'm off, see you :)
- <youpi> braunr: ah, no, it was the PGE flag that I had fixed
-
- <antrik> braunr: explicit port naming is used for example to pass some
- initial ports to a new task at well-known places IIRC
- <antrik> braunr: but these tend to be low numbers, so I don't see a problem
- there
- <antrik> (I'm not familiar with radix trees... why would high numbers be a
- problem?)
-
- <youpi> braunr: iirc the ipc space is limited to ~192k ports
-
- <braunr> antrik: in most cases i've seen, the insert_right() call is used
- on task_self()
- <braunr> and if there really are special ports (like the bootstrap or
- device ports), they should have special names
- <braunr> IIRC, these ports are given through command line expansion by the
- kernel at boot time
- <braunr> but it seems reasonable to think of port renaming as a potentially
- useful feature
- <braunr> antrik: the problem with radix trees isn't them being high, it's
- them being sparse
- <braunr> you get the most efficient trees when entries have keys that are
- close to each other
- <braunr> because radix trees are a type of tries (the path in the tree is
- based on the elements composing the key)
- <braunr> so the more common prefixes you have, the less external nodes you
- need
- <braunr> here, keys are port names, but they can be memory addresses or
- offsets in memory objects (like in the page cache)
- <braunr> the radix algorithm takes a few bits, say 4 or 6, at a time from a
- key, and uses that as an index in a node
- <braunr> if keys are sparse, there can be as little as one entry per node
- <braunr> IIRC, the worst case (on entry per node with the maximum possible
- number of nodes for a 32-bits key) is 2% entries
- <braunr> the reste being null entries and almost-empty nodes containing
- them
- <braunr> so if you leave the ability to give port rights the names you
- want, you can create such worst case trees
- <braunr> which may consume several MiB of memory per tree
- <braunr> tens of MiB i'd say
- <braunr> on the other hand, in the current state, almost all hurd
- applications use sequentially allocated port names, close to 0 (which
- allows a nice optimization)
- <braunr> so a radix ree would be the most efficient
- <antrik> well, if some processes really feel they must use random numbers
- for port names, they *ought* to be penalized ;-)
-
-
-# IRC, freenode, #hurd, 2011-04-27
-
- <braunr> antrik: remember when you asked why high numbers would be a
- problem with radix trees ?
- <braunr> here is a radix tree with one entry, which key is around 5000
- <braunr> [ 656.296412] tree height: 3
- <braunr> [ 656.296412] index: 0, level: 0, height: 3, count: 1,
- bitmap: 0000000000000002
- <braunr> [ 656.296412] index: 1, level: 1, height: 2, count: 1,
- bitmap: 0000000000004000
- <braunr> [ 656.296412] index: 14, level: 2, height: 1, count: 1,
- bitmap: 0000000000000080
- <braunr> three levels, each with an external node (dynamically allocated),
- for one entry
- <braunr> so in the worst case of entries with keys close to the highest
- values, the could be many external nodes with higher paths lengths than
- when keys are close to 0
- <braunr> which also brings the problem of port name allocation
- <braunr> can someone with access to a buildd which has an uptime of at
- least a few days (and did at least one build) show me the output of
- portinfo 3 | tail ?
- <braunr> port names are allocated linearly IIRC, like PIDs, and some parts
- of the kernel may rely on them not being reused often
- <braunr> but for maximum effifiency, they should be
- <braunr> efficiency*
- <braunr> 00:00 < braunr> can someone with access to a buildd which has an
- uptime of at least a few days (and did at least one build) show me the
- output of portinfo 3 | tail ?
- <braunr> :)
- <youpi> it's almost like wc -l
- <youpi> 4905: receive
- <youpi> vs 4647
- <youpi> for /
- <youpi> 52902: receive
- <youpi> vs 52207
- <youpi> for the chroot
- <braunr> even after several builds ?
- <braunr> and several days ?
- <youpi> that's after 2 days
- <youpi> it's not so many builds
- <youpi> rossini is not so old
- <youpi> (7h)
- <youpi> but many builds
- <youpi> 70927: send
- <youpi> vs 70938
- <braunr> ok
- <braunr> so it seems port names are reused
- <braunr> good
- <youpi> yes they are clearly
- <braunr> i think i remember a comment about why the same port name
- shouldn't be reused too soon
- <youpi> well, it could help catching programming errors
- <braunr> that it helped catch bugs in applications that could
- deallocate/reallote quickly
- <braunr> reallocate*
- <braunr> without carefuly synchronization
- <braunr> careful
- <braunr> damn, i'm tired :/
- <youpi> but that's about debugging
- <youpi> so we don't care about performance there
- <braunr> yes
- <braunr> i'll try to improve allocation performance too
- <braunr> using e.g. bitmaps in each external node back to the root so that
- unused slots are quickly found
- <braunr> i thknk that's what idr does in linux
- <antrik> braunr: idr?
- <braunr> antrik: a data structure used to map integers to pointers
- <braunr> http://fxr.watson.org/fxr/source/lib/idr.c?v=linux-2.6
-
-
-# IRC, freenode, #hurd, 2011-06-08
-
- <braunr> hm, reverse space/port to name lookups also suck
- <braunr> having separate types for simple ipc entries and splay tree
- entries really makes many parts of the ipc code complicated
- <braunr> and a global hash table for these operations is scary
-
-
-# IRC, freenode, #hurd, 2011-06-09
-
- <braunr> hm nice, my radix tree code runs inside gnumach, along with the
- original splay tree code and assertions making sure results are the same
- <braunr> there is this "collision" thing i'm not sure to understand but
- once this is solved, replacing the splay trees should be easy
-
- <braunr> youpi: is there a way to easily know the number of send rights
- associated to a port ?
- <youpi> portinfo ?
- <braunr> portinfo gives information in a space
- <braunr> but this is specific to a port
- <braunr> is there an option for that ?
- <youpi> -v
- <braunr> hm ok
- <youpi> 25: send (refs: 550)
- <braunr> nice
- <braunr> youpi: if you have time, could you give me the min/max/avg numbers
- of send rights referring to the same port on buildds ?
- <braunr> i'm trying to estimate if it's better to have space->list_of_ports
- or port->list_of_spaces to replace the global ipc hash table
- <braunr> the latter seems better but there could be unexpected cases on
- machines using large amounts of resources like the buildds
- <youpi> max is 64k
- <youpi> min is 1 of course :)
- <braunr> 64k
- <braunr> then it's not what i'm looking for
- <youpi> avg is 55
- <braunr> isn't this the number of urefs ?
- <youpi> I don't know
- <braunr> hmm
- <braunr> what i'm looking for is the number of *pure send rights* for the
- same port
- <braunr> i don't think portinfo can give it
- <braunr> there can only be one such send right per task for the same port
- <braunr> 64k would mean there are 64k tasks
- <youpi> ok, so it's more difficult
- <youpi> it means using -t
- <braunr> ahh
- <youpi> and run n^2 portinfo over the n processes
- <braunr> i see
- <youpi> Mmm, that will however still show any duplicate send right
- <youpi> but then by min/max/avg, you mean, over time ?
- <braunr> i'll change the source code, simpler
- <youpi> e.g. min would be right after boot?
- <braunr> min is 1
- <youpi> 1 what ?
- <braunr> 1 send right to a port
- <youpi> ah, 1 for a given port
- <braunr> yes
- <youpi> ok, it becomes really hairy to compute, I don't hav ethe time :)
- <braunr> avg and max are more interesting :)
- <braunr> no worries
- <youpi> braunr: I wouldn't be surprised that max is the number of tasks
- <youpi> e.g. for a send port to the proc server for instance
- <braunr> youpi: it is, but i'm not looking for potential numbers
- <youpi> I'm not talking about a potential number, but an actual number
- that's almost always true
- <braunr> for one port, yes
- <braunr> but yes, ok for max
- <braunr> this makes choosing an appropriate data structure difficult
-
- <antrik> braunr: actually, min number of send rights to a port is 0... but
- I'm sure you know that already :-)
-
- <antrik> youpi: normally each client gets a separate port. I'm not sure
- there are any ports with send rights distributed over many tasks...
-
- <jkoenig> antrik, what about / ?
-
- <youpi> antrik: not necessarily
-
- <antrik> jkoenig: not sure... isn't the "/" port authenticated to the
- specific user?
-
- <jkoenig> antrik, I guess so, but a single user could still have many
- tasks.
- <jkoenig> (wrt /)
- <antrik> jkoenig: well, in theory the tasks having exactly the same UIDs
- and GITs could probably share an auth token... but that's not how things
- are handled in general
- <antrik> at least I don't think so
- <antrik> tasks are authenticated, not users
- <antrik> err... GIDs :-)
- <jkoenig> antrik, still, my quick glance to the fork() code seemed to
- indicate the port is inherited as-is, maybe authentication happens only
- when something is actually looked up?
- <jkoenig> hmm "rpctrace ls -d /" does not show any authentication calls,
- only a lookup("") on the root which returns a different port
- <jkoenig> so I guess the root port is "deauthenticated" or something when
- the uid of a process is changed.
- <antrik> too bad cfhammer isn't around, he digged into all this stuff...
- <antrik> I know that there is a mechanism which reauths all FDs when the
- IDs of a process change
- <antrik> but I'm not sure the "/" port uses this mechanism
-
- <braunr> antrik: but the radix tree codee is really used as is, which means
- no locking, no preloading before locking, all of this because simple
- locks don't exist on UP, and because the kernel isn't preemptible
-
-[[microkernel/mach/gnumach/preemption]].
-
- <braunr> antrik: and yes, min number is 0, but in that case you don't need
- (space, port) -> right lookups :)
- <braunr> antrik: or put in another way, whichever reasonable structure you
- use, when it's empty, you don't care much
- <braunr> which also means that the min number has actually no value here
- <braunr> because the same applies to 1
-
- <braunr> then what seems to take most time is forks
- <braunr> and i hope my upcoming kernel changes will help the situation a
- bit
- <pinotree> what are your incoming gnumach changes about?
- <braunr> the ipc translation layer speed
- <braunr> which basically means operating on port names (looking them up,
- inserting, removing, renaming, looking up send rights to a specific
- ports) will be faster
- <braunr> and i believe forks are (one of) the most demanding use cases wrt
- ipc space manipulation
- <braunr> i'm really surprised how badly the splay trees are used
- <braunr> the worst case for this data structure is traversal
- <braunr> and this is done in many situations
- <braunr> leaving the tree in its worst case shape
- <braunr> and i didn't mentioned the bunch of memory writes occurring, event
- for things like lookups or traversals
- <braunr> this is slow and can disrupt many cpu cache lines
- <braunr> and when there are 10k to 100+k (e.g. in some ext2fs instances on
- buildds), just imagine the number of operations involved in those
- operations
- <braunr> a simple traversal_next involves a rotation *gasp*
- <braunr> this means potentially writing on 3 different cache lines, for
- *one* next operation
- <pinotree> what are you replacing that splay tree with?
- <braunr> radix trees
- <braunr> much shorter paths
- <braunr> extremely few memory writes
- <braunr> locality of reference when traversing
- <braunr> good cache usage (as many of the top nodes are reused)
- <braunr> the two drawbacks are 1/ memory allocation for external nodes,
- which means the tree must be preloaded before locking
- <braunr> and 2/ high memory overhead if the keys are sparse
- <braunr> but this isn't the case with port names, unless someone messes it
- up by assigning random names to many rights
-
- <antrik> braunr: so, when will we see the first performance comparision?
- :-)
- <braunr> antrik: that's a topic of itself, how to compare ?
- <braunr> antrik: the thing is, my current gnumach patches only makes
- assertions
- <braunr> this is the best way i found to test my tree in real life
- conditions
- <braunr> much cleanup is needed
- <braunr> and what i'd like to do is to completely replace all teh
- translation layer structures with it
- <braunr> which means removing much code, making sure it still works as
- expected
- <braunr> this is tedious
- <braunr> so one month at least
- <antrik> braunr: comparing shouldn't be too hard... the average configure
- script does a lot of forking, which should be a good benchmark according
- to your observations
- <braunr> rough estimates are easy, yes
- <braunr> but my observations my be wrong :p
- <antrik> braunr: well, we don't really need precise numbers...
- <antrik> unless you need to do some kind of fine-tuning?
- <braunr> i don't know yet
-
-
-# IRC, freenode, #hurd, 2011-06-18
-
- < braunr> hmm, i'm having a problem with integrating my radix tree code in
- gnumach
- < braunr> inserting into such a tree can trigger memory allocation
- < braunr> so commonly, the tree i loaded with nodes before insertion,
- usually if it requires strong locking
- < braunr> ipc spaces are locked using "simple locks" (which are spin locks)
- < braunr> but spin locks are noops on UP, and gnumach is always UP ..
- < braunr> so, should i still include preloading code, even if it'll end up
- dead code ?
- < antrik> hm... I think we discussed this before; but isn't gnumach
- supposed to be SMP-capable, minus bugs?...
- < braunr> it is
- < braunr> but ofc, if i choose not to include preloading, i'll write
- #errors so that the day gnumach is built for SMP again, such support will
- be included
- < antrik> oh, sorry, I think I misread. what is UP?
- < braunr> uniprocessor
- < antrik> well, if it's just bugs forcing the current UP state, I think
- saying that gnumach is always UP is a stretch...
- < braunr> sure, but it's a practical consideration
- < antrik> does the locking complicate stuff? or is it just performance
- considerations?
- < braunr> no it's about correctness and completeness
- < braunr> if you don't preload a tree before locking
- < braunr> and memory allocation occurs while you're holding a simple lock
- < braunr> and memory allocation requires the kernel to sleep
- < braunr> you're screwed
- < braunr> but i hate the idea of including code that won't be used and
- which won't be easy to test
- < braunr> so i'm wondering if it's ok for now to just put this in a TODO
- comment and write it when the time is right
- < braunr> or if i should spens the week adding this and tweaking the
- userspace implementation to "emulate" spin locks
- < antrik> well, it's tricky situation. on one hand, it seems stupid to
- include handling for something that presently isn't used, and it's not
- clear when it will. on the other hand, I'd rather not see additional
- problems introduced that will make fixing SMP even harder...
- < braunr> that's why i'm asking here
- < antrik> of course, you could resolve this question by fixing SMP
- first... ;-)
- < braunr> ew
- < antrik> well, I guess it would be best first to make the code work... and
- we can still decide about the locking thing before it goes mainline I'd
- say?
- < braunr> "make the code work" ?
- < antrik> I mean make gnumach work with your radix tree code
- < braunr> without preloading then
- < antrik> yeah... as a first step... I guess adding it later won't be any
- harder than adding it right now?
- < braunr> not much
- < braunr> testing is what requires time really
-
-
-# IRC, freenode, #hurd, 2011-06-27
-
- < braunr> ok, here is the radix tree code:
- http://git.sceen.net/rbraun/libbraunr.git/
- < braunr> the preloading stuff will be added in the kernel only, as it's
- really pointless and not easily doable in userspace
- < youpi> preloading?
- < braunr> youpi: yes, preloading
- < braunr> radix trees allocate external nodes
- < youpi> well, providing a url at some random time of some random day is
- not a great way to get eyes on it :)
- < braunr> and ipc spaces are locked when inserting/allocating names
- < braunr> we normally don't need preloading in gnumach
- < braunr> since there is no preemption nor SMP
-
-[[microkernel/mach/gnumach/preemption]].
-
- < braunr> but in case someone changes that, i'd like the code to be mostly
- ready
- < braunr> and correctly handle those ugly simple locks
- < braunr> youpi: is what i say clear enough or do you need more background
- on what is done ?
- < youpi> about preloading?
- < braunr> yes
- < youpi> I guess it means allocating nodes in advance?
- < braunr> yes
- < youpi> k
- < braunr> before locking the ipc spaces
-
-
-# IRC, freenode, #hurd, 2011-06-28
-
- < braunr> antrik: i think i won't write the code for the preloading stuff
- actually
- < braunr> antrik: it's not very difficult, but i really hate the idea of
- not being able to reliably test it
- < braunr> antrik: and i'd rather concentrate on integrating the radix tree
- code in gnu mach now
- < braunr> (i've already removed much code, even some files which weren't
- actually used before my changes !)
- < braunr> hmm, i won't be able not to write the preloading code after all
- < antrik> braunr: not able not to write? how's that?
- < braunr> antrik: it's actually required
- < braunr> there are three functions, ipc_entry_get, ipc_entry_alloc, and
- ipc_entry_grow_table
- < braunr> ipc_entry_get cannot allocate memory
- < braunr> if it fails, ipc_entry_grow_table is called, which will allocate
- memory
- < braunr> ipc_entry_alloc calls both of them depending on the result of
- ipc_entry_get
- < braunr> this is the equivalent of the preloading thing i had in mind
- < braunr> not a bad thing after all
- < braunr> the only thing i'm afraid of are the "optimized" version of those
- ipc functions in te so-called fast paths
- < braunr> i'm afraid if i don't deal right with those, the kernel may end
- up using mostly slow paths
- < braunr> but considering the purpose of those fast paths was merely to
- avoid the overhead of function calls and some locking functions, it
- shouldn't be that bad
- < braunr> this is such a mess eh
- < antrik> hurray microoptimisations ;-)
- < braunr> there, the preload functions are done, easy :)
- < antrik> braunr: seems you spent less time implementing it than pondering
- whether you should implement it ;-)
- < braunr> well, i couldn't implement it correctly before knowing what
- should have been done exactly
- < braunr> and there are still other problems :/
- < braunr> and the other problems make me reconsider if this was useful at
- all eh
- < braunr> youpi: i'm unable to find where ipc tree entries are released
- except in ipc_entry_alloc_name(), which could mean they're leaked ...
- < braunr> youpi: would you have time to take a look ?
- < youpi> they aren't in ipc_entry_dealloc() ?
- < braunr> no .....
- < youpi> it's not so unprobable that they're only freed when the task quits
- < braunr> i don't see that either
- < braunr> i only see them released in ipc_entry_alloc_name()
- < braunr> so maybe they're reused
- < braunr> but i'm not sure about that when reading the code
- < braunr> oh wait, yes, they are :/
- < braunr> my bad
- < youpi> in the ipc_splay_tree_* fucntions I guess?
- < braunr> yes
- < braunr> it's just surprsing to see them allocated outside the tree code
- only
- < braunr> but released in both the entry and the splay tree code ...
-
-
-# IRC, freenode, #hurd, 2011-06-29
-
- < braunr> hmm i missed an important thing :/
- < braunr> and it's scary
- < braunr> it looks like the splay tree is mainly used when names are
- provided
- < braunr> whereas the entry table is used when names are allocated
- < braunr> which means the table is the main ipc data structure, even for
- tasks with lots of rights
- < braunr> i can make my root ext2fs have more than 10k rights, and i see
- the ipc table table grow along that number ...
- < braunr> now thetable has 15k+ entries
- < braunr> IOW there is no point to put the radix tree code in gnumach :(
- < antrik> braunr: what do you mean by "provided" and "allocated"?
- < antrik> and what is that table you are talking about?
- < braunr> antrik: provided means the user space tasks gives the name of the
- new right
- < braunr> antrik: allocated means the kernel generates it
- < braunr> antrik: the table i'm talking about is is_table in struct
- ipc_space
- < braunr> 55 * Every space has a non-NULL is_table with
- is_table_size entries.
- < braunr> 56 * A space may have a NULL is_tree. is_tree_small
- records the
- < braunr> 57 * number of entries in the tree that, if the table were
- to grow
- < braunr> 58 * to the next larger size, would move from the tree to
- the table.
- < braunr> here is the description which mislead me (in addition of the
- obscure code)
- < braunr> 50 * Spaces hold capabilities for ipc_object_t's (ports
- and port sets).
- < braunr> 51 * Each ipc_entry_t records a capability. Most
- capabilities have
- < braunr> 52 * small names, and the entries are elements of a table.
- < braunr> 53 * Capabilities can have large names, and a splay tree
- holds
- < braunr> 54 * those entries. The cutoff point between the table
- and the tree
- < braunr> 55 * is adjusted dynamically to minimize memory
- consumption.
- < antrik> ah, so the rights with a low name are in a linear table, and only
- those with "random" high names are stored in the splay tree instead?
- < antrik> seems a rather complex design... I guess though there isn't much
- room for further optimisation there :-(
- < antrik> (well, except for code size optimisation -- which could in fact
- make a considerable difference...)
- < braunr> well there are problems with this approach, but most don't
- concern performance
- < braunr> when the table gets big (close to the page size or more), it gets
- remapped when reallocated
- < braunr> which will incur some penalty because of the tlb
- < braunr> but it's annoying even for small tables
- < braunr> the initial table size is 4 entries, and from what i can see,
- most tables are 128 entries wide when tasks are destroyed
- < braunr> an obvious simple optimization is to set a larger default size
- < braunr> the same applies for the dead name tables
- < braunr> those reallocations are a pain, and they're due to this design
- < braunr> they can also fail because of fragmentation
- < braunr> there would be a point to radix trees if they would replace all
- that, and not just the splay tree
- < braunr> but this would cause a lot of changes in a lot of places, and in
- particular the "optimized" fast paths i mentioned yesterday
- < braunr> we'll see how they perform in x15 :>
- < braunr> there is a slight noticeable improvement when increasing the
- initial size of the entry table
- < antrik> braunr: well, if you use them in a completely different
- implementation, there will be no way of telling whether they make a
- difference
- < antrik> how did you test the improvement?
- < braunr> antrik: no actually it's completely negligeable
- < braunr> hm
- < braunr> is that a valid word ? :)
- < braunr> negligible
- < braunr> youpi: did you see my comments about the ipc stuff this earlier
- today ?
- < braunr> youpi: well to make things short, when port names are allocated,
- the right they refer to is allocated from the ipc table
- < braunr> youpi: the splay tree is only used for user provided names
- < braunr> youpi: i had tables as large as the number of rights in a space
- (i could easily reach 20k)
- < braunr> youpi: whereas the splay trees had at most ~40 entries ..