From 8050ba0991b1542f708ada5ae7eca596f6a8099d Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Tue, 26 Apr 2011 11:50:30 +0200 Subject: IRC. --- open_issues/ext2fs_page_cache_swapping_leak.mdwn | 126 ++++ open_issues/gnumach_memory_management.mdwn | 772 +++++++++++++++++++++++ open_issues/gnumach_memory_management/pmap.out | 85 +++ open_issues/rework_gnumach_ipc_spaces.mdwn | 241 +++++++ 4 files changed, 1224 insertions(+) create mode 100644 open_issues/gnumach_memory_management.mdwn create mode 100644 open_issues/gnumach_memory_management/pmap.out create mode 100644 open_issues/rework_gnumach_ipc_spaces.mdwn (limited to 'open_issues') diff --git a/open_issues/ext2fs_page_cache_swapping_leak.mdwn b/open_issues/ext2fs_page_cache_swapping_leak.mdwn index 0ace5cd3..575196d8 100644 --- a/open_issues/ext2fs_page_cache_swapping_leak.mdwn +++ b/open_issues/ext2fs_page_cache_swapping_leak.mdwn @@ -21,3 +21,129 @@ IRC, OFTC, #debian-hurd, 2011-03-24 so the swap tends to accumulate unuseful stuff, i see yes the disk content, basicallyt :) + +IRC, freenode, #hurd, 2011-04-18 + + damn, a cp -a simply gobbles down swap space... + really ? + that's weird + why would a copy use so much anonymous memory ? + unless the external pager is so busy that the kernel falls back to + its default pager + that's what I suggested some time ago + maybe this case should be traced in the kernel + a simple message in the kernel buffer to warn that this condition + happened may help + I'm seeing swap space being kept used on buildds for no real reason + except possibly backing ext2fs pages + that could help, yes + youpi: I think it was actually slpz who suggested that... + I think we're generally missing feedback from memory behavior + youpi: do you think andrei's kernel instrumentation work might be + helpful with analyzing such things? 
+ antrik: I think I suggested it too, but never mind + antrik: no, because it's not a trace of events that you want + some specific events would be useful + but then we don't really need a whole framework for that + apt-get upgrade eats swap too + the upgrade itself, or the computation of the ugprade? + apt is a memory eater nowadays + installing the packages + seems to have stabilized though after a while... + so perhaps it's not a leak in this case + ideally we should have a way to know what was put in the swap + how would you represent what's in the swap ? + the apt-get process has 46M of virtual memory above the 128 M + baseline + mostly libraries i guess + are trheads stacks 8 MiB like on Linux ? + braunr: at least knowing how much of each process is in the swap + braunr: 2MiB + ok + vminfo could also report which parts of the address space are in + the swap + youpi: would be nice to have some simple utility reporting how + much of a process' address space is anonymous + (in fact, I wonder why it's not reported by standard tools such as + ps or top... this shouldn't be too difficult I would think?) + it would be much more useful information than the total virt size, + which includes rather meaningless disk and device mappings... + agreed + well + there are tools like pmap for this + unfortunately, it's difficult in mach to know what backs a + non-anonymous mapping + pagers should be able to name their mappings + that'd be helpful for debugging yes + there is almost no overhead in doing that, and it would be very + useful + and could lead to /proc/pid/maps + yes + isn't there a maps already ? + nope + ok + (probably not very useful without the names) + ithought i remembered maps without names, and guessed it might + have been on the hurd for that reason + but i'm not sure + there's the vminfo command, yes + 14:06 < youpi> braunr: at least knowing how much of each process + is in the swap + wouldn't it be clearer to do it the other way around ? 
+ like a swapinfo tool indicating what it contains ? + sure, but it's a lot more difficult + really ? + why ? + because you have to traverse all the mappings + etc + (in all processes, I mean) + and you have to name what is what + there are other ways + the swap is a central structure + while simply introducing the swap % in vminfo + for a given process you know what is what + right + and doing that introduction is probably very simple + that's a good point + top-down is effectively easier than bottom-up resolution in Mach + VM + hm... the memory use caused by cp doesn't seem to be reflected in + the virtual size of any particular process + ghost memory + what's cp vmsize at the time of the problem ? + it's at 134 M right now... so considering the 128 M baseline, + nothing worth speaking of + right + maybe a copy map during I/O + but I don't know Mach copy maps in detail, as they have been + eliminated from UVM + BTW, the memory eatup happens even before swap comes into + play... swapping seems to be a result of the problem, not the cause + what do you mean ? + I thought swapping was the issue + you mean RAM is full before swapping ? + well, I don't know what the actual problem is... I just don't + understand why the memory use increases without any particular process + seeing an increase in size + the "free" size in vmstat decreases + once it's eaten up, swap space use increases + well it doesn't change much of it + the anonymous memory pager will use RAM before resorting to the + external default-pager + I would suspect normal block caching... but then, shouldn't this + show up in the memory info of the ext2 process? + although, again, I'm not sure of the behaviour of the anonymous + memory pager + antrik: I don't know how block caching behaves + BTW, is it a known problem that doing ^C on a "cp -a" seems to hang + the whole system?... + (the whole hurd instance that is...
the other instance is not + affected) + not that I know of + seems like a deadlock in the anonymous memory handling + (and I've never seen that) + happens both in my main system (using ancient hurd/libc) and in my + subhurd (recently upgraded to current stuff) + this make testing this stuff quite a lot harder... [sigh] + any suggestions how to debug this hang? + antrik: no :/ diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn new file mode 100644 index 00000000..c85c88e3 --- /dev/null +++ b/open_issues/gnumach_memory_management.mdwn @@ -0,0 +1,772 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_gnumach]] + +IRC, freenode, #hurd, 2011-04-12: + + braunr: do you think the allocator you wrote for x15 could be used + for gnumach? and would you be willing to mentor this? :-) + antrik: to be willing to isn't my current problem + antrik: and yes, I think my allocator can be used + it's a slab allocator after all, it only requires reap() and + grow() + or mmap()/munmap() whatever you want to call it + a backend + antrik: although i've been having other ideas recently + that would have more impact on our usage patterns I think + mcsim: have you investigated how the zone allocator works and how + it's hooked into the system yet? 
+ mcsim: now let me give you a link + mcsim: + http://git.sceen.net/rbraun/libbraunr.git/?a=blob;f=mem.c;h=330436e799f322949bfd9e2fedf0475660309946;hb=HEAD + mcsim: this is an implementation of the slab allocator i've been + working on recently + mcsim: i haven't made it public because i reworked the per + processor layer, and this part isn't complete yet + mcsim: you could use it as a reference for your project + braunr: ok + it used to be close to the 2001 vmem paper + but after many tests, fragmentation and accounting issues have + been found + so i rewrote it to be closer to the linux implementation (cache + filling/draining in bukl transfers) + bulk* + they actually use the word draining in linux too :) + antrik: not complete yet. + braunr: oh, it's unfinished? that's unfortunate... + antrik: only the per processor part + antrik: so it doesn't matter much for gnumach + and it's not difficult to set up + mcsim: hm, OK... but do you think you will have a fairly good + understanding in the next couple of days?... + I'm asking because I'd really like to see a proposal a bit more + specific than "I'll look into things..." + i.e. you should have an idea which things you will actually have + to change to hook up a new allocator etc. + braunr: OK. will the interface remain unchanged, so it could be + easily replaced with an improved implementation later? + the zone allocator in gnumach is a badly written bare object + allocator actually, there aren't many things to understand about it + antrik: yes + great :-) + and the per processor part should be very close to the phys + allocator sitting next to it + (with the slight difference that, as per cpu caches have variable + sizes, they are allocated on the free path rather than on the allocation + path) + this is a nice trick in the vmem paper i've kept in mind + and the interface also allows to set a "source" for caches + ah, good point... do you think we should replace the physmem + allocator too? 
and if so, do it in one step, or one piece at a time?... + no + too many drivers currently depend on the physical allocator and + the pmap module as they are + remember linux 2.0 drivers need a direct virtual to physical + mapping + (especially true for dma mappings) + OK + the nice thing about having a configurable memory source is that + whot do you mean by "allocated on the free path"? + even if most caches will use the standard vm_kmem module as their + backend + there is one exception in the vm_map module, allowing us to get + rid of either a static limit, or specific allocation code + antrik: well, when you allocate a page, the allocator will lookup + one in a per cpu cache + if it's empty, it fills the cache + (called pools in my implementations) + it then retries + the problem in the slab allocator is that per cpu caches have + variable sizes + so per cpu pools are allocated from their own pools + (remember the magazine_xx caches in the output i showed you, this + is the same thing) + but if you allocate them at allocation time, you could end up in + an infinite loop + so, in the slab allocator, when a per cpu cache is empty, you just + fall back to the slab layer + on the free path, when a per cpu cache doesn't exist, you allocate + it from its own cache + this way you can't have an infinite loop + antrik: I'll try, but I have exams now. + As I understand amount of elements which could be allocated we + determine by zone initialization. And at this time memory for zone is + reserved. I'm going to change this. And make something similar to kmalloc + and vmalloc (support for pages consecutive physically and virtually). And + pages in zones consecutive always physically. + Am I right? + mcsim: don't try to do that + why? 
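For reference, the "allocated on the free path" trick described above can be sketched in C. This is only a minimal single-CPU illustration, not code from libbraunr or gnumach: `struct magazine`, `cache_alloc()`, `cache_free()`, `MAG_ROUNDS` and the `malloc()`-based slab layer are hypothetical stand-ins, and the magazine comes from `calloc()` here, whereas the log says it would come from its own cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAG_ROUNDS 4                /* hypothetical magazine capacity */

struct magazine {
    size_t nr;                      /* number of cached objects */
    void *rounds[MAG_ROUNDS];
};

struct cache {
    size_t obj_size;
    struct magazine *mag;           /* per-CPU layer (one CPU assumed); NULL until the first free */
    unsigned long slab_allocs;      /* how often we fell back to the slab layer */
};

/* Hypothetical stand-ins for the slab layer. */
static void *slab_alloc(struct cache *c)
{
    c->slab_allocs++;
    return malloc(c->obj_size);
}

static void slab_free(struct cache *c, void *obj)
{
    (void)c;
    free(obj);
}

void *cache_alloc(struct cache *c)
{
    struct magazine *m = c->mag;

    if (m != NULL && m->nr > 0)
        return m->rounds[--m->nr];

    /* Missing or empty magazine: never create one here, just fall back
     * to the slab layer, so the allocation path cannot loop on itself. */
    return slab_alloc(c);
}

void cache_free(struct cache *c, void *obj)
{
    assert(obj != NULL);

    /* The magazine is only ever created on the free path. */
    if (c->mag == NULL)
        c->mag = calloc(1, sizeof(*c->mag));

    /* No magazine available, or magazine full: release to the slab layer. */
    if (c->mag == NULL || c->mag->nr == MAG_ROUNDS) {
        slab_free(c, obj);
        return;
    }

    c->mag->rounds[c->mag->nr++] = obj;
}
```

In this sketch the first allocation always goes to the slab layer, and the per-CPU layer only comes into existence once something has been freed.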
+ mcsim: we just need a slab allocator with an interface close to + the zone allocator + mcsim: IIRC the size of the complete zalloc map is fixed; but not + the number of elements per zone + we don't need two allocators like kmalloc and vmalloc + actually we just need vmalloc + IIRC the limits are only present because the original developers + wanted to track leaks + they assumed zones would be large enough, which isn't true any + more today + but i didn't see any true reservation + antrik: i'm not sure i was clear enough about the "allocation of + cpu caches on the free path" + antrik: for a better explanation, read the vmem paper ;) + braunr: you mean there is no fundamental reason why the zone map + has a limited maximal size; and it was only put in to catch cases where + something eats up all memory with kernel object creation?... + braunr: I think I got it now :-) + antrik: i'm pretty certin of it yes + I don't see though how it is related to what we were talking + about... + 10:55 < braunr> and the per processor part should be very close to + the phys allocator sitting next to it + the phys allocator doesn't have to use this trick + because pages have a fixed size, so per cpu caches all have the + same size too + and the number of "caches", that is, physical segments, is limited + and known at compile time + so having them statically allocated is possible + I see + it would actually be very difficult to have a phys allocator + requiring dynamic allocation when the dynamic allocator isn't yet ready + hehe :-) + total size of all zone allocations is limited to 12 MB. And is "was + only put in to catch cases where something eats up all memory with kernel + object creation?" + mcsim: ah right, there could be a kernel submap backing all the + zones + but this can be increased too + submaps are kind of evil :/ + mcsim: I think it's actually 32 MiB or something like that in the + Debian version... 
+ braunr: I'm not sure I ever fully understood what the zalloc map + is... I looked through the code once, and I think I got a rough + understading, but I was still pretty uncertain about some bits. and I + don't remember the details anyways :-) + antrik: IIRC, it's a kernel submap + it's named kmem_map in x15 + don't know what a submap is + submaps are vm_map objects + in a top vm_map, there are vm_map_entries + these entries usually point to vm_objects + (for the page cache) + but they can point to other maps too + the goal is to reduce fragmentation by isolating allocations + this also helps reducing contention + for exemple, on BSD, there is a submap for mbufs, so that the + network code doesn't interfere too much with other kernel allocations + antrik: they are similar to spans in vmem, but vmem has an elegant + importing mechanism which eliminates the static limit problem + so memory is not directly allocated from the physical allocator, + but instead from another map which in turn contains physical memory, or + something like that?... + no, this is entirely virtual + submaps are almost exclusively used for the kernel_map + you are using a lot of identifies here, but I don't remember (or + never knew) what most of them mean :-( + sorry :) + the kernel map is the vm_map used to represent the ~1 GiB of + virtual memory the kernel has (on i386) + vm_map objects are simple virtual space maps + they contain what you see in linux when doing /proc/self/maps + cat /proc/self/maps + (linux uses entirely different names but it's roughly the same + structure) + each line is a vm_map_entry + (well, there aren't submaps in linux though) + the pmap tool on netbsd is able to show the kernel map with its + submaps, but i don't have any image around + braunr: is limit for zones is feature and shouldn't be changed? 
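As a rough illustration of the submap idea discussed above, here is a toy model in C. It is a sketch under loud assumptions: `toy_map`, `toy_map_alloc()` and `toy_suballoc()` are hypothetical names, a bump pointer replaces the list of `vm_map_entry` structures a real `vm_map` keeps, and only address ranges are handed out (no physical memory is involved).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy address-space map covering [start, end). */
struct toy_map {
    uint64_t start;
    uint64_t end;
    uint64_t next;      /* bump pointer; a real map tracks individual entries */
};

static void toy_map_init(struct toy_map *m, uint64_t start, uint64_t end)
{
    assert(start != 0 && start <= end);
    m->start = start;
    m->end = end;
    m->next = start;
}

/* Reserve `size` bytes of virtual addresses; returns 0 when the map is full. */
static uint64_t toy_map_alloc(struct toy_map *m, size_t size)
{
    uint64_t addr;

    if (size == 0 || size > m->end - m->next)
        return 0;

    addr = m->next;
    m->next += size;
    return addr;
}

/* Carve a fixed-size child map out of the parent, in the spirit of
 * kmem_suballoc(): the child can only ever hand out addresses from
 * the range reserved here. */
static int toy_suballoc(struct toy_map *parent, struct toy_map *child,
                        size_t size)
{
    uint64_t base = toy_map_alloc(parent, size);

    if (base == 0)
        return -1;

    toy_map_init(child, base, base + size);
    return 0;
}
```

Once the child range is used up, `toy_map_alloc()` on the child fails even though the parent still has free addresses, which is the static limit problem mentioned in the log.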
+ mcsim: i think we shouldn't have fixed limits for zones + mcsim: this should be part of the debugging facilities in the slab + allocator + is this fixed limit really a major problem ? + i mean, don't focus on that too much, there are other issues + requiring more attention + braunr: at 12 MiB, it used to be, causing a lot of zalloc + panics. after increasing, I don't think it's much of a problem anymore... + but as memory sizes grow, it might become one again + that's the problem with a fixed size... + yes, that's the issue with submaps + but gnumach is full of those, so let's fix them by order of + priority + well, I'm still trying to digest what you wrote about submaps :-) + i'm downloading netbsd, so you can have a good view of all this + so, when the kernel allocates virtual address space regions + (mostly for itself), instead of grabbing chunks of the address space + directly, it takes parts out of a pre-reserved region? + not exactly + both statements are true + antrik: only virtual addresses are reserved + it grabs chunks of the address space directly, but does so in a + reserved region of the address space + a submap is like a normal map, it has a start address, a size, and + is empty, then it's populated with vm_map_entries + so instead of allocating from 3-4 GiB, you allocate from, say, + 3.1-3.2 GiB + yeah, that's more or less what I meant... + braunr: I see two problems: limited zones and absence of caching. + with caching absence of readahead paging will be not so significant + please avoid readahead + ok + and it's not about paging, it's about kernel memory, which is + wired + (well most of it) + what about limited zones ? + the whole kernel space is limited, there has to be limits + the problem is how to handle them + braunr: almost all. I looked through all zones once, and IIRC I + found exactly one that actually allows paging... 
+ currently, when you reach the limit, you have an OOM error + antrik: yes, there are + i don't remember which implementation does that but, when + processes haven't been active for a minute or so, they are "swapedout" + completely + even the kernel stack + and the page tables + (most of the pmap structures are destroyed, some are retained) + that might very well be true... at least inactive processes often + show up with 0 memory use in top on Hurd + this is done by having a pageable kernel map, with wired entries + when the swapper thread swaps tasks out, it unwires them + but i think modern implementations don't do that any more + well, I was talking about zalloc only :-) + oh + so the zalloc_map must be pageable + or there are two submaps ? + not sure whether "morden implementations" includes Linux ;-) + no, i'm talking about the bsd family only + but it's certainly true that on Linux even inactive processes + retain some memory + linux doesn't make any difference between processor-bound and + I/O-bound processes + braunr: I have no idea how it works. I just remember that when + creating zones, one of the optional flags decides whether the zone is + pagable. but as I said, IIRC there is exactly one that actually is... + zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max, + zone_map_size, FALSE); + kmem_suballoc(parent, min, max, size, pageable) + so the zone_map isn't + IIRC my conclusion was that pagable zones do not count in the + fixed zone map limit... but I'm not sure anymore + zinit() has a memtype parameter + with ZONE_PAGEABLE as a possible flag + this is wierd :) + There is no any zones which use ZONE_PAGEABLE flag + mcsim: are you sure? I think I found one... + if (zone->type & ZONE_PAGEABLE) { + admittedly, it is several years ago that I looked into this, so my + memory is rather dim... + if (kmem_alloc_pageable(zone_map, &addr, ... 
+ calling kmem_alloc_pageable() on an unpageable submap seems wrong + I've greped gnumach code and there is no any zinit procedure call + with ZONE_PAGEABLE flag + good + hm... perhaps it was in some code that has been removed + alltogether since ;-) + actually I think it would be pretty neat to have pageable kernel + objects... but I guess it would require considerable effort to implement + this right + mcsim: you also mentioned absence of caching + mcsim: the zone allocator actually is a bare caching object + allocator + antrik: no, it's easy + antrik: i already had that in x15 0.1 + antrik: the problem is being sure the objects you allocate from a + pageable backing store are never used when resolving a page fault + that's all + I wouldn't expect that to be easy... but surely you know better + :-) + braunr: indeed. I was wrong. + braunr: what is a caching object allocator?... + antrik: ok, it's not easy + antrik: but once you have vm_objects implemented, having pageable + kernel object is just a matter of using the right options, really + antrik: an allocator that caches its buffers + some years ago, the term "object" would also apply to + preconstructed buffers + I have no idea what you mean by "caches its buffers" here :-) + well, a memory allocator which doesn't immediately free its + buffers caches them + braunr: but can it return objects to system? + mcsim: which one ? + yeah, obviously the *implementation* of pageable kernel objects is + not hard. the tricky part is deciding which objects can be pageable, and + which need to be wired... + Can zone allocator return cached objects to system as in slab? 
+ I mean reap() + well yes, it does so, and it does that too often + the caching in the zone allocator is actually limited to the + pagesize + once page is completely free, it is returned to the vm + this is bad caching + yes + if object takes all page than there is now caching at all + caching by side effect + true + but the linux slab allocator does the same thing :p + hm + no, the solaris slab allocator does so + linux's slab returns objects only when system ask + without preconstructed objects, is there actually any point in + caching empty slabs?... + Once I've changed my allocator to slab and it cached more than 1GB + of my memory) + ok wait, need to fix a few mistakes first + s/ask/asks + the zone allocator (in gnumach) actually has a garbage collector + braunr: well, the Solaris allocator follows the slab/magazine + paper, right? so there is caching at the magazine layer... in that case + caching empty slabs too would be rather redundant I'd say... + which is called when running low on memory, similar to the slab + allocaotr + antrik: yes + (or rather the paper follows the Solaris allocator ;-) ) + mcsim: the zone allocator reap() is zone_gc() + braunr: hm, right, there is a "collectable" flag for zones... but + I never understood what it means + braunr: BTW, I heard Linux has yet another allocator now called + "slob"... do you happen to know what that is? 
+ slob is a very simple allocator for embedded devices + AFAIR this is just heap allocator + useful when you have a very low amount of memory + like 1 MiB + yes + just googled it :-) + zone and slab are very similar + sounds like a simple heap allocator + there is another allocator that calls slub, and it better than slab + in many cases + the main difference is the data structures used to store slabs + mcsim: i disagree + mcsim: ah, you already said that :-) + mcsim: slub is better for systems with very large amounts of + memory and processors + otherwise, slab is better + in addition, there are accounting issues with slub + because of cache merging + ok. This strange that slub is default allocator + well both are very good + iirc, linus stated that he really doesn't care as long as its + works fine + he refused slqb because of that + slub is nice because it requires less memory than slab, while + still being as fast for most cases + it gets slower on the free path, when the cpu performing the free + is different from the one which allocated the object + that's a reasonable cost + slub uses heap for large object. Are there any tests that compare + what is better for large objects? + well, if slub requires less memory, why do you think slab is + better for smaller systems? :-) + antrik: smaller is relative + mcsim: for large objects slab allocation is rather pointless, as + you don't have multiple objects in a page anyways... + antrik: when lameter wrote slub, it was intended for systems with + several hundreds processors + BTW, was slqb really refused only because the other ones are "good + enough"?... + yes + wow, that's a strange argument... + linus is already unhappy of having "so many" allocators + well, if the new one is better, it could replace one of the others + :-) + or is it useful only in certain cases? + that's the problem + nobody really knows + hm, OK... 
I guess that should be tested *before* merging ;-) + is anyone still working on it, or was it abandonned? + mcsim: back to caching... + what does caching in the kernel object allocator got to do with + readahead (i.e. clustered paging)?... + if we cached some physical pages we don't need to find new ones for + allocating new object. And that's why there will not be a page fault. + antrik: Regarding kam. Hasn't he finished his project? + err... what? + one of us must be seriously confused + I totally fail to see what caching of physical pages (which isn't + even really a correct description of what slab does) has to do with page + faults + right, KAM didn't finish his project + If we free the physical page and return it to system we need + another one for next allocation. But if we keep it, we don't need to find + new physical page. + And physical page is allocated only then when page fault + occurs. Probably, I'm wrong + what does "return to system" mean? we are talking about the + kernel... + zalloc/slab are about allocating kernel objects. this doesn't have + *anything* to do with paging of userspace processes + only thing the have in common is that they need to get pages from + the physical page allocator. but that's yet another topic + Under "return to system" I mean ability to use this page for other + needs. + mcsim: consider kernel memory to be wired + here, return to system means releasing a page back to the vm + system + the vm_kmem module then unmaps the physical page and free its + virtual address in the kernel map + ok + antrik: the problem with new allocators like slqb is that it's + very difficult to really know if they're better, even with extensive + testing + antrik: there are papers (like wilson95) about the difficulties in + making valuable results in this field + see + http://www.sceen.net/~rbraun/dynamic_storage_allocation_a_survey_and_critical_review.pdf + how can be allocated physically continuous object now? 
+ mcsim: rephrase please + what is similar to kmalloc in Linux to gnumach? + i know memory is reserved for dma in a direct virtual to physical + mapping + so even if the allocation is done similarly to vmalloc() + the selected region of virtual space maps physical memory, so + memory is physically contiguous too + for other allocation types, a block large enough is allocated, so + it's contiguous too + I don't clearly understand. If we have fragmentation in physical + ram, so there aren't 2 free pages in a row, but there are able apart, we + can't to allocate these 2 pages along? + no + but every system has this problem + But since we have only 12 or 32 MB of memory the problem becomes + more significant + you're confusing virtual and physical memory + those 32 MiB are virtual + the physical pages backing them don't have to be contiguous + Oh, indeed + So the only problem are limits? + and performance + and correctness + i find the zone allocator badly written + antrik: mcsim: here is the content of the kernel pmap on NetBSD + (which uses a virtual memory system close to the Mach VM) + antrik: mcsim: http://www.sceen.net/~rbraun/pmap.out + +[[pmap.out]] + + you can see the kmem_map (which is used for most general kernel + allocations) is 128 MiB large + actually it's not the kernel pmap, it's the kernel_map + braunr: why is it called pmap.out then? ;-) + antrik: because the tool is named pmap + for process map + it also exists under Linux, although direct access to + /proc/xx/maps gives more info + braunr: I've said that this is kernel_map. Can I see kernel_map for + Linux? + mcsim: I don't know how to do that + s/I've/You've + but Linux doesn't have submaps, and uses a direct virtual to + physical mapping, so it's used differently + how are things (such as zalloc zones) entered into kernel_map? 
+ in zone_init() you have + zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max, + zone_map_size, FALSE); + so here, kmem_map is named zone_map + then, in zalloc() + kmem_alloc_wired(zone_map, &addr, zone->alloc_size) + so, kmem_alloc just deals out chunks of memory referenced directly + by the address, and without knowing anything about the use? + kmem_alloc() gives virtual pages + zalloc() carves them into buffers, as in the slab allocator + the difference is essentially the lack of formal "slab" object + which makes the zone code look like a mess + so kmem_suballoc() essentially just takes a bunch of pages from + the main kernel_map, and uses these to back another map which then in + turn deals out pages just like the main kernel_map? + no + kmem_suballoc creates a vm_map_entry object, and sets its start + and end address + and creates a vm_map object, which is then inserted in the new + entry + maybe that's what you meant with "essentially just takes a bunch + of pages from the main kernel_map" + but there really is no allocation at this point + except the map entry and the new map objects + well, I'm trying to understand how kmem_alloc() manages things. so + it has map_entry structures like the maps of userspace processes? do + these also reference actual memory objects? 
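To make "zalloc() carves them into buffers" more concrete, here is a small C sketch of carving one page into fixed-size elements threaded on a free list. It is not the gnumach code: `toy_zone`, `toy_zalloc()`, `toy_zfree()` and `TOY_PAGE_SIZE` are hypothetical, and `malloc()` stands in for `kmem_alloc_wired()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

struct toy_zone {
    size_t elem_size;       /* must be a non-zero multiple of sizeof(void *) */
    void *free_list;        /* singly linked list threaded through free elements */
    size_t free_count;
};

/* Hypothetical stand-in for kmem_alloc_wired(): returns one page. */
static void *toy_page_alloc(void)
{
    return malloc(TOY_PAGE_SIZE);
}

/* Get one more page and carve it into elem_size buffers. */
static int toy_zone_grow(struct toy_zone *z)
{
    char *page;
    size_t off;

    assert(z->elem_size >= sizeof(void *));
    assert(z->elem_size % sizeof(void *) == 0);

    page = toy_page_alloc();
    if (page == NULL)
        return -1;

    for (off = 0; off + z->elem_size <= TOY_PAGE_SIZE; off += z->elem_size) {
        void *elem = page + off;

        /* The first word of a free element stores the next free element. */
        *(void **)elem = z->free_list;
        z->free_list = elem;
        z->free_count++;
    }

    return 0;
}

void *toy_zalloc(struct toy_zone *z)
{
    void *elem;

    if (z->free_list == NULL && toy_zone_grow(z) != 0)
        return NULL;

    elem = z->free_list;
    z->free_list = *(void **)elem;
    z->free_count--;
    return elem;
}

void toy_zfree(struct toy_zone *z, void *elem)
{
    *(void **)elem = z->free_list;
    z->free_list = elem;
    z->free_count++;
}
```

Note that nothing here records which page an element came from. That roughly corresponds to the missing formal "slab" object mentioned above, and it is why this sketch has no way to give a fully free page back.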
+ kmem_alloc just allocates virtual pages from a vm_map, and backs + those with physical pages (unless the user requested pageable memory) + it's not "like the maps of userspace processes" + these are actually the same structures + a vm_map_entry can reference a memory object or a kernel submap + in netbsd, it can also referernce nothing (for pure wired kernel + memory like the vm_page array) + maybe it's the same in mach, i don't remember exactly + antrik: this is actually very clear in vm/vm_kern.c + kmem_alloc() creates a new kernel object for the allocation + allocates a new entry (or uses a previous existing one if it can + be extended) through vm_map_find_entry() + then calls kmem_alloc_pages() to back it with wired memory + "creates a new kernel object" -- what kind of kernel object? + kmem_alloc_wired() does roughly the same thing, except it doesn't + need a new kernel object because it knows the new area won't be pageable + a simple vm_object + used as a container for anonymous memory in case the pages are + swapped out + vm_object is the same as memory object/pager? or yet something + different? + antrik: almost + antrik: a memory_object is the user view of a vm_object + as in the kernel/user interfaces used by external pagers + vm_object is a more internal name + Is fragmentation a big problem in slab allocator? + I've tested it on my computer in Linux and for some caches it + reached 30-40% + well, fragmentation is a major problem for any allocator... + the original slab allocator was design specifically with the goal + of reducing fragmentation + the revised version with the addition of magazines takes a step + back on this though + have you compared it to slub? would be pretty interesting... + I have an idea how can it be decreased, but it will hurt by + performance... 
+ antrik: no I haven't, but there will be might the same, I think + if each cache will handle two types of object: with sizes that will + fit cache sizes (or I bit smaller) and with sizes which are much smaller + than maximal cache size. For first type of object will be used standard + slab allocator and for latter type will be used (within page) heap + allocator. + I think that than fragmentation will be decreased + not at all. heap allocator has much worse fragmentation. that's + why slab allocator was invented + the problem is that in a long-running program (such an the + kernel), objects tend to have vastly varying lifespans + but we use heap only for objects of specified sizes + so often a few old objects will keep a whole page hostage + for example for 32 byte cache it could be 20-28 byte objects + that's particularily visible in programs such as firefox, which + will grow the heap during use even though actual needs don't change + the slab allocator groups objects in a fashion that makes it more + likely adjacent objects will be freed at similar times + well, that's pretty oversimplyfied, but I hope you get the + idea... it's about locality + I agree, but I speak not about general heap allocation. We have + many heaps for objects with different sizes. + Could it be better? + note that this has been a topic of considerable research. you + shouldn't seek to improve the actual algorithms -- you would have to read + up on the existing research at least before you can contribute anything + to the field :-) + how would that be different from the slab allocator? + slab will allocate 32 byte for both 20 and 32 byte requests + And if there was request for 20 bytes we get 12 unused + oh, you mean the implementation of the generic allocator on top of + slabs? well, that might not be optimal... but it's not an often used case + anyways. 
mostly the kernel uses constant-sized objects, which get their + own caches with custom tailored size + I don't think the waste here matters at all + affirmative. So my idea is useless. + does the statistic you refer to show the fragmentation in absolute + sizes too? + Can you explain what is absolute size? + I've counted what were requested (as parameter of kmalloc) and what + was really allocated (according to best fit cache size). + how did you get that information? + I simply wrote a hook + I mean total. i.e. how many KiB or MiB are wasted due to + fragmentation alltogether + ah, interesting. how does it work? + BTW, did you read the slab papers? + Do you mean articles from lwn.net? + no + I mean the papers from the Sun hackers who invented the slab + allocator(s) + Bonwick mostly IIRC + Yes + hm... then you really should know the rationale behind it... + There he says about 11% percent of memory waste + you didn't answer my other questions BTW :-) + I've corrupted kernel tree with patch, and tomorrow I'm going to + read myself up for exam (I have it on Thursday). But than I'll send you a + module which I've used for testing. + OK + I can send you module now, but it will not work without patch. + It would be better to rewrite it using debugfs, but when I was + writing this test I didn't know about trace_* macros + +2011-04-15 + + There is a hack in zone_gc when it allocates and frees two + vm_map_kentry_zone elements to make sure the gc will be able to allocate + two in vm_map_delete. Isn't it better to allocate memory for these + entries statically? 
+ mcsim: that's not the point of the hack + mcsim: the point of the hack is to make sure vm_map_delete will be + able to allocate stuff + allocating them statically will just work once + it may happen several times that vm_map_delete needs to allocate it + while it's empty (and thus zget_space has to get called, leading to a + hang) + funnily enough, the bug is also in macos X + it's still in my TODO list to manage to find how to submit the + issue to them + really ? + eh + is that because of map entry splitting ? + it's git commit efc3d9c47cd744c316a8521c9a29fa274b507d26 + braunr: iirc something like this, yes + netbsd has this issue too + possibly + i think it's a fundamental problem with the design + people think of munmap() as something similar to free() + whereas it's really unmap + with a BSD-like VM, unmap can easily end up splitting one entry in + two + but your issue is more about harmful recursion right ? + I don't remember actually + it's quite some time ago :) + ok + i think that's why i have "sources" in my slab allocator, the + default source (vm_kern) and a custom one for kernel map entries + +2011-04-18 + + braunr: you've said that once page is completely free, it is + returned to the vm. + who else, besides zone_gc, can return free pages to the vm? + mcsim: i also said i was wrong about that + zone_gc is the only one + +2011-04-19 + + antrik: mcsim: i added back a new per-cpu layer as planned + + http://git.sceen.net/rbraun/libbraunr.git/?a=blob;f=mem.c;h=c629b2b9b149f118a30f0129bd8b7526b0302c22;hb=HEAD + mcsim: btw, in mem_cache_reap(), you can clearly see there are two + loops, just as in zone_gc, to reduce contention and avoid deadlocks + this is really common in memory allocators + +2011-04-23 + + I've looked through some allocators and all of them use different + per cpu cache policy. AFAIK gnuhurd doesn't support multiprocessing, but + still multiprocessing must be kept in mind. So, what do you think what + kind of cpu caches is better? 
As for me I like variant with only per-cpu + caches (like in slqb). + mcsim: well, have you looked at the allocator braunr wrote + himself? :-) + I'm not sure I suggested that explicitly to you; but probably it + makes most sense to use that in gnumach + +2011-04-24 + + antrik: Yes, I have. He uses both global and per cpu caches. But he + also suggested to look through slqb, where there are only per cpu + caches. + i don't remember slqb in detail + what do you mean by "only per-cpu caches" ? + a whole slab system for each cpu ? + I mean that there are no global queues in caches, but there are + special queues for each cpu. + I've just started investigating slqb's code, but I've read an + article on lwn about it. And I've read that it is used for zen kernel. + zen ? + Here is this article http://lwn.net/Articles/311502/ + Yes, this is linux kernel with some patches which haven't been + approved to torvald's tree + http://zen-kernel.org/ + i see + well it looks nice + but as for slub, the problem i can see is cross-CPU freeing + and I think nick piggins mentions it + piggin* + this means that sometimes, objects are "burst-free" from one cpu + cache to another + which has the same bad effects as in most other allocators, mainly + fragmentation + There is a special list for freeing object allocated for another + CPU + And garbage collector frees such object on his own + so what's your question ? + It is described in the end of article. + What cpu-cache policy do you think is better to implement?
+ at this point, any + and even if we had a kernel that perfectly supports + multiprocessor, I wouldn't care much now + it's very hard to evaluate such allocators + slqb looks nice, but if you have the same amount of fragmentation + per slab as other allocators do (which is likely), you have tat amount of + fragmentation multiplied by the number of processors + whereas having shared queues limit the problem somehow + having shared queues mean you have a bit more contention + so, as is the case most of the time, it's a tradeoff + by the way, does pigging say why he "doesn't like" slub ? :) + piggin* + http://lwn.net/Articles/311093/ + here he describes what slqb is better. + well it doesn't describe why slub is worse + but not very particularly + except for order-0 allocations + and that's a form of fragmentation like i mentioned above + in mach those problems have very different impacts + the backend memory isn't physical, it's the kernel virtual space + so the kernel allocator can request chunks of higher than order-0 + pages + physical pages are allocated one at a time, then mapped in the + kernel space + Doesn't order of page depend on buffer size? + it does + And why does gnumach allocates higher than order-0 pages more? + why more ? + i didn't say more + And why in mach those problems have very different impact? + ? + i've just explained why :) + 09:37 < braunr> physical pages are allocated one at a time, then + mapped in the kernel space + "one at a time" means order-0 pages, even if you allocate higher + than order-0 chunks + And in Linux they allocated more than one at time because of + prefetching page reading? + do you understand what virtual memory is ? 
+ linux allocators allocate "physical memory" + mach kernel allocator allocates "virtual memory" + so even if you allocate a big chunk of virtual memory, it's backed + by order-0 physical pages + yes, I understand this + you don't seem to :/ + the problem of higher than order-0 page allocations is + fragmentation + do you see why ? + yes + so + fragmentation in the kernel space is less likely to create issues + than it does in physical memory + keep in mind physical memory is almost always full because of the + page cache + and constantly under some pressure + whereas the kernel space is mostly empty + so allocating higher then order-0 pages in linux is more dangerous + than it is in Mach or BSD + ok + on the other hand, linux focuses pure performance, and not having + to map memory means less operations, less tlb misses, quicker allocations + the Mach VM must map pages "one at a time", which can be expensive + it should be adapted to handle multiple page sizes (e.g. 2 MiB) so + that many allocations can be made with few mappings + but that's not easy + as always: tradeoffs + There are other benefits of physical allocating. In big DMA + transfers can be needed few continuous physical pages. How does mach + handles such cases? + gnumach does that awfully + it just reserves the whole DMA-able memory and uses special + allocation functions on it, IIRC + but kernels which have a MAch VM like memory sytem such as BSDs + have cleaner methods + NetBSD provides a function to allocate contiguous physical memory + with many constraints + FreeBSD uses a binary buddy system like Linux + the fact that the kernel allocator uses virtual memory doesn't + mean the kernel has no mean to allocate contiguous physical memory ... 
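The rounding waste discussed in the fragmentation exchange above (a generic allocator on top of slab caches serves both a 20-byte and a 32-byte request from the 32-byte cache, leaving 12 bytes unused in the first case) can be sketched in C. This is only an illustration of the arithmetic: the size classes and the names `cache_sizes`, `best_fit_cache_size` and `wasted_bytes` are hypothetical and are not taken from gnumach, Linux, or the test module mentioned in the log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes for a generic allocator built on slab caches. */
static const size_t cache_sizes[] = { 16, 32, 64, 128, 256, 512, 1024 };
#define NR_CACHES (sizeof(cache_sizes) / sizeof(cache_sizes[0]))

/* Return the smallest cache size that fits the request, or 0 if none does. */
static size_t best_fit_cache_size(size_t request)
{
    for (size_t i = 0; i < NR_CACHES; i++)
        if (request <= cache_sizes[i])
            return cache_sizes[i];
    return 0;
}

/* Bytes lost to rounding when one request is served from its best-fit cache. */
static size_t wasted_bytes(size_t request)
{
    size_t size = best_fit_cache_size(request);

    assert(size != 0);  /* the request must fit one of the caches */
    return size - request;
}
```

Summing `wasted_bytes()` over all requests gives the kind of "requested versus really allocated" total asked about in the log. Objects that have their own cache with a tailored size do not go through this rounding at all.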
diff --git a/open_issues/gnumach_memory_management/pmap.out b/open_issues/gnumach_memory_management/pmap.out
new file mode 100644
index 00000000..b1af1e66
--- /dev/null
+++ b/open_issues/gnumach_memory_management/pmap.out
@@ -0,0 +1,85 @@
+Start End Size Offset rwxpc RWX I/W/A Dev Inode - File
+c0000000-c16c1fff 23304k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+c16c2000-c16c2fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+c16c3000-c16e2fff 128k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+c16e3000-c999cfff 133864k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ kmem_map ]
+ c16e3000-c16e3fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c16e4000-c1736fff 332k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c1737000-c1737fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c1738000-c1766fff 188k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c1767000-c1767fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c1768000-c182dfff 792k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c182e000-c182efff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c182f000-c187bfff 308k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c187c000-c187cfff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c187d000-c187dfff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ c1880000-c189ffff 128k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+c999d000-ca99cfff 16384k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ pager_map ]
+ca99d000-ca9b7fff 108k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ca9b8000-ca9b9fff 8k 0a9b8000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+ca9ba000-ca9bbfff 8k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ca9bc000-ca9bffff 16k 0a9bc000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+ca9c0000-ca9dffff 128k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ca9e0000-cab0bfff 1200k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ phys_map ]
+cab0c000-cad16fff 2092k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ mb_map ]
+ cab0c000-cab0cfff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ cab0d000-cab3afff 184k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cad17000-cad26fff 64k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cad27000-cad2cfff 24k 0ad27000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cad2d000-cad2dfff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cad2e000-cad2ffff 8k 0ad2e000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cad30000-cae0ffff 896k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cae10000-cae11fff 8k 0ae10000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cae12000-cae81fff 448k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cae82000-cae83fff 8k 0ae82000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cae84000-caecbfff 288k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+caecc000-caecdfff 8k 0aecc000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+caece000-caecefff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+caecf000-caecffff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+caed0000-caed1fff 8k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+caed2000-caed3fff 8k 0aed2000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+caed4000-caee5fff 72k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+caee6000-caee9fff 16k 0aee6000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+caeea000-caeeefff 20k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+caeef000-caef4fff 24k 0aeef000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+caef5000-cb00cfff 1120k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb00d000-cb01cfff 64k 0b00d000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb01d000-cb02afff 56k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb02b000-cb82afff 8192k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ ubc_pager ]
+cb82b000-cb838fff 56k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb839000-cb839fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb83a000-cb83bfff 8k 0b83a000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb83c000-cb855fff 104k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb856000-cb857fff 8k 0b856000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb858000-cb858fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb859000-cb85cfff 16k 0b859000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb85d000-cb85dfff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb85e000-cb85ffff 8k 0b85e000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb860000-cb88ffff 192k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb890000-cb8cffff 256k 0b890000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb8d0000-cb8f0fff 132k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cb8f1000-cb8f4fff 16k 0b8f1000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cb8f5000-cba03fff 1084k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cba04000-cba04fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cba05000-cbaf1fff 948k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbaf2000-cbaf3fff 8k 0baf2000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbaf4000-cbaf7fff 16k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbaf8000-cbafffff 32k 0baf8000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbb00000-cbb70fff 452k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbb71000-cbb76fff 24k 0bb71000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbb77000-cbb7bfff 20k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbb7c000-cbb7ffff 16k 0bb7c000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbb80000-cbbc1fff 264k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbbc2000-cbbc2fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbbc3000-cbbc3fff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbbc4000-cbbc5fff 8k 0bbc4000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbbc6000-cbbc8fff 12k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbbc9000-cbbcafff 8k 0bbc9000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbbcb000-cbbcdfff 12k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbbce000-cbbcffff 8k 0bbce000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbbd0000-cbca1fff 840k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbca2000-cbcadfff 48k 0bca2000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbcae000-cbcaefff 4k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+cbcaf000-cbcb2fff 16k 0bcaf000 rwxs- (rwx) 2/0/1 00:00 0 - [ uvm_aobj ]
+cbcc0000-cbcdffff 128k 00000000 rwxs- (rwx) 2/0/1 00:00 0 - [ anon ]
+ total 193356k
diff --git a/open_issues/rework_gnumach_ipc_spaces.mdwn b/open_issues/rework_gnumach_ipc_spaces.mdwn
new file mode 100644
index 00000000..c0b7c8dd
--- /dev/null
+++ b/open_issues/rework_gnumach_ipc_spaces.mdwn
@@ -0,0 +1,241 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach]]
+
+IRC, freenode, #hurd, 2011-04-23
+
+ youpi: is there any use of the port renaming facility ?
+ I don't know
+ at least, did you see such use ?
+ i wonder why mach mach_port_insert_right() lets the caller specify
+ the port name
+ ../hurd-debian/hurd/serverboot/default_pager.c: kr =
+ mach_port_rename( default_pager_self,
+ mach_port_rename() is used only once, in the default pager
+ so it's not that important
+ but mach_port_insert_right() lets userspace task decide the port
+ name value
+ just to repeat myself again, I don't know port stuff very much :)
+ well you know that a port denotes a right, which denotes a port
+ yes, but I don't have any real experience with it
+ err
+ port name
+ the only reason I see is that the caller, say /hurd/exec running a
+ fork()
+ hm
+ no, i don't even see the reason here
+ port names should be allocated by the kernel only, like file
+ descriptors
+ you can choose file descriptor values too
+ really ?
+ with dup2, yes + oh + hm + what's the data structure in current unices to store file + descriptors ? + a hash table ? + I don't know + i'll have to look at that + FYI, i'm asking these questions because i'm thinking of reworking + ipc spaces + i believe the use of splay trees completely destroys performance + of tasks with many many port names such as the root file system + that can be a problem yes + since there are 3 ports per opened file, and like 3 per thread too + + the page cache + with a few thousand opened files and threads, that makes a lot + by "opened file" I meant page cache actually + i saw numbers up to 30k + ok + on buildds I easily see 100k ports + for a single task ? + wow + yes + the page cache is 4k files + so that's definitely worth the try + so that already makes 12k ports + and 4k is not so big + it's limited to 4K ? + I haven't been able to check where the 100k come from yet + braunr: yas + could be leaks :/ + yes + omg, a hard limit on the page cache .. + vm/vm_object.c:int vm_object_cached_max = 4000; /* may + be patched*/ + mach is really old :( + I've raised it + before it was 200 + ... + oO + I tried to dro pthe limit, but then I was lacking memory + which I believe have fixed the other day, but I have to test again + that implementation doesn't know how to deal with memory pressure + yes + i saw your recent changes about adding warnings in such cases + so, back to ipc spaces + i think splay trees 1/ can get very unbalanced easily + which isn't hard to imagine + and 2/ make poor usage of the cpu caches because they're BST and + write a lot to memory + maybe you could write a patch which would dump statistics on that? 
+ that's part of the job i'm assigning to myself + ok + i'd like to try replacing splay trees with radix trees + I can run it on the buildds + buildds are very good stress-tests :) + :) + 22h building -> 77k ports + 26h building -> 97k ports + the problem is that when I add leak debugging (backtraces), I'm + getting out of memory :) + that will be a small summer of code outside the gsoc :p + :/ + backtraces are very consuming + but that's only because of hardcoded limits + I'll have to test again with bigger limits + again .. + evil hard limits + well, actually we could as well just drop them + but we'd also need to easily get statistics on zone/vm_maps usage + because else we don't see leaks + (except that the machine eventually crashes) + hm + i haven't explained why i was asking my questions actually + so, i want radix trees, because they're nice + they reduce the paths lengths + they don't get too unbalanced (they're invariant wrt the order of + operations) + they don't need to write to memory on lookups + the only drawback is that they can create much overhead if their + usage pattern isn't appropriate + elements in such a structure should be close, so that they share + common nodes + the common usage pattern in ext2fs is a big bunch of ever-open + ports :) + if there is one entry per node, it's a big waste + yes + there are 3, actually + but the port names have low values + they're allocated sequentially, beginning at 0 + (or 1 actually) + which is perfect for radix trees + yes + 97989: send + but if anyone can rename + this introduces a new potential weakness + ah, if it's just a weakness it's probably not a problem + I thought it was even a no-go + i think so + I guess port rename is very seldom + but in a future version, it would be nice not to allow port + renaming + unless there are similar issues in current unix kernels + in which case i'd say it's acceptable + there are + of that order ? + and it'd be useful for e.g. 
processing + tracing/debugging/tweaking/whatever + it's also used to hide fds from a process + port renaming you mean ? + you allocate them very high + yes + ok + choosing your port name, generally + to match what the process expects for instance + then it would be a matter of resource limiting (which we totally + lack afaik) + along the number of maximum open files, you would have a number of + maximum rights + does that seem fine to you ? + if done throught rlimits, sure + something similar yes + (_no_ PORTS_MAX ;) ) + oh and, in addition, i remember gnumach has a special + configuration of the processor in which caching is limited + like write-through only + didn't I fix that recently ? + i don't know :) + CR0=e001003b + i don't think it's fixed + I mean, in the git + ah + not in the debian package + didn't tried the git version yet + last time i tried (which was a long time ago), it made the kernel + crash + have you figured why ? + I'm not aware of that + anyway, splay trees write a lot, and most trees write a lot even + at insertion/removal to rebalance + braunr: Mmm, there's no clearance of CD in the kernel actually + with radix trees, even if caching can't be fully enabled, it would + make much better use of it + so if port renaming isn't a true issue, i'll choose that data + structure + that'd probably be better yes + I'm surprised by the CD, I do remember fixing something like this + lately + there are several levels where CD can be set + the processors ORs all those if i'm right + to determine if caching is enabled + I know + ok + but in my memory that was at the CR* level, precisely + maybe for xen only ? 
+ no + well good luck if you hunt that one, i'm off, see you :) + braunr: ah, no, it was the PGE flag that I had fixed + + braunr: explicit port naming is used for example to pass some + initial ports to a new task at well-known places IIRC + braunr: but these tend to be low numbers, so I don't see a problem + there + (I'm not familiar with radix trees... why would high numbers be a + problem?) + + braunr: iirc the ipc space is limited to ~192k ports + + antrik: in most cases i've seen, the insert_right() call is used + on task_self() + and if there really are special ports (like the bootstrap or + device ports), they should have special names + IIRC, these ports are given through command line expansion by the + kernel at boot time + but it seems reasonable to think of port renaming as a potentially + useful feature + antrik: the problem with radix trees isn't them being high, it's + them being sparse + you get the most efficient trees when entries have keys that are + close to each other + because radix trees are a type of tries (the path in the tree is + based on the elements composing the key) + so the more common prefixes you have, the less external nodes you + need + here, keys are port names, but they can be memory addresses or + offsets in memory objects (like in the page cache) + the radix algorithm takes a few bits, say 4 or 6, at a time from a + key, and uses that as an index in a node + if keys are sparse, there can be as little as one entry per node + IIRC, the worst case (on entry per node with the maximum possible + number of nodes for a 32-bits key) is 2% entries + the reste being null entries and almost-empty nodes containing + them + so if you leave the ability to give port rights the names you + want, you can create such worst case trees + which may consume several MiB of memory per tree + tens of MiB i'd say + on the other hand, in the current state, almost all hurd + applications use sequentially allocated port names, close to 0 (which + allows 
a nice optimization)
+ so a radix tree would be the most efficient
+ well, if some processes really feel they must use random numbers
+ for port names, they *ought* to be penalized ;-)
--
cgit v1.2.3
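The radix tree argument at the end of the IPC spaces log (a few bits of the key index each node, so close keys share nodes while sparse keys can end up with one entry per node) can be sketched in C. This is a minimal illustration under stated assumptions, not the structure braunr implemented or anything from gnumach: it uses a fixed height, 4 bits per level and 32-bit keys, and every name in it (`radix_tree`, `radix_insert`, `radix_lookup`, `radix_nodes_for_stride`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RADIX_BITS   4                    /* bits of the key consumed per level */
#define RADIX_SLOTS  (1u << RADIX_BITS)   /* 16 slots per node */
#define RADIX_LEVELS (32 / RADIX_BITS)    /* 8 levels for a 32-bit key */

struct radix_node {
    void *slots[RADIX_SLOTS];   /* child nodes, or values at the last level */
};

struct radix_tree {
    struct radix_node *root;
    size_t nr_nodes;            /* nodes allocated so far */
};

/* Slot index used at the given level (level 0 is the last one). */
static unsigned int radix_index(uint32_t key, int level)
{
    return (key >> (level * RADIX_BITS)) & (RADIX_SLOTS - 1);
}

static struct radix_node *radix_node_new(struct radix_tree *tree)
{
    struct radix_node *node = calloc(1, sizeof(*node));

    assert(node != NULL);
    tree->nr_nodes++;
    return node;
}

static void radix_insert(struct radix_tree *tree, uint32_t key, void *value)
{
    if (tree->root == NULL)
        tree->root = radix_node_new(tree);

    struct radix_node *node = tree->root;
    for (int level = RADIX_LEVELS - 1; level > 0; level--) {
        void **slot = &node->slots[radix_index(key, level)];
        if (*slot == NULL)
            *slot = radix_node_new(tree);
        node = *slot;
    }
    node->slots[radix_index(key, 0)] = value;
}

/* Lookups only read the tree; nothing is written or rebalanced. */
static void *radix_lookup(const struct radix_tree *tree, uint32_t key)
{
    const struct radix_node *node = tree->root;

    for (int level = RADIX_LEVELS - 1; level > 0 && node != NULL; level--)
        node = node->slots[radix_index(key, level)];
    return node != NULL ? node->slots[radix_index(key, 0)] : NULL;
}

static void radix_free(struct radix_node *node, int level)
{
    if (node == NULL)
        return;
    if (level > 0)
        for (unsigned int i = 0; i < RADIX_SLOTS; i++)
            radix_free(node->slots[i], level - 1);
    free(node);
}

/* Insert `count` keys spaced `stride` apart and report how many nodes
 * the tree needed to hold them. */
static size_t radix_nodes_for_stride(uint32_t count, uint32_t stride)
{
    static int dummy;
    struct radix_tree tree = { NULL, 0 };

    for (uint32_t i = 0; i < count; i++)
        radix_insert(&tree, i * stride, &dummy);
    for (uint32_t i = 0; i < count; i++)
        assert(radix_lookup(&tree, i * stride) == &dummy);

    size_t nr_nodes = tree.nr_nodes;
    radix_free(tree.root, RADIX_LEVELS - 1);
    return nr_nodes;
}
```

With these parameters, 16 sequential names (0 to 15) share a single path of 8 nodes, while 16 names spaced 2^28 apart need 113 nodes for the same 16 entries. That is the cost the log attributes to letting callers pick arbitrary port names. A real implementation would likely grow the height on demand rather than always using 8 levels, which is one way low, sequential names could be made even cheaper.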