From 95878586ec7611791f4001a4ee17abf943fae3c1 Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/a/g3Ccalpj0NhN566pHbUl6i9QF0QEkrhlfPM-#b1c14" Date: Mon, 16 Feb 2015 20:08:03 +0100 Subject: rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn --- .../gnumach_memory_management.mdwn | 2391 ++++++++++++++++++++ 1 file changed, 2391 insertions(+) create mode 100644 service_solahart_jakarta_selatan__082122541663/gnumach_memory_management.mdwn diff --git a/service_solahart_jakarta_selatan__082122541663/gnumach_memory_management.mdwn b/service_solahart_jakarta_selatan__082122541663/gnumach_memory_management.mdwn new file mode 100644 index 00000000..b36c674a --- /dev/null +++ b/service_solahart_jakarta_selatan__082122541663/gnumach_memory_management.mdwn @@ -0,0 +1,2391 @@ +[[!meta copyright="Copyright © 2011, 2012, 2013, 2014 Free Software Foundation, +Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_gnumach]] + +There is a [[!FF_project 266]][[!tag bounty]] on this task. + +[[!toc]] + + +# IRC, freenode, #hurd, 2011-04-12 + + braunr: do you think the allocator you wrote for x15 could be used + for gnumach? and would you be willing to mentor this?
:-) + antrik: to be willing to isn't my current problem + antrik: and yes, I think my allocator can be used + it's a slab allocator after all, it only requires reap() and + grow() + or mmap()/munmap() whatever you want to call it + a backend + antrik: although i've been having other ideas recently + that would have more impact on our usage patterns I think + mcsim: have you investigated how the zone allocator works and how + it's hooked into the system yet? + mcsim: now let me give you a link + mcsim: + http://git.sceen.net/rbraun/libbraunr.git/?a=blob;f=mem.c;h=330436e799f322949bfd9e2fedf0475660309946;hb=HEAD + mcsim: this is an implementation of the slab allocator i've been + working on recently + mcsim: i haven't made it public because i reworked the per + processor layer, and this part isn't complete yet + mcsim: you could use it as a reference for your project + braunr: ok + it used to be close to the 2001 vmem paper + but after many tests, fragmentation and accounting issues have + been found + so i rewrote it to be closer to the linux implementation (cache + filling/draining in bukl transfers) + bulk* + they actually use the word draining in linux too :) + antrik: not complete yet. + braunr: oh, it's unfinished? that's unfortunate... + antrik: only the per processor part + antrik: so it doesn't matter much for gnumach + and it's not difficult to set up + mcsim: hm, OK... but do you think you will have a fairly good + understanding in the next couple of days?... + I'm asking because I'd really like to see a proposal a bit more + specific than "I'll look into things..." + i.e. you should have an idea which things you will actually have + to change to hook up a new allocator etc. + braunr: OK. will the interface remain unchanged, so it could be + easily replaced with an improved implementation later? 
+ the zone allocator in gnumach is a badly written bare object + allocator actually, there aren't many things to understand about it + antrik: yes + great :-) + and the per processor part should be very close to the phys + allocator sitting next to it + (with the slight difference that, as per cpu caches have variable + sizes, they are allocated on the free path rather than on the allocation + path) + this is a nice trick in the vmem paper i've kept in mind + and the interface also allows setting a "source" for caches + ah, good point... do you think we should replace the physmem + allocator too? and if so, do it in one step, or one piece at a time?... + no + too many drivers currently depend on the physical allocator and + the pmap module as they are + remember linux 2.0 drivers need a direct virtual to physical + mapping + (especially true for dma mappings) + OK + the nice thing about having a configurable memory source is that + what do you mean by "allocated on the free path"? + even if most caches will use the standard vm_kmem module as their + backend + there is one exception in the vm_map module, allowing us to get + rid of either a static limit, or specific allocation code + antrik: well, when you allocate a page, the allocator will lookup + one in a per cpu cache + if it's empty, it fills the cache + (called pools in my implementations) + it then retries + the problem in the slab allocator is that per cpu caches have + variable sizes + so per cpu pools are allocated from their own pools + (remember the magazine_xx caches in the output i showed you, this + is the same thing) + but if you allocate them at allocation time, you could end up in + an infinite loop + so, in the slab allocator, when a per cpu cache is empty, you just + fall back to the slab layer + on the free path, when a per cpu cache doesn't exist, you allocate + it from its own cache + this way you can't have an infinite loop + antrik: I'll try, but I have exams now.
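The free-path trick described above can be sketched in C roughly as follows. This is a minimal illustration only, not code from gnumach, x15 or libbraunr: every name (`toy_cache`, `toy_pool`, `pool_cache`, the `toy_slab_*` stand-ins) is hypothetical, and the slab layer is simulated with `malloc()`/`free()`. The point it tries to show is that the allocation path never creates a per-CPU pool, so allocating a pool from the pool cache cannot recurse without end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCPUS 2
#define POOL_CAPACITY 4

/* Per-CPU pool ("magazine"): a small stack of cached free objects. */
struct toy_pool {
    size_t count;
    void *objs[POOL_CAPACITY];
};

struct toy_cache {
    size_t obj_size;
    struct toy_pool *pools[NCPUS]; /* NULL until the first free on that CPU */
    unsigned long slab_allocs;     /* counters, only for the example */
    unsigned long slab_frees;
    unsigned long pools_created;
};

/* Hypothetical stand-ins for the slab layer (assumption: plain malloc). */
static void *toy_slab_alloc(struct toy_cache *c)
{
    c->slab_allocs++;
    return malloc(c->obj_size);
}

static void toy_slab_free(struct toy_cache *c, void *obj)
{
    c->slab_frees++;
    free(obj);
}

/* Pools themselves are objects of their own cache. */
static struct toy_cache pool_cache = { .obj_size = sizeof(struct toy_pool) };

static void *toy_cache_alloc(struct toy_cache *c, int cpu)
{
    struct toy_pool *p = c->pools[cpu];

    if (p != NULL && p->count > 0)
        return p->objs[--p->count];

    /* Missing or empty pool: fall back to the slab layer. No pool is
     * created here, so this path never re-enters the pool cache. */
    return toy_slab_alloc(c);
}

static void toy_cache_free(struct toy_cache *c, int cpu, void *obj)
{
    struct toy_pool *p = c->pools[cpu];

    if (p == NULL) {
        /* Create the pool lazily, on the free path. The nested call
         * only takes the slab fallback above, so it terminates. */
        p = toy_cache_alloc(&pool_cache, cpu);
        if (p != NULL) {
            p->count = 0;
            c->pools[cpu] = p;
            c->pools_created++;
        }
    }

    if (p != NULL && p->count < POOL_CAPACITY) {
        p->objs[p->count++] = obj;
        return;
    }

    toy_slab_free(c, obj); /* pool full or unavailable */
}
```

With this layout, a first `toy_cache_alloc()` on a CPU goes straight to the slab layer, the first `toy_cache_free()` creates the pool, and the next allocation is served from that pool.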
+ As I understand amount of elements which could be allocated we + determine by zone initialization. And at this time memory for zone is + reserved. I'm going to change this. And make something similar to kmalloc + and vmalloc (support for pages consecutive physically and virtually). And + pages in zones consecutive always physically. + Am I right? + mcsim: don't try to do that + why? + mcsim: we just need a slab allocator with an interface close to + the zone allocator + mcsim: IIRC the size of the complete zalloc map is fixed; but not + the number of elements per zone + we don't need two allocators like kmalloc and vmalloc + actually we just need vmalloc + IIRC the limits are only present because the original developers + wanted to track leaks + they assumed zones would be large enough, which isn't true any + more today + but i didn't see any true reservation + antrik: i'm not sure i was clear enough about the "allocation of + cpu caches on the free path" + antrik: for a better explanation, read the vmem paper ;) + braunr: you mean there is no fundamental reason why the zone map + has a limited maximal size; and it was only put in to catch cases where + something eats up all memory with kernel object creation?... + braunr: I think I got it now :-) + antrik: i'm pretty certain of it yes + I don't see though how it is related to what we were talking + about... + 10:55 < braunr> and the per processor part should be very close to + the phys allocator sitting next to it + the phys allocator doesn't have to use this trick + because pages have a fixed size, so per cpu caches all have the + same size too + and the number of "caches", that is, physical segments, is limited + and known at compile time + so having them statically allocated is possible + I see + it would actually be very difficult to have a phys allocator + requiring dynamic allocation when the dynamic allocator isn't yet ready + hehe :-) + total size of all zone allocations is limited to 12 MB.
And is "was + only put in to catch cases where something eats up all memory with kernel + object creation?" + mcsim: ah right, there could be a kernel submap backing all the + zones + but this can be increased too + submaps are kind of evil :/ + mcsim: I think it's actually 32 MiB or something like that in the + Debian version... + braunr: I'm not sure I ever fully understood what the zalloc map + is... I looked through the code once, and I think I got a rough + understanding, but I was still pretty uncertain about some bits. and I + don't remember the details anyways :-) + antrik: IIRC, it's a kernel submap + it's named kmem_map in x15 + don't know what a submap is + submaps are vm_map objects + in a top vm_map, there are vm_map_entries + these entries usually point to vm_objects + (for the page cache) + but they can point to other maps too + the goal is to reduce fragmentation by isolating allocations + this also helps reducing contention + for example, on BSD, there is a submap for mbufs, so that the + network code doesn't interfere too much with other kernel allocations + antrik: they are similar to spans in vmem, but vmem has an elegant + importing mechanism which eliminates the static limit problem + so memory is not directly allocated from the physical allocator, + but instead from another map which in turn contains physical memory, or + something like that?...
+ no, this is entirely virtual + submaps are almost exclusively used for the kernel_map + you are using a lot of identifiers here, but I don't remember (or + never knew) what most of them mean :-( + sorry :) + the kernel map is the vm_map used to represent the ~1 GiB of + virtual memory the kernel has (on i386) + vm_map objects are simple virtual space maps + they contain what you see in linux when doing /proc/self/maps + cat /proc/self/maps + (linux uses entirely different names but it's roughly the same + structure) + each line is a vm_map_entry + (well, there aren't submaps in linux though) + the pmap tool on netbsd is able to show the kernel map with its + submaps, but i don't have any image around + braunr: is limit for zones is feature and shouldn't be changed? + mcsim: i think we shouldn't have fixed limits for zones + mcsim: this should be part of the debugging facilities in the slab + allocator + is this fixed limit really a major problem ? + i mean, don't focus on that too much, there are other issues + requiring more attention + braunr: at 12 MiB, it used to be, causing a lot of zalloc + panics. after increasing, I don't think it's much of a problem anymore... + but as memory sizes grow, it might become one again + that's the problem with a fixed size... + yes, that's the issue with submaps + but gnumach is full of those, so let's fix them by order of + priority + well, I'm still trying to digest what you wrote about submaps :-) + i'm downloading netbsd, so you can have a good view of all this + so, when the kernel allocates virtual address space regions + (mostly for itself), instead of grabbing chunks of the address space + directly, it takes parts out of a pre-reserved region?
+ not exactly + both statements are true + antrik: only virtual addresses are reserved + it grabs chunks of the address space directly, but does so in a + reserved region of the address space + a submap is like a normal map, it has a start address, a size, and + is empty, then it's populated with vm_map_entries + so instead of allocating from 3-4 GiB, you allocate from, say, + 3.1-3.2 GiB + yeah, that's more or less what I meant... + braunr: I see two problems: limited zones and absence of caching. + with caching absence of readahead paging will be not so significant + please avoid readahead + ok + and it's not about paging, it's about kernel memory, which is + wired + (well most of it) + what about limited zones ? + the whole kernel space is limited, there has to be limits + the problem is how to handle them + braunr: almost all. I looked through all zones once, and IIRC I + found exactly one that actually allows paging... + currently, when you reach the limit, you have an OOM error + antrik: yes, there are + i don't remember which implementation does that but, when + processes haven't been active for a minute or so, they are "swapped out" + completely + even the kernel stack + and the page tables + (most of the pmap structures are destroyed, some are retained) + that might very well be true... at least inactive processes often + show up with 0 memory use in top on Hurd + this is done by having a pageable kernel map, with wired entries + when the swapper thread swaps tasks out, it unwires them + but i think modern implementations don't do that any more + well, I was talking about zalloc only :-) + oh + so the zalloc_map must be pageable + or there are two submaps ?
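The submap idea from earlier in this exchange (a map with a start address and a size, carved out of a reserved region of the parent, so that allocations come from, say, 3.1-3.2 GiB instead of 3-4 GiB) can be illustrated with a deliberately simplified C sketch. It is not the Mach `vm_map` code: real maps track `vm_map_entry` lists, whereas this toy uses a bump pointer, and the names `toy_map`, `toy_map_alloc` and `toy_suballoc` are hypothetical. It also shows the static limit problem: the submap can run out while the parent still has room.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy address-space map: a range plus a bump pointer (assumption: the
 * real vm_map keeps a list of entries instead). */
struct toy_map {
    uintptr_t min;  /* first address of the range */
    uintptr_t max;  /* end of the range (exclusive) */
    uintptr_t next; /* next unallocated address */
};

/* Reserve size bytes of virtual address space. Only addresses are
 * handed out, no memory is touched. Returns 0 when the map is
 * exhausted (so min must not be 0 in this sketch). */
static uintptr_t toy_map_alloc(struct toy_map *map, size_t size)
{
    uintptr_t addr;

    if (size > map->max - map->next)
        return 0;

    addr = map->next;
    map->next += size;
    return addr;
}

/* Loosely modelled on the role of kmem_suballoc(): reserve a range in
 * the parent and describe it with a new, initially empty map. */
static int toy_suballoc(struct toy_map *parent, struct toy_map *sub,
                        size_t size)
{
    uintptr_t start = toy_map_alloc(parent, size);

    if (start == 0)
        return -1;

    sub->min = start;
    sub->max = start + size;
    sub->next = start;
    return 0;
}
```

A 12 MiB submap created this way refuses a second 8 MiB request even though the parent map could still satisfy it, which is the fixed-size issue discussed in the log.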
+ not sure whether "modern implementations" includes Linux ;-) + no, i'm talking about the bsd family only + but it's certainly true that on Linux even inactive processes + retain some memory + linux doesn't make any difference between processor-bound and + I/O-bound processes + braunr: I have no idea how it works. I just remember that when + creating zones, one of the optional flags decides whether the zone is + pageable. but as I said, IIRC there is exactly one that actually is... + zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max, + zone_map_size, FALSE); + kmem_suballoc(parent, min, max, size, pageable) + so the zone_map isn't + IIRC my conclusion was that pageable zones do not count in the + fixed zone map limit... but I'm not sure anymore + zinit() has a memtype parameter + with ZONE_PAGEABLE as a possible flag + this is weird :) + There are no zones which use ZONE_PAGEABLE flag + mcsim: are you sure? I think I found one... + if (zone->type & ZONE_PAGEABLE) { + admittedly, it is several years ago that I looked into this, so my + memory is rather dim... + if (kmem_alloc_pageable(zone_map, &addr, ... + calling kmem_alloc_pageable() on an unpageable submap seems wrong + I've grepped gnumach code and there is no zinit procedure call + with ZONE_PAGEABLE flag + good + hm... perhaps it was in some code that has been removed + altogether since ;-) + actually I think it would be pretty neat to have pageable kernel + objects... but I guess it would require considerable effort to implement + this right + mcsim: you also mentioned absence of caching + mcsim: the zone allocator actually is a bare caching object + allocator + antrik: no, it's easy + antrik: i already had that in x15 0.1 + antrik: the problem is being sure the objects you allocate from a + pageable backing store are never used when resolving a page fault + that's all + I wouldn't expect that to be easy... but surely you know better + :-) + braunr: indeed. I was wrong.
+ braunr: what is a caching object allocator?... + antrik: ok, it's not easy + antrik: but once you have vm_objects implemented, having pageable + kernel object is just a matter of using the right options, really + antrik: an allocator that caches its buffers + some years ago, the term "object" would also apply to + preconstructed buffers + I have no idea what you mean by "caches its buffers" here :-) + well, a memory allocator which doesn't immediately free its + buffers caches them + braunr: but can it return objects to system? + mcsim: which one ? + yeah, obviously the *implementation* of pageable kernel objects is + not hard. the tricky part is deciding which objects can be pageable, and + which need to be wired... + Can zone allocator return cached objects to system as in slab? + I mean reap() + well yes, it does so, and it does that too often + the caching in the zone allocator is actually limited to the + pagesize + once page is completely free, it is returned to the vm + this is bad caching + yes + if an object takes a whole page then there is no caching at all + caching by side effect + true + but the linux slab allocator does the same thing :p + hm + no, the solaris slab allocator does so + linux's slab returns objects only when system ask + without preconstructed objects, is there actually any point in + caching empty slabs?... + Once I've changed my allocator to slab and it cached more than 1GB + of my memory) + ok wait, need to fix a few mistakes first + s/ask/asks + the zone allocator (in gnumach) actually has a garbage collector + braunr: well, the Solaris allocator follows the slab/magazine + paper, right? so there is caching at the magazine layer... in that case + caching empty slabs too would be rather redundant I'd say...
+ which is called when running low on memory, similar to the slab + allocator + antrik: yes + (or rather the paper follows the Solaris allocator ;-) ) + mcsim: the zone allocator reap() is zone_gc() + braunr: hm, right, there is a "collectable" flag for zones... but + I never understood what it means + braunr: BTW, I heard Linux has yet another allocator now called + "slob"... do you happen to know what that is? + slob is a very simple allocator for embedded devices + AFAIR this is just heap allocator + useful when you have a very low amount of memory + like 1 MiB + yes + just googled it :-) + zone and slab are very similar + sounds like a simple heap allocator + there is another allocator that is called slub, and it is better than slab + in many cases + the main difference is the data structures used to store slabs + mcsim: i disagree + mcsim: ah, you already said that :-) + mcsim: slub is better for systems with very large amounts of + memory and processors + otherwise, slab is better + in addition, there are accounting issues with slub + because of cache merging + ok. This is strange that slub is default allocator + well both are very good + iirc, linus stated that he really doesn't care as long as it + works fine + he refused slqb because of that + slub is nice because it requires less memory than slab, while + still being as fast for most cases + it gets slower on the free path, when the cpu performing the free + is different from the one which allocated the object + that's a reasonable cost + slub uses heap for large object. Are there any tests that compare + what is better for large objects? + well, if slub requires less memory, why do you think slab is + better for smaller systems? :-) + antrik: smaller is relative + mcsim: for large objects slab allocation is rather pointless, as + you don't have multiple objects in a page anyways...
+ antrik: when lameter wrote slub, it was intended for systems with + several hundred processors + BTW, was slqb really refused only because the other ones are "good + enough"?... + yes + wow, that's a strange argument... + linus is already unhappy of having "so many" allocators + well, if the new one is better, it could replace one of the others + :-) + or is it useful only in certain cases? + that's the problem + nobody really knows + hm, OK... I guess that should be tested *before* merging ;-) + is anyone still working on it, or was it abandoned? + mcsim: back to caching... + what does caching in the kernel object allocator got to do with + readahead (i.e. clustered paging)?... + if we cached some physical pages we don't need to find new ones for + allocating new object. And that's why there will not be a page fault. + antrik: Regarding kam. Hasn't he finished his project? + err... what? + one of us must be seriously confused + I totally fail to see what caching of physical pages (which isn't + even really a correct description of what slab does) has to do with page + faults + right, KAM didn't finish his project + If we free the physical page and return it to system we need + another one for next allocation. But if we keep it, we don't need to find + new physical page. + And physical page is allocated only then when page fault + occurs. Probably, I'm wrong + what does "return to system" mean? we are talking about the + kernel... + zalloc/slab are about allocating kernel objects. this doesn't have + *anything* to do with paging of userspace processes + the only thing they have in common is that they need to get pages from + the physical page allocator. but that's yet another topic + Under "return to system" I mean ability to use this page for other + needs.
+ mcsim: consider kernel memory to be wired + here, return to system means releasing a page back to the vm + system + the vm_kmem module then unmaps the physical page and free its + virtual address in the kernel map + ok + antrik: the problem with new allocators like slqb is that it's + very difficult to really know if they're better, even with extensive + testing + antrik: there are papers (like wilson95) about the difficulties in + making valuable results in this field + see + http://www.sceen.net/~rbraun/dynamic_storage_allocation_a_survey_and_critical_review.pdf + how can be allocated physically continuous object now? + mcsim: rephrase please + what is similar to kmalloc in Linux to gnumach? + i know memory is reserved for dma in a direct virtual to physical + mapping + so even if the allocation is done similarly to vmalloc() + the selected region of virtual space maps physical memory, so + memory is physically contiguous too + for other allocation types, a block large enough is allocated, so + it's contiguous too + I don't clearly understand. If we have fragmentation in physical + ram, so there aren't 2 free pages in a row, but there are able apart, we + can't to allocate these 2 pages along? + no + but every system has this problem + But since we have only 12 or 32 MB of memory the problem becomes + more significant + you're confusing virtual and physical memory + those 32 MiB are virtual + the physical pages backing them don't have to be contiguous + Oh, indeed + So the only problem are limits? 
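The point that a virtually contiguous kernel allocation does not need contiguous physical pages can be made concrete with a small C model. This is only a sketch under loud assumptions: eight page frames, eight virtual pages, a linear search for both, and hypothetical names (`toy_vm`, `toy_frame_alloc`, `toy_kmem_alloc`) that do not correspond to gnumach functions. Each virtual page is backed by whichever single frame is free, one at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NFRAMES 8     /* physical page frames */
#define TOY_NVPAGES 8     /* virtual pages in the toy kernel map */
#define TOY_UNMAPPED (-1)

struct toy_vm {
    bool frame_used[TOY_NFRAMES];
    int frame_of[TOY_NVPAGES]; /* virtual page -> frame, or TOY_UNMAPPED */
};

static void toy_vm_init(struct toy_vm *vm)
{
    for (int i = 0; i < TOY_NFRAMES; i++)
        vm->frame_used[i] = false;
    for (int i = 0; i < TOY_NVPAGES; i++)
        vm->frame_of[i] = TOY_UNMAPPED;
}

/* Grab any free physical frame: a single-page (order-0) allocation. */
static int toy_frame_alloc(struct toy_vm *vm)
{
    for (int i = 0; i < TOY_NFRAMES; i++) {
        if (!vm->frame_used[i]) {
            vm->frame_used[i] = true;
            return i;
        }
    }
    return -1;
}

/* Find n contiguous unmapped virtual pages and back each of them with
 * one frame. Returns the first virtual page index, or -1 on failure. */
static int toy_kmem_alloc(struct toy_vm *vm, int n)
{
    for (int start = 0; start + n <= TOY_NVPAGES; start++) {
        int run = 0;

        while (run < n && vm->frame_of[start + run] == TOY_UNMAPPED)
            run++;
        if (run < n)
            continue;

        for (int i = 0; i < n; i++) {
            int f = toy_frame_alloc(vm);

            if (f < 0) {
                /* Out of frames: undo the pages mapped so far. */
                while (i-- > 0) {
                    vm->frame_used[vm->frame_of[start + i]] = false;
                    vm->frame_of[start + i] = TOY_UNMAPPED;
                }
                return -1;
            }
            vm->frame_of[start + i] = f;
        }
        return start;
    }
    return -1;
}
```

If only frames 1, 4 and 6 are free, a three-page request still succeeds: virtual pages 0 to 2 end up mapped to those scattered frames.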
+ and performance + and correctness + i find the zone allocator badly written + antrik: mcsim: here is the content of the kernel pmap on NetBSD + (which uses a virtual memory system close to the Mach VM) + antrik: mcsim: http://www.sceen.net/~rbraun/pmap.out + +[[pmap.out]] + + you can see the kmem_map (which is used for most general kernel + allocations) is 128 MiB large + actually it's not the kernel pmap, it's the kernel_map + braunr: why is it called pmap.out then? ;-) + antrik: because the tool is named pmap + for process map + it also exists under Linux, although direct access to + /proc/xx/maps gives more info + braunr: I've said that this is kernel_map. Can I see kernel_map for + Linux? + mcsim: I don't know how to do that + s/I've/You've + but Linux doesn't have submaps, and uses a direct virtual to + physical mapping, so it's used differently + how are things (such as zalloc zones) entered into kernel_map? + in zone_init() you have + zone_map = kmem_suballoc(kernel_map, &zone_min, &zone_max, + zone_map_size, FALSE); + so here, kmem_map is named zone_map + then, in zalloc() + kmem_alloc_wired(zone_map, &addr, zone->alloc_size) + so, kmem_alloc just deals out chunks of memory referenced directly + by the address, and without knowing anything about the use? + kmem_alloc() gives virtual pages + zalloc() carves them into buffers, as in the slab allocator + the difference is essentially the lack of formal "slab" object + which makes the zone code look like a mess + so kmem_suballoc() essentially just takes a bunch of pages from + the main kernel_map, and uses these to back another map which then in + turn deals out pages just like the main kernel_map? 
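The statement above that the backend hands out pages and `zalloc()` carves them into buffers can be sketched as a free-list allocator in C. This is an illustration under assumptions, not the gnumach zone code: `toy_page_alloc()` is a hypothetical stand-in for the `kmem_alloc_wired(zone_map, ...)` call quoted in the log (here it simply calls `malloc()`), and `elem_size` is assumed to be a multiple of the pointer size and at least `sizeof(struct toy_free_elem)`. As in the zone allocator, there is no per-page "slab" structure, only one list of free buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* A free buffer stores the link to the next free buffer in itself. */
struct toy_free_elem {
    struct toy_free_elem *next;
};

struct toy_zone {
    size_t elem_size;           /* multiple of sizeof(void *) */
    struct toy_free_elem *free; /* list of free buffers */
    size_t pages;               /* pages obtained from the backend */
};

/* Hypothetical backend standing in for kmem_alloc_wired(). */
static void *toy_page_alloc(void)
{
    return malloc(TOY_PAGE_SIZE);
}

/* Get one page and carve it into elem_size buffers. */
static int toy_zone_grow(struct toy_zone *z)
{
    char *page = toy_page_alloc();

    if (page == NULL)
        return -1;

    for (size_t off = 0; off + z->elem_size <= TOY_PAGE_SIZE;
         off += z->elem_size) {
        struct toy_free_elem *e = (struct toy_free_elem *)(page + off);

        e->next = z->free;
        z->free = e;
    }

    z->pages++;
    return 0;
}

static void *toy_zalloc(struct toy_zone *z)
{
    struct toy_free_elem *e;

    if (z->free == NULL && toy_zone_grow(z) != 0)
        return NULL;

    e = z->free;
    z->free = e->next;
    return e;
}

static void toy_zfree(struct toy_zone *z, void *obj)
{
    struct toy_free_elem *e = obj;

    e->next = z->free;
    z->free = e;
}
```

With 64-byte elements one page yields 64 buffers, so the 65th allocation forces a second page; a freed buffer goes to the head of the list and is handed out again first.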
+ no + kmem_suballoc creates a vm_map_entry object, and sets its start + and end address + and creates a vm_map object, which is then inserted in the new + entry + maybe that's what you meant with "essentially just takes a bunch + of pages from the main kernel_map" + but there really is no allocation at this point + except the map entry and the new map objects + well, I'm trying to understand how kmem_alloc() manages things. so + it has map_entry structures like the maps of userspace processes? do + these also reference actual memory objects? + kmem_alloc just allocates virtual pages from a vm_map, and backs + those with physical pages (unless the user requested pageable memory) + it's not "like the maps of userspace processes" + these are actually the same structures + a vm_map_entry can reference a memory object or a kernel submap + in netbsd, it can also reference nothing (for pure wired kernel + memory like the vm_page array) + maybe it's the same in mach, i don't remember exactly + antrik: this is actually very clear in vm/vm_kern.c + kmem_alloc() creates a new kernel object for the allocation + allocates a new entry (or uses a previous existing one if it can + be extended) through vm_map_find_entry() + then calls kmem_alloc_pages() to back it with wired memory + "creates a new kernel object" -- what kind of kernel object? + kmem_alloc_wired() does roughly the same thing, except it doesn't + need a new kernel object because it knows the new area won't be pageable + a simple vm_object + used as a container for anonymous memory in case the pages are + swapped out + vm_object is the same as memory object/pager? or yet something + different? + antrik: almost + antrik: a memory_object is the user view of a vm_object + as in the kernel/user interfaces used by external pagers + vm_object is a more internal name + Is fragmentation a big problem in slab allocator?
+ I've tested it on my computer in Linux and for some caches it + reached 30-40% + well, fragmentation is a major problem for any allocator... + the original slab allocator was designed specifically with the goal + of reducing fragmentation + the revised version with the addition of magazines takes a step + back on this though + have you compared it to slub? would be pretty interesting... + I have an idea how can it be decreased, but it will hurt by + performance... + antrik: no I haven't, but there will be might the same, I think + if each cache will handle two types of object: with sizes that will + fit cache sizes (or a bit smaller) and with sizes which are much smaller + than maximal cache size. For first type of object will be used standard + slab allocator and for latter type will be used (within page) heap + allocator. + I think that then fragmentation will be decreased + not at all. heap allocator has much worse fragmentation. that's + why slab allocator was invented + the problem is that in a long-running program (such as the + kernel), objects tend to have vastly varying lifespans + but we use heap only for objects of specified sizes + so often a few old objects will keep a whole page hostage + for example for 32 byte cache it could be 20-28 byte objects + that's particularly visible in programs such as firefox, which + will grow the heap during use even though actual needs don't change + the slab allocator groups objects in a fashion that makes it more + likely adjacent objects will be freed at similar times + well, that's pretty oversimplified, but I hope you get the + idea... it's about locality + I agree, but I speak not about general heap allocation. We have + many heaps for objects with different sizes. + Could it be better? + note that this has been a topic of considerable research.
you + shouldn't seek to improve the actual algorithms -- you would have to read + up on the existing research at least before you can contribute anything + to the field :-) + how would that be different from the slab allocator? + slab will allocate 32 byte for both 20 and 32 byte requests + And if there was request for 20 bytes we get 12 unused + oh, you mean the implementation of the generic allocator on top of + slabs? well, that might not be optimal... but it's not an often used case + anyways. mostly the kernel uses constant-sized objects, which get their + own caches with custom tailored size + I don't think the waste here matters at all + affirmative. So my idea is useless. + does the statistic you refer to show the fragmentation in absolute + sizes too? + Can you explain what is absolute size? + I've counted what were requested (as parameter of kmalloc) and what + was really allocated (according to best fit cache size). + how did you get that information? + I simply wrote a hook + I mean total. i.e. how many KiB or MiB are wasted due to + fragmentation altogether + ah, interesting. how does it work? + BTW, did you read the slab papers? + Do you mean articles from lwn.net? + no + I mean the papers from the Sun hackers who invented the slab + allocator(s) + Bonwick mostly IIRC + Yes + hm... then you really should know the rationale behind it... + There he says about 11% of memory waste + you didn't answer my other questions BTW :-) + I've corrupted kernel tree with patch, and tomorrow I'm going to + read myself up for exam (I have it on Thursday). But then I'll send you a + module which I've used for testing. + OK + I can send you module now, but it will not work without patch.
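The accounting described above (compare what was requested with what the best-fit cache really hands out, e.g. 12 bytes unused when 20 bytes come from a 32-byte cache) can be sketched in C. This is not the hook or module mentioned in the log: the power-of-two size classes and all names (`toy_classes`, `toy_account`, `toy_waste_percent`) are assumptions made for the illustration, and it reports both the absolute waste and an integer percentage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical general-purpose cache sizes (assumption: powers of two). */
static const size_t toy_classes[] = {
    8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096
};

/* Best-fit cache size for a request, or 0 if no class is big enough. */
static size_t toy_class_for(size_t request)
{
    for (size_t i = 0; i < sizeof(toy_classes) / sizeof(toy_classes[0]); i++) {
        if (request <= toy_classes[i])
            return toy_classes[i];
    }
    return 0;
}

struct toy_waste {
    size_t requested; /* bytes asked for by callers */
    size_t allocated; /* bytes actually handed out */
};

/* Record one request, as a measuring hook might do. */
static void toy_account(struct toy_waste *w, size_t request)
{
    size_t cls = toy_class_for(request);

    if (cls == 0)
        return; /* larger than any class: not tracked here */

    w->requested += request;
    w->allocated += cls;
}

/* Internal fragmentation as an integer percentage, rounded down. */
static unsigned toy_waste_percent(const struct toy_waste *w)
{
    if (w->allocated == 0)
        return 0;
    return (unsigned)((w->allocated - w->requested) * 100 / w->allocated);
}
```

For one 20-byte and one 32-byte request the sketch counts 52 bytes requested against 64 bytes allocated, i.e. 12 bytes of waste in absolute terms.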
+ It would be better to rewrite it using debugfs, but when I was + writing this test I didn't know about trace_* macros + + +# IRC, freenode, #hurd, 2011-04-15 + + There is a hack in zone_gc when it allocates and frees two + vm_map_kentry_zone elements to make sure the gc will be able to allocate + two in vm_map_delete. Isn't it better to allocate memory for these + entries statically? + mcsim: that's not the point of the hack + mcsim: the point of the hack is to make sure vm_map_delete will be + able to allocate stuff + allocating them statically will just work once + it may happen several times that vm_map_delete needs to allocate it + while it's empty (and thus zget_space has to get called, leading to a + hang) + funnily enough, the bug is also in macos X + it's still in my TODO list to manage to find how to submit the + issue to them + really ? + eh + is that because of map entry splitting ? + it's git commit efc3d9c47cd744c316a8521c9a29fa274b507d26 + braunr: iirc something like this, yes + netbsd has this issue too + possibly + i think it's a fundamental problem with the design + people think of munmap() as something similar to free() + whereas it's really unmap + with a BSD-like VM, unmap can easily end up splitting one entry in + two + but your issue is more about harmful recursion right ? + I don't remember actually + it's quite some time ago :) + ok + i think that's why i have "sources" in my slab allocator, the + default source (vm_kern) and a custom one for kernel map entries + + +# IRC, freenode, #hurd, 2011-04-18 + + braunr: you've said that once page is completely free, it is + returned to the vm. + who else, besides zone_gc, can return free pages to the vm? 
+ mcsim: i also said i was wrong about that + zone_gc is the only one + + +# IRC, freenode, #hurd, 2011-04-19 + + antrik: mcsim: i added back a new per-cpu layer as planned + + http://git.sceen.net/rbraun/libbraunr.git/?a=blob;f=mem.c;h=c629b2b9b149f118a30f0129bd8b7526b0302c22;hb=HEAD + mcsim: btw, in mem_cache_reap(), you can clearly see there are two + loops, just as in zone_gc, to reduce contention and avoid deadlocks + this is really common in memory allocators + + +# IRC, freenode, #hurd, 2011-04-23 + + I've looked through some allocators and all of them use different + per cpu cache policy. AFAIK gnuhurd doesn't support multiprocessing, but + still multiprocessing must be kept in mind. So, what do you think what + kind of cpu caches is better? As for me I like variant with only per-cpu + caches (like in slqb). + mcsim: well, have you looked at the allocator braunr wrote + himself? :-) + I'm not sure I suggested that explicitly to you; but probably it + makes most sense to use that in gnumach + + +# IRC, freenode, #hurd, 2011-04-24 + + antrik: Yes, I have. He uses both global and per cpu caches. But he + also suggested to look through slqb, where there are only per cpu + caches. + i don't remember slqb in detail + what do you mean by "only per-cpu caches" ? + a whole slab system for each cpu ? + I mean that there are no global queues in caches, but there are + special queues for each cpu. + I've just started investigating slqb's code, but I've read an + article on lwn about it. And I've read that it is used for zen kernel. + zen ?
+ Here is this article http://lwn.net/Articles/311502/ + Yes, this is linux kernel with some patches which haven't been + approved to torvald's tree + http://zen-kernel.org/ + i see + well it looks nice + but as for slub, the problem i can see is cross-CPU freeing + and I think nick piggins mentions it + piggin* + this means that sometimes, objects are "burst-free" from one cpu + cache to another + which has the same bad effects as in most other allocators, mainly + fragmentation + There is a special list for freeing object allocated for another + CPU + And garbage collector frees such object on his own + so what's your question ? + It is described in the end of article. + What cpu-cache policy do you think is better to implement? + at this point, any + and even if we had a kernel that perfectly supports + multiprocessor, I wouldn't care much now + it's very hard to evaluate such allocators + slqb looks nice, but if you have the same amount of fragmentation + per slab as other allocators do (which is likely), you have that amount of + fragmentation multiplied by the number of processors + whereas having shared queues limit the problem somehow + having shared queues mean you have a bit more contention + so, as is the case most of the time, it's a tradeoff + by the way, does pigging say why he "doesn't like" slub ? :) + piggin* + http://lwn.net/Articles/311093/ + here he describes what slqb is better. + well it doesn't describe why slub is worse + but not very particularly + except for order-0 allocations + and that's a form of fragmentation like i mentioned above + in mach those problems have very different impacts + the backend memory isn't physical, it's the kernel virtual space + so the kernel allocator can request chunks of higher than order-0 + pages + physical pages are allocated one at a time, then mapped in the + kernel space + Doesn't order of page depend on buffer size? + it does + And why does gnumach allocates higher than order-0 pages more? + why more ?
+ i didn't say more + And why in mach do those problems have very different impact? + ? + i've just explained why :) + 09:37 < braunr> physical pages are allocated one at a time, then + mapped in the kernel space + "one at a time" means order-0 pages, even if you allocate higher + than order-0 chunks + And in Linux they allocate more than one at a time because of + prefetching page reading? + do you understand what virtual memory is ? + linux allocators allocate "physical memory" + mach kernel allocator allocates "virtual memory" + so even if you allocate a big chunk of virtual memory, it's backed + by order-0 physical pages + yes, I understand this + you don't seem to :/ + the problem of higher than order-0 page allocations is + fragmentation + do you see why ? + yes + so + fragmentation in the kernel space is less likely to create issues + than it does in physical memory + keep in mind physical memory is almost always full because of the + page cache + and constantly under some pressure + whereas the kernel space is mostly empty + so allocating higher than order-0 pages in linux is more dangerous + than it is in Mach or BSD + ok + on the other hand, linux focuses on pure performance, and not having + to map memory means fewer operations, fewer tlb misses, quicker allocations + the Mach VM must map pages "one at a time", which can be expensive + it should be adapted to handle multiple page sizes (e.g. 2 MiB) so + that many allocations can be made with few mappings + but that's not easy + as always: tradeoffs + There are other benefits of physical allocation. Big DMA + transfers can need a few contiguous physical pages. How does mach + handle such cases? 
+ gnumach does that awfully + it just reserves the whole DMA-able memory and uses special + allocation functions on it, IIRC + but kernels which have a Mach VM-like memory system such as BSDs + have cleaner methods + NetBSD provides a function to allocate contiguous physical memory + with many constraints + FreeBSD uses a binary buddy system like Linux + the fact that the kernel allocator uses virtual memory doesn't + mean the kernel has no means to allocate contiguous physical memory ... + + +# IRC, freenode, #hurd, 2011-05-02 + + hm nice, my allocator uses less memory than glibc (squeeze + version) on both 32 and 64 bits systems + the new per-cpu layer is proving effective + braunr: Are you reimplementing malloc? + no + it's still the slab allocator for mach, but tested in userspace + so i wrote malloc wrappers + Oh. + i try to heavily test most of my code in userspace now + it's easier :-) + I agree + even the physical memory allocator has been implemented this way + is this your mach version? + virtual memory allocation will follow + or are you working on gnu mach? + for now it's my version + but i intend to spend the summer working on ipc port names + management + +[[rework_gnumach_IPC_spaces]]. + + and integrate the result in gnu mach + are you keeping the same user-space API? + Or are you experimenting with something new? + braunr: to be fair, it's not terribly hard to use less memory than + glibc :-) + yes + antrik: well ptmalloc3 received some nice improvements + neal: the goal is to rework some of the internals only + neal: namely, i simply intend to replace the splay tree with a + radix tree + braunr: the glibc allocator is emphasising performance, unlike some + other allocators that trade some performance for much better memory + utilisation... + ptmalloc3? + that's the allocator used in glibc + http://www.malloc.de/en/ + OK. haven't seen any recent numbers... the comparison I have in + mind is many years old... 
+ i also made some additions to my avl and red-black trees this week + end, which finally make them suitable for almost all generic uses + the red-black tree could be used in e.g. gnu mach to augment the + linked list used in vm maps + which is what's done in most modern systems + it could also be used to drop the overloaded (and probably over + imbalanced) page cache hash table + +[[gnumach_vm_map_red-black_trees]]. + + +# IRC, freenode, #hurd, 2011-05-03 + + antrik: How should I start porting? Do I just include rbraun's + allocator in gnumach and make it compile? + mcsim: well, basically yes I guess... but you will have to look at + the code in question first before we know anything more specific :-) + I guess braunr might know better how to start, but he doesn't + appear to be here :-( + mcsim: you can't just put my code into gnu mach and make it run, + it really requires a few careful changes + mcsim: you will have to analyse how the current zone allocator + interacts with regard to locking + if it is used in interrupt handlers + what kind of locks it should use instead of the pthread stuff + available in userspace + you will have to change the reclaiming policy, so that caches are + reaped on demand + (this basically boils down to calling the new reclaiming function + instead of zone_gc()) + you must be careful about types too + there is work to be done ;) + (not to mention the obvious about replacing all the calls to the + zone allocator, and testing/debugging afterwards) + + +# IRC, freenode, #hurd, 2011-07-14 + + can you make your patch available ? + it is available in gnumach repository at savannah + tree mplaneta/libbraunr/master + mcsim: i'll test your branch + ok. I'll give you a link in a minute + hm why balloc ? 
+ Braun's allocator + err + + http://git.sceen.net/rbraun/x15mach.git/?a=blob;f=kern/kmem.c;h=37173fa0b48fc9d7e177bf93de531819210159ab;hb=HEAD + mcsim: this is the interface i had in mind for a kernel version :) + very similar to the original slab allocator interface actually + well, you've been working + But I have a problem with this patch. When I apply it to gnumach + code from debian repository. I have to make a change in file ramdisk.c + with sed -i 's/kernel_map/\&kernel_map/' device/ramdisk.c + because in git repository there is no such file + mcsim: how do you configure the kernel before building ? + mcsim: you should keep in touch more often i think, so that you + get feedback from us and don't spend too much time "off course" + I didn't configure it. I just run dpkg-buildsource -b. + oh you build the debian package + well my version was by configure --enable-kdb --enable-rtl8139 + and it seems stuck in an infinite loop during bootstrap + and printf doesn't work. The first function called by c_boot_entry + is printf(version). + mcsim: also, you're invited to get the x15mach version of my + files, which are gplv2+ licensed + be careful of my macros.h file, it can conflict with the + macros_help.h file from gnumach iirc + There were conflicts with MACRO_BEGIN and MACRO_END. But I solved + it + ok + it's tricky + mcsim: try to find where the first use of the allocator is made + + +# IRC, freenode, #hurd, 2011-07-22 + + braunr, hello. Kernel with your allocator already compiles and + runs. There still some problems, but, certainly, I'm on the final stage + already. I hope I'll finish in a few days. + mcsim: Oh, cool! Have you done some measurements already? + Not yet + OK. + But if it able to run a GNU/Hurd system, then that already is + something, a big milestone! + nice + although you'll probably need to tweak the garbage collecting + process + tschwinge: thanks + braunr: As back-end for allocating memory I use + kmem_alloc_wired. 
But in zalloc there was an option to use + kmem_alloc_pageable as the back-end. Although there wasn't any zone that used + kmem_alloc_pageable. Do I need to implement this functionality? + mcsim: do *not* use kmem_alloc_pageable() + braunr: Ok. This is even better) + mcsim: in x15, i've taken this even further: there is *no* kernel + vm object, which means all kernel memory is wired and unmanaged + making it fast and safe + pageable kernel memory was useful back when RAM was really scarce + 20 years ago + but it's a source of deadlock + Indeed. I won't use kmem_alloc_pageable. + + +# IRC, freenode, #hurd, 2011-08-09 + + < braunr> mcsim: what's the "bug related to MEM_CF_VERIFY" you refer to in + one of your commits ? + < braunr> mcsim: don't use spin_lock_t as a member of another structure + < mcsim> braunr: I mixed up the types in *_verify functions, so they + didn't work. Then I fixed it in the commit you mentioned. + < braunr> in gnumach, most types are actually structure pointers + < braunr> use simple_lock_data_t + < braunr> mcsim: ok + < mcsim> > use simple_lock_data_t + < mcsim> braunr: ok + < braunr> mcsim: don't make too many changes to the code base, and if + you're unsure, don't hesitate to ask + < braunr> also, i really insist you rename the allocator, as done in x15 + for example + (http://git.sceen.net/rbraun/x15mach.git/?a=blob;f=vm/kmem.c), instead of + a name based on mine :/ + < mcsim> braunr: Ok. It was just a working name. When I finish I'll rename the + allocator. + < braunr> other than that, it's nice to see progress + < braunr> although again, it would be better with some reports along + < braunr> i won't be present at the meeting tomorrow unfortunately, but you + should use those to report the status of your work + < mcsim> braunr: You've said that I have to tweak gc process. Did you mean + to call mem_gc() when physical memory ends instead of calling it every x + seconds? Or something else? 
+ < braunr> there are multiple topics, although only one that really matters + < braunr> study how zone_gc was called + < braunr> reclaiming memory should happen when there is pressure on the VM + subsystem + < braunr> but it shouldn't happen too often, otherwise there is thrashing + < braunr> and your caches become mostly useless + < braunr> the original slab allocator uses a 15-second period after a + reclaim during which reclaiming has no effect + < braunr> this allows having a somewhat stable working set for this duration + < braunr> the linux slab allocator uses 5 seconds, but has a more + complicated reclaiming mechanism + < braunr> it releases memory gradually, and from reclaimable caches only + (dentry for example) + < braunr> for x15 i intend to implement the original 15 second interval and + then perform full reclaims + < mcsim> In zalloc mem_gc is called by vm_pageout_scan, but not more often than + once a second. + < mcsim> In balloc I've changed interval to once in 15 seconds. + < braunr> don't use the code as it is + < braunr> the version you've based your work on was meant for userspace + < braunr> where there isn't memory pressure + < braunr> so a timer is used to trigger reclaims at regular intervals + < braunr> it's different in a kernel + < braunr> mcsim: where did you see vm_pageout_scan call the zone gc once a + second ? + < mcsim> vm_pageout_scan calls consider_zone_gc and consider_zone_gc checks + if a second has passed. + < braunr> where ? + < mcsim> Then zone_gc can be called. + < braunr> ah ok, it's in zaclloc.c then + < braunr> zalloc.c + < braunr> yes this function is fine + < mcsim> so old gc didn't consider vm pressure. Or I missed something. + < braunr> it did + < mcsim> how? + < braunr> well, it's called by the pageout daemon + < braunr> under memory pressure + < braunr> so it's fine + < mcsim> so if mem_gc is called by pageout daemon is it fine? 
+ < braunr> it must be changed to do something similar to what + consider_zone_gc does + < mcsim> It does. mem_gc does the same work as consider_zone_gc and + zone_gc. + < braunr> good + < mcsim> so gc process is fine? + < braunr> should be + < braunr> i see mem.c only includes mem.h, which then includes other + headers + < braunr> don't do that + < braunr> always include all the headers you need where you need them + < braunr> if you need avltree.h in both mem.c and mem.h, include it in both + files + < braunr> and by the way, i recommend you use the red black tree instead of + the avl type + < braunr> (it's the same interface so it shouldn't take long) + < mcsim> As to report. If you won't be present at the meeting, I can tell + you what I have to do now. + < braunr> sure + < braunr> in addition, use GPLv2 as the license, teh BSD one is meant for + the userspace version only + < braunr> GPLv2+ actually + < braunr> hm you don't need list.c + < braunr> it would only add dead code + < braunr> "Zone for dynamical allocator", don't mix terms + < braunr> this comment refers to a vm_map, so call it a map + < mcsim> 1. Change constructor for kentry_alloc_cache. + < mcsim> 2. Make measurements. + < mcsim> + + < mcsim> 3. Use simple_lock_data_t + < mcsim> 4. Replace license + < braunr> kentry_alloc_cache <= what is that ? + < braunr> cache for kernel map entries in vm_map ? + < braunr> the comment for mem_cpu_pool_get doesn't apply in gnumach, as + there is no kernel preemption + +[[microkernel/mach/gnumach/preemption]]. + + < braunr> "Don't attempt mem GC more frequently than hz/MEM_GC_INTERVAL + times a second. + < braunr> " + < mcsim> sorry. I meant vm_map_kentry_cache + < braunr> hm nothing actually about this comment + < braunr> mcsim: ok + < braunr> yes kernel map entries need special handling + < braunr> i don't know how it's done in gnumach though + < braunr> static preallocation ? 
+ < mcsim> yes + < braunr> that's ugly :p + < mcsim> but it uses dynamic allocation further even for vm_map kernel + entries + < braunr> although such bootstrapping issues are generally difficult to + solve elegantly + < braunr> ah + < mcsim> now I use only static allocation, but I'll add dynamic allocation + too + < braunr> when you have time, mind the coding style (convert everything to + gnumach style, which mostly implies using tabs instead of 4-spaces + indentation) + < braunr> when you'll work on dynamic allocation for the kernel map + entries, you may want to review how it's done in x15 + < braunr> the mem_source type was originally intended for that purpose, but + has slightly changed once the allocator was adapted to work in my kernel + < mcsim> ok + < braunr> vm_map_kentry_zone is the only zone created with ZONE_FIXED + < braunr> and it is zcram()'ed immediately after + < braunr> so you can consider it a statically allocated zone + < braunr> in x15 i use another strategy: there is a special kernel submap + named kentry_map which contains only one map entry (statically allocated) + < braunr> this map is the backend (mem_source) for the kentry_cache + < braunr> the kentry_cache is created with a special flag that tells it + memory can't be reclaimed + < braunr> when the cache needs to grow, the single map entry is extended to + cover the allocated memory + < braunr> it's similar to the way pmap_growkernel() works for kernel page + table pages + < braunr> (and is actually based on that idea) + < braunr> it's a compromise between full static and dynamic allocation + types + < braunr> the advantage is that the allocator code can be used (so there is + no need for a special allocator like in netbsd) + < braunr> the drawback is that some resources can never be returned to + their source (and under peaks, the amount of unfreeable resources could + become large, but this is unexpected) + < braunr> mcsim: for now you shouldn't waste your time with this + < braunr> 
i see the number of kernel map entries is fixed at 256 + < braunr> and i've never seen the kernel use more than around 30 entries + < mcsim> Do you think that I have to leave this problem to the end? + < braunr> yes + + +# IRC, freenode, #hurd, 2011-08-11 + + < mcsim> braunr: Hello. Can you give me some advice on how I can make + measurements better? + < braunr> mcsim: what kind of measurements + < mcsim> braunr: How much is your allocator better than zalloc. + < braunr> slightly :p + < braunr> that's why i never took the time to put it in gnumach + < mcsim> braunr: I just thought that there are some rules or + recommendations for such measurements. Or can I do them any way I want? + < braunr> mcsim: i don't know + < braunr> mcsim: benchmarking is an art of its own, and i don't even know + how to use the bits of profiling code available in gnumach (if it still + works) + < antrik> mcsim: hm... are you saying you already have a running system + with slab allocator?... :-) + < braunr> mcsim: the main advantage i can see is the removal of many + arbitrary hard limits + < mcsim> antrik: yes + < antrik> \o/ + < antrik> nice work! + < braunr> :) + < braunr> the cpu layer should also help a bit, but it's hard to measure + < braunr> i guess it could be seen on the ipc path for very small buffers + < mcsim> antrik: Thanks. But I still have to 1. Change constructor for + kentry_alloc_cache. and 2. Make measurements. + < braunr> and polish the whole thing :p + < antrik> mcsim: I'm not sure this can be measured... the performance + difference in any real life usage is probably just a few percent at most + -- it's hard to construct a benchmark giving enough precision so it's not + drowned in noise... + < antrik> perhaps it conserves some memory -- but that too would be hard to + measure I fear + < braunr> yes + < braunr> there *should* be better allocation times, less fragmentation, + better accounting ... :) + < braunr> and no arbitrary limits ! 
+ < antrik> :-) + < braunr> oh, and the self debugging features can be nice too + < mcsim> But I need to prove that my work wasn't useless + < braunr> well it wasn't, but that's hard to measure + < braunr> it's easy to prove though, since there are additional features + that weren't present in the zone allocator + < mcsim> Ok. If there are some profiling features in gnumach can you give + me a link with their description? + < braunr> mcsim: sorry, no + < braunr> mcsim: you could still write the basic loop test, which counts + the number of allocations performed in a fixed time interval + < braunr> but as it doesn't match many real life patterns, it won't be very + useful + < braunr> and i'm afraid that if you consider real life patterns, you'll + see how negligible the improvement can be compared to other operations + such as memory copies or I/O (ouch) + < mcsim> Do network drivers use this allocator? + < mcsim> ok. I'll scrape up some test and then I'll report results. + + +# IRC, freenode, #hurd, 2011-08-26 + + < mcsim> hello. Are there any analogs of copy_to_user and copy_from_user in + linux for gnumach? + < mcsim> Or how can I determine memory map if I know address? I need this + for vm_map_copyin + < guillem> mcsim: vm_map_lookup_entry? + < mcsim> guillem: but I need to pass a map to this function and it will + return an entry which contains specified address. + < mcsim> And I don't know which map I have to pass. + < mcsim> I need to transfer static array from kernel to user. What map + contains static data? + < antrik> mcsim: Mach doesn't have copy_{from,to}_user -- instead, large + chunks of data are transferred as out-of-line data in IPC messages + (i.e. using VM magic) + < mcsim> antrik: can you give me an example? I just found using + vm_map_copyin in host_zone_info. + < antrik> no idea what vm_map_copyin is to be honest... 
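Editorial note: the "basic loop test" suggested in the 2011-08-11 discussion (count the number of allocations performed in a fixed time interval) could look roughly like the following when run in userspace, which is where the allocator was tested according to the log. This is only a sketch under stated assumptions: it measures the C library `malloc`/`free` rather than the kernel allocator, `count_allocs` is a hypothetical name, and the interval is CPU time as reported by `clock()`. As noted in the discussion, such a loop does not match many real life patterns.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Count how many malloc/free pairs of obj_size bytes complete in
 * roughly `seconds` of CPU time. The clock is only read once per
 * batch of 1024 pairs so that timing overhead stays small.
 */
static unsigned long count_allocs(size_t obj_size, double seconds)
{
    assert(obj_size > 0 && seconds > 0.0);

    clock_t deadline = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
    unsigned long count = 0;

    do {
        for (int i = 0; i < 1024; i++) {
            volatile unsigned char *p = malloc(obj_size);

            if (p == NULL)
                return count;  /* out of memory: report what was done */

            p[0] = 1;          /* touch the buffer so the pair is not elided */
            free((void *)p);
            count++;
        }
    } while (clock() < deadline);

    return count;
}
```

Calling `count_allocs(16, 1.0)` and `count_allocs(4096, 1.0)` against two allocators gives one number per object size to compare. The absolute values depend on the machine and should be read with the caveats about noise raised in the log.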
+ + +# IRC, freenode, #hurd, 2011-08-27 + + < braunr> mcsim: the primitives are named copyin/copyout, and they are used + for messages with inline data + < braunr> or copyinmsg/copyoutmsg + < braunr> vm_map_copyin/out should be used for chunks larger than a page + (or roughly a page) + < braunr> also, when writing to a task space, see which is better suited: + vm_map_copyout or vm_map_copy_overwrite + < mcsim> braunr: and what will be src_map for vm_map_copyin/out? + < braunr> the caller map + < braunr> which you can get with current_map() iirc + < mcsim> braunr: thank you + < braunr> be careful not to leak anything in the transferred buffers + < braunr> memset() to 0 if in doubt + < mcsim> braunr:ok + < braunr> antrik: vm_map_copyin() is roughly vm_read() + < antrik> braunr: what is it used for? + < braunr> antrik: 01:11 < antrik> mcsim: Mach doesn't have + copy_{from,to}_user -- instead, large chunks of data are transferred as + out-of-line data in IPC messages (i.e. using VM magic) + < braunr> antrik: that "VM magic" is partly implemented using vm_map_copy* + functions + < antrik> braunr: oh, you mean it doesn't actually copy data, but only page + table entries? if so, that's *not* really comparable to + copy_{from,to}_user()... + + +# IRC, freenode, #hurd, 2011-08-28 + + < braunr> antrik: the equivalent of copy_{from,to}_user are + copy{in,out}{,msg} + < braunr> antrik: but when the data size is about a page or more, it's + better not to copy, of course + < antrik> braunr: it's actually not clear at all that it's really better to + do VM magic than to copy... + + +# IRC, freenode, #hurd, 2011-08-29 + + < braunr> antrik: at least, that used to be the general idea, and with a + simpler VM i suspect it's still true + < braunr> mcsim: did you progress on your host_zone_info replacement ? 
+ < braunr> mcsim: i think you should stick to what the original + implementation did + < braunr> which is making an inline copy if caller provided enough space, + using kmem_alloc_pageable otherwise + < braunr> specify ipc_kernel_map if using kmem_alloc_pageable + < mcsim> braunr: yes. And it works. But I use kmem_alloc, not pageable. Is + it worse? + < mcsim> braunr: host_zone_info replacement is pushed to savannah + repository. + < braunr> mcsim: i'll have a look + < mcsim> braunr: I've pushed one more commit just now, which has attitude + to host_zone_info. + < braunr> mem_alloc_early_init should be renamed mem_bootstrap + < mcsim> ok + < braunr> mcsim: i don't understand your call to kmem_free + < mcsim> braunr: It shouldn't be there? + < braunr> why should it be there ? + < braunr> you're freeing what the copy object references + < braunr> it's strange that it even works + < braunr> also, you shouldn't pass infop directly as the copy object + < braunr> i guess you get a warning for that + < braunr> do what the original code does: use an intermediate copy object + and a cast + < mcsim> ok + < braunr> another error (without consequence but still, you should mind it) + < braunr> simple_lock(&mem_cache_list_lock); + < braunr> [...] 
+ < braunr> kr = kmem_alloc(ipc_kernel_map, &info, info_size); + < braunr> you can't hold simple locks while allocating memory + < braunr> read how the original implementation works around this + < mcsim> ok + < braunr> i guess host_zone_info assumes the zone list doesn't change much + while unlocked + < braunr> or that's it's rather unimportant since it's for debugging + < braunr> a strict snapshot isn't required + < braunr> list_for_each_entry(&mem_cache_list, cache, node) max_caches++; + < braunr> you should really use two separate lines for readability + < braunr> also, instead of counting each time, you could just maintain a + global counter + < braunr> mcsim: use strncpy instead of strcpy for the cache names + < braunr> not to avoid overflow but rather to clear the unused bytes at the + end of the buffer + < braunr> mcsim: about kmem_alloc vs kmem_alloc_pageable, it's a minor + issue + < braunr> you're handing off debugging data to a userspace application + < braunr> a rather dull reporting tool in most cases, which doesn't require + wired down memory + < braunr> so in order to better use available memory, pageable memory + should be used + < braunr> in the future i guess it could become a not-so-minor issue though + < mcsim> ok. I'll fix it + < braunr> mcsim: have you tried to run the kernel with MC_VERIFY always on + ? + < braunr> MEM_CF_VERIFY actually + < mcsim1> yes. + < braunr> oh + < braunr> nothing wrong + < braunr> ? + < mcsim1> it is always set + < braunr> ok + < braunr> ah, you set it in macros.h .. + < braunr> don't + < braunr> put it in mem.c if you want, or better, make it a compile-time + option + < braunr> macros.h is a tiny macro library, it shouldn't define such + unrelated options + < mcsim1> ok. + < braunr> mcsim1: did you try fault injection to make sure the checking + code actually works and how it behaves when an error occurs ? 
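Editorial note: the strncpy advice above relies on a property of the standard function: when the source string is shorter than the given length, strncpy fills the rest of the destination with null bytes, so no stale bytes from the buffer reach the receiver. A minimal sketch, assuming a hypothetical `cache_info` record and a hypothetical `CACHE_NAME_SIZE` (neither is taken from gnumach):

```c
#include <assert.h>
#include <string.h>

#define CACHE_NAME_SIZE 32  /* hypothetical size of the name field */

/* Hypothetical record handed to a userspace reporting tool. */
struct cache_info {
    char name[CACHE_NAME_SIZE];
    unsigned long obj_size;
};

static void cache_info_fill(struct cache_info *info, const char *name,
                            unsigned long obj_size)
{
    assert(info != NULL && name != NULL);

    /*
     * strncpy pads the destination with '\0' up to the given length,
     * which clears whatever was in the unused tail of the buffer.
     * The last byte is set explicitly because strncpy does not
     * terminate the string when the source is too long.
     */
    strncpy(info->name, name, sizeof(info->name) - 1);
    info->name[sizeof(info->name) - 1] = '\0';
    info->obj_size = obj_size;
}

/* Return 1 if every byte from the terminating null onwards is zero. */
static int cache_info_name_is_clean(const struct cache_info *info)
{
    size_t len = strlen(info->name);

    for (size_t i = len; i < sizeof(info->name); i++) {
        if (info->name[i] != '\0')
            return 0;
    }
    return 1;
}
```

The same goal can be reached by clearing the whole record with memset() before filling it, which also covers padding between structure members.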
+ < mcsim1> I think that when I finish I'll merge files cpu.h and macros.h + with mem.c + < braunr> yes that would simplify things + < mcsim1> Yes. When I confused with types mem_buf_fill worked wrong and + panic occurred. + < braunr> very good + < braunr> have you progressed concerning the measurements you wanted to do + ? + < mcsim1> not much. + < braunr> ok + < mcsim1> I think they will be ready in a few days. + < antrik> what measurements are these? + < mcsim1> braunr: What maximal size for static data and stack in kernel? + < braunr> what do you mean ? + < braunr> kernel stacks are one page if i'm right + < braunr> static data (rodata+data+bss) are limited by grub bugs only :) + < mcsim1> braunr: probably they are present, because when I created too big + array I couldn't boot kernel + < braunr> local variable or static ? + < mcsim1> static + < braunr> how large ? + < mcsim1> 4Mb + < braunr> hm + < braunr> it's not a grub bug then + < braunr> i was able to embed as much as 32 MiB in x15 while doing this + kind of tests + < braunr> I guess it's the gnu mach boot code which only preallocates one + page for the initial kernel mapping + < braunr> one PTP (page table page) maps 4 MiB + < braunr> (x15 does this completely dynamically, unlike mach or even + current BSDs) + < mcsim1> antrik: First I want to measure time of each cache + creation/allocation/deallocation and then compile kernel. 
+ < braunr> cache creation is irrelevant + < braunr> because of the cpu pools in the new allocator, you should test at + least two different allocation patterns + < braunr> one with quick allocs/frees + < braunr> the other with large numbers of allocs then their matching frees + < braunr> (larger being at least 100) + < braunr> i'd say the cpu pool layer is the real advantage over the + previous zone allocator + < braunr> (from a performance perspective) + < mcsim1> But there is only one cpu + < braunr> it doesn't matter + < braunr> it's stil a very effective cache + < braunr> in addition to reducing contention + < braunr> compare mem_cpu_pool_pop() against mem_cache_alloc_from_slab() + < braunr> mcsim1: work is needed to polish the whole thing, but getting it + actually working is a nice achievement for someone new on the project + < braunr> i hope it helped you learn about memory allocation, virtual + memory, gnu mach and the hurd in general :) + < antrik> indeed :-) + + +# IRC, freenode, #hurd, 2011-09-06 + + [some performance testing] + i'm not sure such long tests are relevant but let's assume balloc + is slower + some tuning is needed here + first, we can see that slab allocation occurs more often in balloc + than page allocation does in zalloc + so yes, as slab allocation is slower (have you measured which part + actually is slow ? i guess it's the kmem_alloc call) + the whole process gets a bit slower too + I used alloc_size = 4096 for zalloc + i don't know what that is exactly + but you can't hold 500 16 bytes buffers in a page so zalloc must + have had free pages around for that + I use kmem_alloc_wired + if you have time, measure it, so that we know how much it accounts + for + where are the results for dealloc ? + I can't give you result right now because internet works very + bad. 
But for first DEALLOC result are the same, exept some cases when it + takes balloc for more than 1000 ticks + must be the transfer from the cpu layer to the slab layer + as to kmem_alloc_wired. I think zalloc uses this function too for + allocating objects in zone I test. + mcsim: yes, but less frequently, which is why it's faster + mcsim: another very important aspect that should be measured is + memory consumption, have you looked into that ? + I think that I made too little iterations in test SMALL + If I increase constant SMALL_TESTS will it be good enough? + mcsim: i don't know, try both :) + if you increase the number of iterations, balloc average time will + be lower than zalloc, but this doesn't remove the first long + initialization step on the allocated slab + SMALL_TESTS to 500, I mean + i wonder if maintaining the slabs sorted through insertion sort is + what makes it slow + braunr: where do you sort slabs? I don't see this. + mcsim: mem_cache_alloc_from_slab and its free counterpart + mcsim: the mem_source stuff is useless in gnumach, you can remove + it and directly call the kmem_alloc/free functions + But I have to make special allocator for kernel map entries. + ah right + btw. It turned out that 256 entries are not enough. + that's weird + i'll make a patch so that the mem_source code looks more like what + i have in x15 then + about the results, i don't think the slab layer is that slow + it's the cpu_pool_fill/drain functions that take time + they preallocate many objects (64 for your objects size if i'm + right) at once + mcsim: look at the first result page: some times, a number around + 8000 is printed + the common time (ticks, whatever) for a single object is 120 + 8132/120 is 67, close enough to the 64 value + I forgot about SMALL tests here are they: + http://paste.debian.net/128533/ (balloc) http://paste.debian.net/128534/ + (zalloc) + braunr: why do you divide 8132 by 120? 
+ mcsim: to see if it matches my assumption that the ~8000 number + matches the cpu_pool_fill call + braunr: I've got it + mcsim: i'd be much interested in the dealloc results if you can + paste them too + dealloc: http://paste.debian.net/128589/ + http://paste.debian.net/128590/ + mcsim: thanks + second dealloc: http://paste.debian.net/128591/ + http://paste.debian.net/128592/ + mcsim: so the main conclusion i retain from your tests is that the + transfers from the cpu and the slab layers are what makes the new + allocator a bit slower + OPERATION_SMALL dealloc: http://paste.debian.net/128593/ + http://paste.debian.net/128594/ + mcsim: what needs to be measured now is global memory usage + braunr: data from /proc/vmstat after kernel compilation will be + enough? + mcsim: let me check + mcsim: no it won't do, you need to measure kernel memory usage + the best moment to measure it is right after zone_gc is called + Are there any facilities in gnumach for memory measurement? + it's specific to the allocators + just count the number of used pages + after garbage collection, there should be no free page, so this + should be rather simple + ok + braunr: When I measure memory usage in balloc, what formula is + better cache->nr_slabs * cache->bufs_per_slab * cache->buf_size or + cache->nr_slabs * cache->slab_size? 
+ the latter + + +# IRC, freenode, #hurd, 2011-09-07 + + braunr: I've disabled calling of mem_cpu_pool_fill and allocator + became faster + mcsim: sounds nice + mcsim: i suspect the free path might not be as fast though + results for first calling: http://paste.debian.net/128639/ second: + http://paste.debian.net/128640/ and with many alloc/free: + http://paste.debian.net/128641/ + mcsim: thanks + best result are for second call: average time decreased from 159.56 + to 118.756 + First call slightly worse, but this is because I've added some + profiling code + i still see some ~8k lines in 128639 + even some around ~12k + I think this is because of mem_cache_grow I'm investigating it now + i guess so too + I've measured time for first call in cache and from about 22000 + mem_cache_grow takes 20000 + how did you change the code so that it doesn't call + mem_cpu_pool_fill ? + is the cpu layer still used ? + http://paste.debian.net/128644/ + don't forget the free path + mcsim: anyway, even with the previous slightly slower behaviour we + could observe, the performance hit is negligible + Is free path a compilation? (I'm sorry for my english) + mcsim: mem_cache_free + mcsim: the last two measurements i'd advise are with big (>4k) + object sizes and, really, kernel allocator consumption + http://paste.debian.net/128648/ http://paste.debian.net/128646/ + http://paste.debian.net/128649/ (first, second, small) + mcsim: these numbers are closer to the zalloc ones, aren't they ? + deallocating slighty faster too + it may not be the case with larger objects, because of the use of + a tree + yes, they are closer + but then, i expect some space gains + the whole thing is about compromise + ok. I'll try to measure them today. 
Anyway I'll post result and you + could read them in the morning + at least, it shows that the zone allocator was actually quite good + i don't like how the code looks, there are various hacks here and + there, it lacks self inspection features, but it's quite good + and there was little room for true improvement in this area, like + i told you :) + (my allocator, like the current x15 dev branch, focuses on mp + machines) + mcsim: thanks again for these numbers + i wouldn't have had the courage to make the tests myself before + some time eh + braunr: hello. Look at the small_4096 results + http://paste.debian.net/128692/ (balloc) http://paste.debian.net/128693/ + (zalloc) + mcsim: wow, what's that ? :) + mcsim: you should really really include your test parameters in + the report + like object size, purpose, and other similar details + for balloc I specified only object_size = 4096 + for zalloc object_size = 4096, alloc_size = 4096, memtype = 0; + the results are weird + apart from the very strange numbers (e.g. 0 or 4429543648), none + is around 3k, which is the value matching a kmem_alloc call + happy to see balloc behaves quite good for this size too + s/good/well/ + Oh + here is significant only first 101 lines + I'm sorry + ok + what does the test do again ? 10 loops of 10 allocs/frees ? + yes + ok, so the only slowdown is at the beginning, when the slabs are + created + the two big numbers (31844 and 19548) are strange + on the other hand time of compilation is + balloc zalloc + 38m28.290s 38m58.400s + 38m38.240s 38m42.140s + 38m30.410s 38m52.920s + what are you compiling ? + gnumach kernel + in 40 mins ? + yes + you lack hvm i guess + is it long? + I use real PC + very + ok + so it's normal + in vm it was about 2 hours) + the difference really is negligible + ok i can explain the big numbers + the slab size depends on the object size, and for 4k, it is 32k + you can store 8 4k buffers in a slab (lines 2 to 9) + so we need use kmem_alloc_* 8 times? 
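Editorial note: the slab geometry quoted just above (a 32k slab holding eight 4k buffers) can be checked with a small sketch. This is only an illustration under stated assumptions: the helper names are hypothetical, and the slab descriptor and any cache colouring are assumed to live outside the slab, so the whole slab is available for buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Number of buffers that fit in one slab (descriptor assumed off-slab). */
static size_t bufs_per_slab(size_t slab_size, size_t buf_size)
{
    assert(buf_size != 0 && buf_size <= slab_size);
    return slab_size / buf_size;
}

/* Slabs required to hold nr_objects live objects. */
static size_t slabs_needed(size_t nr_objects, size_t per_slab)
{
    assert(per_slab != 0);
    return (nr_objects + per_slab - 1) / per_slab;
}

/* Non-zero if the n-th allocation (1-based, no frees in between)
 * cannot be served from existing slabs and needs a fresh one. */
static int alloc_grows_cache(size_t n, size_t per_slab)
{
    assert(n >= 1 && per_slab != 0);
    return (n - 1) % per_slab == 0;
}
```

With `slab_size = 32768` and `buf_size = 4096`, the first and the ninth allocation in a row are the ones that need a new slab, which is one plausible reading of the two large timings in the paste.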
+ on line 10, the ninth object is allocated, which adds another slab + to the cache, hence the big number + no, once for a size of 32k + and then the free list is initialized, which means accessing those + pages, which means tlb misses + i guess the zone allocator already has free pages available + I see + i think you can stop performance measurements, they show the + allocator is slightly slower, but so slightly we don't care about that + we need numbers on memory usage now (at the page level) + and this isn't easy + For balloc I can get numbers if I summarize nr_slabs*slab_size for + each cache, isn't it? + yes + you can have a look at the original implementation, function + mem_info + And for zalloc I have to summarize of cur_size and then add + zalloc_wasted_space? + i don't know :/ + i think the best moment to obtain accurate values is after zone_gc + removes the collected pages + for both allocators, you could fill a stats structure at that + moment, and have an rpc copy that structure when a client tool requests + it + concerning your tests, there is another point to have in mind + the very first loop in your code shows a result of 31844 + although you disabled the call to cpu_pool_fill + but the reason why it's so long is that the cpu layer still exists + and if you look carefully, the cpu pools are created as needed on + the free path + I removed cpu_pool_drain + but not cpu_pool_push/pop i guess + http://paste.debian.net/128698/ + see, you still allocate the cpu pool array on the free path + but I don't fill it + that's not the point + it uses mem_cache_alloc + so in a call to free, you can also have an allocation, that can + potentially create a new slab + I see, so I have to create cpu_pool at the initialization stage? 
+ no, you can't + there is a reason why they're allocated on the free path + but since you don't have the fill/drain functions, i wonder if you + should just comment out the whole cpu layer code + but hmm + no really, it's not worth the effort + even with drains/fills, the results are really good enough + it makes the allocator smp ready + we should just keep it that way + mcsim: fyi, the reason why cpu pool arrays are allocated on the + free path is to avoid recursion + because cpu pool arrays are allocated from caches just as almost + everything else + ok + sum of cur_size and then adding zalloc_wasted_space gives 0x4e1954 + but this value isn't even page aligned + For balloc I've got 0x4c6000 0x4aa000 0x48d000 + hm can you report them in decimal, >> 10 so that values are in KiB + ? + 4888 4776 4660 for balloc + 4998 for zalloc + when ? + after boot ? + boot, compile, zone_gc + and then measure + ? + I call garbage collector before measuring + and I measure after kernel compilation + i thought it took you 40 minutes + for balloc I got results at night + oh so you already got them + i can't believe the kernel only consumes 5 MiB + before gc it takes about 9052 KiB + can i see the measurement code ? + oh, and how much ram does your machine have ? + 758 mb + 768 + that's really weird + i'd expect the kernel to consume much more space + http://paste.debian.net/128703/ + it's only dynamically allocated data + yes + ipc ports, rights, vm map entries, vm objects, and lots of other + hanging buffers + about how much is zalloc_wasted_space ? + if it's small or constant, i guess you could ignore it + about 492 + KiB + well it's another good point, mach internal structures don't imply + much overhead + or, the zone allocator is underused + + mcsim, braunr: The memory allocator project is coming along + good, as I get from your IRC messages? + tschwinge: yes, but as expected, improvements are minor + But at the very least it's now well-known, maintainable code.
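Editorial note: the page-level accounting discussed above (summing nr_slabs * slab_size per cache, then shifting right by 10 to report KiB) can be sketched as follows. Only `nr_slabs` and `slab_size` are names taken from the log; the struct, the helper functions, and the 4096-byte page size used in the alignment check are assumptions made for this illustration, not the allocator's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cache counters; field names follow the log. */
struct cache_usage {
    unsigned long nr_slabs;
    size_t slab_size;           /* bytes, a multiple of the page size */
};

/* Page-level footprint of one cache: every slab counts in full,
 * whether or not all of its buffers are in use. */
static size_t cache_bytes(const struct cache_usage *c)
{
    return c->nr_slabs * c->slab_size;
}

/* Footprint of all caches together. */
static size_t total_bytes(const struct cache_usage *caches, size_t n)
{
    size_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += cache_bytes(&caches[i]);
    return sum;
}

/* Convert bytes to KiB by shifting, as requested in the log. */
static unsigned long bytes_to_kib(size_t bytes)
{
    return (unsigned long)(bytes >> 10);
}

/* Non-zero if bytes is a multiple of page_size (a power of two). */
static int is_page_aligned(size_t bytes, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (bytes & (page_size - 1)) == 0;
}
```

Applied to the figures in the log, `bytes_to_kib` maps 0x4c6000, 0x4aa000 and 0x48d000 to 4888, 4776 and 4660, and 0x4e1954 to 4998. Assuming 4 KiB pages, `is_page_aligned` confirms that the zalloc figure is not page aligned while the balloc ones are.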
+ yes, it's readable, easier to understand, provides self inspection + and is smp ready + there also are less hacks, but a few less features (there are no + way to avoid sleeping so it's unusable - and unused - in interrupt + handlers) + is* no way + tschwinge: mcsim did a good job porting and measuring it + + +# IRC, freenode, #hurd, 2011-09-08 + + braunr: note that the zalloc map used to be limited to 8 MiB or + something like that a couple of years ago... so it doesn't seems + surprising that the kernel uses "only" 5 MiB :-) + (yes, we had a *lot* of zalloc panics back then...) + + +# IRC, freenode, #hurd, 2011-09-14 + + braunr: hello. I've written a constructor for kernel map entries + and it can return resources to their source. Can you have a look at it? + http://paste.debian.net/130037/ If all be OK I'll push it tomorrow. + mcsim: send the patch through mail please, i'll apply it on my + copy + are you sure the cache is reapable ? + All slabs, except first I allocate with kmem_alloc_wired. + how can you be sure ? + First slab I allocate during bootstrap and use pmap_steal_memory + and further I use only kmem_alloc_wired + no, you use kmem_free + in kentry_dealloc_cache() + which probably creates a recursion + using the constructor this way isn't a good idea + constructors are good for preconstructed state (set counters to 0, + init lists and locks, that kind of things, not allocating memory) + i don't think you should try to make this special cache reapable + mcsim: keep in mind constructors are applied on buffers at *slab* + creation, not at object allocation + so if you allocate a single slab with, say, 50 or 100 objects per + slab, kmem_alloc_wired would be called that number of times + why kentry_dealloc_cache can create recursion? kentry_dealloc_cache + is called only by mem_cache_reap. + right + but are you totally sure mem_cache_reap() can't be called by + kmem_free() ? 
+ i think you're right, it probably can't + + +# IRC, freenode, #hurd, 2011-09-25 + + braunr: hello. I rewrote constructor for kernel entries and seems + that it works fine. I think that this was last milestone. Only moving of + memory allocator sources to more appropriate place and merge with main + branch left. + mcsim: it needs renaming and reindenting too + for reindenting C-x h Tab in emacs will be enough? + mcsim: make sure which style must be used first + and what should I rename and where better to place allocator? For + example, there is no lib directory, like in x15. Should I create it and + move list.* and rbtree.* to lib/ or move these files to util/ or + something else? + mcsim: i told you balloc isn't a good name before, use something + more meaningful (kmem is already used in gnumach unfortunately if i'm + right) + you can put the support files in kern/ + what about vm_alloc? + you should prefix it with vm_ + shouldn't + it's a top level allocator + on top of the vm system + maybe mcache + hm no + maybe just km_ + kern/km_alloc.*? + no + just km + ok. + + +# IRC, freenode, #hurd, 2011-09-27 + + braunr: hello. When I've tried to speed of new allocator and bad + I've removed function mem_cpu_pool_fill. But you've said to undo this. I + don't understand why this function is necessary. Can you explain it, + please? + When I've tried to compare speed of new allocator and old* + i'm not sure i said that + i said the performance overhead is negligible + so it's better to leave the cpu pool layer in place, as it almost + doesn't hurt + you can implement the KMEM_CF_NO_CPU_POOL I added in the x15 mach + version + so that cpu pools aren't used by default, but the code is present + in case smp is implemented + I didn't remove cpu pool layer. I've just removed filling of cpu + pool during creation of slab. + how do you fill the cpu pools then ? + If object is freed than it is added to cpu poll + so you don't fill/drain the pools ? 
+ you try to get/put an object and if it fails you directly fall + back to the slab layer ? + I drain them during garbage collection + oh + yes + you shouldn't touch the cpu layer during gc + the number of objects should be small enough so that we don't care + much + ok. I can drain cpu pool at any other time if it is prohibited to + in mem_gc. + But why do we need to fill cpu poll during slab creation? + In this case allocation consist of: get object from slab -> put it + to cpu pool -> get it from cpu pool + I've just remove last to stages + hm cpu pools aren't filled at slab creation + they're filled when they're empty, and drained when they're full + so that the number of objects they contain is increased/reduced to + a value suitable for the next allocations/frees + the idea is to fall back as little as possible to the slab layer + because it requires the acquisition of the cache lock + oh. You're right. I'm really sorry. The point is that if cpu pool + is empty we don't need to fill it first + uh, yes we do :) + Why cache locking is so undesirable? If we have free objects in + slabs locking will not take a lot if time. + mcsim: it's undesirable on a smp system + ok. + mcsim: and spin locks are normally noops on a up system + which is the case in gnumach, hence the slightly better + performances without the cpu layer + but i designed this allocator for x15, which only supports mp + systems :) + mcsim: sorry i couldn't look at your code, sick first, busy with + server migration now (new server almost ready for xen hurds :)) + ok. + I ended with allocator if didn't miss anything important:) + i'll have a look soon i hope :) + + +# IRC, freenode, #hurd, 2011-09-27 + + braunr: would it be realistic/useful to check during GC whether + all "used" objects are actually in a CPU pool, and if so, destroy them so + the slab can be freed?... + mcsim: BTW, did you ever do any measurements of memory + use/fragmentation? + antrik: I couldn't do this for zalloc + oh... 
why not? + (BTW, I would be interested in a comparision between using the CPU + layer, and bare slab allocation without CPU layer) + Result I've got were strange. It wasn't even aligned to page size. + Probably is it better to look into /proc/vmstat? + Because I put hooks in the code and probably I missed something + mcsim: I doubt vmstat would give enough information to make any + useful comparision... + antrik: isn't this draining cpu pools at gc time ? + antrik: the cpu layer was found to add a slight overhead compared + to always falling back to the slab layer + braunr: my idea is only to drop entries from the CPU cache if they + actually prevent slabs from being freed... if other objects in the slab + are really in use, there is no point in flushing them from the CPU cache + braunr: I meant comparing the fragmentation with/without CPU + layer. the difference in CPU usage is probably negligable anyways... + you might remember that I was (and still am) sceptical about CPU + layer, as I suspect it worsens the good fragmentation properties of the + pure slab allocator -- but it would be nice to actually check this :-) + antrik: right + antrik: the more i think about it, the more i consider slqb to be + a better solution ...... :> + an idea for when there's time + eh + hehe :-) + + +# IRC, freenode, #hurd, 2011-10-13 + + mcsim: what's the current state of your gnumach branch ? + I've merged it with master in September + yes i've seen that, but does it build and run fine ? + I've tested it on gnumach from debian repository, but for building + I had to make additional change in device/ramdisk.c, as I mentioned. + mcsim: why ? + And it runs fine for me. + mcsim: why did you need to make other changes ? + because there is a patch which comes with from-debian-repository + kernel and it addes some code, where I have to make changes. Earlier + kernel_map was a pointer to structure, but I change that and now + kernel_map is structure. 
So handling to it should be by taking the + address (&kernel_map) + why did you do that ? + or put it another way: what made you do that type change on + kernel_map ? + Earlier memory for kernel_map was allocating with zalloc. But now + salloc can't allocate memory before it's initialisation + that's not a good reason + a simple workaround for your problem is this : + static struct vm_map kernel_map_store; + vm_map_t kernel_map = &kernel_map_store; + braunr: Ok. I'll correct this. + + +# IRC, freenode, #hurd, 2011-11-01 + + etenil: but mcsim's work is, for one, useful because the allocator + code is much clearer, adds some debugging support, and is smp-ready + + +# IRC, freenode, #hurd, 2011-11-14 + + i've just realized that replacing the zone allocator removes most + (if not all) static limit on allocated objects + as we have nothing similar to rlimits, this means kernel resources + are actually exhaustible + and i'm not sure every allocation is cleanly handled in case of + memory shortage + youpi: antrik: tschwinge: is this acceptable anyway ? + (although IMO, it's also a good thing to get rid of those limits + that made the kernel panic for no valid reason) + there are actually not many static limits on allocated objects + only a few have one + those defined in kern/mach_param.h + most of them are not actually enforced + ah ? + they are used at zinit() time + i thought they were + yes, but most zones are actually fine with overcoming the max + ok + see zone->max_size += (zone->max_size >> 1); + you need both !EXHAUSTIBLE and FIXED + ok + making having rlimits enforced would be nice... 
+ s/making// + pinotree: the kernel wouldn't handle many standard rlimits anyway + + i've just committed my final patch on mcsim's branch, which will + serve as the starting point for integration + which means code in this branch won't change (or only last minute + changes) + you're invited to test it + there shouldn't be any noticeable difference with the master + branch + a bit less fragmentation + more memory can be reclaimed by the VM system + there are debugging features + it's SMP ready + and overall cleaner than the zone allocator + although a bit slower on the free path (because of what's + performed to reduce fragmentation) + but even "slower" here is completely negligible + + +# IRC, freenode, #hurd, 2011-11-15 + + I enabled cpu_pool layer and kentry cache exhausted at "apt-get + source gnumach && (cd gnumach-* && dpkg-buildpackage)" + I mean kernel with your last commit + braunr: I'll make patch how I've done it in a few minutes, ok? It + will be more specific. + mcsim: did you just remove the #if NCPUS > 1 directives ? + no. I replaced macro NCPUS > 1 with SLAB_LAYER, which equals NCPUS + > 1, than I redefined macro SLAB_LAYER + ah, you want to make the layer optional, even on UP machines + mcsim: can you give me the commands you used to trigger the + problem ? + apt-get source gnumach && (cd gnumach-* && dpkg-buildpackage) + mcsim: how much ram & swap ? + let's see if it can handle a quite large aptitude upgrade + how can I check swap size? + free + cat /proc/meminfo + top + whatever + total used free shared buffers + cached + Mem: 786368 332296 454072 0 0 + 0 + -/+ buffers/cache: 332296 454072 + Swap: 1533948 0 1533948 + ok, i got the problem too + braunr: do you run hurd in qemu? 
+ yes + i guess the cpu layer increases fragmentation a bit + which means more map entries are needed + hm, something's not right + there are only 26 kernel map entries when i get the panic + i wonder why the cache gets that stressed + hm, reproducing the kentry exhaustion problem takes quite some + time + braunr: what do you mean? + sometimes, dpkg-buildpackage finishes without triggering the + problem + the problem is in apt-get source gnumach + i guess the problem happens because of drains/fills, which + allocate/free much more object than actually preallocated at boot time + ah ? + ok + i've never had it at that point, only later + i'm unable to trigger it currently, eh + do you use *-dbg kernel? + yes + well, i use the compiled kernel, with the slab allocator, built + with the in kernel debugger + when you run apt-get source gnumach, you run it in clean directory? + Or there are already present downloaded archives? + completely empty + ah just got it + ok the limit is reached, as expected + i'll just bump it + the cpu layer drains/fills allocate several objects at once (64 if + the size is small enough) + the limit of 256 (actually 252 since the slab descriptor is + embedded in its slab) is then easily reached + mcsim: most direct way to check swap usage is vmstat + damn, i can't live without slabtop and the amount of + active/inactive cache memory any more + hm, weird, we have active/inactive memory in procfs, but not + buffers/cached memory + we could set buffers to 0 and everything as cached memory, since + we're currently unable to communicate the purpose of cached memory + (whether it's used by disk servers or file system servers) + mcsim: looks like there are about 240 kernel map entries (i forgot + about the ones used in kernel submaps) + so yes, addin the cpu layer is what makes the kernel reach the + limit more easily + braunr: so just increasing limit will solve the problem? 
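Editorial note: the fill/drain behaviour of the cpu layer described in this log (pools filled in bulk when empty, drained in bulk when full, so that the cache lock is taken as rarely as possible) can be illustrated with a toy model. This is a sketch under loud assumptions, not the allocator's code: the slab layer is replaced by a plain stack, the lock by a counter, and `POOL_SIZE`, `TRANSFER_SIZE` and all function names are hypothetical (the log mentions transfers of up to 64 objects; 4 is used here to keep the trace short).

```c
#include <assert.h>
#include <stddef.h>

#define LAYER_MAX     64  /* capacity of the stand-in slab layer */
#define POOL_SIZE      8  /* hypothetical per-CPU pool capacity */
#define TRANSFER_SIZE  4  /* hypothetical bulk transfer size */

/* Stand-in for the slab layer: a stack of free objects. */
struct slab_layer {
    void *free_objs[LAYER_MAX];
    size_t nr_free;
    unsigned long lock_acquisitions;    /* simulated cache lock takes */
};

struct cpu_pool {
    void *objs[POOL_SIZE];
    size_t nr_objs;
};

static void slab_layer_init(struct slab_layer *layer, char *backing, size_t n)
{
    assert(n <= LAYER_MAX);
    for (size_t i = 0; i < n; i++)
        layer->free_objs[i] = &backing[i];
    layer->nr_free = n;
    layer->lock_acquisitions = 0;
}

/* Fill: one lock acquisition moves up to TRANSFER_SIZE objects in. */
static void cpu_pool_fill(struct cpu_pool *pool, struct slab_layer *layer)
{
    layer->lock_acquisitions++;
    for (size_t i = 0; i < TRANSFER_SIZE && layer->nr_free > 0
                       && pool->nr_objs < POOL_SIZE; i++)
        pool->objs[pool->nr_objs++] = layer->free_objs[--layer->nr_free];
}

/* Drain: one lock acquisition moves up to TRANSFER_SIZE objects out. */
static void cpu_pool_drain(struct cpu_pool *pool, struct slab_layer *layer)
{
    layer->lock_acquisitions++;
    for (size_t i = 0; i < TRANSFER_SIZE && pool->nr_objs > 0
                       && layer->nr_free < LAYER_MAX; i++)
        layer->free_objs[layer->nr_free++] = pool->objs[--pool->nr_objs];
}

static void *pool_alloc(struct cpu_pool *pool, struct slab_layer *layer)
{
    if (pool->nr_objs == 0)
        cpu_pool_fill(pool, layer);
    if (pool->nr_objs == 0)
        return NULL;            /* the slab layer is out of objects too */
    return pool->objs[--pool->nr_objs];
}

static void pool_free(struct cpu_pool *pool, struct slab_layer *layer,
                      void *obj)
{
    if (pool->nr_objs == POOL_SIZE)
        cpu_pool_drain(pool, layer);
    assert(pool->nr_objs < POOL_SIZE);
    pool->objs[pool->nr_objs++] = obj;
}
```

In this model, twelve allocations followed by twelve frees touch the simulated lock four times instead of twenty-four. It also shows why the layer needs more preallocated objects than are actually in use, since each fill pulls several objects at once.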
+ mcsim: yes + slab reclaiming looks very stable + and unfrequent + (which is surprising) + braunr: "unfrequent"? + pinotree: there isn't much memory pressure + slab_collect() gets called once a minute on my hurd + or is it infrequent ? + :) + i have no idea :) + infrequent, yes + + +# IRC, freenode, #hurd, 2011-11-16 + + for those who want to play with the slab branch of gnumach, the + slabinfo tool is available at http://git.sceen.net/rbraun/slabinfo.git/ + for those merely interested in numbers, here is the output of + slabinfo, for a hurd running in kvm with 512 MiB of RAM, an unused swap, + and a short usage history (gnumach debian packages built, aptitude + upgrade for a dozen of packages, a few git commands) + http://www.sceen.net/~rbraun/slabinfo.out + braunr: numbers for a long usage history would be much more + interesting :-) + + +## IRC, freenode, #hurd, 2011-11-17 + + antrik: they'll come :) + is something going on on darnassus? it's mighty slow + yes + i've rebooted it to run a modified kernel (with the slab + allocator) and i'm building stuff on it to stress it + (i don't have any other available machine with that amount of + available physical memory) + ok + braunr: probably would be actually more interesting to test under + memory pressure... + guess that doesn't make much of a difference for the kernel object + allocator though + antrik: if ram is larger, there can be more objects stored in + kernel space, then, by building something large such as eglibc, memory + pressure is created, causing caches to be reaped + our page cache is useless because of vm_object_cached_max + it's a stupid arbitrary limit masking the inability of the vm to + handle pressure correctly + if removing it, the kernel freezes soon after ram is filled + antrik: it may help trigger the "double swap" issue you mentioned + what may help trigger it? + not checking this limit + hm... 
indeed I wonder whether the freezes I see might have the + same cause + + +## IRC, freenode, #hurd, 2011-11-19 + + http://www.sceen.net/~rbraun/slabinfo.out <= state of the slab + allocator after building the debian libc packages and removing all files + once done + it's mostly the same as on any other machine, because of the + various arbitrary limits in mach (most importantly, the max number of + objects in the page cache) + fragmentation is still quite low + braunr: actually fragmentation seems to be lower than on the other + run... + antrik: what makes you think that ? + the numbers of currently unused objects seem to be in a similar + range IIRC, but more of them are reclaimable I think + maybe I'm misremembering the other numbers + there had been more reclaims on the other run + + +# IRC, freenode, #hurd, 2011-11-25 + + mcsim: i've just updated the slab branch, please review my last + commit when you have time + braunr: Do you mean compilation/tests? + no, just a quick glance at the code, see if it matches what you + intended with your original patch + braunr: everything is ok + good + i think the branch is ready for integration + + +# IRC, freenode, #hurd, 2011-12-17 + + in the slab branch, there now is no use for the defines in + kern/mach_param.h + should the file be removed or left empty as a placeholder for + future arbitrary limits ? + (i'd tend ro remove it as a way of indicating we don't want + arbitrary limits but there may be a good reason to keep it around .. :)) + I'd just drop it + ok + hmm maybe we do want to keep that one : + #define IMAR_MAX (1 << 10) /* Max number of + msg-accepted reqs */ + whatever that is... + it gets returned in ipc_marequest_info + but the mach_debug interface has never been used on the hurd + there now is a master-slab branch in the gnumach repo, feel free + to test it + + +# IRC, freenode, #hurd, 2011-12-22 + + braunr: does the new gnumach allocator has profiling features? + e.g. 
to easily know where memory leaks reside + youpi: you mean tracking call traces to allocated blocks ? + not necessarily traces + but at least means to know what kind of objects is filling memory + it's very close to the zone allocator + but instead of zones, there are caches + each named after the type they store + see http://www.sceen.net/~rbraun/slabinfo.out + ok, so we can know, per-type, how much memory is used + yes + good + if backtraces can easily be forged, it wouldn't be hard to add + that feature too + does it dump such info when memory goes short? + no but it can + i've done this during tests + it'd be good + because I don't know in advance when a buildd will crash due to + that :) + each time slab_collect() is called for example + I mean not on collect, but when it's too late + and thus always enabled + ok + (because there's nothing better to do than at least give infos) + you just have to define "when it's too late", and i can add that + when there is no memory left + you mean when the number of free pages strictly reaches 0 ? + yes + ok + i.e. just before crashing the kernel + i see + + +# IRC, freenode, #hurdfr, 2012-01-02 + + braunr: the slab allocator code, was it written from scratch ? + there is still some carnegie mellon copyright + (in slab_info.h at least) + ipc_hash_global_size = 256; + 256 should be defined as a constant in a header + otherwise it's yet another arbitrary value hidden in code + same for ipc_marequest_size etc. + youpi: yes, from scratch + slab_info.h was originally zone_info.h + as for the fixed values, they were already present that + way, i thought it was better to leave them like that to make the + diffs easier to read + i'll make macros instead + so mach_param.h may have to be brought back + or else in the ipc .h files + + +# IRC, freenode, #hurd, 2012-01-18 + + does the slab branch need other reviews/reports before being + integrated ?
+ + +# IRC, freenode, #hurd, 2012-01-30 + + youpi: do you have some idea about when you want to get the slab + branch in master ? + I was considering as soon as mcsim gets his paper + right + + +# IRC, freenode, #hurd, 2012-02-22 + + Do I understand correctly, that real memory page should be + necessarily in one of following lists: vm_page_queue_active, + vm_page_queue_inactive, vm_page_queue_free? + cached pages are + some special pages used only by the kernel aren't + pages can be both wired and cached (i.e. managed by the page + cache), so that they can be passed to external applications and then + unwired (as is the case with your host_slab_info() function if you + remember) + use "physical" instead of "real memory" + braunr: thank you. + + +# IRC, freenode, #hurd, 2012-04-22 + + youpi: tschwinge: when the slab code was added, a few new files + made it into gnumach that come from my git repo and are used in other + projects as well + they're licensed under BSD upstream and GPL in gnumach, and though + it initially didn't disturb me, now it does + i think i should fix this by leaving the original copyright and + adding the GPL on top + sure, submit a patch + hm i have direct commit access if i'm right + then fix it :) + do you want to review ? + I don't think there is any need to + ok + + +# IRC, freenode, #hurd, 2012-12-08 + + braunr: hi. Do I understand correctly that merely the same technique + is used in linux to determine the slab where, the object to be freed, + resides? + yes but it's faster on linux since it uses a direct mapping of + physical memory + it just has to shift the virtual address to obtain the physical + one, whereas x15 has to walk the page tables + of course it only works for kmalloc, vmalloc is entirely different + btw, is there sense to use some kind of B-tree instead of AVL to + decrease number of cache misses? AFAIK, in modern processors size of L1
AFAIK, in modern processors size of L1 + cache line is at least 64 bytes, so in one node we can put at least 4 + leafs (key + pointer to data) making search faster. + that would be a b-tree + and yes, red-black trees were actually developed based on + properties observed on b-trees + but increasing the size of the nodes also increases memory + overhead + and code complexity + that's why i have a radix trees for cases where there are a large + number of entries with keys close to each other :) + a radix-tree is basically a b-tree using the bits of the key as + indexes in the various arrays it walks instead of comparing keys to each + other + the original avl tree used in my slab allocator was intended to + reduce the average height of the tree (avl is better for that) + avl trees are more suited for cases where there are more lookups + than inserts/deletions + they make the tree "flatter" but the maximum complexity of + operations that change the tree is 2log2(n), since rebalancing the tree + can make the algorithm reach back to the tree root + red-black trees have slightly bigger heights but insertions are + limited to 2 rotations and deletions to 3 + there should be not much lookups in slab allocators + which explains why they're more generally found in generic + containers + or do I misunderstand something? 
+ well, there is a lookup for each free() + whereas there are insertions/deletions when a slab becomes + non-empty/empty + I see + so it was very efficient for caches of small objects, where slabs + have many of them + also, i wrote the implementation in userspace, without + functionality pmap provides (although i could have emulated it + afterwards) + + +# IRC, freenode, #hurd, 2013-01-06 + + braunr: panic: vm_map: kentry memory exhausted + youpi: ouch + that's what I usually get + ok + the kentry area is a preallocated memory area that is used to back + the vm_map_kentry cache + objects from this cache are used to describe kernel virtual memory + so in this case, i simply assume the kentry area must be enlarged + (currently, both virtual and physical memory is preallocated, an + improvement could be what is now done in x15, to preallocate virtual + memory only + ) + Mmm, why do we actually have this limit? + the kentry area must be described by one entry + ah, sorry, vm/vm_resident.c: kentry_data = + pmap_steal_memory(kentry_data_size); + a statically allocated one + I had missed that one + previously, the zone allocator would do that + the kentry area is required to avoid recursion when allocating + memory + another solution would be a custom allocator in vm_map, but i + wanted to use a common cache for those objects too + youpi: you could simply try doubling KENTRY_DATA_SIZE + already doing that + we might even consider a much larger size until it's reworked + well, it's rare enough on buildds already + doubling should be enough + or else we have leaks + right + it may not be leaks though + it may be poor map entry merging + i'd expected the kernel map entries to be easier to merge, but it + may simply not be the case + (i mean, when i made my tests, it looked like there were few + kernel map entries, but i may have missed corner cases that could cause + more of them to be needed) + + +## IRC, freenode, #hurd, 2014-02-11 + + youpi: what's the issue with 
kentry_data_size ? + I don't know + so back to 64pages from 256 ? + in debian for now yes + :/ + from what i recall with x15, grub is indeed allowed to put modules + and command lines around as it likes + restricted to 4G + iirc, command lines were in the first 1M while modules could be + loaded right after the kernel or at the end of memory, depending on the + versions + braunr: possibly VM_KERNEL_MAP_SIZE is then not big enough + youpi: what's the size of the ramdisk ? + youpi: or kmem_map too big + we discussed this earlier with teythoon + +[[user-space_device_drivers]], *Open Issues*, *System Boot*, *IRC, freenode, +\#hurd, 2011-07-27*, *IRC, freenode, #hurd, 2014-02-10* + + or maybe we want to remove kmem_map altogether and directly use + kernel_map + it's 6.2MiB big + hm + err no + looks small + 70MiB + ok yes + (uncompressed) + well + kernel_map is supposed to have 64M on i386 ... + it's 192M large, with kmem_map taking 128M + so at most 64M, with possible fragmentation + i believe the compressed initrd is stored in the ramdisk + ah, right it's ext2fs which uncompresses it + uncompresses it where + ? + libstore does that + module --nounzip /boot/${gtk}initrd.gz + braunr: in userland memory + it's not grub which uncompresses it for sure + braunr: so my ramdisk isn't 64 megs either + which explains why it sometimes works + yes + mine is like 15 megs + kentry_data_size calls pmap_steal_memory, an early allocation + function which changes virtual_space_start, which is later used to create + the first kernel map entry + err, pmap_steal_memory is called with kentry_data_size as its + argument + this first kernel map entry is installed inside kernel_map and + reduces the amount of available virtual memory there + so yes, it all points to a layout problem + i suggest reducing kmem_map down to 64M + that's enough to get d-i back to boot + what would be the downside? + (why did you raise it to 128 actually? 
:) ) + i merged the map used by generic kalloc allocations into kmem_map + both were 64M + i don't see any downside for the moment + i rarely see more than 50M used by the slab allocator + and with the recent code i added to collect reclaimable memory on + kernel allocation failures, it's unlikely the slab allocator will be + starved + but then we need that patch too + no + it would be needed if kmem_map gets filled + this very rarely happens + is "very rarely" enough ? :) + actualy i've never seen it happen + i added it because i had port leaks with fakeroot + port rights are a bit special because they're stored in a table in + kernel space + this table is enlarged with kmem_realloc + when an ipc space gets very large, fragmentation makes it very + difficult to successfully resize it + that should be the only possible issue + actually, there is another submap that steals memory from + kernel_map: device_io_map is 16M large + so kernel_map gets down to 48M + if the initial entry (that is, kentry_data_size + the physical + page table size) gets a bit large, kernel_map may have very little + available room + the physical page table size obviously varies depending on the + amount of physical memory loaded, which may explain why the installer + worked on some machines + well, it works up to 1855M + at 1856 it doesn't work any more :) + heh :) + and that's about the max gnumach can handle anyway + then reducing kmem_map down to 96M should be enough + it works indeed + could you check the amount of available space in kernel_map ? + the value of kernel_map->size should do + printing it "multiboot modules" print should be fine I guess? + + +### IRC, freenode, #hurd, 2014-02-12 + + probably + ? + i expect a bit more than 160M + (for the value of kernel_map->size) + teythoon: ? + well, it's 2110210048 + what is multiboot modules printing ? 
+ almost last in gnumach bootup + humm + it must account directly mapped physical pages + considering the kernel has exactly 2G, this means there is 36M + available in kernel_map + youpi: is the ramdisk loaded at that moment ? + what do you mean by "loaded" ? :) + created + where? + allocated in kernel memory + the script hasn't started yet + ok + its size was 6M+ right ? + so it leaves around 30M + something like this yes + and changing kmem_map from 128M to 96M gave us 32M + so that's it + + +# IRC, freenode, #hurd, 2013-04-18 + + oh nice, i've found a big scalability issue with my slab allocator + it shouldn't affect gnumach much though + + +## IRC, freenode, #hurd, 2013-04-19 + + braunr: is it fixable? + yes + well, i'll do it in x15 for a start + again, i don't think gnumach is much affected + it's a scalability issue + when millions of objects are in use + gnumach rarely has more than a few hundred thousands + it's also related to heavy multithreading/smp + and by multithreading, i also mean preemption + gnumach isn't preemptible and uniprocessor + if the resulting diff is clean enough, i'll push it to gnumach + though :) + + +### IRC, freenode, #hurd, 2013-04-21 + + ArneBab_: i fixed the scalability problems btw + + +## IRC, freenode, #hurd, 2013-04-20 + + well, there is also a locking error in the slab allocator, + although not a problem for a non preemptible kernel like gnumach + non preemptible / uniprocessor
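Editorial note: the kernel_map figures from the 2014-02-12 exchange above can be re-derived with a few lines of arithmetic. The sketch follows the log in reading the printed `kernel_map->size` (2110210048) as the amount already mapped out of a 2 GiB kernel address space; that reading, and the helper names, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Bytes still available in a map of `total` bytes with `used` mapped. */
static uint64_t map_free_bytes(uint64_t total, uint64_t used)
{
    assert(used <= total);
    return total - used;
}

/* Round a byte count up to whole MiB, as the log does informally. */
static uint64_t bytes_to_mib_ceil(uint64_t bytes)
{
    return (bytes + MIB - 1) / MIB;
}
```

With `total = 2048 * MIB` and `used = 2110210048`, this leaves 37273600 bytes, i.e. about 36 MiB once rounded up. Subtracting the 6M+ ramdisk gives the "around 30M" mentioned in the log, and shrinking kmem_map from 128M to 96M returns another 32 MiB to kernel_map.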