author:    Thomas Schwinge <tschwinge@gnu.org>    2011-09-01 09:27:33 +0200
committer: Thomas Schwinge <tschwinge@gnu.org>    2011-09-01 09:27:33 +0200
commit:    3e7472b3d54853389cd8a17475901fbef976ef18 (patch)
tree:      fdd31020d36728fe3c2059fa93a9dfcf7b2c2e87 /open_issues/gnumach_memory_management.mdwn
parent:    688fc9d79713c183c0b7ff2bc1717525c773bee9 (diff)
IRC.
Diffstat (limited to 'open_issues/gnumach_memory_management.mdwn')
 -rw-r--r--  open_issues/gnumach_memory_management.mdwn  397
 1 file changed, 397 insertions(+), 0 deletions(-)
diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn
index 448aafcc..a728fc9d 100644
--- a/open_issues/gnumach_memory_management.mdwn
+++ b/open_issues/gnumach_memory_management.mdwn
@@ -923,3 +923,400 @@ There is a [[!FF_project 266]][[!tag bounty]] on this task.
<braunr> 20 years ago
<braunr> but it's a source of deadlock
<mcsim> Indeed. I won't use kmem_alloc_pageable.
+
+
+# IRC, freenode, #hurd, 2011-08-09
+
+ < braunr> mcsim: what's the "bug related to MEM_CF_VERIFY" you refer to in
+ one of your commits ?
+ < braunr> mcsim: don't use spin_lock_t as a member of another structure
+ < mcsim> braunr: I mixed up the types in the *_verify functions, so they
+ didn't work. Then I fixed that in the commit you mentioned.
+ < braunr> in gnumach, most types are actually structure pointers
+ < braunr> use simple_lock_data_t
+ < braunr> mcsim: ok
+ < mcsim> > use simple_lock_data_t
+ < mcsim> braunr: ok
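+
+The distinction above, as a minimal sketch of the embedded-lock form,
+assuming gnumach's kern/lock.h interface (the mem_cache structure and its
+init function are illustrative):
+
+    #include <kern/lock.h>
+
+    struct mem_cache {
+        simple_lock_data_t lock;   /* lock storage embedded in the structure */
+        /* ... other members ... */
+    };
+
+    static void mem_cache_init(struct mem_cache *cache)
+    {
+        simple_lock_init(&cache->lock);
+    }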
+ < braunr> mcsim: don't make too many changes to the code base, and if
+ you're unsure, don't hesitate to ask
+ < braunr> also, i really insist you rename the allocator, as done in x15
+ for example
+ (http://git.sceen.net/rbraun/x15mach.git/?a=blob;f=vm/kmem.c), instead of
+ a name based on mine :/
+ < mcsim> braunr: Ok. It was just a working name. When I finish I'll rename
+ the allocator.
+ < braunr> other than that, it's nice to see progress
+ < braunr> although again, it would be better with some reports along
+ < braunr> i won't be present at the meeting tomorrow unfortunately, but you
+ should use those to report the status of your work
+ < mcsim> braunr: You've said that I have to tweak the gc process. Did you
+ mean to call mem_gc() when physical memory runs low instead of calling it
+ every x seconds? Or something else?
+ < braunr> there are multiple topics, although only one that really matters
+ < braunr> study how zone_gc was called
+ < braunr> reclaiming memory should happen when there is pressure on the VM
+ subsystem
+ < braunr> but it shouldn't happen too often, otherwise there is thrashing
+ < braunr> and your caches become mostly useless
+ < braunr> the original slab allocator uses a 15-second period after a
+ reclaim during which reclaiming has no effect
+ < braunr> this allows having a somewhat stable working set for this
+ duration
+ < braunr> the linux slab allocator uses 5 seconds, but has a more
+ complicated reclaiming mechanism
+ < braunr> it releases memory gradually, and from reclaimable caches only
+ (dentry for example)
+ < braunr> for x15 i intend to implement the original 15 second interval and
+ then perform full reclaims
+ < mcsim> In zalloc, zone_gc is called by vm_pageout_scan, but not more
+ often than once a second.
+ < mcsim> In balloc I've changed interval to once in 15 seconds.
+ < braunr> don't use the code as it is
+ < braunr> the version you've based your work on was meant for userspace
+ < braunr> where there isn't memory pressure
+ < braunr> so a timer is used to trigger reclaims at regular intervals
+ < braunr> it's different in a kernel
+ < braunr> mcsim: where did you see vm_pageout_scan call the zone gc once a
+ second ?
+ < mcsim> vm_pageout_scan calls consider_zone_gc, and consider_zone_gc
+ checks whether a second has passed.
+ < braunr> where ?
+ < mcsim> Then zone_gc can be called.
+ < braunr> ah ok, it's in zaclloc.c then
+ < braunr> zalloc.c
+ < braunr> yes this function is fine
+ < mcsim> so old gc didn't consider vm pressure. Or I missed something.
+ < braunr> it did
+ < mcsim> how?
+ < braunr> well, it's called by the pageout daemon
+ < braunr> under memory pressure
+ < braunr> so it's fine
+ < mcsim> so if mem_gc is called by pageout daemon is it fine?
+ < braunr> it must be changed to do something similar to what
+ consider_zone_gc does
+ < mcsim> It does. mem_gc does the same work as consider_zone_gc and
+ zone_gc.
+ < braunr> good
+ < mcsim> so gc process is fine?
+ < braunr> should be
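+
+A sketch of the rate-limited reclaim described above, in the spirit of
+consider_zone_gc; mem_gc_all() and the use of sched_tick (which advances
+once per second) are illustrative assumptions:
+
+    #define MEM_GC_INTERVAL 15          /* seconds between full reclaims */
+
+    static unsigned int mem_gc_last_tick;
+
+    /* Called by the pageout daemon under memory pressure; reclaiming is
+       rate-limited so the caches keep a somewhat stable working set. */
+    void mem_gc_consider(void)
+    {
+        if (sched_tick - mem_gc_last_tick < MEM_GC_INTERVAL)
+            return;                     /* still in the hold-off window */
+        mem_gc_last_tick = sched_tick;
+        mem_gc_all();                   /* release free slabs back to the VM */
+    }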
+ < braunr> i see mem.c only includes mem.h, which then includes other
+ headers
+ < braunr> don't do that
+ < braunr> always include all the headers you need where you need them
+ < braunr> if you need avltree.h in both mem.c and mem.h, include it in both
+ files
+ < braunr> and by the way, i recommend you use the red black tree instead of
+ the avl type
+ < braunr> (it's the same interface so it shouldn't take long)
+ < mcsim> As for the report: if you won't be present at the meeting, I can
+ tell you what I have to do now.
+ < braunr> sure
+ < braunr> in addition, use GPLv2 as the license, the BSD one is meant for
+ the userspace version only
+ < braunr> GPLv2+ actually
+ < braunr> hm you don't need list.c
+ < braunr> it would only add dead code
+ < braunr> "Zone for dynamical allocator", don't mix terms
+ < braunr> this comment refers to a vm_map, so call it a map
+ < mcsim> 1. Change constructor for kentry_alloc_cache.
+ < mcsim> 2. Make measurements.
+ < mcsim> +
+ < mcsim> 3. Use simple_lock_data_t
+ < mcsim> 4. Replace license
+ < braunr> kentry_alloc_cache <= what is that ?
+ < braunr> cache for kernel map entries in vm_map ?
+ < braunr> the comment for mem_cpu_pool_get doesn't apply in gnumach, as
+ there is no kernel preemption
+ < braunr> "Don't attempt mem GC more frequently than hz/MEM_GC_INTERVAL
+ times a second.
+ < braunr> "
+ < mcsim> sorry. I meant vm_map_kentry_cache
+ < braunr> hm, nothing to say about this comment actually
+ < braunr> mcsim: ok
+ < braunr> yes kernel map entries need special handling
+ < braunr> i don't know how it's done in gnumach though
+ < braunr> static preallocation ?
+ < mcsim> yes
+ < braunr> that's ugly :p
+ < mcsim> but it later uses dynamic allocation, even for vm_map kernel
+ entries
+ < braunr> although such bootstrapping issues are generally difficult to
+ solve elegantly
+ < braunr> ah
+ < mcsim> now I use only static allocation, but I'll add dynamic allocation
+ too
+ < braunr> when you have time, mind the coding style (convert everything to
+ gnumach style, which mostly implies using tabs instead of 4-spaces
+ indentation)
+ < braunr> when you'll work on dynamic allocation for the kernel map
+ entries, you may want to review how it's done in x15
+ < braunr> the mem_source type was originally intended for that purpose, but
+ has slightly changed once the allocator was adapted to work in my kernel
+ < mcsim> ok
+ < braunr> vm_map_kentry_zone is the only zone created with ZONE_FIXED
+ < braunr> and it is zcram()'ed immediately after
+ < braunr> so you can consider it a statically allocated zone
+ < braunr> in x15 i use another strategy: there is a special kernel submap
+ named kentry_map which contains only one map entry (statically allocated)
+ < braunr> this map is the backend (mem_source) for the kentry_cache
+ < braunr> the kentry_cache is created with a special flag that tells it
+ memory can't be reclaimed
+ < braunr> when the cache needs to grow, the single map entry is extended to
+ cover the allocated memory
+ < braunr> it's similar to the way pmap_growkernel() works for kernel page
+ table pages
+ < braunr> (and is actually based on that idea)
+ < braunr> it's a compromise between full static and dynamic allocation
+ types
+ < braunr> the advantage is that the allocator code can be used (so there is
+ no need for a special allocator like in netbsd)
+ < braunr> the drawback is that some resources can never be returned to
+ their source (and under peaks, the amount of unfreeable resources could
+ become large, but this is unexpected)
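+
+A sketch of the kentry growth path described above, loosely modeled on
+x15's vm/kmem.c; all identifiers here are illustrative, not actual gnumach
+API:
+
+    /* A special kernel submap with a single, statically allocated entry. */
+    static struct vm_map kentry_map;
+    static struct vm_map_entry kentry_entry;
+
+    /* Backend for the kentry cache.  The cache is created with a
+       no-reclaim flag, so memory obtained here is never given back;
+       growing mimics pmap_growkernel() extending kernel page tables. */
+    static vm_offset_t kentry_pagealloc(vm_size_t size)
+    {
+        vm_offset_t addr = kentry_entry.end;
+
+        /* ... allocate physical pages and map them at addr ... */
+        kentry_entry.end += round_page(size);   /* extend the single entry */
+        return addr;
+    }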
+ < braunr> mcsim: for now you shouldn't waste your time with this
+ < braunr> i see the number of kernel map entries is fixed at 256
+ < braunr> and i've never seen the kernel use more than around 30 entries
+ < mcsim> Do you think I should leave this problem until the end?
+ < braunr> yes
+
+
+# IRC, freenode, #hurd, 2011-08-11
+
+ < mcsim> braunr: Hello. Can you give me advice on how I can make my
+ measurements better?
+ < braunr> mcsim: what kind of measurements
+ < mcsim> braunr: How much better is your allocator than zalloc?
+ < braunr> slightly :p
+ < braunr> that's why i never took the time to put it in gnumach
+ < mcsim> braunr: I just thought that there are some rules or
+ recommendations for such measurements. Or can I do them any way I want?
+ < braunr> mcsim: i don't know
+ < braunr> mcsim: benchmarking is an art of its own, and i don't even know
+ how to use the bits of profiling code available in gnumach (if it still
+ works)
+ < antrik> mcsim: hm... are you saying you already have a running system
+ with slab allocator?... :-)
+ < braunr> mcsim: the main advantage i can see is the removal of many
+ arbitrary hard limits
+ < mcsim> antrik: yes
+ < antrik> \o/
+ < antrik> nice work!
+ < braunr> :)
+ < braunr> the cpu layer should also help a bit, but it's hard to measure
+ < braunr> i guess it could be seen on the ipc path for very small buffers
+ < mcsim> antrik: Thanks. But I still have to 1. Change constructor for
+ kentry_alloc_cache. and 2. Make measurements.
+ < braunr> and polish the whole thing :p
+ < antrik> mcsim: I'm not sure this can be measured... the performance
+ difference in any real-life usage is probably just a few percent at most
+ -- it's hard to construct a benchmark giving enough precision so it's not
+ drowned in noise...
+ < antrik> perhaps it conserves some memory -- but that too would be hard to
+ measure I fear
+ < braunr> yes
+ < braunr> there *should* be better allocation times, less fragmentation,
+ better accounting ... :)
+ < braunr> and no arbitrary limits !
+ < antrik> :-)
+ < braunr> oh, and the self debugging features can be nice too
+ < mcsim> But I need to prove that my work wasn't useless
+ < braunr> well it wasn't, but that's hard to measure
+ < braunr> it's easy to prove though, since there are additional features
+ that weren't present in the zone allocator
+ < mcsim> Ok. If there are some profiling features in gnumach can you give
+ me a link with their description?
+ < braunr> mcsim: sorry, no
+ < braunr> mcsim: you could still write the basic loop test, which counts
+ the number of allocations performed in a fixed time interval
+ < braunr> but as it doesn't match many real life patterns, it won't be very
+ useful
+ < braunr> and i'm afraid that if you consider real-life patterns, you'll
+ see how negligible the improvement can be compared to other operations
+ such as memory copies or I/O (ouch)
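+
+The basic loop test could look like this sketch; test_cache,
+mem_cache_alloc()/mem_cache_free() and the use of sched_tick are assumed
+names:
+
+    /* Count alloc/free pairs completed in a fixed time interval. */
+    static void mem_bench(struct mem_cache *test_cache)
+    {
+        unsigned long count = 0;
+        unsigned int end = sched_tick + 10;     /* roughly ten seconds */
+
+        while (sched_tick < end) {
+            void *obj = mem_cache_alloc(test_cache);
+            mem_cache_free(test_cache, obj);
+            count++;
+        }
+        printf("%lu allocations in the interval\n", count);
+    }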
+ < mcsim> Do network drivers use this allocator?
+ < mcsim> ok. I'll scrape up some tests and then I'll report results.
+
+
+# IRC, freenode, #hurd, 2011-08-26
+
+ < mcsim> hello. Are there any analogs in gnumach of linux's copy_to_user
+ and copy_from_user?
+ < mcsim> Or how can I determine the memory map if I know an address? I
+ need this for vm_map_copyin
+ < guillem> mcsim: vm_map_lookup_entry?
+ < mcsim> guillem: but I need to pass a map to this function, and it will
+ return an entry that contains the specified address.
+ < mcsim> And I don't know which map I have to pass.
+ < mcsim> I need to transfer a static array from the kernel to user space.
+ Which map contains static data?
+ < antrik> mcsim: Mach doesn't have copy_{from,to}_user -- instead, large
+ chunks of data are transferred as out-of-line data in IPC messages
+ (i.e. using VM magic)
+ < mcsim> antrik: can you give me an example? I've only found vm_map_copyin
+ being used in host_zone_info.
+ < antrik> no idea what vm_map_copyin is to be honest...
+
+
+# IRC, freenode, #hurd, 2011-08-27
+
+ < braunr> mcsim: the primitives are named copyin/copyout, and they are used
+ for messages with inline data
+ < braunr> or copyinmsg/copyoutmsg
+ < braunr> vm_map_copyin/out should be used for chunks larger than a page
+ (or roughly a page)
+ < braunr> also, when writing to a task space, see which is better suited:
+ vm_map_copyout or vm_map_copy_overwrite
+ < mcsim> braunr: and what will be src_map for vm_map_copyin/out?
+ < braunr> the caller map
+ < braunr> which you can get with current_map() iirc
+ < mcsim> braunr: thank you
+ < braunr> be careful not to leak anything in the transferred buffers
+ < braunr> memset() to 0 if in doubt
+ < mcsim> braunr:ok
+ < braunr> antrik: vm_map_copyin() is roughly vm_read()
+ < antrik> braunr: what is it used for?
+ < braunr> antrik: 01:11 < antrik> mcsim: Mach doesn't have
+ copy_{from,to}_user -- instead, large chunks of data are transferred as
+ out-of-line data in IPC messages (i.e. using VM magic)
+ < braunr> antrik: that "VM magic" is partly implemented using vm_map_copy*
+ functions
+ < antrik> braunr: oh, you mean it doesn't actually copy data, but only page
+ table entries? if so, that's *not* really comparable to
+ copy_{from,to}_user()...
+
+
+# IRC, freenode, #hurd, 2011-08-28
+
+ < braunr> antrik: the equivalent of copy_{from,to}_user are
+ copy{in,out}{,msg}
+ < braunr> antrik: but when the data size is about a page or more, it's
+ better not to copy, of course
+ < antrik> braunr: it's actually not clear at all that it's really better to
+ do VM magic than to copy...
+
+
+# IRC, freenode, #hurd, 2011-08-29
+
+ < braunr> antrik: at least, that used to be the general idea, and with a
+ simpler VM i suspect it's still true
+ < braunr> mcsim: did you progress on your host_zone_info replacement ?
+ < braunr> mcsim: i think you should stick to what the original
+ implementation did
+ < braunr> which is making an inline copy if the caller provided enough
+ space, using kmem_alloc_pageable otherwise
+ < braunr> specify ipc_kernel_map if using kmem_alloc_pageable
+ < mcsim> braunr: yes. And it works. But I use kmem_alloc, not pageable. Is
+ it worse?
+ < mcsim> braunr: host_zone_info replacement is pushed to savannah
+ repository.
+ < braunr> mcsim: i'll have a look
+ < mcsim> braunr: I've pushed one more commit just now, which relates to
+ host_zone_info.
+ < braunr> mem_alloc_early_init should be renamed mem_bootstrap
+ < mcsim> ok
+ < braunr> mcsim: i don't understand your call to kmem_free
+ < mcsim> braunr: It shouldn't be there?
+ < braunr> why should it be there ?
+ < braunr> you're freeing what the copy object references
+ < braunr> it's strange that it even works
+ < braunr> also, you shouldn't pass infop directly as the copy object
+ < braunr> i guess you get a warning for that
+ < braunr> do what the original code does: use an intermediate copy object
+ and a cast
+ < mcsim> ok
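+
+For reference, a sketch of the copy-object handling being recommended,
+modeled on host_zone_info (cache_info_t and the variable names are
+approximate):
+
+    vm_map_copy_t copy;
+    kern_return_t kr;
+
+    /* Wrap the buffer in a copy object; the memory now belongs to the
+       copy object, so it must not also be freed with kmem_free here. */
+    kr = vm_map_copyin(ipc_kernel_map, info_addr, info_size, TRUE, &copy);
+    assert(kr == KERN_SUCCESS);
+
+    /* Use an intermediate copy object and a cast instead of passing the
+       out parameter directly. */
+    *infop = (cache_info_t *) copy;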
+ < braunr> another error (without consequence but still, you should mind it)
+ < braunr> simple_lock(&mem_cache_list_lock);
+ < braunr> [...]
+ < braunr> kr = kmem_alloc(ipc_kernel_map, &info, info_size);
+ < braunr> you can't hold simple locks while allocating memory
+ < braunr> read how the original implementation works around this
+ < mcsim> ok
+ < braunr> i guess host_zone_info assumes the zone list doesn't change much
+ while unlocked
+ < braunr> or that it's rather unimportant since it's for debugging
+ < braunr> a strict snapshot isn't required
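+
+A sketch of how the original implementation works around this, with
+illustrative names: the lock is dropped before the allocation, which may
+sleep, and the list size is re-checked afterwards:
+
+    for (;;) {
+        simple_lock(&mem_cache_list_lock);
+        if (nr_caches <= max_caches)
+            break;                  /* the current buffer is large enough */
+        simple_unlock(&mem_cache_list_lock);
+
+        /* Allocate only with the lock dropped, then retry, since the
+           cache list may have changed in the meantime. */
+        max_caches = nr_caches;
+        info_size = round_page(max_caches * sizeof(*info));
+        kr = kmem_alloc(ipc_kernel_map, &info, info_size);
+        if (kr != KERN_SUCCESS)
+            return kr;
+    }
+    /* ... copy the cache data out, then simple_unlock ... */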
+ < braunr> list_for_each_entry(&mem_cache_list, cache, node) max_caches++;
+ < braunr> you should really use two separate lines for readability
+ < braunr> also, instead of counting each time, you could just maintain a
+ global counter
+ < braunr> mcsim: use strncpy instead of strcpy for the cache names
+ < braunr> not to avoid overflow but rather to clear the unused bytes at the
+ end of the buffer
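+
+For example (the length of the name field is illustrative):
+
+    /* strncpy NUL-pads the destination up to the given length, so no
+       stale kernel bytes leak into the buffer exported to userspace. */
+    strncpy(info->name, cache->name, sizeof(info->name) - 1);
+    info->name[sizeof(info->name) - 1] = '\0';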
+ < braunr> mcsim: about kmem_alloc vs kmem_alloc_pageable, it's a minor
+ issue
+ < braunr> you're handing off debugging data to a userspace application
+ < braunr> a rather dull reporting tool in most cases, which doesn't require
+ wired down memory
+ < braunr> so in order to better use available memory, pageable memory
+ should be used
+ < braunr> in the future i guess it could become a not-so-minor issue though
+ < mcsim> ok. I'll fix it
+ < braunr> mcsim: have you tried to run the kernel with MC_VERIFY always on
+ ?
+ < braunr> MEM_CF_VERIFY actually
+ < mcsim1> yes.
+ < braunr> oh
+ < braunr> nothing wrong
+ < braunr> ?
+ < mcsim1> it is always set
+ < braunr> ok
+ < braunr> ah, you set it in macros.h ..
+ < braunr> don't
+ < braunr> put it in mem.c if you want, or better, make it a compile-time
+ option
+ < braunr> macros.h is a tiny macro library, it shouldn't define such
+ unrelated options
+ < mcsim1> ok.
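+
+One way to make it a compile-time option instead of a definition in
+macros.h; a sketch:
+
+    /* In mem.c: verification is off unless the build enables it, e.g.
+       with -DMEM_CF_VERIFY=1 in the kernel CFLAGS. */
+    #ifndef MEM_CF_VERIFY
+    #define MEM_CF_VERIFY 0
+    #endif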
+ < braunr> mcsim1: did you try fault injection to make sure the checking
+ code actually works and how it behaves when an error occurs ?
+ < mcsim1> I think that when I finish I'll merge files cpu.h and macros.h
+ with mem.c
+ < braunr> yes that would simplify things
+ < mcsim1> Yes. When I mixed up the types, mem_buf_fill worked incorrectly
+ and a panic occurred.
+ < braunr> very good
+ < braunr> have you progressed concerning the measurements you wanted to do
+ ?
+ < mcsim1> not much.
+ < braunr> ok
+ < mcsim1> I think they will be ready in a few days.
+ < antrik> what measurements are these?
+ < mcsim1> braunr: What's the maximal size for static data and the stack
+ in the kernel?
+ < braunr> what do you mean ?
+ < braunr> kernel stacks are one page if i'm right
+ < braunr> static data (rodata+data+bss) are limited by grub bugs only :)
+ < mcsim1> braunr: probably there are limits, because when I created a too
+ big array I couldn't boot the kernel
+ < braunr> local variable or static ?
+ < mcsim1> static
+ < braunr> how large ?
+ < mcsim1> 4 MiB
+ < braunr> hm
+ < braunr> it's not a grub bug then
+ < braunr> i was able to embed as much as 32 MiB in x15 while doing this
+ kind of tests
+ < braunr> I guess it's the gnu mach boot code which only preallocates one
+ page for the initial kernel mapping
+ < braunr> one PTP (page table page) maps 4 MiB
+ < braunr> (x15 does this completely dynamically, unlike mach or even
+ current BSDs)
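+
+The arithmetic behind that figure, for i386 with 4 KiB pages: one page
+table page holds 1024 32-bit PTEs, each mapping a 4 KiB page, and
+1024 * 4 KiB = 4 MiB, which matches the boot failure seen with a 4 MiB
+static array.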
+ < mcsim1> antrik: First I want to measure the time of each cache
+ creation/allocation/deallocation, and then compile the kernel.
+ < braunr> cache creation is irrelevant
+ < braunr> because of the cpu pools in the new allocator, you should test at
+ least two different allocation patterns
+ < braunr> one with quick allocs/frees
+ < braunr> the other with large numbers of allocs then their matching frees
+ < braunr> (larger being at least 100)
+ < braunr> i'd say the cpu pool layer is the real advantage over the
+ previous zone allocator
+ < braunr> (from a performance perspective)
+ < mcsim1> But there is only one cpu
+ < braunr> it doesn't matter
+ < braunr> it's still a very effective cache
+ < braunr> in addition to reducing contention
+ < braunr> compare mem_cpu_pool_pop() against mem_cache_alloc_from_slab()
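+
+A sketch of why the cpu pool fast path is cheap compared to a slab
+allocation; the field names are illustrative:
+
+    /* Popping from the per-cpu array cache is a decrement and a load,
+       with no free list walking and no global cache lock. */
+    static inline void *mem_cpu_pool_pop(struct mem_cpu_pool *cpu_pool)
+    {
+        return cpu_pool->array[--cpu_pool->nr_objs];
+    }
+
+    /* mem_cache_alloc_from_slab(), by contrast, takes the cache lock,
+       finds a non-empty slab and updates the slab's metadata. */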
+ < braunr> mcsim1: work is needed to polish the whole thing, but getting it
+ actually working is a nice achievement for someone new on the project
+ < braunr> i hope it helped you learn about memory allocation, virtual
+ memory, gnu mach and the hurd in general :)
+ < antrik> indeed :-)