From be4193108513f02439a211a92fd80e0651f6721b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 30 Nov 2011 21:21:45 +0100
Subject: IRC.

---
 open_issues/gnumach_memory_management.mdwn | 202 +++++++++++++++++++++++++++++
 1 file changed, 202 insertions(+)

diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn
index 9a4418c1..c9c3e64f 100644
--- a/open_issues/gnumach_memory_management.mdwn
+++ b/open_issues/gnumach_memory_management.mdwn
@@ -1810,3 +1810,205 @@ There is a [[!FF_project 266]][[!tag bounty]] on this task.
     etenil: but mcsim's work is, for one, useful because the
       allocator code is much clearer, adds some debugging support, and is
       smp-ready
+
+
+# IRC, freenode, #hurd, 2011-11-14
+
+    i've just realized that replacing the zone allocator removes most
+      (if not all) static limit on allocated objects
+    as we have nothing similar to rlimits, this means kernel resources
+      are actually exhaustible
+    and i'm not sure every allocation is cleanly handled in case of
+      memory shortage
+    youpi: antrik: tschwinge: is this acceptable anyway ?
+    (although IMO, it's also a good thing to get rid of those limits
+      that made the kernel panic for no valid reason)
+    there are actually not many static limits on allocated objects
+    only a few have one
+    those defined in kern/mach_param.h
+    most of them are not actually enforced
+    ah ?
+    they are used at zinit() time
+    i thought they were
+    yes, but most zones are actually fine with overcoming the max
+    ok
+    see zone->max_size += (zone->max_size >> 1);
+    you need both !EXHAUSTIBLE and FIXED
+    ok
+    making having rlimits enforced would be nice...
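The growth behaviour quoted above (`zone->max_size += (zone->max_size >> 1);`) can be sketched as a small C model. This is a simplified illustration based only on the discussion in the log; the struct layout and flag names are stand-ins, not the actual gnumach zone definitions. The point it shows: unless a zone carries a flag that enforces its limit, hitting `max_size` just raises the limit by 50%, so most of the static limits never actually bind.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the real zone flags; the exact semantics
 * in gnumach differ slightly. */
#define ZONE_EXHAUSTIBLE 0x1    /* allocation fails when the max is hit */
#define ZONE_FIXED       0x2    /* zone may never grow past max_size */

struct zone {
    size_t cur_size;    /* bytes currently allocated from the zone */
    size_t max_size;    /* nominal limit */
    size_t elem_size;   /* size of one object */
    int flags;
};

/* Returns 1 if allocating one more element may proceed, 0 if the zone
 * is treated as exhausted.  For an ordinary zone (no flags), reaching
 * max_size silently grows the limit by 50% instead of failing. */
static int zone_may_alloc(struct zone *z)
{
    if (z->cur_size + z->elem_size > z->max_size) {
        if (z->flags & (ZONE_EXHAUSTIBLE | ZONE_FIXED))
            return 0;                       /* limit actually enforced */
        z->max_size += z->max_size >> 1;    /* limit quietly raised */
    }
    z->cur_size += z->elem_size;
    return 1;
}
```

With `cur_size` near `max_size`, an unflagged zone grows its maximum (for example from 100 to 150 bytes) rather than failing, matching the "zones are actually fine with overcoming the max" remark.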
+    s/making//
+    pinotree: the kernel wouldn't handle many standard rlimits anyway
+
+    i've just committed my final patch on mcsim's branch, which will
+      serve as the starting point for integration
+    which means code in this branch won't change (or only last minute
+      changes)
+    you're invited to test it
+    there shouldn't be any noticeable difference with the master
+      branch
+    a bit less fragmentation
+    more memory can be reclaimed by the VM system
+    there are debugging features
+    it's SMP ready
+    and overall cleaner than the zone allocator
+    although a bit slower on the free path (because of what's
+      performed to reduce fragmentation)
+    but even "slower" here is completely negligible
+
+
+# IRC, freenode, #hurd, 2011-11-15
+
+    I enabled cpu_pool layer and kentry cache exhausted at "apt-get
+      source gnumach && (cd gnumach-* && dpkg-buildpackage)"
+    I mean kernel with your last commit
+    braunr: I'll make patch how I've done it in a few minutes, ok? It
+      will be more specific.
+    mcsim: did you just remove the #if NCPUS > 1 directives ?
+    no. I replaced macro NCPUS > 1 with SLAB_LAYER, which equals NCPUS
+      > 1, then I redefined macro SLAB_LAYER
+    ah, you want to make the layer optional, even on UP machines
+    mcsim: can you give me the commands you used to trigger the
+      problem ?
+    apt-get source gnumach && (cd gnumach-* && dpkg-buildpackage)
+    mcsim: how much ram & swap ?
+    let's see if it can handle a quite large aptitude upgrade
+    how can I check swap size?
+    free
+    cat /proc/meminfo
+    top
+    whatever
+                 total       used       free     shared    buffers
+      cached
+    Mem:        786368     332296     454072          0          0
+      0
+    -/+ buffers/cache:     332296     454072
+    Swap:      1533948          0    1533948
+    ok, i got the problem too
+    braunr: do you run hurd in qemu?
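The `SLAB_LAYER` change mcsim describes might look roughly like the fragment below. This is a hypothetical reconstruction for illustration, not the actual patch: the CPU pool layer is guarded by its own macro, which defaults to `NCPUS > 1` but can be overridden at build time, so the layer can also be enabled on uniprocessor kernels.

```c
#include <assert.h>

/* NCPUS normally comes from the kernel configuration; default to a
 * uniprocessor build here for illustration. */
#ifndef NCPUS
#define NCPUS 1
#endif

/* Instead of testing NCPUS > 1 directly in the allocator, guard the
 * CPU pool layer with its own macro.  Building with -DSLAB_LAYER=1
 * would enable the layer even when NCPUS == 1. */
#ifndef SLAB_LAYER
#define SLAB_LAYER (NCPUS > 1)
#endif

static int cpu_pool_layer_enabled(void)
{
#if SLAB_LAYER
    return 1;
#else
    return 0;
#endif
}
```

With the defaults above the layer stays disabled, exactly as a plain `#if NCPUS > 1` would behave; the indirection only adds the override point.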
+    yes
+    i guess the cpu layer increases fragmentation a bit
+    which means more map entries are needed
+    hm, something's not right
+    there are only 26 kernel map entries when i get the panic
+    i wonder why the cache gets that stressed
+    hm, reproducing the kentry exhaustion problem takes quite some
+      time
+    braunr: what do you mean?
+    sometimes, dpkg-buildpackage finishes without triggering the
+      problem
+    the problem is in apt-get source gnumach
+    i guess the problem happens because of drains/fills, which
+      allocate/free many more objects than actually preallocated at boot time
+    ah ?
+    ok
+    i've never had it at that point, only later
+    i'm unable to trigger it currently, eh
+    do you use *-dbg kernel?
+    yes
+    well, i use the compiled kernel, with the slab allocator, built
+      with the in kernel debugger
+    when you run apt-get source gnumach, you run it in clean directory?
+      Or there are already present downloaded archives?
+    completely empty
+    ah just got it
+    ok the limit is reached, as expected
+    i'll just bump it
+    the cpu layer drains/fills allocate several objects at once (64 if
+      the size is small enough)
+    the limit of 256 (actually 252 since the slab descriptor is
+      embedded in its slab) is then easily reached
+    mcsim: most direct way to check swap usage is vmstat
+    damn, i can't live without slabtop and the amount of
+      active/inactive cache memory any more
+    hm, weird, we have active/inactive memory in procfs, but not
+      buffers/cached memory
+    we could set buffers to 0 and everything as cached memory, since
+      we're currently unable to communicate the purpose of cached memory
+      (whether it's used by disk servers or file system servers)
+    mcsim: looks like there are about 240 kernel map entries (i forgot
+      about the ones used in kernel submaps)
+    so yes, adding the cpu layer is what makes the kernel reach the
+      limit more easily
+    braunr: so just increasing limit will solve the problem?
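The arithmetic behind the exhaustion is easy to check with the numbers quoted above: 256 statically allocated kentry slots, of which 252 are usable because the slab descriptor is embedded in the slab, CPU pool fills of 64 objects at a time, and roughly 240 entries already in use. The helper below is purely illustrative, not kernel code.

```c
#include <assert.h>

/* Numbers taken from the log above. */
enum {
    KENTRY_TOTAL   = 256,   /* statically allocated kentry slots */
    KENTRY_USABLE  = 252,   /* slab descriptor embedded in the slab */
    CPU_POOL_BATCH = 64,    /* objects moved per CPU pool fill */
};

/* Returns 1 if a pool fill of CPU_POOL_BATCH entries still fits on
 * top of the entries already in use, 0 if it would exhaust the
 * static kentry area. */
static int kentry_fill_fits(int in_use)
{
    return in_use + CPU_POOL_BATCH <= KENTRY_USABLE;
}
```

With about 240 entries in use, a single 64-object fill overruns the 252 usable slots, which is why enabling the CPU pool layer made the kernel hit the limit so much more easily.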
+    mcsim: yes
+    slab reclaiming looks very stable
+    and unfrequent
+    (which is surprising)
+    braunr: "unfrequent"?
+    pinotree: there isn't much memory pressure
+    slab_collect() gets called once a minute on my hurd
+    or is it infrequent ?
+    :)
+    i have no idea :)
+    infrequent, yes
+
+
+# IRC, freenode, #hurd, 2011-11-16
+
+    for those who want to play with the slab branch of gnumach, the
+      slabinfo tool is available at http://git.sceen.net/rbraun/slabinfo.git/
+    for those merely interested in numbers, here is the output of
+      slabinfo, for a hurd running in kvm with 512 MiB of RAM, an unused swap,
+      and a short usage history (gnumach debian packages built, aptitude
+      upgrade for a dozen of packages, a few git commands)
+    http://www.sceen.net/~rbraun/slabinfo.out
+    braunr: numbers for a long usage history would be much more
+      interesting :-)
+
+
+## IRC, freenode, #hurd, 2011-11-17
+
+    antrik: they'll come :)
+    is something going on on darnassus? it's mighty slow
+    yes
+    i've rebooted it to run a modified kernel (with the slab
+      allocator) and i'm building stuff on it to stress it
+    (i don't have any other available machine with that amount of
+      available physical memory)
+    ok
+    braunr: probably would be actually more interesting to test under
+      memory pressure...
+    guess that doesn't make much of a difference for the kernel object
+      allocator though
+    antrik: if ram is larger, there can be more objects stored in
+      kernel space, then, by building something large such as eglibc, memory
+      pressure is created, causing caches to be reaped
+    our page cache is useless because of vm_object_cached_max
+    it's a stupid arbitrary limit masking the inability of the vm to
+      handle pressure correctly
+    if removing it, the kernel freezes soon after ram is filled
+    antrik: it may help trigger the "double swap" issue you mentioned
+    what may help trigger it?
+    not checking this limit
+    hm...
+    indeed I wonder whether the freezes I see might have the
+      same cause
+
+
+## IRC, freenode, #hurd, 2011-11-19
+
+    http://www.sceen.net/~rbraun/slabinfo.out <= state of the slab
+      allocator after building the debian libc packages and removing all files
+      once done
+    it's mostly the same as on any other machine, because of the
+      various arbitrary limits in mach (most importantly, the max number of
+      objects in the page cache)
+    fragmentation is still quite low
+    braunr: actually fragmentation seems to be lower than on the other
+      run...
+    antrik: what makes you think that ?
+    the numbers of currently unused objects seem to be in a similar
+      range IIRC, but more of them are reclaimable I think
+    maybe I'm misremembering the other numbers
+    there had been more reclaims on the other run
+
+
+# IRC, freenode, #hurd, 2011-11-25
+
+    mcsim: i've just updated the slab branch, please review my last
+      commit when you have time
+    braunr: Do you mean compilation/tests?
+    no, just a quick glance at the code, see if it matches what you
+      intended with your original patch
+    braunr: everything is ok
+    good
+    i think the branch is ready for integration
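As a rough illustration of the fragmentation comparison in the 2011-11-19 log: under the assumption that a free object can only be returned to the VM system once every object on its slab is free (so the whole slab can be reclaimed by `slab_collect()`), aggregate counters like the ones slabinfo prints give a best-case bound on reclaimable slabs. The struct and numbers below are illustrative, not actual slabinfo fields.

```c
#include <assert.h>

/* Aggregate per-cache counters, in the spirit of a slabinfo line. */
struct cache_stats {
    int objs_per_slab;  /* objects held by one slab */
    int slabs;          /* slabs currently owned by the cache */
    int used_objs;      /* objects handed out to the kernel */
};

/* Objects sitting free in the cache. */
static int free_objs(const struct cache_stats *c)
{
    return c->slabs * c->objs_per_slab - c->used_objs;
}

/* Best case: if the free objects happened to be packed onto as few
 * slabs as possible, this many whole slabs could be reclaimed.  In
 * the worst case the same free objects are spread one per slab and
 * nothing is reclaimable: that gap is the fragmentation the logs
 * above are comparing between runs. */
static int max_reclaimable_slabs(const struct cache_stats *c)
{
    return free_objs(c) / c->objs_per_slab;
}
```

For example, a cache with 10 slabs of 16 objects and 100 objects in use has 60 free objects, of which at most 3 full slabs' worth could ever be handed back to the VM system.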