1 files changed, 361 insertions, 0 deletions
diff --git a/service_solahart_jakarta_selatan__082122541663/ext2fs_page_cache_swapping_leak.mdwn b/service_solahart_jakarta_selatan__082122541663/ext2fs_page_cache_swapping_leak.mdwn
new file mode 100644
index 00000000..81915492
--- /dev/null
+++ b/service_solahart_jakarta_selatan__082122541663/ext2fs_page_cache_swapping_leak.mdwn
@@ -0,0 +1,361 @@
+[[!meta copyright="Copyright © 2011, 2012, 2013 Free Software Foundation,
+Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach open_issue_hurd]]
+
+There is a [[!FF_project 272]][[!tag bounty]] on this task.
+
+[[!toc]]
+
+
+# IRC, OFTC, #debian-hurd, 2011-03-24
+
+    <youpi> I still believe we have an ext2fs page cache swapping leak, however
+    <youpi> as the 1.8GiB swap was full, yet the ld process was only 1.5GiB big
+    <pinotree> a leak at swapping time, you mean?
+    <youpi> I mean the ext2fs page cache being swapped out instead of simply
+      dropped
+    <pinotree> ah
+    <pinotree> so the swap tends to accumulate unuseful stuff, i see
+    <youpi> yes
+    <youpi> the disk content, basicallyt :)
+
+
+# IRC, freenode, #hurd, 2011-04-18
+
+    <antrik> damn, a cp -a simply gobbles down swap space...
+    <braunr> really ?
+    <braunr> that's weird
+    <braunr> why would a copy use so much anonymous memory ?
+    <braunr> unless the external pager is so busy that the kernel falls back to
+      its default pager
+    <youpi> that's what I suggested some time ago
+    <braunr> maybe this case should be traced in the kernel
+    <braunr> a simple message in the kernel buffer to warn that this condition
+      happened may help
+    <youpi> I'm seeing swap space being kept used on buildds for no real reason
+      except possibly backing ext2fs pages
+    <youpi> that could help, yes
+    <antrik> youpi: I think it was actually slpz who suggested that...
+    <youpi> I think we're generally missing feedback from memory behavior
+    <antrik> youpi: do you think andrei's kernel instrumentation work might be
+      helpful with analyzing such things?
+    <youpi> antrik: I think I suggested it too, but never mind
+    <youpi> antrik: no, because it's not a trace of events that you want
+    <youpi> some specific events would be useful
+    <youpi> but then we don't really need a whole framework for that
+    <antrik> apt-get upgrade eats swap too
+    <youpi> the upgrade itself, or the computation of the ugprade?
+    <youpi> apt is a memory eater nowadays
+    <antrik> installing the packages
+    <antrik> seems to have stabilized though after a while...
+    <antrik> so perhaps it's not a leak in this case
+    <youpi> ideally we should have a way to know what was put in the swap
+    <braunr> how would you represent what's in the swap ?
+    <antrik> the apt-get process has 46M of virtual memory above the 128 M
+      baseline
+    <braunr> mostly libraries i guess
+    <braunr> are trheads stacks 8 MiB like on Linux ?
+    <youpi> braunr: at least knowing how much of each process is in the swap
+    <youpi> braunr: 2MiB
+    <braunr> ok
+    <youpi> vminfo could also report which parts of the address space are in
+      the swap
+    <antrik> youpi: would be nice to have some simple utility reporting how
+      much of a process' address space is anonymous
+    <antrik> (in fact, I wonder why it's not reported by standard tools such as
+      ps or top... this shouldn't be too difficult I would think?)
+    <antrik> it would be much more useful information than the total virt size,
+      which includes rather meaningless disk and device mappings...
+    <youpi> agreed
+    <braunr> well
+    <braunr> there are tools like pmap for this
+    <braunr> unfortunately, it's difficult in mach to know what backs a
+      non-anonymous mapping
+    <braunr> pagers should be able to name their mappings
+    <youpi> that'd be helpful for debugging yes
+    <braunr> there is almost no overhead in doing that, and it would be very
+      useful
+    <youpi> and could lead to /proc/pid/maps
+    <braunr> yes
+    <braunr> isn't there a maps already ?
+    <youpi> nope
+    <braunr> ok
+    <youpi> (probably not very useful without the names)
+    <braunr>  ithought i remembered maps without names, and guessed it might
+      have been on the hurd for that reason
+    <braunr> but i'm not sure
+    <youpi> there's the vminfo command, yes
+    <braunr> 14:06 < youpi> braunr: at least knowing how much of each process
+      is in the swap
+    <braunr> wouldn't it be clearer to do it the other way around ?
+    <braunr> like a swapinfo tool indicating what it contains ?
+    <youpi> sure, but it's a lot more difficult
+    <braunr> really ?
+    <braunr> why ?
+    <youpi> because you have to traverse all the mappings
+    <youpi> etc
+    <youpi> (in all processes, I mean)
+    <youpi> and you have to name what is waht
+    <braunr> there are other ways
+    <braunr> the swap is a central structure
+    <youpi> while simply introducing the swap %  in vminfo
+    <youpi> for a given process you know what is what
+    <braunr> right
+    <youpi> and doing that introduction is  probably very simple
+    <braunr> that's a good point
+    <braunr> top-down is effectively easier than bottom-up resolution in Mach
+      VM
+    <antrik> hm... the memory use caused by cp doesn't seem to be reflected in
+      the virtual size of any particular process
+    <antrik> ghost memory
+    <braunr> what's cp vmsize at the time of the problem ?
+    <antrik> it's at 134 M right now... so considering the 128 M baseline,
+      nothing worth speaking of
+    <braunr> right
+    <braunr> maybe a copy map during I/O
+    <braunr> but I don't know Mach copy maps in detail, as they have been
+      eliminated from UVM
+    <antrik> BTW, the memory eatup happens even before swap comes into
+      play... swapping seems to be a result of the problem, not the cause
+    <braunr> what do you mean ?
+    <braunr> I thought swapping was the issue
+    <braunr> you mean RAM is full before swapping ?
+    <antrik> well, I don't know what the actual problem is... I just don't
+      understand why the memory use increases without any particular process
+      seeing an increase in size
+    <antrik> the "free" size in vmstat decreses
+    <antrik> once it's eatun up, swap space use increases
+    <braunr> well it doesn't change much of it
+    <braunr> the anonymous memory pager will use RAM before resorting to the
+      external default-pager
+    <antrik> I would suspect normal block caching... but then, shouldn't this
+      show up in the memory info of the ext2 process?
+    <braunr> although, again, I'm not sure of the behaviour of the anonymous
+      memory pager
+    <braunr> antrik: I don't know how block caching behaves
+    <antrik> BTW, is it a know problem that doing ^C on a "cp -a" seems to hang
+      the whole system?...
+    <antrik> (the whole hurd instance that is... the other instance is not
+      affected)
+    <youpi> not that I know of
+    <braunr> seems like a deadlock in the anonymous memory handling
+    <youpi> (and I've never seen that)
+    <antrik> happens both in my main system (using ancient hurd/libc) and in my
+      subhurd (recently upgraded to current stuff)
+    <antrik> this make testing this stuff quite a lot harder... [sigh]
+    <antrik> any suggestions how to debug this hang?
+    <braunr> antrik: no :/
+
+2011-04-28: [[!taglink open_issue_documentation]]
+
+    <antrik> hm... is it normal that "swap free" doesn't increase as a process'
+      memory is paged back in?
+    <youpi> yes
+    <youpi> there's no real use cleaning swap
+    <youpi> on the contrary, it makes paging the process out again longer
+    <antrik> hm... so essentially, after swapping back and forth a bit, a part
+      of the swap equal to the size of physical RAM will be occupied with stuff
+      that is actually in RAM?
+    <youpi> yes
+    <youpi> so that that RAM can be freed immediately if needed
+    <antrik> hm... that means my effective swap size is only like 300 MB... no
+      wonder I see crashes under load
+    <antrik> err... make that 230 actually
+    <antrik> indeed, quitting the application freed both the physical RAM and
+      swap space
+    <braunr> 02:28 < antrik> hm... is it normal that "swap free" doesn't
+      increase as a process' memory is paged back in?
+    <braunr> swap is the backing store of anonymous memory, like ext2fs is the
+      backing store of memory objects created from its pager
+    <braunr> so you can view swap as the file system for everything that isn't
+      an external memory object
+
+
+# IRC, freenode, #hurd, 2011-11-15
+
+    <braunr> hm, now my system got unstable
+    <braunr> swap is increasing, without any apparent reason
+    <antrik> you mean without any load?
+    <braunr> with load, yes
+    <braunr> :)
+    <antrik> well, with load is "normal"...
+    <antrik> at least for some loads
+    <braunr> i can't create memory pressure to stress reclaiming without any
+      load
+    <antrik> what load are you using?
+    <braunr> ftp mirrorring
+    <antrik> hm... never tried that; but I guess it's similar to apt-get
+    <antrik> so yes, that's "normal". I talked about it several times, and also
+      wrote to the ML
+    <braunr> antrik: ok
+    <antrik> if you find out how to fix this, you are my hero ;-)
+    <braunr> arg :)
+    <antrik> I suspect it's the infamous double swapping problem; but that's
+      just a guess
+    <braunr> looks like this
+    <antrik> BTW, if you give me the exact command, I could check if I see it
+      too
+    <braunr> i use lftp (mirror -Re) from a linux git repository
+    <braunr> through sftp
+    <braunr> (lots of small files, big content)
+    <antrik> can't you just give me the exact command? I don't feel like
+      figuring it out myself
+    <braunr> antrik: cd linux-stable; lftp sftp://hurd_addr/
+    <braunr> inside lftp: mkdir linux-stable; cd linux-stable; mirror -Re
+    <braunr> hm, half of physical memory just got freed
+    <braunr> our page cache is really weird :/
+    <braunr> (i didn't delete any file when that happened)
+    <antrik> hurd_addr?
+    <braunr> ssh server ip address
+    <braunr> or name
+    <braunr> of your hurd :)
+    <antrik> I'm confused. you are mirroring *from* the Hurd box?
+    <braunr> no, to it
+    <antrik> ah, so you login via sftp and then push to it?
+    <braunr> yes
+    <braunr> fragmentation looks very fine
+    <braunr> even for the huge pv_entry cache and its 60k+ entries
+    <braunr> (and i'm running a kernel with the cpu layer enabled)
+    <braunr> git reset/status/diff/log/grep all work correctly
+    <braunr> anyway, mcsim's branch looks quite stable to me
+    <antrik> braunr: I can't reproduce the swap leak with ftp. free memory
+      idles around 6.5 k (seems to be the threshold where paging starts), and
+      swap use is constant
+    <antrik> might be because everything swappable is already present in swap
+      from previous load I guess...
+    <antrik> err... scratch that. was connected to the wrong host, silly me
+    <antrik> indeed swap gets eaten away, as expected
+    <antrik> but only if free memory actually falls below the
+      threshold. otherwise it just oscillates around a constant value, and
+      never touches swap
+    <antrik> so this seems to confirm the double swapping theory
+    <youpi> antrik: is that "double swap" theory written somewhere?
+    <youpi> (no, a quick google didn't tell me)
+
+
+## IRC, freenode, #hurd, 2011-11-16
+
+    <antrik> youpi:
+      http://lists.gnu.org/archive/html/l4-hurd/2002-06/msg00001.html talks
+      about "double paging". probably it's also the term others used for it;
+      however, the term is generally used in a completely different meaning, so
+      I guess it's not really suitable for googling either ;-)
+    <antrik> IIRC slpz (or perhaps someone else?) proposed a solution to this,
+      but I don't remember any details
+    <youpi> ok so it's the same thing I was thinking about with swap getting
+      filled
+    <youpi> my question was: is there something to release the double swap,
+      once the ext2fs pager managed to recover?
+    <antrik> apparently not
+    <antrik> the only way to free the memory seems to be terminating the FS
+      server
+    <youpi> uh :/
+
+
+# IRC, freenode, #hurd, 2011-11-30
+
+    <antrik> slpz: basically, whenever free memory goes below the paging
+      threshold (which seems to be around 6 MiB) while there is other I/O
+      happening, swap usage begins to increase continuously; and only gets
+      freed again when the filesystem translator in question exits
+    <antrik> so it sounds *very* much like pages go to swap because the
+      filesystem isn't quick enough to properly page them out
+    <antrik> slpz: I think it was you who talked about double paging a while
+      back?
+    <slpz> antrik: probably, sounds like me :-)
+    <antrik> slpz: I have some indication that the degenerating performance and
+      ultimate hang issues I'm seeing are partially or entirely caused by
+      double paging...
+    <antrik> slpz: I don't remember, did you propose some possible fix?
+    <slpz> antrik: hmm... perhaps it wasn't me, because I don't remember trying
+      to fix that problem...
+    <slpz> antrik: at which point do you think pages get duplicated?
+    <antrik> slpz: it was a question. I don't remember whether you proposed
+      something or not :-)
+    <antrik> slpz: basically, whenever free memory goes below the paging
+      threshold (which seems to be around 6 MiB) while there is other I/O
+      happening, swap usage begins to increase continuously; and only gets
+      freed again when the filesystem translator in question exits
+    <antrik> so it sounds *very* much like pages go to swap because the
+      filesystem isn't quick enough to properly page them out
+    <slpz> antrik: I see
+    <slpz> antrik: I didn't addressed this problem directly, but when I've
+      modified the pageout mechanism to provide a special treatment for
+      external pages, I also removed the possibility of sending them to the
+      default pager
+    <slpz> antrik: this was in my experimental environment, of course
+    <antrik> slpz: oh, nice... so it may fix the issues I'm seeing? :-)
+    <antrik> anything testable yet?
+    <slpz> antrik: yes, only anonymous memory could be swapped with that
+    <slpz> antrik: it works, but is ugly as hell
+    <antrik> tschwinge: these is also your observation about compilations
+      getting slower on further runs, and my followups... I *suspect* it's the
+      same issue
+
+[[performance/degradation]].
+
+    <slpz> antrik: I'm thinking about establishing a repository for these
+      experimental versions, so they don't get lost with the time
+    <antrik> slpz: please do :-)
+    <slpz> antrik: perhaps in savannah's HARD project
+    <antrik> even if it's not ready for upstream, it would be nice if I could
+      test it -- right now it's bothering me more than any other Hurd issues I
+      think...
+    <slpz> also, there's another problem which causes performance degradation
+      with the simple use of the system
+    <tschwinge> slpz: Please just push to Savannah Hurd.  Under your
+      slpz/... or similar.
+    <tschwinge> antrik: Might very well be, yes.
+    <slpz> and I almost sure it is the fragmentation of the task map
+    <slpz> tschwinge: ok
+    <slpz> after playing a bit with a translator, it can easily get more than
+      3000 entries in its map
+    <antrik> slpz: yeah, other issues might play a role here as well. I
+      observed that terminating the problematic FS servers does free most of
+      the memory and remove most of the performance degradation, but in some
+      cases it's still very slow
+    <slpz> that makes vm_map_lookup a lot slower
+    <antrik> on a related note: any idea what can cause paging errors and a
+      system hang even when there is plenty of free swap?
+    <antrik> (I'm not entirely sure, but my impression is that it *might* be
+      related to the swap usage and performance degradation problems)
+    <slpz> I think this degree of fragmentation has something to do with the
+      reiterative mapping of memory objects which is done in pager-memcpy.c
+    <slpz> antrik: which kind of paging errors?
+    <antrik> hm... I don't think I ever noted down the exact message; but I
+      think it's the same you get when actually running out of swap
+    <slpz> antrik: that could be the default pager dying for some internal bug
+    <antrik> well, but it *seems* to go along with the performance degradation
+      and/or swap usage
+    <slpz> I also have the impression that we're using memory objects the wrong
+      way
+    <antrik> basically, once I get to a certain level of swap use and slowness
+      (after about a month of use), the system eventually dies
+    <slpz> antrik: I never had a system running for that time, so it could be a
+      completely different problem from what I've seen before :-/
+    <slpz> Anybody has experience with block-level caches on microkernel
+      environments?
+    <antrik> slpz: yeah, it typically happens after about a month of my normal
+      use... but I can significantly accellerate it by putting some problematic
+      load on it, such as large apt-get runs...
+    <slpz> I wonder if it would be better to put them in kernel or in user
+      space. And in the latter, if it would be better to have one per-device
+      shared for all accesing translators, or just each task should have its
+      own cache...
+    <antrik> slpz:
+      http://lists.gnu.org/archive/html/bug-hurd/2011-09/msg00041.html is where
+      I described the issue(s)
+    <antrik> (should send another update for the most recent findings I
+      guess...)
+    <antrik> slpz: well, if we move to userspace drivers, the kernel part of
+      the question is already answered ;-)
+    <antrik> but I'm not sure about per-device cache vs. caching in FS server