[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach open_issue_glibc open_issue_hurd]]

Issues relating to system behavior under memory pressure.

[[!toc]]


# [[gnumach_page_cache_policy]]


# IRC, freenode, #hurd, 2012-07-08

    <braunr> am i mistaken or is the default pager simply not vm privileged ?
    <braunr> (which would explain the hangs when memory is very low)
    <youpi> no idea
    <youpi> but that's very possible
    <youpi> we start it by hand from the init scripts
    <braunr> actually, i see no way provided by mach to set that
    <braunr> i'd assume it would set the property when a thread would register
      itself as the default pager, but it doesn't
    <braunr> i'll check at runtime and see if fixing helps
    <youpi> thread_wire(host, thread, 1) ?
    <youpi> ./hurd/mach-defpager/wiring.c:	kr =
      thread_wire(priv_host_port,
    <braunr> no
    <braunr> look in cprocs.c
    <braunr> iirc
    <braunr> iiuc, it sets a 1:1 kernel/user mapping
    <youpi> ??
    <youpi> thread_wire, not cthread_wire
    <braunr> ah
    <braunr> right, i'm getting tired
    <braunr> youpi: do you understand the comment in default_pager_thread() ?
    <youpi> well, I'm not sure to know what external vs internal is
    <braunr> i'm almost sure the default pager is blocked because of a relation
      with an unprivileged thread
    <braunr> when hangs happen, the pageout daemon is still running, waiting
      for an event so it can continue

    <braunr> all right, our pageout stuff completely sucks
    <braunr> when you think the system is hung, it's actually not
    <pinotree> and what's happening instead?
    <braunr> instead, it seems it's in a very complex recursive state which
      ends in the slab allocator not being able to allocate kernel map entries
    <braunr> the pageout daemon, unable to continue, progressively slows
    <braunr> in hope the default pager is able to service the pageout requests,
      but it's not
    <braunr> probably the most complicated deadlock i've seen :)
    <braunr> luckily !
    <braunr> i've been playing with some tunables involved in waking up the
      pageout daemon
    <braunr> and got good results so far
    <braunr> (although it's clearly not a proper solution)
    <braunr> one thing the kernel lacks is a way to separate clean from dirty
      pages
    <braunr> this stupid kernel doesn't try to free clean pages first .. :)
    <braunr> hm
    <braunr> now i can see the system recover, but some applications are still
      stuck :(
    <braunr> (but don't worry, my tests are rather aggressive)
    <braunr> what i mean by aggressive is several builds and various dd of a
      few hundred MiB in parallel, on various file systems
    <braunr> so far the file systems have been very resilient
    <braunr> ok, let's try running the hurd with 64 MiB of RAM
    <braunr> after some initial swapping, it runs smoothly :)
    <braunr> uh ?
    <braunr> ah no, i'm still doing my parallel builds
    <braunr> although less
    <braunr> gcc: internal compiler error: Resource lost (program as)
    <braunr> arg
    <braunr> lol
    <braunr> the file system crashed under the compiler
    <pinotree> too much memory required during linking? or ram+swap should have
      been enough?
    <braunr> there is a lot of swap, i doubt it
    <braunr> the hurd is such a dumb and impressive system at the same time
    <braunr> pinotree: what does this tell you ?
    <braunr> git: hurdsig.c:948: post_signal: Unexpected error: (os/kern)
      failure.
    <pinotree> something samuel spots often during the builds of haskell
      packages

Probably also the *sigpost* case mentioned in [[!message-id
"87bol6aixd.fsf@schwinge.name"]].

    <braunr> actually i should be asking jkoenig
    <braunr> it seems the lack of memory has a strong impact on signal delivery
    <braunr> which is bad
    <antrik> braunr: I have a vague recollection of slpz also saying something
      about missing dirty page tracking a while back... I might be confusing
      stuff though
    <braunr> pinotree: yes it happens often during links
    <braunr> which makes sense
    <pinotree> braunr: "happens often" == "hurdsig.c:948: post_signal: ..."?
    <braunr> yes
    <pinotree> if you can reproduce it often, what about debugging it? :P
    <braunr> i mean, the few times i got it, it was often during a link :p
    <braunr> i'd rather debug the pageout deadlock :(
    <braunr> but it's hard
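
The remark in the log above that the kernel has no way to separate clean from
dirty pages, and so does not free clean pages first, can be illustrated with a
small user-space sketch.  This is not GNU Mach code: `struct page` and
`reclaim_clean_first` are hypothetical names made up for the illustration, and
the real pageout daemon works on page queues rather than a flat array.  The
sketch only shows the ordering idea: clean pages can be dropped without any
I/O, so they are reclaimed first, and dirty pages (which need the pager to
write them back) are only considered if that was not enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor; NOT GNU Mach's struct vm_page. */
struct page {
    bool in_use;  /* page currently holds data */
    bool dirty;   /* contents differ from the backing store */
};

/*
 * Try to reclaim up to `wanted` pages from `pages[0..n)`.
 *
 * Pass 1 frees clean pages, which requires no I/O and so does not depend
 * on any pager making progress.  Pass 2 runs only for the shortfall and
 * counts the dirty pages that would have to be written back first.
 *
 * Returns the number of pages freed immediately and stores the number of
 * dirty pages that would need writeback in *queued_for_writeback.
 */
size_t reclaim_clean_first(struct page *pages, size_t n, size_t wanted,
                           size_t *queued_for_writeback)
{
    size_t freed = 0;
    size_t queued = 0;

    assert(pages != NULL || n == 0);
    assert(queued_for_writeback != NULL);

    /* Pass 1: clean pages can simply be dropped. */
    for (size_t i = 0; i < n && freed < wanted; i++) {
        if (pages[i].in_use && !pages[i].dirty) {
            pages[i].in_use = false;
            freed++;
        }
    }

    /* Pass 2: cover the remaining shortfall with dirty pages. */
    for (size_t i = 0; i < n && freed + queued < wanted; i++) {
        if (pages[i].in_use && pages[i].dirty)
            queued++;
    }

    *queued_for_writeback = queued;
    return freed;
}
```

With five pages of which two are clean and in use, asking for three pages
frees the two clean ones at once and reports one dirty page as needing
writeback.  Under this ordering a request only depends on the pager once the
clean pages run out.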