[[!meta copyright="Copyright © 2010, 2012, 2014 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_hurd]]

Deadlocks in libpager/periodic sync have been found.

# [[gnumach_page_cache_policy]]

## IRC, freenode, #hurd, 2012-07-12

ah great, a paper about the mach pageout daemon !
braunr: Where is paper about the mach pageout daemon?
ftp://ftp.cs.cmu.edu/project/mach/doc/published/defaultmm.ps
might give us a clue about the swap deadlock (although i still have a few ideas to check)
http://www.sceen.net/~rbraun/moving_the_default_memory_manager_out_of_the_mach_kernel.pdf
we should more seriously consider sergio's advisory pageout branch some day

[[user/Sergio_Lopez]], [[gnumach_memory_management_2]].

i'll try to get in touch with him about that before he completely loses interest
i'll include it in my "make that page cache as decent as possible" task
many of his comments match what i've seen
and we both did a few optimizations the same way
(like not deactivating pages when they enter the cache)

## IRC, freenode, #hurd, 2012-07-13

antrik: i'm able to consistently reproduce the swap deadlocks you regularly had when using apt with my page cache patch
it happens when lots of dirty pages are write back to their pagers
so apt, or a big file copy or anything that writes several MiB very quickly is a good candidate
written*
braunr: nice...
antrik: well in a way, yes, as it will allow us to track it more easily

## IRC, freenode, #hurd, 2012-07-15

oh btw, i think i can say with confidence that the hurd *doesn't* deadlock (at least, concerning swapping)
lol, one of my hurd systems has been hitting the "swap deadlock" for more than an hour, and suddenly got out of it
something is really wrong in the pageout daemon, but it's not a deadlock
a livelock then
do you get out of livelocks ?
i mean, it's not even a "lock"
just a big damn tricky slowdown
yes, you can, by giving a few more resources for instance
depends on the kind of livelock of course
i think it's that the pageout daemon clearly throttles itself, waiting for pagers to complete
and another dangerous thing is the line in vm_resident, which only wakes one thread to avoid starvation
hum, during the livelock, the kernel spends much time waiting in db_read_address
could be a bad stack
so, the pageout daemon seems to slow itself as much as waiting several seconds between each iteration when under load
but each iteration possibly removes clean pages
so at some point, there is enough memory to unblock waiting pagers
for now i'll try a simple solution, like limiting the pausing delay
but we'll need more page lists in the future (inactive-clean, inactive-dirty, etc..)
limiting the amount of dirty pages is the only way to really make it safe actually
wow, the pageout loop is still running even after many pages were freed, and it is unable to free more pages
i think i have an idea about the livelock
i think it comes from the periodic syncing
Too often?
that's not the problem
the problem is that it can happen at the same time with paging
Oh
if paging gets slow, it won't stop the periodic syncing
which will grab any page it can as soon as some are free
but then, before it even finishes, another sync may occur
i have yet to check that it is possible
and i don't understand why syncing isn't done by the kernel
the kernel is supposed to handle the paging policy
and it would make paging really scale
It's done on the Hurd side?
(instead of having external pagers make one request for each object, even if they're clean)
yes
Hmm, interesting
ofc, with ext2fs --debug, i can't reproduce anything
Ugh
sync are serialized
grmbl
there is a big lock taken at sync time though
uhg

## IRC, freenode, #hurd, 2012-07-16

all right so, there *is* a deadlock, and it may be due to the default pager actually
the vm_page_laundry_count doesn't decrease at some point, even when there are more than enough free pages
antrik: the thing is, i think the deadlock concerns the default pager
the deadlock?
yes
when swapping

## IRC, freenode, #hurd, 2012-07-17

i can't even reproduce the swap deadlock when using upstrea ext2fs :(
upstream*

## IRC, freenode, #hurd, 2012-07-19

the libpager deadlock patch looks wrong to me
hm no, the libpager patch is ok actually

## [[synchronous_ipc]]

### IRC, freenode, #hurd, 2012-07-20

but actually after reviewing more, the debian patch for this particular issue seems correct
well, it's most probably done by youpi, so I would be shocked if it wasn't correct... ;-)
he wasn't sure at all about it
still ;-)
:)
well, if you also think it's correct, I guess it's time to push it upstream...

## IRC, freenode, #hurd, 2012-07-23

i still can't conclude if we have any pageout deadlock, or if it's simply a side effect of the active and inactive lists getting very very large
but almost every time this issue happens, it somehow recovers, sometimes hours later

# See Also

* [[ext2fs_deadlock]]