[[!meta copyright="Copyright © 2012, 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

[[!toc]]

# [[page_cache]]

# IRC, freenode, #hurd, 2012-04-26

another not-too-long improvement would be changing the page cache policy to drop the 4000 objects limit, you mean ? yes do you still have my patch attempt ? no let me grab that oh i won't start it right away you know i'll ask for it when i do k (otherwise i feel i'll just lose it again eh) :) but i imagine it's not too hard to achieve yes i also imagine to set a large threshold of free pages to avoid deadlocks which will still be better than the current situation where we have either lots of free pages because the max limit is reached, or lots of pressure and system freezes :/ yes

## IRC, freenode, #hurd, 2012-06-17

youpi: i don't understand your patch :/ arf which part don't you understand? the global idea :/ first, drop the limit on number of objects you added a new collect call at pageout time (i.e.
here, hack overflow into 0) yes obviously but then the cache keeps filling up with objects which sooner or later become empty thus the collect, which is supposed to look for empty objects, and just drop them but not at the right time objects should be collected as soon as their ref count drops to 0 err now, the code of the collect is just a crude attempt without knowing much about the vm when their resident page count drops to 0 so don't necessarily read it :) ok i've begun playing with the vm recently the limits (arbitrary, and very old obviously) seem far too low for current resources (e.g. the threshold on free pages is 50 iirc ...) yes i'll probably use a different approach the one i mentioned (collecting one object at a time - or pushing them on a list for bursts - when they become empty) this should relax the kernel allocator more (since there will be less empty vm_objects remaining until the next global collection)

## IRC, freenode, #hurd, 2012-06-30

the threshold values of the page cache seem quite enough actually braunr: ah youpi: yes, it seems the problems are in ext2, not in the VM k the page cache limitation still doesn't help :) the problem in the VM is the recycling of vm_objects, which aren't freed once empty but it only wastes some of the slab memory, it doesn't prevent correct processing braunr: thus the limitation, right? no well that's the policy they chose at the time for what reason .. i can't tell ok, but I mean we can't remove the policy because of the non-free of empty objects we must remove vm_objects at some point but even without it, it makes no sense to disable the limit while ext2 is still unstable also, i noticed that the page count in vm_objects never actually drops to 0 ... you mean the limit permits to avoid going into the buggy scenarii too often?
yes k at least, that's my impression my test case is tar xf files.tar.gz, which contains 50000 files of 12k random data i'll try with other values i get crashes, deadlocks, livelocks, and it's not pretty :)

[[libpager_deadlock]].

and always in ext2, mach doesn't seem affected by the issue, other than the obvious (well i get the usual "deallocating an invalid port", but as mentioned, it's "most probably a bug", which is the case here :) braunr: looks coherent with the hangs I get on the buildds youpi: so that's the nasty bug i have to track now though I'm also still getting some out of memory from gnumach sometimes the good thing is i can reproduce it very quickly a dump from the allocator to know which zone took all the room might help youpi: yes i promised that too although that's probably related with ext2 issues :) youpi: can you send me the panic message so i can point the code which must output the allocator state please ? next time I get it, sure :) braunr: you could implement a /proc/slabinfo :) pinotree: yes but when a panic happens, it's too late http://git.sceen.net/rbraun/slabinfo.git/ btw although it's not part of procfs and the mach_debug interface isn't provided :(

## IRC, freenode, #hurd, 2012-07-03

it looks like pagers create a thread per memory object ... braunr: oh. so if I open a lot of files, ext2fs will *inevitably* have lots of threads?... antrik: i'm not sure it may only be required to flush them but when there are lots of them, the threads could run slowly, giving the impression there is one per object in sync mode i don't see many threads and i don't get the bug either for now while i can see physical memory actually being used (and the bug happens before there is any memory pressure in the kernel) so it definitely looks like a corruption in ext2fs and i have an idea ....
:> hm no, i thought an alloca with a big size parameter could erase memory outside the stack, but it's something else (although alloca should really be avoided) arg, the problem seems to be in diskfs_sync_everything -> ports_bucket_iterate (pager_bucket, sync_one); :/ :( looks like the ext2 problem is triggered by calling pager_sync from diskfs_sync_everything and is possibly related to http://lists.gnu.org/archive/html/bug-hurd/2010-03/msg00127.html (and for reference, the rest of the discussion http://lists.gnu.org/archive/html/bug-hurd/2010-04/msg00012.html) multithreading in libpager is scary :/ braunr: s/in libpager/ ;-) antrik: right omg the ugliness :/ ok i found a bug a real one :) (but not sure it's the only one since i tried that before) 01:38 < braunr> hm no, i thought an alloca with a big size parameter could erase memory outside the stack, but it's something else turns out alloca is sometimes used for 64k+ allocations which explains the stack corruptions ouch as it's used to duplicate the node table before traversing it, it also explains why the cache limit affects the frequency of the bug now the fun part, write the patch following GNU protocol .. :) [[!message-id "1341350006-2499-1-git-send-email-rbraun@sceen.net"]] if someone feels like it, there are a bunch of alloca calls in the hurd (like around 30 if i'm right) most of them look safe, but some could trigger that same problem in other servers ok so far, no problem with the upstream ext2fs code :) 20 loops of tar xf / rm -rf consuming all free memory as cache :) the hurd uses far too much cpu time for no valid reason in many places :/ * braunr happy my hurd is completely using its ram :) Meaning, the bug is solved? 
Congrats if so :) well, ext2fs looks way more stable now i haven't had a single issue since the change, so i guess i messed something with my previous test and the Mach VM cache implementation looks good enough now the only thing left is to detect unused objects and release them which is actually the core of my work :) but i'm glad i could polish ext2fs with luck, this is the issue that was striking during "thread storms" in the past * pinotree hugs braunr i'm also very happy to see the slab allocator reacting well upon memory pressure :> braunr: Why did alloca corrupt memory in diskfs_node_iterate? Was the temporary node too big to keep it on the stack? mcsim: yes 17:54 < braunr> turns out alloca is sometimes used for 64k+ allocations and i wouldn't be surprised if our thread stacks are simple contiguous 64k mappings of zero-filled memory (as Mach only provides bottom-up allocation) our thread implementation should leave unmapped areas between thread stacks, to easily catch such overflows braunr: wouldn't also fatfs/inode.c and tmpfs/node.c need the same fix?
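
The overflow described above comes from duplicating a large table with alloca on a thread stack of limited size. A minimal sketch in C of the heap-based alternative, assuming a hypothetical `struct node` and hypothetical helpers `snapshot_nodes`, `iterate_nodes` and `mark_visited`; none of these are the actual libdiskfs code, and this is not necessarily how the patch referenced above does it:

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file system node; not the libdiskfs type.  */
struct node
{
  int id;
  int visited;
};

/* Copy TABLE (COUNT entries) into heap memory instead of using alloca,
   so the size of the copy is not bounded by the thread stack.
   Returns 0 and stores the copy in *COPY, or ENOMEM on failure.
   The caller frees *COPY.  */
int
snapshot_nodes (struct node *const *table, size_t count, struct node ***copy)
{
  struct node **snap;

  /* calloc checks COUNT * size for overflow; request at least one
     element so that an empty table does not look like a failure.  */
  snap = calloc (count ? count : 1, sizeof *snap);
  if (snap == NULL)
    return ENOMEM;

  if (count > 0)
    memcpy (snap, table, count * sizeof *snap);
  *copy = snap;
  return 0;
}

/* Example callback: record that a node was seen.  */
void
mark_visited (struct node *np)
{
  np->visited = 1;
}

/* Walk a snapshot of TABLE, calling FUN on each node.  Returns the number
   of nodes visited, or -1 if the snapshot could not be allocated.  */
long
iterate_nodes (struct node *const *table, size_t count,
               void (*fun) (struct node *))
{
  struct node **snap;
  size_t i;

  if (snapshot_nodes (table, count, &snap) != 0)
    return -1;
  for (i = 0; i < count; i++)
    fun (snap[i]);
  free (snap);
  return (long) count;
}
```

Compared with alloca, an oversized table now shows up as an ENOMEM that the caller can report, instead of silently writing past the end of the stack.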
pinotree: possibly i haven't looked more than 300 loops of tar xf / rm -rf on an archive of 20000 files of 12 KiB each, without any issue, still going on :) braunr: yay

## [[!message-id "20120703121820.GA30902@mail.sceen.net"]], 2012-07-03

## IRC, freenode, #hurd, 2012-07-04

mach is so good it caches objects which have *no* page in physical memory hm i think i have a working and not too dirty vm cache :> braunr: congrats :) kilobug: hey :) the dangerous side effect is the increased swappiness we'll have to monitor that on the buildds otherwise the cache is effectively used, and the slab allocator reports reasonable amounts of objects, not increasing once the ram is full let's see what happens with 1.8 GiB of RAM now damn glibc is really long to build :) and i fear my vm cache patch makes non scalable algorithms negate some of its benefits :/ 72 tasks, 2090 threads we need the ability to monitor threads somewhere

## IRC, freenode, #hurd, 2012-07-05

hm i get kernel panics when not using the host cache :/ no virtual memory for stack allocations that's scary ? i guess the lack of host cache makes I/O slow enough to create a big thread storm that completely exhausts the kernel space my patch challenges scalability :) and not having a zalloc zone anymore, instead of getting a nice panic when trying to allocate yet another thread, you get an address space exhaustion on an unrelated event instead. I see ;-) thread stacks are not allocated from a zone/cache also, the panic concerned aligned memory, but i don't think that matters the kernel panic clearly mentions it's about thread stack allocation oh, by "stack allocations" you actually mean allocating a stack for a new thread...
yes that's not what I normally understand when reading "stack allocations" :-) user stacks are simple zero filled memory objects so we usually get a deadlock on them :> i wonder if making ports_manage_port_operations_multithread limit the number of threads would be a good thing to do braunr: last time slpz did that, it turned out that it causes deadlocks in at least one (very specific) situation ok I think you were actually active at the time slpz proposed the patch (and it was added to Debian) -- though probably not at the time where youpi tracked it down as the cause of certain lockups, so it was dropped again... what seems very weird though is that we're normally using continuations

[[microkernel/mach/gnumach/continuation]].

braunr: you mean in the kernel? how is that relevant to the topic at hand?... antrik: continuations have been designed to reduce the number of stacks to one per cpu :/ but they're not used everywhere they are not used *anywhere* in the Hurd... antrik: continuations are supposed to be used by kernel code braunr: not sure what you are getting at. of course we should use some kind of continuations in the Hurd instead of having an active thread for every single request in flight -- but that's not something that could be done easily... antrik: oh no, i don't want to use continuations at all i just want to use less threads :) my panic definitely looks like a thread storm i guess increasing the kmem_map will help for the time being (it's not the whole kernel space that gets filled up actually) also, stacks are kept on a local cache until there is memory pressure oO their slab cache can fill the backing map before there is any pressure and it makes a two level cache, i'll have to remove that well, how do you reduce the number of threads?
apart from optimising scheduling (so requests are more likely to be completed before new ones are handled), the only way to reduce the number of threads is to avoid having a thread per request exactly so instead the state of each request being handled has to be explicitly stored... i.e. continuations hm actually, no you use thread migration :) i don't want to artificially use the number of kernel threads the hurd should be revamped not to use that many threads but it looks like a hard task well, thread migration would reduce the global number of threads in the system... it wouldn't prevent a server from having thousands of threads threads would already be allocated before getting in the server again, the only way not to use a thread for each outstanding request is having some explicit request state management, i.e. continuations hm right but we can nonetheless reduce the number of threads i wonder if the sync threads are created on behalf of the pagers or the kernel one good thing is that i can already feel better performance without using the host cache until the panic happens the tricky bit about that is that I/O can basically happen at any point during handling a request, by hitting a page fault. so we need to be able to continue with some other request at any point... yes actually, readahead should help a lot in reducing the number of requests and thus threads... still will be quite a lot though we should have a bunch of pageout threads handling requests asynchronously it depends on the implementation consider readahead detects that, in the next 10 pages, 3 are not resident, then 1 is, then 3 aren't, then 1 is again, and the last two aren't how is this solved ?
:) about the stack allocation issue, i actually think it's very simple to solve the code is a remnant of the old BSD days, when processes were heavily swapped so when a thread is created, its stack isn't allocated the allocation happens when the thread is dispatched, and the scheduler finds it's swapped (which is the initial state) the stack is allocated, and the operation is assumed to succeed, which is why failure produces a panic well, actually, not just readahead... clustered paging in general. the thread storms happen mostly on write not read AIUI changing that to allocate at thread creation time will allow a cleaner error handling antrik: yes, at writeback antrik: so i guess even when some physical pages are already present, we should aim at larger sizes for fewer I/O requests not sure that would be worthwhile... probably doesn't happen all that often. and if some of the pages are dirty, we would have to make sure that they are ignored although they were part of the request... yes so one request per missing area ? the opposite might be a good idea though -- if every other page is dirty, it *might* indeed be preferable to do a single request rewriting even the clean ones in between... yes i personally think one request, then replace only what was missing, is simpler and preferable OTOH, rewriting clean pages might considerably increase write time (and wear) on SSDs why ? I doubt the controller is smart enough to recognise if a page doesn't really need rewriting so it will actually allocate and write a new cluster no but it won't spread writes on different internal sectors, will it ? sectors are usually really big "sectors" is not a term used in SSDs :-) they'll be erased completely whatever the amount of data at some point if i'm right ah need to learn more about that i thought their internal hardware was much like nand flash admittedly I don't remember the correct terminology either...
they *are* NAND flash writing is actually not the problem -- it can happen in small chunks. the problem is erasing, which is only possible in large blocks yes so having larger requests doesn't seem like a problem to me because of that thus smart controllers (which pretty much all SSDs nowadays have, and apparently even SD cards) do not actually overwrite. instead, writes always happen to clean portions, and erasing only happens when a block is mostly clean (after relocating the remaining used parts to other clean areas) braunr: the problem is not having larger requests. the problem is rewriting clusters that don't really need rewriting. it means the disk performs unnecessary writing actions. it doesn't hurt for magnetic disks, as the head has to pass over the unchanged sectors anyways; and rewriting them unnecessarily doesn't increase wear but it's different for SSDs each write has a penalty there i thought only erases were the real penalty well, erase happens in the background with modern controllers; so it has no direct penalty. the write has a direct performance penalty when saturating the bandwidth, and always has a direct wear penalty can't controllers handle 32k requests ? like everything does ? :/ sure they can. but that's beside the point... if they do, they won't mind the clean data inside such large blocks apparently we are talking past each other i must be missing something important about SSD braunr: the point is, the controller doesn't *know* it's clean data; so it will actually write it just like the really unclean data yes and it will choose an already clean sector for that (previously erased), so writing larger blocks shouldn't hurt there will be a slight increase in bandwidth usage, but that's pretty much all of it isn't it ? well, writing always happens to clean blocks. but writing more blocks obviously needs more time, and causes more wear...
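
The trade-off argued above (one large request that also rewrites the clean pages in between, versus writing only what is dirty) can be put in numbers. A minimal sketch in C, assuming a made-up array of per-page dirty flags for one cluster; `struct writeback_cost`, `cost_single_request` and `cost_dirty_runs` are hypothetical names, not gnumach or libpager functions:

```c
#include <stddef.h>

/* Cost of writing back one cluster of pages under a given policy.  */
struct writeback_cost
{
  size_t requests;   /* number of I/O requests issued */
  size_t pages;      /* number of pages actually written */
};

/* Policy 1: a single request spanning the first to the last dirty page,
   clean pages in between included.  */
struct writeback_cost
cost_single_request (const unsigned char *dirty, size_t n)
{
  struct writeback_cost c = { 0, 0 };
  size_t first = n, last = 0, i;

  for (i = 0; i < n; i++)
    if (dirty[i])
      {
        if (first == n)
          first = i;
        last = i;
      }
  if (first != n)
    {
      c.requests = 1;
      c.pages = last - first + 1;
    }
  return c;
}

/* Policy 2: one request per contiguous run of dirty pages.  */
struct writeback_cost
cost_dirty_runs (const unsigned char *dirty, size_t n)
{
  struct writeback_cost c = { 0, 0 };
  size_t i;

  for (i = 0; i < n; i++)
    if (dirty[i])
      {
        c.pages++;
        if (i == 0 || !dirty[i - 1])
          c.requests++;   /* a new run starts here */
      }
  return c;
}
```

For the "every other page is dirty" case over 8 pages (flags 1,0,1,0,1,0,1,0), the single request costs 1 request and 7 pages written, while the per-run policy costs 4 requests and 4 pages written. Fewer requests mean fewer threads and less per-request overhead; fewer pages mean less bandwidth and, on flash, less wear.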
aiui, blocks are always far larger than the amount of pages we want to writeback in one request the only way to use more than one is crossing a boundary no. again, the blocks that can be *written* are actually quite small. IIRC most SSDs use 4k nowadays ok only erasing operates on much larger blocks so writing is a problem too i didn't think it would cause wear leveling to happen well, I'm not sure whether the wear actually happens on write or on erase... but that doesn't matter, as the number of blocks that need to be erased is equivalent to the number of blocks written... sorry, i'm really not sure if you erase one sector, then write the first and third block, it's clearly not equivalent i mean let's consider two kinds of pageout requests 1/ a big one including clean pages 2/ several ones for dirty pages only let's assume they both need an erase when they happen what's the actual difference between them ? wear will increase only if the controller handle it on writes, if i'm right but other than that, it's just bandwidth strictly speaking erase is only *necessary* when there are no clean blocks anymore. but modern controllers will try to perform erase of unused blocks in the background, so it doesn't delay actual writes i agree on that but the point is that for each 16 pages (or so) written, we need to erase one block so we get 16 clean pages to write... yes which is about the size of a request for the sequential policy so it fits just to be clear: it doesn't matter at all how the pages "fit". the controller will reallocate them anyways what matters is how many pages you write ah i thought it would just put the whole request in a single sector (or two) I'm not sure what you mean by "sector". as I said, it's not a term used in SSD technology so do you imply that writes can actually get spread over different sectors ? the sector is the unit at the nand flash level, its size is the erase size actually, I used the right terminology... 
the erase unit is the block; the write unit is the page sector is a synonym of block never seen it. and it's very confusing, as it isn't in any way similar to sectors in magnetic disks... http://en.wikipedia.org/wiki/Flash_memory#NAND_flash it's actually in the NOR part right before, paragraph "Erasing" "Modern NOR flash memory chips are divided into erase segments (often called blocks or sectors)." ah. I skipped the NOR part :-) i've only heard sector where i worked, but i don't consider french computer engineers to be authorities on the matter :) hehe let's call them block so, thread stacks are allocated out of the kernel map this is already a bad thing (which is probably why there is a local cache btw) anyways, yes. modern controllers might split a contiguous write request onto several blocks, as well as put writes to completely different logical pages into one block. the association between addresses and actual blocks is completely free now i wonder why the kernel map is so slow, as the panic happens at about 3k threads, so about 11M of thread stacks antrik: ok antrik: well then it makes sense to send only dirty pages s/slow/low/ it's different for raw flash (using MTD subsystem in Linux) -- but I don't think this is something we should consider any time soon :-) (also, raw flash is only really usable with specialised filesystems anyways) yes are the thread stacks really only 4k? I would expect them to be larger in many cases... 
youpi reduced them some time ago, yes they're 4k on xen uh, 16k damn, i'm wondering why i created separate submaps for the slab allocator :/ probably because that's how it was done by the zone allocator before but that's stupid :/ hm the stack issue is actually more complicated than i thought because of interrupt priority levels i increased the kernel map size to avoid the panic instead now libc0.3 seems to build fine and there seems to be a clear decrease of I/O :)

### IRC, freenode, #hurd, 2012-07-06

braunr: there is a submap for the slab allocator? that's strange indeed. I know we talked about this; and I am pretty sure we agreed removing the submap would actually be among the major benefits of a new allocator... antrik: a submap is a good idea anyway antrik: it avoids fragmenting the kernel space too much it also breaks down locking but we could consider it as a first step, i'll merge the kmem and kalloc submaps (the ones used for the slab caches and the malloc-like allocations respectively) then i'll change the allocation of thread stacks to use a slab cache and i'll also remove the thread swapping stuff it will take some time, but by the end we should be able to allocate tens of thousands of threads, and suffer no panic when the limit is reached braunr: I'm not sure "no panic" is really a worthwhile goal in such a situation... antrik: uh ? antrik: it only means the system won't allow the creation of threads until there is memory available from my pov, the microkernel should never fail up to a point it can't continue its job braunr: the system won't be able to recover from such a situation anyways. without actual resource management/prioritisation, not having a panic is not really helpful. it only makes it harder to guess what happened I fear...
i don't see why it couldn't recover :/

## IRC, freenode, #hurd, 2012-07-07

grmbl, there are a lot of issues with making the page cache larger :( it actually makes the system slower in half of my tests we have to test that on real hardware unfortunately my current results seem to indicate there is no clear benefit from my patch the current limit of 4000 objects creates a good balance between I/O and cpu time with the previous limit of 200, I/O is often extreme with my patch, either the working set is less than 4k objects, so nothing is gained, or the lack of scalability of various parts of the system adds overhead that affects processing speed also, our file systems are cached, but our block layer isn't which means even when accessing data from the cache, accesses still cause some I/O for metadata

## IRC, freenode, #hurd, 2012-07-08

youpi: basically, it works fine, but exposes scalability issues, and increases swappiness so it doesn't help with stability? hum, that was never the goal :) the goal was to reduce I/O, and increase performance sure but does it at least not lower stability too much? not too much, no k most of the issues i found could be reproduced without the patch ah then fine :) random deadlocks on heavy loads youpi: but i'm not sure it helps with performance youpi: at least not when emulated, and the host cache is used that's not very surprising it does help a lot when there is no host cache and the working set is greater (or far less) than 4k objects ok the amount of vm_object and ipc_port is gracefully adjusted that'd help us with not having to tell people to use the complex -drive option :) ([[hurd/running/qemu/writeback_caching]].)
so you can easily run a hurd with 128 MiB with decent performance and no leak in ext2fs yes for example braunr: I'd say we should just try it on buildds (it's not finished yet, i'd like to work more on reducing swapping) (though they're really not busy atm, so the stability change can't really be measured) when building the hurd, which takes about 10 minutes in my kvm instances, there is only a 30 seconds difference between using the host cache and not using it this is already the case with the current kernel, since the working set is less than 4k objects while with the previous limit of 200 objects, it took 50 minutes without host cache, and 15 with it so it's a clear benefit for most uses, except my virtual machines :) heh because there, the amount of ram means a lot of objects can be cached, and i can measure an increase in cpu usage slight, but present youpi: isn't it a good thing that buildds are resting a bit ? :) on one hand, yes but on the other hand, that doesn't permit to continue stress-testing the Hurd :) we're not in a hurry for this patch because using it really means you're tickling the pageout daemon a lot :)

## [[metadata_caching]]

## IRC, freenode, #hurd, 2012-07-12

i'm only adding a cached pages count you know :) (well actually, this is now a vm_stats call that can replace vm_statistics, and uses flavors similar to task_info) my goal being to see that yellow bar in htop ... :) yellow? yes, yellow as in http://www.sceen.net/~rbraun/htop.png ah

## IRC, freenode, #hurd, 2012-07-13

i always get a "no more room for vm_map_enter" error when building glibc :/ but the build continues, probably a failed test ah yes, i can see the yellow bar :> braunr: congrats :-) antrik: thanks but i think my patch can't make it into the git repo until the swap deadlock is solved (or at least very infrequent ..)

[[libpager_deadlock]].
well, the page cache accounting tells me something is wrong there too lol during a build 112M of data was created, of which only 28M made it into the cache which may imply something is still holding references on the other objects (shadow objects hold references to their underlying object, which could explain this) ok i'm stupid, i just forgot to subtract the cached pages from the used pages .. :> (hm, actually i'm tired, i don't think this should be done) ahh yes much better i simply forgot to convert pages to kilobytes .... :> with the fix, the accounting of cached files is perfect :)

## IRC, freenode, #hurd, 2012-07-14

braunr: btw, if you want to stress big builds, you might want to try webkit, ppl, rquantlib, rheolef, yade they don't pass on bach (1.3GiB), but do on ironforge (1.8GiB) youpi: i don't need to, i already know my patch triggers swap deadlocks more often, which was expected k there are 3 tasks concerning my work : 1/ page cache accounting (i'm sending the patch right now) 2/ removing the fixed limit and 3/ hunting the swap deadlock and fixing as much as possible 2/ can't get in the repository without 3/ imo btw, the increase of PAGE_FREE_* in your 2/ could go already, couldn't it? yes but we should test with higher thresholds well it really depends on the usage pattern :/

## [[ext2fs_libports_reference_counting_assertion]]

## IRC, freenode, #hurd, 2012-07-15

concerning the page cache patch, i've been using it for quite some time now, did lots of builds with it, and i actually wonder if it hurts stability as much as i think considering i didn't stress the system as much before and it really improves performance cached memobjs: 138606 cache: 1138M i bet ext2fs can have a hard time scanning 138k entries in a linked list, using callback functions on each of them :x

## IRC, freenode, #hurd, 2012-07-16

braunr: Sorry that I didn't have better results to present.
:-/ eh, that was expected :) my biggest problem is the hurd itself :/ for my patch to be useful (and the rest of the intended work), the hurd needs some serious fixing not syncing from the pagers and scalable algorithms everywhere of course

## IRC, freenode, #hurd, 2012-07-23

youpi: FYI, the branches rbraun/page_cache in the gnupach and hurd repos are ready to be merged after review gnumach* so you fixed the hangs & such? they only add the cache stats, not the "improved" cache no it requires much more work for that :) braunr: my concern is that the tests on buildds show stability regression youpi: tschwinge also reported performance degradation and not the minor kind uh :-/ far less pageins, but twice as many pageouts, and probably high cpu overhead building (which is what buildds do) means lots of small files so lots of objects huge lists, long scans, etc.. so it definitely requires more work the stability issue comes first in mind, and i don't see a way to obtain a usable trace do you ? nope (except making it loop forever instead of calling assert() and attach gdb to a qemu instance) youpi: if you think the infinite loop trick is ok, we could proceed with that which assert? the port refs one which one? which prevented you from using the page cache patch on buildds ah, the libports one for that one, I'd tend to take the time to perhaps use coccicheck actually

[[code_analysis]].

oh it's one of those which is supposed to be statically ananyzable s/n/l that would be great :-) And set precedence.

# IRC, freenode, #hurd, 2012-07-26

hm i killed darnassus, probably the page cache patch again

# IRC, freenode, #hurd, 2012-09-19

I was wondering about the page cache information structure I guess the idea is that if we need to add a field, we'll just define another RPC?
braunr: ↑ i've done that already, yes youpi: have a look at the rbraun/page_cache gnumach branch that's what I was referring to ok

# IRC, freenode, #hurd, 2013-01-15

hm, no wonder the page cache patch reduced performance so much the page cache when building even moderately large packages is about a few dozens MiB (around 50) the patch enlarged it to several hundreds :/ braunr: so the big page cache essentially killed memory locality? ArneBab: no, it made ext2fs crazy (disk translators - used as pagers - scan their cached pages every 5 seconds to flush the dirty ones) you can imagine what happens if scanning and flushing a lot of pages takes more than 5 seconds ouch… that’s heavy, yes I already see it pile up in my mind and it's completely linear, using a lock to protect the whole list darnassus is currently showing such a behaviour, because tschwinge is linking huge files (one object with lots of pages) 446 MB of swap used, between 200 and 1850 MiB of RAM used, and i can still use vim and build stuff without being too disturbed the system does feel laggy, but there has been great stability improvements have* and even if laggy, it doesn't feel much more than the usual lag of a network (ssh) based session

# IRC, freenode, #hurd, 2013-10-08

hmm i have to change what gnumach reports as being cached memory

## IRC, freenode, #hurd, 2013-10-09

mhmm, i'm able to copy files as big as 256M while building debian packages, using a gnumach kernel patched for maximum memory usage in the page cache just because i used --sync=30 in ext2fs a bit of swapping (around 40M), no deadlock yet gitweb is a bit slow but that's about it that's quite impressive i suspect thread storms might not even be the cataclysmic event that we thought it was the true problem might simply be parallel fs syncs

## IRC, freenode, #hurd, 2013-10-10

even with the page cache patch, memory filled, swap used, and lots of cached objects (over 200k), darnassus is impressively resilient i really wonder whether we
fixed ext2fs deadlock youpi: fyi, darnassus is currently running a patched gnumach with the vm cache changes, in hope of reproducing the assertion errors we had in the past i increased the sync interval of ext2fs to 30s like we discussed a few months back and for now, it has been very resilient, failing only because of the lack of kernel map entries after several heavy package builds wait the latter wasn't a deadlock it resumed after 1363.06 s gg0: thread storms can sometimes (rarely) fade and let the system resume "normally" which is why i increased the sync interval to 30s, this leaves time between two intervals for normal operations otherwise writebacks are queued one after the other, and never processed fast enough for that queue to become empty again (except rarely) youpi: i think we should consider applying at least the sync interval to exodar, since many DDs are just unaware of the potential problems with large IOs sure 222k cached objects (1G of cached memory) and darnassus is still kicking :) youpi: those lock fixing patches your colleague sent last year must have helped somewhere :)

## IRC, freenode, #hurd, 2013-10-13

braunr: how are your tests going with the object cache? youpi: not so good youpi: it failed after 2 days of straight building without a single error output :/
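
The queueing argument from the 2013-10-10 log (writebacks queued one after the other and never processed fast enough for the queue to empty) can be illustrated with a toy model. A minimal sketch in C, assuming a hypothetical fixed cost per sync pass and a single worker; `sync_backlog` is a made-up name and the model does not reflect how ext2fs or libpager actually schedule syncs:

```c
/* Toy model: a sync pass is requested every INTERVAL seconds and each pass
   takes PASS_TIME seconds, with only one pass running at a time.
   Returns the number of passes still queued or running after ELAPSED
   seconds, simulated in one-second ticks.  */
unsigned long
sync_backlog (unsigned long interval, unsigned long pass_time,
              unsigned long elapsed)
{
  unsigned long t, queued = 0, busy_left = 0;

  if (interval == 0 || pass_time == 0)
    return 0;

  for (t = 1; t <= elapsed; t++)
    {
      if (busy_left > 0 && --busy_left == 0)
        queued--;                 /* the running pass just finished */
      if (t % interval == 0)
        queued++;                 /* the periodic timer requests a pass */
      if (busy_left == 0 && queued > 0)
        busy_left = pass_time;    /* start the next queued pass */
    }
  return queued;
}
```

Taking the 5 second and 30 second intervals from the logs and an assumed pass time of 8 seconds: `sync_backlog (5, 8, 60)` returns 6, and the backlog keeps growing with time, while `sync_backlog (30, 8, 59)` returns 0 because each pass finishes long before the next one is requested.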