[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled
[[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_documentation open_issue_hurd open_issue_gnumach]]

# IRC, freenode, #hurd, 2012-02-14

    Open question: what do you think about dropping the memory object
      model and implementing a simple block-level cache?

[[microkernel/mach/memory_object]].

    slpz: AFAIK the memory object has more purposes than just caching;
      it's also used for passing chunks of data between processes,
      handling swap (which is similar to caching, but still slightly
      different), ...
    kilobug: user processes usually make their way to data with POSIX
      operations, so memory objects are only needed for mmap'ed files
    kilobug: and swap can be replaced by an in-kernel system, or could
      even still use the memory object
    slpz: memory objects are used for the page cache
    slpz: translators (especially diskfs based) make heavy use of memory
      objects, and if "user processes" use POSIX semantics, Hurd
      processes (translators, pagers, ...) shouldn't be bound to POSIX
    braunr: and the page cache could be moved to a lower level, near to
      the devices
    not likely
    well, it could, but then you'd still have the file system overhead
    kilobug: but the use of memory objects is not compulsory; you can
      easily write a fs translator without implementing memory objects
      at all (except for mmap)
    a unified buffer/VM cache, as all modern systems have, is probably
      the most efficient approach
    braunr: I agree. I want to look at *BSD/Linux vfs systems to see how
      much cache policy depends on the filesystem
    braunr: Are you aware of any good papers on this matter?
    netbsd UVM, the linux virtual memory system
    both a bit old but still relevant
    braunr: Thanks.
    the problem in our case is that, with FS and cache information in
      different contexts (kernel vs. translator), I find it hard to
      coordinate them
    that's why I thought about a block-level cache that GNU Mach could
      manage by itself
    I wonder how QNX deals with this
    the point of having a simple page cache is explicitly about not
      caring whether those pages are blocks or files or whatever
    the kernel (at least, mach) normally has all the accounting
      information it needs to implement its cache policy
    file system translators shouldn't cache much
    the pager interface could be refined, but it looks ok to me as it is
    Mach has the accounting info, but it's not able to purge the cache
      without coordination with the translators
    which is normal
    And this is a big problem when memory pressure increases, as it
      doesn't know for sure when memory is going to be freed
    Mach flushes its cache when it decides to, and sends back dirty
      pages if needed by the pager
    that's the case with every paging implementation
    the main difference is security with untrusted pagers
    but that's another issue
    but in a monolithic implementation, the kernel is able to force a
      chunk of cache memory to be freed without hoping for other
      processes to do the job
    that's not true
    they're not processes, they're threads, but the timing issue is the
      same
    see pdflush on linux
    no, it isn't.
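To make the flow under discussion concrete: when Mach evicts dirty
page-cache pages, it hands them back to the owning translator's pager in
an m_o_data_return message, and the pager is expected to write them out
and then release the kernel-supplied copy.  A minimal sketch of the
pager side follows, with a simplified, hypothetical signature (the real
GNU Mach interface carries more arguments, and Hurd translators normally
go through libpager instead of implementing this directly):

    #include <mach.h>

    /* Hypothetical helper standing in for whatever the translator uses
       to reach its device (e.g. ext2fs writing file blocks).  */
    extern void write_to_backing_store (vm_offset_t offset,
                                        vm_address_t data,
                                        vm_size_t length);

    /* Simplified data-return handler for an external pager.  */
    kern_return_t
    my_pager_data_return (memory_object_t object, vm_offset_t offset,
                          vm_address_t data, vm_size_t length,
                          boolean_t dirty)
    {
      if (dirty)
        /* Push the returned pages to the underlying storage.  */
        write_to_backing_store (offset, data, length);

      /* The kernel moved these pages into our address space; until we
         deallocate them, the "flushed" cache memory merely lives on as
         anonymous memory in the translator -- the accounting problem
         discussed above.  */
      return vm_deallocate (mach_task_self (), data, length);
    }

Note that the write itself may require the translator to allocate memory
(buffers, ports, stack), which is exactly the allocation-on-the-free-path
problem that comes up below.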
    when memory is scarce, threads that request memory can either wait
      or immediately fail, and if they wait, they're usually woken by
      one of the vm threads once flushing is done
    a kernel thread can access all the information in the kernel, and
      synchronization is pretty easy.
    on mach, synchronization is done with messages, that's even easier
      than shared kernel locks
    with processes in different spaces, resource coordination becomes
      really difficult
    and what kind of info would an external pager need when simply asked
      to take back its dirty pages?
    what resources?
    just take a look at the thread storm problem when GNU Mach needs to
      clean a bunch of pages
    Mach is big enough to correctly account memory
    there can be thread storms on monolithic systems
    that's a Mach issue, not a microkernel issue
    that's why linux limits the number of pdflush thread instances
    Mach can account memory, but it can't assure when it will be freed
      by any means, or at least to a lesser degree than a monolithic
      system can
    again i disagree
    no system can guarantee when memory will be freed with paging
    a block level cache can, for most situations
    slpz: why?
    slpz: or how, i mean?
    braunr: with a block-level page cache, GNU Mach should be able to
      flush dirty pages directly to the underlying device, without all
      the complexity and resource cost involved in an m_o_data_return
      message. It can also throttle the rate at which pages are being
      cleaned, and do all this while blocking new page allocations to
      deal with memory exhaustion cases.
    braunr: in the current state, when cleaning dirty pages, GNU Mach
      sends a bunch of m_o_data_return messages to the corresponding
      pagers, hoping they will do their job as soon and as fast as
      possible.
    memory is not really freed, but transformed from page cache into
      anonymous memory pertaining to the corresponding translator
    and GNU Mach never knows for sure when this memory is released, if
      it ever is.
    not being able to flush dirty pages synchronously is a big problem
      when you need to throttle memory usage
    and needing to allocate more memory when you're trying to free some
      (which is the case for the m_o_data_return mechanism) makes the
      problem even worse
    your idea of a block level cache implies in-kernel block drivers
    that's not the direction we're taking
    i agree flushing should be a synchronous process, which was one of
      the proposed improvements in the thread migration papers (they
      didn't achieve it, but thought about it for future work, so that
      the thread at the origin of the fault would handle it itself)
    but it should be possible to have kernel threads similar to pdflush
      and to throttle flush requests too
    again, i really think it's a mach bug, and having a buffer cache
      would be stepping backward
    the real design issue is allocating memory while trying to free it,
      yes
    braunr: thread migration doesn't apply to asynchronous IPC, and the
      entire paging mechanism is implemented this way
    in fact, trying to do a synchronous m_o_data_return will trigger a
      deadlock for sure
    to achieve synchronous flushing with translators, the entire paging
      model must be redesigned
    It's true that I'm not very confident in the viability of user space
      drivers
    at least, not for every device
    I know this is against the current ideas for most ukernel designs,
      but if we want to achieve real-world functionality, I think some
      sacrifices must be made. Or at least a reasonable compromise.
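As an aside, here is roughly what the block-level cache described above
would look like.  Everything below is a hypothetical sketch (GNU Mach
has no such cache, and all the names are invented); note that it implies
in-kernel block drivers, the direction the discussion argues against:

    /* Hypothetical in-kernel block cache.  */
    struct cached_block
    {
      unsigned long blkno;
      void *data;
      int dirty;
      struct cached_block *next;
    };

    struct block_cache
    {
      struct device *device;
      struct cached_block *blocks;
    };

    /* Hypothetical block driver entry point.  */
    extern void device_write_block (struct device *dev,
                                    unsigned long blkno, void *data);

    /* Flush the cache: the kernel writes dirty blocks to the device
       itself, synchronously, with no pager round trip, so it knows
       exactly when the memory is clean again and can block new page
       allocations in the meantime.  */
    void
    block_cache_flush (struct block_cache *cache)
    {
      struct cached_block *b;

      for (b = cache->blocks; b != NULL; b = b->next)
        if (b->dirty)
          {
            device_write_block (cache->device, b->blkno, b->data);
            b->dirty = 0;
          }
    }

The appeal is that write-back completion is directly visible to the
kernel; the cost is in-kernel drivers and per-device knowledge, which
the memory object design was meant to avoid.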
    slpz: thread migration for paging requests implies synchronous RPC;
      we don't care much about the IPC layer there
    and it requires large changes of the VM code in addition, yes
    let's not talk about this, we don't have thread migration anyway :p
    except for the allocation-on-free-path issue, i really don't see how
      the current pager interface or the page cache creates problems
      wrt flushing ..
    monolithic systems also have that problem, with lower impact though,
      but still
    braunr: because, as it doesn't know when memory is really freed,
      1) it just blindly sends a bunch of m_o_data_return messages to
      the pagers, usually overloading them (causing thread storms), and
      2) it can't properly throttle new page requests to deal with
      resource exhaustion
    it does know when memory is really freed
    and yes, it blindly sends a bunch of requests; they can and should
      be throttled
    but dirty pages freed become indistinguishable from common anonymous
      chunks released, so it doesn't really know whether page flushes
      are working or not (i.e. it doesn't know how fast a device is
      processing write requests)
    memory is freed when the pager deallocates it
    the speed of the operation is irrelevant
    no system can rely on disk speed to guarantee correct page flushes
    disk or anything else
    requests can't be throttled if Mach doesn't know when they are being
      processed
    it can easily know it
    they are processed as soon as the request is sent from the kernel
    and processing is done when the pager acknowledges the end of the
      flush
    memory backing the flushed pages should be released before
      acknowledging that
    to avoid starting new requests too soon
    AFAIK pagers don't acknowledge the end of the flush
    well, that's where the interface should be refined
    Mach just sends the m_o_data_return and continues on its own
    that's why flushing should be synchronous
    are you sure about that, however?
    so the entire paging system needs a new design... :)
    pretty sure
    not a new design ..
    there is m_o_supply_completed; i don't see how difficult it would be
      to add m_o_data_return_completed
    it's not a small change, but not a difficult one either
    i'm more worried about the allocation problem
    the default pager should probably be wired in memory
    maybe others too
    let's suppose a case in which Mach needs to free memory due to an
      increase in memory pressure. vm_pageout_daemon starts running;
      clean pages are freed easily, but for each dirty one an
      m_o_data_return is sent. 1) when should this daemon stop sending
      m_o_data_return and start waiting for m_o_data_return_completed?
      2) what happens if the translator needs to read new blocks to
      fulfill a write request (pretty common in ext2fs)?
    it should stop after an arbitrary limit is reached
    a reasonable one
    linux limits the number of pdflush threads for that reason, as i
      mentioned (to 8 iirc)
    the problem of reading blocks while flushing is what i'm worried
      about too, hence the need to wire that code
    well, i'm not sure it's needed
    again, a reasonable amount of free memory should be reserved for
      that at all times
    but the work for pdflush seems to be a lot easier, as it only deals
      directly with block devices (if I understood it correctly, I just
      started looking at it).
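To illustrate the "arbitrary limit" answer to question 1, here is a
sketch of a pageout loop that caps in-flight flushes the way Linux caps
its pdflush instances.  This is invented pseudo-kernel code, not GNU
Mach's actual vm_pageout code, and all the helpers are hypothetical:

    #define FLUSH_IN_FLIGHT_MAX 8  /* arbitrary but reasonable, cf. pdflush */

    static unsigned int flushes_in_flight;

    /* Minimal stand-in for GNU Mach's struct vm_page.  */
    struct vm_page { int dirty; /* ... */ };

    /* Hypothetical helpers.  */
    extern int free_memory_below_threshold (void);
    extern struct vm_page *next_inactive_page (void);
    extern void reclaim_page (struct vm_page *page);
    extern void send_memory_object_data_return (struct vm_page *page);
    extern void wait_for_event (void *event);

    void
    pageout_scan (void)
    {
      while (free_memory_below_threshold ())
        {
          struct vm_page *page = next_inactive_page ();

          /* Clean pages can be reclaimed immediately.  */
          if (!page->dirty)
            {
              reclaim_page (page);
              continue;
            }

          /* Throttle: don't flood the pagers with m_o_data_return
             messages (the "thread storm" problem); sleep until a
             pager acknowledges an earlier flush.  */
          while (flushes_in_flight >= FLUSH_IN_FLIGHT_MAX)
            wait_for_event (&flushes_in_flight);

          flushes_in_flight++;
          send_memory_object_data_return (page);
        }
    }

The limit only bounds outstanding requests; it does not by itself solve
question 2, where a pager has to allocate memory (or read blocks) in
order to make progress.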
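Continuing that sketch: the acknowledgement proposed above, modeled on
the existing memory_object_supply_completed.  No such RPC exists in GNU
Mach today, so the signature below is only a guess at what it could look
like:

    /* Hypothetical wakeup primitive, pairing with wait_for_event.  */
    extern void wakeup_event (void *event);

    /* Hypothetical kernel-side implementation of the proposed
       m_o_data_return_completed.  The pager sends it once the returned
       pages have reached backing store *and* their memory has been
       released with vm_deallocate, so the kernel only credits real
       progress.  */
    kern_return_t
    memory_object_data_return_completed (memory_object_control_t control,
                                         vm_offset_t offset,
                                         vm_size_t length,
                                         kern_return_t result)
    {
      /* One outstanding flush finished: let pageout_scan above send
         the next request.  */
      flushes_in_flight--;
      wakeup_event (&flushes_in_flight);
      return KERN_SUCCESS;
    }

With such an acknowledgement, "processing is done" becomes something the
kernel observes directly instead of inferring from deallocations.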
    i don't know how other systems compute that, but this is how they
      seem to do it as well
    no, i don't think so
    well, I'll try to invest a few days in understanding how pdflush
      works, to see if some ideas can be borrowed for the Hurd
    iirc, freebsd has thresholds in percent for each part of its cache
      (active, inactive, free, dirty)
    but I still think simple solutions work better, and using memory
      objects for the page cache is tremendously complex.
    the amount of free cache pages is generally sufficient to guarantee
      that much memory can be released at once if needed, without
      flushing anything
    yes
    but that's the whole point of the Mach VM
    and its greatest advance ..
    what, memory objects?
    yes
    using physical memory as a cache for anything, not just block
      buffers
    memory objects work great as a way to provide a shared image of
      objects between processes, but as a page cache they are overkill
      (IMHO).
    or, at least, in the way we're using them
    probably
    http://lwn.net/Articles/326552/
    this can help understand the problems we may have
    without better knowledge of the underlying devices, yes (e.g. not
      being able to send multiple requests to pagers that don't share
      the same disk)
    slpz: actually i'm not sure it's that overkill
    the linux vm uses struct vm_file to represent memory objects iirc
    there are many links between that structure and some vfs related
      subsystems
    when a system very actively uses the page cache, the kernel has to
      maintain a lot of objects to accurately describe the cache
      content
    you could consider this overkill at first too
    the mach way of doing it just implies some ipc messages instead of
      function calls; it's not that overkill to me
    the main problems are recursion (allocation while freeing, handling
      page faults in order to handle flushes, that sort of thing)
    struct file and struct address_space actually
    slpz: see struct address_space, it contains a set of function
      pointers that can help in understanding the linux pager interface
    they probably suffered from similar caveats and worked around them,
      adjusting that interface on the way
    but their strategy makes them able to treat the relationship between
      the page cache and the block devices in a really simple way,
      almost as a traditional storage cache.
    meanwhile, in the Mach+pager scenario, the relationship between a
      block in a file and its underlying storage becomes really blurry
    that simplicity is a huge advantage when flushing out data,
      especially when resources are scarce
    I think the idea of using abstract objects for the page cache loses
      sight of the point that we just want to avoid constantly accessing
      a slow device
    and breaking the tight relationship between the device and its cache
      makes things a lot harder
    this also manifests itself when flushing clean pages, in things like
      having a static maximum for the number of cached memory objects
    we shouldn't care about the number of objects, we just need to
      control the number of pages
    but as we need the pagers to flush pages, we need to keep alive a
      lot of control ports to them
    slpz: When m_o_data_return is called, once the memory manager no
      longer needs the supplied data, it should be deallocated using
      vm_deallocate. So this way pagers acknowledge the end of the
      flush.
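For reference, the function-pointer table pointed to above: abridged
from 2.6-era Linux (most fields omitted), each hook is roughly the moral
equivalent of a memory_object message, delivered as a plain function
call inside the kernel:

    /* Abridged from 2.6-era Linux <linux/fs.h>.  */
    struct page;
    struct file;
    struct writeback_control;
    struct address_space;

    struct address_space_operations {
            /* Write one dirty page-cache page back to storage:
               roughly what m_o_data_return asks an external pager
               to do.  */
            int (*writepage)(struct page *page,
                             struct writeback_control *wbc);

            /* Fill a page from storage: the m_o_data_request
               analogue.  */
            int (*readpage)(struct file *file, struct page *page);

            /* Write back a range of dirty pages at once.  */
            int (*writepages)(struct address_space *mapping,
                              struct writeback_control *wbc);

            int (*set_page_dirty)(struct page *page);

            /* ... many more fields ... */
    };

Because these are function calls within a single address space, Linux
can block the calling thread until a write completes; that is exactly
the synchronous behaviour that is hard to obtain with asynchronous
m_o_data_return messages.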