From 95878586ec7611791f4001a4ee17abf943fae3c1 Mon Sep 17 00:00:00 2001
From: "https://me.yahoo.com/a/g3Ccalpj0NhN566pHbUl6i9QF0QEkrhlfPM-#b1c14"
Date: Mon, 16 Feb 2015 20:08:03 +0100
Subject: rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn

---
 .../memory_object_model_vs_block-level_cache.mdwn | 514 ---------------------
 1 file changed, 514 deletions(-)
 delete mode 100644 open_issues/memory_object_model_vs_block-level_cache.mdwn

diff --git a/open_issues/memory_object_model_vs_block-level_cache.mdwn b/open_issues/memory_object_model_vs_block-level_cache.mdwn
deleted file mode 100644
index 22db9b86..00000000
--- a/open_issues/memory_object_model_vs_block-level_cache.mdwn
+++ /dev/null
@@ -1,514 +0,0 @@
[[!meta copyright="Copyright © 2012, 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_documentation open_issue_hurd open_issue_gnumach]]

[[!toc]]


# IRC, freenode, #hurd, 2012-02-14

    Open question: what do you think about dropping the memory object model and implementing a simple block-level cache?

[[microkernel/mach/memory_object]].

    slpz: AFAIK the memory object has more purposes than just caching; it's also used for passing chunks of data between processes, handling swap (which is similar to caching, but still slightly different), ...
    kilobug: user processes usually make their way to data with POSIX operations, so memory objects are only needed for mmap'ed files
    kilobug: and swap can be replaced by an in-kernel system, or could even still use the memory object
    slpz: memory objects are used for the page cache
    slpz: translators (especially diskfs-based ones) make heavy use of memory objects, and if "user processes" use POSIX semantics, Hurd processes (translators, pagers, ...) shouldn't be bound to POSIX
    braunr: and the page cache could be moved to a lower level, near the devices
    not likely
    well, it could, but then you'd still have the file system overhead
    kilobug: but the use of memory objects isn't compulsory; you can easily write an fs translator without implementing memory objects at all (except for mmap)
    a unified buffer/VM cache, as all modern systems have, is probably the most efficient approach
    braunr: I agree. I want to look at the *BSD/Linux VFS systems to see how much cache policy depends on the file system
    braunr: Are you aware of any good papers on this matter?
    netbsd UVM, the linux virtual memory system
    both a bit old but still relevant
    braunr: Thanks.
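For orientation, a rough sketch of the mmap path mentioned above, showing where the memory object enters the picture: a Hurd client asks the file's translator for its memory object with io_map(), then hands that object to the kernel with vm_map(). This is illustrative only; the argument lists and headers are quoted from memory and should be checked against the actual io.defs and Mach documentation, and error handling is minimal.

    #include <mach.h>
    #include <hurd.h>
    #include <hurd/io.h>
    #include <fcntl.h>

    /* Map a file read-only through its memory object (sketch).  */
    vm_address_t
    map_file_read_only (const char *path, vm_size_t size)
    {
      file_t file = file_name_lookup (path, O_RDONLY, 0);
      mach_port_t memobj_rd, memobj_wr;
      vm_address_t addr = 0;

      if (file == MACH_PORT_NULL)
        return 0;

      /* Ask the translator (the pager) for the memory objects backing
         the file.  */
      if (io_map (file, &memobj_rd, &memobj_wr))
        return 0;

      /* Map the read-only object; page faults on this region are
         resolved by the kernel asking the translator for the data.  */
      if (vm_map (mach_task_self (), &addr, size, 0 /* mask */,
                  TRUE /* anywhere */, memobj_rd, 0 /* offset */,
                  FALSE /* copy */, VM_PROT_READ, VM_PROT_READ,
                  VM_INHERIT_NONE))
        addr = 0;

      mach_port_deallocate (mach_task_self (), file);
      return addr;
    }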
    the problem in our case is that with FS and cache information in different contexts (kernel vs. translator), I find it hard to coordinate them
    that's why I thought about a block-level cache that GNU Mach could manage by itself
    I wonder how QNX deals with this
    the point of having a simple page cache is explicitly about not caring whether those pages are blocks or files or whatever
    the kernel (at least, mach) normally has all the accounting information it needs to implement its cache policy
    file system translators shouldn't cache much
    the pager interface could be refined, but it looks ok to me as it is
    Mach has the accounting info, but it's not able to purge the cache without coordination with the translators
    which is normal
    And this is a big problem when memory pressure increases, as it doesn't know for sure when memory is going to be freed
    Mach flushes its cache when it decides to, and sends back dirty pages if needed by the pager
    that's the case with every paging implementation
    the main difference is security with untrusted pagers
    but that's another issue
    but in a monolithic implementation, the kernel is able to force a chunk of cache memory to be freed without hoping for some other process to do the job
    that's not true
    they're not processes, they're threads, but the timing issue is the same
    see pdflush on linux
    no, it isn't.
    when memory is scarce, threads that request memory can either wait or immediately fail, and if they wait, they're usually woken by one of the vm threads once flushing is done
    a kernel thread can access all the information in the kernel, and synchronization is pretty easy.
    on mach, synchronization is done with messages; that's even easier than shared kernel locks
    with processes in different address spaces, resource coordination becomes really difficult
    and what kind of info would an external pager need when simply asked to take back its dirty pages?
    what resources?
    just take a look at the thread storm problem when GNU Mach needs to clean a bunch of pages
    Mach is big enough to correctly account memory
    there can be thread storms on monolithic systems
    that's a Mach issue, not a microkernel issue
    that's why linux limits the number of pdflush thread instances
    Mach can account memory, but it can't ensure when it will be freed by any means; it can do so to a lesser degree than a monolithic system
    again i disagree
    no system can guarantee when memory will be freed with paging
    a block-level cache can, for most situations
    slpz: why?
    slpz: or how, i mean?
    braunr: with a block-level page cache, GNU Mach would be able to flush dirty pages directly to the underlying device, without all the complexity and resource cost involved in an m_o_data_return message. It could also throttle the rate at which pages are cleaned, and do all of this while blocking new page allocations to deal with memory exhaustion cases.
    braunr: in the current state, when cleaning dirty pages, GNU Mach sends a bunch of m_o_data_return messages to the corresponding pagers, hoping they will do their job as soon and as fast as possible.
    memory is not really freed, but transformed from page cache into anonymous memory pertaining to the corresponding translator
    and GNU Mach never knows for sure when this memory is released, if it ever is.
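To make the above concrete, a schematic pager-side handler for m_o_data_return. This is a sketch, not the actual libpager code, and pager_write_pages() is a hypothetical stand-in for the translator's backing-store write. The point is that the returned pages arrive as anonymous memory in the pager's address space, and the kernel only truly gets the memory back at the final vm_deallocate().

    #include <mach.h>

    /* Hypothetical helper: write the pages out to the underlying device.  */
    extern void pager_write_pages (mach_port_t object, vm_offset_t offset,
                                   vm_address_t data, vm_size_t length);

    kern_return_t
    example_memory_object_data_return (mach_port_t object,
                                       mach_port_t control,
                                       vm_offset_t offset,
                                       vm_address_t data,
                                       vm_size_t length,
                                       boolean_t dirty,
                                       boolean_t kernel_copy)
    {
      if (dirty)
        /* On a diskfs translator this write may itself fault in metadata
           blocks: the allocation-while-freeing problem discussed below.  */
        pager_write_pages (object, offset, data, length);

      /* Only this releases the memory the kernel transferred to us; until
         it happens, Mach cannot tell whether the flush made progress.  */
      vm_deallocate (mach_task_self (), data, length);
      return KERN_SUCCESS;
    }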
    not being able to flush dirty pages synchronously is a big problem when you need to throttle memory usage
    and needing to allocate more memory when you're trying to free some (which is the case for the m_o_data_return mechanism) makes the problem even worse
    your idea of a block-level cache means in-kernel block drivers
    that's not the direction we're taking
    i agree flushing should be a synchronous process, which was one of the proposed improvements in the thread migration papers
    (they didn't achieve it, but thought about it for future work, so that the thread at the origin of the fault would handle it itself)
    but it should be possible to have kernel threads similar to pdflush and to throttle flush requests too
    again, i really think it's a mach bug, and having a buffer cache would be a step backward
    the real design issue is allocating memory while trying to free it, yes
    braunr: thread migration doesn't apply to asynchronous IPC, and the entire paging mechanism is implemented that way
    in fact, trying to do a synchronous m_o_data_return would trigger a deadlock for sure
    to achieve synchronous flushing with translators, the entire paging model must be redesigned
    It's true that I'm not very confident in the viability of user-space drivers
    at least, not for every device
    I know this goes against the current ideas for most ukernel designs, but if we want to achieve real-world functionality, I think some sacrifices must be made. Or at least a reasonable compromise.
    slpz: thread migration for paging requests implies synchronous RPC; we don't care much about the IPC layer there
    and it requires large changes to the VM code in addition, yes
    let's not talk about this, we don't have thread migration anyway :p
    except for the allocation-on-free-path issue, i really don't see how the current pager interface or the page cache creates problems wrt flushing ..
    monolithic systems also have that problem, with lower impact though, but still
    braunr: because, as it doesn't know when memory is really freed, 1) it just blindly sends a bunch of m_o_data_return messages to the pagers, usually overloading them (causing thread storms), and 2) it can't properly throttle new page requests to deal with resource exhaustion
    it does know when memory is really freed
    and yes, it blindly sends a bunch of requests; they can and should be throttled
    but dirty pages freed become indistinguishable from common anonymous chunks released, so it doesn't really know whether page flushes are actually working or not (i.e. it doesn't know how fast a device is processing write requests)
    memory is freed when the pager deallocates it
    the speed of the operation is irrelevant
    no system can rely on disk speed to guarantee correct page flushes
    disk or anything else
    requests can't be throttled if Mach doesn't know when they are being processed
    it can easily know that
    they are processed as soon as the request is sent from the kernel
    and processing is done when the pager acknowledges the end of the flush
    the memory backing the flushed pages should be released before acknowledging that, to avoid starting new requests too soon
    AFAIK pagers don't acknowledge the end of the flush
    well, that's where the interface should be refined
    Mach just sends the m_o_data_return and continues on its own
    that's why flushing should be synchronous
    are you sure about that, however?
    so the entire paging system needs a new design... :)
    pretty sure
    not a new design ..
    there is m_o_supply_completed, i don't see how difficult it would be to add m_o_data_return_completed
    it's not a small change, but not a difficult one either
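The proposed m_o_data_return_completed does not exist; patterned on the existing memory_object_supply_completed, it might look roughly like this (the parameters are guesses, for illustration only):

    #include <mach.h>

    /* Hypothetical pager -> kernel acknowledgment: the data returned for
       [offset, offset + length) has been written out and deallocated, so
       the kernel may account those pages as truly free and, if needed,
       issue further flush requests.  */
    kern_return_t
    memory_object_data_return_completed (mach_port_t memory_control,
                                         vm_offset_t offset,
                                         vm_size_t length,
                                         kern_return_t result);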
    i'm more worried about the allocation problem
    the default pager should probably be wired in memory
    maybe others
    let's suppose a case in which Mach needs to free memory due to an increase in memory pressure. vm_pageout_daemon starts running; clean pages are freed easily, but for each dirty one an m_o_data_return is sent. 1) when should this daemon stop sending m_o_data_return and start waiting for m_o_data_return_completed? 2) what happens if the translator needs to read new blocks to fulfill a write request (pretty common in ext2fs)?
    it should stop after an arbitrary limit is reached
    a reasonable one
    linux limits the number of pdflush threads for that reason, as i mentioned (to 8 iirc)
    the problem of reading blocks while flushing is what i'm worried about too, hence the need to wire that code
    well, i'm not sure it's needed
    again, a reasonable amount of free memory should be reserved for that at all times
    but the work for pdflush seems to be a lot easier, as it only deals directly with block devices (if I understood it correctly, I just started looking at it).
    i don't know how other systems compute that, but this is how they seem to do it as well
    no, i don't think so
    well, I'll try to invest a few days in understanding how pdflush works, to see if some ideas can be borrowed for the Hurd
    iirc, freebsd has thresholds in percent for each part of its cache (active, inactive, free, dirty)
    but I still think simple solutions work better, and using memory objects for the page cache is tremendously complex.
    the amount of free cache pages is generally sufficient to guarantee that much memory can be released at once if needed, without flushing anything
    yes, but that's the whole point of the Mach VM
    and its greatest advance ..
    what, memory objects?
    yes
    using physical memory as a cache for anything, not just block buffers
    memory objects work great as a way to provide a shared image of objects between processes, but as a page cache they are overkill (IMHO).
    or, at least, in the way we're using them
    probably
    http://lwn.net/Articles/326552/
    this can help in understanding the problems we may have without better knowledge of the underlying devices, yes (e.g. not being able to send multiple requests to pagers that don't share the same disk)
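A sketch of what the throttling discussed above could look like in the pageout path, in the spirit of Linux's bounded pdflush pool: cap the number of in-flight flush requests and wait for completions before issuing more. All names are hypothetical, and it presumes a completion notification such as the m_o_data_return_completed sketched earlier.

    #include <mach.h>

    #define FLUSH_IN_FLIGHT_MAX 8  /* arbitrary, like Linux's 8 pdflush threads */

    struct dirty_batch;
    extern void send_data_return (struct dirty_batch *batch);
    extern void wait_for_flush_completion (void);
    extern void wakeup_pageout_daemon (void);

    static unsigned int flush_in_flight;  /* protected by the page queue lock */

    /* Called by the pageout daemon for each batch of dirty pages.  */
    static void
    pageout_flush_batch (struct dirty_batch *batch)
    {
      /* Block instead of flooding pagers with m_o_data_return messages;
         flush_completed() below wakes us up.  */
      while (flush_in_flight >= FLUSH_IN_FLIGHT_MAX)
        wait_for_flush_completion ();

      flush_in_flight++;
      send_data_return (batch);  /* asynchronous m_o_data_return */
    }

    /* Called when a pager acknowledges that a flush finished and the
       backing pages were deallocated.  */
    static void
    flush_completed (void)
    {
      flush_in_flight--;
      wakeup_pageout_daemon ();
    }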
    slpz: actually i'm not sure it's that overkill
    the linux vm uses struct vm_file to represent memory objects iirc
    there are many links between that structure and some vfs-related subsystems
    when a system very actively uses the page cache, the kernel has to maintain a lot of objects to accurately describe the cache content
    you could consider this overkill at first too
    the mach way of doing it just implies some ipc messages instead of function calls; it's not that overkill to me
    the main problems are recursion (allocation while freeing, handling page faults in order to handle flushes, that sort of thing)
    struct file and struct address_space actually
    slpz: see struct address_space, it contains a set of function pointers that can help in understanding the linux pager interface
    they probably suffered from similar caveats and worked around them, adjusting that interface on the way
    but their strategy lets them treat the relationship between the page cache and the block devices in a really simple way, almost as a traditional storage cache.
    meanwhile, in the Mach+pager scenario, the relationship between a block in a file and its underlying storage becomes really blurry
    this is a huge advantage when flushing out data, especially when resources are scarce
    I think the idea of using abstract objects for the page cache loses sight a bit of the point that we just want to avoid constantly accessing a slow device
    and breaking the tight relationship between the device and its cache makes things a lot harder
    this also manifests itself when flushing clean pages, in things like having a static maximum for cached memory objects
    we shouldn't care about the number of objects, we just need to control the number of pages
    but as we need the pager to flush pages, we need to keep a lot of control ports to them alive
    slpz: When m_o_data_return is called, once the memory manager no longer needs the supplied data, it should be deallocated using vm_deallocate. So this is how pagers acknowledge the end of the flush.


# IRC, freenode, #hurd, 2013-08-26

    < Spyro> Ok, so
    < Spyro> idiot question: in a nutshell, what is a memory object?
    < Spyro> and how is swapping/paging handled?
    < braunr> Spyro: a memory object is how the virtual memory system views a file
    < braunr> so it's a sequence of bytes with a length
    < braunr> "swapping" is just a special case of paging that applies to anonymous objects
    < braunr> (which are named so because they're not associated with a file and have no name)
    < Spyro> Who creates a memory object, and when?
    < braunr> pagers create memory objects when needed, e.g. when you open a file
    < Spyro> and does this apply both to mmap opens and to regular I/O opens, as for read() and write()?
    < braunr> basically, all file systems capable of handling mmap requests and/or caching in physical memory are pagers
    < braunr> yes
    < braunr> read/write will go through the page cache when possible
    < Spyro> and who owns the page cache?
    < Spyro> also, who decides what pages to evict to swap/file if physical memory gets tight?
    < braunr> the kernel
    < braunr> that's one of the things that make mach a hybrid
    < Spyro> so the kernel owns the page cage?
    < Spyro> ...fml
    < Spyro> cache!
    < braunr> yes
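A small user-side illustration of the anonymous objects mentioned above, using only bedrock Mach calls: vm_allocate() creates memory backed by an anonymous object, so under memory pressure its pages can only go to the default pager, i.e. swap. A sketch for illustration only.

    #include <mach.h>
    #include <string.h>

    int
    main (void)
    {
      vm_address_t addr;
      vm_size_t size = 4 * vm_page_size;

      /* No memory object is named here: the kernel backs the region with
         an internal, anonymous object.  */
      if (vm_allocate (mach_task_self (), &addr, size, TRUE /* anywhere */))
        return 1;

      /* Touching the pages makes them resident; if the pageout daemon
         later evicts them, they go to the default pager (swap), since
         there is no file to flush them to.  */
      memset ((void *) addr, 0xaa, size);

      vm_deallocate (mach_task_self (), addr, size);
      return 0;
    }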


## IRC, freenode, #hurd, 2013-08-27

    < Spyro> so braunr: So, who creates the memory object, and how does it get populated?
    < Spyro> and how does a process accessing a file get hooked up to the memory object?
    < braunr> Spyro: i told you, pagers create memory objects
    < braunr> memory objects are how the VM system views files, so they're populated from the contents of files
    < braunr> either true files or virtual files such as in /proc
    < braunr> Spyro: processes don't directly access memory objects unless they memory map them with vm_map()
    < braunr> pagers (basically = file systems) do
    ok, so how is a pager/fs involved in handling a fault?


## IRC, freenode, #hurd, 2013-08-28

    Spyro: each object is linked to a pager
    Spyro: when a fault occurs, the kernel looks up the VM map (the kernel map or a user one), then the address in this map, then the map entry, and checks access and lots of other details
    ok, so it's pager -> object -> vmem
    ?
    Spyro: then it finds the object mapped at that address (similar to how a file is mapped with mmap)
    from the object, it finds the pager
    ok
    and asks the pager about the data at the appropriate offset
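The fault path just described, condensed into a self-contained pseudo-C outline. The structures are simplified stand-ins for Mach's vm_map, vm_map_entry and vm_object; this is orientation material, not actual GNU Mach code.

    typedef unsigned long vm_offset_t;   /* stand-in types */
    typedef unsigned int vm_prot_t;

    struct vm_object;                    /* holds a pager port and a page list */

    struct vm_map_entry
    {
      vm_offset_t start, end;            /* address range covered by this entry */
      vm_prot_t protection;
      struct vm_object *object;          /* backing object */
      vm_offset_t offset;                /* offset of 'start' within the object */
      struct vm_map_entry *next;
    };

    struct vm_map
    {
      struct vm_map_entry *entries;      /* sorted list of entries */
    };

    /* Resolve a fault: find the entry, check access, compute the offset
       in the backing object; the real kernel then asks the pager with
       memory_object_data_request() and waits for the data supply.  */
    int
    fault_sketch (struct vm_map *map, vm_offset_t addr, vm_prot_t access)
    {
      struct vm_map_entry *e;

      for (e = map->entries; e != 0; e = e->next)
        if (addr >= e->start && addr < e->end)
          {
            if ((e->protection & access) != access)
              return -1;                 /* protection failure */

            vm_offset_t off = e->offset + (addr - e->start);
            /* ... send memory_object_data_request (pager, off, ...),
               block until memory_object_data_supply arrives, then enter
               the page into the physical map and resume the thread.  */
            (void) off;
            return 0;
          }

      return -1;                         /* no entry covers the address */
    }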
    so how does a user process do normal file I/O? is faulting just a special case of it?
    it's completely separate
    eww
    normal I/O is done with message passing
    the hurd io interface
    ok
    so who talks to who on a file I/O?
    a client (e.g. cat) talks to a file system server (e.g. ext2fs)
    ok so
    it's client to the pager for regular file I/O?
    Spyro: i don't understand the question
    Spyro: it's client to server, and the server might not be a pager
    ok
    just trying to figure out the difference between paging/faulting and regular I/O
    regular I/O is just message passing
    page fault handling is dealt with by pagers
    and I have a hunch that the fs/pager is involved somehow in both, because the server is the source of the data
    I'm getting a headache
    nalaginrut: a server like ext2fs is both a file server and a pager
    oh!
    oh btw, does a file server make use of memory objects for caching?
    Spyro: yes
    or rather, can it?
    does it have to?
    memory objects are for caching, and thus for page faults
    Spyro: for caching, it's a requirement
    for I/O, it's not
    you could have I/O without memory objects
    ok
    so how does the pager/fileserver use memory objects for caching? does it just map and write to them?
    basically yes, but there is a complete protocol with the kernel for that
    http://www.gnu.org/software/hurd/gnumach-doc/External-Memory-Management.html#External-Memory-Management
    heh, lucky guess
    ty
    I am in way over my head here btw
    zero experience with microkernels in practice
    it's not trivial
    that's not a microkernel thing at all
    that's how it works in monolithic kernels too
    i recommend the netbsd uvm thesis
    there are nice pictures describing the vm system
    derrr...preacious?
    wow
    just ignore the anonymous memory handling part, which is specific to uvm
    @_@
    the rest is common to practically all VM systems out there
    I know about the linux page cache
    well, it's almost the same
    with memory objects being the same thing as files in a page cache?
    memory objects are linux "address spaces"
    and address spaces are how the linux mm views a file, yes
    derp
    ...
    um...
    struct vm_page == struct page
    * Spyro first must learn what an address_space is
    struct vm_map == struct mm_struct
    struct vm_map_entry == struct vm_area_struct
    * Spyro isn't a linux kernel vm expert either
    struct vm_object == struct address_space
    roughly
    details vary a lot
    and what's an address_space ?
    11:41 < braunr> and address spaces are how the linux mm views a file, yes
    ok
    see include/linux/fs.h
    struct address_space_operations is the pager interface
    * Spyro should look at the linux kernel sources perhaps, unless you have an easier reference
    embarrassingly, RVR hired me as an editor for the linux-mm wiki
    I should know this stuff
    see http://darnassus.sceen.net/~rbraun/design_and_implementation_of_the_uvm_virtual_memory_system.pdf
    page 42
    page 66 for another nice view
    i wouldn't recommend using the linux source as a reference
    it's very complicated, filled with a lot of code dealing with details
    lmao
    and linux guys have a habit of choosing crappy names
    I was only going to
    stoppit
    except for "linux" and "git"
    ...make me laugh any more and I'll need rib surgery
    laugh ?
    complicated and crappy
    seriously, "address space" for a file is very very confusing
    oh I agree with that
    yes, names are crappy
    and the code is very complicated
    it took me half an hour to find where readahead is done once
    and i'm still not sure it was the right code
    so in linkern, there is an address_space for each cached file?
    takes me 30 seconds in netbsd ..
    yes
    eww
    yeah, BAD name
    but thanks for the explanation
    now I finally know what an address space is
    many linux core developers admit they don't care much about names
    so, in hurd, a memory object is to hurd what an address_space is to linux?
    yes
    not to hurd
    ok
    to mach
    you know what I mean
    :P
    easier than for linux, I can tell you that much
    and the bsd vm system is a stripped version of the mach vm
    ok
    that's why i think it's important to note it
    good, I learned something about the linux vm... from the mach guys
    this is funny
    linux did too
    there is a paper about linux page eviction that directly borrows the mach algorithm and improves it
    mach is the historic motivation behind mmap on posix
    oh nice!
    but yes, linux picked a shitty name
    is all that clearer to you ?
    I think that address_space connection was a magic bolt of understanding
    and do you see how I/O and paging are mostly unrelated ?
    almost
    but how does a file I/O take advantage of caching by a memory object?
    does the file server just nudge the core for a hint?
    the file system copies from the memory object
    * Spyro noddles
    I think I understand a bit better now
    it's message passing
    but I have too much to digest already
    memory copying
    if the memory is already there, good; if not, the kernel will ask the file system to bring the data
    if message passing uses zero copy, data retrieval can be deferred until the client actually accesses it
    which is a fancy way of saying demand paging? :P
    it's always demand paging
    what i mean is that the file system won't fetch data as soon as it copies memory
    but when this data is actually needed by the client
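A sketch of "the file system copies from the memory object" for a read-style RPC. Everything here is schematic: file_map_window() is a hypothetical helper, and the real libdiskfs read path is considerably more involved; the point is only that the server copies out of the same object that serves as the page cache.

    #include <mach.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical helper: map a window of the file's memory object into
       the server's address space.  */
    extern vm_address_t file_map_window (mach_port_t memobj,
                                         vm_offset_t offset, vm_size_t len);

    kern_return_t
    example_io_read (mach_port_t memobj, vm_offset_t offset,
                     vm_size_t amount, char **data, vm_size_t *len)
    {
      /* If the pages are already cached, this costs nothing extra; if
         not, the faults taken by the memcpy below make the kernel pull
         the data in through the pager.  */
      vm_address_t window = file_map_window (memobj, offset, amount);

      *data = malloc (amount);
      if (*data == 0)
        return KERN_RESOURCE_SHORTAGE;

      /* Copy into the reply buffer; with out-of-line (zero copy)
         transfer, the fetch could even be deferred until the client
         actually touches the data.  */
      memcpy (*data, (void *) window, amount);
      *len = amount;

      vm_deallocate (mach_task_self (), window, amount);
      return KERN_SUCCESS;
    }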
    uh...
    what's a precious page?
    let me check quickly
    If precious is FALSE, the kernel treats the data as temporary and may throw it away if it hasn't been changed. If the precious value is TRUE, the kernel treats its copy as a data repository and promises to return it to the manager
    basically, it's used when you want the kernel to keep cached data in memory
    the cache becomes a lossless container for such pages
    the kernel may flush them, but not evict them
    what's the difference?
    imagine a ramfs
    point made
    ok
    would be pretty hard to flush something that doesn't have a backing store
    that was quick :)
    well
    the normal backing store for anonymous memory is the default pager
    aka swap
    eww
    but if you want your data *either* in swap or in memory and never in both
    it may be useful

--
cgit v1.2.3
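To close the precious-page discussion: a sketch of a ramfs-like pager supplying data with precious = TRUE, so the kernel must hand the pages back with m_o_data_return instead of silently dropping them; the data then lives either in memory or in the pager, never duplicated in swap. The call is patterned on GNU Mach's memory_object_data_supply, but the exact argument list is quoted from memory and should be checked against the actual .defs; ramfs_get_pages() is a hypothetical helper.

    #include <mach.h>

    /* Hypothetical helper: the pager's own copy of the pages.  */
    extern vm_address_t ramfs_get_pages (vm_offset_t offset, vm_size_t length);

    kern_return_t
    example_data_request (mach_port_t control, vm_offset_t offset,
                          vm_size_t length, vm_prot_t desired_access)
    {
      vm_address_t data = ramfs_get_pages (offset, length);

      /* precious = TRUE: the kernel's copy becomes a data repository; it
         may flush (return) these pages to us, but not just evict them.  */
      return memory_object_data_supply (control, offset, data, length,
                                        VM_PROT_NONE /* lock_value */,
                                        TRUE /* precious */,
                                        MACH_PORT_NULL /* reply_to */);
    }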