author     https://me.yahoo.com/a/g3Ccalpj0NhN566pHbUl6i9QF0QEkrhlfPM-#b1c14 <diana@web>  2015-02-16 20:08:03 +0100
committer  GNU Hurd web pages engine <web-hurd@gnu.org>  2015-02-16 20:08:03 +0100
commit     95878586ec7611791f4001a4ee17abf943fae3c1 (patch)
tree       847cf658ab3c3208a296202194b16a6550b243cf /open_issues/memory_object_model_vs_block-level_cache.mdwn
parent     8063426bf7848411b0ef3626d57be8cb4826715e (diff)

    rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn
Diffstat (limited to 'open_issues/memory_object_model_vs_block-level_cache.mdwn')
-rw-r--r--  open_issues/memory_object_model_vs_block-level_cache.mdwn  514
1 file changed, 0 insertions(+), 514 deletions(-)
diff --git a/open_issues/memory_object_model_vs_block-level_cache.mdwn b/open_issues/memory_object_model_vs_block-level_cache.mdwn
deleted file mode 100644
index 22db9b86..00000000
--- a/open_issues/memory_object_model_vs_block-level_cache.mdwn
+++ /dev/null
@@ -1,514 +0,0 @@
-[[!meta copyright="Copyright © 2012, 2013 Free Software Foundation, Inc."]]
-
-[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
-id="license" text="Permission is granted to copy, distribute and/or modify this
-document under the terms of the GNU Free Documentation License, Version 1.2 or
-any later version published by the Free Software Foundation; with no Invariant
-Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
-is included in the section entitled [[GNU Free Documentation
-License|/fdl]]."]]"""]]
-
-[[!tag open_issue_documentation open_issue_hurd open_issue_gnumach]]
-
-[[!toc]]
-
-
-# IRC, freenode, #hurd, 2012-02-14
-
- <slpz> Open question: what do you think about dropping the memory object
- model and implementing a simple block-level cache?
-
-[[microkernel/mach/memory_object]].
-
-    <kilobug> slpz: AFAIK the memory object has more purposes than just
-      cache, it's also used for passing chunks of data between processes,
-      handling swap (which is similar to cache, but still slightly
-      different), ...
-    <slpz> kilobug: user processes usually reach their data through POSIX
-      operations, so memory objects are only needed for mmap'ed files
-    <slpz> kilobug: and swap can be replaced by an in-kernel system, or
-      could even still use the memory object
- <braunr> slpz: memory objects are used for the page cache
-    <kilobug> slpz: translators (especially diskfs based) make heavy use of
-      memory objects, and if "user processes" use POSIX semantics, Hurd
-      processes (translators, pagers, ...) shouldn't be bound to POSIX
- <slpz> braunr: and page cache could be moved to a lower level, near to the
- devices
- <braunr> not likely
- <braunr> well, it could, but then you'd still have the file system overhead
-    <slpz> kilobug: but the use of memory objects is not compulsory, you can
-      easily write a fs translator without implementing memory objects at
-      all (except for mmap)
- <braunr> a unified buffer/VM cache as all modern systems have is probably
- the most efficient approach
-    <slpz> braunr: I agree. I want to look at *BSD/Linux vfs systems to see
-      how much cache policy depends on the filesystem
- <slpz> braunr: Are you aware of any good papers on this matter?
- <braunr> netbsd UVM, the linux virtual memory system
-    <braunr> both a bit old but still relevant
- <slpz> braunr: Thanks.
-    <slpz> the problem in our case is that, with FS and cache information
-      living in different contexts (kernel vs. translator), I find it hard
-      to coordinate them.
-    <slpz> that's why I thought about a block-level cache that GNU Mach
-      could manage by itself
- <slpz> I wonder how QNX deals with this
-    <braunr> the point of having a simple page cache is explicitly about
-      not caring whether those pages are blocks or files or whatever
- <braunr> the kernel (at least, mach) normally has all the accounting
- information it needs to implement its cache policy
- <braunr> file system translators shouldn't cache much
- <braunr> the pager interface could be refined, but it looks ok to me as it
- is
- <slpz> Mach has the accounting info, but it's not able to purge the cache
- without coordination with translators
- <braunr> which is normal
- <slpz> And this is a big problem when memory pressure increases, as it
- doesn't know for sure when memory is going to be freed
- <braunr> Mach flushes its cache when it decides to, and sends back dirty
- pages if needed by the pager
- <braunr> that's the case with every paging implementation
- <braunr> the main difference is security with untrusted pagers
- <braunr> but that's another issue
-    <slpz> but in a monolithic implementation, the kernel is able to force
-      a chunk of cache memory to be freed without hoping for another
-      process to do the job
- <braunr> that's not true
-    <braunr> they're not processes, they're threads, but the timing issue
-      is the same
- <braunr> see pdflush on linux
- <slpz> no, it isn't.
- <braunr> when memory is scarce, threads that request memory can either wait
- or immediately fail, and if they wait, they're usually woken by one of
- the vm threads once flushing is done
- <slpz> a kernel thread can access all the information in the kernel, and
- synchronization is pretty easy.
- <braunr> on mach, synchronization is done with messages, that's even easier
- than shared kernel locks
- <slpz> with processes in different spaces, resource coordination becomes
- really difficult
- <braunr> and what kind of info would an external pager need when simply
- asked to take back its dirty pages
- <braunr> what resources ?
- <slpz> just take a look at the thread storm problem when GNU Mach needs to
- clean a bunch of pages
- <braunr> Mach is big enough to correctly account memory
- <braunr> there can be thread storms on monolithic systems
- <braunr> that's a Mach issue, not a microkernel issue
- <braunr> that's why linux limits the number of pdflush thread instances
-    <slpz> Mach can account memory, but it can't assure by any means when
-      it will be freed, even less so than a monolithic system
- <braunr> again i disagree
- <braunr> no system can guarantee when memory will be freed with paging
- <slpz> a block level cache can, for most situations
- <braunr> slpz: why ?
- <braunr> slpz: or how i mean ?
-    <slpz> braunr: with a block-level page cache, GNU Mach should be able to
-      flush dirty pages directly to the underlying device without all the
-      complexity and resource cost involved in an m_o_data_return message.
-      It can also throttle the rate at which pages are being cleaned, and do
-      all this while blocking new page allocations to deal with memory
-      exhaustion cases.
-    <slpz> braunr: in the current state, when cleaning dirty pages, GNU Mach
-      sends a bunch of m_o_data_returns to the corresponding pagers, hoping
-      they will do their job as soon and as fast as possible.
-    <slpz> memory is not really freed, but transformed from page cache into
-      anonymous memory belonging to the corresponding translator
- <slpz> and GNU Mach never knows for sure when this memory is released, if
- it ever is.
- <slpz> not being able to flush dirty pages synchronously is a big problem
- when you need to throttle memory usage
-    <slpz> and needing to allocate more memory when you're trying to free
-      it (which is the case for the m_o_data_return mechanism) makes the
-      problem even worse
- <braunr> your idea of a block level cache means in kernel block drivers
- <braunr> that's not the direction we're taking
- <braunr> i agree flushing should be a synchronous process, which was one of
- the proposed improvements in the thread migration papers
-    <braunr> (they didn't achieve it but thought about it for future work,
-      so that the thread at the origin of the fault would handle it itself)
- <braunr> but it should be possible to have kernel threads similar to
- pdflush and throttle flush requests too
- <braunr> again, i really think it's a mach bug, and having a buffer cache
- would be stepping backward
- <braunr> the real design issue is allocating memory while trying to free
- it, yes
- <slpz> braunr: thread migration doesn't apply to asynchronous IPC, and the
- entire paging mechanism is implemented this way
- <slpz> in fact, trying to do a synchronous m_o_data_return will trigger a
- deadlock for sure
- <slpz> to achieve synchronous flushing with translators, the entire paging
- model must be redesigned
- <slpz> It's true that I'm not very confident of the viability of user space
- drivers
- <slpz> at least, not for every device
-    <slpz> I know this is against the current ideas for most ukernel
-      designs, but if we want to achieve real-world functionality, I think
-      some sacrifices must be made. Or at least a reasonable compromise.
- <braunr> slpz: thread migration for paging requests implies synchronous
- RPC, we don't care much about the IPC layer there
- <braunr> and it requires large changes of the VM code in addition, yes
- <braunr> let's not talk about this, we don't have thread migration anyway
- :p
- <braunr> except the allocation-on-free-path issue, i really don't see how
- the current pager interface or the page cache creates problems wrt
- flushing ..
- <braunr> monolithic systems also have that problem, with lower impacts
- though, but still
- <slpz> braunr: because as it doesn't know when memory is really freed, 1)
- it just blindly sends a bunch of m_o_data_return to the pagers, usually
- overloading them (causing thread storms), and 2) it can't properly
- throttle new page requests to deal with resource exhaustion
- <braunr> it does know when memory is really freed
-    <braunr> and yes, it blindly sends a bunch of requests; they can and
-      should be throttled
-    <slpz> but freed dirty pages become indistinguishable from common
-      anonymous chunks being released, so it doesn't really know whether
-      page flushes are actually working (i.e. it doesn't know how fast a
-      device is processing write requests)
- <braunr> memory is freed when the pager deallocates it
- <braunr> the speed of the operation is irrelevant
- <braunr> no system can rely on disk speed to guarantee correct page flushes
- <braunr> disk or anything else
- <slpz> requests can't be throttled if Mach doesn't know when they are being
- processed
- <braunr> it can easily know it
- <braunr> they are processed as soon as the request is sent from the kernel
- <braunr> and processing is done when the pager acknowledges the end of the
- flush
-    <braunr> memory backing the flushed pages should be released before
-      acknowledging that, to avoid starting new requests too soon
-    <slpz> AFAIK pagers don't acknowledge the end of the flush
- <braunr> well that's where the interface should be refined
- <slpz> Mach just sends the m_o_data_return and continues on its own
-    <braunr> that's why flushing should be synchronous
- <braunr> are you sure about that however ?
- <slpz> so the entire paging system needs a new design... :)
- <slpz> pretty sure
- <braunr> not a new design ..
- <braunr> there is m_o_supply_completed, i don't see how difficult it would
- be to add m_o_data_return_completed
- <braunr> it's not a small change, but not a difficult one either
- <braunr> i'm more worried about the allocation problem
- <braunr> the default pager should probably be wired in memory
- <braunr> maybe others
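-
-As an illustration of the refinement suggested here, the C prototype for
-such a completion message could look roughly as follows, modeled on
-memory_object_supply_completed. This routine does not exist in GNU Mach; its
-name and parameter list are guesswork:
-
-    /* Hypothetical pager -> kernel notification; NOT part of GNU Mach.
-       The pager would send it once it has written out the data from a
-       previous m_o_data_return covering [offset, offset + length) and
-       released the backing memory.  */
-    kern_return_t
-    memory_object_data_return_completed (memory_object_t object,
-                                         memory_object_control_t control,
-                                         vm_offset_t offset,
-                                         vm_size_t length,
-                                         kern_return_t result);
-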
-    <slpz> let's suppose a case in which Mach needs to free memory due to an
-      increase in memory pressure. vm_pageout_daemon starts running, clean
-      pages are freed easily, but for each dirty one an m_o_data_return is
-      sent. 1) when should this daemon stop sending m_o_data_return and
-      start waiting for m_o_data_return_completed? 2) what happens if the
-      translator needs to read new blocks to fulfill a write request (pretty
-      common in ext2fs)?
- <braunr> it should stop after an arbitrary limit is reached
- <braunr> a reasonable one
- <braunr> linux limits the number of pdflush threads for that reason as i
- mentioned (to 8 iirc)
- <braunr> the problem of reading blocks while flushing is what i'm worried
- about too, hence the need to wire that code
-    <braunr> well, i'm not sure it's needed
-    <braunr> again, a reasonable amount of free memory should be reserved
-      for that at all times
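-
-A minimal sketch of the throttled pageout loop being discussed; every
-function name and the limit below are invented for illustration:
-
-    /* Invented illustration of a throttled pageout scan.  */
-    #define FLUSH_MAX_INFLIGHT 8  /* cf. Linux capping its pdflush threads */
-
-    static unsigned int flush_inflight;
-
-    static void
-    pageout_scan (void)
-    {
-      struct vm_page *page;
-
-      while (memory_is_scarce () && (page = next_dirty_page ()) != NULL)
-        {
-          if (flush_inflight == FLUSH_MAX_INFLIGHT)
-            {
-              /* Arbitrary but reasonable limit reached: stop sending
-                 m_o_data_return and wait for a pager to acknowledge a
-                 completed flush before queueing more.  */
-              wait_for_flush_completion ();
-              flush_inflight--;
-            }
-          flush_inflight++;
-          send_data_return (page);  /* m_o_data_return to the page's pager */
-        }
-    }
-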
- <slpz> but the work for pdflush seems to be a lot easier, as it only deals
- directly with block devices (if I understood it correctly, I just started
- looking at it).
-    <braunr> i don't know how other systems compute that, but this is how
-      they seem to do it as well
- <braunr> no, i don't think so
-    <slpz> well, I'll try to invest a few days understanding how pdflush
-      works, to see if some ideas can be borrowed for the Hurd
- <braunr> iirc, freebsd has thresholds in percent for each part of its cache
- (active, inactive, free, dirty)
- <slpz> but I still think simple solutions work better, and using the memory
- object for page cache is tremendously complex.
- <braunr> the amount of free cache pages is generally sufficient to
- guarantee much memory can be released at once if needed, without flushing
- anything
- <braunr> yes but that's the whole point of the Mach VM
- <braunr> and its greatest advance ..
- <slpz> what, memory objects?
- <braunr> yes
- <braunr> using physical memory as a cache for anything, not just block
- buffers
-    <slpz> memory objects work great as a way to provide a shared image of
-      objects between processes, but as a page cache they are overkill
-      (IMHO).
- <braunr> probably
- <braunr> http://lwn.net/Articles/326552/
-    <braunr> this can help understand the problems we may have without
-      better knowledge of the underlying devices, yes
- <braunr> (e.g. not being able to send multiple requests to pagers that
- don't share the same disk)
- <braunr> slpz: actually i'm not sure it's that overkill
- <braunr> the linux vm uses struct vm_file to represent memory objects iirc
- <braunr> there are many links between that structure and some vfs related
- subsystems
- <braunr> when a system very actively uses the page cache, the kernel has to
- maintain a lot of objects to accurately describe the cache content
- <braunr> you could consider this overkill at first too
- <braunr> the mach way of doing it just implies some ipc messages instead of
- function calls, it's not that overkill for me
- <braunr> the main problems are recursion (allocation while freeing,
- handling page faults in order to handle flushes, that sort of things)
- <braunr> struct file and struct address_space actually
- <braunr> slpz: see struct address_space, it contains a set of function
- pointers that can help understanding the linux pager interface
-    <braunr> they probably suffered from similar caveats and worked around
-      them, adjusting that interface on the way
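-
-For reference, an abridged version of that structure as found in Linux
-kernels of that era (include/linux/fs.h); only a few of its many operations
-are shown:
-
-    /* Abridged from include/linux/fs.h: the Linux analogue of a pager
-       interface, attached to each cached file's address_space.  */
-    struct address_space_operations {
-        int (*writepage)(struct page *page, struct writeback_control *wbc);
-        int (*readpage)(struct file *, struct page *);
-        int (*writepages)(struct address_space *, struct writeback_control *);
-        int (*set_page_dirty)(struct page *page);
-        int (*readpages)(struct file *filp, struct address_space *mapping,
-                         struct list_head *pages, unsigned nr_pages);
-        /* ... */
-    };
-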
-    <slpz> but their strategy makes it possible to treat the relationship
-      between the page cache and the block devices in a really simple way,
-      almost as a traditional storage cache.
- <slpz> meanwhile on Mach+pager scenario, the relationship between a block
- in a file and its underlying storage becomes really blurry
-    <slpz> this is a huge advantage when flushing out data, especially when
-      resources are scarce
-    <slpz> I think the idea of using abstract objects for the page cache
-      loses sight a bit of the point that we just want to avoid constantly
-      accessing a slow device
-    <slpz> and breaking the tight relationship between the device and its
-      cache makes things a lot harder
-    <slpz> this also manifests itself when flushing clean pages, in things
-      like having a static maximum for cached memory objects
- <slpz> we shouldn't care about the number of objects, we just need to
- control the number of pages
- <slpz> but as we need the pager to flush pages, we need to keep alive a lot
- of control ports to them
-    <mcsim> slpz: When m_o_data_return is called, once the memory manager
-      no longer needs the supplied data, it should be deallocated using
-      vm_deallocate. This is the way pagers acknowledge the end of a flush.
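-
-A simplified sketch of the pager side of what mcsim describes; the shape
-follows the memory_object interface, but the helper and the error handling
-are invented:
-
-    /* Invented example of a pager handling m_o_data_return: write the
-       dirty data out, then vm_deallocate the pages the kernel supplied.
-       Only that deallocation actually returns the memory to the system,
-       so it acts as the implicit acknowledgement of the flush.  */
-    kern_return_t
-    my_pager_data_return (memory_object_t object,
-                          memory_object_control_t control,
-                          vm_offset_t offset,
-                          pointer_t data,
-                          vm_size_t length,
-                          boolean_t dirty,
-                          boolean_t kernel_copy)
-    {
-      if (dirty)
-        write_to_backing_store (offset, (void *) data, length); /* invented */
-
-      vm_deallocate (mach_task_self (), (vm_address_t) data, length);
-      return KERN_SUCCESS;
-    }
-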
-
-
-# IRC, freenode, #hurd, 2013-08-26
-
- < Spyro> Ok, so
- < Spyro> idiot question: in a nutshell, what is a memory object?
- < Spyro> and how is swapping/paging handled?
- < braunr> Spyro: a memory object is how the virtual memory system views a
- file
- < braunr> so it's a sequence of bytes with a length
- < braunr> "swapping" is just a special case of paging that applies to
- anonymous objects
- < braunr> (which are named so because they're not associated with a file
- and have no name)
- < Spyro> Who creates a memory object, and when?
- < braunr> pagers create memory objects when needed, e.g. when you open a
- file
- < Spyro> and this applies both to mmap opens as well as regular I/O opens
- as in for read() and write()?
- < braunr> basically, all file systems capable of handling mmap requests
- and/or caching in physical memory are pagers
- < braunr> yes
- < braunr> read/write will go through the page cache when possible
- < Spyro> and who owns the page cache?
- < Spyro> also, who decides what pages ot evict to swap/file if physical
- memory gets tight?
- < braunr> the kernel
- < braunr> that's one of the things that make mach a hybrid
- < Spyro> so the kernel owns the page cage?
- < Spyro> ...fml
- < Spyro> cache!
- < braunr> yes
-
-
-## IRC, freenode, #hurd, 2013-08-27
-
- < Spyro> so braunr: So, who creates the memory object, and how does it get
- populated?
- < Spyro> and how does a process accessing a file get hooked up to the
- memory object?
- < braunr> Spyro: i told you, pagers create memory objects
- < braunr> memory objects are how the VM system views files, so they're
- populated from the content of files
- < braunr> either true files or virtual files such as in /proc
- < braunr> Spyro: processes don't directly access memory objects unless
- memory mapping them with vm_map()
- < braunr> pagers (basically = file systems) do
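-
-For illustration, mapping a memory object into a task with vm_map() looks
-roughly like this (GNU Mach interface; memobj stands for a memory object
-port obtained from a pager, and error handling is omitted):
-
-    vm_address_t addr = 0;
-    kern_return_t kr;
-
-    kr = vm_map (mach_task_self (),  /* target task: ourselves */
-                 &addr,              /* in/out: chosen address */
-                 4 * vm_page_size,   /* size of the mapping */
-                 0,                  /* alignment mask */
-                 TRUE,               /* anywhere: let the kernel pick */
-                 memobj,             /* memory object from the pager */
-                 0,                  /* offset within the object */
-                 FALSE,              /* copy: share instead of copying */
-                 VM_PROT_READ,       /* current protection */
-                 VM_PROT_READ | VM_PROT_WRITE,  /* maximum protection */
-                 VM_INHERIT_NONE);   /* inheritance */
-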
- <Spyro> ok, so how is a pager/fs involved in handling a fault?
-
-
-## IRC, freenode, #hurd, 2013-08-28
-
- <braunr> Spyro: each object is linked to a pager
- <braunr> Spyro: when a fault occurs, the kernel looks up the VM map (kernel
- or a user one), and the address in this map, then the map entry, checks
- access and lots of other details
- <Spyro> ok, so it's pager -> object -> vmem
- <Spyro> ?
- <braunr> Spyro: then finds the object mapped at that address (similar to
- how a file is mapped with mmap)
- <braunr> from the object, it finds the pager
- <Spyro> ok
- <braunr> and asks the pager about the data at the appropriate offset
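-
-In rough pseudocode, the fault path just described looks like this; the
-names are illustrative, not the actual gnumach code:
-
-    /* Pseudocode of the page-fault path sketched above.  */
-    kern_return_t
-    handle_fault (vm_map_t map, vm_offset_t addr, vm_prot_t access)
-    {
-      /* Look up the map entry covering addr and check access.  */
-      vm_map_entry_t entry = lookup_entry (map, addr);
-      if (entry == NULL || (entry->protection & access) != access)
-        return KERN_PROTECTION_FAILURE;
-
-      /* Find the object mapped there and the offset into it.  */
-      vm_object_t object = entry->object;
-      vm_offset_t offset = entry->offset + (addr - entry->start);
-
-      /* If the page isn't resident, ask the object's pager for it.  */
-      if (vm_page_lookup (object, offset) == NULL)
-        memory_object_data_request (object->pager, object->pager_request,
-                                    offset, PAGE_SIZE, access);
-      return KERN_SUCCESS;
-    }
-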
- <Spyro> so how does a user process do normal file I/O? is faulting just a
- special case of it?
- <braunr> it's completely separate
- <Spyro> eww
- <braunr> normal I/O is done with message passing
- <braunr> the hurd io interface
- <Spyro> ok
- <Spyro> so who talks to who on a file I/O?
- <braunr> a client (e.g. cat) talks to a file system server (e.g. ext2fs)
- <Spyro> ok so
- <Spyro> it's client to the pager for regular file I/O?
- <braunr> Spyro: i don't understand the question
- <braunr> Spyro: it's client to server, the server might not be a pager
- <Spyro> ok
- <Spyro> just trying to figure out the difference between paging/faulting
- and regular I/O
- <braunr> regular I/O is just message passing
- <braunr> page fault handling is dealt with by pagers
- <Spyro> and I have a hunch that the fs/pager is involved somehow in both,
- because the server is the source of the data
- <Spyro> I'm getting a headache
- <braunr> nalaginrut: a server like ext2fs is both a file server and a pager
- <Spyro> oh!
- <Spyro> oh btw, does a file server make use of memory objects for caching?
- <braunr> Spyro: yes
- <Spyro> or rather, can it?
- <Spyro> does it have to?
- <braunr> memory objects are for caching, and thus for page faults
- <braunr> Spyro: for caching, it's a requirement
- <braunr> for I/O, it's not
- <braunr> you could have I/O without memory objects
- <Spyro> ok
- <Spyro> so how does the pager/fileserver use memory objects for caching?
- <Spyro> does it just map and write to them?
- <braunr> basically yes but there is a complete protocol with the kernel for
- that
- <braunr>
- http://www.gnu.org/software/hurd/gnumach-doc/External-Memory-Management.html#External-Memory-Management
- <Spyro> heh, lucky guess
- <Spyro> ty
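-
-The protocol linked above boils down to a small set of paired routines; an
-abridged summary (routine names as in the GNU Mach interfaces, signatures
-omitted):
-
-    /* kernel -> pager (implemented by the pager):  */
-    memory_object_init ();          /* kernel attaches to the object   */
-    memory_object_data_request ();  /* page-in: kernel wants data      */
-    memory_object_data_return ();   /* page-out: kernel returns pages  */
-    memory_object_terminate ();     /* kernel detaches from the object */
-
-    /* pager -> kernel (sent on the memory_object_control port):  */
-    memory_object_ready ();         /* object may be used for paging   */
-    memory_object_data_supply ();   /* provide the requested data      */
-    memory_object_data_error ();    /* report that data can't be read  */
-    memory_object_lock_request ();  /* flush/clean/lock cached pages   */
-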
- <Spyro> I am in way over my head here btw
- <Spyro> zero experience with micro kernels in practice
- <braunr> it's not trivial
- <braunr> that's not a microkernel thing at all
- <braunr> that's how it works in monolithic kernels too
- <braunr> i recommend netbsd uvm thesis
- <braunr> there are nice pictures describing the vm system
-    <Spyro> derrr...precious?
- <Spyro> wow
- <braunr> just ignore the anonymous memory handling part which is specific
- to uvm
- <Spyro> @_@
- <braunr> the rest is common to practically all VM systems out there
- <Spyro> I know about the linux page cache
- <braunr> well it's almost the same
- <Spyro> with memory objects being the same thing as files in a page cache?
- <braunr> memory objects are linux "address spaces"
- <braunr> and address spaces are how the linux mm views a file, yes
- <Spyro> derp
- <Spyro> ...
- <Spyro> um...
-    <braunr> struct vm_page == struct page
- * Spyro first must learn what an address_space is
- <braunr> struct vm_map == struct mm_struct
- <braunr> struct vm_map_entry == struct vm_area_struct
- * Spyro isn't a linux kernel vm expert either
- <braunr> struct vm_object == struct address_space
- <braunr> roughly
- <braunr> details vary a lot
- <Spyro> and what's an address_space ?
- <braunr> 11:41 < braunr> and address spaces are how the linux mm views a
- file, yes
- <Spyro> ok
- <braunr> see include/linux/fs.h
- <braunr> struct address_space_operations is the pager interface
- * Spyro should look at the linux kernel sources perhaps, unless you have an
- easier reference
- <Spyro> embarrassingly, RVR hired me as an editor for the linux-mm wiki
- <Spyro> I should know this stuff
- <braunr> see
- http://darnassus.sceen.net/~rbraun/design_and_implementation_of_the_uvm_virtual_memory_system.pdf
- <braunr> page 42
- <braunr> page 66 for another nice view
-    <braunr> i wouldn't recommend using linux source as reference
- <braunr> it's very complicated, filled with a lot of code dealing with
- details
- <Spyro> lmao
- <braunr> and linux guys have a habit of choosing crappy names
- <Spyro> I was only going to
- <Spyro> stoppit
- <braunr> except for "linux" and "git"
- <Spyro> ...make me laugh any more and I'll need rib surgery
- <braunr> laugh ?
- <Spyro> complicated and crappy
- <braunr> seriously, "address space" for a file is very very confusing
- <Spyro> oh I agree with that
- <braunr> yes, names are crappy
- <braunr> and the code is very complicated
- <braunr> it took me half an hour to find where readahead is done once
- <braunr> and i'm still not sure it was the right code
-    <Spyro> so in the linux kernel, there is an address_space for each
-      cached file?
- <braunr> takes me 30 seconds in netbsd ..
- <braunr> yes
- <Spyro> eww
- <Spyro> yeah, BAD name
- <Spyro> but thanks for the explanation
- <Spyro> now I finally know what an address space is
- <braunr> many linux core developers admit they don't care much about names
- <Spyro> so, in hurd, a memory object is to hurd, what an address_space is
- to linux?
- <braunr> yes
-    <braunr> not to hurd
- <Spyro> ok
- <braunr> to mach
- <Spyro> you know what I mean
- <Spyro> :P
- <Spyro> easier than for linux I can tell you that much
- <braunr> and the bsd vm system is a stripped version of the mach vm
- <Spyro> ok
- <braunr> that's why i think it's important to note it
-    <Spyro> good, I learned something about the linux vm...from the mach
-      guys
- <Spyro> this is funny
- <braunr> linux did too
- <braunr> there is a paper about linux page eviction that directly borrows
- the mach algorithm and improves it
- <braunr> mach is the historic motivation behind mmap on posix
- <Spyro> oh nice!
- <Spyro> but yes, linux picked a shitty name
- <braunr> is all that clearer to you ?
- <Spyro> I think that address_space connection was a magic bolt of
- understanding
- <braunr> and do you see how I/O and paging are mostly unrelated ?
- <Spyro> almost
- <Spyro> but how does a file I/O take advantage of caching by a memory
- object?
- <Spyro> does the file server just nudge the core for a hint?
- <braunr> the file system copies from the memory object
- * Spyro noddles
- <Spyro> I think I understand a bit better now
- <braunr> it's message passing
-    <Spyro> but I have too much to digest already
- <braunr> memory copying
- <braunr> if the memory is already there, good, if not, the kernel will ask
- the file system to bring the data
- <braunr> if message passing uses zero copy, data retrieval can be deferred
- until the client actually accesses it
- <Spyro> which is a fancy way of saying demand paging? :P
- <braunr> it's always demand paging
- <braunr> what i mean is that the file system won't fetch data as soon as it
- copies memory
- <braunr> but when this data is actually needed by the client
- <Spyro> uh...
-    <Spyro> what's a precious page?
- <braunr> let me check quickly
- <braunr> If precious is FALSE, the kernel treats the data as a temporary
- and may throw it away if it hasn't been changed. If the precious value is
- TRUE, the kernel treats its copy as a data repository and promises to
- return it to the manager
- <braunr> basically, it's used when you want the kernel to keep cached data
- in memory
- <braunr> the cache becomes a lossless container for such pages
- <braunr> the kernel may flush them, but not evict them
- <Spyro> what's the difference?
- <braunr> imagine a ramfs
- <Spyro> point made
- <braunr> ok
- <Spyro> would be pretty hard to flush something that doesn't have a backing
- store
- <braunr> that was quick :)
- <braunr> well
- <braunr> the normal backing store for anonymous memory is the default pager
- <braunr> aka swap
- <Spyro> eww
- <braunr> but if you want your data *either* in swap or in memory and never
- in both
- <braunr> it may be useful
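-
-A hypothetical call for that ramfs-style case, following the
-memory_object_data_supply signature from the external memory management
-documentation linked above (argument layout simplified; check the interface
-definitions before relying on it):
-
-    /* Supply a page the kernel must never silently discard: with
-       precious == TRUE the kernel promises to hand the data back via
-       m_o_data_return instead of just dropping it.  */
-    memory_object_data_supply (control,         /* memory_object_control_t */
-                               offset,          /* offset into the object */
-                               (pointer_t) buf, /* page-aligned data */
-                               vm_page_size,    /* data count, in bytes */
-                               VM_PROT_NONE,    /* lock_value: no restriction */
-                               TRUE,            /* precious */
-                               MACH_PORT_NULL); /* no completion reply */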