From 9cedf47ebaebf9224fbbb9670001567098943d79 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 14 Sep 2007 16:35:32 +0200 Subject: Rename some pages to be more consistent. --- microkernel/mach/external_pager_mechanism.mdwn | 188 +++++++++++++++++++++++++ 1 file changed, 188 insertions(+) create mode 100644 microkernel/mach/external_pager_mechanism.mdwn (limited to 'microkernel/mach/external_pager_mechanism.mdwn') diff --git a/microkernel/mach/external_pager_mechanism.mdwn b/microkernel/mach/external_pager_mechanism.mdwn new file mode 100644 index 00000000..169745fb --- /dev/null +++ b/microkernel/mach/external_pager_mechanism.mdwn @@ -0,0 +1,188 @@ +[[license text=""" +Copyright © 2002, 2007 Free Software Foundation, Inc. + +Permission is granted to copy, distribute and/or modify this document under the +terms of the GNU Free Documentation License, Version 1.2 or any later version +published by the Free Software Foundation; with no Invariant Sections, no +Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included +in the section entitled [[GNU_Free_Documentation_License|/fdl.txt]]. + +By contributing to this page, you agree to assign copyright for your +contribution to the Free Software Foundation. The Free Software Foundation +promises to always use either a verbatim copying license or a free +documentation license when publishing your contribution. We grant you back all +your rights under copyright, including the rights to copy, modify, and +redistribute your contributions. +"""]] + +Mach provides a so-called external pager [[mechanism]]. This +mechanism serves to separate *managing memory* from *managing +content*. Mach does the former while user space tasks do the +latter. + + +# Introduction + +In Mach, a task's [[Mach/AddressSpace]] consists of references +to [[Mach/MemoryObjects]]. A memory object is designated using +a [[port]] (a port is just a [[capability]]) and +implemented by a normal process. + +To associate a memory object with a portion of a task's +address space, vm\_map is invoked a capability designating +the task and passing a reference to the memory object +and the offset at which to install it. (The first time +a task maps an object, Mach sends an initialization message +to the server including a control capability, which it uses +to supply pages to the kernel.) This is essentially +the same as mapping a file into an address space on Unix +using mmap. + +When a task faults, Mach checks to see if there is a memory +object associated with the fault address. If not, the task +is sent an exception, which is normally further propagated +as a segmentation fault. If there is an associated memory +object, Mach checks whether the corresponding page is in core. +If it is, it installs the page and resumes the task. Mach +then invokes the memory object with the memory\_object\_request +method and the page to read. The memory manager then fetches +or creates the content as appropriate and supplies it to +Mach using the memory\_object\_supply method. + + +# Creating and Mapping a Memory Object + +The following illustrates the basic idea: + +> ________ +> / \ +> | Mach | +> \________/ +> /| / |\ \ +> (C) vm_map / / m_o_ready (E)\ \ (D) memory_object_init +> / |/ (F) return \ \| +> ________ ________ +> / \ -----> / \ +> | Client | (A) open | Server | +> \________/ <----- \________/ +> (B) memory_object + +(A) The client sends an "open" rpc to the server. + +(B) The server creates a memory object (i.e., a port receive right), adds +it to the port set that it is listening on and returns a capability (a port +send right) to the client. + +(C) The client attempts to map the object into its address space using +the vm\_map rpc. It passes a reference to the port that the server gave +it to the vm server (typically Mach). + +(D) Since Mach has never seen the object before, it queues a +memory\_object\_init on the given port along with a send right (the +memory control port) for the manager to use to send messages to the +kernel and also as an authentication mechanism for future +interactions: the port is supplied so that the manager will be able to +identify from which kernel a given memory\_object\_* IPC is from. + +(E) The server dequeues the message, initializes internal data +structures to manage the mapping and then invokes the +memory\_object\_ready method on the control object. + +(F) The kernel sees that the manager is ready, sets up the appropriate +mappings in the client and then replies to the vm\_map rpc indicating +success. + +There is nothing stopping others from playing "the kernel." This is +not a security problem: clients must [[trust]] the server from whom they +obtain memory objects and also the servers with whom they share +the object. Multiple memory managers are a reality that should be +dealt with gracefully: they are useful for network transparent +mappings etc. + + +# Resolving Page Faults + +> (G) Client ________ +> resumed / \ +> | Mach | +> (A) Fault +----|------+ | \ (B) m_o_request (C) store_read +> ____|___ \_____|__/ |\ \| ________ _________ +> / +---\-------+ \ / \ / \ +> | Client | (F) | Server |<===>| storeio | +> \________/ m_o_supply \________/ \_________/ +> (E) return data | ^ +> | | (D) device_read +> v | +> ________ +> / Device \ +> | Driver | +> \________/ +> | ^ +> | | +> v +> ____________ +> / Hardware \ + +(A) The client does a memory access and faults. The kernel catches +the fault and maps the address to the appropriate memory object. It +then invokes the memory\_object\_request method on the associated +capability. (In addition to the page to supply, it also supplies the +control port so that the server can determine which kernel +sent the message.) + +(B) The manager dequeues the message. On the Hurd, this is translated +into a store\_read: a function in the libstore library which is used to +transparently manage block devices. The storeio server starts off as +a separate process, however, if the server has the appropriate +permission, the backing object can be contacted directly by the +server. This layer of indirection is desirable when, for instance, a +storeio running as root may want to only permit read only access to a +resource, yet it cannot safely transfer its handle to the client. In +this case, it would proxy the requests. + +(C) The storeio server contacts, for instance, a device driver to do +the read. This could also be a network block device (the NBD server +in GNU/Linux), a file, a memory object, etc. + +(D) The device driver allocates an anonymous page from the default +pager and reads the data into it. Once all of the operations are +complete, the device returns the data to the client unmapping it from +its own address space at the same time. + +(E) The storeio transfers the page to the server. The page is still +anonymous. + +(F) The manager does a memory\_object\_supply transferring the page to +the kernel. Only now is the page not considered to be anonymous but +managed. + +(G) The kernel caches the page, installs it in the client's virtual +address space and finally, resumes the client. + + +# Paging Data Out + +> Change manager Pager m_o_return store_write +> \ _________ (B) __(A)__ (C) ________ (D) _______ +> S | / Default \ / \ / \ / \ +> W |<=>| Pager |<=>| Mach |==>| server |<=>| storeio |<=> +> A | \_________/ \________/ \________/ \_______/ +> P | +> / + +(A) The paging [[policy]] is implemented by Mach: servers just implement +the [[mechanism]]. + +(B) Once the kernel has selected a page that it would like to evict, it +changes the manager from the server to the default pager. This way, +if the server does not deallocate the page quickly enough, it cannot +cause a denial of service: the kernel will just later double page it +to swap (the default pager is part of the [[tcb]]). + +(C) Mach then invokes memory\_object\_return method on the control +object. The server is expected to save the page free it in a timely +fashion. The server is not required to send a response to the kernel. + +(D) The manager then transfers the data to the storeio which +eventually sends it to disk. The device driver consumes the memory +doing the equivalent of a vm\_deallocate. -- cgit v1.2.3