Diffstat (limited to 'microkernel/mach/externalpagermechanism.mdwn')
-rw-r--r-- microkernel/mach/externalpagermechanism.mdwn 189
1 files changed, 189 insertions, 0 deletions
diff --git a/microkernel/mach/externalpagermechanism.mdwn b/microkernel/mach/externalpagermechanism.mdwn
new file mode 100644
index 00000000..1ccab6c4
--- /dev/null
+++ b/microkernel/mach/externalpagermechanism.mdwn
@@ -0,0 +1,189 @@
+[[license text="""
+Copyright © 2002, 2007 Free Software Foundation, Inc.
+
+Permission is granted to copy, distribute and/or modify this document under the
+terms of the GNU Free Documentation License, Version 1.2 or any later version
+published by the Free Software Foundation; with no Invariant Sections, no
+Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included
+in the section entitled [[GNU_Free_Documentation_License|/fdl.txt]].
+
+By contributing to this page, you agree to assign copyright for your
+contribution to the Free Software Foundation. The Free Software Foundation
+promises to always use either a verbatim copying license or a free
+documentation license when publishing your contribution. We grant you back all
+your rights under copyright, including the rights to copy, modify, and
+redistribute your contributions.
+"""]]
+
+Mach provides a so-called external pager [[mechanism]]. This
+mechanism serves to separate *managing memory* from *managing
+content*. Mach does the former while user space tasks do the
+latter.
+
+# Introduction
+
+In Mach, a task's [[Mach/AddressSpace]] consists of references
+to [[Mach/MemoryObjects]]. A memory object is designated using
+a [[port]] (a port is just a [[capability]]) and
+implemented by a normal process.
+
+To associate a memory object with a portion of a task's
+address space, vm\_map is invoked on a capability designating
+the task, passing a reference to the memory object
+and the offset at which to install it. (The first time
+a task maps an object, Mach sends an initialization message
+to the server including a control capability, which the server uses
+to supply pages to the kernel.) This is essentially
+the same as mapping a file into an address space on Unix
+using mmap.
+
+When a task faults, Mach checks to see if there is a memory
+object associated with the fault address. If not, the task
+is sent an exception, which is normally further propagated
+as a segmentation fault. If there is an associated memory
+object, Mach checks whether the corresponding page is in core.
+If it is, it installs the page and resumes the task. Otherwise, Mach
+invokes the memory object with the memory\_object\_request
+method and the page to read. The memory manager then fetches
+or creates the content as appropriate and supplies it to
+Mach using the memory\_object\_supply method.
+
+
+# Creating and Mapping a Memory Object
+
+The following illustrates the basic idea:
+
+> ________
+> / \
+> | Mach |
+> \________/
+> /| / |\ \
+> (C) vm_map / / m_o_ready (E)\ \ (D) memory_object_init
+> / |/ (F) return \ \|
+> ________ ________
+> / \ -----> / \
+> | Client | (A) open | Server |
+> \________/ <----- \________/
+> (B) memory_object
+
+
+(A) The client sends an "open" rpc to the server.
+
+(B) The server creates a memory object (i.e., a port receive right), adds
+it to the port set that it is listening on and returns a capability (a port
+send right) to the client.
+
+(C) The client attempts to map the object into its address space using
+the vm\_map rpc. It passes the vm server (typically Mach) a reference
+to the port that the server gave it.
+
+(D) Since Mach has never seen the object before, it queues a
+memory\_object\_init on the given port along with a send right (the
+memory control port). The manager uses this right to send messages to
+the kernel, and it also serves as an authentication mechanism for future
+interactions: it lets the manager identify which kernel a given
+memory\_object\_* IPC came from.
+
+(E) The server dequeues the message, initializes internal data
+structures to manage the mapping and then invokes the
+memory\_object\_ready method on the control object.
+
+(F) The kernel sees that the manager is ready, sets up the appropriate
+mappings in the client and then replies to the vm\_map rpc indicating
+success.
+
+There is nothing stopping others from playing "the kernel." This is
+not a security problem: clients must [[trust]] the server from whom they
+obtain memory objects and also the servers with whom they share
+the object. Multiple memory managers are a reality that should be
+dealt with gracefully: they are useful for network transparent
+mappings etc.
+
+# Resolving Page Faults
+
+> (G) Client ________
+> resumed / \
+> | Mach |
+> (A) Fault +----|------+ | \ (B) m_o_request (C) store_read
+> ____|___ \_____|__/ |\ \| ________ _________
+> / +---\-------+ \ / \ / \
+> | Client | (F) | Server |<===>| storeio |
+> \________/ m_o_supply \________/ \_________/
+> (E) return data | ^
+> | | (D) device_read
+> v |
+> ________
+> / Device \
+> | Driver |
+> \________/
+> | ^
+> | |
+> v
+> ____________
+> / Hardware \
+
+
+(A) The client does a memory access and faults. The kernel catches
+the fault and maps the address to the appropriate memory object. It
+then invokes the memory\_object\_request method on the associated
+capability. (In addition to the page to supply, it also supplies the
+control port so that the server can determine which kernel
+sent the message.)
+
+(B) The manager dequeues the message. On the Hurd, this is translated
+into a store\_read: a function in the libstore library which is used to
+transparently manage block devices. The storeio server starts off as
+a separate process; however, if the server has the appropriate
+permission, it can contact the backing object directly.
+This layer of indirection is desirable when, for instance, a
+storeio running as root wants to permit only read-only access to a
+resource, yet cannot safely transfer its handle to the client. In
+this case, it would proxy the requests.
+
+(C) The storeio server contacts, for instance, a device driver to do
+the read. This could also be a network block device (the NBD server
+in GNU/Linux), a file, a memory object, etc.
+
+(D) The device driver allocates an anonymous page from the default
+pager and reads the data into it. Once all of the operations are
+complete, the driver returns the data to its caller, unmapping it from
+its own address space at the same time.
+
+(E) The storeio transfers the page to the server. The page is still
+anonymous.
+
+(F) The manager does a memory\_object\_supply, transferring the page to
+the kernel. Only now is the page considered managed rather than
+anonymous.
+
+(G) The kernel caches the page, installs it in the client's virtual
+address space and finally, resumes the client.
+
+# Paging Data Out
+
+
+> Change manager Pager m_o_return store_write
+> \ _________ (B) __(A)__ (C) ________ (D) _______
+> S | / Default \ / \ / \ / \
+> W |<=>| Pager |<=>| Mach |==>| server |<=>| storeio |<=>
+> A | \_________/ \________/ \________/ \_______/
+> P |
+> /
+
+
+(A) The paging [[policy]] is implemented by Mach: servers just implement
+the [[mechanism]].
+
+(B) Once the kernel has selected a page that it would like to evict, it
+changes the manager from the server to the default pager. This way,
+if the server does not deallocate the page quickly enough, it cannot
+cause a denial of service: the kernel will just later double page it
+to swap (the default pager is part of the [[tcb]]).
+
+(C) Mach then invokes the memory\_object\_return method on the control
+object. The server is expected to save the page and free it in a timely
+fashion. The server is not required to send a response to the kernel.
+
+(D) The manager then transfers the data to the storeio, which
+eventually sends it to disk. The device driver consumes the memory,
+doing the equivalent of a vm\_deallocate.
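
The three message exchanges this page describes (the init/ready handshake on the first vm\_map, the request/supply pair on a page fault, and the return on page-out) can be modelled in a few lines. The following is only a toy, single-process sketch and not the Mach API: the `Kernel` and `Server` classes, `PAGE_SIZE`, and the `log` list are invented for illustration, ports are stood in for by plain Python object references, and the default pager and storeio layers are left out. The method names follow the shortened names used on this page; GNU Mach's real interfaces carry longer names such as memory\_object\_data\_request.

```python
# Toy model of the external pager protocol. All names are illustrative
# assumptions; real Mach uses ports and asynchronous IPC, not method calls.

PAGE_SIZE = 4096


class Server:
    """User-space memory manager: owns the content, not the memory."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # page number -> bytes
        self.kernels = set()                # control "ports" seen so far

    def memory_object_init(self, control):
        # Remember which kernel is talking to us, then report readiness.
        self.kernels.add(control)
        control.memory_object_ready(self)

    def memory_object_request(self, control, page):
        # Fetch (or create, zero-filled) the content and supply it.
        data = self.backing_store.get(page, bytes(PAGE_SIZE))
        control.memory_object_supply(self, page, data)

    def memory_object_return(self, control, page, data):
        # Save the evicted page; no reply to the kernel is required.
        self.backing_store[page] = data


class Kernel:
    """Manages memory: the page cache and the decision to evict."""

    def __init__(self):
        self.ready = set()  # memory objects that completed the handshake
        self.cache = {}     # (memory object, page number) -> bytes
        self.log = []       # protocol messages, in the order they occur

    def vm_map(self, memory_object):
        # First sighting of the object triggers the init handshake.
        if memory_object not in self.ready:
            self.log.append("memory_object_init")
            memory_object.memory_object_init(self)
        return memory_object in self.ready

    def memory_object_ready(self, memory_object):
        self.log.append("memory_object_ready")
        self.ready.add(memory_object)

    def fault(self, memory_object, page):
        # Ask the manager only when the page is not already in core.
        if (memory_object, page) not in self.cache:
            self.log.append("memory_object_request")
            memory_object.memory_object_request(self, page)
        return self.cache[(memory_object, page)]

    def memory_object_supply(self, memory_object, page, data):
        self.log.append("memory_object_supply")
        self.cache[(memory_object, page)] = data

    def evict(self, memory_object, page):
        # Page-out: drop the page from the cache and hand it back.
        data = self.cache.pop((memory_object, page))
        self.log.append("memory_object_return")
        memory_object.memory_object_return(self, page, data)


server = Server({0: b"hello".ljust(PAGE_SIZE, b"\0")})
kernel = Kernel()
mapped = kernel.vm_map(server)    # init/ready handshake
first = kernel.fault(server, 0)   # miss: request, then supply
second = kernel.fault(server, 0)  # hit: served from the kernel's cache
kernel.evict(server, 0)           # page-out: return to the manager
```

The second fault adds nothing to `kernel.log`, which mirrors the "page is in core" case from the introduction: the manager is consulted only on a miss.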