author | GNU Hurd wiki engine <web-hurd@gnu.org> | 2007-08-19 18:40:45 +0000 |
---|---|---|
committer | GNU Hurd wiki engine <web-hurd@gnu.org> | 2007-08-19 18:40:45 +0000 |
commit | 982fc1df4e827fd1af88390b04b13b9ef0e94ad7 (patch) | |
tree | 3c7cf40da9bb4c122f170de95faaf5d8ae8d6d59 /mach/externalpagermechanism.mdwn | |
parent | 95d4a891ade88062b7388728c843fde9ac0ed64e (diff) |
web commit by NealWalfield: Create.
Diffstat (limited to 'mach/externalpagermechanism.mdwn')
-rw-r--r-- | mach/externalpagermechanism.mdwn | 172 |
1 files changed, 172 insertions, 0 deletions
diff --git a/mach/externalpagermechanism.mdwn b/mach/externalpagermechanism.mdwn
new file mode 100644
index 00000000..71ffab12
--- /dev/null
+++ b/mach/externalpagermechanism.mdwn
@@ -0,0 +1,172 @@
+Mach provides a so-called external pager mechanism. This
+mechanism serves to separate *managing memory* from *managing
+content*. Mach does the former while user space tasks do the
+latter.
+
+# Introduction
+
+In Mach, a task's address space consists of references to
+[[Mach/MemoryObjects]]. A memory object is designated using
+a [[Mach/Port]] (a port is just a [[capability]]) and
+implemented by a normal process.
+
+To associate a memory object with a portion of a task's
+address space, vm\_map is invoked on a capability designating
+the task, passing a reference to the memory object
+and the offset at which to install it. (The first time
+a task maps an object, Mach sends an initialization message
+to the server including a control capability, which the server uses
+to supply pages to the kernel.) This is essentially
+the same as mapping a file into an address space on Unix
+using mmap.
+
+When a task faults, Mach checks to see if there is a memory
+object associated with the fault address. If not, the task
+is sent an exception, which is normally further propagated
+as a segmentation fault. If there is an associated memory
+object, Mach checks whether the corresponding page is in core.
+If it is, it installs the page and resumes the task. Otherwise,
+Mach invokes the memory object with the memory\_object\_request
+method and the page to read. The memory manager then fetches
+or creates the content as appropriate and supplies it to
+Mach using the memory\_object\_supply method.
+
+
+# Creating and Mapping a Memory Object
+
+The following illustrates the basic idea:
+
+>                           ________
+>                          /        \
+>                         |   Mach   |
+>                          \________/
+>                      /| /           |\  \
+>         (C) vm_map  /  / m_o_ready (E)\  \ (D) memory_object_init
+>                    / |/ (F) return     \  \|
+>           ________                          ________
+>          /        \         ----->         /        \
+>         |  Client  |       (A) open       |  Server  |
+>          \________/         <-----         \________/
+>                       (B) memory_object
+
+
+(A) The client sends an "open" rpc to the server.
+
+(B) The server creates a memory object (i.e., a port receive right), adds
+it to the port set that it is listening on and returns a capability (a port
+send right) to the client.
+
+(C) The client attempts to map the object into its address space using
+the vm_map rpc. It passes a reference to the port that the server gave
+it to the vm server (typically Mach).
+
+(D) Since Mach has never seen the object before, it queues a
+memory_object_init on the given port along with a send right (the
+memory control port) for the manager to use to send messages to the
+kernel and also as an authentication mechanism for future
+interactions: the port is supplied so that the manager will be able to
+identify which kernel a given memory\_object\_* IPC is from.
+
+(E) The server dequeues the message, initializes internal data
+structures to manage the mapping and then invokes the
+memory\_object\_ready method on the control object.
+
+(F) The kernel sees that the manager is ready, sets up the appropriate
+mappings in the client and then replies to the vm_map rpc indicating
+success.
+
+There is nothing stopping others from playing "the kernel." This is
+not a security problem: clients must [[trust]] the server from whom they
+obtain memory objects and also the servers with whom they share
+the object. Multiple memory managers are a reality that should be
+dealt with gracefully: they are useful for network transparent
+mappings etc.
+
+# Resolving Page Faults
+
+
+>         (G) Client        ________
+>             resumed      /        \
+>                         |   Mach   |
+>       (A) Fault    +----|------+   |   \ (B) m_o_request   (C) store_read
+>                ____|___  \_____|__/ |\  \|  ________        _________
+>               /    +---\-------+      \    /        \      /         \
+>              |  Client  |     (F)         |  Server  |<===>| storeio |
+>               \________/   m_o_supply      \________/      \_________/
+>                                             (E) return data    | ^
+>                                                                | | (D) device_read
+>                                                                v |
+>                                                             ________
+>                                                            / Device \
+>                                                           |  Driver  |
+>                                                            \________/
+>                                                                | ^
+>                                                                | |
+>                                                                v
+>                                                           ____________
+>                                                          /  Hardware  \
+
+(A) The client does a memory access and faults. The kernel catches
+the fault and maps the address to the appropriate memory object. It
+then invokes the memory_object_request method on the associated
+capability. (In addition to the page to supply, it also supplies the
+control port so that the server can determine which kernel
+sent the message.)
+
+(B) The manager dequeues the message. On the Hurd, this is translated
+into a store_read: a function in the libstore library which is used to
+transparently manage block devices. The storeio server starts off as
+a separate process; however, if the server has the appropriate
+permission, the backing object can be contacted directly by the
+server. This layer of indirection is desirable when, for instance, a
+storeio running as root may want to only permit read-only access to a
+resource, yet it cannot safely transfer its handle to the client. In
+this case, it would proxy the requests.
+
+(C) The storeio server contacts, for instance, a device driver to do
+the read. This could also be a network block device (the NBD server
+in GNU/Linux), a file, a memory object, etc.
+
+(D) The device driver allocates an anonymous page from the default
+pager and reads the data into it. Once all of the operations are
+complete, the device returns the data to the client, unmapping it from
+its own address space at the same time.
+
+(E) The storeio transfers the page to the server. The page is still
+anonymous.
+
+(F) The manager does a memory_object_supply transferring the page to
+the kernel. Only now is the page considered to be managed rather
+than anonymous.
+
+(G) The kernel caches the page, installs it in the client's virtual
+address space and, finally, resumes the client.
+
+# Paging Data Out
+
+
+>         Change manager Pager  m_o_return    store_write
+>  \     _________ (B)  __(A)__   (C)  ________  (D)  _______
+> S |   / Default \    /        \     /        \     /       \
+> W |<=>|  Pager  |<=>|   Mach   |==>|  server  |<=>| storeio |<=>
+> A |   \_________/    \________/     \________/     \_______/
+> P |
+>  /
+
+(A) The paging [[policy]] is implemented by Mach: servers just implement
+the [[mechanism]].
+
+(B) Once the kernel has selected a page that it would like to evict, it
+changes the manager from the server to the default pager. This way,
+if the server does not deallocate the page quickly enough, it cannot
+cause a denial of service: the kernel will just later double page it
+to swap (the default pager is part of the [[tcb]]).
+
+(C) Mach then invokes the memory\_object\_return method on the control
+object. The server is expected to save the page and free it in a timely
+fashion. The server is not required to send a response to the kernel.
+
+(D) The manager then transfers the data to the storeio, which
+eventually sends it to disk. The device driver consumes the memory
+doing the equivalent of a vm_deallocate.
+
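
The three message flows described in the page above (mapping, fault resolution, pageout) can be sketched as a toy model. This is not part of the commit and not Mach code: it is a minimal, synchronous Python simulation in which the class names `Kernel` and `Manager`, the `access` and `evict` helpers, and the `log` list are all hypothetical. The method names simply mirror the abbreviated message names used in the text. It leaves out ports, the storeio layer, and the hand-off to the default pager.

```python
class SegmentationFault(Exception):
    """Raised when a task touches an address with no memory object behind it."""


class Manager:
    """Toy user-space memory manager (the "server"): it owns content, not memory."""

    def __init__(self, backing=None):
        self.backing = dict(backing or {})  # page number -> bytes; stands in for storeio
        self.kernel = None                  # "control capability", learned at init time

    def memory_object_init(self, kernel):
        # Remember which kernel is talking to us, then report readiness.
        self.kernel = kernel
        kernel.memory_object_ready(self)

    def memory_object_request(self, kernel, page):
        # Fetch or create the content, then supply it to the kernel.
        data = self.backing.get(page, bytes(4))
        kernel.memory_object_supply(self, page, data)

    def memory_object_return(self, kernel, page, data):
        # Save the evicted page; no reply to the kernel is required.
        self.backing[page] = data


class Kernel:
    """Toy stand-in for Mach: it manages memory (a page cache), not content."""

    def __init__(self):
        self.ready = set()   # ids of managers that completed initialization
        self.mappings = {}   # (task, address) -> (manager, page)
        self.cache = {}      # (id(manager), page) -> bytes held "in core"
        self.log = []        # message names, in the order they were sent

    def vm_map(self, task, address, manager, page):
        if id(manager) not in self.ready:
            # First time this kernel sees the object: initialize it.
            self.log.append("init")
            manager.memory_object_init(self)
        self.mappings[(task, address)] = (manager, page)

    def memory_object_ready(self, manager):
        self.log.append("ready")
        self.ready.add(id(manager))

    def memory_object_supply(self, manager, page, data):
        self.log.append("supply")
        self.cache[(id(manager), page)] = data

    def access(self, task, address):
        if (task, address) not in self.mappings:
            raise SegmentationFault(hex(address))
        manager, page = self.mappings[(task, address)]
        key = (id(manager), page)
        if key not in self.cache:
            # Page is not in core: ask the manager for it.
            self.log.append("request")
            manager.memory_object_request(self, page)
        return self.cache[key]

    def evict(self, manager, page):
        # The kernel chooses the victim (policy); the manager only saves it.
        data = self.cache.pop((id(manager), page))
        self.log.append("return")
        manager.memory_object_return(self, page, data)


kernel = Kernel()
server = Manager({0: b"page zero"})
kernel.vm_map("client", 0x1000, server, page=0)
first = kernel.access("client", 0x1000)   # faults: request, then supply
second = kernel.access("client", 0x1000)  # already in core: no messages
kernel.evict(server, 0)
print(kernel.log)
```

In this sketch the second access generates no messages because the page is already cached, which is the "in core" branch from the introduction. A real kernel performs these exchanges asynchronously over ports.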