[[license text="""
Copyright © 2002, 2007 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.2 or any later version
published by the Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts.  A copy of the license is included
in the section entitled [[GNU_Free_Documentation_License|/fdl.txt]].

By contributing to this page, you agree to assign copyright for your
contribution to the Free Software Foundation.  The Free Software Foundation
promises to always use either a verbatim copying license or a free
documentation license when publishing your contribution.  We grant you back all
your rights under copyright, including the rights to copy, modify, and
redistribute your contributions.
"""]]

Mach provides a so-called external pager [[mechanism]].  This
mechanism serves to separate *managing memory* from *managing
content*.  Mach does the former while user space tasks do the
latter.


# Introduction

In Mach, a task's [[Mach/AddressSpace]] consists of references
to [[Mach/MemoryObjects]].  A memory object is designated using
a [[port]] (a port is just a [[capability]]) and
implemented by a normal process.

To associate a memory object with a portion of a task's
address space, vm\_map is invoked on a capability designating
the task, passing a reference to the memory object
and the offset at which to install it.  (The first time
a task maps an object, Mach sends an initialization message
to the server including a control capability, which the server
uses to supply pages to the kernel.)  This is essentially
the same as mapping a file into an address space on Unix
using mmap.
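
To make the mmap analogy concrete, here is a minimal sketch using Python's standard `mmap` module on an ordinary file.  It is not Mach code: the helper names `map_and_read` and `demo` are made up for illustration, and the file plays the role of the memory object.

```python
import mmap
import os
import tempfile


def map_and_read(path, offset, length):
    """Map `length` bytes of the file starting at `offset` and return them.

    `offset` must be a multiple of mmap.ALLOCATIONGRANULARITY, much as a
    memory object is installed at an aligned offset.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length,
                       access=mmap.ACCESS_READ, offset=offset) as m:
            # Touching the mapping is what triggers the page faults;
            # no explicit read() call is made on the file.
            return m[:length]


def demo():
    gran = mmap.ALLOCATIONGRANULARITY
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * gran + b"B" * gran)
        path = tmp.name
    try:
        # Map only the second half of the file.
        return map_and_read(path, gran, 4)
    finally:
        os.unlink(path)
```

Calling `demo()` maps the second region of a temporary file and returns its first four bytes.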

When a task faults, Mach checks to see if there is a memory
object associated with the fault address.  If not, the task
is sent an exception, which is normally further propagated
as a segmentation fault.  If there is an associated memory
object, Mach checks whether the corresponding page is in core.
If it is, it installs the page and resumes the task.  Otherwise,
Mach invokes the memory object with the memory\_object\_request
method and the page to read.  The memory manager then fetches
or creates the content as appropriate and supplies it to
Mach using the memory\_object\_supply method.
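
The three outcomes of a fault can be sketched as a toy model.  Everything here is hypothetical (`FaultKernel`, `FaultPager`, and `SegmentationFault` are illustration names, not Mach interfaces), and the synchronous `request` call stands in for the asynchronous request/supply message pair.

```python
class SegmentationFault(Exception):
    """Stands in for the exception Mach sends to the faulting task."""


class FaultPager:
    """Hypothetical memory manager: produces page contents on request."""

    def __init__(self):
        self.requests = []  # pages the kernel has asked for

    def request(self, page):
        self.requests.append(page)
        return f"content-of-page-{page}".encode()


class FaultKernel:
    def __init__(self):
        self.mappings = {}  # virtual page -> (pager, page within object)
        self.in_core = {}   # (id(pager), object page) -> cached contents

    def map(self, vpage, pager, opage):
        self.mappings[vpage] = (pager, opage)

    def fault(self, vpage):
        if vpage not in self.mappings:
            # No memory object covers this address.
            raise SegmentationFault(vpage)
        pager, opage = self.mappings[vpage]
        key = (id(pager), opage)
        if key not in self.in_core:
            # Not in core: ask the memory manager and cache its answer.
            self.in_core[key] = pager.request(opage)
        # In core (possibly just now): install the page, resume the task.
        return self.in_core[key]
```

In this model a second fault on the same page is served from the kernel's cache without contacting the pager again.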


# Creating and Mapping a Memory Object

The following illustrates the basic idea:

>                           ________
>                          /        \
>                         |   Mach   |
>                          \________/
>                     /| /           |\  \
>        (C) vm_map  /  / m_o_ready (E)\  \ (D) memory_object_init
>                   / |/ (F) return     \  \|
>                ________              ________
>               /        \   ----->   /        \
>              |  Client  | (A) open |  Server  |
>               \________/   <-----   \________/
>                     (B) memory_object

(A) The client sends an "open" rpc to the server.

(B) The server creates a memory object (i.e., a port receive right), adds
it to the port set that it is listening on and returns a capability (a port
send right) to the client.

(C) The client attempts to map the object into its address space using
the vm\_map rpc.  It passes a reference to the port that the server gave
it to the vm server (typically Mach).

(D) Since Mach has never seen the object before, it queues a
memory\_object\_init on the given port along with a send right (the
memory control port).  The manager uses this port to send messages to
the kernel.  The port also serves as an authentication mechanism for
future interactions: it lets the manager identify which kernel a
given memory\_object\_* IPC came from.

(E) The server dequeues the message, initializes internal data
structures to manage the mapping and then invokes the
memory\_object\_ready method on the control object.

(F) The kernel sees that the manager is ready, sets up the appropriate
mappings in the client and then replies to the vm\_map rpc indicating
success.
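
Steps (A) through (F) can be walked through with a toy model.  This is only a sketch: `ToyServer`, `ToyKernel`, `MemoryObject`, and `run_handshake` are invented names, ports are ordinary Python objects, and the server is run synchronously from inside vm\_map, whereas real Mach IPC is asynchronous.

```python
from collections import deque


class MemoryObject:
    """Stands in for the port receive right created in step (B)."""

    def __init__(self, server):
        self.server = server
        self.queue = deque()          # messages queued on the port


class ToyServer:
    def __init__(self, log):
        self.log = log
        self.objects = []             # the "port set" being listened on
        self.kernels = {}             # control port -> kernel that sent it

    def open(self):
        self.log.append("A open")
        obj = MemoryObject(self)
        self.objects.append(obj)
        self.log.append("B memory_object")
        return obj                    # the client receives a send right

    def serve(self):
        for obj in self.objects:
            while obj.queue:
                name, kernel, control = obj.queue.popleft()
                if name == "memory_object_init":
                    self.kernels[control] = kernel
                    self.log.append("E memory_object_ready")
                    kernel.memory_object_ready(control)


class ToyKernel:
    def __init__(self, log):
        self.log = log
        self.control_ports = {}       # memory object -> control port
        self.ready = set()            # control ports declared ready

    def vm_map(self, address_space, obj):
        self.log.append("C vm_map")
        if obj not in self.control_ports:
            control = object()        # opaque stand-in for a port
            self.control_ports[obj] = control
            self.log.append("D memory_object_init")
            obj.queue.append(("memory_object_init", self, control))
            obj.server.serve()        # simplification: run the server now
        if self.control_ports[obj] not in self.ready:
            return False
        address_space.append(obj)
        self.log.append("F return")
        return True

    def memory_object_ready(self, control):
        self.ready.add(control)


def run_handshake():
    log = []
    server, kernel, address_space = ToyServer(log), ToyKernel(log), []
    obj = server.open()
    ok = kernel.vm_map(address_space, obj)
    return ok, log, kernel, address_space, obj
```

Mapping the same object a second time skips steps (D) and (E), because the kernel already holds a control port for it.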

There is nothing stopping others from playing "the kernel."  This is
not a security problem: clients must [[trust]] the server from whom they
obtain memory objects and also the servers with whom they share
the object.  Multiple memory managers are a reality that should be
dealt with gracefully: they are useful for network-transparent
mappings, etc.


# Resolving Page Faults

>       (G) Client      ________
>           resumed    /        \
>                     |   Mach   |
>      (A) Fault +----|------+   |  \ (B) m_o_request  (C) store_read
>            ____|___  \_____|__/ |\  \| ________         _________  
>           /    +---\-------+       \  /        \       /         \ 
>          |  Client  |          (F)   |  Server  |<===>|  storeio  |
>           \________/       m_o_supply \________/       \_________/ 
>                                           (E) return data  | ^
>                                                            | | (D) device_read 
>                                                            v |
>                                                          ________
>                                                         / Device \
>                                                        |  Driver  |
>                                                         \________/
>                                                            | ^
>                                                            | |
>                                                            v
>                                                       ____________
>                                                      /  Hardware  \

(A) The client does a memory access and faults.  The kernel catches
the fault and maps the address to the appropriate memory object.  It
then invokes the memory\_object\_request method on the associated
capability.  (In addition to the page to supply, it also supplies the
control port so that the server can determine which kernel
sent the message.)

(B) The manager dequeues the message.  On the Hurd, this is translated
into a store\_read: a function in the libstore library which is used to
transparently manage block devices.  The storeio server starts off as
a separate process; however, if the server has the appropriate
permission, it can contact the backing object directly.  This layer
of indirection is desirable when, for instance, a storeio running as
root wants to permit only read-only access to a resource, yet cannot
safely transfer its handle to the client.  In this case, it proxies
the requests.

(C) The storeio server contacts, for instance, a device driver to do
the read.  This could also be a network block device (the NBD server
in GNU/Linux), a file, a memory object, etc.

(D) The device driver allocates an anonymous page from the default
pager and reads the data into it.  Once all of the operations are
complete, the driver returns the data to its caller (here, storeio),
unmapping it from its own address space at the same time.

(E) The storeio server transfers the page to the server.  The page is
still anonymous.

(F) The manager does a memory\_object\_supply, transferring the page to
the kernel.  Only now is the page considered managed rather than
anonymous.

(G) The kernel caches the page, installs it in the client's virtual
address space and finally resumes the client.
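
The point of steps (D) through (G) is that a single page is handed from task to task rather than copied, and that it only becomes managed when it is supplied to the kernel.  The following sketch tracks that hand-off.  The `Page` class and the function names are hypothetical stand-ins for the corresponding steps, and `disk` is just a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    data: bytes
    managed: bool = False                       # False: anonymous memory
    owners: list = field(default_factory=list)  # tasks that have held it

    def move_to(self, task):
        """Hand the page over; the previous holder no longer maps it."""
        self.owners.append(task)


def device_read(disk, block):
    page = Page(disk[block])         # (D) anonymous page, filled by driver
    page.move_to("driver")
    page.move_to("storeio")          # returned and unmapped from the driver
    return page


def store_read(disk, block):
    page = device_read(disk, block)  # (C) storeio contacts the driver
    page.move_to("server")           # (E) still anonymous
    return page


def resolve_fault(disk, block, kernel_cache):
    page = store_read(disk, block)   # (B) request translated to store_read
    page.managed = True              # (F) supply: page becomes managed
    page.move_to("kernel")
    kernel_cache[block] = page       # (G) cached, then installed in client
    return page
```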


# Paging Data Out

>               Change manager   Pager m_o_return    store_write
>        \      _________  (B)  __(A)__   (C)  ________  (D)  _______
>      S  |    / Default \     /        \     /        \     /       \ 
>      W  |<=>|   Pager   |<=>|   Mach   |==>|  server  |<=>| storeio |<=>
>      A  |    \_________/     \________/     \________/     \_______/
>      P  |
>        /

(A) The paging [[policy]] is implemented by Mach: servers just implement
the [[mechanism]].

(B) Once the kernel has selected a page that it would like to evict, it
changes the manager from the server to the default pager.  This way,
if the server does not deallocate the page quickly enough, it cannot
cause a denial of service: the kernel will just later double page it
to swap (the default pager is part of the [[tcb]]).

(C) Mach then invokes the memory\_object\_return method on the memory
object.  The server is expected to save the page and free it in a
timely fashion.  The server is not required to send a response to the
kernel.

(D) The manager then transfers the data to the storeio which
eventually sends it to disk.  The device driver consumes the memory,
doing the equivalent of a vm\_deallocate.
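
The reason for switching the manager before returning the page can be seen in a toy model with one well-behaved and one unresponsive server.  All names here (`EvictingKernel`, `PromptServer`, `SlowServer`, `memory_pressure`) are invented for illustration, and the model assumes that "later" simply means the next time the kernel needs memory.

```python
class EvictingKernel:
    def __init__(self):
        self.pages = {}  # page id -> {"manager": ..., "data": ...}
        self.swap = {}   # page id -> data written out by the default pager

    def evict(self, pid, server):
        page = self.pages[pid]
        page["manager"] = "default pager"                       # (B)
        server.memory_object_return(self, pid, page["data"])    # (C)

    def deallocate(self, pid):
        """The server has saved the page and freed it."""
        self.pages.pop(pid, None)

    def memory_pressure(self):
        # Pages the server has not freed are backed by the default
        # pager by now, so the kernel can push them to swap itself.
        stale = [pid for pid, page in self.pages.items()
                 if page["manager"] == "default pager"]
        for pid in stale:
            self.swap[pid] = self.pages.pop(pid)["data"]


class PromptServer:
    def __init__(self):
        self.store = {}

    def memory_object_return(self, kernel, pid, data):
        self.store[pid] = data   # (D) store_write
        kernel.deallocate(pid)   # frees the page in a timely fashion


class SlowServer:
    def memory_object_return(self, kernel, pid, data):
        pass  # never saves or frees the page
```

With `PromptServer` the data ends up in the server's store and nothing reaches swap.  With `SlowServer` the page is still reclaimed, at the cost of being paged out a second time.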