summaryrefslogtreecommitdiff
path: root/hurd/translator/tmpfs/tmpfs_vs_defpager.mdwn
blob: 16a2340516d477fe14f3d0ddc4ed9fc75a7c3c2f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
[[!meta copyright="Copyright © 2010, 2011, 2013 Free Software Foundation,
Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach open_issue_hurd]]


# IRC, freenode, #hurd, 2010

    <slpz> humm... why does tmpfs try to use the default pager? that's a bad
      idea, and probably will never work correctly...
    * slpz is thinking about old issues
    <slpz> tmpfs should create its own pagers, just like ext2fs, storeio...
    <slpz> slopez@slp-hurd:~$ settrans -a tmp /hurd/tmpfs 10M
    <slpz> slopez@slp-hurd:~$ echo "foo" > tmp/bar
    <slpz> slopez@slp-hurd:~$ cat tmp/bar
    <slpz> foo
    <slpz> slopez@slp-hurd:~$ 
    <slpz> :-)
    <pochu> slpz: woo you fixed it?
    <slpz> pochu: well, it's WIP, but reading/writing works...
    <slpz> I've replaced the use of default pager for the standard pager
      creation mechanism
    <antrik> slpz: err... how is it supposed to use swap space if not using the
      default pager?
    <antrik> slpz: or do you mean that it should act as a proxy, just
      allocating anonymous memory (backed by the default pager) itself?
    <youpi> antrik: the kernel uses the default pager if the application pager
      isn't responsive enough
    <slpz> antrik: it will just create memory objects and provide zerofilled
      pages when requested by the kernel (after a page fault)
    <antrik> youpi: that makes sense I guess... but how is that relevant to the
      question at hand?...
    <slpz> antrik: memory objects will contain the data by themselves
    <slpz> antrik: as youpi said, when memory is scarce, GNU Mach will start
      paging out data from memory objects to the default pager
    <slpz> antrik: that's the way in which pages will get into swap space
    <slpz> (if needed)
    <youpi> the thing being that the tmpfs pager has a chance to select pages
      he doesn't care any more about
    <antrik> slpz: well, the point is that instead of writing the pages to a
      backing store, tmpfs will just keep them in anonymous memory, and let the
      default pager write them out when there is pressure, right?
    <antrik> youpi: no idea what you are talking about. apparently I still
      don't really understand this stuff :-(
    <youpi> ah, but tmpfs doesn't have pages he doesn't care about, does it?
    <slpz> antrik: yes, but the term "anonymous memory" could be a bit
      confusing.
    <slpz> antrik: in GNU Mach, anonymous memory is backed by a memory object
      without a pager. In tmpfs, nodes will be allocated in memory objects, and
      the pager for those memory objects will be tmpfs itself
    <antrik> slpz: hm... I thought anynymous memory is backed by memory objects
      created from the default pager?
    <antrik> yes, I understand that tmpfs is supposed to be the pager for the
      objects it provides. they are obviously not anonymoust -- they have
      inodes in the tmpfs name space
    <antrik> but my understanding so far was that when Mach returns pages to
      the pager, they end up in anonymous memory allocated to the pager
      process; and then this pager is responsible for writing them back to the
      actual backing store
    <antrik> am I totally off there?...
    <antrik> (i.e. in my understanding the returned pages do not reside in the
      actual memory object the pager provides, but in an anonymous memory
      object)
    <slpz> antrik: you're right. The trick here is, when does Mach return the
      pages?
    <slpz> antrik: if we set the attribute "can_persist" in a memory object,
      Mach will keep it until object cache is full or memory is scarce
    <slpz> or we change the attributes so it can no longer persist, of course
    <slpz> without a backing store, if Mach starts sending us pages to be
      written, we're in trouble
    <slpz> so we must do something about it. One option, could be creating
      another pager and copying the contents between objects.
    <antrik> another pager? not sure what you mean
    <antrik> BTW, you didn't really say why we can't use the default pager for
      tmpfs objects :-)
    <slpz> well, there're two problems when using the default pager as backing
      store for translators
    <slpz> 1) Mach relies on it to do swapping tasks, so meddling with it is
      not a good idea
    <slpz> 2) There're problems with seqnos when trying to work with the
      default pager from tasks other the kernel itself
    <slpz> (probably, the latter could be fixed)
    <slpz> antrik: pager's terminology is a bit confusing. One can also say
      creating another memory object (though the function in libpager is
      "pager_create")
    <antrik> not sure why "meddling" with it would be a problem...
    <antrik> and yeah, I was vaguely aware that there is some seqno problem
      with tmpfs... though so far I didn't really understand what it was about
      :-)
    <antrik> makes sense now
    <antrik> anyways, AIUI now you are trying to come up with a mechanism where
      the default pager is not used for tmpfs objects directly, but without
      making it inefficient?
    <antrik> slpz: still don't understand what you mean by creating another
      memory object/pager...
    <antrik> (and yeat, the terminology is pretty mixed up even in Mach itself)
    <slpz> antrik: I meant creating another pager, in terms of calling again to
      libpager's pager_create
    <antrik> slpz: well, I understand what "create another pager" means... I
      just don't understand what this other pager would be, when you would
      create it, and what for...
    <slpz> antrik: oh, ok, sorry
    <slpz> antrik: creating another pager it's just a trick to avoid losing
      information when Mach's objects cache is full, and it decides to purge
      one of our objects
    <slpz> anyway, IMHO object caching mechanism is obsolete and should be
      replaced
    <slpz> I'm writting a comment to bug #28730 which says something about this
    <slpz> antrik: just one more thing :-) 
    <slpz> if you look at the code, for most time of their lives, anonymous
      memory objects don't have a pager
    <slpz> not even the default one
    <slpz> only the pageout thread, when the system is running really low on
      memory, gives them a reference to the default pager by calling
      vm_object_pager_create
    <slpz> this is not really important, but worth noting ;-)


# IRC, freenode, #hurd, 2011-09-28

    <slpz> mcsim: "Fix tmpfs" task should be called "Fix default pager" :-)
    <slpz> mcsim: I've been thinking about modifying tmpfs to actually have
      it's own storeio based backend, even if a tmpfs with storage sounds a bit
      stupid.
    <slpz> mcsim: but I don't like the idea of having translators messing up
      with the default pager...
    <antrik> slpz: messing up?...
    <slpz> antrik: in the sense of creating a number of arbitrarily sized
      objects
    <antrik> slpz: well, it doesn't really matter much whether a process
      indirectly eats up arbitrary amounts of swap through tmpfs, or directly
      through vm_allocate()...
    <antrik> though admittedly it's harder to implement resource limits with
      tmpfs
    <slpz> antrik: but I've talked about having its own storeio device as
      backend. This way Mach can pageout memory to tmpfs if it's needed.
    <mcsim> Do I understand correctly that the goal of tmpfs task is to create
      tmpfs in RAM?
    <slpz> mcsim: It is. But it also needs some kind of backend, just in case
      it's ordered to page out data to free some system's memory.
    <slpz> mcsim: Nowadays, this backend is another translator that acts as
      default pager for the whole system
    <antrik> slpz: pageout memory to tmpfs? not sure what you mean
    <slpz> antrik: I mean tmpfs acting as its own pager
    <antrik> slpz: you mean tmpfs not using the swap partition, but some other
      backing store?
    <slpz> antrik: Yes.

See also: [[open_issues/resource_management_problems/pagers]].

    <antrik> slpz: I don't think an extra backing store for tmpfs is a good
      idea. the whole point of tmpfs is not having a backing store... TBH, I'd
      even like to see a single backing store for anonymous memory and named
      files
    <slpz> antrik: But you need a backing store, even if it's the default pager
      :-)
    <slpz> antrik: The question is, Should users share the same backing store
      (swap space) or provide their own?
    <antrik> slpz: not sure what you mean by "users" in this context :-)
    <slpz> antrik: Real users with the ability of setting tmpfs translators
    <antrik> essentially, I'd like to have a single partition that contains
      both swap space and the main filesystem (at least /tmp, but probably also
      all of /run, and possibly even /home...)
    <antrik> but that's a bit off-topic :-)
    <antrik> well, ideally all storage should be accounted to a user,
      regardless whether it's swapped out anonymous storage, temporary named
      files, or permanent files
    <slpz> antrik: you could use a file as backend for tmpfs
    <antrik> slpz: what's the point of using tmpfs then? :-)
    <pinotree> (and then store the file in another tmpfs)
    <slpz> antrik: mach-defpager could be modified to use storeio instead of
      Mach's device_* operations, but by the way things work right now, that
      could be dangerous, IMHO
    <antrik> pinotree: hehe
    <pinotree> .. recursive tmpfs'es ;)
    <antrik> slpz: hm, sounds interesting
    <slpz> antrik: tmpfs would try to keep data in memory always it's possible
      (not calling m_o_lock_request would do the trick), but if memory is
      scarce an Mach starts paging out, it would write it to that
      file/device/whatever
    <antrik> ideally, all storage used by system tasks for swapped out
      anonymous memory as well as temporary named files would end up on the
      /run partition; while all storage used by users would end up in /home/*
    <antrik> if users share a partition, some explicit storage accounting would
      be useful too...
    <antrik> slpz: is that any different from what "normal" filesystems do?...
    <antrik> (and *should* it be different?...)
    <slpz> antrik: Yes, as most FS try to synchronize to disk at a reasonable
      rate, to prevent data losses.
    <slpz> antrik: tmpfs would be a FS that wouldn't synchronize until it's
      forced to do that (which, by the way, it's what's currently happening
      with everyone that uses the default pager).
    <antrik> slpz: hm, good point...
    <slpz> antrik: Also, metadata in never written to disk, only kept in memory
      (which saves a lot of I/O, too).
    <slpz> antrik: In fact, we would be doing the same as every other kernel
      does, but doing it explicitly :-)
    <antrik> I see the use in separating precious data (in permanent named
      files) from temporary state (anonymous memory and temporary named files)
      -- but I'm not sure whether having a completely separate FS for the
      temporary data is the right approach for that...
    <slpz> antrik: And giving the user the option to specify its own storage,
      so we don't limit him to the size established for swap by the super-user.
    <antrik> either way, that would be a rather radical change... still would
      be good to fix tmpfs as it is first if possible
    <antrik> as for limited swap, that's precisely why I'd prefer not to have
      an extra swap partition at all...
    <slpz> antrik: It's not much o fa change, it's how it works right now, with
      the exception of replacing the default pager with its own.
    <slpz> antrik: I think it's just a matter of 10-20 hours, as
      much. Including testing.
    <slpz> antrik: It could be forked with another name, though :-)
    <antrik> slpz: I don't mean radical change in the implementation... but a
      radical change in the way it would be used
    <slpz> antrik: I suggest "almosttmpfs" as the name for the forked one :-P
    <antrik> hehe
    <antrik> how about lazyfs?
    <slpz> antrik: That sound good to me, but probably we should use a more
      descriptive name :-)


## 2011-09-29

    <tschwinge> slpz, antrik: There is a defpager in the Hurd code.  It is not
      currently being used, and likely incomplete.  It is backed by libstore.
      I have never looked at it.

[[open_issues/mach-defpager_vs_defpager]].


# IRC, freenode, #hurd, 2011-11-08

    <mcsim> who else uses defpager besides tmpfs and kernel?
    <braunr> normally, nothing directly
    <mcsim> than why tmpfs should use defpager?
    <braunr> it's its backend
    <braunr> backign store rather
    <braunr> the backing store of most file systems are partitions
    <braunr> tmpfs has none, it uses the swap space
    <mcsim> if we allocate memory for tmpfs using vm_allocate, will it be able
      to use swap partition?
    <braunr> it should
    <braunr> vm_allocate just maps anonymous memory
    <braunr> anonymous memory uses swap space as its backing store too
    <braunr> but be aware that this part of the vm system is known to have
      deficiencies
    <braunr> which is why all mach based implementations have rewritten their
      default pager
    <mcsim> what kind of deficiencies?
    <braunr> bugs
    <braunr> and design issues, making anonymous memory fragmentation horrible
    <antrik> mcsim: vm_allocate doesn't return a memory object; so it can't be
      passed to clients for mmap()
    <mcsim> antrik: I use vm_allocate in pager_read_page
    <antrik> mcsim: well, that means that you have to actually implement a
      pager yourself
    <antrik> also, when the kernel asks the pager to write back some pages, it
      expects the memory to become free. if you are "paging" to ordinary
      anonymous memory, this doesn't happen; so I expect it to have a very bad
      effect on system performance
    <antrik> both can be avoided by just passing a real anonymous memory
      object, i.e. one provided by the defpager
    <antrik> only problem is that the current defpager implementation can't
      really handle that...
    <antrik> at least that's my understanding of the situation


# IRC, freenode, #hurd, 2013-07-05

    <teythoon> btw, why does the tmpfs translator have to talk to the pager?
    <teythoon> to get more control about how the memory is paged out?
    <teythoon> read lot's of irc logs about tmpfs on the wiki, but I couldn't
      find the answer to that
    <mcsim> teythoon: did you read this?
      http://www.gnu.org/software/hurd/hurd/translator/tmpfs/tmpfs_vs_defpager.html
    <teythoon> mcsim: I did
    <mcsim> teythoon: Last discussion, i think has very good point.
    <mcsim> To provide memory objects you should implement pager interface
    <mcsim> And if you implement pager interface you are the one who is asked
      to write data to backing storage to evict them
    <mcsim> But tmpfs doesn't do this
    <teythoon> mmm, clients doing mmap...
    <mcsim> teythoon: You don't have mmap
    <mcsim> teythoon: mmap is implemented on top of mach interface
    <mcsim> teythoon: I mean you don't have mmap at this level
    <teythoon> mcsim: sure, but that's close enough for me at this point
    <mcsim> teythoon: diskfs interface requires implementor to provide a memory
      object port (send right)
    <mcsim> Guest8183: Why tmpfs requires defpager
    <Guest8183> how did you get to talk about that ?
    <mcsim> I was just asked
    <teythoon> Guest8183: it's just so unsettling that tmpfs has to be started
      as root :/
    <Guest8183> teythoon: why ?
    *** Guest8183 (~rbraun@dalaran.sceen.net) is now known as braunr_
    <teythoon> braunr_: b/c starting translators isn't a privileged operation,
      and starting a tmpfs translator that doesn't even access any device but
      "just" memory shouldn't require any special privileges as well imho
    <teythoon> so why is tmpfs not based on say libnetfs? b/c it is used for
      d-i and someone (apt?) mmaps stuff?
    <pinotree> being libdiskfs-based isn't much the issue, iirc
    <pinotree> http://lists.gnu.org/archive/html/bug-hurd/2013-03/msg00014.html
      too
    <kilobug> teythoon: AFAIK apt uses mmap, yes
    <braunr_> teythoon: right
    <braunr_> a ramfs is actually tricky to implement well
    <mcsim> braunr_: What do you mean under "to implement well"?
    <braunr_> as efficiently as possible
    <braunr_> i.e. being as close as possible to the page cache for minimum
      overhead
    <mcsim> braunr: AFAIK ramfs should not use swap partition, so page cache
      shouldn't be relevant for it.
    <braunr> i'm talking about a ramfs in general
    <braunr> not the specific linux ramfs
    <braunr> in linux, what they call ramfs is the tiny version of tmpfs that
      doesn't use swap
    <braunr> i actually don't like "tmpfs" much
    <braunr> memfs may be more appropriate
    <braunr> anyway
    <mcsim> braunr: I see. And do you consider defpager variant as "close as
      possible to the page cache"?
    <braunr> not far at least
    <braunr> if we were able to use it for memory obects, it would be nice
    <braunr> but defpager only gets attached to objects when they're evicted
    <braunr> before that, anonymous (or temporary, in mach terminology) objects
      have no backing store
    <braunr> this was probably designed without having tmpfs in mind
    <braunr> i wonder if it's possible to create a memory object without a
      backing store
    <mcsim> what should happen to it if kernel decides to evict it?
    <braunr> it sets the default pager as its backing store and pushes it out
    <mcsim> that's how it works now, but you said "create a memory object
      without a backing store"
    <braunr> mach can do that
    <braunr> i was wondering if we could do that too from userspace
    <mcsim> mach does not evict such objects, unless it bound a defpager to
      them
    <mcsim> but how can you handle this in userspace?
    <braunr> i mean, create a memory object with a null control port
    <braunr> mcsim: is that clearer ?
    <mcsim> suppose you create such object, how kernel will evict it if kernel
      does not know who is responsible for eviction of this object?
    <braunr> it does
    <braunr> 16:41 < braunr> it sets the default pager as its backing store and
      pushes it out
    <braunr> that's how i intend to do it on x15 at least
    <braunr> but it's much simpler there because uvm provides better separation
      between anonymous and file memory
    <braunr> whereas they're much too similar in mach vm
    <mcsim> than what the difference between current situation, when you
      explicitly invoke defpager to create object and implicit method you
      propose?
    <braunr> you don't need a true defpager unless you actually have swap
    <mcsim> ok
    <mcsim> now I see
    <braunr> it also saves the communication overhead when initializing the
      object
    <mcsim> thank you
    <braunr> which may be important since we use ramfs for speed mostly
    <mcsim> agree
    <braunr> it should also simplify the defpager implementation, since it
      would only have a single client, the kernel
    <braunr> which may also be important with regard to global design
    <braunr> one thing which is in my opinion very wrong with mach is that it
      may be a client
    <braunr> a well designed distributed system should normally not allow on
      component to act as both client and server toward another
    <braunr> i.e. the kernel should only be a server, not a client
    <braunr> and there should be a well designed server hierarchy to avoid
      deadlocks
    <braunr> (such as the one we had in libpager because of that)
    <mcsim> And how about filesystem? It acts both as server and as client
    <braunr> yes
    <braunr> but not towards the same other component
    <braunr> application -> file system -> kernel
    <braunr> no "<->"
    <braunr> the qnx documentation explains that quite well
    <braunr> let me see if i can find the related description
    <mcsim> Basically, I've got your point. And I would rather agree that
      kernel should not act as client
    <braunr> mcsim:
      http://www.qnx.com/developers/docs/6.4.0/neutrino/sys_arch/ipc.html#Robust
    <braunr> one way to implement that (and qnx does that too) is to make
      pagers act as client only
    <braunr> they sleep in the kernel, waiting for a reply
    <braunr> and when the kernel needs to evict something, a reply is sent
    <braunr> (qnx doesn't actually do that for paging, but it's a general idea)
    <mcsim> braunr: how hierarchy of senders is enforced?
    <braunr> it's not
    <braunr> developers must take care
    <braunr> same as locking, be careful about it