1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
|
[[!meta copyright="Copyright © 2011, 2012, 2013, 2014 Free Software Foundation,
Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]
[[!tag open_issue_hurd]]
* [[notes_bing]]
* [[notes_various]]
* [[tmpfs_vs_defpager]]
* [[!GNU_Savannah_bug 26751]]
[[!toc]]
# [[Maksym_Planeta]]
## IRC, freenode, #hurd, 2011-11-29
<mcsim> Hello. In seqno_memory_object_data_request I call
memory_object_data_supply and supply one zero filled page, but seems that
kernel ignores this call because this page stays filled in specified
memory object. In what cases kernel may ignore this call? It is written
in documentation that "kernel prohibits the overwriting of live data
pages". But when I called memory_object_lock_request on this page with
should flush and MEMORY_OBJECT_RETURN_ALL nothing change
<braunr> what are you trying to do ?
<mcsim> I think that memory object holds wrong data, so I'm trying to
replace them. This happens when file is truncated, so I should notify
memory object that there is no some data. But since gnumach works only
with sizes that are multiple of vm_page_size, I should manually correct
last page for case when file size isn't multiple of vm_page_size. This is
needed for case when file grows again and that tail of last page, which
wasn't part of file should be filled wit
<mcsim> I've put some printf's in kernel and it seems that page that holds
data which I want replace both absent and busy:
<mcsim> m = vm_page_lookup(object,offset);
<mcsim> ...
<mcsim> if (m->absent && m->busy) { <-- Condition is true
<mcsim> in vm/memory_object.c:169
<slpz> mcsim: Receiving m_o_data_request means there's no page in the
memory object at that offset, so m_o_data_supply should work
<slpz> are you sure that page is not being installed into the memory
object?
<braunr> it seems normal it's both absent and busy
<braunr> absent because, as sergio said, the page is missing, and busy
because the kernel starts a transfer for its content
<braunr> i don't understand how you determine the kernel ignores your
data_supply
<braunr> "because this page stays filled in specified memory object"
<braunr> please explain this with more detail
<slpz> mcsim: anyway, when truncating a file to a non page-aligned length,
you can just zero fill the rest of the page by mapping the object and
writing to it with memset/bzero
<braunr> (avoid bzero, it's obsolete)
<mcsim> slpz: I'll try try it now.
<braunr> slpz: i think that's what he's trying to do
<mcsim> I don't vm_map it
<braunr> how do you zero it then ?
<braunr> "I call memory_object_data_supply and supply one zero filled page"
<mcsim> First I call mo_lock_request and ask to return this page, than I
memset tail and try to mo_data_supply
<mcsim> I use this function when I try to replace kr =
memory_object_data_supply(reply_to, offset, addr, vm_page_size, FALSE,
VM_PROT_NONE, FALSE, MACH_PORT_NULL);
<mcsim> where addr points to new data, offset points to old data in
object. and reply_to is memory_control which I get as parameter in
mo_data_request
<braunr> why would you want to vm_map it then ?
<mcsim> because mo_data_supply doesn't work.
<braunr> mcsim: i still don't see why you want to vm_map
<mcsim> I just want to try it.
<braunr> but what do you think will happen ?
<mcsim> But seems that it doesn't work too, because I can't vm_map
memory_object from memory_manager of this object.
## IRC, freenode, #hurd, 2012-01-05
<mcsim> Seems tmpfs works now. The code really needs cleaning, but the main
is that it works. So in nearest future it will be ready for merging to
master branch. BTW, anyone knows good tutorial about refactoring using
git (I have a lot of pointless commits and I want to gather and scatter
them to sensible ones).
<antrik> I wonder whether he actually got the "proper" tmpfs with the
defaul pager working? or only the hack with a private pager?
<mcsim> antrik: with default pager
<antrik> mcsim: wow, that's great :-)
<antrik> how did you fix it?
<mcsim> antrik: The main code I wrote before December, so I forgot some of
what exactly I were doing. So I have to look through my code :)
<mcsim> antrik: old defpager was using old functions like m_o_data_write
instead of m_o_data_return etc. I changed it, mostly because of my
misunderstanding. But I hope that this is not a problem.
## IRC, freenode, #hurd, 2012-01-18
<antrik> mcsim: did you publish your in-progress work?
<mcsim> there is a branch with working tmpfs in git repository:
https://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=mplaneta/tmpfs/defpager
<jd823592> sorry for interrupting the meeting but i wonder what is a
lazyfs?
<mcsim> jd823592: lazyfs is tmpfs which uses own pager
<antrik> mcsim: ah, nice :-)
<antrik> BTW, what exactly did you need to fix to make it work?
<mcsim> most fixes wore in defpager in default_pager_object_set_size. Also,
as i said earlier, I switched to new functions (m_o_data_return instead
of m_o_data_write and so on). I said that this was mostly because of my
misunderstanding, but it turned out that new function provide work with
precious attribute of page.
<mcsim> Also there were some small errors like this:
<mcsim> pager->map = (dp_map_t) kalloc (PAGEMAP_SIZE (new_size));
<mcsim> memcpy (pager->map, old_mapptr, PAGEMAP_SIZE (old_size));
<mcsim> where in second line should be new_size too
<mcsim> I removed all warnings in compiling defpager (and this helped to
find an error).
<antrik> great work :-)
<jd823592> tmpfs is nice thing to have :), are there other recent
improvements that were not yet published in previous moth?
<mcsim> BTW, i measured tmpfs speed in it is up to 6 times faster than
ramdisk+ext2fs
<antrik> mcsim: whow, that's quite a difference... didn't expect that
## IRC, freenode, #hurd, 2012-01-24
<mcsim> braunr: I'm just wondering is there any messages before hurd
breaks. I have quite strange message: memory_object_data_request(0x0,
0x0, 0xf000, 0x1000, 0x1) failed, 10000003
<braunr> hm i don't think so
<braunr> usually it either freezes completely, or it panics because of an
exhausted resource
<mcsim> where first and second 0x0 are pager and pager_request for memory
object in vm_fault_page from gnumach/vm_fault.c
<braunr> if you're using the code you're currently working on (which i
assume), then look for a bug there first
<tschwinge> mcsim: Maybe you're running out of swap?
<mcsim> tschwinge: no
<braunr> also, translate the error code
<mcsim> AFAIR that's MACH_INVALID_DEST
<braunr> and what does it mean in this situation ?
<mcsim> I've run fsx as long as possible several times. It runs quite long
but it breaks in different ways.
[[open_issues/file_system_exerciser]].
<mcsim> MACH_SEND_INVALID_DEST
<mcsim> this means that kernel tries to call rpc with pager 0x0
<mcsim> this is invalid destiantion
<braunr> null port
<braunr> ok
<braunr> did the pager die ?
<mcsim> When I get this message pager dies, but also computer can suddenly
reboot
<braunr> i guess the pager crashing makes mach print this error
<braunr> but then you may have a dead port instead of a null port, i don't
remember the details
<mcsim> braunr: thank you.
<mcsim> btw, for big file sizes fsx breaks on ext2fs
<braunr> could you identify the threshold ?
<braunr> and what's fsx exactly ?
<mcsim> fsx is a testing utility for filesystems
<mcsim> see http://codemonkey.org.uk/projects/fsx/
<braunr> ah, written by tevanian
<mcsim> threshold seems to be 8Mb
<braunr> fyi, avadis tevanian is the main author of the mach 3 core
services and VM parts
<braunr> well, ext2fs is bugged, we already know that
<braunr> old code maintained as well as possible, but still
<mcsim> hmm, with 6mb it breaks too
<braunr> i guess that it may break on anything larger than a page actually
:p
<mcsim> When I tested with size of 256kb, fsx worked quite long and didn't
break
<braunr> mcsim: without knowing exactly what the test actually does, it's
hard to tell
<mcsim> I see, I just wanted to tell that there are bugs in ext2fs too. But
I didn't debugged it.
<mcsim> fsx performs different operations, like read, write, truncate file,
grow file in random order.
<braunr> in parellel too ?
<braunr> parellel
<braunr> parallel*
<mcsim> no
<mcsim> I run several fsx's parallel on tmpfs, but they break on file with
size 8mb.
<braunr> that must match something in mach
<braunr> s/must/could/ :)
<mcsim> braunr: I've pushed my commits to mplaneta/tmpfs/master branch in
hurd repository, so you could review it.
<braunr> you shouldn't do that just for me :p
<braunr> you should do that regularly, and ask for reviews after
(e.g. during the meetings)
<mcsim> everyone could do that :)
<braunr> i'm quite busy currently unfortunately
<braunr> i'll try when i have time, but the best would be to ask very
specific questions
<braunr> these are usually the faster to answer for people ho have the
necessary expertise to help you
<braunr> fastest*
<mcsim> ok.
<mcsim> braunr: probably, I was doing something wrong, because now parallel
works only for small sizes. Sorry, for disinformation.
### IRC, freenode, #hurd, 2012-01-25
<antrik> braunr: actually, the paging errors are *precisely* the way my
system tends to die...
<antrik> (it's after about a month of uptime usually though, not a week...)
<antrik> tschwinge: in my case at least, I have still plenty of swap when
this happens. swap usage is generally at about the amount of physical
memory -- I have no idea though whether there is an actual connection, or
it's just coincidence
<braunr> antrik: ok, your hurd dies because of memory issues, my virtual
machines die because of something else (though idk what)
<antrik> before I aquired the habit of running my box 24/7 and thus hitting
this issue, most of the hangs I experienced were also of a different
nature... but very rare in general, except when doing specific
problematic actions
<mcsim> antrik: yes. Do you get messages like that I posted?
<mcsim> here is it: memory_object_data_request(0x0, 0x0, 0xf000, 0x1000,
0x1) failed, 10000003
<antrik> mcsim: I can't tell for sure (never noted them down, silly me...)
<antrik> but I definitely get paging errors right before it hangs
<antrik> I guess that was unclear... what I'm trying to say is: I do get
memory_object_data_request() failed; but I'm not sure about the
parameters
<mcsim> antrik: ok. Thank you.
<mcsim> I'll try to find something in defpager, but there should be errors
in mach too. At least because sometimes computer suddenly reboots during
test.
<antrik> mcsim: I don't get sudden reboots
<antrik> might be a different error
<antrik> do you have debugging mode activated in Mach? otherwise it reboots
on kernel panics...
<mcsim> antrik: no. But usually on kernel panics mach waits for some time
showing the error message and only than reboots.
<antrik> OK
<mcsim> how can I know that tmpfs is stable enough? Correcting errors in
kernel to make fsx test work seems to be very complex.
<mcsim> *If errors are in kernel.
<antrik> well, it seems that you tested it already much more thoroughly
than any other code in the Hurd was ever tested :-)
<antrik> of course it would be great if you could pinpoint some of the
problems you see nevertheless :-)
<antrik> but that's not really necessary before declaring tmpfs good enough
I'd say
<mcsim> ok. I'll describe every error I meet on my userpage
<mcsim> but it will take some time, not before weekend.
<antrik> don't worry, it's not urgent
<antrik> the reason I'd really love to see those errors investigated is
that most likely they are the same ones that cause stability problems in
actual use...
<antrik> having an easy method for reproducing them is already a good start
<mcsim> no. they are not the same
<mcsim> every time i get different one
<mcsim> especially when i just start one process fsx and wait error
<antrik> mcsim: have you watched memory stats while running it? if it's
related to the problems I'm experiencing, you will probably see rising
memory use while the test is running
<mcsim> it could be reboot, message, I posted and also fsx could stop
telling that something wrong with data
<antrik> you get all of these also on ext2?
<mcsim> i've done it only once. Here is the log:
http://paste.debian.net/153511/
<mcsim> I saved "free" output every 30 seconds
<mcsim> no. I'll do it now
<antrik> would be better to log with "vmstat 1"
<mcsim> ok.
<mcsim> as you can see, there is now any leek during work. But near end
free memory suddenly decreases
<antrik> yeah... it's a bit odd, as there is a single large drop, but seems
stable again afterwards...
<antrik> a more detailed log might shed some light
<mcsim> drop at the beginning was when I started translator.
<mcsim> what kind of log do you mean?
<antrik> vmstat 1 I mean
<mcsim> ah...
## IRC, freenode, #hurd, 2012-02-01
<mcsim> I run fsx with this command: fsx -N3000 foo/bar -S4
-l$((1024*1024*8)). And after 70 commands it breaks.
<mcsim> The strangeness is at address 0xc000 there is text, which was
printed in fsx with vfprintf
<mcsim> I've lost log. Wait a bit, while I generate new
<jkoenig_> mcsim, what's fsx / where can I find it ?
<mcsim> fsx is filesystem exersiser
<mcsim> http://codemonkey.org.uk/projects/fsx/
<jkoenig_> ok thanks
<mcsim> i use it to test tmpfs
<mcsim> here is fsx that compiles on linux: http://paste.debian.net/154390/
and Makefile for it: http://paste.debian.net/154392/
<jkoenig_> mcsim, hmm, I get a failure with ext2fs too, is it expected?
<mcsim> yes
<mcsim> i'll show you logs with tmpfs. They slightly differ
<mcsim> here: http://paste.debian.net/154399/
<mcsim> pre last operation is truncate
<mcsim> and last is read
<mcsim> during pre-last (or last) starting from address 0xa000, every
0x1000 bytes appears text
<mcsim> skipping zero size read
<mcsim> skipping zero size read
<mcsim> truncating to largest ever: 0x705f4b
<mcsim> signal 2
<mcsim> testcalls = 38
<mcsim> this text is printed by fsx, by function prt
<mcsim> I've mistaken: this text appears even from every beginning
<mcsim> I know that this text appears exactly at this moment, because I
added check of the whole file after every step. And this error appeared
only after last truncation.
<mcsim> I think that the problem is in defpager (I'm fixing it), but I
don't understand where defpager could get this text
<jkoenig_> wow I get java code and debconf templates
<mcsim> So, my question is: is it possible for defpager to get somehow this
text?
<jkoenig_> possibly recycled, non-zeroed pages?
<mcsim> hmmm... probably you're right
<jkoenig_> 0x1000 bytes is consistent with the page size
<mcsim> Should I clean these pages in tmpfs?
<mcsim> or in defpager?
<mcsim> What is proper way?
<jkoenig_> mcsim, I'd say defpager should do it, to avoid leaking
information, I'm not sure though.
<jkoenig_> maybe tmpfs should also not assume the pages have been blanked
out.
<mcsim> if i do it in both, it could have big influence on performance.
<mcsim> i'll do it only in defpager so far.
<mcsim> jkoenig_: Thank you a lot
<jkoenig_> mcsim, no problem.
## IRC, freenode, #hurd, 2012-02-08
<tschwinge> mcsim: You pushed another branch with cleaned-up patches?
<mcsim> yes.
<tschwinge> mcsim: Anyway, any data from your report that we could be
interested in? (Though it's not in English.)
<mcsim> It's completely in ukrainian an and mostly describes some aspects
of hurd's work.
<tschwinge> mcsim: OK. So you ran out of time to do the benchmarking,
etc.?
<tschwinge> Comparing tmpfs to ext2fs with RAM backend, etc., I mean.
<mcsim> tschwinge: I made benchmarking and it turned out that tmpfs up to 6
times faster than ext2fs
<mcsim> tschwinge: is it possible to have a review of work, I've already
done, even if parallel writing doesn't work?
<tschwinge> mcsim: Do you need this for university or just a general review
for inclusion in the Git master branch?
<mcsim> general review
<tschwinge> Will need to find someone who feels competent to do that...
<mcsim> the branch that should be checked is tmpfs-final
<pinotree> cool, i guess you tested also special types of files like
sockets and pipes? (they are used in eg /run, /var/run or similar)
<mcsim> Oh. I accidentally created this branch. It is my private
branch. I'll delete it now and merge everything to mplaneta/tmpfs/master
<mcsim> pinotree: Completely forgot about them :( I'll do it by all means
<pinotree> mcsim: no worries :)
<mcsim> tschwinge: Ready. The right branch is mplaneta/tmpfs/master
## IRC, freenode, #hurd, 2012-03-07
<pinotree> did you test it with sockets and pipes?
<mcsim> pinotree: pipes work and sockets seems to work too (I've created
new pfinet device for them and pinged it).
<pinotree> try with simple C apps
<mcsim> Anyway all these are just translators, so there shouldn't be any
problems.
<mcsim> pinotree: works
## IRC, freenode, #hurd, 2012-03-22
<mcsim> Hello. Is it normal that when i try to run du at directory where
translator is mounted it says that directory is 'Not a directory'? Here
are some examples with different filesystems: paste.debian.net/160699
First is ramdisk+ext2fs, second is tmpfs, third is ext2fs.
<civodul> i can't reproduce the problem with ext2fs
<civodul> perhaps you can try rpctracing it to see where ENOTDIR comes from
<mcsim> civodul: when I run du io_stat_request ipc is called. But reply is
((os/kern) invalid address). Where is server code for this ipc? I only
found its definition in defs file and that's all.
<civodul> mcsim: server code is in libdiskfs + ext2fs, for instance
<mcsim> civodul: Does io_stat_request have changed name in server code? I
just can't find it. Here are my grep results fore io_stat_request (i was
grepping in root of hurd repository: paste.debian.net/160708
<youpi> remove _request
<youpi> it's just io_stat
<mcsim> youpi: thank you
## IRC, freenode, #hurd, 2012-04-08
<mcsim> youpi: I've corrected everything you said, and pushed code to new
branch mplaneta/tmpfs/master-v2
<youpi> mcsim: all applied, thanks !
<youpi> I'll probably test it a bit and upload a new version of hurd
<youpi> mcsim: it seems to be working fine indeed!
<mcsim> youpi: thank you for all your reviews, suggestions you gave and
corrections you made :)
<youpi> and it seems translators indeed work there too
<youpi> hopefully it'll work to run the debian installer
<youpi> that'd permit to solve memory consumption
<pinotree> (so tmpfs works really fine now? great!)
<youpi> I could reboot with tmpfs on /tmp and build a package there, yes
<mcsim> youpi: yes, I've compiled several packages already, but it does not
give big advantage in performance.
<youpi> I wasn't really looking for performance, but for correctness :)
<youpi> are you using writeback for your /, actually ?
<youpi> argl, /run gets triggered before mach-defpager is started
<youpi> the X11 socket works there too
<youpi> gnu_srs: might your mouse issue with Xorg be related with vnc usage
too?
<youpi> it seems ENOSPC works fine too
<mcsim> youpi: as to writeback. I think yes, because default pager is asked
to write data only when this data is evicted.
<youpi> I'm talking about kvm
<mcsim> youpi: I use real computer.
<youpi> ok
<youpi> but that indeed means writeback of ext2fs works, which is a good
sign :)
# IRC, freenode, #hurd, 2013-10-04
<teythoon> btw, I noticed that fifos do not work on tmpfs
<braunr> teythoon: tmpfs seems limited, yes
<teythoon> that's annoying b/c /run is a tmpfs on Debian and sysvinit
creates a crontrol fifo there
<teythoon> I wonder why I didn't notice that before
<braunr> also, fifos, like symlinks, can be shortcircuited in libdiskfs
<braunr> i wonder if that has anything to do with the problem at hand
[[mtab/discussion]], *Multiple mtab Translators Spawned*.
<teythoon> b/c this breaks reboot & friends
<teythoon> I do too
<teythoon> b/c I cannot find any shortcut related code in tmpfs
<braunr> well, it's optional normally
<braunr> so that's ok
<braunr> but has it really been tested when the option wasn't there ? :)
<teythoon> yes, but the tmpfs requests this by setting diskfs_shortcut_fifo
= 1;
<pinotree> hm i remember tmpfs was said to be working with
sockets/fifos/etc, back then when it was fixed
<braunr> teythoon: oh
## IRC, freenode, #hurd, 2013-10-11
<teythoon> this will have to wait for the next hurd pkg unfortunately, b/c
I broke tmpfs by accident :-/
<pinotree> how so?
<teythoon> the dropping of privileges broke passive translators and mkfifo
<braunr> there actually is a reason why those are run as root or with the
privilege of their owner
<braunr> privileges should be decoupled from identity
<teythoon> yes
## IRC, freenode, #hurd, 2013-11-08
<teythoon> braunr: I'm investigating this port leak of mine
<teythoon> well, I thought I introduced one
<teythoon> but I'm not too sure anymore
<teythoon> the setting is this
<teythoon> i start an active tmpfs translator, bind it to foo
<teythoon> then, i create foo/bar with a passive translator entry
<teythoon> i access foo/bar, the passive translator is started
<teythoon> my test suite now covers several methods of making that
translator go away
<teythoon> killing it with 15 or 9 is fine, i.e. does not make the first
tmpfs leak ports
<teythoon> however, doing settrans -g foo/bar does for some reason
<teythoon> i think my code is fine, i spent considerable time on tracking
down this problem, always thinking that i must have introduced it
<teythoon> but another thing just cought my eye, the first tmpfs translator
says this when i do settrans -g foo/bar:
<teythoon> tmpfs/tmpfs: pthread_create: Resource temporarily unavailable
<teythoon> could it be that a no-sender notification is ignored b/c the
handler thread failed to start ?
<braunr> teythoon: i saw this pthread error too
# IRC, freenode, #hurd, 2014-02-09
<gg0> remounting tmpfs doesn't work if in use
http://paste.debian.net/plain/80937/
<gg0> you will also get a pthread_create: Resource temporarily unavailable
<youpi> iirc the pthread_create warning happens for any kind of translator
|