summaryrefslogtreecommitdiff
path: root/open_issues/gnumach_vm_map_red-black_trees.mdwn
blob: 53ff66c50aa4d76a760c246f510cc1bcdc5d403e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]


# IRC, freenode, #hurd, 2012-04-23

    <braunr> btw, i'm running a gnumach version using red-black trees for vm
      map entries
    <antrik> braunr: sounds fashionable ;-)
    <youpi> braunr: with some perf improvement?
    <braunr> looks promising for our ext2fs instances showing several thousands
      of map entries
    <braunr> youpi: i'm not using it for lookups yet
    <braunr> but the tree is there, maintained, used for both regular and copy
      maps, and it doesn't crash
    <youpi> good :)
    <braunr> antrik: isn't it ? :)
    <braunr> youpi: and the diff stat is like 50/15
    <antrik> braunr: what's the goal of using the fashionable trees?
    <braunr> antrik: speeding up lookups in address spaces
    <antrik> braunr: so the idea is that if we have a heavily fragmented
      address space, the performance penalty is smaller?
    <braunr> yes
    <antrik> OK
    <antrik> I take it you gave up on attempts to actually decrease
      fragmentation?...
    <braunr> it's not as good as reducing fragmentation, which requires
      implementing a powerful merge, but it's still better
    <braunr> yes
    <braunr> it's too messy for my brain :/
    <antrik> I see
    <antrik> oh
    <braunr> it will add some overhead though
    <youpi> I guess log(n) ?
    <braunr> but if there is a significant performance gain, it'll be worth it
    <braunr> yes
    <braunr> i was more thinking about the memory overhead
    <antrik> right now it's a linear list?
    <youpi> I don't think we care nowadays :)
    <braunr> antrik: yes
    <antrik> ouch
    <braunr> antrik: yes ... :>
    <braunr> the original authors expected vm maps to have like 30 entries
    <braunr> so they used a list for the maps, and a hash table for the
      object/offset to physical page lookups
    <braunr> there is a small lookup cache though, which is a nice optimization
    <braunr> my code now uses it first, and falls back to the RB tree if the
      hint didn't help
    <antrik> braunr: well, don't forget to check whether it actually *is* still
      an optimisation, when using fashionable trees ;-)
    <braunr> antrik: i checked that already :)
    <braunr> i did the same in x15
    <antrik> I see
    <braunr> both bsd and linux uses a similar technique
    <braunr> use*
    <braunr> (well, bsd actually use what is done in mach :)
    <antrik> (or perhaps the other way around... ;-) )
    <braunr> i don't think so, as the bsd vm is really the mach vm
    <braunr> but we don't care much
    <antrik> oh, right... that part actually went full circle
    <braunr> youpi: i have a patch ready for test on machines with significant
      amounts of map entries (e.g. the buildds ..)
    <braunr> youpi: i won't have time for tests tonight, are you interested ?
    <braunr> (i've been running it for 15 minutes without any issue for now)
    <youpi> I'd say post to the list
    <braunr> ok
    <youpi> braunr: your patch uses the rb tree for lookups, right?
    <youpi> braunr: the buildd using rbtree seems swift
    <youpi> but maybe it's just a psychologic effect :)
    <youpi> the chroot ext2fs already has 1392 lines in vminfo
    <youpi> an rbtree can't hurt  there :)
    <youpi> braunr: it really seems faster
    <youpi> the reboot might have helped too
    <youpi> benchmarks shall say
    <youpi> for now, I'll just let ironforge use it
    <antrik> youpi: it's always fast after a reboot ;-)
    <youpi> sure
    <youpi> but still
    <youpi> I mean
    <youpi> *obviously* the reboot has helped
    <youpi> but it might not be all
    <youpi> at least it feels so
    <youpi> and obviously only benchmarks can say
    <antrik> the major benefit AIUI is rather that the slowdown happening over
      time will be less noticable

[[performance/degradation]].

    <youpi> "over time" is actually quite fast
    <youpi> ext2 fills up very quickly when you build a package
    <youpi> it went up to 1700 lines very quickly
    <youpi> and stabilized around there
    <antrik> yeah, it can be very fast under heavy load
    <youpi> that's why I say reboot seems not all
    <youpi> it's already not so fresh
    <youpi> with 1700 lines in vminfo
    <antrik> well, I don't know how much of the slowdown I'm experiencing
      (after doing stuff under memory pressure) is actually related to VM map
      fragmentation...
    <antrik> might be all, might be none, might be something in between
    <youpi> then try his patch
    <antrik> guess I should play a bit with vminfo...
    <antrik> well, currently I actually experience pretty little slowdown, as
      for certain reasons (only indirectly related to the Hurd) I'm not running
      mutt on this machine, so I don't really have memory pressure...


## IRC, freenode, #hurd, 2012-04-24

    <braunr> youpi: yes, it uses bst lookups
    <braunr> youpi: FYI, the last time i checked, one ext2fs instance had 4k+
      map entries, and another around 7.5k
    <braunr> (on ironforge)


## IRC, freenode, #hurd, 2012-04-24

    <youpi> braunr: $ sudo vminfo  624 | wc -l
    <youpi> 22957
    <youpi> there's no way it can not help :)
    <braunr> youpi: 23k, that's really huge


## IRC, freenode, #hurd, 2012-04-26

    <braunr> youpi: any new numbers wrt the rbtree patch ?
    <youpi> well, buildd times are not really accurate :)
    <youpi> but what I can tell is that it managed to build qtwebkit fine
    <braunr> ok
    <youpi> so the patch is probably safe :)
    <braunr> i'll commit it soon then
    <youpi> I'd say feel free to, yes
    <braunr> thanks


## IRC, freenode, #hurd, 2012-04-27

    <braunr> elmig: don't expect anything grand though, this patch is mostly
      useful when address spaces get really fragmented, which mainly happens on
      buildds
    <braunr> (and it only speeds lookups, which isn't as good as reducing
      fragmentation; things like fork still have to copy thousands of map
      entries)

[[glibc/fork]].


## IRC, freenode, #hurdfr, 2012-06-02

    <youpi> braunr: oh, un bug de rbtree
    <youpi> Assertion `diff != 0' failed in file "vm/vm_map.c", line 1002
    <youpi> c'est dans rbtree_insert()
    <youpi> vm_map_enter (vm/vm_map.c:1002).
    <youpi> vm_map (vm/vm_user.c:373).
    <youpi> syscall_vm_map (kern/ipc_mig.c:657).
    <youpi> erf j'ai tué mon débuggueur, je ne peux pas inspecter plus
    <youpi> le peu qui me reste c'est qu'apparemment target_map == 1, size ==
      0, mask == 0
    <youpi> anywhere = 1
    <braunr> youpi: ça signifie sûrement que des adresses overlappent
    <braunr> je rejetterai un coup d'oeil sur le code demain
    <braunr> (si ça se trouve c'est un bug rare de la vm, le genre qui fait
      crasher le noyau)
    <braunr> (enfin jveux dire, qui faisait crasher le noyau de façon très
      obscure avant le patch rbtree)


### IRC, freenode, #hurd, 2012-07-15

    <bddebian> I get errors in vm_map.c whenever I try to "mount" a CD
    <bddebian> Hmm, this time it rebooted the machine
    <bddebian> braunr: The translator set this time and the machine reboots
      before I can get the full message about vm_map, but here is some of the
      crap I get:  http://paste.debian.net/179191/
    <braunr> oh
    <braunr> nice
    <braunr> that may be the bug youpi saw with my redblack tree patch
    <braunr> bddebian: assert(diff != 0); ?
    <bddebian> Aye
    <braunr> good
    <braunr> it means we're trying to insert a vm_map_entry at a region in a
      map which is already occupied
    <bddebian> Oh
    <braunr> and unlike the previous code, the tree actually checks that
    <braunr> it has to
    <braunr> so you just simply use the iso9660fs translator and it crashes ?
    <bddebian> Well it used to on just trying to set the translator.  This time
      I was able to set the translator but as soon as I cd to the mount point I
      get all that crap
    <braunr> that's very good
    <braunr> more test cases to fix the vm


### IRC, freenode, #hurd, 2012-11-01

    <youpi> braunr: Assertion `diff != 0' failed in file "vm/vm_map.c", line
      1002
    <youpi> that's in rbtree_insert
    <braunr> youpi: the problem isn't the tree, it's the map entries
    <braunr> some must overlap
    <braunr> if you can inspect that, it would be helpful
    <youpi> I have a kdb there
    <youpi> it's within a port_name_to_task system call
    <braunr> this assertion basically means there already is an item in the
      tree where the new item is supposed to be inserted
    <youpi> this port_name_to_task presence in the stack is odd
    <braunr> it's in vm_map_enter
    <youpi> there's a vm_map just after that (and the assembly trap code
      before)
    <youpi> I know
    <youpi> I'm wondering about the caller
    <braunr> do you have a way to inspect the inserted map entry ?
    <youpi> I'm actually wondering whether I have the right kernel in gdb
    <braunr> oh
    <youpi> better
    <youpi> with the right kernel :)
    <youpi> 0x80039acf (syscall_vm_map)
      (target_map=d48b6640,address=d3b63f90,size=0,mask=0,anywhere=1)
    <youpi> size == 0 seems odd to me
    <youpi> (same parameters for vm_map)
    <braunr> right
    <braunr> my code does assume an entry has a non null size
    <braunr> (in the entry comparison function)
    <braunr>        EINVAL (since Linux 2.6.12) length was 0.
    <braunr> that's a quick glance at mmap(2)
    <braunr> might help track bugs from userspace (e.g. in exec .. :))
    <braunr> posix says the saem
    <braunr> same*
    <braunr> the gnumach manual isn't that precise
    <youpi> I don't seem to manage to read the entry
    <youpi> but I guess size==0 is the problem anyway
    <mcsim> youpi, braunr: Is there another kernel fault? Was that in my
      kernel?
    <braunr> no that's another problem
    <braunr> which became apparent following the addition of red black trees in
      the vm_map code
    <braunr> (but which was probably present long before)
    <mcsim> braunr: BTW, do you know if there where some specific circumstances
      that led to memory exhaustion in my code? Or it just aggregated over
      time?
    <braunr> mcsim: i don't know
    <mcsim> s/where/were
    <mcsim> braunr: ok


### IRC, freenode, #hurd, 2012-11-05

    <tschwinge> braunr: I have now also hit the diff != 0 assertion error;
      sitting in KDB, waiting for your commands.
    <braunr> tschwinge: can you check the backtrace, have a look at the system
      call and its parameters like youpi did ?
    <tschwinge> If I manage to figure out how to do that...  :-)
    * tschwinge goes read scrollback.
    <braunr> "trace" i suppose
    <braunr> if running inside qemu, you can use the integrated gdb server
    <tschwinge> braunr: No, hardware.  And work intervened.  And mobile phone
      <-> laptop via bluetooth didn't work.  But now:
    <tschwinge> Pretty similar to Samuel's:
    <tschwinge>     Assert([...])
    <tschwinge>     vm_map_enter(0xc11de6c8, 0xc1785f94, 0, 0, 1)
    <tschwinge>     vm_map(0xc11de6c8, 0xc1785f94, 0, 0, 1)
    <tschwinge>     syscall_vm_map(1, 0x1024a88, 0, 0, 1)
    <tschwinge>     mach_call_call(1, 0x1024a88, 0, 0, 1)
    <braunr> thanks
    <braunr> same as youpi observed, the requested size for the mapping is 0
    <braunr> tschwinge: thanks
    <tschwinge> braunr: Anything else you'd like to see before I reboot?
    <braunr> tschwinge: no, that's enough for now, and the other kind of info
      i'd like are much more difficult to obtain
    <braunr> if we still have the problem once a small patch to prevent null
      size is applied, then it'll be worth looking more into it
    <pinotree> isn't it possible to find out who called with that size?
    <braunr> not easy, no
    <braunr> it's also likely that the call that fails isn't the first one
    <pinotree> ah sure
    <pinotree> braunr: making mmap reject 0 size length could help? posix says
      such size should be rejected straight away
    <braunr> 17:09 < braunr> if we still have the problem once a small patch to
      prevent null size is applied, then it'll be worth looking more into it
    <braunr> that's the idea
    <braunr> making faulty processes choke on it should work fine :)
    <pinotree> «If len is zero, mmap() shall fail and no mapping shall be
      established.»
    <pinotree> braunr: should i cook up such patch for mmap?
    <braunr> no, the change must be applied in gnumach
    <pinotree> sure, but that could simply such condition in mmap (ie avoiding
      to call io_map on a file)
    <braunr> such calls are erroneous and rare, i don't see the need
    <pinotree> ok
    <braunr> i bet it comes from the exec server anyway :p
    <tschwinge> braunr: Is the mmap with size 0 already a reproducible testcase
      you can use for the diff != 0 assertion?
    <tschwinge> Otherwise I'd have a reproducer now.
    <braunr> tschwinge: i'm not sure but probably yes
    <tschwinge> braunr: Otherwise, take GDB sources, then: gcc -fsplit-stack
      gdb/testsuite/gdb.base/morestack.c && ./a.out
    <tschwinge> I have not looked what exactly this does; I think -fsplit-stack
      is not really implemented for us (needs something in libgcc we might not
      have), is on my GCC TODO list already.
    <braunr> tschwinge: interesting too :)


### IRC, freenode, #hurd, 2012-11-19

    <tschwinge> braunr: Hmm, I have now hit the diff != 0 GNU Mach assertion
      failure during some GCC invocation (GCC testsuite) that does not relate
      to -fsplit-stack (as the others before always have).
    <tschwinge> Reproduced:
      /media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/xgcc
      -B/media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/
      /home/thomas/tmp/gcc/hurd/master/gcc/testsuite/gcc.dg/torture/pr42878-1.c
      -fno-diagnostics-show-caret -O2 -flto -fuse-linker-plugin
      -fno-fat-lto-objects -fcompare-debug -S -o pr42878-1.s
    <tschwinge> Will check whether it's the same backtrace in GNU Mach.
    <tschwinge> Yes, same.
    <braunr> tschwinge: as youpi seems quite busy these days, i'll cook a patch
      and commit it directly
    <tschwinge> braunr: Thanks!  I have, by the way, confirmed that the
      following is enough to trigger the issue: vm_map(mach_task_self(), 0, 0,
      0, 1, 0, 0, 0, 0, 0, 0);
    <tschwinge> ... and before the allocator patch, GNU Mach did accept that
      and return 0 -- though I did not check what effect it actually has.  (And
      I don't think it has any useful one.)  I'm also reading that as of lately
      (Linux 2.6.12), mmap (length = 0) is to return EINVAL, which I think is
      the foremost user of vm_map.
    <pinotree> tschwinge: posix too says to return EINVAL for length = 0
    <braunr> yes, we checked that earlier with youpi

[[!message-id "87sj8522zx.fsf@kepler.schwinge.homeip.net"]].

    <braunr> tschwinge: well, actually your patch is what i had in mind
      (although i'd like one in vm_map_enter to catch wrong kernel requests
      too)
    <braunr> tschwinge: i'll work on it tonight, and do some testing to make
      sure we don't regress critical stuff (exec is another major direct user
      of vm_map iirc)
    <tschwinge> braunr: Oh, OK.  :-)