summaryrefslogtreecommitdiff
path: root/open_issues/ext2fs_page_cache_swapping_leak.mdwn
blob: 81915492bf400bf1b326398811bffd2d9155e7a6 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
[[!meta copyright="Copyright © 2011, 2012, 2013 Free Software Foundation,
Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach open_issue_hurd]]

There is a [[!FF_project 272]][[!tag bounty]] on this task.

[[!toc]]


# IRC, OFTC, #debian-hurd, 2011-03-24

    <youpi> I still believe we have an ext2fs page cache swapping leak, however
    <youpi> as the 1.8GiB swap was full, yet the ld process was only 1.5GiB big
    <pinotree> a leak at swapping time, you mean?
    <youpi> I mean the ext2fs page cache being swapped out instead of simply
      dropped
    <pinotree> ah
    <pinotree> so the swap tends to accumulate unuseful stuff, i see
    <youpi> yes
    <youpi> the disk content, basicallyt :)


# IRC, freenode, #hurd, 2011-04-18

    <antrik> damn, a cp -a simply gobbles down swap space...
    <braunr> really ?
    <braunr> that's weird
    <braunr> why would a copy use so much anonymous memory ?
    <braunr> unless the external pager is so busy that the kernel falls back to
      its default pager
    <youpi> that's what I suggested some time ago
    <braunr> maybe this case should be traced in the kernel
    <braunr> a simple message in the kernel buffer to warn that this condition
      happened may help
    <youpi> I'm seeing swap space being kept used on buildds for no real reason
      except possibly backing ext2fs pages
    <youpi> that could help, yes
    <antrik> youpi: I think it was actually slpz who suggested that...
    <youpi> I think we're generally missing feedback from memory behavior
    <antrik> youpi: do you think andrei's kernel instrumentation work might be
      helpful with analyzing such things?
    <youpi> antrik: I think I suggested it too, but never mind
    <youpi> antrik: no, because it's not a trace of events that you want
    <youpi> some specific events would be useful
    <youpi> but then we don't really need a whole framework for that
    <antrik> apt-get upgrade eats swap too
    <youpi> the upgrade itself, or the computation of the ugprade?
    <youpi> apt is a memory eater nowadays
    <antrik> installing the packages
    <antrik> seems to have stabilized though after a while...
    <antrik> so perhaps it's not a leak in this case
    <youpi> ideally we should have a way to know what was put in the swap
    <braunr> how would you represent what's in the swap ?
    <antrik> the apt-get process has 46M of virtual memory above the 128 M
      baseline
    <braunr> mostly libraries i guess
    <braunr> are trheads stacks 8 MiB like on Linux ?
    <youpi> braunr: at least knowing how much of each process is in the swap
    <youpi> braunr: 2MiB
    <braunr> ok
    <youpi> vminfo could also report which parts of the address space are in
      the swap
    <antrik> youpi: would be nice to have some simple utility reporting how
      much of a process' address space is anonymous
    <antrik> (in fact, I wonder why it's not reported by standard tools such as
      ps or top... this shouldn't be too difficult I would think?)
    <antrik> it would be much more useful information than the total virt size,
      which includes rather meaningless disk and device mappings...
    <youpi> agreed
    <braunr> well
    <braunr> there are tools like pmap for this
    <braunr> unfortunately, it's difficult in mach to know what backs a
      non-anonymous mapping
    <braunr> pagers should be able to name their mappings
    <youpi> that'd be helpful for debugging yes
    <braunr> there is almost no overhead in doing that, and it would be very
      useful
    <youpi> and could lead to /proc/pid/maps
    <braunr> yes
    <braunr> isn't there a maps already ?
    <youpi> nope
    <braunr> ok
    <youpi> (probably not very useful without the names)
    <braunr>  ithought i remembered maps without names, and guessed it might
      have been on the hurd for that reason
    <braunr> but i'm not sure
    <youpi> there's the vminfo command, yes
    <braunr> 14:06 < youpi> braunr: at least knowing how much of each process
      is in the swap
    <braunr> wouldn't it be clearer to do it the other way around ?
    <braunr> like a swapinfo tool indicating what it contains ?
    <youpi> sure, but it's a lot more difficult
    <braunr> really ?
    <braunr> why ?
    <youpi> because you have to traverse all the mappings
    <youpi> etc
    <youpi> (in all processes, I mean)
    <youpi> and you have to name what is waht
    <braunr> there are other ways
    <braunr> the swap is a central structure
    <youpi> while simply introducing the swap %  in vminfo
    <youpi> for a given process you know what is what
    <braunr> right
    <youpi> and doing that introduction is  probably very simple
    <braunr> that's a good point
    <braunr> top-down is effectively easier than bottom-up resolution in Mach
      VM
    <antrik> hm... the memory use caused by cp doesn't seem to be reflected in
      the virtual size of any particular process
    <antrik> ghost memory
    <braunr> what's cp vmsize at the time of the problem ?
    <antrik> it's at 134 M right now... so considering the 128 M baseline,
      nothing worth speaking of
    <braunr> right
    <braunr> maybe a copy map during I/O
    <braunr> but I don't know Mach copy maps in detail, as they have been
      eliminated from UVM
    <antrik> BTW, the memory eatup happens even before swap comes into
      play... swapping seems to be a result of the problem, not the cause
    <braunr> what do you mean ?
    <braunr> I thought swapping was the issue
    <braunr> you mean RAM is full before swapping ?
    <antrik> well, I don't know what the actual problem is... I just don't
      understand why the memory use increases without any particular process
      seeing an increase in size
    <antrik> the "free" size in vmstat decreses
    <antrik> once it's eatun up, swap space use increases
    <braunr> well it doesn't change much of it
    <braunr> the anonymous memory pager will use RAM before resorting to the
      external default-pager
    <antrik> I would suspect normal block caching... but then, shouldn't this
      show up in the memory info of the ext2 process?
    <braunr> although, again, I'm not sure of the behaviour of the anonymous
      memory pager
    <braunr> antrik: I don't know how block caching behaves
    <antrik> BTW, is it a know problem that doing ^C on a "cp -a" seems to hang
      the whole system?...
    <antrik> (the whole hurd instance that is... the other instance is not
      affected)
    <youpi> not that I know of
    <braunr> seems like a deadlock in the anonymous memory handling
    <youpi> (and I've never seen that)
    <antrik> happens both in my main system (using ancient hurd/libc) and in my
      subhurd (recently upgraded to current stuff)
    <antrik> this make testing this stuff quite a lot harder... [sigh]
    <antrik> any suggestions how to debug this hang?
    <braunr> antrik: no :/

2011-04-28: [[!taglink open_issue_documentation]]

    <antrik> hm... is it normal that "swap free" doesn't increase as a process'
      memory is paged back in?
    <youpi> yes
    <youpi> there's no real use cleaning swap
    <youpi> on the contrary, it makes paging the process out again longer
    <antrik> hm... so essentially, after swapping back and forth a bit, a part
      of the swap equal to the size of physical RAM will be occupied with stuff
      that is actually in RAM?
    <youpi> yes
    <youpi> so that that RAM can be freed immediately if needed
    <antrik> hm... that means my effective swap size is only like 300 MB... no
      wonder I see crashes under load
    <antrik> err... make that 230 actually
    <antrik> indeed, quitting the application freed both the physical RAM and
      swap space
    <braunr> 02:28 < antrik> hm... is it normal that "swap free" doesn't
      increase as a process' memory is paged back in?
    <braunr> swap is the backing store of anonymous memory, like ext2fs is the
      backing store of memory objects created from its pager
    <braunr> so you can view swap as the file system for everything that isn't
      an external memory object


# IRC, freenode, #hurd, 2011-11-15

    <braunr> hm, now my system got unstable
    <braunr> swap is increasing, without any apparent reason
    <antrik> you mean without any load?
    <braunr> with load, yes
    <braunr> :)
    <antrik> well, with load is "normal"...
    <antrik> at least for some loads
    <braunr> i can't create memory pressure to stress reclaiming without any
      load
    <antrik> what load are you using?
    <braunr> ftp mirrorring
    <antrik> hm... never tried that; but I guess it's similar to apt-get
    <antrik> so yes, that's "normal". I talked about it several times, and also
      wrote to the ML
    <braunr> antrik: ok
    <antrik> if you find out how to fix this, you are my hero ;-)
    <braunr> arg :)
    <antrik> I suspect it's the infamous double swapping problem; but that's
      just a guess
    <braunr> looks like this
    <antrik> BTW, if you give me the exact command, I could check if I see it
      too
    <braunr> i use lftp (mirror -Re) from a linux git repository
    <braunr> through sftp
    <braunr> (lots of small files, big content)
    <antrik> can't you just give me the exact command? I don't feel like
      figuring it out myself
    <braunr> antrik: cd linux-stable; lftp sftp://hurd_addr/
    <braunr> inside lftp: mkdir linux-stable; cd linux-stable; mirror -Re
    <braunr> hm, half of physical memory just got freed
    <braunr> our page cache is really weird :/
    <braunr> (i didn't delete any file when that happened)
    <antrik> hurd_addr?
    <braunr> ssh server ip address
    <braunr> or name
    <braunr> of your hurd :)
    <antrik> I'm confused. you are mirroring *from* the Hurd box?
    <braunr> no, to it
    <antrik> ah, so you login via sftp and then push to it?
    <braunr> yes
    <braunr> fragmentation looks very fine
    <braunr> even for the huge pv_entry cache and its 60k+ entries
    <braunr> (and i'm running a kernel with the cpu layer enabled)
    <braunr> git reset/status/diff/log/grep all work correctly
    <braunr> anyway, mcsim's branch looks quite stable to me
    <antrik> braunr: I can't reproduce the swap leak with ftp. free memory
      idles around 6.5 k (seems to be the threshold where paging starts), and
      swap use is constant
    <antrik> might be because everything swappable is already present in swap
      from previous load I guess...
    <antrik> err... scratch that. was connected to the wrong host, silly me
    <antrik> indeed swap gets eaten away, as expected
    <antrik> but only if free memory actually falls below the
      threshold. otherwise it just oscillates around a constant value, and
      never touches swap
    <antrik> so this seems to confirm the double swapping theory
    <youpi> antrik: is that "double swap" theory written somewhere?
    <youpi> (no, a quick google didn't tell me)


## IRC, freenode, #hurd, 2011-11-16

    <antrik> youpi:
      http://lists.gnu.org/archive/html/l4-hurd/2002-06/msg00001.html talks
      about "double paging". probably it's also the term others used for it;
      however, the term is generally used in a completely different meaning, so
      I guess it's not really suitable for googling either ;-)
    <antrik> IIRC slpz (or perhaps someone else?) proposed a solution to this,
      but I don't remember any details
    <youpi> ok so it's the same thing I was thinking about with swap getting
      filled
    <youpi> my question was: is there something to release the double swap,
      once the ext2fs pager managed to recover?
    <antrik> apparently not
    <antrik> the only way to free the memory seems to be terminating the FS
      server
    <youpi> uh :/


# IRC, freenode, #hurd, 2011-11-30

    <antrik> slpz: basically, whenever free memory goes below the paging
      threshold (which seems to be around 6 MiB) while there is other I/O
      happening, swap usage begins to increase continuously; and only gets
      freed again when the filesystem translator in question exits
    <antrik> so it sounds *very* much like pages go to swap because the
      filesystem isn't quick enough to properly page them out
    <antrik> slpz: I think it was you who talked about double paging a while
      back?
    <slpz> antrik: probably, sounds like me :-)
    <antrik> slpz: I have some indication that the degenerating performance and
      ultimate hang issues I'm seeing are partially or entirely caused by
      double paging...
    <antrik> slpz: I don't remember, did you propose some possible fix?
    <slpz> antrik: hmm... perhaps it wasn't me, because I don't remember trying
      to fix that problem...
    <slpz> antrik: at which point do you think pages get duplicated?
    <antrik> slpz: it was a question. I don't remember whether you proposed
      something or not :-)
    <antrik> slpz: basically, whenever free memory goes below the paging
      threshold (which seems to be around 6 MiB) while there is other I/O
      happening, swap usage begins to increase continuously; and only gets
      freed again when the filesystem translator in question exits
    <antrik> so it sounds *very* much like pages go to swap because the
      filesystem isn't quick enough to properly page them out
    <slpz> antrik: I see
    <slpz> antrik: I didn't addressed this problem directly, but when I've
      modified the pageout mechanism to provide a special treatment for
      external pages, I also removed the possibility of sending them to the
      default pager
    <slpz> antrik: this was in my experimental environment, of course
    <antrik> slpz: oh, nice... so it may fix the issues I'm seeing? :-)
    <antrik> anything testable yet?
    <slpz> antrik: yes, only anonymous memory could be swapped with that
    <slpz> antrik: it works, but is ugly as hell
    <antrik> tschwinge: these is also your observation about compilations
      getting slower on further runs, and my followups... I *suspect* it's the
      same issue

[[performance/degradation]].

    <slpz> antrik: I'm thinking about establishing a repository for these
      experimental versions, so they don't get lost with the time
    <antrik> slpz: please do :-)
    <slpz> antrik: perhaps in savannah's HARD project
    <antrik> even if it's not ready for upstream, it would be nice if I could
      test it -- right now it's bothering me more than any other Hurd issues I
      think...
    <slpz> also, there's another problem which causes performance degradation
      with the simple use of the system
    <tschwinge> slpz: Please just push to Savannah Hurd.  Under your
      slpz/... or similar.
    <tschwinge> antrik: Might very well be, yes.
    <slpz> and I almost sure it is the fragmentation of the task map
    <slpz> tschwinge: ok
    <slpz> after playing a bit with a translator, it can easily get more than
      3000 entries in its map
    <antrik> slpz: yeah, other issues might play a role here as well. I
      observed that terminating the problematic FS servers does free most of
      the memory and remove most of the performance degradation, but in some
      cases it's still very slow
    <slpz> that makes vm_map_lookup a lot slower
    <antrik> on a related note: any idea what can cause paging errors and a
      system hang even when there is plenty of free swap?
    <antrik> (I'm not entirely sure, but my impression is that it *might* be
      related to the swap usage and performance degradation problems)
    <slpz> I think this degree of fragmentation has something to do with the
      reiterative mapping of memory objects which is done in pager-memcpy.c
    <slpz> antrik: which kind of paging errors?
    <antrik> hm... I don't think I ever noted down the exact message; but I
      think it's the same you get when actually running out of swap
    <slpz> antrik: that could be the default pager dying for some internal bug
    <antrik> well, but it *seems* to go along with the performance degradation
      and/or swap usage
    <slpz> I also have the impression that we're using memory objects the wrong
      way
    <antrik> basically, once I get to a certain level of swap use and slowness
      (after about a month of use), the system eventually dies
    <slpz> antrik: I never had a system running for that time, so it could be a
      completely different problem from what I've seen before :-/
    <slpz> Anybody has experience with block-level caches on microkernel
      environments?
    <antrik> slpz: yeah, it typically happens after about a month of my normal
      use... but I can significantly accellerate it by putting some problematic
      load on it, such as large apt-get runs...
    <slpz> I wonder if it would be better to put them in kernel or in user
      space. And in the latter, if it would be better to have one per-device
      shared for all accesing translators, or just each task should have its
      own cache...
    <antrik> slpz:
      http://lists.gnu.org/archive/html/bug-hurd/2011-09/msg00041.html is where
      I described the issue(s)
    <antrik> (should send another update for the most recent findings I
      guess...)
    <antrik> slpz: well, if we move to userspace drivers, the kernel part of
      the question is already answered ;-)
    <antrik> but I'm not sure about per-device cache vs. caching in FS server