[[!meta copyright="Copyright © 2012, 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_libpthread]]

`t/fix_have_kernel_resources`

Address problem mentioned in [[/libpthread]], *Threads' Death*.


# IRC, freenode, #hurd, 2012-08-30

    <braunr> tschwinge: this issue needs more cooperation with the kernel
    <braunr> tschwinge: i.e. the ability to tell the kernel where the stack is,
      so it's unmapped when the thread dies
    <braunr> without requiring another thread to perform this deallocation


## IRC, freenode, #hurd, 2013-05-09

    <bddebian> braunr: Speaking of which, didn't you say you had another "easy"
      task?
    <braunr> bddebian: make a system call that both terminates a thread and
      releases memory
    <braunr> (the memory released being the thread stack)
    <braunr> this way, a thread can completely terminate itself without the
      assistance of a managing thread or deferring work
    <bddebian> braunr: That's "easy" ? :)
    <braunr> bddebian: since it's just a thread_terminate+vm_deallocate, it is
    <braunr> something like thread_terminate_self
    <bddebian> But a syscall not an RPC right?
    <braunr> in hurd terminology, we don't make the distinction
    <braunr> the only real syscalls are mach_msg (obviously) and some to get
      well known port rights
    <braunr> e.g. mach_task_self
    <braunr> everything else should be an RPC but could be a system call for
      performance
    <braunr> since mach was designed to support clusters, it was necessary that
      anything not strictly machine-local was an RPC
    <braunr> and it also helps emulation a lot
    <braunr> so keep doing RPCs :p


## IRC, freenode, #hurd, 2013-05-10

    <braunr> i'm not sure it should only apply to self though
    <braunr> youpi: can we get a quick opinion on this please ?
    <braunr> i've suggested bddebian to work on a new RPC that both terminates
      a thread and releases its stack to help fix libpthread
    <braunr> and initially, i thought of it as operating only on the calling
      thread
    <braunr> do you see any reason to make it work on any thread ?
    <braunr> (e.g. a real thread_terminate + vm_deallocate)
    <braunr> (or any reason not to)
    <youpi> thread stack deallocation is always a burden indeed
    <youpi> I'd tend to think it'd be useful, but perhaps ask the list


## IRC, freenode, #hurd, 2013-06-26

    <braunr> looks like there is a port right leak in libpthread
    <braunr> grmbl, the port leak seems to come from mach_port_destroy being
      buggy :/
    <braunr> hum, apparently we're not the only ones to suffer from port leaks
      wrt mach_port_destroy
    <braunr> ew, libpthread is leaking
    <pinotree> memory or ports?
    <braunr> both
    <pinotree> sounds great ;)
    <braunr> as it is, libpthread doesn't destroy threads
    <braunr> it queues them so they're recycled later
    <braunr> but there is confusion between the thread structure itself and its
      internal resources
    <braunr> i.e. there is pthread_alloc which allocates a thread structure,
      and pthread_create which allocates everything else
    <braunr> but on pthread_exit, nothing is destroyed
    <braunr> when a thread structure is reused, its internal resources are
      replaced by new instances
    <pinotree> oh
    <braunr> it's ok for joinable threads but most of our threads are detached
    <braunr> pinotree: as expected, it's bigger than expected :p
    <braunr> so i won't be able to write a quick fix
    <braunr> the true way to fix this is make it possible for threads to free
      their own resources
    <braunr> let's do that :p
    <braunr> ok, got the new thread termination function, i'll build eglibc
      package providing it, then experiment with libpthread
    <pinotree> braunr: iirc there's also a tschwinge patch in the debian eglibc
      about that
    <braunr> ah
    <pinotree> libpthread_fix.diff
    <braunr> i see
    <braunr> thanks for the notice
    <braunr> bddebian:
      http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch
    <braunr> bddebian: this is what it looks like
    <braunr> see, short and easy
    <bddebian> Aye but didn't youpi say not to bother with it??
    <braunr> he did ?
    <braunr> i don't remember
    <bddebian> I thought that was the implication.  Or maybe that was the one I
      already did!?
    <braunr> i'd be interested in reading that
    <braunr> anyway, there still are problems in libpthread, and this call is
      one building block to fix some of them
    <braunr> some important ones
    <braunr> (big leaks)


## IRC, freenode, #hurd, 2013-06-29

    <braunr> damn, i fix leaks in libpthread, only to find out leaks somewhere
      else :(
    <braunr> bddebian: ok, actually it was a bit more complicated than what i
      showed you
    <braunr> because in addition to the stack, the call must also release the
      send right in the caller's ipc space
    <braunr> (it can't be released before since there would be no means to
      reference the thread to destroy)
    <braunr> or perhaps it should strictly be reserved to self termination
    <braunr> hmm
    <braunr> yes it would probably be simpler
    <braunr> but it should be a decent compromise
    <braunr> i'm close to having a libpthread that doesn't leak anything
    <braunr> and that properly destroys threads and their resources


## IRC, freenode, #hurd, 2013-06-30

    <braunr> bddebian: ok, it was even more tricky, because the kernel would
      save the return value on the user stack (which is released by the call
      and then invalid) before checking for asynchronous software traps (ASTs,
      a kind of software interrupts in mach), and terminating the calling
      thread is done by a deferred AST ... :)
    <braunr> hmm, making threads able to terminate themselves makes rpctrace a
      bit useless :/
    <braunr> well, more restricted

    <braunr> ok so, tough question :
    <braunr> i have a small test program that creates a thread, and inspects
      its state before any thread dies
    <braunr> i can see msg_report_wait requests when using ps
    <braunr> (one per thread)
    <braunr> one of these requests creates a new receive right, apparently for
      the second thread in the test program
    <braunr> each time i use ps, i can see the sequence numbers of two receive
      rights increase
    <braunr> i guess these rights are related to proc and signal handling per
      thread
    <braunr> but i can't find what creates them
    <braunr> does anyone know ?
    <braunr> tschwing_: ^ :)

    <braunr> again, too many things wrong elsewhere to cleanly destroy threads
      ..
    <braunr> something is deeply wrong with controlling terminals ..


## IRC, freenode, #hurd, 2013-07-01

    <braunr> youpi: if you happen to notice what receive right is created for
      each thread (beyond the obvious port used for blocking and waking up),
      please let me know
    <braunr> it's the only port leak i have with thread destruction
    <braunr> and i think it's related to the proc server since i see the
      sequence number increase every time i use ps

    <braunr> pinotree: my change doesn't fix all the pthread leaks but it's a
      lot better
    <braunr> bddebian: i've spent almost the whole week end trying to find the
      last port leak without success
    <braunr> there is some weird bug related to the controlling tty that hits
      me every time i try to change something
    <braunr> it's the same bug that prevents ttys from being correctly closed
      when using ssh or screen
    <braunr> well maybe not the same, but it's close
    <braunr> some stale receive right kept around for no apparent reason
    <braunr> and i can't find its source


## IRC, freenode, #hurd, 2013-07-02

    <braunr> and btw, i don't think i can make my libpthread patch work
    <braunr> i'll just aim at avoiding leaks, but destroying threads and their
      related resources depends on other changes i don't clearly see


## IRC, freenode, #hurd, 2013-07-03

    <braunr> grmbl, i don't want to give up thread destruction ..


## IRC, freenode, #hurd, 2013-07-15

    <braunr> btw, my work on thread destruction is currently stalled
    <braunr> i don't have much free time right now


## IRC, freenode, #hurd, 2013-09-13

    <braunr> i think i know why my thread_terminate_deallocate patches leak one
      receive port :>
    <braunr> but now i'm not sure of the proper solution
    <braunr> every time a thread is created and destroyed, a receive right is
      leaked
    <braunr> i guess it's simply the reply port ..
    <braunr> grmbl
    <braunr> i guess i have to make it a simpleroutine ...
    <braunr> hm too bad, it's not the reply port :(
    <braunr> it's also leaking some memory
    <braunr> it doesn't seem related to my changes though
    <braunr> stacks, rights, and threads are correctly destroyed
    <braunr> some obscure state is left behind
    <braunr> i wonder how exception ports are dealt with
    <braunr> vminfo seems to confirm memory is leaking in the heap
    <braunr> humpf
    <braunr> oh silly me
    <braunr> i don't detach threads
    <teythoon> well, detach them ;)
    <braunr> hm worse :p
    <braunr> now i get additional dead names
    <braunr> but it's a step forward


## IRC, freenode, #hurd, 2013-09-16

    <braunr> that thread port leak is so strange
    <braunr> the leaked port seems to be created when the new thread starts
      running
    <braunr> so it looks like a port the kernel would implicitly create
    <braunr> hm could it be a thread-specific reply port ?
    <youpi> ah, yes, there is one of those
    <braunr> how come mach/mig-reply.c in glibc isn't thread-safe ?
    <youpi> it is overridden by sysdeps/mach/hurd/mig-reply.c I guess
    <youpi> which uses a threadvar for the mig reply port
    <braunr> oh
    <youpi> talking of which, there is also last_value in
      sysdeps/mach/strerror_l.c
    <youpi> strerror_thread_freeres is supposed to get called, but who knows
    <braunr> it does look to be that port
    <youpi> iirc that's the issue which prevents from letting us make threads
      exit on idleness?
    <braunr> one of them
    <youpi> ok
    <braunr> maybe the only one, yes
    <braunr> i see memory leaks but they could be related/normal
    <braunr> (i.e. not actual leaks)
    <braunr> on the other hand, i also can't boot a hurd with my patch
    <braunr> but i consider removing such leaks a priority
    <braunr> does anyone know the semantic difference between
      __mig_put_reply_port and __mig_dealloc_reply_port ?
    <braunr> i guess __mig_dealloc_reply_port is actually a destruction
      operation, right ?
    <youpi> AIUI, dealloc is used when one wants the port not to be reused at
      all
    <youpi> because it has been used as a reference for something, and can
      still be currently in use
    <youpi> while put_reply would be when we're really done with it, and won't
      use it again, and can thus be used as such
    <youpi> or at least something like that
    <braunr> heh
    <braunr> __mig_dealloc_reply_port calls __mach_port_mod_refs, which is a
      RPC, and creates a new reply port when destroying the current one
    <youpi> bah
    <youpi> that's fine, it's a deref of the old port, which is not in the
      reply_port variable any more
    <braunr> it's fine, but still a leak
    <youpi> well, dealloc does not completely dealloc, yes
    <braunr> that's not really the problem here
    <braunr> i've introduced a case that wasn't considered at the time, namely
      that a thread can destroy itself
    <youpi> we probably need another function to be called from the thread exit
    <braunr> i'll simply try with mach_port_destroy
    <braunr> mach_port_destroy seems to be a RPC too ...
    <braunr> grmbl
    <youpi> isn't there a trap version somehow ?
    <braunr> not in libc
    <youpi> erf
    <braunr> at least i know what's wrong now :)
    <braunr> there still is a small memory leak i have to investigate
    <braunr> but outside the stack
    <braunr> the stack, the thread name and the thread are correctly destroyed
    <braunr> slabinfo confirms only one port leak and nothing else is leaked
    <braunr> ok so the port leak was indeed the thread-specific reply port,
      taken care of
    <braunr> there are memory leaks too


## IRC, freenode, #hurd, 2013-09-17

    <braunr> teythoon: on my side, i'm getting to know our threading
      implementation better
    <braunr> closing in on clean thread destruction
    <braunr> x15 ipc will hide reply ports ;p
    <braunr> memory leaks solved \o/
    <braunr> now, have to fix memory release when joining
    <braunr> proper reference counting on detach/join/exit, let's see how it
      goes ..
    <braunr> seems to work fine


## IRC, freenode, #hurd, 2013-09-18

    <braunr> ok i'll soon have gnumach and libc packages including proper
      thread destruction :>
    <teythoon> braunr: why did you have to touch gnumach?
    <braunr> to add a call allowing threads to release ports and memory
    <braunr> i.e. their last self reference, their reply port and their stack
    <braunr> let me publish my current patches
    <teythoon> braunr: thread_commit_suicide ?
    <braunr> hehe
    <braunr> initially thread_terminate_self but
    <braunr> it can be used by other threads too
    <braunr> so i named it thread_terminate_release
    <braunr> http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch
    <braunr>
      http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch
    <braunr> the pthread patch needs to be polished because it changes the
      semantics of pthread_thread_halt
    <braunr> but other than that, it should be complete
    <pinotree> pthread_thread_halt_reallyhalt
    <braunr> ok let's try these libc packages
    <braunr> old static ext2fs for the root, but other than that, it boots
    <braunr> let's try iceweasel
    <braunr> (i'll need to build a hurd package against this new libc, removing
      the libports_stability patch which prevents thread destruction in servers
      on the way)
    <teythoon> prevents thread destruction o_O
    <braunr> yes
    <braunr> in libports only ;p
    <teythoon> oh, *only* in libports, I assumed for a moment that it affected
      almost every component of the Hurd...
    <teythoon> *phew*
    <braunr> ... :)
    <braunr> that's why, after a burst of messages, say because of aptitude
      (select), you may see a few hundred threads still hanging around
    <braunr> also why unused servers remain running even after several minutes,
      where the normal timeout is 2mins
    <teythoon> I wondered about that, some servers (symlink comes to mind) seem
      to go away if unused (or that's how I read the code)
    <braunr> symlinks are usually not servers, since most of them actually
      exist in file systems, and are implemented through an optimization
    <teythoon> yes I know that
    <teythoon> trans/symlink.c reads:
    <teythoon>       /* The timeout here is 10 minutes */
    <teythoon>       err = mach_msg_server_timeout (fsys_server, 0, control,
    <teythoon> 				     MACH_RCV_TIMEOUT, 1000 * 60 * 10);
    <teythoon>       if (err == MACH_RCV_TIMED_OUT)
    <teythoon> 	exit (0);
    <braunr> ok
    <teythoon> hm, /hurd/symlink doesn't feel at all like a symlink... but
      works like one
    <braunr> well, starting iceweasel makes X on my host freeze oO
    <braunr> bbl
    <teythoon> /hurd/symlink translators do go away after being unused for 10
      minutes... this is funny if they are set up by hand instead of being
      started from a passive translator record
    <teythoon> magically vanishing symlinks ;)
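
The idle-timeout pattern quoted from `trans/symlink.c` (serve requests, exit when none arrives within the timeout) can be sketched outside Mach as well. The version below swaps in POSIX `poll()` on a file descriptor for `mach_msg_server_timeout`; the one-byte "requests" and the name `toy_serve_until_idle` are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Handle one-byte "requests" from fd until none arrives for timeout_ms.
   Returns the number of requests served, or -1 on error. */
static int toy_serve_until_idle(int fd, int timeout_ms)
{
    int served = 0;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: wait again */
            return -1;
        }
        if (ready == 0)
            return served;      /* idle for the whole timeout: caller may exit */

        char request;
        ssize_t n = read(fd, &request, 1);
        if (n < 0)
            return -1;
        if (n == 0)
            return served;      /* the writer closed its end */
        served++;
    }
}
```

A server built this way would call `exit (0)` once the function returns, which is the behaviour that makes hand-started `/hurd/symlink` translators disappear after their timeout.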


## IRC, freenode, #hurd, 2013-09-19

    <braunr> hum, i can't rebuild a hurd package :(
    <teythoon> braunr: with your thread destruction patches in libc?
    <braunr> yes but it's unrelated
    <braunr> In file included from ../../libdiskfs/boot-start.c:38:0:
    <braunr> ./fsys_reply_U.h:173:15: error: conflicting types for
      ‘fsys_get_children’
    <braunr> i didn't see a new libc debian release
    <teythoon> hm, David reported that as well
    <teythoon>
      id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com
    <teythoon> uh oh
    <teythoon> it seems I didn't add a _reply suffix to the reply routines :/
    <teythoon> there's quite a bit of fallout from my patches, I kinda feel bad
      :(
    <braunr> teythoon: what i'm wondering is what youpi did too, since he got
      hurd binary packages
    <teythoon> braunr: well neither he nor I noticed that b/c for us the
      declarations were just missing
    <braunr> from libc you mean ?
    <braunr> or hum gnumach-common ?
    <teythoon> not sure actually
    <braunr> no it's not a gnumach thing
    <braunr> hurd-dev then
    <teythoon> the build system should have caught these, or mig...
    <braunr> also, i see you changed fsys_reply.defs, but nothing about
      fsys_request.defs
    <teythoon> I have no fsys_request.defs
    <braunr> looks like there was no fsys_request.defs in the first place
      ... *sigh*
    <braunr> do you know an application that often creates and destroys threads
      ?
    <teythoon> no, sorry
    <pinotree> maybe some test suite
    <braunr> ah right
    <braunr> sysbench maybe
    <braunr> also, i've been hit by a lot more network deadlocks than usual
      lately
    <braunr> fixing netdde has gained some priority in my todo list


## IRC, freenode, #hurd, 2013-09-20

    <braunr> oh, git is multithreaded
    <braunr> great
    <braunr> so i've actually tested my libpthread patch quite a lot