[[!meta copyright="Copyright © 2012, 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_libpthread]]

`t/fix_have_kernel_resources`

Address problem mentioned in [[/libpthread]], *Threads' Death*.

# IRC, freenode, #hurd, 2012-08-30

    tschwinge: this issue needs more cooperation with the kernel
    tschwinge: i.e. the ability to tell the kernel where the stack is, so it's unmapped when the thread dies
    without requiring another thread to perform this deallocation

## IRC, freenode, #hurd, 2013-05-09

    braunr: Speaking of which, didn't you say you had another "easy" task?
    bddebian: make a system call that both terminates a thread and releases memory
    (the memory released being the thread stack)
    this way, a thread can completely terminate itself without the assistance of a managing thread or deferring work
    braunr: That's "easy" ? :)
    bddebian: since it's just a thread_terminate+vm_deallocate, it is
    something like thread_terminate_self
    But a syscall not an RPC right?
    in hurd terminology, we don't make the distinction
    the only real syscalls are mach_msg (obviously) and some to get well known port rights
    e.g. mach_task_self
    everything else should be an RPC but could be a system call for performance
    since mach was designed to support clusters, it was necessary that anything not strictly machine-local was an RPC
    and it also helps emulation a lot
    so keep doing RPCs :p

## IRC, freenode, #hurd, 2013-05-10

    i'm not sure it should only apply to self though
    youpi: can we get a quick opinion on this please ?
    i've suggested bddebian to work on a new RPC that both terminates a thread and releases its stack to help fix libpthread
    and initially, i thought of it as operating only on the calling thread
    do you see any reason to make it work on any thread ?
    (e.g. a real thread_terminate + vm_deallocate)
    (or any reason not to)
    thread stack deallocation is always a burden indeed
    I'd tend to think it'd be useful, but perhaps ask the list

## IRC, freenode, #hurd, 2013-06-26

    looks like there is a port right leak in libpthread
    grmbl, the port leak seems to come from mach_port_destroy being buggy :/
    hum, apparently we're not the only ones to suffer from port leaks wrt mach_port_destroy
    ew, libpthread is leaking
    memory or ports?
    both
    sounds great ;)
    as it is, libpthread doesn't destroy threads
    it queues them so they're recycled later
    but there is confusion between the thread structure itself and its internal resources
    i.e. there is pthread_alloc which allocates a thread structure, and pthread_create which allocates everything else
    but on pthread_exit, nothing is destroyed
    when a thread structure is reused, its internal resources are replaced by new instances
    oh
    it's ok for joinable threads but most of our threads are detached
    pinotree: as expected, it's bigger than expected :p
    so i won't be able to write a quick fix
    the true way to fix this is make it possible for threads to free their own resources
    let's do that :p
    ok, got the new thread termination function, i'll build eglibc package providing it, then experiment with libpthread
    braunr: iirc there's also a tschwinge patch in the debian eglibc about that
    ah
    libpthread_fix.diff
    i see
    thanks for the notice
    bddebian: http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch
    bddebian: this is what it looks like
    see, short and easy
    Aye but didn't youpi say not to bother with it??
    he did ?
    i don't remember
    I thought that was the implication. Or maybe that was the one I already did!?
    i'd be interested in reading that
    anyway, there still are problems in libpthread, and this call is one building block to fix some of them
    some important ones
    (big leaks)

## IRC, freenode, #hurd, 2013-06-29

    damn, i fix leaks in libpthread, only to find out leaks somewhere else :(
    bddebian: ok, actually it was a bit more complicated than what i showed you
    because in addition to the stack, the call must also release the send right in the caller's ipc space
    (it can't be released before since there would be no means to reference the thread to destroy)
    or perhaps it should strictly be reserved to self termination
    hmm
    yes it would probably be simpler
    but it should be a decent compromise
    i'm close to having a libpthread that doesn't leak anything
    and that properly destroys threads and their resources

## IRC, freenode, #hurd, 2013-06-30

    bddebian: ok, it was even more tricky, because the kernel would save the return value on the user stack (which is released by the call and then invalid) before checking for asynchronous software traps (ASTs, a kind of software interrupt in mach), and terminating the calling thread is done by a deferred AST ... :)
    hmm, making threads able to terminate themselves makes rpctrace a bit useless :/
    well, more restricted
    ok so, tough question :
    i have a small test program that creates a thread, and inspects its state before any thread dies
    i can see msg_report_wait requests when using ps
    (one per thread)
    one of these requests creates a new receive right, apparently for the second thread in the test program
    each time i use ps, i can see the sequence numbers of two receive rights increase
    i guess these rights are related to proc and signal handling per thread
    but i can't find what creates them
    does anyone know ?
    tschwing_: ^
    :)
    again, too many things wrong elsewhere to cleanly destroy threads ..
    something is deeply wrong with controlling terminals ..
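
The ordering problem discussed on 2013-06-29 and 2013-06-30 (a thread needs its stack to run and its send right to name itself, so it cannot release them one by one and then terminate) can be pictured with a small toy model. This is only a sketch under stated assumptions: `struct toy_thread`, `toy_self_step` and the other `toy_*` names are hypothetical and are not gnumach or libpthread interfaces, and the model ignores everything except the three resources mentioned in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a gnumach or libpthread API. */
struct toy_thread {
    bool alive;          /* the thread can still execute code */
    bool stack_mapped;   /* its stack is still mapped in the task */
    bool has_self_right; /* the send right naming the thread is still held */
};

enum toy_step { TOY_TERMINATE, TOY_UNMAP_STACK, TOY_DROP_RIGHT };

/* One step that the thread performs on itself from user code.
   Returns false when the step cannot be carried out. */
static bool toy_self_step(struct toy_thread *t, enum toy_step step)
{
    /* User code needs a live thread running on a mapped stack. */
    if (!t->alive || !t->stack_mapped)
        return false;

    switch (step) {
    case TOY_TERMINATE:
        /* The thread to terminate is designated through its send right. */
        if (!t->has_self_right)
            return false;
        t->alive = false;
        return true;
    case TOY_UNMAP_STACK:
        t->stack_mapped = false;
        return true;
    case TOY_DROP_RIGHT:
        t->has_self_right = false;
        return true;
    }
    return false;
}

/* Number of resources left behind (thread, stack, right: 0 to 3). */
static int toy_leftovers(const struct toy_thread *t)
{
    return (int)t->alive + (int)t->stack_mapped + (int)t->has_self_right;
}

/* The thread tries the three steps one after the other, in the given order. */
int toy_leaks_with_separate_steps(const enum toy_step order[3])
{
    struct toy_thread t = { true, true, true };

    assert(order != NULL);
    for (int i = 0; i < 3; i++)
        toy_self_step(&t, order[i]);
    return toy_leftovers(&t);
}

/* A single kernel-side operation can release everything at once,
   because it does not run on the user stack it is unmapping. */
int toy_leaks_with_combined_call(void)
{
    struct toy_thread t = { true, true, true };

    t.has_self_right = false;
    t.stack_mapped = false;
    t.alive = false;
    return toy_leftovers(&t);
}
```

In this model, each of the six possible orderings of the separate steps leaves at least one resource behind, while the combined call leaves none. That is the motivation for a single call, under the model's simplifications.
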
## IRC, freenode, #hurd, 2013-07-01

    youpi: if you happen to notice what receive right is created for each thread (beyond the obvious port used for blocking and waking up), please let me know
    it's the only port leak i have with thread destruction
    and i think it's related to the proc server since i see the sequence number increase every time i use ps
    pinotree: my change doesn't fix all the pthread leaks but it's a lot better
    bddebian: i've spent almost the whole week end trying to find the last port leak without success
    there is some weird bug related to the controlling tty that hits me every time i try to change something
    it's the same bug that prevents ttys from being correctly closed when using ssh or screen
    well maybe not the same, but it's close
    some stale receive right kept around for no apparent reason
    and i can't find its source

## IRC, freenode, #hurd, 2013-07-02

    and btw, i don't think i can make my libpthread patch work
    i'll just aim at avoiding leaks, but destroying threads and their related resources depends on other changes i don't clearly see

## IRC, freenode, #hurd, 2013-07-03

    grmbl, i don't want to give up thread destruction ..

## IRC, freenode, #hurd, 2013-07-15

    btw, my work on thread destruction is currently stalled
    i don't have much free time right now

## IRC, freenode, #hurd, 2013-09-13

    i think i know why my thread_terminate_deallocate patches leak one receive port :>
    but now i'm not sure of the proper solution
    every time a thread is created and destroyed, a receive right is leaked
    i guess it's simply the reply port ..
    grmbl
    i guess i have to make it a simpleroutine ...
    hm too bad, it's not the reply port :(
    it's also leaking some memory
    it doesn't seem related to my changes though
    stacks, rights, and threads are correctly destroyed
    some obscure state is left behind
    i wonder how exception ports are dealt with
    vminfo seems to confirm memory is leaking in the heap
    humpf
    oh silly me
    i don't detach threads
    well, detach them ;)
    hm
    worse :p
    now i get additional dead names
    but it's a step forward

## IRC, freenode, #hurd, 2013-09-16

    that thread port leak is so strange
    the leaked port seems to be created when the new thread starts running
    so it looks like a port the kernel would implicitly create
    hm
    could it be a thread-specific reply port ?
    ah, yes, there is one of those
    how come mach/mig-reply.c in glibc isn't thread-safe ?
    it is overridden by sysdeps/mach/hurd/mig-reply.c I guess
    which uses a threadvar for the mig reply port
    oh
    talking of which, there is also last_value in sysdeps/mach/strerror_l.c
    strerror_thread_freeres is supposed to get called, but who knows
    it does look to be that port
    iirc that's the issue which prevents us from making threads exit on idleness?
    one of them
    ok
    maybe the only one, yes
    i see memory leaks but they could be related/normal
    (i.e. not actual leaks)
    on the other hand, i also can't boot a hurd with my patch
    but i consider removing such leaks a priority
    does anyone know the semantic difference between __mig_put_reply_port and __mig_dealloc_reply_port ?
    i guess __mig_dealloc_reply_port is actually a destruction operation, right ?
    AIUI, dealloc is used when one wants the port not to be reused at all
    because it has been used as a reference for something, and can still be currently in use
    while put_reply would be when we're really done with it, and won't use it again, and can thus be used as such
    or at least something like that
    heh
    __mig_dealloc_reply_port calls __mach_port_mod_refs, which is a RPC, and creates a new reply port when destroying the current one
    bah
    that's fine, it's a deref of the old port, which is not in the reply_port variable any more
    it's fine, but still a leak
    well, dealloc does not completely dealloc, yes
    that's not really the problem here
    i've introduced a case that wasn't considered at the time, namely that a thread can destroy itself
    we probably need another function to be called from the thread exit
    i'll simply try with mach_port_destroy
    mach_port_destroy seems to be a RPC too ...
    grmbl
    isn't there a trap version somehow ?
    not in libc
    erf
    at least i know what's wrong now :)
    there still is a small memory leak i have to investigate
    but outside the stack
    the stack, the thread name and the thread are correctly destroyed
    slabinfo confirms only one port leak and nothing else is leaked
    ok so the port leak was indeed the thread-specific reply port, taken care of
    there are memory leaks too

## IRC, freenode, #hurd, 2013-09-17

    teythoon: on my side, i'm getting to know our threading implementation better
    closing in on clean thread destruction
    x15 ipc will hide reply ports ;p
    memory leaks solved \o/
    now, have to fix memory release when joining
    proper reference counting on detach/join/exit, let's see how it goes ..
    seems to work fine

## IRC, freenode, #hurd, 2013-09-18

    ok i'll soon have gnumach and libc packages including proper thread destruction :>
    braunr: why did you have to touch gnumach?
    to add a call allowing threads to release ports and memory
    i.e. their last self reference, their reply port and their stack
    let me publish my current patches
    braunr: thread_commit_suicide ?
    hehe
    initially thread_terminate_self but
    it can be used by other threads too
    so i named it thread_terminate_release
    http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch
    http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch
    the pthread patch needs to be polished because it changes the semantics of pthread_thread_halt
    but other than that, it should be complete
    pthread_thread_halt_reallyhalt
    ok let's try these libc packages
    old static ext2fs for the root, but other than that, it boots
    let's try iceweasel
    (i'll need to build a hurd package against this new libc, removing the libports_stability patch which prevents thread destruction in servers on the way)
    prevents thread destruction o_O
    yes
    in libports only ;p
    oh, *only* in libports, I assumed for a moment that it affected almost every component of the Hurd...
    *phew* ...
    :)
    that's why, after a burst of messages, say because of aptitude (select), you may see a few hundred threads still hanging around
    also why unused servers remain running even after several minutes, where the normal timeout is 2mins
    I wondered about that, some servers (symlink comes to mind) seem to go away if unused (or that's how I read the code)
    symlinks are usually not servers, since most of them actually exist in file systems, and are implemented through an optimization
    yes I know that
    trans/symlink.c reads:
      /* The timeout here is 10 minutes */
      err = mach_msg_server_timeout (fsys_server, 0, control,
                                     MACH_RCV_TIMEOUT, 1000 * 60 * 10);
      if (err == MACH_RCV_TIMED_OUT)
        exit (0);
    ok
    hm, /hurd/symlink doesn't feel at all like a symlink... but works like one
    well, starting iceweasel makes X on my host freeze oO
    bbl
    /hurd/symlink translators do go away after being unused for 10 minutes...
    this is funny if they are set up by hand instead of being started from a passive translator record
    magically vanishing symlinks ;)

## IRC, freenode, #hurd, 2013-09-19

    hum, i can't rebuild a hurd package :(
    braunr: with your thread destruction patches in libc?
    yes but it's unrelated
    In file included from ../../libdiskfs/boot-start.c:38:0:
    ./fsys_reply_U.h:173:15: error: conflicting types for ‘fsys_get_children’
    i didn't see a new libc debian release
    hm, David reported that as well
    id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com
    uh oh
    it seems I didn't add a _reply suffix to the reply routines :/
    there's quite a bit of fallout from my patches, I kinda feel bad :(
    teythoon: what i'm wondering is what youpi did too, since he got hurd binary packages
    braunr: well neither he nor I noticed that b/c for us the declarations were just missing
    from libc you mean ?
    or hum gnumach-common ?
    not sure actually
    no it's not a gnumach thing
    hurd-dev then
    the build system should have caught these, or mig...
    also, i see you changed fsys_reply.defs, but nothing about fsys_request.defs
    I have no fsys_requests.defs
    looks like there was no fsys_request.defs in the first place ... *sigh*
    do you know an application that often creates and destroys threads ?
    no, sorry
    maybe some test suite
    ah right
    sysbench maybe
    also, i've been hit by a lot more network deadlocks than usual lately
    fixing netdde has gained some priority in my todo list

## IRC, freenode, #hurd, 2013-09-20

    oh, git is multithreaded
    great
    so i've actually tested my libpthread patch quite a lot

## IRC, freenode, #hurd, 2013-09-25

    on a side note, i was able to build gnumach/libc/hurd packages with thread destruction
    nice :)
    they boot and work mostly fine, although they add their own issues
    e.g. the comm field of the root ext2fs is empty
    ps crashes when trying to display threads
    but thread destruction actually works, i.e.
    servers (those that are configured that way at least) go away after some time, and even heavily used servers such as ext2fs dynamically scale over time :)

## IRC, freenode, #hurd, 2013-10-10

    concerning threads, i think i figured out the last bugs i had with thread destruction
    it should be well on its way to be merged by the end of the year

## IRC, freenode, #hurd, 2013-10-11

    braunr: is your thread destruction patch ready for testing?
    gg0: there are packages at my repository, yes
    but i still have hurd fixes to do before i polish it
    in particular, posix says returning from main() stops the entire process and all other threads
    i didn't check that during the switch to pthreads, and ext2fs (and maybe others) actually return from main but expect other threads to live on
    this creates problems when the main thread is actually destroyed, but not the process
    braunr: tmpfs does something like that, but calls pthread_exit at the end of main
    same effect
    this was fine with cthreads, but must be changed with pthreads
    and libpthread must be fixed to enforce it
    (or libc)
    diskfs_startup_diskfs should probably be changed to reuse the main thread instead of returning

## IRC, freenode, #hurd, 2013-10-19

    I know what threads are, but what is 'thread destruction'?
    the hurd currently never destroys individual threads
    they're destroyed when tasks are destroyed
    if the number of threads in a task peaks at a high number, say thousands of them, they'll remain until the task is terminated
    such tasks are usually file systems, normally never restarted (and in the case of the root file system, not restartable)
    this results in a form of leak
    another effect of this leak is that servers which should go away because of inactivity still remain
    since thread destruction doesn't actually work, the debian package uses a patch to prevent worker threads from timing out
    and to finish with, since thread destruction actually doesn't work, normal (unpatched) applications that destroy threads are certainly failing bad
    i just need to polish a few things, wait for youpi to finish his work on TLS to resolve conflicts, and that will be all