[[!meta copyright="Copyright © 2012, 2013, 2014 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_libpthread]]

`t/fix_have_kernel_resources`

Address problem mentioned in [[/libpthread]], *Threads' Death*.


# IRC, freenode, #hurd, 2012-08-30

    tschwinge: this issue needs more cooperation with the kernel
    tschwinge: i.e. the ability to tell the kernel where the stack is, so it's unmapped when the thread dies
    without requiring another thread to perform this deallocation


## IRC, freenode, #hurd, 2013-05-09

    braunr: Speaking of which, didn't you say you had another "easy" task?
    bddebian: make a system call that both terminates a thread and releases memory
    (the memory released being the thread stack)
    this way, a thread can completely terminate itself without the assistance of a managing thread or deferring work
    braunr: That's "easy" ? :)
    bddebian: since it's just a thread_terminate+vm_deallocate, it is
    something like thread_terminate_self
    But a syscall not an RPC right?
    in hurd terminology, we don't make the distinction
    the only real syscalls are mach_msg (obviously) and some to get well known port rights
    e.g. mach_task_self
    everything else should be an RPC but could be a system call for performance
    since mach was designed to support clusters, it was necessary that anything not strictly machine-local was an RPC
    and it also helps emulation a lot
    so keep doing RPCs :p


## IRC, freenode, #hurd, 2013-05-10

    i'm not sure it should only apply to self though
    youpi: can we get a quick opinion on this please ?
    i've suggested bddebian to work on a new RPC that both terminates a thread and releases its stack to help fix libpthread
    and initially, i thought of it as operating only on the calling thread
    do you see any reason to make it work on any thread ?
    (e.g. a real thread_terminate + vm_deallocate)
    (or any reason not to)
    thread stack deallocation is always a burden indeed
    I'd tend to think it'd be useful, but perhaps ask the list


## IRC, freenode, #hurd, 2013-06-26

    looks like there is a port right leak in libpthread
    grmbl, the port leak seems to come from mach_port_destroy being buggy :/
    hum, apparently we're not the only ones to suffer from port leaks wrt mach_port_destroy
    ew, libpthread is leaking memory or ports?
    both
    sounds great ;)
    as it is, libpthread doesn't destroy threads
    it queues them so they're recycled later
    but there is confusion between the thread structure itself and its internal resources
    i.e. there is pthread_alloc which allocates a thread structure, and pthread_create which allocates everything else
    but on pthread_exit, nothing is destroyed
    when a thread structure is reused, its internal resources are replaced by new instances
    oh
    it's ok for joinable threads but most of our threads are detached
    pinotree: as expected, it's bigger than expected :p
    so i won't be able to write a quick fix
    the true way to fix this is to make it possible for threads to free their own resources
    let's do that :p
    ok, got the new thread termination function, i'll build eglibc package providing it, then experiment with libpthread
    braunr: iirc there's also a tschwinge patch in the debian eglibc about that
    ah
    libpthread_fix.diff
    i see
    thanks for the notice
    bddebian: http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch
    bddebian: this is what it looks like
    see, short and easy
    Aye but didn't youpi say not to bother with it??
    he did ?
    i don't remember
    I thought that was the implication. Or maybe that was the one I already did!?
    i'd be interested in reading that
    anyway, there still are problems in libpthread, and this call is one building block to fix some of them
    some important ones
    (big leaks)


## IRC, freenode, #hurd, 2013-06-29

    damn, i fix leaks in libpthread, only to find out leaks somewhere else :(
    bddebian: ok, actually it was a bit more complicated than what i showed you
    because in addition to the stack, the call must also release the send right in the caller's ipc space
    (it can't be released before since there would be no means to reference the thread to destroy)
    or perhaps it should strictly be reserved to self termination
    hmm
    yes it would probably be simpler
    but it should be a decent compromise
    i'm close to having a libpthread that doesn't leak anything
    and that properly destroys threads and their resources


## IRC, freenode, #hurd, 2013-06-30

    bddebian: ok, it was even more tricky, because the kernel would save the return value on the user stack (which is released by the call and then invalid) before checking for asynchronous software traps (ASTs, a kind of software interrupt in mach), and terminating the calling thread is done by a deferred AST ... :)
    hmm, making threads able to terminate themselves makes rpctrace a bit useless :/
    well, more restricted
    ok so, tough question :
    i have a small test program that creates a thread, and inspects its state before any thread dies
    i can see msg_report_wait requests when using ps
    (one per thread)
    one of these requests creates a new receive right, apparently for the second thread in the test program
    each time i use ps, i can see the sequence numbers of two receive rights increase
    i guess these rights are related to proc and signal handling per thread
    but i can't find what creates them
    does anyone know ?
    tschwing_: ^ :)
    again, too many things wrong elsewhere to cleanly destroy threads ..
    something is deeply wrong with controlling terminals ..


## IRC, freenode, #hurd, 2013-07-01

    youpi: if you happen to notice what receive right is created for each thread (beyond the obvious port used for blocking and waking up), please let me know
    it's the only port leak i have with thread destruction
    and i think it's related to the proc server since i see the sequence number increase every time i use ps
    pinotree: my change doesn't fix all the pthread leaks but it's a lot better
    bddebian: i've spent almost the whole week end trying to find the last port leak without success
    there is some weird bug related to the controlling tty that hits me every time i try to change something
    it's the same bug that prevents ttys from being correctly closed when using ssh or screen
    well maybe not the same, but it's close
    some stale receive right kept around for no apparent reason
    and i can't find its source


## IRC, freenode, #hurd, 2013-07-02

    and btw, i don't think i can make my libpthread patch work
    i'll just aim at avoiding leaks, but destroying threads and their related resources depends on other changes i don't clearly see


## IRC, freenode, #hurd, 2013-07-03

    grmbl, i don't want to give up thread destruction ..


## IRC, freenode, #hurd, 2013-07-15

    btw, my work on thread destruction is currently stalled
    i don't have much free time right now


## IRC, freenode, #hurd, 2013-09-13

    i think i know why my thread_terminate_deallocate patches leak one receive port :>
    but now i'm not sure of the proper solution
    every time a thread is created and destroyed, a receive right is leaked
    i guess it's simply the reply port ..
    grmbl
    i guess i have to make it a simpleroutine ...
    hm too bad, it's not the reply port :(
    it's also leaking some memory
    it doesn't seem related to my changes though
    stacks, rights, and threads are correctly destroyed
    some obscure state is left behind
    i wonder how exception ports are dealt with
    vminfo seems to confirm memory is leaking in the heap
    humpf
    oh silly me
    i don't detach threads
    well, detach them ;)
    hm
    worse :p
    now i get additional dead names
    but it's a step forward


## IRC, freenode, #hurd, 2013-09-16

    that thread port leak is so strange
    the leaked port seems to be created when the new thread starts running
    so it looks like a port the kernel would implicitly create
    hm
    could it be a thread-specific reply port ?
    ah, yes, there is one of those
    how come mach/mig-reply.c in glibc isn't thread-safe ?
    it is overridden by sysdeps/mach/hurd/mig-reply.c I guess
    which uses a threadvar for the mig reply port
    oh
    talking of which, there is also last_value in sysdeps/mach/strerror_l.c
    strerror_thread_freeres is supposed to get called, but who knows
    it does look to be that port
    iirc that's the issue which prevents us from making threads exit on idleness?
    one of them
    ok
    maybe the only one, yes
    i see memory leaks but they could be related/normal
    (i.e. not actual leaks)
    on the other hand, i also can't boot a hurd with my patch
    but i consider removing such leaks a priority
    does anyone know the semantic difference between __mig_put_reply_port and __mig_dealloc_reply_port ?
    i guess __mig_dealloc_reply_port is actually a destruction operation, right ?
    AIUI, dealloc is used when one wants the port not to be reused at all because it has been used as a reference for something, and can still be currently in use
    while put_reply would be when we're really done with it, and won't use it again, and can thus be used as such
    or at least something like that
    heh
    __mig_dealloc_reply_port calls __mach_port_mod_refs, which is an RPC, and creates a new reply port when destroying the current one
    bah
    that's fine, it's a deref of the old port, which is not in the reply_port variable any more
    it's fine, but still a leak
    well, dealloc does not completely dealloc, yes
    that's not really the problem here
    i've introduced a case that wasn't considered at the time, namely that a thread can destroy itself
    we probably need another function to be called from the thread exit
    i'll simply try with mach_port_destroy
    mach_port_destroy seems to be an RPC too ...
    grmbl
    isn't there a trap version somehow ?
    not in libc
    erf
    at least i know what's wrong now :)
    there still is a small memory leak i have to investigate
    but outside the stack
    the stack, the thread name and the thread are correctly destroyed
    slabinfo confirms only one port leak and nothing else is leaked
    ok so the port leak was indeed the thread-specific reply port, taken care of
    there are memory leaks too


## IRC, freenode, #hurd, 2013-09-17

    teythoon: on my side, i'm getting to know our threading implementation better
    closing in on clean thread destruction
    x15 ipc will hide reply ports ;p
    memory leaks solved \o/
    now, have to fix memory release when joining
    proper reference counting on detach/join/exit, let's see how it goes ..
    seems to work fine


## IRC, freenode, #hurd, 2013-09-18

    ok i'll soon have gnumach and libc packages including proper thread destruction :>
    braunr: why did you have to touch gnumach?
    to add a call allowing threads to release ports and memory
    i.e. their last self reference, their reply port and their stack
    let me publish my current patches
    braunr: thread_commit_suicide ?
    hehe
    initially thread_terminate_self but
    it can be used by other threads too
    so i named it thread_terminate_release
    http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch
    http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch
    the pthread patch needs to be polished because it changes the semantics of pthread_thread_halt
    but other than that, it should be complete
    pthread_thread_halt_reallyhalt
    ok let's try these libc packages
    old static ext2fs for the root, but other than that, it boots
    let's try iceweasel
    (i'll need to build a hurd package against this new libc, removing the libports_stability patch which prevents thread destruction in servers on the way)
    prevents thread destruction o_O
    yes
    in libports only ;p
    oh, *only* in libports, I assumed for a moment that it affected almost every component of the Hurd... *phew* ... :)
    that's why, after a burst of messages, say because of aptitude (select), you may see a few hundred threads still hanging around
    also why unused servers remain running even after several minutes, where the normal timeout is 2mins
    I wondered about that, some servers (symlink comes to mind) seem to go away if unused (or that's how I read the code)
    symlinks are usually not servers, since most of them actually exist in file systems, and are implemented through an optimization
    yes I know that
    trans/symlink.c reads:
        /* The timeout here is 10 minutes */
        err = mach_msg_server_timeout (fsys_server, 0, control, MACH_RCV_TIMEOUT, 1000 * 60 * 10);
        if (err == MACH_RCV_TIMED_OUT)
          exit (0);
    ok
    hm, /hurd/symlink doesn't feel at all like a symlink... but works like one
    well, starting iceweasel makes X on my host freeze oO
    bbl
    /hurd/symlink translators do go away after being unused for 10 minutes...
    this is funny if they are set up by hand instead of being started from a passive translator record
    magically vanishing symlinks ;)


## IRC, freenode, #hurd, 2013-09-19

    hum, i can't rebuild a hurd package :(
    braunr: with your thread destruction patches in libc?
    yes but it's unrelated
    In file included from ../../libdiskfs/boot-start.c:38:0:
    ./fsys_reply_U.h:173:15: error: conflicting types for ‘fsys_get_children’
    i didn't see a new libc debian release
    hm, David reported that as well
    id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com
    uh oh
    it seems I didn't add a _reply suffix to the reply routines :/
    there's quite a bit of fallout from my patches, I kinda feel bad :(
    teythoon: what i'm wondering is what youpi did too, since he got hurd binary packages
    braunr: well neither he nor I noticed that b/c for us the declarations were just missing from libc
    you mean ?
    or hum gnumach-common ?
    not sure actually
    no it's not a gnumach thing
    hurd-dev then
    the build system should have caught these, or mig...
    also, i see you changed fsys_reply.defs, but nothing about fsys_request.defs
    I have no fsys_requests.defs
    looks like there was no fsys_request.defs in the first place ... *sigh*
    do you know an application that often creates and destroys threads ?
    no, sorry
    maybe some test suite
    ah right
    sysbench maybe
    also, i've been hit by a lot more network deadlocks than usual lately
    fixing netdde has gained some priority in my todo list


## IRC, freenode, #hurd, 2013-09-20

    oh, git is multithreaded
    great
    so i've actually tested my libpthread patch quite a lot


## IRC, freenode, #hurd, 2013-09-25

    on a side note, i was able to build gnumach/libc/hurd packages with thread destruction
    nice :)
    they boot and work mostly fine, although they add their own issues
    e.g. the comm field of the root ext2fs is empty
    ps crashes when trying to display threads
    but thread destruction actually works, i.e.
    servers (those that are configured that way at least) go away after some time, and even heavily used servers such as ext2fs dynamically scale over time :)


## IRC, freenode, #hurd, 2013-10-10

    concerning threads, i think i figured out the last bugs i had with thread destruction
    it should be well on its way to be merged by the end of the year


## IRC, freenode, #hurd, 2013-10-11

    braunr: is your thread destruction patch ready for testing?
    gg0: there are packages at my repository, yes
    but i still have hurd fixes to do before i polish it
    in particular, posix says returning from main() stops the entire process and all other threads
    i didn't check that during the switch to pthreads, and ext2fs (and maybe others) actually return from main but expect other threads to live on
    this creates problems when the main thread is actually destroyed, but not the process
    braunr: tmpfs does something like that, but calls pthread_exit at the end of main
    same effect
    this was fine with cthreads, but must be changed with pthreads
    and libpthread must be fixed to enforce it
    (or libc)
    diskfs_startup_diskfs should probably be changed to reuse the main thread instead of returning


## IRC, freenode, #hurd, 2013-10-19

    I know what threads are, but what is 'thread destruction'?
    the hurd currently never destroys individual threads
    they're destroyed when tasks are destroyed
    if the number of threads in a task peaks at a high number, say thousands of them, they'll remain until the task is terminated
    such tasks are usually file systems, normally never restarted (and in the case of the root file system, not restartable)
    this results in a form of leak
    another effect of this leak is that servers which should go away because of inactivity still remain
    since thread destruction doesn't actually work, the debian package uses a patch to prevent worker threads from timing out
    and to finish with, since thread destruction actually doesn't work, normal (unpatched) applications that destroy threads are certainly failing badly
    i just need to polish a few things, wait for youpi to finish his work on TLS to resolve conflicts, and that will be all


## IRC, freenode, #hurd, 2013-10-30

    FYI, the packages on my repository enable actual thread destruction, and i've altered the libports_stability.patch
    it now only sets the global timeout to 0
    we actually can't let translators "die" on global timeout because of a race issue
    tested for about two weeks now and no major problem sighted
    top reports processes running for 100% of their time when terminating threads, but i expect it's simply mach/proc aggregating their run time to the task
    100% of cpu time


## IRC, freenode, #hurd, 2013-11-08

    teythoon: darnassus is currently running a modified glibc with thread destruction, yes
    braunr: did that require any fixups in Hurd that I'd have missed ?
    no
    well b/c the resulting hurd package would not boot
    actually yes
    one
    i'll push the patch somewhere
    iirc the mach-defpager spewed some error and /hurd/init failed to bootstrap the system
    teythoon: http://darnassus.sceen.net/~rbraun/0001-Prevent-diskfs-translators-from-destroying-main-thre.patch
    make sure you have the proper gnumach packages too :p
    well, that could very well account for my trouble ;)
    uh
    well gnumach implements thread destruction, glibc uses it, hurd makes sure it doesn't exit from main


## IRC, freenode, #hurd, 2013-11-12

    ok so, calling pthread_exit() from main isn't the same as returning from main()
    unlike what some man pages seem to say
    so losing task info when destroying the main thread is actually a proc bug
    ugh ^^
    or a glibc one
    the proc server, your favorite Hurd component... :)
    hm :/
    looks like command line arguments are stored on the stack of the main thread
    and proc merely receives the addresses of those in the target task
    why not just keep the main thread around?
    it represents a minor resource leak, true
    yes that's the hack i suggested
    but it is relatively small
    well no
    my hack was about diskfs translators
    it should be generalized in libpthread
    seems reasonable
    let's do it >)


## IRC, freenode, #hurd, 2013-11-13

    braunr: there is a thread destruction issue in the experimental ocaml build, worth looking at, probably
    what do you mean ?
    ... testing 'testfork.ml': ocamlcocamlrun: ../libpthread/sysdeps/mach/pt-thread-halt.c:51: __pthread_thread_halt: Unexpected error: (ipc/send) invalid destination port.
    during the experimental ocaml build
    well yes
    thread recycling is buggy
    i had the choice to fix it, or implement true destruction
    i'm tweaking my patch so it leaves the main thread stack untouched on destruction
    and it should be ready for review at least


## IRC, OFTC, #debian-hurd, 2013-11-13

    ironforge out of memory during ruby1.9.1 rebuild.
    during test which creates 10000 threads
    ironforge out of memory during ruby1.9.1 rebuild, test which creates 10000 threads
    i guess ironforge kernel has been rebuilt against -95, correct?
    err, what kernel?
    23:37 < youpi> hurd needs a rebuild to be able to work with the newer eglibc
    i mean hurd
    yes, libc0.3 breaks the old packages anyway
    wrt ENOMEM, was it expected?
    wrt disk problems, aren't they on alioth only?
    well 10,000 threads is a lot, especially on 32bit machine with 2M default stack size
    that makes 2GiB stacks
    can't fit in a 2/2 split model, which gnumach uses
    well, though active thread should die right away, just after set x to false, if i read it correctly
    perhaps the stacks are not correctly reused
    that's probably worth digging in libpthread by putting printfs, etc.
    it seems stacks are never reused indeed, damn
    I just wrote a small test that creates threads which just print their stack address
    that takes just a few minutes to do
    i see. about reuse i guess you mean base address is kind of always incremented
    * gg0 likes being wrong
    that's it, yes
    gg0: take care, by keeping being wrong all the time, sometimes you get right ;)
    and you are definitely right here :)
    Mmm, but the stack is really deallocated
    and the numbers wrap around
    I wonder how that is :)
    ok, creating 20 000 threads does work
    perhaps ruby does odd things which makes it not work


### IRC, OFTC, #debian-hurd, 2013-11-14

    UID PID PPID TH MSGI MSGO SZ RSS SC STAT TIME COMMAND
    1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu 0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb
    720 threads, stuck
    2G SZ is very big :)
    00:42 < youpi> perhaps ruby does odd things which makes it not work
    is that enough to file a ruby bug?
    as ruby suggests itself btw
    no, they will probably not be able to investigate
    but you can already check out how they create threads and try to reproduce the same with a small C program
    ehm on ruby2.0 with *context _enabled_ i can not reproduce it

See [[/open_issues/glibc]] for `*context` functions.


## IRC, freenode, #hurd, 2013-11-14

    nice, i got glibc packages with thread destruction
    building hurd packages against it now
    everything seems fine
    hurd packages ready, let's see
    ruby1.9.1 FTBFS due to a couple of tests
    https://buildd.debian.org/status/fetch.php?pkg=ruby1.9.1&arch=hurd-i386&ver=1.9.3.448-1&stamp=1384265526
    second one creates 10000 threads and machine got ENOMEM
    bootstraptest.tmp.rb: [BUG] [BUG] pthread_cond_init: Cannot allocate memory (ENOMEM)
    ew
    few hours ago trying to reproduce it:
    01:20 < gg0> UID PID PPID TH MSGI MSGO SZ RSS SC STAT TIME COMMAND
    01:20 < gg0> 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu 0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb
    yes
    that's expected
    our stacks are 2M
    10k threads means right over 2G of stacks
    userspace is restricted to 2G
    but if i read correctly test in question, thread should just set x to false then die
    so ?
    and ENOMEM popped up when the thread count was at 720
    hum
    10k threads would actually be 20G
    1k threads is 2G
    720 is about 1.5G
    the rest is probably the ruby runtime
    youpi tried to create 10000 threads, no problem. he guessed something wrong on ruby side
    indeed on ruby2.0 such test succeeds
    you can't create 10k threads unless you change the stack size
    hurd servers use a stack size of 64k by default
    which allows them to go up to 30k iirc
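

## Sketch: stack size versus thread count

The arithmetic in the last exchange can be checked with a few lines of C. This is only a rough sketch using portable POSIX calls, not code from libpthread or from the ruby test suite. The helper names `max_threads`, `record_stack_address` and `spawn_with_stack` are made up for illustration, and the bound ignores everything else mapped in the task (heap, libraries, the runtime), so real limits are lower.

```c
#include <limits.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the number of thread stacks of STACK_SIZE bytes that fit
   in ADDRESS_SPACE bytes, ignoring everything else mapped in the task.  */
static unsigned long long
max_threads (unsigned long long address_space, unsigned long long stack_size)
{
  return stack_size ? address_space / stack_size : 0;
}

/* Thread body: record an address located on this thread's own stack.  */
static void *
record_stack_address (void *arg)
{
  int marker = 0;
  *(uintptr_t *) arg = (uintptr_t) &marker;
  return NULL;
}

/* Create one joinable thread with an explicit stack size, wait for it and
   store an address from its stack in *ADDR.  Returns 0 or an error code.  */
static int
spawn_with_stack (size_t stack_size, uintptr_t *addr)
{
  pthread_attr_t attr;
  pthread_t thread;
  int err;

#ifdef PTHREAD_STACK_MIN
  /* Some systems refuse stacks smaller than PTHREAD_STACK_MIN.  */
  if (stack_size < (size_t) PTHREAD_STACK_MIN)
    stack_size = (size_t) PTHREAD_STACK_MIN;
#endif

  err = pthread_attr_init (&attr);
  if (err)
    return err;
  err = pthread_attr_setstacksize (&attr, stack_size);
  if (!err)
    err = pthread_create (&thread, &attr, record_stack_address, addr);
  pthread_attr_destroy (&attr);
  if (err)
    return err;
  return pthread_join (thread, NULL);
}
```

With a 2 GiB user address space, `max_threads (2ULL << 30, 2ULL << 20)` gives 1024 for 2 MiB stacks and `max_threads (2ULL << 30, 64ULL << 10)` gives 32768 for 64 KiB stacks, which matches the "1k threads is 2G" and "up to 30k" figures quoted above. Calling `spawn_with_stack` in a loop and printing the recorded addresses is one way to watch whether stack addresses get reused after threads exit.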