[[!meta copyright="Copyright © 2012, 2013, 2014 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_libpthread]]

`t/fix_have_kernel_resources`

Address problem mentioned in [[/libpthread]], *Threads' Death*.

# IRC, freenode, #hurd, 2012-08-30

tschwinge: this issue needs more cooperation with the kernel tschwinge: i.e. the ability to tell the kernel where the stack is, so it's unmapped when the thread dies instead of requiring another thread to perform this deallocation

## IRC, freenode, #hurd, 2013-05-09

braunr: Speaking of which, didn't you say you had another "easy" task? bddebian: make a system call that both terminates a thread and releases memory (the memory released being the thread stack) this way, a thread can completely terminate itself without the assistance of a managing thread or deferring work braunr: That's "easy" ? :) bddebian: since it's just a thread_terminate+vm_deallocate, it is something like thread_terminate_self But a syscall not an RPC right? in hurd terminology, we don't make the distinction the only real syscalls are mach_msg (obviously) and some to get well-known port rights e.g. mach_task_self everything else should be an RPC but could be a system call for performance since mach was designed to support clusters, it was necessary that anything not strictly machine-local was an RPC and it also helps emulation a lot so keep doing RPCs :p

## IRC, freenode, #hurd, 2013-05-10

i'm not sure it should only apply to self though youpi: can we get a quick opinion on this please ?
i've suggested bddebian to work on a new RPC that both terminates a thread and releases its stack to help fix libpthread and initially, i thought of it as operating only on the calling thread do you see any reason to make it work on any thread ? (e.g. a real thread_terminate + vm_deallocate) (or any reason not to) thread stack deallocation is always a burden indeed I'd tend to think it'd be useful, but perhaps ask the list

## IRC, freenode, #hurd, 2013-06-26

looks like there is a port right leak in libpthread grmbl, the port leak seems to come from mach_port_destroy being buggy :/ hum, apparently we're not the only ones to suffer from port leaks wrt mach_port_destroy ew, libpthread is leaking memory or ports? both sounds great ;) as it is, libpthread doesn't destroy threads it queues them so they're recycled later but there is confusion between the thread structure itself and its internal resources i.e. there is pthread_alloc which allocates a thread structure, and pthread_create which allocates everything else but on pthread_exit, nothing is destroyed when a thread structure is reused, its internal resources are replaced by new instances oh it's ok for joinable threads but most of our threads are detached pinotree: as expected, it's bigger than expected :p so i won't be able to write a quick fix the true way to fix this is to make it possible for threads to free their own resources let's do that :p ok, got the new thread termination function, i'll build eglibc package providing it, then experiment with libpthread braunr: iirc there's also a tschwinge patch in the debian eglibc about that ah libpthread_fix.diff i see thanks for the notice bddebian: http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch bddebian: this is what it looks like see, short and easy Aye but didn't youpi say not to bother with it?? he did ? i don't remember I thought that was the implication. Or maybe that was the one I already did!?
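The constraint behind this proposal can be shown in portable POSIX terms. This is only an illustration, not Hurd, Mach or libpthread code, and `run_on_private_stack` and `private_stack_worker` are made-up names: a thread running on a stack the program allocated itself cannot free that memory while it is still executing on it, so another thread has to release it after `pthread_join`. A single call that terminates the thread and deallocates its stack, as discussed above, would remove the need for that second thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Stack size for the example thread, well above PTHREAD_STACK_MIN. */
#define PRIVATE_STACK_SIZE (256 * 1024)

/* Runs on the caller-allocated stack.  It can do work, but it must not
   free its own stack: it is still executing on it. */
static void *private_stack_worker(void *arg)
{
    int *out = arg;
    *out = 42;
    return NULL;
}

/* Hypothetical helper: run one thread on a stack we allocated ourselves,
   then release that stack from the calling thread. */
int run_on_private_stack(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *stack = NULL;
    int result = 0;
    int rc;

    if (posix_memalign(&stack, 4096, PRIVATE_STACK_SIZE) != 0)
        return -1;

    rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setstack(&attr, stack, PRIVATE_STACK_SIZE);
    assert(rc == 0);

    rc = pthread_create(&tid, &attr, private_stack_worker, &result);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        free(stack);
        return -1;
    }

    /* The "managing thread" step: only once the worker has been joined
       is nothing running on the stack any more. */
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;
    free(stack);
    return result;
}
```

`run_on_private_stack()` returns the value the worker stored (42). Moving the `free` before `pthread_join` would pull the stack out from under a running thread.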
i'd be interested in reading that anyway, there still are problems in libpthread, and this call is one building block to fix some of them some important ones (big leaks)

## IRC, freenode, #hurd, 2013-06-29

damn, i fix leaks in libpthread, only to find out leaks somewhere else :( bddebian: ok, actually it was a bit more complicated than what i showed you because in addition to the stack, the call must also release the send right in the caller's ipc space (it can't be released before since there would be no means to reference the thread to destroy) or perhaps it should strictly be reserved to self termination hmm yes it would probably be simpler but it should be a decent compromise i'm close to having a libpthread that doesn't leak anything and that properly destroys threads and their resources

## IRC, freenode, #hurd, 2013-06-30

bddebian: ok, it was even more tricky, because the kernel would save the return value on the user stack (which is released by the call and then invalid) before checking for asynchronous software traps (ASTs, a kind of software interrupt in mach), and terminating the calling thread is done by a deferred AST ... :) hmm, making threads able to terminate themselves makes rpctrace a bit useless :/ well, more restricted ok so, tough question : i have a small test program that creates a thread, and inspects its state before any thread dies i can see msg_report_wait requests when using ps (one per thread) one of these requests creates a new receive right, apparently for the second thread in the test program each time i use ps, i can see the sequence numbers of two receive rights increase i guess these rights are related to proc and signal handling per thread but i can't find what creates them does anyone know ? tschwing_: ^ :) again, too many things wrong elsewhere to cleanly destroy threads .. something is deeply wrong with controlling terminals ..
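A rough, single-threaded model of the recycling problem described on 2013-06-26, under stated assumptions: `struct thread_slot`, `slot_alloc` and `slot_exit` are hypothetical stand-ins, not libpthread's `pthread_alloc` or `pthread_create`. Descriptors are still recycled through a free list, but the per-thread resources (standing in for the stack, ports and so on) are released when the thread exits instead of being replaced on reuse, which is the step detached threads never got.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical thread descriptor, recycled through a free list. */
struct thread_slot {
    void *resources;            /* stands in for stack, ports...; NULL when released */
    struct thread_slot *next;   /* free-list link */
};

static struct thread_slot *free_list;
static int slots_allocated;     /* descriptors ever obtained from malloc */
static int live_resources;      /* per-thread resources currently held */

/* Reuse a recycled descriptor if possible, then give it fresh resources. */
static struct thread_slot *slot_alloc(void)
{
    struct thread_slot *slot = free_list;

    if (slot != NULL) {
        free_list = slot->next;
    } else {
        slot = calloc(1, sizeof(*slot));
        if (slot == NULL)
            return NULL;
        slots_allocated++;
    }
    assert(slot->resources == NULL);    /* released at the previous exit */

    slot->resources = malloc(64);
    if (slot->resources == NULL) {
        slot->next = free_list;         /* hand the descriptor back */
        free_list = slot;
        return NULL;
    }
    live_resources++;
    return slot;
}

/* Thread exit: release the resources now, recycle only the descriptor. */
static void slot_exit(struct thread_slot *slot)
{
    free(slot->resources);
    slot->resources = NULL;
    live_resources--;
    slot->next = free_list;
    free_list = slot;
}

/* Create and exit n threads in turn; return resources still held. */
int churn(int n)
{
    for (int i = 0; i < n; i++) {
        struct thread_slot *slot = slot_alloc();
        if (slot == NULL)
            return -1;
        slot_exit(slot);
    }
    return live_resources;
}
```

In the leaky variant, `slot_exit` would only push the slot on the free list and `slot_alloc` would overwrite `slot->resources`, so every reuse would orphan the previous thread's resources.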
## IRC, freenode, #hurd, 2013-07-01

youpi: if you happen to notice what receive right is created for each thread (beyond the obvious port used for blocking and waking up), please let me know it's the only port leak i have with thread destruction and i think it's related to the proc server since i see the sequence number increase every time i use ps pinotree: my change doesn't fix all the pthread leaks but it's a lot better bddebian: i've spent almost the whole weekend trying to find the last port leak without success there is some weird bug related to the controlling tty that hits me every time i try to change something it's the same bug that prevents ttys from being correctly closed when using ssh or screen well maybe not the same, but it's close some stale receive right kept around for no apparent reason and i can't find its source

## IRC, freenode, #hurd, 2013-07-02

and btw, i don't think i can make my libpthread patch work i'll just aim at avoiding leaks, but destroying threads and their related resources depends on other changes i don't clearly see

## IRC, freenode, #hurd, 2013-07-03

grmbl, i don't want to give up thread destruction ..

## IRC, freenode, #hurd, 2013-07-15

btw, my work on thread destruction is currently stalled i don't have much free time right now

## IRC, freenode, #hurd, 2013-09-13

i think i know why my thread_terminate_deallocate patches leak one receive port :> but now i'm not sure of the proper solution every time a thread is created and destroyed, a receive right is leaked i guess it's simply the reply port .. grmbl i guess i have to make it a simpleroutine ...
hm too bad, it's not the reply port :( it's also leaking some memory it doesn't seem related to my changes though stacks, rights, and threads are correctly destroyed some obscure state is left behind i wonder how exception ports are dealt with vminfo seems to confirm memory is leaking in the heap humpf oh silly me i don't detach threads well, detach them ;) hm worse :p now i get additional dead names but it's a step forward

## IRC, freenode, #hurd, 2013-09-16

that thread port leak is so strange the leaked port seems to be created when the new thread starts running so it looks like a port the kernel would implicitly create hm could it be a thread-specific reply port ? ah, yes, there is one of those how come mach/mig-reply.c in glibc isn't thread-safe ? it is overridden by sysdeps/mach/hurd/mig-reply.c I guess which uses a threadvar for the mig reply port oh talking of which, there is also last_value in sysdeps/mach/strerror_l.c strerror_thread_freeres is supposed to get called, but who knows it does look to be that port iirc that's the issue which prevents us from letting threads exit on idleness? one of them ok maybe the only one, yes i see memory leaks but they could be related/normal (i.e. not actual leaks) on the other hand, i also can't boot a hurd with my patch but i consider removing such leaks a priority does anyone know the semantic difference between __mig_put_reply_port and __mig_dealloc_reply_port ? i guess __mig_dealloc_reply_port is actually a destruction operation, right ?
AIUI, dealloc is used when one wants the port not to be reused at all because it has been used as a reference for something, and can still be currently in use while put_reply would be when we're really done with it, and won't use it again, and can thus be used as such or at least something like that heh __mig_dealloc_reply_port calls __mach_port_mod_refs, which is an RPC, and creates a new reply port when destroying the current one bah that's fine, it's a deref of the old port, which is not in the reply_port variable any more it's fine, but still a leak well, dealloc does not completely dealloc, yes that's not really the problem here i've introduced a case that wasn't considered at the time, namely that a thread can destroy itself we probably need another function to be called from the thread exit i'll simply try with mach_port_destroy mach_port_destroy seems to be an RPC too ... grmbl isn't there a trap version somehow ? not in libc erf at least i know what's wrong now :) there still is a small memory leak i have to investigate but outside the stack the stack, the thread name and the thread are correctly destroyed slabinfo confirms only one port leak and nothing else is leaked ok so the port leak was indeed the thread-specific reply port, taken care of there are memory leaks too

## IRC, freenode, #hurd, 2013-09-17

teythoon: on my side, i'm getting to know our threading implementation better closing in on clean thread destruction x15 ipc will hide reply ports ;p memory leaks solved \o/ now, have to fix memory release when joining proper reference counting on detach/join/exit, let's see how it goes .. seems to work fine

## IRC, freenode, #hurd, 2013-09-18

ok i'll soon have gnumach and libc packages including proper thread destruction :> braunr: why did you have to touch gnumach? to add a call allowing threads to release ports and memory i.e. their last self reference, their reply port and their stack let me publish my current patches braunr: thread_commit_suicide ?
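The port leak tracked down here comes from a lazily created per-thread object (the MIG reply port kept in a threadvar) that nothing releases when a thread destroys itself. As a rough POSIX analogy, not the glibc code: a `pthread_key_create` destructor releases such a per-thread resource automatically when its thread exits. `reply_key`, `get_reply_resource` and the `live_replies` counter are hypothetical names for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_key_t reply_key;
static pthread_once_t reply_once = PTHREAD_ONCE_INIT;
static atomic_int live_replies;         /* per-thread resources alive */

/* Destructor run automatically when a thread that set a value exits. */
static void reply_destroy(void *res)
{
    free(res);
    atomic_fetch_sub(&live_replies, 1);
}

static void reply_key_init(void)
{
    int rc = pthread_key_create(&reply_key, reply_destroy);
    assert(rc == 0);
    (void)rc;
}

/* Return this thread's resource, creating it lazily on first use. */
static void *get_reply_resource(void)
{
    pthread_once(&reply_once, reply_key_init);
    void *res = pthread_getspecific(reply_key);
    if (res == NULL) {
        res = malloc(32);
        if (res == NULL)
            return NULL;
        if (pthread_setspecific(reply_key, res) != 0) {
            free(res);
            return NULL;
        }
        atomic_fetch_add(&live_replies, 1);
    }
    return res;
}

static void *reply_user(void *arg)
{
    (void)arg;
    void *first = get_reply_resource();     /* created here */
    void *again = get_reply_resource();     /* same object reused */
    assert(first == again);
    (void)first;
    (void)again;
    return NULL;                            /* reply_destroy runs at exit */
}

/* Start and join n threads one at a time; return resources still alive. */
int run_reply_users(int n)
{
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, reply_user, NULL) != 0)
            return -1;
        pthread_join(tid, NULL);
    }
    return atomic_load(&live_replies);
}
```

Without the destructor argument to `pthread_key_create`, `run_reply_users(n)` would return `n`: one leaked object per thread that ever ran, which is the shape of the leak described above.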
hehe initially thread_terminate_self but it can be used by other threads too so i named it thread_terminate_release http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch the pthread patch needs to be polished because it changes the semantics of pthread_thread_halt but other than that, it should be complete pthread_thread_halt_reallyhalt ok let's try these libc packages old static ext2fs for the root, but other than that, it boots let's try iceweasel (i'll need to build a hurd package against this new libc, removing the libports_stability patch which prevents thread destruction in servers on the way) prevents thread destruction o_O yes in libports only ;p oh, *only* in libports, I assumed for a moment that it affected almost every component of the Hurd... *phew* ... :) that's why, after a burst of messages, say because of aptitude (select), you may see a few hundred threads still hanging around also why unused servers remain running even after several minutes, where the normal timeout is 2mins I wondered about that, some servers (symlink comes to mind) seem to go away if unused (or that's how I read the code) symlinks are usually not servers, since most of them actually exist in file systems, and are implemented through an optimization yes I know that trans/symlink.c reads:

    /* The timeout here is 10 minutes */
    err = mach_msg_server_timeout (fsys_server, 0, control,
                                   MACH_RCV_TIMEOUT, 1000 * 60 * 10);
    if (err == MACH_RCV_TIMED_OUT)
      exit (0);

ok hm, /hurd/symlink doesn't feel at all like a symlink... but works like one well, starting iceweasel makes X on my host freeze oO bbl /hurd/symlink translators do go away after being unused for 10 minutes...
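The `trans/symlink.c` snippet quoted above passes a receive timeout of `1000 * 60 * 10` milliseconds (600 000 ms, i.e. 10 minutes) to `mach_msg_server_timeout` and exits when it expires. The same "serve until idle, then go away" pattern, which is what lets idle worker threads and whole servers disappear once thread destruction works, can be sketched with POSIX condition variables. This is an analogy, not libports code; `struct queue` and `serve_until_idle` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* A toy work queue: "pending" counts requests waiting to be served. */
struct queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    int pending;
};

/* Absolute CLOCK_REALTIME deadline timeout_ms from now, as expected by
   pthread_cond_timedwait on a default-initialised condition variable. */
static struct timespec deadline_in(long timeout_ms)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_REALTIME, &ts);
    assert(rc == 0);
    (void)rc;
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Serve requests until none arrives for timeout_ms, then return so the
   calling thread can exit.  Returns the number of requests served. */
static int serve_until_idle(struct queue *q, long timeout_ms)
{
    int served = 0;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        struct timespec deadline = deadline_in(timeout_ms);
        int rc = 0;
        /* Loop to absorb spurious wakeups; stop on timeout or error. */
        while (q->pending == 0 && rc == 0)
            rc = pthread_cond_timedwait(&q->nonempty, &q->lock, &deadline);
        if (q->pending == 0)
            break;                  /* idle too long: give the thread up */
        q->pending--;               /* "handle" one request */
        served++;
    }
    pthread_mutex_unlock(&q->lock);
    return served;
}
```

A producer would increment `pending` and call `pthread_cond_signal(&q->nonempty)` under the lock; the worker drains the queue and returns once it has been idle for a full timeout period.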
this is funny if they are set up by hand instead of being started from a passive translator record magically vanishing symlinks ;)

## IRC, freenode, #hurd, 2013-09-19

hum, i can't rebuild a hurd package :( braunr: with your thread destruction patches in libc? yes but it's unrelated

    In file included from ../../libdiskfs/boot-start.c:38:0:
    ./fsys_reply_U.h:173:15: error: conflicting types for ‘fsys_get_children’

i didn't see a new libc debian release hm, David reported that as well id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com uh oh it seems I didn't add a _reply suffix to the reply routines :/ there's quite a bit of fallout from my patches, I kinda feel bad :( teythoon: what i'm wondering is what youpi did too, since he got hurd binary packages braunr: well neither he nor I noticed that b/c for us the declarations were just missing from libc you mean ? or hum gnumach-common ? not sure actually no it's not a gnumach thing hurd-dev then the build system should have caught these, or mig... also, i see you changed fsys_reply.defs, but nothing about fsys_request.defs I have no fsys_requests.defs looks like there was no fsys_request.defs in the first place ... *sigh* do you know an application that often creates and destroys threads ? no, sorry maybe some test suite ah right sysbench maybe also, i've been hit by a lot more network deadlocks than usual lately fixing netdde has gained some priority in my todo list

## IRC, freenode, #hurd, 2013-09-20

oh, git is multithreaded great so i've actually tested my libpthread patch quite a lot

## IRC, freenode, #hurd, 2013-09-25

on a side note, i was able to build gnumach/libc/hurd packages with thread destruction nice :) they boot and work mostly fine, although they add their own issues e.g. the comm field of the root ext2fs is empty ps crashes when trying to display threads but thread destruction actually works, i.e.
servers (those that are configured that way at least) go away after some time, and even heavily used servers such as ext2fs dynamically scale over time :)

## IRC, freenode, #hurd, 2013-10-10

concerning threads, i think i figured out the last bugs i had with thread destruction it should be well on its way to being merged by the end of the year

## IRC, freenode, #hurd, 2013-10-11

braunr: is your thread destruction patch ready for testing? gg0: there are packages at my repository, yes but i still have hurd fixes to do before i polish it in particular, posix says returning from main() stops the entire process and all other threads i didn't check that during the switch to pthreads, and ext2fs (and maybe others) actually return from main but expect other threads to live on this creates problems when the main thread is actually destroyed, but not the process braunr: tmpfs does something like that, but calls pthread_exit at the end of main same effect this was fine with cthreads, but must be changed with pthreads and libpthread must be fixed to enforce it (or libc) diskfs_startup_diskfs should probably be changed to reuse the main thread instead of returning

## IRC, freenode, #hurd, 2013-10-19

I know what threads are, but what is 'thread destruction'?
the hurd currently never destroys individual threads they're destroyed when tasks are destroyed if the number of threads in a task peaks at a high number, say thousands of them, they'll remain until the task is terminated such tasks are usually file systems, normally never restarted (and in the case of the root file system, not restartable) this results in a form of leak another effect of this leak is that servers which should go away because of inactivity still remain since thread destruction doesn't actually work, the debian package uses a patch to prevent worker threads from timing out and to finish with, since thread destruction actually doesn't work, normal (unpatched) applications that destroy threads are certainly failing badly i just need to polish a few things, wait for youpi to finish his work on TLS to resolve conflicts, and that will be all

## IRC, freenode, #hurd, 2013-10-30

FYI, the packages on my repository enable actual thread destruction, and i've altered the libports_stability.patch it now only sets the global timeout to 0 we actually can't let translators "die" on global timeout because of a race issue tested for about two weeks now and no major problem sighted top reports processes running for 100% of their time when terminating threads, but i expect it's simply mach/proc aggregating their run time to the task 100% of cpu time

## IRC, freenode, #hurd, 2013-11-08

teythoon: darnassus is currently running a modified glibc with thread destruction, yes braunr: did that require any fixups in Hurd that I'd have missed ?
no well b/c the resulting hurd package would not boot actually yes one i'll push the patch somewhere iirc the mach-defpager spewed some error and /hurd/init failed to bootstrap the system teythoon: http://darnassus.sceen.net/~rbraun/0001-Prevent-diskfs-translators-from-destroying-main-thre.patch make sure you have the proper gnumach packages too :p well, that could very well account for my trouble ;) uh well gnumach implements thread destruction, glibc uses it, hurd makes sure it doesn't exit from main

## IRC, freenode, #hurd, 2013-11-12

ok so, calling pthread_exit() from main isn't the same as returning from main() unlike what some man pages seem to say so losing task info when destroying the main thread is actually a proc bug ugh ^^ or a glibc one the proc server, your favorite Hurd component... :) hm :/ looks like command line arguments are stored on the stack of the main thread and proc merely receives the addresses of those in the target task why not just keep the main thread around? it represents a minor resource leak, true yes that's the hack i suggested but it is relatively small well no my hack was about diskfs translators it should be generalized in libpthread seems reasonable let's do it >)

## IRC, freenode, #hurd, 2013-11-13

braunr: there is a thread destruction issue in the experimental ocaml build, worth looking at, probably what do you mean ?

    ... testing 'testfork.ml': ocamlcocamlrun: ../libpthread/sysdeps/mach/pt-thread-halt.c:51: __pthread_thread_halt: Unexpected error: (ipc/send) invalid destination port.

during the experimental ocaml build well yes thread recycling is buggy i had the choice to fix it, or implement true destruction i'm tweaking my patch so it leaves the main thread stack untouched on destruction and it should be ready for review at least

## IRC, OFTC, #debian-hurd, 2013-11-13

ironforge out of memory during ruby1.9.1 rebuild.
during test which creates 10000 threads ironforge out of memory during ruby1.9.1 rebuild, test which creates 10000 threads i guess ironforge kernel has been rebuilt against -95, correct? err, what kernel? 23:37 < youpi> hurd needs a rebuild to be able to work with the newer eglibc i mean hurd yes, libc0.3 breaks the old packages anyway wrt ENOMEM, was it expected? wrt disk problems, aren't they on alioth only? well 10,000 threads is a lot, especially on 32bit machine with 2M default stack size that makes 2GiB stacks can't fit in a 2/2 split model, which gnumach uses well, though active thread should die right away, just after setting x to false, if i read it correctly perhaps the stacks are not correctly reused that's probably worth digging in libpthread by putting printfs, etc. it seems stacks are never reused indeed, damn I just wrote a small test that creates threads which just print their stack address that takes just a few minutes to do i see. about reusage i guess you mean base address is kind of always incremented * gg0 likes being wrong that's it, yes gg0: take care, by keeping being wrong all the time, sometimes you get right ;) and you are definitely right here :) Mmm, but the stack is really deallocated and the numbers wrap around I wonder how that is :) ok, creating 20 000 threads does work perhaps ruby does odd things which makes it not work

### IRC, OFTC, #debian-hurd, 2013-11-14

    UID  PID   PPID  TH  MSGI MSGO SZ    RSS   SC STAT TIME    COMMAND
    1012 16446 15473 720 987  509  1.89G 23.6M 1  Hu   0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb

720 threads, stuck 2G SZ is very big :) 00:42 < youpi> perhaps ruby does odd things which makes it not work is that enough to file a ruby bug?
as ruby suggests itself btw no, they will probably not be able to investigate but you can already check out how they create threads and try to reproduce the same with a small C program ehm on ruby2.0 with *context _enabled_ i cannot reproduce it

See [[/open_issues/glibc]] for `*context` functions.

## IRC, freenode, #hurd, 2013-11-14

nice, i got glibc packages with thread destruction building hurd packages against it now everything seems fine hurd packages ready, let's see ruby1.9.1 FTBFS due to a couple of tests https://buildd.debian.org/status/fetch.php?pkg=ruby1.9.1&arch=hurd-i386&ver=1.9.3.448-1&stamp=1384265526 second one creates 10000 threads and machine got ENOMEM bootstraptest.tmp.rb: [BUG] [BUG] pthread_cond_init: Cannot allocate memory (ENOMEM) ew few hours ago trying to reproduce it:

    01:20 < gg0> UID  PID   PPID  TH  MSGI MSGO SZ    RSS   SC STAT TIME    COMMAND
    01:20 < gg0> 1012 16446 15473 720 987  509  1.89G 23.6M 1  Hu   0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb

yes that's expected our stacks are 2M 10k threads means right over 2G of stacks userspace is restricted to 2G but if i read correctly test in question, thread should just set x to false then die so ? and ENOMEM popped up when the thread count was at 720 hum 10k threads would actually be 20G 1k threads is 2G 720 is about 1.5G the rest is probably the ruby runtime youpi tried to create 10000 threads, no problem. he guessed something wrong on ruby side indeed on ruby2.0 such test succeeds you can't create 10k threads unless you change the stack size hurd servers use a stack size of 64k by default which allows them to go up to 30k iirc but normal applications use the default 2M i guess you mean 10000 threads active at the same time.
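The figures traded here follow from simple division. With the 2 GiB user address space of the 2/2 split mentioned above, 2 MiB stacks allow at most 1024 threads and 64 KiB stacks at most 32768, before anything else is mapped; 10,000 threads at 2 MiB would need about 19.5 GiB. The sketch below does that arithmetic and shows how a POSIX program can request smaller stacks with `pthread_attr_setstacksize`. `max_threads` and `spawn_with_stack` are illustrative names, and the 64 KiB figure is the Hurd server default quoted in the discussion, not something the code detects.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

#define KiB (1024ULL)
#define MiB (1024ULL * KiB)
#define GiB (1024ULL * MiB)
#define MAX_SPAWN 256

/* Upper bound on simultaneous threads if stacks alone had to fit in
   addr_space bytes (ignores heap, libraries, guard pages, ...). */
static unsigned long long max_threads(unsigned long long addr_space,
                                      unsigned long long stack_size)
{
    return addr_space / stack_size;
}

static void *idle_thread(void *arg)
{
    return arg;                         /* exit immediately */
}

/* Create up to n threads with an explicit stack size, join them all,
   and return how many were actually created. */
static int spawn_with_stack(int n, size_t stack_size)
{
    pthread_t tids[MAX_SPAWN];
    pthread_attr_t attr;
    int created = 0;

    assert(n >= 0 && n <= MAX_SPAWN);
#ifdef PTHREAD_STACK_MIN
    if (stack_size < (size_t)PTHREAD_STACK_MIN)
        stack_size = PTHREAD_STACK_MIN; /* smallest size the system accepts */
#endif
    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_setstacksize(&attr, stack_size) == 0) {
        while (created < n &&
               pthread_create(&tids[created], &attr, idle_thread, NULL) == 0)
            created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    pthread_attr_destroy(&attr);
    return created;
}
```

The division is only an upper bound: as noted above, the ruby test hit ENOMEM near 720 threads (about 1.4 GiB of 2 MiB stacks) because the runtime itself also needs address space.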
test in question should make them die after simply setting x to false, i guess youpi's test did so as well no it's about stacks hm yes at the same time but thread recycling is known to be buggy which is what i'm currently fixing btw what's the bug? neal: there are several subtle issues for example, joining a thread that is also calling pthread_exit can fail badly hmm good that you are on it then :) or detaching i don't remember the details but i remember such problems apparently, keeping the stack of the main thread isn't enough :( for now, i'll keep the entire thread

## IRC, freenode, #hurd, 2013-11-15

i wasn't doing anything, just some single test runs. but yes, also that one which creates hundreds of threads it would like to create 10000 but goes out of memory after ~720 btw same tests succeed on ruby2.0, so they should be fixed by backporting some changes actually it looks more like a deadlock .. deadlock that says ENOMEM? ? ENOMEM is returned because the test task has no more virtual memory this doesn't mean the rest of the system should fail ok i thought you were talking about such test no it's something else a deadlock in a critical server the root file system maybe braunr: htop and ps hang.
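Joining a thread that is concurrently calling `pthread_exit`, or detaching it, is a classic lifetime race. One common fix, offered here only as an assumption about what "proper reference counting on detach/join/exit" (2013-09-17) can look like, is to give each thread descriptor two references at creation: one owned by the thread itself, one by whoever joins or detaches it. Whichever side drops the last reference frees the descriptor. `struct tdesc` and the `tdesc_*` functions are hypothetical, not libpthread's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical thread descriptor with two references at creation. */
struct tdesc {
    atomic_int refs;
    int exit_value;
};

static atomic_int freed_count;          /* descriptors released so far */

static struct tdesc *tdesc_create(void)
{
    struct tdesc *t = malloc(sizeof(*t));
    if (t != NULL) {
        atomic_init(&t->refs, 2);       /* thread + joiner/detacher */
        t->exit_value = 0;
    }
    return t;
}

/* Drop one reference; whoever drops the last one frees the descriptor. */
static bool tdesc_unref(struct tdesc *t)
{
    int old = atomic_fetch_sub(&t->refs, 1);
    assert(old > 0);                    /* catches a double release */
    if (old == 1) {
        free(t);
        atomic_fetch_add(&freed_count, 1);
        return true;
    }
    return false;
}

/* pthread_exit side: publish the value, then drop the thread's reference. */
static bool tdesc_exit(struct tdesc *t, int value)
{
    t->exit_value = value;
    return tdesc_unref(t);
}

/* pthread_detach side: nobody will join, so give up that reference. */
static bool tdesc_detach(struct tdesc *t)
{
    return tdesc_unref(t);
}

/* pthread_join side, called once the thread is known to have exited:
   read the value first, then drop the joiner's reference. */
static int tdesc_join(struct tdesc *t)
{
    int value = t->exit_value;
    tdesc_unref(t);
    return value;
}
```

With this scheme the order of exit and detach does not matter: the descriptor is freed exactly once, by the second party to let go.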
just run the test once again now you should still be able to log in htop/ps hanging means one process is unable to reply to queries sent to the message port/thread procfs does that to report on what a process is waiting on it usually means there is a bug around signals, since the message thread is also in charge of delivering signals use ps -eM and kill -KILL hum root 954 S dumping cores is known not to work most of the time exodar shouldn't be configured like that so yes, the crash server is hanging gg0: i've set it to crash --kill and killed the hanging crash instances blocking top/ps nice my thread destruction patch and tls are indeed conflicting a bit i suspect the tcb is used after being freed i think i'll simply recycle the tcb, along with the pthread structs ok i think it's fine now there was also a small bug in the tls code, keeping a reference on the thread port mach reference counting is so counterintuitive :/ well, error-prone argh, more bugs in libc :( :/ but don't worry, there is always one more bug ;) this one might explain crashes that are long to trigger _hurd_self_sigstate() is implemented like this : `_hurd_thread_sigstate (__mach_thread_self ());` it leaks a reference on the current thread each time it's called >,< but glibc maintains such references, so if the maximum value is reached, and references are dropped, the value can reach 0 ouch at which point any call on a thread will result in an invalid send right and probably an assertion well it's a good thing then that you found it :) i think it's always been there but it's more apparent since jknoenig's patch on signal dispositions the maximum number of user references in mach is 64k this right leak isn't easy tls is very tricky heh :) for the main thread, tls initialization happens after the thread creation, obviously but for other threads, it's initialized before starting them the leak was probably an oversight caused by that complexity teythoon: actually that leak i mentioned in _hurd_self_sigstate
has only been recently added in "Convert sigstate to TLS" so it's merely tls integration polishing youpi: i'm currently reviewing changes related to tls and i think there is a bug in _hurd_self_sigstate calls to mach_thread_self() should be paired with mach_port_deallocate to avoid urefs overflows and right leaks _hurd_critical_section_lock is probably affected too hm mhmm in glibc, hurd/hurd/signal.h, _hurd_critical_section_lock why is the sigstate unlocked after the call to _hurd_thread_sigstate _hurd_thread_sigstate doesn't seem to lock it .. unless __spin_lock_init does it yes, leak solved :)

## IRC, freenode, #hurd, 2013-11-16

argh, _hurd_critical_section_lock is called before the send right on the main thread is fetched in libpthread :/ is that bad ? the sigstate is supposed to be initialized after pthreads _hurd_critical_section_lock will create it if it sees there is none creating the sigstate is currently what makes the send right leak ok it's bad then it may be due to my patch _hurd_critical_section_lock is called during pthreads initialization before the sigstate for the main thread is created, but after the pthread init routine is called it does indeed look like the code wasn't written with threads being destroyed some day in mind :/ braunr: btw, if you ever feel like benchmarking, sysbench has a benchmark for threads contending for a lock yes i've used it before was it useful for this purpose ?
no :) :/ we already know libpthread isn't optimized and felt it when we switched from cthreads humpf simply calling malloc implies a call to _hurd_critical_section_lock on the other hand, unlike what some glibc comments say, this does work

## IRC, freenode, #hurd, 2013-11-17

looks like i've fixed all leak issues with thread destruction and tls :) let's see if ext2fs.static works fine too braunr: \o/ sorry about introducing the tls ones :) no worries, it was expected and tls was really needed :) i mean, i expected to have some problems when rebasing on tls :p braunr: this is good news, how is your rootfs translator holding up? building hurd packages right now for now, only test applications and a few really multithreaded ones (e.g. iceweasel) have been tested well, the system boots :) awesome :) stressing the file system with git while watching youtube videos with gnash doesn't make the system crash you can actually watch yt videos on your Hurd box ? yes for a while now o_O can't you ? I never even dared to try hehe teythoon: looks stable enough to install on darnassus

## IRC, freenode, #hurd, 2013-11-18

braunr: wrt your thread destruction patchset, I thought you also had to fix the proc server ? teythoon: no the problem was in glibc i may have to fix proc/procfs though, because cpu time goes wrong with the patch currently, it's the addition of the cpu time of all threads mach provides aggregate times including destroyed threads though ah, I see one side effect is that you'll see processes sometimes taking 100% of cpu time although the cpu is unused or the cpu time of a process gets reduced :) i guess the 100% cpu is how top sees a negative increment ^^ gg0: do my threadterm packages help with ruby1.9 ? i mean, can you test with them some time ?
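Going back to the `_hurd_self_sigstate` leak found on 2013-11-15: the fix agreed on is to pair every `mach_thread_self()` with a `mach_port_deallocate()`. The toy model below makes no real Mach calls; the `model_*` functions and the lookup helpers are hypothetical stand-ins. It only shows why the pairing matters: a helper that takes a reference on every call drives the user-reference count to its limit, while the paired version keeps it constant. The limit of 65535 is an assumption matching the "64k" mentioned in the discussion, and what the kernel actually does at the limit is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the user-reference count ("urefs") of one send right.
   Assumption: the count cannot exceed UREFS_MAX; an attempt to go past
   it is only recorded as an overflow. */
#define UREFS_MAX 65535u

struct send_right {
    unsigned urefs;
    bool overflowed;
};

/* Stand-in for mach_thread_self(): hands out one more user reference. */
static void model_thread_self(struct send_right *r)
{
    if (r->urefs == UREFS_MAX)
        r->overflowed = true;
    else
        r->urefs++;
}

/* Stand-in for mach_port_deallocate() on the same name. */
static void model_deallocate(struct send_right *r)
{
    if (r->urefs > 0)
        r->urefs--;
}

/* Leaky pattern: take a reference to look something up, never drop it. */
static void leaky_lookup(struct send_right *r)
{
    model_thread_self(r);
    /* ... use the thread port as a lookup key ... */
}

/* Fixed pattern: every model_thread_self() is paired with a deallocate. */
static void paired_lookup(struct send_right *r)
{
    model_thread_self(r);
    /* ... use the thread port as a lookup key ... */
    model_deallocate(r);
}
```

After a long enough run, the leaky counter sits at the limit while the paired one still holds only its original reference.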
:)

## IRC, freenode, #hurd, 2013-11-21

youpi: ping about my question regarding error handling in the proposed thread_terminate_release call I agree with what Neal said he didn't say anything about error handling see http://lists.gnu.org/archive/html/bug-hurd/2013-11/msg00181.html i think i should make the call fail on first error it shouldn't happen, so it would merely serve to catch bugs it's not easily recoverable (if it's recoverable at all) uh, I thought he had I must have dreamt i think i'll go ahead with thread destruction integration

## IRC, freenode, #hurd, 2013-11-25

i've pushed the thread destruction patches for gnumach upstream and made a branch in glibc for that too awesome :) youpi: i don't remember how glibc changes should be managed once those are applied, i'll commit in libpthread braunr: usually we create a topgit branch, and then we add the patch from that to the debian repository

## IRC, freenode, #hurd, 2013-11-29

youpi: i still have a leak somewhere with the thread destruction patches maybe on the host priv port in bootstrap servers (root fs and proc server) it prevents priority adjusting in libports and can easily bring down a system because servers can start thrashing a lot sooner, as it was the case during the pthread migration

See discussion about that on [[/open_issues/libpthread]].

so i'll hunt it down before merging

## IRC, freenode, #hurd, 2013-12-19

darnassus still has the libports priority adjustment leaks i'll apply a few more patches to my hurd packages humpf, proc seems to have a problem getting the host priv port :/ that's bad what did you do ? i fixed all the leaks in libports when adjusting priorities the last one being releasing the host priv right and i get errors at boot time from the proc server remember when i had this problem ?
proc doesn't get the host priv port the normal way since the normal way is to get it from proc iirc ah, thought you fixed that so i guess the alternate way doesn't add a reference well the leak is fixed the problem you had was due to the leak which made the host priv port reach its max uref value now it's just the proc server the system works fine though for real ? the proc server needs the host priv port for getting the new tasks well yes how can it work w/o it ? i don't know .. i guess the problem is internal to glibc i mean, get_priv_ports fails, but that doesn't mean the host priv port is lost could be are you running a patched rootfs translator too ? yes ok b/c i remember having trouble with that right, the glibc call would make proc call __proc_getprivports hum teythoon: do you remember how proc gets its host priv port ? from init i think startup_procinit ? possibly right so it's probably not the host priv port i mean, the error is about another invalid send right hm nope, it is on host_priv :/ hm ok i see, looks like a bug from a debian patch or rather, a bug fix not yet imported into the debian package teythoon: you actually fixed it in 2c9422595f41635e2f4f7ef1afb7eece9001feae great :) ah, that one i was looking at the upstream code and couldn't understand what was going wrong :) much better except ps -eT doesn't work any more .. interestingly, with the thread destruction patch, ps -eT sometimes works, and sometimes doesn't the behaviour doesn't seem to change without a reboot and of course, as soon as i say it, i'm proven wrong by the next test :)

## IRC, freenode, #hurd, 2013-12-26

__pthread_sigstate_init doesn't seem to be converted to TLS in the upstream repository master branch ah dammit, the global signal dispositions patch touches both glibc and libpthread @#! what a mess youpi: do you have some time to quickly review the rbraun/thread_destruction branch in libpthread ?
there might be conflicts with some glibc patches or do you prefer it on the mailing list ? (i used a branch because it's not based on master) rather mail the list, yes ok it'd also be useful to write the rationale probably to be left as a comment in the source code yes, that branch was for personal storage :) so the reader knows how things are recycled or not hm that should already be the case ok the two structures that are still recycled are the pthread struct and tls it's quite obvious from pthread_alloc and well commented there for tls, it's explained in pthread_exit there, thread destruction finally merged in and now, we can remove the ugly hacks that were done for threadvars :) change stacks at will and support all sorts of weird languages and runtimes braunr: cool :)

## IRC, freenode, #hurd, 2013-12-31

braunr: I've added sigstate_locking, sigstate_thread_reference and tls_thread_leak to the debian glibc 2.18 package I believe that's complete? is mach_msg_uspace_options ready for being added? Does it bring much speedup? AIUI, thread_terminate_release is the union of the branches mentioned above? (I'm cleaning up branches in the glibc repo) youpi1: mach_msg_uspace_options can be left over, it only affects selects and not noticeably yes, those three branches are the only ones needed for thread destruction ok do the hurd changes depend on these changes ? no good :) only on tls for one of them (it's about the default stack size of 64k for hurd servers) and we have had this in debian for a long time already :) yes (how big were they before?) (were they a couple MiB, and thus exploding to GiBs on thousands of threads?)
64k pthread stacks are 2M by default yes

## IRC, freenode, #hurd, 2014-01-14

braunr: it seems your time change in libps made ps produce odd results samy 10987 5 -514358:-18:-42.17 /hurd/firmlink tmp youpi: wow :) that change is supposed to run on a system where threads actually get destroyed but i don't see what could trigger this side effect root 8629 664 56 years make -j 3 :) heh youpi: does the hurd package on darnassus include that patch ? yes i don't reproduce the problem :/ err what command are you using ? ps -feM on darnassus root 29642 473 7 months /usr/sbin/sshd -R hmmmm i don't see it with a make -j well, it's not systematic it's like once every two launches hhhhmmmmm it'd look like some random numbers get added strangely, the gcc processes started by a recursive make aren't children of make .. ps -eF hurd seems to report the correct values even ps -eM oO ps -ef too the problem seems to be with ps -efM too bad I'm always using that :) another way to see it is that it makes us spot the issue ;p

### IRC, freenode, #hurd, 2014-01-15

ok i have an idea of what goes wrong in libps youpi: for some reason, ps -efM lacks the PSTAT_TASK_BASIC flag my patch is wrong since it doesn't try to determine whether the stats apply to a task or a thread, but that is easy to fix ps -efM should nonetheless provide basic task info, obviously in addition, the problems i've observed with ps -T (occasional segfaults) seem to have existed before thread destruction they're just strongly exposed now that the thread list can be shrunk libps is quite complicated even hairy, i'd say ..
### IRC, freenode, #hurd, 2014-01-16

youpi: i think i have a proper fix for libps i'll commit it soon ok basically, getting system times simply sets the PSTAT_THREAD_BASIC flag whereas getting the run time of the terminated threads requires PSTAT_TASK_BASIC i assumed it was always set in the function i changed when dealing with a task and not a thread and well, that was a wrong assumption, -M can remove it if not strictly needed by the format the default format asks for suspend_count, which forces the retrieval of task basic info, so it works with -eM but -f doesn't :) so an extremely unlucky combination of flags :) indeed i added a pstat_times using the last (!) available flag bit looks clean to me i hope there is no abi issue (at least everything works with the unmodified ps-hurd executable and a new libps.so) hm, small bug in the thread destruction patch :/

### IRC, freenode, #hurd, 2014-01-17

good, i have proper fixes for tls in the main thread and thread termination :) awesome :) i've been wondering, what does it take to get the thread destruction stuff into the debian package ? i still have to build test packages, look for (unlikely, heh) regressions and work out some integration details with samuel hum the main thread tls fixup i guess youpi was waiting for me to fix that gnumach already provides the RPC so it will be in glibc soon i just have to get those last bits right teythoon: i'm quite slow at integrating stuff and samuel then builds packages ? i mean, is our libc package build linked to the other libc packages ? libpthread is applied as a patch to glibc and loaded as a plugin

## IRC, freenode, #hurd, 2014-01-17

uhm, did we break fakeroot-tcp ? we did ? fakeroot-tcp just works fine on buildds with fakeroot-tcp, i get make[4]: Entering directory `/home/rbraun/devel/debian/packages/hurd/hurd-0.5.git20140113/libdde-linux26/contrib/include' rm -f .general.d make[4]: *** [cleanall] Killed when cleaning the package before building ..
### IRC, freenode, #hurd, 2014-01-18

damn, fakeroot-tcp won't work on darnassus .. uh, looks like my tls/thread destruction "fixes" do cause regressions :( fakeroot works fine with debian glibc which one ? which fakeroot i mean -tcp yes, it fails as soon as i use the patched glibc :/ at least it's easy to reproduce

### IRC, freenode, #hurd, 2014-01-20

great, 3rd libc version installed on darnassus, let's see if i can build hurd packages against that

### IRC, freenode, #hurd, 2014-01-21

damn, fakeroot-tcp still crashes with my latest changes .... darnassus looks in good shape youpi: ^ youpi: if you have other tests, feel free to do them now i feel confident about committing the changes, if you're ok with it which changes ? I'm a bit lost as to what you were talking about :) you can find them in 2 patches in /var/tmp on darnassus one is about fixing thread destruction i'm pretty certain about this one so i'll commit it directly the other is fixing the tcb of the main thread [[open_issues/libpthread]]. where i simply do `tcb->self = thread->kernel_thread` :) with a comment explaining why i don't do something else like deallocating the unused tcb braunr: ok, that looks good braunr: awesome :) youpi: ok

### IRC, freenode, #hurd, 2014-01-22

there, libpthread should be fine now

## IRC, freenode, #hurd, 2014-02-06

youpi: in case you're planning to upgrade glibc (or not), the thread destruction changes are complete youpi: darnassus has been running them for some weeks with no visible regression braunr: ok, good including it in glibc was on my todo list indeed and Adam indeed plans for a 2.18 upload good :) braunr: this is up to 7c6dc6e28b2fc4b67934223f41cf080ffe58b230, right? (Wed Jan 22, Fix up the main thread TCB) yes oh, i just saw 2.17-98~0 glibc packages on debian-ports :) yes, it's just to fix the dhcp crash ah yes, it's not 2.18 2.18 is available in experimental braunr: just to make sure: did you have 983b18a6ff16f5687a9ece63a50d1831dec88609 in libc on darnassus?
(which drops the stack size hack) youpi: let me check youpi: ah no, i don't, you're right well, I was just wondering, nothing makes me think that was the case :) what was the issue that it was raising btw? threadvars ok, but in which case? (to make sure I test that before committing) now that we switched to tls, i would assume the transition path to be 1/ hurd stops defining that symbol, 2/ libpthread can stop using it the goal was to reduce the stack size of hurd server threads well, that's not my question :) I'm wondering in which precise case that was breaking things youpi: i don't know, it shouldn't break ok youpi: just in case, don't forget that last one-line patch i committed last night, fakeroot can't work right without it (i made a minor change while reviewing before committing, and obviously got it wrong :p) ok braunr: I've upgraded libpthread in debian's eglibc btw /home/rbraun/devel/debian/packages/eglibc/eglibc-2.17/build-tree/hurd-i386-libc/libc.so.phdr: *** executable stack signaled from build-tree/hurd-i386-libc/elf/check-execstack.out i thought glibc didn't use those anyway it doesn't look like the regression i'm having does this ring a bell : Encountered regressions that don't match expected failures (debian/testsuite-checking/expected-results-i486-gnu-libc): test-stpcpy_chk.out, Error 1 TEST test-stpcpy_chk.out: __stpcpy_chk normal_stpcpy simple_stpcpy_chk nope after what are you getting this regression?
building glibc 2.17-97 with thread destruction patches, including the one removing the stack size hack during tests there also are "progressions", but i'm not sure what these are some progressions are just luck, others seem to happen on some platforms only I'm not sure you want to test 2.17 a lot has changed between 2.17's libpthread and 2.18's libpthread (which is now equal to cvs's libpthread ) s/cvs/git/ yes i usually build with nocheck

## IRC, freenode, #hurd, 2014-02-07

youpi: on a vm with hurd 1:0.5.git20140203-1, upgrading to a patched glibc 2.17-97 that includes the patch which reverts the stack size hack, the system reboots and works fine ok. I don't remember what problem I was seeing that version of the hurd no longer defines the symbol but even then, there shouldn't have been any problem hm, or does it yes, it does youpi: the hurd package patch mentions Revert this for now, will have to wait for dropping the use of __pthread_stack_default_size from eglibc's libpthread_hurd_cond_wait.diff i wonder how it got there IIRC I was wondering too i've installed my c library on darnassus and it works fine there too with older (january) hurd packages looks good to me

## IRC, freenode, #hurd, 2014-02-10

braunr: btw, do the new libc packages contain your thread destruction work ? teythoon: the -98 ones on experimental ? i don't think they do the -18 ones should do