From ea874bc0797b8a3e5dbac278178b74f777e08d2c Mon Sep 17 00:00:00 2001
From: guy fleury iteriteka
Date: Sat, 2 Jan 2021 12:12:09 +0200
Subject: rename open_issues/libpthread/t/fix_have_kernel_resources.mdwn -> open_issues/libpthread/fix_have_kernel_resources.mdwn
Message-Id: <20210102101217.8372-4-gfleury@disroot.org>

---
 .../libpthread/t/fix_have_kernel_resources.mdwn | 1301 --------------------
 1 file changed, 1301 deletions(-)
 delete mode 100644 open_issues/libpthread/t/fix_have_kernel_resources.mdwn

diff --git a/open_issues/libpthread/t/fix_have_kernel_resources.mdwn b/open_issues/libpthread/t/fix_have_kernel_resources.mdwn
deleted file mode 100644
index 02b6ab05..00000000
--- a/open_issues/libpthread/t/fix_have_kernel_resources.mdwn
+++ /dev/null
@@ -1,1301 +0,0 @@
-[[!meta copyright="Copyright © 2012, 2013, 2014 Free Software Foundation,
-Inc."]]
-
-[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
-id="license" text="Permission is granted to copy, distribute and/or modify this
-document under the terms of the GNU Free Documentation License, Version 1.2 or
-any later version published by the Free Software Foundation; with no Invariant
-Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
-is included in the section entitled [[GNU Free Documentation
-License|/fdl]]."]]"""]]
-
-[[!tag open_issue_libpthread]]
-
-`t/fix_have_kernel_resources`
-
-Address problem mentioned in [[/libpthread]], *Threads' Death*.
-
-
-# IRC, freenode, #hurd, 2012-08-30
-
-    tschwinge: this issue needs more cooperation with the kernel
-    tschwinge: i.e. the ability to tell the kernel where the stack is,
-      so it's unmapped when the thread dies
-    which requiring another thread to perform this deallocation
-
-
-## IRC, freenode, #hurd, 2013-05-09
-
-    braunr: Speaking of which, didn't you say you had another "easy"
-      task?
- bddebian: make a system call that both terminates a thread and - releases memory - (the memory released being the thread stack) - this way, a thread can completely terminates itself without the - assistance of a managing thread or deferring work - braunr: That's "easy" ? :) - bddebian: since it's just a thread_terminate+vm_deallocate, it is - something like thread_terminate_self - But a syscall not an RPC right? - in hurd terminology, we don't make the distinction - the only real syscalls are mach_msg (obviously) and some to get - well known port rights - e.g. mach_task_self - everything else should be an RPC but could be a system call for - performance - since mach was designed to support clusters, it was necessary that - anything not strictly machine-local was an RPC - and it also helps emulation a lot - so keep doing RPCs :p - - -## IRC, freenode, #hurd, 2013-05-10 - - i'm not sure it should only apply to self though - youpi: can we get a quick opinion on this please ? - i've suggested bddebian to work on a new RPC that both terminates - a thread and releases its stack to help fix libpthread - and initially, i thought of it as operating only on the calling - thread - do you see any reason to make it work on any thread ? - (e.g. a real thread_terminate + vm_deallocate) - (or any reason not to) - thread stack deallocation is always a burden indeed - I'd tend to think it'd be useful, but perhaps ask the list - - -## IRC, freenode, #hurd, 2013-06-26 - - looks like there is a port right leak in libpthread - grmbl, the port leak seems to come from mach_port_destroy being - buggy :/ - hum, apparently we're not the only ones to suffer from port leaks - wrt mach_port_destroy - ew, libpthread is leaking - memory or ports? - both - sounds great ;) - as it is, libpthread doesn't destroy threads - it queues them so they're recycled late - r - but there is confusion between the thread structure itself and its - internal resources - i.e. 
there is pthread_alloc which allocates a thread structure, - and pthread_create which allocates everything else - but on pthread_exit, nothing is destroyed - when a thread structure is reused, its internal resources are - replaced by new instances - oh - it's ok for joinable threads but most of our threads are detached - pinotree: as expected, it's bigger than expected :p - so i won't be able to write a quick fix - the true way to fix this is make it possible for threads to free - their own resources - let's do that :p - ok, got the new thread termination function, i'll build eglibc - package providing it, then experiment with libpthread - braunr: iirc there's also a tschwinge patch in the debian eglibc - about that - ah - libpthread_fix.diff - i see - thanks for the notice - bddebian: - http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch - bddebian: this is what it looks like - see, short and easy - Aye but didn't youpi say not to bother with it?? - he did ? - i don't remember - I thought that was the implication. Or maybe that was the one I - already did!? 
- i'd be interested in reading that - anyway, there still are problems in libpthread, and this call is - one building block to fix some of them - some important ones - (big leaks) - - -## IRC, freenode, #hurd, 2013-06-29 - - damn, i fix leaks in libpthread, only to find out leaks somewhere - else :( - bddebian: ok, actually it was a bit more complicated than what i - showed you - because in addition to the stack, the call must also release the - send right in the caller's ipc space - (it can't be released before since there would be no mean to - reference the thread to destroy) - or perhaps it should strictly be reserved to self termination - hmm - yes it would probably be simpler - but it should be a decent compromise - i'm close to having a libpthread that doesn't leak anything - and that properly destroys threads and their resources - - -## IRC, freenode, #hurd, 2013-06-30 - - bddebian: ok, it was even more tricky, because the kernel would - save the return value on the user stack (which is released by the call - and then invalid) before checking for asynchronous software traps (ASTs, - a kind of software interrupts in mach), and terminating the calling - thread is done by a deferred AST ... :) - hmm, making threads able to terminate themselves makes rpctrace a - bit useless :/ - well, more restricted - - ok so, tough question : - i have a small test program that creates a thread, and inspect its - state before any thread dies - i can see msg_report_wait requests when using ps - (one per thread) - one of these requests create a new receive right, apparently for - the second thread in the test program - each time i use ps, i can see the sequence numbers of two receive - rights increase - i guess these rights are related to proc and signal handling per - thread - but i can't find what create them - does anyone know ? - tschwing_: ^ :) - - again, too many things wrong elsewhere to cleanly destroy threads - .. 
- something is deeply wrong with controlling terminals .. - - -## IRC, freenode, #hurd, 2013-07-01 - - youpi: if you happen to notice what receive right is created for - each thread (beyond the obvious port used for blocking and waking up), - please let me know - it's the only port leak i have with thread destruction - and i think it's related to the proc server since i see the - sequence number increase every time i use ps - - pinotree: my change doesn't fix all the pthread leaks but it's a - lot better - bddebian: i've spent almost the whole week end trying to find the - last port leak without success - there is some weird bug related to the controlling tty that hits - me every time i try to change something - it's the same bug that prevents ttys from being correctly closed - when using ssh or screen - well maybe not the same, but it's close - some stale receive right kept around for no apparent reason - and i can't find its source - - -## IRC, freenode, #hurd, 2013-07-02 - - and btw, i don't think i can make my libpthread patch work - i'll just aim at avoiding leaks, but destroying threads and their - related resources depends on other changes i don't clearly see - - -## IRC, freenode, #hurd, 2013-07-03 - - grmbl, i don't want to give up thread destruction .. - - -## IRC, freenode, #hurd, 2013-07-15 - - btw, my work on thread destruction is currently stalled - i don't have much free time right now - - -## IRC, freenode, #hurd, 2013-09-13 - - i think i know why my thread_terminate_deallocate patches leak one - receive port :> - but now i'm not sure of the proper solution - every time a thread is created and destroyed, a receive right is - leaked - i guess it's simply the reply port .. - grmbl - i guess i have to make it a simpleroutine ... 
- hm too bad, it's not the reply port :( - it's also leaking some memory - it doesn't seem related to my changes though - stacks, rights, and threads are correctly destroyed - some obscure state is left behind - i wonder how exception ports are dealt with - vminfo seems to confirm memory is leaking in the heap - humpf - oh silly me - i don't detach threads - well, detach them ;) - hm worse :p - now i get additional dead names - but it's a step forward - - -## IRC, freenode, #hurd, 2013-09-16 - - that thread port leak is so strange - the leaked port seems to be created when the new thread starts - running - so it looks like a port the kernel would implicitely create - hm could it be a thread-specific reply port ? - ah, yes, there is one of those - how come mach/mig-reply.c in glibc isn't thread-safe ? - it is overriden by sysdeps/mach/hurd/img-reply.c I guess - which uses a threadvar for the mig reply port - oh - talking of which, there is also last_value in - sysdeps/mach/strerror_l.c - strerror_thread_freeres is supposed to get called, but who knows - it does look to be that port - iirc that's the issue which prevents from letting us make threads - exit on idleness? - one of them - ok - maybe the only one, yes - i see memory leaks but they could be related/normal - (i.e. not actual leaks) - on the other hand, i also can't boot a hurd with my patch - but i consider removing such leaks a priority - does anyone know the semantic difference between - __mig_put_reply_port and __mig_dealloc_reply_port ? - i guess __mig_dealloc_reply_port is actually a destruction - operation, right ? 
- AIUI, dealloc is used when one wants the port not to be reused at - all - because it has been used as a reference for something, and can - still be currently in use - while put_reply would be when we're really done with it, and won't - use it again, and can thus be used as such - or at least something like that - heh - __mig_dealloc_reply_port calls __mach_port_mod_refs, which is a - RPC, and creates a new reply port when destroying the current one - bah - that's fine, it's a deref of the old port, which is not in the - reply_port variable any more - it's fine, but still a leak - well, dealloc does not completely deallocs, yes - that's not really the problem here - i've introduced a case that wasn't considered at the time, namely - that a thread can destroy itself - we probably need another function to be called from the thread exit - i'll simply try with mach_port_destroy - mach_port_destroy seems to be a RPC too ... - grmbl - isn't there a trap version somehow ? - not in libc - erf - at least i know what's wrong now :) - there still is a small memory leak i have to investigate - but outside the stack - the stack, the thread name and the thread are correctly destroyed - slabinfo confirms only one port leak and nothing else is leaked - ok so the port leak was indeed the thread-specific reply port, - taken care of - there are also memory leaks too - - -## IRC, freenode, #hurd, 2013-09-17 - - teythoon: on my side, i'm getting to know our threading - implementation better - closing to clean thread destruction - x15 ipc will hide reply ports ;p - memory leaks solved \o/ - now, have to fix memory release when joining - proper reference counting on detach/join/exit, let's see how it - goes .. - seems to work fine - - -## IRC, freenode, #hurd, 2013-09-18 - - ok i'll soon have gnumach and libc packages including proper - thread destruction :> - braunr: why did you have to touch gnumach? - to add a call allowing threads to release ports and memory - i.e. 
their last self reference, their reply port and their stack - let me public my current patches - braunr: thread_commit_suicide ? - hehe - initially thread_terminate_self but - it can be used by other threads too - to i named it thread_terminate_release - http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch - - http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch - the pthread patch needs to be polished because it changes the - semantics of pthread_thread_halt - but other than that, it should be complete - pthread_thread_halt_reallyhalt - ok let's try these libc packages - old static ext2fs for the root, but other than that, it boots - let's try iceweasel - (i'll need to build a hurd package against this new libc, removing - the libports_stability patch which prevents thread destruction in servers - on the way) - prevents thread destruction o_O - yes - in libports only ;p - oh, *only* in libports, I assumed for a moment that it affected - almost every component of the Hurd... - *phew( - ... :) - that's why, after a burst of messages, say because of aptitude - (select), you may see a few hundred threads still hanging around - also why unused servers remain running even after several minutes, - where the normal timeout is 2mins - I wondered about that, some servers (symlink comes to mind) seem - to go away if unused (or that's how I read the code) - symlinks are usually not servers, since most of them actually - exist in file systems, and are implemented through an optimization - yes I know that - trans/symlink.c reads: - /* The timeout here is 10 minutes */ - err = mach_msg_server_timeout (fsys_server, 0, control, - MACH_RCV_TIMEOUT, 1000 * 60 * 10); - if (err == MACH_RCV_TIMED_OUT) - exit (0); - ok - hm, /hurd/symlink doesn't feel at all like a symlink... but - works like one - well, starting iceweasel makes X on my host freeze oO - bbl - /hurd/symlink translators do go away after being unused for 10 - minutes... 
this is funny if they are set up by hand instead of being - started from a passive translator record - magically vanishing symlinks ;) - - -## IRC, freenode, #hurd, 2013-09-19 - - hum, i can't rebuild a hurd package :( - braunr: with your thread destruction patches in libc? - yes but it's unrelated - In file included from ../../libdiskfs/boot-start.c:38:0: - ./fsys_reply_U.h:173:15: error: conflicting types for - ‘fsys_get_children’ - i didn't see a new libc debian release - hm, David reported that as well - - id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com - uh oh - it seems I didn't add a _reply suffix to the reply routines :/ - there's quite a bit of fallout from my patches, I kinda feel bad - :( - teythoon: what i'm wondering is what youpi did too, since he got - hurd binary packages - braunr: well neither he nor I noticed that b/c for us the - declarations were just missing - from libc you mean ? - or hum gnumach-common ? - not sure actually - no it's not a gnumach thing - hurd-dev then - the build system should have cought these, or mig... - also, i see you changed fsys_reply.defs, but nothing about - fsys_request.defs - I have no fsys_requests.defs - looks like there was no fsys_request.defs in the first place - ... *sigh* - do you know an application that often creates and destroys threads - ? - no, sorry - maybe some test suite - ah right - sysbench maybe - also, i've been hit by a lot more network deadlocks than usual - lately - fixing netdde has gained some priority in my todo list - - -## IRC, freenode, #hurd, 2013-09-20 - - oh, git is multithreaded - great - so i've actually tested my libpthread patch quite a lot - - -## IRC, freenode, #hurd, 2013-09-25 - - on a side note, i was able to build gnumach/libc/hurd packages - with thread destruction - nice :) - they boot and work mostly fine, although they add their own issues - e.g. 
the comm field of the root ext2fs is empty - ps crashes when trying to display threads - but thread destruction actually works, i.e. servers (those that - are configured that away at least) go away after some time, and even - heavily used servers such as ext2fs dynamically scale over time :) - - -## IRC, freenode, #hurd, 2013-10-10 - - concerning threads, i think i figured out the last bugs i had with - thread destruction - it should be well on its way to be merged by the end of the year - - -## IRC, freenode, #hurd, 2013-10-11 - - braunr: is your thread destruction patch ready for testing? - gg0: there are packages at my repository, yes - but i still have hurd fixes to do before i polish it - in particular, posix says returning from main() stops the entire - process and all other threads - i didn't check that during the switch to pthreads, and ext2fs (and - maybe others) actually return from main but expect other threads to live - on - this creates problems when the main thread is actually destroyed, - but not the process - braunr: tmpfs does something like that, but calls pthread_exit - at the end of main - same effect - this was fine with cthreads, but must be changed with pthreads - and libpthread must be fixed to enforce it - (or libc) - - diskfs_startup_diskfs should probably be changed to reuse the main - thread instead of returning - - -## IRC, freenode, #hurd, 2013-10-19 - - I know what threads are, but what is 'thread destruction'? 
- the hurd currently never destroys individual threads - they're destroyed when tasks are destroyed - if the number of threads in a task peaks at a high number, say - thousands of them, they'll remain until the task is terminated - such tasks are usually file systems, normally never restarted (and - in the case of the root file system, not restartable) - this results in a form of leak - another effect of this leak is that servers which should go away - because of inactivity still remain - since thread destruction doesn't actually work, the debian package - uses a patch to prevent worker threads from timeouting - and to finish with, since thread destruction actually doesn't - work, normal (unpatched) applications that destroy threads are certainly - failing bad - i just need to polish a few things, wait for youpi to finish his - work on TLS to resolve conflicts, and that will be all - - -## IRC, freenode, #hurd, 2013-10-30 - - FYI, the packages on my repository enable actual thread - destruction, and i've altered the libports_stability.patch - it nows only sets the global timeout to 0 - now* - we actually can't let translator "die" on global timeout because - of a race issue - tested for about two weeks now and no major problem sighted - top reports processes running for 100% of their time when - terminating threads, but i expect it's simply mach/proc aggregating their - run time to the task - 100% of cpu time - - -## IRC, freenode, #hurd, 2013-11-08 - - teythoon: darnassus is currently running a modified glibc with - thread destruction, yes - braunr: did that require any fixups in Hurd that I'd have missed - ? 
- no - well - b/c the resulting hurd package would not boot - actually yes - one - i'll push the patch somewhere - iirc the mach-defpager spewed some error and /hurd/init failed - to bootstrap the system - teythoon: - http://darnassus.sceen.net/~rbraun/0001-Prevent-diskfs-translators-from-destroying-main-thre.patch - make sure you have the proper gnumach packages too :p - well, that could very well account for my trouble ;) - uh - well - gnumach implements thread destruction, glibc uses it, hurd makes - sure it doesn't exit from main - - -## IRC, freenode, #hurd, 2013-11-12 - - ok so, calling pthread_exit() from main isn't the same as - returning from main() - unlike what some man pages seem to say - so loosing task info when destroying the main thread is actually a - proc bug - ugh - ^^ - or a glibc one - the proc server, your favorite Hurd component... - :) - hm :/ - looks like command line arguments are stored on the stack of the - main thread - and proc merely receives the addresses of those in the target task - why not just keep the main thread around? - it represents a minor resource leak, true - yes - that's the hack i suggested - but it is relatively small - well no - my hack was about diskfs translators - it should be generalized in libpthread - seems reasonable - let's do it >) - - -## IRC, freenode, #hurd, 2013-11-13 - - braunr: there is a thread destruction issue in the experimental - ocaml build, worth looking at, probably - what do you mean ? - ... testing 'testfork.ml': ocamlcocamlrun: - ../libpthread/sysdeps/mach/pt-thread-halt.c:51: __pthread_thread_halt: - Unexpected error: (ipc/send) invalid destination port. 
- during the experimental ocaml build - well yes - thread recycling is buggy - i had the choice to fix it, or implement true destruction - i'm tweaking my patch so it leaves the main thread stack untouched - on destruction - and it should be ready - for review at least - - -## IRC, OFTC, #debian-hurd, 2013-11-13 - - ironforge out of memory during ruby1.9.1 rebuild. during test which - creates 10000 threads - ironforge out of memory during ruby1.9.1 rebuild, test which creates - 10000 threads - i guess ironforge kernel has been rebuilt against -95, correct? - err, what kernel? - 23:37 < youpi> hurd needs a rebuild to be able to work with the newer - eglibc - i mean hurd - yes, libc0.3 breaks the old packages anyway - wrt ENOMEM, was it expected? - wrt disk problems, aren't there on alioth only? - well 10,000 threads is a lot, especially on 32bit machine with 2M - default stack size - that makes 2GiB stacks - can't fit in a 2/2 split model, which gnumach uses - well, though active thread should die right away, just after set x to - false, if i read it correctly - perhaps the stacks are not correctly reused - that's probably worth digging in libpthread - by putting printfs, etc. - it seems stacks are never reused indeed, damn - I just wrote a small test that creates threads which just print - their stack address - that takes just a few minutes to do - i see. 
about reusage i guess you mean base address is kindof always - incremented - * gg0 likes being wrong - that's it, yes - gg0: take care, by keeping being wrong all the time, sometimes you - get right ;) - and you are definitely right here :) - Mmm, but the stack is really deallocated - and the numbers wrap around - I wonder how that is :) - ok, creating 20 000 threads does work - perhaps ruby does odd things which makes it not work - - -### IRC, OFTC, #debian-hurd, 2013-11-14 - - UID PID PPID TH MSGI MSGO SZ RSS SC STAT TIME COMMAND - 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu 0:00.15 - /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 - -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb - 720 threads, stuck - 2G SZ is very big :) - 00:42 < youpi> perhaps ruby does odd things which makes it not work - is that enough to file a ruby bug? as ruby suggests itself btw - no, they will probably not be able to investigate - but you can already check out how they create threads - and try to reproduce the same with a small C program - ehm on ruby2.0 with *context _enabled_ i can not reproduce it - -See [[/open_issues/glibc]] for `*context` functions. 
- - -## IRC, freenode, #hurd, 2013-11-14 - - nice, i got glibc packages with thread destruction - building hurd packages against it now - everything seems fine - hurd packages ready, let's see - - ruby1.9.1 FTBFS due to a couple of tests - https://buildd.debian.org/status/fetch.php?pkg=ruby1.9.1&arch=hurd-i386&ver=1.9.3.448-1&stamp=1384265526 - second one creates 10000 threads and machine got ENOMEM - bootstraptest.tmp.rb: [BUG] [BUG] pthread_cond_init: Cannot - allocate memory (ENOMEM) ew - few hours ago trying to reproduce it: - 01:20 < gg0> UID PID PPID TH MSGI MSGO SZ RSS SC STAT - TIME COMMAND - 01:20 < gg0> 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu - 0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 - -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb - yes that's expected - our stacks are 2M - 10k threads means right over 2G of stacks - userspace is restricted to 2G - but if i read correctly test in question, thread should just set x to - false then die - so ? - and ENOMEM popped upk when there were thread count was at 720 - hum - 10k threads would actually be 20G - 1k threads is 2G - 720 is about 1.5G - the rest is probably the ruby runtime - youpi tried to create 10000 thread, no problem. he guessed something - wrong on ruby side - indeed on ruby2.0 such test succeeds - you can't create 10k threads unless you change the stack size - hurd servers use a stack size of 64k by default which allows them - to go up to 30k iirc - but normal applications use the default 2M - i guess you mean 10000 threads active at the same time. test in - question should make them die after simply setting x to false, i guess - youpi's test did so as well - no - it's about stacks - hm - yes at the same time but - thread recycling is known to be buggy - which is what i'm currently fixing btw - what's the bug? 
- neal: there are several subtle issues - for example, joining a thread that is also calling pthread_exit - can fail badly - hmm - good that you are on it then :) - or detaching - i don't remember the details - but i remember such problems - apparently, keeping the stack of the main thread isn't enough - :( - for now, i'll keep the entire thread - - -## IRC, freenode, #hurd, 2013-11-15 - - i wasn't doing anything, just some single test runs. but yes, also - that one which creates hundreds of threads - it would like creating 10000 but goes out of memory after ~720 - btw same tests succeed on ruby2.0, so they should be fixed by - backporting some changes - actually it looks more like a deadlock .. - deadlock that says ENOMEM? - ? - ENOMEM is returned because the test task has no more virtual - memory - this doesn't mean the rest of the system should fail - ok i thought you were talking about such test - no it's something else - a deadlock in a critical server - the root file system maybe - braunr: htop and ps hang. 
just run the test once again - now you should still be able to login - htop/ps hanging means one process is unable to reply to queries - sent to the message port/thread - procfs does that to report on what a process is waiting - it usually mean there is a bug around signals, since the message - thread is also in charge of delivering signals - use ps -eM - and kill -KILL - hum - root 954 S dumping cores is known not to work most of the time - exodar shouldn't be configured like that - so yes, the crash server is hanging - gg0: i've set it to crash --kill and killed the hanging crash - instances blocking top/ps - nice - - my thread destruction patch and tls are indeed conflicting a bit - i suspect the tcb is used after being freed - i think i'll simply recycle the tcb, along with the pthread - structs - ok i think it's fine now - there was also a small bug in the tls code, keeping a reference on - the thread port - mach reference counting is so counter intuitive :/ - well, error-prone - - argh, more bugs in libc :( - :/ - but don't worry, there is always one more bug ;) - this one might explain crashes that are long to trigger - _hurd_self_sigstate() is implemented like this : - _hurd_thread_sigstate (__mach_thread_self ()); - it leaks a reference on the current thread each time it's called - >,< - but glibc maintains such references, so if the maximum value is - reached, and references are dropped, the value can reach 0 - ouch - at which point any call on a thread will result in an invalid send - right - and probably an assertion - well it's a good thing then that you found it :) - i think it's always been there - but it's more apparent since jknoenig's patch on signal - dispositions - the maximum number of user references in mach is 64k - this right leak isn't easy - tls is very tricky heh :) - for the main thread, tls initialization happens after the thread - creation, obviously - but for other threads, it's initialized before starting them - the leak was probably 
an overlook caused by that complexity - teythoon: actually that leak i mentioned in _hurd_self_sigstate - has only been recently added in Convert sigstate to TLS - so it's merely tls integration polishing - youpi: i'm currently reviewing changes related to tls and i think - there is a bug in _hurd_self_sigstate - calls to mach_thread_self() should be paired with - mach_port_deallocate to avoid urefs overflows - and right leaks - _hurd_critical_section_lock is probably affected too - hm - mhmm - in glibc, hurd/hurd/signal.h, _hurd_critical_section_lock - why is the sigstate unlocked after the call to - _hurd_thread_sigstate - _hurd_thread_sigstate doesn't seem to lock it .. - unless __spin_lock_init does it - yes, leak solved :) - - -## IRC, freenode, #hurd, 2013-11-16 - - argh, _hurd_critical_section_lock is called before the send right - on the main thread is fetched in libpthread :/ - is that bad ? - the sigstate is supposed to be initialized after pthreads - _hurd_critical_section_lock will create it if it sees there is - none - creating the sigstate is currently what makes the send right leak - ok - it's bad then - it may be due to my patch - _hurd_critical_section_lock is called during pthreads - initializatio - n - before the sigstate for the main thread is created, but after the - pthread init routine is called - it does indeed look like the code wasn't written with thread being - destroyed some day in mind :/ - braunr: btw, if you ever feel like benchmarking, sysbench has a - benchmark for threads contending for a lock - yes i've used it before - was it useful for this purpose ? 
- no :) - :/ - we already know libpthread isn't optimized - and felt it when we switched from cthreads - humpf - simply calling malloc implies a call to - _hurd_critical_section_lock - on the other hand, unlike what some glibc comments say, this does - work - - -## IRC, freenode, #hurd, 2013-11-17 - - looks like i've fixed all leak issues with thread destruction and - tls :) - let's see if ext2fs.static works fine too - braunr: \o/ - sorry about introducing the tls ones :) - no worries, it was expected - and tls was really needed :) - i mean, i expected to have some problems when rebasing on tls :p - braunr: this is good news, how is your rootfs translator holding - up? - building hurd packages right now - for now, only test applications and a few really multithreaded - ones (e.g. iceweasel) have been tested - well, the system boots :) - awesome :) - stressing the file system with git while watching youtube videos - with gnash doesn't make the system crash - you can actually watch yt videos on your Hurd box ? - yes - for a while now - o_O - can't you ? - I never even dared to try - hehe - teythoon: looks stable enough to install on darnassus - - -## IRC, freenode, #hurd, 2013-11-18 - - braunr: wrt to your thread destruction patchset, I thought you - also had to fix the proc server ? - teythoon: no - the problem was in glibc - i may have to fix proc/procfs though, because cpu time gets wrong - with the patch - currently, it's the addition of the cpu time of all threads - mach provides aggregate times including destroyed threads though - ah, I see - one side effect is that you'll see processes sometimes taking 100% - of cpu time although the cpu is unused - or the cpu time of a process gets reduced :) - i guess the 100% cpu is how top sees a negative increment - ^^ - gg0: do my threadterm packages help with ruby1.9 ? - i mean, can you test with them some time ? 
:) - - -## IRC, freenode, #hurd, 2013-11-21 - - youpi: ping about my question regarding error handling in the - proposed thread_terminate_release call - I agree with what Neal said - he didn't say anything about error handling - see - http://lists.gnu.org/archive/html/bug-hurd/2013-11/msg00181.html - i think i should make the call fail on first error - it shouldn't happen, so it would merely serve to catch bugs - it's not easily recoverable (if it's recoverable at all) - uh, I thought he had - I must have dreamt - - i think i'll go ahead with thread destruction integration - - -## IRC, freenode, #hurd, 2013-11-25 - - i've pushed the thread destruction patches for gnumach upstream - and made a branch in glibc for that too - awesome :) - youpi: i don't remember how glibc changes should be managed - once those are applied, i'll commit in libpthread - braunr: usually we create a topgit branch, and then we add the - patch from that to the debian repository - - -## IRC, freenode, #hurd, 2013-11-29 - - youpi: i still have a leak somewhere with the thread destruction - patches - maybe on the host priv port in bootstrap servers (root fs and proc - server) - it prevents priority adjusting in libports and can easily bring - down a system because servers can start trashing a lot sooner, as it was - the case during the pthread migration - -See discussion about that on [[/open_issues/libpthread]]. - - so i'll hunt it down before merging - - -## IRC, freenode, #hurd, 2013-12-19 - - darnassus still has the libports priority adjustement leaks - i'll apply a few more patches to my hurd packages - - humpf, proc seems to have a problem getting the host priv port :/ - thats bad - what did you do ? - i fixed all the leaks in libports when adjusting priorities - the last one being releasing the host priv right - and i get errors at boot time from the proc server - remember when i had this problem ? 
- proc doesn't get the host priv port the normal way since the - normal way is to get it from proc iirc - ah, thought you fixed that - so i guess the alternate way doesn't add a reference - well the leak is fixed - the problem you had was due to the leak which made the host priv - port reach its max uref value - now it's just the proc server - the system works fine though - for real ? - the proc server needs the host priv port for getting the new - tasks - well yes - how can it work w/o it ? - i don't know .. - i guess the problem is internal to glibc - i mean, get_priv_ports fails, but that doesn't mean the host priv - port is lost - could be - are you running a patched rootfs translator too ? - yes - ok - b/c i remember having trouble with that - right, the glibc call would make proc call __proc_getprivports - hum - teythoon: do you remember how proc gets its host priv port ? - from init - i think - startup_procinit ? - possibly - right - so it's probably not the host priv port - i mean, the error is about another invalid send right - hm nope, it is on host_priv :/ - hm ok i see, looks like a bug from a debian patch - or rather, a bug fix not yet imported into the debian package - teythoon: you actually fixed it in - 2c9422595f41635e2f4f7ef1afb7eece9001feae - great :) - ah, that one - i was looking at the upstream code and couldn't understand what - was going wrong - :) - much better - except ps -eT doesn't work any more .. - interestingly, with the thread destruction patch, ps -eT sometimes - work, and sometimes doesn't - the behaviour doesn't seem to change without a reboot - and of course, as soon as i say it, i'm proven wrong by the next - test :) - - -## IRC, freenode, #hurd, 2013-12-26 - - __pthread_sigstate_init doesn't seem to be converted to TLS in the - upstream repository master branch - - ah dammit, the global signal dispositions patch touches both glibc - and libpthread @#! 
- what a mess - - youpi: do you have some time to quickly review the - rbraun/thread_destruction branch in libpthread ? - there might be conflict with some glibc patches - or do you prefer it on the mailing list ? - (i used a branch because it's not based on master) - rather mail the list, yes - ok - it'd also be useful to write the rationale - probably to be left as comment in the source code - yes, that branch was for personal storage :) - so the reader knows how things are recycled or not - hm - that should already be the case - ok - the two structures that are still recycled are the pthread struct - and tls - it's quite obvious from pthread_alloc - and well commented there - for tls, it's explained in pthread_exit - - there, thread destruction finally merged in - and now, we can remove the ugly hacks that were done for - threadvars - :) - change stacks at will and support all sorts of weird languages and - runtimes - braunr: cool :) - - -## IRC, freenode, #hurd, 2013-12-31 - - braunr: I've added sigstate_locking, sigstate_thread_reference and - tls_thread_leak to the debian glibc 2.18 package - I believe that's complete? - is mach_msg_uspace_options ready for being added? Does it bring - much speedup? - AIUI, thread_terminate_release is the union of the branches - mentioned above? - (I'm cleaning up branches in the glibc repo) - youpi1: mach_msg_uspace_options can be left over, it only affects - selects and not noticeably - yes, those three branches are the only ones needed for thread - destruction - ok - does the hurd changes depend on these changes ? - no - good :) - only on tls for one of them - (it's about the default stack size of 64k for hurd servers) - and we have had this in debian for a long time already :) - yes - (how big were they before?) - (where they a couple MiB, and thus exploding to GiBs on thousands - of threads?) 
- 64k - pthread stacks are 2M by default - yes - - -## IRC, freenode, #hurd, 2014-01-14 - - braunr: it seems your time change in libps made ps produce odd re - results - samy 10987 5 -514358:-18:-42.17 /hurd/firmlink tmp - youpi: wow :) - that change is supposed to run on a system where threads actually - get destroyed - but i don't see what could trigger this side effect - root 8629 664 56 years make -j 3 - :) - heh - youpi: does the hurd package on darnassus include that patch ? - yes - i don't reproduce the problem :/ - err - what command are you using ? - ps -feM on darnassus - root 29642 473 7 months /usr/sbin/sshd -R - hmmmm - i don't see it with a make -j - well, it's not systematic - it's like once over two launches - hhhhmmmmm - it'd look like some random numbers get added - strangely, the gcc processes started by a recursive make aren't - children of make .. - ps -eF hurd seems to report the correct values - even ps -eM - oO - ps -ef too - the problem seems to be with ps -efM - too bad I'm always using that :) - another way to see it is that it makes us spot the issue ;p - - -### IRC, freenode, #hurd, 2014-01-15 - - ok i have an idea of what goes wrong in libps - - youpi: for some reason, ps -efM lacks the PSTAT_TASK_BASIC flag - my patch is wrong since it doesn't try to determine whether the - stats apply to a task or a thread, but that is easy to fix - ps -efM should nonetheless provide basic task info, obviously - in addition, the problems i've observed with ps -T (occasional - segfaults) seem to have existed before thread destruction - they're just strongly exposed now that the thread list can be - shrunk - - libps is quite complicated - even hairy, i'd say .. 
- - -### IRC, freenode, #hurd, 2014-01-16 - - youpi: i think i have a proper fix for libps - i'll commit it soon - ok - basically, getting system times simply set the PSTAT_THREAD_BASIC - flag - whereas getting the run time of the terminated threads requires - PSTAT_TASK_BASIC - i assumed it was always set in the function i changed when dealing - with a task and not a thread - and well, that was a wrong assumtion, -M can remove it if not - strictly needed by the format - the default format asks for suspend_count, which forces the - retrieval of task basic info, os it works with -eM - but -f doesn't :) - so extremely bad lucky combination of flags :) - indeed - i added a pstat_times using the last (!) available flag bit - looks clean to me - i hope there is no abi issue - (at least everything works with the unmodified ps-hurd executable - and a new libps.so) - - hm, small bug in the thread destruction patch :/ - - -### IRC, freenode, #hurd, 2014-01-17 - - good, i have proper fixes for tls in the main thread and thread - termination :) - awesome :) - i've been wondering, what does it take to get the thread - destruction stuff into the debian package ? - i still have to build test packages, look for (unlikely, heh) - regressions and work some integration details with samuel - hum the main thread tls fixup i guess - youpi was waiting for me to fix that - gnumach already provides the RPC - so it will be in glibc soon - i just have to get those last bits right - teythoon: i'm quite slow at integrating stuff - and samuel then builds packages ? - i mean, is our libc package build linked to the other libc - packages ? - libpthread is applied as a patch to glibc - and loaded as a plugin - - -## IRC, freenode, #hurd, 2014-01-17 - - uhm, did we break fakeroot-tcp ? - we did ? 
- fakeroot-tcp just works fine on buildds - with fakeroot-tcp, i get - make[4]: Entering directory - `/home/rbraun/devel/debian/packages/hurd/hurd-0.5.git20140113/libdde-linux26/contrib/include' - rm -f .general.d - make[4]: *** [cleanall] Killed - when cleaning the package before building .. - - -### IRC, freenode, #hurd, 2014-01-18 - - damn, fakeroot-tcp won't work on darnassus .. - uh, looks like my tls/thread destruction "fixes" do cause - regressions :( - fakeroot works fine with debian glibc - which one ? - which fakeroot i mean - -tcp - yes, it fails as soon as i use the patched glibc :/ - at least it's easy to reproduce - - -### IRC, freenode, #hurd, 2014-01-20 - - great, 3rd libc version installed on darnassus, let's see if i can - build hurd packages against that - - -### IRC, freenode, #hurd, 2014-01-21 - - damn, fakeroot-tcp still crashes with my latest changes .... - - darnassus looks in good shape - youpi: ^ - youpi: if you have other tests, feel free to do them now - i feel confident about committing the changes, if you're ok with - it - which changes ? - I'm a bit lost in what you were talking about :) - you can find them in 2 patches in /var/tmp on darnassus - one is about fixing thread destruction - i'm pretty certain about this one so i'll commit it directly - the other is fixing the tcb of the main thread - -[[open_issues/libpthread]]. 
- - where i simply do tcb->self = thread->kernel_thread :) - with a comment explaining why i don't do something else like - deallocating the unused tcb - braunr: ok, that looks good - braunr: awesome :) - youpi: ok - - -### IRC, freenode, #hurd, 2014-01-22 - - there, libpthread should be fine now - - -## IRC, freenode, #hurd, 2014-02-06 - - youpi: in case you're planning to upgrade glibc (or not), the - thread destruction changes are complete - youpi: darnassus has been running them for some weeks with no - visible regression - braunr: ok, good - including it in glibc was on my todo list indeed - and Adam indeed plan for a 2.18 upload - good :) - braunr: this is up to 7c6dc6e28b2fc4b67934223f41cf080ffe58b230, - right? (Wed Jan 22, Fix up the main thread TCB) - yes - oh, i just saw 2.17-98~0 glibc packages on debian-ports :) - yes, it's just to fix the dhcp crash - ah yes, it's not 2.18 - 2.18 is available in experimental - - braunr: just to make sure: did you have - 983b18a6ff16f5687a9ece63a50d1831dec88609 in libc on darnassus? - (which drops the stack size hack) - youpi: let me check - youpi: ah no, i don't, you're right - well, I was just wondering, nothing make me think that was the case - :) - what was the issue that it was raising btw? - threadvards - ok, b ut in which case? 
- (to make sure I test that before committing) - now that we switched to tls, i would assume the transition path to - be 1/ hurd stops defining that symbol, 2/ libpthread can stop using it - the goal was to reduce the stack size of hurd server threads - well, that's not my question :) I'm wondering in which precise case - that was breaking things - youpi: i don't know, it shouldn't break - ok - youpi: just in case, don't forget that last one line patch i - committed last night, fakeroot can't work right without it - (i made a minor change while reviewing before comitting, and - obviously got it wrong :p) - ok - - braunr: I've upgraded libpthread in debian's eglibc btw - - - /home/rbraun/devel/debian/packages/eglibc/eglibc-2.17/build-tree/hurd-i386-libc/libc.so.phdr: - *** executable stack signaled - from build-tree/hurd-i386-libc/elf/check-execstack.out - i thought glibc didn't use those - anyway it doesn't look to be the regression i'm having - does this ring a bell : - Encountered regressions that don't match expected failures - (debian/testsuite-checking/expected-results-i486-gnu-libc): - test-stpcpy_chk.out, Error 1 - TEST test-stpcpy_chk.out: __stpcpy_chk normal_stpcpy - simple_stpcpy_chk - nope - after what are you getting this regression? - building glibc 2.17-97 with thread destruction patches, including - the one removing the stack size hack - during tests - there also are "progressions", but i'm not sure what these are - some progressions are just luck, other seem to happen on some - platforms only - I'm not sure you want to test 2.17 - a lot has changed between 2.17's libpthread and 2.18's libpthread - (which is now equal to cvs's libpthread - ) - s/cvs/git/ - yes - i usually build with nocheck - - -## IRC, freenode, #hurd, 2014-02-07 - - youpi: on a vm with hurd 1:0.5.git20140203-1, upgrading to a - patched glibc 2.17-97 that includes the patch which reverts the stack - size hack, the system reboots and works fine - ok. 
I don't remember what problem I was seeing - that version of the hurd no longer defines the symbol - but even then, there shouldn't have been any problem - hm, or does it - yes, it does - youpi: the hurd package patch mentions - Revert this for now, will have to wait for dropping the use of - __pthread_stack_default_size from eglibc's - libpthread_hurd_cond_wait.diff - i wonder how it got there - IIRC I was wondering too - i've installed my c library on darnassus and it works fine there - too - with older (january) hurd packages - looks good to me - - -## IRC, freenode, #hurd, 2014-02-10 - - braunr: btw, do the new libc packages contain your thread - destruction work ? - teythoon: the -98 ones on experimental ? - i don't think they do - the -18 ones should do -- cgit v1.2.3