From 49a086299e047b18280457b654790ef4a2e5abfa Mon Sep 17 00:00:00 2001
From: Samuel Thibault
Date: Wed, 18 Feb 2015 00:58:35 +0100
Subject: Revert "rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn"

This reverts commit 95878586ec7611791f4001a4ee17abf943fae3c1.
---
 .../libpthread/t/fix_have_kernel_resources.mdwn | 1301 ++++++++++++++++++++
 1 file changed, 1301 insertions(+)
 create mode 100644 open_issues/libpthread/t/fix_have_kernel_resources.mdwn

diff --git a/open_issues/libpthread/t/fix_have_kernel_resources.mdwn b/open_issues/libpthread/t/fix_have_kernel_resources.mdwn
new file mode 100644
index 00000000..02b6ab05
--- /dev/null
+++ b/open_issues/libpthread/t/fix_have_kernel_resources.mdwn
@@ -0,0 +1,1301 @@
+[[!meta copyright="Copyright © 2012, 2013, 2014 Free Software Foundation,
+Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_libpthread]]
+
+`t/fix_have_kernel_resources`
+
+Address problem mentioned in [[/libpthread]], *Threads' Death*.
+
+
+# IRC, freenode, #hurd, 2012-08-30
+
+ tschwinge: this issue needs more cooperation with the kernel
+ tschwinge: i.e. the ability to tell the kernel where the stack is,
+ so it's unmapped when the thread dies
+ which requiring another thread to perform this deallocation
+
+
+## IRC, freenode, #hurd, 2013-05-09
+
+ braunr: Speaking of which, didn't you say you had another "easy"
+ task?
+ bddebian: make a system call that both terminates a thread and
+ releases memory
+ (the memory released being the thread stack)
+ this way, a thread can completely terminates itself without the
+ assistance of a managing thread or deferring work
+ braunr: That's "easy" ? :)
+ bddebian: since it's just a thread_terminate+vm_deallocate, it is
+ something like thread_terminate_self
+ But a syscall not an RPC right?
+ in hurd terminology, we don't make the distinction
+ the only real syscalls are mach_msg (obviously) and some to get
+ well known port rights
+ e.g. mach_task_self
+ everything else should be an RPC but could be a system call for
+ performance
+ since mach was designed to support clusters, it was necessary that
+ anything not strictly machine-local was an RPC
+ and it also helps emulation a lot
+ so keep doing RPCs :p
+
+
+## IRC, freenode, #hurd, 2013-05-10
+
+ i'm not sure it should only apply to self though
+ youpi: can we get a quick opinion on this please ?
+ i've suggested bddebian to work on a new RPC that both terminates
+ a thread and releases its stack to help fix libpthread
+ and initially, i thought of it as operating only on the calling
+ thread
+ do you see any reason to make it work on any thread ?
+ (e.g. a real thread_terminate + vm_deallocate)
+ (or any reason not to)
+ thread stack deallocation is always a burden indeed
+ I'd tend to think it'd be useful, but perhaps ask the list
+
+
+## IRC, freenode, #hurd, 2013-06-26
+
+ looks like there is a port right leak in libpthread
+ grmbl, the port leak seems to come from mach_port_destroy being
+ buggy :/
+ hum, apparently we're not the only ones to suffer from port leaks
+ wrt mach_port_destroy
+ ew, libpthread is leaking
+ memory or ports?
+ both
+ sounds great ;)
+ as it is, libpthread doesn't destroy threads
+ it queues them so they're recycled late
+ r
+ but there is confusion between the thread structure itself and its
+ internal resources
+ i.e. there is pthread_alloc which allocates a thread structure,
+ and pthread_create which allocates everything else
+ but on pthread_exit, nothing is destroyed
+ when a thread structure is reused, its internal resources are
+ replaced by new instances
+ oh
+ it's ok for joinable threads but most of our threads are detached
+ pinotree: as expected, it's bigger than expected :p
+ so i won't be able to write a quick fix
+ the true way to fix this is make it possible for threads to free
+ their own resources
+ let's do that :p
+ ok, got the new thread termination function, i'll build eglibc
+ package providing it, then experiment with libpthread
+ braunr: iirc there's also a tschwinge patch in the debian eglibc
+ about that
+ ah
+ libpthread_fix.diff
+ i see
+ thanks for the notice
+ bddebian:
+ http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch
+ bddebian: this is what it looks like
+ see, short and easy
+ Aye but didn't youpi say not to bother with it??
+ he did ?
+ i don't remember
+ I thought that was the implication. Or maybe that was the one I
+ already did!?
+ i'd be interested in reading that
+ anyway, there still are problems in libpthread, and this call is
+ one building block to fix some of them
+ some important ones
+ (big leaks)
+
+
+## IRC, freenode, #hurd, 2013-06-29
+
+ damn, i fix leaks in libpthread, only to find out leaks somewhere
+ else :(
+ bddebian: ok, actually it was a bit more complicated than what i
+ showed you
+ because in addition to the stack, the call must also release the
+ send right in the caller's ipc space
+ (it can't be released before since there would be no mean to
+ reference the thread to destroy)
+ or perhaps it should strictly be reserved to self termination
+ hmm
+ yes it would probably be simpler
+ but it should be a decent compromise
+ i'm close to having a libpthread that doesn't leak anything
+ and that properly destroys threads and their resources
+
+
+## IRC, freenode, #hurd, 2013-06-30
+
+ bddebian: ok, it was even more tricky, because the kernel would
+ save the return value on the user stack (which is released by the call
+ and then invalid) before checking for asynchronous software traps (ASTs,
+ a kind of software interrupts in mach), and terminating the calling
+ thread is done by a deferred AST ... :)
+ hmm, making threads able to terminate themselves makes rpctrace a
+ bit useless :/
+ well, more restricted
+
+ ok so, tough question :
+ i have a small test program that creates a thread, and inspect its
+ state before any thread dies
+ i can see msg_report_wait requests when using ps
+ (one per thread)
+ one of these requests create a new receive right, apparently for
+ the second thread in the test program
+ each time i use ps, i can see the sequence numbers of two receive
+ rights increase
+ i guess these rights are related to proc and signal handling per
+ thread
+ but i can't find what create them
+ does anyone know ?
+ tschwing_: ^ :)
+
+ again, too many things wrong elsewhere to cleanly destroy threads
+ ..
+ something is deeply wrong with controlling terminals ..
+
+
+## IRC, freenode, #hurd, 2013-07-01
+
+ youpi: if you happen to notice what receive right is created for
+ each thread (beyond the obvious port used for blocking and waking up),
+ please let me know
+ it's the only port leak i have with thread destruction
+ and i think it's related to the proc server since i see the
+ sequence number increase every time i use ps
+
+ pinotree: my change doesn't fix all the pthread leaks but it's a
+ lot better
+ bddebian: i've spent almost the whole week end trying to find the
+ last port leak without success
+ there is some weird bug related to the controlling tty that hits
+ me every time i try to change something
+ it's the same bug that prevents ttys from being correctly closed
+ when using ssh or screen
+ well maybe not the same, but it's close
+ some stale receive right kept around for no apparent reason
+ and i can't find its source
+
+
+## IRC, freenode, #hurd, 2013-07-02
+
+ and btw, i don't think i can make my libpthread patch work
+ i'll just aim at avoiding leaks, but destroying threads and their
+ related resources depends on other changes i don't clearly see
+
+
+## IRC, freenode, #hurd, 2013-07-03
+
+ grmbl, i don't want to give up thread destruction ..
+
+
+## IRC, freenode, #hurd, 2013-07-15
+
+ btw, my work on thread destruction is currently stalled
+ i don't have much free time right now
+
+
+## IRC, freenode, #hurd, 2013-09-13
+
+ i think i know why my thread_terminate_deallocate patches leak one
+ receive port :>
+ but now i'm not sure of the proper solution
+ every time a thread is created and destroyed, a receive right is
+ leaked
+ i guess it's simply the reply port ..
+ grmbl
+ i guess i have to make it a simpleroutine ...
+ hm too bad, it's not the reply port :(
+ it's also leaking some memory
+ it doesn't seem related to my changes though
+ stacks, rights, and threads are correctly destroyed
+ some obscure state is left behind
+ i wonder how exception ports are dealt with
+ vminfo seems to confirm memory is leaking in the heap
+ humpf
+ oh silly me
+ i don't detach threads
+ well, detach them ;)
+ hm worse :p
+ now i get additional dead names
+ but it's a step forward
+
+
+## IRC, freenode, #hurd, 2013-09-16
+
+ that thread port leak is so strange
+ the leaked port seems to be created when the new thread starts
+ running
+ so it looks like a port the kernel would implicitely create
+ hm could it be a thread-specific reply port ?
+ ah, yes, there is one of those
+ how come mach/mig-reply.c in glibc isn't thread-safe ?
+ it is overriden by sysdeps/mach/hurd/img-reply.c I guess
+ which uses a threadvar for the mig reply port
+ oh
+ talking of which, there is also last_value in
+ sysdeps/mach/strerror_l.c
+ strerror_thread_freeres is supposed to get called, but who knows
+ it does look to be that port
+ iirc that's the issue which prevents from letting us make threads
+ exit on idleness?
+ one of them
+ ok
+ maybe the only one, yes
+ i see memory leaks but they could be related/normal
+ (i.e. not actual leaks)
+ on the other hand, i also can't boot a hurd with my patch
+ but i consider removing such leaks a priority
+ does anyone know the semantic difference between
+ __mig_put_reply_port and __mig_dealloc_reply_port ?
+ i guess __mig_dealloc_reply_port is actually a destruction
+ operation, right ?
+ AIUI, dealloc is used when one wants the port not to be reused at
+ all
+ because it has been used as a reference for something, and can
+ still be currently in use
+ while put_reply would be when we're really done with it, and won't
+ use it again, and can thus be used as such
+ or at least something like that
+ heh
+ __mig_dealloc_reply_port calls __mach_port_mod_refs, which is a
+ RPC, and creates a new reply port when destroying the current one
+ bah
+ that's fine, it's a deref of the old port, which is not in the
+ reply_port variable any more
+ it's fine, but still a leak
+ well, dealloc does not completely deallocs, yes
+ that's not really the problem here
+ i've introduced a case that wasn't considered at the time, namely
+ that a thread can destroy itself
+ we probably need another function to be called from the thread exit
+ i'll simply try with mach_port_destroy
+ mach_port_destroy seems to be a RPC too ...
+ grmbl
+ isn't there a trap version somehow ?
+ not in libc
+ erf
+ at least i know what's wrong now :)
+ there still is a small memory leak i have to investigate
+ but outside the stack
+ the stack, the thread name and the thread are correctly destroyed
+ slabinfo confirms only one port leak and nothing else is leaked
+ ok so the port leak was indeed the thread-specific reply port,
+ taken care of
+ there are also memory leaks too
+
+
+## IRC, freenode, #hurd, 2013-09-17
+
+ teythoon: on my side, i'm getting to know our threading
+ implementation better
+ closing to clean thread destruction
+ x15 ipc will hide reply ports ;p
+ memory leaks solved \o/
+ now, have to fix memory release when joining
+ proper reference counting on detach/join/exit, let's see how it
+ goes ..
+ seems to work fine
+
+
+## IRC, freenode, #hurd, 2013-09-18
+
+ ok i'll soon have gnumach and libc packages including proper
+ thread destruction :>
+ braunr: why did you have to touch gnumach?
+ to add a call allowing threads to release ports and memory
+ i.e. their last self reference, their reply port and their stack
+ let me public my current patches
+ braunr: thread_commit_suicide ?
+ hehe
+ initially thread_terminate_self but
+ it can be used by other threads too
+ to i named it thread_terminate_release
+ http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch
+
+ http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch
+ the pthread patch needs to be polished because it changes the
+ semantics of pthread_thread_halt
+ but other than that, it should be complete
+ pthread_thread_halt_reallyhalt
+ ok let's try these libc packages
+ old static ext2fs for the root, but other than that, it boots
+ let's try iceweasel
+ (i'll need to build a hurd package against this new libc, removing
+ the libports_stability patch which prevents thread destruction in servers
+ on the way)
+ prevents thread destruction o_O
+ yes
+ in libports only ;p
+ oh, *only* in libports, I assumed for a moment that it affected
+ almost every component of the Hurd...
+ *phew(
+ ... :)
+ that's why, after a burst of messages, say because of aptitude
+ (select), you may see a few hundred threads still hanging around
+ also why unused servers remain running even after several minutes,
+ where the normal timeout is 2mins
+ I wondered about that, some servers (symlink comes to mind) seem
+ to go away if unused (or that's how I read the code)
+ symlinks are usually not servers, since most of them actually
+ exist in file systems, and are implemented through an optimization
+ yes I know that
+ trans/symlink.c reads:
+ /* The timeout here is 10 minutes */
+ err = mach_msg_server_timeout (fsys_server, 0, control,
+ MACH_RCV_TIMEOUT, 1000 * 60 * 10);
+ if (err == MACH_RCV_TIMED_OUT)
+ exit (0);
+ ok
+ hm, /hurd/symlink doesn't feel at all like a symlink... but
+ works like one
+ well, starting iceweasel makes X on my host freeze oO
+ bbl
+ /hurd/symlink translators do go away after being unused for 10
+ minutes... this is funny if they are set up by hand instead of being
+ started from a passive translator record
+ magically vanishing symlinks ;)
+
+
+## IRC, freenode, #hurd, 2013-09-19
+
+ hum, i can't rebuild a hurd package :(
+ braunr: with your thread destruction patches in libc?
+ yes but it's unrelated
+ In file included from ../../libdiskfs/boot-start.c:38:0:
+ ./fsys_reply_U.h:173:15: error: conflicting types for
+ ‘fsys_get_children’
+ i didn't see a new libc debian release
+ hm, David reported that as well
+
+ id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com
+ uh oh
+ it seems I didn't add a _reply suffix to the reply routines :/
+ there's quite a bit of fallout from my patches, I kinda feel bad
+ :(
+ teythoon: what i'm wondering is what youpi did too, since he got
+ hurd binary packages
+ braunr: well neither he nor I noticed that b/c for us the
+ declarations were just missing
+ from libc you mean ?
+ or hum gnumach-common ?
+ not sure actually
+ no it's not a gnumach thing
+ hurd-dev then
+ the build system should have cought these, or mig...
+ also, i see you changed fsys_reply.defs, but nothing about
+ fsys_request.defs
+ I have no fsys_requests.defs
+ looks like there was no fsys_request.defs in the first place
+ ... *sigh*
+ do you know an application that often creates and destroys threads
+ ?
+ no, sorry
+ maybe some test suite
+ ah right
+ sysbench maybe
+ also, i've been hit by a lot more network deadlocks than usual
+ lately
+ fixing netdde has gained some priority in my todo list
+
+
+## IRC, freenode, #hurd, 2013-09-20
+
+ oh, git is multithreaded
+ great
+ so i've actually tested my libpthread patch quite a lot
+
+
+## IRC, freenode, #hurd, 2013-09-25
+
+ on a side note, i was able to build gnumach/libc/hurd packages
+ with thread destruction
+ nice :)
+ they boot and work mostly fine, although they add their own issues
+ e.g. the comm field of the root ext2fs is empty
+ ps crashes when trying to display threads
+ but thread destruction actually works, i.e. servers (those that
+ are configured that away at least) go away after some time, and even
+ heavily used servers such as ext2fs dynamically scale over time :)
+
+
+## IRC, freenode, #hurd, 2013-10-10
+
+ concerning threads, i think i figured out the last bugs i had with
+ thread destruction
+ it should be well on its way to be merged by the end of the year
+
+
+## IRC, freenode, #hurd, 2013-10-11
+
+ braunr: is your thread destruction patch ready for testing?
+ gg0: there are packages at my repository, yes
+ but i still have hurd fixes to do before i polish it
+ in particular, posix says returning from main() stops the entire
+ process and all other threads
+ i didn't check that during the switch to pthreads, and ext2fs (and
+ maybe others) actually return from main but expect other threads to live
+ on
+ this creates problems when the main thread is actually destroyed,
+ but not the process
+ braunr: tmpfs does something like that, but calls pthread_exit
+ at the end of main
+ same effect
+ this was fine with cthreads, but must be changed with pthreads
+ and libpthread must be fixed to enforce it
+ (or libc)
+
+ diskfs_startup_diskfs should probably be changed to reuse the main
+ thread instead of returning
+
+
+## IRC, freenode, #hurd, 2013-10-19
+
+ I know what threads are, but what is 'thread destruction'?
+ the hurd currently never destroys individual threads
+ they're destroyed when tasks are destroyed
+ if the number of threads in a task peaks at a high number, say
+ thousands of them, they'll remain until the task is terminated
+ such tasks are usually file systems, normally never restarted (and
+ in the case of the root file system, not restartable)
+ this results in a form of leak
+ another effect of this leak is that servers which should go away
+ because of inactivity still remain
+ since thread destruction doesn't actually work, the debian package
+ uses a patch to prevent worker threads from timeouting
+ and to finish with, since thread destruction actually doesn't
+ work, normal (unpatched) applications that destroy threads are certainly
+ failing bad
+ i just need to polish a few things, wait for youpi to finish his
+ work on TLS to resolve conflicts, and that will be all
+
+
+## IRC, freenode, #hurd, 2013-10-30
+
+ FYI, the packages on my repository enable actual thread
+ destruction, and i've altered the libports_stability.patch
+ it nows only sets the global timeout to 0
+ now*
+ we actually can't let translator "die" on global timeout because
+ of a race issue
+ tested for about two weeks now and no major problem sighted
+ top reports processes running for 100% of their time when
+ terminating threads, but i expect it's simply mach/proc aggregating their
+ run time to the task
+ 100% of cpu time
+
+
+## IRC, freenode, #hurd, 2013-11-08
+
+ teythoon: darnassus is currently running a modified glibc with
+ thread destruction, yes
+ braunr: did that require any fixups in Hurd that I'd have missed
+ ?
+ no
+ well
+ b/c the resulting hurd package would not boot
+ actually yes
+ one
+ i'll push the patch somewhere
+ iirc the mach-defpager spewed some error and /hurd/init failed
+ to bootstrap the system
+ teythoon:
+ http://darnassus.sceen.net/~rbraun/0001-Prevent-diskfs-translators-from-destroying-main-thre.patch
+ make sure you have the proper gnumach packages too :p
+ well, that could very well account for my trouble ;)
+ uh
+ well
+ gnumach implements thread destruction, glibc uses it, hurd makes
+ sure it doesn't exit from main
+
+
+## IRC, freenode, #hurd, 2013-11-12
+
+ ok so, calling pthread_exit() from main isn't the same as
+ returning from main()
+ unlike what some man pages seem to say
+ so loosing task info when destroying the main thread is actually a
+ proc bug
+ ugh
+ ^^
+ or a glibc one
+ the proc server, your favorite Hurd component...
+ :)
+ hm :/
+ looks like command line arguments are stored on the stack of the
+ main thread
+ and proc merely receives the addresses of those in the target task
+ why not just keep the main thread around?
+ it represents a minor resource leak, true
+ yes
+ that's the hack i suggested
+ but it is relatively small
+ well no
+ my hack was about diskfs translators
+ it should be generalized in libpthread
+ seems reasonable
+ let's do it >)
+
+
+## IRC, freenode, #hurd, 2013-11-13
+
+ braunr: there is a thread destruction issue in the experimental
+ ocaml build, worth looking at, probably
+ what do you mean ?
+ ... testing 'testfork.ml': ocamlcocamlrun:
+ ../libpthread/sysdeps/mach/pt-thread-halt.c:51: __pthread_thread_halt:
+ Unexpected error: (ipc/send) invalid destination port.
+ during the experimental ocaml build
+ well yes
+ thread recycling is buggy
+ i had the choice to fix it, or implement true destruction
+ i'm tweaking my patch so it leaves the main thread stack untouched
+ on destruction
+ and it should be ready
+ for review at least
+
+
+## IRC, OFTC, #debian-hurd, 2013-11-13
+
+ ironforge out of memory during ruby1.9.1 rebuild. during test which
+ creates 10000 threads
+ ironforge out of memory during ruby1.9.1 rebuild, test which creates
+ 10000 threads
+ i guess ironforge kernel has been rebuilt against -95, correct?
+ err, what kernel?
+ 23:37 < youpi> hurd needs a rebuild to be able to work with the newer
+ eglibc
+ i mean hurd
+ yes, libc0.3 breaks the old packages anyway
+ wrt ENOMEM, was it expected?
+ wrt disk problems, aren't there on alioth only?
+ well 10,000 threads is a lot, especially on 32bit machine with 2M
+ default stack size
+ that makes 2GiB stacks
+ can't fit in a 2/2 split model, which gnumach uses
+ well, though active thread should die right away, just after set x to
+ false, if i read it correctly
+ perhaps the stacks are not correctly reused
+ that's probably worth digging in libpthread
+ by putting printfs, etc.
+ it seems stacks are never reused indeed, damn
+ I just wrote a small test that creates threads which just print
+ their stack address
+ that takes just a few minutes to do
+ i see. about reusage i guess you mean base address is kindof always
+ incremented
+ * gg0 likes being wrong
+ that's it, yes
+ gg0: take care, by keeping being wrong all the time, sometimes you
+ get right ;)
+ and you are definitely right here :)
+ Mmm, but the stack is really deallocated
+ and the numbers wrap around
+ I wonder how that is :)
+ ok, creating 20 000 threads does work
+ perhaps ruby does odd things which makes it not work
+
+
+### IRC, OFTC, #debian-hurd, 2013-11-14
+
+ UID PID PPID TH MSGI MSGO SZ RSS SC STAT TIME COMMAND
+ 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu 0:00.15
+ /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1
+ -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb
+ 720 threads, stuck
+ 2G SZ is very big :)
+ 00:42 < youpi> perhaps ruby does odd things which makes it not work
+ is that enough to file a ruby bug? as ruby suggests itself btw
+ no, they will probably not be able to investigate
+ but you can already check out how they create threads
+ and try to reproduce the same with a small C program
+ ehm on ruby2.0 with *context _enabled_ i can not reproduce it
+
+See [[/open_issues/glibc]] for `*context` functions.
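The log mentions "a small test that creates threads which just print their stack address". Below is a minimal sketch of such a probe. It assumes a POSIX pthreads implementation, and all function names are hypothetical: this is not the test used in the discussion. Each thread records the address of one of its locals, which lies on that thread's stack. Threads are created and joined one at a time, so repeated addresses suggest that joined threads' stacks are being reused. Steadily changing addresses suggest fresh stacks, as reported for the Hurd above. The helper `stack_reservation` also shows the arithmetic behind the ENOMEM reports: N simultaneously live stacks of S bytes reserve N × S bytes of address space.

```c
/* Hypothetical stack-reuse probe: a sketch assuming POSIX pthreads,
   not code from libpthread or from the IRC discussion.  */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Thread body: store the address of a local variable (which lives on
   this thread's stack) into the slot passed as ARG.  */
void *
record_stack_address (void *arg)
{
  int local = 0;
  *(uintptr_t *) arg = (uintptr_t) &local;
  return NULL;
}

/* Create and join COUNT threads one after the other, so at most one
   extra thread is alive at a time.  ADDRS[i] receives the address seen
   by thread i.  Returns the number of threads created and joined.  */
size_t
probe_stacks (uintptr_t *addrs, size_t count)
{
  assert (addrs != NULL || count == 0);
  size_t i;
  for (i = 0; i < count; i++)
    {
      pthread_t thread;
      if (pthread_create (&thread, NULL, record_stack_address, &addrs[i]) != 0)
        break;
      if (pthread_join (thread, NULL) != 0)
        break;
    }
  return i;
}

/* Number of distinct values among the first COUNT entries of ADDRS.
   A small result hints that stacks are reused; a result equal to COUNT
   hints that each thread got a stack at a new address.  */
size_t
distinct_addresses (const uintptr_t *addrs, size_t count)
{
  size_t distinct = 0;
  for (size_t i = 0; i < count; i++)
    {
      size_t j = 0;
      while (j < i && addrs[j] != addrs[i])
        j++;
      if (j == i)
        distinct++;
    }
  return distinct;
}

/* Bytes of address space reserved when NTHREADS stacks of STACK_SIZE
   bytes are alive at the same time.  */
unsigned long long
stack_reservation (unsigned long long nthreads, unsigned long long stack_size)
{
  return nthreads * stack_size;
}
```

With the 2 MiB default stack from the log, `stack_reservation (10000, 2ULL << 20)` is 20971520000 bytes, about 20 GB. That is far beyond the 2 GiB of user space in gnumach's 2/2 split. With the 64 KiB server stacks, `stack_reservation (30000, 64ULL << 10)` is 1966080000 bytes, which still fits.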
+
+
+## IRC, freenode, #hurd, 2013-11-14
+
+ nice, i got glibc packages with thread destruction
+ building hurd packages against it now
+ everything seems fine
+ hurd packages ready, let's see
+
+ ruby1.9.1 FTBFS due to a couple of tests
+ https://buildd.debian.org/status/fetch.php?pkg=ruby1.9.1&arch=hurd-i386&ver=1.9.3.448-1&stamp=1384265526
+ second one creates 10000 threads and machine got ENOMEM
+ bootstraptest.tmp.rb: [BUG] [BUG] pthread_cond_init: Cannot
+ allocate memory (ENOMEM) ew
+ few hours ago trying to reproduce it:
+ 01:20 < gg0> UID PID PPID TH MSGI MSGO SZ RSS SC STAT
+ TIME COMMAND
+ 01:20 < gg0> 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu
+ 0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1
+ -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb
+ yes that's expected
+ our stacks are 2M
+ 10k threads means right over 2G of stacks
+ userspace is restricted to 2G
+ but if i read correctly test in question, thread should just set x to
+ false then die
+ so ?
+ and ENOMEM popped upk when there were thread count was at 720
+ hum
+ 10k threads would actually be 20G
+ 1k threads is 2G
+ 720 is about 1.5G
+ the rest is probably the ruby runtime
+ youpi tried to create 10000 thread, no problem. he guessed something
+ wrong on ruby side
+ indeed on ruby2.0 such test succeeds
+ you can't create 10k threads unless you change the stack size
+ hurd servers use a stack size of 64k by default which allows them
+ to go up to 30k iirc
+ but normal applications use the default 2M
+ i guess you mean 10000 threads active at the same time. test in
+ question should make them die after simply setting x to false, i guess
+ youpi's test did so as well
+ no
+ it's about stacks
+ hm
+ yes at the same time but
+ thread recycling is known to be buggy
+ which is what i'm currently fixing btw
+ what's the bug?
+ neal: there are several subtle issues
+ for example, joining a thread that is also calling pthread_exit
+ can fail badly
+ hmm
+ good that you are on it then :)
+ or detaching
+ i don't remember the details
+ but i remember such problems
+ apparently, keeping the stack of the main thread isn't enough
+ :(
+ for now, i'll keep the entire thread
+
+
+## IRC, freenode, #hurd, 2013-11-15
+
+ i wasn't doing anything, just some single test runs. but yes, also
+ that one which creates hundreds of threads
+ it would like creating 10000 but goes out of memory after ~720
+ btw same tests succeed on ruby2.0, so they should be fixed by
+ backporting some changes
+ actually it looks more like a deadlock ..
+ deadlock that says ENOMEM?
+ ?
+ ENOMEM is returned because the test task has no more virtual
+ memory
+ this doesn't mean the rest of the system should fail
+ ok i thought you were talking about such test
+ no it's something else
+ a deadlock in a critical server
+ the root file system maybe
+ braunr: htop and ps hang. just run the test once again
+ now you should still be able to login
+ htop/ps hanging means one process is unable to reply to queries
+ sent to the message port/thread
+ procfs does that to report on what a process is waiting
+ it usually mean there is a bug around signals, since the message
+ thread is also in charge of delivering signals
+ use ps -eM
+ and kill -KILL
+ hum
+ root 954 S dumping cores is known not to work most of the time
+ exodar shouldn't be configured like that
+ so yes, the crash server is hanging
+ gg0: i've set it to crash --kill and killed the hanging crash
+ instances blocking top/ps
+ nice
+
+ my thread destruction patch and tls are indeed conflicting a bit
+ i suspect the tcb is used after being freed
+ i think i'll simply recycle the tcb, along with the pthread
+ structs
+ ok i think it's fine now
+ there was also a small bug in the tls code, keeping a reference on
+ the thread port
+ mach reference counting is so counter intuitive :/
+ well, error-prone
+
+ argh, more bugs in libc :(
+ :/
+ but don't worry, there is always one more bug ;)
+ this one might explain crashes that are long to trigger
+ _hurd_self_sigstate() is implemented like this :
+ _hurd_thread_sigstate (__mach_thread_self ());
+ it leaks a reference on the current thread each time it's called
+ >,<
+ but glibc maintains such references, so if the maximum value is
+ reached, and references are dropped, the value can reach 0
+ ouch
+ at which point any call on a thread will result in an invalid send
+ right
+ and probably an assertion
+ well it's a good thing then that you found it :)
+ i think it's always been there
+ but it's more apparent since jknoenig's patch on signal
+ dispositions
+ the maximum number of user references in mach is 64k
+ this right leak isn't easy
+ tls is very tricky heh :)
+ for the main thread, tls initialization happens after the thread
+ creation, obviously
+ but for other threads, it's initialized before starting them
+ the leak was probably an overlook caused by that complexity
+ teythoon: actually that leak i mentioned in _hurd_self_sigstate
+ has only been recently added in Convert sigstate to TLS
+ so it's merely tls integration polishing
+ youpi: i'm currently reviewing changes related to tls and i think
+ there is a bug in _hurd_self_sigstate
+ calls to mach_thread_self() should be paired with
+ mach_port_deallocate to avoid urefs overflows
+ and right leaks
+ _hurd_critical_section_lock is probably affected too
+ hm
+ mhmm
+ in glibc, hurd/hurd/signal.h, _hurd_critical_section_lock
+ why is the sigstate unlocked after the call to
+ _hurd_thread_sigstate
+ _hurd_thread_sigstate doesn't seem to lock it ..
+ unless __spin_lock_init does it
+ yes, leak solved :)
+
+
+## IRC, freenode, #hurd, 2013-11-16
+
+ argh, _hurd_critical_section_lock is called before the send right
+ on the main thread is fetched in libpthread :/
+ is that bad ?
+ the sigstate is supposed to be initialized after pthreads
+ _hurd_critical_section_lock will create it if it sees there is
+ none
+ creating the sigstate is currently what makes the send right leak
+ ok
+ it's bad then
+ it may be due to my patch
+ _hurd_critical_section_lock is called during pthreads
+ initializatio
+ n
+ before the sigstate for the main thread is created, but after the
+ pthread init routine is called
+ it does indeed look like the code wasn't written with thread being
+ destroyed some day in mind :/
+ braunr: btw, if you ever feel like benchmarking, sysbench has a
+ benchmark for threads contending for a lock
+ yes i've used it before
+ was it useful for this purpose ?
+ no :)
+ :/
+ we already know libpthread isn't optimized
+ and felt it when we switched from cthreads
+ humpf
+ simply calling malloc implies a call to
+ _hurd_critical_section_lock
+ on the other hand, unlike what some glibc comments say, this does
+ work
+
+
+## IRC, freenode, #hurd, 2013-11-17
+
+ looks like i've fixed all leak issues with thread destruction and
+ tls :)
+ let's see if ext2fs.static works fine too
+ braunr: \o/
+ sorry about introducing the tls ones :)
+ no worries, it was expected
+ and tls was really needed :)
+ i mean, i expected to have some problems when rebasing on tls :p
+ braunr: this is good news, how is your rootfs translator holding
+ up?
+ building hurd packages right now
+ for now, only test applications and a few really multithreaded
+ ones (e.g. iceweasel) have been tested
+ well, the system boots :)
+ awesome :)
+ stressing the file system with git while watching youtube videos
+ with gnash doesn't make the system crash
+ you can actually watch yt videos on your Hurd box ?
+ yes
+ for a while now
+ o_O
+ can't you ?
+ I never even dared to try
+ hehe
+ teythoon: looks stable enough to install on darnassus
+
+
+## IRC, freenode, #hurd, 2013-11-18
+
+ braunr: wrt to your thread destruction patchset, I thought you
+ also had to fix the proc server ?
+ teythoon: no
+ the problem was in glibc
+ i may have to fix proc/procfs though, because cpu time gets wrong
+ with the patch
+ currently, it's the addition of the cpu time of all threads
+ mach provides aggregate times including destroyed threads though
+ ah, I see
+ one side effect is that you'll see processes sometimes taking 100%
+ of cpu time although the cpu is unused
+ or the cpu time of a process gets reduced :)
+ i guess the 100% cpu is how top sees a negative increment
+ ^^
+ gg0: do my threadterm packages help with ruby1.9 ?
+ i mean, can you test with them some time ? :)
+
+
+## IRC, freenode, #hurd, 2013-11-21
+
+ youpi: ping about my question regarding error handling in the
+ proposed thread_terminate_release call
+ I agree with what Neal said
+ he didn't say anything about error handling
+ see
+ http://lists.gnu.org/archive/html/bug-hurd/2013-11/msg00181.html
+ i think i should make the call fail on first error
+ it shouldn't happen, so it would merely serve to catch bugs
+ it's not easily recoverable (if it's recoverable at all)
+ uh, I thought he had
+ I must have dreamt
+
+ i think i'll go ahead with thread destruction integration
+
+
+## IRC, freenode, #hurd, 2013-11-25
+
+ i've pushed the thread destruction patches for gnumach upstream
+ and made a branch in glibc for that too
+ awesome :)
+ youpi: i don't remember how glibc changes should be managed
+ once those are applied, i'll commit in libpthread
+ braunr: usually we create a topgit branch, and then we add the
+ patch from that to the debian repository
+
+
+## IRC, freenode, #hurd, 2013-11-29
+
+ youpi: i still have a leak somewhere with the thread destruction
+ patches
+ maybe on the host priv port in bootstrap servers (root fs and proc
+ server)
+ it prevents priority adjusting in libports and can easily bring
+ down a system because servers can start trashing a lot sooner, as it was
+ the case during the pthread migration
+
+See discussion about that on [[/open_issues/libpthread]].
+
+ so i'll hunt it down before merging
+
+
+## IRC, freenode, #hurd, 2013-12-19
+
+ darnassus still has the libports priority adjustement leaks
+ i'll apply a few more patches to my hurd packages
+
+ humpf, proc seems to have a problem getting the host priv port :/
+ thats bad
+ what did you do ?
+ i fixed all the leaks in libports when adjusting priorities
+ the last one being releasing the host priv right
+ and i get errors at boot time from the proc server
+ remember when i had this problem ?
+ proc doesn't get the host priv port the normal way since the + normal way is to get it from proc iirc + ah, thought you fixed that + so i guess the alternate way doesn't add a reference + well the leak is fixed + the problem you had was due to the leak which made the host priv + port reach its max uref value + now it's just the proc server + the system works fine though + for real ? + the proc server needs the host priv port for getting the new + tasks + well yes + how can it work w/o it ? + i don't know .. + i guess the problem is internal to glibc + i mean, get_priv_ports fails, but that doesn't mean the host priv + port is lost + could be + are you running a patched rootfs translator too ? + yes + ok + b/c i remember having trouble with that + right, the glibc call would make proc call __proc_getprivports + hum + teythoon: do you remember how proc gets its host priv port ? + from init + i think + startup_procinit ? + possibly + right + so it's probably not the host priv port + i mean, the error is about another invalid send right + hm nope, it is on host_priv :/ + hm ok i see, looks like a bug from a debian patch + or rather, a bug fix not yet imported into the debian package + teythoon: you actually fixed it in + 2c9422595f41635e2f4f7ef1afb7eece9001feae + great :) + ah, that one + i was looking at the upstream code and couldn't understand what + was going wrong + :) + much better + except ps -eT doesn't work any more .. + interestingly, with the thread destruction patch, ps -eT sometimes + work, and sometimes doesn't + the behaviour doesn't seem to change without a reboot + and of course, as soon as i say it, i'm proven wrong by the next + test :) + + +## IRC, freenode, #hurd, 2013-12-26 + + __pthread_sigstate_init doesn't seem to be converted to TLS in the + upstream repository master branch + + ah dammit, the global signal dispositions patch touches both glibc + and libpthread @#! 
+ what a mess + + youpi: do you have some time to quickly review the + rbraun/thread_destruction branch in libpthread ? + there might be conflict with some glibc patches + or do you prefer it on the mailing list ? + (i used a branch because it's not based on master) + rather mail the list, yes + ok + it'd also be useful to write the rationale + probably to be left as comment in the source code + yes, that branch was for personal storage :) + so the reader knows how things are recycled or not + hm + that should already be the case + ok + the two structures that are still recycled are the pthread struct + and tls + it's quite obvious from pthread_alloc + and well commented there + for tls, it's explained in pthread_exit + + there, thread destruction finally merged in + and now, we can remove the ugly hacks that were done for + threadvars + :) + change stacks at will and support all sorts of weird languages and + runtimes + braunr: cool :) + + +## IRC, freenode, #hurd, 2013-12-31 + + braunr: I've added sigstate_locking, sigstate_thread_reference and + tls_thread_leak to the debian glibc 2.18 package + I believe that's complete? + is mach_msg_uspace_options ready for being added? Does it bring + much speedup? + AIUI, thread_terminate_release is the union of the branches + mentioned above? + (I'm cleaning up branches in the glibc repo) + youpi1: mach_msg_uspace_options can be left over, it only affects + selects and not noticeably + yes, those three branches are the only ones needed for thread + destruction + ok + does the hurd changes depend on these changes ? + no + good :) + only on tls for one of them + (it's about the default stack size of 64k for hurd servers) + and we have had this in debian for a long time already :) + yes + (how big were they before?) + (where they a couple MiB, and thus exploding to GiBs on thousands + of threads?) 
+ 64k + pthread stacks are 2M by default + yes + + +## IRC, freenode, #hurd, 2014-01-14 + + braunr: it seems your time change in libps made ps produce odd re + results + samy 10987 5 -514358:-18:-42.17 /hurd/firmlink tmp + youpi: wow :) + that change is supposed to run on a system where threads actually + get destroyed + but i don't see what could trigger this side effect + root 8629 664 56 years make -j 3 + :) + heh + youpi: does the hurd package on darnassus include that patch ? + yes + i don't reproduce the problem :/ + err + what command are you using ? + ps -feM on darnassus + root 29642 473 7 months /usr/sbin/sshd -R + hmmmm + i don't see it with a make -j + well, it's not systematic + it's like once over two launches + hhhhmmmmm + it'd look like some random numbers get added + strangely, the gcc processes started by a recursive make aren't + children of make .. + ps -eF hurd seems to report the correct values + even ps -eM + oO + ps -ef too + the problem seems to be with ps -efM + too bad I'm always using that :) + another way to see it is that it makes us spot the issue ;p + + +### IRC, freenode, #hurd, 2014-01-15 + + ok i have an idea of what goes wrong in libps + + youpi: for some reason, ps -efM lacks the PSTAT_TASK_BASIC flag + my patch is wrong since it doesn't try to determine whether the + stats apply to a task or a thread, but that is easy to fix + ps -efM should nonetheless provide basic task info, obviously + in addition, the problems i've observed with ps -T (occasional + segfaults) seem to have existed before thread destruction + they're just strongly exposed now that the thread list can be + shrunk + + libps is quite complicated + even hairy, i'd say .. 
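The 2013-11-18 discussion describes a CPU time side effect of thread destruction. Summing only the live threads makes a process's total shrink when a thread is destroyed, which `top` then reports as a negative increment. The task-versus-thread confusion above is the same problem. The sketch below illustrates the accounting. It assumes that the task-level times Mach reports in `task_basic_info` (`user_time`, `system_time`) cover the threads that have already terminated, so adding them to the live threads' times keeps the total monotonic. The `cpu_time` struct and the helper functions are hypothetical simplifications for illustration, not libps code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Mach's time_value_t (seconds + microseconds). */
struct cpu_time {
    int64_t seconds;
    int64_t microseconds;
};

static int64_t cpu_time_us(struct cpu_time t)
{
    return t.seconds * 1000000 + t.microseconds;
}

/* Naive accounting: sum the times of the threads that still exist.
   The total drops as soon as a thread is destroyed. */
static int64_t live_threads_us(const struct cpu_time *live, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += cpu_time_us(live[i]);
    return total;
}

/* Monotonic accounting: add the time already charged to terminated
   threads (assumed here to come from task-level info such as
   task_basic_info) to the time of the live threads. */
static int64_t process_time_us(struct cpu_time terminated,
                               const struct cpu_time *live, size_t n)
{
    return cpu_time_us(terminated) + live_threads_us(live, n);
}
```

Take two threads that have run for 3 s and 2.5 s. If the second one is destroyed, the naive total drops from 5.5 s to 3 s. The monotonic total stays at 5.5 s because the 2.5 s are now counted in the terminated-thread figure.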
+ + +### IRC, freenode, #hurd, 2014-01-16 + + youpi: i think i have a proper fix for libps + i'll commit it soon + ok + basically, getting system times simply set the PSTAT_THREAD_BASIC + flag + whereas getting the run time of the terminated threads requires + PSTAT_TASK_BASIC + i assumed it was always set in the function i changed when dealing + with a task and not a thread + and well, that was a wrong assumption, -M can remove it if not + strictly needed by the format + the default format asks for suspend_count, which forces the + retrieval of task basic info, so it works with -eM + but -f doesn't :) + so extremely bad lucky combination of flags :) + indeed + i added a pstat_times using the last (!) available flag bit + looks clean to me + i hope there is no abi issue + (at least everything works with the unmodified ps-hurd executable + and a new libps.so) + + hm, small bug in the thread destruction patch :/ + + +### IRC, freenode, #hurd, 2014-01-17 + + good, i have proper fixes for tls in the main thread and thread + termination :) + awesome :) + i've been wondering, what does it take to get the thread + destruction stuff into the debian package ? + i still have to build test packages, look for (unlikely, heh) + regressions and work some integration details with samuel + hum the main thread tls fixup i guess + youpi was waiting for me to fix that + gnumach already provides the RPC + so it will be in glibc soon + i just have to get those last bits right + teythoon: i'm quite slow at integrating stuff + and samuel then builds packages ? + i mean, is our libc package build linked to the other libc + packages ? + libpthread is applied as a patch to glibc + and loaded as a plugin + + +## IRC, freenode, #hurd, 2014-01-17 + + uhm, did we break fakeroot-tcp ? + we did ?
+ fakeroot-tcp just works fine on buildds + with fakeroot-tcp, i get + make[4]: Entering directory + `/home/rbraun/devel/debian/packages/hurd/hurd-0.5.git20140113/libdde-linux26/contrib/include' + rm -f .general.d + make[4]: *** [cleanall] Killed + when cleaning the package before building .. + + +### IRC, freenode, #hurd, 2014-01-18 + + damn, fakeroot-tcp won't work on darnassus .. + uh, looks like my tls/thread destruction "fixes" do cause + regressions :( + fakeroot works fine with debian glibc + which one ? + which fakeroot i mean + -tcp + yes, it fails as soon as i use the patched glibc :/ + at least it's easy to reproduce + + +### IRC, freenode, #hurd, 2014-01-20 + + great, 3rd libc version installed on darnassus, let's see if i can + build hurd packages against that + + +### IRC, freenode, #hurd, 2014-01-21 + + damn, fakeroot-tcp still crashes with my latest changes .... + + darnassus looks in good shape + youpi: ^ + youpi: if you have other tests, feel free to do them now + i feel confident about committing the changes, if you're ok with + it + which changes ? + I'm a bit lost in what you were talking about :) + you can find them in 2 patches in /var/tmp on darnassus + one is about fixing thread destruction + i'm pretty certain about this one so i'll commit it directly + the other is fixing the tcb of the main thread + +[[open_issues/libpthread]]. 
+ + where i simply do tcb->self = thread->kernel_thread :) + with a comment explaining why i don't do something else like + deallocating the unused tcb + braunr: ok, that looks good + braunr: awesome :) + youpi: ok + + +### IRC, freenode, #hurd, 2014-01-22 + + there, libpthread should be fine now + + +## IRC, freenode, #hurd, 2014-02-06 + + youpi: in case you're planning to upgrade glibc (or not), the + thread destruction changes are complete + youpi: darnassus has been running them for some weeks with no + visible regression + braunr: ok, good + including it in glibc was on my todo list indeed + and Adam indeed plan for a 2.18 upload + good :) + braunr: this is up to 7c6dc6e28b2fc4b67934223f41cf080ffe58b230, + right? (Wed Jan 22, Fix up the main thread TCB) + yes + oh, i just saw 2.17-98~0 glibc packages on debian-ports :) + yes, it's just to fix the dhcp crash + ah yes, it's not 2.18 + 2.18 is available in experimental + + braunr: just to make sure: did you have + 983b18a6ff16f5687a9ece63a50d1831dec88609 in libc on darnassus? + (which drops the stack size hack) + youpi: let me check + youpi: ah no, i don't, you're right + well, I was just wondering, nothing make me think that was the case + :) + what was the issue that it was raising btw? + threadvars + ok, but in which case?
+ (to make sure I test that before committing) + now that we switched to tls, i would assume the transition path to + be 1/ hurd stops defining that symbol, 2/ libpthread can stop using it + the goal was to reduce the stack size of hurd server threads + well, that's not my question :) I'm wondering in which precise case + that was breaking things + youpi: i don't know, it shouldn't break + ok + youpi: just in case, don't forget that last one line patch i + committed last night, fakeroot can't work right without it + (i made a minor change while reviewing before comitting, and + obviously got it wrong :p) + ok + + braunr: I've upgraded libpthread in debian's eglibc btw + + + /home/rbraun/devel/debian/packages/eglibc/eglibc-2.17/build-tree/hurd-i386-libc/libc.so.phdr: + *** executable stack signaled + from build-tree/hurd-i386-libc/elf/check-execstack.out + i thought glibc didn't use those + anyway it doesn't look to be the regression i'm having + does this ring a bell : + Encountered regressions that don't match expected failures + (debian/testsuite-checking/expected-results-i486-gnu-libc): + test-stpcpy_chk.out, Error 1 + TEST test-stpcpy_chk.out: __stpcpy_chk normal_stpcpy + simple_stpcpy_chk + nope + after what are you getting this regression? + building glibc 2.17-97 with thread destruction patches, including + the one removing the stack size hack + during tests + there also are "progressions", but i'm not sure what these are + some progressions are just luck, other seem to happen on some + platforms only + I'm not sure you want to test 2.17 + a lot has changed between 2.17's libpthread and 2.18's libpthread + (which is now equal to cvs's libpthread + ) + s/cvs/git/ + yes + i usually build with nocheck + + +## IRC, freenode, #hurd, 2014-02-07 + + youpi: on a vm with hurd 1:0.5.git20140203-1, upgrading to a + patched glibc 2.17-97 that includes the patch which reverts the stack + size hack, the system reboots and works fine + ok. 
I don't remember what problem I was seeing + that version of the hurd no longer defines the symbol + but even then, there shouldn't have been any problem + hm, or does it + yes, it does + youpi: the hurd package patch mentions + Revert this for now, will have to wait for dropping the use of + __pthread_stack_default_size from eglibc's + libpthread_hurd_cond_wait.diff + i wonder how it got there + IIRC I was wondering too + i've installed my c library on darnassus and it works fine there + too + with older (january) hurd packages + looks good to me + + +## IRC, freenode, #hurd, 2014-02-10 + + braunr: btw, do the new libc packages contain your thread + destruction work ? + teythoon: the -98 ones on experimental ? + i don't think they do + the -18 ones should do