From ea874bc0797b8a3e5dbac278178b74f777e08d2c Mon Sep 17 00:00:00 2001
From: guy fleury iteriteka
Date: Sat, 2 Jan 2021 12:12:09 +0200
Subject: rename open_issues/libpthread/t/fix_have_kernel_resources.mdwn -> open_issues/libpthread/fix_have_kernel_resources.mdwn
Message-Id: <20210102101217.8372-4-gfleury@disroot.org>
---
 open_issues/libpthread.mdwn                        |    4 +-
 .../libpthread/fix_have_kernel_resources.mdwn      | 1301 ++++++++++++++++++++
 .../libpthread/t/fix_have_kernel_resources.mdwn    | 1301 --------------------
 3 files changed, 1303 insertions(+), 1303 deletions(-)
 create mode 100644 open_issues/libpthread/fix_have_kernel_resources.mdwn
 delete mode 100644 open_issues/libpthread/t/fix_have_kernel_resources.mdwn

diff --git a/open_issues/libpthread.mdwn b/open_issues/libpthread.mdwn
index c628bc7b..f8d9e1f1 100644
--- a/open_issues/libpthread.mdwn
+++ b/open_issues/libpthread.mdwn
@@ -1310,7 +1310,7 @@ Most of the issues raised on this page has been resolved, a few remain.
     thhe hurd must be plagued with wrong deallocations :(
     i have so many problems when trying to cleanly destroy threads

-[[libpthread/t/fix_have_kernel_resources]].
+[[libpthread/fix_have_kernel_resources]].


 #### IRC, freenode, #hurd, 2013-11-25
@@ -1322,7 +1322,7 @@ Most of the issues raised on this page has been resolved, a few remain.
 #### IRC, freenode, #hurd, 2013-11-29

-See also [[open_issues/libpthread/t/fix_have_kernel_resources]].
+See also [[open_issues/libpthread/fix_have_kernel_resources]].

     there still are some leak ports making servers spawn threads with
       non-elevated priorities :/
diff --git a/open_issues/libpthread/fix_have_kernel_resources.mdwn b/open_issues/libpthread/fix_have_kernel_resources.mdwn
new file mode 100644
index 00000000..02b6ab05
--- /dev/null
+++ b/open_issues/libpthread/fix_have_kernel_resources.mdwn
@@ -0,0 +1,1301 @@
+[[!meta copyright="Copyright © 2012, 2013, 2014 Free Software Foundation,
+Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_libpthread]]
+
+`t/fix_have_kernel_resources`
+
+Address problem mentioned in [[/libpthread]], *Threads' Death*.
+
+
+# IRC, freenode, #hurd, 2012-08-30
+
+    tschwinge: this issue needs more cooperation with the kernel
+    tschwinge: i.e. the ability to tell the kernel where the stack is,
+      so it's unmapped when the thread dies
+    which requiring another thread to perform this deallocation
+
+
+## IRC, freenode, #hurd, 2013-05-09
+
+    braunr: Speaking of which, didn't you say you had another "easy"
+      task?
+    bddebian: make a system call that both terminates a thread and
+      releases memory
+    (the memory released being the thread stack)
+    this way, a thread can completely terminates itself without the
+      assistance of a managing thread or deferring work
+    braunr: That's "easy" ? :)
+    bddebian: since it's just a thread_terminate+vm_deallocate, it is
+      something like thread_terminate_self
+    But a syscall not an RPC right?
+    in hurd terminology, we don't make the distinction
+    the only real syscalls are mach_msg (obviously) and some to get
+      well known port rights
+    e.g. mach_task_self
+    everything else should be an RPC but could be a system call for
+      performance
+    since mach was designed to support clusters, it was necessary that
+      anything not strictly machine-local was an RPC
+    and it also helps emulation a lot
+    so keep doing RPCs :p
+
+
+## IRC, freenode, #hurd, 2013-05-10
+
+    i'm not sure it should only apply to self though
+    youpi: can we get a quick opinion on this please ?
+    i've suggested bddebian to work on a new RPC that both terminates
+      a thread and releases its stack to help fix libpthread
+    and initially, i thought of it as operating only on the calling
+      thread
+    do you see any reason to make it work on any thread ?
+    (e.g. a real thread_terminate + vm_deallocate)
+    (or any reason not to)
+    thread stack deallocation is always a burden indeed
+    I'd tend to think it'd be useful, but perhaps ask the list
+
+
+## IRC, freenode, #hurd, 2013-06-26
+
+    looks like there is a port right leak in libpthread
+    grmbl, the port leak seems to come from mach_port_destroy being
+      buggy :/
+    hum, apparently we're not the only ones to suffer from port leaks
+      wrt mach_port_destroy
+    ew, libpthread is leaking
+    memory or ports?
+    both
+    sounds great ;)
+    as it is, libpthread doesn't destroy threads
+    it queues them so they're recycled late
+    r
+    but there is confusion between the thread structure itself and its
+      internal resources
+    i.e.
there is pthread_alloc which allocates a thread structure,
+      and pthread_create which allocates everything else
+    but on pthread_exit, nothing is destroyed
+    when a thread structure is reused, its internal resources are
+      replaced by new instances
+    oh
+    it's ok for joinable threads but most of our threads are detached
+    pinotree: as expected, it's bigger than expected :p
+    so i won't be able to write a quick fix
+    the true way to fix this is make it possible for threads to free
+      their own resources
+    let's do that :p
+    ok, got the new thread termination function, i'll build eglibc
+      package providing it, then experiment with libpthread
+    braunr: iirc there's also a tschwinge patch in the debian eglibc
+      about that
+    ah
+    libpthread_fix.diff
+    i see
+    thanks for the notice
+    bddebian:
+      http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch
+    bddebian: this is what it looks like
+    see, short and easy
+    Aye but didn't youpi say not to bother with it??
+    he did ?
+    i don't remember
+    I thought that was the implication. Or maybe that was the one I
+      already did!?
+    i'd be interested in reading that
+    anyway, there still are problems in libpthread, and this call is
+      one building block to fix some of them
+    some important ones
+    (big leaks)
+
+
+## IRC, freenode, #hurd, 2013-06-29
+
+    damn, i fix leaks in libpthread, only to find out leaks somewhere
+      else :(
+    bddebian: ok, actually it was a bit more complicated than what i
+      showed you
+    because in addition to the stack, the call must also release the
+      send right in the caller's ipc space
+    (it can't be released before since there would be no mean to
+      reference the thread to destroy)
+    or perhaps it should strictly be reserved to self termination
+    hmm
+    yes it would probably be simpler
+    but it should be a decent compromise
+    i'm close to having a libpthread that doesn't leak anything
+    and that properly destroys threads and their resources
+
+
+## IRC, freenode, #hurd, 2013-06-30
+
+    bddebian: ok, it was even more tricky, because the kernel would
+      save the return value on the user stack (which is released by the call
+      and then invalid) before checking for asynchronous software traps (ASTs,
+      a kind of software interrupts in mach), and terminating the calling
+      thread is done by a deferred AST ... :)
+    hmm, making threads able to terminate themselves makes rpctrace a
+      bit useless :/
+    well, more restricted
+
+    ok so, tough question :
+    i have a small test program that creates a thread, and inspect its
+      state before any thread dies
+    i can see msg_report_wait requests when using ps
+    (one per thread)
+    one of these requests create a new receive right, apparently for
+      the second thread in the test program
+    each time i use ps, i can see the sequence numbers of two receive
+      rights increase
+    i guess these rights are related to proc and signal handling per
+      thread
+    but i can't find what create them
+    does anyone know ?
+    tschwing_: ^ :)
+
+    again, too many things wrong elsewhere to cleanly destroy threads
+      ..
+    something is deeply wrong with controlling terminals ..
+
+
+## IRC, freenode, #hurd, 2013-07-01
+
+    youpi: if you happen to notice what receive right is created for
+      each thread (beyond the obvious port used for blocking and waking up),
+      please let me know
+    it's the only port leak i have with thread destruction
+    and i think it's related to the proc server since i see the
+      sequence number increase every time i use ps
+
+    pinotree: my change doesn't fix all the pthread leaks but it's a
+      lot better
+    bddebian: i've spent almost the whole week end trying to find the
+      last port leak without success
+    there is some weird bug related to the controlling tty that hits
+      me every time i try to change something
+    it's the same bug that prevents ttys from being correctly closed
+      when using ssh or screen
+    well maybe not the same, but it's close
+    some stale receive right kept around for no apparent reason
+    and i can't find its source
+
+
+## IRC, freenode, #hurd, 2013-07-02
+
+    and btw, i don't think i can make my libpthread patch work
+    i'll just aim at avoiding leaks, but destroying threads and their
+      related resources depends on other changes i don't clearly see
+
+
+## IRC, freenode, #hurd, 2013-07-03
+
+    grmbl, i don't want to give up thread destruction ..
+
+
+## IRC, freenode, #hurd, 2013-07-15
+
+    btw, my work on thread destruction is currently stalled
+    i don't have much free time right now
+
+
+## IRC, freenode, #hurd, 2013-09-13
+
+    i think i know why my thread_terminate_deallocate patches leak one
+      receive port :>
+    but now i'm not sure of the proper solution
+    every time a thread is created and destroyed, a receive right is
+      leaked
+    i guess it's simply the reply port ..
+    grmbl
+    i guess i have to make it a simpleroutine ...
+    hm too bad, it's not the reply port :(
+    it's also leaking some memory
+    it doesn't seem related to my changes though
+    stacks, rights, and threads are correctly destroyed
+    some obscure state is left behind
+    i wonder how exception ports are dealt with
+    vminfo seems to confirm memory is leaking in the heap
+    humpf
+    oh silly me
+    i don't detach threads
+    well, detach them ;)
+    hm worse :p
+    now i get additional dead names
+    but it's a step forward
+
+
+## IRC, freenode, #hurd, 2013-09-16
+
+    that thread port leak is so strange
+    the leaked port seems to be created when the new thread starts
+      running
+    so it looks like a port the kernel would implicitely create
+    hm could it be a thread-specific reply port ?
+    ah, yes, there is one of those
+    how come mach/mig-reply.c in glibc isn't thread-safe ?
+    it is overriden by sysdeps/mach/hurd/img-reply.c I guess
+    which uses a threadvar for the mig reply port
+    oh
+    talking of which, there is also last_value in
+      sysdeps/mach/strerror_l.c
+    strerror_thread_freeres is supposed to get called, but who knows
+    it does look to be that port
+    iirc that's the issue which prevents from letting us make threads
+      exit on idleness?
+    one of them
+    ok
+    maybe the only one, yes
+    i see memory leaks but they could be related/normal
+    (i.e. not actual leaks)
+    on the other hand, i also can't boot a hurd with my patch
+    but i consider removing such leaks a priority
+    does anyone know the semantic difference between
+      __mig_put_reply_port and __mig_dealloc_reply_port ?
+    i guess __mig_dealloc_reply_port is actually a destruction
+      operation, right ?
+    AIUI, dealloc is used when one wants the port not to be reused at
+      all
+    because it has been used as a reference for something, and can
+      still be currently in use
+    while put_reply would be when we're really done with it, and won't
+      use it again, and can thus be used as such
+    or at least something like that
+    heh
+    __mig_dealloc_reply_port calls __mach_port_mod_refs, which is a
+      RPC, and creates a new reply port when destroying the current one
+    bah
+    that's fine, it's a deref of the old port, which is not in the
+      reply_port variable any more
+    it's fine, but still a leak
+    well, dealloc does not completely deallocs, yes
+    that's not really the problem here
+    i've introduced a case that wasn't considered at the time, namely
+      that a thread can destroy itself
+    we probably need another function to be called from the thread exit
+    i'll simply try with mach_port_destroy
+    mach_port_destroy seems to be a RPC too ...
+    grmbl
+    isn't there a trap version somehow ?
+    not in libc
+    erf
+    at least i know what's wrong now :)
+    there still is a small memory leak i have to investigate
+    but outside the stack
+    the stack, the thread name and the thread are correctly destroyed
+    slabinfo confirms only one port leak and nothing else is leaked
+    ok so the port leak was indeed the thread-specific reply port,
+      taken care of
+    there are also memory leaks too
+
+
+## IRC, freenode, #hurd, 2013-09-17
+
+    teythoon: on my side, i'm getting to know our threading
+      implementation better
+    closing to clean thread destruction
+    x15 ipc will hide reply ports ;p
+    memory leaks solved \o/
+    now, have to fix memory release when joining
+    proper reference counting on detach/join/exit, let's see how it
+      goes ..
+    seems to work fine
+
+
+## IRC, freenode, #hurd, 2013-09-18
+
+    ok i'll soon have gnumach and libc packages including proper
+      thread destruction :>
+    braunr: why did you have to touch gnumach?
+    to add a call allowing threads to release ports and memory
+    i.e.
their last self reference, their reply port and their stack
+    let me public my current patches
+    braunr: thread_commit_suicide ?
+    hehe
+    initially thread_terminate_self but
+    it can be used by other threads too
+    to i named it thread_terminate_release
+    http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch
+
+      http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch
+    the pthread patch needs to be polished because it changes the
+      semantics of pthread_thread_halt
+    but other than that, it should be complete
+    pthread_thread_halt_reallyhalt
+    ok let's try these libc packages
+    old static ext2fs for the root, but other than that, it boots
+    let's try iceweasel
+    (i'll need to build a hurd package against this new libc, removing
+      the libports_stability patch which prevents thread destruction in servers
+      on the way)
+    prevents thread destruction o_O
+    yes
+    in libports only ;p
+    oh, *only* in libports, I assumed for a moment that it affected
+      almost every component of the Hurd...
+    *phew(
+    ... :)
+    that's why, after a burst of messages, say because of aptitude
+      (select), you may see a few hundred threads still hanging around
+    also why unused servers remain running even after several minutes,
+      where the normal timeout is 2mins
+    I wondered about that, some servers (symlink comes to mind) seem
+      to go away if unused (or that's how I read the code)
+    symlinks are usually not servers, since most of them actually
+      exist in file systems, and are implemented through an optimization
+    yes I know that
+    trans/symlink.c reads:
+      /* The timeout here is 10 minutes */
+      err = mach_msg_server_timeout (fsys_server, 0, control,
+                                     MACH_RCV_TIMEOUT, 1000 * 60 * 10);
+      if (err == MACH_RCV_TIMED_OUT)
+        exit (0);
+    ok
+    hm, /hurd/symlink doesn't feel at all like a symlink... but
+      works like one
+    well, starting iceweasel makes X on my host freeze oO
+    bbl
+    /hurd/symlink translators do go away after being unused for 10
+      minutes...
this is funny if they are set up by hand instead of being
+      started from a passive translator record
+    magically vanishing symlinks ;)
+
+
+## IRC, freenode, #hurd, 2013-09-19
+
+    hum, i can't rebuild a hurd package :(
+    braunr: with your thread destruction patches in libc?
+    yes but it's unrelated
+    In file included from ../../libdiskfs/boot-start.c:38:0:
+    ./fsys_reply_U.h:173:15: error: conflicting types for
+      ‘fsys_get_children’
+    i didn't see a new libc debian release
+    hm, David reported that as well
+
+      id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com
+    uh oh
+    it seems I didn't add a _reply suffix to the reply routines :/
+    there's quite a bit of fallout from my patches, I kinda feel bad
+      :(
+    teythoon: what i'm wondering is what youpi did too, since he got
+      hurd binary packages
+    braunr: well neither he nor I noticed that b/c for us the
+      declarations were just missing
+    from libc you mean ?
+    or hum gnumach-common ?
+    not sure actually
+    no it's not a gnumach thing
+    hurd-dev then
+    the build system should have cought these, or mig...
+    also, i see you changed fsys_reply.defs, but nothing about
+      fsys_request.defs
+    I have no fsys_requests.defs
+    looks like there was no fsys_request.defs in the first place
+    ... *sigh*
+    do you know an application that often creates and destroys threads
+      ?
+    no, sorry
+    maybe some test suite
+    ah right
+    sysbench maybe
+    also, i've been hit by a lot more network deadlocks than usual
+      lately
+    fixing netdde has gained some priority in my todo list
+
+
+## IRC, freenode, #hurd, 2013-09-20
+
+    oh, git is multithreaded
+    great
+    so i've actually tested my libpthread patch quite a lot
+
+
+## IRC, freenode, #hurd, 2013-09-25
+
+    on a side note, i was able to build gnumach/libc/hurd packages
+      with thread destruction
+    nice :)
+    they boot and work mostly fine, although they add their own issues
+    e.g.
the comm field of the root ext2fs is empty
+    ps crashes when trying to display threads
+    but thread destruction actually works, i.e. servers (those that
+      are configured that away at least) go away after some time, and even
+      heavily used servers such as ext2fs dynamically scale over time :)
+
+
+## IRC, freenode, #hurd, 2013-10-10
+
+    concerning threads, i think i figured out the last bugs i had with
+      thread destruction
+    it should be well on its way to be merged by the end of the year
+
+
+## IRC, freenode, #hurd, 2013-10-11
+
+    braunr: is your thread destruction patch ready for testing?
+    gg0: there are packages at my repository, yes
+    but i still have hurd fixes to do before i polish it
+    in particular, posix says returning from main() stops the entire
+      process and all other threads
+    i didn't check that during the switch to pthreads, and ext2fs (and
+      maybe others) actually return from main but expect other threads to live
+      on
+    this creates problems when the main thread is actually destroyed,
+      but not the process
+    braunr: tmpfs does something like that, but calls pthread_exit
+      at the end of main
+    same effect
+    this was fine with cthreads, but must be changed with pthreads
+    and libpthread must be fixed to enforce it
+    (or libc)
+
+    diskfs_startup_diskfs should probably be changed to reuse the main
+      thread instead of returning
+
+
+## IRC, freenode, #hurd, 2013-10-19
+
+    I know what threads are, but what is 'thread destruction'?
+    the hurd currently never destroys individual threads
+    they're destroyed when tasks are destroyed
+    if the number of threads in a task peaks at a high number, say
+      thousands of them, they'll remain until the task is terminated
+    such tasks are usually file systems, normally never restarted (and
+      in the case of the root file system, not restartable)
+    this results in a form of leak
+    another effect of this leak is that servers which should go away
+      because of inactivity still remain
+    since thread destruction doesn't actually work, the debian package
+      uses a patch to prevent worker threads from timeouting
+    and to finish with, since thread destruction actually doesn't
+      work, normal (unpatched) applications that destroy threads are certainly
+      failing bad
+    i just need to polish a few things, wait for youpi to finish his
+      work on TLS to resolve conflicts, and that will be all
+
+
+## IRC, freenode, #hurd, 2013-10-30
+
+    FYI, the packages on my repository enable actual thread
+      destruction, and i've altered the libports_stability.patch
+    it nows only sets the global timeout to 0
+    now*
+    we actually can't let translator "die" on global timeout because
+      of a race issue
+    tested for about two weeks now and no major problem sighted
+    top reports processes running for 100% of their time when
+      terminating threads, but i expect it's simply mach/proc aggregating their
+      run time to the task
+    100% of cpu time
+
+
+## IRC, freenode, #hurd, 2013-11-08
+
+    teythoon: darnassus is currently running a modified glibc with
+      thread destruction, yes
+    braunr: did that require any fixups in Hurd that I'd have missed
+      ?
+    no
+    well
+    b/c the resulting hurd package would not boot
+    actually yes
+    one
+    i'll push the patch somewhere
+    iirc the mach-defpager spewed some error and /hurd/init failed
+      to bootstrap the system
+    teythoon:
+      http://darnassus.sceen.net/~rbraun/0001-Prevent-diskfs-translators-from-destroying-main-thre.patch
+    make sure you have the proper gnumach packages too :p
+    well, that could very well account for my trouble ;)
+    uh
+    well
+    gnumach implements thread destruction, glibc uses it, hurd makes
+      sure it doesn't exit from main
+
+
+## IRC, freenode, #hurd, 2013-11-12
+
+    ok so, calling pthread_exit() from main isn't the same as
+      returning from main()
+    unlike what some man pages seem to say
+    so loosing task info when destroying the main thread is actually a
+      proc bug
+    ugh
+    ^^
+    or a glibc one
+    the proc server, your favorite Hurd component...
+    :)
+    hm :/
+    looks like command line arguments are stored on the stack of the
+      main thread
+    and proc merely receives the addresses of those in the target task
+    why not just keep the main thread around?
+    it represents a minor resource leak, true
+    yes
+    that's the hack i suggested
+    but it is relatively small
+    well no
+    my hack was about diskfs translators
+    it should be generalized in libpthread
+    seems reasonable
+    let's do it >)
+
+
+## IRC, freenode, #hurd, 2013-11-13
+
+    braunr: there is a thread destruction issue in the experimental
+      ocaml build, worth looking at, probably
+    what do you mean ?
+    ... testing 'testfork.ml': ocamlcocamlrun:
+      ../libpthread/sysdeps/mach/pt-thread-halt.c:51: __pthread_thread_halt:
+      Unexpected error: (ipc/send) invalid destination port.
+    during the experimental ocaml build
+    well yes
+    thread recycling is buggy
+    i had the choice to fix it, or implement true destruction
+    i'm tweaking my patch so it leaves the main thread stack untouched
+      on destruction
+    and it should be ready
+    for review at least
+
+
+## IRC, OFTC, #debian-hurd, 2013-11-13
+
+    ironforge out of memory during ruby1.9.1 rebuild. during test which
+      creates 10000 threads
+    ironforge out of memory during ruby1.9.1 rebuild, test which creates
+      10000 threads
+    i guess ironforge kernel has been rebuilt against -95, correct?
+    err, what kernel?
+    23:37 < youpi> hurd needs a rebuild to be able to work with the newer
+      eglibc
+    i mean hurd
+    yes, libc0.3 breaks the old packages anyway
+    wrt ENOMEM, was it expected?
+    wrt disk problems, aren't there on alioth only?
+    well 10,000 threads is a lot, especially on 32bit machine with 2M
+      default stack size
+    that makes 2GiB stacks
+    can't fit in a 2/2 split model, which gnumach uses
+    well, though active thread should die right away, just after set x to
+      false, if i read it correctly
+    perhaps the stacks are not correctly reused
+    that's probably worth digging in libpthread
+    by putting printfs, etc.
+    it seems stacks are never reused indeed, damn
+    I just wrote a small test that creates threads which just print
+      their stack address
+    that takes just a few minutes to do
+    i see.
about reusage i guess you mean base address is kindof always
+      incremented
+    * gg0 likes being wrong
+    that's it, yes
+    gg0: take care, by keeping being wrong all the time, sometimes you
+      get right ;)
+    and you are definitely right here :)
+    Mmm, but the stack is really deallocated
+    and the numbers wrap around
+    I wonder how that is :)
+    ok, creating 20 000 threads does work
+    perhaps ruby does odd things which makes it not work
+
+
+### IRC, OFTC, #debian-hurd, 2013-11-14
+
+    UID PID PPID TH MSGI MSGO SZ RSS SC STAT TIME COMMAND
+    1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu 0:00.15
+      /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1
+      -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb
+    720 threads, stuck
+    2G SZ is very big :)
+    00:42 < youpi> perhaps ruby does odd things which makes it not work
+    is that enough to file a ruby bug? as ruby suggests itself btw
+    no, they will probably not be able to investigate
+    but you can already check out how they create threads
+    and try to reproduce the same with a small C program
+    ehm on ruby2.0 with *context _enabled_ i can not reproduce it
+
+See [[/open_issues/glibc]] for `*context` functions.
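The small test mentioned in the 2013-11-13 log (threads that just print their stack address) could be sketched roughly as follows. This is not the program that was actually written, which the log does not show; the names `report_stack` and `run_stack_test` are made up for illustration, and only standard POSIX thread calls are used. Each thread records the address of a local variable, which lies inside its own stack, and is joined before the next one is created, so a stack that is properly released can be reused by the next thread.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread body: store the address of a local variable, which lies
   somewhere inside this thread's stack, into the caller's slot.  */
static void *
report_stack (void *arg)
{
  int marker = 0;
  *(void **) arg = (void *) &marker;
  return NULL;
}

/* Hypothetical helper: create and join COUNT threads one after
   another, printing the stack address each one observed.  Returns
   the number of threads successfully created and joined.  */
static int
run_stack_test (int count)
{
  int done = 0;

  for (int i = 0; i < count; i++)
    {
      pthread_t tid;
      void *addr = NULL;

      if (pthread_create (&tid, NULL, report_stack, &addr) != 0)
        break;                  /* e.g. EAGAIN once resources run out */
      if (pthread_join (tid, NULL) != 0)
        break;
      printf ("thread %d: stack near %p\n", i, addr);
      done++;
    }
  return done;
}
```

Calling `run_stack_test (20000)` from `main` and building with `-pthread` gives a quick way to see whether the printed addresses repeat or keep moving between threads. Because each thread is joined before the next starts, at most one extra stack should be live at a time, which is a different situation from keeping 10000 threads with 2M stacks alive at once.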
+
+
+## IRC, freenode, #hurd, 2013-11-14
+
+    nice, i got glibc packages with thread destruction
+    building hurd packages against it now
+    everything seems fine
+    hurd packages ready, let's see
+
+    ruby1.9.1 FTBFS due to a couple of tests
+    https://buildd.debian.org/status/fetch.php?pkg=ruby1.9.1&arch=hurd-i386&ver=1.9.3.448-1&stamp=1384265526
+    second one creates 10000 threads and machine got ENOMEM
+    bootstraptest.tmp.rb: [BUG] [BUG] pthread_cond_init: Cannot
+      allocate memory (ENOMEM)
+    ew
+    few hours ago trying to reproduce it:
+    01:20 < gg0> UID PID PPID TH MSGI MSGO SZ RSS SC STAT
+      TIME COMMAND
+    01:20 < gg0> 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu
+      0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1
+      -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb
+    yes that's expected
+    our stacks are 2M
+    10k threads means right over 2G of stacks
+    userspace is restricted to 2G
+    but if i read correctly test in question, thread should just set x to
+      false then die
+    so ?
+    and ENOMEM popped upk when there were thread count was at 720
+    hum
+    10k threads would actually be 20G
+    1k threads is 2G
+    720 is about 1.5G
+    the rest is probably the ruby runtime
+    youpi tried to create 10000 thread, no problem. he guessed something
+      wrong on ruby side
+    indeed on ruby2.0 such test succeeds
+    you can't create 10k threads unless you change the stack size
+    hurd servers use a stack size of 64k by default which allows them
+      to go up to 30k iirc
+    but normal applications use the default 2M
+    i guess you mean 10000 threads active at the same time. test in
+      question should make them die after simply setting x to false, i guess
+      youpi's test did so as well
+    no
+    it's about stacks
+    hm
+    yes at the same time but
+    thread recycling is known to be buggy
+    which is what i'm currently fixing btw
+    what's the bug?
+    neal: there are several subtle issues
+    for example, joining a thread that is also calling pthread_exit
+      can fail badly
+    hmm
+    good that you are on it then :)
+    or detaching
+    i don't remember the details
+    but i remember such problems
+    apparently, keeping the stack of the main thread isn't enough
+    :(
+    for now, i'll keep the entire thread
+
+
+## IRC, freenode, #hurd, 2013-11-15
+
+    i wasn't doing anything, just some single test runs. but yes, also
+      that one which creates hundreds of threads
+    it would like creating 10000 but goes out of memory after ~720
+    btw same tests succeed on ruby2.0, so they should be fixed by
+      backporting some changes
+    actually it looks more like a deadlock ..
+    deadlock that says ENOMEM?
+    ?
+    ENOMEM is returned because the test task has no more virtual
+      memory
+    this doesn't mean the rest of the system should fail
+    ok i thought you were talking about such test
+    no it's something else
+    a deadlock in a critical server
+    the root file system maybe
+    braunr: htop and ps hang.
just run the test once again
+    now you should still be able to login
+    htop/ps hanging means one process is unable to reply to queries
+      sent to the message port/thread
+    procfs does that to report on what a process is waiting
+    it usually mean there is a bug around signals, since the message
+      thread is also in charge of delivering signals
+    use ps -eM
+    and kill -KILL
+    hum
+    root 954 S dumping cores is known not to work most of the time
+    exodar shouldn't be configured like that
+    so yes, the crash server is hanging
+    gg0: i've set it to crash --kill and killed the hanging crash
+      instances blocking top/ps
+    nice
+
+    my thread destruction patch and tls are indeed conflicting a bit
+    i suspect the tcb is used after being freed
+    i think i'll simply recycle the tcb, along with the pthread
+      structs
+    ok i think it's fine now
+    there was also a small bug in the tls code, keeping a reference on
+      the thread port
+    mach reference counting is so counter intuitive :/
+    well, error-prone
+
+    argh, more bugs in libc :(
+    :/
+    but don't worry, there is always one more bug ;)
+    this one might explain crashes that are long to trigger
+    _hurd_self_sigstate() is implemented like this :
+    _hurd_thread_sigstate (__mach_thread_self ());
+    it leaks a reference on the current thread each time it's called
+    >,<
+    but glibc maintains such references, so if the maximum value is
+      reached, and references are dropped, the value can reach 0
+    ouch
+    at which point any call on a thread will result in an invalid send
+      right
+    and probably an assertion
+    well it's a good thing then that you found it :)
+    i think it's always been there
+    but it's more apparent since jknoenig's patch on signal
+      dispositions
+    the maximum number of user references in mach is 64k
+    this right leak isn't easy
+    tls is very tricky heh :)
+    for the main thread, tls initialization happens after the thread
+      creation, obviously
+    but for other threads, it's initialized before starting them
+    the leak was probably
an overlook caused by that complexity
+    teythoon: actually that leak i mentioned in _hurd_self_sigstate
+      has only been recently added in Convert sigstate to TLS
+    so it's merely tls integration polishing
+    youpi: i'm currently reviewing changes related to tls and i think
+      there is a bug in _hurd_self_sigstate
+    calls to mach_thread_self() should be paired with
+      mach_port_deallocate to avoid urefs overflows
+    and right leaks
+    _hurd_critical_section_lock is probably affected too
+    hm
+    mhmm
+    in glibc, hurd/hurd/signal.h, _hurd_critical_section_lock
+    why is the sigstate unlocked after the call to
+      _hurd_thread_sigstate
+    _hurd_thread_sigstate doesn't seem to lock it ..
+    unless __spin_lock_init does it
+    yes, leak solved :)
+
+
+## IRC, freenode, #hurd, 2013-11-16
+
+    argh, _hurd_critical_section_lock is called before the send right
+      on the main thread is fetched in libpthread :/
+    is that bad ?
+    the sigstate is supposed to be initialized after pthreads
+    _hurd_critical_section_lock will create it if it sees there is
+      none
+    creating the sigstate is currently what makes the send right leak
+    ok
+    it's bad then
+    it may be due to my patch
+    _hurd_critical_section_lock is called during pthreads
+      initializatio
+    n
+    before the sigstate for the main thread is created, but after the
+      pthread init routine is called
+    it does indeed look like the code wasn't written with thread being
+      destroyed some day in mind :/
+    braunr: btw, if you ever feel like benchmarking, sysbench has a
+      benchmark for threads contending for a lock
+    yes i've used it before
+    was it useful for this purpose ?
+    no :)
+    :/
+    we already know libpthread isn't optimized
+    and felt it when we switched from cthreads
+    humpf
+    simply calling malloc implies a call to
+      _hurd_critical_section_lock
+    on the other hand, unlike what some glibc comments say, this does
+      work
+
+
+## IRC, freenode, #hurd, 2013-11-17
+
+    looks like i've fixed all leak issues with thread destruction and
+      tls :)
+    let's see if ext2fs.static works fine too
+    braunr: \o/
+    sorry about introducing the tls ones :)
+    no worries, it was expected
+    and tls was really needed :)
+    i mean, i expected to have some problems when rebasing on tls :p
+    braunr: this is good news, how is your rootfs translator holding
+      up?
+    building hurd packages right now
+    for now, only test applications and a few really multithreaded
+      ones (e.g. iceweasel) have been tested
+    well, the system boots :)
+    awesome :)
+    stressing the file system with git while watching youtube videos
+      with gnash doesn't make the system crash
+    you can actually watch yt videos on your Hurd box ?
+    yes
+    for a while now
+    o_O
+    can't you ?
+    I never even dared to try
+    hehe
+    teythoon: looks stable enough to install on darnassus
+
+
+## IRC, freenode, #hurd, 2013-11-18
+
+    braunr: wrt to your thread destruction patchset, I thought you
+      also had to fix the proc server ?
+    teythoon: no
+    the problem was in glibc
+    i may have to fix proc/procfs though, because cpu time gets wrong
+      with the patch
+    currently, it's the addition of the cpu time of all threads
+    mach provides aggregate times including destroyed threads though
+    ah, I see
+    one side effect is that you'll see processes sometimes taking 100%
+      of cpu time although the cpu is unused
+    or the cpu time of a process gets reduced :)
+    i guess the 100% cpu is how top sees a negative increment
+    ^^
+    gg0: do my threadterm packages help with ruby1.9 ?
+    i mean, can you test with them some time ?
:) + + +## IRC, freenode, #hurd, 2013-11-21 + + youpi: ping about my question regarding error handling in the + proposed thread_terminate_release call + I agree with what Neal said + he didn't say anything about error handling + see + http://lists.gnu.org/archive/html/bug-hurd/2013-11/msg00181.html + i think i should make the call fail on first error + it shouldn't happen, so it would merely serve to catch bugs + it's not easily recoverable (if it's recoverable at all) + uh, I thought he had + I must have dreamt + + i think i'll go ahead with thread destruction integration + + +## IRC, freenode, #hurd, 2013-11-25 + + i've pushed the thread destruction patches for gnumach upstream + and made a branch in glibc for that too + awesome :) + youpi: i don't remember how glibc changes should be managed + once those are applied, i'll commit in libpthread + braunr: usually we create a topgit branch, and then we add the + patch from that to the debian repository + + +## IRC, freenode, #hurd, 2013-11-29 + + youpi: i still have a leak somewhere with the thread destruction + patches + maybe on the host priv port in bootstrap servers (root fs and proc + server) + it prevents priority adjusting in libports and can easily bring + down a system because servers can start trashing a lot sooner, as it was + the case during the pthread migration + +See discussion about that on [[/open_issues/libpthread]]. + + so i'll hunt it down before merging + + +## IRC, freenode, #hurd, 2013-12-19 + + darnassus still has the libports priority adjustement leaks + i'll apply a few more patches to my hurd packages + + humpf, proc seems to have a problem getting the host priv port :/ + thats bad + what did you do ? + i fixed all the leaks in libports when adjusting priorities + the last one being releasing the host priv right + and i get errors at boot time from the proc server + remember when i had this problem ? 
+ proc doesn't get the host priv port the normal way since the + normal way is to get it from proc iirc + ah, thought you fixed that + so i guess the alternate way doesn't add a reference + well the leak is fixed + the problem you had was due to the leak which made the host priv + port reach its max uref value + now it's just the proc server + the system works fine though + for real ? + the proc server needs the host priv port for getting the new + tasks + well yes + how can it work w/o it ? + i don't know .. + i guess the problem is internal to glibc + i mean, get_priv_ports fails, but that doesn't mean the host priv + port is lost + could be + are you running a patched rootfs translator too ? + yes + ok + b/c i remember having trouble with that + right, the glibc call would make proc call __proc_getprivports + hum + teythoon: do you remember how proc gets its host priv port ? + from init + i think + startup_procinit ? + possibly + right + so it's probably not the host priv port + i mean, the error is about another invalid send right + hm nope, it is on host_priv :/ + hm ok i see, looks like a bug from a debian patch + or rather, a bug fix not yet imported into the debian package + teythoon: you actually fixed it in + 2c9422595f41635e2f4f7ef1afb7eece9001feae + great :) + ah, that one + i was looking at the upstream code and couldn't understand what + was going wrong + :) + much better + except ps -eT doesn't work any more .. + interestingly, with the thread destruction patch, ps -eT sometimes + work, and sometimes doesn't + the behaviour doesn't seem to change without a reboot + and of course, as soon as i say it, i'm proven wrong by the next + test :) + + +## IRC, freenode, #hurd, 2013-12-26 + + __pthread_sigstate_init doesn't seem to be converted to TLS in the + upstream repository master branch + + ah dammit, the global signal dispositions patch touches both glibc + and libpthread @#! 
+ what a mess + + youpi: do you have some time to quickly review the + rbraun/thread_destruction branch in libpthread ? + there might be conflict with some glibc patches + or do you prefer it on the mailing list ? + (i used a branch because it's not based on master) + rather mail the list, yes + ok + it'd also be useful to write the rationale + probably to be left as comment in the source code + yes, that branch was for personal storage :) + so the reader knows how things are recycled or not + hm + that should already be the case + ok + the two structures that are still recycled are the pthread struct + and tls + it's quite obvious from pthread_alloc + and well commented there + for tls, it's explained in pthread_exit + + there, thread destruction finally merged in + and now, we can remove the ugly hacks that were done for + threadvars + :) + change stacks at will and support all sorts of weird languages and + runtimes + braunr: cool :) + + +## IRC, freenode, #hurd, 2013-12-31 + + braunr: I've added sigstate_locking, sigstate_thread_reference and + tls_thread_leak to the debian glibc 2.18 package + I believe that's complete? + is mach_msg_uspace_options ready for being added? Does it bring + much speedup? + AIUI, thread_terminate_release is the union of the branches + mentioned above? + (I'm cleaning up branches in the glibc repo) + youpi1: mach_msg_uspace_options can be left over, it only affects + selects and not noticeably + yes, those three branches are the only ones needed for thread + destruction + ok + does the hurd changes depend on these changes ? + no + good :) + only on tls for one of them + (it's about the default stack size of 64k for hurd servers) + and we have had this in debian for a long time already :) + yes + (how big were they before?) + (where they a couple MiB, and thus exploding to GiBs on thousands + of threads?) 
+ 64k + pthread stacks are 2M by default + yes + + +## IRC, freenode, #hurd, 2014-01-14 + + braunr: it seems your time change in libps made ps produce odd + results + samy 10987 5 -514358:-18:-42.17 /hurd/firmlink tmp + youpi: wow :) + that change is supposed to run on a system where threads actually + get destroyed + but i don't see what could trigger this side effect + root 8629 664 56 years make -j 3 + :) + heh + youpi: does the hurd package on darnassus include that patch ? + yes + i don't reproduce the problem :/ + err + what command are you using ? + ps -feM on darnassus + root 29642 473 7 months /usr/sbin/sshd -R + hmmmm + i don't see it with a make -j + well, it's not systematic + it's like once over two launches + hhhhmmmmm + it'd look like some random numbers get added + strangely, the gcc processes started by a recursive make aren't + children of make .. + ps -eF hurd seems to report the correct values + even ps -eM + oO + ps -ef too + the problem seems to be with ps -efM + too bad I'm always using that :) + another way to see it is that it makes us spot the issue ;p + + +### IRC, freenode, #hurd, 2014-01-15 + + ok i have an idea of what goes wrong in libps + + youpi: for some reason, ps -efM lacks the PSTAT_TASK_BASIC flag + my patch is wrong since it doesn't try to determine whether the + stats apply to a task or a thread, but that is easy to fix + ps -efM should nonetheless provide basic task info, obviously + in addition, the problems i've observed with ps -T (occasional + segfaults) seem to have existed before thread destruction + they're just strongly exposed now that the thread list can be + shrunk + + libps is quite complicated + even hairy, i'd say ..
+ + +### IRC, freenode, #hurd, 2014-01-16 + + youpi: i think i have a proper fix for libps + i'll commit it soon + ok + basically, getting system times simply sets the PSTAT_THREAD_BASIC + flag + whereas getting the run time of the terminated threads requires + PSTAT_TASK_BASIC + i assumed it was always set in the function i changed when dealing + with a task and not a thread + and well, that was a wrong assumption, -M can remove it if not + strictly needed by the format + the default format asks for suspend_count, which forces the + retrieval of task basic info, so it works with -eM + but -f doesn't :) + so an extremely unlucky combination of flags :) + indeed + i added a pstat_times using the last (!) available flag bit + looks clean to me + i hope there is no abi issue + (at least everything works with the unmodified ps-hurd executable + and a new libps.so) + + hm, small bug in the thread destruction patch :/ + + +### IRC, freenode, #hurd, 2014-01-17 + + good, i have proper fixes for tls in the main thread and thread + termination :) + awesome :) + i've been wondering, what does it take to get the thread + destruction stuff into the debian package ? + i still have to build test packages, look for (unlikely, heh) + regressions and work some integration details with samuel + hum the main thread tls fixup i guess + youpi was waiting for me to fix that + gnumach already provides the RPC + so it will be in glibc soon + i just have to get those last bits right + teythoon: i'm quite slow at integrating stuff + and samuel then builds packages ? + i mean, is our libc package build linked to the other libc + packages ? + libpthread is applied as a patch to glibc + and loaded as a plugin + + +## IRC, freenode, #hurd, 2014-01-17 + + uhm, did we break fakeroot-tcp ? + we did ?
+ fakeroot-tcp just works fine on buildds + with fakeroot-tcp, i get + make[4]: Entering directory + `/home/rbraun/devel/debian/packages/hurd/hurd-0.5.git20140113/libdde-linux26/contrib/include' + rm -f .general.d + make[4]: *** [cleanall] Killed + when cleaning the package before building .. + + +### IRC, freenode, #hurd, 2014-01-18 + + damn, fakeroot-tcp won't work on darnassus .. + uh, looks like my tls/thread destruction "fixes" do cause + regressions :( + fakeroot works fine with debian glibc + which one ? + which fakeroot i mean + -tcp + yes, it fails as soon as i use the patched glibc :/ + at least it's easy to reproduce + + +### IRC, freenode, #hurd, 2014-01-20 + + great, 3rd libc version installed on darnassus, let's see if i can + build hurd packages against that + + +### IRC, freenode, #hurd, 2014-01-21 + + damn, fakeroot-tcp still crashes with my latest changes .... + + darnassus looks in good shape + youpi: ^ + youpi: if you have other tests, feel free to do them now + i feel confident about committing the changes, if you're ok with + it + which changes ? + I'm a bit lost in what you were talking about :) + you can find them in 2 patches in /var/tmp on darnassus + one is about fixing thread destruction + i'm pretty certain about this one so i'll commit it directly + the other is fixing the tcb of the main thread + +[[open_issues/libpthread]]. 
+ + where i simply do tcb->self = thread->kernel_thread :) + with a comment explaining why i don't do something else like + deallocating the unused tcb + braunr: ok, that looks good + braunr: awesome :) + youpi: ok + + +### IRC, freenode, #hurd, 2014-01-22 + + there, libpthread should be fine now + + +## IRC, freenode, #hurd, 2014-02-06 + + youpi: in case you're planning to upgrade glibc (or not), the + thread destruction changes are complete + youpi: darnassus has been running them for some weeks with no + visible regression + braunr: ok, good + including it in glibc was on my todo list indeed + and Adam indeed plans for a 2.18 upload + good :) + braunr: this is up to 7c6dc6e28b2fc4b67934223f41cf080ffe58b230, + right? (Wed Jan 22, Fix up the main thread TCB) + yes + oh, i just saw 2.17-98~0 glibc packages on debian-ports :) + yes, it's just to fix the dhcp crash + ah yes, it's not 2.18 + 2.18 is available in experimental + + braunr: just to make sure: did you have + 983b18a6ff16f5687a9ece63a50d1831dec88609 in libc on darnassus? + (which drops the stack size hack) + youpi: let me check + youpi: ah no, i don't, you're right + well, I was just wondering, nothing makes me think that was the case + :) + what was the issue that it was raising btw? + threadvars + ok, but in which case?
+ (to make sure I test that before committing) + now that we switched to tls, i would assume the transition path to + be 1/ hurd stops defining that symbol, 2/ libpthread can stop using it + the goal was to reduce the stack size of hurd server threads + well, that's not my question :) I'm wondering in which precise case + that was breaking things + youpi: i don't know, it shouldn't break + ok + youpi: just in case, don't forget that last one line patch i + committed last night, fakeroot can't work right without it + (i made a minor change while reviewing before committing, and + obviously got it wrong :p) + ok + + braunr: I've upgraded libpthread in debian's eglibc btw + + + /home/rbraun/devel/debian/packages/eglibc/eglibc-2.17/build-tree/hurd-i386-libc/libc.so.phdr: + *** executable stack signaled + from build-tree/hurd-i386-libc/elf/check-execstack.out + i thought glibc didn't use those + anyway it doesn't look to be the regression i'm having + does this ring a bell : + Encountered regressions that don't match expected failures + (debian/testsuite-checking/expected-results-i486-gnu-libc): + test-stpcpy_chk.out, Error 1 + TEST test-stpcpy_chk.out: __stpcpy_chk normal_stpcpy + simple_stpcpy_chk + nope + after what are you getting this regression? + building glibc 2.17-97 with thread destruction patches, including + the one removing the stack size hack + during tests + there also are "progressions", but i'm not sure what these are + some progressions are just luck, others seem to happen on some + platforms only + I'm not sure you want to test 2.17 + a lot has changed between 2.17's libpthread and 2.18's libpthread + (which is now equal to cvs's libpthread + ) + s/cvs/git/ + yes + i usually build with nocheck + + +## IRC, freenode, #hurd, 2014-02-07 + + youpi: on a vm with hurd 1:0.5.git20140203-1, upgrading to a + patched glibc 2.17-97 that includes the patch which reverts the stack + size hack, the system reboots and works fine + ok.
I don't remember what problem I was seeing + that version of the hurd no longer defines the symbol + but even then, there shouldn't have been any problem + hm, or does it + yes, it does + youpi: the hurd package patch mentions + Revert this for now, will have to wait for dropping the use of + __pthread_stack_default_size from eglibc's + libpthread_hurd_cond_wait.diff + i wonder how it got there + IIRC I was wondering too + i've installed my c library on darnassus and it works fine there + too + with older (january) hurd packages + looks good to me + + +## IRC, freenode, #hurd, 2014-02-10 + + braunr: btw, do the new libc packages contain your thread + destruction work ? + teythoon: the -98 ones on experimental ? + i don't think they do + the -18 ones should do diff --git a/open_issues/libpthread/t/fix_have_kernel_resources.mdwn b/open_issues/libpthread/t/fix_have_kernel_resources.mdwn deleted file mode 100644 index 02b6ab05..00000000 --- a/open_issues/libpthread/t/fix_have_kernel_resources.mdwn +++ /dev/null @@ -1,1301 +0,0 @@ -[[!meta copyright="Copyright © 2012, 2013, 2014 Free Software Foundation, -Inc."]] - -[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable -id="license" text="Permission is granted to copy, distribute and/or modify this -document under the terms of the GNU Free Documentation License, Version 1.2 or -any later version published by the Free Software Foundation; with no Invariant -Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled [[GNU Free Documentation -License|/fdl]]."]]"""]] - -[[!tag open_issue_libpthread]] - -`t/fix_have_kernel_resources` - -Address problem mentioned in [[/libpthread]], *Threads' Death*. - - -# IRC, freenode, #hurd, 2012-08-30 - - tschwinge: this issue needs more cooperation with the kernel - tschwinge: i.e. 
the ability to tell the kernel where the stack is, - so it's unmapped when the thread dies - which requiring another thread to perform this deallocation - - -## IRC, freenode, #hurd, 2013-05-09 - - braunr: Speaking of which, didn't you say you had another "easy" - task? - bddebian: make a system call that both terminates a thread and - releases memory - (the memory released being the thread stack) - this way, a thread can completely terminates itself without the - assistance of a managing thread or deferring work - braunr: That's "easy" ? :) - bddebian: since it's just a thread_terminate+vm_deallocate, it is - something like thread_terminate_self - But a syscall not an RPC right? - in hurd terminology, we don't make the distinction - the only real syscalls are mach_msg (obviously) and some to get - well known port rights - e.g. mach_task_self - everything else should be an RPC but could be a system call for - performance - since mach was designed to support clusters, it was necessary that - anything not strictly machine-local was an RPC - and it also helps emulation a lot - so keep doing RPCs :p - - -## IRC, freenode, #hurd, 2013-05-10 - - i'm not sure it should only apply to self though - youpi: can we get a quick opinion on this please ? - i've suggested bddebian to work on a new RPC that both terminates - a thread and releases its stack to help fix libpthread - and initially, i thought of it as operating only on the calling - thread - do you see any reason to make it work on any thread ? - (e.g. 
a real thread_terminate + vm_deallocate) - (or any reason not to) - thread stack deallocation is always a burden indeed - I'd tend to think it'd be useful, but perhaps ask the list - - -## IRC, freenode, #hurd, 2013-06-26 - - looks like there is a port right leak in libpthread - grmbl, the port leak seems to come from mach_port_destroy being - buggy :/ - hum, apparently we're not the only ones to suffer from port leaks - wrt mach_port_destroy - ew, libpthread is leaking - memory or ports? - both - sounds great ;) - as it is, libpthread doesn't destroy threads - it queues them so they're recycled later - but there is confusion between the thread structure itself and its - internal resources - i.e. there is pthread_alloc which allocates a thread structure, - and pthread_create which allocates everything else - but on pthread_exit, nothing is destroyed - when a thread structure is reused, its internal resources are - replaced by new instances - oh - it's ok for joinable threads but most of our threads are detached - pinotree: as expected, it's bigger than expected :p - so i won't be able to write a quick fix - the true way to fix this is to make it possible for threads to free - their own resources - let's do that :p - ok, got the new thread termination function, i'll build eglibc - package providing it, then experiment with libpthread - braunr: iirc there's also a tschwinge patch in the debian eglibc - about that - ah - libpthread_fix.diff - i see - thanks for the notice - bddebian: - http://www.sceen.net/~rbraun/0001-thread_terminate_deallocate.patch - bddebian: this is what it looks like - see, short and easy - Aye but didn't youpi say not to bother with it?? - he did ? - i don't remember - I thought that was the implication. Or maybe that was the one I - already did!?
- i'd be interested in reading that - anyway, there still are problems in libpthread, and this call is - one building block to fix some of them - some important ones - (big leaks) - - -## IRC, freenode, #hurd, 2013-06-29 - - damn, i fix leaks in libpthread, only to find out leaks somewhere - else :( - bddebian: ok, actually it was a bit more complicated than what i - showed you - because in addition to the stack, the call must also release the - send right in the caller's ipc space - (it can't be released before since there would be no mean to - reference the thread to destroy) - or perhaps it should strictly be reserved to self termination - hmm - yes it would probably be simpler - but it should be a decent compromise - i'm close to having a libpthread that doesn't leak anything - and that properly destroys threads and their resources - - -## IRC, freenode, #hurd, 2013-06-30 - - bddebian: ok, it was even more tricky, because the kernel would - save the return value on the user stack (which is released by the call - and then invalid) before checking for asynchronous software traps (ASTs, - a kind of software interrupts in mach), and terminating the calling - thread is done by a deferred AST ... :) - hmm, making threads able to terminate themselves makes rpctrace a - bit useless :/ - well, more restricted - - ok so, tough question : - i have a small test program that creates a thread, and inspect its - state before any thread dies - i can see msg_report_wait requests when using ps - (one per thread) - one of these requests create a new receive right, apparently for - the second thread in the test program - each time i use ps, i can see the sequence numbers of two receive - rights increase - i guess these rights are related to proc and signal handling per - thread - but i can't find what create them - does anyone know ? - tschwing_: ^ :) - - again, too many things wrong elsewhere to cleanly destroy threads - .. 
- something is deeply wrong with controlling terminals .. - - -## IRC, freenode, #hurd, 2013-07-01 - - youpi: if you happen to notice what receive right is created for - each thread (beyond the obvious port used for blocking and waking up), - please let me know - it's the only port leak i have with thread destruction - and i think it's related to the proc server since i see the - sequence number increase every time i use ps - - pinotree: my change doesn't fix all the pthread leaks but it's a - lot better - bddebian: i've spent almost the whole week end trying to find the - last port leak without success - there is some weird bug related to the controlling tty that hits - me every time i try to change something - it's the same bug that prevents ttys from being correctly closed - when using ssh or screen - well maybe not the same, but it's close - some stale receive right kept around for no apparent reason - and i can't find its source - - -## IRC, freenode, #hurd, 2013-07-02 - - and btw, i don't think i can make my libpthread patch work - i'll just aim at avoiding leaks, but destroying threads and their - related resources depends on other changes i don't clearly see - - -## IRC, freenode, #hurd, 2013-07-03 - - grmbl, i don't want to give up thread destruction .. - - -## IRC, freenode, #hurd, 2013-07-15 - - btw, my work on thread destruction is currently stalled - i don't have much free time right now - - -## IRC, freenode, #hurd, 2013-09-13 - - i think i know why my thread_terminate_deallocate patches leak one - receive port :> - but now i'm not sure of the proper solution - every time a thread is created and destroyed, a receive right is - leaked - i guess it's simply the reply port .. - grmbl - i guess i have to make it a simpleroutine ... 
- hm too bad, it's not the reply port :( - it's also leaking some memory - it doesn't seem related to my changes though - stacks, rights, and threads are correctly destroyed - some obscure state is left behind - i wonder how exception ports are dealt with - vminfo seems to confirm memory is leaking in the heap - humpf - oh silly me - i don't detach threads - well, detach them ;) - hm worse :p - now i get additional dead names - but it's a step forward - - -## IRC, freenode, #hurd, 2013-09-16 - - that thread port leak is so strange - the leaked port seems to be created when the new thread starts - running - so it looks like a port the kernel would implicitly create - hm could it be a thread-specific reply port ? - ah, yes, there is one of those - how come mach/mig-reply.c in glibc isn't thread-safe ? - it is overridden by sysdeps/mach/hurd/mig-reply.c I guess - which uses a threadvar for the mig reply port - oh - talking of which, there is also last_value in - sysdeps/mach/strerror_l.c - strerror_thread_freeres is supposed to get called, but who knows - it does look to be that port - iirc that's the issue which prevents us from making threads - exit on idleness? - one of them - ok - maybe the only one, yes - i see memory leaks but they could be related/normal - (i.e. not actual leaks) - on the other hand, i also can't boot a hurd with my patch - but i consider removing such leaks a priority - does anyone know the semantic difference between - __mig_put_reply_port and __mig_dealloc_reply_port ? - i guess __mig_dealloc_reply_port is actually a destruction - operation, right ?
- AIUI, dealloc is used when one wants the port not to be reused at - all - because it has been used as a reference for something, and can - still be currently in use - while put_reply would be when we're really done with it, and won't - use it again, and can thus be used as such - or at least something like that - heh - __mig_dealloc_reply_port calls __mach_port_mod_refs, which is a - RPC, and creates a new reply port when destroying the current one - bah - that's fine, it's a deref of the old port, which is not in the - reply_port variable any more - it's fine, but still a leak - well, dealloc does not completely deallocs, yes - that's not really the problem here - i've introduced a case that wasn't considered at the time, namely - that a thread can destroy itself - we probably need another function to be called from the thread exit - i'll simply try with mach_port_destroy - mach_port_destroy seems to be a RPC too ... - grmbl - isn't there a trap version somehow ? - not in libc - erf - at least i know what's wrong now :) - there still is a small memory leak i have to investigate - but outside the stack - the stack, the thread name and the thread are correctly destroyed - slabinfo confirms only one port leak and nothing else is leaked - ok so the port leak was indeed the thread-specific reply port, - taken care of - there are also memory leaks too - - -## IRC, freenode, #hurd, 2013-09-17 - - teythoon: on my side, i'm getting to know our threading - implementation better - closing to clean thread destruction - x15 ipc will hide reply ports ;p - memory leaks solved \o/ - now, have to fix memory release when joining - proper reference counting on detach/join/exit, let's see how it - goes .. - seems to work fine - - -## IRC, freenode, #hurd, 2013-09-18 - - ok i'll soon have gnumach and libc packages including proper - thread destruction :> - braunr: why did you have to touch gnumach? - to add a call allowing threads to release ports and memory - i.e. 
their last self reference, their reply port and their stack - let me publish my current patches - braunr: thread_commit_suicide ? - hehe - initially thread_terminate_self but - it can be used by other threads too - so i named it thread_terminate_release - http://darnassus.sceen.net/~rbraun/0001-pthread_thread_halt.patch - - http://darnassus.sceen.net/~rbraun/0001-thread_terminate_release.patch - the pthread patch needs to be polished because it changes the - semantics of pthread_thread_halt - but other than that, it should be complete - pthread_thread_halt_reallyhalt - ok let's try these libc packages - old static ext2fs for the root, but other than that, it boots - let's try iceweasel - (i'll need to build a hurd package against this new libc, removing - the libports_stability patch which prevents thread destruction in servers - on the way) - prevents thread destruction o_O - yes - in libports only ;p - oh, *only* in libports, I assumed for a moment that it affected - almost every component of the Hurd... - *phew* - ... :) - that's why, after a burst of messages, say because of aptitude - (select), you may see a few hundred threads still hanging around - also why unused servers remain running even after several minutes, - where the normal timeout is 2mins - I wondered about that, some servers (symlink comes to mind) seem - to go away if unused (or that's how I read the code) - symlinks are usually not servers, since most of them actually - exist in file systems, and are implemented through an optimization - yes I know that - trans/symlink.c reads: - /* The timeout here is 10 minutes */ - err = mach_msg_server_timeout (fsys_server, 0, control, - MACH_RCV_TIMEOUT, 1000 * 60 * 10); - if (err == MACH_RCV_TIMED_OUT) - exit (0); - ok - hm, /hurd/symlink doesn't feel at all like a symlink... but - works like one - well, starting iceweasel makes X on my host freeze oO - bbl - /hurd/symlink translators do go away after being unused for 10 - minutes...
this is funny if they are set up by hand instead of being - started from a passive translator record - magically vanishing symlinks ;) - - -## IRC, freenode, #hurd, 2013-09-19 - - hum, i can't rebuild a hurd package :( - braunr: with your thread destruction patches in libc? - yes but it's unrelated - In file included from ../../libdiskfs/boot-start.c:38:0: - ./fsys_reply_U.h:173:15: error: conflicting types for - ‘fsys_get_children’ - i didn't see a new libc debian release - hm, David reported that as well - - id:CAEvUa7=QzOiS41G5Vq8k4AiaN10jAPm+CL_205OHJnL0xpJXbw@mail.gmail.com - uh oh - it seems I didn't add a _reply suffix to the reply routines :/ - there's quite a bit of fallout from my patches, I kinda feel bad - :( - teythoon: what i'm wondering is what youpi did too, since he got - hurd binary packages - braunr: well neither he nor I noticed that b/c for us the - declarations were just missing - from libc you mean ? - or hum gnumach-common ? - not sure actually - no it's not a gnumach thing - hurd-dev then - the build system should have cought these, or mig... - also, i see you changed fsys_reply.defs, but nothing about - fsys_request.defs - I have no fsys_requests.defs - looks like there was no fsys_request.defs in the first place - ... *sigh* - do you know an application that often creates and destroys threads - ? - no, sorry - maybe some test suite - ah right - sysbench maybe - also, i've been hit by a lot more network deadlocks than usual - lately - fixing netdde has gained some priority in my todo list - - -## IRC, freenode, #hurd, 2013-09-20 - - oh, git is multithreaded - great - so i've actually tested my libpthread patch quite a lot - - -## IRC, freenode, #hurd, 2013-09-25 - - on a side note, i was able to build gnumach/libc/hurd packages - with thread destruction - nice :) - they boot and work mostly fine, although they add their own issues - e.g. 
the comm field of the root ext2fs is empty - ps crashes when trying to display threads - but thread destruction actually works, i.e. servers (those that - are configured that away at least) go away after some time, and even - heavily used servers such as ext2fs dynamically scale over time :) - - -## IRC, freenode, #hurd, 2013-10-10 - - concerning threads, i think i figured out the last bugs i had with - thread destruction - it should be well on its way to be merged by the end of the year - - -## IRC, freenode, #hurd, 2013-10-11 - - braunr: is your thread destruction patch ready for testing? - gg0: there are packages at my repository, yes - but i still have hurd fixes to do before i polish it - in particular, posix says returning from main() stops the entire - process and all other threads - i didn't check that during the switch to pthreads, and ext2fs (and - maybe others) actually return from main but expect other threads to live - on - this creates problems when the main thread is actually destroyed, - but not the process - braunr: tmpfs does something like that, but calls pthread_exit - at the end of main - same effect - this was fine with cthreads, but must be changed with pthreads - and libpthread must be fixed to enforce it - (or libc) - - diskfs_startup_diskfs should probably be changed to reuse the main - thread instead of returning - - -## IRC, freenode, #hurd, 2013-10-19 - - I know what threads are, but what is 'thread destruction'? 
- the hurd currently never destroys individual threads - they're destroyed when tasks are destroyed - if the number of threads in a task peaks at a high number, say - thousands of them, they'll remain until the task is terminated - such tasks are usually file systems, normally never restarted (and - in the case of the root file system, not restartable) - this results in a form of leak - another effect of this leak is that servers which should go away - because of inactivity still remain - since thread destruction doesn't actually work, the debian package - uses a patch to prevent worker threads from timeouting - and to finish with, since thread destruction actually doesn't - work, normal (unpatched) applications that destroy threads are certainly - failing bad - i just need to polish a few things, wait for youpi to finish his - work on TLS to resolve conflicts, and that will be all - - -## IRC, freenode, #hurd, 2013-10-30 - - FYI, the packages on my repository enable actual thread - destruction, and i've altered the libports_stability.patch - it nows only sets the global timeout to 0 - now* - we actually can't let translator "die" on global timeout because - of a race issue - tested for about two weeks now and no major problem sighted - top reports processes running for 100% of their time when - terminating threads, but i expect it's simply mach/proc aggregating their - run time to the task - 100% of cpu time - - -## IRC, freenode, #hurd, 2013-11-08 - - teythoon: darnassus is currently running a modified glibc with - thread destruction, yes - braunr: did that require any fixups in Hurd that I'd have missed - ? 
- no - well - b/c the resulting hurd package would not boot - actually yes - one - i'll push the patch somewhere - iirc the mach-defpager spewed some error and /hurd/init failed - to bootstrap the system - teythoon: - http://darnassus.sceen.net/~rbraun/0001-Prevent-diskfs-translators-from-destroying-main-thre.patch - make sure you have the proper gnumach packages too :p - well, that could very well account for my trouble ;) - uh - well - gnumach implements thread destruction, glibc uses it, hurd makes - sure it doesn't exit from main - - -## IRC, freenode, #hurd, 2013-11-12 - - ok so, calling pthread_exit() from main isn't the same as - returning from main() - unlike what some man pages seem to say - so loosing task info when destroying the main thread is actually a - proc bug - ugh - ^^ - or a glibc one - the proc server, your favorite Hurd component... - :) - hm :/ - looks like command line arguments are stored on the stack of the - main thread - and proc merely receives the addresses of those in the target task - why not just keep the main thread around? - it represents a minor resource leak, true - yes - that's the hack i suggested - but it is relatively small - well no - my hack was about diskfs translators - it should be generalized in libpthread - seems reasonable - let's do it >) - - -## IRC, freenode, #hurd, 2013-11-13 - - braunr: there is a thread destruction issue in the experimental - ocaml build, worth looking at, probably - what do you mean ? - ... testing 'testfork.ml': ocamlcocamlrun: - ../libpthread/sysdeps/mach/pt-thread-halt.c:51: __pthread_thread_halt: - Unexpected error: (ipc/send) invalid destination port. 
- during the experimental ocaml build - well yes - thread recycling is buggy - i had the choice to fix it, or implement true destruction - i'm tweaking my patch so it leaves the main thread stack untouched - on destruction - and it should be ready - for review at least - - -## IRC, OFTC, #debian-hurd, 2013-11-13 - - ironforge out of memory during ruby1.9.1 rebuild. during test which - creates 10000 threads - ironforge out of memory during ruby1.9.1 rebuild, test which creates - 10000 threads - i guess ironforge kernel has been rebuilt against -95, correct? - err, what kernel? - 23:37 < youpi> hurd needs a rebuild to be able to work with the newer - eglibc - i mean hurd - yes, libc0.3 breaks the old packages anyway - wrt ENOMEM, was it expected? - wrt disk problems, aren't there on alioth only? - well 10,000 threads is a lot, especially on 32bit machine with 2M - default stack size - that makes 2GiB stacks - can't fit in a 2/2 split model, which gnumach uses - well, though active thread should die right away, just after set x to - false, if i read it correctly - perhaps the stacks are not correctly reused - that's probably worth digging in libpthread - by putting printfs, etc. - it seems stacks are never reused indeed, damn - I just wrote a small test that creates threads which just print - their stack address - that takes just a few minutes to do - i see. 
about reusage i guess you mean base address is kindof always - incremented - * gg0 likes being wrong - that's it, yes - gg0: take care, by keeping being wrong all the time, sometimes you - get right ;) - and you are definitely right here :) - Mmm, but the stack is really deallocated - and the numbers wrap around - I wonder how that is :) - ok, creating 20 000 threads does work - perhaps ruby does odd things which makes it not work - - -### IRC, OFTC, #debian-hurd, 2013-11-14 - - UID PID PPID TH MSGI MSGO SZ RSS SC STAT TIME COMMAND - 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu 0:00.15 - /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 - -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb - 720 threads, stuck - 2G SZ is very big :) - 00:42 < youpi> perhaps ruby does odd things which makes it not work - is that enough to file a ruby bug? as ruby suggests itself btw - no, they will probably not be able to investigate - but you can already check out how they create threads - and try to reproduce the same with a small C program - ehm on ruby2.0 with *context _enabled_ i can not reproduce it - -See [[/open_issues/glibc]] for `*context` functions. 
- - -## IRC, freenode, #hurd, 2013-11-14 - - nice, i got glibc packages with thread destruction - building hurd packages against it now - everything seems fine - hurd packages ready, let's see - - ruby1.9.1 FTBFS due to a couple of tests - https://buildd.debian.org/status/fetch.php?pkg=ruby1.9.1&arch=hurd-i386&ver=1.9.3.448-1&stamp=1384265526 - second one creates 10000 threads and machine got ENOMEM - bootstraptest.tmp.rb: [BUG] [BUG] pthread_cond_init: Cannot - allocate memory (ENOMEM) ew - few hours ago trying to reproduce it: - 01:20 < gg0> UID PID PPID TH MSGI MSGO SZ RSS SC STAT - TIME COMMAND - 01:20 < gg0> 1012 16446 15473 720 987 509 1.89G 23.6M 1 Hu - 0:00.15 /home/gg0-guest/ruby/ruby1.9.git/ruby1.9.1 - -I/home/gg0-guest/ruby/ruby1.9.git/lib -W0 bootstraptest.tmp.rb - yes that's expected - our stacks are 2M - 10k threads means right over 2G of stacks - userspace is restricted to 2G - but if i read correctly test in question, thread should just set x to - false then die - so ? - and ENOMEM popped upk when there were thread count was at 720 - hum - 10k threads would actually be 20G - 1k threads is 2G - 720 is about 1.5G - the rest is probably the ruby runtime - youpi tried to create 10000 thread, no problem. he guessed something - wrong on ruby side - indeed on ruby2.0 such test succeeds - you can't create 10k threads unless you change the stack size - hurd servers use a stack size of 64k by default which allows them - to go up to 30k iirc - but normal applications use the default 2M - i guess you mean 10000 threads active at the same time. test in - question should make them die after simply setting x to false, i guess - youpi's test did so as well - no - it's about stacks - hm - yes at the same time but - thread recycling is known to be buggy - which is what i'm currently fixing btw - what's the bug? 
- neal: there are several subtle issues - for example, joining a thread that is also calling pthread_exit - can fail badly - hmm - good that you are on it then :) - or detaching - i don't remember the details - but i remember such problems - apparently, keeping the stack of the main thread isn't enough - :( - for now, i'll keep the entire thread - - -## IRC, freenode, #hurd, 2013-11-15 - - i wasn't doing anything, just some single test runs. but yes, also - that one which creates hundreds of threads - it would like creating 10000 but goes out of memory after ~720 - btw same tests succeed on ruby2.0, so they should be fixed by - backporting some changes - actually it looks more like a deadlock .. - deadlock that says ENOMEM? - ? - ENOMEM is returned because the test task has no more virtual - memory - this doesn't mean the rest of the system should fail - ok i thought you were talking about such test - no it's something else - a deadlock in a critical server - the root file system maybe - braunr: htop and ps hang. 
just run the test once again - now you should still be able to login - htop/ps hanging means one process is unable to reply to queries - sent to the message port/thread - procfs does that to report on what a process is waiting - it usually mean there is a bug around signals, since the message - thread is also in charge of delivering signals - use ps -eM - and kill -KILL - hum - root 954 S dumping cores is known not to work most of the time - exodar shouldn't be configured like that - so yes, the crash server is hanging - gg0: i've set it to crash --kill and killed the hanging crash - instances blocking top/ps - nice - - my thread destruction patch and tls are indeed conflicting a bit - i suspect the tcb is used after being freed - i think i'll simply recycle the tcb, along with the pthread - structs - ok i think it's fine now - there was also a small bug in the tls code, keeping a reference on - the thread port - mach reference counting is so counter intuitive :/ - well, error-prone - - argh, more bugs in libc :( - :/ - but don't worry, there is always one more bug ;) - this one might explain crashes that are long to trigger - _hurd_self_sigstate() is implemented like this : - _hurd_thread_sigstate (__mach_thread_self ()); - it leaks a reference on the current thread each time it's called - >,< - but glibc maintains such references, so if the maximum value is - reached, and references are dropped, the value can reach 0 - ouch - at which point any call on a thread will result in an invalid send - right - and probably an assertion - well it's a good thing then that you found it :) - i think it's always been there - but it's more apparent since jknoenig's patch on signal - dispositions - the maximum number of user references in mach is 64k - this right leak isn't easy - tls is very tricky heh :) - for the main thread, tls initialization happens after the thread - creation, obviously - but for other threads, it's initialized before starting them - the leak was probably 
an overlook caused by that complexity - teythoon: actually that leak i mentioned in _hurd_self_sigstate - has only been recently added in Convert sigstate to TLS - so it's merely tls integration polishing - youpi: i'm currently reviewing changes related to tls and i think - there is a bug in _hurd_self_sigstate - calls to mach_thread_self() should be paired with - mach_port_deallocate to avoid urefs overflows - and right leaks - _hurd_critical_section_lock is probably affected too - hm - mhmm - in glibc, hurd/hurd/signal.h, _hurd_critical_section_lock - why is the sigstate unlocked after the call to - _hurd_thread_sigstate - _hurd_thread_sigstate doesn't seem to lock it .. - unless __spin_lock_init does it - yes, leak solved :) - - -## IRC, freenode, #hurd, 2013-11-16 - - argh, _hurd_critical_section_lock is called before the send right - on the main thread is fetched in libpthread :/ - is that bad ? - the sigstate is supposed to be initialized after pthreads - _hurd_critical_section_lock will create it if it sees there is - none - creating the sigstate is currently what makes the send right leak - ok - it's bad then - it may be due to my patch - _hurd_critical_section_lock is called during pthreads - initializatio - n - before the sigstate for the main thread is created, but after the - pthread init routine is called - it does indeed look like the code wasn't written with thread being - destroyed some day in mind :/ - braunr: btw, if you ever feel like benchmarking, sysbench has a - benchmark for threads contending for a lock - yes i've used it before - was it useful for this purpose ? 
- no :) - :/ - we already know libpthread isn't optimized - and felt it when we switched from cthreads - humpf - simply calling malloc implies a call to - _hurd_critical_section_lock - on the other hand, unlike what some glibc comments say, this does - work - - -## IRC, freenode, #hurd, 2013-11-17 - - looks like i've fixed all leak issues with thread destruction and - tls :) - let's see if ext2fs.static works fine too - braunr: \o/ - sorry about introducing the tls ones :) - no worries, it was expected - and tls was really needed :) - i mean, i expected to have some problems when rebasing on tls :p - braunr: this is good news, how is your rootfs translator holding - up? - building hurd packages right now - for now, only test applications and a few really multithreaded - ones (e.g. iceweasel) have been tested - well, the system boots :) - awesome :) - stressing the file system with git while watching youtube videos - with gnash doesn't make the system crash - you can actually watch yt videos on your Hurd box ? - yes - for a while now - o_O - can't you ? - I never even dared to try - hehe - teythoon: looks stable enough to install on darnassus - - -## IRC, freenode, #hurd, 2013-11-18 - - braunr: wrt to your thread destruction patchset, I thought you - also had to fix the proc server ? - teythoon: no - the problem was in glibc - i may have to fix proc/procfs though, because cpu time gets wrong - with the patch - currently, it's the addition of the cpu time of all threads - mach provides aggregate times including destroyed threads though - ah, I see - one side effect is that you'll see processes sometimes taking 100% - of cpu time although the cpu is unused - or the cpu time of a process gets reduced :) - i guess the 100% cpu is how top sees a negative increment - ^^ - gg0: do my threadterm packages help with ruby1.9 ? - i mean, can you test with them some time ? 
:) - - -## IRC, freenode, #hurd, 2013-11-21 - - youpi: ping about my question regarding error handling in the - proposed thread_terminate_release call - I agree with what Neal said - he didn't say anything about error handling - see - http://lists.gnu.org/archive/html/bug-hurd/2013-11/msg00181.html - i think i should make the call fail on first error - it shouldn't happen, so it would merely serve to catch bugs - it's not easily recoverable (if it's recoverable at all) - uh, I thought he had - I must have dreamt - - i think i'll go ahead with thread destruction integration - - -## IRC, freenode, #hurd, 2013-11-25 - - i've pushed the thread destruction patches for gnumach upstream - and made a branch in glibc for that too - awesome :) - youpi: i don't remember how glibc changes should be managed - once those are applied, i'll commit in libpthread - braunr: usually we create a topgit branch, and then we add the - patch from that to the debian repository - - -## IRC, freenode, #hurd, 2013-11-29 - - youpi: i still have a leak somewhere with the thread destruction - patches - maybe on the host priv port in bootstrap servers (root fs and proc - server) - it prevents priority adjusting in libports and can easily bring - down a system because servers can start trashing a lot sooner, as it was - the case during the pthread migration - -See discussion about that on [[/open_issues/libpthread]]. - - so i'll hunt it down before merging - - -## IRC, freenode, #hurd, 2013-12-19 - - darnassus still has the libports priority adjustement leaks - i'll apply a few more patches to my hurd packages - - humpf, proc seems to have a problem getting the host priv port :/ - thats bad - what did you do ? - i fixed all the leaks in libports when adjusting priorities - the last one being releasing the host priv right - and i get errors at boot time from the proc server - remember when i had this problem ? 
- proc doesn't get the host priv port the normal way since the - normal way is to get it from proc iirc - ah, thought you fixed that - so i guess the alternate way doesn't add a reference - well the leak is fixed - the problem you had was due to the leak which made the host priv - port reach its max uref value - now it's just the proc server - the system works fine though - for real ? - the proc server needs the host priv port for getting the new - tasks - well yes - how can it work w/o it ? - i don't know .. - i guess the problem is internal to glibc - i mean, get_priv_ports fails, but that doesn't mean the host priv - port is lost - could be - are you running a patched rootfs translator too ? - yes - ok - b/c i remember having trouble with that - right, the glibc call would make proc call __proc_getprivports - hum - teythoon: do you remember how proc gets its host priv port ? - from init - i think - startup_procinit ? - possibly - right - so it's probably not the host priv port - i mean, the error is about another invalid send right - hm nope, it is on host_priv :/ - hm ok i see, looks like a bug from a debian patch - or rather, a bug fix not yet imported into the debian package - teythoon: you actually fixed it in - 2c9422595f41635e2f4f7ef1afb7eece9001feae - great :) - ah, that one - i was looking at the upstream code and couldn't understand what - was going wrong - :) - much better - except ps -eT doesn't work any more .. - interestingly, with the thread destruction patch, ps -eT sometimes - work, and sometimes doesn't - the behaviour doesn't seem to change without a reboot - and of course, as soon as i say it, i'm proven wrong by the next - test :) - - -## IRC, freenode, #hurd, 2013-12-26 - - __pthread_sigstate_init doesn't seem to be converted to TLS in the - upstream repository master branch - - ah dammit, the global signal dispositions patch touches both glibc - and libpthread @#! 
- what a mess - - youpi: do you have some time to quickly review the - rbraun/thread_destruction branch in libpthread ? - there might be conflict with some glibc patches - or do you prefer it on the mailing list ? - (i used a branch because it's not based on master) - rather mail the list, yes - ok - it'd also be useful to write the rationale - probably to be left as comment in the source code - yes, that branch was for personal storage :) - so the reader knows how things are recycled or not - hm - that should already be the case - ok - the two structures that are still recycled are the pthread struct - and tls - it's quite obvious from pthread_alloc - and well commented there - for tls, it's explained in pthread_exit - - there, thread destruction finally merged in - and now, we can remove the ugly hacks that were done for - threadvars - :) - change stacks at will and support all sorts of weird languages and - runtimes - braunr: cool :) - - -## IRC, freenode, #hurd, 2013-12-31 - - braunr: I've added sigstate_locking, sigstate_thread_reference and - tls_thread_leak to the debian glibc 2.18 package - I believe that's complete? - is mach_msg_uspace_options ready for being added? Does it bring - much speedup? - AIUI, thread_terminate_release is the union of the branches - mentioned above? - (I'm cleaning up branches in the glibc repo) - youpi1: mach_msg_uspace_options can be left over, it only affects - selects and not noticeably - yes, those three branches are the only ones needed for thread - destruction - ok - does the hurd changes depend on these changes ? - no - good :) - only on tls for one of them - (it's about the default stack size of 64k for hurd servers) - and we have had this in debian for a long time already :) - yes - (how big were they before?) - (where they a couple MiB, and thus exploding to GiBs on thousands - of threads?) 
- 64k - pthread stacks are 2M by default - yes - - -## IRC, freenode, #hurd, 2014-01-14 - - braunr: it seems your time change in libps made ps produce odd re - results - samy 10987 5 -514358:-18:-42.17 /hurd/firmlink tmp - youpi: wow :) - that change is supposed to run on a system where threads actually - get destroyed - but i don't see what could trigger this side effect - root 8629 664 56 years make -j 3 - :) - heh - youpi: does the hurd package on darnassus include that patch ? - yes - i don't reproduce the problem :/ - err - what command are you using ? - ps -feM on darnassus - root 29642 473 7 months /usr/sbin/sshd -R - hmmmm - i don't see it with a make -j - well, it's not systematic - it's like once over two launches - hhhhmmmmm - it'd look like some random numbers get added - strangely, the gcc processes started by a recursive make aren't - children of make .. - ps -eF hurd seems to report the correct values - even ps -eM - oO - ps -ef too - the problem seems to be with ps -efM - too bad I'm always using that :) - another way to see it is that it makes us spot the issue ;p - - -### IRC, freenode, #hurd, 2014-01-15 - - ok i have an idea of what goes wrong in libps - - youpi: for some reason, ps -efM lacks the PSTAT_TASK_BASIC flag - my patch is wrong since it doesn't try to determine whether the - stats apply to a task or a thread, but that is easy to fix - ps -efM should nonetheless provide basic task info, obviously - in addition, the problems i've observed with ps -T (occasional - segfaults) seem to have existed before thread destruction - they're just strongly exposed now that the thread list can be - shrunk - - libps is quite complicated - even hairy, i'd say .. 
- - - ### IRC, freenode, #hurd, 2014-01-16 - - youpi: i think i have a proper fix for libps - i'll commit it soon - ok - basically, getting system times simply sets the PSTAT_THREAD_BASIC - flag - whereas getting the run time of the terminated threads requires - PSTAT_TASK_BASIC - i assumed it was always set in the function i changed when dealing - with a task and not a thread - and well, that was a wrong assumption, -M can remove it if not - strictly needed by the format - the default format asks for suspend_count, which forces the - retrieval of task basic info, so it works with -eM - but -f doesn't :) - so extremely bad lucky combination of flags :) - indeed - i added a pstat_times using the last (!) available flag bit - looks clean to me - i hope there is no abi issue - (at least everything works with the unmodified ps-hurd executable - and a new libps.so) - - hm, small bug in the thread destruction patch :/ - - -### IRC, freenode, #hurd, 2014-01-17 - - good, i have proper fixes for tls in the main thread and thread - termination :) - awesome :) - i've been wondering, what does it take to get the thread - destruction stuff into the debian package ? - i still have to build test packages, look for (unlikely, heh) - regressions and work some integration details with samuel - hum the main thread tls fixup i guess - youpi was waiting for me to fix that - gnumach already provides the RPC - so it will be in glibc soon - i just have to get those last bits right - teythoon: i'm quite slow at integrating stuff - and samuel then builds packages ? - i mean, is our libc package build linked to the other libc - packages ? - libpthread is applied as a patch to glibc - and loaded as a plugin - - -## IRC, freenode, #hurd, 2014-01-17 - - uhm, did we break fakeroot-tcp ? - we did ?
- fakeroot-tcp just works fine on buildds - with fakeroot-tcp, i get - make[4]: Entering directory - `/home/rbraun/devel/debian/packages/hurd/hurd-0.5.git20140113/libdde-linux26/contrib/include' - rm -f .general.d - make[4]: *** [cleanall] Killed - when cleaning the package before building .. - - -### IRC, freenode, #hurd, 2014-01-18 - - damn, fakeroot-tcp won't work on darnassus .. - uh, looks like my tls/thread destruction "fixes" do cause - regressions :( - fakeroot works fine with debian glibc - which one ? - which fakeroot i mean - -tcp - yes, it fails as soon as i use the patched glibc :/ - at least it's easy to reproduce - - -### IRC, freenode, #hurd, 2014-01-20 - - great, 3rd libc version installed on darnassus, let's see if i can - build hurd packages against that - - -### IRC, freenode, #hurd, 2014-01-21 - - damn, fakeroot-tcp still crashes with my latest changes .... - - darnassus looks in good shape - youpi: ^ - youpi: if you have other tests, feel free to do them now - i feel confident about committing the changes, if you're ok with - it - which changes ? - I'm a bit lost in what you were talking about :) - you can find them in 2 patches in /var/tmp on darnassus - one is about fixing thread destruction - i'm pretty certain about this one so i'll commit it directly - the other is fixing the tcb of the main thread - -[[open_issues/libpthread]]. 
- - where i simply do tcb->self = thread->kernel_thread :) - with a comment explaining why i don't do something else like - deallocating the unused tcb - braunr: ok, that looks good - braunr: awesome :) - youpi: ok - - -### IRC, freenode, #hurd, 2014-01-22 - - there, libpthread should be fine now - - -## IRC, freenode, #hurd, 2014-02-06 - - youpi: in case you're planning to upgrade glibc (or not), the - thread destruction changes are complete - youpi: darnassus has been running them for some weeks with no - visible regression - braunr: ok, good - including it in glibc was on my todo list indeed - and Adam indeed plans for a 2.18 upload - good :) - braunr: this is up to 7c6dc6e28b2fc4b67934223f41cf080ffe58b230, - right? (Wed Jan 22, Fix up the main thread TCB) - yes - oh, i just saw 2.17-98~0 glibc packages on debian-ports :) - yes, it's just to fix the dhcp crash - ah yes, it's not 2.18 - 2.18 is available in experimental - - braunr: just to make sure: did you have - 983b18a6ff16f5687a9ece63a50d1831dec88609 in libc on darnassus? - (which drops the stack size hack) - youpi: let me check - youpi: ah no, i don't, you're right - well, I was just wondering, nothing makes me think that was the case - :) - what was the issue that it was raising btw? - threadvars - ok, but in which case?
- (to make sure I test that before committing) - now that we switched to tls, i would assume the transition path to - be 1/ hurd stops defining that symbol, 2/ libpthread can stop using it - the goal was to reduce the stack size of hurd server threads - well, that's not my question :) I'm wondering in which precise case - that was breaking things - youpi: i don't know, it shouldn't break - ok - youpi: just in case, don't forget that last one line patch i - committed last night, fakeroot can't work right without it - (i made a minor change while reviewing before comitting, and - obviously got it wrong :p) - ok - - braunr: I've upgraded libpthread in debian's eglibc btw - - - /home/rbraun/devel/debian/packages/eglibc/eglibc-2.17/build-tree/hurd-i386-libc/libc.so.phdr: - *** executable stack signaled - from build-tree/hurd-i386-libc/elf/check-execstack.out - i thought glibc didn't use those - anyway it doesn't look to be the regression i'm having - does this ring a bell : - Encountered regressions that don't match expected failures - (debian/testsuite-checking/expected-results-i486-gnu-libc): - test-stpcpy_chk.out, Error 1 - TEST test-stpcpy_chk.out: __stpcpy_chk normal_stpcpy - simple_stpcpy_chk - nope - after what are you getting this regression? - building glibc 2.17-97 with thread destruction patches, including - the one removing the stack size hack - during tests - there also are "progressions", but i'm not sure what these are - some progressions are just luck, other seem to happen on some - platforms only - I'm not sure you want to test 2.17 - a lot has changed between 2.17's libpthread and 2.18's libpthread - (which is now equal to cvs's libpthread - ) - s/cvs/git/ - yes - i usually build with nocheck - - -## IRC, freenode, #hurd, 2014-02-07 - - youpi: on a vm with hurd 1:0.5.git20140203-1, upgrading to a - patched glibc 2.17-97 that includes the patch which reverts the stack - size hack, the system reboots and works fine - ok. 
I don't remember what problem I was seeing - that version of the hurd no longer defines the symbol - but even then, there shouldn't have been any problem - hm, or does it - yes, it does - youpi: the hurd package patch mentions - Revert this for now, will have to wait for dropping the use of - __pthread_stack_default_size from eglibc's - libpthread_hurd_cond_wait.diff - i wonder how it got there - IIRC I was wondering too - i've installed my c library on darnassus and it works fine there - too - with older (january) hurd packages - looks good to me - - -## IRC, freenode, #hurd, 2014-02-10 - - braunr: btw, do the new libc packages contain your thread - destruction work ? - teythoon: the -98 ones on experimental ? - i don't think they do - the -18 ones should do -- cgit v1.2.3