From 12c341b917921eb631026ec44a284c4d884e5de6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 6 Mar 2013 21:52:20 +0100
Subject: IRC.

---
 open_issues/libpthread.mdwn | 346 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 346 insertions(+)

(limited to 'open_issues/libpthread.mdwn')

diff --git a/open_issues/libpthread.mdwn b/open_issues/libpthread.mdwn
index 05aab85f..f0c0db58 100644
--- a/open_issues/libpthread.mdwn
+++ b/open_issues/libpthread.mdwn
@@ -1170,6 +1170,12 @@ There is a [[!FF_project 275]][[!tag bounty]] on this task.
     haven't tested
+### IRC, freenode, #hurd, 2013-01-26
+
+    ah great, one of the recent fixes (probably select-eintr or
+    setitimer) fixed exim4 :)
+
+
+## IRC, freenode, #hurd, 2012-09-23
     tschwinge: i committed the last hurd pthread change,
@@ -1270,6 +1276,17 @@ There is a [[!FF_project 275]][[!tag bounty]] on this task.
     that's it, yes
+### IRC, freenode, #hurd, 2013-03-01
+
+    braunr: btw, "unable to adjust libports thread priority: (ipc/send)
+    invalid destination port" is actually not a sign of fatality
+    bach recovered from it
+    youpi: well, it never was a sign of fatality
+    but it means that, for some reason, a process looses a right for a
+    very obscure reason :/
+    weird sentence, agreed :p
+
+
+## IRC, freenode, #hurd, 2012-12-05
     tschwinge: i'm currently working on a few easy bugs and i have
@@ -1459,3 +1476,332 @@ Same issue as [[term_blocking]] perhaps?
     we have a similar problem with the hurd-specific cancellation code,
     it's in my todo list with io_select
     ah, no, the condvar is not global
+
+
+## IRC, freenode, #hurd, 2013-01-14
+
+    *sigh* thread cancellable is totally broken :(
+    cancellation*
+    it looks like playing with thread cancellability can make some
+    functions completely restart
+    (e.g. one call to printf to write twice its output)
+
+[[git_duplicated_content]], [[git-core-2]].
+
+    * braunr is cooking a patch to fix pthread cancellation in
+    pthread_cond_{,timed}wait, smells good
+    youpi: ever heard of something that would make libc functions
+    "restart" ?
+    you mean as a feature, or as a bug ?
+    when changing the pthread cancellation state of a thread, i
+    sometimes see printf print its output twice
+    or perhaps after a signal dispatch?
+    i'll post my test code
+    that could be a duplicate write
+    due to restarting after signal
+    http://www.sceen.net/~rbraun/pthreads_test_cancel.c
+        #include <stdio.h>
+        #include <stdarg.h>
+        #include <stdlib.h>
+        #include <pthread.h>
+        #include <sched.h>
+
+        static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
+        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+        static int predicate;
+        static int ready;
+        static int cancelled;
+
+        static void
+        uncancellable_printf(const char *format, ...)
+        {
+            int oldstate;
+            va_list ap;
+
+            va_start(ap, format);
+            pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
+            vprintf(format, ap);
+            pthread_setcancelstate(oldstate, &oldstate);
+            va_end(ap);
+        }
+
+        static void *
+        run(void *arg)
+        {
+            uncancellable_printf("thread: setting ready\n");
+            ready = 1;
+            uncancellable_printf("thread: spin until cancellation is sent\n");
+
+            while (!cancelled)
+                sched_yield();
+
+            uncancellable_printf("thread: locking mutex\n");
+            pthread_mutex_lock(&mutex);
+            uncancellable_printf("thread: waiting for predicate\n");
+
+            while (!predicate)
+                pthread_cond_wait(&cond, &mutex);
+
+            uncancellable_printf("thread: unlocking mutex\n");
+            pthread_mutex_unlock(&mutex);
+            uncancellable_printf("thread: exit\n");
+            return NULL;
+        }
+
+        int
+        main(int argc, char *argv[])
+        {
+            pthread_t thread;
+
+            uncancellable_printf("main: create thread\n");
+            pthread_create(&thread, NULL, run, NULL);
+            uncancellable_printf("main: spin until thread is ready\n");
+
+            while (!ready)
+                sched_yield();
+
+            uncancellable_printf("main: sending cancellation\n");
+            pthread_cancel(thread);
+            uncancellable_printf("main: setting cancelled\n");
+            cancelled = 1;
+            uncancellable_printf("main: joining thread\n");
+            pthread_join(thread, NULL);
+            uncancellable_printf("main: exit\n");
+            return EXIT_SUCCESS;
+        }
+    youpi: i'd see two calls to write, the second because of a signal,
+    as normal, as long as the second call resumes, but not restarts after
+    finishing :/
+    or restarts because nothing was done (or everything was entirely
+    rolled back)
+    well, with an RPC you may not be sure whether it's finished or not
+    ah
+    we don't really have rollback
+    i don't really see the difference with a syscall there
+    the kernel controls the interruption in the case of the syscall
+    except that write is normally atomic if i'm right
+    it can't happen on the way back to userland
+    but that could be exactly the same with RPCs
+    while perhaps it can happen on the mach_msg back to userland
+    back to userland ok, back to the application, no
+    anyway, that's a side issue
+    i'm fixing a few bugs in libpthread
+    and noticed that
+    (i should soon have patches to fix - at least partially - thread
+    cancellation and timed blocking)
+    i was just wondering how cancellation how handled in glibc wrt
+    libpthread
+    I don't know
+    (because the non standard hurd cancellation has nothing to do with
+    pthread cancellation)à
+    ok
+    s/how h/is h/
+
+
+### IRC, freenode, #hurd, 2013-01-15
+
+    braunr: Re »one call to printf to write twice its output«:
+    sounds familiar:
+    http://www.gnu.org/software/hurd/open_issues/git_duplicated_content.html
+    and http://www.gnu.org/software/hurd/open_issues/git-core-2.html
+    tschwinge: what i find strange with the duplicated operations i've
+    seen is that i merely use pthreads and printf, nothing else
+    no setitimer, no alarm, no select
+    so i wonder how cancellation/syscall restart is actually handled
+    in our glibc
+    but i agree with you on the analysis
+
+
+### IRC, freenode, #hurd, 2013-01-16
+
+    neal: do you (by any chance) remember if there could possibly be
+    spurious wakeups in your libpthread implementation ?
+    braunr: There probably are.
+    but I don't recall
+
+    i think the duplicated content issue is due to the libmach/glibc
+    mach_msg wrapper
+    which restarts a message send if interrupted
+    Hrm, depending on which point it has been interrupted you mean?
+    yes
+    not sure yet and i could be wrong
+    but i suspect that if interrupted after send and during receive,
+    the restart might be wrongfully done
+    i'm currently reworking the timed* pthreads functions, doing the
+    same kind of changes i did last summer when working on select (since
+    implement the timeout at the server side requires pthread_cond_timedwait)
+    and i limit the message queue size of the port used to wake up
+    threads to 1
+    and it seems i have the same kind of problems, i.e. blocking
+    because of a second, unexpected send
+    i'll try using __mach_msg_trap directly and see how it goes
+    Hrm, mach/msg.c:__mach_msg does look correct to me, but yeah,
+    won't hurd to confirm this by looking what direct usage of
+    __mach_msg_trap is doing.
+    tschwinge: can i ask if you still have a cthreads based hurd
+    around ?
+    tschwinge: and if so, to send me libthreads.so.0.3 ... :)
+    braunr: darnassus:~tschwinge/libthreads.so.0.3
+    call 19c0
+    so, cthreads were also using the glibc wrapper
+    and i never had a single MACH_SEND_INTERRUPTED
+    or a busy queue :/
+    (IOW, no duplicated messages, and the wrapper indeed looks
+    correct, so it's something else)
+    (Assuming Mach is doing the correct thing re interruptions, of
+    course...)
+    mach doesn't implement it
+    it's explicitely meant to be done in userspace
+    mach merely reports the error
+    i checked the osfmach code of libmach, it's almost exactly the
+    same as ours
+    Yeah, I meant Mach returns the interurption code but anyway
+    completed the RPC.
+    ok
+    i don't expect mach wouldn't do it right
+    the only difference in osf libmach is that, when retrying,
+    MACH_SEND_INTERRUPT|MACH_RCV_INTERRUPT are both masked (for both the
+    send/send+receive and receive cases)
+    Hrm.
+    but they say it's for performance, i.e. mach won't take the slow
+    path because of unexpected bits in the options
+    we probably should do the same anyway
+
+
+### IRC, freenode, #hurd, 2013-01-17
+
+    tschwinge: i think our duplicated RPCs come from
+    hurd/intr-msg.c:148 (err == MACH_SEND_INTERRUPTED but !(option &
+    MACH_SEND_MSG))
+    a thread is interrupted by a signal meant for a different thread
+    hum no, still not that ..
+    or maybe .. :)
+    Hrm. Why would it matter for for the current thread for which
+    reason (different thread) mach_msg_trap returns *_INTERRUPTED?
+    mach_msg wouldn't return it, as explained in the comment
+    the signal thread would, to indicate the send was completed but
+    the receive must be retried
+    however, when retrying, the original user_options are used again,
+    which contain MACH_SEND_MSG
+    i'll test with a modified version that masks it
+    tschwinge: hm no, doesn't fix anything :(
+
+
+### IRC, freenode, #hurd, 2013-01-18
+
+    the duplicated rpc calls is one i find very very frustrating :/
+    you mean the dup writes we've seen lately?
+    yes
+    k
+
+
+### IRC, freenode, #hurd, 2013-01-19
+
+    all right, i think the duplicated message sends are due to thread
+    creation
+    the duplicated message seems to be sent by the newly created
+    thread
+    arg no, misread
+
+
+### IRC, freenode, #hurd, 2013-01-20
+
+    tschwinge: youpi: about the diplucated messages issue, it seems to
+    be caused by two threads (with pthreads) doing an rpc concurrently
+    duplicated*
+
+
+### IRC, freenode, #hurd, 2013-01-21
+
+    ah, found something interesting
+    tschwinge: there seems to be a race on our file descriptors
+    the content written by one thread seems to be retained somewhere
+    and another thread writing data to the file descriptor will resend what
+    the first already did
+    it could be a FILE race instead of fd one though
+    yes, it's not at the fd level, it's above
+    so good news, seems like the low level message/signalling code
+    isn't faulty here
+    all right, simple explanation: our IO_lockfile functions are
+    no-ops
+    braunr: i found that out days ago, and samuel said they were
+    okay
+
+[[glibc]], `flockfile`/`ftrylockfile`/`funlockfile`.
+
+
+## IRC, freenode, #hurd, 2013-01-15
+
+    hmm, looks like subhurds have been broken by the pthreads patch :/
+    arg, we really do have broken subhurds :((
+    time for an immersion in the early hurd bootstrapping stuff
+    Hrm. Narrowed down to cthreads -> pthread you say.
+    i think so
+    but i think the problem is only exposed
+    it was already present before
+    even for the main hurd, i sometimes have systems blocking on exec
+    there must be a race there that showed far less frequently with
+    cthreads
+    youpi: we broke subhurds :/
+    ?
+    i can't start one
+    exec seems to die and prevent the root file system from
+    progressing
+    there must be a race, exposed by the switch to pthreads
+    arg, looks like exec doesn't even reach main :(
+    now, i'm wondering if it could be the tls support that stops exec
+    although i wonder why exec would start correctly on a main hurd,
+    and not on a subhurd :(
+    i even wonder how much progress ld.so.1 is able to make, and don't
+    have much idea on how to debug that
+
+
+### IRC, freenode, #hurd, 2013-01-22
+
+    hm, subhurds seem to be broken because of select
+    damn select !
+    hm i see, we can't boot a subhurd that still uses libthreads from
+    a main hurd that doesn't
+    the linker can't find it and doesn't start exec
+    pinotree: do you understand what the fmh function does in
+    sysdeps/mach/hurd/dl-sysdep.c ?
+    i think we broke subhurds by fixing vm_map with size 0
+    braunr: no idea, but i remember thomas talking about this code
+
+[[vm_map_kernel_bug]]
+
+    it checks for KERN_INVALID_ADDRESS and KERN_NO_SPACE
+    and calls assert_perror(err); to make sure it's one of them
+    but now, KERN_INVALID_ARGUMENT can be returned
+    ok i understand what it does
+    and youpi has changed the code, so he does too
+    (now i'm wondering why he didn't think of it when we fixed vm_map
+    size with 0 but his head must already be filled with other things so ..)
+    anyway, once this is dealt with, we get subhurds back :)
+    yes, with a slight change, my subhurd starts again \o/
+    youpi: i found the bug that prevents subhurds from booting
+    it's caused by our fixing of vm_map with size 0
+    when ld.so.1 starts exec, the code in
+    sysdeps/mach/hurd/dl-sysdep.c fails because it doesn't expect the new
+    error code we introduced
+    (the fmh functions)
+    ah :)
+    good :)
+    adding KERN_INVALID_ARGUMENT to the list should do the job, but if
+    i understand the code correctly, checking if fmhs isn't 0 before calling
+    vm_map should do the work too
+    s/do the work/work/
+    i'm not sure which is the preferred way
+    otherwise I believe fmh could be just fixed to avoid calling vm_map
+    in the !fmhs case
+    yes that's what i currently do
+    at the start of the loop, just after computing it
+    seems to work so far
+
+
+## IRC, freenode, #hurd, 2013-01-22
+
+    i have almost completed fixing both cancellation and timeout
+    handling, but there are still a few bugs remaining
+    fyi, the related discussion was
+    https://lists.gnu.org/archive/html/bug-hurd/2012-08/msg00057.html
-- 
cgit v1.2.3