[[!meta copyright="Copyright © 2013, 2015 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!meta title="cancellation points are not cancelling threads"]]

[[!tag open_issue_libpthread]]

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/select.h>
    #include <unistd.h>

    void *f (void *foo)
    {
      char buf[128];

      /* Only with this line enabled does the test case terminate.  */
      //pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
      while (1) {
        /* read is a cancellation point, so a deferred cancel should act here.  */
        read (0, buf, sizeof(buf));
      }
    }

    int main (void)
    {
      pthread_t t;

      pthread_create (&t, NULL, f, NULL);
      sleep (1);
      pthread_cancel (t);
      /* Blocks forever if the thread is never actually cancelled.  */
      pthread_join (t, NULL);
      exit (0);
    }

`read()` is not behaving as a cancellation point: only setting the cancel type
to asynchronous permits this test case to terminate.

We do have the `pthread_setcanceltype` glibc/libpthread hook in the forward
structure, but we are not using it: the `LIBC_CANCEL_ASYNC` macros are defined
empty, and we're not using them in the mig msg call either.


# Provenance

## IRC, OFTC, #debian-hurd, 2013-04-15

    <paravoid> so, let me say a few things about the bug in the first place
    <paravoid> the package builds and runs a test suite
    <paravoid> the second test in the test suite blocks forever
    <paravoid> a blocked pthread_join is what I see
    <paravoid> I'm unsure why
    <paravoid> have you seen anything like it before?
    <youpi> whenever the thread doesn't actually terminate, sure
    <youpi> what is the thread usually blocked on when you cancel it?
    <paravoid> this is a hurd-specific issue
    <paravoid> works on all other arches
    <youpi> could be just that all other archs have more relaxed behavior
    <youpi> thus the question of what exactly is supposed to be happening
    <youpi> apparently it is inside a select?
    <youpi> it seems select is not cancellable here
    <pinotree> wasn't the patch you sent?
    <youpi> no, my patch was about signals
    <youpi> not cancellation
    <pinotree> k
    <youpi> (even if that could be related, of course)
    <paravoid> how did you see that?
    <paravoid> what's the equivalent of strace?
    <youpi> thread 3 is inside _hurd_select
    <paravoid> thread 1 is blocked on join
    <paravoid> but the code is
    <paravoid> if(gdmaps->reload_thread_spawned) {
    <paravoid>     pthread_cancel(gdmaps->reload_tid);
    <paravoid>     pthread_join(gdmaps->reload_tid, NULL);
    <paravoid> }
    <paravoid> so cancel should have killed the thread
    <youpi> cancelling a thread is a complex matter
    <youpi> there are cancellation points
    <youpi> e.g. a thread performing while(1); can't be cancelled
    <paravoid> thread 3 is just a libev event loop
    <youpi> yes, "just" calling poll, the most complex system call of unix :)
    <youpi> paravoid: anyway, don't look for a bug in your program, it's most likely a bug in glibc, thanks for the report
    <paravoid> I think it all boils down to a problem cancelling a thread in poll()
    <youpi> yes
    <youpi> paravoid: ok, actually with the latest libc it does work
    <paravoid> oh?
    <youpi> where latest = not uploaded yet :/
    <paravoid> did you test this on exodar?
    <youpi> pinotree: that's the libpthread_cancellation.diff I guess
    <paravoid> because I commented out the join :)
    <youpi> paravoid: in the root, yes
    <youpi> well, I tried my own program
    <paravoid> oh, okay
    <youpi> which is indeed hanging inside select (or just read) in the chroot
    <youpi> but not in the root
    <pinotree> ah, richard's patch
    <paravoid> url?
    <youpi> I've installed the build-dep in the root, if you want to try
    <paravoid> strange that root is newer than the chroot :)
    <youpi> paravoid: it's the usual eglibc debian source
    <paravoid> tried in root, still fails
    <youpi> could you keep the process running?
    <paravoid> done
    <youpi> Mmm, but the thread running gdmaps_reload_thread never set the cancel type to async?
    <youpi> that said I guess read and select are supposed to be cancellation points
    <youpi> thus cancel_deferred should be working, but they are not
    <youpi> it seems it's cancellation points which have just not been implemented
    <youpi> (they happen to be one of the most obscure things in posix)


## IRC, freenode, #hurd, 2013-04-15

    <youpi> but yes, there is still an issue, with PTHREAD_CANCEL_DEFERRED
    <youpi> how calls like read() or select() are supposed to test cancellation?
    <pinotree> iirc there are the LIBC_CANCEL_* macros in glibc
    <pinotree> eg sysdeps/unix/sysv/linux/pread.c
    <youpi> yes
    <youpi> but in our libpthredaD?
    <pinotree> could it be we lack the libpthread → glibc bridge of cancellation stuff?
    <youpi> we do have pthread_setcancelstate/type forwards
    <youpi> but it seems the default LIBC_CANCEL_ASYNC is void
    <pinotree> i mean, so when you cancel a thread, you can get that cancel status in libc proper, just like it seems done with LIBC_CANCEL_* macros and nptl
    <youpi> as I said, the bridge is there
    <youpi> we're just not using it in glibc
    <youpi> I'm writing an open_issues page


### IRC, freenode, #hurd, 2013-04-16

    <braunr> youpi: yes, we said some time ago that it was lacking


# `userspace-rcu`

With `2.13-39+hurd.3.rbraun.1` (that is, `2.13-39+hurd.3` plus
`hurd-i386/0001-Mask-options-implemented-by-the-userspace-side-of-ma.patch.`)
installed.

During `make check` of the `userspace-rcu` package.

    [...]
    ./test_urcu_gc 4 4 10 -d 0 -b 4096
    [hangs]

    (gdb) thread apply all bt

    Thread 5 (Thread 14933.5):
    #0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
    #1  0x01068074 in __mach_msg (msg=0x27fff2c, option=3, send_size=24, rcv_size=32, rcv_name=120, timeout=0, notify=0) at msg.c:115
    #2  0x011ed35c in __thread_suspend (target_thread=115) at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/RPC_thread_suspend.c:84
    #3  0x01045016 in __pthread_thread_halt (thread=0x80744a8) at ../libpthread/sysdeps/mach/pt-thread-halt.c:43
    #4  0x01041365 in __pthread_exit (status=0x2) at ./pthread/pt-exit.c:118
    #5  0x01040e78 in entry_point (start_routine=0x80494b0 <thr_writer>, arg=0x3) at ./pthread/pt-create.c:50
    #6  0x00000000 in ?? ()

    Thread 4 (Thread 14933.4):
    #0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
    #1  0x01068074 in __mach_msg (msg=0x25fff2c, option=3, send_size=24, rcv_size=32, rcv_name=119, timeout=0, notify=0) at msg.c:115
    #2  0x011ed35c in __thread_suspend (target_thread=113) at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/RPC_thread_suspend.c:84
    #3  0x01045016 in __pthread_thread_halt (thread=0x8073aa8) at ../libpthread/sysdeps/mach/pt-thread-halt.c:43
    #4  0x01041365 in __pthread_exit (status=0x2) at ./pthread/pt-exit.c:118
    #5  0x01040e78 in entry_point (start_routine=0x80494b0 <thr_writer>, arg=0x2) at ./pthread/pt-create.c:50
    #6  0x00000000 in ?? ()

    Thread 3 (Thread 14933.3):
    #0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
    #1  0x01068074 in __mach_msg (msg=0x23ffe34, option=1282, send_size=0, rcv_size=40, rcv_name=122, timeout=10, notify=0) at msg.c:115
    #2  0x0106ece3 in _hurd_select (nfds=0, pollfds=0x0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x23ffefc, sigmask=0x0) at hurdselect.c:382
    #3  0x0115875b in __poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/mach/hurd/poll.c:48
    #4  0x0804a1bc in urcu_adaptative_busy_wait (wait=0x23fff48) at ../urcu-wait.h:164
    #5  synchronize_rcu_mb () at ../urcu.c:329
    #6  0x0804946c in rcu_gc_clear_queue (wtidx=wtidx@entry=1) at test_urcu_gc.c:241
    #7  0x080495e6 in rcu_gc_reclaim (old=<optimized out>, wtidx=1) at test_urcu_gc.c:264
    #8  thr_writer (data=0x1) at test_urcu_gc.c:295
    #9  0x01040e70 in entry_point (start_routine=0x80494b0 <thr_writer>, arg=0x1) at ./pthread/pt-create.c:50
    #10 0x00000000 in ?? ()

    Thread 2 (Thread 14933.2):
    #0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
    #1  0x01068074 in __mach_msg (msg=0x17fdf30, option=3, send_size=32, rcv_size=4096, rcv_name=95, timeout=0, notify=0) at msg.c:115
    #2  0x01068799 in __mach_msg_server_timeout (demux=0x1079150 <msgport_server>, max_size=4096, rcv_name=95, option=0, timeout=0) at msgserver.c:151
    #3  0x0106886b in __mach_msg_server (demux=0x1079150 <msgport_server>, max_size=4096, rcv_name=95) at msgserver.c:196
    #4  0x0107911f in _hurd_msgport_receive () at msgportdemux.c:68
    #5  0x01040e70 in entry_point (start_routine=0x10790b0 <_hurd_msgport_receive>, arg=0x0) at ./pthread/pt-create.c:50
    #6  0x00000000 in ?? ()

    Thread 1 (Thread 14933.1):
    #0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
    #1  0x01068074 in __mach_msg (msg=0x15ff93c, option=2, send_size=0, rcv_size=24, rcv_name=94, timeout=0, notify=0) at msg.c:115
    #2  0x010451a2 in __pthread_block (thread=0x805e600) at ../libpthread/sysdeps/mach/pt-block.c:35
    #3  0x010443a8 in __pthread_cond_timedwait_internal (cond=0x80730dc, mutex=0x80730bc, abstime=0x0) at ./pthread/../sysdeps/generic/pt-cond-timedwait.c:130
    #4  0x01043fcc in __pthread_cond_wait (cond=0x80730dc, mutex=0x80730bc) at ./pthread/../sysdeps/generic/pt-cond-wait.c:36
    #5  0x010414ef in pthread_join (thread=8, status=status@entry=0x15ffa6c) at ./pthread/pt-join.c:46
    #6  0x08048f9b in main (argc=8, argv=0x15ffb08) at test_urcu_gc.c:466
    (gdb) thread 3
    [Switching to thread 3 (Thread 14933.3)]
    #0  0x0106785c in mach_msg_trap () at /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S:2
    2       /home/rbraun/devel/debian/packages/eglibc/eglibc-2.13/build-tree/hurd-i386-libc/mach/mach_msg_trap.S: No such file or directory.
    (gdb) call pthread_self()
    $1 = 8

That is, Thread 1 is blocked in `pthread_join` on Thread 3 (whose
`pthread_self()` is 8), which is stuck in `poll`.

Is this really the [[libpthread_cancellation_points]] issue? There doesn't
seem to be any thread cancellation involved.
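To help tell the two situations apart, the following is a minimal sketch (not taken from the report) that checks whether `poll` acts as a deferred cancellation point, in the same spirit as the `read` test case at the top of this page. The names `block_in_poll`, `poll_then_testcancel` and `cancel_and_join` are hypothetical helpers introduced only for illustration. The second worker assumes that an explicit `pthread_testcancel` is honoured even where blocking calls are not cancellation points; that assumption has not been verified on an affected system.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker: blocks in poll with no descriptors and no timeout.
   It can only end if poll acts as a deferred cancellation point.  */
void *
block_in_poll (void *unused)
{
  (void) unused;
  while (1)
    poll (NULL, 0, -1);
}

/* Hypothetical worker: bounded waits plus an explicit cancellation check,
   so termination does not rely on poll itself being a cancellation
   point (assumption: pthread_testcancel works as POSIX specifies).  */
void *
poll_then_testcancel (void *unused)
{
  (void) unused;
  while (1)
    {
      poll (NULL, 0, 10);       /* wait at most 10 ms */
      pthread_testcancel ();
    }
}

/* Start WORKER, give it time to block, then cancel and join it.
   Returns 1 if the thread ended through cancellation.  Does not return
   if the cancellation request is never acted upon, which is the hang
   described on this page.  */
int
cancel_and_join (void *(*worker) (void *))
{
  pthread_t t;
  void *status = NULL;
  int err;

  err = pthread_create (&t, NULL, worker, NULL);
  assert (err == 0);
  sleep (1);
  err = pthread_cancel (t);
  assert (err == 0);
  err = pthread_join (t, &status);
  assert (err == 0);
  return status == PTHREAD_CANCELED;
}
```

Calling `cancel_and_join (block_in_poll)` from `main` should return 1 where `poll` is a working cancellation point; a hang in `pthread_join` there would point at the cancellation point issue, while a hang without any `pthread_cancel` involved, as in the backtrace above, would suggest a different cause.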