From 95878586ec7611791f4001a4ee17abf943fae3c1 Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/a/g3Ccalpj0NhN566pHbUl6i9QF0QEkrhlfPM-#b1c14" Date: Mon, 16 Feb 2015 20:08:03 +0100 Subject: rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn --- .../term_blocking.mdwn | 339 +++++++++++++++++++++ 1 file changed, 339 insertions(+) create mode 100644 service_solahart_jakarta_selatan__082122541663/term_blocking.mdwn (limited to 'service_solahart_jakarta_selatan__082122541663/term_blocking.mdwn') diff --git a/service_solahart_jakarta_selatan__082122541663/term_blocking.mdwn b/service_solahart_jakarta_selatan__082122541663/term_blocking.mdwn new file mode 100644 index 00000000..1c8816e1 --- /dev/null +++ b/service_solahart_jakarta_selatan__082122541663/term_blocking.mdwn @@ -0,0 +1,339 @@ +[[!meta copyright="Copyright © 2009, 2011, 2012, 2013 Free Software Foundation, +Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +There must be some blocking / dead-locking (?) problem in `term`. + +[[!toc]] + + +# Original Findings + + # w | grep [t]sch + tschwing p1 192.168.10.60: Tue 8PM 0:03 2172 /bin/bash + tschwing p2 192.168.10.60: Tue 4PM 40hrs 689 emacs + tschwing p3 192.168.10.60: 8:52PM 11:37 15307 /bin/bash + tschwing p0 192.168.10.60: 6:42PM 11:47 8104 /bin/bash + tschwing p8 192.168.10.60: 8:27AM 0:02 16510 /bin/bash + +Now open a new screen window, or login shell, or... + + # ps -Af | tail + [...] 
+ tschwinge 16538 676 p6 0:00.08 /bin/bash + root 16554 128 co 0:00.09 ps -Af + root 16555 128 co 0:00.01 tail + +`bash` is started (on `p6`), but never makes it to the shell prompt; doesn't +even start to execute `.bash_profile` / `.bashrc`. The next shell started, on +the next available pseudoterminal, will work without problems. + +The `term` on `p6` has already been running before: + + # ps -Af | grep [t]typ6 + root 6871 3 - 5:45.86 /hurd/term /dev/ptyp6 pty-master /dev/ttyp6 + +In this situation, `w` will sometimes report erroneous values for *IDLE* +for the process using that terminal. + +Killed that `term` instance, and things were fine again. + + +All of this happens reproducibly while running the [[GDB testsuite|gdb]]. + +--- + +Have a freshly started shell blocking on such a `term` instance. + + $ ps -F hurd-long -p 1766 -T -Q + PID TH# UID PPID PGrp Sess TH Vmem RSS %CPU User System Args + 1766 0 3 1 1 6 131M 1.14M 0.0 0:28.85 5:40.91 /hurd/term /dev/ptyp3 pty-master /dev/ttyp3 + 0 0.0 0:05.76 1:08.48 + 1 0.0 0:00.00 0:00.01 + 2 0.0 0:06.40 1:11.52 + 3 0.0 0:05.76 1:09.89 + 4 0.0 0:05.42 1:06.74 + 5 0.0 0:05.50 1:04.25 + +... and after 5:45 h: + + $ ps -F hurd-long -p 21987 -T -Q + PID TH# UID PPID PGrp Sess TH Vmem RSS %CPU User System Args + 21987 1001 676 21987 21987 2 148M 2.03M 0.0 0:00.02 0:00.07 /bin/bash + 0 0.0 0:00.02 0:00.07 + 1 0.0 0:00.00 0:00.00 + + $ ps -F hurd-long -p 1766 -T -Q + PID TH# UID PPID PGrp Sess TH Vmem RSS %CPU User System Args + 1766 0 3 1 1 6 131M 1.14M 0.0 0:29.04 5:42.38 /hurd/term /dev/ptyp3 pty-master /dev/ttyp3 + 0 0.0 0:05.76 1:08.48 + 1 0.0 0:00.00 0:00.01 + 2 0.0 0:06.41 1:11.90 + 3 0.0 0:05.82 1:10.28 + 4 0.0 0:05.52 1:07.06 + 5 0.0 0:05.52 1:04.63 + + $ sudo gdb /hurd/term 1766 + [sudo] password for tschwinge: + GNU gdb (GDB) 7.0-debian + Copyright (C) 2009 Free Software Foundation, Inc. + License GPLv3+: GNU GPL version 3 or later + This is free software: you are free to change and redistribute it. 
+ There is NO WARRANTY, to the extent permitted by law. Type "show copying" + and "show warranty" for details. + This GDB was configured as "i486-gnu". + For bug reporting instructions, please see: + ... + Reading symbols from /hurd/term...Reading symbols from /usr/lib/debug/hurd/term...done. + (no debugging symbols found)...done. + Attaching to program `/hurd/term', pid 1766 + [New Thread 1766.1] + [New Thread 1766.2] + [New Thread 1766.3] + [New Thread 1766.4] + [New Thread 1766.5] + [New Thread 1766.6] + Reading symbols from /lib/libhurdbugaddr.so.0.3...Reading symbols from /usr/lib/debug/lib/libhurdbugaddr.so.0.3... + [System doesn't respond anymore, but no kernel crash.] + +--- + +The very same behavior is still observable as of 2011-03-24. + +Next: rebooted; on console started root shell, screen, a few spare windows; as +user started GDB test suite, noticed the PTY it's using; in a root shell +started GDB (the system one, for `.debug` stuff) on `/hurd/term`, `set +noninvasive on`, attach to the *term* that GDB is using. + +--- + +[[2011-07-04]]. + +--- + +2012-11-05 + +Log file from a 2011-09-07 run: + + [...] + Running ../../../master/gdb/testsuite/gdb.base/readline.exp ... + spawn [...]/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory [...]/gdb/testsuite/../data-directory + GNU gdb (GDB) 7.3.50.20110906-cvs + Copyright (C) 2011 Free Software Foundation, Inc. + License GPLv3+: GNU GPL version 3 or later + This is free software: you are free to change and redistribute it. + There is NO WARRANTY, to the extent permitted by law. Type "show copying" + and "show warranty" for details. + This GDB was configured as "i686-unknown-gnu0.3". + For bug reporting instructions, please see: + . + (gdb) set height 0 + (gdb) set width 0 + (gdb) dir + Reinitialize source path to empty? 
(y or n) y + Source directories searched: $cdir:$cwd + (gdb) dir ../../../master/gdb/testsuite/gdb.base + Source directories searched: [...]/gdb/testsuite/../../../master/gdb/testsuite/gdb.base:$cdir:$cwd + (gdb) p 1 + $1 = 1 + PASS: gdb.base/readline.exp: Simple operate-and-get-next - send p 1 + (gdb) p 2 + $2 = 2 + PASS: gdb.base/readline.exp: Simple operate-and-get-next - send p 2 + (gdb) p 3 + $3 = 3 + PASS: gdb.base/readline.exp: Simple operate-and-get-next - send p 3 + (gdb) p 3(gdb) p 3PASS: gdb.base/readline.exp: Simple operate-and-get-next - C-p to p 3 + ^H2(gdb) p 2PASS: gdb.base/readline.exp: Simple operate-and-get-next - C-p to p 2 + ^H1(gdb) p 1PASS: gdb.base/readline.exp: Simple operate-and-get-next - C-p to p 1 + ^OFAIL: gdb.base/readline.exp: Simple operate-and-get-next - C-o for p 1 + FAIL: gdb.base/readline.exp: operate-and-get-next with secondary prompt - send if 1 > 0 + FAIL: gdb.base/readline.exp: print 42 (timeout) + FAIL: gdb.base/readline.exp: arrow keys with secondary prompt (timeout) + spawn [...]/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory [...]/gdb/testsuite/../data-directory + ERROR: (timeout) GDB never initialized after 10 seconds. + ERROR: no fileid for coulomb + ERROR: no fileid for coulomb + UNRESOLVED: gdb.base/readline.exp: Simple operate-and-get-next - send p 7 + testcase ../../../master/gdb/testsuite/gdb.base/readline.exp completed in 646 seconds + Running ../../../master/gdb/testsuite/gdb.base/wchar.exp ... 
+ Executing on host: gcc -c -g -o [...]/gdb/testsuite/gdb.base/wchar0.o ../../../master/gdb/testsuite/gdb.base/wchar.c (timeout = 300) + spawn gcc -c -g -o [...]/gdb/testsuite/gdb.base/wchar0.o ../../../master/gdb/testsuite/gdb.base/wchar.c + Executing on host: gcc [...]/gdb/testsuite/gdb.base/wchar0.o -g -lm -o [...]/gdb/testsuite/gdb.base/wchar (timeout = 300) + spawn gcc [...]/gdb/testsuite/gdb.base/wchar0.o -g -lm -o [...]/gdb/testsuite/gdb.base/wchar + get_compiler_info: gcc-4-6-1 + spawn [...]/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory [...]/gdb/testsuite/../data-directory + ERROR: (timeout) GDB never initialized after 10 seconds. + ERROR: no fileid for coulomb + ERROR: no fileid for coulomb + ERROR: no fileid for coulomb + ERROR: couldn't load [...]/gdb/testsuite/gdb.base/wchar into [...]/gdb/testsuite/../../gdb/gdb (timed out). + ERROR: no fileid for coulomb + ERROR: Delete all breakpoints in delete_breakpoints (timeout) + ERROR: no fileid for coulomb + UNRESOLVED: gdb.base/wchar.exp: setting breakpoint at wchar.c:34 (timeout) + testcase ../../../master/gdb/testsuite/gdb.base/wchar.exp completed in 797 seconds + [...] + + +# IRC, freenode, #hurd, 2012-08-09 + +In context of the [[select]] issue. + + i wonder where the tty allocation is made + it could simply be that current applications don't handle old BSD + ptys correctly + hm no, allocation is fine + does someone know why there is no term instance for /dev/ttypX ? + showtrans says "/hurd/term /dev/ttyp0 pty-slave /dev/ptyp0" though + braunr: /dev/ttypX share the same translator with /dev/ptypX + youpi: but how ? + see the main function of term + it attaches itself to the other node + with file_set_translator + just like pfinet can attach itself to /servers/socket/26 too + youpi: isn't there a possible race when the same translator tries + to sets itself on several nodes ? + I don't know + There is. 
+ i guess it would just faikl + fail + I remember some discussion about this, possibly in context of + the IPv6 project. + gdb shows weird traces in term + i got this earlier today: http://www.sceen.net/~rbraun/gdb.txt + 0x805e008 is the ptyctl, the trivs control for the pty + braunr: How do you mean »weird«? + tschwinge: some peropen (po) are never destroyed + Well, can't they possibly still be open? + they shouldn't + that's why term doesn't close cleany, why select still reports + readiness, and why screen loops on it + (and why each ssh session uses a different pty) + ... but only on darnassus, I think? (I think I haven't seen + this anywhere else.) + really ? + i had it on my virtual machines too + But perhaps I've always been rebooting systems quickly enough + to not notice. + OK, I'll have a look next time I boot mine. + i suppose it's why you can't login anymore quickly when syslog is + running + +[[syslog]]? + + i've traced the problem to ptyio.c, where pty_open_hook returns + EBUSY because ptyopen is still true + ptyopen remains true because pty_po_create_hook doesn't get called + tschwinge: I've seen the pty issue on exodar too, and on my qemu + image too + err, pty_po_destroy_hook + OK. + and pty_po_destroy_hook doesn't get called from users.c because + po->cntl != ptyctl + which means, somehow, the pty never gets closed + oddly enough it seems to happen on all qemu systems I have, and no + xen system I have + Oo + are they all (xen and qemu) up to date ? + (so we can remove versions as a factor) + Aha. I only hve Xen and real hardware. + braunr: no + youpi: do you know any obscur site about ptys ? 
:) + no + well, actually yes + http://dept-info.labri.fr/~thibault/a (in french) + :D + http://www.linusakesson.net/programming/tty/index.php looks + interesting + indeed + + +## IRC, freenode, #hurdfr, 2012-08-09 + + youpi: what I have the most trouble understanding is what a + "controlling tty" is + it's the most obscure of the obscure :) + whether it is exclusive to one application, how it should behave on a + fork, etc.. + put simply, it's what lets you do ^C + yes indeed, and that's surely where it blows up + it's not exclusive, it's inherited + + http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/bernstein-on-ttys/cttys.html + + +## IRC, freenode, #hurd, 2012-08-10 + + youpi: and just to be sure about the test procedure, i log on a + system, type tty, see e.g. ttyp0, log out, and in again, then tty returns + ttyp1, etc.. + yes + youpi: and an open (e.g. cat) on /dev/ptyp0 returns EBUSY + indeed + so on xen it doesn't + grmbl + I've never seen it, more precisely + i also have the problem with a non-accelerated qemu + antrik: do you have the term problems we've seen on your bare + hardware ? + I'm not sure what problem you are seeing exactly :-) + antrik: when logging through ssh, tty first returns ttyp0, and the + second time (after logging out from the first session) ttyp1 + antrik: and term servers that have been used are then stuck in a + busy state + braunr: my ptys seem to be reused just fine + or perhaps they didn't have the bug + antrik: that's so weird + (I do *sometimes* get hanging ptys, but that's a different issue + -- these are *not* busy; they just hang when reused...) + antrik: yes i saw that too + braunr: note though that my hurd package is many months old... + (in fact everything on this system) + antrik: i didn't see anything relevant about the term server in + years + antrik: what shell do you use ? + yeah, but such errors could be caused by all kinds of changes in + other parts of the Hurd, glibc, whatever... 
+ bash + + +## IRC, freenode, #hurd, 2012-12-27 + + we however have a similar symptom with screen + shells don't terminate + yes + or at least the window doesn't close + the screen problem is the same as the term servers not being properly closed + k + that one is still on my todo list + and not easy + like so many small items on the TODO lists :) + that one is an important one :) + because we're still using legacy pty, the number of terms is + limited + which means at some point we can't log in any more using them + (i regularly kill pty terms on darnassus to avoid that) + it prevents screen and rsyslogd iirc from working correctly, which + is very annoying + there may be other issues + + +# Formal Verification + +This issue may be a simple programming error, or it may be more complicated. + +Methods of [[formal_verification]] should be applied to confirm that there is +no error in `/hurd/term`'s logic itself. There are tools for formal +verification/[[code_analysis]] that can likely help here. + +There is a [[!FF_project 277]][[!tag bounty]] on this task. -- cgit v1.2.3