summaryrefslogtreecommitdiff
path: root/open_issues/term_blocking.mdwn
diff options
context:
space:
mode:
authorSamuel Thibault <samuel.thibault@ens-lyon.org>2015-02-18 00:58:35 +0100
committerSamuel Thibault <samuel.thibault@ens-lyon.org>2015-02-18 00:58:35 +0100
commit49a086299e047b18280457b654790ef4a2e5abfa (patch)
treec2b29e0734d560ce4f58c6945390650b5cac8a1b /open_issues/term_blocking.mdwn
parente2b3602ea241cd0f6bc3db88bf055bee459028b6 (diff)
Revert "rename open_issues.mdwn to service_solahart_jakarta_selatan__082122541663.mdwn"
This reverts commit 95878586ec7611791f4001a4ee17abf943fae3c1.
Diffstat (limited to 'open_issues/term_blocking.mdwn')
-rw-r--r--open_issues/term_blocking.mdwn339
1 files changed, 339 insertions, 0 deletions
diff --git a/open_issues/term_blocking.mdwn b/open_issues/term_blocking.mdwn
new file mode 100644
index 00000000..1c8816e1
--- /dev/null
+++ b/open_issues/term_blocking.mdwn
@@ -0,0 +1,339 @@
+[[!meta copyright="Copyright © 2009, 2011, 2012, 2013 Free Software Foundation,
+Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_hurd]]
+
+There must be some blocking / dead-locking (?) problem in `term`.
+
+[[!toc]]
+
+
+# Original Findings
+
+ # w | grep [t]sch
+ tschwing p1 192.168.10.60: Tue 8PM 0:03 2172 /bin/bash
+ tschwing p2 192.168.10.60: Tue 4PM 40hrs 689 emacs
+ tschwing p3 192.168.10.60: 8:52PM 11:37 15307 /bin/bash
+ tschwing p0 192.168.10.60: 6:42PM 11:47 8104 /bin/bash
+ tschwing p8 192.168.10.60: 8:27AM 0:02 16510 /bin/bash
+
+Now open a new screen window, or login shell, or...
+
+ # ps -Af | tail
+ [...]
+ tschwinge 16538 676 p6 0:00.08 /bin/bash
+ root 16554 128 co 0:00.09 ps -Af
+ root 16555 128 co 0:00.01 tail
+
+`bash` is started (on `p6`), but newer makes it to the shell promt; doesn't
+even start to execute `.bash_profile` / `.bashrc`. The next shell started, on
+the next available pseudoterminal, will work without problems.
+
+The `term` on `p6` has already been running before:
+
+ # ps -Af | grep [t]typ6
+ root 6871 3 - 5:45.86 /hurd/term /dev/ptyp6 pty-master /dev/ttyp6
+
+In this situation, `w` will sometimes report erroneous values for *IDLE*
+for the process using that terminal.
+
+Killed that `term` instance, and things were fine again.
+
+
+All this reproducible happens while running the [[GDB testsuite|gdb]].
+
+---
+
+Have a freshly started shell blocking on such a `term` instance.
+
+ $ ps -F hurd-long -p 1766 -T -Q
+ PID TH# UID PPID PGrp Sess TH Vmem RSS %CPU User System Args
+ 1766 0 3 1 1 6 131M 1.14M 0.0 0:28.85 5:40.91 /hurd/term /dev/ptyp3 pty-master /dev/ttyp3
+ 0 0.0 0:05.76 1:08.48
+ 1 0.0 0:00.00 0:00.01
+ 2 0.0 0:06.40 1:11.52
+ 3 0.0 0:05.76 1:09.89
+ 4 0.0 0:05.42 1:06.74
+ 5 0.0 0:05.50 1:04.25
+
+... and after 5:45 h:
+
+ $ ps -F hurd-long -p 21987 -T -Q
+ PID TH# UID PPID PGrp Sess TH Vmem RSS %CPU User System Args
+ 21987 1001 676 21987 21987 2 148M 2.03M 0.0 0:00.02 0:00.07 /bin/bash
+ 0 0.0 0:00.02 0:00.07
+ 1 0.0 0:00.00 0:00.00
+
+ $ ps -F hurd-long -p 1766 -T -Q
+ PID TH# UID PPID PGrp Sess TH Vmem RSS %CPU User System Args
+ 1766 0 3 1 1 6 131M 1.14M 0.0 0:29.04 5:42.38 /hurd/term /dev/ptyp3 pty-master /dev/ttyp3
+ 0 0.0 0:05.76 1:08.48
+ 1 0.0 0:00.00 0:00.01
+ 2 0.0 0:06.41 1:11.90
+ 3 0.0 0:05.82 1:10.28
+ 4 0.0 0:05.52 1:07.06
+ 5 0.0 0:05.52 1:04.63
+
+ $ sudo gdb /hurd/term 1766
+ [sudo] password for tschwinge:
+ GNU gdb (GDB) 7.0-debian
+ Copyright (C) 2009 Free Software Foundation, Inc.
+ License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+ This is free software: you are free to change and redistribute it.
+ There is NO WARRANTY, to the extent permitted by law. Type "show copying"
+ and "show warranty" for details.
+ This GDB was configured as "i486-gnu".
+ For bug reporting instructions, please see:
+ <http://www.gnu.org/software/gdb/bugs/>...
+ Reading symbols from /hurd/term...Reading symbols from /usr/lib/debug/hurd/term...done.
+ (no debugging symbols found)...done.
+ Attaching to program `/hurd/term', pid 1766
+ [New Thread 1766.1]
+ [New Thread 1766.2]
+ [New Thread 1766.3]
+ [New Thread 1766.4]
+ [New Thread 1766.5]
+ [New Thread 1766.6]
+ Reading symbols from /lib/libhurdbugaddr.so.0.3...Reading symbols from /usr/lib/debug/lib/libhurdbugaddr.so.0.3...
+ [System doesn't respond anymore, but no kernel crash.]
+
+---
+
+The very same behavior is still observable as of 2011-03-24.
+
+Next: rebooted; on console started root shell, screen, a few spare windows; as
+user started GDB test suite, noticed the PTY it's using; in a root shell
+started GDB (the system one, for `.debug` stuff) on `/hurd/term`, `set
+noninvasive on`, attach to the *term* that GDB is using.
+
+---
+
+[[2011-07-04]].
+
+---
+
+2012-11-05
+
+Log file from a 2011-09-07 run:
+
+ [...]
+ Running ../../../master/gdb/testsuite/gdb.base/readline.exp ...
+ spawn [...]/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory [...]/gdb/testsuite/../data-directory
+ GNU gdb (GDB) 7.3.50.20110906-cvs
+ Copyright (C) 2011 Free Software Foundation, Inc.
+ License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+ This is free software: you are free to change and redistribute it.
+ There is NO WARRANTY, to the extent permitted by law. Type "show copying"
+ and "show warranty" for details.
+ This GDB was configured as "i686-unknown-gnu0.3".
+ For bug reporting instructions, please see:
+ <http://www.gnu.org/software/gdb/bugs/>.
+ (gdb) set height 0
+ (gdb) set width 0
+ (gdb) dir
+ Reinitialize source path to empty? (y or n) y
+ Source directories searched: $cdir:$cwd
+ (gdb) dir ../../../master/gdb/testsuite/gdb.base
+ Source directories searched: [...]/gdb/testsuite/../../../master/gdb/testsuite/gdb.base:$cdir:$cwd
+ (gdb) p 1
+ $1 = 1
+ PASS: gdb.base/readline.exp: Simple operate-and-get-next - send p 1
+ (gdb) p 2
+ $2 = 2
+ PASS: gdb.base/readline.exp: Simple operate-and-get-next - send p 2
+ (gdb) p 3
+ $3 = 3
+ PASS: gdb.base/readline.exp: Simple operate-and-get-next - send p 3
+ (gdb) p 3(gdb) p 3PASS: gdb.base/readline.exp: Simple operate-and-get-next - C-p to p 3
+ ^H2(gdb) p 2PASS: gdb.base/readline.exp: Simple operate-and-get-next - C-p to p 2
+ ^H1(gdb) p 1PASS: gdb.base/readline.exp: Simple operate-and-get-next - C-p to p 1
+ ^OFAIL: gdb.base/readline.exp: Simple operate-and-get-next - C-o for p 1
+ FAIL: gdb.base/readline.exp: operate-and-get-next with secondary prompt - send if 1 > 0
+ FAIL: gdb.base/readline.exp: print 42 (timeout)
+ FAIL: gdb.base/readline.exp: arrow keys with secondary prompt (timeout)
+ spawn [...]/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory [...]/gdb/testsuite/../data-directory
+ ERROR: (timeout) GDB never initialized after 10 seconds.
+ ERROR: no fileid for coulomb
+ ERROR: no fileid for coulomb
+ UNRESOLVED: gdb.base/readline.exp: Simple operate-and-get-next - send p 7
+ testcase ../../../master/gdb/testsuite/gdb.base/readline.exp completed in 646 seconds
+ Running ../../../master/gdb/testsuite/gdb.base/wchar.exp ...
+ Executing on host: gcc -c -g -o [...]/gdb/testsuite/gdb.base/wchar0.o ../../../master/gdb/testsuite/gdb.base/wchar.c (timeout = 300)
+ spawn gcc -c -g -o [...]/gdb/testsuite/gdb.base/wchar0.o ../../../master/gdb/testsuite/gdb.base/wchar.c
+ Executing on host: gcc [...]/gdb/testsuite/gdb.base/wchar0.o -g -lm -o [...]/gdb/testsuite/gdb.base/wchar (timeout = 300)
+ spawn gcc [...]/gdb/testsuite/gdb.base/wchar0.o -g -lm -o [...]/gdb/testsuite/gdb.base/wchar
+ get_compiler_info: gcc-4-6-1
+ spawn [...]/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory [...]/gdb/testsuite/../data-directory
+ ERROR: (timeout) GDB never initialized after 10 seconds.
+ ERROR: no fileid for coulomb
+ ERROR: no fileid for coulomb
+ ERROR: no fileid for coulomb
+ ERROR: couldn't load [...]/gdb/testsuite/gdb.base/wchar into [...]/gdb/testsuite/../../gdb/gdb (timed out).
+ ERROR: no fileid for coulomb
+ ERROR: Delete all breakpoints in delete_breakpoints (timeout)
+ ERROR: no fileid for coulomb
+ UNRESOLVED: gdb.base/wchar.exp: setting breakpoint at wchar.c:34 (timeout)
+ testcase ../../../master/gdb/testsuite/gdb.base/wchar.exp completed in 797 seconds
+ [...]
+
+
+# IRC, freenode, #hurd, 2012-08-09
+
+In context of the [[select]] issue.
+
+ <braunr> i wonder where the tty allocation is made
+ <braunr> it could simply be that current applications don't handle old BSD
+ ptys correctly
+ <braunr> hm no, allocation is fine
+ <braunr> does someone know why there is no term instance for /dev/ttypX ?
+ <braunr> showtrans says "/hurd/term /dev/ttyp0 pty-slave /dev/ptyp0" though
+ <youpi> braunr: /dev/ttypX share the same translator with /dev/ptypX
+ <braunr> youpi: but how ?
+ <youpi> see the main function of term
+ <youpi> it attaches itself to the other node
+ <youpi> with file_set_translator
+ <youpi> just like pfinet can attach itself to /servers/socket/26 too
+ <braunr> youpi: isn't there a possible race when the same translator tries
+ to sets itself on several nodes ?
+ <youpi> I don't know
+ <tschwinge> There is.
+ <braunr> i guess it would just faikl
+ <braunr> fail
+ <tschwinge> I remember some discussion about this, possibly in context of
+ the IPv6 project.
+ <braunr> gdb shows weird traces in term
+ <braunr> i got this earlier today: http://www.sceen.net/~rbraun/gdb.txt
+ <braunr> 0x805e008 is the ptyctl, the trivs control for the pty
+ <tschwinge> braunr: How do you mean »weird«?
+ <braunr> tschwinge: some peropen (po) are never destroyed
+ <tschwinge> Well, can't they possibly still be open?
+ <braunr> they shouldn't
+ <braunr> that's why term doesn't close cleany, why select still reports
+ readiness, and why screen loops on it
+ <braunr> (and why each ssh session uses a different pty)
+ <tschwinge> ... but only on darnassus, I think? (I think I haven't seen
+ this anywhere else.)
+ <braunr> really ?
+ <braunr> i had it on my virtual machines too
+ <tschwinge> But perhaps I've always been rebooting systems quickly enough
+ to not notice.
+ <tschwinge> OK, I'll have a look next time I boot mine.
+ <braunr> i suppose it's why you can't login anymore quickly when syslog is
+ running
+
+[[syslog]]?
+
+ <braunr> i've traced the problem to ptyio.c, where pty_open_hook returns
+ EBUSY because ptyopen is still true
+ <braunr> ptyopen remains true because pty_po_create_hook doesn't get called
+ <youpi> tschwinge: I've seen the pty issue on exodar too, and on my qemu
+ image too
+ <braunr> err, pty_po_destroy_hook
+ <tschwinge> OK.
+ <braunr> and pty_po_destroy_hook doesn't get called from users.c because
+ po->cntl != ptyctl
+ <braunr> which means, somehow, the pty never gets closed
+ <youpi> oddly enough it seems to happen on all qemu systems I have, and no
+ xen system I have
+ <braunr> Oo
+ <braunr> are they all (xen and qemu) up to date ?
+ <braunr> (so we can remove versions as a factor)
+ <tschwinge> Aha. I only hve Xen and real hardware.
+ <youpi> braunr: no
+ <braunr> youpi: do you know any obscur site about ptys ? :)
+ <youpi> no
+ <youpi> well, actually yes
+ <youpi> http://dept-info.labri.fr/~thibault/a (in french)
+ <braunr> :D
+ <braunr> http://www.linusakesson.net/programming/tty/index.php looks
+ interesting
+ <youpi> indeed
+
+
+## IRC, freenode, #hurdfr, 2012-08-09
+
+ <braunr> youpi: ce que j'ai le plus de mal à comprendre, c'est ce qu'est un
+ "controlling tty"
+ <youpi> c'est le plus obscur d'obscur :)
+ <braunr> s'il est exclusif à une appli, comment ça doit se comporter sur un
+ fork, etc..
+ <youpi> de manière simple, c'est ce qui permet de faire ^C
+ <braunr> eh oui, et c'est sûrement là que ça explose
+ <youpi> c'est pas exclusif, c'est hérité
+ <braunr>
+ http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/bernstein-on-ttys/cttys.html
+
+
+## IRC, freenode, #hurd, 2012-08-10
+
+ <braunr> youpi: and just to be sure about the test procedure, i log on a
+ system, type tty, see e.g. ttyp0, log out, and in again, then tty returns
+ ttyp1, etc..
+ <youpi> yes
+ <braunr> youpi: and an open (e.g. cat) on /dev/ptyp0 returns EBUSY
+ <youpi> indeed
+ <braunr> so on xen it doesn't
+ <braunr> grmbl
+ <youpi> I've never seen it, more precisely
+ <braunr> i also have the problem with a non-accelerated qemu
+ <braunr> antrik: do you have the term problems we've seen on your bare
+ hardware ?
+ <antrik> I'm not sure what problem you are seeing exactly :-)
+ <braunr> antrik: when logging through ssh, tty first returns ttyp0, and the
+ second time (after logging out from the first session) ttyp1
+ <braunr> antrik: and term servers that have been used are then stuck in a
+ busy state
+ <antrik> braunr: my ptys seem to be reused just fine
+ <braunr> or perhaps they didn't have the bug
+ <braunr> antrik: that's so weird
+ <antrik> (I do *sometimes* get hanging ptys, but that's a different issue
+ -- these are *not* busy; they just hang when reused...)
+ <braunr> antrik: yes i saw that too
+ <antrik> braunr: note though that my hurd package is many months old...
+ <antrik> (in fact everything on this system)
+ <braunr> antrik: i didn't see anything relevant about the term server in
+ years
+ <braunr> antrik: what shell do you use ?
+ <antrik> yeah, but such errors could be caused by all kinds of changes in
+ other parts of the Hurd, glibc, whatever...
+ <antrik> bash
+
+
+## IRC, freenode, #hurd, 2012-12-27
+
+ <youpi> we however have a similar symptom with screen
+ <youpi> shells don't terminate
+ <braunr> yes
+ <youpi> or at least the window doesn't close
+ <braunr> the screen problem is the same as the term servers not being properly closed
+ <youpi> k
+ <braunr> that one is still on my todo list
+ <braunr> and not easy
+ <youpi> like so many small items on the TODO lists :)
+ <braunr> that one is an important one :)
+ <braunr> because we're still using legacy pty, the number of terms is
+ limited
+ <braunr> which means at some point we can't log in any more using them
+ <braunr> (i regularly kill pty terms on darnassus to avoid that)
+ <braunr> it prevents screen and rsyslogd iirc from working correctly, which
+ is very annoying
+ <braunr> there may be other issues
+
+
+# Formal Verification
+
+This issue may be a simple programming error, or it may be more complicated.
+
+Methods of [[formal_verification]] should be applied to confirm that there is
+no error in `/hurd/term`'s logic itself. There are tools for formal
+verification/[[code_analysis]] that can likely help here.
+
+There is a [[!FF_project 277]][[!tag bounty]] on this task.