From 47e4d194dc36adfcfd2577fa4630c9fcded005d3 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Sun, 27 Oct 2013 19:15:06 +0100
Subject: IRC.

---
 hurd/translator/auth.mdwn                      |   3 +-
 hurd/translator/discussion.mdwn                |  26 +-
 hurd/translator/ext2fs.mdwn                    |  38 +-
 hurd/translator/fifo.mdwn                      |   6 +
 hurd/translator/magic.mdwn                     | 262 +++++++++++++-
 hurd/translator/mtab/discussion.mdwn           | 482 ++++++++++++++++++++++++-
 hurd/translator/proc.mdwn                      |  29 ++
 hurd/translator/procfs/jkoenig/discussion.mdwn |  82 +++++
 hurd/translator/term.mdwn                      | 207 +++++++++++
 hurd/translator/tmpfs/discussion.mdwn          |  37 ++
 10 files changed, 1131 insertions(+), 41 deletions(-)
 create mode 100644 hurd/translator/term.mdwn

(limited to 'hurd/translator')

diff --git a/hurd/translator/auth.mdwn b/hurd/translator/auth.mdwn
index 7fd4832c..10cfb3aa 100644
--- a/hurd/translator/auth.mdwn
+++ b/hurd/translator/auth.mdwn
@@ -8,7 +8,8 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
 is included in the section entitled [[GNU Free Documentation
 License|/fdl]]."]]"""]]
 
-The *auth server* (or, *authentification server*).
+The *auth server* (or, *authentification server*) is a key component managing
+[[authentication]] in a Hurd system.
 
 It is stated by `/hurd/init`.
 
diff --git a/hurd/translator/discussion.mdwn b/hurd/translator/discussion.mdwn
index e038ba84..95f5ab0c 100644
--- a/hurd/translator/discussion.mdwn
+++ b/hurd/translator/discussion.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2011, 2013 Free Software Foundation, Inc."]]
 
 [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
 id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -10,7 +10,8 @@ License|/fdl]]."]]"""]]
 
 [[!tag open_issue_documentation open_issue_hurd]]
 
-IRC, freenode, #hurd, 2011-08-25:
+
+# IRC, freenode, #hurd, 2011-08-25
 
     < frhodes> how can I replace an existing running server with a new one
     without rebooting?
@@ -23,3 +24,24 @@ IRC, freenode, #hurd, 2011-08-25:
     nature
     < antrik> in some cases, you might even be able simply to remove the old
     translator... but obviously only for non-critical stuff :-)
+
+
+# IRC, freenode, #hurd, 2013-10-21
+
+    mhmm, there is a problem with thread destruction
+
+[[open_issues/libpthread/t/fix_have_kernel_resources]].
+
+    actually, translator self destruction
+    if a request arrives after the last thread servicing a port set
+    returns from mach_msg because of a timeout, but before the translator is
+    detached from its parent, the client will get an error
+    it should very rarely happen, but if it does, we could face the
+    same kind of issues we have when a server crashes
+    e.g. sshd looping over select() returning EBADF, consuming all cpu
+    not sure we want to introduce such new issues
+
+    i don't think i'll be able to make translators disappear reliably
+    ..
+    but at least, thread consumption will correctly decrease with
+    inactivity
diff --git a/hurd/translator/ext2fs.mdwn b/hurd/translator/ext2fs.mdwn
index e2f6b044..cfd09502 100644
--- a/hurd/translator/ext2fs.mdwn
+++ b/hurd/translator/ext2fs.mdwn
@@ -163,6 +163,11 @@ small backend stores, like floppy devices.
     ok
 
 
+#### IRC, freenode, #hurd, 2013-10-08
+
+    ogi: your ext2fs patches were finally merged upstream :)
+
+
 ## Sync Interval
 
 [[!tag open_issue_hurd]]
@@ -209,39 +214,6 @@ That would be a nice improvement, but only after writeback throttling is impleme
     tschwinge: well, thanks anyway ;)
 
 
-## Increased Memory Consumption
-
-### IRC, freenode, #hurd, 2013-09-18
-
-    ext2fs is using a ginormous amount of memory on darnassus since i
-    last updated the hurd package :/
-    i wonder if my ext2fs large store patches rework have introduced a
-    regression
-    the order of magnitude here is around 1.5G virtual space :/
-    it used to take up to 3 times less before that
-    looks like my patches didn't make it into the latest hurd package
-    teythoon: looks like there definitely is a new leak in ext2fs
-    :/
-    memory only
-    the number of ports looks stable relative to file system usage
-    braunr: I tested my patches on my development machine, it's up
-    for 14 days (yay libvirt :) and never encountered problems like this
-    i've been building glibc to reach that state
-    hm, that's a heavy load indeed
-    could be the file name tracking stuff, I tried to make sure that
-    everything is freed, but I might have missed something
-    teythoon: simply running htop run shows a slight, regular increase
-    in physical memory usage in ext2fs
-    old procfs stikes again? :)
-    braunr: I see that as well... curious...
-    16:46 < teythoon> could be the file name tracking stuff, I tried
-    to make sure that everything is freed, but I might have missed something
-    how knows, maybe completely unrelated
-    the tracking patch isn't that big, I've gone over it twice today
-    and it still seems reasonable to me
-    hm
-
-
 # Documentation
 
  *
diff --git a/hurd/translator/fifo.mdwn b/hurd/translator/fifo.mdwn
index 857922fc..4132e94a 100644
--- a/hurd/translator/fifo.mdwn
+++ b/hurd/translator/fifo.mdwn
@@ -46,3 +46,9 @@ The *fifo* translator implements named pipes (FIFOs).
     gg0: got an example?
http://bugs.debian.org/629184 i didn't close it myself + + +## IRC, OFTC, #debian-hurd, 2013-10-04 + + there is new-fifo, which you can try + i guess none of us know what it was really meant for diff --git a/hurd/translator/magic.mdwn b/hurd/translator/magic.mdwn index 84bacdfb..2b0d1bf7 100644 --- a/hurd/translator/magic.mdwn +++ b/hurd/translator/magic.mdwn @@ -1,5 +1,5 @@ -[[!meta copyright="Copyright © 2006, 2007, 2008, 2010 Free Software Foundation, -Inc."]] +[[!meta copyright="Copyright © 2006, 2007, 2008, 2010, 2013 Free Software +Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -9,7 +9,13 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] -The magic translator provides `/dev/fd`. +The `magic` translator returns magic retry results, which are then resolved by +[[glibc]]'s *name lookup* routines. + +[[!toc]] + + +# `/dev/fd`. $ showtrans /dev/fd /hurd/magic --directory fd @@ -20,3 +26,253 @@ individually like this: $ ls -l /dev/fd/0 crw--w---- 1 bing tty 0, 0 Nov 19 18:00 /dev/fd/0 + + +# `/dev/tty` + + $ showtrans /dev/tty + /hurd/magic tty + + +## Open Issues + +### IRC, OFTC, #debian-hurd, 2013-06-18 + + http://www.zsh.org/mla/workers/2013/msg00547.html + + +#### IRC, OFTC, #debian-hurd, 2013-06-19 + + youpi: http://www.zsh.org/mla/workers/2013/msg00548.html -- Is + that realistic? If yes, can someone of you test it? I though would expect + that if /dev/tty exists everywhere, it's a chardev everywhere, too. 
+ that's not impossible indeed + I've noted it on my TODO list + + +#### IRC, OFTC, #debian-hurd, 2013-06-20 + + youpi: wrt the /dev/tty existance, + https://buildd.debian.org/status/fetch.php?pkg=mksh&arch=hurd-i386&ver=46-2&stamp=1371553966 + For the build logs, demonstrate that /dev/null and /dev/tty + exist: + ls: cannot access /dev/tty: No such device or address + uh?! + ah, ENODEV + so that's what we was thinking, no tty -> no /dev/tty + + +#### IRC, OFTC, #debian-hurd, 2013-09-20 + + Hi. zsh still FTBFS on Hurd due to some test failure: + https://buildd.debian.org/status/package.php?p=zsh -- IIRC I checked last + time on some porterbox and couldn't reproduce the failure there. Any + insight if /dev/tty is not accessible on the buildds inside the chroot? + Or is it no character device there? I checked on strauss and there it is + a character device. + My only other option to debug this (didn't think of that yesterday + before the upload unfortunately) would be to override dh_auto_test with + "ls -l /dev/tty; dh_auto_test". Do you think that would be helpful? + i see /dev/tty on exodar, in the root system and in the chroot + pinotree: And it is a character device? + ... in both cases? + crw--w---- 1 pino tty 0, 0 Sep 20 10:20 /dev/tty + yes + pinotree: Hrm. + (/dev in the chroot is a firmlink to the system /dev, iirc) + pinotree: What is a firmlink? :) + pinotree: /dev/tty belongs to your user in the example above. + something between a (sym)link and an union mount + pinotree: Is it possible that /dev/tty is not visible if the + buildd runs without a connected terminal? + that i'm not sure + I see. + wouldn't it be possible to skip only that check, instead of the + whole test suite? + maybe something like + tty=$(find /dev/ -name 'tty*' -type c -print) + if [[ -n $tty ]]; then / [[ -c $tty[(f)1] && ! -c $zerolength ]] + / else / print -u$ZTST_fd 'Warning: Not testing [[ -c tty ]] (no tty + found)' / [[ ! 
-c $zerolength ]] / fi + (never used zsh, so please excuse me if i wrote something silly + above) + re + pinotree: Yeah, sure. That would be one way to get the thing + building again, if that's really the cause. + i guess it would find any of the available tty* devices + it does that for block devices, why not with tty devices, after + all? :) + pinotree: I just wonder if the failing test is because the test + doesn't work properly on that architecture or because it indicates that + there is a bug in zsh which only is present on hurd. + wouldn't the change proposed above help in determining it? + If I'm sure that it's a broken test, I'll try to disable that + one. If not I'd report (more details) to upstream. :) + pinotree: Oh, indeed. + if you get no warning, then a tty device was found with find + (using its -type c option), so the failing condition would be a zsh (or + maybe something in the stack below) bug + with the warning, somehow there were no tty devices available, + hence nothing to test -c with + So basically doing a check with dash to see if we should run the + zsh test. + dash? + Well, whatever /bin/sh points to. :) + ah, do you mean because of $(find ...)? + Ah, right, -type c is from find not /bin/sh + pinotree: That's my try: + http://anonscm.debian.org/gitweb/?p=collab-maint/zsh.git;a=commitdiff;h=ba5c7320d4876deb14dba60584fcdf5d5774e13b + o_O + isn't that a bit... overcomplicated? + pinotree: Yeah, it's a little bit more complicated as the tests + itself are not pure shell code but some format on their own. + why not the "thing" i wrote earlier? + pinotree: Actually it is what I understand you wanted to do, just + with more debug output. Or I dunderstood + pinotree: Actually it is what I understand you wanted to do, just + with more debug output. Or I understood your thing wrongly. + tty=$(find /dev/ -name 'tty*' -type c -print) + if [[ -n $tty ]]; then / [[ -c $tty[(f)1] && ! 
-c + $zerolength ]] / else / print -u$ZTST_fd 'Warning: Not testing [[ -c tty + ]] (no tty found)' / [[ ! -c $zerolength ]] / fi + pinotree: Yeah, I know. + that is, putting these lines instead of the current two + tty=/dev/tty + following + imho that should be fit for upstream + pinotree: You mean inside C02cond.ztst? + yep + pinotree: No, IMHO that's a bad idea. + why? + pinotree: That file is to test the freshly compiled zsh. I can't + rely on their code if I'm testing it. + uh? + the test above for -b is basically doing the same + pinotree: Indeed. Hrm. + that's where i did c&p most of it :) + So upstream relies on -n in the testsuite before it has tested it? + Ugly. + if upstream does it, why cannot i too? :D + pinotree: You've got a point there. + Ok, rethinking. :) + otoh you could just move the testcase for -n up to that file, so + after that you know it works already + pinotree: Well, if so, upstream should do that, not me. :) + you could suggest them to, given the -n usage in the -b testcase + pinotree: Looks alphabetically sorted, so I guess that's at least + not accidentially. + pinotree: Ok, you've convinced me. :) + :D + Especially because this is upstream-suitable once it proved to fix + the Hurd FTBFS. :) + pinotree: The previous upstream code (laast change 2001) instead + of the hardcoded /dev/tty was btw "char=(/dev/tty*([1]))", so I suspect + that the find may work on Cygwin, too. + s/aa/a/ + ah, so that's that comment about globbing on cygwin was + referring to + Yep + cool, so incidentally i've solved also that small issue :9 + :) + pinotree: I hope so. :) + Then again, I hope, external commands like find are fine for + upstream. + then they should rework the already existing testcases ;) + pinotree: Ah, I fall again for the same assumptions. :) + Seems as I would really build test suites with a different + approach. 
:) + nothing bad in that, i'd say + I'd try to make the tests as far as possible independent from + other tools or features to be sure to test only the stuff I want to test. + Warning: Not testing [[ -c tty ]] (no tty found) + Interesting. I didn't expect that outside a chroot. :) + where's that? + pinotree: A plain "debuild on my Sid VM. + ah + Linux, amd64 + (and Debian of course ;-) + pinotree: Ah, my fault, I kept upstreams char= but didn't change + it in your code. :) + hehe + pinotree: Will be included in the next zsh upload. But I don't + want to upload a new package before the current one moved to testing (or + got an RC bug report to fix :-) + oh sure, that's fine + pinotree: + http://anonscm.debian.org/gitweb/?p=collab-maint/zsh.git;a=commitdiff;h=22bc9278997a8172766538a2ec6613524df03742 + (I've reverted my previous commit) + \o/ + + +#### IRC, OFTC, #debian-hurd, 2013-09-30 + + Anyone knows why the building of zsh on ironforge restarted? It + was at something like "building 4h20m" when I looked last and it now is + at "building 1h17m" but there's no old or last log, so it does still look + like the first build. + most probably got stuck + Oh, ok. + pinotree: So there are cases where the log is not kept? + looks so + when the machine crashes, yes :) + youpi: Ooops. Was that me? + no, I just rebooted the box + I didn't easily find which process to kill + Ok. Then I'll check back tomorrow morning if pinotree's fix for + zsh's test suite on hurd worked. :) + it seems to be hung on + /build/buildd-zsh_5.0.2-5-hurd-i386-vO9pnz/zsh-5.0.2/obj/Test/../Src/zsh + ../Src/zsh ../../Test/ztst.zsh ../../Test/Y02compmatch.ztst + :( + At least pinotree's patch worked as it then likely passed + C02cond.ztst. :) + youpi: For how long? There are multiple tests which take at least + 3 seconds per subtest. + one hour already + Ok. 
+ That's far too long + + +#### IRC, OFTC, #debian-hurd, 2013-10-01 + + pinotree: I've just checked + https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=hurd-i386&ver=5.0.2-5&stamp=1380608100 + manually: Your fix unfortunately seemed not to help, but another test + failed, too, and that one came later and was hence suspected as primary + failing issue. + pinotree: But "+ find: `/dev/tty': No such device or address" + gives some hint. I just have no idea, why find issues that message. + * XTaran really wonders how that message can be caused. + So find sees /dev/tty, but gets an error if it tries to access + (maybe only stat) it while not being connected to a terminal. + Bingo: This reproduces the issue (note the missing -t option to + ssh): ssh exodar.debian.net "find /dev/ -nowarn -maxdepth 1 -name 'tty*' + -type c -ls" + Even clearer: $ ssh exodar.debian.net "ls -l /dev/" | grep 'tty$' + ls: cannot access /dev/tty: No such device or address + ?????????? ? ? ? ? ? tty + I'd say this is a bug somewhere deep down, either in libc or the + kernel. + or in the console translator + pinotree: Never heard of that so far. :) + pinotree: Someone from zsh upstream suggests to use /dev/null or + /dev/zero instead of /dev/tty* -- will try that for the next upload. + ah right, /dev/null should be standard POSIX + I hope so. :) + http://pubs.opengroup.org/onlinepubs/9699919799/ check in POSIX + in any case, sorry for the troubles it is giving you... + pinotree: I'm more concerned about the hanging second test. I + think I can get that test working with using /dev/null. + Now that I've understood why the original test is failing. + pinotree: Shall I write a bug report for that issue? If so, + against which package? 
+ XTaran: not sure it is worth at this stage, having a clearer + situation on what happens could be useful + it is something that can happen sporadically, though + pinotree: Well, it seems a definitely unwanted inconsistency + between what the directory listing shows and which (pseudo) files are + accessible. Independently of where the bug resides, this needs to be + fixed IMHO. + sure, nobody denies that + pinotree: I'd call it easily reproducible. :) + not really + ... once you know where to look for. diff --git a/hurd/translator/mtab/discussion.mdwn b/hurd/translator/mtab/discussion.mdwn index 0734e1e6..973fb938 100644 --- a/hurd/translator/mtab/discussion.mdwn +++ b/hurd/translator/mtab/discussion.mdwn @@ -2103,7 +2103,245 @@ In context of [[open_issues/mig_portable_rpc_declarations]]. anyway, got to run -## IRC, freenode, #hurd, 2013-09-20 +## Memory Leak + +### IRC, freenode, #hurd, 2013-09-18 + + ext2fs is using a ginormous amount of memory on darnassus since i + last updated the hurd package :/ + i wonder if my ext2fs large store patches rework have introduced a + regression + the order of magnitude here is around 1.5G virtual space :/ + it used to take up to 3 times less before that + looks like my patches didn't make it into the latest hurd package + teythoon: looks like there definitely is a new leak in ext2fs + :/ + memory only + the number of ports looks stable relative to file system usage + braunr: I tested my patches on my development machine, it's up + for 14 days (yay libvirt :) and never encountered problems like this + i've been building glibc to reach that state + hm, that's a heavy load indeed + could be the file name tracking stuff, I tried to make sure that + everything is freed, but I might have missed something + teythoon: simply running htop run shows a slight, regular increase + in physical memory usage in ext2fs + old procfs stikes again? :) + braunr: I see that as well... curious... 
+ 16:46 < teythoon> could be the file name tracking stuff, I tried + to make sure that everything is freed, but I might have missed something + how knows, maybe completely unrelated + the tracking patch isn't that big, I've gone over it twice today + and it still seems reasonable to me + hm + + +### IRC, freenode, #hurd, 2013-09-25 + + seems like a small leak per file access + but htop makes it obvious because it makes lots of them + shouldn't be too hard to find + since it might also come from the large store patch, i'll take a + look at it + + +### IRC, freenode, #hurd, 2013-09-27 + + teythoon: found the leak :) + although its origin is weird + braunr: where is it? + i'm still building packages to make sure that's it + see + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/blob/HEAD:/libdiskfs/dir-lookup.c + which you changed in + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/06d49cdadd9e96361f3fe49b9c940b88bb869284 + line 306 is "return error" instead of "goto out" + has been so since 1994 + what is unclear is why this code path is now run + patch is here: + http://darnassus.sceen.net/~rbraun/0001-Fix-memory-leak-in-libdiskfs.patch + I see, weird indeed + teythoon: the system also feels slower somehow + such errors might have introduced unexpected retries + i think it's possible to write a coccinelle patch to find such + errors + + +### IRC, freenode, #hurd, 2013-09-28 + + braunr: bah, I havent noticed the leak on my box, even after + building eglibc & hurd several times + that's weird + are you sure it's up to date ? + also, is procfs correctly attached to /proc ? + that's what seems to trigger it + yes, 20130924-2, with procfs on /proc + + braunr: that turned out to be the leak indeed? and somehow my + changes triggered it? did you discover why? 
+ teythoon: yes, yes, no + but youpi didn't see the leak on his system + ^^ cool that you found it + I did + oh yes you mean you saw the leak + yes + + +### IRC, freenode, #hurd, 2013-10-01 + + the fix i did in libdiskfs might have fixed other issues + apparently, it's the code path taken when error isn't ENOENT, + including no error (translator started) + the memory leak fix, you mean? + yes + it might haved fixed reference counting too + although i'm not sure if we actually ever run into that issue in + the past + the weird thing is, that path is taken when starting a passive + translator + (i think) + (it might be any kind of translator, and just doing nothing if + alcready active) + already* + anyway, the fact that the leak was so visible means this code was + run very often + which doesn't make sense + hm ok, it seems that code was run every time actually + but the leak became visible when it concerned memory + which side-effects did the old code produce? + teythoon added a dynamically allocated path that wasn't freed + reference leaks + which might explain the assertion on reference we sometimes see + with ext2fs + when a counter overflows and becomes 0 + +[[open_issues/ext2fs_libports_reference_counting_assertion]]? + + hmm + which is why i'm mentioning it + :) + i'll try to reproduce the assertion + libdiskfs/node-drop.c: assert (np->dn_stat.st_size == 0); ← + this one? + yes + hm no + oho + no, not that one + no-oho + well maybe by side effect + but i doubt it + iirc you constantly get that when building ustr + (e.g., because the object was freed and reallocated quickly, + st_size has been reset, something like that) + is ustr a package ? 
+ yes + ok + thanks + pinotree: indeed, it's still present + pinotree: actually, after a more in-depth look, reference counting + looks valid before the fix too + ok, thanks for checking + pinotree: the assertion affects the root translator, and is + triggered by a test that stresses memory + memory as in ram, or as in disk storage? + malloc + ok + i suspect the code doesn't handle memory failure well + iirc the ustr tests are mostly disk-intensive + this one is really about enonmem + enomem + i'll make ext2fs print a stack trace + (might be wrong, but did not investigate further, sorry) + no worries + i'm doing it now :) + + +### IRC, freenode, #hurd, 2013-10-02 + + i've traced the problem up to truncate + which gets a negative size + shouldn't take long to find out where it comes from now + it seems our truncate doesn't handle negative values well though + EINVAL The argument length is negative or larger than the + maximum file size. + i still have to see whether it comes from the user (unlikely) or + if it's an internal inconsistency + i suspect some code wrongly handles vm_map failures + leading to that inconsistency + pinotree: looks like glibc doesn't check for length >= 0 + yeah + servers should do it nonetheless + should we fix glibc, libdiskfs/libnetfs/libtrivfs/etc, or both? + it appears a client does the truncate + i'd say both + can you take the glibc part ? :) + i was going to do the hurd part... :p + ok, i'll pick libc + well i'm doing it already + i want to write a test case first + to make sure that's the problem + already on the hurd part, you mean? + yes + ok + ok looks like it + would you share the test you are doing, so i don't need to write + it again? 
:) + * pinotree lazy + :) + as soon as darnassus is restarted + ideally we could have some repository with all the testcases + written over time to fix bugs in implementations/compatibility/etc + i noticed the system doesn't automatically reboot when e2fsck says + reboot, and no unexpected inconsistency was found + is that normal ? + or having something like posixtestsuite, but actively maintained + pinotree: polishing the test before sending it + sure, no hurry :) + i can't reproduce the assertion but it does make ext2fs freeze + pinotree: http://darnassus.sceen.net/~rbraun/test_ftruncate.c + merci + pinotree: ustr builds + wow + the client code (ustr) seems to perform a ftruncate with size + ((size_t)-1) whereas lengths are signed .. + i'll check other libraries and send a patch soon + + braunr: btw, did you fix the leak? + yes + + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/a81c0c28ea606b0d0a2ad5eeb74071c746b7cdeb + 1h after tagging 0.5 ( + :( + ah yes, I've seen that commit + I just wanted to know whether this settled the issue + it does :) + good + i still can't figure out why youpi didn't had it + the code path is run when no error (actually error != ENOENT) + which explains why the leak was so visible + so my patch exposed this b/c of the allocation I added, makes + sense + it's funny actually, b/c this wasn't an issue for me as well, I + had my development vm running on that patches for two weeks + + +### IRC, freenode, #hurd, 2013-10-03 + + youpi: i've committed a fix to hurd that checks for negative sizes + when truncating files + this allows building the ustr package without making ext2fs choke + on an assertion + pinotree is preparing a patch for glibc + see truncate/ftruncate + with an off_t size parameter, which can be negative + EINVAL The argument length is negative or larger than the + maximum file size. 
+ hurd servers were not conforming to that before my change + + +## Multiple mtab Translators Spawned + +### IRC, freenode, #hurd, 2013-09-20 teythoon: how come i see three mtab translators running ? 6 now oO @@ -2113,10 +2351,250 @@ In context of [[open_issues/mig_portable_rpc_declarations]]. teythoon: more bug fixing for you :) -## IRC, freenode, #hurd, 2013-09-23 +### IRC, freenode, #hurd, 2013-09-23 so it might be a problem with either libnetfs (which afaics has never supported passive translator records before) or procfs, but tbh I haven't investigated this yet [[open_issues/libnetfs_passive_translators]]. + +### IRC, freenode, #hurd, 2013-09-26 + + teythoon: hum, i just saw something disturbing + teythoon: to isolate the leak, i created my own proc directory + and the mtab translators it spawns seem to be owned by root oO + braunr: but how is that possible? are you sure? have you checked + with 'ids'? + no i'm not sure + also, ext2fs seems to ignore --writable when started as a passive + translator + < teythoon> braunr: but how is that possible? + messup with passive translators i guess + teythoon: actually, it looks like it has effective/available id + it has no* + this feature doesn't map well in unix + braunr: ah yes, htop doesn't handle this well and shows root + indeed, our ps shows - as username + yes + + +### [[!debbug 724868]] + + +### IRC, freenode, #hurd, 2013-10-03 + + i can't manage to find out where the hurd stores information about + active translators ... + there is this transbox per node + but where are nodes stored ? + what if they are are dropped ? + braunr: iirc, see libfshelp + well i have + i still can't find it + i fear that it works for ext2fs because that particular translator + implements a cache of open nodes + whereas things like procfs drop and recreate nodes per open + which would be the root cause for the multiple mtab bug + doesn't tmpfs support translators? 
+ good idea + although it's still a libdiskfs based one + no problem for tmpfs, so it would be a netfs/procfs issue + better than what i feared :) + now, how is libdiskfs able to find active translators .. + ah, there is a name cache in libdiskfs .. + nope, looks fine + + +### IRC, freenode, #hurd, 2013-10-04 + + nodes with a translator seem to keep a reference in libdiskfs and + not in libnetfs + mhmmpf + oh great .. + each libdiskfs that "works" seems to implement its own + diskfs_cached_lookup function + so both ext2fs and tmpfs actually maintain a list of nodes, + keeping a reference on those with a translator + while procfs simply doesn't + teythoon: ^ + *sigh* + braunr: ok, thanks, I'll look into that + i'm not sure how to fix it + we can either fix node destruction to cleanly shut down + translators + but this would mean starting mtab on each access + or we could implement a custom cache in procfs + or perhaps a very custom change in the lookup callback for mounts + i'll try the latter + err, shouldn't we try to fix this in lib*fs? + unless you really want to work on it + i dont' know + ah, so the node is destroyed but the translator is kept running? + that's what you mean by the above? + and ext2fs makes an effort of killing it in its node cleanup + code? + yes + grmbl, i'm lagging a lot + i'm not sure + ext2fs maintains it + with ext2fs, translators can only be explicitely removed + i mean, ext2fs keeps all node descriptors alive once accessed + while procfs doesn't + teythoon: ok, looks like i have a working patch that merely caches + the node for mounts + libnetfs suffers from the same leak as libdiskfs when looking up a + translator + i'll fix it too + + i installed my fixed procfs on darnassus, only one mtab :) + nice :) + now, why is there no /home in df output ? + not sure + note how /dev/tty* end up in /proc/mounts, those are passive + translators too, no? + yes + but that's a good thing i guess + or was mounts intended for file systems only ? 
+ well, in the unix traditional meaning + I think its nice too, yes + but why are they fine and your /home is not... + that's weirder + also, mounts actually doesn't show passive translators + teythoon: does your code perform any kind of comparison ? + i see /servers/socket/26 but not /servers/socket/2 + s/comparison/filter/g + hmm + well, yes, try /hurd/mtab --insecure / + (I cannot connect to darnassus from here...) + ok but that looks unrelated + both /servers/socket/26 and /servers/socket/2 refer to the same + translator + i was wondering if mtab was filtering similar entries based on + that + no + that's weird too then, isn't it ? + yes ;) + ok + btw, how is that done with the same traanslator being bound to + two nodes? settrans cannot do that, can it? + no it can't + the translator does it when started + ah + (which means there is a race if both are started simulatneously, + although it's very rare and not hard to solve) + a weird beaving translator then :) + + i have a fix for the multiple mtab issue, will send a patch + tonight + + teythoon: if ext2fs is set active, mtab output reports it + + teythoon: looks like this bug is what allows mtab not to deadlock + teythoon: when i attach it as an active translator, cat freezes + + teythoon: if (control && control->pi.port_right == fsys) + that's the filtering i was previously talking about + oh please don't name global variables "path" ... + + youpi: i fixed procfs on ironforge and exodar to be started as + procfs -c -k 3 + without -k 3, many things as simple as top and uptime won't work + + +### IRC, freenode, #hurd, 2013-10-06 + + teythoon: pty-s also bind to two nodes, not only pfinet + + +### IRC, freenode, #hurd, 2013-10-07 + + teythoon: please tell us when you're available, we need to work + out the last mtab issues + braunr: I'm available now :) + I'm sorry, I've been very busy the last two weeks, but I've + plenty of time now + great :) + did you see youpi's mail ? 
+ i have the exact same question + I did + it seems your code registers active translators + but parent translators don't seem to register them when they're + created from passive translators + or am i mistaken ? + I'll need a moment to get my hurd machine and myself up to + speed... + braunr: I concur with youpi, hooking into fshelp_fetch_root + should do just fine + I'll just try that + ok + how do you deal with mtab reporting itself ? + o_O does it do that? + no, but it should + when i set it as an active translator, i get a deadlock + hm + teythoon: before you change libfshelp, i'd like you to try + something else + use more appropriate names for global variables in mtab.c + in particular, the variable path clashes with local names + noted + teythoon: as a side note (i'm not asking to rewrite anything) + i strongly recommend a very explicit object oriented style of + coding + (or data-oriented as it's sometimes called) + use prefixes for all your interfaces so they can be made public if + needed (which acts as a namespace and avoids lots of collisions + naturally) + use "constructors" and "destructors" (functions that both allocate + and initialize) + this helps avoiding leaks a lot too + hm, I thought I did that, could you be more specific? + ok didn't see the comment + /* XXX split up */ error_t mtab_populate (... + :) + as a better example, see your code in libfshelp/translator-list.c + struct translator should have been treated as an object + this would probably have completely avoided any leaks in the first + place + braunr: right, I deviated from that style there + teythoon: these are minor details, don't mind them too much, i + just find it helps me a lot + braunr: sure, I appreciate the feedback :) + + +### IRC, freenode, #hurd, 2013-10-08 + + braunr: I'm on to the passive translator not getting registered + issue + however, removing them from the list if the active translator is + killed does not work as expected... 
I still need to fiddle with the + notifications to get this right + ok + + +### IRC, freenode, #hurd, 2013-10-16 + + braunr: btw, I fixed the 'passive translator not showing up in + proc/mounts'-issue + but 4 ports do leak each time a translator is killed and + reinstalled + this happens with passive ones as well as active ones + teythoon: is that issue tied to your changed ? + changes* + I'm not sure tbh, testing that is on my list of things to do + ok + first thing to know i guess + yes + + +## Memory Leak in `translator_ihash_cleanup` + +### IRC, freenode, #hurd, 2013-10-04 + + teythoon: isn't there a leak in translator_ihash_cleanup ? + braunr: looks like, yes + braunr: I probably forgot to add the free (element->name) when I + added the name field + teythoon: ok + teythoon: i let you fix that :p + braunr: sure ;) diff --git a/hurd/translator/proc.mdwn b/hurd/translator/proc.mdwn index d5e0960c..75bfb8fd 100644 --- a/hurd/translator/proc.mdwn +++ b/hurd/translator/proc.mdwn @@ -63,6 +63,35 @@ It is stated by `/hurd/init`. something special +## IRC, freenode, #hurd, 2013-09-25 + + so nice to finally see proc in top :) + hm cute, htop layout has become buggy, top just won't start + braunr: make sure your procfs knows the correct kernel pid + # showtrans /proc + /hurd/procfs -c -k 3 + we could have handled this nicer if procfs were integrated + upstream + we should probably just update the default + teythoon: mhm + $ fsysopts /proc + /hurd/procfs --stat-mode=444 --fake-self=1 + $ showtrans /proc + /hurd/procfs -c + -c == --stat-mode=444 --fake-self=1 + better indeed + teythoon: thanks + + +## IRC, freenode, #hurd, 2013-10-24 + + braunr: i'm using your repo and i can't see cpu percentage in htop + anymore, all zeroes, confirmed? 
+ gg0: no + gg0: you probably need to reset procfs + gg0: settrans /proc /hurd/procfs -c -k 3 + + # Process Discovery ## IRC, freenode, #hurd, 2013-08-26 diff --git a/hurd/translator/procfs/jkoenig/discussion.mdwn b/hurd/translator/procfs/jkoenig/discussion.mdwn index fc071337..018db7b2 100644 --- a/hurd/translator/procfs/jkoenig/discussion.mdwn +++ b/hurd/translator/procfs/jkoenig/discussion.mdwn @@ -436,6 +436,72 @@ Also used in `[GCC]/intl/relocatable.c`:`find_shared_library_fullname` for `#ifdef __linux__`. +### IRC, freenode, #hurd, 2013-10-03 + + what's the equivalent of cat /proc/self/maps on hurd? + camm`: for now, /proc/self doesn't work as expected + thanks, I just want to get a list of maps and protection status for + a running process -- how? + vminfo + thanks so much! I'm trying to debug an unexec failure on hurd when + a linker script is present. All works with the default script, but when + the text address is changed, unexec fails, running into a page with no + access in the middle of the executable: 0xc4b000[0x1000] (prot=0, + max_prot=RWX, offs=0xb55000) + I get a segfault when trying to read from this page. + unexec ? + emacs/gcl/maxima/acl2/hol88/axiom use unexec to dump a running + image into a saved executable elf file. + what is unexec ? + ok looks like a dirty tool + camm`: what is segfaulting, unexec or the resulting executable ? + unexec opens the file from which the running program was originally + executed, finds its section start addresses, then writes a new file + replacing any data in the old file with possibly modified versions in + running memory. The reverse of 'exec'. + the read from running memory delimited by the addresses in the + executable file is hitting a page which has been protected with *no* + access, and is segfaulting. Somehow, when the binary file is loaded, + hurd turning off all rights to this page. + let me check the stack location ... 
+ ok I think I've got it -- hurd moves the sbrk(0) address away from + the end of .data (as reported by readelf) if the addresses are low, + presumably to avoid running into the stack. + starting sbrk(0)!=.data+data_size on hurd + i'm not sure there is anything like the heap on the hurd + sbrk is probably implemented on top of mmap + camm`: hm no, i'm wrong, glibc implements brk and sbrk mostly as + expected, but remapping the area isn't atomic + "Now reallocate it with no access allowed" + then, there is a call to vm_protect + and no error checking + ... + ok, that's fine, but need to know -- in general there is no + relationship between the address returned by sbrk(0) and the .data + addresses reported by readelf on the file, (hurd only) yes? + i don't know about that + there should be .. + Specific example: readelf -a -> [24] .data PROGBITS + 000f5580 0c4580 000328 00 WA 0 0 32 + + sbrk(0)->(void *) 0x8021000 + camm`: is that on an executable or a shared object ? + executable + 000f5580 looks very low + This is using a linker script. The default setup works just fine. + I think it (might) make sense for hurd to silently do this give the + placement of the C stack, but the assumptions behind my algorithm need + changing (perhaps). + (I probe in configure the allowable range of __executable_start, + and then choose a value to either ensure a large free signed range around + NULL, or a low data start to maximize heap) + braunr: are there any guarantees of sbrk(0)==.data+size without a + linker script? + camm`: i'm not sure at all + sbrk isn't even posix + thanks + + # `/proc/[PID]/mem` Needed by glibc's `pldd` tool (commit @@ -471,3 +537,19 @@ Needed by glibc's `pldd` tool (commit both htop and top seem to have problems report the cpu time so i expect the problem to be in procfs + + +# IRC, freenode, #hurd, 2013-10-03 + + teythoon: any reason the static variable translator_exists isn't + protected by a lock in procfs/rootdir.c ? 
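A minimal sketch of what guarding such a flag with a lock could look like, assuming the flag is a lazily computed cache. Only the name `translator_exists` comes from the discussion; `probe_translator`, `probe_calls`, `translator_exists_lock` and `mounts_exists` are hypothetical stand-ins and not the code in `procfs/rootdir.c`.

```c
#include <pthread.h>

/* Hypothetical counter, only here to show how often the probe runs.  */
static int probe_calls;

/* Hypothetical stand-in for whatever check the real code performs.  */
static int
probe_translator (void)
{
  probe_calls++;
  return 1;
}

/* The lock protects both the "not probed yet" test and the store.  */
static pthread_mutex_t translator_exists_lock = PTHREAD_MUTEX_INITIALIZER;
static int translator_exists = -1;	/* -1 means "not probed yet".  */

/* Return the cached answer, probing at most once even if several
   threads call this concurrently.  */
static int
mounts_exists (void)
{
  int exists;

  pthread_mutex_lock (&translator_exists_lock);
  if (translator_exists < 0)
    translator_exists = probe_translator ();
  exists = translator_exists;
  pthread_mutex_unlock (&translator_exists_lock);

  return exists;
}
```

Without the lock, two threads could both see the "not probed yet" state and both run the probe. Whether that matters depends on how expensive or side-effecting the real probe is, which this sketch does not model.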
+ + +## IRC, freenode, #hurd, 2013-10-04 + + teythoon: can you tell me why translator_exists isn't protected + from shared access in rootdir_mounts_exists ? + braunr: hm, dunno tbh, I probably thought the race was harmless + enough + it probably is + settrans -Rg doesn't work on procfs :( diff --git a/hurd/translator/term.mdwn b/hurd/translator/term.mdwn new file mode 100644 index 00000000..667677a7 --- /dev/null +++ b/hurd/translator/term.mdwn @@ -0,0 +1,207 @@ +[[!meta copyright="Copyright © 2013 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +The *term* translator implements POSIX termios discipline. + + +# Open Issues + +## [[open_issues/Term_Blocking]] + +## Leaks/Not Re-used/Not Terminating + +[[!tag open_issue_hurd]] + + +### IRC, freenode, #hurd, 2013-10-14 + + good news + the terminal leak is related to privilege separation + I love how, as an unknowing by-stander, that is somehow good news + :-) + :) + it's a good news because 1/ we have more knowledge about the issue + and 2/ it may not even be a hurd bug + but rather an openssh-on-hurd bug + this explains why i didn't see the issue on anything else + (mach/hurd consoles, x terminals) + and this will also indirectly solve the screen lockup issue + braunr: good catch :) + s/a good news/good news/ + ah, yes, both definitely good news. Congrats on the progress. 
+ i remember we used to disable privilege separation in the past + i'll have to dig what made us use it + interesting, screen seems to be affected nonetheless + so it's something common to both screen and ssh privsep + apparently, what sshd+privse and screen have in common is a fifo + so it's probably a tricky hurd bug actually + + +### IRC, freenode, #hurd, 2013-10-16 + + pflocal is leaking ports .. + this might be what blocks terminals + * pinotree gives braunr a stick of glue + thanks + + pflocal leaks struct sock .. + grmbl + + hm nice, pflocal leaks each time a socket is bound and/or accepted + on + looks like a simple ref mess + braunr: really? + yes + a leak in pflocal feels strange, never noticed it taking lots of + memory (and it's used a lot) + it's a port leak + well + no it's both a memory and port leak + not sure which one is the root cause yet + i guess server sockets aren't automatically unbound + if you want to see the leak, just disable priv separation in ssh + (to avoid the terminal leak ....) and write a shell loop to start ssh + your_server echo hello + google shows mails about the leak in the past + i also hope it fixes the terminal leak, although i'm really not + sure :( + + +### IRC, freenode, #hurd, 2013-10-17 + + hm nice, apparently, there is no pflocal leak + but a libdiskfs one ! + since ext2fs enables the ifsock shortcut + seems like it leaks a reference on sock node deletion + braunr: have you looked at libdiskfs/dead-name.c? + braunr: I think I'm hunting a very similar problem + i'm doing it now + I had the problem of dead name notifications not being delivered + wow + b/c I held no reference to the ports_info thing, so the dead + name handler in libports could no longer find the pi struct, so the + notification was silently dropped + i see + but it looks like dropping a node makes sure the associated + sockaddr has been deleted if any + are you sure the node is dropped in the first place? 
+ no + well + i see something happenning at the pflocal side when removing the + node + but there is still a send right lingering somewhere + (see why we need a global lsof :p) + indeed + i'll try portinfo with that option we talked about + yes + 121 => 1682: send (refs: 1) + yep, ext2fs still has it + (I wonder how portinfo does that...) + i guess it imports rights from the target task + and see if it gets the same name as a local right + makes sense + easy to check + well, no, it cannot do that for receive rights + it creates an empty task just for that purpose + and uses mach_port_extract_right + but it works as you described, yes + so yes it does work for receive rights too + yes + cool :) + so it assumes identical port names are part of the ipc interface + something neal said we shouldn't rely on + iirc + yes, I remember something like that too + here is the strange thing + node->sockaddr is deallocated on a dead name notification + drop_node checks that sockaddr is null + so how can the dead name notification occur before the node is + dropped ? + so maybe the node is still around indeed + apparently, libdiskfs considers the address holds a reference on + the node + on the other hand, the server socket won't get released unless the + address gets a no-sender notification ... 
+ this should probably be turned into a weak reference + teythoon: indeed, the node is leaked + + pflocal crashes when removing correctly deallocating addresses and + removing server sockets :/ + + ok, pflocal bug fixed + still have to fix the libdiskfs leak + and libdiskfs leak fixed too + :) + i'll build hurd packages with my changes to make sure i don't + break something before comitting + and see if this fixes the term issue + + looks like my patches work just fine :) + it doesn't solve the term issue though + + so, according to portinfo, pflocal has send rights to terminals oO + + mhhhmmmmmm + openssh seems to pass terminal file descriptors through unix + sockets when using privilege separation + braunr: i a write(sock, &pid, sizeof int) (or the like)? + *ie + not pid, file descriptors + SCM_RIGHTS + ah ok + the socket send/recv interface does support passing mach ports + and the leaked ports do turn into dead names when i kill terminals + yes, we support with a patch pochu did few years ago + so it seems the leak is related to libpipe this time + ok got it :) + pflocal used copy_send instead of move_send + \o/ + that bug was such a pain + * braunr happy + :) + speaking of it, in pflocal' S_socket_recv is it correct the + "out_flags = 0;"? + nice catch + although i wonder why flags are returned + it may have been set to null to tell us that we don't want to + return flags + pfinet seems to use it + but you change a local variable anyway + yes it's not useful + hmm + out_flags is what gets in struct msghdr -> msg_flags + so i guess it makes sense to fix it to *out_flags = 0, just to be + safe + pinotree: do you want me to push it tonight along with the others + ? + yes please + ok + thanks! 
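The `out_flags = 0` versus `*out_flags = 0` point above can be shown with a small sketch. The function names are hypothetical and the signatures are simplified; this is not the actual `S_socket_recv` from pflocal, only the pointer behaviour being discussed.

```c
/* Shape of the questioned code: the assignment only overwrites the
   callee's own copy of the pointer, so the caller's integer is left
   untouched.  */
static void
clear_flags_local_only (int *out_flags)
{
  out_flags = 0;
  (void) out_flags;		/* Silence "set but not used" warnings.  */
}

/* Shape of the proposed fix: store through the pointer so the caller
   really sees zeroed flags.  The null check is an extra precaution
   added for this sketch.  */
static void
clear_flags_through_pointer (int *out_flags)
{
  if (out_flags)
    *out_flags = 0;
}
```

With a caller variable initialised to some non-zero value, the first function leaves it unchanged and the second sets it to zero, which is the difference that ends up in `msg_flags` according to the discussion.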
+ pflocal seems to not leak any memory or ports at all + great :> + + there, patches pushed :) + + +## `screen` Logout Hang + +[[!tag open_issue_hurd]] + + +### IRC, freenode, #hurd, 2013-10-14 + + i fixed term so that screen can shutdown properly + read() wouldn't return EIO after terminal hangup + + +### IRC, freenode, #hurd, 2013-10-17 + + and the missing EOI prevented screen from correctly shutting down + windows diff --git a/hurd/translator/tmpfs/discussion.mdwn b/hurd/translator/tmpfs/discussion.mdwn index 20aba837..8c332d84 100644 --- a/hurd/translator/tmpfs/discussion.mdwn +++ b/hurd/translator/tmpfs/discussion.mdwn @@ -430,3 +430,40 @@ License|/fdl]]."]]"""]] ok but that indeed means writeback of ext2fs works, which is a good sign :) + + +# IRC, freenode, #hurd, 2013-10-04 + + btw, I noticed that fifos do not work on tmpfs + teythoon: tmpfs seems limited, yes + that's annoying b/c /run is a tmpfs on Debian and sysvinit + creates a crontrol fifo there + I wonder why I didn't notice that before + also, fifos, like symlinks, can be shortcircuited in libdiskfs + i wonder if that has anything to do with the problem at hand + +[[mtab/discussion]], *Multiple mtab Translators Spawned*. + + b/c this breaks reboot & friends + I do too + b/c I cannot find any shortcut related code in tmpfs + well, it's optional normally + so that's ok + but has it really been tested when the option wasn't there ? :) + yes, but the tmpfs requests this by setting diskfs_shortcut_fifo + = 1; + hm i remember tmpfs was said to be working with + sockets/fifos/etc, back then when it was fixed + teythoon: oh + + +## IRC, freenode, #hurd, 2013-10-11 + + this will have to wait for the next hurd pkg unfortunately, b/c + I broke tmpfs by accident :-/ + how so? + the dropping of privileges broke passive translators and mkfifo + there actually is a reason why those are run as root or with the + privilege of their owner + privileges should be decoupled from identity + yes -- cgit v1.2.3