From 47e4d194dc36adfcfd2577fa4630c9fcded005d3 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Sun, 27 Oct 2013 19:15:06 +0100 Subject: IRC. --- hurd/translator/mtab/discussion.mdwn | 482 ++++++++++++++++++++++++++++++++++- 1 file changed, 480 insertions(+), 2 deletions(-) (limited to 'hurd/translator/mtab') diff --git a/hurd/translator/mtab/discussion.mdwn b/hurd/translator/mtab/discussion.mdwn index 0734e1e6..973fb938 100644 --- a/hurd/translator/mtab/discussion.mdwn +++ b/hurd/translator/mtab/discussion.mdwn @@ -2103,7 +2103,245 @@ In context of [[open_issues/mig_portable_rpc_declarations]]. anyway, got to run -## IRC, freenode, #hurd, 2013-09-20 +## Memory Leak + +### IRC, freenode, #hurd, 2013-09-18 + + ext2fs is using a ginormous amount of memory on darnassus since i + last updated the hurd package :/ + i wonder if my ext2fs large store patches rework have introduced a + regression + the order of magnitude here is around 1.5G virtual space :/ + it used to take up to 3 times less before that + looks like my patches didn't make it into the latest hurd package + teythoon: looks like there definitely is a new leak in ext2fs + :/ + memory only + the number of ports looks stable relative to file system usage + braunr: I tested my patches on my development machine, it's up + for 14 days (yay libvirt :) and never encountered problems like this + i've been building glibc to reach that state + hm, that's a heavy load indeed + could be the file name tracking stuff, I tried to make sure that + everything is freed, but I might have missed something + teythoon: simply running htop run shows a slight, regular increase + in physical memory usage in ext2fs + old procfs stikes again? :) + braunr: I see that as well... curious... + 16:46 < teythoon> could be the file name tracking stuff, I tried + to make sure that everything is freed, but I might have missed something + how knows, maybe completely unrelated + the tracking patch isn't that big, I've gone over it twice today + and it still seems reasonable to me + hm + + +### IRC, freenode, #hurd, 2013-09-25 + + seems like a small leak per file access + but htop makes it obvious because it makes lots of them + shouldn't be too hard to find + since it might also come from the large store patch, i'll take a + look at it + + +### IRC, freenode, #hurd, 2013-09-27 + + teythoon: found the leak :) + although its origin is weird + braunr: where is it? + i'm still building packages to make sure that's it + see + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/blob/HEAD:/libdiskfs/dir-lookup.c + which you changed in + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/06d49cdadd9e96361f3fe49b9c940b88bb869284 + line 306 is "return error" instead of "goto out" + has been so since 1994 + what is unclear is why this code path is now run + patch is here: + http://darnassus.sceen.net/~rbraun/0001-Fix-memory-leak-in-libdiskfs.patch + I see, weird indeed + teythoon: the system also feels slower somehow + such errors might have introduced unexpected retries + i think it's possible to write a coccinelle patch to find such + errors + + +### IRC, freenode, #hurd, 2013-09-28 + + braunr: bah, I havent noticed the leak on my box, even after + building eglibc & hurd several times + that's weird + are you sure it's up to date ? + also, is procfs correctly attached to /proc ? + that's what seems to trigger it + yes, 20130924-2, with procfs on /proc + + braunr: that turned out to be the leak indeed? and somehow my + changes triggered it? did you discover why? + teythoon: yes, yes, no + but youpi didn't see the leak on his system + ^^ cool that you found it + I did + oh yes you mean you saw the leak + yes + + +### IRC, freenode, #hurd, 2013-10-01 + + the fix i did in libdiskfs might have fixed other issues + apparently, it's the code path taken when error isn't ENOENT, + including no error (translator started) + the memory leak fix, you mean? + yes + it might haved fixed reference counting too + although i'm not sure if we actually ever run into that issue in + the past + the weird thing is, that path is taken when starting a passive + translator + (i think) + (it might be any kind of translator, and just doing nothing if + alcready active) + already* + anyway, the fact that the leak was so visible means this code was + run very often + which doesn't make sense + hm ok, it seems that code was run every time actually + but the leak became visible when it concerned memory + which side-effects did the old code produce? + teythoon added a dynamically allocated path that wasn't freed + reference leaks + which might explain the assertion on reference we sometimes see + with ext2fs + when a counter overflows and becomes 0 + +[[open_issues/ext2fs_libports_reference_counting_assertion]]? + + hmm + which is why i'm mentioning it + :) + i'll try to reproduce the assertion + libdiskfs/node-drop.c: assert (np->dn_stat.st_size == 0); ← + this one? + yes + hm no + oho + no, not that one + no-oho + well maybe by side effect + but i doubt it + iirc you constantly get that when building ustr + (e.g., because the object was freed and reallocated quickly, + st_size has been reset, something like that) + is ustr a package ? + yes + ok + thanks + pinotree: indeed, it's still present + pinotree: actually, after a more in-depth look, reference counting + looks valid before the fix too + ok, thanks for checking + pinotree: the assertion affects the root translator, and is + triggered by a test that stresses memory + memory as in ram, or as in disk storage? + malloc + ok + i suspect the code doesn't handle memory failure well + iirc the ustr tests are mostly disk-intensive + this one is really about enonmem + enomem + i'll make ext2fs print a stack trace + (might be wrong, but did not investigate further, sorry) + no worries + i'm doing it now :) + + +### IRC, freenode, #hurd, 2013-10-02 + + i've traced the problem up to truncate + which gets a negative size + shouldn't take long to find out where it comes from now + it seems our truncate doesn't handle negative values well though + EINVAL The argument length is negative or larger than the + maximum file size. + i still have to see whether it comes from the user (unlikely) or + if it's an internal inconsistency + i suspect some code wrongly handles vm_map failures + leading to that inconsistency + pinotree: looks like glibc doesn't check for length >= 0 + yeah + servers should do it nonetheless + should we fix glibc, libdiskfs/libnetfs/libtrivfs/etc, or both? + it appears a client does the truncate + i'd say both + can you take the glibc part ? :) + i was going to do the hurd part... :p + ok, i'll pick libc + well i'm doing it already + i want to write a test case first + to make sure that's the problem + already on the hurd part, you mean? + yes + ok + ok looks like it + would you share the test you are doing, so i don't need to write + it again? :) + * pinotree lazy + :) + as soon as darnassus is restarted + ideally we could have some repository with all the testcases + written over time to fix bugs in implementations/compatibility/etc + i noticed the system doesn't automatically reboot when e2fsck says + reboot, and no unexpected inconsistency was found + is that normal ? + or having something like posixtestsuite, but actively maintained + pinotree: polishing the test before sending it + sure, no hurry :) + i can't reproduce the assertion but it does make ext2fs freeze + pinotree: http://darnassus.sceen.net/~rbraun/test_ftruncate.c + merci + pinotree: ustr builds + wow + the client code (ustr) seems to perform a ftruncate with size + ((size_t)-1) whereas lengths are signed .. + i'll check other libraries and send a patch soon + + braunr: btw, did you fix the leak? + yes + + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/a81c0c28ea606b0d0a2ad5eeb74071c746b7cdeb + 1h after tagging 0.5 ( + :( + ah yes, I've seen that commit + I just wanted to know whether this settled the issue + it does :) + good + i still can't figure out why youpi didn't had it + the code path is run when no error (actually error != ENOENT) + which explains why the leak was so visible + so my patch exposed this b/c of the allocation I added, makes + sense + it's funny actually, b/c this wasn't an issue for me as well, I + had my development vm running on that patches for two weeks + + +### IRC, freenode, #hurd, 2013-10-03 + + youpi: i've committed a fix to hurd that checks for negative sizes + when truncating files + this allows building the ustr package without making ext2fs choke + on an assertion + pinotree is preparing a patch for glibc + see truncate/ftruncate + with an off_t size parameter, which can be negative + EINVAL The argument length is negative or larger than the + maximum file size. + hurd servers were not conforming to that before my change + + +## Multiple mtab Translators Spawned + +### IRC, freenode, #hurd, 2013-09-20 teythoon: how come i see three mtab translators running ? 6 now oO @@ -2113,10 +2351,250 @@ In context of [[open_issues/mig_portable_rpc_declarations]]. teythoon: more bug fixing for you :) -## IRC, freenode, #hurd, 2013-09-23 +### IRC, freenode, #hurd, 2013-09-23 so it might be a problem with either libnetfs (which afaics has never supported passive translator records before) or procfs, but tbh I haven't investigated this yet [[open_issues/libnetfs_passive_translators]]. + +### IRC, freenode, #hurd, 2013-09-26 + + teythoon: hum, i just saw something disturbing + teythoon: to isolate the leak, i created my own proc directory + and the mtab translators it spawns seem to be owned by root oO + braunr: but how is that possible? are you sure? have you checked + with 'ids'? + no i'm not sure + also, ext2fs seems to ignore --writable when started as a passive + translator + < teythoon> braunr: but how is that possible? + messup with passive translators i guess + teythoon: actually, it looks like it has effective/available id + it has no* + this feature doesn't map well in unix + braunr: ah yes, htop doesn't handle this well and shows root + indeed, our ps shows - as username + yes + + +### [[!debbug 724868]] + + +### IRC, freenode, #hurd, 2013-10-03 + + i can't manage to find out where the hurd stores information about + active translators ... + there is this transbox per node + but where are nodes stored ? + what if they are are dropped ? + braunr: iirc, see libfshelp + well i have + i still can't find it + i fear that it works for ext2fs because that particular translator + implements a cache of open nodes + whereas things like procfs drop and recreate nodes per open + which would be the root cause for the multiple mtab bug + doesn't tmpfs support translators? + good idea + although it's still a libdiskfs based one + no problem for tmpfs, so it would be a netfs/procfs issue + better than what i feared :) + now, how is libdiskfs able to find active translators .. + ah, there is a name cache in libdiskfs .. + nope, looks fine + + +### IRC, freenode, #hurd, 2013-10-04 + + nodes with a translator seem to keep a reference in libdiskfs and + not in libnetfs + mhmmpf + oh great .. + each libdiskfs that "works" seems to implement its own + diskfs_cached_lookup function + so both ext2fs and tmpfs actually maintain a list of nodes, + keeping a reference on those with a translator + while procfs simply doesn't + teythoon: ^ + *sigh* + braunr: ok, thanks, I'll look into that + i'm not sure how to fix it + we can either fix node destruction to cleanly shut down + translators + but this would mean starting mtab on each access + or we could implement a custom cache in procfs + or perhaps a very custom change in the lookup callback for mounts + i'll try the latter + err, shouldn't we try to fix this in lib*fs? + unless you really want to work on it + i dont' know + ah, so the node is destroyed but the translator is kept running? + that's what you mean by the above? + and ext2fs makes an effort of killing it in its node cleanup + code? + yes + grmbl, i'm lagging a lot + i'm not sure + ext2fs maintains it + with ext2fs, translators can only be explicitely removed + i mean, ext2fs keeps all node descriptors alive once accessed + while procfs doesn't + teythoon: ok, looks like i have a working patch that merely caches + the node for mounts + libnetfs suffers from the same leak as libdiskfs when looking up a + translator + i'll fix it too + + i installed my fixed procfs on darnassus, only one mtab :) + nice :) + now, why is there no /home in df output ? + not sure + note how /dev/tty* end up in /proc/mounts, those are passive + translators too, no? + yes + but that's a good thing i guess + or was mounts intended for file systems only ? + well, in the unix traditional meaning + I think its nice too, yes + but why are they fine and your /home is not... + that's weirder + also, mounts actually doesn't show passive translators + teythoon: does your code perform any kind of comparison ? + i see /servers/socket/26 but not /servers/socket/2 + s/comparison/filter/g + hmm + well, yes, try /hurd/mtab --insecure / + (I cannot connect to darnassus from here...) + ok but that looks unrelated + both /servers/socket/26 and /servers/socket/2 refer to the same + translator + i was wondering if mtab was filtering similar entries based on + that + no + that's weird too then, isn't it ? + yes ;) + ok + btw, how is that done with the same traanslator being bound to + two nodes? settrans cannot do that, can it? + no it can't + the translator does it when started + ah + (which means there is a race if both are started simulatneously, + although it's very rare and not hard to solve) + a weird beaving translator then :) + + i have a fix for the multiple mtab issue, will send a patch + tonight + + teythoon: if ext2fs is set active, mtab output reports it + + teythoon: looks like this bug is what allows mtab not to deadlock + teythoon: when i attach it as an active translator, cat freezes + + teythoon: if (control && control->pi.port_right == fsys) + that's the filtering i was previously talking about + oh please don't name global variables "path" ... + + youpi: i fixed procfs on ironforge and exodar to be started as + procfs -c -k 3 + without -k 3, many things as simple as top and uptime won't work + + +### IRC, freenode, #hurd, 2013-10-06 + + teythoon: pty-s also bind to two nodes, not only pfinet + + +### IRC, freenode, #hurd, 2013-10-07 + + teythoon: please tell us when you're available, we need to work + out the last mtab issues + braunr: I'm available now :) + I'm sorry, I've been very busy the last two weeks, but I've + plenty of time now + great :) + did you see youpi's mail ? + i have the exact same question + I did + it seems your code registers active translators + but parent translators don't seem to register them when they're + created from passive translators + or am i mistaken ? + I'll need a moment to get my hurd machine and myself up to + speed... + braunr: I concur with youpi, hooking into fshelp_fetch_root + should do just fine + I'll just try that + ok + how do you deal with mtab reporting itself ? + o_O does it do that? + no, but it should + when i set it as an active translator, i get a deadlock + hm + teythoon: before you change libfshelp, i'd like you to try + something else + use more appropriate names for global variables in mtab.c + in particular, the variable path clashes with local names + noted + teythoon: as a side note (i'm not asking to rewrite anything) + i strongly recommend a very explicit object oriented style of + coding + (or data-oriented as it's sometimes called) + use prefixes for all your interfaces so they can be made public if + needed (which acts as a namespace and avoids lots of collisions + naturally) + use "constructors" and "destructors" (functions that both allocate + and initialize) + this helps avoiding leaks a lot too + hm, I thought I did that, could you be more specific? + ok didn't see the comment + /* XXX split up */ error_t mtab_populate (... + :) + as a better example, see your code in libfshelp/translator-list.c + struct translator should have been treated as an object + this would probably have completely avoided any leaks in the first + place + braunr: right, I deviated from that style there + teythoon: these are minor details, don't mind them too much, i + just find it helps me a lot + braunr: sure, I appreciate the feedback :) + + +### IRC, freenode, #hurd, 2013-10-08 + + braunr: I'm on to the passive translator not getting registered + issue + however, removing them from the list if the active translator is + killed does not work as expected... I still need to fiddle with the + notifications to get this right + ok + + +### IRC, freenode, #hurd, 2013-10-16 + + braunr: btw, I fixed the 'passive translator not showing up in + proc/mounts'-issue + but 4 ports do leak each time a translator is killed and + reinstalled + this happens with passive ones as well as active ones + teythoon: is that issue tied to your changed ? + changes* + I'm not sure tbh, testing that is on my list of things to do + ok + first thing to know i guess + yes + + +## Memory Leak in `translator_ihash_cleanup` + +### IRC, freenode, #hurd, 2013-10-04 + + teythoon: isn't there a leak in translator_ihash_cleanup ? + braunr: looks like, yes + braunr: I probably forgot to add the free (element->name) when I + added the name field + teythoon: ok + teythoon: i let you fix that :p + braunr: sure ;) -- cgit v1.2.3