diff options
author | Thomas Schwinge <tschwinge@gnu.org> | 2013-10-27 19:15:06 +0100 |
---|---|---|
committer | Thomas Schwinge <tschwinge@gnu.org> | 2013-10-27 19:15:06 +0100 |
commit | 47e4d194dc36adfcfd2577fa4630c9fcded005d3 (patch) | |
tree | d16ffd2eeb74d1977fb3e9744e4a38befedb4ddf /hurd/translator/mtab | |
parent | ca39ad0592e9b99dac9d99c68bb36ef1d27f72df (diff) |
IRC.
Diffstat (limited to 'hurd/translator/mtab')
-rw-r--r-- | hurd/translator/mtab/discussion.mdwn | 482 |
1 files changed, 480 insertions, 2 deletions
diff --git a/hurd/translator/mtab/discussion.mdwn b/hurd/translator/mtab/discussion.mdwn index 0734e1e6..973fb938 100644 --- a/hurd/translator/mtab/discussion.mdwn +++ b/hurd/translator/mtab/discussion.mdwn @@ -2103,7 +2103,245 @@ In context of [[open_issues/mig_portable_rpc_declarations]]. <youpi> anyway, got to run -## IRC, freenode, #hurd, 2013-09-20 +## Memory Leak + +### IRC, freenode, #hurd, 2013-09-18 + + <braunr> ext2fs is using a ginormous amount of memory on darnassus since i + last updated the hurd package :/ + <braunr> i wonder if my ext2fs large store patches rework have introduced a + regression + <braunr> the order of magnitude here is around 1.5G virtual space :/ + <braunr> it used to take up to 3 times less before that + <braunr> looks like my patches didn't make it into the latest hurd package + <braunr> teythoon: looks like there definitely is a new leak in ext2fs + <teythoon> :/ + <braunr> memory only + <braunr> the number of ports looks stable relative to file system usage + <teythoon> braunr: I tested my patches on my development machine, it's up + for 14 days (yay libvirt :) and never encountered problems like this + <braunr> i've been building glibc to reach that state + <teythoon> hm, that's a heavy load indeed + <teythoon> could be the file name tracking stuff, I tried to make sure that + everything is freed, but I might have missed something + <braunr> teythoon: simply running htop run shows a slight, regular increase + in physical memory usage in ext2fs + <pinotree> old procfs stikes again? :) + <teythoon> braunr: I see that as well... curious... + <braunr> 16:46 < teythoon> could be the file name tracking stuff, I tried + to make sure that everything is freed, but I might have missed something + <braunr> how knows, maybe completely unrelated + <teythoon> the tracking patch isn't that big, I've gone over it twice today + and it still seems reasonable to me + <braunr> hm + + +### IRC, freenode, #hurd, 2013-09-25 + + <braunr> seems like a small leak per file access + <braunr> but htop makes it obvious because it makes lots of them + <braunr> shouldn't be too hard to find + <braunr> since it might also come from the large store patch, i'll take a + look at it + + +### IRC, freenode, #hurd, 2013-09-27 + + <braunr> teythoon: found the leak :) + <braunr> although its origin is weird + <teythoon> braunr: where is it? + <braunr> i'm still building packages to make sure that's it + <braunr> see + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/blob/HEAD:/libdiskfs/dir-lookup.c + <braunr> which you changed in + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/06d49cdadd9e96361f3fe49b9c940b88bb869284 + <braunr> line 306 is "return error" instead of "goto out" + <braunr> has been so since 1994 + <braunr> what is unclear is why this code path is now run + <braunr> patch is here: + http://darnassus.sceen.net/~rbraun/0001-Fix-memory-leak-in-libdiskfs.patch + <teythoon> I see, weird indeed + <braunr> teythoon: the system also feels slower somehow + <braunr> such errors might have introduced unexpected retries + <teythoon> i think it's possible to write a coccinelle patch to find such + errors + + +### IRC, freenode, #hurd, 2013-09-28 + + <youpi> braunr: bah, I havent noticed the leak on my box, even after + building eglibc & hurd several times + <braunr> that's weird + <braunr> are you sure it's up to date ? + <braunr> also, is procfs correctly attached to /proc ? + <braunr> that's what seems to trigger it + <youpi> yes, 20130924-2, with procfs on /proc + + <teythoon> braunr: that turned out to be the leak indeed? and somehow my + changes triggered it? did you discover why? + <braunr> teythoon: yes, yes, no + <braunr> but youpi didn't see the leak on his system + <teythoon> ^^ cool that you found it + <teythoon> I did + <braunr> oh yes you mean you saw the leak + <teythoon> yes + + +### IRC, freenode, #hurd, 2013-10-01 + + <braunr> the fix i did in libdiskfs might have fixed other issues + <braunr> apparently, it's the code path taken when error isn't ENOENT, + including no error (translator started) + <pinotree> the memory leak fix, you mean? + <braunr> yes + <braunr> it might haved fixed reference counting too + <braunr> although i'm not sure if we actually ever run into that issue in + the past + <braunr> the weird thing is, that path is taken when starting a passive + translator + <braunr> (i think) + <braunr> (it might be any kind of translator, and just doing nothing if + alcready active) + <braunr> already* + <braunr> anyway, the fact that the leak was so visible means this code was + run very often + <braunr> which doesn't make sense + <braunr> hm ok, it seems that code was run every time actually + <braunr> but the leak became visible when it concerned memory + <pinotree> which side-effects did the old code produce? + <braunr> teythoon added a dynamically allocated path that wasn't freed + <braunr> reference leaks + <braunr> which might explain the assertion on reference we sometimes see + with ext2fs + <braunr> when a counter overflows and becomes 0 + +[[open_issues/ext2fs_libports_reference_counting_assertion]]? + + <pinotree> hmm + <braunr> which is why i'm mentioning it + <braunr> :) + <braunr> i'll try to reproduce the assertion + <pinotree> libdiskfs/node-drop.c: assert (np->dn_stat.st_size == 0); ← + this one? + <braunr> yes + <braunr> hm no + <pinotree> oho + <braunr> no, not that one + <pinotree> no-oho + <braunr> well maybe by side effect + <braunr> but i doubt it + <pinotree> iirc you constantly get that when building ustr + <braunr> (e.g., because the object was freed and reallocated quickly, + st_size has been reset, something like that) + <braunr> is ustr a package ? + <pinotree> yes + <braunr> ok + <braunr> thanks + <braunr> pinotree: indeed, it's still present + <braunr> pinotree: actually, after a more in-depth look, reference counting + looks valid before the fix too + <pinotree> ok, thanks for checking + <braunr> pinotree: the assertion affects the root translator, and is + triggered by a test that stresses memory + <pinotree> memory as in ram, or as in disk storage? + <braunr> malloc + <pinotree> ok + <braunr> i suspect the code doesn't handle memory failure well + <pinotree> iirc the ustr tests are mostly disk-intensive + <braunr> this one is really about enonmem + <braunr> enomem + <braunr> i'll make ext2fs print a stack trace + <pinotree> (might be wrong, but did not investigate further, sorry) + <braunr> no worries + <braunr> i'm doing it now :) + + +### IRC, freenode, #hurd, 2013-10-02 + + <braunr> i've traced the problem up to truncate + <braunr> which gets a negative size + <braunr> shouldn't take long to find out where it comes from now + <pinotree> it seems our truncate doesn't handle negative values well though + <braunr> EINVAL The argument length is negative or larger than the + maximum file size. + <braunr> i still have to see whether it comes from the user (unlikely) or + if it's an internal inconsistency + <braunr> i suspect some code wrongly handles vm_map failures + <braunr> leading to that inconsistency + <braunr> pinotree: looks like glibc doesn't check for length >= 0 + <pinotree> yeah + <braunr> servers should do it nonetheless + <pinotree> should we fix glibc, libdiskfs/libnetfs/libtrivfs/etc, or both? + <braunr> it appears a client does the truncate + <braunr> i'd say both + <braunr> can you take the glibc part ? :) + <pinotree> i was going to do the hurd part... :p + <pinotree> ok, i'll pick libc + <braunr> well i'm doing it already + <braunr> i want to write a test case first + <braunr> to make sure that's the problem + <pinotree> already on the hurd part, you mean? + <braunr> yes + <pinotree> ok + <braunr> ok looks like it + <pinotree> would you share the test you are doing, so i don't need to write + it again? :) + * pinotree lazy + <braunr> :) + <braunr> as soon as darnassus is restarted + <pinotree> ideally we could have some repository with all the testcases + written over time to fix bugs in implementations/compatibility/etc + <braunr> i noticed the system doesn't automatically reboot when e2fsck says + reboot, and no unexpected inconsistency was found + <braunr> is that normal ? + <pinotree> or having something like posixtestsuite, but actively maintained + <braunr> pinotree: polishing the test before sending it + <pinotree> sure, no hurry :) + <braunr> i can't reproduce the assertion but it does make ext2fs freeze + <braunr> pinotree: http://darnassus.sceen.net/~rbraun/test_ftruncate.c + <pinotree> merci + <braunr> pinotree: ustr builds + <pinotree> wow + <braunr> the client code (ustr) seems to perform a ftruncate with size + ((size_t)-1) whereas lengths are signed .. + <braunr> i'll check other libraries and send a patch soon + + <teythoon_> braunr: btw, did you fix the leak? + <braunr> yes + <braunr> + http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/a81c0c28ea606b0d0a2ad5eeb74071c746b7cdeb + <braunr> 1h after tagging 0.5 ( + <braunr> :( + <teythoon> ah yes, I've seen that commit + <teythoon> I just wanted to know whether this settled the issue + <braunr> it does :) + <teythoon> good + <braunr> i still can't figure out why youpi didn't had it + <braunr> the code path is run when no error (actually error != ENOENT) + <braunr> which explains why the leak was so visible + <teythoon> so my patch exposed this b/c of the allocation I added, makes + sense + <teythoon> it's funny actually, b/c this wasn't an issue for me as well, I + had my development vm running on that patches for two weeks + + +### IRC, freenode, #hurd, 2013-10-03 + + <braunr> youpi: i've committed a fix to hurd that checks for negative sizes + when truncating files + <braunr> this allows building the ustr package without making ext2fs choke + on an assertion + <braunr> pinotree is preparing a patch for glibc + <braunr> see truncate/ftruncate + <braunr> with an off_t size parameter, which can be negative + <braunr> EINVAL The argument length is negative or larger than the + maximum file size. + <braunr> hurd servers were not conforming to that before my change + + +## Multiple mtab Translators Spawned + +### IRC, freenode, #hurd, 2013-09-20 <braunr> teythoon: how come i see three mtab translators running ? <braunr> 6 now oO @@ -2113,10 +2351,250 @@ In context of [[open_issues/mig_portable_rpc_declarations]]. <braunr> teythoon: more bug fixing for you :) -## IRC, freenode, #hurd, 2013-09-23 +### IRC, freenode, #hurd, 2013-09-23 <teythoon> so it might be a problem with either libnetfs (which afaics has never supported passive translator records before) or procfs, but tbh I haven't investigated this yet [[open_issues/libnetfs_passive_translators]]. + +### IRC, freenode, #hurd, 2013-09-26 + + <braunr> teythoon: hum, i just saw something disturbing + <braunr> teythoon: to isolate the leak, i created my own proc directory + <braunr> and the mtab translators it spawns seem to be owned by root oO + <teythoon> braunr: but how is that possible? are you sure? have you checked + with 'ids'? + <braunr> no i'm not sure + <braunr> also, ext2fs seems to ignore --writable when started as a passive + translator + <braunr> < teythoon> braunr: but how is that possible? + <braunr> messup with passive translators i guess + <braunr> teythoon: actually, it looks like it has effective/available id + <braunr> it has no* + <braunr> this feature doesn't map well in unix + <teythoon> braunr: ah yes, htop doesn't handle this well and shows root + indeed, our ps shows - as username + <braunr> yes + + +### [[!debbug 724868]] + + +### IRC, freenode, #hurd, 2013-10-03 + + <braunr> i can't manage to find out where the hurd stores information about + active translators ... + <braunr> there is this transbox per node + <braunr> but where are nodes stored ? + <braunr> what if they are are dropped ? + <pinotree> braunr: iirc, see libfshelp + <braunr> well i have + <braunr> i still can't find it + <braunr> i fear that it works for ext2fs because that particular translator + implements a cache of open nodes + <braunr> whereas things like procfs drop and recreate nodes per open + <braunr> which would be the root cause for the multiple mtab bug + <pinotree> doesn't tmpfs support translators? + <braunr> good idea + <braunr> although it's still a libdiskfs based one + <braunr> no problem for tmpfs, so it would be a netfs/procfs issue + <braunr> better than what i feared :) + <braunr> now, how is libdiskfs able to find active translators .. + <braunr> ah, there is a name cache in libdiskfs .. + <braunr> nope, looks fine + + +### IRC, freenode, #hurd, 2013-10-04 + + <braunr> nodes with a translator seem to keep a reference in libdiskfs and + not in libnetfs + <braunr> mhmmpf + <braunr> oh great .. + <braunr> each libdiskfs that "works" seems to implement its own + diskfs_cached_lookup function + <braunr> so both ext2fs and tmpfs actually maintain a list of nodes, + keeping a reference on those with a translator + <braunr> while procfs simply doesn't + <braunr> teythoon: ^ + <braunr> *sigh* + <teythoon> braunr: ok, thanks, I'll look into that + <braunr> i'm not sure how to fix it + <braunr> we can either fix node destruction to cleanly shut down + translators + <braunr> but this would mean starting mtab on each access + <braunr> or we could implement a custom cache in procfs + <braunr> or perhaps a very custom change in the lookup callback for mounts + <braunr> i'll try the latter + <teythoon> err, shouldn't we try to fix this in lib*fs? + <braunr> unless you really want to work on it + <braunr> i dont' know + <teythoon> ah, so the node is destroyed but the translator is kept running? + that's what you mean by the above? + <teythoon> and ext2fs makes an effort of killing it in its node cleanup + code? + <braunr> yes + <braunr> grmbl, i'm lagging a lot + <braunr> i'm not sure + <braunr> ext2fs maintains it + <braunr> with ext2fs, translators can only be explicitely removed + <braunr> i mean, ext2fs keeps all node descriptors alive once accessed + <braunr> while procfs doesn't + <braunr> teythoon: ok, looks like i have a working patch that merely caches + the node for mounts + <braunr> libnetfs suffers from the same leak as libdiskfs when looking up a + translator + <braunr> i'll fix it too + + <braunr> i installed my fixed procfs on darnassus, only one mtab :) + <teythoon> nice :) + <braunr> now, why is there no /home in df output ? + <teythoon> not sure + <teythoon> note how /dev/tty* end up in /proc/mounts, those are passive + translators too, no? + <braunr> yes + <braunr> but that's a good thing i guess + <braunr> or was mounts intended for file systems only ? + <braunr> well, in the unix traditional meaning + <teythoon> I think its nice too, yes + <teythoon> but why are they fine and your /home is not... + <braunr> that's weirder + <braunr> also, mounts actually doesn't show passive translators + <braunr> teythoon: does your code perform any kind of comparison ? + <braunr> i see /servers/socket/26 but not /servers/socket/2 + <braunr> s/comparison/filter/g + <teythoon> hmm + <teythoon> well, yes, try /hurd/mtab --insecure / + <teythoon> (I cannot connect to darnassus from here...) + <braunr> ok but that looks unrelated + <braunr> both /servers/socket/26 and /servers/socket/2 refer to the same + translator + <braunr> i was wondering if mtab was filtering similar entries based on + that + <teythoon> no + <braunr> that's weird too then, isn't it ? + <teythoon> yes ;) + <braunr> ok + <teythoon> btw, how is that done with the same traanslator being bound to + two nodes? settrans cannot do that, can it? + <braunr> no it can't + <braunr> the translator does it when started + <teythoon> ah + <braunr> (which means there is a race if both are started simulatneously, + although it's very rare and not hard to solve) + <teythoon> a weird beaving translator then :) + + <braunr> i have a fix for the multiple mtab issue, will send a patch + tonight + + <braunr> teythoon: if ext2fs is set active, mtab output reports it + + <braunr> teythoon: looks like this bug is what allows mtab not to deadlock + <braunr> teythoon: when i attach it as an active translator, cat freezes + + <braunr> teythoon: if (control && control->pi.port_right == fsys) + <braunr> that's the filtering i was previously talking about + <braunr> oh please don't name global variables "path" ... + + <braunr> youpi: i fixed procfs on ironforge and exodar to be started as + procfs -c -k 3 + <braunr> without -k 3, many things as simple as top and uptime won't work + + +### IRC, freenode, #hurd, 2013-10-06 + + <antrik> teythoon: pty-s also bind to two nodes, not only pfinet + + +### IRC, freenode, #hurd, 2013-10-07 + + <braunr> teythoon: please tell us when you're available, we need to work + out the last mtab issues + <teythoon> braunr: I'm available now :) + <teythoon> I'm sorry, I've been very busy the last two weeks, but I've + plenty of time now + <braunr> great :) + <braunr> did you see youpi's mail ? + <braunr> i have the exact same question + <teythoon> I did + <braunr> it seems your code registers active translators + <braunr> but parent translators don't seem to register them when they're + created from passive translators + <braunr> or am i mistaken ? + <teythoon> I'll need a moment to get my hurd machine and myself up to + speed... + <teythoon> braunr: I concur with youpi, hooking into fshelp_fetch_root + should do just fine + <teythoon> I'll just try that + <braunr> ok + <braunr> how do you deal with mtab reporting itself ? + <teythoon> o_O does it do that? + <braunr> no, but it should + <braunr> when i set it as an active translator, i get a deadlock + <braunr> hm + <braunr> teythoon: before you change libfshelp, i'd like you to try + something else + <braunr> use more appropriate names for global variables in mtab.c + <braunr> in particular, the variable path clashes with local names + <teythoon> noted + <braunr> teythoon: as a side note (i'm not asking to rewrite anything) + <braunr> i strongly recommend a very explicit object oriented style of + coding + <braunr> (or data-oriented as it's sometimes called) + <braunr> use prefixes for all your interfaces so they can be made public if + needed (which acts as a namespace and avoids lots of collisions + naturally) + <braunr> use "constructors" and "destructors" (functions that both allocate + and initialize) + <braunr> this helps avoiding leaks a lot too + <teythoon> hm, I thought I did that, could you be more specific? + <braunr> ok didn't see the comment + <braunr> /* XXX split up */ error_t mtab_populate (... + <braunr> :) + <braunr> as a better example, see your code in libfshelp/translator-list.c + <braunr> struct translator should have been treated as an object + <braunr> this would probably have completely avoided any leaks in the first + place + <teythoon> braunr: right, I deviated from that style there + <braunr> teythoon: these are minor details, don't mind them too much, i + just find it helps me a lot + <teythoon> braunr: sure, I appreciate the feedback :) + + +### IRC, freenode, #hurd, 2013-10-08 + + <teythoon> braunr: I'm on to the passive translator not getting registered + issue + <teythoon> however, removing them from the list if the active translator is + killed does not work as expected... I still need to fiddle with the + notifications to get this right + <braunr> ok + + +### IRC, freenode, #hurd, 2013-10-16 + + <teythoon> braunr: btw, I fixed the 'passive translator not showing up in + proc/mounts'-issue + <teythoon> but 4 ports do leak each time a translator is killed and + reinstalled + <teythoon> this happens with passive ones as well as active ones + <braunr> teythoon: is that issue tied to your changed ? + <braunr> changes* + <teythoon> I'm not sure tbh, testing that is on my list of things to do + <braunr> ok + <braunr> first thing to know i guess + <teythoon> yes + + +## Memory Leak in `translator_ihash_cleanup` + +### IRC, freenode, #hurd, 2013-10-04 + + <braunr> teythoon: isn't there a leak in translator_ihash_cleanup ? + <teythoon> braunr: looks like, yes + <teythoon> braunr: I probably forgot to add the free (element->name) when I + added the name field + <braunr> teythoon: ok + <braunr> teythoon: i let you fix that :p + <teythoon> braunr: sure ;) |