summaryrefslogtreecommitdiff
path: root/hurd/translator/mtab
diff options
context:
space:
mode:
authorThomas Schwinge <tschwinge@gnu.org>2013-10-27 19:15:06 +0100
committerThomas Schwinge <tschwinge@gnu.org>2013-10-27 19:15:06 +0100
commit47e4d194dc36adfcfd2577fa4630c9fcded005d3 (patch)
treed16ffd2eeb74d1977fb3e9744e4a38befedb4ddf /hurd/translator/mtab
parentca39ad0592e9b99dac9d99c68bb36ef1d27f72df (diff)
IRC.
Diffstat (limited to 'hurd/translator/mtab')
-rw-r--r--hurd/translator/mtab/discussion.mdwn482
1 files changed, 480 insertions, 2 deletions
diff --git a/hurd/translator/mtab/discussion.mdwn b/hurd/translator/mtab/discussion.mdwn
index 0734e1e6..973fb938 100644
--- a/hurd/translator/mtab/discussion.mdwn
+++ b/hurd/translator/mtab/discussion.mdwn
@@ -2103,7 +2103,245 @@ In context of [[open_issues/mig_portable_rpc_declarations]].
<youpi> anyway, got to run
-## IRC, freenode, #hurd, 2013-09-20
+## Memory Leak
+
+### IRC, freenode, #hurd, 2013-09-18
+
+ <braunr> ext2fs is using a ginormous amount of memory on darnassus since i
+ last updated the hurd package :/
+ <braunr> i wonder if my ext2fs large store patches rework have introduced a
+ regression
+ <braunr> the order of magnitude here is around 1.5G virtual space :/
+ <braunr> it used to take up to 3 times less before that
+ <braunr> looks like my patches didn't make it into the latest hurd package
+ <braunr> teythoon: looks like there definitely is a new leak in ext2fs
+ <teythoon> :/
+ <braunr> memory only
+ <braunr> the number of ports looks stable relative to file system usage
+ <teythoon> braunr: I tested my patches on my development machine, it's up
+ for 14 days (yay libvirt :) and never encountered problems like this
+ <braunr> i've been building glibc to reach that state
+ <teythoon> hm, that's a heavy load indeed
+ <teythoon> could be the file name tracking stuff, I tried to make sure that
+ everything is freed, but I might have missed something
+ <braunr> teythoon: simply running htop run shows a slight, regular increase
+ in physical memory usage in ext2fs
+ <pinotree> old procfs stikes again? :)
+ <teythoon> braunr: I see that as well... curious...
+ <braunr> 16:46 < teythoon> could be the file name tracking stuff, I tried
+ to make sure that everything is freed, but I might have missed something
+ <braunr> how knows, maybe completely unrelated
+ <teythoon> the tracking patch isn't that big, I've gone over it twice today
+ and it still seems reasonable to me
+ <braunr> hm
+
+
+### IRC, freenode, #hurd, 2013-09-25
+
+ <braunr> seems like a small leak per file access
+ <braunr> but htop makes it obvious because it makes lots of them
+ <braunr> shouldn't be too hard to find
+ <braunr> since it might also come from the large store patch, i'll take a
+ look at it
+
+
+### IRC, freenode, #hurd, 2013-09-27
+
+ <braunr> teythoon: found the leak :)
+ <braunr> although its origin is weird
+ <teythoon> braunr: where is it?
+ <braunr> i'm still building packages to make sure that's it
+ <braunr> see
+ http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/blob/HEAD:/libdiskfs/dir-lookup.c
+ <braunr> which you changed in
+ http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/06d49cdadd9e96361f3fe49b9c940b88bb869284
+ <braunr> line 306 is "return error" instead of "goto out"
+ <braunr> has been so since 1994
+ <braunr> what is unclear is why this code path is now run
+ <braunr> patch is here:
+ http://darnassus.sceen.net/~rbraun/0001-Fix-memory-leak-in-libdiskfs.patch
+ <teythoon> I see, weird indeed
+ <braunr> teythoon: the system also feels slower somehow
+ <braunr> such errors might have introduced unexpected retries
+ <teythoon> i think it's possible to write a coccinelle patch to find such
+ errors
+
+
+### IRC, freenode, #hurd, 2013-09-28
+
+ <youpi> braunr: bah, I havent noticed the leak on my box, even after
+ building eglibc & hurd several times
+ <braunr> that's weird
+ <braunr> are you sure it's up to date ?
+ <braunr> also, is procfs correctly attached to /proc ?
+ <braunr> that's what seems to trigger it
+ <youpi> yes, 20130924-2, with procfs on /proc
+
+ <teythoon> braunr: that turned out to be the leak indeed? and somehow my
+ changes triggered it? did you discover why?
+ <braunr> teythoon: yes, yes, no
+ <braunr> but youpi didn't see the leak on his system
+ <teythoon> ^^ cool that you found it
+ <teythoon> I did
+ <braunr> oh yes you mean you saw the leak
+ <teythoon> yes
+
+
+### IRC, freenode, #hurd, 2013-10-01
+
+ <braunr> the fix i did in libdiskfs might have fixed other issues
+ <braunr> apparently, it's the code path taken when error isn't ENOENT,
+ including no error (translator started)
+ <pinotree> the memory leak fix, you mean?
+ <braunr> yes
+ <braunr> it might haved fixed reference counting too
+ <braunr> although i'm not sure if we actually ever run into that issue in
+ the past
+ <braunr> the weird thing is, that path is taken when starting a passive
+ translator
+ <braunr> (i think)
+ <braunr> (it might be any kind of translator, and just doing nothing if
+ alcready active)
+ <braunr> already*
+ <braunr> anyway, the fact that the leak was so visible means this code was
+ run very often
+ <braunr> which doesn't make sense
+ <braunr> hm ok, it seems that code was run every time actually
+ <braunr> but the leak became visible when it concerned memory
+ <pinotree> which side-effects did the old code produce?
+ <braunr> teythoon added a dynamically allocated path that wasn't freed
+ <braunr> reference leaks
+ <braunr> which might explain the assertion on reference we sometimes see
+ with ext2fs
+ <braunr> when a counter overflows and becomes 0
+
+[[open_issues/ext2fs_libports_reference_counting_assertion]]?
+
+ <pinotree> hmm
+ <braunr> which is why i'm mentioning it
+ <braunr> :)
+ <braunr> i'll try to reproduce the assertion
+ <pinotree> libdiskfs/node-drop.c: assert (np->dn_stat.st_size == 0); ←
+ this one?
+ <braunr> yes
+ <braunr> hm no
+ <pinotree> oho
+ <braunr> no, not that one
+ <pinotree> no-oho
+ <braunr> well maybe by side effect
+ <braunr> but i doubt it
+ <pinotree> iirc you constantly get that when building ustr
+ <braunr> (e.g., because the object was freed and reallocated quickly,
+ st_size has been reset, something like that)
+ <braunr> is ustr a package ?
+ <pinotree> yes
+ <braunr> ok
+ <braunr> thanks
+ <braunr> pinotree: indeed, it's still present
+ <braunr> pinotree: actually, after a more in-depth look, reference counting
+ looks valid before the fix too
+ <pinotree> ok, thanks for checking
+ <braunr> pinotree: the assertion affects the root translator, and is
+ triggered by a test that stresses memory
+ <pinotree> memory as in ram, or as in disk storage?
+ <braunr> malloc
+ <pinotree> ok
+ <braunr> i suspect the code doesn't handle memory failure well
+ <pinotree> iirc the ustr tests are mostly disk-intensive
+ <braunr> this one is really about enonmem
+ <braunr> enomem
+ <braunr> i'll make ext2fs print a stack trace
+ <pinotree> (might be wrong, but did not investigate further, sorry)
+ <braunr> no worries
+ <braunr> i'm doing it now :)
+
+
+### IRC, freenode, #hurd, 2013-10-02
+
+ <braunr> i've traced the problem up to truncate
+ <braunr> which gets a negative size
+ <braunr> shouldn't take long to find out where it comes from now
+ <pinotree> it seems our truncate doesn't handle negative values well though
+ <braunr> EINVAL The argument length is negative or larger than the
+ maximum file size.
+ <braunr> i still have to see whether it comes from the user (unlikely) or
+ if it's an internal inconsistency
+ <braunr> i suspect some code wrongly handles vm_map failures
+ <braunr> leading to that inconsistency
+ <braunr> pinotree: looks like glibc doesn't check for length >= 0
+ <pinotree> yeah
+ <braunr> servers should do it nonetheless
+ <pinotree> should we fix glibc, libdiskfs/libnetfs/libtrivfs/etc, or both?
+ <braunr> it appears a client does the truncate
+ <braunr> i'd say both
+ <braunr> can you take the glibc part ? :)
+ <pinotree> i was going to do the hurd part... :p
+ <pinotree> ok, i'll pick libc
+ <braunr> well i'm doing it already
+ <braunr> i want to write a test case first
+ <braunr> to make sure that's the problem
+ <pinotree> already on the hurd part, you mean?
+ <braunr> yes
+ <pinotree> ok
+ <braunr> ok looks like it
+ <pinotree> would you share the test you are doing, so i don't need to write
+ it again? :)
+ * pinotree lazy
+ <braunr> :)
+ <braunr> as soon as darnassus is restarted
+ <pinotree> ideally we could have some repository with all the testcases
+ written over time to fix bugs in implementations/compatibility/etc
+ <braunr> i noticed the system doesn't automatically reboot when e2fsck says
+ reboot, and no unexpected inconsistency was found
+ <braunr> is that normal ?
+ <pinotree> or having something like posixtestsuite, but actively maintained
+ <braunr> pinotree: polishing the test before sending it
+ <pinotree> sure, no hurry :)
+ <braunr> i can't reproduce the assertion but it does make ext2fs freeze
+ <braunr> pinotree: http://darnassus.sceen.net/~rbraun/test_ftruncate.c
+ <pinotree> merci
+ <braunr> pinotree: ustr builds
+ <pinotree> wow
+ <braunr> the client code (ustr) seems to perform a ftruncate with size
+ ((size_t)-1) whereas lengths are signed ..
+ <braunr> i'll check other libraries and send a patch soon
+
+ <teythoon_> braunr: btw, did you fix the leak?
+ <braunr> yes
+ <braunr>
+ http://darnassus.sceen.net/gitweb/savannah_mirror/hurd.git/commit/a81c0c28ea606b0d0a2ad5eeb74071c746b7cdeb
+ <braunr> 1h after tagging 0.5 (
+ <braunr> :(
+ <teythoon> ah yes, I've seen that commit
+ <teythoon> I just wanted to know whether this settled the issue
+ <braunr> it does :)
+ <teythoon> good
+ <braunr> i still can't figure out why youpi didn't had it
+ <braunr> the code path is run when no error (actually error != ENOENT)
+ <braunr> which explains why the leak was so visible
+ <teythoon> so my patch exposed this b/c of the allocation I added, makes
+ sense
+ <teythoon> it's funny actually, b/c this wasn't an issue for me as well, I
+ had my development vm running on that patches for two weeks
+
+
+### IRC, freenode, #hurd, 2013-10-03
+
+ <braunr> youpi: i've committed a fix to hurd that checks for negative sizes
+ when truncating files
+ <braunr> this allows building the ustr package without making ext2fs choke
+ on an assertion
+ <braunr> pinotree is preparing a patch for glibc
+ <braunr> see truncate/ftruncate
+ <braunr> with an off_t size parameter, which can be negative
+ <braunr> EINVAL The argument length is negative or larger than the
+ maximum file size.
+ <braunr> hurd servers were not conforming to that before my change
+
+
+## Multiple mtab Translators Spawned
+
+### IRC, freenode, #hurd, 2013-09-20
<braunr> teythoon: how come i see three mtab translators running ?
<braunr> 6 now oO
@@ -2113,10 +2351,250 @@ In context of [[open_issues/mig_portable_rpc_declarations]].
<braunr> teythoon: more bug fixing for you :)
-## IRC, freenode, #hurd, 2013-09-23
+### IRC, freenode, #hurd, 2013-09-23
<teythoon> so it might be a problem with either libnetfs (which afaics has
never supported passive translator records before) or procfs, but tbh I
haven't investigated this yet
[[open_issues/libnetfs_passive_translators]].
+
+### IRC, freenode, #hurd, 2013-09-26
+
+ <braunr> teythoon: hum, i just saw something disturbing
+ <braunr> teythoon: to isolate the leak, i created my own proc directory
+ <braunr> and the mtab translators it spawns seem to be owned by root oO
+ <teythoon> braunr: but how is that possible? are you sure? have you checked
+ with 'ids'?
+ <braunr> no i'm not sure
+ <braunr> also, ext2fs seems to ignore --writable when started as a passive
+ translator
+ <braunr> < teythoon> braunr: but how is that possible?
+ <braunr> messup with passive translators i guess
+ <braunr> teythoon: actually, it looks like it has effective/available id
+ <braunr> it has no*
+ <braunr> this feature doesn't map well in unix
+ <teythoon> braunr: ah yes, htop doesn't handle this well and shows root
+ indeed, our ps shows - as username
+ <braunr> yes
+
+
+### [[!debbug 724868]]
+
+
+### IRC, freenode, #hurd, 2013-10-03
+
+ <braunr> i can't manage to find out where the hurd stores information about
+ active translators ...
+ <braunr> there is this transbox per node
+ <braunr> but where are nodes stored ?
+ <braunr> what if they are are dropped ?
+ <pinotree> braunr: iirc, see libfshelp
+ <braunr> well i have
+ <braunr> i still can't find it
+ <braunr> i fear that it works for ext2fs because that particular translator
+ implements a cache of open nodes
+ <braunr> whereas things like procfs drop and recreate nodes per open
+ <braunr> which would be the root cause for the multiple mtab bug
+ <pinotree> doesn't tmpfs support translators?
+ <braunr> good idea
+ <braunr> although it's still a libdiskfs based one
+ <braunr> no problem for tmpfs, so it would be a netfs/procfs issue
+ <braunr> better than what i feared :)
+ <braunr> now, how is libdiskfs able to find active translators ..
+ <braunr> ah, there is a name cache in libdiskfs ..
+ <braunr> nope, looks fine
+
+
+### IRC, freenode, #hurd, 2013-10-04
+
+ <braunr> nodes with a translator seem to keep a reference in libdiskfs and
+ not in libnetfs
+ <braunr> mhmmpf
+ <braunr> oh great ..
+ <braunr> each libdiskfs that "works" seems to implement its own
+ diskfs_cached_lookup function
+ <braunr> so both ext2fs and tmpfs actually maintain a list of nodes,
+ keeping a reference on those with a translator
+ <braunr> while procfs simply doesn't
+ <braunr> teythoon: ^
+ <braunr> *sigh*
+ <teythoon> braunr: ok, thanks, I'll look into that
+ <braunr> i'm not sure how to fix it
+ <braunr> we can either fix node destruction to cleanly shut down
+ translators
+ <braunr> but this would mean starting mtab on each access
+ <braunr> or we could implement a custom cache in procfs
+ <braunr> or perhaps a very custom change in the lookup callback for mounts
+ <braunr> i'll try the latter
+ <teythoon> err, shouldn't we try to fix this in lib*fs?
+ <braunr> unless you really want to work on it
+ <braunr> i dont' know
+ <teythoon> ah, so the node is destroyed but the translator is kept running?
+ that's what you mean by the above?
+ <teythoon> and ext2fs makes an effort of killing it in its node cleanup
+ code?
+ <braunr> yes
+ <braunr> grmbl, i'm lagging a lot
+ <braunr> i'm not sure
+ <braunr> ext2fs maintains it
+ <braunr> with ext2fs, translators can only be explicitely removed
+ <braunr> i mean, ext2fs keeps all node descriptors alive once accessed
+ <braunr> while procfs doesn't
+ <braunr> teythoon: ok, looks like i have a working patch that merely caches
+ the node for mounts
+ <braunr> libnetfs suffers from the same leak as libdiskfs when looking up a
+ translator
+ <braunr> i'll fix it too
+
+ <braunr> i installed my fixed procfs on darnassus, only one mtab :)
+ <teythoon> nice :)
+ <braunr> now, why is there no /home in df output ?
+ <teythoon> not sure
+ <teythoon> note how /dev/tty* end up in /proc/mounts, those are passive
+ translators too, no?
+ <braunr> yes
+ <braunr> but that's a good thing i guess
+ <braunr> or was mounts intended for file systems only ?
+ <braunr> well, in the unix traditional meaning
+ <teythoon> I think its nice too, yes
+ <teythoon> but why are they fine and your /home is not...
+ <braunr> that's weirder
+ <braunr> also, mounts actually doesn't show passive translators
+ <braunr> teythoon: does your code perform any kind of comparison ?
+ <braunr> i see /servers/socket/26 but not /servers/socket/2
+ <braunr> s/comparison/filter/g
+ <teythoon> hmm
+ <teythoon> well, yes, try /hurd/mtab --insecure /
+ <teythoon> (I cannot connect to darnassus from here...)
+ <braunr> ok but that looks unrelated
+ <braunr> both /servers/socket/26 and /servers/socket/2 refer to the same
+ translator
+ <braunr> i was wondering if mtab was filtering similar entries based on
+ that
+ <teythoon> no
+ <braunr> that's weird too then, isn't it ?
+ <teythoon> yes ;)
+ <braunr> ok
+ <teythoon> btw, how is that done with the same traanslator being bound to
+ two nodes? settrans cannot do that, can it?
+ <braunr> no it can't
+ <braunr> the translator does it when started
+ <teythoon> ah
+ <braunr> (which means there is a race if both are started simulatneously,
+ although it's very rare and not hard to solve)
+ <teythoon> a weird beaving translator then :)
+
+ <braunr> i have a fix for the multiple mtab issue, will send a patch
+ tonight
+
+ <braunr> teythoon: if ext2fs is set active, mtab output reports it
+
+ <braunr> teythoon: looks like this bug is what allows mtab not to deadlock
+ <braunr> teythoon: when i attach it as an active translator, cat freezes
+
+ <braunr> teythoon: if (control && control->pi.port_right == fsys)
+ <braunr> that's the filtering i was previously talking about
+ <braunr> oh please don't name global variables "path" ...
+
+ <braunr> youpi: i fixed procfs on ironforge and exodar to be started as
+ procfs -c -k 3
+ <braunr> without -k 3, many things as simple as top and uptime won't work
+
+
+### IRC, freenode, #hurd, 2013-10-06
+
+ <antrik> teythoon: pty-s also bind to two nodes, not only pfinet
+
+
+### IRC, freenode, #hurd, 2013-10-07
+
+ <braunr> teythoon: please tell us when you're available, we need to work
+ out the last mtab issues
+ <teythoon> braunr: I'm available now :)
+ <teythoon> I'm sorry, I've been very busy the last two weeks, but I've
+ plenty of time now
+ <braunr> great :)
+ <braunr> did you see youpi's mail ?
+ <braunr> i have the exact same question
+ <teythoon> I did
+ <braunr> it seems your code registers active translators
+ <braunr> but parent translators don't seem to register them when they're
+ created from passive translators
+ <braunr> or am i mistaken ?
+ <teythoon> I'll need a moment to get my hurd machine and myself up to
+ speed...
+ <teythoon> braunr: I concur with youpi, hooking into fshelp_fetch_root
+ should do just fine
+ <teythoon> I'll just try that
+ <braunr> ok
+ <braunr> how do you deal with mtab reporting itself ?
+ <teythoon> o_O does it do that?
+ <braunr> no, but it should
+ <braunr> when i set it as an active translator, i get a deadlock
+ <braunr> hm
+ <braunr> teythoon: before you change libfshelp, i'd like you to try
+ something else
+ <braunr> use more appropriate names for global variables in mtab.c
+ <braunr> in particular, the variable path clashes with local names
+ <teythoon> noted
+ <braunr> teythoon: as a side note (i'm not asking to rewrite anything)
+ <braunr> i strongly recommend a very explicit object oriented style of
+ coding
+ <braunr> (or data-oriented as it's sometimes called)
+ <braunr> use prefixes for all your interfaces so they can be made public if
+ needed (which acts as a namespace and avoids lots of collisions
+ naturally)
+ <braunr> use "constructors" and "destructors" (functions that both allocate
+ and initialize)
+ <braunr> this helps avoiding leaks a lot too
+ <teythoon> hm, I thought I did that, could you be more specific?
+ <braunr> ok didn't see the comment
+ <braunr> /* XXX split up */ error_t mtab_populate (...
+ <braunr> :)
+ <braunr> as a better example, see your code in libfshelp/translator-list.c
+ <braunr> struct translator should have been treated as an object
+ <braunr> this would probably have completely avoided any leaks in the first
+ place
+ <teythoon> braunr: right, I deviated from that style there
+ <braunr> teythoon: these are minor details, don't mind them too much, i
+ just find it helps me a lot
+ <teythoon> braunr: sure, I appreciate the feedback :)
+
+
+### IRC, freenode, #hurd, 2013-10-08
+
+ <teythoon> braunr: I'm on to the passive translator not getting registered
+ issue
+ <teythoon> however, removing them from the list if the active translator is
+ killed does not work as expected... I still need to fiddle with the
+ notifications to get this right
+ <braunr> ok
+
+
+### IRC, freenode, #hurd, 2013-10-16
+
+ <teythoon> braunr: btw, I fixed the 'passive translator not showing up in
+ proc/mounts'-issue
+ <teythoon> but 4 ports do leak each time a translator is killed and
+ reinstalled
+ <teythoon> this happens with passive ones as well as active ones
+ <braunr> teythoon: is that issue tied to your changed ?
+ <braunr> changes*
+ <teythoon> I'm not sure tbh, testing that is on my list of things to do
+ <braunr> ok
+ <braunr> first thing to know i guess
+ <teythoon> yes
+
+
+## Memory Leak in `translator_ihash_cleanup`
+
+### IRC, freenode, #hurd, 2013-10-04
+
+ <braunr> teythoon: isn't there a leak in translator_ihash_cleanup ?
+ <teythoon> braunr: looks like, yes
+ <teythoon> braunr: I probably forgot to add the free (element->name) when I
+ added the name field
+ <braunr> teythoon: ok
+ <braunr> teythoon: i let you fix that :p
+ <teythoon> braunr: sure ;)