[[!meta copyright="Copyright © 2010, 2013, 2014 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] [[!tag open_issue_hurd]] # IRC, freenode, #hurd, 2013-02-26 btw, about fakeroot-hurd the remaining issue I see is with argv[0] (yes, again...) ## IRC, freenode, #hurd, 2013-04-03 btw, I believe our fakeroot-hurd is close to working actually it's just a argv[0] issue supposed to be fixed by exec_file_name but apparently not fixed in that case, for some reason ## IRC, freenode, #hurd, 2013-08-26 < teythoon> also I looked into the fakeroot issue, aiui the problem is that scripts are not handled correctly, right? < teythoon> the exec server fails to locate the scripts file name, and so it hands the file_t to the interpreter process and passes /dev/fds/3 as script name < teythoon> afaics that breaks e.g. python < youpi> yes < youpi> pinotree's exec_file_name is supposed to fix that, but for some reason it doesn't work here < pinotree> it was pochu's, not mine < youpi> ah, right < teythoon> ah I see, I was wondering about that < pochu> it was working for a long time, wasn't it? < pochu> and only stopped working recently < youpi> did it completely stop? < youpi> I have indeed seen odd issues < youpi> I haven't actually checked whether it has completely stopped working < youpi> probably worth looking there first < pinotree> gtk+3.0 fails, but other stuff like glib2.0 and gtester-using stuff works < teythoon> huh? I created tests like "#!/bin/sh\necho $0" and that says /dev/fd..., and a python script doing the same doesn't even run, so how can it work for a package build? < youpi> it works for me in plain bash < youpi> #!/bin/sh < youpi> echo $0 < youpi> € $PWD/test.sh < youpi> /home/samy/test.sh < teythoon> it does !? < youpi> yes < youpi> not in fakeroot-hurd however, as we said < teythoon> well, obviously it works when not being run under fakeroot-hurd, yes < youpi> ok, so we weren't talking about the same thing < youpi> a mere shell script doesn't work in fakeroot-hurd indeed < youpi> that's why we still use fakeroot-sysv < teythoon> right < youpi> err, -tcp ## IRC, freenode, #hurd, 2013-11-18 I believe I figured out the argv[0] issue with fakeroot-hurd but I'm not sure how to fix this first of all, Emilios file_exec_file_name patch set works fine but not with fakeroot http://git.sceen.net/hurd/hurd.git/blob/HEAD:/exec/hashexec.c#l300 check_hashexec tries to locate the script file using a heuristic Emilios patch improves the situation with just providing this information but then, the identity port of the "discovered" file is compared with the id port of the script file to verify if the heuristic found the right file but when using fakeroot-hurd, /hurd/fakeroot proxies all requests but the exec server is outside of the /hurd/fakeroot environment, so it gets the id port from the real filesystem we could skip that test if the script name is explicitly provided though that test was meant to see whether a search through $PATH turned up the right file teythoon: nice braunr: thanks :) unfortunately, dpkg-buildpackaging hurd with it still fails for some reason but it is faster than fakeroot-tcp :) even chown ? or chmod ? dunno in detail, but the whole build is faster if you can try it, i'm interested because chown/chmod is also slow on linux with fakeroot-tcp i can try... so it's probably not a hurd bug braunr: yes, it really is no i mean chown/chmod being slow with fakeroot-tcp is probably not a hurd bug but a fakeroot-tcp bug chowning all files in /usr/bin takes 5.930s with fakeroot-hurd (6.09 with startup overhead) vs 26.42s (26.59s) with fakeroot-tcp but try it on linux (fakeroot-tcp i mean) although you may want to do it on something you don't care much about :p) ## IRC, freenode, #hurd, 2013-12-03 * teythoon is gonna hunt a fakeroot bug ... % fakeroot-hurd /bin/sh -c ":> /tmp/some_file" /bin/sh: 1: cannot create /tmp/some_file: Is a directory ah fakeroot-hurd prevents installing stuff with /bin/install sure fakeroot-hurd, why would i work on the slow one ? i don't know because it makes chmod/chown/maybe others horrenddously slow ? yes, fixing this involves fixing fakeroot-hurd are you sure ? i prefer repeating just in case: i saw that problem on linux as well with fakeroot-sysv so ? i'm almost certain it's a pure fakeroot bug, not a hurd bug so even if this is fixed, it still has to pay the socket communication overhead fixing fakeroot-hurd so that i can be used instead of fakeroot-tcp is a very good thing to do, obviously it* but it won't solve the chown/chmod speed (or, probably won't) huh, why not ? 15:53 < braunr> i'm almost certain it's a pure fakeroot bug, not a hurd bug when i say it's slow, i should be more precise it doesn't show up in top yes, but why would fakeroot-hurd suffer from the same issue ? the cpu is almost idle oh right, it's a completely different tool my bad right, right, the proper way to implement fakeroot actually :) yes this will bring near-native speed ## IRC, freenode, #hurd, 2013-12-05 fakeroot-hurd just successfully built mig :) hangs in dh_gencontrol when building gnumach or hurd though i believe it hangs waiting for a lock lock like in file lock that is braunr: no more room for vm_map_find_entry in 80220a40 80220a40 <- is that a task ? or a vm_map, not sure probably a vm_map ## IRC, freenode, #hurd, 2013-12-06 well, aren't threads a source of endless entertainment ... ? well, I found three more bugs in fakeroot-hurd one of them requires fixing the locking used in fakeroot ouch the current code does some lock cycling to aquire a lock out of order cycling ? in the netfs_node_norefs function release and reaquire i see which imho should be better solved with a weak reference working on it, it no longer deadlocks but i broke something else ... endless fun ;) such things could have been done right in the beginning ... yes, I wonder libports has weak references but pflocal is the only user hm none of the lib*fs support that didn't i add one in libdiskfs too ? anyway, irrelevant weak references are a nice feature teythoon: i don't see the cycling you mentioned only netfs_node_refcnt_lock being dropped temporarily yep, that one line 145 note that due to another bug this code is currently never run how surprising .. the note about some leak actually gave a hint about that yeah, that leak I think i'm actually very close it's just so frustrating, i thought i got it last night good luck then thanks :) ## IRC, freenode, #hurd, 2013-12-09 sweet, i fixed fakeroot-hurd :) /clap what was the problem ? lots i see it's amazing it actually run as well as it did mess strikes again i hate messy code .. * teythoon is building half a hurd package using this ... stay tuned ;) teythoon: is this going to make building faster as well? most likely, yes fakeroot-tcp is known to be slow, even on linux teythoon: are you sure about the transparent retry patch ? pretty sure, why ? it's about a more general issue that we didn't fix yet our last discussions about it lead us to agree that clients should check the identity of a server before interacting with it braunr: i don't understand, what's the problem here ? teythoon: fakeroot does the lookup itself, doesn't it ? yes teythoon: but was that also the case before your patch ? braunr: yes teythoon: then ok teythoon: i guess fakeroot handles requests only for a specific set of calls right ? and for others, requests are directly relayed braunr: yes and that still is the case, right ? yes ok looks right since it only affects lookups ok then well, fakeroot-hurd built half a hurd package in less than 70 minutes a new record for my box compared to how much before ? (and why half of it ?) unfortunately it hung after signing the packages... some perl process with a /usr/bin/tee child killing tee made it succeed though braunr: i don't build the udeb package oh ok braunr: compared with ~75 with fakeroot-tcp and my demuxer rework, ~80 before teythoon: nice ## IRC, freenode, #hurd, 2013-12-18 there, i fixed the last fakeroot-hurd bug *whee* :) i thought so many times that i got the last fakeroot bug ... last as in it's in a good enough shape to compile the hurd package that is but now it is :) this will make glibc and others so much faster to build ## IRC, freenode, #hurd, 2013-12-19 teythoon_: hum, you should make the behaviour of fakeroot-hurd on the last client exiting optional y? fakeroot-tcp does the very same thing fakeroot-hurd is different it's part of the file system yes users may want it to stay around and reuse it without checking it's actually there but once the last client is gone, who is ever getting another port to it ? no that cannot happen really ? yes i thought it was like remap since remap is based on it the same thing applies to remap only settrans has the control port hum and uses it once to get a protid for the working dir of the initial process started inside the chrooted environment you may not want to chroot inside so ? then, you get another protid i'll make an example i create a myroot directory implemented by fakeroot populate it leave and do something else, i might want to return to it later ah ok, so you are not using settrans --chroot or maybe i'm confusing the fakeroot translator and fakeroot-hurd 10:48 < braunr> you may not want to chroot inside yes hm ok, so the patch could be changed to check whether the last control port is gone too i have no idea of any practical use, but i don't see a valid reason to make a translator go away just because it has no client except for resource usage and if it's installed as a passive translator although that would make fakeroot loose its state though remap state is on the command line so it would be fine for it see what i mean ? yes i do fakeroot state could be saved in some db one day so it may apply, if anyone feels the need so what about checking for control ports too ? i'm not too familiar with those who has the control port of a passive translator ? the parent ? that should cover the use case you described for the parent translator for fsys_getroot requests it has to keep it around and for more fsys stuff too and if active ? settrans ? who just looses it ? if settrans is used to start an active translator, the parent fs still gets a right to the control port ok i don't have a clear view of what this implies for fakeroot-hurd we'd want fakeroot-hurd to clean all resources including the fakeroot translator on exit for fakeroot-hurd (or any child translator) this means that a port from the control port class will still exists so we do not exit oh, you're speaking of fakeroot.sh ? the wrapper script ? probably for me, fakeroot-hurd is the command line too, similar to fakeroot-sysv and fakeroot-tcp and fakeroot is the translator yes, agreed fakeroot-hurd could use settrans --force --chroot ... to force fakeroot to exit if the main chrooted process dies but that'd kill anything that outlives that process that might be legitimate, say a process daemonized so detecting that noone uses fakeroot is the much cleaner solution ok also, that's what fakeroot-tcp does which is why i suggested an option for that why add an option if we can do the right thing without troubling the user ? ah, if we can, good i think we can I'll rework the patch, thanks for the hint so just to be clear the way you intend it to work is wait for all clients and the control port to drop before shutting down the control port is dropped when dettaching the translator, right ? yes but hm what if clients spawn other processes ? they won't find the translator any more then, that client get's a port to fakeroot at least for it's working dir so another protid is created ah yes, it's usually choorted for such uses chrooted so fakeroot will stick around but clients, even from fakeroot, might simply use absolute paths so ? in which case they won't find fakeroot it will hit fakeroots dir_lookup sure how so ? if the path is absolute, it will trigger a magic retry of some kind so the client uses it's root dir port i thought the lookup would be done straight from the root fs port .. which points to fakeroot of course ah, chrooted again that's the whole point so this implies clients are chrooted they are even if you do another chroot what i mean is that root port also points to a fakeroot port if we detach the translator, and clients outside the chroot spawn processes, say shell scripts, they won't find the fakeroot tree now, i wonder if we want to actually handle that i'm just uncomfortable with a translator silently shutting down because it has no client if fakeroot is detached, how are clients outside the chroot ever supposed to get a handle to files inside the fakerooted env ? it makes sense for fakeroot, so the expected behaviours here aer conflicting they had those before fakeroot being detached then fakeroot wouldn't go away right unless there is a race but i don't think there is there isn't i call netfs_shutdown clients get the rights before the parent has a chance to terminate and only shutdown if it doesn't return ebusy makes sense ok go ahead :) cool, thanks for the walk-through ;) on the other hand .. that's a complicated topic left unfinished by the original authors one of many having translators automatically go away when there is no client may be a good feature but it only makes sense for passive translators and this should be automated the lib*fs libraries should be able to handle it or, we could go for proper persistence instead stay around if active, leave after a while when no more clients if passive why ? clean solution persistence looks much more expensive to me other benefits i mean persistence looks so expensive it doesn't make sense in a general purpose system sure, we could make our *fs libs handle this smarter at a much lower cost don't we get a handle to the underlying file ? i think we do yes if that's actually a file and not a directory, we could store data into it many translators are read-only so ? well, when we can write, we can use passive translators instead normally yes depends on the fs type actually but you're right, we could use regular files or a special type of file, i don't know braunr: BTW, I agree that active translators should only go away when no ports are open anymore, while passive ones can exit when control ports are still open but no protids antrik: you mean as a general rule ? that leaves the question how the translator distinguishes between having a passive translator record and not having one I believe I already arrived at that conclusion in some design discussion, probaly regarding namespace-based translator selection teythoon: yeah, as a general rule interesting currently there are command line arguments controling timeouts, but they don't consider control ports IIRC i thought there are problems with shutting down translators in general (also, command line arguments seem inconvenient to distinguish the passive vs. active case...) yeah, but we disregard the timeouts in the debian flavor of hurd teythoon: err... no we don't. at least not last time I knew. are you confusing this with thread timeouts? simple test: do ls -l on /dev, wait a few minutes, compare what do you expect will happen ? the unused translators should go away no that must be new then might be, yes http://darnassus.sceen.net/gitweb/teythoon/packaging/hurd.git/blame/HEAD:/debian/patches/libports_stability.patch antrik: debian currently disables both the global and thread timeouts in libports my work on thread destruction consists in part in reenabling thread timeouts, and my binary packages do that well so far :) braunr: any idea why the global timeouts were disabled? ## IRC, freenode, #hurd, 2013-12-20 antrik: not sure but i suspect there could be races if a message arrives while the server is going away, i'm not sure the client can determine this and retry transparently good point... not sure how that is supposed to work exactly ## IRC, freenode, #hurd, 2013-12-31 btw, we should remove the libports_stability patch and directly change the upstream code if you agree, i can force the global timeout to 0 (because we're still not sure what can go wrong when a translator goes away while a message is being delivered to it) i didn't experience any slowdown with thread destruction however so i'm tempted to set that to an actual reasonable timeout value of 30-300 seconds braunr: if you do, please introduce a macro for the default value so it can be changed easily teythoon: yes i don't understand why these are left as parameters tbh true 30 seconds seems to be plenty enough ## IRC, freenode, #hurd, 2014-01-17 time to give fakeroot-hurd a shot http://darnassus.sceen.net/~rbraun/darnassus_fakeroot_hurd_assert braunr: (wrt fakeroot-hurd) well in my book that shouldn't happen that's why i put the assertion there ;) i assumed so :) then again, /me does not agree with "threads" as concurrency model >,<, and that feeling seems to be mutual :p ? well, obviously, the threads do not agree with me wrt to that assertion the threads ? well, fakeroot is a multithreaded server teythoon: i'm not sure i get the point, are you saying you're not comfortable with threads ? that's exactly what i'm saying ok coroutines/functional i guess ? csp functional not so much ## IRC, freenode, #hurd, 2014-01-20 [[open_issues/libpthread]], [[open_issues/libpthread/t/fix_have_kernel_resources]]. teythoon: it's perfectly possible that the bug i had with fakeroot-hurd have been caused by my own glibc thread related patches has* ok *phew* :p :) i wonder if youpi could reproduce his issue on his machine what issue ? i must have missed something some package failed but he didn't gave any details he wanted to try it on his vm first ok ## IRC, freenode, #hurd, 2014-01-21 teythoon: i still get the same assertion failure with fakeroot-hurd will take a look at that sometimes too braunr: hrm :/ teythoon: don't worry, i'm sure it's nothing big in the mean time, there are updated hurd and glibc packages on my repository with fixed tls and thread destruction cool :) ## IRC, freenode, #hurd, 2014-01-23 teythoon: can you briefly explain this fake reference thing in fakeroot when you have some time please ? braunr: fakeroot creates ports to hand out to clients every port represents a node and references a real node fakeroot allows one to set attributes, e.g. file permissions on any node as if the client was root those faked attributes are stored in the node objects let's focus on fake_reference please once some attribute is faked, that node has to be kept alive otherwise, that faked information is lost so if the last peropen object is closed and some information is faked, a fake reference is kept as indicated by a flag hm in dir lookup, if a node is looked-up that has a fake reference, it is recycled, i.e. the flag cleared and the referecne count is not incremented so every time fakeroot_netfs_release_protid is called b/c, the node in question should not have the fake reference flag set what's the relation between the number of hard links and this fake reference ? i don' i don't think fakeroot has a notion of 'hard links' it does the fake reference is added on nodes with a hard link count greater than 0 but i guess that just means the underlying node still exists ah yes right currently, if the real node is deleted, the fake node is still kept around let's say it's ok for now that's what the comment is talking about, the one that indicates that garbage collection could help here yes properly fixing this is difficult agreed it would require something like inotify anyway b/c of the way file deletion works let's just ignore the issue, that's not what i'm hunting agreed the assertion i have is telling us that we're dropping a fake reference are we certain this isn't possible ? that function is called if a client dereferences a port in order to have a port in the first place, it has to get it from a dir_lookup the dir lookup turns a fake reference into a real one so i'm certain of that (barring a race condition somewhere) ok netfs_S_dir_lookup grabs idport_ihash_lock (line 354) but doesn't release it if nn == NULL (lines 388-392) hm, my file numbers are slightly different o_O i have printfs around sorry :) ok new node unlocks it new_node oh how unintuitive .. yes, don't blame me ;) that's how it was :) worse, the description says "if successful" .. ah no, the node lock ok yes, badly worded description i strongly doubt it's a race how do you trigger that assertion failure ? dpkg-buildpackage -rfakeroot-hurd -uc -us for the hurd package very similar to one of your test cases i think umm :-/ one thing that i find confusing is that fake_reference seems to apply to nodes, whereas release_protid is about, well, protids is there a 1:1 relationship ? since there is a peropen in the protid, i assume not it may be a race actually np->references must be accessed with netfs_node_refcnt_lock locked hm no, that's not it no, it's not a 1:1 relationship note that the lock idport_ihash_lock serializes most operations, despite it's name indicating that it's only for the hash table the "interesting" operations being dir_lookup and release_protid yes again, that's another issue why ? that's a pretty strong guarantee already ah yes, i was referring to scalability sure the assertion is triggered from ports_port_deref in ports_manage_port_operations_multithread but i found it hard to reason about fakeroot, there are multiple locks involved, two kinds of reference counting across different libs yes yes, that's to be expected teythoon: do we agree that the fake reference is reused by a protid ? braunr: yes why is there a ref counter for the protid as well as the peropen then ? :/ funny... i thought there was no refcnt for the peropen objects, but there is but for fakeroot-hurd that shouldn't matter, right ? i don't know here, one protid object is associated with one peropen object yes and the other way around, i.e. it's 1:1 so the refcount for those should be identical but i get a case where protid has a refcnt of 0 while the peropen has 2 .. umm, that doesn't sound right teythoon: ok, it does look like a race on np->references node references are protected by a global lock in lib*fs libs yes you check it without holding it which means another protid can be closed at the same time, setting the flag on the underlying node i'll make a proper patch soon they cannot both hold the hash lock hm teythoon: actually, i don't see why that's relevant one thread closes its protid, sets the fakeref flag the other does the same, chokes on the assertion serially i'm always a little fuzzy when exactly the references get decremented but shouldn't only the second thread set the fakeref flag ? well, that's not what i see i'll check what happens to this ref counter see how my release_protid function calls netfs_release_protid just after the out label *while holding the big hash lock so, any refcounting should happen while the lock is being held, no ? perhaps now, my logs show something new a case where the protid being released was never printed before i.e. not obtained from dir_lookup or at least, not fakeroot dir_lookup huh, where did it came from then ? no idea only dir_lookup hands out those check_openmodes calls dir_lookup too yes, but that's not our dir_lookup that's what i mean it bypasses fakeroot's custom dir_lookup but i guess the reference already exists at this point bypass ? i wouldn't call it that you're right, wrong wording it accesses files on other translators yes the netnode is already present yes could it be the root node ? i do not believe so the root node is always faked and is handed out to the first process in the fakeroot env for it's current directory port so you could try something that chdirs away to test that hypothesis the assertion looks triggered by a chdir how do you know that ? dh_auto_install: error: unable to chdir to build-deb ah well, or that is just the operation after fakeroot died and completely unrelated maybe can you trigger this reliably ? yes i'm trying to write a shell script for that so for you, fakeroot-hurd never succeeded in building a hurd package ? no on darnassus ? yes b/c i stopped working on fakeroot-hurd when it was in a good-enough shape to build the hurd package >,< maybe my system is not fast enough to hit this race (if it turns out to be one) some calls seems to decrease the refcount of the root node call* have you confirmed that it's the root node ? almost i could say yes teythoon: actually no, it's not .. could be .. teythoon: on what node does fakeroot-hurd install the fakeroot translator when used to build debian packages ? hum could it simply be that the check on np->references should be moved above the assertion ? braunr: it is not bound to any node, check settrans --chroot oh right teythoon: ok i mean does it shadow / ? looks very likely, otherwise the chroot wouldn't work i'm not sure what you mean by shadow settrans --chroot cmd -- / /hurd/fakeroot ? but yes, for any process in the chroot-like env every real node is replaced, including / makes sense teythoon: moving the assertion seems to fix the issue intuitively, it seems reasonable to assume the fakeref flag can only be set when there is only one reference, namely the fake reference (well, the fake ref, recycled by the last open) no, i don't follow i'd still say, that if ...release_protid is called, then there is no way for the fake flag to be set in the first place that's why i put the assertion in ;) on the other hand, you check the refcnt precisely because other threads may have reacquired the node but why would moving the assertion change anything ? if we would do that, we'd "lose" all threads that see np->reference being >1 but for those objects the fake_reference flag should never be set anyways i cannot see why this would help (does it help ?) (and if it does, it points to a serious problem imho) i'm recreating the traces that made me think that to get a clearer view of what's happening the problem i have with the current code is this there can be multiple protid referring to the same node occurring at the same time they are serialized by the hash table lock, ok but there apparently are cases where the first (of two) protids being closed sets the fakeref flag and the following chokes because the flag is set i assume you put this refcount check because you assumed only the last protid being closed can set the flag, right ? but then, why > 1 ? why not > 0 ? yes, that's what i was trying to assert b/c the 1 is our reference which one exactly ? >1 is anyone *beside* us ? hm you mean the reference held by the protid being destroyed yes isn't that reference already dropped before calling the cleanup function ? ah no, it's the node ref yes released by netfs_release_protid exactly which is called without the hash table lock held hm no it's locked damn my brain is slow today i actually think that it's the combination of manual reference counting and the primitive concurrency model that makes it hard to reason about this well the model is quite simple too accesses to refcounters must be protected by the appropriate lock this isn't done here, on the assumption that all referencing operations are protected by another global lock all the time even if a model is simple, this does not mean that it is a good model for human beings to comprehend and reason about i don't know note that netfs_drop_node is designed to be called with netfs_node_refcnt_lock locked implying the refcount must remain stable between checking it and dropping the node netfs_make_peropen is called without the hash table lock held in dir_lookup and this increases the refcount although the problem is rather that something decreases it without the lock held we should port libtsan and just ask gcc -fsanitize=thread what about the netfs_nput call at the end of dir_lookup ? the fake ref should be set by the norefs function that should not decrease the count to 0 b/c the caller holds a reference too yes that's ugly ugh i'm unable to think clearly right now as mentioned in the commit message, you cannot do something like this in the norefs function bbl ;) bye teythoon thanks for your time for when you come back : instead of maintaining this "fake" reference, why not assumeing the hash table holds a reference, and simply count it the same way a cache does and drop that reference when removing a node, either to reflect the current state of the underlying node, or because the translator is being shut down ? why not assume* bbl too sure, refactoring is definitively an option ## IRC, freenode, #hurd, 2014-01-24 teythoon: ok, i'll take care of fakeroot braunr: thanks. tbh i was a little fed up with that little bugger >,< i can imagine considering the number of patches you've sent already teythoon: are you sure about your call to fshelp_lock_init ? yes, why do you ask ? (the test case is given in the commit message) it doesn't look right to me to call "init" while the node is potentially locked i noticed libdiskfs peropen release function takes care of releasing locks it looks better to me it's not about releasing the lock it's about faking the file being closed which implicitly releases the lock the file is being close closed since it's in the cleanup function yes, but we keep it b/c the file has faked attributes did you look at the problem description in the commit message ? we keep the node not the peropen so ? the lock is in the node why would libdiskfs do it in the peropen release then ? there is an inconsistency somwhere actually, the lock looks to be per open or rather, the lock is per node, but its status is recorded per open allowing the implementation to track if a file descriptor was used to install a lock and release it when that file descriptor goes away why would the node be locked ? locked in what way, file-locking locked ? yes posix explicitely says that file locks must be implicitely removed when closing the file descriptor used to install them, so that makes sense isn't hat exactly what i'm doing ? no you're initializing the file lock init != unlock and it's specific to fakeroot, while it looks like libnetfs should be doing it libnetfs would do it but we prevent that by keeping the node alive again, it's a per open thing and no, libnetfs doesn't release locks implicitely in the current version didn't we agree that for fakeroot one peropen object is associated with one protid object ? yes and don't keep those alive so let them die peacefully, and fix libnetfs so it releases the lock as it's supposed to and we* don't we don't keep those alive why would we ? yes that's what i wanted to say what i mean is since letting peropens die is already what is being done there is no need for a special handling of locks in fakeroot oh on the other hand, libnetfs must be fixed ok, that might very well be true (we need to bring libnetfs and diskfs closer so that they can be diff'ed easily) i just wanted to check your reason for using lock_init in the first place yes .. teythoon: also, i think we actually do have what's necessary to deal with garbage collection namely, dead-name notifications i'll see if i can cook something simple enough otherwise, merely keeping every node around is also acceptable considering the use cases dead-name notifications won't help if the real node disappears, no ? teythoon: dead name notifications on the real node port :) teythoon: at least i can reliably build the hurd package using fakeroot-hurd now let's try glibc :) ## IRC, freenode, #hurd, 2014-01-25 braunr: awesome :) teythoon: hm not sure :/ darnassus got oom teythoon: could be unrelated though teythoon: something has apprently made /home unresponsive :( teythoon: i suspect bots hitting apache and in particular the git repositories to have increased memory usage ## IRC, freenode, #hurd, 2014-01-26 teythoon: btw, fakeroot interacts very very badly with other netfs file systems e.g., listing /proc through it creates lots of nodes i'm not yet sure how to fix that using a dead name notification doesn't seem appropriate (at least not directly) because fakeroot holds a true reference that prevents the deallocation of the target node ## IRC, freenode, #hurd, 2014-01-27 teythoon: good news (more or less): fakeroot is actually leaking a lot when crossing file systems which means if i fix that, there is a good chance we can use it to build all packages with it -with it what do you mean exactly ? if target nodes are from /, there is no such leak as soon as the target nodes are from another file system, ports rights are leaked that's what fills the kernel allocator actually oh, so dir_lookup leaks ports when crossing translator boundaries ? seems so yeah, that might very well be it the dir_lookup logic in lib*fs is quite involved :/ yes, my simple attempts were unsuccessful but i'm confident i can fix it soon that sounds good :) i also remove the fake_ref flag and replace it with "accounting the reference in the hash table" as soon as a node is faked fine with me these will be the expected leak but they're far less in numbers than what i observe and garbage collection can be implemented later although i would prefer notifications a lot more end of the news, bbl :) found it :> braunr: -v ;) err = dir_lookup (...); if (dir != dnp->nn->file) mach_port_deallocate (mach_task_self (), dir); in other words, deallocate ports for intermediate file system root directories .. :) teythoon: currently building hurd and glibc packages but i intend to improve some more with the addition of a default faked state so that only nodes with modified faked states are retained how do you mark nodes as having the default faked state ? i don't ok, right, makes sense :) this sounds awesome, thanks for following up on this i'm quite busy with other stuff so, with proper testing, it should take me the week to get merged teythoon: well thanks for all the fixes you've done fakeroot was completely unusable before that if you push your changes somewhere i'll integrate them into my packages and test them ok implementing fakeroot -u could also be a good thing and this should work easily with that default faked state strategy ## IRC, freenode, #hurd, 2014-01-28 teythoon: i should be able to test fakeroot-hurd with the default faked attributes strategy today on glibc braunr: very nice :) azeem_: do you happen to know if fakeroot -u is used by debian ? i mean when building packages braunr: how does fakeroot-hurd perform on darnassus ? i mean, does it yield a noticeable improvement over fakeroot-tcp just like on my slow box ? i'm not measuring that :/ ok, no problem and since nodes are removed from the hash table, performance might decrease slightly but the number of rights is kept very low, as expected that's good i keep seeing leaks though when switching cwd between file systems humm so i assume something is wrong with the identity of . or .. it's so insignificant compared to the previous problems that i won't waste time on that teythoon: the problem with measuring on darnassus is that it's a public machine right often scanned by ssh worms or http bots [[cannot_create__dev_null__interrupted_system_call]]. but it makes complete sense to get better performance with fakeroot-hurd that's actually one of the reasons i'm working on it if not the main one :) that was my motivation too it shows how you can get an interchangeable unix tool that directly plugs well with the low level system and make it work better nicely put :) teythoon: i still can't manage to build glibc with fakeroot-hurd but i'm not sure why :/ there was no kernel memory exhaustion this time :/ cp: cannot create regular file `debian/libc-bin.dirs': Permission denied hum youpi: do you know if building debian packages requires fakeroot -u option ? I don't know braunr: man dpkg-buildpackage says it just runs "fakeroot debian/rules " sources confirm that http://sources.debian.net/src/dpkg/1.17.6/scripts/dpkg-buildpackage.pl#L465 gg0: ok ## IRC, freenode, #hurd, 2014-01-29 it seems that something sets the permissions of this debian/libc-bin.dirs file to 000 ... i've seen this too oh do you think it's a fakeroot-hurd bug ? have i mentioned something like this in a commit message ? yes it is ok i didn't see any mention of it but i could have missed it hm, i cannot recall it either but i've seen this issue with fakeroot-hurd ok it's probably the last issue to fix to get it to work for our packages teythoon: i think i have a solution for that last mode bug fakeroot doesn't relay chmod requests, unless they change an executable bit i don't see the point, and simply removed that condition to relay any chmod request braunr: did it work ? no fakeroot still consumes too many ports and for each file, there are at least two ports, the faked one, and the real one it should be completely reworked but i don't have time to do that i'll see if it works when building from scratch actually, it's not even a quantity problem but a fragmentation problem the function that fails is kmem_realloc .. ipc spaces are arrays in kernel space .... it's more like three ports per file, you forgot the identity port ah yes ## IRC, freenode, #hurd, 2014-02-03 teythoon: i'll commit my changes on fakeroot tonight they do improve the tool, but not enough to build glibc with it braunr: cool :), so how do we make it fully usable ? teythoon: i don't know .. i'll try re adding detection of nodes with no hard links for one but imho, it needs a rework based on what the real fakeroot does i won't work on it though teythoon: also, it looks like i've tested building glibc with a wrong test binary of my fakeroot version :/ so consider all test results irrelevant so far ## IRC, freenode, #hurd, 2014-02-04 fakeroot-hurd might turn out to be easily usable for our debian packages with the fixed binary :) teythoon: hum, can you explain 672005782e57e049c7c8f4d6d0b2a80c0df512b4 (trans: fix locking issue in fakeroot) when you have time please ? it looks like it introduces a deadlock by calling new_node (which acquires the hash table lock) while dir is locked, violating the hash table -> node locking order braunr: awesome, then there still is hope for fakeroot-hurd :) teythoon: i've been able to build glibc packages several times this night so except for this deadlock i've seen once, it looks good right that deadlock right, it does indeed violate the partial order of the locks :-/ teythoon: can you explain why you moved the lock in attempt_mkfile please ? teythoon: i've just tested a fakeroot binary without the patch introducing the deadlock, and glibc built without any problem braunr: well, this is very good news :) teythoon: but i still wonder why you made this patch in the first place, i don't want to revert it blindly and reintroduce a potential regression braunr: i thought i was fixing the order in which locks were taken. if the commit message does not specify that it fixes an issue, then i was probably just wrong and you can revert it oh ok good teythoon: another successful build :) i'll commit my changes awesome :) there might still be concurrency issues but it's much better i'm curious what you did :) so little :) i was sick all week heh you'll se see well, that's good actually ;) yes teythoon: actually there was another debugging line left over, and again, my test results are irrelevant @#! ## IRC, freenode, #hurd, 2014-02-05 teythoon: i got an assertion about nn->np->nn not being equal to nn atfer the hash table lookup is dir_lookup +failure that's bad not over yet i had a couple of those too i guess it's a use after free yes i used to poison the pointers and comment out the frees to track them down iirc teythoon: one of your patches stores netnodes instead of nodes in the hash table, citing some overwriting issue teythoon: i don't understand why using netnodes fixes this braunr: libihash has this cookie for fast deletes that has to be stored somewhere the node structure has no room for it uh yes it was that bad ... hence the uglyish back pointers i see looking back i cannot even say why it worked at all well, it didn't i believe libihash must have destroyed a linked list in the node struct possibly no, it did not >,<, but for simple tests it kind of did yes fakeroot sometimes corrupts memory badly .... and yes, turns out the assertion is triggered on nodes with 0 refs .. teythoon: it looks like even the current version makes wrong usage of the ihash interface locp_offset is defined as "The offset of the location pointer from the hash value" and indeed, it's an intptr_t teythoon: hm no, it looks ok actually, forget what i said :) *phew :p hmm, still occasional double frees in fakeroot, but it looks in good shape for single threaded tasks like package building teythoon: i've just sent my fakeroot patches braunr: sweet, i'll have a closer look tomorrow :) teythoon: i couldn't debug the double frees though :/ ## IRC, freenode, #hurd, 2014-02-06 btw, i'm able to successfully use fakeroot-hurd to build glibc packages, but is there a way to make sure the resulting archives contain the right privileges and ownerships ? I don't remember whether debdiff checks permissions braunr: I've just got fakeroot-hurd debian/rules clean dh_clean fakeroot: ../../trans/fakeroot.c:161: netfs_node_norefs: Assertion `np->nn->np == np' failed. while building eglibc youpi: yes, that lockup is most annoying... :/ youpi: with the new version ? yes hum i only had rare double frees, not that any more :/ youpi: ok i got the error too still not good enough ok ## IRC, freenode, #hurd, 2014-02-07 youpi: debdiff seems to handle permissions i've found the cause of the assertions braunr: groovie :) ## IRC, freenode, #hurd, 2014-02-08 braunr: nice :) http://darnassus.sceen.net/~rbraun/debdiff_report ## IRC, freenode, #hurd, 2014-02-10 and, on a completely different topic, here is a crash i can reproduce when using fakeroot: http://darnassus.sceen.net/~rbraun/fakeroot_hurd_rpctrace_o_var_tmp_out_rm_rf_dir.png ## IRC, freenode, #hurd, 2014-02-11 still working on fakeroot there are still races (not disturbing for package building but still ..) there may be wrong right handling i believe i have witnessed a fakeroot deadlock :/ aw not sure though, buildbot killed the build process before i could investigate teythoon: was it a big package ? half of the hurd package that's not a port right overflow then