[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_hurd]]

[RPC to self with rendez-vous leading to duplicate port destroy](http://lists.gnu.org/archive/html/bug-hurd/2011-03/msg00045.html)

IRC, freenode, #hurd, 2011-03-14

    youpi: I wonder, why does the root FS call diskfs_S_dir_lookup() at all?...
    errr, because a client asked for it?
    (problem with RPCs is you can't easily know where they come from :) )
    (especially when it's the root fs...)
    ah, it's about a client request... didn't see that
    well, I just said "is called", yes
    I do not really understand though why it tries to reauthenticate against itself...
    I fear my memory of the lookup mechanism grew a bit dim
    see the source
    it's about a translated entry
    (and I never fully understood some aspects anyways...)
    it needs to start the translated entry as another user, possibly
    yes, but a translated entry normally would be served by *another* process?...
    sure, but ext2fs has to prepare it
    thus reauthenticate to prepare the correct set of rights
    prepare what?
    rights
    so the process is not root, doesn't have / opened as root, etc.
    rights for what?
    err, about everything
    IIRC the reauthentication is done by the parent FS on the port to the *translated* node
    and the translated node should be a different process?...
    that's not what I read in the source
    fshelp_fetch_root
    ports[INIT_PORT_CRDIR] = reauth (getcrdir ());
    here, getcrdir() returns ext2fs itself
    well, perhaps the issue is that I have no idea what fshelp_fetch_root() does, nor why it is called here...
    it notably starts the translator that dir_lookup is looking at, if needed
    possibly as a different user, thus reauthentication of CRDIR
    so this is about a port that is passed to the translator being started?
    no
    well, depends on what you mean by "port"
    it's about reauthenticating a port to be passed to the translator being started
    and for that a rendez-vous port is needed for the reauthentication
    and that's the one at stake
    yeah, I meant the port that is reauthenticated
    what is CRDIR?
    current root dir
    ... so the parent translator passes its own root dir to the child translator; and the issue is that for the root FS the root dir points to the root FS itself...
    yes
    OK, that makes sense
    (but that's only one example, rgrep mach_port_destroy hurd/ show other potential issues)
    well, that's actually what I wanted to mention next... why is the rendez-vous port destroyed, instead of just deallocating the port right and letting reference counting to it's thing?...
    do its thing
    "just to make sure" I guess
    it's pretty obvious that this will cause trouble for any RPC referencing itself...
    well, follow-up with that on the list
    with roland/tb in CC
    only they would know any real reason for destroy
    btw, if you knew how we could make _hurd_select()'s raw __mach_msg call be interruptible by signals, that'll permit to fix sudo
    (damn, I need sleep, my tenses are all wrong)
    BTW, does this cause any actual trouble?...
    I don't know much about interruption... cfhammer might have a better idea, he look into that stuff quite a bit AIUI
    looked
    (hehe, it's not only your tenses... guess there's something in the ether ;-) )
    it makes sudo, mailq, etc. fail sometimes
    I mean the rendez-vous thing
    that's it, yes
    sudo etc. fail at least due to this
    so these are two different problems that both affect sudo?
    (rendez-vous and interruption I mean)
    yes
    with my patch the buildds have much fewer issues, but still some
    (my interrupt-related patch)
    I'm installing a s/destroy/deallocate/ version of ext2fs on the buildds, we'll see how it behaves
    (it fixes my testcase at least)
    interrupt-related patch? only thing interrupt-related I remember was the reauthentication race...
    that's what I mean
    well, cfhammer investigated this in quite some depth, explaining quite well why the race is only mitigated but still exists...
    problem is that we didn't know how to fix it properly
    because nobody seems to understand the cancellation code, except perhaps for Roland and Thomas
    (and I'm not even entirely sure about them :-) )
    I think his findings and our conclusions are documented on the ML...
    by "much fewer issues", I mean that some of the symptoms have disappeared, others haven't
    BTW, couldn't the rendez-vous thing be worked around by simply ignoring the errors from the failing deallocate?...
    no, failing deallocate are actually dangerous
    why?
    since the name might have been reused for something else in the meanwhile
    that's the whole point of the warning I had added in the kernel itself
    I see
    such things really deserve tracking, since they can have any kind of consequence
    does Mach try to reuse names quickly, rather than only after wrapping around?...
    it seems to
    OK, then this is a serious problem indeed
    (note: I rarely divine issues when there aren't actual frequent symptoms :) )
    well, the problem with the warning is that it only shows in the cases that do *not* cause a problem...
    so it's hard to associate them with any specific issues
    well, most of the time the port is not reused quickly enough
    so in most case it shows up more often than causing problem

IRC, freenode, #hurd, 2011-03-14

    ok, mach_port_deallocate actually can't be used
    since mach_reply_port() returns a receive right, not a send right
    * youpi guesses he will really have to manage to understand all that port stuff completely
    oh, right
    youpi: hm... now I'm confused though. if one client holds a receive right, the other client (or in this case the same process) should have a send or send-once right -- these should *not* share the same name in my understanding
    destroying the receive right should turn the send right into a dead name
    so unless I'm missing something, the destroy shouldn't be a problem, and there must be something else going wrong
    hm... actually I'm probably wrong
    yeah, definitely wrong. receive rights and "ordinary" send rights share the name. only send-once rights are special
    I wonder whether the problem could be worked around by using a send-once right...
    mach_port_mod_refs(mach_task_self(), name, MACH_PORT_RIGHT_RECEIVE, -1) can be used to deallocate only the receive right
    oh, you already figured that out :-)
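
The hazard discussed in the two logs can be illustrated with a toy model in plain C. This is only a sketch and does not use the Mach API: the name table, the `toy_*` functions and the "lowest free name is reused first" policy are all hypothetical, the last one standing in for the observation above that Mach seems to reuse names quickly. It assumes, as in the RPC-to-self case, that the receive right and the send right of the rendez-vous port share one name in the same task.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NAMES 8

/* One entry per port name in a single task's toy name space. */
struct toy_entry {
    bool in_use;    /* the name currently denotes something */
    bool receive;   /* the task holds the receive right under this name */
    int  send_refs; /* references on the send right (or what is left of it) */
};

static struct toy_entry table[TOY_NAMES];

void toy_reset(void)
{
    for (int n = 0; n < TOY_NAMES; n++)
        table[n] = (struct toy_entry){ 0 };
}

/* Allocate a receive right under the lowest free name (assumed policy). */
int toy_reply_port(void)
{
    for (int n = 0; n < TOY_NAMES; n++)
        if (!table[n].in_use) {
            table[n] = (struct toy_entry){ .in_use = true, .receive = true };
            return n;
        }
    return -1;
}

/* Add a send reference under an existing name (same task, same name). */
void toy_make_send(int name)
{
    assert(name >= 0 && name < TOY_NAMES && table[name].in_use);
    table[name].send_refs++;
}

/* Destroy-like: drop every right under the name at once. */
int toy_destroy(int name)
{
    if (!table[name].in_use)
        return -1;
    table[name] = (struct toy_entry){ 0 };
    return 0;
}

/* Deallocate-like: drop a single send reference. */
int toy_deallocate(int name)
{
    if (!table[name].in_use || table[name].send_refs == 0)
        return -1;
    if (--table[name].send_refs == 0 && !table[name].receive)
        table[name].in_use = false;
    return 0;
}

/* Drop only the receive right. Leftover send references keep the name
   taken (the toy does not model the dead-name state itself). */
int toy_drop_receive(int name)
{
    if (!table[name].in_use || !table[name].receive)
        return -1;
    table[name].receive = false;
    if (table[name].send_refs == 0)
        table[name].in_use = false;
    return 0;
}

/* Returns true if an unrelated port lost a reference it should have kept. */
bool scenario_destroy(void)
{
    toy_reset();
    int rdv = toy_reply_port();   /* rendez-vous port */
    toy_make_send(rdv);           /* the other side's right, same name */
    toy_destroy(rdv);             /* first side wipes the whole name */
    int other = toy_reply_port(); /* unrelated port reuses the name */
    toy_make_send(other);
    toy_deallocate(rdv);          /* second side's release hits `other` */
    return table[other].send_refs == 0;
}

bool scenario_drop_receive(void)
{
    toy_reset();
    int rdv = toy_reply_port();
    toy_make_send(rdv);
    toy_drop_receive(rdv);        /* name stays reserved for the send ref */
    int other = toy_reply_port(); /* unrelated port gets a different name */
    toy_make_send(other);
    toy_deallocate(rdv);          /* second side releases its own reference */
    return table[other].send_refs == 0;
}
```

In this model `scenario_destroy()` reports damage to the unrelated port, while `scenario_drop_receive()` does not, because the name is not freed until the last reference under it is gone. Whether the real kernel behaves the same way in every case is exactly what the discussion above leaves open.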