[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_glibc]]

There are a lot of reports about this issue, but no thorough analysis.


# Short Timeouts

## `elinks`

IRC, unknown channel, unknown date:

This is related to ELinks... I've looked at the select() implementation for the Hurd in glibc and it seems that giving it a short timeout could cause it not to report that file descriptors are ready. It sends a request to the Mach port of each file descriptor and then waits for responses from the servers. Even if the file descriptors have data for reading or are ready for writing, the server processes might not respond immediately. So if I want ELinks to check which file descriptors are ready, how long should the timeout be in order to ensure that all servers can respond in time? Or do I just imagine this problem?


## [[dbus]]


## IRC

### IRC, freenode, #hurd, 2012-01-31

don't you find vim extremely slow lately ? (and not because of cpu usage but rather unnecessary sleeps) yes. wasn't there a discussion to add a minimum timeout to mach_msg for select() or something like that during the past months ? there was, and it was added that could be it I don't want to drop it though, some app really need it as a debian patch only iirc ?
yes ok if i'm right, the proper solution was to fix remote servers instead of client calls (no drop, unless the actual bug gets fixed of course) so i'm guessing it's just a hack in between not only with a timeout of zero, mach will just give *no* time for the servers to give an answer that's because the timeout is part of the client call so the protocol has to be rethought, both server/client side a suggested solution was to make it a parameter i mean, part of the message not a mach_msg parameter OTOH the servers should probably not be trusted to enforce the timeout. why ? they're not necessarily trusted. (but then again, that's not the only circumstances where that's a problem) there is a proposed solution for that too (trust root and self servers only by default) I'm not sure they're particularly easy to identify in the general case "they" ? the solutions you mean ? or the servers ? jkoenig: you can't trust the servers in general to provide an answer, timeout or not yes the root/self servers. ah jkoenig: you can stat the actual node before dereferencing the translator could they not report FD activity asynchronously to the message port? libc would cache the state I don't understand what you mean anyway, really making the timeout part of the message is not a problem 10:10 < youpi> jkoenig: you can't trust the servers in general to provide an answer, timeout or not we already trust everything (e.g. read()) into providing an answer immediately i don't see why braunr: put sleep(1) in S_io_read() it'll not give you an immediate answer, O_NDELAY being set or not well sleep is evil, but let's just say the server thread blocks ok well fix the server so we agree ?
in the current security model, we trust the server to achieve the timeout yes and jkoenig's remark is more global than just select() that's why we must make sure we're contacting trusted servers by default it affects read() too sure so there's no reason not to fix select() that's the important point but this doesn't mean we shouldn't pass the timeout to the server and expect it to handle it correctly we keep raising issues with things, and not achieving anything, in the Hurd if it doesn't, then it's a bug, like in any other kernel type I'm not the one to convince :) eh, some would say it's one of the goals :) who's to be convinced then ? jkoenig: who raised the issue ah well, see the irc log :) not that I'm objecting to any patch, mind you :-) i didn't understand it that way if you can't trust the servers to act properly, it's similar to not trusting linux fs code no, the difference is that servers can be non-root while on linux they can't again, trust root and self non-root fuse mounts are not followed by default as with fuse that's still to be written yes and as I said, you can stat the actual node and then dereference the translator afterwards but before writing anything, we'd better agree on the solution :) which, again, "just" needs to be written err... adding a timeout to mach_msg()? that's just wrong (unless I completely misunderstood what this discussion was about...)

#### IRC, freenode, #hurd, 2012-02-04

this is confirmed: the select hack patch hurts vim performance a lot I'll use program_invocation_short_name to make the patch even more ugly (of course, we really need to fix select somehow) could it (also) be that vim uses select() somehow "badly"? fsvo "badly", possibly, but still Could the select() stuff be the reason for ten times slower ethernet too, e.g. scp and apt-get?
i didn't find scp or apt-get slower myself, unlike vim see strace: scp does not use select (I haven't checked apt yet)


### IRC, freenode, #hurd, 2012-02-14

on another subject, I'm wondering how to correctly implement select/poll with a timeout on a multiserver system :/ i guess a timeout of 0 should imply a non blocking round-trip to servers only oh good, the timeout is already part of the io_select call


### IRC, freenode, #hurdfr, 2012-02-22

the big problem with our implementation is that the select timeout is a client parameter a parameter passed directly to mach_msg so if you set the timeout to 0, there's a good chance that mach_msg returns before an RPC can even complete (i.e. the client-server round trip) and so when the timeout is 0 for non-blocking use, well, you don't block, but you don't get your events either .. maybe changing the timeout from 10ms to 10 us would improve the situation. because 10ms is a bit much :) it's the historical unix system timer interval and mach isn't preemptible so that's not feasible as things stand that said, it's not completely related well, actually it is, we'd need something similar to linux's high res timers well, either high-resolution timers or an easily programmable timer currently only the 8254 is programmed, and to ensure roughly correct scheduling, it's programmed once, at 10ms, and that's it so yes, specifying 1ms or 1us won't change anything about the interval needed to determine that the timer has expired


### IRC, freenode, #hurd, 2012-02-27

braunr: extremely dirty hack I don't even want to detail :) oh does it affect vim only ? or all select users ? we've mostly seen it with vim but possibly fakeroot has some issues too it's very unlikely that only vim has the issue :) i mean, is it that dirty to switch behaviour depending on the calling program ?
not all select users ew :) just those which do select({0,0}) well sure braunr: you guessed right :) thanks anyway it's probably a good thing to do currently vim was getting me so mad i was using sshfs lately it's better than nothing yes


# See Also

See also [[select_bogus_fd]] and [[select_vs_signals]].