[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation,
Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]
[[!tag open_issue_glibc]]
There are a lot of reports about this issue, but no thorough analysis.
# Short Timeouts
## `elinks`
IRC, unknown channel, unknown date:
<paakku> This is related to ELinks... I've looked at the select()
implementation for the Hurd in glibc and it seems that giving it a short
timeout could cause it not to report that file descriptors are ready.
<paakku> It sends a request to the Mach port of each file descriptor and
then waits for responses from the servers.
<paakku> Even if the file descriptors have data for reading or are ready
for writing, the server processes might not respond immediately.
<paakku> So if I want ELinks to check which file descriptors are ready, how
long should the timeout be in order to ensure that all servers can
respond in time?
<paakku> Or do I just imagine this problem?
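The situation paakku describes can be tried out with a zero-timeout poll. The following is a minimal sketch in Python (used here only for brevity; `poll_readable` is a hypothetical helper name, not something ELinks or glibc provides). It writes to a pipe and then asks `select()` which descriptors are readable without waiting. Where the kernel tracks readiness itself, the descriptor is normally reported at once; according to the log above, on the Hurd the server's reply may not arrive within such a short timeout.

```python
import os
import select


def poll_readable(fds, timeout=0.0):
    """Return the fds that select() reports readable within `timeout` seconds."""
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"x")  # data is now pending on the read end
    # Zero timeout: ask for the current state and return immediately.
    print(poll_readable([r]))
    os.close(r)
    os.close(w)
```

If the printed list is empty even though data is pending, the poll returned before the readiness report arrived, which is the behaviour in question.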
## [[dbus]]
## IRC
### IRC, freenode, #hurd, 2012-01-31
<braunr> don't you find vim extremely slow lately ?
<braunr> (and not because of cpu usage but rather unnecessary sleeps)
<jkoenig> yes.
<braunr> wasn't there a discussion to add a minimum timeout to mach_msg for
select() or something like that during the past months ?
<youpi> there was, and it was added
<youpi> that could be it
<youpi> I don't want to drop it though, some app really need it
<braunr> as a debian patch only iirc ?
<youpi> yes
<braunr> ok
<braunr> if i'm right, the proper solution was to fix remote servers
instead of client calls
<youpi> (no drop, unless the actual bug gets fixed of course)
<braunr> so i'm guessing it's just a hack in between
<youpi> not only
<youpi> with a timeout of zero, mach will just give *no* time for the
servers to give an answer
<braunr> that's because the timeout is part of the client call
<youpi> so the protocol has to be rethought, both server/client side
<braunr> a suggested solution was to make it a parameter
<braunr> i mean, part of the message
<braunr> not a mach_msg parameter
<jkoenig> OTOH the servers should probably not be trusted to enforce the
timeout.
<braunr> why ?
<jkoenig> they're not necessarily trusted. (but then again, that's not the
only circumstances where that's a problem)
<braunr> there is a proposed solution for that too (trust root and self
servers only by default)
<jkoenig> I'm not sure they're particularly easy to identify in the
general case
<braunr> "they" ? the solutions you mean ?
<braunr> or the servers ?
<youpi> jkoenig: you can't trust the servers in general to provide an
answer, timeout or not
<jkoenig> yes the root/self servers.
<braunr> ah
<youpi> jkoenig: you can stat the actual node before dereferencing the
translator
<jkoenig> could they not report FD activity asynchronously to the message
port? libc would cache the state
<youpi> I don't understand what you mean
<youpi> anyway, really making the timeout part of the message is not a
problem
<braunr> 10:10 < youpi> jkoenig: you can't trust the servers in general to
provide an answer, timeout or not
<youpi> we already trust everything (e.g. read() ) into providing an answer
immediately
<braunr> i don't see why
<youpi> braunr: put sleep(1) in S_io_read()
<youpi> it'll not give you an immediate answer, O_NODELAY being set or not
<braunr> well sleep is evil, but let's just say the server thread blocks
<braunr> ok
<braunr> well fix the server
<youpi> so we agree
<braunr> ?
<youpi> in the current security model, we trust the server to achieve the
timeout
<braunr> yes
<youpi> and jkoenig's remark is more global than just select()
<braunr> that's why we must make sure we're contacting trusted servers by
default
<youpi> it affects read() too
<braunr> sure
<youpi> so there's no reason not to fix select()
<youpi> that's the important point
<braunr> but this doesn't mean we shouldn't pass the timeout to the server
and expect it to handle it correctly
<youpi> we keep raising issues with things, and not achieve anything, in
the Hurd
<braunr> if it doesn't, then it's a bug, like in any other kernel type
<youpi> I'm not the one to convince :)
<braunr> eh, some would say it's one of the goals :)
<braunr> who's to be convinced then ?
<youpi> jkoenig:
<youpi> who raised the issue
<braunr> ah
<youpi> well, see the irc log :)
<jkoenig> not that I'm objecting to any patch, mind you :-)
<braunr> i didn't understand it that way
<braunr> if you can't trust the servers to act properly, it's similar to
not trusting linux fs code
<youpi> no, the difference is that servers can be non-root
<youpi> while on linux they can't
<braunr> again, trust root and self
<youpi> non-root fuse mounts are not followed by default
<braunr> as with fuse
<youpi> that's still to be written
<braunr> yes
<youpi> and as I said, you can stat the actual node and then dereference
the translator afterwards
<braunr> but before writing anything, we'd better agree on the solution :)
<youpi> which, again, "just" needs to be written
<antrik> err... adding a timeout to mach_msg()? that's just wrong
<antrik> (unless I completely misunderstood what this discussion was
about...)
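The two designs discussed in this log can be contrasted with a toy model. This is only a sketch under stated assumptions: `toy_server`, `select_client_timeout` and `select_timeout_in_message` are hypothetical names, a thread and a queue stand in for a server and its reply port, and no real Mach or Hurd interface is used. In the first variant the timeout bounds the client's wait for the reply, as when it is passed to `mach_msg`. In the second the timeout is carried in the request and the client waits for the server's answer.

```python
import queue
import threading
import time


def toy_server(fd_ready, replies, latency, timeout=0.0):
    """Hypothetical server: replies after `latency` seconds of delivery delay."""
    time.sleep(latency)
    if not fd_ready:
        # A server that receives the timeout may wait that long for the
        # descriptor to become ready. In this toy model the state never changes.
        time.sleep(timeout)
    replies.put(fd_ready)


def select_client_timeout(fd_ready, latency, timeout):
    """The timeout bounds the client's wait for the reply (mach_msg style)."""
    replies = queue.Queue()
    threading.Thread(target=toy_server, args=(fd_ready, replies, latency),
                     daemon=True).start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return False  # no reply in time: the descriptor looks "not ready"


def select_timeout_in_message(fd_ready, latency, timeout):
    """The timeout travels in the request; the client waits for the answer."""
    replies = queue.Queue()
    threading.Thread(target=toy_server,
                     args=(fd_ready, replies, latency, timeout),
                     daemon=True).start()
    return replies.get()  # trusts the server to honour the timeout


if __name__ == "__main__":
    print(select_client_timeout(True, 0.05, 0.0))      # reply arrives too late
    print(select_timeout_in_message(True, 0.05, 0.0))  # one round-trip, then the answer
```

The second variant blocks for as long as the server takes to reply, which is the trust concern jkoenig raises: it behaves well only if the server is trusted to answer promptly.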
#### IRC, freenode, #hurd, 2012-02-04
<youpi> this is confirmed: the select hack patch hurts vim performance a
lot
<youpi> I'll use program_invocation_short_name to make the patch even more
ugly
<youpi> (of course, we really need to fix select somehow)
<pinotree> could it (also) be that vim uses select() somehow "badly"?
<youpi> fsvo "badly", possibly, but still
<gnu_srs1> Could the select() stuff be the reason for a ten times
slower ethernet too, e.g. scp and apt-get?
<pinotree> i didn't find myself neither scp nor apt-get slower, unlike vim
<youpi> see strace: scp does not use select
<youpi> (I haven't checked apt yet)
### IRC, freenode, #hurd, 2012-02-14
<braunr> on another subject, I'm wondering how to correctly implement
select/poll with a timeout on a multiserver system :/
<braunr> i guess a timeout of 0 should imply a non blocking round-trip to
servers only
<braunr> oh good, the timeout is already part of the io_select call
### IRC, freenode, #hurdfr, 2012-02-22
<braunr> the big problem with our implementation is that the select timeout
is a client parameter
<braunr> a parameter passed directly to mach_msg
<braunr> so if you set a timeout of 0, there's a good chance that mach_msg
returns before an RPC can even complete (i.e. a client-server
round-trip)
<braunr> and so when the timeout is 0 for non-blocking use, well, you don't
block, but you don't get your events either ..
<abique|work> maybe changing the timeout from 10 ms to 10 us would improve
the situation.
<abique|work> because 10 ms is a bit much :)
<braunr> that's the historical unix system interval timer
<braunr> and mach isn't preemptible
<braunr> so that's not feasible as things stand
<braunr> that said, it's not completely related
<braunr> well, actually it is, we'd need something similar to linux's high
res timers
<braunr> that is, either high resolution timers, or an easily programmable
timer
<braunr> currently only the 8254 is programmed, and to ensure roughly
correct scheduling, it's programmed once, at 10ms, and that's it
<braunr> so yes, specifying 1ms or 1us won't change anything about the
interval needed to determine that the timer has expired
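braunr's last point about the 10 ms tick can be put as a small arithmetic model. This is a sketch under one simplifying assumption that the log does not spell out in detail: timer expiry is only noticed on tick boundaries. `expiry_detected_at` and `TICK_US` are hypothetical names, and times are in microseconds.

```python
TICK_US = 10_000  # 10 ms tick, as mentioned in the log


def expiry_detected_at(start_us, timeout_us, tick_us=TICK_US):
    """Time of the first tick at or after the requested deadline."""
    deadline = start_us + timeout_us
    return -(-deadline // tick_us) * tick_us  # round up to a tick boundary


if __name__ == "__main__":
    # A wait that starts 3 ms after a tick, with various requested timeouts.
    for timeout_us in (1, 10, 1_000, 10_000):
        print(timeout_us, expiry_detected_at(3_000, timeout_us))
```

In this model a 1 us and a 1 ms timeout starting at the same moment are both detected at the same tick, so asking for a finer timeout does not shorten the wait.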
### IRC, freenode, #hurd, 2012-02-27
<youpi> braunr: extremely dirty hack
<youpi> I don't even want to detail :)
<braunr> oh
<braunr> does it affect vim only ?
<braunr> or all select users ?
<youpi> we've mostly seen it with vim
<youpi> but possibly fakeroot has some issues too
<youpi> it's very little probable that only vim has the issue :)
<braunr> i mean, is it that dirty to switch behaviour depending on the
calling program ?
<youpi> not all select users
<braunr> ew :)
<youpi> just those which do select({0,0})
<braunr> well sure
<youpi> braunr: you guessed right :)
<braunr> thanks anyway
<braunr> it's probably a good thing to do currently
<braunr> vim was getting me so mad i was using sshfs lately
<youpi> it's better than nothing yes
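The kind of hack discussed in these logs can be pictured roughly as follows. This is a hedged sketch, not the actual Debian patch, whose details the logs do not give. It assumes that a zero timeout is raised to some minimum so that servers get time to answer, and that programs known to suffer from this (vim in the logs) are exempted by name. `MIN_TIMEOUT`, `EXEMPT_PROGRAMS` and `effective_timeout` are hypothetical, and `os.path.basename(sys.argv[0])` stands in for glibc's `program_invocation_short_name`.

```python
import os
import sys

MIN_TIMEOUT = 0.001        # hypothetical minimum, in seconds
EXEMPT_PROGRAMS = {"vim"}  # hypothetical exemption list


def effective_timeout(requested, program=None):
    """Timeout actually used for a select() call requesting `requested` seconds."""
    if program is None:
        program = os.path.basename(sys.argv[0])
    # Only callers that poll with a zero timeout are affected.
    if requested == 0 and program not in EXEMPT_PROGRAMS:
        return MIN_TIMEOUT
    return requested


if __name__ == "__main__":
    print(effective_timeout(0, "elinks"))
    print(effective_timeout(0, "vim"))
```

Keying behaviour on the program name is what makes this a hack: it works around the symptom for known callers and leaves the underlying protocol question open.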
# See Also
See also [[select_bogus_fd]] and [[select_vs_signals]].