open_issues/select.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220

[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation,
Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_glibc]]

There are a lot of reports about this issue, but no thorough analysis.


# Short Timeouts

## `elinks`

IRC, unknown channel, unknown date:

    <paakku> This is related to ELinks... I've looked at the select()
      implementation for the Hurd in glibc and it seems that giving it a short
      timeout could cause it not to report that file descriptors are ready.
    <paakku> It sends a request to the Mach port of each file descriptor and
      then waits for responses from the servers.
    <paakku> Even if the file descriptors have data for reading or are ready
      for writing, the server processes might not respond immediately.
    <paakku> So if I want ELinks to check which file descriptors are ready, how
      long should the timeout be in order to ensure that all servers can
      respond in time?
    <paakku> Or do I just imagine this problem?


## [[dbus]]


## IRC

### IRC, freenode, #hurd, 2012-01-31

    <braunr> don't you find vim extremely slow lately ?
    <braunr> (and not because of cpu usage but rather unnecessary sleeps)
    <jkoenig> yes.
    <braunr> wasn't there a discussion to add a minimum timeout to mach_msg for
      select() or something like that during the past months ?
    <youpi> there was, and it was added
    <youpi> that could be it
    <youpi> I don't want to drop it though, some app really need it
    <braunr> as a debian patch only iirc ?
    <youpi> yes
    <braunr> ok
    <braunr> if i'm right, the proper solution was to fix remote servers
      instead of client calls
    <youpi> (no drop, unless the actual bug gets fixed of course)
    <braunr> so i'm guessing it's just a hack in between
    <youpi> not only
    <youpi> with a timeout of zero, mach will just give *no* time for the
      servers to give an answer
    <braunr> that's because the timeout is part of the client call
    <youpi> so the protocol has to be rethought, both server/client side
    <braunr> a suggested solution was to make it a parameter
    <braunr> i mean, part of the message
    <braunr> not a mach_msg parameter
    <jkoenig> OTOH the servers should probably not be trusted to enforce the
      timeout.
    <braunr> why ?
    <jkoenig> they're not necessarily trusted. (but then again, that's not the
      only circumstances where that's a problem)
    <braunr> there is a proposed solution for that too (trust root and self
      servers only by default)
    <jkoenig> I'm not sure they're particularily easy to identify in the
      general case
    <braunr> "they" ? the solutions you mean ?
    <braunr> or the servers ?
    <youpi> jkoenig: you can't trust the servers in general to provide an
      answer, timeout or not
    <jkoenig> yes the root/self servers.
    <braunr> ah
    <youpi> jkoenig: you can stat the actual node before dereferencing the
      translator
    <jkoenig> could they not report FD activity asynchronously to the message
      port? libc would cache the state
    <youpi> I don't understand what you mean
    <youpi> anyway, really making the timeout part of the message is not a
      problem
    <braunr> 10:10 < youpi> jkoenig: you can't trust the servers in general to
      provide an answer, timeout or not
    <youpi> we already trust everything (e.g. read() ) into providing an answer
      immediately
    <braunr> i don't see why
    <youpi> braunr: put sleep(1) in S_io_read()
    <youpi> it'll not give you an immediate answer, O_NODELAY being set or not
    <braunr> well sleep is evil, but let's just say the server thread blocks
    <braunr> ok
    <braunr> well fix the server
    <youpi> so we agree
    <braunr> ?
    <youpi> in the current security model, we trust the server into achieve the
      timeout
    <braunr> yes
    <youpi> and jkoenig's remark is more global than just select()
    <braunr> taht's why we must make sure we're contacting trusted servers by
      default
    <youpi> it affects read() too
    <braunr> sure
    <youpi> so there's no reason not to fix select()
    <youpi> that's the important point
    <braunr> but this doesn't mean we shouldn't pass the timeout to the server
      and expect it to handle it correctly
    <youpi> we keep raising issues with things, and not achieve anything, in
      the Hurd
    <braunr> if it doesn't, then it's a bug, like in any other kernel type
    <youpi> I'm not the one to convince :)
    <braunr> eh, some would say it's one of the goals :)
    <braunr> who's to be convinced then ?
    <youpi> jkoenig: 
    <youpi> who raised the issue
    <braunr> ah
    <youpi> well, see the irc log :)
    <jkoenig> not that I'm objecting to any patch, mind you :-)
    <braunr> i didn't understand it that way
    <braunr> if you can't trust the servers to act properly, it's similar to
      not trusting linux fs code
    <youpi> no, the difference is that servers can be non-root
    <youpi> while on linux they can't
    <braunr> again, trust root and self
    <youpi> non-root fuse mounts are not followed by default
    <braunr> as with fuse
    <youpi> that's still to be written
    <braunr> yes
    <youpi> and as I said, you can stat the actual  node and then dereference
      the translator afterwards
    <braunr> but before writing anything, we'd better agree on the solution :)
    <youpi> which, again, "just" needs to be written
    <antrik> err... adding a timeout to mach_msg()? that's just wrong
    <antrik> (unless I completely misunderstood what this discussion was
      about...)


#### IRC, freenode, #hurd, 2012-02-04

    <youpi> this is confirmed: the select hack patch hurts vim performance a
      lot
    <youpi> I'll use program_invocation_short_name to make the patch even more
      ugly
    <youpi> (of course, we really need to fix select somehow)
    <pinotree> could it (also) be that vim uses select() somehow "badly"?
    <youpi> fsvo "badly", possibly, but still
    <gnu_srs1> Could that the select() stuff be the reason for a ten times
      slower ethernet too, e.g. scp and apt-get?
    <pinotree> i didn't find myself neither scp nor apt-get slower, unlike vim
    <youpi> see strace: scp does not use select
    <youpi> (I haven't checked  apt yet)


### IRC, freenode, #hurd, 2012-02-14

    <braunr> on another subject, I'm wondering how to correctly implement
      select/poll with a timeout on a multiserver system :/
    <braunr> i guess a timeout of 0 should imply a non blocking round-trip to
      servers only
    <braunr> oh good, the timeout is already part of the io_select call


### IRC, freenode, #hurdfr, 2012-02-22

    <braunr> le gros souci de notre implé, c'est que le timeout de select est
      un paramètre client
    <braunr> un paramètre passé directement à mach_msg
    <braunr> donc si tu mets un timeout à 0, y a de fortes chances que mach_msg
      retourne avant même qu'un RPC puisse se faire entièrement (round-trip
      client-serveur donc)
    <braunr> et donc quand le timeout est à 0 pour du non bloquant, ben tu
      bloques pas, mais t'as pas tes évènements ..
    <abique|work> peut-être que passer le timeout de 10ms à 10 us améliorerait
      la situation.
    <abique|work> car 10ms c'est un peut beaucoup :)
    <braunr> c'est l'interval timer système historique unix
    <braunr> et mach n'est pas préemptible
    <braunr> donc c'est pas envisageable en l'état
    <braunr> ceci dit c'est pas complètement lié
    <braunr> enfin si, il nous faudrait qqchose de similaire aux high res
      timers de linux
    <braunr> enfin soit des timer haute résolution, soit un timer programmable
      facilement
    <braunr> actuellement il n'y a que le 8254 qui est programmé, et pour
      assurer un scheduling à peu près correct, il est programmé une fois, à
      10ms, et basta
    <braunr> donc oui, préciser 1ms ou 1us, ça changera rien à l'interval
      nécessaire pour déterminer que le timer a expiré


### IRC, freenode, #hurd, 2012-02-27

    <youpi> braunr: extremely dirty hack
    <youpi> I don't even want to detail :)
    <braunr> oh
    <braunr> does it affect vim only ?
    <braunr> or all select users ?
    <youpi> we've mostly seen it with vim
    <youpi> but possibly fakeroot has some issues too
    <youpi> it's very little probable that only vim has the issue :)
    <braunr> i mean, is it that dirty to switch behaviour depending on the
      calling program ?
    <youpi> not all select users
    <braunr> ew :)
    <youpi> just those which do select({0,0})
    <braunr> well sure
    <youpi> braunr: you guessed right :)
    <braunr> thanks anyway
    <braunr> it's probably a good thing to do currently
    <braunr> vim was getting me so mad i was using sshfs lately
    <youpi> it's better than nothing yes


# See Also

See also [[select_bogus_fd]] and [[select_vs_signals]].