summaryrefslogtreecommitdiff
path: root/open_issues/rework_gnumach_ipc_spaces.mdwn
blob: b7cda2278e9a45d24ca8be72df220af0ee48e375 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

IRC, freenode, #hurd, 2011-05-07

    <braunr> things that are referred to as "system calls" in glibc are
      actually RPCs to the kernel or other tasks, those RPCs have too lookup
      port rights
    <braunr> the main services have tens of thousands of ports, looking up one
      is slow

There is a [[!FF_project 268]][[!tag bounty]] on this task.

IRC, freenode, #hurd, 2011-04-23

    <braunr> youpi: is there any use of the port renaming facility ?
    <youpi> I don't know
    <braunr> at least, did you see such use ?
    <braunr> i wonder why mach mach_port_insert_right() lets the caller specify
      the port name
    <youpi> ../hurd-debian/hurd/serverboot/default_pager.c:	kr =
      mach_port_rename(	default_pager_self,
    <braunr> mach_port_rename() is used only once, in the default pager
    <braunr> so it's not that important
    <braunr> but mach_port_insert_right() lets userspace task decide the port
      name value
    <youpi> just to repeat myself again, I don't know port stuff very much :)
    <braunr> well you know that a port denotes a right, which denotes a port
    <youpi> yes, but I don't have any real experience with it
    <braunr> err
    <braunr> port name
    <braunr> the only reason I see is that the caller, say /hurd/exec running a
      fork()
    <braunr> hm
    <braunr> no, i don't even see the reason here
    <braunr> port names should be allocated by the kernel only, like file
      descriptors
    <youpi> you can choose file descriptor values too
    <braunr> really ?
    <youpi> with dup2, yes
    <braunr> oh
    <braunr> hm
    <braunr> what's the data structure in current unices to store file
      descriptors ?
    <braunr> a hash table ?
    <youpi> I don't know
    <braunr> i'll have to look at that
    <braunr> FYI, i'm asking these questions because i'm thinking of reworking
      ipc spaces
    <braunr> i believe the use of splay trees completely destroys performance
      of tasks with many many port names such as the root file system
    <youpi> that can be a problem yes
    <youpi> since there are 3 ports per opened file, and like 3 per thread too
    <braunr> + the page cache
    <youpi> with a few thousand opened files and threads, that makes a lot
    <youpi> by "opened file" I meant page cache actually
    <braunr> i saw numbers up to 30k
    <braunr> ok
    <youpi> on buildds I easily see 100k ports
    <braunr> for a single task ?
    <braunr> wow
    <youpi> yes
    <youpi> the page cache is 4k files
    <braunr> so that's definitely worth the try
    <youpi> so that already makes 12k ports
    <youpi> and 4k is not so big
    <braunr> it's limited to 4K ?
    <youpi> I haven't been able to check where the 100k come from yet
    <youpi> braunr: yas
    <braunr> could be leaks :/
    <youpi> yes
    <braunr> omg, a hard limit on the page cache ..
    <youpi> vm/vm_object.c:int		vm_object_cached_max = 4000;	/* may
      be patched*/
    <braunr> mach is really old :(
    <youpi> I've raised it
    <youpi> before it was 200
    <youpi> ...
    <braunr> oO
    <youpi> I tried to dro pthe limit, but then I was lacking memory
    <youpi> which I believe have fixed the other day, but I have to test again
    <braunr> that implementation doesn't know how to deal with memory pressure
    <youpi> yes
    <braunr> i saw your recent changes about adding warnings in such cases
    <braunr> so, back to ipc spaces
    <braunr> i think splay trees 1/ can get very unbalanced easily
    <braunr> which isn't hard to imagine
    <braunr> and 2/ make poor usage of the cpu caches because they're BST and
      write a lot to memory
    <youpi> maybe you could write a patch which would dump statistics on that?
    <braunr> that's part of the job i'm assigning to myself
    <youpi> ok
    <braunr> i'd like to try replacing splay trees with radix trees
    <youpi> I can run it on the buildds
    <youpi> buildds are very good stress-tests :)
    <braunr> :)
    <youpi> 22h building -> 77k ports
    <youpi> 26h building -> 97k ports
    <youpi> the problem is that when I add leak debugging (backtraces), I'm
      getting out of memory :)
    <braunr> that will be a small summer of code outside the gsoc :p
    <braunr> :/
    <braunr> backtraces are very consuming
    <youpi> but that's only because of hardcoded limits
    <youpi> I'll have to test again with bigger limits
    <braunr> again ..
    <braunr> evil hard limits
    <youpi> well, actually we could as well just drop them
    <youpi> but we'd also need to easily get statistics on zone/vm_maps usage
    <youpi> because else we don't see leaks
    <youpi> (except that the machine eventually crashes)
    <braunr> hm
    <braunr> i haven't explained why i was asking my questions actually
    <braunr> so, i want radix trees, because they're nice
    <braunr> they reduce the paths lengths
    <braunr> they don't get too unbalanced (they're invariant wrt the order of
      operations)
    <braunr> they don't need to write to memory on lookups
    <braunr> the only drawback is that they can create much overhead if their
      usage pattern isn't appropriate
    <braunr> elements in such a structure should be close, so that they share
      common nodes
    <youpi> the common usage pattern in ext2fs is a big bunch of ever-open
      ports :)
    <braunr> if there is one entry per node, it's a big waste
    <braunr> yes
    <youpi> there are 3, actually
    <braunr> but the port names have low values
    <braunr> they're allocated sequentially, beginning at 0
    <braunr> (or 1 actually)
    <braunr> which is perfect for radix trees
    <youpi> yes
    <youpi>  97989: send
    <braunr> but if anyone can rename
    <braunr> this introduces a new potential weakness
    <youpi> ah, if it's just a weakness it's probably not a problem
    <youpi> I thought it was even a no-go
    <braunr> i think so
    <youpi> I guess port rename is very seldom
    <braunr> but in a future version, it would be nice not to allow port
      renaming
    <braunr> unless there are similar issues in current unix kernels
    <braunr> in which case i'd say it's acceptable
    <youpi> there are
    <braunr> of that order ?
    <youpi> and it'd be useful for e.g. processing
      tracing/debugging/tweaking/whatever
    <youpi> it's also used to hide fds from a process
    <braunr> port renaming you mean ?
    <youpi> you allocate them very high
    <youpi> yes
    <braunr> ok
    <youpi> choosing your port name, generally
    <youpi> to match what the process expects for instance
    <braunr> then it would be a matter of resource limiting (which we totally
      lack afaik)
    <braunr> along the number of maximum open files, you would have a number of
      maximum rights
    <braunr> does that seem fine to you ?
    <youpi> if done throught rlimits, sure
    <braunr> something similar yes
    <youpi> (_no_ PORTS_MAX ;) )
    <braunr> oh and, in addition, i remember gnumach has a special
      configuration of the processor in which caching is limited
    <braunr> like write-through only
    <youpi> didn't I fix that recently ?
    <braunr> i don't know :)
    <braunr> CR0=e001003b
    <braunr> i don't think it's fixed
    <youpi> I mean, in the git
    <braunr> ah
    <youpi> not in the debian package
    <braunr> didn't tried the git version yet
    <braunr> last time i tried (which was a long time ago), it made the kernel
      crash
    <braunr> have you figured why ?
    <youpi> I'm not aware of that
    <braunr> anyway, splay trees write a lot, and most trees write a lot even
      at insertion/removal to rebalance
    <youpi> braunr: Mmm, there's no clearance of CD in the kernel actually
    <braunr> with radix trees, even if caching can't be fully enabled, it would
      make much better use of it
    <braunr> so if port renaming isn't a true issue, i'll choose that data
      structure
    <youpi> that'd probably be better yes
    <youpi> I'm surprised by the CD, I do remember fixing something like this
      lately
    <braunr> there are several levels where CD can be set
    <braunr> the processors ORs all those if i'm right
    <braunr> to determine if caching is enabled 
    <youpi> I know
    <braunr> ok
    <youpi> but in my memory that was at the CR* level, precisely
    <braunr> maybe for xen only ?
    <youpi> no
    <braunr> well good luck if you hunt that one, i'm off, see you :)
    <youpi> braunr: ah, no, it was the PGE flag that I had fixed

    <antrik> braunr: explicit port naming is used for example to pass some
      initial ports to a new task at well-known places IIRC
    <antrik> braunr: but these tend to be low numbers, so I don't see a problem
      there
    <antrik> (I'm not familiar with radix trees... why would high numbers be a
      problem?)

    <youpi> braunr: iirc the ipc space is limited to ~192k ports

    <braunr> antrik: in most cases i've seen, the insert_right() call is used
      on task_self()
    <braunr> and if there really are special ports (like the bootstrap or
      device ports), they should have special names
    <braunr> IIRC, these ports are given through command line expansion by the
      kernel at boot time
    <braunr> but it seems reasonable to think of port renaming as a potentially
      useful feature
    <braunr> antrik: the problem with radix trees isn't them being high, it's
      them being sparse
    <braunr> you get the most efficient trees when entries have keys that are
      close to each other
    <braunr> because radix trees are a type of tries (the path in the tree is
      based on the elements composing the key)
    <braunr> so the more common prefixes you have, the less external nodes you
      need
    <braunr> here, keys are port names, but they can be memory addresses or
      offsets in memory objects (like in the page cache)
    <braunr> the radix algorithm takes a few bits, say 4 or 6, at a time from a
      key, and uses that as an index in a node
    <braunr> if keys are sparse, there can be as little as one entry per node
    <braunr> IIRC, the worst case (on entry per node with the maximum possible
      number of nodes for a 32-bits key) is 2% entries
    <braunr> the reste being null entries and almost-empty nodes containing
      them
    <braunr> so if you leave the ability to give port rights the names you
      want, you can create such worst case trees
    <braunr> which may consume several MiB of memory per tree
    <braunr> tens of MiB i'd say
    <braunr> on the other hand, in the current state, almost all hurd
      applications use sequentially allocated port names, close to 0 (which
      allows a nice optimization)
    <braunr> so a radix ree would be the most efficient
    <antrik> well, if some processes really feel they must use random numbers
      for port names, they *ought* to be penalized ;-)

2011-04-27

    <braunr> antrik: remember when you asked why high numbers would be a
      problem with radix trees ?
    <braunr> here is a radix tree with one entry, which key is around 5000
    <braunr> [  656.296412] tree height: 3
    <braunr> [  656.296412] index:  0, level:  0, height:  3, count:  1,
      bitmap: 0000000000000002
    <braunr> [  656.296412] index:  1, level:  1, height:  2, count:  1,
      bitmap: 0000000000004000
    <braunr> [  656.296412] index: 14, level:  2, height:  1, count:  1,
      bitmap: 0000000000000080
    <braunr> three levels, each with an external node (dynamically allocated),
      for one entry
    <braunr> so in the worst case of entries with keys close to the highest
      values, the could be many external nodes with higher paths lengths than
      when keys are close to 0
    <braunr> which also brings the problem of port name allocation
    <braunr> can someone with access to a buildd which has an uptime of at
      least a few days (and did at least one build) show me the output of
      portinfo 3 | tail ?
    <braunr> port names are allocated linearly IIRC, like PIDs, and some parts
      of the kernel may rely on them not being reused often
    <braunr> but for maximum effifiency, they should be
    <braunr> efficiency*
    <braunr> 00:00 < braunr> can someone with access to a buildd which has an
      uptime of at least a few days (and did at least one build) show me the
      output of portinfo 3 | tail ?
    <braunr> :)
    <youpi> it's almost like wc -l
    <youpi>   4905: receive
    <youpi> vs 4647
    <youpi> for /
    <youpi>  52902: receive
    <youpi> vs 52207
    <youpi> for the chroot
    <braunr> even after several builds ?
    <braunr> and several days ?
    <youpi> that's after 2 days
    <youpi> it's not so many builds
    <youpi> rossini is not so old
    <youpi> (7h)
    <youpi> but many builds
    <youpi> 70927: send
    <youpi> vs 70938
    <braunr> ok
    <braunr> so it seems port names are reused
    <braunr> good
    <youpi> yes they are clearly
    <braunr> i think i remember a comment about why the same port name
      shouldn't be reused too soon
    <youpi> well, it could help catching programming errors
    <braunr> that it helped catch bugs in applications that could
      deallocate/reallote quickly
    <braunr> reallocate*
    <braunr> without carefuly synchronization
    <braunr> careful
    <braunr> damn, i'm tired :/
    <youpi> but that's about debugging
    <youpi> so we don't care about performance there
    <braunr> yes
    <braunr> i'll try to improve allocation performance too
    <braunr> using e.g. bitmaps in each external node back to the root so that
      unused slots are quickly found
    <braunr> i thknk that's what idr does in linux
    <antrik> braunr: idr?
    <braunr> antrik: a data structure used to map integers to pointers
    <braunr> http://fxr.watson.org/fxr/source/lib/idr.c?v=linux-2.6