[[!meta copyright="Copyright © 2010, 2011, 2012, 2013, 2014 Free Software
Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

*Performance analysis* ([[!wikipedia Performance_analysis desc="Wikipedia
article"]]) deals with analyzing how computing resources are used for
completing a specified task.

[[Profiling]] is one relevant tool.

In [[microkernel]]-based systems, there is generally a considerable [[RPC]]
overhead.

In a multi-server system, it is non-trivial to implement a high-performance
[[I/O System|community/gsoc/project_ideas/disk_io_performance]].

When providing [[faq/POSIX_compatibility]] (and similar interfaces) in an
environment that doesn't natively implement these interfaces, there may be a
severe performance degradation.  The [[`fork` system call|/glibc/fork]] is one
example.

[[Unit_testing]] can be used for tracking performance regressions.

---

  * [[Degradation]]

  * [[fork]]

  * [[IPC_virtual_copy]]

  * [[microbenchmarks]]

  * [[microkernel_multi-server]]

  * [[gnumach_page_cache_policy]]

  * [[metadata_caching]]

  * [[community/gsoc/project_ideas/object_lookups]]

---


# IRC, freenode, #hurd, 2012-07-05

    <braunr> the more i study the code, the more i think a lot of time is
      wasted on cpu, unlike the common belief of the lack of performance being
      only due to I/O


## IRC, freenode, #hurd, 2012-07-23

    <braunr> there are several kinds of scalability issues
    <braunr> iirc, i found some big locks in core libraries like libpager and
      libdiskfs
    <braunr> but anyway we can live with those
    <braunr> in the case i observed, ext2fs, relying on libdiskfs and libpager,
      scans the entire file list to ask for writebacks, as it can't know if the
      pages are dirty or not
    <braunr> the mistake here is moving part of the pageout policy out of the
      kernel
    <braunr> so it would require the kernel to handle periodic synces of the
      page cache
    <antrik> braunr: as for big locks: considering that we don't have any SMP
      so far, does it really matter?...
    <braunr> antrik: yes
    <braunr> we have multithreading
    <braunr> there is no reason to block many threads while if most of them
      could continue
    <braunr> -while
    <antrik> so that's more about latency than throughput?
    <braunr> considering sleeping/waking is expensive, it's also about
      throughput
    <braunr> currently, everything that deals with sleepable locks (both
      gnumach and the hurd) just wake every thread waiting for an event when
      the event occurs (there are a few exceptions, but not many)
    <antrik> ouch
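
The "wake every waiting thread" behaviour discussed above can be illustrated
outside the Hurd with POSIX condition variables.  The following is only a
minimal sketch, not gnumach or Hurd code: the `event_queue` type and its
functions are hypothetical names made up for this example.  It contrasts
`pthread_cond_broadcast` (every waiter wakes up, most find nothing to do and
sleep again) with `pthread_cond_signal` (at least one waiter wakes up per
posted item).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event queue, for illustration only. */
struct event_queue {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int pending;   /* items that waiters may consume */
    int wakeups;   /* times any waiter returned from pthread_cond_wait */
};

void event_queue_init(struct event_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->pending = 0;
    q->wakeups = 0;
}

/* Block until an item is available, then consume it. */
void event_queue_wait(struct event_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->pending == 0) {
        pthread_cond_wait(&q->cond, &q->lock);
        q->wakeups++;   /* counts useful and wasted wakeups alike */
    }
    assert(q->pending > 0);
    q->pending--;
    pthread_mutex_unlock(&q->lock);
}

/* Post one item.  wake_all == true mimics waking every waiter. */
void event_queue_post(struct event_queue *q, bool wake_all)
{
    pthread_mutex_lock(&q->lock);
    q->pending++;
    if (wake_all)
        pthread_cond_broadcast(&q->cond);
    else
        pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}
```

With many threads blocked in `event_queue_wait` and a single
`event_queue_post (q, true)`, all of them return from `pthread_cond_wait`, but
only one finds an item; the others go back to sleep.  Those extra sleep/wake
cycles are the throughput cost mentioned in the log.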


## [[!message-id "20121202101508.GA30541@mail.sceen.net"]]


## IRC, freenode, #hurd, 2012-12-04

    <damo22> why do some people think hurd is slow? i find it works well even
      under heavy load inside a virtual machine
    <braunr> damo22: the virtual machine actually assists the hurd a lot :p
    <braunr> but even with that, the hurd is a slow system
    <damo22> i would have thought it would have the potential to be very fast,
      considering the model of the kernel
    <braunr> the design implies by definition more overhead, but the true cause
      is more than 15 years without optimization on the core components
    <braunr> how so ?
    <damo22> since there are less layers of code between the hardware bare
      metal and the application that users run
    <braunr> how so ? :)
    <braunr> it's the contrary actually
    <damo22> VFS -> IPC -> scheduler -> device drivers -> hardware
    <damo22> that is monolithic
    <braunr> well, it's not really meaningful
    <braunr> and i'd say the same applies for a microkernel system
    <damo22> if the application can talk directly to hardware through the
      kernel its almost like plugging directly into the hardware
    <braunr> you never talk directly to hardware
    <braunr> you talk to servers instead of the kernel
    <damo22> ah
    <braunr> consider monolithic kernel systems like systems with one big
      server
    <braunr> the kernel
    <braunr> whereas a multiserver system is a kernel and many servers
    <braunr> you still need the VFS to identify your service (and thus your
      server)
    <braunr> you need much more IPC, since system calls are "replaced" with RPC
    <braunr> the scheduler is basically the same
    <damo22> okay
    <braunr> device drivers are similar too, except they run in thread context
      (which is usually a bit heavier)
    <damo22> but you can do cool things like report when an interrupt line is
      blocked
    <braunr> and there are many context switches between all that
    <braunr> you can do all that in a monolithic kernel too, and faster
    <braunr> but it's far more elegant, and (when well done) easy to do on a
      microkernel based system
    <damo22> yes
    <damo22> i like elegant, makes coding easier if you know the basics
    <braunr> there are only two major differences between a monolithic kernel
      and a multiserver microkernel system
    * damo22 listens
    <braunr> 1/ independence of location (your resources could be anywhere)
    <braunr> 2/ separation of address spaces (your servers have their own
      addresses)
    <damo22> wow
    <braunr> these both imply additional layers of indirection, making the
      system as a whole slower
    <damo22> but it would be far more secure though i suspect
    <braunr> yes
    <braunr> and reliable
    <braunr> that's why systems like qnx were usually adopted for critical
      tasks
    <damo22> security and reliability are very important, i would switch to the
      hurd if it supported all the hardware i use 
    <braunr> so would i :)
    <braunr> but performance matters too
    <damo22> not to me
    <braunr> it should :p
    <braunr> it really does matter a lot in practice
    <damo22> i mean, a 2x slowdown compared to linux would not affect me
    <damo22> if it had all the benefits we mentioned above
    <braunr> but the hurd is really slow for other reasons than its additional
      layers of indirection unfortunately
    <damo22> is it because of lack of optimisation in the core code?
    <braunr> we're working on these issues, but it's not easy and takes a lot
      of time :p
    <damo22> like you said
    <braunr> yes
    <braunr> and also because of some fundamental design choices related to the
      microkernel back in the 80s
    <damo22> what about the darwin system
    <damo22> it uses a mach kernel?
    <braunr> yes
    <damo22> what is stopping someone taking the MIT code from darwin and
      creating a monster free OS
    <braunr> what for ?
    <damo22> because it already has hardware support
    <damo22> and a mach kernel
    <braunr> in kernel drivers ?
    <damo22> it has kernel extensions
    <damo22> you can do things like kextload module
    <braunr> first, being a mach kernel doesn't make it compatible or even
      easily usable with the hurd, the interfaces have evolved independently
    <braunr> and second, we really do want more stuff out of the kernel
    <braunr> drivers in particular
    <damo22> may i ask why you are very keen to have drivers out of kernel?
    <braunr> for the same reason we want other system services out of the
      kernel
    <braunr> security, reliability, etc..
    <braunr> ease of debugging
    <braunr> the ability to restart drivers separately, without restarting the
      kernel
    <damo22> i see


# IRC, freenode, #hurd, 2012-09-13

{{$news/2011-q2#phoronix-3}}.

    <braunr> the phoronix benchmarks don't actually test the operating system
      ..
    <hroi_> braunr: well, at least it tests its ability to run programs for
      those particular tasks
    <braunr> exactly, it tests how programs that don't make much use of the
      operating system run
    <braunr> well yes, we can run programs :)
    <pinotree> those are just cpu-taking tasks
    <hroi_> ok
    <pinotree> if you do a benchmark with also i/o, you can see how it is
      (quite) slower on hurd
    <hroi_> perhaps they should have run 10 of those programs in parallel, that
      would test the kernel multitasking I suppose
    <braunr> not even I/O, simply system calls
    <braunr> no, multitasking is ok on the hurd
    <braunr> and it's very similar to what is done on other systems, which
      hasn't changed much for a long time
    <braunr> (except for multiprocessor)
    <braunr> true OS benchmarks measure system calls
    <hroi_> ok, so Im sensing the view that the actual OS kernel architecture
      dont really make that much difference, good software does
    <braunr> not at all
    <braunr> i'm only saying that the phoronix benchmark results are useless
    <braunr> because they didn't measure the right thing
    <hroi_> ok
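
To make the point about measuring system calls concrete, here is a minimal
microbenchmark sketch.  It is an assumption-laden illustration, not a tool
used by the Hurd project: the function name `ns_per_getppid` is hypothetical,
and `getppid` is chosen only because it is a cheap call that has to leave the
calling program (on the Hurd this is expected to be an [[RPC]] rather than a
kernel trap).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock cost, in nanoseconds, of one getppid() call,
   measured over `iterations` back-to-back calls. */
double ns_per_getppid(long iterations)
{
    struct timespec start, end;
    volatile pid_t sink;   /* keeps the compiler from dropping the calls */

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink = getppid();
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void) sink;

    double elapsed_ns = (double) (end.tv_sec - start.tv_sec) * 1e9
                        + (double) (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double) iterations;
}
```

Calling `ns_per_getppid (1000000)` on two systems gives a rough per-call
figure to compare.  The result includes loop overhead and depends on the
virtualization setup, so it is only indicative.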


# Optimizing Data Structure Layout

## IRC, freenode, #hurd, 2014-01-02

    <braunr> teythoon_: wow, digging into the vm code :)
    <teythoon_> i discovered pahole and gnumach was a tempting target :)
    <braunr> never heard of pahole :/
    <teythoon_> it's nice
    <teythoon_> braunr: try pahole -C kmem_cache /boot/gnumach
    <teythoon_> on linux that is. ...
    <braunr> ok
    <teythoon_> braunr: http://paste.debian.net/73864/
    <braunr> very nice
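
What pahole reports are the padding holes that the compiler inserts to satisfy
member alignment.  The sketch below shows the effect on two hypothetical
structures (they are not taken from gnumach); the member names and the helper
`hole_after_flag` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in "natural" declaration order: small members sit next to
   larger, more strictly aligned ones, so the compiler adds padding. */
struct padded {
    uint8_t  flag;
    uint64_t size;    /* padding is inserted before this member */
    uint8_t  state;
    uint32_t count;   /* and again before this one */
};

/* Same members, sorted from largest to smallest alignment. */
struct reordered {
    uint64_t size;
    uint32_t count;
    uint8_t  flag;
    uint8_t  state;
};

/* Size in bytes of the hole between `flag` and `size` in struct padded. */
size_t hole_after_flag(void)
{
    return offsetof(struct padded, size)
           - (offsetof(struct padded, flag) + sizeof(uint8_t));
}

/* Bytes saved per object by reordering; the value depends on the ABI. */
size_t bytes_saved(void)
{
    assert(sizeof(struct reordered) <= sizeof(struct padded));
    return sizeof(struct padded) - sizeof(struct reordered);
}
```

On a typical x86-64 ABI, `struct padded` occupies 24 bytes and
`struct reordered` 16; other ABIs give different numbers.  For frequently
allocated kernel objects, such savings also mean fewer cache lines touched.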


## IRC, freenode, #hurd, 2014-01-03

    <braunr> teythoon: pahole is a very handy tool :)
    <teythoon> yes
    <teythoon> i especially like how general it is


# <a name="measure">Measure</a>

On some pages, we're filing information about performance measurements.


## kepler.SCHWINGE

Debian GNU/Linux, x86.  Running as a Xen domU, the system is not reserved
exclusively for measurement purposes, so it's a best-effort service.


## laplace.SCHWINGE

Debian GNU/Hurd, x86.  Running as a QEMU/KVM instance, the system is not
reserved exclusively for measurement purposes, so it's a best-effort service.


### [[!message-id "87wqghouoc.fsf@schwinge.name"]]

### IRC, freenode, #hurd, 2014-02-27

    <braunr> tschwinge: about your concern with regard to performance
      measurements, you could run kvm with hugetlbfs and cpuset
    <braunr> on a machine that provides nested page tables, this makes the
      virtualization overhead as small as it could be considering the
      implementation
    <braunr> hugetlbs reduces the overhead of page faults, and also implies
      locked memory while cpuset isolates the vm from global scheduling
    <braunr> hugetlbfs*


### 2014-07-25, tschwinge

Support for [huge pages](https://wiki.debian.org/Hugepages) as well as [CPU
sets](https://code.google.com/p/cpuset/) requires special setup, which is not
done at the moment.