author: Thomas Schwinge, 2011-10-03 20:49:54 +0200
committer: Thomas Schwinge, 2011-10-03 20:49:54 +0200
commit: 219988e74ba30498a1c5d71cf557913a70ccca91
parent: 278f76de415c83bd06146b2f25a002cf0411d025
16 files changed, 1298 insertions, 33 deletions
diff --git a/faq/which_microkernel/discussion.mdwn b/faq/which_microkernel/discussion.mdwn
index 9ef3b915..7ea131e9 100644
--- a/faq/which_microkernel/discussion.mdwn
+++ b/faq/which_microkernel/discussion.mdwn
@@ -1,3 +1,20 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+[[!tag open_issue_documentation]]
+# Olaf, 2011-04-10
This version mixes up three distinct phases: rewrite from scratch; redesign;
own microkernel.
@@ -31,3 +48,47 @@ to the Coyotos port -- which after all is what the title promises...
All in all, I still think my text was better. If you have any concerns with it,
please discuss them...
+# IRC, freenode, #hurd, 2011-09-27
+ <cjuner> Does anyone remember/know if/why not seL4 was considered for
+ hurd-l4? Is anyone aware of any differences between seL4 and coyotos?
+## 2011-09-28
+ <antrik> cjuner: the seL4 project was only at the beginning when the
+ decision was made. so was Coyotos, but Shapiro promised back then that
+ building on EROS, it would be done very fast (a promise he couldn't keep
+ BTW); plus he convinced the people in question that it's safer to build
+ on his ideas...
+ <antrik> it doesn't really matter though, as by the time the ngHurd people
+ were through with Coyotos, they had already concluded that it doesn't
+ make sense to build upon *any* third-party microkernel
+ <cjuner> antrik, what was the problem with coyotos? what would be the
+ problem with sel4 today?
+ <cjuner> antrik, yes I did read the FAQ. It doesn't mention seL4 at all
+ (there isn't even much on the hurd-l4 mailing lists, I think that being
+ due to seL4 not having been released at that point?) and it does not
+ specify what problems they had with coyotos.
+ <antrik> cjuner: it doesn't? I thought it mentioned "newer L4 variants" or
+ something like that... but the text was rewritten a couple of times, so I
+ guess it got lost somewhere
+ <antrik> cjuner: unlike original L4, it's probably possible to implement a
+ system like the Hurd on top of seL4, just like on top of
+ Coyotos. however, foreign microkernels are always created with foreign
+ design ideas in mind; and building our own design around them is always
+ problematic. it's problematic with Mach, and it will be problematic with
+ any other third-party microkernel
+ <antrik> Coyotos specifically has different ideas about memory protection,
+ different ideas about task startup, different ideas about memory
+ handling, and different ideas about resource allocation
+ <cjuner> antrik, do any specific problems of the foreign designs,
+ specifically of seL4 or coyotos come to mind?
+ <antrik> cjuner: I mentioned several for Coyotos. I don't have enough
+ understanding of the matters to go into much more detail
+ <antrik> (and I suspect you don't have enough understanding of these
+ matters to take away anything useful from more detail ;-) )
+ <antrik> I could try to explain the issues I mentioned for Coyotos (as far
+ as I understand them), but would that really help you?
diff --git a/hurd/translator/tmpfs/tmpfs_vs_defpager.mdwn b/hurd/translator/tmpfs/tmpfs_vs_defpager.mdwn
index f0eb473c..ecebe662 100644
--- a/hurd/translator/tmpfs/tmpfs_vs_defpager.mdwn
+++ b/hurd/translator/tmpfs/tmpfs_vs_defpager.mdwn
@@ -8,9 +8,10 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled [[GNU Free Documentation
-[[!tag open_issue_hurd]]
+[[!tag open_issue_gnumach open_issue_hurd]]
-\#hurd, freenode, 2010
+# IRC, freenode, #hurd, 2010
<slpz> humm... why does tmpfs try to use the default pager? that's a bad
idea, and probably will never work correctly...
@@ -120,3 +121,113 @@ License|/fdl]]."]]"""]]
memory, gives them a reference to the default pager by calling
<slpz> this is not really important, but worth noting ;-)
+# IRC, freenode, #hurd, 2011-09-28
+ <slpz> mcsim: "Fix tmpfs" task should be called "Fix default pager" :-)
+ <slpz> mcsim: I've been thinking about modifying tmpfs to actually have
+ its own storeio based backend, even if a tmpfs with storage sounds a bit
+ stupid.
+ <slpz> mcsim: but I don't like the idea of having translators messing up
+ with the default pager...
+ <antrik> slpz: messing up?...
+ <slpz> antrik: in the sense of creating a number of arbitrarily sized
+ objects
+ <antrik> slpz: well, it doesn't really matter much whether a process
+ indirectly eats up arbitrary amounts of swap through tmpfs, or directly
+ through vm_allocate()...
+ <antrik> though admittedly it's harder to implement resource limits with
+ tmpfs
+ <slpz> antrik: but I've talked about having its own storeio device as
+ backend. This way Mach can pageout memory to tmpfs if it's needed.
+ <mcsim> Do I understand correctly that the goal of tmpfs task is to create
+ tmpfs in RAM?
+ <slpz> mcsim: It is. But it also needs some kind of backend, just in case
+ it's ordered to page out data to free some system's memory.
+ <slpz> mcsim: Nowadays, this backend is another translator that acts as
+ default pager for the whole system
+ <antrik> slpz: pageout memory to tmpfs? not sure what you mean
+ <slpz> antrik: I mean tmpfs acting as its own pager
+ <antrik> slpz: you mean tmpfs not using the swap partition, but some other
+ backing store?
+ <slpz> antrik: Yes.
+See also: [[open_issues/resource_management_problems/pagers]].
+ <antrik> slpz: I don't think an extra backing store for tmpfs is a good
+ idea. the whole point of tmpfs is not having a backing store... TBH, I'd
+ even like to see a single backing store for anonymous memory and named
+ files
+ <slpz> antrik: But you need a backing store, even if it's the default pager
+ :-)
+ <slpz> antrik: The question is, Should users share the same backing store
+ (swap space) or provide their own?
+ <antrik> slpz: not sure what you mean by "users" in this context :-)
+ <slpz> antrik: Real users with the ability of setting tmpfs translators
+ <antrik> essentially, I'd like to have a single partition that contains
+ both swap space and the main filesystem (at least /tmp, but probably also
+ all of /run, and possibly even /home...)
+ <antrik> but that's a bit off-topic :-)
+ <antrik> well, ideally all storage should be accounted to a user,
+ regardless whether it's swapped out anonymous storage, temporary named
+ files, or permanent files
+ <slpz> antrik: you could use a file as backend for tmpfs
+ <antrik> slpz: what's the point of using tmpfs then? :-)
+ <pinotree> (and then store the file in another tmpfs)
+ <slpz> antrik: mach-defpager could be modified to use storeio instead of
+ Mach's device_* operations, but by the way things work right now, that
+ could be dangerous, IMHO
+ <antrik> pinotree: hehe
+ <pinotree> .. recursive tmpfs'es ;)
+ <antrik> slpz: hm, sounds interesting
+ <slpz> antrik: tmpfs would try to keep data in memory whenever it's possible
+ (not calling m_o_lock_request would do the trick), but if memory is
+ scarce and Mach starts paging out, it would write it to that
+ file/device/whatever
+ <antrik> ideally, all storage used by system tasks for swapped out
+ anonymous memory as well as temporary named files would end up on the
+ /run partition; while all storage used by users would end up in /home/*
+ <antrik> if users share a partition, some explicit storage accounting would
+ be useful too...
+ <antrik> slpz: is that any different from what "normal" filesystems do?...
+ <antrik> (and *should* it be different?...)
+ <slpz> antrik: Yes, as most FS try to synchronize to disk at a reasonable
+ rate, to prevent data losses.
+ <slpz> antrik: tmpfs would be a FS that wouldn't synchronize until it's
+ forced to do that (which, by the way, it's what's currently happening
+ with everyone that uses the default pager).
+ <antrik> slpz: hm, good point...
+ <slpz> antrik: Also, metadata is never written to disk, only kept in memory
+ (which saves a lot of I/O, too).
+ <slpz> antrik: In fact, we would be doing the same as every other kernel
+ does, but doing it explicitly :-)
+ <antrik> I see the use in separating precious data (in permanent named
+ files) from temporary state (anonymous memory and temporary named files)
+ -- but I'm not sure whether having a completely separate FS for the
+ temporary data is the right approach for that...
+ <slpz> antrik: And giving the user the option to specify its own storage,
+ so we don't limit him to the size established for swap by the super-user.
+ <antrik> either way, that would be a rather radical change... still would
+ be good to fix tmpfs as it is first if possible
+ <antrik> as for limited swap, that's precisely why I'd prefer not to have
+ an extra swap partition at all...
+ <slpz> antrik: It's not much of a change, it's how it works right now, with
+ the exception of replacing the default pager with its own.
+ <slpz> antrik: I think it's just a matter of 10-20 hours, as
+ much. Including testing.
+ <slpz> antrik: It could be forked with another name, though :-)
+ <antrik> slpz: I don't mean radical change in the implementation... but a
+ radical change in the way it would be used
+ <slpz> antrik: I suggest "almosttmpfs" as the name for the forked one :-P
+ <antrik> hehe
+ <antrik> how about lazyfs?
+ <slpz> antrik: That sounds good to me, but probably we should use a more
+ descriptive name :-)
+## 2011-09-29
+ <tschwinge> slpz, antrik: There is a defpager in the Hurd code. It is not
+ currently being used, and likely incomplete. It is backed by libstore.
+ I have never looked at it.
diff --git a/open_issues/code_analysis.mdwn b/open_issues/code_analysis.mdwn
index 552cd2c9..7495221b 100644
--- a/open_issues/code_analysis.mdwn
+++ b/open_issues/code_analysis.mdwn
@@ -19,7 +19,12 @@ analysis|performance]], [[formal_verification]], as well as general
-# Suggestions
+# Bounty
+There is a [[!FF_project 276]][[!tag bounty]] on some of these tasks.
+# Static
* [[GCC]]'s warnings. Yes, really.
@@ -52,8 +57,6 @@ analysis|performance]], [[formal_verification]], as well as general
* <>
- * [[community/gsoc/project_ideas/Valgrind]]
* [Smatch](
* [Parfait](
@@ -66,7 +69,12 @@ analysis|performance]], [[formal_verification]], as well as general
* [sixgill](
- * [Coverity]( -- commercial?
+ * [Coverity]( (nonfree?)
+# Dynamic
+ * [[community/gsoc/project_ideas/Valgrind]]
* <>
@@ -76,7 +84,15 @@ analysis|performance]], [[formal_verification]], as well as general
* <>
-# Bounty
-There is a [[!FF_project 276]][[!tag bounty]] on some of these tasks.
+ * IRC, freenode, #glibc, 2011-09-28
+ <vsrinivas> two things you can do -- there is an environment variable
+ (DEBUG_MALLOC_ iirc?) that can be set to 2 to make ptmalloc (glibc's
+ allocator) more forceful and verbose wrt error checking
+ <vsrinivas> another is to grab a copy of Tor's source tree and copy out
+ OpenBSD's allocator (it's a clearly identifiable file in the tree);
+ LD_PRELOAD it or link it into your app, it is even more aggressive
+ about detecting memory misuse.
+ <vsrinivas> third, Red hat has a gdb python plugin that can instrument
+ glibc's heap structure. its kinda handy, might help?
+ <vsrinivas> MALLOC_CHECK_ was the envvar you want, sorry.
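
The `MALLOC_CHECK_` suggestion can be tried without touching the program under test. A minimal sketch follows; the helper name `malloc_check_env` is hypothetical, and newer glibc releases may additionally require preloading a separate malloc debugging library for the variable to have any effect.

```python
import os


def malloc_check_env(level=2, base=None):
    """Return a copy of the environment with MALLOC_CHECK_ set.

    `base` defaults to the current process environment; `level` is the
    value vsrinivas mentions (2). The helper itself is illustrative only.
    """
    env = dict(os.environ if base is None else base)
    env["MALLOC_CHECK_"] = str(level)
    return env
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the program to be checked.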
diff --git a/open_issues/default_pager.mdwn b/open_issues/default_pager.mdwn
index 189179c6..18670c75 100644
--- a/open_issues/default_pager.mdwn
+++ b/open_issues/default_pager.mdwn
@@ -18,6 +18,9 @@ IRC, freenode, #hurd, 2011-08-31:
have rewritten their swap pager
<antrik> (and also I/O performance steadily dropping before that point is
+[[performance/degradation]] (?).
<antrik> hm
<braunr> there could be too many things
<antrik> perhaps we could "borrow" from one of them? :-)
diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn
index 1fe2f9be..fb3d6895 100644
--- a/open_issues/gnumach_memory_management.mdwn
+++ b/open_issues/gnumach_memory_management.mdwn
@@ -1412,3 +1412,368 @@ There is a [[!FF_project 266]][[!tag bounty]] on this task.
better cache->nr_slabs * cache->bufs_per_slab * cache->buf_size or
cache->nr_slabs * cache->slab_size?
<braunr> the latter
+# IRC, freenode, #hurd, 2011-09-07
+ <mcsim> braunr: I've disabled calling of mem_cpu_pool_fill and allocator
+ became faster
+ <braunr> mcsim: sounds nice
+ <braunr> mcsim: i suspect the free path might not be as fast though
+ <mcsim> results for first calling: second:
+ and with many alloc/free:
+ <braunr> mcsim: thanks
+ <mcsim> best result are for second call: average time decreased from 159.56
+ to 118.756
+ <mcsim> First call slightly worse, but this is because I've added some
+ profiling code
+ <braunr> i still see some ~8k lines in 128639
+ <braunr> even some around ~12k
+ <mcsim> I think this is because of mem_cache_grow I'm investigating it now
+ <braunr> i guess so too
+ <mcsim> I've measured time for first call in cache and from about 22000
+ mem_cache_grow takes 20000
+ <braunr> how did you change the code so that it doesn't call
+ mem_cpu_pool_fill ?
+ <braunr> is the cpu layer still used ?
+ <mcsim>
+ <braunr> don't forget the free path
+ <braunr> mcsim: anyway, even with the previous slightly slower behaviour we
+ could observe, the performance hit is negligible
+ <mcsim> Is free path a compilation? (I'm sorry for my english)
+ <braunr> mcsim: mem_cache_free
+ <braunr> mcsim: the last two measurements i'd advise are with big (>4k)
+ object sizes and, really, kernel allocator consumption
+ <mcsim>
+ (first, second, small)
+ <braunr> mcsim: these numbers are closer to the zalloc ones, aren't they ?
+ <mcsim> deallocating slightly faster too
+ <braunr> it may not be the case with larger objects, because of the use of
+ a tree
+ <mcsim> yes, they are closer
+ <braunr> but then, i expect some space gains
+ <braunr> the whole thing is about compromise
+ <mcsim> ok. I'll try to measure them today. Anyway I'll post result and you
+ could read them in the morning
+ <braunr> at least, it shows that the zone allocator was actually quite good
+ <braunr> i don't like how the code looks, there are various hacks here and
+ there, it lacks self inspection features, but it's quite good
+ <braunr> and there was little room for true improvement in this area, like
+ i told you :)
+ <braunr> (my allocator, like the current x15 dev branch, focuses on mp
+ machines)
+ <braunr> mcsim: thanks again for these numbers
+ <braunr> i wouldn't have had the courage to make the tests myself before
+ some time eh
+ <mcsim> braunr: hello. Look at the small_4096 results
+ (balloc)
+ (zalloc)
+ <braunr> mcsim: wow, what's that ? :)
+ <braunr> mcsim: you should really really include your test parameters in
+ the report
+ <braunr> like object size, purpose, and other similar details
+ <mcsim> for balloc I specified only object_size = 4096
+ <mcsim> for zalloc object_size = 4096, alloc_size = 4096, memtype = 0;
+ <braunr> the results are weird
+ <braunr> apart from the very strange numbers (e.g. 0 or 4429543648), none
+ is around 3k, which is the value matching a kmem_alloc call
+ <braunr> happy to see balloc behaves quite good for this size too
+ <braunr> s/good/well/
+ <mcsim> Oh
+ <mcsim> here is significant only first 101 lines
+ <mcsim> I'm sorry
+ <braunr> ok
+ <braunr> what does the test do again ? 10 loops of 10 allocs/frees ?
+ <mcsim> yes
+ <braunr> ok, so the only slowdown is at the beginning, when the slabs are
+ created
+ <braunr> the two big numbers (31844 and 19548) are strange
+ <mcsim> on the other hand time of compilation is
+ <mcsim> balloc zalloc
+ <mcsim> 38m28.290s 38m58.400s
+ <mcsim> 38m38.240s 38m42.140s
+ <mcsim> 38m30.410s 38m52.920s
+ <braunr> what are you compiling ?
+ <mcsim> gnumach kernel
+ <braunr> in 40 mins ?
+ <mcsim> yes
+ <braunr> you lack hvm i guess
+ <mcsim> is it long?
+ <mcsim> I use real PC
+ <braunr> very
+ <braunr> ok
+ <braunr> so it's normal
+ <mcsim> in vm it was about 2 hours)
+ <braunr> the difference really is negligible
+ <braunr> ok i can explain the big numbers
+ <braunr> the slab size depends on the object size, and for 4k, it is 32k
+ <braunr> you can store 8 4k buffers in a slab (lines 2 to 9)
+ <mcsim> so we need use kmem_alloc_* 8 times?
+ <braunr> on line 10, the ninth object is allocated, which adds another slab
+ to the cache, hence the big number
+ <braunr> no, once for a size of 32k
+ <braunr> and then the free list is initialized, which means accessing those
+ pages, which means tlb misses
+ <braunr> i guess the zone allocator already has free pages available
+ <mcsim> I see
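
The slab arithmetic braunr describes can be sketched as a toy model. This is not gnumach code: the names below are hypothetical, and the 32 KiB slab size for 4 KiB objects is simply the figure quoted in the log.

```python
# Toy model of slab growth; names and structure are illustrative only.
OBJ_SIZE = 4096        # object size used in mcsim's test
SLAB_SIZE = 32 * 1024  # slab size braunr quotes for 4k objects

BUFS_PER_SLAB = SLAB_SIZE // OBJ_SIZE


def allocations_that_grow(n_allocs, bufs_per_slab=BUFS_PER_SLAB):
    """Return the 1-based allocation numbers that need a new slab,
    assuming nothing is freed in between."""
    return [i for i in range(1, n_allocs + 1) if (i - 1) % bufs_per_slab == 0]


def slab_memory(nr_slabs, slab_size=SLAB_SIZE):
    """Page-level usage of one cache: nr_slabs * slab_size."""
    return nr_slabs * slab_size
```

With these figures, a run of ten allocations grows the cache on the first and the ninth allocation, which matches the two slow samples discussed above.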
+ <braunr> i think you can stop performance measurements, they show the
+ allocator is slightly slower, but so slightly we don't care about that
+ <braunr> we need numbers on memory usage now (at the page level)
+ <braunr> and this isn't easy
+ <mcsim> For balloc I can get numbers if I summarize nr_slabs*slab_size for
+ each cache, isn't it?
+ <braunr> yes
+ <braunr> you can have a look at the original implementation, function
+ mem_info
+ <mcsim> And for zalloc I have to summarize of cur_size and then add
+ zalloc_wasted_space?
+ <braunr> i don't know :/
+ <braunr> i think the best moment to obtain accurate values is after zone_gc
+ removes the collected pages
+ <braunr> for both allocators, you could fill a stats structure at that
+ moment, and have an rpc copy that structure when a client tool requests
+ it
+ <braunr> concerning your tests, there is another point to have in mind
+ <braunr> the very first loop in your code shows a result of 31844
+ <braunr> although you disabled the call to cpu_pool_fill
+ <braunr> but the reason why it's so long is that the cpu layer still exists
+ <braunr> and if you look carefully, the cpu pools are created as needed on
+ the free path
+ <mcsim> I removed cpu_pool_drain
+ <braunr> but not cpu_pool_push/pop i guess
+ <mcsim>
+ <braunr> see, you still allocate the cpu pool array on the free path
+ <mcsim> but I don't fill it
+ <braunr> that's not the point
+ <braunr> it uses mem_cache_alloc
+ <braunr> so in a call to free, you can also have an allocation, that can
+ potentially create a new slab
+ <mcsim> I see, so I have to create cpu_pool at the initialization stage?
+ <braunr> no, you can't
+ <braunr> there is a reason why they're allocated on the free path
+ <braunr> but since you don't have the fill/drain functions, i wonder if you
+ should just comment out the whole cpu layer code
+ <braunr> but hmm
+ <braunr> no really, it's not worth the effort
+ <braunr> even with drains/fills, the results are really good enough
+ <braunr> it makes the allocator smp ready
+ <braunr> we should just keep it that way
+ <braunr> mcsim: fyi, the reason why cpu pool arrays are allocated on the
+ free path is to avoid recursion
+ <braunr> because cpu pool arrays are allocated from caches just as almost
+ everything else
+ <mcsim> ok
+ <mcsim> sum of cur_size and then adding zalloc_wasted_space gives 0x4e1954
+ <mcsim> but this value isn't even page aligned
+ <mcsim> For balloc I've got 0x4c6000 0x4aa000 0x48d000
+ <braunr> hm can you report them in decimal, >> 10 so that values are in KiB
+ ?
+ <mcsim> 4888 4776 4660 for balloc
+ <mcsim> 4998 for zalloc
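
The conversion braunr asks for, and mcsim's remark about alignment, can be checked in a few lines. A minimal sketch, assuming 4 KiB pages; the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def to_kib(nbytes):
    """Shift right by 10 bits, i.e. integer division by 1024."""
    return nbytes >> 10


def is_page_aligned(nbytes, page_size=PAGE_SIZE):
    return nbytes % page_size == 0


# Values quoted in the log.
balloc = [0x4c6000, 0x4aa000, 0x48d000]
zalloc = 0x4e1954

balloc_kib = [to_kib(v) for v in balloc]
zalloc_kib = to_kib(zalloc)
```

The balloc totals are multiples of the page size, while the zalloc total is not, which is why mcsim distrusts that number.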
+ <braunr> when ?
+ <braunr> after boot ?
+ <mcsim> boot, compile, zone_gc
+ <mcsim> and then measure
+ <braunr> ?
+ <mcsim> I call garbage collector before measuring
+ <mcsim> and I measure after kernel compilation
+ <braunr> i thought it took you 40 minutes
+ <mcsim> for balloc I got results at night
+ <braunr> oh so you already got them
+ <braunr> i can't believe the kernel only consumes 5 MiB
+ <mcsim> before gc it takes about 9052 KiB
+ <braunr> can i see the measurement code ?
+ <braunr> oh, and how much ram does your machine have ?
+ <mcsim> 758 mb
+ <mcsim> 768
+ <braunr> that's really weird
+ <braunr> i'd expect the kernel to consume much more space
+ <mcsim>
+ <mcsim> it's only dynamically allocated data
+ <braunr> yes
+ <braunr> ipc ports, rights, vm map entries, vm objects, and lots of other
+ hanging buffers
+ <braunr> about how much is zalloc_wasted_space ?
+ <braunr> if it's small or constant, i guess you could ignore it
+ <mcsim> about 492
+ <mcsim> KiB
+ <braunr> well it's another good point, mach internal structures don't imply
+ much overhead
+ <braunr> or, the zone allocator is underused
+ <tschwinge> mcsim, braunr: The memory allocator project is coming along
+ good, as I get from your IRC messages?
+ <braunr> tschwinge: yes, but as expected, improvements are minor
+ <tschwinge> But at the very least it's now well-known, maintainable code.
+ <braunr> yes, it's readable, easier to understand, provides self inspection
+ and is smp ready
+ <braunr> there also are less hacks, but a few less features (there are no
+ way to avoid sleeping so it's unusable - and unused - in interrupt
+ handlers)
+ <braunr> is* no way
+ <braunr> tschwinge: mcsim did a good job porting and measuring it
+# IRC, freenode, #hurd, 2011-09-08
+ <antrik> braunr: note that the zalloc map used to be limited to 8 MiB or
+ something like that a couple of years ago... so it doesn't seem
+ surprising that the kernel uses "only" 5 MiB :-)
+ <antrik> (yes, we had a *lot* of zalloc panics back then...)
+# IRC, freenode, #hurd, 2011-09-14
+ <mcsim> braunr: hello. I've written a constructor for kernel map entries
+ and it can return resources to their source. Can you have a look at it?
+ If all be OK I'll push it tomorrow.
+ <braunr> mcsim: send the patch through mail please, i'll apply it on my
+ copy
+ <braunr> are you sure the cache is reapable ?
+ <mcsim> All slabs, except first I allocate with kmem_alloc_wired.
+ <braunr> how can you be sure ?
+ <mcsim> First slab I allocate during bootstrap and use pmap_steal_memory
+ and further I use only kmem_alloc_wired
+ <braunr> no, you use kmem_free
+ <braunr> in kentry_dealloc_cache()
+ <braunr> which probably creates a recursion
+ <braunr> using the constructor this way isn't a good idea
+ <braunr> constructors are good for preconstructed state (set counters to 0,
+ init lists and locks, that kind of things, not allocating memory)
+ <braunr> i don't think you should try to make this special cache reapable
+ <braunr> mcsim: keep in mind constructors are applied on buffers at *slab*
+ creation, not at object allocation
+ <braunr> so if you allocate a single slab with, say, 50 or 100 objects per
+ slab, kmem_alloc_wired would be called that number of times
+ <mcsim> why kentry_dealloc_cache can create recursion? kentry_dealloc_cache
+ is called only by mem_cache_reap.
+ <braunr> right
+ <braunr> but are you totally sure mem_cache_reap() can't be called by
+ kmem_free() ?
+ <braunr> i think you're right, it probably can't
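
braunr's point that constructors run at slab creation, not at object allocation, can be illustrated with a toy cache. The class below is hypothetical and only counts constructor calls; it does not model the real allocator.

```python
class ToyCache:
    """Toy slab cache: the constructor runs once per buffer when a slab
    is created, never again on later allocations of the same buffer."""

    def __init__(self, bufs_per_slab, ctor=None):
        self.bufs_per_slab = bufs_per_slab
        self.ctor = ctor
        self.free_bufs = []
        self.nr_slabs = 0

    def _grow(self):
        # Creating a slab constructs every buffer it contains.
        self.nr_slabs += 1
        for _ in range(self.bufs_per_slab):
            buf = {}
            if self.ctor is not None:
                self.ctor(buf)
            self.free_bufs.append(buf)

    def alloc(self):
        if not self.free_bufs:
            self._grow()
        return self.free_bufs.pop()

    def release(self, buf):
        # The constructed state is kept; no constructor call on reuse.
        self.free_bufs.append(buf)
```

With 50 objects per slab, a single allocation already triggers 50 constructor calls, so a constructor that allocates memory would do so 50 times.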
+# IRC, freenode, #hurd, 2011-09-25
+ <mcsim> braunr: hello. I rewrote constructor for kernel entries and seems
+ that it works fine. I think that this was last milestone. Only moving of
+ memory allocator sources to more appropriate place and merge with main
+ branch left.
+ <braunr> mcsim: it needs renaming and reindenting too
+ <mcsim> for reindenting C-x h Tab in emacs will be enough?
+ <braunr> mcsim: make sure which style must be used first
+ <mcsim> and what should I rename and where better to place allocator? For
+ example, there is no lib directory, like in x15. Should I create it and
+ move list.* and rbtree.* to lib/ or move these files to util/ or
+ something else?
+ <braunr> mcsim: i told you balloc isn't a good name before, use something
+ more meaningful (kmem is already used in gnumach unfortunately if i'm
+ right)
+ <braunr> you can put the support files in kern/
+ <mcsim> what about vm_alloc?
+ <braunr> you should prefix it with vm_
+ <braunr> shouldn't
+ <braunr> it's a top level allocator
+ <braunr> on top of the vm system
+ <braunr> maybe mcache
+ <braunr> hm no
+ <braunr> maybe just km_
+ <mcsim> kern/km_alloc.*?
+ <braunr> no
+ <braunr> just km
+ <mcsim> ok.
+# IRC, freenode, #hurd, 2011-09-27
+ <mcsim> braunr: hello. When I've tried to speed of new allocator and bad
+ I've removed function mem_cpu_pool_fill. But you've said to undo this. I
+ don't understand why this function is necessary. Can you explain it,
+ please?
+ <mcsim> When I've tried to compare speed of new allocator and old*
+ <braunr> i'm not sure i said that
+ <braunr> i said the performance overhead is negligible
+ <braunr> so it's better to leave the cpu pool layer in place, as it almost
+ doesn't hurt
+ <braunr> you can implement the KMEM_CF_NO_CPU_POOL I added in the x15 mach
+ version
+ <braunr> so that cpu pools aren't used by default, but the code is present
+ in case smp is implemented
+ <mcsim> I didn't remove cpu pool layer. I've just removed filling of cpu
+ pool during creation of slab.
+ <braunr> how do you fill the cpu pools then ?
+ <mcsim> If object is freed then it is added to cpu pool
+ <braunr> so you don't fill/drain the pools ?
+ <braunr> you try to get/put an object and if it fails you directly fall
+ back to the slab layer ?
+ <mcsim> I drain them during garbage collection
+ <braunr> oh
+ <mcsim> yes
+ <braunr> you shouldn't touch the cpu layer during gc
+ <braunr> the number of objects should be small enough so that we don't care
+ much
+ <mcsim> ok. I can drain cpu pool at any other time if it is prohibited to
+ in mem_gc.
+ <mcsim> But why do we need to fill cpu pool during slab creation?
+ <mcsim> In this case allocation consist of: get object from slab -> put it
+ to cpu pool -> get it from cpu pool
+ <mcsim> I've just removed the last two stages
+ <braunr> hm cpu pools aren't filled at slab creation
+ <braunr> they're filled when they're empty, and drained when they're full
+ <braunr> so that the number of objects they contain is increased/reduced to
+ a value suitable for the next allocations/frees
+ <braunr> the idea is to fall back as little as possible to the slab layer
+ because it requires the acquisition of the cache lock
+ <mcsim> oh. You're right. I'm really sorry. The point is that if cpu pool
+ is empty we don't need to fill it first
+ <braunr> uh, yes we do :)
+ <mcsim> Why cache locking is so undesirable? If we have free objects in
+ slabs locking will not take a lot of time.
+ <braunr> mcsim: it's undesirable on a smp system
+ <mcsim> ok.
+ <braunr> mcsim: and spin locks are normally noops on a up system
+ <braunr> which is the case in gnumach, hence the slightly better
+ performances without the cpu layer
+ <braunr> but i designed this allocator for x15, which only supports mp
+ systems :)
+ <braunr> mcsim: sorry i couldn't look at your code, sick first, busy with
+ server migration now (new server almost ready for xen hurds :))
+ <mcsim> ok.
+ <mcsim> I've finished with the allocator, if I didn't miss anything important :)
+ <braunr> i'll have a look soon i hope :)
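
The fill/drain behaviour braunr describes (fill a CPU pool when it is empty, drain it when it is full, and otherwise stay away from the locked slab layer) can be sketched as follows. This is a simplified model with hypothetical names and sizes, not the x15 or gnumach implementation.

```python
class SlabLayer:
    """Stand-in for the slab layer; every call models one acquisition
    of the cache lock."""

    def __init__(self):
        self.lock_acquisitions = 0
        self._next_id = 0
        self._free = []

    def take(self, count):
        self.lock_acquisitions += 1
        objs = []
        for _ in range(count):
            if self._free:
                objs.append(self._free.pop())
            else:
                objs.append(self._next_id)
                self._next_id += 1
        return objs

    def give(self, objs):
        self.lock_acquisitions += 1
        self._free.extend(objs)


class CpuPool:
    """Per-CPU pool: filled when empty, drained when full."""

    def __init__(self, slab_layer, size=8, transfer=4):
        self.slab_layer = slab_layer
        self.size = size          # capacity of the pool (assumed value)
        self.transfer = transfer  # objects moved per fill/drain (assumed)
        self.objs = []

    def alloc(self):
        if not self.objs:
            # Fill: one trip to the slab layer for several objects.
            self.objs.extend(self.slab_layer.take(self.transfer))
        return self.objs.pop()

    def free(self, obj):
        if len(self.objs) >= self.size:
            # Drain: hand a batch back to the slab layer.
            drained = self.objs[:self.transfer]
            del self.objs[:self.transfer]
            self.slab_layer.give(drained)
        self.objs.append(obj)
```

In this model a steady alloc/free loop touches the slab layer only once, which is the benefit on an SMP system; on a uniprocessor, where the lock costs nothing, the extra layer is pure overhead, as the measurements in the log suggest.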
+# IRC, freenode, #hurd, 2011-09-27
+ <antrik> braunr: would it be realistic/useful to check during GC whether
+ all "used" objects are actually in a CPU pool, and if so, destroy them so
+ the slab can be freed?...
+ <antrik> mcsim: BTW, did you ever do any measurements of memory
+ use/fragmentation?
+ <mcsim> antrik: I couldn't do this for zalloc
+ <antrik> oh... why not?
+ <antrik> (BTW, I would be interested in a comparison between using the CPU
+ layer, and bare slab allocation without CPU layer)
+ <mcsim> Result I've got were strange. It wasn't even aligned to page size.
+ <mcsim> Probably is it better to look into /proc/vmstat?
+ <mcsim> Because I put hooks in the code and probably I missed something
+ <antrik> mcsim: I doubt vmstat would give enough information to make any
+ useful comparison...
+ <braunr> antrik: isn't this draining cpu pools at gc time ?
+ <braunr> antrik: the cpu layer was found to add a slight overhead compared
+ to always falling back to the slab layer
+ <antrik> braunr: my idea is only to drop entries from the CPU cache if they
+ actually prevent slabs from being freed... if other objects in the slab
+ are really in use, there is no point in flushing them from the CPU cache
+ <antrik> braunr: I meant comparing the fragmentation with/without CPU
+ layer. the difference in CPU usage is probably negligible anyways...
+ <antrik> you might remember that I was (and still am) sceptical about CPU
+ layer, as I suspect it worsens the good fragmentation properties of the
+ pure slab allocator -- but it would be nice to actually check this :-)
+ <braunr> antrik: right
+ <braunr> antrik: the more i think about it, the more i consider slqb to be
+ a better solution ...... :>
+ <braunr> an idea for when there's time
+ <braunr> eh
+ <antrik> hehe :-)
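
antrik's suggestion (flush objects from the CPU pools only when that would let a whole slab be freed) amounts to a subset test. A minimal sketch under that reading; the data layout and the function name are hypothetical.

```python
def reclaimable_slabs(slabs, cpu_pool_objs):
    """Return the ids of slabs that could be freed by flushing CPU pools.

    `slabs` maps a slab id to the set of objects currently handed out
    from that slab; `cpu_pool_objs` is the set of objects that are only
    cached in CPU pools. A slab qualifies when every object it has
    handed out is merely cached, not really in use.
    """
    pooled = set(cpu_pool_objs)
    return sorted(sid for sid, used in slabs.items() if used and used <= pooled)
```

Slabs with at least one object really in use are left alone, since flushing their cached objects would not free any page.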
diff --git a/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn b/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn
index d069641e..93055b77 100644
--- a/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn
+++ b/open_issues/libmachuser_libhurduser_rpc_stubs.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -8,19 +8,49 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
is included in the section entitled [[GNU Free Documentation
-bug-hurd discussion.
+[[!tag open_issue_glibc open_issue_hurd]]
-IRC, #hurd, 2010-08-12
- <jkoenig> Looking at hurd.git, shouldn't {hurd,include}/Makefile's "all" target do something, and shouldn't pretty much everything depend on them? As it stands it seems that the system headers are used and the potentially newer ones never get built, except maybe on "install" (which is seemingly never called from the top-level Makefile)
- <jkoenig> I would fix it, but something tells me that maybe it's a feature :-)
+# bug-hurd discussion.
+# IRC, freenode, #hurd, 2010-08-12
+ <jkoenig> Looking at hurd.git, shouldn't {hurd,include}/Makefile's "all"
+ target do something, and shouldn't pretty much everything depend on them?
+ As it stands it seems that the system headers are used and the
+ potentially newer ones never get built, except maybe on "install" (which
+ is seemingly never called from the top-level Makefile)
+ <jkoenig> I would fix it, but something tells me that maybe it's a feature
+ :-)
<antrik> jkoenig: the headers are provided by glibc, along with the stubs
- <jkoenig> antrik, you mean, even those built from the .defs files in hurd/ ?
+ <jkoenig> antrik, you mean, even those built from the .defs files in hurd/
+ ?
<antrik> yes
<jkoenig> oh, ok then.
- <antrik> as glibc provides the stubs (in libhurduser), the headers also have to come from there, or they would get out of sync
- <jkoenig> hmm, shouldn't glibc also provide /usr/share/msgids/hurd.msgids, then?
- <antrik> jkoenig: not necessarily. the msgids describe what the servers actually understand. if the stubs are missing from libhurduser, that's no reason to leave out the msgids...
+ <antrik> as glibc provides the stubs (in libhurduser), the headers also
+ have to come from there, or they would get out of sync
+ <jkoenig> hmm, shouldn't glibc also provide /usr/share/msgids/hurd.msgids,
+ then?
+ <antrik> jkoenig: not necessarily. the msgids describe what the servers
+ actually understand. if the stubs are missing from libhurduser, that's no
+ reason to leave out the msgids...
<jkoenig> ok this makes sense
+# IRC, OFTC, #debian-hurd, 2011-09-29
+ <tschwinge> pinotree: I don't like their existence. IMO (but I haven't
+ researched this in very much detail), every user of RPC stubs should
+ generate them for themselves (and glibc should directly include the
+ stubs it uses internally).
+ <pinotree> sounds fair
+ <pinotree> maybe they could be moved from glibc to hurd?
+ <tschwinge> pinotree: Yeah; someone needs to research why we have them (or
+ if it's only convenience), and whether we want to keep them.
+ <pinotree> you could move them to hurd, leaving them unaltered, so binary
+ compatibility with eventual 3rd party users is not broken
+ <pinotree> but those using them, other than hurd itself, won't compile
+ anymore, so you fix them progressively
diff --git a/open_issues/mach-defpager_vs_defpager.mdwn b/open_issues/mach-defpager_vs_defpager.mdwn
index d6976706..f03bc67f 100644
--- a/open_issues/mach-defpager_vs_defpager.mdwn
+++ b/open_issues/mach-defpager_vs_defpager.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -10,16 +10,24 @@ License|/fdl]]."]]"""]]
[[!tag open_issue_gnumach open_issue_hurd]]
-\#hurd, 2010, end of May / beginning of June
+IRC, freenode, #hurd, end of May/beginning of June 2010
<cfhammar> whats the difference between mach-defpager and defpager?
- <cfhammar> i'm guessing defpager is a hurdish version that uses libstore but was never finished or something
- <cfhammar> found an interesting thread about it:
+ <cfhammar> i'm guessing defpager is a hurdish version that uses libstore
+ but was never finished or something
+ <cfhammar> found an interesting thread about it:
<slpz> antrik: an interesting thread, indeed :-)
- <pochu> slpz: btw is mach-defpager linked statically but not called mach-defpager.static on purpose?
- <slpz> antrik: also, I can confirm that mach-defpager needs a complete rewrite ;-)
+ <pochu> slpz: btw is mach-defpager linked statically but not called
+ mach-defpager.static on purpose?
+ <slpz> antrik: also, I can confirm that mach-defpager needs a complete
+ rewrite ;-)
<slpz> pochu: I think the original defpager was launched by serverboot
<slpz> pochu: that could be the reason to have it static, like ext2fs
- <slpz> and since there's no need to execute it again during the normal operation of the system, they probably decided to not create a dynamically linked version
+ <slpz> and since there's no need to execute it again during the normal
+ operation of the system, they probably decided to not create a
+ dynamically linked version
<slpz> (but I'm just guessing)
- <slpz> of perhaps they wanted to prevent mach-defpager from the need of reading libraries, since it's used when memory is really scarce (guessing again)
+ <slpz> or perhaps they wanted to keep mach-defpager from needing to read
+ libraries, since it's used when memory is really scarce (guessing
+ again)
diff --git a/open_issues/mach_vm_pageout.mdwn b/open_issues/mach_vm_pageout.mdwn
new file mode 100644
index 00000000..dac7fe28
--- /dev/null
+++ b/open_issues/mach_vm_pageout.mdwn
@@ -0,0 +1,19 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+[[!tag open_issue_gnumach]]
+IRC, freenode, #hurd, 2011-09-09
+ <slpz> It's amazing how broken some parts of Mach's VM are
+ <slpz> currently, it doesn't even keep track of the number of external
+ pages in the lists
+ <slpz> and vm_pageout_scan produces a hang if want_pages == FALSE (which
+ it never is, because vm_page_external_count is always 0)
diff --git a/open_issues/osf_mach.mdwn b/open_issues/osf_mach.mdwn
new file mode 100644
index 00000000..d689bfcb
--- /dev/null
+++ b/open_issues/osf_mach.mdwn
@@ -0,0 +1,237 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+[[!tag open_issue_glibc open_issue_gnumach open_issue_hurd]]
+IRC, freenode, #hurd, 2011-09-07
+ <slpz> tschwinge: do you think it would be possible/convenient to
+ maintain hurd and glibc versions for OSF Mach as branches in the official
+ git repo?
+ <tschwinge> Is OSF Mach the MkLinux one?
+ <slpz> Yes, it is
+ <tschwinge> slpz: If there's a suitable license, then yes, of course!
+ <tschwinge> Unless there is a proper upstream, of course.
+ <tschwinge> But I don't assume there is?
+ <tschwinge> slpz: What is interesting for us about OSF Mach?
+ <slpz> tschwinge: Peter Bruin and Jose Marchesi did a gnuified version some
+ time ago (gnu-osfmach), so I suppose the license is not a problem. But
+ I'm going to check it, though
+ <slpz> OSF Mach has a number of interesting features
+ <slpz> like migrating threads, advisory pageout, clustered pageout, kernel
+ loaded tasks, short circuited RPC...
+ <tschwinge> Oh!
+ <tschwinge> Good.
+ <slpz> right now I'm testing if it's really worth the effort
+ <tschwinge> Yes.
+ <tschwinge> But if the core codebase is the same (is it?) it may be
+ possible to merge some things?
+ <tschwinge> If the changes can be identified reasonably...
+ <slpz> comparing performance of the specialized RPC of OSF Mach with
+ generic IPC
+ <slpz> That was my first intention, but I think that porting all those
+ features will be much more work than porting Hurd/glibc to it
+ <braunr> slpz: ipc performance currently matters less than clustered
+ pageouts
+ <braunr> slpz: i'm really not sure ..
+ <braunr> i'd personally adapt the kernel
+ <slpz> braunr: well, clustered pageouts is one of the changes that can be
+ easily ported
+ <slpz> braunr: We can consider OSF Mach code as reasonably stable, and
+ porting its features to GNU Mach will take us to the point of having to
+ debug all that code again
+ <slpz> probably, the hardest feature to be ported is migrating threads
+ <braunr> isn't that what was tried for gnu mach 2 ? or was it only about
+ oskit ?
+ <slpz> IIRC only oskit
+ <tschwinge> slpz: But there have been some advancements in GNU Mach, too.
+ For example the Xen port.
+ <tschwinge> But we can experiment with it, of course.
+ <slpz> tschwinge: I find it easier to move the Xen support from GNU Mach to
+ OSF Mach, than porting MT in the other direction
+ <tschwinge> slpz: And I think MkLinux is a single-server, so I don't think
+ they used IPC as much as we did?
+ <tschwinge> slpz: OK, I see.
+ <braunr> slpz: MT aren't as needed as clustered pageouts :p
+ <braunr> gnumach already has ipc handoff, so MT would just consume less
+ stack space, and only slightly improve raw ipc performance
+ <tschwinge> slpz: But we will surely accept patches that get the Hurd/glibc
+ ported to OSF Mach, no question.
+ <braunr> (it's required for other issues we discussed already, but not a
+ priority imo)
+ <slpz> tschwinge: MkLinux makes heavy use of IPC, but it tries to
+ "short-circuit" it when running as a kernel loaded task
+ <tschwinge> And it's obviously best to keep it in one place. Luckily it's
+ not CVS branches anymore... :-)
+ <slpz> braunr: well, I'm a bit obsessed with IPC performance, if the RPC on
+ OSF Mach really makes a difference, I want it for Hurd right now
+ <slpz> braunr: clustered pages can be implemented at any time :-)
+ <slpz> tschwinge: great!
+ <tschwinge> slpz: In fact, haven't there already been some Savannah
+ repositories created, several (five?) years ago?
+ <braunr> slpz: the biggest performance issue on the hurd is I/O
+ <braunr> and the easiest way to improve that is better VM transfers
+ <slpz> tschwinge: yes, the HARD project, but I think it wasn't too well
+ received...
+ <tschwinge> slpz: Quite some things changed since then, I'd say.
+ <slpz> braunr: I agree, but IPC is the hardest part to optimize
+ <slpz> braunr: If we have a fast IPC, the rest of improvements are way
+ easier
+ <braunr> slpz: i don't see how faster IPC makes I/O faster :(
+ <braunr> slpz: read
+ again :)
+ <slpz> braunr: IPC puts the upper limit of how fast I/O could be
+ <braunr> the abstract for my thesis on x15 mach was that the ipc code was
+ the most focused part of the kernel
+ <braunr> so my approach was to optimize everything *else*
+ <braunr> the improvements in UVM (and most notably clustered page
+ transfers) show global system improvements up to 30% in netbsd
+ <braunr> we should really focus on the VM first (which btw, is a pain in
+ the ass with the crappy panicking swap code in place)
+ <braunr> and then complete the I/O system
+ <slpz> braunr: If a system can't transfer data between translators faster
+ than 100 MB/s, faster devices don't make much sense
+ <guillem> has anyone considered switching the syscalls to use
+ sysenter/syscall instead of soft interrupts?
+ <slpz> braunr: but I agree on the VM part
+ <braunr> guillem: it's in my thesis .. but only there :)
+ <braunr> slpz: let's reach 100 MiB/s first, then improve IPC
+ <slpz> guillem: that's a must do, also moving to 64 bits :-)
+ <braunr> guillem: there are many tiny observations in it, like the use of
+ global page table entries, which was added by youpi around that time
+ <guillem> slpz: I wanted to fix all warnings first before sending my first
+ batch of 64 bit fixes, but I think I'll just send them after checking
+ they don't introduce regressions on i386
+ <guillem> braunr: interesting I think I might have skimmed over your
+ thesis, maybe I should read it properly some time :)
+ <slpz> braunr: I see it exactly the opposite way. First push IPC to its
+ limit, then improve devices/VM
+ <slpz> guillem: that's great :-)
+ <braunr> slpz: improving ipc now will bring *nothing*, whereas improving
+ vm/io now will make the system considerably more useable
+ <guillem> but then fixing 64-bit issues in the Linux code is pretty
+ annoying given that the latest code from upstream has that already fixed,
+ and we are “supposed” to drop the linux code from gnumach at some point
+ :)
+ <braunr> slpz: that's a basic principle in profiling, improve what brings
+ the best gains
+ <slpz> braunr: I'm not thinking about today, I'm thinking about how fast
+ Hurd could be when running on Mach. And, as I said, IPC is the absolute
+ upper limit.
+ <braunr> i'm really not convinced
+ <braunr> there are that many tasks making extensive use of IPCs
+ <braunr> most are cpu/IO bound
+ <slpz> but I have to acknowledge that this concern has been really
+ alleviated by the EPT improvement discovery
+ <braunr> there aren't* that many tasks
+ <slpz> braunr: create a ramdisk and write some files on it
+ <slpz> braunr: there's no I/O in that case, and performance is really low
+ too
+ <braunr> well, ramdisks don't even work correctly iirc
+ <slpz> I must say that I consider improvements in OOL data moving as if it
+ were in IPC itself
+ <slpz> braunr: you can simulate one with storeio
+ <braunr> slpz: then measure what's slow
+ <braunr> slpz: it couldn't simply be the vm layer
+ <slpz> braunr:
+ <braunr> ok, it's not a true ramdisk
+ <braunr> it's a stack of a ramdisk and extfs servers
+ <braunr> ext2fs*
+ <braunr> i was thinking about tmpfs
+ <slpz> True, but one of the Hurd's main advantages is the ability to do
+ that kind of thing
+ <slpz> so they must work with a reasonable performance
+ <braunr> other systems can too ..
+ <braunr> anyway
+ <braunr> i get your point, you want faster IPCs, like everyone does
+ <slpz> braunr: yes, and I also want to know how fast it could be, to have a
+ reference when profiling complex services
+ <antrik> slpz: really improving IPC performance probably requires changing
+ the semantics... but we don't know which semantics we want until we have
+ actually tried fixing the existing bottlenecks
+ <antrik> well, not only bottlenecks... also other issues such as resource
+ management
+ <slpz> antrik: I think fixing bottlenecks would probably require changes in
+ some Mach interfaces, not in the IPC subsystem
+ <slpz> antrik: I mean, IPC semantics just provide the basis for messaging,
+ I don't think we will need to change them further
+ <antrik> slpz: right, but only once we have addressed the bottlenecks (and
+ other major shortcomings), we will know how the IPC mechanisms need to
+ change to get further improvements...
+ <antrik> of course improving Mach IPC performance is interesting too -- if
+ nothing else, then to see how much of a difference it really makes... I
+ just don't think it should be considered an overriding priority :-)
+ <youpi> slpz: I agree with braunr, I don't think improving IPC will bring
+ much on the short term
+ <youpi> the buildds are slow mostly because of bad VM
+ <youpi> like lack of read-ahead, the randomness of object cache pageout,
+ etc.
+ <youpi> that doesn't mean IPC shouldn't be improved of course
+ <youpi> but we have a big margin for iow
+ <youpi> s/iow/now
+ <slpz> youpi: I agree with you and with braunr in that regard. I'm not
+ looking for an immediate improvement, I just want to see how fast the IPC
+ (especially OOL data transfers) could be.
+ <slpz> also, migrating threads will help to fix some problems related with
+ resource management
+ <antrik> slpz: BTW, what about Apple's Mach? isn't it essentially OSF Mach
+ with some further improvements?...
+ <slpz> antrik: IPC is an area with very little room for improvement, so I
+ don't think we will fix those bottlenecks by applying some changes there
+ <antrik> well, for large OOL transfers, the limiting factor is certainly
+ also VM rather than the thread model?...
+ <slpz> antrik: yes, but I think it is encumbered by the APSLv2 license
+ <antrik> ugh
+ <slpz> antrik: for OOL transfers, VM plays a big role, but IPC also has a
+ great deal of responsibility
+ <antrik> as for resource management, migrating threads do not really help
+ much IMHO, as they only affect CPU scheduling. memory usage is a much
+ more pressing issue
+ <antrik> BTW, I have thought about passive objects in the past, but didn't
+ reach any conclusion... so I'm a bit ambivalent about migrating threads
+ :-)
+ <slpz> As an example, in Hurd on GNU Mach, an io_read can't take advantage
+ of copy-on-write, as buffers from the translator always arrive outside
+ the user's buffer
+ <slpz> antrik: well, I think cpu scheduling is a big deal ;-)
+ <slpz> antrik: and for memory management, until a better design is
+ implemented, some fixes could be applied to get us to the same level as a
+ monolithic kernel
+ <antrik> to get even close to monolithic systems, we need either a way to
+ account server resources used on client's behalf, or to make servers use
+ client-provided resources. both require changes in the IPC mechanism I
+ think...
+ <antrik> (though *if* we go for the latter option, the CPU scheduling
+ changes of migrating threads would of course be necessary, in addition to
+ any changes regarding memory management...)
+ <antrik> slpz: BTW, I didn't get the point about io_read and COW...
+ <slpz> antrik: AFAIK, the FS cache (which is our primary concern) in most
+ monolithic systems is agnostic with respect to the users, and only deals
+ with absolute numbers. In our case we can do almost the same by combining
+ Mach's and the pagers' knowledge.
+ <antrik> slpz: my primary concern is that any program having a hiccup
+ crashes the system... and I'm not sure this can be properly fixed without
+ working memory accounting
+ <antrik> (I guess in can be worked around to some extent by introducing
+ various static limits on processes... but I'm not sure how well)
+ <antrik> it can
+ <slpz> antrik: monolithic systems also suffer from that problem (remember
+ fork bombs) and it's "solved" by imposing static limits on user processes
+ (ulimit).
+ <slpz> antrik: we do have more problems due to port management, but I think
+ some degree of control can be achieved with a reasonable amount of
+ changes.
+ <antrik> slpz: in a client-server architecture static limits are much less
+ effective... that problem exists on traditional systems too, but only in
+ some specific cases (such as X server); while on a microkernel system
+ it's ubiquitous... that's why we need a *better* solution to this problem
+ to get anywhere close to monolithic systems
diff --git a/open_issues/performance/degradation.mdwn b/open_issues/performance/degradation.mdwn
index db759308..8c9a087c 100644
--- a/open_issues/performance/degradation.mdwn
+++ b/open_issues/performance/degradation.mdwn
@@ -10,8 +10,12 @@ License|/fdl]]."]]"""]]
[[!meta title="Degradation of GNU/Hurd ``system performance''"]]
-Email, *id:""* (bug-hurd, 2011-07-25,
-Thomas Schwinge)
+[[!tag open_issue_gnumach open_issue_hurd]]
+# Email, `id:""` (bug-hurd, 2011-07-25, Thomas Schwinge)
> Building a certain GCC configuration on a freshly booted system: 11 h.
> Remove build tree, build it again (2nd): 12 h 50 min. Huh. Remove build
@@ -27,9 +31,8 @@ IRC, freenode, #hurd, 2011-07-23:
are some serious fragmentation issues
< braunr> antrik: both could be induced by fragmentation
-During [[IPC_virtual_copy]] testing:
+# During [[IPC_virtual_copy]] testing
IRC, freenode, #hurd, 2011-09-02:
@@ -38,3 +41,8 @@ IRC, freenode, #hurd, 2011-09-02:
800 fifteen minutes ago)
<braunr> manuel: i observed the same behaviour
+# IRC, freenode, #hurd, 2011-09-22
+See [[/open_issues/pagers]], IRC, freenode, #hurd, 2011-09-22.
diff --git a/open_issues/performance/io_system/clustered_page_faults.mdwn b/open_issues/performance/io_system/clustered_page_faults.mdwn
index 9e20f8e1..a3baf30d 100644
--- a/open_issues/performance/io_system/clustered_page_faults.mdwn
+++ b/open_issues/performance/io_system/clustered_page_faults.mdwn
@@ -137,3 +137,26 @@ License|/fdl]]."]]"""]]
where the pager interface needs to be modified, not the Mach one?...
<braunr> antrik: would be nice wouldn't it ? :)
<braunr> antrik: more probably the page fault handler
+# IRC, freenode, #hurd, 2011-09-28
+ <slpz> antrik: I've just recovered part of my old multipage I/O work
+ <slpz> antrik: I intend to clean and submit it after finishing the changes
+ to the pageout system.
+ <antrik> slpz: oh, great!
+ <antrik> didn't know you worked on multipage I/O
+ <antrik> slpz: BTW, have you checked whether any of the work done for GSoC
+ last year is any good?...
+ <antrik> (apart from missing copyright assignments, which would be a
+ serious problem for the Hurd parts...)
+ <slpz> antrik: It was seven years ago, but I did:
+ :-)
+ <slpz> antrik: Sincerely, I don't think the quality of that code is good
+ enough to be considered... but I think it was my fault as his mentor for
+ not correcting him soon enough...
+ <antrik> slpz: I see
+ <antrik> TBH, I feel guilty myself, for not asking about the situation
+ immediately when he stopped attending meetings...
+ <antrik> slpz: oh, you even already looked into vm_pageout_scan() back then
+ :-)
diff --git a/open_issues/performance/ipc_virtual_copy.mdwn b/open_issues/performance/ipc_virtual_copy.mdwn
index 00fa7180..9708ab96 100644
--- a/open_issues/performance/ipc_virtual_copy.mdwn
+++ b/open_issues/performance/ipc_virtual_copy.mdwn
@@ -356,3 +356,40 @@ IRC, freenode, #hurd, 2011-09-06:
<youpi> in PV it does not make sense: the guest already provides the
translated page table
<youpi> which is just faster than anything else
+IRC, freenode, #hurd, 2011-09-09:
+ <antrik> oh BTW, for another data point: dd zero->null gets around 225 MB/s
+ on my lowly 1 GHz Pentium3, with a blocksize of 32k
+ <antrik> (but only half of that with 256k blocksize, and even less with 1M)
+ <antrik> the system has been up for a while... don't know whether it's
+ faster on a freshly booted one
+IRC, freenode, #hurd, 2011-09-15:
+ <sudoman>
+ <sudoman> so is the dd command pointed to by that article a measure of io
+ performance?
+ <antrik> sudoman: no, not really
+ <antrik> it's basically the baseline of what is possible -- but the actual
+ slowness we experience is more due to very suboptimal disk access patterns
+ <antrik> though using KVM with writeback caching does actually help with
+ that...
+ <antrik> also note that the title of this post really makes no
+ sense... nested page tables should provide similar improvements for *any*
+ guest system doing VM manipulation -- it's not Hurd-specific at all
+ <sudoman> ok, that makes sense. thanks :)
+IRC, freenode, #hurd, 2011-09-16:
+ <slpz> antrik: I wrote that article (the one about How AMD/Intel fixed...)
+ <slpz> antrik: It's obviously a bit of an exaggeration, but it's true that
+ nested paging brings a great improvement in the performance of Hurd
+ running on virtual machines
+ <slpz> antrik: and it's Hurd specific, as this system is more affected by
+ the cost of page faults
+ <slpz> antrik: and as the impact of virtualization on the performance is
+ much higher than on (almost) any other OS.
+ <slpz> antrik: also, dd from /dev/zero to /dev/null is a measure of how
+ fast OOL IPC is.
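The `dd` zero->null runs mentioned in these logs can be approximated with a few lines of C. This is only a sketch under stated assumptions: `copy_blocks` and `throughput_mb_per_s` are hypothetical helper names introduced here, not part of `dd`, glibc, or the Hurd, and the code simply issues `read()` and `write()` calls with a chosen block size, which is the access pattern slpz describes as a proxy for OOL IPC speed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to `count` blocks of `block_size` bytes
   from src_path to dst_path, roughly like
   `dd if=src_path of=dst_path bs=block_size count=count`.
   Returns the number of bytes copied, or -1 on error.  */
long long
copy_blocks (const char *src_path, const char *dst_path,
             size_t block_size, long count)
{
  assert (block_size > 0 && count >= 0);

  int in = open (src_path, O_RDONLY);
  if (in < 0)
    return -1;
  int out = open (dst_path, O_WRONLY);
  char *buf = malloc (block_size);
  long long total = 0;

  if (out < 0 || buf == NULL)
    total = -1;

  for (long i = 0; total >= 0 && i < count; i++)
    {
      ssize_t n = read (in, buf, block_size);
      if (n == 0)
        break;                      /* end of input */
      if (n < 0 || write (out, buf, (size_t) n) != n)
        total = -1;                 /* read or write error */
      else
        total += n;
    }

  free (buf);
  if (out >= 0)
    close (out);
  close (in);
  return total;
}

/* Hypothetical helper: convert a byte count and an elapsed time into
   MB/s (10^6 bytes per second).  */
double
throughput_mb_per_s (long long bytes, double seconds)
{
  return seconds > 0.0 ? (double) bytes / 1e6 / seconds : 0.0;
}
```

Timing a call such as `copy_blocks ("/dev/zero", "/dev/null", 32 * 1024, 100000)` and passing the result to `throughput_mb_per_s` corresponds to antrik's 32k run; repeating it with 256k and 1M block sizes would reproduce the comparison he describes. No particular result is implied here.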
diff --git a/open_issues/resource_management_problems.mdwn b/open_issues/resource_management_problems.mdwn
index 1558bebb..8f752d61 100644
--- a/open_issues/resource_management_problems.mdwn
+++ b/open_issues/resource_management_problems.mdwn
@@ -77,6 +77,10 @@ IRC, freenode, #hurd, 2011-07-31
# Further Examples
+ * [[hurd/critique]]
* [[IO_accounting]]
+ * [[translators_set_up_by_untrusted_users]], and [[pagers]]
* [[configure max command line length]]
diff --git a/open_issues/resource_management_problems/pagers.mdwn b/open_issues/resource_management_problems/pagers.mdwn
new file mode 100644
index 00000000..4c36703c
--- /dev/null
+++ b/open_issues/resource_management_problems/pagers.mdwn
@@ -0,0 +1,322 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+[[!tag open_issue_gnumach]]
+# IRC, freenode, #hurd, 2011-09-14
+Coming from [[translators_set_up_by_untrusted_users]], 2011-09-14 discussion:
+ <slpz> antrik: I think a tunable option for preventing non-root users from
+ creating pagers and attaching translators could also be desirable
+ <antrik> slpz: why would you want to prevent creating pagers and attaching
+ translators?
+ <tschwinge> Preventing resource exhaustion, I guess.
+ <slpz> antrik: security and (as tschwinge says) to prevent a rogue pager
+ from exhausting the system.
+ <slpz> antrik: without the ability to use translators for non-root users,
+ Hurd can provide (almost) the same level of resource protection as
+ other *nixes
+See also: [[translators_set_up_by_untrusted_users]],
+ <braunr> the hurd is about that though
+ <slpz> there should be also a limit on the number of outstanding requests
+ that a task can have, and some other easily traceable values
+ <braunr> port messages queues have limits
+ <antrik> slpz: anything can exhaust the system. there are much more basic
+ limits that are missing... and I don't see how translators or pagers are
+ special in that regard
+ <slpz> braunr: that's why I said tunable. If I don't share my computer
+ with untrusted users, I want full functionality. Otherwise, I can enable
+ that limitation
+ <slpz> braunr: but I think those limits are on reception
+ <braunr> that's a wrong solution
+ <slpz> antrik: because pagers are external memory objects, and those are
+ treated differently
+ <braunr> compared to what ?
+ <braunr> and yes, the limit is on the message queue, on reception
+ <braunr> why is that a problem ?
+ <slpz> antrik: forbidding the use of translator was for security, to avoid
+ the problem of traversing an untrusted FS
+ <slpz> braunr: compared to anonymous memory
+ <slpz> braunr: because if the limit is on reception, a task can easily do a
+ DoS against a server
+ <braunr> hm actually, the problem we have with swap handling is that
+ anonymous memory is handled in a very similar way as other objects
+ <slpz> braunr: I want to limit the number of outstanding (unprocessed
+ messages in queues) requests
+ <braunr> slpz: the solution isn't about forbidding the use of translators,
+ but changing common code (libc i guess) not to use them, they can still
+ run beside
+ <slpz> braunr: that's because, currently, the external page limit is not
+ enforced
+ <braunr> i'm also not sure about DoS attacks
+ <braunr> if i'm right, there is often one port for each managed object,
+ which usually exist per client
+ <slpz> braunr: yes, that could be an option too (for translators, not for
+ pagers)
+ <braunr> i don't see how pagers wouldn't be translators on the hurd
+ <slpz> braunr: all pagers are translators, but not all translators are
+ pagers ;-)
+ <braunr> so if it works for translators, it also works for pagers
+ <slpz> braunr: it would fix the security issue, but not the resource
+ exhaustion problem, which only affects pagers
+ <braunr> i just don't see a point in implementing resource limits before
+ even fixing other fundamental issues
+ <braunr> the only way to avoid resource exhaustion is resource limits
+ <antrik> slpz: just not following untrusted translators is much more useful
+ than forbidding them altogether
+ <braunr> and the main problem of mach is resource accounting
+ <braunr> so first, fix that, using the critique as a starting point
+ <slpz> braunr: i'm not saying that this should be implemented right now,
+ i'm just pointing out this possibility
+ <braunr> i think we're all mostly aware of it
+ <slpz> braunr: resource accounting, as it's expressed in the critique,
+ would be wonderful, but it's just too complex IMHO
+ <braunr> it requires carefully designed changes to the interface yes
+ <slpz> to the interface, to the internals, to user space tasks...
+ <braunr> the internals wouldn't be impacted that much
+ <braunr> user space tasks would mostly include hurd servers
+ <braunr> if the changes are centralized in libraries, it should be easy to
+ provide to the servers
+# IRC, freenode, #hurd, 2011-09-22
+ <slpz> antrik: I've also implemented a simple resource control on dirty
+ pages and changed pageout_scan to free external pages, and only touch
+ anonymous memory if it's really needed
+ <slpz> antrik: those combined make the system work better under heavy load
+ <slpz> antrik: 1.5 GB of RAM and another 1.5 GB of swap helps a lot, too
+ :-)
+ <antrik> hm... I'm not sure what these things mean exactly TBH... but I
+ wonder whether some of these could fix the performance degradation (and
+ ultimate crash) I described recently...
+[[/open_issues/default_pager]], [[system performance degradation
+ <antrik> care to explain them to a noob like me?
+ <slpz> probably not. During my tests, I've noticed that, at some points,
+ the system performance starts to degrade, and this doesn't change until
+ it's restarted
+ <slpz> but I wasn't able to create a test case to reproduce the bug...
+ <slpz> antrik: Sure. First, I've changed GNU Mach to:
+ <slpz> - Classify all pages from data_supply as external, and count them
+ in vm_page_external_count (previously, this variable was always zero)
+ <slpz> - Count all pages for which a data_unlock has been requested as
+ potentially dirty pages
+ <antrik> there is one important bit I forgot to mention in my recent
+ report: one "reliable" way to cause growing swap usage is simply
+ installing a lot of debian packages (e.g. running an apt-get upgrade)
+ <antrik> some other kinds of I/O also seem to have such an effect, but I
+ wasn't able to pinpoint specific situations
+ <slpz> - Establish a limit on how many potentially dirty pages are
+ allowed. If it's reached, a notification (right now it's just a bogus
+ m_o_data_unlock, to avoid implementing a new RPC) is sent to the pager
+ which has generated the page fault
+ <slpz> - Establish a hard limit on those dirty pages. If it's reached,
+ threads asking for a data_unlock are blocked until someone cleans some
+ pages. This should be improved with a forced pageout, if needed.
+ <slpz> - And finally, in vm_pageout_scan, run over the inactive queue
+ searching for clean, external pages, freeing them. If it's not possible
+ to free enough pages, or if vm_page_external_count is less than 10% of
+ system's memory, the "normal" pageout is used.
+ <slpz> I need to clean up things a little, but I want to send a preliminary
+ patch to bug-hurd ASAP, to have more people testing it.
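The policy slpz lists above can be pictured with a small model in C. This is a sketch under assumptions, not the GNU Mach patch: every type and function name here (`vm_counters`, `on_data_unlock`, `reclaim_clean_external`, `need_normal_pageout`) is hypothetical, the inactive queue is reduced to a plain array, and the order "scan first, then decide on the fallback" is a guess, since the log does not spell it out. Only the two dirty-page limits and the 10% threshold come from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What to do with a thread that asks for a data_unlock (hypothetical).  */
enum unlock_action { UNLOCK_PROCEED, UNLOCK_NOTIFY_PAGER, UNLOCK_BLOCK };

/* Global counters, as in the log: the limit on dirty pages is global.  */
struct vm_counters
{
  size_t total_pages;       /* pages of system memory */
  size_t external_pages;    /* pages supplied by external pagers */
  size_t dirty_pages;       /* potentially dirty pages */
  size_t dirty_soft_limit;  /* reaching it triggers a notification */
  size_t dirty_hard_limit;  /* reaching it blocks the requesting thread */
};

/* Simplified page on the inactive queue (hypothetical layout).  */
struct page
{
  bool external;
  bool dirty;
  bool freed;
};

/* Decide how to treat a data_unlock request under the two limits.  */
enum unlock_action
on_data_unlock (const struct vm_counters *c)
{
  assert (c->dirty_soft_limit <= c->dirty_hard_limit);
  if (c->dirty_pages >= c->dirty_hard_limit)
    return UNLOCK_BLOCK;
  if (c->dirty_pages >= c->dirty_soft_limit)
    return UNLOCK_NOTIFY_PAGER;
  return UNLOCK_PROCEED;
}

/* Walk the inactive queue and free clean, external pages until
   `wanted` pages have been freed.  Returns the number freed.  */
size_t
reclaim_clean_external (struct page *inactive, size_t n, size_t wanted,
                        struct vm_counters *c)
{
  size_t freed = 0;
  for (size_t i = 0; i < n && freed < wanted; i++)
    if (inactive[i].external && !inactive[i].dirty && !inactive[i].freed)
      {
        inactive[i].freed = true;
        if (c->external_pages > 0)
          c->external_pages--;
        freed++;
      }
  return freed;
}

/* Fall back to the "normal" pageout if not enough pages were freed, or
   if external pages make up less than 10% of system memory.  */
bool
need_normal_pageout (const struct vm_counters *c, size_t freed, size_t wanted)
{
  return freed < wanted || c->external_pages * 10 < c->total_pages;
}
```

In this model a scan would call `reclaim_clean_external` first and only touch anonymous memory when `need_normal_pageout` returns true, which matches the stated goal of freeing external pages before anonymous ones.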
+ <slpz> antrik: Do you think that performance degradation can be related
+ to the number of threads of your ext2fs translators?
+ <antrik> slpz: hm... I didn't watch that recently; but in the past, I
+ observed that the thread count is pretty constant after it reaches
+ something like 14000 on heavy load...
+ <antrik> err... wait, 14000 was ports :-)
+ <antrik> I doubt my system would survive 14000 threads ;-)
+ <antrik> don't remember thread count... I guess I should start watching
+ this again
+ <slpz> antrik: I was thinking that 14000 threads sound like a lot :-)
+ <slpz> what I know for sure, is that when operating with large files, the
+ deactivation of all pages of the memory object which is done after every
+ operation really hurts performance
+ <antrik> right now my root FS has 5100 ports and a mere 71 threads... but
+ then, it's almost freshly booted :-)
+ <slpz> that's why I've just commented out that operation in my code, since
+ it's not really needed anymore :-)
+ <slpz> anyway, after submitting all my pending mails to bug-hurd, I'll try
+ to hunt that bug. Sounds funny.
+ <antrik> regarding your explanation, I'm still trying to wrap my head
+ around some of the details. I must admit that I don't remember what
+ data_unlock does... or maybe I never fully understood it
+ <antrik> the limit on dirty pages is global?
+ <slpz> yes, right now it's global
+ <marcusb> I try to find the old discussion of the thread storm stuff
+ <marcusb> there was some concern about deadlocks
+ <slpz> marcusb: yes, because we were talking about putting a static limit
+ on the server threads of a translator
+ <slpz> marcusb: and that was wrong (my fault, I was even dumber back then
+ :-P)
+ <marcusb> oh boy digging in old mail is no fun. first I see mistakes in my
+ english. then I see quite complicated pager stuff I don't ever remember
+ touching. but there is a patch, and it has my name on it
+ <marcusb> I think I lost a couple of the early years of my hurd hacking :)
+ <antrik> hm... I reread the chapter on locking, and it's still above me :-(
+ <marcusb> not sure what you are talking about, but if there are any
+ specific questions...
+ <antrik> marcusb: external pager interface
+ <marcusb> uuuuh ;)
+ <antrik> memory_object_lock_request(), memory_object_lock_completed(),
+ memory_object_data_unlock()
+ <marcusb> is that from the mach manual?
+ <antrik> yes
+ <antrik> I didn't really understand that part when I first read it a couple
+ of years ago, and I still don't understand it now :-(
+ <marcusb> I am sure I didn't understand it either
+ <marcusb> and maybe I missed my window :)
+ <marcusb> let's see
+ <antrik> hehe
+ <antrik> slpz: what exactly do you mean by "the pager which has generated
+ the page fault"?
+ <antrik> marcusb: essentially I'm trying to understand the explanation of
+ the changes slpz did, but there are several bits totally obscure to me
+ :-(
+ <slpz> antrik: when an I/O operation is requested from ext2fs, it maps the
+ object in question into its own space, and then memcpy's from/to there
+ <slpz> antrik: so the translator (which is also a pager) is the one who
+ generates the page fault
+ <marcusb> yeah
+ <marcusb> antrik: it's important to understand which messages are sent by
+ the kernel to the manager and which are sent the other way
+ <marcusb> if the dest port is memory_object_t, that indicates a msg from
+ kernel to manager. if it is memory_object_control_t, it's a msg from
+ manager to kernel
+ <slpz> antrik: m_o_lock_request is used by the pager to "settle" the
+ status of a memory object, m_o_lock_completed is the answer from the
+ kernel when the lock has been completed (only if the client has requested
+ to be notified), and m_o_data_unlock is a request from the kernel to
+ change the level of protection for a page (it's called from vm_fault.c)
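Note: a minimal sketch of the rule of thumb marcusb gives above (destination port type implies message direction), applied to the three calls slpz describes. The directions in `CALL_DIRECTION` are taken from the chat, not checked against the Mach headers, and the helper name `expected_dest_port` is hypothetical.

```python
# Rule of thumb from the discussion: the destination port type tells the
# direction of the message.
DIRECTION_BY_DEST_PORT = {
    "memory_object_t": "kernel -> manager",
    "memory_object_control_t": "manager -> kernel",
}

# Direction of each call as described in the chat above (an assumption of
# this sketch, not verified against the Mach interface definitions).
CALL_DIRECTION = {
    "memory_object_lock_request": "manager -> kernel",
    "memory_object_lock_completed": "kernel -> manager",
    "memory_object_data_unlock": "kernel -> manager",
}


def expected_dest_port(call):
    """Return the destination port type the rule of thumb implies for a call."""
    wanted = CALL_DIRECTION[call]
    for port_type, direction in DIRECTION_BY_DEST_PORT.items():
        if direction == wanted:
            return port_type
    raise KeyError(call)
```

For example, `expected_dest_port("memory_object_data_unlock")` yields `memory_object_t`, matching the statement that data_unlock is a request from the kernel to the pager.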
+ <marcusb> slpz: but it's not pagers generating page faults, but users of
+ the memory object on the other side
+ <antrik> marcusb: well, I think the direction is clear to me... but the
+ purpose not really :-)
+ <marcusb> ie a client that mapped a file
+ <slpz> antrik: in ext2fs, all pages are initially provided to the kernel
+ (via data_supply) write protected. When a write operation is done over
+ one of those pages, a page fault is generated, which sends an
+ m_o_data_unlock to the pager, which answers (if convenient) with a
+ page_lock decreasing the protection level
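Note: the write-fault flow slpz describes can be modelled with a toy simulation. This is only a sketch under stated assumptions: `ToyKernel` and `ToyPager` are hypothetical stand-ins, not Mach or libpager APIs, and tying "if convenient" to a dirty-page limit is an illustrative policy, not necessarily what the patch does.

```python
class ToyPager:
    """Stands in for the translator's pager; decides whether to unlock a page."""

    def __init__(self, dirty_limit):
        self.dirty_limit = dirty_limit  # hypothetical limit on dirty pages
        self.dirty = set()

    def data_unlock(self, kernel, page):
        # Kernel asks for write access. Grant it only while below the limit
        # (the "if convenient" part; this policy is an assumption).
        if len(self.dirty) < self.dirty_limit:
            self.dirty.add(page)
            kernel.lock_request(page, write_locked=False)


class ToyKernel:
    """Stands in for the kernel side of the external pager interface."""

    def __init__(self, pager):
        self.pager = pager
        self.write_locked = {}

    def data_supply(self, page):
        # Pages are initially provided write protected.
        self.write_locked[page] = True

    def lock_request(self, page, write_locked):
        # Pager changes the protection level of a page.
        self.write_locked[page] = write_locked

    def write(self, page):
        # A write to a protected page faults and triggers data_unlock.
        if self.write_locked[page]:
            self.pager.data_unlock(self, page)
        # Report whether the write can proceed now.
        return not self.write_locked[page]
```

With a limit of one dirty page, the first write to a supplied page is unlocked and proceeds, while a write to a second page stays blocked until the pager chooses to answer.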
+ <marcusb> antrik: one use of lock_request is when you want to shut down
+ cleanly and want to get the dirty pages written back to you from the
+ kernel.
+ <marcusb> antrik: the other thing may be COW strategies
+ <slpz> marcusb: well, pagers and clients are in the same task for most
+ translators, like ext2fs
+ <marcusb> slpz: oh.
+ <slpz> marcusb: but yes, a read operation in a mmap'ed file would trigger
+ the fault in a client user task
+ <marcusb> slpz: I think I forgot everything about pagers :)
+ <slpz> marcusb: pager-memcpy.c is the key :-)
+ <marcusb> slpz: what becomes of the fault then? the kernel sees it's a
+ mapped memory object. will it then talk to the manager or to a pager?
+ <antrik> slpz: the translator causes the faults itself when it handles
+ io_read()/io_write() requests I suppose, as opposed to clients accessing
+ mmap()ed objects which then generate the faults?...
+ <antrik> ah, that's actually what you already said above :-)
+ <slpz> marcusb: I'm not sure what you mean by "manager"...
+ <marcusb> manager == memory object
+ <marcusb> mh
+ <slpz> marcusb: for all external objects, it will ask their current
+ pager
+ <marcusb> slpz: I think I am missing a couple of details, so nevermind.
+ It's starting to come back to me, but I am a bit afraid of that ;)
+ <marcusb> what I love about the Hurd is how damn readable the code is
+ <marcusb> considering it's an object system, it's so much nicer to read
+ than gtk stuff
+ <slpz> when you get the big picture, it's actually somewhat fun to see how
+ data moves around just to fulfill a simple read()
+ <marcusb> you should make a diagram!
+ <marcusb> bonus point for animated video ;)
+ <slpz> marcusb: heh, take a look at the hurd specific parts of glibc... I
+ cry in pain every time I do that...
+ <marcusb> slpz: oh yeah, rdwr-internal.
+ <marcusb> oh man
+ <marcusb> slpz: funny thing, I just looked at them the other day because of
+ the security issue
+ <slpz> marcusb: I think there was one, maybe a slice from someone's
+ presentation...
+ <marcusb> I think I was always confused about the pager/memobj/kernel
+ interactions
+ <slpz> marcusb: I'm barely able to read Roland's glibc code. I think it's
+ out of my reach.
+ <antrik> marcusb: I think part of the problem is confusing terminology
+ <marcusb> it's good that you are instrumenting the mach kernel to see
+ what's actually going on in there. it was a black box for me, but neal
+ took a peek and got a much better understanding of the performance issues
+ than I ever did
+ <antrik> when talking about "pager", we usually mean the process doing the
+ paging; but in mach terminology this actually seems to be the "manager",
+ while a "pager" is an individual object in the manager process... or
+ something like that ;-)
+ <marcusb> antrik: I just never took a look at the big picture. I look at
+ the parts
+ <marcusb> I knew the tail, ears, and legs of the elephant.
+ <marcusb> it's a lot of code for a beginner
+ <antrik> I never understood the distinction between "pager" and "memory
+ object" though...
+ <antrik> maybe "pager" refers to the object in the external pager, while
+ "memory object" is the part managed in Mach itself?...
+ <marcusb> memory object is a real object, to which you can send messages.
+ it's implemented in the server
+ <antrik> hm... maybe it's the other way around then ;-)
+ <marcusb> there is also the default pager
+ <marcusb> I think the pager is just another name for the process that
+ serves the memory object (default pager == memory object for anonymous
+ memory == swap)
+ <marcusb> but!
+ <marcusb> there is also libpager
+ <marcusb> and that's a more complicated beast
+ <antrik> actually, the correct term seems to be "default memory manager"...
+ <marcusb> yeah
+ <marcusb> from mach's pov
+ <marcusb> we always called it default pager in the Hurd
+ <antrik> marcusb: problem is that "pager" is sometimes used in the Mach
+ documentation to refer to memory object ports IIRC
+ <marcusb> isn't it defpager executable?
+ <marcusb> could be
+ <marcusb> it's the same thing, really
+ <antrik> indeed, the program implementing the default memory manager is
+ called "default pager"... so the terminology is really inconsistent
+ <marcusb> the hurd's pager library is a high level abstraction for mach's
+ external memory object interface.
+ <marcusb> i wouldn't worry about it too much
+ <antrik> I never looked at libpager
+ <marcusb> you should!
+ <marcusb> it's an important beast
+ <antrik> never seemed relevant to anything I did so far...
+ <antrik> though maybe it would help understanding
+ <marcusb> it's related to what you are looking now :)
diff --git a/open_issues/rework_gnumach_ipc_spaces.mdwn b/open_issues/rework_gnumach_ipc_spaces.mdwn
index b3d1b4a4..7c66776b 100644
--- a/open_issues/rework_gnumach_ipc_spaces.mdwn
+++ b/open_issues/rework_gnumach_ipc_spaces.mdwn
@@ -10,7 +10,7 @@ License|/fdl]]."]]"""]]
[[!tag open_issue_gnumach]]
# IRC, freenode, #hurd, 2011-05-07
diff --git a/open_issues/translators_set_up_by_untrusted_users.mdwn b/open_issues/translators_set_up_by_untrusted_users.mdwn
index 36fe5438..97f48bba 100644
--- a/open_issues/translators_set_up_by_untrusted_users.mdwn
+++ b/open_issues/translators_set_up_by_untrusted_users.mdwn
@@ -324,3 +324,24 @@ do bear some similarity with the issue we're discussing here.
<youpi> it should be one's normal right to change the view one has of it
<antrik> we discussed that once actually I believe...
<antrik> err... private namespaces I mean
+IRC, freenode, #hurd, 2011-09-10:
+ <cjuner_> I am rereading Neal Walfield's and Marcus Brinkmann's critique of
+ the hurd on mach. One of the arguments is that a file system may be
+ malicious (by DoSing its clients with infinitely deep directory
+ hierarchies). Is there an answer to that that does not require programs
+ to be programmed defensively against such possibilities?
+IRC, freenode, #hurd, 2011-09-14:
+ <antrik> cjuner: regarding malicious filesystems: the answer is to do
+ exactly the same as FUSE on Linux: don't follow translators set up by
+ untrusted users by default
+ <cjuner> antrik, but are legacy programs somehow protected? What about
+ executing `find`? Or is GNU's find somehow protected from that?
+ <antrik> cjuner: I'm talking about a global policy
+ <cjuner> antrik, and who would implement that policy?
+ <antrik> cjuner: either glibc or the parent translators
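Note: a minimal sketch of the kind of global policy antrik describes, in which translators set up by untrusted users are not followed by default. The function name, the `trusted_uids` parameter and the exact trust rule (root, the caller, or an explicitly trusted user) are assumptions for illustration; nothing here is an existing glibc or Hurd interface.

```python
ROOT_UID = 0


def may_follow_translator(translator_owner_uid, caller_uid,
                          trusted_uids=frozenset()):
    """Decide whether a lookup made for caller_uid should enter a translator.

    Hypothetical rule: follow only translators owned by root, by the
    caller, or by a user the caller has explicitly chosen to trust.
    """
    return (
        translator_owner_uid == ROOT_UID
        or translator_owner_uid == caller_uid
        or translator_owner_uid in trusted_uids
    )
```

Under such a default, a legacy program like `find` would never descend into a translator owned by an arbitrary other user, so it would not need to defend itself against infinitely deep hierarchies.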
+Continued discussion about [[resource_management_problems/pagers]].