IRC.

author: Thomas Schwinge <tschwinge@gnu.org> 2012-12-11 11:04:26 +0100
committer: Thomas Schwinge <tschwinge@gnu.org> 2012-12-11 11:04:26 +0100
commit: bcfc058a332da0a2bd2e09e13619be3e2eb803a7 (patch)
tree: 8cbce5a3d8eb1fc43efae81810da895978ce948e
parent: 5bd36fdff16871eb7d06fc26cac07e7f2703432b (diff)
9 files changed, 801 insertions, 18 deletions
diff --git a/hurd/translator/pfinet/ipv6.mdwn b/hurd/translator/pfinet/ipv6.mdwn
index 5afee0c6..d30cc850 100644
--- a/hurd/translator/pfinet/ipv6.mdwn
+++ b/hurd/translator/pfinet/ipv6.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2007, 2008, 2010 Free Software Foundation,
+[[!meta copyright="Copyright © 2007, 2008, 2010, 2012 Free Software Foundation,
 Inc."]]
 
 [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
@@ -6,8 +6,8 @@ id="license" text="Permission is granted to copy, distribute and/or modify this
 document under the terms of the GNU Free Documentation License, Version 1.2 or
 any later version published by the Free Software Foundation; with no Invariant
 Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
-is included in the section entitled
-[[GNU Free Documentation License|/fdl]]."]]"""]]
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
 
 [[Stefan_Siegl|stesie]] has added IPv6 support to the pfinet [[translator]].
 This was [Savannah task #5470](http://savannah.gnu.org/task/?5470).
@@ -55,3 +55,18 @@ Quite the same, but with static IPv6 address assignment:
 # Missing Functionality
 
 Amongst other things, support for [[IOCTL]]s is missing.
+
+
+## IRC, freenode, #hurd, 2012-12-10
+
+[[!tag open_issue_hurd]]
+
+    <braunr> looks like pfinet -G option doesn't work
+    <braunr> if someone is interested in fixing this (it concerns static IPv6
+      routing)
+    <braunr> youpi: have you ever successfully used pfinet with global
+      statically configured ipv6 addresses ?
+    <youpi> never tried
+    <braunr> ok
+    <braunr> i'd like to set this up on my VMs but it looks bugged :/
+    <braunr> i can't manage to correctly set the gateway
diff --git a/open_issues/anatomy_of_a_hurd_system.mdwn b/open_issues/anatomy_of_a_hurd_system.mdwn
index 99ef170b..3e585876 100644
--- a/open_issues/anatomy_of_a_hurd_system.mdwn
+++ b/open_issues/anatomy_of_a_hurd_system.mdwn
@@ -13,7 +13,10 @@ License|/fdl]]."]]"""]]
 A bunch of this should also be covered in other (introductionary) material,
 like Bushnell's Hurd paper.  All this should be unfied and streamlined.
 
-IRC, freenode, #hurd, 2011-03-08:
+[[!toc]]
+
+
+# IRC, freenode, #hurd, 2011-03-08
 
     <foocraft> I've a question on what are the "units" in the hurd project, if
       you were to divide them into units if they aren't, and what are the
@@ -38,9 +41,8 @@ IRC, freenode, #hurd, 2011-03-08:
     <antrik> no
     <antrik> servers often depend on other servers for certain functionality
 
----
 
-IRC, freenode, #hurd, 2011-03-12:
+# IRC, freenode, #hurd, 2011-03-12
 
     <dEhiN> when mach first starts up, does it have some basic i/o or fs
       functionality built into it to start up the initial hurd translators?
@@ -72,24 +74,24 @@ IRC, freenode, #hurd, 2011-03-12:
     <antrik> it also does some bootstrapping work during startup, to bring the
       rest of the system up
 
----
+
+# Source Code Documentation
 
 Provide a cross-linked sources documentation, including generated files, like
 RPC stubs.
 
   * <http://www.gnu.org/software/global/>
 
----
 
-[[Hurd_101]].
+# [[Hurd_101]]
+
 
----
+# [[hurd/IO_path]]
 
-More stuff like [[hurd/IO_path]].
+Need more stuff like that.
 
----
 
-IRC, freenode, #hurd, 2011-10-18:
+# IRC, freenode, #hurd, 2011-10-18
 
     <frhodes> what happens @ boot. and which translators are started in what
       order?
@@ -97,9 +99,8 @@ IRC, freenode, #hurd, 2011-10-18:
       ext2; ext2 starts exec; ext2 execs a few other servers; ext2 execs
       init. from there on, it's just standard UNIX stuff
 
----
 
-IRC, OFTC, #debian-hurd, 2011-11-02:
+# IRC, OFTC, #debian-hurd, 2011-11-02
 
     <sekon_> is __dir_lookup a RPC ??
     <sekon_> where can i find the source of __dir_lookup ??
@@ -123,9 +124,8 @@ IRC, OFTC, #debian-hurd, 2011-11-02:
     <tschwinge> sekon_: This may help a bit:
       http://www.gnu.org/software/hurd/hurd/hurd_hacking_guide.html
 
----
 
-IRC, freenode, #hurd, 2012-01-08:
+# IRC, freenode, #hurd, 2012-01-08
 
     <abique> can you tell me how is done in hurd:  "ls | grep x" ?
     <abique> in bash
@@ -187,7 +187,8 @@ IRC, freenode, #hurd, 2012-01-08:
     <antrik> that's probably the most fundamental design feature of the Hurd
     <antrik> (all filesystem operations actually, not only lookups)
 
-IRC, freenode, #hurd, 2012-01-09:
+
+## IRC, freenode, #hurd, 2012-01-09
 
     <braunr> youpi: are you sure cthreads are M:N ? i'm almost sure they're 1:1
     <braunr> and no modern OS is a right place for any thread userspace
@@ -266,3 +267,83 @@ IRC, freenode, #hurd, 2012-01-09:
     <youpi> they help only when the threads are living
     <braunr> ok
     <youpi> now as I said I don't have to talk much more, I have to leave :)
+
+
+# IRC, freenode, #hurd, 2012-12-06
+
+    <braunr> spiderweb: have you read
+      http://www.gnu.org/software/hurd/hurd-paper.html ?
+    <spiderweb> I'll have a look.
+    <braunr> and also the beginning of
+      http://ftp.sceen.net/mach/mach_a_new_kernel_foundation_for_unix_development.pdf
+    <braunr> these two should provide a good look at the big picture the hurd
+      attempts to achieve
+    <Tekk_> I can't help but wonder though, what advantages were really
+      achieved with early mach?
+    <Tekk_> weren't they just running a monolithic unix server like osx does?
+    <braunr> most mach-based systems were
+    <braunr> but thanks to that, they could provide advanced features over
+      other well established unix systems
+    <braunr> while also being compatible
+    <Tekk_> so basically it was just an ease of development thing
+    <braunr> well that's what mach aimed at being
+    <braunr> same for the hurd
+    <braunr> making things easy
+    <Tekk_> but as a side effect hurd actually delivers on the advantages of
+      microkernels aside from that, but the older systems wouldn't, correct?
+    <braunr> that's how there could be network file systems in very short time
+      and very scarce resources (i.e. developers working on it), while on other
+      systems it required a lot more to accomplish that
+    <braunr> no, it's not a side effect of the microkernel
+    <braunr> the hurd retains and extends the concept of flexibility introduced
+      by mach
+    <Tekk_> the improved stability, etc. isn't a side effect of being able to
+      restart generally thought of as system-critical processes?
+    <braunr> no
+    <braunr> you can't restart system critical processes on the hurd either
+    <braunr> that's one feature of minix, and they worked hard on it
+    <Tekk_> ah, okay. so that's currently just the domain of minix
+    <Tekk_> okay
+    <Tekk_> spiderweb: well, there's 1 advantage of minix for you :P
+    <braunr> the main idea of mach is to make it easy to extend unix
+    <braunr> without having hundreds of system calls
+    <braunr> the hurd keeps that and extends it by making many operations
+      unprivileged
+    <braunr> you don't need special code for kernel modules any more
+    <braunr> it's easy
+    <braunr> you don't need special code to handle suid bits and other ugly
+      similar hacks,
+    <braunr> it's easy
+    <braunr> you don't need fuse
+    <braunr> easy
+    <braunr> etc..
+
+
+# IRC, freenode, #hurd, 2012-12-06
+
+    <spiderweb> what is the #1 feature that distinguished hurd from other
+      operating systems. the concept of translators. (will read more when I get
+      more time).
+    <braunr> yes, translators
+    <braunr> using the VFS as a service directory
+    <braunr> and the VFS permissions to control access to those services
+
+
+# IRC, freenode, #hurd, 2012-12-10
+
+    <spiderweb> I want to work on hurd, but I think I'm going to start with
+      minix, I own the minix book 3rd ed. it seems like a good intro to
+      operating systems in general. like I don't even know what a semaphore is
+      yet.
+    <braunr> well, enjoy learning :)
+    <spiderweb> once I finish that book, what reading do you guys recommend?
+    <spiderweb> other than the wiki
+    <braunr> i wouldn't recommend starting with a book that focuses on one
+      operating system anyway
+    <braunr> you tend to think in terms of what is done in that specific
+      implementation and compare everything else to that
+    <braunr> tanenbaum is not only the main author of minix, but also the one
+      of the book http://en.wikipedia.org/wiki/Modern_Operating_Systems
+    <braunr>
+      http://en.wikipedia.org/wiki/List_of_important_publications_in_computer_science#Operating_systems
+      should be a pretty good list :)
diff --git a/open_issues/fakeroot_eagain.mdwn b/open_issues/fakeroot_eagain.mdwn
new file mode 100644
index 00000000..6b684a04
--- /dev/null
+++ b/open_issues/fakeroot_eagain.mdwn
@@ -0,0 +1,216 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_porting]]
+
+
+# IRC, freenode, #hurd, 2012-12-05
+
+    <braunr> rbraun   18813 R        2hrs ln -sf ../af_ZA/LC_NUMERIC
+      debian/locales-all/usr/lib/locale/en_BW/LC_NUMERIC
+    <braunr> when building glibc
+    <braunr> is this a known issue ?
+    <tschwinge> braunr: No.  Can you get a backtrace?
+    <braunr> tschwinge: with gdb you mean ?
+    <tschwinge> Yes.  If you have any debugging symbols (glibc?).
+    <braunr> or the build log leading to that ?
+    <braunr> ok, i will next time i have it
+    <tschwinge> OK.
+    <braunr> (i regularly had it when working on the pthreads port)
+    <braunr> tschwinge:
+      http://www.sceen.net/~rbraun/hurd_glibc_build_deadlock_trace
+    <braunr> youpi: ^
+    <youpi> Mmm, there's not so much we can do about this one
+    <braunr> youpi: what do you mean ?
+    <youpi> the problem is that it's really a reentrancy issue of the libc
+      locale
+    <youpi> it would happen just the same on linux
+    <braunr> sure
+    <braunr> but that doesn't mean we can't report and/or fix it :)
+    <youpi> (the _nl_state_lock)
+    <braunr> do you have any workaround in mind ?
+    <youpi> no
+    <youpi> actually that's what I meant by "there's not so much we can do
+      about this"
+    <braunr> ok
+    <youpi> because it's a bad interaction between libfakeroot and glibc
+    <youpi> glibc believes fxstat64 would never call locale functions
+    <youpi> but with libfakeroot it does
+    <braunr> i see
+    <youpi> only because we get an EAGAIN here
+    <braunr> but hm, doesn't it happen on linux ?
+    <youpi> EAGAIN doesn't happen on linux for fxstat64, no :)
+    <braunr> why does it happen on the hurd ?
+    <youpi> I mean for fakeroot stuff
+    <youpi> probably because fakeroot uses socket functions
+    <youpi> for which we probably don't properly handle EAGAIN
+    <youpi> I've already seen such kind of issue
+    <youpi> in buildd failures
+    <braunr> ok
+    <youpi> (so the actual bug here is EAGAIN
+    <youpi> )
+    <braunr> yes, so we can do something about it
+    <braunr> worth a look
+    <pinotree> (implement sysv semaphores)
+    <youpi> pinotree: if we could also solve all these buildd EAGAIN issues
+      that'd be nice :)
+    <braunr> that EAGAIN error might also be what makes exim behave badly and
+      loop forever
+    <youpi> possibly
+    <braunr> i've updated the trace with debugging symbols
+    <braunr> it fails on connect
+    <pinotree> like http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=563342 ?
+    <braunr> it's EAGAIN, not ECONNREFUSED
+    <pinotree> ah ok
+    <braunr> might be an error in tcp_v4_get_port
+
+
+## IRC, freenode, #hurd, 2012-12-06
+
+    <braunr> hmm, tcp_v4_get_port sometimes fails indeed
+    <gnu_srs> braunr: may I ask how you found out, adding print statements in
+      pfinet, or?
+    <braunr> yes
+    <gnu_srs> OK, so that's the only (easy) way to debug.
+    <braunr> that's the last resort
+    <braunr> gdb is easy too
+    <braunr> i could have added a breakpoint too
+    <braunr> but i didn't want to block pfinet while i was away
+    <braunr> is it possible to force the use of fakeroot-tcp on linux ?
+    <braunr> the problem seems to be that fakeroot doesn't close the sockets
+      that it connected to faked-tcp
+    <braunr> which, at some point, exhausts the port space
+    <pinotree> braunr: sure
+    <pinotree> change the fakeroot dpkg alternative
+    <braunr> ok
+    <pinotree> calling it explicitly `fakeroot-tcp command` or
+      `dpkg-buildpackage -rfakeroot-tcp ...` should work too
+    <braunr> fakeroot-tcp looks really evil :p
+    <braunr> hum, i don't see any faked-tcp process on linux :/
+    <pinotree> not even with `fakeroot-tcp bash -c "sleep 10"`?
+    <braunr> pinotree: now yes
+    <braunr> but, does it mean faked-tcp is started for *each* process loading
+      fakeroot-tcp ?
+    <braunr> (the lib i mean)
+    <pinotree> i think so
+    <braunr> well the hurd doesn't seem to do that at all
+    <braunr> or maybe it does and i don't see it
+    <braunr> the stale faked-tcp processes could be those that failed something
+      only
+    <pinotree> yes, there's also that issue: sometimes there are stake
+      faked-tcp processes
+    <braunr> hum no, i see one faked-tcp that consumes cpu when building glibc
+    <pinotree> *stale
+    <braunr> it's the same process for all commands
+    <pinotree> <braunr> but, does it mean faked-tcp is started for *each*
+      process loading fakeroot-tcp ?
+    <pinotree> → everytime you start fakeroot, there's a new faked-xxx for it
+    <braunr> it doesn't look that way
+    <braunr> again, on the hurd, i see one faked-tcp, consuming cpu while
+      building so i assume it services libfakeroot-tcp requests
+    <pinotree> yes
+    <braunr> which means i probably won't reproduce the problem on linux
+    <pinotree> it serves that fakeroot under which the binary(-arch) target is
+      run
+    <braunr> or perhaps it's the normal fakeroot-tcp behaviour on sid
+    <braunr> pinotree: a faked-tcp that is started for each command invocation
+      will implicitly make the network stack close all its sockets when
+      exiting
+    <braunr> pinotree: as our fakeroot-tcp uses the same instance of faked-tcp,
+      it's a lot more likely to exhaust the port space
+    <pinotree> i see
+    <braunr> i'll try on sid and see how it behaves
+    <braunr> pinotree: on the other hand, forking so many processes at each
+      command invocation may make exec leak a lot :p
+    <braunr> or rather, a lot more
+    <braunr> (or maybe not, since it leaks only in some cases)
+
+[[exec_leak]].
+
+    <braunr> pinotree: actually, the behaviour under linux is the same with the
+      alternative correctly set, whereas faked-tcp is restarted (if used at
+      all) with -rfakeroot-tcp
+    <braunr> hm no, even that isn't true
+    <braunr> grr
+    <braunr> pinotree: i think i found a handy workaround for fakeroot
+    <braunr> pinotree: the range of local ports in our networking stack is a
+      lot more limited than what is configured in current systems
+    <braunr> by extending it, i can now build glibc \o/
+    <pinotree> braunr: what are the current ours and the usual one?
+    <braunr> see pfinet/linux-src/net/ipv4/tcp_ipv4.c
+    <braunr> the modern ones are the ones suggested in the comment
+    <braunr> sysctl_local_port_range is the symbol storing the range
+    <pinotree> i see
+    <pinotree> what's the current range on linux?
+    <braunr> 20:44 < braunr> the modern ones are the ones suggested in the
+      comment
+    <pinotree> i see
+    <braunr> $ cat /proc/sys/net/ipv4/ip_local_port_range 
+    <braunr> 32768   61000
+    <braunr> so, i'm not sure why we have the problem, since even on linux,
+      netstat doesn't show open bound ports, but it does help
+    <braunr> the fact faked-tcp can remain after its use is more problematic
+    <pinotree> (maybe pfinet could grow a (startup-only?) option to change it,
+      similar to that sysctl)
+    <braunr> but it can also stem from the same issue gnu_srs found about
+      closed sockets that haven't been shut down
+    <braunr> perhaps
+    <braunr> but i don't see the point actually
+    <braunr> we could simply change the values in the code
+
+    <braunr> youpi: first, in pfinet, i increased the range of local ports to
+      reduce the likeliness of port space exhaustion
+    <braunr> so we should get a lot less EAGAIN after that
+    <braunr> (i've not committed any of those changes)
+    <youpi> range of local ports?
+    <braunr> see pfinet/linux-src/net/ipv4/tcp_ipv4.c, tcp_v4_get_port function
+      and sysctl_local_port_range array
+    <youpi> oh
+    <braunr> EAGAIN is caused by tcp_v4_get_port failing at
+    <braunr>                 /* Exhausted local port range during search? */
+    <braunr>                 if (remaining <= 0)
+    <braunr>                         goto fail;
+    <youpi> interesting
+    <youpi> so it's not a hurd bug after all
+    <youpi> just a problem in fakeroot eating a lot of ports
+    <braunr> maybe because of the same issue gnu_srs worked on (bad socket
+      close when no clean shutdown)
+    <braunr> maybe, maybe not
+    <braunr> but increasing the range is effective
+    <braunr> and i compared with what linux does today, which is exactly what
+      is in the comment above sysctl_local_port_range
+    <braunr> so it looks safe
+    <youpi> so that means that the pfinet just uses ports 1024- 4999 for
+      auto-allocated ports?
+    <braunr> i guess so
+    <youpi> the linux pfinet I meant
+    <braunr> i haven't checked the whole code but it looks that way
+    <youpi> ./sysctl_net_ipv4.c:static int ip_local_port_range_min[] = { 1, 1
+      };
+    <youpi> ./sysctl_net_ipv4.c:static int ip_local_port_range_max[] = { 65535,
+      65535 };
+    <youpi> looks like they have increased it since then :)
+    <braunr> hum :)
+    <braunr> $ cat /proc/sys/net/ipv4/ip_local_port_range 
+    <braunr> 32768   61000
+    <youpi> yep, same here
+    <youpi> ./inet_connection_sock.c:	.range = { 32768, 61000 },
+    <youpi> so there are two things apparently
+    <youpi> but linux now defaults to 32k-61k
+    <youpi> braunr: please just push the port range upgrade to 32Ki-61K
+    <braunr> ok, will do
+    <youpi> there's no reason not to do it
+
+
+## IRC, freenode, #hurd, 2012-12-11
+
+    <braunr> youpi: at least, i haven't had any failure building eglibc since
+      the port range patch
+    <youpi> good :)
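The port-space exhaustion discussed in `fakeroot_eagain.mdwn` can be sketched in a few lines of C. This is only an illustration under stated assumptions, not pfinet's code: `port_in_use`, `reset_ports`, `pick_local_port` and `ports_until_exhaustion` are hypothetical names, and the table of bound ports is a plain array rather than the stack's hash buckets. Only the shape of the search (a rover walking the configured range, failing once `remaining` reaches zero) follows what the log quotes from `tcp_v4_get_port`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define PORT_SPACE 65536

/* Hypothetical stand-in for the stack's table of bound local ports. */
static bool port_in_use[PORT_SPACE];

/* Hypothetical helper: forget every allocation. */
static void reset_ports(void)
{
    memset(port_in_use, 0, sizeof port_in_use);
}

/*
 * Pick a free local port in [low, high], starting just after *rover and
 * wrapping around.  Returns the port, or -EAGAIN once the whole range has
 * been searched without finding a free one.
 */
static int pick_local_port(int low, int high, int *rover)
{
    assert(0 < low && low <= high && high < PORT_SPACE);

    int remaining = high - low + 1;
    int port = *rover;

    do {
        port++;
        if (port < low || port > high)
            port = low;
        if (!port_in_use[port])
            break;
    } while (--remaining > 0);

    /* Exhausted local port range during search? */
    if (remaining <= 0)
        return -EAGAIN;

    port_in_use[port] = true;
    *rover = port;
    return port;
}

/* Count how many sockets get a port before allocation starts failing,
 * assuming no port is ever released (the leak scenario from the log). */
static int ports_until_exhaustion(int low, int high)
{
    int rover = low - 1;
    int count = 0;

    reset_ports();
    while (pick_local_port(low, high, &rover) > 0)
        count++;
    return count;
}
```

If the old range really was 1024 to 4999, as guessed in the log, `ports_until_exhaustion(1024, 4999)` gives 3976 allocations before `-EAGAIN`, while the 32768 to 61000 range gives 28233. That is consistent with the observation that widening the range hides the problem without fixing whatever keeps the ports bound.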
diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn
index 9feb30c8..e5e9d2c5 100644
--- a/open_issues/gnumach_memory_management.mdwn
+++ b/open_issues/gnumach_memory_management.mdwn
@@ -2133,3 +2133,52 @@ There is a [[!FF_project 266]][[!tag bounty]] on this task.
     <braunr> do you want to review ?
     <youpi> I don't think there is any need to
     <braunr> ok
+
+
+# IRC, freenode, #hurd, 2012-12-08
+
+    <mcsim> braunr: hi. Do I understand correctly that merely the same
+      technique is used in linux to determine the slab where the object to be
+      freed resides?
+    <braunr> yes but it's faster on linux since it uses a direct mapping of
+      physical memory
+    <braunr> it just has to shift the virtual address to obtain the physical
+      one, whereas x15 has to walk the pages tables
+    <braunr> of course it only works for kmalloc, vmalloc is entirely different
+    <mcsim> btw, is there sense to use some kind of B-tree instead of AVL to
+      decrease number of cache misses? AFAIK, in modern processors size of L1
+      cache line is at least 64 bytes, so in one node we can put at least 4
+      leafs (key + pointer to data) making search faster.
+    <braunr> that would be a b-tree
+    <braunr> and yes, red-black trees were actually developed based on
+      properties observed on b-trees
+    <braunr> but increasing the size of the nodes also increases memory
+      overhead
+    <braunr> and code complexity
+    <braunr> that's why i have a radix trees for cases where there are a large
+      number of entries with keys close to each other :)
+    <braunr> a radix-tree is basically a b-tree using the bits of the key as
+      indexes in the various arrays it walks instead of comparing keys to each
+      other
+    <braunr> the original avl tree used in my slab allocator was intended to
+      reduce the average height of the tree (avl is better for that)
+    <braunr> avl trees are more suited for cases where there are more lookups
+      than inserts/deletions
+    <braunr> they make the tree "flatter" but the maximum complexity of
+      operations that change the tree is 2log2(n), since rebalancing the tree
+      can make the algorithm reach back to the tree root
+    <braunr> red-black trees have slightly bigger heights but insertions are
+      limited to 2 rotations and deletions to 3
+    <mcsim> there should be not much lookups in slab allocators
+    <braunr> which explains why they're more generally found in generic
+      containers
+    <mcsim> or do I misunderstand something?
+    <braunr> well, there is a lookup for each free()
+    <braunr> whereas there are insertions/deletions when a slab becomes
+      non-empty/empty
+    <mcsim> I see
+    <braunr> so it was very efficient for caches of small objects, where slabs
+      have many of them
+    <braunr> also, i wrote the implementation in userspace, without
+      functionality pmap provides (although i could have emulated it
+      afterwards)
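The remark in `gnumach_memory_management.mdwn` that a radix tree uses the bits of the key as indexes, instead of comparing keys, can be illustrated with a small C sketch. It is not the x15 or GNU Mach implementation: the names (`radix_node`, `radix_index`, `radix_insert`, `radix_lookup`) and the choice of 4 key bits per level over 32-bit keys are assumptions made for the example, and nodes are never freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RADIX_BITS   4                    /* key bits consumed per level */
#define RADIX_SLOTS  (1u << RADIX_BITS)   /* 16 slots per node */
#define RADIX_LEVELS (32 / RADIX_BITS)    /* 8 levels for 32-bit keys */

/* Every node is an array of slots: child nodes in the inner levels,
 * user data pointers in the last level. */
struct radix_node {
    void *slots[RADIX_SLOTS];
};

/* Slot index of key at a given level; level 0 uses the top bits. */
static unsigned radix_index(uint32_t key, int level)
{
    int shift = (RADIX_LEVELS - 1 - level) * RADIX_BITS;
    return (key >> shift) & (RADIX_SLOTS - 1);
}

/* Returns 0 on success, -1 if an inner node could not be allocated. */
static int radix_insert(struct radix_node *root, uint32_t key, void *data)
{
    struct radix_node *node = root;

    assert(root != NULL);
    for (int level = 0; level < RADIX_LEVELS - 1; level++) {
        unsigned i = radix_index(key, level);
        if (node->slots[i] == NULL) {
            node->slots[i] = calloc(1, sizeof(struct radix_node));
            if (node->slots[i] == NULL)
                return -1;
        }
        node = node->slots[i];
    }
    node->slots[radix_index(key, RADIX_LEVELS - 1)] = data;
    return 0;
}

/* Walks one slot per level; no key comparison is ever performed. */
static void *radix_lookup(const struct radix_node *root, uint32_t key)
{
    const struct radix_node *node = root;

    for (int level = 0; level < RADIX_LEVELS - 1; level++) {
        node = node->slots[radix_index(key, level)];
        if (node == NULL)
            return NULL;
    }
    return node->slots[radix_index(key, RADIX_LEVELS - 1)];
}
```

Keys that are numerically close, such as `0x1000` and `0x1001`, share every inner node and differ only in the last slot, which is the case the log says radix trees are kept for. Sparse keys would instead allocate a nearly full path of nodes per entry, which is the memory overhead mentioned for larger nodes.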
diff --git a/open_issues/libpthread.mdwn b/open_issues/libpthread.mdwn
index 81f1a382..befc1378 100644
--- a/open_issues/libpthread.mdwn
+++ b/open_issues/libpthread.mdwn
@@ -1234,3 +1234,95 @@ There is a [[!FF_project 275]][[!tag bounty]] on this task.
       of a "message server" à la dmesg
 
 [[translator_stdout_stderr]].
+
+
+### IRC, freenode, #hurd, 2012-12-10
+
+    <youpi> braunr: unable to adjust libports thread priority: (ipc/send)
+      invalid destination port
+    <youpi> I'll see what package brought that
+    <youpi> (that was on a buildd)
+    <braunr> wow
+    <youpi> mkvtoolnix_5.9.0-1:
+    <pinotree> shouldn't that code be done in pthreads and then using such
+      pthread api? :p
+    <braunr> pinotree: you've already asked that question :p
+    <pinotree> i know :p
+    <braunr> the semantics of pthreads are larger than what we need, so that
+      will be done "later"
+    <braunr> but this error shouldn't happen
+    <braunr> it looks more like a random mach bug
+    <braunr> youpi: anything else on the console ?
+    <youpi> nope
+    <braunr> i'll add traces to know which step causes the error
+
+
+## IRC, freenode, #hurd, 2012-12-05
+
+    <braunr> tschwinge: i'm currently working on a few easy bugs and i have
+      planned improvements for libpthreads soon
+    <pinotree> wotwot, which ones?
+    <braunr> pinotree: first, fixing pthread_cond_timedwait (and everything
+      timedsomething actually)
+    <braunr> pinotree: then, fixing cancellation
+    <braunr> pinotree: and last but not least, optimizing thread wakeup
+    <braunr> i also want to try replacing spin locks and see if it does what i
+      expect
+    <pinotree> which fixes do you plan applying to cond_timedwait?
+    <braunr> see sysdeps/generic/pt-cond-timedwait.c
+    <braunr> the FIXME comment
+    <pinotree> ah that
+    <braunr> well that's important :)
+    <braunr> did you have something else in mind ?
+    <pinotree> hm, __pthread_timedblock... do you plan fixing directly there? i
+      remember having seen something related to that (but not on conditions),
+      but wasn't able to see further
+    <braunr> it has the same issue
+    <braunr> i don't remember the details, but i wrote a cthreads version that
+      does it right
+    <braunr> in the io_select_timeout branch
+    <braunr> see
+      http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/libthreads/cancel-cond.c?h=rbraun/select_timeout
+      for example
+    * pinotree looks
+    <braunr> what matters is the msg_delivered member used to synchronize
+      sleeper and waker
+    <braunr> the waker code is in
+      http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/libthreads/cprocs.c?h=rbraun/select_timeout
+    <pinotree> never seen cthreads' code before :)
+    <braunr> soon you shouldn't have any more reason to :p
+    <pinotree> ah, so basically the cthread version of the pthread cleanup
+      stack + cancelation (ie the cancel hook) broadcasts the condition
+    <braunr> yes
+    <pinotree> so a similar fix would be needed in all the places using
+      __pthread_timedblock, that is conditions and mutexes
+    <braunr> and that's what's missing in glibc that prevents deploying a
+      pthreads based hurd currently
+    <braunr> no that's unrelated
+    <pinotree> ok
+    <braunr> the problem is how __pthread_block/__pthread_timedblock is
+      synchronized with __pthread_wakeup
+    <braunr> libpthreads does exactly the same thing as cthreads for that,
+      i.e. use messages
+    <braunr> but the message alone isn't enough, since, as explained in the
+      FIXME comment, it can arrive too late
+    <braunr> it's not a problem for __pthread_block because this function can
+      only resume after receiving a message
+    <braunr> but it's a problem for __pthread_timedblock which can resume
+      because of a timeout
+    <braunr> my solution is to add a flag that says whether a message was
+      actually sent, and lock around sending the message, so that the thread
+      resume can accurately tell in which state it is
+    <braunr> and drain the message queue if needed
+    <pinotree> i see, race between the "i stop blocking because of timeout" and
+      "i stop because i got a message" with the actual check for the real cause
+    <braunr> locking around mach_msg may seem overkill but it's not in
+      practice, since there can only be one message at most in the message
+      queue
+    <braunr> and i checked that in practice by limiting the message queue size
+      and check for such errors
+    <braunr> but again, it would be far better with mutexes only, and no spin
+      locks
+    <braunr> i wondered for a long time why the load average was so high on the
+      hurd under even "light" loads
+    <braunr> now i know :)
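The fix described in `libpthread.mdwn` for the `__pthread_timedblock` race (a flag recording whether a wakeup message was actually sent, set under a lock, plus draining the queue when needed) can be modelled as follows. This is a deliberately simplified sketch, not the libpthread or cthreads code: `struct sleeper`, `wakeup` and `resume_after_timeout` are hypothetical, a counter stands in for the Mach message queue, a POSIX mutex stands in for whatever lock the real code uses, and the guard against sending twice is an assumption. Removing the sleeper from the waiters list after a genuine timeout is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-thread wakeup state. */
struct sleeper {
    pthread_mutex_t lock;
    bool msg_delivered;   /* set by the waker, under the lock */
    int queued;           /* messages pending in the queue (0 or 1) */
};

/* Waker side: set the flag and send the message under the same lock. */
static void wakeup(struct sleeper *s)
{
    pthread_mutex_lock(&s->lock);
    if (!s->msg_delivered) {      /* assumption: never send twice */
        s->msg_delivered = true;
        s->queued++;              /* stands in for sending the message */
    }
    pthread_mutex_unlock(&s->lock);
}

/*
 * Sleeper side, called when the timed receive returned with a timeout.
 * Returns true if a wakeup raced with the timeout, false for a genuine
 * timeout.  Either way the queue is empty and the flag is clear on return.
 */
static bool resume_after_timeout(struct sleeper *s)
{
    bool woken;

    pthread_mutex_lock(&s->lock);
    woken = s->msg_delivered;
    if (woken) {
        assert(s->queued == 1);
        s->queued--;              /* drain the message that arrived late */
        s->msg_delivered = false;
    }
    pthread_mutex_unlock(&s->lock);
    return woken;
}
```

Because the flag and the send happen under one lock, a sleeper that times out sees either "no message was sent" or "a message is in the queue and must be drained", never a state where a stale message is left behind for the next block.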
diff --git a/open_issues/netstat.mdwn b/open_issues/netstat.mdwn
new file mode 100644
index 00000000..b575ea7f
--- /dev/null
+++ b/open_issues/netstat.mdwn
@@ -0,0 +1,34 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_hurd open_issue_porting]]
+
+
+# IRC, freenode, #hurd, 2012-12-06
+
+    <braunr> we need a netstat command
+    <pinotree> wouldn't that require rpcs and notifications in pfinet to get
+      info on the known sockets?
+    <braunr> depends on the interface
+    <braunr> netstat currently uses /proc/net/* so that's out of the question
+    <braunr> but a bsd netstat using ioctls could do the job
+    <braunr> i'm not sure if it's done that way
+    <braunr> i don't see why it would require notifications though
+    <pinotree> if add such rpcs to pfinet, you could show the sockets in procfs
+    <braunr> yes
+    <braunr> that's the clean way :p
+    <braunr> but why notifications ?
+    <pinotree> to get changes on data of sockets (status change, i/o activity,
+      etc)
+    <pinotree> (possibly i'm forgetting some already there features to know
+      that)
+    <braunr> the socket state is centralized in pfinet
+    <braunr> netstat polls it
+    <braunr> (or asks it once)
diff --git a/open_issues/performance.mdwn b/open_issues/performance.mdwn
index 8147e5eb..ae05e128 100644
--- a/open_issues/performance.mdwn
+++ b/open_issues/performance.mdwn
@@ -83,6 +83,109 @@ call|/glibc/fork]]'s case.
     <antrik> ouch
 
 
+## [[!message-id "20121202101508.GA30541@mail.sceen.net"]]
+
+
+## IRC, freenode, #hurd, 2012-12-04
+
+    <damo22> why do some people think hurd is slow? i find it works well even
+      under heavy load inside a virtual machine
+    <braunr> damo22: the virtual machine actually assists the hurd a lot :p
+    <braunr> but even with that, the hurd is a slow system
+    <damo22> i would have thought it would have the potential to be very fast,
+      considering the model of the kernel
+    <braunr> the design implies by definition more overhead, but the true cause
+      is more than 15 years without optimization on the core components
+    <braunr> how so ?
+    <damo22> since there are less layers of code between the hardware bare
+      metal and the application that users run
+    <braunr> how so ? :)
+    <braunr> it's the contrary actually
+    <damo22> VFS -> IPC -> scheduler -> device drivers -> hardware
+    <damo22> that is monolithic
+    <braunr> well, it's not really meaningful
+    <braunr> and i'd say the same applies for a microkernel system
+    <damo22> if the application can talk directly to hardware through the
+      kernel it's almost like plugging directly into the hardware
+    <braunr> you never talk directly to hardware
+    <braunr> you talk to servers instead of the kernel
+    <damo22> ah
+    <braunr> consider monolithic kernel systems like systems with one big
+      server
+    <braunr> the kernel
+    <braunr> whereas a multiserver system is a kernel and many servers
+    <braunr> you still need the VFS to identify your service (and thus your
+      server)
+    <braunr> you need much more IPC, since system calls are "replaced" with RPC
+    <braunr> the scheduler is basically the same
+    <damo22> okay
+    <braunr> device drivers are similar too, except they run in thread context
+      (which is usually a bit heavier)
+    <damo22> but you can do cool things like report when an interrupt line is
+      blocked
+    <braunr> and there are many context switches between all that
+    <braunr> you can do all that in a monolithic kernel too, and faster
+    <braunr> but it's far more elegant, and (when well done) easy to do on a
+      microkernel based system
+    <damo22> yes
+    <damo22> i like elegant, makes coding easier if you know the basics
+    <braunr> there are only two major differences between a monolithic kernel
+      and a multiserver microkernel system
+    * damo22 listens
+    <braunr> 1/ independence of location (your resources could be anywhere)
+    <braunr> 2/ separation of address spaces (your servers have their own
+      addresses)
+    <damo22> wow
+    <braunr> these both imply additional layers of indirection, making the
+      system as a whole slower
+    <damo22> but it would be far more secure though i suspect
+    <braunr> yes
+    <braunr> and reliable
+    <braunr> that's why systems like qnx were usually adopted for critical
+      tasks
+    <damo22> security and reliability are very important, i would switch to the
+      hurd if it supported all the hardware i use 
+    <braunr> so would i :)
+    <braunr> but performance matters too
+    <damo22> not to me
+    <braunr> it should :p
+    <braunr> it really does matter a lot in practice
+    <damo22> i mean, a 2x slowdown compared to linux would not affect me
+    <damo22> if it had all the benefits we mentioned above
+    <braunr> but the hurd is really slow for other reasons than its additional
+      layers of indirection unfortunately
+    <damo22> is it because of lack of optimisation in the core code?
+    <braunr> we're working on these issues, but it's not easy and takes a lot
+      of time :p
+    <damo22> like you said
+    <braunr> yes
+    <braunr> and also because of some fundamental design choices related to the
+      microkernel back in the 80s
+    <damo22> what about the darwin system
+    <damo22> it uses a mach kernel?
+    <braunr> yes
+    <damo22> what is stopping someone taking the MIT code from darwin and
+      creating a monster free OS
+    <braunr> what for ?
+    <damo22> because it already has hardware support
+    <damo22> and a mach kernel
+    <braunr> in kernel drivers ?
+    <damo22> it has kernel extensions
+    <damo22> you can do things like kextload module
+    <braunr> first, being a mach kernel doesn't make it compatible or even
+      easily usable with the hurd, the interfaces have evolved independently
+    <braunr> and second, we really do want more stuff out of the kernel
+    <braunr> drivers in particular
+    <damo22> may i ask why you are very keen to have drivers out of kernel?
+    <braunr> for the same reason we want other system services out of the
+      kernel
+    <braunr> security, reliability, etc..
+    <braunr> ease of debugging
+    <braunr> the ability to restart drivers separately, without restarting the
+      kernel
+    <damo22> i see
+
+
 # IRC, freenode, #hurd, 2012-09-13
 
 {{$news/2011-q2#phoronix-3}}.
diff --git a/open_issues/robustness.mdwn b/open_issues/robustness.mdwn
index d32bd509..1f8aa0c6 100644
--- a/open_issues/robustness.mdwn
+++ b/open_issues/robustness.mdwn
@@ -62,3 +62,68 @@ License|/fdl]]."]]"""]]
     <antrik> well, I'm not aware of the Minix implementation working across
       reboots. the one I have in mind based on a generic session management
       infrastructure should though :-)
+
+
+## IRC, freenode, #hurd, 2012-12-06
+
+    <Tekk_> out of curiosity, would it be possible to strap on a resurrection
+      server to hurd?
+    <Tekk_> in the future, that is
+    <braunr> sure
+    <Tekk_> cool :)
+    <braunr> but this requires things like persistence
+    <spiderweb> like a reincarnation server?
+    <braunr> it's a lot of work, with non-negligible overhead
+    <Tekk_> spiderweb: yes, exactly. I didn't remember tanenbaum's wording on
+      that
+    <braunr> i'm pretty sure most people would be against that
+    <spiderweb> braunr: why so?
+    <Tekk_> it was actually the feature that convinced me that ukernels were a
+      good idea
+    <Tekk_> spiderweb: because then you need a process that keeps track of all
+      the other servers
+    <Tekk_> and they have to be replying to "useless" pings to see if they're
+      still alive
+    <braunr> spiderweb: the hurd community isn't looking for a system reliable
+      in critical environments
+    <braunr> just a general purpose system
+    <braunr> and persistence requires regular data saves
+    <braunr> it's expensive
+    <Tekk_> as well as that
+    <braunr> we already have performance problems because of the nature of the
+      system, adding more without really looking for the benefits is useless
+    <spiderweb> so you can't theoretically have both?
+    <braunr> persistence and performance ?
+    <braunr> it's hard
+    <Tekk_> spiderweb: you need to modify the other translators to be
+      persistent
+    <braunr> only the ones you care about actually
+    <braunr> but it's just better to make the critical servers very stable
+    <Tekk_> so it's not just turning on and off the reincarnation
+    <braunr> (there isn't that much code there)
+    <braunr> and the other servers restartable
+    <mcsim> braunr: I think that if there will be aim to make something like
+      resurrection server then it will be needed rewrite most servers to make
+      them stateless, isn't it?
+    <braunr> that's a lot easier and already works with non essential passive
+      translators
+    <Tekk_> mcsim: pretty much
+    <braunr> mcsim: only those you care about
+    <braunr> mcsim: the proc auth exec servers for example, perhaps the file
+      system servers that can act as root fs, but the others would simply be
+      restarted by the passive translator mechanism
+    <spiderweb> what about restarting device drivers, that would be simple
+      right?
+    <braunr> that's perfectly doable, yes
+    <spiderweb> (being an OS newbie) - it does seem to me that the whole
+      reincarnation server concept could quite possibly be a band aid.
+    <braunr> spiderweb: no it really works
+    <braunr> many systems do that actually
+    <braunr> let me give you a link
+    <braunr>
+      http://ftp.sceen.net/curios_improving_reliability_through_operating_system_structure.pdf
+    <braunr> it's a bit old, but there is a review of systems aiming at
+      resilience and how they achieve part of it
+    <spiderweb> neat, thanks
+    <braunr> actually it's not that old at all
+    <braunr> around 2007
diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index 12807e11..778af530 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -1503,6 +1503,134 @@ IRC, unknown channel, unknown date:
 [[Term_blocking]].
 
 
+# IRC, freenode, #hurd, 2012-12-05
+
+    <braunr> well if i'm unable to build my own packages, i'll send you the one
+      line patch i wrote that fixes select/poll for the case where there is
+      only one descriptor
+    <braunr> (the current code calls mach_msg twice, each time with the same
+      timeout, doubling the total wait time when there is no event)
+
+
+## IRC, freenode, #hurd, 2012-12-06
+
+    <braunr> damn, my eglibc patch breaks select :x
+    <braunr> i guess i'll just simplify the code by using the same path for
+      both single fd and multiple fd calls
+    <braunr> at least, the patch does fix the case i wanted it to .. :)
+    <braunr> htop and ping act at the right regular interval
+    <braunr> my select patch is :
+    <braunr>    /* Now wait for reply messages.  */
+    <braunr> -  if (!err && got == 0)
+    <braunr> +  if (!err && got == 0 && firstfd != -1 && firstfd != lastfd)
+    <braunr> basically, when there is a single fd, the code calls io_select
+      with a timeout
+    <braunr> and later calls mach_msg with the same timeout
+    <braunr> effectively making the maximum wait time twice what it should be
+    <pinotree> ouch
+    <braunr> which is why htop and ping are "laggy"
+    <braunr> and perhaps also why fakeroot is when building libc
+    <braunr> well
+    <braunr> when building packages
+    <braunr> my patch avoids entering the mach_msg call if there is only one fd
+    <braunr> (my failed attempt didn't have the firstfd != -1 check, leading to
+      the 0 fd case skipping mach_msg too, which is wrong since in that case
+      there is just no wait, making applications use select/poll for sleeping
+      consume all cpu)
+
+    <braunr> the second is a fix in select (yet another) for the case where a
+      single fd is passed
+    <braunr> in which case there is one timeout directly passed in the
+      io_select call, but then yet another in the mach_msg call that waits for
+      replies
+    <braunr> this can account for the slowness of a bunch of select/poll users
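
The timeout accounting described in this log can be sketched as a small model. This is only an illustration under stated assumptions, not the glibc `_hurd_select` code: the function name `worst_case_wait_ms` and its parameters are hypothetical, and the logic follows the behaviour described in the discussion (a single descriptor skips the reply wait, zero descriptors still wait) rather than reproducing the quoted condition verbatim.

```c
#include <stdbool.h>

/* Hypothetical model of the timeout accounting discussed above; this is not
   the glibc _hurd_select code.  Returns the worst-case wait in milliseconds
   when no descriptor ever becomes ready.  */
unsigned int
worst_case_wait_ms (int nfds, unsigned int to, bool patched)
{
  unsigned int waited = 0;
  bool single = (nfds == 1);

  /* Request step: per the log, io_select is given the timeout only when
     there is a single descriptor; otherwise it is sent with 0.  */
  if (single)
    waited += to;

  /* Reply step: the unpatched code always waits in mach_msg with the same
     timeout.  The fix skips this wait for a single descriptor, but keeps it
     for zero descriptors so that select/poll still works as a sleep.  */
  if (!patched || !single)
    waited += to;

  return waited;
}
```

With a 1000 ms timeout, the model gives 2000 ms for a single descriptor before the fix and 1000 ms after it. The zero-descriptor and multiple-descriptor cases stay at 1000 ms either way, which matches the remark that skipping the reply wait for zero descriptors would turn a sleep into a busy loop.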
+
+
+## IRC, freenode, #hurd, 2012-12-07
+
+    <braunr> finally, my select patch works :)
+
+
+## IRC, freenode, #hurd, 2012-12-08
+
+    <braunr> for those interested, i pushed my eglibc packages that include
+      this little select/poll timeout fix on my debian repository
+    <braunr> deb http://ftp.sceen.net/debian-hurd experimental/
+    <braunr> reports are welcome, i'm especially interested in potential
+      regressions
+
+
+## IRC, freenode, #hurd, 2012-12-10
+
+    <gnu_srs> I have verified your double timeout bug in hurdselect.c.
+    <gnu_srs>  Since I'm also working on hurdselect I have a few questions
+      about where the timeouts in mach_msg and io_select are implemented.
+    <gnu_srs> Have a big problem to trace them down to actual code: mig magic
+      again?
+    <braunr> yes
+    <braunr> see hurd/io.defs, io_select includes a waittime timeout:
+      natural_t; parameter
+    <braunr> waittime is mig magic that tells the client side not to wait more
+      than the timeout
+    <braunr> and in _hurd_select, you can see these lines :
+    <braunr>             err = __io_select (d[i].io_port, d[i].reply_port,
+    <braunr>                                /* Poll only if there's a single
+      descriptor.  */
+    <braunr>                                (firstfd == lastfd) ? to : 0,
+    <braunr> to being the timeout previously computed
+    <braunr> "to"
+    <braunr> and later, when waiting for replies :
+    <braunr>       while ((msgerr = __mach_msg (&msg.head,
+    <braunr>                                    MACH_RCV_MSG | options,
+    <braunr>                                    0, sizeof msg, portset, to,
+    <braunr>                                    MACH_PORT_NULL)) ==
+      MACH_MSG_SUCCESS)
+    <braunr> the same timeout is used
+    <braunr> hope it helps
+    <gnu_srs> Additional stuff on io-select question is at
+      http://paste.debian.net/215401/
+    <gnu_srs> Sorry, should have posted it before you comment, but was
+      disturbed.
+    <braunr> 14:13 < braunr> waittime is mig magic that tells the client side
+      not to wait more than the timeout
+    <braunr> the waittime argument is a client argument only
+    <braunr> that's one of the main sources of problems with select/poll, and
+      the one i fixed 6 months ago
+    <gnu_srs> so there is no relation to the third argument of the client call
+      and the third argument of the server code?
+    <braunr> no
+    <braunr> the 3rd argument at server side is undoubtedly the 4th at client
+      side here
+    <gnu_srs> but for the fourth argument there is?
+    <braunr> i think i've just answered that
+    <braunr> when in doubt, check the code generated by mig when building glibc
+    <gnu_srs> as I said before, I have verified the timeout bug you solved.
+    <gnu_srs> which code to look for RPC_*?
+    <braunr> should be easy to guess
+    <gnu_srs> is it the same with mach_msg()? No explicit usage of the timeout
+      there either.
+    <gnu_srs> in the code for the function I mean.
+    <braunr> gnu_srs: mach_msg is a low level system call
+    <braunr> see
+      http://www.gnu.org/software/hurd/gnumach-doc/Mach-Message-Call.html#Mach-Message-Call
+    <gnu_srs> found the definition of __io_select in: RPC_io_select.c, thanks.
+    <gnu_srs> so the client code to look for wrt RPC_ is in hurd/*.defs? what
+      about the gnumach/*/include/*.defs?
+    <gnu_srs> a final question: why use a timeout if there is a single FD for
+      the __io_select call, not when there are more than one?
+    <braunr> well, the code is obviously buggy, so don't expect me to justify
+      wrong code
+    <braunr> but i suppose the idea was : if there is only one fd, perform a
+      classical synchronous RPC, whereas if there are more use a heavyweight
+      portset and additional code to receive replies
+
+    <youpi> exim4 didn't get fixed by the libc patch, unfortunately
+    <braunr> yes i noticed
+    <braunr> gdb can't attach correctly to exim, so it's probably something
+      completely different
+    <braunr> i'll try the non intrusive mode
+
+
 # See Also
 
 See also [[select_bogus_fd]] and [[select_vs_signals]].