IRC.

author: Thomas Schwinge <thomas@schwinge.name> 2011-05-09 10:47:56 +0200
committer: Thomas Schwinge <thomas@schwinge.name> 2011-05-09 10:47:56 +0200
commit: 2bc136e680877b6a9d17d6a0e815b47775088d67
tree: 21400fef6b3d6e6f59c4a504038348da78397264 /open_issues
parent: 946dbc8338a431b78e4a7b25d24fda36ee4cadf3
14 files changed, 745 insertions, 7 deletions
diff --git a/open_issues/address_space_memory_mapping_entries.mdwn b/open_issues/address_space_memory_mapping_entries.mdwn
new file mode 100644
index 00000000..caf447dd
--- /dev/null
+++ b/open_issues/address_space_memory_mapping_entries.mdwn
@@ -0,0 +1,19 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach]]
+
+IRC, freenode, #hurd, 2011-05-07
+
+    <braunr> and as a last example: memory mapping is heavily used in the hurd,
+      but for some reason, the map entries in an address space are still on a
+      linked list
+    <braunr> a bare linked list
+    <braunr> which makes faults and page cache lookups even slower
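The cost braunr describes can be illustrated outside the kernel. The following is a minimal sketch, not GNU Mach code: `ENTRIES`, `lookup_linear` and `lookup_sorted` are hypothetical names, and a sorted array with binary search stands in for the balanced tree that a kernel would more likely use.

```python
import bisect

# Hypothetical map entries: sorted, non-overlapping [start, end) address ranges.
ENTRIES = [(0x1000, 0x3000), (0x8000, 0x9000), (0x20000, 0x24000)]
STARTS = [start for start, _ in ENTRIES]


def lookup_linear(addr):
    """Walk the entries in order, as a bare linked list forces: O(n)."""
    for start, end in ENTRIES:
        if start <= addr < end:
            return (start, end)
    return None


def lookup_sorted(addr):
    """Binary search on the start addresses: O(log n), like a balanced tree."""
    # Index of the last entry whose start is <= addr, or -1 if there is none.
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0 and addr < ENTRIES[i][1]:
        return ENTRIES[i]
    return None
```

Both functions return the same entry for a mapped address and `None` for a hole; only the number of entries visited per page fault differs.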
diff --git a/open_issues/ext2fs_page_cache_swapping_leak.mdwn b/open_issues/ext2fs_page_cache_swapping_leak.mdwn
index 607c3af4..c0d0867b 100644
--- a/open_issues/ext2fs_page_cache_swapping_leak.mdwn
+++ b/open_issues/ext2fs_page_cache_swapping_leak.mdwn
@@ -149,3 +149,27 @@ IRC, freenode, #hurd, 2011-04-18
     <antrik> this make testing this stuff quite a lot harder... [sigh]
     <antrik> any suggestions how to debug this hang?
     <braunr> antrik: no :/
+
+2011-04-28: [[!taglink open_issue_documentation]]
+
+    <antrik> hm... is it normal that "swap free" doesn't increase as a process'
+      memory is paged back in?
+    <youpi> yes
+    <youpi> there's no real use cleaning swap
+    <youpi> on the contrary, it makes paging the process out again longer
+    <antrik> hm... so essentially, after swapping back and forth a bit, a part
+      of the swap equal to the size of physical RAM will be occupied with stuff
+      that is actually in RAM?
+    <youpi> yes
+    <youpi> so that that RAM can be freed immediately if needed
+    <antrik> hm... that means my effective swap size is only like 300 MB... no
+      wonder I see crashes under load
+    <antrik> err... make that 230 actually
+    <antrik> indeed, quitting the application freed both the physical RAM and
+      swap space
+    <braunr> 02:28 < antrik> hm... is it normal that "swap free" doesn't
+      increase as a process' memory is paged back in?
+    <braunr> swap is the backing store of anonymous memory, like ext2fs is the
+      backing store of memory objects created from its pager
+    <braunr> so you can view swap as the file system for everything that isn't
+      an external memory object
diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn
index 1b897454..a5dd6955 100644
--- a/open_issues/gnumach_memory_management.mdwn
+++ b/open_issues/gnumach_memory_management.mdwn
@@ -772,3 +772,76 @@ IRC, freenode, #hurd, 2011-04-12:
     <braunr> FreeBSD uses a binary buddy system like Linux
     <braunr> the fact that the kernel allocator uses virtual memory doesn't
       mean the kernel has no mean to allocate contiguous physical memory ...
+
+2011-05-02
+
+    <braunr> hm nice, my allocator uses less memory than glibc (squeeze
+      version) on both 32 and 64 bits systems
+    <braunr> the new per-cpu layer is proving effective
+    <neal> braunr: Are you reimplementing malloc?
+    <braunr> no
+    <braunr> it's still the slab allocator for mach, but tested in userspace
+    <braunr> so i wrote malloc wrappers
+    <neal> Oh.
+    <braunr> i try to heavily test most of my code in userspace now
+    <neal> it's easier :-)
+    <neal> I agree
+    <braunr> even the physical memory allocator has been implemented this way
+    <neal> is this your mach version?
+    <braunr> virtual memory allocation will follow
+    <neal> or are you working on gnu mach?
+    <braunr> for now it's my version
+    <braunr> but i intend to spend the summer working on ipc port names
+      management
+
+[[rework_gnumach_IPC_spaces]].
+
+    <braunr> and integrate the result in gnu mach
+    <neal> are you keeping the same user-space API?
+    <neal> Or are you experimenting with something new?
+    <antrik> braunr: to be fair, it's not terribly hard to use less memory than
+      glibc :-)
+    <braunr> yes
+    <braunr> antrik: well ptmalloc3 received some nice improvements
+    <braunr> neal: the goal is to rework some of the internals only
+    <braunr> neal: namely, i simply intend to replace the splay tree with a
+      radix tree
+    <antrik> braunr: the glibc allocator is emphasising performance, unlike some
+      other allocators that trade some performance for much better memory
+      utilisation...
+    <antrik> ptmalloc3?
+    <braunr> that's the allocator used in glibc
+    <braunr> http://www.malloc.de/en/
+    <antrik> OK. haven't seen any recent numbers... the comparison I have in
+      mind is many years old...
+    <braunr> i also made some additions to my avl and red-black trees this week
+      end, which finally make them suitable for almost all generic uses
+    <braunr> the red-black tree could be used in e.g. gnu mach to augment the
+      linked list used in vm maps
+    <braunr> which is what's done in most modern systems
+    <braunr> it could also be used to drop the overloaded (and probably over
+      imbalanced) page cache hash table
+
+2011-05-03
+
+    <mcsim> antrik: How should I start porting? Have I just include rbraun's
+      allocator to gnumach and make it compile?
+    <antrik> mcsim: well, basically yes I guess... but you will have to look at
+      the code in question first before we know anything more specific :-)
+    <antrik> I guess braunr might know better how to start, but he doesn't
+      appear to be here :-(
+    <braunr> mcsim: you can't just put my code into gnu mach and make it run,
+      it really requires a few careful changes
+    <braunr> mcsim: you will have to analyse how the current zone allocator
+      interacts with regard to locking
+    <braunr> if it is used in interrupt handlers
+    <braunr> what kind of locks it should use instead of the pthread stuff
+      available in userspace
+    <braunr> you will have to change the reclaiming policy, so that caches are
+      reaped on demand
+    <braunr> (this basically boils down to calling the new reclaiming function
+      instead of zone_gc())
+    <braunr> you must be careful about types too
+    <braunr> there is work to be done ;)
+    <braunr> (not to mention the obvious about replacing all the calls to the
+      zone allocator, and testing/debugging afterwards)
diff --git a/open_issues/keymap_mach_console.mdwn b/open_issues/keymap_mach_console.mdwn
new file mode 100644
index 00000000..3063dd00
--- /dev/null
+++ b/open_issues/keymap_mach_console.mdwn
@@ -0,0 +1,40 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+IRC, freenode, #hurd, 2011-04-26
+
+    <guillem> pavkac: btw are you aware there's already some code to change the
+      keymap for the mach console (I think originally from the hurdfr guys, but
+      I cannot remember exactly from where I got it from :/)
+    <guillem> pavkac: http://www.hadrons.org/~guillem/tmp/hurd-keymap.tgz
+    <pavkac> guillem: No, I didn't know. I'll diff it and try to follow.
+    <guillem> pavkac: it would be nice to maybe integrate it properly into the
+      hurd
+    <guillem> you'll see the code is pretty basic, so extending it would be
+      nice too I guess :)
+    <pavkac> guillem: OK, I'll see to it. Unfortunately I'm quite busy this
+      week. Have a lot of homeworks to school. :/
+    <pavkac> guillem: But, I'll find some time during weekend.
+    <youpi> maybe it'd be simpler to add it to the hurd package and use that
+      from the console-setup package indeed
+    <youpi> but copyright issues should be solved
+    <youpi> unless we simply put this into hurdextras
+    <guillem> ok found this:
+      http://www.mail-archive.com/debian-hurd@lists.debian.org/msg02456.html
+    <guillem> and
+      http://www.mail-archive.com/debian-hurd@lists.debian.org/msg01173.html
+    <guillem> which seems to be the original Mark's code
+    <guillem> AFAIR I contributed the Spanish keymap and some additional
+      key definitions for loadkeys
+    <guillem> and http://lists.debian.org/debian-hurd/2000/10/msg00130.html
+    <pavkac> I've fetched all. :) But I must leave, good night if you're in
+      Europe. :)
+    <guillem> pavkac: the tarball I provided should be the latest, the others
+      are mostly to track the provenance of the source
diff --git a/open_issues/pflocal_reauth.mdwn b/open_issues/pflocal_reauth.mdwn
new file mode 100644
index 00000000..839e383d
--- /dev/null
+++ b/open_issues/pflocal_reauth.mdwn
@@ -0,0 +1,39 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_hurd]]
+
+IRC, freenode, #hurd, 2011-04-02
+
+    <pinotree> youpi: i'm playing with pflocal, and noticing that a simple C
+      executable doesn't trigger reauthenticate
+    <pinotree> youpi: i've put a debug output (to file) in S_io_reauthenticate,
+      and with a simple C test (which uses unix sockets) it isn't called
+    <youpi> pinotree: it seems pflocal should return FS_RETRY_REAUTH in
+      retry_type
+    <youpi> to make glibc call reauthentication
+    <pinotree> pflocal?
+    <youpi> yes, in the dir_lookup handler
+    <pinotree> isn't that ext2fs?
+    <youpi> libtrivfs had dir_lookup() too
+    <youpi> trivfs_check_open_hook can be used to tweak its behavior
+    <pinotree> ah, missed that pflocal was using libtrivfs, sorry
+    <youpi> there are probably very few translators which don't use one of the
+      lib*fs :)
+    <antrik> pinotree: what are you trying to do with pflocal?
+    <pinotree> local socket credentials (SCM_CREDS)
+    <antrik> ah
+    <antrik> don't really know what that is, but I remember reading some
+      mention of it ;-)
+
+---
+
+See also [[pflocal_socket_credentials_for_local_sockets]] and
+[[sendmsg_scm_creds]].
diff --git a/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn b/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn
index 5a71412e..dfdc213c 100644
--- a/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn
+++ b/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn
@@ -40,3 +40,7 @@ IRC, freenode, #hurd, 2011-03-28
       S_io_reauthenticate cached in the sock_user struct?
     <youpi> yes
     <pinotree> nice thanks, i will try that change first
+
+---
+
+See also [[pflocal_reauth]] and [[sendmsg_scm_creds]].
diff --git a/open_issues/python.mdwn b/open_issues/python.mdwn
index 34fa81f6..403ff8aa 100644
--- a/open_issues/python.mdwn
+++ b/open_issues/python.mdwn
@@ -27,6 +27,11 @@ First, make the language functional, have its test suite pass without errors.
 
 [[!inline pages=community/gsoc/project_ideas/perl_python feeds=no]]
 
+
+## Analysis
+
+  * [[select_bogus_fd]]
+
 ---
 
 
diff --git a/open_issues/rework_gnumach_ipc_spaces.mdwn b/open_issues/rework_gnumach_ipc_spaces.mdwn
index 5bf0c530..b7cda227 100644
--- a/open_issues/rework_gnumach_ipc_spaces.mdwn
+++ b/open_issues/rework_gnumach_ipc_spaces.mdwn
@@ -10,6 +10,14 @@ License|/fdl]]."]]"""]]
 
 [[!tag open_issue_gnumach]]
 
+IRC, freenode, #hurd, 2011-05-07
+
+    <braunr> things that are referred to as "system calls" in glibc are
+      actually RPCs to the kernel or other tasks, those RPCs have to look up
+      port rights
+    <braunr> the main services have tens of thousands of ports, looking up one
+      is slow
+
 There is a [[!FF_project 268]][[!tag bounty]] on this task.
 
 IRC, freenode, #hurd, 2011-04-23
@@ -241,3 +249,72 @@ IRC, freenode, #hurd, 2011-04-23
     <braunr> so a radix ree would be the most efficient
     <antrik> well, if some processes really feel they must use random numbers
       for port names, they *ought* to be penalized ;-)
+
+2011-04-27
+
+    <braunr> antrik: remember when you asked why high numbers would be a
+      problem with radix trees ?
+    <braunr> here is a radix tree with one entry, which key is around 5000
+    <braunr> [  656.296412] tree height: 3
+    <braunr> [  656.296412] index:  0, level:  0, height:  3, count:  1,
+      bitmap: 0000000000000002
+    <braunr> [  656.296412] index:  1, level:  1, height:  2, count:  1,
+      bitmap: 0000000000004000
+    <braunr> [  656.296412] index: 14, level:  2, height:  1, count:  1,
+      bitmap: 0000000000000080
+    <braunr> three levels, each with an external node (dynamically allocated),
+      for one entry
+    <braunr> so in the worst case of entries with keys close to the highest
+      values, there could be many external nodes with higher path lengths than
+      when keys are close to 0
+    <braunr> which also brings the problem of port name allocation
+    <braunr> can someone with access to a buildd which has an uptime of at
+      least a few days (and did at least one build) show me the output of
+      portinfo 3 | tail ?
+    <braunr> port names are allocated linearly IIRC, like PIDs, and some parts
+      of the kernel may rely on them not being reused often
+    <braunr> but for maximum effifiency, they should be
+    <braunr> efficiency*
+    <braunr> 00:00 < braunr> can someone with access to a buildd which has an
+      uptime of at least a few days (and did at least one build) show me the
+      output of portinfo 3 | tail ?
+    <braunr> :)
+    <youpi> it's almost like wc -l
+    <youpi>   4905: receive
+    <youpi> vs 4647
+    <youpi> for /
+    <youpi>  52902: receive
+    <youpi> vs 52207
+    <youpi> for the chroot
+    <braunr> even after several builds ?
+    <braunr> and several days ?
+    <youpi> that's after 2 days
+    <youpi> it's not so many builds
+    <youpi> rossini is not so old
+    <youpi> (7h)
+    <youpi> but many builds
+    <youpi> 70927: send
+    <youpi> vs 70938
+    <braunr> ok
+    <braunr> so it seems port names are reused
+    <braunr> good
+    <youpi> yes they are clearly
+    <braunr> i think i remember a comment about why the same port name
+      shouldn't be reused too soon
+    <youpi> well, it could help catching programming errors
+    <braunr> that it helped catch bugs in applications that could
+      deallocate/reallote quickly
+    <braunr> reallocate*
+    <braunr> without carefuly synchronization
+    <braunr> careful
+    <braunr> damn, i'm tired :/
+    <youpi> but that's about debugging
+    <youpi> so we don't care about performance there
+    <braunr> yes
+    <braunr> i'll try to improve allocation performance too
+    <braunr> using e.g. bitmaps in each external node back to the root so that
+      unused slots are quickly found
+    <braunr> i think that's what idr does in linux
+    <antrik> braunr: idr?
+    <braunr> antrik: a data structure used to map integers to pointers
+    <braunr> http://fxr.watson.org/fxr/source/lib/idr.c?v=linux-2.6
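The three-level dump braunr pasted can be reproduced with a little arithmetic. The sketch below is an illustration only, assuming 64 slots per node (6 key bits per level), which is inferred from the 16-hex-digit bitmaps and not stated in the log; under that assumption the key consistent with the dump is 4999. `tree_height`, `slot_path` and `bitmaps` are hypothetical helper names.

```python
BITS_PER_LEVEL = 6            # assumption: 64 slots per node
SLOTS = 1 << BITS_PER_LEVEL


def tree_height(key):
    """Number of levels needed before `key` fits in the tree."""
    height = 1
    while key >= SLOTS ** height:
        height += 1
    return height


def slot_path(key):
    """Slot index used at each level, from the root down to the leaf level."""
    height = tree_height(key)
    return [
        (key >> (BITS_PER_LEVEL * (height - 1 - level))) & (SLOTS - 1)
        for level in range(height)
    ]


def bitmaps(key):
    """Per-level occupancy bitmap of a tree that holds only `key`."""
    return ["%016x" % (1 << slot) for slot in slot_path(key)]
```

With these assumptions, `slot_path(4999)` is `[1, 14, 7]` (1 * 4096 + 14 * 64 + 7 = 4999), which matches the `index` and `bitmap` columns in the dump. A key below 64 needs a single level, which is why names clustered near 0 keep the tree shallow.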
diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index ab6af90b..0f750631 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]]
 
 [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
 id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -12,12 +12,23 @@ License|/fdl]]."]]"""]]
 
 There are a lot of reports about this issue, but no thorough analysis.
 
----
+
+# `elinks`
 
 IRC, unknown channel, unknown date.
 
-    <paakku> This is related to ELinks... I've looked at the select() implementation for the Hurd in glibc and it seems that giving it a short timeout could cause it not to report that file descriptors are ready.
-    <paakku> It sends a request to the Mach port of each file descriptor and then waits for responses from the servers.
-    <paakku> Even if the file descriptors have data for reading or are ready for writing, the server processes might not respond immediately.
-    <paakku> So if I want ELinks to check which file descriptors are ready, how long should the timeout be in order to ensure that all servers can respond in time?
+    <paakku> This is related to ELinks... I've looked at the select()
+      implementation for the Hurd in glibc and it seems that giving it a short
+      timeout could cause it not to report that file descriptors are ready.
+    <paakku> It sends a request to the Mach port of each file descriptor and
+      then waits for responses from the servers.
+    <paakku> Even if the file descriptors have data for reading or are ready
+      for writing, the server processes might not respond immediately.
+    <paakku> So if I want ELinks to check which file descriptors are ready, how
+      long should the timeout be in order to ensure that all servers can
+      respond in time?
     <paakku> Or do I just imagine this problem?
+
+---
+
+See also [[select_bogus_fd]] and [[select_vs_signals]].
diff --git a/open_issues/select_bogus_fd.mdwn b/open_issues/select_bogus_fd.mdwn
new file mode 100644
index 00000000..17aced4a
--- /dev/null
+++ b/open_issues/select_bogus_fd.mdwn
@@ -0,0 +1,55 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc]]
+
+
+# Python
+
+IRC, freenode, #hurd, 2011-04-13
+
+    <abeaumont> ok, cause of first python testsuite failure located, now the
+      hard part, how to best fix it :)
+    <abeaumont> how to redesign the code to avoid the problem... that's the
+      hard part, mostly cause i lack contextual info
+    <abeaumont> tschwinge: the problem is pretty much summarized by this
+      comment in _hurd_select (in glibc): /* If one descriptor is bogus, we
+      fail completely.  */
+    <pochu> does POSIX say anything about what to do if one fd is invalid?
+    <pochu> and the other question is why python is calling select() with an
+      invalid fd
+    <abeaumont> pochu: yep, it says it should not fail completely
+    <pochu> then that's our bug :)
+    <pinotree> abeaumont: just note that (at least on debian) some tests may
+      hang forever or cause hurd/mach to die
+    <pinotree> abeaumont: see in the debian/rules of the packaging of each
+      pythonX.Y source
+    <pinotree> ... there's a list of the tests excluded from the test suite run
+    <abeaumont> well, to be precise, python has a configure check for
+      'broken_poll' which hurd fails, and therefore python's select module is
+      not built, and anything depending on it fails
+    <abeaumont> broken_poll checks exactly for that posix requirement
+    <abeaumont> the reason for python using a non-existent
+      descriptor... unknown :D
+    <pochu> we should fix select to not fail miserably in that case
+    <pinotree> abeaumont: we have a patch to fix the broken poll check to
+      actually disable the poll module
+    <pochu> pinotree: but the proper fix is to fix select(), which is what
+      abeaumont is looking at
+    <abeaumont> pinotree: i'd say that's exactly what python's configure check
+      does itself -- disable building the select module
+    <pochu> abeaumont: what pinotree means is that the check is broken, see
+      http://patch-tracker.debian.org/patch/series/view/python2.6/2.6.6-8/hurd-broken-poll.diff
+    <pinotree> yes, the configure check for poll does the check, but not
+      everything of the poll module gets disabled (and you get a build failure)
+
+---
+
+See also [[select]] and [[select_vs_signals]].
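The behaviour that the `broken_poll` check looks for can be observed from Python itself on a system where `poll()` works. This is a minimal sketch, assuming the check boils down to "an invalid descriptor is reported per descriptor as `POLLNVAL` and does not make the whole call fail"; `poll_closed_fd` is a hypothetical helper name.

```python
import os
import select


def poll_closed_fd():
    """Poll a descriptor that is known to be closed and return what poll() says."""
    r, w = os.pipe()
    os.close(r)
    os.close(w)          # r is now an invalid descriptor number
    poller = select.poll()
    poller.register(r, select.POLLIN)
    # Zero timeout: return immediately with whatever is known right now.
    return r, poller.poll(0)
```

On a conforming system the call succeeds and the returned list holds one `(fd, select.POLLNVAL)` pair. The comment quoted from `_hurd_select` suggests that the Hurd implementation fails the whole call at this point.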
diff --git a/open_issues/select_vs_signals.mdwn b/open_issues/select_vs_signals.mdwn
new file mode 100644
index 00000000..bbd69d00
--- /dev/null
+++ b/open_issues/select_vs_signals.mdwn
@@ -0,0 +1,25 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc]]
+
+
+# `sudo`
+
+`sudo [task]` hangs after finishing `[task]`.
+
+IRC, freenode, #hurd, 2011-04-02
+
+    <youpi> the sudo bug is select() not being able to get interrupted by
+      signals
+
+---
+
+See also [[select]] and [[select_bogus_fd]].
diff --git a/open_issues/sendmsg_scm_creds.mdwn b/open_issues/sendmsg_scm_creds.mdwn
index 1f4de59c..2deec7e8 100644
--- a/open_issues/sendmsg_scm_creds.mdwn
+++ b/open_issues/sendmsg_scm_creds.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]]
 
 [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
 id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -89,3 +89,7 @@ IRC, unknown channel, unknown date.
     <youpi> (since it's just about letting the application reading from the message structure)
     <pinotree> yep
     <youpi> ok, good :)
+
+---
+
+See also [[pflocal_socket_credentials_for_local_sockets]] and [[pflocal_reauth]].
diff --git a/open_issues/sigpipe.mdwn b/open_issues/sigpipe.mdwn
new file mode 100644
index 00000000..0df3560e
--- /dev/null
+++ b/open_issues/sigpipe.mdwn
@@ -0,0 +1,345 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_hurd]]
+
+[[!GNU_Savannah_bug 461]]
+
+IRC, freenode, #hurd, 2011-04-20
+
+    <svante_> I found a problem from 2002 by Marcus Brinkmann that I think is
+      related to my problems: http://savannah.gnu.org/bugs/?461. He has a test
+      file called pipetest.c that shows that SIGPIPE is not triggered reliably.
+    <svante_> Cited from the bug report: The attached test program does not
+      trigger SIGPIPE reliably, because closing the read end of the pipe
+      happens asynchronously. The write can succeed because the read end might
+      not have been closed yet.
+    <svante_> I have debugged this program on both Hurd and Linux, and the
+      problem in Hurd remains:-(
+    <svante_> Anybody looked into the almost ten year old
+      bug:http://savannah.gnu.org/bugs/?461 this one is definitely related to
+      the build problems of e.g. ghc6 and ruby1.9.1. Should I mention this on
+      the ML?
+    <youpi> that could be it indeed
+    <youpi> does the bug still happen?
+    <azeem> depends on: new interface io_close
+    <azeem> which depends on: POSIX record locking
+    <svante_> youpi: Yes it does, I've tested the pipetest.c file submitted by
+      Marcus B on both Linux and Hurd
+    <azeem> that would've maybe been a nice GSOC task
+    <youpi> azeem: err, the contrary for posix record locking, non ?
+    <azeem> argh
+    <azeem> why would POSIX record locking depend on this?
+    <azeem> well anyway, then have POSIX record locking be a GSOC task :)
+    <azeem> I wasn't aware that would also fix ruby and ghc building :)
+    <youpi> http://permalink.gmane.org/gmane.os.hurd.devel.readers/265
+    <youpi> (for io_close stuff)
+    <youpi> http://comments.gmane.org/gmane.os.hurd.devel.readers/63 actually
+    <azeem> I guess if they didn't implement it/agreed on something back then
+      it'd be quite hard to do it now
+    <svante_> azeem: marcus recently showed up here. Maybe he can help out/has
+      ideas?
+    <azeem> well yeah
+    <azeem> but marcus was the junior guy back then
+    <azeem> <marcus> but it's a very hurdish solution (ie, complex, buggy, and
+      not implemented)
+    <azeem> maybe we can go for something simpler
+    <youpi> azeem: what is this quote about?
+    <azeem> don't remember
+    <azeem> not io_close I'd say
+
+2011-04-21
+
+    <antrik> svante_: why do you think the problem you see in ruby and ghc is
+      related to async close() ?
+
+2011-04-22
+
+    <svante_> Well: the test case I'm running on ruby is giving me an EBADF
+      after 8 successful loops, and tracing within eglibc points towards
+      __mutex_lock_solid or __spin_lock,  __spin_lock_solid from
+      mach/lock-intern.h from cthreads.
+
+2011-04-23
+
+    <antrik> srs1: yeah, I saw it... but I still wonder what makes you think
+      this is related to async FD closing?
+    <srs1> antrik: Every test case showing the problems are related to fd.h and
+      the functions there, especially the ones used in the function:
+      _HURD_FD_H_EXTERN_INLINE struct hurd_fd *_hurd_fd_get (int fd) and so is
+      the pipetest from Marcus too.
+    <srs1> I have not yet been able to trace further with gdb since most
+      variables are optimized out and adding print statements does not work, at
+      least not yet. Now I'm trying to build eglibc with -O1 to see if the
+      optimized out variables are there or not.
+    <youpi> srs1: he means the ghc6 issue
+    <youpi> (and the ruby issue)
+    <srs1> youpi: Yes, the ghc6 and ruby ends at the functions I mentioned in
+      fd.h
+    <srs1> Both ghc6 and ruby programs are writing to a file when the error
+      happens. If they are using a pipeline or not I don't know yet, I think it
+      is a regular file write.
+    <srs1> I can send you the ruby program if you like: It is a c-file so
+      debugging is possible. ghc6 is worse, since that program cannot be
+      debugged directly with gdb. 
+    <antrik> pipetest also results in the program hanging in locking stuff?...
+    <srs1> pipetest does not hang, but gives no output as it should. Running it
+      in gdb with single stepping shows the correct behavior, but then gdb
+      hangs if I try to single stepping further, continue at the right place
+      works!
+    <antrik> I haven't looked at the pipetest program. do you have the link
+      handy?
+    <antrik> never mind, got it
+    <antrik> srs1: that sounds like a GDB problem...
+    <youpi> most probably, yes
+    <youpi> (and I've always seen issues like this in gdb on hurd)
+    <antrik> actually I think it's expected... the RPC handling code has some
+      explicit GDB hooks AIUI; trying to single-step into this code is probably
+      expected to wreak havoc...
+    <youpi> well, it should have some sane behavior
+    <youpi> even if it's "skip to next point where it's debuggable"
+    <antrik> srs1: note that there is no BADF involved in the pipetest AIUI...
+
+2011-04-28
+
+    <antrik> what is the actual problem you are seeing BTW?
+    <gnu_srs1> antrik: in ruby the problem is: Exception `IOError' at
+      tool/mkconfig.rb:12 - closed stream
+    <gnu_srs1> Triggered by ruby:io.c:internal_read_func() calling
+      sysdeps/mach/hurd/read.c returning a negative number of bytes read. 
+    <abeaumont> gnu_srs1: why do you think that error is locking related?
+    <gnu_srs1> This happens after 8 iterations of the read loop with 8192 bytes
+      read each time.
+    <abeaumont> but that doesn't involve locking at all, does it?
+    <gnu_srs1> I think it is, if there is a pipeline set up??
+    <gnu_srs1> Also the ghc6 hang ends up in hangs in sysdeps/mach/hurd/read.c
+      traced into fd.h where all things happen (including setting locks and
+      mutexes)
+    <braunr> what locking ?
+    <braunr> stdio locking is different from file locking
+    <braunr> and a pipe doesn't imply file locking at all
+    <abeaumont> read may block on pipes, but it's unrelated to flock
+    <gnu_srs1> Look into the file fd.h, maybe you can describe things
+      better. I'm not fluent in this stuff.
+    <gnu_srs1> Does a pipe have a file descriptor associated to it? What about a
+      file read/write?
+    <abeaumont> a pipe provides 2 file descriptors, one for reading and another
+      one for writing
+    <abeaumont> i may give a look at that if i manage to build glibc
+      successfully...
+    <gnu_srs1> Take a look at the relevant code from fd.h:
+      http://pastebin.com/kesBpjy4
+    <abeaumont> the ruby error happens just trying to build ruby1.9?
+    <abeaumont> gnu_srs1: from what you said, the error occurs while reading,
+      so i don't see how it can be related to that code
+    <abeaumont> you already got a descriptor if you're reading from it
+    <gnu_srs1> I have not tried anything else than ruby1.9.1. I can send you
+      the ruby debug setup and files if you are interested.
+    <abeaumont> gnu_srs1: ok, i'll try to build ruby1.9.1 later... let's see if
+      i can build glibc first
+    <gnu_srs1>  abeaumont: well, the read suddenly returns -1 bytes read,
+      resulting in a file descriptor of -1 (instead of +3).
+    <abeaumont> gnu_srs1: i see
+    <antrik> gnu_srs1: are you sure the hang really happens in _hurd_fd_get()?
+      could you give us a backtrace?
+    <antrik> gnu_srs1: there are many reasons why read() can return -1; errno
+      should indicate the reason. unfortunately, I can't make much out of
+      ruby's "translation" of the error :-)
+    <gnu_srs1> antrik: In the ruby case there is no hang: The stream is closed
+      by read() giving an error code !=0. This triggers things in the ruby
+      code:  A negative number of bytes read and a negative fd results, and an
+      error error is triggered in the ruby code.
+    <gnu_srs1> antrik: See http://pastebin.com/eZmaZQJr
+    <antrik> gnu_srs1: yes, this all sounds perfectly right. the question is
+      *why* read() returns an error code. we'd need to know what error it is
+      exactly, and in what situation it occurs. tracing the libc code is not at
+      all useful here
+    <antrik> uhm... 1073741833 is errno?...
+    <gnu_srs1> BTW: I think the error code is EBADF (bad file descriptor?). The
+      integer version of it is 1073741833, see the pastebin i linked to.
+    <antrik> you could use perror() to get something more readable :-)
+    <antrik> or error() with the right arguments
+    <gnu_srs1> I used integer when printing, but looking into fd.h I think it
+      is EBADF  (I did get this result once in gdb)
+    <antrik> fd.h won't tell you anything. most error codes are generated by
+      the server, not by libc
+    <antrik> BADF might be generated in libc when ruby tries to read on FD -1
+    <antrik> (no idea why it tries to do that... perhaps there is actually
+      something wrong/stupid in ruby's error handling)
+    <gnu_srs1> Well I single-stepped in fd.h using gdb and printing err gave
+      EBADF. err is declared as: error_t err in read.c
+    <antrik> at which point did you single-step? while fd was still 3?
+    <gnu_srs1> I don't think the problem is in ruby, it is in mach/hurd!
+      Similar problems with ghc, python-apt, etc
+    <gnu_srs1> Yes, fd=3 was not changed. I cannot trace into fd.h from
+      read.c. That is the problem with all cases! Need to leave for a while
+      now.
+    <antrik> sorry, I don't see *anything* similar in the ghc failure.
+    <antrik> I don't know about python-apt
+    <antrik> for the ghc case, I'd like to see a GDB backtrace from the point
+      where it is hanging
+    <antrik> just to be clear: anything I/O-related will involve fd.h
+      somewhere. that doesn't in any way indicate the problems are related. in
+      fact the symptoms you described are very different, and I'm pretty
+      certain these are completely different issues
+    <gnu_srs1> antrik: Here is a backtrace,
+      http://pastebin.com/wvCiXFjB. Numbers 6,7,8 are from the calling Haskell
+      functions. They cannot be debugged by gdb. Nice to see that somebody is
+      showing interest at last:-/
+    <antrik> hm... I wonder whether the _hurd_intr_rpc_msg_in_trap is a result
+      of the ^C?
+    <antrik> if so, it seems to be a "normal" blocking read() operation. so
+      again probably not related to libc code at all
+    <gnu_srs1> Where is this blocking read() code located in mach/hurd?
+    <antrik> io_read() is implemented by whatever server handles the FD in
+      question
+    <antrik> I guess rpctrace will be more helpful here than GDB... to see what
+      the program is trying to do here
+    <gnu_srs1> Why don't I get there with gdb?
+    <antrik> err... the server is a different process
+    <antrik> you are only tracing the client code
+    <gnu_srs1> OK, here is a rpctrace for ruby:
+      http://pastebin.com/sdPiKGBW.Nice programs you have, no manual pages, and
+      the program hangs
+    <gnu_srs1> s/http://pastebin.com/sdPiKGBW.Nice
+      /http://pastebin.com/sdPiKGBW. BTW: Nice/ 
+    <gnu_srs1> antrik: Do you want the rpctrace of the ghc hang too? If that is
+      the case, do you need the whole file. From the ruby case the last part
+      looked most interesting:
+      libpthread/sysdeps/generic/pt-mutex-timedlock.c: assert (mutex->owner !=
+      self);
+    <antrik> gnu_srs1: hm... you get that assertion only with rpctrace? guess
+      it doesn't work properly then :-(
+    <gnu_srs1> Is it visible on the client side?
+    <antrik> gnu_srs1: that assertion *is* from the client side. I'm just
+      surprised that apparently it's only triggered when you run it in rpctrace
+    <antrik> how did you invoke rpctrace?
+    <gnu_srs1> rpctrace "command with options" > rpctrace.out 2>&1
+    <antrik> well, I'd like to know the "command with options" part :-)
+    <gnu_srs1> OK: for ruby: ./miniruby ./ tool/mkconfig.rb as before.
+    <antrik> OK, so it just runs the ruby interpreter and no other processes
+    <gnu_srs1> No other processes involved!
+    <abeaumont> gnu_srs1: i can reproduce the ruby error, now let's dig into
+      it :D
+    <antrik> gnu_srs1: rpctrace for ghc could be useful too... but if it's too
+      long, pasting only the last bit might suffice
+    <gnu_srs1> antrik: OK, will do that. Do you find anything interesting?
+    <gnu_srs1>  abeaumont: Using gdb: gdb ./miniruby; (gdb) break io.c:569; c8;
+      break fd.h:72 or break read.c:27 and you are there. Beware of gdb
+      hanging, so you need another terminal to kill -9 gdb (sometimes a reboot
+      is needed :-(
+    <antrik> gnu_srs1: no, the ruby rpctrace is useless; apparently rpctrace
+      makes it break before reaching the relevant part :-(
+    <abeaumont> thanks gnu_srs1 
+    <gnu_srs1> antrik: Hope for better luck with ghc:
+      http://pastebin.com/dgcrA05t
+    <antrik> hm... it hangs at proc_dostop() again... whatever that means
+
+2011-05-07
+
+    <gnu_srs> One question about ruby: I know where the problems occur in ruby
+      code. Can I switch to the kernel thread just before in gdb to single step
+      from there?
+    <youpi> you can put a breakpoint, can't you?
+    <antrik> gnu_srs: kernel thread?
+    <gnu_srs> Yes, but will single stepping from there lead me to the Hurd
+      code? I have not succeeded in doing that yet!
+    <youpi> you mean the translator code?
+    <gnu_srs> Well, Roland did call it the signal thread, there are at least
+      two threads per process, a signal thread and a main (user) thread.
+    <youpi> then it's a thread in gdb
+    <youpi> just use the thread gdb commands to access it
+    <gnu_srs> I do find two threads in gdb, yes. But following only the user
+      thread does not lead me to the cause of the problems.
+    <gnu_srs> And following the other (signal thread) has not been successful
+      so far.
+    <youpi> multithreading debugging in gdb is painful yes
+    <youpi> single-step isn't really an option in it
+    <antrik> gnu_srs: well, as I said before, the cause is probably not in the
+      libc code anyways. it would be much more relevant to find out what the FD
+      in question is, and what "special" thing Ruby does to it to trigger the
+      problematic behaviour...
+    <youpi> it's simpler to put printfs etc.
+    <antrik> youpi: well, printf doesn't work in the FD code :-)
+    <youpi> you can make it work
+    <youpi> open /dev/mem, write to 0xb8000
+    <youpi> I'm not even joking
+    <gnu_srs> I have printfs in the ruby code. And at some parts in eglibc (but
+      it is not possible to put them at all places I want, as mentioned before)
+    <antrik> sure, there are ways to debug this code too... but I don't think
+      it's useful. so far there is no indication that this will help finding
+      the actual issue
+    <gnu_srs> The problem is not file descriptors. It is that an ongoing read
+      suddenly returns -1 bytes read. And then the ruby code assigns a negative
+      file descriptor in the exception handling.
+    <youpi> a *read* ?
+    <youpi> with errno == 0 ?
+    <gnu_srs> Yes, a read!
+    <youpi> how does ruby come to assign a negative fd from that?
+    <youpi> does it somehow close the fd?
+    <gnu_srs> The errno reported from the read is EBADF!
+    <youpi> did you try to rpctrace it?
+    <gnu_srs> I don't bother too much about ruby exception handling. The error
+      has already happened in the read operation. And that led me to eglibc
+      code.... and so on...
+    <youpi> do you know what kind of file this fd was supposed to be on?
+    <youpi> sure, that's debugging
+    <gnu_srs> Yes I did rpctrace, but that was not successful. rpctrace just
+      hung! Buggy code?
+    <antrik> youpi: I assume that's Ruby's way to indicate that the FD is not
+      valid anymore, after the previous error
+    <youpi> does the program fork?
+    <youpi> antrik: possibly
+    <youpi> rpctrace has known issues, yes
+    <youpi> gnu_srs: did you trace close()s by hand with printfs?
+    <gnu_srs> How to find out if it forks?
+    <youpi> what does rpctrace stop on ?
+    <gnu_srs> Well, I don't remember. Antrik?
+    <antrik> proc_dostop() IIRC
+    <antrik> or something like that
+    <gnu_srs> I did not find any close() statements in the code I debugged.
+    <youpi> ok, proc_dostop() is typically a sign of fork()
+    <youpi> gnu_srs: that doesn't necessarily mean it's not called
+    <antrik> gnu_srs: I think his point is that something else might close the
+      FD, causing the error you see
+    <youpi> anything can happen in the wild :)
+    <antrik> gnu_srs: as I said before, the next step is to find out what this
+      FD is, and what happens to it...
+    <gnu_srs> antrik: Any ideas how to find out?
+    <youpi> what is the backtrace?
+    <gnu_srs> Well I know the fd number, it is either 3 or 5 in my tests. Does
+      the number matter?
+    <youpi> yes, it's not std{in,out,err}
+    <gnu_srs> How to get a backtrace of a program that does not hang?
+    <youpi> make it hang at the point of failure
+    <youpi> when read returns -1
+    <youpi> so you know who did the read
+    <gnu_srs> I have to run the loop several times before the number of bytes
+      read is -1.
+    <youpi> you mean running the program several times ?
+    <youpi> or just let the loop continue for some time?
+    <pinotree> if it's the latter, you can add breakpoints with conditions
+    <gnu_srs> No the read loop runs for 7 iterations, and fails the 8th time!
+    <youpi> then make it hang when read() returns -1
+    <Mr_Spock> could you paste your code somewhere?
+    <youpi> when debugging, you're allowed to do all kinds of ugly things, you
+      know ;)
+    <gnu_srs> OK, I'll try that.
+    <gnu_srs> Mr_Spock: The easiest way would be to try to build
+      ruby1.9.1. Then I can help you from where it fails.
+    <gnu_srs> pinotree: How to give a breakpoint with a condition?
+    <pinotree> break where if condition
+    <youpi> see help break
+    <youpi> oh, there's even a thread condition nowadays, good
+    <gnu_srs> Thanks for the discussion. I have to get into the real world for
+      a while now. To be continued.
+    <antrik> gnu_srs: well, if you already know that the loop runs several
+      times before the error occurs, you apparently already looked at the
+      higher-level code that is relevant here...
+    <youpi> but it may be generic code, and not tell what calls it
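
Note on the number quoted in the log above: 1073741833 is 0x40000009. To my knowledge, glibc's Hurd port builds errno values as `(0x10 << 26) | code`, which would make this code 9, i.e. EBADF, consistent with what gnu_srs1 saw in gdb. The sketch below is only an illustration, not code from glibc or ruby: the helper names are made up, and the 6-bit system / 12-bit subsystem / 14-bit code layout is assumed from the usual Mach error convention. It splits such a value into its fields and, separately, shows the scenario antrik and youpi suspect: a descriptor that something else has already closed makes read() fail with EBADF.

```python
import errno
import os


def split_mach_error(value):
    """Split a Mach-style error value into (system, subsystem, code).

    Assumed layout: 6 bits system, 12 bits subsystem, 14 bits code.
    """
    system = (value >> 26) & 0x3F
    subsystem = (value >> 14) & 0xFFF
    code = value & 0x3FFF
    return system, subsystem, code


def read_after_close():
    """Read from a pipe end that was already closed; return the errno seen."""
    read_end, write_end = os.pipe()  # a pipe yields two descriptors
    os.close(write_end)
    os.close(read_end)               # "something else" closed the FD
    try:
        os.read(read_end, 1)         # the read now has no valid descriptor
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    # Decode the value pasted in the IRC log.
    print(split_mach_error(1073741833))
    # Show the error name and text instead of a bare integer,
    # in the spirit of the perror() suggestion.
    err = read_after_close()
    print(errno.errorcode[err], os.strerror(err))
```

Printing the symbolic name and message rather than the raw integer is what makes the failure recognisable at a glance; the numeric value of EBADF differs between systems, so comparing against `errno.EBADF` is safer than comparing against a literal.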
diff --git a/open_issues/system_call_mechanism.mdwn b/open_issues/system_call_mechanism.mdwn
new file mode 100644
index 00000000..5598148c
--- /dev/null
+++ b/open_issues/system_call_mechanism.mdwn
@@ -0,0 +1,17 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach]]
+
+IRC, freenode, #hurd, 2011-05-07
+
+    <braunr> very simple examples: system calls use old call gates, which are
+      the slowest path to kernel space
+    <braunr> modern processors have dedicated instructions now