Diffstat (limited to 'open_issues')
-rw-r--r-- open_issues/address_space_memory_mapping_entries.mdwn 19
-rw-r--r-- open_issues/ext2fs_page_cache_swapping_leak.mdwn 24
-rw-r--r-- open_issues/gnumach_memory_management.mdwn 73
-rw-r--r-- open_issues/keymap_mach_console.mdwn 40
-rw-r--r-- open_issues/pflocal_reauth.mdwn 39
-rw-r--r-- open_issues/pflocal_socket_credentials_for_local_sockets.mdwn 4
-rw-r--r-- open_issues/python.mdwn 5
-rw-r--r-- open_issues/rework_gnumach_ipc_spaces.mdwn 77
-rw-r--r-- open_issues/select.mdwn 23
-rw-r--r-- open_issues/select_bogus_fd.mdwn 55
-rw-r--r-- open_issues/select_vs_signals.mdwn 25
-rw-r--r-- open_issues/sendmsg_scm_creds.mdwn 6
-rw-r--r-- open_issues/sigpipe.mdwn 345
-rw-r--r-- open_issues/system_call_mechanism.mdwn 17
14 files changed, 745 insertions, 7 deletions
diff --git a/open_issues/address_space_memory_mapping_entries.mdwn b/open_issues/address_space_memory_mapping_entries.mdwn
new file mode 100644
index 00000000..caf447dd
--- /dev/null
+++ b/open_issues/address_space_memory_mapping_entries.mdwn
@@ -0,0 +1,19 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach]]
+
+IRC, freenode, #hurd, 2011-05-07
+
+ <braunr> and as a last example: memory mapping is heavily used in the hurd,
+ but for some reason, the map entries in an address space are still on a
+ linked list
+ <braunr> a bare linked list
+ <braunr> which makes faults and page cache lookups even slower
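
To illustrate why a bare linked list of map entries hurts, here is a minimal sketch of an address lookup over a sorted list. It is not GNU Mach code: `struct map_entry` and `map_lookup` are hypothetical names, and the real VM map structures carry far more state. The point is only that every fault has to walk entries one by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified map entry: a half-open range [start, end). */
struct map_entry {
    uintptr_t start;
    uintptr_t end;
    struct map_entry *next;   /* entries are kept sorted by address */
};

/* Find the entry containing addr, or NULL. If steps is non-NULL, it
   receives the number of entries visited: O(n) in the list length. */
static const struct map_entry *
map_lookup(const struct map_entry *head, uintptr_t addr, size_t *steps)
{
    const struct map_entry *e;
    size_t visited = 0;

    for (e = head; e != NULL; e = e->next) {
        assert(e->start < e->end);
        visited++;
        if (addr < e->start) {    /* sorted list: no later entry can match */
            e = NULL;
            break;
        }
        if (addr < e->end)
            break;                /* addr falls inside this entry */
    }
    if (steps != NULL)
        *steps = visited;
    return e;
}
```

With thousands of mappings in one address space, the visited count grows linearly, whereas a balanced tree keyed on `start` would need a logarithmic number of comparisons.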
diff --git a/open_issues/ext2fs_page_cache_swapping_leak.mdwn b/open_issues/ext2fs_page_cache_swapping_leak.mdwn
index 607c3af4..c0d0867b 100644
--- a/open_issues/ext2fs_page_cache_swapping_leak.mdwn
+++ b/open_issues/ext2fs_page_cache_swapping_leak.mdwn
@@ -149,3 +149,27 @@ IRC, freenode, #hurd, 2011-04-18
<antrik> this make testing this stuff quite a lot harder... [sigh]
<antrik> any suggestions how to debug this hang?
<braunr> antrik: no :/
+
+2011-04-28: [[!taglink open_issue_documentation]]
+
+ <antrik> hm... is it normal that "swap free" doesn't increase as a process'
+ memory is paged back in?
+ <youpi> yes
+ <youpi> there's no real use cleaning swap
+ <youpi> on the contrary, it makes paging the process out again longer
+ <antrik> hm... so essentially, after swapping back and forth a bit, a part
+ of the swap equal to the size of physical RAM will be occupied with stuff
+ that is actually in RAM?
+ <youpi> yes
+ <youpi> so that that RAM can be freed immediately if needed
+ <antrik> hm... that means my effective swap size is only like 300 MB... no
+ wonder I see crashes under load
+ <antrik> err... make that 230 actually
+ <antrik> indeed, quitting the application freed both the physical RAM and
+ swap space
+ <braunr> 02:28 < antrik> hm... is it normal that "swap free" doesn't
+ increase as a process' memory is paged back in?
+ <braunr> swap is the backing store of anonymous memory, like ext2fs is the
+ backing store of memory objects created from its pager
+ <braunr> so you can view swap as the file system for everything that isn't
+ an external memory object
diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn
index 1b897454..a5dd6955 100644
--- a/open_issues/gnumach_memory_management.mdwn
+++ b/open_issues/gnumach_memory_management.mdwn
@@ -772,3 +772,76 @@ IRC, freenode, #hurd, 2011-04-12:
<braunr> FreeBSD uses a binary buddy system like Linux
<braunr> the fact that the kernel allocator uses virtual memory doesn't
mean the kernel has no mean to allocate contiguous physical memory ...
+
+2011-05-02
+
+ <braunr> hm nice, my allocator uses less memory than glibc (squeeze
+ version) on both 32 and 64 bits systems
+ <braunr> the new per-cpu layer is proving effective
+ <neal> braunr: Are you reimplementation malloc?
+ <braunr> no
+ <braunr> it's still the slab allocator for mach, but tested in userspace
+ <braunr> so i wrote malloc wrappers
+ <neal> Oh.
+ <braunr> i try to heavily test most of my code in userspace now
+ <neal> it's easier :-)
+ <neal> I agree
+ <braunr> even the physical memory allocator has been implemented this way
+ <neal> is this your mach version?
+ <braunr> virtual memory allocation will follow
+ <neal> or are you working on gnu mach?
+ <braunr> for now it's my version
+ <braunr> but i intend to spend the summer working on ipc port names
+ management
+
+[[rework_gnumach_IPC_spaces]].
+
+ <braunr> and integrate the result in gnu mach
+ <neal> are you keeping the same user-space API?
+ <neal> Or are you experimenting with something new?
+ <antrik> braunr: to be fair, it's not terribly hard to use less memory than
+ glibc :-)
+ <braunr> yes
+ <braunr> antrik: well ptmalloc3 received some nice improvements
+ <braunr> neal: the goal is to rework some of the internals only
+ <braunr> neal: namely, i simply intend to replace the splay tree with a
+ radix tree
+ <antrik> braunr: the glibc allocator is emphasising performace, unlike some
+ other allocators that trade some performance for much better memory
+ utilisation...
+ <antrik> ptmalloc3?
+ <braunr> that's the allocator used in glibc
+ <braunr> http://www.malloc.de/en/
+ <antrik> OK. haven't seen any recent numbers... the comparision I have in
+ mind is many years old...
+ <braunr> i also made some additions to my avl and red-black trees this week
+ end, which finally make them suitable for almost all generic uses
+ <braunr> the red-black tree could be used in e.g. gnu mach to augment the
+ linked list used in vm maps
+ <braunr> which is what's done in most modern systems
+ <braunr> it could also be used to drop the overloaded (and probably over
+ imbalanced) page cache hash table
+
+2011-05-03
+
+ <mcsim> antrik: How should I start porting? Have I just include rbraun's
+ allocator to gnumach and make it compile?
+ <antrik> mcsim: well, basically yes I guess... but you will have to look at
+ the code in question first before we know anything more specific :-)
+ <antrik> I guess braunr might know better how to start, but he doesn't
+ appear to be here :-(
+ <braunr> mcsim: you can't juste put my code into gnu mach and make it run,
+ it really requires a few careful changes
+ <braunr> mcsim: you will have to analyse how the current zone allocator
+ interacts with regard to locking
+ <braunr> if it is used in interrupt handlers
+ <braunr> what kind of locks it should use instead of the pthread stuff
+ available in userspace
+ <braunr> you will have to change the reclamiing policy, so that caches are
+ reaped on demand
+ <braunr> (this basically boils down to calling the new reclaiming function
+ instead of zone_gc())
+ <braunr> you must be careful about types too
+ <braunr> there is work to be done ;)
+ <braunr> (not to mention the obvious about replacing all the calls to the
+ zone allocator, and testing/debugging afterwards)
diff --git a/open_issues/keymap_mach_console.mdwn b/open_issues/keymap_mach_console.mdwn
new file mode 100644
index 00000000..3063dd00
--- /dev/null
+++ b/open_issues/keymap_mach_console.mdwn
@@ -0,0 +1,40 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+IRC, freenode, #hurd, 2011-04-26
+
+ <guillem> pavkac: btw are you aware there's already some code to change the
+ keymap for the mach console (I think originally from the hurdfr guys, but
+ I cannot remember exactly from where I got it from :/)
+ <guillem> pavkac: http://www.hadrons.org/~guillem/tmp/hurd-keymap.tgz
+ <pavkac> guillem: No, I didn't know. I'll diff it and try to follow.
+ <guillem> pavkac: it would be nice to maybe integrate it properly into the
+ hurd
+ <guillem> you'll see the code is pretty basic, so extending it would be
+ nice too I guess :)
+ <pavkac> guillem: OK, I'll see to it. Unfortunately I'm quite busy this
+ week. Have a lot of homeworks to school. :/
+ <pavkac> guillem: But, I'll find some time during weekend.
+ <youpi> maybe it'd be simpler to add it to the hurd package and use that
+ from the console-setup package indeed
+ <youpi> but copyright issues should be solved
+ <youpi> unless we simply put this into hurdextras
+ <guillem> ok found this:
+ http://www.mail-archive.com/debian-hurd@lists.debian.org/msg02456.html
+ <guillem> and
+ http://www.mail-archive.com/debian-hurd@lists.debian.org/msg01173.html
+ <guillem> which seems to be the original Mark's code
+ <guillem> AFAIR I contributed the the spanish keymap and some additional
+ key definitions for loadkeys
+ <guillem> and http://lists.debian.org/debian-hurd/2000/10/msg00130.html
+ <pavkac> I've fetched all. :) But I must leave, good night if you're in
+ Europe. :)
+ <guillem> pavkac: the tarball I provided should be the latest, the others
+ are mostly to track the provenance of the source
diff --git a/open_issues/pflocal_reauth.mdwn b/open_issues/pflocal_reauth.mdwn
new file mode 100644
index 00000000..839e383d
--- /dev/null
+++ b/open_issues/pflocal_reauth.mdwn
@@ -0,0 +1,39 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_hurd]]
+
+IRC, freenode, #hurd, 2011-04-02
+
+ <pinotree> youpi: i'm playing with pflocal, and noticing that a simple C
+ executable doesn't trigger reauthenticate
+ <pinotree> youpi: i've put a debug output (to file) in S_io_reauthenticate,
+ and with a simple C test (which uses unix sockets) it isn't called
+ <youpi> pinotree: it seems pflocal should return FS_RETRY_REAUTH in
+ retry_type
+ <youpi> to make glibc call reauthentication
+ <pinotree> pflocal?
+ <youpi> yes, in the dir_lookup handler
+ <pinotree> isn't that ext2fs?
+ <youpi> libtrivfs had dir_lookup() too
+ <youpi> trivfs_check_open_hook can be used to tweak its behavior
+ <pinotree> ah, missed that pflocal was using libtrivfs, sorry
+ <youpi> there are probably very few translators which don't use one of the
+ lib*fs :)
+ <antrik> pinotree: what are you trying to do with pflocal?
+ <pinotree> local socket scredentials (SCM_CREDS)
+ <antrik> ah
+ <antrik> don't really know what that is, but I remember reading some
+ mention of it ;-)
+
+---
+
+See also [[pflocal_socket_credentials_for_local_sockets]] and
+[[sendmsg_scm_creds]].
diff --git a/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn b/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn
index 5a71412e..dfdc213c 100644
--- a/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn
+++ b/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn
@@ -40,3 +40,7 @@ IRC, freenode, #hurd, 2011-03-28
S_io_reauthenticate cached in the sock_user struct?
<youpi> yes
<pinotree> nice thanks, i will try that change first
+
+---
+
+See also [[pflocal_reauth]] and [[sendmsg_scm_creds]].
diff --git a/open_issues/python.mdwn b/open_issues/python.mdwn
index 34fa81f6..403ff8aa 100644
--- a/open_issues/python.mdwn
+++ b/open_issues/python.mdwn
@@ -27,6 +27,11 @@ First, make the language functional, have its test suite pass without errors.
[[!inline pages=community/gsoc/project_ideas/perl_python feeds=no]]
+
+## Analysis
+
+ * [[select_bogus_fd]]
+
---
diff --git a/open_issues/rework_gnumach_ipc_spaces.mdwn b/open_issues/rework_gnumach_ipc_spaces.mdwn
index 5bf0c530..b7cda227 100644
--- a/open_issues/rework_gnumach_ipc_spaces.mdwn
+++ b/open_issues/rework_gnumach_ipc_spaces.mdwn
@@ -10,6 +10,14 @@ License|/fdl]]."]]"""]]
[[!tag open_issue_gnumach]]
+IRC, freenode, #hurd, 2011-05-07
+
+ <braunr> things that are referred to as "system calls" in glibc are
+ actually RPCs to the kernel or other tasks, those RPCs have too lookup
+ port rights
+ <braunr> the main services have tens of thousands of ports, looking up one
+ is slow
+
There is a [[!FF_project 268]][[!tag bounty]] on this task.
IRC, freenode, #hurd, 2011-04-23
@@ -241,3 +249,72 @@ IRC, freenode, #hurd, 2011-04-23
<braunr> so a radix ree would be the most efficient
<antrik> well, if some processes really feel they must use random numbers
for port names, they *ought* to be penalized ;-)
+
+2011-04-27
+
+ <braunr> antrik: remember when you asked why high numbers would be a
+ problem with radix trees ?
+ <braunr> here is a radix tree with one entry, which key is around 5000
+ <braunr> [ 656.296412] tree height: 3
+ <braunr> [ 656.296412] index: 0, level: 0, height: 3, count: 1,
+ bitmap: 0000000000000002
+ <braunr> [ 656.296412] index: 1, level: 1, height: 2, count: 1,
+ bitmap: 0000000000004000
+ <braunr> [ 656.296412] index: 14, level: 2, height: 1, count: 1,
+ bitmap: 0000000000000080
+ <braunr> three levels, each with an external node (dynamically allocated),
+ for one entry
+ <braunr> so in the worst case of entries with keys close to the highest
+ values, the could be many external nodes with higher paths lengths than
+ when keys are close to 0
+ <braunr> which also brings the problem of port name allocation
+ <braunr> can someone with access to a buildd which has an uptime of at
+ least a few days (and did at least one build) show me the output of
+ portinfo 3 | tail ?
+ <braunr> port names are allocated linearly IIRC, like PIDs, and some parts
+ of the kernel may rely on them not being reused often
+ <braunr> but for maximum effifiency, they should be
+ <braunr> efficiency*
+ <braunr> 00:00 < braunr> can someone with access to a buildd which has an
+ uptime of at least a few days (and did at least one build) show me the
+ output of portinfo 3 | tail ?
+ <braunr> :)
+ <youpi> it's almost like wc -l
+ <youpi> 4905: receive
+ <youpi> vs 4647
+ <youpi> for /
+ <youpi> 52902: receive
+ <youpi> vs 52207
+ <youpi> for the chroot
+ <braunr> even after several builds ?
+ <braunr> and several days ?
+ <youpi> that's after 2 days
+ <youpi> it's not so many builds
+ <youpi> rossini is not so old
+ <youpi> (7h)
+ <youpi> but many builds
+ <youpi> 70927: send
+ <youpi> vs 70938
+ <braunr> ok
+ <braunr> so it seems port names are reused
+ <braunr> good
+ <youpi> yes they are clearly
+ <braunr> i think i remember a comment about why the same port name
+ shouldn't be reused too soon
+ <youpi> well, it could help catching programming errors
+ <braunr> that it helped catch bugs in applications that could
+ deallocate/reallote quickly
+ <braunr> reallocate*
+ <braunr> without carefuly synchronization
+ <braunr> careful
+ <braunr> damn, i'm tired :/
+ <youpi> but that's about debugging
+ <youpi> so we don't care about performance there
+ <braunr> yes
+ <braunr> i'll try to improve allocation performance too
+ <braunr> using e.g. bitmaps in each external node back to the root so that
+ unused slots are quickly found
+ <braunr> i thknk that's what idr does in linux
+ <antrik> braunr: idr?
+ <braunr> antrik: a data structure used to map integers to pointers
+ <braunr> http://fxr.watson.org/fxr/source/lib/idr.c?v=linux-2.6
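
The radix tree trace from 2011-04-27 can be reproduced with a little arithmetic. The sketch below assumes 64 slots per node (6 key bits per level), which is inferred from the 64-bit bitmaps in the trace and is not confirmed by the log. The function names are hypothetical and this is not braunr's implementation. Under that assumption, key 4999 ("around 5000") yields height 3 and slots 1, 14 and 7, matching the bitmaps `...0002`, `...4000` and `...0080`.

```c
#include <assert.h>

#define RADIX_BITS 6u                           /* assumed: 64 slots per node */
#define RADIX_MASK ((1ul << RADIX_BITS) - 1)

/* Number of levels a tree needs so that key fits. */
static unsigned radix_height(unsigned long key)
{
    unsigned height = 1;

    key >>= RADIX_BITS;
    while (key != 0) {
        height++;
        key >>= RADIX_BITS;
    }
    return height;
}

/* Slot index used for key at the given level (0 = root). */
static unsigned radix_slot(unsigned long key, unsigned height, unsigned level)
{
    assert(level < height);
    return (unsigned)((key >> (RADIX_BITS * (height - 1 - level))) & RADIX_MASK);
}

/* Bitmap of a node in which only that slot is occupied. */
static unsigned long long radix_bitmap(unsigned slot)
{
    return 1ull << slot;
}
```

A single small key such as 5 needs only one node, while one key near 5000 already needs three. This is the cost braunr points out for sparse, high-valued port names.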
diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index ab6af90b..0f750631 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -12,12 +12,23 @@ License|/fdl]]."]]"""]]
There are a lot of reports about this issue, but no thorough analysis.
----
+
+# `elinks`
IRC, unknown channel, unknown date.
- <paakku> This is related to ELinks... I've looked at the select() implementation for the Hurd in glibc and it seems that giving it a short timeout could cause it not to report that file descriptors are ready.
- <paakku> It sends a request to the Mach port of each file descriptor and then waits for responses from the servers.
- <paakku> Even if the file descriptors have data for reading or are ready for writing, the server processes might not respond immediately.
- <paakku> So if I want ELinks to check which file descriptors are ready, how long should the timeout be in order to ensure that all servers can respond in time?
+ <paakku> This is related to ELinks... I've looked at the select()
+ implementation for the Hurd in glibc and it seems that giving it a short
+ timeout could cause it not to report that file descriptors are ready.
+ <paakku> It sends a request to the Mach port of each file descriptor and
+ then waits for responses from the servers.
+ <paakku> Even if the file descriptors have data for reading or are ready
+ for writing, the server processes might not respond immediately.
+ <paakku> So if I want ELinks to check which file descriptors are ready, how
+ long should the timeout be in order to ensure that all servers can
+ respond in time?
<paakku> Or do I just imagine this problem?
+
+---
+
+See also [[select_bogus_fd]] and [[select_vs_signals]].
diff --git a/open_issues/select_bogus_fd.mdwn b/open_issues/select_bogus_fd.mdwn
new file mode 100644
index 00000000..17aced4a
--- /dev/null
+++ b/open_issues/select_bogus_fd.mdwn
@@ -0,0 +1,55 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc]]
+
+
+# Python
+
+IRC, freenode, #hurd, 2011-04-13
+
+ <abeaumont> ok, cause of first python testsuite failure located, now the
+ hard part, how to best fix it :)
+ <abeaumont> how to redesign the code to avoid the problem... that's the
+ hard part, mostly cause i lack contextual info
+ <abeaumont> tschwinge: the problem is pretty much summarized by this
+ comment in _hurd_select (in glibc): /* If one descriptor is bogus, we
+ fail completely. */
+ <pochu> does POSIX say anything about what to do if one fd is invalid?
+ <pochu> and the other question is why python is calling select() with an
+ invalid fd
+ <abeaumont> pochu: yep, it says it should not fail completelly
+ <pochu> then that's our bug :)
+ <pinotree> abeaumont: just note that (at least on debian) some tests may
+ hang forever or cause hurd/mach to die
+ <pinotree> abeaumont: see in the debian/rules of the packaging of each
+ pythonX.Y source
+ <pinotree> ... there's a list of the tests excluded from the test suite run
+ <abeaumont> well, to be precise, python has a configure check for
+ 'broken_poll' which hurd fails, and therefore python's select module is
+ not built, and anything depending on it fails
+ <abeaumont> broken_poll checks exactly for that posix requirement
+ <abeaumont> the reason for python using a non-existant
+ descriptor... unknown :D
+ <pochu> we should fix select to not fail miserably in that case
+ <pinotree> abeaumont: we have a patch to fix the broken poll check to
+ actually disable the poll module
+ <pochu> pinotree: but the proper fix is to fix select(), which is what
+ abeaumont is looking at
+ <abeaumont> pinotree: i'd say that's exactly what python's configure check
+ does itself -- disable building the select module
+ <pochu> abeaumont: what pinotree means is that the check is broken, see
+ http://patch-tracker.debian.org/patch/series/view/python2.6/2.6.6-8/hurd-broken-poll.diff
+ <pinotree> yes, the configure check for poll does the check, but not
+ everything of the poll module gets disabled (and you get a build failure)
+
+---
+
+See also [[select]] and [[select_vs_signals]].
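
As a rough illustration of the requirement discussed above, the following sketch polls a descriptor that is known to be closed and reports whether `poll()` flags it with `POLLNVAL` instead of failing outright. It is written in the spirit of a `broken_poll`-style check but is not Python's actual configure test. The function name is made up for this example.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if poll() flags a closed descriptor with POLLNVAL,
   0 if it returns without flagging it, and -1 if pipe() or poll()
   itself fails (the "fail completely" behaviour). */
static int poll_flags_bogus_fd(void)
{
    int fds[2];
    struct pollfd pfd;

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);
    close(fds[1]);            /* fds[1] is now an invalid descriptor
                                 (single-threaded use assumed) */

    pfd.fd = fds[1];
    pfd.events = POLLIN;
    pfd.revents = 0;

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return (pfd.revents & POLLNVAL) ? 1 : 0;
}
```

On a system that follows POSIX for `poll()`, this is expected to return 1. The comment quoted from `_hurd_select` suggests the Hurd would take the failing path instead.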
diff --git a/open_issues/select_vs_signals.mdwn b/open_issues/select_vs_signals.mdwn
new file mode 100644
index 00000000..bbd69d00
--- /dev/null
+++ b/open_issues/select_vs_signals.mdwn
@@ -0,0 +1,25 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc]]
+
+
+# `sudo`
+
+`sudo [task]` hangs after finishing `[task]`.
+
+IRC, freenode, #hurd, 2011-04-02
+
+ <youpi> the sudo bug is select() not being able to get interrupted by
+ signals
+
+---
+
+See also [[select]] and [[select_bogus_fd]].
diff --git a/open_issues/sendmsg_scm_creds.mdwn b/open_issues/sendmsg_scm_creds.mdwn
index 1f4de59c..2deec7e8 100644
--- a/open_issues/sendmsg_scm_creds.mdwn
+++ b/open_issues/sendmsg_scm_creds.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]]
[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -89,3 +89,7 @@ IRC, unknown channel, unknown date.
<youpi> (since it's just about letting the application reading from the message structure)
<pinotree> yep
<youpi> ok, good :)
+
+---
+
+See also [[pflocal_socket_credentials_for_local_sockets]] and [[pflocal_reauth]].
diff --git a/open_issues/sigpipe.mdwn b/open_issues/sigpipe.mdwn
new file mode 100644
index 00000000..0df3560e
--- /dev/null
+++ b/open_issues/sigpipe.mdwn
@@ -0,0 +1,345 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_hurd]]
+
+[[!GNU_Savannah_bug 461]]
+
+IRC, freenode, #hurd, 2011-04-20
+
+ <svante_> I found a problem from 2002 by Marcus Brinkmann that I think is
+ related to my problems: http://savannah.gnu.org/bugs/?461. He has a test
+ file called pipetest.c that shows that SIGPIPE is not triggered reliably.
+ <svante_> Cited from the bug report: The attached test program does not
+ trigger SIGPIPE reliably, because closing the read end of the pipe
+ happens asynchronously. The write can succeed because the read end might
+ not have been closed yet.
+ <svante_> I have debugged this program on both Hurd and Linux, and the
+ problem in Hurd remains:-(
+ <svante_> Anybody looked into the almost ten year old
+ bug:http://savannah.gnu.org/bugs/?461 this one is definitely related to
+ the build problems of e.g. ghc6 and ruby1.9.1. Should I mention this on
+ the ML?
+ <youpi> that could be it indeed
+ <youpi> does th bug still happen?
+ <azeem> depends on: new interface io_close
+ <azeem> which depends on: POSIX record locking
+ <svante_> youpi: Yes it does, I've tested the pipetest.c file submitted by
+ Marcus B on both Linux and Hurd
+ <azeem> that would've maybe been a nice GSOC task
+ <youpi> azeem: err, the contrary for posix record locking, non ?
+ <azeem> argh
+ <azeem> why would POSIX record locking depend on this?
+ <azeem> well anyway, then have POSIX record locking be a GSOC task :)
+ <azeem> I wasn't aware that would also fix ruby and ghc building :)
+ <youpi> http://permalink.gmane.org/gmane.os.hurd.devel.readers/265
+ <youpi> (for io_close stuff)
+ <youpi> http://comments.gmane.org/gmane.os.hurd.devel.readers/63 actually
+ <azeem> I guess if they didn't implement it/agreed on something back then
+ it'd be quite hard to do it now
+ <svante_> azeem: marcus recently showed up here. Maybe he can help out/has
+ ideas?
+ <azeem> well yeah
+ <azeem> but marcus was the junior guy back then
+ <azeem> <marcus> but it's a very hurdish solution (ie, complex, buggy, and
+ not implemented)
+ <azeem> maybe we can go for something simpler
+ <youpi> azeem: what is this quote about?
+ <azeem> don't remember
+ <azeem> not io_close I'd say
+
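
For readers without the bug report at hand, here is a minimal sketch of the behaviour under discussion. It is not Marcus Brinkmann's `pipetest.c`, which is not reproduced on this page. The function name is hypothetical. It closes the read end of a pipe, ignores `SIGPIPE` so that the process survives, and then writes to the pipe.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Returns the errno of a write to a pipe whose read end is closed,
   0 if the write unexpectedly succeeded, or -1 on setup failure. */
static int write_to_closed_pipe(void)
{
    int fds[2];
    char byte = 'x';
    ssize_t n;
    int err;

    if (pipe(fds) != 0)
        return -1;

    /* With SIGPIPE ignored, the failure surfaces as EPIPE. */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);                       /* no reader is left */
    n = write(fds[1], &byte, 1);
    err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

POSIX expects `EPIPE` here. If closing the read end is handled asynchronously, as the bug report describes for the Hurd, the write can succeed and the function returns 0 instead.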
+2011-04-21
+
+ <antrik> svante_: why do you think the problem you see in ruby and ghc is
+ related to async close() ?
+
+2011-04-22
+
+ <svante_> Well: the test case I'm running on ruby is giving me an EBADF
+ after 8 successful loops, and tracing within eglibc points towards
+ __mutex_lock_solid or __spin_lock, __spin_lock_solid from
+ mach/lock-intern.h from cthreads.
+
+2011-04-23
+
+ <antrik> srs1: yeah, I saw it... but I still wonder what makes you think
+ this is related to async FD closing?
+ <srs1> antrik: Every test case showing the problems are related to fd.h and
+ the functions there, especially the ones used in the function:
+ _HURD_FD_H_EXTERN_INLINE struct hurd_fd *_hurd_fd_get (int fd) and so is
+ the pipetest from Marcus too.
+ <srs1> I have not yet been able to trace further with gdb since most
+ variables are optimized out and adding print statements does not work, at
+ least not yet. Now I'm trying to build eglibc with -O1 to see if the
+ optimized out variables are there or not.
+ <youpi> srs1: he means the ghc6 issue
+ <youpi> (and the ruby issue)
+ <srs1> youpi: Yes, the ghc6 and ruby ends at the functions I mentioned in
+ fd,h
+ <srs1> Both ghc6 and ruby programs are writing to a file when the error
+ happens. If they are using a pipeline or not I don't know yet, I think it
+ is a regular file write.
+ <srs1> I can send your the ruby program if you like: It is a c-file so
+ debugging is possible. ghc6 is worse, since that program cannot be
+ debugged directly with gdb.
+ <antrik> pipetest also results in the program hanging in locking stuff?...
+ <srs1> pipetest does not hang, but gives no output as it should. Running it
+ in gdb with single stepping shows the correct behavior, but then gdb
+ hangs if I try to single stepping further, continue at the right place
+ works!
+ <antrik> I haven't looked at the pipetest program. do you have the link
+ handy?
+ <antrik> never mind, got it
+ <antrik> srs1: that sounds like a GDB problem...
+ <youpi> most probably, yes
+ <youpi> (and I've always seen issues like this in gdb on hurd)
+ <antrik> actually I think it's expected... the RPC handling code has some
+ explicit GDB hooks AIUI; trying to single-step into this code is probably
+ expected to wreck havoc...
+ <youpi> well, it should have some sane behavior
+ <youpi> even if it's "skip to next point where it's debuggable"
+ <antrik> srs1: note that there is no BADF involved in the pipetest AIUI...
+
+2011-04-28
+
+ <antrik> what is the actual problem you are seeing BTW?
+ <gnu_srs1> antrik: in ruby the problem is: Exception `IOError' at
+ tool/mkconfig.rb:12 - closed stream
+ <gnu_srs1> Triggered by ruby:io.c:internal_read_func() calling
+ sysdeps/mach/hurd/read.c returning a negative number of bytes read.
+ <abeaumont> gnu_srs1: why do you think that error is locking related?
+ <gnu_srs1> This happens after 8 iterations of the read loop with 8192 bytes
+ read each time.
+ <abeaumont> but that doesn't involve locking at all, does it?
+ <gnu_srs1> I think it is, if there is a pipepline set up??
+ <gnu_srs1> Also the ghc6 hang ends up in hangs in sysdeps/mach/hurd/read.c
+ traced into fd.h where all things happen (including setting locks and
+ mutexes)
+ <braunr> what locking ?
+ <braunr> stdio locking is different from file locking
+ <braunr> and a pipe doesn't imply file locking at all
+ <abeaumont> read may block on pipes, but it's unrelated to flock
+ <gnu_srs1> Look into the file fd.h, maybe you can describe things
+ better. I'm not fluent in this stuff.
+ <gnu_srs1> Has a pipe has a file descriptor associated to it? What about a
+ file read/write?
+ <abeaumont> a pipe provides 2 file descriptors, one for reading and another
+ one for writting
+ <abeaumont> i may give a look at that if i manage to build glibc
+ succesfully...
+ <gnu_srs1> Take a look at the realevant code from fd.h:
+ http://pastebin.com/kesBpjy4
+ <abeaumont> the ruby error happens just trying to build ruby1.9?
+ <abeaumont> gnu_srs1: from what you said, the error occurs while reading,
+ so i don't see how it can be related to that code
+ <abeaumont> you already got a descriptor if you're reading from it
+ <gnu_srs1> I have not tried anything else than ruby1.9.1. I can send you
+ the ruby debug setup and files if you are interested.
+ <abeaumont> gnu_srs1: ok, i'll try to build ruby1.9.1 later... let's see if
+ i can build glibc first
+ <gnu_srs1> abeaumont: well, the read suddenly returns -1 bytes read,
+ resulting in a file descriptor of -1 (instead of +3).
+ <abeaumont> gnu_srs1: i see
+ <antrik> gnu_srs1: are you sure the hang really happens in _hurd_fd_get()?
+ could you give us a backtrace?
+ <antrik> gnu_srs1: there are many reasons why read() can return -1; errno
+ should indicate the reason. unfortunately, I can't make much out of
+ ruby's "translation" of the error :-)
+ <gnu_srs1> antrik: In the ruby case there is no hang: The steam is closed
+ by read() giving an error code !=0. This triggers things in the ruby
+ code: A negative number of bytes read and a negative fd results, and an
+ error error is triggered in the ruby code.
+ <gnu_srs1> antrik: See http://pastebin.com/eZmaZQJr
+ <antrik> gnu_srs1: yes, this all sounds perfectly right. the question is
+ *why* read() returns an error code. we'd need to know what error it is
+ exactly, and in what situation it occurs. tracing the libc code is not at
+ all useful here
+ <antrik> uhm... 1073741833 is errno?...
+ <gnu_srs1> BTW: I think the error code is EBADF (badfile descriptor?). The
+ integer version of it is 1073741833, see the pastebin i linked to.
+ <antrik> you could use perror() to get something more readable :-)
+ <antrik> or error() with the right arguments
+ <gnu_srs1> I used integer when printing, but looking into fd.h I think it
+ is EBADF (I did get this result once in gdb)
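
The number 1073741833 can be decoded by hand. The sketch below assumes the Mach error code layout (6-bit system, 12-bit subsystem, 14-bit code) and that Hurd errno values live in system 0x10. Both are stated here from memory of `mach/error.h` and the glibc Hurd port, not taken from the log. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed Mach error layout: | system:6 | subsystem:12 | code:14 | */
static unsigned err_system(uint32_t err)
{
    return (err >> 26) & 0x3f;
}

static unsigned err_subsystem(uint32_t err)
{
    return (err >> 14) & 0xfff;
}

static unsigned err_code(uint32_t err)
{
    return err & 0x3fff;
}
```

Under those assumptions, 1073741833 is 0x40000009: system 0x10, subsystem 0, code 9. Code 9 is `EBADF` in the BSD-derived numbering, which agrees with what gnu_srs1 saw in gdb.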
+ <antrik> fd.h won't tell you anything. most error codes are generated by
+ the server, not by libc
+ <antrik> BADF might be generated in libc when ruby tries to read on FD -1
+ <antrik> (no idea why it tries to do that... perhaps there is actually
+ something wrong/stupid in ruby's error handling)
+ <gnu_srs1> Well I single-stepped in fd.h using gdb and printing err gave
+ EBADF. err is declared as: error_t err in read.c
+ <antrik> at which point did you single-step? while fd was still 3?
+ <gnu_srs1> I don't think the problem is in ruby, it is in mach/hurd!
+ Similar problems with ghc, python-apt, etc
+ <gnu_srs1> Yes, fd=3 was not changed. I cannot trace into fd.h from
+ read.c. That is the problem with all cases! Need to leave for a while
+ now.
+ <antrik> sorry, I don't see *anything* similar in the ghc failure.
+ <antrik> I don't know about python-apt
+ <antrik> for the ghc case, I'd like to see a GDB backtrace from the point
+ where it is hanging
+ <antrik> just to be clear: anything I/O-related will involve fd.h
+ somewhere. that doesn't in any way indicate the problems are related. in
+ fact the symptoms you described are very different, and I'm pretty
+ certain these are completely different issues
+ <gnu_srs1> antrik: Here is a backtrace,
+ http://pastebin.com/wvCiXFjB. Numbers 6,7,8 are from the calling Haskell
+ functions. They cannot be debugged by gdb. Nice to see that somebody is
+ showing interest at last:-/
+ <antrik> hm... I wonder whether the _hurd_intr_rpc_msg_in_trap is a result
+ of the ^C?
+ <antrik> if so, it seems to be a "normal" blocking read() operation. so
+ again probably not related to libc code at all
+ <gnu_srs1> Where is this blocking read() code located in mach/hurd?
+ <antrik> io_read() is implemented by whatever server handles the FD in
+ question
+ <antrik> I guess rpctrace will be more helpful here than GDB... to see what
+ the program is trying to do here
+ <gnu_srs1> Why don't I get there with gdb?
+ <antrik> err... the server is a different process
+ <antrik> you are only tracing the client code
+ <gnu_srs1> OK, here is a rpctrace for ruby:
+ http://pastebin.com/sdPiKGBW.Nice programs you have, no manual pages, and
+ the program hang
+ <gnu_srs1> s/http://pastebin.com/sdPiKGBW.Nice
+ /http://pastebin.com/sdPiKGBW. BTW: Nice/
+ <gnu_srs1> antrik: Do you want the rpctrace of the ghc hang too? If that is
+ the case, do you need the whole file. From the ruby case the last part
+ looked most interesting:
+ libpthread/sysdeps/generic/pt-mutex-timedlock.c: assert (mutex->owner !=
+ self);
+ <antrik> gnu_srs1: hm... you get that assertion only with rpctrace? guess
+ it doesn't work properly then :-(
+ <gnu_srs1> Is it visible on the client side?
+ <antrik> gnu_srs1: that assertion *is* from the client side. I'm just
+ surprised that apparently it's only triggered when you run it in rpctrace
+ <antrik> how did you invoke rpctrace?
+ <gnu_srs1> rpctrace "command with options" > rpctrace.out 2>&1
+ <antrik> well, I'd like to know the "command with options" part :-)
+ <gnu_srs1> OK: for ruby: ./miniruby ./tool/mkconfig.rb as before.
+ <antrik> OK, so it just runs the ruby interpreter and no other processes
+ <gnu_srs1> No other processes involved!
+ <abeaumont> gnu_srs1: i can reproduce the ruby error, now let's dig into it :D
+ <antrik> gnu_srs1: rpctrace for ghc could be useful too... but if it's too
+ long, pasting only the last bit might suffice
+ <gnu_srs1> antrik: OK, will do that. Do you find anything interesting?
+ <gnu_srs1> abeaumont: Using gdb: gdb ./miniruby; (gdb) break io.c:569; c8;
+ break fd.h:72 or break read.c:27 and you are there. Beware of gdb
+ hanging, so you need another terminal to kill -9 gdb (sometimes a reboot
+ is needed :-(
+ <antrik> gnu_srs1: no, the ruby rpctrace is useless; apparently rpctrace
+ makes it break before reaching the relevant part :-(
+ <abeaumont> thanks gnu_srs1
+ <gnu_srs1> antrik: Hope for better luck with ghc:
+ http://pastebin.com/dgcrA05t
+ <antrik> hm... it hangs at proc_dostop() again... whatever that means
+
+2011-05-07
+
+ <gnu_srs> One question about ruby: I know where the problems occur in ruby
+ code. Can I switch to the kernel thread just before in gdb to single step
+ from there?
+ <youpi> you can put a breakpoint, can't you?
+ <antrik> gnu_srs: kernel thread?
+ <gnu_srs> Yes, but will single stepping from there lead me to the Hurd
+ code? I have not succeeded in doing that yet!
+ <youpi> you mean the translator code?
+ <gnu_srs> Well, Roland did call it the signal thread, there are at least
+ two threads per process, a signal thread and a main (user) thread.
+ <youpi> then it's a thread in gdb
+ <youpi> just use the thread gdb commands to access it
+ <gnu_srs> I do find two threads in gdb, yes. But following only the user
+ thread does not lead me to the cause of the problems.
+ <gnu_srs> And following the other (signal thread) has not been successful
+ so far.
+ <youpi> multithreading debugging in gdb is painful yes
+ <youpi> single-step isn't really an option in it
+ <antrik> gnu_srs: well, as I said before, the cause is probably not in the
+ libc code anyways. it would be much more relevant to find out what the FD
+ in question is, and what "special" thing Ruby does to it to trigger the
+ problematic behaviour...
+ <youpi> it's simpler to put printfs etc.
+ <antrik> youpi: well, printf doesn't work in the FD code :-)
+ <youpi> you can make it work
+ <youpi> open /dev/mem, write to 0xb8000
+ <youpi> I'm not even joking
+ <gnu_srs> I have printfs in the ruby code. And at some parts in eglibc (but
+ it is not possible to put them at all places I want, as mentioned before)
+ <antrik> sure, there are ways to debug this code too... but I don't think
+ it's useful. so far there is no indication that this will help finding
+ the actual issue
+ <gnu_srs> The problem is not file descriptors. It is that an ongoing read
+ suddenly returns -1 bytes read. And then the ruby code assigns a negative
+ file descriptor in the exception handling.
+ <youpi> a *read* ?
+ <youpi> with errno == 0 ?
+ <gnu_srs> Yes, a read!
+ <youpi> how does ruby come to assign a negative fd from that?
+ <youpi> does it somehow close the fd?
+ <gnu_srs> The errno reported from the read is EBADF!
+ <youpi> did you try to rpctrace it?
+ <gnu_srs> I don't bother too much about ruby exception handling. The error
+ has already happened in the read operation. And that led me to eglibc
+ code.... and so on...
+ <youpi> do you know what kind of file this fd was supposed to be on?
+ <youpi> sure, that's debugging
+ <gnu_srs> Yes I did rpctrace, but that was not successful. rpctrace just
+ hung! Buggy code?
+ <antrik> youpi: I assume that's Ruby's way to indicate that the FD is not
+ valid anymore, after the previous error
+ <youpi> does the program fork?
+ <youpi> antrik: possibly
+ <youpi> rpctrace has known issues, yes
+ <youpi> gnu_srs: did you trace close()s by hand with printfs?
+ <gnu_srs> How to find out if it forks?
+ <youpi> what does rpctrace stop on ?
+ <gnu_srs> Well, I don't remember. Antrik?
+ <antrik> proc_dostop() IIRC
+ <antrik> or something like that
+ <gnu_srs> I did not find any close() statements in the code I debugged.
+ <youpi> ok, proc_dostop() is typically a sign of fork()
+ <youpi> gnu_srs: that doesn't necessarily mean it's not called
+ <antrik> gnu_srs: I think his point is that something else might close the
+ FD, causing the error you see
+ <youpi> anything can happen in the wild :)
+ <antrik> gnu_srs: as I said before, the next step is to find out what this
+ FD is, and what happens to it...
+ <gnu_srs> antrik: Any ideas how to find out?
+ <youpi> what is the backtrace?
+ <gnu_srs> Well I know the fd number, it is either 3 or 5 in my tests. Does
+ the number matter?
+ <youpi> yes, it's not std{in,out,err}
+ <gnu_srs> How to get a backtrace of a program that does not hang?
+ <youpi> make it hang at the point of failure
+ <youpi> when read returns -1
+ <youpi> so you know who did the read
+ <gnu_srs> I have to run the loop several times before the number of bytes
+ read is -1.
+ <youpi> you mean running the program several times ?
+ <youpi> or just let the loop continue for some time?
+ <pinotree> if it's the latter, you can add breakpoints with conditions
+ <gnu_srs> No the read loop runs for 7 iterations, and fails the 8th time!
+ <youpi> then make it hang when read() returns -1
+ <Mr_Spock> could you paste your code somewhere?
+ <youpi> when debugging, you're allowed to do all kinds of ugly things, you
+ know ;)
+ <gnu_srs> OK, I'll try that.
+ <gnu_srs> Mr_Spock: The easiest way would be to try to build
+ ruby1.9.1. Then I can help you from where it fails.
+ <gnu_srs> pinotree: How to give a breakpoint with a condition?
+ <pinotree> break where if condition
+ <youpi> see help break
+ <youpi> oh, there's even a thread condition nowadays, good
+ <gnu_srs> Thanks for the discussion. I have to get into the real world for
+ a while now. To be continued.
+ <antrik> gnu_srs: well, if you already know that the loop runs several
+ times before the error occurs, you apparently already looked at the
+ higher-level code that is relevant here...
+ <youpi> but it may be generic code, and not tell what calls it
diff --git a/open_issues/system_call_mechanism.mdwn b/open_issues/system_call_mechanism.mdwn
new file mode 100644
index 00000000..5598148c
--- /dev/null
+++ b/open_issues/system_call_mechanism.mdwn
@@ -0,0 +1,17 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach]]
+
+IRC, freenode, #hurd, 2011-05-07
+
+ <braunr> very simple examples: system calls use old call gates, which are
+ the slowest path to kernel space
+ <braunr> modern processors have dedicated instructions now