From 2bc136e680877b6a9d17d6a0e815b47775088d67 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 9 May 2011 10:47:56 +0200 Subject: IRC. --- glibc.mdwn | 4 +- glibc/signal.mdwn | 5 +- glibc/signal/signal_thread.mdwn | 93 ++++++ .../address_space_memory_mapping_entries.mdwn | 19 ++ open_issues/ext2fs_page_cache_swapping_leak.mdwn | 24 ++ open_issues/gnumach_memory_management.mdwn | 73 +++++ open_issues/keymap_mach_console.mdwn | 40 +++ open_issues/pflocal_reauth.mdwn | 39 +++ ...local_socket_credentials_for_local_sockets.mdwn | 4 + open_issues/python.mdwn | 5 + open_issues/rework_gnumach_ipc_spaces.mdwn | 77 +++++ open_issues/select.mdwn | 23 +- open_issues/select_bogus_fd.mdwn | 55 ++++ open_issues/select_vs_signals.mdwn | 25 ++ open_issues/sendmsg_scm_creds.mdwn | 6 +- open_issues/sigpipe.mdwn | 345 +++++++++++++++++++++ open_issues/system_call_mechanism.mdwn | 17 + 17 files changed, 844 insertions(+), 10 deletions(-) create mode 100644 glibc/signal/signal_thread.mdwn create mode 100644 open_issues/address_space_memory_mapping_entries.mdwn create mode 100644 open_issues/keymap_mach_console.mdwn create mode 100644 open_issues/pflocal_reauth.mdwn create mode 100644 open_issues/select_bogus_fd.mdwn create mode 100644 open_issues/select_vs_signals.mdwn create mode 100644 open_issues/sigpipe.mdwn create mode 100644 open_issues/system_call_mechanism.mdwn diff --git a/glibc.mdwn b/glibc.mdwn index c47f3f1f..6c49508f 100644 --- a/glibc.mdwn +++ b/glibc.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2007, 2008, 2010 Free Software Foundation, +[[!meta copyright="Copyright © 2007, 2008, 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable @@ -35,6 +35,8 @@ Porting glibc to a specific architecture is non-trivial. 
* [[open_issues/secure_file_descriptor_handling]] + * [[signal/signal_thread]] + ## Concepts diff --git a/glibc/signal.mdwn b/glibc/signal.mdwn index 67028fef..84153cff 100644 --- a/glibc/signal.mdwn +++ b/glibc/signal.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2009, 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2010, 2011 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -9,7 +10,7 @@ is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] The [[*UNIX signalling mechanism*|unix/signal]] is implemented for the GNU Hurd -by means of a separate *signal-handling [[thread]]* that is part of every +by means of a separate *[[signal_thread]]* that is part of every [[process]]. This makes handling of signals a separate thread of control. * [[SA_SIGINFO, SA_SIGACTION|open_issues/sa_siginfo_sa_sigaction]] diff --git a/glibc/signal/signal_thread.mdwn b/glibc/signal/signal_thread.mdwn new file mode 100644 index 00000000..28855dbd --- /dev/null +++ b/glibc/signal/signal_thread.mdwn @@ -0,0 +1,93 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_documentation]] + + bugs around signals are very tricky + signals are actually the most hairy part of the hurd + and the reason they're asynchronous is that they're handled by a + second thread + (so yes, every process on the hurd has at least two threads) + braunr: How to solve the asynch problem then if every process has + two threads? + the easiest method would be to align ourselves on what most other + Unices do + establish a "signal protocol" between kernel and userspace + with a set of signal info in a table, most likely at the top of + the stack + but this is explicitly what the original Mach developers didn't + want, and they were right IMO + having two threads is very clean, but it creates incompatibilities + with what POSIX requires + so there might be a radical choice to make here + and i doubt we have the resources to make it happen + What is the advantage of having two threads per process, as per + the original design? + it's clean + you don't have to define async-signal-safe functions + it's like using sigwait() yourself in a separate thread, or + multiplexing them through signalfd() + Regardless of the advantages, isn't two threads per process a + waste of resources? + sure it is + but does it really matter ? + mach and the hurd were intended to be "hyperthreaded" + so basically, a thread should consume only a few kernel resources + in GNU Mach, it doesn't even consume a kernel stack because only + continuations are used + and in userspace, it consumes 2 MiB of virtual memory, a few table + entries, and almost no CPU time + What does "hyperthreaded" mean: Do you have a reference? 
+ in this context, it just means there are a lot of threads + even back in the 90s, the expected number of threads could scale + up to the thousand + today, it isn't much impressive any more + but at the time, most systems didn't have LWPs yet + and a process was very expensive + Looks like I have some catching up to do: What is "continuations" + and LWP? Maybe I also need a reference to an overview on multi-threading. + Lightweight process? + http://en.wikipedia.org/wiki/Light-weight_process + svante_: that's a whole computer science domain of its own + yes + LWPs are another names for kernel threads usually + continuations are a facility which allows a thread to store its + state, yield the processor to another thread, and when it's dispatched + again by the scheduler, it can resume with its saved state + most current kernels support kernel preemption though + which means their state is saved based on scheduler decisions + unlike continuations where the thread voluntarily saves its state + if you only have continuations, you can't have kernel preemption, + but you end up with one kernel stack per processor + while the other model allows kernel preemption and requires one + kernel stack per thread + I know resources are limited, but it looks like kernel preemption + would be nice to have. Is that too much for a GSoC student? + it would require a lot of changes in obscure and sensitive parts + of the kernel + and no, kernel preemption is something we don't actually need + even current debian linux kernels are built without kernel + preemption + and considering mach has hard limitations on its physical memory + management, increasing the amount of memory used for kernel stacks would + imply less available memory for the rest of the system + Are these hard limits in mach difficult to change? 
+ yes + consider mach difficult to change + that's actually one of the goals of my stalled project + which I hope to resume by the end of the year :/ + Reading Wikipedia it looks like LWP are "kernel threads" and other + threads are "user threads" at least in IBM/AIX. LWP in Linux is a thread + sharing resources and in SunOS they are "user threads". Which is closest + for Hurd? + i told you + 14:09 < braunr> LWPs are another names for kernel threads usually + Similar to the IBM definition then? Sorry for not remembering + what I've been reading. diff --git a/open_issues/address_space_memory_mapping_entries.mdwn b/open_issues/address_space_memory_mapping_entries.mdwn new file mode 100644 index 00000000..caf447dd --- /dev/null +++ b/open_issues/address_space_memory_mapping_entries.mdwn @@ -0,0 +1,19 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_gnumach]] + +IRC, freenode, #hurd, 2011-05-07 + + and as a last example: memory mapping is heavily used in the hurd, + but for some reason, the map entries in an address space are still on a + linked list + a bare linked list + which makes faults and page cache lookups even slower diff --git a/open_issues/ext2fs_page_cache_swapping_leak.mdwn b/open_issues/ext2fs_page_cache_swapping_leak.mdwn index 607c3af4..c0d0867b 100644 --- a/open_issues/ext2fs_page_cache_swapping_leak.mdwn +++ b/open_issues/ext2fs_page_cache_swapping_leak.mdwn @@ -149,3 +149,27 @@ IRC, freenode, #hurd, 2011-04-18 this make testing this stuff quite a lot harder... [sigh] any suggestions how to debug this hang? antrik: no :/ + +2011-04-28: [[!taglink open_issue_documentation]] + + hm... is it normal that "swap free" doesn't increase as a process' + memory is paged back in? + yes + there's no real use cleaning swap + on the contrary, it makes paging the process out again longer + hm... so essentially, after swapping back and forth a bit, a part + of the swap equal to the size of physical RAM will be occupied with stuff + that is actually in RAM? + yes + so that that RAM can be freed immediately if needed + hm... that means my effective swap size is only like 300 MB... no + wonder I see crashes under load + err... make that 230 actually + indeed, quitting the application freed both the physical RAM and + swap space + 02:28 < antrik> hm... is it normal that "swap free" doesn't + increase as a process' memory is paged back in? 
+ swap is the backing store of anonymous memory, like ext2fs is the + backing store of memory objects created from its pager + so you can view swap as the file system for everything that isn't + an external memory object diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn index 1b897454..a5dd6955 100644 --- a/open_issues/gnumach_memory_management.mdwn +++ b/open_issues/gnumach_memory_management.mdwn @@ -772,3 +772,76 @@ IRC, freenode, #hurd, 2011-04-12: FreeBSD uses a binary buddy system like Linux the fact that the kernel allocator uses virtual memory doesn't mean the kernel has no mean to allocate contiguous physical memory ... + +2011-05-02 + + hm nice, my allocator uses less memory than glibc (squeeze + version) on both 32 and 64 bits systems + the new per-cpu layer is proving effective + braunr: Are you reimplementation malloc? + no + it's still the slab allocator for mach, but tested in userspace + so i wrote malloc wrappers + Oh. + i try to heavily test most of my code in userspace now + it's easier :-) + I agree + even the physical memory allocator has been implemented this way + is this your mach version? + virtual memory allocation will follow + or are you working on gnu mach? + for now it's my version + but i intend to spend the summer working on ipc port names + management + +[[rework_gnumach_IPC_spaces]]. + + and integrate the result in gnu mach + are you keeping the same user-space API? + Or are you experimenting with something new? + braunr: to be fair, it's not terribly hard to use less memory than + glibc :-) + yes + antrik: well ptmalloc3 received some nice improvements + neal: the goal is to rework some of the internals only + neal: namely, i simply intend to replace the splay tree with a + radix tree + braunr: the glibc allocator is emphasising performace, unlike some + other allocators that trade some performance for much better memory + utilisation... + ptmalloc3? 
+ that's the allocator used in glibc + http://www.malloc.de/en/ + OK. haven't seen any recent numbers... the comparision I have in + mind is many years old... + i also made some additions to my avl and red-black trees this week + end, which finally make them suitable for almost all generic uses + the red-black tree could be used in e.g. gnu mach to augment the + linked list used in vm maps + which is what's done in most modern systems + it could also be used to drop the overloaded (and probably over + imbalanced) page cache hash table + +2011-05-03 + + antrik: How should I start porting? Have I just include rbraun's + allocator to gnumach and make it compile? + mcsim: well, basically yes I guess... but you will have to look at + the code in question first before we know anything more specific :-) + I guess braunr might know better how to start, but he doesn't + appear to be here :-( + mcsim: you can't juste put my code into gnu mach and make it run, + it really requires a few careful changes + mcsim: you will have to analyse how the current zone allocator + interacts with regard to locking + if it is used in interrupt handlers + what kind of locks it should use instead of the pthread stuff + available in userspace + you will have to change the reclamiing policy, so that caches are + reaped on demand + (this basically boils down to calling the new reclaiming function + instead of zone_gc()) + you must be careful about types too + there is work to be done ;) + (not to mention the obvious about replacing all the calls to the + zone allocator, and testing/debugging afterwards) diff --git a/open_issues/keymap_mach_console.mdwn b/open_issues/keymap_mach_console.mdwn new file mode 100644 index 00000000..3063dd00 --- /dev/null +++ b/open_issues/keymap_mach_console.mdwn @@ -0,0 +1,40 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted 
to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +IRC, freenode, #hurd, 2011-04-26 + + pavkac: btw are you aware there's already some code to change the + keymap for the mach console (I think originally from the hurdfr guys, but + I cannot remember exactly from where I got it from :/) + pavkac: http://www.hadrons.org/~guillem/tmp/hurd-keymap.tgz + guillem: No, I didn't know. I'll diff it and try to follow. + pavkac: it would be nice to maybe integrate it properly into the + hurd + you'll see the code is pretty basic, so extending it would be + nice too I guess :) + guillem: OK, I'll see to it. Unfortunately I'm quite busy this + week. Have a lot of homeworks to school. :/ + guillem: But, I'll find some time during weekend. + maybe it'd be simpler to add it to the hurd package and use that + from the console-setup package indeed + but copyright issues should be solved + unless we simply put this into hurdextras + ok found this: + http://www.mail-archive.com/debian-hurd@lists.debian.org/msg02456.html + and + http://www.mail-archive.com/debian-hurd@lists.debian.org/msg01173.html + which seems to be the original Mark's code + AFAIR I contributed the the spanish keymap and some additional + key definitions for loadkeys + and http://lists.debian.org/debian-hurd/2000/10/msg00130.html + I've fetched all. :) But I must leave, good night if you're in + Europe. 
:) + pavkac: the tarball I provided should be the latest, the others + are mostly to track the provenance of the source diff --git a/open_issues/pflocal_reauth.mdwn b/open_issues/pflocal_reauth.mdwn new file mode 100644 index 00000000..839e383d --- /dev/null +++ b/open_issues/pflocal_reauth.mdwn @@ -0,0 +1,39 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_glibc open_issue_hurd]] + +IRC, freenode, #hurd, 2011-04-02 + + youpi: i'm playing with pflocal, and noticing that a simple C + executable doesn't trigger reauthenticate + youpi: i've put a debug output (to file) in S_io_reauthenticate, + and with a simple C test (which uses unix sockets) it isn't called + pinotree: it seems pflocal should return FS_RETRY_REAUTH in + retry_type + to make glibc call reauthentication + pflocal? + yes, in the dir_lookup handler + isn't that ext2fs? + libtrivfs had dir_lookup() too + trivfs_check_open_hook can be used to tweak its behavior + ah, missed that pflocal was using libtrivfs, sorry + there are probably very few translators which don't use one of the + lib*fs :) + pinotree: what are you trying to do with pflocal? + local socket scredentials (SCM_CREDS) + ah + don't really know what that is, but I remember reading some + mention of it ;-) + +--- + +See also [[pflocal_socket_credentials_for_local_sockets]] and +[[sendmsg_scm_creds]]. 
diff --git a/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn b/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn index 5a71412e..dfdc213c 100644 --- a/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn +++ b/open_issues/pflocal_socket_credentials_for_local_sockets.mdwn @@ -40,3 +40,7 @@ IRC, freenode, #hurd, 2011-03-28 S_io_reauthenticate cached in the sock_user struct? yes nice thanks, i will try that change first + +--- + +See also [[pflocal_reauth]] and [[sendmsg_scm_creds]]. diff --git a/open_issues/python.mdwn b/open_issues/python.mdwn index 34fa81f6..403ff8aa 100644 --- a/open_issues/python.mdwn +++ b/open_issues/python.mdwn @@ -27,6 +27,11 @@ First, make the language functional, have its test suite pass without errors. [[!inline pages=community/gsoc/project_ideas/perl_python feeds=no]] + +## Analysis + + * [[select_bogus_fd]] + --- diff --git a/open_issues/rework_gnumach_ipc_spaces.mdwn b/open_issues/rework_gnumach_ipc_spaces.mdwn index 5bf0c530..b7cda227 100644 --- a/open_issues/rework_gnumach_ipc_spaces.mdwn +++ b/open_issues/rework_gnumach_ipc_spaces.mdwn @@ -10,6 +10,14 @@ License|/fdl]]."]]"""]] [[!tag open_issue_gnumach]] +IRC, freenode, #hurd, 2011-05-07 + + things that are referred to as "system calls" in glibc are + actually RPCs to the kernel or other tasks, those RPCs have too lookup + port rights + the main services have tens of thousands of ports, looking up one + is slow + There is a [[!FF_project 268]][[!tag bounty]] on this task. IRC, freenode, #hurd, 2011-04-23 @@ -241,3 +249,72 @@ IRC, freenode, #hurd, 2011-04-23 so a radix ree would be the most efficient well, if some processes really feel they must use random numbers for port names, they *ought* to be penalized ;-) + +2011-04-27 + + antrik: remember when you asked why high numbers would be a + problem with radix trees ? 
+ here is a radix tree with one entry, which key is around 5000 + [ 656.296412] tree height: 3 + [ 656.296412] index: 0, level: 0, height: 3, count: 1, + bitmap: 0000000000000002 + [ 656.296412] index: 1, level: 1, height: 2, count: 1, + bitmap: 0000000000004000 + [ 656.296412] index: 14, level: 2, height: 1, count: 1, + bitmap: 0000000000000080 + three levels, each with an external node (dynamically allocated), + for one entry + so in the worst case of entries with keys close to the highest + values, the could be many external nodes with higher paths lengths than + when keys are close to 0 + which also brings the problem of port name allocation + can someone with access to a buildd which has an uptime of at + least a few days (and did at least one build) show me the output of + portinfo 3 | tail ? + port names are allocated linearly IIRC, like PIDs, and some parts + of the kernel may rely on them not being reused often + but for maximum effifiency, they should be + efficiency* + 00:00 < braunr> can someone with access to a buildd which has an + uptime of at least a few days (and did at least one build) show me the + output of portinfo 3 | tail ? + :) + it's almost like wc -l + 4905: receive + vs 4647 + for / + 52902: receive + vs 52207 + for the chroot + even after several builds ? + and several days ? + that's after 2 days + it's not so many builds + rossini is not so old + (7h) + but many builds + 70927: send + vs 70938 + ok + so it seems port names are reused + good + yes they are clearly + i think i remember a comment about why the same port name + shouldn't be reused too soon + well, it could help catching programming errors + that it helped catch bugs in applications that could + deallocate/reallote quickly + reallocate* + without carefuly synchronization + careful + damn, i'm tired :/ + but that's about debugging + so we don't care about performance there + yes + i'll try to improve allocation performance too + using e.g. 
bitmaps in each external node back to the root so that + unused slots are quickly found + i thknk that's what idr does in linux + braunr: idr? + antrik: a data structure used to map integers to pointers + http://fxr.watson.org/fxr/source/lib/idr.c?v=linux-2.6 diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn index ab6af90b..0f750631 100644 --- a/open_issues/select.mdwn +++ b/open_issues/select.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -12,12 +12,23 @@ License|/fdl]]."]]"""]] There are a lot of reports about this issue, but no thorough analysis. ---- + +# `elinks` IRC, unknown channel, unknown date. - This is related to ELinks... I've looked at the select() implementation for the Hurd in glibc and it seems that giving it a short timeout could cause it not to report that file descriptors are ready. - It sends a request to the Mach port of each file descriptor and then waits for responses from the servers. - Even if the file descriptors have data for reading or are ready for writing, the server processes might not respond immediately. - So if I want ELinks to check which file descriptors are ready, how long should the timeout be in order to ensure that all servers can respond in time? + This is related to ELinks... I've looked at the select() + implementation for the Hurd in glibc and it seems that giving it a short + timeout could cause it not to report that file descriptors are ready. + It sends a request to the Mach port of each file descriptor and + then waits for responses from the servers. + Even if the file descriptors have data for reading or are ready + for writing, the server processes might not respond immediately. 
+ So if I want ELinks to check which file descriptors are ready, how + long should the timeout be in order to ensure that all servers can + respond in time? Or do I just imagine this problem? + +--- + +See also [[select_bogus_fd]] and [[select_vs_signals]]. diff --git a/open_issues/select_bogus_fd.mdwn b/open_issues/select_bogus_fd.mdwn new file mode 100644 index 00000000..17aced4a --- /dev/null +++ b/open_issues/select_bogus_fd.mdwn @@ -0,0 +1,55 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_glibc]] + + +# Python + +IRC, freenode, #hurd, 2011-04-13 + + ok, cause of first python testsuite failure located, now the + hard part, how to best fix it :) + how to redesign the code to avoid the problem... that's the + hard part, mostly cause i lack contextual info + tschwinge: the problem is pretty much summarized by this + comment in _hurd_select (in glibc): /* If one descriptor is bogus, we + fail completely. */ + does POSIX say anything about what to do if one fd is invalid? + and the other question is why python is calling select() with an + invalid fd + pochu: yep, it says it should not fail completelly + then that's our bug :) + abeaumont: just note that (at least on debian) some tests may + hang forever or cause hurd/mach to die + abeaumont: see in the debian/rules of the packaging of each + pythonX.Y source + ... 
there's a list of the tests excluded from the test suite run + well, to be precise, python has a configure check for + 'broken_poll' which hurd fails, and therefore python's select module is + not built, and anything depending on it fails + broken_poll checks exactly for that posix requirement + the reason for python using a non-existent + descriptor... unknown :D + we should fix select to not fail miserably in that case + abeaumont: we have a patch to fix the broken poll check to + actually disable the poll module + pinotree: but the proper fix is to fix select(), which is what + abeaumont is looking at + pinotree: i'd say that's exactly what python's configure check + does itself -- disable building the select module + abeaumont: what pinotree means is that the check is broken, see + http://patch-tracker.debian.org/patch/series/view/python2.6/2.6.6-8/hurd-broken-poll.diff + yes, the configure check for poll does the check, but not + everything of the poll module gets disabled (and you get a build failure) + +--- + +See also [[select]] and [[select_vs_signals]]. diff --git a/open_issues/select_vs_signals.mdwn b/open_issues/select_vs_signals.mdwn new file mode 100644 index 00000000..bbd69d00 --- /dev/null +++ b/open_issues/select_vs_signals.mdwn @@ -0,0 +1,25 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_glibc]] + + +# `sudo` + +`sudo [task]` hangs after finishing `[task]`. 
+ +IRC, freenode, #hurd, 2011-04-02 + + the sudo bug is select() not being able to get interrupted by + signals + +--- + +See also [[select]] and [[select_bogus_fd]]. diff --git a/open_issues/sendmsg_scm_creds.mdwn b/open_issues/sendmsg_scm_creds.mdwn index 1f4de59c..2deec7e8 100644 --- a/open_issues/sendmsg_scm_creds.mdwn +++ b/open_issues/sendmsg_scm_creds.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -89,3 +89,7 @@ IRC, unknown channel, unknown date. (since it's just about letting the application reading from the message structure) yep ok, good :) + +--- + +See also [[pflocal_socket_credentials_for_local_sockets]] and [[pflocal_reauth]]. diff --git a/open_issues/sigpipe.mdwn b/open_issues/sigpipe.mdwn new file mode 100644 index 00000000..0df3560e --- /dev/null +++ b/open_issues/sigpipe.mdwn @@ -0,0 +1,345 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_glibc open_issue_hurd]] + +[[!GNU_Savannah_bug 461]] + +IRC, freenode, #hurd, 2011-04-20 + + I found a problem from 2002 by Marcus Brinkmann that I think is + related to my problems: http://savannah.gnu.org/bugs/?461. He has a test + file called pipetest.c that shows that SIGPIPE is not triggered reliably. 
+ Cited from the bug report: The attached test program does not + trigger SIGPIPE reliably, because closing the read end of the pipe + happens asynchronously. The write can succeed because the read end might + not have been closed yet. + I have debugged this program on both Hurd and Linux, and the + problem in Hurd remains:-( + Anybody looked into the almost ten year old + bug:http://savannah.gnu.org/bugs/?461 this one is definitely related to + the build problems of e.g. ghc6 and ruby1.9.1. Should I mention this on + the ML? + that could be it indeed + does th bug still happen? + depends on: new interface io_close + which depends on: POSIX record locking + youpi: Yes it does, I've tested the pipetest.c file submitted by + Marcus B on both Linux and Hurd + that would've maybe been a nice GSOC task + azeem: err, the contrary for posix record locking, non ? + argh + why would POSIX record locking depend on this? + well anyway, then have POSIX record locking be a GSOC task :) + I wasn't aware that would also fix ruby and ghc building :) + http://permalink.gmane.org/gmane.os.hurd.devel.readers/265 + (for io_close stuff) + http://comments.gmane.org/gmane.os.hurd.devel.readers/63 actually + I guess if they didn't implement it/agreed on something back then + it'd be quite hard to do it now + azeem: marcus recently showed up here. Maybe he can help out/has + ideas? + well yeah + but marcus was the junior guy back then + but it's a very hurdish solution (ie, complex, buggy, and + not implemented) + maybe we can go for something simpler + azeem: what is this quote about? + don't remember + not io_close I'd say + +2011-04-21 + + svante_: why do you think the problem you see in ruby and ghc is + related to async close() ? + +2011-04-22 + + Well: the test case I'm running on ruby is giving me an EBADF + after 8 successful loops, and tracing within eglibc points towards + __mutex_lock_solid or __spin_lock, __spin_lock_solid from + mach/lock-intern.h from cthreads. 
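For reference, the behaviour described in the bug report quoted above (a write to a pipe whose read end has already been closed) can be sketched as follows. This is only an illustration and not Marcus Brinkmann's `pipetest.c`; the function name `write_after_reader_closed` is hypothetical. It assumes a system where closing the read end takes effect synchronously, in which case the write should fail with `EPIPE`. According to the report, on the Hurd the write may instead succeed, because the close happens asynchronously.

```python
import errno
import os
import signal


def write_after_reader_closed(payload: bytes = b"x") -> int:
    """Close a pipe's read end, then write to it.

    Returns 0 if the write succeeded, otherwise the errno of the failure.
    (Hypothetical helper, for illustration only.)
    """
    # Ignore SIGPIPE so the failure is reported as EPIPE
    # instead of terminating the process.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_end, write_end = os.pipe()
    try:
        os.close(read_end)
        try:
            os.write(write_end, payload)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        os.close(write_end)
        # Restore the earlier disposition if Python knows what it was.
        if previous is not None:
            signal.signal(signal.SIGPIPE, previous)


if __name__ == "__main__":
    code = write_after_reader_closed()
    print(errno.errorcode.get(code, code) if code else "write succeeded")
```

A result of 0 here would correspond to the unreliable case the bug report describes.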
+ +2011-04-23 + + srs1: yeah, I saw it... but I still wonder what makes you think + this is related to async FD closing? + antrik: Every test case showing the problems are related to fd.h and + the functions there, especially the ones used in the function: + _HURD_FD_H_EXTERN_INLINE struct hurd_fd *_hurd_fd_get (int fd) and so is + the pipetest from Marcus too. + I have not yet been able to trace further with gdb since most + variables are optimized out and adding print statements does not work, at + least not yet. Now I'm trying to build eglibc with -O1 to see if the + optimized out variables are there or not. + srs1: he means the ghc6 issue + (and the ruby issue) + youpi: Yes, the ghc6 and ruby ends at the functions I mentioned in + fd,h + Both ghc6 and ruby programs are writing to a file when the error + happens. If they are using a pipeline or not I don't know yet, I think it + is a regular file write. + I can send your the ruby program if you like: It is a c-file so + debugging is possible. ghc6 is worse, since that program cannot be + debugged directly with gdb. + pipetest also results in the program hanging in locking stuff?... + pipetest does not hang, but gives no output as it should. Running it + in gdb with single stepping shows the correct behavior, but then gdb + hangs if I try to single stepping further, continue at the right place + works! + I haven't looked at the pipetest program. do you have the link + handy? + never mind, got it + srs1: that sounds like a GDB problem... + most probably, yes + (and I've always seen issues like this in gdb on hurd) + actually I think it's expected... the RPC handling code has some + explicit GDB hooks AIUI; trying to single-step into this code is probably + expected to wreck havoc... + well, it should have some sane behavior + even if it's "skip to next point where it's debuggable" + srs1: note that there is no BADF involved in the pipetest AIUI... + +2011-04-28 + + what is the actual problem you are seeing BTW? 
+ antrik: in ruby the problem is: Exception `IOError' at
+ tool/mkconfig.rb:12 - closed stream
+ Triggered by ruby:io.c:internal_read_func() calling
+ sysdeps/mach/hurd/read.c returning a negative number of bytes read.
+ gnu_srs1: why do you think that error is locking related?
+ This happens after 8 iterations of the read loop with 8192 bytes
+ read each time.
+ but that doesn't involve locking at all, does it?
+ I think it does, if there is a pipeline set up??
+ Also, the ghc6 hang ends up in sysdeps/mach/hurd/read.c,
+ traced into fd.h where all things happen (including setting locks and
+ mutexes)
+ what locking ?
+ stdio locking is different from file locking
+ and a pipe doesn't imply file locking at all
+ read may block on pipes, but it's unrelated to flock
+ Look into the file fd.h, maybe you can describe things
+ better. I'm not fluent in this stuff.
+ Does a pipe have a file descriptor associated with it? What about a
+ file read/write?
+ a pipe provides 2 file descriptors, one for reading and another
+ one for writing
+ i may give a look at that if i manage to build glibc
+ successfully...
+ Take a look at the relevant code from fd.h:
+ http://pastebin.com/kesBpjy4
+ the ruby error happens just trying to build ruby1.9?
+ gnu_srs1: from what you said, the error occurs while reading,
+ so i don't see how it can be related to that code
+ you already got a descriptor if you're reading from it
+ I have not tried anything other than ruby1.9.1. I can send you
+ the ruby debug setup and files if you are interested.
+ gnu_srs1: ok, i'll try to build ruby1.9.1 later... let's see if
+ i can build glibc first
+ abeaumont: well, the read suddenly returns -1 bytes read,
+ resulting in a file descriptor of -1 (instead of +3).
+ gnu_srs1: i see
+ gnu_srs1: are you sure the hang really happens in _hurd_fd_get()?
+ could you give us a backtrace?
+ gnu_srs1: there are many reasons why read() can return -1; errno
+ should indicate the reason.
+ unfortunately, I can't make much out of
+ ruby's "translation" of the error :-)
+ antrik: In the ruby case there is no hang: The stream is closed
+ by read() giving an error code != 0. This triggers things in the ruby
+ code: A negative number of bytes read and a negative fd results, and an
+ error is triggered in the ruby code.
+ antrik: See http://pastebin.com/eZmaZQJr
+ gnu_srs1: yes, this all sounds perfectly right. the question is
+ *why* read() returns an error code. we'd need to know what error it is
+ exactly, and in what situation it occurs. tracing the libc code is not at
+ all useful here
+ uhm... 1073741833 is errno?...
+ BTW: I think the error code is EBADF (bad file descriptor?). The
+ integer version of it is 1073741833, see the pastebin i linked to.
+ you could use perror() to get something more readable :-)
+ or error() with the right arguments
+ I used integer when printing, but looking into fd.h I think it
+ is EBADF (I did get this result once in gdb)
+ fd.h won't tell you anything. most error codes are generated by
+ the server, not by libc
+ BADF might be generated in libc when ruby tries to read on FD -1
+ (no idea why it tries to do that... perhaps there is actually
+ something wrong/stupid in ruby's error handling)
+ Well I single-stepped in fd.h using gdb and printing err gave
+ EBADF. err is declared as: error_t err in read.c
+ at which point did you single-step? while fd was still 3?
+ I don't think the problem is in ruby, it is in mach/hurd!
+ Similar problems with ghc, python-apt, etc
+ Yes, fd=3 was not changed. I cannot trace into fd.h from
+ read.c. That is the problem with all cases! Need to leave for a while
+ now.
+ sorry, I don't see *anything* similar in the ghc failure.
+ I don't know about python-apt
+ for the ghc case, I'd like to see a GDB backtrace from the point
+ where it is hanging
+ just to be clear: anything I/O-related will involve fd.h
+ somewhere. that doesn't in any way indicate the problems are related.
+ in fact the symptoms you described are very different, and I'm pretty
+ certain these are completely different issues
+ antrik: Here is a backtrace,
+ http://pastebin.com/wvCiXFjB. Numbers 6,7,8 are from the calling Haskell
+ functions. They cannot be debugged by gdb. Nice to see that somebody is
+ showing interest at last :-/
+ hm... I wonder whether the _hurd_intr_rpc_msg_in_trap is a result
+ of the ^C?
+ if so, it seems to be a "normal" blocking read() operation. so
+ again probably not related to libc code at all
+ Where is this blocking read() code located in mach/hurd?
+ io_read() is implemented by whatever server handles the FD in
+ question
+ I guess rpctrace will be more helpful here than GDB... to see what
+ the program is trying to do here
+ Why don't I get there with gdb?
+ err... the server is a different process
+ you are only tracing the client code
+ OK, here is a rpctrace for ruby:
+ http://pastebin.com/sdPiKGBW.Nice programs you have, no manual pages, and
+ the program hangs
+ s/http://pastebin.com/sdPiKGBW.Nice
+ /http://pastebin.com/sdPiKGBW. BTW: Nice/
+ antrik: Do you want the rpctrace of the ghc hang too? If that is
+ the case, do you need the whole file? From the ruby case the last part
+ looked most interesting:
+ libpthread/sysdeps/generic/pt-mutex-timedlock.c: assert (mutex->owner !=
+ self);
+ gnu_srs1: hm... you get that assertion only with rpctrace? guess
+ it doesn't work properly then :-(
+ Is it visible on the client side?
+ gnu_srs1: that assertion *is* from the client side. I'm just
+ surprised that apparently it's only triggered when you run it in rpctrace
+ how did you invoke rpctrace?
+ rpctrace "command with options" > rpctrace.out 2>&1
+ well, I'd like to know the "command with options" part :-)
+ OK: for ruby: ./miniruby ./ tool/mkconfig.rb as before.
+ OK, so it just runs the ruby interpreter and no other processes
+ No other processes involved!
+ gnu_srs1: i can reproduce the ruby error, now let's dig into it :D
+ gnu_srs1: rpctrace for ghc could be useful too... but if it's too
+ long, pasting only the last bit might suffice
+ antrik: OK, will do that. Do you find anything interesting?
+ abeaumont: Using gdb: gdb ./miniruby; (gdb) break io.c:569; c8;
+ break fd.h:72 or break read.c:27 and you are there. Beware of gdb
+ hanging, so you need another terminal to kill -9 gdb (sometimes a reboot
+ is needed :-( )
+ gnu_srs1: no, the ruby rpctrace is useless; apparently rpctrace
+ makes it break before reaching the relevant part :-(
+ thanks gnu_srs1
+ antrik: Hope for better luck with ghc:
+ http://pastebin.com/dgcrA05t
+ hm... it hangs at proc_dostop() again... whatever that means
+
+2011-05-07
+
+ One question about ruby: I know where the problems occur in ruby
+ code. Can I switch to the kernel thread just before in gdb to single step
+ from there?
+ you can put a breakpoint, can't you?
+ gnu_srs: kernel thread?
+ Yes, but will single stepping from there lead me to the Hurd
+ code? I have not succeeded in doing that yet!
+ you mean the translator code?
+ Well, Roland did call it the signal thread, there are at least
+ two threads per process, a signal thread and a main (user) thread.
+ then it's a thread in gdb
+ just use the thread gdb commands to access it
+ I do find two threads in gdb, yes. But following only the user
+ thread does not lead me to the cause of the problems.
+ And following the other (the signal thread) has not been successful
+ so far.
+ multithreading debugging in gdb is painful yes
+ single-step isn't really an option in it
+ gnu_srs: well, as I said before, the cause is probably not in the
+ libc code anyways. it would be much more relevant to find out what the FD
+ in question is, and what "special" thing Ruby does to it to trigger the
+ problematic behaviour...
+ it's simpler to put printfs etc.
+ youpi: well, printf doesn't work in the FD code :-)
+ you can make it work
+ open /dev/mem, write to 0xb8000
+ I'm not even joking
+ I have printfs in the ruby code. And at some parts in eglibc (but
+ it is not possible to put them at all places I want, as mentioned before)
+ sure, there are ways to debug this code too... but I don't think
+ it's useful. so far there is no indication that this will help find
+ the actual issue
+ The problem is not file descriptors. It is that an ongoing read
+ suddenly returns -1 bytes read. And then the ruby code assigns a negative
+ file descriptor in the exception handling.
+ a *read* ?
+ with errno == 0 ?
+ Yes, a read!
+ how does ruby come to assign a negative fd from that?
+ does it somehow close the fd?
+ The errno reported from the read is EBADF!
+ did you try to rpctrace it?
+ I don't bother too much about ruby exception handling. The error
+ has already happened in the read operation. And that led me to eglibc
+ code.... and so on...
+ do you know what kind of file this fd was supposed to be on?
+ sure, that's debugging
+ Yes I did rpctrace, but that was not successful. rpctrace just
+ hung! Buggy code?
+ youpi: I assume that's Ruby's way to indicate that the FD is not
+ valid anymore, after the previous error
+ does the program fork?
+ antrik: possibly
+ rpctrace has known issues, yes
+ gnu_srs: did you trace close()s by hand with printfs?
+ How to find out if it forks?
+ what does rpctrace stop on ?
+ Well, I don't remember. Antrik?
+ proc_dostop() IIRC
+ or something like that
+ I did not find any close() statements in the code I debugged.
+ ok, proc_dostop() is typically a sign of fork()
+ gnu_srs: that doesn't necessarily mean it's not called
+ gnu_srs: I think his point is that something else might close the
+ FD, causing the error you see
+ anything can happen in the wild :)
+ gnu_srs: as I said before, the next step is to find out what this
+ FD is, and what happens to it...
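The suspicion voiced above is that something else closes the FD, so that a later read() fails with EBADF. The C sketch below reproduces that scenario in isolation and assumes nothing about what Ruby actually does; `read_reporting_errors` and `errno_after_foreign_close` are hypothetical names. It also prints the error through strerror(), in the spirit of the earlier perror() suggestion, instead of as a raw integer. As far as I know, Hurd errno values carry extra high bits, which would explain why EBADF appears as 1073741833 (0x40000009) in the log.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read() that reports failures in readable form.
 * Returns whatever read() returned and leaves errno intact. */
ssize_t read_reporting_errors(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1) {
        int saved_errno = errno;   /* fprintf may clobber errno */
        fprintf(stderr, "read(fd=%d) failed: %s (errno=%d)\n",
                fd, strerror(saved_errno), saved_errno);
        errno = saved_errno;
    }
    return n;
}

/* Reads once successfully, then closes the descriptor "behind the
 * reader's back" and reads again.
 * Returns the errno seen by the second read, 0 if that read
 * unexpectedly succeeded, or -1 if the setup itself failed. */
int errno_after_foreign_close(void)
{
    int fds[2];
    char byte;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "x", 1) != 1 ||
        read_reporting_errors(fds[0], &byte, 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Stands in for some other code path closing the FD. */
    close(fds[0]);

    ssize_t n = read_reporting_errors(fds[0], &byte, 1);
    int observed = (n == -1) ? errno : 0;

    close(fds[1]);
    return observed;
}
```

If the failing read in miniruby behaves like the second read here, the interesting question is the one already raised in the log: which code path closed the descriptor.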
+ antrik: Any ideas how to find out?
+ what is the backtrace?
+ Well I know the fd number, it is either 3 or 5 in my tests. Does
+ the number matter?
+ yes, it's not std{in,out,err}
+ How to get a backtrace of a program that does not hang?
+ make it hang at the point of failure
+ when read returns -1
+ so you know who did the read
+ I have to run the loop several times before the number of bytes
+ read is -1.
+ you mean running the program several times ?
+ or just let the loop continue for some time?
+ if it's the latter, you can add breakpoints with conditions
+ No the read loop runs for 7 iterations, and fails the 8th time!
+ then make it hang when read() returns -1
+ could you paste your code somewhere?
+ when debugging, you're allowed to do all kinds of ugly things, you
+ know ;)
+ OK, I'll try that.
+ MR_Spock: The easiest way would be to try to build
+ ruby1.9.1. Then I can help you from where it fails.
+ pinotree: How to give a breakpoint with a condition?
+ break where if condition
+ see help break
+ oh, there's even a thread condition nowadays, good
+ Thanks for the discussion. I have to get into the real world for
+ a while now. To be continued.
+ gnu_srs: well, if you already know that the loop runs several
+ times before the error occurs, you apparently already looked at the
+ higher-level code that is relevant here...
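The advice above is to make the program hang where read() returns -1, so that a debugger can show who issued the read. One possible way to do this from C is a wrapper that stops the process on failure. This is only a sketch: `read_stop_on_error` and the `STOP_ON_READ_ERROR` environment variable are hypothetical names, and the log does not confirm how well attaching gdb to a stopped process works on the Hurd.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical debugging wrapper around read().
 * On failure it logs the pid, the descriptor and the error text.
 * If the (made-up) environment variable STOP_ON_READ_ERROR is set,
 * it then stops the whole process with SIGSTOP, so a debugger can be
 * attached and a backtrace taken at the point of failure. */
ssize_t read_stop_on_error(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1) {
        int saved_errno = errno;   /* keep it safe across the calls below */
        fprintf(stderr, "pid %ld: read(fd=%d) failed: %s\n",
                (long) getpid(), fd, strerror(saved_errno));
        if (getenv("STOP_ON_READ_ERROR") != NULL)
            raise(SIGSTOP);        /* execution resumes here after SIGCONT */
        errno = saved_errno;
    }
    return n;
}
```

With the variable set, the process stops at the first failing read; attaching with `gdb -p` and the printed pid, then running `bt`, should show the caller. Without the variable the wrapper only logs and returns.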
+ but it may be generic code, and not tell what calls it
diff --git a/open_issues/system_call_mechanism.mdwn b/open_issues/system_call_mechanism.mdwn
new file mode 100644
index 00000000..5598148c
--- /dev/null
+++ b/open_issues/system_call_mechanism.mdwn
@@ -0,0 +1,17 @@
+[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_gnumach]]
+
+IRC, freenode, #hurd, 2011-05-07
+
+ very simple examples: system calls use old call gates, which are
+ the slowest path to kernel space
+ modern processors have dedicated instructions now
--
cgit v1.2.3