From 5bd36fdff16871eb7d06fc26cac07e7f2703432b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Thu, 29 Nov 2012 01:33:22 +0100
Subject: IRC.

---
 open_issues/synchronous_ipc.mdwn | 121 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/open_issues/synchronous_ipc.mdwn b/open_issues/synchronous_ipc.mdwn
index 57bcdda7..53d5d69d 100644
--- a/open_issues/synchronous_ipc.mdwn
+++ b/open_issues/synchronous_ipc.mdwn
@@ -62,3 +62,124 @@ From [[Genode RPC|microkernel/genode/rpc]].
     well, if you see places where blocking is done but failing would be more
     appropriate, try changing them I'd say...
     it's not that easy :/
+
+
+# IRC, freenode, #hurd, 2012-08-18
+
+    what is the deepest design mistake of the HURD/gnumach?
+    lcc: async ipc
+    braunr: You mentioned that moving to L4 will create problems. Can you name some, please?
+    I thought it was going to be faster on L4
+    the problem is that l4 *only* provides sync ipc
+    so implementing async communication would require one separate thread for each instance of async communication
+    But you said that the deepest design mistake of Hurd is asynch ipc.
+    not the hurd, mach
+    and hurd depends on it now
+    i said l4 provides *only* sync ipc
+    systems require async communication tools
+    but they shouldn't be built entirely on top of them
+    Hmm, so you mean mach has bad asynch ipc?
+    you can consider mach and l4 as two extremes in os design
+    mach *only* has async ipc
+    what was viengoos trying to explore?
+    * savask is confused
+    lcc: half-sync ipc :)
+    lcc: i can't tell you more on that, i need to understand it better myself before any explanation attempt
+    You say that mach problem is asynch ipc. And L4's problem is its sync ipc. That means problems are in either of them!
+    exactly
+    how did apple resolve issues with mach?
+    What is perfect then? A "golden middle"?
+    lcc: they have migrating threads, which make most rpc behave as if they used sync ipc
+    savask: nothing is perfect :p
+    braunr: but why async ipc is the problem?
+    mcsim: it requires in-kernel buffering
+    braunr: Yes, but we can't have problems everywhere o_O
+    mcsim: this not only reduces communication performance, but creates many resource usage problems
+    mcsim: and potential denial of service, which is what we experience most of the time when something in the hurd fails
+    savask: there are problems we can live with
+    braunr: But this could be replaced by userspace server, isn't it?
+    savask: this is what monolithic kernels do
+    mcsim: what ?
+    mcsim: this would be the same, this central buffering server would suffer from the same kind of issue
+    braunr: async ipc. Buffer can hold special server
+    But there could be created several servers, and queue could have limit.
+    queue limits are a problem
+    when a queue limit is reached, you either block (= sync ipc) or lose a message
+    to keep messaging reliable, mach makes senders block
+    the problem is that async ipc is often used to avoid blocking
+    so blocking when you don't expect it can create deadlocks
+    savask: a good compromise is to use sync ipc most of the time, and async ipc for a few special cases, like signals
+    this is what okl4 does if i'm right
+    i'm not sure of the details, but like many other projects they realized current systems simply need good support for async ipc, so they extended l4 or something on top of it to provide it
+    it took years of research for very smart people to get to some consensus like "sync ipc is better but async is needed too"
+    personally i don't like l4 :/
+    really not
+    braunr: Anyway there is some queue for messaging, but at the moment if it overflows panics kernel. And with limited queue servers will panic.
+    mcsim: it can't overflow
+    mach blocks senders
+    queuing basically means "block and possible deadlock" or "lose messages and live with it"
+    So, deadlocks are still possible?
+    of course
+    have a look at the libpager debian patch and the discussion around it
+    it's a perfect example
+    braunr: it makes gnu mach slow as hell sometimes, which I guess is because all threads (which can be 1000s) wake at the same time
+    youpi: you mean are created ?
+    because they'll have to wake in any case
+    i can understand why creating lots of threads is slower, but cthreads never destroys kernel threads
+    doesn't seem to be a mach problem, rather a cthreads one
+    i hope we're able to remove the patch after pthreads are used
+
+[[libpthread]].
+
+    braunr: You state that hurd can't move to sync ipc, since it depends on async ipc. But at the same time async ipc doesn't guarantee that task wouldn't block. So, I don't understand why limited queues will lead to more deadlocks?
+    mcsim: async ipc can block because of queue limits
+    mcsim: if you remove the limit, you remove the deadlock problem, and replace it with denial of service
+    mcsim: i didn't say the hurd can't move to sync ipc
+    mcsim: i said it came to depend on async ipc as provided by mach, and we would need to change that
+    and it's tricky
+    braunr: no, I really mean are woken. The timeout which gets dropped by the patch makes threads wake after some time, to realize they should go away. It's a hell long when all these threads wake at the same time (because they got created at the same time)
+    ahh
+
+    savask: what is perfect regarding IPC is something nobody can really answer... there are competing opinions on that matter.
+    but we know by now that the Mach model is far from ideal, and that the (original) L4 model is also problematic -- at least for implementing a UNIX-like system
+    personally, if i'd create a system now, i'd use sync ipc for almost everything, and implement posix-like signals in the kernel
+    that's one solution, it's not perfect
+    savask: actually the real answer may be "no one knows for now and it still requires work and research"
+    so for now, we're using mach
+    savask: regarding IPC, the path explored by Viengoos (and briefly Coyotos) seems rather promising to me
+    savask: and yes, I believe that whatever direction we take, we should do so by incrementally reworking Mach rather than jumping to a completely new microkernel...
--
cgit v1.2.3
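
The queue-limit trade-off discussed in the log (once a bounded message queue is full, the sender either blocks or the message is lost) can be sketched roughly as follows. This is a minimal Python illustration built on the standard `queue` module, not Mach code; the names `send_blocking`, `send_lossy` and `port`, the limit of 2 and the demo timeout are all assumptions made for the example.

```python
import queue


def send_blocking(port, msg, timeout=None):
    """Reliable send: wait for room in the queue (the 'block' choice).

    With timeout=None the caller waits forever if nobody drains the queue,
    which is how an unexpected block can turn into a deadlock.
    """
    port.put(msg, block=True, timeout=timeout)


def send_lossy(port, msg):
    """Non-blocking send: never waits, but drops the message when full."""
    try:
        port.put_nowait(msg)
        return True
    except queue.Full:
        return False


# Hypothetical message queue with a limit of 2 messages.
port = queue.Queue(maxsize=2)
send_lossy(port, "a")
send_lossy(port, "b")

# Queue is now full: the lossy sender returns at once and the message is gone.
dropped = not send_lossy(port, "c")

# The blocking sender would wait indefinitely; the timeout exists only so
# that this demonstration terminates.
try:
    send_blocking(port, "c", timeout=0.1)
    blocked = False
except queue.Full:
    blocked = True
```

Removing the limit (`maxsize=0`) makes both senders always succeed, at the cost of unbounded buffering, which corresponds to the denial-of-service concern raised in the discussion.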