[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_documentation open_issue_gnumach]]


# IRC, freenode, #hurd, 2012-06-29

    I do not understand what are the deficiencies of Mach, the content I find on this is vague...
    the major problems are that the IPC architecture offers poor performance; and that resource usage can not be properly accounted to the right parties
    antrik: the more i study it, the more i think ipc isn't the problem when it comes to performance, not directly
    i mean, the implementation is a bit heavy, yes, but it's fine
    the problems are resource accounting/scheduling and still too much stuff inside kernel space
    and with a very good implementation, the performance problem would come from crossing address spaces
    (and even more on SMP, i've been thinking about it lately, since it would require syncing mmu state on each processor currently using an address space being modified)
    braunr: the problem with Mach IPC is that it requires too many indirections to ever be performant AIUI
    antrik: can you mention them ?
    the semantics are generally quite complex, compared to Coyotos for example, or even Viengoos
    antrik: the semantics are related to the message format, which can be simplified
    i think everybody agrees on that
    i'm more interested in the indirections
    but then it's not Mach IPC anymore :-)
    right
    22:03 < braunr> i mean, the implementation is a bit heavy, yes, but it's fine
    that's not an implementation issue
    that's what i meant by heavy :)
    well, yes and no
    Mach IPC have changed over time
    it would be newer Mach IPC ... :)
    the fact that data types are (supposed to be) transparent to the kernel is a major part of the concept, not just an implementation detail
    but it's not just the message format
    transparent ?
    but they're not :/
    the option to buffer in the kernel also adds a lot of complexity
    buffer in the kernel ?
    ah you mean message queues
    yes
    braunr: eh? the kernel parses all the type headers during transfer
    yes, so it's not transparent at all
    maybe you have a different understanding of "transparent" ;-)
    i guess
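
To make the "type headers" point concrete: a Mach message carries a descriptor in front of every data item, and the kernel walks all of them during the transfer to find port rights and out-of-line memory, while an untyped design only gives the kernel a small fixed header. The structures below are a paraphrased sketch of that difference, not the exact GNU Mach declarations (see mach/message.h for those), and the untyped variant is entirely hypothetical.

```c
/* Paraphrased sketch of a Mach-style per-item type descriptor, next to
 * a minimal untyped header of the kind simpler IPC designs use.  Not
 * the actual GNU Mach definitions. */

struct typed_item_descriptor {
    unsigned int name       : 8;   /* what the item is: port right, integer, ... */
    unsigned int size       : 8;   /* size of one element, in bits */
    unsigned int number     : 12;  /* number of elements that follow */
    unsigned int is_inline  : 1;   /* data stored inline, or a pointer to it */
    unsigned int longform   : 1;   /* a larger descriptor follows instead */
    unsigned int deallocate : 1;   /* deallocate the sender's copy */
    unsigned int unused     : 1;
};
/* A message body is a sequence of (descriptor, data) pairs, and the
 * kernel has to inspect every descriptor on every transfer. */

struct untyped_message {
    unsigned long label;           /* meaning defined purely by userspace */
    unsigned long words[8];        /* opaque payload, copied blindly */
    /* port rights/capabilities, if any, go in a separate fixed-size
     * array that is the only part the kernel interprets */
};
```
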
    I think most of the other complex semantics are kinda related to the in-kernel buffering...
    i fail to see why :/
    well, it allows port rights to be destroyed while a message is in transfer. a lot of semantics revolve around what happens in that case
    yes but it doesn't affect performance a lot
    sure it does. it requires a lot of extra code and indirections
    not a lot of it
    "a lot" is quite a relative term :-)
    compared to L4 for example, it *is* a lot
    and those indirections (i think you refer to more branching here) are taken only when appropriate, and can be isolated, improved through locality, etc..
    the features they add are also huge
    L4 is clearly insufficient
    all current L4 forks have added capabilities ..
    (that, with the formal verification, make se4L one of the "hottest" recent system projects)
    seL4*
    yes, but with very few extra indirection I think...
    similar to EROS (which claims to have IPC almost as efficient as the original L4)
    possibly
    I still fail to see much real benefit in formal verification :-)
    but compared to other problems, this added code is negligible
    antrik: for a microkernel, me too :/
    the kernel is already so small you can simply audit it :)
    no, it's not negligible, if you go from say two cache lines touched per IPC (original L4) to dozens (Mach)
    every additional variable that needs to be touched to resolve some indirection, check some condition adds significant overhead
    if you compare the dozens to the huge amount of inter processor interrupts you get each time you change the kernel map, it's next to nothing ..
    change the kernel map? not sure what you mean
    syncing address spaces on hundreds of processors each time you send a message is a real scalability issue here (as an example), where Mach to L4 IPC seem like microoptimization
    braunr: modify, you mean?
    yes
    (not switch)
    but that's only one example
    yes, modify, not switch
    also, we could easily get rid of the ihash library
    making the message provide the address of the object associated to a receive right
    so the only real indirection is the capability, like in other systems
    and yes, buffering adds a bit of complexity
    there are other optimizations that could be made in mach, like merging structures to improve locality
    "locality"?
    having rights close to their target port when there are only a few
    pinotree: locality of reference for cache efficiency
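
The ihash remark is easier to see with a sketch. Today a Hurd server hashes the port name found in each incoming message back to the object implementing it (that is what libihash is used for); if the receive right itself carried a server-chosen payload that the kernel stamps into every delivered message, that lookup disappears. The following is purely hypothetical code illustrating the idea, not existing GNU Mach or Hurd source, and all names are invented.

```c
/* Hypothetical sketch: the receive right stores an opaque payload set
 * by the server (typically the address of the object behind the port),
 * and the kernel copies it into the header of every message delivered
 * through that right, so the server dispatches without a name->object
 * hash lookup. */

struct object {
    int refcount;
    /* ...server-specific state... */
};

struct receive_right {
    /* ...message queue, references, etc... */
    unsigned long payload;          /* opaque value chosen by the server */
};

struct delivered_header {
    unsigned long payload;          /* filled in by the kernel on delivery */
    unsigned long msg_id;
    /* ...message body follows... */
};

/* Server-side dispatch becomes a cast instead of a hash lookup. */
static struct object *server_lookup(const struct delivered_header *h)
{
    return (struct object *) h->payload;
}
```
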
    hundreds of processors? let's stay realistic here :-)
    i am ..
    a microkernel based system is also a very good environment for RCU
    (i yet have to understand how liburcu actually works on linux)
    I'm not interested in systems for supercomputers. and I doubt desktop machines will get that many independent cores any time soon. we still lack software that could even remotely exploit that
    hum, the glibc build system ? :>
    lol
    we have done a survey over the nix linux distribution
    quite few packages actually benefit from a lot of cores
    and we already know them :)
    what i'm trying to say is that, whenever i think or even measure system performance, both of the hurd and others, i never actually see the IPC as being the real performance problem
    there are many other sources of overhead to overcome before getting to IPC
    I completely agree
    and with the advent of SMP, it's even more important to focus on contention
    (also, 8 cores aren't exactly a lot...)
    antrik: s/8/7/ , or even 6 ;)
    braunr: it depends a lot on the use case. most of the problems we see in the Hurd are probably not directly related to IPC performance; but I'm pretty sure some are (such as X being hardly usable with UNIX domain sockets)
    antrik: these have more to do with the way mach blocks than IPC itself
    similar to the ext2 "sleep storm"
    a lot of overhead comes from managing ports (for for example), which also mostly comes down to IPC performance
    antrik: yes, that's the main indirection
    antrik: but you need such management, and the related semantics in the kernel interface
    (although i wonder if those should be moved away from the message passing call)
    you mean a different interface for kernel calls than for IPC to other processes? that would break transparency in a major way. not sure we really want that...
    antrik: no
    antrik: i mean calls specific to right management
    admittedly, transparency for port management is only useful in special cases such as rpctrace, and that probably could be served better with dedicated debugging interfaces...
    antrik: i.e. not passing rights inside messages
    passing rights inside messages is quite essential for a capability system. the problem with Mach IPC in regard to that is that the message format allows way more flexibility than necessary in that regard...
    antrik: right
    antrik: i don't understand why passing rights inside messages is important though
    antrik: essential even
    braunr: I guess he means you need at least one way to pass rights
    braunr: well, for one, you need to pass a reply port with each RPC request...
    youpi: well, as he put, the message passing call is overpowered, and this leads to many branches in the code
    antrik: the reply port is obvious, and can be optimized
    antrik: but the case i worry about is passing references to objects between tasks
    antrik: rights and identities with the auth server for example
    antrik: well ok forget it, i just recall how it actually works :)
    antrik: don't forget we lack thread migration
    antrik: you may not think it's important, but to me, it's a major improvement for RPC performance
    braunr: how can seL4 be the most interesting microkernel then?... ;-)
    antrik: hm i don't know the details, but if it lacks thread migration, something is wrong :p
    antrik: they should work on viengoos :)
    (BTW, AIUI thread migration is quite related to passive objects -- something Hurd folks never dared seriously consider...)
    i still don't know what passive objects are, or i have forgotten it :/
    no own control threads
    hm, i'm still missing something
    what do you refer to by control thread ?
    with*
    i.e. no main loop etc.; only activated by incoming calls
    ok
    well, if i'm right, thomas bushnell himself wrote (recently) that the ext2 "sleep" performance issue was expected to be solved with thread migration
    so i guess they definitely considered having it
    braunr: don't know what the "sleep performance issue" is...
    http://lists.gnu.org/archive/html/bug-hurd/2011-12/msg00032.html
    antrik: also, the last message in the thread, http://lists.gnu.org/archive/html/bug-hurd/2011-12/msg00050.html
    antrik: do you consider having a reply port being an avoidable overhead ?
    braunr: not sure. I don't remember hearing of any capability system doing this kind of optimisation though; so I guess there are reasons for that...
    antrik: yes me too, even more since neal talked about it on viengoos
    I wonder whether thread management is also such a large overhead with fully sync IPC, on L4 or EROS for example...
    antrik: it's still a very handy optimization for thread scheduling
    antrik: it makes solving priority inversions a lot easier
    actually, is thread scheduling a problem at all with a thread activation approach like in Viengoos?
    antrik: thread activation is part of thread migration
    antrik: actually, i'd say they both refer to the same thing
    err... scheduler activation was the term I wanted to use
    same
    well
    scheduler activation is too vague to assert that
    antrik: do you refer to scheduler activations as described in http://en.wikipedia.org/wiki/Scheduler_activations ?
    my understanding was that Viengoos still has traditional threads; they just can get scheduled directly on incoming IPC
    braunr: that Wikipedia article is strange. it seems to use "scheduler activations" as a synonym for N:M multithreading, which is not at all how I understood it
    antrik: I used to try to keep a look at those pages, to fix such wrong things, but left it
    antrik: that's why i ask
    IIRC Viengoos has a thread associated with each receive buffer.
    after copying the message, the kernel would activate the process's activation handler, which in turn could decide to directly schedule the thread associated with the buffer
    or something along these lines
    antrik: that's similar to mach handoff
    antrik: generally enough, all the thread-related pages on wikipedia are quite bogus
    nah, handoff just schedules the process; which is not useful, if the right thread isn't activated in turn...
    antrik: but i think it's more than that, even in viengoos
    for instance, the french "thread" page was basically saying that they were invented for GUIs to overlap computation with user interaction .. :)
    youpi: good to know...
    antrik: the "misunderstanding" comes from the fact that scheduler activations is the way N:M threading was implemented on netbsd
    youpi: that's a refreshing take on the matter... ;-)
    antrik: i'll read the critique and viengoos doc/source again to be sure about what we're talking :)
    antrik: as threading is a major issue in mach, and one of the things i completely changed (and intend to change) in x15, whenever i get to work on that again ..... :)
    antrik: interestingly, the paper about scheduler activations was written (among others) by brian bershad, in 92, when he was actively working on research around mach
    braunr: BTW, I have little doubt that making RPC first-class would solve a number of problems... I just wonder how many others it would open


# IRC, freenode, #hurd, 2012-09-04

    X15
    it was intended as a mach clone, but now that i have better knowledge of both mach and the hurd, i don't want to retain mach compatibility
    and unlike viengoos, it's not really experimental
    it's focused on memory and cpu scalability, and performance, with techniques like thread migration and rcu
    the design i have in mind is closer to what exists today, with strong emphasis on scalability and performance, that's all
    and the reason the hurd can't be modified first is that my design relies on some important design changes
    so there is a strong dependency on these mechanisms that requires the kernel to exist first


## IRC, freenode, #hurd, 2012-09-06

In context of [[open_issues/multithreading]] and later [[open_issues/select]].

    And you will address the design flaws or implementation faults with x15?
    no
    i'll address the implementation details :p
    and some design issues like cpu and memory resource accounting
    but i won't implement generic resource containers
    assuming it's completed, my work should provide a hurd system on par with modern monolithic systems (less performant of course, but performant, scalable, and with about the same kinds of problems)
    for example, thread migration should be mandatory
    which would make client calls behave exactly like a userspace task asking a service from the kernel
    you have to realize that, on a monolithic kernel, applications are clients, and the kernel is a server
    and when performing a system call, the calling thread actually services itself by running kernel code
    which is exactly what thread migration is for a multiserver system
    thread migration also implies sync IPC
    and sync IPC is inherently more performant because it only requires one copy, no in kernel buffering
    sync ipc also avoids message floods, since client threads must run server code
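
To make the analogy with a system call concrete, here is a hypothetical sketch of a migrating, synchronous RPC path: the client thread, keeping its own scheduling context, crosses into the server and runs the service routine itself, and the arguments are copied exactly once with no kernel-side queueing. None of this is Mach or x15 API; all names are invented and the helpers are stubs so the sketch stands alone.

```c
#include <stddef.h>

/* Hypothetical sketch of a migrating, synchronous call.  Invented names. */

struct server_object {
    /* entry point the server registered for this capability */
    long (*invoke)(struct server_object *self, const void *args, size_t len);
};

struct capability {
    struct server_object *object;   /* kernel-side binding of the capability */
};

/* In a real kernel these would translate the capability and copy the
 * arguments across address spaces; they are stubs here. */
static struct server_object *capability_object(struct capability *cap)
{
    return cap->object;
}

static const void *copy_args_once(const void *args, size_t len)
{
    (void) len;
    return args;                     /* the single copy, elided in the stub */
}

long sync_call(struct capability *cap, const void *args, size_t len)
{
    struct server_object *obj = capability_object(cap);
    const void *server_args = copy_args_once(args, len);

    /* "Thread migration": the client's thread, with its priority and
     * time slice, continues execution in the server, the same way a
     * thread entering the kernel for a system call services itself. */
    return obj->invoke(obj, server_args, len);
}
```
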
    and this is not achievable with evolved gnumach and/or hurd?
    well that's not entirely true, because there is still a form of async ipc, but it's a lot less likely
    it probably is
    but there are so many things to change
    i prefer starting from scratch
    scalability itself probably requires a revamp of the hurd core libraries
    and these libraries are like more than half of the hurd code
    mach ipc and vm are also very complicated
    it's better to get something new and simpler from the start
    a major task nevertheless :-D
    at least with the vm, netbsd showed it's easier to achieve good results from new code, as other mach vm based systems like freebsd struggled to get as good
    well yes
    but at least it's not experimental
    everything i want to implement already exists, and is tested on production systems
    it's just time to assemble those ideas and components together into something that works
    you could see it as a qnx-like system with thread migration, the global architecture of the hurd, and some improvements from linux like rcu :)


### IRC, freenode, #hurd, 2012-09-07

    braunr: thread migration is tested on production systems?
    BTW, I don't think that generally increasing the priority of servers is a good idea
    in most cases, IPC should actually be sync. slpz looked at it at some point, and concluded that the implementation actually has a fast-path for that case. I wonder what happens to scheduling in this case -- is the receiver scheduled immediately?
    if not, that's something to fix...
    antrik: qnx does something very close to thread migration, yes
    antrik: i agree increasing the priority isn't a good thing, but it's the best of the quick and dirty ways to reduce message floods
    the problem isn't sync ipc in mach
    the problem is the notifications (in our cases the dead name notifications) that are by nature async
    and a malicious program could send whatever it wants at the fastest rate it can
    braunr: malicious programs can do any number of DOS attacks on the Hurd; I don't see how increasing priority of system servers is relevant in that context
    (BTW, I don't think dead name notifications are async by nature... just like for most other IPC, the *usual* case is that a server thread is actively waiting for the message when it's generated)
    antrik: it's async with respect to the client
    antrik: and malicious programs shouldn't be able to do that kind of dos
    but this won't be fixed any time soon
    on the other hand, a higher priority helps servers not create too many threads because of notifications, and that's a good thing
    gnu_srs: the "fix" for this will be to rewrite select so that it's synchronous
    btw
    replacing dead name notifications with something like cancelling a previously installed select request
    no idea what "async with respect to the client" means
    it means the client doesn't wait for anything
    what is the client? what scenario are you talking about? how does it affect scheduling?
    for notifications, it's usually the kernel
    it doesn't directly affect scheduling
    it affects the amount of messages a hurd server has to take care of
    and the more messages, the more threads
    i'm talking about event loops and non blocking (or very short) selects
    the amount of messages is always the same. the question is whether they can be handled before more come in. which would be the case if by default the receiver gets scheduled as soon as a message is sent...
    no
    scheduling handoff doesn't imply the thread will be ready to service the next message by the time a client sends a new one
    the rate at which a message queue gets filled has nothing to do with scheduling handoff
    I very much doubt rates come into play at all
    well they do
    in my understanding the problem is that a lot of messages are sent before the receiver ever has a chance to handle them. so no matter how fast the receiver is, it loses a lot
    a lot of non blocking selects means a lot of reply ports destroyed, a lot of dead name notifications, and what i call message floods at server side
    no
    it used to work fine with cthreads
    it doesn't any more with pthreads because pthreads are slightly slower
    if the receiver gets a chance to do some work each time a message arrives, in most cases it would be free to service the next request with the same thread
    no, because that thread won't have finished soon enough
    no, it *never* worked fine. it might have been slightly less terrible.
    ok it didn't work fine, it worked ok
    it's entirely a matter of rate here
    and that's the big problem, because it shouldn't
    I'm pretty sure the thread would finish before the time slice ends in almost all cases
    no
    too much contention
    and in addition locking a contended spin lock depresses priority
    so servers really waste a lot of time because of that
    I doubt contention would be a problem if the server gets a chance to handle each request before 100 others come in
    i don't see how this is related
    handling a request doesn't mean entirely processing it
    there is *no* relation between handoff and the rate of incoming messages
    unless you assume threads can always complete their task in some fixed and low duration
    sure there is. we are talking about a single-processor system here.
    which is definitely not the case
    i don't see what it changes
    I'm pretty sure notifications can generally be handled in a very short time
    if the server thread is scheduled as soon as it gets a message, it can also get preempted by the kernel before replying
    no, notifications can actually be very long
    hurd_thread_cancel calls condition_broadcast
    so if there are a lot of threads on that ..
    (this is one of the optimizations i have in mind for pthreads, since it's possible to precisely select the target thread with a doubly linked list)
    but even if that's the case, there is no guarantee
    you can't assume it will be "quick enough"
    there is no guarantee. but I'm pretty sure it will be "quick enough" in the vast majority of cases. which is all it needs.
    ok
    that's also the idea behind raising server priorities
    braunr: so you are saying the storms are all caused by select(), and once this is fixed, the problem should be mostly gone and the workaround not necessary anymore?
    yes
    let's hope you are right :-)
    :)
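
A hypothetical sketch of the wakeup optimization mentioned above: if every waiter is linked into the condition's queue with a doubly linked node, a cancellation can unlink and wake exactly the target thread, where condition_broadcast wakes every waiter just so the right one can notice. This is not the actual libpthread code; the locking and wakeup primitives are placeholders so the sketch stands alone.

```c
#include <stddef.h>

/* Placeholder primitives, not real kernel or libpthread interfaces. */
struct thread;
struct lock { int dummy; };
static void lock_acquire(struct lock *l) { (void) l; }
static void lock_release(struct lock *l) { (void) l; }
static void thread_wakeup(struct thread *t) { (void) t; }

struct waiter {
    struct waiter *prev, *next;
    struct thread *thread;          /* the blocked thread itself */
};

struct condition {
    struct lock    lock;
    struct waiter *waiters;         /* doubly linked list of blocked threads */
};

/* Wake one specific waiter, e.g. the thread being cancelled: an O(1)
 * unlink followed by a single wakeup, with no thundering herd. */
void condition_wake_one(struct condition *cond, struct waiter *w)
{
    lock_acquire(&cond->lock);
    if (w->prev != NULL)
        w->prev->next = w->next;
    else
        cond->waiters = w->next;
    if (w->next != NULL)
        w->next->prev = w->prev;
    w->prev = w->next = NULL;
    lock_release(&cond->lock);

    thread_wakeup(w->thread);
}
```
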
    (I still think though that making hand-off scheduling default is the right thing to do, and would improve performance in general...)
    sure
    well no
    it's just a hack ;p
    but it's a right one
    the right thing to do is a lot more complicated
    as roland wrote a long time ago, the hurd doesn't need dead-name notifications, or any notification other than the no-sender (which can be replaced by a synchronous close on fd like operation)
    well, yes... I still think the viengoos approach is promising. I meant the right thing to do in the existing context ;-)
    better than this priority hack
    oh? you happen to have a link? never heard of that...
    i didn't want to do it initially, even resorting to priority depression on thread creation to work around the problem
    hm maybe it wasn't him, i can't manage to find it
    antrik: http://lists.gnu.org/archive/html/l4-hurd/2003-09/msg00009.html
    "Long ago, in specifying the constraints of what the Hurd needs from an underlying IPC system/object model we made it very clear that we only need no-senders notifications for object implementors (servers)"
    "We don't in general make use of dead-name notifications, which are the general kind of object death notification Mach provides and what serves as task death notification."
    "In the places we do, it's to serve some particular quirky need (and mostly those are side effects of Mach's decouplable RPCs) and not a semantic model we insist on having."


### IRC, freenode, #hurd, 2012-09-08

    The notion that seemed appropriate when we thought about these issues for Fluke was that the "alert" facility be a feature of the IPC system itself rather than another layer like the Hurd's io_interrupt protocol.
    braunr: funny, that's *exactly* what I was thinking when looking at the io_interrupt mess :-)
    (and what ultimately convinced me that the Hurd could be much more elegant with a custom-tailored kernel rather than building around Mach)


## IRC, freenode, #hurd, 2012-09-24

    my initial attempt was a mach clone
    but now i want a mach-like kernel, without compatibility
    which new licence ?
    and some very important changes like sync ipc
    gplv3 (or later)
    cool 8)
    yes
    it was gplv2+ since i didn't take the time to read gplv3, but now that i have, i can't use anything else for such a project :)
    what is mach-like ? (how it is different from Pistachio like ?)
    l4 doesn't provide capabilities
    hmmm..
    you need a userspace for that
    +server
    how much work is done ?
    my kernel will provide capabilities, similar to mach ports, but simpler (less overhead)
    i want the primitives right
    like multiprocessor, synchronization, virtual memory, etc..


### IRC, freenode, #hurd, 2012-09-30

    for those interested, x15 is now a project of its own, with no gnumach compatibility goal, and covered by gplv3+
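
As a closing illustration of what "capabilities, similar to mach ports, but simpler" could mean, here is a hypothetical interface sketch: a plain rights mask instead of distinct right types, and a single synchronous invocation primitive instead of the many options of mach_msg. This is not x15 code; every name below is invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a pared-down, Mach-port-like capability. */

struct cap_space;                       /* per-task table of capabilities */

struct capability {
    void     *object;                   /* kernel or server object it names */
    uint32_t  rights;                   /* send, receive, ... as plain flags */
    uint32_t  refs;
};

/* One synchronous invocation primitive instead of the mach_msg option
 * surface: no typed descriptors, no kernel-side message queues, at most
 * a small fixed array of capabilities passed alongside the payload. */
long cap_invoke(struct cap_space *space, unsigned long cap_index,
                const void *request, size_t request_len,
                void *reply, size_t reply_len);
```
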