38 files changed, 836 insertions, 657 deletions
diff --git a/community/gsoc/project_ideas/disk_io_performance.mdwn b/community/gsoc/project_ideas/disk_io_performance.mdwn index b6f223c8..ae634709 100644 --- a/community/gsoc/project_ideas/disk_io_performance.mdwn +++ b/community/gsoc/project_ideas/disk_io_performance.mdwn @@ -11,6 +11,8 @@ License|/fdl]]."]]"""]] [[!meta title="Disk I/O Performance Tuning"]] +[[!tag open_issue_hurd]] + The most obvious reason for the Hurd feeling slow compared to mainstream systems like GNU/Linux, is a low I/O system performance, in particular very slow hard disk access. @@ -18,8 +20,8 @@ slow hard disk access. The reason for this slowness is lack and/or bad implementation of common optimization techniques, like scheduling reads and writes to minimize head movement; effective block caching; effective reads/writes to partial blocks; -[[reading/writing multiple blocks at once|clustered_page_faults]]; and -[[read-ahead]]. The +[[reading/writing multiple blocks at once|open_issues/performance/io_system/clustered_page_faults]]; and +[[open_issues/performance/io_system/read-ahead]]. The [[ext2_filesystem_server|hurd/translator/ext2fs]] might also need some optimizations at a higher logical level. @@ -29,12 +31,12 @@ requires understanding the data flow through the various layers involved in disk access on the Hurd ([[filesystem|hurd/virtual_file_system]], [[pager|hurd/libpager]], driver), and general experience with optimizing complex systems. That said, the killing feature we are definitely -missing is the [[read-ahead]], and even a very simple implementation would bring +missing is the [[open_issues/performance/io_system/read-ahead]], and even a very simple implementation would bring very big performance speedups. 
Here are some real testcases: - * [[binutils_ld_64ksec]]; + * [[open_issues/performance/io_system/binutils_ld_64ksec]]; * running the Git testsuite which is mostly I/O bound; diff --git a/community/gsoc/project_ideas/dtrace.mdwn b/community/gsoc/project_ideas/dtrace.mdwn index f70598ca..6261c03e 100644 --- a/community/gsoc/project_ideas/dtrace.mdwn +++ b/community/gsoc/project_ideas/dtrace.mdwn @@ -11,6 +11,8 @@ License|/fdl]]."]]"""]] [[!meta title="Kernel Instrumentation"]] +[[!tag open_issue_gnumach]] + One of the main problems of the current Hurd implementation is very poor [[open_issues/performance]]. While we have a bunch of ideas what could cause the performance problems, these are mostly just guesses. Better understanding what really diff --git a/community/gsoc/project_ideas/libdiskfs_locking.mdwn b/community/gsoc/project_ideas/libdiskfs_locking.mdwn index 74937389..faac8bd9 100644 --- a/community/gsoc/project_ideas/libdiskfs_locking.mdwn +++ b/community/gsoc/project_ideas/libdiskfs_locking.mdwn @@ -11,6 +11,8 @@ License|/fdl]]."]]"""]] [[!meta title="Fix libdiskfs Locking Issues"]] +[[!tag open_issue_hurd]] + Nowadays the most often encountered cause of Hurd crashes seems to be lockups in the [[hurd/translator/ext2fs]] server. One of these could be traced recently, and turned out to be a lock inside [[hurd/libdiskfs]] that was taken @@ -20,19 +22,19 @@ faulty paths causing these lockups. The task is systematically checking the [[hurd/libdiskfs]] code for this kind of locking issues. To achieve this, some kind of test harness has to be implemented: For example instrumenting the code to check locking correctness constantly at -runtime. Or implementing a [[unit testing]] framework that explicitly checks +runtime. Or implementing a [[open_issues/unit_testing]] framework that explicitly checks locking in various code paths. (The latter could serve as a template for implementing unit tests in other parts of the Hurd codebase...) 
-(A [[systematic code review|security]] would probably suffice to find the +(A [[systematic code review|open_issues/security]] would probably suffice to find the existing locking issues; but it wouldn't document the work in terms of actual code produced, and thus it's not suitable for a GSoC project...) This task requires experience with debugging locking issues in -[[multithreaded|multithreading]] applications. +[[multithreaded|open_issues/multithreading]] applications. -Tools have been written for automated [[code analysis]]; these can help to +Tools have been written for automated [[open_issues/code_analysis]]; these can help to locate and fix such errors. Possible mentors: Samuel Thibault (youpi) diff --git a/community/gsoc/project_ideas/procfs.mdwn b/community/gsoc/project_ideas/procfs.mdwn index d4760aae..b4d1dc5f 100644 --- a/community/gsoc/project_ideas/procfs.mdwn +++ b/community/gsoc/project_ideas/procfs.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2008, 2009 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2008, 2009, 2011 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -43,3 +44,8 @@ Exercise: Add or fix one piece in the existing procfs translator. *Status*: Madhusudan.C.S has implemented a new, fully functional [[procfs|madhusudancs]] for GSoC 2008. He is still working on some outstanding issues. + +--- + +Note that [[jkoenig's `procfs` re-write|hurd/translator/procfs/jkoenig]] should +address all these issues already. diff --git a/community/gsoc/project_ideas/testing_framework.mdwn b/community/gsoc/project_ideas/testing_framework.mdwn index 0448bc6b..ff9899d9 100644 --- a/community/gsoc/project_ideas/testing_framework.mdwn +++ b/community/gsoc/project_ideas/testing_framework.mdwn @@ -36,7 +36,7 @@ before the end of the summer. 
(As a bonus, in addition to these explicit tests, it would be helpful to integrate some methods -for testing [[locking validity|open_issues/locking]], +for testing [[locking validity|libdiskfs_locking]], performing static code analysis etc.) This task probably requires some previous experience diff --git a/community/gsoc/project_ideas/testing_framework/discussion.mdwn b/community/gsoc/project_ideas/testing_framework/discussion.mdwn new file mode 100644 index 00000000..872d0eb7 --- /dev/null +++ b/community/gsoc/project_ideas/testing_framework/discussion.mdwn @@ -0,0 +1,270 @@ +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +freenode, #hurd channel, 2011-03-05: + + <nixness> what about testing though? + <nixness> like sort of "what's missing? lets write tests for it so that + when someone gets to implementing it, he knows what to do. Repeat" + project + <antrik> you mean creating an automated testing framework? + <antrik> this is actually a task I want to add for this year, if I get + around to it :-) + <nixness> yeah I'd very much want to apply for that one + <nixness> cuz I've never done Kernel hacking before, but I know that with + the right tasks like "test VM functionality", I would be able to write up + the automated tests and hopefully learn more about what breaks/makes the + kernel + <nixness> (and it would make implementing the feature much less hand-wavy + and more correct) + <nixness> antrik, I believe the framework(CUnit right?) 
exists, but we just + need to write the tests. + <antrik> do you have prior experience implementing automated tests? + <nixness> lots of tests! + <nixness> yes, in Java mostly, but I've played around with CUnit + <antrik> ah, great :-) + <nixness> here's what I know from experience: 1) write basic tests. 2) + write ones that test multiple features 3) stress test [option 4) + benchmark and record to DB] + <youpi> well, what we'd rather need is to fix the issues we already know + from the existing testsuites :) + +[[GSoC project proposal|community/gsoc/project_ideas/testsuites]]. + + <nixness> youpi, true, and I think I should check what's available in way + of tests, but if the tests are "all or nothing" then it becomes really + hard to fix your mistakes + <youpi> they're not all or nothing + <antrik> youpi: the existing testsuites don't test specific system features + <youpi> libc ones do + <youpi> we could also check posixtestsuite which does too + +[[open_issues/open_posix_test_suite]]. + + <antrik> AFAIK libc has very few failing tests + +[[open_issues/glibc_testsuite]]. + + <youpi> err, like twenty? + <youpi> € grep -v '^#' expected-results-i486-gnu-libc | wc -l + <youpi> 67 + <youpi> nope, even more + <antrik> oh, sorry, I confused it with coreutils + <pinotree> plus the binutils ones, i guess + <youpi> yes + +[[open_issues/binutils#weak]]. + + <antrik> anyways, I don't think relying on external testsuites for + regression testing is a good plan + <antrik> also, that doesn't cover unit testing at all + <youpi> why ? + <youpi> sure there can be unit testing at the translator etc. level + <antrik> if we want to implement test-driven development, and/or do serious + refactoring without too much risk, we need a test harness where we can + add specific tests as needed + <youpi> but more than often, the issues are at the libc / etc. level + because of a combination of things at the translator level, which unit + testing won't find out + * nixness yewzzz!
+ <nixness> sure unit testing can find them out. if they're good "unit" tests + <youpi> the problem is that you don't necessarily know what "good" means + <youpi> e.g. for posix correctness + <youpi> since it's not posix + <nixness> but if they're composite clever tests, then you lose that + granularity + <nixness> youpi, is that a blackbox test intended to be run at the very end + to check if you're posix compliant? + <antrik> also, if we have our own test harness, we can run tests + automatically as part of the build process, which is a great plus IMHO + <youpi> nixness: "that" = ? + <nixness> oh nvm, I thought there was a test stuie called "posix + correctness" + <youpi> there's the posixtestsuite yes + <youpi> it's an external one however + <youpi> antrik: being external doesn't mean we can't invoke it + automatically as part of the build process when it's available + <nixness> youpi, but being internal means it can test the inner workings of + certain modules that you are unsure of, and not just the interface + <youpi> sure, that's why I said it'd be useful too + <youpi> but as I said too, most bugs I've seen were not easy to find out at + the unit level + <youpi> but rather at the libc level + <antrik> of course we can integrate external tests if they exist and are + suitable. but that that doesn't preclude adding our own ones too. in + either case, that integration work has to be done too + <youpi> again, I've never said I was against internal testsuite + <antrik> also, the major purpose of a test suite is checking known + behaviour. a low-level test won't directly point to a POSIX violation; + but it will catch any changes in behaviour that would lead to one + <youpi> what I said is that it will be hard to write them tight enough to + find bugs + <youpi> again, the problem is knowing what will lead to a POSIX violation + <youpi> it's long work + <youpi> while libc / posixtestsuite / etc. 
already do that + <antrik> *any* unexpected change in behaviour is likely to cause bugs + somewher + <youpi> but WHAT is "expected" ? + <youpi> THAT is the issue + <youpi> and libc/posixtessuite do know that + <youpi> at the translator level we don't really + <youpi> see the recent post about link() + +[link(dir,name) should fail with +EPERM](http://lists.gnu.org/archive/html/bug-hurd/2011-03/msg00007.html) + + <youpi> in my memory jkoenig pointed it out for a lot of such calls + <youpi> and that issue is clearly not known at the translator level + <nixness> so you're saying that the tests have to be really really + low-level, and work our way up? + <youpi> I'm saying that the translator level tests will be difficult to + write + <antrik> why isn't it known at the translator level? if it's a translator + (not libc) bug, than obviously the translator must return something wrong + at some point, and that's something we can check + <youpi> because you'll have to know all the details of the combinations + used in libc, to know whether they'll lead to posix issues + <youpi> antrik: sure, but how did we detect that that was unexpected + behavior? + <youpi> because of a glib test + <youpi> at the translator level we didn't know it was an unexpected + behavior + <antrik> gnulib actually + <youpi> and if you had asked me, I wouldn't have known + <antrik> again, we do *not* write a test suite to find existing bugs + <youpi> right, took one for the other + <youpi> doesn't really matter actually + <youpi> antrik: ok, I don't care then + <antrik> we write a test suite to prevent future bugs, or track status of + known bugs + <youpi> (don't care *enough* for now, I mean) + <nixness> hmm, so write something up antrik for GSoC :) and I'll be sure to + apply + <antrik> now that we know some translators return a wrong error status in a + particular situation, we can write a test that checks exactly this error + status. 
that way we know when it is fixed, and also when it breaks again + <antrik> nixness: great :-) + <nixness> sweet. that kind of thing would also need a db backend + <antrik> nixness: BTW, if you have a good idea, you can send an application + for it even if it's not listed among the proposed tasks... + <antrik> so you don't strictly need a writeup from me to apply for this :-) + <nixness> antrik, I'll keep that in mind, but I'll also be checking your + draft page + <nixness> oh cool :) + <antrik> (and it's a well known fact that those projects which students + proposed themselfs tend to be the most successful ones :-) ) + * nixness draft initiated + <antrik> youpi: BTW, I'm surprised that you didn't mention libc testsuite + before... working up from there is probably a more effective plan than + starting with high-level test suites like Python etc... + <youpi> wasn't it already in the gsoc proposal? + <youpi> bummer + <antrik> nope + +freenode, #hurd channel, 2011-03-06: + + <nixness> how's the hurd coding workflow, typically? + +*nixness* -> *foocraft*. + + <foocraft> we're discussing how TDD can work with Hurd (or general kernel + development) on #osdev + <foocraft> so what I wanted to know, since I haven't dealt much with GNU + Hurd, is how do you guys go about coding, in this case + <tschwinge> Our current workflow scheme is... well... is... + <tschwinge> Someone wants to work on something, or spots a bug, then works + on it, submits a patch, and 0 to 10 years later it is applied. + <tschwinge> Roughly. + <foocraft> hmm so there's no indicator of whether things broke with that + patch + <foocraft> and how low do you think we can get with tests? A friend of mine + was telling me that with kernel dev, you really don't know whether, for + instance, the stack even exists, and a lot of things that I, as a + programmer, can assume while writing code break when it comes to writing + kernel code + <foocraft> Autotest seems promising + +See autotest link given above. 
+ + <foocraft> but in any case, coming up with the testing framework that + doesn't break when the OS itself breaks is hard, it seems + <foocraft> not sure if autotest isolates the mistakes in the os from + finding their way in the validity of the tests themselves + <youpi> it could be interesting to have scripts that automatically start a + sub-hurd to do the tests + +[[hurd/subhurd#unit_testing]]. + + <tschwinge> foocraft: To answer one of your earlier questions: you can do + really low-level testing. Like testing Mach's message passing. A + million times. The questions is whether that makes sense. And / or if + it makes sense to do that as part of a unit testing framework. Or rather + do such things manually once you suspect an error somewhere. + <tschwinge> The reason for the latter may be that Mach IPC is already + heavily tested during normal system operation. + <tschwinge> And yet, there still may be (there are, surely) bugs. + <tschwinge> But I guess that you have to stop at some (arbitrary?) level. + <foocraft> so we'd assume it works, and test from there + <tschwinge> Otherwise you'd be implementing the exact counter-part of what + you're testing. + <tschwinge> Which may be what you want, or may be not. Or it may just not + be feasible. + <foocraft> maybe the testing framework should have dependencies + <foocraft> which we can automate using make, and phony targets that run + tests + <foocraft> so everyone writes a test suite and says that it depends on A + and B working correctly + <foocraft> then it'd go try to run the tests for A etc. + <tschwinge> Hmm, isn't that -- on a high level -- have you have by + different packages? For example, the perl testsuite depends (inherently) + on glibc working properly. A perl program's testsuite depends on perl + working properly. + <foocraft> yeah, but afaik, the ordering is done by hand + +freenode, #hurd channel, 2011-03-07: + + <antrik> actually, I think for most tests it would be better not to use a + subhurd... 
that leads precisely to the problem that if something is + broken, you might have a hard time running the tests at all :-) + <antrik> foocraft: most of the Hurd code isn't really low-level. you can + use normal debugging and testing methods + <antrik> gnumach of course *does* have some low-level stuff, so if you add + unit tests to gnumach too, you might run into issues :-) + <antrik> tschwinge: I think testing IPC is a good thing. as I already said, + automated testing is *not* to discover existing but unknown bugs, but to + prevent new ones creeping in, and tracking progress on known bugs + <antrik> tschwinge: I think you are confusing unit testing and regression + testing. http://www.bddebian.com/~hurd-web/open_issues/unit_testing/ + talks about unit testing, but a lot (most?) of it is actually about + regression tests... + <tschwinge> antrik: That may certainly be -- I'm not at all an expert in + this, and just generally though that some sort of automated testing is + needed, and thus started collecting ideas. + <tschwinge> antrik: You're of course invited to fix that. + +IRC, freenode, #hurd, 2011-03-08 + +(After discussing the [[open_issues/anatomy_of_a_hurd_system]].) + + <antrik> so that's what your question is actually about? + <foocraft> so what I would imagine is a set of only-this-server tests for + each server, and then we can have fun adding composite tests + <foocraft> thus making debugging the composite scenarios a bit less tricky + <antrik> indeed + <foocraft> and if you were trying to pass a composite test, it would also + help knowing that you still didn't break the server-only test + <antrik> there are so many different things that can be tested... the + summer will only suffice to dip into this really :-) + <foocraft> yeah, I'm designing my proposal to focus on 1) make/use a + testing framework that fits the Hurd case very well 2) write some tests + and docs on how to write good tests + <antrik> well, doesn't have to be *one* framework... 
unit testing and + regression testing are quite different things, which can be covered by + different frameworks diff --git a/community/gsoc/project_ideas/valgrind.mdwn b/community/gsoc/project_ideas/valgrind.mdwn index 5fa748ff..17385f75 100644 --- a/community/gsoc/project_ideas/valgrind.mdwn +++ b/community/gsoc/project_ideas/valgrind.mdwn @@ -11,6 +11,8 @@ License|/fdl]]."]]"""]] [[!meta title="Porting Valgrind to the Hurd"]] +[[!tag open_issue_gnumach open_issue_hurd]] + [Valgrind](http://valgrind.org/) is an extremely useful debugging tool for memory errors. (And some other kinds of hard-to-find errors too.) Aside from being useful for program development in general, diff --git a/open_issues/sudo_date_crash.mdwn b/faq/which_microkernel.mdwn index 53303abc..608e6b3f 100644 --- a/open_issues/sudo_date_crash.mdwn +++ b/faq/which_microkernel.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -8,9 +8,8 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] -[[!tag open_issue_gnumach]] +[[!meta title="What happened with the ports to the L4 / Coyotos / Viengoos +microkernels?"]] -IRC, unknown channel, unknown date. - - <grey_gandalf> I did a sudo date... - <grey_gandalf> and the machine hangs +This story is told on the page about the +[[history/port_to_another_microkernel]].
diff --git a/history.mdwn b/history.mdwn index 8f155b54..0abcbd52 100644 --- a/history.mdwn +++ b/history.mdwn @@ -1,13 +1,13 @@ -[[!meta copyright="Copyright © 1998, 1999, 2001, 2002, 2007, 2008, 2009 Free -Software Foundation, Inc."]] +[[!meta copyright="Copyright © 1998, 1999, 2001, 2002, 2007, 2008, 2009, 2011 +Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] [[!tag stable_URL]] @@ -91,4 +91,4 @@ mailing lists. --- - * [[Port_to_L4]] + * [[Port_to_another_microkernel]] diff --git a/history/port_to_another_microkernel.mdwn b/history/port_to_another_microkernel.mdwn new file mode 100644 index 00000000..b347cf38 --- /dev/null +++ b/history/port_to_another_microkernel.mdwn @@ -0,0 +1,171 @@ +[[!meta copyright="Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, +2009, 2010, 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!meta title="Porting the Hurd to another microkernel"]] + +At first, there was an effort to port the Hurd from the [[Mach +microkernel|microkernel/mach]] to the [[L4_microkernel_family|microkernel/l4]]. +Then the story continued... + +[[!toc levels=2]] + + +# L4 + +## Initial Idea + +Encountering a number of fundamental design issues with the [[Mach +microkernel|microkernel/mach]] (mostly regarding [[resource +management|open_issues/resource_management_problems]]), some of the Hurd +developers began experimenting with using other microkernels for the Hurd +around the turn of the millennium. + +The idea of using L4 as a [[microkernel]] for a Hurd system was initially +voiced in the [[community]] by Okuji Yoshinori, who, for discussing this +purpose, created the [[mailing_lists/l4-hurd]] mailing list in November 2000. + +Over the years, a lot of discussions have been held on this mailing list, which +today is still the right place for [[next-generation Hurd|hurd/ng]] +discussions. + + +## Why? + +Even though these resource management issues constitute a broad research +topic, there was no hope that the original Mach project would work on them: +[[microkernel/Mach]] wasn't maintained by its original authors anymore. Mach +had served its purpose as a research vehicle, and had been retired by its +stakeholders. + +Thus, switching to a well-maintained current [[microkernel]] was expected to +yield a more solid foundation for a Hurd system than the [[decaying +Mach|microkernel/mach/history]] design and implementation was able to. + +At that time, the [[L4 microkernel family|microkernel/L4]] was one obvious +choice. Being a second-generation microkernel, it was deemed to provide for a +faster system kernel implementation, especially in the time-critical [[IPC]] +paths.
Also, as L4 was already implemented for a bunch of different +architectures (x86, Alpha, MIPS; also including SMP support), and the Hurd +itself being rather architecture-agnostic, it was expected that more platforms +could easily be supported than with the existing system. + + +## Steps and Goals + +At the same time, the idea was -- while mucking with the system's core anyway +-- to improve on some fundamental design issues, too -- like the resource +management problems, for example. + +One goal of porting the Hurd to L4 was to make the Hurd independent of +[[microkernel/Mach]] interfaces, to make it somewhat microkernel-agnostic. + +One idea was to first introduce a Mach-on-L4 emulation layer, to easily get a +usable (though slow) Hurd-using-Mach-interfaces-on-L4 system, and then +gradually move the Hurd servers to use L4 interfaces rather than Mach ones. + +A design upon the lean L4 kernel would finally have made it feasible to move +device drivers out of the kernel's [[TCB]]. + + +# Implementation + +The project itself then was mostly led by Marcus Brinkmann and Neal Walfield. +Neal started the original Hurd/L4 port while visiting Karlsruhe University in +2002. He explains: + +> My intention was to adapt the Hurd to exploit L4's concepts and intended +> [[design_pattern]]s; it was not to simply provide a Mach +> [[compatibility_layer]] on top of L4. When I left Karlsruhe, I no longer had +> access to [[microkernel/l4/Pistachio]] as I was unwilling to sign an NDA. +> Although the specification was available, the Karlsruhe group only [released +> their code in May +> 2003](https://lists.ira.uni-karlsruhe.de/pipermail/l4ka/2003-May/000345.html). +> Around this time, Marcus began hacking on Pistachio. He created a relatively +> complete run-time. I didn't really become involved again until the second +> half of 2004, after I completed my Bachelors degree.
+ Development of Hurd/L4 was done in the [CVS module +`hurd-l4`](http://savannah.gnu.org/cgi-bin/viewcvs/hurd/hurd-l4/). The `doc` +directory contains a design document that is worth reading for anyone who +wishes to learn more about Hurd/L4. + +Even though there was progress -- see, for example, the [[QEMU image for +L4|hurd/running/qemu/image_for_l4]] -- this port never reached a releasable +state. Simple POSIX programs, such as `banner`, could run, but for more complex +system interfaces, a lot more work was needed. + +Eventually, a straightforward port of the original Hurd's design wasn't deemed +feasible anymore by the developers, partly due to them not considering L4 +suitable for implementing a general-purpose operating system on top of it, and +because of deficiencies in the original Hurd's design, which they discovered +along their way. Neal goes on: + +> Before Marcus and I considered [[microkernel/Coyotos]], we had already +> rejected some parts of the Hurd's design. The +> [[open_issues/resource_management_problems]] were +> what prompted me to look at L4. Also, some of the problems with +> [[hurd/translator]]s were already well-known to us. (For a more detailed +> description of the problems we have identified, see our [[hurd/critique]] in +> the July 2007 SIGOPS OSR. We have also written a forward-looking +> [[hurd/ng/position_paper]].) + +> We visited Jonathan Shapiro at Hopkins in January 2006. This resulted in a +> number of discussions, some quite influential, and not always in a way which +> aligned our position with that of Jonathan's. This was particularly true of +> a number of security issues. + +A large number of discussion threads can be found in the archives of the +[[mailing_lists/l4-hurd]] mailing list. + +> Hurd-NG, as we originally called it, was an attempt to articulate the system +> that we had come to envision in terms of interfaces and description of the +> system's structure.
The new name was selected, if I recall correctly, as it +> clearly wasn't the Hurd nor the Hurd based on L4. + + +## Termination + +As of 2005, development of Hurd/L4 has stopped. + + +# Coyotos + +Following that, an attempt was started to use the kernel of the +[[microkernel/Coyotos]] system. As Coyotos is an object capability system +throughout, the microkernel would obviously be more suitable for this purpose; +and it looked pretty promising in the beginning. However, further +investigations found that there are some very fundamental philosophical +differences between the Coyotos and Hurd designs; and thus this attempt +was also abandoned, around 2006 / 2007. (This time before producing any +actual code.) + + +# Viengoos + +By now (that is, after 2006), there were some new [[microkernel/L4]] variants +available, which added protected [[IPC]] paths and other features necessary for +object capability systems; so it might be possible to implement the Hurd on top +of these. However, by that time the developers concluded that microkernel +design and system design are interconnected in very intricate ways, and thus +trying to use a third-party microkernel will always result in trouble. So Neal +Walfield created the experimental [[microkernel/Viengoos]] kernel instead -- +based on the experience from the previous experiments with L4 and Coyotos -- +for his [[research on resource +management|open_issues/resource_management_problems]]. Currently he works in +another research area though, and thus Viengoos is on hold. + + +# Intermediate Results + +Note that while none of the microkernel work is active now, the previous +experiments already yielded a lot of experience, which will be very useful in +the further development / improvement of the mainline (Mach-based) Hurd +implementation.
diff --git a/hurd/faq/which_microkernel.mdwn b/history/port_to_another_microkernel/discussion.mdwn index f6225188..f2161195 100644 --- a/hurd/faq/which_microkernel.mdwn +++ b/history/port_to_another_microkernel/discussion.mdwn @@ -8,49 +8,6 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] -[[!meta title="What happened to the L4 / Coyotos / Viengoos microkernels?"]] - -Encountering a number of fundamental design issues with the Mach microkernel -(mostly with resource management), -some of the Hurd developers began experimenting with using other microkernels for the Hurd. - -The first attempt was reimplementing the Hurd on the L4 (Pistachio) microkernel. -This got going around 2003/2004, -and got pretty far (running some simple POSIX programs, such as "banner"); -however over time some lingering design issues turned out to be fundamental problems: -the original L4 is not suitable for building object capability systems like the Hurd. -Thus development was aborted in 2005. - -Following that, an attempt was started to use the kernel of the Coyotos system. -As Coyotos is an object-capability system througout, -the microkernel would obviously be more suitable for this purpose; -and it looked pretty promising in the beginning. -However, further investigations found -that there are some very fundamental philosophical differences -between the Coyotos and Hurd designs; -and thus this this attempt was also abandonned, around 2006/2007. -(This time before producing any actual code.) - -By now there were some new L4 variants available, -which added protected IPC paths and other features necessary for object capability systems; -so it might be possible to implement the Hurd on top of these. 
-However, by that time the developers concluded that microkernel design and system design -are interconnected in very intricate ways, -and thus trying to use a third-party microkernel will always result in trouble. -So Neal Walfield created the experimental [[Viengoos|microkernel/viengoos]] kernel instead -- -based on the experience from the previous experiments with L4 and Coyotos -- -for his research on resource management. -Currently he works in another research area though, and thus Viengoos is on hold. - -Note that while none of the microkernel work is active now, -the previous experiments already yielded a lot of experience, -which will be very useful in the further development/improvement -of the mainline (Mach-based) Hurd implementation. - -<!-- - ---- - IRC, #hurd, 2011-01-12. [[!taglink open_issue_documentation]] @@ -110,5 +67,3 @@ IRC, #hurd, 2011-01-12. <antrik> manpower is not something that comes from nowhere. again, having something working is crucial in a volunteer project like this <antrik> there are no fixed plans - ---> diff --git a/history/port_to_l4.mdwn b/history/port_to_l4.mdwn index b58c0d91..3f951a64 100644 --- a/history/port_to_l4.mdwn +++ b/history/port_to_l4.mdwn @@ -1,108 +1,13 @@ -[[!meta copyright="Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, -2009, 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] -[[!meta title="Porting the Hurd to L4: Hurd/L4"]] +[[!tag stable_URL]] -There was an effort to port the Hurd from [[microkernel/Mach]] to the -[[L4_microkernel_family|microkernel/L4]]. - -The idea of using L4 as a [[microkernel]] for a [[Hurd_system|hurd]] was -initially voiced in the [[Hurd_community|community]] by Okuji Yoshinori, who, -for discussing this purpose, created the [[mailing lists/l4-hurd]] mailing list -in November 2000. - -The project itself then was mostly lead by Marcus Brinkmann and Neal Walfield. -Even though there was progress -- see, for example, the -[[QEMU image for L4|hurd/running/qemu/image for l4]] -- this port never reached a -releasable state. Eventually, a straight-forward port of the original Hurd's -design wasn't deemed feasible anymore by the developers, partly due to them not -cosidering L4 suitable for implementing a general-purpose operating system on -top of it, and because of deficiencies in the original Hurd's design, which -they discovered along their way. Read the [[hurd/critique]] and a -[[hurd/ng/position paper]]. - -By now, the development of Hurd/L4 has stopped. However, Neal Walfield moved -on to working on a newly designed kernel called [[microkernel/viengoos]]. - -Over the years, a lot of discussion have been held on the -[[mailing lists/l4-hurd]] mailing list, which today is still the right place -for [[next-generation Hurd|hurd/ng]] discussions. - -Development of Hurd/L4 was done in the `hurd-l4` module of the Hurd CVS -repository. The `doc` directory contains a design document that is worth -reading for anyone who wishes to learn more about Hurd/L4. - - -One goal of porting the Hurd to L4 was to make the Hurd independend of Mach -interfaces, to make it somewhat microkernel-agnostic. 
- -Mach wasn't maintained by its original authors anymore, so switching to a -well-maintained current [[microkernel]] was expected to yield a more solid -foundation for a Hurd system than the decaying Mach design and implementation -was able to. - -L4 being a second-generation [[microkernel]] was deemed to provide for a faster -system kernel implementation, especially in the time-critical [[IPC]] paths. -Also, as L4 was already implemented for a bunch of different architectures -(IA32, Alpha, MIPS; SMP), and the Hurd itself being rather archtecture-unaware, -it was expected to be able to easily support more platforms than with the -existing system. - -A design upon the lean L4 kernel would finally have moved devices drivers out -of the kernel's [[TCB]]. - - -One idea was to first introduce a Mach-on-L4 emulation layer, to easily get a -usable (though slow) Hurd-using-Mach-interfaces-on-L4 system, and then -gradually move the Hurd servers to use L4 intefaces rather than Mach ones. - - -Neal Walfield started the original Hurd/L4 port while at Karlsruhe in 2002. He -explains: - -> My intention was to adapt the Hurd to exploit L4's concepts and intended -> [[design_pattern]]s; it was not to simply provide a Mach -> [[compatibility_layer]] on top of L4. When I left Karlsruhe, I no longer had -> access to [[microkernel/l4/Pistachio]] as I was unwilling to sign an NDA. -> Although the specification was available, the Karlsruhe group only [released -> their code in May -> 2003](https://lists.ira.uni-karlsruhe.de/pipermail/l4ka/2003-May/000345.html). -> Around this time, Marcus began hacking on Pistachio. He created a relatively -> complete run-time. I didn't really become involved again until the second -> half of 2004, after I complete by Bachelors degree. - -> Before Marcus and I considered [[microkernel/Coyotos]], we had already -> rejected some parts of the Hurd's design. The -> [[open issues/resource management problems]] were -> what prompted me to look at L4. 
Also, some of the problems with -> [[hurd/translator]]s were already well-known to us. (For a more detailed -> description of the problems we have identified, see our [[hurd/critique]] in the -> 2007 July's SIGOPS OSR. We have also written a forward-looking -> [[hurd/ng/position paper]].) - -> We visited Jonathan Shapiro at Hopkins in January 2006. This resulted in a -> number of discussions, some quite influential, and not always in a way which -> aligned our position with that of Jonathan's. This was particularly true of -> a number of security issues. - -A lange number of discussion threads can be found in the archives of the -[[mailing lists/l4-hurd]] mailing list. - -> Hurd-NG, as we originally called it, was an attempt to articulate the system -> that we had come to envision in terms of interfaces and description of the -> system's structure. The new name was selected, if I recall correctly, as it -> clearly wasn't the Hurd nor the Hurd based on L4. - - -The source code is still available in [CVS module -`hurd-l4`](http://savannah.gnu.org/cgi-bin/viewcvs/hurd/hurd-l4/) (note that -this repository has in the beginning also been used for Neal's -[[microkernel/Viengoos]]). +[[!meta redir=port_to_another_microkernel]] diff --git a/hurd-l4.mdwn b/hurd-l4.mdwn index 579c1190..afc8f8f3 100644 --- a/hurd-l4.mdwn +++ b/hurd-l4.mdwn @@ -1,13 +1,13 @@ -[[!meta copyright="Copyright © 2009 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] [[!tag stable_URL]] -[[!meta redir=history/port_to_l4]] +[[!meta redir=history/port_to_another_microkernel]] @@ -1,13 +1,13 @@ [[!meta copyright="Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, -2009, 2010 Free Software Foundation, Inc."]] +2009, 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] The GNU Hurd is under active development. Because of that, there is no *stable* version. We distribute the Hurd sources only through [[Git|source_repositories]] at present. @@ -31,7 +31,7 @@ in the *unstable* branch of the Debian archive. * [[What_Is_the_GNU_Hurd]] - A Brief Description * [[Advantages]]. And [[challenges]]. 
* [[History]] - * [[history/Port_to_L4]] + * [[history/Port_to_another_microkernel]] * [[Logo]] * [[Status]] * [[KnownHurdLimits]] diff --git a/hurd/ng.mdwn b/hurd/ng.mdwn index de33949d..481386a4 100644 --- a/hurd/ng.mdwn +++ b/hurd/ng.mdwn @@ -13,7 +13,7 @@ This section explains the motivations behind the new design: * [[Issues_with_L4_Pistachio]] * [[Limitations_of_the_original_Hurd_design]] - * History of the [[history/port_to_L4]] + * History of the [[history/port_to_another_microkernel]] # Work already done @@ -19,7 +19,7 @@ License|/fdl]]."]]"""]] # Related - * [[open_issues/dtrace]] + * [[community/gsoc/project_ideas/dtrace]] * [[SystemTap]] diff --git a/microkernel/coyotos.mdwn b/microkernel/coyotos.mdwn index 5ecea688..fec023ba 100644 --- a/microkernel/coyotos.mdwn +++ b/microkernel/coyotos.mdwn @@ -1,5 +1,5 @@ -[[!meta copyright="Copyright © 2006, 2007, 2008, 2010 Free Software Foundation, -Inc."]] +[[!meta copyright="Copyright © 2006, 2007, 2008, 2010, 2011 Free Software +Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -28,3 +28,6 @@ design. The coyotos microkernel specification can be found [here](http://www.coyotos.org/docs/ukernel/spec.html). + +There once was the idea of a GNU/Hurd [[port using the Coyotos +microkernel|history/port_to_another_microkernel]], but this never came to life. diff --git a/microkernel/l4.mdwn b/microkernel/l4.mdwn index 45929842..7af5e6fc 100644 --- a/microkernel/l4.mdwn +++ b/microkernel/l4.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2004, 2006, 2007, 2008, 2010 Free Software +[[!meta copyright="Copyright © 2004, 2006, 2007, 2008, 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable @@ -20,7 +20,8 @@ on formally verifying an L4 microkernel. * {{$sel4}} -There was a GNU/Hurd [[history/port_to_L4]], which is now stalled.
+There was a GNU/Hurd [[port to L4|history/port_to_another_microkernel]], which +is now stalled. [[!ymlfront data=""" diff --git a/open_issues/anatomy_of_a_hurd_system.mdwn b/open_issues/anatomy_of_a_hurd_system.mdwn new file mode 100644 index 00000000..e1d5c9d8 --- /dev/null +++ b/open_issues/anatomy_of_a_hurd_system.mdwn @@ -0,0 +1,73 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!taglink open_issue_documentation]] + +A bunch of this should also be covered in other (introductory) material, +like Bushnell's Hurd paper. All this should be unified and streamlined.
+ +IRC, freenode, #hurd, 2011-03-08 + + <foocraft> I've a question on what are the "units" in the hurd project, if + you were to divide them into units if they aren't, and what are the + dependency relations between those units(roughly, nothing too pedantic + for now) + <antrik> there is GNU Mach (the microkernel); there are the server + libraries in the Hurd package; there are the actual servers in the same; + and there is the POSIX implementation layer in glibc + <antrik> relations are a bit tricky + <antrik> Mach is the base layer which implements IPC and memory management + <foocraft> hmm I'll probably allocate time for dependency graph generation, + in the worst case + <antrik> on top of this, the Hurd servers, using the server libraries, + implement various aspects of the system functionality + <antrik> client programs use libc calls to use the servers + <antrik> (servers also use libc to communicate with other servers and/or + Mach though) + <foocraft> so every server depends solely on mach, and no other server? + <foocraft> s/mach/mach and/or libc/ + <antrik> I think these things should be pretty clear one you are somewhat + familiar with the Hurd architecture... nothing really tricky there + <antrik> no + <antrik> servers often depend on other servers for certain functionality + +--- + +IRC, freenode, #hurd, 2011-03-12 + + <dEhiN> when mach first starts up, does it have some basic i/o or fs + functionality built into it to start up the initial hurd translators? + <antrik> I/O is presently completely in Mach + <antrik> filesystems are in userspace + <antrik> the root filesystem and exec server are loaded by grub + <dEhiN> o I see + <dEhiN> so in order to start hurd, you would have to start mach and + simultaneously start the root filesystem and exec server? + <antrik> not exactly + <antrik> GRUB loads all three, and then starts Mach. 
Mach in turn starts + the servers according to the multiboot information passed from GRUB + <dEhiN> ok, so does GRUB load them into ram? + <dEhiN> I'm trying to figure out in my mind how hurd is initially started + up from a low-level pov + <antrik> yes, as I said, GRUB loads them + <dEhiN> ok, thanks antrik...I'm new to the idea of microkernels, but a + veteran of monolithic kernels + <dEhiN> although I just learned that windows nt is a hybrid kernel which I + never knew! + <rm> note there's a /hurd/ext2fs.static + <rm> I belive that's what is used initially... right? + <antrik> yes + <antrik> loading the shared libraries in addition to the actual server + would be unweildy + <antrik> so the root FS server is linked statically instead + <dEhiN> what does the root FS server do? + <antrik> well, it serves the root FS ;-) + <antrik> it also does some bootstrapping work during startup, to bring the + rest of the system up diff --git a/open_issues/code_analysis.mdwn b/open_issues/code_analysis.mdwn index ad59f962..21e09089 100644 --- a/open_issues/code_analysis.mdwn +++ b/open_issues/code_analysis.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -40,7 +40,7 @@ analysis|performance]], [[formal_verification]], as well as general * <http://blog.llvm.org/2010/04/whats-wrong-with-this-code.html> - * [[Valgrind]] + * [[community/gsoc/project_ideas/Valgrind]] * [Smatch](http://smatch.sourceforge.net/) diff --git a/open_issues/debugging.mdwn b/open_issues/debugging.mdwn index e66a086f..e087f484 100644 --- a/open_issues/debugging.mdwn +++ b/open_issues/debugging.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free 
Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -42,7 +42,7 @@ We have debugging infrastructure. For example: Continues: <http://lwn.net/Articles/414264/>, which introduces <http://dmtcp.sourceforge.net/>. - * [[locking]] + * [[community/gsoc/project_ideas/libdiskfs_locking]] * <http://lwn.net/Articles/415728/>, or <http://lwn.net/Articles/415471/> -- just two examples; there's a lot of such stuff for Linux. diff --git a/open_issues/dtrace.mdwn b/open_issues/dtrace.mdwn deleted file mode 100644 index cbac28fb..00000000 --- a/open_issues/dtrace.mdwn +++ /dev/null @@ -1,47 +0,0 @@ -[[!meta copyright="Copyright © 2008, 2009, 2011 Free Software Foundation, -Inc."]] - -[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable -id="license" text="Permission is granted to copy, distribute and/or modify this -document under the terms of the GNU Free Documentation License, Version 1.2 or -any later version published by the Free Software Foundation; with no Invariant -Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled [[GNU Free Documentation -License|/fdl]]."]]"""]] - -One of the main problems of the current Hurd implementation is very poor -[[performance]]. While we have a bunch of ideas what could cause the performance -problems, these are mostly just guesses. Better understanding what really -causes bad performance is necessary to improve the situation. - -For that, we need tools for performance measurements. While all kinds of more -or less specific [[profiling]] tools could be conceived, the most promising and -generic approach seems to be a framework for logging certain events in the -running system (both in the microkernel and in the Hurd servers). 
This would -allow checking how much time is spent in certain modules, how often certain -situations occur, how things interact, etc. It could also prove helpful in -debugging some issues that are otherwise hard to find because of complex -interactions. - -The most popular framework for that is Sun's dtrace; but there might be others. -The student has to evaluate the existing options, deciding which makes most -sense for the Hurd; and implement that one. (Apple's implementation of dtrace -in their Mach-based kernel might be helpful here...) - -This project requires ability to evaluate possible solutions, and experience -with integrating existing components as well as low-level programming. - -Possible mentors: Samuel Thibault (youpi) - -Related: [[profiling]], [[LTTng]], [[SystemTap]] - -Exercise: In lack of a good exercise directly related to this task, just pick -one of the kernel-related or generally low-level tasks from the bug/task -trackers on savannah, and make a go at it. You might not be able to finish the -task in a limited amount of time, but you should at least be able to make a -detailed analysis of the issue. - -*Status*: Andei Barbu was working on -[SystemTap](http://csclub.uwaterloo.ca/~abarbu/hurd/) for GSoC 2008, but it -turned out too Linux-specific. He implemented kernel probes, but there is no -nice frontend yet. 
diff --git a/open_issues/ext2fs_page_cache_swapping_leak.mdwn b/open_issues/ext2fs_page_cache_swapping_leak.mdwn new file mode 100644 index 00000000..0ace5cd3 --- /dev/null +++ b/open_issues/ext2fs_page_cache_swapping_leak.mdwn @@ -0,0 +1,23 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +IRC, OFTC, #debian-hurd, 2011-03-24 + + <youpi> I still believe we have an ext2fs page cache swapping leak, however + <youpi> as the 1.8GiB swap was full, yet the ld process was only 1.5GiB big + <pinotree> a leak at swapping time, you mean? + <youpi> I mean the ext2fs page cache being swapped out instead of simply + dropped + <pinotree> ah + <pinotree> so the swap tends to accumulate unuseful stuff, i see + <youpi> yes + <youpi> the disk content, basicallyt :) diff --git a/tag/gsoc-task.mdwn b/open_issues/gdb_noninvasive_mode_new_threads.mdwn index 99758478..9b3992f4 100644 --- a/tag/gsoc-task.mdwn +++ b/open_issues/gdb_noninvasive_mode_new_threads.mdwn @@ -8,4 +8,8 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] -[[!meta redir=community/gsoc/project_ideas]] +[[!tag open_issue_gdb]] + +Debugging a translator. `gdb binary`. `set noninvasive on`. `attach [PID]`. +Translator does some work. GDB doesn't notice new threads. `detach`. `attach +[PID]` -- now new threads are visible. 
diff --git a/open_issues/gdb_thread_ids.mdwn b/open_issues/gdb_thread_ids.mdwn index c31a9967..c04a10ee 100644 --- a/open_issues/gdb_thread_ids.mdwn +++ b/open_issues/gdb_thread_ids.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2008, 2009, 2010 Free Software Foundation, +[[!meta copyright="Copyright © 2008, 2009, 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable @@ -6,8 +6,8 @@ id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] [[!meta title="GDB: thread ids"]] @@ -23,3 +23,9 @@ GNU GDB's Pedro Alves: Also see [[thread numbering of ps and GDB]]. + +--- + +`attach` to a multi-threaded process. See threads 1 to 5. `detach`. `attach` +again -- thread numbers continue where they stopped last time: now they're +threads 6 to 10. diff --git a/open_issues/locking.mdwn b/open_issues/locking.mdwn deleted file mode 100644 index 6e22f887..00000000 --- a/open_issues/locking.mdwn +++ /dev/null @@ -1,40 +0,0 @@ -[[!meta copyright="Copyright © 2008, 2009, 2010, 2011 Free Software Foundation, -Inc."]] - -[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable -id="license" text="Permission is granted to copy, distribute and/or modify this -document under the terms of the GNU Free Documentation License, Version 1.2 or -any later version published by the Free Software Foundation; with no Invariant -Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license -is included in the section entitled [[GNU Free Documentation -License|/fdl]]."]]"""]] - -[[!tag open_issue_hurd gsoc-task]] - -Every now and then, new locking issues are discovered in -[[hurd/libdiskfs]] or [[hurd/translator/ext2fs]], for example. Nowadays -these in fact seem to be the most often encountered cause of Hurd crashes -/ lockups. - -One of these could be traced -recently, and turned out to be a lock inside [[hurd/libdiskfs]] that was taken -and not released in some cases. There is reason to believe that there are more -faulty paths causing these lockups. - -The task is systematically checking the [[hurd/libdiskfs]] code for this kind of locking -issues. To achieve this, some kind of test harness has to be implemented: For -example instrumenting the code to check locking correctness constantly at -runtime. Or implementing a [[unit testing]] framework that explicitly checks -locking in various code paths. (The latter could serve as a template for -implementing unit tests in other parts of the Hurd codebase...) - -(A [[systematic code review|security]] would probably suffice to find the -existing locking -issues; but it wouldn't document the work in terms of actual code produced, and -thus it's not suitable for a GSoC project...) - -This task requires experience with debugging locking issues in -[[multithreaded|multithreading]] applications. - -Tools have been written for automated [[code analysis]]; these can help to -locate and fix such errors. 
diff --git a/open_issues/performance.mdwn b/open_issues/performance.mdwn index 9b3701b3..eb9f3f8a 100644 --- a/open_issues/performance.mdwn +++ b/open_issues/performance.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -18,7 +18,7 @@ In [[microkernel]]-based systems, there is generally a considerable [[RPC]] overhead. In a multi-server system, it is non-trivial to implement a high-performance -[[I/O System|io_system]]. +[[I/O System|community/gsoc/project_ideas/disk_io_performance]]. When providing [[faq/POSIX_compatibility]] (and similar interfaces) in an environemnt that doesn't natively implement these interfaces, there may be a diff --git a/open_issues/performance/io_system.mdwn b/open_issues/performance/io_system.mdwn deleted file mode 100644 index 8535eae3..00000000 --- a/open_issues/performance/io_system.mdwn +++ /dev/null @@ -1,50 +0,0 @@ -[[!meta copyright="Copyright © 2008, 2009, 2010, 2011 Free Software Foundation, -Inc."]] - -[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable -id="license" text="Permission is granted to copy, distribute and/or modify this -document under the terms of the GNU Free Documentation License, Version 1.2 or -any later version published by the Free Software Foundation; with no Invariant -Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled [[GNU Free Documentation -License|/fdl]]."]]"""]] - -[[!meta title="I/O System"]] - -[[!tag open_issue_hurd gsoc-task]] - -The most obvious reason for the Hurd feeling slow compared to mainstream -systems like GNU/Linux, is a low I/O system performance, in particular very -slow hard disk access. 
- -The reason for this slowness is lack and/or bad implementation of common -optimization techniques, like scheduling reads and writes to minimize head -movement; effective block caching; effective reads/writes to partial blocks; -[[reading/writing multiple blocks at once|clustered_page_faults]]; and -[[read-ahead]]. The -[[ext2_filesystem_server|hurd/translator/ext2fs]] might also need some -optimizations at a higher logical level. - -The goal of this project is to analyze the current situation, and implement/fix -various optimizations, to achieve significantly better disk performance. It -requires understanding the data flow through the various layers involved in -disk access on the Hurd ([[filesystem|hurd/virtual_file_system]], -[[pager|hurd/libpager]], driver), and general experience with -optimizing complex systems. That said, the killing feature we are definitely -missing is the [[read-ahead]], and even a very simple implementation would bring -very big performance speedups. - -Here are some real testcases: - - * [[binutils_ld_64ksec]]; - - * running the Git testsuite which is mostly I/O bound; - - * use [[TopGit]] on a non-toy repository. - - -Possible mentors: Samuel Thibault (youpi) - -Exercise: Look through all the code involved in disk I/O, and try something -easy to improve. It's quite likely though that you will find nothing obvious -- -in this case, please contact us about a different exercise task. 
diff --git a/open_issues/performance/io_system/binutils_ld_64ksec.mdwn b/open_issues/performance/io_system/binutils_ld_64ksec.mdwn index b59a87a7..79c2300f 100644 --- a/open_issues/performance/io_system/binutils_ld_64ksec.mdwn +++ b/open_issues/performance/io_system/binutils_ld_64ksec.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -10,7 +10,8 @@ License|/fdl]]."]]"""]] [[!tag open_issue_hurd]] -This one may be considered as a testcase for I/O system optimization. +This one may be considered as a testcase for [[I/O system +optimization|community/gsoc/project_ideas/disk_io_performance]]. It is taken from the [[binutils testsuite|binutils]], `ld/ld-elf/sec64k.exp`, where this diff --git a/open_issues/performance/io_system/clustered_page_faults.mdwn b/open_issues/performance/io_system/clustered_page_faults.mdwn index 3a187523..37433e06 100644 --- a/open_issues/performance/io_system/clustered_page_faults.mdwn +++ b/open_issues/performance/io_system/clustered_page_faults.mdwn @@ -10,6 +10,8 @@ License|/fdl]]."]]"""]] [[!tag open_issue_gnumach open_issue_hurd]] +[[community/gsoc/project_ideas/disk_io_performance]]. 
+ IRC, freenode, #hurd, 2011-02-16 <braunr> exceptfor the kernel, everything in an address space is diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index 3ee30b5d..b6851edd 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -10,6 +10,8 @@ License|/fdl]]."]]"""]] [[!tag open_issue_gnumach open_issue_hurd]] +[[community/gsoc/project_ideas/disk_io_performance]] + IRC, #hurd, freenode, 2011-02-13: <etenil> youpi: Would libdiskfs/diskfs.h be in the right place to make diff --git a/open_issues/pfinet_vs_system_time_changes.mdwn b/open_issues/pfinet_vs_system_time_changes.mdwn new file mode 100644 index 00000000..a9e1e242 --- /dev/null +++ b/open_issues/pfinet_vs_system_time_changes.mdwn @@ -0,0 +1,42 @@ +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +IRC, unknown channel, unknown date. + + <grey_gandalf> I did a sudo date... + <grey_gandalf> and the machine hangs + +This was very likely a misdiagnosis: + +IRC, freenode, #hurd, 2011-03-25 + + <tschwinge> antrik: I suspect it'S some timing stuff in pfinet that perhaps + uses absolute time, and somehow wildely gets confused? + <antrik> tschwinge: BTW, pfinet doesn't actually die I think -- it just + drops open connections... + <antrik> perhaps it thinks they timed out + <tschwinge> antrik: Isn't the translator restarted instead?
+ <antrik> don't think so + <antrik> when pfinet actually dies, I also loose the NFS mounts, which + doesn't happen in this case + <antrik> hehe "... and the machine hangs" + <antrik> he didn't bother to check that the machine is perfectly fine, only + the SSH connection got dropped + <tschwinge> Ah, I see. So it'S perhaps indeed simply closes TCP + connections that have been without data for ``too long''? + <antrik> yeah, that's my guess + <antrik> my clock is speeding, so ntpdate sets it in the past + <antrik> perhaps there is some math that concludes the connection have been + inactive for -200 seconds, which (unsigned) is more than any timeout :-) + <tschwinge> (The other way round, you might likely get some integer + wrap-around, and thus the same result.) + <tschwinge> Yes. diff --git a/open_issues/profiling.mdwn b/open_issues/profiling.mdwn index e04fb08a..7e3c7350 100644 --- a/open_issues/profiling.mdwn +++ b/open_issues/profiling.mdwn @@ -17,7 +17,7 @@ done for [[performance analysis|performance]] reasons. Should be working, but some issues have been reported, regarding GCC spec files. Should be possible to fix (if not yet done) easily. - * [[dtrace]] + * [[community/gsoc/project_ideas/dtrace]] Have a look at this, integrate it into the main trees. 
diff --git a/open_issues/rpc_to_self_with_rendez-vous_leading_to_duplicate_port_destroy.mdwn b/open_issues/rpc_to_self_with_rendez-vous_leading_to_duplicate_port_destroy.mdwn new file mode 100644 index 00000000..9db92250 --- /dev/null +++ b/open_issues/rpc_to_self_with_rendez-vous_leading_to_duplicate_port_destroy.mdwn @@ -0,0 +1,163 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +[RPC to self with rendez-vous leading to duplicate port +destroy](http://lists.gnu.org/archive/html/bug-hurd/2011-03/msg00045.html) + +IRC, freenode, #hurd, 2011-03-14 + + <antrik> youpi: I wonder, why does the root FS call diskfs_S_dir_lookup() + at all?... + <youpi> errr, because a client asked for it? + <youpi> (problem with RPCs is you can't easily know where they come from :) + ) + <youpi> (especially when it's the root fs...) + <antrik> ah, it's about a client request... didn't see that + <youpi> well, I just said "is called", yes + <antrik> I do not really understand though why it tries to reauthenticate + against itself... + <antrik> I fear my memory of the lookup mechanism grew a bit dim + <youpi> see the source + <youpi> it's about a translated entry + <antrik> (and I never fully understood some aspects anyways...) + <youpi> it needs to start the translated entry as another user, possibly + <antrik> yes, but a translated entry normally would be served by *another* + process?... 
+ <youpi> sure, but ext2fs has to prepare it + <youpi> thus reauthenticate to prepare the correct set of rights + <antrik> prepare what? + <youpi> rights + <youpi> so the process is not root, doesn't have / opened as root, etc. + <antrik> rights for what? + <youpi> err, about everything + <antrik> IIRC the reauthentication is done by the parent FS on the port to + the *translated* node + <antrik> and the translated node should be a different process?... + <youpi> that's not what I read in the source + <youpi> fshelp_fetch_root + <youpi> ports[INIT_PORT_CRDIR] = reauth (getcrdir ()); + <youpi> here, getcrdir() returns ext2fs itself + <antrik> well, perhaps the issue is that I have no idea what + fshelp_fetch_root() does, nor why it is called here... + <youpi> it notably starts the translator that dir_lookup is looking at, if + needed + <youpi> possibly as a different user, thus reauthentication of CRDIR + <antrik> so this is about a port that is passed to the translator being + started? + <youpi> no + <youpi> well, depends on what you mean by "port" + <youpi> it's about reauthenticating a port to be passed to the translator + being started + <youpi> and for that a rendez-vous port is needed for the reauthentication + <youpi> and that's the one at stake + <antrik> yeah, I meant the port that is reauthenticated + <antrik> what is CRDIR? + <youpi> current root dir ... + <antrik> so the parent translator passes it's own root dir to the child + translator; and the issue is that for the root FS the root dir points to + the root FS itself... + <youpi> yes + <antrik> OK, that makes sense + <youpi> (but that's only one example, rgrep mach_port_destroy hurd/ show + other potential issues) + <antrik> well, that's actually what I wanted to mention next... why is the + rendez-vous port destroyed, instead of just deallocating the port right + and letting reference counting to it's thing?... 
+ <antrik> do its thing + <youpi> "just to make sure" I guess + <antrik> it's pretty obvious that this will cause trouble for any RPC + referencing itself... + <youpi> well, follow-up with that on the list + <youpi> with roland/tb in CC + <youpi> only they would know any real reason for destroy + <youpi> btw, if you knew how we could make _hurd_select()'s raw __mach_msg + call be interruptible by signals, that'll permit to fix sudo + <youpi> (damn, I need sleep, my tenses are all wrong) + <antrik> BTW, does this cause any actual trouble?... + <antrik> I don't know much about interruption... cfhammer might have a + better idea, he look into that stuff quite a bit AIUI + <antrik> looked + <antrik> (hehe, it's not only your tenses... guess there's something in the + ether ;-) ) + <youpi> it makes sudo, mailq, etc. fail sometimes + <antrik> I mean the rendez-vous thing + <youpi> that's it, yes + <youpi> sudo etc. fail at least due to this + <antrik> so these are two different problems that both affect sudo? + <antrik> (rendez-vous and interruption I mean) + <youpi> yes + <youpi> with my patch the buildds have much fewer issues, but still some + <youpi> (my interrupt-related patch) + <youpi> I'm installing a s/destroy/deallocate/ version of ext2fs on the + buildds, we'll see how it behaves + <youpi> (it fixes my testcase at least) + <antrik> interrupt-related patch? + <antrik> only thing interrupt-related I remember was the reauthentication + race... + <youpi> that's what I mean + <antrik> well, cfhammer investigated this is quite some depth, explaining + quite well why the race is only mitigated but still exists... problem is + that we didn't know how to fix it properly + <antrik> because nobody seems to understand the cancellation code, except + perhaps for Roland and Thomas + <antrik> (and I'm not even entirely sure about them :-) ) + <antrik> I think his findings and our conclusions are documented on the + ML... 
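For readers following the destroy-versus-deallocate debate above, the cleanup variants under discussion have quite different semantics. A commented sketch, assuming a Mach environment and the standard GNU Mach port API (illustrative only, not a proposed patch):

```c
#include <mach/mach.h>

/* Sketch of the cleanup variants discussed above.  "name" is a
   rendez-vous port obtained from mach_reply_port(), i.e. a *receive*
   right.  */
void cleanup_rendezvous (mach_port_t name)
{
  /* mach_port_destroy() strips *all* rights held under "name".  If the
     kernel quickly reuses the name for a new right, a second destroy
     (e.g. from an RPC to oneself) hits an unrelated port -- the hazard
     described above.  */
  /* mach_port_destroy (mach_task_self (), name); */

  /* mach_port_deallocate() releases a send, send-once or dead-name
     reference; it returns KERN_INVALID_RIGHT for a pure receive right,
     which is why it cannot simply replace destroy for a reply port.  */
  /* mach_port_deallocate (mach_task_self (), name); */

  /* Dropping exactly the receive right leaves any send right held
     under the same name in place (it becomes a dead name) instead of
     silently freeing the name for reuse.  */
  mach_port_mod_refs (mach_task_self (), name, MACH_PORT_RIGHT_RECEIVE, -1);
}
```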
+ <youpi> by "much fewer issues", I mean that some of the symptoms have + disappeared, others haven't + <antrik> BTW, couldn't the rendez-vous thing be worked around by simply + ignoring the errors from the failing deallocate?... + <youpi> no, failing deallocate are actually dangerous + <antrik> why? + <youpi> since the name might have been reused for something else in the + meanwhile + <youpi> that's the whole point of the warning I had added in the kernel + itself + <antrik> I see + <youpi> such things really deserve tracking, since they can have any kind + of consequence + <antrik> does Mach try to reuse names quickly, rather than only after + wrapping around?... + <youpi> it seems to + <antrik> OK, then this is a serious problem indeed + <youpi> (note: I rarely divine issues when there aren't actual frequent + symptoms :) ) + <antrik> well, the problem with the warning is that it only shows in the + cases that do *not* cause a problem... so it's hard to associate them + with any specific issues + <youpi> well, most of the time the port is not reused quickly enough + <youpi> so in most case it shows up more often than causing problem + +IRC, freenode, #hurd, 2011-03-14 + + <youpi> ok, mach_port_deallocate actually can't be used + <youpi> since mach_reply_port() returns a receive right, not a send right + * youpi guesses he will really have to manage to understand all that port + stuff completely + <antrik> oh, right + <antrik> youpi: hm... now I'm confused though. if one client holds a + receive right, the other client (or in this case the same process) should + have a send or send-once right -- these should *not* share the same name + in my understanding + <antrik> destroying the receive right should turn the send right into a + dead name + <antrik> so unless I'm missing something, the destroy shouldn't be a + problem, and there must be something else going wrong + <antrik> hm... actually I'm probably wrong + <antrik> yeah, definitely wrong. 
receive rights and "ordinary" send rights + share the name. only send-once rights are special + <antrik> I wonder whether the problem could be worked around by using a + send-once right... + <antrik> mach_port_mod_refs(mach_task_self(), name, + MACH_PORT_RIGHT_RECEIVE, -1) can be used to deallocate only the receive + right + <antrik> oh, you already figured that out :-) diff --git a/open_issues/unit_testing.mdwn b/open_issues/unit_testing.mdwn index a5ffe19d..1378be85 100644 --- a/open_issues/unit_testing.mdwn +++ b/open_issues/unit_testing.mdwn @@ -8,7 +8,8 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] -This task may be suitable for [[community/GSoC]][[!tag gsoc-task]]. +This task may be suitable for [[community/GSoC]]: +[[community/gsoc/project_ideas/testing_framework]] --- @@ -80,243 +81,6 @@ abandoned). # Discussion -freenode, #hurd channel, 2011-03-05: - - <nixness> what about testing though? - <nixness> like sort of "what's missing? lets write tests for it so that - when someone gets to implementing it, he knows what to do. Repeat" - project - <antrik> you mean creating an automated testing framework? - <antrik> this is actually a task I want to add for this year, if I get - around to it :-) - <nixness> yeah I'd very much want to apply for that one - <nixness> cuz I've never done Kernel hacking before, but I know that with - the right tasks like "test VM functionality", I would be able to write up - the automated tests and hopefully learn more about what breaks/makes the - kernel - <nixness> (and it would make implementing the feature much less hand-wavy - and more correct) - <nixness> antrik, I believe the framework(CUnit right?) exists, but we just - need to write the tests. - <antrik> do you have prior experience implementing automated tests? - <nixness> lots of tests! 
- <nixness> yes, in Java mostly, but I've played around with CUnit - <antrik> ah, great :-) - <nixness> here's what I know from experience: 1) write basic tests. 2) - write ones that test multiple features 3) stress test [option 4) - benchmark and record to DB] - <youpi> well, what we'd rather need is to fix the issues we already know - from the existing testsuites :) - -[[GSoC project propsal|community/gsoc/project_ideas/testsuites]]. - - <nixness> youpi, true, and I think I should check what's available in way - of tests, but if the tests are "all or nothing" then it becomes really - hard to fix your mistakes - <youpi> they're not all or nothing - <antrik> youpi: the existing testsuites don't test specific system features - <youpi> libc ones do - <youpi> we could also check posixtestsuite which does too - -[[open_issues/open_posix_test_suite]]. - - <antrik> AFAIK libc has very few failing tests - -[[open_issues/glibc_testsuite]]. - - <youpi> err, like twenty? - <youpi> € grep -v '^#' expected-results-i486-gnu-libc | wc -l - <youpi> 67 - <youpi> nope, even more - <antrik> oh, sorry, I confused it with coreutils - <pinotree> plus the binutils ones, i guess - <youpi> yes - -[[open_issues/binutils#weak]]. - - <antrik> anyways, I don't think relying on external testsuites for - regression testing is a good plan - <antrik> also, that doesn't cover unit testing at all - <youpi> why ? - <youpi> sure there can be unit testing at the translator etc. level - <antrik> if we want to implement test-driven development, and/or do serious - refactoring without too much risk, we need a test harness where we can - add specific tests as needed - <youpi> but more than often, the issues are at the libc / etc. level - because of a combination o fthings at the translator level, which unit - testing won't find out - * nixness yewzzz! - <nixness> sure unit testing can find them out. 
if they're good "unit" tests - <youpi> the problem is that you don't necessarily know what "good" means - <youpi> e.g. for posix correctness - <youpi> since it's not posix - <nixness> but if they're composite clever tests, then you lose that - granularity - <nixness> youpi, is that a blackbox test intended to be run at the very end - to check if you're posix compliant? - <antrik> also, if we have our own test harness, we can run tests - automatically as part of the build process, which is a great plus IMHO - <youpi> nixness: "that" = ? - <nixness> oh nvm, I thought there was a test stuie called "posix - correctness" - <youpi> there's the posixtestsuite yes - <youpi> it's an external one however - <youpi> antrik: being external doesn't mean we can't invoke it - automatically as part of the build process when it's available - <nixness> youpi, but being internal means it can test the inner workings of - certain modules that you are unsure of, and not just the interface - <youpi> sure, that's why I said it'd be useful too - <youpi> but as I said too, most bugs I've seen were not easy to find out at - the unit level - <youpi> but rather at the libc level - <antrik> of course we can integrate external tests if they exist and are - suitable. but that that doesn't preclude adding our own ones too. in - either case, that integration work has to be done too - <youpi> again, I've never said I was against internal testsuite - <antrik> also, the major purpose of a test suite is checking known - behaviour. a low-level test won't directly point to a POSIX violation; - but it will catch any changes in behaviour that would lead to one - <youpi> what I said is that it will be hard to write them tight enough to - find bugs - <youpi> again, the problem is knowing what will lead to a POSIX violation - <youpi> it's long work - <youpi> while libc / posixtestsuite / etc. 
already do that - <antrik> *any* unexpected change in behaviour is likely to cause bugs - somewher - <youpi> but WHAT is "expected" ? - <youpi> THAT is the issue - <youpi> and libc/posixtessuite do know that - <youpi> at the translator level we don't really - <youpi> see the recent post about link() - -[link(dir,name) should fail with -EPERM](http://lists.gnu.org/archive/html/bug-hurd/2011-03/msg00007.html) - - <youpi> in my memory jkoenig pointed it out for a lot of such calls - <youpi> and that issue is clearly not known at the translator level - <nixness> so you're saying that the tests have to be really really - low-level, and work our way up? - <youpi> I'm saying that the translator level tests will be difficult to - write - <antrik> why isn't it known at the translator level? if it's a translator - (not libc) bug, than obviously the translator must return something wrong - at some point, and that's something we can check - <youpi> because you'll have to know all the details of the combinations - used in libc, to know whether they'll lead to posix issues - <youpi> antrik: sure, but how did we detect that that was unexpected - behavior? - <youpi> because of a glib test - <youpi> at the translator level we didn't know it was an unexpected - behavior - <antrik> gnulib actually - <youpi> and if you had asked me, I wouldn't have known - <antrik> again, we do *not* write a test suite to find existing bugs - <youpi> right, took one for the other - <youpi> doesn't really matter actually - <youpi> antrik: ok, I don't care then - <antrik> we write a test suite to prevent future bugs, or track status of - known bugs - <youpi> (don't care *enough* for now, I mean) - <nixness> hmm, so write something up antrik for GSoC :) and I'll be sure to - apply - <antrik> now that we know some translators return a wrong error status in a - particular situation, we can write a test that checks exactly this error - status. 
that way we know when it is fixed, and also when it breaks again - <antrik> nixness: great :-) - <nixness> sweet. that kind of thing would also need a db backend - <antrik> nixness: BTW, if you have a good idea, you can send an application - for it even if it's not listed among the proposed tasks... - <antrik> so you don't strictly need a writeup from me to apply for this :-) - <nixness> antrik, I'll keep that in mind, but I'll also be checking your - draft page - <nixness> oh cool :) - <antrik> (and it's a well known fact that those projects which students - proposed themselfs tend to be the most successful ones :-) ) - * nixness draft initiated - <antrik> youpi: BTW, I'm surprised that you didn't mention libc testsuite - before... working up from there is probably a more effective plan than - starting with high-level test suites like Python etc... - <youpi> wasn't it already in the gsoc proposal? - <youpi> bummer - <antrik> nope - -freenode, #hurd channel, 2011-03-06: - - <nixness> how's the hurd coding workflow, typically? - -*nixness* -> *foocraft*. - - <foocraft> we're discussing how TDD can work with Hurd (or general kernel - development) on #osdev - <foocraft> so what I wanted to know, since I haven't dealt much with GNU - Hurd, is how do you guys go about coding, in this case - <tschwinge> Our current workflow scheme is... well... is... - <tschwinge> Someone wants to work on something, or spots a bug, then works - on it, submits a patch, and 0 to 10 years later it is applied. - <tschwinge> Roughly. - <foocraft> hmm so there's no indicator of whether things broke with that - patch - <foocraft> and how low do you think we can get with tests? A friend of mine - was telling me that with kernel dev, you really don't know whether, for - instance, the stack even exists, and a lot of things that I, as a - programmer, can assume while writing code break when it comes to writing - kernel code - <foocraft> Autotest seems promising - -See autotest link given above. 
- - <foocraft> but in any case, coming up with the testing framework that - doesn't break when the OS itself breaks is hard, it seems - <foocraft> not sure if autotest isolates the mistakes in the os from - finding their way in the validity of the tests themselves - <youpi> it could be interesting to have scripts that automatically start a - sub-hurd to do the tests - -[[hurd/subhurd#unit_testing]]. - - <tschwinge> foocraft: To answer one of your earlier questions: you can do - really low-level testing. Like testing Mach's message passing. A - million times. The questions is whether that makes sense. And / or if - it makes sense to do that as part of a unit testing framework. Or rather - do such things manually once you suspect an error somewhere. - <tschwinge> The reason for the latter may be that Mach IPC is already - heavily tested during normal system operation. - <tschwinge> And yet, there still may be (there are, surely) bugs. - <tschwinge> But I guess that you have to stop at some (arbitrary?) level. - <foocraft> so we'd assume it works, and test from there - <tschwinge> Otherwise you'd be implementing the exact counter-part of what - you're testing. - <tschwinge> Which may be what you want, or may be not. Or it may just not - be feasible. - <foocraft> maybe the testing framework should have dependencies - <foocraft> which we can automate using make, and phony targets that run - tests - <foocraft> so everyone writes a test suite and says that it depends on A - and B working correctly - <foocraft> then it'd go try to run the tests for A etc. - <tschwinge> Hmm, isn't that -- on a high level -- have you have by - different packages? For example, the perl testsuite depends (inherently) - on glibc working properly. A perl program's testsuite depends on perl - working properly. - <foocraft> yeah, but afaik, the ordering is done by hand - -freenode, #hurd channel, 2011-03-07: - - <antrik> actually, I think for most tests it would be better not to use a - subhurd... 
that leads precisely to the problem that if something is - broken, you might have a hard time running the tests at all :-) - <antrik> foocraft: most of the Hurd code isn't really low-level. you can - use normal debugging and testing methods - <antrik> gnumach of course *does* have some low-level stuff, so if you add - unit tests to gnumach too, you might run into issues :-) - <antrik> tschwinge: I think testing IPC is a good thing. as I already said, - automated testing is *not* to discover existing but unknown bugs, but to - prevent new ones creeping in, and tracking progress on known bugs - <antrik> tschwinge: I think you are confusing unit testing and regression - testing. http://www.bddebian.com/~hurd-web/open_issues/unit_testing/ - talks about unit testing, but a lot (most?) of it is actually about - regression tests... - <tschwinge> antrik: That may certainly be -- I'm not at all an expert in - this, and just generally though that some sort of automated testing is - needed, and thus started collecting ideas. - <tschwinge> antrik: You're of course invited to fix that. +See the [[GSoC project idea|community/gsoc/project_ideas/testing_framework]]'s +[[discussion +subpage|community/gsoc/project_ideas/testing_framework/discussion]]. diff --git a/open_issues/valgrind.mdwn b/open_issues/valgrind.mdwn deleted file mode 100644 index bd45829c..00000000 --- a/open_issues/valgrind.mdwn +++ /dev/null @@ -1,83 +0,0 @@ -[[!meta copyright="Copyright © 2009, 2010, 2011 Free Software Foundation, -Inc."]] - -[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable -id="license" text="Permission is granted to copy, distribute and/or modify this -document under the terms of the GNU Free Documentation License, Version 1.2 or -any later version published by the Free Software Foundation; with no Invariant -Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license -is included in the section entitled [[GNU Free Documentation -License|/fdl]]."]]"""]] - -[[!meta title="Porting Valgrind to the Hurd"]] - -[[!tag gsoc-task]] - -[Valgrind](http://valgrind.org/) is an extremely useful debugging tool for memory errors. -(And some other kinds of hard-to-find errors too.) -Aside from being useful for program development in general, -a Hurd port will help finding out why certain programs segfault on the Hurd, -although they work on Linux. -Even more importantly, it will help finding bugs in the Hurd servers themselfs. - -To keep track of memory use, -Valgrind however needs to know how each [[system call]] affects the validity of memory regions. -This knowledge is highly kernel-specific, -and thus Valgrind needs to be explicitely ported for every system. - -Such a port involves two major steps: -making Valgrind understand how kernel traps work in general on the system in question; -and how all the individual kernel calls affect memory. -The latter step is where most of the work is, -as the behaviour of each single [[system call]] needs to be described. - -Compared to Linux, -[[microkernel/Mach]] (the microkernel used by the Hurd) has very few kernel traps. -Almost all [[system call]]s are implemented as [[RPC]]s instead -- -either handled by Mach itself, or by the various [[Hurd servers|hurd/translator]]. -All RPCs use a pair of `mach_msg` invocations: -one to send a request message, and one to receive a reply. -However, while all RPCs use the same `mach_msg` trap, -the actual effect of the call varies greatly depending on which RPC is invoked -- -similar to the `ioctl` call on Linux. -Each request thus must be handled individually. - -Unlike `ioctl`, -the RPC invocations have explicit type information for the parameters though, -which can be retrieved from the message header. 
-By analyzing the parameters of the RPC reply message, -Valgrind can know exactly which memory regions are affected by that call, -even without specific knowledge of the RPC in question. -Thus implementing a general parser for the reply messages -will already give Valgrind a fairly good approximation of memory validity -- -without having to specify the exact semantic of each RPC by hand. - -While this should make Valgrind quite usable on the Hurd already, it's not perfect: -some RPCs might return a buffer that is only partially filled with valid data; -or some reply parameters might be optional, -and only contain valid data under certain conditions. -Such specific semantics can't be deduced from the message headers alone. -Thus for a complete port, -it will still be necessary to go through the list of all known RPCs, -and implement special handling in Valgrind for those RPCs that need it. - -The goal of this task is at minimum to make Valgrind grok Mach traps, -and to implement the generic RPC handler. -Ideally, specific handling for RPCs needing it should also be implemented. - -Completing this project will require digging into Valgrind's handling of [[system call]]s, -and into Hurd RPCs. -It is not an easy task, but a fairly predictable one -- -there shouldn't be any unexpected difficulties, -and no major design work is necessary. -It doesn't require any specific previous knowledge: -only good programming skills in general. -On the other hand, -the student will obtain a good understanding of Hurd RPCs while working on this task, -and thus perfect qualifications for Hurd development in general :-) - -Possible mentors: Samuel Thibault (youpi) - -Exercise: As a starter, -students can try to teach valgrind a couple of Linux ioctls, -as this will make them learn how to use the read/write primitives of valgrind. 
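The generic reply parser proposed in the (now relocated) Valgrind text above would look roughly like this. A sketch under stated assumptions: it relies on GNU Mach's typed message format (`mach_msg_type_t` descriptors), ignores long-form descriptors, alignment subtleties and error handling, and the Valgrind client request is shown commented out:

```c
#include <mach/message.h>

/* Rough sketch of the generic reply parser described above.  Each data
   item in a typed Mach RPC reply carries a type descriptor, so the
   memory regions the server just wrote can be located without knowing
   the specific RPC -- exactly what a Valgrind port would feed to
   VALGRIND_MAKE_MEM_DEFINED.  */
static void note_reply_regions (mach_msg_header_t *reply)
{
  char *p = (char *) (reply + 1);
  char *end = (char *) reply + reply->msgh_size;

  while (p < end)
    {
      mach_msg_type_t *t = (mach_msg_type_t *) p;
      unsigned bytes = t->msgt_size / 8 * t->msgt_number;

      if (t->msgt_inline)
        {
          /* Data follows the descriptor inline: mark it defined.  */
          /* VALGRIND_MAKE_MEM_DEFINED (t + 1, bytes); */
          p = (char *) (t + 1) + ((bytes + 3) & ~3U);
        }
      else
        {
          /* Out-of-line data: the descriptor is followed by a pointer
             to a freshly mapped region of "bytes" bytes.  */
          /* VALGRIND_MAKE_MEM_DEFINED (*(void **) (t + 1), bytes); */
          p = (char *) (t + 1) + sizeof (void *);
        }
    }
}
```

The special-case table for RPCs with partially valid or conditional reply buffers would then be consulted after this generic pass.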
diff --git a/systemtap.mdwn b/systemtap.mdwn index abd1961e..ba64c0d4 100644 --- a/systemtap.mdwn +++ b/systemtap.mdwn @@ -19,7 +19,7 @@ License|/fdl]]."]]"""]] # Related - * [[open_issues/dtrace]] + * [[community/gsoc/project_ideas/dtrace]] * [[LTTng]] diff --git a/topgit.mdwn b/topgit.mdwn index b71038ec..fb374337 100644 --- a/topgit.mdwn +++ b/topgit.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -30,7 +30,8 @@ development branches, for example [[source_repositories/binutils]] or # Running it on GNU/Hurd Nothing special to that, technically, *only* that our [[I/O system's (non-) -performance|open_issues/performance/io_system]] will render this unbearably +performance|community/gsoc/project_ideas/disk_io_performance]] will render this +unbearably slow for anything but simple test cases. So don't try to run it on the [[GCC]] or [[glibc]] repositories. Talk to [[tschwinge]] about how he's using it on a GNU/Linux machine and push the resulting trees to GNU/Hurd systems. |