diff options
-rw-r--r-- | community/gsoc/project_ideas/testing_framework.mdwn | 2 | ||||
-rw-r--r-- | community/gsoc/project_ideas/testing_framework/discussion.mdwn | 270 | ||||
-rw-r--r-- | open_issues/unit_testing.mdwn | 266 |
3 files changed, 276 insertions, 262 deletions
diff --git a/community/gsoc/project_ideas/testing_framework.mdwn b/community/gsoc/project_ideas/testing_framework.mdwn index 0448bc6b..ff9899d9 100644 --- a/community/gsoc/project_ideas/testing_framework.mdwn +++ b/community/gsoc/project_ideas/testing_framework.mdwn @@ -36,7 +36,7 @@ before the end of the summer. (As a bonus, in addition to these explicit tests, it would be helpful to integrate some methods -for testing [[locking validity|open_issues/locking]], +for testing [[locking validity|libdiskfs_locking]], performing static code analysis etc.) This task probably requires some previous experience diff --git a/community/gsoc/project_ideas/testing_framework/discussion.mdwn b/community/gsoc/project_ideas/testing_framework/discussion.mdwn new file mode 100644 index 00000000..872d0eb7 --- /dev/null +++ b/community/gsoc/project_ideas/testing_framework/discussion.mdwn @@ -0,0 +1,270 @@ +[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +freenode, #hurd channel, 2011-03-05: + + <nixness> what about testing though? + <nixness> like sort of "what's missing? lets write tests for it so that + when someone gets to implementing it, he knows what to do. Repeat" + project + <antrik> you mean creating an automated testing framework? + <antrik> this is actually a task I want to add for this year, if I get + around to it :-) + <nixness> yeah I'd very much want to apply for that one + <nixness> cuz I've never done Kernel hacking before, but I know that with + the right tasks like "test VM functionality", I would be able to write up + the automated tests and hopefully learn more about what breaks/makes the + kernel + <nixness> (and it would make implementing the feature much less hand-wavy + and more correct) + <nixness> antrik, I believe the framework(CUnit right?) exists, but we just + need to write the tests. + <antrik> do you have prior experience implementing automated tests? + <nixness> lots of tests! + <nixness> yes, in Java mostly, but I've played around with CUnit + <antrik> ah, great :-) + <nixness> here's what I know from experience: 1) write basic tests. 2) + write ones that test multiple features 3) stress test [option 4) + benchmark and record to DB] + <youpi> well, what we'd rather need is to fix the issues we already know + from the existing testsuites :) + +[[GSoC project propsal|community/gsoc/project_ideas/testsuites]]. + + <nixness> youpi, true, and I think I should check what's available in way + of tests, but if the tests are "all or nothing" then it becomes really + hard to fix your mistakes + <youpi> they're not all or nothing + <antrik> youpi: the existing testsuites don't test specific system features + <youpi> libc ones do + <youpi> we could also check posixtestsuite which does too + +[[open_issues/open_posix_test_suite]]. + + <antrik> AFAIK libc has very few failing tests + +[[open_issues/glibc_testsuite]]. + + <youpi> err, like twenty? + <youpi> € grep -v '^#' expected-results-i486-gnu-libc | wc -l + <youpi> 67 + <youpi> nope, even more + <antrik> oh, sorry, I confused it with coreutils + <pinotree> plus the binutils ones, i guess + <youpi> yes + +[[open_issues/binutils#weak]]. + + <antrik> anyways, I don't think relying on external testsuites for + regression testing is a good plan + <antrik> also, that doesn't cover unit testing at all + <youpi> why ? + <youpi> sure there can be unit testing at the translator etc. level + <antrik> if we want to implement test-driven development, and/or do serious + refactoring without too much risk, we need a test harness where we can + add specific tests as needed + <youpi> but more than often, the issues are at the libc / etc. level + because of a combination o fthings at the translator level, which unit + testing won't find out + * nixness yewzzz! + <nixness> sure unit testing can find them out. if they're good "unit" tests + <youpi> the problem is that you don't necessarily know what "good" means + <youpi> e.g. for posix correctness + <youpi> since it's not posix + <nixness> but if they're composite clever tests, then you lose that + granularity + <nixness> youpi, is that a blackbox test intended to be run at the very end + to check if you're posix compliant? + <antrik> also, if we have our own test harness, we can run tests + automatically as part of the build process, which is a great plus IMHO + <youpi> nixness: "that" = ? + <nixness> oh nvm, I thought there was a test stuie called "posix + correctness" + <youpi> there's the posixtestsuite yes + <youpi> it's an external one however + <youpi> antrik: being external doesn't mean we can't invoke it + automatically as part of the build process when it's available + <nixness> youpi, but being internal means it can test the inner workings of + certain modules that you are unsure of, and not just the interface + <youpi> sure, that's why I said it'd be useful too + <youpi> but as I said too, most bugs I've seen were not easy to find out at + the unit level + <youpi> but rather at the libc level + <antrik> of course we can integrate external tests if they exist and are + suitable. but that that doesn't preclude adding our own ones too. in + either case, that integration work has to be done too + <youpi> again, I've never said I was against internal testsuite + <antrik> also, the major purpose of a test suite is checking known + behaviour. a low-level test won't directly point to a POSIX violation; + but it will catch any changes in behaviour that would lead to one + <youpi> what I said is that it will be hard to write them tight enough to + find bugs + <youpi> again, the problem is knowing what will lead to a POSIX violation + <youpi> it's long work + <youpi> while libc / posixtestsuite / etc. already do that + <antrik> *any* unexpected change in behaviour is likely to cause bugs + somewher + <youpi> but WHAT is "expected" ? + <youpi> THAT is the issue + <youpi> and libc/posixtessuite do know that + <youpi> at the translator level we don't really + <youpi> see the recent post about link() + +[link(dir,name) should fail with +EPERM](http://lists.gnu.org/archive/html/bug-hurd/2011-03/msg00007.html) + + <youpi> in my memory jkoenig pointed it out for a lot of such calls + <youpi> and that issue is clearly not known at the translator level + <nixness> so you're saying that the tests have to be really really + low-level, and work our way up? + <youpi> I'm saying that the translator level tests will be difficult to + write + <antrik> why isn't it known at the translator level? if it's a translator + (not libc) bug, than obviously the translator must return something wrong + at some point, and that's something we can check + <youpi> because you'll have to know all the details of the combinations + used in libc, to know whether they'll lead to posix issues + <youpi> antrik: sure, but how did we detect that that was unexpected + behavior? + <youpi> because of a glib test + <youpi> at the translator level we didn't know it was an unexpected + behavior + <antrik> gnulib actually + <youpi> and if you had asked me, I wouldn't have known + <antrik> again, we do *not* write a test suite to find existing bugs + <youpi> right, took one for the other + <youpi> doesn't really matter actually + <youpi> antrik: ok, I don't care then + <antrik> we write a test suite to prevent future bugs, or track status of + known bugs + <youpi> (don't care *enough* for now, I mean) + <nixness> hmm, so write something up antrik for GSoC :) and I'll be sure to + apply + <antrik> now that we know some translators return a wrong error status in a + particular situation, we can write a test that checks exactly this error + status. that way we know when it is fixed, and also when it breaks again + <antrik> nixness: great :-) + <nixness> sweet. that kind of thing would also need a db backend + <antrik> nixness: BTW, if you have a good idea, you can send an application + for it even if it's not listed among the proposed tasks... + <antrik> so you don't strictly need a writeup from me to apply for this :-) + <nixness> antrik, I'll keep that in mind, but I'll also be checking your + draft page + <nixness> oh cool :) + <antrik> (and it's a well known fact that those projects which students + proposed themselfs tend to be the most successful ones :-) ) + * nixness draft initiated + <antrik> youpi: BTW, I'm surprised that you didn't mention libc testsuite + before... working up from there is probably a more effective plan than + starting with high-level test suites like Python etc... + <youpi> wasn't it already in the gsoc proposal? + <youpi> bummer + <antrik> nope + +freenode, #hurd channel, 2011-03-06: + + <nixness> how's the hurd coding workflow, typically? + +*nixness* -> *foocraft*. + + <foocraft> we're discussing how TDD can work with Hurd (or general kernel + development) on #osdev + <foocraft> so what I wanted to know, since I haven't dealt much with GNU + Hurd, is how do you guys go about coding, in this case + <tschwinge> Our current workflow scheme is... well... is... + <tschwinge> Someone wants to work on something, or spots a bug, then works + on it, submits a patch, and 0 to 10 years later it is applied. + <tschwinge> Roughly. + <foocraft> hmm so there's no indicator of whether things broke with that + patch + <foocraft> and how low do you think we can get with tests? A friend of mine + was telling me that with kernel dev, you really don't know whether, for + instance, the stack even exists, and a lot of things that I, as a + programmer, can assume while writing code break when it comes to writing + kernel code + <foocraft> Autotest seems promising + +See autotest link given above. + + <foocraft> but in any case, coming up with the testing framework that + doesn't break when the OS itself breaks is hard, it seems + <foocraft> not sure if autotest isolates the mistakes in the os from + finding their way in the validity of the tests themselves + <youpi> it could be interesting to have scripts that automatically start a + sub-hurd to do the tests + +[[hurd/subhurd#unit_testing]]. + + <tschwinge> foocraft: To answer one of your earlier questions: you can do + really low-level testing. Like testing Mach's message passing. A + million times. The questions is whether that makes sense. And / or if + it makes sense to do that as part of a unit testing framework. Or rather + do such things manually once you suspect an error somewhere. + <tschwinge> The reason for the latter may be that Mach IPC is already + heavily tested during normal system operation. + <tschwinge> And yet, there still may be (there are, surely) bugs. + <tschwinge> But I guess that you have to stop at some (arbitrary?) level. + <foocraft> so we'd assume it works, and test from there + <tschwinge> Otherwise you'd be implementing the exact counter-part of what + you're testing. + <tschwinge> Which may be what you want, or may be not. Or it may just not + be feasible. + <foocraft> maybe the testing framework should have dependencies + <foocraft> which we can automate using make, and phony targets that run + tests + <foocraft> so everyone writes a test suite and says that it depends on A + and B working correctly + <foocraft> then it'd go try to run the tests for A etc. + <tschwinge> Hmm, isn't that -- on a high level -- have you have by + different packages? For example, the perl testsuite depends (inherently) + on glibc working properly. A perl program's testsuite depends on perl + working properly. + <foocraft> yeah, but afaik, the ordering is done by hand + +freenode, #hurd channel, 2011-03-07: + + <antrik> actually, I think for most tests it would be better not to use a + subhurd... that leads precisely to the problem that if something is + broken, you might have a hard time running the tests at all :-) + <antrik> foocraft: most of the Hurd code isn't really low-level. you can + use normal debugging and testing methods + <antrik> gnumach of course *does* have some low-level stuff, so if you add + unit tests to gnumach too, you might run into issues :-) + <antrik> tschwinge: I think testing IPC is a good thing. as I already said, + automated testing is *not* to discover existing but unknown bugs, but to + prevent new ones creeping in, and tracking progress on known bugs + <antrik> tschwinge: I think you are confusing unit testing and regression + testing. http://www.bddebian.com/~hurd-web/open_issues/unit_testing/ + talks about unit testing, but a lot (most?) of it is actually about + regression tests... + <tschwinge> antrik: That may certainly be -- I'm not at all an expert in + this, and just generally though that some sort of automated testing is + needed, and thus started collecting ideas. + <tschwinge> antrik: You're of course invited to fix that. + +IRC, freenode, #hurd, 2011-03-08 + +(After discussing the [[open_issues/anatomy_of_a_hurd_system]].) + + <antrik> so that's what your question is actually about? + <foocraft> so what I would imagine is a set of only-this-server tests for + each server, and then we can have fun adding composite tests + <foocraft> thus making debugging the composite scenarios a bit less tricky + <antrik> indeed + <foocraft> and if you were trying to pass a composite test, it would also + help knowing that you still didn't break the server-only test + <antrik> there are so many different things that can be tested... the + summer will only suffice to dip into this really :-) + <foocraft> yeah, I'm designing my proposal to focus on 1) make/use a + testing framework that fits the Hurd case very well 2) write some tests + and docs on how to write good tests + <antrik> well, doesn't have to be *one* framework... unit testing and + regression testing are quite different things, which can be covered by + different frameworks diff --git a/open_issues/unit_testing.mdwn b/open_issues/unit_testing.mdwn index feda3be4..1378be85 100644 --- a/open_issues/unit_testing.mdwn +++ b/open_issues/unit_testing.mdwn @@ -8,7 +8,8 @@ Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] -This task may be suitable for [[community/GSoC]][[!tag gsoc-task]]. +This task may be suitable for [[community/GSoC]]: +[[community/gsoc/project_ideas/testing_framework]] --- @@ -80,263 +81,6 @@ abandoned). # Discussion -freenode, #hurd channel, 2011-03-05: - - <nixness> what about testing though? - <nixness> like sort of "what's missing? lets write tests for it so that - when someone gets to implementing it, he knows what to do. Repeat" - project - <antrik> you mean creating an automated testing framework? - <antrik> this is actually a task I want to add for this year, if I get - around to it :-) - <nixness> yeah I'd very much want to apply for that one - <nixness> cuz I've never done Kernel hacking before, but I know that with - the right tasks like "test VM functionality", I would be able to write up - the automated tests and hopefully learn more about what breaks/makes the - kernel - <nixness> (and it would make implementing the feature much less hand-wavy - and more correct) - <nixness> antrik, I believe the framework(CUnit right?) exists, but we just - need to write the tests. - <antrik> do you have prior experience implementing automated tests? - <nixness> lots of tests! - <nixness> yes, in Java mostly, but I've played around with CUnit - <antrik> ah, great :-) - <nixness> here's what I know from experience: 1) write basic tests. 2) - write ones that test multiple features 3) stress test [option 4) - benchmark and record to DB] - <youpi> well, what we'd rather need is to fix the issues we already know - from the existing testsuites :) - -[[GSoC project propsal|community/gsoc/project_ideas/testsuites]]. - - <nixness> youpi, true, and I think I should check what's available in way - of tests, but if the tests are "all or nothing" then it becomes really - hard to fix your mistakes - <youpi> they're not all or nothing - <antrik> youpi: the existing testsuites don't test specific system features - <youpi> libc ones do - <youpi> we could also check posixtestsuite which does too - -[[open_issues/open_posix_test_suite]]. - - <antrik> AFAIK libc has very few failing tests - -[[open_issues/glibc_testsuite]]. - - <youpi> err, like twenty? - <youpi> € grep -v '^#' expected-results-i486-gnu-libc | wc -l - <youpi> 67 - <youpi> nope, even more - <antrik> oh, sorry, I confused it with coreutils - <pinotree> plus the binutils ones, i guess - <youpi> yes - -[[open_issues/binutils#weak]]. - - <antrik> anyways, I don't think relying on external testsuites for - regression testing is a good plan - <antrik> also, that doesn't cover unit testing at all - <youpi> why ? - <youpi> sure there can be unit testing at the translator etc. level - <antrik> if we want to implement test-driven development, and/or do serious - refactoring without too much risk, we need a test harness where we can - add specific tests as needed - <youpi> but more than often, the issues are at the libc / etc. level - because of a combination o fthings at the translator level, which unit - testing won't find out - * nixness yewzzz! - <nixness> sure unit testing can find them out. if they're good "unit" tests - <youpi> the problem is that you don't necessarily know what "good" means - <youpi> e.g. for posix correctness - <youpi> since it's not posix - <nixness> but if they're composite clever tests, then you lose that - granularity - <nixness> youpi, is that a blackbox test intended to be run at the very end - to check if you're posix compliant? - <antrik> also, if we have our own test harness, we can run tests - automatically as part of the build process, which is a great plus IMHO - <youpi> nixness: "that" = ? - <nixness> oh nvm, I thought there was a test stuie called "posix - correctness" - <youpi> there's the posixtestsuite yes - <youpi> it's an external one however - <youpi> antrik: being external doesn't mean we can't invoke it - automatically as part of the build process when it's available - <nixness> youpi, but being internal means it can test the inner workings of - certain modules that you are unsure of, and not just the interface - <youpi> sure, that's why I said it'd be useful too - <youpi> but as I said too, most bugs I've seen were not easy to find out at - the unit level - <youpi> but rather at the libc level - <antrik> of course we can integrate external tests if they exist and are - suitable. but that that doesn't preclude adding our own ones too. in - either case, that integration work has to be done too - <youpi> again, I've never said I was against internal testsuite - <antrik> also, the major purpose of a test suite is checking known - behaviour. a low-level test won't directly point to a POSIX violation; - but it will catch any changes in behaviour that would lead to one - <youpi> what I said is that it will be hard to write them tight enough to - find bugs - <youpi> again, the problem is knowing what will lead to a POSIX violation - <youpi> it's long work - <youpi> while libc / posixtestsuite / etc. already do that - <antrik> *any* unexpected change in behaviour is likely to cause bugs - somewher - <youpi> but WHAT is "expected" ? - <youpi> THAT is the issue - <youpi> and libc/posixtessuite do know that - <youpi> at the translator level we don't really - <youpi> see the recent post about link() - -[link(dir,name) should fail with -EPERM](http://lists.gnu.org/archive/html/bug-hurd/2011-03/msg00007.html) - - <youpi> in my memory jkoenig pointed it out for a lot of such calls - <youpi> and that issue is clearly not known at the translator level - <nixness> so you're saying that the tests have to be really really - low-level, and work our way up? - <youpi> I'm saying that the translator level tests will be difficult to - write - <antrik> why isn't it known at the translator level? if it's a translator - (not libc) bug, than obviously the translator must return something wrong - at some point, and that's something we can check - <youpi> because you'll have to know all the details of the combinations - used in libc, to know whether they'll lead to posix issues - <youpi> antrik: sure, but how did we detect that that was unexpected - behavior? - <youpi> because of a glib test - <youpi> at the translator level we didn't know it was an unexpected - behavior - <antrik> gnulib actually - <youpi> and if you had asked me, I wouldn't have known - <antrik> again, we do *not* write a test suite to find existing bugs - <youpi> right, took one for the other - <youpi> doesn't really matter actually - <youpi> antrik: ok, I don't care then - <antrik> we write a test suite to prevent future bugs, or track status of - known bugs - <youpi> (don't care *enough* for now, I mean) - <nixness> hmm, so write something up antrik for GSoC :) and I'll be sure to - apply - <antrik> now that we know some translators return a wrong error status in a - particular situation, we can write a test that checks exactly this error - status. that way we know when it is fixed, and also when it breaks again - <antrik> nixness: great :-) - <nixness> sweet. that kind of thing would also need a db backend - <antrik> nixness: BTW, if you have a good idea, you can send an application - for it even if it's not listed among the proposed tasks... - <antrik> so you don't strictly need a writeup from me to apply for this :-) - <nixness> antrik, I'll keep that in mind, but I'll also be checking your - draft page - <nixness> oh cool :) - <antrik> (and it's a well known fact that those projects which students - proposed themselfs tend to be the most successful ones :-) ) - * nixness draft initiated - <antrik> youpi: BTW, I'm surprised that you didn't mention libc testsuite - before... working up from there is probably a more effective plan than - starting with high-level test suites like Python etc... - <youpi> wasn't it already in the gsoc proposal? - <youpi> bummer - <antrik> nope - -freenode, #hurd channel, 2011-03-06: - - <nixness> how's the hurd coding workflow, typically? - -*nixness* -> *foocraft*. - - <foocraft> we're discussing how TDD can work with Hurd (or general kernel - development) on #osdev - <foocraft> so what I wanted to know, since I haven't dealt much with GNU - Hurd, is how do you guys go about coding, in this case - <tschwinge> Our current workflow scheme is... well... is... - <tschwinge> Someone wants to work on something, or spots a bug, then works - on it, submits a patch, and 0 to 10 years later it is applied. - <tschwinge> Roughly. - <foocraft> hmm so there's no indicator of whether things broke with that - patch - <foocraft> and how low do you think we can get with tests? A friend of mine - was telling me that with kernel dev, you really don't know whether, for - instance, the stack even exists, and a lot of things that I, as a - programmer, can assume while writing code break when it comes to writing - kernel code - <foocraft> Autotest seems promising - -See autotest link given above. - - <foocraft> but in any case, coming up with the testing framework that - doesn't break when the OS itself breaks is hard, it seems - <foocraft> not sure if autotest isolates the mistakes in the os from - finding their way in the validity of the tests themselves - <youpi> it could be interesting to have scripts that automatically start a - sub-hurd to do the tests - -[[hurd/subhurd#unit_testing]]. - - <tschwinge> foocraft: To answer one of your earlier questions: you can do - really low-level testing. Like testing Mach's message passing. A - million times. The questions is whether that makes sense. And / or if - it makes sense to do that as part of a unit testing framework. Or rather - do such things manually once you suspect an error somewhere. - <tschwinge> The reason for the latter may be that Mach IPC is already - heavily tested during normal system operation. - <tschwinge> And yet, there still may be (there are, surely) bugs. - <tschwinge> But I guess that you have to stop at some (arbitrary?) level. - <foocraft> so we'd assume it works, and test from there - <tschwinge> Otherwise you'd be implementing the exact counter-part of what - you're testing. - <tschwinge> Which may be what you want, or may be not. Or it may just not - be feasible. - <foocraft> maybe the testing framework should have dependencies - <foocraft> which we can automate using make, and phony targets that run - tests - <foocraft> so everyone writes a test suite and says that it depends on A - and B working correctly - <foocraft> then it'd go try to run the tests for A etc. - <tschwinge> Hmm, isn't that -- on a high level -- have you have by - different packages? For example, the perl testsuite depends (inherently) - on glibc working properly. A perl program's testsuite depends on perl - working properly. - <foocraft> yeah, but afaik, the ordering is done by hand - -freenode, #hurd channel, 2011-03-07: - - <antrik> actually, I think for most tests it would be better not to use a - subhurd... that leads precisely to the problem that if something is - broken, you might have a hard time running the tests at all :-) - <antrik> foocraft: most of the Hurd code isn't really low-level. you can - use normal debugging and testing methods - <antrik> gnumach of course *does* have some low-level stuff, so if you add - unit tests to gnumach too, you might run into issues :-) - <antrik> tschwinge: I think testing IPC is a good thing. as I already said, - automated testing is *not* to discover existing but unknown bugs, but to - prevent new ones creeping in, and tracking progress on known bugs - <antrik> tschwinge: I think you are confusing unit testing and regression - testing. http://www.bddebian.com/~hurd-web/open_issues/unit_testing/ - talks about unit testing, but a lot (most?) of it is actually about - regression tests... - <tschwinge> antrik: That may certainly be -- I'm not at all an expert in - this, and just generally though that some sort of automated testing is - needed, and thus started collecting ideas. - <tschwinge> antrik: You're of course invited to fix that. - -IRC, freenode, #hurd, 2011-03-08 - -(After discussing the [[anatomy_of_a_hurd_system]].) - - <antrik> so that's what your question is actually about? - <foocraft> so what I would imagine is a set of only-this-server tests for - each server, and then we can have fun adding composite tests - <foocraft> thus making debugging the composite scenarios a bit less tricky - <antrik> indeed - <foocraft> and if you were trying to pass a composite test, it would also - help knowing that you still didn't break the server-only test - <antrik> there are so many different things that can be tested... the - summer will only suffice to dip into this really :-) - <foocraft> yeah, I'm designing my proposal to focus on 1) make/use a - testing framework that fits the Hurd case very well 2) write some tests - and docs on how to write good tests - <antrik> well, doesn't have to be *one* framework... unit testing and - regression testing are quite different things, which can be covered by - different frameworks +See the [[GSoC project idea|community/gsoc/project_ideas/testing_framework]]'s +[[discussion +subpage|community/gsoc/project_ideas/testing_framework/discussion]]. |