From 3bf27c93ac4de57623809b71517116d51465f0e1 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Sat, 17 Mar 2012 12:31:07 +0100 Subject: IRC bits. --- hurd/faq/slash_usr_symlink/discussion.mdwn | 45 +++ hurd/translator.mdwn | 5 +- hurd/translator/devfs.mdwn | 25 +- hurd/translator/tmpfs/discussion.mdwn | 93 ++++++ microkernel/mach/memory_object/discussion.mdwn | 13 +- microkernel/mach/pmap.mdwn | 74 +++++ microkernel/viengoos/documentation.mdwn | 14 +- .../viengoos/documentation/irc_2012-02-23.mdwn | 159 ++++++++++ open_issues/boehm_gc.mdwn | 42 ++- open_issues/bpf.mdwn | 122 +++++++ open_issues/dde.mdwn | 350 ++++++++++++++++++++- open_issues/default_pager.mdwn | 8 +- open_issues/glibc_madvise_vs_static_linking.mdwn | 15 +- open_issues/gnumach_memory_management.mdwn | 10 +- open_issues/linux_as_the_kernel.mdwn | 42 +++ .../memory_object_model_vs_block-level_cache.mdwn | 273 ++++++++++++++++ open_issues/select.mdwn | 187 ++++++++++- open_issues/trust_the_behavior_of_translators.mdwn | 181 +++++++++++ public_hurd_boxen/xen_handling.mdwn | 9 +- 19 files changed, 1642 insertions(+), 25 deletions(-) create mode 100644 hurd/faq/slash_usr_symlink/discussion.mdwn create mode 100644 microkernel/mach/pmap.mdwn create mode 100644 microkernel/viengoos/documentation/irc_2012-02-23.mdwn create mode 100644 open_issues/linux_as_the_kernel.mdwn create mode 100644 open_issues/memory_object_model_vs_block-level_cache.mdwn create mode 100644 open_issues/trust_the_behavior_of_translators.mdwn diff --git a/hurd/faq/slash_usr_symlink/discussion.mdwn b/hurd/faq/slash_usr_symlink/discussion.mdwn new file mode 100644 index 00000000..219e14e4 --- /dev/null +++ b/hurd/faq/slash_usr_symlink/discussion.mdwn @@ -0,0 +1,45 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_documentation]] + + +# IRC, freenode, #hurd, 2012-02-01 + + I remember the time when we had a /usr symlink. Now fedora 17 + will move / to /usr and have /foo symlinks. :) + braunr: + http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge + braunr: fedora and others are merging /bin, /sbin and some other + into /usr + braunr: back in 1998 we tried for two years or so to have /usr -> + .. in Debian GNU/Hurd, but eventually we gave up on it, because it broke + some stuff + marcusb: Hi, which one is better (in your opinion): / or /usr? + gnu_srs: fedora says that using /usr allows better separation of + distribution files and machine-local files + marcusb: won't it break remote /usr ? + so you can atomically mount the OS files to /usr + gnu_srs: but in the end, it's a wash + personally, I think every package should get its own directory + marcusb: what PATH then ? 
+ braunr: well, I guess you'd want to assemble a union filesystem + for a POSIX shell + marcusb: i don't see what you mean :/ + ah this comes from Lennart Poettering + braunr: check out for example how http://nixos.org/ does it + braunr: something like, union /package1/bin /package2/bin + /package3/bin for /bin, /package1/lib /package2/lib /package3/lib for + /lib, etc. I guess + manuel: would that scale well ? + the idea that there is only one correct binary for each program + with the name foo is noble, but a complete illusion that hides the + complexity of the actual configuration management task + marcusb: right diff --git a/hurd/translator.mdwn b/hurd/translator.mdwn index 3527267f..619c0db5 100644 --- a/hurd/translator.mdwn +++ b/hurd/translator.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2007, 2008, 2009, 2010, 2011 Free Software +[[!meta copyright="Copyright © 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable @@ -75,7 +75,8 @@ available. Read about translator [[short-circuiting]]. The [[concept|concepts]] of translators creates its own problems, too: -[[open_issues/translators_set_up_by_untrusted_users]]. +[[open_issues/translators_set_up_by_untrusted_users]], and +[[trust_the_behavior_of_translators]]. # Existing Translators diff --git a/hurd/translator/devfs.mdwn b/hurd/translator/devfs.mdwn index 27df23aa..8784e998 100644 --- a/hurd/translator/devfs.mdwn +++ b/hurd/translator/devfs.mdwn @@ -1,12 +1,12 @@ -[[!meta copyright="Copyright © 2009 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] `devfs` is a translator sitting on `/dev` and providing what is to be provided in there in a dynamic fashion -- as compared to static passive translator @@ -18,3 +18,22 @@ settings as they're used now. If applicable, it has to be taken care that all code concerning the page-in path is resident at all times. + +--- + +# IRC, freenode, #hurd, 2012-01-29 + + what would be an hurdish way to achieve something like the + various system (udev, devfs, devd, etc) for populating devices files + automatically according to the found system devices? + (not that i plan anything about that, just curious) + it's not really a stupid question at all :) + I guess translators in /dev + such as a blockfs on /dev/block + pinotree: in an ideal world (userspace drivers and all), the + device nodes will be exported by the drivers themselfs; and the drivers + will be launched by the bus respective bus driver + an interesting aspect is what to do if we want a traditional flat + /dev directory with unique device names... 
probably need some + unionfs-like translator that collects the individual driver nodes in an + intelligent manner diff --git a/hurd/translator/tmpfs/discussion.mdwn b/hurd/translator/tmpfs/discussion.mdwn index 0409f046..1d441c7d 100644 --- a/hurd/translator/tmpfs/discussion.mdwn +++ b/hurd/translator/tmpfs/discussion.mdwn @@ -283,3 +283,96 @@ License|/fdl]]."]]"""]] what kind of log do you mean? vmstat 1 I mean ah... + + +## IRC, freenode, #hurd, 2012-02-01 + + I run fsx with this command: fsx -N3000 foo/bar -S4 + -l$((1024*1024*8)). And after 70 commands it breaks. + The strangeness is at address 0xc000 there is text, which was + printed in fsx with vfprintf + I've lost log. Wait a bit, while I generate new + mcsim, what's fsx / where can I find it ? + fsx is filesystem exersiser + http://codemonkey.org.uk/projects/fsx/ + ok thanks + i use it to test tmpfs + here is fsx that compiles on linux: http://paste.debian.net/154390/ + and Makefile for it: http://paste.debian.net/154392/ + mcsim, hmm, I get a failure with ext2fs too, is it expected? + yes + i'll show you logs with tmpfs. They slightly differ + here: http://paste.debian.net/154399/ + pre last operation is truncate + and last is read + during pre-last (or last) starting from address 0xa000, every + 0x1000 bytes appears text + skipping zero size read + skipping zero size read + truncating to largest ever: 0x705f4b + signal 2 + testcalls = 38 + this text is printed by fsx, by function prt + I've mistaken: this text appears even from every beginning + I know that this text appears exactly at this moment, because I + added check of the whole file after every step. And this error appeared + only after last truncation. + I think that the problem is in defpager (I'm fixing it), but I + don't understand where defpager could get this text + wow I get java code and debconf templates + So, my question is: is it possible for defpager to get somehow this + text? + possibly recycled, non-zeroed pages? + hmmm... probably you're right + 0x1000 bytes is consistent with the page size + Should I clean these pages in tmpfs? + or in defpager? + What is proper way? + mcsim, I'd say defpager should do it, to avoid leaking + information, I'm not sure though. + maybe tmpfs should also not assume the pages have been blanked + out. + if i do it in both, it could have big influence on performance. + i'll do it only in defpager so far. + jkoenig_: Thank you a lot + mcsim, no problem. + + +## IRC, freenode, #hurd, 2012-02-08 + + mcsim: You pushed another branch with cleaned-up patches? + yes. + mcsim: Anyway, any data from your report that we could be + interested in? (Though it's not in English.) + It's completely in ukrainian an and mostly describes some aspects + of hurd's work. + mcsim: OK. So you ran out of time to do the benchmarking, + etc.? + Comparing tmpfs to ext2fs with RAM backend, etc., I mean. + tschwinge: I made benchmarking and it turned out that tmpfs up to 6 + times faster than ext2fs + tschwinge: is it possible to have a review of work, I've already + done, even if parallel writing doesn't work? + mcsim: Do you need this for university or just a general review + for inclusion in the Git master branch? + general review + Will need to find someone who feels competent to do that... + the branch that should be checked is tmpfs-final + cool, i guess you tested also special types of files like + sockets and pipes? (they are used in eg /run, /var/run or similar) + Oh. I accidentally created this branch. It is my private + branch. 
I'll delete it now and merge everything to mplaneta/tmpfs/master + pinotree: Completely forgot about them :( I'll do it by all means + mcsim: no worries :) + tschwinge: Ready. The right branch is mplaneta/tmpfs/master + + +## IRC, freenode, #hurd, 2012-03-07 + + did you test it with sockets and pipes? + pinotree: pipes work and sockets seems to work too (I've created + new pfinet device for them and pinged it). + try with simple C apps + Anyway all these are just translators, so there shouldn't be any + problems. + pinotree: works diff --git a/microkernel/mach/memory_object/discussion.mdwn b/microkernel/mach/memory_object/discussion.mdwn index a2a1514b..907f859a 100644 --- a/microkernel/mach/memory_object/discussion.mdwn +++ b/microkernel/mach/memory_object/discussion.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -10,7 +10,10 @@ License|/fdl]]."]]"""]] [[!tag open_issue_documentation open_issue_gnumach]] -IRC, freenode, #hurd, 2011-08-05: +[[!toc]] + + +# IRC, freenode, #hurd, 2011-08-05 < neal> braunr: For instance, memory objects are great as they allow you to specify the mapping policy in user space. @@ -23,7 +26,8 @@ IRC, freenode, #hurd, 2011-08-05: < braunr> the kernel eviction policy :) < neal> that's an implementation detail -IRC, freenode, #hurd, 2011-09-05: + +# IRC, freenode, #hurd, 2011-09-05 mach isn't a true modern microkernel, it handles a lot of resources, such as high level virtual memory and cpu time @@ -65,3 +69,6 @@ IRC, freenode, #hurd, 2011-09-05: pages are going to be flushed by themselves [[open_issues/resource_management_problems]]. + + +# [[open_issues/memory_object_model_vs_block-level_cache]] diff --git a/microkernel/mach/pmap.mdwn b/microkernel/mach/pmap.mdwn new file mode 100644 index 00000000..6910bfd3 --- /dev/null +++ b/microkernel/mach/pmap.mdwn @@ -0,0 +1,74 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_documentation open_issue_gnumach]] + + +# IRC, freenode, #hurd, 2012-02-01 + + on Hurd what is the difference between kernel memory object and + pmap module ?? + pmap is heap/libraries table for each thread while kernel memory + object refers to arbitary blobs of data ?? + sekon: pmap is the low level memory mapping module + i.e. it programs the mmu + and these aren't hurd-specific, they are mach modules + braunr: so kernel memonry objects consists of a bunch of pmaps ?? + sekon: memory objects can be various things, be specific please + (they're certainly not a bunch of pmaps though, no) + there is one pmap per vm_map, and there is one vm_map per task + and there is no need for double question marks, is ther ?? 
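(Editorial note: the one-to-one chain just described -- one `pmap` per `vm_map`, one `vm_map` per task -- can be pictured with the simplified C sketch below; the field names only approximate the ones actually used in GNU Mach's `kern/task.h` and `vm/vm_map.h`.)

    struct pmap;                    /* Machine-dependent layer: programs the MMU.  */

    struct vm_map {                 /* Machine-independent address space map.      */
        struct pmap *pmap;          /* Exactly one pmap per vm_map.                */
        /* ... map entries referencing memory objects ...                          */
    };

    struct task {
        struct vm_map *map;         /* Exactly one vm_map per task.                */
        /* ... threads, port name space, ...                                       */
    };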
+ lol then is kernel memory object , please excuse the metaphor + something like a base class for pmap + i don't know what a "kernel memory object" is, be specific please, + again + braunr: + http://courses.cs.vt.edu/~cs5204/fall05-gback/presentations/MachOS_Rajesh.ppt + goto page titled External Memory Management (EMM) on page 15 + Kernel memory object shows up + you know there are other formats for this document + nope .. i did not know that + in page 17 pmamp shows up + "the problems of external memory management" ? + braunr: the paper i am also reading is called x15mach_thesis + ah, that's mine + * sekon bows + :) + ok i see page 17 + so please good sir explain the relationship between kernel memory + object and pmap + (if any) + braunr: there is no mention of kernel memory object + again, i don't see any reference or definition of "kernel memory + object" + but your paper says + that when page faults occur + the kernel contact the manager for a kernel reference object + *memory + where ? + in section 2.1.3 (unless i read it wrong) + no just a sec + 2.1.5 + i never used the expression "kernel memory object" there :p + anyway, you're referring simple to memory objects as seen by + userspace pagers + a memory object is a data container + usually, it's a file + but it can be anything + the pager is the task that provides its content and implements the + object methods + as for the relation between them and the pmap module, it's a + distant one + i'll explain it with an example + page fault -> request content of memory object at a given offset + with given length from pager -> ask pmap to establish the mapping in the + mmu + braunr: thank you ver much + *very diff --git a/microkernel/viengoos/documentation.mdwn b/microkernel/viengoos/documentation.mdwn index 52ff7a48..edcc79a7 100644 --- a/microkernel/viengoos/documentation.mdwn +++ b/microkernel/viengoos/documentation.mdwn @@ -1,12 +1,12 @@ -[[!meta copyright="Copyright © 2008 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2008, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license -is included in the section entitled -[[GNU Free Documentation License|/fdl]]."]]"""]] +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] The most up-to-date documentation is in the source code itself, see in particular the header files in the hurd directory. @@ -17,7 +17,8 @@ version of that is available [[here|reference-guide.pdf]]. It is not, however, automatically regenerated, and thus may not be up to date. -Academic Papers: + +# Academic Papers * [Viengoos: A Framework for Stakeholder-Directed Resource Allocation](http://walfield.org/papers/2009-walfield-viengoos-a-framework-for-stakeholder-directed-resource-allocation.pdf). @@ -54,3 +55,8 @@ Academic Papers: argue that only a small static number of scheduling policies are needed in practice and advocate hierarchical policy specification and central realization. 
+ + +# Miscellaneous + + * [[IRC_2012-02-23]] diff --git a/microkernel/viengoos/documentation/irc_2012-02-23.mdwn b/microkernel/viengoos/documentation/irc_2012-02-23.mdwn new file mode 100644 index 00000000..a3229be9 --- /dev/null +++ b/microkernel/viengoos/documentation/irc_2012-02-23.mdwn @@ -0,0 +1,159 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!meta title="IRC, freenode, #hurd, 2012-02-23"]] + +[[!tag open_issue_documentation open_issue_viengoos]] + + neal: i've read a bit about current modern microkernel based + systems, and i'm wondering + neal: can a capability be used for both request and replies, or + does messaging need something similar to reply ports ? + braunr: you want a reply port + think about a file server: + the file server publishes a capability to access something + and multiple entities use it + if you wanted just bidirectional caps + that's the idea i had in mind, i just wondered if it was actually + still the case in practice + you'd need to create a new capability every time you delegated the + cap + yes + thanks + what about send once rights ? + also, if you send on a cap and then start waiting on it you could + get your own reply :) + you can get around send-once rights by using a counter + no i mean, is their behaviour still needed/useful ? + the counter is kernel implemented + yes + as an optimization + so they're just a special case of capability + yes + not a special capability type of their own + but they eliminate the constant create/destroy sequence + (even if it was already the case at the implementation level in + mach, they were named separately which could confuse people) + hm + actually, send once rights were used for important notifications + such as dead port notifications + is this still handled at the kernel level in modern ukernels ? + in viengoos, this is called the version field + see chapter 2 + + http://www.gnu.org/software/hurd/microkernel/viengoos/documentation/reference-guide.pdf + neal: btw, congratulations for viengoos, it really is a very + interesting project: ) + thanks + i don't see the point of rewriting a mach clone after reading + about it eh + I would definately do the messenger concept again + but I'd not do persistence + i don't fully understand how messengers deal with blocking + did you read chapter 4? + i read all of it but didn't understand everything :) + it's quite abstract and i didn't make time to read some of the + source code + If you have specific questions, I can try to help + i'll read those chapter again and formulate my questions after + I may have to read them as well :) + i don't understand how you manage to separate IPC from threading + actually + are messengers queues ? + messengers are super-buffers + they contain a reference to a thread object + to send a message, I use a messenger + I put the data in a buffer + and then I attach the messenger to the target messenger + braunr: my stance is that we should try to incorporate the ideas + from Viengoos into Mach in an evolutionary process... 
+ this causes an activation to be sent to the target messenger's + thread object + neal: which activation ? + an activation is like a CPU interrupt + neal: is it "allocated" at that moment, or taken from the sending + thread ? + (i'm not sure my question really makes sense to you :/) + braunr: not sure what you are asking exactly; but the basic idea + is that the receiving process preallocates message buffers + antrik: maybe, i'm not sure + when someone sends a message, it's stored in one of these buffers, + and the process gets a scheduler activation, so it can decide what to do + with it + antrik is right + the traget messenger designates a memory buffer + i'm wondering about the details of this activation + is it similar to thread migration ? + just before the activation, the data is copied to the messenger's + buffer + now someone needs to be notified + (that a message arrived) + that someone is the thread designated in the target messenger's + thread field + this is done by an activation + an activation is just an upcall + a thread is forced to a particular IP + an activation isn't a "what" it's a "how" + I never understood thread migration + as it's not really about threads + nor it is about migration + neal: what happens if another message comes in before the + activation handling tread is done with the previous one?... + the messenger is enqueued on the thread object + it is delivered when the thread is in normal mode + part of delivering an activation is putting the thread is activation + mode + when in activation mode, it can't receive any activations + i see + but then, when a thread receives an activation, does it handle + several queued messengers at once (not to loose events/messages) ? + (unless it does a blocking receive on a particular messenger, which + is necessary to support memory allocation in activated mode) + it handles one at a time + ah right + it can't lose events + activations are sent per messengers/events + well, it can + but it is possible to prevent this + neal: also, is message passing completely atomic ? + I'm not sure what you mean + which part + well, all parts of a message :) + in mach, a message can contain several parts + data, rights, passing one of them may fail + only the header is atomically processed + it's not atomic in the sense that a thread can observe the data copy + that's not what i meant + is a message completely transferred or not at all in case of + failure ? + it may be partially transferred + or can it be partially transferred + ok + for instance, if the target thread doesn't provide a memory buffer + then the data can't be copied + I don't recall off hand how I dealt with bad addresses + may be it is not possible + I don't remember + sorry + but if i read the message structure correctly, there can be one + data block, and several capability addresses in a single message, right ? + yes + ok + have you considered passing only one object (either data or + capability) per message ? + or is it too inefficient ? + you at least need a reply port + s/port/messenger/ + yes but can't it be passed separately ? 
+ then you have server state + ik + hm yes + thanks for your answers: ) + no problem diff --git a/open_issues/boehm_gc.mdwn b/open_issues/boehm_gc.mdwn index 19bd1b21..e7f849f2 100644 --- a/open_issues/boehm_gc.mdwn +++ b/open_issues/boehm_gc.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -281,3 +281,43 @@ It has last been run and compared on 2010-11-10, based on CVS HEAD sources from Git branches (2010-12-15: last change 2009-09). * + + +## IRC, OFTC, #debian-hurd, 2012-02-05 + +[[!tag open_issue_porting]] + + youpi: i think i found out the possible cause of the ecl and + mono issuess + -s + oh + basically, we don't have the realtime signals (so no + SIGRTMIN/SIGRTMAX defined), hence things use either SIGUSR1 or + SIGUSR2... which are used in libgc to resp. stop/resume threads when + "collecting" + i just patched ecl to use SIGINFO instead of SIGUSR1 (used when + no SIGRTMIN+2 is available), and it seems going on for a while + uh, why would SIGINFO work better than SIGUSR1? + it was a test, i tried the first "not common" signal i saw + my test was, use any signal different than USR1/2 + ah, sorry, I hadn't understood + you mean there's a conflict between ecl and mono using SIGUSR1, as + well as libgc? + yes + for example, in ecl sources see src/c/unixint.d, + install_process_interrupt_handler() + SIGINFO seems a sane choice + SIGPWR could have been a better choice if it was available :) + i would have chose an "unassigned" number, say SIGLOST (the + bigger one) + 10, but it would be greater than _NSIG (and thus discarded) + not a good idea indeed + it seems that linux, beside the range for rt signals, has some + "free space" + i'll start now another ecl build, from scratch this time, with + s/SIGUSR1/SIGINFO/ (making sure ctags won't bother), and if it works i'll + update svante's bug + + mmap(...PROT_NONE...) failed + hmm... + apparently enabling MMAP_ANON in mono's libgc copy was a good + step, let's see diff --git a/open_issues/bpf.mdwn b/open_issues/bpf.mdwn index 98b50430..2a8c897a 100644 --- a/open_issues/bpf.mdwn +++ b/open_issues/bpf.mdwn @@ -440,3 +440,125 @@ This is a collection of resources concerning *Berkeley Packet Filter*s. hm, there is a "snoop" source type, using raw sockets too far from the packet source, but i'll try it anyway hm wrong, snoop was the solaris packet filter fyi + + +## IRC, freenode, #hurd, 2012-01-28 + + nice, i have tcpdump working :) + let's see if it's as simple with wireshark + \o/ + pinotree: it was actually very simple + heh, POV ;) + yep, wireshark works too + promiscuous mode is harder to test :/ + but that's a start + + +## IRC, freenode, #hurd, 2012-01-30 + + ok so next step: get tcpreplay working + braunr: BTW, when you checked the status of the kernel BPF code, + did you take zhengda's enhancements/fixes into account?... + no + when did i check it ? + braunr: well, you said the kernel BPF code has serious + shortcomings. did you take zhengda's changes into account? 
+ antrik: ah, when i mention the issues, i considered the userspace + translator only + antrik: and stuff like non blocking io, exporting a selectable + file descriptor + antrik: deb http://ftp.sceen.net/debian-hurd experimental/ + antrik: this is my easy to use repository with a patched + libpcap0.8 + and a small and unoptimized pcap-hurd.c module + it doesn't use devopen yet + i thought it would be better to have packet filtering working + first as a debian patch, then get the new translator+final patch upstream + braunr, tcpdump works great here (awesome!). I'm probably using + exactly the same setup and "hardware" as you do, though :-P + + +## IRC, freenode, #hurd, 2012-01-31 + + antrik: i tend to think we need a bpf translator, or anything + between the kernel and libpcap to provide selectable file descriptors + jkoenig: do you happen to know how mach_msg (as called in a + hello.c file without special macros or options) deals with signals ? + i mean, is it wrapped by the libc in a version that sets errno ? + braunr: no idea. + braunr: what's up with it? (not that i have an idea about your + actual question, just curious) + pinotree: i'm improving signal handling in my pcap-hurd module + i guess checking for MACH_RCV_INTERRUPTED will dio + -INFO is correctly handled :) + ok new patch seems fine + braunr: selectable file descriptors? + antrik: see pcap_fileno() for example + it returns a file descriptor matching the underlying object + (usually a socket) that can be multiplexed in a select/poll call + obviously a mach port alone can't do the job + i've upgraded the libpcap0.8 package with improved signal handling + for tests + braunr: no idea what you are talking about :-( + + +## IRC, freenode, #hurd, 2012-02-01 + + antrik: you do know about select/poll + antrik: you know they work with multiple selectable/pollable file + descriptors + on most unix systems, packet capture sources are socket + descriptors + they're selectable/pollable + braunr: what are packet capture sources? + antrik: objects that provide applications with packets :) + antrik: a PF_PACKET socket on Linux for example, or a Mach device, + or a BPF file descriptor on BSD + for a single network device? or all of them? + AIUI the userspace BPF implementation in libpcap opens this + device, waits for packets, and if any arrive, decides depending on the + rules whether to pass them to the main program? + antrik: that's it, but it's not the point + antrik: the point is that, if programs need to include packet + sources in select/poll calls, they need file descriptors + without a translator, i can't provide that + so we either decide to stick with the libpcap patch only, and keep + this limitation, or we write a translator that enables this feature + braunr: are the two options exclusive? + pinotree: unless we implement a complete bpf translator like i did + years ago, we'll need a patch in libpcap + pinotree: the problem with my early translator implementation is + that it's buggy :( + pinotree: and it's also slower, as packets are small enough to be + passed through raw copies + braunr: I'm not sure what you mean when talking about "programs + including packet sources". programs only interact with packet sources + through libpcap, right? + braunr: or are you saying that programs somehow include file + descriptors for packet sources (how do they obtain them?) in their main + loop, and explicitly pass control to libpcap once something arrives on + the respecitive descriptors? 
+ antrik: that's the idea, yes + braunr: what is the idea? + 20:38 < antrik> braunr: or are you saying that programs somehow + include file descriptors for packet sources (how do they obtain them?) in + their main loop, and explicitly pass control to libpcap once something + arrives on the respecitive descriptors? + braunr: you didn't answer my question though :-) + braunr: how do programs obtain these FDs? + antrik: using pcap_fileno() for example + + +## IRC, freenode, #hurd, 2012-02-02 + + braunr: oh right, you already mentioned that one... + braunr: so you want some entity that exposes the device as + something more POSIXy, so it can be used in standard FS calls, unlike the + Mach devices used for pfinet + this is probably a good sentiment in general... but I'm not in + favour of a special solution only for BPF. rather I'd take this as an + indication that we probably should expose network interfaces as something + file-like in general after all, and adapt pfinet, eth-multiplexer, and + DDE accordingly + antrik: i agree + antrik: eth-multiplexer would be the right place diff --git a/open_issues/dde.mdwn b/open_issues/dde.mdwn index e2cff94f..adb070cd 100644 --- a/open_issues/dde.mdwn +++ b/open_issues/dde.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -13,3 +14,350 @@ License|/fdl]]."]]"""]] [[General Information|/dde]]. Still waiting for interface finalization and proper integration. + +[[!toc]] + + +# Upstream Status + + +## IRC, freenode, #hurd, 2012-02-08 + +At the microkernel davroom at [[community/meetings/FOSDEM_2012]]: + + there was quite some talk about DDE. I learnt that there are newer + versions in Genode and in Minix (as opposed to the DROPS one we are + using) + but apparently none of the guys involved is interested in creating + a proper upstream project with central repository and communication + channels :-( + the original DDE creator was also there, but said he isn't working + on it anymore + OK, and the other two projects basically have their own forks. + Or are they actively cooperating? + (If you know about it.) + well, Genode is also part of the Dresden L4 group; but apart from + that, I'd rather call it a fork... + hm... actually, I'm not sure anymore whether the guy I talked to + was from Genode or Nova... + (both from the Dresdem L4 group) + + +## IRC, freenode, #hurd, 2012-02-19 + + antrik: do we know exactly which DDE version Zheng Da took as a + base ? + (so as to be able to merge new changes easily) + youpi: not sure... but from what I gathered at FOSDEM, the version + he based on (from DROPS) is not really actively developed right now; if + we want to go for newer versions, we probably have to look at other + projects (like Genode or Nova or Minix) + there's no central project for dde ? + that sucks + no... and nobody seemed interested in having one :-( + pff + which makes me seriously question the original decision to build + on DDE... + .. 
+ if we have to basically maintain it ourselfs anyways, we could + just as well have gone with custom glue + well, the advantage of DDE is that it already exists now + on the positive side, one of the projcets (not sure which) + apparently have both USB and SATA working with some variant of DDE + + +# IRC, OFTC, #debian-hurd, 2012-02-15 + + i have no idea how the dde system works + gnumach patch to provide access to physical memory and interrupts + then userland accesses i/o ports by hand to drive things + but that assumes that no kernel driver is interfering + so one has to disable kernel drivers + how are dde drivers used? can they be loaded on their own + automatically, or you have to settrans yourself to setup a device? + there's no autoloader for now + we'd need a bus arbitrer that'd do autoprobing + i see + (you see i'm not really that low level, so pardon the flood of + posssibly-noobish questions ;) ) + I haven't set it up yet, but IIRC you need to specify which driver + to be used + well, I mostly have the same questions actually :) + I just have some guesswork here :) + i wonder whether the following could be feasible: + I'm wondering how we'll manage to make it work in d-i + a) you create a package which would b-d on linux-source, build a + selection of (network only for now) drivers and install them in + /hurd/dde/ + probably through a choice at the boot menu + I wouldn't dare depending on linux-source + dde is usually not up-to-date + b) add a small utility over the actual fsys_settrans() which + would pick the driver from /hurd/dde/ + ... so you could do `set-dde-driver b43 ` (or something + like that) + we can provide something like b) yes + although documenting the settrans should be fine enough ;) + the a) would help/ease with the fact that you need to compile on + your own the drivers + otherwise we would need to create a new linux-dde-sources-X.Y.Z + only with the sources of the drivers we want from linux X.Y.Z + (or hurd-dde-linux-X.Y.Z) + samuel.thibault * raccdec3 gnumach/debian/ (changelog + patches/70_dde.patch patches/series): + Add DDE experimental support + * debian/patches/70_dde.patch: Add experimental support for irq + passing and + physical memory allocation for DDE. Also adds nonetdev boot + parameter to + disable network device drivers. + ok, boots fine with the additional nonetdev option + now I need to try that dde hurd branch :) + samuel.thibault * rf8b9426 gnumach/debian/patches/70_dde.patch: Add + experimental.defs to gnuamch-dev + + +# IRC, freenode, #hurd, 2012-02-19 + + * youpi got dde almost working + it's able to send packets, but apparently not receive them + (e1000) + ok, rtl8139 works + antrik: the wiki instructions are correct + with e1000 I haven't investigated + (Archhurd guys also reported problems with e1000 IIRC... the one I + built a while back works fine though on my T40p with real e1000 NIC) + maybe I should try with current versions... something might got + broken by later changes :-( + at least testing could tell the changeset which breaks it + Mmm, it's very odd + with the debian package, pfinet's call to device_set_filter returns + D_INVALID_OPERATION + and indeed devnode.c returns that + ah but it's libmachdev which is supposed to answer here + youpi: so, regarding the failing device_set_filter... 
I guess you + are using some wrong combination of gnumach and pfinet + no it's actually that my pfinet was not using bpf + I've now fixed it + the DDE drivers rely on zhengda's modified pfinet, which uses + devnode, but also switched to using proper BPF filters. so you also need + his BPF additions/fixes in gnumach + OK + that's the latter + I had already fixed the devnode part + but hadn't seen that the filter was different + err... did I say gnumach? that of course doesn't come into play + here + so yes, you just need a pfinet using BPF + libmachdev does ;) + I'm just using pfinet from zhengda's DDE branch... I think devnode + and BPF are the only modifications + there's also a libpcap modification in the incubator + probably for tcpdump etc. + libpcap is used by the modified pfinet to compile the filter rule + why does pfinet need to compile the rule ? + it's libbpf which is used in the dde driver + it doesn't strictly need to... but I guess zhengda considered it + more elegant to put the source rule in pfinet on compile it live, rather + than the compiled blob + I probably discussed this with him myself a few years back... but + my memory on this is rather hazy ;-) + err... and compile it live + ah, right, it's only used when asking pfinet to change its filter + but it does not need it for the default filter + which is hardcoded + I see + when would pfinet change its filter? + * youpi now completely converting his hurd box to debian packages with dde + support + on SIOCSIFADDR apparently + to set "arp or (ip host %s)", + well, that sounds like the default filter... + the default filter does not choose an IP + oh, right... pfinet has to readjust the filter when setting the IP + arg we lack support for kernel options for gnumach in update-grub + again, I have a vague recollection of discussing this + * youpi crosses fingers + yay, works + so we *do* need libpcap in pfinet to set proper rules... though I + guess it can also work with a static catchall rule (like it did before + zhengda's changes), only less efficient + well in the past we were already catching everything anyway, so at + least it's not a regression :) + right + + +# IRC, freenode, #hurd, 2012-02-20 + + I was a bit wary of including the ton of dde headers in hurd-dev + maybe adding another package for that + but that would have delayed introducing the dde binaries + probably we can do that for next upload + i can try to work on it, if is feasible (ie if the dde drivers + can currently be built from outside the hurd source tree) + it should be, it's a matter of pointing its makefile to a place + where the make scripts and include headers are + (and the libraries) + ok + youpi: you mean DDEKit headers? + pinotree: actually it doesn't matter where the dde-ified Linux + drivers are built -- libdde_linux26 and the actual drivers use a + completetly different build system anyways + in fact we concluded at some point that they should live in a + separate repository -- but that change never happened + only the base stuff (ddekit, libmachdev etc.) belong in the Hurd + source tree + antrik: yes + antrik: err, not really completely different + the actual drivers' Makefile include the libdde_linux26 mk files + the build itself is separate, though + youpi: yes, I mean both libdde_linux26 and the drivers use a build + system that is completely distinct from the Hurd one + ah, yes + libdde_linux26 should however be shipped in the system + ideally libdde_linux26 and all the drivers should be built in one + go I'd say... 
+ it should be easily feasible to also have a separate driver too + e.g. to quickly try a 2.6 driver + youpi: I'm not sure about that. it's not even dynamically linked + IIRC?... + with scripts to build it + it's not + but that doesn't mean it can't be separate + .a files are usually shipped in -dev packages + youpi: ideally we should try to come with a build system that + reuses the original Linux makefile snippets to build all the drivers + automatically without any manual per-driver work + there's usually no modification of the drivers themselves? + but yeah + "ideally", when somebody takes the time to do it + unfortunately, it's necessary to include one particular + Hurd/DDE-specific header file in each driver :-( + can't it be done through gcc's -include option? + zhengda didn't find a way to avoid this... though I still hope + that it must be possible somehow + I think the problem is that it has to be included *after* the + other Linux headers. don't remember the details though + uh + well, a good script can add a line after the last occurrence of + #include + yeah... rather hacky, but might work + even with a bit of grep, tail, cut, and sed it should work :) + note that this is Hurd-specific; the L4 guys didn't need that + what is it? + don't remember off-hand + + +# IRC, freenode, #hurd, 2012-02-22 + + antrik: AIUI, it should be possible to include all network drivers + in just one binary? + that'd permit to use it in d-i + and completely replace the mach drivers + we just need to make sure to include at least what the mach drivers + cover + (all DDE network drivers, I mean) + of course that doesn't hinder from people to carefully separate + drivers in several binaries if they wish + antrik: it does link at least, I'll give a try later + yes it works! + that looks like a plan + throw all network drivers in a /hurd/dde_net + settrans it on /dev/dde_net, and settrans devnode on /dev/eth[0-9] + I'm also uploading a version of hurd which includes headers & + libraries, so you just need a make in dde_{e100,e1000,etc,net} + (uploading it with the dde driver itself :) ) + btw, a nice thing is that we don't really care that all drivers are + stuffed into a single binary, since it's a normal process only the useful + pages are mapped and actually take memory :) + is that really a nice thing though? compared to other systems I + mean + I know on linux it only loads the modules I need, for example. It's + definitely a step up for hurd though :D + that's actually precisely what I mean + you don't need to load only the modules you need + you just load them all + and paging eliminates automatically what's not useful + even parts of the driver that your device will not need + ooh + awesome + (actually, it's not even loaded, but the pci tables of the drivers + are loaded, then paged out) + + +# IRC, freenode, #hurd, 2012-02-24 + + antrik_: about the #include , I see the issue, it's + about jiffies + it wouldn't be a very good thing to have a jiffies variable which + we'd have to update 100times per second + so ZhengDa preferred to make jiffies a macro which calls a function + which reads the mapped time + however, that break any use of the work "jiffies", e.g. structure + members & such + actually it's not only after headers that the #include has to be + done, but after any code what uses the word "jiffies" for something else + than the variable + pb is: it has to be done *before* any code that uses the word + "jiffies" for the variable, e.g. 
inline functions in headers + in l4dde, there's already the jiffies variable so it's not a + problem + + +# IRC, OFTC, #debian-hurd, 2012-02-27 + + I plan to do some light performance testing w.r.t. DDE + Ethernet. That is DDE vs. Mach, etc. + that'd be good, indeed + I'm getting 4MiB/s with dde + I don't remember with mach + Yes. It just shouldn't regress too much. + Aha, OK. + + +## IRC, freenode, #hurd, 2012-02-27 + + tschwinge: nttcp tells me ~80Mbps for mach-rtl8139, ~72Mbps for + dde-rtl8139, ~72Mbps for dde-e1000 + civodul: ↑ btw + youpi: so the dde network device is not much slower than the + kernel-one? + youpi: yes, looks good + rather almost the same speed + apparently + that’s quite a deal. + what speed should it have as maximum? + (means: does the mach version get out all that’s possible?) + differently put: What speed would GNU/Linux get? + I'm checking that right now + cool! + we need those numbers for the moth after the next + Mmm, I'm not sure you really want the linux number :) + 1.6Gbps :) + oh… + let me check with udp rather than tcp + so the Hurd is a “tiny bit” worse at using the network… + it might simply be an issue with tcp tuning in pfinet + hm, yes + tcp is not that cheap + and has some pretty advanced stuff for getting to high speeds + well, I'm not thinking about being cheap + but using more recent tuning + that does not believe only 1Mbps network cards exist :) + like adaptive windows and such? + :) + yes + * ArneBab remembers that TCP was invented when the connections went over + phone lines - by audio :) + yep + what’s the system load while doing the test? + yes, udp seems not so bad + ah, cool! + it's very variable (300-3000Mbps), but like on linux + that pushing it into user space has so low cost is pretty nice. + * ArneBab thinks that that’s a point where Hurd pays off + that's actually what AST said to fosdem + he doesn't care about putting an RPC for each and every port i/o + because hardware is slow anyway + jupp + but it is important to see that in real life diff --git a/open_issues/default_pager.mdwn b/open_issues/default_pager.mdwn index 683dd870..9a8e9412 100644 --- a/open_issues/default_pager.mdwn +++ b/open_issues/default_pager.mdwn @@ -10,7 +10,10 @@ License|/fdl]]."]]"""]] [[!tag open_issue_gnumach]] -IRC, freenode, #hurd, 2011-08-31: +[[!toc]] + + +# IRC, freenode, #hurd, 2011-08-31 braunr: do you have any idea what could cause the paging errors long before swap is exhausted? @@ -29,3 +32,6 @@ IRC, freenode, #hurd, 2011-08-31: uvm is too different dragonflybsd maybe, but it's very close to freebsd i didn't look at darwin/xnu + + +# [[trust_the_behavior_of_translators]] diff --git a/open_issues/glibc_madvise_vs_static_linking.mdwn b/open_issues/glibc_madvise_vs_static_linking.mdwn index 6238bc77..7b5963d3 100644 --- a/open_issues/glibc_madvise_vs_static_linking.mdwn +++ b/open_issues/glibc_madvise_vs_static_linking.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -24,3 +25,15 @@ to ignore the advice.* (`man madvise`), so we may simply want to turn it into a no-op in glibc, avoiding the link-time warning. 2011-07: This is what Samuel has done for Debian glibc. 
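For illustration, a no-op `madvise` could look roughly like the sketch below. This is not the actual glibc/Debian change, just a minimal example; since POSIX allows the kernel to ignore the advice, unconditionally reporting success is conforming.

    #include <sys/mman.h>
    #include <sys/types.h>

    /* Hypothetical no-op madvise: accept and ignore the advice.  */
    int
    madvise (void *addr, size_t length, int advice)
    {
      (void) addr;
      (void) length;
      (void) advice;
      return 0;
    }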
+ + +# IRC, freenode, #hurd, 2012-02-16 + + youpi: With Roland's fix the situation will be that just using + gcc -static doesn't emit the stub warning, but still will do so in case + that the source code itself links in madvise. Is this acceptable for + you/Debian/...? + packages with -Werror will still bug out + not that I consider -Werror to be a good idea, though :) + youpi: Indeed. Compiler warnings can be caused by all kinds of + things. -Werror is good for development, but not for releases. diff --git a/open_issues/gnumach_memory_management.mdwn b/open_issues/gnumach_memory_management.mdwn index c34d1200..d29e316c 100644 --- a/open_issues/gnumach_memory_management.mdwn +++ b/open_issues/gnumach_memory_management.mdwn @@ -2089,7 +2089,15 @@ There is a [[!FF_project 266]][[!tag bounty]] on this task. ou alors dans les .h ipc -# IRC, freenode, #hurdfr, 2012-01-18 +# IRC, freenode, #hurd, 2012-01-18 does the slab branch need other reviews/reports before being integrated ? + + +# IRC, freenode, #hurd, 2012-01-30 + + youpi: do you have some idea about when you want to get the slab + branch in master ? + I was considering as soon as mcsim gets his paper + right diff --git a/open_issues/linux_as_the_kernel.mdwn b/open_issues/linux_as_the_kernel.mdwn new file mode 100644 index 00000000..f14b777e --- /dev/null +++ b/open_issues/linux_as_the_kernel.mdwn @@ -0,0 +1,42 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +Instead of attempting a [[history/port_to_another_microkernel]], or writing an +own one, an implementation of a Hurd system could use another existing +operating system/kernel, like [[UNIX]], for example, the Linux kernel. This is +not a [[microkernel]], but that is not an inherent hindrance; depending on what +the goals are. + +There has been an attempt for building a [[Mach_on_top_of_POSIX]]. + + +# IRC, freenode, #hurd, 2012-02-08 + +Richard's X-15 Mach re-implementation: + + and in case you didn't notice, it's stalled + actually i don't intend to work on it for the time being + i'd rather do as neal suggested: take linux, strip it, and give it + a mach interface + (if your goal really is to get something usable for real world + tasks) + braunr: why would you want to strip down Linux? I think one of the + major benefits of building a Linux-Frankenmach would be the ability to + use standard Linux functionality alongside Hurd... + we could have a linux x86_64 based mach replacement in "little" + time, with a compatible i386 interface for the hurd + antrik: well, many of the vfs and network subsystems would be hard + to use + BTW, one of the talks at FOSDEM was about the possibility of using + different kernels for Genode, and pariticularily focused on the + possibilities with using Linux... 
unfortunately, I wasn't able to follow + the whole talk; but they mentioned similar possibilities to what I'm + envisioning here :-) + diff --git a/open_issues/memory_object_model_vs_block-level_cache.mdwn b/open_issues/memory_object_model_vs_block-level_cache.mdwn new file mode 100644 index 00000000..7da5dea4 --- /dev/null +++ b/open_issues/memory_object_model_vs_block-level_cache.mdwn @@ -0,0 +1,273 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_documentation open_issue_hurd open_issue_gnumach]] + + +# IRC, freenode, #hurd, 2012-02-14 + + Open question: what do you think about dropping the memory object + model and implementing a simple block-level cache? + +[[microkernel/mach/memory_object]]. + + slpz: AFAIK the memory object has more purpose than just cache, + it's allow used for passing chunk of data between processes, handling + swap (which similar to cache, but still slightly different), ... + kilobug: user processes usually make their way to data with POSIX + operations, so memory objects are only needed for mmap'ed files + kilobug: and swap can be replaced for an in-kernel system or even + could still use the memory object + slpz: memory objects are used for the page cache + slpz: translators (especially diskfs based) make heavy use of + memory objects, and if "user processes" use POSIX semantics, Hurd process + (translators, pagers, ...) shouldn't be bound to POSIX + braunr: and page cache could be moved to a lower level, near to the + devices + not likely + well, it could, but then you'd still have the file system overhead + kilobug: but the use of memory objects it's not compulsory, you can + easily write a fs translator without implementing memory objects at all + (except to mmap) + a unified buffer/VM cache as all modern systems have is probably + the most efficient approach + braunr: I agree. I want to look at *BSD/Linux vfs systems to seem + how much cache policy depends on the filesystem + braunr: Are you aware of any good papers on this matter? + netbsd UVM, the linux virtual memory system + both a bit old bit still relevant + braunr: Thanks. + the problem in our case is that having FS and cache information at + different contexts (kernel vs. translator), I find hard to coordinate + them. 
+ that's why I though about a block-level cache that GNU Mach could + manage by itself + I wonder how QNX deals with this + the point of having a simple page cache is explicitely about not + caring if those pages are blocks or files or whatever + the kernel (at least, mach) normally has all the accounting + information it needs to implement its cache policy + file system translators shouldn't cache much + the pager interface could be refined, but it looks ok to me as it + is + Mach has the accounting info, but it's not able to purge the cache + without coordination with translators + which is normal + And this is a big problem when memory pressure increases, as it + doesn't know for sure when memory is going to be freed + Mach flushes its cache when it decides to, and sends back dirty + pages if needed by the pager + that's the case with every paging implementation + the main difference is security with untrusted pagers + but that's another issue + but in a monolithic implementation, the kernel is able for force a + chunk of cache memory to be freed without hoping for other process to do + the job + that's not true + they're not process, they're threads, but the timing issue is the + same + see pdflush on linux + no, it isn't. + when memory is scarce, threads that request memory can either wait + or immediately fail, and if they wait, they're usually woken by one of + the vm threads once flushing is done + a kernel thread can access all the information in the kernel, and + synchronization is pretty easy. + on mach, synchronization is done with messages, that's even easier + than shared kernel locks + with processes in different spaces, resource coordination becomes + really difficult + and what kind of info would an external pager need when simply + asked to take back its dirty pages + what resources ? + just take a look at the thread storm problem when GNU Mach needs to + clean a bunch of pages + Mach is big enough to correctly account memory + there can be thread storms on monolithic systems + that's a Mach issue, not a microkernel issue + that's why linux limits the number of pdflush thread instances + Mach can account memory, but can't assure when be freed by any + means, in a lesser degree than a monolithic system + again i disagree + no system can guarantee when memory will be freed with paging + a block level cache can, for most situations + slpz: why ? + slpz: or how i mean ? + braunr: with a block-level page cache, GNU Mach should be able to + flush dirty pages directly to the underlaying device without all the + complexity and resource cost involved in a m_o_data_return message. It + can also throttle the rate at which pages are being cleaned, and do all + this while blocking new page allocations to deal with memory exhaustion + cases. + braunr: in the current state, when cleaning dirty pages, GNU Mach + sends a bunch on m_o_data_return to the corresponding pagers, hoping they + will do their job as soon and as fast as possible. + memory is not really freed, but transformed from page cache to + anonymous memory pertaining to the corresponding translator + and GNU Mach never knows for sure when this memory is released, if + it ever is. 
+ not being able to flush dirty pages synchronously is a big problem + when you need to throttle memory usage + and needing allocating more memory when you're trying to free (which + is the case for the m_o_data_return mechanism) makes the problem even + worse + your idea of a block level cache means in kernel block drivers + that's not the direction we're taking + i agree flushing should be a synchronous process, which was one of + the proposed improvements in the thread migration papers + (they didn't achieve it but thought about it for future works, so + that the thread at the origin of the fault would handle it itself) + but it should be possible to have kernel threads similar to + pdflush and throttle flush requests too + again, i really think it's a mach bug, and having a buffer cache + would be stepping backward + the real design issue is allocating memory while trying to free + it, yes + braunr: thread migration doesn't apply to asynchronous IPC, and the + entire paging mechanism is implemented this way + in fact, trying to do a synchronous m_o_data_return will trigger a + deadlock for sure + to achieve synchronous flushing with translators, the entire paging + model must be redesigned + It's true that I'm not very confident of the viability of user space + drivers + at least, not for every device + I know this is against the current ideas for most ukernel designs, + but if we want to achieve real work functionality, I think some + sacrifices must be done. Or at least a reasonable compromise. + slpz: thread migration for paging requests implies synchronous + RPC, we don't care much about the IPC layer there + and it requires large changes of the VM code in addition, yes + let's not talk about this, we don't have thread migration anyway + :p + except the allocation-on-free-path issue, i really don't see how + the current pager interface or the page cache creates problems wrt + flushing .. + monolithic systems also have that problem, with lower impacts + though, but still + braunr: because as it doesn't know when memory is really freed, 1) + it just blindly sends a bunch of m_o_data_return to the pagers, usually + overloading them (causing thread storms), and 2) it can't properly + throttle new page requests to deal with resource exhaustion + it does know when memory is really freed + and yes, it blindly sends a bunch of requests, they can and should + be trottled + but dirty pages freed become indistinguishable from common anonymous + chunks released, so it doesn't really know if page flushes are really + working or not (i.e. doesn't know how fast a device is processing write + requests) + memory is freed when the pager deallocates it + the speed of the operation is irrelevant + no system can rely on disk speed to guarantee correct page flushes + disk or anything else + requests can't be throttled if Mach doesn't know when they are being + processed + it can easily know it + they are processed as soon as the request is sent from the kernel + and processing is done when the pager acknowledges the end of the + flush + memory backing the flushed pages should be released before + acknowleding that to avoid starting new requests too soon + AFAIK pagers doesn't acknowledge the end of the flush + well that's where the interface should be refined + Mach just sends the m_o_data_return and continues on its own + that's why flushing should be synrhconous + are you sure about that however ? + so the entire paging system needs a new design... :) + pretty sure + not a new design .. 
+    there is m_o_supply_completed, i don't see how difficult it would
+    be to add m_o_data_return_completed
+    it's not a small change, but not a difficult one either
+    i'm more worried about the allocation problem
+    the default pager should probably be wired in memory
+    maybe others
+    let's suppose a case in which Mach needs to free memory due to an
+    increase in memory pressure. vm_pageout_daemon starts running, clean
+    pages are freed easily, but for each dirty one an m_o_data_return is
+    sent. 1) when should this daemon stop sending m_o_data_return and start
+    waiting for m_o_data_return_completed? 2) what happens if the translator
+    needs to read new blocks to fulfill a write request (pretty common in
+    ext2fs)?
+    it should stop after an arbitrary limit is reached
+    a reasonable one
+    linux limits the number of pdflush threads for that reason as i
+    mentioned (to 8 iirc)
+    the problem of reading blocks while flushing is what i'm worried
+    about too, hence the need to wire that code
+    well, i'm not sure it's needed
+    again, a reasonable amount of free memory should be reserved for
+    that at all times
+    but the work for pdflush seems to be a lot easier, as it only deals
+    directly with block devices (if I understood it correctly, I just started
+    looking at it).
+    i don't know how other systems compute that, but this is how they
+    seem to do it as well
+    no, i don't think so
+    well, I'll try to invest a few days understanding how pdflush works,
+    to see if some ideas can be borrowed for Hurd
+    iirc, freebsd has thresholds in percent for each part of its cache
+    (active, inactive, free, dirty)
+    but I still think simple solutions work better, and using the memory
+    object for page cache is tremendously complex.
+    the amount of free cache pages is generally sufficient to
+    guarantee much memory can be released at once if needed, without flushing
+    anything
+    yes but that's the whole point of the Mach VM
+    and its greatest advance ..
+    what, memory objects?
+    yes
+    using physical memory as a cache for anything, not just block
+    buffers
+    memory objects work great as a way to provide a shared image of
+    objects between processes, but as page cache they are overkill (IMHO).
+    or, at least, in the way we're using them
+    probably
+    http://lwn.net/Articles/326552/
+    this can help understand the problems we may have without better
+    knowledge of the underlying devices, yes
+    (e.g. not being able to send multiple requests to pagers that
+    don't share the same disk)
+    slpz: actually i'm not sure it's that overkill
+    the linux vm uses struct vm_file to represent memory objects iirc
+    there are many links between that structure and some vfs related
+    subsystems
+    when a system very actively uses the page cache, the kernel has to
+    maintain a lot of objects to accurately describe the cache content
+    you could consider this overkill at first too
+    the mach way of doing it just implies some ipc messages instead of
+    function calls, it's not that overkill for me
+    the main problems are recursion (allocation while freeing,
+    handling page faults in order to handle flushes, that sort of thing)
+    struct file and struct address_space actually
+    slpz: see struct address_space, it contains a set of function
+    pointers that can help understanding the linux pager interface
+    they probably suffered from similar caveats and worked around
+    them, adjusting that interface on the way
+    but their strategy makes them able to treat the relationship between
+    the page cache and the block devices in a really simple way, almost as a
+    traditional storage cache.
+    meanwhile, in the Mach+pager scenario, the relationship between a
+    block in a file and its underlying storage becomes really blurry
+    this is a huge advantage when flushing out data, especially when
+    resources are scarce
+    I think the idea of using abstract objects for the page cache loses
+    a bit of the point that we just want to avoid constantly accessing a
+    slow device
+    and breaking the tight relationship between the device and its
+    cache makes things a lot harder
+    this also manifests itself when flushing clean pages, with things
+    like having a static maximum for cached memory objects
+    we shouldn't care about the number of objects, we just need to
+    control the number of pages
+    but as we need the pager to flush pages, we need to keep alive a lot
+    of control ports to them
+    slpz: When m_o_data_return is called, once the memory manager no
+    longer needs the supplied data, it should be deallocated using
+    vm_deallocate. So this way pagers acknowledge the end of the flush.
diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn
index 0b69d645..abec304d 100644
--- a/open_issues/select.mdwn
+++ b/open_issues/select.mdwn
@@ -14,9 +14,11 @@ License|/fdl]]."]]"""]]
 
 There are a lot of reports about this issue, but no thorough analysis.
 
-# `elinks`
+# Short Timeouts
 
-IRC, unknown channel, unknown date.
+## `elinks`
+
+IRC, unknown channel, unknown date:
 
     This is related to ELinks... I've looked at the select()
     implementation for the Hurd in glibc and it seems that giving it a short
@@ -31,9 +33,186 @@ IRC, unknown channel, unknown date.
 
     Or do I just imagine this problem?
 
-# dbus
+## [[dbus]]
+
+
+## IRC
+
+### IRC, freenode, #hurd, 2012-01-31
+
+    don't you find vim extremely slow lately ?
+    (and not because of cpu usage but rather unnecessary sleeps)
+    yes.
+    wasn't there a discussion to add a minimum timeout to mach_msg for
+    select() or something like that during the past months ?
+    there was, and it was added
+    that could be it
+    I don't want to drop it though, some apps really need it
+    as a debian patch only iirc ?
+ yes + ok + if i'm right, the proper solution was to fix remote servers + instead of client calls + (no drop, unless the actual bug gets fixed of course) + so i'm guessing it's just a hack in between + not only + with a timeout of zero, mach will just give *no* time for the + servers to give an answer + that's because the timeout is part of the client call + so the protocol has to be rethought, both server/client side + a suggested solution was to make it a parameter + i mean, part of the message + not a mach_msg parameter + OTOH the servers should probably not be trusted to enforce the + timeout. + why ? + they're not necessarily trusted. (but then again, that's not the + only circumstances where that's a problem) + there is a proposed solution for that too (trust root and self + servers only by default) + I'm not sure they're particularily easy to identify in the + general case + "they" ? the solutions you mean ? + or the servers ? + jkoenig: you can't trust the servers in general to provide an + answer, timeout or not + yes the root/self servers. + ah + jkoenig: you can stat the actual node before dereferencing the + translator + could they not report FD activity asynchronously to the message + port? libc would cache the state + I don't understand what you mean + anyway, really making the timeout part of the message is not a + problem + 10:10 < youpi> jkoenig: you can't trust the servers in general to + provide an answer, timeout or not + we already trust everything (e.g. read() ) into providing an answer + immediately + i don't see why + braunr: put sleep(1) in S_io_read() + it'll not give you an immediate answer, O_NODELAY being set or not + well sleep is evil, but let's just say the server thread blocks + ok + well fix the server + so we agree + ? + in the current security model, we trust the server into achieve the + timeout + yes + and jkoenig's remark is more global than just select() + taht's why we must make sure we're contacting trusted servers by + default + it affects read() too + sure + so there's no reason not to fix select() + that's the important point + but this doesn't mean we shouldn't pass the timeout to the server + and expect it to handle it correctly + we keep raising issues with things, and not achieve anything, in + the Hurd + if it doesn't, then it's a bug, like in any other kernel type + I'm not the one to convince :) + eh, some would say it's one of the goals :) + who's to be convinced then ? + jkoenig: + who raised the issue + ah + well, see the irc log :) + not that I'm objecting to any patch, mind you :-) + i didn't understand it that way + if you can't trust the servers to act properly, it's similar to + not trusting linux fs code + no, the difference is that servers can be non-root + while on linux they can't + again, trust root and self + non-root fuse mounts are not followed by default + as with fuse + that's still to be written + yes + and as I said, you can stat the actual node and then dereference + the translator afterwards + but before writing anything, we'd better agree on the solution :) + which, again, "just" needs to be written + err... adding a timeout to mach_msg()? that's just wrong + (unless I completely misunderstood what this discussion was + about...) 
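+
+To make the failure mode described above concrete, here is a small,
+self-contained sketch.  It is illustrative only (the real implementation
+lives in glibc's sysdeps/mach/hurd/hurdselect.c and is considerably more
+involved): with the timeout applied as a receive timeout on the client's
+reply port, a zero timeout makes mach_msg return MACH_RCV_TIMED_OUT before
+any server has had a chance to answer.  That is why the proposed fix is to
+carry the timeout inside the io_select request itself and let the (trusted)
+server enforce it.
+
+    #include <mach.h>
+    #include <stdio.h>
+
+    int main (void)
+    {
+      mach_port_t replyport;
+      struct { mach_msg_header_t head; char body[64]; } msg;
+      mach_msg_return_t err;
+
+      if (mach_port_allocate (mach_task_self (), MACH_PORT_RIGHT_RECEIVE,
+                              &replyport))
+        return 1;
+
+      /* The io_select requests would already have been sent to the
+         servers, naming this port as the reply port.  Nothing was sent
+         here, which is indistinguishable from a server that has not
+         answered yet.  */
+      err = mach_msg (&msg.head, MACH_RCV_MSG | MACH_RCV_TIMEOUT,
+                      0, sizeof msg, replyport,
+                      0 /* milliseconds, i.e. poll-style timeout */,
+                      MACH_PORT_NULL);
+
+      if (err == MACH_RCV_TIMED_OUT)
+        puts ("timed out before any server could possibly have replied");
+
+      return 0;
+    }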
+
+
+#### IRC, freenode, #hurd, 2012-02-04
+
+    this is confirmed: the select hack patch hurts vim performance a
+    lot
+    I'll use program_invocation_short_name to make the patch even more
+    ugly
+    (of course, we really need to fix select somehow)
+    could it (also) be that vim uses select() somehow "badly"?
+    fsvo "badly", possibly, but still
+    Could the select() stuff be the reason for a ten times
+    slower ethernet too, e.g. scp and apt-get?
+    i didn't find scp or apt-get slower myself, unlike vim
+    see strace: scp does not use select
+    (I haven't checked apt yet)
+
+
+### IRC, freenode, #hurd, 2012-02-14
+
+    on another subject, I'm wondering how to correctly implement
+    select/poll with a timeout on a multiserver system :/
+    i guess a timeout of 0 should imply a non-blocking round-trip to
+    servers only
+    oh good, the timeout is already part of the io_select call
+
+
+### IRC, freenode, #hurdfr, 2012-02-22
+
+    the big problem with our implementation is that the select timeout
+    is a client parameter
+    a parameter passed directly to mach_msg
+    so if you set a timeout of 0, chances are that mach_msg returns
+    before an RPC can even complete (that is, a full client-server
+    round-trip)
+    and so when the timeout is 0 for non-blocking behaviour, you don't
+    block, but you don't get your events either ..
+    maybe lowering the timeout from 10ms to 10us would improve the
+    situation.
+    because 10ms is a bit much :)
+    that's the historical unix system interval timer
+    and mach is not preemptible
+    so it's not doable as things stand
+    that said, it's not completely related
+    well, actually it is: we would need something similar to linux's
+    high resolution timers
+    either high resolution timers, or a timer that is easy to
+    reprogram
+    currently only the 8254 is programmed, and to ensure roughly
+    correct scheduling, it is programmed once, at 10ms, and that's it
+    so yes, specifying 1ms or 1us won't change anything about the
+    interval needed to determine that the timer has expired
+
+
+### IRC, freenode, #hurd, 2012-02-27
 
-See [[dbus]].
+    braunr: extremely dirty hack
+    I don't even want to detail :)
+    oh
+    does it affect vim only ?
+    or all select users ?
+    we've mostly seen it with vim
+    but possibly fakeroot has some issues too
+    it's very unlikely that only vim has the issue :)
+    i mean, is it that dirty to switch behaviour depending on the
+    calling program ?
+    not all select users
+    ew :)
+    just those which do select({0,0})
+    well sure
+    braunr: you guessed right :)
+    thanks anyway
+    it's probably a good thing to do currently
+    vim was getting me so mad i was using sshfs lately
+    it's better than nothing yes
 
 # See Also
 
diff --git a/open_issues/trust_the_behavior_of_translators.mdwn b/open_issues/trust_the_behavior_of_translators.mdwn
new file mode 100644
index 00000000..454c638b
--- /dev/null
+++ b/open_issues/trust_the_behavior_of_translators.mdwn
@@ -0,0 +1,181 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_hurd]]
+
+Apart from the issue of [[translators_set_up_by_untrusted_users]], another
+problem is described here.
+
+
+# IRC, freenode, #hurd, 2012-02-17
+
+(Preceded by the [[memory_object_model_vs_block-level_cache]] discussion.)
+
+    what should Mach do with a translator that doesn't clean pages in a
+    reasonable amount of time?
+    (I'm talking about pages flushed to a pager by the pageout daemon)
+    slpz: i don't know what it should do, but currently, it uses the
+    default pager
+
+[[default_pager]].
+
+    braunr: I know, but I was thinking about an alternative, for the
+    case in which a translator is not behaving properly
+    perhaps freeing the page, removing the map, and letting it die in a
+    segmentation fault could be an option
+    slpz: what if the translator can't do it properly because of
+    system resource exhaustion ?
+    (i.e. it doesn't have enough cpu time)
+    braunr: that's the biggest question
+    let's suppose that Mach selects a page, sends it to the pager for
+    cleaning it up, reinjects the page into the active queue, and later it
+    finds the page again and it's still dirty
+    but it needs to free some pages because memory is really, really
+    scarce
+    Linux just sits there waiting for I/O completion for that page
+    (trusts its own drivers)
+    but we could be dealing with a rogue translator...
+    yes
+    we may need some sort of "authentication" mechanism for pagers
+    so that "system pagers" are trusted and others not
+    using something like the device master port but for pagers
+    a special port passed to trusted pagers only
+    hum... that could be used to work around the untrusted translator
+    crossing problem while walking a directory tree
+
+[[translators_set_up_by_untrusted_users]].
+
+    but I think differentiating between trusted and untrusted
+    translators was rejected for philosophical reasons
+    (but I'm not sure)
+    slpz: probably there should be something like an oom killer?
+    braunr: even if a translator is trusted it could have a bug which
+    makes it ask for more and more memory, so the system has to do
+    something about it. Also, this way the TCB is increased, so providing a
+    port for trusted translators may hurt security.
+    I've read that Genode has "guarded allocators" which help resource
+    accounting by limiting the memory that can be used. Probably something
+    like this could be used in Hurd to limit translators.
+    I don't remember how Viengoos deals with this :-(
+
+[[microkernel/Viengoos]].
+
+    mcsim: the main feature lacking in mach is resource accounting :p
+
+[[resource_management_problems]].
+
+    mcsim: yes, I think there should be a Hurdish oom killer, paying
+    special attention to external pagers
+
+[[microkernel/mach/external_pager_mechanism]].
+
+    the oom killer selects untrusted processes by definition (since
+    pagers are in kernel)
+    slpz: and what is better: an oom killer or resource accounting?
+    Under resource accounting I mean a mechanism by which a process
+    can't get more resources than it is allowed.
+    resource accounting of course
+    but it's not just about that
+    really, how does the kernel deal when a pager refuses to honor a
+    paging request ?
+    whether it is buggy or malicious
+    is it really possible to keep all pagers out of the TCB ?
+    mcsim: we definitely want proper resource accounting in the long
+    run. the question is how to deal with the situation that resources are
+    reallocated to other tasks, so some pages have to be freed
+    I really don't remember how Neal proposed to deal with this
+    mcsim: Better: resource accounting (in which resources are accounted
+    to the user tasks which are requesting them, as in the Viengoos
+    model). Good enough and realistic: oom killer
+    I'm not sure an OOM killer for non-system pagers is terribly
+    helpful. in typical use, the vast majority of paging is done by trusted
+    pagers...
+    and without proper client resource accounting, there are enough
+    other ways a rogue/broken process can eat system resources -- I'm not
+    convinced that untrusted pagers have a major impact on the overall
+    situation
+    If a pager can't free resources because of a lack, for example, of
+    cpu time, its priority could be increased to give it a second chance to
+    free the page. But if it doesn't manage to free resources it could be
+    killed.
+    I think the current approach with the default pager taking over is
+    good enough for dealing with untrusted pagers. the real problem is even
+    trusted pagers frequently failing to deal with the requests
+    i agree with antrik
+    and i'm opposed to an oom killer
+    it's really not a proper fix for our problems
+    mcsim: what if it needs 3 attempts before succeeding ?
+    and increasing priority without a good reason (e.g. no priority
+    inversion) leads to unfairness
+    we don't want to deal with tricky problems like malicious pagers
+    using that to starve other tasks
+    braunr: this is just a temporary decision (for example for half a
+    second of user time), to increase the probability that the task was
+    killed not because it lacked resources.
+    mcsim: tunables should only affect the efficiency of an operation,
+    not its success
+
+
+## IRC, freenode, #hurd, 2012-02-19
+
+    neal: the specific question is how to ensure processes free memory
+    fast enough when their allocation becomes lower due to resource pressure
+    antrik: you can't really.
+    antrik: the memory manager can act on the application's behalf if
+    the application marks pages as discardable or pagable.
+    antrik: if the memory manager makes an upcall to the application to
+    free some memory and it doesn't, you have to penalize it.
+    antrik: You shouldn't the process like exokernel
+    antrik: It's the developer's fault, not the user's
+    antrik: What you need are controls that ensure that the user stays
+    in control
+    ...shouldn't *kill* the process...
+    neal: well, how can I penalize a process that eats too much
+    physical memory?
+    in the future, you don't give it as much slack memory
+    marking as pagable means a system pager will push them to the swap
+    partition?
+ ah, OK + yes + and you page it more aggressively, i.e., you don't give it a chance + to free memory + The situation is: + you have memory pressure + you choose a victim process and ask it to free memory + now, you need to wait + if you wait and it doesn't free memory, you give it bad karma + if you wait and it frees memory, you're good + but during that window, a bad process can delay recovery + so, in the future, we give bad processes less time + but, we still send a message asking it to free memory + we just hope it was a bug + so the major difference to the approach we have in Mach is that + instead of just redeclaring some pages as anonymous memory that will be + paged to swap by the default pager eventually if the pager in question + fails to handle them properly, we wait some time for the process to free + (any) memory, and only then start paging out some of it's pages to swap + there's also discardable memory + hm... there is some notion of "precious" pages in Mach... I wonder + whether that is also used to decide about discarding pages instead of + pushing them to swap? + antrik: A precious page is ro data that shouldn't be dropped + ah + but I guess that means non-precious RO data (such as a cache) can + be dropped without asking the pager, right? + yes + I wonder whether such a karma system can be introduced in Mach as + well to deal with problematic pagers + + +## IRC, freenode, #hurd, 2012-02-21 + + antrik: One of the main differences between Mach and Viengoos is + that in Mach servers are responsible for managing memory whereas in + Viengoos applications are primarily responsible for managing memory. diff --git a/public_hurd_boxen/xen_handling.mdwn b/public_hurd_boxen/xen_handling.mdwn index 47d92c43..d4e33ce9 100644 --- a/public_hurd_boxen/xen_handling.mdwn +++ b/public_hurd_boxen/xen_handling.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2009 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -44,6 +44,7 @@ by typing in `host=flubber`, for example, will be enough to get access to that machine's console. /!\ TODO: How does one get the environment variables `COLUMNS` and `LINES` set -properly when using `xm console`? This is relevant for everything using -`(n)curses` -- for interactive console applications. Using `export COLUMNS=143 -LINES=44` does work, but is a manual process. +properly when using `xm console`? According to Samuel, *you don't, the xen +console doesn't have the notion of terminal size*. This is relevant for +everything using `(n)curses` -- for interactive console applications. Using +`export COLUMNS=143 LINES=44` does work, but is a manual process. -- cgit v1.2.3