Diffstat (limited to 'open_issues')
31 files changed, 4021 insertions, 32 deletions
diff --git a/open_issues/64-bit_port.mdwn b/open_issues/64-bit_port.mdwn index 797d540f..2d273ba1 100644 --- a/open_issues/64-bit_port.mdwn +++ b/open_issues/64-bit_port.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -10,7 +10,11 @@ License|/fdl]]."]]"""]] [[!tag open_issue_gnumach open_issue_mig]] -IRC, freenode, #hurd, 2011-10-16: +There is a `master-x86_64` GNU Mach branch. As of 2012-11-20, it only supports +the [[microkernel/mach/gnumach/ports/Xen]] platform. + + +# IRC, freenode, #hurd, 2011-10-16 <youpi> it'd be really good to have a 64bit kernel, no need to care about addressing space :) @@ -34,3 +38,22 @@ IRC, freenode, #hurd, 2011-10-16: <youpi> and it'd boost userland addrespace to 4GiB <braunr> yes <youpi> leaving time for a 64bit userland :) + + +# IRC, freenode, #hurd, 2012-10-03 + + <braunr> youpi: just so you know in case you try the master-x86_64 with + grub + <braunr> youpi: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=689509 + <youpi> ok, thx + <braunr> the squeeze version is fine but i had to patch the wheezy/sid one + <youpi> I actually hadn't hoped to boot into 64bit directly from grub + <braunr> youpi: there is code in viengoos that could be reused + <braunr> i've been thinking about it for a time now + <youpi> ok + <braunr> the two easiest ways are 1/ the viengoos one (a -m32 object file + converted with objcopy as an embedded loader) + <braunr> and 2/ establishing an identity mapping using 4x1 GB large pages + and switching to long mode, then jumping to c code to complete the + initialization + <braunr> i think i'll go the second way with x15, so you'll have the two :) diff --git a/open_issues/arm_port.mdwn b/open_issues/arm_port.mdwn new file mode 100644 index 00000000..2d8b9038 --- /dev/null +++ b/open_issues/arm_port.mdwn @@ -0,0 +1,238 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +Several people have expressed interested in a port of GNU/Hurd for the ARM +architecture. + + +# IRC, freenode, #hurd, 2012-10-09 + + <mcsim> bootinfdsds: There was an unfinished port to arm, if you're + interested. + <tschwinge> mcsim: Has that ever been published? + <mcsim> tschwinge: I don't think so. But I have an email of that person and + I think that this could be discussed with him. + + +## IRC, freenode, #hurd, 2012-10-10 + + <tschwinge> mcsim: If you have a contact to the ARM porter, could you + please ask him to post what he has? + <antrik> tschwinge: we all have the "contact" -- let me remind you that he + posted his questions to the list... + + +## IRC, freenode, #hurd, 2012-10-17 + + <mcsim> tschwinge: Hello. The person who I wrote regarding arm port of + gnumach still hasn't answered. And I don't think that he is going to + answer. 
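Back in the `64-bit_port` log above, braunr's second option (an identity
mapping of the low 4 GiB using four 1 GiB pages, then the switch to long
mode) can be sketched in C. This is a hypothetical illustration, not code
from the `master-x86_64` branch: all symbol names are invented, it assumes a
32-bit multiboot loader running with paging disabled (so pointers equal
physical addresses), and it assumes the CPU advertises 1 GiB pages (CPUID
0x80000001, EDX bit 26); otherwise 2 MiB pages and one more table level
would be needed.

    /* Identity-map physical 0..4 GiB in preparation for long mode.
       Hypothetical sketch; compile as 32-bit code (-m32).  */
    #include <stdint.h>

    #define PTE_P   0x001ULL   /* present */
    #define PTE_W   0x002ULL   /* writable */
    #define PTE_PS  0x080ULL   /* in a PDPTE: this entry is a 1 GiB page */

    static uint64_t pml4[512] __attribute__ ((aligned (4096)));
    static uint64_t pdpt[512] __attribute__ ((aligned (4096)));

    void
    boot_build_identity_map (void)
    {
      int i;

      /* Four 1 GiB entries cover physical 0..4 GiB.  */
      for (i = 0; i < 4; i++)
        pdpt[i] = ((uint64_t) i << 30) | PTE_PS | PTE_W | PTE_P;

      /* One PML4 slot maps the low 512 GiB through the PDPT.  */
      pml4[0] = (uint64_t) (uintptr_t) pdpt | PTE_W | PTE_P;

      /* The remaining steps are assembly and not shown here: load &pml4
         into CR3, set CR4.PAE and EFER.LME, enable CR0.PG, far-jump into
         a 64-bit code segment, then call the C initialization code.  */
    }

The appeal of this variant over the viengoos-style embedded loader is that
the 1 GiB identity map needs no temporary page-table pages at all, keeping
the 32-bit-to-64-bit transition self-contained.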
+ + +# IRC, freenode, #hurd, 2012-11-15 + + <matty3269> Well, I have a big interest in the ARM architecture, I worked + at ARM for a bit too, and I've written my own little OS that runs on + qemu. Is there an interest in getting hurd running on ARM? + <braunr> matty3269: not really currently + <braunr> but if that's what you want to do, sure + <tschwinge> matty3269: Well, interest -- sure!, but we don't really have + people savvy in low-level kernel implementation on ARM. I do know some + bits about it, but more about the instruction set than about its memory + architecture, for example. + <tschwinge> matty3269: But if you're feeling adventurous, by all means work + on it, and we'll try to help as we can. + <tschwinge> matty3269: There has been one previous attempt for an ARM port, + but that person never published his code, and apparently moved to a + different project. + <tschwinge> matty3269: I can help with toolchains (GCC, etc.) things for + ARM, if there's need. + <matty3269> tschwinge: That sounds great, thanks! Where would you recommend + I start (at the moment I've got Mach checked out and am trying to get it + compiled for i386) + <matty3269> I'm guessing that the Mach micro-kernel is all that would need + to be ported or are there arch-dependant bits of code in the server + processes? + <tschwinge> matty3269: + http://www.gnu.org/software/hurd/faq/system_port.html has some + information. Mach is the biggest part, yes. Then some bits in glibc and + libpthread, and even less in the Hurd libraries and servers. + <tschwinge> matty3269: Basically, you'd need equivalents for the i386 (and + similar) directories, yep. + <tschwinge> Though, you may be able to avoid some cruft in there. + <tschwinge> Does building for x86 have any issues? + <tschwinge> matty3269: How is generally your understanding of the Hurd on + Mach system architecture, and on microkernel-based systems generally, and + on Mach in particular? + <matty3269> tschwinge: yes, it seems to be progressing... I've got mig + installed and it's just compiling now + <matty3269> hmm, not too great if I'm honest, I've done mostly monolithic + kernel development so having such low-level processes, such as + scheduling, done in user-space seems a little strinage + <tschwinge> Ah, yes, MIG will need a little bit of porting, too. I can + help with that, but that's not a priority -- first you have to get Mach + to boot at all; MIG will only be needed once you need to deal with RPCs, + so user-land/kernel interaction, basically. Before, you can hack around + it. + <matty3269> tschwinge: I have been running a GNU/Hurd system for a while + now though + <tschwinge> I'm happy to tell you that the schedules is still in the + kernel. ;-) + <tschwinge> OK, good, so you know about the basic ideas. + <braunr> matty3269: there has to be machine specific stuff in user space + <braunr> for initial thread contexts for example + <matty3269> tschwinge: Ok, just got gnumach built + <braunr> but there isn't much and you can easily base your work from the + x86 implementation + <tschwinge> Yes. Mach itself is the more difficult one. + <matty3269> braunr: Yeah, looking around at things, it doesn't seem that + there will be too much work involoved in the user-space stuff + <tschwinge> braunr: Do you know off-hand whether there are some old Mach + research papers describing architecture ports? 
+ <tschwinge> I know there are some describing the memory system (obviously), + and I/O system -- which may help matty3269 to understand the general + design/structure. + <tschwinge> We might want to identify some documents, and make a list. + <braunr> all mach related documentation i have is available here: + ftp://ftp.sceen.net/mach/ + <braunr> (also through http://) + <tschwinge> matty3269: Oh, definitely I'd suggest the Mach 3 Kernel + Principles book. That gives a good description of the Mach architecture. + <matty3269> Great, that's my weekends reading then! + <braunr> you don't need all that for a port + <matty3269> Is it possible to run the gnumach binary standalone with qemu? + <braunr> you won't go far with it + <braunr> you really need at least one program + <braunr> but sure, for a port development, it can easily be done + <braunr> i'd suggest writing a basic static application for your tests once + you reach an advanced state + <braunr> the critical parts of a port are memory and interrupts + <braunr> and memory can be particularly difficult to implement correctly + <tschwinge> matty3269: I once used QEMU's + virtual-FAT-filesystem-from-a-directory-on-the-host, and configured GRUB + to boot from that one, so it was easy to quickly reboot for kernel + development. + <braunr> but the good news is that almost every bsd system still uses a + similar interface + <tschwinge> matty3269: And, you may want to become familiar with QEMU's + built-in gdbserver, and how to connect to and use that. + <braunr> so, for example, you could base your work from the netbsd/arm pmap + module + <tschwinge> matty3269: I think that's better than starting on real + hardware. + <braunr> tschwinge: you can use -kernel with a multiboot binary now + <braunr> tschwinge: and even creating iso images is so fast it's not any + slower + <tschwinge> braunr: Yeah, I thought so, but never checked this out -- + recently I saw in qemu --help's output some »multiboot« thing flashing + by. :-) + <braunr> i think it only supports 32-bits executables though + <matty3269> braunr: Yeah, I just tried passing gnumach as the -kernel + parameter to qemu, but it segged qemu :S + <braunr> otherwise i'd be using it for x15 + <matty3269> qemu: fatal: Trying to execute code outside RAM or ROM at + 0xc0100000 + <braunr> how much ram did you give qemu ? + <matty3269> I used '-m 512' + <braunr> hum, so the -kernel option doesn't correctly implement elf loading + or something like that + <braunr> anyway, i'm not sure how well building gnumach on a non-hurd + system is supported + <braunr> so you may want to simply develop inside your VM for the time + being, and reboot + <matty3269> doing an objdump of it seems fine... + <braunr> ? + <braunr> ah, the gnumach executable is a correct elf image + <braunr> that's not the point + <matty3269> Is there particular reason that mach is linked at 0xc0100000? + <matty3269> or is that where it is expected to be in VM> + <tschwinge> That's my understanding. + <braunr> kernels commmonly sti at high addresses + <braunr> that's the "standard" 3G/1G split for user/kernel space + <matty3269> I think Linux sits at a similar VA for 32-bit + <braunr> no + <matty3269> Oh, I thought it did, I know it does on ARM, the kernel is + mapped to 0xc000000 + <braunr> i don't know arm, but are you sure about this number ? 
+ <braunr> seems to lack a 0 + <matty3269> Ah, yes sorry + <matty3269> so 0xC0000000 + <braunr> 0xc0100000 is just 1 MiB above it + <braunr> the .text section of linux on x86 actually starts at c1000000 + (above 16 MiB, certainly to preserve as much dma-able memory since modern + machines now have a lot more) + <tschwinge> Surely the GRUB multiboot loader is not that much used/tested? + <braunr> unfortunately, no + <braunr> matty3269: FYI, my kernel starts at 0xfff00000 :p + <matty3269> braunr: hmm, you could be right, I know it's arround there + someone + <matty3269> somewhere* + <matty3269> braunr: that's an interesting address :S + <matty3269> braunr: is that the PA address of the kernel or the VA inside a + process? + <braunr> the VA + <matty3269> hmm + <braunr> it can't be a PA + <braunr> such high addresses are normally device memory + <braunr> but don't worry, i have good reasons to have chosen this address + :) + <matty3269> so with gnumach, does the boot-up sequence use PIC until VM is + active and the kernel mapped to the linking address? + <braunr> no + <braunr> actually i'm not certain of the details + <braunr> but there is no PIC + <braunr> either special sections are linked at physical addresses + <braunr> or it relies on the fact that all executable code uses near jumps + <braunr> and uses offsets when accessing data + <braunr> (which is why the kernel text is at 3 GiB + 1 MiB, and not 3 GiB) + <matty3269> hmm, + <matty3269> gah, I need to learn x86 + <braunr> that would certainly help + <matty3269> I've just had a look at boothdr.S; I presume that there must be + something else that is executed before this to setup VM, switch to 32-bit + more etc...? + <matty3269> mode* + <braunr> have a look at the multiboot specification + <braunr> it sets protected mode + <braunr> but not paging + <braunr> (i mean, the boot loader does, before passing control to the + kernel) + <matty3269> Ah, I see + <tschwinge> matty3269: Multiboot should be documented in the GRUB package. + <matty3269> tschwinge: yep, got that, thanks + <matty3269> hmm, I can't find any reference to CR0 in gnumach so paging + must be enabled elsewhere + <matty3269> oh wait, found it + <braunr> $ git grep -i '\<cr0\>' + <braunr> i386/i386/proc_reg.h, linux/dev/include/asm-i386/system.h + <braunr> although i suspect only the first one is relevant to us :) + <matty3269> Yeah, that seems to have the setup code for paging :) + <matty3269> I'm still confused how it could run that without paging or PIC + though + <matty3269> I think I need to watch the boot sequence with qemu + <braunr> it's a bit tricky + <braunr> but actually simple + <braunr> 00:44 < braunr> either special sections are linked at physical + addresses + <braunr> 00:44 < braunr> or it relies on the fact that all executable code + uses near jumps + <braunr> that's really all there is + <braunr> but you shouldn't worry about that i suppose, as the protocol + between the boot loader and an arm kernel will certainly not be the saem + <braunr> same* + <matty3269> indeed, ARM is tricky because memory maps are vastly differnt + on every platform + + +## IRC, freenode, #hurd, 2012-11-21 + + <matty3269> Well, I have a ARM gnumach kernel compiled. It just doesn't + run! 
:) + <braunr> matty3269: good luck :) diff --git a/open_issues/code_analysis/discussion.mdwn b/open_issues/code_analysis/discussion.mdwn index f8a0657d..6f2acc08 100644 --- a/open_issues/code_analysis/discussion.mdwn +++ b/open_issues/code_analysis/discussion.mdwn @@ -42,3 +42,20 @@ License|/fdl]]."]]"""]] tool, please add it to open_issues/code_analysis.mdwn <antrik> (I guess we should have a "proper" page listing useful debugging tools...) + + +## IRC, freenode, #hurd, 2012-09-03 + + <mcsim> hello. Has anyone tried some memory debugging tools like duma or + dmalloc with hurd? + <braunr> mcsim: yes, but i couldn't + <braunr> i tried duma, and it crashes, probably because of cthreads :) + + +## IRC, freenode, #hurd, 2012-09-08 + + <mcsim> hello. What static analyzer would you suggest (probably you have + tried it for hurd already)? + <braunr> mcsim: if you find some good free static analyzer, let me know :) + <pinotree> a simple one is cppcheck + <mcsim> braunr: I'm choosing now between splint and adlint diff --git a/open_issues/console_tty1.mdwn b/open_issues/console_tty1.mdwn new file mode 100644 index 00000000..614c02c9 --- /dev/null +++ b/open_issues/console_tty1.mdwn @@ -0,0 +1,151 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + +Seen in context of [[libpthread]], but probably not directly related to it. + + +# IRC, freenode, #hurd, 2012-08-30 + + <gnu_srs> Do you also experience a frozen hurd console? + <braunr> yes + <braunr> i didn't check but i'm almost certain it's a bug in my branch + <braunr> the replacement of condition_implies was a bit hasty in some + places + <braunr> this is why i want to rework it separately + + +## IRC, freenode, #hurd, 2012-09-03 + + <gnu_srs> braunr: Did you find the cause of the Hurd console freeze for + your libpthread branch? + <braunr> gnu_srs: like i said, a bug + <braunr> probably in the replacement of condition_implies + <braunr> i rewrote that part in libpipe and it no works + <braunr> now* + + <braunr> gnu_srs: the packages have been updated + <braunr> and these apparently fix the hurd console issue correctly + +## IRC, freenode, #hurd, 2012-09-04 + + <braunr> gnu_srs: this hurd console problem isn't fixed + <braunr> it seems to be due to a race condition that only affects the first + console + <braunr> and by reading the code i can't see how it can even work oO + <gnu_srs> braunr: just rebooted, tty1 is still locked, tty2-6 works. And + the floppy error stays (maybe a kvm bug??) 
+ <braunr> the floppy error is probably a kvm bug as we discussed + <braunr> the tty1 locked isn't + <braunr> i have it too + <braunr> it seems to be a bug in the hurd console server + <braunr> which is started by tty1, but for some reason, doesn't return a + valid answer at init time + <braunr> if you kill the term handling tty1, you'll see your first tty + starts working + <braunr> for now i'll try a hack that starts the hurd console server before + the clients + <braunr> doesn't work eh + <braunr> tty1 is the only one started before runttys + <braunr> indeed, fixing /etc/hurd/runsystem.gnu so that it doesn't touch + tty1 fixes the problem + <gnu_srs> do you have an explanation? + <braunr> not really no + <braunr> but it will do for now + <pinotree> samuel added that with the comment above, apparently to + workaround some other issue of the hurd console + <braunr> i'm pretty sure the bug is already visible with cthreads + <braunr> the first console always seems weird compared to the others + <braunr> with a login: at the bottom of the screen + <braunr> didn't you notice ? + <pinotree> sometimes, but not often + <braunr> typical of a race + <pinotree> (at least for me) + <braunr> pthreads being slightly slower exposes it + <gnu_srs> confirmed, it works by commenting out touch /dev/tty1 + <gnu_srs> yes, the login is at the bottom of the screen, sometimes one in + the upper part too:-/ + <braunr> so we have a new open issue + <braunr> hm + <braunr> exiting the first tty doesn't work + <braunr> which makes me think of the issue we have with screen + <gnu_srs> confirmed! + <braunr> also, i don't understand why we have getty on tty1, but nothing on + the other terminals + <braunr> something is really wrong with terminals on hurd *sigh* + <braunr> ah, the problem looks like it happens when getty attempts to + handle a terminal ! + <braunr> gnu_srs: anyway, i don't think it should be blocking for the + conversion to pthreads + <braunr> but it would be better if someone could assign himself that bug + <braunr> :) + + +## IRC, freenode, #hurd, 2012-09-05 + + <antrik> braunr: the login at the bottom of the screen if from the Mach + console I believe + <braunr> antrik: well maybe, but it shouldn't be there anyway + <antrik> braunr: why not? + <antrik> it's confusing, but perfectly correct as far as I can tell + <braunr> antrik: two login: on the same screen ? + <braunr> antrik: it's even more confusing when comparing with other ttys + <antrik> I mean it's correct from a techincal point of view... I'm not + saying it's helpful for the user ;-) + <braunr> i'm not even sure it's correct + <braunr> i've double checked the pthreads patch and didn't see anything + wrong there + <antrik> perhaps the startup of the Hurd console could be delayed a bit to + make sure it happens after the Mach console login is done printing + stuff... + <braunr> why are our gettys stubs ? + <antrik> I never understood the point of a getty TBH... + <braunr> well you need to communicate to something behind your terminal, + don't you ? + <braunr> with* + <antrik> why not just launch the login program or login shell right away? + <braunr> what if you want something else than a login program ? + <antrik> like what? + <antrik> and how would a getty help with that? + <braunr> an ascii-art version of star wars + <braunr> it would be configured to start something else + <antrik> and why does that need a getty? why not just start something else + directly? 
+ <braunr> well getty is about the serial line parameters actually + <antrik> yeah, I had a vague understanding that it has something to do with + serial lines (or real TTY lines)... but we hardly need that on local + cosoles :-) + <antrik> consoles + <braunr> right + <braunr> but then why even bother with something like runttys + <antrik> well, something has to start the terminal servers?... + <antrik> I might be confused though + <braunr> what i don't understand is + <braunr> why is there no getty at startup, whereas they are spawned when + logging off ? + <antrik> they are? that's fascinating indeed ;-) + <braunr> does it behave like this on your old version ? + <antrik> I don't remember ever having seen a "getty" process on my Hurd + systems... + <braunr> can you log on e.g. tty2 and then log out, and see ? + <antrik> OTOH, I'm hardly ever using consoles... + <antrik> hm... I think that should be possible remotely using the console + client with ncurses driver? never tried that... + <braunr> ncurses driver ? + <braunr> hum i don't know, never tried either + <braunr> and it may add other bugs :p + <braunr> better wait to be close to the machine + <antrik> hehe + <antrik> well, it's a good excuse for trying the ncurses driver ;-) + <antrik> hrm + <antrik> alien:~# console -d ncursesw + <antrik> console: loading driver `ncursesw' failed: Gratuitous error + <antrik> I guess nobody tested that stuff in years diff --git a/open_issues/console_vs_xorg.mdwn b/open_issues/console_vs_xorg.mdwn new file mode 100644 index 00000000..ffefb389 --- /dev/null +++ b/open_issues/console_vs_xorg.mdwn @@ -0,0 +1,31 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_glibc open_issue_hurd]] + + +# IRC, freenode, #hurd, 2012-08-30 + + <gean> braunr: I have some errors about keyboard in the xorg log, but + keyboard is working on the X + <braunr> gean: paste the log somewhere please + <gean> braunr: http://justpaste.it/19jb + [...] + [1987693.272] Fatal server error: + [1987693.272] Cannot set event mode on keyboard (Inappropriate ioctl for device) + [...] + [1987693.292] FatalError re-entered, aborting + [1987693.302] can't reset keyboard mode (Inappropriate ioctl for device) + [...] + <braunr> hum + <braunr> it looks like the xorg keyboard driver evolved and now uses ioctls + our drivers don't implement + <braunr> thanks for the report, we'll have to work on this + <braunr> i'm not sure the problem is new actually diff --git a/open_issues/dde.mdwn b/open_issues/dde.mdwn index 8f00c950..5f6fcf6a 100644 --- a/open_issues/dde.mdwn +++ b/open_issues/dde.mdwn @@ -17,6 +17,9 @@ Still waiting for interface finalization and proper integration. [[!toc]] +See [[user-space_device_drivers]] for generic discussion related to user-space +device drivers. + # Disk Drivers @@ -25,24 +28,6 @@ Not yet supported. The plan is to use [[libstore_parted]] for accessing partitions. -## Booting - -A similar problem is described in -[[community/gsoc/project_ideas/unionfs_boot]], and needs to be implemented. 
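As a concrete reference for the log that follows: braunr's answer there
("make the boot loader load all the components you need in RAM, then give
each component ports") is what the native Hurd bootstrap already does for
the root filesystem and exec servers. A GRUB entry for that looks roughly
like the following (paths, kernel file names and options vary per system):

    multiboot /boot/gnumach.gz root=device:hd0s1
    module /hurd/ext2fs.static ext2fs --multiboot-command-line='${kernel-command-line}' --host-priv-port='${host-port}' --device-master-port='${device-port}' --exec-server-task='${exec-task}' -T typed '${root}' '$(task-create)' '$(task-resume)'
    module /lib/ld.so.1 exec /hurd/exec '$(exec-task=task-create)'

GNU Mach substitutes the `${...}` variables with port names at boot, which
is the "give each component something (ports) so they can communicate"
part; user-space disk drivers would have to join this set of pre-loaded
tasks.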
- - -### IRC, freenode, #hurd, 2012-07-17 - - <bddebian> OK, here is a stupid question I have always had. If you move - PCI and disk drivers in to userspace, how do do initial bootstrap to get - the system booting? - <braunr> that's hard - <braunr> basically you make the boot loader load all the components you - need in ram - <braunr> then you make it give each component something (ports) so they can - communicate - - # Upstream Status @@ -68,6 +53,33 @@ At the microkernel davroom at [[community/meetings/FOSDEM_2012]]: <antrik> (both from the Dresdem L4 group) +### IRC, freenode, #hurd, 2012-08-12 + + <antrik> + http://genode.org/documentation/release-notes/12.05#Re-approaching_the_Linux_device-driver_environment + <antrik> I wonder whether the very detailed explanation was prompted by our + DDE discussions at FOSDEM... + <pinotree> antrik: one could think about approaching them to develop the + common dde libs + dde_linux together + <antrik> pinotree: that's what I did at FOSDEM -- they weren't interested + <pinotree> antrik: this year's one? why weren't they? + <pinotree> maybe at that time dde was not integrated properly yet (netdde + is just few months "old") + <braunr> do you really consider it integrated properly ? + <pinotree> no, but a bit better than last year + <antrik> I don't see what our integration has to do with anything... + <antrik> they just prefer hacking thing ad-hoc than having some central + usptream + <pinotree> the helenos people? + <antrik> err... how did helenos come into the picture?... + <antrik> we are talking about genode + <pinotree> sorry, confused wrong microkernel OS + <antrik> actually, I don't remember exactly who said what; there were + people from genode there and from one or more other DDE projects... but + none of them seemed interested in a common DDE + <antrik> err... one or two other L4 projects + + ## IRC, freenode, #hurd, 2012-02-19 <youpi> antrik: do we know exactly which DDE version Zheng Da took as a @@ -91,6 +103,12 @@ At the microkernel davroom at [[community/meetings/FOSDEM_2012]]: apparently have both USB and SATA working with some variant of DDE +### IRC, freenode, #hurd, 2012-11-03 + + <mcsim> DrChaos: there is DDEUSB framework for L4. You could port it, if + you want. It uses Linux 2.6.26 usb subsystem. + + # IRC, OFTC, #debian-hurd, 2012-02-15 <pinotree> i have no idea how the dde system works @@ -457,6 +475,59 @@ At the microkernel davroom at [[community/meetings/FOSDEM_2012]]: <antrik> hm... good point +# IRC, freenode, #hurd, 2012-08-14 + + <braunr> it's amazing how much code just gets reimplemented needlessly ... + <braunr> libddekit has its own mutex, condition, semaphore etc.. objects + <braunr> with the *exact* same comment about the dequeueing-on-timeout + problem found in libpthread + <braunr> *sigh* + + + +# IRC, freenode, #hurd, 2012-08-18 + + <braunr> hum, leaks and potential deadlocks in libddekit/thread.c :/ + + +# IRC, freenode, #hurd, 2012-08-18 + + <braunr> nice, dde relies on a race to start .. + + +# IRC, freenode, #hurd, 2012-08-18 + + <braunr> hm looks like if netdde crashes, the kernel doesn't handle it + cleanly, and we can't attach another netdde instance + +[[!message-id "877gu8klq3.fsf@kepler.schwinge.homeip.net"]] + + +# IRC, freenode, #hurd, 2012-08-21 + +In context of [[libpthread]]. 
+ + <braunr> hm, i thought my pthreads patches introduced a deadlock, but + actually this one is present in the current upstream/debian code :/ + <braunr> (the deadlock occurs when receiving data fast with sftp) + <braunr> either in netdde or pfinet + + +# DDE for Filesystems + +## IRC, freenode, #hurd, 2012-10-07 + + * pinotree wonders whether the dde layer could aldo theorically support + also file systems + <antrik> pinotree: yeah, I also brought up the idea of creating a DDE + extension or DDE-like wrapper for Linux filesystems a while back... don't + know enough about it though to decide whether it's doable + <antrik> OTOH, I'm not sure it would be worthwhile. we still should + probably have a native (not GPLv2-only) implementation for the main FS at + least; so the wrapper would only be for accessing external + partitions/media... + + # virtio diff --git a/open_issues/exec_leak.mdwn b/open_issues/exec_leak.mdwn new file mode 100644 index 00000000..b58d2c81 --- /dev/null +++ b/open_issues/exec_leak.mdwn @@ -0,0 +1,57 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_hurd]] + + +# IRC, freenode, #hurd, 2012-08-11 + + <braunr> the exec servers seems to leak a lot + <braunr> server* + <braunr> exec now uses 109M on darnassus + <braunr> it really leaks a lot + <pinotree> only 109mb? few months ago, exec on exodar was taking more than + 200mb after few days of uptime with builds done + <braunr> i wonder how much it takes on the buildds + + +# IRC, freenode, #hurd, 2012-08-17 + + <braunr> the exec leak is tricky + <braunr> bddebian: btw, look at the TODO file in the hurd source code + <braunr> bddebian: there is a not from thomas bushnell about that + <braunr> "*** Handle dead name notifications on execserver ports. ! + <braunr> not sure it's still a todo item, but it might be worth checking + <bddebian> braunr: diskfs_execboot_class = ports_create_class (0, 0); + This is what would need to change right? It should call some cleanup + routine in the first argument? + <bddebian> Would be ideal if it could just use deadboot() from exec. + <braunr> bddebian: possible + <braunr> bddebian: hum execboot, i'm not so sure + <bddebian> Execboot is the exec task, no? + <braunr> i don't know what execboot is + <bddebian> It's from libdiskfs + <braunr> but "diskfs_execboot_class" looks like a class of ports used at + startup only + <braunr> ah + <braunr> then it's something run in the diskfs users ? + <bddebian> yes + <braunr> the leak is in exec + <braunr> if clients misbehave, it shouldn't affect that server + <bddebian> That's a different issue, this was about the TODO thing + <braunr> ah + <braunr> i don't know + <bddebian> Me either :) + <bddebian> For the leak I'm still focusing on do-bunzip2 but I am baffled + at my results.. + <braunr> ? 
+ <bddebian> Where my counters are zero if I always increment on different + vars but wild freaking numbers if I increment on malloc and decrement on + free diff --git a/open_issues/fork_deadlock.mdwn b/open_issues/fork_deadlock.mdwn index 6b90aa0a..c1fa9208 100644 --- a/open_issues/fork_deadlock.mdwn +++ b/open_issues/fork_deadlock.mdwn @@ -63,3 +63,34 @@ Another one in `dash`: stopped = 1 i = 6 [...] + + +# IRC, OFTC, #debian-hurd, 2012-11-24 + + <youpi> the lockups are about a SIGCHLD which gets lost + <pinotree> ah, ok + <youpi> which makes bash spin + <pinotree> is that happening more often recently, or it's just something i + just noticed? + <youpi> it's more often recently + <youpi> where "recently" means "some months ago" + <youpi> I didn't notice exactly when + <pinotree> i see + <youpi> it's at most since june, apparently + <youpi> (libtool managed to build without a fuss, while now it's a pain) + <youpi> (libtool building is a good test, it seems to be triggering quite + reliably) + + +## IRC, freenode, #hurd, 2012-11-27 + + <youpi> we also have the shell wait issue + <youpi> it's particularly bad on libtool calls + <youpi> the libtool package (with testsuite) is a good reproducer :) + <youpi> the symptom is shell scripts eating CPU + <youpi> busy-waiting for a SIGCHLD which never gets received + <braunr> that could be what i got + <braunr> + http://www.gnu.org/software/hurd/microkernel/mach/gnumach/memory_management.html + <braunr> last part + <youpi> perhaps watch has the same issue as the shell, yes diff --git a/open_issues/gcc/pie.mdwn b/open_issues/gcc/pie.mdwn new file mode 100644 index 00000000..a4598d1e --- /dev/null +++ b/open_issues/gcc/pie.mdwn @@ -0,0 +1,40 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!meta title="Position-Independent Executables"]] + +[[!tag open_issue_gcc]] + + +# IRC, freenode, #debian-hurd, 2012-11-08 + + <pinotree> tschwinge: i'm not totally sure, but it seems the pie options + for gcc/ld are causing issues + <pinotree> namely, producing executables that sigsegv straight away + <tschwinge> pinotree: OK, I do remember some issues about these, too. + <tschwinge> Also for -pg. + <tschwinge> These have in common that they use different crt*.o files for + linking. + <tschwinge> Might well be there's some bugs there. 
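Since the common factor is the PIE-specific startup objects (`Scrt1.o` and
friends instead of the regular `crt*.o`), a trivial program built both ways
isolates whether the startup path itself is at fault. pinotree's
`w3m`/`mktable` case below is the real-world reproducer; this toy test is
just an invented illustration:

    /* pie-test.c -- check whether a PIE binary reaches main() at all.
         gcc -O2 pie-test.c -o test-nopie
         gcc -O2 -fPIE -pie pie-test.c -o test-pie  */
    #include <stdio.h>

    int
    main (void)
    {
      /* Never printed if the PIE startup code faults first.  */
      puts ("reached main");
      return 0;
    }

If `test-pie` segfaults before the message appears while `test-nopie` runs,
the bug is in the PIE startup/linking path rather than in the application
code being compiled.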
+ <pinotree> one way is to try the w3m debian build: the current build + configuration enables also pie, which in turns makes an helper executable + (mktable) sigsegv when invoked + <pinotree> if «,-pie» is appended to the DEB_BUILD_MAINT_OPTIONS variable + in debian/rules, pie is not added and the resulting mktable runs + correctly + + +## IRC, OFTC, #debian-hurd, 2012-11-09 + + <pinotree> youpi: ah, as i noted to tschwinge earlier, it seems -fPIE -pie + miscompile stuff + <youpi> uh + <pinotree> this causes the w3m build failure and (indirectly, due to elinks + built with -pie) aptitude diff --git a/open_issues/glibc.mdwn b/open_issues/glibc.mdwn index e94a4f1f..3b4e5efa 100644 --- a/open_issues/glibc.mdwn +++ b/open_issues/glibc.mdwn @@ -81,6 +81,35 @@ Last reviewed up to the [[Git mirror's e80d6f94e19d17b91e3cd3ada7193cc88f621feb Might simply be a missing patch(es) from master. + * `--disable-multi-arch` + + IRC, freenode, #hurd, 2012-11-22 + + <pinotree> tschwinge: is your glibc build w/ or w/o multiarch? + <tschwinge> pinotree: See open_issues/glibc: --disable-multi-arch + <pinotree> ah, because you do cross-compilation? + <tschwinge> No, that's natively. + <tschwinge> There is also a not of what happened in cross-gnu when I + enabled multi-arch. + <tschwinge> No idea whether that's still relevant, though. + <pinotree> EPARSE + <tschwinge> s%not%note + <tschwinge> Better? + <pinotree> yes :) + <tschwinge> As for native builds: I guess I just didn't (want to) play + with it yet. + <pinotree> it is enabled in debian since quite some time, maybe other + i386/i686 patches (done for linux) help us too + <tschwinge> I though we first needed some CPU identification + infrastructe before it can really work? + <tschwinge> I thought [...]. + <pinotree> as in use the i686 variant as runtime automatically? i guess + so + <tschwinge> I thought I had some notes about that, but can't currently + find them. + <tschwinge> Ah, I probably have been thinking about open_issues/ifunc + and open_issues/libc_variant_selection. + * --build=X `long double` test: due to `cross_compiling = maybe` wants to execute a @@ -350,6 +379,24 @@ Last reviewed up to the [[Git mirror's e80d6f94e19d17b91e3cd3ada7193cc88f621feb <pinotree> like posix/tst-waitid.c, you mean? <youpi> yes + * `getconf` things + + IRC, freenode, #hurd, 2012-10-03 + + <pinotree> getconf -a | grep CACHE + <Tekk_> pinotree: I hate spoiling data, but 0 :P + <pinotree> had that feeling, but wanted to be sure -- thanks! + <Tekk_> http://dpaste.com/809519/ + <Tekk_> except for uhh + <Tekk_> L4 linesize + <Tekk_> that didn't have any number associated + <pinotree> weird + <Tekk_> I actually didn't even know that there was L4 cache + <pinotree> what do you get if you run `getconf + LEVEL4_CACHE_LINESIZE`? + <Tekk_> pinotree: undefined + <pinotree> expected, given the output above + For specific packages: * [[octave]] @@ -384,6 +431,270 @@ Last reviewed up to the [[Git mirror's e80d6f94e19d17b91e3cd3ada7193cc88f621feb * `sysdeps/unix/sysv/linux/syslog.c` + * `fsync` on a pipe + + IRC, freenode, #hurd, 2012-08-21: + + <braunr> pinotree: i think gnu_srs spotted a conformance problem in + glibc + <pinotree> (only one?) + <braunr> pinotree: namely, fsync on a pipe (which is actually a + socketpair) doesn't return EINVAL when the "operation not supported" + error is returned as a "bad request message ID" + <braunr> pinotree: what do you think of this case ? 
+ <pinotree> i'm far from an expert on such stuff, but seems a proper E* + should be returned + <braunr> (there also is a problem in clisp falling in an infinite loop + when trying to handle this, since it uses fsync inside the error + handling code, eww, but we don't care :p) + <braunr> basically, here is what clisp does + <braunr> if fsync fails, and the error isn't EINVAL, let's report the + error + <braunr> and reporting the error in turn writes something on the + output/error stream, which in turn calls fsync again + <pinotree> smart + <braunr> after the stack is exhausted, clisp happily crashes + <braunr> gnu_srs: i'll alter the clisp code a bit so it knows about our + mig specific error + <braunr> if that's the problem (which i strongly suspect), the solution + will be to add an error conversion for fsync so that it returns + EINVAL + <braunr> if pinotree is willing to do that, he'll be the only one + suffering from the dangers of sending stuff to the glibc maintainers + :p + <pinotree> that shouldn't be an issue i think, there are other glibc + hurd implementations that do such checks + <gnu_srs> does fsync return EINVAL for other OSes? + <braunr> EROFS, EINVAL + <braunr> fd is bound to a special file which does not + support synchronization. + <braunr> obviously, pipes and sockets don't + <pinotree> + http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html + <braunr> so yes, other OSes do just that + <pinotree> now that you speak about it, it could be the failure that + the gnulib fsync+fdatasync testcase have when being run with `make + check` (although not when running as ./test-foo) + <braunr> hm we may not need change glibc + <braunr> clisp has a part where it defines a macro IS_EINVAL which is + system specific + <braunr> (but we should change it in glibc for conformance anyway) + <braunr> #elif defined(UNIX_DARWIN) || defined(UNIX_FREEBSD) || + defined(UNIX_NETBSD) || defined(UNIX_OPENBSD) #define IS_EINVAL_EXTRA + ((errno==EOPNOTSUPP)||(errno==ENOTSUP)||(errno==ENODEV)) + <pinotree> i'd rather add nothing to clisp + <braunr> let's see what posix says + <braunr> EINVAL + <braunr> so right, we should simply convert it in glibc + <gnu_srs> man fsync mentions EINVAL + <braunr> man pages aren't posix, even if they are usually close + <gnu_srs> aha + <pinotree> i think checking for MIG_BAD_ID and EOPNOTSUPP (like other + parts do) will b enough + <pinotree> *be + <braunr> gnu_srs: there, it finished correctly even when piped + <gnu_srs> I saw that, congrats! + <braunr> clisp is quite tricky to debug + <braunr> i never had to deal with a program that installs break points + and handles segfaults itself in order to implement growing stacks :p + <braunr> i suppose most interpreters do that + <gnu_srs> So the permanent change will be in glibc, not clisp? 
+ <braunr> yes + + IRC, freenode, #hurd, 2012-08-24: + + <gnu_srs1> pinotree: The changes needed for fsync.c is at + http://paste.debian.net/185379/ if you want to try it out (confirmed + with rbraun) + <youpi> I agree with the patch, posix indeed documents einval as the + "proper" error value + <pinotree> there's fdatasync too + <pinotree> other places use MIG_BAD_ID instead of EMIG_BAD_ID + <braunr> pinotree: i assume that if you're telling us, it's because + they have different values + <pinotree> braunr: tbh i never seen the E version, and everywhere in + glibc the non-E version is used + <gnu_srs1> in sysdeps/mach/hurd/bits/errno.h only the E version is + defined + <pinotree> look in gnumach/include/mach/mig_errors.h + <pinotree> (as the comment in errno.h say) + <gnu_srs1> mig_errors.h yes. Which comment: from errors.h: /* Errors + from <mach/mig_errors.h>. */ and then the EMIG_ stuff? + <gnu_srs1> Which one is used when building libc? + <gnu_srs1> Answer: At least in fsync.c errno.h is used: #include + <errno.h> + <gnu_srs1> Yes, fdatasync.c should be patched too. + <gnu_srs1> pinotree: You are right: EMIG_ or MIG_ is confusing. + <gnu_srs1> /usr/include/i386-gnu/bits/errno.h: /* Errors from + <mach/mig_errors.h>. */ + <gnu_srs1> /usr/include/hurd.h:#include <mach/mig_errors.h> + + IRC, freenode, #hurd, 2012-09-02: + + <antrik> braunr: regarding fsync(), I agree that EOPNOTSUPP probably + should be translated to EINVAL, if that's what POSIX says. it does + *not* sound right to translate MIG_BAD_ID though. the server should + explicitly return EOPNOTSUPP, and that's what the default trivfs stub + does. if you actually do see MIG_BAD_ID, there must be some other + bug... + <braunr> antrik: right, pflocal doesn't call the trivfs stub for socket + objects + <braunr> trivfs_demuxer is only called by the pflocal node demuxer, for + socket objects it's another call, and i don't think it's the right + thing to call trivfs_demuxer there either + <pinotree> handling MAG_BAD_ID isn't a bad idea anyway, you never know + what the underlying server actually implements + <pinotree> (imho) + <braunr> for me, a bad id is the same as a not supported operation + <pinotree> ditto + <pinotree> from fsync's POV, both the results are the same anyway, ie + that the server does not support a file_sync operation + <antrik> no, a bad ID means the server doesn't implement the protocol + (or not properly at least) + <antrik> it's usually a bug IMHO + <antrik> there is a reason we have EOPNOTSUPP for operations that are + part of a protocol but not implemented by a particular server + <pinotree> antrik: even if it could be the case, there's no reason to + make fsync fail anyway + <antrik> pinotree: I think there is. it indicates a bug, which should + not be hidden + <pinotree> well, patches welcome then... + <antrik> thing is, if sock objects are actually not supposed to + implement the file interface, glibc shouldn't even *try* to call + fsync on them + <pinotree> how? + <pinotree> i mean, can you check whether the file interface is not + implemented, without doing a roundtrip^ + <pinotree> ? + <antrik> well, the sock objects are not files, i.e. they were *not* + obtained by file_name_lookup(), but rather a specific call. so glibc + actually *knows* that they are not files. 
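The conformance point is easy to demonstrate from user space: a pipe on the
Hurd is backed by a `pflocal` socketpair, so before a fix along the lines
discussed above, `fsync` on it surfaced the raw MiG error rather than the
`EINVAL` that POSIX requires (and that Linux returns). A small invented
test program:

    /* fsync-pipe.c -- what does fsync(2) report on a pipe?
       POSIX: it shall fail with EINVAL for files that do not support
       synchronization.  */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main (void)
    {
      int fds[2];

      if (pipe (fds) != 0)
        return 1;
      if (fsync (fds[0]) == -1)
        printf ("fsync: %s (errno %d, EINVAL is %d)\n",
                strerror (errno), errno, EINVAL);
      return 0;
    }

Whether `MIG_BAD_ID` should be translated as well, or treated as a protocol
bug in the server, is what the discussion below turns to.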
+ <braunr> antrik: this way of thinking means we need an "fd" protocol + <braunr> so that objects accessed through a file descriptor implement + all fd calls + <antrik> now I wonder though whether there are conceivable use cases + where it would make sense for objects obtained through the socket + call to optionally implement the file interface... + <antrik> which could actually make sense, if libc lets through other + file calls as well (which I guess it does, if the sock ports are + wrapped in normal fd structures?) + <braunr> antrik: they are + <braunr> and i'd personally be in favor of such an fd protocol, even if + it means implementing stubs for many useless calls + <braunr> but the way things are now suggest a bad id really means an + operation is simply not supported + <antrik> the question in this case is whether we should make the file + protocol mandatory for anything that can end up in an FD; or whether + we should keep it optional, and add the MIG_BAD_ID calls to *all* FD + operations + <antrik> (there is no reason for fsync to be special in this regard) + <braunr> yes + <antrik> braunr: BTW, I'm rather undecided whether the right approach + is a) requiring an FD interface collection, b) always checking + MIG_BAD_ID, or perhaps c) think about introducing a mechanism to + explicitly query supported interfaces... + + IRC, freenode, #hurd, 2012-09-03: + + <braunr> antrik: querying interfaces sounds like an additional penalty + on performance + <antrik> braunr: the query usually has to be done only once. in fact it + could be integrated into the name lookup... + <braunr> antrik: once for every object + <braunr> antrik: yes, along with the lookup would be a nice thing + + [[!message-id "1351231423.8019.19.camel@hp.my.own.domain"]]. + + * `t/no-hp-timing` + + IRC, freenode, #hurd, 2012-11-16 + + <pinotree> tschwinge: wrt the glibc topgit branch t/no-hp-timing, + couldn't that file be just replaced by #include + <sysdeps/generic/hp-timing.h>? + + * `flockfile`/`ftrylockfile`/`funlockfile` + + IRC, freenode, #hurd, 2012-11-16 + + <pinotree> youpi: uhm, in glibc we use + stdio-common/f{,try,un}lockfile.c, which do nothing (as opposed to eg + the nptl versions, which do lock/trylock/unlock); do you know more + about them? + <youpi> pinotree: ouch + <youpi> no, I don't know + <youpi> well, I do know what they're supposed to do + <pinotree> i'm trying fillig them, let's see + <youpi> but not why we don't have them + <youpi> (except that libpthread is "recent") + <youpi> yet another reason to build libpthread in glibc, btw + <youpi> oh, but we do provide lockfile in libpthread, don't we ? + <youpi> pinotree: yes, and libc has weak variants, so the libpthread + will take over + <pinotree> youpi: sure, but that in stuff linking to pthreads + <pinotree> if you do a simple application doing eg main() { fopen + + fwrite + fclose }, you get no locking + <youpi> so? + <youpi> if you don't have threads, you don't need locks :) + <pinotree> ... unless there is some indirect recursion + <youpi> ? + <pinotree> basically, i was debugging why glibc tests with mtrace() and + ending with muntrace() would die (while tests without muntrace call + wouldn't) + <youpi> well, I still don't see what a lock will bring + <pinotree> if you look at the muntrace implementation (in + malloc/mtrace.c), basically fclose can trigger a malloc hook (because + of the free for the FILE*) + <youpi> either you have threads, and it's need, or you don't, and it's + a nop + <youpi> yes, and ? 
+ <braunr> does the signal thread count ? + <youpi> again, in linux, when you don't have threads, the lock is a nop + <youpi> does the signal thread use IO ? + <braunr> that's the question :) + <braunr> i hope not + <youpi> IIRC the signal thread just manages signals, and doesn't + execute the handler itself + <braunr> sure + <braunr> i was more thinking about debug stuff + <youpi> can't hurt to add them anyway, but let me still doubt that it'd + fix muntrace, I don't see why it would, unless you have threads + <pinotree> that's what i'm going next + <pinotree> pardon, it seems i got confused a bit + <pinotree> it'd look like a genuine muntrace bug (muntrace → fclose → + free hook → lock lock → fprint (since the FILE is still set) → malloc + → malloc hook → lock lock → spin) + <pinotree> at least i got some light over the flockfile stuff, thanks + ;) + <pinotree> youpi: otoh, __libc_lock_lock (etc) are noop in the base + implementation, while doing real locks on hurd in any case, and on + linux only if nptl is loaded, it seems + <pinotree> that would explain why on linux you get no deadlock + <youpi> unless using nptl, that is? + <pinotree> hm no, even with pthread it works + <pinotree> but hey, at least the affected glibc test now passes + <pinotree> will maybe try to do investigation on why it works on linux + tomorrow + + [[!message-id "201211172058.21035.toscano.pino@tiscali.it"]]. + + * `t/pagesize` + + IRC, freenode, #hurd, 2012-11-16 + + <pinotree> tschwinge: somehow related to your t/pagesize branch: due to + the fact that EXEC_PAGESIZE is not defined on hurd, libio/libioP.h + switches the allocation modes from mmap to malloc + + * `LD_DEBUG` + + IRC, freenode, #hurd, 2012-11-22 + + <pinotree> woot, `LD_DEBUG=libs /bin/ls >/dev/null` prints stuff and + then sigsegv + <tschwinge> Yeah, that's known for years... :-D + <tschwinge> Probably not too difficult to resolve, though. + * Verify baseline changes, if we need any follow-up changes: * a11ec63713ea3903c482dc907a108be404191a02 @@ -559,6 +870,11 @@ Last reviewed up to the [[Git mirror's e80d6f94e19d17b91e3cd3ada7193cc88f621feb * *baseline* * [high] `sendmmsg` usage, c030f70c8796c7743c3aa97d6beff3bd5b8dcd5d -- need a `ENOSYS` stub. + * ea4d37b3169908615b7c17c9c506c6a6c16b3a26 -- IRC, freenode, #hurd, + 2012-11-20, pinotree: »tschwinge: i agree on your comments on + ea4d37b3169908615b7c17c9c506c6a6c16b3a26, especially since mach's + sleep.c is buggy (not considers interruption, extra time() (= RPC) + call)«. # Build diff --git a/open_issues/gnumach_page_cache_policy.mdwn b/open_issues/gnumach_page_cache_policy.mdwn index 375e153b..d128c668 100644 --- a/open_issues/gnumach_page_cache_policy.mdwn +++ b/open_issues/gnumach_page_cache_policy.mdwn @@ -771,3 +771,15 @@ License|/fdl]]."]]"""]] ## IRC, freenode, #hurd, 2012-07-26 <braunr> hm i killed darnassus, probably the page cache patch again + + +## IRC, freenode, #hurd, 2012-09-19 + + <youpi> I was wondering about the page cache information structure + <youpi> I guess the idea is that if we need to add a field, we'll just + define another RPC? 
+ <youpi> braunr: ↑ + <braunr> i've done that already, yes + <braunr> youpi: have a look at the rbraun/page_cache gnumach branch + <youpi> that's what I was referring to + <braunr> ok diff --git a/open_issues/gnumach_vm_map_entry_forward_merging.mdwn b/open_issues/gnumach_vm_map_entry_forward_merging.mdwn index 90137766..7739f4d1 100644 --- a/open_issues/gnumach_vm_map_entry_forward_merging.mdwn +++ b/open_issues/gnumach_vm_map_entry_forward_merging.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -181,6 +181,8 @@ License|/fdl]]."]]"""]] <braunr> from what i could see, part of the problem still exists in freebsd <braunr> for the same reasons (shadow objects being one of them) +[[mach_shadow_objects]]. + # GCC build time using bash vs. dash diff --git a/open_issues/gnumach_vm_map_red-black_trees.mdwn b/open_issues/gnumach_vm_map_red-black_trees.mdwn index 7a54914f..53ff66c5 100644 --- a/open_issues/gnumach_vm_map_red-black_trees.mdwn +++ b/open_issues/gnumach_vm_map_red-black_trees.mdwn @@ -198,3 +198,149 @@ License|/fdl]]."]]"""]] get all that crap <braunr> that's very good <braunr> more test cases to fix the vm + + +### IRC, freenode, #hurd, 2012-11-01 + + <youpi> braunr: Assertion `diff != 0' failed in file "vm/vm_map.c", line + 1002 + <youpi> that's in rbtree_insert + <braunr> youpi: the problem isn't the tree, it's the map entries + <braunr> some must overlap + <braunr> if you can inspect that, it would be helpful + <youpi> I have a kdb there + <youpi> it's within a port_name_to_task system call + <braunr> this assertion basically means there already is an item in the + tree where the new item is supposed to be inserted + <youpi> this port_name_to_task presence in the stack is odd + <braunr> it's in vm_map_enter + <youpi> there's a vm_map just after that (and the assembly trap code + before) + <youpi> I know + <youpi> I'm wondering about the caller + <braunr> do you have a way to inspect the inserted map entry ? + <youpi> I'm actually wondering whether I have the right kernel in gdb + <braunr> oh + <youpi> better + <youpi> with the right kernel :) + <youpi> 0x80039acf (syscall_vm_map) + (target_map=d48b6640,address=d3b63f90,size=0,mask=0,anywhere=1) + <youpi> size == 0 seems odd to me + <youpi> (same parameters for vm_map) + <braunr> right + <braunr> my code does assume an entry has a non null size + <braunr> (in the entry comparison function) + <braunr> EINVAL (since Linux 2.6.12) length was 0. + <braunr> that's a quick glance at mmap(2) + <braunr> might help track bugs from userspace (e.g. in exec .. :)) + <braunr> posix says the saem + <braunr> same* + <braunr> the gnumach manual isn't that precise + <youpi> I don't seem to manage to read the entry + <youpi> but I guess size==0 is the problem anyway + <mcsim> youpi, braunr: Is there another kernel fault? Was that in my + kernel? + <braunr> no that's another problem + <braunr> which became apparent following the addition of red black trees in + the vm_map code + <braunr> (but which was probably present long before) + <mcsim> braunr: BTW, do you know if there where some specific circumstances + that led to memory exhaustion in my code? Or it just aggregated over + time? 
+ <braunr> mcsim: i don't know + <mcsim> s/where/were + <mcsim> braunr: ok + + +### IRC, freenode, #hurd, 2012-11-05 + + <tschwinge> braunr: I have now also hit the diff != 0 assertion error; + sitting in KDB, waiting for your commands. + <braunr> tschwinge: can you check the backtrace, have a look at the system + call and its parameters like youpi did ? + <tschwinge> If I manage to figure out how to do that... :-) + * tschwinge goes read scrollback. + <braunr> "trace" i suppose + <braunr> if running inside qemu, you can use the integrated gdb server + <tschwinge> braunr: No, hardware. And work intervened. And mobile phone + <-> laptop via bluetooth didn't work. But now: + <tschwinge> Pretty similar to Samuel's: + <tschwinge> Assert([...]) + <tschwinge> vm_map_enter(0xc11de6c8, 0xc1785f94, 0, 0, 1) + <tschwinge> vm_map(0xc11de6c8, 0xc1785f94, 0, 0, 1) + <tschwinge> syscall_vm_map(1, 0x1024a88, 0, 0, 1) + <tschwinge> mach_call_call(1, 0x1024a88, 0, 0, 1) + <braunr> thanks + <braunr> same as youpi observed, the requested size for the mapping is 0 + <braunr> tschwinge: thanks + <tschwinge> braunr: Anything else you'd like to see before I reboot? + <braunr> tschwinge: no, that's enough for now, and the other kind of info + i'd like are much more difficult to obtain + <braunr> if we still have the problem once a small patch to prevent null + size is applied, then it'll be worth looking more into it + <pinotree> isn't it possible to find out who called with that size? + <braunr> not easy, no + <braunr> it's also likely that the call that fails isn't the first one + <pinotree> ah sure + <pinotree> braunr: making mmap reject 0 size length could help? posix says + such size should be rejected straight away + <braunr> 17:09 < braunr> if we still have the problem once a small patch to + prevent null size is applied, then it'll be worth looking more into it + <braunr> that's the idea + <braunr> making faulty processes choke on it should work fine :) + <pinotree> «If len is zero, mmap() shall fail and no mapping shall be + established.» + <pinotree> braunr: should i cook up such patch for mmap? + <braunr> no, the change must be applied in gnumach + <pinotree> sure, but that could simply such condition in mmap (ie avoiding + to call io_map on a file) + <braunr> such calls are erroneous and rare, i don't see the need + <pinotree> ok + <braunr> i bet it comes from the exec server anyway :p + <tschwinge> braunr: Is the mmap with size 0 already a reproducible testcase + you can use for the diff != 0 assertion? + <tschwinge> Otherwise I'd have a reproducer now. + <braunr> tschwinge: i'm not sure but probably yes + <tschwinge> braunr: Otherwise, take GDB sources, then: gcc -fsplit-stack + gdb/testsuite/gdb.base/morestack.c && ./a.out + <tschwinge> I have not looked what exactly this does; I think -fsplit-stack + is not really implemented for us (needs something in libgcc we might not + have), is on my GCC TODO list already. + <braunr> tschwinge: interesting too :) + + +### IRC, freenode, #hurd, 2012-11-19 + + <tschwinge> braunr: Hmm, I have now hit the diff != 0 GNU Mach assertion + failure during some GCC invocation (GCC testsuite) that does not relate + to -fsplit-stack (as the others before always have). 
+ <tschwinge> Reproduced: + /media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/xgcc + -B/media/erich/home/thomas/tmp/gcc/hurd/master.build/gcc/ + /home/thomas/tmp/gcc/hurd/master/gcc/testsuite/gcc.dg/torture/pr42878-1.c + -fno-diagnostics-show-caret -O2 -flto -fuse-linker-plugin + -fno-fat-lto-objects -fcompare-debug -S -o pr42878-1.s + <tschwinge> Will check whether it's the same backtrace in GNU Mach. + <tschwinge> Yes, same. + <braunr> tschwinge: as youpi seems quite busy these days, i'll cook a patch + and commit it directly + <tschwinge> braunr: Thanks! I have, by the way, confirmed that the + following is enough to trigger the issue: vm_map(mach_task_self(), 0, 0, + 0, 1, 0, 0, 0, 0, 0, 0); + <tschwinge> ... and before the allocator patch, GNU Mach did accept that + and return 0 -- though I did not check what effect it actually has. (And + I don't think it has any useful one.) I'm also reading that as of lately + (Linux 2.6.12), mmap (length = 0) is to return EINVAL, which I think is + the foremost user of vm_map. + <pinotree> tschwinge: posix too says to return EINVAL for length = 0 + <braunr> yes, we checked that earlier with youpi + +[[!message-id "87sj8522zx.fsf@kepler.schwinge.homeip.net"]]. + + <braunr> tschwinge: well, actually your patch is what i had in mind + (although i'd like one in vm_map_enter to catch wrong kernel requests + too) + <braunr> tschwinge: i'll work on it tonight, and do some testing to make + sure we don't regress critical stuff (exec is another major direct user + of vm_map iirc) + <tschwinge> braunr: Oh, OK. :-) diff --git a/open_issues/implementing_hurd_on_top_of_another_system.mdwn b/open_issues/implementing_hurd_on_top_of_another_system.mdwn index 95b71ebb..220c69cc 100644 --- a/open_issues/implementing_hurd_on_top_of_another_system.mdwn +++ b/open_issues/implementing_hurd_on_top_of_another_system.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2010, 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2010, 2011, 2012 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -15,9 +16,12 @@ One obvious variant is [[emulation]] (using [[hurd/running/QEMU]], for example), but doing that does not really integratable the Hurd guest into the host system. There is also a more direct way, more powerful, but it also has certain -requirements to do it effectively: +requirements to do it effectively. -IRC, #hurd, August / September 2010 +See also [[Mach_on_top_of_POSIX]]. + + +# IRC, freenode, #hurd, August / September 2010 <marcusb> silver_hook: the Hurd can also refer to the interfaces of the filesystems etc, and a lot of that is really just server/client APIs that @@ -56,7 +60,7 @@ IRC, #hurd, August / September 2010 <marcusb> ArneBab: in fact, John Tobey did this a couple of years ago, or started it -([[tschwinge]] has tarballs of John's work.) +[[Mach_on_top_of_POSIX]]. <marcusb> ArneBab: or you can just implement parts of it and relay to Linux for the rest @@ -64,11 +68,10 @@ IRC, #hurd, August / September 2010 are sufficiently happy with the translator stuff, it's not hard to bring the Hurd to Linux or BSD -Continue reading about the [[benefits of a native Hurd implementation]]. +Continue reading about the [[benefits_of_a_native_Hurd_implementation]]. 
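The 2011-02-11 log below outlines neal's variant: implement Mach IPC as a
Linux kernel module and emulate the rest in user space. Purely to
illustrate the shape such a module could take (every name here is invented;
no such module exists), a stub might start out as:

    /* mach_emu.c -- hypothetical skeleton of a Linux module exposing a
       character device through which a user-space libmach could issue
       mach_msg()-style requests via ioctl.  */
    #include <linux/module.h>
    #include <linux/miscdevice.h>
    #include <linux/fs.h>

    #define MACH_IOC_MSG 0x4d00  /* invented ioctl number */

    static long
    mach_emu_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
    {
        if (cmd != MACH_IOC_MSG)
            return -ENOTTY;
        /* A real implementation would copy in a mach_msg_header_t from
           arg, look up the destination port in the calling task's IPC
           space, and enqueue the message or block the receiver.  */
        return -ENOSYS;
    }

    static const struct file_operations mach_emu_fops = {
        .owner = THIS_MODULE,
        .unlocked_ioctl = mach_emu_ioctl,
    };

    static struct miscdevice mach_emu_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name = "mach",
        .fops = &mach_emu_fops,
    };

    module_misc_device(mach_emu_dev);
    MODULE_LICENSE("GPL");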
---- -IRC, #hurd, 2010-12-28 +# IRC, freenode, #hurd, 2010-12-28 <antrik> kilobug: there is no real requirement for the Hurd to run on a microkernel... as long as the important mechanisms are provided (most @@ -79,9 +82,8 @@ IRC, #hurd, 2010-12-28 Hurd on top of a monolithic kernel would actually be a useful approach for the time being... ---- -IRC, #hurd, 2011-02-11 +# IRC, freenode, #hurd, 2011-02-11 <neal> marcus and I were discussing how to add Mach to Linux <neal> one could write a module to implement Mach IPC @@ -115,3 +117,303 @@ IRC, #hurd, 2011-02-11 <neal> I'm unlikely to work on it, sorry <antrik> didn't really expect that :-) <antrik> would be nice though if you could write up your conclusions... + + +# IRC, freenode, #hurd, 2012-10-12 + + <peo-xaci> do hurd system libraries make raw system calls ever + (i.e. inlined syscall() / raw assembly)? + <braunr> sure + <peo-xaci> hmm, so a hurd emulation layer would need to use ptrace if it + should be fool proof? :/ + <braunr> there is no real need for raw assembly, and the very syscalls are + all available through macros + <braunr> hum what are you trying to say ? + <peo-xaci> well, if they are done through syscall, as a function, not a + macro, then they can be intercepted with LD_PRELOAD + <peo-xaci> so applications that do Hurd (Mach?) syscalls could work on + f.e. Linux, if a special libc is injected into the program with + LD_PRELOAD + <peo-xaci> same thing with making standard Linux-applications go through + the Hurd emulation layer + <peo-xaci> without recompilation + <mel-_> peo-xaci: the second direction is implemented in glibc. + <mel-_> for the other direction, I personally see little use for it + <braunr> peo-xaci: ok i misunderstood + <braunr> peo-xaci: i don't think there is any truely direct syscall usage + in the hurd + <peo-xaci> hmm, I'm not sure I understand what directions you are referring + to mel-_ + <braunr> peo-xaci: what are you trying to achieve ? + <peo-xaci> I want to make the Hurd design more accessible by letting Hurd + application run on the Linux kernel, preferably without + recompilation. This would be done with a daemon that implements Mach and + which all syscalls would go to. + <peo-xaci> then, I also want so that standard Linux applications can go + through that Mach daemon as well, if a special libc is preloaded + <braunr> you might want to discuss this with antrik + <peo-xaci> what I'm trying to figure out specifically is if there is some + library/interface that glue Hurd with Mach and would be better suited to + emulate than Mach? Mach seems to be more of an implementation detail to + the hurd and not something an application would directly use. + <braunr> yes, the various hurd libraries (libports and libpager mostly) + <peo-xaci> From [http://www.gnu.org/software/hurd/hurd/libports.html]: + "libports is not (at least, not for now) a generalization / abstraction + of Mach ports to the functionality the Hurd needs, that is, it is not + meant to provide an interface independently of the underlying + microkernel." + <peo-xaci> Is this still true? + <peo-xaci> Does libpager abstract the rest? 
+ <peo-xaci> (and the other hurd libraries) + <braunr> there is nothing that really abstracts the hurd from mach + <braunr> for example, reference counting often happens here and there + <braunr> and core libraries like glibc and libpthread heavily rely on it + (through sysdeps specific code though) + <braunr> libports and libpager are meant to simplify object manipulation + for the former, and pager operations for the latter + <peo-xaci> and applications, such as translators, often use Mach interfaces + directly? + <peo-xaci> correct? + <braunr> depends on what often means + <braunr> let's say they do + <peo-xaci> :/ then it probably is better to emulate Mach after all + <braunr> there was a mach on posix port a long time ago + <peo-xaci> I thought applications were completely separated from the + microkernel in use by the Hurd + <braunr> that level of abstraction is pretty new + <braunr> genode is the only system i know which does that + +[[microkernel/Genode]]. + + <braunr> and it's still for "l4 variants" + <pinotree> ah, thanks (i forgot that name) + <antrik> braunr: Genode also runs on Linux and a few other non-L4 + environments IIRC + <antrik> peo-xaci: I'm not sure binary emulation is really useful. rather, + I'd recompile stuff as "regular" Linux executables, only using a special + glibc + <antrik> where the special glibc could be basically a port of the Hurd + glibc communicating with the Mach emulation instead of real Mach; or it + could do emulation at a higher level + <antrik> a higher level emulation would be more complicated to implement, + but more efficient, and allow better integration with the ordinary + GNU/Linux environment + <antrik> also note that any regular program could be recompiled against the + HELL glibc to run in the Hurdish environment... + <antrik> (well, glibc + hurd server libraries) + <peo-xaci> I'm willing to accept that Hurd-application would need to be + recompiled to work on the HELL + <peo-xaci> but not Linux-applications :) + <antrik> peo-xaci: if you happen to understand German, there is a fairly + good overview in my thesis report ;-) + <antrik> peo-xaci: there are no "Hurd applications" or "Linux applications" + <peo-xaci> well, let me define what I mean by the terms: Hurd applications + use Hurd-specific interfaces/syscalls, and Linux applications use + Linux-specific interfaces/syscalls + <antrik> a few programs use Linux-specific interfaces (and we probably + can't run them in HELL just as we can't run them on actual Hurd); but all + other programs work in any glibc environment + <antrik> (usually in any POSIX environment in fact...) + <antrik> peo-xaci: no sane application uses syscalls + <peo-xaci> they do under the hood + <peo-xaci> I have read about inlined syscalls + <antrik> again, there are *some* applications using Linux-specific + interfaces (sometimes because they are inherently bound to Linux + features, sometimes unnecessarily) + <antrik> so far there are no applications using Hurd-specific interfaces + <peo-xaci> translators do? + <peo-xaci> they are standard executables are they not? + <peo-xaci> I would like so that translators also can be run in the HELL + <antrik> I wouldn't consider them applications. 
all existing translators + are pretty much components of the Hurd itself + <peo-xaci> okay, it's a question about semantics, perhaps I should use + another word than "applications" :) + <peo-xaci> for me, applications are what have a main-function, or similar + single entry point + <braunr> hum + <braunr> that's not a good enough definition + <antrik> anyways, as I said, I think recompiling translators against a + Hurdish glibc and ported translator libraries seems the most reasonable + approach to me + <braunr> let's say applications are userspace processes that make use of + services provided by the operating system + <braunr> translators being part of the operating system here + <antrik> braunr: do you know whether the Mach-on-POSIX was actually + functional, or just an abandoned experiment?... + <antrik> (I don't remember hearing of it before...) + <braunr> incomplete iirc + <peo-xaci> braunr: still, when I've explained what I meant, even if I used + the wrong term, then my previous statements should come in another light + <peo-xaci> antrik / braunr: are you still interested in hearing my + thoughts/ideas about HELL? + <antrik> oh, there is more to come? ;-) + <peo-xaci> yes! I don't think I have made myself completely understood :/ + <peo-xaci> what I envision is a HELL system that works on as low level as + feasible, to make it possible to do almost anything that can be done on + the real Hurd (except possibly testing hardware drivers and such very low + level stuff). + <braunr> sure + <peo-xaci> I want it to be more than just allowing programs to access a + virtual filesystem à la FUSE. My idea is that all user space system + libraries/programs of the Hurd should be inside the HELL as well, and + they should not be emulated. + <peo-xaci> The system should at the very least be API compatible, so at the + very most a recompilation is necessary. + <peo-xaci> I also want so that GNU/Linux-programs can access the features + of the HELL with little effort on the user. At most perhaps a script that + wraps LD_PRELOADing has to be run on the binary. Best would be if it + could work also with insane assembly programs using raw system calls, or + if glibc happens to have some well hidden syscall being inlined to raw + assembly code. + <peo-xaci> And I think I have an idea on how an implementation could + satisfy these things! + <peo-xaci> By modifying the kernel and replace those syscalls that make + sense for the Hurd/Mach + <peo-xaci> with "the kernel", I meant Linux + <braunr> it's possible but tedious and not very useful so better do that + later + <braunr> mach did something similar at its time + <braunr> there was a syscall emulation library + <peo-xaci> but isn't it about as much work as emulating the interface on + user-level? 
+ <braunr> and the kernel cooperated so that unmodified unix binaries + performing syscalls would actually jump to functions provided by that + library, which generally made an RPC + <peo-xaci> instead of a bunch of extern-declerations, one would put the + symbols in the syscall table + <braunr> define what "those syscalls that make sense for the Hurd/Mach" + actually means + <peo-xaci> open/close, for example + <braunr> otherwise i don't see another better way than what the old mach + folks did + <braunr> well, with that old, but existing support, your open would perform + a syscall + <braunr> the kernel would catch it and redirect the caller to its syscall + emulation library + <braunr> which would call the open RPC instead + <peo-xaci> wait, so this "existing support" you're talking about; is this a + module for the Linux kernel (or a fork, or something else)? + <peo-xaci> where can I find it? + <braunr> no + <braunr> it was for mach + <braunr> in order to run unmodified unix binaries + <braunr> the opposite of what you're trying to do + <peo-xaci> ah okay + <braunr> well + <braunr> not really either :) + <peo-xaci> does posix/unix define a standard for how a syscall table should + look like, to allow binary syscall compatibility? + <braunr> absolutely not + <peo-xaci> so how could this mach module run any unmodified unix binary? if + they expected different sys calls at different offsets? + <braunr> posix specifically (and very early) states that it almost forbids + itself to deal with anything regarding to ABIs + <braunr> depends + <braunr> since it was old, there weren't that many unix systems + <braunr> and even today, there are techniques like those used by netbsd + (and many other actually) + <braunr> that are able to inspect the binary and load a syscall emulation + environment depending on its exposed ABI + <braunr> e.g. file on an executable states which system it's for + <peo-xaci> hmm, I'm not sure how a kernel would implement that in + practice.. I thought these things were so hard coded and dependent on raw + memory reads that it would not be possible + <braunr> but i really think it's not worth the time for your project + <peo-xaci> to be honest I have virtually no experience of practical kernel + programming + <braunr> with an LDT on x86 for example + <braunr> no, there is really not that much hardcoded + <braunr> quite the contrary + <braunr> there is a lot of runtime detection today + <peo-xaci> well I mean how the syscall table is read + <braunr> it's not read + <peo-xaci> it's read to find the function pointer to the syscall handler in + the kernel? + <braunr> no + <braunr> that's the really basic approach + <braunr> (and in practice it can happen of course) + <braunr> what really happens is that, for example, on linux, the user space + system call code is loaded as a virtual shared library + <braunr> use ldd on an executable to see it + <braunr> this virtual object provides code that, depending on what the + kernel has detected, will use the appropriate method to perform a system + call + <peo-xaci> but this user space system calls need to make some kind of cpu + interupt to communicate with the kernel, right? + <braunr> the glibc itself has no idea how a system call will look like in + the end + <braunr> yes + <peo-xaci> an assembler programmer would be able to get around this glue + code? 
+ <braunr> that's precisely what is embedded in this virtual library + <braunr> it could yes + <braunr> i think even when sysenter/sysexit is supported, legacy traps are + still implemented to support old binaries + <braunr> but then all these different entry points will lead to the same + code inside the kernel + <peo-xaci> but when the glue code is used, then its API compatible, and + then I can understand that the kernel can allow different syscall + implementations for different executables + <braunr> what glue code ? + <peo-xaci> what you talked about above "the user space system call code is + loaded as a virtual shared library" + <braunr> let's call it vdso + <braunr> i have to leave in a few minutes + <braunr> keep going, i'll read later + <peo-xaci> thanks, I looked it up on Wikipedia and understand immediately + :P + <peo-xaci> so VDSOs are provided by the kernel, not a regular library file, + right? + <vdox2> What does HELL stand for :) ? + <dardevelin> vdox2, Hurd Emulation Layer for Linux + <vdox2> dardevelin: thanks + <braunr> peo-xaci: yes + <antrik> peo-xaci: I believe your goals are conflicting. a low-level + implementation makes it basically impossible to interact between the HELL + environment and the GNU/Linux environment in any meaningful way. to allow + such interaction, you *have* to have some glue at a higher semantic level + <braunr> agreed + <antrik> peo-xaci: BTW, if you want regular Linux binaries to get somehow + redirected to access HELL facilities, there is already a framework (don't + remember the name right now) that allows this kind of system call + redirection on Linux + <antrik> (it can run both through LD_PRELOAD or as a kernel module -- where + obviously only the latter would allow raw system call redirection... but + TBH, I don't think that's worthwhile anyways. the rare cases where + programs use raw system calls are usually for extremely system-specific + stuff anyways...) + <antrik> ViewOS is the name + <antrik> err... View-OS I mean + <antrik> or maybe View OS ? ;-) + <antrik> whatever, you'll find it :-) + +[[Virtual_Square_View-OS]]. + + <antrik> I'm not sure it's really worthwhile to use this either + though... the most meaningful interaction is probably at the FS level, + and that can be done with FUSE + <antrik> OHOH, View-OS probably allows doing more interesting stuff that + FUSE, such as modyfing the way the VFS works... + <antrik> OTOH + <antrik> so it could expose more of the Hurd features, at least in theory + + +## IRC, freenode, #hurd, 2012-10-13 + + <peo-xaci> antrik / braunr: thanks for your input! I'm not entirely + convinced though. :) I will probably return to this project once I have + acquired a lot more knowledge about low level stuff. I want to see for + myself whether a low level HELL is not feasible. :P + <braunr> peo-xaci: what's the point of a low level hell ? + <peo-xaci> more Hurd code can be tested in the hell, if the hell is at a + low level + <peo-xaci> at a higher level, some Hurd code cannot run, because the + interfaces they use would not be accessible from the higher level + emulation + <antrik> peo-xaci: I never said it's not possible. I actually said it would + be easier to do. I just said you can't do it low level *and* have + meaningful interaction with the host system + <peo-xaci> I don't understand why + <braunr> peo-xaci: i really don't see what you want to achieve with low + level support + <braunr> what would be unavailable with a higher level approach ? 
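+
+The interception idea discussed above can be prototyped with a small
+`LD_PRELOAD` object.  The following sketch is illustrative only: a real
+HELL would forward the call to the Mach emulation daemon instead of the
+host libc, and raw inlined system calls would still bypass it, as noted
+above.  It interposes `open`:
+
+    #define _GNU_SOURCE
+    #include <dlfcn.h>
+    #include <fcntl.h>
+    #include <stdarg.h>
+    #include <stdio.h>
+    #include <sys/types.h>
+
+    typedef int (*open_fn_t) (const char *, int, ...);
+
+    int
+    open (const char *path, int flags, ...)
+    {
+      static open_fn_t real_open;
+      mode_t mode = 0;
+
+      /* Look up the next "open" in the symbol search order, i.e. the
+         real libc implementation.  */
+      if (real_open == NULL)
+        real_open = (open_fn_t) dlsym (RTLD_NEXT, "open");
+
+      /* The third argument only exists when O_CREAT is given.  */
+      if (flags & O_CREAT)
+        {
+          va_list ap;
+          va_start (ap, flags);
+          mode = va_arg (ap, mode_t);
+          va_end (ap);
+        }
+
+      /* A HELL prototype would translate the call into an RPC here.  */
+      fprintf (stderr, "intercepted open (\"%s\", %#x)\n", path, flags);
+      return real_open (path, flags, mode);
+    }
+
+Built with `gcc -shared -fPIC -o hell-open.so hell-open.c -ldl` and run
+as `LD_PRELOAD=$PWD/hell-open.so ls`, every `open` call made through
+libc becomes visible to the interposer.
+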
diff --git a/open_issues/libpthread.mdwn b/open_issues/libpthread.mdwn index 03a52218..81f1a382 100644 --- a/open_issues/libpthread.mdwn +++ b/open_issues/libpthread.mdwn @@ -566,3 +566,671 @@ There is a [[!FF_project 275]][[!tag bounty]] on this task. <braunr> ouch <bddebian> braunr: Do you have debugging enabled in that custom kernel you installed? Apparently it is sitting at the debug prompt. + + +## IRC, freenode, #hurd, 2012-08-12 + + <braunr> hmm, it seems the hurd notion of cancellation is actually not the + pthread one at all + <braunr> pthread_cancel merely marks a thread as being cancelled, while + hurd_thread_cancel interrupts it + <braunr> ok, i have a pthread_hurd_cond_wait_np function in glibc + + +## IRC, freenode, #hurd, 2012-08-13 + + <braunr> nice, i got ext2fs work with pthreads + <braunr> there are issues with the stack size strongly limiting the number + of concurrent threads, but that's easy to fix + <braunr> one problem with the hurd side is the condition implications + <braunr> i think it should be deal separately, and before doing anything + with pthreads + <braunr> but that's minor, the most complex part is, again, the term server + <braunr> other than that, it was pretty easy to do + <braunr> but, i shouldn't speak too soon, who knows what tricky bootstrap + issue i'm gonna face ;p + <braunr> tschwinge: i'd like to know how i should proceed if i want a + symbol in a library overriden by that of a main executable + <braunr> e.g. have libpthread define a default stack size, and let + executables define their own if they want to change it + <braunr> tschwinge: i suppose i should create a weak alias in the library + and a normal variable in the executable, right ? + <braunr> hm i'm making this too complicated + <braunr> don't mind that stupid question + <tschwinge> braunr: A simple variable definition would do, too, I think? + <tschwinge> braunr: Anyway, I'd first like to know why we can'T reduce the + size of libpthread threads from 2 MiB to 64 KiB as libthreads had. Is + that a requirement of the pthread specification? + <braunr> tschwinge: it's a requirement yes + <braunr> the main reason i see is that hurd threadvars (which are still + present) rely on common stack sizes and alignment to work + <tschwinge> Mhm, I see. + <braunr> so for now, i'm using this approach as a hack only + <tschwinge> I'm working on phasing out threadvars, but we're not there yet. + <tschwinge> Yes, that's fine for the moment. + <braunr> tschwinge: a simple definition wouldn't work + <braunr> tschwinge: i resorted to a weak symbol, and see how it goes + <braunr> tschwinge: i supposed i need to export my symbol as a global one, + otherwise making it weak makes no sense, right ? 
+ <braunr> suppose* + <braunr> tschwinge: also, i'm not actually sure what you meant is a + requirement about the stack size, i shouldn't have answered right away + <braunr> no there is actually no requirement + <braunr> i misunderstood your question + <braunr> hm when adding this weak variable, starting a program segfaults :( + <braunr> apparently on ___pthread_self, a tls variable + <braunr> fighting black magic begins + <braunr> arg, i can't manage to use that weak symbol to reduce stack sizes + :( + <braunr> ah yes, finally + <braunr> git clone /path/to/glibc.git on a pthread-powered ext2fs server :> + <braunr> tschwinge: seems i have problems using __thread in hurd code + <braunr> tschwinge: they produce undefined symbols + <braunr> tschwinge: forget that, another mistake on my part + <braunr> so, current state: i just need to create another patch, for the + code that is included in the debian hurd package but not in the upstream + hurd repository (e.g. procfs, netdde), and i should be able to create + hurd packages taht completely use pthreads + + +## IRC, freenode, #hurd, 2012-08-14 + + <braunr> tschwinge: i have weird bootstrap issues, as expected + <braunr> tschwinge: can you point me to important files involved during + bootstrap ? + <braunr> my ext2fs.static server refuses to start as a rootfs, whereas it + seems to work fine otherwise + <braunr> hm, it looks like it's related to global signal dispositions + + +## IRC, freenode, #hurd, 2012-08-15 + + <braunr> ahah, a subhurd running pthreads-powered hurd servers only + <LarstiQ> braunr: \o/ + <braunr> i can even long on ssh + <braunr> log + <braunr> pinotree: for reference, i uploaded my debian-specific changes + there : + <braunr> http://git.sceen.net/rbraun/debian_hurd.git/ + <braunr> darnassus is now running a pthreads-enabled hurd system :) + + +## IRC, freenode, #hurd, 2012-08-16 + + <braunr> my pthreads-enabled hurd systems can quickly die under load + <braunr> youpi: with hurd servers using pthreads, i occasionally see thread + storms apparently due to a deadlock + <braunr> youpi: it makes me think of the problem you sometimes have (and + had often with the page cache patch) + <braunr> in cthreads, mutex and condition operations are macros, and they + check the mutex/condition queue without holding the internal + mutex/condition lock + <braunr> i'm not sure where this can lead to, but it doesn't seem right + <pinotree> isn't that a bit dangerous? + <braunr> i believe it is + <braunr> i mean + <braunr> it looks dangerous + <braunr> but it may be perfectly safe + <pinotree> could it be? + <braunr> aiui, it's an optimization, e.g. 
"dont take the internal lock if + there are no thread to wake" + <braunr> but if there is a thread enqueuing itself at the same time, it + might not be waken + <pinotree> yeah + <braunr> pthreads don't have this issue + <braunr> and what i see looks like a deadlock + <pinotree> anything can happen between the unlocked checking and the + following instruction + <braunr> so i'm not sure how a situation working around a faulty + implementation would result in a deadlock with a correct one + <braunr> on the other hand, the error youpi reported + (http://lists.gnu.org/archive/html/bug-hurd/2012-07/msg00051.html) seems + to indicate something is deeply wrong with libports + <pinotree> it could also be the current code does not really "works around" + that, but simply implicitly relies on the so-generated behaviour + <braunr> luckily not often + <braunr> maybe + <braunr> i think we have to find and fix these issues before moving to + pthreads entirely + <braunr> (ofc, using pthreads to trigger those bugs is a good procedure) + <pinotree> indeed + <braunr> i wonder if tweaking the error checking mode of pthreads to abort + on EDEADLK is a good approach to detecting this problem + <braunr> let's try ! + <braunr> youpi: eh, i think i've spotted the libports ref mistake + <youpi> ooo! + <youpi> .oOo.!! + <gnu_srs> Same problem but different patches + <braunr> look at libports/bucket-iterate.c + <braunr> in the HURD_IHASH_ITERATE loop, pi->refcnt is incremented without + a lock + <youpi> Mmm, the incrementation itself would probably be compiled into an + INC, which is safe in UP + <youpi> it's an add currently actually + <youpi> 0x00004343 <+163>: addl $0x1,0x4(%edi) + <braunr> 40c4: 83 47 04 01 addl $0x1,0x4(%edi) + <youpi> that makes it SMP unsafe, but not UP unsafe + <braunr> right + <braunr> too bad + <youpi> that still deserves fixing :) + <braunr> the good side is my mind is already wired for smp + <youpi> well, it's actually not UP either + <youpi> in general + <youpi> when the processor is not able to do the add in one instruction + <braunr> sure + <braunr> youpi: looks like i'm wrong, refcnt is protected by the global + libports lock + <youpi> braunr: but aren't there pieces of code which manipulate the refcnt + while taking another lock than the global libports lock + <youpi> it'd not be scalable to use the global libports lock to protect + refcnt + <braunr> youpi: imo, the scalability issues are present because global + locks are taken all the time, indeed + <youpi> urgl + <braunr> yes .. + <braunr> when enabling mutex checks in libpthread, pfinet dies :/ + <braunr> grmbl, when trying to start "ls" using my deadlock-detection + libpthread, the terminal gets unresponsive, and i can't even use ps .. :( + <pinotree> braunr: one could say your deadlock detection works too + good... :P + <braunr> pinotree: no, i made a mistake :p + <braunr> it works now :) + <braunr> well, works is a bit fast + <braunr> i can't attach gdb now :( + <braunr> *sigh* + <braunr> i guess i'd better revert to a cthreads hurd and debug from there + <braunr> eh, with my deadlock-detection changes, recursive mutexes are now + failing on _pthread_self(), which for some obscure reason generates this + <braunr> => 0x0107223b <+283>: jmp 0x107223b + <__pthread_mutex_timedlock_internal+283> + <braunr> *sigh* + + +## IRC, freenode, #hurd, 2012-08-17 + + <braunr> aw, the thread storm i see isn't a deadlock + <braunr> seems to be mere contention .... 
+ <braunr> youpi: what do you think of the way + ports_manage_port_operations_multithread determines it needs to spawn a + new thread ? + <braunr> it grabs a lock protecting the number of threads to determine if + it needs a new thread + <braunr> then releases it, to retake it right after if a new thread must be + created + <braunr> aiui, it could lead to a situation where many threads could + determine they need to create threads + <youpi> braunr: there's no reason to release the spinlock before re-taking + it + <youpi> that can indeed lead to too much thread creations + <braunr> youpi: a harder question + <braunr> youpi: what if thread creation fails ? :/ + <braunr> if i'm right, hurd servers simply never expect thread creation to + fail + <youpi> indeed + <braunr> and as some patterns have threads blocking until another produce + an event + <braunr> i'm not sure there is any point handling the failure at all :/ + <youpi> well, at least produce some output + <braunr> i added a perror + <youpi> so we know that happened + <braunr> async messaging is quite evil actually + <braunr> the bug i sometimes have with pfinet is usually triggered by + fakeroot + <braunr> it seems to use select a lot + <braunr> and select often destroys ports when it has something to return to + the caller + <braunr> which creates dead name notifications + <braunr> and if done often enough, a lot of them + <youpi> uh + <braunr> and as pfinet is creating threads to service new messages, already + existing threads are starved and can't continue + <braunr> which leads to pfinet exhausting its address space with thread + stacks (at about 30k threads) + <braunr> i initially thought it was a deadlock, but my modified libpthread + didn't detect one, and indeed, after i killed fakeroot (the whole + dpkg-buildpackage process hierarchy), pfinet just "cooled down" + <braunr> with almost all 30k threads simply waiting for requests to + service, and the few expected select calls blocking (a few ssh sessions, + exim probably, possibly others) + <braunr> i wonder why this doesn't happen with cthreads + <youpi> there's a 4k guard between stacks, otherwise I don't see anything + obvious + <braunr> i'll test my pthreads package with the fixed + ports_manage_port_operations_multithread + <braunr> but even if this "fix" should reduce thread creation, it doesn't + prevent the starvation i observed + <braunr> evil concurrency :p + + <braunr> youpi: hm i've just spotted an important difference actually + <braunr> youpi: glibc sched_yield is __swtch(), cthreads is + thread_switch(MACH_PORT_NULL, SWITCH_OPTION_DEPRESS, 10) + <braunr> i'll change the glibc implementation, see how it affects the whole + system + + <braunr> youpi: do you think bootsting the priority or cancellation + requests is an acceptable workaround ? + <braunr> boosting + <braunr> of* + <youpi> workaround for what? + <braunr> youpi: the starvation i described earlier + <youpi> well, I guess I'm not into the thing enough to understand + <youpi> you meant the dead port notifications, right? + <braunr> yes + <braunr> they are the cancellation triggers + <youpi> cancelling whaT? + <braunr> a blocking select for example + <braunr> ports_do_mach_notify_dead_name -> ports_dead_name -> + ports_interrupt_notified_rpcs -> hurd_thread_cancel + <braunr> so it's important they are processed quickly, to allow blocking + threads to unblock, reply, and be recycled + <youpi> you mean the threads in pfinet? 
+ <braunr> the issue applies to all servers, but yes + <youpi> k + <youpi> well, it can not not be useful :) + <braunr> whatever the choice, it seems to be there will be a security issue + (a denial of service of some kind) + <youpi> well, it's not only in that case + <youpi> you can always queue a lot of requests to a server + <braunr> sure, i'm just focusing on this particular problem + <braunr> hm + <braunr> max POLICY_TIMESHARE or min POLICY_FIXEDPRI ? + <braunr> i'd say POLICY_TIMESHARE just in case + <braunr> (and i'm not sure mach handles fixed priority threads first + actually :/) + <braunr> hm my current hack which consists of calling swtch_pri(0) from a + freshly created thread seems to do the job eh + <braunr> (it may be what cthreads unintentionally does by acquiring a spin + lock from the entry function) + <braunr> not a single issue any more with this hack + <bddebian> Nice + <braunr> bddebian: well it's a hack :p + <braunr> and the problem is that, in order to boost a thread's priority, + one would need to implement that in libpthread + <bddebian> there isn't thread priority in libpthread? + <braunr> it's not implemented + <bddebian> Interesting + <braunr> if you want to do it, be my guest :p + <braunr> mach should provide the basic stuff for a partial implementation + <braunr> but for now, i'll fall back on the hack, because that's what + cthreads "does", and it's "reliable enough" + + <antrik> braunr: I don't think the locking approach in + ports_manage_port_operations_multithread() could cause issues. the worst + that can happen is that some other thread becomes idle between the check + and creating a new thread -- and I can't think of a situation where this + could have any impact... + <braunr> antrik: hm ? + <braunr> the worst case is that many threads will evalute spawn to 1 and + create threads, whereas only one of them should have + <antrik> braunr: I'm not sure perror() is a good way to handle the + situation where thread creation failed. this would usually happen because + of resource shortage, right? in that case, it should work in non-debug + builds too + <braunr> perror isn't specific to debug builds + <braunr> i'm building glibc packages with a pthreads-enabled hurd :> + <braunr> (which at one point run the test allocating and filling 2 GiB of + memory, which passed) + <braunr> (with a kernel using a 3/1 split of course, swap usage reached + something like 1.6 GiB) + <antrik> braunr: BTW, I think the observation that thread storms tend to + happen on destroying stuff more than on creating stuff has been made + before... + <braunr> ok + <antrik> braunr: you are right about perror() of course. brain fart -- was + thinking about assert_perror() + <antrik> (which is misused in some places in existing Hurd code...) + <antrik> braunr: I still don't see the issue with the "spawn" + locking... the only situation where this code can be executed + concurrently is when multiple threads are idle and handling incoming + request -- but in that case spawning does *not* happen anyways... + <antrik> unless you are talking about something else than what I'm thinking + of... 
+ <braunr> well imagine you have idle threads, yes + <braunr> let's say a lot like a thousand + <braunr> and the server gets a thousand requests + <braunr> a one more :p + <braunr> normally only one thread should be created to handle it + <braunr> but here, the worst case is that all threads run internal_demuxer + roughly at the same time + <braunr> and they all determine they need to spawn a thread + <braunr> leading to another thousand + <braunr> (that's extreme and very unlikely in practice of course) + <antrik> oh, I see... you mean all the idle threads decide that no spawning + is necessary; but before they proceed, finally one comes in and decides + that it needs to spawn; and when the other ones are scheduled again they + all spawn unnecessarily? + <braunr> no, spawn is a local variable + <braunr> it's rather, all idle threads become busy, and right before + servicing their request, they all decide they must spawn a thread + <antrik> I don't think that's how it works. changing the status to busy (by + decrementing the idle counter) and checking that there are no idle + threads is atomic, isn't it? + <braunr> no + <antrik> oh + <antrik> I guess I should actually look at that code (again) before + commenting ;-) + <braunr> let me check + <braunr> no sorry you're right + <braunr> so right, you can't lead to that situation + <braunr> i don't even understand how i can't see that :/ + <braunr> let's say it's the heat :p + <braunr> 22:08 < braunr> so right, you can't lead to that situation + <braunr> it can't lead to that situation + + +## IRC, freenode, #hurd, 2012-08-18 + + <braunr> one more attempt at fixing netdde, hope i get it right this time + <braunr> some parts assume a ddekit thread is a cthread, because they share + the same address + <braunr> it's not as easy when using pthread_self :/ + <braunr> good, i got netdde work with pthreads + <braunr> youpi: for reference, there are now glibc, hurd and netdde + packages on my repository + <braunr> youpi: the debian specific patches can be found at my git + repository (http://git.sceen.net/rbraun/debian_hurd.git/ and + http://git.sceen.net/rbraun/debian_netdde.git/) + <braunr> except a freeze during boot (between exec and init) which happens + rarely, and the starvation which still exists to some extent (fakeroot + can cause many threads to be created in pfinet and pflocal), the + glibc/hurd packages have been working fine for a few days now + <braunr> the threading issue in pfinet/pflocal is directly related to + select, which the io_select_timeout patches should fix once merged + <braunr> well, considerably reduce at least + <braunr> and maybe fix completely, i'm not sure + + +## IRC, freenode, #hurd, 2012-08-27 + + <pinotree> braunr: wrt a78a95d in your pthread branch of hurd.git, + shouldn't that job theorically been done using pthread api (of course + after implementing it)? 
+ <braunr> pinotree: sure, it could be done through pthreads + <braunr> pinotree: i simply restricted myself to moving the hurd to + pthreads, not augment libpthread + <braunr> (you need to remember that i work on hurd with pthreads because it + became a dependency of my work on fixing select :p) + <braunr> and even if it wasn't the reason, it is best to do these tasks + (replace cthreads and implement pthread scheduling api) separately + <pinotree> braunr: hm ok + <pinotree> implementing the pthread priority bits could be done + independently though + + <braunr> youpi: there are more than 9000 threads for /hurd/streamio kmsg on + ironforge oO + <youpi> kmsg ?! + <youpi> it's only /dev/klog right? + <braunr> not sure but it seems so + <pinotree> which syslog daemon is running? + <youpi> inetutils + <youpi> I've restarted the klog translator, to see whether when it grows + again + + <braunr> 6 hours and 21 minutes to build glibc on darnassus + <braunr> pfinet still runs only 24 threads + <braunr> the ext2 instance used for the build runs 2k threads, but that's + because of the pageouts + <braunr> so indeed, the priority patch helps a lot + <braunr> (pfinet used to have several hundreds, sometimes more than a + thousand threads after a glibc build, and potentially increasing with + each use of fakeroot) + <braunr> exec weights 164M eww, we definitely have to fix that leak + <braunr> the leaks are probably due to wrong mmap/munmap usage + +[[exec_leak]]. + + +### IRC, freenode, #hurd, 2012-08-29 + + <braunr> youpi: btw, after my glibc build, there were as little as between + 20 and 30 threads for pflocal and pfinet + <braunr> with the priority patch + <braunr> ext2fs still had around 2k because of pageouts, but that's + expected + <youpi> ok + <braunr> overall the results seem very good and allow the switch to + pthreads + <youpi> yep, so it seems + <braunr> youpi: i think my first integration branch will include only a few + changes, such as this priority tuning, and the replacement of + condition_implies + <youpi> sure + <braunr> so we can push the move to pthreads after all its small + dependencies + <youpi> yep, that's the most readable way + + +## IRC, freenode, #hurd, 2012-09-03 + + <gnu_srs> braunr: Compiling yodl-3.00.0-7: + <gnu_srs> pthreads: real 13m42.460s, user 0m0.000s, sys 0m0.030s + <gnu_srs> cthreads: real 9m 6.950s, user 0m0.000s, sys 0m0.020s + <braunr> thanks + <braunr> i'm not exactly certain about what causes the problem though + <braunr> it could be due to libpthread using doubly-linked lists, but i + don't think the overhead would be so heavier because of that alone + <braunr> there is so much contention sometimes that it could + <braunr> the hurd would have been better off with single threaded servers + :/ + <braunr> we should probably replace spin locks with mutexes everywhere + <braunr> on the other hand, i don't have any more starvation problem with + the current code + + +### IRC, freenode, #hurd, 2012-09-06 + + <gnu_srs> braunr: Yes you are right, the new pthread-based Hurd is _much_ + slower. 
+ <gnu_srs> One annoying example is when compiling, the standard output is + written in bursts with _long_ periods of no output in between:-( + <braunr> that's more probably because of the priority boost, not the + overhead + <braunr> that's one of the big issues with our mach-based model + <braunr> we either give high priorities to our servers, or we can suffer + from message floods + <braunr> that's in fact more a hurd problem than a mach one + <gnu_srs> braunr: any immediate ideas how to speed up responsiveness the + pthread-hurd. It is annoyingly slow (slow-witted) + <braunr> gnu_srs: i already answered that + <braunr> it doesn't look that slower on my machines though + <gnu_srs> you said you had some ideas, not which. except for mcsims work. + <braunr> i have ideas about what makes it slower + <braunr> it doesn't mean i have solutions for that + <braunr> if i had, don't you think i'd have applied them ? :) + <gnu_srs> ok, how to make it more responsive on the console? and printing + stdout more regularly, now several pages are stored and then flushed. + <braunr> give more details please + <gnu_srs> it behaves like a loaded linux desktop, with little memory + left... + <braunr> details about what you're doing + <gnu_srs> apt-get source any big package and: fakeroot debian/rules binary + 2>&1 | tee ../binary.logg + <braunr> isee + <braunr> well no, we can't improve responsiveness + <braunr> without reintroducing the starvation problem + <braunr> they are linked + <braunr> and what you're doing involes a few buffers, so the laggy feel is + expected + <braunr> if we can fix that simply, we'll do so after it is merged upstream + + +### IRC, freenode, #hurd, 2012-09-07 + + <braunr> gnu_srs: i really don't feel the sluggishness you described with + hurd+pthreads on my machines + <braunr> gnu_srs: what's your hardware ? + <braunr> and your VM configuration ? + <gnu_srs> Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz + <gnu_srs> kvm -m 1024 -net nic,model=rtl8139 -net + user,hostfwd=tcp::5562-:22 -drive + cache=writeback,index=0,media=disk,file=hurd-experimental.img -vnc :6 + -cdrom isos/netinst_2012-07-15.iso -no-kvm-irqchip + <braunr> what is the file system type where your disk image is stored ? + <gnu_srs> ext3 + <braunr> and how much physical memory on the host ? + <braunr> (paste meminfo somewhere please) + <gnu_srs> 4G, and it's on the limit, 2 kvm instances+gnome,etc + <gnu_srs> 80% in use by programs, 14% in cache. + <braunr> ok, that's probably the reason then + <braunr> the writeback option doesn't help a lot if you don't have much + cache + <gnu_srs> well the other instance is cthreads based, and not so sluggish. + <braunr> we know hurd+pthreads is slower + <braunr> i just wondered why i didn't feel it that much + <gnu_srs> try to fire up more kvm instances, and do a heavy compile... + <braunr> i don't do that :) + <braunr> that's why i never had the problem + <braunr> most of the time i have like 2-3 GiB of cache + <braunr> and of course more on shattrath + <braunr> (the host of the sceen.net hurdboxes, which has 16 GiB of ram) + + +### IRC, freenode, #hurd, 2012-09-11 + + <gnu_srs> Monitoring the cthreads and the pthreads load under Linux shows: + <gnu_srs> cthread version: load can jump very high, less cpu usage than + pthread version + <gnu_srs> pthread version: less memory usage, background cpu usage higher + than for cthread version + <braunr> that's the expected behaviour + <braunr> gnu_srs: are you using the lifothreads gnumach kernel ? + <gnu_srs> for experimental, yes. 
+ <gnu_srs> i.e. pthreads + <braunr> i mean, you're measuring on it right now, right ? + <gnu_srs> yes, one instance running cthreads, and one pthreads (with lifo + gnumach) + <braunr> ok + <gnu_srs> no swap used in either instance, will try a heavy compile later + on. + <braunr> what for ? + <gnu_srs> E.g. for memory when linking. I have swap available, but no swap + is used currently. + <braunr> yes but, what do you intend to measure ? + <gnu_srs> don't know, just to see if swap is used at all. it seems to be + used not very much. + <braunr> depends + <braunr> be warned that using the swap means there is pageout, which is one + of the triggers for global system freeze :p + <braunr> anonymous memory pageout + <gnu_srs> for linux swap is used constructively, why not on hurd? + <braunr> because of hard to squash bugs + <gnu_srs> aha, so it is bugs hindering swap usage:-/ + <braunr> yup :/ + <gnu_srs> Let's find them thenO:-), piece of cake + <braunr> remember my page cache branch in gnumach ? :) + +[[gnumach_page_cache_policy]]. + + <gnu_srs> not much + <braunr> i started it before fixing non blocking select + <braunr> anyway, as a side effect, it should solve this stability issue + too, but it'll probably take time + <gnu_srs> is that branch integrated? I only remember slab and the lifo + stuff. + <gnu_srs> and mcsims work + <braunr> no it's not + <braunr> it's unfinished + <gnu_srs> k! + <braunr> it correctly extends the page cache to all available physical + memory, but since the hurd doesn't scale well, it slows the system down + + +## IRC, freenode, #hurd, 2012-09-14 + + <braunr> arg + <braunr> darnassus seems to eat 100% cpu and make top freeze after some + time + <braunr> seems like there is an important leak in the pthreads version + <braunr> could be the lifothreads patch :/ + <cjbirk> there's a memory leak? + <cjbirk> in pthreads? + <braunr> i don't think so, and it's not a memory leak + <braunr> it's a port leak + <braunr> probably in the kernel + + +### IRC, freenode, #hurd, 2012-09-17 + + <braunr> nice, the port leak is actually caused by the exim4 loop bug + + +### IRC, freenode, #hurd, 2012-09-23 + + <braunr> the port leak i observed a few days ago is because of exim4 (the + infamous loop eating the cpu we've been seeing regularly) + +[[fork_deadlock]]? + + <youpi> oh + <braunr> next time it happens, and if i have the occasion, i'll examine the + problem + <braunr> tip: when you can't use top or ps -e, you can use ps -e -o + pid=,args= + <youpi> or -M ? + <braunr> haven't tested + + +## IRC, freenode, #hurd, 2012-09-23 + + <braunr> tschwinge: i committed the last hurd pthread change, + http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=master-pthreads + <braunr> tschwinge: please tell me if you consider it ok for merging + + +### IRC, freenode, #hurd, 2012-11-27 + + <youpi> braunr: btw, I forgot to forward here, with the glibc patch it does + boot fine, I'll push all that and build some almost-official packages for + people to try out what will come when eglibc gets the change in unstable + <braunr> youpi: great :) + <youpi> thanks for managing the final bits of this + <youpi> (and thanks for everybody involved) + <braunr> sorry again for the non obvious parts + <braunr> if you need the debian specific parts refined (e.g. nice commits + for procfs & others), i can do that + <youpi> I'll do that, no pb + <braunr> ok + <braunr> after that (well, during also), we should focus more on bug + hunting + + +## IRC, freenode, #hurd, 2012-10-26 + + <mcsim1> hello. 
What does following error message means? "unable to adjust + libports thread priority: Operation not permitted" It appears when I set + translators. + <mcsim1> Seems has some attitude to libpthread. Also following appeared + when I tried to remove translator: "pthread_create: Resource temporarily + unavailable" + <mcsim1> Oh, first message appears very often, when I use translator I set. + <braunr> mcsim1: it's related to a recent patch i sent + <braunr> mcsim1: hurd servers attempt to increase their priority on startup + (when a thread is created actually) + <braunr> to reduce message floods and thread storms (such sweet names :)) + <braunr> but if you start them as an unprivileged user, it fails, which is + ok, it's just a warning + <braunr> the second way is weird + <braunr> it normally happens when you're out of available virtual space, + not when shutting a translator donw + <mcsim1> braunr: you mean this patch: libports: reduce thread starvation on + message floods? + <braunr> yes + <braunr> remember you're running on darnassus + <braunr> with a heavily modified hurd/glibc + <braunr> you can go back to the cthreads version if you wish + <mcsim1> it's better to check translators privileges, before attempting to + increase their priority, I think. + <braunr> no + <mcsim1> it's just a bit annoying + <braunr> privileges can be changed during execution + <braunr> well remove it + <mcsim1> But warning should not appear. + <braunr> what could be done is to limit the warning to one occurrence + <braunr> mcsim1: i prefer that it appears + <mcsim1> ok + <braunr> it's always better to be explicit and verbose + <braunr> well not always, but very often + <braunr> one of the reasons the hurd is so difficult to debug is the lack + of a "message server" à la dmesg + +[[translator_stdout_stderr]]. diff --git a/open_issues/libpthread/t/fix_have_kernel_resources.mdwn b/open_issues/libpthread/t/fix_have_kernel_resources.mdwn new file mode 100644 index 00000000..37231c66 --- /dev/null +++ b/open_issues/libpthread/t/fix_have_kernel_resources.mdwn @@ -0,0 +1,21 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_libpthread]] + +`t/have_kernel_resources` + + +# IRC, freenode, #hurd, 2012-08-30 + + <braunr> tschwinge: this issue needs more cooperation with the kernel + <braunr> tschwinge: i.e.
the ability to tell the kernel where the stack is, + so it's unmapped when the thread dies + <braunr> which requiring another thread to perform this deallocation diff --git a/open_issues/libpthread_CLOCK_MONOTONIC.mdwn b/open_issues/libpthread_CLOCK_MONOTONIC.mdwn index 86a613d3..22b2cd3b 100644 --- a/open_issues/libpthread_CLOCK_MONOTONIC.mdwn +++ b/open_issues/libpthread_CLOCK_MONOTONIC.mdwn @@ -103,3 +103,11 @@ License|/fdl]]."]]"""]] <pinotree> it'll be safe when implementing some private __hurd_clock_get{time,res} in libc proper, making librt just forward to it and adapting the gettimeofday to use it + + +## IRC, freenode, #hurd, 2012-10-22 + + <pinotree> youpi: apparently somebody in glibc land is indirectly solving + our "libpthread needs lirt which pulls libphtread" circular issue by + moving the clock_* functions to libc proper + <youpi> I've seen that yes :) diff --git a/open_issues/libpthread_timeout_dequeue.mdwn b/open_issues/libpthread_timeout_dequeue.mdwn new file mode 100644 index 00000000..5ebb2e11 --- /dev/null +++ b/open_issues/libpthread_timeout_dequeue.mdwn @@ -0,0 +1,22 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_libpthread]] + + +# IRC, freenode, #hurd, 2012-08-17 + + <braunr> pthread_cond_timedwait and pthread_mutex_timedlock *can* produce + segfaults in our implementation + <braunr> if a timeout happens, but before the thread dequeues itself, + another tries to wake it, it will be dequeued twice + <braunr> this is the issue i spent a week on when working on fixing select + +[[select]] diff --git a/open_issues/mach_federations.mdwn b/open_issues/mach_federations.mdwn new file mode 100644 index 00000000..50c939c3 --- /dev/null +++ b/open_issues/mach_federations.mdwn @@ -0,0 +1,66 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_documentation]] + + +# IRC, freenode, #hurd, 2012-08-18 + + <braunr> well replacing parts of it is possible on the hurd, but for core + servers it's limited + <braunr> minix has features for that + <braunr> this was interesting too: + http://static.usenix.org/event/osdi08/tech/full_papers/david/david_html/ + <braunr> lcc: you'll always have some kind of dependency problems which are + hard to solve + <savask> braunr: One my friend asked me if it's possible to run different + parts of Hurd on different computers and make a cluster therefore. So, is + it, at least theoretically? + <braunr> savask: no + <savask> Okay, then I guessed a right answer. 
+ <youpi> well, theorically it's possible, but it's not implemented + <braunr> well it's possible everywhere :p + <braunr> there are projects for that on linux + <braunr> but it requires serious changes in both the protocols and servers + <braunr> and it depends on the features you want (i assume here you want + e.g. process checkpointing so they can be migrated to other machines to + transparently balance loads) + <lcc> is it even theoretically possible to have a system in which core + servers can be modified while the system is running? hm... I will look + more into it. just curious. + <savask> lcc: Linux can be updated on the fly, without rebooting. + <braunr> lcc: to some degree, it is + <braunr> savask: the whole kernel is rebooted actually + <braunr> well not rebooted, but restarted + <braunr> there is a project that provides kernel updates through binary + patches + <braunr> ksplice + <savask> braunr: But it will look like everything continued running. + <braunr> as long as the new code expects the same data structures and other + implications, yes + <braunr> "Ksplice can handle many security updates but not changes to data + structures" + <braunr> obviously + <braunr> so it's good for small changes + <braunr> and ksplice is very specific, it's intended for security updates, + ad the primary users are telecommunication providers who don't want + downtime + <antrik> braunr: well, protocols and servers on Mach-based systems should + be ready for federations... although some Hurd protocols are not clean + for federations with heterogenous architectures, at least on homogenous + clusters it should actually work with only some extra bootstrapping code, + if the support existed in our Mach variant... + <braunr> antrik: why do you want the support in the kernel ? + <antrik> braunr: I didn't say I *want* federation support in the + kernel... in fact I agree with Shapiro that it's probably a bad idea. I + just said that it *should* actually work with the system design as it is + now :-) + <antrik> braunr: yes, I said that it wouldn't work on heterogenous + federations. if all machines use the same architecture it should work. diff --git a/open_issues/mach_on_top_of_posix.mdwn b/open_issues/mach_on_top_of_posix.mdwn index 7574feb0..a3e47685 100644 --- a/open_issues/mach_on_top_of_posix.mdwn +++ b/open_issues/mach_on_top_of_posix.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -14,3 +14,5 @@ License|/fdl]]."]]"""]] At the beginning of the 2000s, there was a *Mach on Top of POSIX* port started by John Edwin Tobey. Status unknown. Ask [[tschwinge]] for the source code. + +See also [[implementing_hurd_on_top_of_another_system]]. 
diff --git a/open_issues/mach_shadow_objects.mdwn b/open_issues/mach_shadow_objects.mdwn new file mode 100644 index 00000000..0669041a --- /dev/null +++ b/open_issues/mach_shadow_objects.mdwn @@ -0,0 +1,24 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_gnumach]] + +See also [[gnumach_vm_map_entry_forward_merging]]. + + +# IRC, freenode, #hurd, 2012-11-16 + + <mcsim> hi. do I understand correct that following is true: vm_object_t a; + a->shadow->copy == a;? + <braunr> mcsim: not completely sure, but i'd say no + <braunr> but mach terminology isn't always firm, so possible + <braunr> mcsim: apparently you're right, although be careful that it may + not be the case *all* the time + <braunr> there may be inconsistent states diff --git a/open_issues/multithreading.mdwn b/open_issues/multithreading.mdwn index c9567828..f42601b4 100644 --- a/open_issues/multithreading.mdwn +++ b/open_issues/multithreading.mdwn @@ -134,6 +134,75 @@ Tom Van Cutsem, 2009. <braunr> (i still strongly believe those shouldn't be used at all) +## IRC, freenode, #hurd, 2012-08-31 + + <braunr> and the hurd is all but scalable + <gnu_srs> I thought scalability was built-in already, at least for hurd?? + <braunr> built in ? + <gnu_srs> designed in + <braunr> i guess you think that because you read "aggressively + multithreaded" ? + <braunr> well, a system that is unable to control the amount of threads it + creates for no valid reason and uses global lock about everywhere isn't + really scalable + <braunr> it's not smp nor memory scalable + <gnu_srs> most modern OSes have multi-cpu support. + <braunr> that doesn't mean they scale + <braunr> bsd sucks in this area + <braunr> it got better in recent years but they're way behind linux + <braunr> linux has this magic thing called rcu + <braunr> and i want that in my system, from the beginning + <braunr> and no, the hurd was never designed to scale + <braunr> that's obvious + <braunr> a very common mistake of the early 90s + + +## IRC, freenode, #hurd, 2012-09-06 + + <braunr> mel-: the problem with such a true client/server architecture is + that the scheduling context of clients is not transferred to servers + <braunr> mel-: and the hurd creates threads on demand, so if it's too slow + to process requests, more threads are spawned + <braunr> to prevent hurd servers from creating too many threads, they are + given a higher priority + <braunr> and it causes increased latency for normal user applications + <braunr> a better way, which is what modern synchronous microkernel based + systems do + <braunr> is to transfer the scheduling context of the client to the server + <braunr> the server thread behaves like the client thread from the + scheduler perspective + <gnu_srs> how can creating more threads ease the slowness, is that a design + decision?? + <mel-> what would be needed to implement this? + <braunr> mel-: thread migration + <braunr> gnu_srs: is that what i wrote ? + <mel-> does mach support it? 
+ <braunr> mel-: some versions do yes + <braunr> mel-: not ours + <gnu_srs> 21:49:03) braunr: mel-: and the hurd creates threads on demand, + so if it's too slow to process requests, more threads are spawned + <braunr> of course it's a design decision + <braunr> it doesn't "ease the slowness" + <braunr> it makes servers able to use multiple processors to handle + requests + <braunr> but it's a wrong design decision as the number of threads is + completely unchecked + <gnu_srs> what's the idea of creating more theads then, multiple cpus is + not supported? + <braunr> it's a very old decision taken at a time when systems and machines + were very different + <braunr> mach used to support multiple processors + <braunr> it was expected gnumach would do so too + <braunr> mel-: but getting thread migration would also require us to adjust + our threading library and our servers + <braunr> it's not an easy task at all + <braunr> and it doesn't fix everything + <braunr> thread migration on mach is an optimization + <mel-> interesting + <braunr> async ipc remains available, which means notifications, which are + async by nature, will create messages floods anyway + + # Alternative approaches: * <http://www.concurrencykit.org/> diff --git a/open_issues/packaging_libpthread.mdwn b/open_issues/packaging_libpthread.mdwn index 528e0b01..2d90779e 100644 --- a/open_issues/packaging_libpthread.mdwn +++ b/open_issues/packaging_libpthread.mdwn @@ -187,3 +187,60 @@ License|/fdl]]."]]"""]] upstream <youpi> the slibdir change, however, is odd <youpi> it must be a leftover + + +# IRC, freenode, #hurd, 2012-11-16 + + <pinotree> *** $(common-objpfx)resolv/gai_suspend.o: uses + /usr/include/i386-gnu/bits/pthread.h + <pinotree> so the ones in the libpthread addon are not used... + <tschwinge> pinotree: The latter at leash should be useful information. + <pinotree> tschwinge: i'm afraid i didn't get you :) what are you referring + to? + <tschwinge> pinotree: s%leash%least -- what I mean was the it's actually a + real bug that not the in-tree libpthread addon include files are being + used. + <pinotree> tschwinge: ah sure -- basically, the stuff in + libpthread/sysdeps/generic are not used at all + <pinotree> (glibc only uses generic for glibc/sysdeps/generic) + <pinotree> tschwinge: i might have an idea how to fix it: moving the + contents from libpthread/sysdeps/generic to libpthread/sysdeps/pthread, + and that would depend on one of the latest libpthread patches i sent + + +# libihash + +## IRC, freenode, #hurd, 2012-11-16 + + <pinotree> also, libpthread uses hurd's ihash + <tschwinge> Yes, I already thought a little bit about the ihash thing. I + besically see two options: move ihash into glibc ((probably?) not as a + public interface, though), or have libpthread use of of the hash + implementations that surely are already present in glibc. + <tschwinge> My notes say: + <tschwinge> * include/inline-hashtab.h + <tschwinge> * locale/programs/simple-hash.h + <tschwinge> * misc/hsearch_r.c + <tschwinge> * NNS; cf. f46f0abfee5a2b34451708f2462a1c3b1701facd + <tschwinge> No idea whether they're equivalent/usable. + <pinotree> interesting + <tschwinge> And no immediate recollection what NNS is; + f46f0abfee5a2b34451708f2462a1c3b1701facd is not a glibc commit after all. 
+ ;-) + <tschwinge> Oh, and: libiberty: `hashtab.c` + <pinotree> hmm, but then you would need to properly ifdef the libpthread + hash usage (iirc only for pthread keys) depending on whether it's in + glibc or standalone + <pinotree> but that shouldn't be an ussue, i guess + <pinotree> *issue + <tschwinge> No that'd be fine. + <tschwinge> My understanding is that the long-term goal (well, no so + long-term, actually) is to completely move libpthread into glibc. + <pinotree> ie have it buildable only ad glibc addon? + <tschwinge> Yes. + <tschwinge> No need for more than one mechanism for building it, I think. + <tschwinge> Hmm, this doesn't bring us any further: + https://www.google.com/search?q=f46f0abfee5a2b34451708f2462a1c3b1701facd + <pinotree> yay for acronyms ;) + <tschwinge> So, if someone figures out what NNS and this commit it are: one + beer. ;-) diff --git a/open_issues/performance.mdwn b/open_issues/performance.mdwn index ec14fa52..8147e5eb 100644 --- a/open_issues/performance.mdwn +++ b/open_issues/performance.mdwn @@ -81,3 +81,34 @@ call|/glibc/fork]]'s case. gnumach and the hurd) just wake every thread waiting for an event when the event occurs (there are a few exceptions, but not many) <antrik> ouch + + +# IRC, freenode, #hurd, 2012-09-13 + +{{$news/2011-q2#phoronix-3}}. + + <braunr> the phoronix benchmarks don't actually test the operating system + .. + <hroi_> braunr: well, at least it tests its ability to run programs for + those particular tasks + <braunr> exactly, it tests how programs that don't make much use of the + operating system run + <braunr> well yes, we can run programs :) + <pinotree> those are just cpu-taking tasks + <hroi_> ok + <pinotree> if you do a benchmark with also i/o, you can see how it is + (quite) slower on hurd + <hroi_> perhaps they should have run 10 of those programs in parallel, that + would test the kernel multitasking I suppose + <braunr> not even I/O, simply system calls + <braunr> no, multitasking is ok on the hurd + <braunr> and it's very similar to what is done on other systems, which + hasn't changed much for a long time + <braunr> (except for multiprocessor) + <braunr> true OS benchmarks measure system calls + <hroi_> ok, so Im sensing the view that the actual OS kernel architecture + dont really make that much difference, good software does + <braunr> not at all + <braunr> i'm only saying that the phoronix benchmark results are useless + <braunr> because they didn't measure the right thing + <hroi_> ok diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index 657318cd..706e1632 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -1845,3 +1845,714 @@ License|/fdl]]."]]"""]] <braunr> but that's one way to do it <braunr> defaults work well too <braunr> as shown in other implementations + + +## IRC, freenode, #hurd, 2012-08-09 + + <mcsim> braunr: I'm still debugging ext2 with large storage patch + <braunr> mcsim: tough problems ? + <mcsim> braunr: The same issues as I always meet when do debugging, but it + takes time. + <braunr> mcsim: so nothing blocking so far ? + <mcsim> braunr: I can't tell you for sure that I will finish up to 13th of + August and this is unofficial pencil down date. + <braunr> all right, but are you blocked ? + <mcsim> braunr: If you mean the issues that I can not even imagine how to + solve than there is no ones. 
+ <braunr> good + <braunr> mcsim: i'll try to review your code again this week end + <braunr> mcsim: make sure to commit everything even if it's messy + <mcsim> braunr: ok + <mcsim> braunr: I made changes to defpager, but I haven't tried + them. Commit them too? + <braunr> mcsim: sure + <braunr> mcsim: does it work fine without the large storage patch ? + <mcsim> braunr: looks fine, but TBH I can't even run such things like fsx, + because even without my changes it failed mightily at once. + <braunr> mcsim: right, well, that will be part of another task :) + + +## IRC, freenode, #hurd, 2012-08-13 + + <mcsim> braunr: hello. Seems ext2fs with large store patch works. + + +## IRC, freenode, #hurd, 2012-08-19 + + <mcsim> hello. Consider such situation. There is a page fault and kernel + decided to request pager for several pages, but at the moment pager is + able to provide only first pages, the rest ones are not know yet. Is it + possible to supply only one page and regarding rest ones tell the kernel + something like: "Rest pages try again later"? + <mcsim> I tried pager_data_unavailable && pager_flush_some, but this seems + does not work. + <mcsim> Or I have to supply something anyway? + <braunr> mcsim: better not provide them + <braunr> the kernel only really needs one page + <braunr> don't try to implement "try again later", the kernel will do that + if other page faults occur for those pages + <mcsim> braunr: No, translator just hangs + <braunr> ? + <mcsim> braunr: And I even can't deattach it without reboot + <braunr> hangs when what + <braunr> ? + <braunr> i mean, what happens when it hangs ? + <mcsim> If kernel request 2 pages and I provide one, than when page fault + occurs in second page translator hangs. + <braunr> well that's a bug + <braunr> clustered pager transfer is a mere optimization, you shouldn't + transfer more than you can just to satisfy some requested size + <mcsim> I think that it because I create fictitious pages before calling + mo_data_request + <braunr> as placeholders ? + <mcsim> Yes. Is it correct if I will not grab fictitious pages? + <braunr> no + <braunr> i don't know the details well enough about fictitious pages + unfortunately, but it really feels wrong to use them where real physical + pages should be used instead + <braunr> normally, an in-transfer page is simply marked busy + <mcsim> But If page is already marked busy kernel will not ask it another + time. + <braunr> when the pager replies, you unbusy them + <braunr> your bug may be that you incorrectly use pmap + <braunr> you shouldn't create mmu mappings for pages you didn't receive + from the pagers + <mcsim> I don't create them + <braunr> ok so you correctly get the second page fault + <mcsim> If pager supplies only first pages, when asked were two, than + second page will not become un-busy. + <braunr> that's a bug + <braunr> your code shouldn't assume the pager will provide all the pages it + was asked for + <braunr> only the main one + <mcsim> Will it be ok if I will provide special attribute that will keep + information that page has been advised? + <braunr> what for ? + <braunr> i don't understand "page has been advised" + <mcsim> Advised page is page that is asked in cluster, but there wasn't a + page fault in it. + <mcsim> I need this attribute because if I don't inform kernel about this + page anyhow, than kernel will not change attributes of this page. + <braunr> why would it change its attributes ? 
+ <mcsim> But if page fault will occur in page that was asked than page will + be already busy by the moment. + <braunr> and what attribute ? + <mcsim> advised + <braunr> i'm lost + <braunr> 08:53 < mcsim> I need this attribute because if I don't inform + kernel about this page anyhow, than kernel will not change attributes of + this page. + <braunr> you need the advised attribute because if you don't inform the + kernel about this page, the kernel will not change the advised attribute + of this page ? + <mcsim> Not only advised, but busy as well. + <mcsim> And if page fault will occur in this page, kernel will not ask it + second time. Kernel will just block. + <braunr> well that's normal + <mcsim> But if kernel will block and pager is not going to report somehow + about this page, than translator will hang. + <braunr> but the pager is going to report + <braunr> and in this report, there can be less pages then requested + <mcsim> braunr: You told not to report + <braunr> the kernel can deduce it didn't receive all the pages, and mark + them unbusy anyway + <braunr> i told not to transfer more than requested + <braunr> but not sending data can be a form of communication + <braunr> i mean, sending a message in which data is missing + <braunr> it simply means its not there, but this info is sufficient for the + kernel + <mcsim> hmmm... Seems I understood you. Let me try something. + <mcsim> braunr: I informed kernel about missing page as follows: + pager_data_supply (pager, precious, writelock, i, 1, NULL, 0); Am I + right? + <braunr> i don't know the interface well + <braunr> what does it mean + <braunr> ? + <braunr> are you passing NULL as the data for a missing page ? + <mcsim> yes + <braunr> i see + <braunr> you shouldn't need a request for that though, avoiding useless ipc + is a good thing + <mcsim> i is number of page, 1 is quantity + <braunr> but if you can't find a better way for now, it will do + <mcsim> But this does not work :( + <braunr> that's a bug + <braunr> in your code probably + <mcsim> braunr: supplying NULL as data returns MACH_SEND_INVALID_MEMORY + <braunr> but why would it work ? + <braunr> mach expects something + <braunr> you have to change that + <mcsim> It's mig who refuses data. Mach does not even get the call. + <braunr> hum + <mcsim> That's why I propose to provide new attribute, that will keep + information regarding whether the page was asked as advice or not. + <braunr> i still don't understand why + <braunr> why don't you fix mig so you can your null message instead ? + <braunr> +send + <mcsim> braunr: because usually this is an error + <braunr> the kernel will decide if it's an erro + <braunr> r + <braunr> what kinf of reply do you intend to send the kernel with for these + "advised" pages ? + <mcsim> no reply. But when page fault will occur in busy page and it will + be also advised, kernel will not block, but ask this page another time. + <mcsim> And how kernel will know that this is an error or not? + <braunr> why ask another time ?! + <braunr> you really don't want to flood pagers with useless messages + <braunr> here is how it should be + <braunr> 1/ the kernel requests pages from the pager + <braunr> it know the range + <braunr> 2/ the pager replies what it can, full range, subset of it, even + only one page + <braunr> 3/ the kernel uses what the pager replied, and unbusies the other + pages + <mcsim> First time page was asked because page fault occurred in + neighborhood. And second time because PF occurred in page. 
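+The three-step protocol braunr lays out above could look roughly as follows
+on the pager side.  This is an illustrative sketch only: every name and
+signature in it is an assumption rather than the actual libpager/gnumach
+interface, the type definitions are stand-ins for the Mach ones, and the
+faulted page is assumed to be the first page of the requested range:
+
+    #include <assert.h>
+
+    /* Stand-ins for the Mach types; illustration only.  */
+    typedef unsigned long vm_offset_t;
+    typedef unsigned long vm_size_t;
+    #define PAGE_SIZE 4096
+    struct pager;
+
+    /* Hypothetical helpers.  Does the backing store have the block at
+       OFFSET ready right now?  */
+    int example_store_has_block (struct pager *p, vm_offset_t offset);
+    /* Reply to the kernel with the data for [OFFSET, OFFSET + LENGTH).  */
+    void example_data_supply (struct pager *p, vm_offset_t offset,
+                              vm_size_t length);
+
+    /* Handle a clustered page-in request for [OFFSET, OFFSET + LENGTH):
+       supply the leading pages that are available, and let the rest of
+       the range simply be absent from the reply, so the kernel can
+       unbusy those pages instead of treating the gap as an error.  */
+    void
+    example_read_cluster (struct pager *p, vm_offset_t offset,
+                          vm_size_t length)
+    {
+      vm_size_t ready = 0;
+
+      while (ready < length
+             && example_store_has_block (p, offset + ready))
+        ready += PAGE_SIZE;
+
+      /* The main (faulted) page must always be supplied; anything
+         beyond it is a mere optimization.  */
+      assert (ready >= PAGE_SIZE);
+
+      example_data_supply (p, offset, ready);
+    }
+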
+ <braunr> well it shouldn't + <braunr> or it should, but then you have a segfault + <mcsim> But kernel does not keep bound of range, that it asked. + <braunr> if the kernel can't find the main page, the one it needs to make + progress, it's a segfault + <mcsim> And this range could be supplied in several messages. + <braunr> absolutely not + <braunr> you defeat the purpose of clustered pageins if you use several + messages + <mcsim> But interface supports it + <braunr> interface supported single page transfers, doesn't mean it's good + <braunr> well, you could use several messages + <braunr> as what we really want is less I/O + <mcsim> Noone keeps bounds of requested range, so it couldn't be checked + that range was split + <braunr> but it would be so much better to do it all with as few messages + as possible + <braunr> does the kernel knows the main page ? + <braunr> know* + <mcsim> Splitting range is not optimal, but it's not an error. + <braunr> i assume it does + <braunr> doesn't it ? + <mcsim> no, that's why I want to provide new attribute. + <braunr> i'm sorry i'm lost again + <braunr> how does the kernel knows a page fault has been serviced ? + <braunr> know* + <mcsim> It receives an interrupt + <braunr> ? + <braunr> let's not mix terms + <mcsim> oh.. I read as received. Sorry + <mcsim> It get mo_data_supply message. Than it replaces fictitious pages + with real ones. + <braunr> so you get a message + <braunr> and you kept track of the range using fictitious pages + <braunr> use the busy flag instead, and another way to retain the range + <mcsim> I allocate fictitious pages to reserve place. Than if page fault + will occur in this page fictitious page kernel will not send another + mo_data_request call, it will wait until fictitious page unblocks. + <braunr> i'll have to check the code but it looks unoptimal to me + <braunr> we really don't want to allocate useless objects when a simple + busy flag would do + <mcsim> busy flag for what? There is no page yet + <braunr> we're talking about mo_data_supply + <braunr> actually we're talking about the whole page fault process + <mcsim> We can't mark nothing as busy, that's why kernel allocates + fictitious page and marks it as busy until real page would be supplied. + <braunr> what do you mean "nothing" ? + <mcsim> VM_PAGE_NULL + <braunr> uh ? + <braunr> when are physical pages allocated ? + <braunr> on request or on reply from the pager ? + <braunr> i'm reading mo_data_supply, and it looks like the page is already + busy at that time + <mcsim> they are allocated by pager and than supplied in reply + <mcsim> Yes, but these pages are fictitious + <braunr> show me please + <braunr> in the master branch, not yours + <mcsim> that page is fictitious? + <braunr> yes + <braunr> i'm referring to the way mach currently does things + <mcsim> vm/vm_fault.c:582 + <braunr> that's memory_object_lock_page + <braunr> hm wait + <braunr> my bad + <braunr> ah that damn object chaining :/ + <braunr> ok + <braunr> the original code is stupid enough to use fictitious pages all the + time, you probably have to do the same + <mcsim> hm... Attributes will be useless, pager should tell something about + pages, that it is not going to supply. + <braunr> yes + <braunr> that's what null is for + <mcsim> Not null, null is error. 
+ <braunr> one problem i can think of is making sure the kernel doesn't + interpret missing as error + <braunr> right + <mcsim> I think better have special value for mo_data_error + <braunr> probably + + +### IRC, freenode, #hurd, 2012-08-20 + + <antrik> braunr: I think it's useful to allow supplying the data in several + batches. the kernel should *not* assume that any data missing in the + first batch won't be supplied later. + <braunr> antrik: it really depends + <braunr> i personally prefer synchronous approaches + <antrik> demanding that all data is supplied at once could actually turn + readahead into a performace killer + <mcsim> antrik: Why? The only drawback I see is higher response time for + page fault, but it also leads to reduced overhead. + <braunr> that's why "it depends" + <braunr> mcsim: it brings benefit only if enough preloaded pages are + actually used to compensate for the time it took the pager to provide + them + <braunr> which is the case for many workloads (including sequential access, + which is the common case we want to optimize here) + <antrik> mcsim: the overhead of an extra RPC is negligible compared to + increased latencies when dealing with slow backing stores (such as disk + or network) + <mcsim> antrik: also many replies lead to fragmentation, while in one reply + all data is gathered in one bunch. If all data is placed consecutively, + than it may be transferred next time faster. + <braunr> mcsim: what kind of fragmentation ? + <antrik> I really really don't think it's a good idea for the page to hold + back the first page (which is usually the one actually blocking) while + it's still loading some other pages (which will probably be needed only + in the future anyways, if at all) + <antrik> err... for the pager to hold back + <braunr> antrik: then all pagers should be changed to handle asynchronous + data supply + <braunr> it's a bit late to change that now + <mcsim> there could be two cases of data placement in backing store: 1/ all + asked data is placed consecutively; 2/ it is spread among backing + store. If pager gets data in one message it more like place it + consecutively. So to have data consecutive in each pager, each pager has + to try send data in one message. Having data placed consecutive is + important, since reading of such data is much more faster. + <braunr> mcsim: you're confusing things .. + <braunr> or you're not telling them properly + <mcsim> Ok. Let me try one more time + <braunr> since you're working *only* on pagein, not pageout, how do you + expect spread pages being sent in a single message be better than + multiple messages ? + <mcsim> braunr: I think about future :) + <braunr> ok + <braunr> but antrik is right, paging in too much can reduce performance + <braunr> so the default policy should be adjusted for both the worst case + (one page) and the average/best (some/mane contiguous pages) + <braunr> through measurement ideally + <antrik> mcsim: BTW, I still think implementing clustered pageout has + higher priority than implementing madvise()... but if the latter is less + work, it might still make sense to do it first of course :-) + <braunr> many* + <braunr> there aren't many users of madvise, true + <mcsim> antrik: Implementing madvise I expect to be very simple. It should + just translate call to vm_advise + <antrik> well, that part is easy of course :-) so you already implemented + vm_advise itself I take it? + <mcsim> antrik: Yes, that was also quite easy. 
+ <antrik> great :-) + <antrik> in that case it would be silly of course to postpone implementing + the madvise() wrapper. in other words: never mind my remark about + priorities :-) + + +## IRC, freenode, #hurd, 2012-09-03 + + <mcsim> I try a test with ext2fs. It works, than I just recompile ext2fs + and it stops working, than I recompile it again several times and each + time the result is unpredictable. + <braunr> sounds like a concurrency issue + <mcsim> I can run the same test several times and ext2 works until I + recompile it. That's the problem. Could that be concurrency too? + <braunr> mcsim: without bad luck, yes, unless "several times" is a lot + <braunr> like several dozens of tries + + +## IRC, freenode, #hurd, 2012-09-04 + + <mcsim> hello. I want to tell that ext2fs translator, that I work on, + replaced for my system old variant that processed only single pages + requests. And it works with partitions bigger than 2 Gb. + <mcsim> Probably I'm not for from the end. + <mcsim> But it's worth to mention that I didn't fix that nasty bug that I + told yesterday about. + <mcsim> braunr: That bug sometimes appears after recompilation of ext2fs + and always disappears after sync or reboot. Now I'm going to finish + defpager and test other translators. + + +## IRC, freenode, #hurd, 2012-09-17 + + <mcsim> braunr: hello. Do you remember that you said that pager has to + inform kernel about appropriate cluster size for readahead? + <mcsim> I don't understand how kernel store this information, because it + does not know about such unit as "pager". + <mcsim> Can you give me an advice about how this could be implemented? + <youpi> mcsim: it can store it in the object + <mcsim> youpi: It too big overhead + <mcsim> youpi: at least from my pow + <mcsim> *pov + <braunr> mcsim: we discussed this already + <braunr> mcsim: there is no "pager" entity in the kernel, which is a defect + from my PoV + <braunr> mcsim: the best you can do is follow what the kernel already does + <braunr> that is, store this property per object$ + <braunr> we don't care much about the overhead for now + <braunr> my guess is there is already some padding, so the overhead is + likely to be amortized by this + <braunr> like youpi said + <mcsim> I remember that discussion, but I didn't get than whether there + should be only one or two values for all policies. Or each policy should + have its own values? + <mcsim> braunr: ^ + <braunr> each policy should have its own values, which means it can be + implemented with a simple static array somewhere + <braunr> the information in each object is a policy selector, such as an + index in this static array + <mcsim> ok + <braunr> mcsim: if you want to minimize the overhead, you can make this + selector a char, and place it near another char member, so that you use + space that was previously used as padding by the compiler + <braunr> mcsim: do you see what i mean ? + <mcsim> yes + <braunr> good + + +## IRC, freenode, #hurd, 2012-09-17 + + <mcsim> hello. May I add function krealloc to slab.c? + <braunr> mcsim: what for ? + <mcsim> braunr: It is quite useful for creating dynamic arrays + <braunr> you don't want dynamic arrays + <mcsim> why? + <braunr> they're expensive + <braunr> try other data structures + <mcsim> more expensive than linked lists? 
+ <braunr> depends + <braunr> but linked lists aren't the only other alternative + <braunr> that's why btrees and radix trees (basically trees of arrays) + exist + <braunr> the best general purpose data structure we have in mach is the red + black tree currently + <braunr> but always think about what you want to do with it + <mcsim> I want to store there sets of sizes for different memory + policies. I don't expect this array to be big. But for sure I can use + rbtree for it. + <braunr> why not a static array ? + <braunr> arrays are perfect for known data sizes + <mcsim> I expect from pager to supply its own sizes. So at the beginning in + this array is only default policy. When pager wants to supply it own + policy kernel lookups table of advice. If this policy is new set of sizes + then kernel creates new entry in table of advice. + <braunr> that would mean one set of sizes for each object + <braunr> why don't you make things simple first ? + <mcsim> Object stores only pointer to entry in this table. + <braunr> but there is no pager object shared by memory objects in the + kernel + <mcsim> I mean struct vm_object + <braunr> so that's what i'm saying, one set per object + <braunr> it's useless overhead + <braunr> i would really suggest using a global set of policies for now + <mcsim> Probably, I don't understand you. Where do you want to store this + static array? + <braunr> it's a global one + <mcsim> "for now"? It is not a problem to implement a table for local + advice, using either rbtree or dynamic array. + <braunr> it's useless overhead + <braunr> and it's not a single integer, you want a whole container per + object + <braunr> don't do anything fancy unless you know you really want it + <braunr> i'll link the netbsd code again as a very good example of how to + implement global policies that work more than decently for every file + system in this OS + <braunr> + http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/uvm/uvm_fault.c?rev=1.194&content-type=text/x-cvsweb-markup&only_with_tag=MAIN + <braunr> look for uvmadvice + <mcsim> But different translators have different demands. Thus changing of + global policy for one translator would have impact on behavior of another + one. + <braunr> i understand + <braunr> this isn't l4, or anything experimental + <braunr> we want something that works well for us + <mcsim> And this is acceptable? + <braunr> until you're able to demonstrate we need different policies, i'd + recommend not making things more complicated than they already are and + need to be + <braunr> why wouldn't it ? + <braunr> we've been discussing this a long time :/ + <mcsim> because every process runs in isolated environment and the fact + that there is something outside this environment, that has no rights to + do that, does it surprises me. + <braunr> ? + <mcsim> ok. let me dip in uvm code. Probably my questions disappear + <braunr> i don't think it will + <braunr> you're asking about the system design here, not implementation + details + <braunr> with l4, there are as you'd expect well defined components + handling policies for address space allocation, or paging, or whatever + <braunr> but this is mach + <braunr> mach has a big shared global vm server with in kernel policies for + it + <braunr> so it's ok to implement a global policy for this + <braunr> and let's be pragmatic, if we don't need complicated stuff, why + would we waste time on this ? + <mcsim> It is not complicated. 
+ <braunr> retaining a whole container for each object, whereas they're all + going to contain exactly the same stuff for years to come seems overly + complicated for me + <mcsim> I'm not going to create separate container for each object. + <braunr> i'm not following you then + <braunr> how can pagers upload their sizes in the kernel ? + <mcsim> I'm going to create a new container only for combination of cluster + sizes that are not present in table of advice. + <braunr> that's equivalent + <braunr> you're ruling out the default set, but that's just an optimization + <braunr> whenever a file system decides to use other sizes, the problem + will arise + <mcsim> Before creating a container I'm going to lookup a table. And only + than create + <braunr> a table ? + <mcsim> But there will be the same container for a huge bunch of objects + <braunr> how do you select it ? + <braunr> if it's a per pager container, remember there is no shared pager + object in the kernel, only ports to external programs + <mcsim> I'll give an example + <mcsim> Suppose there are only two policies. At the beginning we have table + {{random = 4096, sequential = 8096}}. Than pager 1 wants to add new + policy where random cluster size is 8192. He asks kernel to create it and + after this table will be following: {{random = 4096, sequential = 8192}, + {random = 8192, sequential = 8192}}. If pager 2 wants to create the same + policy as pager 1, kernel will lockup table and will not create new + entry. So the table will be the same. + <mcsim> And each object has link to appropriate table entry + <braunr> i'm not sure how this can work + <braunr> how can pagers 1 and 2 know the sizes are the same for the same + policy ? + <braunr> (and actually they shouldn't) + <mcsim> For faster lookup there will be create hash keys for each entry + <braunr> what's the lookup key ? + <mcsim> They do not know + <mcsim> The kernel knows + <braunr> then i really don't understand + <braunr> and how do you select sizes based on the policy ? + <braunr> and how do you remove unused entries ? + <braunr> (ok this can be implemented with a simple ref counter) + <mcsim> "and how do you select sizes based on the policy ?" you mean at + page fault? + <braunr> yes + <mcsim> entry or object keeps pointer to appropriate entry in the table + <braunr> ok your per object data is a pointer to the table entry and the + policy is the index inside + <braunr> so you really need a ref counter there + <mcsim> yes + <braunr> and you need to maintain this table + <braunr> for me it's uselessly complicated + <mcsim> but this keeps design clear + <braunr> not for me + <braunr> i don't see how this is clearer + <braunr> it's just more powerful + <braunr> a power we clearly don't need now + <braunr> and in the following years + <braunr> in addition, i'm very worried about the potential problems this + can introduce + <mcsim> In fact I don't feel comfortable from the thought that one + translator can impact on behavior of another. + <braunr> simple example: the table is shared, it needs a lock, other data + structures you may have added in your patch may also need a lock + <braunr> but our locks are noop for now, so you just can't be sure there is + no deadlock or other issues + <braunr> and adding smp is a *lot* more important than being able to select + precisely policy sizes that we're very likely not to change a lot + <braunr> what do you mean by "one translator can impact another" ? 
+ <mcsim> As I understand your idea (I haven't read uvm code yet) that there + is a global table of cluster sizes for different policies. And every + translator can change values in this table. That is what I mean under one + translator will have an impact on another one. + <braunr> absolutely not + <braunr> translators *can't* change sizes + <braunr> the sizes are completely static, assumed to be fit all + <braunr> -be + <braunr> it's not optimial but it's very simple and effective in practice + <braunr> optimal* + <braunr> and it's not a table of cluster sizes + <braunr> it's a table of pages before/after the faulted one + <braunr> this reflects the fact tha in mach, virtual memory (implementation + and policy) is in the kernel + <braunr> translators must not be able to change that + <braunr> let's talk about pagers here, not translators + <mcsim> Finally I got you. This is an acceptable tradeoff. + <braunr> it took some time :) + <braunr> just to clear something + <braunr> 20:12 < mcsim> For faster lookup there will be create hash keys + for each entry + <braunr> i'm not sure i understand you here + <mcsim> To found out if there is such policy (set of sizes) in the table we + can lookup every entry and compare each value. But it is better to create + a hash value for set and thus find equal policies. + <braunr> first, i'm really not comfortable with hash tables + <braunr> they really need careful configuration + <braunr> next, as we don't expect many entries in this table, there is + probably no need for this overhead + <braunr> remember that one property of tables is locality of reference + <braunr> you access the first entry, the processor automatically fills a + whole cache line + <braunr> so if your table fits on just a few, it's probably faster to + compare entries completely than to jump around in memory + <mcsim> But we can sort hash keys, and in this way find policies quickly. + <braunr> cache misses are way slower than computation + <braunr> so unless you have massive amounts of data, don't use an optimized + container + <mcsim> (20:38:53) braunr: that's why btrees and radix trees (basically + trees of arrays) exist + <mcsim> and what will be the key? + <braunr> i'm not saying to use a tree instead of a hash table + <braunr> i'm saying, unless you have many entries, just use a simple table + <braunr> and since pagers don't add and remove entries from this table + often, it's on case reallocation is ok + <braunr> one* + <mcsim> So here dynamic arrays fit the most? + <braunr> probably + <braunr> it really depends on the number of entries and the write ratio + <braunr> keep in mind current processors have 32-bits or (more commonly) + 64-bits cache line sizes + <mcsim> bytes probably? + <braunr> yes bytes + <braunr> but i'm not willing to add a realloc like call to our general + purpose kernel allocator + <braunr> i don't want to make it easy for people to rely on it, and i hope + the lack of it will make them think about other solutions instead :) + <braunr> and if they really want to, they can just use alloc/free + <mcsim> Under "other solutions" you mean trees? 
+ <braunr> i mean anything else :) + <braunr> lists are simple, trees are elegant (but add non negligible + overhead) + <braunr> i like trees because they truely "gracefully" scale + <braunr> but they're still O(log n) + <braunr> a good hash table is O(1), but must be carefully measured and + adjusted + <braunr> there are many other data structures, many of them you can find in + linux + <braunr> but in mach we don't need a lot of them + <mcsim> Your favorite data structures are lists and trees. Next, what + should you claim, is that lisp is your favorite language :) + <braunr> functional programming should eventually rule the world, yes + <braunr> i wouldn't count lists are my favorite, which are really trees + <braunr> as* + <braunr> there is a reason why red black trees back higher level data + structures like vectors or maps in many common libraries ;) + <braunr> mcsim: hum but just to make it clear, i asked this question about + hashing because i was curious about what you had in mind, i still think + it's best to use static predetermined values for policies + <mcsim> braunr: I understand this. + <braunr> :) + <mcsim> braunr: Yeah. You should be cautious with me :) + + +## IRC, freenode, #hurd, 2012-09-21 + + <antrik> mcsim: there is only one cluster size per object -- it depends on + the properties of the backing store, nothing else. + <antrik> (while the readahead policies depend on the use pattern of the + application, and thus should be selected per mapping) + <antrik> but I'm still not convinced it's worthwhile to bother with cluster + size at all. do other systems even do that?... + + +## IRC, freenode, #hurd, 2012-09-23 + + <braunr> mcsim: how long do you think it will take you to polish your gsoc + work ? + <braunr> (and when before you begin that part actually, because we'll to + review the whole stuff prior to polishing it) + <mcsim> braunr: I think about 2 weeks + <mcsim> But you may already start review it, if you're intended to do it + before I'll rearrange commits. + <mcsim> Gnumach, ext2fs and defpager are ready. I just have to polish the + code. + <braunr> mcsim: i don't know when i'll be able to do that + <braunr> so expect a few weeks on my (our) side too + <mcsim> ok + <braunr> sorry for being slow, that's how hurd development is :) + <mcsim> What should I do with libc patch that adds madvise support? + <mcsim> Post it to bug-hurd? + <braunr> hm probably the same i did for pthreads, create a topic branch in + glibc.git + <mcsim> there is only one commit + <braunr> yes + <braunr> (mine was a one liner :p) + <mcsim> ok + <braunr> it will probably be a debian patch before going into glibc anyway, + just for making sure it works + <mcsim> But according to term. I expect that my study begins in a week and + I'll have to do some stuff then, so actually probably I'll need a week + more. + <braunr> don't worry, that's expected + <braunr> and that's the reason why we're slow + <mcsim> And what should I do with large store patch? + <braunr> hm good question + <braunr> what did you do for now ? + <braunr> include it in your work ? + <braunr> that's what i saw iirc + <mcsim> Yes. It consists of two parts. + <braunr> the original part and the modificaionts ? + <braunr> modifications* + <braunr> i think youpi would know better about that + <mcsim> First (small) adds notification to libpager interface and second + one adds support for large stores. 
+ <braunr> i suppose we'll probably merge the large store patch at some point + anyway + <mcsim> Yes both original and modifications + <braunr> good + <mcsim> I'll split these parts to different commits and I'll try to make + support for large stores independent from other work. + <braunr> that would be best + <braunr> if you can make it so that, by ommitting (or including) one patch, + we can add your patches to the debian package, it would be great + <braunr> (only with regard to the large store change, not other potential + smaller conflicts) + <mcsim> braunr: I also found several bugs in defpager, that I haven't fixed + since winter. + <braunr> oh + <mcsim> seems nobody hasn't expect them. + <braunr> i'm very interested in those actually (not too soon because it + concerns my work on pageout, which is postponed after pthreads and + select) + <mcsim> ok. than I'll do it first. + + +## IRC, freenode, #hurd, 2012-09-24 + + <braunr> mcsim: what is vm_get_advice_info ? + <mcsim> braunr: hello. It should supply some machine specific parameters + regarding clustered reading. At the moment it supplies only maximal + possible size of cluster. + <braunr> mcsim: why such a need ? + <mcsim> It is used by defpager, as it can't allocate memory dynamically and + every thread has to allocate maximal size beforehand + <braunr> mcsim: i see + + +## IRC, freenode, #hurd, 2012-10-05 + + <mcsim> braunr: I think it's not worth to separate large store patch for + ext2 and patch for moving it to new libpager interface. Am I right? + <braunr> mcsim: it's worth separating, but not creating two versions + <braunr> i'm not sure what you mean here + <mcsim> First, I applied large store patch, and than I was changing patched + code, to make it work with new libpager interface. So changes to make + ext2 work with new interface depend on large store patch. + <mcsim> braunr: ^ + <braunr> mcsim: you're not forced to make each version resulting from a new + commit work + <braunr> but don't make big commits + <braunr> so if changing an interface requires its users to be updated + twice, it doesn't make sense to do that + <braunr> just update the interface cleanly, you'll have one or more commits + that produce intermediate version that don't build, that's ok + <braunr> then in another, separate commit, adjust the users + <mcsim> braunr: The only user now is ext2. And the problem with ext2 is + that I updated not the version from git repository, but the version, that + I've got after applying the large store patch. So in other words my + question is follows: should I make a commit that moves to new interface + version of ext2fs without large store patch? + <braunr> you're asking if you can include the large store patch in your + work, and by extension, in the main branch + <braunr> i would say yes, but this must be discussed with others diff --git a/open_issues/select.mdwn b/open_issues/select.mdwn index 6bed94ca..12807e11 100644 --- a/open_issues/select.mdwn +++ b/open_issues/select.mdwn @@ -1395,6 +1395,114 @@ IRC, unknown channel, unknown date: [[libpthread]]. +## IRC, freenode, #hurd, 2012-08-07 + + <rbraun_hurd> anyone knows of applications extensively using non-blocking + networking functions ? + <rbraun_hurd> (well, networking functions in a non-blocking way) + <antrik> rbraun_hurd: X perhaps? + <antrik> it's single-threaded, so I guess it must be pretty async ;-) + <antrik> thinking about it, perhaps it's the reason it works so poorly on + Hurd... + <braunr> it does ? 
+ <rbraun_hurd> ah maybe at the client side, right + <rbraun_hurd> hm no, the client side is synchronous + <rbraun_hurd> oh by the way, i can use gitk on darnassys + <rbraun_hurd> i wonder if it's because of the select fix + <tschwinge> rbraun_hurd: If you want, you could also have a look if there's + any improvement for these: + http://www.gnu.org/software/hurd/open_issues/select.html (elinks), + http://www.gnu.org/software/hurd/open_issues/dbus.html, + http://www.gnu.org/software/hurd/open_issues/runit.html + <tschwinge> rbraun_hurd: And congratulations, again! :-) + <rbraun_hurd> tschwinge: too bad it can't be merged before the pthread port + :( + <antrik> rbraun_hurd: I was talking about server. most clients are probably + sync. + <rbraun_hurd> antrik: i guessed :) + <antrik> (thought certainly not all... multithreaded clients are not really + supported with xlib IIRC) + <rbraun_hurd> but i didn't have much trouble with X + <antrik> tried something pushing a lot of data? like, say, glxgears? :-) + <rbraun_hurd> why not + <rbraun_hurd> the problem with tests involving "a lot of data" is that it + can easily degenerate into a livelock + <antrik> yeah, sounds about right + <rbraun_hurd> (with the current patch i mean) + <antrik> the symptoms I got were general jerkiness, with occasional long + hangs + <rbraun_hurd> that applies to about everything on the hurd + <rbraun_hurd> so it didn't alarm me + <antrik> another interesting testcase is freeciv-gtk... it reporducibly + caused a thread explosion after idling for some time -- though I don't + remember the details; and never managed to come up with a way to track + down how this happens... + <rbraun_hurd> dbus is more worthwhile + <rbraun_hurd> pinotree: hwo do i test that ? + <pinotree> eh? + <rbraun_hurd> pinotree: you once mentioned dbus had trouble with non + blocking selects + <pinotree> it does a poll() with a 0s timeout + <rbraun_hurd> that's the non blocking select part, yes + <pinotree> you'll need also fixes for the socket credentials though, + otherwise it won't work ootb + <rbraun_hurd> right but, isn't it already used somehow ? + <antrik> rbraun_hurd: uhm... none of the non-X applications I use expose a + visible jerkiness/long hangs pattern... though that may well be a result + of general load patterns rather than X I guess + <rbraun_hurd> antrik: that's my feeling + <rbraun_hurd> antrik: heavy communication channels, unoptimal scheduling, + lack of scalability, they're clearly responsible for the generally + perceived "jerkiness" of the system + <antrik> again, I can't say I observe "general jerkiness". 
apart from slow + I/O the system behaves rather normally for the things I do + <antrik> I'm pretty sure the X jerkiness *is* caused by the socket + communication + <antrik> which of course might be a scheduling issue + <antrik> but it seems perfectly possible that it *is* related to the select + implementation + <antrik> at least worth a try I'd say + <rbraun_hurd> sure + <rbraun_hurd> there is still some work to do on it though + <rbraun_hurd> the client side changes i did could be optimized a bit more + <rbraun_hurd> (but i'm afraid it would lead to ugly things like 2 timeout + parameters in the io_select_timeout call, one for the client side, the + other for the servers, eh) + + +## IRC, freenode, #hurd, 2012-08-07 + + <braunr> when running gitk on [darnassus], yesterday, i could push the CPU + to 100% by simply moving the mouse in the window :p + <braunr> (but it may also be caused by the select fix) + <antrik> braunr: that cursor might be "normal" + <rbraunrh> antrik: what do you mean ? + <antrik> the 100% CPU + <rbraunh> antrik: yes i got that, but what would make it normal ? + <rbraunh> antrik: right i get similar behaviour on linux actually + <rbraunh> (not 100% because two threads are spread on different cores, but + their cpu usage add up to 100%) + <rbraunh> antrik: so you think as long as there are events to process, the + x client is running + <rbraunh> thath would mean latencies are small enough to allow that, which + is actually a very good thing + <antrik> hehe... sound kinda funny :-) + <rbraunh> this linear search on dequeue is a real pain :/ + + +## IRC, freenode, #hurd, 2012-08-09 + +`screen` doesn't close a window/hangs after exiting the shell. + + <rbraunh> the screen issue seems linked to select :p + <rbraunh> tschwinge: the term server may not correctly implement it + <rbraunh> tschwinge: the problem looks related to the term consoles not + dying + <rbraunh> http://www.gnu.org/software/hurd/open_issues/term_blocking.html + +[[Term_blocking]]. + + # See Also See also [[select_bogus_fd]] and [[select_vs_signals]]. diff --git a/open_issues/synchronous_ipc.mdwn b/open_issues/synchronous_ipc.mdwn index 57bcdda7..53d5d69d 100644 --- a/open_issues/synchronous_ipc.mdwn +++ b/open_issues/synchronous_ipc.mdwn @@ -62,3 +62,124 @@ From [[Genode RPC|microkernel/genode/rpc]]. <antrik> well, if you see places where blocking is done but failing would be more appropriate, try changing them I'd say... <braunr> it's not that easy :/ + + +# IRC, freenode, #hurd, 2012-08-18 + + <lcc> what is the deepest design mistake of the HURD/gnumach? + <braunr> lcc: async ipc + <savask> braunr: You mentioned that moving to L4 will create problems. Can + you name some, please? + <savask> I thought it was going to be faster on L4 + <braunr> the problem is that l4 *only* provides sync ipc + <braunr> so implementing async communication would require one seperated + thread for each instance of async communication + <savask> But you said that the deepest design mistake of Hurd is asynch + ipc. + <braunr> not the hurd, mach + <braunr> and hurd depends on it now + <braunr> i said l4 provides *only* sync ipc + <braunr> systems require async communication tools + <braunr> but they shouldn't be built entirely on top of them + <savask> Hmm, so you mean mach has bad asynch ipc? + <braunr> you can consider mach and l4 as two extremes in os design + <braunr> mach *only* has async ipc + <lcc> what was viengoos trying to explore? 
+ * savask is confused + <braunr> lcc: half-sync ipc :) + <braunr> lcc: i can't tell you more on that, i need to understand it better + myself before any explanation attempt + <savask> You say that mach problem is asynch ipc. And L4's problem is it's + sync ipc. That means problems are in either of them! + <braunr> exactly + <lcc> how did apple resolve issues with mach? + <savask> What is perfect then? A "golden middle"? + <braunr> lcc: they have migrating threads, which make most rpc behave as if + they used sync ipc + <braunr> savask: nothing is perfect :p + <mcsim> braunr: but why async ipc is the problem? + <braunr> mcsim: it requires in-kernel buffering + <savask> braunr: Yes, but we can't have problems everywhere o_O + <braunr> mcsim: this not only reduces communication performance, but + creates many resource usage problems + <braunr> mcsim: and potential denial of service, which is what we + experience most of the time when something in the hurd fails + <braunr> savask: there are problems we can live with + <mcsim> braunr: But this could be replaced by userspace server, isn't it? + <braunr> savask: this is what monolithic kernels do + <braunr> mcsim: what ? + <braunr> mcsim: this would be the same, this central buffering server would + suffer from the same kind of issue + <mcsim> braunr: async ipc. Buffer can hold special server + <mcsim> But there could be created several servers, and queue could have + limit. + <braunr> queue limits are a problem + <braunr> when a queue limit is reached, you either block (= sync ipc) or + lose a message + <braunr> to keep messaging reliable, mach makes senders block + <braunr> the problem is that async ipc is often used to avoid blocking + <braunr> so blocking when you don't expect it can create deadlocks + <braunr> savask: a good compromise is to use sync ipc most of the time, and + async ipc for a few special cases, like signals + <braunr> this is what okl4 does if i'm right + <braunr> i'm not sure of the details, but like many other projects they + realized current systems simply need good support for async ipc, so they + extended l4 or something on top of it to provide it + <braunr> it took years of research for very smart people to get to some + consensus like "sync ipc is better but async is needed too" + <braunr> personaly i don't like l4 :/ + <braunr> really not + <mcsim> braunr: Anyway there is some queue for messaging, but at the moment + if it overflows panics kernel. And with limited queue servers will panic. + <braunr> mcsim: it can't overflow + <braunr> mach blocks senders + <braunr> queuing basically means "block and possible deadlock" or "lose + messages and live with it" + <mcsim> So, deadlocks are still possible? + <braunr> of course + <braunr> have a look at the libpager debian patch and the discussion around + it + <braunr> it's a perfect example + <youpi> braunr: it makes gnu mach slow as hell sometimes, which I guess is + because all threads (which can ben 1000s) wake at the same time + <braunr> youpi: you mean are created ? + <braunr> because they'll have to wake in any case + <braunr> i can understand why creating lots of threads is slower, but + cthreads never destroyes kernel threads + <braunr> doesn't seem to be a mach problem, rather a cthreads one + <braunr> i hope we're able to remove the patch after pthreads are used + +[[libpthread]]. + + <mcsim> braunr: You state that hurd can't move to sync ipc, since it + depends on async ipc. But at the same time async ipc doesn't guarantee + that task wouldn't block. 
So, I don't understand why limited queues will
+ lead to more deadlocks?
+ <braunr> mcsim: async ipc can block because of queue limits
+ <braunr> mcsim: if you remove the limit, you remove the deadlock problem,
+ and replace it with denial of service
+ <braunr> mcsim: i didn't say the hurd can't move to sync ipc
+ <braunr> mcsim: i said it came to depend on async ipc as provided by mach,
+ and we would need to change that
+ <braunr> and it's tricky
+ <youpi> braunr: no, I really mean are woken. The timeout which gets dropped
+ by the patch makes threads wake after some time, to realize they should
+ go away. It's a hell long when all these threads wake at the same time
+ (because they got created at the same time)
+ <braunr> ahh
+
+ <antrik> savask: what is perfect regarding IPC is something nobody can
+ really answer... there are competing opinions on that matter. but we know
+ by know that the Mach model is far from ideal, and that the (original) L4
+ model is also problematic -- at least for implementing a UNIX-like system
+ <braunr> personally, if i'd create a system now, i'd use sync ipc for
+ almost everything, and implement posix-like signals in the kernel
+ <braunr> that's one solution, it's not perfect
+ <braunr> savask: actually the real answer may be "noone knows for now and
+ it still requires work and research"
+ <braunr> so for now, we're using mach
+ <antrik> savask: regarding IPC, the path explored by Viengoos (and briefly
+ Coyotos) seems rather promising to me
+ <antrik> savask: and yes, I believe that whatever direction we take, we
+ should do so by incrementally reworking Mach rather than jumping to a
+ completely new microkernel... diff --git a/open_issues/system_stats.mdwn b/open_issues/system_stats.mdwn new file mode 100644 index 00000000..9a13b29a --- /dev/null +++ b/open_issues/system_stats.mdwn @@ -0,0 +1,39 @@ +[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]] +
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_documentation]]
+
+There should be a page listing ways to get
+system statistics, how to interpret them, and some example/expected values.
+
+
+# IRC, freenode, #hurd, 2012-11-04
+
+ <mcsim> Hi, is that normal that memory cache "ipc_port" is 24 Mb already?
+ Some memory has been already swapped out.
+ <mcsim> Other caches are big too
+ <braunr> how many ports ?
+ <mcsim> 45922 + <braunr> yes it's normal + <braunr> ipc_port 0010 76 4k 50 45937 302050 + 24164k 4240k + <braunr> it's a bug in exim + <braunr> or triggered by exim, from time to time + <braunr> lots of ports are created until the faulty processes are killed + <braunr> the other big caches you have are vm_object and vm_map_entry, + probably because of a big build like glibc + <braunr> and if they remain big, it's because there was no memory pressure + since they got big + <braunr> memory pressure can only be caused by very large files on the + hurd, because of the limited page cache size (4000 objects at most) + <braunr> the reason you have swapped memory is probably because of a glibc + test that allocates a very large (more than 1.5 GiB iirc) block and fills + it + <mcsim> yes + <braunr> (a test that fails with the 2G/2G split of the debian kernel, but + not on your vanilla version btw) diff --git a/open_issues/term_blocking.mdwn b/open_issues/term_blocking.mdwn index 19d18d0e..39803779 100644 --- a/open_issues/term_blocking.mdwn +++ b/open_issues/term_blocking.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2009, 2011 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2009, 2011, 2012 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -117,6 +118,128 @@ noninvasive on`, attach to the *term* that GDB is using. [[2011-07-04]]. +# IRC, freenode, #hurd, 2012-08-09 + +In context of the [[select]] issue. + + <braunr> i wonder where the tty allocation is made + <braunr> it could simply be that current applications don't handle old BSD + ptys correctly + <braunr> hm no, allocation is fine + <braunr> does someone know why there is no term instance for /dev/ttypX ? + <braunr> showtrans says "/hurd/term /dev/ttyp0 pty-slave /dev/ptyp0" though + <youpi> braunr: /dev/ttypX share the same translator with /dev/ptypX + <braunr> youpi: but how ? + <youpi> see the main function of term + <youpi> it attaches itself to the other node + <youpi> with file_set_translator + <youpi> just like pfinet can attach itself to /servers/socket/26 too + <braunr> youpi: isn't there a possible race when the same translator tries + to sets itself on several nodes ? + <youpi> I don't know + <tschwinge> There is. + <braunr> i guess it would just faikl + <braunr> fail + <tschwinge> I remember some discussion about this, possibly in context of + the IPv6 project. + <braunr> gdb shows weird traces in term + <braunr> i got this earlier today: http://www.sceen.net/~rbraun/gdb.txt + <braunr> 0x805e008 is the ptyctl, the trivs control for the pty + <tschwinge> braunr: How do you mean »weird«? + <braunr> tschwinge: some peropen (po) are never destroyed + <tschwinge> Well, can't they possibly still be open? + <braunr> they shouldn't + <braunr> that's why term doesn't close cleany, why select still reports + readiness, and why screen loops on it + <braunr> (and why each ssh session uses a different pty) + <tschwinge> ... but only on darnassus, I think? (I think I haven't seen + this anywhere else.) + <braunr> really ? + <braunr> i had it on my virtual machines too + <tschwinge> But perhaps I've always been rebooting systems quickly enough + to not notice. + <tschwinge> OK, I'll have a look next time I boot mine. + <braunr> i suppose it's why you can't login anymore quickly when syslog is + running + +[[syslog]]? 
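+
+Reduced to its essence, the lifecycle bug diagnosed in the continuation of
+this log below looks like the following model.  This is a deliberately
+simplified illustration of the logic in term's `ptyio.c`, not the actual
+code:
+
+    #include <errno.h>
+
+    /* Reduced model of term's pty state, based on the diagnosis below.  */
+    struct trivfs_control;
+    static struct trivfs_control *ptyctl;  /* the pty's control port */
+    static int ptyopen;                    /* pty side currently open */
+
+    static int
+    pty_open_hook (void)
+    {
+      if (ptyopen)
+        return EBUSY;   /* what a second `cat /dev/ptyp0' runs into */
+      ptyopen = 1;
+      return 0;
+    }
+
+    static void
+    po_destroy_hook (struct trivfs_control *cntl)
+    {
+      /* The flag is only cleared for peropens created on the pty's
+         control port.  If the check misfires (cntl != ptyctl when it
+         should match), PTYOPEN stays set and the pty is busy forever.  */
+      if (cntl == ptyctl)
+        ptyopen = 0;
+    }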
+
+ <braunr> i've traced the problem to ptyio.c, where pty_open_hook returns
+ EBUSY because ptyopen is still true
+ <braunr> ptyopen remains true because pty_po_create_hook doesn't get called
+ <youpi> tschwinge: I've seen the pty issue on exodar too, and on my qemu
+ image too
+ <braunr> err, pty_po_destroy_hook
+ <tschwinge> OK.
+ <braunr> and pty_po_destroy_hook doesn't get called from users.c because
+ po->cntl != ptyctl
+ <braunr> which means, somehow, the pty never gets closed
+ <youpi> oddly enough it seems to happen on all qemu systems I have, and no
+ xen system I have
+ <braunr> Oo
+ <braunr> are they all (xen and qemu) up to date ?
+ <braunr> (so we can remove versions as a factor)
+ <tschwinge> Aha. I only hve Xen and real hardware.
+ <youpi> braunr: no
+ <braunr> youpi: do you know any obscur site about ptys ? :)
+ <youpi> no
+ <youpi> well, actually yes
+ <youpi> http://dept-info.labri.fr/~thibault/a (in french)
+ <braunr> :D
+ <braunr> http://www.linusakesson.net/programming/tty/index.php looks
+ interesting
+ <youpi> indeed
+
+
+## IRC, freenode, #hurdfr, 2012-08-09
+
+ <braunr> youpi: what I have the most trouble understanding is what a
+ "controlling tty" actually is
+ <youpi> it's the most obscure of the obscure :)
+ <braunr> if it's exclusive to one application, how is it supposed to behave
+ across a fork, etc..
+ <youpi> put simply, it's what makes ^C work
+ <braunr> oh yes, and that's surely where things blow up
+ <youpi> it's not exclusive, it's inherited
+ <braunr>
+ http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/bernstein-on-ttys/cttys.html
+
+
+## IRC, freenode, #hurd, 2012-08-10
+
+ <braunr> youpi: and just to be sure about the test procedure, i log on a
+ system, type tty, see e.g. ttyp0, log out, and in again, then tty returns
+ ttyp1, etc..
+ <youpi> yes
+ <braunr> youpi: and an open (e.g. cat) on /dev/ptyp0 returns EBUSY
+ <youpi> indeed
+ <braunr> so on xen it doesn't
+ <braunr> grmbl
+ <youpi> I've never seen it, more precisely
+ <braunr> i also have the problem with a non-accelerated qemu
+ <braunr> antrik: do you have the term problems we've seen on your bare
+ hardware ?
+ <antrik> I'm not sure what problem you are seeing exactly :-)
+ <braunr> antrik: when logging through ssh, tty first returns ttyp0, and the
+ second time (after logging out from the first session) ttyp1
+ <braunr> antrik: and term servers that have been used are then stuck in a
+ busy state
+ <antrik> braunr: my ptys seem to be reused just fine
+ <braunr> or perhaps they didn't have the bug
+ <braunr> antrik: that's so weird
+ <antrik> (I do *sometimes* get hanging ptys, but that's a different issue
+ -- these are *not* busy; they just hang when reused...)
+ <braunr> antrik: yes i saw that too
+ <antrik> braunr: note though that my hurd package is many months old...
+ <antrik> (in fact everything on this system)
+ <braunr> antrik: i didn't see anything relevant about the term server in
+ years
+ <braunr> antrik: what shell do you use ?
+ <antrik> yeah, but such errors could be caused by all kinds of changes in
+ other parts of the Hurd, glibc, whatever...
+ <antrik> bash
+
+
 # Formal Verification
 
 This issue may be a simple programming error, or it may be more complicated. diff --git a/open_issues/user-space_device_drivers.mdwn b/open_issues/user-space_device_drivers.mdwn index 25168fce..8cde8281 100644 --- a/open_issues/user-space_device_drivers.mdwn +++ b/open_issues/user-space_device_drivers.mdwn @@ -50,6 +50,65 @@ Also see [[device drivers and IO systems]].
 
 * I/O MMU.
+
+### IRC, freenode, #hurd, 2012-08-15
+
+ <carli2> hi. does hurd support mesa?
+ <braunr> carli2: software only, but yes
+ <carli2> :(
+ <carli2> so you did not solve the problem with the CS checkers and GPU DMA
+ for microkernels yet, right?
+ <braunr> cs = ?
+ <carli2> control stream
+ <carli2> the data sent to the gpu
+ <braunr> no
+ <braunr> and to be honest we're not currently trying to
+ <carli2> well, a microkernel containing cs checkers for each hardware is
+ not a microkernel any more
+ <braunr> the problem is having the ability to check
+ <braunr> or rather, giving only what's necessary to delegate checking to
+ mmus
+ <carli2> but maybe the kernel could have a smaller interface like a
+ function to check if a memory block is owned by a process
+ <braunr> i'm not sure what you refer to
+ <carli2> about DMA-capable devices you can send messages to
+ <braunr> carli2: dma must be delegated to a trusted server
+ <carli2> linux checks the data sent to these devices, parses them and
+ checks all pointers if they are in a memory range that the client is
+ allowed to read/write from
+ <braunr> the client ?
+ <carli2> in linux, 3d drivers are in user space, so the kernel side checks
+ the pointer sent to the GPU
+ <youpi> carli2: mach could do that as well
+ <braunr> well, there is a rather large part in kernel space too
+ <carli2> so in hurd I trust some drivers to not do evil things?
+ <braunr> those in the kernel yes
+ <carli2> what does "in the kernel" mean? afaik a microkernel only has
+ memory manager and some basic memory sharing and messaging functionality
+ <braunr> did you read about the hurd ?
+ <braunr> mach is considered a hybrid kernel, not a true microkernel
+ <braunr> even with all drivers outside, it's still a hybrid
+ <youpi> although we're to move some parts into userland :)
+ <youpi> braunr: ah, why?
+ <braunr> youpi: the vm part is too large
+ <youpi> ok
+ <braunr> the microkernel dogma is no policy inside the kernel
+ <braunr> "except scheduling because it's very complicated"
+ <braunr> but all modern systems have moved memory management outside the
+ kernel, leaving just the kernel abstraction inside
+ <braunr> the address space kernel abstraction
+ <braunr> and the two components required to make it work are what l4re
+ calls region mappers (the rough equivalent of our vm_map), which decides
+ how to allocate regions in an address space
+ <braunr> and the pager, like ours, which are already external
+ <carli2> i'm not an OS developer, i mostly develop games, web services and
+ sometimes I fix gpu drivers
+ <braunr> that was just FYI
+ <braunr> but yes, dma must be considered something privileged
+ <braunr> and the hurd doesn't have the infrastructure you seem to be
+ looking for
+
+
 ## I/O Ports
 
 * Security considerations.
@@ -63,8 +122,13 @@ Also see [[device drivers and IO systems]].
 * [[GNU Mach|microkernel/mach/gnumach]] is said to have a high overhead when
   doing RPC calls.
 
+
 ## System Boot
 
+A similar problem is described in
+[[community/gsoc/project_ideas/unionfs_boot]]; a solution still needs to be
+implemented.
+
+
 ### IRC, freenode, #hurd, 2011-07-27
 
 < braunr> btw, was there any formulation of the modifications required to
@@ -89,12 +153,270 @@ Also see [[device drivers and IO systems]].
 < Tekk_> mhm
 < braunr> s/disk/storage/
 
+
 ### IRC, freenode, #hurd, 2012-04-25
 
 <youpi> btw, remember the initrd thing?
 <youpi> I just came across task.c in libstore/ :)
 
+
+### IRC, freenode, #hurd, 2012-07-17
+
+ <bddebian> OK, here is a stupid question I have always had.
+ If you move PCI and disk drivers into userspace, how do you do the
+ initial bootstrap to get the system booting?
+ <braunr> that's hard
+ <braunr> basically you make the boot loader load all the components you
+ need in ram
+ <braunr> then you make it give each component something (ports) so they can
+ communicate
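+
+braunr's "give each component something (ports)" step can be pictured with
+a tiny sketch.  The function name is hypothetical; the special-port call
+itself is standard Mach:
+
+    /* Hand COMPONENT a send right to PORT as its bootstrap port; the
+       component can later retrieve it with task_get_bootstrap_port and
+       use it to reach the other boot-time servers.  */
+    #include <mach.h>
+    #include <mach/task_special_ports.h>
+
+    static kern_return_t
+    give_bootstrap_port (task_t component, mach_port_t port)
+    {
+      return task_set_special_port (component, TASK_BOOTSTRAP_PORT, port);
+    }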
+
+
+### IRC, freenode, #hurd, 2012-08-12
+
+ <antrik> braunr: so, about booting with userspace disk drivers
+ <antrik> after rereading the chapter in my thesis, I see that there aren't
+ really all that many interesting options...
+ <antrik> I pondered some variants involving a temporary boot filesystem
+ with handoff to the real root FS; but ultimately concluded with another
+ option that is slightly less elegant but probably gets a much better
+ usefulness/complexity ratio:
+ <antrik> just start the root filesystem as the first process as we used to;
+ only hack it so that initially it doesn't try to access the disk, but
+ instead gets the files from GRUB
+ <antrik> once the disk driver is operational, we flip a switch, and the
+ root filesystem starts reading stuff from disk normally
+ <antrik> transparently for all other processes
+ <bddebian> How does grub access the disk without drivers?
+ <antrik> bddebian: GRUB obviously has its own drivers... that's how it
+ loads the kernel and modules
+ <antrik> bddebian: basically, it would have to load additional modules for
+ all the components necessary to get the Hurd disk driver going
+ <bddebian> Right, why wouldn't that be possible?
+ <antrik> (I have some more crazy ideas too -- but these are mostly
+ orthogonal :-) )
+ <antrik> ?
+ <antrik> I'm describing this because I'm pretty sure it *is* possible :-)
+ <bddebian> That grub loads the kernel and whatever server/module gets
+ access to the disk
+ <antrik> not sure what you mean
+ <bddebian> Well as usual I probably don't know the proper terminology but
+ why couldn't grub load gnumach and the hurd "disk server" that contains
+ the userspace drivers?
+ <antrik> disk server?
+ <bddebian> Oh FFS whatever contains the disk drivers :)
+ <bddebian> diskdde, whatever :)
+ <antrik> actually, I never liked the idea of having a big driver blob very
+ much... ideally each driver should have its own file
+ <antrik> but that's admittedly beside the point :-)
+ <antrik> so to restate: in addition to gnumach, ext2fs.static, and ld.so,
+ in the new scenario GRUB will also load exec, the disk driver, any
+ libraries these two depend upon, and any additional infrastructure
+ involved in getting the disk driver running (for automatic probing or
+ whatever)
+ <antrik> probably some other Hurd core servers too, so we can have a more
+ complete POSIX environment for the disk driver to run in
+ <bddebian> There ya go :)
+ <antrik> the interesting part is modifying ext2fs so it will access only
+ the GRUB-provided files, until it is told that it's OK now to access the
+ real disk
+ <antrik> (and the mechanism how ext2 actually gets at the GRUB-provided
+ files)
+ <bddebian> Or write some new really small ext2fs? :)
+ <antrik> ?
+ <bddebian> I'm just talking out my butt. Something temporary that gets
+ disposed of when the real disk is available :)
+ <antrik> well, I mentioned above that I considered some handoff
+ schemes... but they would probably be more complex to implement than
+ doing the switchover internally in ext2
+ <bddebian> Ah
+ <bddebian> boot up in a ramdisk? :)
+ <antrik> (and the temporary FS would *not* be an ext2 obviously, but rather
+ some special ramdisk-like filesystem operating from GRUB-loaded files...)
+ <antrik> again, that would require a complicated handoff-scheme
+ <bddebian> Bah, what do I know? :)
+ <antrik> (well, you could of course go with a trivial chroot()... but that
+ would be ugly and inefficient, as the initial processes would still run
+ from the ramdisk)
+ <bddebian> Aren't most things running in memory initially anyway? At what
+ point must it have access to the real disk?
+ <braunr> antrik: but doesn't that require that disk drivers be statically
+ linked ?
+ <braunr> and having all disk drivers in separate tasks (which is what we
+ prefer to blobs as you put it) seems to pretty much forbid using static
+ linking
+ <braunr> hm actually, i don't see how any solution could work without
+ static linking, as it would create a recursion
+ <braunr> and the only one required is the one used by the root file system
+ <braunr> others can be run from the dynamically linked version
+ <braunr> antrik: i agree, it's a good approach, requiring only a slightly
+ more complicated boot script/sequence
+ <antrik> bddebian: at some point we have to access the real disk so we
+ don't have to work exclusively with stuff loaded by grub... but there is
+ no specific point where it *has* to happen. generally speaking, the
+ sooner the better
+ <antrik> braunr: why wouldn't that work with a dynamically linked disk
+ driver? we only need to make sure all required libraries are loaded by
+ grub too
+ <braunr> antrik: i have a problem with that approach :p
+ <braunr> antrik: it would probably require a reboot when those libraries
+ are upgraded, wouldn't it ?
+ <antrik> I'd actually wish we could run with a dynamically linked ext2fs as
+ well... but that would require a separate boot filesystem and some kind
+ of handoff approach, which would be much more complicated I fear...
+ <braunr> and if a driver is restarted, would it use those libraries too ?
+ and if so, how to find them ?
+ <braunr> but how can you run a dynamically linked root file system ?
+ <braunr> unless the libraries it uses are provided by something else, as
+ you said
+ <antrik> braunr: well, if you upgrade the libraries, *and* want the disk
+ driver to use the upgraded libraries, you are obviously in a tricky
+ situation ;-)
+ <braunr> yes
+ <antrik> perhaps you could tell ext2 to preload the new libraries before
+ restarting the disk driver...
+ <antrik> but that's a minor quibble anyways IMHO
+ <braunr> but that case isn't that important actually, since upgrading these
+ libraries usually means we're upgrading the system, which can imply a
+ reboot
+ <braunr> i don't think it is
+ <braunr> it looks very complicated to me
+ <braunr> think of restart as after a crash :p
+ <braunr> you can't preload stuff in that case
+ <antrik> uh? I don't see anything particularly complicated. but my point
+ was more that it's not a big thing if that's not implemented IMHO
+ <braunr> right
+ <braunr> it's not that important
+ <braunr> but i still think statically linking is better
+ <braunr> although i'm not sure about some details
+ <antrik> oh, you mean how to make the root filesystem use new libraries
+ without a reboot? that would be tricky indeed... but this is not possible
+ right now either, so that's not a regression
+ <braunr> i assume that, when statically linking, only the .o providing the
+ required symbols are included, right ?
+ <antrik> making the root filesystem restartable is a whole different epic
+ story ;-)
+ <braunr> antrik: not the root file system, but the disk driver
+ <braunr> but i guess it's the same
+ <antrik> no, it's not
+ <braunr> ah
+ <antrik> for the disk driver it's really not that hard I believe
+ <antrik> still some extra effort, but definitely doable
+ <braunr> with the preload you mentioned
+ <antrik> yes
+ <braunr> i see
+ <braunr> i don't think it's worth the trouble actually
+ <braunr> statically linking looks way simpler and should make for smaller
+ binaries than if libraries were loaded by grub
+ <antrik> no, I really don't want statically linked disk drivers
+ <braunr> why ?
+ <antrik> again, I'd prefer even ext2fs to be dynamic -- only that would be
+ much more complicated
+ <braunr> the point of dynamically linking is sharing
+ <antrik> while dynamic disk drivers do not require any extra effort beyond
+ loading the libraries with grub
+ <braunr> but if it means sharing big files that are seldom used (i assume
+ there is a lot of code that simply isn't used by hurd servers), i don't
+ see the point
+ <antrik> right. and with the approach I proposed that will work just as it
+ should
+ <antrik> err... what big files?
+ <braunr> glibc ?
+ <antrik> I don't get your point
+ <antrik> you prefer statically linking everything needed before the disk
+ driver runs (which BTW is much more than only the disk driver itself) to
+ using normal shared libraries like the rest of the system?...
+ <braunr> it's not "like the rest of the system"
+ <braunr> the libraries loaded by grub wouldn't be backed by the ext2fs
+ server
+ <braunr> they would be wired in memory
+ <braunr> you'd have two copies of them, the one loaded by grub, and the one
+ shared by normal executables
+ <antrik> no
+ <braunr> i prefer static linking because, if done correctly, the combined
+ size of the root file system and the disk driver should be smaller than
+ that of the rootfs+disk driver and libraries loaded by grub
+ <antrik> apparently I was not quite clear how my approach would work :-(
+ <braunr> probably not
+ <antrik> (preventing that is actually the reason why I do *not* want a
+ simple boot filesystem+chroot approach)
+ <braunr> an initramfs can be easily freed after init
+ <braunr> it wouldn't be a chroot but something a bit more involved like
+ switch_root in linux
+ <antrik> not if various servers use files provided by that init filesystem
+ <antrik> yes, that's the complex handoff I'm talking about
+ <braunr> yes
+ <braunr> that's one approach
+ <antrik> as I said, that would be a quite elegant approach (allowing a
+ dynamically linked ext2); but it would be much more complicated to
+ implement I believe
+ <braunr> how would it allow a dynamically linked ext2 ?
+ <braunr> how can the root file system be linked with code backed by itself
+ ?
+ <braunr> unless it requires wiring all its memory ?
+ <antrik> it would be loaded from the init filesystem before the handoff
+ <braunr> init isn't the problem here
+ <braunr> i understand how it would boot
+ <braunr> but then, you need to make sure the root fs is never used to
+ service page faults on its own address space
+ <braunr> or any address space it depends on, like the disk driver
+ <braunr> so this basically requires wiring all the system libraries, glibc
+ included
+ <braunr> why not
+ <antrik> ah. yes, that's something I covered in a separate section in my
+ thesis ;-)
+ <braunr> eh :)
+ <antrik> we have to do that anyways, if we want *any* dynamically linked
+ components (such as the disk driver) in the paging path
+ <braunr> yes
+ <braunr> and it should make swapping more reliable too
+ <antrik> so that adds a couple MiB of wired memory... I guess we will just
+ have to live with that
+ <braunr> yes it seems acceptable
+ <braunr> thanks
+ <antrik> (it is actually one reason why I want to avoid static linking as
+ much as possible... so at least we have to wire these libraries only
+ *once*)
+ <antrik> anyways, back to my "simpler" approach
+ <antrik> the idea is that a (static) ext2fs would still be the first task
+ running, and immediately able to serve filesystem access requests -- only
+ it would serve these requests from files preloaded by GRUB rather than
+ the actual disk driver
+ <braunr> i understand now
+ <antrik> until a switch is flipped telling it that now the disk driver (and
+ anything it depends upon) is operational
+ <braunr> you still need to make sure all this is wired
+ <antrik> yes
+ <antrik> that's orthogonal
+ <antrik> which is why I have a separate section about it :-)
+ <braunr> what was the relation with ggi ?
+ <antrik> none strictly speaking
+ <braunr> i'll rephrase it: how did it end up in your thesis ?
+ <antrik> I just covered all aspects of userspace drivers in one of the
+ "introduction" sections of my thesis
+ <braunr> ok
+ <antrik> before going into specifics of KGI
+ <antrik> (and throwing in along the way that most of the issues described
+ do not matter for KGI ;-) )
+ <braunr> hehe
+ <braunr> i'm wondering, do we have mlockall on the hurd ? it seems not
+ <braunr> that's something deeply missing in mach
+ <antrik> well, bootstrap in general *is* actually relevant for KGI as well,
+ because of console messages during boot... but the filesystem bootstrap
+ is mostly irrelevant there ;-)
+ <antrik> braunr: oh? that's a problem then... I just assumed we have it
+ <braunr> well, it's possible to implement MCL_CURRENT, but not MCL_FUTURE
+ <braunr> or at least, it would be a bit difficult
+ <braunr> every allocation would need to be aware of that property
+ <braunr> it's better to have it managed by the vm system
+ <braunr> mach-defpager has its own version of vm_allocate for that
+ <antrik> braunr: I don't think we care about MCL_FUTURE here
+ <antrik> hm, wait... MCL_CURRENT is fine for code, but it might indeed be a
+ problem for dynamically allocated memory :-(
+ <braunr> yes
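+
+For reference, an MCL_CURRENT-style mlockall could conceivably be built on
+the existing Mach primitives along the following lines.  This is a rough,
+untested sketch: it needs the privileged host port, it does nothing about
+MCL_FUTURE, and gnumach's vm_wire may differ in detail from what is
+assumed here.
+
+    /* Walk every region of TASK with vm_region and wire it down.  */
+    #include <mach.h>
+
+    kern_return_t
+    mlockall_current (host_priv_t host_priv, task_t task)
+    {
+      vm_address_t addr = 0;
+      for (;;)
+        {
+          vm_size_t size;
+          vm_prot_t prot, max_prot;
+          vm_inherit_t inherit;
+          boolean_t shared;
+          memory_object_name_t object;
+          vm_offset_t offset;
+          kern_return_t kr = vm_region (task, &addr, &size, &prot,
+                                        &max_prot, &inherit, &shared,
+                                        &object, &offset);
+          if (kr != KERN_SUCCESS)
+            return KERN_SUCCESS;    /* no more regions */
+          if (object != MACH_PORT_NULL)
+            mach_port_deallocate (mach_task_self (), object);
+          if (prot != VM_PROT_NONE)
+            {
+              kr = vm_wire (host_priv, task, addr, size, prot);
+              if (kr != KERN_SUCCESS)
+                return kr;
+            }
+          addr += size;
+        }
+    }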
+
+
 # Plan
 
 * Examine what other systems are doing.
@@ -116,6 +438,112 @@ Also see [[device drivers and IO systems]].
    and parallel port drivers, using `libtrivfs`.
 
+
+## I/O Server
+
+### IRC, freenode, #hurd, 2012-08-10
+
+ <braunr> usually you'd have an I/O server, and several device drivers
+ using it
+ <bddebian> Well maybe that's my question. Should there be unique servers
+ for say ISA, PCI, etc or could all of that be served by one "server"?
+ <braunr> forget about ISA
+ <bddebian> How? Oh because the ISA bus is now served via a PCI bridge?
+ <braunr> the I/O server would merely be there to help device drivers map
+ only what they require, and avoid conflicts
+ <braunr> because it's a relic of the past :p
+ <braunr> and because it requires too high privileges
+ <bddebian> But still exists in several PCs :)
+ <braunr> so usually, you'd directly ask the kernel for the I/O ports you
+ need
+ <mel-> so do floppy drives
+ <mel-> :)
+ <braunr> if i'm right, even the l4 guys do it that way
+ <braunr> he's right, some devices are still considered ISA
+ <bddebian> But that is where my confusion lies. Something has to figure
+ out what/where those I/O ports are
+ <braunr> and that's why i tell you to forget about it
+ <braunr> ISA has both statically allocated ports (the historical ones) and
+ others usually detected through PnP, when it works
+ <braunr> PCI is much cleaner, and memory mapped I/O is both better and much
+ more popular currently
+ <bddebian> So let's say I have a PCI SCSI card. I need some device driver
+ to know how to talk to that, right?
+ <bddebian> something is going to enumerate all the PCI devices and map them
+ to an address space
+ <braunr> bddebian: that would be the I/O server
+ <braunr> we'll call it the PCI server
+ <bddebian> OK, that is where I am headed. What if everything isn't PCI?
+ Is the "I/O server" generic enough?
+ <youpi> nowadays everything is PCI
+ <bddebian> So we are completely ignoring legacy hardware?
+ <braunr> we could have separate servers using a shared library that would
+ provide allocation routines like resource maps
+ <braunr> yes
+ <youpi> for what is not, the translator just needs to be run as root
+ <youpi> to get i/o perm from the kernel
+ <braunr> the idea for projects like ours, where the user base is very small
+ is: don't implement what you can't test
+ <youpi> bddebian: legacy can not be supported in a nice way, so for them we
+ can just afford a bad solution
+ <youpi> i.e. leave the driver in kernel
+ <braunr> right
+ <youpi> e.g. the keyboard
+ <bddebian> Well what if I have a USB keyboard? :-P
+ <braunr> that's a different matter
+ <youpi> USB keyboard is not legacy hardware
+ <youpi> it's usb
+ <youpi> which can be enumerated like pci
+ <braunr> and USB uses PCI
+ <youpi> and pci could be on usb :)
+ <braunr> so it's just a separate stack on top of the PCI server
+ <bddebian> Sure so would SCSI in my example above but is still a separate
+ bus
+ <braunr> netbsd has a very nice way of attaching drivers to buses
+ <youpi> bddebian: also, yes, and it can be enumerated
+ <bddebian> Which was my original question. This magic I/O server handles
+ all of the buses?
+ <youpi> no, just PCI, and then you'd have other servers for other busses
+ <braunr> i didn't mean that there would be *one* I/O server instance
+ <bddebian> So then it isn't a generic I/O server is it?
+ <bddebian> Ahhhh
+ <youpi> that way you can even put scsi over ppp or other crazy things
+ <braunr> it's more of an idea
+ <braunr> there would probably be a generic interface for basic stuff
+ <braunr> and i assume it could be augmented with specific (e.g. USB)
+ interfaces for servers that need more detailed communication
+ <braunr> (well, i'm pretty sure of it)
+ <bddebian> So the I/O server generalizes all functions, say read and write,
+ and then the PCI, USB, SCSI, whatever servers are contacted by it?
+ <braunr> no, not read and write
+ <braunr> resource allocation rather
+ <youpi> and enumeration
+ <braunr> probing perhaps
+ <braunr> bddebian: the goal of the I/O server is to make it possible for
+ device drivers to access the resources they need without a chance to
+ interfere with other device drivers
+ <braunr> (at least, that's one of the goals)
+ <braunr> so a driver would request the bus space matching the device(s) and
+ obtain that through memory mapping
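+
+To make this concrete, here is one shape such a "PCI server" interface
+could take.  These declarations are invented purely for illustration;
+nothing by these names exists in the Hurd:
+
+    #include <mach.h>
+    #include <stdint.h>
+
+    typedef struct pci_device *pci_device_t;
+
+    /* Find the first device matching VENDOR/DEVICE_ID.  */
+    kern_return_t pci_server_find (uint16_t vendor, uint16_t device_id,
+                                   pci_device_t *dev);
+
+    /* Map BAR number BAR of DEV into the caller's address space.  The
+       server hands out each range at most once and returns a mapping,
+       so the driver never deals with raw physical addresses and cannot
+       touch another driver's registers.  */
+    kern_return_t pci_server_map_bar (pci_device_t dev, int bar,
+                                      vm_address_t *addr, vm_size_t *size);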
+ <bddebian> Shouldn't that be in the "global address space"? Sorry if I am
+ using the wrong terminology
+ <youpi> well, the i/o server should also trigger the start of that driver
+ <youpi> bddebian: address space is not a matter for drivers
+ <braunr> bddebian: i'm not sure what you think of with "global address
+ space"
+ <youpi> bddebian: it's just a matter for the pci enumerator when (and if)
+ it places the BARs in physical address space
+ <youpi> drivers merely request mapping that, they don't need to know about
+ actual physical addresses
+ <braunr> i'm almost sure you lost him at BARs
+ <braunr> :(
+ <braunr> youpi: that's what i meant with probing actually
+ <bddebian> Actually I know BARs I have been reading on PCI :)
+ <bddebian> I suppose physical address space is more what I meant when I
+ used "global address space"
+ <braunr> i see
+ <youpi> bddebian: probably, yes
+
 
 # Documentation
 
 * [An Architecture for Device Drivers Executing as User-Level
diff --git a/open_issues/vm_map_kernel_bug.mdwn b/open_issues/vm_map_kernel_bug.mdwn
new file mode 100644
index 00000000..613c1317
--- /dev/null
+++ b/open_issues/vm_map_kernel_bug.mdwn
@@ -0,0 +1,54 @@
+[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]
+
+[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
+id="license" text="Permission is granted to copy, distribute and/or modify this
+document under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no Invariant
+Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license
+is included in the section entitled [[GNU Free Documentation
+License|/fdl]]."]]"""]]
+
+[[!tag open_issue_glibc open_issue_gnumach]]
+
+
+# IRC, freenode, #hurd, 2012-11-04
+
+ <tschwinge> braunr, pinotree, youpi: Has either of you already figured out
+ what [glibc]/sysdeps/mach/hurd/dl-sysdep.c:fmh »XXX loser kludge for
+ vm_map kernel bug« is about?
+ <pinotree> tschwinge: ETOOLOWLEVELFORME :)
+ <pinotree> tschwinge: 5bf62f2d3a8af353fac661b224fc1604d4de51ea added it
+ <braunr> tschwinge: no, but that looks interesting
+ <braunr> i'll have a look later
+ <tschwinge> Heh, "interesting". ;-)
+ <tschwinge> It seems related to vm_map's mask
+ parameter/ELF_MACHINE_USER_ADDRESS_MASK, though the latter is only used
+ in the mmap implementation in sysdeps/mach/hurd/dl-sysdep.c (in mmap.c, 0
+ is passed; perhaps due to the bug?).
+ <tschwinge> braunr: Anyway, I'd already welcome a patch to simply turn that
+ into a more comprehensible form.
+ <braunr> tschwinge: ELF_MACHINE_USER_ADDRESS_MASK is defined as "Mask
+ identifying addresses reserved for the user program, where the dynamic
+ linker should not map anything."
+ <braunr> about the vm_map parameter, which is a mask, it is described by
+ "Bits asserted in this mask must not be asserted in the address returned"
+ <braunr> so it's an alignment constraint
+ <braunr> the kludge disables alignment, apparently because gnumach doesn't
+ handle them correctly in some cases
+ <tschwinge> braunr: But ELF_MACHINE_USER_ADDRESS_MASK is 0xf8000000, so I'd
+ rather assume this means to restrict to addresses lower than 0xf8000000.
+ (What are higher ones reserved for?)
+ <braunr> tschwinge: the linker i suppose
+ <braunr> tschwinge: sorry, i don't understand what
+ ELF_MACHINE_USER_ADDRESS_MASK really is used for :/
+ <braunr> tschwinge: it looks unused for the other systems
+ <braunr> tschwinge: i guess it's just one way to partition the address
+ space, so that the linker knows where to load libraries and mmap can
+ still allocate large contiguous blocks
+ <braunr> tschwinge: 0xf8000000 means each "chunk" of linker/other blocks
+ are 128 MiB large
+ <tschwinge> braunr: OK, thanks for looking. I guess I'll ask Roland about
+ it.
+ <braunr> it could be that gnumach isn't good at aligning to large values
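+
+The mask semantics braunr quotes ("bits asserted in this mask must not be
+asserted in the address returned") can be illustrated with a small test
+program.  This is a hypothetical sketch, assuming gnumach's vm_map as
+declared via <mach.h>:
+
+    /* Ask for 2 MiB of anonymous memory at a 4 MiB-aligned address:
+       mask 0x3fffff forbids the low 22 bits in the returned address.  */
+    #include <mach.h>
+    #include <stdio.h>
+
+    int
+    main (void)
+    {
+      vm_address_t addr = 0;
+      kern_return_t kr =
+        vm_map (mach_task_self (), &addr, 2 * 1024 * 1024,
+                (vm_address_t) 0x3fffff, TRUE,
+                MACH_PORT_NULL /* MEMORY_OBJECT_NULL: zero-filled */, 0,
+                FALSE, VM_PROT_READ | VM_PROT_WRITE,
+                VM_PROT_READ | VM_PROT_WRITE, VM_INHERIT_DEFAULT);
+      if (kr != KERN_SUCCESS)
+        return 1;
+      printf ("mapped at 0x%lx\n", (unsigned long) addr);
+      return 0;
+    }
+
+[[!message-id "87fw4pb4c7.fsf@kepler.schwinge.homeip.net"]]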