[[!meta copyright="Copyright © 2010 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

IRC, unknown channel, unknown date.

    youpi: I have found an interesting Mach problem, but I'm a bit scared of debugging it...
    (it is related to VM stuff)
    I have a memory region that is mapped by the iopl device (it's an mmio region -- graphics memory to be precise)
    when gdb tries to read that region with vm_read() (for a "print" command), it triggers a general protection trap...
    antrik: does the general protection trap kill the whole kernel or just gdb?

    kernel
    kernel: General protection trap (13), code=0
    pmap_copy_page(41000000,49f2000,1,0,1) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../i386/i386/phys.c:62
    vm_object_copy_slowly(209c1c54,41000000,1000,1,20994908) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_object.c:1150
    vm_object_copy_strategically(209c1c54,41000000,1000,20994908,2099490c) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_object.c:1669
    vm_map_copyin(209ba6e4,2c000,1000,0,25394ec8) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_map.c:3297
    vm_read(209ba6e4,2c000,1000,208d303c,25394f00) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_user.c:228
    _Xvm_read(2095cfe4,208d3010,0,1fff3e48,2095cfd4) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/kern/mach.server.c:1164
    ipc_kobject_server(2095cfd4,2095cfe4,28,127ca0,0) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../kern/ipc_kobject.c:201
    mach_msg_trap(1024440,3,28,30,2c) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../ipc/mach_msg.c:1367
    Bad frame pointer: 0x102441c
    BTW, is it useful at all to write down the parameters as well?...
    arguments I mean
    in the trace you mean?
    yes
    apparently the problem here is that the call to vm_fault_page() didn't perform its task
    which address is faulty?
    not sure what you mean
    ah shit the gpf wouldn't tell you
    does examine 49f2000 work?
    oh, wait, 4100000, that can't work
    +0
    which physical address is your mmio at?
    haven't tried it... but I can provoke the fault again if it helps :-)
    we have the 1GB limitation issue
    oh... lemme check
    no need to, I think the problem is that the iopl driver should check that it's not above phys_last_addr
    it's only vm_read() that fails, though...

    the actual program I debugged in gdb works perfectly fine
    yes, but that's because it's accessing the memory in a different way
    in the case of direct reads it just uses the page table
    in the case of vm_read() it uses kernel's projection
    but in that case it's not in the kernel projection
    phys = 1090519040
    that's it, it's beyond 1GB
    there's not much to do except changing mach's addressing organization
    yeah, that's the 0x41000000
    hm... I guess we could make the vm_read() bail out instead of crashing?...
    yes
    but there are a lot of places like this
    still, it's not exactly fun when trying to debug a program and the kernel crashes :-)
    right :)
    I could try to add the check... if you tell me where it belongs ;-)
    antrik: it's not just one place, that's the problem
    it's all the places that call pmap_zero_page, pmap_copy_page, copy_to_phys or copy_from_phys
    and since we do want to let the iopl device create such kind of page, in principle we have to cope with them all
    pmap_zero_page should be ok, though the rest isn't
    is that tricky, or just a matter of doing it in all places?
    hm... now it crashed in "normal" usage as well...
    hm... a page fault trap for a change...
    hm... now gdb tried to vm_read() something that is mapped to physical address 0x0... so I guess I fucked something up in the mapping code
    is it expected that such a vm_read() causes a kernel page fault, though?...

    youpi: ^
    nope
    in principle the check for validity of the page is done earlier
    physical address 0x0 makes sense, though
    OK, here is the trace:
    Kernel page fault (14), code=0 at address 0x0
    pmap_copy_page(0,6e54000,1,0,1) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../i386/i386/phys.c:62
    vm_object_copy_slowly(20a067b0,0,1000,1,0acacec) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_object.c:1150
    vm_object_copy_strategically(20a067b0,0,1000,20acacec,20acacf0) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_object.c:1669
    vm_map_copyin(20a0f1c4,120d000,1000,0,253cdec8) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_map.c:3297
    vm_read(20a0f1c4,120d000,1000,20a5703c,253cdf00) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../vm/vm_user.c:228
    _Xvm_read(20a52c80,20a57010,253cdf40,20ae33cc,20a52c70) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/kern/mach.server.c:1164
    ipc_kobject_server(20a52c70,20a52c80,28,20873074,20873070) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../kern/ipc_kobject.c:201
    mach_msg_trap(10247d0,3,28,30,2f) /build/buildd/gnumach-1.3.99.dfsg.cvs20090220/build-dbg/../ipc/mach_msg.c:1367
    Bad frame pointer: 0x10247ac
    seems to be exactly the same, except for the different arguments...
    hm... interesting... it *does* write something to the framebuffer, before it crashes...
    (which unfortunately makes it a bit hard to read the panic message... ;-) )
    heh :)
    wait, it must write to something else than the frame buffer as well, or else the debugger should just paint over the crap...
    or perhaps it crashes so hard that the debugger doesn't even work? ;-)
    hm... I guess the first thing I should actually do is finding out what's up with e2fsck... this makes testing crashes kinda annoying :-(
    oh, "interesting"... I ran it on one of my other hurd partitions, and it complained about an endless number of files...

    (perhaps all)
    however, the value for the normal files was different than for the passive translator nodes
    it doesn't happen only on crashes; it seems that all passive translators that are still in use at time of shutdown (or crash) have the offending bit set in the inode
    ouch... seems it doesn't write into the framebuffer after all, but rather scribbles all over the first 4 MiB of memory -- which includes also the VGA window, before it goes on killing the kernel...
    which iopl driver are you using ?
    ?
    the one from the debian patch?
    upstream, gnumach doesn't have an iopl device any more
    I guess so... standard Debian stuff here
    oh. how does X map the memory, then?
    X does
    yes
    ?
    X uses the iopl() device to access the video memory, yes
    I don't know if that was what you were asking for, but that's what I meant by my answer :)
    yeah, I know how it does *currently* do it -- I stole the code from there :-)
    my question is, how is X supposed to get at the framebuffer, when there is no iopl device anymore?
    ah, I hadn't noticed the "how" word
    in Debian there is
    !debian → !x?
    the clean "access device memory" interface is yet to be done
    err... that sounds like Xorg philosophy
    what, to wait for a nice interface ?
    "let's kill the old stuff, fuck regressions... maybe someone will figure out how to do it with the new stuff at some point. if not, not our problem"
    that's also a GNU philosophy
    ah, that one
    anyone know how device_map() is supposed to behave? the documentation isn't really clear...
    my understanding was that when an offset is specified, then the resulting object will be relative to that object; i.e. the offset of a later vm_map() on this object is applied on top of the object's internal offset...
    but that doesn't seem to be how it works for the iopl device, if I read the xf86 code correctly...
    yeah, the offset parameter seems a nop when doing device_map() on the iopl device
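
The "bail out instead of crashing" idea from the log can be illustrated with a small standalone sketch. This is not gnumach code: `sketch_phys_last_addr`, `phys_page_is_mapped`, `checked_copy_page` and the fixed 1 GiB limit are hypothetical stand-ins chosen for illustration, assuming only what the discussion states (the kernel can reach physical pages below `phys_last_addr`, and the mmio region sits at 0x41000000 = 1090519040, just above 1 GiB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page size and direct-mapping limit (assumed 1 GiB here,
   standing in for the kernel's phys_last_addr). */
#define SKETCH_PAGE_SIZE 0x1000u
static const uint64_t sketch_phys_last_addr = 0x40000000u;

/* True if the whole page starting at pa lies below the limit, i.e. the
   kernel could reach it through its own projection of physical memory. */
static bool phys_page_is_mapped(uint64_t pa)
{
    return pa < sketch_phys_last_addr
        && sketch_phys_last_addr - pa >= SKETCH_PAGE_SIZE;
}

/* Hypothetical guarded wrapper around a page copy: returns 0 when both
   pages are reachable, -1 otherwise, instead of touching an unmapped
   address and trapping. The copy itself is omitted in this sketch. */
static int checked_copy_page(uint64_t src, uint64_t dst)
{
    if (!phys_page_is_mapped(src) || !phys_page_is_mapped(dst))
        return -1;
    /* A real kernel would perform the copy via its direct mapping here. */
    return 0;
}
```

With the addresses from the first trace, `checked_copy_page(0x41000000, 0x49f2000)` returns -1 rather than faulting. As noted in the log, the same kind of guard would be needed at every caller of `pmap_copy_page`, `copy_to_phys` and `copy_from_phys`, which is why a single check is not enough.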