[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach open_issue_hurd]]

[[community/gsoc/project_ideas/disk_io_performance]].

IRC, freenode, #hurd, 2011-02-16

    except for the kernel, everything in an address space is represented
      with a VM object
    those objects can represent anonymous memory (from malloc() or because
      of a copy-on-write) or files
    on classic Unix systems, these are files; on the Hurd, these are memory
      objects, backed by external pagers (like ext2fs)
    so when you read a file, the kernel maps it from ext2fs into your
      address space, and when you access the memory, a fault occurs
    the kernel determines it's a region backed by ext2fs, so it asks ext2fs
      to provide the data
    when the fault is resolved, your process goes on
    does the fault occur because Mach doesn't know how to access the
      memory?
    it occurs because Mach intentionally didn't back the region with
      physical memory
    the MMU is programmed not to know what is present in the memory region,
      or because it's read only (which is the case for COW faults)
    so that means this bit of memory is a buffer that ext2fs loads the file
      into, and then it is remapped to the application that asked for it
    more or less, yes
    ideally, it's directly written into the right pages; there is no
      intermediate buffer
    I see
    and as you told me before, currently the page faults are handled one at
      a time, which wastes a lot of time
    a certain amount of time
    enough to bother the user :)
    I've seen pages have a fixed size
    yes, use the PAGE_SIZE macro
    and when allocating memory, the size that's asked for is rounded up to
      the page size
    so if I have this correctly, it means that a file ext2fs provides could
      be split into a lot of pages
    yes
    once in memory, it is managed by the page cache, so that pages more
      actively used are kept longer than others, in order to minimize I/O
    ok, so a better page cache code would also improve overall performance
    and more RAM would help a lot, since we are strongly limited by the
      768 MiB limit, which reduces the page cache size a lot
    but the problem is that reading a whole file in means triggering many
      page faults just for one file
    if you want to stick to the page clustering thing, yes
    you want fewer page faults, so that there is less IPC between the
      kernel and the pager
    so either I make pages bigger, or I modify Mach so it can check up on a
      range of pages for faults before actually processing
    you *don't* change the page size
    ah, that's hardware, isn't it?
    in Mach, yes
    ok
    and usually, you want the page size to be the CPU page size
    I see
    current CPUs can support multiple page sizes, but it becomes quite hard
      to handle correctly, and bigger page sizes mean more fragmentation,
      so it only suits machines with large amounts of RAM, which isn't the
      case for us
    ok, so I'll try the second approach then
    that's what I'd recommend
    ok
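To make the arithmetic above concrete, here is a small standalone C program mirroring the page rounding described in the log. On Mach, `PAGE_SIZE` and the `round_page()` macro come from `<mach/vm_param.h>`; the demo redefines them locally (assuming the usual 4 KiB x86 page size) so it compiles anywhere. The point is simply that an N-page file mapped through ext2fs costs N kernel/pager round trips when faults are handled one page at a time.

    #include <stdio.h>

    /* Local stand-ins for Mach's PAGE_SIZE and round_page() from
       <mach/vm_param.h>, assuming the usual 4 KiB x86 page size.  */
    #define PAGE_SIZE 4096UL

    static unsigned long
    round_page (unsigned long size)
    {
      /* Allocation sizes are rounded up to a whole number of pages.  */
      return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    }

    int
    main (void)
    {
      unsigned long file_size = (1UL << 20) + 123;  /* a file a bit over 1 MiB */
      unsigned long pages = round_page (file_size) / PAGE_SIZE;

      /* With page faults resolved one at a time, each of these pages is a
         separate fault and a separate IPC to the pager (e.g. ext2fs).  */
      printf ("%lu bytes -> %lu pages -> %lu pager requests\n",
              file_size, pages, pages);
      return 0;
    }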
---

IRC, freenode, #hurd, 2011-02-16

    etenil: OSF Mach does have clustered paging BTW; so that's one place to
      start looking...
    (KAM ported the OSF code to gnumach IIRC)
    there is also an existing patch for clustered paging in libpager, which
      needs some adaptation
    the biggest part of the task is probably modifying the Hurd servers to
      use the new interface
    but as I said, KAM's code should be available through google, and can
      serve as a starting point
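For orientation, the sketch below contrasts the current one-page-per-request model with what a clustered interface could look like. The single-page shape follows libpager's `pager_read_page` callback as I recall it (check `<hurd/pager.h>` for the real declaration); the multi-page variant, including the name `pager_read_pages` and its parameters, is purely hypothetical and only stands in for whatever interface the OSF/KAM code and the libpager patch actually define. The Mach types are replaced by local typedefs so the fragment compiles outside a Hurd tree.

    #include <stddef.h>
    #include <stdint.h>

    /* Local stand-ins for the Mach types so this compiles anywhere.  */
    typedef uintptr_t vm_offset_t;
    typedef uintptr_t vm_address_t;
    typedef size_t    vm_size_t;
    typedef int       error_t;

    struct user_pager_info;        /* opaque per-pager state, as in libpager */

    /* Current model: the kernel's data request for a single faulting page
       ends up in one callback like this, i.e. one RPC round trip between
       gnumach and the translator per PAGE_SIZE chunk.  */
    typedef error_t (*read_page_fn) (struct user_pager_info *upi,
                                     vm_offset_t page,
                                     vm_address_t *buf,
                                     int *write_lock);

    /* Clustered model (hypothetical): the kernel asks for a run of pages
       around the faulting address in one request, and the pager supplies
       them in one contiguous buffer, so a single RPC covers npages pages.
       Adapting ext2fs and the other Hurd servers to the new interface
       would mean implementing a callback of roughly this shape.  */
    typedef error_t (*read_pages_fn) (struct user_pager_info *upi,
                                      vm_offset_t start_page,
                                      vm_size_t npages,
                                      vm_address_t *buf,
                                      int *write_lock);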