From 4d93ba7548629fff82aa03351132c85d478a8734 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Mon, 14 Feb 2011 17:08:47 +0100 Subject: open_issues/performance/io_system/read-ahead: New. --- open_issues/performance/io_system/read-ahead.mdwn | 48 +++++++++++++++++++++++ 1 file changed, 48 insertions(+) create mode 100644 open_issues/performance/io_system/read-ahead.mdwn (limited to 'open_issues/performance/io_system') diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn new file mode 100644 index 00000000..b3b139c7 --- /dev/null +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -0,0 +1,48 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_gnumach open_issue_hurd]] + +IRC, #hurd, freenode, 2011-02-13: + + youpi: Would libdiskfs/diskfs.h be in the right place to make + readahead functions? + etenil: no, it'd rather be at the memory management layer, + i.e. mach, unfortunately + because that's where you see the page faults + youpi: Linux also provides a readahead() function for higher level + applications. 
I'll probably have to add the same thing in a place that's + higher level than mach + well, that should just be hooked to the same common implementation + the man page for readahead() also states that portable + applications should avoid it, but it could be benefic to have it for + portability + it's not in posix indeed + +IRC, #hurd, freenode, 2011-02-14: + + youpi: I've investigated prefetching (readahead) techniques. One + called DiskSeen seems really efficient. I can't tell yet if it's patented + etc. but I'll keep you informed + don't bother with complicated techniques, even the most simple ones + will be plenty :) + it's not complicated really + the matter is more about how to plug it into mach + ok + then don't bother with potential pattents + etenil: please take a look at the work KAM did for last year's + GSoC + just use a trivial technique :) + ok, i'll just go the easy way then + + antrik: what was etenil referring to when talking about + prefetching ? + oh, madvise() stuff + i could help him with that -- cgit v1.2.3 From e8670035b2547fc300b83f7a031534edb02581db Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Wed, 16 Feb 2011 09:32:56 +0100 Subject: open_issues/performance/io_system/read-ahead: Link to Etenil's user page. --- open_issues/performance/io_system/read-ahead.mdwn | 2 ++ 1 file changed, 2 insertions(+) (limited to 'open_issues/performance/io_system') diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index b3b139c7..c3a0c1bb 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -46,3 +46,5 @@ IRC, #hurd, freenode, 2011-02-14: prefetching ? oh, madvise() stuff i could help him with that + +[[Etenil]] is now working in this area. 
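The readahead()-versus-portability point from the first discussion can be tried from user space today: `posix_fadvise()` is the POSIX counterpart of Linux's `readahead()`. A minimal sketch using Python's `os.posix_fadvise` wrapper (Unix-only; the scratch file and sizes are arbitrary):

```python
import os
import tempfile

# Write a scratch file, then hint the kernel to read it ahead.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"data" * 1024)          # 4 KiB of payload
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # posix_fadvise() is the portable cousin of Linux's readahead():
    # POSIX_FADV_SEQUENTIAL lets the kernel grow its readahead window,
    # POSIX_FADV_WILLNEED asks for the range (0 = whole file) to be
    # prefetched into the page cache.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    data = os.read(fd, 4096)         # ideally served from the cache
finally:
    os.close(fd)
    os.unlink(path)
```

These calls are pure hints; the read returns the same bytes whether or not the kernel acted on them.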
-- cgit v1.2.3 From c01c02cd0e656e7f569560ce38ddad96ff9bbb43 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Wed, 16 Feb 2011 10:12:26 +0100 Subject: open_issues/performance/io_system/read-ahead: Some more IRC. --- open_issues/performance/io_system/read-ahead.mdwn | 230 ++++++++++++++++++++++ 1 file changed, 230 insertions(+) (limited to 'open_issues/performance/io_system') diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index c3a0c1bb..241cda41 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -26,6 +26,8 @@ IRC, #hurd, freenode, 2011-02-13: portability it's not in posix indeed +--- + IRC, #hurd, freenode, 2011-02-14: youpi: I've investigated prefetching (readahead) techniques. One @@ -47,4 +49,232 @@ IRC, #hurd, freenode, 2011-02-14: oh, madvise() stuff i could help him with that +--- + [[Etenil]] is now working in this area. + +--- + +IRC, freenode, #hurd, 2011-02-15 + + oh, I'm looking into prefetching/readahead to improve I/O + performance + etenil: ok + etenil: that's actually a VM improvement, like samuel told you + yes + a true I/O improvement would be I/O scheduling + and how to implement it in a hurdish way + (or if it makes sense to have it in the kernel) + that's what I've been wondering too lately + concerning the VM, you should look at madvise() + my understanding is that Mach considers devices without really + knowing what they are + that's roughly the interface used both at the syscall() and the + kernel levels in BSD, which made it in many other unix systems + whereas I/O optimisations are often hard disk drives specific + that's true for almost any kernel + the device knowledge is at the driver level + yes + (here, I separate kernels from their drivers ofc) + but Mach also contains some drivers, so I'm going through the code + to find the apropriate place for these improvements + you shouldn't tough the 
drivers at all + touch + true, but I need to understand how it works before fiddling around + hm + not at all + the VM improvement is about pagein clustering + you don't need to know how pages are fetched + well, not at the device level + you need to know about the protocol between the kernel and + external pagers + ok + you could also implement pageout clustering + if I understand you well, you say that what I'd need to do is a + queuing system for the paging in the VM? + no + i'm saying that, when a page fault occurs, the kernel should + (depending on what was configured through madvise()) transfer pages in + multiple blocks rather than one at a time + communication with external pagers is already async, made through + regular ports + which already implement message queuing + you would just need to make the mapped regions larger + and maybe change the interface so that this size is passed + mmh + (also don't forget that page clustering can include pages *before* + the page which caused the fault, so you may have to pass the start of + that region too) + I'm not sure I understand the page fault thing + is it like a segmentation error? + I can't find a clear definition in Mach's manual + ah + it's a fundamental operating system concept + http://en.wikipedia.org/wiki/Page_fault + ah ok + I understand now + so what's currently happening is that when a page fault occurs, + Mach is transfering pages one at a time and wastes time + sometimes, transferring just one page is what you want + it depends on the application, which is why there is madvise() + our rootfs, on the other hand, would benefit much from such an + improvement + in UVM, this optimization is account for around 10% global + performance improvement + accounted* + not bad + well, with an improved page cache, I'm sure I/O would matter less + on systems with more RAM + (and another improvement would make mach support more RAM in the + first place !) 
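The madvise() interface pointed to above is visible from user space through, for example, Python's mmap wrapper. A minimal sketch, assuming a Unix host with Python 3.8 or later (the MADV_* constants are platform-specific, hence the guards):

```python
import mmap
import os
import tempfile

# Scratch file of four pages to map read-only.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * mmap.PAGESIZE * 4)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    # Advise the kernel about the access pattern; madvise() and the
    # MADV_* constants are platform-specific (Python >= 3.8, Unix),
    # hence the guards.
    if hasattr(mm, "madvise"):
        if hasattr(mmap, "MADV_SEQUENTIAL"):
            mm.madvise(mmap.MADV_SEQUENTIAL)   # reads will be sequential
        if hasattr(mmap, "MADV_WILLNEED"):
            mm.madvise(mmap.MADV_WILLNEED)     # prefetch the mapping now
    data = mm[:]   # the page faults taken here are what clustering batches
    mm.close()
finally:
    os.close(fd)
    os.unlink(path)
```

This only exercises the hint side; whether faults on the mapping are then resolved one page at a time or in clusters is exactly the kernel-side question being discussed.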
+ an I/O scheduler outside the kernel would be a very good project + IMO + in e.g. libstore/storeio + yes + but as i stated in my thesis, a resource scheduler should be as + close to its resource as it can + and since mach can host several operating systems, I/O schedulers + should reside near device drivers + and since current drivers are in the kernel, it makes sens to have + it in the kernel too + so there must be some discussion about this + doesn't this mean that we'll have to get some optimizations in + Mach and have the same outside of Mach for translators that access the + hardware directly? + etenil: why ? + well as you said Mach contains some drivers, but in principle, it + shouldn't, translators should do disk access etc, yes? + etenil: ok + etenil: so ? + well, let's say if one were to introduce SATA support in Hurd, + nothing would stop him/her to do so with a translator rather than in Mach + you should avoid the term translator here + it's really hurd specific + let's just say a user space task would be responsible for that + job, maybe multiple instances of it, yes + ok, so in this case, let's say we have some I/O optimization + techniques like readahead and I/O scheduling within Mach, would these + also apply to the user-space task, or would they need to be + reimplemented? 
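The storeio-hosted I/O scheduler floated above could, at its simplest, be a single elevator pass over the pending block requests before they are handed to the driver. A toy C-SCAN sketch with invented names, not Hurd code:

```python
def cscan_order(pending, head):
    """Order pending block numbers as one C-SCAN elevator pass:
    first the requests at or beyond the current head position in
    ascending order, then wrap around to the lowest ones, so the
    head sweeps in a single direction."""
    ahead = sorted(b for b in pending if b >= head)
    behind = sorted(b for b in pending if b < head)
    return ahead + behind

# Requests in submission order, head currently at block 50:
print(cscan_order([10, 99, 52, 7, 75], head=50))   # → [52, 75, 99, 7, 10]
```

A real scheduler would also merge adjacent requests and bound starvation, but the sorting pass is the core of the head-movement optimization.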
+ if you have user space drivers, there is no point having I/O + scheduling in the kernel + but we also have drivers within the kernel + what you call readahead, and I call pagein/out clustering, is + really tied to the VM, so it must be in Mach in any case + well + you either have one or the other + currently we have them in the kernel + if we switch to DDE, we should have all of them outside + that's why such things must be discussed + ok so if I follow you, then future I/O device drivers will need to + be implemented for Mach + currently, yes + but preferrably, someone should continue the work that has been + done on DDe so that drivers are outside the kernel + so for the time being, I will try and improve I/O in Mach, and if + drivers ever get out, then some of the I/O optimizations will need to be + moved out of Mach + let me remind you one of the things i said + i said I/O scheduling should be close to their resource, because + we can host several operating systems + now, the Hurd is the only system running on top of Mach + so we could just have I/O scheduling outside too + then you should consider neighbor hurds + which can use different partitions, but on the same device + currently, partitions are managed in the kernel, so file systems + (and storeio) can't make good scheduling decisions if it remains that way + but that can change too + a single storeio representing a whole disk could be shared by + several hurd instances, just as if it were a high level driver + then you could implement I/O scheduling in storeio, which would be + an improvement for the current implementation, and reusable for future + work + yes, that was my first instinct + and you would be mostly free of the kernel internals that make it + a nightmare + but youpi said that it would be better to modify Mach instead + he mentioned the page clustering thing + not I/O scheduling + theseare really two different things + ok + you *can't* implement page clustering outside Mach because Mach + 
implements virtual memory + both policies and mechanisms + well, I'd rather think of one thing at a time if that's alright + so what I'm busy with right now is setting up clustered page-in + which need to be done within Mach + keep clustered page-outs in mind too + although there are more constraints on those + yes + I've looked up madvise(). There's a lot of documentation about it + in Linux but I couldn't find references to it in Mach (nor Hurd), does it + exist? + well, if it did, you wouldn't be caring about clustered page + transfers, would you ? + be careful about linux specific stuff + I suppose not + you should implement at least posix options, and if there are + more, consider the bsd variants + (the Mach VM is the ancestor of all modern BSD VMs) + madvise() seems to be posix + there are system specific extensions + be careful + CONFORMING TO POSIX.1b. POSIX.1-2001 describes posix_madvise(3) + with constants POSIX_MADV_NORMAL, etc., with a behav‐ ior close to that + described here. There is a similar posix_fadvise(2) for file access. + MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK, MADV_HWPOISON, + MADV_MERGEABLE, and MADV_UNMERGEABLE are Linux- specific. + I was about to post these + ok, so basically madvise() allows tasks etc. to specify a usage + type for a chunk of memory, then I could apply the relevant I/O + optimization based on this + that's it + cool, then I don't need to worry about knowing what the I/O is + operating on, I just need to apply the optimizations as advised + that's convenient + ok I'll start working on this tonight + making a basic readahead shouldn't be too hard + readahead is a misleading name + is pagein better? 
+ applies to too many things, doesn't include the case where + previous elements could be prefetched + clustered page transfers is what i would use + page prefetching maybe + ok + you should stick to something that's already used in the + literature since you're not inventing something new + yes I've read a paper about prefetching + ok + thanks for your help braunr + sure + you're welcome + braunr: madvise() is really the least important part of the + picture... + very few applications actually use it. but pretty much all + applications will profit from clustered paging + I would consider madvise() an optional goody, not an integral part + of the implementation + etenil: you can find some stuff about KAM's work on + http://www.gnu.org/software/hurd/user/kam.html + not much specific though + thanks + I don't remember exactly, but I guess there is also some + information on the mailing list. check the archives for last summer + look for Karim Allah Ahmed + antrik: I disagree, madvise gives me a good starting point, even + if eventually the optimisations should run even without it + the code he wrote should be available from Google's summer of code + page somewhere... + antrik: right, i was mentioning madvise() because the kernel (VM) + interface is pretty similar to the syscall + but even a default policy would be nice + etenil: I fear that many bits were discussed only on IRC... so + you'd better look through the IRC logs from last April onwards... + ok + + at the beginning I thought I could put that into libstore + which would have been fine + + BTW, I remembered now that KAM's GSoC application should have a + pretty good description of the necessary changes... unfortunately, these + are not publicly visible IIRC :-( -- cgit v1.2.3 From bdd896e0b81cfb40c8d24a78f9022f6cd1ae5e8c Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 17 Feb 2011 14:15:11 +0100 Subject: open_issues/performance/io_system/clustered_page_faults: New. And some more IRC discussions. 
--- microkernel/mach/external_pager_mechanism.mdwn | 16 +++- microkernel/mach/gnumach/memory_management.mdwn | 14 +++ microkernel/mach/memory_object.mdwn | 4 +- open_issues/performance/io_system.mdwn | 3 +- .../io_system/clustered_page_faults.mdwn | 103 +++++++++++++++++++++ open_issues/performance/io_system/read-ahead.mdwn | 19 ++++ 6 files changed, 155 insertions(+), 4 deletions(-) create mode 100644 open_issues/performance/io_system/clustered_page_faults.mdwn (limited to 'open_issues/performance/io_system') diff --git a/microkernel/mach/external_pager_mechanism.mdwn b/microkernel/mach/external_pager_mechanism.mdwn index d9b6c2c8..05a6cc56 100644 --- a/microkernel/mach/external_pager_mechanism.mdwn +++ b/microkernel/mach/external_pager_mechanism.mdwn @@ -1,5 +1,5 @@ -[[!meta copyright="Copyright © 2002, 2007, 2008, 2010 Free Software Foundation, -Inc."]] +[[!meta copyright="Copyright © 2002, 2007, 2008, 2010, 2011 Free Software +Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -181,3 +181,15 @@ fashion. The server is not required to send a response to the kernel. (D) The manager then transfers the data to the storeio server which eventually sends it to disk. The device driver consumes the memory doing the equivalent of a `vm_deallocate`. + + +# Issues + + * [[open_issues/performance/io_system/read-ahead]] + + * [[open_issues/performance/io_system/clustered_page_faults]] + + +# GNU Hurd Usage + +Read about the [[Hurd's I/O path|hurd/io_path]]. 
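The clustered variant of this request/response cycle mostly changes how large a region the kernel asks the pager for on a fault. A hypothetical helper illustrating the range computation (not the actual Mach interface; the names and the default cluster size are invented):

```python
PAGE_SIZE = 4096

def cluster_range(fault_addr, object_size, cluster_pages=8):
    """Return (start, end) byte offsets of the run of pages to request
    from the external pager for a fault at fault_addr.  The run is
    aligned on a cluster boundary, so pages *before* the faulting one
    can be prefetched too, and is clamped to the memory object's size."""
    cluster_bytes = cluster_pages * PAGE_SIZE
    start = (fault_addr // cluster_bytes) * cluster_bytes
    end = min(start + cluster_bytes, object_size)
    return start, end

# A fault at offset 0x5123 in a 100 KiB object, with 8-page clusters:
print(cluster_range(0x5123, 100 * 1024))   # → (0, 32768)
```

One message round trip to the pager then covers the whole range instead of a single page, which is where the IPC savings come from.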
diff --git a/microkernel/mach/gnumach/memory_management.mdwn b/microkernel/mach/gnumach/memory_management.mdwn index 49a082e9..17dbe46f 100644 --- a/microkernel/mach/gnumach/memory_management.mdwn +++ b/microkernel/mach/gnumach/memory_management.mdwn @@ -34,3 +34,17 @@ IRC, freenode, #hurd, 2011-02-15 except for performance (because you can use larger - even very lage - pages without resetting the mmu often thanks to global pages, but that didn't exist at the time) + +IRC, freenode, #hurd, 2011-02-15 + + however, the kernel won't work in 64 bit mode without some changes + to physical memory management + and mmu management + (but maybe that's what you meant by physical memory) + +IRC, freenode, #hurd, 2011-02-16 + + antrik: youpi added it for xen, yes + antrik: but you're right, since mach uses a direct mapped kernel + space, the true problem is the lack of linux-like highmem support + which isn't required if the kernel space is really virtual diff --git a/microkernel/mach/memory_object.mdwn b/microkernel/mach/memory_object.mdwn index 2342145c..f32fe778 100644 --- a/microkernel/mach/memory_object.mdwn +++ b/microkernel/mach/memory_object.mdwn @@ -1,4 +1,4 @@ -[[!meta copyright="Copyright © 2002, 2003, 2010 Free Software Foundation, +[[!meta copyright="Copyright © 2002, 2003, 2010, 2011 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable @@ -29,3 +29,5 @@ last one tried is the *default memory manager* that resides in the microkernel, in contrast to most of the others. The default memory manager is needed because the microkernel can't wait infinitely for someone else to free the memory cache: it just calls the next memory manager hoping it to succeed. + +Read about [[GNU Mach's memory management|gnumach/memory_management]]. 
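The fallback chain described above, where the default memory manager is the last resort so the kernel never waits forever on a pageout, can be sketched as follows (hypothetical names, not actual kernel code):

```python
def page_out(page, managers, default_pager):
    """Offer a dirty page to each candidate memory manager in turn.
    The default pager (backed by swap space) comes last and is assumed
    to always accept, so the kernel never blocks indefinitely."""
    for name, accepts in managers:
        if accepts(page):
            return name                # this manager will back the page
    default_pager(page)
    return "default pager"

# Two hypothetical managers: the first refuses, the next accepts.
managers = [("busy-fs", lambda p: False), ("ext2fs", lambda p: True)]
print(page_out("page-7", managers, lambda p: None))   # → ext2fs
```

The essential property is that the loop terminates with the page backed somewhere, even when every regular manager declines.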
diff --git a/open_issues/performance/io_system.mdwn b/open_issues/performance/io_system.mdwn index dbf7012a..4af093ba 100644 --- a/open_issues/performance/io_system.mdwn +++ b/open_issues/performance/io_system.mdwn @@ -20,7 +20,8 @@ slow hard disk access. The reason for this slowness is lack and/or bad implementation of common optimization techniques, like scheduling reads and writes to minimize head movement; effective block caching; effective reads/writes to partial blocks; -reading/writing multiple blocks at once; and [[read-ahead]]. The +[[reading/writing multiple blocks at once|clustered_page_faults]]; and +[[read-ahead]]. The [[ext2_filesystem_server|hurd/translator/ext2fs]] might also need some optimizations at a higher logical level. diff --git a/open_issues/performance/io_system/clustered_page_faults.mdwn b/open_issues/performance/io_system/clustered_page_faults.mdwn new file mode 100644 index 00000000..3a187523 --- /dev/null +++ b/open_issues/performance/io_system/clustered_page_faults.mdwn @@ -0,0 +1,103 @@ +[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. 
A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +[[!tag open_issue_gnumach open_issue_hurd]] + +IRC, freenode, #hurd, 2011-02-16 + + exceptfor the kernel, everything in an address space is + represented with a VM object + those objects can represent anonymous memory (from malloc() or + because of a copy-on-write) + or files + on classic Unix systems, these are files + on the Hurd, these are memory objects, backed by external pagers + (like ext2fs) + so when you read a file + the kernel maps it from ext2fs in your address space + and when you access the memory, a fault occurs + the kernel determines it's a region backed by ext2fs + so it asks ext2fs to provide the data + when the fault is resolved, your process goes on + does the faul occur because Mach doesn't know how to access the + memory? + it occurs because Mach intentionnaly didn't back the region with + physical memory + the MMU is programmed not to know what is present in the memory + region + or because it's read only + (which is the case for COW faults) + so that means this bit of memory is a buffer that ext2fs loads the + file into and then it is remapped to the application that asked for it + more or less, yes + ideally, it's directly written into the right pages + there is no intermediate buffer + I see + and as you told me before, currently the page faults are handled + one at a time + which wastes a lot of time + a certain amount of time + enough to bother the user :) + I've seen pages have a fixed size + yes + use the PAGE_SIZE macro + and when allocating memory, the size that's asked for is rounded + up to the page size + so if I have this correctly, it means that a file ext2fs provides + could be split into a lot of pages + yes + once in memory, it is managed by the page cache + so that pages more actively used are kept longer than others + in order to minimize I/O + ok + so a better page cache code would also improve overall 
performance + and more RAM would help a lot, since we are strongly limited by + the 768 MiB limit + which reduces the page cache size a lot + but the problem is that reading a whole file in means trigerring + many page faults just for one file + if you want to stick to the page clustering thing, yes + you want less page faults, so that there are less IPC between the + kernel and the pager + so either I make pages bigger + or I modify Mach so it can check up on a range of pages for faults + before actually processing + you *don't* change the page size + ah + that's hardware isn't it? + in Mach, yes + ok + and usually, you want the page size to be the CPU page size + I see + current CPU can support multiple page sizes, but it becomes quite + hard to correctly handle + and bigger page sizes mean more fragmentation, so it only suits + machines with large amounts of RAM, which isn't the case for us + ok + so I'll try the second approach then + that's what i'd recommand + recommend* + ok + +--- + +IRC, freenode, #hurd, 2011-02-16 + + etenil: OSF Mach does have clustered paging BTW; so that's one + place to start looking... + (KAM ported the OSF code to gnumach IIRC) + there is also an existing patch for clustered paging in libpager, + which needs some adaptation + the biggest part of the task is probably modifying the Hurd + servers to use the new interface + but as I said, KAM's code should be available through google, and + can serve as a starting point + + diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index 241cda41..3ee30b5d 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -278,3 +278,22 @@ IRC, freenode, #hurd, 2011-02-15 BTW, I remembered now that KAM's GSoC application should have a pretty good description of the necessary changes... 
unfortunately, these are not publicly visible IIRC :-( + +--- + +IRC, freenode, #hurd, 2011-02-16 + + braunr: I've looked in the kernel to see where prefetching would + fit best. We talked of the VM yesterday, but I'm not sure about it. It + seems to me that the device part of the kernel makes more sense since + it's logically what manages devices, am I wrong? + etenil: you are + etenil: well + etenil: drivers should already support clustered sector + read/writes + ah + but yes, there must be support in the drivers too + what would really benefit the Hurd mostly concerns page faults, so + the right place is the VM subsystem + +[[clustered_page_faults]] -- cgit v1.2.3
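The read-ahead policy itself, as opposed to the clustered-fault mechanism discussed throughout, often amounts to a prefetch window that grows while accesses stay sequential and collapses on a seek. A toy sketch of such a heuristic, with invented names and a Linux-style doubling rule; it is an illustration of the policy, not code from any of the systems above:

```python
class ReadaheadWindow:
    """Track successive page accesses; double the prefetch window
    (up to a cap) while the stream looks sequential, and reset it to
    a single page as soon as a random seek is seen."""

    def __init__(self, max_pages=32):
        self.max_pages = max_pages
        self.window = 1
        self.next_expected = None      # page we expect if sequential

    def access(self, page):
        if page == self.next_expected:
            self.window = min(self.window * 2, self.max_pages)
        else:
            self.window = 1            # random access: stop prefetching
        self.next_expected = page + 1
        # Pages worth prefetching after the one just read:
        return list(range(page + 1, page + 1 + self.window))

ra = ReadaheadWindow()
for page in (0, 1, 2, 3, 100):         # sequential run, then a seek
    print(page, ra.access(page))
```

Such a policy could live in the pager or in a storeio-like server; combined with clustered page transfers it would decide *how far* ahead each clustered request reaches.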