From 12c341b917921eb631026ec44a284c4d884e5de6 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Wed, 6 Mar 2013 21:52:20 +0100 Subject: IRC. --- open_issues/performance/io_system/read-ahead.mdwn | 429 +++++++++++++++++++++- 1 file changed, 428 insertions(+), 1 deletion(-) (limited to 'open_issues/performance') diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn index 706e1632..be582e8a 100644 --- a/open_issues/performance/io_system/read-ahead.mdwn +++ b/open_issues/performance/io_system/read-ahead.mdwn @@ -1,4 +1,5 @@ -[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]] +[[!meta copyright="Copyright © 2011, 2012, 2013 Free Software Foundation, +Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this @@ -2556,3 +2557,429 @@ License|/fdl]]."]]"""]] you're asking if you can include the large store patch in your work, and by extension, in the main branch i would say yes, but this must be discussed with others + + +## IRC, freenode, #hurd, 2013-02-18 + + mcsim: so, currently reviewing gnumach + braunr: hello + mcsim: the review branch, right ? + braunr: yes + braunr: What do you start with? + memory refreshing + i see you added the advice twice, to vm_object and vm_map_entry + iirc, we agreed to only add it to map entries + am i wrong ? + let me see + the real question being: what do you use the object advice for ? + >iirc, we agreed to only add it to map entries + braunr: TBH, do not remember that. At some point we came to + conclusion that there should be only one advice. But I'm not sure if it + was final point. + maybe it wasn't, yes + that's why i've just reformulated the question + if (map_entry && (map_entry->advice != VM_ADVICE_DEFAULT)) + advice = map_entry->advice; + else + advice = object->advice; + ok + It just participates in determining actual advice + ok that's not a bad thing + let's keep it + please document VM_ADVICE_KEEP + and rephrase "How to handle page faults" in vm_object.h to + something like 'How to tune page fault handling" + mcsim: what's the point of VM_ADVICE_KEEP btw ? + braunr: Probably it is better to remove it? + well if it doesn't do anything, probably + braunr: advising was part of mo_set_attributes before + no it is redudant + i see + so yes, remove it + s/no/now + (don't waste time on a gcs-like changelog format for now) + i also suggest creating _vX branches + so we can compare the changes between each of your review branches + hm, minor coding style issues like switch(...) instead of switch + (...) + why does syscall_vm_advise return MACH_SEND_INTERRUPTED if the + target map is NULL ? + is it modelled after an existing behaviour ? + ah, it's the syscall version + braunr: every syscall does so + and the error is supposed to be used by user stubs to switch to + the rpc version + ok + hm + you've replaced obsolete port_set_select and port_set_backup calls + with your own + don't do that + instead, add your calls to the new gnumach interface + mcsim: out of curiosity, have you actually tried the syscall + version ? + braunr: Isn't it called by default? + i don't think so, no + than no + ok + you could name vm_get_advice_info vm_advice_info + regarding obsolete calls, did you say that only in regard of + port_set_* or all other calls too? + all of the + m + i missed one, yes + the idea is: don't change the existing interface + >you could name vm_get_advice_info vm_advice_info + could or should? i.e. rename? + i'd say should, to remain consistent with the existing similar + calls + ok + can you explain KERN_NO_DATA a bit more ? + i suppose it's what servers should answer for neighbour pages that + don't exist in the backend, right ? + kernel can ask server for some data to read them beforehand, but + server can be in situation when it does not know what data should be + prefetched + yes + ok + it is used by ext2 server + with large store patch + so its purpose is to allow the kernel to free the preallocated + pages that won't be used + do i get it right ? + no. + ext2 server has a buffer for pages and when kernel asks to read + pages ahead it specifies region of that buffer + ah ok + but consecutive pages in buffer does not correspond to consecutive + pages on disk + so, the kernel can only prefetch pages that were already read by + the server ? + no, it can ask a server to prefetch pages that were not read by + server + hum + ok + but in case with buffer, if buffer page is empty, server does not + know what to prefetch + i'm not sure i'm following + well, i'm sure i'm not following + what happens when the kernel requests data from a server, right + after a page fault ? + what does the message afk for ? + kernel is unaware regarding actual size of file where was page + fault because of buffer indirection, right? + i don't know what "buffer" refers to here + this is buffer in memory where ext2 server reads pages + with large store patch ext2 server does not map the whole disk, but + some of its pages + and it maps these pages in special buffer + that means that constructiveness of pages in memory does not mean + that they are consecutive on disk or logically (belong to the same file) + ok so it's a page pool + with unordered pages + but what do you mean when you say "server does not know what to + prefetch" + it normally has everything to determine that + For instance, page fault occurs that leads to reading of + 4k-file. But kernel does not know actual size of file and asks to + prefetch 16K bytes + yes + There is no sense to prefetch something that does not belong to + this file + yes but the server *knows* that + and server answers with KERN_NO_DATA + server should always say something about every page that was asked + then, again, isn't the purpose of KERN_NO_DATA to notify the + kernel it can release the preallocated pages meant for the non existing + data ? + (non existing or more generally non prefetchable) + yes + then + why did you answer no to + 15:46 < braunr> so its purpose is to allow the kernel to free the + preallocated pages that won't be used + is there something missing ? + (well obviously, notify the kernel it can go on with page fault + handling) + braunr: sorry, misunderstoo/misread + ok + so good, i got this right :) + i wonder if KERN_NO_DATA may be a bit too vague + people might confuse it with ENODATA + Actually, this is transformation of ENODATA + I was looking among POSIX error codes and thought that this is the + most appropriate + i'm not sure it is + first, it's about STREAMS, a commonly unused feature + and second, the code is obsolete + braunr: AFAIR purpose of KERN_NO_DATA is not only free + pages. Without this call something should hang + 15:59 < braunr> (well obviously, notify the kernel it can go on + with page fault handling) + yes + hm + sorry again + i don't see anything better for the error name for now + and it's really minor so let's keep it as it is + actually, ENODATA being obsolete helps here + ok, done for now, work calling + we'll continue later or tomorrow + braunr: ok + other than that, this looks ok on the kernel side for now + the next change is a bit larger so i'd like to take the time to + read it + braunr: ok + regarding moving calls in mach.defs, can I put them elsewhere? + gnumach.defs + you'll probably need to rebase your changes to get it + braunr: I'll rebase this later, when we finish with review + ok + keep the comments in a list then, not to forget + (logging irc is also useful) + + +## IRC, freenode, #hurd, 2013-02-20 + + mcsim: why does VM_ADVICE_DEFAULT have its own entry ? + braunr: this kind of fallback mode + i suppose that even random strategy could even read several pages + at once + yes + but then, why did you name it "default" ? + because it is assigned by default + ah + so you expect pagers to set something else + for all objects they create + yes + ok + why not, but add a comment please + at least until all pagers will support clustered reading + ok + even after that, it's ok + just say it's there to keep the previous behaviour by default + so people don't get the idea of changing it too easily + comment in vm_advice.h? + no, in vm_fault.C + right above the array + why does vm_calculate_clusters return two ranges ? + also, "Function PAGE_IS_NOT_ELIGIBLE is used to determine if", + PAGE_IS_NOT_ELIGIBLE doesn't look like a function + I thought make it possible not only prefetch range, but also free + some memory that is not used already + braunr: ^ + but didn't implement it :/ + don't overengineer it + reduce to what's needed + braunr: ok + braunr: do you think it's worth to implement? + no + braunr: it could be useful for sequential policy + describe what you have in mind a bit more please, i think i don't + have the complete picture + with sequential policy user supposed to read strictly in sequential + order, so pages that user is not supposed to read could be put in unused + list + what pages the user isn't supposed to read ? + if user read pages in increasing order than it is not supposed to + read pages that are right before the page where page fault occured + right ? + do you mean higher ? + that are before + before would be lower then + oh + "right before" + yes :) + why not ? + the initial assumption, that MADV_SEQUENTIAL expects *strict* + sequential access, looks wrong + remember it's just a hint + a user could just acces pages that are closer to one another and + still use MADV_SEQUENTIAL, expecting a speedup because pages are close + well ok, this wouldn't be wise + MADV_SEQUENTIAL should be optimized for true sequential access, + agreed + but i'm not sure i'm following you + but I'm not going to page these pages out. Just put in unused + list, and if they will be used later they will be move to active list + your optimization seem to be about freeing pages that were + prefetched and not actually accessed + what's the unused list ? + inactive list + ok + so that they're freed sooner + yes + well, i guess all neighbour pages should first be put in the + inactive list + iirc, pages in the inactive list aren't mapped + this would force another page fault, with a quick resolution, to + tell the vm system the page was actually used, and must become active, + and paged out later than other inactive pages + but i really think it's not worth doing it now + clustered pagins is about improving I/O + page faults without I/O are orders of magnitude faster than I/O + it wouldn't bring much right now + ok, I remove this, but put in TODO + I'm not sure that right list is inactive list, but the list that is + scanned to pageout pages to swap partition. There should be such list + both the active and inactive are + the active one is scanned when the inactive isn't large enough + (the current ratio of active pages is limited to 1/3) + (btw, we could try increasing it to 1/2) + iirc, linux uses 1/2 + your comment about unlock_request isn't obvious, i'll have to + reread again + i mean, the problem isn't obvious + ew, functions with so many indentation levels :/ + i forgot how ugly some parts of the mach vm were + mcsim: basically it's ok, i'll wait for the simplified version for + another pass + simplified? + 22:11 < braunr> reduce to what's needed + ok + and what comment? + your XXX in vm_fault.c + when calling vm_calculate_clusters + is m->unlock_request the same for all cluster or I should + recalculate it for every page? + s/all/whole + that's what i say, i'll have to come back to that later + after i have reviewed the userspace code i think + so i understand the interactions better + braunr: pushed v1 branch + braunr: "Move new calls to gnumach.defs file" and "Implement + putting pages in inactive list with sequential policy" are in my TODO + mcsim: ok + + +## IRC, freenode, #hurd, 2013-02-24 + + mcsim: where does the commit from neal (reworking libpager) come + from ? + (ok the question looks a little weird semantically but i think you + get my point) + braunr: you want me to give you a link to mail with this commit? + why not, yes + http://permalink.gmane.org/gmane.os.hurd.bugs/446 + ok so + http://lists.gnu.org/archive/html/bug-hurd/2012-06/msg00001.html + ok so, we actually have three things to review here + that libpager patch, the ext2fs large store one, and your work + mcsim: i suppose something in your work depends on neal's patch, + right ? + i mean, why did you work on top of it ? + Yes + All user level code + i see it adds some notifications + no + notifacations are for large store + ok + but the rest is for my work + but what does it do that you require ? + braunr: this patch adds support for multipage work. There were just + stubs that returned errors for chunks longer than one page before. + ok + for now, i'll just consider that it's ok, as well as the large + store patch + ok i've skipped all patches up to "Make mach-defpager process + multipage requests in m_o_data_request." since they're obvious + but this one isn't + mcsim: why is the offset member a vm_size_t in struct block ? + (these things matter for large file support on 32-bit systems) + braunr: It should be vm_offset_t, right? + yes + well + it seems so but + im not sure what offset is here + vm_offset is normally the offset inside a vm_object + and if we want large file support, it could become a 64-bit + integer + while vm_size_t is a size inside an address space, so it's either + 32 or 64-bit, depending on the address space size + but here, if offset is an offset inside an address space, + vm_size_t is fine + same question for send_range_parameters + braunr: TBH, I do not differ vm_size_t and vm_offset_t well + they can be easily confused yes + they're both offsets and sizes actually + they're integers + so here I used vm_offset_t because field name is offset + but vm_size_t is an offset/size inside an address space (a + vm_map), while vm_offset_t is an offset/size inside an object + braunr: I didn't know that + it's not clear at all + and it may not have been that clear in mach either + but i think it's best to consider them this way from now on + well, it's not that important anyway since we don't have large + file support, but we should some day :/ + i'm afraid we'll have it as a side effect of the 64-bit port + mcsim: just name them vm_offset_t when they're offsets for + consistency + but seems that I guessed, because I use vm_offset_t variables in + mo_ functions + well ok, but my question was about struct block + where you use vm_size_t + braunr: I consider this like a mistake + ok + moving on + in upload_range, there are two XXX comments + i'm not sure to understand + Second XXX I put because at the moment when I wrote this not all + hurd libraries and servers supported size different from vm_page_size + But then I fixed this and replaced vm_page_size with size in + page_read_file_direct + ok then update the comment accordingly + When I was adding third XXX, I tried to check everything. But I + still had felling that I forgot something. + No it is better to remove second and third XXX, since I didn't find + what I missed + well, that's what i mean by "update" :) + ok + and first XXX just an optimisation. Its idea is that there is no + case when the whole structure is used in one function. + ok + But I was not sure if was worth to do, because if there will appear + some bug in future it could be hard to find it. + I mean that maintainability decreases because of using union + So, I'd rather keep it like it is + how is struct send_range_parameters used ? + it doesn't looked to be something stored long + also, you're allowed to use GNU extensions + It is used to pass parameters from one function to another + which of them? + see + http://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/Unnamed-Fields.html#Unnamed-Fields + mcsim: if it's used to pass parameters, it's likely always on the + stack + braunr: I use it when necessary + we really don't care much about a few extra words on the stack + the difference in size would + agree + matter + oops + the difference in size would matter if a lot of those were stored + in memory for long durations + that's not the case, so the size isn't a problem, and you should + remove the comment + ok + mcsim: if i get it right, the libpager rework patch changes some + parameters from byte offset to page frame numbers + braunr: yes + why don't you check errors in send_range ? + braunr: it was absent in original code, but you're right, I should + do this + i'm not sure how to handle any error there, but at least an assert + I found a place where pager just panics + for now it's ok + your work isn't about avoiding panics, but there must be a check, + so if we can debug it and reach that point, we'll know what went wrong + i don't understand the prototype change of default_read :/ + it looks like it doesn't return anything any more + has it become asynchronous ? + It was returning some status before, but now it handles this status + on its own + hum + how ? + how do you deal with errors ? + in old code default_read returned kr and this kr was used to + determine what m_o_ function will be used + now default_read calls m_o_ on its own + ok -- cgit v1.2.3