1 files changed, 991 insertions, 0 deletions
diff --git a/open_issues/performance/io_system/read-ahead.mdwn b/open_issues/performance/io_system/read-ahead.mdwn
index 710c746b..706e1632 100644
--- a/open_issues/performance/io_system/read-ahead.mdwn
+++ b/open_issues/performance/io_system/read-ahead.mdwn
@@ -1565,3 +1565,994 @@ License|/fdl]]."]]"""]]
     <braunr> mcsim1: just use sane values inside the kernel :p
     <braunr> this simplifies things by only adding the new vm_advise call and
       not change the existing external pager interface
+
+
+## IRC, freenode, #hurd, 2012-07-12
+
+    <braunr> mcsim: so, to begin with, tell us what state you've reached please
+    <mcsim> braunr: I'm writing code for hurd and gnumach. For gnumach I'm
+      implementing memory policies now. RANDOM and NORMAL seem to work, but in
+      hurd I found an error that I made while editing ext2fs. So for now ext2fs
+      does not work
+    <braunr> policies ?
+    <braunr> what about mechanism ?
+    <mcsim> also I moved some translators to new interface.
+    <mcsim> It works too
+    <braunr> well that's impressive
+    <mcsim> braunr: I'm not sure yet that everything works
+    <braunr> right, but that's already a very good step
+    <braunr> i thought you were still working on the interfaces to be honest
+    <mcsim> And with mechanism I didn't implement moving pages to inactive
+      queue
+    <braunr> what do you mean ?
+    <braunr> ah you mean with the sequential policy ?
+    <mcsim> yes
+    <braunr> you can consider this a secondary goal
+    <mcsim> sequential I was going to implement like you've said, but I still
+      want to support moving pages to inactive queue
+    <braunr> i think you shouldn't
+    <braunr> first get to a state where clustered transfers do work fine
+    <mcsim> policies are implemented in function calculate_clusters
+    <braunr> then, you can try, and measure the difference
+    <mcsim> ok. I'm now working on fixing ext2fs
+    <braunr> so, except from bug squashing, what's left to do ?
+    <mcsim> finish policies and ext2fs; move fatfs, ufs, isofs to new
+      interface; test this all; edit patches from debian repository, that
+      conflict with my changes; rearrange commits and fix code indentation;
+      update documentation;
+    <braunr> think about measurements too
+    <tschwinge> mcsim: Please don't spend a lot of time on ufs.  No testing
+      required for that one.
+    <braunr> and keep us informed about your progress on bug fixing, so we can
+      test soon
+    <mcsim> Forgot about moving system to new interfaces (I mean determine form
+      of vm_advise and memory_object_change_attributes)
+    <braunr> s/determine/final/
+    <mcsim> braunr: ok.
+    <braunr> what do you mean "moving system to new interfaces" ?
+    <mcsim> braunr: I also pushed code changes to gnumach and hurd git
+      repositories
+    <mcsim> I met an issue with memory_object_change_attributes when I tried to
+      use it as I have to update all applications that use it. This includes
+      libc and translators that are not in hurd repository or use debian
+      patches. So I will not be able to run system with new
+      memory_object_change_attributes interface, until I update all software
+      that use this rpc
+    <braunr> this is a bit like the problem i had with my change
+    <braunr> the solution is : don't do it
+    <braunr> i mean, don't change the interface in an incompatible way
+    <braunr> if you can't change an existing call, add a new one
+    <mcsim> temporary I changed memory_object_set_attributes as it isn't used
+      any more.
+    <mcsim> braunr: ok. Adding new call is a good idea :)
+
+
+## IRC, freenode, #hurd, 2012-07-16
+
+    <braunr> mcsim: how did you deal with multiple page transfers towards the
+      default pager ?
+    <mcsim> braunr: hello. Didn't handle this yet, but AFAIR default pager
+      supports multiple page transfers.
+    <braunr> mcsim: i'm almost sure it doesn't
+    <mcsim> braunr: indeed
+    <mcsim> braunr: So, I'll update it just other translators.
+    <braunr> like other translators you mean ?
+    <mcsim> *just as
+    <mcsim> braunr: yes
+    <braunr> ok
+    <braunr> be aware also that it may need some support in vm_pageout.c in
+      gnumach
+    <mcsim> braunr: thank you
+    <braunr> if you see anything strange in the default pager, don't hesitate
+      to talk about it
+    <mcsim> braunr: ok. I didn't finish with ext2fs yet.
+    <braunr> so it's a good thing you're aware of it now, before you begin
+      working on it :)
+    <mcsim> braunr: I'm working on ext2 now.
+    <braunr> yes i understand
+    <braunr> i meant "before beginning work on the default pager"
+    <mcsim> ok
+
+    <antrik> mcsim: BTW, we were mostly talking about readahead (pagein) over
+      the past weeks, so I wonder what the status on clustered page*out* is?...
+    <mcsim> antrik: I don't work on this, but following, I think, is an example
+      of *clustered* pageout: _pager_seqnos_memory_object_data_return: object =
+      113, seqno = 4, control = 120, start_address = 0, length = 8192, dirty =
+      1. This is an example of debugging printout that shows that pageout
+      manipulates with chunks bigger than page sized.
+    <mcsim> antrik: Another one with bigger length
+      _pager_seqnos_memory_object_data_return: object = 125, seqno = 124,
+      control = 132, start_address = 131072, length = 126976, dirty = 1, kcopy
+    <antrik> mcsim: that's odd -- I didn't know the functionality for that even
+      exists in our codebase...
+    <antrik> my understanding was that Mach always sends individual pageout
+      requests for every single page it wants cleaned...
+    <antrik> (and this being the reason for the dreadful thread storms we are
+      facing...)
+    <braunr> antrik: ok
+    <braunr> antrik: yes that's what is happening
+    <braunr> the thread storms aren't that much of a problem now
+    <braunr> (by carefully throttling pageouts, which is a task i intend to
+      work on during the following months, this won't be an issue any more)
+
+
+## IRC, freenode, #hurd, 2012-07-19
+
+    <mcsim> I moved fatfs, ufs, isofs to new interface, corrected some errors
+      in other that I already moved, moved kernel to new interface (renamed
+      vm_advice to vm_advise and added rpcs memory_object_set_advice and
+      memory_object_get_advice). Made some changes in mechanism and tried to
+      finish ext2 translator.
+    <mcsim> braunr: I've got an issue with fictitious pages...
+    <mcsim> When I determine bounds of cluster in external object I never know
+      its actual size. So, mo_data_request call could ask data that are behind
+      object bounds. The problem is that pager returns data that it has and
+      because of this fictitious pages that were allocated are not freed.
+    <braunr> why don't you know the size ?
+    <mcsim> I see 2 solutions. First one is do not allocate fictitious pages at
+      all (but I think that there could be issues). Another lies in allocating
+      fictitious pages, but then freeing them with mo_data_lock.
+    <mcsim> braunr: Because pagers do not inform kernel about object size.
+    <braunr> i don't understand what you mean
+    <mcsim> I think that second way is better.
+    <braunr> so how does it happen ?
+    <braunr> you get a page fault
+    <mcsim> Don't you understand problem or solutions?
+    <braunr> then a lookup in the map finds the map entry
+    <braunr> and the map entry gives you the link to the underlying object
+    <mcsim> from vm_object.h: 	vm_size_t		size;		/*
+      Object size (only valid if internal)				 */
+    <braunr> mcsim: ugh
+    <mcsim> For external they are either 0x8000 or 0x20000...
+    <braunr> and for internal ?
+    <braunr> i'm very surprised to learn that
+    <mcsim> braunr: for internal size is actual
+    <braunr> right sorry, wrong question
+    <braunr> did you find what 0x8000 and 0x20000 are ?
+    <mcsim> for external I met only these 2 magic numbers when printed out
+      arguments of functions _pager_seqno_memory_object_... when they were
+      called.
+    <braunr> yes but did you try to find out where they come from ?
+    <mcsim> braunr: no. I think that 0x2000(many zeros) is maximal possible
+      object size.
+    <braunr> what's the exact value ?
+    <mcsim> can't tell exactly :/ My hurd box has broken again.
+    <braunr> mcsim: how does the vm find the backing content then ?
+    <mcsim> braunr: Do you know if it is guaranteed that map_entry size will be
+      not bigger than external object size?
+    <braunr> mcsim: i know it's not
+    <braunr> but you can use the map entry boundaries though
+    <mcsim> braunr: vm asks pager
+    <braunr> but if the page is already present
+    <braunr> how does it know ?
+    <braunr> it must be inside a vm_object ..
+    <mcsim> If I can use these boundaries then the problem I described is not
+      relevant.
+    <braunr> good
+    <braunr> it makes sense to use these boundaries, as the application can't
+      use data outside the mapping
+    <mcsim> I ask page with vm_page_lookup
+    <braunr> it would matter for shared objects, but then they have their own
+      faults :p
+    <braunr> ok
+    <braunr> so the size is actually completely ignored
+    <mcsim> if it is present then I stop expansion of cluster.
+    <braunr> which makes sense
+    <mcsim> braunr: yes, for external.
+    <braunr> all right
+    <braunr> use the mapping boundaries, it will do
+    <braunr> mcsim: i have only one comment about what i could see
+    <braunr> mcsim: there are 'advice' fields in both vm_map_entry and
+      vm_object
+    <braunr> there should be something else in vm_object
+    <braunr> i told you about pages before and after
+    <braunr> mcsim: how are you using this per object "advice" currently ?
+    <braunr> (in addition, using the same name twice for both mechanism and
+      policy is very sonfusing)
+    <braunr> confusing*
+    <mcsim> braunr: I try to expand cluster as much as possible, but not
+      more than the limit
+    <mcsim> they both determine policy, but advice for entry has bigger
+      priority
+    <braunr> that's wrong
+    <braunr> mapping and content shouldn't compete for policy
+    <braunr> the mapping tells the policy (=the advice) while the content tells
+      how to implement (e.g. how much content)
+    <braunr> IMO, you could simply get rid of the per object "advice" field and
+      use default values for now
+    <mcsim> braunr: What sense these values for number of pages before and
+      after should have?
+    <braunr> or use something well known, easy, and effective like preceding
+      and following pages
+    <braunr> they give the vm the amount of content to ask the backing pager
+    <mcsim> braunr: maximal amount, minimal amount or exact amount?
+    <braunr> neither
+    <braunr> that's why i recommend you forget it for now
+    <braunr> but
+    <braunr> imagine you implement the three standard policies (normal, random,
+      sequential)
+    <braunr> then the pager assigns preceding and following numbers for each of
+      them, say [5;5], [0;0], [15;15] respectively
+    <braunr> these numbers would tell the vm how many pages to ask the pagers
+      in a single request and from where
+    <mcsim> braunr: but in fact there could be much more policies.
+    <braunr> yes
+    <mcsim> also in kernel context there is no such unit as pager.
+    <braunr> so there should be a call like memory_object_set_advice(int
+      advice, int preceding, int following);
+    <braunr> for example
+    <braunr> what ?
+    <braunr> the pager is the memory manager
+    <braunr> it does exist in kernel context
+    <braunr> (or i don't understand what you mean)
+    <mcsim> there is only port, but port could be either pager or something
+      else
+    <braunr> no, it's a pager
+    <braunr> it's a port whose receive right is hold by a task implementing the
+      pager interface
+    <braunr> either the default pager or an untrusted task
+    <braunr> (or null if the object is anonymous memory not yet sent to the
+      default pager)
+    <mcsim> port is always pager?
+    <braunr> the object port is, yes
+    <braunr>         struct ipc_port         *pager;         /* Where to get
+      data */
+    <mcsim> So, you suggest to keep set of advices for each object?
+    <braunr> i suggest you don't change anything in objects for now
+    <braunr> keep the advice in the mappings only, and implement default
+      behaviour for the known policies
+    <braunr> mcsim: if you understand this point, then i have nothing more to
+      say, and we should let nowhere_man present his work
+    <mcsim> braunr: ok. I'll implement only default behaviors for known policies
+      for now.
+    <braunr> (actually, using the mapping boundaries is slightly unoptimal, as
+      we could have several mappings for the same content, e.g. a program with
+      read only executable mapping, then ro only)
+    <braunr> mcsim: another way to know the "size" is to actually lookup for
+      pages in objects
+    <braunr> hm no, that's not true
+    <mcsim> braunr: But if there is no page we have to ask it
+    <mcsim> and I don't understand why using mappings boundaries is unoptimal
+    <braunr> here is bash
+    <braunr> 0000000000400000    868K r-x--  /bin/bash
+    <braunr> 00000000006d9000     36K rw---  /bin/bash
+    <braunr> two entries, same file
+    <braunr> (there is the anonymous memory layer for the second, but it would
+      matter for the first cow faults)
+
+
+## IRC, freenode, #hurd, 2012-08-02
+
+    <mcsim> braunr: You said that I probably need some support in vm_pageout.c
+      to make defpager work with clustered page transfers, but TBH I thought
+      that I have to implement only pagein. Do you expect from me implementing
+      pageout either? Or I misunderstand role of vm_pageout.c?
+    <braunr> no
+    <braunr> you're expected to implement only pagins for now
+    <braunr> pageins
+    <mcsim> well, I'm finishing merging of ext2fs patch for large stores and
+      work on defpager in parallel.
+    <mcsim> braunr: Also I didn't get your idea about configuring of paging
+      mechanism on behalf of pagers.
+    <braunr> which one ?
+    <mcsim> braunr: You said that pager has somehow pass size of desired
+      clusters for different paging policies.
+    <braunr> mcsim: i said not to care about that
+    <braunr> and the wording isn't correct, it's not "on behalf of pagers"
+    <mcsim> servers?
+    <braunr> pagers could tell the kernel what size (before and after a faulted
+      page) they prefer for each existing policy
+    <braunr> but that's one way to do it
+    <braunr> defaults work well too
+    <braunr> as shown in other implementations
+
+
+## IRC, freenode, #hurd, 2012-08-09
+
+    <mcsim> braunr: I'm still debugging ext2 with large storage patch
+    <braunr> mcsim: tough problems ?
+    <mcsim> braunr: The same issues as I always meet when do debugging, but it
+      takes time.
+    <braunr> mcsim: so nothing blocking so far ?
+    <mcsim> braunr: I can't tell you for sure that I will finish up to 13th of
+      August and this is unofficial pencil down date.
+    <braunr> all right, but are you blocked ?
+    <mcsim> braunr: If you mean the issues that I can not even imagine how to
+      solve, then there are none.
+    <braunr> good
+    <braunr> mcsim: i'll try to review your code again this week end
+    <braunr> mcsim: make sure to commit everything even if it's messy
+    <mcsim> braunr: ok
+    <mcsim> braunr: I made changes to defpager, but I haven't tried
+      them. Commit them too?
+    <braunr> mcsim: sure
+    <braunr> mcsim: does it work fine without the large storage patch ?
+    <mcsim> braunr: looks fine, but TBH I can't even run such things like fsx,
+      because even without my changes it failed mightily at once.
+    <braunr> mcsim: right, well, that will be part of another task :)
+
+
+## IRC, freenode, #hurd, 2012-08-13
+
+    <mcsim> braunr: hello. Seems ext2fs with large store patch works.
+
+
+## IRC, freenode, #hurd, 2012-08-19
+
+    <mcsim> hello. Consider such situation. There is a page fault and kernel
+      decided to request pager for several pages, but at the moment pager is
+      able to provide only first pages, the rest are not known yet. Is it
+      possible to supply only one page and regarding rest ones tell the kernel
+      something like: "Rest pages try again later"?
+    <mcsim> I tried pager_data_unavailable && pager_flush_some, but this does
+      not seem to work.
+    <mcsim> Or I have to supply something anyway?
+    <braunr> mcsim: better not provide them
+    <braunr> the kernel only really needs one page
+    <braunr> don't try to implement "try again later", the kernel will do that
+      if other page faults occur for those pages
+    <mcsim> braunr: No, translator just hangs
+    <braunr> ?
+    <mcsim> braunr: And I can't even detach it without reboot
+    <braunr> hangs when what 
+    <braunr> ?
+    <braunr> i mean, what happens when it hangs ?
+    <mcsim> If kernel requests 2 pages and I provide one, then when page fault
+      occurs in second page translator hangs.
+    <braunr> well that's a bug
+    <braunr> clustered pager transfer is a mere optimization, you shouldn't
+      transfer more than you can just to satisfy some requested size
+    <mcsim> I think that it's because I create fictitious pages before calling
+      mo_data_request
+    <braunr> as placeholders ?
+    <mcsim> Yes. Is it correct if I will not grab fictitious pages?
+    <braunr> no
+    <braunr> i don't know the details well enough about fictitious pages
+      unfortunately, but it really feels wrong to use them where real physical
+      pages should be used instead
+    <braunr> normally, an in-transfer page is simply marked busy
+    <mcsim> But If page is already marked busy kernel will not ask it another
+      time.
+    <braunr> when the pager replies, you unbusy them
+    <braunr> your bug may be that you incorrectly use pmap
+    <braunr> you shouldn't create mmu mappings for pages you didn't receive
+      from the pagers
+    <mcsim> I don't create them
+    <braunr> ok so you correctly get the second page fault
+    <mcsim> If pager supplies only the first page when two were asked, then
+      second page will not become un-busy.
+    <braunr> that's a bug
+    <braunr> your code shouldn't assume the pager will provide all the pages it
+      was asked for
+    <braunr> only the main one
+    <mcsim> Will it be ok if I will provide special attribute that will keep
+      information that page has been advised?
+    <braunr> what for ?
+    <braunr> i don't understand "page has been advised"
+    <mcsim> Advised page is page that is asked in cluster, but there wasn't a
+      page fault in it.
+    <mcsim> I need this attribute because if I don't inform kernel about this
+      page anyhow, then kernel will not change attributes of this page.
+    <braunr> why would it change its attributes ?
+    <mcsim> But if page fault will occur in page that was asked than page will
+      be already busy by the moment.
+    <braunr> and what attribute ?
+    <mcsim> advised
+    <braunr> i'm lost
+    <braunr> 08:53 < mcsim> I need this attribute because if I don't inform
+      kernel about this page anyhow, then kernel will not change attributes of
+      this page.
+    <braunr> you need the advised attribute because if you don't inform the
+      kernel about this page, the kernel will not change the advised attribute
+      of this page ?
+    <mcsim> Not only advised, but busy as well.
+    <mcsim> And if page fault will occur in this page, kernel will not ask it
+      second time. Kernel will just block.
+    <braunr> well that's normal
+    <mcsim> But if kernel will block and pager is not going to report somehow
+      about this page, then translator will hang.
+    <braunr> but the pager is going to report
+    <braunr> and in this report, there can be fewer pages than requested
+    <mcsim> braunr: You told not to report
+    <braunr> the kernel can deduce it didn't receive all the pages, and mark
+      them unbusy anyway
+    <braunr> i told not to transfer more than requested
+    <braunr> but not sending data can be a form of communication
+    <braunr> i mean, sending a message in which data is missing
+    <braunr> it simply means it's not there, but this info is sufficient for the
+      kernel
+    <mcsim> hmmm... Seems I understood you. Let me try something.
+    <mcsim> braunr: I informed kernel about missing page as follows:
+      pager_data_supply (pager, precious, writelock, i, 1, NULL, 0); Am I
+      right?
+    <braunr> i don't know the interface well
+    <braunr> what does it mean 
+    <braunr> ?
+    <braunr> are you passing NULL as the data for a missing page ?
+    <mcsim> yes
+    <braunr> i see
+    <braunr> you shouldn't need a request for that though, avoiding useless ipc
+      is a good thing
+    <mcsim> i is number of page, 1 is quantity
+    <braunr> but if you can't find a better way for now, it will do
+    <mcsim> But this does not work :(
+    <braunr> that's a bug
+    <braunr> in your code probably
+    <mcsim> braunr: supplying NULL as data returns MACH_SEND_INVALID_MEMORY
+    <braunr> but why would it work ?
+    <braunr> mach expects something
+    <braunr> you have to change that
+    <mcsim> It's mig who refuses data. Mach does not even get the call.
+    <braunr> hum
+    <mcsim> That's why I propose to provide new attribute, that will keep
+      information regarding whether the page was asked as advice or not.
+    <braunr> i still don't understand why
+    <braunr> why don't you fix mig so you can your null message instead ?
+    <braunr> +send
+    <mcsim> braunr: because usually this is an error
+    <braunr> the kernel will decide if it's an erro
+    <braunr> r
+    <braunr> what kind of reply do you intend to send the kernel with for these
+      "advised" pages ?
+    <mcsim> no reply. But when page fault will occur in busy page and it will
+      be also advised, kernel will not block, but ask this page another time.
+    <mcsim> And how kernel will know that this is an error or not?
+    <braunr> why ask another time ?!
+    <braunr> you really don't want to flood pagers with useless messages
+    <braunr> here is how it should be
+    <braunr> 1/ the kernel requests pages from the pager
+    <braunr> it knows the range
+    <braunr> 2/ the pager replies what it can, full range, subset of it, even
+      only one page
+    <braunr> 3/ the kernel uses what the pager replied, and unbusies the other
+      pages
+    <mcsim> First time page was asked because page fault occurred in
+      neighborhood. And second time because PF occurred in page. 
+    <braunr> well it shouldn't
+    <braunr> or it should, but then you have a segfault
+    <mcsim> But kernel does not keep bound of range, that it asked.
+    <braunr> if the kernel can't find the main page, the one it needs to make
+      progress, it's a segfault
+    <mcsim> And this range could be supplied in several messages.
+    <braunr> absolutely not
+    <braunr> you defeat the purpose of clustered pageins if you use several
+      messages
+    <mcsim> But interface supports it
+    <braunr> interface supported single page transfers, doesn't mean it's good
+    <braunr> well, you could use several messages
+    <braunr> as what we really want is less I/O
+    <mcsim> No one keeps bounds of requested range, so it couldn't be checked
+      that range was split 
+    <braunr> but it would be so much better to do it all with as few messages
+      as possible
+    <braunr> does the kernel knows the main page ?
+    <braunr> know*
+    <mcsim> Splitting range is not optimal, but it's not an error.
+    <braunr> i assume it does
+    <braunr> doesn't it ?
+    <mcsim> no, that's why I want to provide new attribute.
+    <braunr> i'm sorry i'm lost again
+    <braunr> how does the kernel knows a page fault has been serviced ?
+    <braunr> know*
+    <mcsim> It receives an interrupt
+    <braunr> ?
+    <braunr> let's not mix terms
+    <mcsim> oh.. I read as received. Sorry
+    <mcsim> It gets mo_data_supply message. Then it replaces fictitious pages
+      with real ones.
+    <braunr> so you get a message
+    <braunr> and you kept track of the range using fictitious pages
+    <braunr> use the busy flag instead, and another way to retain the range
+    <mcsim> I allocate fictitious pages to reserve place. Then if a page fault
+      occurs in this fictitious page, the kernel will not send another
+      mo_data_request call, it will wait until fictitious page unblocks.
+    <braunr> i'll have to check the code but it looks unoptimal to me
+    <braunr> we really don't want to allocate useless objects when a simple
+      busy flag would do
+    <mcsim> busy flag for what? There is no page yet
+    <braunr> we're talking about mo_data_supply
+    <braunr> actually we're talking about the whole page fault process
+    <mcsim> We can't mark nothing as busy, that's why kernel allocates
+      fictitious page and marks it as busy until real page would be supplied.
+    <braunr> what do you mean "nothing" ?
+    <mcsim> VM_PAGE_NULL
+    <braunr> uh ?
+    <braunr> when are physical pages allocated ?
+    <braunr> on request or on reply from the pager ?
+    <braunr> i'm reading mo_data_supply, and it looks like the page is already
+      busy at that time
+    <mcsim> they are allocated by pager and then supplied in reply
+    <mcsim> Yes, but these pages are fictitious
+    <braunr> show me please
+    <braunr> in the master branch, not yours
+    <mcsim> that page is fictitious?
+    <braunr> yes
+    <braunr> i'm referring to the way mach currently does things
+    <mcsim> vm/vm_fault.c:582
+    <braunr> that's memory_object_lock_page
+    <braunr> hm wait
+    <braunr> my bad
+    <braunr> ah that damn object chaining :/
+    <braunr> ok
+    <braunr> the original code is stupid enough to use fictitious pages all the
+      time, you probably have to do the same
+    <mcsim> hm... Attributes will be useless, pager should tell something about
+      pages, that it is not going to supply.
+    <braunr> yes
+    <braunr> that's what null is for
+    <mcsim> Not null, null is error.
+    <braunr> one problem i can think of is making sure the kernel doesn't
+      interpret missing as error
+    <braunr> right
+    <mcsim> I think better have special value for mo_data_error
+    <braunr> probably
+
+
+### IRC, freenode, #hurd, 2012-08-20
+
+    <antrik> braunr: I think it's useful to allow supplying the data in several
+      batches. the kernel should *not* assume that any data missing in the
+      first batch won't be supplied later.
+    <braunr> antrik: it really depends
+    <braunr> i personally prefer synchronous approaches
+    <antrik> demanding that all data is supplied at once could actually turn
+      readahead into a performance killer
+    <mcsim> antrik: Why? The only drawback I see is higher response time for
+      page fault, but it also leads to reduced overhead.
+    <braunr> that's why "it depends"
+    <braunr> mcsim: it brings benefit only if enough preloaded pages are
+      actually used to compensate for the time it took the pager to provide
+      them
+    <braunr> which is the case for many workloads (including sequential access,
+      which is the common case we want to optimize here)
+    <antrik> mcsim: the overhead of an extra RPC is negligible compared to
+      increased latencies when dealing with slow backing stores (such as disk
+      or network)
+    <mcsim> antrik: also many replies lead to fragmentation, while in one reply
+      all data is gathered in one bunch. If all data is placed consecutively,
+      then it may be transferred next time faster.
+    <braunr> mcsim: what kind of fragmentation ?
+    <antrik> I really really don't think it's a good idea for the page to hold
+      back the first page (which is usually the one actually blocking) while
+      it's still loading some other pages (which will probably be needed only
+      in the future anyways, if at all)
+    <antrik> err... for the pager to hold back
+    <braunr> antrik: then all pagers should be changed to handle asynchronous
+      data supply
+    <braunr> it's a bit late to change that now
+    <mcsim> there could be two cases of data placement in backing store: 1/ all
+      asked data is placed consecutively; 2/ it is spread among backing
+      store. If pager gets data in one message it is more likely to place it
+      consecutively. So to have data consecutive in each pager, each pager has
+      to try to send data in one message. Having data placed consecutively is
+      important, since reading such data is much faster.
+    <braunr> mcsim: you're confusing things ..
+    <braunr> or you're not telling them properly
+    <mcsim> Ok. Let me try one more time
+    <braunr> since you're working *only* on pagein, not pageout, how do you
+      expect spread pages being sent in a single message be better than
+      multiple messages ?
+    <mcsim> braunr: I think about future :)
+    <braunr> ok
+    <braunr> but antrik is right, paging in too much can reduce performance
+    <braunr> so the default policy should be adjusted for both the worst case
+      (one page) and the average/best (some/mane contiguous pages)
+    <braunr> through measurement ideally
+    <antrik> mcsim: BTW, I still think implementing clustered pageout has
+      higher priority than implementing madvise()... but if the latter is less
+      work, it might still make sense to do it first of course :-)
+    <braunr> many*
+    <braunr> there aren't many users of madvise, true
+    <mcsim> antrik: Implementing madvise I expect to be very simple. It should
+      just translate call to vm_advise
+    <antrik> well, that part is easy of course :-) so you already implemented
+      vm_advise itself I take it?
+    <mcsim> antrik: Yes, that was also quite easy.
+    <antrik> great :-)
+    <antrik> in that case it would be silly of course to postpone implementing
+      the madvise() wrapper. in other words: never mind my remark about
+      priorities :-)
+
+
+## IRC, freenode, #hurd, 2012-09-03
+
+    <mcsim> I try a test with ext2fs. It works, then I just recompile ext2fs
+      and it stops working, then I recompile it again several times and each
+      time the result is unpredictable.
+    <braunr> sounds like a concurrency issue
+    <mcsim> I can run the same test several times and ext2 works until I
+      recompile it. That's the problem. Could that be concurrency too?
+    <braunr> mcsim: without bad luck, yes, unless "several times" is a lot
+    <braunr> like several dozens of tries
+
+
+## IRC, freenode, #hurd, 2012-09-04
+
+    <mcsim> hello. I want to tell that ext2fs translator, that I work on,
+      replaced for my system old variant that processed only single pages
+      requests. And it works with partitions bigger than 2 Gb.
+    <mcsim> Probably I'm not far from the end.
+    <mcsim> But it's worth to mention that I didn't fix that nasty bug that I
+      told yesterday about.
+    <mcsim> braunr: That bug sometimes appears after recompilation of ext2fs
+      and always disappears after sync or reboot. Now I'm going to finish
+      defpager and test other translators.
+
+
+## IRC, freenode, #hurd, 2012-09-17
+
+    <mcsim> braunr: hello. Do you remember that you said that pager has to
+      inform kernel about appropriate cluster size for readahead?
+    <mcsim> I don't understand how kernel store this information, because it
+      does not know about such unit as "pager".
+    <mcsim> Can you give me an advice about how this could be implemented?
+    <youpi> mcsim: it can store it in the object
+    <mcsim> youpi: It too big overhead
+    <mcsim> youpi: at least from my pow
+    <mcsim> *pov
+    <braunr> mcsim: we discussed this already
+    <braunr> mcsim: there is no "pager" entity in the kernel, which is a defect
+      from my PoV
+    <braunr> mcsim: the best you can do is follow what the kernel already does
+    <braunr> that is, store this property per object
+    <braunr> we don't care much about the overhead for now
+    <braunr> my guess is there is already some padding, so the overhead is
+      likely to be amortized by this
+    <braunr> like youpi said
+    <mcsim> I remember that discussion, but I didn't get then whether there
+      should be only one or two values for all policies. Or each policy should
+      have its own values?
+    <mcsim> braunr: ^
+    <braunr> each policy should have its own values, which means it can be
+      implemented with a simple static array somewhere
+    <braunr> the information in each object is a policy selector, such as an
+      index in this static array
+    <mcsim> ok
+    <braunr> mcsim: if you want to minimize the overhead, you can make this
+      selector a char, and place it near another char member, so that you use
+      space that was previously used as padding by the compiler
+    <braunr> mcsim: do you see what i mean ?
+    <mcsim> yes
+    <braunr> good
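
The layout braunr suggests above (one global static array holding the values for each policy, and a one-byte selector in each object) can be sketched roughly as follows. This is only an illustration under assumptions: the type names, the `toy_object` structures, the policy identifiers and the numbers in the table are hypothetical and are not taken from gnumach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy identifiers; names are illustrative only. */
enum advice_policy {
    ADVICE_NORMAL = 0,
    ADVICE_RANDOM,
    ADVICE_SEQUENTIAL,
    ADVICE_NR
};

/* One set of values per policy. */
struct advice_params {
    unsigned int pages_before;  /* pages to fetch before the faulted one */
    unsigned int pages_after;   /* pages to fetch after the faulted one */
};

/* Global read-only table; the numbers are placeholders, not tuned values. */
static const struct advice_params advice_table[ADVICE_NR] = {
    [ADVICE_NORMAL]     = { 3, 4 },
    [ADVICE_RANDOM]     = { 0, 0 },
    [ADVICE_SEQUENTIAL] = { 0, 7 },
};

/* Stand-in for an object that already has one char-sized member. */
struct toy_object_bare {
    void *pager_port;       /* placeholder pointer-sized member */
    unsigned char flags;    /* existing char-sized member */
};

/* Same object with the selector placed next to the existing char, so it
 * lands in bytes the compiler would otherwise use as padding. */
struct toy_object {
    void *pager_port;
    unsigned char flags;
    unsigned char advice;   /* policy selector: index into advice_table */
};

/* Resolve an object's selector to the shared per-policy values. */
const struct advice_params *
object_advice_params(const struct toy_object *obj)
{
    assert(obj->advice < ADVICE_NR);
    return &advice_table[obj->advice];
}
```

On common ABIs both structures have the same size, which is the point of placing the selector next to another `char` member: the per-object cost is absorbed by existing padding.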
+
+
+## IRC, freenode, #hurd, 2012-09-17
+
+    <mcsim> hello. May I add function krealloc to slab.c?
+    <braunr> mcsim: what for ?
+    <mcsim> braunr: It is quite useful for creating dynamic arrays
+    <braunr> you don't want dynamic arrays
+    <mcsim> why?
+    <braunr> they're expensive
+    <braunr> try other data structures
+    <mcsim> more expensive than linked lists?
+    <braunr> depends
+    <braunr> but linked lists aren't the only other alternative
+    <braunr> that's why btrees and radix trees (basically trees of arrays)
+      exist
+    <braunr> the best general purpose data structure we have in mach is the red
+      black tree currently
+    <braunr> but always think about what you want to do with it
+    <mcsim> I want to store there sets of sizes for different memory
+      policies. I don't expect this array to be big. But for sure I can use
+      rbtree for it.
+    <braunr> why not a static array ?
+    <braunr> arrays are perfect for known data sizes
+    <mcsim> I expect from pager to supply its own sizes. So at the beginning in
+      this array is only default policy. When pager wants to supply it own
+      policy kernel lookups table of advice. If this policy is new set of sizes
+      then kernel creates new entry in table of advice.
+    <braunr> that would mean one set of sizes for each object
+    <braunr> why don't you make things simple first ?
+    <mcsim> Object stores only pointer to entry in this table.
+    <braunr> but there is no pager object shared by memory objects in the
+      kernel
+    <mcsim> I mean struct vm_object
+    <braunr> so that's what i'm saying, one set per object
+    <braunr> it's useless overhead
+    <braunr> i would really suggest using a global set of policies for now
+    <mcsim> Probably, I don't understand you. Where do you want to store this
+      static array?
+    <braunr> it's a global one
+    <mcsim> "for now"? It is not a problem to implement a table for local
+      advice, using either rbtree or dynamic array.
+    <braunr> it's useless overhead
+    <braunr> and it's not a single integer, you want a whole container per
+      object
+    <braunr> don't do anything fancy unless you know you really want it
+    <braunr> i'll link the netbsd code again as a very good example of how to
+      implement global policies that work more than decently for every file
+      system in this OS
+    <braunr>
+      http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/uvm/uvm_fault.c?rev=1.194&content-type=text/x-cvsweb-markup&only_with_tag=MAIN
+    <braunr> look for uvmadvice
+    <mcsim> But different translators have different demands. Thus changing of
+      global policy for one translator would have impact on behavior of another
+      one.
+    <braunr> i understand
+    <braunr> this isn't l4, or anything experimental
+    <braunr> we want something that works well for us
+    <mcsim> And this is acceptable?
+    <braunr> until you're able to demonstrate we need different policies, i'd
+      recommend not making things more complicated than they already are and
+      need to be
+    <braunr> why wouldn't it ?
+    <braunr> we've been discussing this a long time :/
+    <mcsim> because every process runs in isolated environment and the fact
+      that there is something outside this environment, that has no rights to
+      do that, does it surprises me.
+    <braunr> ?
+    <mcsim> ok. let me dip in uvm code. Probably my questions disappear
+    <braunr> i don't think it will
+    <braunr> you're asking about the system design here, not implementation
+      details
+    <braunr> with l4, there are as you'd expect well defined components
+      handling policies for address space allocation, or paging, or whatever
+    <braunr> but this is mach
+    <braunr> mach has a big shared global vm server with in kernel policies for
+      it
+    <braunr> so it's ok to implement a global policy for this
+    <braunr> and let's be pragmatic, if we don't need complicated stuff, why
+      would we waste time on this ?
+    <mcsim> It is not complicated.
+    <braunr> retaining a whole container for each object, whereas they're all
+      going to contain exactly the same stuff for years to come seems overly
+      complicated for me
+    <mcsim> I'm not going to create separate container for each object.
+    <braunr> i'm not following you then
+    <braunr> how can pagers upload their sizes in the kernel ?
+    <mcsim> I'm going to create a new container only for combination of cluster
+      sizes that are not present in table of advice.
+    <braunr> that's equivalent
+    <braunr> you're ruling out the default set, but that's just an optimization
+    <braunr> whenever a file system decides to use other sizes, the problem
+      will arise
+    <mcsim> Before creating a container I'm going to lookup a table. And only
+      then create
+    <braunr> a table ?
+    <mcsim> But there will be the same container for a huge bunch of objects
+    <braunr> how do you select it ?
+    <braunr> if it's a per pager container, remember there is no shared pager
+      object in the kernel, only ports to external programs
+    <mcsim> I'll give an example
+    <mcsim> Suppose there are only two policies. At the beginning we have table
+      {{random = 4096, sequential = 8192}}. Then pager 1 wants to add new
+      policy where random cluster size is 8192. He asks kernel to create it and
+      after this table will be following: {{random = 4096, sequential = 8192},
+      {random = 8192, sequential = 8192}}. If pager 2 wants to create the same
+      policy as pager 1, kernel will look up table and will not create new
+      entry. So the table will be the same.
+    <mcsim> And each object has link to appropriate table entry
+    <braunr> i'm not sure how this can work
+    <braunr> how can pagers 1 and 2 know the sizes are the same for the same
+      policy ?
+    <braunr> (and actually they shouldn't)
+    <mcsim> For faster lookup there will be create hash keys for each entry
+    <braunr> what's the lookup key ?
+    <mcsim> They do not know
+    <mcsim> The kernel knows
+    <braunr> then i really don't understand
+    <braunr> and how do you select sizes based on the policy ?
+    <braunr> and how do you remove unused entries ?
+    <braunr> (ok this can be implemented with a simple ref counter)
+    <mcsim> "and how do you select sizes based on the policy ?" you mean at
+      page fault?
+    <braunr> yes
+    <mcsim> entry or object keeps pointer to appropriate entry in the table
+    <braunr> ok your per object data is a pointer to the table entry and the
+      policy is the index inside
+    <braunr> so you really need a ref counter there
+    <mcsim> yes
+    <braunr> and you need to maintain this table
+    <braunr> for me it's uselessly complicated
+    <mcsim> but this keeps design clear
+    <braunr> not for me
+    <braunr> i don't see how this is clearer
+    <braunr> it's just more powerful
+    <braunr> a power we clearly don't need now
+    <braunr> and in the following years
+    <braunr> in addition, i'm very worried about the potential problems this
+      can introduce
+    <mcsim> In fact I don't feel comfortable from the thought that one
+      translator can impact on behavior of another.
+    <braunr> simple example: the table is shared, it needs a lock, other data
+      structures you may have added in your patch may also need a lock
+    <braunr> but our locks are noop for now, so you just can't be sure there is
+      no deadlock or other issues
+    <braunr> and adding smp is a *lot* more important than being able to select
+      precisely policy sizes that we're very likely not to change a lot
+    <braunr> what do you mean by "one translator can impact another" ?
+    <mcsim> As I understand your idea (I haven't read uvm code yet) that there
+      is a global table of cluster sizes for different policies. And every
+      translator can change values in this table. That is what I mean under one
+      translator will have an impact on another one.
+    <braunr> absolutely not
+    <braunr> translators *can't* change sizes
+    <braunr> the sizes are completely static, assumed to be fit all
+    <braunr> -be
+    <braunr> it's not optimial but it's very simple and effective in practice
+    <braunr> optimal*
+    <braunr> and it's not a table of cluster sizes
+    <braunr> it's a table of pages before/after the faulted one
+    <braunr> this reflects the fact that in mach, virtual memory (implementation
+      and policy) is in the kernel
+    <braunr> translators must not be able to change that
+    <braunr> let's talk about pagers here, not translators
+    <mcsim> Finally I got you. This is an acceptable tradeoff.
+    <braunr> it took some time :)
+    <braunr> just to clear something
+    <braunr> 20:12 < mcsim> For faster lookup there will be create hash keys
+      for each entry
+    <braunr> i'm not sure i understand you here
+    <mcsim> To find out if there is such policy (set of sizes) in the table we
+      can lookup every entry and compare each value. But it is better to create
+      a hash value for set and thus find equal policies.
+    <braunr> first, i'm really not comfortable with hash tables
+    <braunr> they really need careful configuration
+    <braunr> next, as we don't expect many entries in this table, there is
+      probably no need for this overhead
+    <braunr> remember that one property of tables is locality of reference
+    <braunr> you access the first entry, the processor automatically fills a
+      whole cache line
+    <braunr> so if your table fits on just a few, it's probably faster to
+      compare entries completely than to jump around in memory
+    <mcsim> But we can sort hash keys, and in this way find policies quickly.
+    <braunr> cache misses are way slower than computation
+    <braunr> so unless you have massive amounts of data, don't use an optimized
+      container
+    <mcsim> (20:38:53) braunr: that's why btrees and radix trees (basically
+      trees of arrays) exist
+    <mcsim> and what will be the key?
+    <braunr> i'm not saying to use a tree instead of a hash table
+    <braunr> i'm saying, unless you have many entries, just use a simple table
+    <braunr> and since pagers don't add and remove entries from this table
+      often, it's on case reallocation is ok
+    <braunr> one*
+    <mcsim> So here dynamic arrays fit the most?
+    <braunr> probably
+    <braunr> it really depends on the number of entries and the write ratio
+    <braunr> keep in mind current processors have 32-bits or (more commonly)
+      64-bits cache line sizes
+    <mcsim> bytes probably?
+    <braunr> yes bytes
+    <braunr> but i'm not willing to add a realloc like call to our general
+      purpose kernel allocator
+    <braunr> i don't want to make it easy for people to rely on it, and i hope
+      the lack of it will make them think about other solutions instead :)
+    <braunr> and if they really want to, they can just use alloc/free
+    <mcsim> Under "other solutions" you mean trees?
+    <braunr> i mean anything else :)
+    <braunr> lists are simple, trees are elegant (but add non negligible
+      overhead)
+    <braunr> i like trees because they truly "gracefully" scale
+    <braunr> but they're still O(log n)
+    <braunr> a good hash table is O(1), but must be carefully measured and
+      adjusted
+    <braunr> there are many other data structures, many of them you can find in
+      linux
+    <braunr> but in mach we don't need a lot of them
+    <mcsim> Your favorite data structures are lists and trees. Next, what
+      should you claim, is that lisp is your favorite language :)
+    <braunr> functional programming should eventually rule the world, yes
+    <braunr> i wouldn't count lists are my favorite, which are really trees
+    <braunr> as*
+    <braunr> there is a reason why red black trees back higher level data
+      structures like vectors or maps in many common libraries ;)
+    <braunr> mcsim: hum but just to make it clear, i asked this question about
+      hashing because i was curious about what you had in mind, i still think
+      it's best to use static predetermined values for policies
+    <mcsim> braunr: I understand this.
+    <braunr> :)
+    <mcsim> braunr: Yeah. You should be cautious with me :)
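
For reference, the shared table of size sets that mcsim describes in this discussion, with the reference counter braunr mentions and the plain linear comparison he recommends over hashing for a handful of entries, could look roughly like the sketch below. It is a hypothetical illustration only: the names, the fixed capacity and the absence of locking are assumptions, and the discussion itself concludes that static predetermined values are preferable to any such table.

```c
#include <assert.h>
#include <stddef.h>

#define SET_TABLE_MAX 8   /* hypothetical capacity; few entries are expected */

/* One set of cluster sizes, shared by every object that uses it. */
struct size_set {
    unsigned int random_size;      /* bytes */
    unsigned int sequential_size;  /* bytes */
    unsigned int refs;             /* 0 means the slot is free */
};

/* Slot 0 holds the default set, which keeps one permanent reference. */
static struct size_set set_table[SET_TABLE_MAX] = {
    { 4096, 8192, 1 },
};

/* Find an identical set or create one. Returns the entry index, or -1 if
 * the table is full. The scan is linear: with so few entries the whole
 * table spans only a few cache lines, so comparing every entry is cheap. */
int
size_set_get(unsigned int random_size, unsigned int sequential_size)
{
    int free_slot = -1;

    for (int i = 0; i < SET_TABLE_MAX; i++) {
        struct size_set *s = &set_table[i];

        if (s->refs == 0) {
            if (free_slot < 0)
                free_slot = i;      /* remember the first free slot */
            continue;
        }
        if (s->random_size == random_size
            && s->sequential_size == sequential_size) {
            s->refs++;              /* share the existing entry */
            return i;
        }
    }

    if (free_slot < 0)
        return -1;

    set_table[free_slot] =
        (struct size_set){ random_size, sequential_size, 1 };
    return free_slot;
}

/* Drop one reference; the slot becomes reusable once refs reaches 0. */
void
size_set_put(int index)
{
    assert(index >= 0 && index < SET_TABLE_MAX);
    assert(set_table[index].refs > 0);
    set_table[index].refs--;
}
```

With this sketch, two pagers asking for `{8192, 8192}` end up sharing one entry, as in mcsim's example. The table is still shared mutable state that would need a lock in a real kernel, which is one of the concerns braunr raises.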
+
+
+## IRC, freenode, #hurd, 2012-09-21
+
+    <antrik> mcsim: there is only one cluster size per object -- it depends on
+      the properties of the backing store, nothing else.
+    <antrik> (while the readahead policies depend on the use pattern of the
+      application, and thus should be selected per mapping)
+    <antrik> but I'm still not convinced it's worthwhile to bother with cluster
+      size at all. do other systems even do that?...
+
+
+## IRC, freenode, #hurd, 2012-09-23
+
+    <braunr> mcsim: how long do you think it will take you to polish your gsoc
+      work ?
+    <braunr> (and when before you begin that part actually, because we'll have
+      to review the whole stuff prior to polishing it)
+    <mcsim> braunr: I think about 2 weeks
+    <mcsim> But you may already start review it, if you're intended to do it
+      before I'll rearrange commits.
+    <mcsim> Gnumach, ext2fs and defpager are ready. I just have to polish the
+      code.
+    <braunr> mcsim: i don't know when i'll be able to do that
+    <braunr> so expect a few weeks on my (our) side too
+    <mcsim> ok
+    <braunr> sorry for being slow, that's how hurd development is :)
+    <mcsim> What should I do with libc patch that adds madvise support?
+    <mcsim> Post it to bug-hurd?
+    <braunr> hm probably the same i did for pthreads, create a topic branch in
+      glibc.git
+    <mcsim> there is only one commit
+    <braunr> yes
+    <braunr> (mine was a one liner :p)
+    <mcsim> ok
+    <braunr> it will probably be a debian patch before going into glibc anyway,
+      just for making sure it works
+    <mcsim> But according to term. I expect that my study begins in a week and
+      I'll have to do some stuff then, so actually probably I'll need a week
+      more.
+    <braunr> don't worry, that's expected
+    <braunr> and that's the reason why we're slow
+    <mcsim> And what should I do with large store patch?
+    <braunr> hm good question
+    <braunr> what did you do for now ?
+    <braunr> include it in your work ?
+    <braunr> that's what i saw iirc
+    <mcsim> Yes. It consists of two parts.
+    <braunr> the original part and the modificaionts ?
+    <braunr> modifications*
+    <braunr> i think youpi would know better about that
+    <mcsim> First (small) adds notification to libpager interface and second
+      one adds support for large stores.
+    <braunr> i suppose we'll probably merge the large store patch at some point
+      anyway
+    <mcsim> Yes both original and modifications
+    <braunr> good
+    <mcsim> I'll split these parts to different commits and I'll try to make
+      support for large stores independent from other work.
+    <braunr> that would be best
+    <braunr> if you can make it so that, by omitting (or including) one patch,
+      we can add your patches to the debian package, it would be great
+    <braunr> (only with regard to the large store change, not other potential
+      smaller conflicts)
+    <mcsim> braunr: I also found several bugs in defpager, that I haven't fixed
+      since winter.
+    <braunr> oh
+    <mcsim> seems nobody hasn't expect them.
+    <braunr> i'm very interested in those actually (not too soon because it
+      concerns my work on pageout, which is postponed after pthreads and
+      select)
+    <mcsim> ok. then I'll do it first.
+
+
+## IRC, freenode, #hurd, 2012-09-24
+
+    <braunr> mcsim: what is vm_get_advice_info ?
+    <mcsim> braunr: hello. It should supply some machine specific parameters
+      regarding clustered reading. At the moment it supplies only maximal
+      possible size of cluster.
+    <braunr> mcsim: why such a need ?
+    <mcsim> It is used by defpager, as it can't allocate memory dynamically and
+      every thread has to allocate maximal size beforehand
+    <braunr> mcsim: i see
+
+
+## IRC, freenode, #hurd, 2012-10-05
+
+    <mcsim> braunr: I think it's not worth to separate large store patch for
+      ext2 and patch for moving it to new libpager interface. Am I right?
+    <braunr> mcsim: it's worth separating, but not creating two versions
+    <braunr> i'm not sure what you mean here
+    <mcsim> First, I applied large store patch, and then I was changing patched
+      code, to make it work with new libpager interface. So changes to make
+      ext2 work with new interface depend on large store patch.
+    <mcsim> braunr: ^
+    <braunr> mcsim: you're not forced to make each version resulting from a new
+      commit work
+    <braunr> but don't make big commits
+    <braunr> so if changing an interface requires its users to be updated
+      twice, it doesn't make sense to do that
+    <braunr> just update the interface cleanly, you'll have one or more commits
+      that produce intermediate versions that don't build, that's ok
+    <braunr> then in another, separate commit, adjust the users
+    <mcsim> braunr: The only user now is ext2. And the problem with ext2 is
+      that I updated not the version from git repository, but the version, that
+      I've got after applying the large store patch. So in other words my
+      question is as follows: should I make a commit that moves to new
+      interface version of ext2fs without large store patch?
+    <braunr> you're asking if you can include the large store patch in your
+      work, and by extension, in the main branch
+    <braunr> i would say yes, but this must be discussed with others