IRC.

author: Thomas Schwinge <tschwinge@gnu.org> 2012-01-28 15:04:40 +0100
committer: Thomas Schwinge <tschwinge@gnu.org> 2012-01-28 15:04:40 +0100
commit: 6f3a380f3c1bc602b1b86dec307abf27f71bfef4 (patch)
tree: a534bf34fc4d91b4d30c6f3ac4fabbc3c511201f /hurd/translator/tmpfs/discussion.mdwn
parent: be4193108513f02439a211a92fd80e0651f6721b (diff)
1 files changed, 265 insertions, 1 deletions
diff --git a/hurd/translator/tmpfs/discussion.mdwn b/hurd/translator/tmpfs/discussion.mdwn
index 486206e3..0409f046 100644
--- a/hurd/translator/tmpfs/discussion.mdwn
+++ b/hurd/translator/tmpfs/discussion.mdwn
@@ -1,4 +1,4 @@
-[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]
+[[!meta copyright="Copyright © 2011, 2012 Free Software Foundation, Inc."]]
 
 [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
 id="license" text="Permission is granted to copy, distribute and/or modify this
@@ -19,3 +19,267 @@ License|/fdl]]."]]"""]]
   * [[!GNU_Savannah_bug 26751]]
 
   * [[!GNU_Savannah_bug 32755]]
+
+
+# [[Maksym_Planeta]]
+
+## IRC, freenode, #hurd, 2011-11-29
+
+    <mcsim> Hello. In seqno_memory_object_data_request I call
+      memory_object_data_supply and supply one zero filled page, but seems that
+      kernel ignores this call because this page stays filled in specified
+      memory object. In what cases kernel may ignore this call? It is written
+      in documentation that "kernel prohibits the overwriting of live data
+      pages". But when I called memory_object_lock_request on this page with
+      should flush and MEMORY_OBJECT_RETURN_ALL nothing change
+    <braunr> what are you trying to do ?
+    <mcsim> I think that memory object holds wrong data, so I'm trying to
+      replace them. This happens when file is truncated, so I should notify
+      memory object that there is no some data. But since gnumach works only
+      with sizes that are multiple of vm_page_size, I should manually correct
+      last page for case when file size isn't multiple of vm_page_size. This is
+      needed for case when file grows again and that tail of last page, which
+      wasn't part of file should be filled wit
+    <mcsim> I've put some printf's in kernel and it seems that page that holds
+      data which I want replace both absent and busy:
+    <mcsim> m = vm_page_lookup(object,offset);
+    <mcsim> ...
+    <mcsim> if (m->absent && m->busy) { <-- Condition is true
+    <mcsim> in vm/memory_object.c:169
+    <slpz> mcsim: Receiving m_o_data_request means there's no page in the
+      memory object at that offset, so m_o_data_supply should work
+    <slpz> are you sure that page is not being installed into the memory
+      object?
+    <braunr> it seems normal it's both absent and busy
+    <braunr> absent because, as sergio said, the page is missing, and busy
+      because the kernel starts a transfer for its content
+    <braunr> i don't understand how you determine the kernel ignores your
+      data_supply
+    <braunr> "because this page stays filled in specified memory object"
+    <braunr> please explain this with more detail
+    <slpz> mcsim: anyway, when truncating a file to a non page-aligned length,
+      you can just zero fill the rest of the page by mapping the object and
+      writing to it with memset/bzero 
+    <braunr> (avoid bzero, it's obsolete)
+    <mcsim> slpz: I'll try it now.
+    <braunr> slpz: i think that's what he's trying to do
+    <mcsim> I don't vm_map it
+    <braunr> how do you zero it then ?
+    <braunr> "I call memory_object_data_supply and supply one zero filled page"
+    <mcsim> First I call mo_lock_request and ask to return this page, then I
+      memset tail and try to mo_data_supply
+    <mcsim> I use this function when I try to replace kr =
+      memory_object_data_supply(reply_to, offset, addr, vm_page_size, FALSE,
+      VM_PROT_NONE, FALSE, MACH_PORT_NULL);
+    <mcsim> where addr points to new data, offset points to old data in
+      object. and reply_to is memory_control which I get as parameter in
+      mo_data_request
+    <braunr> why would you want to vm_map it then ?
+    <mcsim> because mo_data_supply doesn't work.
+    <braunr> mcsim: i still don't see why you want to vm_map
+    <mcsim> I just want to try it.
+    <braunr> but what do you think will happen ?
+    <mcsim> But seems that it doesn't work too, because I can't vm_map
+      memory_object from memory_manager of this object.
+
+
+## IRC, freenode, #hurd, 2012-01-05
+
+    <mcsim> Seems tmpfs works now. The code really needs cleaning, but the main
+      is that it works. So in nearest future it will be ready for merging to
+      master branch. BTW, anyone knows good tutorial about refactoring using
+      git (I have a lot of pointless commits and I want to gather and scatter
+      them to sensible ones).
+    <antrik> I wonder whether he actually got the "proper" tmpfs with the
+      default pager working? or only the hack with a private pager?
+    <mcsim> antrik: with default pager
+    <antrik> mcsim: wow, that's great :-)
+    <antrik> how did you fix it?
+    <mcsim> antrik: The main code I wrote before December, so I forgot some of
+      what exactly I was doing. So I have to look through my code :)
+    <mcsim> antrik: old defpager was using old functions like m_o_data_write
+      instead of m_o_data_return etc. I changed it, mostly because of my
+      misunderstanding. But I hope that this is not a problem.
+
+
+## IRC, freenode, #hurd, 2012-01-18
+
+    <antrik> mcsim: did you publish your in-progress work?
+    <mcsim> there is a branch with working tmpfs in git repository:
+      http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=mplaneta/tmpfs/defpager
+    <jd823592> sorry for interrupting the meeting but i wonder what is a
+      lazyfs?
+    <mcsim> jd823592: lazyfs is tmpfs which uses own pager
+    <antrik> mcsim: ah, nice :-)
+    <antrik> BTW, what exactly did you need to fix to make it work?
+    <mcsim> most fixes were in defpager in default_pager_object_set_size. Also,
+      as i said earlier, I switched to new functions (m_o_data_return instead
+      of m_o_data_write and so on). I said that this was mostly because of my
+      misunderstanding, but it turned out that the new functions support the
+      precious attribute of pages.
+    <mcsim> Also there were some small errors like this:
+    <mcsim>  	  pager->map = (dp_map_t) kalloc (PAGEMAP_SIZE (new_size));
+    <mcsim> 	  memcpy (pager->map, old_mapptr, PAGEMAP_SIZE (old_size));
+    <mcsim> where in second line should be new_size too
+    <mcsim> I removed all warnings in compiling defpager (and this helped to
+      find an error).
+    <antrik> great work :-)
+    <jd823592> tmpfs is nice thing to have :), are there other recent
+      improvements that were not yet published in the previous month?
+    <mcsim> BTW, i measured tmpfs speed and it is up to 6 times faster than
+      ramdisk+ext2fs
+    <antrik> mcsim: whow, that's quite a difference... didn't expect that
+
+
+## IRC, freenode, #hurd, 2012-01-24
+
+    <mcsim> braunr: I'm just wondering is there any messages before hurd
+      breaks. I have quite strange message: memory_object_data_request(0x0,
+      0x0, 0xf000, 0x1000, 0x1) failed, 10000003
+    <braunr> hm i don't think so
+    <braunr> usually it either freezes completely, or it panics because of an
+      exhausted resource
+    <mcsim> where first and second 0x0 are pager and pager_request for memory
+      object in vm_fault_page from gnumach/vm_fault.c
+    <braunr> if you're using the code you're currently working on (which i
+      assume), then look for a bug there first
+    <tschwinge> mcsim: Maybe you're running out of swap?
+    <mcsim> tschwinge: no
+    <braunr> also, translate the error code
+    <mcsim> AFAIR that's MACH_INVALID_DEST
+    <braunr> and what does it mean in this situation ?
+    <mcsim> I've run fsx as long as possible several times. It runs quite long
+      but it breaks in different ways.
+    <mcsim> MACH_SEND_INVALID_DEST
+    <mcsim> this means that kernel tries to call rpc with pager 0x0
+    <mcsim> this is an invalid destination
+    <braunr> null port
+    <braunr> ok
+    <braunr> did the pager die ?
+    <mcsim> When I get this message pager dies, but also computer can suddenly
+      reboot
+    <braunr> i guess the pager crashing makes mach print this error
+    <braunr> but then you may have a dead port instead of a null port, i don't
+      remember the details
+    <mcsim> braunr: thank you.
+    <mcsim> btw, for big file sizes fsx breaks on ext2fs
+    <braunr> could you identify the threshold ?
+    <braunr> and what's fsx exactly ?
+    <mcsim> fsx is a testing utility for filesystems
+    <mcsim> see http://codemonkey.org.uk/projects/fsx/
+    <braunr> ah, written by tevanian
+    <mcsim> threshold seems to be 8Mb
+    <braunr> fyi, avadis tevanian is the main author of the mach 3 core
+      services and VM parts
+    <braunr> well, ext2fs is bugged, we already know that
+    <braunr> old code maintained as well as possible, but still
+    <mcsim> hmm, with 6mb it breaks too
+    <braunr> i guess that it may break on anything larger than a page actually
+      :p
+    <mcsim> When I tested with size of 256kb, fsx worked quite long and didn't
+      break
+    <braunr> mcsim: without knowing exactly what the test actually does, it's
+      hard to tell
+    <mcsim> I see, I just wanted to tell that there are bugs in ext2fs too. But
+      I didn't debug it.
+    <mcsim> fsx performs different operations, like read, write, truncate file,
+      grow file in random order.
+    <braunr> in parellel too ?
+    <braunr> parellel
+    <braunr> parallel*
+    <mcsim> no
+    <mcsim> I run several fsx's parallel on tmpfs, but they break on file with
+      size 8mb.
+    <braunr> that must match something in mach
+    <braunr> s/must/could/ :)
+    <mcsim> braunr: I've pushed my commits to mplaneta/tmpfs/master branch in
+      hurd repository, so you could review it.
+    <braunr> you shouldn't do that just for me :p
+    <braunr> you should do that regularly, and ask for reviews after
+      (e.g. during the meetings) 
+    <mcsim> everyone could do that :)
+    <braunr> i'm quite busy currently unfortunately
+    <braunr> i'll try when i have time, but the best would be to ask very
+      specific questions
+    <braunr> these are usually the faster to answer for people who have the
+      necessary expertise to help you
+    <braunr> fastest*
+    <mcsim> ok.
+    <mcsim> braunr: probably, I was doing something wrong, because now parallel
+      works only for small sizes. Sorry for the misinformation.
+
+
+## IRC, freenode, #hurd, 2012-01-25
+
+    <antrik> braunr: actually, the paging errors are *precisely* the way my
+      system tends to die...
+    <antrik> (it's after about a month of uptime usually though, not a week...)
+    <antrik> tschwinge: in my case at least, I have still plenty of swap when
+      this happens. swap usage is generally at about the amount of physical
+      memory -- I have no idea though whether there is an actual connection, or
+      it's just coincidence
+    <braunr> antrik: ok, your hurd dies because of memory issues, my virtual
+      machines die because of something else (though idk what)
+    <antrik> before I acquired the habit of running my box 24/7 and thus hitting
+      this issue, most of the hangs I experienced were also of a different
+      nature... but very rare in general, except when doing specific
+      problematic actions
+    <mcsim> antrik: yes. Do you get messages like that I posted?
+    <mcsim> here it is: memory_object_data_request(0x0, 0x0, 0xf000, 0x1000,
+      0x1) failed, 10000003
+    <antrik> mcsim: I can't tell for sure (never noted them down, silly me...)
+    <antrik> but I definitely get paging errors right before it hangs
+    <antrik> I guess that was unclear... what I'm trying to say is: I do get
+      memory_object_data_request() failed; but I'm not sure about the
+      parameters
+    <mcsim> antrik: ok. Thank you.
+    <mcsim> I'll try to find something in defpager, but there should be errors
+      in mach too. At least because sometimes computer suddenly reboots during
+      test.
+    <antrik> mcsim: I don't get sudden reboots
+    <antrik> might be a different error
+    <antrik> do you have debugging mode activated in Mach? otherwise it reboots
+      on kernel panics...
+    <mcsim> antrik: no. But usually on kernel panics mach waits for some time
+      showing the error message and only then reboots.
+    <antrik> OK
+    <mcsim> how can I know that tmpfs is stable enough? Correcting errors in
+      kernel to make fsx test work seems to be very complex.
+    <mcsim> *If errors are in kernel.
+    <antrik> well, it seems that you tested it already much more thoroughly
+      than any other code in the Hurd was ever tested :-)
+    <antrik> of course it would be great if you could pinpoint some of the
+      problems you see nevertheless :-)
+    <antrik> but that's not really necessary before declaring tmpfs good enough
+      I'd say
+    <mcsim> ok. I'll describe every error I meet on my userpage
+    <mcsim> but it will take some time, not before weekend.
+    <antrik> don't worry, it's not urgent
+    <antrik> the reason I'd really love to see those errors investigated is
+      that most likely they are the same ones that cause stability problems in
+      actual use...
+    <antrik> having an easy method for reproducing them is already a good start
+    <mcsim> no. they are not the same
+    <mcsim> every time i get different one
+    <mcsim> especially when i just start one process fsx and wait error
+    <antrik> mcsim: have you watched memory stats while running it? if it's
+      related to the problems I'm experiencing, you will probably see rising
+      memory use while the test is running
+    <mcsim> it could be reboot, message, I posted and also fsx could stop
+      telling that something wrong with data
+    <antrik> you get all of these also on ext2?
+    <mcsim> i've done it only once. Here is the log:
+      http://paste.debian.net/153511/
+    <mcsim> I saved "free" output every 30 seconds
+    <mcsim> no. I'll do it now
+    <antrik> would be better to log with "vmstat 1"
+    <mcsim> ok.
+    <mcsim> as you can see, there is not any leak during work. But near end
+      free memory suddenly decreases
+    <antrik> yeah... it's a bit odd, as there is a single large drop, but seems
+      stable again afterwards...
+    <antrik> a more detailed log might shed some light
+    <mcsim> drop at the beginning was when I started translator.
+    <mcsim> what kind of log do you mean?
+    <antrik> vmstat 1 I mean
+    <mcsim> ah...
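In the 2011-11-29 log, mcsim needs the tail of the last page to read as zeros after a truncation to a non-page-aligned length, and slpz suggests simply clearing it with memset. The snippet below is a minimal sketch of that offset arithmetic only. It assumes the page contents are already accessible as an ordinary buffer (in tmpfs the page would first have to be mapped, which is not shown). The helper `zero_page_tail` and the fixed 4096-byte `PAGE_SIZE_ASSUMED` are hypothetical; real Mach code would use `vm_page_size`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; Mach code would use vm_page_size. */
#define PAGE_SIZE_ASSUMED 4096u

/* Hypothetical helper: after truncating a file to new_size, clear the part
   of the last page that now lies beyond end-of-file, so the stale bytes do
   not reappear if the file later grows again.  `page` must hold the
   contents of the page containing offset new_size.  Returns the number of
   bytes cleared (0 when new_size is page-aligned). */
static size_t
zero_page_tail (unsigned char *page, size_t new_size)
{
  assert (page != NULL);

  size_t valid = new_size % PAGE_SIZE_ASSUMED; /* bytes still inside the file */
  if (valid == 0)
    return 0;                                  /* EOF falls on a page boundary */

  memset (page + valid, 0, PAGE_SIZE_ASSUMED - valid);
  return PAGE_SIZE_ASSUMED - valid;
}
```

For example, truncating to `3 * 4096 + 100` bytes keeps the first 100 bytes of the last page and clears the remaining 3996.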
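In the 2012-01-18 log, mcsim quotes a bug from `default_pager_object_set_size`. The new map is allocated with `PAGEMAP_SIZE (new_size)`, but the old contents are copied with `PAGEMAP_SIZE (old_size)`. Which size is safe depends on the direction of the resize. Copying `old_size` overruns the new buffer when the object shrinks, and copying `new_size` reads past the old buffer when it grows. The sketch below copies the smaller of the two, which is safe either way. It is not the defpager code: `dp_map_entry`, `resize_pagemap` and this `PAGEMAP_SIZE` definition are hypothetical stand-ins, and `malloc`/`free` replace `kalloc`/`kfree`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one pagemap entry; the real defpager type
   differs.  In this sketch a block number of 0 means "no backing store". */
typedef struct
{
  unsigned int block;
} dp_map_entry;

/* Hypothetical: bytes needed for a map covering npages pages. */
#define PAGEMAP_SIZE(npages) ((size_t) (npages) * sizeof (dp_map_entry))

/* Resize *map from old_pages to new_pages entries.  Only the entries that
   exist in both the old and the new map are copied, so the copy neither
   overruns the new buffer (shrinking) nor reads past the old one (growing).
   Returns 0 on success, -1 if allocation fails (the old map is kept). */
static int
resize_pagemap (dp_map_entry **map, size_t old_pages, size_t new_pages)
{
  assert (map != NULL);

  size_t keep = old_pages < new_pages ? old_pages : new_pages;
  dp_map_entry *new_map = malloc (PAGEMAP_SIZE (new_pages));
  if (new_map == NULL && new_pages != 0)
    return -1;

  if (keep > 0)
    memcpy (new_map, *map, PAGEMAP_SIZE (keep));
  if (new_pages > keep)                     /* growing: clear the new entries */
    memset (new_map + keep, 0, PAGEMAP_SIZE (new_pages - keep));

  free (*map);
  *map = new_map;
  return 0;
}
```

Taking the minimum keeps the copy length tied to both buffers at once. A build with warnings enabled, which mcsim mentions helped him find this bug, will not by itself catch a mismatched size argument like this one.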