[[!meta copyright="Copyright © 2010, 2011, 2013 Free Software Foundation, Inc."]] [[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]] [[!tag open_issue_gnumach open_issue_hurd]] # IRC, freenode, #hurd, 2010 <slpz> humm... why does tmpfs try to use the default pager? that's a bad idea, and probably will never work correctly... * slpz is thinking about old issues <slpz> tmpfs should create its own pagers, just like ext2fs, storeio... <slpz> slopez@slp-hurd:~$ settrans -a tmp /hurd/tmpfs 10M <slpz> slopez@slp-hurd:~$ echo "foo" > tmp/bar <slpz> slopez@slp-hurd:~$ cat tmp/bar <slpz> foo <slpz> slopez@slp-hurd:~$ <slpz> :-) <pochu> slpz: woo you fixed it? <slpz> pochu: well, it's WIP, but reading/writing works... <slpz> I've replaced the use of default pager for the standard pager creation mechanism <antrik> slpz: err... how is it supposed to use swap space if not using the default pager? <antrik> slpz: or do you mean that it should act as a proxy, just allocating anonymous memory (backed by the default pager) itself? <youpi> antrik: the kernel uses the default pager if the application pager isn't responsive enough <slpz> antrik: it will just create memory objects and provide zerofilled pages when requested by the kernel (after a page fault) <antrik> youpi: that makes sense I guess... but how is that relevant to the question at hand?... <slpz> antrik: memory objects will contain the data by themselves <slpz> antrik: as youpi said, when memory is scarce, GNU Mach will start paging out data from memory objects to the default pager <slpz> antrik: that's the way in which pages will get into swap space <slpz> (if needed) <youpi> the thing being that the tmpfs pager has a chance to select pages he doesn't care any more about <antrik> slpz: well, the point is that instead of writing the pages to a backing store, tmpfs will just keep them in anonymous memory, and let the default pager write them out when there is pressure, right? <antrik> youpi: no idea what you are talking about. apparently I still don't really understand this stuff :-( <youpi> ah, but tmpfs doesn't have pages he doesn't care about, does it? <slpz> antrik: yes, but the term "anonymous memory" could be a bit confusing. <slpz> antrik: in GNU Mach, anonymous memory is backed by a memory object without a pager. In tmpfs, nodes will be allocated in memory objects, and the pager for those memory objects will be tmpfs itself <antrik> slpz: hm... I thought anynymous memory is backed by memory objects created from the default pager? <antrik> yes, I understand that tmpfs is supposed to be the pager for the objects it provides. they are obviously not anonymoust -- they have inodes in the tmpfs name space <antrik> but my understanding so far was that when Mach returns pages to the pager, they end up in anonymous memory allocated to the pager process; and then this pager is responsible for writing them back to the actual backing store <antrik> am I totally off there?... <antrik> (i.e. in my understanding the returned pages do not reside in the actual memory object the pager provides, but in an anonymous memory object) <slpz> antrik: you're right. The trick here is, when does Mach return the pages? <slpz> antrik: if we set the attribute "can_persist" in a memory object, Mach will keep it until object cache is full or memory is scarce <slpz> or we change the attributes so it can no longer persist, of course <slpz> without a backing store, if Mach starts sending us pages to be written, we're in trouble <slpz> so we must do something about it. One option, could be creating another pager and copying the contents between objects. <antrik> another pager? not sure what you mean <antrik> BTW, you didn't really say why we can't use the default pager for tmpfs objects :-) <slpz> well, there're two problems when using the default pager as backing store for translators <slpz> 1) Mach relies on it to do swapping tasks, so meddling with it is not a good idea <slpz> 2) There're problems with seqnos when trying to work with the default pager from tasks other the kernel itself <slpz> (probably, the latter could be fixed) <slpz> antrik: pager's terminology is a bit confusing. One can also say creating another memory object (though the function in libpager is "pager_create") <antrik> not sure why "meddling" with it would be a problem... <antrik> and yeah, I was vaguely aware that there is some seqno problem with tmpfs... though so far I didn't really understand what it was about :-) <antrik> makes sense now <antrik> anyways, AIUI now you are trying to come up with a mechanism where the default pager is not used for tmpfs objects directly, but without making it inefficient? <antrik> slpz: still don't understand what you mean by creating another memory object/pager... <antrik> (and yeat, the terminology is pretty mixed up even in Mach itself) <slpz> antrik: I meant creating another pager, in terms of calling again to libpager's pager_create <antrik> slpz: well, I understand what "create another pager" means... I just don't understand what this other pager would be, when you would create it, and what for... <slpz> antrik: oh, ok, sorry <slpz> antrik: creating another pager it's just a trick to avoid losing information when Mach's objects cache is full, and it decides to purge one of our objects <slpz> anyway, IMHO object caching mechanism is obsolete and should be replaced <slpz> I'm writting a comment to bug #28730 which says something about this <slpz> antrik: just one more thing :-) <slpz> if you look at the code, for most time of their lives, anonymous memory objects don't have a pager <slpz> not even the default one <slpz> only the pageout thread, when the system is running really low on memory, gives them a reference to the default pager by calling vm_object_pager_create <slpz> this is not really important, but worth noting ;-) # IRC, freenode, #hurd, 2011-09-28 <slpz> mcsim: "Fix tmpfs" task should be called "Fix default pager" :-) <slpz> mcsim: I've been thinking about modifying tmpfs to actually have it's own storeio based backend, even if a tmpfs with storage sounds a bit stupid. <slpz> mcsim: but I don't like the idea of having translators messing up with the default pager... <antrik> slpz: messing up?... <slpz> antrik: in the sense of creating a number of arbitrarily sized objects <antrik> slpz: well, it doesn't really matter much whether a process indirectly eats up arbitrary amounts of swap through tmpfs, or directly through vm_allocate()... <antrik> though admittedly it's harder to implement resource limits with tmpfs <slpz> antrik: but I've talked about having its own storeio device as backend. This way Mach can pageout memory to tmpfs if it's needed. <mcsim> Do I understand correctly that the goal of tmpfs task is to create tmpfs in RAM? <slpz> mcsim: It is. But it also needs some kind of backend, just in case it's ordered to page out data to free some system's memory. <slpz> mcsim: Nowadays, this backend is another translator that acts as default pager for the whole system <antrik> slpz: pageout memory to tmpfs? not sure what you mean <slpz> antrik: I mean tmpfs acting as its own pager <antrik> slpz: you mean tmpfs not using the swap partition, but some other backing store? <slpz> antrik: Yes. See also: [[open_issues/resource_management_problems/pagers]]. <antrik> slpz: I don't think an extra backing store for tmpfs is a good idea. the whole point of tmpfs is not having a backing store... TBH, I'd even like to see a single backing store for anonymous memory and named files <slpz> antrik: But you need a backing store, even if it's the default pager :-) <slpz> antrik: The question is, Should users share the same backing store (swap space) or provide their own? <antrik> slpz: not sure what you mean by "users" in this context :-) <slpz> antrik: Real users with the ability of setting tmpfs translators <antrik> essentially, I'd like to have a single partition that contains both swap space and the main filesystem (at least /tmp, but probably also all of /run, and possibly even /home...) <antrik> but that's a bit off-topic :-) <antrik> well, ideally all storage should be accounted to a user, regardless whether it's swapped out anonymous storage, temporary named files, or permanent files <slpz> antrik: you could use a file as backend for tmpfs <antrik> slpz: what's the point of using tmpfs then? :-) <pinotree> (and then store the file in another tmpfs) <slpz> antrik: mach-defpager could be modified to use storeio instead of Mach's device_* operations, but by the way things work right now, that could be dangerous, IMHO <antrik> pinotree: hehe <pinotree> .. recursive tmpfs'es ;) <antrik> slpz: hm, sounds interesting <slpz> antrik: tmpfs would try to keep data in memory always it's possible (not calling m_o_lock_request would do the trick), but if memory is scarce an Mach starts paging out, it would write it to that file/device/whatever <antrik> ideally, all storage used by system tasks for swapped out anonymous memory as well as temporary named files would end up on the /run partition; while all storage used by users would end up in /home/* <antrik> if users share a partition, some explicit storage accounting would be useful too... <antrik> slpz: is that any different from what "normal" filesystems do?... <antrik> (and *should* it be different?...) <slpz> antrik: Yes, as most FS try to synchronize to disk at a reasonable rate, to prevent data losses. <slpz> antrik: tmpfs would be a FS that wouldn't synchronize until it's forced to do that (which, by the way, it's what's currently happening with everyone that uses the default pager). <antrik> slpz: hm, good point... <slpz> antrik: Also, metadata in never written to disk, only kept in memory (which saves a lot of I/O, too). <slpz> antrik: In fact, we would be doing the same as every other kernel does, but doing it explicitly :-) <antrik> I see the use in separating precious data (in permanent named files) from temporary state (anonymous memory and temporary named files) -- but I'm not sure whether having a completely separate FS for the temporary data is the right approach for that... <slpz> antrik: And giving the user the option to specify its own storage, so we don't limit him to the size established for swap by the super-user. <antrik> either way, that would be a rather radical change... still would be good to fix tmpfs as it is first if possible <antrik> as for limited swap, that's precisely why I'd prefer not to have an extra swap partition at all... <slpz> antrik: It's not much o fa change, it's how it works right now, with the exception of replacing the default pager with its own. <slpz> antrik: I think it's just a matter of 10-20 hours, as much. Including testing. <slpz> antrik: It could be forked with another name, though :-) <antrik> slpz: I don't mean radical change in the implementation... but a radical change in the way it would be used <slpz> antrik: I suggest "almosttmpfs" as the name for the forked one :-P <antrik> hehe <antrik> how about lazyfs? <slpz> antrik: That sound good to me, but probably we should use a more descriptive name :-) ## 2011-09-29 <tschwinge> slpz, antrik: There is a defpager in the Hurd code. It is not currently being used, and likely incomplete. It is backed by libstore. I have never looked at it. [[open_issues/mach-defpager_vs_defpager]]. # IRC, freenode, #hurd, 2011-11-08 <mcsim> who else uses defpager besides tmpfs and kernel? <braunr> normally, nothing directly <mcsim> than why tmpfs should use defpager? <braunr> it's its backend <braunr> backign store rather <braunr> the backing store of most file systems are partitions <braunr> tmpfs has none, it uses the swap space <mcsim> if we allocate memory for tmpfs using vm_allocate, will it be able to use swap partition? <braunr> it should <braunr> vm_allocate just maps anonymous memory <braunr> anonymous memory uses swap space as its backing store too <braunr> but be aware that this part of the vm system is known to have deficiencies <braunr> which is why all mach based implementations have rewritten their default pager <mcsim> what kind of deficiencies? <braunr> bugs <braunr> and design issues, making anonymous memory fragmentation horrible <antrik> mcsim: vm_allocate doesn't return a memory object; so it can't be passed to clients for mmap() <mcsim> antrik: I use vm_allocate in pager_read_page <antrik> mcsim: well, that means that you have to actually implement a pager yourself <antrik> also, when the kernel asks the pager to write back some pages, it expects the memory to become free. if you are "paging" to ordinary anonymous memory, this doesn't happen; so I expect it to have a very bad effect on system performance <antrik> both can be avoided by just passing a real anonymous memory object, i.e. one provided by the defpager <antrik> only problem is that the current defpager implementation can't really handle that... <antrik> at least that's my understanding of the situation # IRC, freenode, #hurd, 2013-07-05 <teythoon> btw, why does the tmpfs translator have to talk to the pager? <teythoon> to get more control about how the memory is paged out? <teythoon> read lot's of irc logs about tmpfs on the wiki, but I couldn't find the answer to that <mcsim> teythoon: did you read this? http://www.gnu.org/software/hurd/hurd/translator/tmpfs/tmpfs_vs_defpager.html <teythoon> mcsim: I did <mcsim> teythoon: Last discussion, i think has very good point. <mcsim> To provide memory objects you should implement pager interface <mcsim> And if you implement pager interface you are the one who is asked to write data to backing storage to evict them <mcsim> But tmpfs doesn't do this <teythoon> mmm, clients doing mmap... <mcsim> teythoon: You don't have mmap <mcsim> teythoon: mmap is implemented on top of mach interface <mcsim> teythoon: I mean you don't have mmap at this level <teythoon> mcsim: sure, but that's close enough for me at this point <mcsim> teythoon: diskfs interface requires implementor to provide a memory object port (send right) <mcsim> Guest8183: Why tmpfs requires defpager <Guest8183> how did you get to talk about that ? <mcsim> I was just asked <teythoon> Guest8183: it's just so unsettling that tmpfs has to be started as root :/ <Guest8183> teythoon: why ? *** Guest8183 (~rbraun@dalaran.sceen.net) is now known as braunr_ <teythoon> braunr_: b/c starting translators isn't a privileged operation, and starting a tmpfs translator that doesn't even access any device but "just" memory shouldn't require any special privileges as well imho <teythoon> so why is tmpfs not based on say libnetfs? b/c it is used for d-i and someone (apt?) mmaps stuff? <pinotree> being libdiskfs-based isn't much the issue, iirc <pinotree> http://lists.gnu.org/archive/html/bug-hurd/2013-03/msg00014.html too <kilobug> teythoon: AFAIK apt uses mmap, yes <braunr_> teythoon: right <braunr_> a ramfs is actually tricky to implement well <mcsim> braunr_: What do you mean under "to implement well"? <braunr_> as efficiently as possible <braunr_> i.e. being as close as possible to the page cache for minimum overhead <mcsim> braunr: AFAIK ramfs should not use swap partition, so page cache shouldn't be relevant for it. <braunr> i'm talking about a ramfs in general <braunr> not the specific linux ramfs <braunr> in linux, what they call ramfs is the tiny version of tmpfs that doesn't use swap <braunr> i actually don't like "tmpfs" much <braunr> memfs may be more appropriate <braunr> anyway <mcsim> braunr: I see. And do you consider defpager variant as "close as possible to the page cache"? <braunr> not far at least <braunr> if we were able to use it for memory obects, it would be nice <braunr> but defpager only gets attached to objects when they're evicted <braunr> before that, anonymous (or temporary, in mach terminology) objects have no backing store <braunr> this was probably designed without having tmpfs in mind <braunr> i wonder if it's possible to create a memory object without a backing store <mcsim> what should happen to it if kernel decides to evict it? <braunr> it sets the default pager as its backing store and pushes it out <mcsim> that's how it works now, but you said "create a memory object without a backing store" <braunr> mach can do that <braunr> i was wondering if we could do that too from userspace <mcsim> mach does not evict such objects, unless it bound a defpager to them <mcsim> but how can you handle this in userspace? <braunr> i mean, create a memory object with a null control port <braunr> mcsim: is that clearer ? <mcsim> suppose you create such object, how kernel will evict it if kernel does not know who is responsible for eviction of this object? <braunr> it does <braunr> 16:41 < braunr> it sets the default pager as its backing store and pushes it out <braunr> that's how i intend to do it on x15 at least <braunr> but it's much simpler there because uvm provides better separation between anonymous and file memory <braunr> whereas they're much too similar in mach vm <mcsim> than what the difference between current situation, when you explicitly invoke defpager to create object and implicit method you propose? <braunr> you don't need a true defpager unless you actually have swap <mcsim> ok <mcsim> now I see <braunr> it also saves the communication overhead when initializing the object <mcsim> thank you <braunr> which may be important since we use ramfs for speed mostly <mcsim> agree <braunr> it should also simplify the defpager implementation, since it would only have a single client, the kernel <braunr> which may also be important with regard to global design <braunr> one thing which is in my opinion very wrong with mach is that it may be a client <braunr> a well designed distributed system should normally not allow on component to act as both client and server toward another <braunr> i.e. the kernel should only be a server, not a client <braunr> and there should be a well designed server hierarchy to avoid deadlocks <braunr> (such as the one we had in libpager because of that) <mcsim> And how about filesystem? It acts both as server and as client <braunr> yes <braunr> but not towards the same other component <braunr> application -> file system -> kernel <braunr> no "<->" <braunr> the qnx documentation explains that quite well <braunr> let me see if i can find the related description <mcsim> Basically, I've got your point. And I would rather agree that kernel should not act as client <braunr> mcsim: http://www.qnx.com/developers/docs/6.4.0/neutrino/sys_arch/ipc.html#Robust <braunr> one way to implement that (and qnx does that too) is to make pagers act as client only <braunr> they sleep in the kernel, waiting for a reply <braunr> and when the kernel needs to evict something, a reply is sent <braunr> (qnx doesn't actually do that for paging, but it's a general idea) <mcsim> braunr: how hierarchy of senders is enforced? <braunr> it's not <braunr> developers must take care <braunr> same as locking, be careful about it