[[!meta copyright="Copyright © 2010, 2011, 2013 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable id="license" text="Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled [[GNU Free Documentation License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach open_issue_hurd]]


# IRC, freenode, #hurd, 2010

    humm... why does tmpfs try to use the default pager? that's a bad idea, and probably will never work correctly...
    * slpz is thinking about old issues
    tmpfs should create its own pagers, just like ext2fs, storeio...
    slopez@slp-hurd:~$ settrans -a tmp /hurd/tmpfs 10M
    slopez@slp-hurd:~$ echo "foo" > tmp/bar
    slopez@slp-hurd:~$ cat tmp/bar
    foo
    slopez@slp-hurd:~$
    :-)
    slpz: woo you fixed it?
    pochu: well, it's WIP, but reading/writing works...
    I've replaced the use of the default pager with the standard pager creation mechanism
    slpz: err... how is it supposed to use swap space if not using the default pager?
    slpz: or do you mean that it should act as a proxy, just allocating anonymous memory (backed by the default pager) itself?
    antrik: the kernel uses the default pager if the application pager isn't responsive enough
    antrik: it will just create memory objects and provide zerofilled pages when requested by the kernel (after a page fault)
    youpi: that makes sense I guess... but how is that relevant to the question at hand?...
    antrik: memory objects will contain the data by themselves
    antrik: as youpi said, when memory is scarce, GNU Mach will start paging out data from memory objects to the default pager
    antrik: that's the way in which pages will get into swap space (if needed)
    the thing being that the tmpfs pager has a chance to select pages he doesn't care any more about
    slpz: well, the point is that instead of writing the pages to a backing store, tmpfs will just keep them in anonymous memory, and let the default pager write them out when there is pressure, right?
    youpi: no idea what you are talking about. apparently I still don't really understand this stuff :-(
    ah, but tmpfs doesn't have pages he doesn't care about, does it?
    antrik: yes, but the term "anonymous memory" could be a bit confusing.
    antrik: in GNU Mach, anonymous memory is backed by a memory object without a pager. In tmpfs, nodes will be allocated in memory objects, and the pager for those memory objects will be tmpfs itself
    slpz: hm... I thought anonymous memory is backed by memory objects created from the default pager?
    yes, I understand that tmpfs is supposed to be the pager for the objects it provides. they are obviously not anonymous -- they have inodes in the tmpfs name space
    but my understanding so far was that when Mach returns pages to the pager, they end up in anonymous memory allocated to the pager process; and then this pager is responsible for writing them back to the actual backing store
    am I totally off there?...
    (i.e. in my understanding the returned pages do not reside in the actual memory object the pager provides, but in an anonymous memory object)
    antrik: you're right. The trick here is, when does Mach return the pages?
    antrik: if we set the attribute "can_persist" in a memory object, Mach will keep it until object cache is full or memory is scarce
    or we change the attributes so it can no longer persist, of course
    without a backing store, if Mach starts sending us pages to be written, we're in trouble
    so we must do something about it. One option could be creating another pager and copying the contents between objects.
    another pager? not sure what you mean
    BTW, you didn't really say why we can't use the default pager for tmpfs objects :-)
    well, there're two problems when using the default pager as backing store for translators
    1) Mach relies on it to do swapping tasks, so meddling with it is not a good idea
    2) There're problems with seqnos when trying to work with the default pager from tasks other than the kernel itself
    (probably, the latter could be fixed)
    antrik: pager's terminology is a bit confusing. One can also say creating another memory object (though the function in libpager is "pager_create")
    not sure why "meddling" with it would be a problem...
    and yeah, I was vaguely aware that there is some seqno problem with tmpfs... though so far I didn't really understand what it was about :-)
    makes sense now
    anyways, AIUI now you are trying to come up with a mechanism where the default pager is not used for tmpfs objects directly, but without making it inefficient?
    slpz: still don't understand what you mean by creating another memory object/pager...
    (and yeah, the terminology is pretty mixed up even in Mach itself)
    antrik: I meant creating another pager, in terms of calling again to libpager's pager_create
    slpz: well, I understand what "create another pager" means... I just don't understand what this other pager would be, when you would create it, and what for...
    antrik: oh, ok, sorry
    antrik: creating another pager is just a trick to avoid losing information when Mach's objects cache is full, and it decides to purge one of our objects
    anyway, IMHO object caching mechanism is obsolete and should be replaced
    I'm writing a comment to bug #28730 which says something about this
    antrik: just one more thing :-)
    if you look at the code, for most time of their lives, anonymous memory objects don't have a pager
    not even the default one
    only the pageout thread, when the system is running really low on memory, gives them a reference to the default pager by calling vm_object_pager_create
    this is not really important, but worth noting ;-)


# IRC, freenode, #hurd, 2011-09-28

    mcsim: "Fix tmpfs" task should be called "Fix default pager" :-)
    mcsim: I've been thinking about modifying tmpfs to actually have its own storeio based backend, even if a tmpfs with storage sounds a bit stupid.
    mcsim: but I don't like the idea of having translators messing up with the default pager...
    slpz: messing up?...
    antrik: in the sense of creating a number of arbitrarily sized objects
    slpz: well, it doesn't really matter much whether a process indirectly eats up arbitrary amounts of swap through tmpfs, or directly through vm_allocate()...
    though admittedly it's harder to implement resource limits with tmpfs
    antrik: but I've talked about having its own storeio device as backend. This way Mach can pageout memory to tmpfs if it's needed.
    Do I understand correctly that the goal of tmpfs task is to create tmpfs in RAM?
    mcsim: It is. But it also needs some kind of backend, just in case it's ordered to page out data to free some system's memory.
    mcsim: Nowadays, this backend is another translator that acts as default pager for the whole system
    slpz: pageout memory to tmpfs? not sure what you mean
    antrik: I mean tmpfs acting as its own pager
    slpz: you mean tmpfs not using the swap partition, but some other backing store?
    antrik: Yes.

See also: [[open_issues/resource_management_problems/pagers]].

    slpz: I don't think an extra backing store for tmpfs is a good idea. the whole point of tmpfs is not having a backing store... TBH, I'd even like to see a single backing store for anonymous memory and named files
    antrik: But you need a backing store, even if it's the default pager :-)
    antrik: The question is, Should users share the same backing store (swap space) or provide their own?
    slpz: not sure what you mean by "users" in this context :-)
    antrik: Real users with the ability of setting tmpfs translators
    essentially, I'd like to have a single partition that contains both swap space and the main filesystem (at least /tmp, but probably also all of /run, and possibly even /home...)
    but that's a bit off-topic :-)
    well, ideally all storage should be accounted to a user, regardless whether it's swapped out anonymous storage, temporary named files, or permanent files
    antrik: you could use a file as backend for tmpfs
    slpz: what's the point of using tmpfs then? :-)
    (and then store the file in another tmpfs)
    antrik: mach-defpager could be modified to use storeio instead of Mach's device_* operations, but by the way things work right now, that could be dangerous, IMHO
    pinotree: hehe .. recursive tmpfs'es ;)
    slpz: hm, sounds interesting
    antrik: tmpfs would try to keep data in memory whenever it's possible (not calling m_o_lock_request would do the trick), but if memory is scarce and Mach starts paging out, it would write it to that file/device/whatever
    ideally, all storage used by system tasks for swapped out anonymous memory as well as temporary named files would end up on the /run partition; while all storage used by users would end up in /home/*
    if users share a partition, some explicit storage accounting would be useful too...
    slpz: is that any different from what "normal" filesystems do?... (and *should* it be different?...)
    antrik: Yes, as most FS try to synchronize to disk at a reasonable rate, to prevent data losses.
    antrik: tmpfs would be a FS that wouldn't synchronize until it's forced to do that (which, by the way, is what's currently happening with everyone that uses the default pager).
    slpz: hm, good point...
    antrik: Also, metadata is never written to disk, only kept in memory (which saves a lot of I/O, too).
    antrik: In fact, we would be doing the same as every other kernel does, but doing it explicitly :-)
    I see the use in separating precious data (in permanent named files) from temporary state (anonymous memory and temporary named files) -- but I'm not sure whether having a completely separate FS for the temporary data is the right approach for that...
    antrik: And giving the user the option to specify its own storage, so we don't limit him to the size established for swap by the super-user.
    either way, that would be a rather radical change...
    still would be good to fix tmpfs as it is first if possible
    as for limited swap, that's precisely why I'd prefer not to have an extra swap partition at all...
    antrik: It's not much of a change, it's how it works right now, with the exception of replacing the default pager with its own.
    antrik: I think it's just a matter of 10-20 hours, at most. Including testing.
    antrik: It could be forked with another name, though :-)
    slpz: I don't mean radical change in the implementation... but a radical change in the way it would be used
    antrik: I suggest "almosttmpfs" as the name for the forked one :-P
    hehe
    how about lazyfs?
    antrik: That sounds good to me, but probably we should use a more descriptive name :-)


## 2011-09-29

    slpz, antrik: There is a defpager in the Hurd code. It is not currently being used, and likely incomplete. It is backed by libstore. I have never looked at it.

[[open_issues/mach-defpager_vs_defpager]].


# IRC, freenode, #hurd, 2011-11-08

    who else uses defpager besides tmpfs and kernel?
    normally, nothing directly
    then why tmpfs should use defpager?
    it's its backend
    backing store rather
    the backing store of most file systems are partitions
    tmpfs has none, it uses the swap space
    if we allocate memory for tmpfs using vm_allocate, will it be able to use swap partition?
    it should
    vm_allocate just maps anonymous memory
    anonymous memory uses swap space as its backing store too
    but be aware that this part of the vm system is known to have deficiencies
    which is why all mach based implementations have rewritten their default pager
    what kind of deficiencies?
    bugs
    and design issues, making anonymous memory fragmentation horrible
    mcsim: vm_allocate doesn't return a memory object; so it can't be passed to clients for mmap()
    antrik: I use vm_allocate in pager_read_page
    mcsim: well, that means that you have to actually implement a pager yourself
    also, when the kernel asks the pager to write back some pages, it expects the memory to become free. if you are "paging" to ordinary anonymous memory, this doesn't happen; so I expect it to have a very bad effect on system performance
    both can be avoided by just passing a real anonymous memory object, i.e. one provided by the defpager
    only problem is that the current defpager implementation can't really handle that...
    at least that's my understanding of the situation


# IRC, freenode, #hurd, 2013-07-05

    btw, why does the tmpfs translator have to talk to the pager?
    to get more control about how the memory is paged out?
    read lots of irc logs about tmpfs on the wiki, but I couldn't find the answer to that
    teythoon: did you read this? http://www.gnu.org/software/hurd/hurd/translator/tmpfs/tmpfs_vs_defpager.html
    mcsim: I did
    teythoon: Last discussion, i think has very good point.
    To provide memory objects you should implement pager interface
    And if you implement pager interface you are the one who is asked to write data to backing storage to evict them
    But tmpfs doesn't do this
    mmm, clients doing mmap...
    teythoon: You don't have mmap
    teythoon: mmap is implemented on top of mach interface
    teythoon: I mean you don't have mmap at this level
    mcsim: sure, but that's close enough for me at this point
    teythoon: diskfs interface requires implementor to provide a memory object port (send right)
    Guest8183: Why tmpfs requires defpager
    how did you get to talk about that ?
    I was just asked
    Guest8183: it's just so unsettling that tmpfs has to be started as root :/
    teythoon: why ?
    *** Guest8183 (~rbraun@dalaran.sceen.net) is now known as braunr_
    braunr_: b/c starting translators isn't a privileged operation, and starting a tmpfs translator that doesn't even access any device but "just" memory shouldn't require any special privileges as well imho
    so why is tmpfs not based on say libnetfs? b/c it is used for d-i and someone (apt?) mmaps stuff?
    being libdiskfs-based isn't much the issue, iirc
    http://lists.gnu.org/archive/html/bug-hurd/2013-03/msg00014.html too
    teythoon: AFAIK apt uses mmap, yes
    teythoon: right
    a ramfs is actually tricky to implement well
    braunr_: What do you mean under "to implement well"?
    as efficiently as possible
    i.e. being as close as possible to the page cache
    for minimum overhead
    braunr: AFAIK ramfs should not use swap partition, so page cache shouldn't be relevant for it.
    i'm talking about a ramfs in general
    not the specific linux ramfs
    in linux, what they call ramfs is the tiny version of tmpfs that doesn't use swap
    i actually don't like "tmpfs" much
    memfs may be more appropriate
    anyway
    braunr: I see. And do you consider defpager variant as "close as possible to the page cache"?
    not far at least
    if we were able to use it for memory objects, it would be nice
    but defpager only gets attached to objects when they're evicted
    before that, anonymous (or temporary, in mach terminology) objects have no backing store
    this was probably designed without having tmpfs in mind
    i wonder if it's possible to create a memory object without a backing store
    what should happen to it if kernel decides to evict it?
    it sets the default pager as its backing store and pushes it out
    that's how it works now, but you said "create a memory object without a backing store"
    mach can do that
    i was wondering if we could do that too from userspace
    mach does not evict such objects, unless it bound a defpager to them
    but how can you handle this in userspace?
    i mean, create a memory object with a null control port
    mcsim: is that clearer ?
    suppose you create such object, how kernel will evict it if kernel does not know who is responsible for eviction of this object?
    it does
    16:41 < braunr> it sets the default pager as its backing store and pushes it out
    that's how i intend to do it on x15 at least
    but it's much simpler there because uvm provides better separation between anonymous and file memory
    whereas they're much too similar in mach vm
    then what's the difference between current situation, when you explicitly invoke defpager to create object and implicit method you propose?
    you don't need a true defpager unless you actually have swap
    ok
    now I see
    it also saves the communication overhead when initializing the object
    thank you
    which may be important since we use ramfs for speed mostly
    agree
    it should also simplify the defpager implementation, since it would only have a single client, the kernel
    which may also be important with regard to global design
    one thing which is in my opinion very wrong with mach is that it may be a client
    a well designed distributed system should normally not allow one component to act as both client and server toward another
    i.e.
    the kernel should only be a server, not a client
    and there should be a well designed server hierarchy to avoid deadlocks
    (such as the one we had in libpager because of that)
    And how about filesystem? It acts both as server and as client
    yes
    but not towards the same other component
    application -> file system -> kernel
    no "<->"
    the qnx documentation explains that quite well
    let me see if i can find the related description
    Basically, I've got your point. And I would rather agree that kernel should not act as client
    mcsim: http://www.qnx.com/developers/docs/6.4.0/neutrino/sys_arch/ipc.html#Robust
    one way to implement that (and qnx does that too) is to make pagers act as client only
    they sleep in the kernel, waiting for a reply
    and when the kernel needs to evict something, a reply is sent
    (qnx doesn't actually do that for paging, but it's a general idea)
    braunr: how hierarchy of senders is enforced?
    it's not
    developers must take care
    same as locking, be careful about it
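
The "application -> file system -> kernel, no `<->`" rule from the last discussion can be pictured as a directed graph of who sends requests to whom. The following is only a toy sketch in Python, not Hurd or Mach code: the function `find_send_cycle` and the two example graphs are hypothetical names made up for illustration. It assumes that a component blocks while waiting for a reply, so any cycle in the graph is a potential deadlock. As noted in the log, nothing enforces such a hierarchy at run time; a check like this would at best be a design-time aid.

```python
from typing import Dict, List, Optional


def find_send_cycle(sends_to: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of components, or None if the graph is a strict hierarchy.

    sends_to maps a component name to the components it sends requests to
    (hypothetical model; the names are illustrative only).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for peer in sends_to.get(node, []):
            state = colour.get(peer, WHITE)
            if state == GREY:
                # peer is already on the current path: client and server roles loop back
                return path[path.index(peer):] + [peer]
            if state == WHITE:
                cycle = visit(peer)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(sends_to):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Strict hierarchy: each component is only a client of the one below it.
layered = {"application": ["file system"], "file system": ["kernel"], "kernel": []}

# Assumed Mach-like situation: the kernel also sends paging requests to the
# file system (its pager), so the two are client and server of each other.
mach_like = {"application": ["file system"], "file system": ["kernel"], "kernel": ["file system"]}
```

Calling `find_send_cycle(layered)` finds no cycle, while `find_send_cycle(mach_like)` reports the loop between the file system and the kernel.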