[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

IRC, freenode, #hurd, 2011-10-16:

    <youpi> braunr: I realize that kmem_alloc_wired maps the allocated pages in
      the kernel map
    <youpi> it's a bit of a waste when my allocation is exactly a page size
    <youpi> is there a proper page allocation which would simply return its
      physical address?
    <youpi> pages returned by vm_page_grab may get swapped out, right?
    <youpi> so could it be a matter of calling vm_page_grab then vm_page_wire
      (with lock_queues held) ?
    <braunr> vm_page_grab() is only used at boot iirc
    <braunr> youpi: mapping allocated memory in the kernel map is normal, even
      if it's only a page
    <braunr> the allocated area usually gets merged with an existing
      vm_map_entry
    <braunr> youpi: also, i'm not sure about what you're trying to do here, so
      my answers may be out of scope :p
    <youpi> saving addressing space
    <youpi> with that scheme we're using twice as much addressing space for
      kernel buffers
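
For context: `kmem_alloc_wired` returns wired kernel memory, but it does so by
entering a mapping into the kernel map, so every allocation consumes kernel
virtual address space on top of the physical pages.  A minimal sketch of the
pattern under discussion (the helper name is made up; error handling elided):

    /* One wired page via kmem_alloc_wired: the page is also *mapped* into
       kernel_map, which is the doubled addressing-space cost meant here. */
    #include <vm/vm_kern.h>

    static vm_offset_t alloc_one_wired_page(void)
    {
        vm_offset_t addr;

        if (kmem_alloc_wired(kernel_map, &addr, PAGE_SIZE) != KERN_SUCCESS)
            return 0;
        return addr;    /* free with kmem_free(kernel_map, addr, PAGE_SIZE) */
    }

The question below is whether page-sized allocations can skip the mapping step
entirely.
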
    <braunr> kernel or user task ?
    <youpi> kernel
    <braunr> hm are there so many wired areas ?
    <youpi> several MiBs, yes
    <youpi> there are also the zalloc areas
    <braunr> that's pretty normal
    <youpi> which I've recently increased
    <braunr> hm forget what i've just said about vm_page_grab()
    <braunr> youpi: is there a particular problem caused by kernel memory
      exhaustion ?
    <youpi> I currently can't pass the dh_strip stage of iceweasel due to this
    <youpi> it cannot allocate a stack
    <braunr> a kernel thread stack ?
    <youpi> yes
    <braunr> that's surprising
    <youpi> it'd be good to have a kernel memory profile
    <braunr> vminfo is able to return the kernel map
    <youpi> well, it's not surprising if the kernel map is full
    <youpi> but it doesn't tell what allocates which pieces
    <braunr> that's what i mean, it's surprising to have a full kernel map
    <youpi> (that's what profiling is about)
    <braunr> right
    <youpi> well, it's not really surprising, considering that the kernel map
      size is arbitrarily chosen
    <braunr> youpi: 2 GiB really ought to be enough
    <youpi> it's not 2GiB, precisely
    <youpi> much of the 2GiB address space is spent on the physical memory
      mapping
    <youpi> then there is the virtual mapping
    <braunr> are you sure we're talking about the kernel map, or e.g. the kmem
      map
    <youpi> which is currently only 192MiB
    <youpi> the kmem_map part of kernel_map
    <braunr> ok, the kmem_map submap then
    <braunr> netbsd has used 128 MiB for years with almost no issue
    <braunr> mach uses more kernel objects so it's reasonable to have a bigger
      map
    <braunr> but that big ..
    <youpi> I've made the zalloc areas quite big
    <youpi> err, not only zalloc area
    <braunr> kernel stacks are allocated directly from the kernel map
    <youpi> kalloc to 64MiB, zalloc to 64MiB
    <youpi> ipc map size to 8MiB
    <braunr> youpi: it could be the lack of kernel map entries
    <youpi> and the device io map to 16MiB
    <braunr> do you have the exact error message ?
    <youpi> no more room for vm_map_find_entry in 71403294
    <youpi> no more rooom for kmem_alloc_aligned in 71403294
    <braunr> ah, aligned
    <youpi> for a stack
    <youpi> which is 4 pages only
    <braunr> kmem functions always return memory in page units
    <youpi> and my xen driver is allocating 1MiB memory for the network buffer
    <braunr> 4 pages for kernel stacks ?
    <youpi> through kmem_alloc_wired
    <braunr> that seems a lot
    <youpi> that's needed for xen page updates
    <youpi> without having to split the update in several parts
    <braunr> ok
    <braunr> but are there any alignment requirements ?
    <youpi> I guess mach uses the alignment trick to find "self"
    <youpi> anyway, an alignment on 4 pages shouldn't be a problem
    <braunr> i think kmem_alloc_aligned() is the generic function used both for
      requests with and without alignment constraints
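
The "alignment trick" mentioned here: if kernel stacks are aligned on their
own size, the stack base (and anything stored there, such as a pointer to the
owning thread) can be recovered from any stack pointer with a single mask.  An
illustration with invented constants, not gnumach's literal code:

    /* Hypothetical illustration of the alignment trick: with stacks
       aligned on KERNEL_STACK_SIZE, masking any in-stack pointer
       yields the stack base in O(1). */
    #include <mach/vm_param.h>  /* PAGE_SIZE */

    #define KERNEL_STACK_SIZE   (4 * PAGE_SIZE)

    static inline vm_offset_t stack_base(vm_offset_t sp)
    {
        return sp & ~((vm_offset_t) KERNEL_STACK_SIZE - 1);
    }

This is also why stack allocation requests size-aligned memory in the first
place.
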
    <youpi> so I was thinking about at least moving my xen net driver to
      vm_page_grab instead of kmem_alloc
    <youpi> and along this, maybe others
    <braunr> but you only get a vm_page, you can't access the memory it
      describes
    <youpi> no, alloc_aligned always aligns
    <youpi> why?
    <braunr> because it's not mapped
    <youpi> there's even vm_grab_page_physical_addr
    <youpi> it is, in the physical memory map
    <braunr> ah, you mean using the direct mapped area
    <youpi> yes
    <braunr> then yes
    <braunr> i don't know that part much
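
What is being proposed, roughly: grab a physical page without entering it into
any object or map, and access it through the direct physical mapping.  A
sketch, assuming gnumach's `vm_page_grab` and the i386 `phystokv` macro
(prototypes vary between Mach variants, so the details are indicative only):

    /* Sketch: a page-sized allocation with no kernel_map entry; the
       page is reached through the direct physical mapping instead. */
    #include <vm/vm_page.h>

    static void *grab_direct_mapped_page(void)
    {
        vm_page_t m = vm_page_grab(FALSE);  /* FALSE: not an external page */

        if (m == VM_PAGE_NULL)
            return NULL;
        /* The page is in no object and on no pageout queue. */
        return (void *) phystokv(m->phys_addr);
    }

Whether such a page still needs explicit wiring is exactly the next point.
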
    <youpi> what I'm afraid of is the wiring
    <braunr> why ?
    <youpi> because I don't want to see my page swapped out :)
    <youpi> or whatever might happen if I don't wire it
    <braunr> oh i'm almost sure you won't
    <youpi> why?
    <youpi> why would some people need to wire it, and not me?
    <braunr> because in most mach vm derived code i've seen, you have to
      explicitly tell the vm your area is pageable
    <youpi> ah, mach does such a thing indeed
    <braunr> wiring can be annoying when you're passing kernel data to external
      tasks
    <braunr> you have to make sure the memory isn't wired once passed
    <braunr> but that's rather a security/resource management problem
    <youpi> in the net driver case, it's not passed to anything else
    <youpi> I'm seeing 19MiB kmem_alloc_wired atm
    <braunr> looks ok to me
    <braunr> be aware that the vm_resident code was meant for single page
      allocations
    <youpi> what does this mean?
    <braunr> there is no buddy algorithm or anything else decent enough wrt
      performance
    <braunr> vm_page_grab_contiguous_pages() can be quite slow
    <youpi> err, ok, but how does that relate to the question at hand ?
    <braunr> you need 4 pages of direct mapped memory for stacks
    <braunr> those pages need to be physically contiguous if you want to avoid
      the mapping
    <braunr> allocating physically contiguous pages in mach is slow
    <braunr> :)
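
To make that cost concrete: without a buddy system, finding n physically
contiguous free pages degenerates into a linear scan over the resident pages,
along these lines (illustrative only; the helpers are invented, and this is
not the actual `vm_page_grab_contiguous_pages` code):

    /* Illustrative only: a naive O(total pages) search for a run of free
       pages, which is roughly why contiguous allocation is slow without a
       buddy allocator.  Both helpers below are hypothetical. */
    extern unsigned long npages_total;          /* hypothetical */
    extern int page_is_free(unsigned long i);   /* hypothetical */

    static long find_contiguous_run(unsigned long npages)
    {
        unsigned long i, run = 0;

        for (i = 0; i < npages_total; i++) {
            run = page_is_free(i) ? run + 1 : 0;
            if (run == npages)
                return (long) (i - npages + 1); /* first page of the run */
        }
        return -1;  /* no such run */
    }

youpi clarifies that contiguity is not what he is after:
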
    <youpi> I didn't mean I wanted to avoid the mapping for stacks
    <youpi> for anything more than a page, kmem mapping should be fine
    <youpi> I'm concerned with code which allocates one page at a time
    <youpi> which thus really doesn't need any mapping
    <braunr> i don't know the mach details but in derived implementations,
      there is almost no overhead when allocating single pages
    <braunr> except for the tlb programming
    <youpi> well, there is: twice as much addressing space
    <braunr> well
    <braunr> netbsd doesn't directly map physical memory
    <braunr> and for others, like freebsd
    <braunr> the area is just one vm_map_entry
    <braunr> and on modern mmus, 4 MiB physical mappings are used in pmap
    <youpi> again, I don't care about tlb & performance
    <youpi> just about the addressing space
    <braunr> hm
    <braunr> you say "twice"
    <youpi> which is short when you're trying to link crazy stuff like
      iceweasel & co
    <youpi> yes
    <braunr> ok, the virtual space is doubled
    <youpi> yes
    <braunr> but the resources consumed to describe them aren't
    <braunr> even on mach
    <youpi> since you have both the physical mapping and the kmem mapping
    <youpi> I don't care much about the resources
    <youpi> but about addressing space
    <braunr> well there are a limited number of solutions
    <youpi> the space it takes has to be taken from something else, that is,
      here, physical memory available to Mach
    <braunr> reduce the physical mapping
    <braunr> increase the kmem mapping
    <braunr> or reduce kernel memory consumption
    <youpi> and instead of taking the space from the physical mapping, we can
      as well avoid doubling the space consumption when it's trivial to do so
    <youpi> yes, the third
    <youpi> that's what I've been asking from the beginning :)
    <braunr> 18:21 < youpi> I don't care much about the resources
    <braunr> actually, you care :)
    <youpi> yes and no
    <braunr> i understand what you mean
    <youpi> not in the sense "it takes a page_t to allocate a page"
    <braunr> you want more virtual space, and aren't that much concerned about
      the number of objects used
    <youpi> yes
    <braunr> then it makes sense
    <braunr> but in this case, it could be a good idea to generalize it
    <braunr> have our own kmalloc/vmalloc interfaces
    <braunr> maybe a gsoc project :)
    <youpi> err, don't we have them already?
    <youpi> I mean, what exactly do you want to generalize?
    <braunr> mach only ever had vmalloc
    <youpi> we already have a hell lot of allocators :)
    <youpi> and it's a pain to distribute the available space to them
    <braunr> yes
    <braunr> but what you basically want is to introduce something similar to
      kmalloc for single pages
    <youpi> or just patch the few cases that need it to grab a page directly
    <youpi> there are probably not so many
    <braunr> ok
    <braunr> i've just read vm_page_grab()
    <braunr> it only removes a page from the free list
    <braunr> other functions such as vm_page_alloc insert them afterwards
    <braunr> if a page is in no list, it can't be paged out
    <braunr> so i think it's probably safe to assume it's naturally wired
    <braunr> you don't even need a call to vm_page_wire or a flag of some sort
    <youpi> ok
    <braunr> although considering the low amount of work done by
      vm_page_wire(), you could, for clarity
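
Put together, the "grab a page directly" idea amounts to a thin wrapper like
the following sketch (the helper name is hypothetical; the `vm_page_wire` call
is the optional for-clarity step just mentioned, done with the page queues
locked as suggested at the start of the log):

    /* Hypothetical helper: a page-sized kernel allocation without a
       kernel_map entry, in the spirit of the discussion above. */
    static void *page_alloc_direct(void)
    {
        vm_page_t m = vm_page_grab(FALSE);

        if (m == VM_PAGE_NULL)
            return NULL;
        /* Optional: a grabbed page sits on no queue, so it cannot be
           paged out anyway; wiring just makes that state explicit. */
        vm_page_lock_queues();
        vm_page_wire(m);
        vm_page_unlock_queues();
        return (void *) phystokv(m->phys_addr);
    }

A matching release path (presumably through `vm_page_release`) would be needed
as well.
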
    <youpi> braunr: I was also wondering about the VM_PAGE_FREE_RESERVED & such
      constants
    <youpi> they're like 50 pages
    <youpi> is this still reasonable nowadays?
    <braunr> that's a question i'm still asking myself quite often :)
    <youpi> also, the BURST_{MAX,MIN} & such in vm_pageout.c are probably out
      of date?
    <braunr> i didn't study the pageout code much
    <youpi> k
    <braunr> but correct handling of low memory thresholds is a good point to
      keep in mind
    <braunr> youpi: i often wonder how linux can sometimes have so few free
      pages left and still be able to work without any visible failure
    <youpi> well, as long as you have enough pages to be able to make progress,
      you're fine
    <youpi> that's the point of the RESERVED pages in mach I guess
    <braunr> youpi: yes but, obviously, hard values are *bad*
    <braunr> linux must adjust it, depending on the number of processors, the
      number of pdflush threads, probably other things i don't have in mind
    <braunr> i don't know what should make us adjust that value in mach
    <youpi> which value?
    <braunr> the size of the reserved pool
    <youpi> I don't think it's adjusted
    <braunr> that's what i'm saying
    <braunr> i guess there is an #ifndef line for arch specific definitions
    <youpi> err, you just said linux must adjust it :))
    <youpi> there is none
    <braunr> linux adjusts it dynamically
    <braunr> well ok
    <braunr> that's another way to say it
    <braunr> we don't have code to get rid of this macro
    <braunr> but i don't even know how we, as maintainers, are supposed to
      guess it
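
For illustration, "adjusting it dynamically" could mean computing the reserve
at boot from machine parameters instead of hard-coding it.  The formula and
names below are invented for the sake of the example; this is not what Linux
or gnumach actually do:

    /* Hypothetical: size the reserved-page pool at boot instead of
       using a hard constant.  The formula is invented for illustration. */
    static unsigned long compute_free_reserved(unsigned long total_pages,
                                               unsigned int ncpus)
    {
        /* About 0.1% of memory, at least the historical 50 pages,
           plus a little slack per processor. */
        unsigned long reserved = total_pages / 1024;

        if (reserved < 50)
            reserved = 50;
        return reserved + 4 * ncpus;
    }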