[[!meta copyright="Copyright © 2011 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_gnumach]]

IRC, freenode, #hurd, 2011-10-16:

    <youpi> braunr: I realize that kmem_alloc_wired maps the allocated pages in
      the kernel map
    <youpi> it's a bit of a waste when my allocation is exactly a page size
    <youpi> is there a proper page allocation which would simply return its
      physical address?
    <youpi> pages returned by vm_page_grab may get swapped out, right?
    <youpi> so could it be a matter of calling vm_page_grab then vm_page_wire
      (with lock_queues held) ?
    <braunr> vm_page_grab() is only used at boot iirc
    <braunr> youpi: mapping allocated memory in the kernel map is normal, even
      if it's only a page
    <braunr> the allocated area usually gets merged with an existing
      vm_map_entry
    <braunr> youpi: also, i'm not sure about what you're trying to do here, so
      my answers may be out of scope :p
    <youpi> saving addressing space
    <youpi> with that scheme we're using twice as much addressing space for
      kernel buffers
    <braunr> kernel or user task ?
    <youpi> kernel
    <braunr> hm are there so many wired areas ?
    <youpi> several MiBs, yes
    <youpi> there are also the zalloc areas
    <braunr> that's pretty normal
    <youpi> which I've recently increased
    <braunr> hm forget what i've just said about vm_page_grab()
    <braunr> youpi: is there a particular problem caused by kernel memory
      exhaustion ?
    <youpi> I currently can't pass the dh_strip stage of iceweasel due to this
    <youpi> it cannot allocate a stack
    <braunr> a kernel thread stack ?
    <youpi> yes
    <braunr> that's surprising
    <youpi> it'd be good to have a kernel memory profile
    <braunr> vminfo is able to return the kernel map
    <youpi> well, it's not surprising if the kernel map is full
    <youpi> but it doesn't tell what allocates which pieces
    <braunr> that's what i mean, it's surprising to have a full kernel map
    <youpi> (that's what profiling is about)
    <braunr> right
    <youpi> well, it's not really surprising, considering that the kernel map
      size is arbitrarily chosen
    <braunr> youpi: 2 GiB is really high enough
    <youpi> it's not 2GiB, precisely
    <youpi> there is much of the 2GiB addr space which is spent on physical
      memory mapping
    <youpi> then there is the virtual mapping
    <braunr> are you sure we're talking about the kernel map, or e.g. the kmem
      map
    <youpi> which is currently only 192MiB
    <youpi> the kmem_map part of kernel_map
    <braunr> ok, the kmem_map submap then
    <braunr> netbsd has used 128 MiB for years with almost no issue
    <braunr> mach uses more kernel objects so it's reasonable to have a bigger
      map
    <braunr> but that big ..
    <youpi> I've made the zalloc areas quite big
    <youpi> err, not only zalloc area
    <braunr> kernel stacks are allocated directly from the kernel map
    <youpi> kalloc to 64MiB, zalloc to 64MiB
    <youpi> ipc map size to 8MiB
    <braunr> youpi: it could be the lack of kernel map entries
    <youpi> and the device io map to 16MiB
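
For reference, the submap sizes quoted above are compile-time constants in
gnumach.  The sketch below only illustrates the kind of definitions involved;
the variable names and values are assumptions, not necessarily the
identifiers used in the source (see kern/kalloc.c and kern/zalloc.c for the
real ones):

    /* Illustrative only: the submaps discussed above are sized by
     * constants of this kind; the names are assumptions.  */
    vm_size_t kalloc_map_size     = 64 * 1024 * 1024;  /* kalloc submap */
    vm_size_t zone_map_size       = 64 * 1024 * 1024;  /* zalloc submap */
    vm_size_t ipc_kernel_map_size =  8 * 1024 * 1024;  /* IPC submap */
    vm_size_t device_io_map_size  = 16 * 1024 * 1024;  /* device I/O submap */
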
    <braunr> do you have the exact error message ?
    <youpi> no more room for vm_map_find_entry in 71403294
    <youpi> no more rooom for kmem_alloc_aligned in 71403294
    <braunr> ah, aligned
    <youpi> for a stack
    <youpi> which is 4 pages only
    <braunr> kmem functions always return memory in page units
    <youpi> and my xen driver is allocating 1MiB memory for the network buffer
    <braunr> 4 pages for kernel stacks ?
    <youpi> through kmem_alloc_wire
    <braunr> that seems a lot
    <youpi> that's needed for xen page updates
    <youpi> without having to split the update in several parts
    <braunr> ok
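
kmem_alloc_wired() is what doubles the consumption being discussed here: on
this configuration physical memory is already direct-mapped, and the wired
pages additionally get a fresh mapping in kernel_map.  A minimal sketch of
the usual call, for a 1 MiB buffer like the Xen network buffer above:

    /* Allocating a wired kernel buffer the usual way: this consumes
     * physical pages *and* a kernel map entry mapping them, hence the
     * doubled use of kernel address space when physical memory is also
     * direct-mapped.  */
    vm_offset_t buf;
    kern_return_t kr;

    kr = kmem_alloc_wired(kernel_map, &buf, 1024 * 1024);
    if (kr != KERN_SUCCESS)
        panic("cannot allocate network buffer");
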
    <braunr> but are there any alignment requirements ?
    <youpi> I guess mach uses the alignment trick to find "self"
    <youpi> anyway, an alignment on 4pages shouldn't be a problem
    <braunr> i think kmem_alloc_aligned() is the generic function used both for
      requests with and without alignment constraints
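
The "self" trick mentioned above: Mach allocates kernel stacks on a boundary
equal to their size, so the stack base (and whatever control data is stored
there) can be recovered from the stack pointer with a single mask, which is
why stacks go through kmem_alloc_aligned().  A sketch of the technique,
assuming a 4-page stack; this illustrates the idea, it is not the literal
gnumach code:

    /* With a stack aligned on its own (power-of-two) size, finding the
     * base from any stack pointer inside it is a single mask.  */
    #define KERNEL_STACK_SIZE (4 * PAGE_SIZE)   /* assumption: 4 pages */

    static inline vm_offset_t stack_base(vm_offset_t sp)
    {
        return sp & ~((vm_offset_t) KERNEL_STACK_SIZE - 1);
    }

    static vm_offset_t stack_alloc_example(void)
    {
        vm_offset_t stack;

        /* kmem_alloc_aligned() aligns the result on the requested
         * size, which must be a power of two.  */
        if (kmem_alloc_aligned(kernel_map, &stack, KERNEL_STACK_SIZE)
            != KERN_SUCCESS)
            panic("stack_alloc_example");
        return stack;
    }
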
    <youpi> so I was thinking about at least moving my xen net driver to
      vm_grab_page instead of kmem_alloc
    <youpi> and along this, maybe others
    <braunr> but you only get a vm_page, you can't access the memory it
      describes
    <youpi> no, alloc_aligned always aligns
    <youpi> why?
    <braunr> because it's not mapped
    <youpi> there's even vm_grab_page_physical_addr
    <youpi> it is, in the physical memory map
    <braunr> ah, you mean using the direct mapped area
    <youpi> yes
    <braunr> then yes
    <braunr> i don't know that part much
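
The alternative being sketched here: grab a page from the free list and reach
it through the region where physical memory is direct-mapped, so that no
kernel map entry is consumed at all.  phystokv() is gnumach's i386
physical-to-virtual conversion; the argument to vm_page_grab() shown below is
an assumption (it differs between Mach variants):

    /* Single-page allocation that bypasses kernel_map entirely: the
     * page is accessed through the direct physical mapping.  */
    static kern_return_t grab_one_page(void **pp)
    {
        vm_page_t m;

        m = vm_page_grab(FALSE);    /* argument is an assumption */
        if (m == VM_PAGE_NULL)
            return KERN_RESOURCE_SHORTAGE;

        /* Direct mapping: no vm_map_entry, no new kernel mapping.  */
        *pp = (void *) phystokv(m->phys_addr);
        return KERN_SUCCESS;
    }
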
    <youpi> what I'm afraid of is the wiring
    <braunr> why ?
    <youpi> because I don't want to see my page swapped out :)
    <youpi> or whatever might happen if I don't wire it
    <braunr> oh i'm almost sure you won't
    <youpi> why?
    <youpi> why some people need to wire it, and I won't?
    <braunr> because in most mach vm derived code i've seen, you have to
      explicitly tell the vm your area is pageable
    <youpi> ah, mach does such a thing indeed
    <braunr> wiring can be annoying when you're passing kernel data to external
      tasks
    <braunr> you have to make sure the memory isn't wired once passed
    <braunr> but that's rather a security/resource management problem
    <youpi> in the net driver case, it's not passed to anything else
    <youpi> I'm seeing 19MiB kmem_alloc_wired atm
    <braunr> looks ok to me
    <braunr> be aware that the vm_resident code was meant for single page
      allocations
    <youpi> what does this mean?
    <braunr> there is no buddy algorithm or anything else decent enough wrt
      performance
    <braunr> vm_page_grab_contiguous_pages() can be quite slow
    <youpi> err, ok, but what is the relation with the question at stake ?
    <braunr> you need 4 pages of direct mapped memory for stacks
    <braunr> those pages need to be physically contiguous if you want to avoid
      the mapping
    <braunr> allocating physically contiguous pages in mach is slow
    <braunr> :)
    <youpi> I didn't mean I wanted to avoid the mapping for stacks
    <youpi> for anything more than a page, kmem mapping should be fine
    <youpi> I'm concerned with code which allocates page by page
    <youpi> which thus really doesn't need any mapping
    <braunr> i don't know the mach details but in derived implementations,
      there is almost no overhead when allocating single pages
    <braunr> except for the tlb programming
    <youpi> well, there is: twice as much addressing space
    <braunr> well
    <braunr> netbsd doesn't directly map physical memory
    <braunr> and for others, like freebsd
    <braunr> the area is just one vm_map_entry
    <braunr> and on modern mmus, 4 MiB physical mappings are used in pmap
    <youpi> again, I don't care about tlb & performance
    <youpi> just about the addressing space
    <braunr> hm
    <braunr> you say "twice"
    <youpi> which is short when you're trying to link crazy stuff like
      iceweasel & co
    <youpi> yes
    <braunr> ok, the virtual space is doubled
    <youpi> yes
    <braunr> but the resources consumed to describe them aren't
    <braunr> even on mach
    <youpi> since you have both the physical mapping and the kmem mapping
    <youpi> I don't care much about the resources
    <youpi> but about addressing space
    <braunr> well there are a limited number of solutions
    <youpi> the space it takes has to be taken from something else, that is,
      here, physical memory available to Mach
    <braunr> reduce the physical mapping
    <braunr> increase the kmem mapping
    <braunr> or reduce kernel memory consumption
    <youpi> and instead of taking the space from the physical mapping, we can
      as well avoid doubling the space consumption when it's trivial to
    <youpi> yes, the third
    <youpi> that's what I'm asking from the beginning :)
    <braunr> 18:21 < youpi> I don't care much about the resources
    <braunr> actually, you care :)
    <youpi> yes and no
    <braunr> i understand what you mean
    <youpi> not in the sense "it takes a page_t to allocate a page"
    <braunr> you want more virtual space, and aren't that much concerned about
      the number of objects used
    <youpi> yes
    <braunr> then it makes sense
    <braunr> but in this case, it could be a good idea to generalize it
    <braunr> have our own kmalloc/vmalloc interfaces
    <braunr> maybe a gsoc project :)
    <youpi> err, don't we have them already?
    <youpi> I mean, what exactly do you want to generalize?
    <braunr> mach only ever had vmalloc
    <youpi> we already have a hell lot of allocators :)
    <youpi> and it's a pain to distribute the available space to them
    <braunr> yes
    <braunr> but what you basically want is to introduce something similar to
      kmalloc for single pages
    <youpi> or just patch the few cases that need it into just grabbing a page
    <youpi> there are probably not so many
    <braunr> ok
    <braunr> i've just read vm_page_grab()
    <braunr> it only removes a page from the free list
    <braunr> other functions such as vm_page_alloc insert them afterwards
    <braunr> if a page is in no list, it can't be paged out
    <braunr> so i think it's probably safe to assume it's naturally wired
    <braunr> you don't even need a call to vm_page_wire or a flag of some sort
    <youpi> ok
    <braunr> although considering the low amount of work done by
      vm_page_wire(), you could, for clarity
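
Putting that conclusion in code: a page coming straight from vm_page_grab()
belongs to no object and sits on no page queue, so the pageout daemon never
considers it; the vm_page_wire() call is therefore optional, but cheap and
self-documenting.  Same assumptions as in the sketch above:

    /* A grabbed page cannot be paged out (it is on no queue), but
     * wiring it documents the intent and keeps the wired-page
     * accounting accurate.  */
    vm_page_t m = vm_page_grab(FALSE);  /* argument is an assumption */

    if (m != VM_PAGE_NULL) {
        vm_page_lock_queues();
        vm_page_wire(m);                /* optional, for clarity */
        vm_page_unlock_queues();
    }
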
    <youpi> braunr: I was also wondering about the VM_PAGE_FREE_RESERVED & such
      constants
    <youpi> they're like 50 pages
    <youpi> is this still reasonable nowadays?
    <braunr> that's a question i'm still asking myself quite often :)
    <youpi> also, the BURST_{MAX,MIN} & such in vm_pageout.c are probably out
      of date?
    <braunr> i didn't study the pageout code much
    <youpi> k
    <braunr> but correct handling of low memory thresholds is a good point to
      keep in mind
    <braunr> youpi: i often wonder how linux can sometimes have so few free
      pages left and still be able to work without any visible failure
    <youpi> well, as long as you have enough pages to be able to make progress,
      you're fine
    <youpi> that's the point of the RESERVED pages in mach I guess
    <braunr> youpi: yes but, obviously, hard values are *bad*
    <braunr> linux must adjust it, depending on the number of processors, the
      number of pdflush threads, probably other things i don't have in mind
    <braunr> i don't know what should make us adjust that value in mach
    <youpi> which value?
    <braunr> the size of the reserved pool
    <youpi> I don't think it's adjusted
    <braunr> that's what i'm saying
    <braunr> i guess there is an #ifndef line for arch specific definitions
    <youpi> err, you just said linux must adjust it :))
    <youpi> there is none
    <braunr> linux adjusts it dynamically
    <braunr> well ok
    <braunr> that's another way to say it
    <braunr> we don't have code to get rid of this macro
    <braunr> but i don't even know how we, as maintainers, are supposed to
      guess it
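
For comparison, Linux derives its equivalent reserve (min_free_kbytes) from
the amount of low memory at boot, growing roughly with its square root,
rather than using a fixed constant.  A purely illustrative sketch of what a
boot-time adjusted reserve could look like in Mach; the function, bounds and
scaling are made up for the example:

    /* Purely illustrative: size the reserved pool from the physical
     * page count at boot instead of a fixed VM_PAGE_FREE_RESERVED,
     * loosely mimicking how Linux computes min_free_kbytes.  */
    static unsigned long compute_free_reserved(unsigned long total_pages)
    {
        unsigned long r = 1;

        while ((r + 1) * (r + 1) <= total_pages)    /* integer sqrt */
            r++;

        if (r < 50)     /* never below the historical constant */
            r = 50;
        if (r > 4096)   /* cap the reserve on large machines */
            r = 4096;
        return r;
    }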