[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

The `mmap` call is generally supported on GNU Hurd, as indicated by
`_POSIX_MAPPED_FILES` (`sysconf (_SC_MAPPED_FILES)`).
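
The run-time half of that check can be sketched as follows.  The helper name
`have_mapped_files` is hypothetical; the sketch assumes only POSIX `sysconf`,
which returns -1 if the option is not supported.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: report whether memory-mapped files are available at
   run time.  sysconf returns -1 if the option is unsupported (or on
   error), and a positive value otherwise.  */
static bool
have_mapped_files (void)
{
  return sysconf (_SC_MAPPED_FILES) > 0;
}
```

A caller would typically test `have_mapped_files ()` once and choose between
`mmap` and plain `read` accordingly.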


# Flags

*Flags contain mapping type, sharing type and options.*

  * *Mapping type (must choose one and only one of these).*

      * `MAP_FILE` (*Mapped from a file or device.*)

      * `MAP_ANON`/`MAP_ANONYMOUS` (*Allocated from anonymous virtual memory.*)

    Even though it is not defined to zero (it is for the Linux kernel; why not
    for us?), `MAP_FILE` is the default and can be omitted.

  * *Sharing types (must choose one and only one of these).*

      * `MAP_SHARED` (*Share changes.*)

      * `MAP_PRIVATE` (*Changes private; copy pages on write.*)

      * `MAP_COPY` (*Virtual copy of region at mapping time.*)

    For us, `MAP_PRIVATE` is the default (it is defined to zero); for the
    Linux kernel, one of `MAP_SHARED` or `MAP_PRIVATE` has to be specified
    explicitly.

    The Linux kernel does not support `MAP_COPY`, and as per the comment in
    `elf/dl-load.c`, `MAP_PRIVATE | MAP_DENYWRITE` is Linux' replacement for
    `MAP_COPY`.  However, `MAP_DENYWRITE` is defunct (`mmap` manpage).

    In contrast to `MAP_COPY`, for `MAP_PRIVATE` *it is unspecified whether
    changes made to the file after the `mmap` call are visible in the mapped
    region* (`mmap` manpage).

    `MAP_COPY`:

        What exactly is that?  `elf/dl-load.c` has some explanation.
        <http://lkml.indiana.edu/hypermail/linux/kernel/0110.1/1506.html>

        It is only handled in `dl-sysdep.c`, where `flags &
        (MAP_COPY|MAP_PRIVATE)` is used for
        [[`vm_map`|microkernel/mach/interface/vm_map]]'s `copy` parameter;
        `mmap.c` uses `! (flags & MAP_SHARED)` instead, which seems
        inconsistent?

        Usage in glibc:

          * `catgets/open_catalog.c:__open_catalog`,
            `locale/loadlocale.c:_nl_load_locale`: *Linux seems to lack read-only
            copy-on-write.*

  * `MAP_TYPE` (*Mask for type field.*/*Mask for type of mapping.*)

    [[!tag open_issue_glibc]]In `bits/mman.h` this is described and defined to
    be a mask for the *mapping* type; in the `bits/mman.h` files corresponding
    to the Linux kernel it is described and defined to be a mask for the
    *sharing* type.

  * *Other flags.*

      * `MAP_FIXED` (*Map address must be exactly as requested.*)

        If the memory region is already in use, an unmap is attempted before
        (re-)mapping it.

        [[!tag open_issue_glibc]]The following text should be improved:

        `[glibc]/llio.texi` says:
        
            @var{address} gives a preferred starting address for the mapping.
            @code{NULL} expresses no preference. Any previous mapping at that
            address is automatically removed. [...]

        The comments in `misc/sys/mman.h`, `misc/mmap.c`, `misc/mmap64.c`,
        `ports/sysdeps/unix/sysv/linux/hppa/mmap.c`, and
        `sysdeps/mach/hurd/mmap.c` have a better wording:

            A successful `mmap' call
            deallocates any previous mapping for the affected region.

        This is correct insofar as for `MAP_FIXED` the region is indeed first
        unmapped if already in use, and in the regular case an address is
        chosen that has no previous mapping.

      * `MAP_NOEXTEND` (*For `MAP_FILE`, don't change file size.*)

        Referenced in `[hurd]/TODO` as unimplemented.

      * `MAP_HASSEMPHORE` (*Region may contain semaphores.*)

      * `MAP_INHERIT` (*Region is retained after exec.*)

  * Linux-specific flags

      * `MAP_GROWSDOWN` (*Stack-like segment.*), `MAP_GROWSUP` (*Register
        stack-like segment.*)

        See `mmap` manpage.

      * `MAP_DENYWRITE` (*`ETXTBSY`*)

        As per the comment in `elf/dl-load.c`, `MAP_PRIVATE | MAP_DENYWRITE` is
        Linux' replacement for `MAP_COPY`.  However, `MAP_DENYWRITE` is defunct
        (`mmap` manpage).

      * `MAP_EXECUTABLE` (*Mark it as an executable.*)

      * `MAP_LOCKED` (*Lock the mapping.*)

        ... à la `mlock`.  Not implemented for us, but probably
        could[[open_issue_glibc]].

      * `MAP_NORESERVE` (*Don't check for reservations.*)

        See `mmap` manpage.

        From [[hurd/porting/guidelines]]: *Not POSIX, but we could implement
        it.*

      * `MAP_POPULATE` (*Populate (prefault) pagetables.*)

        From the `mmap` manpage:

            Populate (prefault) page tables for a mapping.  For a file mapping,
            this causes read-ahead on the file.  Later accesses to the mapping
            will not be blocked by page faults.  MAP_POPULATE is only supported
            for private mappings since Linux 2.6.23.

        Unknown Linux kernel version, `mm/mmap.c`:

                if (vm_flags & VM_LOCKED) {
                        if (!mlock_vma_pages_range(vma, addr, addr + len))
                                mm->locked_vm += (len >> PAGE_SHIFT);
                } else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK))
                        make_pages_present(addr, addr + len);
                return addr;

        This flag is only advisory, so it can be worked around with
        `#define MAP_POPULATE 0`, 8069478040336a7de3461be275432493cc7e4c91.

      * `MAP_NONBLOCK` (*Do not block on IO.*)

        From the `mmap` manpage:

            Only meaningful in conjunction with MAP_POPULATE.  Don't perform
            read-ahead: only create page tables entries for pages that are
            already present in RAM.  Since Linux 2.6.23, this flag causes
            MAP_POPULATE to do nothing.  One day the combination of
            MAP_POPULATE and MAP_NONBLOCK may be reimplemented.

      * `MAP_STACK` (*Allocation is for a stack.*)

        See `mmap` manpage.

      * `MAP_HUGETLB` (*Create huge page mapping.*)

        See `mmap` manpage.

      * `MAP_32BIT` (*Only give out 32-bit addresses.*)

        See `mmap` manpage.
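
Portable code could cope with the flag differences listed above roughly as in
the following sketch, which follows the style of the `#ifdef` blocks in
`elf/dl-misc.c`: flags that a system lacks get fallback definitions.  The
helper name `map_readonly` is hypothetical.  Falling back from `MAP_COPY` to
plain `MAP_PRIVATE` is an assumption that gives up the guarantee that later
changes to the file stay invisible in the mapping.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Fallbacks for flags that are not available everywhere.  Defining the
   optional ones to zero turns them into no-ops.  */
#ifndef MAP_FILE
# define MAP_FILE 0
#endif
#ifndef MAP_COPY
# define MAP_COPY MAP_PRIVATE   /* Assumption: weaker than a real MAP_COPY.  */
#endif
#ifndef MAP_POPULATE
# define MAP_POPULATE 0         /* Only advisory, so dropping it is safe.  */
#endif

/* Hypothetical helper: map the first LEN bytes of FD read-only.
   Returns MAP_FAILED on error, like mmap itself.  */
static void *
map_readonly (int fd, size_t len)
{
  return mmap (NULL, len, PROT_READ,
               MAP_FILE | MAP_COPY | MAP_POPULATE, fd, 0);
}
```

Because the sharing type is always given explicitly, the same call works both
where `MAP_PRIVATE` is zero and where the kernel insists on one of
`MAP_SHARED` or `MAP_PRIVATE`.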


# Implementation

Essentially, `mmap` is implemented by means of
[[`io_map`|hurd/interface/io_map]] (not for `MAP_ANON`) followed by
[[`vm_map`|microkernel/mach/interface/vm_map]].

There are two implementations: `sysdeps/mach/hurd/mmap.c` (main implementation)
and `sysdeps/mach/hurd/dl-sysdep.c` (*Minimal mmap implementation sufficient
for initial loading of shared libraries.*).


## `mmap ("/dev/zero")`

[[!tag open_issue_glibc open_issue_hurd]]Do we implement that (equivalently to
`MAP_ANON`)?
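
A small probe like the following sketch could help answer that.  The helper
names are hypothetical, and the sketch assumes that a private mapping of
`/dev/zero` should yield zero-filled, writable memory, just as `MAP_ANON`
does.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map LEN bytes of /dev/zero privately.
   Returns MAP_FAILED on error.  */
static void *
map_dev_zero (size_t len)
{
  int fd = open ("/dev/zero", O_RDWR);
  if (fd < 0)
    return MAP_FAILED;
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  close (fd);                   /* The mapping survives closing FD.  */
  return p;
}

/* Return 1 if the LEN bytes at P all read as zero and the first byte can be
   written, 0 otherwise.  */
static int
is_zero_filled_and_writable (void *p, size_t len)
{
  unsigned char *c = p;
  for (size_t i = 0; i < len; i++)
    if (c[i] != 0)
      return 0;
  c[0] = 1;                     /* Faults if the page is not writable.  */
  return c[0] == 1;
}
```

Running the same check on a `MAP_ANON` mapping of equal size gives the
reference behaviour to compare against.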


## Mapping Size

From the `mmap` manpage:

    A file is mapped in multiples of the page size.  For a file that is not a
    multiple of the page size, the remaining memory is zeroed when mapped, and
    writes to that region are not written out to the file.  The effect of
    changing the size of the underlying file of a mapping on the pages that
    correspond to added or removed regions of the file is unspecified.

[[!tag open_issue_glibc]]Do we implement that?
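
The zero-filling part of that description could be checked with a sketch
like the following.  The helper name `tail_is_zeroed` is hypothetical, and it
does not cover the second part (that writes to the tail are not written out
to the file).

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the whole (non-empty) file FD and check that the
   bytes between end of file and the end of the last page read as zero.
   Returns 1 if so, 0 if not, -1 on error.  */
static int
tail_is_zeroed (int fd)
{
  struct stat st;
  long pagesize = sysconf (_SC_PAGESIZE);
  if (pagesize <= 0 || fstat (fd, &st) != 0 || st.st_size <= 0)
    return -1;

  size_t ps = (size_t) pagesize;
  size_t size = (size_t) st.st_size;
  /* Round the file size up to a multiple of the page size.  */
  size_t maplen = (size + ps - 1) / ps * ps;

  unsigned char *p = mmap (NULL, maplen, PROT_READ, MAP_PRIVATE, fd, 0);
  if (p == MAP_FAILED)
    return -1;

  int ok = 1;
  for (size_t i = size; i < maplen; i++)
    if (p[i] != 0)
      ok = 0;

  munmap (p, maplen);
  return ok;
}
```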


## Use of a Mapped Region

From the `mmap` manpage:

    Use of a mapped region can result in these signals:
    
    SIGSEGV Attempted write into a region mapped as read-only.
    
    SIGBUS  Attempted access to a portion of the buffer that does not
            correspond to the file (for example, beyond the end of the file,
            including the case where another process has truncated the file).

[[!tag open_issue_glibc]]Do we implement that?


# Usage in glibc itself

Review of `mmap` usage in generic bits of glibc (omitted: `nptl/`,
`sysdeps/unix/sparc/`, `sysdeps/unix/sysv/linux/`), based on
a1bcbd4035ac2483dc10da150d4db46f3e1744f8 (2012-03-11).  `MAP_FILE` is the
interesting case; `MAP_ANON` is generally fine.  Some of the `mmap` usages in
glibc have fallback code for the `MAP_FAILED` case, some do not.

    catgets/open_catalog.c:    (struct catalog_obj *) __mmap (NULL, st.st_size, PROT_READ,
    catgets/open_catalog.c-                                  MAP_FILE|MAP_COPY, fd, 0);

Has fallback for `MAP_FAILED`.

    elf/cache.c:    = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    elf/cache.c:    = mmap (NULL, aux_cache_size, PROT_READ, MAP_PRIVATE, fd, 0);

No fallback for `MAP_FAILED`.

    elf/dl-load.c:        l->l_map_start = (ElfW(Addr)) __mmap ((void *) mappref, maplength,
    elf/dl-load.c-                                              c->prot,
    elf/dl-load.c-                                              MAP_COPY|MAP_FILE,
    elf/dl-load.c-                                              fd, c->mapoff);
    elf/dl-load.c:            && (__mmap ((void *) (l->l_addr + c->mapstart),
    elf/dl-load.c-                        c->mapend - c->mapstart, c->prot,
    elf/dl-load.c-                        MAP_FIXED|MAP_COPY|MAP_FILE,
    elf/dl-load.c-                        fd, c->mapoff)

No fallback for `MAP_FAILED`.

    elf/dl-misc.c:            result = __mmap (NULL, *sizep, prot,
    elf/dl-misc.c-#ifdef MAP_COPY
    elf/dl-misc.c-                             MAP_COPY
    elf/dl-misc.c-#else
    elf/dl-misc.c-                             MAP_PRIVATE
    elf/dl-misc.c-#endif
    elf/dl-misc.c-#ifdef MAP_FILE
    elf/dl-misc.c-                             | MAP_FILE
    elf/dl-misc.c-#endif
    elf/dl-misc.c-                             , fd, 0);

No fallback for `MAP_FAILED`.

    elf/dl-profile.c:  addr = (struct gmon_hdr *) __mmap (NULL, expected_size, PROT_READ|PROT_WRITE,
    elf/dl-profile.c-                                  MAP_SHARED|MAP_FILE, fd, 0);

No fallback for `MAP_FAILED`.

    elf/readlib.c:  file_contents = mmap (0, statbuf.st_size, PROT_READ, MAP_SHARED,
    elf/readlib.c-                        fileno (file), 0);

No fallback for `MAP_FAILED`.

    elf/sprof.c:      result->symbol_map = mmap (NULL, max_offset - min_offset,
    elf/sprof.c-                           PROT_READ, MAP_SHARED|MAP_FILE, symfd,
    elf/sprof.c-                           min_offset);
    elf/sprof.c:  addr = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED|MAP_FILE, fd, 0);

No fallback for `MAP_FAILED`.

    iconv/gconv_cache.c:  gconv_cache = __mmap (NULL, cache_size, PROT_READ, MAP_SHARED, fd, 0);
    iconv/iconv_charmap.c:            && ((addr = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE,
    iconv/iconv_charmap.c-                              fd, 0)) != MAP_FAILED))
    iconv/iconv_prog.c:           && ((addr = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE,
    iconv/iconv_prog.c-                             fd, 0)) != MAP_FAILED))

Have fallback for `MAP_FAILED`.

    intl/loadmsgcat.c:  data = (struct mo_file_header *) mmap (NULL, size, PROT_READ,
    intl/loadmsgcat.c-                                     MAP_PRIVATE, fd, 0);

Has fallback for `MAP_FAILED`.

    libio/fileops.c:        p = __mmap (NULL, st.st_size, PROT_READ, MAP_SHARED,
    libio/fileops.c-                    fp->_fileno, 0);
    libio/fileops.c:      p = __mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fp->_fileno, 0);

Has fallback for `MAP_FAILED`.

    locale/loadarchive.c:      result = __mmap64 (NULL, mapsize, PROT_READ, MAP_FILE|MAP_COPY, fd, 0);
    locale/loadarchive.c:   result = __mmap64 (NULL, mapsize, PROT_READ, MAP_FILE|MAP_COPY,
    locale/loadarchive.c-                      fd, 0);
    locale/loadarchive.c:   addr = __mmap64 (NULL, to - from, PROT_READ, MAP_FILE|MAP_COPY,
    locale/loadarchive.c-                    fd, from);

Some have fallback for `MAP_FAILED`.

    locale/programs/locale.c:               void *mapped = mmap64 (NULL, st.st_size, PROT_READ,
    locale/programs/locale.c-                                      MAP_SHARED, fd, 0);
    locale/programs/locale.c:                   && ((mapped = mmap64 (NULL, st.st_size, PROT_READ,
    locale/programs/locale.c-                                         MAP_SHARED, fd, 0))
    locale/programs/locale.c:  addr = mmap64 (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    locale/programs/locarchive.c:      void *p = mmap64 (NULL, RESERVE_MMAP_SIZE, PROT_NONE, MAP_SHARED, fd, 0);
    locale/programs/locarchive.c:  p = mmap64 (p, total, PROT_READ | PROT_WRITE, MAP_SHARED | xflags, fd, 0);
    locale/programs/locarchive.c:  void *p = mmap64 (ah->addr + start, st.st_size - start,
    locale/programs/locarchive.c-             PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
    locale/programs/locarchive.c-             ah->fd, start);
    locale/programs/locarchive.c:    ah->addr = mmap64 (ah->addr, st.st_size, PROT_READ | PROT_WRITE,
    locale/programs/locarchive.c-                MAP_SHARED | MAP_FIXED, ah->fd, 0);
    locale/programs/locarchive.c:      ah->addr = mmap64 (NULL, st.st_size, PROT_READ | PROT_WRITE,
    locale/programs/locarchive.c-                  MAP_SHARED, ah->fd, 0);
    locale/programs/locarchive.c:  p = mmap64 (p, total, PROT_READ | PROT_WRITE, MAP_SHARED | xflags, fd, 0);
    locale/programs/locarchive.c:  ah->addr = mmap64 (p, st.st_size, PROT_READ | (readonly ? 0 : PROT_WRITE),
    locale/programs/locarchive.c-              MAP_SHARED | xflags, fd, 0);
    locale/programs/locarchive.c:     data[cnt].addr = mmap64 (NULL, st.st_size, PROT_READ, MAP_SHARED,
    locale/programs/locarchive.c-                              fd, 0);

No fallback for `MAP_FAILED`.

    nscd/connections.c:           else if ((mem = mmap (NULL, dbs[cnt].max_db_size,
    nscd/connections.c-                                 PROT_READ | PROT_WRITE,
    nscd/connections.c-                                 MAP_SHARED, fd, 0))
    nscd/connections.c:               || (mem = mmap (NULL, dbs[cnt].max_db_size,
    nscd/connections.c-                               PROT_READ | PROT_WRITE,
    nscd/connections.c-                               MAP_SHARED, fd, 0)) == MAP_FAILED)
    nscd/nscd_helper.c:  void *mapping = __mmap (NULL, mapsize, PROT_READ, MAP_SHARED, mapfd, 0);

No fallback for `MAP_FAILED`.

    nss/makedb.c:  const struct nss_db_header *header = mmap (NULL, st.st_size, PROT_READ,
    nss/makedb.c-                                      MAP_PRIVATE|MAP_POPULATE, fd, 0);
    nss/nss_db/db-open.c:   mapping->header = mmap (NULL, header.allocate, PROT_READ,
    nss/nss_db/db-open.c-                           MAP_PRIVATE, fd, 0);

No fallback for `MAP_FAILED`.

    posix/tst-mmap.c:  ptr = mmap (NULL, 1000, PROT_READ, MAP_SHARED, fd, ps);
    posix/tst-mmap.c:  ptr = mmap64 (NULL, 1000, PROT_READ, MAP_SHARED, fd, ps);
    rt/tst-mqueue3.c:  void *mem = mmap (NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    rt/tst-mqueue5.c:  void *mem = mmap (NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    rt/tst-shm.c:  mem = mmap (NULL, 4000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    stdio-common/tst-fmemopen.c:  if ((mmap_data = (char *) mmap (NULL, fs.st_size, PROT_READ,
    stdio-common/tst-fmemopen.c-                            MAP_SHARED, fd, 0)) == MAP_FAILED)

No fallback for `MAP_FAILED`.


## `io_map` Failure

This is the [[libnetfs: `io_map`|open_issues/libnetfs_io_map]] issue.

[[!tag open_issue_glibc open_issue_hurd]]
[[tschwinge]]'s current plan is to make the following cases do the same (if
that is possible), probably by introducing a generic `mmap_or_read` function.
It would first try `mmap` (which will succeed on Linux-based systems, and also
on Hurd-based ones if the file is backed by [[hurd/libdiskfs]]); if that
fails, it would `mmap` anonymous memory and then fill it by `read`ing the
required data.  This is also what the [[hurd/exec]] server is doing (and, to
my understanding, is the reason that the `./true` invocation on
[[libnetfs: `io_map`|open_issues/libnetfs_io_map]] works): see
`exec.c:prepare`; if `io_map` fails, `e->filemap == MACH_PORT_NULL`, and
`exec.c:map` (as invoked from `exec.c:load_section`, `exec.c:check_elf`,
`exec.c:do_exec`, or `hashexec.c:check_hashbang`) will then use `io_read`
instead.

Doing so potentially means reading in a lot of unused data -- but we probably
can't do any better?

In parallel (or even alternatively?), it should be researched how Linux (or any
other kernel) implements `mmap` on NFS and similar file systems, and then
implement the same in [[hurd/libnetfs]] and/or [[hurd/translator/nfs]], etc.

Here, too, the whole mapping region probably [[!message-id desc="has to be
read" "871yjkl50c.fsf@becket.becket.net"]] ([bug-hurd list
archive](http://lists.gnu.org/archive/html/bug-hurd/2001-10/msg00306.html)) at
`mmap` time.  Then, only `MAP_PRIVATE` (or rather: `MAP_COPY`) is possible, but
not `MAP_SHARED`.