[[!meta copyright="Copyright © 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

The `mmap` call is generally supported on GNU Hurd, as indicated by
`_POSIX_MAPPED_FILES` (`sysconf (_SC_MAPPED_FILES)`).


# Flags

*Flags contain mapping type, sharing type and options.*

  * *Mapping type (must choose one and only one of these).*

      * `MAP_FILE` (*Mapped from a file or device.*)

      * `MAP_ANON`/`MAP_ANONYMOUS` (*Allocated from anonymous virtual memory.*)

    Even though it is not defined to zero (it is for the Linux kernel; why not
    for us?), `MAP_FILE` is the default and can be omitted.

  * *Sharing types (must choose one and only one of these).*

      * `MAP_SHARED` (*Share changes.*)

      * `MAP_PRIVATE` (*Changes private; copy pages on write.*)

      * `MAP_COPY` (*Virtual copy of region at mapping time.*)

    For us, `MAP_PRIVATE` is the default (it is defined to zero); for the Linux
    kernel, one of `MAP_SHARED` or `MAP_PRIVATE` has to be specified
    explicitly.

    The Linux kernel does not support `MAP_COPY`, and as per the comment in
    `elf/dl-load.c`, `MAP_PRIVATE | MAP_DENYWRITE` is Linux' replacement for
    `MAP_COPY`.  However, `MAP_DENYWRITE` is defunct (`mmap` manpage).

    In contrast to `MAP_COPY`, for `MAP_PRIVATE` *it is unspecified whether
    changes made to the file after the `mmap` call are visible in the mapped
    region* (`mmap` manpage).

    `MAP_COPY`:

        What exactly is that?  `elf/dl-load.c` has some explanation.
        <http://lkml.indiana.edu/hypermail/linux/kernel/0110.1/1506.html>

        It is only handled in `dl-sysdep.c`, when `flags &
        (MAP_COPY|MAP_PRIVATE)` is used for
        [[`vm_map`|microkernel/mach/interface/vm_map]]'s `copy` parameter, and
        `mmap.c` uses `! (flags & MAP_SHARED)` instead, which seems
        inconsistent?

        Usage in glibc:

          * `catgets/open_catalog.c:__open_catalog`,
            `locale/loadlocale.c:_nl_load_locale`: *Linux seems to lack read-only
            copy-on-write.*

  * `MAP_TYPE` (*Mask for type field.*/*Mask for type of mapping.*)

    [[!tag open_issue_glibc]]In `bits/mman.h` this is described and defined to
    be a mask for the *mapping* type; in the `bits/mman.h` files corresponding
    to the Linux kernel it is described and defined to be a mask for the
    *sharing* type.

  * *Other flags.*

      * `MAP_FIXED` (*Map address must be exactly as requested.*)

        If the memory region is already in use, an unmap is attempted before
        (re-)mapping it.

        [[!tag open_issue_glibc]]The following text should be improved:

        `[glibc]/llio.texi` says:
        
            @var{address} gives a preferred starting address for the mapping.
            @code{NULL} expresses no preference. Any previous mapping at that
            address is automatically removed. [...]

        The comments in `misc/sys/mman.h`, `misc/mmap.c`, `misc/mmap64.c`,
        `ports/sysdeps/unix/sysv/linux/hppa/mmap.c`, and
        `sysdeps/mach/hurd/mmap.c` have a better wording:

            A successful `mmap' call
            deallocates any previous mapping for the affected region.

        This is correct insofar as, for `MAP_FIXED`, the region is indeed first
        unmapped if already in use, and for the regular cases, an address will
        be chosen that has no previous mapping.

      * `MAP_NOEXTEND` (*For `MAP_FILE`, don't change file size.*)

        Referenced in `[hurd]/TODO` as unimplemented.

      * `MAP_HASSEMAPHORE` (*Region may contain semaphores.*)

      * `MAP_INHERIT` (*Region is retained after exec.*)

  * Linux-specific flags

      * `MAP_GROWSDOWN` (*Stack-like segment.*), `MAP_GROWSUP` (*Register
        stack-like segment.*)

        See `mmap` manpage.

      * `MAP_DENYWRITE` (*`ETXTBSY`*)

        As per the comment in `elf/dl-load.c`, `MAP_PRIVATE | MAP_DENYWRITE` is
        Linux' replacement for `MAP_COPY`.  However, `MAP_DENYWRITE` is defunct
        (`mmap` manpage).

      * `MAP_EXECUTABLE` (*Mark it as an executable.*)

      * `MAP_LOCKED` (*Lock the mapping.*)

        ... à la `mlock`.  Not implemented for us, but it probably could
        be[[open_issue_glibc]].

      * `MAP_NORESERVE` (*Don't check for reservations.*)

        See `mmap` manpage.

        From [[hurd/porting/guidelines]]: *Not POSIX, but we could implement
        it.*

      * `MAP_POPULATE` (*Populate (prefault) pagetables.*)

        From the `mmap` manpage:

            Populate (prefault) page tables for a mapping.  For a file mapping,
            this causes read-ahead on the file.  Later accesses to the mapping
            will not be blocked by page faults.  MAP_POPULATE is only supported
            for private mappings since Linux 2.6.23.

        Unknown Linux kernel version, `mm/mmap.c`:

                if (vm_flags & VM_LOCKED) {
                        if (!mlock_vma_pages_range(vma, addr, addr + len))
                                mm->locked_vm += (len >> PAGE_SHIFT);
                } else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK))
                        make_pages_present(addr, addr + len);
                return addr;

        This flag is only advisory, so it can be worked around with `#define
        MAP_POPULATE 0`; see 8069478040336a7de3461be275432493cc7e4c91.

      * `MAP_NONBLOCK` (*Do not block on IO.*)

        From the `mmap` manpage:

            Only meaningful in conjunction with MAP_POPULATE.  Don't perform
            read-ahead: only create page tables entries for pages that are
            already present in RAM.  Since Linux 2.6.23, this flag causes
            MAP_POPULATE to do nothing.  One day the combination of
            MAP_POPULATE and MAP_NONBLOCK may be reimplemented.

      * `MAP_STACK` (*Allocation is for a stack.*)

        See `mmap` manpage.

      * `MAP_HUGETLB` (*Create huge page mapping.*)

        See `mmap` manpage.

      * `MAP_32BIT` (*Only give out 32-bit addresses.*)

        See `mmap` manpage.
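
Putting the differences above together, code that is meant to build on both
GNU Hurd and Linux can select its flags defensively.  The following is a
minimal sketch, not code taken from glibc: it follows the `#ifdef MAP_COPY`
pattern of `elf/dl-misc.c` and the `#define MAP_POPULATE 0` workaround mentioned
above.  The names `MAP_SNAPSHOT` and `map_file_readonly` are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* MAP_FILE is the default mapping type; supply it where it is missing.  */
#ifndef MAP_FILE
# define MAP_FILE 0
#endif

/* MAP_POPULATE is only advisory, so defining it to 0 is a valid fallback.  */
#ifndef MAP_POPULATE
# define MAP_POPULATE 0
#endif

/* Hypothetical name: prefer MAP_COPY where it exists, else MAP_PRIVATE.
   Naming the sharing type explicitly also satisfies the Linux kernel.  */
#ifdef MAP_COPY
# define MAP_SNAPSHOT MAP_COPY
#else
# define MAP_SNAPSHOT MAP_PRIVATE
#endif

/* Hypothetical helper: map the first LEN bytes of FD read-only.
   Returns MAP_FAILED on error, like mmap itself.  */
void *
map_file_readonly (int fd, size_t len)
{
  assert (len > 0);   /* A zero-length mapping is an error anyway.  */
  return mmap (NULL, len, PROT_READ,
               MAP_FILE | MAP_SNAPSHOT | MAP_POPULATE, fd, 0);
}
```

A caller compares the result against `MAP_FAILED` and, where it matters, falls
back to reading the file instead.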


# Implementation

Essentially, `mmap` is implemented by means of
[[`io_map`|hurd/interface/io_map]] (not for `MAP_ANON`) followed by
[[`vm_map`|microkernel/mach/interface/vm_map]].

There are two implementations: `sysdeps/mach/hurd/mmap.c` (main implementation)
and `sysdeps/mach/hurd/dl-sysdep.c` (*Minimal mmap implementation sufficient
for initial loading of shared libraries.*).


## `mmap ("/dev/zero")`

[[!tag open_issue_glibc open_issue_hurd]]Do we implement that (equivalently to
`MAP_ANON`)?
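
One way to find out is a small probe.  The sketch below assumes that
*equivalent to `MAP_ANON`* means zero-filled, writable, private memory;
`probe_dev_zero` is a hypothetical name, and the outcome on GNU Hurd has not
been verified here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe: returns 0 if a private mapping of /dev/zero behaves
   like anonymous memory (zero-filled and writable), -1 otherwise.  */
int
probe_dev_zero (size_t len)
{
  if (len == 0)
    return -1;

  int fd = open ("/dev/zero", O_RDWR);
  if (fd < 0)
    return -1;

  unsigned char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE, fd, 0);
  close (fd);   /* The mapping stays valid after the descriptor is closed.  */
  if (p == MAP_FAILED)
    return -1;

  /* Every byte must initially read as zero.  */
  int ok = 1;
  for (size_t i = 0; i < len; i++)
    if (p[i] != 0)
      ok = 0;

  /* Writes must succeed and be readable back.  */
  p[0] = 0x5a;
  if (p[0] != 0x5a)
    ok = 0;
  p[len - 1] = 0xa5;
  if (p[len - 1] != 0xa5)
    ok = 0;

  munmap (p, len);
  return ok ? 0 : -1;
}
```

Calling `probe_dev_zero (4096)` from a test program and checking for a zero
return value would answer the question for one page.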


## Mapping Size

From the `mmap` manpage:

    A file is mapped in multiples of the page size.  For a file that is not a
    multiple of the page size, the remaining memory is zeroed when mapped, and
    writes to that region are not written out to the file.  The effect of
    changing the size of the underlying file of a mapping on the pages that
    correspond to added or removed regions of the file is unspecified.

[[!tag open_issue_glibc]]Do we implement that?
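
A probe along the following lines could check the two testable claims of the
quoted text: that the remainder of the last page is zeroed, and that writes to
that remainder do not reach the file.  This is only a sketch; the function name
`probe_partial_page` and the use of a temporary file in `/tmp` are assumptions.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical probe.  Returns 0 if the mapping behaves as the manpage
   describes, 1 if it does not, and -1 if the probe could not be set up.  */
int
probe_partial_page (void)
{
  static const char data[] = "0123456789";
  const size_t n = sizeof data - 1;   /* Deliberately not page-aligned.  */
  char path[] = "/tmp/mmap-size-XXXXXX";
  long ps = sysconf (_SC_PAGESIZE);

  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);   /* The open descriptor keeps the file alive.  */

  if (ps <= (long) n || write (fd, data, n) != (ssize_t) n)
    {
      close (fd);
      return -1;
    }

  unsigned char *p = mmap (NULL, (size_t) ps, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
  if (p == MAP_FAILED)
    {
      close (fd);
      return -1;
    }

  /* 1. The file contents are visible, and the rest of the page is zeroed.  */
  int ok = memcmp (p, data, n) == 0;
  for (size_t i = n; i < (size_t) ps; i++)
    if (p[i] != 0)
      ok = 0;

  /* 2. A write beyond end of file must not grow the file.  */
  p[n] = 'X';
  msync (p, (size_t) ps, MS_SYNC);
  struct stat st;
  if (fstat (fd, &st) != 0 || st.st_size != (off_t) n)
    ok = 0;

  munmap (p, (size_t) ps);
  close (fd);
  return ok ? 0 : 1;
}
```

The effect of resizing the file while it is mapped is unspecified, so the probe
does not try to test it.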


## Use of a Mapped Region

From the `mmap` manpage:

    Use of a mapped region can result in these signals:
    
    SIGSEGV Attempted write into a region mapped as read-only.
    
    SIGBUS  Attempted access to a portion of the buffer that does not
            correspond to the file (for example, beyond the end of the file,
            including the case where another process has truncated the file).

[[!tag open_issue_glibc]]Do we implement that?


# Usage in glibc itself

Review of `mmap` usage in generic bits of glibc (omitted: `nptl/`,
`sysdeps/unix/sparc/`, `sysdeps/unix/sysv/linux/`), based on
a1bcbd4035ac2483dc10da150d4db46f3e1744f8 (2012-03-11).  `MAP_FILE` is the
interesting case; `MAP_ANON` is generally fine.  Some of the `mmap` usages in
glibc have fallback code for the `MAP_FAILED` case, some do not.

    catgets/open_catalog.c:    (struct catalog_obj *) __mmap (NULL, st.st_size, PROT_READ,
    catgets/open_catalog.c-                                  MAP_FILE|MAP_COPY, fd, 0);

Has fallback for `MAP_FAILED`.

    elf/cache.c:    = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    elf/cache.c:    = mmap (NULL, aux_cache_size, PROT_READ, MAP_PRIVATE, fd, 0);

No fallback for `MAP_FAILED`.

    elf/dl-load.c:        l->l_map_start = (ElfW(Addr)) __mmap ((void *) mappref, maplength,
    elf/dl-load.c-                                              c->prot,
    elf/dl-load.c-                                              MAP_COPY|MAP_FILE,
    elf/dl-load.c-                                              fd, c->mapoff);
    elf/dl-load.c:            && (__mmap ((void *) (l->l_addr + c->mapstart),
    elf/dl-load.c-                        c->mapend - c->mapstart, c->prot,
    elf/dl-load.c-                        MAP_FIXED|MAP_COPY|MAP_FILE,
    elf/dl-load.c-                        fd, c->mapoff)

No fallback for `MAP_FAILED`.

    elf/dl-misc.c:            result = __mmap (NULL, *sizep, prot,
    elf/dl-misc.c-#ifdef MAP_COPY
    elf/dl-misc.c-                             MAP_COPY
    elf/dl-misc.c-#else
    elf/dl-misc.c-                             MAP_PRIVATE
    elf/dl-misc.c-#endif
    elf/dl-misc.c-#ifdef MAP_FILE
    elf/dl-misc.c-                             | MAP_FILE
    elf/dl-misc.c-#endif
    elf/dl-misc.c-                             , fd, 0);

No fallback for `MAP_FAILED`.

    elf/dl-profile.c:  addr = (struct gmon_hdr *) __mmap (NULL, expected_size, PROT_READ|PROT_WRITE,
    elf/dl-profile.c-                                  MAP_SHARED|MAP_FILE, fd, 0);

No fallback for `MAP_FAILED`.

    elf/readlib.c:  file_contents = mmap (0, statbuf.st_size, PROT_READ, MAP_SHARED,
    elf/readlib.c-                        fileno (file), 0);

No fallback for `MAP_FAILED`.

    elf/sprof.c:      result->symbol_map = mmap (NULL, max_offset - min_offset,
    elf/sprof.c-                           PROT_READ, MAP_SHARED|MAP_FILE, symfd,
    elf/sprof.c-                           min_offset);
    elf/sprof.c:  addr = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED|MAP_FILE, fd, 0);

No fallback for `MAP_FAILED`.

    iconv/gconv_cache.c:  gconv_cache = __mmap (NULL, cache_size, PROT_READ, MAP_SHARED, fd, 0);
    iconv/iconv_charmap.c:            && ((addr = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE,
    iconv/iconv_charmap.c-                              fd, 0)) != MAP_FAILED))
    iconv/iconv_prog.c:           && ((addr = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE,
    iconv/iconv_prog.c-                             fd, 0)) != MAP_FAILED))

Have fallback for `MAP_FAILED`.

    intl/loadmsgcat.c:  data = (struct mo_file_header *) mmap (NULL, size, PROT_READ,
    intl/loadmsgcat.c-                                     MAP_PRIVATE, fd, 0);

Has fallback for `MAP_FAILED`.

    libio/fileops.c:        p = __mmap (NULL, st.st_size, PROT_READ, MAP_SHARED,
    libio/fileops.c-                    fp->_fileno, 0);
    libio/fileops.c:      p = __mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fp->_fileno, 0);

Has fallback for `MAP_FAILED`.

    locale/loadarchive.c:      result = __mmap64 (NULL, mapsize, PROT_READ, MAP_FILE|MAP_COPY, fd, 0);
    locale/loadarchive.c:   result = __mmap64 (NULL, mapsize, PROT_READ, MAP_FILE|MAP_COPY,
    locale/loadarchive.c-                      fd, 0);
    locale/loadarchive.c:   addr = __mmap64 (NULL, to - from, PROT_READ, MAP_FILE|MAP_COPY,
    locale/loadarchive.c-                    fd, from);

Some have fallback for `MAP_FAILED`.

    locale/programs/locale.c:               void *mapped = mmap64 (NULL, st.st_size, PROT_READ,
    locale/programs/locale.c-                                      MAP_SHARED, fd, 0);
    locale/programs/locale.c:                   && ((mapped = mmap64 (NULL, st.st_size, PROT_READ,
    locale/programs/locale.c-                                         MAP_SHARED, fd, 0))
    locale/programs/locale.c:  addr = mmap64 (NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    locale/programs/locarchive.c:      void *p = mmap64 (NULL, RESERVE_MMAP_SIZE, PROT_NONE, MAP_SHARED, fd, 0);
    locale/programs/locarchive.c:  p = mmap64 (p, total, PROT_READ | PROT_WRITE, MAP_SHARED | xflags, fd, 0);
    locale/programs/locarchive.c:  void *p = mmap64 (ah->addr + start, st.st_size - start,
    locale/programs/locarchive.c-             PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
    locale/programs/locarchive.c-             ah->fd, start);
    locale/programs/locarchive.c:    ah->addr = mmap64 (ah->addr, st.st_size, PROT_READ | PROT_WRITE,
    locale/programs/locarchive.c-                MAP_SHARED | MAP_FIXED, ah->fd, 0);
    locale/programs/locarchive.c:      ah->addr = mmap64 (NULL, st.st_size, PROT_READ | PROT_WRITE,
    locale/programs/locarchive.c-                  MAP_SHARED, ah->fd, 0);
    locale/programs/locarchive.c:  p = mmap64 (p, total, PROT_READ | PROT_WRITE, MAP_SHARED | xflags, fd, 0);
    locale/programs/locarchive.c:  ah->addr = mmap64 (p, st.st_size, PROT_READ | (readonly ? 0 : PROT_WRITE),
    locale/programs/locarchive.c-              MAP_SHARED | xflags, fd, 0);
    locale/programs/locarchive.c:     data[cnt].addr = mmap64 (NULL, st.st_size, PROT_READ, MAP_SHARED,
    locale/programs/locarchive.c-                              fd, 0);

No fallback for `MAP_FAILED`.

    nscd/connections.c:           else if ((mem = mmap (NULL, dbs[cnt].max_db_size,
    nscd/connections.c-                                 PROT_READ | PROT_WRITE,
    nscd/connections.c-                                 MAP_SHARED, fd, 0))
    nscd/connections.c:               || (mem = mmap (NULL, dbs[cnt].max_db_size,
    nscd/connections.c-                               PROT_READ | PROT_WRITE,
    nscd/connections.c-                               MAP_SHARED, fd, 0)) == MAP_FAILED)
    nscd/nscd_helper.c:  void *mapping = __mmap (NULL, mapsize, PROT_READ, MAP_SHARED, mapfd, 0);

No fallback for `MAP_FAILED`.

    nss/makedb.c:  const struct nss_db_header *header = mmap (NULL, st.st_size, PROT_READ,
    nss/makedb.c-                                      MAP_PRIVATE|MAP_POPULATE, fd, 0);
    nss/nss_db/db-open.c:   mapping->header = mmap (NULL, header.allocate, PROT_READ,
    nss/nss_db/db-open.c-                           MAP_PRIVATE, fd, 0);

No fallback for `MAP_FAILED`.

    posix/tst-mmap.c:  ptr = mmap (NULL, 1000, PROT_READ, MAP_SHARED, fd, ps);
    posix/tst-mmap.c:  ptr = mmap64 (NULL, 1000, PROT_READ, MAP_SHARED, fd, ps);
    rt/tst-mqueue3.c:  void *mem = mmap (NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    rt/tst-mqueue5.c:  void *mem = mmap (NULL, ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    rt/tst-shm.c:  mem = mmap (NULL, 4000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    stdio-common/tst-fmemopen.c:  if ((mmap_data = (char *) mmap (NULL, fs.st_size, PROT_READ,
    stdio-common/tst-fmemopen.c-                            MAP_SHARED, fd, 0)) == MAP_FAILED)

No fallback for `MAP_FAILED`.


## `io_map` Failure

This is the [[libnetfs: `io_map`|open_issues/libnetfs_io_map]] issue.

[[!tag open_issue_glibc open_issue_hurd]]
[[tschwinge]]'s current plan is to make the following cases do the same (if
that is possible), probably by introducing a generic `mmap_or_read` function.
This function would first try `mmap` (which will succeed on Linux-based systems,
and also on Hurd-based ones if the file is backed by [[hurd/libdiskfs]]); if
that fails, it would `mmap` anonymous memory and then fill it by `read`ing the
required data.  This is also what the [[hurd/exec]] server is doing (and, to my
understanding, is the reason that the `./true` invocation on
[[libnetfs: `io_map`|open_issues/libnetfs_io_map]] works): see `exec.c:prepare`;
if `io_map` fails, `e->filemap == MACH_PORT_NULL`, and then `exec.c:map` (as
invoked from `exec.c:load_section`, `exec.c:check_elf`, `exec.c:do_exec`, or
`hashexec.c:check_hashbang`) will use `io_read` instead.

Doing so potentially means reading in a lot of unused data -- but we probably
can't do any better?

In parallel (or even alternatively?), it should be researched how Linux (or any
other kernel) implements `mmap` on NFS and similar file systems; the same
approach could then be implemented in [[hurd/libnetfs]] and/or
[[hurd/translator/nfs]], etc.

Here, too, the whole mapping region probably [[!message-id desc="has to be
read" "871yjkl50c.fsf@becket.becket.net"]] ([bug-hurd list
archive](http://lists.gnu.org/archive/html/bug-hurd/2001-10/msg00306.html)) at
`mmap` time.  In that case, only `MAP_PRIVATE` (or rather: `MAP_COPY`) is
possible, but not `MAP_SHARED`.