[[!meta copyright="Copyright © 2010, 2012 Free Software Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

Here's what's to be done for maintaining Boehm GC.

This one does need Hurd-specific configuration.

It is, for example, used by [[/GCC]] (which has its own fork), so any changes
committed upstream should very like also be made there.

[[!toc levels=2]]


# [[General information|/boehm_gc]]


# Configuration

<!--

git checkout reviewed
git log --reverse --pretty=fuller --stat=$COLUMNS,$COLUMNS -w -p -C --cc ..upstream/master
-i
/^commit |^---$|hurd|linux|glibc

-->

Last reviewed up to the 5f492b98dd131bdd6c67eb56c31024420c1e7dab (2012-06-08)
sources, and for `libatomic_ops` to the
6a0afde033f105c6320f1409162e3765a1395bfd (2012-05-15) sources.

  * `configure.ac`

      * `PARALLEL_MARK` is not enabled; doesn't make sense so far.

      * `*-*-kfreebsd*-gnu` defines `USE_COMPILER_TLS`.  What's this, and
        why does not other config?

      * TODO

            [ if test "$enable_gc_debug" = "yes"; then
                AC_MSG_WARN("Should define GC_DEBUG and use debug alloc. in clients.")
                AC_DEFINE([KEEP_BACK_PTRS], 1,
                          [Define to save back-pointers in debugging headers.])
                keep_back_ptrs=true
                AC_DEFINE([DBG_HDRS_ALL], 1,
                          [Define to force debug headers on all objects.])
                case $host in
                  x86-*-linux* | i586-*-linux* | i686-*-linux* | x86_64-*-linux* )
                    AC_DEFINE(MAKE_BACK_GRAPH)
                    AC_MSG_WARN("Client must not use -fomit-frame-pointer.")
                    AC_DEFINE(SAVE_CALL_COUNT, 8)
                  ;;
            AM_CONDITIONAL([KEEP_BACK_PTRS], [test x"$keep_back_ptrs" = xtrue])

  * `configure.host`

    Nothing.

  * `Makefile.am`, `include/include.am`, `cord/cord.am`, `doc/doc.am`,
    `tests/tests.am`

    Nothing.

  * `include/gc_config_macros.h`

    Should be OK.

  * `include/private/gcconfig.h`

    Hairy.  But should be OK.  Search for *HURD*, compare to *LINUX*,
    *I386* case.

    See `doc/porting.html` and `doc/README.macros` (and others) for
    documentation.

    *LINUX* has:

      * `#define LINUX_STACKBOTTOM`

        Defined instead of `STACKBOTTOM` to have the value read from `/proc/`.

      * `#define HEAP_START (ptr_t)0x1000`

        May want to define it for us, too?

      * `#ifdef USE_I686_PREFETCH`, `USE_3DNOW_PREFETCH` --- [...]

        Apparently these are optimization that we also could use.  Have a
        look at *LINUX* for *X86_64*, which uses `__builtin_prefetch`
        (which Linux x86 could use, too?).

      * TODO

            #if defined(LINUX) && defined(USE_MMAP)
                /* The kernel may do a somewhat better job merging mappings etc.    */
                /* with anonymous mappings.                                         */
            #   define USE_MMAP_ANON
            #endif

      * TODO

            #if defined(GC_LINUX_THREADS) && defined(REDIRECT_MALLOC)
                /* Nptl allocates thread stacks with mmap, which is fine.  But it   */
                /* keeps a cache of thread stacks.  Thread stacks contain the       */
                /* thread control blocks.  These in turn contain a pointer to       */
                /* (sizeof (void *) from the beginning of) the dtv for thread-local */
                /* storage, which is calloc allocated.  If we don't scan the cached */
                /* thread stacks, we appear to lose the dtv.  This tends to         */
                /* result in something that looks like a bogus dtv count, which     */
                /* tends to result in a memset call on a block that is way too      */
                /* large.  Sometimes we're lucky and the process just dies ...      */
                /* There seems to be a similar issue with some other memory         */
                /* allocated by the dynamic loader.                                 */
                /* This should be avoidable by either:                              */
                /* - Defining USE_PROC_FOR_LIBRARIES here.                          */
                /*   That performs very poorly, precisely because we end up         */
                /*   scanning cached stacks.                                        */
                /* - Have calloc look at its callers.                               */
                /*   In spite of the fact that it is gross and disgusting.          */
                /* In fact neither seems to suffice, probably in part because       */
                /* even with USE_PROC_FOR_LIBRARIES, we don't scan parts of stack   */
                /* segments that appear to be out of bounds.  Thus we actually      */
                /* do both, which seems to yield the best results.                  */
            
            #   define USE_PROC_FOR_LIBRARIES
            #endif

      * TODO

            # if defined(GC_LINUX_THREADS) && defined(REDIRECT_MALLOC) \
                 && !defined(INCLUDE_LINUX_THREAD_DESCR)
                /* Will not work, since libc and the dynamic loader use thread      */
                /* locals, sometimes as the only reference.                         */
            #   define INCLUDE_LINUX_THREAD_DESCR
            # endif

      * TODO

            # if defined(UNIX_LIKE) && defined(THREADS) && !defined(NO_CANCEL_SAFE) \
                 && !defined(PLATFORM_ANDROID)
                /* Make the code cancellation-safe.  This basically means that we   */
                /* ensure that cancellation requests are ignored while we are in    */
                /* the collector.  This applies only to Posix deferred cancellation;*/
                /* we don't handle Posix asynchronous cancellation.                 */
                /* Note that this only works if pthread_setcancelstate is           */
                /* async-signal-safe, at least in the absence of asynchronous       */
                /* cancellation.  This appears to be true for the glibc version,    */
                /* though it is not documented.  Without that assumption, there     */
                /* seems to be no way to safely wait in a signal handler, which     */
                /* we need to do for thread suspension.                             */
                /* Also note that little other code appears to be cancellation-safe.*/
                /* Hence it may make sense to turn this off for performance.        */
            #   define CANCEL_SAFE
            # endif

      * `CAN_SAVE_CALL_ARGS` vs. -fomit-frame-pointer now being on by
        default for Linux x86 IIRC?  (Which is an [[!taglink
        open_issue_gcc]] for not including us.)

      * TODO

            # if defined(REDIRECT_MALLOC) && defined(THREADS) && !defined(LINUX)
            #   error "REDIRECT_MALLOC with THREADS works at most on Linux."
            # endif


    *HURD* has:

      * `#define STACK_GROWS_DOWN`

      * `#define HEURISTIC2`

        Defined instead of `STACKBOTTOM` to have the value probed.

        Linux also has this:

            #if defined(LINUX_STACKBOTTOM) && defined(NO_PROC_STAT) \
                && !defined(USE_LIBC_PRIVATES)
                /* This combination will fail, since we have no way to get  */
                /* the stack base.  Use HEURISTIC2 instead.                 */
            #   undef LINUX_STACKBOTTOM
            #   define HEURISTIC2
                /* This may still fail on some architectures like IA64.     */
                /* We tried ...                                             */
            #endif

        Being on [[glibc]], we could perhaps do similar as `USE_LIBC_PRIVATES`
        instead of `HEURISTIC2`.  Pro: avoid `SIGSEGV` (and general fragility)
        during probing at startup (if I'm understanding this correctly).  Con:
        rely on glibc internals.  Or we instead add support to parse
        [[`/proc/`|hurd/translator/procfs]] (can even use the same as Linux?),
        or use some other interface.  [[!tag open_issue_glibc]]

      * `#define SIG_SUSPEND SIGUSR1`, `#define SIG_THR_RESTART SIGUSR2`

      * We don't `#define MPROTECT_VDB` (WIP comment); but Linux neither.

      * Where does our `GETPAGESIZE` come from?  Should we `#include
        <unistd.h>` like it is done for *LINUX*?

  * `include/gc_pthread_redirects.h`

      * TODO

        Cancellation stuff is Linux-only.  In other places, too.

  * `mach_dep.c`

      * `#define NO_GETCONTEXT`

        [[!taglink open_issue_glibc]], but this is not a real problem here,
        because we can use the following GCC internal function without much
        overhead:

      * `GC_with_callee_saves_pushed`

        The `HAVE_BUILTIN_UNWIND_INIT` case is ours.

  * `os_dep.c`

      * `read`

        Sure that it doesn't internally (in [[glibc]]) use `malloc`.  Probably
        only / mostly (?) a problem for `--enable-redirect-malloc`
        configurations?  Linux with threads uses `readv`.

      * TODO.

  * `dyn_load.c`

    For `DYNAMIC_LOADING`.  TODO.

  * `pthread_support.c`, `pthread_stop_world.c`

    TODO.

  * TODO.

    Other files also contain *LINUX* and other conditionals.

  * `libatomic_ops/`

      * `configure.ac`

        Nothing.

      * `Makefile`, `src/Makefile`, `src/atomic_ops/Makefile`,
        `src/atomic_ops/sysdeps/Makefile`, `doc/Makefile`, `tests/Makefile`

        Nothing.

      * `src/atomic_ops/sysdeps/gcc/x86.h`

        Nothing.

  * b8b65e8a5c2c4896728cd00d008168a6293f55b1 configure.ac probably not all
    correct.

  * `mmap`, b64dd3bc1e5a23e677c96b478d55648a0730ab75

  * `parallel mark`, 07c2b8e455c9e70d1f173475bbf1196320812154, pass
    `--disable-parallel-mark` or enable for us, too?

  * `HANDLE_FORK`, e9b11b6655c45ad3ab3326707aa31567a767134b,
    806d656802a1e3c2b55cd9e4530c6420340886c9,
    1e882b98c2cf9479a9cd08a67439dab7f9622924

  * Check `include/private/thread_local_alloc.h` re
    `USE_COMPILER_TLS`/`USE_PTHREAD_SPECIFIC`.


# Build

Here's a log of a binutils build run; this is from the
5f492b98dd131bdd6c67eb56c31024420c1e7dab (2012-06-08) sources, and for
`libatomic_ops` for the 6a0afde033f105c6320f1409162e3765a1395bfd (2012-05-15)
sources, run on kepler.SCHWINGE and coulomb.SCHWINGE.

    $ export LC_ALL=C
    $ (cd ../master/ && ln -sfn ../libatomic_ops/master libatomic_ops)
    $ (cd ../master/ && autoreconf -vfi)
    $ ../master/configure --prefix="$PWD".install SHELL=/bin/bash CC=gcc-4.6 CXX=g++-4.6 --enable-cplusplus --enable-gc-debug --enable-gc-assertions --enable-assertions 2>&1 | tee log_build
    [...]
    $ make 2>&1 | tee log_build_
    [...]

Different hosts may default to different shells and compiler versions; thus
harmonized.  Using bash instead of dash as otherwise libtool explodes.

This takes up around X MiB, and needs roughly X min on kepler.SCHWINGE and
X min on coulomb.SCHWINGE.

<!--

    $ (make && touch .go-install) 2>&1 | tee log_build_ && test -f .go-install && (make install && touch .go-check) 2>&1 | tee log_install && test -f .go-check && { make -k check 2>&1 | tee log_check; (cd libatomic_ops/ && make -k check) 2>&1 | tee log_check_; }

-->

## Analysis

    $ ssh kepler.SCHWINGE 'cd tmp/source/boehm-gc/ && cat master.build/log_build* | sed -e "s%\(/media/data\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/linux/log_build
    $ ssh coulomb.SCHWINGE 'cd tmp/boehm-gc/ && cat master.build/log_build* | sed -e "s%\(/media/erich\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/hurd/log_build
    $ diff -wu <(sed -f toolchain/logs/boehm-gc/linux/log_build.sed < toolchain/logs/boehm-gc/linux/log_build) <(sed -f toolchain/logs/boehm-gc/hurd/log_build.sed < toolchain/logs/boehm-gc/hurd/log_build) > toolchain/logs/boehm-gc/log_build.diff

  * only GNU/Linux: `configure: WARNING: "Explicit GC_INIT() calls may be
    required."`

  * only GNU/Linux: `configure: WARNING: "Client must not use
    -fomit-frame-pointer."`


# Install

    $ make install 2>&1 | tee log_install
    [...]

This takes up around X MiB, and needs roughly X min on kepler.SCHWINGE and X
min on coulomb.SCHWINGE.


## Analysis

    $ ssh kepler.SCHWINGE 'cd tmp/source/boehm-gc/ && cat master.build/log_install | sed -e "s%\(/media/data\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/linux/log_install
    $ ssh coulomb.SCHWINGE 'cd tmp/boehm-gc/ && cat master.build/log_install | sed -e "s%\(/media/erich\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/hurd/log_install
    $ diff -wu toolchain/logs/boehm-gc/linux/log_install toolchain/logs/boehm-gc/hurd/log_install > toolchain/logs/boehm-gc/log_install.diff


# Testsuite

    $ make -k check
    [...]
    $ (cd libatomic_ops/ && make -k check)
    [...]

This needs roughly X min on kepler.SCHWINGE and X min on coulomb.SCHWINGE.


## Analysis

    $ ssh kepler.SCHWINGE 'cd tmp/source/boehm-gc/ && cat master.build/log_check* | sed -e "s%\(/media/data\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/linux/log_check
    $ ssh coulomb.SCHWINGE 'cd tmp/boehm-gc/ && cat master.build/log_check* | sed -e "s%\(/media/erich\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/hurd/log_check
    $ diff -wu <(sed -f toolchain/logs/boehm-gc/linux/log_check.sed < toolchain/logs/boehm-gc/linux/log_check) <(sed -f toolchain/logs/boehm-gc/hurd/log_check.sed < toolchain/logs/boehm-gc/hurd/log_check) > toolchain/logs/boehm-gc/log_check.diff

There are different configurations possible, but in general, the testsuite
restults of GNU/Linux and GNU/Hurd look very similar.

  * GNU/Hurd is missing `Call chain at allocation: [...]` output.

    `os_dep.c`:`GC_print_callers`


# TODO

  * Port stuff to [[GCC]], and test it there.

  * What are other applications to test Boehm GC?  Also especially in
    combination with [[/libpthread]] and dynamic loading of shared libraries?

      * There are patches (apparently not committed) that GCC itself can use
        it, too: <http://gcc.gnu.org/wiki/Garbage_collection_tuning>.

      * There's been some talking about it on GNU guile mailing lists, and two
        Git branches (2010-12-15: last change 2009-09).

      * <http://www.hpl.hp.com/personal/Hans_Boehm/gc/#users>


## IRC, OFTC, #debian-hurd, 2012-02-05

[[!tag open_issue_porting]]

    <pinotree> youpi: i think i found out the possible cause of the ecl and
      mono issuess
    <pinotree> -s
    <youpi> oh
    <pinotree> basically, we don't have the realtime signals (so no
      SIGRTMIN/SIGRTMAX defined), hence things use either SIGUSR1 or
      SIGUSR2... which are used in libgc to resp. stop/resume threads when
      "collecting"
    <pinotree> i just patched ecl to use SIGINFO instead of SIGUSR1 (used when
      no SIGRTMIN+2 is available), and it seems going on for a while
    <youpi> uh, why would SIGINFO work better than SIGUSR1?
    <pinotree> it was a test, i tried the first "not common" signal i saw
    <pinotree> my test was, use any signal different than USR1/2
    <youpi> ah, sorry, I hadn't understood
    <youpi> you mean there's a conflict between ecl and mono using SIGUSR1, as
      well as libgc?
    <pinotree> yes
    <pinotree> for example, in ecl sources see src/c/unixint.d,
      install_process_interrupt_handler()
    <youpi> SIGINFO seems a sane choice
    <youpi> SIGPWR could have been a better choice if it was available :)
    <pinotree> i would have chose an "unassigned" number, say SIGLOST (the
      bigger one) + 10, but it would be greater than _NSIG (and thus discarded)
    <youpi> not a good idea indeed
    <pinotree> it seems that linux, beside the range for rt signals, has some
      "free space"
    <pinotree> i'll start now another ecl build, from scratch this time, with
      s/SIGUSR1/SIGINFO/ (making sure ctags won't bother), and if it works i'll
      update svante's bug

    <pinotree> mmap(...PROT_NONE...) failed
    <pinotree> hmm...
    <pinotree> apparently enabling MMAP_ANON in mono's libgc copy was a good
      step, let's see


### IRC, OFTC, #debian-hurd, 2012-03-18

    <pinotree> youpi: mono is afflicted by the SIGUSR1/2 conflict with libgc
    <youpi> pinotree: didn't we have a solution for that?
    <pinotree> well, it works just for one signal
    <pinotree> the ideal solution would be having a range for RT signals, and
      make libgc use RTMIN+5/6, like done on most of other OSes
    <youpi> but we don't have RT signals, do we?
    <pinotree> right :(


### IRC, freenode, #hurd, 2012-03-21

    <pinotree> civodul: given we have to realtime signals (so no range of
      signals for them), libgc uses SIGUSR1/2 instead of using SIGRTMIN+5/6 for
      its thread synchronization stuff
    <pinotree> civodul: which means that if an application using libgc then
      sets its own handlers for either of SIGUSR1/2, hell breaks
    <civodul> pinotree: ok
    <civodul> pinotree: is it a Debian-specific change, or included upstream?
    <pinotree> libgc using SIGUSR1/2? upstream
    <civodul> ok