Diffstat (limited to 'open_issues/boehm_gc.mdwn')
-rw-r--r-- | open_issues/boehm_gc.mdwn | 553 |
1 files changed, 553 insertions, 0 deletions
diff --git a/open_issues/boehm_gc.mdwn b/open_issues/boehm_gc.mdwn new file mode 100644 index 00000000..2913eea8 --- /dev/null +++ b/open_issues/boehm_gc.mdwn @@ -0,0 +1,553 @@ +[[!meta copyright="Copyright © 2010, 2012, 2013, 2014 Free Software Foundation, +Inc."]] + +[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable +id="license" text="Permission is granted to copy, distribute and/or modify this +document under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; with no Invariant +Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license +is included in the section entitled [[GNU Free Documentation +License|/fdl]]."]]"""]] + +Here's what's to be done for maintaining Boehm GC. + +This one does need Hurd-specific configuration. + +It is, for example, used by [[/GCC]] (which has its own fork), so any changes +committed upstream should very likely also be made there. + +[[!toc levels=2]] + + +# [[General information|/boehm_gc]] + + +# Configuration + +<!-- + +git checkout reviewed +git log --reverse --pretty=fuller --stat=$COLUMNS,$COLUMNS -w -p -C --cc ..upstream/master +-i +/^commit |^---$|hurd|linux|glibc + +--> + +Last reviewed up to the 5f492b98dd131bdd6c67eb56c31024420c1e7dab (2012-06-08) +sources, and for `libatomic_ops` to the +6a0afde033f105c6320f1409162e3765a1395bfd (2012-05-15) sources. + + * `configure.ac` + + * `PARALLEL_MARK` is not enabled; doesn't make sense so far. + + * `*-*-kfreebsd*-gnu` defines `USE_COMPILER_TLS`. What's this, and + why does no other configuration define it? + + * TODO + + [ if test "$enable_gc_debug" = "yes"; then + AC_MSG_WARN("Should define GC_DEBUG and use debug alloc. 
in clients.") + AC_DEFINE([KEEP_BACK_PTRS], 1, + [Define to save back-pointers in debugging headers.]) + keep_back_ptrs=true + AC_DEFINE([DBG_HDRS_ALL], 1, + [Define to force debug headers on all objects.]) + case $host in + x86-*-linux* | i586-*-linux* | i686-*-linux* | x86_64-*-linux* ) + AC_DEFINE(MAKE_BACK_GRAPH) + AC_MSG_WARN("Client must not use -fomit-frame-pointer.") + AC_DEFINE(SAVE_CALL_COUNT, 8) + ;; + AM_CONDITIONAL([KEEP_BACK_PTRS], [test x"$keep_back_ptrs" = xtrue]) + + * `configure.host` + + Nothing. + + * `Makefile.am`, `include/include.am`, `cord/cord.am`, `doc/doc.am`, + `tests/tests.am` + + Nothing. + + * `include/gc_config_macros.h` + + Should be OK. + + * `include/private/gcconfig.h` + + Hairy. But should be OK. Search for *HURD*, compare to *LINUX*, + *I386* case. + + See `doc/porting.html` and `doc/README.macros` (and others) for + documentation. + + *LINUX* has: + + * `#define LINUX_STACKBOTTOM` + + Defined instead of `STACKBOTTOM` to have the value read from `/proc/`. + + * `#define HEAP_START (ptr_t)0x1000` + + May want to define it for us, too? + + * `#ifdef USE_I686_PREFETCH`, `USE_3DNOW_PREFETCH` --- [...] + + Apparently these are optimization that we also could use. Have a + look at *LINUX* for *X86_64*, which uses `__builtin_prefetch` + (which Linux x86 could use, too?). + + * TODO + + #if defined(LINUX) && defined(USE_MMAP) + /* The kernel may do a somewhat better job merging mappings etc. */ + /* with anonymous mappings. */ + # define USE_MMAP_ANON + #endif + + * TODO + + #if defined(GC_LINUX_THREADS) && defined(REDIRECT_MALLOC) + /* Nptl allocates thread stacks with mmap, which is fine. But it */ + /* keeps a cache of thread stacks. Thread stacks contain the */ + /* thread control blocks. These in turn contain a pointer to */ + /* (sizeof (void *) from the beginning of) the dtv for thread-local */ + /* storage, which is calloc allocated. If we don't scan the cached */ + /* thread stacks, we appear to lose the dtv. 
This tends to */ + /* result in something that looks like a bogus dtv count, which */ + /* tends to result in a memset call on a block that is way too */ + /* large. Sometimes we're lucky and the process just dies ... */ + /* There seems to be a similar issue with some other memory */ + /* allocated by the dynamic loader. */ + /* This should be avoidable by either: */ + /* - Defining USE_PROC_FOR_LIBRARIES here. */ + /* That performs very poorly, precisely because we end up */ + /* scanning cached stacks. */ + /* - Have calloc look at its callers. */ + /* In spite of the fact that it is gross and disgusting. */ + /* In fact neither seems to suffice, probably in part because */ + /* even with USE_PROC_FOR_LIBRARIES, we don't scan parts of stack */ + /* segments that appear to be out of bounds. Thus we actually */ + /* do both, which seems to yield the best results. */ + + # define USE_PROC_FOR_LIBRARIES + #endif + + * TODO + + # if defined(GC_LINUX_THREADS) && defined(REDIRECT_MALLOC) \ + && !defined(INCLUDE_LINUX_THREAD_DESCR) + /* Will not work, since libc and the dynamic loader use thread */ + /* locals, sometimes as the only reference. */ + # define INCLUDE_LINUX_THREAD_DESCR + # endif + + * TODO + + # if defined(UNIX_LIKE) && defined(THREADS) && !defined(NO_CANCEL_SAFE) \ + && !defined(PLATFORM_ANDROID) + /* Make the code cancellation-safe. This basically means that we */ + /* ensure that cancellation requests are ignored while we are in */ + /* the collector. This applies only to Posix deferred cancellation;*/ + /* we don't handle Posix asynchronous cancellation. */ + /* Note that this only works if pthread_setcancelstate is */ + /* async-signal-safe, at least in the absence of asynchronous */ + /* cancellation. This appears to be true for the glibc version, */ + /* though it is not documented. Without that assumption, there */ + /* seems to be no way to safely wait in a signal handler, which */ + /* we need to do for thread suspension. 
*/ + /* Also note that little other code appears to be cancellation-safe.*/ + /* Hence it may make sense to turn this off for performance. */ + # define CANCEL_SAFE + # endif + + * `CAN_SAVE_CALL_ARGS` vs. -fomit-frame-pointer now being on by + default for Linux x86 IIRC? (Which is an [[!taglink + open_issue_gcc]] for not including us.) + + * TODO + + # if defined(REDIRECT_MALLOC) && defined(THREADS) && !defined(LINUX) + # error "REDIRECT_MALLOC with THREADS works at most on Linux." + # endif + + + *HURD* has: + + * `#define STACK_GROWS_DOWN` + + * `#define HEURISTIC2` + + Defined instead of `STACKBOTTOM` to have the value probed. + + Linux also has this: + + #if defined(LINUX_STACKBOTTOM) && defined(NO_PROC_STAT) \ + && !defined(USE_LIBC_PRIVATES) + /* This combination will fail, since we have no way to get */ + /* the stack base. Use HEURISTIC2 instead. */ + # undef LINUX_STACKBOTTOM + # define HEURISTIC2 + /* This may still fail on some architectures like IA64. */ + /* We tried ... */ + #endif + + Being on [[glibc]], we could perhaps do similar as `USE_LIBC_PRIVATES` + instead of `HEURISTIC2`. Pro: avoid `SIGSEGV` (and general fragility) + during probing at startup (if I'm understanding this correctly). Con: + rely on glibc internals. Or we instead add support to parse + [[`/proc/`|hurd/translator/procfs]] (can even use the same as Linux?), + or use some other interface. [[!tag open_issue_glibc]] + This is also likely the issue causing the GDB [[!tag open_issue_gdb]] + `GC_find_limit_with_bound` SIGSEGV startup confusion described in + [[binutils]]. + + * `#define SIG_SUSPEND SIGUSR1`, `#define SIG_THR_RESTART SIGUSR2` + + * We don't `#define MPROTECT_VDB` (WIP comment); but Linux neither. + + * Where does our `GETPAGESIZE` come from? Should we `#include + <unistd.h>` like it is done for *LINUX*? + + * `include/gc_pthread_redirects.h` + + * TODO + + Cancellation stuff is Linux-only. In other places, too. 
+ + * `mach_dep.c` + + * `#define NO_GETCONTEXT` + + [[!taglink open_issue_glibc]], but this is not a real problem here, + because we can use the following GCC internal function without much + overhead: + + * `GC_with_callee_saves_pushed` + + The `HAVE_BUILTIN_UNWIND_INIT` case is ours. + + * `os_dep.c` + + * `read` + + Sure that it doesn't internally (in [[glibc]]) use `malloc`. Probably + only / mostly (?) a problem for `--enable-redirect-malloc` + configurations? Linux with threads uses `readv`. + + * TODO. + + * `dyn_load.c` + + For `DYNAMIC_LOADING`. TODO. + + * `pthread_support.c`, `pthread_stop_world.c` + + TODO. + + * TODO. + + Other files also contain *LINUX* and other conditionals. + + * `libatomic_ops/` + + * `configure.ac` + + Nothing. + + * `Makefile`, `src/Makefile`, `src/atomic_ops/Makefile`, + `src/atomic_ops/sysdeps/Makefile`, `doc/Makefile`, `tests/Makefile` + + Nothing. + + * `src/atomic_ops/sysdeps/gcc/x86.h` + + Nothing. + + * b8b65e8a5c2c4896728cd00d008168a6293f55b1 configure.ac probably not all + correct. + + * `mmap`, b64dd3bc1e5a23e677c96b478d55648a0730ab75 + + * `parallel mark`, 07c2b8e455c9e70d1f173475bbf1196320812154, pass + `--disable-parallel-mark` or enable for us, too? + + * `HANDLE_FORK`, e9b11b6655c45ad3ab3326707aa31567a767134b, + 806d656802a1e3c2b55cd9e4530c6420340886c9, + 1e882b98c2cf9479a9cd08a67439dab7f9622924 + + * Check `include/private/thread_local_alloc.h` re + `USE_COMPILER_TLS`/`USE_PTHREAD_SPECIFIC`. + + +# Build + +Here's a log of a boehm-gc build run; this is from the +5f492b98dd131bdd6c67eb56c31024420c1e7dab (2012-06-08) sources, and for +`libatomic_ops` from the 6a0afde033f105c6320f1409162e3765a1395bfd (2012-05-15) +sources, run on kepler.SCHWINGE and coulomb.SCHWINGE. 
+ + $ export LC_ALL=C + $ (cd ../master/ && ln -sfn ../libatomic_ops/master libatomic_ops) + $ (cd ../master/ && autoreconf -vfi) + $ ../master/configure --prefix="$PWD".install SHELL=/bin/bash CC=gcc-4.6 CXX=g++-4.6 --enable-cplusplus --enable-gc-debug --enable-gc-assertions --enable-assertions 2>&1 | tee log_build + [...] + $ make 2>&1 | tee log_build_ + [...] + +Different hosts may default to different shells and compiler versions; thus +harmonized. Using bash instead of dash as otherwise libtool explodes. + +This takes up around X MiB, and needs roughly X min on kepler.SCHWINGE and +X min on coulomb.SCHWINGE. + +<!-- + + $ (make && touch .go-install) 2>&1 | tee log_build_ && test -f .go-install && (make install && touch .go-check) 2>&1 | tee log_install && test -f .go-check && { make -k check 2>&1 | tee log_check; (cd libatomic_ops/ && make -k check) 2>&1 | tee log_check_; } + +--> + +## Analysis + + $ ssh kepler.SCHWINGE 'cd tmp/source/boehm-gc/ && cat master.build/log_build* | sed -e "s%\(/media/data\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/linux/log_build + $ ssh coulomb.SCHWINGE 'cd tmp/boehm-gc/ && cat master.build/log_build* | sed -e "s%\(/media/erich\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/hurd/log_build + $ diff -wu <(sed -f toolchain/logs/boehm-gc/linux/log_build.sed < toolchain/logs/boehm-gc/linux/log_build) <(sed -f toolchain/logs/boehm-gc/hurd/log_build.sed < toolchain/logs/boehm-gc/hurd/log_build) > toolchain/logs/boehm-gc/log_build.diff + + * only GNU/Linux: `configure: WARNING: "Explicit GC_INIT() calls may be + required."` + + * only GNU/Linux: `configure: WARNING: "Client must not use + -fomit-frame-pointer."` + + +# Install + + $ make install 2>&1 | tee log_install + [...] + +This takes up around X MiB, and needs roughly X min on kepler.SCHWINGE and X +min on coulomb.SCHWINGE. 
+ + +## Analysis + + $ ssh kepler.SCHWINGE 'cd tmp/source/boehm-gc/ && cat master.build/log_install | sed -e "s%\(/media/data\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/linux/log_install + $ ssh coulomb.SCHWINGE 'cd tmp/boehm-gc/ && cat master.build/log_install | sed -e "s%\(/media/erich\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/hurd/log_install + $ diff -wu toolchain/logs/boehm-gc/linux/log_install toolchain/logs/boehm-gc/hurd/log_install > toolchain/logs/boehm-gc/log_install.diff + + +# Testsuite + + $ make -k check + [...] + $ (cd libatomic_ops/ && make -k check) + [...] + +This needs roughly X min on kepler.SCHWINGE and X min on coulomb.SCHWINGE. + + +## Analysis + + $ ssh kepler.SCHWINGE 'cd tmp/source/boehm-gc/ && cat master.build/log_check* | sed -e "s%\(/media/data\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/linux/log_check + $ ssh coulomb.SCHWINGE 'cd tmp/boehm-gc/ && cat master.build/log_check* | sed -e "s%\(/media/erich\)\?${PWD}%[...]%g"' > toolchain/logs/boehm-gc/hurd/log_check + $ diff -wu <(sed -f toolchain/logs/boehm-gc/linux/log_check.sed < toolchain/logs/boehm-gc/linux/log_check) <(sed -f toolchain/logs/boehm-gc/hurd/log_check.sed < toolchain/logs/boehm-gc/hurd/log_check) > toolchain/logs/boehm-gc/log_check.diff + +There are different configurations possible, but in general, the testsuite +results of GNU/Linux and GNU/Hurd look very similar. + + * GNU/Hurd is missing `Call chain at allocation: [...]` output. + + `os_dep.c`:`GC_print_callers` + + +# TODO + + * What are other applications to test Boehm GC? Also especially in + combination with [[/libpthread]] and dynamic loading of shared libraries? + + * There are patches (apparently not committed) that GCC itself can use + it, too: <http://gcc.gnu.org/wiki/Garbage_collection_tuning>. + + * There's been some talking about it on GNU guile mailing lists, and two + Git branches (2010-12-15: last change 2009-09). 
+ + * <http://www.hpl.hp.com/personal/Hans_Boehm/gc/#users> + + +## IRC, OFTC, #debian-hurd, 2012-02-05 + +[[!tag open_issue_porting]] + + <pinotree> youpi: i think i found out the possible cause of the ecl and + mono issuess + <pinotree> -s + <youpi> oh + <pinotree> basically, we don't have the realtime signals (so no + SIGRTMIN/SIGRTMAX defined), hence things use either SIGUSR1 or + SIGUSR2... which are used in libgc to resp. stop/resume threads when + "collecting" + <pinotree> i just patched ecl to use SIGINFO instead of SIGUSR1 (used when + no SIGRTMIN+2 is available), and it seems going on for a while + <youpi> uh, why would SIGINFO work better than SIGUSR1? + <pinotree> it was a test, i tried the first "not common" signal i saw + <pinotree> my test was, use any signal different than USR1/2 + <youpi> ah, sorry, I hadn't understood + <youpi> you mean there's a conflict between ecl and mono using SIGUSR1, as + well as libgc? + <pinotree> yes + <pinotree> for example, in ecl sources see src/c/unixint.d, + install_process_interrupt_handler() + <youpi> SIGINFO seems a sane choice + <youpi> SIGPWR could have been a better choice if it was available :) + <pinotree> i would have chose an "unassigned" number, say SIGLOST (the + bigger one) + 10, but it would be greater than _NSIG (and thus discarded) + <youpi> not a good idea indeed + <pinotree> it seems that linux, beside the range for rt signals, has some + "free space" + <pinotree> i'll start now another ecl build, from scratch this time, with + s/SIGUSR1/SIGINFO/ (making sure ctags won't bother), and if it works i'll + update svante's bug + + <pinotree> mmap(...PROT_NONE...) failed + <pinotree> hmm... + <pinotree> apparently enabling MMAP_ANON in mono's libgc copy was a good + step, let's see + + +### IRC, OFTC, #debian-hurd, 2012-03-18 + + <pinotree> youpi: mono is afflicted by the SIGUSR1/2 conflict with libgc + <youpi> pinotree: didn't we have a solution for that? 
+ <pinotree> well, it works just for one signal + <pinotree> the ideal solution would be having a range for RT signals, and + make libgc use RTMIN+5/6, like done on most of other OSes + <youpi> but we don't have RT signals, do we? + <pinotree> right :( + + +### IRC, freenode, #hurd, 2012-03-21 + + <pinotree> civodul: given we have to realtime signals (so no range of + signals for them), libgc uses SIGUSR1/2 instead of using SIGRTMIN+5/6 for + its thread synchronization stuff + <pinotree> civodul: which means that if an application using libgc then + sets its own handlers for either of SIGUSR1/2, hell breaks + <civodul> pinotree: ok + <civodul> pinotree: is it a Debian-specific change, or included upstream? + <pinotree> libgc using SIGUSR1/2? upstream + <civodul> ok + + +### IRC, freenode, #hurd, 2013-09-03 + + <congzhang> braunr: when will libc malloc say memory corruption? + <braunr> congzhang: usually on free + <braunr> sometimes on alloc + <congzhang> and after one thread be created + <congzhang> I want to know why and how to find the source + <congzhang> does libgc work well on hurd? + <braunr> i don't think it does + <congzhang> so , why it can't? + <braunr> congzhang: what ? + <congzhang> libgc was not work on hurd + <pinotree> why? + <congzhang> I try porting dotgnu + <braunr> ah + <braunr> nested signal handling + <congzhang> one program always receive Abort signal + <pinotree> and why it should be a problem in libgc? + <congzhang> for malloc memory corruption + <braunr> libgc relies on this + <congzhang> yes + <congzhang> so, is there a workaround to make it work? + <braunr> show the error please + <congzhang> http://paste.debian.net/34416/ + <pinotree> where's libgc? + <congzhang> i compile dotgnu with enable-gc + <pinotree> so? + <congzhang> I am not sure about it + <pinotree> so why did you say earlier that libgc doesn't work? + <congzhang> because after I see one thread was created notice by gdb, it + memory corruption + <pinotree> so what? 
+ <congzhang> maybe gabage collection happen, and gc thread start + <pinotree> that's speculation + <pinotree> you cannot debug things speculating on code you don't know + <pinotree> less speculation and more in-deep debugging, please + * congzhang I try again, to check weather thread list changing + <congzhang> sorry for this + <braunr> it simply looks like a real memory corruption (an overflow) + <congzhang> maybe PATH related problem + <pinotree> PATH? + <congzhang> yes + <braunr> PATH_MAX + <braunr> but unlikely + <congzhang> csant do path traverse + <congzhang> I fond the macro + <congzhang> found + <congzhang> #if defined(__sun__) || defined(__BEOS__) + <congzhang> #define BROKEN_DIRENT 1 + <congzhang> #endif + <congzhang> and so for hurd? + <pinotree> BROKEN_DIRENT doesn't say much about what it does + <WhiteKIBA> nope + <WhiteKIBA> whoops + <congzhang> it seems other port meet the trouble too + <pinotree> which trouble? + <congzhang> http://comments.gmane.org/gmane.comp.gnu.dotgnu.developer/3642 + <congzhang> (gdb) ptype struct dirent + <congzhang> type = struct dirent { + <congzhang> __ino_t d_ino; + <congzhang> unsigned short d_reclen; + <congzhang> unsigned char d_type; + <congzhang> unsigned char d_namlen; + <congzhang> char d_name[1]; + <congzhang> } + <congzhang> + <congzhang> d_name should be char[PATH_MAX]? + <congzhang> and + http://libjit-linear-scan-register-allocator.googlecode.com/svn/trunk/pnet/support/dir.c + <pinotree> no + <braunr> stop pasting that much + <_d3f> uhm PATH_MAX on the hurd? 
+ <braunr> and stop saying nonsense + <congzhang> sorry, i think four line was not worth to pastbin + <pinotree> they are 8 + <congzhang> never again + <braunr> just try by defining BROKEN_DIRENT to 1 in all cases and see how + it goes + * congzhang read dir.c again + <congzhang> braunr: it does not crash this time, I do more test + + +#### IRC, freenode, #hurd, 2013-09-04 + + <congzhang> hi, I am dotgnu work on hurd, and even winforms app + <congzhang> s/am/make + <congzhang> and maybe c# hello world translate another day :) + + +### IRC, freenode, #hurd, 2013-12-16 + + <braunr> gnu_srs: ah, libgc + <braunr> there are signal-related problems with libgc + + +## Leak Detection + +### IRC, freenode, #hurd, 2013-10-17 + + <teythoon> I spent the last two days integrating libgc - the boehm + conservative garbage collector - into hurd + <teythoon> it can be used in leak detection mode + <azeem> whoa, cool + <teythoon> and it actually kind of works, finds malloc leaks in translators + <braunr> i think there were problems with signal handling in libgc + <braunr> i'm not sure we support nested signal handling well + <teythoon> yes, I read about them + <teythoon> libgc uses SIGUSR1/2, so any program installing handlers on them + will break + <azeem> (which is not a problem on Linux, cause there some RT-signals or so + are used) + <teythoon> yes
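The `SIGUSR1`/`SIGUSR2` conflict that keeps coming up in the logs above can be illustrated with a minimal C sketch. This is not libgc code: the helper names are hypothetical, and the `SIGRTMIN + 5`/`SIGRTMIN + 6` choice is simply the one mentioned in the 2012-03 discussions. It assumes that a system without real-time signals does not define `SIGRTMIN`/`SIGRTMAX`, in which case the sketch falls back to `SIGUSR1`/`SIGUSR2`.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical helpers for illustration only; these are NOT libgc
   identifiers.  They pick the two signals a collector could use to
   suspend and restart threads.  */

/* Nonzero if the C library exposes a real-time signal range.  */
static int have_rt_signals(void)
{
#if defined(SIGRTMIN) && defined(SIGRTMAX)
    return 1;
#else
    return 0;
#endif
}

/* Signal used to stop a thread while collecting.  */
static int pick_suspend_signal(void)
{
#if defined(SIGRTMIN) && defined(SIGRTMAX)
    /* SIGRTMIN need not be a compile-time constant, so check at run time.  */
    assert(SIGRTMIN + 5 <= SIGRTMAX);
    return SIGRTMIN + 5;
#else
    return SIGUSR1;   /* fallback: shared with application handlers */
#endif
}

/* Signal used to resume a previously stopped thread.  */
static int pick_restart_signal(void)
{
#if defined(SIGRTMIN) && defined(SIGRTMAX)
    assert(SIGRTMIN + 6 <= SIGRTMAX);
    return SIGRTMIN + 6;
#else
    return SIGUSR2;   /* fallback: shared with application handlers */
#endif
}

/* Nonzero if an application handler for APP_SIGNAL would clash with
   the signals chosen above.  */
static int conflicts_with_collector(int app_signal)
{
    return app_signal == pick_suspend_signal()
        || app_signal == pick_restart_signal();
}
```

An application such as ecl could call `conflicts_with_collector(SIGUSR1)` before installing its own handler. Under the assumption above, the call returns nonzero exactly when no real-time signal range is available, which is the situation described for GNU/Hurd in these logs.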