[[!meta copyright="Copyright © 2009, 2010, 2011 Free Software Foundation,
Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_hurd]]

Given an `a.out` executable that only does `raise (SIGABRT)`, invoking that
one...

  * ... against `crash-dump-core` will...

      * ... not overwrite existing `core` files.

        Is this reasonable?  Linux does overwrite them, for example.

      * ... show big variances in running-time behavior:

            $ TIMEFORMAT='real %R user %U system %S'
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 1.350 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 21:59 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 22.771 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 21:59 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 1.367 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:00 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 5.789 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:00 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 22.664 user 0.010 system 0.000
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:01 core

      * ... produce a huge `core` file:

            $ du -hs core 
            17M     core

        On Linux, the `core` file occupies 76 KiB of disk space, which seems
        much more reasonable.  This is possibly related to the default 128 MiB
        heap preallocation.

      * ... not always produce a useful backtrace:

        `abort();`

            $ gdb test core
            warning: core file may not match specified executable file.
            [New Thread 86678]
            warning: Wrong size fpregset in core file.
            ...
            Core was generated by `./test'.
            Program terminated with signal 6, Aborted.
            warning: Wrong size fpregset in core file.
            (gdb) bt
            #0  0x00000000 in ?? ()
            #1  0x011f593f in __msg_sig_post (process=72, signal=6, sigcode=0, refport=1)
                at /build/buildd-eglibc_2.10.2-7-hurd-i386-iGL6op/eglibc-2.10.2/build-tree/hurd-i386-libc/hurd/RPC_msg_sig_post.c:144
            #2  0x0109a433 in kill_port (pid=<value optimized out>)
                at ../sysdeps/mach/hurd/kill.c:68
            #3  kill_pid (pid=<value optimized out>) at ../sysdeps/mach/hurd/kill.c:105
            #4  0x0109a69f in __kill (pid=21142, sig=6) at ../sysdeps/mach/hurd/kill.c:139
            #5  0x01099af6 in raise (sig=6) at ../sysdeps/posix/raise.c:27
            #6  0x0109de59 in abort () at abort.c:88
            #7  0x0804849f in main ()

        `char *foo = 0; *foo = 1;`

            $ gdb test core
            Program terminated with signal 11, Segmentation fault.
            warning: Wrong size fpregset in core file.
            #0  0x00000000 in ?? ()
            (gdb) bt
            #0  0x00000000 in ?? ()
            #1  0x0108565b in __libc_start_main (main=0x8048464 <main>, argc=1, ubp_av=0x1023e64, 
                init=0x8048490 <__libc_csu_init>, fini=0x8048480 <__libc_csu_fini>, rtld_fini=0xea20 <_dl_fini>, 
                stack_end=0x1023e5c) at libc-start.c:251
            #2  0x080483d1 in _start ()

        `raise (SIGABRT);`

            $ gdb a.out core
            warning: core file may not match specified executable file.
            [New Thread 76651]
            
            warning: Wrong size fpregset in core file.
            Reading symbols from /lib/libc.so.0.3...[...]
            Core was generated by `./a.out'.
            Program terminated with signal 6, Aborted.
            
            warning: Wrong size fpregset in core file.
            #0  0x00000000 in ?? ()
            (gdb) bt
            #0  0x00000000 in ?? ()
            Cannot access memory at address 0x17

        [[!tag open_issue_gdb]] Probably [[GDB]] doesn't manage to unwind the stack properly.

  * ... against `crash-suspend` will...

      * ... not work at all:
    
            $ CRASHSERVER=/servers/crash-suspend ./a.out
            $ [returns to the shell; the process is not suspended]

      * ... show big variances in running-time behavior:
    
            $ TIMEFORMAT='real %R user %U system %S'
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.381 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.332 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 21.228 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.323 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:05 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 22.279 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:05 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.362 user 0.000 system 0.000
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 21.110 user 0.000 system 0.000
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.350 user 0.000 system 0.020
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core

      * ... reliably crash GNU Mach:

        This happens if a `core` file is already present (and won't get
        overwritten; see above).  I reproduced this three times.

            $ TIMEFORMAT='real %R user %U system %S'
            $ time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted
            real 2.856 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core

            panic: zalloc: zone kalloc.8192 exhausted
            Kernel Breakpoint trap, eip 0x20020a77
            Stopped at  0x20020a76: int     $3
            db> trace
            0x20020a76(2006aba8,4d0f7e9c,200209b0,0,0)
            0x20020a4d(2006b094,2006ae40,2000,20016803,4a5f4114)
            0x2002bca5(49a03564,1,0,9,1000)
            0x20022f4c(2000,4a5f45d4,4a84879c,49a46564,4ac43e78)
            0x20021e65(4ac43e78,4a5f45d4,4a5f4114,0,0)
            0x2005309d(2106ba9c,3,38,28,1783)
            Bad frame pointer: 0x2106ba78

            $ addr2line -i -f -e /boot/gnumach-xen 0x20020a76 0x20020a4d 0x2002bca5 0x20022f4c 0x20021e65 0x2005309d
            Debugger
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/debug.c:105
            panic
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/debug.c:148
            zalloc
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/zalloc.c:470
            kalloc
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/kalloc.c:185
            ipc_kobject_server
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/ipc_kobject.c:76
            mach_msg_trap
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/ipc/mach_msg.c:1367
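
The running-time variance reported above was measured by repeating the same command by hand.  A small helper along the following lines might make the outliers easier to collect; this is only a sketch, `time_runs` is a hypothetical name that exists neither in the Hurd nor in the original report, and it assumes GNU `date` (for the `%N` nanoseconds format).

```shell
# Hypothetical helper: run a command COUNT times and print the wall-clock
# time of each run, followed by the minimum and maximum.
# Usage: time_runs COUNT COMMAND [ARGS...]
# The body runs in a subshell, so its variables do not leak into the caller.
time_runs() (
    if [ "$#" -lt 2 ] || [ "$1" -lt 1 ]; then
        echo "usage: time_runs COUNT COMMAND [ARGS...]" >&2
        return 1
    fi
    count=$1
    shift
    i=1
    min=
    max=
    while [ "$i" -le "$count" ]; do
        # Remove any stale core file first, as in the transcripts above.
        rm -f core
        start=$(date +%s%N)            # nanoseconds since the epoch (GNU date)
        "$@" >/dev/null 2>&1 || true   # the command is expected to crash
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))
        printf 'run %d: %d.%03d s\n' "$i" "$((ms / 1000))" "$((ms % 1000))"
        if [ -z "$min" ] || [ "$ms" -lt "$min" ]; then min=$ms; fi
        if [ -z "$max" ] || [ "$ms" -gt "$max" ]; then max=$ms; fi
        i=$((i + 1))
    done
    printf 'min %d.%03d s, max %d.%03d s\n' \
        "$((min / 1000))" "$((min % 1000))" "$((max / 1000))" "$((max % 1000))"
)
```

It could be invoked as `time_runs 10 env CRASHSERVER=/servers/crash-dump-core ./a.out`.  Note that it deletes any `core` file in the current directory before each run.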

---

Anyone working in this area may also want to have a look at [[GDB_gcore]],
and at porting <http://code.google.com/p/google-coredumper/>.