[[!meta copyright="Copyright © 2009, 2010, 2011, 2013 Free Software Foundation,
Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

[[!tag open_issue_hurd]]

Given an `a.out` executable that only does `raise (SIGABRT)`, invoking that
one...

  * ... against `crash-dump-core` will...

      * ... not overwrite existing `core` files.

        Is this reasonable?  Linux does overwrite them, for example.

      * ... show large variance in running time:

            $ TIMEFORMAT='real %R user %U system %S'
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 1.350 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 21:59 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 22.771 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 21:59 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 1.367 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:00 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 5.789 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:00 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-dump-core ./a.out; ls -l core
            Aborted (core dumped)
            real 22.664 user 0.010 system 0.000
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:01 core

      * ... produce a huge `core` file:

            $ du -hs core 
            17M     core

        On Linux, the `core` file occupies 76 KiB of disk space, which seems
        much more reasonable.  This is possibly related to the default 128 MiB
        heap preallocation.

      * ... not always produce a useful backtrace:

        `abort();`

            $ gdb test core
            warning: core file may not match specified executable file.
            [New Thread 86678]
            warning: Wrong size fpregset in core file.
            ...
            Core was generated by `./test'.
            Program terminated with signal 6, Aborted.
            warning: Wrong size fpregset in core file.
            (gdb) bt
            #0  0x00000000 in ?? ()
            #1  0x011f593f in __msg_sig_post (process=72, signal=6, sigcode=0, refport=1)
                at /build/buildd-eglibc_2.10.2-7-hurd-i386-iGL6op/eglibc-2.10.2/build-tree/hurd-i386-libc/hurd/RPC_msg_sig_post.c:144
            #2  0x0109a433 in kill_port (pid=<value optimized out>)
                at ../sysdeps/mach/hurd/kill.c:68
            #3  kill_pid (pid=<value optimized out>) at ../sysdeps/mach/hurd/kill.c:105
            #4  0x0109a69f in __kill (pid=21142, sig=6) at ../sysdeps/mach/hurd/kill.c:139
            #5  0x01099af6 in raise (sig=6) at ../sysdeps/posix/raise.c:27
            #6  0x0109de59 in abort () at abort.c:88
            #7  0x0804849f in main ()

        `char *foo = 0; *foo = 1;`

            $ gdb test core
            Program terminated with signal 11, Segmentation fault.
            warning: Wrong size fpregset in core file.
            #0  0x00000000 in ?? ()
            (gdb) bt
            #0  0x00000000 in ?? ()
            #1  0x0108565b in __libc_start_main (main=0x8048464 <main>, argc=1, ubp_av=0x1023e64, 
                init=0x8048490 <__libc_csu_init>, fini=0x8048480 <__libc_csu_fini>, rtld_fini=0xea20 <_dl_fini>, 
                stack_end=0x1023e5c) at libc-start.c:251
            #2  0x080483d1 in _start ()

        `raise (SIGABRT);`

            $ gdb a.out core
            warning: core file may not match specified executable file.
            [New Thread 76651]
            
            warning: Wrong size fpregset in core file.
            Reading symbols from /lib/libc.so.0.3...[...]
            Core was generated by `./a.out'.
            Program terminated with signal 6, Aborted.
            
            warning: Wrong size fpregset in core file.
            #0  0x00000000 in ?? ()
            (gdb) bt
            #0  0x00000000 in ?? ()
            Cannot access memory at address 0x17

        [[!tag open_issue_gdb]] [[GDB]] probably fails to unwind the stack properly.

  * ... against `crash-suspend` will...

      * ... not work at all:
    
            $ CRASHSERVER=/servers/crash-suspend ./a.out
            $ [returns to the shell; the process is not suspended]

      * ... show large variance in running time:
    
            $ TIMEFORMAT='real %R user %U system %S'
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.381 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.332 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 21.228 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:04 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.323 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:05 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 22.279 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:05 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.362 user 0.000 system 0.000
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 21.110 user 0.000 system 0.000
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core
            $ rm -f core; time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted (core dumped)
            real 1.350 user 0.000 system 0.020
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core

      * ... reliably crash GNU Mach:

        This happens if a `core` file is already present (and won't get
        overwritten; see above).  I reproduced this three times.

            $ TIMEFORMAT='real %R user %U system %S'
            $ time env CRASHSERVER=/servers/crash-suspend ./a.out; ls -l core
            Aborted
            real 2.856 user 0.000 system 0.010
            -rw------- 1 tschwinge tschwinge 17031168 Jul  7 22:08 core

            panic: zalloc: zone kalloc.8192 exhausted
            Kernel Breakpoint trap, eip 0x20020a77
            Stopped at  0x20020a76: int     $3
            db> trace
            0x20020a76(2006aba8,4d0f7e9c,200209b0,0,0)
            0x20020a4d(2006b094,2006ae40,2000,20016803,4a5f4114)
            0x2002bca5(49a03564,1,0,9,1000)
            0x20022f4c(2000,4a5f45d4,4a84879c,49a46564,4ac43e78)
            0x20021e65(4ac43e78,4a5f45d4,4a5f4114,0,0)
            0x2005309d(2106ba9c,3,38,28,1783)
            Bad frame pointer: 0x2106ba78

            $ addr2line -i -f -e /boot/gnumach-xen 0x20020a76 0x20020a4d 0x2002bca5 0x20022f4c 0x20021e65 0x2005309d
            Debugger
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/debug.c:105
            panic
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/debug.c:148
            zalloc
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/zalloc.c:470
            kalloc
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/kalloc.c:185
            ipc_kobject_server
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/kern/ipc_kobject.c:76
            mach_msg_trap
            /home/tschwinge/tmp/gnumach/gnumach-1-branch-Xen-branch.build/../gnumach-1-branch-Xen-branch/ipc/mach_msg.c:1367


# IRC, freenode, #hurd, 2013-09-07

    <rekado> I'm trying to investigate a crash in pfinet, so it will actually
      die.  I just want to know why it dies and what the value of a few
      variables has been when it died.
    <teythoon> have you tried to make it dump core?
    <rekado> oh, good idea.
    <rekado> I'll try that.
    <teythoon> do you know how?
    <rekado> I don't, but I think I can figure it out.
    <teythoon> look into /servers
    <rekado> do I just have to set CRASHSERVER=/servers/crash-dump-core and run
      pfinet in that environment?
    <teythoon> possibly, I've never heard of CRASHSERVER, but it's certainly
      plausible ;)
    <teythoon> I just link crash to crash-dump-core, that way it is permanent
      and for all processes
    <rekado> found it in the website contents
    <rekado> gotta try that.
    <rekado> hmm, I can't get pfinet to dump core; linked /servers/crash to
      /servers/crash-dump-core and compiled pfinet to raise(6) at one point.
    <rekado> But no core file is created.
    <teythoon> :/
    <teythoon> rekado: try cd /tmp ; cat & kill -SIGILL %% to see if that dumps
      core
    <rekado> yes, this works.
    <rekado> I replaced the original pfinet with my crashing version.
    <rekado> Should it dump core to /hurd then?
    <teythoon> I'm not sure about it's wd
    <teythoon> hm, ok, I just did settrans -ca foo /hurd/pfinet and then killed
      that pfient with SIGILL and it dumped core
    <teythoon> to the directory I issued the settrans from
    <rekado> So I must run it myself.  I can't just replace the original binary
      and have it dump core somewhere.
    <teythoon> it seems that you have to use settrans -ca to start an active
      translator
    <teythoon> do fsysopts /servers/socket/2 to find out the cmdline of your
      pfinet
    <rekado> that's very helpful.
    <rekado> thanks
    <teythoon> then use this to restart it, e.g.:
    <teythoon> settrans -afg /servers/socket/2 $(fsysopts /servers/socket/2)
    <teythoon> if it dies it should dump core to you cwd
    <rekado> great. Thank you very much.  I had been wondering how to get the
      full cmdline of pfinet.
    * rekado makes a note of fsysopts
    <rekado> yup, there's the core file. Nice.
    <teythoon> cool 8D
    <teythoon> btw, in case using gdb doesn't work out for your problem, if you
      start pfinet (or any translator) this way (with -a == active), you can
      write stuff to stderr
    <rekado> yeah, I noticed that.  The assert() call wrote to stderr.  Useful.
    <braunr> rekado: core dumps are another not-working-well feature of the
      hurd :/
    <braunr> i recommend attaching
    <tschwinge> rekado: In case that's still helpful:
      <http://www.gnu.org/software/hurd/hurd/debugging/translator.html>.

---

Anyone working in this area may also want to look at [[GDB_gcore]], and to
consider porting <http://code.google.com/p/google-coredumper/>.