open_issues/code_analysis.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290

[[!meta copyright="Copyright © 2010, 2011, 2012, 2013, 2014 Free Software
Foundation, Inc."]]

[[!meta license="""[[!toggle id="license" text="GFDL 1.2+"]][[!toggleable
id="license" text="Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.  A copy of the license
is included in the section entitled [[GNU Free Documentation
License|/fdl]]."]]"""]]

In the topic of *code analysis* or *program analysis* ([[!wikipedia
Program_analysis_(computer_science) desc="Wikipedia article"]]), there is
static code analysis ([[!wikipedia Static_code_analysis desc="Wikipedia
article"]]) and dynamic program analysis ([[!wikipedia Dynamic_program_analysis
desc="Wikipedia article"]]).  This topic overlaps with [[performance
analysis|performance]], [[formal_verification]], as well as general
[[debugging]].

[[!toc]]


# Bounty

There is a [[!FF_project 276]][[!tag bounty]] on some of these tasks.


# Static

  * [[GCC]]'s warnings.  Yes, really.

      * GCC plugins can be used for additional semantic analysis.  For example,
        <http://lwn.net/Articles/457543/>, and search for *kernel context* in
        the comments.

      * Have GCC make use of [[RPC]]/[[microkernel/mach/MIG]] *in*/*out*
        specifiers, and have it emit useful warnings in case these are pointing
        to uninitialized data (for *in* only).

  * [[!wikipedia List_of_tools_for_static_code_analysis]]

  * [Engineering zero-defect software](http://esr.ibiblio.org/?p=4340), Eric
    S. Raymond, 2012-05-13

  * [Static Source Code Analysis Tools for C](http://spinroot.com/static/)

  * [Cppcheck](http://sourceforge.net/apps/mediawiki/cppcheck/)

    For example, [Debian's hurd_20110319-2
    package](http://qa.debian.org/daca/cppcheck/sid/hurd_20110319-2.html)
    (Samuel Thibault, 2011-08-05: *I had a look at those, some are spurious;
    the realloc issues are for real*).

  * Coccinelle

      * <http://lwn.net/Articles/315686/>

      * <http://www.google.com/search?q=coccinelle+analysis>

    Has already been used for finding and fixing [[!message-id desc="double
    mutex unlocking issues"
    "1355701890-29227-1-git-send-email-tipecaml@gmail.com"]].

  * [clang](http://www.google.com/search?q=clang+analysis)

      * <http://darnassus.sceen.net/~teythoon/qa/gnumach/scan-build>

      * <http://darnassus.sceen.net/~teythoon/qa/hurd/scan-build>

  * [Linux' sparse](https://sparse.wiki.kernel.org/)

  * <http://klee.llvm.org/>

      * <http://blog.llvm.org/2010/04/whats-wrong-with-this-code.html>

  * [Smatch](http://smatch.sourceforge.net/)

  * [Parfait](http://labs.oracle.com/projects/parfait/)

      * <http://lwn.net/Articles/344003/>

  * [Saturn](http://saturn.stanford.edu/)

  * [Flawfinder](http://www.dwheeler.com/flawfinder/)

  * [sixgill](http://sixgill.org/)

  * [s-spider](http://code.google.com/p/s-spider/)

  * [CIL (C Intermediate Language)](http://kerneis.github.com/cil/)

  * [Frama-C](http://frama-c.com/)

        <teythoon> btw, I've been looking at http://frama-c.com/ lately
        <teythoon> it's a theorem prover for c/c++
        <braunr> oh nice
        <teythoon> I think it's most impressive, it works on the hurd (aptitude
          install frama-c o_O)
        <teythoon> *and it works
        <braunr> "Simple things should be simple,
        <braunr> complex things should be possible."
        <braunr> :)
        <braunr> looks great
        <teythoon> even the gui is awesome, allows one to browse source code in
          a very impressive way
        <braunr> clear separation between value changes, dependencies, side
          effects
        <braunr> we could have plugins for stuff like ports
        <braunr> handles concurrency oO
        <nalaginrut> so you want to use Frame-C to analyze the whole Hurd code
          base?
        <teythoon> nalaginrut: well, frama-c looks "able" to assist in
          analyzing the Hurd, yes
        <teythoon> nalaginrut: but theorem proving is a manual process, one
          needs to guide the prover
        <teythoon> nalaginrut: b/c some stuff is not decideable
        <nalaginrut> I ask this because I can imagine how to analyze Linux
          since all the code is in a directory. But Hurd's codes are
          distributed to many other projects
        <braunr> that's not a problem
        <braunr> each server can be analyzed separately
        <teythoon> braunr: also, each "entry point"
        <nalaginrut> alright, but sounds a big work
        <teythoon> it is
        <braunr> otherwise, formal verification would be widespread :)
        <teythoon> that, and most tools are horrible to use, frama-c is really
          an exception in this regard

  * [Coverity](http://www.coverity.com/) (nonfree)

      * <https://scan.coverity.com/projects/1307> If you want access, speak up in #hurd or on the mailing list.

      * IRC, OFTC, #debian-hurd, 2014-02-03

            <pere> btw, did you consider adding hurd and mach to <URL:
              https://scan.coverity.com/ > to detect bugs automatically?
            <pere> I found lots of bugs in gnash, ipmitool and sysvinit when I
              started scanning those projects. :)
            <teythoon> i did some static analysis work, i haven't used coverty
              but free tools for that
            <teythoon> i think thomas wanted to look into coverty though
            <pere> quite easy to set up, but you need to download and run a
              non-free tarball on the build host.
            <teythoon> does that tar ball contains binary code ?
            <teythoon> that'd be a show stopper for the hurd of course
            <pere> did not investigate.  I just put it in a contained virtual
              machine.
            <pere> did not want it on my laptop. :)
            <pere> prefer free software here. :)
            <pere> but I did not have to "accept license", at least. :)

      * IRC, OFTC, #debian-hurd, 2014-02-05

            <pere> ah, cool.  <URL: https://scan.coverity.com/projects/1307 >
              is now in place. :)

        [[microkernel/mach/gnumach/projects/clean_up_the_code]],
        *Code_Analysis, Coverity*.

  * [Splint](http://www.splint.org/)

      * IRC, freenode, #hurd, 2011-12-04

            <mcsim> has anyone used splint on hurd?
            <mcsim> this is tool for statically checking C programs
            <mcsim> seems I made it work


## Hurd-specific Applications

  * [[Port Sequence Numbers|microkernel/mach/ipc/sequence_numbering]].  If
    these are used, care must be taken to update them reliably, [[!message-id
    "1123688017.3905.22.camel@buko.sinrega.org"]].  This could be checked by a
    static analysis tool.

  * [[glibc]]'s [[glibc/critical_section]]s.


# Dynamic

  * [[community/gsoc/project_ideas/Valgrind]]

  * glibc's `libmcheck`

      * Used by GDB, for example.

      * Is not thread-safe, [[!sourceware_PR 6547]], [[!sourceware_PR 9939]],
        [[!sourceware_PR 12751]], [[!stackoverflow_question 314931]].

  * <http://en.wikipedia.org/wiki/Electric_Fence>

      * <http://sourceforge.net/projects/duma/>

  * <http://wiki.debian.org/Hardening>

  * <https://wiki.ubuntu.com/CompilerFlags>

  * `MALLOC_CHECK_`/`MALLOC_PERTURB_`

      * IRC, freenode, #glibc, 2011-09-28

            <vsrinivas> two things you can do -- there is an environment
              variable (DEBUG_MALLOC_ iirc?) that can be set to 2 to make
              ptmalloc (glibc's allocator) more forceful and verbose wrt error
              checking
            <vsrinivas> another is to grab a copy of Tor's source tree and copy
              out OpenBSD's allocator (its a clearly-identifyable file in the
              tree); LD_PRELOAD it or link it into your app, it is even more
              aggressive about detecting memory misuse.
            <vsrinivas> third, Red hat has a gdb python plugin that can
              instrument glibc's heap structure. its kinda handy, might help?
            <vsrinivas> MALLOC_CHECK_ was the envvar you want, sorry.

      * [`MALLOC_PERTURB_`](http://udrepper.livejournal.com/11429.html)

      * <http://git.fedorahosted.org/cgit/initscripts.git/diff/?id=deb0df0124fbe9b645755a0a44c7cb8044f24719>

  * In context of [[!message-id
    "1341350006-2499-1-git-send-email-rbraun@sceen.net"]]/the `alloca` issue
    mentioned in [[gnumach_page_cache_policy]]:

    IRC, freenode, #hurd, 2012-07-08:

        <youpi> braunr: there's actually already an ifdef REDZONE in libthreads

    It's `RED_ZONE`.

        <youpi> except it seems clumsy :)
        <youpi> ah, no, the libthreads code properly sets the guard, just for
          grow-up stacks

  * GCC, LLVM/clang: [[Address Sanitizer (asan), Memory Sanitizer (msan),
    Thread Sanitizer (tasn), Undefined Behavor Sanitizer (ubsan), ...|_san]]

  * [GCC plugins](http://gcc.gnu.org/wiki/plugins)

      * [CTraps](https://github.com/blucia0a/CTraps-gcc)

        > CTraps is a gcc plugin and runtime library that inserts calls to runtime
        > library functions just before shared memory accesses in parallel/concurrent
        > code.
        > 
        > The purpose of this plugin is to expose information about when and how threads
        > communicate with one another to programmers for the purpose of debugging and
        > performance tuning.  The overhead of the instrumentation and runtime code is
        > very low -- often low enough for always-on use in production code.  In a series
        > of initial experiments the overhead was 0-10% in many important cases.

  * Input fuzzing

    Not a new topic; has been used (and papers published?) for early [[UNIX]]
    tools.  What about some [[RPC]] fuzzing?

      * <http://caca.zoy.org/wiki/zzuf>

      * <http://www.ece.cmu.edu/~koopman/ballista/>

      * [Jones: system call abuse](http://lwn.net/Articles/414273/), Dave
        Jones, 2010.

          * [Trinity: A Linux system call
            fuzzer]http://codemonkey.org.uk/projects/trinity/().
            [Trinity: A Linux kernel fuzz tester (and then
            some)](http://www.socallinuxexpo.org/scale11x/presentations/trinity-linux-kernel-fuzz-tester-and-then-some),
            Dave Jones, The Eleventh Annual Southern California Linux Expo, 2013.

      * [American fuzzy lop](https://code.google.com/p/american-fuzzy-lop/), *a
        practical, instrumentation-driven fuzzer for binary formats*.

      * [Melkor - An ELF File Format
        Fuzzer](https://www.blackhat.com/us-14/arsenal.html#Hernandez),
        Alejandro Hernández.

          * Can use this to find bugs in our [[hurd/translator/exec]] server,
            for example?  See also the discussion in [[!message-id
            "5452389B.502@samsung.com"]].

  * Mayhem, *an automatic bug finding system*

    IRC, freenode, #hurd, 2013-06-29:

        <teythoon> started reading the mayhem paper referenced here
          http://lists.debian.org/debian-devel/2013/06/msg00720.html
        <teythoon> that's nice work, they are doing symbolic execution of x86
          binary code, that's effectively model checking with some specialized
          formulas
        <teythoon> (too bad the mayhem code isn't available, damn those
          academic people keeping the good stuff to themselvs...)
        <teythoon> (and I really think that's bad practice, how should anyone
          reproduce their results? that's not how science works imho...)