FLAWFINDER(1)                     Flawfinder                     FLAWFINDER(1)

NAME
flawfinder - lexically find potential security flaws ("hits") in source
code

SYNOPSIS
flawfinder [--help|-h] [--version] [--listrules]
           [--allowlink] [--followdotdir] [--nolink]
           [--patch=filename|-P filename]
           [--inputs|-I] [ --minlevel=X | -m X ] [--falsepositive|-F]
           [--neverignore|-n]
           [--regex=PATTERN | -e PATTERN]
           [--context|-c] [--columns|-C] [--csv] [--dataonly|-D] [--html|-H]
           [--immediate|-i] [--singleline|-S] [--omittime] [--quiet|-Q]
           [--error-level=LEVEL]
           [--loadhitlist=F] [--savehitlist=F] [--diffhitlist=F]
           [--] [ source code file or source root directory ]+
21
DESCRIPTION
23 Flawfinder searches through C/C++ source code looking for potential
24 security flaws. To run flawfinder, simply give flawfinder a list of
25 directories or files. For each directory given, all files that have
26 C/C++ filename extensions in that directory (and its subdirectories,
27 recursively) will be examined. Thus, for most projects, simply give
28 flawfinder the name of the source code's topmost directory (use ``.''
29 for the current directory), and flawfinder will examine all of the
30 project's C/C++ source code. Flawfinder does not require that you be
31 able to build your software, so it can be used even with incomplete
32 source code. If you only want to have changes reviewed, save a unified
33 diff of those changes (created by GNU "diff -u" or "svn diff" or "git
34 diff") in a patch file and use the --patch (-P) option.
35
36 Flawfinder will produce a list of ``hits'' (potential security flaws,
37 also called findings), sorted by risk; the riskiest hits are shown
38 first. The risk level is shown inside square brackets and varies from
39 0, very little risk, to 5, great risk. This risk level depends not
40 only on the function, but on the values of the parameters of the func‐
41 tion. For example, constant strings are often less risky than fully
42 variable strings in many contexts, and in those contexts the hit will
43 have a lower risk level. Flawfinder knows about gettext (a common
44 library for internationalized programs) and will treat constant strings
45 passed through gettext as though they were constant strings; this
46 reduces the number of false hits in internationalized programs.
47 Flawfinder will do the same sort of thing with _T() and _TEXT(), common
48 Microsoft macros for handling internationalized programs. Flawfinder
49 correctly ignores text inside comments and strings. Normally
50 flawfinder shows all hits with a risk level of at least 1, but you can
51 use the --minlevel option to show only hits with higher risk levels if
52 you wish. Hit descriptions also note the relevant Common Weakness Enu‐
53 meration (CWE) identifier(s) in parentheses, as discussed below.
Flawfinder is officially CWE-Compatible. Hit descriptions with
"[MS-banned]" indicate functions that are in the banned list of functions
released by Microsoft; see
http://msdn.microsoft.com/en-us/library/bb288454.aspx for more
information about banned functions.
58
59 Not every hit (aka finding) is actually a security vulnerability, and
60 not every security vulnerability is necessarily found. Nevertheless,
61 flawfinder can be an aid in finding and removing security vulnerabili‐
62 ties. A common way to use flawfinder is to first apply flawfinder to a
63 set of source code and examine the highest-risk items. Then, use
64 --inputs to examine the input locations, and check to make sure that
65 only legal and safe input values are accepted from untrusted users.
66
67 Once you've audited a program, you can mark source code lines that are
68 actually fine but cause spurious warnings so that flawfinder will stop
69 complaining about them. To mark a line so that these warnings are sup‐
70 pressed, put a specially-formatted comment either on the same line
71 (after the source code) or all by itself in the previous line. The
72 comment must have one of the two following formats:
73
74 · // Flawfinder: ignore
75
76 · /* Flawfinder: ignore */
77
78 For compatibility's sake, you can replace "Flawfinder:" with "ITS4:" or
79 "RATS:" in these specially-formatted comments. Since it's possible
80 that such lines are wrong, you can use the --neverignore option, which
81 causes flawfinder to never ignore any line no matter what the comment
82 directives say (more confusingly, --neverignore ignores the ignores).
83
84 Flawfinder uses an internal database called the ``ruleset''; the rule‐
85 set identifies functions that are common causes of security flaws. The
86 standard ruleset includes a large number of different potential prob‐
87 lems, including both general issues that can impact any C/C++ program,
88 as well as a number of specific Unix-like and Windows functions that
89 are especially problematic. The --listrules option reports the list of
90 current rules and their default risk levels. As noted above, every
91 potential security flaw found in a given source code file (matching an
92 entry in the ruleset) is called a ``hit,'' and the set of hits found
93 during any particular run of the program is called the ``hitlist.''
94 Hitlists can be saved (using --savehitlist), reloaded back for redis‐
95 play (using --loadhitlist), and you can show only the hits that are
96 different from another run (using --diffhitlist).
97
98 Flawfinder is a simple tool, leading to some fundamental pros and cons.
99 Flawfinder works by doing simple lexical tokenization (skipping com‐
100 ments and correctly tokenizing strings), looking for token matches to
101 the database (particularly to find function calls). Flawfinder is thus
102 similar to RATS and ITS4, which also use simple lexical tokenization.
103 Flawfinder then examines the text of the function parameters to esti‐
104 mate risk. Unlike tools such as splint, gcc's warning flags, and
105 clang, flawfinder does not use or have access to information about con‐
106 trol flow, data flow, or data types when searching for potential vul‐
107 nerabilities or estimating the level of risk. Thus, flawfinder will
108 necessarily produce many false positives for vulnerabilities and fail
109 to report many vulnerabilities. On the other hand, flawfinder can find
110 vulnerabilities in programs that cannot be built or cannot be linked.
111 It can often work with programs that cannot even be compiled (at least
112 by the reviewer's tools). Flawfinder also doesn't get as confused by
113 macro definitions and other oddities that more sophisticated tools have
114 trouble with. Flawfinder can also be useful as a simple introduction
115 to static analysis tools in general, since it is easy to start using
116 and easy to understand.
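
The lexical approach described above can be illustrated with a small Python sketch. This is not flawfinder's code: the three function names in RISKY are illustrative placeholders rather than the real ruleset, and find_calls is a hypothetical helper. It blanks out comments and string literals, then reports names that are followed by "(", with no knowledge of control flow, data flow, or types.

```python
import re

# Comments and literals to blank out before matching (a simplification).
_SKIP = re.compile(
    r"/\*.*?\*/"            # block comment
    r"|//[^\n]*"            # line comment
    r'|"(?:\\.|[^"\\])*"'   # string literal
    r"|'(?:\\.|[^'\\])*'",  # character literal
    re.DOTALL,
)

RISKY = ("strcpy", "gets", "sprintf")  # illustrative names only, not the ruleset
_CALL = re.compile(r"\b(%s)\s*\(" % "|".join(RISKY))


def find_calls(source):
    """Return (line_number, name) for each risky name followed by '('."""
    def blank(match):
        # Replace skipped text with spaces but keep newlines,
        # so that line numbers in the cleaned text stay correct.
        return "".join(ch if ch == "\n" else " " for ch in match.group(0))

    cleaned = _SKIP.sub(blank, source)
    hits = []
    for match in _CALL.finditer(cleaned):
        line = cleaned.count("\n", 0, match.start()) + 1
        hits.append((line, match.group(1)))
    return hits
```

Because the match is purely textual, a call hidden behind a macro or a function pointer is missed, and a harmless call is still reported; that is the trade-off the paragraph above describes.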
117
118 Any filename given on the command line will be examined (even if it
119 doesn't have a usual C/C++ filename extension); thus you can force
120 flawfinder to examine any specific files you desire. While searching
121 directories recursively, flawfinder only opens and examines regular
122 files that have C/C++ filename extensions. Flawfinder presumes that
123 files are C/C++ files if they have the extensions ".c", ".h", ".ec",
124 ".ecp", ".pgc", ".C", ".cpp", ".CPP", ".cxx", ".c++", ".cc", ".CC",
125 ".pcc", ".hpp", or ".H". The filename ``-'' means the standard input.
126 To prevent security problems, special files (such as device special
127 files and named pipes) are always skipped, and by default symbolic
128 links are skipped (the --allowlink option follows symbolic links).
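
As a rough illustration of the file-selection rules in the paragraph above, the following Python sketch walks a directory tree and keeps only regular files with one of the listed extensions, skipping symbolic links and special files. The function name candidate_files is hypothetical, and the sketch does not model explicitly named files, the ``-'' argument, or the --allowlink option.

```python
import os
import stat

# Extensions listed in the paragraph above (matching is case-sensitive).
C_EXTENSIONS = {".c", ".h", ".ec", ".ecp", ".pgc", ".C", ".cpp", ".CPP",
                ".cxx", ".c++", ".cc", ".CC", ".pcc", ".hpp", ".H"}


def candidate_files(root):
    """Yield paths under root that the rules above would select."""
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat does not follow symlinks
            if stat.S_ISLNK(mode) or not stat.S_ISREG(mode):
                continue  # skip symbolic links, devices, named pipes, etc.
            if os.path.splitext(name)[1] in C_EXTENSIONS:
                yield path
```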
129
130 After the list of hits is a brief summary of the results (use -D to
131 remove this information). It will show the number of hits, lines ana‐
132 lyzed (as reported by wc -l), and the physical source lines of code
133 (SLOC) analyzed. A physical SLOC is a non-blank, non-comment line. It
134 will then show the number of hits at each level; note that there will
135 never be a hit at a level lower than minlevel (1 by default). Thus,
136 "[0] 0 [1] 9" means that at level 0 there were 0 hits reported, and
137 at level 1 there were 9 hits reported. It will next show the number of
138 hits at a given level or larger (so level 3+ has the sum of the number
139 of hits at level 3, 4, and 5). Thus, an entry of "[0+] 37" shows that
140 at level 0 or higher there were 37 hits (the 0+ entry will always be
141 the same as the "hits" number above). Hits per KSLOC is next shown;
142 this is each of the "level or higher" values multiplied by 1000 and
143 divided by the physical SLOC. If symlinks were skipped, the count of
144 those is reported. If hits were suppressed (using the "ignore" direc‐
145 tive in source code comments as described above), the number suppressed
146 is reported. The minimum risk level to be included in the report is
147 displayed; by default this is 1 (use --minlevel to change this). The
148 summary ends with important reminders: Not every hit is necessarily a
149 security vulnerability, and there may be other security vulnerabilities
150 not reported by the tool.
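
The arithmetic behind the "level or higher" and "hits per KSLOC" lines can be sketched as follows. The per-level counts and the SLOC figure in the usage note are made-up inputs chosen to agree with the "[1] 9" and "[0+] 37" examples above; summarize is a hypothetical helper, not part of flawfinder.

```python
def summarize(hits_per_level, sloc):
    """Compute cumulative hit counts and hits per KSLOC.

    hits_per_level: six counts, indexed by risk level 0 through 5.
    sloc: physical source lines of code analyzed.
    """
    # Entry N is the number of hits at level N or higher.
    at_or_above = [sum(hits_per_level[level:]) for level in range(6)]
    # Each cumulative value multiplied by 1000 and divided by the SLOC.
    per_ksloc = [count * 1000 / sloc for count in at_or_above]
    return at_or_above, per_ksloc
```

For instance, summarize([0, 9, 12, 8, 6, 2], 2000) gives cumulative counts of 37, 37, 28, 16, 8, and 2, so the 0+ entry equals the total number of hits, and the 3+ entry (16) is the sum of levels 3, 4, and 5.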
151
152 Flawfinder can easily integrate into a continuous integration system.
153 You might want to check out the --error-level option to help do that.
154
Flawfinder is released under the GNU General Public License (GPL),
version 2 or later (GPLv2+).
157
158 Flawfinder works similarly to another program, ITS4, which is not fully
159 open source software (as defined in the Open Source Definition) nor
160 free software (as defined by the Free Software Foundation). The author
161 of Flawfinder has never seen ITS4's source code. Flawfinder is similar
162 in many ways to RATS, if you are familiar with RATS.
163
164
BRIEF TUTORIAL
166 Here's a brief example of how flawfinder might be used. Imagine that
167 you have the C/C++ source code for some program named xyzzy (which you
168 may or may not have written), and you're searching for security vulner‐
169 abilities (so you can fix them before customers encounter the vulnera‐
170 bilities). For this tutorial, I'll assume that you're using a Unix-
171 like system, such as Linux, OpenBSD, or MacOS X.
172
173 If the source code is in a subdirectory named xyzzy, you would probably
174 start by opening a text window and using flawfinder's default settings,
175 to analyze the program and report a prioritized list of potential secu‐
176 rity vulnerabilities (the ``less'' just makes sure the results stay on
177 the screen):
178 flawfinder xyzzy | less
179
180
181 At this point, you will see a large number of entries. Each entry has
182 a filename, a colon, a line number, a risk level in brackets (where 5
183 is the most risky), a category, the name of the function, and a
184 description of why flawfinder thinks the line is a vulnerability.
185 Flawfinder normally sorts by risk level, showing the riskiest items
186 first; if you have limited time, it's probably best to start working on
the riskiest items and continue until you run out of time. If you want
to limit the display to hits at or above a certain risk level,
use the --minlevel option. If you're getting an extraordinary number
of false positives because variable names look like dangerous function
names, use the -F option to remove reports about them. If you don't
understand the error message, please see documents such as the Secure
Programming HOWTO at https://dwheeler.com/secure-programs, which
provides more information on writing secure programs.
196
197 Once you identify the problem and understand it, you can fix it. Occa‐
198 sionally you may want to re-do the analysis, both because the line num‐
199 bers will change and to make sure that the new code doesn't introduce
200 yet a different vulnerability.
201
If you've determined that some line isn't really a problem, and you're
sure of it, you can insert, just before or on the offending line, a
comment like
     /* Flawfinder: ignore */
to keep its warnings from showing up in the output.
207
208 Once you've done that, you should go back and search for the program's
209 inputs, to make sure that the program strongly filters any of its
210 untrusted inputs. Flawfinder can identify many program inputs by using
211 the --inputs option, like this:
212 flawfinder --inputs xyzzy
213
214 Flawfinder can integrate well with text editors and integrated develop‐
215 ment environments; see the examples for more information.
216
217 Flawfinder includes many other options, including ones to create HTML
218 versions of the output (useful for prettier displays). The next sec‐
219 tion describes those options in more detail.
220
221
OPTIONS
223 Flawfinder has a number of options, which can be grouped into options
224 that control its own documentation, select input data, select which
225 hits to display, select the output format, and perform hitlist manage‐
226 ment. The commonly-used flawfinder options support the standard option
227 syntax defined in the POSIX (Issue 7, 2013 Edition) section ``Utility
228 Conventions''. Flawfinder also supports the GNU long options (double-
229 dash options of form --option) as defined in the GNU C Library Refer‐
230 ence Manual ``Program Argument Syntax Conventions'' and GNU Coding
Standards ``Standards for Command Line Interfaces''. Long option
arguments can be provided as ``--name=value'' or ``--name value''. All
233 options can be accessed using the more readable GNU long option conven‐
234 tions; some less commonly used options can only be accessed using long
235 option conventions.
236
237
238 Documentation
239 --help
240
241 -h Show usage (help) information.
242
243
244 --version Shows (just) the version number and exits.
245
246
247 --listrules List the terms (tokens) that trigger further examination,
248 their default risk level, and the default warning (includ‐
249 ing the CWE identifier(s), if applicable), all tab-sepa‐
250 rated. The terms are primarily names of potentially-dan‐
251 gerous functions. Note that the reported risk level and
252 warning for some specific code may be different than the
253 default, depending on how the term is used. Combine with
254 -D if you do not want the usual header. Flawfinder version
255 1.29 changed the separator from spaces to tabs, and added
256 the default warning field.
257
258
259 Selecting Input Data
260 --allowlink Allow the use of symbolic links; normally symbolic links
261 are skipped. Don't use this option if you're analyzing
262 code by others; attackers could do many things to cause
263 problems for an analysis with this option enabled. For
264 example, an attacker could insert symbolic links to files
265 such as /etc/passwd (leaking information about the file) or
266 create a circular loop, which would cause flawfinder to run
267 ``forever''. Another problem with enabling this option is
268 that if the same file is referenced multiple times using
269 symbolic links, it will be analyzed multiple times (and
270 thus reported multiple times). Note that flawfinder
271 already includes some protection against symbolic links to
272 special file types such as device file types (e.g.,
273 /dev/zero or C:\mystuff\com1). Note that for flawfinder
274 version 1.01 and before, this was the default.
275
276
277 --followdotdir
278 Enter directories whose names begin with ".". Normally
279 such directories are ignored, since they normally include
280 version control private data (such as .git/ or .svn/),
281 build metadata (such as .makepp), configuration informa‐
282 tion, and so on.
283
284
285 --nolink Ignored. Historically this disabled following symbolic
286 links; this behavior is now the default.
287
288
289 --patch=patchfile
290
291 -P patchfile
292 Examine the selected files or directories, but only report
293 hits in lines that are added or modified as described in
294 the given patch file. The patch file must be in a recog‐
295 nized unified diff format (e.g., the output of GNU "diff -u
296 old new", "svn diff", or "git diff [commit]"). Flawfinder
297 assumes that the patch has already been applied to the
298 files. The patch file can also include changes to irrele‐
299 vant files (they will simply be ignored). The line numbers
300 given in the patch file are used to determine which lines
301 were changed, so if you have modified the files since the
302 patch file was created, regenerate the patch file first.
303 Beware that the file names of the new files given in the
304 patch file must match exactly, including upper/lower case,
305 path prefix, and directory separator (\ vs. /). Only uni‐
306 fied diff format is accepted (GNU diff, svn diff, and git
307 diff output is okay); if you have a different format, again
308 regenerate it first. Only hits that occur on resultant
309 changed lines, or immediately above and below them, are
310 reported. This option implies --neverignore. Warning: Do
311 not pass a patch file without the -P, because flawfinder
312 will then try to treat the file as a source file. This
313 will often work, but the line numbers will be relative to
314 the beginning of the patch file, not the positions in the
315 source code. Note that you must also provide the actual
316 files to analyze, and not just the patch file; when using
317 -P files are only reported if they are both listed in the
318 patch and also listed (directly or indirectly) in the list
319 of files to analyze.
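
To make the role of the patch's line numbers concrete, here is a minimal Python sketch that reads a unified diff and collects, per new file name, the line numbers of added or modified lines. It is an illustration under simplifying assumptions, not flawfinder's parser: added_lines is a hypothetical name, file names are taken verbatim from the "+++" header (up to the first tab), and the "immediately above and below" widening mentioned above is left out.

```python
import re

# Hunk header, e.g. "@@ -1,3 +1,4 @@"; a missing count defaults to 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(patch_text):
    """Map each new-file name to the set of line numbers the patch adds."""
    result = {}
    current = None            # new-file name of the diff being read
    old_left = new_left = 0   # lines still expected in the current hunk
    new_lineno = 0            # line number in the patched file
    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                result[current].add(new_lineno)
                new_lineno += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:  # context line, present in both old and new file
                new_lineno += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            current = line[4:].split("\t")[0].strip()
            result.setdefault(current, set())
            continue
        match = HUNK.match(line)
        if match and current is not None:
            old_left = int(match.group(1) or 1)
            new_lineno = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return result
```

Since the recorded numbers refer to positions in the patched file, editing the file after the patch was created shifts them, which is why the patch file should be regenerated first.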
320
321
322
323 Selecting Hits to Display
324 --inputs
325
326 -I Show only functions that obtain data from outside the program;
327 this also sets minlevel to 0.
328
329
330 --minlevel=X
331
332 -m X Set minimum risk level to X for inclusion in hitlist. This can
333 be from 0 (``no risk'') to 5 (``maximum risk''); the default is
334 1.
335
336
337 --falsepositive
338
339 -F Do not include hits that are likely to be false positives. Cur‐
340 rently, this means that function names are ignored if they're
341 not followed by "(", and that declarations of character arrays
aren't noted. Thus, if you use a variable named "access"
everywhere, this will eliminate references to this ordinary
variable. This isn't the default, because this also increases
345 the likelihood of missing important hits; in particular, func‐
346 tion names in #define clauses and calls through function point‐
347 ers will be missed.
348
349
350 --neverignore
351
352 -n Never ignore security issues, even if they have an ``ignore''
353 directive in a comment.
354
355
--regex=PATTERN

-e PATTERN
Only report hits with text that matches the regular expression
pattern PATTERN. For example, to only report hits containing
the text "CWE-120", use ``--regex CWE-120''. The short flag -e
is the same one grep uses.
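
The effect of the pattern filter can be pictured with a short Python sketch. It assumes, as a guess rather than a documented fact, that the pattern may match anywhere in the hit text (search semantics); filter_hits and the sample strings in the usage note are hypothetical.

```python
import re


def filter_hits(hit_texts, pattern):
    """Keep only the hit descriptions in which the pattern matches somewhere."""
    regex = re.compile(pattern)
    return [text for text in hit_texts if regex.search(text)]
```

With the pattern "CWE-120|CWE-126", a description mentioning (CWE-120) or (CWE-126) is kept, while one mentioning only (CWE-78) is dropped.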
363
364
365
366 Selecting Output Format
367 --columns
368
369 -C Show the column number (as well as the file name and line
370 number) of each hit; this is shown after the line number by
371 adding a colon and the column number in the line (the first
372 character in a line is column number 1). This is useful
373 for editors that can jump to specific columns, or for inte‐
374 grating with other tools (such as those to further filter
375 out false positives).
376
377
378 --context
379
380 -c Show context, i.e., the line having the "hit"/potential
381 flaw. By default the line is shown immediately after the
382 warning.
383
384
385 --csv Generate output in comma-separated-value (CSV) format.
386 This is the recommended format for sending to other tools
387 for processing. It will always generate a header row, fol‐
388 lowed by 0 or more data rows (one data row for each hit).
389 Selecting this option automatically enables --quiet and
390 --dataonly. The headers are mostly self-explanatory.
391 "File" is the filename, "Line" is the line number, "Column"
392 is the column (starting from 1), "Level" is the risk level
393 (0-5, 5 is riskiest), "Category" is the general flawfinder
394 category, "Name" is the name of the triggering rule, "Warn‐
395 ing" is text explaining why it is a hit (finding), "Sugges‐
396 tion" is text suggesting how it might be fixed, "Note" is
397 other explanatory notes, "CWEs" is the list of one or more
398 CWEs, "Context" is the source code line triggering the hit,
399 and "Fingerprint" is the SHA-256 hash of the context once
400 its leading and trailing whitespace have been removed (the
401 fingerprint may help detect and eliminate later duplica‐
402 tions). If you use Python3, the hash is of the context
403 when encoded as UTF-8.
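
Based on the description of the "Fingerprint" column above, a fingerprint could be recomputed along these lines. This is a sketch, not flawfinder's code: fingerprint is a hypothetical helper, and it assumes that Python's str.strip() removes the same whitespace flawfinder does and that the digest is written in hexadecimal.

```python
import hashlib


def fingerprint(context):
    """SHA-256 of the context line, stripped and encoded as UTF-8."""
    stripped = context.strip()  # drop leading and trailing whitespace
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()
```

Two context lines that differ only in indentation produce the same value, which is what makes the fingerprint useful for spotting duplicates after code has been re-indented or moved.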
404
405
406 --dataonly
407
408 -D Don't display the header and footer. Use this along with
409 --quiet to see just the data itself.
410
411
412 --html
413
414 -H Format the output as HTML instead of as simple text.
415
416
417 --immediate
418
419 -i Immediately display hits (don't just wait until the end).
420
421
422 --singleline
423
-S Display a single line of text output for each hit. Useful
for interacting with compilation tools.
426
427
428 --omittime Omit timing information. This is useful for regression
429 tests of flawfinder itself, so that the output doesn't vary
430 depending on how long the analysis takes.
431
432
433 --quiet
434
435 -Q Don't display status information (i.e., which files are
436 being examined) while the analysis is going on.
437
438
439 --error-level=LEVEL
440 Return a nonzero (false) error code if there is at least
441 one hit of LEVEL or higher. If a diffhitlist is provided,
442 hits noted in it are ignored. This option can be useful
443 within a continuous integration script, especially if you
444 mark known-okay lines as "flawfinder: ignore". Usually you
445 want level to be fairly high, such as 4 or 5. By default,
446 flawfinder returns 0 (true) on a successful run.
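
One way a continuous integration job might use this option is sketched below in Python. It assumes flawfinder is installed and on the PATH; build_command and run_gate are hypothetical helper names, and the only flags used are ones documented in this page.

```python
import subprocess


def build_command(paths, error_level=4):
    """Return the argument list for a CI run of flawfinder."""
    return ["flawfinder", "--quiet", "--dataonly",
            "--error-level=%d" % error_level] + list(paths)


def run_gate(paths, error_level=4):
    """Run flawfinder; a nonzero result means a hit at error_level or above."""
    completed = subprocess.run(build_command(paths, error_level))
    return completed.returncode
```

A CI script could then call sys.exit(run_gate(["src"])) so that the job fails whenever a level 4 or 5 hit appears in the src directory.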
447
448
449 Hitlist Management
450 --savehitlist=F
451 Save all resulting hits (the "hitlist") to F.
452
453
454 --loadhitlist=F
455 Load the hitlist from F instead of analyzing source pro‐
456 grams. Warning: Do not load hitlists from untrusted
457 sources (for security reasons). These are internally
458 implemented using Python's "pickle" facility, which trusts
the input. Note that stored hitlists often cannot be read
when using an older version of Python; in particular, if
--savehitlist was used while running flawfinder under Python 3,
the hitlist can't be loaded by running flawfinder with
Python 2.
464
465
466 --diffhitlist=F
467 Show only hits (loaded or analyzed) not in F. F was pre‐
468 sumably created previously using --savehitlist. Warning:
469 Do not diff hitlists from untrusted sources (for security
470 reasons). If the --loadhitlist option is not provided,
471 this will show the hits in the analyzed source code files
472 that were not previously stored in F. If used along with
473 --loadhitlist, this will show the hits in the loaded
474 hitlist not in F. The difference algorithm is conserva‐
475 tive; hits are only considered the ``same'' if they have
476 the same filename, line number, column position, function
477 name, and risk level.
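
The conservative comparison described above can be sketched in Python with a record holding exactly the five compared fields. The Hit type and new_hits function are hypothetical stand-ins; flawfinder's internal hit objects are not shown in this page.

```python
from collections import namedtuple

# Hypothetical record with the five fields the comparison uses.
Hit = namedtuple("Hit", "filename line column name level")


def new_hits(current, saved):
    """Return the hits in current that have no identical hit in saved."""
    saved_keys = set(saved)
    return [hit for hit in current if hit not in saved_keys]
```

One consequence of comparing line numbers is that inserting a line above an old hit makes that hit look new, since its line number no longer matches the saved one.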
478
479
480 Character Encoding Errors
481 Flawfinder uses the character encoding rules set by Python. Sometimes
482 source code does not perfectly follow some encoding rules. If you run
483 flawfinder with Python 2 these non-conformities often do not impact
484 processing in practice.
485
However, if you run flawfinder with Python 3, this can be a problem.
Python 3 developers want the world to always use encodings perfectly
correctly, everywhere, and in general want everyone to only use UTF-8.
UTF-8 is a great encoding, and it is very popular, but the world often
doesn't care what the Python 3 developers want.
491
492 When running flawfinder using Python 3, the program will crash hard if
493 any source file has any non-conforming text. It will do this even if
494 the non-conforming text is in comments or strings (where it often
495 doesn't matter). Python 3 fails to provide useful built-ins to deal
496 with the messiness of the real world, so it's non-trivial to deal with
497 this problem without depending on external libraries (which we're try‐
498 ing to avoid).
499
500 A symptom of this problem is if you run flawfinder and you see an error
501 message like this:
502
503 Error: encoding error in ,1.c
504
505 'utf-8' codec can't decode byte 0xff in position 45: invalid start byte
506
507 What you are seeing is the result of an internal UnicodeDecodeError.
508
509 If this happens to you, there are several options:
510
511 Option #1 (special case): if your system normally uses an encoding
512 other than UTF-8, is properly set up to use that encoding (using LC_ALL
513 and maybe LC_CTYPE), and the input files are in that non-UTF-8 encod‐
514 ing, it may be that Python3 is (incorrectly) ignoring your configura‐
515 tion. In that case, simply tell Python3 to use your configuration by
516 setting the environment variable PYTHONUTF8=0, e.g., run flawfinder as:
517 "PYTHONUTF8=0 python3 flawfinder ...".
518
519 Option #2 (special case): If you know what the encoding of the files
520 is, you can force use of that encoding. E.g., if the encoding is BLAH,
521 run flawfinder as: "PYTHONUTF8=0 LC_ALL=C.BLAH python3 flawfinder ...".
522 You can replace "C" after LC_ALL= with your real language locale (e.g.,
523 "en_US").
524
525 Option #3: If you don't know what the encoding is, or the encoding is
526 inconsistent (e.g., the common case of UTF-8 files with some characters
527 encoded using Windows-1252 instead), then you can force the system to
528 use the ISO-8859-1 (Latin-1) encoding in which all bytes are allowed.
529 If the inconsistencies are only in comments and strings, and the under‐
530 lying character set is "close enough" to ASCII, this can get you going
531 in a hurry. You can do this by running: "PYTHONUTF8=0
532 LC_ALL=C.ISO-8859-1 python3 flawfinder ...". In some cases you may not
533 need the "PYTHONUTF8=0". You may be able to replace "C" after LC_ALL=
534 with your real language locale (e.g., "en_US").
535
536 Option #4: Convert the encoding of the files to be analyzed so that
537 it's a single encoding - it's highly recommended to convert to UTF-8.
For example, the system program "iconv" or the Python program cvt2utf
can be used to convert encodings. (You can install cvt2utf with "pip
install cvt2utf".) This works well if some files have one encoding, and
541 some have another, but they are consistent within a single file. If
542 the files have encoding errors, you'll have to fix them.
543
544 Option #5: Run flawfinder using Python 2 instead of Python 3. E.g.,
545 "python2 flawfinder ...".
546
547 To be clear: I strongly recommend using the UTF-8 encoding for all
548 source code, and use continuous integration tests to ensure that the
549 source code is always valid UTF-8. If you do that, many problems dis‐
550 appear. But in the real world this is not always the situation. Hope‐
551 fully this information will help you deal with real-world encoding
552 problems.
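
The reason Option #3 avoids the crash can be seen in a few lines of Python. The sketch below is not what flawfinder does internally; decode_source is a hypothetical helper showing that strict UTF-8 decoding rejects a stray byte, while ISO-8859-1 accepts every byte value. The same try/except could serve as a simple continuous integration check that source files are valid UTF-8.

```python
def decode_source(raw):
    """Return (text, encoding_used) for the bytes of one source file."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is valid ISO-8859-1, so this decode cannot fail.
        return raw.decode("iso-8859-1"), "iso-8859-1"
```

A file containing the single byte 0xE9 inside a comment fails the UTF-8 attempt and falls back to ISO-8859-1; the ASCII portion of the text, which is what matters for matching function names, is unchanged either way.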
553
554
EXAMPLES
556 Here are various examples of how to invoke flawfinder. The first exam‐
557 ples show various simple command-line options. Flawfinder is designed
558 to work well with text editors and integrated development environments,
559 so the next sections show how to integrate flawfinder into vim and
560 emacs.
561
562
563 Simple command-line options
564 flawfinder /usr/src/linux-3.16
565 Examine all the C/C++ files in the directory
566 /usr/src/linux-3.16 and all its subdirectories (recur‐
567 sively), reporting on all hits found. By default
568 flawfinder will skip symbolic links and directories with
569 names that start with a period.
570
571
572 flawfinder --minlevel=4 .
573 Examine all the C/C++ files in the current directory and
574 its subdirectories (recursively); only report vulnerabili‐
575 ties level 4 and up (the two highest risk levels).
576
577
578 flawfinder --inputs mydir
579 Examine all the C/C++ files in mydir and its subdirectories
580 (recursively), and report functions that take inputs (so
581 that you can ensure that they filter the inputs appropri‐
582 ately).
583
584
585 flawfinder --neverignore mydir
586 Examine all the C/C++ files in the directory mydir and its
587 subdirectories, including even the hits marked for ignoring
588 in the code comments.
589
590
591 flawfinder --csv .
592 Examine the current directory down (recursively), and
593 report all hits in CSV format. This is the recommended
594 form if you want to further process flawfinder output using
595 other tools (such as data correlation tools).
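
As an example of such further processing, the Python sketch below reads CSV text and lists the hits at or above a chosen level. The column names come from the --csv description in this page; the two rows in SAMPLE are invented for illustration (including their warning text and fingerprints), and high_risk is a hypothetical helper. Newer flawfinder versions may emit additional columns, which DictReader would simply carry along.

```python
import csv
import io

# Invented sample data using the column names documented for --csv.
SAMPLE = (
    "File,Line,Column,Level,Category,Name,Warning,Suggestion,Note,CWEs,"
    "Context,Fingerprint\n"
    'demo.c,12,5,4,buffer,strcpy,"Example warning","Example suggestion",,'
    'CWE-120,"  strcpy(dst, src);",abc123\n'
    'demo.c,30,9,2,buffer,strlen,"Example warning","Example suggestion",,'
    'CWE-126,"  n = strlen(s);",def456\n'
)


def high_risk(csv_text, minimum=4):
    """Return (file, line, name) for each hit with Level >= minimum."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["File"], int(row["Line"]), row["Name"])
            for row in reader if int(row["Level"]) >= minimum]
```

Using the csv module rather than splitting on commas matters here, because the "Context" field is raw source code and frequently contains commas and quotes.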
596
597
598 flawfinder -QD mydir
599 Examine mydir and report only the actual results (removing
600 the header and footer of the output). This form may be
601 useful if the output will be piped into other tools for
602 further analysis, though CSV format is probably the better
603 choice in that case. The -C (--columns) and -S (--single‐
604 line) options can also be useful if you're piping the data
605 into other tools.
606
607
608 flawfinder -QDSC mydir
609 Examine mydir, reporting only the actual results (no header
610 or footer). Each hit is reported on one line, and column
611 numbers are reported. This can be a useful command if you
612 are feeding flawfinder output to other tools.
613
614
615 flawfinder --quiet --html --context mydir > results.html
616 Examine all the C/C++ files in the directory mydir and its
617 subdirectories, and produce an HTML formatted version of
618 the results. Source code management systems (such as
619 SourceForge and Savannah) might use a command like this.
620
621
622 flawfinder --quiet --savehitlist saved.hits *.[ch]
623 Examine all .c and .h files in the current directory.
624 Don't report on the status of processing, and save the
625 resulting hitlist (the set of all hits) in the file
626 saved.hits.
627
628
629 flawfinder --diffhitlist saved.hits *.[ch]
630 Examine all .c and .h files in the current directory, and
631 show any hits that weren't already in the file saved.hits.
632 This can be used to show only the ``new'' vulnerabilities
633 in a modified program, if saved.hits was created from the
634 older version of the program being analyzed.
635
636
637 flawfinder --patch recent.patch .
638 Examine the current directory recursively, but only report
639 lines that were changed or added in the already-applied
640 patchfile named recent.patch.
641
642
643 flawfinder --regex "CWE-120|CWE-126" src/
644 Examine directory src recursively, but only report hits
645 where CWE-120 or CWE-126 apply.
646
647
648 Invoking from vim
649 The text editor vim includes a "quickfix" mechanism that works well
650 with flawfinder, so that you can easily view the warning messages and
651 jump to the relevant source code.
652
653 First, you need to invoke flawfinder to create a list of hits, and
654 there are two ways to do this. The first way is to start flawfinder
655 first, and then (using its output) invoke vim. The second way is to
656 start (or continue to run) vim, and then invoke flawfinder (typically
657 from inside vim).
658
659 For the first way, run flawfinder and store its output in some FLAWFILE
660 (say "flawfile"), then invoke vim using its -q option, like this: "vim
-q flawfile". The second way (starting flawfinder after starting vim)
can be done in many ways. One is to invoke flawfinder using a
663 shell command, ":!flawfinder-command > FLAWFILE", then follow that with
664 the command ":cf FLAWFILE". Another way is to store the flawfinder
665 command in your makefile (as, say, a pseudocommand like "flaw"), and
666 then run ":make flaw".
667
668 In all these cases you need a command for flawfinder to run. A plausi‐
669 ble command, which places each hit in its own line (-S) and removes
670 headers and footers that would confuse it, is:
671
672 flawfinder -SQD .
673
674
675 You can now use various editing commands to view the results. The com‐
676 mand ":cn" displays the next hit; ":cN" displays the previous hit, and
677 ":cr" rewinds back to the first hit. ":copen" will open a window to
678 show the current list of hits, called the "quickfix window"; ":cclose"
679 will close the quickfix window. If the buffer in the used window has
680 changed, and the error is in another file, jumping to the error will
681 fail. You have to make sure the window contains a buffer which can be
682 abandoned before trying to jump to a new file, say by saving the file;
683 this prevents accidental data loss.
684
685
686 Invoking from emacs
687 The text editor / operating system emacs includes "grep mode" and "com‐
688 pile mode" mechanisms that work well with flawfinder, making it easy to
689 view warning messages, jump to the relevant source code, and fix any
690 problems you find.
691
692 First, you need to invoke flawfinder to create a list of warning mes‐
693 sages. You can use "grep mode" or "compile mode" to create this list.
694 Often "grep mode" is more convenient; it leaves compile mode untouched
695 so you can easily recompile once you've changed something. However, if
696 you want to jump to the exact column position of a hit, compile mode
697 may be more convenient because emacs can use the column output of
698 flawfinder to directly jump to the right location without any special
699 configuration.
700
701 To use grep mode, enter the command "M-x grep" and then enter the
702 needed flawfinder command. To use compile mode, enter the command "M-x
703 compile" and enter the needed flawfinder command. This is a meta-key
704 command, so you'll need to use the meta key for your keyboard (this is
705 usually the ESC key). As with all emacs commands, you'll need to press
706 RETURN after typing "grep" or "compile". So on many systems, the grep
707 mode is invoked by typing ESC x g r e p RETURN.
708
709 You then need to enter a command, removing whatever was there before if
710 necessary. A plausible command is:
711
712 flawfinder -SQDC .
713
714 This command makes every hit report a single line, which is much easier
715 for tools to handle. The quiet and dataonly options remove the other
716 status information not needed for use inside emacs. The trailing
717 period means that the current directory and all descendants are
718 searched for C/C++ code, and analyzed for flaws.
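
If you feed the single-line output to your own scripts rather than to an editor, the hit prefix can be split with a regular expression. The following is a minimal sketch, not part of flawfinder: parse_hit is a hypothetical helper, and it assumes each hit starts with "filename:linenumber:", then an optional "columnnumber:", then the risk level in square brackets. The sample text in the usage note is illustrative only.

```python
import re

# Assumed shape of a single-line hit: "filename:line:" then an optional
# "column:", then the risk level (0 to 5) in square brackets.
HIT_PREFIX = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?:(?P<column>\d+):)?\s*\[(?P<risk>[0-5])\]"
)


def parse_hit(line):
    """Return (filename, line, column, risk) for one hit line, or None."""
    match = HIT_PREFIX.match(line)
    if match is None:
        return None
    column = match.group("column")
    return (
        match.group("file"),
        int(match.group("line")),
        int(column) if column is not None else None,  # None when -C was not used
        int(match.group("risk")),
    )
```

For example, parse_hit("src/main.c:42:7:  [4] example text") yields the filename, line 42, column 7 and risk level 4. Filenames that themselves contain colons cannot be split reliably this way; the CSV output format is a better choice in that case.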
719
720 Once you've invoked flawfinder, you can use emacs to jump around in its
721 results. The command C-x ` (Control-x backtick) visits the source code
722 location for the next warning message. C-u C-x ` (control-u control-x
723 backtick) restarts from the beginning. You can visit the source for
724 any particular error message by moving to that hit message in the *com‐
725 pilation* buffer or *grep* buffer and typing the return key. (Techni‐
726 cal note: in the compilation buffer, this invokes compile-goto-error.)
727 You can also click the Mouse-2 button on the error message (you don't
728 need to switch to the *compilation* buffer first).
729
730 If you want to use grep mode to jump to specific columns of a hit,
731 you'll need to specially configure emacs to do this. To do this, mod‐
732 ify the emacs variable "grep-regexp-alist". This variable tells Emacs
733 how to parse output of a "grep" command, similar to the variable "com‐
734 pilation-error-regexp-alist" which lists various formats of compilation
735 error messages.
736
737
738 Invoking from Integrated Development Environments (IDEs)
739 For (other) IDEs, consult your IDE's set of plug-ins.
740
741
743 The Common Weakness Enumeration (CWE) is ``a formal list or dictionary
744 of common software weaknesses that can occur in software's architec‐
745 ture, design, code or implementation that can lead to exploitable secu‐
746 rity vulnerabilities... created to serve as a common language for
747 describing software security weaknesses''
748 (https://cwe.mitre.org/about/faq.html). For more information on CWEs,
749 see https://cwe.mitre.org.
750
751 Flawfinder supports the CWE and is officially CWE-Compatible. Hit
752 descriptions typically include a relevant Common Weakness Enumeration
753 (CWE) identifier in parentheses where there is known to be a relevant
754 CWE. For example, many of the buffer-related hits mention CWE-120, the
755 CWE identifier for ``buffer copy without checking size of input'' (aka
756 ``Classic Buffer Overflow''). In a few cases more than one CWE identi‐
757 fier may be listed. The HTML report also includes hypertext links to
758 the CWE definitions hosted at MITRE. In this way, flawfinder is
759 designed to meet the CWE-Output requirement.
760
761 In some cases there are CWE mapping and usage challenges; here is how
762 flawfinder handles them. If the same entry maps to multiple CWEs
763 simultaneously, all the CWE mappings are listed as separated by commas.
764 This often occurs with CWE-20, Improper Input Validation; thus the
765 report "CWE-676, CWE-120" maps to two CWEs. In addition, flawfinder
766 provides additional information for those who are interested in the
767 CWE/SANS top 25 list 2011 (https://cwe.mitre.org/top25/) when mappings
768 are not directly to them. Many people will want to search for specific
769 CWEs in this top 25 list, such as CWE-120 (classic buffer overflow).
770 The challenge is that some flawfinder hits map to a more general CWE
771 that would include a top 25 item, while in some other cases hits map to
772 a more specific vulnerability that is only a subset of a top 25 item.
773 To resolve this, in some cases flawfinder will list a sequence of CWEs
774 in the format "more-general/more-specific", where the CWE actually
775 being mapped is followed by a "!". This is always done whenever a flaw
776 is not mapped directly to a top 25 CWE, but the mapping is related to
777 such a CWE. So "CWE-119!/CWE-120" means that the vulnerability is
778 mapped to CWE-119 and that CWE-120 is a subset of CWE-119. In con‐
779 trast, "CWE-362/CWE-367!" means that the hit is mapped to CWE-367, a
780 subset of CWE-362. Note that this is a subtle syntax change from
781 flawfinder version 1.31; in flawfinder version 1.31, the form "more-
782 general:more-specific" meant what is now listed as "more-general!/more-
783 specific", while "more-general/more-specific" meant "more-general/more-
784 specific!". Tools can handle both the version 1.31 and the current
785 format, if they wish, by noting that the older format did not use "!"
786 at all (and thus this is easy to distinguish). These mapping mecha‐
787 nisms simplify searching for certain CWEs.
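
A tool that consumes these CWE fields could separate the mapped identifiers from the related ones as sketched below. This is only an illustration of the notation described above, not code from flawfinder; parse_cwe_entry and parse_cwe_field are hypothetical names. It also accepts the version 1.31 forms, relying on the fact that they never contain "!".

```python
def parse_cwe_entry(entry):
    """Return (mapped, related) CWE lists for one entry such as "CWE-119!/CWE-120"."""
    if ":" in entry:
        # Version 1.31 "general:specific" means "general!/specific".
        general, specific = entry.split(":", 1)
        return [general], [specific]
    parts = entry.split("/")
    if len(parts) == 1:
        # A lone identifier is always the mapped CWE.
        return [entry.rstrip("!")], []
    if "!" not in entry:
        # Version 1.31 "general/specific" means "general/specific!".
        return [parts[-1]], parts[:-1]
    mapped = [p.rstrip("!") for p in parts if p.endswith("!")]
    related = [p for p in parts if not p.endswith("!")]
    return mapped, related


def parse_cwe_field(field):
    """Parse a comma-separated CWE field into (mapped, related) lists."""
    mapped, related = [], []
    for entry in field.split(","):
        entry = entry.strip()
        if entry:
            m, r = parse_cwe_entry(entry)
            mapped += m
            related += r
    return mapped, related
```

With this sketch, "CWE-119!/CWE-120" gives CWE-119 as mapped and CWE-120 as related, while "CWE-362/CWE-367!" gives CWE-367 as mapped and CWE-362 as related.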
788
789 CWE version 2.7 (released June 23, 2014) was used for the mapping. The
790 current CWE mappings select the most specific CWE the tool can deter‐
791 mine. In theory, most CWE security elements (signatures/patterns that
792 the tool searches for) could be mapped to CWE-676 (Use of
793 Potentially Dangerous Function), but such a mapping would not be use‐
794 ful. Thus, more specific mappings were preferred where one could be
795 found. Flawfinder is a lexical analysis tool; as a result, it is
796 impractical for it to be more specific than the mappings currently
797 implemented. This also means that it is unlikely to need much updating
798 for map currency; it simply doesn't have enough information to refine
799 to a detailed CWE level that CWE changes would typically affect. The
800 list of CWE identifiers was generated automatically using "make show-
801 cwes", so there is confidence that this list is correct. Please report
802 CWE mapping problems as bugs if you find any.
803
804 Flawfinder may fail to find a vulnerability, even if flawfinder covers
805 one of these CWE weaknesses. That said, flawfinder does find vulnera‐
806 bilities listed by the CWEs it covers, and it will not report lines
807 without those vulnerabilities in many cases. Thus, as required for any
808 tool intending to be CWE compatible, flawfinder has a rate of false
809 positives less than 100% and a rate of false negatives less than 100%.
810 Flawfinder almost always reports whenever it finds a match to a CWE
811 security element (a signature/pattern as defined in its database),
812 though certain obscure constructs can cause it to fail (see BUGS
813 below).
814
815 Flawfinder can report on the following CWEs (these are the CWEs that
816 flawfinder covers; ``*'' marks those in the CWE/SANS top 25 list):
817
818 · CWE-20: Improper Input Validation
819
820 · CWE-22: Improper Limitation of a Pathname to a Restricted Directory
821 (``Path Traversal'')
822
823 · CWE-78: Improper Neutralization of Special Elements used in an OS
824 Command (``OS Command Injection'')*
825
826 · CWE-119: Improper Restriction of Operations within the Bounds of a
827 Memory Buffer (a parent of CWE-120*, so this is shown as
828 CWE-119!/CWE-120)
829
830 · CWE-120: Buffer Copy without Checking Size of Input (``Classic Buffer
831 Overflow'')*
832
833 · CWE-126: Buffer Over-read
834
835 · CWE-134: Uncontrolled Format String*
836
837 · CWE-190: Integer Overflow or Wraparound*
838
839 · CWE-250: Execution with Unnecessary Privileges
840
841 · CWE-327: Use of a Broken or Risky Cryptographic Algorithm*
842
843 · CWE-362: Concurrent Execution using Shared Resource with Improper
844 Synchronization (``Race Condition'')
845
846 · CWE-377: Insecure Temporary File
847
848 · CWE-676: Use of Potentially Dangerous Function*
849
850 · CWE-732: Incorrect Permission Assignment for Critical Resource*
851
852 · CWE-785: Use of Path Manipulation Function without Maximum-sized Buf‐
853 fer (child of CWE-120*, so this is shown as CWE-120/CWE-785!)
854
855 · CWE-807: Reliance on Untrusted Inputs in a Security Decision*
856
857 · CWE-829: Inclusion of Functionality from Untrusted Control Sphere*
858
859 You can select a specific subset of CWEs to report by using the
860 ``--regex'' (-e) option. This option accepts a regular expression, so
861 you can select multiple CWEs, e.g., ``--regex "CWE-120|CWE-126"''. If
862 you select multiple CWEs with ``|'' on a command line you will typi‐
863 cally need to quote the parameters (since an unquoted ``|'' is the pipe
864 symbol). Flawfinder is designed to meet the CWE-Searchable require‐
865 ment.
866
867 If your goal is to report a subset of CWEs that are listed in a file,
868 that can be achieved on a Unix-like system using the ``--regex'' aka
869 ``-e'' option. The file must be in regular expression format. For
870 example, ``flawfinder -e $(cat file1)'' would report only hits that
871 matched the pattern in ``file1''. If file1 contained
872 ``CWE-120|CWE-126'' it would only report hits matching those CWEs.
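
If the file holds one CWE identifier per line rather than a ready-made pattern, the pattern can be assembled first. The sketch below is a minimal illustration under that assumption; build_cwe_pattern is a hypothetical helper, not part of flawfinder. It also assumes the pattern is applied as an unanchored regular expression search, so it appends \b to keep "CWE-12" from also matching "CWE-120".

```python
import re

CWE_ID = re.compile(r"CWE-\d+")


def build_cwe_pattern(ids):
    """Join CWE identifiers (any iterable of strings) into one alternation."""
    parts = []
    for raw in ids:
        cwe = raw.strip()
        if not cwe:
            continue  # ignore blank lines
        if CWE_ID.fullmatch(cwe) is None:
            raise ValueError("not a CWE identifier: %r" % cwe)
        # \b stops a shorter identifier from matching a longer one.
        parts.append(cwe + r"\b")
    return "|".join(parts)
```

An open file can be passed directly, for example build_cwe_pattern(open("file1")). The resulting string is what you would give to ``--regex''; remember to quote it on the command line.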
873
874 A list of all CWE security elements (the signatures/patterns that
875 flawfinder looks for) can be found by using the ``--listrules'' option.
876 Each line lists the signature token (typically a function name) that
877 may lead to a hit, the default risk level, and the default warning
878 (which includes the default CWE identifier). For most purposes this is
879 also enough if you want to see what CWE security elements map to which
880 CWEs, or the reverse. For example, to see most of the signatures
881 (function names) that map to CWE-327, without seeing the default risk
882 level or detailed warning text, run ``flawfinder --listrules | grep
883 CWE-327 | cut -f1''. You can also see the tokens without a CWE mapping
884 this way by running ``flawfinder -D --listrules | grep -v CWE-''. How‐
885 ever, while --listrules lists all CWE security elements, it only lists
886 the default mappings from CWE security elements to CWE identifiers. It
887 does not include the refinements that flawfinder applies (e.g., by
888 examining function parameters).
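
The same grouping can be done for every CWE at once in a short script. The sketch below is an illustration only; tokens_by_cwe is a hypothetical helper, and it assumes each --listrules line has three tab-separated fields (token, default risk level, default warning), as the use of ``cut -f1'' above suggests.

```python
import re
from collections import defaultdict


def tokens_by_cwe(lines):
    """Map each CWE identifier to the tokens whose default warning mentions it."""
    index = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip anything not in token/level/warning form
        token, warning = fields[0], fields[2]
        for cwe in sorted(set(re.findall(r"CWE-\d+", warning))):
            index[cwe].append(token)
    return dict(index)
```

Like --listrules itself, this reflects only the default mappings, not the refinements flawfinder applies when it examines function parameters.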
889
890 If you want a detailed and exact mapping between the CWE security ele‐
891 ments and CWE identifiers, the flawfinder source code (included in the
892 distribution) is the best place for that information. This detailed
893 information is primarily of interest to those few people who are trying
894 to refine the CWE mappings of flawfinder or refine CWE in general. The
895 source code documents the mapping from the security elements to the
896 respective CWE identifiers, and is a single Python file. The
897 ``c_rules'' dataset defines most rules, with reference to a function
898 that may make further refinements. You can search the dataset for
899 function names to see what CWE it generates by default; if the first param‐
900 eter is not ``normal'' then that is the name of a refining Python
901 method that may select different CWEs (depending on additional informa‐
902 tion). Conversely, you can search for ``CWE-number'' and find what
903 security elements (signatures or patterns) refer to that CWE identi‐
904 fier. For most people, this is much more than they need; most people
905 just want to scan their source code to quickly find problems.
906
907
908
910 The whole point of this tool is to help find vulnerabilities so they
911 can be fixed. However, developers and reviewers must know how to
912 develop secure software to use this tool, because otherwise, a fool
913 with a tool is still a fool. My book at https://dwheeler.com/secure-
914 programs may help.
915
916 This tool should be, at most, a small part of a larger software devel‐
917 opment process designed to eliminate or reduce the impact of vulnera‐
918 bilities. Developers and reviewers need to know how to develop secure
919 software, and they need to apply this knowledge to reduce the risks of
920 vulnerabilities in the first place.
921
922 Different vulnerability-finding tools tend to find different vulnera‐
923 bilities. Thus, you are best off using human review and a variety of
924 tools. This tool can help find some vulnerabilities, but by no means
925 all.
926
927 You should always analyze a copy of the source program being analyzed,
928 not a directory that can be modified by a developer while flawfinder is
929 performing the analysis. This is especially true if you don't neces‐
930 sarily trust a developer of the program being analyzed. If an attacker
931 has control over the files while you're analyzing them, the attacker
932 could move files around or change their contents to prevent the expo‐
933 sure of a security problem (or create the impression of a problem where
934 there is none). If you're worried about malicious programmers you
935 should do this anyway, because after analysis you'll need to verify
936 that the code eventually run is the code you analyzed. Also, do not
937 use the --allowlink option in such cases; attackers could create mali‐
938 cious symbolic links to files outside of their source code area (such
939 as /etc/passwd).
940
941 Source code management systems (like GitHub, SourceForge, and Savannah)
942 definitely fall into this category; if you're maintaining one of those
943 systems, first copy or extract the files into a separate directory
944 (that can't be controlled by attackers) before running flawfinder or
945 any other code analysis tool.
946
947 Note that flawfinder only opens regular files, directories, and (if
948 requested) symbolic links; it will never open other kinds of files,
949 even if a symbolic link is made to them. This counters attackers who
950 insert unusual file types into the source code. However, this only
951 works if the filesystem being analyzed can't be modified by an attacker
952 during the analysis, as recommended above. This protection also
953 doesn't work on Cygwin platforms, unfortunately.
954
955 Cygwin systems (Unix emulation on top of Windows) have an additional
956 problem if flawfinder is used to analyze programs that the analyst can‐
957 not trust. The problem is due to a design flaw in Windows (that it
958 inherits from MS-DOS). On Windows and MS-DOS, certain filenames (e.g.,
959 ``com1'') are automatically treated by the operating system as the
960 names of peripherals, and this is true even when a full pathname is
961 given. Yes, Windows and MS-DOS really are designed this badly.
962 Flawfinder deals with this by checking what a filesystem object is, and
963 then only opening directories and regular files (and symlinks if
964 enabled). Unfortunately, this doesn't work on Cygwin; on at least some
965 versions of Cygwin on some versions of Windows, merely trying to deter‐
966 mine if a file is a device type can cause the program to hang. A work‐
967 around is to delete or rename any filenames that are interpreted as
968 device names before performing the analysis. These so-called
969 ``reserved names'' are CON, PRN, AUX, CLOCK$, NUL, COM1-COM9, and
970 LPT1-LPT9, optionally followed by an extension (e.g., ``com1.txt''), in
971 any directory, and in any case (Windows is case-insensitive).
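
The workaround can be prepared with a script that lists the offending names. The following is a minimal sketch, not part of flawfinder; is_reserved_name and find_reserved_names are hypothetical helpers that encode only the list of reserved names given above. It is assumed to be run on a system where walking the tree is safe (for example, on the copy of the sources before it is moved to the Cygwin machine).

```python
import os
import re

# Reserved device names listed above, optionally followed by an extension;
# matched case-insensitively because Windows ignores case.
RESERVED = re.compile(
    r"(CON|PRN|AUX|CLOCK\$|NUL|COM[1-9]|LPT[1-9])(\..*)?", re.IGNORECASE
)


def is_reserved_name(filename):
    """True if a bare filename (no directory part) is a reserved device name."""
    return RESERVED.fullmatch(filename) is not None


def find_reserved_names(root):
    """Return paths under root whose final component is a reserved name."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_reserved_name(name):
                found.append(os.path.join(dirpath, name))
    return found
```

Each path returned by find_reserved_names(".") is a candidate for deletion or renaming before the analysis.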
972
973 Do not load or diff hitlists from untrusted sources. They are imple‐
974 mented using the Python pickle module, and the pickle module is not
975 intended to be secure against erroneous or maliciously constructed
976 data. Stored hitlists are intended for later use by the same user who
977 created the hitlist; in that context this restriction is not a problem.
978
979
981 Flawfinder is based on simple text pattern matching, which is part of
982 its fundamental design and not easily changed. This design approach
983 leads to a number of fundamental limitations, e.g., a higher false pos‐
984 itive rate, and is the underlying cause of most of the bugs listed
985 here. On the positive side, flawfinder doesn't get confused by many
986 complicated preprocessor sequences that other tools sometimes choke on;
987 flawfinder can often handle code that cannot link, and sometimes cannot
988 even compile or build.
989
990 Flawfinder is currently limited to C/C++. In addition, when analyzing
991 C++ it focuses primarily on the C subset of C++. For example,
992 flawfinder does not report on expressions like cin >> charbuf, where
993 charbuf is a char array. That is because flawfinder doesn't have type
994 information, and ">>" is safe with many other types; reporting on all
995 ">>" would lead to too many false positives. That said, it's designed
996 so that adding support for other languages should be easy where its
997 text-based approach can usefully apply.
998
999 Flawfinder can be fooled by user-defined functions or method names that
1000 happen to be the same as those defined as ``hits'' in its database, and
1001 will often trigger on definitions (as well as uses) of functions with
1002 the same name. This is typically not a problem for C code. In C code,
1003 a function with the same name as a common library routine name often
1004 indicates that the developer is simply rewriting a common library rou‐
1005 tine with the same interface, say for portability's sake. C programs
1006 tend to avoid reusing the same name for a different purpose (since in C
1007 function names are global by default). There are reasonable odds that
1008 these rewritten routines will be vulnerable to the same kinds of mis‐
1009 use, and thus, reusing these rules is a reasonable approach. However,
1010 this can be a much more serious problem in C++ code which heavily uses
1011 classes and namespaces, since the same method name may have many dif‐
1012 ferent meanings. The --falsepositive option can help somewhat in this
1013 case. If this is a serious problem, feel free to modify the program,
1014 or process the flawfinder output through other tools to remove the
1015 false positives.
1016
1017 Preprocessor commands embedded in the middle of a parameter list of a
1018 call can cause problems in parsing; in particular, if a string is
1019 opened and then closed multiple times using an #ifdef .. #else con‐
1020 struct, flawfinder gets confused. Such constructs are bad style, and
1021 will confuse many other tools too. If you must analyze such files, re‐
1022 write those lines. Thankfully, these are quite rare.
1023
1024 Flawfinder reports vulnerabilities regardless of the parameters of
1025 "#if" or "#ifdef". A construct "#if VALUE" will often have VALUE of 0
1026 in some cases, and non-zero in others. Similarly, "#ifdef VALUE" will
1027 have VALUE defined in some cases, and not defined in others.
1028 Flawfinder reports in all cases, which means that flawfinder has a
1029 chance of reporting vulnerabilities in all alternatives. This is not a
1030 bug; this is intended behavior.
1031
1032 Flawfinder will report hits even if they are between a literal "#if 0"
1033 and "#endif". It would be possible to change this particular situa‐
1034 tion, but directly using "#if 0" to comment-out code (other than during
1035 debugging) is itself a sign that the removal is very temporary (in which case
1036 we should report it) or an indicator of a problem with poor code prac‐
1037 tices. If you want to permanently get rid of code, then delete it
1038 instead of using "#if 0", since you can always see what it was using
1039 your version control software. If you don't use version control soft‐
1040 ware, then that's the bug you need to fix right now.
1041
1042 Some complex or unusual constructs can mislead flawfinder. In particu‐
1043 lar, if a parameter begins with gettext(" and ends with ), flawfinder
1044 will presume that the parameter of gettext is a constant. This means
1045 it will get confused by patterns like gettext("hi") + function("bye").
1046 In practice, this doesn't seem to be a problem; gettext() is usually
1047 wrapped around the entire parameter.
1048
1049 The routine to detect statically defined character arrays uses simple
1050 text matching; some complicated expressions can cause it to trigger or
1051 not trigger unexpectedly.
1052
1053 Flawfinder looks for specific patterns known to be common mistakes.
1054 Flawfinder (or any tool like it) is not a good tool for finding inten‐
1055 tionally malicious code (e.g., Trojan horses); malicious programmers
1056 can easily insert code that would not be detected by this kind of tool.
1057
1058 Flawfinder looks for specific patterns known to be common mistakes in
1059 application code. Thus, it is likely to be less effective analyzing
1060 programs that aren't application-layer code (e.g., kernel code or self-
1061 hosting code). The techniques may still be useful; feel free to
1062 replace the database if your situation is significantly different from
1063 normal.
1064
1065 Flawfinder's default output format (filename:linenumber, followed
1066 optionally by a :columnnumber) can be misunderstood if any source files
1067 have very weird filenames. Filenames embedding a newline/linefeed
1068 character will cause odd breaks, and filenames including colon (:) are
1069 likely to be misunderstood. This is especially important if
1070 flawfinder's output is being used by other tools, such as filters or
1071 text editors. If you are using flawfinder's output in other tools,
1072 consider using its CSV format instead (which can handle this). If
1073 you're looking at new code, examine the files for such characters.
1074 It's incredibly unwise to have such filenames anyway; many tools can't
1075 handle such filenames at all. Newline and linefeed are often used as
1076 internal data delimiters. The colon is often used as a special charac‐
1077 ter in filesystems: MacOS uses it as a directory separator, Win‐
1078 dows/MS-DOS uses it to identify drive letters, Windows/MS-DOS inconsis‐
1079 tently uses it to identify special devices like CON:, and applications
1080 on many platforms use the colon to identify URIs/URLs. Filenames
1081 including spaces and/or tabs don't cause problems for flawfinder,
1082 though note that other tools might have problems with them.
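
Examining a source tree for such characters is easy to script. The sketch below is one possible approach, not part of flawfinder; has_risky_chars and risky_paths are hypothetical helpers that flag names containing a colon, newline, or carriage return.

```python
import os

# Characters that can make "filename:linenumber" output ambiguous.
RISKY_CHARS = ":\n\r"


def has_risky_chars(name):
    """True if a file or directory name contains a colon or a line break."""
    return any(ch in name for ch in RISKY_CHARS)


def risky_paths(root):
    """Yield paths under root whose final component has a risky character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_risky_chars(name):
                yield os.path.join(dirpath, name)
```

Calling list(risky_paths(".")) before an analysis shows which names to rename, or tells you that the CSV output is the safer format for that tree.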
1083
1084 Flawfinder is not internationalized, so it currently does not support
1085 localization.
1086
1087 In general, flawfinder attempts to err on the side of caution; it tends
1088 to report hits, so that they can be examined further, instead of
1089 silently ignoring them. Thus, flawfinder prefers to have false posi‐
1090 tives (reports that turn out to not be problems) rather than false neg‐
1091 atives (failures to report security vulnerabilities). But this is a
1092 generality; flawfinder uses simplistic heuristics and simply can't get
1093 everything "right".
1094
1095 Security vulnerabilities might not be identified as such by flawfinder,
1096 and conversely, some hits aren't really security vulnerabilities. This
1097 is true for all static security scanners, and is especially true for
1098 tools like flawfinder that use a simple lexical analysis and pattern
1099 analysis to identify potential vulnerabilities. Still, it can serve as
1100 a useful aid for humans, helping to identify useful places to examine
1101 further, and that's the point of this simple tool.
1102
1103
1105 See the flawfinder website at https://dwheeler.com/flawfinder. You
1106 should also see the Secure Programming HOWTO at
1107 https://dwheeler.com/secure-programs.
1108
1109
1111 David A. Wheeler (dwheeler@dwheeler.com).
1112
1113
1114
1115Flawfinder 4 Apr 2018 FLAWFINDER(1)