goaccess(1)                      User Manuals                      goaccess(1)

NAME
goaccess - fast web log analyzer and interactive viewer.

SYNOPSIS
goaccess [filename] [options...] [-c][-M][-H][-q][-d][...]

DESCRIPTION
GoAccess is an open source real-time web log analyzer and
interactive viewer that runs in a terminal in *nix systems or through
your browser.
15
16 It provides fast and valuable HTTP statistics for system administrators
17 that require a visual server report on the fly.
18
19 GoAccess parses the specified web log file and outputs the data to the
20 X terminal. Features include:
21
22
23 General Statistics:
This panel gives a summary of several metrics, such as the
number of valid and invalid requests, time taken to analyze the
dataset, unique visitors, requested files, static files (CSS,
ICO, JPG, etc.), HTTP referrers, 404s, size of the parsed log file
and bandwidth consumption.
29
30 Unique visitors
31 This panel shows metrics such as hits, unique visitors and cumu‐
32 lative bandwidth per date. HTTP requests containing the same IP,
33 the same date, and the same user agent are considered a unique
34 visitor. By default, it includes web crawlers/spiders.
35
36 Optionally, date specificity can be set to the hour level using
37 --date-spec=hr which will display dates such as 05/Jun/2016:16.
38 This is great if you want to track your daily traffic at the
39 hour level.
40
41 Requested files
42 This panel displays the most requested (non-static) files on
43 your web server. It shows hits, unique visitors, and percent‐
44 age, along with the cumulative bandwidth, protocol, and the
45 request method used.
46
47 Requested static files
Lists the most frequently requested static files, such as JPG,
CSS, SWF, JS, GIF, and PNG file types, along with the same
metrics as the previous panel. Additional static files can be
added to the configuration file.
52
53 404 or Not Found
Displays the same metrics as the previous request panels;
however, its data contains all pages that were not found on the
server, commonly known as the 404 status code.
57
58 Hosts This panel has detailed information on the hosts themselves.
59 This is great for spotting aggressive crawlers and identifying
60 who's eating your bandwidth.
61
62 Expanding the panel can display more information such as host's
63 reverse DNS lookup result, country of origin and city. If the -a
64 argument is enabled, a list of user agents can be displayed by
65 selecting the desired IP address, and then pressing ENTER.
66
67 Operating Systems
68 This panel will report which operating system the host used when
69 it hit the server. It attempts to provide the most specific ver‐
70 sion of each operating system.
71
72 Browsers
73 This panel will report which browser the host used when it hit
74 the server. It attempts to provide the most specific version of
75 each browser.
76
77 Visit Times
78 This panel will display an hourly report. This option displays
79 24 data points, one for each hour of the day.
80
Optionally, hour specificity can be set to the tenth of an hour
level using --hour-spec=min, which will display hours as 16:4.
This is great if you want to spot peaks of traffic on your
server.
85
86 Virtual Hosts
87 This panel will display all the different virtual hosts parsed
88 from the access log. This panel is displayed if %v is used
89 within the log-format string.
90
91 Referrers URLs
If the host in question accessed the site via another resource,
or was linked/diverted to you from another host, the URL they
were referred from will be provided in this panel. Disabled by
default. See `--ignore-panel` in your configuration file to
enable it.
97
98 Referring Sites
This panel displays only the host part of the URL where the
request came from, not the whole URL.
101
102 Keyphrases
It reports keyphrases used on Google search, Google cache, and
Google translate that have led to your web server. At present,
it only supports Google search queries via HTTP. Disabled by
default. See `--ignore-panel` in your configuration file to
enable it.
108
109 Geo Location
110 Determines where an IP address is geographically located. Sta‐
111 tistics are broken down by continent and country. It needs to be
112 compiled with GeoLocation support.
113
114 HTTP Status Codes
115 The values of the numeric status code to HTTP requests.
116
117 Remote User (HTTP authentication)
This is the userid of the person requesting the document as
determined by HTTP authentication. If the document is not
password protected, this field will be "-". This panel is not
enabled unless %e is given within the log-format variable.
123
124 Cache Status
125 If you are using caching on your server, you may be at the point
126 where you want to know if your request is being cached and
127 served from the cache. This panel shows the cache status of the
128 object the server served. This panel is not enabled unless %C is
129 given within the log-format variable. The status can be either
130 `MISS`, `BYPASS`, `EXPIRED`, `STALE`, `UPDATING`, `REVALIDATED`
131 or `HIT`
132
133 MIME Types
134 This panel specifies Media Types (formerly known as MIME types)
135 and Media Subtypes which will be assigned and listed underneath.
136 This panel is not enabled unless %M is given within the log-for‐
137 mat variable. See https://www.iana.org/assignments/media-
138 types/media-types.xhtml for more details.
139
140 Encryption Settings
This panel shows the SSL/TLS protocol used along with the cipher
suites. This panel is not enabled unless %K is given within the
log-format variable.
144
145
146 NOTE: Optionally and if configured, all panels can display the average
147 time taken to serve the request.
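
The unique-visitor rule described in the Unique visitors panel above (same IP, same date, same user agent) can be approximated outside GoAccess with standard tools. The sketch below is an illustration only and is not how GoAccess itself is implemented; the sample log lines and variable names are made up for this example, and it assumes the Combined Log Format.

```shell
# Hypothetical sample data: three requests in Combined Log Format.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - [05/Jun/2016:16:01:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
192.0.2.1 - - [05/Jun/2016:16:05:09 +0000] "GET /a HTTP/1.1" 200 128 "-" "Mozilla/5.0"
192.0.2.1 - - [05/Jun/2016:17:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
EOF

# Build one key per request (IP|date|user agent), then count distinct keys.
unique_visitors=$(awk -F'"' '{
    split($1, a, " ")              # a[1] = IP, a[4] = "[05/Jun/2016:16:01:02"
    date = substr(a[4], 2, 11)     # keep only "05/Jun/2016"
    print a[1] "|" date "|" $6     # $6 = user agent when splitting on double quotes
}' "$log" | sort -u | wc -l | tr -d ' ')

echo "unique visitors: $unique_visitors"
rm -f "$log"
```

The first two requests share IP, date, and user agent, so they collapse into one visitor; the third differs only in its user agent and counts separately.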
148
149
151 There are three storage options that can be used with GoAccess. Choos‐
152 ing one will depend on your environment and needs.
153
154 Default Hash Tables
155 In-memory storage provides better performance at the cost of
156 limiting the dataset size to the amount of available physical
157 memory. GoAccess uses in-memory hash tables. It has very good
158 memory usage and pretty good performance. This storage has sup‐
159 port for on-disk persistence.
160
162 Multiple options can be used to configure GoAccess. For a complete up-
163 to-date list of configure options, run ./configure --help
164
165 --enable-debug
166 Compile with debugging symbols and turn off compiler optimiza‐
167 tions.
168
169 --enable-utf8
170 Compile with wide character support. Ncursesw is required.
171
172 --enable-geoip=<legacy|mmdb>
173 Compile with GeoLocation support. MaxMind's GeoIP is required.
174 legacy will utilize the original GeoIP databases. mmdb will
175 utilize the enhanced GeoIP2 databases.
176
177 --with-getline
178 Dynamically expands line buffer in order to parse full line
179 requests instead of using a fixed size buffer of 4096.
180
181 --with-openssl
182 Compile GoAccess with OpenSSL support for its WebSocket server.
183
185 The following options can be supplied to the command or specified in
186 the configuration file. If specified in the configuration file, long
187 options need to be used without prepending -- and without using the
188 equal sign =.
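
As a rough illustration of the rule above, the following sketch writes a small configuration file using the config-file spelling of three options that appear in this manual. The temporary file path and the file names access.log and report.html are assumptions for the example; the GoAccess call only runs if the program and the log file are present.

```shell
# Hypothetical config file location; pass it to GoAccess with -p.
conf=$(mktemp)

# Command-line form:  --log-format=COMBINED --date-spec=hr --exclude-ip=127.0.0.1
# Config-file form:   option name, a space, then the value (no "--", no "=").
cat > "$conf" <<'EOF'
log-format COMBINED
date-spec hr
exclude-ip 127.0.0.1
EOF

# Only attempt a run when goaccess and a log file are actually available.
if command -v goaccess >/dev/null 2>&1 && [ -f access.log ]; then
    goaccess access.log -p "$conf" -o report.html
fi
```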
189
190 LOG/DATE/TIME FORMAT
191 --time-format=<timeformat>
192 The time-format variable followed by a space, specifies the log
193 format time containing either a name of a predefined format (see
194 options below) or any combination of regular characters and spe‐
195 cial format specifiers.
196
They all begin with a percentage (%) sign. See `man strftime`.
e.g., %T or %H:%M:%S.
199
200 Note that if a timestamp is given in microseconds, %f must be
201 used as time-format. If the timestamp is given in milliseconds
202 %* must be used as time-format.
203
204 --date-format=<dateformat>
The date-format variable followed by a space, specifies the log
format date containing either a name of a predefined format (see
options below) or any combination of regular characters and
special format specifiers.

They all begin with a percentage (%) sign. See `man strftime`.
e.g., %Y-%m-%d.
212
213 Note that if a timestamp is given in microseconds, %f must be
214 used as date-format. If the timestamp is given in milliseconds
215 %* must be used as date-format.
216
217 --log-format=<logformat>
218 The log-format variable followed by a space or \t for tab-delim‐
219 ited, specifies the log format string.
220
221 Note that if there are spaces within the format, the string
222 needs to be enclosed in single/double quotes. Inner quotes need
223 to be escaped.
224
225 In addition to specifying the raw log/date/time formats, for
226 simplicity, any of the following predefined log format names can
227 be supplied to the log/date/time-format variables. GoAccess can
228 also handle one predefined name in one variable and another pre‐
229 defined name in another variable.
230
231 COMBINED - Combined Log Format,
232 VCOMBINED - Combined Log Format with Virtual Host,
233 COMMON - Common Log Format,
234 VCOMMON - Common Log Format with Virtual Host,
235 W3C - W3C Extended Log File Format,
236 SQUID - Native Squid Log Format,
237 CLOUDFRONT - Amazon CloudFront Web Distribution,
238 CLOUDSTORAGE - Google Cloud Storage,
239 AWSELB - Amazon Elastic Load Balancing,
240 AWSS3 - Amazon Simple Storage Service (S3)
241 CADDY - Caddy's JSON Structured format
242
Note: Piping data into GoAccess won't prompt a log/date/time
configuration dialog; you will need to define the format
beforehand in your configuration file or on the command line.
246
247 USER INTERFACE OPTIONS
248 -c --config-dialog
249 Prompt log/time/date configuration window on program start. Only
250 when curses is initialized.
251
252 -i --hl-header
253 Color highlight active terminal panel.
254
255 -m --with-mouse
256 Enable mouse support on main terminal dashboard.
257
--color=<fg:bg[attrs, PANEL]>
259 Specify custom colors for the terminal output.
260
261 Color Syntax
262 DEFINITION space/tab colorFG#:colorBG# [attributes,PANEL]
263
264 FG# = foreground color [-1...255] (-1 = default term color)
265 BG# = background color [-1...255] (-1 = default term color)
266
267 Optionally, it is possible to apply color attributes (multiple
268 attributes are comma separated), such as: bold, underline, nor‐
269 mal, reverse, blink
270
271 If desired, it is possible to apply custom colors per panel,
272 that is, a metric in the REQUESTS panel can be of color A, while
273 the same metric in the BROWSERS panel can be of color B.
274
275 Available color definitions:
276 COLOR_MTRC_HITS
277 COLOR_MTRC_VISITORS
278 COLOR_MTRC_DATA
279 COLOR_MTRC_BW
280 COLOR_MTRC_AVGTS
281 COLOR_MTRC_CUMTS
282 COLOR_MTRC_MAXTS
283 COLOR_MTRC_PROT
284 COLOR_MTRC_MTHD
285 COLOR_MTRC_HITS_PERC
286 COLOR_MTRC_HITS_PERC_MAX
287 COLOR_MTRC_VISITORS_PERC
288 COLOR_MTRC_VISITORS_PERC_MAX
289 COLOR_PANEL_COLS
290 COLOR_BARS
291 COLOR_ERROR
292 COLOR_SELECTED
293 COLOR_PANEL_ACTIVE
294 COLOR_PANEL_HEADER
295 COLOR_PANEL_DESC
296 COLOR_OVERALL_LBLS
297 COLOR_OVERALL_VALS
298 COLOR_OVERALL_PATH
299 COLOR_ACTIVE_LABEL
300 COLOR_BG
301 COLOR_DEFAULT
302 COLOR_PROGRESS
303
304 See configuration file for a sample color scheme.
305
306 --color-scheme=<1|2|3>
307 Choose among color schemes. 1 for the default grey scheme. 2
308 for the green scheme. 3 for the Monokai scheme (shown only if
309 terminal supports 256 colors).
310
311 --crawlers-only
312 Parse and display only crawlers (bots).
313
314 --html-custom-css=<path/custom.css>
315 Specifies a custom CSS file path to load in the HTML report.
316
317 --html-custom-js=<path/custom.js>
318 Specifies a custom JS file path to load in the HTML report.
319
320 --html-report-title=<title>
321 Set HTML report page title and header.
322
323 --html-refresh=<secs>
324 Refresh the HTML report every X seconds. The value has to be
325 between 1 and 60 seconds. The default is set to refresh the HTML
326 report every 1 second.
327
328 --html-prefs=<JSON>
329 Set HTML report default preferences. Supply a valid JSON object
330 containing the HTML preferences. It allows the ability to cus‐
331 tomize each panel plot. See example below.
332
333 Note: The JSON object passed needs to be a one line JSON string.
334 For instance,
335
336 --html-prefs='{"theme":"bright","perPage":5,"layout":"horizontal","showTables":true,"visitors":{"plot":{"chartType":"bar"}}}'
337
338 --json-pretty-print
339 Format JSON output using tabs and newlines.
340
Note: This is not recommended when outputting a real-time HTML
report since the WebSocket payload will be much larger.
343
344 --max-items=<number>
345 The maximum number of items to display per panel. The maximum
346 can be a number between 1 and n.
347
348 Note: Only the CSV and JSON output allow a maximum number
349 greater than the default value of 366 (or 50 in the real-time
350 HTML output) items per panel.
351
352 --no-color
353 Turn off colored output. This is the default output on terminals
354 that do not support colors.
355
356 --no-column-names
357 Don't write column names in the terminal output. By default, it
358 displays column names for each available metric in every panel.
359
360 --no-csv-summary
361 Disable summary metrics on the CSV output.
362
363 --no-progress
364 Disable progress metrics [total requests/requests per second].
365
366 --no-tab-scroll
367 Disable scrolling through panels when TAB is pressed or when a
368 panel is selected using a numeric key.
369
370 --no-html-last-updated
371 Do not show the last updated field displayed in the HTML gener‐
372 ated report.
373
374 --no-parsing-spinner
Do not show the progress metrics and parsing spinner.
376
377 SERVER OPTIONS
378 --addr Specify IP address to bind the server to. Otherwise it binds to
379 0.0.0.0.
380
381 Usually there is no need to specify the address, unless you
382 intentionally would like to bind the server to a different
383 address within your server.
384
385 --daemonize
386 Run GoAccess as daemon (only if --real-time-html enabled).
387
388 Note: It's important to make use of absolute paths across GoAc‐
389 cess' configuration.
390
391 --user-name=<username>
392 Run GoAccess as the specified user.
393
Note: It's important to ensure the user or the user's group can
access the input and output files as well as any other files
needed. Other groups the user belongs to will be ignored. As
such it's advised to run GoAccess behind an SSL proxy as it's
unlikely this user can access the SSL certificates.
399
400 --origin=<url>
401 Ensure clients send the specified origin header upon the Web‐
402 Socket handshake.
403
404 --pid-file=<path/goaccess.pid>
Write the daemon PID to a file when used along with the
--daemonize option.
407
408 --port=<port>
409 Specify the port to use. By default GoAccess' WebSocket server
410 listens on port 7890.
411
412 --real-time-html
413 Enable real-time HTML output.
414
GoAccess uses its own WebSocket server to push the data from the
server to the client. See http://gwsocket.io for more details
on how the WebSocket server works.
418
419 --ws-url=<[scheme://]url[:port]>
420 URL to which the WebSocket server responds. This is the URL sup‐
421 plied to the WebSocket constructor on the client side.
422
423 Optionally, it is possible to specify the WebSocket URI scheme,
424 such as ws:// or wss:// for unencrypted and encrypted connec‐
425 tions. e.g., wss://goaccess.io
426
427 If GoAccess is running behind a proxy, you could set the client
428 side to connect to a different port by specifying the host fol‐
429 lowed by a colon and the port. e.g., goaccess.io:9999
430
431 By default, it will attempt to connect to the generated report's
432 hostname. If GoAccess is running on a remote server, the host of
433 the remote server should be specified here. Also, make sure it
434 is a valid host and NOT an http address.
435
436 --fifo-in=<path/file>
Creates a named pipe (FIFO) that reads from the given
path/file.
439
440 --fifo-out=<path/file>
441 Creates a named pipe (FIFO) that writes to the given path/file.
442
443 --ssl-cert=<cert.crt>
444 Path to TLS/SSL certificate. In order to enable TLS/SSL support,
445 GoAccess requires that --ssl-cert and --ssl-key are used.
446
447 Only if configured using --with-openssl
448
449 --ssl-key=<priv.key>
450 Path to TLS/SSL private key. In order to enable TLS/SSL support,
451 GoAccess requires that --ssl-cert and --ssl-key are used.
452
453 Only if configured using --with-openssl
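
The server options above are typically combined when producing a real-time report. The sketch below assembles one such command using only options documented in this section; the log path, report path, PID file path, and the host name stats.example.com are assumptions, and absolute paths are used because the process is daemonized. The command is printed, and only executed if goaccess is installed and the log is readable.

```shell
# Hypothetical absolute paths and host name; adjust to your environment.
log=/var/log/nginx/access.log
report=/var/www/html/report.html
pid=/var/run/goaccess.pid
ws_url=ws://stats.example.com:7890

# --port is where the WebSocket server listens; --ws-url is what the browser connects to.
cmd="goaccess $log --log-format=COMBINED -o $report --real-time-html --ws-url=$ws_url --port=7890 --daemonize --pid-file=$pid"

echo "$cmd"

# Run it only when goaccess and the log are really there.
if command -v goaccess >/dev/null 2>&1 && [ -r "$log" ]; then
    $cmd
fi
```

When TLS is needed, the same command would additionally carry --ssl-cert and --ssl-key (with a build configured using --with-openssl), or GoAccess can sit behind a proxy that terminates TLS.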
454
455 FILE OPTIONS
456 - The log file to parse is read from stdin.
457
458 -f --log-file=<logfile>
459 Specify the path to the input log file. If set in the config
460 file, it will take priority over -f from the command line.
461
462 -S --log-size=<bytes>
463 Specify the log size in bytes. This is useful when piping in
464 logs for processing in which the log size can be explicitly set.
465
466 -l --debug-file=<debugfile>
467 Send all debug messages to the specified file.
468
469 -p --config-file=<configfile>
470 Specify a custom configuration file to use. If set, it will take
471 priority over the global configuration file (if any).
472
473 --invalid-requests=<filename>
474 Log invalid requests to the specified file.
475
476 --unknowns-log=<filename>
477 Log unknown browsers and OSs to the specified file.
478
479 --no-global-config
480 Do not load the global configuration file. This directory should
481 normally be /usr/local/etc, unless specified with
482 --sysconfdir=/dir. See --dcf option for finding the default
483 configuration file.
484
485 PARSE OPTIONS
486 -a --agent-list
487 Enable a list of user-agents by host. For faster parsing, do not
488 enable this flag.
489
490 -d --with-output-resolver
491 Enable IP resolver on HTML|JSON output.
492
493 -e --exclude-ip=<IP|IP-range>
494 Exclude an IPv4 or IPv6 from being counted. Ranges can be
495 included as well using a dash in between the IPs (start-end).
496
497 Examples:
498 exclude-ip 127.0.0.1
499 exclude-ip 192.168.0.1-192.168.0.100
500 exclude-ip ::1
501 exclude-ip 0:0:0:0:0:ffff:808:804-0:0:0:0:0:ffff:808:808
502
503 -H --http-protocol=<yes|no>
504 Set/unset HTTP request protocol. This will create a request key
505 containing the request protocol + the actual request.
506
507 -M --http-method=<yes|no>
508 Set/unset HTTP request method. This will create a request key
509 containing the request method + the actual request.
510
511 -o --output=<path/file.[json|csv|html]>
512 Write output to stdout given one of the following files and the
513 corresponding extension for the output format:
514
515 /path/file.csv - Comma-separated values (CSV)
516 /path/file.json - JSON (JavaScript Object Notation)
517 /path/file.html - HTML
518
519 -q --no-query-string
520 Ignore request's query string. i.e.,
521 www.google.com/page.htm?query => www.google.com/page.htm.
522
523 Note: Removing the query string can greatly decrease memory con‐
524 sumption, especially on timestamped requests.
525
526 -r --no-term-resolver
527 Disable IP resolver on terminal output.
528
529 --444-as-404
530 Treat non-standard status code 444 as 404.
531
532 --4xx-to-unique-count
533 Add 4xx client errors to the unique visitors count.
534
535 --anonymize-ip
536 Anonymize the client IP address. The IP anonymization option
537 sets the last octet of IPv4 user IP addresses and the last 80
538 bits of IPv6 addresses to zeros. e.g., 192.168.20.100 =>
539 192.168.20.0 e.g., 2a03:2880:2110:df07:face:b00c::1 =>
540 2a03:2880:2110:df07::
541
542 --all-static-files
543 Include static files that contain a query string. e.g.,
544 /fonts/fontawesome-webfont.woff?v=4.0.3
545
546 --browsers-file=<path>
547 By default GoAccess parses an "essential/basic" curated list of
548 browsers & crawlers. If you need to add additional browsers, use
549 this option. Include an additional delimited list of
550 browsers/crawlers/feeds etc. See config/browsers.list for an
551 example or https://raw.githubusercontent.com/allinurl/goac‐
552 cess/master/config/browsers.list
553
554 --date-spec=<date|hr>
555 Set the date specificity to either date (default) or hr to dis‐
556 play hours appended to the date.
557
This is used in the visitors panel. It's useful for tracking
visitors at the hour level. For instance, an hour specificity
would display traffic as 18/Dec/2010:19.
561
562 --double-decode
563 Decode double-encoded values. This includes, user-agent,
564 request, and referer.
565
566 --enable-panel=<PANEL>
567 Enable parsing and displaying the given panel.
568
569 Available panels:
570 VISITORS
571 REQUESTS
572 REQUESTS_STATIC
573 NOT_FOUND
574 HOSTS
575 OS
576 BROWSERS
577 VISIT_TIMES
578 VIRTUAL_HOSTS
579 REFERRERS
580 REFERRING_SITES
581 KEYPHRASES
582 STATUS_CODES
583 REMOTE_USER
584 CACHE_STATUS
585 GEO_LOCATION
586 MIME_TYPE
587 TLS_TYPE
588
589 --hide-referer=<NEEDLE>
590 Hide a referer but still count it. Wild cards are allowed in the
591 needle. i.e., *.bing.com.
592
593 --hour-spec=<hr|min>
594 Set the time specificity to either hour (default) or min to dis‐
595 play the tenth of an hour appended to the hour.
596
597 This is used in the time distribution panel. It's useful for
598 tracking peaks of traffic on your server at specific times.
599
600 --ignore-crawlers
601 Ignore crawlers from being counted.
602
603 --ignore-panel=<PANEL>
604 Ignore parsing and displaying the given panel.
605
606 Available panels:
607 VISITORS
608 REQUESTS
609 REQUESTS_STATIC
610 NOT_FOUND
611 HOSTS
612 OS
613 BROWSERS
614 VISIT_TIMES
615 VIRTUAL_HOSTS
616 REFERRERS
617 REFERRING_SITES
618 KEYPHRASES
619 STATUS_CODES
620 REMOTE_USER
621 CACHE_STATUS
622 GEO_LOCATION
623 MIME_TYPE
624 TLS_TYPE
625
626 --ignore-referer=<referer>
627 Ignore referers from being counted. Wildcards allowed. e.g.,
628 *.domain.com ww?.domain.*
629
630 --ignore-statics=<req|panel>
631 Ignore static file requests.
632
633 req
634 Only ignore request from valid requests
635
panel
Ignore request from panels.

Note that it will count them towards the total number of
requests.
641
642 --ignore-status=<CODE>
643 Ignore parsing and displaying one or multiple status code(s).
644 For multiple status codes, use this option multiple times.
645
646 --keep-last=<num_days>
647 Keep the last specified number of days in storage. This will
648 recycle the storage tables. e.g., keep & show only the last 7
649 days.
650
651 --no-ip-validation
652 Disable client IP validation. Useful if IP addresses have been
653 obfuscated before being logged. The log still needs to contain
654 a placeholder for %h usually it's a resolved IP. e.g.
655 ord37s19-in-f14.1e100.net.
656
657 --no-strict-status
658 Disable HTTP status code validation. Some servers would record
659 this value only if a connection was established to the target
660 and the target sent a response. Otherwise, it could be recorded
661 as -.
662
663 --num-tests=<number>
664 Number of lines from the access log to test against the provided
665 log/date/time format. By default, the parser is set to test 10
666 lines. If set to 0, the parser won't test any lines and will
667 parse the whole access log. If a line matches the given
668 log/date/time format before it reaches <number>, the parser will
669 consider the log to be valid, otherwise GoAccess will return
670 EXIT_FAILURE and display the relevant error messages.
671
672 --process-and-exit
673 Parse log and exit without outputting data. Useful if we are
674 looking to only add new data to the on-disk database without
675 outputting to a file or a terminal.
676
677 --real-os
Display real OS names. e.g., Windows XP, Snow Leopard.
679
680 --sort-panel=<PANEL,FIELD,ORDER>
681 Sort panel on initial load. Sort options are separated by comma.
682 Options are in the form: PANEL,METRIC,ORDER
683
684 Available metrics:
685 BY_HITS - Sort by hits
686 BY_VISITORS - Sort by unique visitors
687 BY_DATA - Sort by data
688 BY_BW - Sort by bandwidth
689 BY_AVGTS - Sort by average time served
690 BY_CUMTS - Sort by cumulative time served
691 BY_MAXTS - Sort by maximum time served
692 BY_PROT - Sort by http protocol
693 BY_MTHD - Sort by http method
694
695 Available orders:
696 ASC
697 DESC
698
699 --static-file=<extension>
Add static file extension. e.g., .mp3. Extensions are case
sensitive.
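
To make the IPv4 rule described under --anonymize-ip concrete, here is a small stand-alone sketch that zeroes the last octet of an address. It only mirrors the documented example (192.168.20.100 => 192.168.20.0); the function name anonymize_ipv4 is made up for illustration, IPv6 is not handled, and this is not GoAccess's own code.

```shell
# Hypothetical helper: set the last octet of an IPv4 address to zero.
anonymize_ipv4() {
    # Split on dots and re-emit the first three octets followed by 0.
    # Input that does not have exactly four dot-separated parts prints nothing.
    printf '%s\n' "$1" | awk -F. 'NF == 4 { print $1 "." $2 "." $3 ".0" }'
}

anonymize_ipv4 192.168.20.100   # prints 192.168.20.0
```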
702
703 GEOLOCATION OPTIONS
704 -g --std-geoip
705 Standard GeoIP database for less memory usage.
706
707 --geoip-database=<geofile>
708 Specify path to GeoIP database file. i.e., GeoLiteCity.dat.
709
If using GeoIP2, you will need to download the GeoLite2 City or
Country database from MaxMind.com and use the option
--geoip-database to specify the database. Updated database files
for GeoIP legacy are also available from MaxMind.com as GeoLite
Legacy Databases. IPv4 and IPv6 files are supported as well. For
updated DB URLs, please see the default GoAccess configuration
file.
717
718 Note: --geoip-city-data is an alias of --geoip-database.
719
720 OTHER OPTIONS
721 -h --help
722 The help.
723
724 -s --storage
725 Display current storage method. i.e., B+ Tree, Hash.
726
727 -V --version
728 Display version information and exit.
729
730 --dcf Display the path of the default config file when `-p` is not
731 used.
732
733 PERSISTENCE STORAGE OPTIONS
734 --persist
735 Persist parsed data into disk. If database files exist, files
736 will be overwritten. This should be set to the first dataset.
737 See examples below.
738
739 --restore
740 Load previously stored data from disk. If reading persisted data
741 only, the database files need to exist. See --persist and exam‐
742 ples below.
743
744 --db-path=<dir>
745 Path where the on-disk database files are stored. The default
746 value is the /tmp directory.
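
A possible way to combine the persistence options above with --process-and-exit is sketched below: an older log is parsed and stored first, then a later run restores that data and adds the current log. The database directory and the file names access.log.1, access.log, and report.html are assumptions for this example; nothing runs unless goaccess and both logs are present.

```shell
# Hypothetical database directory; the documented default is /tmp.
db="${TMPDIR:-/tmp}/goaccess-db"

if command -v goaccess >/dev/null 2>&1 && [ -f access.log.1 ] && [ -f access.log ]; then
    mkdir -p "$db"

    # First dataset: parse the older log and persist it without producing output.
    goaccess access.log.1 --log-format=COMBINED --persist --process-and-exit --db-path="$db"

    # Later run: load the stored data, add the current log, and persist again.
    goaccess access.log --log-format=COMBINED --restore --persist --db-path="$db" -o report.html
fi
```

Using --persist alone on the first run matters because, as noted above, existing database files are overwritten when data is persisted without --restore.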
747
748
750 GoAccess can parse virtually any web log format.
751
752 Predefined options include, Common Log Format (CLF), Combined Log For‐
753 mat (XLF/ELF), including virtual host, Amazon CloudFront (Download Dis‐
754 tribution), Google Cloud Storage and W3C format (IIS).
755
756 GoAccess allows any custom format string as well.
757
758 There are two ways to configure the log format. The easiest is to run
759 GoAccess with -c to prompt a configuration window. Otherwise, it can be
760 configured under ~/.goaccessrc or the %sysconfdir%.
761
762 time-format
763 The time-format variable followed by a space, specifies the log
764 format time containing any combination of regular characters and
special format specifiers. They all begin with a percentage (%)
sign. See `man strftime`. e.g., %T or %H:%M:%S.
767
768 Note: If a timestamp is given in microseconds, %f must be used
769 as time-format or %* if the timestamp is given in milliseconds.
770
771 date-format
772 The date-format variable followed by a space, specifies the log
773 format date containing any combination of regular characters and
774 special format specifiers. They all begin with a percentage (%)
775 sign. See `man strftime`. e.g., %Y-%m-%d.
776
777 Note: If a timestamp is given in microseconds, %f must be used
778 as date-format or %* if the timestamp is given in milliseconds.
779
780 log-format
781 The log-format variable followed by a space or \t , specifies
782 the log format string.
783
784 %x A date and time field matching the time-format and date-format
785 variables. This is used when given a timestamp or the date &
786 time are concatenated as a single string (e.g., 1501647332 or
787 20170801235000) instead of the date and time being in two sepa‐
788 rated variables.
789
790 %t time field matching the time-format variable.
791
792 %d date field matching the date-format variable.
793
794 %v The canonical Server Name of the server serving the request
795 (Virtual Host).
796
797 %e This is the userid of the person requesting the document as
798 determined by HTTP authentication.
799
800 %C The cache status of the object the server served.
801
802 %h host (the client IP address, either IPv4 or IPv6)
803
804 %r The request line from the client. This requires specific delim‐
805 iters around the request (as single quotes, double quotes, or
806 anything else) to be parsable. If not, we have to use a combina‐
807 tion of special format specifiers as %m %U %H.
808
809 %q The query string.
810
811 %m The request method.
812
813 %U The URL path requested.
814
815 Note: If the query string is in %U, there is no need to use %q.
816 However, if the URL path, does not include any query string, you
817 may use %q and the query string will be appended to the request.
818
819 %H The request protocol.
820
821 %s The status code that the server sends back to the client.
822
823 %b The size of the object returned to the client.
824
825 %R The "Referrer" HTTP request header.
826
827 %u The user-agent HTTP request header.
828
%K The TLS protocol version chosen for the connection. (In
Apache LogFormat: %{SSL_PROTOCOL}x)

%k The TLS cipher suite chosen for the connection. (In
Apache LogFormat: %{SSL_CIPHER}x)
834
835 %M The MIME-type of the requested resource. (In Apache LogFormat:
836 %{Content-Type}o)
837
838 %D The time taken to serve the request, in microseconds as a deci‐
839 mal number.
840
841 %T The time taken to serve the request, in seconds with millisec‐
842 onds resolution.
843
844 %L The time taken to serve the request, in milliseconds as a deci‐
845 mal number.
846
847 %^ Ignore this field.
848
849 %~ Move forward through the log string until a non-space (!isspace)
850 char is found.
851
852 ~h The host (the client IP address, either IPv4 or IPv6) in a X-
853 Forwarded-For (XFF) field.
854
855 It uses a special specifier which consists of a tilde before the
856 host specifier, followed by the character(s) that delimit the
857 XFF field, which are enclosed by curly braces (i.e., ~h{," })
858
859 For example, ~h{," } is used in order to parse "11.25.11.53,
860 17.68.33.17" field which is delimited by a double quote, a
861 comma, and a space.
862
863 Note: In order to get the average, cumulative and maximum time served
864 in GoAccess, you will need to start logging response times in your web
865 server. In Nginx you can add $request_time to your log format, or %D in
866 Apache.
867
868 Important: If multiple time served specifiers are used at the same
869 time, the first option specified in the format string will take prior‐
870 ity over the other specifiers.
871
872 GoAccess requires the following fields:
873
874 %h a valid IPv4/6
875
876 %d a valid date
877
878 %r the request
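
As a worked example of the specifiers above, the sketch below spells out a format string for a line in the Combined Log Format and feeds that line to GoAccess through a pipe. The sample log line and the output file name report.json are made up for illustration; the format strings are written out by hand here rather than relying on the predefined COMBINED name, and the GoAccess call is skipped if the program is not installed.

```shell
# A sample line in the Combined Log Format (made-up data).
line='192.0.2.7 - - [05/Jun/2016:16:20:01 +0000] "GET /index.html HTTP/1.1" 200 1043 "-" "Mozilla/5.0"'

# %h host, %^ ignored field, %d date, %t time, %r request,
# %s status code, %b size, %R referrer, %u user agent.
log_format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"'
date_format='%d/%b/%Y'    # matches 05/Jun/2016
time_format='%H:%M:%S'    # matches 16:20:01

# The format contains spaces and quotes, so it is passed as one quoted argument.
if command -v goaccess >/dev/null 2>&1; then
    printf '%s\n' "$line" | goaccess - --log-format="$log_format" \
        --date-format="$date_format" --time-format="$time_format" -o report.json
fi
```

The format includes the three required fields (%h, %d, and %r); the remaining specifiers are optional and only enable additional metrics and panels.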
879
881 F1 or h
882 Main help.
883
884 F5 Redraw main window.
885
886 q Quit the program, current window or collapse active module
887
888 o or ENTER
889 Expand selected module or open window
890
891 0-9 and Shift + 0
892 Set selected module to active
893
894 j Scroll down within expanded module
895
896 k Scroll up within expanded module
897
898 c Set or change scheme color.
899
900 TAB Forward iteration of modules. Starts from current active module.
901
902 SHIFT + TAB
903 Backward iteration of modules. Starts from current active mod‐
904 ule.
905
906 ^f Scroll forward one screen within an active module.
907
908 ^b Scroll backward one screen within an active module.
909
910 s Sort options for active module
911
912 / Search across all modules (regex allowed)
913
914 n Find the position of the next occurrence across all modules.
915
916 g Move to the first item or top of screen.
917
918 G Move to the last item or bottom of screen.
919
EXAMPLES
Note: Piping data into GoAccess won't prompt a log/date/time
configuration dialog; you will need to define the format beforehand in
your configuration file or on the command line.
924
925
926 DIFFERENT OUTPUTS
927 To output to a terminal and generate an interactive report:
928
929 # goaccess access.log
930
931 To generate an HTML report:
932
933 # goaccess access.log -a -o report.html
934
935 To generate a JSON report:
936
937 # goaccess access.log -a -d -o report.json
938
939 To generate a CSV file:
940
941 # goaccess access.log --no-csv-summary -o report.csv
942
943 GoAccess also allows great flexibility for real-time filtering and
944 parsing. For instance, to quickly diagnose issues by monitoring logs
945 since goaccess was started:
946
947 # tail -f access.log | goaccess -
948
949 Even better, to filter while keeping the pipe open to preserve
950 real-time analysis, we can use tail -f and a pattern-matching tool
951 such as grep, awk, sed, etc.:
952
953 # tail -f access.log | grep -i --line-buffered 'firefox' |
954 goaccess --log-format=COMBINED -
955
956 Or, to parse from the beginning of the file while keeping the pipe
957 open and applying a filter:
958
959 # tail -f -n +0 access.log | grep -i --line-buffered 'firefox' |
960 goaccess --log-format=COMBINED -o report.html --real-time-html -
961
962 MULTIPLE LOG FILES
963 There are several ways to parse multiple logs with GoAccess. The sim‐
964 plest is to pass multiple log files to the command line:
965
966 # goaccess access.log access.log.1
967
968 It's even possible to parse files from a pipe while reading regular
969 files:
970
971 # cat access.log.2 | goaccess access.log access.log.1 -
972
973 Note that the single dash is appended to the command line to let GoAc‐
974 cess know that it should read from the pipe.
975
976 Now if we want to add more flexibility to GoAccess, we can do a series
977 of pipes. For instance, if we would like to process all compressed log
978 files access.log.*.gz in addition to the current log file, we can do:
979
980 # zcat access.log.*.gz | goaccess access.log -
981
982 Note: On Mac OS X, use gunzip -c instead of zcat.
983
984 REAL TIME HTML OUTPUT
985 GoAccess has the ability to output real-time data in the HTML report.
986 You can even email the HTML file since it is a single file with no
987 external file dependencies. How neat is that!
988
989 The process of generating a real-time HTML report is very similar to
990 the process of creating a static report. Only --real-time-html is
991 needed to make it real-time.
992
993 # goaccess access.log -o /usr/share/nginx/html/site/report.html
994 --real-time-html
995
996 By default, GoAccess will use the host name of the generated report.
997 Optionally, you can specify the URL to which the client's browser will
998 connect. See https://goaccess.io/faq for a more detailed example.
999
1000 # goaccess access.log -o report.html --real-time-html
1001 --ws-url=goaccess.io
1002
1003 By default, GoAccess listens on port 7890. To use a different port,
1004 specify it as follows (make sure the port is open):
1005
1006 # goaccess access.log -o report.html --real-time-html
1007 --port=9870
1008
1009 To bind the WebSocket server to an address other than 0.0.0.0, you
1010 can specify it as:
1011
1012 # goaccess access.log -o report.html --real-time-html
1013 --addr=127.0.0.1
1014
1015 Note: To output real time data over a TLS/SSL connection, you need to
1016 use --ssl-cert=<cert.crt> and --ssl-key=<priv.key>.
1017
1018 WORKING WITH DATES
1019 Another useful pipe would be filtering the web log by date.
1020
1021 The following will get all HTTP requests starting on 05/Dec/2010 until
1022 the end of the file.
1023
1024 # sed -n '/05\/Dec\/2010/,$ p' access.log | goaccess -a -
1025
1026 Or, using relative dates such as one week ago:
1027
1028 # sed -n '/'$(date '+%d\/%b\/%Y' -d '1 week ago')'/,$ p' access.log |
1029 goaccess -a -
1030
1031 If we want to parse only a certain time frame from date A to date B, we
1032 can do:
1033
1034 # sed -n '/05\/Nov\/2010/,/05\/Dec\/2010/ p' access.log | goaccess -a -
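
As a self-contained sketch of the date-range filter, the snippet below runs a sed address range against a made-up five-line log. The log lines and the helper names sample_log and between are hypothetical. The slashes in each date are escaped because / is also the sed delimiter.

```shell
# A tiny, invented access log used only for this illustration.
sample_log() {
  cat <<'EOF'
1.1.1.1 - - [04/Nov/2010:10:00:00 +0000] "GET /a HTTP/1.1" 200 10
1.1.1.2 - - [05/Nov/2010:11:00:00 +0000] "GET /b HTTP/1.1" 200 20
1.1.1.3 - - [20/Nov/2010:12:00:00 +0000] "GET /c HTTP/1.1" 404 30
1.1.1.4 - - [05/Dec/2010:13:00:00 +0000] "GET /d HTTP/1.1" 200 40
1.1.1.5 - - [06/Dec/2010:14:00:00 +0000] "GET /e HTTP/1.1" 500 50
EOF
}

# Print lines from the first match of $1 through the first match of $2.
# Both arguments are sed regexes with "/" already escaped.
between() {
  sed -n "/$1/,/$2/ p"
}

sample_log | between '05\/Nov\/2010' '05\/Dec\/2010' | awk '{ print $1 }'
# prints:
# 1.1.1.2
# 1.1.1.3
# 1.1.1.4
```

Note that a sed range ends at the first line matching the end pattern, so any later requests from that same end date are not included.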
1035
1036 If we want to preserve only a certain amount of data and recycle
1037 storage, we can keep only a certain number of days. For instance, to
1038 keep and show the last 5 days:
1039
1040 # goaccess access.log --keep-last=5
1041
1042 VIRTUAL HOSTS
1043 Suppose your log contains the virtual host (server block) field, for
1044 instance:
1045
1046 vhost.com:80 10.131.40.139 - - [02/Mar/2016:08:14:04 -0600] "GET
1047 /shop/bag-p-20 HTTP/1.1" 200 6715 "-" "Apache (internal dummy
1048 connection)"
1049
1050 And you would like to append the virtual host to the request in order
1051 to see which virtual host the top URLs belong to:
1052
1053 # awk '$8=$1$8' access.log | goaccess -a -
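
To see what the awk expression '$8=$1$8' does before handing the result to GoAccess, the sketch below applies it to the sample line from this section and prints only the rewritten request field. The function name append_vhost is made up for the example.

```shell
# Prefix field 8 (the request path) with field 1 (the virtual host).
# The assignment yields a non-empty string, so awk prints every line.
append_vhost() {
  awk '$8=$1$8'
}

printf '%s\n' 'vhost.com:80 10.131.40.139 - - [02/Mar/2016:08:14:04 -0600] "GET /shop/bag-p-20 HTTP/1.1" 200 6715 "-" "Apache (internal dummy connection)"' |
  append_vhost | awk '{ print $8 }'
# prints: vhost.com:80/shop/bag-p-20
```

Because awk rebuilds the line when a field is assigned, runs of whitespace between fields are collapsed to single spaces in the output.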
1054
1055 To exclude a list of virtual hosts you can do the following:
1056
1057 # grep -v "`cat exclude_vhost_list_file`" vhost_access.log |
1058 goaccess -
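
An alternative sketch for the exclusion step reads the patterns straight from the file with grep -f, treating each line as a fixed string (-F) rather than a regular expression. The host names, the temporary file, and the function name exclude_vhosts are invented for illustration.

```shell
# Drop every log line that contains one of the hosts listed in file $1
# (one host per line). Log lines are read from standard input.
exclude_vhosts() {
  grep -v -F -f "$1"
}

list=$(mktemp)
printf '%s\n' 'staging.example.com' 'internal.example.com' > "$list"

printf '%s\n' \
  'www.example.com:80 10.0.0.1 - - [02/Mar/2016:08:14:04 -0600] "GET / HTTP/1.1" 200 512' \
  'staging.example.com:80 10.0.0.2 - - [02/Mar/2016:08:14:05 -0600] "GET / HTTP/1.1" 200 512' |
  exclude_vhosts "$list" | awk '{ print $1 }'
# prints: www.example.com:80

rm -f "$list"
```

Since the match is made against the whole line, a listed host that appears elsewhere (for example in a referrer) also causes the line to be dropped.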
1059
1060 FILES & STATUS CODES
1061 To parse specific pages, e.g., page views, html, htm, php, etc. within
1062 a request:
1063
1064 # awk '$7~/\.html|\.htm|\.php/' access.log | goaccess -
1065
1066 Note that $7 is the request field for the common and combined log
1067 formats (without Virtual Host). If your log includes Virtual Host, then
1068 you probably want to use $8 instead. It's best to check which field you
1069 are after, e.g.:
1070
1071 # tail -10 access.log | awk '{print $8}'
1072
1073 Or to parse a specific status code, e.g., 500 (Internal Server Error):
1074
1075 # awk '$9~/500/' access.log | goaccess -
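
The two filters can be tried on a small invented log before piping into GoAccess. In this sketch the dots in the page pattern are escaped so they match literally, and the status filter uses an exact comparison instead of a regex match; both are variations for illustration. The function names only_pages and by_status and the sample lines are hypothetical.

```shell
# Invented sample in the common log format (no virtual host field),
# so the request path is $7 and the status code is $9.
sample_log() {
  cat <<'EOF'
10.0.0.1 - - [02/Mar/2016:08:14:04 -0600] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [02/Mar/2016:08:14:05 -0600] "GET /css/xhtml-theme.css HTTP/1.1" 200 128
10.0.0.3 - - [02/Mar/2016:08:14:06 -0600] "POST /login.php HTTP/1.1" 500 64
10.0.0.4 - - [02/Mar/2016:08:14:07 -0600] "GET /logo.png HTTP/1.1" 404 0
EOF
}

# Keep requests whose path contains a literal .html, .htm, or .php.
only_pages() {
  awk '$7 ~ /\.html|\.htm|\.php/'
}

# Keep requests whose status code equals $1 exactly.
by_status() {
  awk -v code="$1" '$9 == code'
}

sample_log | only_pages | awk '{ print $7 }'
# prints:
# /index.html
# /login.php

sample_log | by_status 500 | awk '{ print $7 }'
# prints: /login.php
```

With an unescaped dot, the pattern .html would also match /css/xhtml-theme.css, because "." matches any character.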
1076
1077 SERVER
1078 Also, it is worth pointing out that if we want to run GoAccess at lower
1079 priority, we can run it as:
1080
1081 # nice -n 19 goaccess -f access.log -a
1082
1083 and if you don't want to install it on your server, you can still run
1084 it from your local machine:
1085
1086 # ssh -n root@server 'tail -f /var/log/apache2/access.log' |
1087 goaccess -
1088
1089 Note: SSH requires -n so GoAccess can read from stdin. Also, make sure
1090 to use SSH keys for authentication as it won't work if a passphrase is
1091 required.
1092
1093 INCREMENTAL LOG PROCESSING
1094 GoAccess has the ability to process logs incrementally through its
1095 internal storage and dump its data to disk. It works in the following
1096 way:
1097
1098
1099 1 A dataset must be persisted first with --persist, then the same
1100 dataset can be loaded with --restore.
1101
1102 2 If new data is passed (piped or through a log file), it will be
1103 appended to the original dataset.
1104
1105
1106 NOTES
1107
1108 GoAccess keeps track of inodes of all the files processed (assuming
1109 files will stay on the same partition). In addition, it extracts a
1110 snippet of data from the log along with the last line parsed of each
1111 file and the timestamp of the last line parsed, e.g.,
1112 inode:29627417|line:20012|ts:20171231235059
1113
1114 First, it checks whether the snippet matches the log being parsed. If
1115 it does, it assumes the log hasn't changed dramatically, e.g., hasn't
1116 been truncated. If the inode does not match the current file, it parses
1117 all lines. If the current file matches the inode, it then reads the
1118 remaining lines and updates the count of lines parsed and the
1119 timestamp. As an extra precaution, it won't parse log lines with a
1120 timestamp ≤ the one stored.
1121
1122 Piped data works based on the timestamp of the last line read. For
1123 instance, it will parse and discard all incoming entries until it finds
1124 a timestamp >= the one stored.
1125
1126
1127 For instance:
1128
1129 // last month access log
1130 # goaccess access.log.1 --persist
1131
1132 then, load it with
1133
1134 // append this month access log, and preserve new data
1135 # goaccess access.log --restore --persist
1136
1137 To read persisted data only (without parsing new data):
1138
1139 # goaccess --restore
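
The timestamp precaution described in the notes above can be pictured with a small awk filter that drops every line whose timestamp is not newer than a stored value. This is only a sketch of the idea, not GoAccess's implementation: the function name newer_than and the sample lines are invented, the stored value uses the ts format shown earlier (yyyymmddHHMMSS), and the time zone offset is ignored.

```shell
# Print only log lines whose timestamp is greater than $1
# (a yyyymmddHHMMSS value such as 20171231235059).
newer_than() {
  awk -v last="$1" '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = sprintf("%02d", i)
    }
    {
      # $4 looks like [31/Dec/2017:23:50:59 ; turn punctuation into spaces
      # so it splits into day, month name, year, hour, minute, second.
      d = $4
      gsub(/[^0-9A-Za-z]/, " ", d)
      split(d, t, " ")
      ts = t[3] month[t[2]] t[1] t[4] t[5] t[6]
      if ((ts + 0) > (last + 0)) print
    }'
}

printf '%s\n' \
  '10.0.0.1 - - [31/Dec/2017:23:50:58 +0000] "GET /old HTTP/1.1" 200 10' \
  '10.0.0.1 - - [31/Dec/2017:23:50:59 +0000] "GET /last HTTP/1.1" 200 10' \
  '10.0.0.1 - - [01/Jan/2018:00:00:01 +0000] "GET /new HTTP/1.1" 200 10' |
  newer_than 20171231235059 | awk '{ print $7 }'
# prints: /new
```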
1140
1141 NOTES
1142 Each active panel has a total of 366 items, or 50 in the real-time HTML
1143 report. The number of items is customizable using --max-items. However,
1144 only the CSV and JSON outputs allow a maximum number greater than the
1145 default value of 366 items per panel.
1146
1147 A hit is a request (line in the access log), e.g., 10 requests = 10
1148 hits. HTTP requests with the same IP, date, and user agent are consid‐
1149 ered a unique visit.
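
The difference between hits and unique visits can be illustrated with a short awk sketch that counts both for a combined-format log. It assumes the user agent is the third quoted string on each line and is not GoAccess's own counting code. The function name count_visits and the sample lines are invented.

```shell
# Count hits (lines) and unique visits (distinct IP + date + user agent).
# Splitting on double quotes makes $6 the user agent in the combined format.
count_visits() {
  awk -F'"' '
    {
      split($1, head, " ")            # head[1] = IP, head[4] = [dd/Mon/yyyy:hh:mm:ss
      day = substr(head[4], 2, 11)    # dd/Mon/yyyy
      key = head[1] "|" day "|" $6
      if (!(key in seen)) { seen[key] = 1; visitors++ }
      hits++
    }
    END { printf "hits=%d visitors=%d\n", hits, visitors }'
}

cat <<'EOF' | count_visits
10.0.0.1 - - [02/Mar/2016:08:14:04 -0600] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
10.0.0.1 - - [02/Mar/2016:09:00:00 -0600] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"
10.0.0.1 - - [02/Mar/2016:09:30:00 -0600] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"
10.0.0.2 - - [02/Mar/2016:10:00:00 -0600] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
# prints: hits=4 visitors=3
```

The first two lines share the same IP, date, and user agent, so they count as two hits but one visit.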
1150
1151 BUGS
1152 If you think you have found a bug, please send me an email at
1153 goaccess@prosoftcorp.com or use the issue tracker at
1154 https://github.com/allinurl/goaccess/issues
1155
1156 AUTHOR
1157 Gerardo Orellana <goaccess@prosoftcorp.com>. For more details, or for
1158 new releases, please visit https://goaccess.io
1159
1160
1161
1162Linux FEBRUARY 2021 goaccess(1)