1webalizer(1) The Webalizer webalizer(1)
2
3
4
6 webalizer - A web server log file analysis tool.
7
9 webalizer [ option ... ] [ log-file ]
10
11 webazolver [ option ... ] [ log-file ]
12
14 The Webalizer is a web server log file analysis program which produces
15 usage statistics in HTML format for viewing with a browser. The
16 results are presented in both columnar and graphical format, which
17 facilitates interpretation. Yearly, monthly, daily and hourly usage
18 statistics are presented, along with the ability to display usage by
19 site, URL, referrer, user agent (browser), username, search strings,
20 entry/exit pages, and country (some information may not be available
21 if not present in the log file being processed).
22
23 The Webalizer supports CLF (common log format) log files, as well as
24 Combined log formats as defined by NCSA and others, and variations of
25 these which it attempts to handle intelligently. In addition, the
26 Webalizer also supports wu-ftpd xferlog formatted log files, allowing
27 analysis of ftp servers, and squid proxy logs. Logs may also be com‐
28 pressed, via gzip. If a compressed log file is detected, it will be
29 automatically uncompressed while it is read. Compressed logs must have
30 the standard gzip extension of .gz.
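The compressed-log handling described above can be sketched as follows. This is a minimal illustration, not The Webalizer's own code; the function name open_log is hypothetical, and the sketch assumes detection is done purely on the .gz extension.

```python
import gzip


def open_log(path):
    """Open a log file for reading as text.

    Assumption: a name ending in '.gz' means the file is gzip
    compressed; any other name is treated as plain text.
    """
    if path.endswith(".gz"):
        # Decompress on the fly while reading.
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

A caller reads lines from the returned object in the same way in both cases, which is why the extension check can stay in one place.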
31
32 webazolver is normally just a symbolic link to the webalizer. When run
33 as webazolver, only DNS file creation/updates are performed, and the
34 program will exit once complete. All normal options and configuration
35 directives are available, although many will not be used. In addition,
36 a DNS cache file must be specified. If the number of DNS child
37 processes to use is not specified, webazolver will default to 5.
38
39 This documentation applies to The Webalizer Version 2.01
40
42 The Webalizer was designed to be run from a Unix command line prompt or
43 as a crond(8) job. Once executed, the general flow of the program is:
44
45 o The program looks for a default configuration file. It first
46 searches the current directory for a file named webalizer.conf
47 and, if the file is found and is owned by the invoking user,
48 parses its configuration data. If the file is not present in the
49 current directory, the file /etc/webalizer.conf is searched for
50 and, if found, is used instead.
51
52 o Any command line arguments given to the program are parsed.
53 This may include the specification of a configuration file,
54 which is processed at the time it is encountered.
55
56 o If a log file was specified, it is opened and made ready for
57 processing. If no log file was given, STDIN is used for input.
58 If the log filename '-' is specified, STDIN will be forced.
59
60 o If an output directory was specified, the program does a
61 chdir(2) to that directory in preparation for generating output.
62 If no output directory was given, the current directory is
63 used.
64
65 o If a non-zero number of DNS child processes was specified,
66 they will be started, and the specified log file will be
67 processed, creating or updating the specified DNS cache file.
68
69 o If no hostname was given, the program attempts to get the host‐
70 name using a uname(2) system call. If that fails, localhost is
71 used.
72
73 o A history file is searched for in the current directory (output
74 directory) and read if found. This file keeps totals for
75 previous months, which are used in the main index.html HTML
76 document. Note: The file location can now be specified with the
77 HistoryName configuration option.
78
79 o If incremental processing was specified, a data file is
80 searched for and loaded if found, containing the 'internal
81 state' data of the program at the end of a previous run. Note:
82 The file location can now be specified with the IncrementalName
83 configuration option.
84
85 o Main processing begins on the log file. If the log spans
86 multiple months, a separate HTML document is created for each
87 month.
88
89 o After main processing, the main index.html page is created,
90 which has totals by month and links to each month's HTML
91 document.
92
93 o A new history file is saved to disk, which includes totals gen‐
94 erated by The Webalizer during the current run.
95
96 o If incremental processing was specified, a data file is written
97 that contains the 'internal state' data at the end of this run.
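The first step in the list above, the search for a default configuration file, can be sketched as follows. This is a minimal sketch, not the program's actual code. The function name find_default_config is hypothetical, and falling back to /etc/webalizer.conf when the local file exists but belongs to another user is an assumption; the text only describes the case where the local file is absent.

```python
import os


def find_default_config(cwd=".", system_path="/etc/webalizer.conf"):
    """Return the path of the default configuration file, or None."""
    local = os.path.join(cwd, "webalizer.conf")
    # The local file is used only if it is owned by the invoking user.
    if os.path.isfile(local) and os.stat(local).st_uid == os.getuid():
        return local
    # Assumption: fall back to the system-wide file in every other case.
    if os.path.isfile(system_path):
        return system_path
    return None


print(find_default_config())
```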
98
100 Version 1.2x of The Webalizer adds incremental run capability. Simply
101 put, this allows processing large log files by breaking them up into
102 smaller pieces, and processing these pieces instead. What this means
103 in real terms is that you can now rotate your log files as often as you
104 want, and still be able to produce monthly usage statistics without the
105 loss of any detail. Basically, The Webalizer saves and restores all
106 internal data in a file named webalizer.current. This allows the
107 program to 'start where it left off', so to speak, and allows the
108 preservation of detail from one run to the next. The data file is
109 placed in the current output directory, and is a plain ASCII text file
110 that can be viewed with any standard text editor. Its location and
111 name may be changed using the IncrementalName configuration keyword.
112
113 Some special precautions need to be taken when using the incremental
114 run capability of The Webalizer. Configuration options should not be
115 changed between runs, as that could cause corruption of the internal
116 data stored. For example, changing the MangleAgents level will cause
117 different representations of user agents to be stored, producing
118 invalid results in the user agents section of the report. If you need
119 to change configuration options, do it at the end of the month after
120 normal processing of the previous month and before processing the
121 current month. You may also want to delete the webalizer.current
122 file.
123
124 The Webalizer also attempts to prevent data duplication by keeping
125 track of the timestamp of the last record processed. This timestamp is
126 then compared to current records being processed, and any records that
127 were logged previous to that timestamp are ignored. This, in theory,
128 should allow you to re-process logs that have already been processed,
129 or process logs that contain a mix of processed/not yet processed
130 records, and not produce duplication of statistics. The only time this
131 may break is if you have duplicate timestamps in two separate log
132 files. Any records in the second log file that have the same
133 timestamp as the last record in the previous log file will be
134 discarded as if they had already been processed. There are many
135 ways to prevent this; for example, stopping the web server
136 before rotating logs will prevent this situation. This setup also
137 requires that you always process logs in chronological order;
138 otherwise data loss will occur as a result of the timestamp compare.
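The timestamp comparison described above can be illustrated with a short sketch. It is not The Webalizer's implementation; the function name and the (timestamp, line) record layout are hypothetical. It shows why a record sharing the exact timestamp of the last processed record is dropped along with the older ones.

```python
from datetime import datetime


def drop_already_processed(records, last_seen):
    """Keep only records logged strictly after last_seen.

    records is an iterable of (timestamp, line) tuples.
    """
    return [(ts, line) for ts, line in records if ts > last_seen]


# Timestamp of the last record handled by the previous run.
last = datetime(2001, 10, 22, 23, 59, 59)
records = [
    (datetime(2001, 10, 22, 23, 59, 58), "already processed"),
    (datetime(2001, 10, 22, 23, 59, 59), "same second as last record"),
    (datetime(2001, 10, 23, 0, 0, 0), "new record"),
]
for ts, line in drop_already_processed(records, last):
    print(ts.isoformat(), line)
```

Only the third record survives the filter. The second one is lost even if it was never processed, which is the duplicate-timestamp case the text warns about.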
139
141 The Webalizer supports reverse DNS lookups through a DNS cache file
142 that is either created/updated at run-time, or has been previously cre‐
143 ated, either by a previous run of the webalizer, or by running the
144 stand-alone version, webazolver. In order to perform reverse DNS
145 lookups, a DNSCache filename must be specified. In order to cre‐
146 ate/update the cache file at run-time, the DNSChildren number must be
147 non-zero. The DNSChildren value specifies the number of child
148 processes to fork, each of which will perform reverse DNS lookups in order
149 to create/update the DNS cache file. See the file DNS.README for addi‐
150 tional information.
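The idea of a reverse DNS cache can be sketched as follows. This is only an illustration of the caching principle: an in-memory dictionary stands in for the DNS cache file, no child processes are forked, and the function name resolve is hypothetical. Storing the bare address when a lookup fails is also an assumption.

```python
import socket


def resolve(ip, cache, lookup=socket.gethostbyaddr):
    """Return the hostname for ip, consulting the cache first."""
    if ip not in cache:
        try:
            # gethostbyaddr returns (hostname, aliases, addresses).
            cache[ip] = lookup(ip)[0]
        except OSError:
            # Assumption: remember failures as the address itself.
            cache[ip] = ip
    return cache[ip]
```

Because every address is looked up at most once, later runs over logs with the same visitors avoid repeating slow network queries.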
151
153 The Webalizer supports many different configuration options that will
154 alter the way the program behaves and generates output. Most of these
155 can be specified on the command line, while some can only be specified
156 in a configuration file. The command line options are listed below,
157 with references to the corresponding configuration file keywords.
158
159 General Options
160
161 -h Display all available command line options and exit program.
162
163 -v -V Display program version and exit program.
164
165 -d Debug. Display debugging information for errors and warnings.
166
167 -i IgnoreHist. Ignore history. USE WITH CAUTION. This causes
168 The Webalizer to ignore only the previous monthly history file.
169 Incremental data (if present) is still processed.
170
171 -p Incremental. Preserve internal data between runs.
172
173 -q Quiet. Suppress informational messages. Does not suppress
174 warnings or errors.
175
176 -Q ReallyQuiet. Suppress all messages including warnings and
177 errors.
178
179 -T TimeMe. Force display of timing information at end of process‐
180 ing.
181
182 -c file Use configuration file file.
183
184 -n name HostName. Use the hostname name.
185
186 -o dir OutputDir. Use output directory dir.
187
188 -t name ReportTitle. Use name for report title.
189
190 -F ( clf | ftp | squid )
191 LogType. Specify log type to be processed. Value can be
192 either clf, ftp or squid format. If not specified, will
193 default to CLF format. FTP logs must be in standard wu-ftpd
194 xferlog format.
195
196 -f FoldSeqErr. Fold out of sequence log records back into analy‐
197 sis, by treating as if they were the same date/time as the last
198 good record. Normally, out of sequence log records are simply
199 ignored.
200
201 -Y CountryGraph. Suppress country graph.
202
203 -G HourlyGraph. Suppress hourly graph.
204
205 -x name HTMLExtension. Defines HTML file extension to use. If not
206 specified, defaults to html. Do not include the leading
207 period.
208
209 -H HourlyStats. Suppress hourly statistics.
210
211 -L GraphLegend. Suppress color coded graph legends.
212
213 -l num GraphLines. Specify number of background lines. Default is 2.
214 Use zero ('0') to disable the lines.
215
216 -P name PageType. Specify file extensions that are considered pages.
217 Sometimes referred to as pageviews.
218
219 -m num VisitTimeout. Specify the Visit timeout period. Specified in
220 number of seconds. Default is 1800 seconds (30 minutes).
221
222 -I name IndexAlias. Use the filename name as an additional alias for
223 index.*.
224
225 -M num MangleAgents. Mangle user agent names according to the mangle
226 level specified by num. Mangle levels are:
227
228 5 Browser name and major version.
229
230 4 Browser name, major and minor version.
231
232 3 Browser name, major version, minor version to two decimal
233 places.
234
235 2 Browser name, major and minor versions and sub-version.
236
237 1 Browser name, version and machine type if possible.
238
239 0 All information (left unchanged).
240
241 -g num GroupDomains. Automatically group sites by domain. The group‐
242 ing level specified by num can be thought of as 'the number of
243 dots' to display in the grouping. The default value of 0 dis‐
244 ables any domain grouping.
245
246 -D name DNSCache. Use the DNS cache file name.
247
248 -N num DNSChildren. Use num DNS child processes to perform DNS
249 lookups, either creating or updating the DNS cache file.
250 Specify zero (0) to disable cache file creation/updates. If
251 given, a DNS cache filename must be specified.
252
253 Hide Options
254
255 -a name HideAgent. Hide user agents matching name.
256
257 -r name HideReferrer. Hide referrer matching name.
258
259 -s name HideSite. Hide site matching name.
260
261 -X HideAllSites. Hide all individual sites (only display groups).
262
263 -u name HideURL. Hide URL matching name.
264
265 Table size options
266
267 -A num TopAgents. Display the top num user agents table.
268
269 -R num TopReferrers. Display the top num referrers table.
270
271 -S num TopSites. Display the top num sites table.
272
273 -U num TopURLs. Display the top num URLs table.
274
275 -C num TopCountries. Display the top num countries table.
276
277 -e num TopEntry. Display the top num entry pages table.
278
279 -E num TopExit. Display the top num exit pages table.
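As an illustration of how the options above combine, the following sketch assembles a command line for an incremental run with DNS lookups. The helper name build_command and all paths are hypothetical; only the option letters (-p, -o, -D, -N) come from the lists above.

```python
import shlex


def build_command(log_file, output_dir, dns_cache=None, dns_children=0,
                  incremental=True):
    """Return an argument list for invoking webalizer."""
    argv = ["webalizer"]
    if incremental:
        argv.append("-p")                        # preserve internal data
    argv += ["-o", output_dir]                   # output directory
    if dns_cache:
        argv += ["-D", dns_cache]                # DNS cache file
        if dns_children > 0:
            argv += ["-N", str(dns_children)]    # create/update the cache
    argv.append(log_file)
    return argv


# Hypothetical paths, shown only to make the result concrete.
cmd = build_command("/var/log/httpd/access_log", "/var/www/usage",
                    dns_cache="dns_cache.db", dns_children=10)
print(shlex.join(cmd))
```

The -N option is added only when a cache file is given, reflecting the rule that DNSChildren requires a DNS cache filename.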
280
282 Configuration files are standard ascii(7) text files that may be
283 created or edited using any standard editor. Blank lines and lines that
284 begin with a pound sign ('#') are ignored. Any other lines are
285 considered to be configuration lines, and have the form "Keyword Value",
286 where 'Keyword' is one of the currently available configuration
287 keywords defined below, and 'Value' is the value to assign to that
288 particular option. Any text found after the keyword up to the end of the
289 line is considered the keyword's value, so you should not include
290 anything after the actual value on the line that is not actually part of
291 the value being assigned. The file sample.conf provided with the
292 distribution contains lots of useful documentation and examples as well.
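The line format described above can be illustrated with a small parser. This is a sketch of the "Keyword Value" rule only, not the program's own parser; the function name parse_config and the sample values are hypothetical. Results are returned as a list of pairs because a keyword such as HideSite may appear more than once.

```python
def parse_config(text):
    """Return a list of (keyword, value) pairs from configuration text."""
    settings = []
    for raw in text.splitlines():
        # Blank lines and lines beginning with '#' are ignored.
        if not raw.strip() or raw.startswith("#"):
            continue
        parts = raw.split(None, 1)
        keyword = parts[0]
        # Everything after the keyword, to end of line, is the value.
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.append((keyword, value))
    return settings


sample = """\
# Hypothetical example configuration
LogFile /var/log/httpd/access_log

ReportTitle Usage Statistics for
HideSite localhost
HideSite example.com
"""
for keyword, value in parse_config(sample):
    print(keyword, "=", repr(value))
```

Note that the ReportTitle value keeps its embedded spaces, which is why nothing else may follow the value on the same line.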
293
294 General Configuration Keywords
295
296 LogFile name
297 Use log file named name. If none specified, STDIN will be
298 used.
299
300 LogType name
301 Specify log file type as name. Values can be either web, squid
302 or ftp, with the default being web.
303
304 OutputDir dir
305 Create output in the directory dir. If none specified, the
306 current directory will be used.
307
308 HistoryName name
309 Filename to use for history file. Relative to output directory
310 unless absolute name is given (i.e., starts with '/'). Defaults
311 to 'webalizer.hist' in the standard output directory.
312
313 ReportTitle name
314 Use the title string name for the report title. If none is
315 specified, use the default of (in English) "Usage Statistics for ".
316
317 HostName name
318 Set the hostname for the report as name. If none specified, an
319 attempt will be made to gather the hostname via a uname(2) sys‐
320 tem call. If that fails, localhost will be used.
321
322 UseHTTPS ( yes | no )
323 Use https:// on links to URLs, instead of the default http://,
324 in the 'Top URLs' table.
325
326 Quiet ( yes | no )
327 Suppress informational messages. Warning and Error messages
328 will not be suppressed.
329
330 ReallyQuiet ( yes | no )
331 Suppress all messages, including Warning and Error messages.
332
333 Debug ( yes | no )
334 Print extra debugging information on Warnings and Errors.
335
336 TimeMe ( yes | no )
337 Force timing information at end of processing.
338
339 GMTTime ( yes | no )
340 Use GMT (UTC) time instead of local timezone for reports.
341
342 IgnoreHist ( yes | no )
343 Ignore previous monthly history file. USE WITH CAUTION. Does
344 not prevent Incremental file processing.
345
346 FoldSeqErr ( yes | no )
347 Fold out of sequence log records back into analysis by treating
348 them as if they had the same date/time as the last good record.
349 Normally, out of sequence log records are ignored.
350
351 CountryGraph ( yes | no )
352 Display Country Usage Graph in output report.
353
354 DailyGraph ( yes | no )
355 Display Daily Graph in output report.
356
357 DailyStats ( yes | no )
358 Display Daily Statistics in output report.
359
360 HourlyGraph ( yes | no )
361 Display Hourly Graph in output report.
362
363 HourlyStats ( yes | no )
364 Display Hourly Statistics in output report.
365
366 PageType name
367 Define the file extensions to consider as a page. If a file is
368 found to have the same extension as name, it will be counted as
369 a page (sometimes called a pageview).
370
371 GraphLegend ( yes | no )
372 Allows the color coded graph legends to be enabled/disabled.
373
374 GraphLines num
375 Specify the number of background reference lines displayed on
376 the graphs produced. Disable by using zero ('0'), default is
377 2.
378
379 VisitTimeout num
380 Specifies the visit timeout value in seconds. Default is 1800
381 seconds (30 minutes). A visit is determined by looking at the
382 difference in time between the current and last request from a
383 specific site. If the difference is greater than or equal to the
384 timeout value, the request is counted as a new visit.
386
387 IndexAlias name
388 Use name as an additional alias for index.*.
389
390 MangleAgents num
391 Mangle user agent names based on mangle level num. See the -M
392 command line switch for mangle levels and their meaning. The
393 default is 0, which doesn't mangle user agents at all.
394
395 SearchEngine name variable
396 Allows the specification of search engines and their query
397 strings. The name is the name to match against the referrer
398 string for a given search engine. The variable is the cgi
399 variable that the search engine uses for queries. See the sam‐
400 ple.conf file for example usage with common search engines.
401
402 Incremental ( yes | no )
403 Enable Incremental mode processing.
404
405 IncrementalName name
406 Filename to use for incremental data. Relative to output
407 directory unless an absolute name is given (i.e., starts with
408 '/'). Defaults to 'webalizer.current' in the standard output
409 directory.
410
411 DNSCache name
412 Filename to use for the DNS cache. Relative to output
413 directory unless an absolute name is given (i.e., starts with '/').
414
415 DNSChildren num
416 Number of DNS child processes to run in order to
417 create/update the DNS cache file. Specify zero (0) to disable.
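The visit rule given under VisitTimeout above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's code: the function name count_visits is hypothetical, request times are plain seconds for a single site, and the first request from a site is assumed to start a visit.

```python
def count_visits(request_times, timeout=1800):
    """Count visits for one site from its request times (in seconds)."""
    visits = 0
    last = None
    for t in sorted(request_times):
        # A gap greater than or equal to the timeout starts a new visit.
        if last is None or t - last >= timeout:
            visits += 1
        last = t
    return visits


# Requests at 0 s, 100 s, 1900 s and 3700 s from the same site.
print(count_visits([0, 100, 1900, 3700]))
```

In this example the gaps are 100, 1800 and 1800 seconds, so the second request belongs to the first visit while the third and fourth each begin a new one, giving three visits.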
418
419 Top Table Keywords
420
421 TopAgents num
422 Display the top num User Agents table. Use zero to disable.
423
424 AllAgents ( yes | no )
425 Create a separate HTML page with All User Agents.
426
427 TopReferrers num
428 Display the top num Referrers table. Use zero to disable.
429
430 AllReferrers ( yes | no )
431 Create a separate HTML page with All Referrers.
432
433 TopSites num
434 Display the top num Sites table. Use zero to disable.
435
436 TopKSites num
437 Display the top num Sites (by KByte) table. Use zero to dis‐
438 able.
439
440 AllSites ( yes | no )
441 Create a separate HTML page with All Sites.
442
443 TopURLs num
444 Display the top num URLs table. Use zero to disable.
445
446 TopKURLs num
447 Display the top num URLs (by KByte) table. Use zero to dis‐
448 able.
449
450 AllURLs ( yes | no )
451 Create a separate HTML page with All URLs.
452
453 TopCountries num
454 Display the top num Countries in the table. Use zero to dis‐
455 able.
456
457 TopEntry num
458 Display the top num Entry Pages in the table. Use zero to dis‐
459 able.
460
461 TopExit num
462 Display the top num Exit Pages in the table. Use zero to dis‐
463 able.
464
465 TopSearch num
466 Display the top num Search Strings in the table. Use zero to
467 disable.
468
469 AllSearchStr ( yes | no )
470 Create a separate HTML page with All Search Strings.
471
472 TopUsers num
473 Display the top num Usernames in the table. Use zero to
474 disable. Usernames are only available if using HTTP-based
475 authentication.
476
477 AllUsers ( yes | no )
478 Create a separate HTML page with All Usernames.
479
480 Hide/Ignore/Group/Include Keywords
481
482 HideAgent name
483 Hide User Agents that match name.
484
485 HideReferrer name
486 Hide Referrers that match name.
487
488 HideSite name
489 Hide Sites that match name.
490
491 HideAllSites ( yes | no )
492 Hide all individual sites. This causes only grouped sites to
493 be displayed.
494
495 HideURL name
496 Hide URLs that match name.
497
498 HideUser name
499 Hide Usernames that match name.
500
501 IgnoreAgent name
502 Ignore User Agents that match name.
503
504 IgnoreReferrer name
505 Ignore Referrers that match name.
506
507 IgnoreSite name
508 Ignore Sites that match name.
509
510 IgnoreURL name
511 Ignore URLs that match name.
512
513 IgnoreUser name
514 Ignore Usernames that match name.
515
516 GroupAgent name [Label]
517 Group User Agents that match name. Display Label in 'Top
518 Agent' table if given (instead of name).
519
520 GroupReferrer name [Label]
521 Group Referrers that match name. Display Label in 'Top Refer‐
522 rer' table if given (instead of name).
523
524 GroupSite name [Label]
525 Group Sites that match name. Display Label in 'Top Site' table
526 if given (instead of name).
527
528 GroupDomains num
529 Automatically group sites by domain. The value num specifies
530 the level of grouping, and can be thought of as the 'number of
531 dots' to be displayed. The default value of 0 disables domain
532 grouping.
533
534 GroupURL name [Label]
535 Group URLs that match name. Display Label in 'Top URL' table
536 if given (instead of name).
537
538 GroupUser name [Label]
539 Group Usernames that match name. Display Label in 'Top User‐
540 names' table if given (instead of name).
541
542 IncludeSite name
543 Force inclusion of sites that match name. Takes precedence
544 over Ignore* keywords.
545
546 IncludeURL name
547 Force inclusion of URLs that match name. Takes precedence
548 over Ignore* keywords.
549
550 IncludeReferrer name
551 Force inclusion of Referrers that match name. Takes precedence
552 over Ignore* keywords.
553
554 IncludeAgent name
555 Force inclusion of User Agents that match name. Takes prece‐
556 dence over Ignore* keywords.
557
558 IncludeUser name
559 Force inclusion of Usernames that match name. Takes precedence
560 over Ignore* keywords.
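The 'number of dots' wording used for GroupDomains above can be read as keeping the rightmost num+1 labels of a hostname. The sketch below shows that reading only; it is an assumption rather than a description of the program's code, the function name group_domain is hypothetical, and numeric addresses are not handled.

```python
def group_domain(hostname, level):
    """Return the domain group for hostname at the given level."""
    if level <= 0:
        return hostname            # 0 disables domain grouping
    labels = hostname.split(".")
    if len(labels) <= level + 1:
        return hostname            # already short enough
    # Keep the rightmost level+1 labels, i.e. 'level' dots.
    return ".".join(labels[-(level + 1):])


for level in (0, 1, 2):
    print(level, group_domain("www.dialup.example.com", level))
```

Under this reading, level 1 yields example.com (one dot) and level 2 yields dialup.example.com (two dots).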
561
562 HTML Generation Keywords
563
564 HTMLExtension text
565 Defines the HTML file extension to use. Default is html. Do
566 not include the leading period!
567
568 HTMLPre text
569 Insert text at the very beginning of the generated HTML file.
570 Defaults to a standard HTML 3.2 DOCTYPE record.
571
572 HTMLHead text
573 Insert text within the <HEAD></HEAD> block of the HTML file.
574
575 HTMLBody text
576 Insert text in HTML page, starting with the <BODY> tag. If
577 used, the first line must be a <BODY ...> tag. Multiple lines
578 may be specified.
579
580 HTMLPost text
581 Insert text at top (before the horizontal rule) of HTML pages.
582 Multiple lines may be specified.
583
584 HTMLTail text
585 Insert text at bottom of the HTML page. The text is top and
586 right aligned within a table column at the end of the report.
587
588 HTMLEnd text
589 Insert text at the very end of the HTML page. If not speci‐
590 fied, the default is to insert the ending </BODY> and </HTML>
591 tags. If used, you must supply these tags yourself.
592
593 Dump Object Keywords
594
595 The Webalizer allows you to export processed data to other programs by
596 using tab delimited text files. The Dump* commands specify which files
597 are to be written, and where.
598
599 DumpPath name
600 Save dump files in directory name. If not specified, the
601 default output directory will be used. Do not specify a
602 trailing slash (/).
603
604 DumpExtension name
605 Use name as the filename extension for dump files. If not
606 given, the default of tab will be used.
607
608 DumpHeader ( yes | no )
609 Print a column header as the first record of the file.
610
611 DumpSites ( yes | no )
612 Dump the sites data to a tab delimited file.
613
614 DumpURLs ( yes | no )
615 Dump the URL data to a tab delimited file.
616
617 DumpReferrers ( yes | no )
618 Dump the referrer data to a tab delimited file. This data is
619 only available if using a log that contains referrer
620 information (i.e., a combined format web log).
621
622 DumpAgents ( yes | no )
623 Dump the user agent data to a tab delimited file. This data is
624 only available if using a log that contains user agent
625 information (i.e., a combined format web log).
626
627 DumpUsers ( yes | no )
628 Dump the username data to a tab delimited file. This data is
629 only available if processing a wu-ftpd xferlog or a web log
630 that contains HTTP authentication information.
631
632 DumpSearchStr ( yes | no )
633 Dump the search string data to a tab delimited file. This data
634 is only available if processing a web log that contains
635 referrer information with search strings present.
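Because dump files are tab delimited, another program can read them with any standard delimited-text parser. The sketch below is a minimal example; the function name read_dump is hypothetical, and the column names in the sample are invented for illustration, since the actual columns are not listed here.

```python
import csv
import io


def read_dump(stream, has_header=True):
    """Read a tab delimited dump file.

    With has_header=True (DumpHeader yes), each row becomes a dict
    keyed by column name; otherwise rows are returned as lists.
    """
    rows = list(csv.reader(stream, delimiter="\t"))
    if has_header and rows:
        header, rows = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in rows]
    return rows


# Hypothetical sample; real dump files may use different columns.
sample = io.StringIO("Hits\tURL\n10\t/index.html\n3\t/about.html\n")
for row in read_dump(sample):
    print(row)
```

To read a real file, pass an open file object in place of the StringIO, for example one opened with newline="" as the csv module recommends.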
636
638 webalizer.conf Default configuration file. Searched for in the
639 current directory and, if not found, in the /etc/
640 directory.
641
642 webalizer.hist Monthly history file for previous 12 months. (can
643 be changed)
644
645 webalizer.current Current state data file (Incremental processing).
646 (can be changed)
647
648 xxxxx_YYYYMM.html Various monthly HTML output files produced. (exten‐
649 sion can be changed)
650
651 xxxxx_YYYYMM.png Various monthly image files used in the reports.
652
653 xxxxx_YYYYMM.tab Monthly tab delimited text files. (extension can
654 be changed)
655
657 Report bugs to brad@mrunix.net.
658
660 Copyright (C) 1997-2000 by Bradford L. Barrett. Distributed under the
661 GNU GPL. See the files "COPYING" and "Copyright", supplied with all
662 distributions for additional information.
663
665 Bradford L. Barrett <brad@mrunix.net>
666
667
668
669Version 2.01 22-Oct-2001 webalizer(1)