webalizer(1)                     The Webalizer                    webalizer(1)

NAME

       webalizer - A web server log file analysis tool.

SYNOPSIS

       webalizer [ option ... ] [ log-file ]

       webazolver [ option ... ] [ log-file ]

DESCRIPTION

       The Webalizer is a web server log file analysis program that produces
       usage statistics in HTML format for viewing with a browser.  The
       results are presented in both columnar and graphical form, which makes
       them easier to interpret.  Yearly, monthly, daily and hourly usage
       statistics are presented, along with the ability to display usage by
       site, URL, referrer, user agent (browser), username, search string,
       entry/exit page and country (some of this information may not be
       available if it is not present in the log file being processed).

       The Webalizer supports CLF (common log format) log files, as well as
       Combined log formats as defined by NCSA and others, and variations of
       these which it attempts to handle intelligently.  In addition, the
       Webalizer supports wu-ftpd xferlog formatted log files, allowing
       analysis of ftp servers, and squid proxy logs.  Logs may also be
       compressed with gzip.  If a compressed log file is detected, it is
       automatically uncompressed as it is read.  Compressed logs must have
       the standard gzip extension of .gz.

       webazolver is normally just a symbolic link to webalizer.  When run as
       webazolver, only DNS cache file creation/updates are performed, and
       the program exits once they are complete.  All normal options and
       configuration directives are available, although many of them are not
       used.  In addition, a DNS cache file must be specified.  If the number
       of DNS child processes to use is not specified, webazolver defaults
       to 5.

       This documentation applies to The Webalizer Version 2.01.

RUNNING THE WEBALIZER

       The Webalizer was designed to be run from a Unix command line prompt
       or as a crond(8) job.  Once executed, the general flow of the program
       is:

       o       A default configuration file is looked for.  A file named
               webalizer.conf is searched for in the current directory and,
               if it is found and is owned by the invoking user, its
               configuration data is parsed.  If the file is not present in
               the current directory, the file /etc/webalizer.conf is
               searched for and, if found, is used instead.

       o       Any command line arguments given to the program are parsed.
               These may include the specification of a configuration file,
               which is processed at the time it is encountered.

       o       If a log file was specified, it is opened and made ready for
               processing.  If no log file was given, STDIN is used for
               input.  If the log filename '-' is specified, STDIN is forced.

       o       If an output directory was specified, the program does a
               chdir(2) to that directory in preparation for generating
               output.  If no output directory was given, the current
               directory is used.

       o       If a non-zero number of DNS child processes was specified,
               they are started, and the specified log file is processed,
               creating or updating the specified DNS cache file.

       o       If no hostname was given, the program attempts to get the
               hostname using a uname(2) system call.  If that fails,
               localhost is used.

       o       A history file is searched for in the current (output)
               directory and read if found.  This file keeps totals for
               previous months, which are used in the main index.html HTML
               document.  Note: the file location can now be specified with
               the HistoryName configuration option.

       o       If incremental processing was specified, a data file is
               searched for and loaded if found, containing the 'internal
               state' data of the program at the end of a previous run.
               Note: the file location can now be specified with the
               IncrementalName configuration option.

       o       Main processing begins on the log file.  If the log spans
               multiple months, a separate HTML document is created for each
               month.

       o       After main processing, the main index.html page is created,
               which has totals by month and links to each month's HTML
               document.

       o       A new history file is saved to disk, which includes totals
               generated by The Webalizer during the current run.

       o       If incremental processing was specified, a data file is
               written that contains the 'internal state' data at the end of
               this run.
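
       The default configuration file search in the first step can be
       sketched as follows.  This is a minimal Python sketch, not The
       Webalizer's own code.  The helper name find_default_config is
       hypothetical, and the sketch assumes a Unix-like system.  It follows
       the wording above literally, so a local webalizer.conf that exists
       but is owned by another user causes neither file to be used.

```python
import os
from typing import Optional


# Hypothetical helper illustrating the default configuration file search.
# Assumes a Unix-like system (os.getuid is not available on Windows).
def find_default_config(cwd: str = ".",
                        system_path: str = "/etc/webalizer.conf") -> Optional[str]:
    """Return the path of the default config file that would be parsed, or None."""
    local = os.path.join(cwd, "webalizer.conf")
    if os.path.isfile(local):
        # Present in the current directory: used only if the invoking user owns it.
        if os.stat(local).st_uid == os.getuid():
            return local
        return None
    # Not present locally: fall back to the system-wide file, if it exists.
    if os.path.isfile(system_path):
        return system_path
    return None
```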

INCREMENTAL PROCESSING

       Version 1.2x of The Webalizer added incremental run capability.  Simply
       put, this allows large log files to be processed by breaking them up
       into smaller pieces and processing those pieces instead.  In practice,
       this means you can rotate your log files as often as you want and
       still produce monthly usage statistics without losing any detail.  The
       Webalizer saves and restores all internal data in a file named
       webalizer.current.  This allows the program to 'start where it left
       off', preserving detail from one run to the next.  The data file is
       placed in the current output directory and is a plain ASCII text file
       that can be viewed with any standard text editor.  Its location and
       name may be changed using the IncrementalName configuration keyword.

       Some special precautions need to be taken when using the incremental
       run capability of The Webalizer.  Configuration options should not be
       changed between runs, because that could corrupt the stored internal
       data.  For example, changing the MangleAgents level causes different
       representations of user agents to be stored, which produces invalid
       results in the user agents section of the report.  If you need to
       change configuration options, do it at the end of the month, after
       normal processing of the previous month and before processing the
       current month.  You may also want to delete the webalizer.current
       file.

       The Webalizer also attempts to prevent data duplication by keeping
       track of the timestamp of the last record processed.  This timestamp
       is then compared to the records currently being processed, and any
       records logged at or before that timestamp are ignored.  In theory,
       this allows you to re-process logs that have already been processed,
       or to process logs that contain a mix of processed and not yet
       processed records, without duplicating statistics.  The only case
       where this may break is when two separate log files contain duplicate
       timestamps.  Any records in the second log file that have the same
       timestamp as the last record processed from the previous log file are
       discarded as if they had already been processed.  There are many ways
       to prevent this.  For example, stopping the web server before rotating
       logs avoids the problem.  This setup also requires that you always
       process logs in chronological order; otherwise data loss will occur as
       a result of the timestamp comparison.
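
       The duplicate check described above can be sketched in Python as
       follows.  Records are reduced to hypothetical (timestamp, line) pairs.
       filter_already_processed is an illustrative name and is not part of
       The Webalizer.  The sketch compares each record only against the mark
       saved from the previous run.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical, simplified record: (timestamp, raw log line).
Record = Tuple[datetime, str]


def filter_already_processed(records: List[Record],
                             saved_mark: Optional[datetime]
                             ) -> Tuple[List[Record], Optional[datetime]]:
    """Drop records logged at or before saved_mark; return kept records and the new mark."""
    kept = [r for r in records if saved_mark is None or r[0] > saved_mark]
    # The new mark is the latest timestamp processed in this run.
    new_mark = max((ts for ts, _ in kept), default=saved_mark)
    return kept, new_mark
```

       Suppose the first run ends with a record stamped 12:00:00 and the next
       log file begins with a different request that is also stamped
       12:00:00.  That request is discarded, which is exactly the
       duplicate-timestamp case described above.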

REVERSE DNS LOOKUPS

       The Webalizer supports reverse DNS lookups through a DNS cache file.
       The file is either created/updated at run time, or was created
       previously by an earlier run of webalizer or by the stand-alone
       version, webazolver.  To perform reverse DNS lookups, a DNSCache
       filename must be specified.  To create/update the cache file at run
       time, the DNSChildren number must be non-zero.  The DNSChildren value
       specifies the number of child processes to fork.  Each child performs
       reverse DNS lookups in order to create/update the DNS cache file.  See
       the file DNS.README for additional information.
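
       The role of DNSChildren can be illustrated with a minimal Python
       sketch, which rests on several assumptions.  A thread pool stands in
       for the forked child processes.  The cache is an in-memory dict rather
       than The Webalizer's on-disk cache file, whose format is not described
       here.  The names update_dns_cache and reverse_lookup are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def reverse_lookup(ip: str) -> str:
    """Resolve an address to a hostname, keeping the address if lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def update_dns_cache(cache: Dict[str, str], addresses: Iterable[str], children: int,
                     resolver: Callable[[str], str] = reverse_lookup) -> Dict[str, str]:
    """Resolve addresses missing from cache using `children` workers (0 = no updates)."""
    if children <= 0:
        return cache  # cache is used read-only, as with DNSChildren 0
    pending = sorted(set(addresses) - set(cache))
    # Threads stand in for forked child processes in this sketch.
    with ThreadPoolExecutor(max_workers=children) as pool:
        for ip, name in zip(pending, pool.map(resolver, pending)):
            cache[ip] = name
    return cache
```

       The resolver parameter allows a stub function to be substituted, so
       the caching logic can be exercised without network access.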

COMMAND LINE OPTIONS

       The Webalizer supports many different configuration options that
       alter the way the program behaves and generates output.  Most of these
       can be specified on the command line, while some can only be specified
       in a configuration file.  The command line options are listed below,
       with references to the corresponding configuration file keywords.

       General Options

       -h      Display all available command line options and exit.

       -v -V   Display the program version and exit.

       -d      Debug.  Display debugging information for errors and warnings.

       -i      IgnoreHist.  Ignore history.  USE WITH CAUTION.  This causes
               The Webalizer to ignore only the previous monthly history
               file.  Incremental data (if present) is still processed.

       -p      Incremental.  Preserve internal data between runs.

       -q      Quiet.  Suppress informational messages.  Does not suppress
               warnings or errors.

       -Q      ReallyQuiet.  Suppress all messages, including warnings and
               errors.

       -T      TimeMe.  Force display of timing information at the end of
               processing.

       -c file Use configuration file file.

       -n name HostName.  Use the hostname name.

       -o dir  OutputDir.  Use output directory dir.

       -t name ReportTitle.  Use name for the report title.

       -F ( clf | ftp | squid )
               LogType.  Specify the log type to be processed.  The value can
               be clf, ftp or squid.  If not specified, CLF format is
               assumed.  FTP logs must be in standard wu-ftpd xferlog format.

       -f      FoldSeqErr.  Fold out-of-sequence log records back into the
               analysis by treating them as if they had the same date/time as
               the last good record.  Normally, out-of-sequence log records
               are simply ignored.

       -Y      CountryGraph.  Suppress the country graph.

       -G      HourlyGraph.  Suppress the hourly graph.

       -x name HTMLExtension.  Define the HTML file extension to use.  If not
               specified, defaults to html.  Do not include the leading
               period.

       -H      HourlyStats.  Suppress hourly statistics.

       -L      GraphLegend.  Suppress color-coded graph legends.

       -l num  GraphLines.  Specify the number of background lines.  Default
               is 2.  Use zero ('0') to disable the lines.

       -P name PageType.  Specify file extensions that are considered pages
               (sometimes referred to as pageviews).

       -m num  VisitTimeout.  Specify the visit timeout period, in seconds.
               Default is 1800 seconds (30 minutes).

       -I name IndexAlias.  Use the filename name as an additional alias for
               index.*.

       -M num  MangleAgents.  Mangle user agent names according to the mangle
               level specified by num.  Mangle levels are:

               5   Browser name and major version.

               4   Browser name, major and minor version.

               3   Browser name, major version, minor version to two decimal
                   places.

               2   Browser name, major and minor versions and sub-version.

               1   Browser name, version and machine type if possible.

               0   All information (left unchanged).

       -g num  GroupDomains.  Automatically group sites by domain.  The
               grouping level specified by num can be thought of as 'the
               number of dots' to display in the grouping.  The default value
               of 0 disables any domain grouping.

       -D name DNSCache.  Use the DNS cache file name.

       -N num  DNSChildren.  Use num DNS child processes to perform DNS
               lookups, either creating or updating the DNS cache file.
               Specify zero (0) to disable cache file creation/updates.  If
               given, a DNS cache filename must also be specified.

       Hide Options

       -a name HideAgent.  Hide user agents matching name.

       -r name HideReferrer.  Hide referrers matching name.

       -s name HideSite.  Hide sites matching name.

       -X      HideAllSites.  Hide all individual sites (only display groups).

       -u name HideURL.  Hide URLs matching name.

       Table Size Options

       -A num  TopAgents.  Display the top num user agents table.

       -R num  TopReferrers.  Display the top num referrers table.

       -S num  TopSites.  Display the top num sites table.

       -U num  TopURLs.  Display the top num URLs table.

       -C num  TopCountries.  Display the top num countries table.

       -e num  TopEntry.  Display the top num entry pages table.

       -E num  TopExit.  Display the top num exit pages table.
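
       One reading of the 'number of dots' rule used by -g (and by the
       GroupDomains keyword) is that a site is grouped under its last num+1
       name components.  The Python sketch below implements that reading.
       group_domain is a hypothetical name.  Leaving numeric IP addresses
       ungrouped is an assumption of the sketch and is not stated here.

```python
import ipaddress


def group_domain(hostname: str, level: int) -> str:
    """Return the domain group for hostname, keeping `level` dots (0 = no grouping)."""
    if level <= 0:
        return hostname
    try:
        ipaddress.ip_address(hostname)
        return hostname  # assumption: numeric addresses are not grouped
    except ValueError:
        pass  # not an IP address, so treat it as a hostname
    labels = hostname.split(".")
    # Keep the rightmost level+1 labels, i.e. `level` dots.
    return ".".join(labels[-(level + 1):])
```

       Under this reading, www.cs.example.edu is grouped as example.edu with
       num set to 1, and as cs.example.edu with num set to 2.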

CONFIGURATION FILES

       Configuration files are standard ascii(7) text files that may be
       created or edited using any standard editor.  Blank lines and lines
       that begin with a pound sign ('#') are ignored.  All other lines are
       configuration lines of the form "Keyword Value".  'Keyword' is one of
       the configuration keywords defined below, and 'Value' is the value to
       assign to that option.  All text after the keyword up to the end of
       the line is taken as the keyword's value, so do not put anything on
       the line after the value that is not actually part of it.  The file
       sample.conf provided with the distribution also contains useful
       documentation and examples.
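
       The line format described above can be sketched with a small Python
       parser.  It is illustrative only.  parse_config is a hypothetical
       name.  Keywords are returned in file order, so repeated keywords such
       as HideSite are all kept.  Ignoring leading whitespace before a '#' is
       an assumption of the sketch.

```python
from typing import List, Tuple


def parse_config(text: str) -> List[Tuple[str, str]]:
    """Split configuration text into (Keyword, Value) pairs."""
    entries: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        parts = line.split(None, 1)
        keyword = parts[0]
        # Everything after the keyword, to the end of the line, is the value.
        value = parts[1].strip() if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries
```

       Under this rule, a line such as "Quiet yes # be quiet" yields the
       value "yes # be quiet".  This is why nothing but the value should
       follow the keyword.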
       General Configuration Keywords

       LogFile name
               Use the log file named name.  If none is specified, STDIN is
               used.

       LogType name
               Specify the log file type as name.  Values can be web, squid
               or ftp, with the default being web.

       OutputDir dir
               Create output in the directory dir.  If none is specified, the
               current directory is used.

       HistoryName name
               Filename to use for the history file.  Relative to the output
               directory unless an absolute name is given (ie: starts with
               '/').  Defaults to 'webalizer.hist' in the standard output
               directory.

       ReportTitle name
               Use the title string name for the report title.  If none is
               specified, the default (in English) of "Usage Statistics for "
               is used.

       Hostname name
               Set the hostname for the report as name.  If none is
               specified, an attempt is made to get the hostname via a
               uname(2) system call.  If that fails, localhost is used.

       UseHTTPS ( yes | no )
               Use https:// on links to URLs, instead of the default http://,
               in the 'Top URLs' table.

       Quiet ( yes | no )
               Suppress informational messages.  Warning and Error messages
               are not suppressed.

       ReallyQuiet ( yes | no )
               Suppress all messages, including Warning and Error messages.

       Debug ( yes | no )
               Print extra debugging information on Warnings and Errors.

       TimeMe ( yes | no )
               Force display of timing information at the end of processing.

       GMTTime ( yes | no )
               Use GMT (UTC) time instead of the local timezone for reports.

       IgnoreHist ( yes | no )
               Ignore the previous monthly history file.  USE WITH CAUTION.
               Does not prevent Incremental file processing.

       FoldSeqErr ( yes | no )
               Fold out-of-sequence log records back into the analysis by
               treating them as if they had the same date/time as the last
               good record.  Normally, out-of-sequence log records are
               ignored.

       CountryGraph ( yes | no )
               Display the Country Usage Graph in the output report.

       DailyGraph ( yes | no )
               Display the Daily Graph in the output report.

       DailyStats ( yes | no )
               Display Daily Statistics in the output report.

       HourlyGraph ( yes | no )
               Display the Hourly Graph in the output report.

       HourlyStats ( yes | no )
               Display Hourly Statistics in the output report.

       PageType name
               Define the file extensions to consider as a page.  If a file
               has the same extension as name, it is counted as a page
               (sometimes called a pageview).

       GraphLegend ( yes | no )
               Enable or disable the color-coded graph legends.

       GraphLines num
               Specify the number of background reference lines displayed on
               the graphs produced.  Disable by using zero ('0'); the default
               is 2.

       VisitTimeout num
               Specify the visit timeout value, in seconds.  Default is 1800
               seconds (30 minutes).  A visit is determined by looking at the
               difference in time between the current and the last request
               from a specific site.  If the difference is greater than or
               equal to the timeout value, the request is counted as a new
               visit.

       IndexAlias name
               Use name as an additional alias for index.*.

       MangleAgents num
               Mangle user agent names based on mangle level num.  See the -M
               command line switch for mangle levels and their meaning.  The
               default is 0, which doesn't mangle user agents at all.

       SearchEngine name variable
               Specify search engines and their query strings.  The name is
               the string to match against the referrer for a given search
               engine.  The variable is the CGI variable that the search
               engine uses for queries.  See the sample.conf file for example
               usage with common search engines.

       Incremental ( yes | no )
               Enable Incremental mode processing.

       IncrementalName name
               Filename to use for incremental data.  Relative to the output
               directory unless an absolute name is given (ie: starts with
               '/').  Defaults to 'webalizer.current' in the standard output
               directory.

       DNSCache name
               Filename to use for the DNS cache.  Relative to the output
               directory unless an absolute name is given (ie: starts with
               '/').

       DNSChildren num
               Number of child DNS processes to run in order to create/update
               the DNS cache file.  Specify zero (0) to disable.
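
       The VisitTimeout rule can be expressed as a short Python sketch.
       count_visits is a hypothetical name.  The sketch assumes requests
       arrive in chronological order as (site, time-in-seconds) pairs.  It
       counts every request it is given and treats a site's first request as
       a new visit.

```python
from typing import Dict, Iterable, Tuple


def count_visits(requests: Iterable[Tuple[str, int]], timeout: int = 1800) -> Dict[str, int]:
    """Count visits per site; a gap >= timeout since the site's last request starts a new visit."""
    last_seen: Dict[str, int] = {}
    visits: Dict[str, int] = {}
    for site, ts in requests:
        prev = last_seen.get(site)
        if prev is None or ts - prev >= timeout:
            visits[site] = visits.get(site, 0) + 1
        last_seen[site] = ts  # the gap is always measured from the latest request
    return visits
```

       With the default of 1800, requests from one site at 0, 1799 and 3599
       seconds form two visits.  The second gap is exactly 1800 seconds,
       which meets the "greater than or equal" condition.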
       Top Table Keywords

       TopAgents num
               Display the top num User Agents table.  Use zero to disable.

       AllAgents ( yes | no )
               Create a separate HTML page with All User Agents.

       TopReferrers num
               Display the top num Referrers table.  Use zero to disable.

       AllReferrers ( yes | no )
               Create a separate HTML page with All Referrers.

       TopSites num
               Display the top num Sites table.  Use zero to disable.

       TopKSites num
               Display the top num Sites (by KByte) table.  Use zero to
               disable.

       AllSites ( yes | no )
               Create a separate HTML page with All Sites.

       TopURLs num
               Display the top num URLs table.  Use zero to disable.

       TopKURLs num
               Display the top num URLs (by KByte) table.  Use zero to
               disable.

       AllURLs ( yes | no )
               Create a separate HTML page with All URLs.

       TopCountries num
               Display the top num Countries in the table.  Use zero to
               disable.

       TopEntry num
               Display the top num Entry Pages in the table.  Use zero to
               disable.

       TopExit num
               Display the top num Exit Pages in the table.  Use zero to
               disable.

       TopSearch num
               Display the top num Search Strings in the table.  Use zero to
               disable.

       AllSearchStr ( yes | no )
               Create a separate HTML page with All Search Strings.

       TopUsers num
               Display the top num Usernames in the table.  Use zero to
               disable.  Usernames are only available if using HTTP-based
               authentication.

       AllUsers ( yes | no )
               Create a separate HTML page with All Usernames.
       Hide/Ignore/Group/Include Keywords

       HideAgent name
               Hide User Agents that match name.

       HideReferrer name
               Hide Referrers that match name.

       HideSite name
               Hide Sites that match name.

       HideAllSites ( yes | no )
               Hide all individual sites.  This causes only grouped sites to
               be displayed.

       HideURL name
               Hide URLs that match name.

       HideUser name
               Hide Usernames that match name.

       IgnoreAgent name
               Ignore User Agents that match name.

       IgnoreReferrer name
               Ignore Referrers that match name.

       IgnoreSite name
               Ignore Sites that match name.

       IgnoreURL name
               Ignore URLs that match name.

       IgnoreUser name
               Ignore Usernames that match name.

       GroupAgent name [Label]
               Group User Agents that match name.  Display Label in the 'Top
               Agent' table if given (instead of name).

       GroupReferrer name [Label]
               Group Referrers that match name.  Display Label in the 'Top
               Referrer' table if given (instead of name).

       GroupSite name [Label]
               Group Sites that match name.  Display Label in the 'Top Site'
               table if given (instead of name).

       GroupDomains num
               Automatically group sites by domain.  The value num specifies
               the level of grouping, and can be thought of as the 'number of
               dots' to be displayed.  The default value of 0 disables domain
               grouping.

       GroupURL name [Label]
               Group URLs that match name.  Display Label in the 'Top URL'
               table if given (instead of name).

       GroupUser name [Label]
               Group Usernames that match name.  Display Label in the 'Top
               Usernames' table if given (instead of name).

       IncludeSite name
               Force inclusion of sites that match name.  Takes precedence
               over Ignore* keywords.

       IncludeURL name
               Force inclusion of URLs that match name.  Takes precedence
               over Ignore* keywords.

       IncludeReferrer name
               Force inclusion of Referrers that match name.  Takes
               precedence over Ignore* keywords.

       IncludeAgent name
               Force inclusion of User Agents that match name.  Takes
               precedence over Ignore* keywords.

       IncludeUser name
               Force inclusion of Usernames that match name.  Takes
               precedence over Ignore* keywords.
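
       The interaction between the Hide*, Ignore* and Include* keywords can
       be sketched in Python.  The sketch reflects the usual reading of these
       keywords, not The Webalizer's actual code: Ignore* drops matching
       items from the statistics, Hide* keeps them out of the tables only,
       and Include* overrides Ignore*.  classify is a hypothetical name.
       Plain substring matching is an assumption, since the exact matching
       rules are not described here.

```python
from typing import Iterable, Tuple


def _matches(value: str, patterns: Iterable[str]) -> bool:
    # Assumption: a pattern matches if it occurs anywhere in the value.
    return any(p in value for p in patterns)


def classify(value: str, ignore: Iterable[str] = (), include: Iterable[str] = (),
             hide: Iterable[str] = ()) -> Tuple[bool, bool]:
    """Return (counted, shown) for one site, URL, referrer, agent or username."""
    # Include* takes precedence over Ignore*.
    counted = _matches(value, include) or not _matches(value, ignore)
    # Hidden items are still counted but are not listed in the tables.
    shown = counted and not _matches(value, hide)
    return counted, shown
```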
       HTML Generation Keywords

       HTMLExtension text
               Define the HTML file extension to use.  Default is html.  Do
               not include the leading period!

       HTMLPre text
               Insert text at the very beginning of the generated HTML file.
               Defaults to a standard HTML 3.2 DOCTYPE record.

       HTMLHead text
               Insert text within the <HEAD></HEAD> block of the HTML file.

       HTMLBody text
               Insert text in the HTML page, starting with the <BODY> tag.
               If used, the first line must be a <BODY ...> tag.  Multiple
               lines may be specified.

       HTMLPost text
               Insert text at the top (before the horizontal rule) of HTML
               pages.  Multiple lines may be specified.

       HTMLTail text
               Insert text at the bottom of the HTML page.  The text is top
               and right aligned within a table column at the end of the
               report.

       HTMLEnd text
               Insert text at the very end of the HTML page.  If not
               specified, the default is to insert the ending </BODY> and
               </HTML> tags.  If used, you must supply these tags yourself.
       Dump Object Keywords

       The Webalizer can export processed data to other programs as
       tab-delimited text files.  The Dump* keywords specify which files are
       written, and where.

       DumpPath name
               Save dump files in directory name.  If not specified, the
               default output directory is used.  Do not specify a trailing
               slash ('/').

       DumpExtension name
               Use name as the filename extension for dump files.  If not
               given, the default of tab is used.

       DumpHeader ( yes | no )
               Print a column header as the first record of the file.

       DumpSites ( yes | no )
               Dump the sites data to a tab-delimited file.

       DumpURLs ( yes | no )
               Dump the URL data to a tab-delimited file.

       DumpReferrers ( yes | no )
               Dump the referrer data to a tab-delimited file.  This data is
               only available if using a log that contains referrer
               information (ie: a combined format web log).

       DumpAgents ( yes | no )
               Dump the user agent data to a tab-delimited file.  This data
               is only available if using a log that contains user agent
               information (ie: a combined format web log).

       DumpUsers ( yes | no )
               Dump the username data to a tab-delimited file.  This data is
               only available if processing a wu-ftpd xferlog or a web log
               that contains HTTP authentication information.

       DumpSearchStr ( yes | no )
               Dump the search string data to a tab-delimited file.  This
               data is only available if processing a web log that contains
               referrer information with search string information present.
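
       The dump files are plain tab-delimited text, so other programs can
       load them easily.  The Python sketch below reads one with the standard
       csv module.  read_dump is a hypothetical name.  The column layout of
       each dump file is not described here, so the sketch only splits
       fields.  Quoting is disabled because the sketch assumes that fields
       such as user agents may contain literal quote characters.

```python
import csv
from typing import List, TextIO, Tuple


def read_dump(stream: TextIO, has_header: bool) -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows) from a tab-delimited dump file."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip empty lines
    # With DumpHeader enabled, the first record holds the column names.
    header = rows.pop(0) if has_header and rows else []
    return header, rows
```

       For example: with open("site_200110.tab", newline="") as f: header,
       rows = read_dump(f, has_header=True).  The filename here is
       hypothetical and follows the xxxxx_YYYYMM.tab pattern listed under
       FILES.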

FILES

       webalizer.conf      Default configuration file.  Searched for in the
                           current directory and, if not found, in the /etc/
                           directory.

       webalizer.hist      Monthly history file for the previous 12 months
                           (name and location can be changed).

       webalizer.current   Current state data file (Incremental processing)
                           (name and location can be changed).

       xxxxx_YYYYMM.html   Various monthly HTML output files (extension can
                           be changed).

       xxxxx_YYYYMM.png    Various monthly image files used in the reports.

       xxxxx_YYYYMM.tab    Monthly tab-delimited text files (extension can
                           be changed).

BUGS

       Report bugs to brad@mrunix.net.

COPYRIGHT

       Copyright (C) 1997-2000 by Bradford L. Barrett.  Distributed under the
       GNU GPL.  See the files "COPYING" and "Copyright", supplied with all
       distributions, for additional information.

AUTHOR

       Bradford L. Barrett <brad@mrunix.net>

Version 2.01                      22-Oct-2001                     webalizer(1)