PMDAWEBLOG(1)               General Commands Manual              PMDAWEBLOG(1)

NAME

       pmdaweblog - performance metrics domain agent (PMDA) for Web server
       logs

SYNOPSIS

       $PCP_PMDAS_DIR/weblog/pmdaweblog [-Cp] [-d domain] [-h helpfile]
       [-i port] [-l logfile] [-n idlesec] [-S num] [-t delay] [-u socket]
       configfile

DESCRIPTION

       pmdaweblog is a Performance Metrics Domain Agent (PMDA(3)) that scans
       Web server logs to extract metrics characterizing Web server activity.
       These performance metrics are then made available through the
       infrastructure of the Performance Co-Pilot (PCP).

       The configfile specifies which Web servers are to be monitored, their
       associated access logs and error logs, and a regular-expression based
       scheme for extracting detailed information about each Web access.  This
       file is maintained as part of the PMDA installation and/or
       de-installation by the scripts Install and Remove in the directory
       $PCP_PMDAS_DIR/weblog.  For more details, refer to the INSTALLATION
       section below.

       Once started, pmdaweblog monitors a set of log files and, in response
       to a request for information, processes any new information that has
       been appended to the log files, similar to tail(1).  There is also a
       periodic "catch up" to process new information from all log files, and
       a scheme to detect the rotation of log files.
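       The tail(1)-like behaviour described above can be pictured with a
       small Python sketch.  It is only an illustration and not the agent's
       actual implementation: the class name LogTail and its method are
       hypothetical, and the byte-offset bookkeeping is an assumption about
       how "new information" might be tracked.

```python
import os


class LogTail:
    """Remember how far a log file has been read and return only new lines.

    Hypothetical helper for illustration; pmdaweblog itself is not written
    this way.
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte offset of the first unread byte

    def read_new_lines(self):
        """Return the complete lines appended since the previous call."""
        try:
            size = os.path.getsize(self.path)
        except OSError:
            return []  # the file does not exist (yet); try again later
        if size < self.offset:
            self.offset = 0  # the file shrank, so assume it was truncated
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume only up to the last newline so that a partially written
        # line is left in place for the next call.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", "replace").splitlines()
```

       Each call returns only what was appended since the previous call, so
       the cost of answering a request depends on the amount of new log data
       rather than on the total size of the log.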
       
       Like all other PMDAs, pmdaweblog is launched by pmcd(1) using command
       line options specified in $PCP_PMCDCONF_PATH.  The Install script will
       prompt for appropriate values for the command line options, and update
       $PCP_PMCDCONF_PATH.

       A brief description of the pmdaweblog command line options follows:

       -C     Check the configuration and exit.

       -d domain
              Specify the domain number.  It is absolutely crucial that the
              performance metrics domain number specified here is unique and
              consistent.  That is, domain should be different for every PMDA
              on the one host, and the same domain number should be used for
              the pmdaweblog PMDA on all hosts.

              For most installations, the default domain as encapsulated in
              the file $PCP_PMDAS_DIR/weblog/domain.h will suffice.  For
              alternate values, check $PCP_PMCDCONF_PATH for the domain values
              already in use on this host.  The file $PCP_VAR_DIR/pmns/stdpmid
              contains a repository of "well known" domain assignments that
              probably should be avoided.

       -h helpfile
              Get the help text from the supplied helpfile rather than from
              the default location.

       -i port
              Communicate with pmcd(1) on the specified Internet port (which
              may be a number or a name).

       -l logfile
              Location of the log file.  By default, a log file named
              weblog.log is written in the current directory of pmcd(1) when
              pmdaweblog is started, i.e. $PCP_LOG_DIR/pmcd.  If the log file
              cannot be created or is not writable, output is written to the
              standard error instead.

       -n idlesec
              If a Web server log file has not been modified for idlesec
              seconds, then the file will be closed and re-opened.  This is
              the only way pmdaweblog can detect any asynchronous rotation of
              the logs by Web server administrative scripts.  The default
              period is 20 seconds.  This value may be changed dynamically
              using pmstore(1) to modify the value of the performance metric
              web.config.check.

       -p     Communicate with pmcd(1) via a pipe.

       -S num Specify the maximum number of Web servers per sproc.  It may be
              desirable (from a latency and load balancing perspective) or
              necessary (due to file descriptor limits) to delegate
              responsibility for scanning the Web server log files to several
              sprocs.  pmdaweblog will ensure that each sproc handles the log
              files for at most num Web servers.  The default value is 80 Web
              servers per sproc.

       -t delay
              To avoid the need to scan a lot of information from the Web
              server logs in response to a single request for performance
              metrics, all log files will be checked at least once every
              delay seconds.  The default is 15 seconds.  This value may be
              changed dynamically using pmstore(1) to modify the value of the
              performance metric web.config.catchup.

       -u socket
              Communicate with pmcd(1) via the given Unix domain socket.
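       The close and re-open behaviour described for -n can be sketched in
       Python as follows.  This is a rough illustration under stated
       assumptions, not the agent's code: the class WatchedLog, its clock
       parameter and the device/inode comparison used to decide whether the
       re-opened name is a new file are all hypothetical choices made for
       this sketch, and it assumes a POSIX file system.

```python
import os
import time


class WatchedLog:
    """Hold a log file open and re-open it by name after an idle period.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path, idle_sec=20.0, clock=time.monotonic):
        self.path = path
        self.idle_sec = idle_sec      # plays the role of the -n value
        self.clock = clock
        self.handle = open(path, "rb")
        self.last_activity = clock()

    def poll(self):
        """Return newly appended bytes, re-opening the file if it was idle."""
        data = self.handle.read()
        now = self.clock()
        if data:
            self.last_activity = now
            return data
        if now - self.last_activity >= self.idle_sec:
            # Nothing new for idle_sec seconds: the name may now refer to a
            # different file, so close the old handle and open the name again.
            old = os.fstat(self.handle.fileno())
            pos = self.handle.tell()
            self.handle.close()
            self.handle = open(self.path, "rb")
            new = os.fstat(self.handle.fileno())
            if (new.st_dev, new.st_ino) == (old.st_dev, old.st_ino):
                # Assumption of this sketch: same file, so skip what was
                # already read instead of reading it a second time.
                self.handle.seek(pos)
            self.last_activity = now
            data = self.handle.read()
        return data
```

       A shorter idle period notices a rotated log sooner at the cost of
       more frequent re-opens, which is the trade-off the -n option exposes.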

INSTALLATION

       The PCP framework allows metrics to be collected on one host and
       monitored from another.  These hosts are referred to as collector and
       monitor hosts, respectively.  A host may be both a collector and a
       monitor.

       Collector hosts require the installation of the agent, while monitoring
       hosts require no agent installation at all.

       For collector hosts do the following as root:

         # cd $PCP_PMDAS_DIR/weblog
         # ./Install

       The installation procedure prompts for a default or non-default
       installation.  A default installation will search for known server
       configurations and automatically configure the PMDA for any server log
       files that are found.  A non-default installation will step through
       each server, prompting the user for other server configurations and
       arguments to pmdaweblog.  The end result of a collector installation is
       to build a configuration file that is passed to pmdaweblog via the
       configfile argument.

       If you want to undo the installation, do the following as root:

         # cd $PCP_PMDAS_DIR/weblog
         # ./Remove

       pmdaweblog is launched by pmcd(1) and should never be executed
       directly.  The Install and Remove scripts notify pmcd(1) when the agent
       is installed or removed.

CONFIGURATION

       The configuration file for the weblog PMDA is an ASCII file that can be
       easily modified.  Empty lines and lines beginning with '#' are ignored.
       All other lines must be either a regular expression or a server
       specification.

       Regular expressions, which are used on both the access and error log
       files, must be of the form:

         regex regexName regexp
       or

         regex_posix regexName ordering regexp_posix

       The regexName is a word which uniquely identifies the regular
       expression.  This is the reference used in the server specification.
       The regexp for access logs is in the format described for regcmp(3).
       The regexp_posix for access logs is in the format described for
       regcomp(3).  Note that on IRIX releases after 6.2, the POSIX-compliant
       form is not recommended, for performance reasons.  The argument
       ordering is explained below.  The POSIX form should be available on
       all platforms.

       The regular expression requires the specification of up to four
       arguments to be extracted from each line of a Web server access log,
       depending on the type of server.  In the most common case there are two
       arguments representing the method and the size.

       For the non-POSIX version, argument $0 should contain the method: GET,
       HEAD, POST or PUT.  The method PUT is treated as a synonym for POST,
       and anything else is categorized as OTHER.

       The second argument, $1, should contain the size of the request.  A
       size of "-" or " " is treated as unknown.
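       The treatment of the method and size arguments described above can be
       restated as a short Python sketch.  The function names are
       hypothetical, and upper-casing the method before comparison is an
       assumption of this sketch rather than documented behaviour.

```python
def classify_method(method):
    """Map an extracted request method to one of the categories above."""
    # Assumption of this sketch: compare case-insensitively.
    method = method.strip().upper()
    if method == "PUT":
        return "POST"  # PUT is treated as a synonym for POST
    if method in ("GET", "HEAD", "POST"):
        return method
    return "OTHER"


def parse_size(field):
    """Return the size as an int, or None when the size is unknown."""
    field = field.strip()
    if field in ("", "-"):
        return None  # "-" or a blank field means the size is unknown
    return int(field)
```

       For example, classify_method("PUT") gives "POST" and parse_size("-")
       gives None.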
       
       Argument $2 should contain the status code returned to the client
       browser and argument $3 should contain the status code returned to the
       server from a remote host.  These latter two arguments are used for
       caching servers and must be specified as a pair (or $2 will be
       ignored).  For further information on status codes, refer to the web
       site http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
       
       Some legal non-POSIX regex expression specifications for monitoring an
       access log are:

         # pattern for CERN, NCSA, Netscape etc Access Logs
         regex CERN ] "([A-Za-z][-A-Za-z]+)$0 .*" [-0-9]+ ([-0-9]+)$1

         # pattern for FTP Server access logs (normally in SYSLOG)
         regex SYSLOG_FTP ftpd[.*]: ([gp][-A-Za-z]+)$0( )$1

       There is one special type of access log, identified by the regexName
       SQUID.  This format extracts four parameters, but since the Squid log
       file uses text-based status codes, it is handled as a special case.

       In the examples below, NS_PROXY parses the Netscape/W3C Common Extended
       Log Format and SQUID parses the default Squid Object Cache format log
       file.

         # pattern for Netscape Proxy Server Extended Logs
         regex NS_PROXY ] "([A-Za-z][-A-Za-z]+)$0 .*" ([-0-9]+)$2 \
              ([-0-9]+)$1 ([-0-9]+)$3

         # pattern for Squid Cache logs
         regex SQUID [0-9]+.[0-9]+[ ]+[0-9]+ [a-zA-Z0-9.]+ \
              ([_A-Z]+)$3([0-9]+)$2 ([0-9]+)$1 ([A-Z]+)$0

       The regexp for the error logs does not require any arguments, only a
       match.  Some legal expressions are:

         # pattern for CERN, NCSA, Netscape etc Error Logs
         regex CERN_err .

         # pattern for FTP Server error logs (normally in SYSLOG)
         regex SYSLOG_FTP_err FTP LOGIN FAILED

       If POSIX-compliant regular expressions are used, additional information
       is required since the order of parameters cannot be specified in the
       regular expression.  For backwards compatibility, in the common case of
       two parameters the order may be specified as method,size or
       size,method.  In the general case, the ordering is specified by one of
       the following methods:

       n1,n2,n3,n4
            where nX is a digit between 1 and 4.  Each comma-separated field
            represents (in order) the argument number for
            method,size,client_status,server_status

       -    Used for cases like the error logs where the content is ignored.

       As for the non-POSIX format, the SQUID regexName is treated as a
       special case to match the non-numerical status codes.
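       One way to read the ordering field is as a map from field name to
       the number of the parenthesized group that supplies it.  The Python
       sketch below illustrates that reading; the function name and the
       returned dictionary are hypothetical, and accepting fewer than four
       digits (such as "1,2") is an assumption of this sketch.

```python
FIELDS = ("method", "size", "client_status", "server_status")


def parse_ordering(spec):
    """Return {field name: 1-based group number} for an ordering field."""
    if spec == "-":
        return {}  # content is ignored, only a match is needed
    if spec == "method,size":
        return {"method": 1, "size": 2}
    if spec == "size,method":
        return {"size": 1, "method": 2}
    parts = spec.split(",")
    # Assumption of this sketch: two to four digits are accepted, each
    # naming the group that holds the field in the same position of FIELDS.
    if not 2 <= len(parts) <= 4:
        raise ValueError("bad ordering: %r" % spec)
    if any(p not in ("1", "2", "3", "4") for p in parts):
        raise ValueError("bad ordering: %r" % spec)
    return {name: int(p) for name, p in zip(FIELDS, parts)}
```

       With this reading, "1,3,2,4" says that the method is in group 1, the
       size in group 3, the client status in group 2 and the server status
       in group 4.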
       
       Some legal POSIX regex expression specifications for monitoring an
       access log are:

         # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
         regex_posix CERN method,size ][ \]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" [-0-9]+ ([-0-9]+)

         # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
         regex_posix CERN 1,2 ][ \]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" [-0-9]+ ([-0-9]+)

         # pattern for FTP Server access logs (normally in SYSLOG)
         regex_posix SYSLOG_FTP method,size ftpd[.*]: \
              ([gp][-A-Za-z]+)( )

         # pattern for Netscape Proxy Server Extended Logs
         regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)

         # pattern for Squid Cache logs
         regex_posix SQUID 4,3,2,1 [0-9]+.[0-9]+[ ]+[0-9]+ \
              [a-zA-Z0-9.]+ ([_A-Z]+)([0-9]+) ([0-9]+) ([A-Z]+)

         # pattern for CERN, NCSA, Netscape etc Error Logs
         regex_posix CERN_err - .

         # pattern for FTP Server error logs (normally in SYSLOG)
         regex_posix SYSLOG_FTP_err - FTP LOGIN FAILED
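       To see what such a pattern extracts, the Python sketch below applies
       a CERN-style expression to one sample access log line.  The pattern
       is a Python adaptation written for this illustration, not the
       configuration text above, and the sample line, the function name and
       the ORDER mapping are likewise made up for the example.

```python
import re

# Python adaptation of a CERN-style access log pattern (not the POSIX text
# from the configuration file): group 1 is the method, group 2 the size.
CERN = re.compile(r'\] "([A-Za-z][-A-Za-z]+) [^"]*" [-0-9]+ ([-0-9]+)')

# What the ordering "method,size" expresses: field name -> group number.
ORDER = {"method": 1, "size": 2}


def extract(line, pattern=CERN, order=ORDER):
    """Return {field: text} for a matching line, or None if no match."""
    m = pattern.search(line)
    if m is None:
        return None
    return {name: m.group(idx) for name, idx in order.items()}


# A made-up line in Common Log Format.
sample = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
print(extract(sample))
```

       For the sample line the extracted method is GET and the extracted
       size is 2326; a line that does not match yields None.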
       
       A Web server can be specified using this syntax:

         server serverName on|off accessRegex accessFile errorRegex errorFile

       The serverName must be unique for each server, and is the name given to
       the instance for the associated performance metrics.  See PMAPI(3) for
       a discussion of PCP instance domains.  The on or off flag indicates
       whether the server is to be monitored when the PMDA is installed.  This
       can be altered dynamically using pmstore(1) for the metric
       web.perserver.watched, which has one instance for each Web server named
       in configfile.

       Two files are monitored for each Web server: the access log and the
       error log.  Each file requires the name of a previously declared
       regular expression, and a file name.  The log files specified for each
       server do not have to exist when the weblog PMDA is installed.  The
       PMDA will continue to check for non-existent log files and open them
       when possible.  Some legal server specifications are:

         # Netscape Server on Port 80 at IP address 127.55.555.555
         server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors

         # FTP Server.
         server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
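       The shape of a server specification can be checked with a few lines
       of Python.  This is a minimal sketch, not the parser used by the
       agent: the Server tuple and the function name are hypothetical, and
       it assumes that fields are separated by white space and that file
       names contain no spaces.

```python
from collections import namedtuple

Server = namedtuple(
    "Server",
    "name watched access_regex access_file error_regex error_file")


def parse_server_line(line):
    """Parse one 'server' line; return None for any other kind of line."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # empty lines and comments are ignored
    fields = line.split()
    if fields[0] != "server":
        return None  # for example a regex or regex_posix line
    if len(fields) != 7 or fields[2] not in ("on", "off"):
        raise ValueError("malformed server line: %r" % line)
    _, name, flag, a_regex, a_file, e_regex, e_file = fields
    return Server(name, flag == "on", a_regex, a_file, e_regex, e_file)
```

       A fuller check would also confirm that both regular expression names
       were declared earlier in the file, as the text above requires.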

CAVEATS

       Specifying regular expressions with an incorrect number of arguments
       (anything other than 2 for access logs, and none for error logs) may
       cause the PMDA to behave incorrectly and even crash.  This is due to
       limitations in the interface of regex(3).

FILES

       $PCP_PMDAS_DIR/weblog
                 installation directory for the weblog PMDA

       $PCP_PMDAS_DIR/weblog/Install
                 installation script for the weblog PMDA

       $PCP_PMDAS_DIR/weblog/Remove
                 de-installation script for the weblog PMDA

       $PCP_LOG_DIR/pmcd/weblog.log
                 default log file for error reporting

       $PCP_PMCDCONF_PATH
                 pmcd configuration file that specifies the command line
                 options to be used when pmdaweblog is launched

       $PCP_LOG_DIR/NOTICES
                 log of PMDA installations and removals

       $PCP_VAR_DIR/config/web/weblog.conf
                 likely location of the weblog PMDA configuration file

       $PCP_DOC_DIR/pcpweb/index.html
                 the online HTML documentation for PCPWEB

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP.  On each installation, the file
       /etc/pcp.conf contains the local values for these variables.  The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(4).

SEE ALSO

       pmcd(1), pmchart(1), pmdawebping(1), pminfo(1), pmstore(1), pmview(1),
       tail(1), weblogvis(1), webvis(1), PMAPI(3), PMDA(3) and regcmp(3).



Performance Co-Pilot                  SGI                        PMDAWEBLOG(1)