PMDAWEBLOG(1)               General Commands Manual              PMDAWEBLOG(1)

NAME

       pmdaweblog - performance metrics domain agent (PMDA) for Web server
       logs

SYNOPSIS

       $PCP_PMDAS_DIR/weblog/pmdaweblog [-Cp] [-d domain] [-h helpfile]
       [-i port] [-l logfile] [-n idlesec] [-S num] [-t delay] [-u socket]
       [-U username] configfile

DESCRIPTION

       pmdaweblog is a Performance Metrics Domain Agent (PMDA(3)) that scans
       Web server logs to extract metrics characterizing Web server activity.
       These performance metrics are then made available through the
       infrastructure of the Performance Co-Pilot (PCP).

       The configfile specifies which Web servers are to be monitored, their
       associated access logs and error logs, and a regular-expression based
       scheme for extracting detailed information about each Web access. This
       file is maintained as part of the PMDA installation and/or
       de-installation by the scripts Install and Remove in the directory
       $PCP_PMDAS_DIR/weblog. For more details, refer to the section below
       covering installation.

       Once started, pmdaweblog monitors a set of log files and, in response
       to a request for information, processes any new information that has
       been appended to the log files, similar to tail(1). There is also a
       periodic "catch up" to process new information from all log files, and
       a scheme to detect the rotation of log files.
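
       The tail-like behaviour described above can be illustrated with a
       short Python sketch. This is not the pmdaweblog implementation: the
       class name LogFollower and the inode comparison used to spot a rotated
       file are assumptions made for illustration only (the agent itself
       relies on closing and re-opening idle files, see the -n option).

```python
import os


class LogFollower:
    """Hypothetical helper: return the lines appended since the last poll."""

    def __init__(self, path):
        self.path = path
        self.inode = None   # inode of the file that was read last time
        self.offset = 0     # byte offset of the first unread byte

    def poll(self):
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return []       # log file does not exist yet; retry on next poll
        # A different inode or a shrunken file suggests rotation or
        # truncation, so restart from the beginning of the current file.
        if st.st_ino != self.inode or st.st_size < self.offset:
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume complete lines only; a partial final line is left for the
        # next poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", "replace").splitlines()
```

       Calling poll() on each request, and again from a periodic timer,
       mirrors the on-demand and "catch up" processing described above.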

       Like all other PMDAs, pmdaweblog is launched by pmcd(1) using command
       line options specified in $PCP_PMCDCONF_PATH. The Install script will
       prompt for appropriate values for the command line options, and update
       $PCP_PMCDCONF_PATH.

       A brief description of the pmdaweblog command line options follows:

       -C     Check the configuration and exit.

       -d domain
              Specify the domain number. It is absolutely crucial that the
              performance metrics domain number specified here is unique and
              consistent. That is, domain should be different for every PMDA
              on the one host, and the same domain number should be used for
              the pmdaweblog PMDA on all hosts.

              For most installations, the default domain as encapsulated in
              the file $PCP_PMDAS_DIR/weblog/domain.h will suffice. For
              alternate values, check $PCP_PMCDCONF_PATH for the domain values
              already in use on this host; the file $PCP_VAR_DIR/pmns/stdpmid
              contains a repository of "well known" domain assignments that
              should probably be avoided.

       -h helpfile
              Get the help text from the supplied helpfile rather than from
              the default location.

       -i port
              Communicate with pmcd(1) on the specified Internet port (which
              may be a number or a name).

       -l logfile
              Location of the log file. By default, a log file named
              weblog.log is written in the current directory of pmcd(1) when
              pmdaweblog is started, i.e. $PCP_LOG_DIR/pmcd. If the log file
              cannot be created or is not writable, output is written to the
              standard error instead.

       -n idlesec
              If a Web server log file has not been modified for idlesec
              seconds, then the file will be closed and re-opened. This is the
              only way pmdaweblog can detect any asynchronous rotation of the
              logs by Web server administrative scripts. The default period
              is 20 seconds. This value may be changed dynamically using
              pmstore(1) to modify the value of the performance metric
              web.config.check.

       -p     Communicate with pmcd(1) via a pipe.

       -S num Specify the maximum number of Web servers per sproc. It may be
              desirable (from a latency and load balancing perspective) or
              necessary (due to file descriptor limits) to delegate
              responsibility for scanning the Web server log files to several
              sprocs. pmdaweblog will ensure that each sproc handles the log
              files for at most num Web servers. The default value is 80 Web
              servers per sproc.

       -t delay
              To avoid the need to scan a lot of information from the Web
              server logs in response to a single request for performance
              metrics, all log files will be checked at least once every delay
              seconds. The default is 15 seconds. This value may be changed
              dynamically using pmstore(1) to modify the value of the
              performance metric web.config.catchup.

       -u socket
              Communicate with pmcd(1) via the given Unix domain socket.

       -U username
              User account under which to run the agent. The default is the
              unprivileged "pcp" account in current versions of PCP, but in
              older versions the superuser account ("root") was used by
              default.

INSTALLATION

       The PCP framework allows metrics to be collected on one host and
       monitored from another. These hosts are referred to as collector and
       monitor hosts, respectively. A host may be both a collector and a
       monitor.

       Collector hosts require the installation of the agent, while monitoring
       hosts require no agent installation at all.

       For collector hosts do the following as root:

         # cd $PCP_PMDAS_DIR/weblog
         # ./Install

       The installation procedure prompts for a default or non-default
       installation. A default installation will search for known server
       configurations and automatically configure the PMDA for any server log
       files that are found. A non-default installation will step through each
       server, prompting the user for other server configurations and
       arguments to pmdaweblog. The end result of a collector installation is
       to build a configuration file that is passed to pmdaweblog via the
       configfile argument.

       If you want to undo the installation, do the following as root:

         # cd $PCP_PMDAS_DIR/weblog
         # ./Remove

       pmdaweblog is launched by pmcd(1) and should never be executed
       directly. The Install and Remove scripts notify pmcd(1) when the agent
       is installed or removed.

CONFIGURATION

       The configuration file for the weblog PMDA is an ASCII file that can be
       easily modified. Empty lines and lines beginning with '#' are ignored.
       All other lines must be either a regular expression or a server
       specification.

       Regular expressions, which are used on both the access and error log
       files, must be of the form:

         regex regexName regexp
       or

         regex_posix regexName ordering regexp_posix

       The regexName is a word which uniquely identifies the regular
       expression. This is the reference used in the server specification. The
       regexp for access logs is in the format described for regcmp(3). The
       regexp_posix for access logs is in the format described for regcomp(3).
       The argument ordering is explained below. The POSIX form should be
       available on all platforms.

       The regular expression requires the specification of up to four
       arguments to be extracted from each line of a Web server access log,
       depending on the type of server. In the most common case there are two
       arguments representing the method and the size.

       For the non-POSIX version, argument $0 should contain the method: GET,
       HEAD, POST or PUT. The method PUT is treated as a synonym for POST,
       and anything else is categorized as OTHER.

       The second argument, $1, should contain the size of the request. A
       size of "-" or " " is treated as unknown.
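
       The handling of the method and size arguments can be summarized in a
       small Python sketch. The function names are hypothetical, and folding
       the method to upper case is an assumption (suggested by, but not stated
       in, the FTP pattern that matches lower-case commands); the agent's real
       logic may differ.

```python
def classify_method(method):
    """Map a logged method onto the categories GET, HEAD, POST and OTHER."""
    m = method.strip().upper()      # case folding is an assumption
    if m == "PUT":
        return "POST"               # PUT is treated as a synonym for POST
    if m in ("GET", "HEAD", "POST"):
        return m
    return "OTHER"


def parse_size(field):
    """Return the request size as an int, or None when it is unknown."""
    if field.strip() in ("", "-"):  # "-" or " " means the size is unknown
        return None
    return int(field)
```

       For example, classify_method("PUT") yields "POST" and parse_size("-")
       yields None.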

       Argument $2 should contain the status code returned to the client
       browser and argument $3 should contain the status code returned to the
       server from a remote host. These latter two arguments are used for
       caching servers and must be specified as a pair (or $2 will be
       ignored). For further information on status codes, refer to the web
       site http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.

       Some legal non-POSIX regex specifications for monitoring an access log
       are:

         # pattern for CERN, NCSA, Netscape etc Access Logs
         regex CERN ] "([A-Za-z][-A-Za-z]+)$0 .*" [-0-9]+ ([-0-9]+)$1

         # pattern for FTP Server access logs (normally in SYSLOG)
         regex SYSLOG_FTP ftpd[.*]: ([gp][-A-Za-z]+)$0( )$1

       There is one special type of access log, identified by the regexName
       SQUID. This format extracts four parameters, but since the Squid log
       file uses text-based status codes, it is handled as a special case.

       In the examples below, NS_PROXY parses the Netscape/W3C Common Extended
       Log Format and SQUID parses the default Squid Object Cache format log
       file.

         # pattern for Netscape Proxy Server Extended Logs
         regex NS_PROXY ] "([A-Za-z][-A-Za-z]+)$0 .*" ([-0-9]+)$2 \
              ([-0-9]+)$1 ([-0-9]+)$3

         # pattern for Squid Cache logs
         regex SQUID [0-9]+.[0-9]+[ ]+[0-9]+ [a-zA-Z0-9.]+ \
              ([_A-Z]+)$3([0-9]+)$2 ([0-9]+)$1 ([A-Z]+)$0

       The regexp for the error logs does not require any arguments, only a
       match. Some legal expressions are:

         # pattern for CERN, NCSA, Netscape etc Error Logs
         regex CERN_err .

         # pattern for FTP Server error logs (normally in SYSLOG)
         regex SYSLOG_FTP_err FTP LOGIN FAILED

       If POSIX compliant regular expressions are used, additional information
       is required since the order of parameters cannot be specified in the
       regular expression. For backwards compatibility, in the common case of
       two parameters the order may be specified as method,size or
       size,method. In the general case, the ordering is specified by one of
       the following methods:

       n1,n2,n3,n4
            where nX is a digit between 1 and 4. Each comma-separated field
            represents (in order) the argument number for
            method,size,client_status,server_status

       -    Used for cases like the error logs where the content is ignored.

       As for the non-POSIX format, the SQUID regexName is treated as a
       special case to match the non-numerical status codes.
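
       To make the ordering argument concrete, the following Python sketch
       maps capture groups to the four named fields. It uses Python's re
       module rather than regcomp(3), so pattern syntax may differ in detail;
       the function name extract and the FIELDS tuple are hypothetical, and
       the SQUID special case is not modelled.

```python
import re

FIELDS = ("method", "size", "client_status", "server_status")


def extract(pattern, ordering, line):
    """Match line against pattern and name its capture groups.

    ordering is "-" (match only), one of the two-parameter forms
    "method,size" or "size,method", or comma-separated group numbers given
    in the order method,size,client_status,server_status.
    """
    m = re.search(pattern, line)
    if m is None:
        return None                 # line does not match at all
    if ordering == "-":
        return {}                   # content is ignored, only the match counts
    # Backwards-compatible spellings for the two-parameter case.
    if ordering == "method,size":
        ordering = "1,2"
    elif ordering == "size,method":
        ordering = "2,1"
    groups = [int(n) for n in ordering.split(",")]
    return {name: m.group(g) for name, g in zip(FIELDS, groups)}
```

       With an ordering of 1,3,2,4 the first group is taken as the method,
       the third as the size, the second as the client status and the fourth
       as the server status.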

       Some legal POSIX regex specifications for monitoring an access log
       are:

         # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
         regex_posix CERN method,size ][ \]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" [-0-9]+ ([-0-9]+)

         # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
         regex_posix CERN 1,2 ][ \]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" [-0-9]+ ([-0-9]+)

         # pattern for FTP Server access logs (normally in SYSLOG)
         regex_posix SYSLOG_FTP method,size ftpd[.*]: \
              ([gp][-A-Za-z]+)( )

         # pattern for Netscape Proxy Server Extended Logs
         regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A-Za-z][-A-Za-z]+) \
              [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)

         # pattern for Squid Cache logs
         regex_posix SQUID 4,3,2,1 [0-9]+.[0-9]+[ ]+[0-9]+ \
              [a-zA-Z0-9.]+ ([_A-Z]+)([0-9]+) ([0-9]+) ([A-Z]+)

         # pattern for CERN, NCSA, Netscape etc Error Logs
         regex_posix CERN_err - .

         # pattern for FTP Server error logs (normally in SYSLOG)
         regex_posix SYSLOG_FTP_err - FTP LOGIN FAILED

       A Web server can be specified using this syntax:

         server serverName on|off accessRegex accessFile errorRegex errorFile

       The serverName must be unique for each server, and is the name given to
       the instance for the associated performance metrics. See PMAPI(3) for
       a discussion of PCP instance domains. The on or off flag indicates
       whether the server is to be monitored when the PMDA is installed. This
       can be altered dynamically using pmstore(1) for the metric
       web.perserver.watched, which has one instance for each Web server named
       in configfile.

       Two files are monitored for each Web server, the access and the error
       log. Each file requires the name of a previously declared regular
       expression, and a file name. The log files specified for each server
       do not have to exist when the weblog PMDA is installed. The PMDA will
       continue to check for non-existent log files and open them when
       possible. Some legal server specifications are:

         # Netscape Server on Port 80 at IP address 127.55.555.555
         server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors

         # FTP Server.
         server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
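
       The seven whitespace-separated fields of a server specification can be
       checked with a sketch such as the one below. The names ServerSpec and
       parse_server_line are hypothetical, and the sketch assumes that file
       names contain no spaces; it does not verify that the named regular
       expressions were declared earlier in the file.

```python
from typing import NamedTuple, Optional


class ServerSpec(NamedTuple):
    name: str
    watched: bool       # True for "on", False for "off"
    access_regex: str
    access_file: str
    error_regex: str
    error_file: str


def parse_server_line(line: str) -> Optional[ServerSpec]:
    """Parse one "server ..." line; return None for any other kind of line."""
    fields = line.split()
    if not fields or fields[0] != "server":
        return None     # blank line, comment, or regex definition
    if len(fields) != 7 or fields[2] not in ("on", "off"):
        raise ValueError(f"malformed server specification: {line!r}")
    _, name, flag, a_re, a_file, e_re, e_file = fields
    return ServerSpec(name, flag == "on", a_re, a_file, e_re, e_file)
```

       Applied to the FTP example above, this returns a ServerSpec named ftpd
       with watched set to True and /var/log/messages as both log files.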

CAVEATS

       Specifying regular expressions with an incorrect number of arguments,
       anything other than 2 for access logs, and none for error logs, may
       cause the PMDA to behave incorrectly and even crash. This is due to
       limitations in the interface of regex(3).

FILES

       $PCP_PMDAS_DIR/weblog
                 installation directory for the weblog PMDA

       $PCP_PMDAS_DIR/weblog/Install
                 installation script for the weblog PMDA

       $PCP_PMDAS_DIR/weblog/Remove
                 de-installation script for the weblog PMDA

       $PCP_LOG_DIR/pmcd/weblog.log
                 default log file for error reporting

       $PCP_PMCDCONF_PATH
                 pmcd configuration file that specifies the command line
                 options to be used when pmdaweblog is launched

       $PCP_LOG_DIR/NOTICES
                 log of PMDA installations and removals

       $PCP_VAR_DIR/config/web/weblog.conf
                 likely location of the weblog PMDA configuration file

       $PCP_DOC_DIR/pcpweb/index.html
                 the online HTML documentation for PCPWEB

PCP ENVIRONMENT

       Environment variables with the prefix PCP_ are used to parameterize the
       file and directory names used by PCP. On each installation, the file
       /etc/pcp.conf contains the local values for these variables. The
       $PCP_CONF variable may be used to specify an alternative configuration
       file, as described in pcp.conf(5).

SEE ALSO

       pmcd(1), pmchart(1), pmdawebping(1), pminfo(1), pmstore(1), pmview(1),
       tail(1), weblogvis(1), webvis(1), PMAPI(3), PMDA(3) and regcmp(3).

Performance Co-Pilot                  PCP                        PMDAWEBLOG(1)