PMDAWEBLOG(1) General Commands Manual PMDAWEBLOG(1)

NAME
pmdaweblog - performance metrics domain agent (PMDA) for Web server
logs

SYNOPSIS
$PCP_PMDAS_DIR/weblog/pmdaweblog [-Cp] [-d domain] [-h helpfile]
[-i port] [-l logfile] [-n idlesec] [-S num] [-t delay] [-u socket]
[-U username] configfile

DESCRIPTION
pmdaweblog is a Performance Metrics Domain Agent (PMDA(3)) that scans
Web server logs to extract metrics characterizing Web server activity.
These performance metrics are then made available through the
infrastructure of the Performance Co-Pilot (PCP).

The configfile specifies which Web servers are to be monitored, their
associated access logs and error logs, and a regular-expression based
scheme for extracting detailed information about each Web access. This
file is maintained as part of the PMDA installation and/or
de-installation by the scripts Install and Remove in the directory
$PCP_PMDAS_DIR/weblog. For more details, refer to the section below
covering installation.

Once started, pmdaweblog monitors a set of log files. In response to a
request for information, it processes any new data that has been
appended to the log files, much as tail(1) does. There is also a
periodic "catch up" to process new information from all log files, and
a scheme to detect the rotation of log files.
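
The tail-like behaviour described above can be illustrated with a short
Python sketch. It is not taken from the pmdaweblog source: the class
name LogFollower is hypothetical, and detecting rotation by comparing
inode numbers and file sizes is an assumption made for this example
(the agent itself relies on the idle-time reopen described under the
-n option).

```python
import os


class LogFollower:
    """Track one log file and return only lines appended since the last poll."""

    def __init__(self, path):
        self.path = path
        self.inode = None   # inode of the file we last read from
        self.offset = 0     # byte offset just past the last complete line read

    def poll(self):
        """Return the complete lines added since the previous call."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return []       # the log may not exist yet; retry on the next poll
        if st.st_ino != self.inode or st.st_size < self.offset:
            # A new inode or a shrunken file suggests rotation or truncation,
            # so start again from the beginning of the (new) file.
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume up to the last newline; a trailing partial line is left
        # in place and picked up once the writer completes it.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

Calling poll() both when a metric request arrives and from a periodic
timer would give the two triggers described above; how the real agent
schedules this work is not shown here.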

Like all other PMDAs, pmdaweblog is launched by pmcd(1) using command
line options specified in $PCP_PMCDCONF_PATH. The Install script will
prompt for appropriate values for the command line options, and update
$PCP_PMCDCONF_PATH.

A brief description of the pmdaweblog command line options follows:

-C  Check the configuration and exit.

-d domain
    Specify the domain number. It is absolutely crucial that the
    performance metrics domain number specified here is unique and
    consistent. That is, domain should be different for every PMDA
    on the one host, and the same domain number should be used for
    the pmdaweblog PMDA on all hosts.

    For most installations, the default domain as encapsulated in
    the file $PCP_PMDAS_DIR/weblog/domain.h will suffice. For
    alternate values, check $PCP_PMCDCONF_PATH for the domain values
    already in use on this host. The file $PCP_VAR_DIR/pmns/stdpmid
    contains a repository of "well known" domain assignments that
    probably should be avoided.

-h helpfile
    Get the help text from the supplied helpfile rather than from
    the default location.

-i port
    Communicate with pmcd(1) on the specified Internet port (which
    may be a number or a name).

-l logfile
    Location of the log file. By default, a log file named
    weblog.log is written in the current directory of pmcd(1) when
    pmdaweblog is started, i.e. $PCP_LOG_DIR/pmcd. If the log file
    cannot be created or is not writable, output is written to the
    standard error instead.

-n idlesec
    If a Web server log file has not been modified for idlesec
    seconds, then the file will be closed and re-opened. This is the
    only way pmdaweblog can detect any asynchronous rotation of the
    logs by Web server administrative scripts. The default period
    is 20 seconds. This value may be changed dynamically using
    pmstore(1) to modify the value of the performance metric
    web.config.check.

-p  Communicate with pmcd(1) via a pipe.

-S num
    Specify the maximum number of Web servers per sproc. It may be
    desirable (from a latency and load balancing perspective) or
    necessary (due to file descriptor limits) to delegate
    responsibility for scanning the Web server log files to several
    sprocs. pmdaweblog will ensure that each sproc handles the log
    files for at most num Web servers. The default value is 80 Web
    servers per sproc.

-t delay
    To avoid the need to scan a lot of information from the Web
    server logs in response to a single request for performance
    metrics, all log files will be checked at least once every delay
    seconds. The default is 15 seconds. This value may be changed
    dynamically using pmstore(1) to modify the value of the
    performance metric web.config.catchup.

-u socket
    Communicate with pmcd(1) via the given Unix domain socket.

-U username
    User account under which to run the agent. The default is the
    unprivileged "pcp" account in current versions of PCP, but in
    older versions the superuser account ("root") was used by
    default.

INSTALLATION
The PCP framework allows metrics to be collected on one host and
monitored from another. These hosts are referred to as collector and
monitor hosts, respectively. A host may be both a collector and a
monitor.

Collector hosts require the installation of the agent, while monitoring
hosts require no agent installation at all.

For collector hosts do the following as root:

    # cd $PCP_PMDAS_DIR/weblog
    # ./Install

The installation procedure prompts for a default or non-default
installation. A default installation will search for known server
configurations and automatically configure the PMDA for any server log
files that are found. A non-default installation will step through each
server, prompting the user for other server configurations and
arguments to pmdaweblog. The end result of a collector installation is
to build a configuration file that is passed to pmdaweblog via the
configfile argument.

If you want to undo the installation, do the following as root:

    # cd $PCP_PMDAS_DIR/weblog
    # ./Remove

pmdaweblog is launched by pmcd(1) and should never be executed
directly. The Install and Remove scripts notify pmcd(1) when the agent
is installed or removed.

CONFIGURATION
The configuration file for the weblog PMDA is an ASCII file that can be
easily modified. Empty lines and lines beginning with '#' are ignored.
All other lines must be either a regular expression or server
specification.

Regular expressions, which are used on both the access and error log
files, must be of the form:

    regex regexName regexp

or

    regex_posix regexName ordering regexp_posix

The regexName is a word which uniquely identifies the regular
expression. This is the reference used in the server specification. The
regexp for access logs is in the format described for regcmp(3). The
regexp_posix for access logs is in the format described for regcomp(3).
The argument ordering is explained below. The POSIX form should be
available on all platforms.

The regular expression requires the specification of up to four
arguments to be extracted from each line of a Web server access log,
depending on the type of server. In the most common case there are two
arguments representing the method and the size.

For the non-POSIX version, argument $0 should contain the method: GET,
HEAD, POST or PUT. The method PUT is treated as a synonym for POST,
and anything else is categorized as OTHER.

The second argument, $1, should contain the size of the request. A
size of "-" or " " is treated as unknown.

Argument $2 should contain the status code returned to the client
browser and argument $3 should contain the status code returned to the
server from a remote host. These latter two arguments are used for
caching servers and must be specified as a pair (or $2 will be
ignored). For further information on status codes, refer to the web site
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
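
The handling of the method and size arguments can be sketched in
Python as follows. This is an illustration only, not the agent's code:
the function names classify_method and parse_size are hypothetical, and
folding the method to upper case before comparison is an assumption
made here.

```python
def classify_method(method):
    """Map an extracted method string to GET, HEAD, POST or OTHER."""
    m = method.upper()          # assumption: comparison ignores case
    if m == "PUT":
        return "POST"           # PUT is treated as a synonym for POST
    if m in ("GET", "HEAD", "POST"):
        return m
    return "OTHER"              # anything else is categorized as OTHER


def parse_size(size):
    """Return the request size as an int, or None when it is unknown."""
    s = size.strip()
    if s in ("", "-"):          # covers both "-" and " " from the log
        return None
    return int(s)
```

Returning None for an unknown size keeps "unknown" distinct from a
genuine size of zero bytes.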

Some legal non-POSIX regex expression specifications for monitoring an
access log are:

    # pattern for CERN, NCSA, Netscape etc Access Logs
    regex CERN ] "([A-Za-z][-A-Za-z]+)$0 .*" [-0-9]+ ([-0-9]+)$1

    # pattern for FTP Server access logs (normally in SYSLOG)
    regex SYSLOG_FTP ftpd[.*]: ([gp][-A-Za-z]+)$0( )$1

There is one special type of access log, identified by the regexName
SQUID. This format extracts four parameters, but because the Squid log
file uses text-based status codes, it is handled as a special case.

In the examples below, NS_PROXY parses the Netscape/W3C Common Extended
Log Format and SQUID parses the default Squid Object Cache format log
file.

    # pattern for Netscape Proxy Server Extended Logs
    regex NS_PROXY ] "([A-Za-z][-A-Za-z]+)$0 .*" ([-0-9]+)$2 \
    ([-0-9]+)$1 ([-0-9]+)$3

    # pattern for Squid Cache logs
    regex SQUID [0-9]+.[0-9]+[ ]+[0-9]+ [a-zA-Z0-9.]+ \
    ([_A-Z]+)$3([0-9]+)$2 ([0-9]+)$1 ([A-Z]+)$0

The regexp for the error logs does not require any arguments, only a
match. Some legal expressions are:

    # pattern for CERN, NCSA, Netscape etc Error Logs
    regex CERN_err .

    # pattern for FTP Server error logs (normally in SYSLOG)
    regex SYSLOG_FTP_err FTP LOGIN FAILED

If POSIX-compliant regular expressions are used, additional information
is required since the order of parameters cannot be specified in the
regular expression. For backwards compatibility, in the common case of
two parameters the order may be specified as method,size or
size,method. In the general case, the ordering is specified by one of
the following methods:

n1,n2,n3,n4
    where nX is a digit between 1 and 4. Each comma-separated field
    represents (in order) the argument number for
    method,size,client_status,server_status

-   Used for cases like the error logs, where the content is ignored.

As for the non-POSIX format, the SQUID regexName is treated as a
special case to match the non-numerical status codes.

Some legal POSIX regex expression specifications for monitoring an
access log are:

    # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
    regex_posix CERN method,size ][ \]+"([A-Za-z][-A-Za-z]+) \
    [^"]*" [-0-9]+ ([-0-9]+)

    # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
    regex_posix CERN 1,2 ][ \]+"([A-Za-z][-A-Za-z]+) \
    [^"]*" [-0-9]+ ([-0-9]+)

    # pattern for FTP Server access logs (normally in SYSLOG)
    regex_posix SYSLOG_FTP method,size ftpd[.*]: \
    ([gp][-A-Za-z]+)( )

    # pattern for Netscape Proxy Server Extended Logs
    regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A-Za-z][-A-Za-z]+) \
    [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)

    # pattern for Squid Cache logs
    regex_posix SQUID 4,3,2,1 [0-9]+.[0-9]+[ ]+[0-9]+ \
    [a-zA-Z0-9.]+ ([_A-Z]+)([0-9]+) ([0-9]+) ([A-Z]+)

    # pattern for CERN, NCSA, Netscape etc Error Logs
    regex_posix CERN_err - .

    # pattern for FTP Server error logs (normally in SYSLOG)
    regex_posix SYSLOG_FTP_err - FTP LOGIN FAILED
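
The way an ordering string assigns parenthesized subexpressions to
fields can be sketched in Python. This is a hedged illustration, not
the agent's implementation: the names parse_ordering and extract are
hypothetical, the sample log line is invented for the example, and
Python's re module only approximates regcomp(3). The pattern is the
NS_PROXY one above with the continuation line joined.

```python
import re

FIELDS = ("method", "size", "client_status", "server_status")

# NS_PROXY pattern from the examples above, written on one line.
NS_PROXY = r'][ ]+"([A-Za-z][-A-Za-z]+) [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)'


def parse_ordering(ordering):
    """Translate an ordering string into {field name: group number}."""
    if ordering == "-":
        return {}                       # content ignored, e.g. error logs
    parts = ordering.split(",")
    if all(p in ("method", "size") for p in parts):
        # Two-parameter shorthand: position in the string is the group number.
        return {name: i for i, name in enumerate(parts, start=1)}
    # General form: the i-th number is the group number for FIELDS[i].
    return {FIELDS[i]: int(n) for i, n in enumerate(parts)}


def extract(pattern, ordering, line):
    """Return the named fields from line, or None if the pattern fails."""
    m = re.search(pattern, line)
    if m is None:
        return None
    return {name: m.group(n) for name, n in parse_ordering(ordering).items()}


# Hypothetical proxy log line: client status 200, size 1234, remote status 304.
sample = 'host - - [01/Jan/2000:00:00:00 +0000] "GET /a.html HTTP/1.0" 200 1234 304'
result = extract(NS_PROXY, "1,3,2,4", sample)
```

With the ordering 1,3,2,4 the size is taken from the third
subexpression and the client status from the second, matching the $1
and $2 annotations in the non-POSIX NS_PROXY pattern.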

A Web server can be specified using this syntax:

    server serverName on|off accessRegex accessFile errorRegex errorFile

The serverName must be unique for each server, and is the name given to
the instance for the associated performance metrics. See PMAPI(3) for
a discussion of PCP instance domains. The on or off flag indicates
whether the server is to be monitored when the PMDA is installed. This
can be altered dynamically using pmstore(1) for the metric
web.perserver.watched, which has one instance for each Web server named
in configfile.

Two files are monitored for each Web server, the access and the error
log. Each file requires the name of a previously declared regular
expression, and a file name. The log files specified for each server do
not have to exist when the weblog PMDA is installed. The PMDA will
continue to check for non-existent log files and open them when
possible. Some legal server specifications are:

    # Netscape Server on Port 80 at IP address 127.55.555.555
    server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors

    # FTP Server.
    server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
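
As a rough illustration of the file format described in this section,
the following Python sketch splits configuration text into regular
expression and server entries. It is not the agent's parser: the
function name parse_config and the returned data layout are
hypothetical, and it assumes each entry sits on a single line (the
backslash continuations shown in the examples above are not handled).

```python
def parse_config(text):
    """Return (regexes, servers) parsed from weblog-style configuration text."""
    regexes, servers = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                    # empty lines and comments are ignored
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "regex":
            name, pattern = rest.split(None, 1)
            regexes[name] = {"ordering": None, "pattern": pattern}
        elif keyword == "regex_posix":
            name, ordering, pattern = rest.split(None, 2)
            regexes[name] = {"ordering": ordering, "pattern": pattern}
        elif keyword == "server":
            name, flag, a_re, a_file, e_re, e_file = rest.split()
            if flag not in ("on", "off"):
                raise ValueError("expected on or off: " + line)
            for regex_name in (a_re, e_re):
                # Each log must name a previously declared expression.
                if regex_name not in regexes:
                    raise ValueError("undeclared regex: " + regex_name)
            servers.append({
                "name": name, "watched": flag == "on",
                "access": (a_re, a_file), "error": (e_re, e_file),
            })
        else:
            raise ValueError("unrecognized line: " + line)
    return regexes, servers
```

A line with too few or too many fields fails at the unpacking step with
a ValueError, which is adequate for a sketch but less helpful than the
diagnostics a real configuration check would give.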

CAVEATS
Specifying regular expressions with an incorrect number of arguments,
anything other than 2 for access logs, and none for error logs, may
cause the PMDA to behave incorrectly and even crash. This is due to
limitations in the interface of regex(3).

FILES
$PCP_PMDAS_DIR/weblog
    installation directory for the weblog PMDA

$PCP_PMDAS_DIR/weblog/Install
    installation script for the weblog PMDA

$PCP_PMDAS_DIR/weblog/Remove
    de-installation script for the weblog PMDA

$PCP_LOG_DIR/pmcd/weblog.log
    default log file for error reporting

$PCP_PMCDCONF_PATH
    pmcd configuration file that specifies the command line
    options to be used when pmdaweblog is launched

$PCP_LOG_DIR/NOTICES
    log of PMDA installations and removals

$PCP_VAR_DIR/config/web/weblog.conf
    likely location of the weblog PMDA configuration file

$PCP_DOC_DIR/pcpweb/index.html
    the online HTML documentation for PCPWEB

PCP ENVIRONMENT
Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(5).

SEE ALSO
pmcd(1), pmchart(1), pmdawebping(1), pminfo(1), pmstore(1), pmview(1),
tail(1), weblogvis(1), webvis(1), PMAPI(3), PMDA(3) and regcmp(3).

Performance Co-Pilot PCP PMDAWEBLOG(1)