PMDAWEBLOG(1) General Commands Manual PMDAWEBLOG(1)

pmdaweblog - performance metrics domain agent (PMDA) for Web server logs

$PCP_PMDAS_DIR/weblog/pmdaweblog [-Cp] [-d domain] [-h helpfile]
[-i port] [-l logfile] [-n idlesec] [-S num] [-t delay] [-u socket]
configfile

pmdaweblog is a Performance Metrics Domain Agent (PMDA(3)) that scans
Web server logs to extract metrics characterizing Web server activity.
These performance metrics are then made available through the
infrastructure of the Performance Co-Pilot (PCP).

The configfile specifies which Web servers are to be monitored, their
associated access logs and error logs, and a regular-expression based
scheme for extracting detailed information about each Web access. This
file is maintained as part of the PMDA installation and/or
de-installation by the scripts Install and Remove in the directory
$PCP_PMDAS_DIR/weblog. For more details, refer to the section below
covering installation.

Once started, pmdaweblog monitors a set of log files and, in response
to a request for information, processes any new information that has
been appended to the log files, similar to tail(1). There is also a
periodic "catch up" to process new information from all log files, and
a scheme to detect the rotation of log files.
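As an illustration of the tail(1)-like processing described above, the
following Python sketch returns only the lines appended to a file since
the previous call. It is not pmdaweblog's code: the class LogFollower,
its method name, and the "file shrank, so start again" heuristic are
all assumptions made for this example.

```python
import os


class LogFollower:
    """Hypothetical helper: return only lines appended since the last call."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte offset of the first unread byte

    def read_new_lines(self):
        try:
            size = os.path.getsize(self.path)
        except OSError:
            return []  # file does not exist (yet); try again later
        if size < self.offset:
            # Assumption: a file that shrank was rotated or truncated.
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume complete lines only; a partial last line waits for
        # the next call.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", "replace").splitlines()
```

Calling read_new_lines() on each request, and again from a periodic
timer, would mirror the on-demand and "catch up" processing described
above.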

Like all other PMDAs, pmdaweblog is launched by pmcd(1) using command
line options specified in $PCP_PMCDCONF_PATH. The Install script will
prompt for appropriate values for the command line options, and update
$PCP_PMCDCONF_PATH.

A brief description of the pmdaweblog command line options follows:

-C      Check the configuration and exit.

-d domain
        Specify the domain number. It is absolutely crucial that the
        performance metrics domain number specified here is unique and
        consistent. That is, domain should be different for every PMDA
        on the one host, and the same domain number should be used for
        the pmdaweblog PMDA on all hosts.

        For most installations, the default domain as encapsulated in
        the file $PCP_PMDAS_DIR/weblog/domain.h will suffice. For
        alternate values, check $PCP_PMCDCONF_PATH for the domain values
        already in use on this host. The file $PCP_VAR_DIR/pmns/stdpmid
        contains a repository of "well known" domain assignments that
        should probably be avoided.

-h helpfile
        Get the help text from the supplied helpfile rather than from
        the default location.

-i port
        Communicate with pmcd(1) on the specified Internet port (which
        may be a number or a name).

-l logfile
        Location of the log file. By default, a log file named
        weblog.log is written in the current directory of pmcd(1) when
        pmdaweblog is started, i.e. $PCP_LOG_DIR/pmcd. If the log file
        cannot be created or is not writable, output is written to the
        standard error instead.

-n idlesec
        If a Web server log file has not been modified for idlesec
        seconds, then the file will be closed and re-opened. This is
        the only way pmdaweblog can detect any asynchronous rotation of
        the logs by Web server administrative scripts. The default
        period is 20 seconds. This value may be changed dynamically
        using pmstore(1) to modify the value of the performance metric
        web.config.check.

-p      Communicate with pmcd(1) via a pipe.

-S num  Specify the maximum number of Web servers per sproc. It may be
        desirable (from a latency and load balancing perspective) or
        necessary (due to file descriptor limits) to delegate
        responsibility for scanning the Web server log files to several
        sprocs. pmdaweblog will ensure that each sproc handles the log
        files for at most num Web servers. The default value is 80 Web
        servers per sproc.
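One simple way to satisfy the "at most num Web servers per sproc"
constraint is to cut the server list into consecutive groups. The
Python sketch below shows only that arithmetic. The function name
partition_servers is hypothetical, and the assignment policy that
pmdaweblog actually uses is not described in this page.

```python
def partition_servers(servers, max_per_worker=80):
    """Split a list of servers into groups of at most max_per_worker."""
    if max_per_worker < 1:
        raise ValueError("max_per_worker must be at least 1")
    return [servers[i:i + max_per_worker]
            for i in range(0, len(servers), max_per_worker)]
```

Under this scheme, 200 configured servers with the default of 80 would
be handled by three groups of 80, 80 and 40 servers.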

-t delay
        To avoid the need to scan a lot of information from the Web
        server logs in response to a single request for performance
        metrics, all log files will be checked at least once every
        delay seconds. The default is 15 seconds. This value may be
        changed dynamically using pmstore(1) to modify the value of the
        performance metric web.config.catchup.

-u socket
        Communicate with pmcd(1) via the given Unix domain socket.

The PCP framework allows metrics to be collected on one host and
monitored from another. These hosts are referred to as collector and
monitor hosts, respectively. A host may be both a collector and a
monitor.

Collector hosts require the installation of the agent, while monitoring
hosts require no agent installation at all.

For collector hosts do the following as root:

    # cd $PCP_PMDAS_DIR/weblog
    # ./Install

The installation procedure prompts for a default or non-default
installation. A default installation will search for known server
configurations and automatically configure the PMDA for any server log
files that are found. A non-default installation will step through
each server, prompting the user for other server configurations and
arguments to pmdaweblog. The end result of a collector installation is
to build a configuration file that is passed to pmdaweblog via the
configfile argument.

If you want to undo the installation, do the following as root:

    # cd $PCP_PMDAS_DIR/weblog
    # ./Remove

pmdaweblog is launched by pmcd(1) and should never be executed
directly. The Install and Remove scripts notify pmcd(1) when the agent
is installed or removed.

The configuration file for the weblog PMDA is an ASCII file that can be
easily modified. Empty lines and lines beginning with '#' are ignored.
All other lines must be either a regular expression or a server
specification.
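The line types just described can be pictured with a small Python
classifier. This is a sketch, not the parser pmdaweblog uses: the
function name classify_config_line is hypothetical, the keywords are
the ones introduced later in this page (regex, regex_posix and server),
and tolerating leading white space is an assumption. It does not handle
the backslash continuation lines used in some of the examples.

```python
def classify_config_line(line):
    """Return 'ignored', 'regex', 'regex_posix', 'server' or 'invalid'."""
    stripped = line.strip()  # assumption: leading white space is tolerated
    if not stripped or stripped.startswith("#"):
        return "ignored"  # empty line or comment
    keyword = stripped.split(None, 1)[0]
    if keyword in ("regex", "regex_posix", "server"):
        return keyword
    return "invalid"
```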

Regular expressions, which are used on both the access and error log
files, must be of the form:

    regex regexName regexp

or

    regex_posix regexName ordering regexp_posix

The regexName is a word which uniquely identifies the regular
expression. This is the reference used in the server specification.
The regexp for access logs is in the format described for regcmp(3).
The regexp_posix for access logs is in the format described for
regcomp(3). Note that on IRIX after release 6.2, the POSIX compliant
form is not recommended, for performance reasons. The argument
ordering is explained below. The POSIX form should be available on all
platforms.

The regular expression requires the specification of up to four
arguments to be extracted from each line of a Web server access log,
depending on the type of server. In the most common case there are two
arguments representing the method and the size.

For the non-POSIX version, argument $0 should contain the method: GET,
HEAD, POST or PUT. The method PUT is treated as a synonym for POST,
and anything else is categorized as OTHER.

The second argument, $1, should contain the size of the request. A
size of "-" or " " is treated as unknown.
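The categorization rules for the method and size arguments can be
written out as a short Python sketch. The function names are
hypothetical. Comparing methods case-insensitively and treating any
other non-numeric size as unknown are assumptions, not documented
behaviour.

```python
def categorize_method(method):
    """Map an extracted method string to GET, HEAD, POST or OTHER."""
    m = method.upper()  # assumption: comparison ignores case
    if m == "PUT":
        return "POST"  # PUT is treated as a synonym for POST
    if m in ("GET", "HEAD", "POST"):
        return m
    return "OTHER"


def parse_size(size):
    """Return the size as an int, or None when it is unknown."""
    if size in ("-", " "):
        return None
    # Assumption: anything else that is not a plain number is unknown too.
    return int(size) if size.isdigit() else None
```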

Argument $2 should contain the status code returned to the client
browser and argument $3 should contain the status code returned to the
server from a remote host. These latter two arguments are used for
caching servers and must be specified as a pair (or $2 will be
ignored). For further information on status codes, refer to the web
site http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

Some legal non-POSIX regex specifications for monitoring an access log
are:

    # pattern for CERN, NCSA, Netscape etc Access Logs
    regex CERN ] "([A-Za-z][-A-Za-z]+)$0 .*" [-0-9]+ ([-0-9]+)$1

    # pattern for FTP Server access logs (normally in SYSLOG)
    regex SYSLOG_FTP ftpd[.*]: ([gp][-A-Za-z]+)$0( )$1

There is one special type of access log, identified by the regexName
SQUID. This format extracts four parameters, but since the Squid log
file uses text-based status codes, it is handled as a special case.

In the examples below, NS_PROXY parses the Netscape/W3C Common Extended
Log Format and SQUID parses the default Squid Object Cache format log
file.

    # pattern for Netscape Proxy Server Extended Logs
    regex NS_PROXY ] "([A-Za-z][-A-Za-z]+)$0 .*" ([-0-9]+)$2 \
        ([-0-9]+)$1 ([-0-9]+)$3

    # pattern for Squid Cache logs
    regex SQUID [0-9]+.[0-9]+[ ]+[0-9]+ [a-zA-Z0-9.]+ \
        ([_A-Z]+)$3([0-9]+)$2 ([0-9]+)$1 ([A-Z]+)$0

The regexp for the error logs does not require any arguments, only a
match. Some legal expressions are:

    # pattern for CERN, NCSA, Netscape etc Error Logs
    regex CERN_err .

    # pattern for FTP Server error logs (normally in SYSLOG)
    regex SYSLOG_FTP_err FTP LOGIN FAILED

If POSIX compliant regular expressions are used, additional information
is required since the order of parameters cannot be specified in the
regular expression. For backwards compatibility, in the common case of
two parameters the order may be specified as method,size or
size,method. In the general case, the ordering is specified by one of
the following methods:

n1,n2,n3,n4
        where nX is a digit between 1 and 4. Each comma-separated field
        represents (in order) the argument number for
        method,size,client_status,server_status

-       Used for cases like the error logs where the content is ignored.

As for the non-POSIX format, the SQUID regexName is treated as a
special case to match the non-numerical status codes.
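To make the ordering argument concrete, the Python sketch below turns
an ordering string into a mapping from field name to parenthesized
group number and applies it to a match. It is an illustration only:
Python's re module is not regcomp(3), the function names are
hypothetical, the pattern is adapted to Python syntax from the NS_PROXY
examples in this page, and the log line is invented.

```python
import re

FIELDS = ("method", "size", "client_status", "server_status")


def parse_ordering(spec):
    """Map each field name to the 1-based group number given by spec."""
    if spec == "-":
        return {}  # content is ignored, e.g. for error logs
    # Two-parameter shorthand kept for backwards compatibility.
    shorthand = {"method,size": "1,2", "size,method": "2,1"}
    spec = shorthand.get(spec, spec)
    return {name: int(n) for name, n in zip(FIELDS, spec.split(","))}


def extract_fields(pattern, ordering, line):
    """Return a dict of named fields, or None if the line does not match."""
    match = re.search(pattern, line)
    if match is None:
        return None
    return {name: match.group(n)
            for name, n in parse_ordering(ordering).items()}


# Hypothetical pattern and log line, for illustration only.
PROXY = r'\] "([A-Za-z][-A-Za-z]+) [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)'
LINE = ('host - - [01/Jan/2000:00:00:00 +0000] '
        '"GET http://example.com/ HTTP/1.0" 200 1234 304')
print(extract_fields(PROXY, "1,3,2,4", LINE))
```

With the ordering 1,3,2,4 the method is taken from the first group, the
size from the third, the client status from the second and the server
status from the fourth.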

Some legal POSIX regex specifications for monitoring an access log are:

    # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
    regex_posix CERN method,size ][ \]+"([A-Za-z][-A-Za-z]+) \
        [^"]*" [-0-9]+ ([-0-9]+)

    # pattern for CERN, NCSA, Netscape, Apache etc Access Logs
    regex_posix CERN 1,2 ][ \]+"([A-Za-z][-A-Za-z]+) \
        [^"]*" [-0-9]+ ([-0-9]+)

    # pattern for FTP Server access logs (normally in SYSLOG)
    regex_posix SYSLOG_FTP method,size ftpd[.*]: \
        ([gp][-A-Za-z]+)( )

    # pattern for Netscape Proxy Server Extended Logs
    regex_posix NS_PROXY 1,3,2,4 ][ ]+"([A-Za-z][-A-Za-z]+) \
        [^"]*" ([-0-9]+) ([-0-9]+) ([-0-9]+)

    # pattern for Squid Cache logs
    regex_posix SQUID 4,3,2,1 [0-9]+.[0-9]+[ ]+[0-9]+ \
        [a-zA-Z0-9.]+ ([_A-Z]+)([0-9]+) ([0-9]+) ([A-Z]+)

    # pattern for CERN, NCSA, Netscape etc Error Logs
    regex_posix CERN_err - .

    # pattern for FTP Server error logs (normally in SYSLOG)
    regex_posix SYSLOG_FTP_err - FTP LOGIN FAILED

A Web server can be specified using this syntax:

    server serverName on|off accessRegex accessFile errorRegex errorFile

The serverName must be unique for each server, and is the name given to
the instance for the associated performance metrics. See PMAPI(3) for
a discussion of PCP instance domains. The on or off flag indicates
whether the server is to be monitored when the PMDA is installed. This
can be altered dynamically using pmstore(1) for the metric
web.perserver.watched, which has one instance for each Web server named
in configfile.

Two files are monitored for each Web server, the access and the error
log. Each file requires the name of a previously declared regular
expression, and a file name. The log files specified for each server
do not have to exist when the weblog PMDA is installed. The PMDA will
continue to check for non-existent log files and open them when
possible. Some legal server specifications are:

    # Netscape Server on Port 80 at IP address 127.55.555.555
    server 127.55.555.555:80 on CERN /logs/access CERN_err /logs/errors

    # FTP Server.
    server ftpd on SYSLOG_FTP /var/log/messages SYSLOG_FTP_err /var/log/messages
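The seven fields of a server specification can be split apart as in the
following Python sketch. It assumes that fields are separated by white
space and that file names contain no spaces. The names ServerSpec and
parse_server_line are hypothetical, and the sketch does not check that
the regular expression names were previously declared.

```python
from typing import NamedTuple


class ServerSpec(NamedTuple):
    name: str
    watched: bool  # True for "on", False for "off"
    access_regex: str
    access_file: str
    error_regex: str
    error_file: str


def parse_server_line(line):
    """Parse one 'server ...' line into a ServerSpec."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "server":
        raise ValueError("expected: server name on|off regex file regex file")
    _, name, flag, access_regex, access_file, error_regex, error_file = fields
    if flag not in ("on", "off"):
        raise ValueError("flag must be 'on' or 'off'")
    return ServerSpec(name, flag == "on", access_regex, access_file,
                      error_regex, error_file)
```

Applied to the first example above, this yields the instance name
127.55.555.555:80, a watched flag of True, and the CERN and CERN_err
expressions paired with /logs/access and /logs/errors.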

Specifying regular expressions with an incorrect number of arguments,
anything other than 2 for access logs, and none for error logs, may
cause the PMDA to behave incorrectly and even crash. This is due to
limitations in the interface of regex(3).

$PCP_PMDAS_DIR/weblog
        installation directory for the weblog PMDA

$PCP_PMDAS_DIR/weblog/Install
        installation script for the weblog PMDA

$PCP_PMDAS_DIR/weblog/Remove
        de-installation script for the weblog PMDA

$PCP_LOG_DIR/pmcd/weblog.log
        default log file for error reporting

$PCP_PMCDCONF_PATH
        pmcd configuration file that specifies the command line
        options to be used when pmdaweblog is launched

$PCP_LOG_DIR/NOTICES
        log of PMDA installations and removals

$PCP_VAR_DIR/config/web/weblog.conf
        likely location of the weblog PMDA configuration file

$PCP_DOC_DIR/pcpweb/index.html
        the online HTML documentation for PCPWEB

Environment variables with the prefix PCP_ are used to parameterize the
file and directory names used by PCP. On each installation, the file
/etc/pcp.conf contains the local values for these variables. The
$PCP_CONF variable may be used to specify an alternative configuration
file, as described in pcp.conf(4).

pmcd(1), pmchart(1), pmdawebping(1), pminfo(1), pmstore(1), pmview(1),
tail(1), weblogvis(1), webvis(1), PMAPI(3), PMDA(3) and regcmp(3).

Performance Co-Pilot SGI PMDAWEBLOG(1)