HB_REPORT(8)               Pacemaker documentation               HB_REPORT(8)


NAME
       hb_report - create report for CRM based clusters (Pacemaker)

SYNOPSIS
       hb_report -f {time|"cts:"testnum} [-t time] [-u user] [-l file]
                 [-n nodes] [-E files] [-p patt] [-L patt] [-e prog]
                 [-MSDCZAVsvhd] [dest]

DESCRIPTION
       hb_report is a utility that collects all information (logs,
       configuration files, system information, etc.) relevant to Pacemaker
       (CRM) over the given period of time.

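       A minimal sketch of an invocation, assuming the cluster is running on
       this node and the logs are in a standard location (the time and the
       destination path below are only placeholders):

           # hb_report -f "2010/03/03 9:00" /tmp/outage-report

       The options are described below; -f is the only required one.
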
OPTIONS
       dest
           The destination directory. Must be an absolute path. The
           resulting tarball is placed in the parent directory and contains
           the last directory element of this path, typically something like
           /tmp/standby-failed. If left out, the tarball is created in your
           home directory and named "hb_report-current_date", for instance
           hb_report-Wed-03-Mar-2010.

       -d
           Don't create the compressed tarball, but leave the result in a
           directory.

       -f { time | "cts:"testnum }
           The start time from which to collect logs. The time is in the
           format used by the Date::Parse perl module. For CTS tests,
           specify the string "cts:" followed by the test number. This
           option is required.

       -t time
           The end time up to which to collect logs. Defaults to now.

       -n nodes
           A list of space separated hostnames (cluster members). hb_report
           may try to find out the set of nodes by itself, but if it runs on
           a loghost which, as is usually the case, does not belong to the
           cluster, that may be difficult. Also, OpenAIS does not keep a
           list of nodes and, if Pacemaker is not running, there is no way
           to find it out automatically. This option is cumulative (i.e. use
           -n "a b" or -n a -n b).

       -l file
           Log file location. If, for whatever reason, hb_report cannot find
           the log files, you can specify their absolute path here.

       -E files
           Extra log files to collect. This option is cumulative. By
           default, /var/log/messages is collected along with the cluster
           logs.

       -M
           Don't collect extra log files, but only the file containing
           messages from the cluster subsystems.

       -L patt
           A list of regular expressions to match in log files for analysis.
           This option is additive (default: "CRIT: ERROR:").

       -p patt
           Additional patterns to match parameter names which contain
           sensitive information. This option is additive (default:
           "passw.*").

       -A
           This is an OpenAIS cluster. hb_report has some heuristics to find
           the cluster stack, but they are not always reliable. By default,
           hb_report assumes that it is run on a Heartbeat cluster.

       -u user
           The ssh user. hb_report will try to log in to other nodes without
           specifying a user, then as "root", and finally as "hacluster". If
           you use another user for administration over ssh, specify it with
           this option.

       -S
           Single node operation. Run hb_report only on this node and don't
           try to start slave collectors on the other cluster members. Under
           normal circumstances this option is not needed. Use it if ssh(1)
           to the other nodes does not work.

       -Z
           If the destination directories exist, remove them instead of
           exiting (this is the default for CTS).

       -V
           Print the version, including the last repository changeset.

       -v
           Increase verbosity. Normally used to debug unexpected behaviour.

       -h
           Show usage and some examples.

       -D (obsolete)
           Don't invoke an editor to fill in the description text file.

       -e prog (obsolete)
           Your favourite text editor. Defaults to $EDITOR, vim, vi, emacs,
           or nano, whichever is found first.

       -C (obsolete)
           Remove the destination directory once the report has been packed
           into a tarball.

EXAMPLES
       Last night, several warnings were encountered during the backup
       (logserver is the log host):

           logserver# hb_report -f 3:00 -t 4:00 -n "node1 node2" /tmp/report

       This collects everything from all nodes from 3am to 4am last night.
       The files are compressed into the tarball /tmp/report.tar.bz2.

       Just found a problem during testing:

           # note the current time
           node1# date
           Fri Sep 11 18:51:40 CEST 2009
           node1# /etc/init.d/heartbeat start
           node1# nasty-command-that-breaks-things
           node1# sleep 120    # wait for the cluster to settle
           node1# hb_report -f 18:51 /tmp/hb1

           # if hb_report can't figure out that this is openais
           node1# hb_report -f 18:51 -A /tmp/hb1

           # if hb_report can't figure out the cluster members
           node1# hb_report -f 18:51 -n "node1 node2" /tmp/hb1

       The files are compressed into the tarball /tmp/hb1.tar.bz2.

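       A sketch of a run that combines some of the less common options: a
       dedicated ssh user, an explicit log file location, and no extra log
       files (the user name and paths here are only placeholders):

           node1# hb_report -f 10:00 -t 11:00 -u hbadmin \
                     -l /var/log/cluster/ha-log -M /tmp/hb2

       As above, the result is packed into the tarball /tmp/hb2.tar.bz2.
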
INTERPRETING RESULTS
       The compressed tar archive is the final product of hb_report. This is
       one example of its content, for a CTS test case on a three node
       OpenAIS cluster:

           $ ls -RF 001-Restart

           001-Restart:
           analysis.txt     events.txt  logd.cf       s390vm13/  s390vm16/
           description.txt  ha-log.txt  openais.conf  s390vm14/

           001-Restart/s390vm13:
           STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@    sysinfo.txt
           cib.txt  dlm_dump.txt    logd.cf@     pengine/         sysstats.txt
           cib.xml  events.txt      messages     permissions.txt

           001-Restart/s390vm13/pengine:
           pe-input-738.bz2  pe-input-740.bz2  pe-warn-450.bz2
           pe-input-739.bz2  pe-warn-449.bz2   pe-warn-451.bz2

           001-Restart/s390vm14:
           STOPPED  crm_verify.txt  hb_uuid.txt  openais.conf@    sysstats.txt
           cib.txt  dlm_dump.txt    logd.cf@     permissions.txt
           cib.xml  events.txt      messages     sysinfo.txt

           001-Restart/s390vm16:
           STOPPED  crm_verify.txt  hb_uuid.txt  messages         sysinfo.txt
           cib.txt  dlm_dump.txt    hostcache    openais.conf@    sysstats.txt
           cib.xml  events.txt      logd.cf@     permissions.txt

       The top directory contains information which pertains to the cluster
       or event as a whole. Files with exactly the same content on all nodes
       are also placed at the top, with per-node links created (as is the
       case in this example with openais.conf and logd.cf).

       The cluster log files are named ha-log.txt regardless of the actual
       log file name on the system. If the log is found on the loghost, it
       is placed in the top directory. Files named messages are excerpts of
       /var/log/messages from the nodes.

       Most files are copied verbatim or contain the output of a command.
       For instance, cib.xml is a copy of the CIB found in
       /var/lib/heartbeat/crm/cib.xml, and crm_verify.txt is the output of
       the crm_verify(8) program.

       Some files are the result of more involved processing:

       analysis.txt
           A set of log messages matching user defined patterns (which may
           be provided with the -L option; see the example at the end of
           this section).

       events.txt
           A set of log messages matching event patterns. It should provide
           information about major cluster events without unnecessary
           detail. These patterns are devised by the cluster experts.
           Currently, the patterns cover membership and quorum changes,
           resource starts and stops, fencing (stonith) actions, and cluster
           starts and stops. events.txt is always generated for each node.
           If a central cluster log was found, a combined version for all
           nodes is generated as well.

       permissions.txt
           One of the more common causes of problems is wrong file and
           directory permissions. hb_report looks for a set of predefined
           directories and checks their permissions. Any issues are reported
           here.

       backtraces.txt
           Backtrace information generated by gdb for any cores dumped
           within the specified period.

       sysinfo.txt
           Various release information about the platform, kernel, operating
           system, packages, and anything else deemed to be relevant. The
           static part of the system.

       sysstats.txt
           Output of various system commands such as ps(1), uptime(1),
           netstat(8), and ifconfig(8). The dynamic part of the system.

       description.txt should contain a user supplied description of the
       problem, but since it is very seldom used, it will be dropped from
       future releases.

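       For instance, to have analysis.txt cover fencing related messages in
       addition to the default CRIT:/ERROR: patterns, extra -L expressions
       can be passed when the report is created (a sketch; the patterns are
       only an illustration):

           # hb_report -f 3:00 -L "stonith" -L "fence" /tmp/report
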
PREREQUISITES
       ssh
           It is not strictly required, but you won't regret having
           passwordless ssh. It is not too difficult to set up and will save
           you a lot of time. If you can't have it, for example because your
           security policy does not allow such a thing, or you just prefer
           menial work, then you will have to resort to the semi-manual,
           semi-automated report generation. See below for instructions.

           If you need to supply a password for your passphrase/login, then
           please use the -u option.

       Times
           In order to find files and messages in the given period and to
           parse the -f and -t options, hb_report uses perl and one of the
           Date::Parse or Date::Manip perl modules. Note that you need only
           one of these. Furthermore, on nodes which have no logs and where
           you don't run hb_report directly, no date parsing is necessary.
           In other words, if you run this on a loghost, then you don't need
           these perl modules on the cluster nodes. (A quick way to check
           for the modules is shown after this list.)

           On rpm based distributions, you can find Date::Parse in
           perl-TimeDate, and on Debian and its derivatives in
           libtimedate-perl.

       Core dumps
           To backtrace core dumps, gdb is needed along with the packages
           containing debugging information. The debuginfo packages may be
           installed at the time the report is created. Let's hope that you
           will seldom need this.

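       A quick check for the perl module prerequisite (this only tests
       whether the module loads; adapt the module name as needed):

           # perl -MDate::Parse -e 1 && echo "Date::Parse is installed"
           # perl -MDate::Manip -e 1 && echo "Date::Manip is installed"
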
TIMES
       Specifying times can at times be a nuisance. That is why we have
       chosen to use one of the perl modules: they allow a certain freedom
       when talking about dates. You can either read the instructions at the
       Date::Parse examples page[1], or just rely on common sense and try
       things like:

           3:00                           (today at 3am)
           15:00                          (today at 3pm)
           2007/9/1 2pm                   (September 1st at 2pm)
           Tue Sep 15 20:46:27 CEST 2009  (September 15th etc)

       hb_report will (probably) complain if it can't figure out what you
       mean.

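       If in doubt whether a particular expression will be understood, you
       can feed it to one of the modules hb_report uses (a quick sketch with
       Date::Parse; it prints the parsed epoch time or a complaint):

           # perl -MDate::Parse -e 'print str2time("2007/9/1 2pm") || "cannot parse", "\n"'
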
       Try to delimit the event as closely as possible in order to reduce
       the size of the report, but still leave a minute or two around it for
       good measure.

       -f is not optional. And don't forget to quote dates when they contain
       spaces.

SHOULD I SEND ALL THIS TO THE REST OF INTERNET?
       By default, the sensitive data in the CIB and the PE files is not
       mangled by hb_report because that would make the PE input files
       mostly useless. If you still have no other option but to send the
       report to a public mailing list and do not want the sensitive data to
       be included, use the -s option. Without this option, hb_report will
       issue a warning if it finds information which should not be exposed.
       By default, parameters matching passw.* are considered sensitive. Use
       the -p option to specify additional regular expressions matching
       variable names which may contain information you don't want to leak.
       For example:

           # hb_report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report

       Heartbeat's ha.cf is always sanitized. Logs and other files are not
       filtered.

LOGS
       It may be tricky to find the syslog logs. The scheme used is to log a
       unique message on all nodes and then look it up in the usual syslog
       locations. This procedure is not foolproof, in particular if the
       syslog files are in a non-standard directory. We look in /var/log,
       /var/logs, /var/syslog, /var/adm, /var/log/ha, and /var/log/cluster.
       In case we cannot find the logs, please supply their location:

           # hb_report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1

       If you have different log locations on different nodes, well, perhaps
       you'd like to make them the same and make life easier for everybody.

       Files starting with "ha-" are preferred. In case syslog sends
       messages to more than one file, and one of them is named ha-log or
       ha-debug, those files are favoured over syslog or messages.

       hb_report also supports archived logs in case the specified period
       extends that far into the past. The archives must reside in the same
       directory as the current log, and their names must be prefixed with
       the name of the current log (syslog-1.gz or messages-20090105.bz2).

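       For example, a log directory satisfying this convention might look
       like the following (the file names are only an illustration):

           # ls /var/log/ha-log*
           /var/log/ha-log  /var/log/ha-log-1.gz  /var/log/ha-log-20090105.bz2
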
       If there is no separate log for the cluster, possibly unrelated
       messages from other programs will be included as well. We don't
       filter logs, but just pick a segment for the period you specified.

MANUAL REPORT COLLECTION
       So, your ssh doesn't work. In that case, you will have to run this
       procedure on all nodes. Use -S so that hb_report doesn't bother with
       ssh:

           # hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1

       If you also have a log host which is not in the cluster, then you
       will have to copy the log to one of the nodes and tell us where it
       is:

           # hb_report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1

       If you reconsider and want the ssh setup, take a look at the CTS
       README file for instructions.

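       For reference, a minimal passwordless ssh setup between cluster nodes
       usually amounts to something like the following (the node names are
       only placeholders; see the CTS README for the authoritative
       procedure):

           node1# ssh-keygen -t rsa      # accept defaults, empty passphrase
           node1# ssh-copy-id root@node2
           node1# ssh-copy-id root@node3
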
OPERATION
       hb_report collects files and other information in a fairly
       straightforward way. The most complex tasks are discovering the log
       file locations (if syslog is used, which is the most common case) and
       coordinating the operation on multiple nodes.

       The instance of hb_report running on the host where it was invoked is
       the master instance. Instances running on the other nodes are slave
       instances. The master instance communicates with the slave instances
       over ssh. There are multiple ssh invocations per run, so it is
       essential that ssh works without a password, i.e. with public key
       authentication and authorized_keys.

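       Whether passwordless ssh is in place can be verified beforehand with
       a non-interactive test such as (the node name is a placeholder):

           # ssh -o BatchMode=yes node2 true && echo "ssh to node2 OK"
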
       The operation consists of three phases. Each phase must finish on all
       nodes before the next one can commence. The first phase consists of
       logging unique messages through syslog on all nodes. This is the
       shortest of all phases.

       The second phase is the most involved. During this phase all local
       information is collected, which includes:

       ·   logs (both current and archived, if the start time is far in the
           past)

       ·   various configuration files (openais, heartbeat, logd)

       ·   the CIB (both as XML and as represented by the crm shell; see the
           sketch after this list)

       ·   pengine inputs (if this node was the DC at any point in time over
           the given period)

       ·   system information and status

       ·   package information and status

       ·   dlm lock information

       ·   backtraces (if there were core dumps)

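       For orientation, the two CIB representations in the report roughly
       correspond to what the following commands print on a cluster node (a
       sketch; the exact commands hb_report runs may differ):

           # cibadmin -Q           # the CIB as XML (cib.xml)
           # crm configure show    # the CIB as seen by the crm shell (cib.txt)
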
       The third phase is collecting information from all nodes and
       analyzing it. The analysis consists of the following tasks:

       ·   identify files which are equal on all nodes and may therefore be
           moved to the top directory

       ·   save log messages matching user defined patterns (defaults to
           ERRORs and CRITical conditions)

       ·   report whether there were core dumps and which programs produced
           them

       ·   report crm_verify(8) results (see the sketch after this list)

       ·   save log messages matching major events to events.txt

       ·   in case logging is configured without a loghost, combine the node
           logs and events files using a perl utility

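       The crm_verify check can also be repeated by hand against a CIB copy
       taken from the report, for example (the path is only a placeholder):

           # crm_verify -V -x /tmp/report/node1/cib.xml
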
BUGS
       Finding logs may at times be extremely difficult, depending on how
       weird the syslog configuration is. It would be nice to ask the
       syslog-ng developers to provide a way to find out the log destination
       based on facility and priority.

       If you think you have found a bug, please rerun with the -v option
       and attach the output to bugzilla.

       hb_report can function in a satisfactory way only if ssh works to all
       nodes using authorized_keys (without a password).

       There are way too many options.

AUTHOR
       Written by Dejan Muhamedagic, <dejan@suse.de[2]>

RESOURCES
       Pacemaker: http://clusterlabs.org/

       Heartbeat and other Linux HA resources: http://linux-ha.org/wiki

       OpenAIS: http://www.openais.org/

       Corosync: http://www.corosync.org/

SEE ALSO
       Date::Parse(3)

COPYING
       Copyright (C) 2007-2009 Dejan Muhamedagic. Free use of this software
       is granted under the terms of the GNU General Public License (GPL).

NOTES
        1. Date::Parse examples page
           http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES

        2. dejan@suse.de
           mailto:dejan@suse.de



hb_report 1.2                     06/22/2012                      HB_REPORT(8)