MCELOG(8) Linux's Administrator's Manual MCELOG(8)

NAME
mcelog - Decode kernel machine check log on x86 machines

SYNOPSIS
mcelog [options] [device]
mcelog [options] --daemon
mcelog [options] --client
mcelog [options] --ascii
mcelog [options] --is-cpu-supported
mcelog --version

DESCRIPTION
X86 CPUs report errors detected by the CPU as machine check events
(MCEs). These can be data corruption detected in the CPU caches, in
main memory by an integrated memory controller, data transfer errors on
the front side bus or CPU interconnect, or other internal errors.
Possible causes include cosmic radiation, unstable power supplies,
cooling problems, broken hardware, running systems out of
specification, or bad luck.

Most errors are corrected by the CPU's internal error correction
mechanisms. Uncorrected errors cause machine check exceptions, which
may kill processes or panic the machine. A small number of corrected
errors is usually no cause for worry, but a large number can indicate
future failure.

When a corrected or recovered error happens, the x86 kernel writes a
record describing the MCE into an internal ring buffer available
through the /dev/mcelog device. mcelog retrieves errors from
/dev/mcelog, decodes them into a human readable format and prints them
on the standard output or optionally into the system log.

Optionally it can also take further actions, such as keeping statistics
or triggering shell scripts on specific events. By default mcelog
supports offlining memory pages with persistent corrected errors,
offlining CPU cores that have developed cache problems, and otherwise
logging specific events to the system log once they cross a threshold.

The normal operating modes for mcelog are running as a regular cron job
(traditional way, deprecated), running as a trigger directly executed
by the kernel, or running as a daemon with the --daemon option.

When an uncorrected machine check error happens that the kernel cannot
recover from, it will usually panic the system. If a warm reset follows
the panic, mcelog should pick up the machine check errors after reboot.
This is not possible after a cold reset.

In addition mcelog can be used on the command line to decode the kernel
output for a fatal machine check panic in text format using the --ascii
option. This is typically used to decode the panic console output of a
fatal machine check, if the system was power cycled or mcelog didn't
run immediately after reboot.

When the panic triggers a kdump kexec crash kernel, the crash kernel's
boot-up script should log the machine checks to disk; otherwise they
might be lost.

Note that once mcelog retrieves an error the kernel no longer stores it
(unlike dmesg(1)), so the output should always be saved somewhere and
mcelog should not be run in uncontrolled ways.
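
One way to follow this advice is to append every retrieval to an
archive file. The following is a minimal sketch for a POSIX shell, not
a recommended deployment. The function name collect_mce and the MCELOG
override variable are illustrative assumptions and not part of mcelog;
only the --ignorenodev and --filter options described under OPTIONS are
used.

```shell
#!/bin/sh
# MCELOG is a hypothetical override hook (not an mcelog feature); it lets
# the helper be pointed at a different binary or at a stub.
: "${MCELOG:=mcelog}"

# Append pending machine check records to an archive file.
# $1: archive path; /var/log/mcelog is assumed when omitted.
collect_mce() {
    out=${1:-/var/log/mcelog}
    # --ignorenodev: exit silently when the device cannot be opened
    # --filter:      drop known broken events (already the default)
    "$MCELOG" --ignorenodev --filter >>"$out"
}
```

Because the output is appended rather than overwritten, records from
earlier runs stay in the archive file.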

When invoked with the --is-cpu-supported option mcelog exits with code
0 if the current CPU is supported, 1 otherwise.
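
The exit code can be tested directly from a script. The sketch below
assumes a POSIX shell; mce_cpu_supported and the MCELOG variable are
hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# MCELOG is a hypothetical override hook (not an mcelog feature).
: "${MCELOG:=mcelog}"

# Print "supported" and return 0 when mcelog accepts the current CPU.
# Any non-zero exit status, including a missing binary, is reported
# as "unsupported" with return code 1.
mce_cpu_supported() {
    if "$MCELOG" --is-cpu-supported >/dev/null 2>&1; then
        echo "supported"
    else
        echo "unsupported"
        return 1
    fi
}
```

A start-up script could call mce_cpu_supported and skip launching the
daemon when it returns non-zero.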

OPTIONS
When the --syslog option is specified, output is redirected to the
system log. The --syslog-error option causes the normal machine checks
to be logged as LOG_ERR (implies --syslog). Normally only fatal errors
or high level remarks are logged with error level. High level one line
summaries of specific errors are also logged to the syslog by default
unless mcelog operates in --ascii mode.

When the --logfile=file option is specified, log output is appended to
the specified file. With the --no-syslog option mcelog will never log
anything to the syslog.

When the --cpu=cputype option is specified, the CPU type used for
decoding is set to cputype. See mcelog --help for a list of valid CPUs.
Note that specifying an incorrect CPU can lead to incorrect decoding
output. The default is either the CPU of the machine that reported the
machine check (needs a newer kernel version) or the CPU of the machine
mcelog is running on, so normally this option doesn't have to be used.
Older versions of mcelog had separate options for different CPU types.
These are still implemented, but are now deprecated and undocumented.

With the --dmi option mcelog will look up the DIMMs reported in machine
checks in the SMBIOS/DMI tables of the BIOS and map the DIMMs to board
identifiers. This only works when the BIOS reports the identifiers
correctly. Unfortunately, the information reported by the BIOS is often
either subtly or obviously wrong, or useless. This option requires that
mcelog has read access to /dev/mem (normally requires root) and runs on
the same machine in the same hardware configuration as when the machine
check event happened.

When --ignorenodev is specified, mcelog will exit silently when the
device cannot be opened. This is useful in virtualized environments
with limited devices.

When --filter is specified mcelog will filter out known broken machine
check events (default on). When the --no-filter option is specified
mcelog does not filter events.

When --raw is specified mcelog will not decode, but just dump the
mcelog in a raw hex format. This can be useful for automatic
post-processing.

When a device is specified the machine check logs are read from device
instead of the default /dev/mcelog.

With the --ascii option mcelog decodes a fatal machine check panic
generated by the kernel ("CPU n: Machine Check Exception ...") in ASCII
from standard input and exits afterwards. Note that when the panic
comes from a different machine than the one mcelog is running on, you
might need to specify the correct cputype on older kernels. On newer
kernels, which output the PROCESSOR field, this is no longer needed.

When the --file filename option is specified mcelog --ascii will read
the ASCII machine check record from input file filename instead of
standard input.
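
As an illustration, a panic message copied from a serial console into a
text file could be decoded with a small wrapper. This is a sketch
assuming a POSIX shell; decode_panic and the MCELOG variable are
hypothetical names, and the readability check is an addition of the
sketch rather than mcelog behaviour.

```shell
#!/bin/sh
# MCELOG is a hypothetical override hook (not an mcelog feature).
: "${MCELOG:=mcelog}"

# Decode a saved panic console log.
# $1: path to a text file holding the "Machine Check Exception" lines.
decode_panic() {
    if [ ! -r "$1" ]; then
        echo "decode_panic: cannot read $1" >&2
        return 2
    fi
    # --ascii selects text decoding; --file replaces standard input.
    "$MCELOG" --ascii --file "$1"
}
```

Without the wrapper the same decoding can be done by feeding the file
to mcelog --ascii on standard input.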

With the --config-file file option mcelog reads the specified config
file. The default is /etc/mcelog/mcelog.conf. See also CONFIG FILE
below.

With the --daemon option mcelog will run in the background. This gives
the fastest reaction time and is the recommended operating mode. If no
output option is selected (--logfile, --syslog or --syslog-error), this
option implies --logfile=/var/log/mcelog. Important messages will be
logged as one-liner summaries to syslog unless --no-syslog is given.
The option --foreground will prevent mcelog from giving up the terminal
in daemon mode. This is intended for debugging.

With the --client option mcelog will query a running daemon for
accumulated errors.

With the --cpumhz=mhz option mcelog assumes the CPU has a frequency of
mhz when decoding the time of the event using the CPU time stamp
counter. This also forces decoding. Note that this can be unreliable on
some systems with CPU frequency scaling or deep C states, where the CPU
time stamp counter does not increase linearly. By default the frequency
of the current CPU is used when mcelog determines it is safe to use.
Newer kernels report the time directly in the event and don't need this
anymore.

The --pidfile file option writes the process id of the daemon into file
file. Only valid in daemon mode.
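
The daemon options above can be combined into one invocation. The
following sketch assumes a POSIX shell and uses the paths listed under
FILES as defaults; start_mce_daemon and the MCELOG variable are
hypothetical names, not part of mcelog.

```shell
#!/bin/sh
# MCELOG is a hypothetical override hook (not an mcelog feature).
: "${MCELOG:=mcelog}"

# Start the daemon with an explicit pid file and log file.
# $1: pid file, $2: log file; the defaults mirror the FILES section.
start_mce_daemon() {
    pidfile=${1:-/var/run/mcelog.pid}
    logfile=${2:-/var/log/mcelog}
    "$MCELOG" --daemon --pidfile "$pidfile" --logfile="$logfile"
}
```

For debugging, --foreground can be added to the same command line to
keep the daemon attached to the terminal.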

mcelog will enable extended error reporting from the memory controller
on processors that support it unless you tell it not to with the
--no-imc-log option. You might need this option when decoding old logs
from a system where this mode was not enabled.

--version displays the version of mcelog and exits.

CONFIG FILE
mcelog supports a config file to set defaults. Command line options
override the config file. By default the config file is read from
/etc/mcelog/mcelog.conf unless overridden with the --config-file
option.

The general format is optionname = value. White space is currently not
allowed in value, except at the end, where it is dropped. Comments
start with #.

All command line options that are not commands can be specified in the
config file. For example, to enable the --no-syslog option use
no-syslog = yes (or no to disable). When the option has an argument use
logfile = /tmp/logfile
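
Put together, the two settings above form a small config file. The
sketch below writes one from a POSIX shell so that it can be tried with
--config-file without touching /etc/mcelog/mcelog.conf. The function
name write_example_conf is hypothetical, and comments are kept on their
own lines because the text above does not say whether trailing comments
are accepted.

```shell
#!/bin/sh
# Write a two-option example config to the path given as $1.
write_example_conf() {
    cat >"$1" <<'EOF'
# comments start with '#'
# same effect as --no-syslog
no-syslog = yes
# same effect as --logfile=/tmp/logfile
logfile = /tmp/logfile
EOF
}
```

After write_example_conf /tmp/mcelog.conf (an arbitrary scratch path),
the file could be passed to mcelog with --config-file /tmp/mcelog.conf.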

For more information on the config file please see mcelog.conf(5).

NOTES
The kernel prefers old messages over new. If the log buffer overflows,
only old ones will be kept.

The exact output in the log file depends on the CPU, unless the --raw
option is used.

mcelog will report serious errors to the syslog during decoding.

SIGNALS
When mcelog runs in daemon mode and receives a SIGUSR1, it will close
and reopen the log files. This can be used to rotate logs without
restarting the daemon.
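
A rotation step built on this signal might look like the sketch below.
It assumes a POSIX shell and a daemon started with --pidfile pointing
at /var/run/mcelog.pid; rotate_mcelog is a hypothetical helper, and the
single ".1" suffix is a simplification rather than a rotation policy.

```shell
#!/bin/sh
# Rotate the daemon log: move it aside, then ask the daemon to reopen it.
# $1: log file, $2: pid file; the defaults mirror the FILES section.
rotate_mcelog() {
    log=${1:-/var/log/mcelog}
    pidfile=${2:-/var/run/mcelog.pid}
    [ -f "$log" ] || return 0        # nothing to rotate
    mv "$log" "$log.1" || return 1
    # SIGUSR1 makes the daemon close and reopen its log files.
    # When no pid file is readable the signal step is skipped.
    if [ -r "$pidfile" ]; then
        kill -USR1 "$(cat "$pidfile")"
    fi
}
```

Until the signal is delivered the daemon keeps writing to the renamed
file, so no records are lost between the two steps.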

FILES
/dev/mcelog (char 10, minor 227)

/etc/mcelog/mcelog.conf

/var/log/mcelog

/var/run/mcelog.pid

SEE ALSO
mcelog.conf(5), mcelog.triggers(5)

http://www.mcelog.org

AMD x86-64 Architecture Programmer's Manual, Volume 2, System
Programming

Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3,
System Programming Guide, Chapters 15 and 16. http://www.intel.com/sdm

Datasheet of your CPU.

Mar 2015 MCELOG(8)