MCELOG(8) Linux's Administrator's Manual MCELOG(8)
2
3
4
6 mcelog - Decode kernel machine check log on x86 machines
7
9 mcelog [options] [device]
10 mcelog [options] --daemon
11 mcelog [options] --client
12 mcelog [options] --ascii
13 mcelog [options] --is-cpu-supported
14 mcelog --version
15 mcelog --supported
16
x86 CPUs report errors detected by the CPU as machine check events
(MCEs). These can be data corruption detected in the CPU caches, in
main memory by an integrated memory controller, data transfer errors on
the front side bus or CPU interconnect, or other internal errors.
Possible causes include cosmic radiation, unstable power supplies,
cooling problems, broken hardware, running systems out of
specification, or bad luck.
25
26 Most errors can be corrected by the CPU by internal error correction
27 mechanisms. Uncorrected errors cause machine check exceptions which may
28 kill processes or panic the machine. A small number of corrected errors
29 is usually not a cause for worry, but a large number can indicate
30 future failure.
31
When a corrected or recovered error happens, the x86 kernel writes a
record describing the MCE into an internal ring buffer available
through the /dev/mcelog device. mcelog retrieves errors from
/dev/mcelog, decodes them into a human-readable format and prints them
on the standard output or optionally into the system log.
37
Optionally it can also take further actions, such as keeping statistics
or triggering shell scripts on specific events. By default mcelog
supports offlining memory pages with persistent corrected errors,
offlining CPU cores if they develop cache problems, and otherwise
logging specific events to the system log after they cross a threshold.
43
The normal operating modes for mcelog are running as a regular cron job
(traditional way, deprecated), running as a trigger directly executed
by the kernel, or running as a daemon with the --daemon option.
47
When an uncorrected machine check error happens that the kernel cannot
recover from, it will usually panic the system. If a warm reset follows
the panic, mcelog should pick up the machine check errors after reboot.
This is not possible after a cold reset.
53
54 In addition mcelog can be used on the command line to decode the kernel
55 output for a fatal machine check panic in text format using the --ascii
56 option. This is typically used to decode the panic console output of a
57 fatal machine check, if the system was power cycled or mcelog didn't
58 run immediately after reboot.
59
60 When the panic triggers a kdump kexec crash kernel the crash kernel
61 boot up script should log the machine checks to disk, otherwise they
62 might be lost.
63
Note that after mcelog retrieves an error the kernel no longer stores
it (unlike dmesg(1)), so the output should always be saved somewhere
and mcelog should not be run in uncontrolled ways.
67
68 When invoked with the --is-cpu-supported option mcelog exits with code
69 0 if the current CPU is supported, 1 otherwise.
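
A script can branch on this exit status. The following Python sketch
assumes mcelog is on the PATH; the helper name cpu_supported and its
run parameter are illustrative and not part of mcelog.

```python
import subprocess


def cpu_supported(run=subprocess.run):
    """Return True if `mcelog --is-cpu-supported` exits with status 0."""
    # `run` is a hypothetical hook so the call can be replaced by a stub.
    result = run(["mcelog", "--is-cpu-supported"])
    # Per the text above: 0 means supported, 1 means not supported.
    return result.returncode == 0
```

A startup script could call cpu_supported() and skip launching the
daemon when it returns False.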
70
71
When the --syslog option is specified, output is redirected to the
system log. The --syslog-error option causes the normal machine checks
to be logged as LOG_ERR (implies --syslog). Normally only fatal errors
or high level remarks are logged with error level. High level one-line
summaries of specific errors are also logged to the syslog by default
unless mcelog operates in --ascii mode.
79
When the --logfile=file option is specified, log output is appended to
the specified file. With the --no-syslog option mcelog will never log
anything to the syslog.
83
When the --cpu=cputype option is specified, events are decoded for CPU
type cputype. See mcelog --help for a list of valid CPUs. Note that
specifying an incorrect CPU can lead to incorrect decoding output. The
default is either the CPU of the machine that reported the machine
check (needs a newer kernel version) or the CPU of the machine mcelog
is running on, so normally this option doesn't have to be used. Older
versions of mcelog had separate options for different CPU types. These
are still implemented, but deprecated and undocumented now.
92
93 With the --dmi option mcelog will look up the DIMMs reported in machine
94 checks in the SMBIOS/DMI tables of the BIOS and map the DIMMs to board
95 identifiers. This only works when the BIOS reports the identifiers
96 correctly. Unfortunately often the information reported by the BIOS is
97 either subtly or obviously wrong or useless. This option requires that
98 mcelog has read access to /dev/mem (normally requires root) and runs on
99 the same machine in the same hardware configuration as when the machine
100 check event happened.
101
When --ignorenodev is specified, mcelog will exit silently when the
device cannot be opened. This is useful in virtualized environments
with limited devices.
105
106 When --filter is specified mcelog will filter out known broken machine
107 check events (default on). When the --no-filter option is specified
108 mcelog does not filter events.
109
When --raw is specified mcelog will not decode, but just dump the
mcelog in a raw hex format. This can be useful for automatic
post-processing.
113
114 When a device is specified the machine check logs are read from device
115 instead of the default /dev/mcelog.
116
With the --ascii option mcelog decodes a fatal machine check panic
generated by the kernel ("CPU n: Machine Check Exception ...") in ASCII
from standard input and exits afterwards. Note that when the panic
comes from a different machine than the one mcelog is running on, you
might need to specify the correct cputype on older kernels. On newer
kernels which output the PROCESSOR field this is not needed anymore.
123
124 When the --file filename option is specified mcelog --ascii will read
125 the ASCII machine check record from input file filename instead of
126 standard input.
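
As an illustration of the standard input path, the Python sketch below
feeds saved panic console text to mcelog --ascii and returns the
decoded output. It assumes mcelog is on the PATH; decode_panic and its
run parameter are hypothetical names introduced here.

```python
import subprocess


def decode_panic(panic_text, run=subprocess.run):
    """Pipe panic console text into `mcelog --ascii`, return its stdout."""
    # `run` is a hypothetical hook so the call can be replaced by a stub.
    result = run(
        ["mcelog", "--ascii"],
        input=panic_text,      # text is sent to mcelog on standard input
        capture_output=True,
        text=True,
    )
    return result.stdout
```

To read from a file instead, the argument list would become
["mcelog", "--ascii", "--file", filename] and the input argument would
be dropped.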
127
With the --config-file file option mcelog reads the specified config
file. Default is /etc/mcelog/mcelog.conf. See also CONFIG FILE below.
130
With the --daemon option mcelog will run in the background. This gives
the fastest reaction time and is the recommended operating mode. If an
output option isn't selected (--logfile or --syslog or --syslog-error),
this option implies --logfile=/var/log/mcelog. Important messages
will be logged as one-liner summaries to syslog unless --no-syslog is
given. The option --foreground will prevent mcelog from giving up the
terminal in daemon mode. This is intended for debugging.
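
The default output rule for daemon mode can be restated as a small
Python sketch. This is only a reading of the paragraph above, not
mcelog's actual option handling; the function and parameter names are
illustrative.

```python
DEFAULT_DAEMON_LOGFILE = "/var/log/mcelog"


def daemon_logfile(logfile=None, syslog=False, syslog_error=False):
    """Return the log file daemon mode would use, or None for syslog only."""
    if logfile is not None:
        # An explicit --logfile=file always wins.
        return logfile
    if syslog or syslog_error:
        # An output option was selected, so no log file is implied.
        return None
    # No output option selected: --daemon implies the default log file.
    return DEFAULT_DAEMON_LOGFILE
```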
138
139 With the --client option mcelog will query a running daemon for accumu‐
140 lated errors.
141
With the --cpumhz=mhz option assume the CPU has mhz frequency for
decoding the time of the event using the CPU time stamp counter. This
also forces decoding. Note this can be unreliable on some systems
with CPU frequency scaling or deep C states, where the CPU time stamp
counter does not increase linearly. By default the frequency of the
current CPU is used when mcelog determines it is safe to use. Newer
kernels report the time directly in the event and don't need this
anymore.
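
The arithmetic behind a fixed-frequency assumption is simple: a time
stamp counter value divided by ticks per second gives elapsed seconds.
The Python sketch below shows only that calculation; it is not
mcelog's decoding code, and tsc_to_seconds is a hypothetical name.

```python
def tsc_to_seconds(tsc, mhz):
    """Convert a time stamp counter value to seconds at a fixed frequency."""
    if mhz <= 0:
        raise ValueError("mhz must be positive")
    # mhz million ticks happen every second at a constant frequency.
    return tsc / (mhz * 1_000_000)
```

For example, 3,000,000,000 ticks at 3000 MHz correspond to one second.
If the counter did not advance at that constant rate, the result would
be off by the same proportion, which is why the option can be
unreliable.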
150
151 The --pidfile file option writes the process id of the daemon into file
152 file. Only valid in daemon mode.
153
Mcelog will enable extended error reporting from the memory controller
on processors that support it unless you tell it not to with the
--no-imc-log option. You might need this option when decoding old logs
from a system where this mode was not enabled.
158
164
165 --version displays the version of mcelog and exits.
166
167 --supported returns 0 if the system has processors which support MCE,
168 and 1 otherwise.
169
170
CONFIG FILE
172 mcelog supports a config file to set defaults. Command line options
173 override the config file. By default the config file is read from
174 /etc/mcelog/mcelog.conf unless overridden with the --config-file
175 option.
176
The general format is optionname = value. White space is not allowed in
value currently, except at the end where it is dropped. Comments start
with #.
180
All command line options that are not commands can be specified in the
config file. For example, to enable the --no-syslog option use
no-syslog = yes (or no to disable). When the option takes an argument,
give it as the value, for example logfile = /tmp/logfile
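
To make the format concrete, here is a minimal Python sketch that reads
optionname = value lines and skips comments. It is not mcelog's own
parser and covers only the flat format described above. Treating a #
anywhere on a line as the start of a comment is an assumption, and
parse_config and SAMPLE are names made up for this example.

```python
def parse_config(text):
    """Parse `optionname = value` lines into a dict of strings."""
    options = {}
    for raw in text.splitlines():
        # Assumption: everything from '#' to end of line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Trailing white space in the value is dropped, as described above.
        options[name.strip()] = value.strip()
    return options


SAMPLE = """\
# keep machine checks out of the syslog
no-syslog = yes
logfile = /tmp/logfile
"""
```

Calling parse_config(SAMPLE) yields the two options with the values
"yes" and "/tmp/logfile".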
185
186 For more information on the config file please see mcelog.conf(5).
187
188
190 The kernel prefers old messages over new. If the log buffer overflows
191 only old ones will be kept.
192
193 The exact output in the log file depends on the CPU, unless the --raw
194 option is used.
195
196 mcelog will report serious errors to the syslog during decoding.
197
198
200 When mcelog runs in daemon mode and receives a SIGUSR1 it will close
201 and reopen the log files. This can be used to rotate logs without
202 restarting the daemon.
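
A log rotation hook could send the signal as in the Python sketch
below. It assumes the daemon was started with a pid file at
/var/run/mcelog.pid (the path listed under files below); reopen_logs
and its kill parameter are illustrative names.

```python
import os
import signal


def reopen_logs(pidfile="/var/run/mcelog.pid", kill=os.kill):
    """Read the daemon's pid from pidfile and send it SIGUSR1."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    # `kill` is a hypothetical hook so the call can be replaced by a stub.
    kill(pid, signal.SIGUSR1)
    return pid
```

The rotation tool would first rename the old log file and then call
reopen_logs(), so that the daemon starts writing to a fresh file.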
203
204
206 /dev/mcelog (char 10, minor 227)
207
208 /etc/mcelog/mcelog.conf
209
210 /var/log/mcelog
211
212 /var/run/mcelog.pid
213
214
216 mcelog.conf(5), mcelog.triggers(5)
217
218 http://www.mcelog.org
219
220 AMD x86-64 architecture programmer's manual, Volume 2, System program‐
221 ming
222
223 Intel 64 and IA32 Architectures Software Developer's manual, Volume 3,
224 System programming guide Chapter 15 and 16. http://www.intel.com/sdm
225
226 Datasheet of your CPU.
227
228
229
230 Mar 2015 MCELOG(8)