MCELOG(8)              Linux's Administrator's Manual              MCELOG(8)

NAME
mcelog - Decode kernel machine check log on x86 machines

SYNOPSIS
mcelog [options] [device]
mcelog [options] --daemon
mcelog [options] --client
mcelog [options] --ascii
mcelog --version

DESCRIPTION
X86 CPUs report errors detected by the CPU as machine check events
(MCEs). These can be data corruption detected in the CPU caches or in
main memory by an integrated memory controller, data transfer errors on
the front side bus or CPU interconnect, or other internal errors.
Possible causes include cosmic radiation, unstable power supplies,
cooling problems, broken hardware, or bad luck.

Most errors can be corrected by the CPU through internal error
correction mechanisms. Uncorrected errors cause machine check
exceptions, which may panic the machine.

When a corrected error happens, the x86 kernel writes a record
describing the MCE into an internal ring buffer available through the
/dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes
them into a human-readable format, and prints them on standard output
or optionally into the system log.

Optionally, mcelog can also take further actions, such as keeping
statistics or triggering shell scripts on specific events.

The normal operating modes for mcelog are running as a regular cron job
(the traditional way, now deprecated), running as a trigger directly
executed by the kernel, or running as a daemon with the --daemon option.
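
For the trigger mode, the kernel is told which program to execute by
writing that program's path into the trigger file listed under FILES.
A minimal sketch follows. It assumes mcelog is installed as
/usr/sbin/mcelog (the install path is an assumption), and
set_mce_trigger is a hypothetical helper, not part of mcelog.

```shell
# Hypothetical helper: register a program as the machine check trigger.
#   $1  path of the program the kernel should execute (assumed install path)
#   $2  trigger file; defaults to the sysfs file listed under FILES
set_mce_trigger() {
    prog=$1
    trigger=${2:-/sys/devices/system/machinecheck/machinecheck0/trigger}
    printf '%s\n' "$prog" > "$trigger"
}
```

Run as root, this might be called as: set_mce_trigger /usr/sbin/mcelog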

When an uncorrected machine check error happens that the kernel cannot
recover from, it will usually panic the system. If a warm reset follows
the panic, mcelog should pick up the machine check errors after reboot.
This is not possible after a cold reset.

In addition, mcelog can be used on the command line to decode the
kernel output for a fatal machine check panic in text format using the
--ascii option. This is typically used to decode the panic console
output of a fatal machine check if the system was power cycled or
mcelog didn't run immediately after reboot.

When the panic triggers a kdump kexec crash kernel, the crash kernel
boot-up script should log the machine checks to disk; otherwise they
might be lost.

Note that after mcelog retrieves an error the kernel no longer stores
it (unlike dmesg(1)), so the output should always be saved somewhere
and mcelog should not be run in uncontrolled ways.
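
Because a retrieved event is gone from the kernel, a one-shot run
(for example from cron or a crash kernel boot script) should append its
output to a persistent file. A minimal sketch, using only options
documented under OPTIONS; save_mce_events is a hypothetical helper and
the log file location is an assumption.

```shell
# Hypothetical helper: read pending events once and append the decoded
# output to a persistent log so that nothing is lost.
#   $1  log file to append to (any writable path; the name is an assumption)
save_mce_events() {
    mcelog --ignorenodev --filter >> "$1"
}
```

A possible call is: save_mce_events /var/log/mcelog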

OPTIONS
When the --syslog option is specified, output is redirected to the
system log. The --syslog-error option causes normal machine checks to
be logged as LOG_ERR (implies --syslog). Normally only fatal errors or
high level remarks are logged with error level. High level one-line
summaries of specific errors are also logged to the syslog by default
unless mcelog operates in --ascii mode.

When the --logfile=file option is specified, log output is appended to
the specified file. With the --no-syslog option mcelog will never log
anything to the syslog.

When the --cpu=cputype option is specified, events are decoded for the
CPU type cputype. See mcelog --help for a list of valid CPUs. Note that
specifying an incorrect CPU can lead to incorrect decoding output. The
default is either the CPU of the machine that reported the machine
check (needs a newer kernel version) or the CPU of the machine mcelog
is running on, so normally this option doesn't have to be used. Older
versions of mcelog had separate options for different CPU types. These
are still implemented, but are now deprecated and undocumented.

With the --dmi option mcelog will look up the addresses reported in
machine checks in the SMBIOS/DMI tables of the BIOS. This can sometimes
tell you which DIMM or memory controller has developed a problem. More
often the information reported by the BIOS is either subtly or
obviously wrong or useless. This option requires that mcelog has read
access to /dev/mem (normally requires root) and runs on the same
machine in the same hardware configuration as when the machine check
event happened.

When --ignorenodev is specified, mcelog will exit silently when the
device cannot be opened. This is useful in virtualized environments
with limited devices.

When --filter is specified, mcelog will filter out known broken machine
check events (default on). When the --no-filter option is specified,
mcelog does not filter events.

When --raw is specified, mcelog will not decode events but just dump
the mcelog in a raw hex format. This can be useful for automatic
post-processing.

When a device is specified, the machine check logs are read from device
instead of the default /dev/mcelog.

With the --ascii option mcelog decodes a fatal machine check panic
generated by the kernel ("CPU n: Machine Check Exception ...") in ASCII
from standard input and exits afterwards. Note that when the panic
comes from a different machine than the one mcelog is running on, you
might need to specify the correct cputype on older kernels. On newer
kernels, which output the PROCESSOR field, this is no longer needed.

When the --file filename option is specified, mcelog --ascii will read
the ASCII machine check record from the input file filename instead of
standard input.
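
Decoding a saved panic message could look like the sketch below. It
uses only the --ascii, --file and --cpu options described above;
decode_panic is a hypothetical helper and the file name in the usage
line is made up.

```shell
# Hypothetical helper: decode a saved fatal machine check panic message.
#   $1  file holding the console output ("CPU n: Machine Check Exception ...")
#   $2  optional cputype, only needed when an older kernel produced the panic
decode_panic() {
    if [ -n "${2:-}" ]; then
        mcelog --cpu="$2" --file "$1" --ascii
    else
        mcelog --file "$1" --ascii
    fi
}
```

A possible call is: decode_panic panic.txt. Without --file, the same
record can be fed on standard input, as in mcelog --ascii < panic.txt.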

With the --config-file file option mcelog reads the specified config
file. The default is /etc/mcelog.conf. See also CONFIG FILE below.

With the --daemon option mcelog will run in the background. This gives
the fastest reaction time and is the recommended operating mode. This
option implies --syslog. The option --foreground will prevent mcelog
from giving up the terminal in daemon mode. This is intended for
debugging.

With the --client option mcelog will query a running daemon for
accumulated errors.

With the --cpumhz=mhz option, mcelog assumes the CPU runs at mhz MHz
when decoding the time of the event from the CPU time stamp counter.
This also forces decoding. Note that this can be unreliable on some
systems with CPU frequency scaling or deep C states, where the CPU time
stamp counter does not increase linearly. By default the frequency of
the current CPU is used when mcelog determines it is safe to use. Newer
kernels report the time directly in the event and no longer need this.
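
The conversion behind --cpumhz amounts, in essence, to dividing the
time stamp counter value by the clock rate. The sketch below shows only
that arithmetic with made-up numbers; it is not mcelog's actual code,
tsc_to_seconds is a hypothetical helper, and it assumes a shell with
64-bit integer arithmetic and a counter that ticks at a constant rate.

```shell
# Hypothetical helper: convert a time stamp counter value to whole seconds,
# assuming the counter ticks at a constant rate of mhz MHz.
#   $1  TSC value (ticks)
#   $2  CPU frequency in MHz
tsc_to_seconds() {
    echo $(( $1 / ($2 * 1000000) ))
}

tsc_to_seconds 7200000000000 2400   # prints 3000
```

If the real frequency varied while the counter was running, the result
is off by the same proportion, which is why the option is described as
unreliable on such systems.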

The --pidfile file option writes the process ID of the daemon into
file. It is only valid in daemon mode.
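
The daemon and client modes might be combined as in the following
sketch. It uses only the --daemon, --pidfile and --client options
described above; the helper names are hypothetical and the pid file
location in the usage line is an assumption.

```shell
# Hypothetical helper: start mcelog in the background and record its pid.
#   $1  pid file path (the location is an assumption)
start_mcelog_daemon() {
    mcelog --pidfile "$1" --daemon
}

# Hypothetical helper: ask the running daemon for the errors it has
# accumulated so far.
query_mcelog_daemon() {
    mcelog --client
}
```

A possible sequence, run as root, is start_mcelog_daemon
/var/run/mcelog.pid followed later by query_mcelog_daemon.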

--version displays the version of mcelog and exits.

CONFIG FILE
mcelog supports a config file to set defaults. Command line options
override the config file. By default the config file is read from
/etc/mcelog.conf unless overridden with the --config-file option.

The general format is optionname = value. White space is currently not
allowed in value, except at the end, where it is dropped. Comments
start with #.

All command line options that are not commands can be specified in the
config file. For example, to enable the --no-syslog option use
no-syslog = yes (or no to disable). When the option has an argument,
use, for example, logfile = /tmp/logfile.
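
The format rules above can be illustrated with a small reader for such
a file. This is a sketch only: conf_get is a hypothetical helper, not
mcelog's own parser, and letting the last setting of an option win is
an assumption.

```shell
# Hypothetical helper, not mcelog's own parser: print the value of option
# "$2" in config file "$1", following the format rules described above.
conf_get() {
    # Drop comments, then split each remaining line at the first "=".
    sed 's/#.*$//' "$1" | awk -v key="$2" '
        {
            pos = index($0, "=")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) found = value   # assumption: last setting wins
        }
        END { if (found != "") print found }
    '
}
```

For a file containing the lines no-syslog = yes and
logfile = /tmp/logfile, calling conf_get with the option name logfile
prints /tmp/logfile.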

NOTES
The kernel prefers old messages over new. If the log buffer overflows,
only old ones will be kept.

The exact output in the log file depends on the CPU, unless the --raw
option is used.

mcelog will report serious errors to the syslog during decoding.

FILES
/dev/mcelog (char 10, minor 227)

/etc/mcelog/mcelog.conf

/sys/devices/system/machinecheck/machinecheck0/trigger
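
On a system where the device node is missing, it can be created with
mknod(1) from the numbers given for /dev/mcelog above. A minimal
sketch; make_mcelog_node is a hypothetical helper, and on most systems
the node is expected to exist already.

```shell
# Hypothetical helper: create the mcelog character device if it is missing.
#   $1  device path, defaults to /dev/mcelog
# Uses the numbers from the FILES entry: character device 10, minor 227.
make_mcelog_node() {
    node=${1:-/dev/mcelog}
    [ -e "$node" ] || mknod "$node" c 10 227
}
```

Creating device nodes normally requires root.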

SEE ALSO
AMD x86-64 Architecture Programmer's Manual, Volume 2, System
Programming

Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3,
System Programming Guide, Parts 1 and 2. Machine checks are described
in Chapter 14 in Part 1 and in Appendix E in Part 2.

Datasheet of your CPU.

May 2009                                                         MCELOG(8)