mcelog.triggers(5)          File Formats Manual          mcelog.triggers(5)

NAME
mcelog.triggers - mcelog trigger scripts reference

SYNOPSIS
/etc/mcelog/bus-error-trigger
/etc/mcelog/cache-error-trigger
/etc/mcelog/dimm-error-trigger
/etc/mcelog/iomca-error-trigger
/etc/mcelog/page-error-trigger
/etc/mcelog/socket-memory-error-trigger
/etc/mcelog/unknown-error-trigger

DESCRIPTION
mcelog(8) maintains thresholds of errors using a leaky-bucket
algorithm. When the number of errors in a specific time window exceeds
a pre-configured threshold, a trigger is executed. Triggers are usually
shell scripts in the /etc/mcelog directory, but can also be other
internal actions. Thresholds and triggers can be configured in
mcelog.conf(5).
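
The threshold accounting described above can be illustrated with a
minimal sketch. It is not mcelog's implementation: the names CAPACITY,
AGETIME, and bucket_account are hypothetical, and the drain rule (one
full bucket per elapsed window) is a simplifying assumption.

```shell
#!/bin/sh
# Leaky-bucket sketch. All names and values here are illustrative
# assumptions, not mcelog internals.
CAPACITY=3        # errors that fill the bucket
AGETIME=60        # window length in seconds
bucket_count=0    # errors currently in the bucket
bucket_stamp=0    # time the bucket was last drained

# bucket_account NOW: record one error at time NOW (seconds).
# Returns 0 when the threshold is reached, 1 otherwise.
bucket_account() {
    now=$1
    elapsed=$((now - bucket_stamp))
    # Leak: drain CAPACITY errors for every full window that has passed.
    if [ "$elapsed" -ge "$AGETIME" ]; then
        drained=$((elapsed / AGETIME * CAPACITY))
        if [ "$drained" -gt "$bucket_count" ]; then
            drained=$bucket_count
        fi
        bucket_count=$((bucket_count - drained))
        bucket_stamp=$now
    fi
    bucket_count=$((bucket_count + 1))
    # Overflow: this is the point where a trigger would be executed.
    if [ "$bucket_count" -ge "$CAPACITY" ]; then
        bucket_count=0
        return 0
    fi
    return 1
}
```

With these values, three calls within 60 seconds make the third call
return 0, while errors spaced more than a window apart never fill the
bucket.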

Triggers run as the user configured for mcelog in mcelog.conf, by
default root. The default trigger action can be overridden by
specifying a different trigger script in the configuration file.
Actions in addition to the default trigger (like notifying an
administrator) can be put into the respective /etc/mcelog/*.local
script, which is executed after the default action. This allows
updating the default scripts without overriding local actions. All
trigger actions are also logged to syslog.
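
The ordering described above (default action first, then the optional
local script) could look roughly like the following sketch. The helper
run_local_hook is hypothetical, and the logger call only stands in for
a default action; the scripts shipped with mcelog may be structured
differently.

```shell
#!/bin/sh
# Sketch of a default trigger that chains to an optional local hook.
# run_local_hook is a hypothetical helper, not part of mcelog.

# run_local_hook PATH: run PATH if it exists and is executable.
# The hook inherits the exported environment of the trigger.
run_local_hook() {
    hook=$1
    if [ -x "$hook" ]; then
        "$hook"
    fi
}

# Default action first (here: a syslog entry), then the local hook.
# Nothing happens unless mcelog supplied MESSAGE.
if [ -n "${MESSAGE:-}" ]; then
    logger -t mcelog "$MESSAGE"
    run_local_hook /etc/mcelog/dimm-error-trigger.local
fi
```

Because the hook is optional, a missing or non-executable .local file
is silently skipped.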

The DIMM and socket memory error triggers

The /etc/mcelog/dimm-error-trigger and
/etc/mcelog/socket-memory-error-trigger scripts are executed when a
DIMM or a CPU socket exceeds a configured corrected or uncorrected
memory error threshold. The thresholds are configured in the
mcelog.conf [dimm] and [socket] sections. The default triggers log a
warning message in the system log. The triggers are only executed when
mcelog runs as a daemon.

Arguments are passed as environment variables:

THRESHOLD        Human readable threshold status
MESSAGE          Human readable consolidated error message
TOTALCOUNT       Total corrected or uncorrected count of errors for the
                 current DIMM, depending on what triggered the event
LOCATION         Consolidated location as a single string
DMI_LOCATION     DIMM location from DMI/SMBIOS if available
DMI_NAME         DIMM identifier from DMI/SMBIOS if available
DIMM             DIMM number reported by hardware
CHANNEL          Channel number reported by hardware
SOCKETID         Socket ID of CPU that includes the memory controller
                 with the DIMM
CECOUNT          Total corrected error count for DIMM
UCCOUNT          Total uncorrected error count for DIMM
LASTEVENT        Time stamp of event that triggered threshold (in
                 time_t format, seconds)
THRESHOLD_COUNT  Total number of events of the specific type in the
                 current threshold time period

After the default action, local actions in
/etc/mcelog/dimm-error-trigger.local or
/etc/mcelog/socket-memory-error-trigger.local, respectively, are
executed.
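
As an illustration of how a local script might use these variables,
here is a minimal sketch of a dimm-error-trigger.local. The function
names, the syslog tag mcelog-local, and the choice of priorities are
assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical /etc/mcelog/dimm-error-trigger.local sketch.

# dimm_priority: pick a syslog priority from UCCOUNT.
# Any uncorrected error escalates to "crit"; otherwise "warning".
dimm_priority() {
    case ${UCCOUNT:-0} in
        ''|*[!0-9]*) echo daemon.warning ;;  # missing or not a number
        *[1-9]*)     echo daemon.crit ;;     # at least one uncorrected
        *)           echo daemon.warning ;;  # zero
    esac
}

# dimm_summary: one-line summary built from the documented variables.
dimm_summary() {
    printf '%s at %s (corrected=%s uncorrected=%s)\n' \
        "${MESSAGE:-no message}" "${LOCATION:-unknown location}" \
        "${CECOUNT:-0}" "${UCCOUNT:-0}"
}

# Only act when mcelog actually supplied a message.
if [ -n "${MESSAGE:-}" ]; then
    logger -p "$(dimm_priority)" -t mcelog-local "$(dimm_summary)"
fi
```

Keeping the formatting in small functions makes the script easy to
exercise by hand: set the variables in a shell and call dimm_summary.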

The page error trigger

The /etc/mcelog/page-error-trigger script is executed by mcelog in
daemon mode when a page in memory exceeds a pre-configured corrected or
uncorrected error threshold. mcelog also internally implements
offlining the page through the kernel. This is configured through the
[page] section of mcelog.conf(5).

The environment arguments are the same as for the dimm-error-trigger
script.

After the default action, local actions in
/etc/mcelog/page-error-trigger.local are executed.
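
A local page trigger could, for example, keep its own record of
threshold events. The sketch below appends one tab-separated line per
event; the log path PAGE_LOG and the function name are assumptions,
not something mcelog defines.

```shell
#!/bin/sh
# Hypothetical /etc/mcelog/page-error-trigger.local sketch.
# PAGE_LOG is an assumed location; adjust it to local policy.
PAGE_LOG=${PAGE_LOG:-/var/log/mcelog-page-events.log}

# record_page_event FILE: append "time<TAB>location<TAB>count" to FILE,
# using LASTEVENT (time_t seconds), LOCATION and TOTALCOUNT.
record_page_event() {
    printf '%s\t%s\t%s\n' \
        "${LASTEVENT:-0}" "${LOCATION:-unknown}" "${TOTALCOUNT:-0}" >> "$1"
}

# Only record when mcelog actually supplied a message.
if [ -n "${MESSAGE:-}" ]; then
    record_page_event "$PAGE_LOG"
fi
```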

The cache error trigger

The /etc/mcelog/cache-error-trigger shell script is called for cache
error handling in daemon mode when a CPU reports excessive corrected
cache errors. This could be an indication of future uncorrected errors.

This trigger is configured through the [cache] section in the
mcelog.conf(5) configuration file. The threshold is defined by the CPU.
The default trigger offlines the affected CPU cores, unless it is the
last core running.

Arguments are passed as environment variables:

MESSAGE          Human readable error message
CPU              Linux CPU number that triggered the error
LEVEL            Cache level affected by error
TYPE             Cache type affected by error (Data, Instruction,
                 Generic)
AFFECTED_CPUS    List of CPUs sharing the affected cache
SOCKETID         Socket ID of affected CPU

After the default action, local actions in
/etc/mcelog/cache-error-trigger.local are executed.
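
Since the default action offlines cores, a local script may want to
report which of the affected CPUs are now offline. The sketch below
assumes AFFECTED_CPUS is a whitespace-separated list of CPU numbers and
reads the standard Linux sysfs "online" files; CPU_SYSFS and
offline_cpus are hypothetical names introduced for the example.

```shell
#!/bin/sh
# Hypothetical /etc/mcelog/cache-error-trigger.local sketch.
# Assumes AFFECTED_CPUS is a whitespace-separated list of CPU numbers.
# CPU_SYSFS can be overridden for testing.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

# offline_cpus: print the affected CPUs whose "online" file reads 0.
offline_cpus() {
    result=""
    for cpu in ${AFFECTED_CPUS:-}; do
        state_file="$CPU_SYSFS/cpu$cpu/online"
        if [ -r "$state_file" ] && [ "$(cat "$state_file")" = "0" ]; then
            # Append with a separating space except for the first entry.
            result="$result${result:+ }$cpu"
        fi
    done
    printf '%s\n' "$result"
}

# Only report when mcelog actually supplied a message.
if [ -n "${MESSAGE:-}" ]; then
    logger -t mcelog-local \
        "cache error on CPU ${CPU:-?}; offline CPUs: $(offline_cpus)"
fi
```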

The bus-uc-threshold-trigger

The bus-uc-threshold-trigger runs on uncorrected errors on an IO bus.
It is configured through the bus-uc-threshold-trigger and
bus-uc-threshold-trigger-threshold options in /etc/mcelog.conf(5). By
default it logs a message with the error location to the system log.
After the default action, local actions in
/etc/mcelog/bus-uc-error-trigger.local are executed.

Arguments are passed as environment variables:

MESSAGE          Human readable consolidated error message
LOCATION         Consolidated location as a single string
SOCKETID         Socket ID of CPU that includes the memory controller
                 with the DIMM
LEVEL            Interconnect level
PARTICIPATION    Processor participation (Originator, Responder or
                 Observer)
REQUEST          Request type (read, write, prefetch, etc.)
ORIGIN           Memory or IO
TIMEOUT          Whether the request timed out
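
The exact value format of these variables is not spelled out here, so
when writing a local script it can help to dump them first. The
following debugging sketch prints NAME=value lines for the variables
listed above; dump_trigger_env and the syslog tag are hypothetical
names.

```shell
#!/bin/sh
# Debugging sketch for a local bus trigger script: print the documented
# variables so their actual format can be inspected.

# dump_trigger_env NAME...: print NAME=value for each variable name.
# Unset variables are printed with an empty value.
dump_trigger_env() {
    for name in "$@"; do
        # Indirect expansion via eval; only pass trusted variable names.
        eval "value=\${$name-}"
        printf '%s=%s\n' "$name" "$value"
    done
}

# Only dump when mcelog actually supplied a message.
if [ -n "${MESSAGE:-}" ]; then
    dump_trigger_env MESSAGE LOCATION SOCKETID LEVEL PARTICIPATION \
        REQUEST ORIGIN TIMEOUT | logger -t mcelog-bus-debug
fi
```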

The iomca-error-trigger

The iomca-error-trigger runs when a socket receives bus or interconnect
errors. It is configured through the iomca-error-trigger and
iomca-error-trigger-threshold options in /etc/mcelog.conf. By default
it logs a message with the error location to the system log. After the
default action, local actions in /etc/mcelog/iomca-error-trigger.local
are executed.

Arguments are passed as environment variables:

MESSAGE          Human readable consolidated error message
LOCATION         Consolidated location as a single string
SOCKETID         Socket ID of CPU that includes the memory controller
                 with the DIMM
CPU              Linux CPU number that triggered the error
SET              PCI segment number
BUS              PCI bus number
DEVICE           PCI device number
FUNCTION         PCI function number
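
The four PCI variables can be combined into the conventional
segment:bus:device.function notation, for instance to include it in a
notification. This is a sketch only: pci_address is a hypothetical
helper, the variable names are used exactly as listed above, and the
values are passed through unchanged because their number format is not
specified here.

```shell
#!/bin/sh
# Hypothetical /etc/mcelog/iomca-error-trigger.local sketch.

# pci_address: join the PCI variables as segment:bus:device.function.
# Missing values are shown as "?" rather than guessed.
pci_address() {
    printf '%s:%s:%s.%s\n' \
        "${SET:-?}" "${BUS:-?}" "${DEVICE:-?}" "${FUNCTION:-?}"
}

# Only log when mcelog actually supplied a message.
if [ -n "${MESSAGE:-}" ]; then
    logger -t mcelog-local "IO error at PCI $(pci_address): $MESSAGE"
fi
```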

The unknown-error-trigger

The unknown-error-trigger runs on any errors not otherwise categorized.
It is configured through the unknown-error-trigger and
unknown-error-trigger-threshold options in /etc/mcelog.conf. By default
it logs a message to the system log. After the default action, local
actions in /etc/mcelog/unknown-error-trigger.local are executed.

Arguments are passed as environment variables:

MESSAGE          Human readable consolidated error message
LOCATION         Consolidated location as a single string
SOCKETID         Socket ID of CPU that includes the memory controller
                 with the DIMM
CPU              Linux CPU number that triggered the error
STATUS           IA32_MCi_STATUS register value
ADDR             IA32_MCi_ADDR register value
MISC             IA32_MCi_MISC register value
MCGSTATUS        IA32_MCG_STATUS register value
MCGCAP           IA32_MCG_CAP register value
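
For uncategorized errors the raw register values are often the only
detail available, so a local script may want to decode a few status
bits. The sketch below reports the four highest IA32_MCi_STATUS flags
(bits 63 to 60: VAL, OVER, UC, EN). It assumes STATUS is a hexadecimal
string of at most 16 digits, with or without a 0x prefix; status_flags
is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical /etc/mcelog/unknown-error-trigger.local sketch.
# Assumes STATUS is a hexadecimal string (optionally 0x-prefixed).

# status_flags HEX: print the names of the set flags among bits 63..60
# of an IA32_MCi_STATUS value (VAL, OVER, UC, EN).
status_flags() {
    hex=${1#0x}
    hex=${hex#0X}
    # Left-pad to 16 hex digits so the first digit holds bits 63..60.
    padded=$(printf '%16s' "$hex" | tr ' ' '0')
    top=${padded%"${padded#?}"}
    nibble=$((0x$top))
    flags=""
    for pair in 8:VAL 4:OVER 2:UC 1:EN; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $((nibble & mask)) -ne 0 ]; then
            flags="$flags${flags:+ }$name"
        fi
    done
    printf '%s\n' "$flags"
}

# Only log when mcelog actually supplied both values.
if [ -n "${MESSAGE:-}" ] && [ -n "${STATUS:-}" ]; then
    logger -t mcelog-local "unknown error, flags: $(status_flags "$STATUS")"
fi
```

Working on the leading hex digit instead of shifting the full value
avoids relying on how a given shell handles 64-bit arithmetic.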

SEE ALSO
http://www.mcelog.org

mcelog(8), mcelog.conf(5)

mcelog                                                mcelog.triggers(5)