th_define(1M)          System Administration Commands          th_define(1M)

NAME
th_define - create fault injection test harness error specifications

SYNOPSIS
th_define [-n name -i instance | -P path] [-a acc_types]
    [-r reg_number] [-l offset [length]]
    [-c count [failcount]] [-o operator [operand]]
    [-f acc_chk] [-w max_wait_period [report_interval]]

or

th_define [-n name -i instance | -P path]
    [-a log [acc_types] [-r reg_number] [-l offset [length]]]
    [-c count [failcount]] [-s collect_time] [-p policy]
    [-x flags] [-C comment_string]
    [-e fixup_script [args]]

or

th_define [-h]
29
30
DESCRIPTION
The th_define utility provides an interface to the bus_ops fault
injection bofi device driver for defining error injection specifications
(referred to as errdefs). An errdef corresponds to a specification of
how to corrupt a device driver's accesses to its hardware. The command
line arguments determine the precise nature of the fault to be
injected. If the supplied arguments define a consistent errdef, the
th_define process will store the errdef with the bofi driver and
suspend itself until the criteria given by the errdef become satisfied
(in practice, this will occur when the access counts go to zero).
41
42
Use the th_manage(1M) command with the start option to activate the
resulting errdef. Once the errdef is started, the bofi driver counts
the hardware accesses that meet all of the following criteria: the
access is of a type specified in acc_types; it is made by instance
number instance of the driver whose name is name (or by the driver
instance specified by path); it is made to the register set (or DMA
handle) specified by reg_number; and it lies within the range offset to
offset + length from the beginning of the register set or DMA handle.
After count matching accesses, the bofi driver applies operator and
operand to the next failcount matching accesses.
53
54
55 If acc_types includes log, th_define runs in automatic test script gen‐
56 eration mode, and a set of test scripts (written in the Korn shell) is
57 created and placed in a sub-directory of the current directory with the
58 name <driver>.test.<id> (for example, glm.test.978177106). A separate,
59 executable script is generated for each access handle that matches the
60 logging criteria. The log of accesses is placed at the top of each
61 script as a record of the session. If the current directory is not
62 writable, file output is written to standard output. The base name of
63 each test file is the driver name, and the extension is a number that
64 discriminates between different access handles. A control script (with
65 the same name as the created test directory) is generated that will run
66 all the test scripts sequentially.
67
68
69 Executing the scripts will install, and then activate, the resulting
70 error definitions. Error definitions are activated sequentially and the
71 driver instance under test is taken offline and brought back online
72 before each test (refer to the -e option for more information). By
73 default, logging applies to all PIO accesses, all interrupts, and all
74 DMA accesses to and from areas mapped for both reading and writing. You
75 can constrain logging by specifying additional acc_types, reg_number,
76 offset and length. Logging will continue for count matching accesses,
77 with an optional time limit of collect_time seconds.
78
79
80 Either the -n or -P option must be provided. The other options are
81 optional. If an option (other than -a) is specified multiple times,
82 only the final value for the option is used. If an option is not speci‐
83 fied, its associated value is set to an appropriate default, which will
84 provide maximal error coverage as described below.
85
OPTIONS
The following options are available:
88
89 -n name
90
91 Specify the name of the driver to test. (String)
92
93
94 -i instance
95
96 Test only the specified driver instance (-1 matches all instances
97 of driver). (Numeric)
98
99
100 -P path
101
102 Specify the full device path of the driver to test. (String)
103
104
105 -r reg_number
106
107 Test only the given register set or DMA handle (-1 matches all reg‐
108 ister sets and DMA handles). (Numeric)
109
110
111 -a acc_types
112
113 Only the specified access types will be matched. Valid values for
114 the acc_types argument are log, pio, pio_r, pio_w, dma, dma_r,
115 dma_w and intr. Multiple access types, separated by spaces, can be
116 specified. The default is to match all hardware accesses.
117
If acc_types is set to log, logging will match all PIO accesses,
interrupts and DMA accesses to and from areas mapped for both reading
and writing. log can be combined with other acc_types, in which case
the matching condition for logging will be restricted to the specified
additional acc_types. Note that dma_r will match only DMA handles
mapped for reading only; dma_w will match only DMA handles mapped for
writing only; dma will match only DMA handles mapped for both reading
and writing.
126
127
128 -l offset [length]
129
Constrain the range of qualifying accesses. The offset and length
arguments specify that an access of the type given with the -a option,
to the register set or DMA handle given with the -r option, qualifies
only if it lies at least offset bytes into the register set or DMA
handle and at most offset + length bytes into it. The default for
offset is 0. The default for length is the maximum value that can be
placed in an offset_t C data type (see types.h). Negative values are
converted into unsigned quantities. Thus, th_define -l 0 -1 is maximal.
139
140
-c count [failcount]
142
143 Wait for count number of matching accesses, then apply an operator
144 and operand (see the -o option) to the next failcount number of
145 matching accesses. If the access type (see the -a option) includes
146 logging, the number of logged accesses is given by count + fail‐
147 count - 1. The -1 is required because the last access coincides
148 with the first faulting access.
149
Note that access logging may be combined with error injection if
failcount and operator are nonzero and if the access type includes
logging and any of the other access types (pio, dma and intr). See the
description of access types in the definition of the -a option, above.
155
156 When the count and failcount fields reach zero, the status of the
157 errdef is reported to standard output. When all active errdefs cre‐
158 ated by the th_define process complete, the process exits. If
159 acc_types includes log, count determines how many accesses to log.
160 If count is not specified, a default value is used. If failcount is
161 set in this mode, it will simply increase the number of accesses
162 logged by a further failcount - 1.
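
The arithmetic for the number of logged accesses can be sketched in
shell as follows. This is only an illustration of the count +
failcount - 1 rule stated above; the function name logged_accesses is
hypothetical and is not part of th_define.

```shell
# logged_accesses COUNT FAILCOUNT
# Number of accesses logged when acc_types includes log:
# count + failcount - 1, because the last counted access coincides
# with the first faulting access.
logged_accesses() {
    echo $(( $1 + $2 - 1 ))
}

logged_accesses 100 3    # prints 102
```

With failcount set to 1, the result is simply count.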
163
164
165 -o operator [operand]
166
167 For qualifying PIO read and write accesses, the value read from or
168 written to the hardware is corrupted according to the value of
169 operator:
170
171 EQ operand is returned to the driver.
172
173
174 OR operand is bitwise ORed with the real value.
175
176
177 AND operand is bitwise ANDed with the real value.
178
179
180 XOR operand is bitwise XORed with the real value.
181
182 For PIO write accesses, the following operator is allowed:
183
184 NO Simply ignore the driver's attempt to write to the hardware.
185
Note that a driver performs PIO via the ddi_getX(), ddi_putX(),
ddi_rep_getX() and ddi_rep_putX() routines (where X is 8, 16, 32 or
64). An access made using ddi_getX() or ddi_putX() is treated as a
single access, whereas an access made using the ddi_rep_*(9F) routines
is broken down into its respective number of accesses, as given by the
repcount parameter to these DDI calls. If the access is performed via a
DMA handle, operator and operand are applied to every access that
comprises the DMA request.

If interference with interrupts has been requested, then the operator
may take any of the following values:
196
197 DELAY After count accesses (see the -c option), delay delivery
198 of the next failcount number of interrupts for operand
199 number of microseconds.
200
201
202 LOSE After count number of interrupts, fail to deliver the next
203 failcount number of real interrupts to the driver.
204
205
206 EXTRA After count number of interrupts, start delivering operand
207 number of extra interrupts for the next failcount number
208 of real interrupts.
209
210 The default value for operand and operator is to corrupt the data
211 access by flipping each bit (XOR with -1).
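
The effect of the data corruption operators can be mimicked with shell
arithmetic. The sketch below only illustrates the bitwise semantics
described above; apply_operator is a hypothetical helper, not something
th_define or the bofi driver provides, and it does not model the NO
operator or the interrupt operators.

```shell
# apply_operator OP OPERAND VALUE
# Print, in hex, the value a driver would see after the corruption
# operator OP with OPERAND is applied to the real VALUE.
# Illustrative helper only; it is not part of th_define or bofi.
apply_operator() {
    case $1 in
        EQ)  result=$(( $2 )) ;;        # operand replaces the value
        OR)  result=$(( $3 | $2 )) ;;   # set the operand bits
        AND) result=$(( $3 & $2 )) ;;   # keep only the operand bits
        XOR) result=$(( $3 ^ $2 )) ;;   # toggle the operand bits
        *)   echo "unsupported operator: $1" >&2; return 1 ;;
    esac
    printf '0x%x\n' "$result"
}

apply_operator OR  0x4    0x1      # prints 0x5
apply_operator AND 0xefff 0x1234   # prints 0x234
apply_operator XOR 7      0x10     # prints 0x17
```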
212
213
214 -f acc_chk
215
216 If the acc_chk parameter is set to 1 or pio, then the driver's
217 calls to ddi_check_acc_handle(9F) return DDI_FAILURE when the
218 access count goes to 1. If the acc_chk parameter is set to 2 or
219 dma, then the driver's calls to ddi_check_dma_handle(9F) return
220 DDI_FAILURE when the access count goes to 1.
221
222
223 -w max_wait_period [report_interval]
224
225 Constrain the period for which an error definition will remain
226 active. The option applies only to non-logging errdefs. If an error
227 definition remains active for max_wait_period seconds, the test
228 will be aborted. If report_interval is set to a nonzero value, the
229 current status of the error definition is reported to standard out‐
230 put every report_interval seconds. The default value is zero. The
231 status of the errdef is reported in parsable format (eight fields,
232 each separated by a colon (:) character, the last of which is a
233 string enclosed by double quotes and the remaining seven fields are
234 integers):
235
236 ft:mt:ac:fc:chk:ec:s:"message" which are defined as follows:
237
238 ft The UTC time when the fault was injected.
239
240
241 mt The UTC time when the driver reported the fault.
242
243
244 ac The number of remaining non-faulting accesses.
245
246
247 fc The number of remaining faulting accesses.
248
249
250 chk The value of the acc_chk field of the errdef.
251
252
253 ec The number of fault reports issued by the driver
254 against this errdef (mt holds the time of the initial
255 report).
256
257
258 s The severity level reported by the driver.
259
260
261 "message" Textual reason why the driver has reported a fault.
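
Because the status line is colon-separated with the quoted message
last, it can be split with the shell's read built-in. The following is
a minimal sketch, assuming the eight-field layout described above; the
function name parse_status and the sample line are invented for
illustration.

```shell
# parse_status LINE
# Split one status line of the form ft:mt:ac:fc:chk:ec:s:"message".
# read puts everything after the seventh colon into msg, so a message
# that itself contains colons is kept intact.
parse_status() {
    IFS=: read -r ft mt ac fc chk ec sev msg <<EOF
$1
EOF
    msg=${msg#\"}    # strip the leading double quote
    msg=${msg%\"}    # strip the trailing double quote
    echo "remaining=$ac faulting=$fc reports=$ec severity=$sev message=$msg"
}

# Sample line made up for this example; real output will differ.
parse_status '978177106:978177110:0:0:1:1:2:"register read: parity error"'
```

For the sample line, the function prints remaining=0 faulting=0
reports=1 severity=2 message=register read: parity error.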
262
263
264
265 -h
266
267 Display the command usage string.
268
269
270 -s collect_time
271
272 If acc_types is given with the -a option and includes log, the
273 errdef will log accesses for collect_time seconds (the default is
274 to log until the log becomes full). Note that, if the errdef speci‐
275 fication matches multiple driver handles, multiple logging errdefs
276 are registered with the bofi driver and logging terminates when all
277 logs become full or when collect_time expires or when the associ‐
278 ated errdefs are cleared. The current state of the log can be
279 checked with the th_manage(1M) command, using the broadcast parame‐
280 ter. A log can be terminated by running th_manage(1M) with the
281 clear_errdefs option or by sending a SIGALRM signal to the
282 th_define process. See alarm(2) for the semantics of SIGALRM.
283
284
285 -p policy
286
287 Applicable when the acc_types option includes log. The parameter
288 modifies the policy used for converting from logged accesses to
289 errdefs. All policies are inclusive:
290
291 o Use rare to bias error definitions toward rare accesses
292 (default).
293
294 o Use operator to produce a separate error definition for
295 each operator type (default).
296
297 o Use common to bias error definitions toward common
298 accesses.
299
300 o Use median to bias error definitions toward median
301 accesses.
302
303 o Use maximal to produce multiple error definitions for
304 duplicate accesses.
305
306 o Use unbiased to create unbiased error definitions.
307
308 o Use onebyte, twobyte, fourbyte, or eightbyte to select
309 errdefs corresponding to 1, 2, 4 or 8 byte accesses (if
310 chosen, the -xr option is enforced in order to ensure
311 that ddi_rep_*() calls are decomposed into multiple sin‐
312 gle accesses).
313
o    Use multibyte to create error definitions for multibyte
     accesses performed using ddi_rep_get*() and ddi_rep_put*().

Policies can be combined by adding together these options. See the
NOTES section for further information.
319
320
321 -x flags
322
323 Applicable when the acc_types option includes log. The flags param‐
324 eter modifies the way in which the bofi driver logs accesses. It is
325 specified as a string containing any combination of the following
326 letters:
327
328 w Continuous logging (that is, the log will wrap when full).
329
330
331 t Timestamp each log entry (access times are in seconds).
332
333
r    Log repeated I/O as individual accesses. For example, a
     ddi_rep_get16(9F) call which has a repcount of N is logged N
     times, with each transaction logged as size 2 bytes. Without
     this option, the default logging behavior is to log this
     access once only, with a transaction size of twice the
     repcount.
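
The difference the r flag makes to the log can be sketched as follows.
This models only the entry count and transaction sizes described
above; log_entries is a hypothetical helper and does not reproduce the
actual bofi log format.

```shell
# log_entries WIDTH REPCOUNT [r]
# Print the transaction size of each log entry produced by one
# repeated access of WIDTH bytes with the given REPCOUNT.
# With the r flag: REPCOUNT entries of WIDTH bytes each.
# Without it: a single entry of WIDTH * REPCOUNT bytes.
log_entries() {
    if [ "${3:-}" = "r" ]; then
        i=0
        while [ "$i" -lt "$2" ]; do
            echo "$1"
            i=$((i + 1))
        done
    else
        echo $(( $1 * $2 ))
    fi
}

log_entries 2 4      # ddi_rep_get16, repcount 4: one entry, size 8
log_entries 2 4 r    # same call with -x r: four entries, size 2 each
```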
340
341
342
343 -C comment_string
344
345 Applicable when the acc_types option includes log. It provides a
346 comment string to be placed in any generated test scripts. The
347 string must be enclosed in double quotes.
348
349
350 -e fixup_script [args]
351
Applicable when the acc_types option includes log. A logging errdef
generates a test script for each driver access handle. Use this option
to embed a command in the resulting script before the errors are
injected. The generated test scripts take an instance offline and bring
it back online before injecting errors in order to bring the instance
into a known fault-free state. The executable fixup_script is called
twice with the set of optional args: once just before the instance is
taken offline and again after the instance has been brought online. The
following variables are passed into the environment of the called
executable:
363
364 DRIVER_PATH Identifies the device path of the instance.
365
366
367 DRIVER_INSTANCE Identifies the instance number of the device.
368
369
370 DRIVER_UNCONFIGURE Has the value 1 when the instance is about to
371 be taken offline.
372
373
374 DRIVER_CONFIGURE Has the value 1 when the instance has just
375 been brought online.
376
377 Typically, the executable ensures that the device under test is in
378 a suitable state to be taken offline (unconfigured) or in a suit‐
379 able state for error injection (for example configured, error free
380 and servicing a workload). A minimal script for a network driver
381 could be:
382
    #!/bin/ksh

    driver=xyznetdriver
    ifnum=$driver$DRIVER_INSTANCE

    if [[ $DRIVER_CONFIGURE = 1 ]]; then
        # Instance has just been brought online: configure the
        # interface and start a workload on it.
        ifconfig "$ifnum" plumb
        ifconfig "$ifnum" ...
        ifworkload start "$ifnum"
    elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then
        # Instance is about to be taken offline: stop the workload
        # and tear the interface down.
        ifworkload stop "$ifnum"
        ifconfig "$ifnum" down
        ifconfig "$ifnum" unplumb
    fi
    exit $?
398
399
400 The -e option must be the last option on the command line.
401
402
403
404 If the -a log option is selected but the -e option is not given, a
405 default script is used. This script repeatedly attempts to detach and
406 then re-attach the device instance under test.
407
EXAMPLES
Examples of Error Definitions
410 th_define -n foo -i 1 -a log
411
412
413 Logs all accesses to all handles used by instance 1 of the foo driver
414 while running the default workload (attaching and detaching the
415 instance). Then generates a set of test scripts to inject appropriate
416 errdefs while running that default workload.
417
418
419 th_define -n foo -i 1 -a log pio
420
421
422 Logs PIO accesses to each PIO handle used by instance 1 of the foo
423 driver while running the default workload (attaching and detaching the
424 instance). Then generates a set of test scripts to inject appropriate
425 errdefs while running that default workload.
426
427
428 th_define -n foo -i 1 -p onebyte median -e fixup arg -now
429
430
431 Logs all accesses to all handles used by instance 1 of the foo driver
432 while running the workload defined in the fixup script fixup with argu‐
433 ments arg and -now. Then generates a set of test scripts to inject
434 appropriate errdefs while running that workload. The resulting error
435 definitions are requested to focus upon single byte accesses to loca‐
436 tions that are accessed a median number of times with respect to fre‐
437 quency of access to I/O addresses.
438
439
440 th_define -n se -l 0x20 1 -a pio_r -o OR 0x4 -c 10 1000
441
442
Simulates a stuck serial chip command. After ten matching accesses, 0x4
is ORed into the next 1000 consecutive read accesses made by any
instance of the se driver to its command status register, so that the
register keeps returning status busy.
446
447
448 th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -f 1 -o OR 0x100
449
450
451 Causes 0x100 to be ORed into the next physical I/O read access from any
452 register in register set 1 of instance 3 of the foo driver. Subsequent
453 calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.
454
455
456 th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -o OR 0x0
457
458
459 Causes 0x0 to be ORed into the next physical I/O read access from any
460 register in register set 1 of instance 3 of the foo driver. This is of
461 course a no-op.
462
463
464 th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_r -c 0 10 -o EQ 0x70003
465
466
Causes the next ten physical I/O reads from the register at offset
0x8100 in register set 1 of instance 3 of the foo driver to return
0x70003.
470
471
472 th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_w -c 100 3 -o AND
473 0xffffffffffffefff
474
475
476 The next 100 physical I/O writes to the register at offset 0x8100 in
477 register set 1 of instance 3 of the foo driver take place as normal.
478 However, on each of the three subsequent accesses, the 0x1000 bit will
479 be cleared.
480
481
482 th_define -n foo -i 3 -r 1 -l 0x8100 0x10 -a pio_r -c 0 1 -f 1 -o XOR 7
483
484
485 Causes the bottom three bits to have their values toggled for the next
486 physical I/O read access to registers with offsets in the range 0x8100
487 to 0x8110 in register set 1 of instance 3 of the foo driver. Subsequent
488 calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.
489
490
491 th_define -n foo -i 3 -a pio_w -c 0 1 -o NO 0
492
493
494 Prevents the next physical I/O write access to any register in any reg‐
495 ister set of instance 3 of the foo driver from going out on the bus.
496
497
498 th_define -n foo -i 3 -l 0 8192 -a dma_r -c 0 1 -o OR 7
499
500
501 Causes 0x7 to be ORed into each long long in the first 8192 bytes of
502 the next DMA read, using any DMA handle for instance 3 of the foo
503 driver.
504
505
506 th_define -n foo -i 3 -r 2 -l 0 8 -a dma_r -c 0 1 -o OR
507 0x7070707070707070
508
509
510 Causes 0x70 to be ORed into each byte of the first long long of the
511 next DMA read, using the DMA handle with sequential allocation number 2
512 for instance 3 of the foo driver.
513
514
515 th_define -n foo -i 3 -l 256 256 -a dma_w -c 0 1 -f 2 -o OR 7
516
517
518 Causes 0x7 to be ORed into each long long in the range from offset 256
519 to offset 512 of the next DMA write, using any DMA handle for instance
520 3 of the foo driver. Subsequent calls in the driver to
521 ddi_check_dma_handle() return DDI_FAILURE.
522
523
524 th_define -n foo -i 3 -r 0 -l 0 8 -a dma_w -c 100 3 -o AND
525 0xffffffffffffefff
526
527
528 The next 100 DMA writes using the DMA handle with sequential allocation
529 number 0 for instance 3 of the foo driver take place as normal. How‐
530 ever, on each of the three subsequent accesses, the 0x1000 bit will be
531 cleared in the first long long of the transfer.
532
533
534 th_define -n foo -i 3 -a intr -c 0 6 -o LOSE 0
535
536
537 Causes the next six interrupts for instance 3 of the foo driver to be
538 lost.
539
540
541 th_define -n foo -i 3 -a intr -c 30 1 -o EXTRA 10
542
543
544 When the thirty-first subsequent interrupt for instance 3 of the foo
545 driver occurs, a further ten interrupts are also generated.
546
547
548 th_define -n foo -i 3 -a intr -c 0 1 -o DELAY 1024
549
550
551 Causes the next interrupt for instance 3 of the foo driver to be
552 delayed by 1024 microseconds.
553
NOTES
The policy option in the th_define -p syntax determines how a set of
logged accesses will be converted into the set of error definitions.
Each logged access will be matched against the chosen policies to
determine whether an error definition should be created based on the
access.
560
561
562 Any number of policy options can be combined to modify the generated
563 error definitions.
564
565 Bytewise Policies
These select particular I/O transfer sizes. Specifying a byte policy
excludes the other byte policies that have not been chosen. If none of
the byte type policies is selected, all transfer sizes are treated
equally. Otherwise, only the specified transfer sizes are selected.
571
572 onebyte Create errdefs for one byte accesses (ddi_get8())
573
574
575 twobyte Create errdefs for two byte accesses (ddi_get16())
576
577
578 fourbyte Create errdefs for four byte accesses (ddi_get32())
579
580
581 eightbyte Create errdefs for eight byte accesses (ddi_get64())
582
583
584 multibyte Create errdefs for repeated byte accesses (ddi_rep_get*())
585
586
587 Frequency of Access Policies
The frequency of access to a location is determined according to the
access type, location and transfer size (for example, a two-byte read
access to address A is considered distinct from a four-byte read access
to address A). The algorithm counts the number of accesses (of a given
type and size) to each location and finds the locations that were most
and least accessed. Let maxa and mina be the number of times these
locations were accessed, and let mean be the total number of accesses
divided by the total number of locations that were accessed. A rare
access is then a location that was accessed fewer than


    (mean - mina) / 3 + mina


times. Similarly, a common access is a location that was accessed more
than


    maxa - (maxa - mean) / 3


times. A location whose access count lies between these two cutoffs is
regarded as a location that is accessed with median frequency.
610
611 rare Create errdefs for locations that are rarely accessed.
612
613
614 common Create errdefs for locations that are commonly accessed.
615
616
median    Create errdefs for locations that are accessed with median
          frequency.
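
The two cutoffs can be evaluated with a few lines of shell and awk. The
sketch below is an illustration of the formulas given in this section,
not the code th_define uses; the function name classify_access is
hypothetical, and treating a count exactly on a cutoff as median is an
assumption.

```shell
# classify_access COUNT MINA MAXA MEAN
# Classify a location accessed COUNT times as rare, common or median,
# using the cutoffs (mean - mina) / 3 + mina and
# maxa - (maxa - mean) / 3.
classify_access() {
    awk -v n="$1" -v mina="$2" -v maxa="$3" -v mean="$4" 'BEGIN {
        rare_cut   = (mean - mina) / 3 + mina
        common_cut = maxa - (maxa - mean) / 3
        if (n < rare_cut)        print "rare"
        else if (n > common_cut) print "common"
        else                     print "median"
    }'
}

# With mina=1, maxa=100 and mean=10 the cutoffs are 4 and 70.
classify_access 2  1 100 10    # prints rare
classify_access 50 1 100 10    # prints median
classify_access 90 1 100 10    # prints common
```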
619
620
621 Policies for Minimizing errdefs
622 If a transaction is duplicated, either a single or multiple errdefs
623 will be written to the test scripts, depending upon the following two
624 policies:
625
626 maximal Create multiple errdefs for locations that are repeatedly
627 accessed.
628
629
630 unbiased Create a single errdef for locations that are repeatedly
631 accessed.
632
633
634 operators For each location, a default operator and operand is typi‐
635 cally applied. For maximal test coverage, this default may
636 be modified using the operators policy so that a separate
637 errdef is created for each of the possible corruption
638 operators.
639
640
SEE ALSO
kill(1), th_manage(1M), alarm(2), ddi_check_acc_handle(9F),
ddi_check_dma_handle(9F)
644
645
646
SunOS 5.11                      11 Apr 2001                    th_define(1M)