Tcl_GetEncoding(3)          Tcl Library Procedures          Tcl_GetEncoding(3)

______________________________________________________________________________

NAME
Tcl_GetEncoding, Tcl_FreeEncoding, Tcl_GetEncodingFromObj,
Tcl_ExternalToUtfDString, Tcl_ExternalToUtf, Tcl_UtfToExternalDString,
Tcl_UtfToExternal, Tcl_WinTCharToUtf, Tcl_WinUtfToTChar,
Tcl_GetEncodingName, Tcl_SetSystemEncoding,
Tcl_GetEncodingNameFromEnvironment, Tcl_GetEncodingNames,
Tcl_CreateEncoding, Tcl_GetEncodingSearchPath, Tcl_SetEncodingSearchPath,
Tcl_GetDefaultEncodingDir, Tcl_SetDefaultEncodingDir - procedures for
creating and using encodings

SYNOPSIS
#include <tcl.h>

Tcl_Encoding
Tcl_GetEncoding(interp, name)

void
Tcl_FreeEncoding(encoding)

int
Tcl_GetEncodingFromObj(interp, objPtr, encodingPtr)

char *
Tcl_ExternalToUtfDString(encoding, src, srcLen, dstPtr)

char *
Tcl_UtfToExternalDString(encoding, src, srcLen, dstPtr)

int
Tcl_ExternalToUtf(interp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr)

int
Tcl_UtfToExternal(interp, encoding, src, srcLen, flags, statePtr,
                  dst, dstLen, srcReadPtr, dstWrotePtr, dstCharsPtr)

char *
Tcl_WinTCharToUtf(tsrc, srcLen, dstPtr)

TCHAR *
Tcl_WinUtfToTChar(src, srcLen, dstPtr)

const char *
Tcl_GetEncodingName(encoding)

int
Tcl_SetSystemEncoding(interp, name)

const char *
Tcl_GetEncodingNameFromEnvironment(bufPtr)

void
Tcl_GetEncodingNames(interp)

Tcl_Encoding
Tcl_CreateEncoding(typePtr)

Tcl_Obj *
Tcl_GetEncodingSearchPath()

int
Tcl_SetEncodingSearchPath(searchPath)

const char *
Tcl_GetDefaultEncodingDir(void)

void
Tcl_SetDefaultEncodingDir(path)

ARGUMENTS
Tcl_Interp *interp (in)
        Interpreter to use for error reporting, or NULL if no error
        reporting is desired.

const char *name (in)
        Name of encoding to load.

Tcl_Encoding encoding (in)
        The encoding to query, free, or use for converting text. If
        encoding is NULL, the current system encoding is used.

Tcl_Obj *objPtr (in)
        Name of encoding to get token for.

Tcl_Encoding *encodingPtr (out)
        Points to storage where encoding token is to be written.

const char *src (in)
        For the Tcl_ExternalToUtf functions, an array of bytes in the
        specified encoding that are to be converted to UTF-8. For the
        Tcl_UtfToExternal and Tcl_WinUtfToTChar functions, an array of
        UTF-8 characters to be converted to the specified encoding.

const TCHAR *tsrc (in)
        An array of Windows TCHAR characters to convert to UTF-8.

int srcLen (in)
        Length of src or tsrc in bytes. If the length is negative, the
        encoding-specific length of the string is used.

Tcl_DString *dstPtr (out)
        Pointer to an uninitialized or free Tcl_DString in which the
        converted result will be stored.

int flags (in)
        Various flag bits OR-ed together. TCL_ENCODING_START signifies
        that the source buffer is the first block in a (potentially
        multi-block) input stream, telling the conversion routine to
        reset to an initial state and perform any initialization that
        needs to occur before the first byte is converted.
        TCL_ENCODING_END signifies that the source buffer is the last
        block in a (potentially multi-block) input stream, telling the
        conversion routine to perform any finalization that needs to
        occur after the last byte is converted and then to reset to an
        initial state. TCL_ENCODING_STOPONERROR signifies that the
        conversion routine should return immediately upon reading a
        source character that does not exist in the target encoding;
        otherwise a default fallback character will automatically be
        substituted.

Tcl_EncodingState *statePtr (in/out)
        Used when converting a (generally long or indefinite length)
        byte stream in a piece-by-piece fashion. The conversion routine
        stores its current state in *statePtr after src (the buffer
        containing the current piece) has been converted; that state
        information must be passed back when converting the next piece
        of the stream so the conversion routine knows what state it was
        in when it left off at the end of the last piece. May be NULL,
        in which case the value specified for flags is ignored and the
        source buffer is assumed to contain the complete string to
        convert.

char *dst (out)
        Buffer in which the converted result will be stored. No more
        than dstLen bytes will be stored in dst.

int dstLen (in)
        The maximum length of the output buffer dst in bytes.

int *srcReadPtr (out)
        Filled with the number of bytes from src that were actually
        converted. This may be less than the original source length if
        there was a problem converting some source characters. May be
        NULL.

int *dstWrotePtr (out)
        Filled with the number of bytes that were actually stored in the
        output buffer as a result of the conversion. May be NULL.

int *dstCharsPtr (out)
        Filled with the number of characters that correspond to the
        number of bytes stored in the output buffer. May be NULL.

Tcl_DString *bufPtr (out)
        Storage for the prescribed system encoding name.

const Tcl_EncodingType *typePtr (in)
        Structure that defines a new type of encoding.

Tcl_Obj *searchPath (in)
        List of filesystem directories in which to search for encoding
        data files.

const char *path (in)
        A path to the location of the encoding file.
______________________________________________________________________________

INTRODUCTION
These routines convert between Tcl's internal character representation,
UTF-8, and character representations used by various operating systems
or file systems, such as Unicode, ASCII, or Shift-JIS. When operating
on strings, such as obtaining the names of files or displaying
characters using international fonts, the strings must be translated
into one or possibly multiple formats that the various system calls can
use. For instance, on a Japanese Unix workstation, a user might obtain
a filename represented in the EUC-JP file encoding and then translate
the characters to the jisx0208 font encoding in order to display the
filename in a Tk widget. The purpose of the encoding package is to
help bridge the translation gap. UTF-8 provides an intermediate staging
ground for all the various encodings. In the example above, text
would be translated into UTF-8 from whatever file encoding the operating
system is using. Then it would be translated from UTF-8 into whatever
font encoding the display routines require.

Some basic encodings are compiled into Tcl. Others can be defined by
the user or dynamically loaded from encoding files in a
platform-independent manner.

DESCRIPTION
Tcl_GetEncoding finds an encoding given its name. The name may refer
to a built-in Tcl encoding, a user-defined encoding registered by
calling Tcl_CreateEncoding, or a dynamically-loadable encoding file. The
return value is a token that represents the encoding and can be used in
subsequent calls to procedures such as Tcl_GetEncodingName,
Tcl_FreeEncoding, and Tcl_UtfToExternal. If the name does not refer to
any known or loadable encoding, NULL is returned and an error message is
left in interp.

The encoding package maintains a database of all encodings currently in
use. The first time name is seen, Tcl_GetEncoding returns an encoding
with a reference count of 1. If the same name is requested again, the
reference count for that encoding is incremented without the overhead
of allocating a new encoding and all its associated data structures.

When an encoding is no longer needed, Tcl_FreeEncoding should be called
to release it. When an encoding is no longer in use anywhere (i.e., it
has been freed as many times as it has been gotten), Tcl_FreeEncoding
will release all storage the encoding was using and delete it from the
database.

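The bookkeeping described above can be pictured with a small model. This
is a minimal sketch and not Tcl's implementation: SketchEncoding,
sketch_get, sketch_free, and sketch_count are hypothetical names invented
for illustration, and the linked list merely stands in for whatever
database Tcl actually maintains.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an encoding record; not Tcl's real structure. */
typedef struct SketchEncoding {
    char name[32];
    int refCount;                 /* outstanding "gets" not yet freed */
    struct SketchEncoding *next;
} SketchEncoding;

static SketchEncoding *sketchDatabase = NULL;  /* encodings currently in use */

/* Return the record for name, creating it with a count of 1 on first use. */
SketchEncoding *sketch_get(const char *name)
{
    SketchEncoding *e;

    for (e = sketchDatabase; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            e->refCount++;        /* already present: only count the new user */
            return e;
        }
    }
    if (strlen(name) >= sizeof e->name) {
        return NULL;              /* name too long for this sketch */
    }
    e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    strcpy(e->name, name);
    e->refCount = 1;
    e->next = sketchDatabase;
    sketchDatabase = e;
    return e;
}

/* Drop one reference; unlink and release the record when none remain. */
void sketch_free(SketchEncoding *enc)
{
    SketchEncoding **link;

    if (enc == NULL || --enc->refCount > 0) {
        return;
    }
    for (link = &sketchDatabase; *link != NULL; link = &(*link)->next) {
        if (*link == enc) {
            *link = enc->next;
            free(enc);
            return;
        }
    }
}

/* Number of distinct encodings currently held in the database. */
int sketch_count(void)
{
    int n = 0;
    SketchEncoding *e;

    for (e = sketchDatabase; e != NULL; e = e->next) {
        n++;
    }
    return n;
}
```

In this model, two calls to sketch_get with the same name return the same
record with a count of 2, and the record disappears from the database only
after the second sketch_free.
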
Tcl_GetEncodingFromObj treats the string representation of objPtr as an
encoding name, and finds an encoding with that name, just as
Tcl_GetEncoding does. When an encoding is found, it is cached within the
objPtr value for future reference, the Tcl_Encoding token is written to
the storage pointed to by encodingPtr, and the value TCL_OK is returned.
If no such encoding is found, the value TCL_ERROR is returned, and no
writing to *encodingPtr takes place. Just as with Tcl_GetEncoding, the
caller should call Tcl_FreeEncoding on the resulting encoding token
when that token will no longer be used.

Tcl_ExternalToUtfDString converts a source buffer src from the specified
encoding into UTF-8. The converted bytes are stored in dstPtr,
which is then null-terminated. The caller should eventually call
Tcl_DStringFree to free any information stored in dstPtr. When
converting, if any of the characters in the source buffer cannot be
represented in the target encoding, a default fallback character will be
used. The return value is a pointer to the value stored in the
DString.

Tcl_ExternalToUtf converts a source buffer src from the specified
encoding into UTF-8. Up to srcLen bytes are converted from the source
buffer and up to dstLen converted bytes are stored in dst. In all
cases, *srcReadPtr is filled with the number of bytes that were
successfully converted from src and *dstWrotePtr is filled with the
corresponding number of bytes that were stored in dst. The return value
is one of the following:

TCL_OK
        All bytes of src were converted.

TCL_CONVERT_NOSPACE
        The destination buffer was not large enough for all of the
        converted data; as many characters as could fit were converted
        though.

TCL_CONVERT_MULTIBYTE
        The last few bytes in the source buffer were the beginning of a
        multibyte sequence, but more bytes were needed to complete this
        sequence. A subsequent call to the conversion routine should
        pass a buffer containing the unconverted bytes that remained in
        src plus some further bytes from the source stream to properly
        convert the formerly split-up multibyte sequence.

TCL_CONVERT_SYNTAX
        The source buffer contained an invalid character sequence. This
        may occur if the input stream has been damaged or if the input
        encoding method was misidentified.

TCL_CONVERT_UNKNOWN
        The source buffer contained a character that could not be
        represented in the target encoding and TCL_ENCODING_STOPONERROR
        was specified.

Tcl_UtfToExternalDString converts a source buffer src from UTF-8 into
the specified encoding. The converted bytes are stored in dstPtr,
which is then terminated with the appropriate encoding-specific null.
The caller should eventually call Tcl_DStringFree to free any
information stored in dstPtr. When converting, if any of the characters
in the source buffer cannot be represented in the target encoding, a
default fallback character will be used. The return value is a pointer
to the value stored in the DString.

Tcl_UtfToExternal converts a source buffer src from UTF-8 into the
specified encoding. Up to srcLen bytes are converted from the source
buffer and up to dstLen converted bytes are stored in dst. In all
cases, *srcReadPtr is filled with the number of bytes that were
successfully converted from src and *dstWrotePtr is filled with the
corresponding number of bytes that were stored in dst. The return values
are the same as the return values for Tcl_ExternalToUtf.

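To make the return values and the fallback behavior concrete, here is a
simplified stand-alone converter from UTF-8 to a single-byte target
(ISO 8859-1 is assumed). It is only a sketch: the U2L_* constants and
sketch_utf8_to_latin1 are hypothetical stand-ins for the TCL_CONVERT_*
codes, TCL_ENCODING_STOPONERROR, and a real conversion routine. It handles
only one- to three-byte UTF-8 sequences, does not reject overlong forms,
and uses '?' as an assumed fallback character.

```c
#include <stddef.h>

/* Hypothetical result codes mirroring the ones listed above. */
enum { U2L_OK, U2L_NOSPACE, U2L_MULTIBYTE, U2L_SYNTAX, U2L_UNKNOWN };

#define U2L_STOPONERROR 1     /* stand-in for TCL_ENCODING_STOPONERROR */
#define U2L_FALLBACK    '?'   /* assumed fallback character */

int sketch_utf8_to_latin1(const char *src, int srcLen, int flags,
                          char *dst, int dstLen,
                          int *srcReadPtr, int *dstWrotePtr)
{
    const unsigned char *s = (const unsigned char *) src;
    int in = 0, out = 0, result = U2L_OK;
    int i;

    while (in < srcLen) {
        unsigned int ch;
        int need;                         /* bytes in this UTF-8 sequence */

        if (s[in] < 0x80) {
            ch = s[in];        need = 1;
        } else if ((s[in] & 0xE0) == 0xC0) {
            ch = s[in] & 0x1F; need = 2;
        } else if ((s[in] & 0xF0) == 0xE0) {
            ch = s[in] & 0x0F; need = 3;
        } else {
            result = U2L_SYNTAX;          /* stray or unsupported lead byte */
            break;
        }
        if (in + need > srcLen) {
            result = U2L_MULTIBYTE;       /* sequence split across buffers */
            break;
        }
        for (i = 1; i < need; i++) {
            if ((s[in + i] & 0xC0) != 0x80) {
                result = U2L_SYNTAX;
                break;
            }
            ch = (ch << 6) | (s[in + i] & 0x3F);
        }
        if (result != U2L_OK) {
            break;
        }
        if (out >= dstLen) {
            result = U2L_NOSPACE;
            break;
        }
        if (ch > 0xFF) {                  /* not representable in the target */
            if (flags & U2L_STOPONERROR) {
                result = U2L_UNKNOWN;
                break;
            }
            ch = U2L_FALLBACK;
        }
        dst[out++] = (char) ch;
        in += need;
    }
    /* As in the text above, the counts are filled in for every outcome. */
    *srcReadPtr = in;
    *dstWrotePtr = out;
    return result;
}
```

When the sketch reports U2L_MULTIBYTE, the caller would keep the bytes from
src + *srcReadPtr onward, append the next piece of the stream, and call
again, which is the recovery described for TCL_CONVERT_MULTIBYTE.
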
Tcl_WinUtfToTChar and Tcl_WinTCharToUtf are Windows-only convenience
functions for converting between UTF-8 and Windows strings based on the
TCHAR type, which is by convention a Unicode character on Windows NT.

Tcl_GetEncodingName is roughly the inverse of Tcl_GetEncoding. Given
an encoding, the return value is the name argument that was used to
create the encoding. The string returned by Tcl_GetEncodingName is
only guaranteed to persist until the encoding is deleted. The caller
must not modify this string.

Tcl_SetSystemEncoding sets the default encoding that should be used
whenever the user passes a NULL value for the encoding argument to any
of the other encoding functions. If name is NULL, the system encoding
is reset to the default system encoding, binary. If the name does not
refer to any known or loadable encoding, TCL_ERROR is returned and an
error message is left in interp. Otherwise, this procedure increments
the reference count of the new system encoding, decrements the
reference count of the old system encoding, and returns TCL_OK.

Tcl_GetEncodingNameFromEnvironment provides a means for the Tcl library
to report the encoding name it believes to be the correct one to use as
the system encoding, based on system calls and examination of the
environment suitable for the platform. It accepts bufPtr, a pointer to
an uninitialized or freed Tcl_DString, and writes the encoding name to
it. The Tcl_DStringValue is returned.

Tcl_GetEncodingNames sets the interp result to a list consisting of the
names of all the encodings that are currently defined or can be
dynamically loaded, searching the encoding path specified by
Tcl_SetDefaultEncodingDir. This procedure does not ensure that the
dynamically-loadable encoding files contain valid data, but merely that
they exist.

Tcl_CreateEncoding defines a new encoding and registers the C procedures
that are called back to convert between the encoding and UTF-8.
Encodings created by Tcl_CreateEncoding are thereafter visible in the
database used by Tcl_GetEncoding. Just as with the Tcl_GetEncoding
procedure, the return value is a token that represents the encoding and
can be used in subsequent calls to other encoding functions.
Tcl_CreateEncoding returns an encoding with a reference count of 1. If
an encoding with the specified name already exists, then its entry in
the database is replaced with the new encoding; the token for the old
encoding will remain valid and continue to behave as before, but users
of the new token will now call the new encoding procedures.

The typePtr argument to Tcl_CreateEncoding contains information about
the name of the encoding and the procedures that will be called to
convert between this encoding and UTF-8. It is defined as follows:

    typedef struct Tcl_EncodingType {
        const char *encodingName;
        Tcl_EncodingConvertProc *toUtfProc;
        Tcl_EncodingConvertProc *fromUtfProc;
        Tcl_EncodingFreeProc *freeProc;
        ClientData clientData;
        int nullSize;
    } Tcl_EncodingType;

The encodingName provides a string name for the encoding, by which it
can be referred to in other procedures such as Tcl_GetEncoding. The
toUtfProc refers to a callback procedure to invoke to convert text from
this encoding into UTF-8. The fromUtfProc refers to a callback
procedure to invoke to convert text from UTF-8 into this encoding. The
freeProc refers to a callback procedure to invoke when this encoding is
deleted. The freeProc field may be NULL. The clientData contains an
arbitrary one-word value passed to toUtfProc, fromUtfProc, and freeProc
whenever they are called. Typically, this is a pointer to a data
structure containing encoding-specific information that can be used by
the callback procedures. For instance, two very similar encodings such
as ascii and macRoman may use the same callback procedure, but use
different values of clientData to control its behavior. The nullSize
specifies the number of zero bytes that signify end-of-string in this
encoding. It must be 1 (for single-byte or multi-byte encodings like
ASCII or Shift-JIS) or 2 (for double-byte encodings like Unicode).
Constant-sized encodings with 3 or more bytes per character (such as
CNS11643) are not accepted.

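The role of nullSize can be illustrated by the way an "encoding-specific
string length" might be computed when a negative srcLen is given. The
helper below is a hypothetical sketch (sketch_encoded_length is not a Tcl
function) and assumes that a 2-byte terminator always falls on a 2-byte
boundary.

```c
#include <stddef.h>

/* Byte length of src up to, but not including, its terminator. The
 * terminator is nullSize consecutive zero bytes; for nullSize 2 the
 * string is scanned two bytes at a time so that a single zero byte
 * inside a character does not end the string. */
size_t sketch_encoded_length(const char *src, int nullSize)
{
    size_t len = 0;

    if (nullSize == 2) {
        while (src[len] != '\0' || src[len + 1] != '\0') {
            len += 2;
        }
    } else {
        while (src[len] != '\0') {
            len++;
        }
    }
    return len;
}
```

For a buffer holding the bytes 00 41 30 42 00 00, the sketch reports 4 with
a nullSize of 2 but 0 with a nullSize of 1, which shows why the terminator
width has to be declared per encoding.
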
The callback procedures toUtfProc and fromUtfProc should match the type
Tcl_EncodingConvertProc:

    typedef int Tcl_EncodingConvertProc(
        ClientData clientData,
        const char *src,
        int srcLen,
        int flags,
        Tcl_EncodingState *statePtr,
        char *dst,
        int dstLen,
        int *srcReadPtr,
        int *dstWrotePtr,
        int *dstCharsPtr);

The toUtfProc and fromUtfProc procedures are called by the
Tcl_ExternalToUtf or Tcl_UtfToExternal family of functions to perform
the actual conversion. The clientData parameter to these procedures is
the same as the clientData field specified to Tcl_CreateEncoding when
the encoding was created. The remaining arguments to the callback
procedures are the same as the arguments, documented at the top, to
Tcl_ExternalToUtf or Tcl_UtfToExternal, with the following exceptions.
If the srcLen argument to one of those high-level functions is negative,
the value passed to the callback procedure will be the appropriate
encoding-specific string length of src. If any of the srcReadPtr,
dstWrotePtr, or dstCharsPtr arguments to one of the high-level
functions is NULL, the corresponding value passed to the callback
procedure will be a non-NULL location.

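As an illustration of the callback shape, the following sketch converts
ISO 8859-1 bytes to UTF-8 using the same ten-parameter layout. It is
written without <tcl.h> so that it stands alone: void * takes the place of
ClientData, SketchState takes the place of Tcl_EncodingState, and the
L2U_* codes are hypothetical stand-ins for TCL_OK and TCL_CONVERT_NOSPACE.
Because the callback is documented to receive non-NULL count pointers, the
sketch writes through them unconditionally.

```c
#include <stddef.h>

enum { L2U_OK = 0, L2U_NOSPACE = 1 };   /* hypothetical result codes */
typedef void *SketchState;              /* stand-in for Tcl_EncodingState */

int sketch_latin1_to_utf(void *clientData, const char *src, int srcLen,
                         int flags, SketchState *statePtr,
                         char *dst, int dstLen,
                         int *srcReadPtr, int *dstWrotePtr, int *dstCharsPtr)
{
    int in = 0, out = 0, chars = 0, result = L2U_OK;

    /* ISO 8859-1 is stateless, so these arguments go unused here. */
    (void) clientData;
    (void) flags;
    (void) statePtr;

    while (in < srcLen) {
        unsigned char b = (unsigned char) src[in];
        int need = (b < 0x80) ? 1 : 2;   /* UTF-8 bytes for this character */

        if (out + need > dstLen) {
            result = L2U_NOSPACE;        /* never emit a partial character */
            break;
        }
        if (need == 1) {
            dst[out++] = (char) b;
        } else {
            dst[out++] = (char) (0xC0 | (b >> 6));
            dst[out++] = (char) (0x80 | (b & 0x3F));
        }
        in++;
        chars++;
    }
    *srcReadPtr = in;
    *dstWrotePtr = out;
    *dstCharsPtr = chars;
    return result;
}
```

A real toUtfProc would be registered through the toUtfProc field of
Tcl_EncodingType and would return the TCL_* codes from <tcl.h> instead of
the stand-ins used here.
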
The callback procedure freeProc, if non-NULL, should match the type
Tcl_EncodingFreeProc:

    typedef void Tcl_EncodingFreeProc(
        ClientData clientData);

This freeProc function is called when the encoding is deleted. The
clientData parameter is the same as the clientData field specified to
Tcl_CreateEncoding when the encoding was created.

Tcl_GetEncodingSearchPath and Tcl_SetEncodingSearchPath are called to
access and set the list of filesystem directories searched for encoding
data files.

The value returned by Tcl_GetEncodingSearchPath is the value stored by
the last successful call to Tcl_SetEncodingSearchPath. If no calls to
Tcl_SetEncodingSearchPath have occurred, Tcl will compute an initial
value based on the environment. There is one encoding search path for
the entire process, shared by all threads in the process.

Tcl_SetEncodingSearchPath stores searchPath and returns TCL_OK, unless
searchPath is not a valid Tcl list, which causes TCL_ERROR to be
returned. The elements of searchPath are not verified as existing
readable filesystem directories. When a search for encoding data files
takes place, any non-existent or non-readable filesystem directories on
the searchPath are silently ignored.

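The "silently ignored" behavior can be sketched as a loop that simply
moves on when a candidate file cannot be opened. This is a hypothetical
helper, not Tcl's code: sketch_open_encoding_file is an invented name, and
the sketch assumes that each directory holds the file directly as
<dir>/<name>.enc.

```c
#include <stdio.h>

/* Try <dir>/<name>.enc for each directory in list order and return the
 * first file that opens, or NULL if none does. A directory that does not
 * exist or cannot be read just fails to open and is skipped without any
 * error being reported. */
FILE *sketch_open_encoding_file(const char *const dirs[], int ndirs,
                                const char *name)
{
    char path[1024];
    int i;

    for (i = 0; i < ndirs; i++) {
        FILE *f;
        int n = snprintf(path, sizeof path, "%s/%s.enc", dirs[i], name);

        if (n < 0 || (size_t) n >= sizeof path) {
            continue;             /* path too long for this sketch: skip */
        }
        f = fopen(path, "r");
        if (f != NULL) {
            return f;             /* earlier directories take precedence */
        }
    }
    return NULL;
}
```

Because the directories are tried in list order, the first element of the
list acts as the preferred location in this sketch.
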
Tcl_GetDefaultEncodingDir and Tcl_SetDefaultEncodingDir are obsolete
interfaces best replaced with calls to Tcl_GetEncodingSearchPath and
Tcl_SetEncodingSearchPath. They are called to access and set the first
element of the searchPath list. Since Tcl searches searchPath for
encoding data files in list order, these routines establish the
“default” directory in which to find encoding data files.

ENCODING FILES
Space would prohibit precompiling into Tcl every possible encoding
algorithm, so many encodings are stored on disk as dynamically-loadable
encoding files. This behavior also allows the user to create additional
encoding files that can be loaded using the same mechanism.
These encoding files contain information about the tables and/or escape
sequences used to map between an external encoding and Unicode. The
external encoding may consist of single-byte, multi-byte, or
double-byte characters.

Each dynamically-loadable encoding is represented as a text file. The
initial line of the file, beginning with a “#” symbol, is a comment
that provides a human-readable description of the file. The next line
identifies the type of encoding file. It can be one of the following
letters:

[1] S   A single-byte encoding, where one character is always one byte
        long in the encoding. An example is iso8859-1, used by many
        European languages.

[2] D   A double-byte encoding, where one character is always two bytes
        long in the encoding. An example is big5, used for Chinese
        text.

[3] M   A multi-byte encoding, where one character may be either one or
        two bytes long. Certain bytes are lead bytes, indicating that
        another byte must follow and that together the two bytes
        represent one character. Other bytes are not lead bytes and
        represent themselves. An example is shiftjis, used by many
        Japanese computers.

[4] E   An escape-sequence encoding, specifying that certain sequences
        of bytes do not represent characters, but commands that describe
        how following bytes should be interpreted.

The rest of the lines in the file depend on the type.

Cases [1], [2], and [3] are collectively referred to as table-based
encoding files. The lines in a table-based encoding file are in the
same format as this example taken from the shiftjis encoding (this is
not the complete file):

    # Encoding file: shiftjis, multi-byte
    M
    003F 0 40
    00
    0000000100020003000400050006000700080009000A000B000C000D000E000F
    0010001100120013001400150016001700180019001A001B001C001D001E001F
    0020002100220023002400250026002700280029002A002B002C002D002E002F
    0030003100320033003400350036003700380039003A003B003C003D003E003F
    0040004100420043004400450046004700480049004A004B004C004D004E004F
    0050005100520053005400550056005700580059005A005B005C005D005E005F
    0060006100620063006400650066006700680069006A006B006C006D006E006F
    0070007100720073007400750076007700780079007A007B007C007D203E007F
    0080000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000FF61FF62FF63FF64FF65FF66FF67FF68FF69FF6AFF6BFF6CFF6DFF6EFF6F
    FF70FF71FF72FF73FF74FF75FF76FF77FF78FF79FF7AFF7BFF7CFF7DFF7EFF7F
    FF80FF81FF82FF83FF84FF85FF86FF87FF88FF89FF8AFF8BFF8CFF8DFF8EFF8F
    FF90FF91FF92FF93FF94FF95FF96FF97FF98FF99FF9AFF9BFF9CFF9DFF9EFF9F
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    81
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    300030013002FF0CFF0E30FBFF1AFF1BFF1FFF01309B309C00B4FF4000A8FF3E
    FFE3FF3F30FD30FE309D309E30034EDD30053006300730FC20152010FF0F005C
    301C2016FF5C2026202520182019201C201DFF08FF0930143015FF3BFF3DFF5B
    FF5D30083009300A300B300C300D300E300F30103011FF0B221200B100D70000
    00F7FF1D2260FF1CFF1E22662267221E22342642264000B0203220332103FFE5
    FF0400A200A3FF05FF03FF06FF0AFF2000A72606260525CB25CF25CE25C725C6
    25A125A025B325B225BD25BC203B301221922190219121933013000000000000
    000000000000000000000000000000002208220B2286228722822283222A2229
    000000000000000000000000000000002227222800AC21D221D4220022030000
    0000000000000000000000000000000000000000222022A52312220222072261
    2252226A226B221A223D221D2235222B222C0000000000000000000000000000
    212B2030266F266D266A2020202100B6000000000000000025EF000000000000

The third line of the file is three numbers. The first number is the
fallback character (in base 16) to use when converting from UTF-8 to
this encoding. The second number is a 1 if this file represents the
encoding for a symbol font, or 0 otherwise. The last number (in base
10) is how many pages of data follow.

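Reading that third line could look like the following. This is a minimal
sketch: SketchTableHeader and sketch_parse_header are hypothetical names,
and the sketch relies only on the three-number layout described above.

```c
#include <stdio.h>

typedef struct {
    unsigned int fallback;  /* fallback character, written in base 16 */
    int isSymbol;           /* 1 for a symbol-font encoding, 0 otherwise */
    int numPages;           /* number of pages that follow, in base 10 */
} SketchTableHeader;

/* Parse a line such as "003F 0 40". Returns 1 on success, or 0 if the
 * line does not start with three numbers in the expected bases. */
int sketch_parse_header(const char *line, SketchTableHeader *h)
{
    return sscanf(line, "%x %d %d",
                  &h->fallback, &h->isSymbol, &h->numPages) == 3;
}
```

Applied to the shiftjis example, the line "003F 0 40" yields a fallback of
0x3F (the question mark), a symbol flag of 0, and 40 pages.
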
Subsequent lines in the example above are pages that describe how to
map from the encoding into 2-byte Unicode. The first line in a page
identifies the page number. Following it are 256 double-byte numbers,
arranged as 16 rows of 16 numbers. Given a character in the encoding,
the high byte of that character is used to select which page, and the
low byte of that character is used as an index to select one of the
double-byte numbers in that page; the value obtained is the
corresponding Unicode character. By examination of the example above,
one can see that the characters 0x7E and 0x8163 in shiftjis map to 203E
and 2026 in Unicode, respectively.

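The page, row, and column arithmetic can be checked against the two rows
of the example that contain those characters. The helpers below are a
hypothetical sketch (sketch_locate, sketch_decode_row, and
sketch_row_lookup are invented names) that decodes one 64-digit row at a
time instead of loading a whole file.

```c
#include <stdio.h>
#include <string.h>

/* Split a character into page number, row within the page, and column
 * within the row: high byte, then the two nibbles of the low byte. */
void sketch_locate(unsigned int ch, int *page, int *row, int *col)
{
    *page = (int) ((ch >> 8) & 0xFF);
    *row  = (int) ((ch >> 4) & 0x0F);
    *col  = (int) (ch & 0x0F);
}

/* Decode one row of 64 hex digits into 16 values. Returns 1 on success. */
int sketch_decode_row(const char *row, unsigned short out[16])
{
    int i;

    if (strlen(row) < 64) {
        return 0;
    }
    for (i = 0; i < 16; i++) {
        unsigned int v;
        char digits[5];

        memcpy(digits, row + 4 * i, 4);
        digits[4] = '\0';
        if (sscanf(digits, "%4x", &v) != 1) {
            return 0;
        }
        out[i] = (unsigned short) v;
    }
    return 1;
}

/* Look up ch in the row that is assumed to be the right one for it;
 * returns 0 if the row cannot be decoded. */
unsigned short sketch_row_lookup(const char *row, unsigned int ch)
{
    unsigned short values[16];

    if (!sketch_decode_row(row, values)) {
        return 0;
    }
    return values[ch & 0x0F];
}
```

For 0x8163 the sketch selects page 0x81, row 6, column 3, and the seventh
row of page 81 in the example holds 2026 at that column.
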
Following the first page will be all the other pages, each in the same
format as the first: one number identifying the page followed by 256
double-byte Unicode characters. If a character in the encoding maps to
the Unicode character 0000, it means that the character does not
actually exist. If all characters on a page would map to 0000, that page
can be omitted.

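One way to hold such a sparse set of pages in memory is an array of page
pointers in which omitted pages are NULL. The structure below is a
hypothetical sketch (SketchTable and sketch_to_unicode are invented
names). It applies the 0000 rule literally and therefore does not
special-case the NUL character.

```c
#include <stddef.h>

/* pages[p] points at the 256 values of page p, or is NULL when the page
 * was omitted from the file because every entry on it would be 0000. */
typedef struct {
    const unsigned short *pages[256];
} SketchTable;

/* Returns 1 and stores the Unicode value if ch exists in the encoding;
 * returns 0 if its page is absent or its entry is 0000. */
int sketch_to_unicode(const SketchTable *t, unsigned int ch,
                      unsigned short *ucsPtr)
{
    const unsigned short *page = t->pages[(ch >> 8) & 0xFF];

    if (page == NULL || page[ch & 0xFF] == 0) {
        return 0;
    }
    *ucsPtr = page[ch & 0xFF];
    return 1;
}
```

An omitted page and an explicit 0000 entry are indistinguishable to the
caller here, which matches the way the file format treats them.
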
Case [4] is the escape-sequence encoding file. The lines in this
type of file are in the same format as this example taken from the
iso2022-jp encoding:

    # Encoding file: iso2022-jp, escape-driven
    E
    init {}
    final {}
    iso8859-1 \x1b(B
    jis0201 \x1b(J
    jis0208 \x1b$@
    jis0208 \x1b$B
    jis0212 \x1b$(D
    gb2312 \x1b$A
    ksc5601 \x1b$(C

In the file, the first column represents an option and the second
column is the associated value. init is a string to emit or expect
before the first character is converted, while final is a string to emit
or expect after the last character. All other options are names of
table-based encodings; the associated value is the escape-sequence that
marks that encoding. Tcl syntax is used for the values; in the above
example, for instance, “{}” represents the empty string and “\x1b”
represents character 27.

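Turning such a value into raw bytes might look like the sketch below. It
is deliberately limited: sketch_decode_value is a hypothetical helper that
understands only the two forms seen in the example, "{}" and \x followed by
one or two hex digits, and copies every other character through unchanged.
Full Tcl syntax has many more substitution rules that are not modeled here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static int sketch_hex_value(int c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return tolower(c) - 'a' + 10;
}

/* Decode value into out (capacity outCap bytes). Returns the number of
 * bytes produced, or -1 if out is too small. */
int sketch_decode_value(const char *value, unsigned char *out, size_t outCap)
{
    size_t n = 0;
    const char *p = value;

    if (strcmp(value, "{}") == 0) {
        return 0;                          /* braces denote the empty string */
    }
    while (*p != '\0') {
        unsigned int byte;

        if (p[0] == '\\' && p[1] == 'x' && isxdigit((unsigned char) p[2])) {
            byte = (unsigned int) sketch_hex_value((unsigned char) p[2]);
            p += 3;
            if (isxdigit((unsigned char) *p)) {
                byte = byte * 16
                        + (unsigned int) sketch_hex_value((unsigned char) *p);
                p++;
            }
        } else {
            byte = (unsigned char) *p++;   /* ordinary character */
        }
        if (n >= outCap) {
            return -1;
        }
        out[n++] = (unsigned char) byte;
    }
    return (int) n;
}
```

With this sketch, the value for iso8859-1 in the example decodes to the
three bytes 27, '(' and 'B'.
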
When Tcl_GetEncoding encounters an encoding name that has not been
loaded, it attempts to load an encoding file called name.enc from the
encoding subdirectory of each directory that Tcl searches for its
script library. If the encoding file exists, but is malformed, an
error message will be left in interp.

KEYWORDS
utf, encoding, convert

Tcl                                   8.1                   Tcl_GetEncoding(3)