QTextCodec(3qt)


NAME

       QTextCodec - Conversion between text encodings

SYNOPSIS

       Almost all the functions in this class are reentrant when Qt is built
       with thread support. The exceptions are ~QTextCodec(), setCodecForTr(),
       setCodecForCStrings(), and QTextCodec().

       #include <qtextcodec.h>

       Inherited by QBig5Codec, QBig5hkscsCodec, QEucJpCodec, QEucKrCodec,
       QGb18030Codec, QJisCodec, QHebrewCodec, QSjisCodec, and QTsciiCodec.

   Public Members
       virtual ~QTextCodec ()
       virtual const char * name () const = 0
       virtual const char * mimeName () const
       virtual int mibEnum () const = 0
       virtual QTextDecoder * makeDecoder () const
       virtual QTextEncoder * makeEncoder () const
       virtual QString toUnicode ( const char * chars, int len ) const
       virtual QCString fromUnicode ( const QString & uc, int & lenInOut )
           const
       QCString fromUnicode ( const QString & uc ) const
       QString toUnicode ( const QByteArray & a, int len ) const
       QString toUnicode ( const QByteArray & a ) const
       QString toUnicode ( const QCString & a, int len ) const
       QString toUnicode ( const QCString & a ) const
       QString toUnicode ( const char * chars ) const
       virtual bool canEncode ( QChar ch ) const
       virtual bool canEncode ( const QString & s ) const
       virtual int heuristicContentMatch ( const char * chars, int len ) const
           = 0
       virtual int heuristicNameMatch ( const char * hint ) const

   Static Public Members
       QTextCodec * loadCharmap ( QIODevice * iod )
       QTextCodec * loadCharmapFile ( QString filename )
       QTextCodec * codecForMib ( int mib )
       QTextCodec * codecForName ( const char * name, int accuracy = 0 )
       QTextCodec * codecForContent ( const char * chars, int len )
       QTextCodec * codecForIndex ( int i )
       QTextCodec * codecForLocale ()
       void setCodecForLocale ( QTextCodec * c )
       QTextCodec * codecForTr ()
       void setCodecForTr ( QTextCodec * c )
       QTextCodec * codecForCStrings ()
       void setCodecForCStrings ( QTextCodec * c )
       void deleteAllCodecs ()
       const char * locale ()

   Protected Members
       QTextCodec ()

   Static Protected Members
       int simpleHeuristicNameMatch ( const char * name, const char * hint )

DESCRIPTION

       The QTextCodec class provides conversion between text encodings.

       Qt uses Unicode to store, draw and manipulate strings. In many
       situations you may wish to deal with data that uses a different
       encoding. For example, most Japanese documents are still stored in
       Shift-JIS or ISO2022, while Russian users often have their documents in
       KOI8-R or CP1251.

       Qt provides a set of QTextCodec classes to help with converting
       non-Unicode formats to and from Unicode. You can also create your own
       codec classes (see later).

       The supported encodings are:

       Latin1
       Big5 -- Chinese
       Big5-HKSCS -- Chinese
       eucJP -- Japanese
       eucKR -- Korean
       GB2312 -- Chinese
       GBK -- Chinese
       GB18030 -- Chinese
       JIS7 -- Japanese
       Shift-JIS -- Japanese
       TSCII -- Tamil
       utf8 -- Unicode, 8-bit
       utf16 -- Unicode
       KOI8-R -- Russian
       KOI8-U -- Ukrainian
       ISO8859-1 -- Western
       ISO8859-2 -- Central European
       ISO8859-3 -- South European
       ISO8859-4 -- Baltic
       ISO8859-5 -- Cyrillic
       ISO8859-6 -- Arabic
       ISO8859-7 -- Greek
       ISO8859-8 -- Hebrew, visually ordered
       ISO8859-8-i -- Hebrew, logically ordered
       ISO8859-9 -- Turkish
       ISO8859-10
       ISO8859-13
       ISO8859-14
       ISO8859-15 -- Western
       IBM 850
       IBM 866
       CP874
       CP1250 -- Central European
       CP1251 -- Cyrillic
       CP1252 -- Western
       CP1253 -- Greek
       CP1254 -- Turkish
       CP1255 -- Hebrew
       CP1256 -- Arabic
       CP1257 -- Baltic
       CP1258
       Apple Roman
       TIS-620 -- Thai

       QTextCodecs can be used as follows to convert some locally encoded
       string to Unicode. Suppose you have some string encoded in Russian
       KOI8-R encoding, and want to convert it to Unicode. The simple way to
       do this is:

           QCString locallyEncoded = "..."; // text to convert
           QTextCodec *codec = QTextCodec::codecForName("KOI8-R"); // get the codec for KOI8-R
           QString unicodeString = codec->toUnicode( locallyEncoded );

       After this, unicodeString holds the text converted to Unicode.
       Converting a string from Unicode to the local encoding is just as easy:

           QString unicodeString = "..."; // any Unicode text
           QTextCodec *codec = QTextCodec::codecForName("KOI8-R"); // get the codec for KOI8-R
           QCString locallyEncoded = codec->fromUnicode( unicodeString );

       Some care must be taken when trying to convert the data in chunks, for
       example, when receiving it over a network. In such cases it is possible
       that a multi-byte character will be split over two chunks. At best this
       might result in the loss of a character and at worst cause the entire
       conversion to fail.

       The approach to use in these situations is to create a QTextDecoder
       object for the codec and use this QTextDecoder for the whole decoding
       process, as shown below:

           QTextCodec *codec = QTextCodec::codecForName( "Shift-JIS" );
           QTextDecoder *decoder = codec->makeDecoder();
           QString unicodeString;
           while( receiving_data ) {
               QByteArray chunk = new_data;
               unicodeString += decoder->toUnicode( chunk.data(), chunk.size() );
           }
           delete decoder; // the caller owns the decoder returned by makeDecoder()

       The QTextDecoder object maintains state between chunks and therefore
       works correctly even if a multi-byte character is split between chunks.
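
       To illustrate why the decoder has to keep state between chunks, here is
       a minimal sketch in standard C++ that decodes UTF-8 piece by piece. It
       is not Qt code: the class name Utf8ChunkDecoder and its members are
       hypothetical, and the sketch does not reject overlong or out-of-range
       sequences.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stateful decoder; not part of Qt.
class Utf8ChunkDecoder {
public:
    Utf8ChunkDecoder() : pending_(0), codePoint_(0) {}

    // Decodes one chunk and returns the code points completed by it.
    // An unfinished multi-byte sequence is remembered for the next call.
    std::u32string decode(const char *bytes, std::size_t len) {
        std::u32string out;
        for (std::size_t i = 0; i < len; ++i) {
            const unsigned char b = static_cast<unsigned char>(bytes[i]);
            if (pending_ > 0) {
                if ((b & 0xC0) == 0x80) {        // continuation byte
                    codePoint_ = (codePoint_ << 6) | (b & 0x3F);
                    if (--pending_ == 0)
                        out.push_back(codePoint_);
                    continue;
                }
                out.push_back(0xFFFD);           // broken sequence
                pending_ = 0;                    // b is handled as a new start
            }
            if (b < 0x80) {
                out.push_back(b);                // plain ASCII
            } else if ((b & 0xE0) == 0xC0) {
                codePoint_ = b & 0x1F; pending_ = 1;
            } else if ((b & 0xF0) == 0xE0) {
                codePoint_ = b & 0x0F; pending_ = 2;
            } else if ((b & 0xF8) == 0xF0) {
                codePoint_ = b & 0x07; pending_ = 3;
            } else {
                out.push_back(0xFFFD);           // invalid lead byte
            }
        }
        return out;
    }

private:
    int pending_;          // continuation bytes still expected
    char32_t codePoint_;   // partially assembled code point
};
```

       Feeding the two bytes of a character in separate calls yields nothing
       from the first call and the complete character from the second, which
       is the behaviour a stateless conversion cannot provide.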

Creating your own Codec class

       Support for new text encodings can be added to Qt by creating
       QTextCodec subclasses.

       Built-in codecs can be overridden by custom codecs since more recently
       created QTextCodec objects take precedence over earlier ones.

       You may find it more convenient to make your codec class available as a
       plugin; see the plugin documentation for more details.

       The pure virtual functions describe the codec to the system. The codec
       is then used as required by the text file formats that QTextStream
       supports and, under X11, for locale-specific character input and
       output.

       To add support for another 8-bit encoding to Qt, make a subclass of
       QTextCodec and implement at least the following methods:

           const char* name() const
       Return the official name for the encoding.

           int mibEnum() const
       Return the MIB enum for the encoding if it is listed in the IANA
       character-sets encoding file.

       If the encoding is multi-byte then it will have "state"; that is, the
       interpretation of some bytes will be dependent on some preceding bytes.
       For such encodings, you must implement:

           QTextDecoder* makeDecoder() const
       Return a QTextDecoder that remembers incomplete multi-byte sequence
       prefixes or other required state.

       If the encoding does not require state, you should implement:

           QString toUnicode(const char* chars, int len) const
       Converts len characters from chars to Unicode.

       The base QTextCodec class has default implementations of the above two
       functions, but they are mutually recursive, so you must reimplement at
       least one of them, or both for improved efficiency.

       For conversion from Unicode to 8-bit encodings, it is rarely necessary
       to maintain state. However, two functions similar to the two above are
       used for encoding:

           QTextEncoder* makeEncoder() const
       Return a QTextEncoder.

           QCString fromUnicode(const QString& uc, int& lenInOut ) const
       Converts lenInOut characters (of type QChar) from the start of the
       string uc, returning a QCString result, and also returning the length
       of the result in lenInOut.

       Again, these are mutually recursive so only one needs to be
       implemented, or both if greater efficiency is possible.

       Finally, you must implement:

           int heuristicContentMatch(const char* chars, int len) const
       Gives a value indicating how likely it is that len characters from
       chars are in the encoding.

       A good model for this function is the
       QWindowsLocalCodec::heuristicContentMatch function found in the Qt
       sources.

       A QTextCodec subclass might have improved performance if you also
       reimplement:

           bool canEncode( QChar ) const
       Test if a Unicode character can be encoded.

           bool canEncode( const QString& ) const
       Test if a string of Unicode characters can be encoded.

           int heuristicNameMatch(const char* hint) const
       Test if a possibly non-standard name is referring to the codec.

       Codecs can also be created as plugins.

       See also Internationalization with Qt.
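
       The mutually recursive defaults described in this section can be
       pictured with a small standard C++ sketch. It is an analogy, not Qt
       code: MiniCodec, decodeByte(), Latin1ByByte and Latin1Bulk are
       hypothetical names, and decodeByte() merely stands in for the role that
       makeDecoder() plays in QTextCodec.

```cpp
#include <cstddef>
#include <string>

// Hypothetical miniature of the pattern; these are not Qt classes.
class MiniCodec {
public:
    virtual ~MiniCodec() {}

    // Default bulk conversion: delegates to decodeByte().
    virtual std::u32string toUnicode(const std::string &bytes) const {
        std::u32string out;
        for (std::size_t i = 0; i < bytes.size(); ++i)
            out.push_back(decodeByte(static_cast<unsigned char>(bytes[i])));
        return out;
    }

    // Default per-byte conversion: delegates back to toUnicode().
    // If a subclass overrides neither function, the two defaults
    // call each other without end.
    virtual char32_t decodeByte(unsigned char byte) const {
        return toUnicode(std::string(1, static_cast<char>(byte)))[0];
    }
};

// Overrides only the per-byte hook; the inherited toUnicode() now works.
class Latin1ByByte : public MiniCodec {
public:
    virtual char32_t decodeByte(unsigned char byte) const { return byte; }
};

// Overrides only the bulk function; the inherited decodeByte() now works.
class Latin1Bulk : public MiniCodec {
public:
    virtual std::u32string toUnicode(const std::string &bytes) const {
        std::u32string out;
        for (std::size_t i = 0; i < bytes.size(); ++i)
            out.push_back(static_cast<unsigned char>(bytes[i]));
        return out;
    }
};
```

       Overriding either function breaks the cycle; overriding both avoids the
       extra indirection, which is the efficiency gain mentioned above.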

MEMBER FUNCTION DOCUMENTATION

QTextCodec::QTextCodec () [protected]

       Warning: This function is not reentrant.

       Constructs a QTextCodec, and gives it the highest precedence. The
       QTextCodec should always be constructed on the heap (i.e. with new). Qt
       takes ownership and will delete it when the application terminates.

QTextCodec::~QTextCodec () [virtual]

       Warning: This function is not reentrant.

       Destroys the QTextCodec. Note that you should not delete codecs
       yourself: once created they become Qt's responsibility.

bool QTextCodec::canEncode ( QChar ch ) const [virtual]

       Returns TRUE if the Unicode character ch can be fully encoded with this
       codec; otherwise returns FALSE. The default implementation tests if the
       result of toUnicode(fromUnicode(ch)) is the original ch. Subclasses may
       be able to improve the efficiency.
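
       The round-trip test used by the default implementation can be sketched
       in standard C++ as follows. This is an illustration for a Latin-1 style
       encoding, not Qt's code; encodeLatin1(), decodeLatin1() and
       canEncodeByRoundTrip() are hypothetical helpers, and using '?' for
       unencodable characters is an assumption of the sketch.

```cpp
// Hypothetical Latin-1 encoder: characters above U+00FF become '?'.
char encodeLatin1(char32_t ch)
{
    return ch <= 0xFF ? static_cast<char>(static_cast<unsigned char>(ch))
                      : '?';
}

// Hypothetical Latin-1 decoder: every byte maps to the same code point.
char32_t decodeLatin1(char byte)
{
    return static_cast<unsigned char>(byte);
}

// A character is encodable if encoding and decoding it gives it back.
bool canEncodeByRoundTrip(char32_t ch)
{
    return decodeLatin1(encodeLatin1(ch)) == ch;
}
```

       A subclass that knows its repertoire can answer with a simple range
       check instead, which is the efficiency improvement hinted at above.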

bool QTextCodec::canEncode ( const QString & s ) const [virtual]

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       s contains the string being tested for encode-ability.

QTextCodec * QTextCodec::codecForCStrings () [static]

       Returns the codec used by QString to convert to and from const char*
       and QCStrings. If this function returns 0 (the default), QString
       assumes Latin-1.

       See also setCodecForCStrings().

QTextCodec * QTextCodec::codecForContent ( const char * chars, int len )
       [static]

       Searches all installed QTextCodec objects, returning the one that best
       matches the given content. May return 0.

       Note that this is often a poor choice, since character encodings often
       use most of the available character sequences, and so only by
       linguistic analysis could a true match be made.

       chars contains the string to check, and len contains the number of
       characters in the string to use.

       See also heuristicContentMatch().

       Example: qwerty/qwerty.cpp.

QTextCodec * QTextCodec::codecForIndex ( int i ) [static]

       Returns the QTextCodec i positions from the most recently inserted
       codec, or 0 if there is no such QTextCodec. Thus, codecForIndex(0)
       returns the most recently created QTextCodec.

       Example: qwerty/qwerty.cpp.

QTextCodec * QTextCodec::codecForLocale () [static]

       Returns a pointer to the codec most suitable for this locale.

       Example: qwerty/qwerty.cpp.

QTextCodec * QTextCodec::codecForMib ( int mib ) [static]

       Returns the QTextCodec which matches the MIBenum mib.

QTextCodec * QTextCodec::codecForName ( const char * name, int accuracy = 0 )
       [static]

       Searches all installed QTextCodec objects and returns the one which
       best matches name; the match is case-insensitive. Returns 0 if no
       codec's heuristicNameMatch() reports a match better than accuracy, or
       if name is a null string.

       See also heuristicNameMatch().

QTextCodec * QTextCodec::codecForTr () [static]

       Returns the codec used by QObject::tr() on its argument. If this
       function returns 0 (the default), tr() assumes Latin-1.

       See also setCodecForTr().

void QTextCodec::deleteAllCodecs () [static]

       Deletes all the created codecs.

       Warning: Do not call this function.

       QApplication calls this function just before exiting to delete any
       QTextCodec objects that may be lying around. Since various other
       classes hold pointers to QTextCodec objects, it is not safe to call
       this function earlier.

       If you are using the utility classes (like QString) but not using
       QApplication, calling this function at the very end of your application
       may be helpful for chasing down memory leaks by eliminating any
       QTextCodec objects.

QCString QTextCodec::fromUnicode ( const QString & uc, int & lenInOut ) const
       [virtual]

       QTextCodec subclasses must reimplement either this function or
       makeEncoder(). It converts the first lenInOut characters of uc from
       Unicode to the encoding of the subclass. If lenInOut is negative or too
       large, the length of uc is used instead.

       Converts lenInOut characters (not bytes) from uc, producing a QCString.
       lenInOut will be set to the length of the result (in bytes).

       The default implementation makes an encoder with makeEncoder() and
       converts the input with that. Note that the default makeEncoder()
       implementation makes an encoder that simply calls this function, hence
       subclasses must reimplement one function or the other to avoid infinite
       recursion.

       Reimplemented in QHebrewCodec.

QCString QTextCodec::fromUnicode ( const QString & uc ) const

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       uc is the Unicode source string.

int QTextCodec::heuristicContentMatch ( const char * chars, int len ) const
       [pure virtual]

       QTextCodec subclasses must reimplement this function. It examines the
       first len bytes of chars and returns a value indicating how likely it
       is that the string is a prefix of text encoded in the encoding of the
       subclass. A negative return value indicates that the text is detectably
       not in the encoding (e.g. it contains characters undefined in the
       encoding). A return value of 0 indicates that the text should be
       decoded with this codec rather than as ASCII, but there is no
       particular evidence. The value should range up to len. Thus, most
       decoders will return -1, 0, or -len.

       The characters are not null terminated.

       See also codecForContent().

int QTextCodec::heuristicNameMatch ( const char * hint ) const [virtual]

       Returns a value indicating how likely it is that this decoder is
       appropriate for decoding some format that has the given name. The name
       is compared with the hint.

       A good match returns a positive number around the length of the string.
       A bad match is negative.

       The default implementation calls simpleHeuristicNameMatch() with the
       name of the codec.

QTextCodec * QTextCodec::loadCharmap ( QIODevice * iod ) [static]

       Reads a POSIX2 charmap definition from iod. The parser recognizes the
       following lines:

           <code_set_name> name
           <escape_char> character
           % alias alias
           CHARMAP
           <token> /xhexbyte <Uunicode> ...
           <token> /ddecbyte <Uunicode> ...
           <token> /octbyte <Uunicode> ...
           <token> /any/any... <Uunicode> ...
           END CHARMAP

       The resulting QTextCodec is returned (and also added to the global list
       of codecs). The name() of the result is taken from the code_set_name.

       Note that a codec constructed in this way uses much more memory and is
       slower than a hand-written QTextCodec subclass, since tables in code
       are kept in memory shared by all Qt applications.

       See also loadCharmapFile().

       Example: qwerty/qwerty.cpp.

QTextCodec * QTextCodec::loadCharmapFile ( QString filename ) [static]

       A convenience function for loadCharmap() that loads the charmap
       definition from the file filename.

const char * QTextCodec::locale () [static]

       Returns a string representing the current language and sublanguage,
       e.g. "pt" for Portuguese, or "pt_br" for Portuguese/Brazil.

       Example: i18n/main.cpp.

QTextDecoder * QTextCodec::makeDecoder () const [virtual]

       Creates a QTextDecoder which stores enough state to decode chunks of
       char* data to create chunks of Unicode data. The default implementation
       creates a stateless decoder, which is only sufficient for the simplest
       encodings where each byte corresponds to exactly one Unicode character.

       The caller is responsible for deleting the returned object.

QTextEncoder * QTextCodec::makeEncoder () const [virtual]

       Creates a QTextEncoder which stores enough state to encode chunks of
       Unicode data as char* data. The default implementation creates a
       stateless encoder, which is only sufficient for the simplest encodings
       where each Unicode character corresponds to exactly one character.

       The caller is responsible for deleting the returned object.

int QTextCodec::mibEnum () const [pure virtual]

       Subclasses of QTextCodec must reimplement this function. It returns the
       MIBenum (see the IANA character-sets encoding file for more
       information). It is important that each QTextCodec subclass returns the
       correct unique value for this function.

       Reimplemented in QEucJpCodec.

const char * QTextCodec::mimeName () const [virtual]

       Returns the preferred MIME name of the encoding as defined in the IANA
       character-sets encoding file.

       Reimplemented in QEucJpCodec, QEucKrCodec, QJisCodec, QHebrewCodec, and
       QSjisCodec.

const char * QTextCodec::name () const [pure virtual]

       QTextCodec subclasses must reimplement this function. It returns the
       name of the encoding supported by the subclass. When choosing a name
       for an encoding, consider these points:

       On X11, heuristicNameMatch( const char * hint ) is used to test if the
       QTextCodec can convert between Unicode and the encoding of a font with
       encoding hint, such as "iso8859-1" for Latin-1 fonts or "koi8-r" for
       Russian KOI8 fonts. The default algorithm of heuristicNameMatch() uses
       name().

       Some applications may use this function to present encodings to the end
       user.

       Example: qwerty/qwerty.cpp.

void QTextCodec::setCodecForCStrings ( QTextCodec * c ) [static]

       Warning: This function is not reentrant.

       Sets the codec used by QString to convert to and from const char* and
       QCStrings. If c is 0 (the default), QString assumes Latin-1.

       Warning: Some codecs do not preserve the characters in the ASCII range
       (0x00 to 0x7f). For example, the Japanese Shift-JIS encoding maps the
       backslash character (0x5c) to the Yen character. This leads to
       unexpected results when using the backslash character to escape
       characters in strings used in e.g. regular expressions. Use
       QString::fromLatin1() to preserve characters in the ASCII range when
       needed.

       See also codecForCStrings() and setCodecForTr().

void QTextCodec::setCodecForLocale ( QTextCodec * c ) [static]

       Set the codec to c; this will be returned by codecForLocale(). This
       might be needed for some applications that want to use their own
       mechanism for setting the locale.

       See also codecForLocale().

void QTextCodec::setCodecForTr ( QTextCodec * c ) [static]

       Warning: This function is not reentrant.

       Sets the codec used by QObject::tr() on its argument to c. If c is 0
       (the default), tr() assumes Latin-1.

       If the literal quoted text in the program is not in the Latin-1
       encoding, this function can be used to set the appropriate encoding.
       For example, software developed by Korean programmers might use eucKR
       for all the text in the program, in which case the main() function
       might look like this:

           #include <qapplication.h>
           #include <qtextcodec.h>

           int main(int argc, char** argv)
           {
               QApplication app(argc, argv);
               // ... install any additional codecs ...
               QTextCodec::setCodecForTr( QTextCodec::codecForName("eucKR") );
               // ...
           }

       Note that this is not the way to select the encoding that the user has
       chosen. For example, to convert an application containing literal
       English strings to Korean, all that is needed is for the English
       strings to be passed through tr() and for translation files to be
       loaded. For details of internationalization, see the Qt
       internationalization documentation.

       See also codecForTr() and setCodecForCStrings().

int QTextCodec::simpleHeuristicNameMatch ( const char * name, const char *
       hint ) [static protected]

       A simple utility function for heuristicNameMatch(): it does some very
       minor character-skipping so that almost-exact matches score high. name
       is the text we're matching and hint is used for the comparison.
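
       The idea of character-skipping name comparison can be sketched in
       standard C++ as below. This is an assumption about what such a matcher
       might look like, not the algorithm in the Qt sources: looseNameMatch()
       is a hypothetical function that ignores case and punctuation, returns
       the number of matched letters and digits on success, and -1 otherwise.

```cpp
#include <cctype>

// Hypothetical loose comparison of an encoding name against a hint.
int looseNameMatch(const char *name, const char *hint)
{
    int matched = 0;
    for (;;) {
        // Skip separators such as '-', '_' and spaces on both sides.
        while (*name && !std::isalnum(static_cast<unsigned char>(*name)))
            ++name;
        while (*hint && !std::isalnum(static_cast<unsigned char>(*hint)))
            ++hint;
        if (!*name || !*hint)
            break;
        if (std::tolower(static_cast<unsigned char>(*name)) !=
            std::tolower(static_cast<unsigned char>(*hint)))
            return -1;                   // letters or digits differ
        ++matched;
        ++name;
        ++hint;
    }
    // Both strings must be exhausted for the names to match.
    return (*name || *hint) ? -1 : matched;
}
```

       With this scheme "ISO8859-1" and "iso-8859-1" match, while "ISO8859-1"
       and "ISO8859-15" do not.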

QString QTextCodec::toUnicode ( const char * chars, int len ) const [virtual]

       QTextCodec subclasses must reimplement this function or makeDecoder().
       It converts the first len characters of chars to Unicode.

       The default implementation makes a decoder with makeDecoder() and
       converts the input with that. Note that the default makeDecoder()
       implementation makes a decoder that simply calls this function, hence
       subclasses must reimplement one function or the other to avoid infinite
       recursion.

QString QTextCodec::toUnicode ( const QByteArray & a, int len ) const

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       a contains the source characters; len contains the number of characters
       in a to use.

QString QTextCodec::toUnicode ( const QByteArray & a ) const

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       a contains the source characters.

QString QTextCodec::toUnicode ( const QCString & a, int len ) const

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       a contains the source characters; len contains the number of characters
       in a to use.

QString QTextCodec::toUnicode ( const QCString & a ) const

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       a contains the source characters.

QString QTextCodec::toUnicode ( const char * chars ) const

       This is an overloaded member function, provided for convenience. It
       behaves essentially like the above function.

       chars contains the source characters.

COPYRIGHT

       Copyright 1992-2007 Trolltech ASA, http://www.trolltech.com.  See the
       license file included in the distribution for a complete license
       statement.

AUTHOR

       Generated automatically from the source code.

BUGS

       If you find a bug in Qt, please report it as described in
       http://doc.trolltech.com/bughowto.html.  Good bug reports help us to
       help you. Thank you.

       The definitive Qt documentation is provided in HTML format; it is
       located at $QTDIR/doc/html and can be read using Qt Assistant or with a
       web browser. This man page is provided as a convenience for those users
       who prefer man pages, although this format is not officially supported
       by Trolltech.

       If you find errors in this manual page, please report them to
       qt-bugs@trolltech.com.  Please include the name of the manual page
       (qtextcodec.3qt) and the Qt version (3.3.8).

Trolltech AS                    2 February 2007                QTextCodec(3qt)