QRegExp(3qt)

NAME

       QRegExp - Pattern matching using regular expressions

SYNOPSIS

       All the functions in this class are reentrant when Qt is built with
       thread support.

       #include <qregexp.h>

   Public Members
       enum CaretMode { CaretAtZero, CaretAtOffset, CaretWontMatch }
       QRegExp ()
       QRegExp ( const QString & pattern, bool caseSensitive = TRUE, bool
           wildcard = FALSE )
       QRegExp ( const QRegExp & rx )
       ~QRegExp ()
       QRegExp & operator= ( const QRegExp & rx )
       bool operator== ( const QRegExp & rx ) const
       bool operator!= ( const QRegExp & rx ) const
       bool isEmpty () const
       bool isValid () const
       QString pattern () const
       void setPattern ( const QString & pattern )
       bool caseSensitive () const
       void setCaseSensitive ( bool sensitive )
       bool wildcard () const
       void setWildcard ( bool wildcard )
       bool minimal () const
       void setMinimal ( bool minimal )
       bool exactMatch ( const QString & str ) const
       int match ( const QString & str, int index = 0, int * len = 0, bool
           indexIsStart = TRUE ) const  (obsolete)
       int search ( const QString & str, int offset = 0, CaretMode caretMode =
           CaretAtZero ) const
       int searchRev ( const QString & str, int offset = -1, CaretMode
           caretMode = CaretAtZero ) const
       int matchedLength () const
       int numCaptures () const
       QStringList capturedTexts ()
       QString cap ( int nth = 0 )
       int pos ( int nth = 0 )
       QString errorString ()

   Static Public Members
       QString escape ( const QString & str )

DESCRIPTION

       The QRegExp class provides pattern matching using regular expressions.

       Regular expressions, or "regexps", provide a way to find patterns
       within text. This is useful in many contexts, for example validation,
       searching, search and replace, and string splitting.

       We present a very brief introduction to regexps, a description of Qt's
       regexp language, some code examples, and finally the function
       documentation itself. QRegExp is modeled on Perl's regexp language, and
       also fully supports Unicode. QRegExp can also be used in the weaker
       'wildcard' (globbing) mode which works in a similar way to command
       shells. A good text on regexps is Mastering Regular Expressions:
       Powerful Techniques for Perl and Other Tools by Jeffrey E. Friedl, ISBN
       1565922573.

       Experienced regexp users may prefer to skip the introduction and go
       directly to the relevant information.

       In case of multi-threaded programming, note that QRegExp depends on
       QThreadStorage internally. For that reason, QRegExp should only be used
       with threads started with QThread, i.e. not with threads started with
       platform-specific APIs.

       Introduction
       Characters and Abbreviations for Sets of Characters
       Sets of Characters
       Quantifiers
       Capturing Text
       Assertions
       Wildcard Matching (globbing)
       Notes for Perl Users
       Code Examples

Introduction

       Regexps are built up from expressions, quantifiers, and assertions. The
       simplest form of expression is simply a character, e.g. x or 5. An
       expression can also be a set of characters. For example, [ABCD] will
       match an A or a B or a C or a D. As a shorthand we could write this as
       [A-D]. If we want to match any of the capital letters in the English
       alphabet we can write [A-Z]. A quantifier tells the regexp engine how
       many occurrences of the expression we want, e.g. x{1,1} means match an
       x which occurs at least once and at most once. We'll look at assertions
       and more complex expressions later.

       Note that in general regexps cannot be used to check for balanced
       brackets or tags. For example if you want to match an opening html <b>
       and its closing </b> you can only use a regexp if you know that these
       tags are not nested; the html fragment, <b>bold <b>bolder</b></b> will
       not match as expected. If you know the maximum level of nesting it is
       possible to create a regexp that will match correctly, but for an
       unknown level of nesting, regexps will fail.

       We'll start by writing a regexp to match integers in the range 0 to 99.
       We will require at least one digit so we will start with [0-9]{1,1}
       which means match a digit exactly once. This regexp alone will match
       integers in the range 0 to 9. To match one or two digits we can
       increase the maximum number of occurrences so the regexp becomes
       [0-9]{1,2} meaning match a digit at least once and at most twice.
       However, this regexp as it stands will not match correctly. This regexp
       will match one or two digits within a string. To ensure that we match
       against the whole string we must use the anchor assertions. We need ^
       (caret) which when it is the first character in the regexp means that
       the regexp must match from the beginning of the string. And we also
       need $ (dollar) which when it is the last character in the regexp means
       that the regexp must match until the end of the string. So now our
       regexp is ^[0-9]{1,2}$. Note that assertions, such as ^ and $, do not
       match any characters.

       If you've seen regexps elsewhere they may have looked different from
       the ones above. This is because some sets of characters and some
       quantifiers are so common that they have special symbols to represent
       them. [0-9] can be replaced with the symbol \d. The quantifier to match
       exactly one occurrence, {1,1}, can be replaced with the expression
       itself. This means that x{1,1} is exactly the same as x alone. So our 0
       to 99 matcher could be written ^\d{1,2}$. Another way of writing it
       would be ^\d\d{0,1}$, i.e. from the start of the string match a digit
       followed by zero or one digits. In practice most people would write it
       ^\d\d?$. The ? is a shorthand for the quantifier {0,1}, i.e. a minimum
       of no occurrences and a maximum of one occurrence. This is used to make
       an expression optional. The regexp ^\d\d?$ means "from the beginning of
       the string match one digit followed by zero or one digits and then the
       end of the string".

       Our second example is matching the words 'mail', 'letter' or
       'correspondence' but without matching 'email', 'mailman', 'mailer',
       'letterbox' etc. We'll start by just matching 'mail'. In full the
       regexp is m{1,1}a{1,1}i{1,1}l{1,1}, but since each expression itself
       is automatically quantified by {1,1} we can simply write this as mail;
       an 'm' followed by an 'a' followed by an 'i' followed by an 'l'. The
       symbol '|' (bar) is used for alternation, so our regexp now becomes
       mail|letter|correspondence which means match 'mail' or 'letter' or
       'correspondence'. Whilst this regexp will find the words we want it
       will also find words we don't want such as 'email'. We will start by
       putting our regexp in parentheses, (mail|letter|correspondence).
       Parentheses have two effects, firstly they group expressions together
       and secondly they identify parts of the regexp that we wish to capture.
       Our regexp still matches any of the three words but now they are
       grouped together as a unit. This is useful for building up more complex
       regexps. It is also useful because it allows us to examine which of the
       words actually matched. We need to use another assertion, this time \b
       "word boundary": \b(mail|letter|correspondence)\b. This regexp means
       "match a word boundary followed by the expression in parentheses
       followed by another word boundary". The \b assertion matches at a
       position in the regexp, not a character in the regexp. A word boundary
       is any non-word character such as a space, a newline, or the beginning
       or end of the string.

       For our third example we want to replace ampersands with the HTML
       entity '&amp;'. The regexp to match is simple: &, i.e. match one
       ampersand. Unfortunately this will mess up our text if some of the
       ampersands have already been turned into HTML entities. So what we
       really want to say is replace an ampersand providing it is not followed
       by 'amp;'. For this we need the negative lookahead assertion and our
       regexp becomes: &(?!amp;). The negative lookahead assertion is
       introduced with '(?!' and finishes at the ')'. It means that the text
       it contains, 'amp;' in our example, must not follow the expression that
       precedes it.

       Regexps provide a rich language that can be used in a variety of ways.
       For example suppose we want to count all the occurrences of 'Eric' and
       'Eirik' in a string. Two valid regexps to match these are
       \b(Eric|Eirik)\b and \bEi?ri[ck]\b. We need the word boundary '\b' so
       we don't get 'Ericsson' etc. The second regexp actually matches more
       than we want: 'Eric', 'Erik', 'Eiric' and 'Eirik'.

       We will implement some of the examples above in the code examples
       section.

Characters and Abbreviations for Sets of Characters

       Element    Meaning
       ───────────────────────────────────────────────────────────────
       c          Any character represents itself unless it has a special
                  regexp meaning. Thus c matches the character c.
       \c         A character that follows a backslash matches the character
                  itself except where mentioned below. For example if you
                  wished to match a literal caret at the beginning of a
                  string you would write \^.
       \xhhhh     This matches the Unicode character corresponding to the
                  hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo
                  (i.e., \zero ooo) matches the ASCII/Latin-1 character
                  corresponding to the octal number ooo (between 0 and 0377).

       Note that the C++ compiler transforms backslashes in strings so to
       include a \ in a regexp you will need to enter it twice, i.e. \\.

Sets of Characters

       Square brackets are used to match any character in the set of
       characters contained within the square brackets. All the character set
       abbreviations described above can be used within square brackets. Apart
       from the character set abbreviations and the following two exceptions
       no characters have special meanings in square brackets.

       ^    The caret negates the character set if it occurs as the first
            character, i.e. immediately after the opening square bracket.
            For example, [abc] matches 'a' or 'b' or 'c', but [^abc] matches
            anything except 'a' or 'b' or 'c'.
       -    The dash is used to indicate a range of characters, for example
            [W-Z] matches 'W' or 'X' or 'Y' or 'Z'.

       Using the predefined character set abbreviations is more portable than
       using character ranges across platforms and languages. For example,
       [0-9] matches a digit in Western alphabets but \d matches a digit in
       any alphabet.

       Note that in most regexp literature sets of characters are called
       "character classes".

Quantifiers

       By default an expression is automatically quantified by {1,1}, i.e. it
       should occur exactly once. In the following list E stands for any
       expression. An expression is a character or an abbreviation for a set
       of characters or a set of characters in square brackets or any
       parenthesised expression.

       Quantifier  Meaning
       ─────────────────────────────────────────────────────────────
       E?          Matches zero or one occurrence of E. This quantifier
                   means "the previous expression is optional" since it will
                   match whether or not the expression occurs in the string.
                   It is the same as E{0,1}.
       E+          Matches one or more occurrences of E. This is the same
                   as E{1,MAXINT}.
       E*          Matches zero or more occurrences of E. This is the same
                   as E{0,MAXINT}.
       E{n}        Matches exactly n occurrences of E. This is the same as
                   repeating the expression n times. For example, x{5} is
                   the same as xxxxx. It is also the same as E{n,n}.
       E{n,}       Matches at least n occurrences of E. This is the same as
                   E{n,MAXINT}.
       E{,m}       Matches at most m occurrences of E. This is the same as
                   E{0,m}.
       E{n,m}      Matches at least n and at most m occurrences of E.

       (MAXINT is implementation dependent but will not be smaller than 1024.)

       If we wish to apply a quantifier to more than just the preceding
       character we can use parentheses to group characters together in an
       expression. For example, tag+ matches a 't' followed by an 'a' followed
       by at least one 'g', whereas (tag)+ matches at least one occurrence of
       'tag'.

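       The difference between tag+ and (tag)+ can be checked with a short
       sketch. QRegExp itself needs Qt, so the sketch assumes the C++ standard
       library's std::regex (ECMAScript syntax) as a stand-in; matchesWhole()
       is a hypothetical helper, not part of Qt or the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
// Uses std::regex (ECMAScript syntax) as a stand-in for QRegExp.
bool matchesWhole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

       With this helper, matchesWhole("tag+", "taggg") and
       matchesWhole("(tag)+", "tagtag") are both true, while swapping the two
       patterns makes both calls return false.
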
       Note that quantifiers are "greedy". They will match as much text as
       they can. For example, 0+ will match as many zeros as it can from the
       first zero it finds, e.g. the '000' in '2.0005'. Quantifiers can be
       made non-greedy, see setMinimal().

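       Greedy and non-greedy matching can be compared on the '2.0005' example.
       The sketch below is an illustration only: it assumes std::regex
       (ECMAScript syntax) in place of QRegExp, and firstMatch() is a
       hypothetical helper. Note one difference: std::regex makes a single
       quantifier non-greedy with a trailing ?, whereas QRegExp switches the
       whole pattern with setMinimal().

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern`, or an empty string if nothing matches.
std::string firstMatch(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);
    return std::string();
}
```

       Here firstMatch("0+", "2.0005") yields "000" (greedy), while
       firstMatch("0+?", "2.0005") yields "0" (non-greedy).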

Capturing Text

       Parentheses allow us to group elements together so that we can quantify
       and capture them. For example if we have the expression
       mail|letter|correspondence that matches a string we know that one of
       the words matched but not which one. Using parentheses allows us to
       "capture" whatever is matched within their bounds, so if we used
       (mail|letter|correspondence) and matched this regexp against the string
       "I sent you some email" we can use the cap() or capturedTexts()
       functions to extract the matched characters, in this case 'mail'.

       We can use captured text within the regexp itself. To refer to the
       captured text we use backreferences which are indexed from 1, the same
       as for cap(). For example we could search for duplicate words in a
       string using \b(\w+)\W+\1\b which means match a word boundary
       followed by one or more word characters followed by one or more
       non-word characters followed by the same text as the first
       parenthesised expression followed by a word boundary.

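       The duplicate-word pattern can be tried out as follows. This is a
       sketch under the assumption that std::regex (ECMAScript syntax, which
       also supports \b, \w, \W and \1) stands in for QRegExp;
       firstRepeatedWord() is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately repeated
// in `text`, or an empty string if there is none.
std::string firstRepeatedWord(const std::string& text)
{
    // Backslashes are doubled because this is a C++ string literal.
    static const std::regex rx("\\b(\\w+)\\W+\\1\\b");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m.str(1);    // text of the first capturing parentheses
    return std::string();
}
```

       For "Paris in the the spring" the helper returns "the". For "the theme"
       it returns an empty string, because the trailing \b rejects a repeat
       that is only the start of a longer word.
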
       If we want to use parentheses purely for grouping and not for capturing
       we can use the non-capturing syntax, e.g. (?:green|blue). Non-capturing
       parentheses begin '(?:' and end ')'. In this example we match either
       'green' or 'blue' but we do not capture the match so we only know
       whether or not we matched but not which color we actually found. Using
       non-capturing parentheses is more efficient than using capturing
       parentheses since the regexp engine has to do less book-keeping.

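       The effect of non-capturing parentheses on the number of captures can
       be seen in a small sketch. It assumes std::regex as a stand-in for
       QRegExp, with mark_count() playing the role of numCaptures();
       captureCount() is a hypothetical helper.

```cpp
#include <regex>

// Hypothetical helper: number of capturing groups in `pattern`
// (ECMAScript syntax), analogous to QRegExp::numCaptures().
unsigned captureCount(const char* pattern)
{
    return std::regex(pattern).mark_count();
}
```

       captureCount("(green|blue)") is 1 and captureCount("(?:green|blue)") is
       0, so only the first form records which color was found.
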
       Both capturing and non-capturing parentheses may be nested.

Assertions

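       Assertions make some statement about the text at the point where they
       occur in the regexp but they do not match any characters. In the
       following list E stands for any expression.

       ^        The caret signifies the beginning of the string.
       $        The dollar signifies the end of the string.
       \b       A word boundary.
       \B       A non-word boundary. This assertion is true wherever \b is
                false.
       (?=E)    Positive lookahead. This assertion is true if the expression
                matches at this point in the regexp.
       (?!E)    Negative lookahead. This assertion is true if the expression
                does not match at this point in the regexp.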

Wildcard Matching (globbing)

       Most command shells such as bash or cmd.exe support "file globbing",
       the ability to identify a group of files by using wildcards. The
       setWildcard() function is used to switch between regexp and wildcard
       mode. Wildcard matching is much simpler than full regexps and has only
       four features:

       c        Any character represents itself apart from those mentioned
                below. Thus c matches the character c.
       ?        Matches any single character. It is the same as . in full
                regexps.
       *        Matches zero or more of any characters. It is the same as .*
                in full regexps.
       [...]    Sets of characters can be represented in square brackets,
                similar to full regexps.

       For example if we are in wildcard mode and have strings which contain
       filenames we could identify HTML files with *.html. This will match
       zero or more characters followed by a dot followed by 'h', 't', 'm' and
       'l'.

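       Since each wildcard feature corresponds to a full-regexp construct, a
       wildcard pattern can be translated mechanically. The following is a
       minimal sketch of such a translation, not QRegExp's implementation: it
       assumes std::regex (ECMAScript syntax) as the target, and
       wildcardToRegex() and wildcardMatch() are hypothetical helpers. The
       list of characters that get escaped is likewise an assumption chosen
       for the ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translate a wildcard (glob) pattern into an
// ECMAScript regular expression string.
std::string wildcardToRegex(const std::string& glob)
{
    // Assumed set of characters that need escaping outside [...].
    const std::string specials = "\\^$.|+(){}]";
    std::string out;
    bool inSet = false;              // true while inside [...]
    for (char c : glob) {
        if (inSet) {
            out += c;                // copy set contents unchanged
            if (c == ']')
                inSet = false;
        } else if (c == '?') {
            out += '.';              // any single character
        } else if (c == '*') {
            out += ".*";             // zero or more characters
        } else if (c == '[') {
            out += c;                // start of a character set
            inSet = true;
        } else if (specials.find(c) != std::string::npos) {
            out += '\\';             // escape regexp special characters
            out += c;
        } else {
            out += c;                // ordinary character
        }
    }
    return out;
}

// Hypothetical helper: true if `name` as a whole matches the wildcard.
bool wildcardMatch(const std::string& glob, const std::string& name)
{
    return std::regex_match(name, std::regex(wildcardToRegex(glob)));
}
```

       With this sketch, wildcardToRegex("*.html") produces the regexp
       .*\.html, and wildcardMatch("*.html", "index.html") is true while
       wildcardMatch("*.html", "default.htm") is false.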

Notes for Perl Users

       Most of the character class abbreviations supported by Perl are
       supported by QRegExp, see characters and abbreviations for sets of
       characters.

       In QRegExp, apart from within character classes, ^ always signifies the
       start of the string, so carets must always be escaped unless used for
       that purpose. In Perl the meaning of caret varies automagically
       depending on where it occurs so escaping it is rarely necessary. The
       same applies to $ which in QRegExp always signifies the end of the
       string.

       QRegExp's quantifiers are the same as Perl's greedy quantifiers.
       Non-greedy matching cannot be applied to individual quantifiers, but
       can be applied to all the quantifiers in the pattern. For example, to
       match the Perl regexp ro+?m requires:

           QRegExp rx( "ro+m" );
           rx.setMinimal( TRUE );

       The equivalent of Perl's /i option is setCaseSensitive(FALSE).

       Perl's /g option can be emulated using a loop.

       In QRegExp . matches any character, therefore all QRegExp regexps have
       the equivalent of Perl's /s option. QRegExp does not have an equivalent
       to Perl's /m option, but this can be emulated in various ways, for
       example by splitting the input into lines or by looping with a regexp
       that searches for newlines.

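       The line-splitting approach to emulating /m can be sketched as below.
       This is only one possible arrangement, written against std::regex
       rather than QRegExp so that it is self-contained; matchingLines() is a
       hypothetical helper. Each line is searched on its own, so ^ and $ in
       the pattern apply to individual lines.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of `text` in which `pattern`
// matches, treating each line as a separate input string.
std::vector<std::string> matchingLines(const std::string& pattern,
                                       const std::string& text)
{
    const std::regex rx(pattern);
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {      // split the input on newlines
        if (std::regex_search(line, rx))
            hits.push_back(line);
    }
    return hits;
}
```

       For the input "7\nabc\n42\n123\n" and the pattern ^\d\d?$ the helper
       returns the two lines "7" and "42".
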
       Because QRegExp is string oriented there are no \A, \Z or \z
       assertions. The \G assertion is not supported but can be emulated in a
       loop.

       Perl's $& is cap(0) or capturedTexts()[0]. There are no QRegExp
       equivalents for $`, $' or $+. Perl's capturing variables, $1, $2, etc.,
       correspond to cap(1) or capturedTexts()[1], cap(2) or
       capturedTexts()[2], etc.

       To substitute a pattern use QString::replace().

       Perl's extended /x syntax is not supported, nor are directives, e.g.
       (?i), or regexp comments, e.g. (?#comment). On the other hand, C++'s
       rules for literal strings can be used to achieve the same:

           QRegExp mark( "\\b" // word boundary
                         "[Mm]ark" // the word we want to match
                       );

       Both zero-width positive and zero-width negative lookahead assertions
       (?=pattern) and (?!pattern) are supported with the same syntax as Perl.
       Perl's lookbehind assertions, "independent" subexpressions and
       conditional expressions are not supported.

       Non-capturing parentheses are also supported, with the same (?:pattern)
       syntax.

       See QStringList::split() and QStringList::join() for equivalents to
       Perl's split and join functions.

       Note: because C++ transforms \'s they must be written twice in code,
       e.g. \b must be written \\b.

Code Examples

           QRegExp rx( "^\\d\\d?$" );  // match integers 0 to 99
           rx.search( "123" );         // returns -1 (no match)
           rx.search( "-6" );          // returns -1 (no match)
           rx.search( "6" );           // returns 0 (matched at position 0)

       The third string matches '6'. This is a simple validation regexp for
       integers in the range 0 to 99.

           QRegExp rx( "^\\S+$" );     // match strings without whitespace
           rx.search( "Hello world" ); // returns -1 (no match)
           rx.search( "This_is-OK" );  // returns 0 (matched at position 0)

       The second string matches 'This_is-OK'. We've used the character set
       abbreviation '\S' (non-whitespace) and the anchors to match strings
       which contain no whitespace.

       In the following example we match strings containing 'mail' or 'letter'
       or 'correspondence' but only match whole words, i.e. not 'email':

           QRegExp rx( "\\b(mail|letter|correspondence)\\b" );
           rx.search( "I sent you an email" );     // returns -1 (no match)
           rx.search( "Please write the letter" ); // returns 17

       The second string matches the 'letter' in "Please write the letter".
       The word 'letter' is also captured (because of the parentheses). We can
       see what text we've captured like this:

           QString captured = rx.cap( 1 ); // captured == "letter"

       This will capture the text from the first set of capturing parentheses
       (counting capturing left parentheses from left to right). The
       parentheses are counted from 1 since cap( 0 ) is the whole matched
       regexp (equivalent to '&' in most regexp engines).

           QRegExp rx( "&(?!amp;)" );      // match ampersands but not &amp;
           QString line1 = "This & that";
           line1.replace( rx, "&amp;" );
           // line1 == "This &amp; that"
           QString line2 = "His &amp; hers & theirs";
           line2.replace( rx, "&amp;" );
           // line2 == "His &amp; hers &amp; theirs"

       Here we've passed the QRegExp to QString's replace() function to
       replace the matched text with new text.

           QString str = "One Eric another Eirik, and an Ericsson."
                           " How many Eiriks, Eric?";
           QRegExp rx( "\\b(Eric|Eirik)\\b" ); // match Eric or Eirik
           int pos = 0;    // where we are in the string
           int count = 0;  // how many Eric and Eirik's we've counted
           while ( pos >= 0 ) {
               pos = rx.search( str, pos );
               if ( pos >= 0 ) {
                   pos++;      // move along in str
                   count++;    // count our Eric or Eirik
               }
           }

       We've used the search() function to repeatedly match the regexp in the
       string. Note that instead of moving forward by one character at a time
       with pos++ we could have written pos += rx.matchedLength() to skip over
       the already matched string. The count will equal 3, matching the first
       'Eric', the 'Eirik' and the final 'Eric' in 'One Eric another Eirik,
       and an Ericsson. How many Eiriks, Eric?'; it doesn't match 'Ericsson'
       or 'Eiriks' because in those words the name is not followed by a word
       boundary.

       One common use of regexps is to split lines of delimited data into
       their component fields.

           str = "Trolltech AS\twww.trolltech.com\tNorway";
           QString company, web, country;
           rx.setPattern( "^([^\t]+)\t([^\t]+)\t([^\t]+)$" );
           if ( rx.search( str ) != -1 ) {
               company = rx.cap( 1 );
               web = rx.cap( 2 );
               country = rx.cap( 3 );
           }

       In this example our input lines have the format company name, web
       address and country. Unfortunately the regexp is rather long and not
       very versatile -- the code will break if we add any more fields. A
       simpler and better solution is to look for the separator, '\t' in this
       case, and take the surrounding text. The QStringList split() function
       can take a separator string or regexp as an argument and split a string
       accordingly.

           QStringList field = QStringList::split( "\t", str );

       Here field[0] is the company, field[1] the web address and so on.

       To imitate the matching of a shell we can use wildcard mode.

           QRegExp rx( "*.html" );         // invalid regexp: * doesn't quantify anything
           rx.setWildcard( TRUE );         // now it's a valid wildcard regexp
           rx.exactMatch( "index.html" );  // returns TRUE
           rx.exactMatch( "default.htm" ); // returns FALSE
           rx.exactMatch( "readme.txt" );  // returns FALSE

       Wildcard matching can be convenient because of its simplicity, but any
       wildcard regexp can be defined using full regexps, e.g. .*\.html$.
       Notice that we can't match both .html and .htm files with a wildcard
       unless we use *.htm* which will also match 'test.html.bak'. A full
       regexp gives us the precision we need, .*\.html?$.

       QRegExp can match case insensitively using setCaseSensitive(), and can
       use non-greedy matching, see setMinimal(). By default QRegExp uses full
       regexps but this can be changed with setWildcard(). Searching can be
       forward with search() or backward with searchRev(). Captured text can
       be accessed using capturedTexts() which returns a string list of all
       captured strings, or using cap() which returns the captured string for
       the given index. The pos() function takes a match index and returns the
       position in the string where the match was made (or -1 if there was no
       match).

       See also QRegExpValidator, QString, QStringList, Miscellaneous Classes,
       Implicitly and Explicitly Shared Classes, and Non-GUI Classes.

   Member Type Documentation

QRegExp::CaretMode

       The CaretMode enum defines the different meanings of the caret (^) in a
       regular expression. The possible values are:

       QRegExp::CaretAtZero - The caret corresponds to index 0 in the searched
       string.

       QRegExp::CaretAtOffset - The caret corresponds to the start offset of
       the search.

       QRegExp::CaretWontMatch - The caret never matches.

MEMBER FUNCTION DOCUMENTATION

QRegExp::QRegExp ()

       Constructs an empty regexp.

       See also isValid() and errorString().

QRegExp::QRegExp ( const QString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )

       Constructs a regular expression object for the given pattern string.
       The pattern must be given using wildcard notation if wildcard is TRUE
       (default is FALSE). The pattern is case sensitive, unless caseSensitive
       is FALSE. Matching is greedy (maximal), but can be changed by calling
       setMinimal().

       See also setPattern(), setCaseSensitive(), setWildcard(), and
       setMinimal().

QRegExp::QRegExp ( const QRegExp & rx )

       Constructs a regular expression as a copy of rx.

       See also operator=().

QRegExp::~QRegExp ()

       Destroys the regular expression and cleans up its internal data.

QString QRegExp::cap ( int nth = 0 )

       Returns the text captured by the nth subexpression. The entire match
       has index 0 and the parenthesized subexpressions have indices starting
       from 1 (excluding non-capturing parentheses).

           QRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
           int pos = rxlen.search( "Length: 189cm" );
           if ( pos > -1 ) {
               QString value = rxlen.cap( 1 ); // "189"
               QString unit = rxlen.cap( 2 );  // "cm"
               // ...
           }

       The order of elements matched by cap() is as follows. The first
       element, cap(0), is the entire matching string. Each subsequent element
       corresponds to the next capturing open left parenthesis. Thus cap(1) is
       the text of the first capturing parentheses, cap(2) is the text of the
       second, and so on.

       Some patterns may lead to a number of matches which cannot be
       determined in advance, for example:

           QRegExp rx( "(\\d+)" );
           QString str = "Offsets: 12 14 99 231 7";
           QStringList list;
           int pos = 0;
           while ( pos >= 0 ) {
               pos = rx.search( str, pos );
               if ( pos > -1 ) {
                   list += rx.cap( 1 );
                   pos  += rx.matchedLength();
               }
           }
           // list contains "12", "14", "99", "231", "7"

       See also capturedTexts(), pos(), exactMatch(), search(), and
       searchRev().

QStringList QRegExp::capturedTexts ()

       Returns a list of the captured text strings.

       The first string in the list is the entire matched string. Each
       subsequent list element contains a string that matched a (capturing)
       subexpression of the regexp.

       For example:

           QRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
           int pos = rx.search( "Length: 36 inches" );
           QStringList list = rx.capturedTexts();
           // list is now ( "36 inches", "36", " ", "inches", "es" )

       The above example also captures elements that may be present but which
       we have no interest in. This problem can be solved by using
       non-capturing parentheses:

           QRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
           int pos = rx.search( "Length: 36 inches" );
           QStringList list = rx.capturedTexts();
           // list is now ( "36 inches", "36", "inches" )

       Note that if you want to iterate over the list, you should iterate over
       a copy, e.g.

           QStringList list = rx.capturedTexts();
           QStringList::Iterator it = list.begin();
           while( it != list.end() ) {
               myProcessing( *it );
               ++it;
           }

       Some regexps can match an indeterminate number of times. For example if
       the input string is "Offsets: 12 14 99 231 7" and the regexp, rx, is
       (\d+)+, we would hope to get a list of all the numbers matched.
       However, after calling rx.search(str), capturedTexts() will return the
       list ( "12", "12" ), i.e. the entire match was "12" and the first
       subexpression matched was "12". The correct approach is to use cap() in
       a loop.

       The order of elements in the string list is as follows. The first
       element is the entire matching string. Each subsequent element
       corresponds to the next capturing open left parenthesis. Thus
       capturedTexts()[1] is the text of the first capturing parentheses,
       capturedTexts()[2] is the text of the second and so on (corresponding
       to $1, $2, etc., in some other regexp languages).

       See also cap(), pos(), exactMatch(), search(), and searchRev().

bool QRegExp::caseSensitive () const

       Returns TRUE if case sensitivity is enabled; otherwise returns FALSE.
       The default is TRUE.

       See also setCaseSensitive().

QString QRegExp::errorString ()

       Returns a text string that explains why a regexp pattern is invalid, if
       that is the case; otherwise returns "no error occurred".

       See also isValid().

       Example: regexptester/regexptester.cpp.

QString QRegExp::escape ( const QString & str ) [static]

       Returns the string str with every regexp special character escaped with
       a backslash. The special characters are $, (, ), *, +,

       Example:

           s1 = QRegExp::escape( "bingo" );   // s1 == "bingo"
           s2 = QRegExp::escape( "f(x)" );    // s2 == "f\\(x\\)"

       This function is useful to construct regexp patterns dynamically:

           QRegExp rx( "(" + QRegExp::escape(name) +
                       "|" + QRegExp::escape(alias) + ")" );

bool QRegExp::exactMatch ( const QString & str ) const

686       Returns TRUE if str is matched exactly by this regular expression;
687       otherwise returns FALSE. You can determine how much of the string was
688       matched by calling matchedLength().
689
       For a given regexp R, calling exactMatch() is equivalent to calling
       search() with the regexp ^R$, since exactMatch() effectively encloses
       the regexp in the start-of-string and end-of-string anchors, except
       that it sets matchedLength() differently.
694
695       For example, if the regular expression is blue, then exactMatch()
696       returns TRUE only for input blue. For inputs bluebell, blutak and
697       lightblue, exactMatch() returns FALSE and matchedLength() will return
698       4, 3 and 0 respectively.
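
       The difference between matching the whole string and matching
       anywhere can be illustrated with std::regex as a stand-in for
       QRegExp. This is only an analogy: the helper names matchesWhole() and
       matchesAnywhere() are hypothetical, std::regex uses ECMAScript syntax
       rather than QRegExp syntax, and nothing here reproduces
       matchedLength().

```cpp
#include <regex>
#include <string>

// Whole-string match: plays the role of exactMatch() in this sketch.
bool matchesWhole(const std::string &pattern, const std::string &input)
{
    return std::regex_match(input, std::regex(pattern));
}

// Match anywhere in the input: plays the role of search() != -1.
bool matchesAnywhere(const std::string &pattern, const std::string &input)
{
    return std::regex_search(input, std::regex(pattern));
}
```

       Here matchesWhole("blue", "bluebell") is false, while
       matchesAnywhere("blue", "lightblue") is true. Anchoring the pattern
       as ^blue$ makes matchesAnywhere() behave like the whole-string case.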
699
700       Although const, this function sets matchedLength(), capturedTexts() and
701       pos().
702
703       See also search(), searchRev(), and QRegExpValidator.
704

bool QRegExp::isEmpty () const

706       Returns TRUE if the pattern string is empty; otherwise returns FALSE.
707
708       If you call exactMatch() with an empty pattern on an empty string it
709       will return TRUE; otherwise it returns FALSE since it operates over the
710       whole string. If you call search() with an empty pattern on any string
711       it will return the start offset (0 by default) because the empty
712       pattern matches the 'emptiness' at the start of the string. In this
713       case the length of the match returned by matchedLength() will be 0.
714
       See also QString::isEmpty().
716

bool QRegExp::isValid () const

718       Returns TRUE if the regular expression is valid; otherwise returns
719       FALSE. An invalid regular expression never matches.
720
721       The pattern [a-z is an example of an invalid pattern, since it lacks a
722       closing square bracket.
723
       Note that the validity of a regexp may also depend on the setting of
       the wildcard flag. For example, *.html is a valid wildcard regexp but
       an invalid full regexp.
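
       For comparison, a validity check can be sketched with std::regex,
       which reports a malformed pattern by throwing std::regex_error instead
       of through a query function such as isValid(). The helper
       isValidPattern() is hypothetical, and std::regex accepts ECMAScript
       syntax, so its notion of validity need not coincide with QRegExp's.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if std::regex can compile the pattern.
bool isValidPattern(const std::string &pattern)
{
    try {
        std::regex rx(pattern);   // throws std::regex_error if malformed
        (void)rx;
        return true;
    } catch (const std::regex_error &) {
        return false;
    }
}
```

       In this sketch the pattern [a-z is rejected because the closing
       square bracket is missing, while [a-z] is accepted.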
727
728       See also errorString().
729
730       Example: regexptester/regexptester.cpp.
731

int QRegExp::match ( const QString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const

734       This function is obsolete. It is provided to keep old source working.
735       We strongly advise against using it in new code.
736
737       Attempts to match in str, starting from position index. Returns the
738       position of the match, or -1 if there was no match.
739
740       The length of the match is stored in *len, unless len is a null
741       pointer.
742
743       If indexIsStart is TRUE (the default), the position index in the string
744       will match the start of string anchor, ^, in the regexp, if present.
745       Otherwise, position 0 in str will match.
746
747       Use search() and matchedLength() instead of this function.
748
749       See also QString::mid() and QConstString.
750
751       Example: qmag/qmag.cpp.
752

int QRegExp::matchedLength () const

754       Returns the length of the last matched string, or -1 if there was no
755       match.
756
757       See also exactMatch(), search(), and searchRev().
758

bool QRegExp::minimal () const

762       Returns TRUE if minimal (non-greedy) matching is enabled; otherwise
763       returns FALSE.
764
765       See also setMinimal().
766

int QRegExp::numCaptures () const

768       Returns the number of captures contained in the regular expression.
769
770       Example: regexptester/regexptester.cpp.
771

bool QRegExp::operator!= ( const QRegExp & rx ) const

773       Returns TRUE if this regular expression is not equal to rx; otherwise
774       returns FALSE.
775
776       See also operator==().
777

QRegExp & QRegExp::operator= ( const QRegExp & rx )

779       Copies the regular expression rx and returns a reference to the copy.
780       The case sensitivity, wildcard and minimal matching options are also
781       copied.
782

bool QRegExp::operator== ( const QRegExp & rx ) const

784       Returns TRUE if this regular expression is equal to rx; otherwise
785       returns FALSE.
786
787       Two QRegExp objects are equal if they have the same pattern strings and
788       the same settings for case sensitivity, wildcard and minimal matching.
789

QString QRegExp::pattern () const

791       Returns the pattern string of the regular expression. The pattern has
792       either regular expression syntax or wildcard syntax, depending on
793       wildcard().
794
795       See also setPattern().
796

int QRegExp::pos ( int nth = 0 )

798       Returns the position of the nth captured text in the searched string.
799       If nth is 0 (the default), pos() returns the position of the whole
800       match.
801
802       Example:
803
           QRegExp rx( "/([a-z]+)/([a-z]+)" );
           rx.search( "Output /dev/null" );    // returns 7 (position of /dev/null)
           rx.pos( 0 );                        // returns 7 (position of /dev/null)
           rx.pos( 1 );                        // returns 8 (position of dev)
           rx.pos( 2 );                        // returns 12 (position of null)
809
810       For zero-length matches, pos() always returns -1. (For example, if
811       cap(4) would return an empty string, pos(4) returns -1.) This is due to
812       an implementation tradeoff.
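
       The offsets in the /dev/null example can be reproduced with std::regex
       as a stand-in for QRegExp. The helper capturePositions() is
       hypothetical; it returns the offset of the whole match followed by the
       offset of each capture group. The -1 convention for zero-length
       captures described above is specific to QRegExp and is not modelled
       here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: start offset of the whole match, then of each
// capture group; an empty vector means the pattern did not match.
std::vector<int> capturePositions(const std::string &pattern,
                                  const std::string &input)
{
    std::vector<int> positions;
    std::smatch m;
    const std::regex rx(pattern);
    if (std::regex_search(input, m, rx)) {
        for (std::size_t i = 0; i < m.size(); ++i)
            positions.push_back(static_cast<int>(m.position(i)));
    }
    return positions;
}
```

       Calling capturePositions("/([a-z]+)/([a-z]+)", "Output /dev/null")
       gives the offsets 7, 8 and 12, the same values as pos(0), pos(1) and
       pos(2) in the example.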
813
814       See also capturedTexts(), exactMatch(), search(), and searchRev().
815

int QRegExp::search ( const QString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const

818       Attempts to find a match in str from position offset (0 by default). If
819       offset is -1, the search starts at the last character; if -2, at the
820       next to last character; etc.
821
822       Returns the position of the first match, or -1 if there was no match.
823
       The caretMode parameter determines whether ^ should match at index 0
       or at offset.
826
827       You might prefer to use QString::find(), QString::contains() or even
828       QStringList::grep(). To replace matches use QString::replace().
829
830       Example:
831
           QString str = "offsets: 1.23 .50 71.00 6.00";
           QRegExp rx( "\\d*\\.\\d+" );    // primitive floating point matching
           int count = 0;
           int pos = 0;
           while ( (pos = rx.search(str, pos)) != -1 ) {
               count++;
               pos += rx.matchedLength();
           }
           // matches are found at 9, 14, 18 and finally 24; count ends up as 4
841
842       Although const, this function sets matchedLength(), capturedTexts() and
843       pos().
844
845       See also searchRev() and exactMatch().
846

int QRegExp::searchRev ( const QString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const

851       Attempts to find a match backwards in str from position offset. If
852       offset is -1 (the default), the search starts at the last character; if
853       -2, at the next to last character; etc.
854
855       Returns the position of the first match, or -1 if there was no match.
856
       The caretMode parameter determines whether ^ should match at index 0
       or at offset.
859
860       Although const, this function sets matchedLength(), capturedTexts() and
861       pos().
862
863       Warning: Searching backwards is much slower than searching forwards.
864
865       See also search() and exactMatch().
866

void QRegExp::setCaseSensitive ( bool sensitive )

868       Sets case sensitive matching to sensitive.
869
       If sensitive is TRUE, \.txt$ matches readme.txt but not README.TXT.
871
872       See also caseSensitive().
873
874       Example: regexptester/regexptester.cpp.
875

void QRegExp::setMinimal ( bool minimal )

877       Enables or disables minimal matching. If minimal is FALSE, matching is
878       greedy (maximal) which is the default.
879
       For example, suppose we have the input string "We must be <b>bold</b>,
       very <b>bold</b>!" and the pattern <b>.*</b>. With the default greedy
       (maximal) matching, the match is "<b>bold</b>, very <b>bold</b>",
       spanning both pairs of tags. But with minimal (non-greedy) matching the
       first match is the first "<b>bold</b>" and the second match is the
       second "<b>bold</b>". In practice we might use the pattern <b>[^<]+</b>
       instead, although this will still fail for nested tags.
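
       The greedy and minimal results can be compared using std::regex as a
       stand-in for QRegExp. Note the difference in mechanism: std::regex
       (ECMAScript syntax) marks an individual quantifier as non-greedy with a
       trailing ?, whereas setMinimal() switches the behaviour for the whole
       QRegExp. The helper firstMatch() is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text of the first match of pattern in input,
// or an empty string if there is no match.
std::string firstMatch(const std::string &pattern, const std::string &input)
{
    std::smatch m;
    const std::regex rx(pattern);
    return std::regex_search(input, m, rx) ? m.str(0) : std::string();
}
```

       For the input string above, firstMatch() with <b>.*</b> returns the
       span covering both pairs of tags, while <b>.*?</b> and <b>[^<]+</b>
       each return only the first <b>bold</b>.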
888
889       See also minimal().
890

void QRegExp::setPattern ( const QString & pattern )

894       Sets the pattern string to pattern. The case sensitivity, wildcard and
895       minimal matching options are not changed.
896
897       See also pattern().
898

void QRegExp::setWildcard ( bool wildcard )

900       Sets the wildcard mode for the regular expression. The default is
901       FALSE.
902
903       Setting wildcard to TRUE enables simple shell-like wildcard matching.
904       (See wildcard matching (globbing).)
905
906       For example, r*.txt matches the string readme.txt in wildcard mode, but
907       does not match readme.
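
       One way to picture wildcard mode is as a translation into an ordinary
       regexp. The sketch below is an assumption-laden illustration, not Qt's
       implementation: wildcardToRegexp() and wildcardMatch() are hypothetical
       helpers, only * and ? are treated as wildcards (character sets in
       square brackets are escaped rather than supported), the result uses
       std::regex ECMAScript syntax, and the whole input is required to match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: '*' becomes ".*", '?' becomes ".", and any other
// character that is special in a regexp is escaped with a backslash.
std::string wildcardToRegexp(const std::string &wildcard)
{
    static const std::string special = "$()+.[\\]^{|}";
    std::string out;
    for (char c : wildcard) {
        if (c == '*') {
            out += ".*";
        } else if (c == '?') {
            out += '.';
        } else {
            if (special.find(c) != std::string::npos)
                out += '\\';
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: whole-string match of input against the wildcard.
bool wildcardMatch(const std::string &wildcard, const std::string &input)
{
    return std::regex_match(input, std::regex(wildcardToRegexp(wildcard)));
}
```

       Under these assumptions r*.txt translates to r.*\.txt, so
       wildcardMatch("r*.txt", "readme.txt") is true and
       wildcardMatch("r*.txt", "readme") is false.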
908
909       See also wildcard().
910
911       Example: regexptester/regexptester.cpp.
912

bool QRegExp::wildcard () const

914       Returns TRUE if wildcard mode is enabled; otherwise returns FALSE. The
915       default is FALSE.
916
917       See also setWildcard().
918
919

COPYRIGHT

925       Copyright 1992-2007 Trolltech ASA, http://www.trolltech.com.  See the
926       license file included in the distribution for a complete license
927       statement.
928

AUTHOR

930       Generated automatically from the source code.
931

BUGS

933       If you find a bug in Qt, please report it as described in
934       http://doc.trolltech.com/bughowto.html.  Good bug reports help us to
935       help you. Thank you.
936
937       The definitive Qt documentation is provided in HTML format; it is
938       located at $QTDIR/doc/html and can be read using Qt Assistant or with a
939       web browser. This man page is provided as a convenience for those users
940       who prefer man pages, although this format is not officially supported
941       by Trolltech.
942
       If you find errors in this manual page, please report them to
       qt-bugs@trolltech.com.  Please include the name of the manual page
       (qregexp.3qt) and the Qt version (3.3.8).
946
947
948
949Trolltech AS                    2 February 2007                   QRegExp(3qt)