1QRegExp(3qt) QRegExp(3qt)
2
3
4
NAME
QRegExp - Pattern matching using regular expressions
7
SYNOPSIS
All the functions in this class are reentrant when Qt is built with
thread support.
11
12 #include <qregexp.h>
13
14 Public Members
15 enum CaretMode { CaretAtZero, CaretAtOffset, CaretWontMatch }
16 QRegExp ()
17 QRegExp ( const QString & pattern, bool caseSensitive = TRUE, bool
18 wildcard = FALSE )
19 QRegExp ( const QRegExp & rx )
20 ~QRegExp ()
21 QRegExp & operator= ( const QRegExp & rx )
22 bool operator== ( const QRegExp & rx ) const
23 bool operator!= ( const QRegExp & rx ) const
24 bool isEmpty () const
25 bool isValid () const
26 QString pattern () const
27 void setPattern ( const QString & pattern )
28 bool caseSensitive () const
29 void setCaseSensitive ( bool sensitive )
30 bool wildcard () const
31 void setWildcard ( bool wildcard )
32 bool minimal () const
33 void setMinimal ( bool minimal )
34 bool exactMatch ( const QString & str ) const
35 int match ( const QString & str, int index = 0, int * len = 0, bool
36 indexIsStart = TRUE ) const (obsolete)
37 int search ( const QString & str, int offset = 0, CaretMode caretMode =
38 CaretAtZero ) const
39 int searchRev ( const QString & str, int offset = -1, CaretMode
40 caretMode = CaretAtZero ) const
41 int matchedLength () const
42 int numCaptures () const
43 QStringList capturedTexts ()
44 QString cap ( int nth = 0 )
45 int pos ( int nth = 0 )
46 QString errorString ()
47
48 Static Public Members
49 QString escape ( const QString & str )
50
DESCRIPTION
The QRegExp class provides pattern matching using regular expressions.
53
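Regular expressions, or "regexps", provide a way to find patterns
within text. This is useful in many contexts, for example:

validation          checking whether a piece of text meets some
                    criterion, e.g. is an integer or contains no
                    whitespace
searching           finding text more flexibly than simple string
                    matching allows, e.g. the words 'mail', 'letter' or
                    'correspondence' but not 'email' or 'letterbox'
search and replace  replacing a pattern with a piece of text, e.g.
                    replacing '&' with '&amp;' unless it is already
                    followed by 'amp;'
string splitting    identifying where a string should be split into its
                    component fields, e.g. tab-delimited data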
60
61 We present a very brief introduction to regexps, a description of Qt's
62 regexp language, some code examples, and finally the function
63 documentation itself. QRegExp is modeled on Perl's regexp language, and
64 also fully supports Unicode. QRegExp can also be used in the weaker
65 'wildcard' (globbing) mode which works in a similar way to command
66 shells. A good text on regexps is Mastering Regular Expressions:
67 Powerful Techniques for Perl and Other Tools by Jeffrey E. Friedl, ISBN
68 1565922573.
69
70 Experienced regexp users may prefer to skip the introduction and go
71 directly to the relevant information.
72
73 In case of multi-threaded programming, note that QRegExp depends on
74 QThreadStorage internally. For that reason, QRegExp should only be used
75 with threads started with QThread, i.e. not with threads started with
76 platform-specific APIs.
77
78 Introduction
79
80 Characters and Abbreviations for Sets of Characters
81
82 Sets of Characters
83
84 Quantifiers
85
86 Capturing Text
87
88 Assertions
89
90 Wildcard Matching (globbing)
91
92 Notes for Perl Users
93
94 Code Examples
95
96
Introduction

Regexps are built up from expressions, quantifiers, and assertions. The
simplest form of expression is simply a character, e.g. x or 5. An
expression can also be a set of characters. For example, [ABCD] will
match an A or a B or a C or a D. As a shorthand we could write this as
[A-D]. If we want to match any of the capital letters in the English
alphabet we can write [A-Z]. A quantifier tells the regexp engine how
many occurrences of the expression we want, e.g. x{1,1} means match an
x which occurs at least once and at most once. We'll look at assertions
and more complex expressions later.
107
Note that in general regexps cannot be used to check for balanced
brackets or tags. For example, if you want to match an opening HTML <b>
and its closing </b> you can only use a regexp if you know that these
tags are not nested; the HTML fragment <b>bold <b>bolder</b></b> will
not match as expected. If you know the maximum level of nesting it is
possible to create a regexp that will match correctly, but for an
unknown level of nesting, regexps will fail.
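
Where the nesting depth is unknown, a simple counter can do what a
regexp cannot. The following is a minimal sketch in standard C++; it is
not part of QRegExp, and the function name tagsBalanced is hypothetical.
It only checks that <b> and </b> tags are balanced:

```cpp
#include <string>

// Hypothetical helper, not part of Qt: returns true if every "<b>" in
// text has a matching "</b>" and no "</b>" appears before its "<b>".
bool tagsBalanced( const std::string &text )
{
    const std::string open = "<b>";
    const std::string close = "</b>";
    int depth = 0;                       // current nesting level
    std::string::size_type i = 0;
    while ( i < text.size() ) {
        if ( text.compare( i, open.size(), open ) == 0 ) {
            ++depth;
            i += open.size();
        } else if ( text.compare( i, close.size(), close ) == 0 ) {
            if ( --depth < 0 )
                return false;            // closing tag without an opener
            i += close.size();
        } else {
            ++i;                         // ordinary character
        }
    }
    return depth == 0;                   // every opener was closed
}
```

With this sketch, tagsBalanced( "<b>bold <b>bolder</b></b>" ) is true
for any depth of nesting, which no single regexp can guarantee.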
115
116 We'll start by writing a regexp to match integers in the range 0 to 99.
117 We will require at least one digit so we will start with [0-9]{1,1}
118 which means match a digit exactly once. This regexp alone will match
119 integers in the range 0 to 9. To match one or two digits we can
120 increase the maximum number of occurrences so the regexp becomes
121 [0-9]{1,2} meaning match a digit at least once and at most twice.
122 However, this regexp as it stands will not match correctly. This regexp
123 will match one or two digits within a string. To ensure that we match
124 against the whole string we must use the anchor assertions. We need ^
125 (caret) which when it is the first character in the regexp means that
126 the regexp must match from the beginning of the string. And we also
127 need $ (dollar) which when it is the last character in the regexp means
128 that the regexp must match until the end of the string. So now our
129 regexp is ^[0-9]{1,2}$. Note that assertions, such as ^ and $, do not
130 match any characters.
131
132 If you've seen regexps elsewhere they may have looked different from
133 the ones above. This is because some sets of characters and some
134 quantifiers are so common that they have special symbols to represent
135 them. [0-9] can be replaced with the symbol \d. The quantifier to match
136 exactly one occurrence, {1,1}, can be replaced with the expression
137 itself. This means that x{1,1} is exactly the same as x alone. So our 0
138 to 99 matcher could be written ^\d{1,2}$. Another way of writing it
139 would be ^\d\d{0,1}$, i.e. from the start of the string match a digit
140 followed by zero or one digits. In practice most people would write it
141 ^\d\d?$. The ? is a shorthand for the quantifier {0,1}, i.e. a minimum
142 of no occurrences a maximum of one occurrence. This is used to make an
143 expression optional. The regexp ^\d\d?$ means "from the beginning of
144 the string match one digit followed by zero or one digits and then the
145 end of the string".
146
147 Our second example is matching the words 'mail', 'letter' or
148 'correspondence' but without matching 'email', 'mailman', 'mailer',
149 'letterbox' etc. We'll start by just matching 'mail'. In full the
150 regexp is, m{1,1}a{1,1}i{1,1}l{1,1}, but since each expression itself
151 is automatically quantified by {1,1} we can simply write this as mail;
152 an 'm' followed by an 'a' followed by an 'i' followed by an 'l'. The
153 symbol '|' (bar) is used for alternation, so our regexp now becomes
154 mail|letter|correspondence which means match 'mail' or 'letter' or
155 'correspondence'. Whilst this regexp will find the words we want it
156 will also find words we don't want such as 'email'. We will start by
157 putting our regexp in parentheses, (mail|letter|correspondence).
158 Parentheses have two effects, firstly they group expressions together
159 and secondly they identify parts of the regexp that we wish to capture.
160 Our regexp still matches any of the three words but now they are
161 grouped together as a unit. This is useful for building up more complex
162 regexps. It is also useful because it allows us to examine which of the
words actually matched. We need to use another assertion, this time \b
"word boundary": \b(mail|letter|correspondence)\b. This regexp means
"match a word boundary followed by the expression in parentheses
followed by another word boundary". The \b assertion matches at a
position in the string, not a character in the string. A word boundary
falls between a word character and a non-word character such as a space
or a newline, or at the beginning or end of the string.
170
For our third example we want to replace ampersands with the HTML
entity '&amp;'. The regexp to match is simple: &, i.e. match one
ampersand. Unfortunately this will mess up our text if some of the
ampersands have already been turned into HTML entities. So what we
really want to say is replace an ampersand provided it is not followed
by 'amp;'. For this we need the negative lookahead assertion and our
regexp becomes: &(?!amp;). The negative lookahead assertion is
introduced with '(?!' and finishes at the ')'. It means that the text
it contains, 'amp;' in our example, must not follow the expression that
precedes it.
181
182 Regexps provide a rich language that can be used in a variety of ways.
183 For example suppose we want to count all the occurrences of 'Eric' and
184 'Eirik' in a string. Two valid regexps to match these are
185 \b(Eric|Eirik)\b and \bEi?ri[ck]\b. We need the word
186 boundary '\b' so we don't get 'Ericsson' etc. The second regexp
187 actually matches more than we want, 'Eric', 'Erik', 'Eiric' and
188 'Eirik'.
189
We will implement some of the examples above in the code examples
section.
191
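Characters and Abbreviations for Sets of Characters

Element   Meaning
───────────────────────────────────────────────────────────────
c         Any character represents itself unless it has a special
          regexp meaning. Thus c matches the character c.
\c        A character that follows a backslash matches the character
          itself except where mentioned below. For example if you
          wished to match a literal caret at the beginning of a string
          you would write \^.
\a        Matches the ASCII bell character (BEL, 0x07).
\f        Matches the ASCII form feed character (FF, 0x0C).
\n        Matches the ASCII line feed character (LF, 0x0A).
\r        Matches the ASCII carriage return character (CR, 0x0D).
\t        Matches the ASCII horizontal tab character (HT, 0x09).
\v        Matches the ASCII vertical tab character (VT, 0x0B).
\xhhhh    Matches the Unicode character corresponding to the
          hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo
          (i.e., \zero ooo) matches the ASCII/Latin-1 character
          corresponding to the octal number ooo (between 0 and 0377).
. (dot)   Matches any character (including newline).
\d        Matches a digit.
\D        Matches a non-digit.
\s        Matches a whitespace character.
\S        Matches a non-whitespace character.
\w        Matches a word character (a letter, a number or '_').
\W        Matches a non-word character.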
223
224 Note that the C++ compiler transforms backslashes in strings so to
225 include a \ in a regexp you will need to enter it twice, i.e.
226 \\.
227
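Sets of Characters

Square brackets are used to match any character in the set of
characters contained within the square brackets. All the character set
abbreviations described above can be used within square brackets. Apart
from the character set abbreviations and the following two exceptions
no characters have special meanings in square brackets.

^         The caret negates the character set if it occurs as the first
          character, i.e. immediately after the opening square bracket.
          For example, [abc] matches 'a' or 'b' or 'c', but [^abc]
          matches anything except 'a' or 'b' or 'c'.
-         The dash indicates a range of characters, for example [W-Z]
          matches 'W' or 'X' or 'Y' or 'Z'.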
238
239 Using the predefined character set abbreviations is more portable than
240 using character ranges across platforms and languages. For example,
241 [0-9] matches a digit in Western alphabets but \d matches a digit in
242 any alphabet.
243
Note that in most regexp literature sets of characters are called
"character classes".
246
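Quantifiers

By default an expression is automatically quantified by {1,1}, i.e. it
should occur exactly once. In the following list E stands for any
expression. An expression is a character or an abbreviation for a set
of characters or a set of characters in square brackets or any
parenthesised expression.

Quantifier  Meaning
─────────────────────────────────────────────────────────────
E?          Matches zero or one occurrence of E. This quantifier
            means "the previous expression is optional" since it will
            match whether or not the expression occurs in the string.
            It is the same as E{0,1}.
E+          Matches one or more occurrences of E. This is the same
            as E{1,MAXINT}.
E*          Matches zero or more occurrences of E. This is the same
            as E{0,MAXINT}.
E{n}        Matches exactly n occurrences of the expression. This
            is the same as repeating the expression n times. For
            example, x{5} is the same as xxxxx.
E{n,}       Matches at least n occurrences of the expression. This
            is the same as E{n,MAXINT}.
E{,m}       Matches at most m occurrences of the expression. This
            is the same as E{0,m}.
E{n,m}      Matches at least n and at most m occurrences of the
            expression.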
275
276 (MAXINT is implementation dependent but will not be smaller than 1024.)
277
278 If we wish to apply a quantifier to more than just the preceding
279 character we can use parentheses to group characters together in an
280 expression. For example, tag+ matches a 't' followed by an 'a' followed
281 by at least one 'g', whereas (tag)+ matches at least one occurrence of
282 'tag'.
283
Note that quantifiers are "greedy". They will match as much text as
they can. For example, 0+ will match as many zeros as it can from the
first zero it finds, e.g. the '000' in '2.0005'. Quantifiers can be
made non-greedy, see setMinimal().
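
The greedy behaviour can be observed with a short sketch. It uses
std::regex from the C++ standard library rather than QRegExp so that it
compiles without Qt; this is a stand-in whose syntax is close to, but
not identical with, QRegExp's. The helper name firstMatch is
hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of pattern
// in input, or an empty string if there is no match.
std::string firstMatch( const std::string &input, const std::string &pattern )
{
    const std::regex rx( pattern );
    std::smatch m;
    if ( std::regex_search( input, m, rx ) )
        return m.str( 0 );       // the whole matched text
    return std::string();
}
```

Under these assumptions firstMatch( "2.0005", "0+" ) yields "000": the
quantifier starts at the first zero and takes every zero that follows.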
288
Capturing Text

Parentheses allow us to group elements together so that we can quantify
291 and capture them. For example if we have the expression
292 mail|letter|correspondence that matches a string we know that one of
293 the words matched but not which one. Using parentheses allows us to
294 "capture" whatever is matched within their bounds, so if we used
295 (mail|letter|correspondence) and matched this regexp against the string
296 "I sent you some email" we can use the cap() or capturedTexts()
297 functions to extract the matched characters, in this case 'mail'.
298
299 We can use captured text within the regexp itself. To refer to the
300 captured text we use backreferences which are indexed from 1, the same
301 as for cap(). For example we could search for duplicate words in a
302 string using \b(\w+)\W+\1\b which means match a word boundary
303 followed by one or more word characters followed by one or more non-
304 word characters followed by the same text as the first parenthesised
305 expression followed by a word boundary.
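
The duplicate-word regexp above can be tried out with the following
sketch. It uses std::regex from the C++ standard library as a stand-in
for QRegExp (an assumption made so that the example compiles without
Qt); the function name firstDuplicateWord is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first word that is immediately
// repeated in text, or an empty string if no word is doubled.
std::string firstDuplicateWord( const std::string &text )
{
    // \1 refers back to whatever the first parentheses captured.
    static const std::regex rx( "\\b(\\w+)\\W+\\1\\b" );
    std::smatch m;
    if ( std::regex_search( text, m, rx ) )
        return m.str( 1 );       // text of the first capture
    return std::string();
}
```

For the input "Paris in the the spring" this returns "the". The leading
\b matters: without it, "this is fine" would report "is", because the
end of 'this' would be taken as the first of two 'is' words.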
306
307 If we want to use parentheses purely for grouping and not for capturing
308 we can use the non-capturing syntax, e.g. (?:green|blue). Non-capturing
309 parentheses begin '(?:' and end ')'. In this example we match either
310 'green' or 'blue' but we do not capture the match so we only know
311 whether or not we matched but not which color we actually found. Using
312 non-capturing parentheses is more efficient than using capturing
313 parentheses since the regexp engine has to do less book-keeping.
314
315 Both capturing and non-capturing parentheses may be nested.
316
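Assertions

Assertions make some statement about the text at the point where they
occur in the regexp but they do not match any characters. In the
following list E stands for any expression.

Assertion  Meaning
─────────────────────────────────────────────────────────────
^          The caret signifies the beginning of the string. To match
           a literal ^ you must escape it by writing \^. For example,
           ^#include will only match strings which begin with the
           characters '#include'. (When the caret is the first
           character of a character set it has a special meaning, see
           Sets of Characters.)
$          The dollar signifies the end of the string. For example,
           \d\s*$ will match strings which end with a digit optionally
           followed by whitespace. To match a literal $ you must
           escape it by writing \$.
\b         A word boundary. For example, \bOK\b matches 'OK' only
           where it stands as a whole word. The assertion does not
           match any whitespace itself, so the match contains just
           'OK' even in "It's OK now".
\B         A non-word boundary. This assertion is true wherever \b is
           false. For example, \Bon\B does not match in "Left on" but
           matches the 'on' in "Tonne".
(?=E)      Positive lookahead. True if the expression matches at this
           point. For example, const(?=\s+char) matches 'const'
           whenever it is followed by 'char', as in
           'static const char *'.
(?!E)      Negative lookahead. True if the expression does not match
           at this point. For example, const(?!\s+char) matches
           'const' except when it is followed by 'char'.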
325
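Wildcard Matching (globbing)

Most command shells such as bash or cmd.exe support "file globbing",
the ability to identify a group of files by using wildcards. The
setWildcard() function is used to switch between regexp and wildcard
mode. Wildcard matching is much simpler than full regexps and has only
four features:

Element   Meaning
─────────────────────────────────────────────────────────────
c         Any character represents itself apart from those mentioned
          below. Thus c matches the character c.
?         Matches any single character. It is the same as . in full
          regexps.
*         Matches zero or more of any characters. It is the same as
          .* in full regexps.
[...]     Sets of characters can be represented in square brackets,
          similar to full regexps.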
343
344 For example if we are in wildcard mode and have strings which contain
345 filenames we could identify HTML files with *.html. This will match
346 zero or more characters followed by a dot followed by 'h', 't', 'm' and
347 'l'.
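
The correspondence between wildcards and full regexps can be sketched
as a small translation function. This is an illustration only, not how
QRegExp implements wildcard mode: it uses std::regex from the C++
standard library so that it compiles without Qt, the names
wildcardToRegexp and wildcardMatch are hypothetical, and square-bracket
sets are deliberately left out.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: translates a wildcard pattern that uses only
// literal characters, ? and * into an equivalent full regexp.
// Square-bracket sets are not handled in this sketch; brackets are
// treated as literal characters.
std::string wildcardToRegexp( const std::string &wildcard )
{
    const std::string special = "$()*+.?[\\]^{|}";
    std::string out;
    for ( char c : wildcard ) {
        if ( c == '?' ) {
            out += '.';              // any single character
        } else if ( c == '*' ) {
            out += ".*";             // zero or more of any characters
        } else {
            if ( special.find( c ) != std::string::npos )
                out += '\\';         // escape regexp metacharacters
            out += c;
        }
    }
    return out;
}

// Whole-string match, comparable to exactMatch() in wildcard mode.
bool wildcardMatch( const std::string &wildcard, const std::string &str )
{
    return std::regex_match( str, std::regex( wildcardToRegexp( wildcard ) ) );
}
```

Under these assumptions wildcardToRegexp( "*.html" ) produces the
regexp .*\.html, and wildcardMatch( "*.html", "index.html" ) is true
while wildcardMatch( "*.html", "readme.txt" ) is false.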
348
Notes for Perl Users

Most of the character class abbreviations supported by Perl are
351 supported by QRegExp, see characters and abbreviations for sets of
352 characters.
353
354 In QRegExp, apart from within character classes, ^ always signifies the
355 start of the string, so carets must always be escaped unless used for
356 that purpose. In Perl the meaning of caret varies automagically
357 depending on where it occurs so escaping it is rarely necessary. The
358 same applies to $ which in QRegExp always signifies the end of the
359 string.
360
361 QRegExp's quantifiers are the same as Perl's greedy quantifiers. Non-
362 greedy matching cannot be applied to individual quantifiers, but can be
363 applied to all the quantifiers in the pattern. For example, to match
364 the Perl regexp ro+?m requires:
365
    QRegExp rx( "ro+m" );
    rx.setMinimal( TRUE );
368
369 The equivalent of Perl's /i option is setCaseSensitive(FALSE).
370
371 Perl's /g option can be emulated using a loop.
372
373 In QRegExp . matches any character, therefore all QRegExp regexps have
374 the equivalent of Perl's /s option. QRegExp does not have an equivalent
375 to Perl's /m option, but this can be emulated in various ways for
376 example by splitting the input into lines or by looping with a regexp
377 that searches for newlines.
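
The line-splitting approach to emulating /m can be sketched as follows.
The example uses std::regex and std::getline from the C++ standard
library instead of QRegExp and QStringList::split(), purely so that it
compiles without Qt; the function name countMatchingLines is
hypothetical.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: counts the lines of text in which pattern
// matches. Because each line is searched on its own, ^ and $ in the
// pattern act as start-of-line and end-of-line anchors.
int countMatchingLines( const std::string &text, const std::string &pattern )
{
    const std::regex rx( pattern );
    std::istringstream lines( text );
    std::string line;
    int count = 0;
    while ( std::getline( lines, line ) ) {   // split on '\n'
        if ( std::regex_search( line, rx ) )
            ++count;
    }
    return count;
}
```

For the three-line input "foo\nbar foo\nfoo bar", the pattern ^foo
matches on two lines (the first and the third).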
378
379 Because QRegExp is string oriented there are no \A, \Z or \z
380 assertions. The \G assertion is not supported but can be emulated in a
381 loop.
382
Perl's $& is cap(0) or capturedTexts()[0]. There are no QRegExp
equivalents for $`, $' or $+. Perl's capturing variables, $1, $2, etc.,
correspond to cap(1) or capturedTexts()[1], cap(2) or
capturedTexts()[2], etc.
386
387 To substitute a pattern use QString::replace().
388
389 Perl's extended /x syntax is not supported, nor are directives, e.g.
390 (?i), or regexp comments, e.g. (?#comment). On the other hand, C++'s
391 rules for literal strings can be used to achieve the same:
392
    QRegExp mark( "\\b"      // word boundary
                  "[Mm]ark"  // the word we want to match
                );
396
397 Both zero-width positive and zero-width negative lookahead assertions
398 (?=pattern) and (?!pattern) are supported with the same syntax as Perl.
399 Perl's lookbehind assertions, "independent" subexpressions and
400 conditional expressions are not supported.
401
402 Non-capturing parentheses are also supported, with the same (?:pattern)
403 syntax.
404
405 See QStringList::split() and QStringList::join() for equivalents to
406 Perl's split and join functions.
407
408 Note: because C++ transforms \'s they must be written twice in
409 code, e.g. \b must be written \\b.
410
Code Examples

    QRegExp rx( "^\\d\\d?$" );  // match integers 0 to 99
    rx.search( "123" );         // returns -1 (no match)
    rx.search( "-6" );          // returns -1 (no match)
    rx.search( "6" );           // returns 0 (matched at position 0)

The third string matches '6'. This is a simple validation regexp
for integers in the range 0 to 99.
419
    QRegExp rx( "^\\S+$" );       // match strings without whitespace
    rx.search( "Hello world" );   // returns -1 (no match)
    rx.search( "This_is-OK" );    // returns 0 (matched at position 0)

The second string matches 'This_is-OK'. We've used the character
set abbreviation '\S' (non-whitespace) and the anchors to match strings
which contain no whitespace.
427
In the following example we match strings containing 'mail' or 'letter'
or 'correspondence' but only match whole words, i.e. not 'email':

    QRegExp rx( "\\b(mail|letter|correspondence)\\b" );
    rx.search( "I sent you an email" );     // returns -1 (no match)
    rx.search( "Please write the letter" ); // returns 17

The second string matches the word 'letter' in "Please write the
letter". The word 'letter' is also captured (because of the
parentheses). We can see what text we've captured like this:

    QString captured = rx.cap( 1 ); // captured == "letter"
440
441 This will capture the text from the first set of capturing parentheses
442 (counting capturing left parentheses from left to right). The
443 parentheses are counted from 1 since cap( 0 ) is the whole matched
444 regexp (equivalent to '&' in most regexp engines).
445
    QRegExp rx( "&(?!amp;)" );      // match ampersands but not &amp;
    QString line1 = "This & that";
    line1.replace( rx, "&amp;" );
    // line1 == "This &amp; that"
    QString line2 = "His &amp; hers & theirs";
    line2.replace( rx, "&amp;" );
    // line2 == "His &amp; hers &amp; theirs"
453
454 Here we've passed the QRegExp to QString's replace() function to
455 replace the matched text with new text.
456
    QString str = "One Eric another Eirik, and an Ericsson."
                  " How many Eiriks, Eric?";
    QRegExp rx( "\\b(Eric|Eirik)\\b" ); // match Eric or Eirik
    int pos = 0;    // where we are in the string
    int count = 0;  // how many Eric and Eirik's we've counted
    while ( pos >= 0 ) {
        pos = rx.search( str, pos );
        if ( pos >= 0 ) {
            pos++;      // move along in str
            count++;    // count our Eric or Eirik
        }
    }

We've used the search() function to repeatedly match the regexp in the
string. Note that instead of moving forward by one character at a time
with pos++ we could have written pos += rx.matchedLength() to skip over
the already matched string. The count will equal 3, matching the first
'Eric', the 'Eirik' and the final 'Eric' in 'One Eric another Eirik,
and an Ericsson. How many Eiriks, Eric?'; it doesn't match 'Ericsson'
or 'Eiriks' because the names there are not followed by a word
boundary.
477
478 One common use of regexps is to split lines of delimited data into
479 their component fields.
480
    str = "Trolltech AS\twww.trolltech.com\tNorway";
    QString company, web, country;
    rx.setPattern( "^([^\t]+)\t([^\t]+)\t([^\t]+)$" );
    if ( rx.search( str ) != -1 ) {
        company = rx.cap( 1 );
        web = rx.cap( 2 );
        country = rx.cap( 3 );
    }
489
490 In this example our input lines have the format company name, web
491 address and country. Unfortunately the regexp is rather long and not
492 very versatile -- the code will break if we add any more fields. A
493 simpler and better solution is to look for the separator, '\t' in this
494 case, and take the surrounding text. The QStringList split() function
495 can take a separator string or regexp as an argument and split a string
496 accordingly.
497
    QStringList field = QStringList::split( "\t", str );
499
500 Here field[0] is the company, field[1] the web address and so on.
501
502 To imitate the matching of a shell we can use wildcard mode.
503
    QRegExp rx( "*.html" );         // invalid regexp: * doesn't quantify anything
    rx.setWildcard( TRUE );         // now it's a valid wildcard regexp
    rx.exactMatch( "index.html" );  // returns TRUE
    rx.exactMatch( "default.htm" ); // returns FALSE
    rx.exactMatch( "readme.txt" );  // returns FALSE
509
510 Wildcard matching can be convenient because of its simplicity, but any
511 wildcard regexp can be defined using full regexps, e.g. .*\.html$.
512 Notice that we can't match both .html and .htm files with a wildcard
513 unless we use *.htm* which will also match 'test.html.bak'. A full
514 regexp gives us the precision we need, .*\.html?$.
515
516 QRegExp can match case insensitively using setCaseSensitive(), and can
517 use non-greedy matching, see setMinimal(). By default QRegExp uses full
518 regexps but this can be changed with setWildcard(). Searching can be
519 forward with search() or backward with searchRev(). Captured text can
520 be accessed using capturedTexts() which returns a string list of all
521 captured strings, or using cap() which returns the captured string for
522 the given index. The pos() function takes a match index and returns the
523 position in the string where the match was made (or -1 if there was no
524 match).
525
526 See also QRegExpValidator, QString, QStringList, Miscellaneous Classes,
527 Implicitly and Explicitly Shared Classes, and Non-GUI Classes.
528
529 Member Type Documentation
QRegExp::CaretMode

The CaretMode enum defines the different meanings of the caret (^) in a
532 regular expression. The possible values are:
533
534 QRegExp::CaretAtZero - The caret corresponds to index 0 in the searched
535 string.
536
537 QRegExp::CaretAtOffset - The caret corresponds to the start offset of
538 the search.
539
540 QRegExp::CaretWontMatch - The caret never matches.
541
MEMBER FUNCTION DOCUMENTATION

QRegExp::QRegExp ()
Constructs an empty regexp.
545
546 See also isValid() and errorString().
547
QRegExp::QRegExp ( const QString & pattern, bool caseSensitive = TRUE, bool
wildcard = FALSE )
550 Constructs a regular expression object for the given pattern string.
551 The pattern must be given using wildcard notation if wildcard is TRUE
552 (default is FALSE). The pattern is case sensitive, unless caseSensitive
553 is FALSE. Matching is greedy (maximal), but can be changed by calling
554 setMinimal().
555
556 See also setPattern(), setCaseSensitive(), setWildcard(), and
557 setMinimal().
558
QRegExp::QRegExp ( const QRegExp & rx )
Constructs a regular expression as a copy of rx.
561
562 See also operator=().
563
QRegExp::~QRegExp ()
Destroys the regular expression and cleans up its internal data.
566
QString QRegExp::cap ( int nth = 0 )
Returns the text captured by the nth subexpression. The entire match
has index 0 and the parenthesized subexpressions have indices starting
from 1 (excluding non-capturing parentheses).

    QRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
    int pos = rxlen.search( "Length: 189cm" );
    if ( pos > -1 ) {
        QString value = rxlen.cap( 1 ); // "189"
        QString unit = rxlen.cap( 2 );  // "cm"
        // ...
    }

The order of elements matched by cap() is as follows. The first
element, cap(0), is the entire matching string. Each subsequent element
corresponds to the next capturing left parenthesis. Thus cap(1) is
the text of the first capturing parentheses, cap(2) is the text of the
second, and so on.
585
586 Some patterns may lead to a number of matches which cannot be
587 determined in advance, for example:
588
    QRegExp rx( "(\\d+)" );
    QString str = "Offsets: 12 14 99 231 7";
    QStringList list;
    int pos = 0;
    while ( pos >= 0 ) {
        pos = rx.search( str, pos );
        if ( pos > -1 ) {
            list += rx.cap( 1 );
            pos += rx.matchedLength();
        }
    }
    // list contains "12", "14", "99", "231", "7"
601
602 See also capturedTexts(), pos(), exactMatch(), search(), and
603 searchRev().
604
QStringList QRegExp::capturedTexts ()
Returns a list of the captured text strings.
609
610 The first string in the list is the entire matched string. Each
611 subsequent list element contains a string that matched a (capturing)
612 subexpression of the regexp.
613
614 For example:
615
    QRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
    int pos = rx.search( "Length: 36 inches" );
    QStringList list = rx.capturedTexts();
    // list is now ( "36 inches", "36", " ", "inches", "es" )

The above example also captures elements that may be present but which
we have no interest in. This problem can be solved by using
non-capturing parentheses:

    QRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
    int pos = rx.search( "Length: 36 inches" );
    QStringList list = rx.capturedTexts();
    // list is now ( "36 inches", "36", "inches" )

Note that if you want to iterate over the list, you should iterate over
a copy, e.g.

    QStringList list = rx.capturedTexts();
    QStringList::Iterator it = list.begin();
    while( it != list.end() ) {
        myProcessing( *it );
        ++it;
    }
639
Some regexps can match an indeterminate number of times. For example if
the input string is "Offsets: 12 14 99 231 7" and the regexp, rx, is
(\d+)+, we would hope to get a list of all the numbers matched.
However, after calling rx.search(str), capturedTexts() will return the
list ( "12", "12" ), i.e. the entire match was "12" and the first
subexpression matched was "12". The correct approach is to use cap() in
a loop.

The order of elements in the string list is as follows. The first
element is the entire matching string. Each subsequent element
corresponds to the next capturing left parenthesis. Thus
capturedTexts()[1] is the text of the first capturing parentheses,
capturedTexts()[2] is the text of the second, and so on (corresponding
to $1, $2, etc., in some other regexp languages).
654
655 See also cap(), pos(), exactMatch(), search(), and searchRev().
656
bool QRegExp::caseSensitive () const
Returns TRUE if case sensitivity is enabled; otherwise returns FALSE.
659 The default is TRUE.
660
661 See also setCaseSensitive().
662
QString QRegExp::errorString ()
Returns a text string that explains why a regexp pattern is invalid, if
that is the case; otherwise returns "no error occurred".
666
667 See also isValid().
668
669 Example: regexptester/regexptester.cpp.
670
QString QRegExp::escape ( const QString & str ) [static]
Returns the string str with every regexp special character escaped with
a backslash. The special characters are $, (, ), *, +, ., ?, [, \, ],
^, {, | and }.

Example:

    QString s1 = QRegExp::escape( "bingo" );   // s1 == "bingo"
    QString s2 = QRegExp::escape( "f(x)" );    // s2 == "f\\(x\\)"
679
680 This function is useful to construct regexp patterns dynamically:
681
    QRegExp rx( "(" + QRegExp::escape(name) +
                "|" + QRegExp::escape(alias) + ")" );
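
To make the escaping rule concrete, here is a minimal sketch of a
comparable function in standard C++. It is not Qt's implementation: the
name escapeRegexp is hypothetical, and the set of special characters it
uses ($ ( ) * + . ? [ \ ] ^ { | }) is an assumption of this sketch.

```cpp
#include <string>

// Hypothetical stand-in for QRegExp::escape(): prefixes each regexp
// special character in str with a backslash.
// The list of special characters below is an assumption of this sketch.
std::string escapeRegexp( const std::string &str )
{
    const std::string special = "$()*+.?[\\]^{|}";
    std::string out;
    for ( char c : str ) {
        if ( special.find( c ) != std::string::npos )
            out += '\\';         // escape the metacharacter
        out += c;
    }
    return out;
}
```

As with the QRegExp example above, escapeRegexp( "bingo" ) returns the
string unchanged and escapeRegexp( "f(x)" ) returns f\(x\).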
684
bool QRegExp::exactMatch ( const QString & str ) const
Returns TRUE if str is matched exactly by this regular expression;
687 otherwise returns FALSE. You can determine how much of the string was
688 matched by calling matchedLength().
689
690 For a given regexp string, R, exactMatch("R") is the equivalent of
691 search("^R$") since exactMatch() effectively encloses the regexp in the
692 start of string and end of string anchors, except that it sets
693 matchedLength() differently.
694
695 For example, if the regular expression is blue, then exactMatch()
696 returns TRUE only for input blue. For inputs bluebell, blutak and
697 lightblue, exactMatch() returns FALSE and matchedLength() will return
698 4, 3 and 0 respectively.
699
700 Although const, this function sets matchedLength(), capturedTexts() and
701 pos().
702
703 See also search(), searchRev(), and QRegExpValidator.
704
bool QRegExp::isEmpty () const
Returns TRUE if the pattern string is empty; otherwise returns FALSE.
707
708 If you call exactMatch() with an empty pattern on an empty string it
709 will return TRUE; otherwise it returns FALSE since it operates over the
710 whole string. If you call search() with an empty pattern on any string
711 it will return the start offset (0 by default) because the empty
712 pattern matches the 'emptiness' at the start of the string. In this
713 case the length of the match returned by matchedLength() will be 0.
714
715 See QString::isEmpty().
716
bool QRegExp::isValid () const
Returns TRUE if the regular expression is valid; otherwise returns
719 FALSE. An invalid regular expression never matches.
720
721 The pattern [a-z is an example of an invalid pattern, since it lacks a
722 closing square bracket.
723
724 Note that the validity of a regexp may also depend on the setting of
725 the wildcard flag, for example *.html is a valid wildcard regexp but an
726 invalid full regexp.
727
728 See also errorString().
729
730 Example: regexptester/regexptester.cpp.
731
int QRegExp::match ( const QString & str, int index = 0, int * len = 0, bool
indexIsStart = TRUE ) const
734 This function is obsolete. It is provided to keep old source working.
735 We strongly advise against using it in new code.
736
737 Attempts to match in str, starting from position index. Returns the
738 position of the match, or -1 if there was no match.
739
740 The length of the match is stored in *len, unless len is a null
741 pointer.
742
743 If indexIsStart is TRUE (the default), the position index in the string
744 will match the start of string anchor, ^, in the regexp, if present.
745 Otherwise, position 0 in str will match.
746
747 Use search() and matchedLength() instead of this function.
748
749 See also QString::mid() and QConstString.
750
751 Example: qmag/qmag.cpp.
752
int QRegExp::matchedLength () const
Returns the length of the last matched string, or -1 if there was no
755 match.
756
757 See also exactMatch(), search(), and searchRev().
758
bool QRegExp::minimal () const
Returns TRUE if minimal (non-greedy) matching is enabled; otherwise
763 returns FALSE.
764
765 See also setMinimal().
766
int QRegExp::numCaptures () const
Returns the number of captures contained in the regular expression.
769
770 Example: regexptester/regexptester.cpp.
771
bool QRegExp::operator!= ( const QRegExp & rx ) const
Returns TRUE if this regular expression is not equal to rx; otherwise
774 returns FALSE.
775
776 See also operator==().
777
QRegExp & QRegExp::operator= ( const QRegExp & rx )
Copies the regular expression rx and returns a reference to the copy.
780 The case sensitivity, wildcard and minimal matching options are also
781 copied.
782
bool QRegExp::operator== ( const QRegExp & rx ) const
Returns TRUE if this regular expression is equal to rx; otherwise
785 returns FALSE.
786
787 Two QRegExp objects are equal if they have the same pattern strings and
788 the same settings for case sensitivity, wildcard and minimal matching.
789
QString QRegExp::pattern () const
Returns the pattern string of the regular expression. The pattern has
792 either regular expression syntax or wildcard syntax, depending on
793 wildcard().
794
795 See also setPattern().
796
int QRegExp::pos ( int nth = 0 )
Returns the position of the nth captured text in the searched string.
799 If nth is 0 (the default), pos() returns the position of the whole
800 match.
801
802 Example:
803
    QRegExp rx( "/([a-z]+)/([a-z]+)" );
    rx.search( "Output /dev/null" );    // returns 7 (position of /dev/null)
    rx.pos( 0 );                        // returns 7 (position of /dev/null)
    rx.pos( 1 );                        // returns 8 (position of dev)
    rx.pos( 2 );                        // returns 12 (position of null)
809
810 For zero-length matches, pos() always returns -1. (For example, if
811 cap(4) would return an empty string, pos(4) returns -1.) This is due to
812 an implementation tradeoff.
813
814 See also capturedTexts(), exactMatch(), search(), and searchRev().
815
int QRegExp::search ( const QString & str, int offset = 0, CaretMode caretMode
= CaretAtZero ) const
818 Attempts to find a match in str from position offset (0 by default). If
819 offset is -1, the search starts at the last character; if -2, at the
820 next to last character; etc.
821
822 Returns the position of the first match, or -1 if there was no match.
823
824 The caretMode parameter can be used to instruct whether ^ should match
825 at index 0 or at offset.
826
827 You might prefer to use QString::find(), QString::contains() or even
828 QStringList::grep(). To replace matches use QString::replace().
829
830 Example:
831
    QString str = "offsets: 1.23 .50 71.00 6.00";
    QRegExp rx( "\\d*\\.\\d+" );    // primitive floating point matching
    int count = 0;
    int pos = 0;
    while ( (pos = rx.search(str, pos)) != -1 ) {
        count++;
        pos += rx.matchedLength();
    }
    // pos will be 9, 14, 18 and finally 24; count will end up as 4
841
842 Although const, this function sets matchedLength(), capturedTexts() and
843 pos().
844
845 See also searchRev() and exactMatch().
846
int QRegExp::searchRev ( const QString & str, int offset = -1, CaretMode
caretMode = CaretAtZero ) const
851 Attempts to find a match backwards in str from position offset. If
852 offset is -1 (the default), the search starts at the last character; if
853 -2, at the next to last character; etc.
854
855 Returns the position of the first match, or -1 if there was no match.
856
857 The caretMode parameter can be used to instruct whether ^ should match
858 at index 0 or at offset.
859
860 Although const, this function sets matchedLength(), capturedTexts() and
861 pos().
862
863 Warning: Searching backwards is much slower than searching forwards.
864
865 See also search() and exactMatch().
866
void QRegExp::setCaseSensitive ( bool sensitive )
Sets case sensitive matching to sensitive.
869
870 If sensitive is TRUE, \.txt$ matches readme.txt but not README.TXT.
871
872 See also caseSensitive().
873
874 Example: regexptester/regexptester.cpp.
875
void QRegExp::setMinimal ( bool minimal )
Enables or disables minimal matching. If minimal is FALSE, matching is
greedy (maximal) which is the default.

For example, suppose we have the input string "We must be <b>bold</b>,
very <b>bold</b>!" and the pattern <b>.*</b>. With the default greedy
(maximal) matching, the single match is "<b>bold</b>, very
<b>bold</b>". But with minimal (non-greedy) matching the first match is
the first "<b>bold</b>" and the second match is the second
"<b>bold</b>". In practice we might use the pattern <b>[^<]+</b>
instead, although this will still fail for nested tags.
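
The difference between greedy and minimal matching can be reproduced
with a short sketch. It uses std::regex from the C++ standard library
as a stand-in so that it compiles without Qt. Note one deliberate
difference: std::regex marks an individual quantifier as non-greedy
with a trailing ?, whereas QRegExp switches the whole pattern with
setMinimal(). The helper name allMatches is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every non-overlapping match of pattern
// in input, in the order in which the matches occur.
std::vector<std::string> allMatches( const std::string &input,
                                     const std::string &pattern )
{
    const std::regex rx( pattern );
    std::vector<std::string> out;
    for ( std::sregex_iterator it( input.begin(), input.end(), rx ), end;
          it != end; ++it )
        out.push_back( it->str() );  // text of the whole match
    return out;
}
```

For the input string above, the pattern <b>.*</b> yields one match
spanning both bold elements, while <b>.*?</b> yields two matches, each
of them "<b>bold</b>".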
888
889 See also minimal().
890
void QRegExp::setPattern ( const QString & pattern )
Sets the pattern string to pattern. The case sensitivity, wildcard and
895 minimal matching options are not changed.
896
897 See also pattern().
898
void QRegExp::setWildcard ( bool wildcard )
Sets the wildcard mode for the regular expression. The default is
901 FALSE.
902
903 Setting wildcard to TRUE enables simple shell-like wildcard matching.
904 (See wildcard matching (globbing).)
905
906 For example, r*.txt matches the string readme.txt in wildcard mode, but
907 does not match readme.
908
909 See also wildcard().
910
911 Example: regexptester/regexptester.cpp.
912
bool QRegExp::wildcard () const
Returns TRUE if wildcard mode is enabled; otherwise returns FALSE. The
915 default is FALSE.
916
917 See also setWildcard().
918
919
SEE ALSO
http://doc.trolltech.com/qregexp.html
922 http://www.trolltech.com/faq/tech.html
923
COPYRIGHT
Copyright 1992-2007 Trolltech ASA, http://www.trolltech.com. See the
926 license file included in the distribution for a complete license
927 statement.
928
AUTHOR
Generated automatically from the source code.
931
BUGS
If you find a bug in Qt, please report it as described in
934 http://doc.trolltech.com/bughowto.html. Good bug reports help us to
935 help you. Thank you.
936
937 The definitive Qt documentation is provided in HTML format; it is
938 located at $QTDIR/doc/html and can be read using Qt Assistant or with a
939 web browser. This man page is provided as a convenience for those users
940 who prefer man pages, although this format is not officially supported
941 by Trolltech.
942
If you find errors in this manual page, please report them to
qt-bugs@trolltech.com. Please include the name of the manual page
(qregexp.3qt) and the Qt version (3.3.8).
946
947
948
949Trolltech AS 2 February 2007 QRegExp(3qt)