Regexp::Common::comment(3)  User Contributed Perl Documentation  Regexp::Common::comment(3)

NAME
Regexp::Common::comment -- provide regexes for comments.

SYNOPSIS
    use Regexp::Common qw /comment/;

    while (<>) {
        /$RE{comment}{C}/       and  print "Contains a C comment\n";
        /$RE{comment}{C++}/     and  print "Contains a C++ comment\n";
        /$RE{comment}{PHP}/     and  print "Contains a PHP comment\n";
        /$RE{comment}{Java}/    and  print "Contains a Java comment\n";
        /$RE{comment}{Perl}/    and  print "Contains a Perl comment\n";
        /$RE{comment}{awk}/     and  print "Contains an awk comment\n";
        /$RE{comment}{HTML}/    and  print "Contains an HTML comment\n";
    }

    use Regexp::Common qw /comment RE_comment_HTML/;

    while (<>) {
        $_ =~ RE_comment_HTML() and  print "Contains an HTML comment\n";
    }
26
DESCRIPTION
Please consult the manual of Regexp::Common for a general description
of how this interface works.

Do not use this module directly, but load it via Regexp::Common.

This module gives you regular expressions for comments in various
languages.
35
THE LANGUAGES
Below, the comments of each of the languages are described. The
patterns are available as $RE{comment}{LANG}, for each language LANG.
Some languages have variants; the entry for each such language
describes how to get the patterns for its variants. Unless mentioned
otherwise, "{-keep}" sets $1, $2, $3 and $4 to the entire comment, the
opening marker, the content of the comment, and the closing marker (for
many languages, the latter is a newline) respectively.
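
The default "{-keep}" captures described above can be illustrated with
a minimal sketch. It assumes Regexp::Common is installed; the input
strings and variable names are made up for the example and are not part
of the module.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical input: a C statement followed by a block comment.
my $c_line = "x = 1; /* set x */\n";

my @c_parts;
if ($c_line =~ /$RE{comment}{C}{-keep}/) {
    # $1 entire comment, $2 opening marker, $3 content, $4 closing marker.
    @c_parts = ($1, $2, $3, $4);
    print "C comment content: $3\n";
}

# Hypothetical input: a Perl statement with a trailing comment.
my $perl_line = "my \$x = 1; # set x\n";

my @perl_parts;
if ($perl_line =~ /$RE{comment}{Perl}{-keep}/) {
    # For a line comment, the closing marker ($4) is the newline.
    @perl_parts = ($1, $2, $3, $4);
    print "Perl comment content: $3\n";
}
```

Without "{-keep}" the patterns contain no capturing groups, so wrap
the pattern in your own parentheses if you only need the whole comment.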
44
45 ABC Comments in ABC start with a backslash ("\"), and last till the end
46 of the line. See <http://homepages.cwi.nl/%7Esteven/abc/>.
47
48 Ada Comments in Ada start with "--", and last till the end of the line.
49
50 Advisor
51 Advisor is a language used by the HP product glance. Comments for
52 this language start with either "#" or "//", and last till the end
53 of the line.
54
55 Advsys
56 Comments for the Advsys language start with ";" and last till the
57 end of the line. See also <http://www.wurb.com/if/devsys/12>.
58
59 Alan
60 Alan comments start with "--", and last till the end of the line.
61 See also
62 <http://w1.132.telia.com/~u13207378/alan/manual/alanTOC.html>.
63
64 Algol 60
65 Comments in the Algol 60 language start with the keyword "comment",
66 and end with a ";". See
67 <http://www.masswerk.at/algol60/report.htm>.
68
Algol 68
In Algol 68, comments are either delimited by "#", or by one of the
keywords "co" or "comment". The keywords should not be part of
another word. See
<http://westein.arb-phys.uni-dortmund.de/~wb/a68s.txt>. With
"{-keep}", only $1 will be set, returning the entire comment.
76
77 ALPACA
78 The ALPACA language has comments starting with "/*" and ending with
79 "*/".
80
81 awk The awk programming language uses comments that start with "#" and
82 end at the end of the line.
83
84 B The B language has comments starting with "/*" and ending with
85 "*/".
86
BASIC
There are various forms of BASIC around. Currently, we only support
the variant supported by mvEnterprise, whose pattern is available
as $RE{comment}{BASIC}{mvEnterprise}. Comments in this language
start with a "!", a "*" or the keyword "REM", and last till the end
of the line. See
<http://www.rainingdata.com/products/beta/docs/mve/50/ReferenceManual/Basic.pdf>.
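
A minimal sketch of selecting a variant pattern, assuming Regexp::Common
is installed. The sample line and variable names are hypothetical, and
the captures follow the default "{-keep}" layout described at the top of
this section.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# The variant is selected with an extra key after the language name.
my $re = qr/$RE{comment}{BASIC}{mvEnterprise}{-keep}/;

# Hypothetical mvEnterprise BASIC line consisting only of a comment.
my $line = "REM initialise counters\n";

my ($marker, $text);
if ($line =~ /^$re/) {
    # $2 is the opening marker, $3 the content of the comment.
    ($marker, $text) = ($2, $3);
    print "marker: $marker\n";
}
```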
94
Beatnik
The esoteric language Beatnik only uses words consisting of
letters. Words are scored according to the rules of Scrabble.
Words scoring less than 5 points, or 18 points or more, are
considered comments (although the compiler might mock you if you
score less than 5 points). Regardless of whether "{-keep}" is used,
$1 will be set to the entire comment. This pattern requires perl
5.8.0 or newer.
103
beta-Juliet
The beta-Juliet programming language has comments that start with
"//" and that continue till the end of the line. See also
<http://www.catseye.mb.ca/esoteric/b-juliet/index.html>.
109
Befunge-98
The esoteric language Befunge-98 uses comments that start and end
with a ";". See
<http://www.catseye.mb.ca/esoteric/befunge/98/spec98.html>.
114
115 BML BML, or Better Markup Language is an HTML templating language that
116 uses comments starting with "<?c_", and ending with "c_?>". See
117 <http://www.livejournal.com/doc/server/bml.index.html>.
118
119 Brainfuck
120 The minimal language Brainfuck uses only eight characters, "<",
121 ">", "[", "]", "+", "-", "." and ",". Any other characters are
122 considered comments. With "{-keep}", $1 is set to the entire
123 comment.
124
125 C The C language has comments starting with "/*" and ending with
126 "*/".
127
C-- The C-- language has comments starting with "/*" and ending with
"*/". See
<http://cs.uas.arizona.edu/classes/453/programs/C--Spec.html>.
132
C++ The C++ language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that
start with "/*" and end with "*/". If "{-keep}" is used, only $1
will be set, and set to the entire comment.
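
A minimal sketch of collecting both comment forms from a piece of C++
source, assuming Regexp::Common is installed. The source text and
variable names are hypothetical. The key is quoted here so the example
does not depend on how Perl parses C++ as a hash subscript.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Compile the pattern once; the key 'C++' is quoted to keep it a string.
my $re = qr/$RE{comment}{'C++'}/;

# Hypothetical C++ source with one line comment and one block comment.
my $source = "int n = 0; // counter\n/* block */\nint m;\n";

# Without {-keep} the pattern has no capturing groups of its own,
# so $1 below refers to the parentheses added here.
my @comments;
while ($source =~ /($re)/g) {
    push @comments, $1;
}

print scalar (@comments), " comments found\n";
```

This is only a scan for comment-shaped text: the pattern knows nothing
about string literals, so a "//" inside a quoted string would also be
reported.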
137
C# The C# language has two forms of comments: comments that start with
"//" and last till the end of the line, and comments that start
with "/*" and end with "*/". If "{-keep}" is used, only $1 will be
set, and set to the entire comment. See
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/vclrfcsharpspec_C.asp>.
145
Caml
Comments in Caml start with "(*", end with "*)", and can be nested.
See <http://www.cs.caltech.edu/courses/cs134/cs134b/book.pdf> and
<http://pauillac.inria.fr/caml/index-eng.html>.
151
Cg The Cg language has two forms of comments: comments that start with
"//" and last till the end of the line, and comments that start
with "/*" and end with "*/". If "{-keep}" is used, only $1 will be
set, and set to the entire comment. See
<http://developer.nvidia.com/attach/3722>.
157
CLU In "CLU", a comment starts with a percent sign ("%"), and ends with
the next newline. See
<ftp://ftp.lcs.mit.edu:/pub/pclu/CLU-syntax.ps> and
<http://www.pmg.lcs.mit.edu/CLU.html>.
162
COBOL
Traditionally, comments in COBOL are indicated by an asterisk in
the seventh column. This is what the pattern matches. Modern
compilers may be more lenient, though. See
<http://www.csis.ul.ie/cobol/Course/COBOLIntro.htm>, and
<http://www.csis.ul.ie/cobol/default.htm>. Due to a bug in the
regexp engine of perl 5.6.x, this regexp is only available in
version 5.8.0 and up.
171
CQL Comments in the chess query language (CQL) start with a semicolon
(";") and last till the end of the line. See
<http://www.rbnn.com/cql/>.
175
176 Crystal Report
177 The formula editor in Crystal Reports uses comments that start with
178 "//", and end with the end of the line.
179
180 Dylan
181 There are two types of comments in Dylan. They either start with
182 "//", or are nested comments, delimited with "/*" and "*/". Under
183 "{-keep}", only $1 will be set, returning the entire comment. This
184 pattern requires perl 5.6.0 or newer.
185
ECMAScript
The ECMAScript language has two forms of comments: comments that
start with "//" and last till the end of the line, and comments
that start with "/*" and end with "*/". If "{-keep}" is used, only
$1 will be set, and set to the entire comment. JavaScript is
Netscape's implementation of ECMAScript. See
<http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf>,
and
<http://www.ecma-international.org/publications/standards/Ecma-262.htm>.
198
199 Eiffel
200 Eiffel comments start with "--", and last till the end of the line.
201
202 False
203 In False, comments start with "{" and end with "}". See
204 <http://wouter.fov120.com/false/false.txt>
205
FPL The FPL language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that
start with "/*" and end with "*/". If "{-keep}" is used, only $1
will be set, and set to the entire comment.
210
Forth
Comments in Forth start with "\", and end with the end of the line.
See also <http://docs.sun.com/sb/doc/806-1377-10>.
215
Fortran
There are two forms of Fortran. Free form Fortran has comments that
start with "!", and end at the end of the line. The pattern for this
is given by $RE{comment}{Fortran}. Fixed form Fortran, which is
obsolete, has comments that start with "C", "c" or "*" in the first
column, or with "!" anywhere but in the sixth column. The pattern
for this is given by $RE{comment}{Fortran}{fixed}.

See also
<http://www.cray.com/craydoc/manuals/007-3692-005/html-007-3692-005/>.
227
Funge-98
The esoteric language Funge-98 uses comments that start and end
with a ";".
231
232 fvwm2
233 Configuration files for fvwm2 have comments starting with a "#" and
234 lasting the rest of the line.
235
Haifu
Haifu, an esoteric language using haikus, has comments starting and
ending with a ",". See
<http://www.dangermouse.net/esoteric/haifu.html>.
240
241 Haskell
242 There are two types of comments in Haskell. They either start with
243 at least two dashes, or are nested comments, delimited with "{-"
244 and "-}". Under "{-keep}", only $1 will be set, returning the
245 entire comment. This pattern requires perl 5.6.0 or newer.
246
HTML
In HTML, comments only appear inside a comment declaration. A
comment declaration starts with a "<!", and ends with a ">". Inside
this declaration, we have zero or more comments. Comments start
with "--" and end with "--", and are optionally followed by
whitespace. The pattern $RE{comment}{HTML} recognizes those comment
declarations (and hence possibly more than a single comment). Note
that this is not the same as something that starts with "<!--" and
ends with "-->", because the following will be matched completely:

    <!-- First Comment --
    --> Second Comment <!--
    -- Third Comment -->

Do not be fooled by what your favourite browser thinks is an HTML
comment.

If "{-keep}" is used, the following are returned:

$1  captures the entire comment declaration.

$2  captures the MDO (markup declaration open), "<!".

$3  captures the content between the MDO and the MDC.

$4  captures the (last) comment, without the surrounding dashes.

$5  captures the MDC (markup declaration close), ">".
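
The five captures listed above can be illustrated with a minimal
sketch, assuming Regexp::Common is installed. The HTML fragment and the
hash keys are hypothetical names chosen for the example.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

# Hypothetical fragment: one comment declaration holding two comments.
my $html = "<p>text</p><!-- first -- -- second -->\n";

my %part;
if ($html =~ /$RE{comment}{HTML}{-keep}/) {
    %part = (
        declaration => $1,   # the entire comment declaration
        mdo         => $2,   # markup declaration open
        content     => $3,   # everything between the MDO and the MDC
        last        => $4,   # the last comment, without its dashes
        mdc         => $5,   # markup declaration close
    );
    print "last comment: $part{last}\n";
}
```

Note that $4 only holds the last comment of the declaration; earlier
comments in the same declaration have to be picked out of $3.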
275
276 Hugo
277 There are two types of comments in Hugo. They either start with "!"
278 (which cannot be followed by a "\"), or are nested comments,
279 delimited with "!\" and "\!". Under "{-keep}", only $1 will be
280 set, returning the entire comment. This pattern requires perl
281 5.6.0 or newer.
282
283 Icon
284 Icon has comments that start with "#" and end at the next new line.
285 See
286 <http://www.toolsofcomputing.com/IconHandbook/IconHandbook.pdf>,
287 <http://www.cs.arizona.edu/icon/index.htm>, and
288 <http://burks.bton.ac.uk/burks/language/icon/index.htm>.
289
ILLGOL
The esoteric language ILLGOL uses comments starting with "NB" and
lasting till the end of the line. See
<http://www.catseye.mb.ca/esoteric/illgol/index.html>.
294
INTERCAL
Comments in INTERCAL are single line comments. They start with one
of the keywords "NOT" or "N'T", and can optionally be preceded by
the keywords "DO" and "PLEASE". If both keywords are used, "PLEASE"
precedes "DO". Keywords are separated by whitespace.
300
301 J The language J uses comments that start with "NB.", and that last
302 till the end of the line. See
303 <http://www.jsoftware.com/books/help/primer/contents.htm>, and
304 <http://www.jsoftware.com/>.
305
Java
The Java language has two forms of comments: comments that start
with "//" and last till the end of the line, and comments that
start with "/*" and end with "*/". If "{-keep}" is used, only $1
will be set, and set to the entire comment.
311
JavaScript
The JavaScript language has two forms of comments: comments that
start with "//" and last till the end of the line, and comments
that start with "/*" and end with "*/". If "{-keep}" is used, only
$1 will be set, and set to the entire comment. JavaScript is
Netscape's implementation of ECMAScript. See
<http://www.mozilla.org/js/language/E262-3.pdf>, and
<http://www.mozilla.org/js/language/>.
321
322 LaTeX
323 The documentation language LaTeX uses comments starting with "%"
324 and ending at the end of the line.
325
326 Lisp
327 Comments in Lisp start with a semi-colon (";") and last till the
328 end of the line.
329
330 LPC The LPC language has comments starting with "/*" and ending with
331 "*/".
332
333 LOGO
334 Comments for the language LOGO start with ";", and last till the
335 end of the line.
336
337 lua Comments for the lua language start with "--", and last till the
338 end of the line. See also <http://www.lua.org/manual/manual.html>.
339
M, MUMPS
In "M" (aka "MUMPS"), comments start with a semi-colon, and last
till the end of a line. The language specification requires the
semi-colon to be preceded by one or more linestart characters.
Those characters default to a space, but that's configurable. The
pattern does not test for this requirement. See
<ftp://ftp.intersys.com/pub/openm/ism/ism64docs.zip>,
<http://mtechnology.intersys.com/mproducts/openm/index.html>, and
<http://mcenter.com/mtrc/index.html>.
350
m4 By default, the preprocessor language m4 uses single line comments
that start with a "#" and continue to the end of the line,
including the newline. The pattern $RE{comment}{m4} matches
such comments. In m4, it is possible to change the starting token,
though. See
<http://wolfram.schneider.org/bsd/7thEdManVol2/m4/m4.pdf>,
<http://www.cs.stir.ac.uk/~kjt/research/pdf/expl-m4.pdf>, and
<http://www.gnu.org/software/m4/manual/>.
360
361 Modula-2
362 In "Modula-2", comments start with "(*", and end with "*)".
363 Comments may be nested. See <http://www.modula2.org/>.
364
365 Modula-3
366 In "Modula-3", comments start with "(*", and end with "*)".
367 Comments may be nested. See <http://www.m3.org/>.
368
369 mutt
370 Configuration files for mutt have comments starting with a "#" and
371 lasting the rest of the line.
372
Nickle
The Nickle language has one line comments starting with "#" (like
Perl), or multiline comments delimited by "/*" and "*/" (like C).
Under "{-keep}", only $1 will be set. See also
<http://www.nickle.org>.
378
379 Oberon
380 Comments in Oberon start with "(*" and end with "*)". See
381 <http://www.oberon.ethz.ch/oreport.html>.
382
Pascal
There are many implementations of Pascal. This module provides
patterns for comments of several implementations.

$RE{comment}{Pascal}
This is the pattern that recognizes comments according to the
Pascal ISO standard. This standard says that comments start
with either "{", or "(*", and end with "}" or "*)". This means
that "{*)" and "(*}" are considered to be comments. Many Pascal
applications don't allow this. See
<http://www.pascal-central.com/docs/iso10206.txt>
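
The mixed delimiters allowed by the ISO pattern can be checked with a
minimal sketch, assuming Regexp::Common is installed. The sample
comments and variable names are hypothetical; the captures follow the
default "{-keep}" layout.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

my $re = qr/$RE{comment}{Pascal}{-keep}/;

# Hypothetical comments: two with matching and two with mixed delimiters.
my @samples = ("{ braces }", "(* parens *)", "{ mixed *)", "(* mixed }");

my @pairs;
for my $text (@samples) {
    if ($text =~ /^$re$/) {
        # $2 is the opening marker, $4 the closing marker.
        push @pairs, "$2 ... $4";
    }
}

print "$_\n" for @pairs;
```

If the mixed forms must be rejected, one of the implementation specific
patterns described in this entry may be a better fit.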
395
396 $RE{comment}{Alice}
397 The Alice Pascal compiler accepts comments that start with "{"
398 and end with "}". Comments are not allowed to contain newlines.
399 See <http://www.templetons.com/brad/alice/language/>.
400
$RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and
$RE{comment}{Pascal}{GPC}
The Delphi Pascal, Free Pascal and GNU Pascal Compiler
implementations of Pascal all have comments that either start
with "//" and last till the end of the line, are delimited with
"{" and "}", or are delimited with "(*" and "*)". Patterns for
those comments are given by $RE{comment}{Pascal}{Delphi},
$RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC}
respectively. These patterns only set $1 when "{-keep}" is
used, which will then include the entire comment.

See <http://info.borland.com/techpubs/delphi5/oplg/>,
<http://www.freepascal.org/docs-html/ref/ref.html> and
<http://www.gnu-pascal.de/gpc/>.
416
$RE{comment}{Pascal}{Workshop}
The Workshop Pascal compiler, from Sun Microsystems, allows
comments that are delimited with "{" and "}", delimited with
"(*" and "*)", delimited with "/*" and "*/", or that start
and end with a double quote ("""). When "{-keep}" is used,
only $1 is set, and returns the entire comment.

See <http://docs.sun.com/db/doc/802-5762>.
426
427 PEARL
428 Comments in PEARL start with a "!" and last till the end of the
429 line, or start with "/*" and end with "*/". With "{-keep}", $1 will
430 be set to the entire comment.
431
432 PHP Comments in PHP start with either "#" or "//" and last till the end
433 of the line, or are delimited by "/*" and "*/". With "{-keep}", $1
434 will be set to the entire comment.
435
PL/B
In PL/B, comments start with either "." or ";", and end with the
next newline. See <http://www.mmcctech.com/pl-b/plb-0010.htm>.
440
441 PL/I
442 The PL/I language has comments starting with "/*" and ending with
443 "*/".
444
445 PL/SQL
446 In PL/SQL, comments either start with "--" and run till the end of
447 the line, or start with "/*" and end with "*/".
448
449 Perl
450 Perl uses comments that start with a "#", and continue till the end
451 of the line.
452
453 Portia
454 The Portia programming language has comments that start with "//",
455 and last till the end of the line.
456
457 Python
458 Python uses comments that start with a "#", and continue till the
459 end of the line.
460
Q-BAL
Comments in the Q-BAL language start with "`" (a backtick), and
continue till the end of the line.
464
465 QML In "QML", comments start with "#" and last till the end of the
466 line. See <http://www.questionmark.com/uk/qml/overview.doc>.
467
R The statistical language R uses comments that start with a "#" and
end with the following newline. See <http://www.r-project.org/>.
471
472 REBOL
473 Comments for the REBOL language start with ";" and last till the
474 end of the line.
475
Ruby
Comments in Ruby start with "#" and last till the end of the line.
478
479 Scheme
480 Scheme comments start with ";", and last till the end of the line.
481 See <http://schemers.org/>.
482
483 shell
484 Comments in various shells start with a "#" and end at the end of
485 the line.
486
Shelta
The esoteric language Shelta uses comments that start and end with
a ";". See <http://www.catseye.mb.ca/esoteric/shelta/index.html>.
490
SLIDE
The SLIDE language has two forms of comments. First there is the
line comment, which starts with a "#" and includes the rest of the
line (just like Perl). Second, there is the multiline, nested
comment, which is delimited by "(*" and "*)". Under "{-keep}",
only $1 is set, and is set to the entire comment. This pattern
needs at least Perl version 5.6.0. See
<http://www.cs.berkeley.edu/~ug/slide/docs/slide/spec/spec_frame_intro.shtml>.
499
500 slrn
501 Configuration files for slrn have comments starting with a "%" and
502 lasting the rest of the line.
503
504 Smalltalk
505 Smalltalk uses comments that start and end with a double quote,
506 """.
507
508 SMITH
509 Comments in the SMITH language start with ";", and last till the
510 end of the line.
511
512 Squeak
513 In the Smalltalk variant Squeak, comments start and end with """.
514 Double quotes can appear inside comments by doubling them.
515
516 SQL Standard SQL uses comments starting with two or more dashes, and
517 ending at the end of the line.
518
519 MySQL does not follow the standard. Instead, it allows comments
520 that start with a "#" or "-- " (that's two dashes and a space)
521 ending with the following newline, and comments starting with "/*",
522 and ending with the next ";" or "*/" that isn't inside single or
523 double quotes. A pattern for this is returned by
524 $RE{comment}{SQL}{MySQL}. With "{-keep}", only $1 will be set, and
525 it returns the entire comment.
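
The difference between the standard and the MySQL line comments can be
sketched as follows, assuming Regexp::Common is installed. The SQL text
and variable names are hypothetical.

```perl
use strict;
use warnings;
use Regexp::Common qw /comment/;

my $standard = qr/$RE{comment}{SQL}/;
my $mysql    = qr/$RE{comment}{SQL}{MySQL}/;

# Hypothetical script using both MySQL line comment forms.
my $sql = "SELECT 1; -- trailing note\nSELECT 2; # hash note\n";

my @found;
while ($sql =~ /($mysql)/g) {
    push @found, $1;
}

# Two dashes without a following space: a comment in standard SQL,
# but not a line comment for the MySQL pattern.
my $no_space = "--no space after the dashes\n";
my $standard_matches = $no_space =~ $standard ? 1 : 0;
my $mysql_matches    = $no_space =~ $mysql    ? 1 : 0;

print "standard: $standard_matches, MySQL: $mysql_matches\n";
```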
526
527 Tcl In Tcl, comments start with "#" and continue till the end of the
528 line.
529
530 TeX The documentation language TeX uses comments starting with "%" and
531 ending at the end of the line.
532
533 troff
534 The document formatting language troff uses comments starting with
535 "\"", and continuing till the end of the line.
536
537 Ubercode
538 The Windows programming language Ubercode uses comments that start
539 with "//" and continue to the end of the line. See
540 <http://www.ubercode.com>.
541
542 vi In configuration files for the editor vi, one can use comments
543 starting with """, and ending at the end of the line.
544
545 *W In the language *W, comments start with "||", and end with "!!".
546
547 zonefile
548 Comments in DNS zonefiles start with ";", and continue till the end
549 of the line.
550
551 ZZT-OOP
552 The in-game language ZZT-OOP uses comments that start with a "'"
553 character, and end at the following newline. See
554 <http://dave2.rocketjump.org/rad/zzthelp/lang.html>.
555
REFERENCES
[Go 90]
Charles F. Goldfarb: The SGML Handbook. Oxford: Oxford University
Press. 1990. ISBN 0-19-853737-9. Ch. 10.3, pp 390-391.

SEE ALSO
Regexp::Common for a general description of how to use this interface.

AUTHOR
Damian Conway (damian@conway.org)

MAINTENANCE
This package is maintained by Abigail (regexp-common@abigail.be).

BUGS AND IRRITATIONS
Bound to be plenty.

For a start, there are many common regexes missing. Send them in to
regexp-common@abigail.be.
575
LICENSE and COPYRIGHT
This software is Copyright (c) 2001 - 2009, Damian Conway and Abigail.

This module is free software, and may be used under any of the following
licenses:

 1) The Perl Artistic License.     See the file COPYRIGHT.AL.
 2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2.
 3) The BSD Licence.               See the file COPYRIGHT.BSD.
 4) The MIT Licence.               See the file COPYRIGHT.MIT.
586
587
588
perl v5.12.0                      2010-01-02        Regexp::Common::comment(3)