docstrip(n)                Literate programming tool               docstrip(n)

______________________________________________________________________________

docstrip - Docstrip style source code extraction

package require Tcl 8.4

package require docstrip ?1.2?

docstrip::extract text terminals ?option value ...?

docstrip::sourcefrom filename terminals ?option value ...?

______________________________________________________________________________

Docstrip is a tool created to support a brand of Literate Programming. It
is most common in the (La)TeX community, where it is used for pretty much
everything from the LaTeX core and up, but nothing about docstrip prevents
using it for other types of software.

In short, the basic principle of literate programming is that program
source should primarily be written and structured to suit the developers
(and advanced users who want to peek "under the hood"), not to suit the
whims of a compiler or corresponding source code consumer. This means
literate sources often need some kind of "translation" to an illiterate
form that dumb software can understand. The docstrip Tcl package handles
this translation.

Even for those who do not wholeheartedly subscribe to the philosophy behind
literate programming, docstrip can bring greater clarity to two kinds of
work in particular:

•      programs employing non-obvious mathematics

•      projects where separate pieces of code, perhaps in different
       languages, need to be closely coordinated.

In the first case, it gives source code comments access to much more
powerful typographical features than plain text allows. In the second, all
the separate pieces of code can be kept next to each other in the same
source file.

The programmer directly edits only one or several "master" source code
files, from which docstrip generates the more traditional "source" files
that compilers or the like would expect. The master sources typically
contain a large amount of documentation of the code, sometimes even in
places where the code consumers would not allow any comments. The
etymology of "docstrip" is that this documentation was stripped away
(although "code extraction" might be a better description, as it has always
been a matter of copying selected pieces of the master source rather than
deleting text from it). The docstrip Tcl package contains a
reimplementation of the basic extraction functionality from the docstrip
program, and thus makes it possible for a Tcl interpreter to read and
interpret the master source files directly.

Readers who are not already familiar with docstrip but want to know more
about it may consult the following sources.

[1]    The tclldoc package and class,
       http://ctan.org/tex-archive/macros/latex/contrib/tclldoc/.

[2]    The DocStrip utility,
       http://ctan.org/tex-archive/macros/latex/base/docstrip.dtx.

[3]    The doc and shortvrb Packages,
       http://ctan.org/tex-archive/macros/latex/base/doc.dtx.

[4]    Chapter 14 of The LaTeX Companion (second edition),
       Addison-Wesley, 2004; ISBN 0-201-36299-6.

The basic units docstrip operates on are the lines of a master source file.
Extraction consists of selecting some of these lines to be copied from
input text to output text. The basic distinction is between code lines
(which do not begin with a percent character and are copied) and comment
lines (which begin with a percent character and are not copied).

    docstrip::extract [join {
      {% comment}
      {% more comment !"#$%&/(}
      {some command}
      { % blah $blah "Not a comment."}
      {% abc; this is comment}
      {# def; this is code}
      {ghi}
      {% jkl}
    } \n] {}

returns the same sequence of lines as

    join {
      {some command}
      { % blah $blah "Not a comment."}
      {# def; this is code}
      {ghi} ""
    } \n

It does not matter to docstrip what format is used for the documentation in
the comment lines, but in order to do better than plain text comments, one
typically uses some markup language. LaTeX is the most common choice, as it
is a very established standard and also provides the best support for
mathematical formulae, but the docstrip::util package also gives some
support for doctools-like markup.

Besides the basic code and comment lines, there are also guard lines, which
begin with the two characters '%<', and meta-comment lines, which begin
with the two characters '%%'. Among guard lines there is a further
distinction between verbatim guard lines, which begin with '%<<', and
ordinary guard lines, where the '%<' is not followed by another '<'.
Ordinary guard lines are by far the most common.

An ordinary guard line makes extraction of the code line(s) it guards
conditional on the value of a boolean expression; the guarded block of code
lines will only be included if the expression evaluates to true. The syntax
of an ordinary guard line is one of

    '%' '<' STARSLASH EXPRESSION '>'
    '%' '<' PLUSMINUS EXPRESSION '>' CODE

where

    STARSLASH  ::=  '*' | '/'
    PLUSMINUS  ::=  | '+' | '-'
    EXPRESSION ::= SECONDARY | SECONDARY ',' EXPRESSION
                 | SECONDARY '|' EXPRESSION
    SECONDARY  ::= PRIMARY | PRIMARY '&' SECONDARY
    PRIMARY    ::= TERMINAL | '!' PRIMARY | '(' EXPRESSION ')'
    CODE       ::= { any character except end-of-line }

Comma and vertical bar both denote 'or'. Ampersand denotes 'and'.
Exclamation mark denotes 'not'. A TERMINAL can be any nonempty string of
characters not containing '>', '&', '|', comma, '(', or ')'. The docstrip
manual is more restrictive and only guarantees proper operation for strings
of letters, although even the LaTeX core sources make heavy use of digits
in TERMINALs as well. The second argument of docstrip::extract is the list
of those TERMINALs that should count as having the value 'true'; all other
TERMINALs count as being 'false' when guard expressions are evaluated.

In the case of a '%<*EXPRESSION>' guard, the lines guarded are all lines up
to the next '%</EXPRESSION>' guard with the same EXPRESSION (compared as
strings). The blocks of code delimited by such '*' and '/' guard lines must
be properly nested.

    set text [join {
       {begin}
       {%<*foo>}
       {1}
       {%<*bar>}
       {2}
       {%</bar>}
       {%<*!bar>}
       {3}
       {%</!bar>}
       {4}
       {%</foo>}
       {5}
       {%<*bar>}
       {6}
       {%</bar>}
       {end}
    } \n]
    set res [docstrip::extract $text foo]
    append res [docstrip::extract $text {foo bar}]
    append res [docstrip::extract $text bar]

sets $res to the result of

    join {
       {begin}
       {1}
       {3}
       {4}
       {5}
       {end}
       {begin}
       {1}
       {2}
       {4}
       {5}
       {6}
       {end}
       {begin}
       {5}
       {6}
       {end} ""
    } \n

In guard lines without a '*', '/', '+', or '-' modifier after the '%<', the
guard applies only to the CODE following the '>' on that single line. A '+'
modifier is equivalent to no modifier. A '-' modifier is like the case with
no modifier, but the expression is implicitly negated, i.e., the CODE of a
'%<-' guard line is only included if the expression evaluates to false.
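As an illustration of how guard expressions and the single-line modifiers
combine, the following sketch extracts a handful of guarded lines with only
foo counted as true. The terminal names foo, bar and baz and the line texts
are made up for this example, and it assumes that the docstrip package from
tcllib is installed.

```tcl
package require docstrip

# Hypothetical master text: one single-line guard per line.
set text [join {
    {%<foo&bar>needs both}
    {%<foo|bar>needs either}
    {%<foo,baz>comma is also or}
    {%<!bar>bar must be false}
    {%<-bar>same test with the minus modifier}
    {%<bar|foo&baz>false when only foo is true}
} \n]

# Only the terminal foo counts as true here; bar and baz are false.
set res [docstrip::extract $text foo]
puts -nonewline $res
```

Under these assumptions the lines guarded by foo|bar, foo,baz, !bar and -bar
are extracted, while the two guards that need bar or baz to be true are
dropped.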

Metacomment lines are "comment lines which should not be stripped away" but
extracted like code lines; these are sometimes used for copyright notices
and similar material. The '%%' prefix is not kept, however, but replaced by
the current -metaprefix, which is customarily set to some "comment until
end of line" character (or character sequence) of the language of the code
being extracted.

    set text [join {
       {begin}
       {%<foo> foo}
       {%<+foo>plusfoo}
       {%<-foo>minusfoo}
       {middle}
       {%% some metacomment}
       {%<*foo>}
       {%%another metacomment}
       {%</foo>}
       {end}
    } \n]
    set res [docstrip::extract $text foo -metaprefix {# }]
    append res [docstrip::extract $text bar -metaprefix {#}]

sets $res to the result of

    join {
       {begin}
       { foo}
       {plusfoo}
       {middle}
       {#  some metacomment}
       {# another metacomment}
       {end}
       {begin}
       {minusfoo}
       {middle}
       {# some metacomment}
       {end} ""
    } \n

Verbatim guards can be used to force code line interpretation of a block of
lines even if some of them happen to look like another type of line to
docstrip. A verbatim guard has the form '%<<END-TAG' and the verbatim block
is terminated by the first line that is exactly '%END-TAG'.

    set text [join {
       {begin}
       {%<*myblock>}
       {some stupid()}
       { #computer<program>}
       {%<<QQQ-98765}
       {% These three lines are copied verbatim (including percents}
       {%% even if -metaprefix is something different than %%).}
       {%</myblock>}
       {%QQQ-98765}
       { using*strange@programming<language>}
       {%</myblock>}
       {end}
    } \n]
    set res [docstrip::extract $text myblock -metaprefix {# }]
    append res [docstrip::extract $text {}]

sets $res to the result of

    join {
       {begin}
       {some stupid()}
       { #computer<program>}
       {% These three lines are copied verbatim (including percents}
       {%% even if -metaprefix is something different than %%).}
       {%</myblock>}
       { using*strange@programming<language>}
       {end}
       {begin}
       {end} ""
    } \n

Verbatim guards are also processed inside blocks of lines that will not be
copied because of some outer block guard.
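A small sketch may make this last point concrete. The block guarded by the
made-up terminal off is never extracted here, yet the verbatim guard inside
it still takes effect, so the first '%</off>' line counts as verbatim text
and does not close the block. The tag STOP and the line contents are
assumptions chosen for the example.

```tcl
package require docstrip

# Hypothetical master text. "off" is not among the true terminals, so the
# block is skipped, but the verbatim guard inside it is still processed.
set text [join {
    {%<*off>}
    {%<<STOP}
    {%</off>}
    {%STOP}
    {%</off>}
    {kept}
} \n]

set res [docstrip::extract $text {}]
puts -nonewline $res
```

Only the line "kept" is extracted. If the verbatim guard were not processed
inside the skipped block, the first '%</off>' would end the block early and
the second one would be left without a matching start guard.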

The final piece of docstrip syntax is that extraction stops at a line that
is exactly "\endinput"; this is often used to avoid copying random
whitespace at the end of a file. In the unlikely case that one wants such a
code line, one can protect it with a verbatim guard.
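The following sketch shows both behaviours side by side: an unprotected
"\endinput" line ends extraction, while the same line inside a verbatim
block is copied like any other code line. The line texts and the END tag
are made up for the example.

```tcl
package require docstrip

# Braces keep the backslash literal, so the second element is exactly
# the string \endinput.
set plain [join {
    {first}
    {\endinput}
    {never reached}
} \n]

# The same line protected by a verbatim guard (hypothetical tag END).
set protected [join {
    {%<<END}
    {\endinput}
    {%END}
    {after}
} \n]

set res1 [docstrip::extract $plain {}]
set res2 [docstrip::extract $protected {}]
puts -nonewline $res1
puts -nonewline $res2
```

The first call returns only the line "first"; the second returns the
"\endinput" line followed by "after".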

The package defines two commands.

docstrip::extract text terminals ?option value ...?
       The extract command docstrips the text and returns the extracted
       lines of code, as a string with each line terminated with a
       newline. The terminals argument is the list of those guard
       expression terminals which should evaluate to true. The available
       options are:

       -annotate lines
              Requests the specified number of lines of annotation to
              follow each extracted line in the result. Defaults to 0.
              Annotation lines are mostly useful when the extracted lines
              are to undergo some further transformation. A first
              annotation line is a list of three elements: line type,
              prefix removed in extraction, and prefix inserted in
              extraction. The line type is one of: 'V' (verbatim), 'M'
              (metacomment), '+' (+ or no modifier guard line), '-' (-
              modifier guard line), '.' (normal line). A second annotation
              line is the source line number. A third annotation line is
              the current stack of block guards. Requesting more than
              three lines of annotation is currently not supported.

       -metaprefix string
              The string by which the '%%' prefix of a metacomment line
              will be replaced. Defaults to '%%'. For Tcl code this would
              typically be '#'.

       -onerror keyword
              Controls what will be done when a format error in the text
              being processed is detected. The settings are:

              ignore Just ignore the error; continue as if nothing
                     happened.

              puts   Write an error message to stderr, then continue
                     processing.

              throw  Throw an error. The -errorcode is set to a list whose
                     first element is DOCSTRIP, second element is the type
                     of error, and third element is the line number where
                     the error is detected. This is the default.

       -trimlines boolean
              Controls whether spaces at the end of a line should be
              trimmed away before the line is processed. Defaults to true.

       It should be remarked that the terminals are often called "options"
       in the context of the docstrip program, since these specify which
       optional code fragments should be included.
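As a rough sketch of how two of these options might be used, the code below
catches the error thrown for a malformed text and then requests two
annotation lines per extracted line. The sample texts are made up, and the
example assumes that a start guard closed by a differently named end guard
counts as a format error; the exact error type and message are not shown
because they are not specified above.

```tcl
package require docstrip

# Hypothetical malformed text: the block is opened as foo but closed as bar.
set bad [join {
    {%<*foo>}
    {code}
    {%</bar>}
} \n]

# With the default "-onerror throw" the error can be caught and inspected.
set failed [catch {docstrip::extract $bad foo} msg]
if {$failed} {
    puts "docstrip error: $msg"
    puts "error code:     $::errorCode"
}

# Two annotation lines follow each extracted line: the three-element
# list describing the line, then the source line number.
set annotated [docstrip::extract "alpha\nbeta" {} -annotate 2]
set rows [split [string trimright $annotated \n] \n]
foreach {line details lineno} $rows {
    puts [list $line $details $lineno]
}
```

Reading the result in groups of three, as the foreach loop does, is one
simple way to keep each extracted line together with its annotations when
the lines are to be transformed further.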

docstrip::sourcefrom filename terminals ?option value ...?
       The sourcefrom command is a docstripping emulation of source. It
       opens the file filename, reads it, closes it, docstrips the
       contents as specified by the terminals, and evaluates the result in
       the local context of the caller, during which time the info script
       value will be the filename. The options are passed on to fconfigure
       to configure the file before its contents are read. The -metaprefix
       is set to '#'; all other extract options have their default values.
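A minimal sketch of sourcefrom in use follows. It writes a tiny master file
into the current directory, sources the code guarded by one terminal, and
then removes the file again. The file name docstrip_example.dtx, the
terminal pkg, the greet procedure and the loaded_from variable are all
hypothetical names chosen for the example.

```tcl
package require docstrip

# Create a small master file (hypothetical name, current directory).
set path [file join [pwd] docstrip_example.dtx]
set chan [open $path w]
puts $chan [join {
    {% Documentation for the greet command.}
    {%<*pkg>}
    {proc greet {} {return hello}}
    {set loaded_from [file tail [info script]]}
    {%</pkg>}
} \n]
close $chan

# Source the lines guarded by pkg. The -encoding option is handed on to
# fconfigure before the file contents are read.
docstrip::sourcefrom $path pkg -encoding utf-8

puts [greet]
puts $loaded_from

file delete $path
```

Because the extracted code is evaluated in the caller's context, greet and
loaded_from are defined at the level where sourcefrom was called, and
loaded_from records the file name that info script reported during the
evaluation.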

The file format (as described above) determines whether a master source
code file can be processed correctly by docstrip, but the usefulness of the
format also depends in no small part on the code and comment lines together
constituting a well-formed document.

For a document format that does not require any non-Tcl software, see the
ddt2man command in the docstrip::util package. It is suggested that files
employing that document format be given the suffix ".ddt", to distinguish
them from the more traditional LaTeX-based ".dtx" files.

Master source files with the ".dtx" extension are usually set up so that
they can be typeset directly by latex without any support from other files.
This is achieved by beginning the file with the lines

    % \iffalse
    %<*driver>
    \documentclass{tclldoc}
    \begin{document}
    \DocInput{filename.dtx}
    \end{document}
    %</driver>
    % \fi

or some variation thereof. The trick is that the file gets read twice. With
normal LaTeX reading rules, the first two lines are comments and therefore
ignored. The third line is the document preamble, the fourth line begins
the document body, and the sixth line ends the document, so LaTeX stops
there; non-comments below that point in the file are never subjected to the
normal LaTeX reading rules. Before that, however, the \DocInput command on
the fifth line is processed, and that does two things: it changes the
interpretation of '%' from "comment" to "ignored", and it inputs the file
specified in the argument (which is normally the name of the file the
command is in). It is during this second reading of the file that the
comments and code in it are typeset.

The function of the \iffalse ... \fi is to skip lines two to seven on this
second time through; this is similar to the "if 0 { ... }" idiom for block
comments in Tcl code, and it is needed here because (amongst other things)
the \documentclass command may only be executed once. The function of the
<driver> guards is to prevent this short piece of LaTeX code from being
extracted by docstrip. The total effect is that the file can function both
as a LaTeX document and as a docstrip master source code file.
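The effect of the <driver> guards on extraction can be sketched as follows.
The master text consists of the driver header shown above followed by a
single code block; the terminal pkg and the code line are made up for the
example.

```tcl
package require docstrip

# Hypothetical ".dtx" master: the driver header, then one documented code
# block guarded by the made-up terminal "pkg".
set text [join {
    {% \iffalse}
    {%<*driver>}
    {\documentclass{tclldoc}}
    {\begin{document}}
    {\DocInput{filename.dtx}}
    {\end{document}}
    {%</driver>}
    {% \fi}
    {% The following line is the actual program.}
    {%<*pkg>}
    {puts "Hello, world"}
    {%</pkg>}
} \n]

# Extracting pkg yields only the program; the driver stays behind.
set code [docstrip::extract $text pkg]

# Extracting driver yields only the four LaTeX lines.
set driver [docstrip::extract $text driver]

puts -nonewline $code
puts -nonewline $driver
```

As long as driver is not among the true terminals, none of the LaTeX driver
lines end up in the extracted code.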

It is not necessary to use the tclldoc document class, but it does provide
a number of features that are convenient for ".dtx" files containing Tcl
code. More information on this matter can be found in the references above.

docstrip_util

.dtx, LaTeX, docstrip, documentation, literate programming, source

Documentation tools

Copyright (c) 2003–2010 Lars Hellström <Lars dot Hellstrom at residenset dot net>

tcllib 1.2                                                         docstrip(n)