docstrip(n)              Literate programming tool              docstrip(n)

______________________________________________________________________________

NAME
docstrip - Docstrip style source code extraction

SYNOPSIS
package require Tcl 8.4

package require docstrip ?1.2?

docstrip::extract text terminals ?option value ...?

docstrip::sourcefrom filename terminals ?option value ...?

_________________________________________________________________

DESCRIPTION
Docstrip is a tool created to support a brand of Literate Programming.
It is most common in the (La)TeX community, where it is used for nearly
everything from the LaTeX core upward, but nothing about docstrip
prevents using it for other types of software.

In short, the basic principle of literate programming is that program
source should primarily be written and structured to suit the
developers (and advanced users who want to peek "under the hood"), not
to suit the whims of a compiler or corresponding source code consumer.
This means literate sources often need some kind of "translation" to an
illiterate form that dumb software can understand. The docstrip Tcl
package handles this translation.

Even for those who do not wholeheartedly subscribe to the philosophy
behind literate programming, docstrip can bring greater clarity to two
kinds of software in particular:

·  programs employing non-obvious mathematics

·  projects where separate pieces of code, perhaps in different
   languages, need to be closely coordinated.

It helps with the first by providing access to much more powerful
typographical features for source code comments than are possible in
plain text. It helps with the second because all the separate pieces of
code can be kept next to each other in the same source file.

The way it works is that the programmer directly edits only one or
several "master" source code files, from which docstrip generates the
more traditional "source" files that compilers or the like would
expect. The master sources typically contain a large amount of
documentation of the code, sometimes even in places where the code
consumers would not allow any comments. The etymology of "docstrip" is
that this documentation was stripped away (although "code extraction"
might be a better description, as it has always been a matter of
copying selected pieces of the master source rather than deleting text
from it). The docstrip Tcl package contains a reimplementation of the
basic extraction functionality from the docstrip program, and thus
makes it possible for a Tcl interpreter to read and interpret the
master source files directly.

Readers who are not previously familiar with docstrip but want to know
more about it may consult the following sources.

[1]  The tclldoc package and class,
     http://tug.org/tex-archive/macros/latex/contrib/tclldoc/.

[2]  The DocStrip utility,
     http://tug.org/tex-archive/macros/latex/base/docstrip.dtx.

[3]  The doc and shortvrb Packages,
     http://tug.org/tex-archive/macros/latex/base/doc.dtx.

[4]  Chapter 14 of The LaTeX Companion (second edition),
     Addison-Wesley, 2004; ISBN 0-201-36299-6.

FILE FORMAT
The basic units docstrip operates on are the lines of a master source
file. Extraction consists of selecting some of these lines to be copied
from input text to output text. The basic distinction is that between
code lines (which do not begin with a percent character and are copied)
and comment lines (which begin with a percent character and are not
copied).

    docstrip::extract [join {
        {% comment}
        {% more comment !"#$%&/(}
        {some command}
        { % blah $blah "Not a comment."}
        {% abc; this is comment}
        {# def; this is code}
        {ghi}
        {% jkl}
    } \n] {}

returns the same sequence of lines as

    join {
        {some command}
        { % blah $blah "Not a comment."}
        {# def; this is code}
        {ghi} ""
    } \n

It does not matter to docstrip what format is used for the
documentation in the comment lines, but in order to do better than
plain text comments, one typically uses some markup language. LaTeX is
the most common choice, as it is a well-established standard and also
provides the best support for mathematical formulae, but the
docstrip::util package also gives some support for doctools-like
markup.

Besides the basic code and comment lines, there are also guard lines,
which begin with the two characters '%<', and metacomment lines, which
begin with the two characters '%%'. Within guard lines there is
furthermore the distinction between verbatim guard lines, which begin
with '%<<', and ordinary guard lines, where the '%<' is not followed by
another '<'. The last category is by far the most common.

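The prefix rules above can be summarised in a few lines of plain Tcl.
The sketch below is only an illustration: classify_line is a
hypothetical helper, not part of the docstrip package, and it ignores
context such as being inside a verbatim block or the special
"\endinput" line.

```tcl
# Hypothetical helper (not provided by docstrip): name the category of a
# single master source line, judging only by its first characters.
# The more specific prefixes must be tested before the general '%'.
proc classify_line {line} {
    switch -glob -- $line {
        {%%*}   {return metacomment}
        {%<<*}  {return verbatim-guard}
        {%<*}   {return guard}
        {%*}    {return comment}
        default {return code}
    }
}

foreach line {{%% notice} {%<<EOT} {%<*foo>} {% text} {set x 1} { % indented}} {
    puts "[classify_line $line]: $line"
}
```

Note that the last sample line starts with a space, so it is classified
as code, in line with the "Not a comment." line in the example further
up.
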
An ordinary guard line makes extraction of the code line(s) it guards
conditional on the value of a boolean expression; the guarded block of
code lines will only be included if the expression evaluates to true.
The syntax of an ordinary guard line is one of

    '%' '<' STARSLASH EXPRESSION '>'
    '%' '<' PLUSMINUS EXPRESSION '>' CODE

where

    STARSLASH  ::= '*' | '/'
    PLUSMINUS  ::= '+' | '-' | (empty)
    EXPRESSION ::= SECONDARY | SECONDARY ',' EXPRESSION
                 | SECONDARY '|' EXPRESSION
    SECONDARY  ::= PRIMARY | PRIMARY '&' SECONDARY
    PRIMARY    ::= TERMINAL | '!' PRIMARY | '(' EXPRESSION ')'
    CODE       ::= { any character except end-of-line }

Comma and vertical bar both denote 'or'. Ampersand denotes 'and'.
Exclamation mark denotes 'not'. As the grammar shows, 'and' binds more
tightly than 'or'. A TERMINAL can be any nonempty string of characters
not containing '>', '&', '|', comma, '(', or ')'. The docstrip manual
is more restrictive and only guarantees proper operation for strings of
letters, although even the LaTeX core sources make heavy use of digits
in TERMINALs. The second argument of docstrip::extract is the list of
those TERMINALs that should count as having the value 'true'; all other
TERMINALs count as being 'false' when guard expressions are evaluated.

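As a small illustration of how guard expressions combine, the sketch
below runs the same text through docstrip::extract with different
terminal lists. It assumes tcllib's docstrip package is installed; the
terminal names a, b and c and the guarded text are made up for the
example.

```tcl
package require docstrip

# Each line is a single-line guard; the text after '>' is the CODE.
set text [join {
    {%<a&b>needs both a and b}
    {%<a|b>needs a or b (bar)}
    {%<a,b>needs a or b (comma)}
    {%<!a>needs a to be false}
    {%<a&!b>needs a but not b}
    {%<(a|b)&c>needs c and one of a, b}
} \n]

# Only 'a' is true: the two 'or' lines and the 'a but not b' line survive.
set only_a [docstrip::extract $text {a}]

# No terminal is true: only the negated guard lets its line through.
set none [docstrip::extract $text {}]

# All three are true: every line whose expression has no negation survives.
set all [docstrip::extract $text {a b c}]

puts $only_a
puts $none
puts $all
```
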
In the case of a '%<*EXPRESSION>' guard, the lines guarded are all
lines up to the next '%</EXPRESSION>' guard with the same EXPRESSION
(compared as strings). The blocks of code delimited by such '*' and '/'
guard lines must be properly nested.

    set text [join {
        {begin}
        {%<*foo>}
        {1}
        {%<*bar>}
        {2}
        {%</bar>}
        {%<*!bar>}
        {3}
        {%</!bar>}
        {4}
        {%</foo>}
        {5}
        {%<*bar>}
        {6}
        {%</bar>}
        {end}
    } \n]
    set res [docstrip::extract $text foo]
    append res [docstrip::extract $text {foo bar}]
    append res [docstrip::extract $text bar]

sets res to the result of

    join {
        {begin}
        {1}
        {3}
        {4}
        {5}
        {end}
        {begin}
        {1}
        {2}
        {4}
        {5}
        {6}
        {end}
        {begin}
        {5}
        {6}
        {end} ""
    } \n

In guard lines without a '*', '/', '+', or '-' modifier after the '%<',
the guard applies only to the CODE following the '>' on that single
line. A '+' modifier is equivalent to no modifier. A '-' modifier is
like the case with no modifier, but the expression is implicitly
negated, i.e., the CODE of a '%<-' guard line is only included if the
expression evaluates to false.

Metacomment lines are "comment lines which should not be stripped
away" but instead be extracted like code lines; these are sometimes
used for copyright notices and similar material. The '%%' prefix is,
however, not kept but replaced by the current -metaprefix, which is
customarily set to some "comment until end of line" character (or
character sequence) of the language of the code being extracted.

    set text [join {
        {begin}
        {%<foo> foo}
        {%<+foo>plusfoo}
        {%<-foo>minusfoo}
        {middle}
        {%% some metacomment}
        {%<*foo>}
        {%%another metacomment}
        {%</foo>}
        {end}
    } \n]
    set res [docstrip::extract $text foo -metaprefix {# }]
    append res [docstrip::extract $text bar -metaprefix {#}]

sets res to the result of

    join {
        {begin}
        { foo}
        {plusfoo}
        {middle}
        {#  some metacomment}
        {# another metacomment}
        {end}
        {begin}
        {minusfoo}
        {middle}
        {# some metacomment}
        {end} ""
    } \n

Verbatim guards can be used to force code line interpretation of a
block of lines even if some of them would otherwise look like another
type of line to docstrip. A verbatim guard has the form '%<<END-TAG'
and the verbatim block is terminated by the first line that is exactly
'%END-TAG'.

    set text [join {
        {begin}
        {%<*myblock>}
        {some stupid()}
        { #computer<program>}
        {%<<QQQ-98765}
        {% These three lines are copied verbatim (including percents}
        {%% even if -metaprefix is something different than %%).}
        {%</myblock>}
        {%QQQ-98765}
        { using*strange@programming<language>}
        {%</myblock>}
        {end}
    } \n]
    set res [docstrip::extract $text myblock -metaprefix {# }]
    append res [docstrip::extract $text {}]

sets res to the result of

    join {
        {begin}
        {some stupid()}
        { #computer<program>}
        {% These three lines are copied verbatim (including percents}
        {%% even if -metaprefix is something different than %%).}
        {%</myblock>}
        { using*strange@programming<language>}
        {end}
        {begin}
        {end} ""
    } \n

Verbatim guards are also processed inside blocks of lines that will not
be copied because of some outer block guard.

The final piece of docstrip syntax is that extraction stops at a line
that is exactly "\endinput"; this is often used to avoid copying random
whitespace at the end of a file. In the unlikely case that one wants
such a code line, one can protect it with a verbatim guard.

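Both behaviours can be seen side by side in a short sketch, assuming
tcllib's docstrip package is installed. The sample lines and the END
tag are made up for the example.

```tcl
package require docstrip

# Extraction stops at the bare \endinput line, so the last line is lost.
set plain [docstrip::extract [join {
    {kept}
    {\endinput}
    {never reached}
} \n] {}]

# Inside a verbatim block the same line is copied like any other code
# line, and extraction continues after the closing %END line.
set guarded [docstrip::extract [join {
    {%<<END}
    {\endinput}
    {%END}
    {kept as well}
} \n] {}]

puts $plain
puts $guarded
```
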
COMMANDS
The package defines two commands.

docstrip::extract text terminals ?option value ...?
    The extract command docstrips the text and returns the extracted
    lines of code, as a string with each line terminated by a newline.
    The terminals argument is the list of those guard expression
    terminals which should evaluate to true. The available options
    are:

    -annotate lines
        Requests the specified number of lines of annotation to
        follow each extracted line in the result. Defaults to 0.
        Annotation lines are mostly useful when the extracted lines
        are to undergo some further transformation. A first
        annotation line is a list of three elements: line type,
        prefix removed in extraction, and prefix inserted in
        extraction. The line type is one of: 'V' (verbatim), 'M'
        (metacomment), '+' (+ or no modifier guard line), '-' (-
        modifier guard line), '.' (normal line). A second annotation
        line is the source line number. A third annotation line is
        the current stack of block guards. Requesting more than
        three lines of annotation is currently not supported.

    -metaprefix string
        The string by which the '%%' prefix of a metacomment line
        will be replaced. Defaults to '%%'. For Tcl code this would
        typically be '#'.

    -onerror keyword
        Controls what will be done when a format error in the text
        being processed is detected. The settings are:

        ignore  Just ignore the error; continue as if nothing
                happened.

        puts    Write an error message to stderr, then continue
                processing.

        throw   Throw an error. ::errorCode is set to a list whose
                first element is DOCSTRIP, second element is the
                type of error, and third element is the line number
                where the error is detected. This is the default.

    -trimlines boolean
        Controls whether spaces at the end of a line should be
        trimmed away before the line is processed. Defaults to true.

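To make the -annotate option more concrete, here is a minimal sketch
that requests two annotation lines and walks the result in groups of
three. It assumes tcllib's docstrip package is installed; the sample
text and the variable names are illustrative only, and the exact
contents of the first annotation line are printed, not assumed.

```tcl
package require docstrip

# One comment line, then three lines that are extracted when the
# terminal "foo" is true.
set text [join {
    {% comment, not extracted}
    {first}
    {%<foo>second}
    {third}
} \n]

# Ask for two annotation lines after every extracted line.
set annotated [docstrip::extract $text foo -annotate 2]

# Each group is: the extracted line, the three-element annotation list,
# and the source line number.
set extracted {}
foreach {line note lineno} [split [string trimright $annotated \n] \n] {
    lappend extracted $line
    puts "source line $lineno, type [lindex $note 0]: $line"
}
```
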
It should be remarked that the terminals are often called "options" in
the context of the docstrip program, since these specify which optional
code fragments should be included.

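Returning to the -onerror option of docstrip::extract, the sketch below
shows one way to trap an error and read ::errorCode. It assumes
tcllib's docstrip package is installed and that a mismatched end guard
is reported as a format error; the sample text is made up, and the
error type is printed instead of being assumed.

```tcl
package require docstrip

# The end guard names "bar" although the open block is "foo".
set text [join {
    {begin}
    {%<*foo>}
    {1}
    {%</bar>}
    {end}
} \n]

# Default behaviour (-onerror throw): trap the error and take the three
# documented elements out of ::errorCode.
set caught [catch {docstrip::extract $text foo} msg]
if {$caught} {
    foreach {family type lineno} $::errorCode break
    puts "$family error of type $type at line $lineno: $msg"
}

# With -onerror ignore the same text is processed without an error.
set lenient [catch {docstrip::extract $text foo -onerror ignore} out]
```
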
docstrip::sourcefrom filename terminals ?option value ...?
    The sourcefrom command is a docstripping emulation of source. It
    opens the file filename, reads it, closes it, docstrips the
    contents as specified by the terminals, and evaluates the result
    in the local context of the caller, during which time the info
    script value will be the filename. The options are passed on to
    fconfigure to configure the file before its contents are read.
    The -metaprefix is set to '#', all other extract options have
    their default values.

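A minimal sketch of sourcefrom in use, assuming tcllib's docstrip
package is installed and the current directory is writable. The file
name demo_master.dtx, the demo namespace and the terminals pkg and test
are hypothetical names chosen for the example.

```tcl
package require docstrip

# Write a tiny master file holding two alternative definitions.
set path [file join [pwd] demo_master.dtx]
set f [open $path w]
puts $f [join {
    {% Documentation line, never evaluated.}
    {%<*pkg>}
    {proc demo::hello {} {return "hello from pkg"}}
    {%</pkg>}
    {%<*test>}
    {proc demo::hello {} {return "hello from test"}}
    {%</test>}
} \n]
close $f

# The namespace must exist before the extracted proc is evaluated.
namespace eval demo {}

# Evaluate only the code guarded by the "pkg" terminal.
docstrip::sourcefrom $path pkg
puts [demo::hello]

file delete $path
```

Extra arguments after the terminals, such as -encoding utf-8, would be
handed to fconfigure as described above.
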
DOCUMENT STRUCTURE
The file format (as described above) determines whether a master source
code file can be processed correctly by docstrip, but the usefulness of
the format also depends in no small part on the code and comment lines
together constituting a well-formed document.

For a document format that does not require any non-Tcl software, see
the ddt2man command in the docstrip::util package. It is suggested that
files employing that document format be given the suffix ".ddt", to
distinguish them from the more traditional LaTeX-based ".dtx" files.

Master source files with ".dtx" extension are usually set up so that
they can be typeset directly by latex without any support from other
files. This is achieved by beginning the file with the lines

    % \iffalse
    %<*driver>
    \documentclass{tclldoc}
    \begin{document}
    \DocInput{filename.dtx}
    \end{document}
    %</driver>
    % \fi

or some variation thereof. The trick is that the file gets read twice.
With normal LaTeX reading rules, the first two lines are comments and
therefore ignored. The third line is the document preamble, the fourth
line begins the document body, and the sixth line ends the document, so
LaTeX stops there; non-comments below that point in the file are never
subjected to the normal LaTeX reading rules. Before that, however, the
\DocInput command on the fifth line is processed, and that does two
things: it changes the interpretation of '%' from "comment" to
"ignored", and it inputs the file specified in the argument (which is
normally the name of the file the command is in). It is during this
second reading of the file that the comments and code in it are
typeset.

The function of the \iffalse ... \fi is to skip lines two to seven on
this second time through; this is similar to the "if 0 { ... }" idiom
for block comments in Tcl code, and it is needed here because (amongst
other things) the \documentclass command may only be executed once. The
function of the <driver> guards is to prevent this short piece of LaTeX
code from being extracted by docstrip. The total effect is that the
file can function both as a LaTeX document and as a docstrip master
source code file.

It is not necessary to use the tclldoc document class, but that does
provide a number of features that are convenient for ".dtx" files
containing Tcl code. More information on this matter can be found in
the references above.

SEE ALSO
docstrip_util

COPYRIGHT
Copyright (c) 2003-2005 Lars Hellström <Lars dot Hellstrom at residenset dot net>

docstrip 1.2                                                        docstrip(n)