docstrip(n)                Literate programming tool               docstrip(n)

NAME

       docstrip - Docstrip style source code extraction

SYNOPSIS

       package require Tcl 8.4

       package require docstrip ?1.2?

       docstrip::extract text terminals ?option value ...?

       docstrip::sourcefrom filename terminals ?option value ...?

DESCRIPTION

       Docstrip is a tool created to support a brand of Literate Programming.
       It is most common in the (La)TeX community, where it is used for
       pretty much everything from the LaTeX core and up, but there is nothing
       about docstrip which prevents using it for other types of software.

       In short, the basic principle of literate programming is that program
       source should primarily be written and structured to suit the
       developers (and advanced users who want to peek "under the hood"), not
       to suit the whims of a compiler or corresponding source code consumer.
       This means literate sources often need some kind of "translation" to
       an illiterate form that dumb software can understand. The docstrip Tcl
       package handles this translation.

       Even for those who do not wholeheartedly subscribe to the philosophy
       behind literate programming, docstrip can bring greater clarity to, in
       particular:

       ·      programs employing non-obvious mathematics

       ·      projects where separate pieces of code, perhaps in different
              languages, need to be closely coordinated.

       The first is by providing access to much more powerful typographical
       features for source code comments than are possible in plain text. The
       second is because all the separate pieces of code can be kept next to
       each other in the same source file.

       The way it works is that the programmer directly edits only one or
       several "master" source code files, from which docstrip generates the
       more traditional "source" files that compilers or the like would
       expect. The master sources typically contain a large amount of
       documentation of the code, sometimes even in places where the code
       consumers would not allow any comments. The etymology of "docstrip" is
       that this documentation was stripped away (although "code extraction"
       might be a better description, as it has always been a matter of
       copying selected pieces of the master source rather than deleting text
       from it). The docstrip Tcl package contains a reimplementation of the
       basic extraction functionality from the docstrip program, and thus
       makes it possible for a Tcl interpreter to read and interpret the
       master source files directly.

       Readers who are not previously familiar with docstrip but want to know
       more about it may consult the following sources.

       [1]    The tclldoc package and class,
              http://tug.org/tex-archive/macros/latex/contrib/tclldoc/.

       [2]    The DocStrip utility,
              http://tug.org/tex-archive/macros/latex/base/docstrip.dtx.

       [3]    The doc and shortvrb Packages,
              http://tug.org/tex-archive/macros/latex/base/doc.dtx.

       [4]    Chapter 14 of The LaTeX Companion (second edition),
              Addison-Wesley, 2004; ISBN 0-201-36299-6.

File format

       The basic units docstrip operates on are the lines of a master source
       file. Extraction consists of selecting some of these lines to be copied
       from input text to output text. The basic distinction is that between
       code lines (which do not begin with a percent character and are
       copied) and comment lines (which begin with a percent character and
       are not copied).

          docstrip::extract [join {
            {% comment}
            {% more comment !"#$%&/(}
            {some command}
            { % blah $blah "Not a comment."}
            {% abc; this is comment}
            {# def; this is code}
            {ghi}
            {% jkl}
          } \n] {}

       returns the same sequence of lines as

          join {
            {some command}
            { % blah $blah "Not a comment."}
            {# def; this is code}
            {ghi} ""
          } \n

       It does not matter to docstrip what format is used for the
       documentation in the comment lines, but in order to do better than
       plain text comments, one typically uses some markup language. Most
       commonly LaTeX is used, as that is a very established standard and
       also provides the best support for mathematical formulae, but the
       docstrip::util package also gives some support for doctools-like
       markup.

       Besides the basic code and comment lines, there are also guard lines,
       which begin with the two characters '%<', and metacomment lines, which
       begin with the two characters '%%'. Among guard lines there is
       furthermore the distinction between verbatim guard lines, which begin
       with '%<<', and ordinary guard lines, where the '%<' is not followed
       by another '<'. Ordinary guard lines are by far the most common.

       An ordinary guard line makes extraction of the code line(s) it guards
       conditional on the value of a boolean expression; the guarded block of
       code lines will only be included if the expression evaluates to true.
       The syntax of an ordinary guard line is one of

           '%' '<' STARSLASH EXPRESSION '>'
           '%' '<' PLUSMINUS EXPRESSION '>' CODE

       where

           STARSLASH  ::= '*' | '/'
           PLUSMINUS  ::= '+' | '-' | ''
           EXPRESSION ::= SECONDARY | SECONDARY ',' EXPRESSION
                        | SECONDARY '|' EXPRESSION
           SECONDARY  ::= PRIMARY | PRIMARY '&' SECONDARY
           PRIMARY    ::= TERMINAL | '!' PRIMARY | '(' EXPRESSION ')'
           CODE       ::= { any character except end-of-line }

       Comma and vertical bar both denote 'or'. Ampersand denotes 'and'.
       Exclamation mark denotes 'not'. A TERMINAL can be any nonempty string
       of characters not containing '>', '&', '|', comma, '(', or ')',
       although the docstrip manual is a bit restrictive and only guarantees
       proper operation for strings of letters (even though the LaTeX core
       sources also make heavy use of digits in TERMINALs). The second
       argument of docstrip::extract is the list of those TERMINALs that
       should count as having the value 'true'; all other TERMINALs count as
       being 'false' when guard expressions are evaluated.

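       The operators described above can be tried out with single-line
       guards. The following is only a sketch: the terminals a, b and c and
       the CODE strings are made up for illustration, and the docstrip
       package from Tcllib is assumed to be installed.

```tcl
package require docstrip

# Hypothetical master text: one single-line guard per operator.
set text [join {
   {%<a&b>and}
   {%<a|b>or-bar}
   {%<a,b>or-comma}
   {%<!a>not}
   {%<!(a|b)&c>grouped}
} \n]

# The second argument lists the TERMINALs that count as true.
set only_a [docstrip::extract $text {a}]
set only_c [docstrip::extract $text {c}]

puts -nonewline $only_a
puts -nonewline $only_c
```

       With only a true, the two 'or' guards should be the ones that pass;
       with only c true, the negated guards should pass instead.
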
       In the case of a '%<*EXPRESSION>' guard, the lines guarded are all
       lines up to the next '%</EXPRESSION>' guard with the same EXPRESSION
       (compared as strings). The blocks of code delimited by such '*' and '/'
       guard lines must be properly nested.

          set text [join {
             {begin}
             {%<*foo>}
             {1}
             {%<*bar>}
             {2}
             {%</bar>}
             {%<*!bar>}
             {3}
             {%</!bar>}
             {4}
             {%</foo>}
             {5}
             {%<*bar>}
             {6}
             {%</bar>}
             {end}
          } \n]
          set res [docstrip::extract $text foo]
          append res [docstrip::extract $text {foo bar}]
          append res [docstrip::extract $text bar]

       sets $res to the result of

          join {
             {begin}
             {1}
             {3}
             {4}
             {5}
             {end}
             {begin}
             {1}
             {2}
             {4}
             {5}
             {6}
             {end}
             {begin}
             {5}
             {6}
             {end} ""
          } \n

       In guard lines without a '*', '/', '+', or '-' modifier after the '%<',
       the guard applies only to the CODE following the '>' on that single
       line. A '+' modifier is equivalent to no modifier. A '-' modifier is
       like the case with no modifier, but the expression is implicitly
       negated, i.e., the CODE of a '%<-' guard line is only included if the
       expression evaluates to false.

       Metacomment lines are "comment lines which should not be stripped
       away" but instead be extracted like code lines; these are sometimes
       used for copyright notices and similar material. The '%%' prefix is,
       however, not kept but replaced by the current -metaprefix, which is
       customarily set to some "comment until end of line" character (or
       character sequence) of the language of the code being extracted.

          set text [join {
             {begin}
             {%<foo> foo}
             {%<+foo>plusfoo}
             {%<-foo>minusfoo}
             {middle}
             {%% some metacomment}
             {%<*foo>}
             {%%another metacomment}
             {%</foo>}
             {end}
          } \n]
          set res [docstrip::extract $text foo -metaprefix {# }]
          append res [docstrip::extract $text bar -metaprefix {#}]

       sets $res to the result of

          join {
             {begin}
             { foo}
             {plusfoo}
             {middle}
             {#  some metacomment}
             {# another metacomment}
             {end}
             {begin}
             {minusfoo}
             {middle}
             {# some metacomment}
             {end} ""
          } \n

       Verbatim guards can be used to force code line interpretation of a
       block of lines even if some of them happen to look like some other
       type of line to docstrip. A verbatim guard has the form '%<<END-TAG'
       and the verbatim block is terminated by the first line that is exactly
       '%END-TAG'.

          set text [join {
             {begin}
             {%<*myblock>}
             {some stupid()}
             {   #computer<program>}
             {%<<QQQ-98765}
             {% These three lines are copied verbatim (including percents}
             {%% even if -metaprefix is something different than %%).}
             {%</myblock>}
             {%QQQ-98765}
             {   using*strange@programming<language>}
             {%</myblock>}
             {end}
          } \n]
          set res [docstrip::extract $text myblock -metaprefix {# }]
          append res [docstrip::extract $text {}]

       sets $res to the result of

          join {
             {begin}
             {some stupid()}
             {   #computer<program>}
             {% These three lines are copied verbatim (including percents}
             {%% even if -metaprefix is something different than %%).}
             {%</myblock>}
             {   using*strange@programming<language>}
             {end}
             {begin}
             {end} ""
          } \n

       Verbatim guards are also processed inside blocks of lines which, due
       to some outer block guard, will not be copied.

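       The practical effect of this rule is that a line inside a verbatim
       block cannot end an enclosing block by accident, even when that block
       is being skipped. A minimal sketch, with a made-up terminal foo and
       end tag END, and assuming the Tcllib docstrip package is available:

```tcl
package require docstrip

# Hypothetical master text: the verbatim block sits inside a <foo> block
# and contains a line that would otherwise look like its end guard.
set text [join {
   {%<*foo>}
   {%<<END}
   {%</foo>}
   {%END}
   {inside}
   {%</foo>}
   {after}
} \n]

# foo is false: the verbatim block is still recognised and skipped as a
# unit, so the '%</foo>' inside it does not close the outer block early.
set without [docstrip::extract $text {}]

# foo is true: the verbatim line is copied unchanged.
set with [docstrip::extract $text foo]

puts -nonewline $without
puts -nonewline $with
```

       If verbatim guards were ignored in skipped blocks, the first
       extraction would also contain the line "inside".
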
       The final piece of docstrip syntax is that extraction stops at a line
       that is exactly "\endinput"; this is often used to avoid copying random
       whitespace at the end of a file. In the unlikely case that one wants
       such a code line, one can protect it with a verbatim guard.

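       Both cases can be sketched as follows. The line contents and the end
       tag STOP are invented for the example, and the Tcllib docstrip package
       is assumed to be installed.

```tcl
package require docstrip

# Extraction stops at the line that is exactly \endinput.
set text [join {
   {first}
   {\endinput}
   {second}
} \n]
set stopped [docstrip::extract $text {}]

# Inside a verbatim block the same line is copied as ordinary code.
set text [join {
   {first}
   {%<<STOP}
   {\endinput}
   {%STOP}
   {second}
} \n]
set protected [docstrip::extract $text {}]

puts -nonewline $stopped
puts -nonewline $protected
```
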

Commands

       The package defines two commands.

       docstrip::extract text terminals ?option value ...?
              The extract command docstrips the text and returns the extracted
              lines of code, as a string with each line terminated with a
              newline. The terminals argument is the list of those guard
              expression terminals which should evaluate to true. The
              available options are:

              -annotate lines
                     Requests the specified number of lines of annotation to
                     follow each extracted line in the result. Defaults to 0.
                     Annotation lines are mostly useful when the extracted
                     lines are to undergo some further transformation. A first
                     annotation line is a list of three elements: line type,
                     prefix removed in extraction, and prefix inserted in
                     extraction. The line type is one of: 'V' (verbatim), 'M'
                     (metacomment), '+' (+ or no modifier guard line), '-' (-
                     modifier guard line), '.' (normal line). A second
                     annotation line is the source line number. A third
                     annotation line is the current stack of block guards.
                     Requesting more than three lines of annotation is
                     currently not supported.

              -metaprefix string
                     The string by which the '%%' prefix of a metacomment line
                     will be replaced. Defaults to '%%'. For Tcl code this
                     would typically be '#'.

              -onerror keyword
                     Controls what will be done when a format error in the
                     text being processed is detected. The settings are:

                     ignore Just ignore the error; continue as if nothing
                            happened.

                     puts   Write an error message to stderr, then continue
                            processing.

                     throw  Throw an error. ::errorCode is set to a list whose
                            first element is DOCSTRIP, second element is the
                            type of error, and third element is the line
                            number where the error is detected. This is the
                            default.

              -trimlines boolean
                     Controls whether spaces at the end of a line should be
                     trimmed away before the line is processed. Defaults to
                     true.

       It should be remarked that the terminals are often called "options" in
       the context of the docstrip program, since these specify which optional
       code fragments should be included.

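       As an illustration of the -annotate and -onerror options of
       docstrip::extract, the sketch below walks through an annotated result
       and catches a thrown error. The input text is invented, the Tcllib
       docstrip package is assumed to be installed, and it is an assumption
       of this example that a stray end guard counts as a format error.

```tcl
package require docstrip

# Hypothetical master text with one normal line and one guarded line.
set text [join {
   {% a comment line}
   {alpha}
   {%<foo>beta}
} \n]

# With -annotate 2 every extracted line is followed by two annotation
# lines: the three-element list and the source line number.
set res [docstrip::extract $text foo -annotate 2]
set lines [lrange [split $res \n] 0 end-1]   ;# drop the empty tail

set codes {}
set types {}
set numbers {}
foreach {code note lineno} $lines {
   lappend codes $code
   lappend types [lindex $note 0]
   lappend numbers $lineno
   puts "source line $lineno, type [lindex $note 0]: $code"
}

# Assumption: an end guard without a matching start guard is reported
# as a format error, which the default -onerror throw turns into a Tcl
# error with DOCSTRIP as the first element of ::errorCode.
set failed [catch {docstrip::extract "%</foo>\ncode" {}} msg]
if {$failed} {
   set errclass [lindex $::errorCode 0]
   puts "extraction failed ($errclass): $msg"
}
```

       Reading the annotation with lindex rather than by string matching
       keeps the loop independent of how the list happens to be quoted.
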
       docstrip::sourcefrom filename terminals ?option value ...?
              The sourcefrom command is a docstripping emulation of source. It
              opens the file filename, reads it, closes it, docstrips the
              contents as specified by the terminals, and evaluates the result
              in the local context of the caller, during which time the info
              script value will be the filename. The options are passed on to
              fconfigure to configure the file before its contents are read.
              The -metaprefix is set to '#'; all other extract options have
              their default values.

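       A minimal sketch of docstrip::sourcefrom in use follows. The file name
       docstrip_demo.dtx, the terminals pkg and test, and the two procedures
       are all hypothetical; the example assumes the Tcllib docstrip package
       is installed and that the current directory is writable.

```tcl
package require docstrip

# Write a small, hypothetical master file into the current directory.
set path [file join [pwd] docstrip_demo.dtx]
set f [open $path w]
puts $f {% Documentation for the procedures below; never evaluated.}
puts $f {%<*pkg>}
puts $f {proc docstrip_demo_hello {} {return hello}}
puts $f {%</pkg>}
puts $f {%<*test>}
puts $f {proc docstrip_demo_test {} {return tested}}
puts $f {%</test>}
close $f

# Evaluate only the <pkg> block; -encoding is handed on to fconfigure.
docstrip::sourcefrom $path pkg -encoding utf-8
file delete $path

set greeting [docstrip_demo_hello]
set test_defined [llength [info commands docstrip_demo_test]]
puts "greeting: $greeting, test procedure defined: $test_defined"
```

       Only the procedure guarded by pkg should exist afterwards, since the
       test terminal was not listed as true.
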

Document structure

       The file format (as described above) determines whether a master source
       code file can be processed correctly by docstrip, but the usefulness of
       the format also depends in no small part on the code and comment lines
       together constituting a well-formed document.

       For a document format that does not require any non-Tcl software, see
       the ddt2man command in the docstrip::util package. It is suggested that
       files employing that document format are given the suffix ".ddt", to
       distinguish them from the more traditional LaTeX-based ".dtx" files.

       Master source files with ".dtx" extension are usually set up so that
       they can be typeset directly by latex without any support from other
       files. This is achieved by beginning the file with the lines

          % \iffalse
          %<*driver>
          \documentclass{tclldoc}
          \begin{document}
          \DocInput{filename.dtx}
          \end{document}
          %</driver>
          % \fi

       or some variation thereof. The trick is that the file gets read twice.
       With normal LaTeX reading rules, the first two lines are comments and
       therefore ignored. The third line is the document preamble, the fourth
       line begins the document body, and the sixth line ends the document, so
       LaTeX stops there; non-comments below that point in the file are never
       subjected to the normal LaTeX reading rules. Before that, however, the
       \DocInput command on the fifth line is processed, and that does two
       things: it changes the interpretation of '%' from "comment" to
       "ignored", and it inputs the file specified in the argument (which is
       normally the name of the file the command is in). It is during this
       second reading of the file that the comments and code in it are
       typeset.

       The function of the \iffalse ... \fi is to skip lines two to seven on
       this second time through; this is similar to the "if 0 { ... }" idiom
       for block comments in Tcl code, and it is needed here because (amongst
       other things) the \documentclass command may only be executed once. The
       function of the <driver> guards is to prevent this short piece of LaTeX
       code from being extracted by docstrip. The total effect is that the
       file can function both as a LaTeX document and as a docstrip master
       source code file.

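       From the docstrip side, the effect of the <driver> guards can be
       checked directly. In the sketch below the terminal pkg and the single
       code line are invented for illustration, and the Tcllib docstrip
       package is assumed to be installed.

```tcl
package require docstrip

# Hypothetical master file: the LaTeX driver followed by one code block.
set text [join {
   {% \iffalse}
   {%<*driver>}
   {\documentclass{tclldoc}}
   {\begin{document}}
   {\DocInput{filename.dtx}}
   {\end{document}}
   {%</driver>}
   {% \fi}
   {% Documentation of the code would go here.}
   {%<*pkg>}
   {puts "hello"}
   {%</pkg>}
} \n]

# Extracting the package code leaves the LaTeX driver out ...
set code [docstrip::extract $text pkg]

# ... while the driver itself can still be extracted on request.
set driver [docstrip::extract $text driver]

puts -nonewline $code
puts -nonewline $driver
```
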
       It is not necessary to use the tclldoc document class, but that does
       provide a number of features that are convenient for ".dtx" files
       containing Tcl code. More information on this matter can be found in
       the references above.

SEE ALSO

       docstrip_util

COPYRIGHT

       Copyright (c) 2003-2005 Lars Hellström <Lars dot Hellstrom at residenset dot net>

docstrip                              1.2                          docstrip(n)