1LGC(5) File Formats LGC(5)
2
3
4
6 lgc - the lgs source file format for the lgc compiler
7
9 Source files of the Logiweb compiler lgc (lgc(1)) are expressed in the
10 LoGiweb Source language (lgs). The lgs language makes it possible to
11 express mathematics in a seminatural style.
12
13 To learn lgs, simply read the Logiweb source of the 'base' page at
14 http://logiweb.eu/1.0/doc/pages/base/source.lgs. The comments in there
15 give much more detail than could reasonably be included here. Then
16 read the 'lgc' page found in the same place. It documents the lgc compiler
17 including lots of details on lgs.
18
19 An overview is given in the following, however.
20
22 The lgc compiler translates lgs into Logiweb vectors, racks, and ren‐
23 derings. The Logiweb standard defines the format of Logiweb vectors and
24 racks, and defines precisely how vectors are translated to racks.
25
26 The Logiweb standard does not, however, define the lgs format. The lgc
27 compiler is the compiler which happens to come with the Logiweb distri‐
28 bution and the lgs format happens to be the input format of the lgc
29 compiler. But Logiweb does not consider lgs as part of the standard.
30 Any compiler which produces vectors, racks, and renderings may be
31 used in connection with Logiweb.
32
33 The Logiweb standard partially defines what a rendering is: A rendering
34 is a file tree rooted at a 'rendering directory'. The rendering direc‐
35 tory is supposed to contain a file named vector.lgw which contains the
36 page in vector format, a file named rack.lgr which contains the page in
37 rack format, and a subdirectory named page which contains the rendering
38 of the page. Compilers for Logiweb are free to produce additional con‐
39 tents of the rendering directory such as an index.html file.
40
41 Logiweb compilers are only required to produce (1) a vector.lgw file in
42 Logiweb vector format, (2) an associated rack.lgr file which
43 is derived from vector.lgw in exactly the same way as lgc does, and (3)
44 a 'page' subdirectory which is derived from rack.lgr in exactly the
45 same way as lgc does.
46
48 Each lgs file is expressed in Unicode UTF-8. Lines may be terminated by
49 LF (code 10), CR (code 13), CRLF (code 13 followed by code 10), or LFCR
50 (code 10 followed by code 13).
51
52 Internally, Logiweb uses LF for terminating lines. More specifically,
53 plain text inside Logiweb vectors and Logiweb racks uses LF for termi‐
54 nating lines. The purpose of this is to ensure interoperability between
55 different platforms.
56
57 lgc translates to LF when reading lgs files and translates to the host
58 newline convention when producing renderings.
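
The translation to LF on input can be sketched as follows. This is a minimal illustration in Python, not the code used by lgc. The function name normalize_newlines is hypothetical, and the left-to-right pairing of ambiguous sequences such as LF CR LF is an assumption, since the text above does not say how lgc resolves them.

```python
import re


def normalize_newlines(text):
    """Replace CRLF, LFCR, and lone CR by a single LF."""
    # The two-character terminators are listed first so that each pair
    # collapses to one LF instead of two.
    return re.sub(r"\r\n|\n\r|\r", "\n", text)
```

Text that already uses LF passes through unchanged, so the function can safely be applied more than once.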
59
61 The only reserved character in lgs is the double quote character. The
62 lgs language uses double quote characters for many different purposes.
63
64 We shall refer to a sequence of two or more double quote characters as
65 a 'multiquote' and to an isolated double quote character as a 'lone
66 quote'.
67
68 We shall refer to a multiquote followed by a non-quote as a 'direc‐
69 tive'.
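
The three terms can be illustrated with a small Python sketch that classifies every run of double quotes in a text. The function name classify_quotes is hypothetical. Reporting a multiquote at the very end of the input as a plain 'multiquote' is an assumption, made because no non-quote follows it there.

```python
import re


def classify_quotes(source):
    """Yield (kind, text) for every run of double quotes in source."""
    for match in re.finditer(r'"+', source):
        run = match.group()
        # The run is maximal, so the next character (if any) is a non-quote.
        following = source[match.end():match.end() + 1]
        if len(run) == 1:
            yield ("lone quote", run)
        elif following:
            yield ("directive", run + following)
        else:
            yield ("multiquote", run)
```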
70
72 Comments start with ""{ or ""; directives (i.e. with two or more double
73 quote characters followed by a left brace or a semicolon).
74
75 Comments that start with ""; end at the end of the line.
76
77 Comments that start with ""{ can span any number of lines. They end at
78 the first ""} directive which has exactly the same number of double
79 quote characters as the opening directive. This is an example of a com‐
80 ment:
81 """{ A ""} ends a comment starting with ""{ """}
82 Note that the comment is enclosed in brace directives with three double
83 quotes. The brace directives with two double quotes are part of the
84 comment.
85
86 Comments may occur anywhere except after a double quote, since that
87 double quote would then be considered to be part of the directive.
88 In particular, comments may occur inside strings and in the middle of
89 keywords.
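
The comment rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the lgc implementation: the function name strip_comments is hypothetical, a line comment is taken to include its end of line, and an unterminated brace comment is assumed to run to the end of the text.

```python
import re

# Two or more double quotes followed by a semicolon or a left brace.
OPENER = re.compile(r'("{2,})([;{])')


def strip_comments(source):
    """Return source with line comments and brace comments removed."""
    out = []
    pos = 0
    while True:
        match = OPENER.search(source, pos)
        if match is None:
            out.append(source[pos:])
            return "".join(out)
        out.append(source[pos:match.start()])
        quotes, kind = match.groups()
        if kind == ";":
            # Line comment: skip to just past the end of the line.
            end = source.find("\n", match.end())
            pos = len(source) if end == -1 else end + 1
        else:
            # Brace comment: the closer needs exactly as many quotes.
            closer = re.compile(r'(?<!")' + re.escape(quotes) + r"\}")
            close = closer.search(source, match.end())
            pos = len(source) if close is None else close.end()
```

Applied to the example above, the three-quote brace directives delimit the comment and the two-quote ones inside it are skipped.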
90
91 If the first four characters of a file constitute the magic code "";;
92 then the first line of the file is considered to be a 'header'. All hex
93 characters from the magic code and up to the first non-hex character
94 suggest what the reference of the page might be. Whenever a source
95 file with a header is translated, the suggested reference is used if it
96 fits the contents. Otherwise, a new reference is generated and the com‐
97 piler writes the new reference back into the header. To use this facil‐
98 ity, let your source file start with a line containing nothing but
99 "";;. At first translation, a reference will be stored back in the
100 header. After that, whenever you retranslate the source without having
101 made changes to it, the page will get the same reference as last time
102 it was translated. Without a header, the page will get a new time stamp
103 at each translation.
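
Reading the suggested reference from a header can be sketched as follows. The function name suggested_reference is hypothetical, and accepting lowercase hex digits is an assumption. The sketch only extracts the hex digits; checking whether the reference fits the contents is left to the compiler.

```python
import re

MAGIC = '"";;'


def suggested_reference(source):
    """Return the hex digits after the magic code, or None if no header."""
    if not source.startswith(MAGIC):
        return None
    # An empty match is fine: a bare header suggests no reference yet.
    return re.match(r"[0-9A-Fa-f]*", source[len(MAGIC):]).group()
```

A bare header yields the empty string, which corresponds to a source file that has not yet been translated.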
104
106 The following is a wellformed lgs file:
107 ""P my page
108 ""R base
109 ""D
110 " square
111 ""B
112 "We have that "[[ 2 square ]]" is four."
113
115 Each lgs file must contain one ""P directive which defines the name of
116 the page being defined. The page name comprises all characters from the
117 directive until the end of the line. One may use a newline directive
118 (""n) instead of the end of the line to delimit the page name.
119
120 Lone quotes after the ""P directive have a special meaning described in
121 the section named QUALIFIERS below.
122
123 Comments in page names are ignored. Note that if the line defining the
124 page name ends with a ""; comment then the end of line is ignored and
125 the page name effectively continues on the next line. A similar remark
126 holds for ""{ comments which span several lines.
127
128 By convention, the ""P directive of an lgs file should occur at the
129 beginning of the file, possibly after a "";; header and a comment about
130 copyright.
131
133 Each lgs file may contain zero, one, or more ""R directives. Each ""R
134 directive names a page being referenced. The name of the referenced
135 page comprises all characters from the ""R directive until the end of
136 the line or until the first ""n directive, whichever comes first.
137
138 The page named by the first ""R directive is reference number 1, the
139 one named by the second is reference number 2, and so on. Implicitly,
140 the page being defined is considered to be 'reference number 0'.
141
142 Lone quotes after ""R directives have a special meaning described in
143 the section named QUALIFIERS below.
144
145 By convention, all ""R directives should come right after the ""P
146 directive.
147
148 Referenced pages may be pointed at in many different ways. Some
149 examples read:
150 ""R file:/usr/share/logiweb/name/base/vector.lgw
151 ""R file:~/.logiweb/name/base/vector.lgw
152 ""R file:../name/base/vector.lgw
153 ""R http://logiweb.eu/1.0/doc/pages/base/vector.lgw
154 ""R base
155 ""R lgw:017451CF6643931035C71796AC493D382EC8357EE9A390D5D6DBCDAA0806
156 The first three reference Logiweb vectors in the local file system,
157 relative to the root directory, the home directory, and the current
158 directory, respectively. The fourth one references a particular http
159 url. The fifth makes a reference by name which is resolved by the
160 'namepath' parameter of the lgc compiler. The last one uses a Logiweb
161 reference which is resolved by the 'path' parameter of the lgc com‐
162 piler.
163
164 See the 'lgc' Logiweb page or http://logiweb.eu/ for more details on
165 references.
166
168 Each lgs file may contain zero, one, or more ""D directives. Each ""D
169 directive defines zero, one, or more syntactical constructs.
170
171 Each line following a ""D directive and until the first ""P, ""R, ""D,
172 or ""B directive defines one syntactical construct (blank lines are
173 ignored, though).
174
175 In construct definitions, lone quotes serve as placeholders. Three
176 examples of constructs read:
177 " square
178 " < "
179 if " then " else "
180 The constructs above allow one to write expressions like
181 if 2 square < 3 square then 4 else 5
182
183 Each page has a Logiweb reference of about 30 bytes and each construct
184 defined on a page has an index. The first construct defined has index
185 1, the second has index 2, and so on. Implicitly, the page name is also
186 considered to be a construct. The page name has index 0.
187
188 When a page defines a construct, that page is considered to be the
189 'home page' of the construct. Each Logiweb page is identified by its
190 world wide unique Logiweb reference. Each Logiweb construct is uniquely
191 identified by its index together with the reference of its home page.
192
193 By convention, ""D sections come after the ""R sections.
194
196 One may assign a 'charge' to defined constructs. As an example, it is
197 customary to assign a larger charge to addition than to multiplication
198 such that e.g.
199 2 * 3 + 4 * 5
200 means
201 ( 2 * 3 ) + ( 4 * 5 )
202 A charge is the opposite of a priority such that constructs with high
203 charge have low priority and vice versa.
204
205 Charges are expressed as lists of integers, separated by dots. As an
206 example, 2.-3.4 is a charge.
207
208 Charges are sorted lexicographically such that e.g.
209 1.2.-1 < 1.2 < 1.2.2 < 2.1
210 When comparing two charges of different length, the shorter one is
211 padded with zeros at the end. As an example 1.2 and 1.2.0 denote the
212 same charge.
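
The ordering of charges can be sketched in Python as follows. The function names parse_charge and compare_charges are hypothetical. The sketch pads the shorter charge with zeros and then compares element by element, as described above.

```python
from itertools import zip_longest


def parse_charge(text):
    """Turn a charge such as '2.-3.4' into a list of integers."""
    return [int(part) for part in text.split(".")]


def compare_charges(a, b):
    """Return -1, 0, or 1 when charge a is below, equal to, or above b."""
    # zip_longest pads the shorter charge with zeros at the end.
    for x, y in zip_longest(parse_charge(a), parse_charge(b), fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0
```

With these definitions, 1.2.-1 compares below 1.2 because the padded third element 0 exceeds -1.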
213
214 One may include a charge between a ""D directive and the first newline
215 character after it. The charge applies to all constructs introduced by
216 the given ""D section. As an example, the following definitions assign
217 charge 1.6 to multiplication and 1.8 to addition and subtraction:
218 ""D 1.6
219 " * "
220 ""D 1.8
221 " + "
222 " - "
223 One may also give a charge indirectly. As an example, the following
224 assigns the charge of multiplication to division:
225 ""D " * "
226 " / "
227 By convention, all constructs which neither start nor end with a lone
228 quote should have charge zero. The page symbol always has charge zero.
229 If no charge is given after a ""D directive then all constructs defined
230 by the directive get charge zero.
231
232 A charge is said to be odd/even if its last nonzero element is
233 odd/even. As an example, 2.4.6.7.0.0 is odd. As a special case, charge
234 zero is considered to be even.
235
236 Constructs with even charge are preassociative. A preassociative con‐
237 struct is left associative in text written left to right, right asso‐
238 ciative in text written right to left, and counterclockwise associative
239 in text written in clockwise spirals. Constructs with odd charge are
240 postassociative. As an example, if subtraction has charge 1.8 then
241 subtraction is preassociative. Man pages are written left to right, so
242 preassociative means left associative here. Hence,
243 6 - 2 - 3
244 means
245 ( 6 - 2 ) - 3
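
The odd/even rule and the resulting associativity can be sketched as follows. The function names is_odd_charge and associativity are hypothetical. A charge with no nonzero element is treated as even, matching the special case for charge zero.

```python
def is_odd_charge(charge):
    """Return True if the last nonzero element of the charge is odd."""
    elements = [int(part) for part in charge.split(".")]
    nonzero = [element for element in elements if element != 0]
    # Charge zero has no nonzero element and counts as even.
    return bool(nonzero) and nonzero[-1] % 2 != 0


def associativity(charge):
    """Return 'postassociative' for odd charges, else 'preassociative'."""
    return "postassociative" if is_odd_charge(charge) else "preassociative"
```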
246
248 The body of a page comprises all of an lgs file except comments, page
249 name, references, and definitions. By convention, the body comes after
250 the ""D sections.
251
252 The ""B directive may be used to terminate a ""D section. Terminating a
253 ""D section, however, implicitly starts or resumes the body section, so
254 one may think of ""B as a 'body directive'.
255
256 The body of a page is made up of constructs, strings, and body direc‐
257 tives.
258
259 The constructs may be constructs defined on the page itself or con‐
260 structs defined on directly referenced pages. Directly referenced pages
261 are those mentioned in ""R directives, as opposed to transitively ref‐
262 erenced pages which are the directly referenced pages plus the pages
263 transitively referenced by directly referenced pages.
264
266 The lgs language treats all characters almost equally, the exceptions
267 being the characters in the range 0 to 32 (inclusive). Characters with
268 codes 0-8, 11, and 14-31 are ignored. In the body and outside strings,
269 any sequence of spaces (code 32), horizontal tabs (code 9), line feeds
270 (code 10), form feeds (code 12), and carriage returns (code 13) is
271 treated as a single space character. Apart from that, space characters
272 are treated like any other character.
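
The treatment of characters outside strings can be sketched as follows. The function name normalize_body_text is hypothetical, and dropping the ignored characters before collapsing blanks is an assumption about the order of the two steps.

```python
import re

# Codes 0-8, 11, and 14-31 are ignored.
IGNORED = re.compile(r"[\x00-\x08\x0b\x0e-\x1f]")
# Runs of space, tab, line feed, form feed, and carriage return.
BLANKS = re.compile(r"[ \t\n\x0c\r]+")


def normalize_body_text(text):
    """Drop ignored characters, then collapse each blank run to a space."""
    return BLANKS.sub(" ", IGNORED.sub("", text))
```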
273
274 As an example, consider addition:
275 ""D 1.6
276 " + "
277 The definition allows one to interpret
278 2 + 3
279 as the sum of 2 and 3 whereas
280 2+3
281 is unparseable due to missing spaces around the sum sign.
283
285 Strings are arbitrary sequences of characters enclosed in string
286 delimiters. A string can start with a lone quote or with a ""- directive. A
287 string can end with a lone quote or a "". directive.
288
289 The empty string, however, cannot be enclosed in lone quotes since that
290 would produce two double quotes in a row which counts as the beginning
291 of a directive. The "". directive, however, may be used both for ending
292 a string and for representing the empty string. One can always tell
293 from context which meaning "". has. The following four lines all
294 represent an empty string:
295 "".
296 ""-"
297 ""-"".
298 ""-""{Comment""}"
299
300 The lgc compiler applies 'newline translation' to strings: CR, CRLF,
301 LFCR, and FF are translated to LF, TAB characters are translated to
302 space characters, and characters with codes below 32 (Space) other than
303 TAB, LF, FF, and CR are removed. Each TAB character is translated to
304 one and only one space character. To include characters like CR and TAB
305 in strings, one has to use directives.
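
Newline translation as described above can be sketched in Python as follows. The function name translate_string_text is hypothetical, and the sketch is an illustration, not the code used by lgc.

```python
import re


def translate_string_text(text):
    """Apply the newline translation described for strings."""
    # CR, CRLF, LFCR, and FF all become a single LF.
    text = re.sub(r"\r\n|\n\r|\r|\x0c", "\n", text)
    # Each TAB becomes exactly one space.
    text = text.replace("\t", " ")
    # Remaining control characters below 32, other than LF, are removed.
    return re.sub(r"[\x00-\x08\x0b\x0e-\x1f]", "", text)
```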
306
307 Inside strings, one may use the following directives:
308 ""- No character
309 ""! Double quote
310 ""f Form feed
311 ""n Line feed
312 ""r Carriage return
313 ""t Horizontal tab
314 ""x Characters given in hexadecimal (until period)
315
316 As an example of use of the ""x directive, "I""x4A4B4C.M" means
317 "IJKLM".
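
The example can be reproduced with the following Python sketch. The function name expand_hex_directives is hypothetical. Reading the digits as pairs, one byte per pair, and decoding the bytes as UTF-8 are assumptions that fit the example; the text above does not state them.

```python
import re


def expand_hex_directives(text):
    """Replace each hex directive by the characters it encodes."""
    def decode(match):
        # Assumption: two hex digits per byte, bytes decoded as UTF-8.
        return bytes.fromhex(match.group(1)).decode("utf-8")

    return re.sub(r'""x([0-9A-Fa-f]*)\.', decode, text)
```

An odd number of hex digits makes bytes.fromhex raise ValueError in this sketch; how lgc treats such input is not covered here.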
318
320 The directives that can be used in the body are:
321 ""# (until lone quote) include given file verbatim as a string
322 ""$ (until lone quote) same, but with newline processing
323 ""S include the lgs source text itself as a string
324 ""N include name definitions
325 ""C include charge definitions
326
327 For details on these directives, consult the lgc Logiweb page or
328 http://logiweb.eu/. A short list of examples follows, however:
329 ""#logiweb.png"
330 Include the Logiweb icon as a string of raw bytes. Keep the bytes as
331 they are.
332 ""$README"
333 Include the given README as a string and apply newline translation to
334 it.
335 ""S
336 Include the lgs source file itself as a string. Inclusion is like ""#
337 but with a twist: If the lgs file does not start with a header, a line
338 containing nothing but "";; is prepended. And if the lgs file does
339 start with a header then all hex digits in the header are removed. The
340 latter ensures that an lgs file with a header gives the same result if
341 translated twice. The former ensures that if the source.lgs file gener‐
342 ated as part of the rendering is retranslated then the result is iden‐
343 tical to the result of the first translation.
344
345 A README consists of plain text, so it is reasonable to apply newline
346 processing. A png file contains binary data, so translation of CR to LF
347 could corrupt the file.
348
349 It is debatable how e.g. an html file should be included. An html file
350 is near-plain without being completely plain. Furthermore, the html
351 standard specifies CRLF to be used as line terminator. One may choose
352 to include it with newline processing in which case one should remember
353 to translate back to CRLF if writing it back to disk. Or one may choose
354 to include it raw and consider the CRLFs to be part of the html format.
355
356 Note that lgs has nothing which resembles #include of the C programming
357 language: The three include directives of Logiweb only allow a file to be
358 included as a single string. Beta-test versions of Logiweb had a
359 #include-like feature, but the feature has been removed.
360
361 The ""N directive expands into a list of definitions which records the
362 relationship between construct indexes and construct names. The ""C
363 directive expands into a list of definitions which records the rela‐
364 tionship between construct indexes and construct charges. The body of a
365 page should include one ""N and one ""C directive placed in a suitable
366 context. Otherwise, information about construct names and charges is
367 lost in translation. Look at the lgs sources of the pages that come
368 with Logiweb for examples of how to use ""N and ""C.
369
371 When referencing pages one may run into the problem that two distinct
372 constructs may have the same name. To cope with that, ""R directives
373 allow constructs to be qualified.
374
375 Qualifiers modify constructs as they are imported. After the ""R
376 directive, one may list an arbitrary number of qualifiers before the
377 reference, separated by lone quotes.
378
379 As an example, suppose the base page defines these constructs:
380 if " then " else "
381 " + "
382 Furthermore, suppose a page references the base page using the
383 following reference:
384 ""R abc " def " base
385 The reference is to the base page and has qualifiers abc and def.
386
387 With the reference above, one may refer to the if-then-else and the
388 addition constructs under these names:
389 abc if " then " else "
390 def if " then " else "
391 " abc + "
392 " def + "
393
394 One may include the empty qualifier in the list of qualifiers. If the
395 empty qualifier is included, it has to appear first. As an example, the
396 reference
397 ""R" abc " def " base
398 allows one to reference the if-then-else construct under these names:
399 if " then " else "
400 abc if " then " else "
401 def if " then " else "
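
The names in the examples above can be reproduced with the following Python sketch. The function name qualified_names is hypothetical, and the placement rule is inferred from the examples alone: the qualifier goes in front of the construct, or right after a leading placeholder. The authoritative rules, including the handling of spaces, are on the lgc Logiweb page.

```python
def qualified_names(construct, qualifiers):
    """List the names a construct gets under the given qualifiers."""
    names = []
    for qualifier in qualifiers:
        if qualifier == "":
            # The empty qualifier leaves the name unchanged.
            names.append(construct)
        elif construct.startswith('" '):
            # Inferred rule: insert after a leading placeholder.
            names.append('" ' + qualifier + " " + construct[2:])
        else:
            names.append(qualifier + " " + construct)
    return names
```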
402
403 As can be seen, each construct may be known under more than one name
404 and distinct constructs may have the same name. If a name belongs to
405 more than one construct, then lgc will protest if that name is used in
406 the body.
407
408 For more on qualifiers, including handling of spaces, see the lgc Logi‐
409 web page or http://logiweb.eu.
410
412 The frontend of the lgc compiler translates an lgs source text into a
413 Logiweb vector. The Logiweb vector consists of a bibliography, a
414 dictionary, and a body, cf. logiweb(5). The bibliography consists of the
415 references of all referenced pages, starting with reference zero (the
416 reference of the page itself). The dictionary records the relationship
417 between construct indexes and construct arities. The arity of a con‐
418 struct equals the number of lone quotes in the construct. The body is
419 no more than the parse tree of the body expressed in Polish prefix.
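
The arity rule can be sketched as follows. The function name arity is hypothetical. The sketch counts runs of exactly one double quote, so that multiquotes are not mistaken for placeholders.

```python
import re


def arity(construct):
    """Count the lone quotes (placeholders) in a construct definition."""
    runs = re.findall(r'"+', construct)
    return sum(1 for run in runs if len(run) == 1)
```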
420
421 The codifier of the lgc compiler translates the vector to a rack. The
422 renderer of the lgc compiler then translates the rack to a rendering.
423 These translations have little to do with the lgs format.
424
425 See the lgc Logiweb page or http://logiweb.eu/ for more.
426
428 Klaus Grue, http://logiweb.eu/
429
431 lgc(1), logiweb(5), http://logiweb.eu/
432
433
434
435Logiweb JULY 2009 LGC(5)