1LOGIWEB(5)                File formats and protocols                LOGIWEB(5)
2
3
4

NAME

6       logiweb - Logiweb file formats and protocols
7

DESCRIPTION

9       Logiweb  is  a  system  for storing, locating, and transmitting Logiweb
10       pages. Logiweb pages may contain free text mixed with machine  intelli‐
11       gible objects like computer programs, testsuites, and formal proofs.
12
13       Logiweb  defines  a referencing scheme in which each Logiweb page has a
14       unique Logiweb reference. A Logiweb reference is  typically  around  30
15       bytes  long.  A  Logiweb  reference contains, among other, a RIPEMD-160
16       hash key of the referenced page.
17
18       The Logiweb standard comprises the following file formats  and  network
19       protocols.
20
21           The Logiweb vector format
22           The Logiweb rack format
23           The Logiweb protocol
24
25       The vector format is a condensed representation suited for transmission
26       over the Internet. Furthermore, the authenticity of a Logiweb  page  in
27       vector format can be verified using the RIPEMD-160 hash function. Logi‐
28       web vectors can be retrieved from untrusted repositories and  transmit‐
29       ted over untrusted networks.
30
31       The  rack  format is a comprehensive, precompiled representation suited
32       for storing on a local disk. Furthermore, the rack  format  is  a  good
33       starting  point  for  conversion to other formats. The Logiweb Abstract
34       Machine lgwam(1) needs pages in rack format in order  to  execute  pro‐
35       grams defined on the pages.
36
37       Earlier  versions  of Logiweb supported XML, MathML, Lisp S-expressions
38       and several other formats. The current version leaves such  support  to
39       the  user  through  the  capabilities for user defined rendering of the
40       Logiweb compiler lgc(1). Only support for latex(1), bibtex(1),  makein‐
41       dex(1), dvipdfm(1), and cc(1) has survived in the present version.
42
43       The  Logiweb protocol allows to locate a Logiweb page given its Logiweb
44       reference. More specificially, the Logiweb protocol allows to translate
45       a  Logiweb  reference  into  an  http  URL.  Then  http may be used for
46       retrieving the referenced page and RIPEMD-160 may be used for verifying
47       the authenticity of page.
48
49       Logiweb  version 0.1.1 to 0.1.10 included a Logiweb server which imple‐
50       mented the Logiweb protocol. From Version 1.0.1, the Logiweb server  is
51       excluded  from  the  distribution.  That is because installation of the
52       Logiweb server requires considerable privileges and network insight.
53
54       For Version 1.0.1 a Logiweb server does not even exist,  and  users  of
55       the  Logiweb  compiler  lgc(1) are refered to PATH based translation of
56       references to http URLs. Furthermore, the Logiweb protocol will change.
57       That  is  because  it has proved to be too difficult to get an assigned
58       UDP port and thus the protocol will be modified  to  be  channeled  via
59       http. That will reduce efficiency but also has some benefits.
60
61       The  Logiweb  server  will  be distributed separately at a later stage.
62       When Logiweb servers are installed in the  future,  repositories  which
63       run  a  server will appear to lgc(1) users to contain all Logiweb pages
64       in the world. The present but obsolete Logiweb protocol  is  documented
65       in logiweb(7).
66

BITS, BYTES, AND VECTORS

68       The  file  formats  and protocols are byte oriented, so we first define
69       bytes and vectors (where 'vector' means 'byte vector').
70
71           bit           ::= "0" | "1"
72           byte          ::= bit*8
73           vector        ::= byte*
74
75       A byte denotes a value in 0..255. We refer to the least significant bit
76       of a byte as bit zero and to the most significant bit as bit seven.
77
78       Bytes  of  vectors are given in network order which is identical to the
79       order in which they are stored on disk. We refer to the first byte of a
80       vector as byte zero.
81
82       We  write vectors in mixed endian binary: each byte is written with the
83       most significant bit first and bytes a written in network order.
84
85       We refer to bit m of byte n of a vector as bit  m+8*n  of  the  vector.
86       Thus,  the  bits of a two byte vector are given in this order: 7, 6, 5,
87       4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8. This order gives  rise  to
88       the term 'mixed endian'.
89
90       Mixed  endian  notation  resembles  Internet dot notation. Internet dot
91       notation specifies bytes to be written in network order and  each  byte
92       to be written with the most significant digit first.
93
94       As  an  example,  mixed endian binary 0000 0010 0000 0011 denotes a two
95       byte followed by a three byte.
96

CARDINALS AND STRINGS

98       We shall refer to non-negative integers as cardinals. We represent car‐
99       dinals and byte strings thus:
100
101           septet        ::= bit*7
102           middle-septet ::= "1" septet
103           end-septet    ::= "0" septet
104           cardinal      ::= middle-septet* end-septet
105           string        ::= length vector
106           length        ::= cardinal
107
108       A middle or end septet denotes a value in 0..127. As examples, the mid‐
109       dle septet 1000 0010 and the end septet 0000 0010 both denote two.
110
111       Cardinals are expressed little endian base 128. As an example,
112
113           1000 0011 0000 0010
114
115       denotes 3+128*2 = 3+256 = 259.
116
117       A string encodes a sequence of bytes. The length field  of  the  string
118       indicates the number of bytes in the string.
119
120       The  grammar  is  *not*  context  free  because  the  strings cannot be
121       expressed in a context free manner.
122

VECTOR FORMAT

124       The vector format of Logiweb pages has the following syntax:
125
126           pagevector    ::= bibliography dictionary body
127           bibliography  ::= reference reference* zero
128           reference     ::= string         ; the string must be non-empty
129           zero          ::= cardinal       ; the cardinal must denote zero
130           dictionary    ::= entry* zero
131           entry         ::= index arity
132           index         ::= nonzero
133           nonzero       ::= cardinal       ; the cardinal must be non-zero
134           arity         ::= cardinal
135           body          ::= vector
136
137       The indexes of the dictionary must be strictly decreasing.
138
139       We refer to the first reference of the bibliography as reference  zero.
140       Reference zero must be the reference of the page itself.
141
142       Above, references are described as strings. The following definition is
143       more precise:
144
145           reference     ::= length scheme ripemd timestamp
146           scheme        ::= "0000 0001"
147           ripemd        ::= byte*20
148           timestamp     ::= mantissa exponent
149           mantissa      ::= cardinal
150           exponent      ::= cardinal
151
152       The length must equal the total number of bytes in scheme, ripemd,  and
153       timestamp.
154
155       Whenever  a  Logiweb  page  is  published,  the  time of publication is
156       included as a timestamp in the reference of the page. The timestamp  is
157       given in Logiweb time as M*10^(-E) where M and E are the values of man‐
158       tissa and exponent, respectively.
159
160       Logiweb time indicates the number of seconds which have elapsed accord‐
161       ing  to  International Atomic Time (TAI) since TAI:00:00:00 of Modified
162       Julian  Day  (MJD)  Zero.  MJD-0  corresponds  to  the  Gregorian  date
163       GRD-1858-11-17. In 1858, UTC lacked 10 seconds behind TAI (even thought
164       they did not know at that time). Hence, Logiweb time measures the  num‐
165       ber   of   seconds   according   to  International  Atomic  Time  since
166       UTC:23:59:50 of GRD-1858-11-16.
167
168       The ripemd code consists of 20 bytes which corresponds to 160 bits. The
169       ripemd  code of a page must be computed using the RIPEMD-160 hash func‐
170       tion applied to all bytes following ripemd, including the timestamp  of
171       the page itself.
172
173       The  body represents a parse tree whose nodes may be Logiweb symbols or
174       strings. A Logiweb symbol comprises a Logiweb reference  and  an  index
175       where  the  index is a cardinal. To be a valid symbol, the index of the
176       symbol must occur in the dictionary of the referenced page.  The  arity
177       of a symbol is the arity assigned to the symbol by that dictionary.
178
179       Transformation  of the body given as a vector to a parse tree is called
180       'unpacking'. Reference one of a page may define  an  unpacker,  i.e.  a
181       user defined unpacking function. See the 'base' and 'lgc' Logiweb pages
182       which come with the distribution for details on this.
183
184       If reference one of a page does not define an unpacker then the body is
185       in default format which is as follows:
186
187           body          ::= node*
188           node*         ::= symbol | zero string
189           symbol        ::= nonzero
190
191       The  nodes  of the body are given in Polish prefix. The arities of each
192       symbol is taken from dictionaries of the relevant  page.  Strings  have
193       arity zero.
194
195       Each  symbol has value 1+R+n*i where i is the index of the symbol, n is
196       the number of references in the bibliography, and  R  is  the  relative
197       reference of the symbol. The relative reference R if between 0 and n-1.
198       The reference of a symbol is the R'th element of the bibliography.
199

RACK FORMAT

201       The rack of a page is a structure which represents a  compiled  version
202       of the page. See the 'base' and 'lgc' pages which come with the distri‐
203       bution for more on this.
204
205       A rack is an inhomogeneous array, i.e. and  array  whose  elements  may
206       have  different  types.  Indexes  of a rack are called hooks. Racks and
207       hooks are to Logiweb what records and record fields  are  to  languages
208       like C.
209
210       Racks  may  reside  in memory or may be stored to disk. Racks in memory
211       contain more data than racks stored to disk. Hence,  racks  are  pruned
212       when  written  to  disk  and the missing parts have to be reconstructed
213       when reading the racks back into memory.
214
215       Pruning affects three hooks of a rack.
216
217       First, a rack in memory has a code hook which  contains  compiled  ver‐
218       sions  of  value  definitions. To keep disk racks architecture indepen‐
219       dent, the code hook is removed when writing a rack to  disk  and  value
220       definitions  have  to  be  recompiled when reading the rack back. Using
221       racks still represent a great saving in time since no parsing,  unpack‐
222       ing, and macro expansion is needed when loading a rack.
223
224       Second, a rack in memory has a cluster hook which contains racks of all
225       transitively referenced pages. This hook is removed since it contains a
226       lot  of  data  which is fast and easy to reconstruct from the disk ver‐
227       sions of transitively referenced pages.
228
229       Third and finaly, a rack in memory has a diagnose hook.  The  value  of
230       that  hook  is 'maptagged'. A maptag allows to delay computation of the
231       diagnose in an otherwise eager process. When writing a  rack  to  disk,
232       computation  of  the diagnose is forced and the maptag is removed. When
233       reading the rack back from disk, a maptag  is  added  to  the  diagnose
234       hook.  Again, see the 'base' and 'lgc' pages that come with the distri‐
235       bution for details.
236
237       Once a rack is pruned for code hook, cluster hook, and diagnose maptag,
238       it  is  a  data structure which is built up from only three data types:
239       cons, cardinals, and a special value named T which corresponds to  NULL
240       in C.
241
242       Pruned racks are stored in the following format:
243
244           rack          ::= racknode* count
245           count         ::= cardinal       ; three plus the number of nodes
246           racknode      ::= card1 | card2 | cons
247           card1         ::= zero cardinal
248           card2         ::= two string
249           two           ::= cardinal       ; the cardinal must represent 2
250           cons          ::= head tail
251           head          ::= cardinal
252           tail          ::= cardinal
253
254       A  rack  with  no nodes represents the value T. Racks with at least one
255       node represents the value represented by the last node. Each node  rep‐
256       resents a value as indicated in the following.
257
258       A card1 node represents the given cardinal.
259
260       A  card2  node represents a cardinal as follows: Take the given string.
261       Discard the length field to obtain a vector. Append a byte  with  value
262       one  to the vector. Then interpret the vector as a little endian cardi‐
263       nal base 256.
264
265       A cons represents the pair of the given head and  tail.  A  head  whose
266       value  is one represents T. A head whose value H is greater than 2 rep‐
267       resents the value of node H-3. The tail is interpretted similarly.  The
268       head  and tail must equal one or reference nodes that appear before the
269       cons.
270

RACK WRITING CONVENTIONS

272       As can be seen, cardinals can be represented in two ways. All cardinals
273       can be represented as a card1 whereas only some cardinals can be repre‐
274       sented as a card2. When writing racks to disk, one should use the card2
275       representation when possible and the card1 representation otherwise.
276
277       The  representation  allows  to represent sharing in that a node can be
278       referenced from more than one cons node. When writing a rack  to  disk,
279       one  should  traverse the rack left-to-right, root-to-leaves, and depth
280       first. One should maximize sharing such that no two nodes represent the
281       same value.
282
283       If  followed,  the  rack  writing conventions ensure that the same rack
284       will be written the same way on distinct systems. That simplifies test‐
285       ing but is not necessary for reading racks back to memory.
286

AUTHOR

288       Klaus Grue, http://logiweb.eu/
289

SEE ALSO

291       lgc(1), lgwam(1), lgc(5), lgc.conf(5), logiweb(7)
292
293
294
295Logiweb                            JULY 2009                        LOGIWEB(5)
Impressum