Scanf(3)                         OCaml library                        Scanf(3)

NAME

       Scanf - Formatted input functions.

Module

       Module   Scanf

Documentation

       Module Scanf
        : sig end


       Formatted input functions.


       === Introduction ===


       === Functional input with format strings ===


       The module Scanf provides formatted input functions, or scanners.
       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters.
       The most general source of characters is named a scanning buffer and
       has type Scanf.Scanning.scanbuf. The most general formatted input
       function reads from any scanning buffer and is named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:
       - the first argument is a source of characters for the input,
       - the second argument is a format string that specifies the values
         to read,
       - the third argument is a receiver function that is applied to the
         values read.

       Hence, a typical call to the formatted input function Scanf.bscanf
       is bscanf ib fmt f, where:
       - ib is a source of characters (typically a scanning buffer with
         type Scanf.Scanning.scanbuf),
       - fmt is a format string (the same format strings as those used to
         print material with module Printf or Format),
       - f is a function that has as many arguments as the number of
         values to read in the input.


       === A simple example ===


       As suggested above, the expression bscanf ib "%d" f reads a decimal
       integer n from the source of characters ib and returns f n.

       For instance,
       - if we use stdib as the source of characters (Scanf.Scanning.stdib
         is the predefined input buffer that reads from standard input),
       - if we define the receiver f as let f x = x + 1,

       then bscanf stdib "%d" f reads an integer n from the standard input
       and returns f n (that is n + 1). Thus, if we evaluate bscanf stdib
       "%d" f, and then enter 41 at the keyboard, we get 42 as the final
       result.
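
       The same call can be tried without a keyboard by scanning from a
       string instead of stdib. This is only a minimal sketch: it assumes
       the function Scanf.Scanning.from_string (part of the Scanning
       module, whose signature is elided on this page) to build the
       scanning buffer, and the names ib and answer are chosen purely for
       illustration.

```ocaml
(* Receiver function from the text: adds one to the integer read. *)
let f x = x + 1

(* Assumption: a scanning buffer built from a string stands in for stdib. *)
let ib = Scanf.Scanning.from_string "41"

(* Reads the decimal integer 41 from ib and returns f 41. *)
let answer = Scanf.bscanf ib "%d" f

let () = Printf.printf "%d\n" answer
```

       Replacing ib with Scanf.Scanning.stdib should give the interactive
       behaviour described above.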


       === Formatted input as a functional feature ===


       The Caml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet
       more powerful:
       - the formatted input functions are higher-order functionals, and
         parameters are passed by regular function application rather
         than by the variable-assignment mechanism that is typical of
         formatted input in imperative languages;
       - the Caml format strings also feature useful additions to easily
         define complex tokens;
       - as expected within a functional programming language, the
         formatted input functions also support polymorphism, in
         particular arbitrary interaction with polymorphic user-defined
         scanners.

       Furthermore, the Caml formatted input facility is fully type-checked
       at compile time.


       module Scanning : sig end


       Scanning buffers


       === Type of formatted input functions ===


       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.scanbuf, 'b, 'c, 'a ->
       'd, 'd) format6 -> 'c


       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is
       the type of a formatted input function that reads from some scanning
       buffer according to some format string; more precisely, if scan is
       some formatted input function, then scan ib fmt f applies f to the
       arguments specified by the format string fmt, when scan has read
       those arguments from the scanning input buffer ib.

       For instance, the scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       stdib: scanf fmt f applies f to the arguments specified by fmt,
       reading those arguments from stdin as expected.

       If the format fmt has some %r indications, the corresponding input
       functions must be provided before the receiver f argument. For
       instance, if read_elem is an input function for values of type t,
       then bscanf ib "%r;" read_elem f reads a value v of type t followed
       by a ';' character, and returns f v.
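
       A small sketch of the %r mechanism follows. The reader read_elem
       below is a hypothetical input function for integers (the type t of
       the text is taken to be int here), and Scanf.Scanning.from_string is
       assumed in order to build the buffer from a string.

```ocaml
(* Hypothetical element reader: skips leading whitespace, reads an int. *)
let read_elem ib = Scanf.bscanf ib " %d" (fun n -> n)

(* Assumption: the input is held in a string-backed scanning buffer. *)
let ib = Scanf.Scanning.from_string "12; 34;"

(* Each call reads one element with read_elem, then matches the ';'. *)
let first = Scanf.bscanf ib "%r;" read_elem (fun v -> v)
let second = Scanf.bscanf ib "%r;" read_elem (fun v -> v)

let () = Printf.printf "%d %d\n" first second
```

       Note that read_elem is passed before the receiver, in the position
       that the %r conversion occupies in the format.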


       exception Scan_failure of string


       The exception that formatted input functions raise when the input
       cannot be read according to the given format.


       === The general formatted input function ===


       val bscanf : Scanning.scanbuf -> ('a, 'b, 'c, 'd) scanner


       bscanf ib fmt r1 ... rN f reads arguments for the function f, from
       the scanning buffer ib, according to the format string fmt, and
       applies f to these values. The result of this call to f is returned
       as the result of the entire bscanf call. For instance, if f is the
       function fun s i -> i + 1, then Scanf.sscanf "x = 1" "%s = %i" f
       returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       argument corresponding to a %r conversion.
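
       The instance above can be checked directly. In this sketch the
       first parameter of f is written _s only to mark it as unused; the
       name result is illustrative.

```ocaml
(* Receiver from the text: ignores the string, increments the integer. *)
let f _s i = i + 1

(* %s reads "x", the literal " = " is matched, then %i reads 1. *)
let result = Scanf.sscanf "x = 1" "%s = %i" f

let () = Printf.printf "%d\n" result
```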


       === Format string description ===


       The format is a character string which contains three types of
       objects:
       - plain characters, which are simply matched with the characters of
         the input,
       - conversion specifications, each of which causes reading and
         conversion of one argument for the function f,
       - scanning indications to specify boundaries of tokens.


       === The space character in format strings ===


       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two
       characters are special exceptions to this rule: the space character
       (' ' or ASCII code 32) and the line feed character ('\n' or ASCII
       code 10).

       A space does not match a single space character, but any amount of
       ``whitespace'' in the input. More precisely, a space inside the
       format string matches any number of tab, space, line feed and
       carriage return characters. Similarly, a line feed character in the
       format string matches either a single line feed or a carriage
       return followed by a line feed.

       Since it matches any amount of whitespace, a space in the format
       string also matches no whitespace at all; hence, the call bscanf ib
       "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
       input with various whitespace in it, such as "Price = 1 $",
       "Price  =  1    $", or even "Price=1$".
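
       The whitespace rule can be observed with sscanf on the three inputs
       mentioned above. The helper name price is illustrative only.

```ocaml
(* Scans one input string with the format discussed in the text. *)
let price input = Scanf.sscanf input "Price = %d $" (fun p -> p)

(* Normal spacing, extra spacing, and no spacing at all. *)
let prices =
  List.map price [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]

let () = List.iter (Printf.printf "%d\n") prices
```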


       === Conversion specifications in format strings ===


       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and followed by one or two
       conversion characters. The conversion characters and their meanings
       are:
       - d: reads an optionally signed decimal integer.
       - i: reads an optionally signed integer (usual input formats for
         hexadecimal (0x[d]+ and 0X[d]+), octal (0o[d]+), and binary
         (0b[d]+) notations are understood).
       - u: reads an unsigned decimal integer.
       - x or X: reads an unsigned hexadecimal integer.
       - o: reads an unsigned octal integer.
       - s: reads a string argument that spreads as much as possible, until
         the following bounding condition holds: a whitespace has been
         found, a scanning indication has been encountered, or the
         end-of-input has been reached. Hence, this conversion always
         succeeds: it returns an empty string, if the bounding condition
         holds when the scan begins.
       - S: reads a delimited string argument (delimiters and special
         escaped characters follow the lexical conventions of Caml).
       - c: reads a single character. To test the current input character
         without reading it, specify a null field width, i.e. use
         specification %0c. Raise Invalid_argument, if the field width
         specification is greater than 1.
       - C: reads a single delimited character (delimiters and special
         escaped characters follow the lexical conventions of Caml).
       - f, e, E, g, G: reads an optionally signed floating-point number in
         decimal notation, in the style dddd.ddd e/E+-dd.
       - F: reads a floating point number according to the lexical
         conventions of Caml (hence the decimal point is mandatory if the
         exponent part is not mentioned).
       - B: reads a boolean argument (true or false).
       - b: reads a boolean argument (for backward compatibility; do not
         use in new programs).
       - ld, li, lu, lx, lX, lo: reads an int32 argument to the format
         specified by the second letter (decimal, hexadecimal, etc).
       - nd, ni, nu, nx, nX, no: reads a nativeint argument to the format
         specified by the second letter.
       - Ld, Li, Lu, Lx, LX, Lo: reads an int64 argument to the format
         specified by the second letter.
       - [ range ]: reads characters that match one of the characters
         mentioned in the range of characters range (or not mentioned in
         it, if the range starts with ^). Reads a string that can be
         empty, if the next input character does not match the range. The
         set of characters from c1 to c2 (inclusively) is denoted by
         c1-c2. Hence, %[0-9] returns a string representing a decimal
         number or an empty string if no decimal digit is found;
         similarly, %[\\048-\\057\\065-\\070] returns a string of
         hexadecimal digits. If a closing bracket appears in a range, it
         must occur as the first character of the range (or just after
         the ^ in case of range negation); hence []] matches a ]
         character and [^]] matches any character that is not ].
       - r: user-defined reader. Takes the next ri formatted input
         function and applies it to the scanning buffer ib to read the
         next argument. The input function ri must therefore have type
         Scanning.scanbuf -> 'a and the argument read has type 'a.
       - { fmt %}: reads a format string argument. The format string read
         must have the same type as the format string specification fmt.
         For instance, "%{%i%}" reads any format string that can read a
         value of type int; hence Scanf.sscanf "fmt:\"number is %u\""
         "fmt:%{%i%}" succeeds and returns the format string "number is
         %u".
       - ( fmt %): scanning format substitution. Reads a format string to
         replace fmt. The format string read must have the same type as
         the format string specification fmt. For instance, "%(%i%)"
         reads any format string that can read a value of type int; hence
         Scanf.sscanf "\"%4d\"1234.00" "%(%i%)" is equivalent to
         Scanf.sscanf "1234.00" "%4d".
       - l: returns the number of lines read so far.
       - n: returns the number of characters read so far.
       - N or L: returns the number of tokens read so far.
       - !: matches the end of input condition.
       - %: matches one % character in the input.
       - ,: the no-op delimiter for conversion specifications.

       Following the % character that introduces a conversion, there may be
       the special flag _: the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, then Scanf.sscanf "x = 1" "%_s = %i" f returns 2.

       The field width is composed of an optional integer literal
       indicating the maximal width of the token to read. For instance,
       %6d reads an integer, having at most 6 decimal digits; %4f reads a
       float with at most 4 characters; and %8[\\000-\\255] returns the
       next 8 characters (or all the characters still available, if fewer
       than 8 characters are available in the input).

       Notes:
       - as mentioned above, a %s conversion always succeeds, even if
         there is nothing to read in the input: it simply returns "".
       - in addition to the relevant digits, '_' characters may appear
         inside numbers (this is reminiscent of the usual Caml lexical
         conventions). If stricter scanning is desired, use the range
         conversion facility instead of the number conversions.
       - the scanf facility is not intended for heavy duty lexical
         analysis and parsing. If it appears not expressive enough for
         your needs, several alternatives exist: regular expressions
         (module Str), stream parsers, ocamllex-generated lexers,
         ocamlyacc-generated parsers.
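
       A few of the conversions listed above are illustrated below with
       sscanf. This is a sketch rather than an exhaustive tour: the input
       strings and binding names are invented for the example, and each
       binding exercises exactly one conversion.

```ocaml
(* d: optionally signed decimal integer. *)
let dec = Scanf.sscanf "-17" "%d" (fun n -> n)

(* i: the 0x prefix selects hexadecimal notation. *)
let hex_prefixed = Scanf.sscanf "0x1F" "%i" (fun n -> n)

(* x: unsigned hexadecimal integer, no prefix needed. *)
let hex_plain = Scanf.sscanf "ff" "%x" (fun n -> n)

(* S: string delimited by double quotes, so it may contain a space. *)
let quoted = Scanf.sscanf "\"a b\" rest" "%S" (fun s -> s)

(* c, f and B: a character, a float and a boolean. *)
let ch = Scanf.sscanf "A" "%c" (fun c -> c)
let fl = Scanf.sscanf "3.5" "%f" (fun x -> x)
let flag = Scanf.sscanf "true" "%B" (fun b -> b)

(* [ range ]: the longest prefix made of decimal digits. *)
let digits = Scanf.sscanf "2024abc" "%[0-9]" (fun s -> s)

(* _ flag: the string is scanned but not passed to the receiver. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)

(* Field width: at most 6 digits are read. *)
let narrow = Scanf.sscanf "123456789" "%6d" (fun n -> n)
```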


       === Scanning indications in format strings ===


       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some constant character c.
       It means that the string token should end just before the next
       matching c (which is skipped). If no c character is encountered, the
       string token spreads as much as possible. For instance, "%s@\t"
       reads a string up to the next tab character or to the end of input.
       If a scanning indication @c does not follow a string conversion, it
       is treated as a plain c character.

       Note:
       - the scanning indications introduce slight differences in the
         syntax of Scanf format strings, compared to those used for the
         Printf module. However, the scanning indications are similar to
         those used in the Format module; hence, when producing formatted
         text to be scanned by Scanf.bscanf, it is wise to use printing
         functions from the Format module (or, if you need to use
         functions from Printf, banish or carefully double check the
         format strings that contain '@' characters).
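
       As an illustration of a scanning indication, the sketch below
       splits a string at the first ':' character. The delimiter, the
       input and the name split_pair are assumptions made for the example;
       the input deliberately contains no whitespace.

```ocaml
(* %s@: reads up to the next ':' and skips it; the second %s reads
   the remainder of the input. *)
let split_pair input =
  Scanf.sscanf input "%s@:%s" (fun key value -> (key, value))

let pair = split_pair "key:value"

let () = Printf.printf "%s / %s\n" (fst pair) (snd pair)
```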


       === Exceptions during scanning ===


       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:
       - Raise Scanf.Scan_failure if the input does not match the format.
       - Raise Failure if a conversion to a number is not possible.
       - Raise End_of_file if the end of input is encountered while some
         more characters are needed to read the current conversion
         specification.
       - Raise Invalid_argument if the format string is invalid.

       Note:
       - as a consequence, scanning a %s conversion never raises exception
         End_of_file: if the end of input is reached the conversion
         succeeds and simply returns the characters read so far, or "" if
         none were read.
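
       One possible way to handle these exceptions is sketched below. The
       function classify and its result strings are hypothetical; the
       sketch only shows where each exception of the list above would be
       caught, and that %s succeeds on empty input.

```ocaml
(* Tries to read one integer and reports how the scan ended. *)
let classify input =
  try Scanf.sscanf input "%d" (fun n -> Printf.sprintf "ok %d" n) with
  | Scanf.Scan_failure _ -> "scan failure"   (* input does not match *)
  | Failure _ -> "failure"                   (* number conversion failed *)
  | End_of_file -> "end of file"             (* input ended too early *)

let good = classify "12"
let mismatch = classify "abc"
let empty = classify ""

(* A %s conversion on empty input returns "" rather than raising. *)
let empty_string = Scanf.sscanf "" "%s" (fun s -> s)
```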


       === Specialised formatted input functions ===


       val fscanf : Pervasives.in_channel -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given channel.

       Warning: since all formatted input functions operate from a scanning
       buffer, be aware that each fscanf invocation will operate with a
       scanning buffer reading from the given channel. This extra level of
       buffering can lead to strange scanning behaviour if you use low
       level primitives on the channel (reading characters, seeking the
       reading position, and so on).

       As a consequence, never mix direct low level reading and high level
       scanning from the same input channel.


       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.
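
       A typical use of sscanf is to take apart a short, fixed-layout
       string. The sketch below assumes a date written as
       year-month-day; the name parse_date is illustrative.

```ocaml
(* The '-' characters in the format are plain characters that must
   match the input exactly; each %d reads one decimal integer. *)
let parse_date s = Scanf.sscanf s "%d-%d-%d" (fun y m d -> (y, m, d))

let date = parse_date "2010-01-29"

let () =
  let y, m, d = date in
  Printf.printf "%d %d %d\n" y m d
```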


       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined scanning buffer
       Scanf.Scanning.stdib that is connected to stdin.


       val kscanf : Scanning.scanbuf -> (Scanning.scanbuf -> exn -> 'a) ->
       ('b, 'c, 'd, 'a) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the scanning buffer and the exception
       that aborted the scanning process.
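
       The error handler makes it possible to turn a failed scan into an
       ordinary value. In the sketch below, the handler ignores both of
       its arguments and returns None; the function parse_int is
       hypothetical and Scanf.Scanning.from_string is assumed for building
       the buffer.

```ocaml
(* Returns Some n when an integer can be read, None otherwise. *)
let parse_int s =
  let ib = Scanf.Scanning.from_string s in
  Scanf.kscanf ib
    (fun _ib _exn -> None)   (* ef: called if scanning fails *)
    "%d"
    (fun n -> Some n)        (* receiver: called on success *)

let ok = parse_int "7"
let bad = parse_int "oops"
```

       Both the handler and the receiver must return the same type, here
       int option.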


       === Reading format strings from input ===


       val bscanf_format : Scanning.scanbuf -> ('a, 'b, 'c, 'd, 'e, 'f)
       format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g


       bscanf_format ib fmt f reads a format string token from the scanning
       buffer ib, according to the given format string fmt, and applies f
       to the resulting format string value. Raise Scan_failure if the
       format string value read does not have the same type as fmt.


       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.
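
       As a sketch of how a format string read from input might be used,
       the example below reads a format from a string and hands it to
       Printf.sprintf. It assumes that the format string token is written
       between double quotes, like a Caml string literal; the name render
       and the sample text are invented for the example.

```ocaml
(* The input holds the quoted token "%d items"; the specification "%i"
   requires a format that handles exactly one int. *)
let render n =
  Scanf.sscanf_format "\"%d items\"" "%i"
    (fun fmt -> Printf.sprintf fmt n)

let text = render 3

let () = print_endline text
```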


       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6
       -> ('a, 'b, 'c, 'd, 'e, 'f) format6


       format_from_string s fmt converts a string argument to a format
       string, according to the given format string fmt. Raise Scan_failure
       if s, considered as a format string, does not have the same type as
       fmt.
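
       The sketch below shows both outcomes described above: a string
       whose type agrees with the specification, and one whose type does
       not. The names render and mismatch_rejected are illustrative.

```ocaml
(* "%d items" and "%i" both handle one int, so the conversion succeeds
   and the result can be used as a Printf format. *)
let render n = Printf.sprintf (Scanf.format_from_string "%d items" "%i") n

(* "%s" handles a string, not an int, so Scan_failure is expected. *)
let mismatch_rejected =
  try
    ignore (Scanf.format_from_string "%s" "%i");
    false
  with Scanf.Scan_failure _ -> true

let text = render 3
```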



OCamldoc                          2010-01-29                          Scanf(3)