Scanf(3)                         OCaml library                        Scanf(3)

NAME

       Scanf - Formatted input functions.

Module

       Module   Scanf

Documentation

       Module Scanf
        : sig end


       Formatted input functions.


       === Introduction ===


       === Functional input with format strings ===


       The module Scanf provides formatted input functions, or scanners.  The
       formatted input functions can read from any kind of input, including
       strings, files, or anything that can return characters.  The most
       general source of characters is named a formatted input channel (or
       scanning buffer) and has type Scanf.Scanning.in_channel.  The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:
       - the first argument is a source of characters for the input,
       - the second argument is a format string that specifies the values to
         read,
       - the third argument is a receiver function that is applied to the
         values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:
       - ic is a source of characters (typically a formatted input channel
         with type Scanf.Scanning.in_channel),
       - fmt is a format string (the same format strings as those used to
         print material with module Printf or Format),
       - f is a function that has as many arguments as the number of values
         to read in the input according to fmt.


       === A simple example ===


       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,
       - if we use stdin as the source of characters (Scanf.Scanning.stdin is
         the predefined formatted input channel that reads from standard
         input),
       - if we define the receiver f as let f x = x + 1,
       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is n + 1).  Thus, if we evaluate
       bscanf Scanning.stdin "%d" f, and then enter 41 at the keyboard, the
       result we get is 42.
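
       The example above can be sketched as follows.  To keep the sketch
       runnable without a keyboard, it assumes the input comes from a string,
       so Scanf.sscanf stands in for bscanf Scanning.stdin; the name result
       is illustrative only.

```ocaml
(* Receiver function: applied to the integer read by "%d". *)
let f x = x + 1

(* Read the decimal integer 41 from a string source and apply f to it. *)
let result = Scanf.sscanf "41" "%d" f
(* result is 42 *)
```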


       === Formatted input as a functional feature ===


       The OCaml scanning facility is reminiscent of the corresponding C
       feature.  However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals
       and the parameter passing mechanism is just the regular function
       application, not the variable assignment based mechanism which is
       typical for formatted input in imperative languages; the OCaml format
       strings also feature useful additions to easily define complex tokens;
       as expected within a functional programming language, the formatted
       input functions also support polymorphism, in particular arbitrary
       interaction with polymorphic user-defined scanners.  Furthermore, the
       OCaml formatted input facility is fully type-checked at compile time.


       === Formatted input channel ===


       module Scanning : sig end



       === Type of formatted input functions ===


       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
       'd, 'd) Pervasives.format6 -> 'c


       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted input
       channel according to some format string; more precisely, if scan is
       some formatted input function, then scan ic fmt f applies f to all the
       arguments specified by format string fmt, when scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel
       ic.

       For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
       by fmt, reading those arguments from Pervasives.stdin as expected.

       If the format fmt has some %r indications, the corresponding formatted
       input functions must be provided before receiver function f.  For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a ';'
       character, and returns f v.
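
       A minimal sketch of the %r mechanism described above, assuming that
       the element type t is int and that read_elem is a hypothetical reader
       written for this illustration:

```ocaml
(* Hypothetical user-defined reader: skips whitespace, then reads an int. *)
let read_elem ib = Scanf.bscanf ib " %d" (fun n -> n)

(* A formatted input channel built from a string. *)
let ib = Scanf.Scanning.from_string "12; 30;"

(* The reader for %r is passed before the receiver function. *)
let first = Scanf.bscanf ib "%r;" read_elem (fun v -> v)
let second = Scanf.bscanf ib "%r;" read_elem (fun v -> v)
```

       Each call consumes one element and its trailing ';' from the same
       scanning buffer, so successive calls walk through the input.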


       Since 3.10.0



       exception Scan_failure of string


       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.



       === The general formatted input function ===


       val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner



       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt.  As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then
       Scanf.sscanf "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       argument corresponding to the %r conversions specified in the format
       string.
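
       As an illustration of reading several values with bscanf, the sketch
       below assumes a scanning buffer built with Scanf.Scanning.from_string;
       read_binding is a hypothetical helper, not part of the library.

```ocaml
(* A scanning buffer over a string holding two "name = value" bindings. *)
let ib = Scanf.Scanning.from_string "x = 1 y = 2"

(* Hypothetical helper: skip leading whitespace, then read one binding. *)
let read_binding ib =
  Scanf.bscanf ib " %s = %i" (fun name value -> (name, value))

(* Successive calls continue where the previous one stopped. *)
let b1 = read_binding ib
let b2 = read_binding ib
```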


       === Format string description ===


       The format string is a character string which contains three types of
       objects:
       - plain characters, which are simply matched with the characters of
         the input (with a special case for space and line feed, see
         Scanf.space),
       - conversion specifications, each of which causes reading and
         conversion of one argument for the function f (see
         Scanf.conversion),
       - scanning indications to specify boundaries of tokens (see scanning
         Scanf.indication).


       === The space character in format strings ===


       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10).

       A space does not match a single space character, but any amount of
       'whitespace' in the input.  More precisely, a space inside the format
       string matches any number of tab, space, line feed and carriage return
       characters.  Similarly, a line feed character in the format string
       matches either a single line feed or a carriage return followed by a
       line feed.

       Matching any amount of whitespace, a space in the format string also
       matches no amount of whitespace at all; hence, the call
       bscanf ib "Price = %d $" (fun p -> p) succeeds and returns 1 when
       reading an input with various whitespace in it, such as "Price = 1 $",
       "Price  =  1    $", or even "Price=1$".
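
       The whitespace rule can be checked with a short sketch; price is a
       hypothetical helper that reads from a string with Scanf.sscanf rather
       than from a scanning buffer ib.

```ocaml
(* Hypothetical helper: a space in the format matches zero or more
   whitespace characters in the input. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

(* The same format accepts all three spellings of the input. *)
let prices =
  List.map price [ "Price = 1 $"; "Price  =  1    $"; "Price=1$" ]
```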


       === Conversion specifications in format strings ===


       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and followed by one or two
       conversion characters.  The conversion characters and their meanings
       are:
       - d: reads an optionally signed decimal integer ([0-9]+).
       - i: reads an optionally signed integer (usual input conventions for
         decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
         (0o[0-7]+), and binary (0b[0-1]+) notations are understood).
       - u: reads an unsigned decimal integer.
       - x or X: reads an unsigned hexadecimal integer ([0-9a-fA-F]+).
       - o: reads an unsigned octal integer ([0-7]+).
       - s: reads a string argument that spreads as much as possible, until
         one of the following bounding conditions holds:
         - a whitespace has been found (see Scanf.space),
         - a scanning indication (see scanning Scanf.indication) has been
           encountered,
         - the end-of-input has been reached.
         Hence, this conversion always succeeds: it returns an empty string
         if the bounding condition holds when the scan begins.
       - S: reads a delimited string argument (delimiters and special escaped
         characters follow the lexical conventions of OCaml).
       - c: reads a single character.  To test the current input character
         without reading it, specify a null field width, i.e. use
         specification %0c.  Raise Invalid_argument, if the field width
         specification is greater than 1.
       - C: reads a single delimited character (delimiters and special
         escaped characters follow the lexical conventions of OCaml).
       - f, e, E, g, G: reads an optionally signed floating-point number in
         decimal notation, in the style dddd.ddd e/E+-dd.
       - h, H: reads an optionally signed floating-point number in
         hexadecimal notation.
       - F: reads a floating point number according to the lexical
         conventions of OCaml (hence the decimal point is mandatory if the
         exponent part is not mentioned).
       - B: reads a boolean argument (true or false).
       - b: reads a boolean argument (for backward compatibility; do not use
         in new programs).
       - ld, li, lu, lx, lX, lo: reads an int32 argument to the format
         specified by the second letter for regular integers.
       - nd, ni, nu, nx, nX, no: reads a nativeint argument to the format
         specified by the second letter for regular integers.
       - Ld, Li, Lu, Lx, LX, Lo: reads an int64 argument to the format
         specified by the second letter for regular integers.
       - [ range ]: reads characters that match one of the characters
         mentioned in the range of characters range (or not mentioned in it,
         if the range starts with ^).  Reads a string that can be empty, if
         the next input character does not match the range.  The set of
         characters from c1 to c2 (inclusively) is denoted by c1-c2.  Hence,
         %[0-9] returns a string representing a decimal number or an empty
         string if no decimal digit is found; similarly, %[0-9a-f] returns a
         string of hexadecimal digits.  If a closing bracket appears in a
         range, it must occur as the first character of the range (or just
         after the ^ in case of range negation); hence []] matches a ]
         character and [^]] matches any character that is not ].  Use %% and
         %@ to include a % or a @ in a range.
       - r: user-defined reader.  Takes the next ri formatted input function
         and applies it to the scanning buffer ib to read the next argument.
         The input function ri must therefore have type
         Scanning.in_channel -> 'a and the argument read has type 'a.
       - { fmt %}: reads a format string argument.  The format string read
         must have the same type as the format string specification fmt.
         For instance, "%{ %i %}" reads any format string that can read a
         value of type int; hence, if s is the string
         "fmt:\"number is %u\"", then Scanf.sscanf s "fmt: %{%i%}" succeeds
         and returns the format string "number is %u".
       - ( fmt %): scanning sub-format substitution.  Reads a format string
         rf in the input, then goes on scanning with rf instead of scanning
         with fmt.  The format string rf must have the same type as the
         format string specification fmt that it replaces.  For instance,
         "%( %i %)" reads any format string that can read a value of type
         int.  The conversion returns the format string read rf, and then a
         value read using rf.  Hence, if s is the string "\"%4d\"1234.00",
         then Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
         ("%4d", 1234).  This behaviour is not mere format substitution,
         since the conversion returns the format string read as additional
         argument.  If you need pure format substitution, use special flag _
         to discard the extraneous argument: conversion %_( fmt %) reads a
         format string rf and then behaves the same as format string rf.
         Hence, if s is the string "\"%4d\"1234.00", then
         Scanf.sscanf s "%_(%i%)" is simply equivalent to
         Scanf.sscanf "1234.00" "%4d".
       - l: returns the number of lines read so far.
       - n: returns the number of characters read so far.
       - N or L: returns the number of tokens read so far.
       - !: matches the end of input condition.
       - %: matches one % character in the input.
       - @: matches one @ character in the input.
       - ,: does nothing.

       Following the % character that introduces a conversion, there may be
       the special flag _: the conversion that follows occurs as usual, but
       the resulting value is discarded.  For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then
       Scanf.sscanf s "%_s = %i" f returns 2.

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read.  For instance, %6d reads an
       integer, having at most 6 decimal digits; %4f reads a float with at
       most 4 characters; and %8[\000-\255] returns the next 8 characters (or
       all the characters still available, if fewer than 8 characters are
       available in the input).

       Notes:
       - as mentioned above, a %s conversion always succeeds, even if there
         is nothing to read in the input: in this case, it simply returns "".
       - in addition to the relevant digits, '_' characters may appear inside
         numbers (this is reminiscent of the usual OCaml lexical
         conventions).  If stricter scanning is desired, use the range
         conversion facility instead of the number conversions.
       - the scanf facility is not intended for heavy duty lexical analysis
         and parsing.  If it appears not expressive enough for your needs,
         several alternatives exist: regular expressions (module Str), stream
         parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.
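
       The numeric conversions, the field width and the _ flag described
       above can be sketched as follows.  All inputs are illustrative strings
       read with Scanf.sscanf; the binding names are made up for this sketch.

```ocaml
(* %d: optionally signed decimal integer. *)
let dec = Scanf.sscanf "-42" "%d" (fun n -> n)

(* %i: understands the 0x prefix; %x: unsigned hexadecimal, no prefix. *)
let hex_i = Scanf.sscanf "0x1F" "%i" (fun n -> n)
let hex_x = Scanf.sscanf "ff" "%x" (fun n -> n)

(* Field width: each %2d reads at most two digits. *)
let pair = Scanf.sscanf "1234" "%2d%2d" (fun a b -> (a, b))

(* '_' characters may appear inside numbers. *)
let thousand = Scanf.sscanf "1_000" "%d" (fun n -> n)

(* %Ld reads an int64 argument. *)
let wide = Scanf.sscanf "5000000000" "%Ld" (fun n -> n)

(* %f reads a floating-point number in decimal notation. *)
let flt = Scanf.sscanf "3.5" "%f" (fun x -> x)

(* The _ flag: the string token is read, then discarded. *)
let skipped = Scanf.sscanf "x = 1" "%_s = %i" (fun i -> i + 1)
```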
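
       A similar sketch for the string-like conversions (%S, %B, %c, ranges
       and the end-of-input conversion %!), again with made-up inputs and
       binding names:

```ocaml
(* %S reads an OCaml-style delimited string; %B reads true or false. *)
let quoted =
  Scanf.sscanf "\"hello world\" true" "%S %B" (fun s b -> (s, b))

(* A range reads the longest prefix made of the listed characters. *)
let split =
  Scanf.sscanf "1234abc" "%[0-9]%s" (fun digits rest -> (digits, rest))

(* A range may read the empty string when nothing matches. *)
let no_digits = Scanf.sscanf "abc" "%[0-9]" (fun digits -> digits)

(* %c reads a single character. *)
let ch = Scanf.sscanf "z" "%c" (fun c -> c)

(* %! succeeds only if the whole input has been consumed. *)
let complete = Scanf.sscanf "42" "%d%!" (fun n -> n)
```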


       === Scanning indications in format strings ===


       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token.  A scanning indication is
       introduced by a @ character, followed by some plain character c.  It
       means that the string token should end just before the next matching c
       (which is skipped).  If no c character is encountered, the string
       token spreads as much as possible.  For instance, "%s@\t" reads a
       string up to the next tab character or to the end of input.  If a @
       character appears anywhere else in the format string, it is treated as
       a plain character.

       Note:
       - As usual in format strings, % and @ characters must be escaped using
         %% and %@; this rule still holds within range specifications and
         scanning indications.  For instance, format "%s@%%" reads a string
         up to the next % character, and format "%s@%@" reads a string up to
         the next @.
       - The scanning indications introduce slight differences in the syntax
         of Scanf format strings, compared to those used for the Printf
         module.  However, the scanning indications are similar to those used
         in the Format module; hence, when producing formatted text to be
         scanned by Scanf.bscanf, it is wise to use printing functions from
         the Format module (or, if you need to use functions from Printf,
         banish or carefully double check the format strings that contain
         '@' characters).
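
       A small sketch of scanning indications, assuming ':' and ',' as the
       delimiting characters; the inputs and binding names are illustrative.

```ocaml
(* "%s@:" reads up to the next ':' and skips it. *)
let key_value = Scanf.sscanf "name:OCaml" "%s@:%s" (fun k v -> (k, v))

(* The same idea with ',' as the delimiter of each field. *)
let fields = Scanf.sscanf "a,b,c" "%s@,%s@,%s" (fun x y z -> [ x; y; z ])
```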


       === Exceptions during scanning ===


       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:
       - Raise Scanf.Scan_failure if the input does not match the format.
       - Raise Failure if a conversion to a number is not possible.
       - Raise End_of_file if the end of input is encountered while some more
         characters are needed to read the current conversion specification.
       - Raise Invalid_argument if the format string is invalid.

       Note:
       - as a consequence, scanning a %s conversion never raises exception
         End_of_file: if the end of input is reached the conversion succeeds
         and simply returns the characters read so far, or "" if none were
         ever read.
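
       The sketch below shows one way to handle two of these exceptions.  The
       outcome type and the read_int helper are hypothetical names introduced
       for the illustration.

```ocaml
(* Hypothetical result type distinguishing the failure modes. *)
type outcome = Value of int | Mismatch | Eof

(* Hypothetical helper: read one integer, mapping exceptions to outcomes. *)
let read_int s =
  try Value (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure _ -> Mismatch   (* input does not match the format *)
  | End_of_file -> Eof                 (* input ended before any digit *)

let ok = read_int "7"
let bad = read_int "abc"
let nothing = read_int ""

(* By contrast, %s succeeds on empty input and returns "". *)
let word = Scanf.sscanf "" "%s" (fun s -> s)
```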


       === Specialised formatted input functions ===


       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.



       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to Pervasives.stdin.



       val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the
       exception that aborted the scanning process as arguments.



       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
       'c, 'd) scanner

       Same as Scanf.kscanf but reads from the given string.


       Since 4.02.0
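
       A minimal sketch of an error handler passed to Scanf.ksscanf;
       int_or_default is a hypothetical helper, and the handler here simply
       ignores both the channel and the exception.

```ocaml
(* Hypothetical helper: read an int, or fall back to a default value
   when scanning fails. *)
let int_or_default default s =
  Scanf.ksscanf s (fun _ib _exn -> default) "%d" (fun n -> n)

let parsed = int_or_default 0 "15"
let fallback = int_or_default 0 "oops"
```

       The handler and the receiver must return the same type, since either
       one may provide the result of the call.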




       === Reading format strings from input ===


       val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
       Pervasives.format6 -> (('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6 ->
       'g) -> 'g


       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value.  Raise
       Scanf.Scan_failure if the format string value read does not have the
       same type as fmt.


       Since 3.09.0



       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f)
       Pervasives.format6 -> (('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6 ->
       'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.


       Since 3.09.0



       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f)
       Pervasives.format6 -> ('a, 'b, 'c, 'd, 'e, 'f) Pervasives.format6


       format_from_string s fmt converts a string argument to a format string,
       according to the given format string fmt.  Raise Scanf.Scan_failure if
       s, considered as a format string, does not have the same type as fmt.


       Since 3.10.0
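
       As a sketch of format_from_string, the code below converts a string
       into a format of the same type as "%i" and then uses it with
       Printf.sprintf.  The strings and binding names are illustrative.

```ocaml
(* "number is %u" expects one int, like the specification "%i". *)
let fmt = Scanf.format_from_string "number is %u" "%i"
let text = Printf.sprintf fmt 3

(* A string whose conversions have a different type is rejected. *)
let rejected =
  try
    ignore (Scanf.format_from_string "name is %s" "%i");
    false
  with Scanf.Scan_failure _ -> true
```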



       val unescaped : string -> string


       unescaped s returns a copy of s with escape sequences (according to
       the lexical conventions of OCaml) replaced by their corresponding
       special characters.  More precisely, Scanf.unescaped has the following
       property: for all strings s, Scanf.unescaped (String.escaped s) = s.

       Always returns a copy of the argument, even if there is no escape
       sequence in the argument.  Raise Scanf.Scan_failure if s is not
       properly escaped (i.e. s has invalid escape sequences or special
       characters that are not properly escaped).  For instance,
       Scanf.unescaped "\"" will fail.


       Since 4.00.0
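
       The round-trip property and the failure case can be sketched as
       follows; round_trip is a hypothetical helper used only for this
       illustration.

```ocaml
(* The documented property: unescaped undoes String.escaped. *)
let round_trip s = Scanf.unescaped (String.escaped s) = s

(* A backslash followed by 'n' becomes a single newline character. *)
let newline = Scanf.unescaped "line\\n"

(* A lone double quote is not properly escaped, so the call fails. *)
let fails =
  try
    ignore (Scanf.unescaped "\"");
    false
  with Scanf.Scan_failure _ -> true
```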




       === Deprecated ===


       val fscanf : Pervasives.in_channel -> ('a, 'b, 'c, 'd) scanner

       Deprecated.

       Scanf.fscanf is error prone and deprecated since 4.03.0.

       This function violates the following invariant of the Scanf module: To
       preserve scanning semantics, all scanning functions defined in Scanf
       must read from a user defined Scanf.Scanning.in_channel formatted input
       channel.

       If you need to read from a Pervasives.in_channel input channel ic,
       simply define a Scanf.Scanning.in_channel formatted input channel as in
       let ib = Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
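
       The recommended replacement for fscanf can be sketched as below.  The
       read_pair helper and the two-integer file layout are assumptions made
       for the illustration.

```ocaml
(* Hypothetical helper: read two integers from the file at [path],
   going through a formatted input channel rather than fscanf. *)
let read_pair path =
  let ic = open_in path in
  (* Wrap the regular input channel in a formatted input channel. *)
  let ib = Scanf.Scanning.from_channel ic in
  match Scanf.bscanf ib " %d %d" (fun a b -> (a, b)) with
  | pair -> close_in ic; pair
  | exception e -> close_in_noerr ic; raise e
```

       When several values are read from the same file, the scanning buffer
       ib should be created once and reused for every bscanf call.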



       val kfscanf : Pervasives.in_channel -> (Scanning.in_channel -> exn ->
       'd) -> ('a, 'b, 'c, 'd) scanner

       Deprecated.

       Scanf.kfscanf is error prone and deprecated since 4.03.0.



OCamldoc                          2019-02-02                          Scanf(3)