Stdlib.Scanf(3)                  OCaml library                 Stdlib.Scanf(3)

NAME
       Stdlib.Scanf - no description

Module
       Module   Stdlib.Scanf

Documentation
       Module Scanf
        : (module Stdlib__scanf)

   Introduction
   Functional input with format strings
       The module Scanf provides formatted input functions, or scanners.

       The formatted input functions can read from any kind of input,
       including strings, files, or anything that can return characters. The
       most general source of characters is named a formatted input channel
       (or scanning buffer) and has type Scanf.Scanning.in_channel. The most
       general formatted input function reads from any scanning buffer and is
       named bscanf.

       Generally speaking, the formatted input functions have 3 arguments:

       - the first argument is a source of characters for the input,

       - the second argument is a format string that specifies the values to
       read,

       - the third argument is a receiver function that is applied to the
       values read.

       Hence, a typical call to the formatted input function Scanf.bscanf is
       bscanf ic fmt f, where:

       - ic is a source of characters (typically a formatted input channel
       with type Scanf.Scanning.in_channel),

       - fmt is a format string (the same format strings as those used to
       print material with module Printf or Format),

       - f is a function that has as many arguments as the number of values
       to read in the input according to fmt.

   A simple example
       As suggested above, the expression bscanf ic "%d" f reads a decimal
       integer n from the source of characters ic and returns f n.

       For instance,

       - if we use stdin as the source of characters (Scanf.Scanning.stdin is
       the predefined formatted input channel that reads from standard
       input),

       - if we define the receiver f as let f x = x + 1,

       then bscanf Scanning.stdin "%d" f reads an integer n from the standard
       input and returns f n (that is n + 1). Thus, if we evaluate bscanf
       Scanning.stdin "%d" f, and then enter 41 at the keyboard, the result
       we get is 42.
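
       The same call can be tried without a keyboard by reading from a
       string. The sketch below assumes Scanf.Scanning.from_string as the
       source of characters; the names ib, first and second are illustrative
       only.

```ocaml
(* Receiver from the text above. *)
let f x = x + 1

(* A scanning buffer over a fixed string, standing in for keyboard input. *)
let ib = Scanf.Scanning.from_string "41 100"

(* Reads the decimal integer 41 and returns f 41. *)
let first = Scanf.bscanf ib "%d" f

(* The buffer keeps its position: the leading space in the format skips
   the separating blank, then %d reads 100. *)
let second = Scanf.bscanf ib " %d" f

let () = Printf.printf "%d %d\n" first second
```

       Because the scanning buffer is stateful, successive bscanf calls on
       the same buffer consume the input from left to right.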

   Formatted input as a functional feature
       The OCaml scanning facility is reminiscent of the corresponding C
       feature. However, it is also largely different, simpler, and yet more
       powerful: the formatted input functions are higher-order functionals,
       and the parameter passing mechanism is just regular function
       application, not the variable-assignment-based mechanism that is
       typical of formatted input in imperative languages; the OCaml format
       strings also feature useful additions to easily define complex tokens;
       as expected within a functional programming language, the formatted
       input functions also support polymorphism, in particular arbitrary
       interaction with polymorphic user-defined scanners. Furthermore, the
       OCaml formatted input facility is fully type-checked at compile time.

   Formatted input channel
       module Scanning : sig end

   Type of formatted input functions
       type ('a, 'b, 'c, 'd) scanner = ('a, Scanning.in_channel, 'b, 'c, 'a ->
       'd, 'd) format6 -> 'c

       The type of formatted input scanners: ('a, 'b, 'c, 'd) scanner is the
       type of a formatted input function that reads from some formatted
       input channel according to some format string; more precisely, if scan
       is some formatted input function, then scan ic fmt f applies f to all
       the arguments specified by format string fmt, when scan has read those
       arguments from the Scanf.Scanning.in_channel formatted input channel
       ic.

       For instance, the Scanf.scanf function below has type ('a, 'b, 'c, 'd)
       scanner, since it is a formatted input function that reads from
       Scanf.Scanning.stdin: scanf fmt f applies f to the arguments specified
       by fmt, reading those arguments from stdin as expected.

       If the format fmt has some %r indications, the corresponding formatted
       input functions must be provided before receiver function f. For
       instance, if read_elem is an input function for values of type t, then
       bscanf ic "%r;" read_elem f reads a value v of type t followed by a
       ';' character, and returns f v.

       Since 3.10.0
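
       As an illustration of the %r convention, here is a minimal sketch.
       The reader read_point and the "(x,y)" syntax are hypothetical, chosen
       only for the example; sscanf is used so that the input is a plain
       string.

```ocaml
(* Hypothetical reader for a point written as "(x,y)". *)
let read_point ib = Scanf.bscanf ib "(%d,%d)" (fun x y -> (x, y))

(* The reader is passed before the receiver, as described above. *)
let point = Scanf.sscanf "(1,2);" "%r;" read_point (fun p -> p)

(* Two %r conversions take two readers, in the order of the format. *)
let segment =
  Scanf.sscanf "(0,0)-(3,4)" "%r-%r" read_point read_point
    (fun a b -> (a, b))
```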

       exception Scan_failure of string

       When the input cannot be read according to the format string
       specification, formatted input functions typically raise exception
       Scan_failure.

   The general formatted input function
       val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner

       bscanf ic fmt r1 ... rN f reads characters from the
       Scanf.Scanning.in_channel formatted input channel ic and converts them
       to values according to format string fmt. As a final step, receiver
       function f is applied to the values read and gives the result of the
       bscanf call.

       For instance, if f is the function fun s i -> i + 1, then Scanf.sscanf
       "x = 1" "%s = %i" f returns 2.

       Arguments r1 to rN are user-defined input functions that read the
       argument corresponding to the %r conversions specified in the format
       string.

   Format string description
       The format string is a character string which contains three types of
       objects:

       - plain characters, which are simply matched with the characters of
       the input (with a special case for space and line feed, see
       Scanf.space),

       - conversion specifications, each of which causes reading and
       conversion of one argument for the function f (see Scanf.conversion),

       - scanning indications to specify boundaries of tokens (see scanning
       Scanf.indication).

   The space character in format strings
       As mentioned above, a plain character in the format string is just
       matched with the next character of the input; however, two characters
       are special exceptions to this rule: the space character (' ' or ASCII
       code 32) and the line feed character ('\n' or ASCII code 10). A space
       does not match a single space character, but any amount of
       'whitespace' in the input. More precisely, a space inside the format
       string matches any number of tab, space, line feed and carriage return
       characters. Similarly, a line feed character in the format string
       matches either a single line feed or a carriage return followed by a
       line feed.

       Matching any amount of whitespace, a space in the format string also
       matches no amount of whitespace at all; hence, the call bscanf ib
       "Price = %d $" (fun p -> p) succeeds and returns 1 when reading an
       input with various whitespace in it, such as Price = 1 $,
       Price  =   1    $, or even Price=1$.
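
       The whitespace rule can be checked directly. The sketch below uses
       sscanf instead of bscanf so that each input is a plain string; the
       helper name price is illustrative.

```ocaml
(* Same format as in the text; each space matches zero or more blanks. *)
let price s = Scanf.sscanf s "Price = %d $" (fun p -> p)

let results =
  List.map price
    [ "Price = 1 $";          (* single spaces *)
      "Price  =   1    $";    (* runs of spaces *)
      "Price\t=\n1 $";        (* tab and line feed count as whitespace *)
      "Price=1$" ]            (* no whitespace at all *)
```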

   Conversion specifications in format strings
       Conversion specifications consist of the % character, followed by an
       optional flag, an optional field width, and followed by one or two
       conversion characters.

       The conversion characters and their meanings are:

       - d : reads an optionally signed decimal integer ([0-9]+).

       - i : reads an optionally signed integer (usual input conventions for
       decimal ([0-9]+), hexadecimal (0x[0-9a-f]+ and 0X[0-9A-F]+), octal
       (0o[0-7]+), and binary (0b[0-1]+) notations are understood).

       - u : reads an unsigned decimal integer.

       - x or X : reads an unsigned hexadecimal integer ([0-9a-fA-F]+).

       - o : reads an unsigned octal integer ([0-7]+).

       - s : reads a string argument that spreads as much as possible, until
       one of the following bounding conditions holds:

           - a whitespace has been found (see Scanf.space),

           - a scanning indication (see scanning Scanf.indication) has been
           encountered,

           - the end-of-input has been reached.

       Hence, this conversion always succeeds: it returns an empty string if
       the bounding condition holds when the scan begins.

       - S : reads a delimited string argument (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - c : reads a single character. To test the current input character
       without reading it, specify a null field width, i.e. use specification
       %0c. Raises Invalid_argument if the field width specification is
       greater than 1.

       - C : reads a single delimited character (delimiters and special
       escaped characters follow the lexical conventions of OCaml).

       - f, e, E, g, G : reads an optionally signed floating-point number in
       decimal notation, in the style dddd.ddd e/E+-dd.

       - h, H : reads an optionally signed floating-point number in
       hexadecimal notation.

       - F : reads a floating-point number according to the lexical
       conventions of OCaml (hence the decimal point is mandatory if the
       exponent part is not mentioned).

       - B : reads a boolean argument (true or false).

       - b : reads a boolean argument (for backward compatibility; do not use
       in new programs).

       - ld, li, lu, lx, lX, lo : reads an int32 argument in the format
       specified by the second letter for regular integers.

       - nd, ni, nu, nx, nX, no : reads a nativeint argument in the format
       specified by the second letter for regular integers.

       - Ld, Li, Lu, Lx, LX, Lo : reads an int64 argument in the format
       specified by the second letter for regular integers.

       - [ range ] : reads characters that match one of the characters
       mentioned in the range of characters range (or not mentioned in it, if
       the range starts with ^). Reads a string that can be empty, if the
       next input character does not match the range. The set of characters
       from c1 to c2 (inclusively) is denoted by c1-c2. Hence, %[0-9] returns
       a string representing a decimal number or an empty string if no
       decimal digit is found; similarly, %[0-9a-f] returns a string of
       hexadecimal digits. If a closing bracket appears in a range, it must
       occur as the first character of the range (or just after the ^ in case
       of range negation); hence []] matches a ] character and [^]] matches
       any character that is not ]. Use %% and %@ to include a % or a @ in a
       range.

       - r : user-defined reader. Takes the next ri formatted input function
       and applies it to the scanning buffer ib to read the next argument.
       The input function ri must therefore have type Scanning.in_channel ->
       'a and the argument read has type 'a.

       - { fmt %} : reads a format string argument. The format string read
       must have the same type as the format string specification fmt. For
       instance, "%{ %i %}" reads any format string that can read a value of
       type int; hence, if s is the string "fmt:\"number is %u\"", then
       Scanf.sscanf s "fmt: %{%i%}" succeeds and returns the format string
       "number is %u".

       - ( fmt %) : scanning sub-format substitution. Reads a format string
       rf in the input, then goes on scanning with rf instead of scanning
       with fmt. The format string rf must have the same type as the format
       string specification fmt that it replaces. For instance, "%( %i %)"
       reads any format string that can read a value of type int. The
       conversion returns the format string read rf, and then a value read
       using rf. Hence, if s is the string "\"%4d\"1234.00", then
       Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i) evaluates to
       ("%4d", 1234). This behaviour is not mere format substitution, since
       the conversion returns the format string read as additional argument.
       If you need pure format substitution, use special flag _ to discard
       the extraneous argument: conversion %_( fmt %) reads a format string
       rf and then behaves the same as format string rf. Hence, if s is the
       string "\"%4d\"1234.00", then Scanf.sscanf s "%_(%i%)" is simply
       equivalent to Scanf.sscanf "1234.00" "%4d".

       - l : returns the number of lines read so far.

       - n : returns the number of characters read so far.

       - N or L : returns the number of tokens read so far.

       - ! : matches the end of input condition.

       - % : matches one % character in the input.

       - @ : matches one @ character in the input.

       - , : does nothing.
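
       A few of the conversions listed above can be exercised on string
       inputs as follows. This is only a sketch: the inputs and the binding
       names are made up for the illustration.

```ocaml
let id x = x

let dec = Scanf.sscanf "-42" "%d" id                 (* signed decimal *)
let hex_i = Scanf.sscanf "0x1F" "%i" id              (* %i accepts a 0x prefix *)
let hex_x = Scanf.sscanf "ff" "%x" id                (* %x takes bare hex digits *)
let quoted = Scanf.sscanf "\"hello world\"" "%S" id  (* OCaml string literal *)
let flag = Scanf.sscanf "true" "%B" id               (* boolean *)

(* A range conversion followed by %s: digits first, then the rest. *)
let digits, rest =
  Scanf.sscanf "2024abc" "%[0-9]%s" (fun d r -> (d, r))

(* %! succeeds only if the whole input has been consumed. *)
let at_end = Scanf.sscanf "7" "%d%!" id
```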

       Following the % character that introduces a conversion, there may be
       the special flag _: the conversion that follows occurs as usual, but
       the resulting value is discarded. For instance, if f is the function
       fun i -> i + 1, and s is the string "x = 1", then
       Scanf.sscanf s "%_s = %i" f returns 2.
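
       The example from the text, plus a second hypothetical input where two
       fields are skipped, can be written as:

```ocaml
let f i = i + 1

(* %_s reads the token "x" and discards it; only the integer reaches f. *)
let result = Scanf.sscanf "x = 1" "%_s = %i" f

(* Skip a word and a number, keep the last word. *)
let city = Scanf.sscanf "alice 30 paris" "%_s %_d %s" (fun c -> c)
```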

       The field width is composed of an optional integer literal indicating
       the maximal width of the token to read. For instance, %6d reads an
       integer having at most 6 decimal digits; %4f reads a float with at
       most 4 characters; and %8[\000-\255] returns the next 8 characters (or
       all the characters still available, if fewer than 8 characters are
       available in the input).
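
       Field widths make it possible to split fixed-width input that has no
       separators. A small sketch, with a made-up date layout (YYYYMMDD):

```ocaml
(* Three integers of at most 4, 2 and 2 digits, read back to back. *)
let year, month, day =
  Scanf.sscanf "20240131" "%4d%2d%2d" (fun y m d -> (y, m, d))

(* A width also bounds string tokens. *)
let prefix = Scanf.sscanf "abcdef" "%3s" (fun s -> s)
```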

       Notes:

       - as mentioned above, a %s conversion always succeeds, even if there
       is nothing to read in the input: in this case, it simply returns "".

       - in addition to the relevant digits, '_' characters may appear inside
       numbers (this is reminiscent of the usual OCaml lexical conventions).
       If stricter scanning is desired, use the range conversion facility
       instead of the number conversions.

       - the scanf facility is not intended for heavy-duty lexical analysis
       and parsing. If it appears not expressive enough for your needs,
       several alternatives exist: regular expressions (module Str), stream
       parsers, ocamllex-generated lexers, ocamlyacc-generated parsers.

   Scanning indications in format strings
       Scanning indications appear just after the string conversions %s and
       %[ range ] to delimit the end of the token. A scanning indication is
       introduced by a @ character, followed by some plain character c. It
       means that the string token should end just before the next matching c
       (which is skipped). If no c character is encountered, the string token
       spreads as much as possible. For instance, "%s@\t" reads a string up
       to the next tab character or to the end of input. If a @ character
       appears anywhere else in the format string, it is treated as a plain
       character.
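
       For instance, a scanning indication can split a key from a value. The
       helper name split_kv and the "key:value" layout are assumptions made
       for this sketch.

```ocaml
(* %s@: reads up to the next ':' and skips it; the second %s reads the rest. *)
let split_kv s = Scanf.sscanf s "%s@:%s" (fun k v -> (k, v))

let kv = split_kv "key:value"

(* When the delimiter never occurs, the token runs to the end of input. *)
let whole = Scanf.sscanf "abc" "%s@," (fun s -> s)
```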

       Note:

       - As usual in format strings, % and @ characters must be escaped using
       %% and %@; this rule still holds within range specifications and
       scanning indications. For instance, format "%s@%%" reads a string up
       to the next % character, and format "%s@%@" reads a string up to the
       next @.

       - The scanning indications introduce slight differences in the syntax
       of Scanf format strings, compared to those used for the Printf module.
       However, the scanning indications are similar to those used in the
       Format module; hence, when producing formatted text to be scanned by
       Scanf.bscanf, it is wise to use printing functions from the Format
       module (or, if you need to use functions from Printf, banish or
       carefully double check the format strings that contain '@'
       characters).

   Exceptions during scanning
       Scanners may raise the following exceptions when the input cannot be
       read according to the format string:

       - Scanf.Scan_failure, if the input does not match the format.

       - Failure, if a conversion to a number is not possible.

       - End_of_file, if the end of input is encountered while some more
       characters are needed to read the current conversion specification.

       - Invalid_argument, if the format string is invalid.

       Note:

       - as a consequence, scanning a %s conversion never raises exception
       End_of_file: if the end of input is reached the conversion succeeds
       and simply returns the characters read so far, or "" if none were ever
       read.
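
       A caller that wants to tell these cases apart can match on the
       exceptions. The function classify below is a hypothetical helper
       written for this sketch, not part of the module.

```ocaml
(* Classify the outcome of scanning one decimal integer from a string. *)
let classify s =
  try Ok (Scanf.sscanf s "%d" (fun n -> n)) with
  | Scanf.Scan_failure _ -> Error "scan failure"
  | Failure _ -> Error "failure"
  | End_of_file -> Error "end of file"

let ok = classify "42"          (* well-formed input *)
let mismatch = classify "abc"   (* 'a' is not a decimal digit *)
let empty = classify ""         (* input ends before any digit *)

(* %s, by contrast, succeeds on empty input. *)
let empty_string = Scanf.sscanf "" "%s" (fun s -> s)
```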

   Specialised formatted input functions
       val sscanf : string -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the given string.

       val scanf : ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but reads from the predefined formatted input
       channel Scanf.Scanning.stdin that is connected to stdin.

       val kscanf : Scanning.in_channel -> (Scanning.in_channel -> exn -> 'd)
       -> ('a, 'b, 'c, 'd) scanner

       Same as Scanf.bscanf, but takes an additional function argument ef
       that is called in case of error: if the scanning process or some
       conversion fails, the scanning function aborts and calls the error
       handling function ef with the formatted input channel and the
       exception that aborted the scanning process as arguments.

       val ksscanf : string -> (Scanning.in_channel -> exn -> 'd) -> ('a, 'b,
       'c, 'd) scanner

       Same as Scanf.kscanf but reads from the given string.

       Since 4.02.0
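
       One possible use of the error handler is to turn scanning errors into
       ordinary values. The helpers int_opt and with_default are illustrative
       names, not library functions.

```ocaml
(* The handler ignores the channel and the exception and returns None. *)
let int_opt s =
  Scanf.ksscanf s (fun _ib _exn -> None) "%d" (fun n -> Some n)

(* Same idea with a fallback value instead of an option. *)
let with_default d s =
  Scanf.ksscanf s (fun _ib _exn -> d) "%d" (fun n -> n)
```

       Note that the handler and the receiver must return the same type,
       since both produce the result of the call.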

   Reading format strings from input
       val bscanf_format : Scanning.in_channel -> ('a, 'b, 'c, 'd, 'e, 'f)
       format6 -> (('a, 'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       bscanf_format ic fmt f reads a format string token from the formatted
       input channel ic, according to the given format string fmt, and
       applies f to the resulting format string value.

       Since 3.09.0

       Raises Scan_failure if the format string value read does not have the
       same type as fmt.

       val sscanf_format : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 -> (('a,
       'b, 'c, 'd, 'e, 'f) format6 -> 'g) -> 'g

       Same as Scanf.bscanf_format, but reads from the given string.

       Since 3.09.0
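
       A minimal sketch of sscanf_format follows. It assumes that the format
       string token in the input is written as a quoted OCaml string, as in
       the %{ fmt %} example earlier in this page; the input text is made up.

```ocaml
(* The input contains the quoted format string "%d items"; it is checked
   against "%d" and then handed to the receiver, which prints with it. *)
let rendered =
  Scanf.sscanf_format "\"%d items\"" "%d"
    (fun fmt -> Printf.sprintf fmt 3)
```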

       val format_from_string : string -> ('a, 'b, 'c, 'd, 'e, 'f) format6 ->
       ('a, 'b, 'c, 'd, 'e, 'f) format6

       format_from_string s fmt converts a string argument to a format
       string, according to the given format string fmt.

       Since 3.10.0

       Raises Scan_failure if s, considered as a format string, does not have
       the same type as fmt.
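
       This function is handy when a format string is only known at run
       time, for example a message template. The helper render below is a
       hypothetical name used for this sketch.

```ocaml
(* Turn a run-time template into a format expecting one integer. *)
let render template n =
  let fmt = Scanf.format_from_string template "%d" in
  Printf.sprintf fmt n

let ok = render "value: %d" 42

(* A template whose type differs from "%d" is rejected at run time. *)
let rejected =
  try Some (render "value: %s" 42) with Scanf.Scan_failure _ -> None
```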

       val unescaped : string -> string

       unescaped s returns a copy of s with escape sequences (according to
       the lexical conventions of OCaml) replaced by their corresponding
       special characters. More precisely, Scanf.unescaped has the following
       property: for all string s, Scanf.unescaped (String.escaped s) = s.

       Always returns a copy of the argument, even if there is no escape
       sequence in the argument.

       Since 4.00.0

       Raises Scan_failure if s is not properly escaped (i.e. s has invalid
       escape sequences or special characters that are not properly escaped).
       For instance, Scanf.unescaped "\"" will fail.
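
       The behaviour described above can be sketched as follows; the sample
       strings are arbitrary.

```ocaml
(* The two characters backslash and 't' become a single tab character. *)
let decoded = Scanf.unescaped "tab:\\t end"

(* The round-trip property stated above, checked on one string. *)
let round_trip s = Scanf.unescaped (String.escaped s) = s

(* An unescaped double quote is rejected. *)
let bad =
  try Some (Scanf.unescaped "\"") with Scanf.Scan_failure _ -> None
```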

   Deprecated
       val fscanf : in_channel -> ('a, 'b, 'c, 'd) scanner

       Deprecated.

       Scanf.fscanf is error prone and deprecated since 4.03.0.

       This function violates the following invariant of the Scanf module: to
       preserve scanning semantics, all scanning functions defined in Scanf
       must read from a user-defined Scanf.Scanning.in_channel formatted
       input channel.

       If you need to read from an in_channel input channel ic, simply define
       a Scanf.Scanning.in_channel formatted input channel as in let ib =
       Scanning.from_channel ic, then use Scanf.bscanf ib as usual.
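
       The suggested replacement might look like the sketch below. The
       function sum_of_file and the temporary file are assumptions made for
       the example; Fun.protect (available since OCaml 4.08) is used only to
       close the channel reliably.

```ocaml
(* Read two integers from a file through a formatted input channel. *)
let sum_of_file path =
  let ic = open_in path in
  let ib = Scanf.Scanning.from_channel ic in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> Scanf.bscanf ib " %d %d" (fun a b -> a + b))

(* Demonstration on a temporary file containing "3 4". *)
let demo =
  let path = Filename.temp_file "scanf_demo" ".txt" in
  let oc = open_out path in
  output_string oc "3 4\n";
  close_out oc;
  let total = sum_of_file path in
  Sys.remove path;
  total
```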

       val kfscanf : in_channel -> (Scanning.in_channel -> exn -> 'd) -> ('a,
       'b, 'c, 'd) scanner

       Deprecated.

       Scanf.kfscanf is error prone and deprecated since 4.03.0.

OCamldoc                          2021-07-22                   Stdlib.Scanf(3)