binary(3)                  Erlang Module Definition                  binary(3)

NAME
binary - Library for handling binary data.

DESCRIPTION
This module contains functions for manipulating byte-oriented binaries.
Although the majority of functions could be provided using bit-syntax,
the functions in this library are highly optimized and are expected to
either execute faster or consume less memory, or both, than a
counterpart written in pure Erlang.

The module is provided according to Erlang Enhancement Proposal (EEP)
31.

Note:
The library handles byte-oriented data. For bitstrings that are not
binaries (that is, that do not contain a whole number of octets), a
badarg exception is raised by any of the functions in this module.

DATA TYPES
cp()

Opaque data type representing a compiled search pattern. Guaranteed to
be a tuple() to allow programs to distinguish it from non-precompiled
search patterns.

part() = {Start :: integer() >= 0, Length :: integer()}

A representation of a part (or range) in a binary. Start is a
zero-based offset into a binary() and Length is the length of that
part. As input to functions in this module, a reverse part
specification is allowed, constructed with a negative Length, so that
the part of the binary begins at Start + Length and is -Length long.
This is useful for referencing the last N bytes of a binary as
{size(Binary), -N}. The functions in this module always return part()s
with positive Length.
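
As a minimal sketch of the reverse part specification described above,
the following module returns the last N bytes of a binary. The module
name part_demo and the function tail/2 are illustrative assumptions and
are not part of this library.

```erlang
-module(part_demo).
-export([tail/2]).

%% Hypothetical helper: returns the last N bytes of Bin by passing a
%% reverse part() specification, {byte_size(Bin), -N}, to binary:part/2.
%% The part begins at byte_size(Bin) - N and is N bytes long.
%% binary:part/2 raises badarg if N > byte_size(Bin).
tail(Bin, N) when is_binary(Bin), is_integer(N), N >= 0 ->
    binary:part(Bin, {byte_size(Bin), -N}).
```

For example, part_demo:tail(<<"erlang">>, 3) evaluates to <<"ang">>.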

EXPORTS
at(Subject, Pos) -> byte()

Types:

    Subject = binary()
    Pos = integer() >= 0

Returns the byte at position Pos (zero-based) in binary Subject as an
integer. If Pos >= byte_size(Subject), a badarg exception is raised.
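
Because at/2 raises badarg for an out-of-range position, callers that
cannot guarantee the position may want to check it first. The sketch
below shows one way to do that; the module at_demo, the function
safe_at/2, and the atom out_of_range are assumptions made for
illustration.

```erlang
-module(at_demo).
-export([safe_at/2]).

%% Hypothetical wrapper around binary:at/2. The guard repeats the bounds
%% condition from the description of at/2, so the call cannot raise.
safe_at(Bin, Pos) when is_binary(Bin), is_integer(Pos),
                       Pos >= 0, Pos < byte_size(Bin) ->
    {ok, binary:at(Bin, Pos)};
safe_at(Bin, _Pos) when is_binary(Bin) ->
    %% Any other position lies outside the binary.
    out_of_range.
```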

bin_to_list(Subject) -> [byte()]

Types:

    Subject = binary()

Same as bin_to_list(Subject, {0,byte_size(Subject)}).

bin_to_list(Subject, PosLen) -> [byte()]

Types:

    Subject = binary()
    PosLen = part()

Converts Subject to a list of byte()s, each representing the value of
one byte. PosLen denotes which part of the binary() to convert.

Example:

    1> binary:bin_to_list(<<"erlang">>, {1,3}).
    "rla"
    %% or [114,108,97] in list notation.

If PosLen in any way references outside the binary, a badarg exception
is raised.

bin_to_list(Subject, Pos, Len) -> [byte()]

Types:

    Subject = binary()
    Pos = integer() >= 0
    Len = integer()

Same as bin_to_list(Subject, {Pos, Len}).

compile_pattern(Pattern) -> cp()

Types:

    Pattern = binary() | [binary()]

Builds an internal structure representing a compilation of a search
pattern, later to be used in functions match/3, matches/3, split/3, or
replace/4. The cp() returned is guaranteed to be a tuple() to allow
programs to distinguish it from non-precompiled search patterns.

When a list of binaries is specified, it denotes a set of alternative
binaries to search for. For example, if
[<<"functional">>,<<"programming">>] is specified as Pattern, this
means either <<"functional">> or <<"programming">>. The pattern is a
set of alternatives; when only a single binary is specified, the set
has only one element. The order of alternatives in a pattern is not
significant.

The list of binaries used for search alternatives must be flat and
proper.

If Pattern is not a binary or a flat proper list of binaries with
length > 0, a badarg exception is raised.
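
A compiled pattern is typically built once and reused across many
searches. The following sketch illustrates that usage; the module
cp_demo and the function count_separators/1 are hypothetical names
chosen for this example.

```erlang
-module(cp_demo).
-export([count_separators/1]).

%% Hypothetical helper: counts the comma and semicolon bytes in each
%% binary of Lines. The set of alternatives is compiled once and the
%% resulting cp() is passed to binary:matches/2 for every line.
count_separators(Lines) when is_list(Lines) ->
    Pattern = binary:compile_pattern([<<",">>, <<";">>]),
    [length(binary:matches(Line, Pattern)) || Line <- Lines].
```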

copy(Subject) -> binary()

Types:

    Subject = binary()

Same as copy(Subject, 1).

copy(Subject, N) -> binary()

Types:

    Subject = binary()
    N = integer() >= 0

Creates a binary with the content of Subject duplicated N times.

This function always creates a new binary, even if N = 1. By using
copy/1 on a binary referencing a larger binary, one can free up the
larger binary for garbage collection.

Note:
By deliberately copying a single binary to avoid referencing a larger
binary, one can, instead of freeing up the larger binary for later
garbage collection, create much more binary data than needed. Sharing
binary data is usually good. Only in special cases, when small parts
reference large binaries and the large binaries are no longer used in
any process, deliberate copying can be a good idea.

If N < 0, a badarg exception is raised.
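
Besides detaching a small binary from a larger one, copy/2 is handy
for building repeated content. The sketch below left-pads a binary to
a given width; the module copy_demo and the function pad_left/3 are
assumptions made for illustration.

```erlang
-module(copy_demo).
-export([pad_left/3]).

%% Hypothetical helper: left-pads Bin with the byte Pad until it is
%% Width bytes long. Binaries already Width bytes or longer are
%% returned unchanged, because binary:copy(_, 0) yields an empty binary.
pad_left(Bin, Width, Pad)
  when is_binary(Bin), is_integer(Width), is_integer(Pad),
       Pad >= 0, Pad =< 255 ->
    Missing = max(0, Width - byte_size(Bin)),
    <<(binary:copy(<<Pad>>, Missing))/binary, Bin/binary>>.
```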

decode_unsigned(Subject) -> Unsigned

Types:

    Subject = binary()
    Unsigned = integer() >= 0

Same as decode_unsigned(Subject, big).

decode_unsigned(Subject, Endianness) -> Unsigned

Types:

    Subject = binary()
    Endianness = big | little
    Unsigned = integer() >= 0

Converts the binary digit representation, in big endian or little
endian, of a non-negative integer in Subject to an Erlang integer().

Example:

    1> binary:decode_unsigned(<<169,138,199>>,big).
    11111111

encode_unsigned(Unsigned) -> binary()

Types:

    Unsigned = integer() >= 0

Same as encode_unsigned(Unsigned, big).

encode_unsigned(Unsigned, Endianness) -> binary()

Types:

    Unsigned = integer() >= 0
    Endianness = big | little

Converts a non-negative integer to the smallest possible representation
in a binary digit representation, either big endian or little endian.

Example:

    1> binary:encode_unsigned(11111111, big).
    <<169,138,199>>
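
Because encode_unsigned/2 returns the smallest possible
representation, protocols with fixed-width fields need padding. The
following sketch adds it for the big-endian case; the module
unsigned_demo, the function encode_fixed/2, and the too_large error
term are hypothetical.

```erlang
-module(unsigned_demo).
-export([encode_fixed/2]).

%% Hypothetical helper: encodes Unsigned as a big-endian binary of
%% exactly Width bytes, padding with leading zero bytes. Returns
%% {error, too_large} if the value does not fit in Width bytes.
encode_fixed(Unsigned, Width)
  when is_integer(Unsigned), Unsigned >= 0,
       is_integer(Width), Width > 0 ->
    Bin = binary:encode_unsigned(Unsigned, big),
    case byte_size(Bin) of
        Size when Size =< Width ->
            Padding = binary:copy(<<0>>, Width - Size),
            {ok, <<Padding/binary, Bin/binary>>};
        _TooLarge ->
            {error, too_large}
    end.
```

Leading zero bytes do not change the decoded value, so passing the
padded binary to decode_unsigned/1 gives back the original integer.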

first(Subject) -> byte()

Types:

    Subject = binary()

Returns the first byte of binary Subject as an integer. If the size of
Subject is zero, a badarg exception is raised.

last(Subject) -> byte()

Types:

    Subject = binary()

Returns the last byte of binary Subject as an integer. If the size of
Subject is zero, a badarg exception is raised.
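
Both first/1 and last/1 raise badarg on an empty binary, so a size
check in a guard is a common precaution. The sketch below tests
whether a binary is wrapped in double quotes; the module ends_demo and
the function is_quoted/1 are illustrative names only.

```erlang
-module(ends_demo).
-export([is_quoted/1]).

%% Hypothetical helper: true if Bin starts and ends with a double quote.
%% The byte_size/1 guard keeps binary:first/1 and binary:last/1 from
%% being called on an empty binary, and requires two distinct quotes.
is_quoted(Bin) when is_binary(Bin), byte_size(Bin) >= 2 ->
    binary:first(Bin) =:= $" andalso binary:last(Bin) =:= $";
is_quoted(Bin) when is_binary(Bin) ->
    false.
```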

list_to_bin(ByteList) -> binary()

Types:

    ByteList = iolist()

Works exactly as erlang:list_to_binary/1. It is included for
completeness.

longest_common_prefix(Binaries) -> integer() >= 0

Types:

    Binaries = [binary()]

Returns the length of the longest common prefix of the binaries in
list Binaries.

Example:

    1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]).
    2
    2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]).
    0

If Binaries is not a flat list of binaries, a badarg exception is
raised.
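
The returned length combines naturally with part/3. As a sketch, the
following module removes the shared prefix from every binary in a
list; the module prefix_demo and the function strip_common_prefix/1
are assumptions, and the separate clause for the empty list is a
precaution added here rather than documented behavior.

```erlang
-module(prefix_demo).
-export([strip_common_prefix/1]).

%% Hypothetical helper: drops the longest common prefix from each
%% binary in the list. The empty list is handled separately as a
%% precaution, so longest_common_prefix/1 always gets at least one
%% binary.
strip_common_prefix([]) ->
    [];
strip_common_prefix(Binaries) when is_list(Binaries) ->
    Len = binary:longest_common_prefix(Binaries),
    [binary:part(Bin, Len, byte_size(Bin) - Len) || Bin <- Binaries].
```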

longest_common_suffix(Binaries) -> integer() >= 0

Types:

    Binaries = [binary()]

Returns the length of the longest common suffix of the binaries in
list Binaries.

Example:

    1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]).
    3
    2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]).
    0

If Binaries is not a flat list of binaries, a badarg exception is
raised.

match(Subject, Pattern) -> Found | nomatch

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Found = part()

Same as match(Subject, Pattern, []).

match(Subject, Pattern, Options) -> Found | nomatch

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Found = part()
    Options = [Option]
    Option = {scope, part()}
    part() = {Start :: integer() >= 0, Length :: integer()}

Searches for the first occurrence of Pattern in Subject and returns
the position and length.

The function returns {Pos, Length} for the alternative in Pattern that
starts at the lowest position in Subject.

Example:

    1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]).
    {1,4}

Even though <<"cd">> ends before <<"bcde">>, <<"bcde">> begins first
and is therefore the first match. If two overlapping matches begin at
the same position, the longest is returned.

Summary of the options:

{scope, {Start, Length}}:
    Only the specified part is searched. Return values still have
    offsets from the beginning of Subject. A negative Length is
    allowed as described in section Data Types in this manual.

If none of the strings in Pattern is found, the atom nomatch is
returned.

For a description of Pattern, see function compile_pattern/1.

If {scope, {Start,Length}} is specified in the options such that
Start > size of Subject, Start + Length < 0, or Start + Length > size
of Subject, a badarg exception is raised.
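
Option scope can be used to resume a search from a given offset while
still getting positions relative to the start of Subject. The sketch
below illustrates this; the module match_demo and the function
match_from/3 are hypothetical names.

```erlang
-module(match_demo).
-export([match_from/3]).

%% Hypothetical helper: finds the first occurrence of Pattern in
%% Subject at or after Offset. The scope covers the bytes from Offset
%% to the end of Subject; the returned position is still counted from
%% the beginning of Subject.
match_from(Subject, Pattern, Offset)
  when is_binary(Subject), is_integer(Offset),
       Offset >= 0, Offset =< byte_size(Subject) ->
    Scope = {Offset, byte_size(Subject) - Offset},
    binary:match(Subject, Pattern, [{scope, Scope}]).
```

For example, searching <<"banana">> for <<"an">> from offset 0 gives
{1,2}, while searching from offset 2 gives {3,2}.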

matches(Subject, Pattern) -> Found

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Found = [part()]

Same as matches(Subject, Pattern, []).

matches(Subject, Pattern, Options) -> Found

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Found = [part()]
    Options = [Option]
    Option = {scope, part()}
    part() = {Start :: integer() >= 0, Length :: integer()}

Works as match/2, but Subject is searched until exhausted and a list
of all non-overlapping parts matching Pattern is returned (in order).

The first and longest match is preferred to a shorter one, which is
illustrated by the following example:

    1> binary:matches(<<"abcde">>,
                      [<<"bcde">>,<<"bc">>,<<"de">>],[]).
    [{1,4}]

The result shows that <<"bcde">> is selected instead of the shorter
match <<"bc">> (which would have given rise to one more match,
<<"de">>). This corresponds to the behavior of POSIX regular
expressions (and programs like awk), but is not consistent with
alternative matches in re (and Perl), where instead lexical ordering
in the search pattern selects which string matches.

If none of the strings in a pattern is found, an empty list is
returned.

For a description of Pattern, see compile_pattern/1. For a description
of available options, see match/3.

If {scope, {Start,Length}} is specified in the options such that
Start > size of Subject, Start + Length < 0, or Start + Length > size
of Subject, a badarg exception is raised.
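
Since matches/2 returns part()s, the matched bytes themselves can be
recovered with part/2. The following sketch does so; the module
matches_demo and the function found/2 are assumptions made for this
example.

```erlang
-module(matches_demo).
-export([found/2]).

%% Hypothetical helper: returns the sub-binaries of Subject that match
%% Pattern, in order, by feeding each part() from binary:matches/2 to
%% binary:part/2.
found(Subject, Pattern) when is_binary(Subject) ->
    [binary:part(Subject, Part) || Part <- binary:matches(Subject, Pattern)].
```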

part(Subject, PosLen) -> binary()

Types:

    Subject = binary()
    PosLen = part()

Extracts the part of binary Subject described by PosLen.

A negative length can be used to extract bytes at the end of a binary:

    1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
    2> binary:part(Bin, {byte_size(Bin), -5}).
    <<6,7,8,9,10>>

Note:
part/2 and part/3 are also available in the erlang module under the
names binary_part/2 and binary_part/3. Those BIFs are allowed in guard
tests.

If PosLen in any way references outside the binary, a badarg exception
is raised.

part(Subject, Pos, Len) -> binary()

Types:

    Subject = binary()
    Pos = integer() >= 0
    Len = integer()

Same as part(Subject, {Pos, Len}).
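
As noted above, binary_part/3 can be used in guard tests where part/3
cannot. The sketch below uses it to test for a prefix in a function
head; the module guard_demo and the function has_prefix/2 are
hypothetical.

```erlang
-module(guard_demo).
-export([has_prefix/2]).

%% Hypothetical helper: true if Bin begins with Prefix. The size test
%% comes first so that binary_part/3 is only evaluated when the
%% requested part lies inside Bin.
has_prefix(Bin, Prefix)
  when byte_size(Bin) >= byte_size(Prefix),
       binary_part(Bin, 0, byte_size(Prefix)) =:= Prefix ->
    true;
has_prefix(Bin, Prefix) when is_binary(Bin), is_binary(Prefix) ->
    false.
```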

referenced_byte_size(Binary) -> integer() >= 0

Types:

    Binary = binary()

If a binary references a larger binary (often described as being a
subbinary), it can be useful to get the size of the referenced binary.
This function can be used in a program to trigger the use of copy/1.
By copying a binary, one can dereference the original, possibly large,
binary that a smaller binary is a reference to.

Example:

    store(Binary, GBSet) ->
        NewBin =
            case binary:referenced_byte_size(Binary) of
                Large when Large > 2 * byte_size(Binary) ->
                    binary:copy(Binary);
                _ ->
                    Binary
            end,
        gb_sets:insert(NewBin,GBSet).

In this example, we chose to copy the binary content before inserting
it in gb_sets:set() if it references a binary more than twice the data
size we want to keep. Of course, different programs call for different
copying rules.

Binary sharing occurs whenever binaries are taken apart. This is the
fundamental reason why binaries are fast: decomposition can always be
done with O(1) complexity. In rare circumstances this data sharing is
however undesirable, which is why this function together with copy/1
can be useful when optimizing for memory use.

Example of binary sharing:

    1> A = binary:copy(<<1>>, 100).
    <<1,1,1,1,1 ...
    2> byte_size(A).
    100
    3> binary:referenced_byte_size(A).
    100
    4> <<B:10/binary, C:90/binary>> = A.
    <<1,1,1,1,1 ...
    5> {byte_size(B), binary:referenced_byte_size(B)}.
    {10,10}
    6> {byte_size(C), binary:referenced_byte_size(C)}.
    {90,100}

In the above example, the small binary B was copied while the larger
binary C references binary A.

Note:
Binary data is shared among processes. If another process still
references the larger binary, copying the part this process uses only
consumes more memory and does not free up the larger binary for
garbage collection. Use intrusive functions of this kind with extreme
care and only if a real problem is detected.

replace(Subject, Pattern, Replacement) -> Result

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Replacement = Result = binary()

Same as replace(Subject, Pattern, Replacement,[]).

replace(Subject, Pattern, Replacement, Options) -> Result

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Replacement = binary()
    Options = [Option]
    Option = global | {scope, part()} | {insert_replaced, InsPos}
    InsPos = OnePos | [OnePos]
    OnePos = integer() >= 0
      An integer() =< byte_size(Replacement)
    Result = binary()

Constructs a new binary by replacing the parts in Subject matching
Pattern with the content of Replacement.

If the matching subpart of Subject giving rise to the replacement is
to be inserted in the result, option {insert_replaced, InsPos} inserts
the matching part into Replacement at the specified position (or
positions) before inserting Replacement into Subject.

Example:

    1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]).
    <<"a[b]cde">>
    2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]).
    <<"a[b]c[d]e">>
    3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]).
    <<"a[bb]c[dd]e">>
    4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]).
    <<"a[b-b]c[d-d]e">>

If any position specified in InsPos is greater than the size of the
replacement binary, a badarg exception is raised.

Options global and {scope, part()} work as for split/3. The return
type is always a binary().

For a description of Pattern, see compile_pattern/1.
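
A common use of option global is to rewrite every occurrence of a
byte sequence. The sketch below converts CRLF line endings to LF; the
module replace_demo and the function normalize_newlines/1 are
illustrative names and not part of this library.

```erlang
-module(replace_demo).
-export([normalize_newlines/1]).

%% Hypothetical helper: replaces every CRLF pair in Bin with a single
%% LF. Without option global, only the first occurrence would be
%% replaced.
normalize_newlines(Bin) when is_binary(Bin) ->
    binary:replace(Bin, <<"\r\n">>, <<"\n">>, [global]).
```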

split(Subject, Pattern) -> Parts

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Parts = [binary()]

Same as split(Subject, Pattern, []).

split(Subject, Pattern, Options) -> Parts

Types:

    Subject = binary()
    Pattern = binary() | [binary()] | cp()
    Options = [Option]
    Option = {scope, part()} | trim | global | trim_all
    Parts = [binary()]

Splits Subject into a list of binaries based on Pattern. If option
global is not specified, only the first occurrence of Pattern in
Subject gives rise to a split.

The parts of Pattern found in Subject are not included in the result.

Example:

    1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
    [<<1,255,4>>, <<2,3>>]
    2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
    [<<0,1>>,<<4>>,<<9>>]

Summary of options:

{scope, part()}:
    Works as in match/3 and matches/3. Notice that this only defines
    the scope of the search for matching strings; it does not cut the
    binary before splitting. The bytes before and after the scope are
    kept in the result. See the example below.

trim:
    Removes trailing empty parts of the result (as does trim in
    re:split/3).

trim_all:
    Removes all empty parts of the result.

global:
    Repeats the split until Subject is exhausted. Conceptually, option
    global makes split work on the positions returned by matches/3,
    while it normally works on the position returned by match/3.

Example of the difference between a scope and taking the binary apart
before splitting:

    1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]).
    [<<"ban">>,<<"na">>]
    2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]).
    [<<"n">>,<<"n">>]

The return type is always a list of binaries that all reference
Subject. This means that the data in Subject is not copied to new
binaries, and that Subject cannot be garbage collected until the
results of the split are no longer referenced.

For a description of Pattern, see compile_pattern/1.
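
The effect of options trim and trim_all is easiest to see side by
side. The following sketch splits a comma-separated binary in three
ways; the module split_demo and its function names are assumptions
made for illustration.

```erlang
-module(split_demo).
-export([fields/1, fields_trim/1, fields_trim_all/1]).

%% Hypothetical helpers that split Bin on every comma.

%% Keeps all parts, including empty ones.
fields(Bin) when is_binary(Bin) ->
    binary:split(Bin, <<",">>, [global]).

%% Drops only the empty parts at the end of the result.
fields_trim(Bin) when is_binary(Bin) ->
    binary:split(Bin, <<",">>, [global, trim]).

%% Drops every empty part, wherever it occurs.
fields_trim_all(Bin) when is_binary(Bin) ->
    binary:split(Bin, <<",">>, [global, trim_all]).
```

For the input <<"a,,b,,">>, fields/1 yields five parts (three of them
empty), fields_trim/1 yields [<<"a">>,<<>>,<<"b">>], and
fields_trim_all/1 yields [<<"a">>,<<"b">>].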

Ericsson AB                      stdlib 3.12.1                      binary(3)