FASTAVRO(1)                        fastavro                        FASTAVRO(1)

NAME

       fastavro - fastavro Documentation

       The current Python avro package is packed with features but dog slow.

       On a test case of about 10K records, it takes about 14sec to iterate
       over all of them. In comparison, the Java avro SDK does it in about
       1.9sec.

       fastavro is less feature complete than avro, but it’s much faster.
       It iterates over the same 10K records in 2.9sec, and if you use it with
       PyPy it’ll do it in 1.5sec (to be fair, the Java benchmark is doing
       some extra JSON encoding/decoding).

       If the optional C extension (generated by Cython) is available, then
       fastavro will be even faster. For the same 10K records it’ll run in
       about 1.7sec.

EXAMPLE

          # Writing
          from fastavro import writer

          schema = {
              'doc': 'A weather reading.',
              'name': 'Weather',
              'namespace': 'test',
              'type': 'record',
              'fields': [
                  {'name': 'station', 'type': 'string'},
                  {'name': 'time', 'type': 'long'},
                  {'name': 'temp', 'type': 'int'},
              ],
          }

          # 'records' can be an iterable (including generator)
          records = [
              {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
              {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
              {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
              {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
          ]

          with open('weather.avro', 'wb') as out:
              writer(out, schema, records)

          # Reading
          import fastavro

          with open('weather.avro', 'rb') as fo:
              reader = fastavro.reader(fo)
              schema = reader.schema

              for record in reader:
                  print(record)

DOCUMENTATION

   fastavro.read
       class reader(fo, reader_schema=None)
              Iterator over records in an avro file.

              Parameters

                     · fo (file-like) – Input stream

                     · reader_schema (dict, optional) – Reader schema

              Example:

                 from fastavro import reader
                 with open('some-file.avro', 'rb') as fo:
                     avro_reader = reader(fo)
                     schema = avro_reader.schema
                     for record in avro_reader:
                         process_record(record)

       class block_reader(fo, reader_schema=None)
              Iterator over blocks in an avro file.

              Parameters

                     · fo (file-like) – Input stream

                     · reader_schema (dict, optional) – Reader schema

              Example:

                 from fastavro import block_reader
                 with open('some-file.avro', 'rb') as fo:
                     avro_reader = block_reader(fo)
                     schema = avro_reader.schema
                     for block in avro_reader:
                         process_block(block)

       schemaless_reader(fo, writer_schema, reader_schema=None)
              Reads a single record written using the schemaless_writer

              Parameters

                     · fo (file-like) – Input stream

                     · writer_schema (dict) – Schema used when calling
                       schemaless_writer

                     · reader_schema (dict, optional) – If the schema has
                       changed since being written then the new schema can be
                       given to allow for schema migration

              Example:

                 import fastavro

                 with open('file.avro', 'rb') as fp:
                     record = fastavro.schemaless_reader(fp, schema)

              Note: The schemaless_reader can only read a single record.

       is_avro(path_or_buffer)
              Return True if path (or buffer) points to an Avro file.

              Parameters
                     path_or_buffer (path to file or file-like object) – Path
                     to file

   fastavro.write
       writer(fo, schema, records, codec='null', sync_interval=16000,
       metadata=None, validator=None)
              Write records to fo (stream) according to schema

              Parameters

                     · fo (file-like) – Output stream

                     · schema (dict) – Schema

                     · records (iterable) – Records to write

                     · codec (string, optional) – Compression codec, can be
                       ‘null’, ‘deflate’ or ‘snappy’ (if installed)

                     · sync_interval (int, optional) – Size of sync interval

                     · metadata (dict, optional) – Header metadata

                     · validator (None, True or a function) – Validator
                       function. If None (the default), no validation is
                       performed. If True, fastavro.validation.validate is
                       used. If it’s a function, it should have the same
                       signature as fastavro.validation.validate and raise an
                       exception on error.

              Example:

                 from fastavro import writer

                 schema = {
                     'doc': 'A weather reading.',
                     'name': 'Weather',
                     'namespace': 'test',
                     'type': 'record',
                     'fields': [
                         {'name': 'station', 'type': 'string'},
                         {'name': 'time', 'type': 'long'},
                         {'name': 'temp', 'type': 'int'},
                     ],
                 }

                 records = [
                     {u'station': u'011990-99999', u'temp': 0, u'time': 1433269388},
                     {u'station': u'011990-99999', u'temp': 22, u'time': 1433270389},
                     {u'station': u'011990-99999', u'temp': -11, u'time': 1433273379},
                     {u'station': u'012650-99999', u'temp': 111, u'time': 1433275478},
                 ]

                 with open('weather.avro', 'wb') as out:
                     writer(out, schema, records)

       schemaless_writer(fo, schema, record)
              Write a single record without the schema or header information

              Parameters

                     · fo (file-like) – Output file

                     · schema (dict) – Schema

                     · record (dict) – Record to write

              Example:

                 import fastavro

                 with open('file.avro', 'wb') as fp:
                     fastavro.schemaless_writer(fp, schema, record)

              Note: The schemaless_writer can only write a single record.

   fastavro.validation
       validate(datum, schema, field=None, raise_errors=True)
              Determine if a python datum is an instance of a schema.

              Parameters

                     · datum (Any) – Data being validated

                     · schema (dict) – Schema

                     · field (str, optional) – Record field being validated

                     · raise_errors (bool, optional) – If true, errors are
                       raised for invalid data. If false, a simple True
                       (valid) or False (invalid) result is returned

              Example:

                 from fastavro.validation import validate
                 schema = {...}
                 record = {...}
                 validate(record, schema)

       validate_many(records, schema, raise_errors=True)
              Validate a list of data.

              Parameters

                     · records (iterable) – List of records to validate

                     · schema (dict) – Schema

                     · raise_errors (bool, optional) – If true, errors are
                       raised for invalid data. If false, a simple True
                       (valid) or False (invalid) result is returned

              Example:

                 from fastavro.validation import validate_many
                 schema = {...}
                 records = [{...}, {...}, ...]
                 validate_many(records, schema)

   fastavro command line script
       A command line script is installed with the library that can be used to
       dump the contents of avro file(s) to the standard output.

       Usage:

          usage: fastavro [-h] [--schema] [--codecs] [--version] [-p] [file [file ...]]

          iter over avro file, emit records as JSON

          positional arguments:
            file          file(s) to parse

          optional arguments:
            -h, --help    show this help message and exit
            --schema      dump schema instead of records
            --codecs      print supported codecs
            --version     show program's version number and exit
            -p, --pretty  pretty print json

   Examples
       Read an avro file:

          $ fastavro weather.avro

          {"temp": 0, "station": "011990-99999", "time": -619524000000}
          {"temp": 22, "station": "011990-99999", "time": -619506000000}
          {"temp": -11, "station": "011990-99999", "time": -619484400000}
          {"temp": 111, "station": "012650-99999", "time": -655531200000}
          {"temp": 78, "station": "012650-99999", "time": -655509600000}

       Show the schema:

          $ fastavro --schema weather.avro

          {
           "type": "record",
           "namespace": "test",
           "doc": "A weather reading.",
           "fields": [
            {
             "type": "string",
             "name": "station"
            },
            {
             "type": "long",
             "name": "time"
            },
            {
             "type": "int",
             "name": "temp"
            }
           ],
           "name": "Weather"
          }


AUTHOR

       Miki Tebeka

COPYRIGHT

       2012, Miki Tebeka

0.19.8                           Jul 15, 2018                      FASTAVRO(1)