1Fsdb(3) User Contributed Perl Documentation Fsdb(3)
2
3
4
6 Fsdb - a flat-text database for shell scripting
7
Fsdb, the flatfile streaming database, is a package of commands for
manipulating flat-ASCII databases from shell scripts. Fsdb is useful
for processing medium amounts of data (with very little data you'd do it
by hand, with megabytes you might want a real database). Fsdb was known
as Jdb from 1991 to Oct. 2008.
14
15 Fsdb is very good at doing things like:
16
17 • extracting measurements from experimental output
18
19 • examining data to address different hypotheses
20
21 • joining data from different experiments
22
23 • eliminating/detecting outliers
24
25 • computing statistics on data (mean, confidence intervals,
26 correlations, histograms)
27
28 • reformatting data for graphing programs
29
30 Fsdb is built around the idea of a flat text file as a database. Fsdb
31 files (by convention, with the extension .fsdb), have a header
32 documenting the schema (what the columns mean), and then each line
33 represents a database record (or row).
34
35 For example:
36
37 #fsdb experiment duration
38 ufs_mab_sys 37.2
39 ufs_mab_sys 37.3
40 ufs_rcp_real 264.5
41 ufs_rcp_real 277.9
42
is a simple file with four experiments (the rows), each with a
description and run time in the first and second columns.
46
Rather than hand-code scripts to do each special case, Fsdb provides
higher-level functions. Although it's often easy to throw together a
custom script to do any single task, I believe that there are several
advantages to using Fsdb:
51
52 • these programs provide a higher level interface than plain Perl, so
53
54 ** Fewer lines of simpler code:
55
56 dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration
57
58 Picks out just one type of experiment and computes statistics
59 on it, rather than:
60
while (<>) { @F = split; $sum+=$F[1]; $ss+=$F[1]**2; $n++; }
62 $mean = $sum / $n; $std_dev = ...
63
64 in dozens of places.
65
66 • the library uses names for columns, so
67
68 ** No more $F[1], use "_duration".
69
70 ** New or different order columns? No changes to your scripts!
71
72 Thus if your experiment gets more complicated with a size
73 parameter, so your log changes to:
74
75 #fsdb experiment size duration
76 ufs_mab_sys 1024 37.2
77 ufs_mab_sys 1024 37.3
78 ufs_rcp_real 1024 264.5
79 ufs_rcp_real 1024 277.9
80 ufs_mab_sys 2048 45.3
81 ufs_mab_sys 2048 44.2
82
83 Then the previous scripts still work, even though duration is now
84 the third column, not the second.
85
• A series of actions is self-documenting (the provenance of
processing done to produce each output is recorded in comments).
88
89 ** No more wondering what hacks were used to compute the final
90 data, just look at the comments at the end of the output.
91
92 For example, the commands
93
94 dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration
95
96 add to the end of the output the lines
97 # | dbrow _experiment eq "ufs_mab_sys"
98 # | dbcolstats duration
99
• The library is mature, supporting large datasets (more than 100GB),
corner cases, and error handling, and it is backed by an automated
test suite.
102
103 ** No more puzzling about bad output because your custom script
104 skimped on error checking.
105
106 ** No more memory thrashing when you try to sort ten million
107 records.
108
109 • Fsdb-2.x supports Perl scripting (in addition to shell scripting),
110 with libraries to do Fsdb input and output, and easy support for
111 pipelines. The shell script
112
113 dbcol name test1 | dbroweval '_test1 += 5;'
114
115 can be written in perl as:
116
117 dbpipeline(dbcol(qw(name test1)), dbroweval('_test1 += 5;'));
118
119 (The disadvantage is that you need to learn what functions Fsdb
120 provides.)
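
To make the name-based addressing above concrete, here is a rough
plain-shell sketch (not Fsdb's implementation) of how a filter can find
a column by the name on the "#fsdb" header line instead of by position.
The helper name col_mean is hypothetical, and the sketch assumes the
default whitespace separator and a header with no options.

```shell
# col_mean NAME: average the column called NAME in an fsdb-style stream.
# Hypothetical helper for illustration only; assumes whitespace-separated
# fields and a plain "#fsdb col1 col2 ..." header.
col_mean() {
    awk -v want="$1" '
        /^#fsdb/ {                      # header: map the name to a position
            for (i = 2; i <= NF; i++) if ($i == want) idx = i - 1
            next
        }
        /^#/ { next }                   # skip comment lines
        idx  { sum += $idx; n++ }       # accumulate the named column
        END  { if (n) printf "%.2f\n", sum / n }
    '
}

# The same call works whether duration is the second or the third column.
printf '%s\n' '#fsdb experiment duration' \
    'ufs_mab_sys 37.2' 'ufs_mab_sys 37.3' | col_mean duration
printf '%s\n' '#fsdb experiment size duration' \
    'ufs_mab_sys 1024 37.2' 'ufs_mab_sys 1024 37.3' | col_mean duration
```

Because the position is looked up from the header on every run, adding
or reordering columns does not require touching the script.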
121
122 Fsdb is built on flat-ASCII databases. By storing data in simple text
123 files and processing it with pipelines it is easy to experiment (in the
124 shell) and look at the output. To the best of my knowledge, the
125 original implementation of this idea was "/rdb", a commercial product
126 described in the book UNIX relational database management: application
127 development in the UNIX environment by Rod Manis, Evan Schaffer, and
128 Robert Jorgensen (1988 by Prentice Hall, and also at the web page
129 <http://www.rdb.com/>). Fsdb is an incompatible re-implementation of
130 their idea without any accelerated indexing or forms support. (But
131 it's free, and probably has better statistics!).
132
133 Fsdb-2.x will exploit multiple processors or cores, and provides Perl-
134 level support for input, output, and threaded-pipelines. (As of
135 Fsdb-2.44 it no longer uses Perl threading, just processes, since they
136 are faster.)
137
138 Installation instructions follow at the end of this document. Fsdb-2.x
139 requires Perl 5.8 to run. All commands have manual pages and provide
140 usage with the "--help" option. All commands are backed by an
141 automated test suite.
142
143 The most recent version of Fsdb is available on the web at
144 <http://www.isi.edu/~johnh/SOFTWARE/FSDB/index.html>.
145
147 3.0, 2022-04-04 Complete type support and accordingly bump major version.
148 NEW The major version number is now 3.0 to correspond to the addition
149 of types (although they were actually added in 2.75). Old fsdb
150 files are supported (Fsdb-3.0 is backwards compatible with
151 databases), but older versions will confuse types in new files (new
152 Fsdb files are not forward compatible with old versions).
153
154 ENHANCEMENT
155 Type specifications in a few more programs: dbcolhisto,
156 dbcolscorrelate, dbcolsregression, dbcolstatscores,
157 dbrowaccumulate, dbrowcount, dbrowdiff, dbrvstatdiff.
158
159 ENHANCEMENT
160 dbcolhisto now puts an empty value on any empty rows.
161
162 NEW dbcoltype redefines column types, or clears them with the "-v"
163 option.
164
166 executive summary
167 what's new
168 README CONTENTS
169 installation
170 basic data format
171 basic data manipulation
172 list of commands
173 another example
174 a gradebook example
175 a password example
176 history
177 related work
178 release notes
179 copyright
180 comments
181
INSTALLATION
Fsdb now uses the standard Perl build and installation from
ExtUtils::MakeMaker(3), so the quick answer to installation is to type:
185
186 perl Makefile.PL
187 make
188 make test
189 make install
190
191 Or, if you want to install it somewhere else, change the first line to
192
193 perl Makefile.PL PREFIX=$HOME
194
and it will go in your home directory's bin, etc. (See
ExtUtils::MakeMaker(3) for more details.)
197
198 Fsdb requires perl 5.8 or later.
199
A test suite is available; run it with
201
202 make test
203
In the past, ports existed for FreeBSD and MacOS. If someone
running one of those OSes wants to contribute a new port, please let me
know.
207
BASIC DATA FORMAT
These programs are based on the idea of storing data in simple ASCII
files. A database is a file with one header line and then data or
comment lines. For example:
212
213 #fsdb account passwd uid gid fullname homedir shell
214 johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
215 greg * 2275 134 Greg_Johnson /home/greg /bin/bash
216 root * 0 0 Root /root /bin/bash
217 # this is a simple database
218
219 The header line must be first and begins with "#fsdb". There are rows
220 (records) and columns (fields), just like in a normal database.
Comment lines begin with "#". Column names are any string not
containing spaces or single quotes (although it is prudent to keep them
alphanumeric with underscore).
224
Columns can optionally include type annotations by following the name
with :t where t is some type. (Types are not used in Perl, but are
relevant in the Python and Go Fsdb bindings.) Types use a subset of Perl
pack specifiers: c, s, l, q are signed 8, 16, 32, and 64-bit integers, f
is a float, d is a double float, a is a UTF-8 string, and > and < can
force big or little endianness.
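
As an illustration of the annotation syntax, the following sketch
splits each header entry into a column name and its type code. It is
not taken from Fsdb: the helper name header_types and the sample header
are hypothetical, and it assumes that every header option (a word
starting with "-") takes exactly one argument, as -F and -R do in the
examples in this document.

```shell
# header_types: print "name type" for each column in an fsdb header,
# using "-" when a column has no ":t" annotation.
# Hypothetical helper; assumes each header option takes one argument.
header_types() {
    awk '
        /^#fsdb/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^-/) { i++; continue }    # skip option and its argument
                n = index($i, ":")
                if (n > 0) print substr($i, 1, n - 1), substr($i, n + 1)
                else       print $i, "-"
            }
            exit
        }
    '
}

# Sample header (made up for this sketch): string, 32-bit int, double.
echo '#fsdb experiment:a size:l duration:d' | header_types
```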
231
232 By default, columns are delimited by whitespace. With this default
233 configuration, the contents of a field cannot contain whitespace.
234 However, this limitation can be relaxed by changing the field separator
235 as described below.
236
237 The big advantage of simple flat-text databases is that it is usually
238 easy to massage data into this format, and it's reasonably easy to take
239 data out of this format into other (text-based) programs, like gnuplot,
240 jgraph, and LaTeX. Think Unix. Think pipes. (Or even output to Excel
241 and HTML if you prefer.)
242
Since the no-whitespace rule was a problem for some applications,
there's an option which relaxes it. You can specify the field
separator in the table header with "-F x" where "x" is a code for the
new field separator. A full list of codes is at dbfilealter(1), but
two common special values are "-F t", which is a separator of a single
tab character, and "-F S", a separator of two spaces. Both allow
(single) spaces in fields. An example:
250
#fsdb -F S account passwd uid gid fullname homedir shell
johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
root  *  0  0  Root  /root  /bin/bash
# this is a simple database
256
257 See dbfilealter(1) for more details. Regardless of what the column
258 separator is for the body of the data, it's always whitespace in the
259 header.
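
A small sketch of why the two-space separator matters: splitting the
body on exactly two spaces keeps "John Heidemann" together as one
field. This uses plain awk rather than Fsdb, the helper name fullnames
is hypothetical, and the field position is hard-coded for this sample
schema.

```shell
# fullnames: print the fullname field (5th) of a two-space-separated table.
# Sketch only: a real tool would take the position from the header.
fullnames() {
    awk -F'  ' '!/^#/ { print $5 }'
}

printf '%s\n' \
    '#fsdb -F S account passwd uid gid fullname homedir shell' \
    'johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash' \
    '# this is a simple database' | fullnames
```

Note that the header line itself is still split on ordinary whitespace,
which is why the sketch skips lines beginning with "#" instead of
parsing them with the same separator.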
260
There's also a third format: a "list". Because it's often hard to see
which column is which past the first two, in list format each "column"
is on a separate line. The programs dblistize and dbcolize convert to
and from this format, and all programs work with either format. The
command
265
266 dbfilealter -R C < DATA/passwd.fsdb
267
268 outputs:
269
270 #fsdb -R C account passwd uid gid fullname homedir shell
271 account: johnh
272 passwd: *
273 uid: 2274
274 gid: 134
275 fullname: John_Heidemann
276 homedir: /home/johnh
277 shell: /bin/bash
278
279 account: greg
280 passwd: *
281 uid: 2275
282 gid: 134
283 fullname: Greg_Johnson
284 homedir: /home/greg
285 shell: /bin/bash
286
287 account: root
288 passwd: *
289 uid: 0
290 gid: 0
291 fullname: Root
292 homedir: /root
293 shell: /bin/bash
294
295 # this is a simple database
296 # | dblistize
297
298 See dbfilealter(1) for more details.
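
For readers who want to see the shape of the transformation, here is a
rough approximation in plain awk. It is not how dbfilealter works
internally: the helper name to_list is hypothetical, it assumes
whitespace-separated input, and it omits the rewritten header and the
trailing log comments that the real tools emit.

```shell
# to_list: print every row as "name: value" lines, one blank line per row.
# Rough approximation for illustration; drops the header and comments.
to_list() {
    awk '
        /^#fsdb/ { for (i = 2; i <= NF; i++) name[i - 1] = $i; next }
        /^#/     { next }
        {
            for (i = 1; i <= NF; i++) print name[i] ": " $i
            print ""
        }
    '
}

printf '%s\n' '#fsdb account passwd uid' 'johnh * 2274' 'root * 0' | to_list
```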
299
BASIC DATA MANIPULATION
A number of programs exist to manipulate databases. Complex functions
can be made by stringing together commands with shell pipelines. For
example, to print the home directories of everyone with "John" in
their names, you would do:
305
306 cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir
307
308 The output might be:
309
310 #fsdb homedir
311 /home/johnh
312 /home/greg
313 # this is a simple database
314 # | dbrow _fullname =~ /John/
315 # | dbcol homedir
316
317 (Notice that comments are appended to the output listing each command,
318 providing an automatic audit log.)
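
The select, project, and log steps of that pipeline can be sketched in
a single awk filter. This is only an illustration of the idea: the
helper name pick is hypothetical, it assumes a plain whitespace-
separated table, and the log line it appends merely imitates the style
of the audit comments shown above.

```shell
# pick ROWCOL PATTERN OUTCOL: keep rows where ROWCOL matches PATTERN,
# print only OUTCOL, pass earlier comments through, and append a log line.
# Hypothetical helper; not part of Fsdb.
pick() {
    awk -v rc="$1" -v pat="$2" -v oc="$3" '
        /^#fsdb/ {
            for (i = 2; i <= NF; i++) pos[$i] = i - 1
            print "#fsdb", oc
            next
        }
        /^#/ { print; next }                        # keep earlier comments
        $(pos[rc]) ~ pat { print $(pos[oc]) }       # select row, project column
        END { print "# | pick " rc " " pat " " oc } # record what was done
    '
}

printf '%s\n' \
    '#fsdb account passwd uid gid fullname homedir shell' \
    'johnh * 2274 134 John_Heidemann /home/johnh /bin/bash' \
    'greg * 2275 134 Greg_Johnson /home/greg /bin/bash' \
    'root * 0 0 Root /root /bin/bash' \
    '# this is a simple database' | pick fullname John homedir
```

Comments from the input survive and the new comment is added at the
end, so the output carries its own history.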
319
320 In addition to typical database functions (select, join, etc.) there
321 are also a number of statistical functions.
322
323 The real power of Fsdb is that one can apply arbitrary code to rows to
324 do powerful things.
325
326 cat DATA/passwd | dbroweval '_fullname =~ s/(\w+)_(\w+)/$2,_$1/'
327
328 converts "John_Heidemann" into "Heidemann,_John". Not too much more
329 work could split fullname into firstname and lastname fields.
330
Or:

cat DATA/passwd | dbcolcreate sort | dbroweval -b 'use Fsdb::Support' \
'_sort = _fullname; _sort =~ s/_/ /g; _sort = fullname_to_sort(_sort);'
335
337 An advantage of Fsdb is that you can talk about columns by name
338 (symbolically) rather than simply by their positions. So in the above
339 example, "dbcol homedir" pulled out the home directory column, and
340 "dbrow '_fullname =~ /John/'" matched against column fullname.
341
342 In general, you can use the name of the column listed on the "#fsdb"
343 line to identify it in most programs, and _name to identify it in code.
344
345 Some alternatives for flexibility:
346
347 • Numeric values identify columns positionally, numbering from 0. So
348 0 or _0 is the first column, 1 is the second, etc.
349
• In code, _last_columnname gets the value from columnname's previous
row.
352
353 See dbroweval(1) for more details about writing code.
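
The "previous row" idea behind _last_columnname can be pictured with a
small awk sketch that remembers the prior value of a named column and
prints the change. This is an illustration only, not Fsdb code; the
helper name delta_col is hypothetical and whitespace-separated input is
assumed.

```shell
# delta_col NAME: print each value of column NAME and its change from the
# previous row ("-" for the first row). Hypothetical helper.
delta_col() {
    awk -v want="$1" '
        /^#fsdb/ { for (i = 2; i <= NF; i++) if ($i == want) idx = i - 1; next }
        /^#/     { next }
        {
            if (seen) print $idx, $idx - last
            else      print $idx, "-"
            last = $idx; seen = 1                   # remember for the next row
        }
    '
}

printf '%s\n' '#fsdb run duration' 'a 10' 'b 15' 'c 12' | delta_col duration
```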
354
LIST OF COMMANDS
Enough said. I'll summarize the commands, and then you can experiment.
For a summary of any command, run it with the argument "--help" (or
"-?" if you prefer). Full manual pages can be found by running the
command with the argument "--man", or running the Unix command
"man dbcol" or whatever program you want.
361
362 TABLE CREATION
363 dbcolcreate
364 add columns to a database
365
366 dbcoldefine
367 set the column headings for a non-Fsdb file
368
369 TABLE MANIPULATION
370 dbcol
371 select columns from a table
372
373 dbrow
374 select rows from a table
375
376 dbsort
377 sort rows based on a set of columns
378
379 dbjoin
380 compute the natural join of two tables
381
382 dbcolrename
383 rename a column
384
385 dbcolmerge
386 merge two columns into one
387
388 dbcolsplittocols
389 split one column into two or more columns
390
391 dbcolsplittorows
392 split one column into multiple rows
393
394 dbfilepivot
395 "pivots" a file, converting multiple rows corresponding to the same
396 entity into a single row with multiple columns.
397
398 dbfilevalidate
399 check that db file doesn't have some common errors
400
401 COMPUTATION AND STATISTICS
402 dbcolstats
compute statistics over a column (mean, etc., optionally median)
404
405 dbmultistats
406 group rows by some key value, then compute stats (mean, etc.) over
407 each group (equivalent to dbmapreduce with dbcolstats as the
408 reducer)
409
410 dbmapreduce
411 group rows (map) and then apply an arbitrary function to each group
412 (reduce)
413
414 dbrvstatdiff
compare two sample distributions (mean/conf interval/T-test)
416
417 dbcolmovingstats
compute moving statistics over a column of data
419
420 dbcolstatscores
421 compute Z-scores and T-scores over one column of data
422
423 dbcolpercentile
424 compute the rank or percentile of a column
425
426 dbcolhisto
427 compute histograms over a column of data
428
429 dbcolscorrelate
430 compute the coefficient of correlation over several columns
431
432 dbcolsregression
433 compute linear regression and correlation for two columns
434
435 dbrowaccumulate
436 compute a running sum over a column of data
437
438 dbrowcount
439 count the number of rows (a subset of dbstats)
440
441 dbrowdiff
compute differences in a column between rows of a table
443
444 dbrowenumerate
445 number each row
446
447 dbroweval
448 run arbitrary Perl code on each row
449
450 dbrowuniq
451 count/eliminate identical rows (like Unix uniq(1))
452
453 dbfilediff
454 compare fields on rows of a file (something like Unix diff(1))
455
456 OUTPUT CONTROL
457 dbcolneaten
458 pretty-print columns
459
460 dbfilealter
461 convert between column or list format, or change the column
462 separator
463
464 dbfilestripcomments
465 remove comments from a table
466
467 dbformmail
468 generate a script that sends form mail based on each row
469
470 CONVERSIONS
471 (These programs convert data into fsdb. See their web pages for
472 details.)
473
474 cgi_to_db
475 <http://stein.cshl.org/boulder/>
476
477 combined_log_format_to_db
478 <http://httpd.apache.org/docs/2.0/logs.html>
479
480 html_table_to_db
481 HTML tables to fsdb (assuming they're reasonably formatted).
482
483 kitrace_to_db
484 <http://ficus-www.cs.ucla.edu/ficus-members/geoff/kitrace.html>
485
486 ns_to_db
487 <http://mash-www.cs.berkeley.edu/ns/>
488
489 sqlselect_to_db
490 the output of SQL SELECT tables to db
491
492 tabdelim_to_db
493 spreadsheet tab-delimited files to db
494
495 tcpdump_to_db
496 (see man tcpdump(8) on any reasonable system)
497
498 xml_to_db
XML input to fsdb, assuming it's very regular
500
501 (And out of fsdb:)
502
503 db_to_csv
504 Comma-separated-value format from fsdb.
505
506 db_to_html_table
507 simple conversion of Fsdb to html tables
508
509 STANDARD OPTIONS
510 Many programs have common options:
511
512 -? or --help
513 Show basic usage.
514
-N or --new-name
516 When a command creates a new column like dbrowaccumulate's "accum",
517 this option lets one override the default name of that new column.
518
519 -T TmpDir
520 where to put tmp files. Also uses environment variable TMPDIR, if
521 -T is not specified. Default is /tmp.
522
525 -c FRACTION or --confidence FRACTION
526 Specify confidence interval FRACTION (dbcolstats, dbmultistats,
527 etc.)
528
529 -C S or "--element-separator S"
530 Specify column separator S (dbcolsplittocols, dbcolmerge).
531
532 -d or --debug
533 Enable debugging (may be repeated for greater effect in some
534 cases).
535
536 -a or --include-non-numeric
537 Compute stats over all data (treating non-numbers as zeros). (By
538 default, things that can't be treated as numbers are ignored for
539 stats purposes)
540
541 -S or --pre-sorted
542 Assume the data is pre-sorted. May be repeated to disable
543 verification (saving a small amount of work).
544
545 -e E or --empty E
546 give value E as the value for empty (null) records
547
548 -i I or --input I
549 Input data from file I.
550
551 -o O or --output O
552 Write data out to file O.
553
554 --header H
Use H as the full Fsdb header, rather than reading a header from
the input. This option is particularly useful when using Fsdb
under Hadoop, where split files don't have headers.
558
--nolog
560 Skip logging the program in a trailing comment.
561
When giving Perl code (in dbrow and dbroweval) column names can be
embedded if preceded by underscores. (Look at dbrow(1) or dbroweval(1)
for examples.)
565
566 Most programs run in constant memory and use temporary files if
567 necessary. Exceptions are dbcolneaten, dbcolpercentile, dbmapreduce,
568 dbmultistats, dbrowsplituniq.
569
570 STANDARD SORTING OPTIONS
571 A number of programs do sorting, or depend on defining an ordering of
572 rows. Such programs use these standard sorting options:
573
574 -r or --descending
575 sort in reverse order (high to low)
576
577 -R or --ascending
578 sort in normal order (low to high)
579
580 -t or --type-inferred-sorting
sort fields by type (numeric or lexicographic), automatically
582
583 -n or --numeric
584 sort numerically
585
586 -N or --lexical
587 sort lexicographically
588
ANOTHER EXAMPLE
Take the raw data in "DATA/http_bandwidth", put a header on it
("dbcoldefine size bw"), take statistics of each category
("dbmultistats -k size bw"), pick out the relevant fields ("dbcol size
mean stddev pct_rsd"), and you get:
594
595 #fsdb size mean stddev pct_rsd
596 1024 1.4962e+06 2.8497e+05 19.047
597 10240 5.0286e+06 6.0103e+05 11.952
598 102400 4.9216e+06 3.0939e+05 6.2863
599 # | dbcoldefine size bw
600 # | /home/johnh/BIN/DB/dbmultistats -k size bw
601 # | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd
602
603 (The whole command was:
604
605 cat DATA/http_bandwidth |
dbcoldefine size bw |
607 dbmultistats -k size bw |
608 dbcol size mean stddev pct_rsd
609
610 all on one line.)
611
612 Then post-process them to get rid of the exponential notation by adding
613 this to the end of the pipeline:
614
615 dbroweval '_mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev);'
616
617 (Actually, this step is no longer required since dbcolstats now uses a
618 different default format.)
619
620 giving:
621
622 #fsdb size mean stddev pct_rsd
623 1024 1496200 284970 19.047
624 10240 5028600 601030 11.952
625 102400 4921600 309390 6.2863
626 # | dbcoldefine size bw
627 # | dbmultistats -k size bw
628 # | dbcol size mean stddev pct_rsd
629 # | dbroweval { _mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev); }
630
631 In a few lines, raw data is transformed to processed output.
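
To show what "statistics of each category" means mechanically, here is
a hand-rolled group-by in awk that reports the mean and row count per
key. It is a sketch under stated assumptions, not a substitute for
dbmultistats: the helper name group_mean is hypothetical, the input must
already carry a "#fsdb" header, and only two of the many statistics are
computed.

```shell
# group_mean KEY VALUE: mean and count of column VALUE for each KEY value,
# in first-seen order. Hypothetical helper for illustration.
group_mean() {
    awk -v k="$1" -v v="$2" '
        /^#fsdb/ {
            for (i = 2; i <= NF; i++) pos[$i] = i - 1
            print "#fsdb", k, "mean", "n"
            next
        }
        /^#/ { next }
        {
            key = $(pos[k])
            if (!(key in count)) order[++groups] = key  # remember first-seen order
            count[key]++
            sum[key] += $(pos[v])
        }
        END {
            for (g = 1; g <= groups; g++) {
                key = order[g]
                printf "%s %.1f %d\n", key, sum[key] / count[key], count[key]
            }
        }
    '
}

printf '%s\n' '#fsdb size bw' '1024 2' '1024 4' '2048 10' | group_mean size bw
```

Unlike most Fsdb tools, this sketch keeps one accumulator per key in
memory, which is fine for a handful of categories.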
632
Suppose you suspect an odd distribution of results for one data
point. Fsdb can easily produce a CDF (cumulative distribution
function) of the data, suitable for graphing:
636
637 cat DB/DATA/http_bandwidth | \
638 dbcoldefine size bw | \
639 dbrow '_size == 102400' | \
640 dbcol bw | \
641 dbsort -n bw | \
642 dbrowenumerate | \
643 dbcolpercentile count | \
644 dbcol bw percentile | \
645 xgraph
646
The steps, roughly:

1. get the raw input data and turn it into fsdb format,

2. pick out just the relevant column (for efficiency) and sort it,

3. for each data point, assign a CDF percentage to it,

4. pick out the two columns to graph and show them.
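
The sort, enumerate, and percentile steps can be sketched with standard
tools as an empirical CDF, where each sorted value is paired with
rank/n. This is an assumption-laden illustration: the helper name cdf is
hypothetical, and dbcolpercentile may define its percentile differently
from the simple rank/n used here.

```shell
# cdf NAME: sort column NAME numerically and print "value fraction",
# where fraction = rank / n. Hypothetical helper for illustration.
cdf() {
    awk -v want="$1" '
        /^#fsdb/ { for (i = 2; i <= NF; i++) if ($i == want) idx = i - 1; next }
        /^#/     { next }
        { print $idx }
    ' | sort -n | awk '
        { v[NR] = $1 }
        END { for (i = 1; i <= NR; i++) printf "%s %.2f\n", v[i], i / NR }
    '
}

printf '%s\n' '#fsdb size bw' '102400 30' '102400 10' '102400 20' '102400 40' | cdf bw
```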
651
A GRADEBOOK EXAMPLE
653 The first commercial program I wrote was a gradebook, so here's how to
654 do it with Fsdb.
655
656 Format your data like DATA/grades.
657
658 #fsdb name email id test1
659 a a@ucla.example.edu 1 80
660 b b@usc.example.edu 2 70
661 c c@isi.example.edu 3 65
662 d d@lmu.example.edu 4 90
663 e e@caltech.example.edu 5 70
664 f f@oxy.example.edu 6 90
665
666 Or if your students have spaces in their names, use "-F S" and two
667 spaces to separate each column:
668
#fsdb -F S name email id test1
alfred aho  a@ucla.example.edu  1  80
butler lampson  b@usc.example.edu  2  70
david clark  c@isi.example.edu  3  65
constantine drovolis  d@lmu.example.edu  4  90
debrorah estrin  e@caltech.example.edu  5  70
sally floyd  f@oxy.example.edu  6  90
676
677 To compute statistics on an exam, do
678
cat DATA/grades | dbstats test1 | dblistize
680
681 giving
682
683 #fsdb -R C ...
684 mean: 77.5
685 stddev: 10.84
686 pct_rsd: 13.987
687 conf_range: 11.377
688 conf_low: 66.123
689 conf_high: 88.877
690 conf_pct: 0.95
691 sum: 465
692 sum_squared: 36625
693 min: 65
694 max: 90
695 n: 6
696 ...
697
698 To do a histogram:
699
700 cat DATA/grades | dbcolhisto -n 5 -g test1
701
702 giving
703
704 #fsdb low histogram
705 65 *
706 70 **
707 75
708 80 *
709 85
710 90 **
711 # | /home/johnh/BIN/DB/dbhistogram -n 5 -g test1
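
The bucketing behind a text histogram like the one above can be
sketched in awk. This is not dbcolhisto: the helper name histo and its
WIDTH parameter are hypothetical (they are not the meaning of
dbcolhisto's own options), and the sketch assumes non-negative integer
values with buckets starting at the smallest value.

```shell
# histo NAME WIDTH: text histogram of column NAME using buckets WIDTH wide,
# starting at the smallest value seen. Hypothetical helper for illustration.
histo() {
    awk -v want="$1" -v w="$2" '
        /^#fsdb/ { for (i = 2; i <= NF; i++) if ($i == want) idx = i - 1; next }
        /^#/     { next }
        {
            x[++n] = $idx + 0
            if (n == 1 || x[n] < min) min = x[n]
            if (n == 1 || x[n] > max) max = x[n]
        }
        END {
            if (n == 0) exit
            for (i = 1; i <= n; i++) count[int((x[i] - min) / w)]++
            print "#fsdb low histogram"
            for (b = 0; min + b * w <= max; b++) {
                stars = ""
                for (j = 0; j < count[b] + 0; j++) stars = stars "*"
                low = min + b * w
                if (stars != "") print low, stars
                else             print low
            }
        }
    '
}

# The test1 scores from the gradebook above.
printf '%s\n' '#fsdb name test1' 'a 80' 'b 70' 'c 65' 'd 90' 'e 70' 'f 90' | histo test1 5
```

Empty buckets (75 and 85 here) are printed with no stars, matching the
layout of the output shown earlier.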
712
713 Now you want to send out grades to the students by e-mail. Create a
714 form-letter (in the file test1.txt):
715
716 To: _email (_name)
717 From: J. Random Professor <jrp@usc.example.edu>
718 Subject: test1 scores
719
720 _name, your score on test1 was _test1.
721 86+ A
722 75-85 B
723 70-74 C
724 0-69 F
725
726 Generate the shell script that will send the mail out:
727
728 cat DATA/grades | dbformmail test1.txt > test1.sh
729
730 And run it:
731
732 sh <test1.sh
733
734 The last two steps can be combined:
735
736 cat DATA/grades | dbformmail test1.txt | sh
737
738 but I like to keep a copy of exactly what I send.
739
740 At the end of the semester you'll want to compute grade totals and
741 assign letter grades. Both fall out of dbroweval. For example, to
742 compute weighted total grades with a 40% midterm/60% final where the
743 midterm is 84 possible points and the final 100:
744
745 dbcol -rv total |
746 dbcolcreate total - |
747 dbroweval '
748 _total = .40 * _midterm/84.0 + .60 * _final/100.0;
749 _total = sprintf("%4.2f", _total);
750 if (_final eq "-" || ( _name =~ /^_/)) { _total = "-"; };' |
751 dbcolneaten
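
As a worked instance of the weighting formula, the sketch below
evaluates the same expression for one student. The helper name
weighted_total and the sample scores are hypothetical; the weights and
point totals are the ones used in the dbroweval code above.

```shell
# weighted_total MIDTERM FINAL: 40% midterm (out of 84) plus 60% final
# (out of 100), formatted as in the dbroweval example.
weighted_total() {
    awk -v mid="$1" -v fin="$2" \
        'BEGIN { printf "%4.2f\n", .40 * mid / 84.0 + .60 * fin / 100.0 }'
}

# 63/84 = 0.75 on the midterm and 80/100 = 0.80 on the final:
# 0.40 * 0.75 + 0.60 * 0.80 = 0.30 + 0.48
weighted_total 63 80
```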
752
753 If you got the data originally from a spreadsheet, save it in "tab-
754 delimited" format and convert it with tabdelim_to_db (run
755 tabdelim_to_db -? for examples).
756
A PASSWORD EXAMPLE
758 To convert the Unix password file to db:
759
cat /etc/passwd | sed 's/:/  /g' | \
dbcoldefine -F S login password uid gid gecos home shell \
>passwd.fsdb
763
To convert the group file:

cat /etc/group | sed 's/:/  /g' | \
dbcoldefine -F S group password gid members \
>group.fsdb
769
770 To show the names of the groups that div7-members are in (assuming DIV7
771 is in the gecos field):
772
773 cat passwd.fsdb | dbrow '_gecos =~ /DIV7/' | dbcol login gid | \
774 dbjoin -i - -i group.fsdb gid | dbcol login group
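
Conceptually, the join step matches rows of the two tables on their
shared gid column. A minimal in-memory sketch of that idea follows. It
is not dbjoin: the helper name join_by_gid is hypothetical, the column
names are hard-coded, and it assumes plain whitespace-separated files
with option-free headers (not the "-F S" files created above).

```shell
# join_by_gid PASSWD_FILE GROUP_FILE: print "login group" for every login
# whose gid appears in the group table. Hypothetical helper; loads the
# whole group table into memory.
join_by_gid() {
    awk '
        /^#fsdb/ {                                  # each file has its own header
            split("", pos)                          # clear the name-to-position map
            for (i = 2; i <= NF; i++) pos[$i] = i - 1
            file++
            next
        }
        /^#/ { next }
        file == 1 { group[$(pos["gid"])] = $(pos["group"]); next }
        file == 2 { g = $(pos["gid"]); if (g in group) print $(pos["login"]), group[g] }
    ' "$2" "$1"                                     # group table is read first
}
```

With small whitespace-separated test files, "join_by_gid p.fsdb g.fsdb"
prints one "login group" pair per matching row, in the order of the
password file.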
775
777 Which Fsdb programs are the most complicated (based on number of test
778 cases)?
779
780 ls TEST/*.cmd | \
781 dbcoldefine test | \
782 dbroweval '_test =~ s@^TEST/([^_]+).*$@$1@' | \
783 dbrowuniq -c | \
784 dbsort -nr count | \
785 dbcolneaten
786
787 (Answer: dbmapreduce, then dbcolstats, dbfilealter and dbjoin.)
788
789 Stats on an exam (in $FILE, where $COLUMN is the name of the exam)?
790
cat $FILE | dbcolstats -q 4 $COLUMN | dblistize | dbstripcomments
792
793 cat $FILE | dbcolhisto -g -n 20 $COLUMN | dbcolneaten | dbstripcomments
794
Merging the hw1 column from file hw1.fsdb into grades.fsdb, assuming
there's a common student id in column "id":
797
798 dbcol id hw1 <hw1.fsdb >t.fsdb
799
800 dbjoin -a -e - grades.fsdb t.fsdb id | \
801 dbsort name | \
802 dbcolneaten >new_grades.fsdb
803
Merging two fsdb files with the same columns:
805
806 cat file1.fsdb file2.fsdb >output.fsdb
807
808 or if you want to clean things up a bit
809
810 cat file1.fsdb file2.fsdb | dbstripextraheaders >output.fsdb
811
812 or if you want to know where the data came from
813
814 for i in 1 2
815 do
816 dbcolcreate source $i < file$i.fsdb
817 done >output.fsdb
818
819 (assumes you're using a Bourne-shell compatible shell, not csh).
820
822 As with any tool, one should (which means must) understand the limits
823 of the tool.
824
825 All Fsdb tools should run in constant memory. In some cases (such as
826 dbcolstats with quartiles, where the whole input must be re-read),
827 programs will spool data to disk if necessary.
828
Most tools buffer one or a few lines of data, so memory will scale with
the size of each line. (So lines with many columns, or columns with
lots of data, may cause large memory consumption.)
832
833 All Fsdb tools should run in constant or at worst "n log n" time.
834
835 All Fsdb tools use normal Perl math routines for computation. Although
836 I make every attempt to choose numerically stable algorithms (although
837 I also welcome feedback and suggestions for improvement), normal
838 rounding due to computer floating point approximations can result in
839 inaccuracies when data spans a large range of precision. (See for
840 example the dbcolstats_extrema test cases.)
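
As an example of what "numerically stable" can mean here, the sketch
below computes a mean and sample standard deviation in one pass with
Welford's update, which avoids subtracting two large sums of squares.
Welford's method is one well-known approach chosen for illustration;
the text above does not say which algorithm dbcolstats uses, and the
helper name stable_stats is hypothetical.

```shell
# stable_stats: mean and sample standard deviation of the numbers on stdin
# (one per line), using Welford's one-pass update.
# Hypothetical helper; not Fsdb's implementation.
stable_stats() {
    awk '
        {
            n++
            d = $1 - mean               # distance from the old mean
            mean += d / n               # updated running mean
            m2 += d * ($1 - mean)       # running sum of squared deviations
        }
        END { if (n > 1) printf "%.2f %.2f\n", mean, sqrt(m2 / (n - 1)) }
    '
}

# The six test1 scores from the gradebook example.
printf '%s\n' 80 70 65 90 70 90 | stable_stats
```

On the gradebook scores this agrees with the mean (77.5) and stddev
(10.84) reported in that example.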
841
Any requirements and limitations of each Fsdb tool are documented on its
manual page.
844
845 If any Fsdb program violates these assumptions, that is a bug that
846 should be documented on the tool's manual page or ideally fixed.
847
848 Fsdb does depend on Perl's correctness, and Perl (and Fsdb) have some
849 bugs. Fsdb should work on perl from version 5.10 onward.
850
HISTORY
852 There have been four major versions of Fsdb: fsdb-0.x was begun in 1991
853 for my personal use. Fsdb 1.0 is a complete re-write of the pre-1995
854 versions, and was distributed from 1995 to 2007. Fsdb 2.0 is a
855 significant re-write of the 1.x versions to systematically use a
856 library and threads (although threads were abandoned in 2.44). Fsdb
857 3.0 in 2022 adds type specifiers to the schema, mostly to support use
858 in languages with stronger typing (like Python, Go, and C).
859
860 Fsdb (in its various forms) has been used extensively by its author
861 since 1991. Since 1995 it's been used by two other researchers at UCLA
862 and several at ISI. In February 1998 it was announced to the Internet.
863 Since then it has found a few users, some outside where I work.
864
865 Major changes:
866
867 1.0 1997-07-22: first public release.
868 2.0 2008-01-25: rewrite to use a common library, and starting to use
869 threads.
870 2.12 2008-10-16: completion of the rewrite, and first RPM package.
871 2.44 2013-10-02: abandoning threads for improved performance
872 3.0 2022-04-04: adding type specifiers to the schema
873
874 Fsdb 2.0 Rationale
875 I've thought about fsdb-2.0 for many years, but it was started in
876 earnest in 2007. Fsdb-2.0 has the following goals:
877
878 in-one-process processing
879 While fsdb is great on the Unix command line as a pipeline between
880 programs, it should also be possible to set it up to run in a
881 single process. And if it does so, it should be able to avoid
882 serializing and deserializing (converting to and from text) data
883 between each module. (Accomplished in fsdb-2.0: see dbpipeline,
884 although still needs tuning.)
885
886 clean IO API
Fsdb's roots go back to perl4 and 1991, so the fsdb-1.x library is
very, very crufty. More than just being ugly (but it was that
too), this made things like reading from one file format and writing
to another the application's job, when it should be the library's.
(Accomplished in fsdb-1.15 and improved in 2.0: see Fsdb::IO.)
892
893 normalized module APIs
894 Because fsdb modules were added as needed over 10 years, sometimes
895 the module APIs became inconsistent. (For example, the 1.x
896 "dbcolcreate" required an empty value following the name of the new
897 column, but other programs specify empty values with the "-e"
898 argument.) We should smooth over these inconsistencies.
899 (Accomplished as each module was ported in 2.0 through 2.7.)
900
901 everyone handles all input formats
902 Given a clean IO API, the distinction between "colized" and
903 "listized" fsdb files should go away. Any program should be able
904 to read and write files in any format. (Accomplished in fsdb-2.1.)
905
906 Fsdb-2.0 preserves backwards compatibility where possible, but breaks
907 it where necessary to accomplish the above goals. In August 2008,
908 Fsdb-2.7 was declared preferred over the 1.x versions. Benchmarking in
909 2013 showed that threading performed much worse than just using pipes,
910 so Fsdb-2.44 uses threading "style", but implemented with processes
911 (via my "Freds" library).
912
913 Contributors
914 Fsdb includes code ported from Geoff Kuenning
915 ("Fsdb::Support::TDistribution").
916
917 Fsdb contributors: Ashvin Goel goel@cse.oge.edu, Geoff Kuenning
918 geoff@fmg.cs.ucla.edu, Vikram Visweswariah visweswa@isi.edu, Kannan
919 Varadahan kannan@isi.edu, Lars Eggert larse@isi.edu, Arkadi Gelfond
920 arkadig@dyna.com, David Graff graff@ldc.upenn.edu, Haobo Yu
921 haoboy@packetdesign.com, Pavlin Radoslavov pavlin@catarina.usc.edu,
922 Graham Phillips, Yuri Pradkin, Alefiya Hussain, Ya Xu, Michael
923 Schwendt, Fabio Silva fabio@isi.edu, Jerry Zhao zhaoy@isi.edu, Ning Xu
924 nxu@aludra.usc.edu, Martin Lukac mlukac@lecs.cs.ucla.edu, Xue Cai,
925 Michael McQuaid, Christopher Meng, Calvin Ardi, H. Merijn Brand, Lan
926 Wei, Hang Guo, Wes Hardaker.
927
928 Fsdb includes datasets contributed from NIST (DATA/nist_zarr13.fsdb),
929 from
930 <http://www.itl.nist.gov/div898/handbook/eda/section4/eda4281.htm>, the
931 NIST/SEMATECH e-Handbook of Statistical Methods, section 1.4.2.8.1.
932 Background and Data. The source is public domain, and reproduced with
933 permission.
934
RELATED WORK
936 As stated in the introduction, Fsdb is an incompatible reimplementation
937 of the ideas found in "/rdb". By storing data in simple text files and
938 processing it with pipelines it is easy to experiment (in the shell)
939 and look at the output. The original implementation of this idea was
940 /rdb, a commercial product described in the book UNIX relational
941 database management: application development in the UNIX environment by
942 Rod Manis, Evan Schaffer, and Robert Jorgensen (and also at the web
943 page <http://www.rdb.com/>).
944
While Fsdb is inspired by /rdb, it includes no code from it, and Fsdb
makes several different design choices. In particular, /rdb attempts to
be closer to a "real" database, with provision for locking and file
indexing. Fsdb focuses on single-user use and so eschews these
choices. /rdb also has some support for interactive editing. Fsdb
leaves editing to text editors like emacs or vi.
951
952 In August, 2002 I found out Carlo Strozzi extended RDB with his package
953 NoSQL <http://www.linux.it/~carlos/nosql/>. According to Mr. Strozzi,
954 he implemented NoSQL in awk to avoid the Perl start-up of RDB.
955 Although I haven't found Perl startup overhead to be a big problem on
956 my platforms (from old Sparcstation IPCs to 2GHz Pentium-4s), you may
957 want to evaluate his system. The Linux Journal has a description of
958 NoSQL at <http://www.linuxjournal.com/article/3294>. It seems quite
959 similar to Fsdb. Like /rdb, NoSQL supports indexing (not present in
960 Fsdb). Fsdb appears to have richer support for statistics, and, as of
961 Fsdb-2.x, its use of Perl threading may allow faster performance
962 (one process, less serialization and deserialization).
963
965 Versions prior to 1.0 were released informally on my web page but were
966 not announced.
967
968 0.0 1991
969 started for my own research use
970
971 0.1 26-May-94
972 first check-in to RCS
973
974 0.2 15-Mar-95
975 parts now require perl5
976
977 1.0, 22-Jul-97
978 adds autoconf support and a test script.
979
980 1.1, 20-Jan-98
981 support for double space field separators, better tests
982
983 1.2, 11-Feb-98
984 minor changes and release on comp.lang.perl.announce
985
986 1.3, 17-Mar-98
987 • adds median and quartile options to dbstats
988
989 • adds dmalloc_to_db converter
990
991 • fixes some warnings
992
993 • dbjoin now can run on unsorted input
994
995 • fixes a dbjoin bug
996
997 • some more tests in the test suite
998
999 1.4, 27-Mar-98
1000 • improves error messages (all should now report the program that
1001 makes the error)
1002
1003 • fixed a bug in dbstats output when the mean is zero
1004
1005 1.5, 25-Jun-98
1006 BUG FIX dbcolhisto and dbcolpercentile now handle non-numeric values like
1007 dbstats
1008 NEW dbcolstats computes zscores and tscores over a column
1009 NEW dbcolscorrelate computes correlation coefficients between two
1010 columns
1011 INTERNAL ficus_getopt.pl has been replaced by DbGetopt.pm
1012 BUG FIX all tests are now ``portable'' (previously some tests ran only
1013 on my system)
1014 BUG FIX you no longer need to have the db programs in your path (fix
1015 arose from a discussion with Arkadi Gelfond)
1016 BUG FIX installation no longer uses cp -f (to work on SunOS 4)
1017
1018 1.6, 24-May-99
1019 NEW dbsort, dbstats, dbmultistats now run in constant memory (using tmp
1020 files if necessary)
1021 NEW dbcolmovingstats does moving means over a series of data
1022 NEW dbcol has a -v option to get all columns except those listed
1023 NEW dbmultistats does quartiles and medians
1024 NEW dbstripextraheaders now also cleans up bogus comments before the
1025 first header
1026 BUG FIX dbcolneaten works better with double-space-separated data
1027
1028 1.7, 5-Jan-00
1029 NEW dbcolize now detects and rejects lines that contain embedded copies
1030 of the field separator
1031 NEW configure tries harder to prevent people from improperly
1032 configuring/installing fsdb
1033 NEW tcpdump_to_db converter (incomplete)
1034 NEW tabdelim_to_db converter: from spreadsheet tab-delimited files to
1035 db
1036 NEW mailing lists for fsdb are "fsdb-announce@heidemann.la.ca.us"
1037 and "fsdb-talk@heidemann.la.ca.us"
1038 To subscribe to either, send mail
1039 to "fsdb-announce-request@heidemann.la.ca.us" or
1040 "fsdb-talk-request@heidemann.la.ca.us" with "subscribe" in the
1041 BODY of the message.
1042
1043 BUG FIX dbjoin used to produce incorrect output if there were extra,
1044 unmatched values in the 2nd table. Thanks to Graham Phillips for
1045 providing a test case.
1046 BUG FIX the sample commands in the usage strings now all should
1047 explicitly include the source of data (typically from "cat foo.fsdb
1048 |"). Thanks to Ya Xu for pointing out this documentation deficiency.
1049 BUG FIX (DOCUMENTATION) dbcolmovingstats had incorrect sample output.
1050
1051 1.8, 28-Jun-00
1052 BUG FIX header options are now preserved when writing with dblistize
1053 NEW dbrowuniq now optionally checks for uniqueness only on certain
1054 fields
1055 NEW dbrowsplituniq makes one pass through a file and splits it into
1056 separate files based on the given fields
1057 NEW converter for "crl" format network traces
1058 NEW anywhere you use arbitrary code (like dbroweval), _last_foo now
1059 maps to the last row's value for field _foo.
1060 OPTIMIZATION comment processing slightly changed so that dbmultistats
1061 now is much faster on files with lots of comments (for example, ~100k
1062 lines of comments and 700 lines of data!) (Thanks to Graham Phillips
1063 for pointing out this performance problem.)
1064 BUG FIX dbstats with median/quartiles now correctly handles singleton
1065 data points.
1066
1067 1.9, 6-Nov-00
1068 NEW dbfilesplit, split a single input file into multiple output files
1069 (based on code contributed by Pavlin Radoslavov).
1070 BUG FIX dbsort now works with perl-5.6
1071
1072 1.10, 10-Apr-01
1073 BUG FIX dbstats now handles the case where there are more n-tiles than
1074 data
1075 NEW dbstats now includes a -S option to optimize work on pre-sorted
1076 data (inspired by code contributed by Haobo Yu)
1077 BUG FIX dbsort now has a better estimate of memory usage when run on
1078 data with very short records (problem detected by Haobo Yu)
1079 BUG FIX cleanup of temporary files is slightly better
1080
1081 1.11, 2-Nov-01
1082 BUG FIX dbcolneaten now runs in constant memory
1083 NEW dbcolneaten now supports "field specifiers" that allow some control
1084 over how wide columns should be
1085 OPTIMIZATION dbsort now tries hard to be filesystem cache-friendly
1086 (inspired by "Information and Control in Gray-box Systems" by the
1087 Arpaci-Dusseau's at SOSP 2001)
1088 INTERNAL t_distr now ported to perl5 module DbTDistr
1089
1090 1.12, 30-Oct-02
1091 BUG FIX dbmultistats documentation typo fixed
1092 NEW dbcolmultiscale
1093 NEW dbcol has -r option for "relaxed error checking"
1094 NEW dbcolneaten has new -e option to strip end-of-line spaces
1095 NEW dbrow finally has a -v option to negate the test
1096 BUG FIX math bug in dbcoldiff fixed by Ashvin Goel (need to check
1097 Scheaffer test cases)
1098 BUG FIX some patches to run with Perl 5.8. Note: some programs
1099 (dbcolmultiscale, dbmultistats, dbrowsplituniq) generate warnings like:
1100 "Use of uninitialized value in concatenation (.)" or "string at
1101 /usr/lib/perl5/5.8.0/FileCache.pm line 98, <STDIN> line 2". Please
1102 ignore this until I figure out how to suppress it. (Thanks to Jerry
1103 Zhao for noticing perl-5.8 problems.)
1104 BUG FIX fixed an autoconf problem where configure would fail to find a
1105 reasonable prefix (thanks to Fabio Silva for reporting the problem)
1106 NEW db_to_html_table: simple conversion to html tables (NO fancy stuff)
1107 NEW dblib now has a function dblib_text2html() that will do simple
1108 conversion of iso-8859-1 to HTML
1109
1110 1.13, 4-Feb-04
1111 NEW fsdb added to the freebsd ports tree
1112 <http://www.freshports.org/databases/fsdb/>. Maintainer:
1113 "larse@isi.edu"
1114 BUG FIX properly handle trailing spaces when data must be numeric (ex.
1115 dbstats with -FS, see test dbstats_trailing_spaces). Fix from Ning Xu
1116 "nxu@aludra.usc.edu".
1117 NEW dbcolize error message improved (bug report from Terrence Brannon),
1118 and list format documented in the README.
1119 NEW cgi_to_db converts CGI.pm-format storage to fsdb list format
1120 BUG FIX handle numeric synonyms for column names in dbcol properly
1121 ENHANCEMENT "talking about columns" section added to README. Lack of
1122 documentation pointed out by Lars Eggert.
1123 CHANGE dbformmail now defaults to using Mail ("Berkeley Mail") to send
1124 mail, rather than sendmail (sendmail is still an option, but mail
1125 doesn't require running as root)
1126 NEW on platforms that support it (i.e., with perl 5.8), fsdb works fine
1127 with unicode
1128 NEW dbfilevalidate: check a db file for some common errors
1129
1130 1.14, 24-Aug-06
1131 ENHANCEMENT README cleanup
1132 INCOMPATIBLE CHANGE dbcolsplit renamed dbcolsplittocols
1133 NEW dbcolsplittorows split one column into multiple rows
1134 NEW dbcolsregression compute linear regression and correlation for two
1135 columns
1136 ENHANCEMENT csv_to_db: better error handling, normalize field names,
1137 skip blank lines
1138 ENHANCEMENT dbjoin now detects (and fails) if non-joined files have
1139 duplicate names
1140 BUG FIX minor bug fixed in calculation of Student t-distributions
1141 (doesn't change any test output, but may have caused small errors)
1142
1143 1.15, 12-Nov-07
1144 NEW fsdb-1.14 added to the MacOS Fink system
1145 <http://pdb.finkproject.org/pdb/package.php/fsdb>. (Thanks to Lars
1146 Eggert for maintaining this port.)
1147 NEW Fsdb::IO::Reader and Fsdb::IO::Writer now provide reasonably clean
1148 OO I/O interfaces to Fsdb files. Highly recommended if you use fsdb
1149 directly from perl. In the fullness of time I expect to reimplement
1150 the entire thing using these APIs to replace the current dblib.pl which
1151 is still hobbled by its roots in perl4.
1152 NEW dbmapreduce now implements a Google-style map/reduce abstraction,
1153 generalizing dbmultistats.
1154 ENHANCEMENT fsdb now uses the Perl build system (Makefile.PL, etc.),
1155 instead of autoconf. This change paves the way to better perl-5-style
1156 modularization, proper manual pages, input of both listize and colize
1157 format for every program, and world peace.
1158 ENHANCEMENT dblib.pl is now moved to Fsdb::Old.pm.
1159 BUG FIX dbmultistats now propagates its format argument (-f). Bug and
1160 fix from Martin Lukac (thanks!).
1161 ENHANCEMENT dbformmail documentation now is clearer that it doesn't
1162 send the mail, you have to run the shell script it writes. (Problem
1163 observed by Unkyu Park.)
1164 ENHANCEMENT adapted to autoconf-2.61 (and then these changes were
1165 discarded in favor of The Perl Way).
1166 BUG FIX dbmultistats memory usage corrected (O(# tags), not O(1))
1167 ENHANCEMENT dbmultistats can now optionally run with pre-grouped input
1168 in O(1) memory
1169 ENHANCEMENT dbroweval -N was finally implemented (eat comments)
1170
1171 2.0, 25-Jan-08
1172 A quiet 2.0 release (gearing up towards complete)
1173
1174 ENHANCEMENT: shifting old programs to Perl modules, with the front-end
1175 program as just a wrapper. In the short-term, this change just means
1176 programs have real man pages. In the long-run, it will mean that one
1177 can run a pipeline in a single Perl program. So far: dbcol, dbroweval,
1178 the new dbrowcount, dbsort, the new dbmerge, the old "dbstats" (renamed
1179 dbcolstats), dbcolrename, dbcolcreate.
1180 NEW: Fsdb::Filter::dbpipeline is an internal-only module that lets one
1181 use fsdb commands from within perl (via threads).
1182 It also provides perl function aliases for the internal modules, so
1183 a string of fsdb commands in perl is nearly as terse as in the
1184 shell:
1185
1186 use Fsdb::Filter::dbpipeline qw(:all);
1187 dbpipeline(
1188 dbrow(qw(name test1)),
1189 dbroweval('_test1 += 5;')
1190 );
1191
1192 INCOMPATIBLE CHANGE: The old dbcolstats has been renamed
1193 dbcolstatscores. The new dbcolstats does the same thing as the old
1194 dbstats. This incompatibility is unfortunate but normalizes program
1195 names.
1196 CHANGE: The new dbcolstats program always outputs "-" (the default
1197 empty value) for statistics it cannot compute (for example, standard
1198 deviation if there is only one row), instead of the old mix of "-" and
1199 "na".
1200 INCOMPATIBLE CHANGE: The old dbcolstats program, now called
1201 dbcolstatscores, also has different arguments. The "-t mean,stddev"
1202 option is now "--tmean mean --tstddev stddev". See dbcolstatscores for
1203 details.
1204 INCOMPATIBLE CHANGE: dbcolcreate now assumes all new columns get the
1205 default value rather than requiring each column to have an initial
1206 constant value. To change the initial value, use the new "-e" option.
1207 NEW: dbrowcount counts rows, an almost-subset of dbcolstats's "n"
1208 output (except without differentiating numeric/non-numeric input), or
1209 the equivalent of "dbstripcomments | wc -l".
1210 NEW: dbmerge merges two sorted files. This functionality was previously
1211 embedded in dbsort.
1212 INCOMPATIBLE CHANGE: dbjoin's "-i" option to include non-matches is now
1213 renamed "-a", so as to not conflict with the new standard option "-i"
1214 for input file.
1215
1216 2.1, 6-Apr-08
1217 Another alpha 2.0, but now all converted programs
1218 understand both listize and colize format
1219
1220 ENHANCEMENT: shifting more old programs to Perl modules. New in 2.1:
1221 dbcolneaten, dbcoldefine, dbcolhisto, dblistize, dbcolize, dbrecolize
1222 ENHANCEMENT dbmerge now handles an arbitrary number of input files, not
1223 just exactly two.
1224 NEW dbmerge2 is an internal routine that handles merging exactly two
1225 files.
1226 INCOMPATIBLE CHANGE dbjoin now specifies inputs like dbmerge2, rather
1227 than assuming the first two arguments were tables (as in fsdb-1).
1228 The old dbjoin argument "-i" is now "-a" or "--type=outer".
1229
1230 A minor change: comments in the source files for dbjoin are now
1231 intermixed with output rather than being delayed until the end.
1232
1233 ENHANCEMENT dbsort now no longer produces warnings when null values are
1234 passed to numeric comparisons.
1235 BUG FIX dbroweval now once again works with code that lacks a trailing
1236 semicolon. (This bug fixes a regression from 1.15.)
1237 INCOMPATIBLE CHANGE dbcolneaten's old "-e" option (to avoid end-of-line
1238 spaces) is now "-E" to avoid conflicts with the standard empty field
1239 argument.
1240 INCOMPATIBLE CHANGE dbcolhisto's old "-e" option is now "-E" to avoid
1241 conflicts. And its "-n", "-s", and "-w" are now "-N", "-S", and "-W" to
1242 correspond.
1243 NEW dbfilealter replaces dbrecolize, dblistize, and dbcolize, but with
1244 different options.
1245 ENHANCEMENT The library routines "Fsdb::IO" now understand both list-
1246 format and column-format data, so all converted programs can now
1247 automatically read either format. This capability was one of the
1248 milestone goals for 2.0, so yea!
1249
1250 2.2, 23-May-08
1251 Release 2.2 is another 2.x alpha release. Now most of the commands are
1252 ported, but a few remain, and I plan one last incompatible change (to
1253 the file header) before 2.x final.
1254
1255 ENHANCEMENT
1256 shifting more old programs to Perl modules. New in 2.2:
1257 dbrowaccumulate, dbformmail, dbcolmovingstats, dbrowuniq,
1258 dbrowdiff, dbcolmerge, dbcolsplittocols, dbcolsplittorows,
1259 dbmapreduce, dbmultistats, dbrvstatdiff. Also dbrowenumerate
1260 exists only as a front-end (command-line) program.
1261
1262 INCOMPATIBLE CHANGE
1263 The following programs have been dropped from fsdb-2.x:
1264 dbcoltighten, dbfilesplit, dbstripextraheaders,
1265 dbstripleadingspace.
1266
1267 NEW combined_log_format_to_db to convert Apache logfiles
1268
1269 INCOMPATIBLE CHANGE
1270 Options to dbrowdiff are now -B and -I, not -a and -i.
1271
1272 INCOMPATIBLE CHANGE
1273 dbstripcomments is now dbfilestripcomments.
1274
1275 BUG FIXES
1276 dbcolneaten better handles empty columns; dbcolhisto warning
1277 suppressed (actually a bug in high-bucket handling).
1278
1279 INCOMPATIBLE CHANGE
1280 dbmultistats now requires a "-k" option in front of the key (tag)
1281 field, or if none is given, it will group by the first field (both
1282 like dbmapreduce).
1283
1284 KNOWN BUG
1285 dbmultistats with quantile option doesn't work currently.
1286
1287 INCOMPATIBLE CHANGE
1288 dbcoldiff is renamed dbrvstatdiff.
1289
1290 BUG FIXES
1291 dbformmail was leaving its log message as a command, not a
1292 comment. Oops. No longer.
1293
1294 2.3, 27-May-08 (alpha)
1295 Another alpha release, this one just to fix the critical dbjoin bug
1296 listed below (that happens to have blocked my MP3 jukebox :-).
1297
1298 BUG FIX
1299 Dbsort no longer hangs if given an input file with no rows.
1300
1301 BUG FIX
1302 Dbjoin now works with unsorted input coming from a pipeline (like
1303 stdin). Perl-5.8.8 has a bug (?) that was making this case
1304 fail---opening stdin in one thread, reading some, then reading more
1305 in a different thread caused an lseek which works on files, but
1306 fails on pipes like stdin. Go figure.
1307
1308 BUG FIX / KNOWN BUG
1309 The dbjoin fix also fixed dbmultistats -q (it now gives the right
1310 answer). However, a new bug appeared, with messages like:
1311 Attempt to free unreferenced scalar: SV 0xa9dd0c4, Perl
1312 interpreter: 0xa8350b8 during global destruction. So the
1313 dbmultistats_quartile test is still disabled.
1314
1315 2.4, 18-Jun-08
1316 Another alpha release, mostly to fix minor usability problems in
1317 dbmapreduce and client functions.
1318
1319 ENHANCEMENT
1320 dbrow now defaults to running user supplied code without warnings
1321 (as with fsdb-1.x). Use "--warnings" or "-w" to turn them back on.
1322
1323 ENHANCEMENT
1324 dbroweval can now write different format output than the input,
1325 using the "-m" option.
1326
1327 KNOWN BUG
1328 dbmapreduce emits warnings on perl 5.10.0 about "Unbalanced string
1329 table refcount" and "Scalars leaked" when run with an external
1330 program as a reducer.
1331
1332 dbmultistats emits the warning "Attempt to free unreferenced
1333 scalar" when run with quartiles.
1334
1335 In each case the output is correct. I believe these can be
1336 ignored.
1337
1338 CHANGE
1339 dbmapreduce no longer logs a line for each reducer that is invoked.
1340
1341 2.5, 24-Jun-08
1342 Another alpha release, fixing more minor bugs in "dbmapreduce" and
1343 lossage in "Fsdb::IO".
1344
1345 ENHANCEMENT
1346 dbmapreduce can now tolerate non-map-aware reducers that pass back
1347 the key column in put. It also passes the current key as the last
1348 argument to external reducers.
1349
1350 BUG FIX
1351 Fsdb::IO::Reader, correctly handle "-header" option again. (Broken
1352 since fsdb-2.3.)
1353
1354 2.6, 11-Jul-08
1355 Another alpha release, needed to fix DaGronk. One new port, small bug
1356 fixes, and an important fix to dbmapreduce.
1357
1358 ENHANCEMENT
1359 shifting more old programs to Perl modules. New in 2.6:
1360 dbcolpercentile.
1361
1362 INCOMPATIBLE CHANGE and ENHANCEMENTS dbcolpercentile arguments changed;
1363 use "--rank" to require ranking instead of "-r". Also, "--ascending"
1364 and "--descending" can now be specified separately, both for
1365 "--percentile" and "--rank".
1366 BUG FIX
1367 Sigh, the sense of the --warnings option in dbrow was inverted. No
1368 longer.
1369
1370 BUG FIX
1371 I found and fixed the string leaks (errors like "Unbalanced string
1372 table refcount" and "Scalars leaked") in dbmapreduce and
1373 dbmultistats. (All "IO::Handle"s in threads must be manually
1374 destroyed.)
1375
1376 BUG FIX
1377 The "-C" option to specify the column separator in dbcolsplittorows
1378 now works again (broken since it was ported).
1379
1380 2.7, 30-Jul-08 beta
1381
1382 The beta release of fsdb-2.x. Finally, all programs are ported. As
1383 statistics, the number of lines of non-library code doubled from 7.5k
1384 to 15.5k. The libraries are much more complete, going from 866 to 5164
1385 lines. The overall number of programs is about the same, although 19
1386 were dropped and 11 were added. The number of test cases has grown
1387 from 116 to 175. All programs are now in perl-5, no more shell scripts
1388 or perl-4. All programs now have manual pages.
1389
1390 Although this is a major step forward, I still expect to rename "jdb"
1391 to "fsdb".
1392
1393 ENHANCEMENT
1394 shifting more old programs to Perl modules. New in 2.7:
1395 dbcolscorrelate, dbcolsregression, cgi_to_db, dbfilevalidate,
1396 db_to_csv, csv_to_db, db_to_html_table, kitrace_to_db,
1397 tcpdump_to_db, tabdelim_to_db, ns_to_db.
1398
1399 INCOMPATIBLE CHANGE
1400 The following programs have been dropped from fsdb-2.x: db2dcliff,
1401 dbcolmultiscale, crl_to_db, ipchain_logs_to_db. They may come
1402 back, but seemed overly specialized. The program
1403 dbrowsplituniq was dropped because it is superseded by dbmapreduce.
1404 dmalloc_to_db was dropped pending test cases and examples.
1405
1406 ENHANCEMENT
1407 dbfilevalidate now has a "-c" option to correct errors.
1408
1409 NEW html_table_to_db provides the inverse of db_to_html_table.
1410
1411 2.8, 5-Aug-08
1412 Change header format, preserving forwards compatibility.
1413
1414 BUG FIX
1415 Complete editing pass over the manual, making sure it aligns with
1416 fsdb-2.x.
1417
1418 SEMI-COMPATIBLE CHANGE
1419 The header of fsdb files has changed; it is now #fsdb, not #h (or
1420 #L), and parsing of -F and -R is also different. See dbfilealter
1421 for the new specification. The v1 file format will be read,
1422 compatibly, but not written.
1423
1424 BUG FIX
1425 dbmapreduce now tolerates comments that precede the first key,
1426 instead of failing with an error message.
1427
1428 2.9, 6-Aug-08
1429 Still in beta; just a quick bug-fix for dbmapreduce.
1430
1431 ENHANCEMENT
1432 dbmapreduce now generates plausible output when given no rows of
1433 input.
1434
1435 2.10, 23-Sep-08
1436 Still in beta, but picking up some bug fixes.
1437
1438 ENHANCEMENT
1439 dbmapreduce now generates plausible output when given no rows of
1440 input.
1441
1442 ENHANCEMENT
1443 The dbroweval warnings option was backwards; now corrected. As a
1444 result, warnings in user code now default off (like in fsdb-1.x).
1445
1446 BUG FIX
1447 dbcolpercentile now defaults to assuming the target column is
1448 numeric. The new option "-N" allows selection of a non-numeric
1449 target.
1450
1451 BUG FIX
1452 dbcolscorrelate now includes "--sample" and "--nosample" options to
1453 compute the sample or full population correlation coefficients.
1454 Thanks to Xue Cai for finding this bug.
1455
1456 2.11, 14-Oct-08
1457 Still in beta, but picking up some bug fixes.
1458
1459 ENHANCEMENT
1460 html_table_to_db is now more aggressive about filling in empty
1461 cells with the official empty value, rather than leaving them blank
1462 or as whitespace.
1463
1464 ENHANCEMENT
1465 dbpipeline now catches failures during pipeline element setup and
1466 exits reasonably gracefully.
1467
1468 BUG FIX
1469 dbsubprocess now reaps child processes, thus avoiding running out
1470 of processes when used a lot.
1471
1472 2.12, 16-Oct-08
1473 Finally, a full (non-beta) 2.x release!
1474
1475 INCOMPATIBLE CHANGE
1476 Jdb has been renamed Fsdb, the flatfile-streaming database. This
1477 change affects all internal Perl APIs, but no shell command-level
1478 APIs. While Jdb served well for more than ten years, it is easily
1479 confused with the Java debugger (even though Jdb was there first!).
1480 It also is too generic to work well in web search engines.
1481 Finally, Jdb stands for ``John's database'', and we're a bit beyond
1482 that. (However, some call me the ``file-system guy'', so one could
1483 argue it retains that meaning.)
1484
1485 If you just used the shell commands, this change should not affect
1486 you. If you used the Perl-level libraries directly in your code,
1487 you should be able to rename "Jdb" to "Fsdb" to move to 2.12.
1488
1489 The jdb-announce list has not yet been renamed, but it will be shortly.
1490
1491 With this release I've accomplished everything I wanted to in
1492 fsdb-2.x. I therefore expect to return to boring, bugfix releases.
1493
1494 2.13, 30-Oct-08
1495 BUG FIX
1496 dbrowaccumulate now treats non-numeric data as zero by default.
1497
1498 BUG FIX
1499 Fixed a perl-5.10ism in dbmapreduce that breaks that program under
1500 5.8. Thanks to Martin Lukac for reporting the bug.
1501
1502 2.14, 26-Nov-08
1503 BUG FIX
1504 Improved documentation for dbmapreduce's "-f" option.
1505
1506 ENHANCEMENT
1507 dbcolmovingstats now computes a moving standard deviation in
1508 addition to a moving mean.
1509
1510 2.15, 13-Apr-09
1511 BUG FIX
1512 Fix a make install bug reported by Shalindra Fernando.
1513
1514 2.16, 14-Apr-09
1515 BUG FIX
1516 Another minor release bug: on some systems programize_module loses
1517 executable permissions. Again reported by Shalindra Fernando.
1518
1519 2.17, 25-Jun-09
1520 TYPO FIXES
1521 Typo in the dbroweval manual fixed.
1522
1523 IMPROVEMENT
1524 There is no longer a comment line to label columns in dbcolneaten,
1525 instead the header line is tweaked to line up. This change
1526 restores the Jdb-1.x behavior, and means that repeated runs of
1527 dbcolneaten no longer add comment lines each time.
1528
1529 BUG FIX
1530 It turns out dbcolneaten was not correctly handling trailing
1531 spaces when given the "-E" option to suppress them. This
1532 regression is now fixed.
1533
1534 EXTENSION
1535 dbroweval(1) can now handle direct references to the last row via
1536 $lfref, a dubious but now documented feature.
1537
1538 BUG FIXES
1539 Separators set with "-C" in dbcolmerge and dbcolsplittocols were
1540 not properly setting the heading, and null fields were not
1541 recognized. The first bug was reported by Martin Lukac.
1542
1543 2.18, 1-Jul-09 A minor release
1544 IMPROVEMENT
1545 Documentation for Fsdb::IO::Reader has been improved.
1546
1547 IMPROVEMENT
1548 The package should now be PGP-signed.
1549
1550 2.19, 10-Jul-09
1551 BUG FIX
1552 Internal improvements to debugging output and robustness of
1553 dbmapreduce and dbpipeline. TEST/dbpipeline_first_fails.cmd re-
1554 enabled.
1555
1556 2.20, 30-Nov-09 (A collection of minor bugfixes, plus a build against
1557 Fedora 12.)
1558 BUG FIX
1559 Logging for dbmapreduce with code refs is now stable (it no longer
1560 includes a hex pointer to the code reference).
1561
1562 BUG FIX
1563 Better handling of mixed blank lines in Fsdb::IO::Reader (see test
1564 case dbcolize_blank_lines.cmd).
1565
1566 BUG FIX
1567 html_table_to_db now handles multi-line input better, and handles
1568 tables with COLSPAN.
1569
1570 BUG FIX
1571 dbpipeline now cleans up threads in an "eval" to prevent "cannot
1572 detach a joined thread" errors that popped up in perl-5.10.
1573 Hopefully this prevents a race condition that causes the test
1574 suites to hang about 20% of the time (in dbpipeline_first_fails).
1575
1576 IMPROVEMENT
1577 dbmapreduce now detects and correctly fails when the input and
1578 reducer have incompatible field separators.
1579
1580 IMPROVEMENT
1581 dbcolstats, dbcolhisto, dbcolscorrelate, dbcolsregression, and
1582 dbrowcount now all take an "-F" option to let one specify the
1583 output field separator (so they work better with dbmapreduce).
1584
1585 BUG FIX
1586 An omitted "-k" from the manual page of dbmultistats is now there.
1587 Bug reported by Unkyu Park.
1588
1589 2.21, 17-Apr-10 bug fix release
1590 BUG FIX
1591 Fsdb::IO::Writer now no longer fails with -outputheader => never
1592 (an obscure bug).
1593
1594 IMPROVEMENT
1595 Fsdb (in the warnings section) and dbcolstats now more carefully
1596 document how they handle (and do not handle) numerical precision
1597 problems, and other general limits. Thanks to Yuri Pradkin for
1598 prompting this documentation.
1599
1600 IMPROVEMENT
1601 "Fsdb::Support::fullname_to_sortkey" is now restored from "Jdb".
1602
1603 IMPROVEMENT
1604 Documentation for multiple styles of input approaches (including
1605 performance description) added to Fsdb::IO.
1606
1607 2.22, 2010-10-31 One new tool dbcolcopylast and several bug fixes for Perl
1608 5.10.
1609 BUG FIX
1610 dbmerge now correctly handles n-way merges. Bug reported by Yuri
1611 Pradkin.
1612
1613 INCOMPATIBLE CHANGE
1614 dbcolneaten now defaults to not padding the last column.
1615
1616 ADDITION
1617 dbrowenumerate now takes -N NewColumn to give the new column a name
1618 other than "count". Feature requested by Mike Rouch in January
1619 2005.
1620
1621 ADDITION
1622 New program dbcolcopylast copies the last value of a column into a
1623 new column copylast_column of the next row. New program requested
1624 by Fabio Silva; useful for converting dbmultistats output into
1625 dbrvstatdiff input.
1626
1627 BUG FIX
1628 Several tools (particularly dbmapreduce and dbmultistats) would
1629 report errors like "Unbalanced string table refcount: (1) for
1630 "STDOUT" during global destruction" on exit, at least on certain
1631 versions of Perl (for me on 5.10.1), but similar errors have been
1632 off-and-on for several Perl releases. Although I think my code
1633 looked OK, I worked around this problem with a different way of
1634 handling standard IO redirection.
1635
1636 2.23, 2011-03-10 Several small portability bugfixes; improved dbcolstats
1637 for large datasets
1638 IMPROVEMENT
1639 Documentation to dbrvstatdiff was changed to use "sd" to refer to
1640 standard deviation, not "ss" (which might be confused with sum-of-
1641 squares).
1642
1643 BUG FIX
1644 This documentation about dbmultistats was missing the -k option in
1645 some cases.
1646
1647 BUG FIX
1648 dbmapreduce was failing on MacOS-10.6.3 for some tests with the
1649 error
1650
1651 dbmapreduce: cannot run external dbmapreduce reduce program (perl TEST/dbmapreduce_external_with_key.pl)
1652
1653 The problem seemed to be only in the error, not in operation. On
1654 MacOS, the error is now suppressed. Thanks to Alefiya Hussain for
1655 providing access to a Mac system that allowed debugging of this
1656 problem.
1657
1658 IMPROVEMENT
1659 The csv_to_db command requires an external Perl library
1660 (Text::CSV_XS). On computers that lack this optional library,
1661 previously Fsdb would configure with a warning and then test cases
1662 would fail. Now those test cases are skipped with an additional
1663 warning.
1664
1665 BUG FIX
1666 The test suite now supports alternative valid output, as a hack to
1667 account for last-digit floating point differences. (Not very
1668 satisfying :-( )
1669
1670 BUG FIX
1671 dbcolstats output for confidence intervals on very large datasets
1672 has changed. Previously it failed for more than 2^31-1 records,
1673 and handling of T-Distributions with thousands of rows was a bit
1674 dubious. Now datasets with more than 10000 rows are considered
1675 infinitely large and hopefully correctly handled.
1676
1677 2.24, 2011-04-15 Improvements to fix an old bug in dbmapreduce with
1678 different field separators
1679 IMPROVEMENT
1680 The dbfilealter command had a "--correct" option to work around
1681 incompatible field separators, but it did nothing. Now it
1682 does the correct but sad, data-losing thing.
1683
1684 IMPROVEMENT
1685 The dbmultistats command previously failed with an error message
1686 when invoked on input with a non-default field separator. The root
1687 cause was the underlying dbmapreduce that did not handle the case
1688 of reducers that generated output with a different field separator
1689 than the input. We now detect and repair incompatible field
1690 separators. This change corrects a problem originally documented
1691 and detected in Fsdb-2.20. Bug re-reported by Unkyu Park.
1692
1693 2.25, 2011-08-07 Two new tools, xml_to_db and dbfilepivot, and a bugfix for
1694 two people.
1695 IMPROVEMENT
1696 kitrace_to_db now supports a --utc option, which also fixes this
1697 test case for users outside of the Pacific time zone. Bug reported
1698 by David Graff, and also by Peter Desnoyers (within a week of each
1699 other :-)
1700
1701 NEW xml_to_db can convert simple, very regular XML files into Fsdb.
1702
1703 NEW dbfilepivot "pivots" a file, converting multiple rows corresponding
1704 to the same entity into a single row with multiple columns.
1705
1706 2.26, 2011-12-12 Bug fixes, particularly for perl-5.14.2.
1707 BUG FIX
1708 Bugs fixed in Fsdb::IO::Reader(3) manual page.
1709
1710 BUG FIX
1711 Fixed problems where dbcolstats was truncating floating point
1712 numbers when sorting. This strange behavior happens as of
1713 perl-5.14.2 and it seems like a Perl bug. I've worked around it
1714 for the test suites, but I'm a bit nervous.
1715
1716 2.27, 2012-11-15 Accumulated bug fixes.
1717 IMPROVEMENT
1718 csv_to_db now reports errors in CSV input with real diagnostics.
1719
1720 IMPROVEMENT
1721 dbcolmovingstats can now compute median, when given the "-m"
1722 option.
1723
1724 BUG FIX
1725 dbcolmovingstats non-numeric handling (the "-a" option) now works
1726 properly.
1727
1728 DOCUMENTATION
1729 The internal t/test_command.t test framework is now documented.
1730
1731 BUG FIX
1732 dbrowuniq now correctly handles the case where there is no input
1733 (previously it output a blank line, which is a malformed fsdb
1734 file). Thanks to Yuri Pradkin for reporting this bug.
1735
1736 2.28, 2012-11-15 A quick release to fix most rpmlint errors.
1737 BUG FIX
1738 Fixed a number of minor release problems (wrong permissions, old
1739 FSF address, etc.) found by rpmlint.
1740
1741 2.29, 2012-11-20 a quick release for CPAN testing
1742 IMPROVEMENT
1743 Tweaked the RPM spec.
1744
1745 IMPROVEMENT
1746 Modified Makefile.PL to fail gracefully on Perl installations that
1747 lack threads. (Without this fix, I get massive failures in the
1748 non-ithreads test system.)
1749
1750 2.30, 2012-11-25 improvements to perl portability
1751 BUG FIX
1752 Removed unicode character in documentation of dbcolscorrelate so pod
1753 tests will pass. (Sigh, that should work :-( )
1754
1755 BUG FIX
1756 Fixed test suite failures on 5 tests (dbcolcreate_double_creation
1757 was the first) due to Carp's addition of a period. This problem
1758 was breaking Fsdb on perl-5.17. Thanks to Michael McQuaid for
1759 helping diagnose this problem.
1760
1761 IMPROVEMENT
1762 The test suite now prints out the names of tests it tries.
1763
1764 2.31, 2012-11-28 A release with actual improvements to dbfilepivot and
1765 dbrowuniq.
1766 BUG FIX
1767 Documentation fixes: typos in dbcolscorrelate, bugs in
1768 dbfilepivot, clarification for comment handling in
1769 Fsdb::IO::Reader.
1770
1771 IMPROVEMENT
1772 Previously dbfilepivot assumed the input was grouped by keys and
1773 didn't verify that pre-condition. Now there is no pre-condition (it
1774 will sort the input by default), and it checks if the invariant is
1775 violated.
1776
1777 BUG FIX
1778 Previously dbfilepivot failed if the input had comments (oops :-);
1779 no longer.
1780
1781 IMPROVEMENT
1782 Now dbrowuniq has the "-L" option to preserve the last unique row
1783 (instead of the first), a common idiom.
1784
1785 2.32, 2012-12-21 Test suites should now be more numerically robust.
1786 NEW New dbfilediff does fsdb-aware file differencing. It does not do
1787 smart intuition of add/removes like Unix diff(1), but it does know
1788 about columns, and with "-E", it does numeric-aware differences.
1789
1790 IMPROVEMENT
1791 Test suites that are numeric now use dbfilediff to do numeric-aware
1792 comparisons, so the test suite should now be robust to slightly
1793 different computers and operating systems and compilers than
1794 exactly what I use.
1795
1796 2.33, 2012-12-23 Minor fixes to some test cases.
1797 IMPROVEMENT
1798 dbfilediff and dbrowuniq now support the "-N" option to give the
1799 new column a different name. (And test cases where this
1800 duplication mattered have been fixed.)
1801
1802 IMPROVEMENT
1803 dbrvstatdiff now shows the t-test breakpoint with a reasonable
1804 number of floating point digits.
1805
1806 BUG FIX
1807 Fixed a numerical stability problem in the dbroweval_last test
1808 case.
1809
1811 2.34, 2013-02-10 Parallelism in dbmerge.
1812 IMPROVEMENT
1813 Documentation for dbjoin now includes resource requirements.
1814
1815 IMPROVEMENT
1816 Default memory usage for dbsort is now about 256MB. (The world
1817 keeps moving forward.)
1818
1819 IMPROVEMENT
1820 dbmerge now does merging in parallel. As a side-effect, dbsort
1821 should be faster when input overflows memory. The level of
1822 parallelism can be limited with the "--parallelism" option. (There
1823 is more work to do here, but we're off to a start.)
1824
1825 2.35, 2013-02-23 Improvements to dbmerge parallelism
1826 BUG FIX
1827 Fsdb temporary files are now created more securely (with
1828 File::Temp).
1829
1830 IMPROVEMENT
1831 Programs that sort or merge on fields (dbmerge2, dbmerge, dbsort,
1832 dbjoin) now report an error if no fields on which to join or merge
1833 are given.
1834
1835 IMPROVEMENT
1836 Parallelism in dbmerge should now be more consistent, with less
1837 starting and stopping.
1838
1839 IMPROVEMENT In dbmerge, the "--xargs" option lets one give input
1840 filenames on standard input, rather than the command line. This feature
1841 paves the way for faster dbsort for large inputs (by pipelining sorting
1842 and merging), expected in the next release.
1843
1844 2.36, 2013-02-25 dbsort pipelines with dbmerge
1845 IMPROVEMENT For large inputs, dbsort now pipelines sorting and merging,
1846 allowing earlier processing.
1847 BUG FIX Since 2.35, dbmerge delayed cleanup of intermediate files,
1848 thereby requiring extra disk space.
1849
1850 2.37, 2013-02-26 quick bugfix to support parallel sort and merge from
1851 recent releases
1852 BUG FIX Since 2.35, dbmerge delayed removal of input files given by
1853 "--xargs". This problem is now fixed.
1854
1855 2.38, 2013-04-29 minor bug fixes
1856 CLARIFICATION
1857 Configure now rejects Windows since tests seem to hang on some
1858 versions of Windows. (I would love help from a Windows developer
1859 to get this problem fixed, but I cannot do it.) See
1860 https://rt.cpan.org/Ticket/Display.html?id=84201.
1861
1862 IMPROVEMENT
1863 All programs that use temporary files (dbcolpercentile,
1864 dbcolscorrelate, dbcolstats, dbcolstatscores) now take the "-T"
1865 option and set the temporary directory consistently.
1866
1867 In addition, error messages are better when the temporary directory
1868 has problems. Problem reported by Liang Zhu.
1869
1870 BUG FIX
1871 dbmapreduce was failing with external, map-reduce aware reducers
1872 (when invoked with -M and an external program). (Sigh, did this
1873 case ever work?) This case should now work. Thanks to Yuri
1874 Pradkin for reporting this bug (in 2011).
1875
1876 BUG FIX
1877 Fixed perl-5.10 problem with dbmerge. Thanks to Yuri Pradkin for
1878 reporting this bug (in 2013).
1879
1880 2.39, 2013-05-31 quick release for the dbrowuniq extension
1881 BUG FIX
1882 Actually in 2.38, the Fedora .spec got cleaner dependencies.
1883 Suggestion from Christopher Meng via
1884 <https://bugzilla.redhat.com/show_bug.cgi?id=877096>.
1885
1886 ENHANCEMENT
1887 Fsdb files are now explicitly set into UTF-8 encoding, unless one
1888 specifies "-encoding" to "Fsdb::IO".
1889
1890 ENHANCEMENT
1891 dbrowuniq now supports "-I" for incremental counting.
1892
1893 2.40, 2013-07-13 small bug fixes
1894 BUG FIX
1895 dbsort now has more respect for a user-given temporary directory;
1896 it no longer is ignored for merging.
1897
1898 IMPROVEMENT
1899 dbrowuniq now has options to output the first, last, and both first
1900 and last rows of a run ("-F", "-L", and "-B").
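
The first/last/both choice for a run of rows can be illustrated with a
small Python sketch. This is not dbrowuniq's implementation; the function
name uniq_runs, its "mode" argument, and the sample rows are hypothetical,
and a "run" is assumed to mean adjacent rows that share the same key.

```python
from itertools import groupby

def uniq_runs(rows, key, mode="first"):
    """Yield the first, last, or both rows of each run of adjacent rows
    that share the same key (hypothetical helper, for illustration only)."""
    for _, run in groupby(rows, key=key):
        run = list(run)
        if mode == "first":
            yield run[0]
        elif mode == "last":
            yield run[-1]
        elif mode == "both":
            yield run[0]
            # A singleton run is emitted once, not twice.
            if len(run) > 1:
                yield run[-1]
        else:
            raise ValueError("mode must be 'first', 'last', or 'both'")

# Made-up rows: (key, value), already grouped by key.
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
by_key = lambda row: row[0]
first_rows = list(uniq_runs(rows, by_key, "first"))
last_rows = list(uniq_runs(rows, by_key, "last"))
both_rows = list(uniq_runs(rows, by_key, "both"))
```

In "both" mode the sketch emits a singleton run only once, which is the
behavior one would expect from such an option.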
1901
1902 BUG FIX
1903 dbrowuniq now correctly handles "-N". Sigh, it didn't work before.
1904
1905 2.41, 2013-07-29 small bug and packaging fixes
1906 ENHANCEMENT
1907 Documentation to dbrvstatdiff improved (inspired by questions from
1908 Qian Kun).
1909
1910 BUG FIX
1911 dbrowuniq no longer duplicates singleton unique lines when
1912 outputting both (with "-B").
1913
1914 BUG FIX
1915 Add missing "XML::Simple" dependency to Makefile.PL.
1916
1917 ENHANCEMENT
1918 Tests now show the diff of the failing output if run with "make
1919 test TEST_VERBOSE=1".
1920
1921 ENHANCEMENT
1922 dbroweval now includes documentation for how to output extra rows.
1923 Suggestion from Yuri Pradkin.
1924
1925 BUG FIX
1926 Several improvements to the Fedora package from Michael Schwendt
1927 via <https://bugzilla.redhat.com/show_bug.cgi?id=877096>, and from
1928 the harsh master that is rpmlint. (I am stymied at teaching it
1929 that "outliers" is spelled correctly. Maybe I should send it
1930 Schneier's book. And an unresolvable invalid-spec-name lurks in
1931 the SRPM.)
1932
1933 2.42, 2013-07-31 A bug fix and packaging release.
1934 ENHANCEMENT
1935 Documentation for dbjoin improved to better describe memory usage. (Based on
1936 problem report by Lin Quan.)
1937
1938 BUG FIX
1939 The .spec is now perl-Fsdb.spec to satisfy rpmlint. Thanks to
1940 Christopher Meng for a specific bug report.
1941
1942 BUG FIX
1943 Test dbroweval_last.cmd no longer has a column that caused failures
1944 because of numerical instability.
1945
1946 BUG FIX
1947 Some tests now better handle bugs in old versions of perl (5.10,
1948 5.12). Thanks to Calvin Ardi for help debugging this on a Mac with
1949 perl-5.12, but the fix should affect other platforms.
1950
1951 2.43, 2013-08-27 Adds in-file compression.
1952 BUG FIX
1953 Changed the sort on TEST/dbsort_merge.cmd to strings (from
1954 numerics) so we're less susceptible to false test-failures due to
1955 floating point IO differences.
1956
1957 EXPERIMENTAL ENHANCEMENT
1958 Yet more parallelism in dbmerge: new "endgame-mode" builds a merge
1959 tree of processes at the end of large merge tasks to get maximal
1960 parallelism. Currently this feature is off by default because it
1961 can hang for some inputs. Enable this experimental feature with
1962 "--endgame".
1963
1964 ENHANCEMENT
1965 "Fsdb::IO" now handles being given "IO::Pipe" objects (as exercised
1966 by dbmerge).
1967
1968 BUG FIX
1969 Handling of NamedTmpfiles now supports concurrency. This fix will
1970 hopefully fix occasional "Use of uninitialized value $_ in string
1971 ne at ...NamedTmpfile.pm line 93." errors.
1972
1973 BUG FIX
1974 Fsdb now requires perl 5.10. This is a bug fix because some test
1975 cases used to require it, but this fact was not properly
1976 documented. (Back-porting to 5.008 would require removing all "//"
1977 operators.)
1978
1979 ENHANCEMENT
1980 Fsdb now handles automatic compression of file contents. Enable
1981 compression with "dbfilealter -Z xz" (or "gz" or "bz2"). All
1982 programs should operate on compressed files and leave the output
1983 with the same level of compression. "xz" is recommended as fastest
1984 and most efficient. "gz" produces unrepeatable output (and so
1985 has no output test); it seems to insist on adding a timestamp.
1986
1987 2.44, 2013-10-02 A major change--all threads are gone.
1988 ENHANCEMENT
1989 Fsdb is now thread free and only uses processes for parallelism.
1990 This change is a big change--the entire motivation for Fsdb-2 was
1991 to exploit parallelism via threading. Parallelism--good, but perl
1992 threading--bad for performance. Horribly bad for performance.
1993 About 20x worse than pipes on my box. (See perl bug #119445 for
1994 the discussion.)
1995
1996 NEW "Fsdb::Support::Freds" provides a thread-like abstraction over
1997 forking, with some nice support for callbacks in the parent upon
1998 child termination.
1999
2000 ENHANCEMENT
2001 Details about removing threads: "dbpipeline" is thread free, and
2002 new tests verify each of its parts. The easy cases are
2003 "dbcolpercentile", "dbcolstats", "dbfilepivot", "dbjoin", and
2004 "dbcolstatscores", each of which uses it in simple ways
2005 (2013-09-09). "dbmerge" is now thread free (2013-09-13), but was a
2006 significant rewrite, which brought "dbsort" along. "dbmapreduce"
2007 is partly thread free (2013-09-21), again as a rewrite, and it
2008 brings "dbmultistats" along. Full "dbmapreduce" support took much
2009 longer (2013-10-02).
2010
2011 BUG FIX
2012 When running with user-only output ("-n"), dbroweval now resets the
2013 output vector $ofref after it has been output.
2014
2015 NEW dbcolcreate will create all columns at the head of each row with
2016 the "--first" option.
2017
2018 NEW dbfilecat will concatenate two files, verifying that they have the
2019 same schema.
2020
2021 ENHANCEMENT
2022 dbmapreduce now passes comments through, rather than eating them as
2023 before.
2024
2025 Also, dbmapreduce now supports a "--" option to prevent
2026 misinterpreting sub-program parameters as for dbmapreduce.
2027
2028 INCOMPATIBLE CHANGE
2029 dbmapreduce no longer figures out if it needs to add the key to the
2030 output. For multi-key-aware reducers, it never does (and cannot).
2031 For non-multi-key-aware reducers, it defaults to add the key and
2032 will now fail if the reducer adds the key (with error "dbcolcreate:
2033 attempt to create pre-existing column..."). In such cases, one
2034 must disable adding the key with the new option "--no-prepend-key".
2035
2036 INCOMPATIBLE CHANGE
2037 dbmapreduce no longer copies the input field separator by default.
2038 For multi-key-aware reducers, it never does (and cannot). For non-
2039 multi-key-aware reducers, it defaults to not copying the field
2040 separator, but it will copy it (the old default) with the
2041 "--copy-fs" option.
2042
2043 2.45, 2013-10-07 cleanup from de-thread-ification
2044 BUG FIX
2045 Corrected a fast busy-wait in dbmerge.
2046
2047 ENHANCEMENT
2048 Endgame mode enabled in dbmerge; it (and also large cases of
2049 dbsort) should now exploit greater parallelism.
2050
2051 BUG FIX
2052 Test case with "Fsdb::BoundedQueue" (gone since 2.44) now removed.
2053
2054 2.46, 2013-10-08 continuing cleanup of our no-threads version
2055 BUG FIX
2056 Fixed some packaging details. (Really, threads are no longer
2057 required, missing tests in the MANIFEST.)
2058
2059 IMPROVEMENT
2060 dbsort now better communicates with the merge process to avoid
2061 bursty parallelism.
2062
2063 Fsdb::IO::Writer now can take "-autoflush => 1" for line-buffered
2064 IO.
2065
2066 2.47, 2013-10-12 test suite cleanup for non-threaded perls
2067 BUG FIX
2068 Removed some stray "use threads" in some test cases. We didn't
2069 need them, and these were breaking non-threaded perls.
2070
2071 BUG FIX
2072 Better handling of Fred cleanup; should fix intermittent
2073 dbmapreduce failures on BSD.
2074
2075 ENHANCEMENT
2076 Improved test framework to show output when tests fail. (This
2077 time, for real.)
2078
2079 2.48, 2014-01-03 small bugfixes and improved release engineering
2080 ENHANCEMENT
2081 Test suites now skip tests for libraries that are missing. (Patch
2082 for missing "IO::Compress::Xz" contributed by Calvin Ardi.)
2083
2084 ENHANCEMENT
2085 Removed references to Jdb in the package specification. Since the
2086 name was changed in 2008, there's no longer a huge need for
2087 backwards compatibility. (Suggestion from Petr Šabata.)
2088
2089 ENHANCEMENT
2090 Test suites now invoke the perl using the path from
2091 $Config{perlpath}. Hopefully this helps testing in environments
2092 where there are multiple installed perls and the default perl is
2093 not the same as the perl-under-test (as happens in
2094 cpantesters.org).
2095
2096 BUG FIX
2097 Added specific encoding to this manpage to account for Unicode.
2098 Required to build correctly against perl-5.18.
2099
2100 2.49, 2014-01-04 bugfix to unicode handling in Fsdb IO (plus minor
2101 packaging fixes)
2102 BUG FIX
2103 Restored a line in the .spec to chmod g-s.
2104
2105 BUG FIX
2106 Unicode decoding is now handled correctly for programs that read
2107 from standard input. (Also: New test scripts cover unicode input
2108 and output.)
2109
2110 BUG FIX
2111 Fix to Fsdb documentation encoding line. Addresses test failure in
2112 perl-5.16 and earlier. (Who knew "encoding" had to be followed by
2113 a blank line.)
2114
2116 2.50, 2014-05-27 a quick release for spec tweaks
2117 ENHANCEMENT
2118 In dbroweval, the "-N" (no output, even comments) option now
2119 implies "-n", and it now suppresses the header and trailer.
2120
2121 BUG FIX
2122 A few more tweaks to the perl-Fsdb.spec from Petr Šabata.
2123
2124 BUG FIX
2125 Fixed 3 uses of "use v5.10" in test suites that were causing test
2126 failures (due to warnings, not real failures) on some platforms.
2127
2128 2.51, 2014-09-05 Feature enhancements to dbcolmovingstats, dbcolcreate,
2129 dbmapreduce, and new sqlselect_to_db
2130 ENHANCEMENT
2131 dbcolcreate now has a "--no-recreate-fatal" that causes it to
2132 ignore creation of existing columns (instead of failing).
2133
2134 ENHANCEMENT
2135 dbmapreduce once again is robust to reducers that output the key;
2136 "--no-prepend-key" is no longer mandatory.
2137
2138 ENHANCEMENT
2139 dbcolsplittorows can now enumerate the output rows with "-E".
2140
2141 BUG FIX
2142 dbcolmovingstats is more mathematically robust. Previously for
2143 some inputs and some platforms, floating point rounding could
2144 sometimes cause square roots of negative numbers.
2145
2146 NEW sqlselect_to_db converts the output of the MySQL or MariaDB select
2147 command into fsdb format.
2148
2149 INCOMPATIBLE CHANGE
2150 dbfilediff now outputs the second row when doing sloppy numeric
2151 comparisons, to better support test suites.
2152
2153 2.52, 2014-11-03 Fixing the test suite for line number changes.
2154 ENHANCEMENT
2155 Test suites changed to be robust to exact line numbers of failures,
2156 since different Perl releases fail on different lines.
2157 <https://bugzilla.redhat.com/show_bug.cgi?id=1158380>
2158
2159 2.53, 2014-11-26 bug fixes and stability improvements to dbmapreduce
2160 ENHANCEMENT
2161 dbfilediff now supports a "--quiet" option.
2162
2163 ENHANCEMENT
2164 Better documentation of dbpipeline_filter.
2165
2166 BUGFIX
2167 Added groff-base and perl-podlators to the Fedora package spec.
2168 Fixes <https://bugzilla.redhat.com/show_bug.cgi?id=1163149>. (Also
2169 in package 2.52-2.)
2170
2171 BUGFIX
2172 An important stability improvement to dbmapreduce. It, plus
2173 dbmultistats, and dbcolstats now support controlled parallelism
2174 with the "--parallelism=N" option. They default to run with the
2175 number of available CPUs. dbmapreduce also moderates its level of
2176 parallelism. Previously it would create reducers as needed,
2177 causing CPU thrashing if reducers ran much slower than data
2178 production.
2179
2180 BUGFIX
2181 The combination of dbmapreduce with dbrowenumerate now works as it
2182 should. (The obscure bug was an interaction with dbcolcreate with
2183 non-multi-key reducers that output their own key. dbmapreduce has
2184 too many useful corner cases.)
2185
2186 2.54, 2014-11-28 fix for the test suite to correct failing tests on not-my-
2187 platform
2188 BUGFIX
2189 Sigh, the test suite now has a test suite. Because, yes, I broke
2190 it, causing many incorrect failures at cpantesters. Now fixed.
2191
2192 2.55, 2015-01-05 many spelling fixes and dbcolmovingstats tests are more
2193 robust to different numeric precision
2194 ENHANCEMENT
2195 dbfilediff now can be extra quiet, as I continue to try to track
2196 down a numeric difference on FreeBSD AMD boxes.
2197
2198 ENHANCEMENT
2199 dbcolmovingstats gave different test output (just reflecting
2200 rounding error) when stddev approaches zero. We now detect and
2201 handle this case. See
2202 <https://rt.cpan.org/Public/Bug/Display.html?id=101220> and thanks
2203 to H. Merijn Brand for the bug report.
2204
2205 BUG FIX
2206 Many, many spelling bugs found by H. Merijn Brand; thanks for the
2207 bug report.
2208
2209 INCOMPATIBLE CHANGE
2210 A number of programs had misspelled "separator" in
2211 "--fieldseparator" and "--columnseparator" options as "seperator".
2212 These are now correctly spelled.
2213
2214 2.56, 2015-02-03 fix against Getopt::Long-2.43's stricter error checking
2215 BUG FIX
2216 Internal argument parsing uses Getopt::Long, but mixed pass-through
2217 and <>. Bug reported by Petr Pisar at
2218 <https://bugzilla.redhat.com/show_bug.cgi?id=1188538>.
2219
2220 BUG FIX
2221 Added missing BuildRequires for "XML::Simple".
2222
2223 2.57, 2015-04-29 Minor changes, with better performance from dbmultistats.
2224 BUG FIX
2225 dbfilecat now honors "--remove-inputs" (previously it didn't).
2226 This omission meant that dbmapreduce (and dbmultistats) would
2227 accumulate files in /tmp when running. Bad news for inputs with 4M
2228 keys.
2229
2230 ENHANCEMENT
2231 dbmultistats should be faster with lots of small keys. dbcolstats
2232 now supports "-k" to get some of the functionality of dbmultistats
2233 (if data is pre-sorted and median/quartiles are not required).
2239
2240 2.58, 2015-04-30 Bugfix in dbmerge
2241 BUG FIX
2242 Fixed a case where dbmerge suffered mojibake in endgame mode. This
2243 bug surfaced when dbsort was applied to large files (big enough to
2244 require merging) with unicode in them; the symptom was something
2245 like:
2246 Wide character in print at /usr/lib64/perl5/IO/Handle.pm line
2247 420, <GEN12> line 111.
2248
2249 2.59, 2016-09-01 Collect a few small bug fixes and documentation
2250 improvements.
2251 BUG FIX
2252 More IO is explicitly marked UTF-8 to avoid Perl's tendency to
2253 mojibake on otherwise valid unicode input. This change helps
2254 html_table_to_db.
2255
2256 ENHANCEMENT
2257 dbcolscorrelate now crossreferences dbcolsregression.
2258
2259 ENHANCEMENT
2260 Documentation for dbrowdiff now clarifies that the default is
2261 baseline mode.
2262
2263 BUG FIX
2264 dbjoin now propagates "-T" into the sorting process (if it is
2265 required). Thanks to Lan Wei for reporting this bug.
2266
2267 2.60, 2016-09-04 Adds support for hash joins.
2268 ENHANCEMENT
2269 dbjoin now supports hash joins with "-t lefthash" and "-t
2270 righthash". Hash joins cache a table in memory, but do not require
2271 that the other table be sorted. They are ideal when joining a
2272 large table against a small one.
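
The idea behind a hash join can be sketched in a few lines of Python.
This is only an illustration of the technique, not dbjoin's code; the
function name hash_join, the "fs" column, and the sample rows are
hypothetical.

```python
from collections import defaultdict

def hash_join(small_rows, large_rows, key):
    """Inner-join two sequences of dict rows on `key`.

    The small table is cached in an in-memory index; the large table is
    only streamed once, in whatever order it arrives (no sort needed).
    """
    index = defaultdict(list)
    for row in small_rows:
        index[row[key]].append(row)
    for row in large_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}

# Made-up data: a small table of experiments and a larger, unsorted
# table of measured durations.
experiments = [
    {"experiment": "ufs_mab_sys", "fs": "ufs"},
    {"experiment": "ufs_rcp_real", "fs": "ufs"},
]
durations = [
    {"experiment": "ufs_rcp_real", "duration": 264.5},
    {"experiment": "ufs_mab_sys", "duration": 37.2},
    {"experiment": "ufs_rcp_real", "duration": 277.9},
]
joined = list(hash_join(experiments, durations, "experiment"))
```

Memory use grows with the cached table only, which is why the approach
suits joining a large table against a small one.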
2273
2274 2.61, 2016-09-05 Support left and right outer joins.
2275 ENHANCEMENT
2276 dbjoin now handles left and right outer joins with "-t left" and
2277 "-t right".
2278
2279 ENHANCEMENT
2280 dbjoin hash joins are now selected with "-m lefthash" and "-m
2281 righthash" (not the shortlived "-t righthash" option).
2282 (Technically this change is incompatible with Fsdb-2.60, but no one
2283 but me ever used that version.)
2284
2285 2.62, 2016-11-29 A new yaml_to_db and other minor improvements.
2286 ENHANCEMENT
2287 Documentation for xml_to_db now includes sample output.
2288
2289 NEW yaml_to_db converts a specific form of YAML to fsdb.
2290
2291 BUG FIX
2292 The test suite now uses "diff -c -b" rather than "diff -cb" to make
2293 OpenBSD-5.9 happier, I hope.
2294
2295 ENHANCEMENT
2296 Comments that log operations at the end of each file now do simple
2297 quoting of spaces. (It is not guaranteed to be fully shell-
2298 compliant.)
2299
2300 ENHANCEMENT
2301 There is a new standard option, "--header", allowing one to specify
2302 an Fsdb header for inputs that lack it. Currently it is supported
2303 by dbcoldefine, dbrowuniq, dbmapreduce, dbmultistats, dbsort,
2304 dbpipeline.
2305
2306 ENHANCEMENT
2307 dbfilepivot now allows the --possible-pivots option, and if it is
2308 provided processes the data in one pass.
2309
2310 ENHANCEMENT
2311 dbroweval logs are now quoted.
2312
2313 2.63, 2017-02-03 Re-add some features supposedly in 2.62 but not, and add
2314 more --header options.
2315 ENHANCEMENT
2316 The option -j is now a synonym for --parallelism. (And several
2317 documentation bugs about this option are fixed.)
2318
2319 ENHANCEMENT
2320 Additional support for "--header" in dbcolmerge, dbcol, dbrow, and
2321 dbroweval.
2322
2323 BUG FIX
2324 Version 2.62 was supposed to have this improvement, but did not
2325 (and now does): dbfilepivot now allows the --possible-pivots
2326 option, and if it is provided processes the data in one pass.
2327
2328 BUG FIX
2329 Version 2.62 was supposed to have this improvement, but did not
2330 (and now does): dbroweval logs are now quoted.
2331
2332 2.64, 2017-11-20 several small bugfixes and enhancements
2333 BUG FIX
2334 In dbroweval, the "next row" option previously did not correctly
2335 set up "_last_fieldname". It now does.
2336
2337 ENHANCEMENT
2338 The csv_to_db converter now has an optional "-F x" option to set
2339 the field separator.
2340
2341 ENHANCEMENT
2342 Finally dbcolsplittocols has a "--header" option, and a new "-N"
2343 option to give the list of resulting output columns.
2344
2345 INCOMPATIBLE CHANGE
2346 Now dbcolstats and dbmultistats produce no output (but a schema)
2347 when given no input but a schema. Previously they gave a null row
2348 of output. The "--output-on-no-input" and
2349 "--no-output-on-no-input" options can control this behavior.
2350
2351 2.65, 2018-02-16 Minor release, bug fix and -F option.
2352 ENHANCEMENT
2353 dbmultistats and dbmapreduce now both take a "-F x" option to set
2354 the field separator.
2355
2356 BUG FIX
2357 Fixed missing "use Carp" in dbcolstats. Also went back and cleaned
2358 up all uses of "croak()". Thanks to Zefram for the bug report.
2359
2360 2.66, 2018-12-20 Critical bug fix in dbjoin.
2361 BUG FIX
2362 Removed old tests from MANIFEST. (Thanks to Hang Guo for reporting
2363 this bug.)
2364
2365 IMPROVEMENT
2366 Errors for non-existing input files now include the bad filename
2367 (before: "cannot setup filehandle", now: "cannot open input: cannot
2368 open TEST/bad_filename").
2369
2370 BUG FIX
2371 Hash joins with three identical rows were failing with the
2372 assertion failure "internal error: confused about overflow" due to
2373 a now-fixed bug.
2374
2375 2.67, 2019-07-10 add support for reading and writing hdfs
2376 IMPROVEMENT
2377 dbformmail now has an "mh" mechanism that writes messages to
2378 individual files (an mh-style mailbox).
2379
2380 BUG FIX
2381 dbrow failed to include the Carp library, leading to fails on
2382 croak.
2383
2384 BUG FIX
2385 Fixed the dbjoin error message for an unsorted right stream, which was
2386 incorrect (it said left).
2387
2388 IMPROVEMENT
2389 All Fsdb programs can now read from and write to HDFS, when files
2390 that start with "hdfs:" are given to -i and -o options.
2391
2392 2.68, 2019-09-19 All programs now support automatic decompression based on
2393 file extension.
2394 IMPROVEMENT
2395 The omitted-possible-error test case for dbfilepivot now has an
2396 alternative output that I saw on some BSD-running systems (thanks
2397 to CPAN).
2398
2399 IMPROVEMENT
2400 dbmerge and dbmerge2 now support "--header". dbmerge2 now gives
2401 better error messages when presented the wrong number of inputs.
2402
2403 BUG FIX
2404 dbsort now works with "--header" even when the file is big (due to
2405 fixes to dbmerge).
2406
2407 IMPROVEMENT
2408 csv_to_db now processes data with the "binary" option, allowing it
2409 to handle newlines embedded in quoted fields.
2410
2411 IMPROVEMENT
2412 All programs now will transparently decompress input files, if they
2413 are given as an input filename that ends with a
2414 standard extension (.gz, .bz2, or .xz).
2415
2416 2.69, 2019-11-22 a small bugfix in dbcolstats
2417 BUG FIX
2418 Filled in the test case for autodecompress, which was missing
2419 for the 2.68 release.
2420
2421 ENHANCEMENT
2422 The groff program is required for build, and the "Makefile.PL"
2423 fails if groff is missing at build time. Thanks to Chris Williams
2424 for suggesting this check, and the CPAN auto-building system for
2425 trying many platforms.
2426
2427 BUG FIX
2428 The dbcolstats program had numerical instability that sometimes
2429 results in failing with a square-root of a negative number when
2430 many values varied right at the edge of floating-point precision.
2431 We now detect and report that case as 0 stddev. Thanks to Hang Guo
2432 for providing a test case.
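
How a negative value can end up under the square root, and how clamping
it reports 0 stddev, can be shown with a short Python sketch. It assumes
the sum and sum-of-squares formula for the sample variance; that
assumption, the function name mean_and_stddev, and the sample data are
mine, not taken from dbcolstats.

```python
import math

def mean_and_stddev(values):
    """Mean and sample standard deviation from running sums.

    With nearly identical inputs, sum_sq - n * mean**2 can round to a
    tiny negative number; it is clamped to zero before the square root.
    """
    n = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    mean = total / n
    if n < 2:
        return mean, 0.0
    variance = (sum_sq - n * mean * mean) / (n - 1)
    if variance < 0.0:
        variance = 0.0  # rounding artifact: report zero spread
    return mean, math.sqrt(variance)

# Values that differ only at the edge of floating-point precision.
edge_mean, edge_sd = mean_and_stddev([0.1] * 10)
# An ordinary data set for comparison.
plain_mean, plain_sd = mean_and_stddev([2, 4, 4, 4, 5, 5, 7, 9])
```

Without the clamp, math.sqrt would raise an error whenever rounding
pushes the computed variance below zero.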
2433
2434 2.70, 2020-11-12 Some small quality-of-life enhancements and corner-case
2435 bugfixes.
2436 ENHANCEMENT
2437 dbcol can now take an option "-a" to include all columns, allowing
2438 reordering of certain columns while passing the rest through.
2439
2440 ENHANCEMENT
2441 dbrowuniq and dbmerge now buffer comments in a way that the last
2442 row of data output is no longer in the last block of comments.
2443 (The data is identical, but for humans looking at output, this
2444 change makes it less likely to lose the last row.)
2445
2446 BUG FIX
2447 dbmultistats and dbpipeline documentation now indicates that they
2448 support "--header" (something they have done since version 2.62 of
2449 2016-11-29, but which is only now documented).
2450
2451 ENHANCEMENT
2452 dbcolcreate now supports "--header".
2453
2454 BUG FIX
2455 Fixed several spelling errors in deprecated programs and removed
2456 information about the no-longer existing FreeBSD and MacOS ports.
2457 Thanks to Calvin Ardi for the patch.
2458
2459 BUG FIX
2460 dbmerge now handles --xargs when only one file is provided (and
2461 passes the file through unchanged). It also throws a clean error
2462 with --xargs if zero files are provided. (To support dbmerge,
2463 dbcol now has an internal "--saveoutput" option.) Thanks to Yuri
2464 Pradkin for reporting the unhandled corner-case.
2465
2466 2.71, 2020-11-16 Fix a race condition breaking test suites.
2467 BUG FIX
2468 Suppressed a race condition in dbcolmerge that was sometimes throwing the
2469 error "Fsdb::Support::Freds: ending, but running process:
2470 dbmerge:xargs" in the dbmerge_0_xargs test case, on exit.
2471
2472 2.72, 2020-12-01 A small bug and a packaging improvement.
2473 BUG FIX
2474 dbcolhisto now handles the degenerate case where everything has the
2475 same value (previously it would throw "illegal division by zero").
2476
2477 ENHANCEMENT
2478 The spec for Fedora now includes "make" as BuildRequires, something
2479 required for Fedora 34.
2480
2481 2.73, 2021-05-18 Updates dbcolpercentile with "--weighted", and with more
2482 ipv6.
2483 ENHANCEMENT
2484 dbcolpercentile now has a "--weighted" option.
2485
2486 ENHANCEMENT
2487 The new Fsdb::Support::IPv6 package includes ipv6_normalize and
2488 ipv6_zeroize to rewrite ipv6 print addresses in IPv6 normal form,
2489 with a 0 in each 4-nybble field.
2490
2491 2.74, 2021-06-23 More ipv6.
2492 ENHANCEMENT
2493 Fsdb::Support::IPv6 package includes ipv6_fullhex to rewrite ipv6
2494 print addresses as full, 128-bit hex values.
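
What a "full, 128-bit hex value" looks like can be shown with Python's
standard ipaddress module. This sketch does not call the Perl package;
the function names ipv6_to_fullhex and fullhex_to_ipv6 are hypothetical
stand-ins, and the exact output format of the Perl functions is assumed
rather than verified here.

```python
import ipaddress

def ipv6_to_fullhex(addr_text):
    """Render a printable IPv6 address as 32 hex digits (128 bits)."""
    return format(int(ipaddress.IPv6Address(addr_text)), "032x")

def fullhex_to_ipv6(hex_text):
    """Convert 32 hex digits back to the compressed printable form."""
    return ipaddress.IPv6Address(int(hex_text, 16)).compressed

# A documentation-range address, used purely as sample input.
full = ipv6_to_fullhex("2001:db8::1")
round_trip = fullhex_to_ipv6(full)
```

A fixed-width hex string sorts and compares as plain text, which is
convenient for text-oriented tools.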
2495
2496 2.75, 2022-04-02 New type specifications in the schema to better support
2497 type conversions in python.
2498 ENHANCEMENT
2499 Add optional type specifications to the schema. Types are not used
2500 in Perl, but are relevant in Python and Go Fsdb bindings. Types
2501 use a subset of perl pack specifiers: c, s, l, q are signed 8, 16,
2502 32, and 64-bit integers, f is a float, d is double float, a is
2503 utf-8 string, and > and < can force big or little endianness.
2504 The default type for everything is "a", that is, utf-8 strings.
2505 Thanks to Wes Hardaker for pushing to get this long-desired feature
2506 out the door; his Python bindings need types.
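
As a rough illustration of how a binding might use these type codes, the
following Python sketch converts the text fields of one row. It is not
taken from the actual Python bindings; the CONVERTERS table, the
convert_row function, and the sample row are hypothetical, and it assumes
the codes are supplied as one string per column.

```python
# Map each type code from the list above to a Python converter.
CONVERTERS = {
    "c": int, "s": int, "l": int, "q": int,  # signed integers
    "f": float, "d": float,                  # single and double floats
    "a": str,                                # utf-8 string (the default)
}

def convert_row(fields, type_codes):
    """Convert text fields to Python values according to their type codes.

    Endianness markers (> and <) do not matter for text input, so they
    are stripped; an empty code falls back to the default type "a".
    """
    values = []
    for text, code in zip(fields, type_codes):
        base = code.strip("<>") or "a"
        values.append(CONVERTERS[base](text))
    return values

# Made-up row: a name, a duration, and a count.
row = convert_row(["ufs_mab_sys", "37.2", "4"], ["a", "d", "q"])
defaulted = convert_row(["37.2", "7"], ["", "l>"])
```

Unknown codes raise a KeyError in this sketch; a real binding would
probably want a clearer error message.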
2507
2508 ENHANCEMENT
2509 dbcol, dbcolcreate, dbcolcopylast, and dbcolrename now understand
2510 and propagate schema types. dbsort, dbjoin, dbmerge, dbmerge2 and
2511 dbfilepivot all take a new option "-t" to sort by type-inferred
2512 comparison, if a type is given.
2513
2514 ENHANCEMENT
2515 dbcolstats, dbmultistats, and dbcolmovingstats now include type
2516 information in their output schema. (They assume input variables
2517 are floats, not integers.)
2518
2519 ENHANCEMENT
2520 Even more IPv6: the functions in Fsdb::Support::IPv6 package now
2521 support strings of hex digits as an alternate encoding for IP
2522 address (and they are already the output of ipv6_fullhex), and
2523 "ip_fullhex_to_normal" converts full hex-encoded IPv4 or IPv6
2524 addresses to their "normal" form (dotted-quad or IPv6 printable
2525 format).
2526
2528 John Heidemann, "johnh@isi.edu"
2529
2530 See "Contributors" for the many people who have contributed bug reports
2531 and fixes.
2532
2534 Fsdb is Copyright (C) 1991-2022 by John Heidemann <johnh@isi.edu>.
2535
2536 This program is free software; you can redistribute it and/or modify it
2537 under the terms of version 2 of the GNU General Public License as
2538 published by the Free Software Foundation.
2539
2540 This program is distributed in the hope that it will be useful, but
2541 WITHOUT ANY WARRANTY; without even the implied warranty of
2542 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
2543 General Public License for more details.
2544
2545 You should have received a copy of the GNU General Public License along
2546 with this program; if not, write to the Free Software Foundation, Inc.,
2547 675 Mass Ave, Cambridge, MA 02139, USA.
2548
2549 A copy of the GNU General Public License can be found in the file
2550 ``COPYING''.
2551
2553 Any comments about these programs should be sent to John Heidemann
2554 "johnh@isi.edu".
2555
2556
2557
2558perl v5.34.1 2022-04-04 Fsdb(3)