Fsdb(3pm)

Fsdb(3)               User Contributed Perl Documentation              Fsdb(3)

NAME

6       Fsdb - a flat-text database for shell scripting
7

SYNOPSIS

       Fsdb, the flatfile streaming database, is a package of commands for
       manipulating flat-ASCII databases from shell scripts.  Fsdb is useful
       to process medium amounts of data (with very little data you'd do it by
       hand, with megabytes you might want a real database).  Fsdb was known
       as Jdb from 1991 to Oct. 2008.
14
15       Fsdb is very good at doing things like:
16
17       •   extracting measurements from experimental output
18
19       •   examining data to address different hypotheses
20
21       •   joining data from different experiments
22
23       •   eliminating/detecting outliers
24
25       •   computing statistics on data (mean, confidence intervals,
26           correlations, histograms)
27
28       •   reformatting data for graphing programs
29
30       Fsdb is built around the idea of a flat text file as a database.  Fsdb
31       files (by convention, with the extension .fsdb), have a header
32       documenting the schema (what the columns mean), and then each line
33       represents a database record (or row).
34
35       For example:
36
37               #fsdb experiment duration
38               ufs_mab_sys 37.2
39               ufs_mab_sys 37.3
40               ufs_rcp_real 264.5
41               ufs_rcp_real 277.9
42
       This is a simple file with four experiments (the rows), each with a
       description and a run time in the first and second columns.
46
       Rather than hand-code scripts to do each special case, Fsdb provides
       higher-level functions.  Although it's often easy to throw together a
       custom script to do any single task, I believe that there are several
       advantages to using Fsdb:
51
52       •   these programs provide a higher level interface than plain Perl, so
53
54           **  Fewer lines of simpler code:
55
56                   dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration
57
58               Picks out just one type of experiment and computes statistics
59               on it, rather than:
60
                   while (<>) { @F = split; $sum+=$F[1]; $ss+=$F[1]**2; $n++; }
62                   $mean = $sum / $n; $std_dev = ...
63
64               in dozens of places.
65
66       •   the library uses names for columns, so
67
68           **  No more $F[1], use "_duration".
69
70           **  New or different order columns?  No changes to your scripts!
71
72           Thus if your experiment gets more complicated with a size
73           parameter, so your log changes to:
74
75                   #fsdb experiment size duration
76                   ufs_mab_sys 1024 37.2
77                   ufs_mab_sys 1024 37.3
78                   ufs_rcp_real 1024 264.5
79                   ufs_rcp_real 1024 277.9
80                   ufs_mab_sys 2048 45.3
81                   ufs_mab_sys 2048 44.2
82
83           Then the previous scripts still work, even though duration is now
84           the third column, not the second.
85
       •   A series of actions are self-documenting (the provenance of
           processing done to produce each output is recorded in comments).
88
89           **  No more wondering what hacks were used to compute the final
90               data, just look at the comments at the end of the output.
91
92           For example, the commands
93
94               dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration
95
96           add to the end of the output the lines
97               #    | dbrow _experiment eq "ufs_mab_sys"
98               #    | dbcolstats duration
99
100       •   The library is mature, supporting large datasets (more than 100GB),
101           corner cases, error handling, backed by an automated test suite.
102
103           **  No more puzzling about bad output because your custom script
104               skimped on error checking.
105
106           **  No more memory thrashing when you try to sort ten million
107               records.
108
109       •   Fsdb-2.x supports Perl scripting (in addition to shell scripting),
110           with libraries to do Fsdb input and output, and easy support for
111           pipelines.  The shell script
112
113               dbcol name test1 | dbroweval '_test1 += 5;'
114
115           can be written in perl as:
116
117               dbpipeline(dbcol(qw(name test1)), dbroweval('_test1 += 5;'));
118
119       (The disadvantage is that you need to learn what functions Fsdb
120       provides.)
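
       To make the point about column names concrete, here is a minimal
       sketch in plain sh and awk (not Fsdb itself) of looking a column up by
       name from the header.  The helper name fsdb_col_by_name is
       hypothetical, and the sketch assumes a whitespace-separated file whose
       header has no -F/-R options or type annotations.

```shell
# fsdb_col_by_name NAME: print the values of column NAME from an
# Fsdb-style stream on stdin.  Hypothetical helper, not part of Fsdb.
fsdb_col_by_name() {
    awk -v want="$1" '
        NR == 1 {                     # header line: "#fsdb col1 col2 ..."
            for (i = 2; i <= NF; i++)
                if ($i == want) col = i - 1
            next
        }
        /^#/ { next }                 # skip comment lines
        col  { print $col }           # print only if the name was found
    '
}

# duration is the second column here...
printf '#fsdb experiment duration\nufs_mab_sys 37.2\n' |
    fsdb_col_by_name duration
# ...and the third column here, yet the call is unchanged.
printf '#fsdb experiment size duration\nufs_mab_sys 1024 37.2\n' |
    fsdb_col_by_name duration
```

       Both calls print 37.2.  The real tools do the same kind of lookup for
       you, which is why adding the size column does not break old scripts.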
121
122       Fsdb is built on flat-ASCII databases.  By storing data in simple text
123       files and processing it with pipelines it is easy to experiment (in the
124       shell) and look at the output.  To the best of my knowledge, the
125       original implementation of this idea was "/rdb", a commercial product
126       described in the book UNIX relational database management: application
127       development in the UNIX environment by Rod Manis, Evan Schaffer, and
128       Robert Jorgensen (1988 by Prentice Hall, and also at the web page
129       <http://www.rdb.com/>).  Fsdb is an incompatible re-implementation of
130       their idea without any accelerated indexing or forms support.  (But
131       it's free, and probably has better statistics!).
132
133       Fsdb-2.x will exploit multiple processors or cores, and provides Perl-
134       level support for input, output, and threaded-pipelines.  (As of
135       Fsdb-2.44 it no longer uses Perl threading, just processes, since they
136       are faster.)
137
138       Installation instructions follow at the end of this document.  Fsdb-2.x
139       requires Perl 5.8 to run.  All commands have manual pages and provide
140       usage with the "--help" option.  All commands are backed by an
141       automated test suite.
142
143       The most recent version of Fsdb is available on the web at
144       <http://www.isi.edu/~johnh/SOFTWARE/FSDB/index.html>.
145

WHAT'S NEW

147   3.0, 2022-04-04 Complete type support and accordingly bump major version.
148       NEW The major version number is now 3.0 to correspond to the addition
149           of types (although they were actually added in 2.75).  Old fsdb
150           files are supported (Fsdb-3.0 is backwards compatible with
151           databases), but older versions will confuse types in new files (new
152           Fsdb files are not forward compatible with old versions).
153
154       ENHANCEMENT
155           Type specifications in a few more programs: dbcolhisto,
156           dbcolscorrelate, dbcolsregression, dbcolstatscores,
157           dbrowaccumulate, dbrowcount, dbrowdiff, dbrvstatdiff.
158
159       ENHANCEMENT
160           dbcolhisto now puts an empty value on any empty rows.
161
162       NEW dbcoltype redefines column types, or clears them with the "-v"
163           option.
164

README CONTENTS

166       executive summary
167       what's new
168       README CONTENTS
169       installation
170       basic data format
171       basic data manipulation
172       list of commands
173       another example
174       a gradebook example
175       a password example
176       history
177       related work
178       release notes
179       copyright
180       comments
181

INSTALLATION

       Fsdb now uses the standard Perl build and installation from
       ExtUtils::MakeMaker(3), so the quick answer to installation is to type:
185
186           perl Makefile.PL
187           make
188           make test
189           make install
190
191       Or, if you want to install it somewhere else, change the first line to
192
193           perl Makefile.PL PREFIX=$HOME
194
       and it will go in your home directory's bin, etc.  (See
       ExtUtils::MakeMaker(3) for more details.)
197
198       Fsdb requires perl 5.8 or later.
199
200       A test-suite is available, run it with
201
202           make test
203
       In the past, ports existed for FreeBSD and MacOS.  If someone
       running one of those OSes wants to contribute a new port, please let me
       know.
207

BASIC DATA FORMAT

       These programs are based on the idea of storing data in simple ASCII
       files.  A database is a file with one header line and then data or
       comment lines.  For example:
212
213               #fsdb account passwd uid gid fullname homedir shell
214               johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
215               greg * 2275 134 Greg_Johnson /home/greg /bin/bash
216               root * 0 0 Root /root /bin/bash
217               # this is a simple database
218
       The header line must be first and begins with "#fsdb".  There are rows
       (records) and columns (fields), just like in a normal database.
       Comment lines begin with "#".  Column names are any string not
       containing spaces or single quotes (although it is prudent to keep them
       alphanumeric with underscores).
224
       Columns can optionally include type annotations by following the name
       with :t where t is some type.  (Types are not used in Perl, but are
       relevant in Python and Go Fsdb bindings.)  Types use a subset of perl
       pack specifiers: c, s, l, q are signed 8, 16, 32, and 64-bit integers,
       f is a float, d is double float, a is utf-8 string, and > and < can
       force big or little endianness.
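
       As a rough illustration of the name:t convention, the following sketch
       (plain sh and awk, not an Fsdb tool) lists each header column with its
       annotation.  The function name fsdb_header_types is hypothetical, the
       "-" printed for unannotated columns is only this sketch's placeholder,
       and it assumes the header carries no -F/-R options and that column
       names contain no colon.

```shell
# fsdb_header_types: print "name type" for each column in the header
# read from stdin.  Hypothetical helper, not part of Fsdb.
fsdb_header_types() {
    awk 'NR == 1 {
        for (i = 2; i <= NF; i++) {
            n = split($i, part, ":")
            # "-" only marks "no annotation given" in this sketch
            print part[1], (n > 1 ? part[2] : "-")
        }
        exit
    }'
}

printf '#fsdb name:a id:l score:d note\n' | fsdb_header_types
```

       For the sample header this prints one line per column: "name a",
       "id l", "score d", and "note -".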
231
232       By default, columns are delimited by whitespace.  With this default
233       configuration, the contents of a field cannot contain whitespace.
234       However, this limitation can be relaxed by changing the field separator
235       as described below.
236
237       The big advantage of simple flat-text databases is that it is usually
238       easy to massage data into this format, and it's reasonably easy to take
239       data out of this format into other (text-based) programs, like gnuplot,
240       jgraph, and LaTeX.  Think Unix.  Think pipes.  (Or even output to Excel
241       and HTML if you prefer.)
242
       Since no-whitespace in columns was a problem for some applications,
       there's an option which relaxes this rule.  You can specify the field
       separator in the table header with "-F x" where "x" is a code for the
       new field separator.  A full list of codes is at dbfilealter(1), but
       two common special values are "-F t" which is a separator of a single
       tab character, and "-F S", a separator of two spaces.  Both allow
       (single) spaces in fields.  An example:
250
251               #fsdb -F S account passwd uid gid fullname homedir shell
252               johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
253               greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
254               root  *  0  0  Root  /root  /bin/bash
255               # this is a simple database
256
257       See dbfilealter(1) for more details.  Regardless of what the column
258       separator is for the body of the data, it's always whitespace in the
259       header.
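
       The rule "whitespace in the header, the chosen separator in the body"
       can be sketched in plain sh and awk as follows.  This is not how Fsdb
       itself is implemented; the helper name fsdb_cut is hypothetical, only
       the "t" and "S" codes described above are handled, and "S" is taken
       literally as exactly two spaces.

```shell
# fsdb_cut N: print the Nth field (1-based) of each data row, honoring
# a "-F t" or "-F S" option in the header.  Hypothetical helper.
fsdb_cut() {
    awk -v n="$1" '
        NR == 1 {
            # The header itself is always split on whitespace.
            if ($2 == "-F" && $3 == "t")      FS = "\t"
            else if ($2 == "-F" && $3 == "S") FS = "  "
            next                      # new FS applies from the next line
        }
        /^#/ { next }                 # skip comment lines
        { print $n }
    '
}

printf '%s\n' \
    '#fsdb -F S account passwd uid gid fullname homedir shell' \
    'johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash' |
    fsdb_cut 5
```

       This prints "John Heidemann": the single space inside the name
       survives because only a run of two spaces separates fields.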
260
       There's also a third format: a "list".  Because it's often hard to see
       what's in columns past the first two, in list format each "column" is on
       a separate line.  The programs dblistize and dbcolize convert to and from
       this format, and all programs work with either format.  The command
265
266           dbfilealter -R C  < DATA/passwd.fsdb
267
268       outputs:
269
270               #fsdb -R C account passwd uid gid fullname homedir shell
271               account:  johnh
272               passwd:   *
273               uid:      2274
274               gid:      134
275               fullname: John_Heidemann
276               homedir:  /home/johnh
277               shell:    /bin/bash
278
279               account:  greg
280               passwd:   *
281               uid:      2275
282               gid:      134
283               fullname: Greg_Johnson
284               homedir:  /home/greg
285               shell:    /bin/bash
286
287               account:  root
288               passwd:   *
289               uid:      0
290               gid:      0
291               fullname: Root
292               homedir:  /root
293               shell:    /bin/bash
294
295               # this is a simple database
296               #  | dblistize
297
298       See dbfilealter(1) for more details.
299

BASIC DATA MANIPULATION

301       A number of programs exist to manipulate databases.  Complex functions
302       can be made by stringing together commands with shell pipelines.  For
303       example, to print the home directories of everyone with ``john'' in
304       their names, you would do:
305
306               cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir
307
308       The output might be:
309
310               #fsdb homedir
311               /home/johnh
312               /home/greg
313               # this is a simple database
314               #  | dbrow _fullname =~ /John/
315               #  | dbcol homedir
316
317       (Notice that comments are appended to the output listing each command,
318       providing an automatic audit log.)
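
       The way the audit log accumulates can be mimicked with a toy filter in
       plain sh.  This is only a sketch of the idea, not Fsdb code: the
       function name trace is hypothetical, and it does no real work beyond
       copying its input and appending one comment line.

```shell
# trace DESCRIPTION: copy stdin to stdout unchanged, then append a
# trailing "#  | DESCRIPTION" comment.  Hypothetical helper.
trace() {
    cat
    printf '#  | %s\n' "$*"
}

printf '#fsdb homedir\n/home/johnh\n' |
    trace 'dbrow _fullname =~ /John/' |
    trace 'dbcol homedir'
```

       Because each stage appends its own line after everything it received,
       the comments end up at the bottom of the output in pipeline order,
       just as in the example above.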
319
320       In addition to typical database functions (select, join, etc.) there
321       are also a number of statistical functions.
322
323       The real power of Fsdb is that one can apply arbitrary code to rows to
324       do powerful things.
325
326               cat DATA/passwd | dbroweval '_fullname =~ s/(\w+)_(\w+)/$2,_$1/'
327
328       converts "John_Heidemann" into "Heidemann,_John".  Not too much more
329       work could split fullname into firstname and lastname fields.
330
       Or:

               cat DATA/passwd | dbcolcreate sort | dbroweval -b 'use Fsdb::Support' \
                       '_sort = _fullname; _sort =~ s/_/ /g; _sort = fullname_to_sort(_sort);'
335

TALKING ABOUT COLUMNS

337       An advantage of Fsdb is that you can talk about columns by name
338       (symbolically) rather than simply by their positions.  So in the above
339       example, "dbcol homedir" pulled out the home directory column, and
340       "dbrow '_fullname =~ /John/'" matched against column fullname.
341
342       In general, you can use the name of the column listed on the "#fsdb"
343       line to identify it in most programs, and _name to identify it in code.
344
345       Some alternatives for flexibility:
346
347       •   Numeric values identify columns positionally, numbering from 0.  So
348           0 or _0 is the first column, 1 is the second, etc.
349
       •   In code, _last_columnname gets the value from columnname's previous
           row.
352
353       See dbroweval(1) for more details about writing code.
354

LIST OF COMMANDS

       Enough said.  I'll summarize the commands, and then you can experiment.
       For a summary of each command, run it with the argument "--help" (or
       "-?" if you prefer).  Full manual pages can be found by running the
       command with the argument "--man", or running the Unix command
       "man dbcol" or whatever program you want.
361
362   TABLE CREATION
363       dbcolcreate
364           add columns to a database
365
366       dbcoldefine
367           set the column headings for a non-Fsdb file
368
369   TABLE MANIPULATION
370       dbcol
371           select columns from a table
372
373       dbrow
374           select rows from a table
375
376       dbsort
377           sort rows based on a set of columns
378
379       dbjoin
380           compute the natural join of two tables
381
382       dbcolrename
383           rename a column
384
385       dbcolmerge
386           merge two columns into one
387
388       dbcolsplittocols
389           split one column into two or more columns
390
391       dbcolsplittorows
392           split one column into multiple rows
393
394       dbfilepivot
395           "pivots" a file, converting multiple rows corresponding to the same
396           entity into a single row with multiple columns.
397
398       dbfilevalidate
399           check that db file doesn't have some common errors
400
401   COMPUTATION AND STATISTICS
402       dbcolstats
           compute statistics over a column (mean, etc., optionally median)
404
405       dbmultistats
406           group rows by some key value, then compute stats (mean, etc.) over
407           each group (equivalent to dbmapreduce with dbcolstats as the
408           reducer)
409
410       dbmapreduce
411           group rows (map) and then apply an arbitrary function to each group
412           (reduce)
413
414       dbrvstatdiff
           compare two sample distributions (mean/conf interval/T-test)

       dbcolmovingstats
           compute moving statistics over a column of data
419
420       dbcolstatscores
421           compute Z-scores and T-scores over one column of data
422
423       dbcolpercentile
424           compute the rank or percentile of a column
425
426       dbcolhisto
427           compute histograms over a column of data
428
429       dbcolscorrelate
430           compute the coefficient of correlation over several columns
431
432       dbcolsregression
433           compute linear regression and correlation for two columns
434
435       dbrowaccumulate
436           compute a running sum over a column of data
437
438       dbrowcount
439           count the number of rows (a subset of dbstats)
440
441       dbrowdiff
           compute row-by-row differences of a column in a table
443
444       dbrowenumerate
445           number each row
446
447       dbroweval
448           run arbitrary Perl code on each row
449
450       dbrowuniq
451           count/eliminate identical rows (like Unix uniq(1))
452
453       dbfilediff
454           compare fields on rows of a file (something like Unix diff(1))
455
456   OUTPUT CONTROL
457       dbcolneaten
458           pretty-print columns
459
460       dbfilealter
461           convert between column or list format, or change the column
462           separator
463
464       dbfilestripcomments
465           remove comments from a table
466
467       dbformmail
468           generate a script that sends form mail based on each row
469
470   CONVERSIONS
471       (These programs convert data into fsdb.  See their web pages for
472       details.)
473
474       cgi_to_db
475           <http://stein.cshl.org/boulder/>
476
477       combined_log_format_to_db
478           <http://httpd.apache.org/docs/2.0/logs.html>
479
480       html_table_to_db
481           HTML tables to fsdb (assuming they're reasonably formatted).
482
483       kitrace_to_db
484           <http://ficus-www.cs.ucla.edu/ficus-members/geoff/kitrace.html>
485
486       ns_to_db
487           <http://mash-www.cs.berkeley.edu/ns/>
488
489       sqlselect_to_db
490           the output of SQL SELECT tables to db
491
492       tabdelim_to_db
493           spreadsheet tab-delimited files to db
494
495       tcpdump_to_db
496           (see man tcpdump(8) on any reasonable system)
497
498       xml_to_db
499           XML input to fsdb, assuming they're very regular
500
501       (And out of fsdb:)
502
503       db_to_csv
504           Comma-separated-value format from fsdb.
505
506       db_to_html_table
507           simple conversion of Fsdb to html tables
508
509   STANDARD OPTIONS
510       Many programs have common options:
511
512       -? or --help
513           Show basic usage.
514
       -N or --new-name
516           When a command creates a new column like dbrowaccumulate's "accum",
517           this option lets one override the default name of that new column.
518
519       -T TmpDir
520           where to put tmp files.  Also uses environment variable TMPDIR, if
521           -T is not specified.  Default is /tmp.
522
525       -c FRACTION or --confidence FRACTION
526           Specify confidence interval FRACTION (dbcolstats, dbmultistats,
527           etc.)
528
529       -C S or "--element-separator S"
530           Specify column separator S (dbcolsplittocols, dbcolmerge).
531
532       -d or --debug
533           Enable debugging (may be repeated for greater effect in some
534           cases).
535
536       -a or --include-non-numeric
537           Compute stats over all data (treating non-numbers as zeros).  (By
538           default, things that can't be treated as numbers are ignored for
539           stats purposes)
540
541       -S or --pre-sorted
542           Assume the data is pre-sorted.  May be repeated to disable
543           verification (saving a small amount of work).
544
545       -e E or --empty E
546           give value E as the value for empty (null) records
547
548       -i I or --input I
549           Input data from file I.
550
551       -o O or --output O
552           Write data out to file O.
553
554       --header H
555           Use H as the full Fsdb header, rather than reading a header from
556           then input.  This option is particularly useful when using Fsdb
557           under Hadoop, where split files don't have heades.
558
559       --nolog.
560           Skip logging the program in a trailing comment.
561
562       When giving Perl code (in dbrow and dbroweval) column names can be
563       embedded if preceded by underscores.  Look at dbrow(1) or dbroweval(1)
564       for examples.)
565
566       Most programs run in constant memory and use temporary files if
567       necessary.  Exceptions are dbcolneaten, dbcolpercentile, dbmapreduce,
568       dbmultistats, dbrowsplituniq.
569
570   STANDARD SORTING OPTIONS
571       A number of programs do sorting, or depend on defining an ordering of
572       rows.  Such programs use these standard sorting options:
573
574       -r or --descending
575           sort in reverse order (high to low)
576
577       -R or --ascending
578           sort in normal order (low to high)
579
580       -t or --type-inferred-sorting
581           sort fields by type (numeric or leicographic), automatically
582
583       -n or --numeric
584           sort numerically
585
586       -N or --lexical
587           sort lexicographically
588

ANOTHER EXAMPLE

       Take the raw data in "DATA/http_bandwidth", put a header on it
       ("dbcoldefine size bw"), take statistics of each category
       ("dbmultistats -k size bw"), pick out the relevant fields ("dbcol size
       mean stddev pct_rsd"), and you get:
594
595               #fsdb size mean stddev pct_rsd
596               1024    1.4962e+06      2.8497e+05      19.047
597               10240   5.0286e+06      6.0103e+05      11.952
598               102400  4.9216e+06      3.0939e+05      6.2863
599               #  | dbcoldefine size bw
600               #  | /home/johnh/BIN/DB/dbmultistats -k size bw
601               #  | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd
602
603       (The whole command was:
604
605               cat DATA/http_bandwidth |
               dbcoldefine size bw |
607               dbmultistats -k size bw |
608               dbcol size mean stddev pct_rsd
609
610       all on one line.)
611
612       Then post-process them to get rid of the exponential notation by adding
613       this to the end of the pipeline:
614
615           dbroweval '_mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev);'
616
617       (Actually, this step is no longer required since dbcolstats now uses a
618       different default format.)
619
620       giving:
621
622               #fsdb      size    mean    stddev  pct_rsd
623               1024     1496200          284970        19.047
624               10240    5028600          601030        11.952
625               102400   4921600          309390        6.2863
626               #  | dbcoldefine size bw
627               #  | dbmultistats -k size bw
628               #  | dbcol size mean stddev pct_rsd
629               #  | dbroweval   { _mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev); }
630
631       In a few lines, raw data is transformed to processed output.
632
       Suppose you suspect an odd distribution of results for one
       datapoint.  Fsdb can easily produce a CDF (cumulative distribution
       function) of the data, suitable for graphing:
636
637           cat DB/DATA/http_bandwidth | \
638               dbcoldefine size bw | \
639               dbrow '_size == 102400' | \
640               dbcol bw | \
641               dbsort -n bw | \
642               dbrowenumerate | \
643               dbcolpercentile count | \
644               dbcol bw percentile | \
645               xgraph
646
       The steps, roughly: 1. get the raw input data and turn it into fsdb
       format, 2. pick out just the relevant column (for efficiency) and sort
       it, 3. for each data point, assign a CDF percentage to it, 4. pick out
       the two columns to graph and show them.
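
       For readers who want to see what "assign a CDF percentage" amounts to,
       here is a minimal sketch using only sort and awk on bare numbers (one
       per line, no Fsdb header).  The function name cdf is hypothetical, and
       the fraction used here (rank divided by the number of points) is this
       sketch's own definition; dbcolpercentile may compute its value
       differently.

```shell
# cdf: read one number per line, print "value fraction" pairs where
# fraction = rank / count after an ascending numeric sort.
# Hypothetical helper, not part of Fsdb.
cdf() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.2f\n", v[i], i / NR
        }'
}

printf '30\n10\n20\n40\n' | cdf
```

       For the four sample values this prints 10 0.25, 20 0.50, 30 0.75 and
       40 1.00, which is the shape of data a plotting program needs.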
651

A GRADEBOOK EXAMPLE

653       The first commercial program I wrote was a gradebook, so here's how to
654       do it with Fsdb.
655
656       Format your data like DATA/grades.
657
658               #fsdb name email id test1
659               a a@ucla.example.edu 1 80
660               b b@usc.example.edu 2 70
661               c c@isi.example.edu 3 65
662               d d@lmu.example.edu 4 90
663               e e@caltech.example.edu 5 70
664               f f@oxy.example.edu 6 90
665
666       Or if your students have spaces in their names, use "-F S" and two
667       spaces to separate each column:
668
669               #fsdb -F S name email id test1
670               alfred aho  a@ucla.example.edu  1  80
671               butler lampson  b@usc.example.edu  2  70
672               david clark  c@isi.example.edu  3  65
673               constantine drovolis  d@lmu.example.edu  4  90
               deborah estrin  e@caltech.example.edu  5  70
675               sally floyd  f@oxy.example.edu  6  90
676
677       To compute statistics on an exam, do
678
               cat DATA/grades | dbcolstats test1 | dblistize
680
681       giving
682
683               #fsdb -R C  ...
684               mean:        77.5
685               stddev:      10.84
686               pct_rsd:     13.987
687               conf_range:  11.377
688               conf_low:    66.123
689               conf_high:   88.877
690               conf_pct:    0.95
691               sum:         465
692               sum_squared: 36625
693               min:         65
694               max:         90
695               n:           6
696               ...
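
       Several of the fields above follow directly from sum, sum_squared and
       n.  The sketch below (plain sh and awk; the function name
       stats_from_sums is hypothetical) recomputes three of them with the
       usual sample formulas.  That Fsdb uses exactly these formulas is an
       inference from the numbers agreeing, not something stated here.

```shell
# stats_from_sums SUM SUM_SQUARED N: recompute mean, sample standard
# deviation, and percent relative standard deviation.
# Hypothetical helper, not part of Fsdb.
stats_from_sums() {
    awk -v s="$1" -v ss="$2" -v n="$3" 'BEGIN {
        mean = s / n
        sd = sqrt((ss - s * s / n) / (n - 1))   # sample (n-1) form
        printf "mean %.1f\nstddev %.2f\npct_rsd %.3f\n",
               mean, sd, 100 * sd / mean
    }'
}

stats_from_sums 465 36625 6
```

       With the values from the listing (465, 36625, 6) this prints mean 77.5,
       stddev 10.84 and pct_rsd 13.987, matching the output shown.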
697
698       To do a histogram:
699
700               cat DATA/grades | dbcolhisto -n 5 -g test1
701
702       giving
703
704               #fsdb low histogram
705               65      *
706               70      **
707               75
708               80      *
709               85
710               90      **
711               #  | /home/johnh/BIN/DB/dbhistogram -n 5 -g test1
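
       The bucketing behind that output can be sketched in plain sh and awk.
       This is not dbcolhisto: the function name histo5 is hypothetical, the
       bucket width of 5 is hard-coded to match the listing, and the input is
       bare scores, one per line.

```shell
# histo5: read one number per line and print "low<TAB>stars" for each
# width-5 bucket from the lowest to the highest occupied bucket.
# Hypothetical helper, not part of Fsdb.
histo5() {
    awk '
        {
            b = int($1 / 5) * 5           # low edge of this value bucket
            count[b]++
            if (NR == 1 || b < lo) lo = b
            if (NR == 1 || b > hi) hi = b
        }
        END {
            for (b = lo; b <= hi; b += 5) {
                stars = ""
                for (i = 0; i < count[b]; i++) stars = stars "*"
                print b "\t" stars        # empty buckets get no stars
            }
        }'
}

printf '80\n70\n65\n90\n70\n90\n' | histo5
```

       Fed the six test1 scores, it prints the same six buckets as above: one
       star at 65, two at 70, none at 75, one at 80, none at 85, two at 90.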
712
713       Now you want to send out grades to the students by e-mail.  Create a
714       form-letter (in the file test1.txt):
715
716               To: _email (_name)
717               From: J. Random Professor <jrp@usc.example.edu>
718               Subject: test1 scores
719
720               _name, your score on test1 was _test1.
721               86+   A
722               75-85 B
723               70-74 C
724               0-69  F
725
726       Generate the shell script that will send the mail out:
727
728               cat DATA/grades | dbformmail test1.txt > test1.sh
729
730       And run it:
731
732               sh <test1.sh
733
734       The last two steps can be combined:
735
736               cat DATA/grades | dbformmail test1.txt | sh
737
738       but I like to keep a copy of exactly what I send.
739
740       At the end of the semester you'll want to compute grade totals and
741       assign letter grades.  Both fall out of dbroweval.  For example, to
742       compute weighted total grades with a 40% midterm/60% final where the
743       midterm is 84 possible points and the final 100:
744
745               dbcol -rv total |
746               dbcolcreate total - |
747               dbroweval '
748                       _total = .40 * _midterm/84.0 + .60 * _final/100.0;
749                       _total = sprintf("%4.2f", _total);
750                       if (_final eq "-" || ( _name =~ /^_/)) { _total = "-"; };' |
751               dbcolneaten
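
       As a worked instance of that formula, the sketch below evaluates it
       for one student in plain sh and awk.  The function name weighted_total
       and the scores 63 and 80 are made up for illustration.

```shell
# weighted_total MIDTERM FINAL: 40% midterm (out of 84 points) plus
# 60% final (out of 100 points), formatted like the dbroweval code.
# Hypothetical helper, not part of Fsdb.
weighted_total() {
    awk -v m="$1" -v f="$2" 'BEGIN {
        printf "%4.2f\n", .40 * m / 84.0 + .60 * f / 100.0
    }'
}

weighted_total 63 80    # 0.40 * 0.75 + 0.60 * 0.80 = 0.78
```

       A midterm of 63/84 is 75% and a final of 80/100 is 80%, so the total is
       0.30 + 0.48 = 0.78, which is what the call prints.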
752
753       If you got the data originally from a spreadsheet, save it in "tab-
754       delimited" format and convert it with tabdelim_to_db (run
755       tabdelim_to_db -? for examples).
756

A PASSWORD EXAMPLE

758       To convert the Unix password file to db:
759
760               cat /etc/passwd | sed 's/:/  /g'| \
761                       dbcoldefine -F S login password uid gid gecos home shell \
762                       >passwd.fsdb
763
764       To convert the group file
765
766               cat /etc/group | sed 's/:/  /g' | \
767                       dbcoldefine -F S group password gid members \
768                       >group.fsdb
769
770       To show the names of the groups that div7-members are in (assuming DIV7
771       is in the gecos field):
772
773               cat passwd.fsdb | dbrow '_gecos =~ /DIV7/' | dbcol login gid | \
774                       dbjoin -i - -i group.fsdb gid | dbcol login group
775

SHORT EXAMPLES

777       Which Fsdb programs are the most complicated (based on number of test
778       cases)?
779
780               ls TEST/*.cmd | \
781                       dbcoldefine test | \
782                       dbroweval '_test =~ s@^TEST/([^_]+).*$@$1@' | \
783                       dbrowuniq -c | \
784                       dbsort -nr count | \
785                       dbcolneaten
786
787       (Answer: dbmapreduce, then dbcolstats, dbfilealter and dbjoin.)
788
789       Stats on an exam (in $FILE, where $COLUMN is the name of the exam)?
790
               cat $FILE | dbcolstats -q 4 $COLUMN | dblistize | dbfilestripcomments

               cat $FILE | dbcolhisto -g -n 20 $COLUMN | dbcolneaten | dbfilestripcomments
794
       Merging the hw1 column from file hw1.fsdb into grades.fsdb assuming
796       there's a common student id in column "id":
797
798               dbcol id hw1 <hw1.fsdb >t.fsdb
799
800               dbjoin -a -e - grades.fsdb t.fsdb id | \
801                   dbsort  name | \
802                   dbcolneaten >new_grades.fsdb
803
       Merging two fsdb files with the same columns:
805
806               cat file1.fsdb file2.fsdb >output.fsdb
807
808       or if you want to clean things up a bit
809
810               cat file1.fsdb file2.fsdb | dbstripextraheaders >output.fsdb
811
812       or if you want to know where the data came from
813
814               for i in 1 2
815               do
816                       dbcolcreate source $i < file$i.fsdb
817               done >output.fsdb
818
819       (assumes you're using a Bourne-shell compatible shell, not csh).
820

WARNINGS

822       As with any tool, one should (which means must) understand the limits
823       of the tool.
824
825       All Fsdb tools should run in constant memory.  In some cases (such as
826       dbcolstats with quartiles, where the whole input must be re-read),
827       programs will spool data to disk if necessary.
828
       Most tools buffer one or a few lines of data, so memory will scale with
       the size of each line.  (So lines with many columns, or columns with
       lots of data, may cause large memory consumption.)
832
833       All Fsdb tools should run in constant or at worst "n log n" time.
834
835       All Fsdb tools use normal Perl math routines for computation.  I make
836       every attempt to choose numerically stable algorithms (and I welcome
837       feedback and suggestions for improvement), but normal rounding due to
838       computer floating point approximations can still result in
839       inaccuracies when data spans a large range of precision.  (See for
840       example the dbcolstats_extrema test cases.)
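
          The kind of rounding loss described above can be seen with any tool
          that uses double-precision floating point, not only Perl.  The
          following sketch uses awk (an assumption made for illustration; it
          is not how Fsdb computes anything) to add a small value to a very
          large one.  The function name lost_increment and the sample values
          are hypothetical.

```shell
# Hypothetical demonstration (not Fsdb code): in double precision,
# adding 1 to 1e17 is lost to rounding, so the difference is 0, not 1.
lost_increment() {
    awk 'BEGIN {
        big = 1e17          # large value; doubles near 1e17 are 16 apart
        sum = big + 1       # the +1 is below the available precision
        print sum - big     # difference that remains after rounding
    }'
}

lost_increment
```

          Statistics that mix very large and very small values in one column
          can lose small contributions in the same way.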
841
842       Any requirements and limitations of each Fsdb tool are documented on its
843       manual page.
844
845       If any Fsdb program violates these assumptions, that is a bug that
846       should be documented on the tool's manual page or ideally fixed.
847
848       Fsdb does depend on Perl's correctness, and Perl (and Fsdb) have some
849       bugs.  Fsdb should work on perl from version 5.10 onward.
850

HISTORY

852       There have been four major versions of Fsdb: fsdb-0.x was begun in 1991
853       for my personal use.  Fsdb 1.0 is a complete re-write of the pre-1995
854       versions, and was distributed from 1995 to 2007.  Fsdb 2.0 is a
855       significant re-write of the 1.x versions to systematically use a
856       library and threads (although threads were abandoned in 2.44).  Fsdb
857       3.0 in 2022 adds type specifiers to the schema, mostly to support use
858       in languages with stronger typing (like Python, Go, and C).
859
860       Fsdb (in its various forms) has been used extensively by its author
861       since 1991.  Since 1995 it's been used by two other researchers at UCLA
862       and several at ISI.  In February 1998 it was announced to the Internet.
863       Since then it has found a few users, some outside where I work.
864
865       Major changes:
866
867       1.0 1997-07-22: first public release.
868       2.0 2008-01-25: rewrite to use a common library, and starting to use
869       threads.
870       2.12 2008-10-16: completion of the rewrite, and first RPM package.
871       2.44 2013-10-02: abandoning threads for improved performance
872       3.0 2022-04-04: adding type specifiers to the schema
873
874   Fsdb 2.0 Rationale
875       I've thought about fsdb-2.0 for many years, but it was started in
876       earnest in 2007.  Fsdb-2.0 has the following goals:
877
878       in-one-process processing
879           While fsdb is great on the Unix command line as a pipeline between
880           programs, it should also be possible to set it up to run in a
881           single process.  And if it does so, it should be able to avoid
882           serializing and deserializing (converting to and from text) data
883           between each module.  (Accomplished in fsdb-2.0: see dbpipeline,
884           although still needs tuning.)
885
886       clean IO API
887           Fsdb's roots go back to perl4 and 1991, so the fsdb-1.x library is
888           very, very crufty.  More than just being ugly (but it was that
889           too), this made things like reading one file format and writing
890           another the application's job, when it should be the library's.
891           (Accomplished in fsdb-1.15 and improved in 2.0: see Fsdb::IO.)
892
893       normalized module APIs
894           Because fsdb modules were added as needed over 10 years, sometimes
895           the module APIs became inconsistent.  (For example, the 1.x
896           "dbcolcreate" required an empty value following the name of the new
897           column, but other programs specify empty values with the "-e"
898           argument.)  We should smooth over these inconsistencies.
899           (Accomplished as each module was ported in 2.0 through 2.7.)
900
901       everyone handles all input formats
902           Given a clean IO API, the distinction between "colized" and
903           "listized" fsdb files should go away.  Any program should be able
904           to read and write files in any format.  (Accomplished in fsdb-2.1.)
905
906       Fsdb-2.0 preserves backwards compatibility where possible, but breaks
907       it where necessary to accomplish the above goals.  In August 2008,
908       Fsdb-2.7 was declared preferred over the 1.x versions.  Benchmarking in
909       2013 showed that threading performed much worse than just using pipes,
910       so Fsdb-2.44 uses threading "style", but implemented with processes
911       (via my "Freds" library).
912
913   Contributors
914       Fsdb includes code ported from Geoff Kuenning
915       ("Fsdb::Support::TDistribution").
916
917       Fsdb contributors: Ashvin Goel goel@cse.oge.edu, Geoff Kuenning
918       geoff@fmg.cs.ucla.edu, Vikram Visweswariah visweswa@isi.edu, Kannan
919       Varadahan kannan@isi.edu, Lars Eggert larse@isi.edu, Arkadi Gelfond
920       arkadig@dyna.com, David Graff graff@ldc.upenn.edu, Haobo Yu
921       haoboy@packetdesign.com, Pavlin Radoslavov pavlin@catarina.usc.edu,
922       Graham Phillips, Yuri Pradkin, Alefiya Hussain, Ya Xu, Michael
923       Schwendt, Fabio Silva fabio@isi.edu, Jerry Zhao zhaoy@isi.edu, Ning Xu
924       nxu@aludra.usc.edu, Martin Lukac mlukac@lecs.cs.ucla.edu, Xue Cai,
925       Michael McQuaid, Christopher Meng, Calvin Ardi, H. Merijn Brand, Lan
926       Wei, Hang Guo, Wes Hardaker.
927
928       Fsdb includes datasets contributed from NIST (DATA/nist_zarr13.fsdb),
929       from
930       <http://www.itl.nist.gov/div898/handbook/eda/section4/eda4281.htm>, the
931       NIST/SEMATECH e-Handbook of Statistical Methods, section 1.4.2.8.1.
932       Background and Data.  The source is public domain, and reproduced with
933       permission.
934

RELATED WORK

936       As stated in the introduction, Fsdb is an incompatible reimplementation
937       of the ideas found in "/rdb".  By storing data in simple text files and
938       processing it with pipelines it is easy to experiment (in the shell)
939       and look at the output.  The original implementation of this idea was
940       /rdb, a commercial product described in the book UNIX relational
941       database management: application development in the UNIX environment by
942       Rod Manis, Evan Schaffer, and Robert Jorgensen (and also at the web
943       page <http://www.rdb.com/>).
944
945       While Fsdb is inspired by Rdb, it includes no code from it, and Fsdb
946       makes several different design choices.  In particular: Rdb attempts to
947       be closer to a "real" database, with provision for locking and file
948       indexing.  Fsdb focuses on single user use and so eschews these
949       choices.  Rdb also has some support for interactive editing.  Fsdb
950       leaves editing to text editors like emacs or vi.
951
952       In August, 2002 I found out Carlo Strozzi extended RDB with his package
953       NoSQL <http://www.linux.it/~carlos/nosql/>.  According to Mr. Strozzi,
954       he implemented NoSQL in awk to avoid the Perl start-up of RDB.
955       Although I haven't found Perl startup overhead to be a big problem on
956       my platforms (from old Sparcstation IPCs to 2GHz Pentium-4s), you may
957       want to evaluate his system.  The Linux Journal has a description of
958       NoSQL at <http://www.linuxjournal.com/article/3294>.  It seems quite
959       similar to Fsdb.  Like /rdb, NoSQL supports indexing (not present in
960       Fsdb).  Fsdb appears to have richer support for statistics, and, as of
961       Fsdb-2.x, its support for Perl threading may support faster performance
962       (one-process, less serialization and deserialization).
963

RELEASE NOTES

965       Versions prior to 1.0 were released informally on my web page but were
966       not announced.
967
968   0.0 1991
969       started for my own research use
970
971   0.1 26-May-94
972       first check-in to RCS
973
974   0.2 15-Mar-95
975       parts now require perl5
976
977   1.0, 22-Jul-97
978       adds autoconf support and a test script.
979
980   1.1, 20-Jan-98
981       support for double space field separators, better tests
982
983   1.2, 11-Feb-98
984       minor changes and release on comp.lang.perl.announce
985
986   1.3, 17-Mar-98
987       •   adds median and quartile options to dbstats
988
989       •   adds dmalloc_to_db converter
990
991       •   fixes some warnings
992
993       •   dbjoin now can run on unsorted input
994
995       •   fixes a dbjoin bug
996
997       •   some more tests in the test suite
998
999   1.4, 27-Mar-98
1000       •   improves error messages (all should now report the program that
1001           makes the error)
1002
1003       •   fixed a bug in dbstats output when the mean is zero
1004
1005   1.5, 25-Jun-98
1006       BUG FIX dbcolhisto, dbcolpercentile now handle non-numeric values like
1007       dbstats
1008       NEW dbcolstats computes zscores and tscores over a column
1009       NEW dbcolscorrelate computes correlation coefficients between two
1010       columns
1011       INTERNAL ficus_getopt.pl has been replaced by DbGetopt.pm
1012       BUG FIX all tests are now ``portable'' (previously some tests ran only
1013       on my system)
1014       BUG FIX you no longer need to have the db programs in your path (fix
1015       arose from a discussion with Arkadi Gelfond)
1016       BUG FIX installation no longer uses cp -f (to work on SunOS 4)
1017
1018   1.6, 24-May-99
1019       NEW dbsort, dbstats, dbmultistats now run in constant memory (using tmp
1020       files if necessary)
1021       NEW dbcolmovingstats does moving means over a series of data
1022       NEW dbcol has a -v option to get all columns except those listed
1023       NEW dbmultistats does quartiles and medians
1024       NEW dbstripextraheaders now also cleans up bogus comments before the
1025       first header
1026       BUG FIX dbcolneaten works better with double-space-separated data
1027
1028   1.7,  5-Jan-00
1029       NEW dbcolize now detects and rejects lines that contain embedded copies
1030       of the field separator
1031       NEW configure tries harder to prevent people from improperly
1032       configuring/installing fsdb
1033       NEW tcpdump_to_db converter (incomplete)
1034       NEW tabdelim_to_db converter:  from spreadsheet tab-delimited files to
1035       db
1036       NEW mailing lists for fsdb are "fsdb-announce@heidemann.la.ca.us"
1037       and "fsdb-talk@heidemann.la.ca.us"
1038           To subscribe to either, send mail
1039           to "fsdb-announce-request@heidemann.la.ca.us" or
1040           "fsdb-talk-request@heidemann.la.ca.us" with "subscribe" in the
1041           BODY of the message.
1042
1043       BUG FIX dbjoin used to produce incorrect output if there were extra,
1044       unmatched values in the 2nd table. Thanks to Graham Phillips for
1045       providing a test case.
1046       BUG FIX the sample commands in the usage strings now all should
1047       explicitly include the source of data (typically from "cat foo.fsdb
1048       |").  Thanks to Ya Xu for pointing out this documentation deficiency.
1049       BUG FIX (DOCUMENTATION) dbcolmovingstats had incorrect sample output.
1050
1051   1.8, 28-Jun-00
1052       BUG FIX header options are now preserved when writing with dblistize
1053       NEW dbrowuniq now optionally checks for uniqueness only on certain
1054       fields
1055       NEW dbrowsplituniq makes one pass through a file and splits it into
1056       separate files based on the given fields
1057       NEW converter for "crl" format network traces
1058       NEW anywhere you use arbitrary code (like dbroweval), _last_foo now
1059       maps to the last row's value for field _foo.
1060       OPTIMIZATION comment processing slightly changed so that dbmultistats
1061       now is much faster on files with lots of comments (for example, ~100k
1062       lines of comments and 700 lines of data!) (Thanks to Graham Phillips
1063       for pointing out this performance problem.)
1064       BUG FIX dbstats with median/quartiles now correctly handles singleton
1065       data points.
1066
1067   1.9,  6-Nov-00
1068       NEW dbfilesplit, split a single input file into multiple output files
1069       (based on code contributed by Pavlin Radoslavov).
1070       BUG FIX dbsort now works with perl-5.6
1071
1072   1.10, 10-Apr-01
1073       BUG FIX dbstats now handles the case where there are more n-tiles than
1074       data
1075       NEW dbstats now includes a -S option to optimize work on pre-sorted
1076       data (inspired by code contributed by Haobo Yu)
1077       BUG FIX dbsort now has a better estimate of memory usage when run on
1078       data with very short records (problem detected by Haobo Yu)
1079       BUG FIX cleanup of temporary files is slightly better
1080
1081   1.11,  2-Nov-01
1082       BUG FIX dbcolneaten now runs in constant memory
1083       NEW dbcolneaten now supports "field specifiers" that allow some control
1084       over how wide columns should be
1085       OPTIMIZATION dbsort now tries hard to be filesystem cache-friendly
1086       (inspired by "Information and Control in Gray-box Systems" by the
1087       Arpaci-Dusseau's at SOSP 2001)
1088       INTERNAL t_distr now ported to perl5 module DbTDistr
1089
1090   1.12,  30-Oct-02
1091       BUG FIX dbmultistats documentation typo fixed
1092       NEW dbcolmultiscale
1093       NEW dbcol has -r option for "relaxed error checking"
1094       NEW dbcolneaten has new -e option to strip end-of-line spaces
1095       NEW dbrow finally has a -v option to negate the test
1096       BUG FIX math bug in dbcoldiff fixed by Ashvin Goel (need to check
1097       Scheaffer test cases)
1098       BUG FIX some patches to run with Perl 5.8. Note: some programs
1099       (dbcolmultiscale, dbmultistats, dbrowsplituniq) generate warnings like:
1100       "Use of uninitialized value in concatenation (.) or string at
1101       /usr/lib/perl5/5.8.0/FileCache.pm line 98, <STDIN> line 2". Please
1102       ignore this until I figure out how to suppress it. (Thanks to Jerry
1103       Zhao for noticing perl-5.8 problems.)
1104       BUG FIX fixed an autoconf problem where configure would fail to find a
1105       reasonable prefix (thanks to Fabio Silva for reporting the problem)
1106       NEW db_to_html_table: simple conversion to html tables (NO fancy stuff)
1107       NEW dblib now has a function dblib_text2html() that will do simple
1108       conversion of iso-8859-1 to HTML
1109
1110   1.13,  4-Feb-04
1111       NEW fsdb added to the freebsd ports tree
1112       <http://www.freshports.org/databases/fsdb/>.  Maintainer:
1113       "larse@isi.edu"
1114       BUG FIX properly handle trailing spaces when data must be numeric (ex.
1115       dbstats with -FS, see test dbstats_trailing_spaces). Fix from Ning Xu
1116       "nxu@aludra.usc.edu".
1117       NEW dbcolize error message improved (bug report from Terrence Brannon),
1118       and list format documented in the README.
1119       NEW cgi_to_db converts CGI.pm-format storage to fsdb list format
1120       BUG FIX handle numeric synonyms for column names in dbcol properly
1121       ENHANCEMENT "talking about columns" section added to README. Lack of
1122       documentation pointed out by Lars Eggert.
1123       CHANGE dbformmail now defaults to using Mail ("Berkeley Mail") to send
1124       mail, rather than sendmail (sendmail is still an option, but mail
1125       doesn't require running as root)
1126       NEW on platforms that support it (i.e., with perl 5.8), fsdb works fine
1127       with unicode
1128       NEW dbfilevalidate: check a db file for some common errors
1129
1130   1.14,  24-Aug-06
1131       ENHANCEMENT README cleanup
1132       INCOMPATIBLE CHANGE dbcolsplit renamed dbcolsplittocols
1133       NEW dbcolsplittorows  split one column into multiple rows
1134       NEW dbcolsregression compute linear regression and correlation for two
1135       columns
1136       ENHANCEMENT csv_to_db: better error handling, normalize field names,
1137       skip blank lines
1138       ENHANCEMENT dbjoin now detects (and fails) if non-joined files have
1139       duplicate names
1140       BUG FIX minor bug fixed in calculation of Student t-distributions
1141       (doesn't change any test output, but may have caused small errors)
1142
1143   1.15, 12-Nov-07
1144       NEW fsdb-1.14 added to the MacOS Fink system
1145       <http://pdb.finkproject.org/pdb/package.php/fsdb>. (Thanks to Lars
1146       Eggert for maintaining this port.)
1147       NEW Fsdb::IO::Reader and Fsdb::IO::Writer now provide reasonably clean
1148       OO I/O interfaces to Fsdb files.  Highly recommended if you use fsdb
1149       directly from perl.  In the fullness of time I expect to reimplement
1150       the entire thing using these APIs to replace the current dblib.pl which
1151       is still hobbled by its roots in perl4.
1152       NEW dbmapreduce now implements a Google-style map/reduce abstraction,
1153       generalizing dbmultistats.
1154       ENHANCEMENT fsdb now uses the Perl build system (Makefile.PL, etc.),
1155       instead of autoconf.  This change paves the way to better perl-5-style
1156       modularization, proper manual pages, input of both listize and colize
1157       format for every program, and world peace.
1158       ENHANCEMENT dblib.pl is now moved to Fsdb::Old.pm.
1159       BUG FIX dbmultistats now propagates its format argument (-f). Bug and
1160       fix from Martin Lukac (thanks!).
1161       ENHANCEMENT dbformmail documentation now is clearer that it doesn't
1162       send the mail, you have to run the shell script it writes.  (Problem
1163       observed by Unkyu Park.)
1164       ENHANCEMENT adapted to autoconf-2.61 (and then these changes were
1165       discarded in favor of The Perl Way).
1166       BUG FIX dbmultistats memory usage corrected (O(# tags), not O(1))
1167       ENHANCEMENT dbmultistats can now optionally run with pre-grouped input
1168       in O(1) memory
1169       ENHANCEMENT dbroweval -N was finally implemented (eat comments)
1170
1171   2.0, 25-Jan-08
1172       2.0, 25-Jan-08 --- a quiet 2.0 release (gearing up towards complete)
1173
1174       ENHANCEMENT: shifting old programs to Perl modules, with the front-end
1175       program as just a wrapper. In the short-term, this change just means
1176       programs have real man pages. In the long-run, it will mean that one
1177       can run a pipeline in a single Perl program. So far: dbcol, dbroweval,
1178       the new dbrowcount, dbsort, the new dbmerge, the old "dbstats" (renamed
1179       dbcolstats), dbcolrename, dbcolcreate.
1180       NEW: Fsdb::Filter::dbpipeline is an internal-only module that lets one
1181       use fsdb commands from within perl (via threads).
1182           It also provides perl function aliases for the internal modules, so
1183           a string of fsdb commands in perl are nearly as terse as in the
1184           shell:
1185
1186               use Fsdb::Filter::dbpipeline qw(:all);
1187               dbpipeline(
1188                   dbrow(qw(name test1)),
1189                   dbroweval('_test1 += 5;')
1190               );
1191
1192       INCOMPATIBLE CHANGE: The old dbcolstats has been renamed
1193       dbcolstatscores. The new dbcolstats does the same thing as the old
1194       dbstats. This incompatibility is unfortunate but normalizes program
1195       names.
1196       CHANGE: The new dbcolstats program always outputs "-" (the default
1197       empty value) for statistics it cannot compute (for example, standard
1198       deviation if there is only one row), instead of the old mix of "-" and
1199       "na".
1200       INCOMPATIBLE CHANGE: The old dbcolstats program, now called
1201       dbcolstatscores, also has different arguments.  The "-t mean,stddev"
1202       option is now "--tmean mean --tstddev stddev".  See dbcolstatscores for
1203       details.
1204       INCOMPATIBLE CHANGE: dbcolcreate now assumes all new columns get the
1205       default value rather than requiring each column to have an initial
1206       constant value. To change the initial value, use the new "-e" option.
1207       NEW: dbrowcount counts rows, an almost-subset of dbcolstats's "n"
1208       output (except without differentiating numeric/non-numeric input), or
1209       the equivalent of "dbstripcomments | wc -l".
1210       NEW: dbmerge merges two sorted files. This functionality was previously
1211       embedded in dbsort.
1212       INCOMPATIBLE CHANGE: dbjoin's "-i" option to include non-matches is now
1213       renamed "-a", so as to not conflict with the new standard option "-i"
1214       for input file.
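
           The dbrowcount note above describes the count as equivalent to
           stripping comments and counting lines.  A minimal sketch of that
           idea using only grep follows.  It is an assumption for
           illustration: it treats every line beginning with "#" as a header
           or comment, and count_rows is a hypothetical helper, not an Fsdb
           command.

```shell
# Hypothetical helper (not an Fsdb command): count data rows by
# ignoring lines that start with "#" (the header and any comments).
count_rows() {
    grep -c -v '^#'
}

# Two data rows, one header line, and one trailing comment: prints 2.
printf '#fsdb experiment duration\nufs_mab_sys 37.2\nufs_mab_sys 37.3\n# a trailing comment\n' | count_rows
```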
1215
1216   2.1,  6-Apr-08
1217       2.1,  6-Apr-08 --- another alpha 2.0, but now all converted programs
1218       understand both listize and colize format
1219
1220       ENHANCEMENT: shifting more old programs to Perl modules. New in 2.1:
1221       dbcolneaten, dbcoldefine, dbcolhisto, dblistize, dbcolize, dbrecolize
1222       ENHANCEMENT dbmerge now handles an arbitrary number of input files, not
1223       just exactly two.
1224       NEW dbmerge2 is an internal routine that handles merging exactly two
1225       files.
1226       INCOMPATIBLE CHANGE dbjoin now specifies inputs like dbmerge2, rather
1227       than assuming the first two arguments were tables (as in fsdb-1).
1228           The old dbjoin argument "-i" is now "-a" or "--type=outer".
1229
1230           A minor change: comments in the source files for dbjoin are now
1231           intermixed with output rather than being delayed until the end.
1232
1233       ENHANCEMENT dbsort now no longer produces warnings when null values are
1234       passed to numeric comparisons.
1235       BUG FIX dbroweval now once again works with code that lacks a trailing
1236       semicolon. (This fixes a regression from 1.15.)
1237       INCOMPATIBLE CHANGE dbcolneaten's old "-e" option (to avoid end-of-line
1238       spaces) is now "-E" to avoid conflicts with the standard empty field
1239       argument.
1240       INCOMPATIBLE CHANGE dbcolhisto's old "-e" option is now "-E" to avoid
1241       conflicts. And its "-n", "-s", and "-w" are now "-N", "-S", and "-W" to
1242       correspond.
1243       NEW dbfilealter replaces dbrecolize, dblistize, and dbcolize, but with
1244       different options.
1245       ENHANCEMENT The library routines "Fsdb::IO" now understand both list-
1246       format and column-format data, so all converted programs can now
1247       automatically read either format.  This capability was one of the
1248       milestone goals for 2.0, so yea!
1249
1250   2.2, 23-May-08
1251       Release 2.2 is another 2.x alpha release.  Now most of the commands are
1252       ported, but a few remain, and I plan one last incompatible change (to
1253       the file header) before 2.x final.
1254
1255       ENHANCEMENT
1256           shifting more old programs to Perl modules.  New in 2.2:
1257           dbrowaccumulate, dbformmail, dbcolmovingstats, dbrowuniq,
1258           dbrowdiff, dbcolmerge, dbcolsplittocols, dbcolsplittorows,
1259           dbmapreduce, dbmultistats, dbrvstatdiff.  Also dbrowenumerate
1260           exists only as a front-end (command-line) program.
1261
1262       INCOMPATIBLE CHANGE
1263           The following programs have been dropped from fsdb-2.x:
1264           dbcoltighten, dbfilesplit, dbstripextraheaders,
1265           dbstripleadingspace.
1266
1267       NEW combined_log_format_to_db to convert Apache logfiles
1268
1269       INCOMPATIBLE CHANGE
1270           Options to dbrowdiff are now -B and -I, not -a and -i.
1271
1272       INCOMPATIBLE CHANGE
1273           dbstripcomments is now dbfilestripcomments.
1274
1275       BUG FIXES
1276           dbcolneaten better handles empty columns; dbcolhisto warning
1277           suppressed (actually a bug in high-bucket handling).
1278
1279       INCOMPATIBLE CHANGE
1280           dbmultistats now requires a "-k" option in front of the key (tag)
1281           field, or if none is given, it will group by the first field (both
1282           like dbmapreduce).
1283
1284       KNOWN BUG
1285           dbmultistats with quantile option doesn't work currently.
1286
1287       INCOMPATIBLE CHANGE
1288           dbcoldiff is renamed dbrvstatdiff.
1289
1290       BUG FIXES
1291           dbformmail was leaving its log message as a  command, not a
1292           comment.  Oops.  No longer.
1293
1294   2.3, 27-May-08 (alpha)
1295       Another alpha release, this one just to fix the critical dbjoin bug
1296       listed below (that happens to have blocked my MP3 jukebox :-).
1297
1298       BUG FIX
1299           Dbsort no longer hangs if given an input file with no rows.
1300
1301       BUG FIX
1302           Dbjoin now works with unsorted input coming from a pipeline (like
1303           stdin).  Perl-5.8.8 has a bug (?) that was making this case
1304           fail---opening stdin in one thread, reading some, then reading more
1305           in a different thread caused an lseek which works on files, but
1306           fails on pipes like stdin.  Go figure.
1307
1308       BUG FIX / KNOWN BUG
1309           The dbjoin fix also fixed dbmultistats -q (it now gives the right
1310           answer).  However, a new bug appeared, producing messages like:
1311               Attempt to free unreferenced scalar: SV 0xa9dd0c4, Perl
1312               interpreter: 0xa8350b8 during global destruction.
1313           So the dbmultistats_quartile test is still disabled.
1314
1315   2.4, 18-Jun-08
1316       Another alpha release, mostly to fix minor usability problems in
1317       dbmapreduce and client functions.
1318
1319       ENHANCEMENT
1320           dbrow now defaults to running user supplied code without warnings
1321           (as with fsdb-1.x).  Use "--warnings" or "-w" to turn them back on.
1322
1323       ENHANCEMENT
1324           dbroweval can now write different format output than the input,
1325           using the "-m" option.
1326
1327       KNOWN BUG
1328           dbmapreduce emits warnings on perl 5.10.0 about "Unbalanced string
1329           table refcount" and "Scalars leaked" when run with an external
1330           program as a reducer.
1331
1332           dbmultistats emits the warning "Attempt to free unreferenced
1333           scalar" when run with quartiles.
1334
1335           In each case the output is correct.  I believe these can be
1336           ignored.
1337
1338       CHANGE
1339           dbmapreduce no longer logs a line for each reducer that is invoked.
1340
1341   2.5, 24-Jun-08
1342       Another alpha release, fixing more minor bugs in "dbmapreduce" and
1343       lossage in "Fsdb::IO".
1344
1345       ENHANCEMENT
1346           dbmapreduce can now tolerate non-map-aware reducers that pass back
1347           the key column in their output.  It also passes the current key as
1348           the last argument to external reducers.
1349
1350       BUG FIX
1351           Fsdb::IO::Reader, correctly handle "-header" option again.  (Broken
1352           since fsdb-2.3.)
1353
1354   2.6, 11-Jul-08
1355       Another alpha release, needed to fix DaGronk.  One new port, small bug
1356       fixes, and important fix to dbmapreduce.
1357
1358       ENHANCEMENT
1359           shifting more old programs to Perl modules.  New in 2.6:
1360           dbcolpercentile.
1361
1362       INCOMPATIBLE CHANGE and ENHANCEMENTS dbcolpercentile arguments changed,
1363       use "--rank" to require ranking instead of "-r". Also, "--ascending"
1364       and "--descending" can now be specified separately, both for
1365       "--percentile" and "--rank".
1366       BUG FIX
1367           Sigh, the sense of the --warnings option in dbrow was inverted.  No
1368           longer.
1369
1370       BUG FIX
1371           I found and fixed the string leaks (errors like "Unbalanced string
1372           table refcount" and "Scalars leaked") in dbmapreduce and
1373           dbmultistats.  (All "IO::Handle"s in threads must be manually
1374           destroyed.)
1375
1376       BUG FIX
1377           The "-C" option to specify the column separator in dbcolsplittorows
1378           now works again (broken since it was ported).
1379
1380   2.7, 30-Jul-08 beta
1381
1382       The beta release of fsdb-2.x.  Finally, all programs are ported.  As
1383       statistics, the number of lines of non-library code doubled from 7.5k
1384       to 15.5k.  The libraries are much more complete, going from 866 to 5164
1385       lines.  The overall number of programs is about the same, although 19
1386       were dropped and 11 were added.  The number of test cases has grown
1387       from 116 to 175.  All programs are now in perl-5, no more shell scripts
1388       or perl-4.  All programs now have manual pages.
1389
1390       Although this is a major step forward, I still expect to rename "jdb"
1391       to "fsdb".
1392
1393       ENHANCEMENT
1394           shifting more old programs to Perl modules.  New in 2.7:
1395           dbcolscorrelate, dbcolsregression, cgi_to_db, dbfilevalidate,
1396           db_to_csv, csv_to_db, db_to_html_table, kitrace_to_db,
1397           tcpdump_to_db, tabdelim_to_db, ns_to_db.
1398
1399       INCOMPATIBLE CHANGE
1400           The following programs have been dropped from fsdb-2.x: db2dcliff,
1401           dbcolmultiscale, crl_to_db, ipchain_logs_to_db.  They may come
1402           back, but seemed overly specialized.  The program dbrowsplituniq
1403           was dropped because it is superseded by dbmapreduce.
1404           dmalloc_to_db was dropped pending test cases and examples.
1405
1406       ENHANCEMENT
1407           dbfilevalidate now has a "-c" option to correct errors.
1408
1409       NEW html_table_to_db provides the inverse of db_to_html_table.
1410
1411   2.8,  5-Aug-08
1412       Change header format, preserving forwards compatibility.
1413
1414       BUG FIX
1415           Complete editing pass over the manual, making sure it aligns with
1416           fsdb-2.x.
1417
1418       SEMI-COMPATIBLE CHANGE
1419           The header of fsdb files has changed: it is now #fsdb, not #h (or
1420           #L) and parsing of -F and -R are also different.  See dbfilealter
1421           for the new specification.  The v1 file format will be read,
1422           compatibly, but not written.
1423
1424       BUG FIX
1425           dbmapreduce now tolerates comments that precede the first key,
1426           instead of failing with an error message.
1427
1428   2.9, 6-Aug-08
1429       Still in beta; just a quick bug-fix for dbmapreduce.
1430
1431       ENHANCEMENT
1432           dbmapreduce now generates plausible output when given no rows of
1433           input.
1434
1435   2.10, 23-Sep-08
1436       Still in beta, but picking up some bug fixes.
1437
1438       ENHANCEMENT
1439           dbmapreduce now generates plausible output when given no rows of
1440           input.
1441
1442       ENHANCEMENT
1443           dbroweval's warnings option was backwards; now corrected.  As a
1444           result, warnings in user code now default off (like in fsdb-1.x).
1445
1446       BUG FIX
1447           dbcolpercentile now defaults to assuming the target column is
1448           numeric.  The new option "-N" allows selection of a non-numeric
1449           target.
1450
1451       BUG FIX
1452           dbcolscorrelate now includes "--sample" and "--nosample" options to
1453           compute the sample or full population correlation coefficients.
1454           Thanks to Xue Cai for finding this bug.
1455
1456   2.11, 14-Oct-08
1457       Still in beta, but picking up some bug fixes.
1458
1459       ENHANCEMENT
1460           html_table_to_db is now more aggressive about filling in empty
1461           cells with the official empty value, rather than leaving them blank
1462           or as whitespace.
1463
1464       ENHANCEMENT
1465           dbpipeline now catches failures during pipeline element setup and
1466           exits reasonably gracefully.
1467
1468       BUG FIX
1469           dbsubprocess now reaps child processes, thus avoiding running out
1470           of processes when used a lot.
1471
1472   2.12, 16-Oct-08
1473       Finally, a full (non-beta) 2.x release!
1474
1475       INCOMPATIBLE CHANGE
1476           Jdb has been renamed Fsdb, the flatfile-streaming database.  This
1477           change affects all internal Perl APIs, but no shell command-level
1478           APIs.  While Jdb served well for more than ten years, it is easily
1479           confused with the Java debugger (even though Jdb was there first!).
1480           It also is too generic to work well in web search engines.
1481           Finally, Jdb stands for ``John's database'', and we're a bit beyond
1482           that.  (However, some call me the ``file-system guy'', so one could
1483           argue it retains that meaning.)
1484
1485           If you just used the shell commands, this change should not affect
1486           you.  If you used the Perl-level libraries directly in your code,
1487           you should be able to rename "Jdb" to "Fsdb" to move to 2.12.
1488
1489           The jdb-announce list has not yet been renamed, but it will be shortly.
1490
1491           With this release I've accomplished everything I wanted to in
1492           fsdb-2.x.  I therefore expect to return to boring, bugfix releases.
1493
1494   2.13, 30-Oct-08
1495       BUG FIX
1496           dbrowaccumulate now treats non-numeric data as zero by default.
1497
1498       BUG FIX
1499           Fixed a perl-5.10ism in dbmapreduce that breaks that program under
1500           5.8.  Thanks to Martin Lukac for reporting the bug.
1501
1502   2.14, 26-Nov-08
1503       BUG FIX
1504           Improved documentation for dbmapreduce's "-f" option.
1505
1506       ENHANCEMENT
1507           dbcolmovingstats now computes a moving standard deviation in
1508           addition to a moving mean.
1509
1510   2.15, 13-Apr-09
1511       BUG FIX
1512           Fix a make install bug reported by Shalindra Fernando.
1513
1514   2.16, 14-Apr-09
1515       BUG FIX
1516           Another minor release bug: on some systems programize_module loses
1517           executable permissions.  Again reported by Shalindra Fernando.
1518
1519   2.17, 25-Jun-09
1520       TYPO FIXES
1521           Typo in the dbroweval manual fixed.
1522
1523       IMPROVEMENT
1524           There is no longer a comment line to label columns in dbcolneaten,
1525           instead the header line is tweaked to line up.  This change
1526           restores the Jdb-1.x behavior, and means that repeated runs of
1527           dbcolneaten no longer add comment lines each time.
1528
1529       BUG FIX
1530           It turns out dbcolneaten was not correctly handling trailing
1531           spaces when given the "-E" option to suppress them.  This
1532           regression is now fixed.
1533
1534       EXTENSION
1535           dbroweval(1) can now handle direct references to the last row via
1536           $lfref, a dubious but now documented feature.
1537
1538       BUG FIXES
1539           Separators set with "-C" in dbcolmerge and dbcolsplittocols were
1540           not properly setting the heading, and null fields were not
1541           recognized.  The first bug was reported by Martin Lukac.
1542
1543   2.18,  1-Jul-09  A minor release
1544       IMPROVEMENT
1545           Documentation for Fsdb::IO::Reader has been improved.
1546
1547       IMPROVEMENT
1548           The package should now be PGP-signed.
1549
1550   2.19, 10-Jul-09
1551       BUG FIX
1552           Internal improvements to debugging output and robustness of
1553           dbmapreduce and dbpipeline.  TEST/dbpipeline_first_fails.cmd re-
1554           enabled.
1555
1556   2.20, 30-Nov-09 (A collection of minor bugfixes, plus a build against
1557       Fedora 12.)
1558       BUG FIX
1559           Logging for dbmapreduce with code refs is now stable (it no longer
1560           includes a hex pointer to the code reference).
1561
1562       BUG FIX
1563           Better handling of mixed blank lines in Fsdb::IO::Reader (see test
1564           case dbcolize_blank_lines.cmd).
1565
1566       BUG FIX
1567           html_table_to_db now handles multi-line input better, and handles
1568           tables with COLSPAN.
1569
1570       BUG FIX
1571           dbpipeline now cleans up threads in an "eval" to prevent "cannot
1572           detach a joined thread" errors that popped up in perl-5.10.
1573           Hopefully this prevents a race condition that causes the test
1574           suites to hang about 20% of the time (in dbpipeline_first_fails).
1575
1576       IMPROVEMENT
1577           dbmapreduce now detects and correctly fails when the input and
1578           reducer have incompatible field separators.
1579
1580       IMPROVEMENT
1581           dbcolstats, dbcolhisto, dbcolscorrelate, dbcolsregression, and
1582           dbrowcount now all take an "-F" option to let one specify the
1583           output field separator (so they work better with dbmapreduce).
1584
1585       BUG FIX
1586           An omitted "-k" from the manual page of dbmultistats is now there.
1587           Bug reported by Unkyu Park.
1588
1589   2.21, 17-Apr-10 bug fix release
1590       BUG FIX
1591           Fsdb::IO::Writer now no longer fails with -outputheader => never
1592           (an obscure bug).
1593
1594       IMPROVEMENT
1595           Fsdb (in the warnings section) and dbcolstats now more carefully
1596           document how they handle (and do not handle) numerical precision
1597           problems, and other general limits.  Thanks to Yuri Pradkin for
1598           prompting this documentation.
1599
1600       IMPROVEMENT
1601           "Fsdb::Support::fullname_to_sortkey" is now restored from "Jdb".
1602
1603       IMPROVEMENT
1604           Documentation for multiple styles of input approaches (including
1605           performance description) added to Fsdb::IO.
1606
1607   2.22, 2010-10-31 One new tool dbcolcopylast and several bug fixes for Perl
1608       5.10.
1609       BUG FIX
1610           dbmerge now correctly handles n-way merges.  Bug reported by Yuri
1611           Pradkin.
1612
1613       INCOMPATIBLE CHANGE
1614           dbcolneaten now defaults to not padding the last column.
1615
1616       ADDITION
1617           dbrowenumerate now takes -N NewColumn to give the new column a name
1618           other than "count".  Feature requested by Mike Rouch in January
1619           2005.
1620
1621       ADDITION
1622           New program dbcolcopylast copies the last value of a column into a
1623           new column copylast_column of the next row.  New program requested
1624           by Fabio Silva; useful for converting dbmultistats output into
1625           dbrvstatdiff input.
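
               The copy-last idea can be sketched with plain awk on
               whitespace-separated data rows.  This is an illustration only,
               not the dbcolcopylast implementation: it ignores the fsdb
               header, the function name is hypothetical, and the "-" used for
               the first row (which has no previous value) is an assumption.

```shell
# copylast_sketch COLUMN_INDEX
# Reads data rows on stdin and appends, to each row, the value that the
# given column held in the previous row.
# Assumption: "-" marks "no previous value" on the first row.
copylast_sketch() {
    awk -v col="$1" '
        {
            print $0, (seen ? last : "-")   # append the previous row value
            last = $col; seen = 1           # remember this row for the next one
        }'
}

printf '%s\n' 'a 10' 'b 20' 'c 30' | copylast_sketch 2
# prints:
#   a 10 -
#   b 20 10
#   c 30 20
```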
1626
1627       BUG FIX
1628           Several tools (particularly dbmapreduce and dbmultistats) would
1629           report errors like "Unbalanced string table refcount: (1) for
1630           "STDOUT" during global destruction" on exit, at least on certain
1631           versions of Perl (for me on 5.10.1), but similar errors have been
1632           off-and-on for several Perl releases.  Although I think my code
1633           looked OK, I worked around this problem with a different way of
1634           handling standard IO redirection.
1635
1636   2.23, 2011-03-10 Several small portability bugfixes; improved dbcolstats
1637       for large datasets
1638       IMPROVEMENT
1639           Documentation to dbrvstatdiff was changed to use "sd" to refer to
1640           standard deviation, not "ss" (which might be confused with sum-of-
1641           squares).
1642
1643       BUG FIX
1644           This documentation about dbmultistats was missing the -k option in
1645           some cases.
1646
1647       BUG FIX
1648           dbmapreduce was failing on MacOS-10.6.3 for some tests with the
1649           error
1650
1651               dbmapreduce: cannot run external dbmapreduce reduce program (perl TEST/dbmapreduce_external_with_key.pl)
1652
1653           The problem seemed to be only in the error, not in operation.  On
1654           MacOS, the error is now suppressed.  Thanks to Alefiya Hussain for
1655           providing access to a Mac system that allowed debugging of this
1656           problem.
1657
1658       IMPROVEMENT
1659           The csv_to_db command requires an external Perl library
1660           (Text::CSV_XS).  On computers that lack this optional library,
1661           previously Fsdb would configure with a warning and then test cases
1662           would fail.  Now those test cases are skipped with an additional
1663           warning.
1664
1665       BUG FIX
1666           The test suite now supports alternative valid output, as a hack to
1667           account for last-digit floating point differences.  (Not very
1668           satisfying :-( )
1669
1670       BUG FIX
1671           dbcolstats output for confidence intervals on very large datasets
1672           has changed.  Previously it failed for more than 2^31-1 records,
1673           and handling of T-Distributions with thousands of rows was a bit
1674           dubious.  Now datasets with more than 10000 rows are considered
1675           infinitely large and hopefully correctly handled.
1676
1677   2.24, 2011-04-15 Improvements to fix an old bug in dbmapreduce with
1678       different field separators
1679       IMPROVEMENT
1680           The dbfilealter command had a "--correct" option to work around
1681           incompatible field separators, but it did nothing.  Now it
1682           does the correct but sad, data-losing thing.
1683
1684       IMPROVEMENT
1685           The dbmultistats command previously failed with an error message
1686           when invoked on input with a non-default field separator.  The root
1687           cause was the underlying dbmapreduce that did not handle the case
1688           of reducers that generated output with a different field separator
1689           than the input.  We now detect and repair incompatible field
1690           separators.  This change corrects a problem originally documented
1691           and detected in Fsdb-2.20.  Bug re-reported by Unkyu Park.
1692
1693   2.25, 2011-08-07 Two new tools, xml_to_db and dbfilepivot, and a bugfix for
1694       two people.
1695       IMPROVEMENT
1696           kitrace_to_db now supports a --utc option, which also fixes this
1697           test case for users outside of the Pacific time zone.  Bug reported
1698           by David Graff, and also by Peter Desnoyers (within a week of each
1699           other :-)
1700
1701       NEW xml_to_db can convert simple, very regular XML files into Fsdb.
1702
1703       NEW dbfilepivot "pivots" a file, converting multiple rows corresponding
1704           to the same entity into a single row with multiple columns.
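
               A minimal sketch of what "pivoting" means, using plain awk
               rather than dbfilepivot itself.  It assumes rows of the form
               "entity attribute value", already grouped by entity, with every
               entity listing the same attributes in the same order; the
               function name and the sample data are hypothetical.

```shell
# pivot_sketch: collapse consecutive rows that share the first field into
# one row holding the entity followed by each of its values.
# Assumption: input is grouped by entity and attributes appear in a fixed order.
pivot_sketch() {
    awk '
        $1 != key { if (NR > 1) print row; key = $1; row = $1 }  # new entity
        { row = row " " $3 }                                     # collect value
        END { if (NR > 0) print row }                            # flush last entity
    '
}

printf '%s\n' 'alice age 30' 'alice city LA' 'bob age 25' 'bob city NYC' |
    pivot_sketch
# prints:
#   alice 30 LA
#   bob 25 NYC
```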
1705
1706   2.26, 2011-12-12 Bug fixes, particularly for perl-5.14.2.
1707       BUG FIX
1708           Bugs fixed in Fsdb::IO::Reader(3) manual page.
1709
1710       BUG FIX
1711           Fixed problems where dbcolstats was truncating floating point
1712           numbers when sorting.  This strange behavior happens as of
1713           perl-5.14.2 and it seems like a Perl bug.  I've worked around it
1714           for the test suites, but I'm a bit nervous.
1715
1716   2.27, 2012-11-15 Accumulated bug fixes.
1717       IMPROVEMENT
1718           csv_to_db now reports errors in CSV input with real diagnostics.
1719
1720       IMPROVEMENT
1721           dbcolmovingstats can now compute median, when given the "-m"
1722           option.
1723
1724       BUG FIX
1725           dbcolmovingstats non-numeric handling (the "-a" option) now works
1726           properly.
1727
1728       DOCUMENTATION
1729           The internal t/test_command.t test framework is now documented.
1730
1731       BUG FIX
1732           dbrowuniq now correctly handles the case where there is no input
1733           (previously it output a blank line, which is a malformed fsdb
1734           file).  Thanks to Yuri Pradkin for reporting this bug.
1735
1736   2.28, 2012-11-15 A quick release to fix most rpmlint errors.
1737       BUG FIX
1738           Fixed a number of minor release problems (wrong permissions, old
1739           FSF address, etc.) found by rpmlint.
1740
1741   2.29, 2012-11-20 a quick release for CPAN testing
1742       IMPROVEMENT
1743           Tweaked the RPM spec.
1744
1745       IMPROVEMENT
1746           Modified Makefile.PL to fail gracefully on Perl installations that
1747           lack threads.  (Without this fix, I get massive failures in the
1748           non-ithreads test system.)
1749
1750   2.30, 2012-11-25 improvements to perl portability
1751       BUG FIX
1752           Removed unicode character in documentation of dbcolscorrelated so pod
1753           tests will pass.  (Sigh, that should work :-( )
1754
1755       BUG FIX
1756           Fixed test suite failures on 5 tests (dbcolcreate_double_creation
1757           was the first) due to Carp's addition of a period.  This problem
1758           was breaking Fsdb on perl-5.17.  Thanks to Michael McQuaid for
1759           helping diagnose this problem.
1760
1761       IMPROVEMENT
1762           The test suite now prints out the names of tests it tries.
1763
1764   2.31, 2012-11-28 A release with actual improvements to dbfilepivot and
1765       dbrowuniq.
1766       BUG FIX
1767           Documentation fixes: typos in dbcolscorrelated, bugs in
1768           dbfilepivot, clarification for comment handling in
1769           Fsdb::IO::Reader.
1770
1771       IMPROVEMENT
1772           Previously dbfilepivot assumed the input was grouped by keys and
1773           didn't verify that pre-condition.  Now there is no pre-condition (it
1774           will sort the input by default), and it checks if the invariant is
1775           violated.
1776
1777       BUG FIX
1778           Previously dbfilepivot failed if the input had comments (oops :-);
1779           no longer.
1780
1781       IMPROVEMENT
1782           Now dbrowuniq has the "-L" option to preserve the last unique row
1783           (instead of the first), a common idiom.
1784
1785   2.32, 2012-12-21 Test suites should now be more numerically robust.
1786       NEW New dbfilediff does fsdb-aware file differencing.  It does not do
1787           smart intuition of add/removes like Unix diff(1), but it does know
1788           about columns, and with "-E", it does numeric-aware differences.
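
               To illustrate what a numeric-aware comparison might look like,
               the sketch below treats two fields as equal when they match as
               text or, if both look numeric, when they agree within a relative
               tolerance.  The function name, the tolerance of 1e-6, and the
               exact rule are assumptions for illustration; they are not taken
               from dbfilediff.

```shell
# nearly_equal A B [TOL]
# Succeeds (exit 0) if A and B are identical, or both numeric and within
# a relative tolerance TOL (hypothetical default: 1e-6).
nearly_equal() {
    awk -v a="$1" -v b="$2" -v tol="${3:-1e-6}" 'BEGIN {
        if (a == b) exit 0                          # same text or same number
        num = "^[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?$"
        if (a !~ num || b !~ num) exit 1            # text must match exactly
        d = a - b; if (d < 0) d = -d                # absolute difference
        m = (a < 0 ? -a : a); n = (b < 0 ? -b : b)  # magnitudes
        exit !(d <= tol * (m > n ? m : n))
    }'
}

nearly_equal 37.2 37.2000000001 && echo "same"        # prints same
nearly_equal 37.2 37.3          || echo "different"   # prints different
```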
1789
1790       IMPROVEMENT
1791           Test suites that are numeric now use dbfilediff to do numeric-aware
1792           comparisons, so the test suite should now be robust to slightly
1793           different computers and operating systems and compilers than
1794           exactly what I use.
1795
1796   2.33, 2012-12-23 Minor fixes to some test cases.
1797       IMPROVEMENT
1798           dbfilediff and dbrowuniq now support the "-N" option to give the
1799           new column a different name.  (And test cases where this
1800           duplication mattered have been fixed.)
1801
1802       IMPROVEMENT
1803           dbrvstatdiff now shows the t-test breakpoint with a reasonable
1804           number of floating point digits.
1805
1806       BUG FIX
1807           Fixed a numerical stability problem in the dbroweval_last test
1808           case.
1809

WHAT'S NEW

1811   2.34, 2013-02-10 Parallelism in dbmerge.
1812       IMPROVEMENT
1813           Documentation for dbjoin now includes resource requirements.
1814
1815       IMPROVEMENT
1816           Default memory usage for dbsort is now about 256MB.  (The world
1817           keeps moving forward.)
1818
1819       IMPROVEMENT
1820           dbmerge now does merging in parallel.  As a side-effect, dbsort
1821           should be faster when input overflows memory.  The level of
1822           parallelism can be limited with the "--parallelism" option.  (There
1823           is more work to do here, but we're off to a start.)
1824
1825   2.35, 2013-02-23 Improvements to dbmerge parallelism
1826       BUG FIX
1827           Fsdb temporary files are now created more securely (with
1828           File::Temp).
1829
1830       IMPROVEMENT
1831           Programs that sort or merge on fields (dbmerge2, dbmerge, dbsort,
1832           dbjoin) now report an error if no fields on which to join or merge
1833           are given.
1834
1835       IMPROVEMENT
1836           Parallelism in dbmerge should now be more consistent, with less
1837           starting and stopping.
1838
1839       IMPROVEMENT In dbmerge, the "--xargs" option lets one give input
1840       filenames on standard input, rather than the command line. This feature
1841       paves the way for faster dbsort for large inputs (by pipelining sorting
1842       and merging), expected in the next release.
1843
1844   2.36, 2013-02-25 dbsort pipelines with dbmerge
1845       IMPROVEMENT For large inputs, dbsort now pipelines sorting and merging,
1846       allowing earlier processing.
1847       BUG FIX Since 2.35, dbmerge delayed cleanup of intermediate files,
1848       thereby requiring extra disk space.
1849
1850   2.37, 2013-02-26 quick bugfix to support parallel sort and merge from
1851       recent releases
1852       BUG FIX Since 2.35, dbmerge delayed removal of input files given by
1853       "--xargs".  This problem is now fixed.
1854
1855   2.38, 2013-04-29 minor bug fixes
1856       CLARIFICATION
1857           Configure now rejects Windows since tests seem to hang on some
1858           versions of Windows.  (I would love help from a Windows developer
1859           to get this problem fixed, but I cannot do it.)  See
1860           https://rt.cpan.org/Ticket/Display.html?id=84201.
1861
1862       IMPROVEMENT
1863           All programs that use temporary files (dbcolpercentile,
1864           dbcolscorrelate, dbcolstats, dbcolstatscores) now take the "-T"
1865           option and set the temporary directory consistently.
1866
1867           In addition, error messages are better when the temporary directory
1868           has problems.  Problem reported by Liang Zhu.
1869
1870       BUG FIX
1871           dbmapreduce was failing with external, map-reduce aware reducers
1872           (when invoked with -M and an external program).  (Sigh, did this
1873           case ever work?)  This case should now work.  Thanks to Yuri
1874           Pradkin for reporting this bug (in 2011).
1875
1876       BUG FIX
1877           Fixed perl-5.10 problem with dbmerge.  Thanks to Yuri Pradkin for
1878           reporting this bug (in 2013).
1879
1880   2.39, 2013-05-31 quick release for the dbrowuniq extension
1881       BUG FIX
1882           Actually in 2.38, the Fedora .spec got cleaner dependencies.
1883           Suggestion from Christopher Meng via
1884           <https://bugzilla.redhat.com/show_bug.cgi?id=877096>.
1885
1886       ENHANCEMENT
1887           Fsdb files are now explicitly set into UTF-8 encoding, unless one
1888           specifies "-encoding" to "Fsdb::IO".
1889
1890       ENHANCEMENT
1891           dbrowuniq now supports "-I" for incremental counting.
1892
1893   2.40, 2013-07-13 small bug fixes
1894       BUG FIX
1895           dbsort now has more respect for a user-given temporary directory;
1896           it no longer is ignored for merging.
1897
1898       IMPROVEMENT
1899           dbrowuniq now has options to output the first, last, and both first
1900           and last rows of a run ("-F", "-L", and "-B").
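
               The notion of the first and last rows of a run can be sketched
               in plain awk.  This is not dbrowuniq: the function name is
               hypothetical, and it assumes a run is a stretch of adjacent
               rows whose first field is equal.

```shell
# run_edge_sketch MODE
# MODE is "first" or "last"; adjacent rows with the same first field form
# a run, and only the chosen edge of each run is printed.
run_edge_sketch() {
    awk -v mode="$1" '
        NR > 1 && $1 != key  { if (mode == "last") print prev }      # run ended
        NR == 1 || $1 != key { if (mode == "first") print; key = $1 } # run began
        { prev = $0 }
        END { if (NR > 0 && mode == "last") print prev }             # final run
    '
}

printf '%s\n' 'a 1' 'a 2' 'b 3' 'b 4' 'b 5' 'c 6' | run_edge_sketch first
# prints: a 1, b 3, c 6 (one per line)
printf '%s\n' 'a 1' 'a 2' 'b 3' 'b 4' 'b 5' 'c 6' | run_edge_sketch last
# prints: a 2, b 5, c 6 (one per line)
```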
1901
1902       BUG FIX
1903           dbrowuniq now correctly handles "-N".  Sigh, it didn't work before.
1904
1905   2.41, 2013-07-29 small bug and packaging fixes
1906       ENHANCEMENT
1907           Documentation to dbrvstatdiff improved (inspired by questions from
1908           Qian Kun).
1909
1910       BUG FIX
1911           dbrowuniq no longer duplicates singleton unique lines when
1912           outputting both (with "-B").
1913
1914       BUG FIX
1915           Add missing "XML::Simple" dependency to Makefile.PL.
1916
1917       ENHANCEMENT
1918           Tests now show the diff of the failing output if run with "make
1919           test TEST_VERBOSE=1".
1920
1921       ENHANCEMENT
1922           dbroweval now includes documentation for how to output extra rows.
1923           Suggestion from Yuri Pradkin.
1924
1925       BUG FIX
1926           Several improvements to the Fedora package from Michael Schwendt
1927           via <https://bugzilla.redhat.com/show_bug.cgi?id=877096>, and from
1928           the harsh master that is rpmlint.  (I am stymied at teaching it
1929           that "outliers" is spelled correctly.  Maybe I should send it
1930           Schneier's book.  And an unresolvable invalid-spec-name lurks in
1931           the SRPM.)
1932
1933   2.42, 2013-07-31 A bug fix and packaging release.
1934       ENHANCEMENT
1935           Documentation to dbjoin improved to better describe memory usage.
1936           (Based on problem report by Lin Quan.)
1937
1938       BUG FIX
1939           The .spec is now perl-Fsdb.spec to satisfy rpmlint.  Thanks to
1940           Christopher Meng for a specific bug report.
1941
1942       BUG FIX
1943           Test dbroweval_last.cmd no longer has a column that caused failures
1944           because of numerical instability.
1945
1946       BUG FIX
1947           Some tests now better handle bugs in old versions of perl (5.10,
1948           5.12).  Thanks to Calvin Ardi for help debugging this on a Mac with
1949           perl-5.12, but the fix should affect other platforms.
1950
1951   2.43, 2013-08-27 Adds in-file compression.
1952       BUG FIX
1953           Changed the sort on TEST/dbsort_merge.cmd to strings (from
1954           numerics) so we're less susceptible to false test-failures due to
1955           floating point IO differences.
1956
1957       EXPERIMENTAL ENHANCEMENT
1958           Yet more parallelism in dbmerge: new "endgame-mode" builds a merge
1959           tree of processes at the end of large merge tasks to get maximal
1960           parallelism.  Currently this feature is off by default because it
1961           can hang for some inputs.  Enable this experimental feature with
1962           "--endgame".
1963
1964       ENHANCEMENT
1965           "Fsdb::IO" now handles being given "IO::Pipe" objects (as exercised
1966           by dbmerge).
1967
1968       BUG FIX
1969           Handling of NamedTmpfiles now supports concurrency.  This fix will
1970           hopefully fix occasional "Use of uninitialized value $_ in string
1971           ne at ...NamedTmpfile.pm line 93."  errors.
1972
1973       BUG FIX
1974           Fsdb now requires perl 5.10.  This is a bug fix because some test
1975           cases used to require it, but this fact was not properly
1976           documented.  (Back-porting to 5.008 would require removing all "//"
1977           operators.)
1978
1979       ENHANCEMENT
1980           Fsdb now handles automatic compression of file contents.  Enable
1981           compression with "dbfilealter -Z xz" (or "gz" or "bz2").  All
1982           programs should operate on compressed files and leave the output
1983           with the same level of compression.  "xz" is recommended as fastest
1984           and most efficient.  "gz" produces unrepeatable output (and so
1985           has no output test); it seems to insist on adding a timestamp.
1986
1987   2.44, 2013-10-02 A major change--all threads are gone.
1988       ENHANCEMENT
1989           Fsdb is now thread free and only uses processes for parallelism.
1990           This change is a big change--the entire motivation for Fsdb-2 was
1991           to exploit parallelism via threading.  Parallelism--good, but perl
1992           threading--bad for performance.  Horribly bad for performance.
1993           About 20x worse than pipes on my box.  (See perl bug #119445 for
1994           the discussion.)
1995
1996       NEW "Fsdb::Support::Freds" provides a thread-like abstraction over
1997           forking, with some nice support for callbacks in the parent upon
1998           child termination.
1999
2000       ENHANCEMENT
2001           Details about removing threads: "dbpipeline" is thread free, and
2002           new tests verify each of its parts.  The easy cases are
2003           "dbcolpercentile", "dbcolstats", "dbfilepivot", "dbjoin", and
2004           "dbcolstatscores", each of which uses it in simple ways
2005           (2013-09-09).  "dbmerge" is now thread free (2013-09-13), but was a
2006           significant rewrite, which brought "dbsort" along.  "dbmapreduce"
2007           is partly thread free (2013-09-21), again as a rewrite, and it
2008           brings "dbmultistats" along.  Full "dbmapreduce" support took much
2009           longer (2013-10-02).
2010
2011       BUG FIX
2012           When running with user-only output ("-n"), dbroweval now resets the
2013           output vector $ofref after it has been output.
2014
2015       NEW dbcolcreate will create all columns at the head of each row with
2016           the "--first" option.
2017
2018       NEW dbfilecat will concatenate two files, verifying that they have the
2019           same schema.
2020
2021       ENHANCEMENT
2022           dbmapreduce now passes comments through, rather than eating them as
2023           before.
2024
2025           Also, dbmapreduce now supports a "--" option to prevent
2026           misinterpreting sub-program parameters as for dbmapreduce.
2027
2028       INCOMPATIBLE CHANGE
2029           dbmapreduce no longer figures out if it needs to add the key to the
2030           output.  For multi-key-aware reducers, it never does (and cannot).
2031           For non-multi-key-aware reducers, it defaults to add the key and
2032           will now fail if the reducer adds the key (with error "dbcolcreate:
2033           attempt to create pre-existing column...").  In such cases, one
2034           must disable adding the key with the new option "--no-prepend-key".
2035
2036       INCOMPATIBLE CHANGE
2037           dbmapreduce no longer copies the input field separator by default.
2038           For multi-key-aware reducers, it never does (and cannot).  For non-
2039           multi-key-aware reducers, it defaults to not copying the field
2040           separator, but it will copy it (the old default) with the
2041           "--copy-fs" option.
2042
2043   2.45, 2013-10-07 cleanup from de-thread-ification
2044       BUG FIX
2045           Corrected a fast busy-wait in dbmerge.
2046
2047       ENHANCEMENT
2048           Endgame mode enabled in dbmerge; it (and also large cases of
2049           dbsort) should now exploit greater parallelism.
2050
2051       BUG FIX
2052           Test case with "Fsdb::BoundedQueue" (gone since 2.44) now removed.
2053
2054   2.46, 2013-10-08 continuing cleanup of our no-threads version
2055       BUG FIX
2056           Fixed some packaging details.  (Really, threads are no longer
2057           required, missing tests in the MANIFEST.)
2058
2059       IMPROVEMENT
2060           dbsort now better communicates with the merge process to avoid
2061           bursty parallelism.
2062
2063           Fsdb::IO::Writer now can take "-autoflush => 1" for line-buffered
2064           IO.
2065
2066   2.47, 2013-10-12 test suite cleanup for non-threaded perls
2067       BUG FIX
2068           Removed some stray "use threads" in some test cases.  We didn't
2069           need them, and these were breaking non-threaded perls.
2070
2071       BUG FIX
2072           Better handling of Fred cleanup; should fix intermittent
2073           dbmapreduce failures on BSD.
2074
2075       ENHANCEMENT
2076           Improved test framework to show output when tests fail.  (This
2077           time, for real.)
2078
2079   2.48, 2014-01-03 small bugfixes and improved release engineering
2080       ENHANCEMENT
2081           Test suites now skip tests for libraries that are missing.  (Patch
2082           for missing "IO::Compress::Xz" contributed by Calvin Ardi.)
2083
2084       ENHANCEMENT
2085           Removed references to Jdb in the package specification.  Since the
2086           name was changed in 2008, there's no longer a huge need for
2087           backwards compatibility.  (Suggestion from Petr Šabata.)
2088
2089       ENHANCEMENT
2090           Test suites now invoke the perl using the path from
2091           $Config{perlpath}.  Hopefully this helps testing in environments
2092           where there are multiple installed perls and the default perl is
2093           not the same as the perl-under-test (as happens in
2094           cpantesters.org).
2095
2096       BUG FIX
2097           Added specific encoding to this manpage to account for Unicode.
2098           Required to build correctly against perl-5.18.
2099
2100   2.49, 2014-01-04 bugfix to unicode handling in Fsdb IO (plus minor
2101       packaging fixes)
2102       BUG FIX
2103           Restored a line in the .spec to chmod g-s.
2104
2105       BUG FIX
2106           Unicode decoding is now handled correctly for programs that read
2107           from standard input.  (Also: New test scripts cover unicode input
2108           and output.)
2109
2110       BUG FIX
2111           Fix to Fsdb documentation encoding line.  Addresses test failure in
2112           perl-5.16 and earlier.  (Who knew "encoding" had to be followed by
2113           a blank line.)
2114

WHAT'S NEW

2116   2.50, 2014-05-27 a quick release for spec tweaks
2117       ENHANCEMENT
2118           In dbroweval, the "-N" (no output, even comments) option now
2119           implies "-n", and it now suppresses the header and trailer.
2120
2121       BUG FIX
2122           A few more tweaks to the perl-Fsdb.spec from Petr Šabata.
2123
2124       BUG FIX
2125           Fixed 3 uses of "use v5.10" in test suites that were causing test
2126           failures (due to warnings, not real failures) on some platforms.
2127
2128   2.51, 2014-09-05 Feature enhancements to dbcolmovingstats, dbcolcreate,
2129       dbmapreduce, and new sqlselect_to_db
2130       ENHANCEMENT
2131           dbcolcreate now has a "--no-recreate-fatal" that causes it to
2132           ignore creation of existing columns (instead of failing).
2133
2134       ENHANCEMENT
2135           dbmapreduce once again is robust to reducers that output the key;
2136           "--no-prepend-key" is no longer mandatory.
2137
2138       ENHANCEMENT
2139           dbcolsplittorows can now enumerate the output rows with "-E".
2140
2141       BUG FIX
2142           dbcolmovingstats is more mathematically robust.  Previously for
2143           some inputs and some platforms, floating point rounding could
2144           sometimes cause square roots of negative numbers.
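
               The failure mode is easy to reproduce in general: a variance
               computed as E[x^2] - E[x]^2 can come out slightly negative from
               rounding alone.  The sketch below shows one common guard,
               clamping the variance at zero before the square root.  It is an
               illustration with a hypothetical function name, computing a
               population standard deviation over all input; it is not the
               dbcolmovingstats code and may differ from the fix made there.

```shell
# stddev_sketch: read one number per line, print the population standard
# deviation with six decimal places.
# Assumption: clamping at zero is an acceptable guard against rounding error.
stddev_sketch() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0     # rounding can push a tiny variance below zero
            printf "%.6f\n", sqrt(var)
        }'
}

printf '%s\n' 2 4 4 4 5 5 7 9 | stddev_sketch   # prints 2.000000
```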
2145
2146       NEW sqlselect_to_db converts the output of the MySQL or MariaDB select
2147           command into fsdb format.
2148
2149       INCOMPATIBLE CHANGE
2150           dbfilediff now outputs the second row when doing sloppy numeric
2151           comparisons, to better support test suites.
2152
2153   2.52, 2014-11-03 Fixing the test suite for line number changes.
2154       ENHANCEMENT
2155           Test suites changes to be robust to exact line numbers of failures,
2156           since different Perl releases fail on different lines.
2157           <https://bugzilla.redhat.com/show_bug.cgi?id=1158380>
2158
2159   2.53, 2014-11-26 bug fixes and stability improvements to dbmapreduce
2160       ENHANCEMENT
2161           dbfilediff now supports a "--quiet" option.
2162
2163       ENHANCEMENT
2164           Better documentation of dbpipeline_filter.
2165
2166       BUGFIX
2167           Added groff-base and perl-podlators to the Fedora package spec.
2168           Fixes <https://bugzilla.redhat.com/show_bug.cgi?id=1163149>.  (Also
2169           in package 2.52-2.)
2170
2171       BUGFIX
2172           An important stability improvement to dbmapreduce.  It,
2173           dbmultistats, and dbcolstats now support controlled parallelism
2174           with the "--parallelism=N" option.  They default to running with
2175           the number of available CPUs.  dbmapreduce also moderates its level of
2176           parallelism.  Previously it would create reducers as needed,
2177           causing CPU thrashing if reducers ran much slower than data
2178           production.
2179
2180       BUGFIX
2181           The combination of dbmapreduce with dbrowenumerate now works as it
2182           should.  (The obscure bug was an interaction with dbcolcreate with
2183           non-multi-key reducers that output their own key.  dbmapreduce has
2184           too many useful corner cases.)
2185
2186   2.54, 2014-11-28 fix for the test suite to correct failing tests on not-my-
2187       platform
2188       BUGFIX
2189           Sigh, the test suite now has a test suite.  Because, yes, I broke
2190           it, causing many incorrect failures at cpantesters.  Now fixed.
2191
2192   2.55, 2015-01-05 many spelling fixes and dbcolmovingstats tests are more
2193       robust to different numeric precision
2194       ENHANCEMENT
2195           dbfilediff now can be extra quiet, as I continue to try to track
2196           down a numeric difference on FreeBSD AMD boxes.
2197
2198       ENHANCEMENT
2199           dbcolmovingstats gave different test output (just reflecting
2200           rounding error) when stddev approaches zero.  We now detect and
2201           handle this case.  See
2202           <https://rt.cpan.org/Public/Bug/Display.html?id=101220> and thanks
2203           to H. Merijn Brand for the bug report.
2204
2205       BUG FIX
2206           Many, many spelling bugs found by H. Merijn Brand; thanks for the
2207           bug report.
2208
2209       INCOMPATIBLE CHANGE
2210           A number of programs had misspelled "separator" in
2211           "--fieldseparator" and "--columnseparator" options as "seperator".
2212           These are now correctly spelled.
2213
2214   2.56, 2015-02-03 fix against Getopt::Long-2.43's stricter error checking
2215       BUG FIX
2216           Internal argument parsing uses Getopt::Long, but mixed pass-through
2217           and <>.  Bug reported by Petr Pisar at
2218           <https://bugzilla.redhat.com/show_bug.cgi?id=1188538>.
2219
2220       BUG FIX
2221           Added missing BuildRequires for "XML::Simple".
2222
2223   2.57, 2015-04-29 Minor changes, with better performance from dbmultistats.
2224       BUG FIX
2225           dbfilecat now honors "--remove-inputs" (previously it didn't).
2226           This omission meant that dbmapreduce (and dbmultistats) would
2227           accumulate files in /tmp when running.  Bad news for inputs with 4M
2228           keys.
2229
2230       ENHANCEMENT
2231           dbmultistats should be faster with lots of small keys.  dbcolstats
2232           now supports "-k" to get some of the functionality of dbmultistats
2233           (if data is pre-sorted and median/quartiles are not required).
2239
2240   2.58, 2015-04-30 Bugfix in dbmerge
2241       BUG FIX
2242           Fixed a case where dbmerge suffered mojibake in endgame mode.  This
2243           bug surfaced when dbsort was applied to large files (big enough to
2244           require merging) with unicode in them; the symptom was something
2245           like:
2246             Wide character in print at /usr/lib64/perl5/IO/Handle.pm line
2247           420, <GEN12> line 111.
2248
2249   2.59, 2016-09-01 Collect a few small bug fixes and documentation
2250       improvements.
2251       BUG FIX
2252           More IO is explicitly marked UTF-8 to avoid Perl's tendency to
2253           mojibake on otherwise valid unicode input.  This change helps
2254           html_table_to_db.
2255
2256       ENHANCEMENT
2257           dbcolscorrelate now cross-references dbcolsregression.
2258
2259       ENHANCEMENT
2260           Documentation for dbrowdiff now clarifies that the default is
2261           baseline mode.
2262
2263       BUG FIX
2264           dbjoin now propagates "-T" into the sorting process (if it is
2265           required).  Thanks to Lan Wei for reporting this bug.
2266
2267   2.60, 2016-09-04 Adds support for hash joins.
2268       ENHANCEMENT
2269           dbjoin now supports hash joins with "-t lefthash" and "-t
2270           righthash".  Hash joins cache a table in memory, but do not require
2271           that the other table be sorted.  They are ideal when joining a
2272           large table against a small one.
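
The hash-join idea described above can be sketched as follows.  This is a
minimal illustration in Python, not dbjoin's actual implementation; the
function name hash_join and the sample column "fs" are hypothetical.  It
assumes the smaller table fits in memory and shows why the larger table
does not need to be sorted.

```python
def hash_join(small_rows, large_rows, key):
    """Inner-join two iterables of dicts on `key`, caching only small_rows."""
    # Build an in-memory index of the small table: key value -> list of rows.
    index = {}
    for row in small_rows:
        index.setdefault(row[key], []).append(row)
    # Stream the large table in whatever order it arrives; no sort is needed.
    for row in large_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data; "fs" is an invented column for illustration.
small = [{"experiment": "ufs_mab_sys", "fs": "ufs"}]
large = [
    {"experiment": "ufs_rcp_real", "duration": "264.5"},
    {"experiment": "ufs_mab_sys", "duration": "37.2"},
    {"experiment": "ufs_mab_sys", "duration": "37.3"},
]
joined = list(hash_join(small, large, "experiment"))
```

Memory use grows with the cached table only, which is why this approach
suits joining a large table against a small one.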
2273
2274   2.61, 2016-09-05 Support left and right outer joins.
2275       ENHANCEMENT
2276           dbjoin now handles left and right outer joins with "-t left" and
2277           "-t right".
2278
2279       ENHANCEMENT
2280           dbjoin hash joins are now selected with "-m lefthash" and "-m
2281           righthash" (not the shortlived "-t righthash" option).
2282           (Technically this change is incompatible with Fsdb-2.60, but no one
2283           but me ever used that version.)
2284
2285   2.62, 2016-11-29 A new yaml_to_db and other minor improvements.
2286       ENHANCEMENT
2287           Documentation for xml_to_db now includes sample output.
2288
2289       NEW yaml_to_db converts a specific form of YAML to fsdb.
2290
2291       BUG FIX
2292           The test suite now uses "diff -c -b" rather than "diff -cb" to make
2293           OpenBSD-5.9 happier, I hope.
2294
2295       ENHANCEMENT
2296           Comments that log operations at the end of each file now do simple
2297           quoting of spaces.  (It is not guaranteed to be fully shell-
2298           compliant.)
2299
2300       ENHANCEMENT
2301           There is a new standard option, "--header", allowing one to specify
2302           an Fsdb header for inputs that lack it.  Currently it is supported
2303           by dbcoldefine, dbrowuniq, dbmapreduce, dbmultistats, dbsort,
2304           dbpipeline.
2305
2306       ENHANCEMENT
2307           dbfilepivot now allows the --possible-pivots option, and if it is
2308           provided processes the data in one pass.
2309
2310       ENHANCEMENT
2311           dbroweval logs are now quoted.
2312
2313   2.63, 2017-02-03 Re-add some features supposedly in 2.62 but not, and add
2314       more --header options.
2315       ENHANCEMENT
2316           The option -j is now a synonym for --parallelism.  (And several
2317           documentation bugs about this option are fixed.)
2318
2319       ENHANCEMENT
2320           Additional support for "--header" in dbcolmerge, dbcol, dbrow, and
2321           dbroweval.
2322
2323       BUG FIX
2324           Version 2.62 was supposed to have this improvement, but did not
2325           (and now does): dbfilepivot now allows the --possible-pivots
2326           option, and if it is provided processes the data in one pass.
2327
2328       BUG FIX
2329           Version 2.62 was supposed to have this improvement, but did not
2330           (and now does): dbroweval logs are now quoted.
2331
2332   2.64, 2017-11-20 several small bugfixes and enhancements
2333       BUG FIX
2334           In dbroweval, the "next row" option previously did not correctly
2335           set up "_last_fieldname".  It now does.
2336
2337       ENHANCEMENT
2338           The csv_to_db converter now has an optional "-F x" option to set
2339           the field separator.
2340
2341       ENHANCEMENT
2342           Finally dbcolsplittocols has a "--header" option, and a new "-N"
2343           option to give the list of resulting output columns.
2344
2345       INCOMPATIBLE CHANGE
2346           Now dbcolstats and dbmultistats produce no output (but a schema)
2347           when given no input but a schema.  Previously they gave a null row
2348           of output.  The "--output-on-no-input" and
2349           "--no-output-on-no-input" options can control this behavior.
2350
2351   2.65, 2018-02-16 Minor release, bug fix and -F option.
2352       ENHANCEMENT
2353           dbmultistats and dbmapreduce now both take a "-F x" option to set
2354           the field separator.
2355
2356       BUG FIX
2357           Fixed missing "use Carp" in dbcolstats.  Also went back and cleaned
2358           up all uses of "croak()".  Thanks to Zefram for the bug report.
2359
2360   2.66, 2018-12-20 Critical bug fix in dbjoin.
2361       BUG FIX
2362           Removed old tests from MANIFEST.  (Thanks to Hang Guo for reporting
2363           this bug.)
2364
2365       IMPROVEMENT
2366           Errors for non-existing input files now include the bad filename
2367           (before: "cannot setup filehandle", now: "cannot open input: cannot
2368           open TEST/bad_filename").
2369
2370       BUG FIX
2371           Hash joins with three identical rows were failing with the
2372           assertion failure "internal error: confused about overflow" due to
2373           a now-fixed bug.
2374
2375   2.67, 2019-07-10 add support for reading and writing hdfs
2376       IMPROVEMENT
2377           dbformmail now has an "mh" mechanism that writes messages to
2378           individual files (an mh-style mailbox).
2379
2380       BUG FIX
2381           dbrow failed to include the Carp library, leading to fails on
2382           croak.
2383
2384       BUG FIX
2385           Fixed the dbjoin error message for an unsorted right stream, which
2386           was incorrect (it said left).
2387
2388       IMPROVEMENT
2389           All Fsdb programs can now read from and write to HDFS, when files
2390           that start with "hdfs:" are given to -i and -o options.
2391
2392   2.68, 2019-09-19 All programs now support automatic decompression based on
2393       file extension.
2394       IMPROVEMENT
2395           The omitted-possible-error test case for dbfilepivot now has an
2396           alternative output that I saw on some BSD-running systems (thanks
2397           to CPAN).
2398
2399       IMPROVEMENT
2400           dbmerge and dbmerge2 now support "--header".  dbmerge2 now gives
2401           better error messages when presented the wrong number of inputs.
2402
2403       BUG FIX
2404           dbsort now works with "--header" even when the file is big (due to
2405           fixes to dbmerge).
2406
2407       IMPROVEMENT
2408           csv_to_db now processes data with the "binary" option, allowing it
2409           to handle newlines embedded in quoted fields.
2410
2411       IMPROVEMENT
2412           All programs now transparently decompress input files when the
2413           filename given as an input argument ends with a standard
2414           extension (.gz, .bz2, or .xz).
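
Extension-based decompression of this kind can be sketched as follows.
This is an illustrative Python sketch using only the standard library, not
Fsdb's Perl code; the helper name open_maybe_compressed is hypothetical.

```python
import bz2
import gzip
import lzma

# Map each recognized filename extension to a standard-library opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_maybe_compressed(path):
    """Open `path` for reading as text, decompressing based on its extension."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            return opener(path, "rt", encoding="utf-8")
    # No recognized extension: treat the file as plain text.
    return open(path, "r", encoding="utf-8")
```

The caller reads lines the same way whether or not the input was
compressed, so downstream processing does not need to change.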
2415
2416   2.69, 2019-11-22 a small bugfix in dbcolstats
2417       BUG FIX
2418           Filled in the test case for autodecompress, which was missing
2419           for the 2.68 release.
2420
2421       ENHANCEMENT
2422           The groff program is required for build, and the "Makefile.PL"
2423           fails if groff is missing at build time.  Thanks to Chris Williams
2424           for suggesting this check, and the CPAN auto-building system for
2425           trying many platforms.
2426
2427       BUG FIX
2428           The dbcolstats program had numerical instability that sometimes
2429           results in failing with a square-root of a negative number when
2430           many values varied right at the edge of floating-point precision.
2431           We now detect and report that case as 0 stddev.  Thanks to Hang Guo
2432           for providing a test case.
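
The failure mode and the fix can be illustrated with a short sketch.  This
is Python, not the dbcolstats source, and the function name stddev_clamped
is hypothetical.  It assumes the sum-of-squares shortcut for sample
variance, where rounding can leave a tiny negative value that would make
the square root fail.

```python
import math


def stddev_clamped(values):
    """Sample standard deviation that reports 0 instead of failing on rounding."""
    n = len(values)
    if n < 2:
        return 0.0
    total = sum(values)
    total_sq = sum(v * v for v in values)
    # Sample variance via the sum-of-squares shortcut.
    variance = (total_sq - total * total / n) / (n - 1)
    # Rounding can push the variance slightly below zero when the values
    # differ only at the edge of floating-point precision; report 0 then.
    if variance < 0.0:
        return 0.0
    return math.sqrt(variance)
```

Without the clamp, math.sqrt raises an error on a negative argument, which
is the analogue of the square-root-of-a-negative-number failure described
above.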
2433
2434   2.70, 2020-11-12 Some small quality-of-life enhancements and corner-case
2435       bugfixes.
2436       ENHANCEMENT
2437           dbcol can now take an option "-a" to include all columns, allowing
2438           reordering of certain columns while passing the rest through.
2439
2440       ENHANCEMENT
2441           dbrowuniq and dbmerge now buffer comments in a way that the last
2442           row of data output is no longer in the last block of comments.
2443           (The data is identical, but for humans looking at output, this
2444           change makes it less likely to lose the last row.)
2445
2446       BUG FIX
2447           dbmultistats and dbpipeline documentation now indicates that they
2448           support "--header" (something they have done since version 2.62 of
2449           2016-11-29, but only now documented).
2450
2451       ENHANCEMENT
2452           dbcolcreate now supports "--header".
2453
2454       BUG FIX
2455           Fixed several spelling errors in deprecated programs and removed
2456           information about the no-longer existing FreeBSD and MacOS ports.
2457           Thanks to Calvin Ardi for the patch.
2458
2459       BUG FIX
2460           dbmerge now handles --xargs when only one file is provided (and
2461           passes the file through unchanged).  It also throws a clean error
2462           with --xargs if zero files are provided.  (To support dbmerge,
2463           dbcol now has an internal "--saveoutput" option.)  Thanks to Yuri
2464           Pradkin for reporting the unhandled corner-case.
2465
2466   2.71, 2020-11-16 Fix a race condition breaking test suites.
2467       BUG FIX
2468           Suppressed a race condition in dbcolmerge that was sometimes
2469           throwing the error "Fsdb::Support::Freds: ending, but running
2470           process: dbmerge:xargs" in the dbmerge_0_xargs test case, on exit.
2471
2472   2.72, 2020-12-01 A small bug and a packaging improvement.
2473       BUG FIX
2474           dbcolhisto now handles the degenerate case where everything has the
2475           same value (previously it would throw "illegal division by zero").
2476
2477       ENHANCEMENT
2478           The spec for Fedora now includes "make" as BuildRequires, something
2479           required for Fedora 34.
2480
2481   2.73, 2021-05-18 Updates dbcolpercentile with "--weighted", and with more
2482       ipv6.
2483       ENHANCEMENT
2484           dbcolpercentile now has a "--weighted" option.
2485
2486       ENHANCEMENT
2487           The new Fsdb::Support::IPv6 package includes ipv6_normalize,
2488           ipv6_zeroize to rewrite ipv6 print addresses in IPv6 normal form,
2489           with a 0 in each 4-nybble field.
2490
2491   2.74, 2021-06-23 More ipv6.
2492       ENHANCEMENT
2493           Fsdb::Support::IPv6 package includes ipv6_fullhex to rewrite ipv6
2494           print addresses as full, 128-bit hex values.
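
For readers unfamiliar with the full-hex form, the following Python sketch
shows one way to produce a 128-bit hex string from a printable IPv6
address.  It uses Python's standard ipaddress module rather than
Fsdb::Support::IPv6, and the function name fullhex is hypothetical; the
exact output format of ipv6_fullhex is assumed here to be 32 lowercase hex
digits.

```python
import ipaddress


def fullhex(addr_text):
    """Return an IPv6 address as 32 hex digits (128 bits, zero-padded)."""
    return format(int(ipaddress.IPv6Address(addr_text)), "032x")
```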
2495
2496   2.75, 2022-04-02 New type specifications in the schema to better support
2497       type conversions in python.
2498       ENHANCEMENT
2499           Add optional type specifications to the schema.  Types are not used
2500           in Perl, but are relevant in Python and Go Fsdb bindings.  Types
2501           use a subset of perl pack specifiers: c, s, l, q are signed 8, 16,
2502           32, and 64-bit integers, f is a float, d is double float, a is
2503           utf-8 string, and > and < can force big or little endianness.
2504           The default type for everything is "a", that is, utf-8 strings.
2505           Thanks to Wes Hardaker for pushing to get this long-desired feature
2506           out the door; his Python bindings need types.
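
As a rough illustration of how a binding might use these types, the
following Python sketch maps the type letters listed above to converters.
It is not the code of any actual Fsdb binding; the names converter_for and
convert_row are hypothetical, and how the types are written in the header
line is deliberately left out.

```python
INTEGER_TYPES = set("cslq")   # signed 8-, 16-, 32-, and 64-bit integers
FLOAT_TYPES = set("fd")       # float and double float


def converter_for(type_spec):
    """Return a Python callable that converts a text field of this type."""
    # Endianness markers do not matter for text fields; an empty spec
    # falls back to the default type "a" (utf-8 string).
    base = type_spec.strip("<>") or "a"
    if base in INTEGER_TYPES:
        return int
    if base in FLOAT_TYPES:
        return float
    if base == "a":
        return str
    raise ValueError("unknown type specifier: " + repr(type_spec))


def convert_row(fields, type_specs):
    """Convert one row of text fields according to per-column type specs."""
    return [converter_for(t)(f) for f, t in zip(fields, type_specs)]
```

For example, convert_row(["ufs_mab_sys", "37.2"], ["a", "d"]) keeps the
first field as a string and converts the second to a float.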
2507
2508       ENHANCEMENT
2509           dbcol, dbcolcreate, dbcolcopylast, and dbcolrename now understand
2510           and propagate schema types.  dbsort, dbjoin, dbmerge, dbmerge2 and
2511           dbfilepivot all take a new option "-t" to sort by type-inferred
2512           comparison, if a type is given.
2513
2514       ENHANCEMENT
2515           dbcolstats, dbmultistats, and dbcolmovingstats now include type
2516           information in their output schema.  (They assume input variables
2517           are floats, not integers.)
2518
2519       ENHANCEMENT
2520           Even more IPv6: the functions in Fsdb::Support::IPv6 package now
2521           support strings of hex digits as an alternate encoding for IP
2522           address (and they are already the output of ipv6_fullhex), and
2523           "ip_fullhex_to_normal" converts full hex-encoded IPv4 or IPv6
2524           addresses to their "normal" form (dotted-quad or IPv6 printable
2525           format).
2526

AUTHOR

2528       John Heidemann, "johnh@isi.edu"
2529
2530       See "Contributors" for the many people who have contributed bug reports
2531       and fixes.
2532

COPYRIGHT

2534       Fsdb is Copyright (C) 1991-2022 by John Heidemann <johnh@isi.edu>.
2535
2536       This program is free software; you can redistribute it and/or modify it
2537       under the terms of version 2 of the GNU General Public License as
2538       published by the Free Software Foundation.
2539
2540       This program is distributed in the hope that it will be useful, but
2541       WITHOUT ANY WARRANTY; without even the implied warranty of
2542       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
2543       General Public License for more details.
2544
2545       You should have received a copy of the GNU General Public License along
2546       with this program; if not, write to the Free Software Foundation, Inc.,
2547       675 Mass Ave, Cambridge, MA 02139, USA.
2548
2549       A copy of the GNU General Public License can be found in the file
2550       ``COPYING''.
2551

COMMENTS and BUG REPORTS

2553       Any comments about these programs should be sent to John Heidemann
2554       "johnh@isi.edu".
2555
2556
2557
2558perl v5.34.1                      2022-04-04                           Fsdb(3)