Fsdb(3pm)

Fsdb(3)               User Contributed Perl Documentation              Fsdb(3)

NAME

       Fsdb - a flat-text database for shell scripting

SYNOPSIS

       Fsdb, the flatfile streaming database, is a package of commands for
       manipulating flat-ASCII databases from shell scripts.  Fsdb is useful
       to process medium amounts of data (with very little data you'd do it by
       hand, with megabytes you might want a real database).  Fsdb was known
       as Jdb from 1991 to Oct. 2008.

       Fsdb is very good at doing things like:

       •   extracting measurements from experimental output

       •   examining data to address different hypotheses

       •   joining data from different experiments

       •   eliminating/detecting outliers

       •   computing statistics on data (mean, confidence intervals,
           correlations, histograms)

       •   reformatting data for graphing programs

       Fsdb is built around the idea of a flat text file as a database.  Fsdb
       files (by convention, with the extension .fsdb) have a header
       documenting the schema (what the columns mean), and then each line
       represents a database record (or row).

       For example:

               #fsdb experiment duration
               ufs_mab_sys 37.2
               ufs_mab_sys 37.3
               ufs_rcp_real 264.5
               ufs_rcp_real 277.9

       is a simple file with four experiments (the rows), each with a
       description and a run time in the first and second columns.

       Rather than hand-code scripts to do each special case, Fsdb provides
       higher-level functions.  Although it's often easy to throw together a
       custom script to do any single task, I believe that there are several
       advantages to using Fsdb:

       •   these programs provide a higher-level interface than plain Perl, so

           **  Fewer lines of simpler code:

                   dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration

               Picks out just one type of experiment and computes statistics
               on it, rather than:

                   while (<>) { @F = split; $sum+=$F[1]; $ss+=$F[1]**2; $n++; }
                   $mean = $sum / $n; $std_dev = ...

               in dozens of places.

       •   the library uses names for columns, so

           **  No more $F[1], use "_duration".

           **  New or different order columns?  No changes to your scripts!

           Thus, if your experiment gets more complicated with a size
           parameter and your log changes to:

                   #fsdb experiment size duration
                   ufs_mab_sys 1024 37.2
                   ufs_mab_sys 1024 37.3
                   ufs_rcp_real 1024 264.5
                   ufs_rcp_real 1024 277.9
                   ufs_mab_sys 2048 45.3
                   ufs_mab_sys 2048 44.2

           then the previous scripts still work, even though duration is now
           the third column, not the second.

       •   A series of actions is self-documenting (the provenance of
           processing done to produce each output is recorded in comments).

           **  No more wondering what hacks were used to compute the final
               data; just look at the comments at the end of the output.

           For example, the commands

               dbrow '_experiment eq "ufs_mab_sys"' | dbcolstats duration

           add to the end of the output the lines
               #    | dbrow _experiment eq "ufs_mab_sys"
               #    | dbcolstats duration

       •   The library is mature, supporting large datasets (more than 100GB),
           parallelism, corner cases, and error handling, and it is backed by
           an automated test suite.

           **  No more puzzling about bad output because your custom script
               skimped on error checking.

           **  No more memory thrashing when you try to sort ten million
               records.

           **  Makes use of multiple cores in your computer when it can,
               because each pipeline component runs in parallel, and because
               key tools (dbsort, dbmapreduce) run in parallel when possible.

       •   Fsdb-2.x supports Perl scripting (in addition to shell scripting),
           with libraries to do Fsdb input and output, and easy support for
           pipelines.  The shell script

               dbcol name test1 | dbroweval '_test1 += 5;'

           can be written in Perl as:

               dbpipeline(dbcol(qw(name test1)), dbroweval('_test1 += 5;'));

       (The disadvantage is that you need to learn what functions Fsdb
       provides.)

       Fsdb is built on flat-ASCII databases.  By storing data in simple text
       files and processing it with pipelines it is easy to experiment (in the
       shell) and look at the output.  To the best of my knowledge, the
       original implementation of this idea was "/rdb", a commercial product
       described in the book UNIX relational database management: application
       development in the UNIX environment by Rod Manis, Evan Schaffer, and
       Robert Jorgensen (1988 by Prentice Hall, and also at the web page
       <http://www.rdb.com/>).  Fsdb is an incompatible re-implementation of
       their idea without any accelerated indexing or forms support.  (But
       it's free, and probably has better statistics!)

       Fsdb-2.x will exploit multiple processors or cores, and provides
       Perl-level support for input, output, and threaded pipelines.  (As of
       Fsdb-2.44 it no longer uses Perl threading, just processes, since they
       are faster.)

       Installation instructions are given in the INSTALLATION section below.
       Fsdb-2.x requires Perl 5.8 to run.  All commands have manual pages and
       provide usage with the "--help" option.  All commands are backed by an
       automated test suite.

       The most recent version of Fsdb is available on the web at
       <http://www.isi.edu/~johnh/SOFTWARE/FSDB/index.html>.

WHAT'S NEW

   3.1, 2022-11-22 A post-3.0 cleanup release with minor fixes.
       ENHANCEMENT
           Type specifications in a few more programs that I missed:
           dbrowuniq, dbcolpercentile.

       ENHANCEMENT
           Minor documentation improvements.

README CONTENTS

       executive summary
       what's new
       README CONTENTS
       installation
       basic data format
       basic data manipulation
       list of commands
       another example
       a gradebook example
       a password example
       history
       related work
       release notes
       copyright
       comments

INSTALLATION

       Fsdb now uses the standard Perl build and installation from
       ExtUtils::MakeMaker(3), so the quick answer to installation is to type:

           perl Makefile.PL
           make
           make test
           make install

       Or, if you want to install it somewhere else, change the first line to

           perl Makefile.PL PREFIX=$HOME

       and it will go in your home directory's bin, etc.  (See
       ExtUtils::MakeMaker(3) for more details.)

       Fsdb requires perl 5.8 or later.

       A test suite is available; run it with

           make test

       In the past, ports existed for FreeBSD and MacOS.  If someone
       running one of those OSes wants to contribute a new port, please let me
       know.

BASIC DATA FORMAT

       These programs are based on the idea of storing data in simple ASCII
       files.  A database is a file with one header line and then data or
       comment lines.  For example:

               #fsdb account passwd uid gid fullname homedir shell
               johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
               greg * 2275 134 Greg_Johnson /home/greg /bin/bash
               root * 0 0 Root /root /bin/bash
               # this is a simple database

       The header line must be first and begins with "#fsdb".  There are rows
       (records) and columns (fields), just like in a normal database.
       Comment lines begin with "#".  Column names are any string not
       containing spaces or single quotes (although it is prudent to keep them
       alphanumeric with underscore).

       Columns can optionally include type annotations by following the name
       with :t where t is some type.  (Types are not used in Perl, but are
       relevant in Python and Go Fsdb bindings.)  Types use a subset of Perl
       pack specifiers: c, s, l, q are signed 8, 16, 32, and 64-bit integers,
       f is a float, d is a double float, a is a utf-8 string, and > and <
       can force big or little endianness.
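
       The header is what lets the tools find columns by name.  As a rough
       illustration only (not Fsdb's own code), this POSIX shell and awk
       sketch reads default-format data on standard input, maps the names on
       the "#fsdb" line to field positions while dropping any ":t" type
       suffix, skips comment lines, and prints one named column.  The
       function name fsdb_col is hypothetical, and the sketch assumes
       whitespace-separated fields and a header without options such as -F.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb: print one named column.
# Assumes default whitespace-separated data and a header with no options.
fsdb_col() {
    awk -v want="$1" '
        NR == 1 {
            # Header field i (counting "#fsdb" as field 1) names data field i - 1.
            for (i = 2; i <= NF; i++) {
                name = $i
                sub(/:.*/, "", name)      # drop a type suffix such as ":d"
                if (name == want) col = i - 1
            }
            if (col == "") { print "fsdb_col: no column " want > "/dev/stderr"; exit 1 }
            next
        }
        /^#/ { next }                     # comment lines are not data
        { print $col }
    '
}
```

       For example, "fsdb_col duration" applied to the first example file in
       this document would print 37.2, 37.3, 264.5, and 277.9, one per line.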

       By default, columns are delimited by whitespace.  With this default
       configuration, the contents of a field cannot contain whitespace.
       However, this limitation can be relaxed by changing the field separator
       as described below.

       The big advantage of simple flat-text databases is that it is usually
       easy to massage data into this format, and it's reasonably easy to take
       data out of this format into other (text-based) programs, like gnuplot,
       jgraph, and LaTeX.  Think Unix.  Think pipes.  (Or even output to Excel
       and HTML if you prefer.)

       Since the no-whitespace rule was a problem for some applications,
       there's an option which relaxes it.  You can specify the field
       separator in the table header with "-F x" where "x" is a code for the
       new field separator.  A full list of codes is at dbfilealter(1), but
       two common special values are "-F t", which is a separator of a single
       tab character, and "-F S", a separator of two spaces.  Both allow
       (single) spaces in fields.  An example:

               #fsdb -F S account passwd uid gid fullname homedir shell
               johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
               greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
               root  *  0  0  Root  /root  /bin/bash
               # this is a simple database

       See dbfilealter(1) for more details.  Regardless of what the column
       separator is for the body of the data, it's always whitespace in the
       header.
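
       A minimal sketch of reading "-F S" data, again in POSIX shell and awk
       rather than with Fsdb itself: the body is split on exactly two spaces,
       while the header is split on ordinary whitespace, and option tokens
       such as "-F S" are skipped when numbering columns.  The function name
       fsdb_col_S is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb: print one named column from
# "-F S" data, where two spaces separate fields in the body.
fsdb_col_S() {
    awk -v want="$1" '
        BEGIN { FS = "  " }               # a two-character FS is a regex: two spaces
        NR == 1 {
            # The header is always whitespace-separated.
            n = split($0, h, "[ \t]+")
            pos = 0
            for (i = 2; i <= n; i++) {
                if (h[i] ~ /^-/) { i++; continue }   # skip an option and its value
                pos++
                if (h[i] == want) col = pos
            }
            next
        }
        /^#/ { next }                     # comment lines are not data
        { print $col }
    '
}
```

       Applied to the example above, "fsdb_col_S fullname" would print
       "John Heidemann", "Greg Johnson", and "Root".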

       There's also a third format: a "list".  Because it's often hard to see
       what's in columns past the first two, in list format each "column" is
       on a separate line.  The programs dblistize and dbcolize convert to and
       from this format, and all programs work with any of these formats.  The
       command

           dbfilealter -R C  < DATA/passwd.fsdb

       outputs:

               #fsdb -R C account passwd uid gid fullname homedir shell
               account:  johnh
               passwd:   *
               uid:      2274
               gid:      134
               fullname: John_Heidemann
               homedir:  /home/johnh
               shell:    /bin/bash

               account:  greg
               passwd:   *
               uid:      2275
               gid:      134
               fullname: Greg_Johnson
               homedir:  /home/greg
               shell:    /bin/bash

               account:  root
               passwd:   *
               uid:      0
               gid:      0
               fullname: Root
               homedir:  /root
               shell:    /bin/bash

               # this is a simple database
               #  | dblistize

       See dbfilealter(1) for more details.
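
       To make the list layout concrete, here is a rough POSIX shell and awk
       sketch of the row-to-list direction for default-format input.  It is
       not dbfilealter; the name fsdb_to_list is hypothetical, it assumes a
       header without options, and it pads each "name:" label to one more
       than the length of the longest column name, which reproduces the
       alignment in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb: convert whitespace-separated Fsdb
# rows on stdin to the one-field-per-line list layout ("-R C").
fsdb_to_list() {
    awk '
        NR == 1 {
            width = 0
            for (i = 2; i <= NF; i++) {
                name[i - 1] = $i
                if (length($i) > width) width = length($i)
            }
            ncols = NF - 1
            fmt = "%-" (width + 1) "s %s\n"    # "name:" padded, then the value
            print "#fsdb -R C", substr($0, 7)   # assumes the header starts "#fsdb "
            next
        }
        /^#/ { print; next }                   # pass comments through
        {
            for (i = 1; i <= ncols; i++) printf fmt, name[i] ":", $i
            print ""                           # a blank line ends each record
        }
    '
}
```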

BASIC DATA MANIPULATION

       A number of programs exist to manipulate databases.  Complex functions
       can be made by stringing together commands with shell pipelines.  For
       example, to print the home directories of everyone with "John" in
       their names, you would do:

               cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir

       The output might be:

               #fsdb homedir
               /home/johnh
               /home/greg
               # this is a simple database
               #  | dbrow _fullname =~ /John/
               #  | dbcol homedir

       (Notice that comments are appended to the output listing each command,
       providing an automatic audit log.)
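
       The dbrow and dbcol pipeline above selects rows and then projects one
       column.  A rough awk equivalent, shown only to illustrate what the two
       steps do (it is not how Fsdb is implemented, and it omits the trailing
       log comments), looks like this; fsdb_select is a hypothetical name and
       default whitespace-separated input is assumed.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb.
# usage: fsdb_select MATCHCOL REGEX OUTCOL
# Prints OUTCOL for rows whose MATCHCOL matches REGEX (like dbrow | dbcol).
fsdb_select() {
    awk -v mcol="$1" -v re="$2" -v ocol="$3" '
        NR == 1 {
            for (i = 2; i <= NF; i++) pos[$i] = i - 1   # column name -> field
            print "#fsdb", ocol
            next
        }
        /^#/ { print; next }                 # keep existing comments
        $(pos[mcol]) ~ re { print $(pos[ocol]) }
    '
}
```

       On the passwd example, "fsdb_select fullname John homedir" gives the
       same header, rows, and database comment as the output above.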

       In addition to typical database functions (select, join, etc.) there
       are also a number of statistical functions.

       The real power of Fsdb is that one can apply arbitrary code to rows to
       do powerful things.

               cat DATA/passwd | dbroweval '_fullname =~ s/(\w+)_(\w+)/$2,_$1/'

       converts "John_Heidemann" into "Heidemann,_John".  Not too much more
       work could split fullname into firstname and lastname fields.

       Or, to build a sort column from the full name:

               cat DATA/passwd | dbcolcreate sort | dbroweval -b 'use Fsdb::Support' \
                       '_sort = _fullname; _sort =~ s/_/ /g; _sort = fullname_to_sort(_sort);'
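
       As for splitting fullname into separate first and last name fields,
       one way to sketch it outside Fsdb (plain awk; fsdb_split_fullname is a
       hypothetical name) is to break the value at its first underscore and
       append two new columns.  Rows with no underscore get "-" as the last
       name, assuming "-" serves as the empty value, as in the gradebook
       example.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb: append firstname and lastname
# columns computed from a fullname value such as "John_Heidemann".
fsdb_split_fullname() {
    awk '
        NR == 1 {
            for (i = 2; i <= NF; i++) if ($i == "fullname") f = i - 1
            print $0, "firstname", "lastname"
            next
        }
        /^#/ { print; next }
        {
            first = $f; last = "-"          # "-" stands for an empty value
            if (match($f, /_/)) {           # split at the first underscore
                first = substr($f, 1, RSTART - 1)
                last = substr($f, RSTART + 1)
            }
            print $0, first, last
        }
    '
}
```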

TALKING ABOUT COLUMNS

       An advantage of Fsdb is that you can talk about columns by name
       (symbolically) rather than simply by their positions.  So in the above
       example, "dbcol homedir" pulled out the home directory column, and
       "dbrow '_fullname =~ /John/'" matched against column fullname.

       In general, you can use the name of the column listed on the "#fsdb"
       line to identify it in most programs, and _name to identify it in code.

       Some alternatives for flexibility:

       •   Numeric values identify columns positionally, numbering from 0.  So
           0 or _0 is the first column, 1 is the second, etc.

       •   In code, _last_columnname gets the value from columnname's previous
           row.

       See dbroweval(1) for more details about writing code.
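
       To illustrate the idea behind _last_columnname, here is a rough awk
       analogue (not dbroweval itself; fsdb_diff_prev is a hypothetical
       name) that appends a column holding the difference between each row's
       value and the previous row's value.  The first row has no previous
       value, so the sketch writes "-" there.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb.
# usage: fsdb_diff_prev COLUMN
# Appends "diff": this row's COLUMN minus the previous row's COLUMN.
fsdb_diff_prev() {
    awk -v want="$1" '
        NR == 1 {
            for (i = 2; i <= NF; i++) if ($i == want) c = i - 1
            print $0, "diff"
            next
        }
        /^#/ { print; next }
        {
            d = seen ? $c - last : "-"      # no previous row yet: empty value
            print $0, d
            last = $c                       # remember this value for the next row
            seen = 1
        }
    '
}
```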

LIST OF COMMANDS

       Enough said.  I'll summarize the commands, and then you can experiment.
       For a summary of each command, run it with the argument "--help" (or
       "-?" if you prefer).  Full manual pages can be found by running the
       command with the argument "--man", or running the Unix command "man
       dbcol" or whatever program you want.

   TABLE CREATION
       dbcolcreate
           add columns to a database

       dbcoldefine
           set the column headings for a non-Fsdb file

   TABLE MANIPULATION
       dbcol
           select columns from a table

       dbrow
           select rows from a table

       dbsort
           sort rows based on a set of columns

       dbjoin
           compute the natural join of two tables

       dbcolrename
           rename a column

       dbcolmerge
           merge two columns into one

       dbcolsplittocols
           split one column into two or more columns

       dbcolsplittorows
           split one column into multiple rows

       dbfilepivot
           "pivots" a file, converting multiple rows corresponding to the same
           entity into a single row with multiple columns.

       dbfilevalidate
           check that a db file doesn't have some common errors

   COMPUTATION AND STATISTICS
       dbcolstats
           compute statistics over a column (mean, etc., optionally median)

       dbmultistats
           group rows by some key value, then compute stats (mean, etc.) over
           each group (equivalent to dbmapreduce with dbcolstats as the
           reducer)

       dbmapreduce
           group rows (map) and then apply an arbitrary function to each group
           (reduce)

       dbrvstatdiff
           compare two sample distributions (mean/conf interval/T-test)

       dbcolmovingstats
           compute moving statistics over a column of data

       dbcolstatscores
           compute Z-scores and T-scores over one column of data

       dbcolpercentile
           compute the rank or percentile of a column

       dbcolhisto
           compute histograms over a column of data

       dbcolscorrelate
           compute the coefficient of correlation over several columns

       dbcolsregression
           compute linear regression and correlation for two columns

       dbrowaccumulate
           compute a running sum over a column of data

       dbrowcount
           count the number of rows (a subset of dbstats)

       dbrowdiff
           compute differences in a column between rows of a table

       dbrowenumerate
           number each row

       dbroweval
           run arbitrary Perl code on each row

       dbrowuniq
           count/eliminate identical rows (like Unix uniq(1))

       dbfilediff
           compare fields on rows of a file (something like Unix diff(1))

   OUTPUT CONTROL
       dbcolneaten
           pretty-print columns

       dbfilealter
           convert between column or list format, or change the column
           separator

       dbfilestripcomments
           remove comments from a table

       dbformmail
           generate a script that sends form mail based on each row

   CONVERSIONS
       (These programs convert data into fsdb.  See their web pages for
       details.)

       cgi_to_db
           <http://stein.cshl.org/boulder/>

       combined_log_format_to_db
           <http://httpd.apache.org/docs/2.0/logs.html>

       html_table_to_db
           HTML tables to fsdb (assuming they're reasonably formatted).

       kitrace_to_db
           <http://ficus-www.cs.ucla.edu/ficus-members/geoff/kitrace.html>

       ns_to_db
           <http://mash-www.cs.berkeley.edu/ns/>

       sqlselect_to_db
           the output of SQL SELECT tables to db

       tabdelim_to_db
           spreadsheet tab-delimited files to db

       tcpdump_to_db
           (see man tcpdump(8) on any reasonable system)

       xml_to_db
           XML input to fsdb, assuming they're very regular

       (And out of fsdb:)

       db_to_csv
           Comma-separated-value format from fsdb.

       db_to_html_table
           simple conversion of Fsdb to html tables

   STANDARD OPTIONS
       Many programs have common options:

       -? or --help
           Show basic usage.

       -N or --new-name
           When a command creates a new column like dbrowaccumulate's "accum",
           this option lets one override the default name of that new column.

       -T TmpDir
           Where to put tmp files.  Also uses environment variable TMPDIR, if
           -T is not specified.  Default is /tmp.

       -c FRACTION or --confidence FRACTION
           Specify confidence interval FRACTION (dbcolstats, dbmultistats,
           etc.)

       -C S or "--element-separator S"
           Specify column separator S (dbcolsplittocols, dbcolmerge).

       -d or --debug
           Enable debugging (may be repeated for greater effect in some
           cases).

       -a or --include-non-numeric
           Compute stats over all data (treating non-numbers as zeros).  (By
           default, things that can't be treated as numbers are ignored for
           stats purposes.)

       -S or --pre-sorted
           Assume the data is pre-sorted.  May be repeated to disable
           verification (saving a small amount of work).

       -e E or --empty E
           Give value E as the value for empty (null) records.

       -i I or --input I
           Input data from file I.

       -o O or --output O
           Write data out to file O.

       --header H
           Use H as the full Fsdb header, rather than reading a header from
           the input.  This option is particularly useful when using Fsdb
           under Hadoop, where split files don't have headers.

       --nolog
           Skip logging the program in a trailing comment.

       When giving Perl code (in dbrow and dbroweval), column names can be
       embedded if preceded by underscores.  (Look at dbrow(1) or dbroweval(1)
       for examples.)

       Most programs run in constant memory and use temporary files if
       necessary.  Exceptions are dbcolneaten, dbcolpercentile, dbmapreduce,
       dbmultistats, dbrowsplituniq.

   STANDARD SORTING OPTIONS
       A number of programs do sorting, or depend on defining an ordering of
       rows.  Such programs use these standard sorting options:

       -r or --descending
           sort in reverse order (high to low)

       -R or --ascending
           sort in normal order (low to high)

       -t or --type-inferred-sorting
           sort fields by type (numeric or lexicographic), automatically

       -n or --numeric
           sort numerically

       -N or --lexical
           sort lexicographically
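
       The difference between numeric and lexical ordering is easiest to see
       with the standard Unix sort(1), used here only as an analogy for these
       options (it is not dbsort, and the wrapper names are hypothetical):
       numeric comparison puts 9 before 10, while byte-wise lexical
       comparison puts "10" and "100" before "9".

```shell
#!/bin/sh
# Analogy only, using Unix sort(1) rather than dbsort.
numeric_order() { sort -n; }          # like -n: compare values as numbers
lexical_order() { LC_ALL=C sort; }    # like -N: compare byte by byte

printf '10\n9\n100\n' | numeric_order   # 9, 10, 100
printf '10\n9\n100\n' | lexical_order   # 10, 100, 9
```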

ANOTHER EXAMPLE

       Take the raw data in "DATA/http_bandwidth", put a header on it
       ("dbcoldefine size bw"), take statistics of each category
       ("dbmultistats -k size bw"), pick out the relevant fields ("dbcol size
       mean stddev pct_rsd"), and you get:

               #fsdb size mean stddev pct_rsd
               1024    1.4962e+06      2.8497e+05      19.047
               10240   5.0286e+06      6.0103e+05      11.952
               102400  4.9216e+06      3.0939e+05      6.2863
               #  | dbcoldefine size bw
               #  | /home/johnh/BIN/DB/dbmultistats -k size bw
               #  | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd

       (The whole command was:

               cat DATA/http_bandwidth |
               dbcoldefine size bw |
               dbmultistats -k size bw |
               dbcol size mean stddev pct_rsd

       all on one line.)

       Then post-process them to get rid of the exponential notation by adding
       this to the end of the pipeline:

           dbroweval '_mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev);'

       (Actually, this step is no longer required since dbcolstats now uses a
       different default format.)

       giving:

               #fsdb      size    mean    stddev  pct_rsd
               1024     1496200          284970        19.047
               10240    5028600          601030        11.952
               102400   4921600          309390        6.2863
               #  | dbcoldefine size bw
               #  | dbmultistats -k size bw
               #  | dbcol size mean stddev pct_rsd
               #  | dbroweval   { _mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev); }

       In a few lines, raw data is transformed to processed output.
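
       For intuition about what "dbmultistats -k" computes, here is a
       simplified awk sketch (hypothetical name group_stats; not Fsdb's
       implementation) that groups rows by a key column and reports the mean,
       the sample standard deviation (n - 1 denominator, consistent with the
       gradebook numbers later in this document), and pct_rsd, taken here as
       100 times stddev divided by mean.  It uses the simple sum-of-squares
       formula, assumes default whitespace-separated input and nonzero means,
       and prints %g values rather than Fsdb's formatting.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb.
# usage: group_stats KEYCOL VALUECOL
group_stats() {
    awk -v k="$1" -v v="$2" '
        NR == 1 {
            for (i = 2; i <= NF; i++) {
                if ($i == k) kc = i - 1
                if ($i == v) vc = i - 1
            }
            next
        }
        /^#/ { next }
        {
            key = $kc
            if (!(key in cnt)) order[++nkeys] = key   # keep first-seen order
            cnt[key]++; sum[key] += $vc; sumsq[key] += $vc * $vc
        }
        END {
            print "#fsdb", k, "mean", "stddev", "pct_rsd"
            for (j = 1; j <= nkeys; j++) {
                key = order[j]; n = cnt[key]
                mean = sum[key] / n
                var = n > 1 ? (sumsq[key] - n * mean * mean) / (n - 1) : 0
                if (var < 0) var = 0          # guard against rounding
                sd = sqrt(var)
                printf "%s %g %g %g\n", key, mean, sd, 100 * sd / mean
            }
        }
    '
}
```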

       Suppose you suspect that the results for one data point have an odd
       distribution.  Fsdb can easily produce a CDF (cumulative distribution
       function) of the data, suitable for graphing:

           cat DB/DATA/http_bandwidth | \
               dbcoldefine size bw | \
               dbrow '_size == 102400' | \
               dbcol bw | \
               dbsort -n bw | \
               dbrowenumerate | \
               dbcolpercentile count | \
               dbcol bw percentile | \
               xgraph

       The steps, roughly: (1) get the raw input data and turn it into Fsdb
       format, (2) select the rows for one size, pick out just the relevant
       column (for efficiency), and sort it, (3) for each data point, assign
       a CDF percentage to it, and (4) pick out the two columns to graph and
       show them.
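
       The percentile step can be sketched as an empirical CDF: after a
       numeric sort, the i-th of n values is assigned 100 * i / n percent.
       This is an illustration in sort(1) and awk, not dbcolpercentile, whose
       exact definition may differ; the function name ecdf and the
       one-number-per-line input are assumptions.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb: empirical CDF of numbers on stdin,
# one per line, printed as a two-column Fsdb table.
ecdf() {
    sort -n | awk '
        { v[NR] = $1 }                     # values arrive in ascending order
        END {
            print "#fsdb value percentile"
            for (i = 1; i <= NR; i++) printf "%s %g\n", v[i], 100 * i / NR
        }
    '
}
```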

A GRADEBOOK EXAMPLE

       The first commercial program I wrote was a gradebook, so here's how to
       do it with Fsdb.

       Format your data like DATA/grades.

               #fsdb name email id test1
               a a@ucla.example.edu 1 80
               b b@usc.example.edu 2 70
               c c@isi.example.edu 3 65
               d d@lmu.example.edu 4 90
               e e@caltech.example.edu 5 70
               f f@oxy.example.edu 6 90

       Or if your students have spaces in their names, use "-F S" and two
       spaces to separate each column:

               #fsdb -F S name email id test1
               alfred aho  a@ucla.example.edu  1  80
               butler lampson  b@usc.example.edu  2  70
               david clark  c@isi.example.edu  3  65
               constantine dovrolis  d@lmu.example.edu  4  90
               deborah estrin  e@caltech.example.edu  5  70
               sally floyd  f@oxy.example.edu  6  90

       To compute statistics on an exam, do

               cat DATA/grades | dbstats test1 | dblistize

       giving

               #fsdb -R C  ...
               mean:        77.5
               stddev:      10.84
               pct_rsd:     13.987
               conf_range:  11.377
               conf_low:    66.123
               conf_high:   88.877
               conf_pct:    0.95
               sum:         465
               sum_squared: 36625
               min:         65
               max:         90
               n:           6
               ...
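
       Most of these numbers follow from the count, the sum, and the sum of
       squares.  The awk sketch below (hypothetical name col_stats, not
       dbcolstats) recomputes n, mean, stddev (sample, n - 1 denominator),
       pct_rsd, sum, sum_squared, min, and max for one column, skipping
       values that do not look numeric, which mirrors the default described
       under STANDARD OPTIONS.  It leaves out the confidence interval, which
       needs a Student's t value, and prints %g values, so the number of
       digits differs from the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb.
# usage: col_stats COLUMN   (default-format Fsdb data on stdin)
col_stats() {
    awk -v want="$1" '
        NR == 1 { for (i = 2; i <= NF; i++) if ($i == want) c = i - 1; next }
        /^#/ { next }
        # Ignore values that do not look like numbers.
        $c !~ /^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$/ { next }
        {
            x = $c + 0
            if (n == 0 || x < min) min = x
            if (n == 0 || x > max) max = x
            n++; sum += x; sumsq += x * x
        }
        END {
            if (n == 0) exit 1
            mean = sum / n
            var = n > 1 ? (sumsq - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0          # guard against rounding
            sd = sqrt(var)
            printf "n: %d\nmean: %g\nstddev: %g\npct_rsd: %g\n", n, mean, sd, 100 * sd / mean
            printf "sum: %g\nsum_squared: %g\nmin: %g\nmax: %g\n", sum, sumsq, min, max
        }
    '
}
```

       On DATA/grades as shown above, it reports mean 77.5, stddev 10.8397,
       sum 465, and sum_squared 36625, in line with the dbstats output.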

       To do a histogram:

               cat DATA/grades | dbcolhisto -n 5 -g test1

       giving

               #fsdb low histogram
               65      *
               70      **
               75
               80      *
               85
               90      **
               #  | /home/johnh/BIN/DB/dbhistogram -n 5 -g test1
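
       The bucketing can be sketched with fixed-width bins.  In this
       hypothetical awk helper (histo, not dbcolhisto, which chooses its bins
       from its own options), the caller supplies the lower edge and the bin
       width, each value x goes into bin int((x - low) / width), and each bin
       prints its lower edge and one star per value.  With a lower edge of 65
       and a width of 5 it reproduces the star counts above for the six test1
       scores; it assumes every value is at least low.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb.
# usage: histo LOW WIDTH   (one number per line on stdin, each >= LOW)
histo() {
    awk -v low="$1" -v width="$2" '
        BEGIN { maxb = 0 }
        {
            b = int(($1 - low) / width)    # bin 0 covers [low, low + width)
            count[b]++
            if (b > maxb) maxb = b
        }
        END {
            print "#fsdb low histogram"
            for (b = 0; b <= maxb; b++) {
                stars = ""
                for (k = 0; k < count[b]; k++) stars = stars "*"
                printf "%g\t%s\n", low + b * width, stars
            }
        }
    '
}
```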

       Now you want to send out grades to the students by e-mail.  Create a
       form-letter (in the file test1.txt):

               To: _email (_name)
               From: J. Random Professor <jrp@usc.example.edu>
               Subject: test1 scores

               _name, your score on test1 was _test1.
               86+   A
               75-85 B
               70-74 C
               0-69  F

       Generate the shell script that will send the mail out:

               cat DATA/grades | dbformmail test1.txt > test1.sh

       And run it:

               sh <test1.sh

       The last two steps can be combined:

               cat DATA/grades | dbformmail test1.txt | sh

       but I like to keep a copy of exactly what I send.

       At the end of the semester you'll want to compute grade totals and
       assign letter grades.  Both fall out of dbroweval.  For example, to
       compute weighted total grades with a 40% midterm/60% final where the
       midterm is 84 possible points and the final 100:

               dbcol -rv total |
               dbcolcreate total - |
               dbroweval '
                       _total = .40 * _midterm/84.0 + .60 * _final/100.0;
                       _total = sprintf("%4.2f", _total);
                       if (_final eq "-" || ( _name =~ /^_/)) { _total = "-"; };' |
               dbcolneaten
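
       As a worked check of the weighting formula, a student with 70 of 84
       midterm points and 90 of 100 final points gets .40 * 70/84 + .60 *
       90/100 = 0.333 + 0.540, which rounds to 0.87.  The same rule is
       sketched below in awk rather than dbroweval (hypothetical function
       name weighted_total; default whitespace-separated input with name,
       midterm, and final columns is assumed).

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb: append a weighted "total" column
# (40% midterm out of 84 points, 60% final out of 100 points).
weighted_total() {
    awk '
        NR == 1 {
            for (i = 2; i <= NF; i++) {
                if ($i == "name") nm = i - 1
                if ($i == "midterm") m = i - 1
                if ($i == "final") f = i - 1
            }
            print $0, "total"
            next
        }
        /^#/ { print; next }
        {
            # A missing final ("-") or a name starting with "_" gets no total.
            if ($f == "-" || $nm ~ /^_/) total = "-"
            else total = sprintf("%4.2f", 0.40 * $m / 84.0 + 0.60 * $f / 100.0)
            print $0, total
        }
    '
}
```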

       If you got the data originally from a spreadsheet, save it in
       "tab-delimited" format and convert it with tabdelim_to_db (run
       tabdelim_to_db -? for examples).

A PASSWORD EXAMPLE

       To convert the Unix password file to db:

               cat /etc/passwd | sed 's/:/  /g' | \
                       dbcoldefine -F S login password uid gid gecos home shell \
                       >passwd.fsdb

       To convert the group file:

               cat /etc/group | sed 's/:/  /g' | \
                       dbcoldefine -F S group password gid members \
                       >group.fsdb

       To show the names of the groups that div7-members are in (assuming DIV7
       is in the gecos field):

               cat passwd.fsdb | dbrow '_gecos =~ /DIV7/' | dbcol login gid | \
                       dbjoin -i - -i group.fsdb gid | dbcol login group
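
       The dbjoin step pairs each user with the group that has the same gid.
       A minimal inner-join sketch in awk (hypothetical function name
       join_groups, not dbjoin) loads the group table into an array first
       and then looks up each user's gid.  It assumes both inputs are default
       whitespace-separated Fsdb with gid columns, group names in a group
       column, and login names in a login column.

```shell
#!/bin/sh
# Hypothetical helper, not part of Fsdb.
# usage: join_groups GROUP_FILE < users.fsdb
# Prints "login group" for users whose gid appears in GROUP_FILE (inner join).
join_groups() {
    awk '
        NR == FNR {                       # first input: the group table
            if (FNR == 1) { for (i = 2; i <= NF; i++) g[$i] = i - 1; next }
            if (/^#/) next
            gname[$(g["gid"])] = $(g["group"])
            next
        }
        FNR == 1 {                        # second input: the user table header
            for (i = 2; i <= NF; i++) u[$i] = i - 1
            print "#fsdb login group"
            next
        }
        /^#/ { next }
        ($(u["gid"]) in gname) { print $(u["login"]), gname[$(u["gid"])] }
    ' "$1" -
}
```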

SHORT EXAMPLES

       Which Fsdb programs are the most complicated (based on number of test
       cases)?

               ls TEST/*.cmd | \
                       dbcoldefine test | \
                       dbroweval '_test =~ s@^TEST/([^_]+).*$@$1@' | \
                       dbrowuniq -c | \
                       dbsort -nr count | \
                       dbcolneaten

       (Answer: dbmapreduce, then dbcolstats, dbfilealter and dbjoin.)

       Stats on an exam (in $FILE, where $COLUMN is the name of the exam)?

               dbcolstats -q 4 $COLUMN <$FILE | dblistize | dbstripcomments

               cat $FILE | dbcolhisto -g -n 20 $COLUMN | dbcolneaten | dbstripcomments

       Merging the hw1 column from file hw1.fsdb into grades.fsdb, assuming
       there's a common student id in column "id":

               dbcol id hw1 <hw1.fsdb >t.fsdb

               dbjoin -a -e - grades.fsdb t.fsdb id | \
                   dbsort name | \
                   dbcolneaten >new_grades.fsdb

       Merging two fsdb files with the same columns:

               cat file1.fsdb file2.fsdb >output.fsdb

       or if you want to clean things up a bit

               cat file1.fsdb file2.fsdb | dbstripextraheaders >output.fsdb

       or if you want to know where the data came from

               for i in 1 2
               do
                       dbcolcreate source $i < file$i.fsdb
               done >output.fsdb

       (This assumes you're using a Bourne-compatible shell, not csh.)

WARNINGS

817       As with any tool, one should (which means must) understand the limits
818       of the tool.
819
820       All Fsdb tools should run in constant memory.  In some cases (such as
821       dbcolstats with quartiles, where the whole input must be re-read),
822       programs will spool data to disk if necessary.
823
824       Most tools buffer one or a few lines of data, so memory will scale with
825       the size of each line.  (So lines with many columns, or when columns
826       have lots data, may cause large memory consumption.)
827
828       All Fsdb tools should run in constant or at worst "n log n" time.
829
830       All Fsdb tools use normal Perl math routines for computation.  I make
831       every attempt to choose numerically stable algorithms (and I welcome
832       feedback and suggestions for improvement), but normal rounding due to
833       computer floating-point approximations can still result in
834       inaccuracies when data spans a large range of precision.  (See, for
835       example, the dbcolstats_extrema test cases.)
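
       The rounding problem is easy to reproduce outside Fsdb.  The sketch
       below is only an illustration and is not Fsdb's code: it uses awk,
       which (like Perl on most platforms) computes in double-precision
       floating point, and the function names naive_var and welford_var are
       hypothetical.  Both read one number per line and print the sample
       variance.  naive_var uses the one-pass sum-of-squares formula, which
       subtracts two huge, nearly equal numbers when the data have a large
       offset; welford_var uses Welford's running-mean update, a standard
       numerically stable alternative.

```shell
#!/bin/sh
# Illustration only (not Fsdb code): two ways to compute sample variance.
# Both hypothetical functions read one number per line on stdin.

# One-pass textbook formula: (sum(x^2) - sum(x)^2/n) / (n-1).
# Unstable: with a large offset, both terms are huge and nearly equal.
naive_var() {
    awk '{ n++; s += $1; ss += $1 * $1 }
         END { if (n > 1) printf "%.6f\n", (ss - s * s / n) / (n - 1) }'
}

# Welford's update: track the running mean and the sum of squared
# deviations (m2), so no large, nearly equal terms are subtracted.
welford_var() {
    awk '{ n++; d = $1 - mean; mean += d / n; m2 += d * ($1 - mean) }
         END { if (n > 1) printf "%.6f\n", m2 / (n - 1) }'
}

# The values 4, 7, 13, 16 have sample variance 30; adding 1e9 to each
# value should not change that.
printf '%s\n' 1000000004 1000000007 1000000013 1000000016 | welford_var   # 30.000000
printf '%s\n' 1000000004 1000000007 1000000013 1000000016 | naive_var     # far from 30
```

       On typical systems the last command prints a value far from 30, even
       though the data differ from 4, 7, 13, 16 only by a constant offset.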
836
837       Any requirements and limitations of each Fsdb tool are documented on its
838       manual page.
839
840       If any Fsdb program violates these assumptions, that is a bug that
841       should be documented on the tool's manual page or ideally fixed.
842
843       Fsdb does depend on Perl's correctness, and Perl (and Fsdb) have some
844       bugs.  Fsdb should work on perl from version 5.10 onward.
845

HISTORY

847       There have been four major versions of Fsdb: fsdb-0.x was begun in 1991
848       for my personal use.  Fsdb 1.0 is a complete re-write of the pre-1995
849       versions, and was distributed from 1995 to 2007.  Fsdb 2.0 is a
850       significant re-write of the 1.x versions to systematically use a
851       library and threads (although threads were abandoned in 2.44).  Fsdb
852       3.0 in 2022 adds type specifiers to the schema, mostly to support use
853       in languages with stronger typing (like Python, Go, and C).
854
855       Fsdb (in its various forms) has been used extensively by its author
856       since 1991.  Since 1995 it's been used by two other researchers at UCLA
857       and several at ISI.  In February 1998 it was announced to the Internet.
858       Since then it has found a few users, some outside where I work.
859
860       Major changes:
861
862       1.0 1997-07-22: first public release.
863       2.0 2008-01-25: rewrite to use a common library, and starting to use
864       threads.
865       2.12 2008-10-16: completion of the rewrite, and first RPM package.
866       2.44 2013-10-02: abandoning threads for improved performance
867       3.0 2022-04-04: adding type specifiers to the schema
868
869   Fsdb 2.0 Rationale
870       I've thought about fsdb-2.0 for many years, but it was started in
871       earnest in 2007.  Fsdb-2.0 has the following goals:
872
873       in-one-process processing
874           While fsdb is great on the Unix command line as a pipeline between
875           programs, it should also be possible to set it up to run in a
876           single process.  And if it does so, it should be able to avoid
877           serializing and deserializing (converting to and from text) data
878           between each module.  (Accomplished in fsdb-2.0: see dbpipeline,
879           although it still needs tuning.)
880
881       clean IO API
882           Fsdb's roots go back to perl4 and 1991, so the fsdb-1.x library is
883           very, very crufty.  More than just being ugly (but it was that
884           too), this made reading a file in one format and writing it in
885           another the application's job, when it should be the library's.
886           (Accomplished in fsdb-1.15 and improved in 2.0: see Fsdb::IO.)
887
888       normalized module APIs
889           Because fsdb modules were added as needed over 10 years, sometimes
890           the module APIs became inconsistent.  (For example, the 1.x
891           "dbcolcreate" required an empty value following the name of the new
892           column, but other programs specify empty values with the "-e"
893           argument.)  We should smooth over these inconsistencies.
894           (Accomplished as each module was ported in 2.0 through 2.7.)
895
896       everyone handles all input formats
897           Given a clean IO API, the distinction between "colized" and
898           "listized" fsdb files should go away.  Any program should be able
899           to read and write files in any format.  (Accomplished in fsdb-2.1.)
900
901       Fsdb-2.0 preserves backwards compatibility where possible, but breaks
902       it where necessary to accomplish the above goals.  In August 2008,
903       Fsdb-2.7 was declared preferred over the 1.x versions.  Benchmarking in
904       2013 showed that threading performed much worse than just using pipes,
905       so Fsdb-2.44 uses threading "style", but implemented with processes
906       (via my "Freds" library).
907
908   Contributors
909       Fsdb includes code ported from Geoff Kuenning
910       ("Fsdb::Support::TDistribution").
911
912       Fsdb contributors: Ashvin Goel goel@cse.oge.edu, Geoff Kuenning
913       geoff@fmg.cs.ucla.edu, Vikram Visweswariah visweswa@isi.edu, Kannan
914       Varadahan kannan@isi.edu, Lars Eggert larse@isi.edu, Arkadi Gelfond
915       arkadig@dyna.com, David Graff graff@ldc.upenn.edu, Haobo Yu
916       haoboy@packetdesign.com, Pavlin Radoslavov pavlin@catarina.usc.edu,
917       Graham Phillips, Yuri Pradkin, Alefiya Hussain, Ya Xu, Michael
918       Schwendt, Fabio Silva fabio@isi.edu, Jerry Zhao zhaoy@isi.edu, Ning Xu
919       nxu@aludra.usc.edu, Martin Lukac mlukac@lecs.cs.ucla.edu, Xue Cai,
920       Michael McQuaid, Christopher Meng, Calvin Ardi, H. Merijn Brand, Lan
921       Wei, Hang Guo, Wes Hardaker.
922
923       Fsdb includes datasets contributed from NIST (DATA/nist_zarr13.fsdb),
924       from
925       <http://www.itl.nist.gov/div898/handbook/eda/section4/eda4281.htm>, the
926       NIST/SEMATECH e-Handbook of Statistical Methods, section 1.4.2.8.1.
927       Background and Data.  The source is public domain, and reproduced with
928       permission.
929

RELATED WORK

931       As stated in the introduction, Fsdb is an incompatible reimplementation
932       of the ideas found in "/rdb".  By storing data in simple text files and
933       processing it with pipelines, it is easy to experiment (in the shell)
934       and look at the output.  The original implementation of this idea was
935       /rdb, a commercial product described in the book UNIX relational
936       database management: application development in the UNIX environment by
937       Rod Manis, Evan Schaffer, and Robert Jorgensen (and also at the web
938       page <http://www.rdb.com/>).
939
940       While Fsdb is inspired by Rdb, it includes no code from it, and Fsdb
941       makes several different design choices.  In particular, rdb attempts to
942       be closer to a "real" database, with provision for locking and file
943       indexing.  Fsdb focuses on single-user use and so eschews these
944       choices.  Rdb also has some support for interactive editing.  Fsdb
945       leaves editing to text editors like emacs or vi.
946
947       In August 2002 I found out that Carlo Strozzi had extended RDB with his package
948       NoSQL <http://www.linux.it/~carlos/nosql/>.  According to Mr. Strozzi,
949       he implemented NoSQL in awk to avoid the Perl start-up of RDB.
950       Although I haven't found Perl startup overhead to be a big problem on
951       my platforms (from old Sparcstation IPCs to 2GHz Pentium-4s), you may
952       want to evaluate his system.  The Linux Journal has a description of
953       NoSQL at <http://www.linuxjournal.com/article/3294>.  It seems quite
954       similar to Fsdb.  Like /rdb, NoSQL supports indexing (not present in
955       Fsdb).  Fsdb appears to have richer support for statistics, and, as of
956       Fsdb-2.x, its support for Perl threading (abandoned in 2.44 in favor of
957       processes) was meant to give faster performance (one process, less serialization and deserialization).
958

RELEASE NOTES

960       Versions prior to 1.0 were released informally on my web page but were
961       not announced.
962
963   0.0 1991
964       started for my own research use
965
966   0.1 26-May-94
967       first check-in to RCS
968
969   0.2 15-Mar-95
970       parts now require perl5
971
972   1.0, 22-Jul-97
973       adds autoconf support and a test script.
974
975   1.1, 20-Jan-98
976       support for double space field separators, better tests
977
978   1.2, 11-Feb-98
979       minor changes and release on comp.lang.perl.announce
980
981   1.3, 17-Mar-98
982       •   adds median and quartile options to dbstats
983
984       •   adds dmalloc_to_db converter
985
986       •   fixes some warnings
987
988       •   dbjoin now can run on unsorted input
989
990       •   fixes a dbjoin bug
991
992       •   some more tests in the test suite
993
994   1.4, 27-Mar-98
995       •   improves error messages (all should now report the program that
996           makes the error)
997
998       •   fixed a bug in dbstats output when the mean is zero
999
1000   1.5, 25-Jun-98
1001       BUG FIX dbcolhisto, dbcolpercentile now handle non-numeric values like
1002       dbstats
1003       NEW dbcolstats computes zscores and tscores over a column
1004       NEW dbcolscorrelate computes correlation coefficients between two
1005       columns
1006       INTERNAL ficus_getopt.pl has been replaced by DbGetopt.pm
1007       BUG FIX all tests are now ``portable'' (previously some tests ran only
1008       on my system)
1009       BUG FIX you no longer need to have the db programs in your path (fix
1010       arose from a discussion with Arkadi Gelfond)
1011       BUG FIX installation no longer uses cp -f (to work on SunOS 4)
1012
1013   1.6, 24-May-99
1014       NEW dbsort, dbstats, dbmultistats now run in constant memory (using tmp
1015       files if necessary)
1016       NEW dbcolmovingstats does moving means over a series of data
1017       NEW dbcol has a -v option to get all columns except those listed
1018       NEW dbmultistats does quartiles and medians
1019       NEW dbstripextraheaders now also cleans up bogus comments before the
1020       first header
1021       BUG FIX dbcolneaten works better with double-space-separated data
1022
1023   1.7,  5-Jan-00
1024       NEW dbcolize now detects and rejects lines that contain embedded copies
1025       of the field separator
1026       NEW configure tries harder to prevent people from improperly
1027       configuring/installing fsdb
1028       NEW tcpdump_to_db converter (incomplete)
1029       NEW tabdelim_to_db converter:  from spreadsheet tab-delimited files to
1030       db
1031       NEW mailing lists for fsdb are "fsdb-announce@heidemann.la.ca.us"
1032       and "fsdb-talk@heidemann.la.ca.us"
1033           To subscribe to either, send mail
1034           to "fsdb-announce-request@heidemann.la.ca.us" or
1035           "fsdb-talk-request@heidemann.la.ca.us" with "subscribe" in the
1036           BODY of the message.
1037
1038       BUG FIX dbjoin used to produce incorrect output if there were extra,
1039       unmatched values in the 2nd table. Thanks to Graham Phillips for
1040       providing a test case.
1041       BUG FIX the sample commands in the usage strings now all should
1042       explicitly include the source of data (typically from "cat foo.fsdb
1043       |").  Thanks to Ya Xu for pointing out this documentation deficiency.
1044       BUG FIX (DOCUMENTATION) dbcolmovingstats had incorrect sample output.
1045
1046   1.8, 28-Jun-00
1047       BUG FIX header options are now preserved when writing with dblistize
1048       NEW dbrowuniq now optionally checks for uniqueness only on certain
1049       fields
1050       NEW dbrowsplituniq makes one pass through a file and splits it into
1051       separate files based on the given fields
1052       NEW converter for "crl" format network traces
1053       NEW anywhere you use arbitrary code (like dbroweval), _last_foo now
1054       maps to the last row's value for field _foo.
1055       OPTIMIZATION comment processing slightly changed so that dbmultistats
1056       now is much faster on files with lots of comments (for example, ~100k
1057       lines of comments and 700 lines of data!) (Thanks to Graham Phillips
1058       for pointing out this performance problem.)
1059       BUG FIX dbstats with median/quartiles now correctly handles singleton
1060       data points.
1061
1062   1.9,  6-Nov-00
1063       NEW dbfilesplit, split a single input file into multiple output files
1064       (based on code contributed by Pavlin Radoslavov).
1065       BUG FIX dbsort now works with perl-5.6
1066
1067   1.10, 10-Apr-01
1068       BUG FIX dbstats now handles the case where there are more n-tiles than
1069       data
1070       NEW dbstats now includes a -S option to optimize work on pre-sorted
1071       data (inspired by code contributed by Haobo Yu)
1072       BUG FIX dbsort now has a better estimate of memory usage when run on
1073       data with very short records (problem detected by Haobo Yu)
1074       BUG FIX cleanup of temporary files is slightly better
1075
1076   1.11,  2-Nov-01
1077       BUG FIX dbcolneaten now runs in constant memory
1078       NEW dbcolneaten now supports "field specifiers" that allow some control
1079       over how wide columns should be
1080       OPTIMIZATION dbsort now tries hard to be filesystem cache-friendly
1081       (inspired by "Information and Control in Gray-box Systems" by the
1082       Arpaci-Dusseau's at SOSP 2001)
1083       INTERNAL t_distr now ported to perl5 module DbTDistr
1084
1085   1.12,  30-Oct-02
1086       BUG FIX dbmultistats documentation typo fixed
1087       NEW dbcolmultiscale
1088       NEW dbcol has -r option for "relaxed error checking"
1089       NEW dbcolneaten has new -e option to strip end-of-line spaces
1090       NEW dbrow finally has a -v option to negate the test
1091       BUG FIX math bug in dbcoldiff fixed by Ashvin Goel (need to check
1092       Scheaffer test cases)
1093       BUG FIX some patches to run with Perl 5.8. Note: some programs
1094       (dbcolmultiscale, dbmultistats, dbrowsplituniq) generate warnings like:
1095       "Use of uninitialized value in concatenation (.)" or "string at
1096       /usr/lib/perl5/5.8.0/FileCache.pm line 98, <STDIN> line 2". Please
1097       ignore this until I figure out how to suppress it. (Thanks to Jerry
1098       Zhao for noticing perl-5.8 problems.)
1099       BUG FIX fixed an autoconf problem where configure would fail to find a
1100       reasonable prefix (thanks to Fabio Silva for reporting the problem)
1101       NEW db_to_html_table: simple conversion to html tables (NO fancy stuff)
1102       NEW dblib now has a function dblib_text2html() that will do simple
1103       conversion of iso-8859-1 to HTML
1104
1105   1.13,  4-Feb-04
1106       NEW fsdb added to the freebsd ports tree
1107       <http://www.freshports.org/databases/fsdb/>.  Maintainer:
1108       "larse@isi.edu"
1109       BUG FIX properly handle trailing spaces when data must be numeric (ex.
1110       dbstats with -FS, see test dbstats_trailing_spaces). Fix from Ning Xu
1111       "nxu@aludra.usc.edu".
1112       NEW dbcolize error message improved (bug report from Terrence Brannon),
1113       and list format documented in the README.
1114       NEW cgi_to_db converts CGI.pm-format storage to fsdb list format
1115       BUG FIX handle numeric synonyms for column names in dbcol properly
1116       ENHANCEMENT "talking about columns" section added to README. Lack of
1117       documentation pointed out by Lars Eggert.
1118       CHANGE dbformmail now defaults to using Mail ("Berkeley Mail") to send
1119       mail, rather than sendmail (sendmail is still an option, but mail
1120       doesn't require running as root)
1121       NEW on platforms that support it (i.e., with perl 5.8), fsdb works fine
1122       with unicode
1123       NEW dbfilevalidate: check a db file for some common errors
1124
1125   1.14,  24-Aug-06
1126       ENHANCEMENT README cleanup
1127       INCOMPATIBLE CHANGE dbcolsplit renamed dbcolsplittocols
1128       NEW dbcolsplittorows  split one column into multiple rows
1129       NEW dbcolsregression compute linear regression and correlation for two
1130       columns
1131       ENHANCEMENT csv_to_db: better error handling, normalize field names,
1132       skip blank lines
1133       ENHANCEMENT dbjoin now detects (and fails) if non-joined files have
1134       duplicate names
1135       BUG FIX minor bug fixed in calculation of Student t-distributions
1136       (doesn't change any test output, but may have caused small errors)
1137
1138   1.15, 12-Nov-07
1139       NEW fsdb-1.14 added to the MacOS Fink system
1140       <http://pdb.finkproject.org/pdb/package.php/fsdb>. (Thanks to Lars
1141       Eggert for maintaining this port.)
1142       NEW Fsdb::IO::Reader and Fsdb::IO::Writer now provide reasonably clean
1143       OO I/O interfaces to Fsdb files.  Highly recommended if you use fsdb
1144       directly from perl.  In the fullness of time I expect to reimplement
1145       the entire thing using these APIs to replace the current dblib.pl which
1146       is still hobbled by its roots in perl4.
1147       NEW dbmapreduce now implements a Google-style map/reduce abstraction,
1148       generalizing dbmultistats.
1149       ENHANCEMENT fsdb now uses the Perl build system (Makefile.PL, etc.),
1150       instead of autoconf.  This change paves the way to better perl-5-style
1151       modularization, proper manual pages, input of both listize and colize
1152       format for every program, and world peace.
1153       ENHANCEMENT dblib.pl is now moved to Fsdb::Old.pm.
1154       BUG FIX dbmultistats now propagates its format argument (-f). Bug and
1155       fix from Martin Lukac (thanks!).
1156       ENHANCEMENT dbformmail documentation now is clearer that it doesn't
1157       send the mail, you have to run the shell script it writes.  (Problem
1158       observed by Unkyu Park.)
1159       ENHANCEMENT adapted to autoconf-2.61 (and then these changes were
1160       discarded in favor of The Perl Way).
1161       BUG FIX dbmultistats memory usage corrected (O(# tags), not O(1))
1162       ENHANCEMENT dbmultistats can now optionally run with pre-grouped input
1163       in O(1) memory
1164       ENHANCEMENT dbroweval -N was finally implemented (eat comments)
1165
1166   2.0, 25-Jan-08
1167       2.0, 25-Jan-08 --- a quiet 2.0 release (gearing up towards complete)
1168
1169       ENHANCEMENT: shifting old programs to Perl modules, with the front-end
1170       program as just a wrapper. In the short-term, this change just means
1171       programs have real man pages. In the long run, it will mean that one
1172       can run a pipeline in a single Perl program. So far: dbcol, dbroweval,
1173       the new dbrowcount, dbsort, the new dbmerge, the old "dbstats" (renamed
1174       dbcolstats), dbcolrename, and dbcolcreate.
1175       NEW: Fsdb::Filter::dbpipeline is an internal-only module that lets one
1176       use fsdb commands from within perl (via threads).
1177           It also provides perl function aliases for the internal modules, so
1178           a string of fsdb commands in perl is nearly as terse as in the
1179           shell:
1180
1181               use Fsdb::Filter::dbpipeline qw(:all);
1182               dbpipeline(
1183                   dbcol(qw(name test1)),
1184                   dbroweval('_test1 += 5;')
1185               );
1186
1187       INCOMPATIBLE CHANGE: The old dbcolstats has been renamed
1188       dbcolstatscores. The new dbcolstats does the same thing as the old
1189       dbstats. This incompatibility is unfortunate but normalizes program
1190       names.
1191       CHANGE: The new dbcolstats program always outputs "-" (the default
1192       empty value) for statistics it cannot compute (for example, standard
1193       deviation if there is only one row), instead of the old mix of "-" and
1194       "na".
1195       INCOMPATIBLE CHANGE: The old dbcolstats program, now called
1196       dbcolstatscores, also has different arguments.  The "-t mean,stddev"
1197       option is now "--tmean mean --tstddev stddev".  See dbcolstatscores for
1198       details.
1199       INCOMPATIBLE CHANGE: dbcolcreate now assumes all new columns get the
1200       default value rather than requiring each column to have an initial
1201       constant value. To change the initial value, use the new "-e" option.
1202       NEW: dbrowcount counts rows, an almost-subset of dbcolstats's "n"
1203       output (except without differentiating numeric/non-numeric input), or
1204       the equivalent of "dbstripcomments | wc -l".
1205       NEW: dbmerge merges two sorted files. This functionality was previously
1206       embedded in dbsort.
1207       INCOMPATIBLE CHANGE: dbjoin's "-i" option to include non-matches is now
1208       renamed "-a", so as to not conflict with the new standard option "-i"
1209       for input file.
1210
1211   2.1,  6-Apr-08
1212       2.1,  6-Apr-08 --- another alpha 2.0, but now all converted programs
1213       understand both listize and colize format
1214
1215       ENHANCEMENT: shifting more old programs to Perl modules. New in 2.1:
1216       dbcolneaten, dbcoldefine, dbcolhisto, dblistize, dbcolize, dbrecolize
1217       ENHANCEMENT dbmerge now handles an arbitrary number of input files, not
1218       just exactly two.
1219       NEW dbmerge2 is an internal routine that handles merging exactly two
1220       files.
1221       INCOMPATIBLE CHANGE dbjoin now specifies inputs like dbmerge2, rather
1222       than assuming the first two arguments were tables (as in fsdb-1).
1223           The old dbjoin argument "-i" is now "-a" or "--type=outer".
1224
1225           A minor change: comments in the source files for dbjoin are now
1226           intermixed with output rather than being delayed until the end.
1227
1228       ENHANCEMENT dbsort no longer produces warnings when null values are
1229       passed to numeric comparisons.
1230       BUG FIX dbroweval now once again works with code that lacks a trailing
1231       semicolon. (This fixes a regression from 1.15.)
1232       INCOMPATIBLE CHANGE dbcolneaten's old "-e" option (to avoid end-of-line
1233       spaces) is now "-E" to avoid conflicts with the standard empty field
1234       argument.
1235       INCOMPATIBLE CHANGE dbcolhisto's old "-e" option is now "-E" to avoid
1236       conflicts. And its "-n", "-s", and "-w" are now "-N", "-S", and "-W" to
1237       correspond.
1238       NEW dbfilealter replaces dbrecolize, dblistize, and dbcolize, but with
1239       different options.
1240       ENHANCEMENT The library routines "Fsdb::IO" now understand both list-
1241       format and column-format data, so all converted programs can now
1242       automatically read either format.  This capability was one of the
1243       milestone goals for 2.0, so yea!
1244
1245   2.2, 23-May-08
1246       Release 2.2 is another 2.x alpha release.  Now most of the commands are
1247       ported, but a few remain, and I plan one last incompatible change (to
1248       the file header) before 2.x final.
1249
1250       ENHANCEMENT
1251           shifting more old programs to Perl modules.  New in 2.2:
1252           dbrowaccumulate, dbformmail, dbcolmovingstats, dbrowuniq,
1253           dbrowdiff, dbcolmerge, dbcolsplittocols, dbcolsplittorows,
1254           dbmapreduce, dbmultistats, and dbrvstatdiff.  Also dbrowenumerate
1255           exists only as a front-end (command-line) program.
1256
1257       INCOMPATIBLE CHANGE
1258           The following programs have been dropped from fsdb-2.x:
1259           dbcoltighten, dbfilesplit, dbstripextraheaders,
1260           dbstripleadingspace.
1261
1262       NEW combined_log_format_to_db to convert Apache logfiles
1263
1264       INCOMPATIBLE CHANGE
1265           Options to dbrowdiff are now -B and -I, not -a and -i.
1266
1267       INCOMPATIBLE CHANGE
1268           dbstripcomments is now dbfilestripcomments.
1269
1270       BUG FIXES
1271           dbcolneaten better handles empty columns; dbcolhisto warning
1272           suppressed (actually a bug in high-bucket handling).
1273
1274       INCOMPATIBLE CHANGE
1275           dbmultistats now requires a "-k" option in front of the key (tag)
1276           field, or if none is given, it will group by the first field (both
1277           like dbmapreduce).
1278
1279       KNOWN BUG
1280           dbmultistats with quantile option doesn't work currently.
1281
1282       INCOMPATIBLE CHANGE
1283           dbcoldiff is renamed dbrvstatdiff.
1284
1285       BUG FIXES
1286           dbformmail was leaving its log message as a  command, not a
1287           comment.  Oops.  No longer.
1288
1289   2.3, 27-May-08 (alpha)
1290       Another alpha release, this one just to fix the critical dbjoin bug
1291       listed below (that happens to have blocked my MP3 jukebox :-).
1292
1293       BUG FIX
1294           Dbsort no longer hangs if given an input file with no rows.
1295
1296       BUG FIX
1297           Dbjoin now works with unsorted input coming from a pipeline (like
1298           stdin).  Perl-5.8.8 has a bug (?) that was making this case
1299           fail---opening stdin in one thread, reading some, then reading more
1300           in a different thread caused an lseek which works on files, but
1301           fails on pipes like stdin.  Go figure.
1302
1303       BUG FIX / KNOWN BUG
1304           The dbjoin fix also fixed dbmultistats -q (it now gives the right
1305           answer).  However, a new bug appeared, with messages like:
1306               Attempt to free unreferenced scalar: SV 0xa9dd0c4, Perl
1307               interpreter: 0xa8350b8 during global destruction.
1308           So the dbmultistats_quartile test is still disabled.
1309
1310   2.4, 18-Jun-08
1311       Another alpha release, mostly to fix minor usability problems in
1312       dbmapreduce and client functions.
1313
1314       ENHANCEMENT
1315           dbrow now defaults to running user-supplied code without warnings
1316           (as with fsdb-1.x).  Use "--warnings" or "-w" to turn them back on.
1317
1318       ENHANCEMENT
1319           dbroweval can now write different format output than the input,
1320           using the "-m" option.
1321
1322       KNOWN BUG
1323           dbmapreduce emits warnings on perl 5.10.0 about "Unbalanced string
1324           table refcount" and "Scalars leaked" when run with an external
1325           program as a reducer.
1326
1327           dbmultistats emits the warning "Attempt to free unreferenced
1328           scalar" when run with quartiles.
1329
1330           In each case the output is correct.  I believe these can be
1331           ignored.
1332
1333       CHANGE
1334           dbmapreduce no longer logs a line for each reducer that is invoked.
1335
1336   2.5, 24-Jun-08
1337       Another alpha release, fixing more minor bugs in "dbmapreduce" and
1338       lossage in "Fsdb::IO".
1339
1340       ENHANCEMENT
1341           dbmapreduce can now tolerate non-map-aware reducers that pass back
1342           the key column in put.  It also passes the current key as the last
1343           argument to external reducers.
1344
1345       BUG FIX
1346           Fsdb::IO::Reader now correctly handles the "-header" option again.  (Broken
1347           since fsdb-2.3.)
1348
1349   2.6, 11-Jul-08
1350       Another alpha release, needed to fix DaGronk.  One new port, small bug
1351       fixes, and important fix to dbmapreduce.
1352
1353       ENHANCEMENT
1354           shifting more old programs to Perl modules.  New in 2.6:
1355           dbcolpercentile.
1356
1357       INCOMPATIBLE CHANGE and ENHANCEMENTS dbcolpercentile arguments changed,
1358       use "--rank" to require ranking instead of "-r". Also, "--ascending"
1359       and "--descending" can now be specified separately, both for
1360       "--percentile" and "--rank".
1361       BUG FIX
1362           Sigh, the sense of the --warnings option in dbrow was inverted.  No
1363           longer.
1364
1365       BUG FIX
1366           I found and fixed the string leaks (errors like "Unbalanced string
1367           table refcount" and "Scalars leaked") in dbmapreduce and
1368           dbmultistats.  (All "IO::Handle"s in threads must be manually
1369           destroyed.)
1370
1371       BUG FIX
1372           The "-C" option to specify the column separator in dbcolsplittorows
1373           now works again (broken since it was ported).
1374
1375   2.7, 30-Jul-08 beta
1376
1377       The beta release of fsdb-2.x.  Finally, all programs are ported.  Some
1378       statistics: the number of lines of non-library code doubled from 7.5k
1379       to 15.5k.  The libraries are much more complete, going from 866 to 5164
1380       lines.  The overall number of programs is about the same, although 19
1381       were dropped and 11 were added.  The number of test cases has grown
1382       from 116 to 175.  All programs are now in perl-5, no more shell scripts
1383       or perl-4.  All programs now have manual pages.
1384
1385       Although this is a major step forward, I still expect to rename "jdb"
1386       to "fsdb".
1387
1388       ENHANCEMENT
1389           shifting more old programs to Perl modules.  New in 2.7:
1390           dbcolscorrelate, dbcolsregression, cgi_to_db, dbfilevalidate,
1391           db_to_csv, csv_to_db, db_to_html_table, kitrace_to_db,
1392           tcpdump_to_db, tabdelim_to_db, and ns_to_db.
1393
1394       INCOMPATIBLE CHANGE
1395           The following programs have been dropped from fsdb-2.x: db2dcliff,
1396           dbcolmultiscale, crl_to_db, and ipchain_logs_to_db.  They may come
1397           back, but seemed overly specialized.  The program
1398           dbrowsplituniq was dropped because it is superseded by dbmapreduce.
1399           dmalloc_to_db was dropped pending test cases and examples.
1400
1401       ENHANCEMENT
1402           dbfilevalidate now has a "-c" option to correct errors.
1403
1404       NEW html_table_to_db provides the inverse of db_to_html_table.
1405
1406   2.8,  5-Aug-08
1407       Change header format, preserving forwards compatibility.
1408
1409       BUG FIX
1410           Complete editing pass over the manual, making sure it aligns with
1411           fsdb-2.x.
1412
1413       SEMI-COMPATIBLE CHANGE
1414           The header of fsdb files has changed; it is now #fsdb, not #h (or
1415           #L) and parsing of -F and -R are also different.  See dbfilealter
1416           for the new specification.  The v1 file format will be read,
1417           compatibly, but not written.
1418
1419       BUG FIX
1420           dbmapreduce now tolerates comments that precede the first key,
1421           instead of failing with an error message.
1422
1423   2.9, 6-Aug-08
1424       Still in beta; just a quick bug-fix for dbmapreduce.
1425
1426       ENHANCEMENT
1427           dbmapreduce now generates plausible output when given no rows of
1428           input.
1429
1430   2.10, 23-Sep-08
1431       Still in beta, but picking up some bug fixes.
1432
1433       ENHANCEMENT
1434           dbmapreduce now generates plausible output when given no rows of
1435           input.
1436
1437       ENHANCEMENT
1438           In dbroweval, the warnings option was backwards; now corrected.  As a
1439           result, warnings in user code now default off (like in fsdb-1.x).
1440
1441       BUG FIX
1442           dbcolpercentile now defaults to assuming the target column is
1443           numeric.  The new option "-N" allows selection of a non-numeric
1444           target.
1445
1446       BUG FIX
1447           dbcolscorrelate now includes "--sample" and "--nosample" options to
1448           compute the sample or full population correlation coefficients.
1449           Thanks to Xue Cai for finding this bug.
1450
1451   2.11, 14-Oct-08
1452       Still in beta, but picking up some bug fixes.
1453
1454       ENHANCEMENT
1455           html_table_to_db is now more aggressive about filling in empty
1456           cells with the official empty value, rather than leaving them blank
1457           or as whitespace.
1458
1459       ENHANCEMENT
1460           dbpipeline now catches failures during pipeline element setup and
1461           exits reasonably gracefully.
1462
1463       BUG FIX
1464           dbsubprocess now reaps child processes, thus avoiding running out
1465           of processes when used a lot.
1466
1467   2.12, 16-Oct-08
1468       Finally, a full (non-beta) 2.x release!
1469
1470       INCOMPATIBLE CHANGE
1471           Jdb has been renamed Fsdb, the flatfile-streaming database.  This
1472           change affects all internal Perl APIs, but no shell command-level
1473           APIs.  While Jdb served well for more than ten years, it is easily
1474           confused with the Java debugger (even though Jdb was there first!).
1475           It also is too generic to work well in web search engines.
1476           Finally, Jdb stands for ``John's database'', and we're a bit beyond
1477           that.  (However, some call me the ``file-system guy'', so one could
1478           argue it retains that meaning.)
1479
1480           If you just used the shell commands, this change should not affect
1481           you.  If you used the Perl-level libraries directly in your code,
1482           you should be able to rename "Jdb" to "Fsdb" to move to 2.12.
1483
1484           The jdb-announce list has not yet been renamed, but it will be shortly.
1485
1486           With this release I've accomplished everything I wanted to in
1487           fsdb-2.x.  I therefore expect to return to boring, bugfix releases.
1488
1489   2.13, 30-Oct-08
1490       BUG FIX
1491           dbrowaccumulate now treats non-numeric data as zero by default.
1492
1493       BUG FIX
1494           Fixed a perl-5.10ism in dbmapreduce that breaks that program under
1495           5.8.  Thanks to Martin Lukac for reporting the bug.
1496
1497   2.14, 26-Nov-08
1498       BUG FIX
1499           Improved documentation for dbmapreduce's "-f" option.
1500
1501       ENHANCEMENT
1502           dbcolmovingstats now computes a moving standard deviation in
1503           addition to a moving mean.
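
           A moving mean and standard deviation can be sketched in Python as
           below.  This is not dbcolmovingstats's code.  The function name is
           an assumption, and so are the window semantics (until the window
           fills, statistics cover the values seen so far) and the use of the
           sample (n-1) standard deviation.  The sketch keeps running sums.
           That is cheap, but rounding can push the variance slightly below
           zero when it is near zero, so the variance is clamped before the
           square root.

```python
import math
from collections import deque


def moving_stats(values, window):
    """Yield (mean, sample_stddev) over the last `window` values.

    Until the window fills, statistics cover the values seen so far.
    The standard deviation is None while fewer than two values are available.
    """
    buf = deque()
    total = 0.0
    total_sq = 0.0
    for x in values:
        buf.append(x)
        total += x
        total_sq += x * x
        if len(buf) > window:
            old = buf.popleft()  # drop the value leaving the window
            total -= old
            total_sq -= old * old
        n = len(buf)
        mean = total / n
        if n < 2:
            yield mean, None
            continue
        # Sum-of-squares variance can round slightly below zero when the true
        # variance is near zero, so clamp it before taking the square root.
        var = (total_sq - n * mean * mean) / (n - 1)
        yield mean, math.sqrt(max(var, 0.0))
```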
1504
1505   2.15, 13-Apr-09
1506       BUG FIX
1507           Fix a make install bug reported by Shalindra Fernando.
1508
1509   2.16, 14-Apr-09
1510       BUG FIX
1511           Another minor release bug: on some systems programize_module loses
1512           executable permissions.  Again reported by Shalindra Fernando.
1513
1514   2.17, 25-Jun-09
1515       TYPO FIXES
1516           Typo in the dbroweval manual fixed.
1517
1518       IMPROVEMENT
1519           dbcolneaten no longer adds a comment line to label columns;
1520           instead, the header line is tweaked to line up.  This change
1521           restores the Jdb-1.x behavior, and means that repeated runs of
1522           dbcolneaten no longer add comment lines each time.
1523
1524       BUG FIX
1525           It turns out dbcolneaten was not correctly handling trailing
1526           spaces when given the "-E" option to suppress them.  This
1527           regression is now fixed.
1528
1529       EXTENSION
1530           dbroweval(1) can now handle direct references to the last row via
1531           $lfref, a dubious but now documented feature.
1532
1533       BUG FIXES
1534           Separators set with "-C" in dbcolmerge and dbcolsplittocols were
1535           not properly setting the heading, and null fields were not
1536           recognized.  The first bug was reported by Martin Lukac.
1537
1538   2.18,  1-Jul-09  A minor release
1539       IMPROVEMENT
1540           Documentation for Fsdb::IO::Reader has been improved.
1541
1542       IMPROVEMENT
1543           The package should now be PGP-signed.
1544
1545   2.19,  10-Jul-09
1546       BUG FIX
1547           Internal improvements to debugging output and robustness of
1548           dbmapreduce and dbpipeline.  TEST/dbpipeline_first_fails.cmd re-
1549           enabled.
1550
1551   2.20, 30-Nov-09 (A collection of minor bugfixes, plus a build against
1552       Fedora 12.)
1553       BUG FIX
1554           Logging for dbmapreduce with code refs is now stable (it no longer
1555           includes a hex pointer to the code reference).
1556
1557       BUG FIX
1558           Better handling of mixed blank lines in Fsdb::IO::Reader (see test
1559           case dbcolize_blank_lines.cmd).
1560
1561       BUG FIX
1562           html_table_to_db now handles multi-line input better, and handles
1563           tables with COLSPAN.
1564
1565       BUG FIX
1566           dbpipeline now cleans up threads in an "eval" to prevent "cannot
1567           detach a joined thread" errors that popped up in perl-5.10.
1568           Hopefully this prevents a race condition that causes the test
1569           suites to hang about 20% of the time (in dbpipeline_first_fails).
1570
1571       IMPROVEMENT
1572           dbmapreduce now detects and correctly fails when the input and
1573           reducer have incompatible field separators.
1574
1575       IMPROVEMENT
1576           dbcolstats, dbcolhisto, dbcolscorrelate, dbcolsregression, and
1577           dbrowcount now all take an "-F" option to let one specify the
1578           output field separator (so they work better with dbmapreduce).
1579
1580       BUG FIX
1581           The previously omitted "-k" option is now in the dbmultistats manual page.
1582           Bug reported by Unkyu Park.
1583
1584   2.21, 17-Apr-10 bug fix release
1585       BUG FIX
1586           Fsdb::IO::Writer no longer fails with -outputheader => never
1587           (an obscure bug).
1588
1589       IMPROVEMENT
1590           Fsdb (in the warnings section) and dbcolstats now more carefully
1591           document how they handle (and do not handle) numerical precision
1592           problems, and other general limits.  Thanks to Yuri Pradkin for
1593           prompting this documentation.
1594
1595       IMPROVEMENT
1596           "Fsdb::Support::fullname_to_sortkey" is now restored from "Jdb".
1597
1598       IMPROVEMENT
1599           Documentation for multiple input approaches (including
1600           performance description) added to Fsdb::IO.
1601
1602   2.22, 2010-10-31 One new tool dbcolcopylast and several bug fixes for Perl
1603       5.10.
1604       BUG FIX
1605           dbmerge now correctly handles n-way merges.  Bug reported by Yuri
1606           Pradkin.
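
           An n-way merge combines any number of inputs, each already sorted
           on the merge key.  The Python sketch below uses a heap via
           heapq.merge.  It illustrates the technique and is not how dbmerge
           is implemented.  Representing rows as lists of strings and the
           function name are assumptions.

```python
import heapq
from operator import itemgetter


def nway_merge(inputs, key_index):
    """Merge any number of row lists, each already sorted on column key_index.

    heapq.merge keeps one pending row per input in a heap, so each output row
    costs O(log k) comparisons for k inputs, and inputs are consumed lazily.
    """
    return list(heapq.merge(*inputs, key=itemgetter(key_index)))
```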
1607
1608       INCOMPATIBLE CHANGE
1609           dbcolneaten now defaults to not padding the last column.
1610
1611       ADDITION
1612           dbrowenumerate now takes -N NewColumn to give the new column a name
1613           other than "count".  Feature requested by Mike Rouch in January
1614           2005.
1615
1616       ADDITION
1617           New program dbcolcopylast copies the last value of a column into a
1618           new column copylast_column of the next row.  New program requested
1619           by Fabio Silva; useful for converting dbmultistats output into
1620           dbrvstatdiff input.
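
           The copy-last behavior can be sketched in Python as follows.
           Several details are assumptions rather than dbcolcopylast's
           documented behavior: rows as dicts, the function name, and
           filling the first row (which has no predecessor) with "-",
           assumed here to be Fsdb's empty value.

```python
EMPTY = "-"  # assumed Fsdb empty-value marker


def copy_last(rows, column):
    """Return new rows with copylast_<column> set to the previous row's value.

    The first row has no predecessor, so it gets EMPTY. Input rows are not
    modified.
    """
    out = []
    previous = EMPTY
    new_col = "copylast_" + column
    for row in rows:
        new_row = dict(row)
        new_row[new_col] = previous
        previous = row[column]  # remember this value for the next row
        out.append(new_row)
    return out
```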
1621
1622       BUG FIX
1623           Several tools (particularly dbmapreduce and dbmultistats) would
1624           report errors like "Unbalanced string table refcount: (1) for
1625           "STDOUT" during global destruction" on exit, at least on certain
1626           versions of Perl (for me on 5.10.1), but similar errors have come
1627           and gone across several Perl releases.  Although I think my code
1628           looked OK, I worked around this problem with a different way of
1629           handling standard IO redirection.
1630
1631   2.23, 2011-03-10 Several small portability bugfixes; improved dbcolstats
1632       for large datasets
1633       IMPROVEMENT
1634           Documentation to dbrvstatdiff was changed to use "sd" to refer to
1635           standard deviation, not "ss" (which might be confused with sum-of-
1636           squares).
1637
1638       BUG FIX
1639           The documentation for dbmultistats was missing the -k option in
1640           some cases.
1641
1642       BUG FIX
1643           dbmapreduce was failing on MacOS-10.6.3 for some tests with the
1644           error
1645
1646               dbmapreduce: cannot run external dbmapreduce reduce program (perl TEST/dbmapreduce_external_with_key.pl)
1647
1648           The problem seemed to be only in the error, not in operation.  On
1649           MacOS, the error is now suppressed.  Thanks to Alefiya Hussain for
1650           providing access to a Mac system that allowed debugging of this
1651           problem.
1652
1653       IMPROVEMENT
1654           The csv_to_db command requires an external Perl library
1655           (Text::CSV_XS).  On computers that lack this optional library,
1656           previously Fsdb would configure with a warning and then test cases
1657           would fail.  Now those test cases are skipped with an additional
1658           warning.
1659
1660       BUG FIX
1661           The test suite now supports alternative valid output, as a hack to
1662           account for last-digit floating point differences.  (Not very
1663           satisfying :-(
1664
1665       BUG FIX
1666           dbcolstats output for confidence intervals on very large datasets
1667           has changed.  Previously it failed for more than 2^31-1 records,
1668           and handling of T-Distributions with thousands of rows was a bit
1669           dubious.  Now datasets with more than 10000 rows are considered
1670           infinitely large and hopefully correctly handled.
1671
1672   2.24, 2011-04-15 Improvements to fix an old bug in dbmapreduce with
1673       different field separators
1674       IMPROVEMENT
1675           The dbfilealter command had a "--correct" option to work around
1676           incompatible field separators, but it did nothing.  Now it
1677           does the correct but sad, data-losing thing.
1678
1679       IMPROVEMENT
1680           The dbmultistats command previously failed with an error message
1681           when invoked on input with a non-default field separator.  The root
1682           cause was the underlying dbmapreduce that did not handle the case
1683           of reducers that generated output with a different field separator
1684           than the input.  We now detect and repair incompatible field
1685           separators.  This change corrects a problem originally documented
1686           and detected in Fsdb-2.20.  Bug re-reported by Unkyu Park.
1687
1688   2.25, 2011-08-07 Two new tools, xml_to_db and dbfilepivot, and a bugfix for
1689       two people.
1690       IMPROVEMENT
1691           kitrace_to_db now supports a --utc option, which also fixes this
1692           test case for users outside of the Pacific time zone.  Bug reported
1693           by David Graff, and also by Peter Desnoyers (within a week of each
1694           other :-)
1695
1696       NEW xml_to_db can convert simple, very regular XML files into Fsdb.
1697
1698       NEW dbfilepivot "pivots" a file, converting multiple rows corresponding
1699           to the same entity into a single row with multiple columns.
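
           A pivot can be sketched in Python as below.  It is not
           dbfilepivot's implementation.  The following are assumptions: the
           function and argument names, naming the new columns directly after
           the pivot-column values, and filling missing cells with "-".  This
           version holds the whole table in memory, so it does not need input
           grouped by key.  A streaming version would need grouped or sorted
           input.

```python
EMPTY = "-"  # assumed Fsdb empty-value marker


def pivot(rows, key_col, pivot_col, value_col):
    """Collapse rows that share key_col into one row per key.

    Each distinct pivot_col value becomes a column holding value_col.
    Missing combinations are filled with EMPTY. Keys keep first-seen order;
    new columns are sorted by name. Returns (new_columns, rows).
    """
    table = {}
    columns = set()
    for row in rows:
        entity = table.setdefault(row[key_col], {})
        entity[row[pivot_col]] = row[value_col]
        columns.add(row[pivot_col])
    ordered = sorted(columns)
    result = []
    for key, values in table.items():
        out = {key_col: key}
        for col in ordered:
            out[col] = values.get(col, EMPTY)
        result.append(out)
    return ordered, result
```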
1700
1701   2.26, 2011-12-12 Bug fixes, particularly for perl-5.14.2.
1702       BUG FIX
1703           Bugs fixed in Fsdb::IO::Reader(3) manual page.
1704
1705       BUG FIX
1706           Fixed problems where dbcolstats was truncating floating point
1707           numbers when sorting.  This strange behavior happens as of
1708           perl-5.14.2 and it seems like a Perl bug.  I've worked around it
1709           for the test suites, but I'm a bit nervous.
1710
1711   2.27, 2012-11-15 Accumulated bug fixes.
1712       IMPROVEMENT
1713           csv_to_db now reports errors in CSV input with real diagnostics.
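
           One way to give CSV conversion useful diagnostics is to run the CSV
           parser in strict mode and report the input line number on failure.
           The Python sketch below does this with the standard csv module.  It
           is not csv_to_db's code, which uses Text::CSV_XS.  The function
           name is an assumption.  So are replacing embedded whitespace with
           "_" and writing empty cells as "-", chosen so the output fits the
           default whitespace-separated fsdb format.

```python
import csv
import io

EMPTY = "-"  # assumed Fsdb empty-value marker


def csv_to_fsdb(text):
    """Convert CSV text whose first row names the columns into fsdb lines.

    Embedded whitespace becomes "_" and empty cells become EMPTY, so every
    field survives a whitespace-separated format. Malformed CSV or a row with
    the wrong number of fields raises ValueError naming the input line.
    """
    reader = csv.reader(io.StringIO(text), strict=True)
    lines = []
    header = None
    try:
        for row in reader:
            cells = ["_".join(cell.split()) or EMPTY for cell in row]
            if header is None:
                header = cells
                lines.append("#fsdb " + " ".join(cells))
            elif len(cells) != len(header):
                raise ValueError(
                    f"line {reader.line_num}: expected {len(header)} fields, "
                    f"got {len(cells)}"
                )
            else:
                lines.append(" ".join(cells))
    except csv.Error as err:
        raise ValueError(f"line {reader.line_num}: {err}") from None
    return lines
```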
1714
1715       IMPROVEMENT
1716           dbcolmovingstats can now compute median, when given the "-m"
1717           option.
1718
1719       BUG FIX
1720           dbcolmovingstats non-numeric handling (the "-a" option) now works
1721           properly.
1722
1723       DOCUMENTATION
1724           The internal t/test_command.t test framework is now documented.
1725
1726       BUG FIX
1727           dbrowuniq now correctly handles the case where there is no input
1728           (previously it output a blank line, which is a malformed fsdb
1729           file).  Thanks to Yuri Pradkin for reporting this bug.
1730
1731   2.28, 2012-11-15 A quick release to fix most rpmlint errors.
1732       BUG FIX
1733           Fixed a number of minor release problems (wrong permissions, old
1734           FSF address, etc.) found by rpmlint.
1735
1736   2.29, 2012-11-20 a quick release for CPAN testing
1737       IMPROVEMENT
1738           Tweaked the RPM spec.
1739
1740       IMPROVEMENT
1741           Modified Makefile.PL to fail gracefully on Perl installations that
1742           lack threads.  (Without this fix, I get massive failures in the
1743           non-ithreads test system.)
1744
1745   2.30, 2012-11-25 improvements to perl portability
1746       BUG FIX
1747           Removed a Unicode character from the documentation of dbcolscorrelate so pod
1748           tests will pass.  (Sigh, that should work :-( )
1749
1750       BUG FIX
1751           Fixed test suite failures on 5 tests (dbcolcreate_double_creation
1752           was the first) due to Carp's addition of a period.  This problem
1753           was breaking Fsdb on perl-5.17.  Thanks to Michael McQuaid for
1754           helping diagnose this problem.
1755
1756       IMPROVEMENT
1757           The test suite now prints out the names of tests it tries.
1758
1759   2.31, 2012-11-28 A release with actual improvements to dbfilepivot and
1760       dbrowuniq.
1761       BUG FIX
1762           Documentation fixes: typos in dbcolscorrelate, bugs in
1763           dbfilepivot, clarification for comment handling in
1764           Fsdb::IO::Reader.
1765
1766       IMPROVEMENT
1767           Previously dbfilepivot assumed the input was grouped by keys and
1768           didn't verify that pre-condition.  Now there is no pre-condition (it
1769           will sort the input by default), and it checks if the invariant is
1770           violated.
1771
1772       BUG FIX
1773           Previously dbfilepivot failed if the input had comments (oops :-);
1774           no longer.
1775
1776       IMPROVEMENT
1777           Now dbrowuniq has the "-L" option to preserve the last unique row
1778           (instead of the first), a common idiom.
1779
1780   2.32, 2012-12-21 Test suites should now be more numerically robust.
1781       NEW New dbfilediff does fsdb-aware file differencing.  It does not do
1782           smart intuition of add/removes like Unix diff(1), but it does know
1783           about columns, and with "-E", it does numeric-aware differences.
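
           Numeric-aware differencing can be sketched as comparing fields
           numerically when both parse as numbers, and as strings otherwise.
           The Python below is an illustration, not dbfilediff's algorithm.
           The function names and the relative tolerance of 1e-6 are
           assumptions.  Like dbfilediff as described here, it compares rows
           position by position and does not try to detect added or removed
           rows.  zip() stops at the shorter input.

```python
import math


def fields_match(a, b, rel_tol=1e-6):
    """True if both fields parse as floats and are close, or are equal strings."""
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a == b  # at least one field is not numeric


def diff_rows(left, right, rel_tol=1e-6):
    """Return (row, column, left_value, right_value) for every mismatch."""
    diffs = []
    for i, (lrow, rrow) in enumerate(zip(left, right)):
        for j, (a, b) in enumerate(zip(lrow, rrow)):
            if not fields_match(a, b, rel_tol):
                diffs.append((i, j, a, b))
    return diffs
```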
1784
1785       IMPROVEMENT
1786           Test suites that are numeric now use dbfilediff to do numeric-aware
1787           comparisons, so the test suite should now be robust to computers,
1788           operating systems, and compilers that differ slightly from the
1789           ones I use.
1790
1791   2.33, 2012-12-23 Minor fixes to some test cases.
1792       IMPROVEMENT
1793           dbfilediff and dbrowuniq now support the "-N" option to give the
1794           new column a different name.  (Test cases where this
1795           duplication mattered have been fixed.)
1796
1797       IMPROVEMENT
1798           dbrvstatdiff now shows the t-test breakpoint with a reasonable
1799           number of floating point digits.
1800
1801       BUG FIX
1802           Fixed a numerical stability problem in the dbroweval_last test
1803           case.
1804

WHAT'S NEW

1806   2.34, 2013-02-10 Parallelism in dbmerge.
1807       IMPROVEMENT
1808           Documentation for dbjoin now includes resource requirements.
1809
1810       IMPROVEMENT
1811           Default memory usage for dbsort is now about 256MB.  (The world
1812           keeps moving forward.)
1813
1814       IMPROVEMENT
1815           dbmerge now does merging in parallel.  As a side-effect, dbsort
1816           should be faster when input overflows memory.  The level of
1817           parallelism can be limited with the "--parallelism" option.  (There
1818           is more work to do here, but we're off to a start.)
1819
1820   2.35, 2013-02-23 Improvements to dbmerge parallelism
1821       BUG FIX
1822           Fsdb temporary files are now created more securely (with
1823           File::Temp).
1824
1825       IMPROVEMENT
1826           Programs that sort or merge on fields (dbmerge2, dbmerge, dbsort,
1827           dbjoin) now report an error if no fields on which to join or merge
1828           are given.
1829
1830       IMPROVEMENT
1831           Parallelism in dbmerge should now be more consistent, with less
1832           starting and stopping.
1833
1834       IMPROVEMENT In dbmerge, the "--xargs" option lets one give input
1835       filenames on standard input, rather than the command line. This feature
1836       paves the way for faster dbsort for large inputs (by pipelining sorting
1837       and merging), expected in the next release.
1838
1839   2.36, 2013-02-25 dbsort pipelines with dbmerge
1840       IMPROVEMENT For large inputs, dbsort now pipelines sorting and merging,
1841       allowing earlier processing.
1842       BUG FIX Since 2.35, dbmerge delayed cleanup of intermediate files,
1843       thereby requiring extra disk space.
1844
1845   2.37, 2013-02-26 quick bugfix to support parallel sort and merge from
1846       recent releases
1847       BUG FIX Since 2.35, dbmerge delayed removal of input files given by
1848       "--xargs".  This problem is now fixed.
1849
1850   2.38, 2013-04-29 minor bug fixes
1851       CLARIFICATION
1852           Configure now rejects Windows since tests seem to hang on some
1853           versions of Windows.  (I would love help from a Windows developer
1854           to get this problem fixed, but I cannot do it.)  See
1855           https://rt.cpan.org/Ticket/Display.html?id=84201.
1856
1857       IMPROVEMENT
1858           All programs that use temporary files (dbcolpercentile,
1859           dbcolscorrelate, dbcolstats, dbcolstatscores) now take the "-T"
1860           option and set the temporary directory consistently.
1861
1862           In addition, error messages are better when the temporary directory
1863           has problems.  Problem reported by Liang Zhu.
1864
1865       BUG FIX
1866           dbmapreduce was failing with external, map-reduce aware reducers
1867           (when invoked with -M and an external program).  (Sigh, did this
1868           case ever work?)  This case should now work.  Thanks to Yuri
1869           Pradkin for reporting this bug (in 2011).
1870
1871       BUG FIX
1872           Fixed perl-5.10 problem with dbmerge.  Thanks to Yuri Pradkin for
1873           reporting this bug (in 2013).
1874
1875   2.39, 2013-05-31 quick release for the dbrowuniq extension
1876       BUG FIX
1877           Actually in 2.38, the Fedora .spec got cleaner dependencies.
1878           Suggestion from Christopher Meng via
1879           <https://bugzilla.redhat.com/show_bug.cgi?id=877096>.
1880
1881       ENHANCEMENT
1882           Fsdb files are now explicitly set into UTF-8 encoding, unless one
1883           specifies "-encoding" to "Fsdb::IO".
1884
1885       ENHANCEMENT
1886           dbrowuniq now supports "-I" for incremental counting.
1887
1888   2.40, 2013-07-13 small bug fixes
1889       BUG FIX
1890           dbsort now respects a user-given temporary directory; it is
1891           no longer ignored during merging.
1892
1893       IMPROVEMENT
1894           dbrowuniq now has options to output the first, last, and both first
1895           and last rows of a run ("-F", "-L", and "-B").
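
           Selecting the first row, the last row, or both from each run of
           adjacent duplicates can be sketched in Python with
           itertools.groupby.  The function and mode names are hypothetical
           stand-ins for "-F", "-L", and "-B".  In "both" mode a one-row run
           is emitted only once, consistent with the 2.41 fix for "-B".

```python
from itertools import groupby


def uniq_runs(rows, key, mode="first"):
    """Emit representatives of each run of adjacent rows with equal key(row).

    mode is "first", "last", or "both"; in "both" mode a single-row run is
    emitted once rather than twice.
    """
    out = []
    for _, run in groupby(rows, key=key):
        run = list(run)  # materialize the run to reach its last row
        if mode == "first":
            out.append(run[0])
        elif mode == "last":
            out.append(run[-1])
        elif mode == "both":
            out.append(run[0])
            if len(run) > 1:
                out.append(run[-1])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out
```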
1896
1897       BUG FIX
1898           dbrowuniq now correctly handles "-N".  Sigh, it didn't work before.
1899
1900   2.41, 2013-07-29 small bug and packaging fixes
1901       ENHANCEMENT
1902           Documentation to dbrvstatdiff improved (inspired by questions from
1903           Qian Kun).
1904
1905       BUG FIX
1906           dbrowuniq no longer duplicates singleton unique lines when
1907           outputting both (with "-B").
1908
1909       BUG FIX
1910           Add missing "XML::Simple" dependency to Makefile.PL.
1911
1912       ENHANCEMENT
1913           Tests now show the diff of the failing output if run with "make
1914           test TEST_VERBOSE=1".
1915
1916       ENHANCEMENT
1917           dbroweval now includes documentation for how to output extra rows.
1918           Suggestion from Yuri Pradkin.
1919
1920       BUG FIX
1921           Several improvements to the Fedora package from Michael Schwendt
1922           via <https://bugzilla.redhat.com/show_bug.cgi?id=877096>, and from
1923           the harsh master that is rpmlint.  (I am stymied at teaching it
1924           that "outliers" is spelled correctly.  Maybe I should send it
1925           Schneier's book.  And an unresolvable invalid-spec-name lurks in
1926           the SRPM.)
1927
1928   2.42, 2013-07-31 A bug fix and packaging release.
1929       ENHANCEMENT
1930           Documentation for dbjoin improved to better describe memory usage.  (Based on
1931           problem report by Lin Quan.)
1932
1933       BUG FIX
1934           The .spec is now perl-Fsdb.spec to satisfy rpmlint.  Thanks to
1935           Christopher Meng for a specific bug report.
1936
1937       BUG FIX
1938           Test dbroweval_last.cmd no longer has a column that caused failures
1939           because of numerical instability.
1940
1941       BUG FIX
1942           Some tests now better handle bugs in old versions of perl (5.10,
1943           5.12).  Thanks to Calvin Ardi for help debugging this on a Mac with
1944           perl-5.12, but the fix should affect other platforms.
1945
1946   2.43, 2013-08-27 Adds in-file compression.
1947       BUG FIX
1948           Changed the sort on TEST/dbsort_merge.cmd to strings (from
1949           numerics) so we're less susceptible to false test-failures due to
1950           floating point IO differences.
1951
1952       EXPERIMENTAL ENHANCEMENT
1953           Yet more parallelism in dbmerge: new "endgame-mode" builds a merge
1954           tree of processes at the end of large merge tasks to get maximal
1955           parallelism.  Currently this feature is off by default because it
1956           can hang for some inputs.  Enable this experimental feature with
1957           "--endgame".
1958
1959       ENHANCEMENT
1960           "Fsdb::IO" now handles being given "IO::Pipe" objects (as exercised
1961           by dbmerge).
1962
1963       BUG FIX
1964           Handling of NamedTmpfiles now supports concurrency.  This fix will
1965           hopefully fix occasional "Use of uninitialized value $_ in string
1966           ne at ...NamedTmpfile.pm line 93."  errors.
1967
1968       BUG FIX
1969           Fsdb now requires perl 5.10.  This is a bug fix because some test
1970           cases used to require it, but this fact was not properly
1971           documented.  (Back-porting to 5.008 would require removing all "//"
1972           operators.)
1973
1974       ENHANCEMENT
1975           Fsdb now handles automatic compression of file contents.  Enable
1976           compression with "dbfilealter -Z xz" (or "gz" or "bz2").  All
1977           programs should operate on compressed files and leave the output
1978           with the same level of compression.  "xz" is recommended as fastest
1979           and most efficient.  "gz" produces unrepeatable output (and so
1980           has no output test) because it seems to insist on adding a timestamp.
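
           The difference in repeatability most likely comes from the gzip
           header, which stores a modification time; the xz container has no
           such field.  The Python sketch below uses the standard gzip and
           lzma modules, not Fsdb's own compression code.  It shows gzip
           output changing with the recorded time and becoming repeatable
           when the time is pinned.  The mtime argument of gzip.compress needs
           Python 3.8 or later.

```python
import gzip
import lzma

DATA = b"#fsdb experiment duration\nufs_mab_sys 37.2\n"

# gzip records a modification time in its header, so compressing the same
# bytes with different recorded times yields different output.
gz_a = gzip.compress(DATA, mtime=1)
gz_b = gzip.compress(DATA, mtime=2)

# Pinning mtime makes gzip output repeatable.
gz_fixed_1 = gzip.compress(DATA, mtime=0)
gz_fixed_2 = gzip.compress(DATA, mtime=0)

# The xz container has no timestamp, so equal input and settings give
# equal output.
xz_1 = lzma.compress(DATA)
xz_2 = lzma.compress(DATA)

# Both formats round-trip losslessly.
assert gzip.decompress(gz_a) == DATA
assert lzma.decompress(xz_1) == DATA
```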
1981
1982   2.44, 2013-10-02 A major change--all threads are gone.
1983       ENHANCEMENT
1984           Fsdb is now thread free and only uses processes for parallelism.
1985           This change is a big change--the entire motivation for Fsdb-2 was
1986           to exploit parallelism via threading.  Parallelism--good, but perl
1987           threading--bad for performance.  Horribly bad for performance.
1988           About 20x worse than pipes on my box.  (See perl bug #119445 for
1989           the discussion.)
1990
1991       NEW "Fsdb::Support::Freds" provides a thread-like abstraction over
1992           forking, with some nice support for callbacks in the parent upon
1993           child termination.
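
           A thread-like abstraction over fork(), with a callback in the
           parent when each child exits, can be sketched in Python as below.
           This is not the Fsdb::Support::Freds API.  The function name, the
           (name, function) task pairs, and the callback signature are
           hypothetical.  The sketch is POSIX-only because it uses os.fork,
           and it ignores any child it did not start itself.

```python
import os


def run_children(tasks, on_exit):
    """Fork one child per (name, func) task; call on_exit(name, code) in the
    parent as each child is reaped, in termination order.

    A child's exit code is func()'s return value if it is an int (truncated
    to 0-255 by the OS), 0 for any other return value, and 1 if func raises.
    """
    children = {}
    for name, func in tasks:
        pid = os.fork()
        if pid == 0:  # child process
            code = 1
            try:
                result = func()
                code = result if isinstance(result, int) else 0
            finally:
                os._exit(code)  # exit immediately, skipping parent cleanup
        children[pid] = name  # parent: remember which task this pid runs
    while children:
        pid, status = os.waitpid(-1, 0)  # whichever child finishes first
        name = children.pop(pid, None)
        if name is None:
            continue  # not one of ours
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        elif os.WIFSIGNALED(status):
            code = -os.WTERMSIG(status)
        else:
            code = -1
        on_exit(name, code)
```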
1994
1995       ENHANCEMENT
1996           Details about removing threads: "dbpipeline" is thread free, and
1997           new tests verify each of its parts.  The easy cases are
1998           "dbcolpercentile", "dbcolstats", "dbfilepivot", "dbjoin", and
1999           "dbcolstatscores", each of which uses it in simple ways
2000           (2013-09-09).  "dbmerge" is now thread free (2013-09-13), but was a
2001           significant rewrite, which brought "dbsort" along.  "dbmapreduce"
2002           is partly thread free (2013-09-21), again as a rewrite, and it
2003           brings "dbmultistats" along.  Full "dbmapreduce" support took much
2004           longer (2013-10-02).
2005
2006       BUG FIX
2007           When running with user-only output ("-n"), dbroweval now resets the
2008           output vector $ofref after it has been output.
2009
2010       NEW dbcolcreate will create all columns at the head of each row with
2011           the "--first" option.
2012
2013       NEW dbfilecat will concatenate two files, verifying that they have the
2014           same schema.
2015
2016       ENHANCEMENT
2017           dbmapreduce now passes comments through, rather than eating them as
2018           before.
2019
2020           Also, dbmapreduce now supports a "--" option to prevent
2021           misinterpreting sub-program parameters as parameters for dbmapreduce.
2022
2023       INCOMPATIBLE CHANGE
2024           dbmapreduce no longer figures out if it needs to add the key to the
2025           output.  For multi-key-aware reducers, it never does (and cannot).
2026           For non-multi-key-aware reducers, it defaults to add the key and
2027           will now fail if the reducer adds the key (with error "dbcolcreate:
2028           attempt to create pre-existing column...").  In such cases, one
2029           must disable adding the key with the new option "--no-prepend-key".
2030
2031       INCOMPATIBLE CHANGE
2032           dbmapreduce no longer copies the input field separator by default.
2033           For multi-key-aware reducers, it never does (and cannot).  For non-
2034           multi-key-aware reducers, it defaults to not copying the field
2035           separator, but it will copy it (the old default) with the
2036           "--copy-fs" option.
2037
2038   2.45, 2013-10-07 cleanup from de-thread-ification
2039       BUG FIX
2040           Corrected a fast busy-wait in dbmerge.
2041
2042       ENHANCEMENT
2043           Endgame mode enabled in dbmerge; it (and also large cases of
2044           dbsort) should now exploit greater parallelism.
2045
2046       BUG FIX
2047           Test case with "Fsdb::BoundedQueue" (gone since 2.44) now removed.
2048
2049   2.46, 2013-10-08 continuing cleanup of our no-threads version
2050       BUG FIX
2051           Fixed some packaging details.  (Threads really are no longer
2052           required, and tests missing from the MANIFEST were added.)
2053
2054       IMPROVEMENT
2055           dbsort now better communicates with the merge process to avoid
2056           bursty parallelism.
2057
2058           Fsdb::IO::Writer now can take "-autoflush => 1" for line-buffered
2059           IO.
2060
2061   2.47, 2013-10-12 test suite cleanup for non-threaded perls
2062       BUG FIX
2063           Removed some stray "use threads" in some test cases.  We didn't
2064           need them, and these were breaking non-threaded perls.
2065
2066       BUG FIX
2067           Better handling of Fred cleanup; should fix intermittent
2068           dbmapreduce failures on BSD.
2069
2070       ENHANCEMENT
2071           Improved test framework to show output when tests fail.  (This
2072           time, for real.)
2073
2074   2.48, 2014-01-03 small bugfixes and improved release engineering
2075       ENHANCEMENT
2076           Test suites now skip tests for libraries that are missing.  (Patch
2077           for missing "IO::Compress::Xz" contributed by Calvin Ardi.)
2078
2079       ENHANCEMENT
2080           Removed references to Jdb in the package specification.  Since the
2081           name was changed in 2008, there's no longer a huge need for
2082           backwards compatibility.  (Suggestion from Petr Šabata.)
2083
2084       ENHANCEMENT
2085           Test suites now invoke the perl using the path from
2086           $Config{perlpath}.  Hopefully this helps testing in environments
2087           where there are multiple installed perls and the default perl is
2088           not the same as the perl-under-test (as happens in
2089           cpantesters.org).
2090
2091       BUG FIX
2092           Added specific encoding to this manpage to account for Unicode.
2093           Required to build correctly against perl-5.18.
2094
2095   2.49, 2014-01-04 bugfix to unicode handling in Fsdb IO (plus minor
2096       packaging fixes)
2097       BUG FIX
2098           Restored a line in the .spec to chmod g-s.
2099
2100       BUG FIX
2101           Unicode decoding is now handled correctly for programs that read
2102           from standard input.  (Also: New test scripts cover unicode input
2103           and output.)
2104
2105       BUG FIX
2106           Fix to Fsdb documentation encoding line.  Addresses test failure in
2107           perl-5.16 and earlier.  (Who knew "encoding" had to be followed by
2108           a blank line.)
2109

WHAT'S NEW

2111   2.50, 2014-05-27 a quick release for spec tweaks
2112       ENHANCEMENT
2113           In dbroweval, the "-N" (no output, even comments) option now
2114           implies "-n", and it now suppresses the header and trailer.
2115
2116       BUG FIX
2117           A few more tweaks to the perl-Fsdb.spec from Petr Šabata.
2118
2119       BUG FIX
2120           Fixed 3 uses of "use v5.10" in test suites that were causing test
2121           failures (due to warnings, not real failures) on some platforms.
2122
2123   2.51, 2014-09-05 Feature enhancements to dbcolmovingstats, dbcolcreate,
2124       dbmapreduce, and new sqlselect_to_db
2125       ENHANCEMENT
2126           dbcolcreate now has a "--no-recreate-fatal" that causes it to
2127           ignore creation of existing columns (instead of failing).
2128
2129       ENHANCEMENT
2130           dbmapreduce once again is robust to reducers that output the key;
2131           "--no-prepend-key" is no longer mandatory.
2132
2133       ENHANCEMENT
2134           dbcolsplittorows can now enumerate the output rows with "-E".
2135
2136       BUG FIX
2137           dbcolmovingstats is more mathematically robust.  Previously for
2138           some inputs and some platforms, floating point rounding could
2139           sometimes cause square roots of negative numbers.
2140
2141       NEW sqlselect_to_db converts the output of the MySQL or MariaDB select
2142           command into fsdb format.
2143
2144       INCOMPATIBLE CHANGE
2145           dbfilediff now outputs the second row when doing sloppy numeric
2146           comparisons, to better support test suites.
2147
2148   2.52, 2014-11-03 Fixing the test suite for line number changes.
2149       ENHANCEMENT
2150           Test suites changed to be robust to the exact line numbers of failures,
2151           since different Perl releases fail on different lines.
2152           <https://bugzilla.redhat.com/show_bug.cgi?id=1158380>
2153
2154   2.53, 2014-11-26 bug fixes and stability improvements to dbmapreduce
2155       ENHANCEMENT
2156           dbfilediff now supports a "--quiet" option.
2157
2158       ENHANCEMENT
2159           Better documentation of dbpipeline_filter.
2160
2161       BUGFIX
2162           Added groff-base and perl-podlators to the Fedora package spec.
2163           Fixes <https://bugzilla.redhat.com/show_bug.cgi?id=1163149>.  (Also
2164           in package 2.52-2.)
2165
2166       BUGFIX
2167           An important stability improvement to dbmapreduce.  It, plus
2168           dbmultistats and dbcolstats, now support controlled parallelism
2169           with the "--parallelism=N" option.  They default to running with the
2170           number of available CPUs.  dbmapreduce also moderates its level of
2171           parallelism.  Previously it would create reducers as needed,
2172           causing CPU thrashing if reducers ran much slower than data
2173           production.
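           The moderation described above amounts to a bounded pool of
           reducers: the producer blocks when N reducers are already busy
           instead of starting more.  What follows is a minimal Python sketch of
           that idea, not Fsdb's Perl code.  Threads stand in for the separate
           reducers Fsdb runs, and run_reducers is a hypothetical name.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def run_reducers(groups, reduce_fn, parallelism=None):
    """Apply reduce_fn(key, rows) to each (key, rows) group, with at most
    `parallelism` reducers in flight (default: number of CPUs)."""
    if parallelism is None:
        parallelism = os.cpu_count() or 1
    # The semaphore throttles the producer: it blocks before submitting new
    # work rather than queueing an unbounded number of reducers.
    slots = threading.BoundedSemaphore(parallelism)
    futures = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for key, rows in groups:
            slots.acquire()
            fut = pool.submit(reduce_fn, key, rows)
            # Free the slot once this reducer finishes (or fails).
            fut.add_done_callback(lambda _f: slots.release())
            futures.append((key, fut))
    # Results come back in input-key order; errors are re-raised here.
    return [(key, fut.result()) for key, fut in futures]
```

           Because `groups` can be a generator, a slow reducer naturally
           slows down how fast input groups are consumed.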
2174
2175       BUGFIX
2176           The combination of dbmapreduce with dbrowenumerate now works as it
2177           should.  (The obscure bug was an interaction with dbcolcreate with
2178           non-multi-key reducers that output their own key.  dbmapreduce has
2179           too many useful corner cases.)
2180
2181   2.54, 2014-11-28 fix for the test suite to correct failing tests on not-my-
2182       platform
2183       BUGFIX
2184           Sigh, the test suite now has a test suite.  Because, yes, I broke
2185           it, causing many incorrect failures at cpantesters.  Now fixed.
2186
2187   2.55, 2015-01-05 many spelling fixes and dbcolmovingstats tests are more
2188       robust to different numeric precision
2189       ENHANCEMENT
2190           dbfilediff now can be extra quiet, as I continue to try to track
2191           down a numeric difference on FreeBSD AMD boxes.
2192
2193       ENHANCEMENT
2194           dbcolmovingstats gave different test output (just reflecting
2195           rounding error) when stddev approaches zero.  We now detect and
2196           handle this case.  See
2197           <https://rt.cpan.org/Public/Bug/Display.html?id=101220> and thanks
2198           to H. Merijn Brand for the bug report.
2199
2200       BUG FIX
2201           Many, many spelling bugs found by H. Merijn Brand; thanks for the
2202           bug report.
2203
2204       INCOMPATIBLE CHANGE
2205           A number of programs had misspelled "separator" in
2206           "--fieldseparator" and "--columnseparator" options as "seperator".
2207           These are now correctly spelled.
2208
2209   2.56, 2015-02-03 fix against Getopt::Long-2.43's stricter error checking
2210       BUG FIX
2211           Internal argument parsing uses Getopt::Long, but mixed pass-through
2212           and <>.  Bug reported by Petr Pisar at
2213           <https://bugzilla.redhat.com/show_bug.cgi?id=1188538>.
2214
2215       BUG FIX
2216           Added missing BuildRequires for "XML::Simple".
2217
2218   2.57, 2015-04-29 Minor changes, with better performance from dbmultistats.
2219       BUG FIX
2220           dbfilecat now honors "--remove-inputs" (previously it didn't).
2221           This omission meant that dbmapreduce (and dbmultistats) would
2222           accumulate files in /tmp when running.  Bad news for inputs with 4M
2223           keys.
2224
2225       ENHANCEMENT
2226           dbmultistats should be faster with lots of small keys.  dbcolstats
2227           now supports "-k" to get some of the functionality of dbmultistats
2228           (if data is pre-sorted and median/quartiles are not required).
2229
2235   2.58, 2015-04-30 Bugfix in dbmerge
2236       BUG FIX
2237           Fixed a case where dbmerge suffered mojibake in endgame mode.  This
2238           bug surfaced when dbsort was applied to large files (big enough to
2239           require merging) with Unicode in them; the symptom was something
2240           like:
2241             Wide character in print at /usr/lib64/perl5/IO/Handle.pm line
2242           420, <GEN12> line 111.
2243
2244   2.59, 2016-09-01 Collect a few small bug fixes and documentation
2245       improvements.
2246       BUG FIX
2247           More IO is explicitly marked UTF-8 to avoid Perl's tendency to
2248           mojibake on otherwise valid unicode input.  This change helps
2249           html_table_to_db.
2250
2251       ENHANCEMENT
2252           dbcolscorrelate now cross-references dbcolsregression.
2253
2254       ENHANCEMENT
2255           Documentation for dbrowdiff now clarifies that the default is
2256           baseline mode.
2257
2258       BUG FIX
2259           dbjoin now propagates "-T" into the sorting process (if it is
2260           required).  Thanks to Lan Wei for reporting this bug.
2261
2262   2.60, 2016-09-04 Adds support for hash joins.
2263       ENHANCEMENT
2264           dbjoin now supports hash joins with "-t lefthash" and "-t
2265           righthash".  Hash joins cache a table in memory, but do not require
2266           that the other table be sorted.  They are ideal when joining a
2267           large table against a small one.
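           The hash join described above has two phases.  In the build phase,
           the small table is indexed in memory by its join key.  In the probe
           phase, the large table is streamed once, in any order.  The Python
           sketch below shows an inner join over rows represented as dicts.  It
           illustrates the technique only and is not dbjoin's implementation.
           hash_join is a hypothetical name, and the rule that the large-table
           row wins on column-name collisions is an assumption.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Inner hash join of two iterables of dict rows on column `key`.
    Only `small` is held in memory; neither input needs to be sorted."""
    # Build phase: index every row of the small table by its join key.
    index = defaultdict(list)
    for row in small:
        index[row[key]].append(row)
    # Probe phase: stream the large table once, emitting one output row per match.
    for row in large:
        for match in index.get(row[key], ()):
            # Assumed rule: values from the large-table row win on name collisions.
            yield {**match, **row}
```

           Memory use is proportional to the small table only, which is why
           this approach suits joining a large table against a small one.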
2268
2269   2.61, 2016-09-05 Support left and right outer joins.
2270       ENHANCEMENT
2271           dbjoin now handles left and right outer joins with "-t left" and
2272           "-t right".
2273
2274       ENHANCEMENT
2275           dbjoin hash joins are now selected with "-m lefthash" and "-m
2276           righthash" (not the short-lived "-t righthash" option).
2277           (Technically this change is incompatible with Fsdb-2.60, but no one
2278           but me ever used that version.)
2279
2280   2.62, 2016-11-29 A new yaml_to_db and other minor improvements.
2281       ENHANCEMENT
2282           Documentation for xml_to_db now includes sample output.
2283
2284       NEW yaml_to_db converts a specific form of YAML to fsdb.
2285
2286       BUG FIX
2287           The test suite now uses "diff -c -b" rather than "diff -cb" to make
2288           OpenBSD-5.9 happier, I hope.
2289
2290       ENHANCEMENT
2291           Comments that log operations at the end of each file now do simple
2292           quoting of spaces.  (It is not guaranteed to be fully shell-
2293           compliant.)
2294
2295       ENHANCEMENT
2296           There is a new standard option, "--header", allowing one to specify
2297           an Fsdb header for inputs that lack it.  Currently it is supported
2298           by dbcoldefine, dbrowuniq, dbmapreduce, dbmultistats, dbsort,
2299           dbpipeline.
2300
2301       ENHANCEMENT
2302           dbfilepivot now accepts the --possible-pivots option and, if it is
2303           provided, processes the data in one pass.
2304
2305       ENHANCEMENT
2306           dbroweval logs are now quoted.
2307
2308   2.63, 2017-02-03 Re-add some features supposedly in 2.62 but not, and add
2309       more --header options.
2310       ENHANCEMENT
2311           The option -j is now a synonym for --parallelism.  (And several
2312           documentation bugs about this option are fixed.)
2313
2314       ENHANCEMENT
2315           Additional support for "--header" in dbcolmerge, dbcol, dbrow, and
2316           dbroweval.
2317
2318       BUG FIX
2319           Version 2.62 was supposed to have this improvement, but did not
2320           (and now does): dbfilepivot now accepts the --possible-pivots
2321           option and, if it is provided, processes the data in one pass.
2322
2323       BUG FIX
2324           Version 2.62 was supposed to have this improvement, but did not
2325           (and now does): dbroweval logs are now quoted.
2326
2327   2.64, 2017-11-20 several small bugfixes and enhancements
2328       BUG FIX
2329           In dbroweval, the "next row" option previously did not correctly
2330           set up "_last_fieldname".  It now does.
2331
2332       ENHANCEMENT
2333           The csv_to_db converter now has an optional "-F x" option to set
2334           the field separator.
2335
2336       ENHANCEMENT
2337           Finally dbcolsplittocols has a "--header" option, and a new "-N"
2338           option to give the list of resulting output columns.
2339
2340       INCOMPATIBLE CHANGE
2341           Now dbcolstats and dbmultistats produce no output (but a schema)
2342           when given no input but a schema.  Previously they gave a null row
2343           of output.  The "--output-on-no-input" and
2344           "--no-output-on-no-input" options can control this behavior.
2345
2346   2.65, 2018-02-16 Minor release, bug fix and -F option.
2347       ENHANCEMENT
2348           dbmultistats and dbmapreduce now both take a "-F x" option to set
2349           the field separator.
2350
2351       BUG FIX
2352           Fixed missing "use Carp" in dbcolstats.  Also went back and cleaned
2353           up all uses of "croak()".  Thanks to Zefram for the bug report.
2354
2355   2.66, 2018-12-20 Critical bug fix in dbjoin.
2356       BUG FIX
2357           Removed old tests from MANIFEST.  (Thanks to Hang Guo for reporting
2358           this bug.)
2359
2360       IMPROVEMENT
2361           Errors for non-existing input files now include the bad filename
2362           (before: "cannot setup filehandle", now: "cannot open input: cannot
2363           open TEST/bad_filename").
2364
2365       BUG FIX
2366           Hash joins with three identical rows were failing with the
2367           assertion failure "internal error: confused about overflow" due to
2368           a now-fixed bug.
2369
2370   2.67, 2019-07-10 add support for reading and writing HDFS
2371       IMPROVEMENT
2372           dbformmail now has an "mh" mechanism that writes messages to
2373           individual files (an mh-style mailbox).
2374
2375       BUG FIX
2376           dbrow failed to include the Carp library, leading to failures on
2377           croak.
2378
2379       BUG FIX
2380           The dbjoin error message for an unsorted right stream was
2381           incorrect (it said left); now fixed.
2382
2383       IMPROVEMENT
2384           All Fsdb programs can now read from and write to HDFS, when files
2385           that start with "hdfs:" are given to -i and -o options.
2386
2387   2.68, 2019-09-19 All programs now support automatic decompression based on
2388       file extension.
2389       IMPROVEMENT
2390           The omitted-possible-error test case for dbfilepivot now has an
2391           alternative output that I saw on some BSD-running systems (thanks
2392           to CPAN).
2393
2394       IMPROVEMENT
2395           dbmerge and dbmerge2 now support "--header".  dbmerge2 now gives
2396           better error messages when presented the wrong number of inputs.
2397
2398       BUG FIX
2399           dbsort now works with "--header" even when the file is big (due to
2400           fixes to dbmerge).
2401
2402       IMPROVEMENT
2403           csv_to_db now processes data with the "binary" option, allowing it
2404           to handle newlines embedded in quoted fields.
2405
2406       IMPROVEMENT
2407           All programs now transparently decompress input files given as
2408           filename arguments when the name ends with a standard compression
2409           extension (.gz, .bz2, or .xz).
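           Decompression based on the file extension amounts to choosing a
           reader from the filename suffix.  The Python sketch below
           illustrates that dispatch with the standard gzip, bz2 and lzma
           modules.  It is not Fsdb's Perl code, and open_input is a
           hypothetical name.

```python
import bz2
import gzip
import lzma

# Filename suffix -> standard-library opener that decompresses it.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_input(path):
    """Open `path` as UTF-8 text, decompressing transparently when the
    name ends in .gz, .bz2, or .xz; otherwise open it as a plain file."""
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

           Dispatching on the name rather than sniffing the file contents keeps
           the behavior predictable.  A file that is compressed but lacks a
           standard suffix is read as-is.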
2410
2411   2.69, 2019-11-22 a small bugfix in dbcolstats
2412       BUG FIX
2413           Filled in the test case for autodecompress, which was missing
2414           for the 2.68 release.
2415
2416       ENHANCEMENT
2417           The groff program is required for build, and the "Makefile.PL"
2418           fails if groff is missing at build time.  Thanks to Chris Williams
2419           for suggesting this check, and the CPAN auto-building system for
2420           trying many platforms.
2421
2422       BUG FIX
2423           The dbcolstats program had numerical instability that sometimes
2424           resulted in failing with a square root of a negative number when
2425           many values varied right at the edge of floating-point precision.
2426           We now detect and report that case as 0 stddev.  Thanks to Hang Guo
2427           for providing a test case.
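           The failure described above is typical of the textbook
           sum-of-squares variance formula.  When all values are nearly equal,
           rounding can make the computed variance slightly negative.  The
           Python sketch below shows that formula together with the "report 0
           stddev" guard.  It assumes, without confirmation from this page,
           that dbcolstats accumulates running sums this way; mean_stddev is a
           hypothetical name.

```python
import math


def mean_stddev(values):
    """Sample mean and standard deviation computed from running sums."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("no values")
    mean = total / n
    if n == 1:
        return mean, 0.0
    # Subtracting two nearly equal sums can leave a tiny negative due to rounding.
    variance = (total_sq - total * total / n) / (n - 1)
    # Clamp at zero: report 0 stddev instead of taking sqrt of a negative number.
    return mean, math.sqrt(max(variance, 0.0))
```

           A streaming method such as Welford's algorithm avoids most of this
           cancellation.  The clamp remains a cheap final safeguard.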
2428
2429   2.70, 2020-11-12 Some small quality-of-life enhancements and corner-case
2430       bugfixes.
2431       ENHANCEMENT
2432           dbcol can now take an option "-a" to include all columns, allowing
2433           reordering of certain columns while passing the rest through.
2434
2435       ENHANCEMENT
2436           dbrowuniq and dbmerge now buffer comments in a way that the last
2437           row of data output is no longer in the last block of comments.
2438           (The data is identical, but for humans looking at output, this
2439           change makes it less likely to lose the last row.)
2440
2441       BUG FIX
2442           dbmultistats and dbpipeline documentation now indicates that they
2443           support "--header" (something they have done since version 2.62 in
2444           2016-11-29, but which is now documented).
2445
2446       ENHANCEMENT
2447           dbcolcreate now supports "--header".
2448
2449       BUG FIX
2450           Fixed several spelling errors in deprecated programs and removed
2451           information about the no-longer existing FreeBSD and MacOS ports.
2452           Thanks to Calvin Ardi for the patch.
2453
2454       BUG FIX
2455           dbmerge now handles --xargs when only one file is provided (and
2456           passes the file through unchanged).  It also throws a clean error
2457           with --xargs if zero files are provided.  (To support dbmerge,
2458           dbcol now has an internal "--saveoutput" option.)  Thanks to Yuri
2459           Pradkin for reporting the unhandled corner-case.
2460
2461   2.71, 2020-11-16 Fix a race condition breaking test suites.
2462       BUG FIX
2463           Suppressed a race condition in dbcolmerge that sometimes threw the
2464           error "Fsdb::Support::Freds: ending, but running process:
2465           dbmerge:xargs" in the dbmerge_0_xargs test case, on exit.
2466
2467   2.72, 2020-12-01 A small bug and a packaging improvement.
2468       BUG FIX
2469           dbcolhisto now handles the degenerate case where everything has the
2470           same value (previously it would throw "illegal division by zero").
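           The "illegal division by zero" above comes from computing a bin
           width from a zero range (max equals min).  The Python sketch below
           shows one way to guard that case for equal-width bins.  Putting all
           identical values in the first bin is an assumption for
           illustration, not necessarily what dbcolhisto outputs.  histogram
           is a hypothetical name.

```python
def histogram(values, nbins):
    """Count values into nbins equal-width bins spanning [min, max].
    Returns (low edge, bin width, counts)."""
    lo, hi = min(values), max(values)
    counts = [0] * nbins
    if hi == lo:
        # Degenerate case: every value is identical, so the width would be 0.
        counts[0] = len(values)
        return lo, 0.0, counts
    width = (hi - lo) / nbins
    for x in values:
        # The maximum maps to index nbins; fold it into the last bin.
        i = min(int((x - lo) / width), nbins - 1)
        counts[i] += 1
    return lo, width, counts
```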
2471
2472       ENHANCEMENT
2473           The spec for Fedora now includes "make" as BuildRequires, something
2474           required for Fedora 34.
2475
2476   2.73, 2021-05-18 Updates dbcolpercentile with "--weighted", and with more
2477       ipv6.
2478       ENHANCEMENT
2479           dbcolpercentile now has a "--weighted" option.
2480
2481       ENHANCEMENT
2482           The new Fsdb::Support::IPv6 package includes ipv6_normalize and
2483           ipv6_zeroize, which rewrite printable IPv6 addresses in IPv6 normal
2484           form, with a 0 in each 4-nybble field.
2485
2486   2.74, 2021-06-23 More ipv6.
2487       ENHANCEMENT
2488           Fsdb::Support::IPv6 package includes ipv6_fullhex to rewrite
2489           printable IPv6 addresses as full, 128-bit hex values.
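           To make the "full, 128-bit hex" form concrete, the Python sketch
           below produces it with the standard ipaddress module.  It is an
           analogue of ipv6_fullhex, not the Perl function.  to_fullhex is a
           hypothetical name.  The output is always 32 lowercase hex digits,
           with no colons and no elided zero fields.

```python
import ipaddress


def to_fullhex(addr):
    """Rewrite a printable IPv6 address as its full 128-bit value in hex."""
    # int() of an IPv6Address is the 128-bit integer; pad to 32 hex digits.
    return format(int(ipaddress.IPv6Address(addr)), "032x")
```

           For example, "2001:db8::1" becomes "20010db8" followed by 23 zeros
           and a final "1".  The `exploded` attribute of IPv6Address gives the
           zero-padded colon-separated form instead.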
2490
2491   2.75, 2022-04-02 New type specifications in the schema to better support
2492       type conversions in python.
2493       ENHANCEMENT
2494           Add optional type specifications to the schema.  Types are not used
2495           in Perl, but are relevant in Python and Go Fsdb bindings.  Types
2496           use a subset of perl pack specifiers: c, s, l, q are signed 8, 16,
2497           32, and 64-bit integers, f is a float, d is double float, a is
2498           utf-8 string, and > and < can force big or little endianness.
2499           The default type for everything is "a", that is, utf-8 strings.
2500           Thanks to Wes Hardaker for pushing to get this long-desired feature
2501           out the door; his Python bindings need types.
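           To show what the type letters above mean for a reader of Fsdb data,
           the Python sketch below converts field text according to a type
           code.  It assumes that a typed column is written as "name:type" in
           the header (for example "count:l"), with untyped columns defaulting
           to "a".  parse_column and convert are hypothetical names, not part
           of any Fsdb binding.

```python
# Signed-integer type codes and their widths in bits (c, s, l, q).
_INT_BITS = {"c": 8, "s": 16, "l": 32, "q": 64}


def parse_column(spec):
    """Split an (assumed) "name:type" column spec; the default type is "a"."""
    name, sep, typ = spec.partition(":")
    return name, (typ if sep else "a")


def convert(typ, text):
    """Convert one field's text to a Python value according to its type code."""
    base = typ.rstrip("<>")  # endianness modifiers do not affect text parsing
    if base in _INT_BITS:
        value = int(text)
        bits = _INT_BITS[base]
        if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            raise ValueError(f"{text!r} does not fit in a signed {bits}-bit integer")
        return value
    if base in ("f", "d"):
        return float(text)  # Python floats are doubles for both f and d
    if base == "a":
        return text  # utf-8 string, the default type
    raise ValueError(f"unknown type code {typ!r}")
```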
2502
2503       ENHANCEMENT
2504           dbcol, dbcolcreate, dbcolcopylast, and dbcolrename now understand
2505           and propagate schema types.  dbsort, dbjoin, dbmerge, dbmerge2 and
2506           dbfilepivot all take a new option "-t" to sort by type-inferred
2507           comparison, if a type is given.
2508
2509       ENHANCEMENT
2510           dbcolstats, dbmultistats, and dbcolmovingstats now include type
2511           information in their output schema.  (They assume input variables
2512           are floats, not integers.)
2513
2514       ENHANCEMENT
2515           Even more IPv6: the functions in the Fsdb::Support::IPv6 package now
2516           support strings of hex digits as an alternate encoding for IP
2517           addresses (the format already produced by ipv6_fullhex), and
2518           "ip_fullhex_to_normal" converts full hex-encoded IPv4 or IPv6
2519           addresses to their "normal" form (dotted-quad or IPv6 printable
2520           format).
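           The reverse conversion can tell the two address families apart by
           length: 8 hex digits encode 32 bits (IPv4), and 32 hex digits encode
           128 bits (IPv6).  The Python sketch below uses the standard
           ipaddress module to illustrate this.  It is not the Perl
           ip_fullhex_to_normal itself, and fullhex_to_normal is a hypothetical
           name.

```python
import ipaddress


def fullhex_to_normal(hexstr):
    """8 hex digits -> dotted-quad IPv4; 32 hex digits -> compressed IPv6."""
    value = int(hexstr, 16)
    if len(hexstr) == 8:
        return str(ipaddress.IPv4Address(value))
    if len(hexstr) == 32:
        # str() of an IPv6Address is the compressed printable form.
        return str(ipaddress.IPv6Address(value))
    raise ValueError(f"expected 8 or 32 hex digits, got {len(hexstr)}")
```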
2521
2522   3.0, 2022-04-04 Complete type support and accordingly bump major version.
2523       NEW The major version number is now 3.0 to correspond to the addition
2524           of types (although they were actually added in 2.75).  Old fsdb
2525           files are supported (Fsdb-3.0 is backwards compatible with old
2526           databases), but older versions will confuse types in new files (new
2527           Fsdb files are not forward compatible with old versions).
2528
2529       ENHANCEMENT
2530           Type specifications in a few more programs: dbcolhisto,
2531           dbcolscorrelate, dbcolsregression, dbcolstatscores,
2532           dbrowaccumulate, dbrowcount, dbrowdiff, dbrvstatdiff.
2533
2534       ENHANCEMENT
2535           dbcolhisto now puts an empty value on any empty rows.
2536
2537       NEW dbcoltype redefines column types, or clears them with the "-v"
2538           option.
2539

AUTHOR

2541       John Heidemann, "johnh@isi.edu"
2542
2543       See "Contributors" for the many people who have contributed bug reports
2544       and fixes.
2545

COPYRIGHT

2547       Fsdb is Copyright (C) 1991-2022 by John Heidemann <johnh@isi.edu>.
2548
2549       This program is free software; you can redistribute it and/or modify it
2550       under the terms of version 2 of the GNU General Public License as
2551       published by the Free Software Foundation.
2552
2553       This program is distributed in the hope that it will be useful, but
2554       WITHOUT ANY WARRANTY; without even the implied warranty of
2555       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
2556       General Public License for more details.
2557
2558       You should have received a copy of the GNU General Public License along
2559       with this program; if not, write to the Free Software Foundation, Inc.,
2560       675 Mass Ave, Cambridge, MA 02139, USA.
2561
2562       A copy of the GNU General Public License can be found in the file
2563       ``COPYING''.
2564

COMMENTS and BUG REPORTS

2566       Any comments about these programs should be sent to John Heidemann
2567       "johnh@isi.edu".
2568
2569
2570
2571perl v5.36.0                      2022-11-22                           Fsdb(3)