1Fsdb::Filter::dbjoin(3)User Contributed Perl DocumentatioFnsdb::Filter::dbjoin(3)
2
3
4
6 dbjoin - join two tables on common columns
7
9 dbjoin [-Sid] --input table1.fsdb --input table2.fsdb [-nNrR] column [column...]
10
11 OR
12
13 cat table1.fsdb | dbjoin [-Sid] --input table2.fsdb [-nNrR] column [column...]
14
16 Does a natural, inner join on TABLE1 and TABLE2 the specified columns.
17 With the "-a" option, or with "-t outer" it will do a natural, full
18 outer join.
19
20 (Database review: inner joints output records only when there are
21 matches in both tables and will omit records that do not match. Outer
22 joins output all records from both tables, filling with the empty value
23 as needed. Right (left) outer joins keep all elements of the right
24 (left) table, even those that don't match in the other table.)
25
26 By default for non-hash joins, data will be sorted lexically, but the
27 usual sorting options can be mixed with the column specification.
28
29 Because two tables are required, input is typically in files. Standard
30 input is accessible by the file "-".
31
32 If only one input is given, the first (left) input is taken from stdin.
33
35 Joins can be expensive. Most databases have a query optimizer that
36 knows something about the data and so can select algorithms for
37 efficent operation, in Fsdb, you are that optimizer.
38
39 For non-hash joins: If data is already sorted, dbjoin will run more
40 efficiently by telling dbjoin the data is sorted with the "-S".
41
42 The resource requirements dbjoin vary. If input data is sorted and
43 "-S" is given, then memory consumption is bounded by the the sum of the
44 largest number of records in either dataset with the same value in the
45 join column, and there is no disk consumption. If data is not sorted,
46 then dbjoin requires disk storage the size of both input files.
47
48 One can minimize memory consumption by making sure each record of
49 table1 matches relatively few records in table2. Typically this means
50 that table2 should be the smaller. For example, given two files:
51 people.fsdb (schema: name iso_country_code) and countries.fsdb (schema:
52 iso_country_code full_country_name), then
53
54 dbjoin -i people.fsdb -i countries.fsdb iso_country_code
55
56 will require less memory than
57
58 dbjoin -i countries.fsdb -i people.fsdb iso_country_code
59
60 if there are many people per country (as one would expect). If warning
61 "lots of matching rows accumulating in memory" appears, this is the
62 cause and try swapping join order.
63
64 For hash joins (that is, with "-m righthash" or "-m lefthash"): all of
65 the right table (the second input) or the left (the first) is loaded
66 into memory (and "hashed"). The other table need not be sorted.
67 Runtime is O(n), but memory is O(size of hashed table).
68
70 -a or --all
71 Perform a full outer join, include non-matches (each record which
72 doesn't match at all will appear once). Default is an inner join.
73
74 -t TYPE or --type TYPE
75 Explicitly specify the join type. TYPE must be inner, outer, left
76 (outer), right (outer). (Recall tha inner join requires data on
77 both sides, outer joins keep all records from both sides for outer,
78 or all of the first or second input for left and right outer
79 joins.) Default: inner.
80
81 -m METHOD or --method METHOD
82 Select join method (algorithm). Choices are merge, righthash, and
83 lefthash. Default: merge.
84
85 -S or --pre-sorted
86 assume (and verify) data is already sorted
87
88 -e E or --empty E
89 give value E as the value for empty (null) records
90
91 -T TmpDir
92 where to put tmp files. Also uses environment variable TMPDIR, if
93 -T is not specified. Default is /tmp.
94
95 Sort specification options (can be interspersed with column names):
96
97 -r or --descending
98 sort in reverse order (high to low)
99
100 -R or --ascending
101 sort in normal order (low to high)
102
103 -n or --numeric
104 sort numerically
105
106 -N or --lexical
107 sort lexicographically
108
109 This module also supports the standard fsdb options:
110
111 -d Enable debugging output.
112
113 -i or --input InputSource
114 Read from InputSource, typically a file name, or "-" for standard
115 input, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue
116 objects.
117
118 -o or --output OutputDestination
119 Write to OutputDestination, typically a file name, or "-" for
120 standard output, or (if in Perl) a IO::Handle, Fsdb::IO or
121 Fsdb::BoundedQueue objects.
122
123 --autorun or --noautorun
124 By default, programs process automatically, but Fsdb::Filter
125 objects in Perl do not run until you invoke the run() method. The
126 "--(no)autorun" option controls that behavior within Perl.
127
128 --help
129 Show help.
130
131 --man
132 Show full manual.
133
135 Input:
136 #fsdb sid cid
137 1 10
138 2 11
139 1 12
140 2 12
141
142 And in the file DATA/classes:
143
144 #fsdb cid cname
145 10 pascal
146 11 numanal
147 12 os
148
149 Command:
150 cat DATA/reg.fsdb | dbsort -n cid | dbjoin -i - -i DATA/classes -n cid
151
152 Output:
153 #fsdb cid sid cname
154 10 1 pascal
155 11 2 numanal
156 12 1 os
157 12 2 os
158 # - COMMENTS:
159 # | /home/johnh/BIN/DB/dbsort -n cid
160 # DATA/classes COMMENTS:
161 # joined comments:
162 # | /home/johnh/BIN/DB/dbjoin - DATA/classes cid
163
165 Fsdb.
166
168 new
169 $filter = new Fsdb::Filter::dbjoin(@arguments);
170
171 Create a new dbjoin object, taking command-line arguments.
172
173 set_defaults
174 $filter->set_defaults();
175
176 Internal: set up defaults.
177
178 parse_options
179 $filter->parse_options(@ARGV);
180
181 Internal: parse command-line arguments.
182
183 setup
184 $filter->setup();
185
186 Internal: setup, parse headers.
187
188 run_merge_join
189 $filter->run_merge_join();
190
191 Internal: run over each rows.
192
193 run_hash_join
194 $filter->run_hash_join();
195
196 Internal: run over each rows, doing a hash join.
197
198 run
199 $filter->run();
200
201 Internal: run over each rows.
202
204 Copyright (C) 1991-2022 by John Heidemann <johnh@isi.edu>
205
206 This program is distributed under terms of the GNU general public
207 license, version 2. See the file COPYING with the distribution for
208 details.
209
210
211
212perl v5.34.1 2022-04-04 Fsdb::Filter::dbjoin(3)