JOIN(1P)                  POSIX Programmer's Manual                 JOIN(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
join - relational database operator

SYNOPSIS
join [-a file_number|-v file_number] [-e string] [-o list] [-t char]
     [-1 field] [-2 field] file1 file2

DESCRIPTION
The join utility shall perform an equality join on the files file1 and
file2. The joined files shall be written to the standard output.

The join field is a field in each file on which the files are compared.
The join utility shall write one line in the output for each pair of
lines in file1 and file2 that have identical join fields. The output
line by default shall consist of the join field, then the remaining
fields from file1, then the remaining fields from file2. This format
can be changed by using the -o option (see below). The -a option can be
used to add unmatched lines to the output. The -v option can be used to
output only unmatched lines.

The files file1 and file2 shall be ordered in the collating sequence of
sort -b on the fields on which they shall be joined, by default the
first in each line. All selected output shall be written in the same
collating sequence.

The default input field separators shall be <blank> characters. In this
case, multiple separators shall count as one field separator, and
leading separators shall be ignored. The default output field separator
shall be a <space>.

The field separator and collating sequence can be changed by using the
-t option (see below).

If the same key appears more than once in either file, all combinations
of the set of remaining fields in file1 and the set of remaining fields
in file2 are output in the order of the lines encountered.

If the input files are not in the appropriate collating sequence, the
results are unspecified.

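As a hedged illustration of the ordering requirement and the default
output layout described above, the following sketch sorts two small
files on their first field and then joins them. The file names and
their contents are invented for this example, and LC_ALL=C is set only
so that sort and join agree on one collating sequence.

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

printf '%s\n' 'carol 30' 'alice 10' 'bob 20' > ages
printf '%s\n' 'bob york' 'alice oslo' 'dave rome' > cities

# Order both inputs on field 1, as sort -b would, in one fixed locale.
LC_ALL=C sort -b -k 1,1 ages > ages.sorted
LC_ALL=C sort -b -k 1,1 cities > cities.sorted

# Default output: join field, then the other file1 fields, then the
# other file2 fields. Unpaired keys (carol, dave) are omitted.
basic=$(LC_ALL=C join ages.sorted cities.sorted)
printf '%s\n' "$basic"

cd / && rm -rf "$tmp"
```

Only alice and bob appear in both files, so only those two keys produce
output lines.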
OPTIONS
The join utility shall conform to the Base Definitions volume of
POSIX.1-2008, Section 12.2, Utility Syntax Guidelines.

The following options shall be supported:

-a file_number
    Produce a line for each unpairable line in file file_number,
    where file_number is 1 or 2, in addition to the default output.
    If both -a1 and -a2 are specified, all unpairable lines shall
    be output.

-e string
    Replace empty output fields in the list selected by -o with
    the string string.

-o list
    Construct the output line to comprise the fields specified in
    list, each element of which shall have one of the following
    two forms:

    1. file_number.field, where file_number is a file number and
       field is a decimal integer field number

    2. 0 (zero), representing the join field

    The elements of list shall be either <comma>-separated or
    <blank>-separated, as specified in Guideline 8 of the Base
    Definitions volume of POSIX.1-2008, Section 12.2, Utility
    Syntax Guidelines. The fields specified by list shall be
    written for all selected output lines. Fields selected by
    list that do not appear in the input shall be treated as
    empty output fields. (See the -e option.) Only specifically
    requested fields shall be written. The application shall
    ensure that list is a single command line argument.

-t char
    Use character char as a separator, for both input and output.
    Every appearance of char in a line shall be significant. When
    this option is specified, the collating sequence shall be the
    same as sort without the -b option.

-v file_number
    Instead of the default output, produce a line only for each
    unpairable line in file_number, where file_number is 1 or 2.
    If both -v1 and -v2 are specified, all unpairable lines shall
    be output.

-1 field
    Join on the fieldth field of file 1. Fields are decimal
    integers starting with 1.

-2 field
    Join on the fieldth field of file 2. Fields are decimal
    integers starting with 1.

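The effect of -a, -v, -e, and -o can be compared side by side. The
following is a minimal sketch, assuming two hypothetical files named
left and right that are already ordered on their first field; the
placeholder string NONE is likewise an arbitrary choice.

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

printf '%s\n' 'a 1' 'b 2' 'c 3' > left
printf '%s\n' 'b x' 'c y' 'd z' > right

# -a 1: the paired lines plus the unpairable lines of file 1.
with_a=$(LC_ALL=C join -a 1 left right)

# -v 2: only the unpairable lines of file 2.
only_v=$(LC_ALL=C join -v 2 left right)

# -a 1 -a 2 with -o and -e: every key from either file, with a
# placeholder where a requested field does not exist.
outer=$(LC_ALL=C join -a 1 -a 2 -e NONE -o 0,1.2,2.2 left right)

printf '%s\n' "$with_a" '--' "$only_v" '--' "$outer"

cd / && rm -rf "$tmp"
```

The -e string appears only in the third command because that is the
only one that names its output fields with -o.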
OPERANDS
The following operands shall be supported:

file1, file2
    A pathname of a file to be joined. If either of the file1 or
    file2 operands is '-', the standard input shall be used in
    its place.

STDIN
The standard input shall be used only if the file1 or file2 operand is
'-'. See the INPUT FILES section.

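One common use of the '-' operand is to sort one input on the fly
instead of writing it to an intermediate file. A small sketch, assuming
a hypothetical lookup file called names:

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Lookup table, already ordered on field 1.
printf '%s\n' 'a alpha' 'b beta' > names

# file1 is '-', so join reads the sorted stream from standard input.
from_stdin=$(printf '%s\n' 'b 2' 'a 1' |
    LC_ALL=C sort -b -k 1,1 |
    LC_ALL=C join - names)
printf '%s\n' "$from_stdin"

cd / && rm -rf "$tmp"
```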
INPUT FILES
The input files shall be text files.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of join:

LANG
    Provide a default value for the internationalization variables
    that are unset or null. (See the Base Definitions volume of
    POSIX.1-2008, Section 8.2, Internationalization Variables for
    the precedence of internationalization variables used to
    determine the values of locale categories.)

LC_ALL
    If set to a non-empty string value, override the values of
    all the other internationalization variables.

LC_COLLATE
    Determine the locale of the collating sequence join expects
    to have been used when the input files were sorted.

LC_CTYPE
    Determine the locale for the interpretation of sequences of
    bytes of text data as characters (for example, single-byte as
    opposed to multi-byte characters in arguments and input
    files).

LC_MESSAGES
    Determine the locale that should be used to affect the format
    and contents of diagnostic messages written to standard
    error.

NLSPATH
    Determine the location of message catalogs for the processing
    of LC_MESSAGES.

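Because join expects its inputs to have been sorted under the same
LC_COLLATE setting, one way to keep the two steps consistent is to set
the locale once for both. The sketch below assumes hypothetical files
named stock and colors and uses the C locale purely as an example; any
locale works as long as sort and join share it.

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Mixed-case keys, whose relative order may differ between locales.
printf '%s\n' 'apple 1' 'Banana 2' 'cherry 3' > stock
printf '%s\n' 'cherry red' 'Banana yellow' > colors

# The command substitution runs in a subshell, so the exported LC_ALL
# applies to both sort and join without leaking into the caller.
same_locale=$(
    LC_ALL=C
    export LC_ALL
    sort -b -k 1,1 stock > stock.sorted
    sort -b -k 1,1 colors > colors.sorted
    join stock.sorted colors.sorted
)
printf '%s\n' "$same_locale"

cd / && rm -rf "$tmp"
```

In the C locale uppercase letters collate before lowercase ones, so
Banana precedes cherry in both the sorted inputs and the output.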
ASYNCHRONOUS EVENTS
Default.

STDOUT
The join utility output shall be a concatenation of selected character
fields. When the -o option is not specified, the output shall be:

    "%s%s%s\n", <join field>, <other file1 fields>,
        <other file2 fields>

If the join field is not the first field in a file, the
<other file fields> for that file shall be:

    <fields preceding join field>, <fields following join field>

When the -o option is specified, the output format shall be:

    "%s\n", <concatenation of fields>

where the concatenation of fields is described by the -o option, above.

For either format, each field (except the last) shall be written with
its trailing separator character. If the separator is the default
(<blank> characters), a single <space> shall be written after each
field (except the last).

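The two output formats can be seen with a join field that is not the
first field of file 1. This is a sketch under assumed data: the files
orders and users are hypothetical, with orders ordered on its second
field and users on its first.

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

printf '%s\n' '100 alice book' '101 bob pen' > orders
printf '%s\n' 'alice oslo' 'bob york' > users

# Without -o: the join field first, then the fields before and after
# it from file 1, then the other fields of file 2.
default_fmt=$(LC_ALL=C join -1 2 -2 1 orders users)

# With -o: only the listed fields, in the listed order.
listed_fmt=$(LC_ALL=C join -1 2 -2 1 -o 1.1,0,2.2 orders users)

printf '%s\n' "$default_fmt" "$listed_fmt"

cd / && rm -rf "$tmp"
```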
STDERR
The standard error shall be used only for diagnostic messages.

OUTPUT FILES
None.

EXTENDED DESCRIPTION
None.

EXIT STATUS
The following exit values shall be returned:

 0  All input files were output successfully.

>0  An error occurred.

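A script can branch on these exit values. The sketch below uses
hypothetical file names and provokes an error by naming a file that
does not exist. Only the zero or non-zero distinction is relied on,
since the exact non-zero value is not specified above.

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

printf '%s\n' 'a 1' > one
printf '%s\n' 'a 2' > two

# Successful run: exit status 0.
LC_ALL=C join one two > /dev/null
ok_status=$?

# One operand names a file that does not exist: exit status above 0.
LC_ALL=C join one no-such-file > /dev/null 2>&1
err_status=$?

printf 'ok=%s error=%s\n' "$ok_status" "$err_status"

cd / && rm -rf "$tmp"
```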
CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
Pathnames consisting of numeric digits or of the form string.string
should not be specified directly following the -o list.

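If such a file name cannot be avoided, one conventional workaround
(an assumption of this example, not something the text above
prescribes) is to prefix the operand with ./ so that it no longer
resembles a file_number.field element. The file names 1.2 and 2.1 are
hypothetical and chosen to look like list elements.

```shell
# Scratch directory; all file names and data below are hypothetical.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Files whose names look like elements of a -o list.
printf '%s\n' 'a x' > 1.2
printf '%s\n' 'a y' > 2.1

# The ./ prefix keeps the operands from resembling file_number.field.
prefixed=$(LC_ALL=C join -o 0,1.2,2.2 ./1.2 ./2.1)
printf '%s\n' "$prefixed"

cd / && rm -rf "$tmp"
```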
EXAMPLES
The -o 0 field essentially selects the union of the join fields. For
example, given file phone:

    !Name           Phone Number
    Don             +1 123-456-7890
    Hal             +1 234-567-8901
    Yasushi         +2 345-678-9012

and file fax:

    !Name           Fax Number
    Don             +1 123-456-7899
    Keith           +1 456-789-0122
    Yasushi         +2 345-678-9011

(where the large expanses of white space are meant to each represent a
single <tab>), the command:

    join -t "<tab>" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax

would produce:

    !Name           Phone Number            Fax Number
    Don             +1 123-456-7890         +1 123-456-7899
    Hal             +1 234-567-8901         (unknown)
    Keith           (unknown)               +1 456-789-0122
    Yasushi         +2 345-678-9012         +2 345-678-9011

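The phone and fax example can be reproduced with real <tab> characters
along the following lines. This is a sketch: the scratch directory, the
tab variable, and the use of LC_ALL=C (so that the "!Name" header line
collates before the names) are choices made for this illustration.

```shell
# Scratch directory; the data is taken from the example above.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# A literal <tab>, used as the -t option-argument.
tab=$(printf '\t')

printf '!Name\tPhone Number\nDon\t+1 123-456-7890\nHal\t+1 234-567-8901\nYasushi\t+2 345-678-9012\n' > phone
printf '!Name\tFax Number\nDon\t+1 123-456-7899\nKeith\t+1 456-789-0122\nYasushi\t+2 345-678-9011\n' > fax

# Full outer join on the name, filling absent numbers with (unknown).
merged=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax)
printf '%s\n' "$merged"

cd / && rm -rf "$tmp"
```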
Multiple instances of the same key will produce combinatorial results.
The following:

    fa:
        a x
        a y
        a z
    fb:
        a p

will produce:

    a x p
    a y p
    a z p

And the following:

    fa:
        a b c
        a d e
    fb:
        a w x
        a y z
        a o p

will produce:

    a b c w x
    a b c y z
    a b c o p
    a d e w x
    a d e y z
    a d e o p

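The second combinatorial case can be checked directly. A brief sketch,
creating fa and fb in a scratch directory chosen for this example:

```shell
# Scratch directory; fa and fb hold the data from the example above.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

printf '%s\n' 'a b c' 'a d e' > fa
printf '%s\n' 'a w x' 'a y z' 'a o p' > fb

# Two lines in fa times three lines in fb, all with key a, give six
# output lines.
combos=$(LC_ALL=C join fa fb)
printf '%s\n' "$combos"

cd / && rm -rf "$tmp"
```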
RATIONALE
The -e option is only effective when used with -o because, unless
specific fields are identified using -o, join is not aware of what fields
might be empty. The exception to this is the join field, but identifying
an empty join field with the -e string is not historical practice
and some scripts might break if this were changed.

The 0 field in the -o list was adopted from the Tenth Edition version
of join to satisfy international objections that the join in the base
documents does not support the "full join" or "outer join"
described in relational database literature. Although it has been
possible to include a join field in the output (by default, or by field
number using -o), the join field could not be included for an unpaired
line selected by -a. The -o 0 field essentially selects the union of
the join fields.

This sort of outer join was not possible with the join commands in the
base documents. The -o 0 field was chosen because it is an
upwards-compatible change for applications. An alternative was considered:
have the join field represent the union of the fields in the files (where
they are identical for matched lines, and one or both are null for
unmatched lines). This was not adopted because it would break some
historical applications.

The ability to specify file2 as - is not historical practice; it was
added for completeness.

The -v option is not historical practice, but was considered necessary
because it permitted the writing of only those lines that do not match
on the join field, as opposed to the -a option, which prints both lines
that do and do not match. This additional facility is parallel with the
-v option of grep.

Some historical implementations have been encountered where a blank
line in one of the input files was considered to be the end of the
file; the description in this volume of POSIX.1-2008 does not cite this
as an allowable case.

Earlier versions of this standard allowed -j, -j1, -j2 options, and a
form of the -o option that allowed the list option-argument to be
multiple arguments. These forms are no longer specified by POSIX.1-2008
but may be present in some implementations.

FUTURE DIRECTIONS
None.

SEE ALSO
awk, comm, sort, uniq

The Base Definitions volume of POSIX.1-2008, Chapter 8, Environment
Variables, Section 12.2, Utility Syntax Guidelines

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2013 Edition, Standard for Information Technology
-- Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, Copyright (C) 2013 by the Institute of
Electrical and Electronics Engineers, Inc and The Open Group. (This is
POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.) In the
event of any discrepancy between this version and the original IEEE and
The Open Group Standard, the original IEEE and The Open Group Standard
is the referee document. The original Standard can be obtained online
at http://www.unix.org/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the source
files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2013                           JOIN(1P)