JOIN(1P)              POSIX Programmer's Manual              JOIN(1P)

PROLOG
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the corresponding
Linux manual page for details of Linux behavior), or the interface may
not be implemented on Linux.

NAME
join - relational database operator

SYNOPSIS
join [-a file_number|-v file_number] [-e string] [-o list] [-t char]
    [-1 field] [-2 field] file1 file2

DESCRIPTION
The join utility shall perform an equality join on the files file1 and
file2. The joined files shall be written to the standard output.

The join field is a field in each file on which the files are compared.
The join utility shall write one line in the output for each pair of
lines in file1 and file2 that have join fields that collate equally.
The output line by default shall consist of the join field, then the
remaining fields from file1, then the remaining fields from file2.
This format can be changed by using the -o option (see below). The -a
option can be used to add unmatched lines to the output. The -v option
can be used to output only unmatched lines.

The files file1 and file2 shall be ordered in the collating sequence of
sort -b on the fields on which they shall be joined, by default the
first in each line. All selected output shall be written in the same
collating sequence.

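The ordering requirement above can be met by sorting both inputs on the
join field before calling join. The following is a minimal sketch, not
part of the standard text; the file names and sample data are
hypothetical, and LC_ALL=C is set only so that sort and join agree on
one collating sequence.

```shell
# Illustrative data only; the file names are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' 'pear 3' 'apple 1' 'fig 2' > "$tmp/qty"
printf '%s\n' 'fig purple' 'apple red' 'pear green' > "$tmp/color"

# Sort each input on field 1, ignoring leading blanks, in the same
# locale that join will run in.
LC_ALL=C sort -b -k 1,1 "$tmp/qty" > "$tmp/qty.sorted"
LC_ALL=C sort -b -k 1,1 "$tmp/color" > "$tmp/color.sorted"

# Default output: join field, rest of file1, rest of file2.
joined=$(LC_ALL=C join "$tmp/qty.sorted" "$tmp/color.sorted")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

With this data the three keys apple, fig, and pear each pair exactly
once, so three lines are written.
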
The default input field separators shall be <blank> characters. In this
case, multiple separators shall count as one field separator, and
leading separators shall be ignored. The default output field separator
shall be a <space>.

The field separator and collating sequence can be changed by using the
-t option (see below).

If the same key appears more than once in either file, all combinations
of the set of remaining fields in file1 and the set of remaining fields
in file2 are output in the order of the lines encountered.

If the input files are not in the appropriate collating sequence, the
results are unspecified.

OPTIONS
The join utility shall conform to the Base Definitions volume of
POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines.

The following options shall be supported:

-a file_number
          Produce a line for each unpairable line in file file_number,
          where file_number is 1 or 2, in addition to the default
          output. If both -a1 and -a2 are specified, all unpairable
          lines shall be output.

-e string Replace empty output fields in the list selected by -o with
          the string string.

-o list   Construct the output line to comprise the fields specified in
          list, each element of which shall have one of the following
          two forms:

           1. file_number.field, where file_number is a file number and
              field is a decimal integer field number

           2. 0 (zero), representing the join field

          The elements of list shall be either <comma>-separated or
          <blank>-separated, as specified in Guideline 8 of the Base
          Definitions volume of POSIX.1‐2017, Section 12.2, Utility
          Syntax Guidelines. The fields specified by list shall be
          written for all selected output lines. Fields selected by
          list that do not appear in the input shall be treated as
          empty output fields. (See the -e option.) Only specifically
          requested fields shall be written. The application shall
          ensure that list is a single command line argument.

-t char   Use character char as a separator, for both input and output.
          Every appearance of char in a line shall be significant. When
          this option is specified, the collating sequence shall be the
          same as sort without the -b option.

-v file_number
          Instead of the default output, produce a line only for each
          unpairable line in file_number, where file_number is 1 or 2.
          If both -v1 and -v2 are specified, all unpairable lines shall
          be output.

-1 field  Join on the fieldth field of file 1. Fields are decimal
          integers starting with 1.

-2 field  Join on the fieldth field of file 2. Fields are decimal
          integers starting with 1.

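As an informal illustration of the options above (not part of the
standard text), the sketch below joins field 1 of one file against
field 2 of another, then repeats the join with -a and with -v. The
file names left and right and their contents are hypothetical; each
file is already sorted on its own join field.

```shell
# Illustrative data only; "left" is keyed on field 1, "right" on field 2.
tmp=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' > "$tmp/left"
printf '%s\n' 'x a' 'y c' 'z d' > "$tmp/right"

# Paired lines only.
inner=$(LC_ALL=C join -1 1 -2 2 "$tmp/left" "$tmp/right")

# Paired lines plus the unpairable lines of file 1.
left_outer=$(LC_ALL=C join -a 1 -1 1 -2 2 "$tmp/left" "$tmp/right")

# Only the unpairable lines of file 2.
right_only=$(LC_ALL=C join -v 2 -1 1 -2 2 "$tmp/left" "$tmp/right")

printf '%s\n' "$inner" '--' "$left_outer" '--' "$right_only"
rm -rf "$tmp"
```

Because no -o list is given, every output line starts with the join
field, including the unpaired line from file 2, whose key is its
second input field.
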
OPERANDS
The following operands shall be supported:

file1, file2
          A pathname of a file to be joined. If either of the file1 or
          file2 operands is '-', the standard input shall be used in
          its place.

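The '-' operand makes it possible to sort one input on the fly and
pipe it into join. A minimal sketch under assumed data (the file name
names and the piped lines are hypothetical):

```shell
# Illustrative data only.
tmp=$(mktemp -d)
printf '%s\n' '1 one' '2 two' '3 three' > "$tmp/names"

# file1 is read from standard input after being sorted on field 1.
stdin_join=$(
    printf '%s\n' '3 c' '1 a' |
    LC_ALL=C sort -b -k 1,1 |
    LC_ALL=C join - "$tmp/names"
)
printf '%s\n' "$stdin_join"
rm -rf "$tmp"
```
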
STDIN
The standard input shall be used only if the file1 or file2 operand is
'-'. See the INPUT FILES section.

INPUT FILES
The input files shall be text files.

ENVIRONMENT VARIABLES
The following environment variables shall affect the execution of join:

LANG      Provide a default value for the internationalization
          variables that are unset or null. (See the Base Definitions
          volume of POSIX.1‐2017, Section 8.2, Internationalization
          Variables for the precedence of internationalization
          variables used to determine the values of locale categories.)

LC_ALL    If set to a non-empty string value, override the values of
          all the other internationalization variables.

LC_COLLATE
          Determine the locale of the collating sequence join expects
          to have been used when the input files were sorted.

LC_CTYPE  Determine the locale for the interpretation of sequences of
          bytes of text data as characters (for example, single-byte as
          opposed to multi-byte characters in arguments and input
          files).

LC_MESSAGES
          Determine the locale that should be used to affect the format
          and contents of diagnostic messages written to standard
          error.

NLSPATH   Determine the location of message catalogs for the processing
          of LC_MESSAGES.

ASYNCHRONOUS EVENTS
Default.

STDOUT
The join utility output shall be a concatenation of selected character
fields. When the -o option is not specified, the output shall be:

    "%s%s%s\n", <join field>, <other file1 fields>,
        <other file2 fields>

If the join field is not the first field in a file, the
<other file fields> for that file shall be:

    <fields preceding join field>, <fields following join field>

When the -o option is specified, the output format shall be:

    "%s\n", <concatenation of fields>

where the concatenation of fields is described by the -o option, above.

For either format, each field (except the last) shall be written with
its trailing separator character. If the separator is the default
(<blank> characters), a single <space> shall be written after each
field (except the last).

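The two output formats can be compared side by side. The sketch below
is illustrative only: the colon-separated files users and homes are
hypothetical, and the -o list 2.2,0,1.2 is just one possible
selection.

```shell
# Illustrative data only; fields are separated by ':'.
tmp=$(mktemp -d)
printf '%s\n' 'u1:alice:staff' 'u2:bob:admin' > "$tmp/users"
printf '%s\n' 'u1:/home/alice' 'u2:/home/bob' > "$tmp/homes"

# Without -o: join field, other file1 fields, other file2 fields.
default_fmt=$(LC_ALL=C join -t : "$tmp/users" "$tmp/homes")

# With -o: only the listed fields, in the listed order.
custom_fmt=$(LC_ALL=C join -t : -o 2.2,0,1.2 "$tmp/users" "$tmp/homes")

printf '%s\n' "$default_fmt" '--' "$custom_fmt"
rm -rf "$tmp"
```

In both cases the -t character is also written between output fields,
and no separator follows the last field.
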
STDERR
The standard error shall be used only for diagnostic messages.

OUTPUT FILES
None.

EXTENDED DESCRIPTION
None.

EXIT STATUS
The following exit values shall be returned:

 0    All input files were output successfully.

>0    An error occurred.

CONSEQUENCES OF ERRORS
Default.

The following sections are informative.

APPLICATION USAGE
Pathnames consisting of numeric digits or of the form string.string
should not be specified directly following the -o list.

If the collating sequence of the current locale does not have a total
ordering of all characters (see the Base Definitions volume of
POSIX.1‐2017, Section 7.3.2, LC_COLLATE), join treats fields that
collate equally but are not identical as being the same. If this
behavior is not desired, it can be avoided by forcing the use of the
POSIX locale (although this means re-sorting the input files into the
POSIX locale collating sequence).

When using join to process pathnames, it is recommended that LC_ALL, or
at least LC_CTYPE and LC_COLLATE, are set to POSIX or C in the
environment, since pathnames can contain byte sequences that do not
form valid characters in some locales, in which case the utility's
behavior would be undefined. In the POSIX locale each byte is a valid
single-byte character, and therefore this problem is avoided.

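One way to follow this advice is to run both the sort and the join
steps with LC_ALL set to C. The following is a sketch under assumed
data; the files sizes and owners and the pathnames in them are
hypothetical.

```shell
# Illustrative data only; field 1 of each file is a pathname.
tmp=$(mktemp -d)
printf '%s\n' './b.txt 20' './B.txt 10' './a/c 30' > "$tmp/sizes"
printf '%s\n' './a/c alice' './b.txt bob' './B.txt carol' > "$tmp/owners"

# The command substitution runs in a subshell, so exporting LC_ALL here
# does not change the locale of the calling shell.
path_join=$(
    LC_ALL=C; export LC_ALL
    sort -b -k 1,1 "$tmp/sizes" > "$tmp/sizes.sorted"
    sort -b -k 1,1 "$tmp/owners" > "$tmp/owners.sorted"
    join "$tmp/sizes.sorted" "$tmp/owners.sorted"
)
printf '%s\n' "$path_join"
rm -rf "$tmp"
```

In the C locale the keys are compared byte by byte, so ./B.txt sorts
before ./a/c and ./b.txt, and the two names that differ only in case
remain distinct keys.
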
EXAMPLES
The -o 0 field essentially selects the union of the join fields. For
example, given file phone:

    !Name           Phone Number
    Don             +1 123-456-7890
    Hal             +1 234-567-8901
    Yasushi         +2 345-678-9012

and file fax:

    !Name           Fax Number
    Don             +1 123-456-7899
    Keith           +1 456-789-0122
    Yasushi         +2 345-678-9011

(where the large expanses of white space are meant to each represent a
single <tab>), the command:

    join -t "<tab>" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax

(where <tab> is a literal <tab> character) would produce:

    !Name           Phone Number            Fax Number
    Don             +1 123-456-7890         +1 123-456-7899
    Hal             +1 234-567-8901         (unknown)
    Keith           (unknown)               +1 456-789-0122
    Yasushi         +2 345-678-9012         +2 345-678-9011

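The phone and fax example can be reproduced with the following sketch,
which builds the two files with printf so that the <tab> characters
are explicit. The temporary directory and the variable names tab and
outer are assumptions made for illustration; LC_ALL=C is used so that
the header line beginning with '!' collates before the names.

```shell
tmp=$(mktemp -d)
tab=$(printf '\t')

# printf reuses its format for each pair of arguments: name<TAB>number.
printf '%s\t%s\n' \
    '!Name' 'Phone Number' \
    'Don' '+1 123-456-7890' \
    'Hal' '+1 234-567-8901' \
    'Yasushi' '+2 345-678-9012' > "$tmp/phone"
printf '%s\t%s\n' \
    '!Name' 'Fax Number' \
    'Don' '+1 123-456-7899' \
    'Keith' '+1 456-789-0122' \
    'Yasushi' '+2 345-678-9011' > "$tmp/fax"

# Full outer join: keep unpairable lines from both files and fill the
# missing number with the -e string.
outer=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' \
    -o 0,1.2,2.2 "$tmp/phone" "$tmp/fax")
printf '%s\n' "$outer"
rm -rf "$tmp"
```
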
Multiple instances of the same key will produce combinatorial results.
The following:

    fa:
        a x
        a y
        a z
    fb:
        a p

will produce:

    a x p
    a y p
    a z p

And the following:

    fa:
        a b c
        a d e
    fb:
        a w x
        a y z
        a o p

will produce:

    a b c w x
    a b c y z
    a b c o p
    a d e w x
    a d e y z
    a d e o p

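The second combinatorial case can be checked with a short script. This
is a sketch only; the temporary directory and the variable names
combos and count are assumptions made for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a b c' 'a d e' > "$tmp/fa"
printf '%s\n' 'a w x' 'a y z' 'a o p' > "$tmp/fb"

# Two lines in fa and three in fb share the key "a", so every pairing
# of the two sets is written.
combos=$(LC_ALL=C join "$tmp/fa" "$tmp/fb")
count=$(printf '%s\n' "$combos" | wc -l)
printf '%s\n' "$combos"
rm -rf "$tmp"
```

The number of output lines for a key is the product of the number of
lines carrying that key in each file, here 2 x 3 = 6.
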
RATIONALE
The -e option is only effective when used with -o because, unless
specific fields are identified using -o, join is not aware of what
fields might be empty. The exception to this is the join field, but
identifying an empty join field with the -e string is not historical
practice and some scripts might break if this were changed.

The 0 field in the -o list was adopted from the Tenth Edition version
of join to satisfy international objections that the join in the base
documents for IEEE Std 1003.2‐1992 did not support the "full join" or
"outer join" described in relational database literature. Although
it has been possible to include a join field in the output (by default,
or by field number using -o), the join field could not be included for
an unpaired line selected by -a. The -o 0 field essentially selects
the union of the join fields.

This sort of outer join was not possible with the join commands in the
base documents for IEEE Std 1003.2‐1992. The -o 0 field was chosen
because it is an upwards-compatible change for applications. An
alternative was considered: have the join field represent the union of
the fields in the files (where they are identical for matched lines,
and one or both are null for unmatched lines). This was not adopted
because it would break some historical applications.

The ability to specify file2 as - is not historical practice; it was
added for completeness.

The -v option is not historical practice, but was considered necessary
because it permitted the writing of only those lines that do not match
on the join field, as opposed to the -a option, which prints both lines
that do and do not match. This additional facility is parallel with the
-v option of grep.

Some historical implementations have been encountered where a blank
line in one of the input files was considered to be the end of the
file; the description in this volume of POSIX.1‐2017 does not cite this
as an allowable case.

Earlier versions of this standard allowed -j, -j1, -j2 options, and a
form of the -o option that allowed the list option-argument to be
multiple arguments. These forms are no longer specified by POSIX.1‐2008
but may be present in some implementations.

FUTURE DIRECTIONS
None.

SEE ALSO
awk, comm, sort, uniq

The Base Definitions volume of POSIX.1‐2017, Section 7.3.2, LC_COLLATE,
Chapter 8, Environment Variables, Section 12.2, Utility Syntax
Guidelines

COPYRIGHT
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1-2017, Standard for Information Technology --
Portable Operating System Interface (POSIX), The Open Group Base
Specifications Issue 7, 2018 Edition, Copyright (C) 2018 by the
Institute of Electrical and Electronics Engineers, Inc and The Open
Group. In the event of any discrepancy between this version and the
original IEEE and The Open Group Standard, the original IEEE and The
Open Group Standard is the referee document. The original Standard can
be obtained online at http://www.opengroup.org/unix/online.html .

Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the source
files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .

IEEE/The Open Group                  2017                  JOIN(1P)