JOIN(1P)                   POSIX Programmer's Manual                  JOIN(1P)

PROLOG

       This  manual  page is part of the POSIX Programmer's Manual.  The Linux
       implementation of this interface may differ (consult the  corresponding
       Linux  manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME

       join - relational database operator

SYNOPSIS

       join [-a file_number|-v file_number] [-e string] [-o list] [-t char]
           [-1 field] [-2 field] file1 file2

DESCRIPTION

       The join utility shall perform an equality join on the files file1  and
       file2.  The joined files shall be written to the standard output.

       The join field is a field in each file on which the files are compared.
       The join utility shall write one line in the output for  each  pair  of
       lines  in  file1  and file2 that have join fields that collate equally.
       The output line by default shall consist of the join  field,  then  the
       remaining  fields  from  file1,  then  the remaining fields from file2.
       This format can be changed by using the -o option (see below).  The  -a
       option  can be used to add unmatched lines to the output. The -v option
       can be used to output only unmatched lines.

       The files file1 and file2 shall be ordered in the collating sequence of
       sort  -b  on  the  fields on which they shall be joined, by default the
       first in each line. All selected output shall be written  in  the  same
       collating sequence.
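
       As a minimal sketch of the ordering requirement and the default  output
       format,  the  following sorts two small files on their first field and
       then joins them. The file names and contents are hypothetical, and  the
       mktemp utility is assumed to be available (it is not specified by
       POSIX.1-2017).

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

# Hypothetical unsorted inputs: key in field 1, one data field each.
printf '%s\n' 'cherry red' 'apple green' 'banana yellow' > "$tmp/colors.raw"
printf '%s\n' 'banana 5' 'cherry 7' 'apple 3' > "$tmp/counts.raw"

# Order both files on the join field, as join expects (sort -b, field 1).
sort -b -k 1,1 "$tmp/colors.raw" > "$tmp/colors"
sort -b -k 1,1 "$tmp/counts.raw" > "$tmp/counts"

# Default output: join field, rest of file1, rest of file2.
joined=$(join "$tmp/colors" "$tmp/counts")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

       With these inputs the three lines written are "apple green 3",  "banana
       yellow 5", and "cherry red 7".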

       The default input field separators shall be <blank> characters. In this
       case, multiple separators shall count as one field separator, and lead‐
       ing  separators  shall  be  ignored. The default output field separator
       shall be a <space>.

       The field separator and collating sequence can be changed by using  the
       -t option (see below).
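
       The effect of -t on field splitting can be sketched as follows.  The
       colon-separated files are hypothetical, and mktemp is again assumed to
       be available. Unlike the default <blank> handling, two adjacent separa‐
       tors delimit an empty field.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

# Hypothetical colon-separated inputs; "a::x" has an empty second field.
printf '%s\n' 'a::x' 'b:2:y' > "$tmp/f1"
printf '%s\n' 'a:1' 'b:9' > "$tmp/f2"

# With -t, ':' separates fields on both input and output, and the empty
# field between the two adjacent colons is preserved.
colon_out=$(join -t : "$tmp/f1" "$tmp/f2")
printf '%s\n' "$colon_out"

rm -rf "$tmp"
```

       This writes "a::x:1" and "b:2:y:9".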

       If the same key appears more than once in either file, all combinations
       of the set of remaining fields in file1 and the set of remaining fields
       in file2 are output in the order of the lines encountered.

       If  the  input files are not in the appropriate collating sequence, the
       results are unspecified.
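
       Because unordered input gives unspecified results, a script may want to
       verify the ordering first. One possible guard, sketched here with hypo‐
       thetical files and assuming mktemp is available, uses sort -c,  which
       exits with a non-zero status when its input is out of order.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

# Hypothetical input that is NOT ordered on field 1.
printf '%s\n' 'b 2' 'a 1' > "$tmp/f1"
printf '%s\n' 'a x' 'b y' > "$tmp/f2"

# sort -c only checks the order; on failure the file is re-sorted
# before it is handed to join.
if ! sort -c -b -k 1,1 "$tmp/f1" 2>/dev/null; then
    sort -b -k 1,1 "$tmp/f1" > "$tmp/f1.sorted"
    mv "$tmp/f1.sorted" "$tmp/f1"
fi

guarded=$(join "$tmp/f1" "$tmp/f2")
printf '%s\n' "$guarded"

rm -rf "$tmp"
```

       After the re-sort the join writes "a 1 x" and "b 2 y".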

OPTIONS

       The join utility shall  conform  to  the  Base  Definitions  volume  of
       POSIX.1‐2017, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -a file_number
                 Produce  a line for each unpairable line in file file_number,
                 where file_number is 1 or 2, in addition to the default  out‐
                 put.  If both -a1 and -a2 are specified, all unpairable lines
                 shall be output.

       -e string Replace empty output fields in the list selected by  -o  with
                 the string string.

       -o list   Construct the output line to comprise the fields specified in
                 list, each element of which shall have one of  the  following
                 two forms:

                  1. file_number.field, where file_number is a file number and
                     field is a decimal integer field number

                  2. 0 (zero), representing the join field

                 The elements of list shall  be  either  <comma>-separated  or
                 <blank>-separated,  as  specified  in Guideline 8 of the Base
                 Definitions volume of  POSIX.1‐2017,  Section  12.2,  Utility
                 Syntax  Guidelines.   The  fields  specified by list shall be
                 written for all selected output  lines.  Fields  selected  by
                 list  that  do  not  appear  in the input shall be treated as
                 empty output fields. (See the -e option.)  Only  specifically
                 requested  fields  shall  be  written.  The application shall
                 ensure that list is a single command line argument.

       -t char   Use character char as a separator, for both input and output.
                 Every appearance of char in a line shall be significant. When
                 this option is specified, the collating sequence shall be the
                 same as sort without the -b option.

       -v file_number
                 Instead  of  the default output, produce a line only for each
                 unpairable line in file_number, where file_number is 1 or  2.
                 If both -v1 and -v2 are specified, all unpairable lines shall
                 be output.

       -1 field  Join on the fieldth field of file 1. Fields are decimal inte‐
                 gers starting with 1.

       -2 field  Join on the fieldth field of file 2. Fields are decimal inte‐
                 gers starting with 1.
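
       The interaction of -a, -v, -e, and -o can be  sketched  with  two  tiny
       hypothetical  files, assuming mktemp is available. The key "a" appears
       only in the first file and "d" only in the second.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

# Hypothetical inputs: "a" appears only in left, "d" only in right.
printf '%s\n' 'a 1' 'b 2' 'c 3' > "$tmp/left"
printf '%s\n' 'b x' 'c y' 'd z' > "$tmp/right"

# -a 1: paired lines plus the unpairable lines from file 1.
with_a=$(join -a 1 "$tmp/left" "$tmp/right")

# -v 2: only the unpairable lines from file 2.
only_v=$(join -v 2 "$tmp/left" "$tmp/right")

# -a 1 -a 2 with -o and -e: every key from either file, in fixed
# columns; a field missing from an unpaired line is written as NONE.
outer=$(join -a 1 -a 2 -e NONE -o 0,1.2,2.2 "$tmp/left" "$tmp/right")

printf '%s\n' "$with_a" '--' "$only_v" '--' "$outer"

rm -rf "$tmp"
```

       Here -a 1 writes "a 1", "b 2 x", and "c 3 y"; -v 2 writes only  "d  z";
       and  the last command writes "a 1 NONE", "b 2 x", "c 3 y", and "d NONE
       z".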

OPERANDS

       The following operands shall be supported:

       file1, file2
                 A pathname of a file to be joined. If either of the file1  or
                 file2  operands  is  '-', the standard input shall be used in
                 its place.
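
       A short sketch of the '-' operand, with hypothetical data and  assuming
       mktemp is available: the first input arrives on a pipe and the second
       is a regular file.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1
printf '%s\n' 'a x' 'b y' > "$tmp/right"

# file1 is '-', so join reads it from the standard input (the pipe).
from_stdin=$(printf '%s\n' 'a 1' 'b 2' | join - "$tmp/right")
printf '%s\n' "$from_stdin"

rm -rf "$tmp"
```

       This writes "a 1 x" and "b 2 y".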

STDIN

       The standard input shall be used only if the file1 or file2 operand  is
       '-'.  See the INPUT FILES section.

INPUT FILES

       The input files shall be text files.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of join:

       LANG      Provide  a  default  value for the internationalization vari‐
                 ables that are unset or null. (See the Base Definitions  vol‐
                 ume  of POSIX.1‐2017, Section 8.2, Internationalization Vari‐
                 ables for the precedence  of  internationalization  variables
                 used to determine the values of locale categories.)

       LC_ALL    If  set  to  a non-empty string value, override the values of
                 all the other internationalization variables.

       LC_COLLATE
                 Determine the locale of the collating sequence  join  expects
                 to have been used when the input files were sorted.

       LC_CTYPE  Determine  the  locale for the interpretation of sequences of
                 bytes of text data as characters (for example, single-byte as
                 opposed  to  multi-byte  characters  in  arguments  and input
                 files).

       LC_MESSAGES
                 Determine the locale that should be used to affect the format
                 and  contents  of  diagnostic  messages  written  to standard
                 error.

       NLSPATH   Determine the location of message catalogs for the processing
                 of LC_MESSAGES.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The  join utility output shall be a concatenation of selected character
       fields.  When the -o option is not specified, the output shall be:

           "%s%s%s\n", <join field>, <other file1 fields>,
               <other file2 fields>

       If  the  join  field  is  not  the  first  field   in   a   file,   the
       <other file fields> for that file shall be:

           <fields preceding join field>, <fields following join field>

       When the -o option is specified, the output format shall be:

           "%s\n", <concatenation of fields>

       where the concatenation of fields is described by the -o option, above.

       For  either  format, each field (except the last) shall be written with
       its trailing separator character.  If  the  separator  is  the  default
       (<blank>  characters),  a  single  <space>  shall be written after each
       field (except the last).
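
       The field order used when the join field is not the first field can  be
       sketched  as follows. The files are hypothetical, mktemp is assumed to
       be available, and the first file is ordered on its second field.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

# Hypothetical file1: the key is field 2, and the file is ordered on it.
printf '%s\n' '10 apple red' '20 cherry dark' > "$tmp/f1"
printf '%s\n' 'apple 3' 'cherry 7' > "$tmp/f2"

# Join field 2 of file1 with field 1 of file2. The key is written
# first, then the file1 fields before and after it, then file2's rest.
field2=$(join -1 2 -2 1 "$tmp/f1" "$tmp/f2")
printf '%s\n' "$field2"

rm -rf "$tmp"
```

       This writes "apple 10 red 3" and "cherry 20 dark 7".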

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       None.

EXIT STATUS

       The following exit values shall be returned:

        0    All input files were output successfully.

       >0    An error occurred.
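
       A script can act on the exit status as in the following sketch.  The
       file names are hypothetical, mktemp is assumed to be available, and an
       operand naming a nonexistent file is used as one way to provoke an
       error.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1
printf '%s\n' 'a 1' > "$tmp/f1"

# The second operand does not exist, so join is expected to fail.
status=0
join "$tmp/f1" "$tmp/missing" > /dev/null 2>&1 || status=$?

if [ "$status" -gt 0 ]; then
    echo "join reported an error (exit status $status)"
fi

rm -rf "$tmp"
```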

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       Pathnames consisting of numeric digits or  of  the  form  string.string
       should not be specified directly following the -o list.

       If  the  collating sequence of the current locale does not have a total
       ordering  of  all  characters  (see  the  Base  Definitions  volume  of
       POSIX.1‐2017,  Section 7.3.2, LC_COLLATE), join treats fields that col‐
       late equally but are not identical as being the same. If this  behavior
       is  not  desired,  it  can  be  avoided by forcing the use of the POSIX
       locale (although this means re-sorting the input files into  the  POSIX
       locale collating sequence).

       When using join to process pathnames, it is recommended that LC_ALL, or
       at least LC_CTYPE and LC_COLLATE, be set to POSIX or C in the  environ‐
       ment, since pathnames can contain byte sequences that do not form valid
       characters in some locales, in which case the utility's behavior  would
       be  undefined.  In  the  POSIX  locale each byte is a valid single-byte
       character, and therefore this problem is avoided.
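
       One way to force the POSIX locale consistently is to set LC_ALL for both
       the  sort  and  the join, as in this sketch. The files are hypothetical
       and mktemp is assumed to be available. The keys differ in case to  show
       that  the  ordering is by byte value, with "B" collating before "a".

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

# Hypothetical keys that differ in case.
printf '%s\n' 'a 1' 'B 2' > "$tmp/f1.raw"
printf '%s\n' 'B y' 'a x' > "$tmp/f2.raw"

# Sort and join under the same forced locale so both agree on order.
LC_ALL=C sort -b -k 1,1 "$tmp/f1.raw" > "$tmp/f1"
LC_ALL=C sort -b -k 1,1 "$tmp/f2.raw" > "$tmp/f2"
posix_out=$(LC_ALL=C join "$tmp/f1" "$tmp/f2")
printf '%s\n' "$posix_out"

rm -rf "$tmp"
```

       This writes "B 2 y" followed by "a 1 x".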

EXAMPLES

       The -o 0 field essentially selects the union of the  join  fields.  For
       example, given file phone:

           !Name           Phone Number
           Don             +1 123-456-7890
           Hal             +1 234-567-8901
           Yasushi         +2 345-678-9012

       and file fax:

           !Name           Fax Number
           Don             +1 123-456-7899
           Keith           +1 456-789-0122
           Yasushi         +2 345-678-9011

       (where  the large expanses of white space are meant to each represent a
       single <tab>), the command:

           join -t "<tab>" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax

       (where <tab> is a literal <tab> character) would produce:

           !Name           Phone Number            Fax Number
           Don             +1 123-456-7890         +1 123-456-7899
           Hal             +1 234-567-8901         (unknown)
           Keith           (unknown)               +1 456-789-0122
           Yasushi         +2 345-678-9012         +2 345-678-9011
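
       The phone and fax example can be reproduced with  the  sketch  below,
       which  builds the two files with printf so that the <tab> characters are
       explicit. The scratch directory and the use of mktemp are  assumptions,
       and LC_ALL=C is set so that the "!Name" header line collates first.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1
tab=$(printf '\t')

# printf reuses its format for each pair of arguments: name<tab>number.
printf '%s\t%s\n' '!Name' 'Phone Number' 'Don' '+1 123-456-7890' \
    'Hal' '+1 234-567-8901' 'Yasushi' '+2 345-678-9012' > "$tmp/phone"
printf '%s\t%s\n' '!Name' 'Fax Number' 'Don' '+1 123-456-7899' \
    'Keith' '+1 456-789-0122' 'Yasushi' '+2 345-678-9011' > "$tmp/fax"

# Full outer join on the name; missing numbers become "(unknown)".
directory=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' \
    -o 0,1.2,2.2 "$tmp/phone" "$tmp/fax")
printf '%s\n' "$directory"

rm -rf "$tmp"
```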

       Multiple instances of the same key will produce combinatorial  results.
       The following:

           fa:
               a x
               a y
               a z
           fb:
               a p

       will produce:

           a x p
           a y p
           a z p

       And the following:

           fa:
               a b c
               a d e
           fb:
               a w x
               a y z
               a o p

       will produce:

           a b c w x
           a b c y z
           a b c o p
           a d e w x
           a d e y z
           a d e o p
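
       The second combinatorial example can be run as the following sketch,
       assuming mktemp is available. Two lines in fa and three lines in fb
       share the key "a", so six lines are written.

```shell
# Scratch directory; mktemp is an assumption, not part of POSIX.1-2017.
tmp=$(mktemp -d) || exit 1

printf '%s\n' 'a b c' 'a d e' > "$tmp/fa"
printf '%s\n' 'a w x' 'a y z' 'a o p' > "$tmp/fb"

# Every fa line with key "a" is paired with every fb line with key "a".
combos=$(join "$tmp/fa" "$tmp/fb")
count=$(printf '%s\n' "$combos" | wc -l | tr -d ' ')
printf '%s\n' "$combos"
echo "lines written: $count"

rm -rf "$tmp"
```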

RATIONALE

       The  -e option is only effective when used with -o because, unless spe‐
       cific fields are identified using -o, join is not aware of what  fields
       might  be empty. The exception to this is the join field, but identify‐
       ing an empty join field with the -e string is not  historical  practice
       and some scripts might break if this were changed.

       The  0  field in the -o list was adopted from the Tenth Edition version
       of join to satisfy international objections that the join in  the  base
       documents  for IEEE Std 1003.2‐1992 did not support the "full join" or
       "outer join" described in relational database  literature.   Although
       it has been possible to include a join field in the output (by default,
       or by field number using -o), the join field could not be included  for
       an  unpaired  line  selected by -a.  The -o 0 field essentially selects
       the union of the join fields.

       This sort of outer join was not possible with the join commands in  the
       base  documents  for  IEEE Std  1003.2‐1992.  The -o 0 field was chosen
       because it is an upwards-compatible change for applications. An  alter‐
       native  was  considered: have the join field represent the union of the
       fields in the files (where they are identical for  matched  lines,  and
       one or both are null for unmatched lines). This was not adopted because
       it would break some historical applications.

       The ability to specify file2 as - is not historical  practice;  it  was
       added for completeness.

       The  -v option is not historical practice, but was considered necessary
       because it permitted the writing of only those lines that do not  match
       on the join field, as opposed to the -a option, which prints both lines
       that do and do not match. This additional facility is parallel with the
       -v option of grep.

       Some  historical  implementations  have  been encountered where a blank
       line in one of the input files was considered to  be  the  end  of  the
       file; the description in this volume of POSIX.1‐2017 does not cite this
       as an allowable case.

       Earlier versions of this standard allowed -j, -j1, -j2 options,  and  a
       form  of the -o option that allowed the list option-argument to be mul‐
       tiple arguments. These forms are no longer  specified  by  POSIX.1‐2008
       but may be present in some implementations.

FUTURE DIRECTIONS

       None.

SEE ALSO

       awk, comm, sort, uniq

       The Base Definitions volume of POSIX.1‐2017, Section 7.3.2, LC_COLLATE,
       Chapter 8, Environment Variables, Section 12.2, Utility  Syntax  Guide‐
       lines

COPYRIGHT

       Portions  of  this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1-2017, Standard for Information Technology --  Por‐
       table  Operating System Interface (POSIX), The Open Group Base Specifi‐
       cations Issue 7, 2018 Edition, Copyright (C) 2018 by the  Institute  of
       Electrical  and  Electronics Engineers, Inc and The Open Group.  In the
       event of any discrepancy between this version and the original IEEE and
       The  Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be  obtained  online
       at http://www.opengroup.org/unix/online.html .

       Any  typographical  or  formatting  errors that appear in this page are
       most likely to have been introduced during the conversion of the source
       files  to  man page format. To report such errors, see
       https://www.kernel.org/doc/man-pages/reporting_bugs.html .



IEEE/The Open Group                  2017                             JOIN(1P)