JOIN(1P)                   POSIX Programmer's Manual                  JOIN(1P)

PROLOG

       This manual page is part of the POSIX Programmer's Manual. The Linux
       implementation of this interface may differ (consult the corresponding
       Linux manual page for details of Linux behavior), or the interface may
       not be implemented on Linux.

NAME

       join - relational database operator

SYNOPSIS

       join [-a file_number | -v file_number][-e string][-o list][-t char]
               [-1 field][-2 field] file1 file2

DESCRIPTION

       The join utility shall perform an equality join on the files file1 and
       file2. The joined files shall be written to the standard output.

       The join field is a field in each file on which the files are compared.
       The join utility shall write one line in the output for each pair of
       lines in file1 and file2 that have identical join fields. The output
       line by default shall consist of the join field, then the remaining
       fields from file1, then the remaining fields from file2. This format
       can be changed by using the -o option (see below). The -a option can be
       used to add unmatched lines to the output. The -v option can be used
       to output only unmatched lines.
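
       As a minimal sketch of the default output format, assume two small
       hypothetical files named colors and counts that are already sorted on
       their first field. The file names and contents are illustrative only.

```shell
# Hypothetical sample data; both files are sorted on field 1.
dir=$(mktemp -d)
printf '%s\n' 'apple red' 'banana yellow' 'cherry red' > "$dir/colors"
printf '%s\n' 'apple 3' 'cherry 7' 'date 2' > "$dir/counts"

# Default output: join field, rest of file1's line, rest of file2's line.
out=$(join "$dir/colors" "$dir/counts")
printf '%s\n' "$out"
```

       Only apple and cherry appear in both files, so the sketch should print
       the two lines "apple red 3" and "cherry red 7".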

       The files file1 and file2 shall be ordered in the collating sequence of
       sort -b on the fields on which they shall be joined, by default the
       first in each line. All selected output shall be written in the same
       collating sequence.
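
       One way to meet the ordering requirement is to run sort on each input
       before joining. The sketch below assumes hypothetical unsorted files
       and pins LC_ALL=C for both sort and join so that the two utilities
       agree on the collating sequence; that choice of locale is an
       assumption of the example, not a requirement.

```shell
dir=$(mktemp -d)
# Hypothetical unsorted inputs.
printf '%s\n' 'cherry red' 'apple green' 'banana yellow' > "$dir/colors"
printf '%s\n' 'banana 6' 'cherry 40' 'apple 12' > "$dir/counts"

# Order each file on its first field, as sort -b would.
LC_ALL=C sort -b -k 1,1 "$dir/colors" > "$dir/colors.sorted"
LC_ALL=C sort -b -k 1,1 "$dir/counts" > "$dir/counts.sorted"

# Join the sorted copies under the same locale.
out=$(LC_ALL=C join "$dir/colors.sorted" "$dir/counts.sorted")
printf '%s\n' "$out"
```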

       The default input field separators shall be <blank>s. In this case,
       multiple separators shall count as one field separator, and leading
       separators shall be ignored. The default output field separator shall
       be a <space>.
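
       The following sketch illustrates the default separator handling with
       hypothetical input that mixes leading spaces, runs of spaces, and a
       <tab>. The output should use a single <space> between fields.

```shell
dir=$(mktemp -d)
# Leading blanks are ignored; runs of spaces or tabs count as one separator.
printf '  a    1\nb\t2\n' > "$dir/left"
printf 'a x\nb   y\n' > "$dir/right"

out=$(join "$dir/left" "$dir/right")
printf '%s\n' "$out"
```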

       The field separator and collating sequence can be changed by using the
       -t option (see below).

       If the same key appears more than once in either file, all combinations
       of the set of remaining fields in file1 and the set of remaining fields
       in file2 are output in the order of the lines encountered.

       If the input files are not in the appropriate collating sequence, the
       results are unspecified.

OPTIONS

       The join utility shall conform to the Base Definitions volume of
       IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.

       The following options shall be supported:

       -a  file_number

              Produce a line for each unpairable line in file file_number,
              where file_number is 1 or 2, in addition to the default output.
              If both -a1 and -a2 are specified, all unpairable lines shall be
              output.

       -e  string
              Replace empty output fields in the list selected by -o with the
              string string.

       -o  list
              Construct the output line to comprise the fields specified in
              list, each element of which shall have one of the following two
              forms:

               1. file_number.field, where file_number is a file number and
                  field is a decimal integer field number

               2. 0 (zero), representing the join field

              The elements of list shall be either comma-separated or
              <blank>-separated, as specified in Guideline 8 of the Base
              Definitions volume of IEEE Std 1003.1-2001, Section 12.2,
              Utility Syntax Guidelines. The fields specified by list shall
              be written for all selected output lines. Fields selected by
              list that do not appear in the input shall be treated as empty
              output fields. (See the -e option.) Only specifically requested
              fields shall be written. The application shall ensure that list
              is a single command line argument.

       -t  char
              Use character char as a separator, for both input and output.
              Every appearance of char in a line shall be significant. When
              this option is specified, the collating sequence shall be the
              same as sort without the -b option.

       -v  file_number

              Instead of the default output, produce a line only for each
              unpairable line in file_number, where file_number is 1 or 2. If
              both -v1 and -v2 are specified, all unpairable lines shall be
              output.

       -1  field
              Join on the fieldth field of file 1. Fields are decimal integers
              starting with 1.

       -2  field
              Join on the fieldth field of file 2. Fields are decimal integers
              starting with 1.
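
       The options above can be sketched with a few small invocations. The
       files emp, dept, byname, left, and right and their contents are
       hypothetical, and each file is already sorted on its join field.

```shell
dir=$(mktemp -d)
# Hypothetical data, sorted on the id in field 1.
printf '%s\n' '1 ann' '2 bob' '4 dee' > "$dir/emp"
printf '%s\n' '1 eng' '3 ops' '4 qa' > "$dir/dept"

# -a 1: paired lines plus the unpairable lines from file 1.
left_outer=$(join -a 1 "$dir/emp" "$dir/dept")

# -v 2: only the unpairable lines from file 2.
only_dept=$(join -v 2 "$dir/emp" "$dir/dept")

# -o with -e: fixed output columns, empty fields replaced by NONE.
full=$(join -a 1 -a 2 -e NONE -o 0,1.2,2.2 "$dir/emp" "$dir/dept")

# -t: every ':' is significant, so "k::v" has an empty second field.
printf 'k::v\n' > "$dir/left"
printf 'k:w\n' > "$dir/right"
colon=$(join -t : "$dir/left" "$dir/right")

# -1 and -2: join field 2 of file 1 with field 1 of file 2.
printf '%s\n' 'ann 1' 'bob 2' 'dee 4' > "$dir/byname"
swapped=$(join -1 2 -2 1 "$dir/byname" "$dir/dept")

printf '%s\n' "$left_outer" "$only_dept" "$full" "$colon" "$swapped"
```

       In the -o case, the unpaired line for key 2 should come out as
       "2 bob NONE" and the one for key 3 as "3 NONE ops", because the
       field requested from the other file does not exist.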

OPERANDS

       The following operands shall be supported:

       file1, file2
              A pathname of a file to be joined. If either of the file1 or
              file2 operands is '-', the standard input shall be used in its
              place.
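
       A minimal sketch of the '-' operand, assuming a hypothetical file
       named right and a pipeline that supplies the first input:

```shell
dir=$(mktemp -d)
printf '%s\n' 'a x' 'b y' > "$dir/right"

# '-' as file1 makes join read its first input from standard input.
out=$(printf '%s\n' 'a 1' 'b 2' | join - "$dir/right")
printf '%s\n' "$out"
```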

STDIN

       The standard input shall be used only if the file1 or file2 operand is
       '-'. See the INPUT FILES section.

INPUT FILES

       The input files shall be text files.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of join:

       LANG   Provide a default value for the internationalization variables
              that are unset or null. (See the Base Definitions volume of
              IEEE Std 1003.1-2001, Section 8.2, Internationalization
              Variables for the precedence of internationalization variables
              used to determine the values of locale categories.)

       LC_ALL If set to a non-empty string value, override the values of all
              the other internationalization variables.

       LC_COLLATE

              Determine the locale of the collating sequence join expects to
              have been used when the input files were sorted.

       LC_CTYPE
              Determine the locale for the interpretation of sequences of
              bytes of text data as characters (for example, single-byte as
              opposed to multi-byte characters in arguments and input files).

       LC_MESSAGES
              Determine the locale that should be used to affect the format
              and contents of diagnostic messages written to standard error.

       NLSPATH
              Determine the location of message catalogs for the processing of
              LC_MESSAGES.
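
       Because the expected collating sequence depends on the locale, it can
       help to sort and join under the same setting. The sketch below uses
       LC_ALL=C, in which uppercase letters collate before lowercase ones;
       the file names and data are hypothetical.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a 1' 'B 2' > "$dir/left"
printf '%s\n' 'a y' 'B x' > "$dir/right"

# In the C locale "B" sorts before "a", so both files are reordered.
LC_ALL=C sort -b -k 1,1 "$dir/left" > "$dir/left.sorted"
LC_ALL=C sort -b -k 1,1 "$dir/right" > "$dir/right.sorted"

# join is run under the same locale that was used for sorting.
out=$(LC_ALL=C join "$dir/left.sorted" "$dir/right.sorted")
printf '%s\n' "$out"
```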

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The join utility output shall be a concatenation of selected character
       fields. When the -o option is not specified, the output shall be:

              "%s%s%s\n", <join field>, <other file1 fields>,
                  <other file2 fields>

       If the join field is not the first field in a file, the
       <other file fields> for that file shall be:

              <fields preceding join field>, <fields following join field>

       When the -o option is specified, the output format shall be:

              "%s\n", <concatenation of fields>

       where the concatenation of fields is described by the -o option, above.

       For either format, each field (except the last) shall be written with
       its trailing separator character. If the separator is the default
       (<blank>s), a single <space> shall be written after each field (except
       the last).
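
       The two output formats can be compared in a short sketch. It assumes
       a hypothetical file stock whose join field is its second field, and a
       hypothetical file kinds keyed on its first field.

```shell
dir=$(mktemp -d)
# stock is sorted on its second field, which is the join field here.
printf '%s\n' 'red apple 3' 'yellow banana 5' > "$dir/stock"
printf '%s\n' 'apple fruit' 'banana fruit' > "$dir/kinds"

# Without -o: the join field comes first, followed by file1's other
# fields in their original order, then file2's other fields.
default_out=$(join -1 2 "$dir/stock" "$dir/kinds")

# With -o: exactly the listed fields, in the listed order.
picked=$(join -1 2 -o 1.1,0,2.2 "$dir/stock" "$dir/kinds")

printf '%s\n' "$default_out" "$picked"
```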

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       None.

EXTENDED DESCRIPTION

       None.

EXIT STATUS

       The following exit values shall be returned:

        0     All input files were output successfully.

       >0     An error occurred.
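
       A script can test the exit status directly. The sketch below uses
       hypothetical file names and treats a nonexistent operand as the
       error case; the exact nonzero value is not specified above, so only
       "greater than zero" is assumed.

```shell
dir=$(mktemp -d)
printf 'a 1\n' > "$dir/left"
printf 'a x\n' > "$dir/right"

# Successful join: the status is expected to stay 0.
ok=0
join "$dir/left" "$dir/right" > /dev/null || ok=$?

# The second operand does not exist, so a status above 0 is expected.
failed=0
join "$dir/left" "$dir/missing" > /dev/null 2>&1 || failed=$?

printf 'success: %s, missing file: %s\n' "$ok" "$failed"
```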

CONSEQUENCES OF ERRORS

       Default.

       The following sections are informative.

APPLICATION USAGE

       Pathnames consisting of numeric digits or of the form string.string
       should not be specified directly following the -o list.
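
       One way to follow this advice is to prefix such a pathname with ./ so
       that it no longer resembles a list element. The sketch assumes a
       hypothetical file that happens to be named 1.2 and a second
       hypothetical file named names.

```shell
dir=$(mktemp -d)
printf 'a x\n' > "$dir/1.2"        # a file whose name looks like a list element
printf 'a alpha\n' > "$dir/names"

# Writing ./1.2 keeps the operand from resembling file_number.field.
out=$(cd "$dir" && join -o 0,2.2 ./1.2 ./names)
printf '%s\n' "$out"
```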

EXAMPLES

       The -o 0 field essentially selects the union of the join fields. For
       example, given file phone:

              !Name           Phone Number
              Don             +1 123-456-7890
              Hal             +1 234-567-8901
              Yasushi         +2 345-678-9012

       and file fax:

              !Name           Fax Number
              Don             +1 123-456-7899
              Keith           +1 456-789-0122
              Yasushi         +2 345-678-9011

       (where the large expanses of white space are meant to each represent a
       single <tab>), the command:

              join -t "<tab>" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 phone fax

       would produce:

              !Name           Phone Number            Fax Number
              Don             +1 123-456-7890         +1 123-456-7899
              Hal             +1 234-567-8901         (unknown)
              Keith           (unknown)               +1 456-789-0122
              Yasushi         +2 345-678-9012         +2 345-678-9011
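
       The phone and fax example can be reproduced with a literal <tab>.
       The sketch below builds both files in a scratch directory and runs
       join under LC_ALL=C, an assumption made so that the "!Name" header
       collates before the names.

```shell
dir=$(mktemp -d)
tab=$(printf '\t')    # a literal <tab> character

# Each pair of arguments becomes one tab-separated line.
printf '%s\t%s\n' '!Name' 'Phone Number' \
    'Don' '+1 123-456-7890' \
    'Hal' '+1 234-567-8901' \
    'Yasushi' '+2 345-678-9012' > "$dir/phone"
printf '%s\t%s\n' '!Name' 'Fax Number' \
    'Don' '+1 123-456-7899' \
    'Keith' '+1 456-789-0122' \
    'Yasushi' '+2 345-678-9011' > "$dir/fax"

# Full outer join: unpaired lines from both files, gaps filled by -e.
out=$(LC_ALL=C join -t "$tab" -a 1 -a 2 -e '(unknown)' -o 0,1.2,2.2 \
    "$dir/phone" "$dir/fax")
printf '%s\n' "$out"
```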

       Multiple instances of the same key will produce combinatorial results.
       The following:

              fa:
                  a x
                  a y
                  a z
              fb:
                  a p

       will produce:

              a x p
              a y p
              a z p

       And the following:

              fa:
                  a b c
                  a d e
              fb:
                  a w x
                  a y z
                  a o p

       will produce:

              a b c w x
              a b c y z
              a b c o p
              a d e w x
              a d e y z
              a d e o p
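
       The second combinatorial case can be run as a short sketch. The file
       names fa and fb are taken from the example; the scratch directory is
       an assumption.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a b c' 'a d e' > "$dir/fa"
printf '%s\n' 'a w x' 'a y z' 'a o p' > "$dir/fb"

# Two lines with key "a" in fa and three in fb give 2 x 3 = 6 output lines.
out=$(join "$dir/fa" "$dir/fb")
printf '%s\n' "$out"
```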

RATIONALE

       The -e option is only effective when used with -o because, unless
       specific fields are identified using -o, join is not aware of what
       fields might be empty. The exception to this is the join field, but
       identifying an empty join field with the -e string is not historical
       practice and some scripts might break if this were changed.

       The 0 field in the -o list was adopted from the Tenth Edition version
       of join to satisfy international objections that the join in the base
       documents does not support the "full join" or "outer join" described in
       relational database literature. Although it has been possible to
       include a join field in the output (by default, or by field number
       using -o), the join field could not be included for an unpaired line
       selected by -a. The -o 0 field essentially selects the union of the
       join fields.

       This sort of outer join was not possible with the join commands in the
       base documents. The -o 0 field was chosen because it is an
       upwards-compatible change for applications. An alternative was
       considered: have the join field represent the union of the fields in
       the files (where they are identical for matched lines, and one or both
       are null for unmatched lines). This was not adopted because it would
       break some historical applications.

       The ability to specify file2 as - is not historical practice; it was
       added for completeness.

       The -v option is not historical practice, but was considered necessary
       because it permitted the writing of only those lines that do not match
       on the join field, as opposed to the -a option, which prints both lines
       that do and do not match. This additional facility is parallel with the
       -v option of grep.

       Some historical implementations have been encountered where a blank
       line in one of the input files was considered to be the end of the
       file; the description in this volume of IEEE Std 1003.1-2001 does not
       cite this as an allowable case.

FUTURE DIRECTIONS

       None.

SEE ALSO

       awk, comm, sort, uniq

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form
       from IEEE Std 1003.1, 2003 Edition, Standard for Information Technology
       -- Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of
       Electrical and Electronics Engineers, Inc and The Open Group. In the
       event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group Standard
       is the referee document. The original Standard can be obtained online
       at http://www.opengroup.org/unix/online.html .



IEEE/The Open Group                  2003                             JOIN(1P)