SELECT_TUT(2)              Linux Programmer's Manual             SELECT_TUT(2)

NAME

       select, pselect - synchronous I/O multiplexing

SYNOPSIS

       See select(2)

DESCRIPTION

       The select() and pselect() system calls are used to efficiently monitor
       multiple file descriptors, to see if any of them is, or becomes,
       "ready"; that is, to see whether I/O becomes possible, or an
       "exceptional condition" has occurred on any of the file descriptors.

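       As a minimal sketch of what "ready" means, the hypothetical helper
       fd_ready() below asks select() whether a single descriptor can be read
       from, or written to, right now.  It passes a zero timeout, so the call
       is a one-shot probe that never sleeps; it is not meant as a model for a
       program's main loop.

```c
       #include <stdbool.h>
       #include <sys/select.h>
       #include <unistd.h>

       /* Hypothetical helper: report whether fd is ready for reading
          (want_write == false) or for writing (want_write == true)
          right now.  Returns 1 if ready, 0 if not, -1 on error. */
       static int
       fd_ready(int fd, bool want_write)
       {
           fd_set set;
           struct timeval tv = { 0, 0 };   /* zero timeout: do not block */
           int r;

           FD_ZERO(&set);
           FD_SET(fd, &set);

           r = select(fd + 1, want_write ? NULL : &set,
                      want_write ? &set : NULL, NULL, &tv);
           if (r == -1)
               return -1;

           /* select() clears descriptors that are not ready. */
           return FD_ISSET(fd, &set) ? 1 : 0;
       }
```

       For example, on a freshly created pipe(2) the write end is reported
       writable at once, while the read end becomes readable only after some
       data has been written to the pipe.
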
       This page provides background and tutorial information on the use of
       these system calls.  For details of the arguments and semantics of
       select() and pselect(), see select(2).

   Combining signal and data events
       pselect() is useful if you are waiting for a signal as well as for file
       descriptor(s) to become ready for I/O.  Programs that receive signals
       normally use the signal handler only to raise a global flag.  The
       global flag will indicate that the event must be processed in the main
       loop of the program.  A signal will cause the select() (or pselect())
       call to return with errno set to EINTR.  This behavior is essential so
       that signals can be processed in the main loop of the program;
       otherwise, select() would block indefinitely.

       Now, somewhere in the main loop will be a conditional to check the
       global flag.  So we must ask: what if a signal arrives after the
       conditional, but before the select() call?  The answer is that select()
       would block indefinitely, even though an event is actually pending.
       This race condition is solved by the pselect() call.  pselect()
       atomically replaces the signal mask with the one supplied by the caller
       for the duration of the call, and restores the original mask when it
       returns, so signals that are blocked elsewhere in the program can be
       received only within the pselect() call.  For instance, let us say
       that the event in question was the exit of a child process.  Before the
       start of the main loop, we would block SIGCHLD using sigprocmask(2).
       Our pselect() call would enable SIGCHLD by using an empty signal mask.
       Our program would look like:

       #include <errno.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/select.h>
       #include <sys/wait.h>

       static volatile sig_atomic_t got_SIGCHLD = 0;

       static void
       child_sig_handler(int sig)
       {
           got_SIGCHLD = 1;
       }

       int
       main(int argc, char *argv[])
       {
           sigset_t sigmask, empty_mask;
           struct sigaction sa;
           fd_set readfds, writefds, exceptfds;
           int nfds, r;

           sigemptyset(&sigmask);
           sigaddset(&sigmask, SIGCHLD);
           if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == -1) {
               perror("sigprocmask");
               exit(EXIT_FAILURE);
           }

           sa.sa_flags = 0;
           sa.sa_handler = child_sig_handler;
           sigemptyset(&sa.sa_mask);
           if (sigaction(SIGCHLD, &sa, NULL) == -1) {
               perror("sigaction");
               exit(EXIT_FAILURE);
           }

           sigemptyset(&empty_mask);

           for (;;) {          /* main loop */
               /* Initialize readfds, writefds, and exceptfds, and set
                  nfds to the highest descriptor in any set plus 1,
                  before each pselect() call.  (Adding the program's own
                  descriptors with FD_SET() is omitted.) */
               FD_ZERO(&readfds);
               FD_ZERO(&writefds);
               FD_ZERO(&exceptfds);
               nfds = 0;

               r = pselect(nfds, &readfds, &writefds, &exceptfds,
                           NULL, &empty_mask);
               if (r == -1 && errno != EINTR) {
                   perror("pselect");
                   exit(EXIT_FAILURE);
               }

               if (got_SIGCHLD) {
                   got_SIGCHLD = 0;

                   /* Handle signalled event here; e.g., reap all
                      terminated children without blocking. */
                   while (waitpid(-1, NULL, WNOHANG) > 0)
                       continue;
               }

               /* main body of program */
           }
       }

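       The claim that pselect() closes the race can be checked with a small
       self-contained sketch.  The hypothetical function
       sigchld_caught_in_pselect() blocks SIGCHLD, forks a child that exits at
       once, and only then calls pselect() with an empty mask.  Even if the
       child has already terminated, SIGCHLD stays pending while it is
       blocked, so it is delivered inside pselect(), which returns -1 with
       errno set to EINTR.  A pipe is watched only so that the set is not
       empty; nothing is ever written to it.

```c
       #define _POSIX_C_SOURCE 200809L
       #include <errno.h>
       #include <signal.h>
       #include <stdlib.h>
       #include <sys/select.h>
       #include <sys/wait.h>
       #include <unistd.h>

       static volatile sig_atomic_t caught = 0;

       static void
       on_sigchld(int sig)
       {
           (void) sig;
           caught = 1;
       }

       /* Hypothetical demo: returns 1 if a SIGCHLD raised before the
          pselect() call was delivered inside pselect(), 0 otherwise,
          and -1 on a setup error. */
       static int
       sigchld_caught_in_pselect(void)
       {
           sigset_t block, empty, old;
           struct sigaction sa;
           fd_set readfds;
           int p[2], r, saved_errno;
           pid_t pid;

           sigemptyset(&block);
           sigaddset(&block, SIGCHLD);
           if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
               return -1;

           sa.sa_flags = 0;
           sa.sa_handler = on_sigchld;
           sigemptyset(&sa.sa_mask);
           if (sigaction(SIGCHLD, &sa, NULL) == -1 || pipe(p) == -1)
               return -1;

           caught = 0;
           pid = fork();
           if (pid == -1)
               return -1;
           if (pid == 0)
               _exit(EXIT_SUCCESS);    /* child: terminate at once */

           FD_ZERO(&readfds);
           FD_SET(p[0], &readfds);     /* never becomes readable */
           sigemptyset(&empty);

           /* SIGCHLD is unblocked only for the duration of this call. */
           r = pselect(p[0] + 1, &readfds, NULL, NULL, NULL, &empty);
           saved_errno = errno;

           waitpid(pid, NULL, 0);      /* reap the child */
           close(p[0]);
           close(p[1]);
           sigprocmask(SIG_SETMASK, &old, NULL);

           return r == -1 && saved_errno == EINTR && caught;
       }
```

       If the mask were instead lifted with sigprocmask(2) just before a plain
       select(), the pending SIGCHLD would be delivered at that moment, before
       select() started, and select() would then block on the pipe forever;
       that is the race described above.
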
   Practical
       So what is the point of select()?  Can't I just read from and write to
       my file descriptors whenever I want?  The point of select() is that it
       watches multiple descriptors at the same time and properly puts the
       process to sleep if there is no activity.  UNIX programmers often find
       themselves in a position where they have to handle I/O from more than
       one file descriptor where the data flow may be intermittent.  If you
       were to merely create a sequence of read(2) and write(2) calls, you
       would find that one of your calls may block waiting for data from/to a
       file descriptor, while another file descriptor is unused though ready
       for I/O.  select() efficiently copes with this situation.

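       To make this concrete, here is a minimal sketch (the function name
       wait_readable() is hypothetical) that sleeps in select(), with no
       timeout, until at least one descriptor in an array is readable, and
       returns the index of the first ready one.  A plain read(2) on an idle
       descriptor in the array would block even though another descriptor
       already has data waiting.

```c
       #include <errno.h>
       #include <sys/select.h>
       #include <unistd.h>

       /* Hypothetical helper: block until one of fds[0..n-1] is
          readable and return its index; return -1 on error or if
          there is nothing to wait for.  Entries equal to -1 are
          skipped. */
       static int
       wait_readable(const int *fds, int n)
       {
           fd_set readfds;
           int i, nfds, r;

           for (;;) {
               FD_ZERO(&readfds);          /* rebuild: select() modifies it */
               nfds = 0;
               for (i = 0; i < n; i++) {
                   if (fds[i] >= 0) {
                       FD_SET(fds[i], &readfds);
                       if (fds[i] + 1 > nfds)
                           nfds = fds[i] + 1;
                   }
               }
               if (nfds == 0)
                   return -1;              /* empty set would block forever */

               r = select(nfds, &readfds, NULL, NULL, NULL);  /* no timeout */
               if (r == -1 && errno == EINTR)
                   continue;               /* interrupted: wait again */
               if (r == -1)
                   return -1;

               for (i = 0; i < n; i++)
                   if (fds[i] >= 0 && FD_ISSET(fds[i], &readfds))
                       return i;
           }
       }
```
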
   Select law
       Many people who try to use select() come across behavior that is
       difficult to understand and produces nonportable or borderline results.
       For instance, the TCP forwarding program in EXAMPLES below is carefully
       written not to block at any point, even though it does not set its file
       descriptors to nonblocking mode.  It is easy to introduce subtle errors
       that will remove the advantage of using select(), so here is a list of
       essentials to watch for when using select().

       1.  You should always try to use select() without a timeout.  Your
           program should have nothing to do if there is no data available.
           Code that depends on timeouts is not usually portable and is
           difficult to debug.

       2.  The value nfds must be calculated correctly: it is the
           highest-numbered file descriptor in any of the three sets, plus 1
           (see select(2)).  A value that is too small causes descriptors to
           be ignored; one that is too large makes the kernel examine more
           descriptors than necessary.

       3.  Do not add a file descriptor to any set unless you intend to check
           its result after the select() call and respond appropriately.  See
           next rule.

       4.  After select() returns, all file descriptors in all sets should be
           checked to see if they are ready.

       5.  The functions read(2), recv(2), write(2), and send(2) do not
           necessarily read/write the full amount of data that you have
           requested.  If they do read/write the full amount, it's because you
           have a low traffic load and a fast stream.  This is not always
           going to be the case.  You should cope with the case of your
           functions managing to send or receive only a single byte.

       6.  Never read/write only a single byte at a time unless you are really
           sure that you have a small amount of data to process.  It is
           extremely inefficient not to read/write as much data as you can
           buffer each time.  The buffers in the example below are 1024 bytes
           although they could easily be made larger.

       7.  Calls to read(2), recv(2), write(2), send(2), and select() can fail
           with the error EINTR, and calls to read(2), recv(2), write(2), and
           send(2) can fail with errno set to EAGAIN (EWOULDBLOCK).  These
           results must be properly managed (the example program below does
           not do this properly).  If your program is not going to receive any
           signals, then it is unlikely you will get EINTR.  If your program
           does not set nonblocking I/O, you will not get EAGAIN.

       8.  Never call read(2), recv(2), write(2), or send(2) with a buffer
           length of zero.

       9.  If the functions read(2), recv(2), write(2), and send(2) fail with
           errors other than those listed in 7., or one of the input functions
           returns 0, indicating end of file, then you should not pass that
           file descriptor to select() again.  In the example below, I close
           the file descriptor immediately, and then set it to -1 to prevent
           it being included in a set.

       10. The timeout value must be initialized with each new call to
           select(), since some operating systems (Linux among them) modify
           the structure.  pselect() however does not modify its timeout
           structure.

       11. Since select() modifies its file descriptor sets, if the call is
           being used in a loop, then the sets must be reinitialized before
           each call.

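       Rules 5, 7, and 9 can be gathered in one place.  The sketch below is
       one possible approach, not the one used by the example program: a
       hypothetical classify_read() performs a single read(2) into the free
       part of a buffer and reports whether data arrived, whether the call
       should simply be retried after the next select() (EINTR, EAGAIN, or
       EWOULDBLOCK), or whether the descriptor should be closed and set to -1
       (end of file or any other error).

```c
       #include <errno.h>
       #include <sys/types.h>
       #include <unistd.h>

       enum io_status { IO_DATA, IO_RETRY, IO_CLOSE };

       /* Hypothetical helper: read at most 'space' bytes into buf;
          space must be greater than 0 (rule 8).  On IO_DATA, *nread
          holds the byte count, which may be less than requested
          (rule 5). */
       static enum io_status
       classify_read(int fd, char *buf, size_t space, size_t *nread)
       {
           ssize_t n;

           *nread = 0;
           n = read(fd, buf, space);
           if (n > 0) {
               *nread = (size_t) n;
               return IO_DATA;
           }
           if (n == 0)
               return IO_CLOSE;            /* end of file (rule 9) */
           if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
               return IO_RETRY;            /* transient failure (rule 7) */
           return IO_CLOSE;                /* any other error (rule 9) */
       }
```

       In a select() loop, IO_RETRY means "leave the descriptor in its set
       and wait again", while IO_CLOSE corresponds to what the example program
       does with SHUT_FD1 and SHUT_FD2.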

RETURN VALUE

       See select(2).

NOTES

       Generally speaking, all operating systems that support sockets also
       support select().  select() can be used to solve many problems in a
       portable and efficient way that naive programmers try to solve in a
       more complicated manner using threads, forking, IPCs, signals, memory
       sharing, and so on.

       The poll(2) system call has the same functionality as select(), and is
       somewhat more efficient when monitoring sparse file descriptor sets,
       because it is passed an array listing only the descriptors of
       interest, whereas the work done by select() grows with nfds.  It is
       nowadays widely available, but historically was less portable than
       select().

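       For comparison, here is a minimal sketch of the same kind of readiness
       wait written with poll(2).  The caller lists only the descriptors of
       interest in an array of struct pollfd, so no nfds calculation is
       needed, and because results come back in the separate revents field,
       the requested events are not overwritten.  The function name
       poll_readable() and the cap of 64 descriptors are assumptions of this
       sketch.

```c
       #include <errno.h>
       #include <poll.h>
       #include <unistd.h>

       #define MAX_POLL_FDS 64             /* arbitrary cap for this sketch */

       /* Hypothetical helper: block until one of fds[0..n-1] is
          readable (or its peer has hung up) and return its index;
          return -1 on error.  Negative entries are ignored by poll(). */
       static int
       poll_readable(const int *fds, int n)
       {
           struct pollfd pfd[MAX_POLL_FDS];
           int i, r, valid = 0;

           if (n < 1 || n > MAX_POLL_FDS)
               return -1;
           for (i = 0; i < n; i++) {
               pfd[i].fd = fds[i];         /* poll() skips fd < 0 */
               pfd[i].events = POLLIN;
               pfd[i].revents = 0;
               if (fds[i] >= 0)
                   valid++;
           }
           if (valid == 0)
               return -1;                  /* nothing to wait for */

           do
               r = poll(pfd, (nfds_t) n, -1);   /* -1: no timeout */
           while (r == -1 && errno == EINTR);
           if (r == -1)
               return -1;

           for (i = 0; i < n; i++)
               if (pfd[i].revents & (POLLIN | POLLHUP))
                   return i;
           return -1;
       }
```
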
       The Linux-specific epoll(7) API provides an interface that is more
       efficient than select(2) and poll(2) when monitoring large numbers of
       file descriptors.

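       A minimal, Linux-only sketch of the epoll(7) pattern follows:
       descriptors are registered once with epoll_ctl(2), and each
       epoll_wait(2) call returns only the descriptors that are ready, instead
       of sets that must be rebuilt and scanned.  The function name
       epoll_first_readable() is hypothetical, and a real program would keep
       the epoll instance open across calls rather than creating one each
       time as this sketch does.

```c
       #include <errno.h>
       #include <sys/epoll.h>
       #include <unistd.h>

       /* Hypothetical helper: register fds[0..n-1] for input, wait for
          one ready event, and return the ready descriptor, or -1. */
       static int
       epoll_first_readable(const int *fds, int n)
       {
           struct epoll_event ev, ready;
           int epfd, i, r;

           if (n < 1)
               return -1;
           epfd = epoll_create1(EPOLL_CLOEXEC);
           if (epfd == -1)
               return -1;

           for (i = 0; i < n; i++) {
               ev.events = EPOLLIN;
               ev.data.fd = fds[i];        /* handed back by epoll_wait() */
               if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) == -1) {
                   close(epfd);
                   return -1;
               }
           }

           do
               r = epoll_wait(epfd, &ready, 1, -1);  /* -1: no timeout */
           while (r == -1 && errno == EINTR);

           close(epfd);
           return r == 1 ? ready.data.fd : -1;
       }
```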

EXAMPLES

       Here is an example that better demonstrates the true utility of
       select().  The listing below is a TCP forwarding program that accepts
       a connection on one TCP port and forwards its traffic, in both
       directions, to a given IP address and port.

       #include <stdlib.h>
       #include <stdio.h>
       #include <unistd.h>
       #include <sys/select.h>
       #include <string.h>
       #include <signal.h>
       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <arpa/inet.h>
       #include <errno.h>

       static int forward_port;

       #undef max
       #define max(x,y) ((x) > (y) ? (x) : (y))

       static int
       listen_socket(int listen_port)
       {
           struct sockaddr_in addr;
           int lfd;
           int yes;

           lfd = socket(AF_INET, SOCK_STREAM, 0);
           if (lfd == -1) {
               perror("socket");
               return -1;
           }

           yes = 1;
           if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR,
                   &yes, sizeof(yes)) == -1) {
               perror("setsockopt");
               close(lfd);
               return -1;
           }

           memset(&addr, 0, sizeof(addr));     /* sin_addr = INADDR_ANY */
           addr.sin_port = htons(listen_port);
           addr.sin_family = AF_INET;
           if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
               perror("bind");
               close(lfd);
               return -1;
           }

           if (listen(lfd, 10) == -1) {
               perror("listen");
               close(lfd);
               return -1;
           }
           printf("accepting connections on port %d\n", listen_port);
           return lfd;
       }

       static int
       connect_socket(int connect_port, char *address)
       {
           struct sockaddr_in addr;
           int cfd;

           cfd = socket(AF_INET, SOCK_STREAM, 0);
           if (cfd == -1) {
               perror("socket");
               return -1;
           }

           memset(&addr, 0, sizeof(addr));
           addr.sin_port = htons(connect_port);
           addr.sin_family = AF_INET;

           if (!inet_aton(address, &addr.sin_addr)) {
               fprintf(stderr, "inet_aton(): bad IP address format\n");
               close(cfd);
               return -1;
           }

           if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
               perror("connect()");
               shutdown(cfd, SHUT_RDWR);
               close(cfd);
               return -1;
           }
           return cfd;
       }

       #define SHUT_FD1 do {                                \
                            if (fd1 >= 0) {                 \
                                shutdown(fd1, SHUT_RDWR);   \
                                close(fd1);                 \
                                fd1 = -1;                   \
                            }                               \
                        } while (0)

       #define SHUT_FD2 do {                                \
                            if (fd2 >= 0) {                 \
                                shutdown(fd2, SHUT_RDWR);   \
                                close(fd2);                 \
                                fd2 = -1;                   \
                            }                               \
                        } while (0)

       #define BUF_SIZE 1024

       int
       main(int argc, char *argv[])
       {
           int h;
           int fd1 = -1, fd2 = -1;
           char buf1[BUF_SIZE], buf2[BUF_SIZE];
           int buf1_avail = 0, buf1_written = 0;
           int buf2_avail = 0, buf2_written = 0;

           if (argc != 4) {
               fprintf(stderr, "Usage\n\tfwd <listen-port> "
                        "<forward-to-port> <forward-to-ip-address>\n");
               exit(EXIT_FAILURE);
           }

           signal(SIGPIPE, SIG_IGN);

           forward_port = atoi(argv[2]);

           h = listen_socket(atoi(argv[1]));
           if (h == -1)
               exit(EXIT_FAILURE);

           for (;;) {
               int ready, nfds = 0;
               ssize_t nbytes;
               fd_set readfds, writefds, exceptfds;

               FD_ZERO(&readfds);
               FD_ZERO(&writefds);
               FD_ZERO(&exceptfds);
               FD_SET(h, &readfds);
               nfds = max(nfds, h);

               /* Note: nfds is updated below, when fd1 and fd2 are
                  added to exceptfds. */
               if (fd1 >= 0 && buf1_avail < BUF_SIZE)
                   FD_SET(fd1, &readfds);
               if (fd2 >= 0 && buf2_avail < BUF_SIZE)
                   FD_SET(fd2, &readfds);

               if (fd1 >= 0 && buf2_avail - buf2_written > 0)
                   FD_SET(fd1, &writefds);
               if (fd2 >= 0 && buf1_avail - buf1_written > 0)
                   FD_SET(fd2, &writefds);

               if (fd1 >= 0) {
                   FD_SET(fd1, &exceptfds);
                   nfds = max(nfds, fd1);
               }
               if (fd2 >= 0) {
                   FD_SET(fd2, &exceptfds);
                   nfds = max(nfds, fd2);
               }

               ready = select(nfds + 1, &readfds, &writefds, &exceptfds, NULL);

               if (ready == -1 && errno == EINTR)
                   continue;

               if (ready == -1) {
                   perror("select()");
                   exit(EXIT_FAILURE);
               }

               if (FD_ISSET(h, &readfds)) {
                   socklen_t addrlen;
                   struct sockaddr_in client_addr;
                   int fd;

                   addrlen = sizeof(client_addr);
                   memset(&client_addr, 0, addrlen);
                   fd = accept(h, (struct sockaddr *) &client_addr, &addrlen);
                   if (fd == -1) {
                       perror("accept()");
                   } else {
                       SHUT_FD1;
                       SHUT_FD2;
                       buf1_avail = buf1_written = 0;
                       buf2_avail = buf2_written = 0;
                       fd1 = fd;
                       fd2 = connect_socket(forward_port, argv[3]);
                       if (fd2 == -1)
                           SHUT_FD1;
                       else
                           printf("connect from %s\n",
                                   inet_ntoa(client_addr.sin_addr));

                       /* Skip any events on the old, closed file
                          descriptors. */

                       continue;
                   }
               }

               /* NB: read OOB data before normal reads. */

               if (fd1 >= 0 && FD_ISSET(fd1, &exceptfds)) {
                   char c;

                   nbytes = recv(fd1, &c, 1, MSG_OOB);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       send(fd2, &c, 1, MSG_OOB);
               }
               if (fd2 >= 0 && FD_ISSET(fd2, &exceptfds)) {
                   char c;

                   nbytes = recv(fd2, &c, 1, MSG_OOB);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       send(fd1, &c, 1, MSG_OOB);
               }
               if (fd1 >= 0 && FD_ISSET(fd1, &readfds)) {
                   nbytes = read(fd1, buf1 + buf1_avail,
                             BUF_SIZE - buf1_avail);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       buf1_avail += nbytes;
               }
               if (fd2 >= 0 && FD_ISSET(fd2, &readfds)) {
                   nbytes = read(fd2, buf2 + buf2_avail,
                             BUF_SIZE - buf2_avail);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       buf2_avail += nbytes;
               }
               if (fd1 >= 0 && FD_ISSET(fd1, &writefds) && buf2_avail > 0) {
                   nbytes = write(fd1, buf2 + buf2_written,
                              buf2_avail - buf2_written);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       buf2_written += nbytes;
               }
               if (fd2 >= 0 && FD_ISSET(fd2, &writefds) && buf1_avail > 0) {
                   nbytes = write(fd2, buf1 + buf1_written,
                              buf1_avail - buf1_written);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       buf1_written += nbytes;
               }

               /* Check if write data has caught read data. */

               if (buf1_written == buf1_avail)
                   buf1_written = buf1_avail = 0;
               if (buf2_written == buf2_avail)
                   buf2_written = buf2_avail = 0;

               /* If one side has closed the connection, keep writing
                  to the other side until its buffer is empty, then
                  close that side too. */

               if (fd1 < 0 && buf1_avail - buf1_written == 0)
                   SHUT_FD2;
               if (fd2 < 0 && buf2_avail - buf2_written == 0)
                   SHUT_FD1;
           }
           exit(EXIT_SUCCESS);
       }

       The above program properly forwards most kinds of TCP connections,
       including out-of-band (OOB) data such as that sent by telnet servers.
       It handles the tricky problem of having data flow in both directions
       simultaneously.  You might think it more efficient to use a fork(2)
       call and devote a process (or a thread) to each stream.  This becomes
       more tricky than you might suspect.  Another idea is to set nonblocking
       I/O using fcntl(2).  This also has its problems because you end up
       using inefficient timeouts.

       The program does not handle more than one simultaneous connection at a
       time, although it could easily be extended to do this with a linked
       list of buffers, one for each connection.  At the moment, new
       connections cause the current connection to be dropped.

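       As a rough illustration of that extension (a sketch under stated
       assumptions, not part of the program above), each connection can carry
       its own pair of descriptors and buffers in a hypothetical struct conn,
       kept in a singly linked list; the read set is then rebuilt before each
       select() call by walking the list.  Only readfds is shown; writefds and
       exceptfds would follow the same pattern as in the program above.

```c
       #include <stddef.h>
       #include <sys/select.h>
       #include <unistd.h>

       #define BUF_SIZE 1024

       /* Hypothetical per-connection state: one instance per client. */
       struct conn {
           int fd1, fd2;                   /* client side, forward side */
           char buf1[BUF_SIZE], buf2[BUF_SIZE];
           int buf1_avail, buf1_written;   /* data read from fd1 */
           int buf2_avail, buf2_written;   /* data read from fd2 */
           struct conn *next;
       };

       /* Put the listening socket h, and every open descriptor whose
          buffer has free space, into readfds; return nfds (the highest
          descriptor plus 1) for the select() call. */
       static int
       build_readfds(int h, const struct conn *list, fd_set *readfds)
       {
           int maxfd = h;
           const struct conn *c;

           FD_ZERO(readfds);
           FD_SET(h, readfds);
           for (c = list; c != NULL; c = c->next) {
               if (c->fd1 >= 0 && c->buf1_avail < BUF_SIZE) {
                   FD_SET(c->fd1, readfds);
                   if (c->fd1 > maxfd)
                       maxfd = c->fd1;
               }
               if (c->fd2 >= 0 && c->buf2_avail < BUF_SIZE) {
                   FD_SET(c->fd2, readfds);
                   if (c->fd2 > maxfd)
                       maxfd = c->fd2;
               }
           }
           return maxfd + 1;
       }
```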

SEE ALSO

       accept(2), connect(2), poll(2), read(2), recv(2), select(2), send(2),
       sigprocmask(2), write(2), epoll(7)

COLOPHON

       This page is part of release 5.12 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2021-03-22                     SELECT_TUT(2)