SELECT_TUT(2)              Linux Programmer's Manual             SELECT_TUT(2)

NAME

       select, pselect - synchronous I/O multiplexing

SYNOPSIS

       See select(2).

DESCRIPTION

       The select() and pselect() system calls are used to efficiently monitor
       multiple file descriptors, to see if any of them is, or becomes,
       "ready"; that is, to see whether I/O becomes possible, or an
       "exceptional condition" has occurred on any of the file descriptors.

       This page provides background and tutorial information on the use of
       these system calls. For details of the arguments and semantics of
       select() and pselect(), see select(2).

   Combining signal and data events
       pselect() is useful if you are waiting for a signal as well as for file
       descriptor(s) to become ready for I/O. Programs that receive signals
       normally use the signal handler only to raise a global flag. The
       global flag indicates that the event must be processed in the main
       loop of the program. A signal causes the select() (or pselect())
       call to return -1 with errno set to EINTR. This behavior is essential
       so that signals can be processed in the main loop of the program;
       otherwise select() would block indefinitely.

       Now, somewhere in the main loop will be a conditional to check the
       global flag. So we must ask: what if a signal arrives after the
       conditional, but before the select() call? The answer is that select()
       would block indefinitely, even though an event is actually pending.
       This race condition is solved by the pselect() call. This call takes a
       signal mask that is installed atomically for the duration of the call,
       so that chosen signals can be delivered only within the pselect()
       call. For instance, let us say that the event in question was the exit
       of a child process. Before the start of the main loop, we would block
       SIGCHLD using sigprocmask(2). Our pselect() call would unblock SIGCHLD
       by using an empty signal mask. Our program would look like:

       #include <errno.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/select.h>

       static volatile sig_atomic_t got_SIGCHLD = 0;

       static void
       child_sig_handler(int sig)
       {
           got_SIGCHLD = 1;
       }

       int
       main(int argc, char *argv[])
       {
           sigset_t sigmask, empty_mask;
           struct sigaction sa;
           fd_set readfds, writefds, exceptfds;
           int r;
           int nfds = 0;       /* highest descriptor in any set, plus 1 */

           /* Block SIGCHLD so that it is delivered only inside pselect() */

           sigemptyset(&sigmask);
           sigaddset(&sigmask, SIGCHLD);
           if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == -1) {
               perror("sigprocmask");
               exit(EXIT_FAILURE);
           }

           sa.sa_flags = 0;
           sa.sa_handler = child_sig_handler;
           sigemptyset(&sa.sa_mask);
           if (sigaction(SIGCHLD, &sa, NULL) == -1) {
               perror("sigaction");
               exit(EXIT_FAILURE);
           }

           sigemptyset(&empty_mask);

           for (;;) {          /* main loop */
               /* Initialize readfds, writefds, exceptfds, and nfds
                  before the pselect() call. (Code omitted.) */

               r = pselect(nfds, &readfds, &writefds, &exceptfds,
                           NULL, &empty_mask);
               if (r == -1 && errno != EINTR) {
                   /* Handle error */
               }

               if (got_SIGCHLD) {
                   got_SIGCHLD = 0;

                   /* Handle signalled event here; e.g., wait() for all
                      terminated children. (Code omitted.) */
               }

               /* main body of program */
           }
       }

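       The skeleton above omits the descriptor handling. As a minimal,
       self-contained sketch of the same pattern, the following helper forks
       a child that exits at once and then waits for SIGCHLD inside pselect().
       The names demo_pselect_sigchld(), on_sigchld(), and got_sigchld are
       hypothetical, chosen only for this illustration, and the sketch assumes
       that no descriptors need monitoring, so nfds is 0 and the sets are NULL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void
on_sigchld(int sig)
{
    (void) sig;
    got_sigchld = 1;            /* only raise the flag in the handler */
}

/* Hypothetical helper: returns 1 once the child's exit has been seen
   through pselect(), or -1 on error. */
int
demo_pselect_sigchld(void)
{
    sigset_t block_mask, empty_mask;
    struct sigaction sa;
    pid_t pid;

    /* Block SIGCHLD everywhere except inside pselect() */
    sigemptyset(&block_mask);
    sigaddset(&block_mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block_mask, NULL) == -1)
        return -1;

    sa.sa_flags = 0;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    /* Even if the child has already exited, SIGCHLD stays pending
       until pselect() installs the empty mask, so it cannot be lost. */
    sigemptyset(&empty_mask);
    while (!got_sigchld) {
        int r = pselect(0, NULL, NULL, NULL, NULL, &empty_mask);
        if (r == -1 && errno != EINTR)
            return -1;
    }
    got_sigchld = 0;

    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return 1;
}
```

       Calling demo_pselect_sigchld() from main() should return 1 regardless
       of whether the child exits before or after the parent reaches
       pselect(), which is the point of blocking the signal beforehand.
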
   Practical
       So what is the point of select()? Can't I just read and write to my
       file descriptors whenever I want? The point of select() is that it
       watches multiple descriptors at the same time and properly puts the
       process to sleep if there is no activity. UNIX programmers often find
       themselves in a position where they have to handle I/O from more than
       one file descriptor where the data flow may be intermittent. If you
       were to merely create a sequence of read(2) and write(2) calls, you
       would find that one of your calls may block waiting for data from/to a
       file descriptor, while another file descriptor is unused though ready
       for I/O. select() efficiently copes with this situation.

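       To make the situation concrete, here is a small sketch that waits on
       two descriptors at once and reports whichever becomes readable first.
       The function name wait_readable() is hypothetical, and the sketch
       assumes both descriptors are valid and smaller than FD_SETSIZE; it does
       not retry on EINTR.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: blocks until fd_a or fd_b is readable and returns
   the ready descriptor (fd_a is preferred if both are ready), or -1 on
   error. */
int
wait_readable(int fd_a, int fd_b)
{
    fd_set readfds;
    int nfds = (fd_a > fd_b ? fd_a : fd_b) + 1;

    FD_ZERO(&readfds);
    FD_SET(fd_a, &readfds);
    FD_SET(fd_b, &readfds);

    /* Sleep until at least one descriptor has data; no timeout */
    if (select(nfds, &readfds, NULL, NULL, NULL) == -1)
        return -1;

    if (FD_ISSET(fd_a, &readfds))
        return fd_a;
    if (FD_ISSET(fd_b, &readfds))
        return fd_b;
    return -1;
}
```

       With two pipes where only the second has data written to it, a plain
       read(2) on the first would block forever, whereas wait_readable()
       returns the read end of the second.
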
   Select law
       Many people who try to use select() come across behavior that is
       difficult to understand and produces nonportable or borderline results.
       For instance, the program in EXAMPLES below is carefully written not to
       block at any point, even though it does not set its file descriptors to
       nonblocking mode. It is easy to introduce subtle errors that will
       remove the advantage of using select(), so here is a list of essentials
       to watch for when using select().

       1.  You should always try to use select() without a timeout. Your
           program should have nothing to do if there is no data available.
           Code that depends on timeouts is not usually portable and is
           difficult to debug.

       2.  The value nfds must be properly calculated for efficiency: it
           should be the highest-numbered file descriptor in any of the
           sets, plus 1, as explained in select(2).

       3.  Do not add a file descriptor to any set if you do not intend to
           check its result after the select() call, and respond
           appropriately. See next rule.

       4.  After select() returns, all file descriptors in all sets should be
           checked to see if they are ready.

       5.  The functions read(2), recv(2), write(2), and send(2) do not
           necessarily read/write the full amount of data that you have
           requested. If they do read/write the full amount, it's because you
           have a low traffic load and a fast stream. This is not always going
           to be the case. You should cope with the case of your functions
           managing to send or receive only a single byte.

       6.  Never read/write only in single bytes at a time unless you are
           really sure that you have a small amount of data to process. It is
           extremely inefficient not to read/write as much data as you can
           buffer each time. The buffers in the example below are 1024 bytes
           although they could easily be made larger.

       7.  Calls to read(2), recv(2), write(2), send(2), and select() can fail
           with the error EINTR, and calls to read(2), recv(2), write(2), and
           send(2) can fail with errno set to EAGAIN (EWOULDBLOCK). These
           results must be properly managed (not done properly in the example
           below). If your program is not going to receive any signals, then
           it is unlikely you will get EINTR. If your program does not set
           nonblocking I/O, you will not get EAGAIN.

       8.  Never call read(2), recv(2), write(2), or send(2) with a buffer
           length of zero.

       9.  If the functions read(2), recv(2), write(2), and send(2) fail with
           errors other than those listed in 7., or one of the input functions
           returns 0, indicating end of file, then you should not pass that
           file descriptor to select() again. In the example below, I close
           the file descriptor immediately, and then set it to -1 to prevent
           it being included in a set.

       10. The timeout value must be initialized with each new call to
           select(), since some operating systems modify the structure.
           pselect(), however, does not modify its timeout structure.

       11. Since select() modifies its file descriptor sets, if the call is
           being used in a loop, then the sets must be reinitialized before
           each call.


RETURN VALUE

       See select(2).

NOTES

       Generally speaking, all operating systems that support sockets also
       support select(). select() can be used to solve many problems in a
       portable and efficient way that naive programmers try to solve in a
       more complicated manner using threads, forking, IPCs, signals, memory
       sharing, and so on.

       The poll(2) system call has the same functionality as select(), and is
       somewhat more efficient when monitoring sparse file descriptor sets.
       It is nowadays widely available, but historically was less portable
       than select().

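       For comparison, a wait on two descriptors might be written with
       poll(2) roughly as follows. The function name poll_readable() is
       hypothetical; the sketch checks only POLLIN and returns -1 if the call
       fails or reports some other event, such as POLLHUP alone.

```c
#include <poll.h>

/* Hypothetical helper: blocks until fd_a or fd_b is readable and returns
   the ready descriptor, or -1 on error or if only other events occurred. */
int
poll_readable(int fd_a, int fd_b)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };

    /* A timeout of -1 means wait indefinitely */
    if (poll(fds, 2, -1) == -1)
        return -1;

    if (fds[0].revents & POLLIN)
        return fd_a;
    if (fds[1].revents & POLLIN)
        return fd_b;
    return -1;
}
```

       The array holds one entry per monitored descriptor, so its size does
       not depend on the numeric value of the highest descriptor, and there is
       no nfds value to compute from the descriptor numbers.
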
       The Linux-specific epoll(7) API provides an interface that is more
       efficient than select(2) and poll(2) when monitoring large numbers of
       file descriptors.

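       A rough epoll(7) sketch for two descriptors is shown below. It is
       Linux-specific, the function name epoll_readable() is hypothetical, and
       it creates and destroys the epoll instance on every call purely to stay
       short; a real program would normally create the instance once and keep
       it for the lifetime of its main loop.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: blocks until fd_a or fd_b is readable and returns
   the ready descriptor, or -1 on error. */
int
epoll_readable(int fd_a, int fd_b)
{
    struct epoll_event ev, fired;
    int epfd, ready = -1;

    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    /* Register interest once per descriptor; the kernel keeps the list */
    ev.events = EPOLLIN;
    ev.data.fd = fd_a;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd_a, &ev) == -1)
        goto done;
    ev.data.fd = fd_b;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd_b, &ev) == -1)
        goto done;

    /* Ask for at most one event; a timeout of -1 waits indefinitely */
    if (epoll_wait(epfd, &fired, 1, -1) == 1)
        ready = fired.data.fd;

done:
    close(epfd);
    return ready;
}
```

       Unlike select(), the interest list is not rebuilt before each wait,
       which is where the efficiency gain for large descriptor counts comes
       from.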

EXAMPLES

       Here is an example that better demonstrates the true utility of
       select(). The listing below is a TCP forwarding program that forwards
       from one TCP port to another.

       #include <stdlib.h>
       #include <stdio.h>
       #include <unistd.h>
       #include <sys/select.h>
       #include <string.h>
       #include <signal.h>
       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <arpa/inet.h>
       #include <errno.h>

       static int forward_port;

       #undef max
       #define max(x,y) ((x) > (y) ? (x) : (y))

       static int
       listen_socket(int listen_port)
       {
           struct sockaddr_in addr;
           int lfd;
           int yes;

           lfd = socket(AF_INET, SOCK_STREAM, 0);
           if (lfd == -1) {
               perror("socket");
               return -1;
           }

           yes = 1;
           if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR,
                   &yes, sizeof(yes)) == -1) {
               perror("setsockopt");
               close(lfd);
               return -1;
           }

           memset(&addr, 0, sizeof(addr));
           addr.sin_port = htons(listen_port);
           addr.sin_family = AF_INET;
           if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
               perror("bind");
               close(lfd);
               return -1;
           }

           if (listen(lfd, 10) == -1) {
               perror("listen");
               close(lfd);
               return -1;
           }

           printf("accepting connections on port %d\n", listen_port);
           return lfd;
       }

       static int
       connect_socket(int connect_port, char *address)
       {
           struct sockaddr_in addr;
           int cfd;

           cfd = socket(AF_INET, SOCK_STREAM, 0);
           if (cfd == -1) {
               perror("socket");
               return -1;
           }

           memset(&addr, 0, sizeof(addr));
           addr.sin_port = htons(connect_port);
           addr.sin_family = AF_INET;

           if (!inet_aton(address, &addr.sin_addr)) {
               fprintf(stderr, "inet_aton(): bad IP address format\n");
               close(cfd);
               return -1;
           }

           if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
               perror("connect()");
               shutdown(cfd, SHUT_RDWR);
               close(cfd);
               return -1;
           }
           return cfd;
       }

       #define SHUT_FD1 do {                                \
                            if (fd1 >= 0) {                 \
                                shutdown(fd1, SHUT_RDWR);   \
                                close(fd1);                 \
                                fd1 = -1;                   \
                            }                               \
                        } while (0)

       #define SHUT_FD2 do {                                \
                            if (fd2 >= 0) {                 \
                                shutdown(fd2, SHUT_RDWR);   \
                                close(fd2);                 \
                                fd2 = -1;                   \
                            }                               \
                        } while (0)

       #define BUF_SIZE 1024

       int
       main(int argc, char *argv[])
       {
           int h;
           int fd1 = -1, fd2 = -1;
           char buf1[BUF_SIZE], buf2[BUF_SIZE];
           int buf1_avail = 0, buf1_written = 0;
           int buf2_avail = 0, buf2_written = 0;

           if (argc != 4) {
               fprintf(stderr, "Usage\n\tfwd <listen-port> "
                        "<forward-to-port> <forward-to-ip-address>\n");
               exit(EXIT_FAILURE);
           }

           signal(SIGPIPE, SIG_IGN);

           forward_port = atoi(argv[2]);

           h = listen_socket(atoi(argv[1]));
           if (h == -1)
               exit(EXIT_FAILURE);

           for (;;) {
               int ready, nfds = 0;
               ssize_t nbytes;
               fd_set readfds, writefds, exceptfds;

               FD_ZERO(&readfds);
               FD_ZERO(&writefds);
               FD_ZERO(&exceptfds);
               FD_SET(h, &readfds);
               nfds = max(nfds, h);

               /* Note: nfds is updated below, when fd1 and fd2 are
                  added to exceptfds. */

               if (fd1 > 0 && buf1_avail < BUF_SIZE)
                   FD_SET(fd1, &readfds);
               if (fd2 > 0 && buf2_avail < BUF_SIZE)
                   FD_SET(fd2, &readfds);

               if (fd1 > 0 && buf2_avail - buf2_written > 0)
                   FD_SET(fd1, &writefds);
               if (fd2 > 0 && buf1_avail - buf1_written > 0)
                   FD_SET(fd2, &writefds);

               if (fd1 > 0) {
                   FD_SET(fd1, &exceptfds);
                   nfds = max(nfds, fd1);
               }
               if (fd2 > 0) {
                   FD_SET(fd2, &exceptfds);
                   nfds = max(nfds, fd2);
               }

               ready = select(nfds + 1, &readfds, &writefds, &exceptfds, NULL);

               if (ready == -1 && errno == EINTR)
                   continue;

               if (ready == -1) {
                   perror("select()");
                   exit(EXIT_FAILURE);
               }

               if (FD_ISSET(h, &readfds)) {
                   socklen_t addrlen;
                   struct sockaddr_in client_addr;
                   int fd;

                   addrlen = sizeof(client_addr);
                   memset(&client_addr, 0, addrlen);
                   fd = accept(h, (struct sockaddr *) &client_addr, &addrlen);
                   if (fd == -1) {
                       perror("accept()");
                   } else {
                       SHUT_FD1;
                       SHUT_FD2;
                       buf1_avail = buf1_written = 0;
                       buf2_avail = buf2_written = 0;
                       fd1 = fd;
                       fd2 = connect_socket(forward_port, argv[3]);
                       if (fd2 == -1)
                           SHUT_FD1;
                       else
                           printf("connect from %s\n",
                                   inet_ntoa(client_addr.sin_addr));

                       /* Skip any events on the old, closed file
                          descriptors. */

                       continue;
                   }
               }

               /* NB: read OOB data before normal reads */

               if (fd1 > 0 && FD_ISSET(fd1, &exceptfds)) {
                   char c;

                   nbytes = recv(fd1, &c, 1, MSG_OOB);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       send(fd2, &c, 1, MSG_OOB);
               }
               if (fd2 > 0 && FD_ISSET(fd2, &exceptfds)) {
                   char c;

                   nbytes = recv(fd2, &c, 1, MSG_OOB);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       send(fd1, &c, 1, MSG_OOB);
               }
               if (fd1 > 0 && FD_ISSET(fd1, &readfds)) {
                   nbytes = read(fd1, buf1 + buf1_avail,
                             BUF_SIZE - buf1_avail);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       buf1_avail += nbytes;
               }
               if (fd2 > 0 && FD_ISSET(fd2, &readfds)) {
                   nbytes = read(fd2, buf2 + buf2_avail,
                             BUF_SIZE - buf2_avail);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       buf2_avail += nbytes;
               }
               if (fd1 > 0 && FD_ISSET(fd1, &writefds) && buf2_avail > 0) {
                   nbytes = write(fd1, buf2 + buf2_written,
                              buf2_avail - buf2_written);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       buf2_written += nbytes;
               }
               if (fd2 > 0 && FD_ISSET(fd2, &writefds) && buf1_avail > 0) {
                   nbytes = write(fd2, buf1 + buf1_written,
                              buf1_avail - buf1_written);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       buf1_written += nbytes;
               }

               /* Check if write data has caught read data */

               if (buf1_written == buf1_avail)
                   buf1_written = buf1_avail = 0;
               if (buf2_written == buf2_avail)
                   buf2_written = buf2_avail = 0;

               /* One side has closed the connection, keep
                  writing to the other side until empty */

               if (fd1 < 0 && buf1_avail - buf1_written == 0)
                   SHUT_FD2;
               if (fd2 < 0 && buf2_avail - buf2_written == 0)
                   SHUT_FD1;
           }
           exit(EXIT_SUCCESS);
       }

       The above program properly forwards most kinds of TCP connections
       including OOB signal data transmitted by telnet servers. It handles
       the tricky problem of having data flow in both directions
       simultaneously. You might think it more efficient to use a fork(2)
       call and devote a process to each stream. This becomes more tricky
       than you might suspect. Another idea is to set nonblocking I/O using
       fcntl(2). This also has its problems because you end up using
       inefficient timeouts.

       The program does not handle more than one connection at a time,
       although it could easily be extended to do this with a linked list of
       buffers, one for each connection. At the moment, new connections cause
       the current connection to be dropped.

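       One possible shape for such a linked list is sketched below. The
       struct conn type and the functions conn_add() and conn_fill_readfds()
       are hypothetical names, not part of the program above; the fields
       simply mirror the per-connection variables that main() currently keeps
       as locals. Only the read set is shown; the write and exception sets
       would be filled in the same way.

```c
#include <stdlib.h>
#include <sys/select.h>

#define BUF_SIZE 1024

/* Hypothetical per-connection state, one node per forwarded connection */
struct conn {
    int fd1, fd2;                       /* client side, forwarded side */
    char buf1[BUF_SIZE], buf2[BUF_SIZE];
    int buf1_avail, buf1_written;
    int buf2_avail, buf2_written;
    struct conn *next;
};

/* Push a new connection on the front of the list; NULL on failure */
struct conn *
conn_add(struct conn **head, int fd1, int fd2)
{
    struct conn *c = calloc(1, sizeof(*c));

    if (c == NULL)
        return NULL;
    c->fd1 = fd1;
    c->fd2 = fd2;
    c->next = *head;
    *head = c;
    return c;
}

/* Add every descriptor that still has buffer space to readfds.
   Returns the highest descriptor added, or -1 if none was added. */
int
conn_fill_readfds(const struct conn *head, fd_set *readfds)
{
    int max_fd = -1;

    for (const struct conn *c = head; c != NULL; c = c->next) {
        if (c->fd1 >= 0 && c->buf1_avail < BUF_SIZE) {
            FD_SET(c->fd1, readfds);
            if (c->fd1 > max_fd)
                max_fd = c->fd1;
        }
        if (c->fd2 >= 0 && c->buf2_avail < BUF_SIZE) {
            FD_SET(c->fd2, readfds);
            if (c->fd2 > max_fd)
                max_fd = c->fd2;
        }
    }
    return max_fd;
}
```

       The main loop would then call conn_fill_readfds() after FD_ZERO(),
       combine its result with the listening socket when computing nfds, and
       walk the same list again after select() returns to service each
       connection.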

SEE ALSO

       accept(2), connect(2), poll(2), read(2), recv(2), select(2), send(2),
       sigprocmask(2), write(2), epoll(7)


COLOPHON

       This page is part of release 5.07 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2020-04-11                     SELECT_TUT(2)