SELECT_TUT(2)                 System Calls Manual                SELECT_TUT(2)

NAME

       select, pselect - synchronous I/O multiplexing

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       See select(2).

DESCRIPTION

       The select() and pselect() system calls are used to efficiently monitor
       multiple file descriptors, to see if any of them is, or becomes,
       "ready"; that is, to see whether I/O becomes possible, or an
       "exceptional condition" has occurred on any of the file descriptors.

       This page provides background and tutorial information on the use of
       these system calls.  For details of the arguments and semantics of
       select() and pselect(), see select(2).

   Combining signal and data events
       pselect() is useful if you are waiting for a signal as well as for file
       descriptor(s) to become ready for I/O.  Programs that receive signals
       normally use the signal handler only to raise a global flag.  The
       global flag indicates that the event must be processed in the main
       loop of the program.  A signal causes the select() (or pselect())
       call to fail with errno set to EINTR.  This behavior is essential so
       that signals can be processed in the main loop of the program;
       otherwise, select() would block indefinitely.

       Now, somewhere in the main loop there will be a conditional that checks
       the global flag.  So we must ask: what if a signal arrives after the
       conditional, but before the select() call?  The answer is that select()
       would block indefinitely, even though an event is actually pending.
       This race condition is solved by the pselect() call.  This call can be
       used to set the signal mask to a set of signals that are to be received
       only within the pselect() call.  For instance, let us say that the
       event in question was the exit of a child process.  Before the start of
       the main loop, we would block SIGCHLD using sigprocmask(2).  Our
       pselect() call would enable SIGCHLD by using an empty signal mask.  Our
       program would look like:

       #include <errno.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/select.h>

       static volatile sig_atomic_t got_SIGCHLD = 0;

       static void
       child_sig_handler(int sig)
       {
           got_SIGCHLD = 1;
       }

       int
       main(int argc, char *argv[])
       {
           sigset_t sigmask, empty_mask;
           struct sigaction sa;
           fd_set readfds, writefds, exceptfds;
           int r;
           int nfds = 0;       /* Set while filling the sets (code omitted) */

           sigemptyset(&sigmask);
           sigaddset(&sigmask, SIGCHLD);
           if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == -1) {
               perror("sigprocmask");
               exit(EXIT_FAILURE);
           }

           sa.sa_flags = 0;
           sa.sa_handler = child_sig_handler;
           sigemptyset(&sa.sa_mask);
           if (sigaction(SIGCHLD, &sa, NULL) == -1) {
               perror("sigaction");
               exit(EXIT_FAILURE);
           }

           sigemptyset(&empty_mask);

           for (;;) {          /* main loop */
               /* Initialize readfds, writefds, and exceptfds
                  before the pselect() call. (Code omitted.) */

               r = pselect(nfds, &readfds, &writefds, &exceptfds,
                           NULL, &empty_mask);
               if (r == -1 && errno != EINTR) {
                   /* Handle error */
               }

               if (got_SIGCHLD) {
                   got_SIGCHLD = 0;

                   /* Handle signalled event here; e.g., wait() for all
                      terminated children. (Code omitted.) */
               }

               /* main body of program */
           }
       }

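       The program above omits the code that waits for terminated children.
       A minimal sketch of that step follows.  The helper name
       reap_children() is an assumption made for illustration and is not part
       of the original program.  It calls waitpid(2) with WNOHANG so that the
       main loop never blocks, and it loops because several child exits may be
       reported by a single SIGCHLD delivery.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every child that has already terminated,
   without blocking.  Returns the number of children reaped. */
static int
reap_children(void)
{
    int    count = 0;
    int    status;
    pid_t  pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            count++;            /* One child reaped; look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* Interrupted by a signal; retry */
        break;                  /* 0: none ready; -1: no children left */
    }
    return count;
}
```

       In the loop above, a call such as reap_children() could replace the
       "Handle signalled event here" comment.
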
   Practical
       So what is the point of select()?  Can't I just read and write to my
       file descriptors whenever I want?  The point of select() is that it
       watches multiple descriptors at the same time and properly puts the
       process to sleep if there is no activity.  UNIX programmers often find
       themselves in a position where they have to handle I/O from more than
       one file descriptor where the data flow may be intermittent.  If you
       were to merely create a sequence of read(2) and write(2) calls, you
       would find that one of your calls may block waiting for data from/to a
       file descriptor, while another file descriptor is unused though ready
       for I/O.  select() efficiently copes with this situation.

   Select law
       Many people who try to use select() come across behavior that is
       difficult to understand and produces nonportable or borderline results.
       For instance, the program in EXAMPLES below is carefully written not to
       block at any point, even though it does not set its file descriptors to
       nonblocking mode.  It is easy to introduce subtle errors that will
       remove the advantage of using select(), so here is a list of essentials
       to watch for when using select().

       1.  You should always try to use select() without a timeout.  Your
           program should have nothing to do if there is no data available.
           Code that depends on timeouts is not usually portable and is
           difficult to debug.

       2.  The value nfds must be properly calculated: it should be the
           highest-numbered file descriptor in any of the three sets, plus 1
           (see select(2)).

       3.  No file descriptor must be added to any set if you do not intend to
           check its result after the select() call, and respond
           appropriately.  See next rule.

       4.  After select() returns, all file descriptors in all sets should be
           checked to see if they are ready.

       5.  The functions read(2), recv(2), write(2), and send(2) do not
           necessarily read/write the full amount of data that you have
           requested.  If they do read/write the full amount, it's because you
           have a low traffic load and a fast stream.  This is not always
           going to be the case.  You should cope with the case of your
           functions managing to send or receive only a single byte.

       6.  Never read/write only in single bytes at a time unless you are
           really sure that you have a small amount of data to process.  It is
           extremely inefficient not to read/write as much data as you can
           buffer each time.  The buffers in the example below are 1024 bytes
           although they could easily be made larger.

       7.  Calls to read(2), recv(2), write(2), send(2), and select() can fail
           with the error EINTR, and calls to read(2), recv(2), write(2), and
           send(2) can fail with errno set to EAGAIN (EWOULDBLOCK).  These
           results must be properly managed (the program in EXAMPLES does not
           do this fully).  If your program is not going to receive any
           signals, then it is unlikely you will get EINTR.  If your program
           does not set nonblocking I/O, you will not get EAGAIN.

       8.  Never call read(2), recv(2), write(2), or send(2) with a buffer
           length of zero.

       9.  If the functions read(2), recv(2), write(2), and send(2) fail with
           errors other than those listed in rule 7, or one of the input
           functions returns 0, indicating end of file, then you should not
           pass that file descriptor to select() again.  In the example below,
           I close the file descriptor immediately, and then set it to -1 to
           prevent it being included in a set.

       10. The timeout value must be initialized with each new call to
           select(), since some operating systems modify the structure.
           pselect(), however, does not modify its timeout structure.

       11. Since select() modifies its file descriptor sets, if the call is
           being used in a loop, then the sets must be reinitialized before
           each call.
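
       Several of the rules above can be sketched in code.  The two helpers
       below are hypothetical names chosen for illustration and assume
       blocking descriptors.  write_all() copes with short writes and EINTR
       (rules 5 and 7) and never issues a zero-length write (rule 8).
       wait_readable() rebuilds both the descriptor set and the timeout
       before every select() call (rules 10 and 11).

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to a blocking descriptor,
   coping with short writes and EINTR.
   Returns 0 on success, or -1 on any other error. */
static int
write_all(int fd, const char *buf, size_t len)
{
    size_t   done = 0;
    ssize_t  n;

    while (done < len) {
        n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* Interrupted before writing; retry */
            return -1;
        }
        done += n;              /* Possibly fewer bytes than requested */
    }
    return 0;
}

/* Hypothetical helper: wait up to `seconds` for fd to become readable.
   Returns 1 if readable, 0 on timeout, or -1 on error. */
static int
wait_readable(int fd, int seconds)
{
    fd_set          readfds;
    struct timeval  tv;
    int             ready;

    do {
        FD_ZERO(&readfds);      /* Rule 11: rebuild the set each time */
        FD_SET(fd, &readfds);
        tv.tv_sec = seconds;    /* Rule 10: rebuild the timeout each time */
        tv.tv_usec = 0;
        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (ready == -1 && errno == EINTR);

    return ready;
}
```

       Note that this sketch restarts the full timeout after EINTR, so the
       total wait can exceed the requested number of seconds.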

RETURN VALUE

       See select(2).

NOTES

       Generally speaking, all operating systems that support sockets also
       support select().  select() can be used to solve many problems in a
       portable and efficient way that naive programmers try to solve in a
       more complicated manner using threads, forking, IPCs, signals, memory
       sharing, and so on.

       The poll(2) system call has the same functionality as select(), and is
       somewhat more efficient when monitoring sparse file descriptor sets.
       It is nowadays widely available, but historically was less portable
       than select().

       The Linux-specific epoll(7) API provides an interface that is more
       efficient than select(2) and poll(2) when monitoring large numbers of
       file descriptors.
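
       For comparison, here is a rough poll(2) counterpart to a two-descriptor
       select() wait.  The helper name poll_readable() and its bit-mask return
       convention are assumptions made for this sketch.  With poll(2) there is
       no nfds value to compute, and the requested events are not overwritten
       by the call.

```c
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds (-1 means wait
   indefinitely) for fd_a or fd_b to become readable.
   Returns a bit mask (bit 0: fd_a, bit 1: fd_b), 0 on timeout,
   or -1 on error. */
static int
poll_readable(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd  pfd[2];
    int            mask = 0;

    pfd[0].fd = fd_a;
    pfd[0].events = POLLIN;
    pfd[1].fd = fd_b;
    pfd[1].events = POLLIN;

    if (poll(pfd, 2, timeout_ms) == -1)
        return -1;

    /* POLLHUP means the peer has closed; read() will then return 0 */
    if (pfd[0].revents & (POLLIN | POLLHUP))
        mask |= 1;
    if (pfd[1].revents & (POLLIN | POLLHUP))
        mask |= 2;
    return mask;
}
```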

EXAMPLES

       Here is an example that better demonstrates the true utility of
       select().  The listing below is a TCP forwarding program that forwards
       from one TCP port to another.

       #include <arpa/inet.h>
       #include <errno.h>
       #include <netinet/in.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/select.h>
       #include <sys/socket.h>
       #include <unistd.h>

       static int forward_port;

       #undef max
       #define max(x, y) ((x) > (y) ? (x) : (y))

       static int
       listen_socket(int listen_port)
       {
           int                 lfd;
           int                 yes;
           struct sockaddr_in  addr;

           lfd = socket(AF_INET, SOCK_STREAM, 0);
           if (lfd == -1) {
               perror("socket");
               return -1;
           }

           yes = 1;
           if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR,
                          &yes, sizeof(yes)) == -1)
           {
               perror("setsockopt");
               close(lfd);
               return -1;
           }

           memset(&addr, 0, sizeof(addr));
           addr.sin_port = htons(listen_port);
           addr.sin_family = AF_INET;
           if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
               perror("bind");
               close(lfd);
               return -1;
           }

           if (listen(lfd, 10) == -1) {
               perror("listen");
               close(lfd);
               return -1;
           }

           printf("accepting connections on port %d\n", listen_port);
           return lfd;
       }

       static int
       connect_socket(int connect_port, char *address)
       {
           int                 cfd;
           struct sockaddr_in  addr;

           cfd = socket(AF_INET, SOCK_STREAM, 0);
           if (cfd == -1) {
               perror("socket");
               return -1;
           }

           memset(&addr, 0, sizeof(addr));
           addr.sin_port = htons(connect_port);
           addr.sin_family = AF_INET;

           if (!inet_aton(address, &addr.sin_addr)) {
               fprintf(stderr, "inet_aton(): bad IP address format\n");
               close(cfd);
               return -1;
           }

           if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
               perror("connect()");
               shutdown(cfd, SHUT_RDWR);
               close(cfd);
               return -1;
           }
           return cfd;
       }

       #define SHUT_FD1 do {                                \
                            if (fd1 >= 0) {                 \
                                shutdown(fd1, SHUT_RDWR);   \
                                close(fd1);                 \
                                fd1 = -1;                   \
                            }                               \
                        } while (0)

       #define SHUT_FD2 do {                                \
                            if (fd2 >= 0) {                 \
                                shutdown(fd2, SHUT_RDWR);   \
                                close(fd2);                 \
                                fd2 = -1;                   \
                            }                               \
                        } while (0)

       #define BUF_SIZE 1024

       int
       main(int argc, char *argv[])
       {
           int      h;
           int      ready, nfds;
           int      fd1 = -1, fd2 = -1;
           int      buf1_avail = 0, buf1_written = 0;
           int      buf2_avail = 0, buf2_written = 0;
           char     buf1[BUF_SIZE], buf2[BUF_SIZE];
           fd_set   readfds, writefds, exceptfds;
           ssize_t  nbytes;

           if (argc != 4) {
               fprintf(stderr, "Usage\n\tfwd <listen-port> "
                       "<forward-to-port> <forward-to-ip-address>\n");
               exit(EXIT_FAILURE);
           }

           signal(SIGPIPE, SIG_IGN);

           forward_port = atoi(argv[2]);

           h = listen_socket(atoi(argv[1]));
           if (h == -1)
               exit(EXIT_FAILURE);

           for (;;) {
               nfds = 0;

               FD_ZERO(&readfds);
               FD_ZERO(&writefds);
               FD_ZERO(&exceptfds);
               FD_SET(h, &readfds);
               nfds = max(nfds, h);

               if (fd1 > 0 && buf1_avail < BUF_SIZE)
                   FD_SET(fd1, &readfds);
                   /* Note: nfds is updated below, when fd1 is added to
                      exceptfds. */
               if (fd2 > 0 && buf2_avail < BUF_SIZE)
                   FD_SET(fd2, &readfds);

               if (fd1 > 0 && buf2_avail - buf2_written > 0)
                   FD_SET(fd1, &writefds);
               if (fd2 > 0 && buf1_avail - buf1_written > 0)
                   FD_SET(fd2, &writefds);

               if (fd1 > 0) {
                   FD_SET(fd1, &exceptfds);
                   nfds = max(nfds, fd1);
               }
               if (fd2 > 0) {
                   FD_SET(fd2, &exceptfds);
                   nfds = max(nfds, fd2);
               }

               ready = select(nfds + 1, &readfds, &writefds, &exceptfds, NULL);

               if (ready == -1 && errno == EINTR)
                   continue;

               if (ready == -1) {
                   perror("select()");
                   exit(EXIT_FAILURE);
               }

               if (FD_ISSET(h, &readfds)) {
                   socklen_t addrlen;
                   struct sockaddr_in client_addr;
                   int fd;

                   addrlen = sizeof(client_addr);
                   memset(&client_addr, 0, addrlen);
                   fd = accept(h, (struct sockaddr *) &client_addr, &addrlen);
                   if (fd == -1) {
                       perror("accept()");
                   } else {
                       SHUT_FD1;
                       SHUT_FD2;
                       buf1_avail = buf1_written = 0;
                       buf2_avail = buf2_written = 0;
                       fd1 = fd;
                       fd2 = connect_socket(forward_port, argv[3]);
                       if (fd2 == -1)
                           SHUT_FD1;
                       else
                           printf("connect from %s\n",
                                  inet_ntoa(client_addr.sin_addr));

                       /* Skip any events on the old, closed file
                          descriptors. */

                       continue;
                   }
               }

               /* NB: read OOB data before normal reads. */

               if (fd1 > 0 && FD_ISSET(fd1, &exceptfds)) {
                   char c;

                   nbytes = recv(fd1, &c, 1, MSG_OOB);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       send(fd2, &c, 1, MSG_OOB);
               }
               if (fd2 > 0 && FD_ISSET(fd2, &exceptfds)) {
                   char c;

                   nbytes = recv(fd2, &c, 1, MSG_OOB);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       send(fd1, &c, 1, MSG_OOB);
               }
               if (fd1 > 0 && FD_ISSET(fd1, &readfds)) {
                   nbytes = read(fd1, buf1 + buf1_avail,
                                 BUF_SIZE - buf1_avail);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       buf1_avail += nbytes;
               }
               if (fd2 > 0 && FD_ISSET(fd2, &readfds)) {
                   nbytes = read(fd2, buf2 + buf2_avail,
                                 BUF_SIZE - buf2_avail);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       buf2_avail += nbytes;
               }
               if (fd1 > 0 && FD_ISSET(fd1, &writefds) && buf2_avail > 0) {
                   nbytes = write(fd1, buf2 + buf2_written,
                                  buf2_avail - buf2_written);
                   if (nbytes < 1)
                       SHUT_FD1;
                   else
                       buf2_written += nbytes;
               }
               if (fd2 > 0 && FD_ISSET(fd2, &writefds) && buf1_avail > 0) {
                   nbytes = write(fd2, buf1 + buf1_written,
                                  buf1_avail - buf1_written);
                   if (nbytes < 1)
                       SHUT_FD2;
                   else
                       buf1_written += nbytes;
               }

               /* Check if write data has caught read data. */

               if (buf1_written == buf1_avail)
                   buf1_written = buf1_avail = 0;
               if (buf2_written == buf2_avail)
                   buf2_written = buf2_avail = 0;

               /* One side has closed the connection, keep
                  writing to the other side until empty. */

               if (fd1 < 0 && buf1_avail - buf1_written == 0)
                   SHUT_FD2;
               if (fd2 < 0 && buf2_avail - buf2_written == 0)
                   SHUT_FD1;
           }
           exit(EXIT_SUCCESS);
       }

       The above program properly forwards most kinds of TCP connections,
       including OOB signal data transmitted by telnet servers.  It handles the
       tricky problem of having data flow in both directions simultaneously.
       You might think it more efficient to use a fork(2) call and devote a
       process to each stream.  This becomes more tricky than you might
       suspect.  Another idea is to set nonblocking I/O using fcntl(2).  This
       also has its problems because you end up using inefficient timeouts.

       The program does not handle more than one connection at a time,
       although it could easily be extended to do this with a linked list of
       buffers, one for each connection.  At the moment, new connections cause
       the current connection to be dropped.
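
       One possible shape for such a linked list is sketched below.  The
       struct conn type and the helpers conn_add() and conns_set_readfds()
       are assumptions made for illustration; they mirror the fd1, fd2, buf1,
       and buf2 variables of the program above, and show only how the read
       set and the highest descriptor might be built from the list.

```c
#include <stdlib.h>
#include <sys/select.h>

#define BUF_SIZE 1024

/* Hypothetical per-connection state, one node per forwarded connection */
struct conn {
    int           fd1, fd2;
    int           buf1_avail, buf1_written;
    int           buf2_avail, buf2_written;
    char          buf1[BUF_SIZE], buf2[BUF_SIZE];
    struct conn  *next;
};

/* Allocate a connection and push it on the front of the list.
   Returns the new list head, or NULL if allocation fails (in which
   case the old list is left untouched). */
static struct conn *
conn_add(struct conn *head, int fd1, int fd2)
{
    struct conn *c = calloc(1, sizeof(*c));

    if (c == NULL)
        return NULL;
    c->fd1 = fd1;
    c->fd2 = fd2;
    c->next = head;
    return c;
}

/* Add every open descriptor with free buffer space to readfds.
   Returns the highest descriptor added, or -1 if none was added. */
static int
conns_set_readfds(const struct conn *head, fd_set *readfds)
{
    const struct conn  *c;
    int                 maxfd = -1;

    for (c = head; c != NULL; c = c->next) {
        if (c->fd1 >= 0 && c->buf1_avail < BUF_SIZE) {
            FD_SET(c->fd1, readfds);
            if (c->fd1 > maxfd)
                maxfd = c->fd1;
        }
        if (c->fd2 >= 0 && c->buf2_avail < BUF_SIZE) {
            FD_SET(c->fd2, readfds);
            if (c->fd2 > maxfd)
                maxfd = c->fd2;
        }
    }
    return maxfd;
}
```

       The main loop would then walk the same list after select() returns,
       applying the existing per-connection logic to each node.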