EPOLL(7)                   Linux Programmer's Manual                  EPOLL(7)

NAME

       epoll - I/O event notification facility

SYNOPSIS

       #include <sys/epoll.h>

DESCRIPTION

       epoll is a variant of poll(2) that can be used either as an Edge
       Triggered or a Level Triggered interface and scales well to large
       numbers of watched file descriptors.  Three system calls are provided
       to set up and control an epoll set: epoll_create(2), epoll_ctl(2), and
       epoll_wait(2).

       An epoll set is connected to a file descriptor created by
       epoll_create(2).  Interest in particular file descriptors is then
       registered via epoll_ctl(2).  Finally, the actual wait is started by
       epoll_wait(2).

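       A minimal sketch of the three calls in order, assuming a pipe as the
       watched descriptor.  The helper name wait_on_pipe() is hypothetical
       and error handling is reduced to returning -1.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: creates an epoll set, registers the read end of
 * a pipe for EPOLLIN, makes it readable, and waits for the event.
 * Returns the number of ready descriptors reported, or -1 on error. */
int wait_on_pipe(void)
{
    struct epoll_event ev = {0}, ready[4];
    int pfd[2], epfd, n = -1;

    if (pipe(pfd) == -1)
        return -1;

    /* 1. epoll_create(): size is ignored by modern kernels but must be > 0. */
    epfd = epoll_create(1);
    if (epfd == -1)
        goto close_pipe;

    /* 2. epoll_ctl(): register interest in input on the read end. */
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    if (write(pfd[1], "x", 1) != 1)
        goto close_all;

    /* 3. epoll_wait(): block for at most 1000 ms. */
    n = epoll_wait(epfd, ready, 4, 1000);
    if (n == 1 && ready[0].data.fd != pfd[0])
        n = -1;                      /* unexpected descriptor */

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

       The data.fd member is only a cookie returned to the caller; storing
       the descriptor there is a convention, not a requirement.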

NOTES

       The epoll event distribution interface is able to behave both as Edge
       Triggered (ET) and as Level Triggered (LT).  The difference between
       the ET and LT event distribution mechanisms can be described as
       follows.  Suppose that this scenario happens:

       1      The file descriptor that represents the read side of a pipe
              (RFD) is added to the epoll device.

       2      The pipe writer writes 2 kB of data on the write side of the
              pipe.

       3      A call to epoll_wait(2) is done that will return RFD as a
              ready file descriptor.

       4      The pipe reader reads 1 kB of data from RFD.

       5      A call to epoll_wait(2) is done.

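       The five steps can be replayed in code.  In this sketch a 100 ms
       timeout stands in for "hang forever", and the function name
       replay_pipe_scenario() is hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps 1-5. 'flags' is EPOLLIN (level-triggered) or
 * EPOLLIN | EPOLLET (edge-triggered). Returns the result of the second
 * epoll_wait() (step 5): 1 if RFD is reported again, 0 on timeout,
 * -1 on error. */
int replay_pipe_scenario(uint32_t flags)
{
    static char buf[2048];
    struct epoll_event ev = {0}, out;
    int pfd[2], epfd, n = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        goto close_pipe;

    /* Step 1: add RFD to the epoll set. */
    ev.events = flags;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* Step 2: the writer writes 2 kB. */
    if (write(pfd[1], buf, 2048) != 2048)
        goto close_all;

    /* Step 3: the first wait reports RFD as ready. */
    if (epoll_wait(epfd, &out, 1, 100) != 1)
        goto close_all;

    /* Step 4: the reader consumes only 1 kB. */
    if (read(pfd[0], buf, 1024) != 1024)
        goto close_all;

    /* Step 5: the second wait, with 1 kB still buffered. */
    n = epoll_wait(epfd, &out, 1, 100);

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

       With EPOLLIN the call should return 1, because the remaining 1 kB
       keeps RFD ready; with EPOLLIN | EPOLLET it should return 0, the
       timed-out equivalent of the hang discussed next.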
       If the RFD file descriptor has been added to the epoll interface using
       the EPOLLET flag, the call to epoll_wait(2) done in step 5 will
       probably hang despite the available data still present in the file
       input buffer, while the remote peer might be expecting a response
       based on the data it already sent.  The reason for this is that Edge
       Triggered event distribution delivers events only when events happen
       on the monitored file.  So, in step 5 the caller might end up waiting
       for some data that is already present inside the input buffer.  In
       the above example, an event on RFD will be generated because of the
       write done in step 2, and the event is consumed in step 3.  Since the
       read operation done in step 4 does not consume the whole buffer, the
       call to epoll_wait(2) done in step 5 might block indefinitely.  An
       application that uses epoll with the EPOLLET flag (Edge Triggered)
       should use non-blocking file descriptors to avoid having a blocking
       read or write starve the task that is handling multiple file
       descriptors.  The suggested way to use epoll as an Edge Triggered
       (EPOLLET) interface is as follows; possible pitfalls to avoid are
       described later.

              i      with non-blocking file descriptors; and

              ii     by waiting for an event only after read(2) or
                     write(2) returns EAGAIN.

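       Rule ii can be sketched as a read loop that stops only on EAGAIN.
       drain_fd() is a hypothetical helper; it assumes fd has already been
       made non-blocking.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for Edge Triggered use: reads from a non-blocking
 * fd until read() fails with EAGAIN, so that the next epoll_wait() is
 * entered only once the input buffer is empty.
 * Returns the total number of bytes consumed, or -1 on a real error. */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);

        if (r > 0) {
            total += r;          /* process buf[0..r) here */
            continue;
        }
        if (r == 0)
            return total;        /* end of file: peer closed */
        if (errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* drained: safe to wait again */
        return -1;               /* genuine error */
    }
}
```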
       By contrast, when used as a Level Triggered interface, epoll is
       simply a faster poll(2), and can be used wherever the latter is used
       since it shares the same semantics.  Since even with Edge Triggered
       epoll multiple events can be generated upon receipt of multiple
       chunks of data, the caller has the option to specify the EPOLLONESHOT
       flag, to tell epoll to disable the associated file descriptor after
       the receipt of an event with epoll_wait(2).  When the EPOLLONESHOT
       flag is specified, it is the caller's responsibility to rearm the
       file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.

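       A sketch of EPOLLONESHOT followed by a rearm, assuming a pipe with
       one unread byte; oneshot_demo() is a hypothetical name.  The second
       wait sees nothing even though data is pending, and the EPOLL_CTL_MOD
       call makes the descriptor reportable again.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* counts[0]: first wait, counts[1]: wait before rearm,
 * counts[2]: wait after rearm. Returns 0 on success, -1 on error. */
int oneshot_demo(int counts[3])
{
    struct epoll_event ev = {0}, out;
    int pfd[2], epfd, rc = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;
    if (write(pfd[1], "a", 1) != 1)
        goto close_all;

    counts[0] = epoll_wait(epfd, &out, 1, 100);   /* event delivered */

    /* The byte is still unread, but the fd is now disabled. */
    counts[1] = epoll_wait(epfd, &out, 1, 100);

    /* Rearm: the full event mask must be supplied again. */
    ev.events = EPOLLIN | EPOLLONESHOT;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, pfd[0], &ev) == -1)
        goto close_all;
    counts[2] = epoll_wait(epfd, &out, 1, 100);   /* reported again */
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```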

EXAMPLE FOR SUGGESTED USAGE

       While epoll, when employed as a Level Triggered interface, has the
       same semantics as poll(2), Edge Triggered usage requires more
       clarification to avoid stalls in the application event loop.  In this
       example, listener is a non-blocking socket on which listen(2) has been
       called.  The function do_use_fd() uses the new ready file descriptor
       until EAGAIN is returned by either read(2) or write(2).  An
       event-driven state machine application should, after having received
       EAGAIN, record its current state so that at the next call to
       do_use_fd() it will continue to read(2) or write(2) from where it
       stopped before.

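       #define MAX_EVENTS 10

       /* kdpfd is the epoll fd; listener is a non-blocking listening
          socket assumed to be registered in it for EPOLLIN.
          setnonblocking() and do_use_fd() are supplied by the
          application. */
       struct epoll_event ev, events[MAX_EVENTS];
       struct sockaddr_in local;
       socklen_t addrlen;
       int nfds, n, client;

       for (;;) {
           nfds = epoll_wait(kdpfd, events, MAX_EVENTS, -1);
           if (nfds == -1) {
               if (errno == EINTR)
                   continue;            /* interrupted by a signal */
               perror("epoll_wait");
               return -1;
           }

           for (n = 0; n < nfds; ++n) {
               if (events[n].data.fd == listener) {
                   addrlen = sizeof(local);   /* value-result argument */
                   client = accept(listener, (struct sockaddr *) &local,
                                   &addrlen);
                   if (client < 0) {
                       perror("accept");
                       continue;
                   }
                   setnonblocking(client);
                   ev.events = EPOLLIN | EPOLLET;
                   ev.data.fd = client;
                   if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                       fprintf(stderr, "epoll set insertion error: fd=%d\n",
                               client);
                       return -1;
                   }
               } else {
                   do_use_fd(events[n].data.fd);
               }
           }
       }
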
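       The example above calls setnonblocking(), which this page does not
       define.  One common way to write it, assuming it is an application
       helper built on fcntl(2), is:

```c
#include <fcntl.h>

/* Hypothetical application helper: adds O_NONBLOCK to the file status
 * flags of fd while preserving the flags already set.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

       Reading the current flags first avoids clearing other status flags
       such as O_APPEND.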
       When used as an Edge Triggered interface, for performance reasons, it
       is possible to add the file descriptor to the epoll interface
       (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).  This allows
       you to avoid continuously switching between EPOLLIN and EPOLLOUT by
       calling epoll_ctl(2) with EPOLL_CTL_MOD.

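       A sketch of a single EPOLLIN|EPOLLOUT|EPOLLET registration, assuming
       a UNIX domain socket pair; inout_demo() is a hypothetical name.  The
       first wait should report the socket as writable only, the second
       should time out (no new edge), and after the peer writes, the third
       should report EPOLLIN without any EPOLL_CTL_MOD in between.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fills got[0..2] with the event masks returned by three successive
 * waits (0 means the wait timed out). Returns 0 on success, -1 on error. */
int inout_demo(uint32_t got[3])
{
    struct epoll_event ev = {0}, out;
    int sv[2], epfd, i, rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        goto close_pair;

    /* One EPOLL_CTL_ADD for both directions. */
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1)
        goto close_all;

    for (i = 0; i < 3; i++) {
        /* Before the third wait the peer sends data. */
        if (i == 2 && write(sv[1], "ping", 4) != 4)
            goto close_all;
        int n = epoll_wait(epfd, &out, 1, 100);
        if (n == -1)
            goto close_all;
        got[i] = (n == 1) ? out.events : 0;
    }
    rc = 0;

close_all:
    close(epfd);
close_pair:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```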

QUESTIONS AND ANSWERS

       Q1     What happens if you add the same fd to an epoll set twice?

       A1     You will probably get EEXIST.  However, it is possible to add a
              duplicate file descriptor (created with dup(2), dup2(2), or
              fcntl(2) F_DUPFD) that refers to the same open file to the
              same epoll set.

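       A sketch confirming the EEXIST case with a pipe; double_add_errno()
       is a hypothetical name.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Adds the same fd to one epoll set twice and returns the errno of the
 * second EPOLL_CTL_ADD (EEXIST on Linux), or -1 on setup error. */
int double_add_errno(void)
{
    struct epoll_event ev = {0};
    int pfd[2], epfd, err = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd != -1) {
        ev.events = EPOLLIN;
        ev.data.fd = pfd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
            epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
            err = errno;             /* captured before close() runs */
        close(epfd);
    }
    close(pfd[0]);
    close(pfd[1]);
    return err;
}
```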
       Q2     Can two epoll sets wait for the same fd?  If so, are events
              reported to both epoll sets?

       A2     Yes, and events would be reported to both.  However, it is not
              recommended.

       Q3     Is the epoll fd itself poll/epoll/selectable?

       A3     Yes.  If the epoll set has events waiting, the epoll fd will
              indicate as being readable.

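       A sketch that watches an epoll fd with poll(2), assuming a pipe as
       the only member of the set; poll_epoll_fd() is a hypothetical name.
       The epoll fd should show no events while the set is idle and POLLIN
       once the pipe has data.

```c
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Stores poll()'s revents for the epoll fd before and after data
 * arrives on the watched pipe. Returns 0 on success, -1 on error. */
int poll_epoll_fd(short *before, short *after)
{
    struct epoll_event ev = {0};
    struct pollfd p;
    int pfd[2], epfd, rc = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        goto close_pipe;
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    p.fd = epfd;
    p.events = POLLIN;
    if (poll(&p, 1, 0) == -1)        /* nothing ready in the set yet */
        goto close_all;
    *before = p.revents;

    if (write(pfd[1], "x", 1) != 1)
        goto close_all;
    if (poll(&p, 1, 100) == -1)      /* epoll_wait() would not block now */
        goto close_all;
    *after = p.revents;
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```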
       Q4     What happens if the epoll fd is put into its own epoll set?

       A4     The epoll_ctl(2) call will fail (EINVAL).  However, you can add
              an epoll fd inside another epoll set.

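       A sketch of both halves of the answer; nest_demo() is a hypothetical
       name.  It records the errno of the self-add and returns the result of
       adding the same epoll fd to a second, outer set.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Stores the errno of the self-add in *self_err (0 if it unexpectedly
 * succeeded) and returns the result of the nested add (0 on success),
 * or -1 if either epoll set could not be created. */
int nest_demo(int *self_err)
{
    struct epoll_event ev = {0};
    int inner, outer, rc;

    inner = epoll_create(1);
    if (inner == -1)
        return -1;
    outer = epoll_create(1);
    if (outer == -1) {
        close(inner);
        return -1;
    }

    ev.events = EPOLLIN;
    ev.data.fd = inner;
    *self_err = epoll_ctl(inner, EPOLL_CTL_ADD, inner, &ev) == -1 ? errno : 0;
    rc = epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev);   /* nesting is allowed */

    close(outer);
    close(inner);
    return rc;
}
```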
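       Q5     Can I send the epoll fd over a UNIX domain socket to another
              process?

       A5     Yes, but it does not make sense to do this, since the receiving
              process would not have copies of the file descriptors in the
              epoll set.
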
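       Q6     Will closing an fd cause it to be removed from all epoll sets
              automatically?

       A6     Yes, but only once every file descriptor referring to the same
              open file description has been closed.  If the fd was
              duplicated (for example with dup(2), dup2(2), fcntl(2)
              F_DUPFD, or fork(2)), events may still be reported for it after
              close(2).  To prevent this, remove it explicitly with
              epoll_ctl(2) EPOLL_CTL_DEL before closing it.
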
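       Closing one descriptor is not always enough to leave an epoll set.
       This sketch, with the hypothetical name closed_fd_still_reported(),
       registers a pipe's read end, duplicates it with dup(2), and closes
       the original; because the duplicate keeps the open file alive, the
       registration survives and an event is still delivered, tagged with
       the closed fd's number.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events seen after close(), or -1 on error. */
int closed_fd_still_reported(void)
{
    struct epoll_event ev = {0}, out;
    int pfd[2], epfd, dupfd, n = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0) {
        dupfd = dup(pfd[0]);         /* second reference, same open file */
        if (dupfd != -1) {
            close(pfd[0]);           /* does not deregister */
            pfd[0] = -1;
            if (write(pfd[1], "x", 1) == 1)
                n = epoll_wait(epfd, &out, 1, 100);
            close(dupfd);
        }
    }
    if (epfd != -1)
        close(epfd);
    if (pfd[0] != -1)
        close(pfd[0]);
    close(pfd[1]);
    return n;
}
```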
       Q7     If more than one event comes in between epoll_wait(2) calls, are
              they combined or reported separately?

       A7     They will be combined.

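       A sketch of the combining behaviour, assuming two separate writes to
       a pipe before a single wait; combined_events() is a hypothetical
       name.  The wait has room for eight events but should report one.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events reported after two writes, or -1 on
 * error. */
int combined_events(void)
{
    struct epoll_event ev = {0}, out[8];
    int pfd[2], epfd, n = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1)
        goto close_pipe;
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* Two arrivals before anyone calls epoll_wait(). */
    if (write(pfd[1], "a", 1) != 1 || write(pfd[1], "b", 1) != 1)
        goto close_all;
    n = epoll_wait(epfd, out, 8, 100);

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```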
       Q8     Does an operation on an fd affect the already collected but not
              yet reported events?

       A8     You can do two operations on an existing fd.  Remove would be
              meaningless for this case.  Modify will re-read available I/O.

       Q9     Do I need to continuously read/write an fd until EAGAIN when
              using the EPOLLET flag (Edge Triggered behaviour)?

       A9     No, you don't.  Receiving an event from epoll_wait(2) should
              suggest to you that such a file descriptor is ready for the
              requested I/O operation.  You simply have to consider it ready
              until you receive the next EAGAIN.  When and how you use such a
              file descriptor is entirely up to you.  Also, for
              stream-oriented files (pipes, FIFOs, stream sockets), the
              condition that the read/write I/O space is exhausted can be
              detected by checking the amount of data read from or written
              to the target file descriptor.  For example, if you call
              read(2) asking for a certain amount of data and read(2)
              returns a lower number of bytes, you can be sure that you have
              exhausted the read I/O space for that file descriptor.  The
              same is true when writing with write(2).

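       The short-read check can be sketched as follows; read_once() is a
       hypothetical helper, and the check is only meaningful for
       stream-oriented descriptors such as pipes.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: performs one read() of up to len bytes and sets
 * *exhausted when the read came back short (or hit end of file), which
 * for a stream-oriented fd means the input buffer was emptied.
 * Returns read()'s result. */
ssize_t read_once(int fd, char *buf, size_t len, int *exhausted)
{
    ssize_t r = read(fd, buf, len);

    *exhausted = (r >= 0 && (size_t) r < len);
    return r;
}
```

       With 100 bytes in a pipe and a 64-byte buffer, the first call
       returns 64 with *exhausted == 0 and the second returns 36 with
       *exhausted == 1, so no further read is needed before waiting again.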

POSSIBLE PITFALLS AND WAYS TO AVOID THEM

       o Starvation (Edge Triggered)

       If there is a large amount of I/O space, it is possible that, while
       trying to drain it, the other files will not get processed, causing
       starvation.  This problem is not specific to epoll.

       The solution is to maintain a ready list and mark the file descriptor
       as ready in its associated data structure, thereby allowing the
       application to remember which files need to be processed but still
       round robin amongst all the ready files.  This also supports ignoring
       subsequent events you receive for fds that are already ready.

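       One possible shape for such a ready list, assuming non-blocking
       descriptors, a fixed-size FIFO, and a per-turn byte budget.  The
       names struct conn, mark_ready(), serve_one(), MAX_READY and BUDGET
       are hypothetical; an event loop would call mark_ready() for each fd
       returned by epoll_wait(2) and then call serve_one() repeatedly.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_READY 64      /* hypothetical capacity of the ready list */
#define BUDGET    1024    /* hypothetical per-turn byte budget */

/* Hypothetical per-fd record: 'ready' is set while the fd is queued. */
struct conn {
    int fd;
    int ready;
    size_t consumed;
};

/* Fixed-size FIFO of ready connections. */
struct ready_list {
    struct conn *q[MAX_READY];
    int head, len;
};

/* Marks c ready and appends it; an event for an fd that is already
 * ready is ignored. */
void mark_ready(struct ready_list *rl, struct conn *c)
{
    if (c->ready || rl->len == MAX_READY)
        return;
    c->ready = 1;
    rl->q[(rl->head + rl->len) % MAX_READY] = c;
    rl->len++;
}

/* Serves the head of the list for at most BUDGET bytes. A connection
 * that may still have data is requeued at the tail, so every ready fd
 * gets a turn. Returns 0 if the list was empty, 1 otherwise. */
int serve_one(struct ready_list *rl)
{
    char buf[256];
    size_t used = 0;
    struct conn *c;

    if (rl->len == 0)
        return 0;
    c = rl->q[rl->head];
    rl->head = (rl->head + 1) % MAX_READY;
    rl->len--;

    while (used < BUDGET) {
        ssize_t r = read(c->fd, buf, sizeof buf);
        if (r > 0) {
            used += (size_t) r;
            c->consumed += (size_t) r;
        } else if (r == -1 && errno == EINTR) {
            continue;
        } else {
            c->ready = 0;        /* drained (EAGAIN), EOF, or error */
            return 1;
        }
    }

    c->ready = 0;                /* budget used up: back of the queue */
    mark_ready(rl, c);
    return 1;
}
```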
       o If using an event cache...

       If you use an event cache or store all the fds returned from
       epoll_wait(2), then make sure to provide a way to mark their closure
       dynamically (i.e., closure caused by a previous event's processing).
       Suppose you receive 100 events from epoll_wait(2), and in event #47 a
       condition causes event #13 to be closed.  If you remove the structure
       and close(2) the fd for event #13, then your event cache might still
       say there are events waiting for that fd, causing confusion.

       One solution for this is to call, during the processing of event 47,
       epoll_ctl(EPOLL_CTL_DEL) to delete fd 13 and close(2) it, then mark
       its associated data structure as removed and link it to a cleanup
       list.  If you find another event for fd 13 in your batch processing,
       you will discover the fd had been previously removed and there will
       be no confusion.

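       A sketch of that approach, assuming each registration stores a
       pointer to its record in epoll_event.data.ptr.  The names struct
       conn, retire_conn() and process_batch() are hypothetical; records on
       the cleanup list would be freed after the batch is finished.  Passing
       NULL as the event argument to EPOLL_CTL_DEL assumes Linux 2.6.9 or
       later.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection record, stored in epoll_event.data.ptr. */
struct conn {
    int fd;
    int removed;          /* set once the fd is deleted and closed */
    struct conn *next;    /* link on the deferred cleanup list */
};

/* Removes c from the epoll set and closes its fd, but defers freeing
 * the record: it is marked removed and pushed on *cleanup so that a
 * later entry for c in the same batch can still be recognised. */
void retire_conn(int epfd, struct conn *c, struct conn **cleanup)
{
    if (c->removed)
        return;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    c->removed = 1;
    c->next = *cleanup;
    *cleanup = c;
}

/* Walks one batch returned by epoll_wait(). handle() is supplied by the
 * application and may call retire_conn() on any record, including ones
 * that appear later in the batch; those entries are skipped.
 * Returns the number of events actually handled. */
int process_batch(int epfd, struct epoll_event *evs, int n,
                  void (*handle)(int, struct conn *, struct conn **),
                  struct conn **cleanup)
{
    int i, handled = 0;

    for (i = 0; i < n; i++) {
        struct conn *c = evs[i].data.ptr;

        if (c->removed)
            continue;     /* retired earlier in this batch */
        handle(epfd, c, cleanup);
        handled++;
    }
    return handled;
}
```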

CONFORMING TO

       The epoll API is Linux-specific.  Some other systems provide similar
       mechanisms; for example, FreeBSD has kqueue, and Solaris has
       /dev/poll.

VERSIONS

       The epoll API was introduced in Linux kernel 2.5.44.  Its interface
       was expected to be finalized in Linux kernel 2.5.66.

SEE ALSO

       epoll_create(2), epoll_ctl(2), epoll_wait(2)

Linux                             2002-10-23                          EPOLL(7)