EPOLL(7)                   Linux Programmer's Manual                  EPOLL(7)

NAME

       epoll - I/O event notification facility

SYNOPSIS

       #include <sys/epoll.h>

DESCRIPTION

       epoll is a variant of poll(2) that can be used either as an
       edge-triggered or a level-triggered interface and scales well to large
       numbers of watched file descriptors.  The following system calls are
       provided to create and manage an epoll instance:

       *  epoll_create(2) creates an epoll instance and returns a file
          descriptor referring to that instance.  (The more recent
          epoll_create1(2) extends the functionality of epoll_create(2).)

       *  Interest in particular file descriptors is then registered via
          epoll_ctl(2).  The set of file descriptors currently registered on
          an epoll instance is sometimes called an epoll set.

       *  Finally, epoll_wait(2) waits for I/O events on the registered file
          descriptors.

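       The three calls above can be strung together as in the following
       minimal sketch.  The helper name wait_readable_once() is hypothetical
       and exists only for illustration; a real program would normally keep
       the epoll instance open rather than create one per wait.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable_once(int fd, int timeout_ms)
{
    struct epoll_event ev = { 0 }, out;
    int epfd, n;

    epfd = epoll_create1(0);                 /* create the epoll instance */
    if (epfd == -1)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {  /* register fd */
        close(epfd);
        return -1;
    }

    n = epoll_wait(epfd, &out, 1, timeout_ms);            /* wait */
    close(epfd);
    return n;
}
```
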
   Level-Triggered and Edge-Triggered
       The epoll event distribution interface is able to behave both as
       edge-triggered (ET) and as level-triggered (LT).  The difference
       between the two mechanisms can be described as follows.  Suppose that
       this scenario happens:

       1. The file descriptor that represents the read side of a pipe (rfd) is
          registered on the epoll instance.

       2. A pipe writer writes 2 kB of data on the write side of the pipe.

       3. A call to epoll_wait(2) is done that will return rfd as a ready file
          descriptor.

       4. The pipe reader reads 1 kB of data from rfd.

       5. A call to epoll_wait(2) is done.

       If the rfd file descriptor has been added to the epoll interface using
       the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in
       step 5 will probably hang despite the available data still present in
       the file input buffer; meanwhile the remote peer might be expecting a
       response based on the data it already sent.  The reason for this is
       that edge-triggered mode delivers events only when changes occur on the
       monitored file descriptor.  So, in step 5 the caller might end up
       waiting for some data that is already present inside the input buffer.
       In the above example, an event on rfd will be generated because of the
       write done in step 2, and the event is consumed in step 3.  Since the
       read operation done in step 4 does not consume the whole buffer data,
       the call to epoll_wait(2) done in step 5 might block indefinitely.

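       The scenario can be replayed with a small sketch such as the one
       below.  The helper replay_scenario() is hypothetical.  It uses a
       timeout of 0 in epoll_wait(2) instead of blocking, so that the
       edge-triggered case shows up as "no events" rather than as a hang.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: replay steps 1-5 on a fresh pipe.  Pass 0 for
   level-triggered or EPOLLET for edge-triggered.  Returns the result of
   the epoll_wait(2) call in step 5, or -1 if an earlier step failed. */
int replay_scenario(uint32_t flags)
{
    char buf[2048];
    struct epoll_event ev = { 0 }, out;
    int p[2], epfd, n = -1;

    memset(buf, 'x', sizeof(buf));
    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.events = EPOLLIN | flags;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&   /* step 1 */
        write(p[1], buf, 2048) == 2048 &&                   /* step 2 */
        epoll_wait(epfd, &out, 1, 0) == 1 &&                /* step 3 */
        read(p[0], buf, 1024) == 1024)                      /* step 4 */
        n = epoll_wait(epfd, &out, 1, 0);                   /* step 5 */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return n;
}
```

       With EPOLLET, step 5 is expected to report no events even though 1 kB
       remains in the pipe; with flags set to 0, the same call reports rfd
       again.
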
       An application that employs the EPOLLET flag should use non-blocking
       file descriptors to avoid having a blocking read or write starve a task
       that is handling multiple file descriptors.  The suggested way to use
       epoll as an edge-triggered (EPOLLET) interface is as follows:

              i   with non-blocking file descriptors; and

              ii  by waiting for an event only after read(2) or write(2)
                  returns EAGAIN.

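       A minimal sketch of both points, for the read side only.  The name
       setnonblocking() matches the helper that the example later in this
       page calls but does not define; drain_fd() is a hypothetical name.
       Both are illustrations, not part of the epoll API.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Point i: put fd into non-blocking mode.  Returns 0, or -1 on error. */
int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Point ii: read until read(2) reports EAGAIN (or end of file), and only
   then go back to epoll_wait(2).  Returns the number of bytes consumed,
   or -1 on any other error. */
ssize_t drain_fd(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;             /* a real program would process buf */
            continue;
        }
        if (n == 0)
            return total;           /* end of file */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;           /* drained; safe to wait again */
        return -1;
    }
}
```
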
       By contrast, when used as a level-triggered interface (the default,
       when EPOLLET is not specified), epoll is simply a faster poll(2), and
       can be used wherever the latter is used since it shares the same
       semantics.

       Since even with edge-triggered epoll, multiple events can be generated
       upon receipt of multiple chunks of data, the caller has the option to
       specify the EPOLLONESHOT flag, to tell epoll to disable the associated
       file descriptor after the receipt of an event with epoll_wait(2).  When
       the EPOLLONESHOT flag is specified, it is the caller's responsibility
       to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.

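       Rearming might look like the following sketch.  The helper
       handle_and_rearm() is hypothetical, and it assumes the descriptor was
       registered with EPOLLIN | EPOLLONESHOT and that a single read(2) is
       enough to handle the event.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: handle one reported input event on fd, then
   re-enable the one-shot registration.  Returns 0, or -1 on error. */
int handle_and_rearm(int epfd, int fd)
{
    char buf[512];
    struct epoll_event ev = { 0 };

    if (read(fd, buf, sizeof(buf)) == -1)    /* handle the event */
        return -1;

    ev.events = EPOLLIN | EPOLLONESHOT;      /* same mask as at ADD time */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

       Until the EPOLL_CTL_MOD call is made, epoll_wait(2) reports nothing
       further for the descriptor, even if unread data is still pending.
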
   /proc interfaces
       The following interfaces can be used to limit the amount of kernel
       memory consumed by epoll:

       /proc/sys/fs/epoll/max_user_watches (since Linux 2.6.28)
              This specifies a limit on the total number of file descriptors
              that a user can register across all epoll instances on the
              system.  The limit is per real user ID.  Each registered file
              descriptor costs roughly 90 bytes on a 32-bit kernel, and
              roughly 160 bytes on a 64-bit kernel.  Currently, the default
              value for max_user_watches is 1/25 (4%) of the available low
              memory, divided by the registration cost in bytes.

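       As a rough illustration of that arithmetic, the sketch below
       estimates the default from the figures quoted above and reads the
       current limit.  Both function names are hypothetical, and the
       estimate is only an approximation built from the "roughly 90/160
       bytes" figures; the kernel's own computation may differ in detail.

```c
#include <stdio.h>

/* Estimate: 1/25 of low memory, divided by the per-registration cost. */
long estimate_max_user_watches(long long lowmem_bytes, long cost_bytes)
{
    return (long) (lowmem_bytes / 25 / cost_bytes);
}

/* Read the limit currently in force; returns -1 if it cannot be read. */
long read_max_user_watches(void)
{
    FILE *f = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

       For example, assuming 1 GiB of low memory and a cost of 160 bytes,
       estimate_max_user_watches(1LL << 30, 160) yields 268435.
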
   Example for Suggested Usage
       While the usage of epoll when employed as a level-triggered interface
       does have the same semantics as poll(2), the edge-triggered usage
       requires more clarification to avoid stalls in the application event
       loop.  In this example, listen_sock is a non-blocking socket on which
       listen(2) has been called.  The function do_use_fd() uses the new ready
       file descriptor until EAGAIN is returned by either read(2) or write(2).
       An event-driven state machine application should, after having received
       EAGAIN, record its current state so that at the next call to
       do_use_fd() it will continue to read(2) or write(2) from where it
       stopped before.

           #define MAX_EVENTS 10
           struct epoll_event ev, events[MAX_EVENTS];
           int listen_sock, conn_sock, nfds, epollfd, n;
           struct sockaddr_storage local;  /* filled in by accept() */
           socklen_t addrlen;

           /* Set up listening socket, 'listen_sock' (socket(),
              bind(), listen()) */

           epollfd = epoll_create(10);
           if (epollfd == -1) {
               perror("epoll_create");
               exit(EXIT_FAILURE);
           }

           /* The listening socket is watched level-triggered */
           ev.events = EPOLLIN;
           ev.data.fd = listen_sock;
           if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
               perror("epoll_ctl: listen_sock");
               exit(EXIT_FAILURE);
           }

           for (;;) {
               nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
               if (nfds == -1) {
                   perror("epoll_wait");
                   exit(EXIT_FAILURE);
               }

               for (n = 0; n < nfds; ++n) {
                   if (events[n].data.fd == listen_sock) {
                       addrlen = sizeof(local);
                       conn_sock = accept(listen_sock,
                                       (struct sockaddr *) &local, &addrlen);
                       if (conn_sock == -1) {
                           perror("accept");
                           exit(EXIT_FAILURE);
                       }
                       /* New connections are watched edge-triggered */
                       setnonblocking(conn_sock);
                       ev.events = EPOLLIN | EPOLLET;
                       ev.data.fd = conn_sock;
                       if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                                   &ev) == -1) {
                           perror("epoll_ctl: conn_sock");
                           exit(EXIT_FAILURE);
                       }
                   } else {
                       do_use_fd(events[n].data.fd);
                   }
               }
           }

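       The advice to record the current state after EAGAIN can be sketched
       for the write side as follows.  The struct out_state and the function
       resume_write() are hypothetical names; the sketch assumes one pending
       output buffer per file descriptor and shows only one possible shape
       for the body of a function like do_use_fd().

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-descriptor state: what to send and how far we got. */
struct out_state {
    const char *buf;
    size_t len;
    size_t off;
};

/* Continue writing from where the previous call stopped.  Returns 1 when
   everything has been written, 0 if write(2) reported EAGAIN (wait for
   the next event, then call again), or -1 on any other error. */
int resume_write(int fd, struct out_state *st)
{
    while (st->off < st->len) {
        ssize_t n = write(fd, st->buf + st->off, st->len - st->off);

        if (n > 0) {
            st->off += (size_t) n;   /* remember progress for next call */
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        return -1;
    }
    return 1;
}
```
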
       When used as an edge-triggered interface, for performance reasons, it
       is possible to add the file descriptor inside the epoll interface
       (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).  This allows you
       to avoid continuously switching between EPOLLIN and EPOLLOUT by calling
       epoll_ctl(2) with EPOLL_CTL_MOD.

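       A minimal sketch of such a one-time registration; the helper name
       add_rw_edge() is hypothetical.

```c
#include <sys/epoll.h>

/* Hypothetical helper: register fd once for both input and output in
   edge-triggered mode.  Returns 0, or -1 with errno set. */
int add_rw_edge(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```

       The caller then checks the EPOLLIN and EPOLLOUT bits in the events
       field returned by epoll_wait(2) to see which direction became ready.
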
   Questions and Answers
       Q0  What is the key used to distinguish the file descriptors registered
           in an epoll set?

       A0  The key is the combination of the file descriptor number and the
           open file description (also known as an "open file handle", the
           kernel's internal representation of an open file).

       Q1  What happens if you register the same file descriptor on an epoll
           instance twice?

       A1  You will probably get EEXIST.  However, it is possible to add a
           duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD) descriptor to the
           same epoll instance.  This can be a useful technique for filtering
           events, if the duplicate file descriptors are registered with
           different event masks.

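       Both halves of A1 can be sketched as follows.  The helpers try_add()
       and add_split() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: add fd to epfd.  Returns 0 on success, otherwise
   the errno value (EEXIST if fd is already registered). */
int try_add(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { 0 };

    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1 ? errno : 0;
}

/* Hypothetical helper: register fd for input and a duplicate of it for
   output, so the two event masks are reported under different
   descriptors.  Returns the duplicate, or -1 on error (in which case fd
   itself may already have been registered). */
int add_split(int epfd, int fd)
{
    int dupfd = dup(fd);

    if (dupfd == -1)
        return -1;
    if (try_add(epfd, fd, EPOLLIN) != 0 ||
        try_add(epfd, dupfd, EPOLLOUT) != 0) {
        close(dupfd);
        return -1;
    }
    return dupfd;
}
```
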
       Q2  Can two epoll instances wait for the same file descriptor?  If so,
           are events reported to both epoll file descriptors?

       A2  Yes, and events would be reported to both.  However, careful
           programming may be needed to do this correctly.

       Q3  Is the epoll file descriptor itself poll/epoll/selectable?

       A3  Yes.  If an epoll file descriptor has events waiting, then it will
           be indicated as being readable.

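       The answer to Q3 can be checked with poll(2), as in this sketch.
       Both function names are hypothetical; the second one is only a small
       usage demonstration on a fresh pipe.

```c
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the epoll instance has events waiting,
   0 if not, -1 on error.  Does not consume the events. */
int epfd_is_readable(int epfd)
{
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };

    if (poll(&pfd, 1, 0) == -1)
        return -1;
    return (pfd.revents & POLLIN) ? 1 : 0;
}

/* Usage sketch: returns 1 if the epoll descriptor is not readable before
   the watched pipe has data and is readable afterwards. */
int epfd_readable_demo(void)
{
    struct epoll_event ev = { 0 };
    int p[2], epfd, before = -1, after = -1;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        before = epfd_is_readable(epfd);
        if (write(p[1], "x", 1) == 1)
            after = epfd_is_readable(epfd);
    }
    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return before == 0 && after == 1;
}
```
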
       Q4  What happens if one attempts to put an epoll file descriptor into
           its own file descriptor set?

       A4  The epoll_ctl(2) call will fail (EINVAL).  However, you can add an
           epoll file descriptor inside another epoll file descriptor set.

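       A short sketch of both cases in A4; the helper name nest_epoll() is
       hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper: watch the epoll instance 'inner' from 'outer'.
   Returns 0 on success, otherwise the errno value (EINVAL when inner and
   outer are the same descriptor). */
int nest_epoll(int outer, int inner)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = inner;
    return epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1 ? errno : 0;
}
```
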
       Q5  Can I send an epoll file descriptor over a Unix domain socket to
           another process?

       A5  Yes, but it does not make sense to do this, since the receiving
           process would not have copies of the file descriptors in the epoll
           set.

       Q6  Will closing a file descriptor cause it to be removed from all
           epoll sets automatically?

       A6  Yes, but be aware of the following point.  A file descriptor is a
           reference to an open file description (see open(2)).  Whenever a
           descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or
           fork(2), a new file descriptor referring to the same open file
           description is created.  An open file description continues to
           exist until all file descriptors referring to it have been closed.
           A file descriptor is removed from an epoll set only after all the
           file descriptors referring to the underlying open file description
           have been closed (or before if the descriptor is explicitly removed
           using epoll_ctl(2) EPOLL_CTL_DEL).  This means that even after a
           file descriptor that is part of an epoll set has been closed,
           events may be reported for that file descriptor if other file
           descriptors referring to the same underlying open file description
           remain open.

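       The caveat in A6 can be observed with a sketch along these lines.
       The function name closed_fd_still_reports() is hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe, duplicate it, close the registered
   descriptor, then make the pipe readable.  Returns the number of events
   epoll_wait(2) reports, or -1 if the setup failed. */
int closed_fd_still_reports(void)
{
    struct epoll_event ev = { 0 }, out;
    int p[2], dupfd, epfd, n = -1;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    dupfd = dup(p[0]);          /* second reference to the description */

    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epfd != -1 && dupfd != -1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        close(p[0]);            /* not removed: dupfd is still open */
        p[0] = -1;
        if (write(p[1], "x", 1) == 1)
            n = epoll_wait(epfd, &out, 1, 0);
    }

    if (p[0] != -1)
        close(p[0]);
    if (dupfd != -1)
        close(dupfd);
    if (epfd != -1)
        close(epfd);
    close(p[1]);
    return n;
}
```

       An explicit EPOLL_CTL_DEL before close(2) avoids the stale event.
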
       Q7  If more than one event occurs between epoll_wait(2) calls, are they
           combined or reported separately?

       A7  They will be combined.

       Q8  Does an operation on a file descriptor affect the already collected
           but not yet reported events?

       A8  You can do two operations on an existing file descriptor: remove
           and modify.  Remove would be meaningless for this case.  Modify
           will re-read available I/O.

       Q9  Do I need to continuously read/write a file descriptor until EAGAIN
           when using the EPOLLET flag (edge-triggered behavior)?

       A9  Receiving an event from epoll_wait(2) should suggest to you that
           the file descriptor is ready for the requested I/O operation.  You
           must consider it ready until the next (non-blocking) read/write
           yields EAGAIN.  When and how you will use the file descriptor is
           entirely up to you.

           For packet/token-oriented files (e.g., datagram socket, terminal in
           canonical mode), the only way to detect the end of the read/write
           I/O space is to continue to read/write until EAGAIN.

           For stream-oriented files (e.g., pipe, FIFO, stream socket), the
           condition that the read/write I/O space is exhausted can also be
           detected by checking the amount of data read from / written to the
           target file descriptor.  For example, if you call read(2) by asking
           to read a certain amount of data and read(2) returns a lower number
           of bytes, you can be sure of having exhausted the read I/O space
           for the file descriptor.  The same is true when writing using
           write(2).  (Avoid this latter technique if you cannot guarantee
           that the monitored file descriptor always refers to a
           stream-oriented file.)

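       The short-read check for stream-oriented files might be sketched as
       follows.  The helper read_chunk() is a hypothetical name, and the
       sketch assumes a stream-oriented descriptor, as the answer requires.
       Note that a return of 0 bytes (end of file) also counts as short.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read up to 'want' bytes into buf and store the
   count in *got.  Returns 1 if fewer than 'want' bytes arrived (read I/O
   space exhausted, or end of file), 0 if the buffer was filled (more
   data may remain), or -1 on error, including EAGAIN. */
int read_chunk(int fd, char *buf, size_t want, size_t *got)
{
    ssize_t n = read(fd, buf, want);

    if (n == -1)
        return -1;
    *got = (size_t) n;
    return *got < want;
}
```
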
   Possible Pitfalls and Ways to Avoid Them
       o Starvation (edge-triggered)

       If there is a large amount of I/O space, it is possible that by trying
       to drain it the other files will not get processed, causing starvation.
       (This problem is not specific to epoll.)

       The solution is to maintain a ready list and mark the file descriptor
       as ready in its associated data structure, thereby allowing the
       application to remember which files need to be processed but still
       round robin amongst all the ready files.  This also supports ignoring
       subsequent events you receive for file descriptors that are already
       ready.

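       One possible shape for such a ready list is sketched below.  The
       struct conn, its ready flag, and service_round() are hypothetical
       names.  The sketch assumes the event loop sets ready when
       epoll_wait(2) reports a descriptor, and it bounds the work done per
       descriptor in each pass so that no single descriptor monopolizes
       the loop.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-descriptor record kept by the application. */
struct conn {
    int fd;
    int ready;      /* set when epoll_wait(2) reports fd */
};

/* One round-robin pass: read at most 'budget' bytes from every ready
   descriptor.  A descriptor stays ready until read(2) reports EAGAIN;
   the sketch also clears the flag on end of file or error, where a real
   program would close the descriptor.  Returns how many descriptors are
   still ready after the pass. */
size_t service_round(struct conn *conns, size_t nconns, size_t budget)
{
    char buf[4096];
    size_t i, still_ready = 0;

    if (budget > sizeof(buf))
        budget = sizeof(buf);

    for (i = 0; i < nconns; i++) {
        ssize_t n;

        if (!conns[i].ready)
            continue;
        n = read(conns[i].fd, buf, budget);
        if (n > 0 || (n == -1 && errno == EINTR)) {
            still_ready++;          /* revisit in the next pass */
            continue;
        }
        conns[i].ready = 0;         /* EAGAIN, end of file, or error */
    }
    return still_ready;
}
```

       The event loop would call service_round() repeatedly, with a zero
       timeout in epoll_wait(2) while the return value is nonzero, and
       block in epoll_wait(2) only once nothing is left on the ready list.
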
       o If using an event cache...

       If you use an event cache or store all the file descriptors returned
       from epoll_wait(2), then make sure to provide a way to mark the closure
       of a file descriptor dynamically (i.e., caused by a previous event's
       processing).  Suppose you receive 100 events from epoll_wait(2), and in
       event #47 a condition causes event #13 to be closed.  If you remove the
       structure and close(2) the file descriptor for event #13, then your
       event cache might still say there are events waiting for that file
       descriptor, causing confusion.

       One solution for this is to call, during the processing of event #47,
       epoll_ctl(EPOLL_CTL_DEL) to delete file descriptor 13 and close(2) it,
       then mark its associated data structure as removed and link it to a
       cleanup list.  If you find another event for file descriptor 13 in your
       batch processing, you will discover the file descriptor had been
       previously removed and there will be no confusion.
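
       The solution described above could be sketched as follows.  The
       struct cached_conn, conn_retire(), and conn_reap() are hypothetical
       names, and the sketch assumes each record was allocated with
       malloc(3) and registered through the data.ptr field of struct
       epoll_event.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical cache entry, reachable via events[n].data.ptr. */
struct cached_conn {
    int fd;
    int removed;                        /* set once the fd is closed */
    struct cached_conn *next_cleanup;
};

static struct cached_conn *cleanup_list;

/* Called while processing a batch: deregister and close the descriptor,
   but keep the record alive until the batch is finished. */
void conn_retire(int epfd, struct cached_conn *c)
{
    struct epoll_event unused = { 0 };  /* ignored by EPOLL_CTL_DEL */

    if (c->removed)
        return;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, &unused);
    close(c->fd);
    c->removed = 1;
    c->next_cleanup = cleanup_list;     /* link onto the cleanup list */
    cleanup_list = c;
}

/* Called after the whole batch has been processed.  Returns the number
   of records freed. */
size_t conn_reap(void)
{
    size_t freed = 0;

    while (cleanup_list != NULL) {
        struct cached_conn *c = cleanup_list;

        cleanup_list = c->next_cleanup;
        free(c);
        freed++;
    }
    return freed;
}
```

       While walking the batch, the loop would skip any entry whose removed
       flag is set, and call conn_reap() only after the last event has
       been handled.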

VERSIONS

       The epoll API was introduced in Linux kernel 2.5.44.  Support was added
       to glibc in version 2.3.2.

CONFORMING TO

       The epoll API is Linux-specific.  Some other systems provide similar
       mechanisms, for example, FreeBSD has kqueue, and Solaris has /dev/poll.

SEE ALSO

       epoll_create(2), epoll_create1(2), epoll_ctl(2), epoll_wait(2)

COLOPHON

       This page is part of release 3.22 of the Linux man-pages project.  A
       description of the project, and information about reporting bugs, can
       be found at http://www.kernel.org/doc/man-pages/.

Linux                             2009-02-01                          EPOLL(7)