EPOLL(7)                   Linux Programmer's Manual                  EPOLL(7)

NAME

       epoll - I/O event notification facility

SYNOPSIS

       #include <sys/epoll.h>

DESCRIPTION

       The epoll API performs a similar task to poll(2): monitoring multiple
       file descriptors to see if I/O is possible on any of them.  The epoll
       API can be used either as an edge-triggered or a level-triggered
       interface and scales well to large numbers of watched file
       descriptors.  The following system calls are provided to create and
       manage an epoll instance:

       *  epoll_create(2) creates a new epoll instance and returns a file
          descriptor referring to that instance.  (The more recent
          epoll_create1(2) extends the functionality of epoll_create(2).)

       *  Interest in particular file descriptors is then registered via
          epoll_ctl(2).  The set of file descriptors currently registered on
          an epoll instance is sometimes called an epoll set.

       *  epoll_wait(2) waits for I/O events, blocking the calling thread if
          no events are currently available.
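
       The three calls listed above can be combined as in the following
       minimal sketch.  The helper name wait_readable() and its return
       convention are assumptions made for illustration; they are not part
       of the epoll API.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance, register fd for
   level-triggered EPOLLIN, and wait up to timeout_ms milliseconds.
   Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { 0 }, out;
    int epfd, nfds;

    assert(fd >= 0);

    epfd = epoll_create1(0);                 /* create the instance */
    if (epfd == -1)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {  /* register */
        close(epfd);
        return -1;
    }

    nfds = epoll_wait(epfd, &out, 1, timeout_ms);         /* wait */
    close(epfd);
    return nfds;
}
```

       A long-running program would normally create the epoll instance
       once and reuse it, rather than creating one per call as this sketch
       does.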

   Level-triggered and edge-triggered
       The epoll event distribution interface can behave either as
       edge-triggered (ET) or as level-triggered (LT).  The difference
       between the two mechanisms can be described as follows.  Suppose that
       this scenario happens:

       1. The file descriptor that represents the read side of a pipe (rfd) is
          registered on the epoll instance.

       2. A pipe writer writes 2 kB of data on the write side of the pipe.

       3. A call to epoll_wait(2) is done that will return rfd as a ready file
          descriptor.

       4. The pipe reader reads 1 kB of data from rfd.

       5. A call to epoll_wait(2) is done.

       If the rfd file descriptor has been added to the epoll interface using
       the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in
       step 5 will probably hang even though data is still available in the
       file input buffer; meanwhile the remote peer might be expecting a
       response based on the data it already sent.  The reason for this is
       that edge-triggered mode delivers events only when changes occur on the
       monitored file descriptor.  So, in step 5 the caller might end up
       waiting for some data that is already present inside the input buffer.
       In the above example, an event on rfd will be generated because of the
       write done in step 2, and the event is consumed in step 3.  Since the
       read operation done in step 4 does not consume the whole buffer data,
       the call to epoll_wait(2) done in step 5 might block indefinitely.
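
       The five steps above can be replayed on a real pipe.  The following
       is a sketch only: the function name replay_scenario() is
       hypothetical, and step 5 uses a zero timeout so that the
       edge-triggered case returns 0 instead of blocking.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps 1-5 on a fresh pipe.  extra_flags is 0 for
   level-triggered or EPOLLET for edge-triggered operation.  Returns the
   result of the step-5 epoll_wait() call, or -1 on error. */
int replay_scenario(unsigned int extra_flags)
{
    char buf[2048];
    struct epoll_event ev = { 0 }, out;
    int pfd[2], epfd, n = -1;

    assert(extra_flags == 0 || extra_flags == EPOLLET);

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto done;

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)      /* step 1 */
        goto done;

    memset(buf, 'x', sizeof(buf));
    if (write(pfd[1], buf, sizeof(buf)) != (ssize_t) sizeof(buf)) /* step 2 */
        goto done;
    if (epoll_wait(epfd, &out, 1, 0) != 1)                      /* step 3 */
        goto done;
    if (read(pfd[0], buf, 1024) != 1024)                        /* step 4 */
        goto done;
    n = epoll_wait(epfd, &out, 1, 0);                           /* step 5 */

done:
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

       With extra_flags set to 0, the step-5 call reports rfd again because
       1 kB is still unread; with EPOLLET it reports nothing, which is the
       situation in which a blocking epoll_wait(2) would hang.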

       An application that employs the EPOLLET flag should use nonblocking
       file descriptors to avoid having a blocking read or write starve a task
       that is handling multiple file descriptors.  The suggested way to use
       epoll as an edge-triggered (EPOLLET) interface is as follows:

              i   with nonblocking file descriptors; and

              ii  by waiting for an event only after read(2) or write(2)
                  return EAGAIN.
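
       Rule ii amounts to draining the file descriptor before going back
       to epoll_wait(2).  A minimal sketch for the read side follows; the
       helper name drain_fd() is hypothetical, and the data is simply
       discarded where a real program would process it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read from a nonblocking fd until read(2) fails
   with EAGAIN (or end-of-file is reached).  Returns the number of bytes
   consumed, or -1 on any other error. */
long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    /* The loop is only safe on a nonblocking descriptor. */
    assert(fcntl(fd, F_GETFL) & O_NONBLOCK);

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;             /* process buf[0..n) here */
            continue;
        }
        if (n == 0)
            return total;           /* end-of-file */
        if (errno == EINTR)
            continue;               /* interrupted: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;           /* drained: safe to wait again */
        return -1;
    }
}
```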

       By contrast, when used as a level-triggered interface (the default,
       when EPOLLET is not specified), epoll is simply a faster poll(2), and
       can be used wherever the latter is used since it shares the same
       semantics.

       Since even with edge-triggered epoll, multiple events can be generated
       upon receipt of multiple chunks of data, the caller has the option to
       specify the EPOLLONESHOT flag, to tell epoll to disable the associated
       file descriptor after the receipt of an event with epoll_wait(2).  When
       the EPOLLONESHOT flag is specified, it is the caller's responsibility
       to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.
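
       The EPOLLONESHOT cycle can be sketched as follows.  The names
       arm_oneshot() and oneshot_demo() are hypothetical; the demo uses a
       pipe and zero timeouts purely so that each step can be observed.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: (re)arm fd for a single EPOLLIN notification.
   op is EPOLL_CTL_ADD for the first registration and EPOLL_CTL_MOD to
   rearm after an event has been delivered. */
int arm_oneshot(int epfd, int op, int fd)
{
    struct epoll_event ev = { 0 };

    assert(op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Fills results[] with three epoll_wait() return values: after the
   initial registration, after the event was delivered once, and after
   rearming.  Returns 0 on success, -1 on error. */
int oneshot_demo(int results[3])
{
    struct epoll_event out;
    int pfd[2], epfd, rc = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1 || write(pfd[1], "x", 1) != 1 ||
        arm_oneshot(epfd, EPOLL_CTL_ADD, pfd[0]) == -1)
        goto done;

    results[0] = epoll_wait(epfd, &out, 1, 0);  /* event delivered */
    results[1] = epoll_wait(epfd, &out, 1, 0);  /* fd is now disabled */
    if (arm_oneshot(epfd, EPOLL_CTL_MOD, pfd[0]) == -1)
        goto done;
    results[2] = epoll_wait(epfd, &out, 1, 0);  /* reported again */
    rc = 0;

done:
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```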

   Interaction with autosleep
       If the system is in autosleep mode via /sys/power/autosleep and an
       event happens which wakes the device from sleep, the device driver will
       keep the device awake only until that event is queued.  To keep the
       device awake until the event has been processed, it is necessary to use
       the epoll_ctl(2) EPOLLWAKEUP flag.

       When the EPOLLWAKEUP flag is set in the events field for a struct
       epoll_event, the system will be kept awake from the moment the event is
       queued, through the epoll_wait(2) call which returns the event until
       the subsequent epoll_wait(2) call.  If the event should keep the system
       awake beyond that time, then a separate wake_lock should be taken
       before the second epoll_wait(2) call.

   /proc interfaces
       The following interfaces can be used to limit the amount of kernel
       memory consumed by epoll:

       /proc/sys/fs/epoll/max_user_watches (since Linux 2.6.28)
              This specifies a limit on the total number of file descriptors
              that a user can register across all epoll instances on the
              system.  The limit is per real user ID.  Each registered file
              descriptor costs roughly 90 bytes on a 32-bit kernel, and
              roughly 160 bytes on a 64-bit kernel.  Currently, the default
              value for max_user_watches is 1/25 (4%) of the available low
              memory, divided by the registration cost in bytes.

   Example for suggested usage
       While the usage of epoll when employed as a level-triggered interface
       has the same semantics as poll(2), the edge-triggered usage requires
       more clarification to avoid stalls in the application event loop.  In
       this example, listen_sock is a nonblocking socket on which listen(2)
       has been called.  The function do_use_fd() uses the new ready file
       descriptor until EAGAIN is returned by either read(2) or write(2).
       An event-driven state machine application should, after having received
       EAGAIN, record its current state so that at the next call to
       do_use_fd() it will continue to read(2) or write(2) from where it
       stopped before.

           #define MAX_EVENTS 10
           struct epoll_event ev, events[MAX_EVENTS];
           int listen_sock, conn_sock, nfds, epollfd, n;
           struct sockaddr_storage addr;   /* peer address from accept() */
           socklen_t addrlen;

           /* Code to set up listening socket, 'listen_sock',
              (socket(), bind(), listen()) omitted */

           epollfd = epoll_create1(0);
           if (epollfd == -1) {
               perror("epoll_create1");
               exit(EXIT_FAILURE);
           }

           /* The listening socket is watched level-triggered. */
           ev.events = EPOLLIN;
           ev.data.fd = listen_sock;
           if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
               perror("epoll_ctl: listen_sock");
               exit(EXIT_FAILURE);
           }

           for (;;) {
               nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
               if (nfds == -1) {
                   perror("epoll_wait");
                   exit(EXIT_FAILURE);
               }

               for (n = 0; n < nfds; ++n) {
                   if (events[n].data.fd == listen_sock) {
                       addrlen = sizeof(addr);
                       conn_sock = accept(listen_sock,
                                          (struct sockaddr *) &addr, &addrlen);
                       if (conn_sock == -1) {
                           perror("accept");
                           exit(EXIT_FAILURE);
                       }
                       /* New connections are watched edge-triggered. */
                       setnonblocking(conn_sock);
                       ev.events = EPOLLIN | EPOLLET;
                       ev.data.fd = conn_sock;
                       if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                                   &ev) == -1) {
                           perror("epoll_ctl: conn_sock");
                           exit(EXIT_FAILURE);
                       }
                   } else {
                       do_use_fd(events[n].data.fd);
                   }
               }
           }
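
       The example calls setnonblocking() without defining it.  One
       possible definition, given here as an assumption about what the
       helper is meant to do, sets O_NONBLOCK with fcntl(2) while
       preserving the other file status flags:

```c
#include <assert.h>
#include <fcntl.h>

/* Possible definition of the setnonblocking() helper used in the
   example: add O_NONBLOCK to the file status flags of fd.
   Returns 0 on success, -1 on error (errno is set by fcntl(2)). */
int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```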

       When used as an edge-triggered interface, for performance reasons, it
       is possible to add the file descriptor inside the epoll interface
       (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).  This allows you
       to avoid continuously switching between EPOLLIN and EPOLLOUT by calling
       epoll_ctl(2) with EPOLL_CTL_MOD.

   Questions and answers
       Q0  What is the key used to distinguish the file descriptors registered
           in an epoll set?

       A0  The key is the combination of the file descriptor number and the
           open file description (also known as an "open file handle", the
           kernel's internal representation of an open file).

       Q1  What happens if you register the same file descriptor on an epoll
           instance twice?

       A1  You will probably get EEXIST.  However, it is possible to add a
           duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD) file descriptor to
           the same epoll instance.  This can be a useful technique for
           filtering events, if the duplicate file descriptors are registered
           with different event masks.
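
       The behavior described in A1 can be checked with a short sketch.
       The helper names add_fd() and check_duplicate_registration() are
       hypothetical, and a UNIX domain socket pair stands in for whatever
       file the application monitors.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: register fd on epfd with the given event mask.
   Returns 0 on success, or the errno value on failure. */
int add_fd(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev = { 0 };

    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1 ? errno : 0;
}

/* Returns 1 if registering the same descriptor twice fails with EEXIST
   while registering a dup(2) of it with a different mask succeeds,
   0 if the behavior differs, and -1 if the setup itself failed. */
int check_duplicate_registration(void)
{
    int sv[2], epfd, dupfd, first, second, third;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    epfd = epoll_create1(0);
    dupfd = dup(sv[0]);
    assert(dupfd != sv[0]);     /* a new number, same open file description */

    first = add_fd(epfd, sv[0], EPOLLIN);    /* expected: 0 */
    second = add_fd(epfd, sv[0], EPOLLOUT);  /* expected: EEXIST */
    third = add_fd(epfd, dupfd, EPOLLOUT);   /* expected: 0 */

    if (dupfd != -1)
        close(dupfd);
    if (epfd != -1)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    if (epfd == -1 || dupfd == -1)
        return -1;
    return first == 0 && second == EEXIST && third == 0;
}
```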

       Q2  Can two epoll instances wait for the same file descriptor?  If so,
           are events reported to both epoll file descriptors?

       A2  Yes, and events would be reported to both.  However, careful
           programming may be needed to do this correctly.

       Q3  Is the epoll file descriptor itself poll/epoll/selectable?

       A3  Yes.  If an epoll file descriptor has events waiting, then it is
           indicated as being readable.

       Q4  What happens if one attempts to put an epoll file descriptor into
           its own file descriptor set?

       A4  The epoll_ctl(2) call fails (EINVAL).  However, you can add an
           epoll file descriptor inside another epoll file descriptor set.
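
       A3 and A4 can be illustrated together.  In the sketch below, which
       uses the hypothetical name nesting_demo(), an "inner" epoll
       instance is first added to itself (which is rejected) and then to
       an "outer" instance, which reports the inner one as readable once
       the inner one has an event pending.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* On success returns 0 and stores:
     *self_errno   errno from adding the inner epoll fd to itself
                   (0 if the call unexpectedly succeeded);
     *outer_ready  number of events the outer instance reports.
   Returns -1 if the setup failed. */
int nesting_demo(int *self_errno, int *outer_ready)
{
    struct epoll_event ev = { 0 }, out;
    int pfd[2], inner, outer, rc = -1;

    assert(self_errno != NULL && outer_ready != NULL);

    if (pipe(pfd) == -1)
        return -1;
    inner = epoll_create1(0);
    outer = epoll_create1(0);
    if (inner == -1 || outer == -1)
        goto done;

    /* A4: an epoll fd cannot be a member of its own set. */
    ev.events = EPOLLIN;
    ev.data.fd = inner;
    *self_errno =
        epoll_ctl(inner, EPOLL_CTL_ADD, inner, &ev) == -1 ? errno : 0;

    /* A4: nesting inside another epoll instance is allowed. */
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1)
        goto done;

    /* A3: give the inner instance a pending event. */
    ev.data.fd = pfd[0];
    if (epoll_ctl(inner, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto done;
    if (write(pfd[1], "x", 1) != 1)
        goto done;

    *outer_ready = epoll_wait(outer, &out, 1, 0);
    rc = 0;

done:
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```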

       Q5  Can I send an epoll file descriptor over a UNIX domain socket to
           another process?

       A5  Yes, but it does not make sense to do this, since the receiving
           process would not have copies of the file descriptors in the epoll
           set.

       Q6  Will closing a file descriptor cause it to be removed from all
           epoll sets automatically?

       A6  Yes, but be aware of the following point.  A file descriptor is a
           reference to an open file description (see open(2)).  Whenever a
           file descriptor is duplicated via dup(2), dup2(2), fcntl(2)
           F_DUPFD, or fork(2), a new file descriptor referring to the same
           open file description is created.  An open file description
           continues to exist until all file descriptors referring to it have
           been closed.  A file descriptor is removed from an epoll set only
           after all the file descriptors referring to the underlying open
           file description have been closed (or before if the file
           descriptor is explicitly removed using epoll_ctl(2)
           EPOLL_CTL_DEL).  This means that even after a file descriptor that
           is part of an epoll set has been closed, events may be reported
           for that file descriptor if other file descriptors referring to
           the same underlying file description remain open.
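
       The caveat in A6 can be reproduced with a pipe and dup(2).  This is
       a sketch under the stated assumptions; the name closed_fd_demo() is
       hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe, duplicates it, closes the
   registered descriptor, then makes the pipe readable.  Returns the
   number of events reported (or -1 on setup failure) and stores the
   closed descriptor number and the number carried by the event. */
int closed_fd_demo(int *closed_fd, int *reported_fd)
{
    struct epoll_event ev = { 0 }, out = { 0 };
    int pfd[2], epfd, dupfd = -1, n = -1;

    assert(closed_fd != NULL && reported_fd != NULL);

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epfd == -1 || epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto done;

    dupfd = dup(pfd[0]);        /* keeps the open file description alive */
    if (dupfd == -1)
        goto done;
    close(pfd[0]);              /* the registered number is now closed */
    *closed_fd = pfd[0];
    pfd[0] = -1;

    if (write(pfd[1], "x", 1) != 1)
        goto done;
    n = epoll_wait(epfd, &out, 1, 0);
    if (n == 1)
        *reported_fd = out.data.fd;

done:
    if (dupfd != -1)
        close(dupfd);
    if (pfd[0] != -1)
        close(pfd[0]);
    close(pfd[1]);
    if (epfd != -1)
        close(epfd);
    return n;
}
```

       The event still carries the number of the descriptor that was
       closed, which is why an explicit EPOLL_CTL_DEL before close(2) is
       the safer order when duplicates may exist.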

       Q7  If more than one event occurs between epoll_wait(2) calls, are they
           combined or reported separately?

       A7  They will be combined.

       Q8  Does an operation on a file descriptor affect the already collected
           but not yet reported events?

       A8  You can do two operations on an existing file descriptor.  Remove
           would be meaningless for this case.  Modify will reread available
           I/O.

       Q9  Do I need to continuously read/write a file descriptor until EAGAIN
           when using the EPOLLET flag (edge-triggered behavior)?

       A9  Receiving an event from epoll_wait(2) should suggest to you that
           the file descriptor is ready for the requested I/O operation.  You
           must consider it ready until the next (nonblocking) read/write
           yields EAGAIN.  When and how you will use the file descriptor is
           entirely up to you.

           For packet/token-oriented files (e.g., datagram socket, terminal in
           canonical mode), the only way to detect the end of the read/write
           I/O space is to continue to read/write until EAGAIN.

           For stream-oriented files (e.g., pipe, FIFO, stream socket), the
           condition that the read/write I/O space is exhausted can also be
           detected by checking the amount of data read from / written to the
           target file descriptor.  For example, if you call read(2) by asking
           to read a certain amount of data and read(2) returns a lower number
           of bytes, you can be sure of having exhausted the read I/O space
           for the file descriptor.  The same is true when writing using
           write(2).  (Avoid this latter technique if you cannot guarantee
           that the monitored file descriptor always refers to a
           stream-oriented file.)

   Possible pitfalls and ways to avoid them
       o Starvation (edge-triggered)

       If there is a large amount of I/O space, it is possible that by trying
       to drain it the other files will not get processed, causing starvation.
       (This problem is not specific to epoll.)

       The solution is to maintain a ready list and mark the file descriptor
       as ready in its associated data structure, thereby allowing the
       application to remember which files need to be processed but still
       round robin amongst all the ready files.  This also supports ignoring
       subsequent events you receive for file descriptors that are already
       ready.
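
       One way to realize such a ready list is sketched below.  Everything
       in it is an assumption made for illustration: struct conn, the
       SLICE budget, and the helper names are hypothetical, and an array
       with a ready flag stands in for a real list.  Each ready descriptor
       gets one bounded read per pass, so a busy descriptor cannot starve
       the others.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define SLICE 1024          /* hypothetical per-turn byte budget */

struct conn {
    int fd;                 /* nonblocking file descriptor */
    int ready;              /* set when epoll_wait() reports the fd */
    long consumed;          /* bytes read so far (illustration only) */
};

/* Initialize c for fd and switch fd to nonblocking mode. */
int conn_init(struct conn *c, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    c->fd = fd;
    c->ready = 0;
    c->consumed = 0;
    return flags == -1 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Give every ready connection one turn of at most SLICE bytes.
   Returns the number of connections that are still marked ready. */
int service_ready(struct conn *conns, int nconns)
{
    char buf[SLICE];
    int i, still_ready = 0;

    assert(conns != NULL && nconns >= 0);

    for (i = 0; i < nconns; i++) {
        ssize_t n;

        if (!conns[i].ready)
            continue;
        n = read(conns[i].fd, buf, sizeof(buf));
        if (n > 0) {
            conns[i].consumed += n;     /* more may remain: stay ready */
            still_ready++;
        } else if (n == -1 && errno == EINTR) {
            still_ready++;              /* retry on the next pass */
        } else {
            /* EAGAIN, end-of-file, or error: leave the ready list.
               A real program would tell these cases apart. */
            conns[i].ready = 0;
        }
    }
    return still_ready;
}
```

       The event loop would set ready when epoll_wait(2) reports a
       descriptor and call service_ready() until it returns 0, polling
       with a zero timeout in between so that new events are still
       picked up.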

       o If using an event cache...

       If you use an event cache or store all the file descriptors returned
       from epoll_wait(2), then make sure to provide a way to mark its closure
       dynamically (i.e., caused by a previous event's processing).  Suppose
       you receive 100 events from epoll_wait(2), and in event #47 a condition
       causes event #13 to be closed.  If you remove the structure and
       close(2) the file descriptor for event #13, then your event cache might
       still say there are events waiting for that file descriptor, causing
       confusion.

       One solution for this is to call, during the processing of event 47,
       epoll_ctl(EPOLL_CTL_DEL) to delete file descriptor 13 and close(2),
       then mark its associated data structure as removed and link it to a
       cleanup list.  If you find another event for file descriptor 13 in your
       batch processing, you will discover the file descriptor had been
       previously removed and there will be no confusion.
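
       The bookkeeping part of that solution can be sketched without any
       system calls.  All names below (struct conn, its peer field,
       conn_retire(), process_batch()) are hypothetical; the peer field
       merely models "processing this event closes another descriptor".

```c
#include <assert.h>
#include <stddef.h>

struct conn {
    int fd;
    int removed;             /* set once the fd was deleted and closed */
    struct conn *peer;       /* connection to close when this one fires */
    struct conn *next_dead;  /* link on the cleanup list */
};

/* Mark c as removed and push it on the cleanup list.  A real program
   would call epoll_ctl(EPOLL_CTL_DEL) and close(2) on c->fd first. */
void conn_retire(struct conn *c, struct conn **cleanup)
{
    assert(c != NULL && cleanup != NULL);
    if (c->removed)
        return;
    c->removed = 1;
    c->next_dead = *cleanup;
    *cleanup = c;
}

/* Process one batch, where batch[i] is the pointer that was stored in
   epoll_event.data.ptr.  Events for connections retired earlier in the
   same batch are skipped.  Returns the number of events handled. */
int process_batch(struct conn **batch, int n, struct conn **cleanup)
{
    int i, handled = 0;

    for (i = 0; i < n; i++) {
        struct conn *c = batch[i];

        if (c->removed)
            continue;        /* stale event: the fd is already gone */
        if (c->peer != NULL)
            conn_retire(c->peer, cleanup);
        handled++;
    }
    return handled;
}
```

       The structures on the cleanup list are freed only after the whole
       batch has been processed, so a stale event never dereferences freed
       memory.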

VERSIONS

       The epoll API was introduced in Linux kernel 2.5.44.  Support was added
       to glibc in version 2.3.2.

CONFORMING TO

       The epoll API is Linux-specific.  Some other systems provide similar
       mechanisms, for example, FreeBSD has kqueue, and Solaris has /dev/poll.

NOTES

       The set of file descriptors that is being monitored via an epoll file
       descriptor can be viewed via the entry for the epoll file descriptor in
       the process's /proc/[pid]/fdinfo directory.  See proc(5) for further
       details.

       The kcmp(2) KCMP_EPOLL_TFD operation can be used to test whether a file
       descriptor is present in an epoll instance.

COLOPHON

       This page is part of release 4.16 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2017-09-15                          EPOLL(7)