SOCKET(7) Linux Programmer's Manual SOCKET(7)

NAME
socket - Linux socket interface

SYNOPSIS
#include <sys/socket.h>

sockfd = socket(int socket_family, int socket_type, int protocol);
12
DESCRIPTION
This manual page describes the Linux networking socket layer user
interface. The BSD compatible sockets are the uniform interface between
the user process and the network protocol stacks in the kernel. The
protocol modules are grouped into protocol families such as AF_INET,
AF_IPX, and AF_PACKET, and socket types such as SOCK_STREAM or
SOCK_DGRAM. See socket(2) for more information on families and types.
20
21 Socket-layer functions
22 These functions are used by the user process to send or receive packets
23 and to do other socket operations. For more information see their re‐
24 spective manual pages.
25
socket(2) creates a socket, connect(2) connects a socket to a remote
socket address, bind(2) binds a socket to a local socket address,
listen(2) tells the socket that new connections shall be accepted, and
accept(2) is used to get a new socket with a new incoming connection.
socketpair(2) returns two connected anonymous sockets (implemented only
for a few local families like AF_UNIX).
32
send(2), sendto(2), and sendmsg(2) send data over a socket, and
recv(2), recvfrom(2), and recvmsg(2) receive data from a socket.
poll(2) and select(2) wait for arriving data or a readiness to send
data. In addition, the standard I/O operations like write(2),
writev(2), sendfile(2), read(2), and readv(2) can be used to read and
write data.
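
As a minimal sketch of the calls above (not a definitive pattern), the
following hypothetical helper, round_trip(), creates a connected AF_UNIX
pair with socketpair(2), writes a message to one end with write(2), and
reads it back from the other end with read(2). The helper name and its
error convention (-1 on failure) are assumptions made for this
illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a socketpair and read it back.
   Returns the number of bytes read into out, or -1 on error. */
ssize_t round_trip(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Ordinary file I/O calls work on a connected socket. */
    n = write(sv[0], msg, strlen(msg));
    if (n != -1)
        n = read(sv[1], out, outlen);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The same exchange could use send(2) and recv(2) instead; those calls
differ mainly in accepting a flags argument.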
38
39 getsockname(2) returns the local socket address and getpeername(2) re‐
40 turns the remote socket address. getsockopt(2) and setsockopt(2) are
41 used to set or get socket layer or protocol options. ioctl(2) can be
42 used to set or read some other options.
43
44 close(2) is used to close a socket. shutdown(2) closes parts of a
45 full-duplex socket connection.
46
47 Seeking, or calling pread(2) or pwrite(2) with a nonzero position is
48 not supported on sockets.
49
It is possible to do nonblocking I/O on sockets by setting the
O_NONBLOCK flag on a socket file descriptor using fcntl(2). Then all
operations that would block will (usually) fail with the error EAGAIN
(operation should be retried later); connect(2) will fail with the
error EINPROGRESS. The user can then wait for various events via
poll(2) or select(2).
55
┌──────────────────────────────────────────────────────────────────────┐
│                              I/O events                              │
├───────────┬───────────┬──────────────────────────────────────────────┤
│Event      │ Poll flag │ Occurrence                                   │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read       │ POLLIN    │ New data arrived.                            │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read       │ POLLIN    │ A connection setup has been completed (for   │
│           │           │ connection-oriented sockets).                │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read       │ POLLHUP   │ A disconnection request has been initiated   │
│           │           │ by the other end.                            │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read       │ POLLHUP   │ A connection is broken (only for connection- │
│           │           │ oriented protocols). When the socket is      │
│           │           │ written to, SIGPIPE is also sent.            │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Write      │ POLLOUT   │ Socket has enough send buffer space for      │
│           │           │ writing new data.                            │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read/Write │ POLLIN |  │ An outgoing connect(2) finished.             │
│           │ POLLOUT   │                                              │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read/Write │ POLLERR   │ An asynchronous error occurred.              │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Read/Write │ POLLHUP   │ The other end has shut down one direction.   │
├───────────┼───────────┼──────────────────────────────────────────────┤
│Exception  │ POLLPRI   │ Urgent data arrived. SIGURG is then sent.    │
└───────────┴───────────┴──────────────────────────────────────────────┘
85 An alternative to poll(2) and select(2) is to let the kernel inform the
86 application about events via a SIGIO signal. For that the O_ASYNC flag
87 must be set on a socket file descriptor via fcntl(2) and a valid signal
88 handler for SIGIO must be installed via sigaction(2). See the Signals
89 discussion below.
90
91 Socket address structures
92 Each socket domain has its own format for socket addresses, with a do‐
93 main-specific address structure. Each of these structures begins with
94 an integer "family" field (typed as sa_family_t) that indicates the
95 type of the address structure. This allows the various system calls
96 (e.g., connect(2), bind(2), accept(2), getsockname(2), getpeername(2)),
97 which are generic to all socket domains, to determine the domain of a
98 particular socket address.
99
100 To allow any type of socket address to be passed to interfaces in the
101 sockets API, the type struct sockaddr is defined. The purpose of this
102 type is purely to allow casting of domain-specific socket address types
103 to a "generic" type, so as to avoid compiler warnings about type mis‐
104 matches in calls to the sockets API.
105
106 In addition, the sockets API provides the data type struct sock‐
107 addr_storage. This type is suitable to accommodate all supported do‐
108 main-specific socket address structures; it is large enough and is
109 aligned properly. (In particular, it is large enough to hold IPv6
110 socket addresses.) The structure includes the following field, which
111 can be used to identify the type of socket address actually stored in
112 the structure:
113
    sa_family_t ss_family;
115
116 The sockaddr_storage structure is useful in programs that must handle
117 socket addresses in a generic way (e.g., programs that must deal with
118 both IPv4 and IPv6 socket addresses).
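
As an illustration of handling addresses generically, the sketch below
retrieves the local address of a socket into a struct sockaddr_storage
and reports its ss_family field. The helper name local_family() is
hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the address family of the local address
   of fd (e.g., AF_INET, AF_INET6, AF_UNIX), or -1 on error. */
int local_family(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    /* Casting to the generic struct sockaddr * is the intended use. */
    if (getsockname(fd, (struct sockaddr *) &ss, &len) == -1)
        return -1;
    return ss.ss_family;
}
```

Once the family is known, the caller can cast the storage to the
matching domain-specific structure (for example, struct sockaddr_in
for AF_INET).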
119
120 Socket options
121 The socket options listed below can be set by using setsockopt(2) and
122 read with getsockopt(2) with the socket level set to SOL_SOCKET for all
123 sockets. Unless otherwise noted, optval is a pointer to an int.
124
125 SO_ACCEPTCONN
126 Returns a value indicating whether or not this socket has been
127 marked to accept connections with listen(2). The value 0 indi‐
128 cates that this is not a listening socket, the value 1 indicates
129 that this is a listening socket. This socket option is read-
130 only.
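
A minimal sketch of querying this option follows; the helper name
is_listening() is an assumption made for this illustration.

```c
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if fd is a listening socket,
   0 if it is not, and -1 if getsockopt(2) fails. */
int is_listening(int fd)
{
    int val;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```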
131
132 SO_ATTACH_FILTER (since Linux 2.2), SO_ATTACH_BPF (since Linux 3.19)
133 Attach a classic BPF (SO_ATTACH_FILTER) or an extended BPF
134 (SO_ATTACH_BPF) program to the socket for use as a filter of in‐
135 coming packets. A packet will be dropped if the filter program
136 returns zero. If the filter program returns a nonzero value
137 which is less than the packet's data length, the packet will be
138 truncated to the length returned. If the value returned by the
139 filter is greater than or equal to the packet's data length, the
140 packet is allowed to proceed unmodified.
141
142 The argument for SO_ATTACH_FILTER is a sock_fprog structure, de‐
143 fined in <linux/filter.h>:
144
    struct sock_fprog {
        unsigned short      len;
        struct sock_filter *filter;
    };
149
150 The argument for SO_ATTACH_BPF is a file descriptor returned by
151 the bpf(2) system call and must refer to a program of type
152 BPF_PROG_TYPE_SOCKET_FILTER.
153
154 These options may be set multiple times for a given socket, each
155 time replacing the previous filter program. The classic and ex‐
156 tended versions may be called on the same socket, but the previ‐
157 ous filter will always be replaced such that a socket never has
158 more than one filter defined.
159
Both classic and extended BPF are explained in the kernel source
file Documentation/networking/filter.txt.
162
163 SO_ATTACH_REUSEPORT_CBPF, SO_ATTACH_REUSEPORT_EBPF
164 For use with the SO_REUSEPORT option, these options allow the
165 user to set a classic BPF (SO_ATTACH_REUSEPORT_CBPF) or an ex‐
166 tended BPF (SO_ATTACH_REUSEPORT_EBPF) program which defines how
167 packets are assigned to the sockets in the reuseport group (that
168 is, all sockets which have SO_REUSEPORT set and are using the
169 same local address to receive packets).
170
171 The BPF program must return an index between 0 and N-1 repre‐
172 senting the socket which should receive the packet (where N is
173 the number of sockets in the group). If the BPF program returns
174 an invalid index, socket selection will fall back to the plain
175 SO_REUSEPORT mechanism.
176
177 Sockets are numbered in the order in which they are added to the
178 group (that is, the order of bind(2) calls for UDP sockets or
179 the order of listen(2) calls for TCP sockets). New sockets
180 added to a reuseport group will inherit the BPF program. When a
181 socket is removed from a reuseport group (via close(2)), the
182 last socket in the group will be moved into the closed socket's
183 position.
184
185 These options may be set repeatedly at any time on any socket in
186 the group to replace the current BPF program used by all sockets
187 in the group.
188
189 SO_ATTACH_REUSEPORT_CBPF takes the same argument type as SO_AT‐
190 TACH_FILTER and SO_ATTACH_REUSEPORT_EBPF takes the same argument
191 type as SO_ATTACH_BPF.
192
193 UDP support for this feature is available since Linux 4.5; TCP
194 support is available since Linux 4.6.
195
196 SO_BINDTODEVICE
197 Bind this socket to a particular device like “eth0”, as speci‐
198 fied in the passed interface name. If the name is an empty
199 string or the option length is zero, the socket device binding
200 is removed. The passed option is a variable-length null-termi‐
201 nated interface name string with the maximum size of IFNAMSIZ.
202 If a socket is bound to an interface, only packets received from
203 that particular interface are processed by the socket. Note
204 that this works only for some socket types, particularly AF_INET
205 sockets. It is not supported for packet sockets (use normal
206 bind(2) there).
207
Before Linux 3.8, this socket option could be set, but could not
be retrieved with getsockopt(2). Since Linux 3.8, it is readable.
The optlen argument should contain the buffer size available to
receive the device name and is recommended to be IFNAMSIZ bytes.
The real device name length is reported back in the optlen
argument.
214
215 SO_BROADCAST
216 Set or get the broadcast flag. When enabled, datagram sockets
217 are allowed to send packets to a broadcast address. This option
218 has no effect on stream-oriented sockets.
219
220 SO_BSDCOMPAT
221 Enable BSD bug-to-bug compatibility. This is used by the UDP
222 protocol module in Linux 2.0 and 2.2. If enabled, ICMP errors
223 received for a UDP socket will not be passed to the user pro‐
224 gram. In later kernel versions, support for this option has
225 been phased out: Linux 2.4 silently ignores it, and Linux 2.6
226 generates a kernel warning (printk()) if a program uses this op‐
227 tion. Linux 2.0 also enabled BSD bug-to-bug compatibility op‐
228 tions (random header changing, skipping of the broadcast flag)
229 for raw sockets with this option, but that was removed in Linux
230 2.2.
231
232 SO_DEBUG
233 Enable socket debugging. Allowed only for processes with the
234 CAP_NET_ADMIN capability or an effective user ID of 0.
235
236 SO_DETACH_FILTER (since Linux 2.2), SO_DETACH_BPF (since Linux 3.19)
237 These two options, which are synonyms, may be used to remove the
238 classic or extended BPF program attached to a socket with either
239 SO_ATTACH_FILTER or SO_ATTACH_BPF. The option value is ignored.
240
241 SO_DOMAIN (since Linux 2.6.32)
242 Retrieves the socket domain as an integer, returning a value
243 such as AF_INET6. See socket(2) for details. This socket op‐
244 tion is read-only.
245
246 SO_ERROR
247 Get and clear the pending socket error. This socket option is
248 read-only. Expects an integer.
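
A common use of SO_ERROR is to learn the outcome of a nonblocking
connect(2) once the socket is reported writable. The sketch below shows
only the retrieval step; the helper name take_pending_error() is an
assumption made for this illustration.

```c
#include <sys/socket.h>

/* Hypothetical helper: fetch and clear the pending error on fd.
   Returns 0 if no error is pending, a positive errno value (such as
   ECONNREFUSED) if one is, or -1 if getsockopt(2) itself failed. */
int take_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```

Because reading the option also clears it, the value should be saved
if it is needed more than once.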
249
250 SO_DONTROUTE
251 Don't send via a gateway, send only to directly connected hosts.
252 The same effect can be achieved by setting the MSG_DONTROUTE
253 flag on a socket send(2) operation. Expects an integer boolean
254 flag.
255
256 SO_INCOMING_CPU (gettable since Linux 3.19, settable since Linux 4.4)
Sets or gets the CPU affinity of a socket. The argument is an
integer CPU number.

    int cpu = 1;
    setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu,
               sizeof(cpu));
263
264 Because all of the packets for a single stream (i.e., all pack‐
265 ets for the same 4-tuple) arrive on the single RX queue that is
266 associated with a particular CPU, the typical use case is to em‐
267 ploy one listening process per RX queue, with the incoming flow
268 being handled by a listener on the same CPU that is handling the
269 RX queue. This provides optimal NUMA behavior and keeps CPU
270 caches hot.
271
272 SO_INCOMING_NAPI_ID (gettable since Linux 4.12)
Returns a system-level unique ID called NAPI ID that is associated
with the RX queue on which the last packet associated with that
socket was received.

This can be used by an application to split the incoming flows
among worker threads based on the RX queue on which the packets
associated with the flows are received. It allows each worker
thread to be associated with a NIC HW receive queue and service
all the connection requests received on that RX queue. This
mapping between an application thread and a HW NIC queue
streamlines the flow of data from the NIC to the application.
284
285 SO_KEEPALIVE
286 Enable sending of keep-alive messages on connection-oriented
287 sockets. Expects an integer boolean flag.
288
289 SO_LINGER
290 Sets or gets the SO_LINGER option. The argument is a linger
291 structure.
292
    struct linger {
        int l_onoff;    /* linger active */
        int l_linger;   /* how many seconds to linger for */
    };
297
298 When enabled, a close(2) or shutdown(2) will not return until
299 all queued messages for the socket have been successfully sent
300 or the linger timeout has been reached. Otherwise, the call re‐
301 turns immediately and the closing is done in the background.
302 When the socket is closed as part of exit(2), it always lingers
303 in the background.
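
As a sketch of filling in the structure above, the following
hypothetical helper, set_linger(), enables lingering with a timeout
given in seconds.

```c
#include <sys/socket.h>

/* Hypothetical helper: make close(2) on fd wait up to seconds for
   queued data to be sent. Returns 0 on success, -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg = {
        .l_onoff = 1,          /* linger active */
        .l_linger = seconds,   /* how many seconds to linger for */
    };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

Setting l_onoff to 0 in the same structure would restore the default
behavior of closing in the background.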
304
305 SO_LOCK_FILTER
306 When set, this option will prevent changing the filters associ‐
307 ated with the socket. These filters include any set using the
308 socket options SO_ATTACH_FILTER, SO_ATTACH_BPF, SO_ATTACH_REUSE‐
309 PORT_CBPF, and SO_ATTACH_REUSEPORT_EBPF.
310
311 The typical use case is for a privileged process to set up a raw
312 socket (an operation that requires the CAP_NET_RAW capability),
313 apply a restrictive filter, set the SO_LOCK_FILTER option, and
314 then either drop its privileges or pass the socket file descrip‐
315 tor to an unprivileged process via a UNIX domain socket.
316
317 Once the SO_LOCK_FILTER option has been enabled, attempts to
318 change or remove the filter attached to a socket, or to disable
319 the SO_LOCK_FILTER option will fail with the error EPERM.
320
321 SO_MARK (since Linux 2.6.25)
322 Set the mark for each packet sent through this socket (similar
323 to the netfilter MARK target but socket-based). Changing the
324 mark can be used for mark-based routing without netfilter or for
325 packet filtering. Setting this option requires the CAP_NET_AD‐
326 MIN capability.
327
328 SO_OOBINLINE
329 If this option is enabled, out-of-band data is directly placed
330 into the receive data stream. Otherwise, out-of-band data is
331 passed only when the MSG_OOB flag is set during receiving.
332
333 SO_PASSCRED
334 Enable or disable the receiving of the SCM_CREDENTIALS control
335 message. For more information see unix(7).
336
337 SO_PASSSEC
338 Enable or disable the receiving of the SCM_SECURITY control mes‐
339 sage. For more information see unix(7).
340
341 SO_PEEK_OFF (since Linux 3.4)
342 This option, which is currently supported only for unix(7) sock‐
343 ets, sets the value of the "peek offset" for the recv(2) system
344 call when used with MSG_PEEK flag.
345
346 When this option is set to a negative value (it is set to -1 for
347 all new sockets), traditional behavior is provided: recv(2) with
348 the MSG_PEEK flag will peek data from the front of the queue.
349
350 When the option is set to a value greater than or equal to zero,
351 then the next peek at data queued in the socket will occur at
352 the byte offset specified by the option value. At the same
353 time, the "peek offset" will be incremented by the number of
354 bytes that were peeked from the queue, so that a subsequent peek
355 will return the next data in the queue.
356
357 If data is removed from the front of the queue via a call to
358 recv(2) (or similar) without the MSG_PEEK flag, the "peek off‐
359 set" will be decreased by the number of bytes removed. In other
360 words, receiving data without the MSG_PEEK flag will cause the
361 "peek offset" to be adjusted to maintain the correct relative
362 position in the queued data, so that a subsequent peek will re‐
363 trieve the data that would have been retrieved had the data not
364 been removed.
365
366 For datagram sockets, if the "peek offset" points to the middle
367 of a packet, the data returned will be marked with the MSG_TRUNC
368 flag.
369
370 The following example serves to illustrate the use of
371 SO_PEEK_OFF. Suppose a stream socket has the following queued
372 input data:
373
    aabbccddeeff
375
376 The following sequence of recv(2) calls would have the effect
377 noted in the comments:
378
    int ov = 4;                  // Set peek offset to 4
    setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));

    recv(fd, buf, 2, MSG_PEEK);  // Peeks "cc"; offset set to 6
    recv(fd, buf, 2, MSG_PEEK);  // Peeks "dd"; offset set to 8
    recv(fd, buf, 2, 0);         // Reads "aa"; offset set to 6
    recv(fd, buf, 2, MSG_PEEK);  // Peeks "ee"; offset set to 8
386
387 SO_PEERCRED
388 Return the credentials of the peer process connected to this
389 socket. For further details, see unix(7).
390
391 SO_PEERSEC (since Linux 2.6.2)
392 Return the security context of the peer socket connected to this
393 socket. For further details, see unix(7) and ip(7).
394
395 SO_PRIORITY
396 Set the protocol-defined priority for all packets to be sent on
397 this socket. Linux uses this value to order the networking
398 queues: packets with a higher priority may be processed first
399 depending on the selected device queueing discipline. Setting a
400 priority outside the range 0 to 6 requires the CAP_NET_ADMIN ca‐
401 pability.
402
403 SO_PROTOCOL (since Linux 2.6.32)
404 Retrieves the socket protocol as an integer, returning a value
405 such as IPPROTO_SCTP. See socket(2) for details. This socket
406 option is read-only.
407
408 SO_RCVBUF
409 Sets or gets the maximum socket receive buffer in bytes. The
410 kernel doubles this value (to allow space for bookkeeping over‐
411 head) when it is set using setsockopt(2), and this doubled value
412 is returned by getsockopt(2). The default value is set by the
413 /proc/sys/net/core/rmem_default file, and the maximum allowed
414 value is set by the /proc/sys/net/core/rmem_max file. The mini‐
415 mum (doubled) value for this option is 256.
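
The doubling described above can be observed by setting the option and
reading it back. This is a sketch only; the helper name set_rcvbuf() is
hypothetical, and the value reported also depends on the rmem_max
limit and the kernel's minimum.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of bytes and return
   the size that getsockopt(2) reports afterward, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int got;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
        return -1;
    return got;    /* typically 2 * bytes when within the limits */
}
```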
416
417 SO_RCVBUFFORCE (since Linux 2.6.14)
418 Using this socket option, a privileged (CAP_NET_ADMIN) process
419 can perform the same task as SO_RCVBUF, but the rmem_max limit
420 can be overridden.
421
422 SO_RCVLOWAT and SO_SNDLOWAT
423 Specify the minimum number of bytes in the buffer until the
424 socket layer will pass the data to the protocol (SO_SNDLOWAT) or
425 the user on receiving (SO_RCVLOWAT). These two values are ini‐
426 tialized to 1. SO_SNDLOWAT is not changeable on Linux (setsock‐
427 opt(2) fails with the error ENOPROTOOPT). SO_RCVLOWAT is
428 changeable only since Linux 2.4.
429
430 Before Linux 2.6.28 select(2), poll(2), and epoll(7) did not re‐
431 spect the SO_RCVLOWAT setting on Linux, and indicated a socket
432 as readable when even a single byte of data was available. A
433 subsequent read from the socket would then block until
434 SO_RCVLOWAT bytes are available. Since Linux 2.6.28, select(2),
435 poll(2), and epoll(7) indicate a socket as readable only if at
436 least SO_RCVLOWAT bytes are available.
437
438 SO_RCVTIMEO and SO_SNDTIMEO
Specify the receiving or sending timeouts until reporting an
error. The argument is a struct timeval. If an input or output
function blocks for this period of time, and data has been sent
or received, the return value of that function will be the
amount of data transferred; if no data has been transferred and
the timeout has been reached, then -1 is returned with errno set
to EAGAIN or EWOULDBLOCK, or EINPROGRESS (for connect(2)), just
as if the socket had been specified to be nonblocking. If the
timeout is set to zero (the default), then the operation will
never time out. Timeouts have effect only for system calls that
perform socket I/O (e.g., read(2), recvmsg(2), send(2),
sendmsg(2)); timeouts have no effect for select(2), poll(2),
epoll_wait(2), and so on.
452
453 SO_REUSEADDR
454 Indicates that the rules used in validating addresses supplied
455 in a bind(2) call should allow reuse of local addresses. For
456 AF_INET sockets this means that a socket may bind, except when
457 there is an active listening socket bound to the address. When
458 the listening socket is bound to INADDR_ANY with a specific port
459 then it is not possible to bind to this port for any local ad‐
460 dress. Argument is an integer boolean flag.
461
462 SO_REUSEPORT (since Linux 3.9)
463 Permits multiple AF_INET or AF_INET6 sockets to be bound to an
464 identical socket address. This option must be set on each
465 socket (including the first socket) prior to calling bind(2) on
466 the socket. To prevent port hijacking, all of the processes
467 binding to the same address must have the same effective UID.
468 This option can be employed with both TCP and UDP sockets.
469
For TCP sockets, this option allows accept(2) load distribution
in a multi-threaded server to be improved by using a distinct
listener socket for each thread. This provides improved load
distribution as compared to traditional techniques such as using
a single accept(2)ing thread that distributes connections, or
having multiple threads that compete to accept(2) from the same
socket.
477
478 For UDP sockets, the use of this option can provide better dis‐
479 tribution of incoming datagrams to multiple processes (or
480 threads) as compared to the traditional technique of having mul‐
481 tiple processes compete to receive datagrams on the same socket.
482
483 SO_RXQ_OVFL (since Linux 2.6.33)
484 Indicates that an unsigned 32-bit value ancillary message (cmsg)
485 should be attached to received skbs indicating the number of
486 packets dropped by the socket since its creation.
487
488 SO_SELECT_ERR_QUEUE (since Linux 3.10)
When this option is set on a socket, an error condition on the
socket causes notification not only via the readfds and writefds
sets of select(2) but also via the exceptfds set. Similarly,
poll(2) also returns POLLPRI whenever a POLLERR event is
returned.
493
494 Background: this option was added when waking up on an error
495 condition occurred only via the readfds and writefds sets of se‐
496 lect(2). The option was added to allow monitoring for error
497 conditions via the exceptfds argument without simultaneously
498 having to receive notifications (via readfds) for regular data
499 that can be read from the socket. After changes in Linux 4.16,
500 the use of this flag to achieve the desired notifications is no
501 longer necessary. This option is nevertheless retained for
502 backwards compatibility.
503
504 SO_SNDBUF
505 Sets or gets the maximum socket send buffer in bytes. The ker‐
506 nel doubles this value (to allow space for bookkeeping overhead)
507 when it is set using setsockopt(2), and this doubled value is
508 returned by getsockopt(2). The default value is set by the
509 /proc/sys/net/core/wmem_default file and the maximum allowed
510 value is set by the /proc/sys/net/core/wmem_max file. The mini‐
511 mum (doubled) value for this option is 2048.
512
513 SO_SNDBUFFORCE (since Linux 2.6.14)
514 Using this socket option, a privileged (CAP_NET_ADMIN) process
515 can perform the same task as SO_SNDBUF, but the wmem_max limit
516 can be overridden.
517
518 SO_TIMESTAMP
519 Enable or disable the receiving of the SO_TIMESTAMP control mes‐
520 sage. The timestamp control message is sent with level
521 SOL_SOCKET and a cmsg_type of SCM_TIMESTAMP. The cmsg_data
522 field is a struct timeval indicating the reception time of the
523 last packet passed to the user in this call. See cmsg(3) for
524 details on control messages.
525
526 SO_TIMESTAMPNS (since Linux 2.6.22)
527 Enable or disable the receiving of the SO_TIMESTAMPNS control
528 message. The timestamp control message is sent with level
529 SOL_SOCKET and a cmsg_type of SCM_TIMESTAMPNS. The cmsg_data
530 field is a struct timespec indicating the reception time of the
531 last packet passed to the user in this call. The clock used for
532 the timestamp is CLOCK_REALTIME. See cmsg(3) for details on
533 control messages.
534
535 A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS: the two
536 modes are mutually exclusive.
537
538 SO_TYPE
539 Gets the socket type as an integer (e.g., SOCK_STREAM). This
540 socket option is read-only.
541
542 SO_BUSY_POLL (since Linux 3.11)
543 Sets the approximate time in microseconds to busy poll on a
544 blocking receive when there is no data. Increasing this value
545 requires CAP_NET_ADMIN. The default for this option is con‐
546 trolled by the /proc/sys/net/core/busy_read file.
547
548 The value in the /proc/sys/net/core/busy_poll file determines
549 how long select(2) and poll(2) will busy poll when they operate
550 on sockets with SO_BUSY_POLL set and no events to report are
551 found.
552
553 In both cases, busy polling will only be done when the socket
554 last received data from a network device that supports this op‐
555 tion.
556
557 While busy polling may improve latency of some applications,
558 care must be taken when using it since this will increase both
559 CPU utilization and power usage.
560
561 Signals
When writing onto a connection-oriented socket that has been shut down
(by the local or the remote end), SIGPIPE is sent to the writing
process and EPIPE is returned. The signal is not sent when the write
call specified the MSG_NOSIGNAL flag.
566
567 When requested with the FIOSETOWN fcntl(2) or SIOCSPGRP ioctl(2), SIGIO
568 is sent when an I/O event occurs. It is possible to use poll(2) or se‐
569 lect(2) in the signal handler to find out which socket the event oc‐
570 curred on. An alternative (in Linux 2.2) is to set a real-time signal
571 using the F_SETSIG fcntl(2); the handler of the real time signal will
572 be called with the file descriptor in the si_fd field of its siginfo_t.
573 See fcntl(2) for more information.
574
575 Under some circumstances (e.g., multiple processes accessing a single
576 socket), the condition that caused the SIGIO may have already disap‐
577 peared when the process reacts to the signal. If this happens, the
578 process should wait again because Linux will resend the signal later.
579
580 /proc interfaces
581 The core socket networking parameters can be accessed via files in the
582 directory /proc/sys/net/core/.
583
584 rmem_default
585 contains the default setting in bytes of the socket receive buf‐
586 fer.
587
588 rmem_max
589 contains the maximum socket receive buffer size in bytes which a
590 user may set by using the SO_RCVBUF socket option.
591
592 wmem_default
593 contains the default setting in bytes of the socket send buffer.
594
595 wmem_max
596 contains the maximum socket send buffer size in bytes which a
597 user may set by using the SO_SNDBUF socket option.
598
599 message_cost and message_burst
600 configure the token bucket filter used to load limit warning
601 messages caused by external network events.
602
603 netdev_max_backlog
604 Maximum number of packets in the global input queue.
605
606 optmem_max
607 Maximum length of ancillary data and user control data like the
608 iovecs per socket.
609
610 Ioctls
611 These operations can be accessed using ioctl(2):
612
    error = ioctl(ip_socket, ioctl_type, &value_result);
614
615 SIOCGSTAMP
Return a struct timeval with the receive timestamp of the last
packet passed to the user. This is useful for accurate round
trip time measurements. See setitimer(2) for a description of
struct timeval. This ioctl should be used only if the socket
options SO_TIMESTAMP and SO_TIMESTAMPNS are not set on the
socket. Otherwise, it returns the timestamp of the last packet
that was received while SO_TIMESTAMP and SO_TIMESTAMPNS were not
set, or it fails if no such packet has been received (i.e.,
ioctl(2) returns -1 with errno set to ENOENT).
625
626 SIOCSPGRP
627 Set the process or process group that is to receive SIGIO or
628 SIGURG signals when I/O becomes possible or urgent data is
629 available. The argument is a pointer to a pid_t. For further
630 details, see the description of F_SETOWN in fcntl(2).
631
632 FIOASYNC
633 Change the O_ASYNC flag to enable or disable asynchronous I/O
634 mode of the socket. Asynchronous I/O mode means that the SIGIO
635 signal or the signal set with F_SETSIG is raised when a new I/O
636 event occurs.
637
638 Argument is an integer boolean flag. (This operation is synony‐
639 mous with the use of fcntl(2) to set the O_ASYNC flag.)
640
641 SIOCGPGRP
642 Get the current process or process group that receives SIGIO or
643 SIGURG signals, or 0 when none is set.
644
645 Valid fcntl(2) operations:
646
647 FIOGETOWN
648 The same as the SIOCGPGRP ioctl(2).
649
650 FIOSETOWN
651 The same as the SIOCSPGRP ioctl(2).
652
VERSIONS
SO_BINDTODEVICE was introduced in Linux 2.0.30. SO_PASSCRED is new in
Linux 2.2. The /proc interfaces were introduced in Linux 2.2.
SO_RCVTIMEO and SO_SNDTIMEO are supported since Linux 2.3.41. Earlier,
timeouts were fixed to a protocol-specific setting, and could not be
read or written.
659
NOTES
Linux assumes that half of the send/receive buffer is used for internal
kernel structures; thus the values in the corresponding /proc files are
twice what can be observed on the wire.
664
665 Linux will allow port reuse only with the SO_REUSEADDR option when this
666 option was set both in the previous program that performed a bind(2) to
667 the port and in the program that wants to reuse the port. This differs
668 from some implementations (e.g., FreeBSD) where only the later program
669 needs to set the SO_REUSEADDR option. Typically this difference is in‐
670 visible, since, for example, a server program is designed to always set
671 this option.
672
SEE ALSO
wireshark(1), bpf(2), connect(2), getsockopt(2), setsockopt(2),
socket(2), pcap(3), address_families(7), capabilities(7), ddp(7),
ip(7), ipv6(7), packet(7), tcp(7), udp(7), unix(7), tcpdump(8)
677
COLOPHON
This page is part of release 5.13 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux 2021-03-22 SOCKET(7)