1SOCKET(7) Linux Programmer's Manual SOCKET(7)
2
3
4
6 socket - Linux socket interface
7
9 #include <sys/socket.h>
10
11 sockfd = socket(int socket_family, int socket_type, int protocol);
12
14 This manual page describes the Linux networking socket layer user
15 interface. The BSD compatible sockets are the uniform interface
16 between the user process and the network protocol stacks in the kernel.
17 The protocol modules are grouped into protocol families like AF_INET,
18 AF_IPX, AF_PACKET and socket types like SOCK_STREAM or SOCK_DGRAM. See
19 socket(2) for more information on families and types.
20
21 Socket-layer functions
22 These functions are used by the user process to send or receive packets
23 and to do other socket operations. For more information see their
24 respective manual pages.
25
26 socket(2) creates a socket, connect(2) connects a socket to a remote
27 socket address, the bind(2) function binds a socket to a local socket
28 address, listen(2) tells the socket that new connections shall be
29 accepted, and accept(2) is used to get a new socket with a new incoming
30 connection. socketpair(2) returns two connected anonymous sockets
31 (implemented only for a few local families like AF_UNIX).
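
The socket(2), bind(2), and listen(2) sequence described above can be sketched as follows. This is a minimal illustration, not a complete server: the helper name make_loopback_listener() and the backlog value of 8 are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening TCP socket bound to
   127.0.0.1 on a kernel-chosen port, or -1 on error. */
int make_loopback_listener(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* 0: let the kernel pick a port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 8) == -1) {      /* backlog of 8 is arbitrary */
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would then pass the returned descriptor to accept(2) to obtain one new socket per incoming connection.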
32
33 send(2), sendto(2), and sendmsg(2) send data over a socket, and
34 recv(2), recvfrom(2), and recvmsg(2) receive data from a socket. poll(2)
35 and select(2) wait for arriving data or a readiness to send data. In
36 addition, the standard I/O operations like write(2), writev(2), send‐
37 file(2), read(2), and readv(2) can be used to read and write data.
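
As a small illustration of send(2) and recv(2), the following sketch passes a string through a connected AF_UNIX pair created with socketpair(2). The helper name roundtrip() is hypothetical, and the sketch assumes the message is short enough to be delivered in a single recv(2) call.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends msg over one end of an AF_UNIX socket
   pair and receives it on the other end. Returns the number of bytes
   received, or -1 on error. */
ssize_t roundtrip(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    ssize_t n = -1;
    if (send(sv[0], msg, strlen(msg), 0) == (ssize_t) strlen(msg))
        n = recv(sv[1], out, outlen, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```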
38
39 getsockname(2) returns the local socket address and getpeername(2)
40 returns the remote socket address. getsockopt(2) and setsockopt(2) are
41 used to set or get socket layer or protocol options. ioctl(2) can be
42 used to set or read some other options.
43
44 close(2) is used to close a socket. shutdown(2) closes parts of a
45 full-duplex socket connection.
46
47 Seeking, or calling pread(2) or pwrite(2) with a nonzero position is
48 not supported on sockets.
49
50 It is possible to do nonblocking I/O on sockets by setting the
51 O_NONBLOCK flag on a socket file descriptor using fcntl(2). Then all
52 operations that would block will (usually) return with EAGAIN (operation
53 should be retried later); connect(2) will return an EINPROGRESS error.
54 The user can then wait for various events via poll(2) or select(2).
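
Setting O_NONBLOCK with fcntl(2) might look like the sketch below. The helper name set_nonblocking() is an assumption for illustration; the existing file status flags are read first so that they are preserved.

```c
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical helper: adds O_NONBLOCK to the file status flags of fd.
   Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

After this call, a recv(2) on a socket with no queued data fails immediately instead of blocking.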
55
56 ┌────────────────────────────────────────────────────────────────────┐
57 │                             I/O events                             │
58 ├───────────┬───────────┬────────────────────────────────────────────┤
59 │Event      │ Poll flag │ Occurrence                                 │
60 ├───────────┼───────────┼────────────────────────────────────────────┤
61 │Read       │ POLLIN    │ New data arrived.                          │
62 ├───────────┼───────────┼────────────────────────────────────────────┤
63 │Read       │ POLLIN    │ A connection setup has been completed (for │
64 │           │           │ connection-oriented sockets)               │
65 ├───────────┼───────────┼────────────────────────────────────────────┤
66 │Read       │ POLLHUP   │ A disconnection request has been initiated │
67 │           │           │ by the other end.                          │
68 ├───────────┼───────────┼────────────────────────────────────────────┤
69 │Read       │ POLLHUP   │ A connection is broken (only for connec‐   │
70 │           │           │ tion-oriented protocols). When the socket  │
71 │           │           │ is written to, SIGPIPE is also sent.       │
72 ├───────────┼───────────┼────────────────────────────────────────────┤
73 │Write      │ POLLOUT   │ Socket has enough send buffer space for    │
74 │           │           │ writing new data.                          │
75 ├───────────┼───────────┼────────────────────────────────────────────┤
76 │Read/Write │ POLLIN|   │ An outgoing connect(2) finished.           │
77 │           │ POLLOUT   │                                            │
78 ├───────────┼───────────┼────────────────────────────────────────────┤
79 │Read/Write │ POLLERR   │ An asynchronous error occurred.            │
80 ├───────────┼───────────┼────────────────────────────────────────────┤
81 │Read/Write │ POLLHUP   │ The other end has shut down one direction. │
82 ├───────────┼───────────┼────────────────────────────────────────────┤
83 │Exception  │ POLLPRI   │ Urgent data arrived. SIGURG is sent then.  │
84 └───────────┴───────────┴────────────────────────────────────────────┘
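
Waiting for the POLLIN event with poll(2) can be sketched as follows. The helper name wait_readable() and its return convention are assumptions chosen for this example.

```c
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: waits up to timeout_ms milliseconds for fd to
   become readable. Returns 1 if POLLIN was reported, 0 on timeout or
   if only other events (e.g., POLLHUP, POLLERR) were reported, and -1
   on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```
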
85 An alternative to poll(2) and select(2) is to let the kernel inform the
86 application about events via a SIGIO signal. For that the O_ASYNC flag
87 must be set on a socket file descriptor via fcntl(2) and a valid signal
88 handler for SIGIO must be installed via sigaction(2). See the Signals
89 discussion below.
90
91 Socket address structures
92 Each socket domain has its own format for socket addresses, with a
93 domain-specific address structure. Each of these structures begins
94 with an integer "family" field (typed as sa_family_t) that indicates
95 the type of the address structure. This allows the various system
96 calls (e.g., connect(2), bind(2), accept(2), getsockname(2), getpeer‐
97 name(2)), which are generic to all socket domains, to determine the
98 domain of a particular socket address.
99
100 To allow any type of socket address to be passed to interfaces in the
101 sockets API, the type struct sockaddr is defined. The purpose of this
102 type is purely to allow casting of domain-specific socket address types
103 to a "generic" type, so as to avoid compiler warnings about type mis‐
104 matches in calls to the sockets API.
105
106 In addition, the sockets API provides the data type struct sock‐
107 addr_storage. This type is suitable to accommodate all supported
108 domain-specific socket address structures; it is large enough and is
109 aligned properly. (In particular, it is large enough to hold IPv6
110 socket addresses.) The structure includes the following field, which
111 can be used to identify the type of socket address actually stored in
112 the structure:
113
114 sa_family_t ss_family;
115
116 The sockaddr_storage structure is useful in programs that must handle
117 socket addresses in a generic way (e.g., programs that must deal with
118 both IPv4 and IPv6 socket addresses).
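
A sketch of handling an address generically through struct sockaddr_storage follows. The helper name storage_port() is hypothetical; it inspects ss_family and then casts to the matching domain-specific structure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the port (in host byte order) of an
   IPv4 or IPv6 address held in ss, or -1 for any other family. */
int storage_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) ss)->sin6_port);
    default:
        return -1;
    }
}
```

A buffer of this type can be passed (cast to struct sockaddr *) to accept(2) or getpeername(2) when the caller does not know in advance which family the peer uses.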
119
120 Socket options
121 The socket options listed below can be set by using setsockopt(2) and
122 read with getsockopt(2) with the socket level set to SOL_SOCKET for all
123 sockets. Unless otherwise noted, optval is a pointer to an int.
124
125 SO_ACCEPTCONN
126 Returns a value indicating whether or not this socket has been
127 marked to accept connections with listen(2). The value 0 indi‐
128 cates that this is not a listening socket, the value 1 indicates
129 that this is a listening socket. This socket option is read-
130 only.
131
132 SO_BINDTODEVICE
133 Bind this socket to a particular device like “eth0”, as speci‐
134 fied in the passed interface name. If the name is an empty
135 string or the option length is zero, the socket device binding
136 is removed. The passed option is a variable-length null-termi‐
137 nated interface name string with the maximum size of IFNAMSIZ.
138 If a socket is bound to an interface, only packets received from
139 that particular interface are processed by the socket. Note
140 that this works only for some socket types, particularly AF_INET
141 sockets. It is not supported for packet sockets (use normal
142 bind(2) there).
143
144 Before Linux 3.8, this socket option could be set, but could not be
145 retrieved with getsockopt(2). Since Linux 3.8, it is readable.
146 The optlen argument should contain the buffer size available to
147 receive the device name and is recommended to be IFNAMSIZ bytes.
148 The real device name length is reported back in the optlen argu‐
149 ment.
150
151 SO_BROADCAST
152 Set or get the broadcast flag. When enabled, datagram sockets
153 are allowed to send packets to a broadcast address. This option
154 has no effect on stream-oriented sockets.
155
156 SO_BSDCOMPAT
157 Enable BSD bug-to-bug compatibility. This is used by the UDP
158 protocol module in Linux 2.0 and 2.2. If enabled, ICMP errors
159 received for a UDP socket will not be passed to the user pro‐
160 gram. In later kernel versions, support for this option has
161 been phased out: Linux 2.4 silently ignores it, and Linux 2.6
162 generates a kernel warning (printk()) if a program uses this
163 option. Linux 2.0 also enabled BSD bug-to-bug compatibility
164 options (random header changing, skipping of the broadcast flag)
165 for raw sockets with this option, but that was removed in Linux
166 2.2.
167
168 SO_DEBUG
169 Enable socket debugging. Only allowed for processes with the
170 CAP_NET_ADMIN capability or an effective user ID of 0.
171
172 SO_DOMAIN (since Linux 2.6.32)
173 Retrieves the socket domain as an integer, returning a value
174 such as AF_INET6. See socket(2) for details. This socket
175 option is read-only.
176
177 SO_ERROR
178 Get and clear the pending socket error. This socket option is
179 read-only. Expects an integer.
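
Reading SO_ERROR is commonly done after a nonblocking connect(2) completes. A minimal sketch, with the hypothetical helper name pending_socket_error():

```c
#include <sys/socket.h>

/* Hypothetical helper: fetches and clears the pending error on fd.
   Returns the pending errno value (0 if none), or -1 if getsockopt(2)
   itself fails. */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```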
180
181 SO_DONTROUTE
182 Don't send via a gateway, send only to directly connected hosts.
183 The same effect can be achieved by setting the MSG_DONTROUTE
184 flag on a socket send(2) operation. Expects an integer boolean
185 flag.
186
187 SO_KEEPALIVE
188 Enable sending of keep-alive messages on connection-oriented
189 sockets. Expects an integer boolean flag.
190
191 SO_LINGER
192 Sets or gets the SO_LINGER option. The argument is a linger
193 structure.
194
195 struct linger {
196 int l_onoff; /* linger active */
197 int l_linger; /* how many seconds to linger for */
198 };
199
200 When enabled, a close(2) or shutdown(2) will not return until
201 all queued messages for the socket have been successfully sent
202 or the linger timeout has been reached. Otherwise, the call
203 returns immediately and the closing is done in the background.
204 When the socket is closed as part of exit(2), it always lingers
205 in the background.
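
Enabling SO_LINGER with the structure shown above might be sketched like this. The helper name set_linger() is an assumption for illustration.

```c
#include <sys/socket.h>

/* Hypothetical helper: makes close(2) on fd wait up to the given
   number of seconds for queued data to be sent. Returns the
   setsockopt(2) result. */
int set_linger(int fd, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```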
206
207 SO_MARK (since Linux 2.6.25)
208 Set the mark for each packet sent through this socket (similar
209 to the netfilter MARK target but socket-based). Changing the
210 mark can be used for mark-based routing without netfilter or for
211 packet filtering. Setting this option requires the
212 CAP_NET_ADMIN capability.
213
214 SO_OOBINLINE
215 If this option is enabled, out-of-band data is directly placed
216 into the receive data stream. Otherwise out-of-band data is
217 passed only when the MSG_OOB flag is set during receiving.
218
219 SO_PASSCRED
220 Enable or disable the receiving of the SCM_CREDENTIALS control
221 message. For more information see unix(7).
222
223 SO_PEEK_OFF (since Linux 3.4)
224 This option, which is currently supported only for unix(7)
225 sockets, sets the value of the "peek offset" for the recv(2) system
226 call when used with the MSG_PEEK flag.
227
228 When this option is set to a negative value (it is set to -1 for
229 all new sockets), traditional behavior is provided: recv(2) with
230 the MSG_PEEK flag will peek data from the front of the queue.
231
232 When the option is set to a value greater than or equal to zero,
233 then the next peek at data queued in the socket will occur at
234 the byte offset specified by the option value. At the same
235 time, the "peek offset" will be incremented by the number of
236 bytes that were peeked from the queue, so that a subsequent peek
237 will return the next data in the queue.
238
239 If data is removed from the front of the queue via a call to
240 recv(2) (or similar) without the MSG_PEEK flag, the "peek off‐
241 set" will be decreased by the number of bytes removed. In other
242 words, receiving data without the MSG_PEEK flag will cause the
243 "peek offset" to be adjusted to maintain the correct relative
244 position in the queued data, so that a subsequent peek will
245 retrieve the data that would have been retrieved had the data
246 not been removed.
247
248 For datagram sockets, if the "peek offset" points to the middle
249 of a packet, the data returned will be marked with the MSG_TRUNC
250 flag.
251
252 The following example serves to illustrate the use of
253 SO_PEEK_OFF. Suppose a stream socket has the following queued
254 input data:
255
256 aabbccddeeff
257
258
259 The following sequence of recv(2) calls would have the effect
260 noted in the comments:
261
262 int ov = 4; // Set peek offset to 4
263 setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
264
265 recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
266 recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
267 recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
268 recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
269
270 SO_PEERCRED
271 Return the credentials of the foreign process connected to this
272 socket. This is possible only for connected AF_UNIX stream
273 sockets and AF_UNIX stream and datagram socket pairs created
274 using socketpair(2); see unix(7). The returned credentials are
275 those that were in effect at the time of the call to connect(2)
276 or socketpair(2). The argument is a ucred structure; define the
277 _GNU_SOURCE feature test macro to obtain the definition of that
278 structure from <sys/socket.h>. This socket option is read-only.
279
280 SO_PRIORITY
281 Set the protocol-defined priority for all packets to be sent on
282 this socket. Linux uses this value to order the networking
283 queues: packets with a higher priority may be processed first
284 depending on the selected device queueing discipline. For
285 ip(7), this also sets the IP type-of-service (TOS) field for
286 outgoing packets. Setting a priority outside the range 0 to 6
287 requires the CAP_NET_ADMIN capability.
288
289 SO_PROTOCOL (since Linux 2.6.32)
290 Retrieves the socket protocol as an integer, returning a value
291 such as IPPROTO_SCTP. See socket(2) for details. This socket
292 option is read-only.
293
294 SO_RCVBUF
295 Sets or gets the maximum socket receive buffer in bytes. The
296 kernel doubles this value (to allow space for bookkeeping over‐
297 head) when it is set using setsockopt(2), and this doubled value
298 is returned by getsockopt(2). The default value is set by the
299 /proc/sys/net/core/rmem_default file, and the maximum allowed
300 value is set by the /proc/sys/net/core/rmem_max file. The mini‐
301 mum (doubled) value for this option is 256.
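
The doubling described above can be observed by setting the option and reading it back. The following is a sketch only; the helper name set_rcvbuf() is hypothetical, and the value read back also depends on the rmem_max limit and the kernel minimum.

```c
#include <sys/socket.h>

/* Hypothetical helper: requests a receive buffer of the given size and
   returns the value the kernel reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;      /* typically larger than the requested size */
}
```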
302
303 SO_RCVBUFFORCE (since Linux 2.6.14)
304 Using this socket option, a privileged (CAP_NET_ADMIN) process
305 can perform the same task as SO_RCVBUF, but the rmem_max limit
306 can be overridden.
307
308 SO_RCVLOWAT and SO_SNDLOWAT
309 Specify the minimum number of bytes in the buffer until the
310 socket layer will pass the data to the protocol (SO_SNDLOWAT) or
311 the user on receiving (SO_RCVLOWAT). These two values are ini‐
312 tialized to 1. SO_SNDLOWAT is not changeable on Linux (setsock‐
313 opt(2) fails with the error ENOPROTOOPT). SO_RCVLOWAT is
314 changeable only since Linux 2.4. The select(2) and poll(2) sys‐
315 tem calls currently do not respect the SO_RCVLOWAT setting on
316 Linux, and mark a socket readable when even a single byte of
317 data is available. A subsequent read from the socket will block
318 until SO_RCVLOWAT bytes are available.
319
320 SO_RCVTIMEO and SO_SNDTIMEO
321 Specify the receiving or sending timeouts until reporting an
322 error. The argument is a struct timeval. If an input or output
323 function blocks for this period of time, and data has been sent
324 or received, the return value of that function will be the
325 amount of data transferred; if no data has been transferred and
326 the timeout has been reached then -1 is returned with errno set
327 to EAGAIN or EWOULDBLOCK, or EINPROGRESS (for connect(2)) just
328 as if the socket was specified to be nonblocking. If the timeout
329 is set to zero (the default), then the operation will never time
330 out. Timeouts only have effect for system calls that perform
331 socket I/O (e.g., read(2), recvmsg(2), send(2),
332 sendmsg(2)); timeouts have no effect for select(2), poll(2),
333 epoll_wait(2), and so on.
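
A receive timeout might be set as in the sketch below. The helper name set_recv_timeout() and its millisecond argument are assumptions for this example; the option itself takes a struct timeval.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: sets the receive timeout of fd to ms
   milliseconds (0 disables the timeout). Returns the setsockopt(2)
   result. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

With the timeout set, a blocking recv(2) on a socket with no incoming data fails once the period has elapsed rather than waiting indefinitely.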
334
335 SO_REUSEADDR
336 Indicates that the rules used in validating addresses supplied
337 in a bind(2) call should allow reuse of local addresses. For
338 AF_INET sockets this means that a socket may bind, except when
339 there is an active listening socket bound to the address. When
340 the listening socket is bound to INADDR_ANY with a specific port
341 then it is not possible to bind to this port for any local
342 address. Argument is an integer boolean flag.
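
Servers usually set SO_REUSEADDR before calling bind(2). A minimal sketch, with the hypothetical helper name enable_reuseaddr():

```c
#include <sys/socket.h>

/* Hypothetical helper: enables SO_REUSEADDR on fd. It should be called
   before bind(2). Returns the setsockopt(2) result. */
int enable_reuseaddr(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```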
343
344 SO_SNDBUF
345 Sets or gets the maximum socket send buffer in bytes. The ker‐
346 nel doubles this value (to allow space for bookkeeping overhead)
347 when it is set using setsockopt(2), and this doubled value is
348 returned by getsockopt(2). The default value is set by the
349 /proc/sys/net/core/wmem_default file and the maximum allowed
350 value is set by the /proc/sys/net/core/wmem_max file. The mini‐
351 mum (doubled) value for this option is 2048.
352
353 SO_SNDBUFFORCE (since Linux 2.6.14)
354 Using this socket option, a privileged (CAP_NET_ADMIN) process
355 can perform the same task as SO_SNDBUF, but the wmem_max limit
356 can be overridden.
357
358 SO_TIMESTAMP
359 Enable or disable the receiving of the SO_TIMESTAMP control mes‐
360 sage. The timestamp control message is sent with level
361 SOL_SOCKET and the cmsg_data field is a struct timeval indicat‐
362 ing the reception time of the last packet passed to the user in
363 this call. See cmsg(3) for details on control messages.
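
Retrieving the timestamp requires recvmsg(2) with a control buffer. The sketch below assumes SO_TIMESTAMP has already been enabled on the socket with setsockopt(2); the helper name recv_with_timestamp() and the have_ts output flag are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>

/* Hypothetical helper: receives one message into buf. If a timestamp
   control message is attached, copies it into *ts and sets *have_ts
   to 1; otherwise *have_ts is 0. Returns the recvmsg(2) result. */
ssize_t recv_with_timestamp(int fd, void *buf, size_t len,
                            struct timeval *ts, int *have_ts)
{
    union {                     /* union ensures cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct timeval))];
        struct cmsghdr align;
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    *have_ts = 0;
    ssize_t n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SO_TIMESTAMP) {
            memcpy(ts, CMSG_DATA(c), sizeof(*ts));
            *have_ts = 1;
        }
    }
    return n;
}
```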
364
365 SO_TYPE
366 Gets the socket type as an integer (e.g., SOCK_STREAM). This
367 socket option is read-only.
368
369 Signals
370 When writing onto a connection-oriented socket that has been shut down
371 (by the local or the remote end) SIGPIPE is sent to the writing process
372 and EPIPE is returned. The signal is not sent when the write call
373 specified the MSG_NOSIGNAL flag.
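
A program that prefers an EPIPE error to a SIGPIPE signal can pass MSG_NOSIGNAL on each send, as in this sketch. The wrapper name send_nosignal() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical wrapper: like send(2), but never raises SIGPIPE. On a
   connection that has been shut down, it returns -1 with errno set to
   EPIPE instead. */
ssize_t send_nosignal(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}
```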
374
375 When requested with the FIOSETOWN fcntl(2) or SIOCSPGRP ioctl(2), SIGIO
376 is sent when an I/O event occurs. It is possible to use poll(2) or
377 select(2) in the signal handler to find out which socket the event
378 occurred on. An alternative (in Linux 2.2) is to set a real-time sig‐
379 nal using the F_SETSIG fcntl(2); the handler of the real time signal
380 will be called with the file descriptor in the si_fd field of its sig‐
381 info_t. See fcntl(2) for more information.
382
383 Under some circumstances (e.g., multiple processes accessing a single
384 socket), the condition that caused the SIGIO may have already disap‐
385 peared when the process reacts to the signal. If this happens, the
386 process should wait again because Linux will resend the signal later.
387
388 /proc interfaces
389 The core socket networking parameters can be accessed via files in the
390 directory /proc/sys/net/core/.
391
392 rmem_default
393 contains the default setting in bytes of the socket receive buf‐
394 fer.
395
396 rmem_max
397 contains the maximum socket receive buffer size in bytes which a
398 user may set by using the SO_RCVBUF socket option.
399
400 wmem_default
401 contains the default setting in bytes of the socket send buffer.
402
403 wmem_max
404 contains the maximum socket send buffer size in bytes which a
405 user may set by using the SO_SNDBUF socket option.
406
407 message_cost and message_burst
408 configure the token bucket filter used to load limit warning
409 messages caused by external network events.
410
411 netdev_max_backlog
412 Maximum number of packets in the global input queue.
413
414 optmem_max
415 Maximum length of ancillary data and user control data like the
416 iovecs per socket.
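
These files contain plain decimal numbers and can be read like any text file. The sketch below uses a hypothetical helper, read_core_param(); note that, depending on the environment (for example, inside some containers), individual files may not be visible, in which case the helper returns -1.

```c
#include <stdio.h>

/* Hypothetical helper: reads a single integer from the named file
   under /proc/sys/net/core/. Returns the value, or -1 if the file
   cannot be opened or parsed. */
long read_core_param(const char *name)
{
    char path[128];
    long value = -1;

    snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_core_param("rmem_max") yields the limit that applies to SO_RCVBUF requests from unprivileged processes.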
417
418 Ioctls
419 These operations can be accessed using ioctl(2):
420
421 error = ioctl(ip_socket, ioctl_type, &value_result);
422
423 SIOCGSTAMP
424 Return a struct timeval with the receive timestamp of the last
425 packet passed to the user. This is useful for accurate round
426 trip time measurements. See setitimer(2) for a description of
427 struct timeval. This ioctl should be used only if the socket
428 option SO_TIMESTAMP is not set on the socket. Otherwise, it
429 returns the timestamp of the last packet that was received while
430 SO_TIMESTAMP was not set, or it fails if no such packet has been
431 received (i.e., ioctl(2) returns -1 with errno set to ENOENT).
432
433 SIOCSPGRP
434 Set the process or process group to send SIGIO or SIGURG signals
435 to when an asynchronous I/O operation has finished or urgent
436 data is available. The argument is a pointer to a pid_t. If
437 the argument is positive, send the signals to that process. If
438 the argument is negative, send the signals to the process group
439 with the ID of the absolute value of the argument. The process
440 may only choose itself or its own process group to receive sig‐
441 nals unless it has the CAP_KILL capability or an effective UID
442 of 0.
443
444 FIOASYNC
445 Change the O_ASYNC flag to enable or disable asynchronous I/O
446 mode of the socket. Asynchronous I/O mode means that the SIGIO
447 signal or the signal set with F_SETSIG is raised when a new I/O
448 event occurs.
449
450 Argument is an integer boolean flag. (This operation is synony‐
451 mous with the use of fcntl(2) to set the O_ASYNC flag.)
452
453 SIOCGPGRP
454 Get the current process or process group that receives SIGIO or
455 SIGURG signals, or 0 when none is set.
456
457 Valid fcntl(2) operations:
458
459 FIOGETOWN
460 The same as the SIOCGPGRP ioctl(2).
461
462 FIOSETOWN
463 The same as the SIOCSPGRP ioctl(2).
464
466 SO_BINDTODEVICE was introduced in Linux 2.0.30. SO_PASSCRED is new in
467 Linux 2.2. The /proc interfaces were introduced in Linux 2.2.
468 SO_RCVTIMEO and SO_SNDTIMEO are supported since Linux 2.3.41. Earlier,
469 timeouts were fixed to a protocol-specific setting, and could not be
470 read or written.
471
473 Linux assumes that half of the send/receive buffer is used for internal
474 kernel structures; thus the values in the corresponding /proc files are
475 twice what can be observed on the wire.
476
477 Linux will only allow port reuse with the SO_REUSEADDR option when this
478 option was set both in the previous program that performed a bind(2) to
479 the port and in the program that wants to reuse the port. This differs
480 from some implementations (e.g., FreeBSD) where only the later program
481 needs to set the SO_REUSEADDR option. Typically this difference is
482 invisible, since, for example, a server program is designed to always
483 set this option.
484
486 The CONFIG_FILTER socket options SO_ATTACH_FILTER and SO_DETACH_FILTER
487 are not documented. The suggested interface to use them is via the
488 libpcap library.
489
491 getsockopt(2), connect(2), setsockopt(2), socket(2), capabilities(7),
492 ddp(7), ip(7), packet(7), tcp(7), udp(7), unix(7)
493
495 This page is part of release 3.53 of the Linux man-pages project. A
496 description of the project, and information about reporting bugs, can
497 be found at http://www.kernel.org/doc/man-pages/.
498
499
500
501Linux 2013-06-21 SOCKET(7)