IP(7) Linux Programmer's Manual IP(7)

ip - Linux IPv4 protocol implementation
7
9 #include <sys/socket.h>
10 #include <netinet/in.h>
11 #include <netinet/ip.h> /* superset of previous */
12
13 tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
14 udp_socket = socket(PF_INET, SOCK_DGRAM, 0);
15 raw_socket = socket(PF_INET, SOCK_RAW, protocol);
16
18 Linux implements the Internet Protocol, version 4, described in RFC 791
19 and RFC 1122. ip contains a level 2 multicasting implementation con‐
20 forming to RFC 1112. It also contains an IP router including a packet
21 filter.
22
23 The programming interface is BSD sockets compatible. For more informa‐
24 tion on sockets, see socket(7).
25
26 An IP socket is created by calling the socket(2) function as
27 socket(PF_INET, socket_type, protocol). Valid socket types are
28 SOCK_STREAM to open a tcp(7) socket, SOCK_DGRAM to open a udp(7)
29 socket, or SOCK_RAW to open a raw(7) socket to access the IP protocol
30 directly. protocol is the IP protocol in the IP header to be received
31 or sent. The only valid values for protocol are 0 and IPPROTO_TCP for
32 TCP sockets and 0 and IPPROTO_UDP for UDP sockets. For SOCK_RAW you
33 may specify a valid IANA IP protocol defined in RFC 1700 assigned num‐
34 bers.
35
When a process wants to receive new incoming packets or connections, it
should bind a socket to a local interface address using bind(2). Only
one IP socket may be bound to any given local (address, port) pair.
When INADDR_ANY is specified in the bind call, the socket will be bound
to all local interfaces. When listen(2) or connect(2) is called on an
unbound socket, it is automatically bound to a random free port with
the local address set to INADDR_ANY.
43
44 A TCP local socket address that has been bound is unavailable for some
45 time after closing, unless the SO_REUSEADDR flag has been set. Care
46 should be taken when using this flag as it makes TCP less reliable.
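
The binding rules above can be sketched as follows. This is a minimal
illustration, not a reference implementation: the helper name
open_listener() and the backlog of 16 are assumptions made for the
example and appear nowhere in the kernel interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return a listening TCP socket bound to
 * INADDR_ANY on `port` (host byte order), or -1 on error.
 * Passing port 0 lets the kernel pick a free port. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Allow the local address to be reused shortly after closing. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    if (listen(fd, 16) < 0) /* backlog value is arbitrary */
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

A caller would typically pass the returned descriptor to accept(2) and
close(2) it when done.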
47
48
50 An IP socket address is defined as a combination of an IP interface
address and a 16-bit port number. The basic IP protocol does not supply
port numbers; they are implemented by higher level protocols like
53 udp(7) and tcp(7). On raw sockets sin_port is set to the IP protocol.
54
55 struct sockaddr_in {
56 sa_family_t sin_family; /* address family: AF_INET */
57 u_int16_t sin_port; /* port in network byte order */
58 struct in_addr sin_addr; /* internet address */
59 };
60
61 /* Internet address. */
62 struct in_addr {
63 u_int32_t s_addr; /* address in network byte order */
64 };
65
66 sin_family is always set to AF_INET. This is required; in Linux 2.2
67 most networking functions return EINVAL when this setting is missing.
68 sin_port contains the port in network byte order. The port numbers
below 1024 are called reserved ports. Only privileged processes (i.e.,
those having the CAP_NET_BIND_SERVICE capability) may bind(2) to these
ports. Note that the raw IPv4 protocol as such has no concept of a
port; ports are only implemented by higher protocols like tcp(7) and
73 udp(7).
74
75 sin_addr is the IP host address. The s_addr member of struct in_addr
76 contains the host interface address in network byte order. in_addr
77 should be assigned one of the INADDR_* values (e.g., INADDR_ANY) or set
78 using the inet_aton(3), inet_addr(3), inet_makeaddr(3) library func‐
79 tions or directly with the name resolver (see gethostbyname(3)). IPv4
80 addresses are divided into unicast, broadcast and multicast addresses.
81 Unicast addresses specify a single interface of a host, broadcast
82 addresses specify all hosts on a network and multicast addresses
83 address all hosts in a multicast group. Datagrams to broadcast
addresses can only be sent or received when the SO_BROADCAST socket
85 flag is set. In the current implementation connection oriented sockets
86 are only allowed to use unicast addresses.
87
88 Note that the address and the port are always stored in network byte
89 order. In particular, this means that you need to call htons(3) on the
90 number that is assigned to a port. All address/port manipulation func‐
91 tions in the standard library work in network byte order.
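
As an illustration of the byte-order rules above, the following sketch
fills a sockaddr_in from a dotted-quad string and a port given in host
byte order. The helper name make_sockaddr() is an assumption made for
this example only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: fill `sa` from a dotted-quad string and a port
 * in host byte order. Returns 0 on success, -1 on a malformed address. */
int make_sockaddr(struct sockaddr_in *sa, const char *dotted,
                  unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;       /* required */
    sa->sin_port = htons(port);     /* convert to network byte order */

    /* inet_aton(3) stores the address in network byte order already. */
    if (inet_aton(dotted, &sa->sin_addr) == 0)
        return -1;
    return 0;
}
```

The INADDR_* constants are defined in host byte order, so assigning one
directly to s_addr requires htonl(3), for example
sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK).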
92
93 There are several special addresses: INADDR_LOOPBACK (127.0.0.1) always
94 refers to the local host via the loopback device; INADDR_ANY (0.0.0.0)
95 means any address for binding; INADDR_BROADCAST (255.255.255.255) means
96 any host and has the same effect on bind as INADDR_ANY for historical
97 reasons.
98
100 IP supports some protocol specific socket options that can be set with
101 setsockopt(2) and read with getsockopt(2). The socket option level for
102 IP is IPPROTO_IP. A boolean integer flag is zero when it is false,
103 otherwise true.
104
105 IP_OPTIONS
Sets or gets the IP options to be sent with every packet from
107 this socket. The arguments are a pointer to a memory buffer
108 containing the options and the option length. The setsockopt(2)
109 call sets the IP options associated with a socket. The maximum
110 option size for IPv4 is 40 bytes. See RFC 791 for the allowed
111 options. When the initial connection request packet for a
112 SOCK_STREAM socket contains IP options, the IP options will be
113 set automatically to the options from the initial packet with
114 routing headers reversed. Incoming packets are not allowed to
115 change options after the connection is established. The pro‐
116 cessing of all incoming source routing options is disabled by
117 default and can be enabled by using the accept_source_route
118 sysctl. Other options like timestamps are still handled. For
datagram sockets, IP options can only be set by the local user.
120 Calling getsockopt(2) with IP_OPTIONS puts the current IP
121 options used for sending into the supplied buffer.
122
123 IP_PKTINFO
124 Pass an IP_PKTINFO ancillary message that contains a pktinfo
125 structure that supplies some information about the incoming
126 packet. This only works for datagram oriented sockets. The
127 argument is a flag that tells the socket whether the IP_PKTINFO
128 message should be passed or not. The message itself can only be
129 sent/retrieved as control message with a packet using recvmsg(2)
130 or sendmsg(2).
131
132 struct in_pktinfo {
133 unsigned int ipi_ifindex; /* Interface index */
134 struct in_addr ipi_spec_dst; /* Local address */
135 struct in_addr ipi_addr; /* Header Destination
136 address */
137 };
138
139 ipi_ifindex is the unique index of the interface the packet was
140 received on. ipi_spec_dst is the local address of the packet
141 and ipi_addr is the destination address in the packet header.
142 If IP_PKTINFO is passed to sendmsg(2) and ipi_spec_dst is not
143 zero, then it is used as the local source address for the rout‐
144 ing table lookup and for setting up IP source route options.
145 When ipi_ifindex is not zero the primary local address of the
146 interface specified by the index overwrites ipi_spec_dst for the
147 routing table lookup.
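
Receiving the IP_PKTINFO message might look like the sketch below. The
helper names enable_pktinfo() and recv_pktinfo() are assumptions made
for this example; the control-message macros are described in cmsg(3).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* some C libraries only declare in_pktinfo with this */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: ask the kernel to attach IP_PKTINFO messages. */
int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: receive one datagram and copy its packet info
 * into `*info`. Returns the payload length, or -1 on error or when no
 * IP_PKTINFO message was attached. */
ssize_t recv_pktinfo(int fd, void *buf, size_t len, struct in_pktinfo *info)
{
    union { /* the union keeps the control buffer suitably aligned */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(c), sizeof(*info));
            return n;
        }
    }
    return -1;
}
```

After a successful call, info->ipi_ifindex names the receiving interface
and info->ipi_addr holds the destination address from the packet header.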
148
149 IP_RECVTOS
150 If enabled the IP_TOS ancillary message is passed with incoming
151 packets. It contains a byte which specifies the Type of Ser‐
152 vice/Precedence field of the packet header. Expects a boolean
153 integer flag.
154
155 IP_RECVTTL
When this flag is set, pass an IP_TTL control message with the
time to live field of the received packet as a byte. Not supported
for SOCK_STREAM sockets.
159
160 IP_RECVOPTS
Pass all incoming IP options to the user in an IP_OPTIONS control
162 message. The routing header and other options are already
163 filled in for the local host. Not supported for SOCK_STREAM
164 sockets.
165
166 IP_RETOPTS
167 Identical to IP_RECVOPTS but returns raw unprocessed options
168 with timestamp and route record options not filled in for this
169 hop.
170
171 IP_TOS Set or receive the Type-Of-Service (TOS) field that is sent with
172 every IP packet originating from this socket. It is used to
173 prioritize packets on the network. TOS is a byte. There are
174 some standard TOS flags defined: IPTOS_LOWDELAY to minimize
175 delays for interactive traffic, IPTOS_THROUGHPUT to optimize
176 throughput, IPTOS_RELIABILITY to optimize for reliability,
177 IPTOS_MINCOST should be used for "filler data" where slow trans‐
178 mission doesn't matter. At most one of these TOS values can be
179 specified. Other bits are invalid and shall be cleared. Linux
180 sends IPTOS_LOWDELAY datagrams first by default, but the exact
181 behaviour depends on the configured queueing discipline. Some
182 high priority levels may require superuser privileges (the
183 CAP_NET_ADMIN capability). The priority can also be set in a
184 protocol independent way by the (SOL_SOCKET, SO_PRIORITY) socket
185 option (see socket(7)).
186
187 IP_TTL Set or retrieve the current time to live field that is used in
188 every packet sent from this socket.
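
Setting and reading back IP_TTL and IP_TOS could be done as in the
following sketch. The helper names set_ttl_tos() and get_ip_int() are
assumptions made for this example.

```c
#include <netinet/in.h>
#include <netinet/ip.h> /* IPTOS_* constants */
#include <sys/socket.h>

/* Hypothetical helper: set the TTL and TOS used for packets sent
 * from `fd`. Returns 0 on success, -1 on error. */
int set_ttl_tos(int fd, int ttl, int tos)
{
    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read an integer-valued IPPROTO_IP option.
 * Returns 0 on success, -1 on error. */
int get_ip_int(int fd, int optname, int *value)
{
    socklen_t len = sizeof(*value);
    return getsockopt(fd, IPPROTO_IP, optname, value, &len);
}
```

For example, set_ttl_tos(fd, 32, IPTOS_LOWDELAY) requests a TTL of 32
and low-delay service for interactive traffic.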
189
190 IP_HDRINCL
191 If enabled the user supplies an IP header in front of the user
192 data. Only valid for SOCK_RAW sockets. See raw(7) for more
193 information. When this flag is enabled the values set by
194 IP_OPTIONS, IP_TTL and IP_TOS are ignored.
195
196 IP_RECVERR (defined in <linux/errqueue.h>)
197 Enable extended reliable error message passing. When enabled on
198 a datagram socket all generated errors will be queued in a per-
199 socket error queue. When the user receives an error from a
200 socket operation the errors can be received by calling
201 recvmsg(2) with the MSG_ERRQUEUE flag set. The sock_extended_err
structure describing the error will be passed in an ancillary
203 message with the type IP_RECVERR and the level IPPROTO_IP. This
204 is useful for reliable error handling on unconnected sockets.
205 The received data portion of the error queue contains the error
206 packet.
207
208 The IP_RECVERR control message contains a sock_extended_err
209 structure:
210
211 #define SO_EE_ORIGIN_NONE 0
212 #define SO_EE_ORIGIN_LOCAL 1
213 #define SO_EE_ORIGIN_ICMP 2
214 #define SO_EE_ORIGIN_ICMP6 3
215
216 struct sock_extended_err {
217 u_int32_t ee_errno; /* error number */
218 u_int8_t ee_origin; /* where the error originated */
219 u_int8_t ee_type; /* type */
220 u_int8_t ee_code; /* code */
221 u_int8_t ee_pad;
222 u_int32_t ee_info; /* additional information */
223 u_int32_t ee_data; /* other data */
224 /* More data may follow */
225 };
226
227 struct sockaddr *SO_EE_OFFENDER(struct sock_extended_err *);
228
229 ee_errno contains the errno number of the queued error. ee_ori‐
230 gin is the origin code of where the error originated. The other
231 fields are protocol specific. The macro SO_EE_OFFENDER returns a
232 pointer to the address of the network object where the error
233 originated from given a pointer to the ancillary message. If
234 this address is not known, the sa_family member of the sockaddr
235 contains AF_UNSPEC and the other fields of the sockaddr are
236 undefined.
237
238 IP uses the sock_extended_err structure as follows: ee_origin is
239 set to SO_EE_ORIGIN_ICMP for errors received as an ICMP packet,
240 or SO_EE_ORIGIN_LOCAL for locally generated errors. Unknown val‐
241 ues should be ignored. ee_type and ee_code are set from the
242 type and code fields of the ICMP header. ee_info contains the
243 discovered MTU for EMSGSIZE errors. The message also contains
the sockaddr_in of the node that caused the error, which can be
245 accessed with the SO_EE_OFFENDER macro. The sin_family field of
246 the SO_EE_OFFENDER address is AF_UNSPEC when the source was
247 unknown. When the error originated from the network, all IP
248 options (IP_OPTIONS, IP_TTL, etc.) enabled on the socket and
249 contained in the error packet are passed as control messages.
250 The payload of the packet causing the error is returned as nor‐
251 mal payload. Note that TCP has no error queue; MSG_ERRQUEUE is
252 illegal on SOCK_STREAM sockets. IP_RECVERR is valid for TCP,
253 but all errors are returned by socket function return or
254 SO_ERROR only.
255
For raw sockets, IP_RECVERR enables passing of all received ICMP
errors to the application; otherwise errors are only reported on
connected sockets.
259
260 It sets or retrieves an integer boolean flag. IP_RECVERR
261 defaults to off.
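
Draining the error queue of a datagram socket might look like the
following sketch. The helper names enable_recverr() and
read_queued_error() and the 512-byte buffer sizes are assumptions made
for this example.

```c
#include <errno.h>
#include <time.h>           /* struct timespec, for older kernel headers */
#include <linux/errqueue.h> /* struct sock_extended_err */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: turn on extended error reporting. */
int enable_recverr(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/* Hypothetical helper: dequeue one error without blocking.
 * Returns 0 and fills `*err` on success; returns -1 otherwise
 * (errno is EAGAIN when the error queue is empty). */
int read_queued_error(int fd, struct sock_extended_err *err)
{
    char payload[512]; /* receives the packet that caused the error */
    union {
        char buf[512];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
    struct msghdr msg;
    struct cmsghdr *c;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_RECVERR) {
            memcpy(err, CMSG_DATA(c), sizeof(*err));
            return 0;
        }
    }
    return -1;
}
```

On success, err->ee_errno holds the queued errno value and
err->ee_origin tells whether the error was local or came from ICMP.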
262
263 IP_MTU_DISCOVER
264 Sets or receives the Path MTU Discovery setting for a socket.
265 When enabled, Linux will perform Path MTU Discovery as defined
266 in RFC 1191 on this socket. The don't fragment flag is set on
267 all outgoing datagrams. The system-wide default is controlled
268 by the ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and dis‐
269 abled on all others. For non SOCK_STREAM sockets it is the
270 user's responsibility to packetize the data in MTU sized chunks
271 and to do the retransmits if necessary. The kernel will reject
272 packets that are bigger than the known path MTU if this flag is
set (with EMSGSIZE).
274
Path MTU discovery flags   Meaning
IP_PMTUDISC_WANT           Use per-route settings.
IP_PMTUDISC_DONT           Never do Path MTU Discovery.
IP_PMTUDISC_DO             Always do Path MTU Discovery.
279
When PMTU discovery is enabled, the kernel automatically keeps
track of the path MTU per destination host. When the socket is
connected to a specific peer with connect(2), the currently known
path MTU can be retrieved conveniently using the IP_MTU socket
option (e.g., after an EMSGSIZE error occurred). It may change
over time. For connectionless sockets with many destinations,
the new MTU for a given destination can also be accessed
using the error queue (see IP_RECVERR). A new error will be
queued for every incoming MTU update.
289
290 While MTU discovery is in progress initial packets from datagram
291 sockets may be dropped. Applications using UDP should be aware
292 of this and not take it into account for their packet retransmit
293 strategy.
294
295 To bootstrap the path MTU discovery process on unconnected sock‐
296 ets it is possible to start with a big datagram size (up to 64K-
297 headers bytes long) and let it shrink by updates of the path
298 MTU.
299
300 To get an initial estimate of the path MTU connect a datagram
301 socket to the destination address using connect(2) and retrieve
302 the MTU by calling getsockopt(2) with the IP_MTU option.
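
The initial-estimate procedure just described can be sketched as
follows. The helper name connected_path_mtu() is an assumption made for
this example; it opens a throwaway datagram socket so that the caller's
own sockets are left untouched.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the currently known path MTU towards
 * `dst`, or -1 on error. No packet is sent; connect(2) on a datagram
 * socket only performs the route lookup. */
int connected_path_mtu(const struct sockaddr_in *dst)
{
    int mode = IP_PMTUDISC_DO; /* always do Path MTU Discovery */
    int mtu = -1;
    socklen_t len = sizeof(mtu);
    int fd = socket(PF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) < 0
        || connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0
        || getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;

    close(fd);
    return mtu;
}
```

The value is only an estimate; as noted above, it may change over time
as MTU updates arrive.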
303
304 IP_MTU Retrieve the current known path MTU of the current socket. Only
305 valid when the socket has been connected. Returns an integer.
306 Only valid as a getsockopt(2).
307
308 IP_ROUTER_ALERT
309 Pass all to-be forwarded packets with the IP Router Alert option
310 set to this socket. Only valid for raw sockets. This is useful,
311 for instance, for user space RSVP daemons. The tapped packets
are not forwarded by the kernel; it is the user's responsibility
to send them out again. Socket binding is ignored; such packets
314 are only filtered by protocol. Expects an integer flag.
315
316 IP_MULTICAST_TTL
Sets or reads the time-to-live value of outgoing multicast packets
for this socket. It is very important for multicast packets
319 to set the smallest TTL possible. The default is 1 which means
320 that multicast packets don't leave the local network unless the
321 user program explicitly requests it. Argument is an integer.
322
323 IP_MULTICAST_LOOP
324 Sets or reads a boolean integer argument whether sent multicast
325 packets should be looped back to the local sockets.
326
327 IP_ADD_MEMBERSHIP
328 Join a multicast group. Argument is an ip_mreqn structure.
329
330 struct ip_mreqn {
331 struct in_addr imr_multiaddr; /* IP multicast group
332 address */
333 struct in_addr imr_address; /* IP address of local
334 interface */
335 int imr_ifindex; /* interface index */
336 };
337
338 imr_multiaddr contains the address of the multicast group the
339 application wants to join or leave. It must be a valid multi‐
340 cast address. imr_address is the address of the local interface
341 with which the system should join the multicast group; if it is
342 equal to INADDR_ANY an appropriate interface is chosen by the
343 system. imr_ifindex is the interface index of the interface
344 that should join/leave the imr_multiaddr group, or 0 to indicate
345 any interface.
346
347 For compatibility, the old ip_mreq structure is still supported.
348 It differs from ip_mreqn only by not including the imr_ifindex
349 field. Only valid as a setsockopt(2).
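
Joining a group with ip_mreqn might look like the sketch below. The
helper name join_group() is an assumption made for this example, and
the up-front address check is a convenience added here rather than
something the interface requires.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* some C libraries only declare ip_mreqn with this */
#endif
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: join the multicast group `group` (dotted quad)
 * on interface `ifindex`, or on any interface when `ifindex` is 0.
 * Returns 0 on success, -1 on error with errno set. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    memset(&mreq, 0, sizeof(mreq));
    if (inet_aton(group, &mreq.imr_multiaddr) == 0
        || !IN_MULTICAST(ntohl(mreq.imr_multiaddr.s_addr))) {
        errno = EINVAL; /* not a valid multicast address */
        return -1;
    }
    mreq.imr_address.s_addr = htonl(INADDR_ANY); /* let the system choose */
    mreq.imr_ifindex = ifindex;

    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```

Leaving the group would pass the same structure with
IP_DROP_MEMBERSHIP.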
350
351 IP_DROP_MEMBERSHIP
352 Leave a multicast group. Argument is an ip_mreqn or ip_mreq
353 structure similar to IP_ADD_MEMBERSHIP.
354
355 IP_MULTICAST_IF
356 Set the local device for a multicast socket. Argument is an
357 ip_mreqn or ip_mreq structure similar to IP_ADD_MEMBERSHIP.
358
359 When an invalid socket option is passed, ENOPROTOOPT is
360 returned.
361
363 The IP protocol supports the sysctl interface to configure some global
364 options. The sysctls can be accessed by reading or writing the
365 /proc/sys/net/ipv4/* files or using the sysctl(2) interface. Variables
366 described as Boolean take an integer value, with a non-zero value
367 ("true") meaning that the corresponding option is enabled, and a zero
368 value ("false") meaning that the option is disabled.
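
Reading one of these files from a program could be sketched as follows.
The helper name read_ipv4_sysctl() is an assumption made for this
example, and it only handles files that hold a single integer.

```c
#include <stdio.h>

/* Hypothetical helper: read the integer sysctl
 * /proc/sys/net/ipv4/<name> into `*value`.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_ipv4_sysctl(const char *name, long *value)
{
    char path[256];
    FILE *f;
    int ok;

    if (snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name)
        >= (int)sizeof(path))
        return -1; /* name too long for the buffer */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For instance, read_ipv4_sysctl("ip_default_ttl", &ttl) would fetch the
default time-to-live described below.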
369
370 ip_always_defrag (Boolean)
[New with kernel 2.2.13; in earlier kernel versions the feature
was controlled at compile time by the CONFIG_IP_ALWAYS_DEFRAG
option; this file is not present in 2.4.x and later]

When this boolean flag is enabled (not equal to 0), incoming fragments
(parts of IP packets that arose when some host between
origin and destination decided that the packets were too large
and cut them into pieces) will be reassembled (defragmented)
before being processed, even if they are about to be forwarded.

Only enable this if running either a firewall that is the sole link
to your network or a transparent proxy; never turn it on
for a normal router or host. Otherwise fragmented communication
may be disturbed when the fragments travel over different
links. Defragmentation also has a large memory and CPU time
cost.
387
388 This is automagically turned on when masquerading or transparent
389 proxying are configured.
390
391 ip_autoconfig
392 Not documented.
393
394 ip_default_ttl (integer; default: 64)
395 Set the default time-to-live value of outgoing packets. This
396 can be changed per socket with the IP_TTL option.
397
398 ip_dynaddr (Boolean; default: disabled)
399 Enable dynamic socket address and masquerading entry rewriting
on interface address change. This is useful for dialup interfaces
with changing IP addresses. 0 means no rewriting, 1 turns
402 it on and 2 enables verbose mode.
403
404 ip_forward (Boolean; default: disabled)
405 Enable IP forwarding with a boolean flag. IP forwarding can be
406 also set on a per interface basis.
407
408 ip_local_port_range
409 Contains two integers that define the default local port range
410 allocated to sockets. Allocation starts with the first number
411 and ends with the second number. Note that these should not
412 conflict with the ports used by masquerading (although the case
413 is handled). Also arbitrary choices may cause problems with
414 some firewall packet filters that make assumptions about the
415 local ports in use. First number should be at least >1024, bet‐
416 ter >4096 to avoid clashes with well known ports and to minimize
417 firewall problems.
418
419 ip_no_pmtu_disc (Boolean; default: disabled)
420 If enabled, don't do Path MTU Discovery for TCP sockets by
421 default. Path MTU discovery may fail if misconfigured firewalls
422 (that drop all ICMP packets) or misconfigured interfaces (e.g.,
a point-to-point link where both ends don't agree on the
424 MTU) are on the path. It is better to fix the broken routers on
425 the path than to turn off Path MTU Discovery globally, because
426 not doing it incurs a high cost to the network.
427
428 ip_nonlocal_bind (Boolean; default: disabled)
429 If set, allows processes to bind() to non-local IP addresses,
430 which can be quite useful, but may break some applications.
431
432 ip6frag_time (integer; default 30)
433 Time in seconds to keep an IPv6 fragment in memory.
434
435 ip6frag_secret_interval (integer; default 600)
436 Regeneration interval (in seconds) of the hash secret (or life‐
437 time for the hash secret) for IPv6 fragments.
438
439 ipfrag_high_thresh (integer), ipfrag_low_thresh (integer)
440 If the amount of queued IP fragments reaches ipfrag_high_thresh,
441 the queue is pruned down to ipfrag_low_thresh. Contains an
442 integer with the number of bytes.
443
444 neigh/*
445 See arp(7).
446
448 All ioctls described in socket(7) apply to ip.
449
450 Ioctls to configure generic device parameters are described in netde‐
451 vice(7).
452
454 Be very careful with the SO_BROADCAST option - it is not privileged in
455 Linux. It is easy to overload the network with careless broadcasts.
456 For new application protocols it is better to use a multicast group
457 instead of broadcasting. Broadcasting is discouraged.
458
459 Some other BSD sockets implementations provide IP_RCVDSTADDR and
460 IP_RECVIF socket options to get the destination address and the inter‐
461 face of received datagrams. Linux has the more general IP_PKTINFO for
462 the same task.
463
464 Some BSD sockets implementations also provide an IP_RECVTTL option, but
465 an ancillary message with type IP_RECVTTL is passed with the incoming
466 packet. This is different from the IP_TTL option used in Linux.
467
Using the SOL_IP socket option level isn't portable; BSD-based stacks
use the IPPROTO_IP level.
470
472 ENOTCONN
473 The operation is only defined on a connected socket, but the
474 socket wasn't connected.
475
476 EINVAL Invalid argument passed. For send operations this can be caused
477 by sending to a blackhole route.
478
479 EMSGSIZE
480 Datagram is bigger than an MTU on the path and it cannot be
481 fragmented.
482
483 EACCES The user tried to execute an operation without the necessary
484 permissions. These include: sending a packet to a broadcast
485 address without having the SO_BROADCAST flag set; sending a
486 packet via a prohibit route; modifying firewall settings without
487 superuser privileges (the CAP_NET_ADMIN capability); binding to
488 a reserved port without superuser privileges (the
489 CAP_NET_BIND_SERVICE capability).
490
491
492 EADDRINUSE
493 Tried to bind to an address already in use.
494
495 ENOPROTOOPT and EOPNOTSUPP
496 Invalid socket option passed.
497
498 EPERM User doesn't have permission to set high priority, change con‐
499 figuration, or send signals to the requested process or group.
500
501 EADDRNOTAVAIL
502 A non-existent interface was requested or the requested source
503 address was not local.
504
505 EAGAIN Operation on a non-blocking socket would block.
506
507 ESOCKTNOSUPPORT
508 The socket is not configured or an unknown socket type was
509 requested.
510
511 EISCONN
512 connect(2) was called on an already connected socket.
513
514 EALREADY
A connection operation on a non-blocking socket is already in
516 progress.
517
518 ECONNABORTED
519 A connection was closed during an accept(2).
520
521 EPIPE The connection was unexpectedly closed or shut down by the other
522 end.
523
524 ENOENT SIOCGSTAMP was called on a socket where no packet arrived.
525
526 EHOSTUNREACH
527 No valid routing table entry matches the destination address.
This error can be caused by an ICMP message from a remote router
or by the local routing table.
530
531 ENODEV Network device not available or not capable of sending IP.
532
533 ENOPKG A kernel subsystem was not configured.
534
535 ENOBUFS, ENOMEM
536 Not enough free memory. This often means that the memory allo‐
537 cation is limited by the socket buffer limits, not by the system
538 memory, but this is not 100% consistent.
539
540 Other errors may be generated by the overlaying protocols; see tcp(7),
541 raw(7), udp(7) and socket(7).
542
544 IP_MTU, IP_MTU_DISCOVER, IP_PKTINFO, IP_RECVERR and IP_ROUTER_ALERT are
545 new options in Linux 2.2. They are also all Linux specific and should
546 not be used in programs intended to be portable.
547
548 struct ip_mreqn is new in Linux 2.2. Linux 2.0 only supported ip_mreq.
549
550 The sysctls were introduced with Linux 2.2.
551
553 For compatibility with Linux 2.0, the obsolete socket(PF_INET,
554 SOCK_PACKET, protocol) syntax is still supported to open a packet(7)
555 socket. This is deprecated and should be replaced by socket(PF_PACKET,
556 SOCK_RAW, protocol) instead. The main difference is the new sockaddr_ll
557 address structure for generic link layer information instead of the old
558 sockaddr_pkt.
559
561 There are too many inconsistent error values.
562
563 The ioctls to configure IP-specific interface options and ARP tables
564 are not described.
565
Some versions of glibc forget to declare in_pktinfo. The current
workaround is to copy it into your program from this man page.
568
569 Receiving the original destination address with MSG_ERRQUEUE in
570 msg_name by recvmsg(2) does not work in some 2.2 kernels.
571
573 recvmsg(2), sendmsg(2), byteorder(3), ipfw(4), capabilities(7),
574 netlink(7), raw(7), socket(7), tcp(7), udp(7)
575
576 RFC 791 for the original IP specification.
577 RFC 1122 for the IPv4 host requirements.
578 RFC 1812 for the IPv4 router requirements.
579
580
581
Linux Man Page 2001-06-19 IP(7)