PACKET(7)              Linux Programmer's Manual              PACKET(7)

NAME
packet - packet interface on device level

SYNOPSIS
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/ethernet.h> /* the L2 protocols */

packet_socket = socket(AF_PACKET, int socket_type, int protocol);

DESCRIPTION
Packet sockets are used to receive or send raw packets at the device
driver (OSI Layer 2) level. They allow the user to implement protocol
modules in user space on top of the physical layer.

The socket_type is either SOCK_RAW for raw packets including the
link-level header or SOCK_DGRAM for cooked packets with the link-level
header removed. The link-level header information is available in a
common format in a sockaddr_ll structure. protocol is the IEEE 802.3
protocol number in network byte order. See the <linux/if_ether.h>
include file for a list of allowed protocols. When protocol is set to
htons(ETH_P_ALL), then all protocols are received. All incoming
packets of that protocol type will be passed to the packet socket before
they are passed to the protocols implemented in the kernel.

In order to create a packet socket, a process must have the CAP_NET_RAW
capability in the user namespace that governs its network namespace.
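As a minimal sketch of the call described above, the following helper opens a packet socket that receives every protocol. The name open_packet_socket is hypothetical and not part of any API; only socket(2), AF_PACKET, and htons(ETH_P_ALL) come from this page.

```c
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <sys/socket.h>

/* Hypothetical helper: open a packet socket that sees all protocols.
 * socket_type is SOCK_RAW (link-level header included) or SOCK_DGRAM
 * (link-level header removed).
 * Returns the descriptor, or -1 with errno set; a permission error
 * usually means the caller lacks CAP_NET_RAW. */
static int open_packet_socket(int socket_type)
{
    /* The protocol argument must be in network byte order. */
    return socket(AF_PACKET, socket_type, htons(ETH_P_ALL));
}
```

A caller would typically check the result against -1 and report errno, since an unprivileged process is expected to fail here.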

SOCK_RAW packets are passed to and from the device driver without any
changes in the packet data. When receiving a packet, the address is
still parsed and passed in a standard sockaddr_ll address structure.
When transmitting a packet, the user-supplied buffer should contain the
physical-layer header. That packet is then queued unmodified to the
network driver of the interface defined by the destination address.
Some device drivers always add other headers. SOCK_RAW is similar to
but not compatible with the obsolete AF_INET/SOCK_PACKET of Linux 2.0.

SOCK_DGRAM operates on a slightly higher level. The physical header is
removed before the packet is passed to the user. Packets sent through
a SOCK_DGRAM packet socket get a suitable physical-layer header based
on the information in the sockaddr_ll destination address before they
are queued.

By default, all packets of the specified protocol type are passed to a
packet socket. To get packets only from a specific interface use
bind(2) specifying an address in a struct sockaddr_ll to bind the
packet socket to an interface. Fields used for binding are sll_family
(should be AF_PACKET), sll_protocol, and sll_ifindex.
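The binding step can be sketched as follows. The helper names make_bind_addr and bind_packet_socket are hypothetical; the sketch fills in only the three fields named above and leaves everything else zero.

```c
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_P_* constants */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the address that bind(2) expects.
 * proto is given in host byte order, for example ETH_P_IP. */
static struct sockaddr_ll make_bind_addr(int ifindex, unsigned short proto)
{
    struct sockaddr_ll addr;

    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = htons(proto);  /* network byte order */
    addr.sll_ifindex = ifindex;        /* 0 matches any interface */
    return addr;
}

/* Hypothetical helper: bind fd to one interface and protocol.
 * Returns 0 on success, or -1 with errno set. */
static int bind_packet_socket(int fd, int ifindex, unsigned short proto)
{
    struct sockaddr_ll addr = make_bind_addr(ifindex, proto);

    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}
```

The interface index can be obtained from an interface name with if_nametoindex(3).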

The connect(2) operation is not supported on packet sockets.

When the MSG_TRUNC flag is passed to recvmsg(2), recv(2), or
recvfrom(2), the real length of the packet on the wire is always
returned, even when it is longer than the buffer.
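One way to use this behavior is to compare the returned length with the buffer size to detect truncation. This is a sketch only; recv_full_length is a hypothetical name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one packet into buf with MSG_TRUNC.
 * Returns the real packet length, which may exceed buflen, or -1 on
 * error. *truncated is set to 1 if the packet did not fit into buf,
 * and to 0 otherwise; it is left untouched on error. */
static ssize_t recv_full_length(int fd, void *buf, size_t buflen,
                                int *truncated)
{
    ssize_t n = recv(fd, buf, buflen, MSG_TRUNC);

    if (n >= 0)
        *truncated = (size_t)n > buflen;
    return n;
}
```

Only the first buflen bytes of buf are valid when *truncated is set; the remainder of the packet is discarded.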

Address types
The sockaddr_ll structure is a device-independent physical-layer
address.

    struct sockaddr_ll {
        unsigned short sll_family;   /* Always AF_PACKET */
        unsigned short sll_protocol; /* Physical-layer protocol */
        int            sll_ifindex;  /* Interface number */
        unsigned short sll_hatype;   /* ARP hardware type */
        unsigned char  sll_pkttype;  /* Packet type */
        unsigned char  sll_halen;    /* Length of address */
        unsigned char  sll_addr[8];  /* Physical-layer address */
    };

The fields of this structure are as follows:

* sll_protocol is the standard Ethernet protocol type in network byte
  order as defined in the <linux/if_ether.h> include file. It
  defaults to the socket's protocol.

* sll_ifindex is the interface index of the interface (see
  netdevice(7)); 0 matches any interface (only permitted for binding).

* sll_hatype is an ARP type as defined in the <linux/if_arp.h> include
  file.

* sll_pkttype contains the packet type. Valid types are PACKET_HOST
  for a packet addressed to the local host, PACKET_BROADCAST for a
  physical-layer broadcast packet, PACKET_MULTICAST for a packet sent
  to a physical-layer multicast address, PACKET_OTHERHOST for a packet
  to some other host that has been caught by a device driver in
  promiscuous mode, and PACKET_OUTGOING for a packet originating from the
  local host that is looped back to a packet socket. These types make
  sense only for receiving.

* sll_addr and sll_halen contain the physical-layer (e.g., IEEE 802.3)
  address and its length. The exact interpretation depends on the
  device.

When you send packets, it is enough to specify sll_family, sll_addr,
sll_halen, sll_ifindex, and sll_protocol. The other fields should be
0. sll_hatype and sll_pkttype are set on received packets for your
information.
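A destination address for sending might be filled in as in the sketch below. The helper name make_send_addr is hypothetical, and the sketch assumes a 6-byte Ethernet address (ETH_ALEN); other link types use other lengths.

```c
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_ALEN, ETH_P_* constants */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build a destination address for sendto(2).
 * Assumes an Ethernet interface, so the address is ETH_ALEN bytes. */
static struct sockaddr_ll make_send_addr(int ifindex, unsigned short proto,
                                         const unsigned char mac[ETH_ALEN])
{
    struct sockaddr_ll addr;

    /* Zeroing first keeps sll_hatype and sll_pkttype at 0. */
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = htons(proto);
    addr.sll_ifindex = ifindex;
    addr.sll_halen = ETH_ALEN;
    memcpy(addr.sll_addr, mac, ETH_ALEN);
    return addr;
}
```

With a SOCK_DGRAM packet socket, passing this address to sendto(2) lets the kernel build the link-level header from it.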

Socket options
Packet socket options are configured by calling setsockopt(2) with
level SOL_PACKET.

PACKET_ADD_MEMBERSHIP
PACKET_DROP_MEMBERSHIP
    Packet sockets can be used to configure physical-layer
    multicasting and promiscuous mode. PACKET_ADD_MEMBERSHIP adds a
    binding and PACKET_DROP_MEMBERSHIP drops it. They both expect a
    packet_mreq structure as argument:

        struct packet_mreq {
            int            mr_ifindex;    /* interface index */
            unsigned short mr_type;       /* action */
            unsigned short mr_alen;       /* address length */
            unsigned char  mr_address[8]; /* physical-layer address */
        };

    mr_ifindex contains the interface index for the interface whose
    status should be changed. The mr_type field specifies which
    action to perform. PACKET_MR_PROMISC enables receiving all
    packets on a shared medium (often known as "promiscuous mode"),
    PACKET_MR_MULTICAST binds the socket to the physical-layer
    multicast group specified in mr_address and mr_alen, and
    PACKET_MR_ALLMULTI sets the socket up to receive all multicast
    packets arriving at the interface.

    In addition, the traditional ioctls SIOCSIFFLAGS, SIOCADDMULTI,
    and SIOCDELMULTI can be used for the same purpose.
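As an illustration of the membership options, the sketch below toggles promiscuous mode on one interface. The name set_promiscuous is hypothetical; the structure and option names are the ones listed above.

```c
#include <linux/if_packet.h>  /* struct packet_mreq, PACKET_MR_PROMISC */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: add (enable != 0) or drop (enable == 0) a
 * promiscuous-mode membership for the given interface.
 * Returns 0 on success, or -1 with errno set. */
static int set_promiscuous(int fd, int ifindex, int enable)
{
    struct packet_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));   /* mr_alen and mr_address unused */
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;

    return setsockopt(fd, SOL_PACKET,
                      enable ? PACKET_ADD_MEMBERSHIP
                             : PACKET_DROP_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```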

PACKET_AUXDATA (since Linux 2.6.21)
    If this binary option is enabled, the packet socket passes a
    metadata structure along with each packet in the recvmsg(2)
    control field. The structure can be read with cmsg(3). It is
    defined as

        struct tpacket_auxdata {
            __u32 tp_status;
            __u32 tp_len;      /* packet length */
            __u32 tp_snaplen;  /* captured length */
            __u16 tp_mac;
            __u16 tp_net;
            __u16 tp_vlan_tci;
            __u16 tp_padding;
        };
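Reading the structure with the cmsg(3) macros could look like the sketch below. The helper names enable_auxdata and get_auxdata are hypothetical, and the sketch assumes the control message arrives with level SOL_PACKET and type PACKET_AUXDATA.

```c
#include <linux/if_packet.h>  /* struct tpacket_auxdata, PACKET_AUXDATA */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: switch the binary option on.
 * Returns 0 on success, or -1 with errno set. */
static int enable_auxdata(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &one, sizeof(one));
}

/* Hypothetical helper: search the control data of a message filled in
 * by recvmsg(2) and copy the auxdata structure into *out.
 * Returns 1 if the structure was found, 0 otherwise. */
static int get_auxdata(struct msghdr *msg, struct tpacket_auxdata *out)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_PACKET &&
            cmsg->cmsg_type == PACKET_AUXDATA &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* Copy out rather than cast, to avoid alignment issues. */
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

The caller is expected to supply a control buffer of at least CMSG_SPACE(sizeof(struct tpacket_auxdata)) bytes in msg_control before calling recvmsg(2).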

PACKET_FANOUT (since Linux 3.1)
    To scale processing across threads, packet sockets can form a
    fanout group. In this mode, each matching packet is enqueued
    onto only one socket in the group. A socket joins a fanout
    group by calling setsockopt(2) with level SOL_PACKET and option
    PACKET_FANOUT. Each network namespace can have up to 65536
    independent groups. A socket selects a group by encoding the ID
    in the low (least significant) 16 bits of the integer option
    value. The first packet socket to join a group implicitly
    creates it. To successfully join an existing group, subsequent
    packet sockets must have the same protocol, device settings,
    fanout mode and flags (see below). Packet sockets can leave a
    fanout group only by closing the socket. The group is deleted
    when the last socket is closed.

    Fanout supports multiple algorithms to spread traffic between
    sockets, as follows:

    * The default mode, PACKET_FANOUT_HASH, sends packets from the
      same flow to the same socket to maintain per-flow ordering.
      For each packet, it chooses a socket by taking the packet
      flow hash modulo the number of sockets in the group, where a
      flow hash is a hash over network-layer address and optional
      transport-layer port fields.

    * The load-balance mode PACKET_FANOUT_LB implements a
      round-robin algorithm.

    * PACKET_FANOUT_CPU selects the socket based on the CPU that
      the packet arrived on.

    * PACKET_FANOUT_ROLLOVER processes all data on a single socket,
      moving to the next when one becomes backlogged.

    * PACKET_FANOUT_RND selects the socket using a pseudo-random
      number generator.

    * PACKET_FANOUT_QM (available since Linux 3.14) selects the
      socket using the recorded queue_mapping of the received skb.

    Fanout modes can take additional options. The fanout mode and
    options are communicated in the upper 16 bits of the integer
    option value. IP fragmentation causes packets from the same
    flow to have different flow hashes. The flag
    PACKET_FANOUT_FLAG_DEFRAG, if set, causes packets to be
    defragmented before fanout is applied, to preserve order even in
    this case. The flag PACKET_FANOUT_FLAG_ROLLOVER enables the
    rollover mechanism as a backup strategy: if the original fanout
    algorithm selects a backlogged socket, the packet rolls over to
    the next available one.
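The option value can be assembled as in the sketch below, which places the group ID in the least-significant 16 bits and the mode and flags in the upper 16 bits. The helper names fanout_arg and join_fanout are hypothetical.

```c
#include <linux/if_packet.h>  /* PACKET_FANOUT and PACKET_FANOUT_* */
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: pack a group ID (low 16 bits) together with a
 * fanout mode and optional flags (upper 16 bits) into one integer. */
static int fanout_arg(uint16_t group_id, uint16_t mode_and_flags)
{
    uint32_t v = (uint32_t)group_id | ((uint32_t)mode_and_flags << 16);

    return (int)v;
}

/* Hypothetical helper: join (or implicitly create) a fanout group.
 * Returns 0 on success, or -1 with errno set. */
static int join_fanout(int fd, uint16_t group_id, uint16_t mode_and_flags)
{
    int arg = fanout_arg(group_id, mode_and_flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

For example, each worker thread might open its own socket and call join_fanout(fd, 1, PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_DEFRAG); every socket in the group has to pass the same mode and flags.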

PACKET_LOSS (with PACKET_TX_RING)
    When a malformed packet is encountered on a transmit ring, the
    default is to reset its tp_status to TP_STATUS_WRONG_FORMAT and
    abort the transmission immediately. The malformed packet blocks
    itself and subsequently enqueued packets from being sent. The
    format error must be fixed, the associated tp_status reset to
    TP_STATUS_SEND_REQUEST, and the transmission process restarted
    via send(2). However, if PACKET_LOSS is set, any malformed
    packet will be skipped, its tp_status reset to
    TP_STATUS_AVAILABLE, and the transmission process continued.

PACKET_RESERVE (with PACKET_RX_RING)
    By default, a packet receive ring writes packets immediately
    following the metadata structure and alignment padding. This
    integer option reserves additional headroom.

PACKET_RX_RING
    Create a memory-mapped ring buffer for asynchronous packet
    reception. The packet socket reserves a contiguous region of
    application address space, lays it out into an array of packet
    slots and copies packets (up to tp_snaplen) into subsequent
    slots. Each packet is preceded by a metadata structure similar
    to tpacket_auxdata. The protocol fields encode the offset to
    the data from the start of the metadata header. tp_net stores
    the offset to the network layer. If the packet socket is of
    type SOCK_DGRAM, then tp_mac is the same. If it is of type
    SOCK_RAW, then that field stores the offset to the link-layer
    frame. Packet socket and application communicate the head and
    tail of the ring through the tp_status field. The packet socket
    owns all slots with tp_status equal to TP_STATUS_KERNEL. After
    filling a slot, it changes the status of the slot to transfer
    ownership to the application. During normal operation, the new
    tp_status value has at least the TP_STATUS_USER bit set to
    signal that a received packet has been stored. When the
    application has finished processing a packet, it transfers
    ownership of the slot back to the socket by setting tp_status
    equal to TP_STATUS_KERNEL.

    Packet sockets implement multiple variants of the packet ring.
    The implementation details are described in
    Documentation/networking/packet_mmap.txt in the Linux kernel
    source tree.
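The ownership handoff through tp_status can be sketched for a single slot as follows. This assumes the default TPACKET_V1 layout, whose per-slot header is struct tpacket_hdr from <linux/if_packet.h>; the helper names are hypothetical, and __sync_synchronize() is a GCC/Clang builtin used here as a conservative memory barrier. Ring creation and mmap(2) are not shown.

```c
#include <linux/if_packet.h>  /* struct tpacket_hdr, TP_STATUS_* */
#include <stddef.h>

/* Hypothetical helper: if the slot belongs to the application, return
 * a pointer to the packet data and store the captured length in
 * *caplen. Returns NULL while the packet socket still owns the slot.
 * tp_mac is used as the data offset: for SOCK_RAW it points at the
 * link-layer frame, for SOCK_DGRAM it equals tp_net. */
static const unsigned char *slot_frame(struct tpacket_hdr *hdr,
                                       unsigned int *caplen)
{
    if (!(hdr->tp_status & TP_STATUS_USER))
        return NULL;
    __sync_synchronize();   /* read the data only after the status */
    *caplen = hdr->tp_snaplen;
    return (const unsigned char *)hdr + hdr->tp_mac;
}

/* Hypothetical helper: hand a processed slot back to the socket. */
static void slot_release(struct tpacket_hdr *hdr)
{
    __sync_synchronize();   /* finish all reads before the handoff */
    hdr->tp_status = TP_STATUS_KERNEL;
}
```

A receive loop would typically call slot_frame() on the current slot, wait with poll(2) when it returns NULL, and advance to the next slot after slot_release().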

PACKET_STATISTICS
    Retrieve packet socket statistics in the form of a structure:

        struct tpacket_stats {
            unsigned int tp_packets;  /* Total packet count */
            unsigned int tp_drops;    /* Dropped packet count */
        };

    Receiving statistics resets the internal counters. The
    statistics structure differs when using a ring of variant
    TPACKET_V3.
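Retrieving the counters is a single getsockopt(2) call, sketched below. The name read_packet_stats is hypothetical, and the sketch applies only to sockets that do not use a TPACKET_V3 ring.

```c
#include <linux/if_packet.h>  /* struct tpacket_stats, PACKET_STATISTICS */
#include <sys/socket.h>

/* Hypothetical helper: fetch the counters accumulated since the
 * previous call. Returns 0 on success, or -1 with errno set. */
static int read_packet_stats(int fd, struct tpacket_stats *st)
{
    socklen_t len = sizeof(*st);

    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, st, &len);
}
```

Because each read resets the counters, an application that wants running totals has to add up the values itself.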

PACKET_TIMESTAMP (with PACKET_RX_RING; since Linux 2.6.36)
    The packet receive ring always stores a timestamp in the
    metadata header. By default, this is a software timestamp
    generated when the packet is copied into the ring. This integer
    option selects the type of timestamp. Besides the default, it
    supports the two hardware formats described in
    Documentation/networking/timestamping.txt in the Linux kernel
    source tree.

PACKET_TX_RING (since Linux 2.6.31)
    Create a memory-mapped ring buffer for packet transmission.
    This option is similar to PACKET_RX_RING and takes the same
    arguments. The application writes packets into slots with
    tp_status equal to TP_STATUS_AVAILABLE and schedules them for
    transmission by changing tp_status to TP_STATUS_SEND_REQUEST.
    When packets are ready to be transmitted, the application calls
    send(2) or a variant thereof. The buf and len fields of this
    call are ignored. If an address is passed using sendto(2) or
    sendmsg(2), then that overrides the socket default. On
    successful transmission, the socket resets tp_status to
    TP_STATUS_AVAILABLE. It immediately aborts the transmission on
    error unless PACKET_LOSS is set.

PACKET_VERSION (with PACKET_RX_RING; since Linux 2.6.27)
    By default, PACKET_RX_RING creates a packet receive ring of
    variant TPACKET_V1. To create another variant, configure the
    desired variant by setting this integer option before creating
    the ring.

PACKET_QDISC_BYPASS (since Linux 3.14)
    By default, packets sent through packet sockets pass through the
    kernel's qdisc (traffic control) layer, which is fine for the
    vast majority of use cases. For traffic generator appliances
    using packet sockets that intend to brute-force flood the
    network (for example, to test devices under load in a similar
    fashion to pktgen), this layer can be bypassed by setting this
    integer option to 1. A side effect is that packet buffering in
    the qdisc layer is avoided, which will lead to increased drops
    when network device transmit queues are busy; therefore, use at
    your own risk.

Ioctls
SIOCGSTAMP can be used to receive the timestamp of the last received
packet. The argument is a struct timeval variable.

In addition, all standard ioctls defined in netdevice(7) and socket(7)
are valid on packet sockets.

Error handling
Packet sockets do no error handling other than for errors that occur
while passing the packet to the device driver. They don't have the
concept of a pending error.

ERRORS
EADDRNOTAVAIL
    Unknown multicast group address passed.

EFAULT
    User passed invalid memory address.

EINVAL
    Invalid argument.

EMSGSIZE
    Packet is bigger than interface MTU.

ENETDOWN
    Interface is not up.

ENOBUFS
    Not enough memory to allocate the packet.

ENODEV
    Unknown device name or interface index specified in interface
    address.

ENOENT
    No packet received.

ENOTCONN
    No interface address passed.

ENXIO
    Interface address contained an invalid interface index.

EPERM
    User has insufficient privileges to carry out this operation.

In addition, other errors may be generated by the low-level driver.

VERSIONS
AF_PACKET is a new feature in Linux 2.2. Earlier Linux versions
supported only SOCK_PACKET.

NOTES
For portable programs it is suggested to use AF_PACKET via pcap(3),
although this covers only a subset of the AF_PACKET features.

The SOCK_DGRAM packet sockets make no attempt to create or parse the
IEEE 802.2 LLC header for an IEEE 802.3 frame. When ETH_P_802_3 is
specified as protocol for sending, the kernel creates the 802.3 frame
and fills out the length field; the user has to supply the LLC header
to get a fully conforming packet. Incoming 802.3 packets are not
multiplexed on the DSAP/SSAP protocol fields; instead they are supplied
to the user as protocol ETH_P_802_2 with the LLC header prefixed. It is
thus not possible to bind to ETH_P_802_3; bind to ETH_P_802_2 instead
and do the protocol multiplex yourself. The default for sending is the
standard Ethernet DIX encapsulation with the protocol filled in.

Packet sockets are not subject to the input or output firewall chains.

Compatibility
In Linux 2.0, the only way to get a packet socket was with the call:

    socket(AF_INET, SOCK_PACKET, protocol)

This is still supported, but deprecated and strongly discouraged. The
main difference between the two methods is that SOCK_PACKET uses the
old struct sockaddr_pkt to specify an interface, which doesn't provide
physical-layer independence.

    struct sockaddr_pkt {
        unsigned short spkt_family;
        unsigned char  spkt_device[14];
        unsigned short spkt_protocol;
    };

spkt_family contains the device type, spkt_protocol is the IEEE 802.3
protocol type as defined in <sys/if_ether.h>, and spkt_device is the
device name as a null-terminated string, for example, eth0.

This structure is obsolete and should not be used in new code.

BUGS
The IEEE 802.2/802.3 LLC handling could be considered a bug.

Socket filters are not documented.

The MSG_TRUNC recvmsg(2) extension is an ugly hack and should be
replaced by a control message. There is currently no way to get the
original destination address of packets via SOCK_DGRAM.

SEE ALSO
socket(2), pcap(3), capabilities(7), ip(7), raw(7), socket(7)

RFC 894 for the standard IP Ethernet encapsulation. RFC 1700 for the
IEEE 802.3 IP encapsulation.

The <linux/if_ether.h> include file for physical-layer protocols.

The Linux kernel source tree. /Documentation/networking/filter.txt
describes how to apply Berkeley Packet Filters to packet sockets.
/tools/testing/selftests/net/psock_tpacket.c contains example source
code for all available versions of PACKET_RX_RING and PACKET_TX_RING.

COLOPHON
This page is part of release 5.04 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux                            2017-09-15                   PACKET(7)