PACKET(7) Linux Programmer's Manual PACKET(7)

NAME
packet - packet interface on device level

SYNOPSIS
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/ethernet.h> /* the L2 protocols */

packet_socket = socket(AF_PACKET, int socket_type, int protocol);

DESCRIPTION
Packet sockets are used to receive or send raw packets at the device
driver (OSI Layer 2) level. They allow the user to implement protocol
modules in user space on top of the physical layer.

The socket_type is either SOCK_RAW for raw packets including the
link-level header or SOCK_DGRAM for cooked packets with the link-level
header removed. The link-level header information is available in a
common format in a sockaddr_ll structure. protocol is the IEEE 802.3
protocol number in network byte order. See the <linux/if_ether.h>
include file for a list of allowed protocols. When protocol is set to
htons(ETH_P_ALL), then all protocols are received. All incoming
packets of that protocol type will be passed to the packet socket before
they are passed to the protocols implemented in the kernel.

In order to create a packet socket, a process must have the CAP_NET_RAW
capability in the user namespace that governs its network namespace.
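
As a minimal sketch of the creation step described above, the following
helper opens a packet socket and reports the missing-capability case
separately. The name open_packet_socket() is hypothetical and is not part
of any library; the protocol is passed in host byte order and converted
with htons().

```c
#include <arpa/inet.h>        /* htons() */
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_ALL, ETH_P_IP, ... */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: open a packet socket.
 * socket_type is SOCK_RAW or SOCK_DGRAM; eth_proto is an ETH_P_* value
 * in host byte order. Returns the descriptor, or -1 with errno set.
 */
int open_packet_socket(int socket_type, unsigned short eth_proto)
{
    int fd = socket(AF_PACKET, socket_type, htons(eth_proto));

    if (fd == -1) {
        int saved = errno;      /* fprintf() may clobber errno */

        if (saved == EPERM)
            fprintf(stderr, "packet socket: CAP_NET_RAW is required\n");
        else
            fprintf(stderr, "packet socket: %s\n", strerror(saved));
        errno = saved;
    }
    return fd;
}
```

A caller that wants every protocol would pass ETH_P_ALL, for example
open_packet_socket(SOCK_RAW, ETH_P_ALL).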

SOCK_RAW packets are passed to and from the device driver without any
changes in the packet data. When receiving a packet, the address is
still parsed and passed in a standard sockaddr_ll address structure.
When transmitting a packet, the user-supplied buffer should contain the
physical-layer header. That packet is then queued unmodified to the
network driver of the interface defined by the destination address.
Some device drivers always add other headers. SOCK_RAW is similar to
but not compatible with the obsolete AF_INET/SOCK_PACKET of Linux 2.0.
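
Because a SOCK_RAW sender supplies the physical-layer header itself, the
buffer has to be assembled by hand. The sketch below assumes an Ethernet
(DIX) link and uses struct ether_header from <net/ethernet.h>; the
function name build_eth_frame() is hypothetical.

```c
#include <arpa/inet.h>      /* htons() */
#include <net/ethernet.h>   /* struct ether_header, ETH_ALEN, ETHERTYPE_* */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write an Ethernet header followed by the payload
 * into buf. ethertype is in host byte order. Returns the total frame
 * length, or 0 if buf is too small.
 */
size_t build_eth_frame(unsigned char *buf, size_t buflen,
                       const unsigned char dst[ETH_ALEN],
                       const unsigned char src[ETH_ALEN],
                       unsigned short ethertype,
                       const void *payload, size_t payload_len)
{
    struct ether_header eh;

    if (buflen < sizeof(eh) || payload_len > buflen - sizeof(eh))
        return 0;

    memcpy(eh.ether_dhost, dst, ETH_ALEN);
    memcpy(eh.ether_shost, src, ETH_ALEN);
    eh.ether_type = htons(ethertype);

    memcpy(buf, &eh, sizeof(eh));
    memcpy(buf + sizeof(eh), payload, payload_len);
    return sizeof(eh) + payload_len;
}
```

The resulting buffer can then be passed to sendto(2) on a SOCK_RAW packet
socket together with a sockaddr_ll that names the outgoing interface.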

SOCK_DGRAM operates on a slightly higher level. The physical header is
removed before the packet is passed to the user. Packets sent through
a SOCK_DGRAM packet socket get a suitable physical-layer header based
on the information in the sockaddr_ll destination address before they
are queued.

By default, all packets of the specified protocol type are passed to a
packet socket. To get packets only from a specific interface use
bind(2) specifying an address in a struct sockaddr_ll to bind the
packet socket to an interface. Fields used for binding are sll_family
(should be AF_PACKET), sll_protocol, and sll_ifindex.
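
The binding step can be sketched as follows. Only the three fields named
above are set; everything else is zeroed. The helper names
fill_bind_addr() and bind_to_interface() are hypothetical, and the
interface index is looked up with if_nametoindex(3).

```c
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_P_* */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/if.h>           /* if_nametoindex() */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill the fields that bind(2) looks at. */
void fill_bind_addr(struct sockaddr_ll *sll, int ifindex,
                    unsigned short eth_proto)
{
    memset(sll, 0, sizeof(*sll));
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(eth_proto);   /* network byte order */
    sll->sll_ifindex = ifindex;
}

/* Hypothetical helper: bind fd to the interface called ifname. */
int bind_to_interface(int fd, const char *ifname, unsigned short eth_proto)
{
    struct sockaddr_ll sll;
    unsigned int ifindex = if_nametoindex(ifname);

    if (ifindex == 0)
        return -1;              /* unknown interface name */

    fill_bind_addr(&sll, (int)ifindex, eth_proto);
    return bind(fd, (struct sockaddr *)&sll, sizeof(sll));
}
```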

The connect(2) operation is not supported on packet sockets.

When the MSG_TRUNC flag is passed to recvmsg(2), recv(2), or
recvfrom(2), the real length of the packet on the wire is always
returned, even when it is longer than the buffer.
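
One way to use this behavior is to compare the returned length with the
buffer size to detect truncation. The following is a sketch only;
recv_packet() is a hypothetical wrapper name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: receive one packet with MSG_TRUNC.
 * Returns the length of the packet on the wire (which may exceed buflen),
 * or -1 on error. *truncated is set to 1 if the packet did not fit.
 */
ssize_t recv_packet(int fd, void *buf, size_t buflen, int *truncated)
{
    ssize_t wire_len = recv(fd, buf, buflen, MSG_TRUNC);

    if (wire_len >= 0 && truncated != NULL)
        *truncated = (size_t)wire_len > buflen;
    return wire_len;
}
```

Only the first buflen bytes are valid in buf when *truncated is set.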

Address types
The sockaddr_ll structure is a device-independent physical-layer
address.

    struct sockaddr_ll {
        unsigned short sll_family;   /* Always AF_PACKET */
        unsigned short sll_protocol; /* Physical-layer protocol */
        int            sll_ifindex;  /* Interface number */
        unsigned short sll_hatype;   /* ARP hardware type */
        unsigned char  sll_pkttype;  /* Packet type */
        unsigned char  sll_halen;    /* Length of address */
        unsigned char  sll_addr[8];  /* Physical-layer address */
    };

The fields of this structure are as follows:

* sll_protocol is the standard ethernet protocol type in network byte
  order as defined in the <linux/if_ether.h> include file. It
  defaults to the socket's protocol.

* sll_ifindex is the interface index of the interface (see
  netdevice(7)); 0 matches any interface (only permitted for binding).

* sll_hatype is an ARP type as defined in the <linux/if_arp.h> include
  file.

* sll_pkttype contains the packet type. Valid types are PACKET_HOST
  for a packet addressed to the local host, PACKET_BROADCAST for a
  physical-layer broadcast packet, PACKET_MULTICAST for a packet sent
  to a physical-layer multicast address, PACKET_OTHERHOST for a packet
  to some other host that has been caught by a device driver in
  promiscuous mode, and PACKET_OUTGOING for a packet originating from the
  local host that is looped back to a packet socket. These types make
  sense only for receiving.

* sll_addr and sll_halen contain the physical-layer (e.g., IEEE 802.3)
  address and its length. The exact interpretation depends on the
  device.

When you send packets, it is enough to specify sll_family, sll_addr,
sll_halen, sll_ifindex, and sll_protocol. The other fields should be
0. sll_hatype and sll_pkttype are set on received packets for your
information.
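
A sketch of the sending case follows: it fills exactly the five fields
listed above and leaves the rest zero. The helper names fill_send_addr()
and send_cooked() are hypothetical; send_cooked() assumes a SOCK_DGRAM
packet socket, so the kernel builds the link-level header from the
address.

```c
#include <arpa/inet.h>        /* htons() */
#include <linux/if_ether.h>   /* ETH_P_* */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: prepare a destination address for sending.
 * Returns 0, or -1 if the hardware address does not fit in sll_addr.
 */
int fill_send_addr(struct sockaddr_ll *sll, int ifindex,
                   unsigned short eth_proto,
                   const unsigned char *hwaddr, size_t halen)
{
    if (halen > sizeof(sll->sll_addr))
        return -1;

    memset(sll, 0, sizeof(*sll));   /* sll_hatype, sll_pkttype stay 0 */
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(eth_proto);
    sll->sll_ifindex = ifindex;
    sll->sll_halen = (unsigned char)halen;
    memcpy(sll->sll_addr, hwaddr, halen);
    return 0;
}

/* Hypothetical helper: send a payload through a SOCK_DGRAM packet socket. */
ssize_t send_cooked(int fd, const struct sockaddr_ll *dst,
                    const void *payload, size_t len)
{
    return sendto(fd, payload, len, 0,
                  (const struct sockaddr *)dst, sizeof(*dst));
}
```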

Socket options
Packet socket options are configured by calling setsockopt(2) with
level SOL_PACKET.

PACKET_ADD_MEMBERSHIP
PACKET_DROP_MEMBERSHIP
    Packet sockets can be used to configure physical-layer
    multicasting and promiscuous mode. PACKET_ADD_MEMBERSHIP adds a
    binding and PACKET_DROP_MEMBERSHIP drops it. They both expect a
    packet_mreq structure as argument:

        struct packet_mreq {
            int            mr_ifindex;    /* interface index */
            unsigned short mr_type;       /* action */
            unsigned short mr_alen;       /* address length */
            unsigned char  mr_address[8]; /* physical-layer address */
        };

    mr_ifindex contains the interface index for the interface whose
    status should be changed. The mr_type field specifies which
    action to perform. PACKET_MR_PROMISC enables receiving all
    packets on a shared medium (often known as "promiscuous mode"),
    PACKET_MR_MULTICAST binds the socket to the physical-layer
    multicast group specified in mr_address and mr_alen, and
    PACKET_MR_ALLMULTI sets the socket up to receive all multicast
    packets arriving at the interface.

    In addition, the traditional ioctls SIOCSIFFLAGS, SIOCADDMULTI,
    and SIOCDELMULTI can be used for the same purpose.
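
The membership options might be used along the following lines. This is
a sketch; fill_mreq() and set_membership() are hypothetical helper names.
For PACKET_MR_PROMISC and PACKET_MR_ALLMULTI no address is needed, so
addr may be NULL with alen 0.

```c
#include <linux/if_packet.h>  /* struct packet_mreq, PACKET_MR_* */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: prepare a packet_mreq.
 * Returns 0, or -1 if the address does not fit in mr_address.
 */
int fill_mreq(struct packet_mreq *mreq, int ifindex, unsigned short type,
              const unsigned char *addr, size_t alen)
{
    if (alen > sizeof(mreq->mr_address))
        return -1;

    memset(mreq, 0, sizeof(*mreq));
    mreq->mr_ifindex = ifindex;
    mreq->mr_type = type;
    mreq->mr_alen = (unsigned short)alen;
    if (alen > 0)
        memcpy(mreq->mr_address, addr, alen);
    return 0;
}

/* Hypothetical helper: add (add != 0) or drop (add == 0) a membership. */
int set_membership(int fd, const struct packet_mreq *mreq, int add)
{
    return setsockopt(fd, SOL_PACKET,
                      add ? PACKET_ADD_MEMBERSHIP : PACKET_DROP_MEMBERSHIP,
                      mreq, sizeof(*mreq));
}
```

For example, promiscuous mode on interface index 2 would be requested
with fill_mreq(&mreq, 2, PACKET_MR_PROMISC, NULL, 0) followed by
set_membership(fd, &mreq, 1).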

PACKET_AUXDATA (since Linux 2.6.21)
    If this binary option is enabled, the packet socket passes a
    metadata structure along with each packet in the recvmsg(2)
    control field. The structure can be read with cmsg(3). It is
    defined as

        struct tpacket_auxdata {
            __u32 tp_status;
            __u32 tp_len;      /* packet length */
            __u32 tp_snaplen;  /* captured length */
            __u16 tp_mac;
            __u16 tp_net;
            __u16 tp_vlan_tci;
            __u16 tp_vlan_tpid; /* Since Linux 3.14; earlier, these
                                   were unused padding bytes */
        };
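
Reading the structure with the cmsg(3) macros could look like the sketch
below. The helper names enable_auxdata() and find_auxdata() are
hypothetical; find_auxdata() expects a msghdr that has already been
filled in by recvmsg(2).

```c
#include <linux/if_packet.h>  /* struct tpacket_auxdata, PACKET_AUXDATA */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: switch the binary option on. */
int enable_auxdata(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &on, sizeof(on));
}

/*
 * Hypothetical helper: search the control messages of msg for a
 * PACKET_AUXDATA entry. Returns 1 and copies it to *out if found,
 * 0 otherwise.
 */
int find_auxdata(struct msghdr *msg, struct tpacket_auxdata *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_PACKET &&
            c->cmsg_type == PACKET_AUXDATA &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* CMSG_DATA() may be unaligned for the struct; copy it out */
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```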

PACKET_FANOUT (since Linux 3.1)
    To scale processing across threads, packet sockets can form a
    fanout group. In this mode, each matching packet is enqueued
    onto only one socket in the group. A socket joins a fanout
    group by calling setsockopt(2) with level SOL_PACKET and option
    PACKET_FANOUT. Each network namespace can have up to 65536
    independent groups. A socket selects a group by encoding the ID
    in the first 16 bits of the integer option value. The first
    packet socket to join a group implicitly creates it. To
    successfully join an existing group, subsequent packet sockets must
    have the same protocol, device settings, fanout mode and flags
    (see below). Packet sockets can leave a fanout group only by
    closing the socket. The group is deleted when the last socket
    is closed.

    Fanout supports multiple algorithms to spread traffic between
    sockets, as follows:

    * The default mode, PACKET_FANOUT_HASH, sends packets from the
      same flow to the same socket to maintain per-flow ordering.
      For each packet, it chooses a socket by taking the packet
      flow hash modulo the number of sockets in the group, where a
      flow hash is a hash over network-layer address and optional
      transport-layer port fields.

    * The load-balance mode PACKET_FANOUT_LB implements a
      round-robin algorithm.

    * PACKET_FANOUT_CPU selects the socket based on the CPU that
      the packet arrived on.

    * PACKET_FANOUT_ROLLOVER processes all data on a single socket,
      moving to the next when one becomes backlogged.

    * PACKET_FANOUT_RND selects the socket using a pseudo-random
      number generator.

    * PACKET_FANOUT_QM (available since Linux 3.14) selects the
      socket using the recorded queue_mapping of the received skb.

    Fanout modes can take additional options. IP fragmentation
    causes packets from the same flow to have different flow hashes.
    The flag PACKET_FANOUT_FLAG_DEFRAG, if set, causes packets to be
    defragmented before fanout is applied, to preserve order even in
    this case. Fanout mode and options are communicated in the
    second 16 bits of the integer option value. The flag
    PACKET_FANOUT_FLAG_ROLLOVER enables the roll over mechanism as a
    backup strategy: if the original fanout algorithm selects a
    backlogged socket, the packet rolls over to the next available
    one.
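
The encoding of the option value described above (group ID in the low 16
bits, mode and flags in the high 16 bits) can be sketched as follows.
fanout_arg() and join_fanout() are hypothetical helper names.

```c
#include <linux/if_packet.h>  /* PACKET_FANOUT, PACKET_FANOUT_* */
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: pack group ID, mode, and flags into one value. */
uint32_t fanout_arg(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    return (uint32_t)group_id | ((uint32_t)(mode | flags) << 16);
}

/* Hypothetical helper: join (or implicitly create) a fanout group. */
int join_fanout(int fd, uint16_t group_id, uint16_t mode, uint16_t flags)
{
    uint32_t arg = fanout_arg(group_id, mode, flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

Each worker thread would typically open its own packet socket and call,
for example, join_fanout(fd, 42, PACKET_FANOUT_HASH,
PACKET_FANOUT_FLAG_DEFRAG) with the same arguments.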

PACKET_LOSS (with PACKET_TX_RING)
    When a malformed packet is encountered on a transmit ring, the
    default is to reset its tp_status to TP_STATUS_WRONG_FORMAT and
    abort the transmission immediately. The malformed packet blocks
    itself and subsequently enqueued packets from being sent. The
    format error must be fixed, the associated tp_status reset to
    TP_STATUS_SEND_REQUEST, and the transmission process restarted
    via send(2). However, if PACKET_LOSS is set, any malformed
    packet will be skipped, its tp_status reset to
    TP_STATUS_AVAILABLE, and the transmission process continued.

PACKET_RESERVE (with PACKET_RX_RING)
    By default, a packet receive ring writes packets immediately
    following the metadata structure and alignment padding. This
    integer option reserves additional headroom.

PACKET_RX_RING
    Create a memory-mapped ring buffer for asynchronous packet
    reception. The packet socket reserves a contiguous region of
    application address space, lays it out into an array of packet
    slots and copies packets (up to tp_snaplen) into subsequent
    slots. Each packet is preceded by a metadata structure similar
    to tpacket_auxdata. The protocol fields encode the offset to
    the data from the start of the metadata header. tp_net stores
    the offset to the network layer. If the packet socket is of
    type SOCK_DGRAM, then tp_mac is the same. If it is of type
    SOCK_RAW, then that field stores the offset to the link-layer
    frame. Packet socket and application communicate the head and
    tail of the ring through the tp_status field. The packet socket
    owns all slots with tp_status equal to TP_STATUS_KERNEL. After
    filling a slot, it changes the status of the slot to transfer
    ownership to the application. During normal operation, the new
    tp_status value has at least the TP_STATUS_USER bit set to
    signal that a received packet has been stored. When the
    application has finished processing a packet, it transfers ownership of
    the slot back to the socket by setting tp_status equal to
    TP_STATUS_KERNEL.

    Packet sockets implement multiple variants of the packet ring.
    The implementation details are described in
    Documentation/networking/packet_mmap.txt in the Linux kernel source tree.
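
The ownership handshake on a single slot can be sketched as below. This
assumes the default TPACKET_V1 variant, whose per-slot metadata header is
struct tpacket_hdr, and leaves out ring setup and mmap(2). The helper
names are hypothetical, and __sync_synchronize() is a GCC/Clang builtin
used here as a conservative memory barrier.

```c
#include <linux/if_packet.h>  /* struct tpacket_hdr, TP_STATUS_* */

/* Hypothetical helper: has the socket handed this slot to user space? */
int rx_slot_ready(const volatile struct tpacket_hdr *hdr)
{
    return (hdr->tp_status & TP_STATUS_USER) != 0;
}

/*
 * Hypothetical helper: start of the captured data. tp_mac is an offset
 * from the start of the metadata header; for SOCK_DGRAM sockets it
 * equals tp_net, for SOCK_RAW it points at the link-layer frame.
 * hdr->tp_snaplen bytes are valid from the returned pointer.
 */
const unsigned char *rx_slot_data(const struct tpacket_hdr *hdr)
{
    return (const unsigned char *)hdr + hdr->tp_mac;
}

/* Hypothetical helper: give the slot back to the packet socket. */
void rx_slot_release(volatile struct tpacket_hdr *hdr)
{
    __sync_synchronize();       /* finish reading before handing back */
    hdr->tp_status = TP_STATUS_KERNEL;
}
```

A receive loop would wait (for example with poll(2)) until
rx_slot_ready() is true for the current slot, process the data, call
rx_slot_release(), and advance to the next slot.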

PACKET_STATISTICS
    Retrieve packet socket statistics in the form of a structure

        struct tpacket_stats {
            unsigned int tp_packets; /* Total packet count */
            unsigned int tp_drops;   /* Dropped packet count */
        };

    Receiving statistics resets the internal counters. The
    statistics structure differs when using a ring of variant TPACKET_V3.
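
Retrieving the counters might look like the following sketch, which
assumes a socket that does not use a TPACKET_V3 ring.
read_packet_stats() is a hypothetical helper name.

```c
#include <linux/if_packet.h>  /* struct tpacket_stats, PACKET_STATISTICS */
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch the statistics for fd into *st.
 * Returns 0 on success, -1 with errno set on failure. Note that a
 * successful call resets the kernel's counters.
 */
int read_packet_stats(int fd, struct tpacket_stats *st)
{
    socklen_t len = sizeof(*st);

    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, st, &len);
}
```

Because each call resets the counters, a monitoring loop would add the
returned values to its own running totals.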

PACKET_TIMESTAMP (with PACKET_RX_RING; since Linux 2.6.36)
    The packet receive ring always stores a timestamp in the
    metadata header. By default, this is a software timestamp
    generated when the packet is copied into the ring. This integer
    option selects the type of timestamp. Besides the default, it
    supports the two hardware formats described in
    Documentation/networking/timestamping.txt in the Linux kernel source tree.

PACKET_TX_RING (since Linux 2.6.31)
    Create a memory-mapped ring buffer for packet transmission.
    This option is similar to PACKET_RX_RING and takes the same
    arguments. The application writes packets into slots with
    tp_status equal to TP_STATUS_AVAILABLE and schedules them for
    transmission by changing tp_status to TP_STATUS_SEND_REQUEST.
    When packets are ready to be transmitted, the application calls
    send(2) or a variant thereof. The buf and len fields of this
    call are ignored. If an address is passed using sendto(2) or
    sendmsg(2), then that overrides the socket default. On
    successful transmission, the socket resets tp_status to
    TP_STATUS_AVAILABLE. It immediately aborts the transmission on error
    unless PACKET_LOSS is set.
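
The transmit-side handshake can be sketched in the same spirit. This
assumes the TPACKET_V1 variant and, as an assumption taken from the
kernel selftest psock_tpacket.c, that packet data in a V1 transmit slot
starts at offset TPACKET_HDRLEN - sizeof(struct sockaddr_ll). The helper
names are hypothetical, and __sync_synchronize() is a GCC/Clang builtin.

```c
#include <linux/if_packet.h>  /* struct tpacket_hdr, TPACKET_HDRLEN */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: copy one packet into a free transmit slot and
 * mark it for sending. frame_size is the size of the whole slot.
 * Returns 0, or -1 if the slot is busy or the packet does not fit.
 */
int tx_slot_queue(struct tpacket_hdr *hdr, size_t frame_size,
                  const void *pkt, size_t len)
{
    /* Assumed V1 data offset; see the lead-in above. */
    size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (hdr->tp_status != TP_STATUS_AVAILABLE)
        return -1;
    if (frame_size < off || len > frame_size - off)
        return -1;

    memcpy((unsigned char *)hdr + off, pkt, len);
    hdr->tp_len = (unsigned int)len;
    __sync_synchronize();       /* publish data before the status flip */
    hdr->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}

/* Hypothetical helper: ask the socket to transmit all queued slots. */
int tx_flush(int fd)
{
    /* buf and len are ignored for a transmit ring */
    return send(fd, NULL, 0, 0) < 0 ? -1 : 0;
}
```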

PACKET_VERSION (with PACKET_RX_RING; since Linux 2.6.27)
    By default, PACKET_RX_RING creates a packet receive ring of
    variant TPACKET_V1. To create another variant, configure the
    desired variant by setting this integer option before creating
    the ring.
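
Selecting the variant is a single setsockopt(2) call that must come
before the ring is created. A minimal sketch, with the hypothetical
helper name set_ring_version():

```c
#include <linux/if_packet.h>  /* PACKET_VERSION, TPACKET_V1/V2/V3 */
#include <sys/socket.h>

/*
 * Hypothetical helper: choose the ring variant (TPACKET_V1, TPACKET_V2,
 * or TPACKET_V3). Call this before setting PACKET_RX_RING.
 */
int set_ring_version(int fd, int version)
{
    return setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                      &version, sizeof(version));
}
```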

PACKET_QDISC_BYPASS (since Linux 3.14)
    By default, packets sent through packet sockets pass through the
    kernel's qdisc (traffic control) layer, which is fine for the
    vast majority of use cases. For traffic generator appliances
    using packet sockets that intend to brute-force flood the
    network (for example, to test devices under load in a similar
    fashion to pktgen), this layer can be bypassed by setting this
    integer option to 1. A side effect is that packet buffering in the
    qdisc layer is avoided, which will lead to increased drops when
    network device transmit queues are busy; therefore, use at your
    own risk.

Ioctls
SIOCGSTAMP can be used to receive the timestamp of the last received
packet. The argument is a struct timeval variable.
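
As a sketch, the ioctl could be wrapped like this. last_packet_time() is
a hypothetical helper name; the example assumes that SIOCGSTAMP is
provided by <linux/sockios.h>, which is where recent kernel headers
define it.

```c
#include <linux/sockios.h>  /* SIOCGSTAMP (assumed location) */
#include <sys/ioctl.h>
#include <sys/time.h>       /* struct timeval */

/*
 * Hypothetical helper: store the timestamp of the last packet received
 * on fd in *tv. Returns 0 on success, -1 with errno set on failure.
 */
int last_packet_time(int fd, struct timeval *tv)
{
    return ioctl(fd, SIOCGSTAMP, tv);
}
```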

In addition, all standard ioctls defined in netdevice(7) and socket(7)
are valid on packet sockets.

Error handling
Packet sockets do no error handling other than for errors that occur
while passing the packet to the device driver. They don't have the
concept of a pending error.

ERRORS
EADDRNOTAVAIL
    Unknown multicast group address passed.

EFAULT User passed invalid memory address.

EINVAL Invalid argument.

EMSGSIZE
    Packet is bigger than interface MTU.

ENETDOWN
    Interface is not up.

ENOBUFS
    Not enough memory to allocate the packet.

ENODEV Unknown device name or interface index specified in interface
    address.

ENOENT No packet received.

ENOTCONN
    No interface address passed.

ENXIO Interface address contained an invalid interface index.

EPERM User has insufficient privileges to carry out this operation.

In addition, other errors may be generated by the low-level driver.

VERSIONS
AF_PACKET is a new feature in Linux 2.2. Earlier Linux versions
supported only SOCK_PACKET.

NOTES
For portable programs it is suggested to use AF_PACKET via pcap(3),
although this covers only a subset of the AF_PACKET features.

The SOCK_DGRAM packet sockets make no attempt to create or parse the
IEEE 802.2 LLC header for an IEEE 802.3 frame. When ETH_P_802_3 is
specified as protocol for sending, the kernel creates the 802.3 frame
and fills out the length field; the user has to supply the LLC header
to get a fully conforming packet. Incoming 802.3 packets are not
multiplexed on the DSAP/SSAP protocol fields; instead they are supplied to
the user as protocol ETH_P_802_2 with the LLC header prefixed. It is
thus not possible to bind to ETH_P_802_3; bind to ETH_P_802_2 instead
and do the protocol multiplex yourself. The default for sending is the
standard Ethernet DIX encapsulation with the protocol filled in.

Packet sockets are not subject to the input or output firewall chains.

Compatibility
In Linux 2.0, the only way to get a packet socket was with the call:

    socket(AF_INET, SOCK_PACKET, protocol)

This is still supported, but deprecated and strongly discouraged. The
main difference between the two methods is that SOCK_PACKET uses the
old struct sockaddr_pkt to specify an interface, which doesn't provide
physical-layer independence.

    struct sockaddr_pkt {
        unsigned short spkt_family;
        unsigned char  spkt_device[14];
        unsigned short spkt_protocol;
    };

spkt_family contains the device type, spkt_protocol is the IEEE 802.3
protocol type as defined in <sys/if_ether.h> and spkt_device is the
device name as a null-terminated string, for example, eth0.

This structure is obsolete and should not be used in new code.

BUGS
The IEEE 802.2/802.3 LLC handling could be considered as a bug.

Socket filters are not documented.

The MSG_TRUNC recvmsg(2) extension is an ugly hack and should be
replaced by a control message. There is currently no way to get the
original destination address of packets via SOCK_DGRAM.

SEE ALSO
socket(2), pcap(3), capabilities(7), ip(7), raw(7), socket(7)

RFC 894 for the standard IP Ethernet encapsulation. RFC 1700 for the
IEEE 802.3 IP encapsulation.

The <linux/if_ether.h> include file for physical-layer protocols.

The Linux kernel source tree. /Documentation/networking/filter.txt
describes how to apply Berkeley Packet Filters to packet sockets.
/tools/testing/selftests/net/psock_tpacket.c contains example source
code for all available versions of PACKET_RX_RING and PACKET_TX_RING.

COLOPHON
This page is part of release 5.07 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux 2020-02-09 PACKET(7)