packet(7) Miscellaneous Information Manual packet(7)

NAME
packet - packet interface on device level

SYNOPSIS
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/ethernet.h> /* the L2 protocols */

packet_socket = socket(AF_PACKET, int socket_type, int protocol);

DESCRIPTION
Packet sockets are used to receive or send raw packets at the device
driver (OSI Layer 2) level. They allow the user to implement protocol
modules in user space on top of the physical layer.

The socket_type is either SOCK_RAW for raw packets including the
link-level header or SOCK_DGRAM for cooked packets with the link-level
header removed. The link-level header information is available in a
common format in a sockaddr_ll structure. protocol is the IEEE 802.3
protocol number in network byte order. See the <linux/if_ether.h>
include file for a list of allowed protocols. When protocol is set to
htons(ETH_P_ALL), then all protocols are received. All incoming
packets of that protocol type will be passed to the packet socket
before they are passed to the protocols implemented in the kernel. If
protocol is set to zero, no packets are received. bind(2) can
optionally be called with a nonzero sll_protocol to start receiving
packets for the protocols specified.
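
As a minimal sketch of the call described above, the helper below opens
a packet socket for a single protocol. The name open_packet_socket is
hypothetical and not part of any API; the helper only wraps socket(2)
and converts the protocol number to network byte order.

```c
#include <arpa/inet.h>       /* htons */
#include <assert.h>
#include <linux/if_ether.h>  /* ETH_P_ALL, ETH_P_IP, ... */
#include <linux/if_packet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: open a packet socket for one protocol.
 * socket_type is SOCK_RAW or SOCK_DGRAM; ethertype is given in host
 * byte order (for example ETH_P_ALL) and converted here.
 * Returns a descriptor, or -1 with errno set by socket(2). */
static int open_packet_socket(int socket_type, uint16_t ethertype)
{
    assert(socket_type == SOCK_RAW || socket_type == SOCK_DGRAM);
    return socket(AF_PACKET, socket_type, htons(ethertype));
}
```

For example, open_packet_socket(SOCK_RAW, ETH_P_ALL) asks for every
incoming frame with its link-level header, while passing 0 as the
protocol creates a socket that receives nothing until it is bound.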

In order to create a packet socket, a process must have the CAP_NET_RAW
capability in the user namespace that governs its network namespace.

SOCK_RAW packets are passed to and from the device driver without any
changes in the packet data. When receiving a packet, the address is
still parsed and passed in a standard sockaddr_ll address structure.
When transmitting a packet, the user-supplied buffer should contain the
physical-layer header. That packet is then queued unmodified to the
network driver of the interface defined by the destination address.
Some device drivers always add other headers. SOCK_RAW is similar to
but not compatible with the obsolete AF_INET/SOCK_PACKET of Linux 2.0.

SOCK_DGRAM operates on a slightly higher level. The physical header is
removed before the packet is passed to the user. Packets sent through
a SOCK_DGRAM packet socket get a suitable physical-layer header based
on the information in the sockaddr_ll destination address before they
are queued.

By default, all packets of the specified protocol type are passed to a
packet socket. To get packets only from a specific interface, use
bind(2), specifying an address in a struct sockaddr_ll, to bind the
packet socket to that interface. The fields used for binding are
sll_family (should be AF_PACKET), sll_protocol, and sll_ifindex.
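
The binding step might look like the following sketch. The helper names
are hypothetical; the interface index is assumed to come from elsewhere,
for example if_nametoindex(3).

```c
#include <arpa/inet.h>       /* htons */
#include <assert.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in the three fields that bind(2) uses. */
static struct sockaddr_ll make_bind_address(int ifindex, uint16_t ethertype)
{
    struct sockaddr_ll sll;

    assert(ifindex >= 0);                 /* 0 matches any interface */
    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ethertype);  /* network byte order */
    sll.sll_ifindex  = ifindex;
    return sll;
}

/* Bind an existing packet socket to one interface and one protocol.
 * Returns 0 on success, -1 with errno set on failure. */
static int bind_packet_socket(int fd, int ifindex, uint16_t ethertype)
{
    struct sockaddr_ll sll = make_bind_address(ifindex, ethertype);

    return bind(fd, (struct sockaddr *)&sll, sizeof(sll));
}
```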

The connect(2) operation is not supported on packet sockets.

When the MSG_TRUNC flag is passed to recvmsg(2), recv(2), or
recvfrom(2), the real length of the packet on the wire is always
returned, even when it is longer than the buffer.
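
A sketch of how an application might use this to detect a buffer that
was too small. The helper name recv_packet is hypothetical; it works on
any datagram-style descriptor, not only packet sockets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one packet into buf and report whether
 * it was cut short. With MSG_TRUNC in the flags, recv(2) returns the
 * full packet length rather than the number of bytes stored.
 * Returns the real length, or -1 with errno set by recv(2). */
static ssize_t recv_packet(int fd, void *buf, size_t cap, bool *truncated)
{
    ssize_t wire_len;

    assert(buf != NULL && truncated != NULL);
    wire_len = recv(fd, buf, cap, MSG_TRUNC);
    if (wire_len < 0)
        return -1;
    *truncated = (size_t)wire_len > cap;
    return wire_len;
}
```

When truncated is set, only the first cap bytes of the packet are in
buf; the remainder has been discarded.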

Address types
The sockaddr_ll structure is a device-independent physical-layer
address.

struct sockaddr_ll {
    unsigned short sll_family;   /* Always AF_PACKET */
    unsigned short sll_protocol; /* Physical-layer protocol */
    int            sll_ifindex;  /* Interface number */
    unsigned short sll_hatype;   /* ARP hardware type */
    unsigned char  sll_pkttype;  /* Packet type */
    unsigned char  sll_halen;    /* Length of address */
    unsigned char  sll_addr[8];  /* Physical-layer address */
};

The fields of this structure are as follows:

sll_protocol
    is the standard Ethernet protocol type in network byte order as
    defined in the <linux/if_ether.h> include file. It defaults to
    the socket's protocol.

sll_ifindex
    is the interface index of the interface (see netdevice(7)); 0
    matches any interface (only permitted for binding).

sll_hatype
    is an ARP type as defined in the <linux/if_arp.h> include file.

sll_pkttype
    contains the packet type. Valid types are PACKET_HOST for a
    packet addressed to the local host, PACKET_BROADCAST for a
    physical-layer broadcast packet, PACKET_MULTICAST for a packet
    sent to a physical-layer multicast address, PACKET_OTHERHOST for
    a packet to some other host that has been caught by a device
    driver in promiscuous mode, and PACKET_OUTGOING for a packet
    originating from the local host that is looped back to a packet
    socket. These types make sense only for receiving.

sll_addr
sll_halen
    contain the physical-layer (e.g., IEEE 802.3) address and its
    length. The exact interpretation depends on the device.

When you send packets, it is enough to specify sll_family, sll_addr,
sll_halen, sll_ifindex, and sll_protocol. The other fields should be
0. sll_hatype and sll_pkttype are set on received packets for your
information.
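
As an illustration of the sending case, the sketch below builds a
destination address for an Ethernet interface. The helper name
make_send_address is hypothetical, and a 6-byte Ethernet address is
assumed; other link types use other address lengths.

```c
#include <arpa/inet.h>       /* htons */
#include <assert.h>
#include <linux/if_ether.h>  /* ETH_ALEN, ETH_P_IP */
#include <linux/if_packet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

static_assert(ETH_ALEN <= sizeof(((struct sockaddr_ll *)0)->sll_addr),
              "an Ethernet address must fit in sll_addr");

/* Hypothetical helper: destination address for sendto(2) on an
 * Ethernet interface. Only the five fields needed for sending are
 * set; sll_hatype and sll_pkttype stay 0. */
static struct sockaddr_ll make_send_address(int ifindex, uint16_t ethertype,
                                            const unsigned char mac[ETH_ALEN])
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ethertype);
    sll.sll_ifindex  = ifindex;
    sll.sll_halen    = ETH_ALEN;
    memcpy(sll.sll_addr, mac, ETH_ALEN);
    return sll;
}
```

The result can be passed to sendto(2) as the destination, cast to
struct sockaddr * with a length of sizeof(struct sockaddr_ll).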

Socket options
Packet socket options are configured by calling setsockopt(2) with
level SOL_PACKET.

PACKET_ADD_MEMBERSHIP
PACKET_DROP_MEMBERSHIP
    Packet sockets can be used to configure physical-layer
    multicasting and promiscuous mode. PACKET_ADD_MEMBERSHIP adds a
    binding and PACKET_DROP_MEMBERSHIP drops it. They both expect a
    packet_mreq structure as argument:

    struct packet_mreq {
        int            mr_ifindex;    /* interface index */
        unsigned short mr_type;       /* action */
        unsigned short mr_alen;       /* address length */
        unsigned char  mr_address[8]; /* physical-layer address */
    };

    mr_ifindex contains the interface index for the interface whose
    status should be changed. The mr_type field specifies which
    action to perform. PACKET_MR_PROMISC enables receiving all
    packets on a shared medium (often known as "promiscuous mode"),
    PACKET_MR_MULTICAST binds the socket to the physical-layer
    multicast group specified in mr_address and mr_alen, and
    PACKET_MR_ALLMULTI sets the socket up to receive all multicast
    packets arriving at the interface.

    In addition, the traditional ioctls SIOCSIFFLAGS, SIOCADDMULTI,
    SIOCDELMULTI can be used for the same purpose.
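
A minimal sketch of toggling promiscuous mode through these options.
The helper name set_promiscuous is hypothetical.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: add (enable != 0) or drop a promiscuous-mode
 * membership for one interface on an existing packet socket.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_promiscuous(int fd, int ifindex, int enable)
{
    struct packet_mreq mreq;

    assert(ifindex > 0);                 /* a specific interface */
    memset(&mreq, 0, sizeof(mreq));      /* mr_alen, mr_address unused */
    mreq.mr_ifindex = ifindex;
    mreq.mr_type    = PACKET_MR_PROMISC;

    return setsockopt(fd, SOL_PACKET,
                      enable ? PACKET_ADD_MEMBERSHIP
                             : PACKET_DROP_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```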

PACKET_AUXDATA (since Linux 2.6.21)
    If this binary option is enabled, the packet socket passes a
    metadata structure along with each packet in the recvmsg(2)
    control field. The structure can be read with cmsg(3). It is
    defined as

    struct tpacket_auxdata {
        __u32 tp_status;
        __u32 tp_len;      /* packet length */
        __u32 tp_snaplen;  /* captured length */
        __u16 tp_mac;
        __u16 tp_net;
        __u16 tp_vlan_tci;
        __u16 tp_vlan_tpid; /* Since Linux 3.14; earlier, these
                               were unused padding bytes */
    };
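
Reading the structure with the cmsg(3) macros could be sketched as
follows. Both helper names are hypothetical; the caller is assumed to
have supplied a control buffer of at least
CMSG_SPACE(sizeof(struct tpacket_auxdata)) bytes to recvmsg(2).

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: turn the PACKET_AUXDATA option on. */
static int enable_auxdata(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &on, sizeof(on));
}

/* Hypothetical helper: scan the control messages filled in by
 * recvmsg(2) and copy out the tpacket_auxdata, if one is present.
 * Returns 1 if found, 0 otherwise. */
static int find_auxdata(struct msghdr *msg, struct tpacket_auxdata *out)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL && out != NULL);
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_PACKET &&
            cmsg->cmsg_type == PACKET_AUXDATA &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```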

PACKET_FANOUT (since Linux 3.1)
    To scale processing across threads, packet sockets can form a
    fanout group. In this mode, each matching packet is enqueued
    onto only one socket in the group. A socket joins a fanout
    group by calling setsockopt(2) with level SOL_PACKET and option
    PACKET_FANOUT. Each network namespace can have up to 65536
    independent groups. A socket selects a group by encoding the ID
    in the first 16 bits of the integer option value. The first
    packet socket to join a group implicitly creates it. To
    successfully join an existing group, subsequent packet sockets
    must have the same protocol, device settings, fanout mode, and
    flags (see below). Packet sockets can leave a fanout group only
    by closing the socket. The group is deleted when the last
    socket is closed.

    Fanout supports multiple algorithms to spread traffic between
    sockets, as follows:

    • The default mode, PACKET_FANOUT_HASH, sends packets from the
      same flow to the same socket to maintain per-flow ordering.
      For each packet, it chooses a socket by taking the packet
      flow hash modulo the number of sockets in the group, where a
      flow hash is a hash over network-layer address and optional
      transport-layer port fields.

    • The load-balance mode PACKET_FANOUT_LB implements a
      round-robin algorithm.

    • PACKET_FANOUT_CPU selects the socket based on the CPU that
      the packet arrived on.

    • PACKET_FANOUT_ROLLOVER processes all data on a single socket,
      moving to the next when one becomes backlogged.

    • PACKET_FANOUT_RND selects the socket using a pseudo-random
      number generator.

    • PACKET_FANOUT_QM (available since Linux 3.14) selects the
      socket using the recorded queue_mapping of the received skb.

    Fanout modes can take additional options. IP fragmentation
    causes packets from the same flow to have different flow hashes.
    The flag PACKET_FANOUT_FLAG_DEFRAG, if set, causes packets to be
    defragmented before fanout is applied, to preserve order even in
    this case. Fanout mode and options are communicated in the
    second 16 bits of the integer option value. The flag
    PACKET_FANOUT_FLAG_ROLLOVER enables the roll over mechanism as a
    backup strategy: if the original fanout algorithm selects a
    backlogged socket, the packet rolls over to the next available
    one.
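
The packing of the option value can be sketched as below. The helper
names fanout_arg and join_fanout are hypothetical. The sketch assumes
that "first 16 bits" means the low-order 16 bits of the integer, with
the mode and flags together in the high-order 16 bits.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stdint.h>
#include <sys/socket.h>

static_assert(sizeof(uint32_t) == sizeof(int),
              "the option value is passed as an int-sized integer");

/* Hypothetical helper: pack a fanout request into the option value.
 * The group ID fills the low 16 bits; the mode and any
 * PACKET_FANOUT_FLAG_* bits share the high 16 bits. */
static uint32_t fanout_arg(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    return (uint32_t)group_id | ((uint32_t)(mode | flags) << 16);
}

/* Join (or implicitly create) a fanout group on a packet socket.
 * Returns 0 on success, -1 with errno set on failure. */
static int join_fanout(int fd, uint16_t group_id, uint16_t mode,
                       uint16_t flags)
{
    uint32_t arg = fanout_arg(group_id, mode, flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

Each worker thread would typically open its own packet socket and call
join_fanout() with the same group ID, mode, and flags.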

PACKET_LOSS (with PACKET_TX_RING)
    When a malformed packet is encountered on a transmit ring, the
    default is to reset its tp_status to TP_STATUS_WRONG_FORMAT and
    abort the transmission immediately. The malformed packet blocks
    itself and subsequently enqueued packets from being sent. The
    format error must be fixed, the associated tp_status reset to
    TP_STATUS_SEND_REQUEST, and the transmission process restarted
    via send(2). However, if PACKET_LOSS is set, any malformed
    packet will be skipped, its tp_status reset to
    TP_STATUS_AVAILABLE, and the transmission process continued.

PACKET_RESERVE (with PACKET_RX_RING)
    By default, a packet receive ring writes packets immediately
    following the metadata structure and alignment padding. This
    integer option reserves additional headroom.

PACKET_RX_RING
    Create a memory-mapped ring buffer for asynchronous packet
    reception. The packet socket reserves a contiguous region of
    application address space, lays it out into an array of packet
    slots and copies packets (up to tp_snaplen) into subsequent
    slots. Each packet is preceded by a metadata structure similar
    to tpacket_auxdata. The protocol fields encode the offset to
    the data from the start of the metadata header. tp_net stores
    the offset to the network layer. If the packet socket is of
    type SOCK_DGRAM, then tp_mac is the same. If it is of type
    SOCK_RAW, then that field stores the offset to the link-layer
    frame. Packet socket and application communicate the head and
    tail of the ring through the tp_status field. The packet socket
    owns all slots with tp_status equal to TP_STATUS_KERNEL. After
    filling a slot, it changes the status of the slot to transfer
    ownership to the application. During normal operation, the new
    tp_status value has at least the TP_STATUS_USER bit set to
    signal that a received packet has been stored. When the
    application has finished processing a packet, it transfers
    ownership of the slot back to the socket by setting tp_status
    equal to TP_STATUS_KERNEL.

    Packet sockets implement multiple variants of the packet ring.
    The implementation details are described in
    Documentation/networking/packet_mmap.rst in the Linux kernel
    source tree.
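
The ownership handshake on a single slot might be sketched as follows,
for the default TPACKET_V1 variant only. The helper names are
hypothetical, and creating and mapping the ring is omitted; slot is
assumed to point at the start of one slot inside the mapped region.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stdatomic.h>
#include <stddef.h>

/* Returns a pointer to the captured bytes if the slot belongs to the
 * application, or NULL if the packet socket still owns it. The data
 * is located through tp_mac, so for SOCK_RAW it starts at the
 * link-layer frame. *len receives the captured length. */
static const unsigned char *rx_slot_data(const struct tpacket_hdr *slot,
                                         size_t *len)
{
    assert(slot != NULL && len != NULL);
    if ((slot->tp_status & TP_STATUS_USER) == 0)
        return NULL;
    *len = slot->tp_snaplen;
    return (const unsigned char *)slot + slot->tp_mac;
}

/* Hand the slot back to the packet socket once processing is done. */
static void rx_slot_release(struct tpacket_hdr *slot)
{
    /* Finish all accesses to the slot before giving it back. */
    atomic_thread_fence(memory_order_seq_cst);
    slot->tp_status = TP_STATUS_KERNEL;
}
```

A receive loop would call rx_slot_data() on the current slot, wait with
poll(2) when it returns NULL, and advance to the next slot after
rx_slot_release().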

PACKET_STATISTICS
    Retrieve packet socket statistics in the form of a structure

    struct tpacket_stats {
        unsigned int tp_packets; /* Total packet count */
        unsigned int tp_drops;   /* Dropped packet count */
    };

    Receiving statistics resets the internal counters. The
    statistics structure differs when using a ring of variant
    TPACKET_V3.

PACKET_TIMESTAMP (with PACKET_RX_RING; since Linux 2.6.36)
    The packet receive ring always stores a timestamp in the
    metadata header. By default, this is a software timestamp
    generated when the packet is copied into the ring. This integer
    option selects the type of timestamp. Besides the default, it
    supports the two hardware formats described in
    Documentation/networking/timestamping.rst in the Linux kernel
    source tree.

PACKET_TX_RING (since Linux 2.6.31)
    Create a memory-mapped ring buffer for packet transmission.
    This option is similar to PACKET_RX_RING and takes the same
    arguments. The application writes packets into slots with
    tp_status equal to TP_STATUS_AVAILABLE and schedules them for
    transmission by changing tp_status to TP_STATUS_SEND_REQUEST.
    When packets are ready to be transmitted, the application calls
    send(2) or a variant thereof. The buf and len fields of this
    call are ignored. If an address is passed using sendto(2) or
    sendmsg(2), then that overrides the socket default. On
    successful transmission, the socket resets tp_status to
    TP_STATUS_AVAILABLE. It immediately aborts the transmission on
    error unless PACKET_LOSS is set.
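
Queuing one frame on a transmit ring could be sketched as follows, for
the default TPACKET_V1 variant only. The helper names and the
TX_DATA_OFFSET macro are hypothetical. The sketch assumes that, in this
variant, the frame bytes start right after the aligned tpacket_hdr;
ring creation and mapping are omitted.

```c
#include <assert.h>
#include <linux/if_packet.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed TPACKET_V1 layout: data follows the aligned header. */
#define TX_DATA_OFFSET (TPACKET_HDRLEN - sizeof(struct sockaddr_ll))

/* Copy one frame into a free slot and mark it for transmission.
 * slot_size is the size of one ring slot in bytes.
 * Returns 0 on success, -1 if the slot is busy or the frame is too
 * large for the slot. */
static int tx_slot_queue(struct tpacket_hdr *slot, size_t slot_size,
                         const void *frame, size_t len)
{
    assert(slot != NULL && frame != NULL);
    if (slot->tp_status != TP_STATUS_AVAILABLE)
        return -1;
    if (slot_size < TX_DATA_OFFSET || len > slot_size - TX_DATA_OFFSET)
        return -1;

    memcpy((unsigned char *)slot + TX_DATA_OFFSET, frame, len);
    slot->tp_len = (unsigned int)len;
    /* Make the data visible before the status change. */
    atomic_thread_fence(memory_order_release);
    slot->tp_status = TP_STATUS_SEND_REQUEST;
    return 0;
}

/* Ask the packet socket to send every slot that is marked
 * TP_STATUS_SEND_REQUEST; buf and len are ignored for a ring. */
static ssize_t tx_ring_flush(int fd)
{
    return send(fd, NULL, 0, 0);
}
```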

PACKET_VERSION (with PACKET_RX_RING; since Linux 2.6.27)
    By default, PACKET_RX_RING creates a packet receive ring of
    variant TPACKET_V1. To create another variant, configure the
    desired variant by setting this integer option before creating
    the ring.

PACKET_QDISC_BYPASS (since Linux 3.14)
    By default, packets sent through packet sockets pass through the
    kernel's qdisc (traffic control) layer, which is fine for the
    vast majority of use cases. For traffic generator appliances
    using packet sockets that intend to brute-force flood the
    network (for example, to test devices under load in a similar
    fashion to pktgen), this layer can be bypassed by setting this
    integer option to 1. A side effect is that packet buffering in
    the qdisc layer is avoided, which will lead to increased drops
    when network device transmit queues are busy; therefore, use at
    your own risk.

Ioctls
SIOCGSTAMP can be used to receive the timestamp of the last received
packet. The argument is a struct timeval variable.

In addition, all standard ioctls defined in netdevice(7) and socket(7)
are valid on packet sockets.

Error handling
Packet sockets do no error handling other than for errors that occur
while passing the packet to the device driver. They don't have the
concept of a pending error.

ERRORS
EADDRNOTAVAIL
       Unknown multicast group address passed.

EFAULT User passed invalid memory address.

EINVAL Invalid argument.

EMSGSIZE
       Packet is bigger than interface MTU.

ENETDOWN
       Interface is not up.

ENOBUFS
       Not enough memory to allocate the packet.

ENODEV Unknown device name or interface index specified in interface
       address.

ENOENT No packet received.

ENOTCONN
       No interface address passed.

ENXIO  Interface address contained an invalid interface index.

EPERM  User has insufficient privileges to carry out this operation.

In addition, other errors may be generated by the low-level driver.

VERSIONS
AF_PACKET is a new feature in Linux 2.2. Earlier Linux versions
supported only SOCK_PACKET.

NOTES
For portable programs, it is suggested to use AF_PACKET via pcap(3),
although this covers only a subset of the AF_PACKET features.

The SOCK_DGRAM packet sockets make no attempt to create or parse the
IEEE 802.2 LLC header for an IEEE 802.3 frame. When ETH_P_802_3 is
specified as protocol for sending, the kernel creates the 802.3 frame
and fills out the length field; the user has to supply the LLC header
to get a fully conforming packet. Incoming 802.3 packets are not
multiplexed on the DSAP/SSAP protocol fields; instead they are supplied
to the user as protocol ETH_P_802_2 with the LLC header prefixed. It is
thus not possible to bind to ETH_P_802_3; bind to ETH_P_802_2 instead
and do the protocol multiplex yourself. The default for sending is the
standard Ethernet DIX encapsulation with the protocol filled in.

Packet sockets are not subject to the input or output firewall chains.

Compatibility
In Linux 2.0, the only way to get a packet socket was with the call:

    socket(AF_INET, SOCK_PACKET, protocol)

This is still supported, but deprecated and strongly discouraged. The
main difference between the two methods is that SOCK_PACKET uses the
old struct sockaddr_pkt to specify an interface, which doesn't provide
physical-layer independence.

struct sockaddr_pkt {
    unsigned short spkt_family;
    unsigned char  spkt_device[14];
    unsigned short spkt_protocol;
};

spkt_family contains the device type, spkt_protocol is the IEEE 802.3
protocol type as defined in <sys/if_ether.h> and spkt_device is the
device name as a null-terminated string, for example, eth0.

This structure is obsolete and should not be used in new code.

BUGS
LLC header handling
The IEEE 802.2/802.3 LLC handling could be considered a bug.

MSG_TRUNC issues
The MSG_TRUNC recvmsg(2) extension is an ugly hack and should be
replaced by a control message. There is currently no way to get the
original destination address of packets via SOCK_DGRAM.

spkt_device device name truncation
The spkt_device field of sockaddr_pkt has a size of 14 bytes, which is
less than the constant IFNAMSIZ defined in <net/if.h> which is 16 bytes
and describes the system limit for a network interface name. This
means the names of network devices longer than 14 bytes will be
truncated to fit into spkt_device. All these lengths include the
terminating null byte ('\0').

Issues from this with old code typically show up with very long
interface names used by the Predictable Network Interface Names feature
enabled by default in many modern Linux distributions.

The preferred solution is to rewrite code to avoid SOCK_PACKET.
Possible user solutions are to disable Predictable Network Interface
Names or to rename the interface to a name of at most 13 bytes, for
example using the ip(8) tool.

Documentation issues
Socket filters are not documented.

SEE ALSO
socket(2), pcap(3), capabilities(7), ip(7), raw(7), socket(7), ip(8)

RFC 894 for the standard IP Ethernet encapsulation. RFC 1700 for the
IEEE 802.3 IP encapsulation.

The <linux/if_ether.h> include file for physical-layer protocols.

The Linux kernel source tree. Documentation/networking/filter.rst
describes how to apply Berkeley Packet Filters to packet sockets.
tools/testing/selftests/net/psock_tpacket.c contains example source
code for all available versions of PACKET_RX_RING and PACKET_TX_RING.

Linux man-pages 6.04 2023-02-05 packet(7)