PACKET(7)                  Linux Programmer's Manual                 PACKET(7)

NAME

       packet - packet interface on device level

SYNOPSIS

       #include <sys/socket.h>
       #include <linux/if_packet.h>
       #include <net/ethernet.h> /* the L2 protocols */

       packet_socket = socket(AF_PACKET, int socket_type, int protocol);

DESCRIPTION

       Packet sockets are used to receive or send raw packets at the device
       driver (OSI Layer 2) level.  They allow the user to implement protocol
       modules in user space on top of the physical layer.

       The socket_type is either SOCK_RAW for raw packets including the
       link-level header or SOCK_DGRAM for cooked packets with the link-level
       header removed.  The link-level header information is available in a
       common format in a sockaddr_ll structure.  protocol is the IEEE 802.3
       protocol number in network byte order.  See the <linux/if_ether.h>
       include file for a list of allowed protocols.  When protocol is set to
       htons(ETH_P_ALL), then all protocols are received.  All incoming
       packets of that protocol type will be passed to the packet socket
       before they are passed to the protocols implemented in the kernel.

       In order to create a packet socket, a process must have the CAP_NET_RAW
       capability in the user namespace that governs its network namespace.
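
       A minimal sketch of opening a packet socket, assuming glibc headers;
       the helper name open_packet_socket is hypothetical.  It converts the
       protocol to network byte order with htons(3) and, when the caller
       lacks CAP_NET_RAW, reports the EPERM failure while preserving errno.

```c
#include <arpa/inet.h>    /* htons() */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/ethernet.h> /* ETH_P_ALL and the other L2 protocols */

/* Hypothetical helper: open a packet socket of socket_type (SOCK_RAW or
 * SOCK_DGRAM) for one L2 protocol, or ETH_P_ALL for every protocol.
 * Returns the descriptor, or -1 with errno set by socket(2). */
int open_packet_socket(int socket_type, unsigned short proto)
{
    /* The protocol argument must be in network byte order. */
    int fd = socket(AF_PACKET, socket_type, htons(proto));

    if (fd == -1) {
        int saved = errno;          /* fprintf() may clobber errno */
        if (saved == EPERM)
            fprintf(stderr, "packet socket: CAP_NET_RAW is required\n");
        errno = saved;
    }
    return fd;
}
```

       For example, open_packet_socket(SOCK_RAW, ETH_P_ALL) receives every
       protocol with the link-level header intact.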

       SOCK_RAW packets are passed to and from the device driver without any
       changes in the packet data.  When receiving a packet, the address is
       still parsed and passed in a standard sockaddr_ll address structure.
       When transmitting a packet, the user-supplied buffer should contain the
       physical-layer header.  That packet is then queued unmodified to the
       network driver of the interface defined by the destination address.
       Some device drivers always add other headers.  SOCK_RAW is similar to
       but not compatible with the obsolete AF_INET/SOCK_PACKET of Linux 2.0.

       SOCK_DGRAM operates on a slightly higher level.  The physical header is
       removed before the packet is passed to the user.  Packets sent through
       a SOCK_DGRAM packet socket get a suitable physical-layer header based
       on the information in the sockaddr_ll destination address before they
       are queued.

       By default, all packets of the specified protocol type are passed to a
       packet socket.  To get packets only from a specific interface, use
       bind(2) specifying an address in a struct sockaddr_ll to bind the
       packet socket to an interface.  Fields used for binding are sll_family
       (should be AF_PACKET), sll_protocol, and sll_ifindex.
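
       The binding step can be sketched as follows; make_bind_addr and
       bind_to_ifindex are hypothetical helper names.  Only the three fields
       used for binding are filled in and everything else is zeroed.  The
       interface index would typically come from if_nametoindex(3), and 0
       matches any interface.

```c
#include <arpa/inet.h>   /* htons() */
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>

/* Hypothetical helper: build the address used to bind a packet socket
 * to one interface.  Only sll_family, sll_protocol and sll_ifindex are
 * used for binding, so everything else stays zero. */
struct sockaddr_ll make_bind_addr(int ifindex, unsigned short proto)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(proto);  /* network byte order */
    sll.sll_ifindex  = ifindex;       /* 0 matches any interface */
    return sll;
}

/* Hypothetical helper: restrict fd to packets from ifindex.
 * Returns 0, or -1 with errno set by bind(2). */
int bind_to_ifindex(int fd, int ifindex, unsigned short proto)
{
    struct sockaddr_ll sll = make_bind_addr(ifindex, proto);

    return bind(fd, (struct sockaddr *)&sll, sizeof(sll));
}
```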

       The connect(2) operation is not supported on packet sockets.

       When the MSG_TRUNC flag is passed to recvmsg(2), recv(2), or
       recvfrom(2), the real length of the packet on the wire is always
       returned, even when it is longer than the buffer.
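
       A short sketch of detecting truncation with MSG_TRUNC; recv_full_len
       is a hypothetical helper.  It works on any datagram socket that
       honors MSG_TRUNC on receive, so besides a packet socket it can be
       exercised with a UNIX datagram socket pair (Linux 3.4 and later).

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one datagram into buf.  With MSG_TRUNC
 * the return value is the real length of the datagram, even when only
 * len bytes were copied, so a larger value means data was cut off. */
ssize_t recv_full_len(int fd, void *buf, size_t len, bool *truncated)
{
    ssize_t n = recv(fd, buf, len, MSG_TRUNC);

    if (n >= 0)
        *truncated = (size_t)n > len;
    return n;
}
```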

   Address types
       The sockaddr_ll structure is a device-independent physical-layer
       address.

           struct sockaddr_ll {
               unsigned short sll_family;   /* Always AF_PACKET */
               unsigned short sll_protocol; /* Physical-layer protocol */
               int            sll_ifindex;  /* Interface number */
               unsigned short sll_hatype;   /* ARP hardware type */
               unsigned char  sll_pkttype;  /* Packet type */
               unsigned char  sll_halen;    /* Length of address */
               unsigned char  sll_addr[8];  /* Physical-layer address */
           };

       The fields of this structure are as follows:

       *  sll_protocol is the standard Ethernet protocol type in network byte
          order as defined in the <linux/if_ether.h> include file.  It
          defaults to the socket's protocol.

       *  sll_ifindex is the interface index of the interface (see
          netdevice(7)); 0 matches any interface (only permitted for binding).

       *  sll_hatype is an ARP type as defined in the <linux/if_arp.h> include
          file.

       *  sll_pkttype contains the packet type.  Valid types are PACKET_HOST
          for a packet addressed to the local host, PACKET_BROADCAST for a
          physical-layer broadcast packet, PACKET_MULTICAST for a packet sent
          to a physical-layer multicast address, PACKET_OTHERHOST for a packet
          to some other host that has been caught by a device driver in
          promiscuous mode, and PACKET_OUTGOING for a packet originating from
          the local host that is looped back to a packet socket.  These types
          make sense only for receiving.

       *  sll_addr and sll_halen contain the physical-layer (e.g., IEEE 802.3)
          address and its length.  The exact interpretation depends on the
          device.
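
       As an illustration, a receiver might translate sll_pkttype into a
       label for logging.  pkttype_name is a hypothetical helper and the
       labels are arbitrary; only the PACKET_* constants come from
       <linux/if_packet.h>.

```c
#include <linux/if_packet.h>

/* Hypothetical helper: name the sll_pkttype reported for a received
 * packet.  These values are meaningful only on receive. */
const char *pkttype_name(unsigned char t)
{
    switch (t) {
    case PACKET_HOST:      return "host";       /* addressed to us */
    case PACKET_BROADCAST: return "broadcast";  /* L2 broadcast */
    case PACKET_MULTICAST: return "multicast";  /* L2 multicast */
    case PACKET_OTHERHOST: return "otherhost";  /* seen in promiscuous mode */
    case PACKET_OUTGOING:  return "outgoing";   /* ours, looped back */
    default:               return "unknown";
    }
}
```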

       When you send packets, it is enough to specify sll_family, sll_addr,
       sll_halen, sll_ifindex, and sll_protocol.  The other fields should be
       0.  sll_hatype and sll_pkttype are set on received packets for your
       information.
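
       A sketch of filling in a destination for sendto(2) on an Ethernet
       SOCK_DGRAM packet socket, assuming a 6-byte (ETH_ALEN) hardware
       address; make_dest_addr and send_to_mac are hypothetical names.  Only
       the five fields listed above are set.

```c
#include <arpa/inet.h>    /* htons() */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/if_packet.h>
#include <net/ethernet.h> /* ETH_ALEN, ETH_P_* */

/* Hypothetical helper: destination for sendto(2) on an Ethernet
 * SOCK_DGRAM packet socket.  Only the fields needed for sending are
 * set; sll_hatype and sll_pkttype stay 0. */
struct sockaddr_ll make_dest_addr(int ifindex, unsigned short proto,
                                  const unsigned char mac[ETH_ALEN])
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(proto);
    sll.sll_ifindex  = ifindex;
    sll.sll_halen    = ETH_ALEN;           /* 6-byte MAC address */
    memcpy(sll.sll_addr, mac, ETH_ALEN);
    return sll;
}

/* Hypothetical helper: on SOCK_DGRAM the kernel adds the link-level
 * header built from *dst.  Returns bytes sent or -1 with errno set. */
ssize_t send_to_mac(int fd, const void *payload, size_t len,
                    const struct sockaddr_ll *dst)
{
    return sendto(fd, payload, len, 0,
                  (const struct sockaddr *)dst, sizeof(*dst));
}
```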

   Socket options
       Packet socket options are configured by calling setsockopt(2) with
       level SOL_PACKET.

       PACKET_ADD_MEMBERSHIP
       PACKET_DROP_MEMBERSHIP
              Packet sockets can be used to configure physical-layer
              multicasting and promiscuous mode.  PACKET_ADD_MEMBERSHIP adds a
              binding and PACKET_DROP_MEMBERSHIP drops it.  They both expect a
              packet_mreq structure as argument:

                  struct packet_mreq {
                      int            mr_ifindex;    /* interface index */
                      unsigned short mr_type;       /* action */
                      unsigned short mr_alen;       /* address length */
                      unsigned char  mr_address[8]; /* physical-layer address */
                  };

              mr_ifindex contains the interface index for the interface whose
              status should be changed.  The mr_type field specifies which
              action to perform.  PACKET_MR_PROMISC enables receiving all
              packets on a shared medium (often known as "promiscuous mode"),
              PACKET_MR_MULTICAST binds the socket to the physical-layer
              multicast group specified in mr_address and mr_alen, and
              PACKET_MR_ALLMULTI sets the socket up to receive all multicast
              packets arriving at the interface.

              In addition, the traditional ioctls SIOCSIFFLAGS, SIOCADDMULTI,
              and SIOCDELMULTI can be used for the same purpose.
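
              A sketch of building a packet_mreq and applying it; make_mreq
              and change_membership are hypothetical names.  The address
              fields are filled in only when one is supplied, as needed for
              PACKET_MR_MULTICAST.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: build a packet_mreq.  addr and alen are only
 * meaningful for PACKET_MR_MULTICAST; pass NULL and 0 for
 * PACKET_MR_PROMISC or PACKET_MR_ALLMULTI.  An address longer than
 * mr_address is ignored. */
struct packet_mreq make_mreq(int ifindex, unsigned short type,
                             const unsigned char *addr, unsigned short alen)
{
    struct packet_mreq mr;

    memset(&mr, 0, sizeof(mr));
    mr.mr_ifindex = ifindex;
    mr.mr_type    = type;
    if (addr != NULL && alen <= sizeof(mr.mr_address)) {
        mr.mr_alen = alen;
        memcpy(mr.mr_address, addr, alen);
    }
    return mr;
}

/* Hypothetical helper: add (join != 0) or drop the membership on fd.
 * Returns 0, or -1 with errno set by setsockopt(2). */
int change_membership(int fd, const struct packet_mreq *mr, int join)
{
    return setsockopt(fd, SOL_PACKET,
                      join ? PACKET_ADD_MEMBERSHIP : PACKET_DROP_MEMBERSHIP,
                      mr, sizeof(*mr));
}
```

              For instance, promiscuous mode on interface 2 would be
              change_membership(fd, &mr, 1) with mr built from
              make_mreq(2, PACKET_MR_PROMISC, NULL, 0).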

       PACKET_AUXDATA (since Linux 2.6.21)
              If this binary option is enabled, the packet socket passes a
              metadata structure along with each packet in the recvmsg(2)
              control field.  The structure can be read with cmsg(3).  It is
              defined as

                  struct tpacket_auxdata {
                      __u32 tp_status;
                      __u32 tp_len;      /* packet length */
                      __u32 tp_snaplen;  /* captured length */
                      __u16 tp_mac;
                      __u16 tp_net;
                      __u16 tp_vlan_tci;
                      __u16 tp_padding;
                  };
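
              A sketch of reading the metadata after recvmsg(2), assuming the
              option is enabled by setting a nonzero int; enable_auxdata and
              find_auxdata are hypothetical names.  The control messages are
              walked with the CMSG_FIRSTHDR and CMSG_NXTHDR macros from
              cmsg(3), and the payload is copied out with memcpy(3) because
              CMSG_DATA is not guaranteed to be suitably aligned.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: turn on PACKET_AUXDATA for fd. */
int enable_auxdata(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &one, sizeof(one));
}

/* Hypothetical helper: scan the control messages returned by
 * recvmsg(2) for the PACKET_AUXDATA record and copy it into *aux.
 * Returns 1 if found, 0 otherwise. */
int find_auxdata(struct msghdr *msg, struct tpacket_auxdata *aux)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_PACKET && c->cmsg_type == PACKET_AUXDATA &&
            c->cmsg_len >= CMSG_LEN(sizeof(*aux))) {
            /* CMSG_DATA may not be aligned for the struct: copy it out. */
            memcpy(aux, CMSG_DATA(c), sizeof(*aux));
            return 1;
        }
    }
    return 0;
}
```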

       PACKET_FANOUT (since Linux 3.1)
              To scale processing across threads, packet sockets can form a
              fanout group.  In this mode, each matching packet is enqueued
              onto only one socket in the group.  A socket joins a fanout
              group by calling setsockopt(2) with level SOL_PACKET and option
              PACKET_FANOUT.  Each network namespace can have up to 65536
              independent groups.  A socket selects a group by encoding the ID
              in the first (low-order) 16 bits of the integer option value.
              The first packet socket to join a group implicitly creates it.
              To successfully join an existing group, subsequent packet
              sockets must have the same protocol, device settings, fanout
              mode, and flags (see below).  Packet sockets can leave a fanout
              group only by closing the socket.  The group is deleted when the
              last socket is closed.

              Fanout supports multiple algorithms to spread traffic between
              sockets, as follows:

              *  The default mode, PACKET_FANOUT_HASH, sends packets from the
                 same flow to the same socket to maintain per-flow ordering.
                 For each packet, it chooses a socket by taking the packet
                 flow hash modulo the number of sockets in the group, where a
                 flow hash is a hash over network-layer address and optional
                 transport-layer port fields.

              *  The load-balance mode PACKET_FANOUT_LB implements a
                 round-robin algorithm.

              *  PACKET_FANOUT_CPU selects the socket based on the CPU that
                 the packet arrived on.

              *  PACKET_FANOUT_ROLLOVER processes all data on a single socket,
                 moving to the next when one becomes backlogged.

              *  PACKET_FANOUT_RND selects the socket using a pseudo-random
                 number generator.

              *  PACKET_FANOUT_QM (available since Linux 3.14) selects the
                 socket using the recorded queue_mapping of the received skb.

              Fanout modes can take additional options.  IP fragmentation
              causes packets from the same flow to have different flow hashes.
              The flag PACKET_FANOUT_FLAG_DEFRAG, if set, causes packets to be
              defragmented before fanout is applied, to preserve order even in
              this case.  Fanout mode and options are communicated in the
              second (high-order) 16 bits of the integer option value.  The
              flag PACKET_FANOUT_FLAG_ROLLOVER enables the rollover mechanism
              as a backup strategy: if the original fanout algorithm selects a
              backlogged socket, the packet rolls over to the next available
              one.
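
              The encoding of the option value can be sketched as follows,
              assuming "first" means the low-order half and "second" the
              high-order half of the 32-bit value; fanout_arg and join_fanout
              are hypothetical names.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: encode a PACKET_FANOUT option value.  The group
 * ID occupies the low-order 16 bits; the mode ORed with any
 * PACKET_FANOUT_FLAG_* flags occupies the high-order 16 bits. */
uint32_t fanout_arg(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    return (uint32_t)group_id | ((uint32_t)(mode | flags) << 16);
}

/* Hypothetical helper: join (or implicitly create) a fanout group.
 * Every member must use the same protocol, device, mode, and flags.
 * Returns 0, or -1 with errno set by setsockopt(2). */
int join_fanout(int fd, uint16_t group_id, uint16_t mode, uint16_t flags)
{
    /* An int-sized value; unsigned avoids overflow with the top bit. */
    uint32_t val = fanout_arg(group_id, mode, flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val));
}
```

              Each worker thread would open its own socket and call, for
              example, join_fanout(fd, 42, PACKET_FANOUT_HASH,
              PACKET_FANOUT_FLAG_DEFRAG) with identical arguments.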

       PACKET_LOSS (with PACKET_TX_RING)
              When a malformed packet is encountered on a transmit ring, the
              default is to reset its tp_status to TP_STATUS_WRONG_FORMAT and
              abort the transmission immediately.  The malformed packet blocks
              itself and subsequently enqueued packets from being sent.  The
              format error must be fixed, the associated tp_status reset to
              TP_STATUS_SEND_REQUEST, and the transmission process restarted
              via send(2).  However, if PACKET_LOSS is set, any malformed
              packet will be skipped, its tp_status reset to
              TP_STATUS_AVAILABLE, and the transmission process continued.

       PACKET_RESERVE (with PACKET_RX_RING)
              By default, a packet receive ring writes packets immediately
              following the metadata structure and alignment padding.  This
              integer option reserves additional headroom.

       PACKET_RX_RING
              Create a memory-mapped ring buffer for asynchronous packet
              reception.  The packet socket reserves a contiguous region of
              application address space, lays it out into an array of packet
              slots and copies packets (up to tp_snaplen) into subsequent
              slots.  Each packet is preceded by a metadata structure similar
              to tpacket_auxdata.  The offset fields (tp_mac and tp_net)
              encode the offset to the data from the start of the metadata
              header.  tp_net stores the offset to the network layer.  If the
              packet socket is of type SOCK_DGRAM, then tp_mac is the same.
              If it is of type SOCK_RAW, then that field stores the offset to
              the link-layer frame.  Packet socket and application communicate
              the head and tail of the ring through the tp_status field.  The
              packet socket owns all slots with tp_status equal to
              TP_STATUS_KERNEL.  After filling a slot, it changes the status
              of the slot to transfer ownership to the application.  During
              normal operation, the new tp_status value has at least the
              TP_STATUS_USER bit set to signal that a received packet has been
              stored.  When the application has finished processing a packet,
              it transfers ownership of the slot back to the socket by setting
              tp_status equal to TP_STATUS_KERNEL.

              Packet sockets implement multiple variants of the packet ring.
              The implementation details are described in
              Documentation/networking/packet_mmap.txt in the Linux kernel
              source tree.
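
              The slot ownership protocol described above can be sketched as
              follows.  This assumes a ring created as TPACKET_V2 (through
              PACKET_VERSION), so that each slot starts with a struct
              tpacket2_hdr, and a tp_block_size that is a multiple of
              tp_frame_size, so that frames are contiguous.  The helper names
              are hypothetical, and the GCC/Clang __atomic builtins stand in
              for whatever memory barriers a real program uses.

```c
#include <stddef.h>
#include <linux/if_packet.h>

/* Hypothetical helper: return the TPACKET_V2 slot at idx if the kernel
 * has handed it to the application (TP_STATUS_USER set), else NULL.
 * Assumes frames are contiguous, frame_size bytes apart. */
struct tpacket2_hdr *rx_slot_ready(unsigned char *ring, size_t frame_size,
                                   unsigned idx)
{
    struct tpacket2_hdr *hdr =
        (struct tpacket2_hdr *)(ring + (size_t)idx * frame_size);
    /* Acquire: only read the packet after seeing the new status. */
    unsigned status = __atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE);

    return (status & TP_STATUS_USER) ? hdr : NULL;
}

/* The captured bytes start tp_mac bytes after the slot header: the
 * link-layer frame for SOCK_RAW, the network header for SOCK_DGRAM. */
const unsigned char *rx_slot_data(const struct tpacket2_hdr *hdr)
{
    return (const unsigned char *)hdr + hdr->tp_mac;
}

/* Give the slot back to the kernel once processing is done. */
void rx_slot_release(struct tpacket2_hdr *hdr)
{
    __atomic_store_n(&hdr->tp_status, TP_STATUS_KERNEL, __ATOMIC_RELEASE);
}

/* Consume ready slots starting at *idx, at most one pass over the
 * ring, advancing *idx; returns how many slots were released.  Real
 * code would process rx_slot_data(hdr) and hdr->tp_snaplen first. */
unsigned rx_ring_drain(unsigned char *ring, size_t frame_size,
                       unsigned frame_nr, unsigned *idx)
{
    unsigned n = 0;
    struct tpacket2_hdr *hdr;

    while (n < frame_nr && (hdr = rx_slot_ready(ring, frame_size, *idx))) {
        rx_slot_release(hdr);
        *idx = (*idx + 1) % frame_nr;
        n++;
    }
    return n;
}
```

              A real consumer would typically call poll(2) on the socket when
              rx_slot_ready returns NULL, then resume at the same index.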

       PACKET_STATISTICS
              Retrieve packet socket statistics with getsockopt(2) in the form
              of a structure:

                  struct tpacket_stats {
                      unsigned int tp_packets;  /* Total packet count */
                      unsigned int tp_drops;    /* Dropped packet count */
                  };

              Receiving statistics resets the internal counters.  The
              statistics structure differs when using a ring of variant
              TPACKET_V3.
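
              Because reading the statistics resets the counters, a monitor
              that wants running totals has to accumulate them itself.  A
              sketch with hypothetical helper names, assuming no TPACKET_V3
              ring is in use (that variant returns a different structure):

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: add one statistics snapshot to a running total. */
void stats_add(struct tpacket_stats *total, const struct tpacket_stats *delta)
{
    total->tp_packets += delta->tp_packets;
    total->tp_drops   += delta->tp_drops;
}

/* Hypothetical helper: read (and thereby reset) the kernel counters and
 * fold them into *total.  Returns 0, or -1 with errno set; *total is
 * unchanged on failure.  Not valid for TPACKET_V3 rings. */
int poll_packet_stats(int fd, struct tpacket_stats *total)
{
    struct tpacket_stats delta;
    socklen_t len = sizeof(delta);

    if (getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &delta, &len) == -1)
        return -1;
    stats_add(total, &delta);
    return 0;
}
```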

       PACKET_TIMESTAMP (with PACKET_RX_RING; since Linux 2.6.36)
              The packet receive ring always stores a timestamp in the
              metadata header.  By default, this is a software timestamp taken
              when the packet is copied into the ring.  This integer option
              selects the type of timestamp.  Besides the default, it supports
              the two hardware formats described in
              Documentation/networking/timestamping.txt in the Linux kernel
              source tree.

       PACKET_TX_RING (since Linux 2.6.31)
              Create a memory-mapped ring buffer for packet transmission.
              This option is similar to PACKET_RX_RING and takes the same
              arguments.  The application writes packets into slots with
              tp_status equal to TP_STATUS_AVAILABLE and schedules them for
              transmission by changing tp_status to TP_STATUS_SEND_REQUEST.
              When packets are ready to be transmitted, the application calls
              send(2) or a variant thereof.  The buf and len fields of this
              call are ignored.  If an address is passed using sendto(2) or
              sendmsg(2), then that overrides the socket default.  On
              successful transmission, the socket resets tp_status to
              TP_STATUS_AVAILABLE.  It immediately aborts the transmission on
              error unless PACKET_LOSS is set.
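
              A sketch of the producer side, assuming a TPACKET_V2 transmit
              ring of contiguous, equally sized frames and the GCC/Clang
              __atomic builtins for ordering; tx_ring_put and tx_ring_flush
              are hypothetical names.  The offset at which frame data starts,
              TPACKET2_HDRLEN - sizeof(struct sockaddr_ll), is taken from the
              kernel's packet_mmap documentation and should be checked against
              it.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: copy one frame into the TPACKET_V2 transmit slot
 * at *idx and mark it for sending.  Assumes frame_nr contiguous frames
 * of frame_size bytes.  Returns 0, or -1 if the slot is not
 * TP_STATUS_AVAILABLE (still queued, or failed without PACKET_LOSS) or
 * the frame does not fit. */
int tx_ring_put(unsigned char *ring, size_t frame_size, unsigned frame_nr,
                unsigned *idx, const void *frame, size_t len)
{
    /* Data offset per packet_mmap.txt (an assumption of this sketch). */
    const size_t off = TPACKET2_HDRLEN - sizeof(struct sockaddr_ll);
    struct tpacket2_hdr *hdr;

    if (frame_nr == 0 || off + len > frame_size)
        return -1;
    hdr = (struct tpacket2_hdr *)(ring + (size_t)*idx * frame_size);
    if (__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) != TP_STATUS_AVAILABLE)
        return -1;
    memcpy((unsigned char *)hdr + off, frame, len);
    hdr->tp_len = (unsigned)len;
    /* Release: the frame must be visible before the status flips. */
    __atomic_store_n(&hdr->tp_status, TP_STATUS_SEND_REQUEST, __ATOMIC_RELEASE);
    *idx = (*idx + 1) % frame_nr;
    return 0;
}

/* Hypothetical helper: ask the kernel to transmit all queued slots;
 * the buf and len arguments of send(2) are ignored for a TX ring. */
int tx_ring_flush(int fd)
{
    return send(fd, NULL, 0, 0) == -1 ? -1 : 0;
}
```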

       PACKET_VERSION (with PACKET_RX_RING; since Linux 2.6.27)
              By default, PACKET_RX_RING creates a packet receive ring of
              variant TPACKET_V1.  To create another variant, configure the
              desired variant by setting this integer option before creating
              the ring.
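
              A sketch of selecting TPACKET_V2 and then creating and mapping
              the receive ring; make_ring_req and setup_rx_ring_v2 are
              hypothetical names.  The geometry rules checked by
              make_ring_req (block size a multiple of the page size, frame
              size a multiple of TPACKET_ALIGNMENT and at least
              TPACKET2_HDRLEN, tp_frame_nr equal to frames per block times
              the number of blocks) are assumptions drawn from packet_mmap.txt
              rather than from this page.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if_packet.h>

/* Hypothetical helper: fill *req for block_nr blocks of block_size
 * bytes, each split into frames of frame_size bytes.  Returns 0, or -1
 * if the geometry would be rejected. */
int make_ring_req(unsigned block_size, unsigned block_nr,
                  unsigned frame_size, struct tpacket_req *req)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || block_nr == 0 || block_size == 0 ||
        block_size % (unsigned long)page != 0 ||     /* whole pages */
        frame_size < TPACKET2_HDRLEN ||              /* room for header */
        frame_size % TPACKET_ALIGNMENT != 0 ||       /* aligned frames */
        frame_size > block_size)                     /* >= 1 per block */
        return -1;
    req->tp_block_size = block_size;
    req->tp_block_nr   = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr   = (block_size / frame_size) * block_nr;
    return 0;
}

/* Hypothetical helper: select TPACKET_V2 (this must precede ring
 * creation), create the receive ring, and map it.  Returns the
 * mapping, or MAP_FAILED with errno set. */
void *setup_rx_ring_v2(int fd, const struct tpacket_req *req)
{
    int version = TPACKET_V2;

    if (setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof(version)) == -1 ||
        setsockopt(fd, SOL_PACKET, PACKET_RX_RING, req, sizeof(*req)) == -1)
        return MAP_FAILED;
    return mmap(NULL, (size_t)req->tp_block_size * req->tp_block_nr,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```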

       PACKET_QDISC_BYPASS (since Linux 3.14)
              By default, packets sent through packet sockets pass through the
              kernel's qdisc (traffic control) layer, which is fine for the
              vast majority of use cases.  For traffic generator appliances
              using packet sockets that intend to brute-force flood the
              network (for example, to test devices under load in a similar
              fashion to pktgen), this layer can be bypassed by setting this
              integer option to 1.  A side effect is that packet buffering in
              the qdisc layer is avoided, which will lead to increased drops
              when network device transmit queues are busy; therefore, use at
              your own risk.

   Ioctls
       SIOCGSTAMP can be used to receive the timestamp of the last received
       packet.  The argument is a struct timeval variable.
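
       A sketch of retrieving the timestamp, assuming recent kernel headers
       in which SIOCGSTAMP is provided by <linux/sockios.h>; last_rx_stamp
       and recv_with_stamp are hypothetical names.  The ioctl reports the
       time of the last packet read, so it is issued right after recv(2).

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <linux/sockios.h>  /* SIOCGSTAMP on recent kernel headers */

/* Hypothetical helper: timestamp of the last packet received on fd.
 * Returns 0, or -1 with errno set (ENOENT if nothing was received). */
int last_rx_stamp(int fd, struct timeval *tv)
{
    return ioctl(fd, SIOCGSTAMP, tv);
}

/* Hypothetical helper: receive one packet, then fetch its timestamp.
 * Returns the byte count, or -1 with errno set. */
ssize_t recv_with_stamp(int fd, void *buf, size_t len, struct timeval *tv)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n >= 0 && last_rx_stamp(fd, tv) == -1)
        return -1;
    return n;
}
```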

       In addition, all standard ioctls defined in netdevice(7) and socket(7)
       are valid on packet sockets.

   Error handling
       Packet sockets do no error handling other than errors that occur while
       passing the packet to the device driver.  They don't have the concept
       of a pending error.

ERRORS

       EADDRNOTAVAIL
              Unknown multicast group address passed.

       EFAULT User passed invalid memory address.

       EINVAL Invalid argument.

       EMSGSIZE
              Packet is bigger than interface MTU.

       ENETDOWN
              Interface is not up.

       ENOBUFS
              Not enough memory to allocate the packet.

       ENODEV Unknown device name or interface index specified in interface
              address.

       ENOENT No packet received.

       ENOTCONN
              No interface address passed.

       ENXIO  Interface address contained an invalid interface index.

       EPERM  User has insufficient privileges to carry out this operation.

       In addition, other errors may be generated by the low-level driver.

VERSIONS

       AF_PACKET is a new feature in Linux 2.2.  Earlier Linux versions
       supported only SOCK_PACKET.

NOTES

       For portable programs, it is suggested to use AF_PACKET via pcap(3),
       although this covers only a subset of the AF_PACKET features.

       The SOCK_DGRAM packet sockets make no attempt to create or parse the
       IEEE 802.2 LLC header for an IEEE 802.3 frame.  When ETH_P_802_3 is
       specified as protocol for sending, the kernel creates the 802.3 frame
       and fills out the length field; the user has to supply the LLC header
       to get a fully conforming packet.  Incoming 802.3 packets are not
       multiplexed on the DSAP/SSAP protocol fields; instead they are
       supplied to the user as protocol ETH_P_802_2 with the LLC header
       prefixed.  It is thus not possible to bind to ETH_P_802_3; bind to
       ETH_P_802_2 instead and do the protocol multiplex yourself.  The
       default for sending is the standard Ethernet DIX encapsulation with
       the protocol filled in.

       Packet sockets are not subject to the input or output firewall chains.

   Compatibility
       In Linux 2.0, the only way to get a packet socket was with the call:

           socket(AF_INET, SOCK_PACKET, protocol)

       This is still supported, but deprecated and strongly discouraged.  The
       main difference between the two methods is that SOCK_PACKET uses the
       old struct sockaddr_pkt to specify an interface, which doesn't provide
       physical-layer independence.

           struct sockaddr_pkt {
               unsigned short spkt_family;
               unsigned char  spkt_device[14];
               unsigned short spkt_protocol;
           };

       spkt_family contains the device type, spkt_protocol is the IEEE 802.3
       protocol type as defined in <sys/if_ether.h>, and spkt_device is the
       device name as a null-terminated string, for example, eth0.

       This structure is obsolete and should not be used in new code.

BUGS

       The IEEE 802.2/802.3 LLC handling could be considered a bug.

       Socket filters are not documented.

       The MSG_TRUNC recvmsg(2) extension is an ugly hack and should be
       replaced by a control message.  There is currently no way to get the
       original destination address of packets via SOCK_DGRAM.

SEE ALSO

       socket(2), pcap(3), capabilities(7), ip(7), raw(7), socket(7)

       RFC 894 for the standard IP Ethernet encapsulation.  RFC 1700 for the
       IEEE 802.3 IP encapsulation.

       The <linux/if_ether.h> include file for physical-layer protocols.

       The Linux kernel source tree.  /Documentation/networking/filter.txt
       describes how to apply Berkeley Packet Filters to packet sockets.
       /tools/testing/selftests/net/psock_tpacket.c contains example source
       code for all available versions of PACKET_RX_RING and PACKET_TX_RING.

COLOPHON

       This page is part of release 4.15 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.



Linux                             2017-09-15                         PACKET(7)