1tcp(7) Miscellaneous Information Manual tcp(7)
2
3
4
6 tcp - TCP protocol
7
9 #include <sys/socket.h>
10 #include <netinet/in.h>
11 #include <netinet/tcp.h>
12
13 tcp_socket = socket(AF_INET, SOCK_STREAM, 0);
14
16 This is an implementation of the TCP protocol defined in RFC 793,
17 RFC 1122 and RFC 2001 with the NewReno and SACK extensions. It pro‐
18 vides a reliable, stream-oriented, full-duplex connection between two
19 sockets on top of ip(7), for both v4 and v6 versions. TCP guarantees
20 that the data arrives in order and retransmits lost packets. It gener‐
21 ates and checks a per-packet checksum to catch transmission errors.
22 TCP does not preserve record boundaries.
23
24 A newly created TCP socket has no remote or local address and is not
25 fully specified. To create an outgoing TCP connection use connect(2)
26 to establish a connection to another TCP socket. To receive new incom‐
27 ing connections, first bind(2) the socket to a local address and port
28 and then call listen(2) to put the socket into the listening state.
29 After that a new socket for each incoming connection can be accepted
30 using accept(2). A socket which has had accept(2) or connect(2) suc‐
31 cessfully called on it is fully specified and may transmit data. Data
32 cannot be transmitted on listening or not yet connected sockets.
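
The passive-open sequence described above (bind(2), then listen(2), then accept(2)) can be sketched as follows. This is a minimal illustration, not part of the manual page: the helper name make_listener() and its parameters are hypothetical, and it binds to all local IPv4 addresses for simplicity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to `port` on all
 * local IPv4 addresses, and put it into the listening state.
 * Returns the listening descriptor, or -1 on error (errno is set). */
int make_listener(uint16_t port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);      /* port 0 lets the kernel choose */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;  /* pass to accept(2) to get one socket per connection */
}
```

A caller would then loop on accept(2) with the returned descriptor; data is exchanged on the accepted sockets, never on the listening socket itself.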
33
34 Linux supports RFC 1323 TCP high performance extensions. These include
35 Protection Against Wrapped Sequence Numbers (PAWS), Window Scaling and
36 Timestamps. Window scaling allows the use of large (> 64 kB) TCP win‐
37 dows in order to support links with high latency or bandwidth. To make
38 use of them, the send and receive buffer sizes must be increased. They
39 can be set globally with the /proc/sys/net/ipv4/tcp_wmem and
40 /proc/sys/net/ipv4/tcp_rmem files, or on individual sockets by using
41 the SO_SNDBUF and SO_RCVBUF socket options with the setsockopt(2) call.
42
43 The maximum sizes for socket buffers declared via the SO_SNDBUF and
44 SO_RCVBUF mechanisms are limited by the values in the
45 /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max files.
46 Note that TCP actually allocates twice the size of the buffer requested
47 in the setsockopt(2) call, and so a succeeding getsockopt(2) call will
48 not return the same size of buffer as requested in the setsockopt(2)
49 call. TCP uses the extra space for administrative purposes and inter‐
50 nal kernel structures, and the /proc file values reflect the larger
51 sizes compared to the actual TCP windows. On individual connections,
52 the socket buffer size must be set prior to the listen(2) or connect(2)
53 calls in order to have it take effect. See socket(7) for more informa‐
54 tion.
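
As an illustration of the paragraph above, the following sketch requests a receive buffer size and reads back the value the kernel reports. The helper name request_rcvbuf() is hypothetical. The value returned is typically larger than the value requested, and it is subject to the rmem_max limit.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` and return
 * the size reported by the kernel afterwards, or -1 on error.
 * Call it before listen(2) or connect(2) so that it takes effect. */
int request_rcvbuf(int fd, int bytes)
{
    assert(bytes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;

    int effective = 0;
    socklen_t len = sizeof(effective);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
        return -1;
    return effective;   /* usually not equal to the requested size */
}
```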
55
56 TCP supports urgent data. Urgent data is used to signal the receiver
57 that some important message is part of the data stream and that it
58 should be processed as soon as possible. To send urgent data specify
59 the MSG_OOB option to send(2). When urgent data is received, the ker‐
60 nel sends a SIGURG signal to the process or process group that has been
61 set as the socket "owner" using the SIOCSPGRP or FIOSETOWN ioctls (or
62 the POSIX.1-specified fcntl(2) F_SETOWN operation). When the SO_OOBIN‐
63 LINE socket option is enabled, urgent data is put into the normal data
64 stream (a program can test for its location using the SIOCATMARK ioctl
65 described below), otherwise it can be received only when the MSG_OOB
66 flag is set for recv(2) or recvmsg(2).
67
68 When out-of-band data is present, select(2) indicates the file
69 descriptor as having an exceptional condition and poll(2) indicates a
70 POLLPRI event.
71
72 Linux 2.4 introduced a number of changes for improved throughput and
73 scaling, as well as enhanced functionality. Some of these features in‐
74 clude support for zero-copy sendfile(2), Explicit Congestion Notifica‐
75 tion, new management of TIME_WAIT sockets, keep-alive socket options
76 and support for Duplicate SACK extensions.
77
78 Address formats
79 TCP is built on top of IP (see ip(7)). The address formats defined by
80 ip(7) apply to TCP. TCP supports point-to-point communication only;
81 broadcasting and multicasting are not supported.
82
83 /proc interfaces
84 System-wide TCP parameter settings can be accessed by files in the di‐
85 rectory /proc/sys/net/ipv4/. In addition, most IP /proc interfaces
86 also apply to TCP; see ip(7). Variables described as Boolean take an
87 integer value, with a nonzero value ("true") meaning that the corre‐
88 sponding option is enabled, and a zero value ("false") meaning that the
89 option is disabled.
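
A program can read these files like any other text file. The sketch below is only an illustration; the helper names read_proc_int() and proc_bool_enabled() are hypothetical, and the caller supplies the full path (for example /proc/sys/net/ipv4/tcp_sack).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the first integer from a /proc file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_proc_int(const char *path, long *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* A Boolean variable is enabled when its integer value is nonzero. */
int proc_bool_enabled(long value)
{
    return value != 0;
}
```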
90
91 tcp_abc (Integer; default: 0; Linux 2.6.15 to Linux 3.8)
92 Control the Appropriate Byte Count (ABC), defined in RFC 3465.
93 ABC is a way of increasing the congestion window (cwnd) more
94 slowly in response to partial acknowledgements. Possible values
95 are:
96
97 0 increase cwnd once per acknowledgement (no ABC)
98
99 1 increase cwnd once per acknowledgement of a full-sized
100 segment
101
102 2 allow cwnd to increase by two if the acknowledgement covers
103 two segments, to compensate for delayed acknowledgements.
104
105 tcp_abort_on_overflow (Boolean; default: disabled; since Linux 2.4)
106 Enable resetting connections if the listening service is too
107 slow and unable to keep up and accept them. With the default
108 (disabled), a connection can recover if the overflow was caused
109 by a burst. Enable this option only if you are really sure that
110 the listening daemon cannot be tuned to accept connections
111 faster. Enabling this option can harm the clients of your server.
112
113 tcp_adv_win_scale (integer; default: 2; since Linux 2.4)
114 Count buffering overhead as bytes/2^tcp_adv_win_scale, if
115 tcp_adv_win_scale is greater than 0; or
116 bytes-bytes/2^(-tcp_adv_win_scale), if tcp_adv_win_scale is less
117 than or equal to zero.
118
119 The socket receive buffer space is shared between the application
120 and kernel. TCP maintains part of the buffer as the TCP window;
121 this is the size of the receive window advertised to the
122 other end. The rest of the space is used as the "application"
123 buffer, used to isolate the network from scheduling and applica‐
124 tion latencies. The tcp_adv_win_scale default value of 2 im‐
125 plies that the space used for the application buffer is one
126 fourth that of the total.
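
The two cases of the tcp_adv_win_scale formula can be written out as plain arithmetic. This is a sketch of the formula as stated above, not kernel code; the function names are hypothetical. With the value 2 and 87380 bytes of buffer space, one fourth (21845 bytes) is counted as overhead and the rest is available for the window.

```c
#include <assert.h>

/* Bytes of receive buffer space counted as buffering overhead,
 * following the two cases given for tcp_adv_win_scale. */
long adv_win_overhead(long bytes, int scale)
{
    assert(bytes >= 0 && scale > -31 && scale < 31);

    if (scale > 0)
        return bytes / (1L << scale);
    return bytes - bytes / (1L << -scale);
}

/* Space left over for the advertised TCP window. */
long adv_win_window(long bytes, int scale)
{
    return bytes - adv_win_overhead(bytes, scale);
}
```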
127
128 tcp_allowed_congestion_control (String; default: see text; since Linux
129 2.4.20)
130 Show/set the congestion control algorithm choices available to
131 unprivileged processes (see the description of the TCP_CONGES‐
132 TION socket option). The items in the list are separated by
133 white space and terminated by a newline character. The list is
134 a subset of those listed in tcp_available_congestion_control.
135 The default value for this list is "reno" plus the default set‐
136 ting of tcp_congestion_control.
137
138 tcp_autocorking (Boolean; default: enabled; since Linux 3.14)
139 If this option is enabled, the kernel tries to coalesce small
140 writes (from consecutive write(2) and sendmsg(2) calls) as much
141 as possible, in order to decrease the total number of sent pack‐
142 ets. Coalescing is done if at least one prior packet for the
143 flow is waiting in Qdisc queues or device transmit queue. Ap‐
144 plications can still use the TCP_CORK socket option to obtain
145 optimal behavior when they know how/when to uncork their sock‐
146 ets.
147
148 tcp_available_congestion_control (String; read-only; since Linux
149 2.4.20)
150 Show a list of the congestion-control algorithms that are regis‐
151 tered. The items in the list are separated by white space and
152 terminated by a newline character. This list is a limiting set
153 for the list in tcp_allowed_congestion_control. More conges‐
154 tion-control algorithms may be available as modules, but not
155 loaded.
156
157 tcp_app_win (integer; default: 31; since Linux 2.4)
158 This variable defines how many bytes of the TCP window are re‐
159 served for buffering overhead.
160
161 Up to max(window/2^tcp_app_win, mss) bytes in the window are
162 reserved for the application buffer. A value of 0 implies that
163 no amount is reserved.
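
The reservation can be sketched as below, reading the expression above as the larger of window/2^tcp_app_win and the MSS; that reading, and the function name app_win_reserved(), are assumptions made for this illustration. With the default of 31, the first term is 0 for any realistic window, so the MSS is the amount reserved.

```c
#include <assert.h>

/* Bytes of the window reserved for the application buffer:
 * max(window / 2^tcp_app_win, mss), or 0 when tcp_app_win is 0. */
unsigned long app_win_reserved(unsigned long window, int app_win,
                               unsigned long mss)
{
    assert(app_win >= 0 && app_win <= 31);

    if (app_win == 0)
        return 0;                       /* nothing is reserved */
    unsigned long long share = (unsigned long long)window >> app_win;
    return share > mss ? (unsigned long)share : mss;
}
```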
164
165 tcp_base_mss (Integer; default: 512; since Linux 2.6.17)
166 The initial value of search_low to be used by the packetization
167 layer Path MTU discovery (MTU probing). If MTU probing is en‐
168 abled, this is the initial MSS used by the connection.
169
170 tcp_bic (Boolean; default: disabled; Linux 2.4.27/2.6.6 to Linux
171 2.6.13)
172 Enable BIC TCP congestion control algorithm. BIC-TCP is a
173 sender-side-only change that ensures a linear RTT fairness under
174 large windows while offering both scalability and bounded TCP-
175 friendliness. The protocol combines two schemes called additive
176 increase and binary search increase. When the congestion window
177 is large, additive increase with a large increment ensures lin‐
178 ear RTT fairness as well as good scalability. Under small con‐
179 gestion windows, binary search increase provides TCP friendli‐
180 ness.
181
182 tcp_bic_low_window (integer; default: 14; Linux 2.4.27/2.6.6 to Linux
183 2.6.13)
184 Set the threshold window (in packets) where BIC TCP starts to
185 adjust the congestion window. Below this threshold BIC TCP be‐
186 haves the same as the default TCP Reno.
187
188 tcp_bic_fast_convergence (Boolean; default: enabled; Linux 2.4.27/2.6.6
189 to Linux 2.6.13)
190 Force BIC TCP to more quickly respond to changes in congestion
191 window. Allows two flows sharing the same connection to con‐
192 verge more rapidly.
193
194 tcp_congestion_control (String; default: see text; since Linux 2.4.13)
195 Set the default congestion-control algorithm to be used for new
196 connections. The algorithm "reno" is always available, but ad‐
197 ditional choices may be available depending on kernel configura‐
198 tion. The default value for this file is set as part of kernel
199 configuration.
200
201 tcp_dma_copybreak (integer; default: 4096; since Linux 2.6.24)
202 Lower limit, in bytes, of the size of socket reads that will be
203 offloaded to a DMA copy engine, if one is present in the system
204 and the kernel was configured with the CONFIG_NET_DMA option.
205
206 tcp_dsack (Boolean; default: enabled; since Linux 2.4)
207 Enable RFC 2883 TCP Duplicate SACK support.
208
209 tcp_fastopen (Bitmask; default: 0x1; since Linux 3.7)
210 Enables RFC 7413 Fast Open support. The flag is used as a bit‐
211 map with the following values:
212
213 0x1 Enables client side Fast Open support
214
215 0x2 Enables server side Fast Open support
216
217 0x4 Allows client side to transmit data in SYN without Fast
218 Open option
219
220 0x200 Allows server side to accept SYN data without Fast Open
221 option
222
223 0x400 Enables Fast Open on all listeners without TCP_FASTOPEN
224 socket option
225
226 tcp_fastopen_key (since Linux 3.7)
227 Set server side RFC 7413 Fast Open key to generate Fast Open
228 cookie when server side Fast Open support is enabled.
229
230 tcp_ecn (Integer; default: see below; since Linux 2.4)
231 Enable RFC 3168 Explicit Congestion Notification.
232
233 This file can have one of the following values:
234
235 0 Disable ECN. Neither initiate nor accept ECN. This was
236 the default up to and including Linux 2.6.30.
237
238 1 Enable ECN when requested by incoming connections and
239 also request ECN on outgoing connection attempts.
240
241 2 Enable ECN when requested by incoming connections, but do
242 not request ECN on outgoing connections. This value is
243 supported, and is the default, since Linux 2.6.31.
244
245 When enabled, connectivity to some destinations could be af‐
246 fected due to older, misbehaving middle boxes along the path,
247 causing connections to be dropped. However, to facilitate and
248 encourage deployment with option 1, and to work around such
249 buggy equipment, the tcp_ecn_fallback option has been intro‐
250 duced.
251
252 tcp_ecn_fallback (Boolean; default: enabled; since Linux 4.1)
253 Enable RFC 3168, Section 6.1.1.1. fallback. When enabled, out‐
254 going ECN-setup SYNs that time out within the normal SYN re‐
255 transmission timeout will be resent with CWR and ECE cleared.
256
257 tcp_fack (Boolean; default: enabled; since Linux 2.2)
258 Enable TCP Forward Acknowledgement support.
259
260 tcp_fin_timeout (integer; default: 60; since Linux 2.2)
261 This specifies how many seconds to wait for a final FIN packet
262 before the socket is forcibly closed. This is strictly a viola‐
263 tion of the TCP specification, but required to prevent denial-
264 of-service attacks. In Linux 2.2, the default value was 180.
265
266 tcp_frto (integer; default: see below; since Linux 2.4.21/2.6)
267 Enable F-RTO, an enhanced recovery algorithm for TCP retransmis‐
268 sion timeouts (RTOs). It is particularly beneficial in wireless
269 environments where packet loss is typically due to random radio
270 interference rather than intermediate router congestion. See
271 RFC 4138 for more details.
272
273 This file can have one of the following values:
274
275 0 Disabled. This was the default up to and including Linux
276 2.6.23.
277
278 1 The basic version of the F-RTO algorithm is enabled.
279
280 2 Enable SACK-enhanced F-RTO if the flow uses SACK. The basic
281 version can also be used when SACK is in use, though in
282 that case there are scenarios where F-RTO interacts badly
283 with the packet counting of the SACK-enabled TCP flow.
284 This value is the default since Linux 2.6.24.
285
286 Before Linux 2.6.22, this parameter was a Boolean value, sup‐
287 porting just values 0 and 1 above.
288
289 tcp_frto_response (integer; default: 0; since Linux 2.6.22)
290 When F-RTO has detected that a TCP retransmission timeout was
291 spurious (i.e., the timeout would have been avoided had TCP set
292 a longer retransmission timeout), TCP has several options con‐
293 cerning what to do next. Possible values are:
294
295 0 Rate halving based; a smooth and conservative response,
296 results in halved congestion window (cwnd) and slow-start
297 threshold (ssthresh) after one RTT.
298
299 1 Very conservative response; not recommended because, even
300 though it is valid, it interacts poorly with the rest of
301 Linux TCP; halves cwnd and ssthresh immediately.
302
303 2 Aggressive response; undoes congestion-control measures
304 that are now known to be unnecessary (ignoring the possi‐
305 bility of a lost retransmission that would require TCP to
306 be more cautious); cwnd and ssthresh are restored to the
307 values prior to timeout.
308
309 tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
310 The number of seconds between TCP keep-alive probes.
311
312 tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
313 The maximum number of TCP keep-alive probes to send before giv‐
314 ing up and killing the connection if no response is obtained
315 from the other end.
316
317 tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
318 The number of seconds a connection needs to be idle before TCP
319 begins sending out keep-alive probes. Keep-alives are sent only
320 when the SO_KEEPALIVE socket option is enabled. The default
321 value is 7200 seconds (2 hours). An idle connection is terminated
322 after approximately an additional 11 minutes (9 probes sent at
323 intervals of 75 seconds) when keep-alive is enabled.
324
325 Note that underlying connection tracking mechanisms and applica‐
326 tion timeouts may be much shorter.
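
The arithmetic behind the figures above can be checked with a short calculation. The function name is hypothetical. With the defaults (7200, 75, 9) the result is 7875 seconds, that is, the 2-hour idle time plus 675 seconds (about 11 minutes) of probing.

```c
#include <assert.h>

/* Seconds from the last activity until an unresponsive peer is
 * dropped: the idle time plus one interval per unanswered probe. */
long keepalive_drop_seconds(long idle, long intvl, long probes)
{
    assert(idle >= 0 && intvl >= 0 && probes >= 0);
    return idle + intvl * probes;
}
```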
327
328 tcp_low_latency (Boolean; default: disabled; since Linux 2.4.21/2.6;
329 obsolete since Linux 4.14)
330 If enabled, the TCP stack makes decisions that prefer lower
331 latency as opposed to higher throughput. If this option is
332 disabled, then higher throughput is preferred. An example of an
333 application where this default should be changed would be a Be‐
334 owulf compute cluster. Since Linux 4.14, this file still ex‐
335 ists, but its value is ignored.
336
337 tcp_max_orphans (integer; default: see below; since Linux 2.4)
338 The maximum number of orphaned (not attached to any user file
339 handle) TCP sockets allowed in the system. When this number is
340 exceeded, the orphaned connection is reset and a warning is
341 printed. This limit exists only to prevent simple denial-of-
342 service attacks. Lowering this limit is not recommended. Net‐
343 work conditions might require you to increase the number of or‐
344 phans allowed, but note that each orphan can eat up to ~64 kB of
345 unswappable memory. The default initial value is set equal to
346 the kernel parameter NR_FILE. This initial default is adjusted
347 depending on the memory in the system.
348
349 tcp_max_syn_backlog (integer; default: see below; since Linux 2.2)
350 The maximum number of queued connection requests which have
351 still not received an acknowledgement from the connecting
352 client. If this number is exceeded, the kernel will begin drop‐
353 ping requests. The default value of 256 is increased to 1024
354 when the memory present in the system is adequate or greater (>=
355 128 MB), and reduced to 128 for those systems with very low mem‐
356 ory (<= 32 MB).
357
358 Before Linux 2.6.20, it was recommended that if this needed to
359 be increased above 1024, the size of the SYNACK hash table
360 (TCP_SYNQ_HSIZE) in include/net/tcp.h should be modified to keep
361
362 TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog
363
364 and the kernel should be recompiled. In Linux 2.6.20, the fixed
365 sized TCP_SYNQ_HSIZE was removed in favor of dynamic sizing.
366
367 tcp_max_tw_buckets (integer; default: see below; since Linux 2.4)
368 The maximum number of sockets in TIME_WAIT state allowed in the
369 system. This limit exists only to prevent simple denial-of-ser‐
370 vice attacks. The default value of NR_FILE*2 is adjusted de‐
371 pending on the memory in the system. If this number is ex‐
372 ceeded, the socket is closed and a warning is printed.
373
374 tcp_moderate_rcvbuf (Boolean; default: enabled; since Linux
375 2.4.17/2.6.7)
376 If enabled, TCP performs receive buffer auto-tuning, attempting
377 to automatically size the buffer (no greater than tcp_rmem[2])
378 to match the size required by the path for full throughput.
379
380 tcp_mem (since Linux 2.4)
381 This is a vector of 3 integers: [low, pressure, high]. These
382 bounds, measured in units of the system page size, are used by
383 TCP to track its memory usage. The defaults are calculated at
384 boot time from the amount of available memory. (TCP can only
385 use low memory for this, which is limited to around 900
386 megabytes on 32-bit systems. 64-bit systems do not suffer this
387 limitation.)
388
389 low TCP doesn't regulate its memory allocation when the num‐
390 ber of pages it has allocated globally is below this num‐
391 ber.
392
393 pressure
394 When the amount of memory allocated by TCP exceeds this
395 number of pages, TCP moderates its memory consumption.
396 This memory pressure state is exited once the number of
397 pages allocated falls below the low mark.
398
399 high The maximum number of pages, globally, that TCP will al‐
400 locate. This value overrides any other limits imposed by
401 the kernel.
402
403 tcp_mtu_probing (integer; default: 0; since Linux 2.6.17)
404 This parameter controls TCP Packetization-Layer Path MTU Discov‐
405 ery. The following values may be assigned to the file:
406
407 0 Disabled
408
409 1 Disabled by default, enabled when an ICMP black hole is
410 detected
411
412 2 Always enabled, use initial MSS of tcp_base_mss.
413
414 tcp_no_metrics_save (Boolean; default: disabled; since Linux 2.6.6)
415 By default, TCP saves various connection metrics in the route
416 cache when the connection closes, so that connections estab‐
417 lished in the near future can use these to set initial condi‐
418 tions. Usually, this increases overall performance, but it may
419 sometimes cause performance degradation. If tcp_no_metrics_save
420 is enabled, TCP will not cache metrics on closing connections.
421
422 tcp_orphan_retries (integer; default: 8; since Linux 2.4)
423 The maximum number of attempts made to probe the other end of a
424 connection which has been closed by our end.
425
426 tcp_reordering (integer; default: 3; since Linux 2.4)
427 The maximum a packet can be reordered in a TCP packet stream
428 without TCP assuming packet loss and going into slow start. It
429 is not advisable to change this number. This is a packet re‐
430 ordering detection metric designed to minimize unnecessary back
431 off and retransmits provoked by reordering of packets on a con‐
432 nection.
433
434 tcp_retrans_collapse (Boolean; default: enabled; since Linux 2.2)
435 Try to send full-sized packets during retransmit.
436
437 tcp_retries1 (integer; default: 3; since Linux 2.2)
438 The number of times TCP will attempt to retransmit a packet on
439 an established connection normally, without the extra effort of
440 getting the network layers involved. Once we exceed this number
441 of retransmits, we first have the network layer update the route
442 if possible before each new retransmit. The default is the RFC
443 specified minimum of 3.
444
445 tcp_retries2 (integer; default: 15; since Linux 2.2)
446 The maximum number of times a TCP packet is retransmitted in es‐
447 tablished state before giving up. The default value is 15,
448 which corresponds to a duration of approximately 13 to 30
449 minutes, depending on the retransmission timeout. The
450 RFC 1122 specified minimum limit of 100 seconds is typically
451 deemed too short.
452
453 tcp_rfc1337 (Boolean; default: disabled; since Linux 2.2)
454 Enable TCP behavior conformant with RFC 1337. When disabled, if
455 a RST is received in TIME_WAIT state, we close the socket imme‐
456 diately without waiting for the end of the TIME_WAIT period.
457
458 tcp_rmem (since Linux 2.4)
459 This is a vector of 3 integers: [min, default, max]. These pa‐
460 rameters are used by TCP to regulate receive buffer sizes. TCP
461 dynamically adjusts the size of the receive buffer from the de‐
462 faults listed below, in the range of these values, depending on
463 memory available in the system.
464
465 min minimum size of the receive buffer used by each TCP
466 socket. The default value is the system page size. (On
467 Linux 2.4, the default value is 4 kB, lowered to
468 PAGE_SIZE bytes in low-memory systems.) This value is
469 used to ensure that in memory pressure mode, allocations
470 below this size will still succeed. This is not used to
471 bound the size of the receive buffer declared using
472 SO_RCVBUF on a socket.
473
474 default
475 the default size of the receive buffer for a TCP socket.
476 This value overwrites the initial default buffer size
477 from the generic global net.core.rmem_default defined for
478 all protocols. The default value is 87380 bytes. (On
479 Linux 2.4, this will be lowered to 43689 in low-memory
480 systems.) If larger receive buffer sizes are desired,
481 this value should be increased (to affect all sockets).
482 To employ large TCP windows, the net.ipv4.tcp_win‐
483 dow_scaling must be enabled (default).
484
485 max the maximum size of the receive buffer used by each TCP
486 socket. This value does not override the global
487 net.core.rmem_max. This is not used to limit the size of
488 the receive buffer declared using SO_RCVBUF on a socket.
489 The default value is calculated using the formula
490
491 max(87380, min(4 MB, tcp_mem[1]*PAGE_SIZE/128))
492
493 (On Linux 2.4, the default is 87380*2 bytes, lowered to
494 87380 in low-memory systems).
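
The default for the maximum can be evaluated as in the sketch below. It assumes that "4 MB" means 4*1024*1024 bytes; the function name and parameter names are hypothetical. The same shape applies to the tcp_wmem maximum, with 65536 in place of 87380.

```c
#include <assert.h>

/* Default for tcp_rmem[2]:
 * max(87380, min(4 MB, tcp_mem[1] * PAGE_SIZE / 128)).
 * Assumption: 4 MB is taken as 4 * 1024 * 1024 bytes. */
unsigned long long default_rmem_max(unsigned long long tcp_mem_pressure,
                                    unsigned long long page_size)
{
    assert(page_size > 0);

    const unsigned long long floor_bytes = 87380ULL;
    const unsigned long long cap_bytes = 4ULL * 1024 * 1024;

    unsigned long long scaled = tcp_mem_pressure * page_size / 128;
    if (scaled > cap_bytes)
        scaled = cap_bytes;             /* min(4 MB, ...) */
    return scaled > floor_bytes ? scaled : floor_bytes;
}
```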
495
496 tcp_sack (Boolean; default: enabled; since Linux 2.2)
497 Enable RFC 2018 TCP Selective Acknowledgements.
498
499 tcp_slow_start_after_idle (Boolean; default: enabled; since Linux
500 2.6.18)
501 If enabled, provide RFC 2861 behavior and time out the conges‐
502 tion window after an idle period. An idle period is defined as
503 the current RTO (retransmission timeout). If disabled, the con‐
504 gestion window will not be timed out after an idle period.
505
506 tcp_stdurg (Boolean; default: disabled; since Linux 2.2)
507 If this option is enabled, then use the RFC 1122 interpretation
508 of the TCP urgent-pointer field. According to this interpreta‐
509 tion, the urgent pointer points to the last byte of urgent data.
510 If this option is disabled, then use the BSD-compatible inter‐
511 pretation of the urgent pointer: the urgent pointer points to
512 the first byte after the urgent data. Enabling this option may
513 lead to interoperability problems.
514
515 tcp_syn_retries (integer; default: 6; since Linux 2.2)
516 The maximum number of times initial SYNs for an active TCP con‐
517 nection attempt will be retransmitted. This value should not be
518 higher than 255. The default value is 6, which corresponds to
519 retrying for up to approximately 127 seconds. Before Linux 3.7,
520 the default value was 5, which (in conjunction with calculation
521 based on other kernel parameters) corresponded to approximately
522 180 seconds.
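
The durations quoted above are consistent with a retransmission timeout that doubles after each unanswered SYN. The sketch below makes that assumption explicit and ignores any upper bound on the timeout; the function name and the initial timeout values (1 second, and 3 seconds for the older default) are assumptions chosen for illustration.

```c
#include <assert.h>

/* Total seconds spent before a connection attempt is abandoned,
 * assuming the timeout starts at `initial_rto` and doubles after
 * every unanswered SYN (the original SYN plus `retries` resends). */
long syn_give_up_seconds(int retries, long initial_rto)
{
    assert(retries >= 0 && retries < 30 && initial_rto > 0);
    return initial_rto * ((1L << (retries + 1)) - 1);
}
```

Under these assumptions, 6 retries with a 1-second initial timeout give 127 seconds, and 5 retries with a 3-second initial timeout give 189 seconds, close to the "approximately 180 seconds" stated for older kernels.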
523
524 tcp_synack_retries (integer; default: 5; since Linux 2.2)
525 The maximum number of times a SYN/ACK segment for a passive TCP
526 connection will be retransmitted. This number should not be
527 higher than 255.
528
529 tcp_syncookies (integer; default: 1; since Linux 2.2)
530 Enable TCP syncookies. The kernel must be compiled with CON‐
531 FIG_SYN_COOKIES. The syncookies feature attempts to protect a
532 socket from a SYN flood attack. This should be used as a last
533 resort, if at all. This is a violation of the TCP protocol, and
534 conflicts with other areas of TCP such as TCP extensions. It
535 can cause problems for clients and relays. It is not recom‐
536 mended as a tuning mechanism for heavily loaded servers to help
537 with overloaded or misconfigured conditions. For recommended
538 alternatives see tcp_max_syn_backlog, tcp_synack_retries, and
539 tcp_abort_on_overflow. Set to one of the following values:
540
541 0 Disable TCP syncookies.
542
543 1 Send out syncookies when the syn backlog queue of a
544 socket overflows.
545
546 2 (since Linux 3.12) Send out syncookies unconditionally.
547 This can be useful for network testing.
548
549 tcp_timestamps (integer; default: 1; since Linux 2.2)
550 Set to one of the following values to enable or disable RFC 1323
551 TCP timestamps:
552
553 0 Disable timestamps.
554
555 1 Enable timestamps as defined in RFC 1323 and use random
556 offset for each connection rather than only using the
557 current time.
558
559 2 As for the value 1, but without random offsets. Setting
560 tcp_timestamps to this value is meaningful since Linux
561 4.10.
562
563 tcp_tso_win_divisor (integer; default: 3; since Linux 2.6.9)
564 This parameter controls what percentage of the congestion window
565 can be consumed by a single TCP Segmentation Offload (TSO)
566 frame. The setting of this parameter is a tradeoff between
567 burstiness and building larger TSO frames.
568
569 tcp_tw_recycle (Boolean; default: disabled; Linux 2.4 to Linux 4.11)
570 Enable fast recycling of TIME_WAIT sockets. Enabling this op‐
571 tion is not recommended as the remote IP may not use monotoni‐
572 cally increasing timestamps (devices behind NAT, devices with
573 per-connection timestamp offsets). See RFC 1323 (PAWS) and RFC
574 6191.
575
576 tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
577 Allow reuse of TIME_WAIT sockets for new connections when it is
578 safe from the protocol viewpoint. It should not be changed without
579 the advice or request of technical experts.
580
581 tcp_vegas_cong_avoid (Boolean; default: disabled; Linux 2.2 to Linux
582 2.6.13)
583 Enable TCP Vegas congestion avoidance algorithm. TCP Vegas is a
584 sender-side-only change to TCP that anticipates the onset of
585 congestion by estimating the bandwidth. TCP Vegas adjusts the
586 sending rate by modifying the congestion window. TCP Vegas
587 should provide less packet loss, but it is not as aggressive as
588 TCP Reno.
589
590 tcp_westwood (Boolean; default: disabled; Linux 2.4.26/2.6.3 to Linux
591 2.6.13)
592 Enable TCP Westwood+ congestion control algorithm. TCP West‐
593 wood+ is a sender-side-only modification of the TCP Reno proto‐
594 col stack that optimizes the performance of TCP congestion con‐
595 trol. It is based on end-to-end bandwidth estimation to set
596 congestion window and slow start threshold after a congestion
597 episode. Using this estimation, TCP Westwood+ adaptively sets a
598 slow start threshold and a congestion window which takes into
599 account the bandwidth used at the time congestion is experi‐
600 enced. TCP Westwood+ significantly increases fairness with re‐
601 spect to TCP Reno in wired networks and throughput over wireless
602 links.
603
604 tcp_window_scaling (Boolean; default: enabled; since Linux 2.2)
605 Enable RFC 1323 TCP window scaling. This feature allows the use
606 of a large window (> 64 kB) on a TCP connection, should the
607 other end support it. Normally, the 16 bit window length field
608 in the TCP header limits the window size to less than 64 kB. If
609 larger windows are desired, applications can increase the size
610 of their socket buffers and the window scaling option will be
611 employed. If tcp_window_scaling is disabled, TCP will not nego‐
612 tiate the use of window scaling with the other end during con‐
613 nection setup.
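
To illustrate why scaling is needed, the sketch below computes the smallest shift count that lets the 16-bit window field represent a given window size. It is plain arithmetic, not the kernel's selection logic; the function name is hypothetical, and the upper limit of 14 on the shift count comes from RFC 1323.

```c
#include <assert.h>

/* Smallest shift count (0..14) such that a 16-bit window field,
 * scaled by 2^shift, can represent `window` bytes. Windows too
 * large even for a shift of 14 yield 14. */
int window_scale_needed(unsigned long window)
{
    assert(window > 0);

    int shift = 0;
    while (shift < 14 && (65535UL << shift) < window)
        shift++;
    return shift;
}
```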
614
615 tcp_wmem (since Linux 2.4)
616 This is a vector of 3 integers: [min, default, max]. These pa‐
617 rameters are used by TCP to regulate send buffer sizes. TCP dy‐
618 namically adjusts the size of the send buffer from the default
619 values listed below, in the range of these values, depending on
620 memory available.
621
622 min Minimum size of the send buffer used by each TCP socket.
623 The default value is the system page size. (On Linux
624 2.4, the default value is 4 kB.) This value is used to
625 ensure that in memory pressure mode, allocations below
626 this size will still succeed. This is not used to bound
627 the size of the send buffer declared using SO_SNDBUF on a
628 socket.
629
630 default
631 The default size of the send buffer for a TCP socket.
632 This value overwrites the initial default buffer size
633 from the generic global /proc/sys/net/core/wmem_default
634 defined for all protocols. The default value is 16 kB.
635 If larger send buffer sizes are desired, this value
636 should be increased (to affect all sockets). To employ
637 large TCP windows, the /proc/sys/net/ipv4/tcp_win‐
638 dow_scaling must be set to a nonzero value (default).
639
640 max The maximum size of the send buffer used by each TCP
641 socket. This value does not override the value in
642 /proc/sys/net/core/wmem_max. This is not used to limit
643 the size of the send buffer declared using SO_SNDBUF on a
644 socket. The default value is calculated using the for‐
645 mula
646
647 max(65536, min(4 MB, tcp_mem[1]*PAGE_SIZE/128))
648
649 (On Linux 2.4, the default value is 128 kB, lowered to 64 kB
650 on low-memory systems.)
651
652 tcp_workaround_signed_windows (Boolean; default: disabled; since Linux
653 2.6.26)
654 If enabled, assume that no receipt of a window-scaling option
655 means that the remote TCP is broken and treats the window as a
656 signed quantity. If disabled, assume that the remote TCP is not
657 broken even if we do not receive a window scaling option from
658 it.
659
660 Socket options
661 To set or get a TCP socket option, call getsockopt(2) to read or set‐
662 sockopt(2) to write the option with the option level argument set to
663 IPPROTO_TCP. Unless otherwise noted, optval is a pointer to an int.
664 In addition, most IPPROTO_IP socket options are valid on TCP sockets.
665 For more information see ip(7).
666
667 Following is a list of TCP-specific socket options. For details of
668 some other socket options that are also applicable for TCP sockets, see
669 socket(7).
670
671 TCP_CONGESTION (since Linux 2.6.13)
672 The argument for this option is a string. This option allows
673 the caller to set the TCP congestion control algorithm to be
674 used, on a per-socket basis. Unprivileged processes are re‐
675 stricted to choosing one of the algorithms in tcp_allowed_con‐
676 gestion_control (described above). Privileged processes
677 (CAP_NET_ADMIN) can choose from any of the available congestion-
678 control algorithms (see the description of tcp_available_conges‐
679 tion_control above).
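
A minimal sketch of reading and selecting the algorithm on one socket follows. The helper names and the 16-byte name buffer (CC_NAME_MAX) are assumptions made for illustration and are not part of the API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

#define CC_NAME_MAX 16   /* assumed buffer size for an algorithm name */

/* Copy the socket's congestion-control algorithm name into name.
 * Returns 0 on success, -1 with errno set on failure. */
int get_congestion(int fd, char name[CC_NAME_MAX])
{
    socklen_t len = CC_NAME_MAX;

    assert(name != NULL);
    memset(name, 0, CC_NAME_MAX);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &len) == -1)
        return -1;
    name[CC_NAME_MAX - 1] = '\0';   /* guarantee termination */
    return 0;
}

/* Select an algorithm by name (for example "reno").
 * Returns 0 on success, -1 with errno set on failure. */
int set_congestion(int fd, const char *name)
{
    assert(name != NULL);
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t) strlen(name));
}
```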
680
681 TCP_CORK (since Linux 2.2)
682 If set, don't send out partial frames. All queued partial
683 frames are sent when the option is cleared again. This is use‐
684 ful for prepending headers before calling sendfile(2), or for
685 throughput optimization. As currently implemented, there is a
686 200 millisecond ceiling on the time for which output is corked
687 by TCP_CORK. If this ceiling is reached, then queued data is
688 automatically transmitted. This option can be combined with
689 TCP_NODELAY only since Linux 2.5.71. This option should not be
690 used in code intended to be portable.
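
The header-then-body pattern described above might look like the following sketch. The helper names are hypothetical, and interrupted writes (EINTR) are treated as errors to keep the example short.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (on != 0) or clear (on == 0) TCP_CORK. */
int set_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Write all len bytes; returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1)
            return -1;
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Queue a header and a body while corked, then uncork so that any
 * remaining partial frame is sent. */
int send_corked(int fd, const char *hdr, size_t hdr_len,
                const char *body, size_t body_len)
{
    int rc = -1;

    if (set_cork(fd, 1) == -1)
        return -1;
    if (write_all(fd, hdr, hdr_len) == 0 &&
        write_all(fd, body, body_len) == 0)
        rc = 0;
    if (set_cork(fd, 0) == -1)   /* always uncork, even after an error */
        rc = -1;
    return rc;
}
```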
691
692 TCP_DEFER_ACCEPT (since Linux 2.4)
693 Allow a listener to be awakened only when data arrives on the
694 socket. Takes an integer value (seconds); this can bound the
695 maximum number of attempts TCP will make to complete the connec‐
696 tion. This option should not be used in code intended to be
697 portable.
698
699 TCP_INFO (since Linux 2.4)
700 Used to collect information about this socket. The kernel re‐
701 turns a struct tcp_info as defined in the file /usr/in‐
702 clude/linux/tcp.h. This option should not be used in code in‐
703 tended to be portable.
704
705 TCP_KEEPCNT (since Linux 2.4)
706 The maximum number of keepalive probes TCP should send before
707 dropping the connection. This option should not be used in code
708 intended to be portable.
709
710 TCP_KEEPIDLE (since Linux 2.4)
711 The time (in seconds) the connection needs to remain idle before
712 TCP starts sending keepalive probes, if the socket option
713 SO_KEEPALIVE has been set on this socket. This option should
714 not be used in code intended to be portable.
715
716 TCP_KEEPINTVL (since Linux 2.4)
717 The time (in seconds) between individual keepalive probes. This
718 option should not be used in code intended to be portable.
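
The three keepalive options are typically set together with SO_KEEPALIVE. The following sketch shows one way to do so; the function name and the example values are assumptions.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on fd: start probing after idle_s seconds of
 * idleness, probe every intvl_s seconds, and drop the connection
 * after count unanswered probes.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    assert(idle_s > 0 && intvl_s > 0 && count > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_s, sizeof(idle_s)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_s, sizeof(intvl_s)) == -1)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                      &count, sizeof(count));
}
```

For example, enable_keepalive(fd, 60, 10, 5) would be expected to give up on a silent peer after roughly 60 + 10 * 5 = 110 seconds of idleness.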
719
720 TCP_LINGER2 (since Linux 2.4)
721 The lifetime of orphaned FIN_WAIT2 state sockets. This option
722 can be used to override the system-wide setting in the file
723 /proc/sys/net/ipv4/tcp_fin_timeout for this socket. This is not
724 to be confused with the socket(7) level option SO_LINGER. This
725 option should not be used in code intended to be portable.
726
727 TCP_MAXSEG
728 The maximum segment size for outgoing TCP packets. In Linux 2.2
729 and earlier, and in Linux 2.6.28 and later, if this option is
730 set before connection establishment, it also changes the MSS
731 value announced to the other end in the initial packet. Values
732 greater than the (eventual) interface MTU have no effect. TCP
733 will also impose its minimum and maximum bounds over the value
734 provided.
735
736 TCP_NODELAY
737 If set, disable the Nagle algorithm. This means that segments
738 are always sent as soon as possible, even if there is only a
739 small amount of data. When not set, data is buffered until
740 there is a sufficient amount to send out, thereby avoiding the
741 frequent sending of small packets, which results in poor uti‐
742 lization of the network. This option is overridden by TCP_CORK;
743 however, setting this option forces an explicit flush of pending
744 output, even if TCP_CORK is currently set.
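
A minimal sketch of toggling and querying the option follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable (on != 0) or re-enable (on == 0) the Nagle algorithm. */
int set_nodelay(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Returns 1 if TCP_NODELAY is set, 0 if not, -1 on error. */
int get_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) == -1)
        return -1;
    assert(len == sizeof(value));   /* the option is a plain int */
    return value != 0;
}
```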
745
746 TCP_QUICKACK (since Linux 2.4.4)
747 Enable quickack mode if set or disable quickack mode if cleared.
748 In quickack mode, acks are sent immediately, rather than delayed
749 if needed in accordance with normal TCP operation. This flag is
750 not permanent; it only enables a switch to or from quickack
751 mode. Subsequent operation of the TCP protocol will once again
752 enter/leave quickack mode depending on internal protocol pro‐
753 cessing and factors such as delayed ack timeouts occurring and
754 data transfer. This option should not be used in code intended
755 to be portable.
756
757 TCP_SYNCNT (since Linux 2.4)
758 Set the number of SYN retransmits that TCP should send before
759 aborting the attempt to connect. It cannot exceed 255. This
760 option should not be used in code intended to be portable.
761
762 TCP_USER_TIMEOUT (since Linux 2.6.37)
763 This option takes an unsigned int as an argument. When the
764 value is greater than 0, it specifies the maximum amount of time
765 in milliseconds that transmitted data may remain unacknowledged,
766 or buffered data may remain untransmitted (due to zero window
767 size) before TCP will forcibly close the corresponding connec‐
768 tion and return ETIMEDOUT to the application. If the option
769 value is specified as 0, TCP will use the system default.
770
771 Increasing user timeouts allows a TCP connection to survive ex‐
772 tended periods without end-to-end connectivity. Decreasing user
773 timeouts allows applications to "fail fast", if so desired.
774 Otherwise, failure may take up to 20 minutes with the current
775 system defaults in a normal WAN environment.
776
777 This option can be set during any state of a TCP connection, but
778 is effective only during the synchronized states of a connection
779 (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and
780 LAST-ACK). Moreover, when used with the TCP keepalive
781 (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will override keepalive
782 to determine when to close a connection due to keepalive fail‐
783 ure.
784
785 The option has no effect on when TCP retransmits a packet, nor
786 when a keepalive probe is sent.
787
788 This option, like many others, will be inherited by the socket
789 returned by accept(2), if it was set on the listening socket.
790
791 Further details on the user timeout feature can be found in
792 RFC 793 and RFC 5482 ("TCP User Timeout Option").
793
794 TCP_WINDOW_CLAMP (since Linux 2.4)
795 Bound the size of the advertised window to this value. The ker‐
796 nel imposes a minimum size of SOCK_MIN_RCVBUF/2. This option
797 should not be used in code intended to be portable.
798
799 TCP_FASTOPEN (since Linux 3.6)
800 This option enables Fast Open (RFC 7413) on the listener socket.
801 The value specifies the maximum length of pending SYNs (similar
802 to the backlog argument in listen(2)). Once enabled, the lis‐
803 tener socket grants a TCP Fast Open cookie on an incoming SYN
804 with the TCP Fast Open option.
805
806 More importantly, it accepts the data in a SYN with a valid Fast
807 Open cookie and responds with a SYN-ACK acknowledging both the
808 data and the SYN sequence. accept(2) returns a socket that is
809 available for read and write even if the handshake has not yet
810 completed. Thus the data exchange can commence before the
811 handshake completes. This option requires enabling server-side
812 support in the net.ipv4.tcp_fastopen sysctl (see above). For TCP
813 Fast Open client-side support, see send(2) MSG_FASTOPEN or
814 TCP_FASTOPEN_CONNECT below.
815
816 TCP_FASTOPEN_CONNECT (since Linux 4.11)
817 This option enables an alternative way to perform Fast Open on
818 the active side (client). When this option is enabled, con‐
819 nect(2) behaves differently depending on whether a Fast Open
820 cookie is available for the destination.
821
822 If a cookie is not available (i.e. first contact to the destina‐
823 tion), connect(2) behaves as usual by sending a SYN immediately,
824 except that the SYN includes an empty Fast Open cookie option to
825 solicit a cookie.
826
827 If a cookie is available, connect(2) returns 0 immediately, but
828 the SYN transmission is deferred. A subsequent write(2) or
829 sendmsg(2) triggers a SYN with data plus cookie in the Fast
830 Open option. In other words, the actual connect operation is
831 deferred until data is supplied.
832
833 Note: While this option is designed for convenience, enabling it
834 does change behavior, and certain system calls might set
835 different errno values. With a cookie present, write(2) or
836 sendmsg(2) must be called right after connect(2) in order to
837 send out SYN+data, complete the three-way handshake, and
838 establish the connection. Calling read(2) right after connect(2)
839 without write(2) will cause a blocking socket to block forever.
840
841 The application should either set the TCP_FASTOPEN_CONNECT socket
842 option before write(2) or sendmsg(2), or call sendmsg(2) with the
843 MSG_FASTOPEN flag directly, but not both on the same
844 connection.
845
846 Here is the typical call flow with this new option:
847
848 s = socket();
849 setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1, ...);
850 connect(s);
851 write(s); /* write() should always follow connect()
852 * in order to trigger SYN to go out. */
853 read(s)/write(s);
854 /* ... */
855 close(s);
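
The call flow above is schematic. A compilable variant might look like the following sketch for an IPv4 destination. The function tfo_request and its parameters are hypothetical, and the fallback definition of TCP_FASTOPEN_CONNECT assumes the value 30 used by <linux/tcp.h> for C libraries whose headers do not yet define it.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30   /* assumed value, from <linux/tcp.h> */
#endif

/* Hypothetical helper: send req to ip:port using TCP_FASTOPEN_CONNECT
 * and read up to resp_len bytes of the reply.
 * Returns the number of bytes read, or -1 on error. */
ssize_t tfo_request(const char *ip, unsigned short port,
                    const void *req, size_t req_len,
                    void *resp, size_t resp_len)
{
    struct sockaddr_in addr;
    ssize_t n = -1;
    int one = 1;
    int s;

    assert(ip != NULL && req != NULL && resp != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1)
        return -1;

    if (setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT,
                   &one, sizeof(one)) == 0 &&
        connect(s, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        /* write() must follow connect() so that the SYN goes out. */
        write(s, req, req_len) == (ssize_t) req_len)
        n = read(s, resp, resp_len);

    close(s);
    return n;
}
```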
856
857 Sockets API
858 TCP provides limited support for out-of-band data, in the form of (a
859 single byte of) urgent data. In Linux this means that if the other
860 end sends newer out-of-band data, the older urgent data is inserted
861 as normal data into the stream (even when SO_OOBINLINE is not set).
862 This differs from BSD-based stacks.
863
864 Linux uses the BSD compatible interpretation of the urgent pointer
865 field by default. This violates RFC 1122, but is required for interop‐
866 erability with other stacks. It can be changed via
867 /proc/sys/net/ipv4/tcp_stdurg.
868
869 It is possible to peek at out-of-band data using the recv(2) MSG_PEEK
870 flag.
871
872 Since Linux 2.4, Linux supports the use of MSG_TRUNC in the flags argu‐
873 ment of recv(2) (and recvmsg(2)). This flag causes the received bytes
874 of data to be discarded, rather than passed back in a caller-supplied
875 buffer. Since Linux 2.4.4, MSG_TRUNC also has this effect when used in
876 conjunction with MSG_OOB to receive out-of-band data.
877
878 Ioctls
879 The following ioctl(2) calls return information in value. The correct
880 syntax is:
881
882 int value;
883 error = ioctl(tcp_socket, ioctl_type, &value);
884
885 ioctl_type is one of the following:
886
887 SIOCINQ
888 Returns the amount of queued unread data in the receive buffer.
889 The socket must not be in LISTEN state, otherwise an error (EIN‐
890 VAL) is returned. SIOCINQ is defined in <linux/sockios.h>. Al‐
891 ternatively, you can use the synonymous FIONREAD, defined in
892 <sys/ioctl.h>.
893
894 SIOCATMARK
895 Returns true (i.e., value is nonzero) if the inbound data stream
896 is at the urgent mark.
897
898 If the SO_OOBINLINE socket option is set, and SIOCATMARK returns
899 true, then the next read from the socket will return the urgent
900 data. If the SO_OOBINLINE socket option is not set, and SIOCAT‐
901 MARK returns true, then the next read from the socket will re‐
902 turn the bytes following the urgent data (actually reading the
903 urgent data requires recv(2) with the MSG_OOB flag).
904
905 Note that a read never reads across the urgent mark. If an ap‐
906 plication is informed of the presence of urgent data via se‐
907 lect(2) (using the exceptfds argument) or through delivery of a
908 SIGURG signal, then it can advance up to the mark using a loop
909 which repeatedly tests SIOCATMARK and performs a read (request‐
910 ing any number of bytes) as long as SIOCATMARK returns false.
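
The loop described above could be sketched as follows. The helper names and the 512-byte scratch buffer are assumptions; the data read before the mark is simply discarded here.

```c
#include <assert.h>
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the stream is at the urgent mark, 0 if not,
 * -1 on error. */
int at_urgent_mark(int fd)
{
    int value = 0;

    if (ioctl(fd, SIOCATMARK, &value) == -1)
        return -1;
    return value != 0;
}

/* Discard normal data until the urgent mark is reached.
 * Returns 0 at the mark, -1 on error or end of stream. */
int skip_to_urgent_mark(int fd)
{
    char buf[512];

    for (;;) {
        int at = at_urgent_mark(fd);
        if (at != 0)
            return at == 1 ? 0 : -1;

        /* A read never reads across the urgent mark. */
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n <= 0)
            return -1;
        assert((size_t) n <= sizeof(buf));
    }
}
```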
911
912 SIOCOUTQ
913 Returns the amount of unsent data in the socket send queue. The
914 socket must not be in LISTEN state, otherwise an error (EINVAL)
915 is returned. SIOCOUTQ is defined in <linux/sockios.h>. Alter‐
916 natively, you can use the synonymous TIOCOUTQ, defined in
917 <sys/ioctl.h>.
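
A minimal sketch of querying both queue sizes follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Bytes of queued unread data in the receive buffer, or -1 on
 * error (for example EINVAL on a listening socket). */
int receive_queue_bytes(int fd)
{
    int value = 0;

    if (ioctl(fd, SIOCINQ, &value) == -1)
        return -1;
    assert(value >= 0);
    return value;
}

/* Bytes reported by SIOCOUTQ for the send queue, or -1 on error. */
int send_queue_bytes(int fd)
{
    int value = 0;

    if (ioctl(fd, SIOCOUTQ, &value) == -1)
        return -1;
    assert(value >= 0);
    return value;
}
```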
918
919 Error handling
920 When a network error occurs, TCP tries to resend the packet. If it
921 doesn't succeed after some time, either ETIMEDOUT or the last received
922 error on this connection is reported.
923
924 Some applications require a quicker error notification. This can be
925 enabled with the IPPROTO_IP level IP_RECVERR socket option. When this
926 option is enabled, all incoming errors are immediately passed to the
927 user program. Use this option with care — it makes TCP less tolerant
928 to routing changes and other normal network conditions.
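
Enabling the option is a single setsockopt(2) call at the IPPROTO_IP level, as in this sketch for an AF_INET socket (the helper name is hypothetical):

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask for incoming errors to be passed to the program immediately.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_recverr(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}
```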
929
931 EAFNOTSUPPORT
932 Passed socket address type in sin_family was not AF_INET.
933
934 EPIPE The other end closed the socket unexpectedly or a read is exe‐
935 cuted on a shut down socket.
936
937 ETIMEDOUT
938 The other end didn't acknowledge retransmitted data after some
939 time.
940
941 Any errors defined for ip(7) or the generic socket layer may also be
942 returned for TCP.
943
945 Support for Explicit Congestion Notification, zero-copy sendfile(2),
946 reordering support and some SACK extensions (DSACK) were introduced in
947 Linux 2.4. Support for forward acknowledgement (FACK), TIME_WAIT recy‐
948 cling, and per-connection keepalive socket options were introduced in
949 Linux 2.3.
950
952 Not all errors are documented.
953
954 IPv6 is not described.
955
957 accept(2), bind(2), connect(2), getsockopt(2), listen(2), recvmsg(2),
958 sendfile(2), sendmsg(2), socket(2), ip(7), socket(7)
959
960 The kernel source file Documentation/networking/ip-sysctl.txt.
961
962 RFC 793 for the TCP specification.
963 RFC 1122 for the TCP requirements and a description of the Nagle algo‐
964 rithm.
965 RFC 1323 for TCP timestamp and window scaling options.
966 RFC 1337 for a description of TIME_WAIT assassination hazards.
967 RFC 3168 for a description of Explicit Congestion Notification.
968 RFC 2581 for TCP congestion control algorithms.
969 RFC 2018 and RFC 2883 for SACK and extensions to SACK.
970
971
972
973Linux man-pages 6.05 2023-07-15 tcp(7)