TCP(7)                 Linux Programmer's Manual                TCP(7)

NAME
tcp - TCP protocol

SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

tcp_socket = socket(AF_INET, SOCK_STREAM, 0);

DESCRIPTION
This is an implementation of the TCP protocol defined in RFC 793,
RFC 1122 and RFC 2001 with the NewReno and SACK extensions. It
provides a reliable, stream-oriented, full-duplex connection between
two sockets on top of ip(7), for both IPv4 and IPv6. TCP guarantees
that the data arrives in order and retransmits lost packets. It
generates and checks a per-packet checksum to catch transmission
errors. TCP does not preserve record boundaries.
23
24 A newly created TCP socket has no remote or local address and is not
25 fully specified. To create an outgoing TCP connection use connect(2)
26 to establish a connection to another TCP socket. To receive new incom‐
27 ing connections, first bind(2) the socket to a local address and port
28 and then call listen(2) to put the socket into the listening state.
29 After that a new socket for each incoming connection can be accepted
30 using accept(2). A socket which has had accept(2) or connect(2) suc‐
31 cessfully called on it is fully specified and may transmit data. Data
32 cannot be transmitted on listening or not yet connected sockets.
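
The sequence described above (bind, listen, connect, accept, then
transmit) can be sketched in C as follows. This is a minimal
illustration under stated assumptions, not a reference implementation:
the helper name loopback_roundtrip() is hypothetical, the loopback
address is assumed to be usable, and binding to port 0 is used so that
the kernel picks a free port.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send msg from a connected client socket to the
 * socket returned by accept(2), both on the loopback interface.
 * Returns the number of bytes received, or -1 on error. */
ssize_t loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    ssize_t n = -1;
    int lfd, cfd, afd = -1;

    assert(msg != NULL && out != NULL);

    lfd = socket(AF_INET, SOCK_STREAM, 0);   /* listening socket */
    cfd = socket(AF_INET, SOCK_STREAM, 0);   /* client socket */
    if (lfd < 0 || cfd < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel choose */

    /* Passive side: bind to a local address, then listen. */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (listen(lfd, 1) < 0)
        goto out;
    /* Learn which port was assigned. */
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    /* Active side: connect to the listening socket. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* accept(2) returns a new, fully specified socket. */
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    if (send(cfd, msg, strlen(msg), 0) < 0)
        goto out;
    /* TCP has no record boundaries: one recv(2) may return less than
     * was sent. Real code should loop until it has what it needs. */
    n = recv(afd, out, outlen, 0);

out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return n;
}
```

Note that data is sent only on the connected socket and on the socket
returned by accept(2); the listening socket itself never carries data.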
33
34 Linux supports RFC 1323 TCP high performance extensions. These include
35 Protection Against Wrapped Sequence Numbers (PAWS), Window Scaling and
36 Timestamps. Window scaling allows the use of large (> 64 kB) TCP win‐
37 dows in order to support links with high latency or bandwidth. To make
38 use of them, the send and receive buffer sizes must be increased. They
39 can be set globally with the /proc/sys/net/ipv4/tcp_wmem and
40 /proc/sys/net/ipv4/tcp_rmem files, or on individual sockets by using
41 the SO_SNDBUF and SO_RCVBUF socket options with the setsockopt(2) call.
42
43 The maximum sizes for socket buffers declared via the SO_SNDBUF and
44 SO_RCVBUF mechanisms are limited by the values in the
45 /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max files.
46 Note that TCP actually allocates twice the size of the buffer requested
47 in the setsockopt(2) call, and so a succeeding getsockopt(2) call will
48 not return the same size of buffer as requested in the setsockopt(2)
49 call. TCP uses the extra space for administrative purposes and inter‐
50 nal kernel structures, and the /proc file values reflect the larger
51 sizes compared to the actual TCP windows. On individual connections,
52 the socket buffer size must be set prior to the listen(2) or connect(2)
53 calls in order to have it take effect. See socket(7) for more informa‐
54 tion.
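
The difference between the requested and the reported buffer size can
be observed with a short C sketch. The helper name set_rcvbuf() is
hypothetical; the value it returns depends on the limits in
/proc/sys/net/core/rmem_max and is therefore not predictable here.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer size on fd and return
 * the size the kernel reports afterwards, or -1 on error. Call it
 * before listen(2) or connect(2) so that the setting takes effect. */
int set_rcvbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    assert(requested > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
        return -1;

    /* Typically larger than requested, because of the extra space
     * reserved for administrative purposes. */
    return effective;
}
```

A caller would normally compare the returned value with the requested
one rather than assume that the two are equal.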
55
56 TCP supports urgent data. Urgent data is used to signal the receiver
57 that some important message is part of the data stream and that it
58 should be processed as soon as possible. To send urgent data specify
59 the MSG_OOB option to send(2). When urgent data is received, the ker‐
60 nel sends a SIGURG signal to the process or process group that has been
61 set as the socket "owner" using the SIOCSPGRP or FIOSETOWN ioctls (or
62 the POSIX.1-specified fcntl(2) F_SETOWN operation). When the SO_OOBIN‐
63 LINE socket option is enabled, urgent data is put into the normal data
64 stream (a program can test for its location using the SIOCATMARK ioctl
65 described below), otherwise it can be received only when the MSG_OOB
66 flag is set for recv(2) or recvmsg(2).
67
When out-of-band data is present, select(2) indicates the file
descriptor as having an exceptional condition and poll(2) indicates a
POLLPRI event.
71
72 Linux 2.4 introduced a number of changes for improved throughput and
73 scaling, as well as enhanced functionality. Some of these features
74 include support for zero-copy sendfile(2), Explicit Congestion Notifi‐
75 cation, new management of TIME_WAIT sockets, keep-alive socket options
76 and support for Duplicate SACK extensions.
77
78 Address formats
79 TCP is built on top of IP (see ip(7)). The address formats defined by
80 ip(7) apply to TCP. TCP supports point-to-point communication only;
81 broadcasting and multicasting are not supported.
82
83 /proc interfaces
84 System-wide TCP parameter settings can be accessed by files in the
85 directory /proc/sys/net/ipv4/. In addition, most IP /proc interfaces
86 also apply to TCP; see ip(7). Variables described as Boolean take an
87 integer value, with a nonzero value ("true") meaning that the corre‐
88 sponding option is enabled, and a zero value ("false") meaning that the
89 option is disabled.
90
91 tcp_abc (Integer; default: 0; Linux 2.6.15 to Linux 3.8)
92 Control the Appropriate Byte Count (ABC), defined in RFC 3465.
93 ABC is a way of increasing the congestion window (cwnd) more
94 slowly in response to partial acknowledgments. Possible values
95 are:
96
97 0 increase cwnd once per acknowledgment (no ABC)
98
99 1 increase cwnd once per acknowledgment of full sized segment
100
2 allow cwnd to increase by two if the acknowledgment covers two
segments, to compensate for delayed acknowledgments.
103
104 tcp_abort_on_overflow (Boolean; default: disabled; since Linux 2.4)
Enable resetting connections if the listening service is too
slow and unable to keep up and accept them. With the default
setting (disabled), a connection that overflowed the queue during
a burst can still recover. Enable this option only if you are
really sure that the listening daemon cannot be tuned to accept
connections faster. Enabling this option can harm the clients of
your server.
111
112 tcp_adv_win_scale (integer; default: 2; since Linux 2.4)
113 Count buffering overhead as bytes/2^tcp_adv_win_scale, if
114 tcp_adv_win_scale is greater than 0; or bytes-
115 bytes/2^(-tcp_adv_win_scale), if tcp_adv_win_scale is less than
116 or equal to zero.
117
The socket receive buffer space is shared between the application
and kernel. TCP maintains part of the buffer as the TCP window;
this is the size of the receive window advertised to the other
end. The rest of the space is used as the "application" buffer,
used to isolate the network from scheduling and application
latencies. The tcp_adv_win_scale default value of 2 implies that
the space used for the application buffer is one fourth that of
the total.
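
The two formulas given for tcp_adv_win_scale can be written out as a
small C function. This is only a sketch of the arithmetic as stated in
the text, not the kernel's code; the function name
buffering_overhead() is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: bytes of the receive buffer counted as
 * buffering overhead for a given tcp_adv_win_scale value.
 *   scale > 0  : bytes / 2^scale
 *   scale <= 0 : bytes - bytes / 2^(-scale)                       */
long buffering_overhead(long bytes, int scale)
{
    assert(bytes >= 0 && scale > -31 && scale < 31);

    if (scale > 0)
        return bytes >> scale;
    return bytes - (bytes >> -scale);
}
```

With the default scale of 2 and a buffer of 87380 bytes, the overhead
is 87380 / 4 = 21845 bytes, which leaves 65535 bytes for the TCP
window.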
126
127 tcp_allowed_congestion_control (String; default: see text; since Linux
128 2.4.20)
129 Show/set the congestion control algorithm choices available to
130 unprivileged processes (see the description of the TCP_CONGES‐
131 TION socket option). The items in the list are separated by
132 white space and terminated by a newline character. The list is
133 a subset of those listed in tcp_available_congestion_control.
134 The default value for this list is "reno" plus the default set‐
135 ting of tcp_congestion_control.
136
137 tcp_autocorking (Boolean; default: enabled; since Linux 3.14)
138 If this option is enabled, the kernel tries to coalesce small
139 writes (from consecutive write(2) and sendmsg(2) calls) as much
140 as possible, in order to decrease the total number of sent pack‐
141 ets. Coalescing is done if at least one prior packet for the
142 flow is waiting in Qdisc queues or device transmit queue.
143 Applications can still use the TCP_CORK socket option to obtain
144 optimal behavior when they know how/when to uncork their sock‐
145 ets.
146
147 tcp_available_congestion_control (String; read-only; since Linux
148 2.4.20)
149 Show a list of the congestion-control algorithms that are regis‐
150 tered. The items in the list are separated by white space and
151 terminated by a newline character. This list is a limiting set
152 for the list in tcp_allowed_congestion_control. More conges‐
153 tion-control algorithms may be available as modules, but not
154 loaded.
155
156 tcp_app_win (integer; default: 31; since Linux 2.4)
157 This variable defines how many bytes of the TCP window are
158 reserved for buffering overhead.
159
160 A maximum of (window/2^tcp_app_win, mss) bytes in the window are
161 reserved for the application buffer. A value of 0 implies that
162 no amount is reserved.
163
164 tcp_base_mss (Integer; default: 512; since Linux 2.6.17)
165 The initial value of search_low to be used by the packetization
166 layer Path MTU discovery (MTU probing). If MTU probing is
167 enabled, this is the initial MSS used by the connection.
168
169 tcp_bic (Boolean; default: disabled; Linux 2.4.27/2.6.6 to 2.6.13)
170 Enable BIC TCP congestion control algorithm. BIC-TCP is a
171 sender-side-only change that ensures a linear RTT fairness under
172 large windows while offering both scalability and bounded TCP-
173 friendliness. The protocol combines two schemes called additive
174 increase and binary search increase. When the congestion window
175 is large, additive increase with a large increment ensures lin‐
176 ear RTT fairness as well as good scalability. Under small con‐
177 gestion windows, binary search increase provides TCP friendli‐
178 ness.
179
180 tcp_bic_low_window (integer; default: 14; Linux 2.4.27/2.6.6 to 2.6.13)
181 Set the threshold window (in packets) where BIC TCP starts to
182 adjust the congestion window. Below this threshold BIC TCP
183 behaves the same as the default TCP Reno.
184
185 tcp_bic_fast_convergence (Boolean; default: enabled; Linux 2.4.27/2.6.6
186 to 2.6.13)
187 Force BIC TCP to more quickly respond to changes in congestion
188 window. Allows two flows sharing the same connection to con‐
189 verge more rapidly.
190
191 tcp_congestion_control (String; default: see text; since Linux 2.4.13)
192 Set the default congestion-control algorithm to be used for new
193 connections. The algorithm "reno" is always available, but
194 additional choices may be available depending on kernel configu‐
195 ration. The default value for this file is set as part of ker‐
196 nel configuration.
197
198 tcp_dma_copybreak (integer; default: 4096; since Linux 2.6.24)
199 Lower limit, in bytes, of the size of socket reads that will be
200 offloaded to a DMA copy engine, if one is present in the system
201 and the kernel was configured with the CONFIG_NET_DMA option.
202
203 tcp_dsack (Boolean; default: enabled; since Linux 2.4)
204 Enable RFC 2883 TCP Duplicate SACK support.
205
206 tcp_ecn (Integer; default: see below; since Linux 2.4)
207 Enable RFC 3168 Explicit Congestion Notification.
208
209 This file can have one of the following values:
210
211 0 Disable ECN. Neither initiate nor accept ECN. This was
212 the default up to and including Linux 2.6.30.
213
214 1 Enable ECN when requested by incoming connections and
215 also request ECN on outgoing connection attempts.
216
217 2 Enable ECN when requested by incoming connections, but do
218 not request ECN on outgoing connections. This value is
219 supported, and is the default, since Linux 2.6.31.
220
221 When enabled, connectivity to some destinations could be
222 affected due to older, misbehaving middle boxes along the path,
223 causing connections to be dropped. However, to facilitate and
224 encourage deployment with option 1, and to work around such
225 buggy equipment, the tcp_ecn_fallback option has been intro‐
226 duced.
227
228 tcp_ecn_fallback (Boolean; default: enabled; since Linux 4.1)
229 Enable RFC 3168, Section 6.1.1.1. fallback. When enabled, out‐
230 going ECN-setup SYNs that time out within the normal SYN
231 retransmission timeout will be resent with CWR and ECE cleared.
232
233 tcp_fack (Boolean; default: enabled; since Linux 2.2)
234 Enable TCP Forward Acknowledgement support.
235
236 tcp_fin_timeout (integer; default: 60; since Linux 2.2)
237 This specifies how many seconds to wait for a final FIN packet
238 before the socket is forcibly closed. This is strictly a viola‐
239 tion of the TCP specification, but required to prevent denial-
240 of-service attacks. In Linux 2.2, the default value was 180.
241
242 tcp_frto (integer; default: see below; since Linux 2.4.21/2.6)
243 Enable F-RTO, an enhanced recovery algorithm for TCP retransmis‐
244 sion timeouts (RTOs). It is particularly beneficial in wireless
245 environments where packet loss is typically due to random radio
246 interference rather than intermediate router congestion. See
247 RFC 4138 for more details.
248
249 This file can have one of the following values:
250
251 0 Disabled. This was the default up to and including Linux
252 2.6.23.
253
1 The basic version of the F-RTO algorithm is enabled.

2 Enable SACK-enhanced F-RTO if the flow uses SACK. The basic
version can also be used when SACK is in use, though in that
case there are scenarios where F-RTO interacts badly with the
packet counting of the SACK-enabled TCP flow. This value is
the default since Linux 2.6.24.
261
262 Before Linux 2.6.22, this parameter was a Boolean value, sup‐
263 porting just values 0 and 1 above.
264
265 tcp_frto_response (integer; default: 0; since Linux 2.6.22)
266 When F-RTO has detected that a TCP retransmission timeout was
267 spurious (i.e., the timeout would have been avoided had TCP set
268 a longer retransmission timeout), TCP has several options con‐
269 cerning what to do next. Possible values are:
270
271 0 Rate halving based; a smooth and conservative response,
272 results in halved congestion window (cwnd) and slow-start
273 threshold (ssthresh) after one RTT.
274
275 1 Very conservative response; not recommended because even
276 though being valid, it interacts poorly with the rest of
277 Linux TCP; halves cwnd and ssthresh immediately.
278
279 2 Aggressive response; undoes congestion-control measures that
280 are now known to be unnecessary (ignoring the possibility of
281 a lost retransmission that would require TCP to be more cau‐
282 tious); cwnd and ssthresh are restored to the values prior to
283 timeout.
284
285 tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
286 The number of seconds between TCP keep-alive probes.
287
288 tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
289 The maximum number of TCP keep-alive probes to send before giv‐
290 ing up and killing the connection if no response is obtained
291 from the other end.
292
293 tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
The number of seconds a connection needs to be idle before TCP
begins sending out keep-alive probes. Keep-alives are sent only
when the SO_KEEPALIVE socket option is enabled. The default
value is 7200 seconds (2 hours). An idle connection is
terminated after approximately an additional 11 minutes (9 probes
sent at intervals of 75 seconds) when keep-alive is enabled.
300
301 Note that underlying connection tracking mechanisms and applica‐
302 tion timeouts may be much shorter.
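
The figures above follow from the three keep-alive parameters. A small
C sketch of the arithmetic (the function name is hypothetical, and the
result is only an approximation of when the kernel gives up):

```c
#include <assert.h>

/* Hypothetical helper: approximate number of seconds after which an
 * idle, unresponsive connection is terminated, given
 * tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes. */
long keepalive_give_up_seconds(long idle, long intvl, long probes)
{
    assert(idle >= 0 && intvl >= 0 && probes >= 0);
    return idle + intvl * probes;
}
```

With the defaults, 9 probes at 75 second intervals add 675 seconds
(11.25 minutes) to the 7200 second idle time, giving 7875 seconds in
total.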
303
304 tcp_low_latency (Boolean; default: disabled; since Linux 2.4.21/2.6;
305 obsolete since Linux 4.14)
If enabled, the TCP stack makes decisions that prefer lower
latency as opposed to higher throughput. If this option is
disabled, then higher throughput is preferred. An example of an
application where this default should be changed would be a
Beowulf compute cluster. Since Linux 4.14, this file still
exists, but its value is ignored.
312
313 tcp_max_orphans (integer; default: see below; since Linux 2.4)
314 The maximum number of orphaned (not attached to any user file
315 handle) TCP sockets allowed in the system. When this number is
316 exceeded, the orphaned connection is reset and a warning is
317 printed. This limit exists only to prevent simple denial-of-
318 service attacks. Lowering this limit is not recommended. Net‐
319 work conditions might require you to increase the number of
320 orphans allowed, but note that each orphan can eat up to ~64 kB
321 of unswappable memory. The default initial value is set equal
322 to the kernel parameter NR_FILE. This initial default is
323 adjusted depending on the memory in the system.
324
325 tcp_max_syn_backlog (integer; default: see below; since Linux 2.2)
326 The maximum number of queued connection requests which have
327 still not received an acknowledgement from the connecting
328 client. If this number is exceeded, the kernel will begin drop‐
329 ping requests. The default value of 256 is increased to 1024
330 when the memory present in the system is adequate or greater (>=
331 128 MB), and reduced to 128 for those systems with very low mem‐
332 ory (<= 32 MB).
333
334 Prior to Linux 2.6.20, it was recommended that if this needed to
335 be increased above 1024, the size of the SYNACK hash table
336 (TCP_SYNQ_HSIZE) in include/net/tcp.h should be modified to keep
337
338 TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog
339
340 and the kernel should be recompiled. In Linux 2.6.20, the fixed
341 sized TCP_SYNQ_HSIZE was removed in favor of dynamic sizing.
342
343 tcp_max_tw_buckets (integer; default: see below; since Linux 2.4)
344 The maximum number of sockets in TIME_WAIT state allowed in the
345 system. This limit exists only to prevent simple denial-of-ser‐
346 vice attacks. The default value of NR_FILE*2 is adjusted
347 depending on the memory in the system. If this number is
348 exceeded, the socket is closed and a warning is printed.
349
350 tcp_moderate_rcvbuf (Boolean; default: enabled; since Linux
351 2.4.17/2.6.7)
352 If enabled, TCP performs receive buffer auto-tuning, attempting
353 to automatically size the buffer (no greater than tcp_rmem[2])
354 to match the size required by the path for full throughput.
355
356 tcp_mem (since Linux 2.4)
357 This is a vector of 3 integers: [low, pressure, high]. These
358 bounds, measured in units of the system page size, are used by
359 TCP to track its memory usage. The defaults are calculated at
360 boot time from the amount of available memory. (TCP can only
361 use low memory for this, which is limited to around 900
362 megabytes on 32-bit systems. 64-bit systems do not suffer this
363 limitation.)
364
365 low TCP doesn't regulate its memory allocation when the num‐
366 ber of pages it has allocated globally is below this num‐
367 ber.
368
369 pressure
370 When the amount of memory allocated by TCP exceeds this
371 number of pages, TCP moderates its memory consumption.
372 This memory pressure state is exited once the number of
373 pages allocated falls below the low mark.
374
375 high The maximum number of pages, globally, that TCP will
376 allocate. This value overrides any other limits imposed
377 by the kernel.
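
Because the tcp_mem values are expressed in pages, a program that wants
byte counts has to convert them. The following C sketch reads the
three values and converts one of them. All function names are
hypothetical, and the file is assumed to contain three
whitespace-separated integers on a single line.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "low pressure high" from text into pages[0..2].
 * Returns 0 on success, -1 if three integers were not found. */
int parse_tcp_mem(const char *text, long pages[3])
{
    assert(text != NULL);
    return sscanf(text, "%ld %ld %ld",
                  &pages[0], &pages[1], &pages[2]) == 3 ? 0 : -1;
}

/* Read the current system-wide values. Returns 0 on success. */
int read_tcp_mem(long pages[3])
{
    char line[128];
    int rc = -1;
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_mem", "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) != NULL)
        rc = parse_tcp_mem(line, pages);
    fclose(f);
    return rc;
}

/* Convert a page count to bytes for a given page size. */
long long pages_to_bytes(long pages, long page_size)
{
    return (long long)pages * page_size;
}
```

The page size to pass to pages_to_bytes() can be obtained with
sysconf(_SC_PAGESIZE).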
378
379 tcp_mtu_probing (integer; default: 0; since Linux 2.6.17)
380 This parameter controls TCP Packetization-Layer Path MTU Discov‐
381 ery. The following values may be assigned to the file:
382
383 0 Disabled
384
1 Disabled by default, enabled when an ICMP black hole is detected
386
387 2 Always enabled, use initial MSS of tcp_base_mss.
388
389 tcp_no_metrics_save (Boolean; default: disabled; since Linux 2.6.6)
390 By default, TCP saves various connection metrics in the route
391 cache when the connection closes, so that connections estab‐
392 lished in the near future can use these to set initial condi‐
393 tions. Usually, this increases overall performance, but it may
394 sometimes cause performance degradation. If tcp_no_metrics_save
395 is enabled, TCP will not cache metrics on closing connections.
396
397 tcp_orphan_retries (integer; default: 8; since Linux 2.4)
398 The maximum number of attempts made to probe the other end of a
399 connection which has been closed by our end.
400
401 tcp_reordering (integer; default: 3; since Linux 2.4)
The maximum amount by which a packet can be reordered in a TCP
packet stream without TCP assuming packet loss and going into
slow start. It is not advisable to change this number. This is
a packet reordering detection metric designed to minimize
unnecessary back off and retransmits provoked by reordering of
packets on a connection.
408
409 tcp_retrans_collapse (Boolean; default: enabled; since Linux 2.2)
410 Try to send full-sized packets during retransmit.
411
412 tcp_retries1 (integer; default: 3; since Linux 2.2)
413 The number of times TCP will attempt to retransmit a packet on
414 an established connection normally, without the extra effort of
415 getting the network layers involved. Once we exceed this number
416 of retransmits, we first have the network layer update the route
417 if possible before each new retransmit. The default is the RFC
418 specified minimum of 3.
419
420 tcp_retries2 (integer; default: 15; since Linux 2.2)
The maximum number of times a TCP packet is retransmitted in
established state before giving up. The default value is 15,
which corresponds to a duration of between approximately 13 and
30 minutes, depending on the retransmission timeout. The
RFC 1122 specified minimum limit of 100 seconds is typically
deemed too short.
427
428 tcp_rfc1337 (Boolean; default: disabled; since Linux 2.2)
429 Enable TCP behavior conformant with RFC 1337. When disabled, if
430 a RST is received in TIME_WAIT state, we close the socket imme‐
431 diately without waiting for the end of the TIME_WAIT period.
432
433 tcp_rmem (since Linux 2.4)
434 This is a vector of 3 integers: [min, default, max]. These
435 parameters are used by TCP to regulate receive buffer sizes.
436 TCP dynamically adjusts the size of the receive buffer from the
437 defaults listed below, in the range of these values, depending
438 on memory available in the system.
439
440 min minimum size of the receive buffer used by each TCP
441 socket. The default value is the system page size. (On
442 Linux 2.4, the default value is 4 kB, lowered to
443 PAGE_SIZE bytes in low-memory systems.) This value is
444 used to ensure that in memory pressure mode, allocations
445 below this size will still succeed. This is not used to
446 bound the size of the receive buffer declared using
447 SO_RCVBUF on a socket.
448
449 default
450 the default size of the receive buffer for a TCP socket.
451 This value overwrites the initial default buffer size
452 from the generic global net.core.rmem_default defined for
453 all protocols. The default value is 87380 bytes. (On
454 Linux 2.4, this will be lowered to 43689 in low-memory
455 systems.) If larger receive buffer sizes are desired,
456 this value should be increased (to affect all sockets).
457 To employ large TCP windows, the net.ipv4.tcp_win‐
458 dow_scaling must be enabled (default).
459
460 max the maximum size of the receive buffer used by each TCP
461 socket. This value does not override the global
462 net.core.rmem_max. This is not used to limit the size of
463 the receive buffer declared using SO_RCVBUF on a socket.
464 The default value is calculated using the formula
465
466 max(87380, min(4 MB, tcp_mem[1]*PAGE_SIZE/128))
467
468 (On Linux 2.4, the default is 87380*2 bytes, lowered to
469 87380 in low-memory systems).
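
The formula for the default tcp_rmem maximum can be evaluated with a
few lines of C. This sketch only restates the formula above; the
function name is hypothetical, and "4 MB" is assumed to mean
4 * 1024 * 1024 bytes.

```c
#include <assert.h>

/* Assumption: 4 MB is interpreted as 4 * 1024 * 1024 bytes. */
#define FOUR_MB (4LL * 1024 * 1024)

static long long min_ll(long long a, long long b) { return a < b ? a : b; }
static long long max_ll(long long a, long long b) { return a > b ? a : b; }

/* max(87380, min(4 MB, tcp_mem[1] * PAGE_SIZE / 128)), where
 * tcp_mem[1] is the "pressure" value in pages. */
long long tcp_rmem_default_max(long long tcp_mem_pressure,
                               long long page_size)
{
    assert(tcp_mem_pressure >= 0 && page_size > 0);
    return max_ll(87380,
                  min_ll(FOUR_MB, tcp_mem_pressure * page_size / 128));
}
```

For example, with 4096-byte pages a pressure value of 4000 pages gives
128000 bytes, while very small values are raised to 87380 and very
large ones are capped at 4 MB.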
470
471 tcp_sack (Boolean; default: enabled; since Linux 2.2)
472 Enable RFC 2018 TCP Selective Acknowledgements.
473
474 tcp_slow_start_after_idle (Boolean; default: enabled; since Linux
475 2.6.18)
476 If enabled, provide RFC 2861 behavior and time out the conges‐
477 tion window after an idle period. An idle period is defined as
478 the current RTO (retransmission timeout). If disabled, the con‐
479 gestion window will not be timed out after an idle period.
480
481 tcp_stdurg (Boolean; default: disabled; since Linux 2.2)
482 If this option is enabled, then use the RFC 1122 interpretation
483 of the TCP urgent-pointer field. According to this interpreta‐
484 tion, the urgent pointer points to the last byte of urgent data.
485 If this option is disabled, then use the BSD-compatible inter‐
486 pretation of the urgent pointer: the urgent pointer points to
487 the first byte after the urgent data. Enabling this option may
488 lead to interoperability problems.
489
490 tcp_syn_retries (integer; default: 6; since Linux 2.2)
491 The maximum number of times initial SYNs for an active TCP con‐
492 nection attempt will be retransmitted. This value should not be
493 higher than 255. The default value is 6, which corresponds to
494 retrying for up to approximately 127 seconds. Before Linux 3.7,
495 the default value was 5, which (in conjunction with calculation
496 based on other kernel parameters) corresponded to approximately
497 180 seconds.
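
The figure of 127 seconds can be reproduced with a simple backoff
calculation. The sketch below rests on assumptions that are not stated
in the text: an initial retransmission timeout of 1 second that doubles
after every attempt, with no upper bound applied. The function name is
hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: total seconds spent waiting before a connection
 * attempt is abandoned, assuming the timeout starts at initial_rto
 * and doubles after the initial SYN and after each retransmission. */
long syn_total_wait_seconds(int retries, long initial_rto)
{
    long total = 0;
    long rto = initial_rto;
    int i;

    /* Kept small so that the doubling cannot overflow. */
    assert(retries >= 0 && retries <= 30 && initial_rto > 0);

    /* One wait after the initial SYN plus one per retransmission. */
    for (i = 0; i <= retries; i++) {
        total += rto;
        rto *= 2;
    }
    return total;
}
```

With 6 retries and a 1 second initial timeout this yields
1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds. For larger retry counts
the result is only a rough guide, because the sketch ignores any limit
on the retransmission timeout.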
498
499 tcp_synack_retries (integer; default: 5; since Linux 2.2)
500 The maximum number of times a SYN/ACK segment for a passive TCP
501 connection will be retransmitted. This number should not be
502 higher than 255.
503
504 tcp_syncookies (Boolean; since Linux 2.2)
505 Enable TCP syncookies. The kernel must be compiled with CON‐
506 FIG_SYN_COOKIES. Send out syncookies when the syn backlog queue
507 of a socket overflows. The syncookies feature attempts to pro‐
508 tect a socket from a SYN flood attack. This should be used as a
509 last resort, if at all. This is a violation of the TCP proto‐
510 col, and conflicts with other areas of TCP such as TCP exten‐
511 sions. It can cause problems for clients and relays. It is not
512 recommended as a tuning mechanism for heavily loaded servers to
513 help with overloaded or misconfigured conditions. For recom‐
514 mended alternatives see tcp_max_syn_backlog, tcp_synack_retries,
515 and tcp_abort_on_overflow.
516
517 tcp_timestamps (integer; default: 1; since Linux 2.2)
518 Set to one of the following values to enable or disable RFC 1323
519 TCP timestamps:
520
521 0 Disable timestamps.
522
1 Enable timestamps as defined in RFC 1323 and use a random offset
for each connection rather than only using the current time.
525
526 2 As for the value 1, but without random offsets. Setting
527 tcp_timestamps to this value is meaningful since Linux 4.10.
528
529 tcp_tso_win_divisor (integer; default: 3; since Linux 2.6.9)
530 This parameter controls what percentage of the congestion window
531 can be consumed by a single TCP Segmentation Offload (TSO)
532 frame. The setting of this parameter is a tradeoff between
533 burstiness and building larger TSO frames.
534
535 tcp_tw_recycle (Boolean; default: disabled; Linux 2.4 to 4.11)
536 Enable fast recycling of TIME_WAIT sockets. Enabling this
537 option is not recommended as the remote IP may not use monotoni‐
538 cally increasing timestamps (devices behind NAT, devices with
539 per-connection timestamp offsets). See RFC 1323 (PAWS) and RFC
540 6191.
541
542 tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
Allow TIME_WAIT sockets to be reused for new connections when
this is safe from the protocol's point of view. This setting
should not be changed without the advice or request of technical
experts.
546
547 tcp_vegas_cong_avoid (Boolean; default: disabled; Linux 2.2 to 2.6.13)
548 Enable TCP Vegas congestion avoidance algorithm. TCP Vegas is a
549 sender-side-only change to TCP that anticipates the onset of
550 congestion by estimating the bandwidth. TCP Vegas adjusts the
551 sending rate by modifying the congestion window. TCP Vegas
552 should provide less packet loss, but it is not as aggressive as
553 TCP Reno.
554
555 tcp_westwood (Boolean; default: disabled; Linux 2.4.26/2.6.3 to 2.6.13)
556 Enable TCP Westwood+ congestion control algorithm. TCP West‐
557 wood+ is a sender-side-only modification of the TCP Reno proto‐
558 col stack that optimizes the performance of TCP congestion con‐
559 trol. It is based on end-to-end bandwidth estimation to set
560 congestion window and slow start threshold after a congestion
561 episode. Using this estimation, TCP Westwood+ adaptively sets a
562 slow start threshold and a congestion window which takes into
563 account the bandwidth used at the time congestion is experi‐
564 enced. TCP Westwood+ significantly increases fairness with
565 respect to TCP Reno in wired networks and throughput over wire‐
566 less links.
567
568 tcp_window_scaling (Boolean; default: enabled; since Linux 2.2)
569 Enable RFC 1323 TCP window scaling. This feature allows the use
570 of a large window (> 64 kB) on a TCP connection, should the
571 other end support it. Normally, the 16 bit window length field
572 in the TCP header limits the window size to less than 64 kB. If
573 larger windows are desired, applications can increase the size
574 of their socket buffers and the window scaling option will be
575 employed. If tcp_window_scaling is disabled, TCP will not nego‐
576 tiate the use of window scaling with the other end during con‐
577 nection setup.
578
579 tcp_wmem (since Linux 2.4)
580 This is a vector of 3 integers: [min, default, max]. These
581 parameters are used by TCP to regulate send buffer sizes. TCP
582 dynamically adjusts the size of the send buffer from the default
583 values listed below, in the range of these values, depending on
584 memory available.
585
586 min Minimum size of the send buffer used by each TCP socket.
587 The default value is the system page size. (On Linux
588 2.4, the default value is 4 kB.) This value is used to
589 ensure that in memory pressure mode, allocations below
590 this size will still succeed. This is not used to bound
591 the size of the send buffer declared using SO_SNDBUF on a
592 socket.
593
594 default
595 The default size of the send buffer for a TCP socket.
596 This value overwrites the initial default buffer size
597 from the generic global /proc/sys/net/core/wmem_default
598 defined for all protocols. The default value is 16 kB.
599 If larger send buffer sizes are desired, this value
600 should be increased (to affect all sockets). To employ
601 large TCP windows, the /proc/sys/net/ipv4/tcp_win‐
602 dow_scaling must be set to a nonzero value (default).
603
604 max The maximum size of the send buffer used by each TCP
605 socket. This value does not override the value in
606 /proc/sys/net/core/wmem_max. This is not used to limit
607 the size of the send buffer declared using SO_SNDBUF on a
608 socket. The default value is calculated using the for‐
609 mula
610
611 max(65536, min(4 MB, tcp_mem[1]*PAGE_SIZE/128))
612
(On Linux 2.4, the default value is 128 kB, lowered to 64 kB
on low-memory systems.)
615
616 tcp_workaround_signed_windows (Boolean; default: disabled; since Linux
617 2.6.26)
618 If enabled, assume that no receipt of a window-scaling option
619 means that the remote TCP is broken and treats the window as a
620 signed quantity. If disabled, assume that the remote TCP is not
621 broken even if we do not receive a window scaling option from
622 it.
623
624 Socket options
625 To set or get a TCP socket option, call getsockopt(2) to read or set‐
626 sockopt(2) to write the option with the option level argument set to
627 IPPROTO_TCP. Unless otherwise noted, optval is a pointer to an int.
628 In addition, most IPPROTO_IP socket options are valid on TCP sockets.
629 For more information see ip(7).
630
631 Following is a list of TCP-specific socket options. For details of
632 some other socket options that are also applicable for TCP sockets, see
633 socket(7).
634
635 TCP_CONGESTION (since Linux 2.6.13)
636 The argument for this option is a string. This option allows
637 the caller to set the TCP congestion control algorithm to be
638 used, on a per-socket basis. Unprivileged processes are
639 restricted to choosing one of the algorithms in tcp_allowed_con‐
640 gestion_control (described above). Privileged processes
641 (CAP_NET_ADMIN) can choose from any of the available congestion-
642 control algorithms (see the description of tcp_available_conges‐
643 tion_control above).
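
As an illustration, the following C sketch selects and reads back the
congestion-control algorithm of a socket. The helper names are
hypothetical. The sketch uses "reno" in its usage because the text
above states that this algorithm is always available; other names
depend on the kernel configuration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Select a congestion-control algorithm by name. Returns 0 on
 * success, -1 on error (for example, if the name is not permitted). */
int set_congestion(int fd, const char *name)
{
    assert(name != NULL);
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, strlen(name));
}

/* Read the algorithm currently in use into buf as a C string.
 * Returns 0 on success, -1 on error. */
int get_congestion(int fd, char *buf, socklen_t buflen)
{
    socklen_t len;

    assert(buf != NULL && buflen > 1);

    /* Zero the buffer and leave room for the terminating null byte. */
    memset(buf, 0, buflen);
    len = buflen - 1;
    return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

An unprivileged caller should be prepared for set_congestion() to fail
when the requested name is not listed in
tcp_allowed_congestion_control.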
644
645 TCP_CORK (since Linux 2.2)
646 If set, don't send out partial frames. All queued partial
647 frames are sent when the option is cleared again. This is use‐
648 ful for prepending headers before calling sendfile(2), or for
649 throughput optimization. As currently implemented, there is a
650 200 millisecond ceiling on the time for which output is corked
651 by TCP_CORK. If this ceiling is reached, then queued data is
652 automatically transmitted. This option can be combined with
653 TCP_NODELAY only since Linux 2.5.71. This option should not be
654 used in code intended to be portable.
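
The header-before-sendfile(2) pattern mentioned above might be sketched as
follows. This assumes a blocking, connected socket; the function name
send_with_header and its parameters are hypothetical.

```c
#define _GNU_SOURCE             /* expose TCP_CORK */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Cork the socket, queue a header, append file_len bytes of file_fd
   with sendfile(2), then uncork so that any partial frame is flushed.
   Returns 0 on success, -1 on any failure. */
int send_with_header(int sock, const void *hdr, size_t hdr_len,
                     int file_fd, size_t file_len)
{
    int on = 1, off = 0;
    off_t offset = 0;
    int rc = -1;

    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
        return -1;

    /* MSG_NOSIGNAL avoids SIGPIPE if the peer has gone away. */
    if (send(sock, hdr, hdr_len, MSG_NOSIGNAL) == (ssize_t)hdr_len) {
        rc = 0;
        while ((size_t)offset < file_len) {
            ssize_t n = sendfile(sock, file_fd, &offset,
                                 file_len - (size_t)offset);
            if (n <= 0) {       /* error, or file shorter than expected */
                rc = -1;
                break;
            }
        }
    }

    /* Always clear the option again, even after a failure. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)) < 0)
        rc = -1;
    return rc;
}
```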
655
656 TCP_DEFER_ACCEPT (since Linux 2.4)
657 Allow a listener to be awakened only when data arrives on the
658 socket. Takes an integer value (seconds); this can bound the
659 maximum number of attempts TCP will make to complete the connec‐
660 tion. This option should not be used in code intended to be
661 portable.
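
A minimal sketch of a listener that uses this option is shown below. Setting
the option before listen(2) is an assumption made for the example, and
listen_deferred is a hypothetical name.

```c
#define _GNU_SOURCE             /* expose TCP_DEFER_ACCEPT */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to wake the listener only once data has arrived,
   then put the socket into the listening state.
   Returns 0 on success, -1 with errno set on failure. */
int listen_deferred(int listen_fd, int seconds, int backlog)
{
    if (setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                   &seconds, sizeof(seconds)) < 0)
        return -1;
    return listen(listen_fd, backlog);
}
```

The value read back with getsockopt(2) may differ from the one that was set,
so callers should not rely on an exact round trip.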
662
663 TCP_INFO (since Linux 2.4)
664 Used to collect information about this socket. The kernel
665 returns a struct tcp_info as defined in the file
666 /usr/include/linux/tcp.h. This option should not be used in
667 code intended to be portable.
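
Reading the structure could look like the sketch below. It assumes the
definition of struct tcp_info that glibc exposes through <netinet/tcp.h>;
get_tcp_info is a hypothetical helper name.

```c
#define _GNU_SOURCE             /* expose struct tcp_info and TCP_INFO */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *info with the kernel's view of the connection.
   Returns 0 on success, -1 with errno set on failure. */
int get_tcp_info(int fd, struct tcp_info *info)
{
    socklen_t len = sizeof(*info);

    /* Zero first: an older kernel may fill in fewer bytes than
       the structure known to user space. */
    memset(info, 0, sizeof(*info));
    return getsockopt(fd, IPPROTO_TCP, TCP_INFO, info, &len);
}
```

After a successful call, fields such as info.tcpi_state and info.tcpi_rtt can
be inspected.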
668
669 TCP_KEEPCNT (since Linux 2.4)
670 The maximum number of keepalive probes TCP should send before
671 dropping the connection. This option should not be used in code
672 intended to be portable.
673
674 TCP_KEEPIDLE (since Linux 2.4)
675 The time (in seconds) the connection needs to remain idle before
676 TCP starts sending keepalive probes, if the socket option
677 SO_KEEPALIVE has been set on this socket. This option should
678 not be used in code intended to be portable.
679
680 TCP_KEEPINTVL (since Linux 2.4)
681 The time (in seconds) between individual keepalive probes. This
682 option should not be used in code intended to be portable.
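
The three keepalive options are typically set together with SO_KEEPALIVE. The
sketch below shows one way to do that; enable_keepalive and its parameter
names are hypothetical.

```c
#define _GNU_SOURCE             /* expose TCP_KEEPIDLE and friends */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive and override the per-socket timing:
   idle_s   seconds of idleness before the first probe
   intvl_s  seconds between probes
   count    unanswered probes before the connection is dropped
   Returns 0 on success, -1 with errno set on failure. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_s, sizeof(idle_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_s, sizeof(intvl_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 60, 10, 5), a dead peer would be detected after
roughly 60 + 10 * 5 = 110 seconds of silence.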
683
684 TCP_LINGER2 (since Linux 2.4)
685 The lifetime of orphaned FIN_WAIT2 state sockets. This option
686 can be used to override the system-wide setting in the file
687 /proc/sys/net/ipv4/tcp_fin_timeout for this socket. This is not
688 to be confused with the socket(7) level option SO_LINGER. This
689 option should not be used in code intended to be portable.
690
691 TCP_MAXSEG
692 The maximum segment size for outgoing TCP packets. In Linux 2.2
693 and earlier, and in Linux 2.6.28 and later, if this option is
694 set before connection establishment, it also changes the MSS
695 value announced to the other end in the initial packet. Values
696 greater than the (eventual) interface MTU have no effect. TCP
697 will also impose its minimum and maximum bounds over the value
698 provided.
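
To influence the announced MSS, the option has to be set before connect(2) or
listen(2) is called. A minimal sketch, with hypothetical helper names:

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Request a maximum segment size; call this before connect(2).
   Returns 0 on success, -1 with errno set on failure. */
int set_mss(int fd, int mss)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}

/* Read the segment size currently in effect into *mss. The kernel
   applies its own bounds, so this may differ from the value set. */
int get_mss(int fd, int *mss)
{
    socklen_t len = sizeof(*mss);

    return getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, mss, &len);
}
```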
699
700 TCP_NODELAY
701 If set, disable the Nagle algorithm. This means that segments
702 are always sent as soon as possible, even if there is only a
703 small amount of data. When not set, data is buffered until
704 there is a sufficient amount to send out, thereby avoiding the
705 frequent sending of small packets, which results in poor uti‐
706 lization of the network. This option is overridden by TCP_CORK;
707 however, setting this option forces an explicit flush of pending
708 output, even if TCP_CORK is currently set.
709
710 TCP_QUICKACK (since Linux 2.4.4)
711 Enable quickack mode if set or disable quickack mode if cleared.
712 In quickack mode, acks are sent immediately, rather than delayed
713 if needed in accordance with normal TCP operation. This flag is
714 not permanent; it only enables a switch to or from quickack
715 mode. Subsequent operation of the TCP protocol will once again
716 enter/leave quickack mode depending on internal protocol pro‐
717 cessing and factors such as delayed ack timeouts occurring and
718 data transfer. This option should not be used in code intended
719 to be portable.
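
Because the flag is not permanent, an application that wants immediate acks
throughout has to request quickack mode repeatedly. The sketch below does so
after every read; this is one possible approach rather than a documented
recipe, and both function names are hypothetical.

```c
#define _GNU_SOURCE             /* expose TCP_QUICKACK */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Request quickack mode once. Returns 0 or -1 with errno set. */
int set_quickack(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}

/* recv(2) wrapper that requests quickack mode again after each
   successful read, since the kernel may have left that mode in
   the meantime. Returns whatever recv(2) returned. */
ssize_t recv_quickack(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n >= 0)
        (void)set_quickack(fd);
    return n;
}
```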
720
721 TCP_SYNCNT (since Linux 2.4)
722 Set the number of SYN retransmits that TCP should send before
723 aborting the attempt to connect. It cannot exceed 255. This
724 option should not be used in code intended to be portable.
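
A client that wants connect(2) to give up sooner can lower the retransmit
count before connecting. A minimal sketch; set_syn_retries is a hypothetical
name, and range checking is left to the kernel.

```c
#define _GNU_SOURCE             /* expose TCP_SYNCNT */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Limit the number of SYN retransmits for a later connect(2).
   Returns 0 on success, -1 with errno set (for example EINVAL
   when the kernel rejects the count as out of range). */
int set_syn_retries(int fd, int count)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &count, sizeof(count));
}
```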
725
726 TCP_USER_TIMEOUT (since Linux 2.6.37)
727 This option takes an unsigned int as an argument. When the
728 value is greater than 0, it specifies the maximum amount of time
729 in milliseconds that transmitted data may remain unacknowledged
730 before TCP will forcibly close the corresponding connection and
731 return ETIMEDOUT to the application. If the option value is
732 specified as 0, TCP will use the system default.
733
734 Increasing user timeouts allows a TCP connection to survive
735 extended periods without end-to-end connectivity. Decreasing
736 user timeouts allows applications to "fail fast", if so desired.
737 Otherwise, failure may take up to 20 minutes with the current
738 system defaults in a normal WAN environment.
739
740 This option can be set during any state of a TCP connection, but
741 is effective only during the synchronized states of a connection
742 (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and
743 LAST-ACK). Moreover, when used with the TCP keepalive
744 (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will override keepalive
745 to determine when to close a connection due to keepalive fail‐
746 ure.
747
748 The option has no effect on when TCP retransmits a packet, nor
749 when a keepalive probe is sent.
750
751 This option, like many others, will be inherited by the socket
752 returned by accept(2), if it was set on the listening socket.
753
754 Further details on the user timeout feature can be found in
755 RFC 793 and RFC 5482 ("TCP User Timeout Option").
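
A "fail fast" configuration might be sketched as follows. Note that the
argument is an unsigned int holding milliseconds; set_user_timeout is a
hypothetical helper name.

```c
#define _GNU_SOURCE             /* expose TCP_USER_TIMEOUT */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Bound how long transmitted data may stay unacknowledged.
   A value of 0 restores the system default.
   Returns 0 on success, -1 with errno set on failure. */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

For example, set_user_timeout(fd, 30000) asks TCP to abort the connection
with ETIMEDOUT once data has gone unacknowledged for 30 seconds.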
756
757 TCP_WINDOW_CLAMP (since Linux 2.4)
758 Bound the size of the advertised window to this value. The ker‐
759 nel imposes a minimum size of SOCK_MIN_RCVBUF/2. This option
760 should not be used in code intended to be portable.
761
762 Sockets API
763 TCP provides limited support for out-of-band data, in the form of (a
764 single byte of) urgent data. In Linux this means if the other end
765 sends newer out-of-band data the older urgent data is inserted as nor‐
766 mal data into the stream (even when SO_OOBINLINE is not set). This
767 differs from BSD-based stacks.
768
769 Linux uses the BSD compatible interpretation of the urgent pointer
770 field by default. This violates RFC 1122, but is required for interop‐
771 erability with other stacks. It can be changed via
772 /proc/sys/net/ipv4/tcp_stdurg.
773
774 It is possible to peek at out-of-band data using the recv(2) MSG_PEEK
775 flag.
776
777 Since version 2.4, Linux supports the use of MSG_TRUNC in the flags
778 argument of recv(2) (and recvmsg(2)). This flag causes the received
779 bytes of data to be discarded, rather than passed back in a caller-sup‐
780 plied buffer. Since Linux 2.4.4, MSG_TRUNC also has this effect when
781 used in conjunction with MSG_OOB to receive out-of-band data.
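
Discarding queued input with MSG_TRUNC could be sketched as below. Passing a
null buffer is an assumption of this example, based on the data not being
copied back to the caller; discard_bytes is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Drop up to count bytes of received data without copying them.
   Returns the number of bytes discarded, 0 at end of stream,
   or -1 with errno set on failure. */
ssize_t discard_bytes(int fd, size_t count)
{
    return recv(fd, NULL, count, MSG_TRUNC);
}
```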
782
783 Ioctls
784 The following ioctl(2) calls return information in value. The correct
785 syntax is:
786
787 int value;
788 error = ioctl(tcp_socket, ioctl_type, &value);
789
790 ioctl_type is one of the following:
791
792 SIOCINQ
793 Returns the amount of queued unread data in the receive buffer.
794 The socket must not be in LISTEN state, otherwise an error (EIN‐
795 VAL) is returned. SIOCINQ is defined in <linux/sockios.h>.
796 Alternatively, you can use the synonymous FIONREAD, defined in
797 <sys/ioctl.h>.
798
799 SIOCATMARK
800 Returns true (i.e., value is nonzero) if the inbound data stream
801 is at the urgent mark.
802
803 If the SO_OOBINLINE socket option is set, and SIOCATMARK returns
804 true, then the next read from the socket will return the urgent
805 data. If the SO_OOBINLINE socket option is not set, and SIOCAT‐
806 MARK returns true, then the next read from the socket will
807 return the bytes following the urgent data (to actually read the
808 urgent data requires the recv(MSG_OOB) flag).
809
810 Note that a read never reads across the urgent mark. If an
811 application is informed of the presence of urgent data via
812 select(2) (using the exceptfds argument) or through delivery of
813 a SIGURG signal, then it can advance up to the mark using a loop
814 which repeatedly tests SIOCATMARK and performs a read (request‐
815 ing any number of bytes) as long as SIOCATMARK returns false.
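
The loop described above can be sketched as follows. It assumes a blocking
socket and that the application has already been told urgent data is pending;
read_to_urgent_mark and the buffer size are hypothetical choices.

```c
#define _GNU_SOURCE
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Consume normal data until the stream is at the urgent mark.
   Returns 0 once SIOCATMARK reports true, or -1 on an error or
   on end of stream before the mark was reached. */
int read_to_urgent_mark(int fd)
{
    char buf[256];

    for (;;) {
        int at_mark = 0;
        ssize_t n;

        if (ioctl(fd, SIOCATMARK, &at_mark) < 0)
            return -1;
        if (at_mark)
            return 0;

        /* A read never crosses the mark, so any size is safe here. */
        n = read(fd, buf, sizeof(buf));
        if (n <= 0)
            return -1;
    }
}
```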
816
817 SIOCOUTQ
818 Returns the amount of unsent data in the socket send queue. The
819 socket must not be in LISTEN state, otherwise an error (EINVAL)
820 is returned. SIOCOUTQ is defined in <linux/sockios.h>. Alter‐
821 natively, you can use the synonymous TIOCOUTQ, defined in
822 <sys/ioctl.h>.
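
The two queue-size ioctls follow the calling syntax shown earlier. A minimal
sketch with hypothetical helper names:

```c
#define _GNU_SOURCE
#include <linux/sockios.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Bytes queued in the receive buffer and not yet read, or -1 on
   error (for example EINVAL on a listening socket). */
int unread_bytes(int fd)
{
    int value = 0;

    return ioctl(fd, SIOCINQ, &value) < 0 ? -1 : value;
}

/* Bytes in the socket send queue, or -1 on error. */
int send_queue_bytes(int fd)
{
    int value = 0;

    return ioctl(fd, SIOCOUTQ, &value) < 0 ? -1 : value;
}
```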
823
824 Error handling
825 When a network error occurs, TCP tries to resend the packet. If it
826 doesn't succeed after some time, either ETIMEDOUT or the last received
827 error on this connection is reported.
828
829 Some applications require a quicker error notification. This can be
830 enabled with the IPPROTO_IP level IP_RECVERR socket option. When this
831 option is enabled, all incoming errors are immediately passed to the
832 user program. Use this option with care — it makes TCP less tolerant
833 to routing changes and other normal network conditions.
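
Enabling the option is a single setsockopt(2) call at the IPPROTO_IP level. A
minimal sketch; enable_fast_errors is a hypothetical name.

```c
#define _GNU_SOURCE             /* expose IP_RECVERR */
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask for incoming errors to be reported to the program immediately.
   Returns 0 on success, -1 with errno set on failure. */
int enable_fast_errors(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}
```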
834
836 EAFNOTSUPPORT
837 Passed socket address type in sin_family was not AF_INET.
838
839 EPIPE The other end closed the socket unexpectedly or a read is exe‐
840 cuted on a shut down socket.
841
842 ETIMEDOUT
843 The other end didn't acknowledge retransmitted data after some
844 time.
845
846 Any errors defined for ip(7) or the generic socket layer may also be
847 returned for TCP.
848
850 Support for Explicit Congestion Notification, zero-copy sendfile(2),
851 reordering support and some SACK extensions (DSACK) were introduced in
852 2.4. Support for forward acknowledgement (FACK), TIME_WAIT recycling,
853 and per-connection keepalive socket options were introduced in 2.3.
854
856 Not all errors are documented.
857
858 IPv6 is not described.
859
861 accept(2), bind(2), connect(2), getsockopt(2), listen(2), recvmsg(2),
862 sendfile(2), sendmsg(2), socket(2), ip(7), socket(7)
863
864 The kernel source file Documentation/networking/ip-sysctl.txt.
865
866 RFC 793 for the TCP specification.
867 RFC 1122 for the TCP requirements and a description of the Nagle algo‐
868 rithm.
869 RFC 1323 for TCP timestamp and window scaling options.
870 RFC 1337 for a description of TIME_WAIT assassination hazards.
871 RFC 3168 for a description of Explicit Congestion Notification.
872 RFC 2581 for TCP congestion control algorithms.
873 RFC 2018 and RFC 2883 for SACK and extensions to SACK.
874
876 This page is part of release 5.07 of the Linux man-pages project. A
877 description of the project, information about reporting bugs, and the
878 latest version of this page, can be found at
879 https://www.kernel.org/doc/man-pages/.
880
881
882
883Linux 2020-06-09 TCP(7)