TCP(7) Linux Programmer's Manual TCP(7)

NAME
       tcp - TCP protocol

SYNOPSIS
       #include <sys/socket.h>
       #include <netinet/in.h>
       #include <netinet/tcp.h>

       tcp_socket = socket(AF_INET, SOCK_STREAM, 0);

DESCRIPTION
       This is an implementation of the TCP protocol defined in RFC 793,
       RFC 1122 and RFC 2001 with the NewReno and SACK extensions. It
       provides a reliable, stream-oriented, full-duplex connection between
       two sockets on top of ip(7), for both IPv4 and IPv6. TCP guarantees
       that the data arrives in order and retransmits lost packets. It
       generates and checks a per-packet checksum to catch transmission
       errors. TCP does not preserve record boundaries.

       A newly created TCP socket has no remote or local address and is not
       fully specified. To create an outgoing TCP connection, use connect(2)
       to establish a connection to another TCP socket. To receive new
       incoming connections, first bind(2) the socket to a local address and
       port and then call listen(2) to put the socket into the listening
       state. After that, a new socket for each incoming connection can be
       accepted using accept(2). A socket returned by accept(2), or one on
       which connect(2) has succeeded, is fully specified and may transmit
       data. Data cannot be transmitted on listening sockets or on sockets
       that are not yet connected.
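
       The following C helpers are a minimal sketch of the call sequence
       just described, not a complete server. They create a listening
       socket on the loopback address and connect to it. The helper names
       are hypothetical. Binding to port 0 asks the kernel to pick a free
       ephemeral port, and getsockname(2) then reports which port was
       chosen; both are choices made only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 loopback address (127.0.0.1) for the given port. */
static void loopback_addr(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Passive open: socket(2), bind(2) to a local address, then listen(2).
   Port 0 lets the kernel choose a free ephemeral port.
   Returns the listening descriptor, or -1 on error. */
static int tcp_listen_loopback(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Report the local port a socket is bound to (host byte order), 0 on error. */
static uint16_t tcp_local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}

/* Active open: socket(2), then connect(2) to the listener.
   Returns the connected descriptor, or -1 on error. */
static int tcp_connect_loopback(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

       After connecting, the server side calls accept(2) on the listening
       descriptor. This yields a new, fully specified socket for each client,
       while the listening socket itself never carries data.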

       Linux supports RFC 1323 TCP high performance extensions. These
       include Protection Against Wrapped Sequence Numbers (PAWS), Window
       Scaling and Timestamps. Window scaling allows the use of large
       (> 64 kB) TCP windows in order to support links with high latency or
       bandwidth. To make use of large windows, the send and receive buffer
       sizes must be increased. They can be set globally with the
       /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem files, or
       on individual sockets by using the SO_SNDBUF and SO_RCVBUF socket
       options with the setsockopt(2) call.

       The maximum sizes for socket buffers declared via the SO_SNDBUF and
       SO_RCVBUF mechanisms are limited by the values in the
       /proc/sys/net/core/wmem_max and /proc/sys/net/core/rmem_max files,
       respectively. Note that TCP actually allocates twice the size of the
       buffer requested in the setsockopt(2) call, and so a succeeding
       getsockopt(2) call will not return the same size of buffer as
       requested in the setsockopt(2) call. TCP uses the extra space for
       administrative purposes and internal kernel structures, and the /proc
       file values reflect the larger sizes compared to the actual TCP
       windows. On individual connections, the socket buffer size must be
       set prior to the listen(2) or connect(2) calls in order to have it
       take effect. See socket(7) for more information.
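
       A short sketch like the following makes the doubling described above
       observable. The helper name is hypothetical. It requests a receive
       buffer on a fresh, unconnected socket, before any listen(2) or
       connect(2), and returns what getsockopt(2) reports. On Linux that is
       normally twice the request, unless the request was clamped by
       rmem_max.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, request a receive buffer of
   `bytes` with SO_RCVBUF, and return the size getsockopt(2) reports back
   (normally 2 * bytes on Linux, subject to /proc/sys/net/core/rmem_max).
   Returns -1 on error. */
static int reported_rcvbuf(int bytes)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int actual = -1;
    socklen_t len = sizeof(actual);

    /* Done before listen(2)/connect(2) so it can affect the TCP window. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        actual = -1;

    close(fd);
    return actual;
}
```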

       TCP supports urgent data. Urgent data is used to signal the receiver
       that some important message is part of the data stream and that it
       should be processed as soon as possible. To send urgent data, specify
       the MSG_OOB flag to send(2). When urgent data is received, the kernel
       sends a SIGURG signal to the process or process group that has been
       set as the socket "owner" using the SIOCSPGRP or FIOSETOWN ioctls (or
       the POSIX.1-specified fcntl(2) F_SETOWN operation). When the
       SO_OOBINLINE socket option is enabled, urgent data is put into the
       normal data stream (a program can test for its location using the
       SIOCATMARK ioctl described below); otherwise it can be received only
       when the MSG_OOB flag is set for recv(2) or recvmsg(2).

       When out-of-band data is present, select(2) indicates the file
       descriptor as having an exceptional condition and poll(2) indicates a
       POLLPRI event.
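
       The following sketch exercises urgent data over a loopback
       connection, assuming SO_OOBINLINE is left disabled. The sender uses
       send(2) with MSG_OOB. The receiver waits for POLLPRI with poll(2) and
       then reads the byte with recv(2) and MSG_OOB. The helper names are
       hypothetical, and tcp_loopback_pair() exists only to give the example
       a connected socket pair.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over 127.0.0.1.
   fds[0] is the client end, fds[1] the accepted server end. */
static int tcp_loopback_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int cfd = -1, afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks */

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        (cfd = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        afd = accept(lfd, NULL, NULL);

    close(lfd);
    if (afd < 0) {
        if (cfd >= 0)
            close(cfd);
        return -1;
    }
    fds[0] = cfd;
    fds[1] = afd;
    return 0;
}

/* Send a single byte as urgent data. Returns 0 on success, -1 on error. */
static int send_urgent(int fd, char byte)
{
    return send(fd, &byte, 1, MSG_OOB) == 1 ? 0 : -1;
}

/* Wait up to timeout_ms for poll(2) to report POLLPRI, then fetch the
   urgent byte with MSG_OOB. Returns the byte (0..255) or -1. */
static int recv_urgent(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    char byte;

    if (poll(&pfd, 1, timeout_ms) != 1 || !(pfd.revents & POLLPRI))
        return -1;
    if (recv(fd, &byte, 1, MSG_OOB) != 1)
        return -1;
    return (unsigned char)byte;
}
```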

       Linux 2.4 introduced a number of changes for improved throughput and
       scaling, as well as enhanced functionality. Some of these features
       include support for zero-copy sendfile(2), Explicit Congestion
       Notification, new management of TIME_WAIT sockets, keep-alive socket
       options and support for Duplicate SACK extensions.

   Address formats
       TCP is built on top of IP (see ip(7)). The address formats defined by
       ip(7) apply to TCP. TCP supports point-to-point communication only;
       broadcasting and multicasting are not supported.

   /proc interfaces
       System-wide TCP parameter settings can be accessed by files in the
       directory /proc/sys/net/ipv4/. In addition, most IP /proc interfaces
       also apply to TCP; see ip(7). Variables described as Boolean take an
       integer value, with a nonzero value ("true") meaning that the
       corresponding option is enabled, and a zero value ("false") meaning
       that the option is disabled.
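
       These files hold plain decimal text, so reading one needs no special
       API. The sketch below uses a hypothetical helper name and reads the
       first integer from such a file. For vector parameters such as
       tcp_rmem, it returns only the first element. Writing a new value works
       the same way in reverse but requires appropriate privileges.

```c
#include <stdio.h>

/* Hypothetical helper: read the first integer from a /proc/sys file such
   as /proc/sys/net/ipv4/tcp_syn_retries. For Boolean parameters, 0 means
   disabled and any nonzero value means enabled.
   Returns 0 and stores the value in *out on success, -1 on failure. */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

       For example, read_sysctl_long("/proc/sys/net/ipv4/tcp_sack", &v)
       followed by a test of v != 0 tells whether SACK is enabled.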

       tcp_abc (Integer; default: 0; Linux 2.6.15 to Linux 3.8)
              Control the Appropriate Byte Count (ABC), defined in RFC 3465.
              ABC is a way of increasing the congestion window (cwnd) more
              slowly in response to partial acknowledgements. Possible
              values are:

              0   increase cwnd once per acknowledgement (no ABC)

              1   increase cwnd once per acknowledgement of a full-sized
                  segment

              2   allow increasing cwnd by two if the acknowledgement covers
                  two segments, to compensate for delayed acknowledgements.

       tcp_abort_on_overflow (Boolean; default: disabled; since Linux 2.4)
              Enable resetting connections if the listening service is too
              slow and unable to keep up and accept them. With the option
              disabled (the default), a connection that overflows the accept
              queue during a burst is not reset and can recover once the
              listener catches up. Enable this option only if you are really
              sure that the listening daemon cannot be tuned to accept
              connections faster. Enabling this option can harm the clients
              of your server.

       tcp_adv_win_scale (integer; default: 2; since Linux 2.4)
              Count buffering overhead as bytes/2^tcp_adv_win_scale if
              tcp_adv_win_scale is greater than 0, or as
              bytes-bytes/2^(-tcp_adv_win_scale) if tcp_adv_win_scale is
              less than or equal to zero.

              The socket receive buffer space is shared between the
              application and kernel. TCP maintains part of the buffer as the
              TCP window; this is the size of the receive window advertised
              to the other end. The rest of the space is used as the
              "application" buffer, used to isolate the network from
              scheduling and application latencies. The tcp_adv_win_scale
              default value of 2 implies that the space used for the
              application buffer is one fourth that of the total.
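
              The rule above can be sketched as follows. The function names
              are hypothetical, and integer shifts are used, so results
              round down. This models the documented formula rather than
              quoting kernel code.

```c
/* Bytes of a receive buffer counted as "application" overhead under the
   tcp_adv_win_scale rule:
     scale > 0:   overhead = bytes / 2^scale
     scale <= 0:  overhead = bytes - bytes / 2^(-scale)              */
static long adv_win_overhead(long bytes, int scale)
{
    if (scale > 0)
        return bytes >> scale;
    return bytes - (bytes >> -scale);
}

/* The remainder is what TCP may advertise as its receive window. */
static long adv_window(long bytes, int scale)
{
    return bytes - adv_win_overhead(bytes, scale);
}
```

              With the default scale of 2, an 87380-byte buffer (the
              tcp_rmem default described below) splits into 21845 bytes of
              overhead and a 65535-byte window.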

       tcp_allowed_congestion_control (String; default: see text; since Linux
       2.4.20)
              Show/set the congestion control algorithm choices available to
              unprivileged processes (see the description of the
              TCP_CONGESTION socket option). The items in the list are
              separated by white space and terminated by a newline
              character. The list is a subset of those listed in
              tcp_available_congestion_control. The default value for this
              list is "reno" plus the default setting of
              tcp_congestion_control.

       tcp_autocorking (Boolean; default: enabled; since Linux 3.14)
              If this option is enabled, the kernel tries to coalesce small
              writes (from consecutive write(2) and sendmsg(2) calls) as much
              as possible, in order to decrease the total number of sent
              packets. Coalescing is done if at least one prior packet for
              the flow is waiting in Qdisc queues or the device transmit
              queue. Applications can still use the TCP_CORK socket option to
              obtain optimal behavior when they know how/when to uncork their
              sockets.

       tcp_available_congestion_control (String; read-only; since Linux
       2.4.20)
              Show a list of the congestion-control algorithms that are
              registered. The items in the list are separated by white space
              and terminated by a newline character. This list is a limiting
              set for the list in tcp_allowed_congestion_control. More
              congestion-control algorithms may be available as modules, but
              not loaded.

       tcp_app_win (integer; default: 31; since Linux 2.4)
              This variable defines how many bytes of the TCP window are
              reserved for buffering overhead.

              max(window/2^tcp_app_win, mss) bytes in the window are reserved
              for the application buffer. A value of 0 implies that no amount
              is reserved.
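
              Below is a sketch of that formula. The function name is
              hypothetical, and the special case for 0 is handled
              explicitly. It mirrors the description above and is not the
              kernel's implementation.

```c
#include <stdint.h>

/* Bytes of the window reserved for the application buffer according to
   the tcp_app_win description: max(window / 2^app_win, mss), or nothing
   at all when app_win is 0. */
static uint32_t app_win_reserved(uint32_t window, unsigned app_win, uint32_t mss)
{
    if (app_win == 0)
        return 0;

    /* Guard the shift: shifting a 32-bit value by 32 or more is undefined. */
    uint32_t share = app_win >= 32 ? 0 : window >> app_win;
    return share > mss ? share : mss;
}
```

              With the default of 31, window >> 31 is 0 for any window that
              fits in 31 bits, so the reservation is effectively one MSS:
              app_win_reserved(65535, 31, 1460) returns 1460.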

       tcp_base_mss (Integer; default: 512; since Linux 2.6.17)
              The initial value of search_low to be used by the
              packetization layer Path MTU discovery (MTU probing). If MTU
              probing is enabled, this is the initial MSS used by the
              connection.

       tcp_bic (Boolean; default: disabled; Linux 2.4.27/2.6.6 to 2.6.13)
              Enable BIC TCP congestion control algorithm. BIC-TCP is a
              sender-side-only change that ensures a linear RTT fairness
              under large windows while offering both scalability and bounded
              TCP-friendliness. The protocol combines two schemes called
              additive increase and binary search increase. When the
              congestion window is large, additive increase with a large
              increment ensures linear RTT fairness as well as good
              scalability. Under small congestion windows, binary search
              increase provides TCP friendliness.

       tcp_bic_low_window (integer; default: 14; Linux 2.4.27/2.6.6 to 2.6.13)
              Set the threshold window (in packets) where BIC TCP starts to
              adjust the congestion window. Below this threshold BIC TCP
              behaves the same as the default TCP Reno.

       tcp_bic_fast_convergence (Boolean; default: enabled; Linux 2.4.27/2.6.6
       to 2.6.13)
              Force BIC TCP to respond more quickly to changes in congestion
              window. Allows two flows sharing the same connection to
              converge more rapidly.

       tcp_congestion_control (String; default: see text; since Linux 2.4.13)
              Set the default congestion-control algorithm to be used for new
              connections. The algorithm "reno" is always available, but
              additional choices may be available depending on kernel
              configuration. The default value for this file is set as part
              of kernel configuration.

       tcp_dma_copybreak (integer; default: 4096; since Linux 2.6.24)
              Lower limit, in bytes, of the size of socket reads that will be
              offloaded to a DMA copy engine, if one is present in the system
              and the kernel was configured with the CONFIG_NET_DMA option.

       tcp_dsack (Boolean; default: enabled; since Linux 2.4)
              Enable RFC 2883 TCP Duplicate SACK support.

       tcp_ecn (Integer; default: see below; since Linux 2.4)
              Enable RFC 3168 Explicit Congestion Notification.

              This file can have one of the following values:

              0   Disable ECN. Neither initiate nor accept ECN. This was the
                  default up to and including Linux 2.6.30.

              1   Enable ECN when requested by incoming connections and also
                  request ECN on outgoing connection attempts.

              2   Enable ECN when requested by incoming connections, but do
                  not request ECN on outgoing connections. This value is
                  supported, and is the default, since Linux 2.6.31.

              When enabled, connectivity to some destinations could be
              affected due to older, misbehaving middleboxes along the path,
              causing connections to be dropped. However, to facilitate and
              encourage deployment with option 1, and to work around such
              buggy equipment, the tcp_ecn_fallback option has been
              introduced.

       tcp_ecn_fallback (Boolean; default: enabled; since Linux 4.1)
              Enable the fallback described in RFC 3168, Section 6.1.1.1.
              When enabled, outgoing ECN-setup SYNs that time out within the
              normal SYN retransmission timeout will be resent with CWR and
              ECE cleared.

       tcp_fack (Boolean; default: enabled; since Linux 2.2)
              Enable TCP Forward Acknowledgement support.

       tcp_fin_timeout (integer; default: 60; since Linux 2.2)
              This specifies how many seconds to wait for a final FIN packet
              before the socket is forcibly closed. This is strictly a
              violation of the TCP specification, but required to prevent
              denial-of-service attacks. In Linux 2.2, the default value was
              180.

       tcp_frto (integer; default: see below; since Linux 2.4.21/2.6)
              Enable F-RTO, an enhanced recovery algorithm for TCP
              retransmission timeouts (RTOs). It is particularly beneficial
              in wireless environments where packet loss is typically due to
              random radio interference rather than intermediate router
              congestion. See RFC 4138 for more details.

              This file can have one of the following values:

              0   Disabled. This was the default up to and including Linux
                  2.6.23.

              1   The basic version of the F-RTO algorithm is enabled.

              2   Enable SACK-enhanced F-RTO if the flow uses SACK. The basic
                  version can also be used when SACK is in use, though in
                  that case there are scenarios in which F-RTO interacts
                  badly with the packet counting of the SACK-enabled TCP
                  flow. This value is the default since Linux 2.6.24.

              Before Linux 2.6.22, this parameter was a Boolean value,
              supporting just values 0 and 1 above.

       tcp_frto_response (integer; default: 0; since Linux 2.6.22)
              When F-RTO has detected that a TCP retransmission timeout was
              spurious (i.e., the timeout would have been avoided had TCP set
              a longer retransmission timeout), TCP has several options
              concerning what to do next. Possible values are:

              0   Rate halving based; a smooth and conservative response that
                  results in a halved congestion window (cwnd) and slow-start
                  threshold (ssthresh) after one RTT.

              1   Very conservative response; not recommended because, even
                  though it is valid, it interacts poorly with the rest of
                  Linux TCP; halves cwnd and ssthresh immediately.

              2   Aggressive response; undoes congestion-control measures
                  that are now known to be unnecessary (ignoring the
                  possibility of a lost retransmission that would require TCP
                  to be more cautious); cwnd and ssthresh are restored to the
                  values prior to timeout.

       tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
              The number of seconds between TCP keep-alive probes.

       tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
              The maximum number of TCP keep-alive probes to send before
              giving up and killing the connection if no response is
              obtained from the other end.

       tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
              The number of seconds a connection needs to be idle before TCP
              begins sending out keep-alive probes. Keep-alives are sent only
              when the SO_KEEPALIVE socket option is enabled. The default
              value is 7200 seconds (2 hours). An idle connection is
              terminated after approximately an additional 11 minutes (9
              probes at intervals of 75 seconds) when keep-alive is enabled.

              Note that underlying connection tracking mechanisms and
              application timeouts may be much shorter.
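
              The figure of roughly 11 minutes follows from the defaults: 9
              unanswered probes at 75-second intervals add 675 seconds to
              the 7200-second idle time. A small helper with a hypothetical
              name expresses that estimate. It ignores retransmission of
              pending data and any connection-tracking timeouts.

```c
/* Approximate time, in seconds, from the last activity until an idle
   connection to an unresponsive peer is dropped by keep-alive: the idle
   time before the first probe plus one interval per unanswered probe. */
static long keepalive_drop_after(long idle_s, long intvl_s, long probes)
{
    return idle_s + intvl_s * probes;
}
```

              keepalive_drop_after(7200, 75, 9) gives 7875 seconds, that is,
              2 hours, 11 minutes and 15 seconds.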

       tcp_low_latency (Boolean; default: disabled; since Linux 2.4.21/2.6;
       obsolete since Linux 4.14)
              If enabled, the TCP stack makes decisions that prefer lower
              latency as opposed to higher throughput. If this option is
              disabled, then higher throughput is preferred. An example of an
              application where this default should be changed would be a
              Beowulf compute cluster. Since Linux 4.14, this file still
              exists, but its value is ignored.

       tcp_max_orphans (integer; default: see below; since Linux 2.4)
              The maximum number of orphaned (not attached to any user file
              handle) TCP sockets allowed in the system. When this number is
              exceeded, the orphaned connection is reset and a warning is
              printed. This limit exists only to prevent simple
              denial-of-service attacks. Lowering this limit is not
              recommended. Network conditions might require you to increase
              the number of orphans allowed, but note that each orphan can
              consume up to ~64 kB of unswappable memory. The default initial
              value is set equal to the kernel parameter NR_FILE. This
              initial default is adjusted depending on the memory in the
              system.

       tcp_max_syn_backlog (integer; default: see below; since Linux 2.2)
              The maximum number of queued connection requests which have
              still not received an acknowledgement from the connecting
              client. If this number is exceeded, the kernel will begin
              dropping requests. The default value of 256 is increased to
              1024 when the memory present in the system is adequate or
              greater (>= 128 MB), and reduced to 128 for those systems with
              very low memory (<= 32 MB).

              Prior to Linux 2.6.20, it was recommended that if this needed
              to be increased above 1024, the size of the SYNACK hash table
              (TCP_SYNQ_HSIZE) in include/net/tcp.h should be modified to
              keep

                  TCP_SYNQ_HSIZE * 16 <= tcp_max_syn_backlog

              and the kernel should be recompiled. In Linux 2.6.20, the
              fixed-size TCP_SYNQ_HSIZE was removed in favor of dynamic
              sizing.

       tcp_max_tw_buckets (integer; default: see below; since Linux 2.4)
              The maximum number of sockets in TIME_WAIT state allowed in the
              system. This limit exists only to prevent simple
              denial-of-service attacks. The default value of NR_FILE*2 is
              adjusted depending on the memory in the system. If this number
              is exceeded, the socket is closed and a warning is printed.

       tcp_moderate_rcvbuf (Boolean; default: enabled; since Linux
       2.4.17/2.6.7)
              If enabled, TCP performs receive buffer auto-tuning, attempting
              to automatically size the buffer (no greater than tcp_rmem[2])
              to match the size required by the path for full throughput.

       tcp_mem (since Linux 2.4)
              This is a vector of 3 integers: [low, pressure, high]. These
              bounds, measured in units of the system page size, are used by
              TCP to track its memory usage. The defaults are calculated at
              boot time from the amount of available memory. (TCP can only
              use low memory for this, which is limited to around 900
              megabytes on 32-bit systems. 64-bit systems do not suffer this
              limitation.)

              low    TCP doesn't regulate its memory allocation when the
                     number of pages it has allocated globally is below this
                     number.

              pressure
                     When the amount of memory allocated by TCP exceeds this
                     number of pages, TCP moderates its memory consumption.
                     This memory pressure state is exited once the number of
                     pages allocated falls below the low mark.

              high   The maximum number of pages, globally, that TCP will
                     allocate. This value overrides any other limits imposed
                     by the kernel.
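
              tcp_mem is expressed in pages rather than bytes, so converting
              it needs the page size. The sketch below uses a hypothetical
              helper name. It parses the three values in the format of
              /proc/sys/net/ipv4/tcp_mem and scales them by
              sysconf(_SC_PAGESIZE).

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: parse "low pressure high" (in pages), as found in
   /proc/sys/net/ipv4/tcp_mem, and convert each value to bytes using the
   system page size. Returns 0 on success, -1 on parse or sysconf failure. */
static int tcp_mem_bytes(const char *text, long long bytes[3])
{
    long long pages[3];
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0 ||
        sscanf(text, "%lld %lld %lld", &pages[0], &pages[1], &pages[2]) != 3)
        return -1;

    for (int i = 0; i < 3; i++)
        bytes[i] = pages[i] * page_size;   /* pages -> bytes */
    return 0;
}
```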

       tcp_mtu_probing (integer; default: 0; since Linux 2.6.17)
              This parameter controls TCP Packetization-Layer Path MTU
              Discovery. The following values may be assigned to the file:

              0   Disabled

              1   Disabled by default, enabled when an ICMP black hole is
                  detected

              2   Always enabled, use initial MSS of tcp_base_mss.

       tcp_no_metrics_save (Boolean; default: disabled; since Linux 2.6.6)
              By default, TCP saves various connection metrics in the route
              cache when the connection closes, so that connections
              established in the near future can use these to set initial
              conditions. Usually, this increases overall performance, but it
              may sometimes cause performance degradation. If
              tcp_no_metrics_save is enabled, TCP will not cache metrics on
              closing connections.

       tcp_orphan_retries (integer; default: 8; since Linux 2.4)
              The maximum number of attempts made to probe the other end of a
              connection which has been closed by our end.

       tcp_reordering (integer; default: 3; since Linux 2.4)
              The maximum distance, in packets, by which a packet can be
              reordered in a TCP packet stream without TCP assuming packet
              loss and going into slow start. It is not advisable to change
              this number. This is a packet reordering detection metric
              designed to minimize unnecessary backoff and retransmits
              provoked by reordering of packets on a connection.

       tcp_retrans_collapse (Boolean; default: enabled; since Linux 2.2)
              Try to send full-sized packets during retransmit.

       tcp_retries1 (integer; default: 3; since Linux 2.2)
              The number of times TCP will attempt to retransmit a packet on
              an established connection normally, without the extra effort of
              getting the network layers involved. Once we exceed this number
              of retransmits, we first have the network layer update the
              route if possible before each new retransmit. The default is
              the RFC-specified minimum of 3.

       tcp_retries2 (integer; default: 15; since Linux 2.2)
              The maximum number of times a TCP packet is retransmitted in
              established state before giving up. The default value is 15,
              which corresponds to a duration of between approximately 13 and
              30 minutes, depending on the retransmission timeout. The
              RFC 1122 specified minimum limit of 100 seconds is typically
              deemed too short.

       tcp_rfc1337 (Boolean; default: disabled; since Linux 2.2)
              Enable TCP behavior conformant with RFC 1337. When disabled, if
              a RST is received in TIME_WAIT state, we close the socket
              immediately without waiting for the end of the TIME_WAIT
              period.

       tcp_rmem (since Linux 2.4)
              This is a vector of 3 integers: [min, default, max]. These
              parameters are used by TCP to regulate receive buffer sizes.
              TCP dynamically adjusts the size of the receive buffer from the
              defaults listed below, in the range of these values, depending
              on memory available in the system.

              min    minimum size of the receive buffer used by each TCP
                     socket. The default value is the system page size. (On
                     Linux 2.4, the default value is 4 kB, lowered to
                     PAGE_SIZE bytes in low-memory systems.) This value is
                     used to ensure that in memory pressure mode, allocations
                     below this size will still succeed. This is not used to
                     bound the size of the receive buffer declared using
                     SO_RCVBUF on a socket.

              default
                     the default size of the receive buffer for a TCP socket.
                     This value overwrites the initial default buffer size
                     from the generic global net.core.rmem_default defined
                     for all protocols. The default value is 87380 bytes. (On
                     Linux 2.4, this will be lowered to 43689 in low-memory
                     systems.) If larger receive buffer sizes are desired,
                     this value should be increased (to affect all sockets).
                     To employ large TCP windows, net.ipv4.tcp_window_scaling
                     must be enabled (default).

              max    the maximum size of the receive buffer used by each TCP
                     socket. This value does not override the global
                     net.core.rmem_max. This is not used to limit the size of
                     the receive buffer declared using SO_RCVBUF on a socket.
                     The default value is calculated using the formula

                         max(87380, min(4 MB, tcp_mem[1]*PAGE_SIZE/128))

                     (On Linux 2.4, the default is 87380*2 bytes, lowered to
                     87380 in low-memory systems).
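
              The default for max can be computed as in the sketch below,
              which uses a hypothetical helper name. The same expression,
              with a floor of 65536 instead of 87380, gives the tcp_wmem max
              default described under tcp_wmem. Here "4 MB" is assumed to
              mean 4 * 1024 * 1024 bytes.

```c
/* Default tcp_rmem[2] (floor_bytes = 87380) or tcp_wmem[2]
   (floor_bytes = 65536), following the documented formula
       max(floor, min(4 MB, tcp_mem[1] * PAGE_SIZE / 128))
   where tcp_mem_pressure is tcp_mem[1] in pages. */
static long long default_buf_max(long long floor_bytes,
                                 long long tcp_mem_pressure,
                                 long long page_size)
{
    const long long four_mb = 4LL * 1024 * 1024;   /* assumed meaning of 4 MB */
    long long scaled = tcp_mem_pressure * page_size / 128;
    long long capped = scaled < four_mb ? scaled : four_mb;

    return capped > floor_bytes ? capped : floor_bytes;
}
```

              For example, with tcp_mem[1] = 100000 pages of 4096 bytes the
              result is 3200000 bytes. Small systems fall back to the floor,
              and large ones are capped at 4194304.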

       tcp_sack (Boolean; default: enabled; since Linux 2.2)
              Enable RFC 2018 TCP Selective Acknowledgements.

       tcp_slow_start_after_idle (Boolean; default: enabled; since Linux
       2.6.18)
              If enabled, provide RFC 2861 behavior and time out the
              congestion window after an idle period. An idle period is
              defined as the current RTO (retransmission timeout). If
              disabled, the congestion window will not be timed out after an
              idle period.

       tcp_stdurg (Boolean; default: disabled; since Linux 2.2)
              If this option is enabled, then use the RFC 1122 interpretation
              of the TCP urgent-pointer field. According to this
              interpretation, the urgent pointer points to the last byte of
              urgent data. If this option is disabled, then use the
              BSD-compatible interpretation of the urgent pointer: the urgent
              pointer points to the first byte after the urgent data.
              Enabling this option may lead to interoperability problems.

       tcp_syn_retries (integer; default: 6; since Linux 2.2)
              The maximum number of times initial SYNs for an active TCP
              connection attempt will be retransmitted. This value should not
              be higher than 255. The default value is 6, which corresponds
              to retrying for up to approximately 127 seconds. Before
              Linux 3.7, the default value was 5, which (in conjunction with
              calculation based on other kernel parameters) corresponded to
              approximately 180 seconds.
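
              The 127-second figure can be reproduced by summing an
              exponentially backed-off timeout. The sketch below assumes an
              initial SYN timeout of 1 second that doubles on each retry, and
              a 120-second cap on any single timeout. These assumed values
              correspond to Linux's usual TCP_TIMEOUT_INIT and TCP_RTO_MAX.
              The helper name is hypothetical. Starting instead from 200 ms,
              the lowest retransmission timeout TCP normally uses, the same
              function gives about 15.4 minutes for the tcp_retries2 default
              of 15, within the range quoted for that parameter above.

```c
/* Total time, in milliseconds, before TCP gives up after `retries`
   retransmissions, assuming pure exponential backoff: the first timeout
   is initial_ms, each later one doubles, and none exceeds cap_ms.
   The timeout that follows the last retransmission is included, so
   retries + 1 timeouts are summed. */
static long long backoff_total_ms(long long initial_ms, long long cap_ms,
                                  int retries)
{
    long long rto = initial_ms < cap_ms ? initial_ms : cap_ms;
    long long total = 0;

    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto = (rto * 2 < cap_ms) ? rto * 2 : cap_ms;   /* double, then clamp */
    }
    return total;
}
```

              backoff_total_ms(1000, 120000, 6) returns 127000
              (1 + 2 + 4 + ... + 64 seconds). backoff_total_ms(200, 120000, 15)
              returns 924600.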

       tcp_synack_retries (integer; default: 5; since Linux 2.2)
              The maximum number of times a SYN/ACK segment for a passive TCP
              connection will be retransmitted. This number should not be
              higher than 255.

       tcp_syncookies (integer; default: 1; since Linux 2.2)
              Enable TCP syncookies. The kernel must be compiled with
              CONFIG_SYN_COOKIES. The syncookies feature attempts to protect
              a socket from a SYN flood attack. This should be used as a last
              resort, if at all. This is a violation of the TCP protocol, and
              conflicts with other areas of TCP such as TCP extensions. It
              can cause problems for clients and relays. It is not
              recommended as a tuning mechanism for heavily loaded servers to
              help with overloaded or misconfigured conditions. For
              recommended alternatives see tcp_max_syn_backlog,
              tcp_synack_retries, and tcp_abort_on_overflow. Set to one of
              the following values:

              0   Disable TCP syncookies.

              1   Send out syncookies when the SYN backlog queue of a socket
                  overflows.

              2   (since Linux 3.12) Send out syncookies unconditionally.
                  This can be useful for network testing.

       tcp_timestamps (integer; default: 1; since Linux 2.2)
              Set to one of the following values to enable or disable
              RFC 1323 TCP timestamps:

              0   Disable timestamps.

              1   Enable timestamps as defined in RFC 1323 and use a random
                  offset for each connection rather than only using the
                  current time.

              2   As for the value 1, but without random offsets. Setting
                  tcp_timestamps to this value is meaningful since Linux
                  4.10.

       tcp_tso_win_divisor (integer; default: 3; since Linux 2.6.9)
              This parameter controls what fraction of the congestion window
              (1/tcp_tso_win_divisor) can be consumed by a single TCP
              Segmentation Offload (TSO) frame. The setting of this parameter
              is a tradeoff between burstiness and building larger TSO
              frames.

       tcp_tw_recycle (Boolean; default: disabled; Linux 2.4 to 4.11)
              Enable fast recycling of TIME_WAIT sockets. Enabling this
              option is not recommended as the remote IP may not use
              monotonically increasing timestamps (devices behind NAT,
              devices with per-connection timestamp offsets). See RFC 1323
              (PAWS) and RFC 6191.

       tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6)
              Allow reusing TIME_WAIT sockets for new connections when it is
              safe from a protocol viewpoint. It should not be changed
              without advice/request of technical experts.

       tcp_vegas_cong_avoid (Boolean; default: disabled; Linux 2.2 to 2.6.13)
              Enable TCP Vegas congestion avoidance algorithm. TCP Vegas is a
              sender-side-only change to TCP that anticipates the onset of
              congestion by estimating the bandwidth. TCP Vegas adjusts the
              sending rate by modifying the congestion window. TCP Vegas
              should provide less packet loss, but it is not as aggressive as
              TCP Reno.

       tcp_westwood (Boolean; default: disabled; Linux 2.4.26/2.6.3 to 2.6.13)
              Enable TCP Westwood+ congestion control algorithm. TCP
              Westwood+ is a sender-side-only modification of the TCP Reno
              protocol stack that optimizes the performance of TCP congestion
              control. It is based on end-to-end bandwidth estimation to set
              the congestion window and slow start threshold after a
              congestion episode. Using this estimation, TCP Westwood+
              adaptively sets a slow start threshold and a congestion window
              which take into account the bandwidth used at the time
              congestion is experienced. TCP Westwood+ significantly
              increases fairness with respect to TCP Reno in wired networks
              and throughput over wireless links.

       tcp_window_scaling (Boolean; default: enabled; since Linux 2.2)
              Enable RFC 1323 TCP window scaling. This feature allows the use
              of a large window (> 64 kB) on a TCP connection, should the
              other end support it. Normally, the 16-bit window length field
              in the TCP header limits the window size to less than 64 kB. If
              larger windows are desired, applications can increase the size
              of their socket buffers and the window scaling option will be
              employed. If tcp_window_scaling is disabled, TCP will not
              negotiate the use of window scaling with the other end during
              connection setup.

       tcp_wmem (since Linux 2.4)
              This is a vector of 3 integers: [min, default, max]. These
              parameters are used by TCP to regulate send buffer sizes. TCP
              dynamically adjusts the size of the send buffer from the
              default values listed below, in the range of these values,
              depending on memory available.

              min    Minimum size of the send buffer used by each TCP socket.
                     The default value is the system page size. (On Linux
                     2.4, the default value is 4 kB.) This value is used to
                     ensure that in memory pressure mode, allocations below
                     this size will still succeed. This is not used to bound
                     the size of the send buffer declared using SO_SNDBUF on
                     a socket.

              default
                     The default size of the send buffer for a TCP socket.
                     This value overwrites the initial default buffer size
                     from the generic global /proc/sys/net/core/wmem_default
                     defined for all protocols. The default value is 16 kB.
                     If larger send buffer sizes are desired, this value
                     should be increased (to affect all sockets). To employ
                     large TCP windows, /proc/sys/net/ipv4/tcp_window_scaling
                     must be set to a nonzero value (default).

              max    The maximum size of the send buffer used by each TCP
                     socket. This value does not override the value in
                     /proc/sys/net/core/wmem_max. This is not used to limit
                     the size of the send buffer declared using SO_SNDBUF on
                     a socket. The default value is calculated using the
                     formula

                         max(65536, min(4 MB, tcp_mem[1]*PAGE_SIZE/128))

                     (On Linux 2.4, the default value is 128 kB, lowered to
                     64 kB on low-memory systems.)

       tcp_workaround_signed_windows (Boolean; default: disabled; since Linux
       2.6.26)
              If enabled, assume that the absence of a window-scaling option
              means that the remote TCP is broken and treat the window as a
              signed quantity. If disabled, assume that the remote TCP is not
              broken even if we do not receive a window scaling option from
              it.

   Socket options
       To set or get a TCP socket option, call getsockopt(2) to read or
       setsockopt(2) to write the option with the option level argument set
       to IPPROTO_TCP. Unless otherwise noted, optval is a pointer to an int.
       In addition, most IPPROTO_IP socket options are valid on TCP sockets.
       For more information see ip(7).

       Following is a list of TCP-specific socket options. For details of
       some other socket options that are also applicable for TCP sockets,
       see socket(7).

       TCP_CONGESTION (since Linux 2.6.13)
              The argument for this option is a string. This option allows
              the caller to set the TCP congestion control algorithm to be
              used, on a per-socket basis. Unprivileged processes are
              restricted to choosing one of the algorithms in
              tcp_allowed_congestion_control (described above). Privileged
              processes (CAP_NET_ADMIN) can choose from any of the available
              congestion-control algorithms (see the description of
              tcp_available_congestion_control above).
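
              Below is a minimal sketch of selecting an algorithm for one
              socket and reading the choice back. The helper name is
              hypothetical. It is exercised with "reno", which the
              description of tcp_congestion_control above says is always
              available. One byte of the caller's buffer is held in reserve
              so the name read back is always NUL-terminated.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: set the congestion-control algorithm for one
   socket, then read back the name the kernel actually uses.
   Returns 0 on success, -1 on error (e.g., algorithm not allowed). */
static int set_congestion(int fd, const char *algo, char *out, socklen_t outlen)
{
    if (outlen < 2)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, algo, strlen(algo)) < 0)
        return -1;

    socklen_t len = outlen - 1;   /* keep room for a terminating NUL */
    memset(out, 0, outlen);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, out, &len) < 0)
        return -1;
    return 0;
}
```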
651
652 TCP_CORK (since Linux 2.2)
653 If set, don't send out partial frames. All queued partial
654 frames are sent when the option is cleared again. This is use‐
655 ful for prepending headers before calling sendfile(2), or for
656 throughput optimization. As currently implemented, there is a
657 200 millisecond ceiling on the time for which output is corked
658 by TCP_CORK. If this ceiling is reached, then queued data is
659 automatically transmitted. This option can be combined with
660 TCP_NODELAY only since Linux 2.5.71. This option should not be
661 used in code intended to be portable.
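The header-plus-sendfile(2) pattern mentioned above can be sketched as follows. send_with_header() is a hypothetical helper; it assumes a connected, blocking socket and a regular file, and it does not retry short writes, which production code would need to handle.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send hdr followed by the first body_len bytes of file_fd on sock,
 * letting TCP_CORK merge them into full-sized segments.
 * Returns 0 on success, -1 on failure. */
int send_with_header(int sock, const void *hdr, size_t hdr_len,
                     int file_fd, size_t body_len)
{
    int on = 1, off = 0;

    /* Hold back partial frames while the response is assembled. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) < 0)
        return -1;

    if (write(sock, hdr, hdr_len) != (ssize_t)hdr_len)
        return -1;

    /* sendfile(2) advances offset by the number of bytes it sent. */
    off_t offset = 0;
    while ((size_t)offset < body_len) {
        ssize_t n = sendfile(sock, file_fd, &offset, body_len - (size_t)offset);
        if (n <= 0)          /* error, or the file is shorter than body_len */
            return -1;
    }

    /* Clearing TCP_CORK transmits whatever partial frame is still queued. */
    return setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
}
```

If the function returns early, the socket is left corked; the 200 millisecond ceiling described above still flushes the queued data eventually.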
662
663 TCP_DEFER_ACCEPT (since Linux 2.4)
664 Allow a listener to be awakened only when data arrives on the
665 socket. Takes an integer value (seconds); this can bound the
666 maximum number of attempts TCP will make to complete the connec‐
667 tion. This option should not be used in code intended to be
668 portable.
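Since the option applies to the listening socket, it is set before listen(2). The sketch below binds a loopback listener on a kernel-chosen port; deferred_listener() is a hypothetical helper and the backlog of 16 is an arbitrary choice.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a listener on 127.0.0.1 (kernel-chosen port) that is only woken
 * for connections that have already sent data, waiting up to timeout_s
 * seconds. Returns the listening fd, or -1 on error. */
int deferred_listener(int timeout_s)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                   &timeout_s, sizeof(timeout_s)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Reading the option back with getsockopt(2) may report a larger value than was set, because the kernel stores it as a number of retransmissions rather than as seconds.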
669
670 TCP_INFO (since Linux 2.4)
671 Used to collect information about this socket. The kernel re‐
672 turns a struct tcp_info as defined in the file /usr/in‐
673 clude/linux/tcp.h. This option should not be used in code in‐
674 tended to be portable.
675
676 TCP_KEEPCNT (since Linux 2.4)
677 The maximum number of keepalive probes TCP should send before
678 dropping the connection. This option should not be used in code
679 intended to be portable.
680
681 TCP_KEEPIDLE (since Linux 2.4)
682 The time (in seconds) the connection needs to remain idle before
683 TCP starts sending keepalive probes, if the socket option
684 SO_KEEPALIVE has been set on this socket. This option should
685 not be used in code intended to be portable.
686
687 TCP_KEEPINTVL (since Linux 2.4)
688 The time (in seconds) between individual keepalive probes. This
689 option should not be used in code intended to be portable.
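The three keepalive options only take effect once SO_KEEPALIVE is enabled, so they are usually set together. A minimal sketch; enable_keepalive() is a hypothetical helper, and the values in the example below are illustrative, not recommendations.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on fd: start probing after idle_s seconds without
 * traffic, probe every intvl_s seconds, and drop the connection after
 * cnt unanswered probes. Returns 0 on success, -1 with errno set. */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 60, 10, 5), a peer that silently disappears is detected roughly 60 + 5*10 = 110 seconds after the connection goes idle, unless TCP_USER_TIMEOUT is also set.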
690
691 TCP_LINGER2 (since Linux 2.4)
692 The lifetime of orphaned FIN_WAIT2 state sockets. This option
693 can be used to override the system-wide setting in the file
694 /proc/sys/net/ipv4/tcp_fin_timeout for this socket. This is not
695 to be confused with the socket(7) level option SO_LINGER. This
696 option should not be used in code intended to be portable.
697
698 TCP_MAXSEG
699 The maximum segment size for outgoing TCP packets. In Linux 2.2
700 and earlier, and in Linux 2.6.28 and later, if this option is
701 set before connection establishment, it also changes the MSS
702 value announced to the other end in the initial packet. Values
703 greater than the (eventual) interface MTU have no effect. TCP
704 will also impose its minimum and maximum bounds over the value
705 provided.
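Since the announced MSS comes from the value set before the connection is established (on the kernel versions listed above), the option belongs between socket(2) and connect(2). A minimal sketch; tcp_request_mss() is a hypothetical helper, and it reads the option back because the kernel may reject or adjust the requested value.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask for an MSS of mss bytes on a socket that is not yet connected and
 * store the value the kernel reports back in *effective. Returns 0 on
 * success, -1 with errno set (e.g. EINVAL if mss is out of range). */
int tcp_request_mss(int fd, int mss, int *effective)
{
    socklen_t len = sizeof(*effective);

    if (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss)) < 0)
        return -1;
    return getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, effective, &len);
}
```

On current Linux kernels, reading TCP_MAXSEG before the connection is established returns the requested value; afterwards it reports the MSS actually in use.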
706
707 TCP_NODELAY
708 If set, disable the Nagle algorithm. This means that segments
709 are always sent as soon as possible, even if there is only a
710 small amount of data. When not set, data is buffered until
711 there is a sufficient amount to send out, thereby avoiding the
712 frequent sending of small packets, which results in poor uti‐
713 lization of the network. This option is overridden by TCP_CORK;
714 however, setting this option forces an explicit flush of pending
715 output, even if TCP_CORK is currently set.
716
717 TCP_QUICKACK (since Linux 2.4.4)
718 Enable quickack mode if set or disable quickack mode if cleared.
719 In quickack mode, acks are sent immediately, rather than delayed
720 if needed in accordance with normal TCP operation. This flag is
721 not permanent; it only enables a switch to or from quickack
722 mode. Subsequent operation of the TCP protocol will once again
723 enter/leave quickack mode depending on internal protocol pro‐
724 cessing and factors such as delayed ack timeouts occurring and
725 data transfer. This option should not be used in code intended
726 to be portable.
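Because the flag is not permanent, applications that want prompt ACKs commonly set it again after every read. A minimal sketch of that pattern; recv_quickack() is a hypothetical wrapper, and re-arming the flag this way makes immediate ACKs more likely but is not a guarantee.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* recv(2) wrapper that re-enables quickack mode after each successful
 * read, since the stack may leave quickack mode again on its own. */
ssize_t recv_quickack(int fd, void *buf, size_t len, int flags)
{
    ssize_t n = recv(fd, buf, len, flags);

    if (n > 0) {
        int one = 1;
        /* Best effort: a failure here does not affect the data read. */
        (void)setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
    }
    return n;
}
```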
727
728 TCP_SYNCNT (since Linux 2.4)
729 Set the number of SYN retransmits that TCP should send before
730 aborting the attempt to connect. It cannot exceed 255. This
731 option should not be used in code intended to be portable.
732
733 TCP_USER_TIMEOUT (since Linux 2.6.37)
734 This option takes an unsigned int as an argument. When the
735 value is greater than 0, it specifies the maximum amount of time
736 in milliseconds that transmitted data may remain unacknowledged,
737 or buffered data may remain untransmitted (due to zero window
738 size) before TCP will forcibly close the corresponding connec‐
739 tion and return ETIMEDOUT to the application. If the option
740 value is specified as 0, TCP will use the system default.
741
742 Increasing user timeouts allows a TCP connection to survive ex‐
743 tended periods without end-to-end connectivity. Decreasing user
744 timeouts allows applications to "fail fast", if so desired.
745 Otherwise, failure may take up to 20 minutes with the current
746 system defaults in a normal WAN environment.
747
748 This option can be set during any state of a TCP connection, but
749 is effective only during the synchronized states of a connection
750 (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, and
751 LAST-ACK). Moreover, when used with the TCP keepalive
752 (SO_KEEPALIVE) option, TCP_USER_TIMEOUT will override keepalive
753 to determine when to close a connection due to keepalive fail‐
754 ure.
755
756 The option has no effect on when TCP retransmits a packet, nor
757 when a keepalive probe is sent.
758
759 This option, like many others, will be inherited by the socket
760 returned by accept(2), if it was set on the listening socket.
761
762 Further details on the user timeout feature can be found in
763 RFC 793 and RFC 5482 ("TCP User Timeout Option").
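A minimal sketch of setting the option; set_user_timeout() is a hypothetical helper. Note that the argument is an unsigned int in milliseconds, unlike the keepalive options, which use seconds. The fallback definition of TCP_USER_TIMEOUT (18, the value in <linux/tcp.h>) is only there for older C library headers.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18   /* value from <linux/tcp.h>; older libc headers lack it */
#endif

/* Limit how long transmitted data may stay unacknowledged before the
 * connection is closed with ETIMEDOUT. timeout_ms is in milliseconds;
 * 0 restores the system default. Returns 0, or -1 with errno set. */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

Because the option is inherited through accept(2), calling set_user_timeout() on a listening socket applies the same limit to every connection it accepts.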
764
765 TCP_WINDOW_CLAMP (since Linux 2.4)
766 Bound the size of the advertised window to this value. The ker‐
767 nel imposes a minimum size of SOCK_MIN_RCVBUF/2. This option
768 should not be used in code intended to be portable.
769
770 Sockets API
771 TCP provides limited support for out-of-band data, in the form of (a
772 single byte of) urgent data. In Linux this means that if the other end
773 sends newer out-of-band data, the older urgent data is inserted as
774 normal data into the stream (even when SO_OOBINLINE is not set). This
775 differs from BSD-based stacks.
776
777 Linux uses the BSD-compatible interpretation of the urgent pointer
778 field by default. This violates RFC 1122, but is required for interop‐
779 erability with other stacks. It can be changed via
780 /proc/sys/net/ipv4/tcp_stdurg.
781
782 It is possible to peek at out-of-band data using the recv(2) MSG_PEEK
783 flag.
784
785 Since version 2.4, Linux supports the use of MSG_TRUNC in the flags ar‐
786 gument of recv(2) (and recvmsg(2)). This flag causes the received
787 bytes of data to be discarded, rather than passed back in a caller-sup‐
788 plied buffer. Since Linux 2.4.4, MSG_TRUNC also has this effect when
789 used in conjunction with MSG_OOB to receive out-of-band data.
790
791 Ioctls
792 The following ioctl(2) calls return information in value. The correct
793 syntax is:
794
795 int value;
796 error = ioctl(tcp_socket, ioctl_type, &value);
797
798 ioctl_type is one of the following:
799
800 SIOCINQ
801 Returns the amount of queued unread data in the receive buffer.
802 The socket must not be in LISTEN state, otherwise an error (EIN‐
803 VAL) is returned. SIOCINQ is defined in <linux/sockios.h>. Al‐
804 ternatively, you can use the synonymous FIONREAD, defined in
805 <sys/ioctl.h>.
806
807 SIOCATMARK
808 Returns true (i.e., value is nonzero) if the inbound data stream
809 is at the urgent mark.
810
811 If the SO_OOBINLINE socket option is set, and SIOCATMARK returns
812 true, then the next read from the socket will return the urgent
813 data. If the SO_OOBINLINE socket option is not set, and
814 SIOCATMARK returns true, then the next read from the socket will
815 return the bytes following the urgent data (reading the urgent
816 data itself requires the recv(2) MSG_OOB flag).
817
818 Note that a read never reads across the urgent mark. If an ap‐
819 plication is informed of the presence of urgent data via se‐
820 lect(2) (using the exceptfds argument) or through delivery of a
821 SIGURG signal, then it can advance up to the mark using a loop
822 which repeatedly tests SIOCATMARK and performs a read (request‐
823 ing any number of bytes) as long as SIOCATMARK returns false.
824
825 SIOCOUTQ
826 Returns the amount of unacknowledged data in the socket send queue. The
827 socket must not be in LISTEN state, otherwise an error (EINVAL)
828 is returned. SIOCOUTQ is defined in <linux/sockios.h>. Alter‐
829 natively, you can use the synonymous TIOCOUTQ, defined in
830 <sys/ioctl.h>.
831
832 Error handling
833 When a network error occurs, TCP tries to resend the packet. If it
834 doesn't succeed after some time, either ETIMEDOUT or the last received
835 error on this connection is reported.
836
837 Some applications require a quicker error notification. This can be
838 enabled with the IPPROTO_IP level IP_RECVERR socket option. When this
839 option is enabled, all incoming errors are immediately passed to the
840 user program. Use this option with care — it makes TCP less tolerant
841 to routing changes and other normal network conditions.
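IP_RECVERR is set at the IPPROTO_IP level even on a TCP socket. A minimal sketch; enable_fast_errors() is a hypothetical name.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Enable IP_RECVERR on a TCP socket so that network errors are passed
 * to the application immediately instead of only after TCP gives up.
 * Note the IPPROTO_IP level. Returns 0, or -1 with errno set. */
int enable_fast_errors(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}
```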
842
843 ERRORS
844 EAFNOSUPPORT
845 Passed socket address type in sin_family was not AF_INET.
846
847 EPIPE The other end closed the socket unexpectedly or a write is
848 attempted on a socket that has been shut down for writing.
849
850 ETIMEDOUT
851 The other end didn't acknowledge retransmitted data after some
852 time.
853
854 Any errors defined for ip(7) or the generic socket layer may also be
855 returned for TCP.
856
857 VERSIONS
858 Support for Explicit Congestion Notification, zero-copy sendfile(2),
859 reordering support and some SACK extensions (DSACK) were introduced in
860 2.4. Support for forward acknowledgement (FACK), TIME_WAIT recycling,
861 and per-connection keepalive socket options were introduced in 2.3.
862
864 Not all errors are documented.
865
866 IPv6 is not described.
867
868 SEE ALSO
869 accept(2), bind(2), connect(2), getsockopt(2), listen(2), recvmsg(2),
870 sendfile(2), sendmsg(2), socket(2), ip(7), socket(7)
871
872 The kernel source file Documentation/networking/ip-sysctl.txt.
873
874 RFC 793 for the TCP specification.
875 RFC 1122 for the TCP requirements and a description of the Nagle algo‐
876 rithm.
877 RFC 1323 for TCP timestamp and window scaling options.
878 RFC 1337 for a description of TIME_WAIT assassination hazards.
879 RFC 3168 for a description of Explicit Congestion Notification.
880 RFC 2581 for TCP congestion control algorithms.
881 RFC 2018 and RFC 2883 for SACK and extensions to SACK.
882
883 COLOPHON
884 This page is part of release 5.13 of the Linux man-pages project. A
885 description of the project, information about reporting bugs, and the
886 latest version of this page, can be found at
887 https://www.kernel.org/doc/man-pages/.
888
889
890
891Linux 2021-03-22 TCP(7)