1tcp(7P) Protocols tcp(7P)
2
3
4
6 tcp, TCP - Internet Transmission Control Protocol
7
9 #include <sys/socket.h>
10
11
12 #include <netinet/in.h>
13
14
15 s = socket(AF_INET, SOCK_STREAM, 0);
16
17
18 s = socket(AF_INET6, SOCK_STREAM, 0);
19
20
21 t = t_open("/dev/tcp", O_RDWR);
22
23
24 t = t_open("/dev/tcp6", O_RDWR);
25
26
28 TCP is the virtual circuit protocol of the Internet protocol family. It
29 provides reliable, flow-controlled, in order, two-way transmission of
30 data. It is a byte-stream protocol layered above the Internet Protocol
31 (IP), or the Internet Protocol Version 6 (IPv6), the Internet protocol
32 family's internetwork datagram delivery protocol.
33
34
35 Programs can access TCP using the socket interface as a SOCK_STREAM
36 socket type, or using the Transport Level Interface (TLI) where it sup‐
37 ports the connection-oriented (T_COTS_ORD) service type.
38
39
40 TCP uses IP's host-level addressing and adds its own per-host collec‐
41 tion of "port addresses." The endpoints of a TCP connection are identi‐
42 fied by the combination of an IP or IPv6 address and a TCP port number.
43 Although other protocols, such as the User Datagram Protocol (UDP), may
44 use the same host and port address format, the port space of these pro‐
45 tocols is distinct. See inet(7P) and inet6(7P) for details on the com‐
46 mon aspects of addressing in the Internet protocol family.
47
48
49 Sockets utilizing TCP are either "active" or "passive." Active sockets
50 initiate connections to passive sockets. Both types of sockets must
51 have their local IP or IPv6 address and TCP port number bound with the
52 bind(3SOCKET) system call after the socket is created. By default, TCP
53 sockets are active. A passive socket is created by calling the lis‐
54 ten(3SOCKET) system call after binding the socket with bind(). This
55 establishes a queueing parameter for the passive socket. After this,
56 connections to the passive socket can be received with the
57 accept(3SOCKET) system call. Active sockets use the connect(3SOCKET)
58 call after binding to initiate connections.
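
The passive and active roles described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names make_listener() and tcp_loopback_pair() are hypothetical, and the loopback address with a system-chosen port is assumed only so that both ends can live in one process.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a passive socket bound to 127.0.0.1 on a system-chosen port.
 * On success returns the descriptor and stores the bound address in *addr. */
static int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int ls = socket(AF_INET, SOCK_STREAM, 0);

    if (ls < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                 /* let the system choose a port */
    if (bind(ls, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(ls, 5) < 0 ||            /* 5 is the queueing parameter */
        getsockname(ls, (struct sockaddr *)addr, &len) < 0) {
        close(ls);
        return -1;
    }
    return ls;
}

/* Connect an active socket to a passive one over loopback.
 * Returns 0 and fills fds[0] (active end) and fds[1] (accepted end). */
int tcp_loopback_pair(int fds[2])
{
    struct sockaddr_in addr;
    int ls = make_listener(&addr);

    if (ls < 0)
        return -1;
    fds[0] = socket(AF_INET, SOCK_STREAM, 0);
    if (fds[0] < 0) {
        close(ls);
        return -1;
    }
    if (connect(fds[0], (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fds[0]);
        close(ls);
        return -1;
    }
    fds[1] = accept(ls, NULL, NULL);
    close(ls);                          /* the listener is no longer needed */
    if (fds[1] < 0) {
        close(fds[0]);
        return -1;
    }
    return 0;
}
```

A caller would typically keep the passive socket open and call accept() in a loop; it is closed here only to keep the sketch short.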
59
60
61 By using the special value INADDR_ANY with IP, or the unspecified
62 address (all zeroes) with IPv6, the local IP address can be left
63 unspecified in the bind() call by either active or passive TCP sockets.
64 This feature is usually used if the local address is either unknown or
65 irrelevant. If left unspecified, the local IP or IPv6 address will be
66 bound at connection time to the address of the network interface used
67 to service the connection.
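
As an illustration of the paragraph above, the following sketch binds an active socket to INADDR_ANY and then inspects the local address after connect(). The function name wildcard_bind_local_addr() is hypothetical, and a loopback listener is assumed so the example is self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind an active socket to INADDR_ANY, connect it to a loopback listener,
 * and return the local address (host byte order) in effect after the
 * connection is made, or 0 on failure. */
unsigned long wildcard_bind_local_addr(void)
{
    struct sockaddr_in srv, cli, local;
    socklen_t len = sizeof(srv);
    unsigned long result = 0;
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    int cs = socket(AF_INET, SOCK_STREAM, 0);

    if (ls < 0 || cs < 0)
        goto out;

    /* Passive side: 127.0.0.1, system-chosen port. */
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(ls, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(ls, 1) < 0 ||
        getsockname(ls, (struct sockaddr *)&srv, &len) < 0)
        goto out;

    /* Active side: leave the local address unspecified. */
    memset(&cli, 0, sizeof(cli));
    cli.sin_family = AF_INET;
    cli.sin_addr.s_addr = htonl(INADDR_ANY);
    cli.sin_port = 0;
    if (bind(cs, (struct sockaddr *)&cli, sizeof(cli)) < 0 ||
        connect(cs, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    /* The local address is now that of the interface in use. */
    len = sizeof(local);
    if (getsockname(cs, (struct sockaddr *)&local, &len) == 0)
        result = ntohl(local.sin_addr.s_addr);
out:
    if (ls >= 0)
        close(ls);
    if (cs >= 0)
        close(cs);
    return result;
}
```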
68
69
70 Note that no two TCP sockets can be bound to the same port unless the
71 bound IP addresses are different. IPv4 INADDR_ANY and IPv6 unspecified
72 addresses compare as equal to any IPv4 or IPv6 address. For example, if
73 a socket is bound to INADDR_ANY or unspecified address and port X, no
74 other socket can bind to port X, regardless of the binding address.
75 This special consideration of INADDR_ANY and unspecified address can be
76 changed using the socket option SO_REUSEADDR. If SO_REUSEADDR is set on
77 a socket doing a bind, IPv4 INADDR_ANY and IPv6 unspecified address do
78 not compare as equal to any IP address. This means that as long as the
79 two sockets are not both bound to INADDR_ANY/unspecified address or the
80 same IP address, the two sockets can be bound to the same port.
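
A minimal sketch of setting SO_REUSEADDR before bind() follows. The helper name bind_reuseaddr() is hypothetical; the option is set before bind() on the assumption that it must be in place when the address check is made.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, set SO_REUSEADDR, then bind it to addr:port
 * (both in host byte order). Returns the descriptor, or -1 on error. */
int bind_reuseaddr(in_addr_t addr, in_port_t port)
{
    struct sockaddr_in sin;
    int on = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(addr);
    sin.sin_port = htons(port);
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

For example, bind_reuseaddr(INADDR_ANY, 8080) and a second socket bound to a specific interface address on port 8080 would not both be bound to the wildcard, which is the case the paragraph above permits. The port number 8080 is only an example.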
81
82
83 If an application does not want to allow another socket using the
84 SO_REUSEADDR option to bind to a port its socket is bound to, the
85 application can set the socket level option SO_EXCLBIND on a socket.
86 The option values 0 and 1 disable and enable the option,
87 respectively. Once this option is enabled on a socket, no other socket
88 can be bound to the same port.
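
The following sketch shows one way an application might request exclusive binding. It assumes that SO_EXCLBIND is made visible by <sys/socket.h> on systems that support it and that a non-zero value enables the option, as is conventional for boolean socket options. The helper name set_exclbind() and the ENOPROTOOPT fallback on other systems are choices of this example.

```c
#include <errno.h>
#include <sys/socket.h>

/* Request exclusive binding on s before calling bind().
 * Returns 0 on success, -1 with errno set on failure. */
int set_exclbind(int s)
{
#ifdef SO_EXCLBIND
    int on = 1;     /* assumption: non-zero enables the option */

    return setsockopt(s, SOL_SOCKET, SO_EXCLBIND, &on, sizeof(on));
#else
    /* Option not available on this system (fallback chosen for the sketch). */
    (void)s;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```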
89
90
91 Once a connection has been established, data can be exchanged using the
92 read(2) and write(2) system calls.
93
94
95 Under most circumstances, TCP sends data when it is presented. When
96 outstanding data has not yet been acknowledged, TCP gathers small
97 amounts of output to be sent in a single packet once an acknowledgement
98 has been received. For a small number of clients, such as window sys‐
99 tems that send a stream of mouse events which receive no replies, this
100 packetization may cause significant delays. To circumvent this problem,
101 TCP provides a socket-level boolean option, TCP_NODELAY. TCP_NODELAY is
102 defined in <netinet/tcp.h>, and is set with setsockopt(3SOCKET) and
103 tested with getsockopt(3SOCKET). The option level for the setsockopt()
104 call is the protocol number for TCP, available from getprotoby‐
105 name(3SOCKET).
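
Setting and testing TCP_NODELAY might look like the sketch below. The helper names are hypothetical. The fallback to IPPROTO_TCP when the protocols database cannot be read is an assumption of this example, not something the text above prescribes.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>

/* Look up the option level for TCP, falling back to IPPROTO_TCP if the
 * protocols database is unavailable. */
static int tcp_level(void)
{
    struct protoent *pe = getprotobyname("tcp");

    return pe != NULL ? pe->p_proto : IPPROTO_TCP;
}

/* Turn TCP_NODELAY on (enable != 0) or off. Returns 0 on success. */
int set_nodelay(int s, int enable)
{
    int val = enable ? 1 : 0;

    return setsockopt(s, tcp_level(), TCP_NODELAY, &val, sizeof(val));
}

/* Return 1 if TCP_NODELAY is set, 0 if it is clear, -1 on error. */
int get_nodelay(int s)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(s, tcp_level(), TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```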
106
107
108 For some applications, it may be desirable for TCP not to send out data
109 unless a full TCP segment can be sent. To enable this behavior, an
110 application can use the TCP_CORK socket option. When TCP_CORK is set
111 with a non-zero value, TCP sends out a full TCP segment only. When
112 TCP_CORK is set to zero after it has been enabled, all buffered data is
113 sent out (as permitted by the peer's receive window and the current
114 congestion window). TCP_CORK is defined in <netinet/tcp.h>, and is set
115 with setsockopt(3SOCKET) and tested with getsockopt(3SOCKET). The
116 option level for the setsockopt() call is the protocol number for
117 TCP, available from getprotobyname(3SOCKET).
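
A typical use of TCP_CORK is to write several small pieces and release them together. The sketch below assumes TCP_CORK is defined by <netinet/tcp.h> on the build system and uses IPPROTO_TCP directly as the option level. The helper names set_cork() and write_corked() are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Set (on != 0) or clear TCP_CORK. Returns 0 on success, -1 on failure
 * or if the option is not available at compile time. */
int set_cork(int s, int on)
{
#ifdef TCP_CORK
    int val = on ? 1 : 0;

    return setsockopt(s, IPPROTO_TCP, TCP_CORK, &val, sizeof(val));
#else
    (void)s;
    (void)on;
    return -1;
#endif
}

/* Write a header and a body while corked, then uncork so that any
 * remaining buffered data is sent. Returns 0 on success. */
int write_corked(int s, const void *hdr, size_t hlen,
                 const void *body, size_t blen)
{
    int rc = -1;

    if (set_cork(s, 1) < 0)
        return -1;
    if (write(s, hdr, hlen) == (ssize_t)hlen &&
        write(s, body, blen) == (ssize_t)blen)
        rc = 0;
    if (set_cork(s, 0) < 0)     /* uncork even if a write failed */
        rc = -1;
    return rc;
}
```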
118
119
120 The SO_RCVBUF socket level option can be used to control the window
121 that TCP advertises to the peer. IP level options may also be used with
122 TCP. See ip(7P) and ip6(7P).
123
124
130 TCP provides an urgent data mechanism, which may be invoked using the
131 out-of-band provisions of send(3SOCKET). The caller may mark one byte
132 as "urgent" with the MSG_OOB flag to send(3SOCKET). This sets an
133 "urgent pointer" pointing to this byte in the TCP stream. The receiver
134 on the other side of the stream is notified of the urgent data by a
135 SIGURG signal. The SIOCATMARK ioctl(2) request returns a value indicat‐
136 ing whether the stream is at the urgent mark. Because the system never
137 returns data across the urgent mark in a single read(2) call, it is
138 possible to advance to the urgent data in a simple loop which reads
139 data, testing the socket with the SIOCATMARK ioctl() request, until it
140 reaches the mark.
141
142
143 Incoming connection requests that include an IP source route option are
144 noted, and the reverse source route is used in responding.
145
146
147 A checksum over all data helps TCP implement reliability. Using a win‐
148 dow-based flow control mechanism that makes use of positive acknowl‐
149 edgements, sequence numbers, and a retransmission strategy, TCP can
150 usually recover when datagrams are damaged, delayed, duplicated or
151 delivered out of order by the underlying communication medium.
152
153
154 If the local TCP receives no acknowledgements from its peer for a
155 period of time (for example, if the remote machine crashes), the con‐
156 nection is closed and an error is returned.
157
158
159 TCP follows the congestion control algorithm described in RFC 2581, and
160 also supports the initial congestion window (cwnd) changes in RFC 3390.
161 The initial cwnd calculation can be overridden by the socket option
162 TCP_INIT_CWND. An application can use this option to set the initial
163 cwnd to a specified number of TCP segments. This applies to the cases
164 when the connection first starts and restarts after an idle period.
165 The process must have the PRIV_SYS_NET_CONFIG privilege if it wants to
166 specify a number greater than that calculated by RFC 3390.
167
168
169 SunOS supports TCP Extensions for High Performance (RFC 1323) which
170 includes the window scale and time stamp options, and Protection
171 Against Wrap Around Sequence Numbers (PAWS). SunOS also supports Selec‐
172 tive Acknowledgment (SACK) capabilities (RFC 2018) and Explicit Conges‐
173 tion Notification (ECN) mechanism (RFC 3168).
174
175
176 Turn on the window scale option in one of the following ways:
177
178 o An application can set SO_SNDBUF or SO_RCVBUF size in the
179 setsockopt() option to be larger than 64K. This must be done
180 before the program calls listen() or connect(), because the
181 window scale option is negotiated when the connection is
182 established. Once the connection has been made, it is too
183 late to increase the send or receive window beyond the
184 default TCP limit of 64K.
185
186 o For all applications, use ndd(1M) to modify the configura‐
187 tion parameter tcp_wscale_always. If tcp_wscale_always is
188 set to 1, the window scale option will always be set when
189 connecting to a remote system. If tcp_wscale_always is 0,
190 the window scale option will be set only if the user has
191 requested a send or receive window larger than 64K. The
192 default value of tcp_wscale_always is 1.
193
194 o Regardless of the value of tcp_wscale_always, the window
195 scale option will always be included in a connect acknowl‐
196 edgement if the connecting system has used the option.
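
For the first method above, the buffer size has to be requested before listen() or connect(). A minimal sketch, with hypothetical helper names, might look like this. The system may round or cap the requested value, so the size actually in effect is read back rather than assumed.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket and request a receive buffer of `bytes` before any
 * listen() or connect() call, so that the request can be taken into
 * account when the window scale option is negotiated.
 * Returns the descriptor, or -1 on error. */
int socket_with_rcvbuf(int bytes)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Return the receive buffer size currently in effect, or -1 on error. */
int current_rcvbuf(int s)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}
```

For example, socket_with_rcvbuf(128 * 1024) requests a buffer larger than 64K; the caller would then go on to bind() and connect() or listen() as usual.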
197
198
199 Turn on SACK capabilities in the following way:
200
201 o Use ndd to modify the configuration parameter tcp_sack_per‐
202 mitted. If tcp_sack_permitted is set to 0, TCP will not
203 accept SACK or send out SACK information. If tcp_sack_per‐
204 mitted is set to 1, TCP will not initiate a connection with
205 SACK permitted option in the SYN segment, but will respond
206 with SACK permitted option in the SYN|ACK segment if an
207 incoming connection request has the SACK permitted option.
208 This means that TCP will only accept SACK information if the
209 other side of the connection also accepts SACK information.
210 If tcp_sack_permitted is set to 2, it will both initiate and
211 accept connections with SACK information. The default for
212 tcp_sack_permitted is 2 (active enabled).
213
214
215 Turn on TCP ECN mechanism in the following way:
216
217 o Use ndd to modify the configuration parameter tcp_ecn_per‐
218 mitted. If tcp_ecn_permitted is set to 0, TCP will not nego‐
219 tiate with a peer that supports ECN mechanism. If
220 tcp_ecn_permitted is set to 1 when initiating a connection,
221 TCP will not tell a peer that it supports ECN mechanism.
222 However, it will tell a peer that it supports ECN mechanism
223 when accepting a new incoming connection request if the peer
224 indicates that it supports ECN mechanism in the SYN segment.
225 If tcp_ecn_permitted is set to 2, in addition to negotiating
226 with a peer on ECN mechanism when accepting connections, TCP
227 will indicate in the outgoing SYN segment that it supports
228 ECN mechanism when TCP makes active outgoing connections.
229 The default for tcp_ecn_permitted is 1.
230
231
232 Turn on the time stamp option in the following way:
233
234 o Use ndd to modify the configuration parameter
235 tcp_tstamp_always. If tcp_tstamp_always is 1, the time stamp
236 option will always be set when connecting to a remote
237 machine. If tcp_tstamp_always is 0, the timestamp option
238 will not be set when connecting to a remote system. The
239 default for tcp_tstamp_always is 0.
240
241 o Regardless of the value of tcp_tstamp_always, the time stamp
242 option will always be included in a connect acknowledgement
243 (and all succeeding packets) if the connecting system has
244 used the time stamp option.
245
246
247 Use the following procedure to turn on the time stamp option only when
248 the window scale option is in effect:
249
250 o Use ndd to modify the configuration parameter
251 tcp_tstamp_if_wscale. Setting tcp_tstamp_if_wscale to 1 will
252 cause the time stamp option to be set when connecting to a
253 remote system, if the window scale option has been set. If
254 tcp_tstamp_if_wscale is 0, the time stamp option will not be
255 set when connecting to a remote system. The default for
256 tcp_tstamp_if_wscale is 1.
257
258
259 Protection Against Wrap Around Sequence Numbers (PAWS) is always used
260 when the time stamp option is set.
261
262
263 SunOS also supports multiple methods of generating initial sequence
264 numbers. One of these methods is the improved technique suggested in
265 RFC 1948. We HIGHLY recommend that you set sequence number generation
266 parameters as close to boot time as possible. This prevents sequence
267 number problems on connections that use the same connection-ID as ones
268 that used a different sequence number generation. The svc:/network/ini‐
269 tial:default service configures the initial sequence number generation.
270 The service reads the value contained in the configuration file
271 /etc/default/inetinit to determine which method to use.
272
273
274 The /etc/default/inetinit file is an unstable interface, and may change
275 in future releases.
276
277
278 TCP may be configured to report some information on connections that
279 terminate by means of an RST packet. By default, no logging is done. If
280 the ndd(1M) parameter tcp_trace is set to 1, then trace data is col‐
281 lected for all new connections established after that time.
282
283
284 The trace data consists of the TCP headers and IP source and destina‐
285 tion addresses of the last few packets sent in each direction before
286 RST occurred. Those packets are logged in a series of strlog(9F) calls.
287 This trace facility has a very low overhead, and so is superior to such
288 utilities as snoop(1M) for non-intrusive debugging for connections ter‐
289 minating by means of an RST.
290
291
292 SunOS supports the keep-alive mechanism described in RFC 1122. It is
293 enabled using the socket option SO_KEEPALIVE. When enabled, the first
294 keep-alive probe is sent out after a TCP is idle for two hours. If the
295 peer does not respond to the probe within eight minutes, the TCP con‐
296 nection is aborted. You can alter the interval for sending out the
297 first probe using the socket option TCP_KEEPALIVE_THRESHOLD. The option
298 value is an unsigned integer in milliseconds. The system default is
299 controlled by the TCP ndd parameter tcp_keepalive_interval. The minimum
300 value is ten seconds. The maximum is ten days, while the default is two
301 hours. The TCP_KEEPALIVE_ABORT_THRESHOLD socket option changes how long
302 TCP waits for a response to the probe before it aborts the
303 connection. The option value is an unsigned
304 integer in milliseconds. The value zero indicates that TCP should never
305 time out and abort the connection when probing. The system default is
306 controlled by the TCP ndd parameter tcp_keepalive_abort_interval. The
307 default is eight minutes.
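
The keep-alive options above might be combined as in the following sketch. The helper name enable_keepalive() is hypothetical. The two threshold options are assumed to be defined by <netinet/tcp.h> where they are supported and are skipped at compile time elsewhere; IPPROTO_TCP is used directly as the option level.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keep-alive probes on s. Where the threshold options are
 * available, also set the idle time before the first probe and the
 * abort threshold, both in milliseconds. Returns 0 on success. */
int enable_keepalive(int s, unsigned int idle_ms, unsigned int abort_ms)
{
    int on = 1;

    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
#ifdef TCP_KEEPALIVE_THRESHOLD
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPALIVE_THRESHOLD,
                   &idle_ms, sizeof(idle_ms)) < 0)
        return -1;
#else
    (void)idle_ms;      /* option not available on this system */
#endif
#ifdef TCP_KEEPALIVE_ABORT_THRESHOLD
    if (setsockopt(s, IPPROTO_TCP, TCP_KEEPALIVE_ABORT_THRESHOLD,
                   &abort_ms, sizeof(abort_ms)) < 0)
        return -1;
#else
    (void)abort_ms;     /* option not available on this system */
#endif
    return 0;
}
```

For example, enable_keepalive(s, 60000, 120000) asks for a first probe after one minute of idle time and an abort after two minutes without a response; both figures are illustrative only.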
308
310 svcs(1), ndd(1M), ioctl(2), read(2), svcadm(1M), write(2),
311 accept(3SOCKET), bind(3SOCKET), connect(3SOCKET), getprotoby‐
312 name(3SOCKET), getsockopt(3SOCKET), listen(3SOCKET), send(3SOCKET),
313 smf(5), inet(7P), inet6(7P), ip(7P), ip6(7P)
314
315
316 Ramakrishnan, K., Floyd, S., Black, D., RFC 3168, The Addition of
317 Explicit Congestion Notification (ECN) to IP, September 2001.
318
319
320 Mathis, M. and Mahdavi, J., Pittsburgh Supercomputing Center; Floyd, S.,
321 Lawrence Berkeley National Laboratory; Romanow, A., Sun Microsystems,
322 Inc. RFC 2018, TCP Selective Acknowledgment Options, October 1996.
323
324
325 Bellovin, S., RFC 1948, Defending Against Sequence Number Attacks, May
326 1996.
327
328
329 Jacobson, V., Braden, R., and Borman, D., RFC 1323, TCP Extensions for
330 High Performance, May 1992.
331
332
333 Postel, Jon, RFC 793, Transmission Control Protocol - DARPA Internet
334 Program Protocol Specification, Network Information Center, SRI Inter‐
335 national, Menlo Park, CA., September 1981.
336
338 A socket operation may fail if:
339
340 EISCONN A connect() operation was attempted on a socket on
341 which a connect() operation had already been per‐
342 formed.
343
344
345 ETIMEDOUT A connection was dropped due to excessive retransmis‐
346 sions.
347
348
349 ECONNRESET The remote peer forced the connection to be closed
350 (usually because the remote machine has lost state
351 information about the connection due to a crash).
352
353
354 ECONNREFUSED The remote peer actively refused connection establish‐
355 ment (usually because no process is listening to the
356 port).
357
358
359 EADDRINUSE A bind() operation was attempted on a socket with a
360 network address/port pair that has already been bound
361 to another socket.
362
363
364 EADDRNOTAVAIL A bind() operation was attempted on a socket with a
365 network address for which no network interface exists.
366
367
368 EACCES A bind() operation was attempted with a "reserved"
369 port number and the effective user ID of the process
370 was not the privileged user.
371
372
373 ENOBUFS The system ran out of memory for internal data struc‐
374 tures.
375
376
378 The tcp service is managed by the service management facility, smf(5),
379 under the service identifier:
380
381 svc:/network/initial:default
382
383
384
385
386 Administrative actions on this service, such as enabling, disabling, or
387 requesting restart, can be performed using svcadm(1M). The service's
388 status can be queried using the svcs(1) command.
389
390
391
392SunOS 5.11 30 June 2006 tcp(7P)