globus_ftp_extensions(3Version)                globus_ftp_extensions(3Version)

globus_ftp_extensions - GridFTP: Protocol Extensions to FTP for the Grid.

This section defines extensions to the FTP specification STD 9, RFC
959, FILE TRANSFER PROTOCOL (FTP) (October 1985). These extensions
provide striped data transfer, parallel data transfer, extended data
transfer, data buffer size configuration, and data channel
authentication.

The following new commands are introduced in this specification:

• Striped Passive (SPAS)
• Striped Data Port (SPOR)
• Extended Retrieve (ERET)
• Extended Store (ESTO)
• Set Data Buffer Size (SBUF)
• Data Channel Authentication Mode (DCAU)

A new transfer mode (extended block mode) is introduced for parallel
and striped data transfers. Also, a set of extension options to RETR
is added to control striped data layout and parallelism.

The following new feature names are to be included in the FTP server's
response to FEAT if it implements the following sets of functionality:

PARALLEL
    The server supports SPOR, SPAS, the RETR options mentioned above,
    and extended block mode.

ESTO
    The server implements the ESTO command as described in this
    document.

ERET
    The server implements the ERET command as described in this
    document.

SBUF
    The server implements the SBUF command as described in this
    document.

DCAU
    The server implements the DCAU command as described in this
    document, including the requirement that data channels are
    authenticated by default if RFC 2228 authentication is used to
    establish the control channel.

Parallel transfer
    From a single data server, splitting file data for transfer over
    multiple data connections.

Striped transfer
    Distributing a file's data over multiple independent data nodes,
    and transferring over multiple data connections.

Data Node
    In a striped data transfer, a data node is one of the stripe
    destinations returned in the response to the SPAS command, or one
    of the stripe destinations sent in the SPOR command.

DTP
    The data transfer process establishes and manages the data
    connection. The DTP can be passive or active.

PI
    The protocol interpreter. The user and server sides of the protocol
    have distinct roles implemented in a user-PI and a server-PI.

• RFC 959, FILE TRANSFER PROTOCOL (FTP), J. Postel, J. Reynolds
  (October 1985)

  • Commands used by GridFTP

    • USER
    • PASS
    • ACCT
    • CWD
    • CDUP
    • QUIT
    • REIN
    • PORT
    • PASV
    • TYPE
    • MODE
    • RETR
    • STOR
    • STOU
    • APPE
    • ALLO
    • REST
    • RNFR
    • RNTO
    • ABOR
    • DELE
    • RMD
    • MKD
    • PWD
    • LIST
    • NLST
    • SITE
    • SYST
    • STAT
    • HELP
    • NOOP

  • Features used by GridFTP

    • ASCII and Image types
    • Stream mode
    • File structure

• RFC 2228, FTP Security Extensions, Horowitz, M. and S. Lunt (October
  1997)

  • Commands used by GridFTP

    • AUTH
    • ADAT
    • MIC
    • CONF
    • ENC

  • Features used by GridFTP

    • GSSAPI authentication

• RFC 2389, Feature negotiation mechanism for the File Transfer
  Protocol, P. Hethmon, R. Elz (August 1998)

  • Commands used by GridFTP

    • FEAT
    • OPTS

  • Features used by GridFTP

• FTP Extensions, R. Elz, P. Hethmon (September 2000)

  • Commands used by GridFTP

    • SIZE

  • Features used by GridFTP

    • Restart of a stream mode transfer

Striped Passive (SPAS)

This extension is used to establish a vector of data socket listeners
for a server with one or more stripes. This command MUST be used in
conjunction with the extended block mode. The response to this command
includes a list of host and port addresses the server is listening on.

Due to the nature of the extended block mode protocol, SPAS must be
used in conjunction with data transfer commands which receive data
(such as STOR, ESTO, or APPE) and cannot be used with commands which
send data on the data channels.

Syntax

The syntax of the SPAS command is:

    spas = "SPAS" <CRLF>

Responses

The server-PI will respond to the SPAS command with a 229 reply giving
the list of host-port strings for the remote server-DTP or user-DTP to
connect to.

    spas-response = "229-Entering Striped Passive Mode" CRLF
                    1*(<SP> host-port CRLF)
                    "229 End" CRLF

Where the command is correctly parsed, but the server-DTP cannot
process the SPAS request, it must return the same error responses as
the PASV command.

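As an illustration of the reply format above, the following Python sketch parses a 229 reply into (host, port) pairs. The function name is hypothetical, and the sketch assumes each host-port uses the comma-separated h1,h2,h3,h4,p1,p2 form that RFC 959 defines for PORT and PASV; this section does not spell that form out.

```python
def parse_spas_reply(reply: str) -> list[tuple[str, int]]:
    """Parse a 229 Striped Passive reply into (host, port) pairs."""
    lines = reply.replace("\r\n", "\n").strip("\n").split("\n")
    if not lines[0].startswith("229-"):
        raise ValueError("not a 229 multi-line reply")
    nodes = []
    for line in lines[1:]:
        if line.startswith("229 "):  # final line of the multi-line reply
            break
        # Assumed host-port form: h1,h2,h3,h4,p1,p2 (as in RFC 959).
        fields = [int(field) for field in line.strip().split(",")]
        if len(fields) != 6 or not all(0 <= f <= 255 for f in fields):
            raise ValueError(f"bad host-port: {line!r}")
        host = ".".join(str(f) for f in fields[:4])
        port = fields[4] * 256 + fields[5]
        nodes.append((host, port))
    if not nodes:
        raise ValueError("reply lists no host-port entries")
    return nodes


reply = (
    "229-Entering Striped Passive Mode\r\n"
    " 192,168,0,1,19,137\r\n"
    " 192,168,0,2,19,138\r\n"
    "229 End\r\n"
)
nodes = parse_spas_reply(reply)
```

Each returned pair identifies one data node (stripe) that the sending side may connect to.
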
OPTS for SPAS

There are no options in this SPAS specification, and hence there is no
OPTS command defined.

Striped Data Port (SPOR)

This extension is to be used as a complement to the SPAS command to
implement striped third-party transfers. This command MUST always be
used in conjunction with the extended block mode. The argument to SPOR
is a vector of host/TCP listener port pairs to which the server is to
connect.

Due to the nature of the extended block mode protocol, SPOR must be
used in conjunction with data transfer commands which send data (such
as RETR, ERET, LIST, or NLST) and cannot be used with commands which
receive data on the data channels.

Syntax

The syntax of the SPOR command is:

    SPOR 1*(<SP> <host-port>) <CRLF>

The host-port sequence in the command structure MUST match the
host-port replies to a SPAS command.

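A minimal sketch of building a SPOR command from a list of data nodes follows. The helper name is hypothetical, and the h1,h2,h3,h4,p1,p2 encoding of each host-port is assumed from RFC 959 rather than stated in this section.

```python
def format_spor(nodes: list[tuple[str, int]]) -> str:
    """Build a SPOR command line from (IPv4 host, port) pairs."""
    if not nodes:
        raise ValueError("SPOR needs at least one host-port")
    parts = []
    for host, port in nodes:
        octets = host.split(".")
        if len(octets) != 4 or not 0 <= port <= 65535:
            raise ValueError(f"bad node: {host}:{port}")
        # Assumed host-port form: h1,h2,h3,h4,p1,p2 (as in RFC 959).
        parts.append(",".join(octets + [str(port // 256), str(port % 256)]))
    return "SPOR " + " ".join(parts) + "\r\n"


command = format_spor([("192.168.0.1", 5001), ("192.168.0.2", 5002)])
```

In a third-party transfer the list passed here would be the nodes taken, in the same order, from the receiving server's SPAS reply.
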
Responses

The server-PI will respond to the SPOR command with the same response
set as the PORT command described in the FTP specification.

OPTS for SPOR

There are no options in this SPOR specification, and hence there is no
OPTS command defined.

Extended Retrieve (ERET)

The extended retrieve extension is used to request that a retrieve be
done with some additional processing on the server. This command is an
extensible way of providing server-side data reduction or other
modifications to the RETR command. This command is used in place of
OPTS to the RETR command to allow server-side processing to be done
with a single round trip (one command sent to the server instead of
two) for latency-critical applications.

ERET may be used with either the data transports defined in RFC 959 or
extended block mode as defined in this document. Using an ERET creates
a new virtual file which will be sent, with its own size and byte range
starting at zero. Restart markers generated while processing an ERET
are relative to the beginning of this view of the file.

Syntax

The syntax of the ERET command is:

    ERET <SP> <retrieve-mode> <SP> <filename>

    retrieve-mode ::= P <SP> <offset> <SP> <size>
    offset        ::= 64 bit integer
    size          ::= 64 bit integer

The retrieve-mode defines the behavior of the extended retrieve. There
is one mode defined by this specification, but other general purpose or
application-specific ones may be added later.

Extended Retrieve Modes

Partial Retrieve Mode (P)
    A section of the file will be retrieved from the data server. The
    section is defined by the starting offset and extent size
    parameters. When used with extended block mode, the extended block
    headers sent along with the data use an offset of 0 to mean the
    beginning of the section of the file which was requested.

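The partial retrieve syntax above can be sketched as follows. Both function names are hypothetical; the trailing CRLF is the usual FTP command terminator from RFC 959, and the offsets are assumed to be unsigned 64-bit values.

```python
def format_eret_partial(filename: str, offset: int, size: int) -> str:
    """Build an ERET command using Partial Retrieve Mode (P)."""
    for name, value in (("offset", offset), ("size", size)):
        if not 0 <= value < 2**64:  # assumed unsigned 64-bit range
            raise ValueError(f"{name} does not fit in 64 bits")
    return f"ERET P {offset} {size} {filename}\r\n"


def source_file_offset(section_offset: int, block_offset: int) -> int:
    """Map a block offset within the requested section back to the source file."""
    return section_offset + block_offset


command = format_eret_partial("data.bin", 1024, 4096)
```

Because the section is presented as a new virtual file, a block whose header carries offset 0 holds the bytes that start at `offset` in the source file.
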
Extended Store (ESTO)

The extended store extension is used to request that a store be done
with some additional processing on the server. Arbitrary data
processing algorithms may be added by defining additional ESTO
store-modes. Similar to ERET, the ESTO command expects the data sent to
satisfy the request to be sent as if it were a new file, with data
block offset 0 being the beginning of the new file.

The format of the ESTO command is:

    ESTO <SP> <store-mode> <SP> <filename>

    store-mode ::= A <SP> <offset>

The store-mode defines the behavior of the extended store. There is one
mode defined by this specification, but others may be added later.

Extended Store Modes

Adjusted store (A)
    The data in the file is to be stored with offset added to the file
    pointer before storing the blocks of the file. In extended block
    mode, this value is added to the offset in the extended block
    header by the server when writing to disk. Extended block headers
    should therefore send the beginning of the byte range on the data
    channel with offset of zero. In stream mode, the offset is added to
    the implicit offset of 0 for the beginning of the data before
    writing. If a stream mode restart marker is used in conjunction
    with this ESTO mode, the restart marker's offset is added to the
    offset passed as the parameter to the adjusted store.

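The offset arithmetic of the adjusted store can be sketched as below. The names are hypothetical, and the command formatter assumes a space separates the store-mode from the filename.

```python
def format_esto_adjusted(filename: str, offset: int) -> str:
    """Build an ESTO command using Adjusted store (A)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"ESTO A {offset} {filename}\r\n"


def esto_write_offset(adjust: int, block_offset: int = 0,
                      restart_offset: int = 0) -> int:
    """Return the file position at which the server writes incoming data.

    block_offset   -- offset from the extended block header (0 in stream mode)
    restart_offset -- offset from a stream mode restart marker, if one was used
    """
    return adjust + block_offset + restart_offset


command = format_esto_adjusted("data.bin", 1000)
```

For example, a block with header offset 64 sent after `ESTO A 1000 data.bin` would land at byte 1064 of the destination file.
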
Set Data Buffer Size (SBUF)

This extension lets a client set the TCP buffer size for subsequent
data connections to a given value. This replaces the server-specific
commands SITE RBUFSIZE, SITE RETRBUFSIZE, SITE RBUFSZ, SITE SBUFSIZE,
SITE SBUFSZ, and SITE BUFSIZE. Clients may wish to consider supporting
these other commands to ensure wider compatibility.

Syntax

The syntax of the SBUF command is:

    sbuf = SBUF <SP> <buffer-size>

    buffer-size ::= <number>

The buffer-size value is the TCP buffer size in bytes. The TCP window
size should be set accordingly by the server.

Response Codes

If the server-PI is able to set the buffer size state to the requested
buffer-size, then it will return a 200 reply.

Note
    Even if the SBUF is accepted by the server, an error may occur
    later when the data connections are actually created, depending on
    the TCP implementation of the server or client operating system.

Data Channel Authentication Mode (DCAU)

This extension provides a method for specifying the type of
authentication to be performed on FTP data channels. This extension may
only be used when the control connection was authenticated using RFC
2228 security extensions.

The format of the DCAU command is:

    DCAU <SP> <authentication-mode> <CRLF>

    authentication-mode ::= <no-authentication>
                          | <authenticate-with-self>
                          | <authenticate-with-subject>

    no-authentication         ::= N
    authenticate-with-self    ::= A
    authenticate-with-subject ::= S <subject-name>

    subject-name ::= string

Authentication Modes

• No authentication (N)
  No authentication handshake will be done upon data connection
  establishment.

• Self authentication (A)
  A security-protocol specific authentication will be used on the
  data channel. The identity of the remote data connection will be
  the same as the identity of the user which authenticated to the
  control connection.

• Subject-name authentication (S)
  A security-protocol specific authentication will be used on the
  data channel. The identity of the remote data connection MUST
  match the supplied subject-name string.

The default data channel authentication mode is A for FTP sessions
which are RFC 2228 authenticated. A client that does not implement data
channel authentication must explicitly send a DCAU N message to disable
it.

If the security handshake fails, the server should return the error
response 432 (Data channel authentication failed).

Extended Block Mode

The striped and parallel data transfer methods described above require
an extended transfer mode to support out-of-sequence data delivery and
partial data transmission per data connection. The extended block mode
described here extends the block mode header to provide support for
these, as well as for large blocks and end-of-data synchronization.

Clients indicate that they want to use extended block mode by sending
the command

    MODE <SP> E <CRLF>

on the control channel before a transfer command is sent.

The structure of the extended block header is:

    Extended Block Header

    +----------------+-------/-----------+------/------------+
    | Descriptor     |    Byte Count     |   Offset Count    |
    |         8 bits |      64 bits      |      64 bits      |
    +----------------+-------/-----------+------/------------+

The descriptor codes are indicated by bit flags in the descriptor byte.
Six codes have been assigned, where each code number is the decimal
value of the corresponding bit in the byte.

    Code    Meaning

    128     End of data block is EOR (Legacy)
     64     End of data block is EOF
     32     Suspected errors in data block
     16     Data block is a restart marker
      8     End of data block is EOD for a parallel/striped transfer
      4     Sender will close the data connection

With this encoding, more than one descriptor coded condition may exist
for a particular block. As many bits as necessary may be flagged.

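The 17-byte header layout can be sketched with Python's `struct` module. This is an illustration under one stated assumption: the fields are packed in network (big-endian) byte order, which this section does not say explicitly. All names are hypothetical.

```python
import struct

# Descriptor bit flags from the table above.
EOR, EOF, ERROR, RESTART, EOD, CLOSE = 128, 64, 32, 16, 8, 4

# Assumption: big-endian, no padding.
# B = 8-bit descriptor, Q = 64-bit byte count, Q = 64-bit offset.
HEADER = struct.Struct(">BQQ")


def pack_header(descriptor: int, byte_count: int, offset: int) -> bytes:
    """Serialize one extended block header."""
    return HEADER.pack(descriptor, byte_count, offset)


def unpack_header(data: bytes) -> tuple[int, int, int]:
    """Return (descriptor, byte_count, offset) from the first 17 bytes."""
    return HEADER.unpack(data[:HEADER.size])


def flag_names(descriptor: int) -> list[str]:
    """List the names of all flags set in a descriptor byte."""
    names = {EOR: "EOR", EOF: "EOF", ERROR: "ERROR",
             RESTART: "RESTART", EOD: "EOD", CLOSE: "CLOSE"}
    return [name for bit, name in names.items() if descriptor & bit]


# A zero-length final block that signals EOD and closes the connection.
last_block = pack_header(EOD | CLOSE, 0, 0)
```

Because the flags are independent bits, a single header can combine conditions, as the final block in the example does.
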
Some additional protocol is added to the extended block mode data
channels to properly handle end-of-file detection in the presence of
an unknown number of data streams.

• When no more data is to be sent on the data channel, the sender
  will mark the last block, or send a zero-length block after the last
  block, with the EOD bit (8) set in the extended block header.

• After receiving an EOD, the data connection can be cached for use in
  a subsequent transfer. To signify that the data connection will be
  closed, the sender sets the close bit (4) in the header on the last
  message sent.

• The sender communicates end of file by sending an EOF message to all
  servers receiving data. The EOF message format follows.

    Extended Block EOF Header

    +----------------+-------/--------+------/---------------+
    | Descriptor     |     unused     |  EOD count expected  |
    |         8 bits |     64 bits    |       64 bits        |
    +----------------+-------/--------+------/---------------+

EOF Descriptor. The EOF header descriptor has the same definition as
the regular data message header described above.

EOD Count Expected. This 64 bit field represents the total number of
data connections that will be established with the server receiving the
file. This number is used by the receiver to determine whether it has
received all of the data. When the number of EOD messages received
equals the number represented by the 'EOD Count Expected' field, the
receiver has hit end of file.

Simply waiting for EOD on all open data connections is not sufficient.
It is possible that the receiver reads an EOD message on all of its
open data connections while an additional data connection is in flight.
If the receiver were to assume it had reached end of file, it would
fail to receive the data on the in-flight connection.

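The end-of-file rule above amounts to a small piece of receiver-side bookkeeping, sketched here. The class and method names are hypothetical; the sketch only counts headers and does not read from real sockets.

```python
EOF_FLAG, EOD_FLAG = 64, 8  # descriptor bits from the table above


class EofTracker:
    """Track EOD messages against the count announced in the EOF message."""

    def __init__(self) -> None:
        self.eods_seen = 0
        self.eods_expected = None  # unknown until the EOF message arrives

    def on_header(self, descriptor: int, third_field: int) -> None:
        """Record one header; third_field is the offset or EOD-count field."""
        if descriptor & EOF_FLAG:
            self.eods_expected = third_field
        if descriptor & EOD_FLAG:
            self.eods_seen += 1

    def complete(self) -> bool:
        """True once the EOF message has arrived and every EOD is in."""
        return (self.eods_expected is not None
                and self.eods_seen == self.eods_expected)


tracker = EofTracker()
tracker.on_header(EOD_FLAG, 0)       # first connection finishes
early = tracker.complete()           # EOF not seen yet, so not complete
tracker.on_header(EOF_FLAG, 3)       # sender announces three connections
tracker.on_header(EOD_FLAG, 0)
tracker.on_header(EOD_FLAG, 0)       # third EOD, possibly from a late connection
done = tracker.complete()
```

The tracker refuses to report completion before the EOF message arrives, which is what protects against the in-flight connection case.
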
To handle EOF in the multi-striped server case, a 126 response has been
introduced. When receiving data from a striped server, a client makes a
control connection to a single host, but several hosts may create
several data connections back to the client. Each host can
independently decide how many data connections it will use, but only a
single EOF message may be sent back to the client. It must therefore
be possible to aggregate the total number of data connections used in
the transfer across the stripes. The 126 response serves this purpose.

The 126 is an intermediate response to the RETR command. It has the
following format:

    126 <SP> 1*(count of data connections)

Several 'count of data connections' values can be in a single reply.
They correspond to the stripes returned in the response to the SPAS
command.

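A sketch of aggregating the per-stripe counts from a 126 reply follows. The function name is hypothetical, and the sketch assumes the counts are separated by spaces, which the format line above does not state.

```python
def parse_126(line: str) -> tuple[list[int], int]:
    """Return (per-stripe connection counts, total) from a 126 reply."""
    code, _, rest = line.strip().partition(" ")
    if code != "126":
        raise ValueError("not a 126 reply")
    counts = [int(token) for token in rest.split()]  # assumed space-separated
    if not counts:
        raise ValueError("126 reply carries no counts")
    return counts, sum(counts)


counts, total = parse_126("126 4 4 2\r\n")
```

The total is the number of EOD messages the receiving client should expect across all stripes.
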
Discussion of a protocol change to enable bidirectional data channels
brought up the following problem:

If the client is passive and is sending to a multi-stripe server, then
the server creates the data connections. Since the client did not issue
SPAS, it cannot associate host/port pairs on the data connections with
stripes on the server (it does not even know how many there are), so it
cannot reliably determine which nodes to send data to. (This becomes
even more complex in the third-party transfer case, because the sender
may have multiple stripes of data.) The basic problem is that logical
stripe numbers are needed to know where to send the data.

EOF Handling in Extended Block Mode

In either striped or parallel mode, exactly one EOF is sent on each
SPAS-specified port (stripe). Hosts in extended block mode must be
prepared to accept an arbitrary number of connections on each SPOR
port before the EOF block is sent.

Restarting

In general, opaque restart markers passed via the block header should
not be used in extended block mode. Instead, the destination server
should send extended data marker responses over the control connection,
in the following form:

    extended-mark-response = "111" <SP> "Range Marker" <SP> <byte-ranges-list>

    byte-ranges-list = <byte-range> [ *("," <byte-range>) ]
    byte-range       = <start-offset> "-" <end-offset>

    start-offset ::= <number>
    end-offset   ::= <number>

The byte ranges in the marker are an incremental set of byte ranges
which have been stored to disk by the data server. The complete restart
marker is a concatenation of all byte ranges received by the client in
111 responses.

The client MAY combine adjacent ranges received over several range
responses into any number of ranges when sending the REST command to
the server to restart a transfer.

For example, the client, on receiving the responses:

    111 Range Marker 0-29
    111 Range Marker 30-89

may send, equivalently,

    REST 0-29,30-89
    REST 0-89
    REST 30-59,0-29,60-89

to restart the transfer after those 90 bytes have been received.

The server MAY indicate that a given range of data has been received in
multiple subsequent range markers. The client MUST be able to handle
this. For example:

    111 Range Marker 30-59
    111 Range Marker 0-89

is equivalent to

    111 Range Marker 30-59
    111 Range Marker 0-29,60-89

Similarly, the client, if it is doing no processing of the restart
markers, MAY send redundant information in a restart.

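One way a client might collapse the markers it has received into a compact REST command is sketched below. The function names are hypothetical. Following the examples above, where 0-29 plus 30-89 covers 90 bytes, both ends of a range are treated as inclusive.

```python
def parse_range_marker(line: str) -> list[tuple[int, int]]:
    """Extract (start, end) pairs from one 111 Range Marker response."""
    prefix = "111 Range Marker "
    if not line.startswith(prefix):
        raise ValueError("not a 111 Range Marker response")
    ranges = []
    for item in line[len(prefix):].strip().split(","):
        start, _, end = item.partition("-")
        ranges.append((int(start), int(end)))
    return ranges


def merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent inclusive ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def format_rest(ranges: list[tuple[int, int]]) -> str:
    """Build a REST command from all ranges seen so far."""
    body = ",".join(f"{start}-{end}" for start, end in merge_ranges(ranges))
    return f"REST {body}\r\n"


seen = []
for marker in ("111 Range Marker 0-29", "111 Range Marker 30-89"):
    seen.extend(parse_range_marker(marker))
command = format_rest(seen)
```

Merging also absorbs ranges that the server reports more than once, so redundant markers need no special handling.
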
Should these be allowed as restart markers for stream mode?

Performance Monitoring

In order to monitor the performance of an extended block mode transfer,
an additional preliminary reply MAY be transmitted over the control
channel. This reply is of the form:

    extended-perf-response = "112-Perf Marker" CRLF
                             <SP> "Timestamp:" <SP> <timestamp> CRLF
                             <SP> "Stripe Index:" <SP> <stripe-number> CRLF
                             <SP> "Stripe Bytes Transferred:" <SP> <byte count> CRLF
                             <SP> "Total Stripe Count:" <SP> <stripe count> CRLF
                             "112 End" CRLF

    timestamp = <number> [ "." <digit> ]

<timestamp> is seconds since the epoch.

The performance marker can contain these or any other perf-line facts
which provide useful information about the current performance.

All perf-line facts represent an instantaneous state of the transfer at
the given timestamp. The meanings of the facts are:

• Timestamp - The time at which the server computed the performance
  information. This is in seconds since the epoch (00:00:00 UTC,
  January 1, 1970).

• Stripe Index - The index (0-number of stripes on the STOR side of the
  transfer) of the stripe which this marker pertains to.

• Stripe Bytes Transferred - The number of bytes which have been
  received on this stripe.

A transfer start time can be specified by a perf marker with 'Stripe
Bytes Transferred' set to zero. Only the first marker per stripe can be
used to specify the start time of that stripe. Any subsequent markers
with 'Stripe Bytes Transferred' set to zero simply indicate that no
data was transferred over the interval.

A server should send a 'start' marker for each stripe. A server should
also send a final perf marker for each stripe. This is a marker with
'Stripe Bytes Transferred' set to the total transfer size for that
stripe.

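A client could turn two markers for the same stripe into a throughput estimate roughly as follows. The function names are hypothetical, and unknown perf-line facts are kept as plain strings because the marker may carry additional facts.

```python
def parse_perf_marker(reply: str) -> dict[str, str]:
    """Parse a 112 Perf Marker reply into a {fact name: value} mapping."""
    lines = reply.replace("\r\n", "\n").strip("\n").split("\n")
    if lines[0].strip() != "112-Perf Marker" or lines[-1].strip() != "112 End":
        raise ValueError("not a 112 perf marker reply")
    facts = {}
    for line in lines[1:-1]:
        name, sep, value = line.strip().partition(":")
        if not sep:
            raise ValueError(f"bad perf-line: {line!r}")
        facts[name.strip()] = value.strip()
    return facts


def throughput(earlier: dict[str, str], later: dict[str, str]) -> float:
    """Bytes per second on one stripe between two markers."""
    seconds = float(later["Timestamp"]) - float(earlier["Timestamp"])
    moved = (int(later["Stripe Bytes Transferred"])
             - int(earlier["Stripe Bytes Transferred"]))
    return moved / seconds


def marker(timestamp: str, transferred: int) -> str:
    """Build a sample reply for the demonstration below."""
    return ("112-Perf Marker\r\n"
            f" Timestamp: {timestamp}\r\n"
            " Stripe Index: 0\r\n"
            f" Stripe Bytes Transferred: {transferred}\r\n"
            " Total Stripe Count: 2\r\n"
            "112 End\r\n")


start = parse_perf_marker(marker("1000000000.0", 0))
later = parse_perf_marker(marker("1000000002.0", 1048576))
rate = throughput(start, later)
```

The first marker here plays the role of the 'start' marker, with zero bytes transferred.
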
The options described in this section provide a means to convey
striping and transfer parallelism information to the server-DTP. For
the RETR command, the Client-FTP may specify a parallelism and striping
mode it wishes the server-DTP to use. These options are only used by
the server-DTP if the retrieve operation is done in extended block
mode. These options are implemented as RFC 2389 extensions.

The format of the RETR OPTS is specified by:

    retr-opts     = "OPTS" <SP> "RETR" [<SP> option-list] CRLF
    option-list   = [ layout-opts ";" ] [ parallel-opts ";" ]
    layout-opts   = "StripeLayout=Partitioned"
                  | "StripeLayout=Blocked;BlockSize=" <block-size>
    parallel-opts = "Parallelism=" <starting-parallelism> ","
                                   <minimum-parallelism> ","
                                   <maximum-parallelism>

    block-size           ::= <number>
    starting-parallelism ::= <number>
    minimum-parallelism  ::= <number>
    maximum-parallelism  ::= <number>

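The grammar above can be exercised with a small formatter such as the following sketch. The function and parameter names are hypothetical, and the check that the starting parallelism lies between the minimum and maximum is an added sanity check, not something the grammar requires.

```python
def format_retr_opts(layout=None, block_size=None, parallelism=None) -> str:
    """Build an OPTS RETR command.

    layout      -- None, "Partitioned", or "Blocked"
    block_size  -- required when layout is "Blocked"
    parallelism -- None or a (starting, minimum, maximum) tuple
    """
    options = []
    if layout == "Partitioned":
        options.append("StripeLayout=Partitioned;")
    elif layout == "Blocked":
        if block_size is None:
            raise ValueError("Blocked layout needs a block size")
        options.append(f"StripeLayout=Blocked;BlockSize={block_size};")
    elif layout is not None:
        raise ValueError(f"unknown layout: {layout!r}")
    if parallelism is not None:
        starting, minimum, maximum = parallelism
        if not minimum <= starting <= maximum:  # assumed sanity check
            raise ValueError("expected minimum <= starting <= maximum")
        options.append(f"Parallelism={starting},{minimum},{maximum};")
    command = "OPTS RETR"
    if options:
        command += " " + "".join(options)
    return command + "\r\n"


command = format_retr_opts("Blocked", 1048576, (4, 1, 8))
```

Note that each option group is terminated by a semicolon, including the last one.
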
Layout Options

The layout option is used by the source data node to send sections of
the data file to the appropriate destination stripe. The various
StripeLayout parameters are to be implemented as follows:

Partitioned
    A partitioned data layout is one where the data is distributed
    evenly on the destination data nodes. Only one contiguous section
    of data is stored on each data node. A data node is defined here as
    a single host-port mentioned in the SPOR command.

Blocked
    A blocked data layout is one where the data is distributed in
    round-robin fashion over the destination data nodes. The data
    distribution is ordered by the order of the host-port
    specifications in the SPOR command. The block-size defines the size
    of blocks to be distributed.

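The two layouts can be illustrated by mapping file offsets to destination stripes. The function names are hypothetical. For the partitioned case, the text only says the data is distributed evenly, so giving the leftover bytes to the first stripes is an assumption of this sketch.

```python
def blocked_stripe(offset: int, block_size: int, stripe_count: int) -> int:
    """Index (in SPOR order) of the stripe that receives this offset."""
    return (offset // block_size) % stripe_count


def partitioned_ranges(file_size: int, stripe_count: int) -> list[tuple[int, int]]:
    """Half-open [start, end) byte range stored on each stripe."""
    base, extra = divmod(file_size, stripe_count)
    ranges, start = [], 0
    for index in range(stripe_count):
        # Assumption: the first `extra` stripes each take one extra byte.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


stripes = [blocked_stripe(offset, 4, 3) for offset in (0, 4, 8, 12)]
partitions = partitioned_ranges(10, 3)
```

With a block size of 4 and three stripes, consecutive blocks rotate through stripes 0, 1, 2 and then wrap back to 0.
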
Parallelism Options

The parallelism option is used by the source data node to control how
many parallel data connections may be established to each destination
data node. This extension option provides for both a fixed level of
parallelism, and for adapting the parallelism to the host/network
connection, within a range. If the starting-parallelism option is set,
then the server-DTP will make starting-parallelism connections to each
destination data node. If the minimum-parallelism option is set, then
the server may reduce the number of parallel connections per
destination data node to this value. If the maximum-parallelism option
is set, then the server may increase the number of parallel connections
per destination data node to at most this value.

References

[1] Postel, J. and Reynolds, J., 'FILE TRANSFER PROTOCOL (FTP)', STD 9,
    RFC 959, October 1985. ftp://ftp.isi.edu/in-notes/rfc959.txt

[2] Hethmon, P. and Elz, R., 'Feature negotiation mechanism for the
    File Transfer Protocol', RFC 2389, August 1998.
    ftp://ftp.isi.edu/in-notes/rfc2389.txt

[3] Horowitz, M. and Lunt, S., 'FTP Security Extensions', RFC 2228,
    October 1997. ftp://ftp.isi.edu/in-notes/rfc2228.txt

[4] Elz, R. and Hethmon, P., 'FTP Extensions', IETF Draft, May 2001.
    http://www.ietf.org/internet-drafts/draft-ietf-ftpext-mlst-13.txt

There are several security components in this document which are
extensions to the behavior of RFC 2228. This appendix attempts to
clarify how these extensions map to the OpenSSL-based implementation of
the GSSAPI known as GSI (Grid Security Infrastructure).

A client implementation which communicates with a server which supports
the DCAU extension should delegate a limited credential set (using the
GSS_C_DELEG_FLAG and GSS_C_GLOBUS_LIMITED_DELEG_PROXY_FLAG flags to
gss_init_sec_context()). If delegation is not performed, the client
MUST request that DCAU be disabled by requesting DCAU N, or the server
will be unable to perform the default of DCAU A as described by this
document.

When DCAU mode 'A' or 'S' is used, a separate security context is
established on each data channel. The context is established by
performing the GSSAPI handshake with the active-DTP calling
gss_init_sec_context() and the passive-DTP calling
gss_accept_sec_context(). No delegation need be done on these data
channels.

Data channel protection via the PROT command MUST always be used in
conjunction with the DCAU A or DCAU S commands. If a PROT level is set,
then messages will be wrapped according to RFC 2228 Appendix I using
the contexts established on each data channel. Tokens transferred over
the data channels when either PROT or DCAU is used are not framed in
any way when using GSI. (When implementing this specification with
other GSSAPI mechanisms, a 4 byte, big endian, binary token length
should precede all tokens.)

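For the non-GSI case described in the parenthetical note above, the length-prefixed framing might look like the following sketch. The function names are hypothetical and the tokens are placeholder bytes, not real GSSAPI output.

```python
import io
import struct


def frame_token(token: bytes) -> bytes:
    """Prefix a token with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(token)) + token


def read_framed_token(stream) -> bytes:
    """Read one length-prefixed token from a binary stream."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("stream ended inside the token length")
    (length,) = struct.unpack(">I", header)
    token = stream.read(length)
    if len(token) != length:
        raise EOFError("stream ended inside the token body")
    return token


# Round trip through an in-memory stream standing in for a data channel.
channel = io.BytesIO(frame_token(b"token-1") + frame_token(b"token-2"))
first = read_framed_token(channel)
second = read_framed_token(channel)
```

With GSI, by contrast, the text above says the tokens are sent without any such framing.
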
If the DCAU mode or the PROT mode is changed between file transfers
when caching data channels in extended block mode, all open data
channels must be closed. This is because the GSI implementation does
not support changing levels of protection on an existing connection.



globus_ftp_control 9.10                        globus_ftp_extensions(3Version)