globus_ftp_extensions(3)      globus ftp control      globus_ftp_extensions(3)

NAME

       globus_ftp_extensions - GridFTP: Protocol Extensions to FTP for the
       Grid

Introduction

       This section defines extensions to the FTP specification STD 9, RFC
       959, FILE TRANSFER PROTOCOL (FTP) (October 1985). These extensions
       provide striped data transfer, parallel data transfer, extended data
       transfer, data buffer size configuration, and data channel
       authentication.

       The following new commands are introduced in this specification:

       · Striped Passive (SPAS)

       · Striped Data Port (SPOR)

       · Extended Retrieve (ERET)

       · Extended Store (ESTO)

       · Set Data Buffer Size (SBUF)

       · Data Channel Authentication Mode (DCAU)

       A new transfer mode (extended block mode) is introduced for parallel
       and striped data transfers. Also, a set of extension options to RETR
       is added to control striped data layout and parallelism.

       The following new feature names are to be included in the FTP server's
       response to FEAT if it implements the corresponding sets of
       functionality:

       PARALLEL
           The server supports SPOR, SPAS, the RETR options mentioned above,
           and extended block mode.

       ESTO
           The server implements the ESTO command as described in this
           document.

       ERET
           The server implements the ERET command as described in this
           document.

       SBUF
           The server implements the SBUF command as described in this
           document.

       DCAU
           The server implements the DCAU command as described in this
           document, including the requirement that data channels are
           authenticated by default, if RFC 2228 authentication is used to
           establish the control channel.
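
       As an illustration of the feature negotiation described above, the
       following Python sketch extracts the GridFTP feature names from a FEAT
       reply. It assumes the RFC 2389 reply layout (feature lines indented by
       a single space between the 211 lines); the function name and the
       sample reply are hypothetical and not part of any Globus API.

```python
# Feature names defined by this specification.
GRIDFTP_FEATURES = {"PARALLEL", "ESTO", "ERET", "SBUF", "DCAU"}


def gridftp_features(feat_reply: str) -> set[str]:
    """Return the GridFTP feature names advertised in a FEAT reply.

    Assumption: the reply follows RFC 2389, where each feature line starts
    with one space and the first/last lines carry the 211 reply code.
    """
    found = set()
    for line in feat_reply.splitlines():
        if not line.startswith(" "):
            continue  # skip the '211-...' and '211 End' lines
        words = line.split()
        if words and words[0].upper() in GRIDFTP_FEATURES:
            found.add(words[0].upper())
    return found


# Hypothetical reply from a server that lacks ESTO and ERET.
sample_feat = (
    "211-Extensions supported:\r\n"
    " PARALLEL\r\n"
    " SIZE\r\n"
    " DCAU\r\n"
    " SBUF\r\n"
    "211 End\r\n"
)
print(sorted(gridftp_features(sample_feat)))
```

       A client could use the returned set to decide, for example, whether
       to send DCAU N before attempting a transfer.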

Terminology

       Parallel transfer
           From a single data server, splitting file data for transfer over
           multiple data connections.

       Striped transfer
           Distributing a file's data over multiple independent data nodes,
           and transferring over multiple data connections.

       Data Node
           In a striped data transfer, a data node is one of the stripe
           destinations returned in the response to the SPAS command, or one
           of the stripe destinations sent in the SPOR command.

       DTP
           The data transfer process establishes and manages the data
           connection. The DTP can be passive or active.

       PI
           The protocol interpreter. The user and server sides of the protocol
           have distinct roles implemented in a user-PI and a server-PI.

FTP Standards Used

       · RFC 959, FILE TRANSFER PROTOCOL (FTP), J. Postel, J. Reynolds
         (October 1985)

         · Commands used by GridFTP

           · USER

           · PASS

           · ACCT

           · CWD

           · CDUP

           · QUIT

           · REIN

           · PORT

           · PASV

           · TYPE

           · MODE

           · RETR

           · STOR

           · STOU

           · APPE

           · ALLO

           · REST

           · RNFR

           · RNTO

           · ABOR

           · DELE

           · RMD

           · MKD

           · PWD

           · LIST

           · NLST

           · SITE

           · SYST

           · STAT

           · HELP

           · NOOP

         · Features used by GridFTP

           · ASCII and Image types

           · Stream mode

           · File structure

       · RFC 2228, FTP Security Extensions, Horowitz, M. and S. Lunt (October
         1997)

         · Commands used by GridFTP

           · AUTH

           · ADAT

           · MIC

           · CONF

           · ENC

         · Features used by GridFTP

           · GSSAPI authentication

       · RFC 2389, Feature negotiation mechanism for the File Transfer
         Protocol, P. Hethmon, R. Elz (August 1998)

         · Commands used by GridFTP

           · FEAT

           · OPTS

         · Features used by GridFTP

       · FTP Extensions, R. Elz, P. Hethmon (September 2000)

         · Commands used by GridFTP

           · SIZE

         · Features used by GridFTP

           · Restart of a stream mode transfer

Striped Passive (SPAS)

       This extension is used to establish a vector of data socket listeners
       for a server with one or more stripes. This command MUST be used in
       conjunction with the extended block mode. The response to this command
       includes a list of host and port addresses the server is listening on.

       Due to the nature of the extended block mode protocol, SPAS must be
       used in conjunction with data transfer commands which receive data
       (such as STOR, ESTO, or APPE) and cannot be used with commands which
       send data on the data channels.

       Syntax

       The syntax of the SPAS command is:

           spas = 'SPAS' <CRLF>

       Responses

       The server-PI will respond to the SPAS command with a 229 reply giving
       the list of host-port strings for the remote server-DTP or user-DTP to
       connect to.

           spas-response = '229-Entering Striped Passive Mode' CRLF
                            1*(<SP> host-port CRLF)
                            '229 End' CRLF

       Where the command is correctly parsed, but the server-DTP cannot
       process the SPAS request, it must return the same error responses as
       the PASV command.

       OPTS for SPAS

       There are no options in this SPAS specification, and hence there is no
       OPTS command defined.
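
       The 229 reply format above can be parsed with a few lines of code.
       The sketch below is illustrative only: it assumes that each host-port
       uses the RFC 959 form h1,h2,h3,h4,p1,p2, which this document does not
       restate, and the function name and sample addresses are hypothetical.

```python
def parse_spas_reply(reply: str) -> list[tuple[str, int]]:
    """Return the (host, port) listeners found in a 229 SPAS reply.

    Assumption: host-port is the RFC 959 'h1,h2,h3,h4,p1,p2' form.
    """
    nodes = []
    for line in reply.splitlines():
        if not line.startswith(" "):
            continue  # skip '229-Entering Striped Passive Mode' and '229 End'
        fields = [int(f) for f in line.strip().split(",")]
        if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
            raise ValueError(f"malformed host-port: {line!r}")
        host = ".".join(str(f) for f in fields[:4])
        port = fields[4] * 256 + fields[5]  # p1 is the high-order byte
        nodes.append((host, port))
    if not nodes:
        raise ValueError("SPAS reply contained no host-port lines")
    return nodes


# Hypothetical reply from a server with two stripes.
sample_spas = (
    "229-Entering Striped Passive Mode\r\n"
    " 192,168,1,10,195,80\r\n"
    " 192,168,1,11,195,81\r\n"
    "229 End\r\n"
)
print(parse_spas_reply(sample_spas))
```

       The order of the returned list matters, because a later SPOR command
       must present the same host-port sequence.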

Striped Data Port (SPOR)

       This extension is to be used as a complement to the SPAS command to
       implement striped third-party transfers. This command MUST always be
       used in conjunction with the extended block mode. The argument to SPOR
       is a vector of host/TCP listener port pairs to which the server is to
       connect.

       Due to the nature of the extended block mode protocol, SPOR must be
       used in conjunction with data transfer commands which send data (such
       as RETR, ERET, LIST, or NLST) and cannot be used with commands which
       receive data on the data channels.

       Syntax

       The syntax of the SPOR command is:

       SPOR 1*(<SP> <host-port>) <CRLF>

       The host-port sequence in the command structure MUST match the
       host-port replies to a SPAS command.

       Responses

       The server-PI will respond to the SPOR command with the same response
       set as the PORT command described in the FTP specification.

       OPTS for SPOR

       There are no options in this SPOR specification, and hence there is no
       OPTS command defined.
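
       For a third-party transfer, the listeners reported by the receiving
       server are passed to the sending server in the same order. A minimal
       sketch of building the SPOR command line follows; it again assumes
       the RFC 959 h1,h2,h3,h4,p1,p2 encoding for host-port, and the function
       name is hypothetical.

```python
def spor_command(nodes: list[tuple[str, int]]) -> str:
    """Build a SPOR command from (IPv4 host, port) pairs, preserving order.

    Assumption: host-port is encoded as 'h1,h2,h3,h4,p1,p2' (RFC 959).
    """
    if not nodes:
        raise ValueError("SPOR requires at least one host-port")
    parts = []
    for host, port in nodes:
        octets = host.split(".")
        if len(octets) != 4 or not 0 < port < 65536:
            raise ValueError(f"bad host/port: {host!r}, {port!r}")
        parts.append(",".join(octets + [str(port // 256), str(port % 256)]))
    return "SPOR " + " ".join(parts) + "\r\n"


# Listeners as they might have been returned by SPAS on the receiving server.
listeners = [("192.168.1.10", 50000), ("192.168.1.11", 50001)]
print(repr(spor_command(listeners)))
```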

Extended Retrieve (ERET)

       The extended retrieve extension is used to request that a retrieve be
       done with some additional processing on the server. This command is an
       extensible way of providing server-side data reduction or other
       modifications to the RETR command. This command is used in place of
       OPTS to the RETR command to allow server-side processing to be done
       with a single round trip (one command sent to the server instead of
       two) for latency-critical applications.

       ERET may be used either with the data transports defined in RFC 959 or
       with extended block mode as defined in this document. Using an ERET
       creates a new virtual file which will be sent, with its own size and
       byte range starting at zero. Restart markers generated while processing
       an ERET are relative to the beginning of this view of the file.

       Syntax

       The syntax of the ERET command is:

       ERET <SP> <retrieve-mode> <SP> <filename>

       retrieve-mode ::= P <SP> <offset> <SP> <size>
       offset ::= 64 bit integer
       size ::= 64 bit integer

       The retrieve-mode defines the behavior of the extended retrieve. There
       is one mode defined by this specification, but other general purpose or
       application-specific ones may be added later.

       Extended Retrieve Modes

       Partial Retrieve Mode (P)
           A section of the file will be retrieved from the data server. The
           section is defined by the starting offset and extent size
           parameters. When used with extended block mode, the extended block
           headers sent along with the data carry offsets relative to the
           requested section, so an offset of 0 means the beginning of the
           section of the file which was requested.
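
       The partial retrieve mode can be illustrated with a short sketch that
       builds the command and maps an offset in the virtual file back to the
       original file. The helper names and the sample path are hypothetical,
       and treating the 64 bit integers as unsigned is an assumption, since
       the document does not say whether they are signed.

```python
MAX_U64 = 2**64 - 1  # assumption: offset and size are unsigned 64-bit values


def eret_partial(filename: str, offset: int, size: int) -> str:
    """Build an ERET command in partial retrieve mode (P)."""
    for value in (offset, size):
        if not 0 <= value <= MAX_U64:
            raise ValueError("offset and size must fit in 64 bits")
    return f"ERET P {offset} {size} {filename}\r\n"


def source_offset(eret_offset: int, block_offset: int) -> int:
    """Map an offset in the ERET virtual file to an offset in the real file.

    Block offsets are relative to the requested section, so offset 0 in a
    block header corresponds to eret_offset in the original file.
    """
    return eret_offset + block_offset


# Request 4096 bytes starting 1 MiB into a (hypothetical) file.
print(repr(eret_partial("/data/run42.dat", 1048576, 4096)))
```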

Extended Store (ESTO)

       The extended store extension is used to request that a store be done
       with some additional processing on the server. Arbitrary data
       processing algorithms may be added by defining additional ESTO
       store-modes. Similar to ERET, the ESTO command expects the data sent
       to satisfy the request to be sent as if it were a new file, with data
       block offset 0 being the beginning of the new file.

       The format of the ESTO command is:

       ESTO <SP> <store-mode> <SP> <filename>

       store-mode ::= A <SP> <offset>

       The store-mode defines the behavior of the extended store. There is one
       mode defined by this specification, but others may be added later.

       Extended Store Modes

       Adjusted store (A)
           The data in the file is to be stored with offset added to the file
           pointer before storing the blocks of the file. In extended block
           mode, this value is added to the offset in the extended block
           header by the server when writing to disk. Extended block headers
           should therefore send the beginning of the byte range on the data
           channel with offset of zero. In stream mode, the offset is added to
           the implicit offset of 0 for the beginning of the data before
           writing. If a stream mode restart marker is used in conjunction
           with this ESTO mode, the restart marker's offset is added to the
           offset passed as the parameter to the adjusted store.
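
       The offset arithmetic of the adjusted store can be written out as
       follows. This is only a sketch of the rules stated above, with
       hypothetical function names; it is not taken from any server
       implementation.

```python
def esto_adjusted(filename: str, offset: int) -> str:
    """Build an ESTO command in adjusted store mode (A)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return f"ESTO A {offset} {filename}\r\n"


def eblock_write_position(esto_offset: int, header_offset: int) -> int:
    """Extended block mode: the server adds the ESTO offset to the offset
    carried in each extended block header."""
    return esto_offset + header_offset


def stream_write_position(esto_offset: int, bytes_received: int,
                          restart_offset: int = 0) -> int:
    """Stream mode: data implicitly starts at 0; a restart marker offset,
    if one was used, is added to the ESTO offset as well."""
    return esto_offset + restart_offset + bytes_received


print(repr(esto_adjusted("/data/out.dat", 4096)))
```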

Set Buffer Size (SBUF)

       This extension adds the capability of a client to set the TCP buffer
       size for subsequent data connections to a specified value. This
       replaces the server-specific commands SITE RBUFSIZE, SITE RETRBUFSIZE,
       SITE RBUFSZ, SITE SBUFSIZE, SITE SBUFSZ, and SITE BUFSIZE. Clients may
       wish to consider supporting these other commands to ensure wider
       compatibility.

       Syntax

       The syntax of the SBUF command is:

       sbuf = SBUF <SP> <buffer-size>

       buffer-size ::= <number>

       The buffer-size value is the TCP buffer size in bytes. The TCP window
       size should be set accordingly by the server.

       Response Codes

       If the server-PI is able to set the buffer size state to the requested
       buffer-size, then it will return a 200 reply.

       Note:
           Even if the SBUF is accepted by the server, an error may occur
           later when the data connections are actually created, depending on
           the server or client operating systems' TCP implementations.

Data Channel Authentication (DCAU)

       This extension provides a method for specifying the type of
       authentication to be performed on FTP data channels. This extension may
       only be used when the control connection was authenticated using RFC
       2228 Security extensions.

       The format of the DCAU command is:

       DCAU <SP> <authentication-mode> <CRLF>

       authentication-mode ::= <no-authentication>
                             | <authenticate-with-self>
                             | <authenticate-with-subject>

       no-authentication ::= N
       authenticate-with-self ::= A
       authenticate-with-subject ::= S <subject-name>

       subject-name ::= string

       Authentication Modes

           · No authentication (N)
             No authentication handshake will be done upon data connection
             establishment.

           · Self authentication (A)
             A security-protocol specific authentication will be used on the
             data channel. The identity of the remote data connection will be
             the same as the identity of the user which authenticated to the
             control connection.

           · Subject-name authentication (S)
             A security-protocol specific authentication will be used on the
             data channel. The identity of the remote data connection MUST
             match the supplied subject-name string.

       The default data channel authentication mode is A for FTP sessions
       which are RFC 2228 authenticated; the client must explicitly send a
       DCAU N message to disable it if it does not implement data channel
       authentication.

       If the security handshake fails, the server should return the error
       response 432 (Data channel authentication failed).

Extended Block Mode

       The striped and parallel data transfer methods described above require
       an extended transfer mode to support out-of-sequence data delivery and
       partial data transmission per data connection. The extended block mode
       described here extends the block mode header to provide support for
       these as well as large blocks and end-of-data synchronization.

       Clients indicate that they want to use extended block mode by sending
       the command

       MODE <SP> E <CRLF>

       on the control channel before a transfer command is sent.

       The structure of the extended block header is:

       Extended Block Header

       +----------------+-------/-----------+------/------------+
       | Descriptor     |    Byte Count     |    Offset Count   |
       |         8 bits |        64 bits    |          64 bits  |
       +----------------+-------/-----------+------/------------+

       The descriptor codes are indicated by bit flags in the descriptor byte.
       Six codes have been assigned, where each code number is the decimal
       value of the corresponding bit in the byte.

        Code     Meaning

         128     End of data block is EOR (Legacy)
          64     End of data block is EOF
          32     Suspected errors in data block
          16     Data block is a restart marker
           8     End of data block is EOD for a parallel/striped transfer
           4     Sender will close the data connection

       With this encoding, more than one descriptor coded condition may exist
       for a particular block. As many bits as necessary may be flagged.

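       The 17-byte header and its descriptor flags can be sketched with
       Python's struct module. The constant and function names are
       hypothetical, and the use of network (big-endian) byte order for the
       two 64 bit fields is an assumption, since this document does not
       state the byte order.

```python
import struct

# Descriptor bit flags from the table above.
EOR, EOF, ERROR, RESTART, EOD, CLOSE = 128, 64, 32, 16, 8, 4

# Assumption: network (big-endian) byte order. 'B' = 8-bit descriptor,
# 'Q' = 64-bit byte count, 'Q' = 64-bit offset; no padding with '>'.
HEADER = struct.Struct(">BQQ")


def pack_header(descriptor: int, byte_count: int, offset: int) -> bytes:
    """Encode an extended block header (17 bytes)."""
    return HEADER.pack(descriptor, byte_count, offset)


def unpack_header(data: bytes) -> tuple[int, int, int]:
    """Decode (descriptor, byte_count, offset) from the first 17 bytes."""
    return HEADER.unpack(data[:HEADER.size])


# A zero-length final block that is both EOD and announces a close.
final = pack_header(EOD | CLOSE, 0, 1024)
descriptor, byte_count, offset = unpack_header(final)
print(HEADER.size, bool(descriptor & EOD), bool(descriptor & EOF))
```

       Because the flags are independent bits, a receiver should test each
       one with a bitwise AND rather than compare the descriptor for
       equality.
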
       Some additional protocol is added to the extended block mode data
       channels, to properly handle end-of-file detection in the presence of
       an unknown number of data streams.

       · When no more data is to be sent on the data channel, then the sender
         will mark the last block, or send a zero-length block after the last
         block with the EOD bit (8) set in the extended block header.

       · After receiving an EOD the data connection can be cached for use in a
         subsequent transfer. To signify that the data connection will be
         closed the sender sets the close bit (4) in the header on the last
         message sent.

       · The sender communicates end of file by sending an EOF message to all
         servers receiving data. The EOF message format follows.

       Extended Block EOF Header

       +----------------+-------/--------+------/---------------+
       | Descriptor     |     unused     |  EOD count expected  |
       |         8 bits |     64 bits    |        64 bits       |
       +----------------+-------/--------+------/---------------+

       EOF Descriptor. The EOF header descriptor has the same definition as
       the regular data message header described above.

       EOD Count Expected. This 64 bit field represents the total number of
       data connections that will be established with the server receiving the
       file. This number is used by the receiver to determine it has received
       all of the data. When the number of EOD messages received equals the
       number represented by the 'EOD Count Expected' field the receiver has
       hit end of file.

       Simply waiting for EOD on all open data connections is not sufficient.
       It is possible that the receiver reads an EOD message on all of its
       open data connections while an additional data connection is in flight.
       If the receiver were to assume it reached end of file it would fail to
       receive the data on the in-flight connection.

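       The end-of-file rule above amounts to comparing a running count of
       EOD blocks with the count announced in the EOF header. A minimal
       sketch of the receiver-side bookkeeping follows; the class and method
       names are hypothetical.

```python
EOF, EOD = 64, 8  # descriptor bits from the extended block header


class EodTracker:
    """Track EOD blocks across all data connections of one receiver."""

    def __init__(self) -> None:
        self.eods_seen = 0
        self.eods_expected = None  # unknown until the EOF header arrives

    def on_header(self, descriptor: int, byte_count: int, offset: int) -> None:
        """Record one received block header, from any data connection."""
        if descriptor & EOF:
            # In an EOF header the third field is 'EOD count expected'.
            self.eods_expected = offset
        if descriptor & EOD:
            self.eods_seen += 1

    def complete(self) -> bool:
        """True once every expected connection has delivered its EOD."""
        return (self.eods_expected is not None
                and self.eods_seen == self.eods_expected)


tracker = EodTracker()
tracker.on_header(EOD, 0, 0)      # first connection finishes
tracker.on_header(EOF, 0, 2)      # sender announces two connections in total
early = tracker.complete()        # still waiting for the in-flight connection
tracker.on_header(EOD, 100, 900)  # second connection sends its last block
print(early, tracker.complete())
```

       Keeping the expected count separate from the set of currently open
       connections is what lets the receiver wait for a connection that has
       not yet been accepted.
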
       To handle EOF in the multi-striped server case, a 126 response has been
       introduced. When receiving data from a striped server, a client makes a
       control connection to a single host, but several hosts may create
       several data connections back to the client. Each host can
       independently decide how many data connections it will use, but only a
       single EOF message may be sent back to the client; therefore it must
       be possible to aggregate the total number of data connections used in
       the transfer across the stripes. The 126 response serves this purpose.

       The 126 is an intermediate response to the RETR command. It has the
       following format.

       126 <SP> 1*(count of data connections)

       Several 'Count of data connections' can be in a single reply. They
       correspond to the stripes returned in the response to the SPAS command.

       Discussion of a protocol change to enable bidirectional data channels
       brought up the following problem:

       If the client is passive and sending to a multi-stripe server, then the
       server creates the data connections; since the client did not issue
       SPAS, it cannot associate host/port pairs on the data connections with
       stripes on the server (it does not even know how many there are), so it
       cannot reliably determine which nodes to send data to. (This becomes
       even more complex in the third-party transfer case, because the sender
       may have multiple stripes of data.) The basic problem is that we need
       to know logical stripe numbers to know where to send the data.

       EOF Handling in Extended Block Mode

       If you are in either striped or parallel mode, you will get exactly one
       EOF on each SPAS-specified port (stripe). Hosts in extended block
       mode must be prepared to accept an arbitrary number of connections on
       each SPOR port before the EOF block is sent.

       Restarting

       In general, opaque restart markers passed via the block header should
       not be used in extended block mode. Instead, the destination server
       should send extended data marker responses over the control connection,
       in the following form:

          extended-mark-response = '111' <SP> 'Range Marker' <SP> <byte-ranges-list>

          byte-ranges-list       = <byte-range> [ *(',' <byte-range>) ]
          byte-range             = <start-offset> '-' <end-offset>

          start-offset         ::= <number>
          end-offset           ::= <number>

       The byte ranges in the marker are an incremental set of byte ranges
       which have been stored to disk by the data server. The complete restart
       marker is a concatenation of all byte ranges received by the client in
       111 responses.

       The client MAY combine adjacent ranges received over several range
       responses into any number of ranges when sending the REST command to
       the server to restart a transfer.

       For example, the client, on receiving the responses:

       111 Range Marker 0-29
       111 Range Marker 30-89

       may send, equivalently,

       REST 0-29,30-89
       REST 0-89
       REST 30-59,0-29,60-89

       to restart the transfer after those 90 bytes have been received.

       The server MAY indicate that a given range of data has been received in
       multiple subsequent range markers. The client MUST be able to handle
       this. For example:

       111 Range Marker 30-59
       111 Range Marker 0-89

       is equivalent to

       111 Range Marker 30-59
       111 Range Marker 0-29,60-89

       Similarly, the client, if it is doing no processing of the restart
       markers, MAY send redundant information in a restart.

       Should these be allowed as restart markers for stream mode?

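       A client that does process the markers can merge them before
       restarting. The sketch below collects the ranges from 111 responses
       and coalesces overlapping or adjacent ones into a REST command. The
       function names are hypothetical, and treating the end offset as
       inclusive is an assumption drawn from the 0-29 plus 30-89 example
       above.

```python
MARKER_PREFIX = "111 Range Marker "


def parse_range_marker(line: str) -> list[tuple[int, int]]:
    """Return the (start, end) byte ranges carried by one 111 response."""
    if not line.startswith(MARKER_PREFIX):
        raise ValueError(f"not a range marker: {line!r}")
    ranges = []
    for item in line[len(MARKER_PREFIX):].strip().split(","):
        start, end = item.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Coalesce overlapping or adjacent ranges (end assumed inclusive)."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def rest_command(ranges: list[tuple[int, int]]) -> str:
    """Build a REST command from the merged ranges."""
    body = ",".join(f"{s}-{e}" for s, e in merge_ranges(ranges))
    return f"REST {body}\r\n"


seen = []
for response in ("111 Range Marker 30-59", "111 Range Marker 0-89"):
    seen.extend(parse_range_marker(response))
print(repr(rest_command(seen)))
```

       Merging is optional, since the server must accept redundant ranges,
       but it keeps the REST command short on long transfers.
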
       Performance Monitoring

       In order to monitor the performance of extended block mode transfer, an
       additional preliminary reply MAY be transmitted over the control
       channel. This reply is of the form:

          extended-perf-response  = '112-Perf Marker' CRLF
                                    <SP> 'Timestamp:' <SP> <timestamp> CRLF
                                    <SP> 'Stripe Index:' <SP> <stripe-number> CRLF
                                    <SP> 'Stripe Bytes Transferred:' <SP> <byte count> CRLF
                                    <SP> 'Total Stripe Count:' <SP> <stripe count> CRLF
                                    '112 End' CRLF

          timestamp               = <number> [ '.' <digit> ]

       <timestamp> is seconds since the epoch.

       The performance marker can contain these or any other perf-line facts
       which provide useful information about the current performance.

       All perf-line facts represent an instantaneous state of the transfer at
       the given timestamp. The meanings of the facts are:

       · Timestamp - The time at which the server computed the performance
         information. This is in seconds since the epoch (00:00:00 UTC,
         January 1, 1970).

       · Stripe Index - The index (0 to the number of stripes on the STOR side
         of the transfer) of the stripe to which this marker pertains.

       · Stripe Bytes Transferred - The number of bytes which have been
         received on this stripe.

       A transfer start time can be specified by a perf marker with 'Stripe
       Bytes Transferred' set to zero. Only the first marker per stripe can be
       used to specify the start time of that stripe. Any subsequent markers
       with 'Stripe Bytes Transferred' set to zero simply indicate no data
       transfer over the interval.

       A server should send a 'start' marker for each stripe. A server should
       also send a final perf marker for each stripe. This is a marker with
       'Stripe Bytes Transferred' set to the total transfer size for that
       stripe.
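
       A client can turn two perf markers for the same stripe into a
       throughput estimate. The following sketch parses the 112 reply into a
       dictionary of perf-line facts; the function names and sample values
       are hypothetical, and unknown facts are simply kept as strings.

```python
def parse_perf_marker(reply: str) -> dict[str, str]:
    """Return the perf-line facts of one 112 reply as a name/value dict."""
    facts = {}
    for line in reply.splitlines():
        if not line.startswith(" "):
            continue  # skip '112-Perf Marker' and '112 End'
        name, sep, value = line.strip().partition(":")
        if sep:
            facts[name.strip()] = value.strip()
    return facts


def stripe_throughput(earlier: dict[str, str], later: dict[str, str]) -> float:
    """Bytes per second on one stripe between two of its markers."""
    seconds = float(later["Timestamp"]) - float(earlier["Timestamp"])
    moved = (int(later["Stripe Bytes Transferred"])
             - int(earlier["Stripe Bytes Transferred"]))
    return moved / seconds if seconds > 0 else 0.0


def sample_marker(timestamp: str, transferred: int) -> str:
    """Produce a hypothetical 112 reply for stripe 0 of 2."""
    return ("112-Perf Marker\r\n"
            f" Timestamp: {timestamp}\r\n"
            " Stripe Index: 0\r\n"
            f" Stripe Bytes Transferred: {transferred}\r\n"
            " Total Stripe Count: 2\r\n"
            "112 End\r\n")


start = parse_perf_marker(sample_marker("1308844800.0", 0))
later = parse_perf_marker(sample_marker("1308844802.0", 2097152))
print(stripe_throughput(start, later))
```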

Options to RETR

       The options described in this section provide a means to convey
       striping and transfer parallelism information to the server-DTP. For
       the RETR command, the Client-FTP may specify a parallelism and striping
       mode it wishes the server-DTP to use. These options are only used by
       the server-DTP if the retrieve operation is done in extended block
       mode. These options are implemented as RFC 2389 extensions.

       The format of the RETR OPTS is specified by:

           retr-opts     = 'OPTS' <SP> 'RETR' [<SP> option-list] CRLF
           option-list   = [ layout-opts ';' ] [ parallel-opts ';' ]
           layout-opts   = 'StripeLayout=Partitioned'
                         | 'StripeLayout=Blocked;BlockSize=' <block-size>
           parallel-opts = 'Parallelism=' <starting-parallelism> ','
                                          <minimum-parallelism>  ','
                                          <maximum-parallelism>

           block-size           ::= <number>
           starting-parallelism ::= <number>
           minimum-parallelism  ::= <number>
           maximum-parallelism  ::= <number>

       Layout Options

       The layout option is used by the source data node to send sections of
       the data file to the appropriate destination stripe. The various
       StripeLayout parameters are to be implemented as follows:

       Partitioned
           A partitioned data layout is one where the data is distributed
           evenly on the destination data nodes. Only one contiguous section
           of data is stored on each data node. A data node is defined here as
           a single host-port mentioned in the SPOR command.

       Blocked
           A blocked data layout is one where the data is distributed in
           round-robin fashion over the destination data nodes. The data
           distribution is ordered by the order of the host-port
           specifications in the SPOR command. The block-size defines the size
           of blocks to be distributed.

       Parallelism Options

       The parallelism option is used by the source data node to control how
       many parallel data connections may be established to each destination
       data node. This extension option provides for both a fixed level of
       parallelism, and for adapting the parallelism to the host/network
       connection, within a range. If the starting-parallelism option is set,
       then the server-DTP will make starting-parallelism connections to each
       destination data node. If the minimum-parallelism option is set, then
       the server may reduce the number of parallel connections per
       destination data node to this value. If the maximum-parallelism option
       is set, then the server may increase the number of parallel connections
       per destination data node to at most this value.

References

       [1] Postel, J. and Reynolds, J., 'FILE TRANSFER PROTOCOL (FTP)', STD 9,
       RFC 959, October 1985. ftp://ftp.isi.edu/in-notes/rfc959.txt

       [2] Hethmon, P. and Elz, R., 'Feature negotiation mechanism for the
       File Transfer Protocol', RFC 2389, August 1998.
       ftp://ftp.isi.edu/in-notes/rfc2389.txt

       [3] Horowitz, M. and Lunt, S., 'FTP Security Extensions', RFC 2228,
       October 1997. ftp://ftp.isi.edu/in-notes/rfc2228.txt

       [4] Elz, R. and Hethmon, P., 'FTP Extensions', IETF Draft, May 2001.
       http://www.ietf.org/internet-drafts/draft-ietf-ftpext-mlst-13.txt

Appendix I: Implementation under GSI

       There are several security components in this document which are
       extensions to the behavior of RFC 2228. This appendix attempts to
       clarify how these extensions map to the OpenSSL-based implementation
       of the GSSAPI known as GSI (Grid Security Infrastructure).

       A client implementation which communicates with a server which supports
       the DCAU extension should delegate a limited credential set (using the
       GSS_C_DELEG_FLAG and GSS_C_GLOBUS_LIMITED_DELEG_PROXY_FLAG flags to
       gss_init_sec_context()). If delegation is not performed, the client
       MUST request that DCAU be disabled by requesting DCAU N, or the server
       will be unable to perform the default of DCAU A as described by this
       document.

       When DCAU mode 'A' or 'S' is used, a separate security context is
       established on each data channel. The context is established by
       performing the GSSAPI handshake with the active-DTP calling
       gss_init_sec_context() and the passive-DTP calling
       gss_accept_sec_context(). No delegation need be done on these data
       channels.

       Data channel protection via the PROT command MUST always be used in
       conjunction with the DCAU A or DCAU S commands. If a PROT level is set,
       then messages will be wrapped according to RFC 2228 Appendix I using
       the contexts established on each data channel. Tokens transferred over
       the data channels when either PROT or DCAU is used are not framed in
       any way when using GSI. (When implementing this specification with
       other GSSAPI mechanisms, a 4-byte, big-endian, binary token length
       should precede all tokens.)

       If the DCAU mode or the PROT mode is changed between file transfers
       when caching data channels in extended block mode, all open data
       channels must be closed. This is because the GSI implementation does
       not support changing levels of protection on an existing connection.
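
       For GSSAPI mechanisms other than GSI, the token framing mentioned
       above can be sketched as a 4-byte big-endian length prefix. The
       function names are hypothetical, and the token bytes are treated as
       opaque; no GSSAPI calls are made here.

```python
import struct


def frame_token(token: bytes) -> bytes:
    """Prefix an opaque token with its 4-byte, big-endian length."""
    return struct.pack(">I", len(token)) + token


def unframe_token(buffer: bytes) -> tuple[bytes, bytes]:
    """Split one framed token off the front of buffer.

    Returns (token, remaining bytes); raises ValueError if the buffer
    does not yet hold a complete frame.
    """
    if len(buffer) < 4:
        raise ValueError("incomplete length prefix")
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        raise ValueError("incomplete token")
    return buffer[4:4 + length], buffer[4 + length:]


framed = frame_token(b"opaque-token") + frame_token(b"next")
first, rest = unframe_token(framed)
print(first, unframe_token(rest)[0])
```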

Version 2.12                    Thu Jun 23 2011       globus_ftp_extensions(3)