libcurl-tutorial(3) libcurl programming libcurl-tutorial(3)

libcurl-tutorial - libcurl programming tutorial
7
9 This document attempts to describe the general principles and some
10 basic approaches to consider when programming with libcurl. The text
11 will focus mainly on the C interface but might apply fairly well on
12 other interfaces as well as they usually follow the C one pretty
13 closely.
14
15 This document will refer to 'the user' as the person writing the source
16 code that uses libcurl. That would probably be you or someone in your
17 position. What will be generally referred to as 'the program' will be
18 the collected source code that you write that is using libcurl for
19 transfers. The program is outside libcurl and libcurl is outside of the
20 program.
21
To get more details on all options and functions described herein,
please refer to their respective man pages.
24
25
27 There are many different ways to build C programs. This chapter will
28 assume a unix-style build process. If you use a different build system,
29 you can still read this to get general information that may apply to
30 your environment as well.
31
32 Compiling the Program
Your compiler needs to know where the libcurl headers are
located. Therefore you must set your compiler's include path to
point to the directory where you installed them. The
'curl-config'[3] tool can be used to get this information:
37
38 $ curl-config --cflags
39
40
41 Linking the Program with libcurl
Once the program is compiled, you need to link your object
files to create a single executable. For that to succeed, you
need to link with libcurl and possibly also with other libraries
that libcurl itself depends on, such as the OpenSSL libraries;
even some standard OS libraries may be needed on the command
line. To figure out which flags to use, once again the
'curl-config' tool comes to the rescue:
49
50 $ curl-config --libs
51
52
53 SSL or Not
libcurl can be built and customized in many ways. One of the
things that varies between different libraries and builds is the
support for SSL-based transfers, like HTTPS and FTPS. If a
supported SSL library was detected properly at build-time, libcurl
will be built with SSL support. To figure out if an installed
libcurl has been built with SSL support enabled, use
'curl-config' like this:
61
62 $ curl-config --feature
63
And if SSL is supported, the keyword 'SSL' will be written to
stdout, possibly together with a few other features that can be
enabled or disabled in different libcurl builds.
67
68 See also the "Features libcurl Provides" further down.
69
70 autoconf macro
71 When you write your configure script to detect libcurl and setup
72 variables accordingly, we offer a prewritten macro that probably
73 does everything you need in this area. See
74 docs/libcurl/libcurl.m4 file - it includes docs on how to use
75 it.
76
77
The people behind libcurl have put considerable effort into making
libcurl work on a large number of different operating systems and
environments.

You program libcurl the same way on all platforms that libcurl runs on.
There are only a very few minor considerations that differ. If you
make sure to write your code portably enough, you may very well create
yourself a very portable program. libcurl shouldn't stop you from that.
87
88
90 The program must initialize some of the libcurl functionality globally.
91 That means it should be done exactly once, no matter how many times you
92 intend to use the library. Once for your program's entire life time.
93 This is done using
94
95 curl_global_init()
96
97 and it takes one parameter which is a bit pattern that tells libcurl
98 what to initialize. Using CURL_GLOBAL_ALL will make it initialize all
99 known internal sub modules, and might be a good default option. The
100 current two bits that are specified are:
101
102 CURL_GLOBAL_WIN32
which only does anything on Windows machines. When used
on a Windows machine, it'll make libcurl initialize the
win32 socket stuff. Without having that initialized
properly, your program cannot use sockets properly. You
should only do this once for each application, so if your
program already does this or if another library in use
does it, you should not tell libcurl to do this as well.
110
111 CURL_GLOBAL_SSL
which only does anything on libcurls compiled and built
SSL-enabled. On these systems, this will make libcurl
initialize the SSL library properly for this application.
This only needs to be done once for each application, so if
your program or another library already does this, this
bit should not be needed.
118
119 libcurl has a default protection mechanism that detects if
120 curl_global_init(3) hasn't been called by the time curl_easy_perform(3)
121 is called and if that is the case, libcurl runs the function itself
with a guessed bit pattern. Please note that depending solely on this
is considered neither nice nor very good practice.
124
125 When the program no longer uses libcurl, it should call
126 curl_global_cleanup(3), which is the opposite of the init call. It will
127 then do the reversed operations to cleanup the resources the
128 curl_global_init(3) call initialized.
129
130 Repeated calls to curl_global_init(3) and curl_global_cleanup(3) should
131 be avoided. They should only be called once each.
132
133
135 It is considered best-practice to determine libcurl features at run-
136 time rather than at build-time (if possible of course). By calling
137 curl_version_info(3) and checking out the details of the returned
138 struct, your program can figure out exactly what the currently running
139 libcurl supports.
140
141
143 libcurl first introduced the so called easy interface. All operations
144 in the easy interface are prefixed with 'curl_easy'.
145
146 Recent libcurl versions also offer the multi interface. More about that
147 interface, what it is targeted for and how to use it is detailed in a
148 separate chapter further down. You still need to understand the easy
149 interface first, so please continue reading for better understanding.
150
151 To use the easy interface, you must first create yourself an easy han‐
152 dle. You need one handle for each easy session you want to perform.
153 Basically, you should use one handle for every thread you plan to use
154 for transferring. You must never share the same handle in multiple
155 threads.
156
157 Get an easy handle with
158
159 easyhandle = curl_easy_init();
160
161 It returns an easy handle. Using that you proceed to the next step:
162 setting up your preferred actions. A handle is just a logic entity for
163 the upcoming transfer or series of transfers.
164
You set properties and options for this handle using
curl_easy_setopt(3). They control how the subsequent transfer or
transfers will be made. Options remain set in the handle until set
again to something different. Thus, multiple requests using the same
handle will use the same options.
170
171 Many of the options you set in libcurl are "strings", pointers to data
172 terminated with a zero byte. Keep in mind that when you set strings
173 with curl_easy_setopt(3), libcurl will not copy the data. It will
174 merely point to the data. You MUST make sure that the data remains
175 available for libcurl to use until finished or until you use the same
176 option again to point to something else.
177
178 One of the most basic properties to set in the handle is the URL. You
179 set your preferred URL to transfer with CURLOPT_URL in a manner similar
180 to:
181
curl_easy_setopt(easyhandle, CURLOPT_URL, "http://domain.com/");
183
Let's assume for a while that you want to receive data, as the URL
identifies a remote resource you want to get here. Since you are
writing an application that needs this transfer, you probably want
the data passed to you directly instead of simply getting it passed
to stdout. So, you write your own function that matches this
prototype:
190
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);
193
You tell libcurl to pass all data to this function by issuing a call
similar to this:

curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);

You can control what data your function gets in the fourth argument by
setting another property:
201
202 curl_easy_setopt(easyhandle, CURLOPT_WRITEDATA, &internal_struct);
203
204 Using that property, you can easily pass local data between your appli‐
205 cation and the function that gets invoked by libcurl. libcurl itself
206 won't touch the data you pass with CURLOPT_WRITEDATA.
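
A minimal sketch of such a callback is shown here. It collects
everything it receives in a growing memory buffer. The 'struct memory'
type is a hypothetical application-side struct invented for this
example, and the code assumes that a pointer to one was passed with
CURLOPT_WRITEDATA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical application-side buffer that collects received data. */
struct memory {
  char *data;
  size_t len;
};

/* Matches the write callback prototype. 'userp' is whatever was passed
   with CURLOPT_WRITEDATA, assumed here to be a 'struct memory *'. */
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp)
{
  struct memory *mem = userp;
  size_t bytes = size * nmemb;
  char *grown = realloc(mem->data, mem->len + bytes + 1);

  if(!grown)
    return 0; /* out of memory: report that nothing was taken care of */

  memcpy(grown + mem->len, buffer, bytes);
  mem->data = grown;
  mem->len += bytes;
  mem->data[mem->len] = '\0'; /* keep the buffer usable as a C string */
  return bytes;
}
```

The application would start with a struct set to { NULL, 0 }, pass its
address with CURLOPT_WRITEDATA, and free() the 'data' member when it
is done with the contents.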
207
208 libcurl offers its own default internal callback that'll take care of
209 the data if you don't set the callback with CURLOPT_WRITEFUNCTION. It
210 will then simply output the received data to stdout. You can have the
211 default callback write the data to a different file handle by passing a
212 'FILE *' to a file opened for writing with the CURLOPT_WRITEDATA
213 option.
214
Now, we need to take a step back and take a deep breath. Here's one of
those rare platform-dependent nitpicks. Did you spot it? On some
platforms[2], libcurl won't be able to operate on files opened by the
program. Thus, if you use the default callback and pass in an open file
with CURLOPT_WRITEDATA, it will crash. You should therefore avoid this
to make your program run fine virtually everywhere.
221
222 (CURLOPT_WRITEDATA was formerly known as CURLOPT_FILE. Both names still
223 work and do the same thing).
224
If you're using libcurl as a win32 DLL, you MUST use the
CURLOPT_WRITEFUNCTION if you set CURLOPT_WRITEDATA - or you will
experience crashes.
227
228 There are of course many more options you can set, and we'll get back
229 to a few of them later. Let's instead continue to the actual transfer:
230
231 success = curl_easy_perform(easyhandle);
232
233 curl_easy_perform(3) will connect to the remote site, do the necessary
234 commands and receive the transfer. Whenever it receives data, it calls
235 the callback function we previously set. The function may get one byte
236 at a time, or it may get many kilobytes at once. libcurl delivers as
237 much as possible as often as possible. Your callback function should
238 return the number of bytes it "took care of". If that is not the exact
239 same amount of bytes that was passed to it, libcurl will abort the
240 operation and return with an error code.
241
242 When the transfer is complete, the function returns a return code that
243 informs you if it succeeded in its mission or not. If a return code
244 isn't enough for you, you can use the CURLOPT_ERRORBUFFER to point
245 libcurl to a buffer of yours where it'll store a human readable error
246 message as well.
247
248 If you then want to transfer another file, the handle is ready to be
249 used again. Mind you, it is even preferred that you re-use an existing
250 handle if you intend to make another transfer. libcurl will then
251 attempt to re-use the previous connection.
252
253
255 The first basic rule is that you must never share a libcurl handle (be
256 it easy or multi or whatever) between multiple threads. Only use one
257 handle in one thread at a time.
258
libcurl is completely thread safe, except for two issues: signals and
SSL/TLS handlers. Signals are used for timing out name resolves (during
DNS lookup) when libcurl is built without c-ares support and is not
running on Windows.
262
263 If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you
264 are then of course using the underlying SSL library multi-threaded and
265 those libs might have their own requirements on this issue. Basically,
266 you need to provide one or two functions to allow it to function prop‐
267 erly. For all details, see this:
268
269 OpenSSL
270
271 http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION
272
273 GnuTLS
274
http://www.gnu.org/software/gnutls/manual/html_node/Multi_002dthreaded-applications.html
277
278 NSS
279
280 is claimed to be thread-safe already without anything required
281
282 yassl
283
284 Required actions unknown
285
286 When using multiple threads you should set the CURLOPT_NOSIGNAL option
287 to TRUE for all handles. Everything will or might work fine except that
288 timeouts are not honored during the DNS lookup - which you can work
289 around by building libcurl with c-ares support. c-ares is a library
290 that provides asynchronous name resolves. Unfortunately, c-ares does
291 not yet fully support IPv6. On some platforms, libcurl simply will not
292 function properly multi-threaded unless this option is set.
293
294 Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe.
295
296
298 There will always be times when the transfer fails for some reason. You
299 might have set the wrong libcurl option or misunderstood what the
300 libcurl option actually does, or the remote server might return non-
301 standard replies that confuse the library which then confuses your pro‐
302 gram.
303
There's one golden rule when these things occur: set the
CURLOPT_VERBOSE option to TRUE. It'll cause the library to spew out the
entire protocol details it sends, some internal info and some received
protocol data as well (especially when using FTP). If you're using
HTTP, including the headers in the received output is also a clever way
to get a better understanding of why the server behaves the way it
does. Include headers in the normal body output with CURLOPT_HEADER set
to TRUE.
311
Of course there are bugs left. We need to get to know about them to be
able to fix them, so we're quite dependent on your bug reports! When
you do report suspected bugs in libcurl, please include as many details
as you possibly can: a protocol dump that CURLOPT_VERBOSE produces,
library version, as much as possible of your code that uses libcurl,
operating system name and version, compiler name and version etc.

If CURLOPT_VERBOSE is not enough, you can increase the level of debug
data your application receives by using the CURLOPT_DEBUGFUNCTION.
321
322 Getting some in-depth knowledge about the protocols involved is never
323 wrong, and if you're trying to do funny things, you might very well
324 understand libcurl and how to use it better if you study the appropri‐
325 ate RFC documents at least briefly.
326
327
329 libcurl tries to keep a protocol independent approach to most trans‐
330 fers, thus uploading to a remote FTP site is very similar to uploading
331 data to a HTTP server with a PUT request.
332
Of course, first you either create an easy handle or you re-use an
existing one. Then you set the URL to operate on just like before. This
is the remote URL that we will now upload to.
336
337 Since we write an application, we most likely want libcurl to get the
338 upload data by asking us for it. To make it do that, we set the read
339 callback and the custom pointer libcurl will pass to our read callback.
340 The read callback should have a prototype similar to:
341
size_t function(char *bufptr, size_t size, size_t nitems, void *userp);
344
345 Where bufptr is the pointer to a buffer we fill in with data to upload
346 and size*nitems is the size of the buffer and therefore also the maxi‐
347 mum amount of data we can return to libcurl in this call. The 'userp'
348 pointer is the custom pointer we set to point to a struct of ours to
349 pass private data between the application and the callback.
350
351 curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);
352
353 curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata);
354
355 Tell libcurl that we want to upload:
356
357 curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE);
358
359 A few protocols won't behave properly when uploads are done without any
360 prior knowledge of the expected file size. So, set the upload file size
361 using the CURLOPT_INFILESIZE_LARGE for all known file sizes like
362 this[1]:
363
/* in this example, file_size must be a curl_off_t variable */
curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size);
366
When you call curl_easy_perform(3) this time, it'll perform all the
necessary operations and when it has started the upload it'll call your
supplied callback to get the data to upload. The program should return
as much data as possible in every invocation, as that is likely to make
the upload perform as fast as possible. The callback should return the
number of bytes it wrote in the buffer. Returning 0 will signal the end
of the upload.
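
As an illustration, a read callback that uploads from a memory buffer
could look roughly like this. The 'struct upload_source' type is a
hypothetical struct invented for the example; the code assumes a
pointer to one was set as the custom pointer for the read callback.

```c
#include <string.h>

/* Hypothetical upload source: a memory buffer plus a read position. */
struct upload_source {
  const char *data;
  size_t remaining;
};

/* Matches the read callback prototype. Copies as much as fits into
   'bufptr' and returns the number of bytes written; 0 ends the upload. */
size_t read_function(char *bufptr, size_t size, size_t nitems, void *userp)
{
  struct upload_source *src = userp;
  size_t room = size * nitems;
  size_t chunk = src->remaining < room ? src->remaining : room;

  memcpy(bufptr, src->data, chunk);
  src->data += chunk;
  src->remaining -= chunk;
  return chunk;
}
```

Because the callback always fills the buffer as far as it can, it hands
libcurl as much data as possible in every invocation, and it returns 0
once the source is exhausted.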
374
375
377 Many protocols use or even require that user name and password are pro‐
378 vided to be able to download or upload the data of your choice. libcurl
379 offers several ways to specify them.
380
381 Most protocols support that you specify the name and password in the
382 URL itself. libcurl will detect this and use them accordingly. This is
383 written like this:
384
385 protocol://user:password@example.com/path/
386
387 If you need any odd letters in your user name or password, you should
388 enter them URL encoded, as %XX where XX is a two-digit hexadecimal num‐
389 ber.
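
To make the %XX form concrete, here is a small stand-alone sketch of
such an encoder. The function name percent_encode() is made up for this
example, and the set of characters left unencoded is a conservative
assumption. libcurl also ships its own helper for this purpose, see
curl_easy_escape(3).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of 'in' where every byte except
   letters, digits, '-', '.', '_' and '~' (as classified in the default
   C locale) is written as %XX. The caller frees the result.
   Returns NULL if out of memory. */
char *percent_encode(const char *in)
{
  size_t len = strlen(in);
  char *out = malloc(len * 3 + 1); /* worst case: every byte encoded */
  char *p = out;
  size_t i;

  if(!out)
    return NULL;

  for(i = 0; i < len; i++) {
    unsigned char c = (unsigned char)in[i];
    if(isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~')
      *p++ = (char)c;
    else {
      sprintf(p, "%%%02X", (unsigned int)c);
      p += 3;
    }
  }
  *p = '\0';
  return out;
}
```

With this sketch, a password such as "p@ss:w/rd" becomes
"p%40ss%3Aw%2Frd", which can then be placed in the URL without being
mistaken for URL syntax.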
390
libcurl also provides options to set various passwords. The user name
and password as shown embedded in the URL can instead get set with the
CURLOPT_USERPWD option. The argument passed to libcurl should be a char
* to a string in the format "user:password", in a manner like this:
395
396 curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret");
397
Another case where name and password might be needed at times, is for
those users who need to authenticate themselves to a proxy they use.
libcurl offers another option for this, the CURLOPT_PROXYUSERPWD. It is
used quite similarly to the CURLOPT_USERPWD option, like this:

curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret");
405
406 There's a long time unix "standard" way of storing ftp user names and
407 passwords, namely in the $HOME/.netrc file. The file should be made
408 private so that only the user may read it (see also the "Security Con‐
409 siderations" chapter), as it might contain the password in plain text.
410 libcurl has the ability to use this file to figure out what set of user
411 name and password to use for a particular host. As an extension to the
412 normal functionality, libcurl also supports this file for non-FTP pro‐
413 tocols such as HTTP. To make curl use this file, use the CURLOPT_NETRC
414 option:
415
416 curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE);
417
And a very basic example of how such a .netrc file may look:
419
420 machine myhost.mydomain.com
421 login userlogin
422 password secretword
423
424 All these examples have been cases where the password has been
425 optional, or at least you could leave it out and have libcurl attempt
426 to do its job without it. There are times when the password isn't
427 optional, like when you're using an SSL private key for secure trans‐
428 fers.
429
430 To pass the known private key password to libcurl:
431
432 curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword");
433
434
436 The previous chapter showed how to set user name and password for get‐
437 ting URLs that require authentication. When using the HTTP protocol,
438 there are many different ways a client can provide those credentials to
439 the server and you can control what way libcurl will (attempt to) use.
440 The default HTTP authentication method is called 'Basic', which is
441 sending the name and password in clear-text in the HTTP request,
442 base64-encoded. This is insecure.
443
444 At the time of this writing libcurl can be built to use: Basic, Digest,
445 NTLM, Negotiate, GSS-Negotiate and SPNEGO. You can tell libcurl which
446 one to use with CURLOPT_HTTPAUTH as in:
447
448 curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);
449
450 And when you send authentication to a proxy, you can also set authenti‐
451 cation type the same way but instead with CURLOPT_PROXYAUTH:
452
453 curl_easy_setopt(easyhandle, CURLOPT_PROXYAUTH, CURLAUTH_NTLM);
454
455 Both these options allow you to set multiple types (by ORing them
456 together), to make libcurl pick the most secure one out of the types
457 the server/proxy claims to support. This method does however add a
458 round-trip since libcurl must first ask the server what it supports:
459
460 curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH,
461 CURLAUTH_DIGEST|CURLAUTH_BASIC);
462
463 For convenience, you can use the 'CURLAUTH_ANY' define (instead of a
464 list with specific types) which allows libcurl to use whatever method
465 it wants.
466
467 When asking for multiple types, libcurl will pick the available one it
468 considers "best" in its own internal order of preference.
469
470
We get many questions regarding how to issue HTTP POSTs with libcurl
the proper way. This chapter will thus include examples using both of
the different versions of HTTP POST that libcurl supports.

The first version is the simple POST, the most common version, that
most HTML pages using the <form> tag use. We provide a pointer to the
data and tell libcurl to post it all to the remote site:
479
const char *data = "name=daniel&project=curl";
curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data);
curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/");

curl_easy_perform(easyhandle); /* post away! */
485
Simple enough, huh? Since you set the POST options with the
CURLOPT_POSTFIELDS, this automatically switches the handle to use POST
in the upcoming request.

Ok, so what if you want to post binary data that also requires you to
set the Content-Type: header of the post? Well, binary posts prevent
libcurl from being able to do strlen() on the data to figure out the
size, so we must tell libcurl the size of the post data. Setting
headers in libcurl requests is done in a generic way, by building a
list of our own headers and then passing that list to libcurl.
496
struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Content-Type: text/xml");

/* post binary data */
curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr);

/* set the size of the postfields data, passed as a long */
curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23L);

/* pass our list of custom made headers */
curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

curl_easy_perform(easyhandle); /* post away! */

curl_slist_free_all(headers); /* free the header list */
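
The reason the size must be given explicitly can be shown with a tiny
stand-alone sketch. The array 'binary_post' and both function names are
made up for this illustration.

```c
#include <string.h>

/* Eight bytes of binary post data with zero bytes in the middle. */
static const char binary_post[8] = { 'a', 'b', 'c', 0, 'd', 'e', 0, 'f' };

/* What a strlen()-based size would claim: counting stops at the first
   zero byte. */
size_t size_by_strlen(void)
{
  return strlen(binary_post);
}

/* The real size of the data, which is the value the application has
   to tell libcurl. */
size_t size_by_sizeof(void)
{
  return sizeof(binary_post);
}
```

Here strlen() reports 3 while the data really is 8 bytes long, so a
post relying on strlen() would be cut short. The real size is what
should be passed, cast to a long, with CURLOPT_POSTFIELDSIZE.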
512
While the simple examples above cover the majority of all cases where
HTTP POST operations are required, they don't do multi-part formposts.
Multi-part formposts were introduced as a better way to post (possibly
large) binary data and were first documented in RFC 1867. They're
called multi-part because they're built by a chain of parts, each being
a single unit. Each part has its own name and contents. You can in fact
create and post a multi-part formpost with the regular libcurl POST
support described above, but that would require that you build the
formpost yourself and provide it to libcurl. To make that easier,
libcurl provides curl_formadd(3). Using this function, you add parts to
the form. When you're done adding parts, you post the whole form.
524
The following example sets two simple text parts with plain textual
contents, and then a file with binary contents, and uploads the whole
thing.
528
struct curl_httppost *post = NULL;
struct curl_httppost *last = NULL;
curl_formadd(&post, &last,
             CURLFORM_COPYNAME, "name",
             CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END);
curl_formadd(&post, &last,
             CURLFORM_COPYNAME, "project",
             CURLFORM_COPYCONTENTS, "curl", CURLFORM_END);
curl_formadd(&post, &last,
             CURLFORM_COPYNAME, "logotype-image",
             CURLFORM_FILECONTENT, "curl.png", CURLFORM_END);

/* Set the form info */
curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post);

curl_easy_perform(easyhandle); /* post away! */

/* free the post data again */
curl_formfree(post);
548
Multipart formposts are chains of parts using MIME-style separators and
headers. It means that each one of these separate parts gets a few
headers set that describe the individual content-type, size etc. To
enable your application to handicraft this formpost even more, libcurl
allows you to supply your own set of custom headers to such an
individual form part. You can of course supply headers to as many parts
as you like, but this little example will show how you set headers to
one specific part when you add that to the post handle:
557
struct curl_slist *headers = NULL;
headers = curl_slist_append(headers, "Content-Type: text/xml");

curl_formadd(&post, &last,
             CURLFORM_COPYNAME, "logotype-image",
             CURLFORM_FILECONTENT, "curl.xml",
             CURLFORM_CONTENTHEADER, headers,
             CURLFORM_END);

curl_easy_perform(easyhandle); /* post away! */

curl_formfree(post); /* free post */
curl_slist_free_all(headers); /* free custom header list */
571
Since all options on an easyhandle are "sticky", they remain the same
until changed even if you do call curl_easy_perform(3). You may
therefore need to tell curl to go back to a plain GET request if you
intend to do one as your next request. You force an easyhandle to go
back to GET by using the CURLOPT_HTTPGET option:
577
578 curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, TRUE);
579
580 Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl
581 from doing a POST. It will just make it POST without any data to send!
582
583
For historical and traditional reasons, libcurl has a built-in progress
meter that can be switched on to make it present a progress meter in
your terminal.

Switch on the progress meter by, oddly enough, setting
CURLOPT_NOPROGRESS to FALSE. This option is set to TRUE by default.
591
For most applications however, the built-in progress meter is useless
and what instead is interesting is the ability to specify a progress
callback. The function pointer you pass to libcurl will then be called
at irregular intervals with information about the current transfer.

Set the progress callback by using CURLOPT_PROGRESSFUNCTION and pass a
pointer to a function that matches this prototype:
599
int progress_callback(void *clientp,
                      double dltotal,
                      double dlnow,
                      double ultotal,
                      double ulnow);
605
If any of the input arguments is unknown, a 0 will be passed. The first
argument, the 'clientp', is the pointer you pass to libcurl with
CURLOPT_PROGRESSDATA. libcurl won't touch it.
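
A minimal sketch of such a callback follows. The 'struct
progress_state' type is a hypothetical application struct invented for
this example, assumed to be the pointer set with CURLOPT_PROGRESSDATA.
The sketch also relies on libcurl aborting the transfer when the
callback returns a non-zero value.

```c
/* Hypothetical application state handed over with CURLOPT_PROGRESSDATA. */
struct progress_state {
  int cancel_requested; /* set to non-zero by the application to stop */
  double last_percent;  /* most recent download percentage, or -1 */
};

int progress_callback(void *clientp,
                      double dltotal, double dlnow,
                      double ultotal, double ulnow)
{
  struct progress_state *state = clientp;

  (void)ultotal; /* upload figures are ignored in this sketch */
  (void)ulnow;

  /* a total of 0 means "unknown", so no percentage can be computed */
  state->last_percent = dltotal > 0 ? dlnow * 100.0 / dltotal : -1.0;

  /* returning non-zero asks libcurl to abort the transfer */
  return state->cancel_requested ? 1 : 0;
}
```

Keeping the state in a struct of your own means the callback needs no
global variables, and another part of the program can request a stop
simply by setting the flag.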
609
610
612 There's basically only one thing to keep in mind when using C++ instead
613 of C when interfacing libcurl:
614
615 The callbacks CANNOT be non-static class member functions
616
617 Example C++ code:
618
class AClass {
public:
  static size_t write_data(void *ptr, size_t size, size_t nmemb,
                           void *ourpointer)
  {
    /* do what you want with the data */
    return size * nmemb; /* report every byte as taken care of */
  }
};
626
627
629 What "proxy" means according to Merriam-Webster: "a person authorized
630 to act for another" but also "the agency, function, or office of a
631 deputy who acts as a substitute for another".
632
633 Proxies are exceedingly common these days. Companies often only offer
634 Internet access to employees through their proxies. Network clients or
635 user-agents ask the proxy for documents, the proxy does the actual
636 request and then it returns them.
637
638 libcurl supports SOCKS and HTTP proxies. When a given URL is wanted,
639 libcurl will ask the proxy for it instead of trying to connect to the
640 actual host identified in the URL.
641
642 If you're using a SOCKS proxy, you may find that libcurl doesn't quite
643 support all operations through it.
644
For HTTP proxies: the fact that the proxy is a HTTP proxy puts certain
restrictions on what can actually happen. A requested URL that might
not be a HTTP URL will still be passed to the HTTP proxy to deliver
back to libcurl. This happens transparently, and an application may not
need to know. I say "may", because at times it is very important to
understand that all operations over a HTTP proxy use the HTTP protocol.
For example, you can't invoke your own custom FTP commands or even
proper FTP directory listings.
653
654
655 Proxy Options
656
657 To tell libcurl to use a proxy at a given port number:
658
curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");

Some proxies require user authentication before allowing a
request, and you pass that information similar to this:

curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");

If you want to, you can specify the host name only in the
CURLOPT_PROXY option, and set the port number separately with
CURLOPT_PROXYPORT.

Tell libcurl what kind of proxy it is with CURLOPT_PROXYTYPE (if
not, it will default to assume a HTTP proxy):

curl_easy_setopt(easyhandle, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS4);
677
678
679 Environment Variables
680
libcurl automatically checks and uses a set of environment
variables to know what proxies to use for certain protocols. The
names of the variables follow an ancient de facto standard and
are built up as "[protocol]_proxy" (note the lower casing). This
makes the variable 'http_proxy' the one checked for HTTP URLs.
Following the same rule, the variable named 'ftp_proxy' is
checked for FTP URLs. Again, the proxies are always HTTP proxies;
the different names of the variables simply allow different HTTP
proxies to be used.
689
The proxy environment variable contents should be in the format
"[protocol://][user:password@]machine[:port]", where the
protocol:// part is simply ignored if present (so http://proxy and
bluerk://proxy will do the same) and the optional port number
specifies on which port the proxy operates on the host. If not
specified, the internal default port number will be used and
that is most likely *not* the one you would like it to be.
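
To illustrate the format, the sketch below splits such a string into
its parts. It is not libcurl's own parser: the 'struct proxy_parts'
type and the function parse_proxy_string() are made up for this
example, and cases such as numerical IPv6 addresses are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct proxy_parts {
  char userpwd[128]; /* "user:password", or "" when absent */
  char host[128];
  long port;         /* 0 when no port was given */
};

/* Splits "[protocol://][user:password@]machine[:port]".
   Returns 0 on success, -1 if a part is missing or does not fit. */
int parse_proxy_string(const char *value, struct proxy_parts *out)
{
  const char *p = strstr(value, "://");
  const char *at;
  const char *colon;
  size_t n;

  if(p)
    value = p + 3; /* the protocol part is simply ignored */

  out->userpwd[0] = '\0';
  out->port = 0;

  at = strrchr(value, '@');
  if(at) {
    n = (size_t)(at - value);
    if(n >= sizeof(out->userpwd))
      return -1;
    memcpy(out->userpwd, value, n);
    out->userpwd[n] = '\0';
    value = at + 1;
  }

  colon = strchr(value, ':');
  n = colon ? (size_t)(colon - value) : strlen(value);
  if(n == 0 || n >= sizeof(out->host))
    return -1;
  memcpy(out->host, value, n);
  out->host[n] = '\0';

  if(colon)
    out->port = strtol(colon + 1, NULL, 10);
  return 0;
}
```

A port of 0 in the result means that none was given, which is the case
where the internal default would be used.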
697
698 There are two special environment variables. 'all_proxy' is what
699 sets proxy for any URL in case the protocol specific variable
700 wasn't set, and 'no_proxy' defines a list of hosts that should
701 not use a proxy even though a variable may say so. If 'no_proxy'
702 is a plain asterisk ("*") it matches all hosts.
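
The 'no_proxy' check can be pictured with the simplified sketch below.
It assumes the list is comma-separated and only handles exact host
name matches plus the plain asterisk. The function name bypass_proxy()
is made up for this example, and libcurl's own matching rules may well
differ from this.

```c
#include <string.h>

/* Returns 1 if 'host' should bypass the proxy according to 'no_proxy',
   assumed here to be a comma-separated list of host names.
   Only exact, case-sensitive matches and the plain "*" are handled. */
int bypass_proxy(const char *no_proxy, const char *host)
{
  size_t hostlen = strlen(host);

  if(!no_proxy)
    return 0;
  if(strcmp(no_proxy, "*") == 0)
    return 1; /* a plain asterisk matches all hosts */

  while(*no_proxy) {
    size_t n = strcspn(no_proxy, ", ");
    if(n == hostlen && strncmp(no_proxy, host, n) == 0)
      return 1;
    no_proxy += n;
    no_proxy += strspn(no_proxy, ", "); /* skip separators */
  }
  return 0;
}
```

An application that wants to predict whether a proxy would be used
could pass the result of getenv("no_proxy") as the first argument.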
703
704 To explicitly disable libcurl's checking for and using the proxy
705 environment variables, set the proxy name to "" - an empty
706 string - with CURLOPT_PROXY.
707
708 SSL and Proxies
709
SSL is for secure point-to-point connections. This involves
strong encryption and similar things, which effectively makes it
impossible for a proxy to operate as a "man in between", which
is the proxy's task as previously discussed. Instead, the only
way to have SSL work over a HTTP proxy is to ask the proxy to
tunnel through everything without being able to check or fiddle
with the traffic.

Opening an SSL connection over a HTTP proxy is therefore a matter
of asking the proxy for a straight connection to the target host
on a specified port. This is made with the HTTP request CONNECT.
("please mr proxy, connect me to that remote host").
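
To give an idea of what such a request looks like on the wire, the
sketch below formats a bare-bones CONNECT request. It is only an
illustration with a made-up function name, format_connect_request();
the request libcurl actually sends may contain further headers.

```c
#include <stdio.h>

/* Formats a CONNECT request for host:port into 'buf'.
   Returns the number of characters written, or -1 if 'buf' is too small. */
int format_connect_request(char *buf, size_t buflen,
                           const char *host, int port)
{
  int n = snprintf(buf, buflen,
                   "CONNECT %s:%d HTTP/1.1\r\n"
                   "Host: %s:%d\r\n"
                   "\r\n",
                   host, port, host, port);

  if(n < 0 || (size_t)n >= buflen)
    return -1;
  return n;
}
```

If the proxy accepts the request, everything sent on the connection
afterwards is passed on to the target host untouched, which is what
lets the SSL handshake take place end to end.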
722
Because of the nature of this operation, where the proxy has no
idea what kind of data is passed in and out through this
tunnel, this breaks some of the very few advantages that come
from using a proxy, such as caching. Many organizations prevent
this kind of tunneling to destination port numbers other than
443 (which is the default HTTPS port number).
729
730
731 Tunneling Through Proxy
As explained above, tunneling is required for SSL to work, and it
is often even restricted to the operation intended for SSL: HTTPS.
734
735 This is however not the only time proxy-tunneling might offer
736 benefits to you or your application.
737
As tunneling opens a direct connection from your application to
the remote machine, it suddenly also re-introduces the ability
to do non-HTTP operations over an HTTP proxy. You can in fact use
things such as FTP upload or FTP custom commands this way.
742
743 Again, this is often prevented by the administrators of proxies
744 and is rarely allowed.
745
746 Tell libcurl to use proxy tunneling like this:
747
curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, 1L);
749
In fact, there might even be times when you want to do plain
HTTP operations using a tunnel like this, as it then enables you
to operate on the remote server instead of asking the proxy to
do so. libcurl will not stand in the way of such innovative
actions either!
755
756
757 Proxy Auto-Config
758
Netscape first came up with this. It is basically a web page
(usually using a .pac extension) with a javascript that, when
executed by the browser with the requested URL as input, returns
information to the browser on how to connect to the URL. The
returned information might be "DIRECT" (which means no proxy
should be used), "PROXY host:port" (to tell the browser where
the proxy for this particular URL is) or "SOCKS host:port" (to
direct the browser to a SOCKS proxy).
767
libcurl has no means to interpret or evaluate javascript and
thus it doesn't support this. If you get yourself in a position
where you face this nasty invention, the following advice has
been mentioned and used in the past:

- Depending on the javascript complexity, write up a script that
translates it to another language and execute that.

- Read the javascript code and rewrite the same logic in another
language.

- Implement a javascript interpreter; people have successfully
used the Mozilla javascript engine in the past.

- Ask your admins to stop this, for a static proxy setup or
similar.
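
As an illustration of the "rewrite the same logic in another
language" advice, here is a sketch in C. The PAC script shown in the
comment, the host names and the helper name proxy_for_host are all
invented for this example; a real .pac file will have its own rules.

```c
#include <string.h>

/* Hand-translated version of a hypothetical PAC script:
 *
 *   function FindProxyForURL(url, host) {
 *     if (dnsDomainIs(host, ".example.internal")) return "DIRECT";
 *     return "PROXY proxy.example.com:8080";
 *   }
 *
 * Returns a string meant for CURLOPT_PROXY. The empty string stands
 * for "DIRECT", since an empty proxy name makes libcurl use no proxy.
 * The comparison is case-sensitive, which is a simplification. */
static const char *proxy_for_host(const char *host)
{
  static const char suffix[] = ".example.internal";
  size_t hostlen = strlen(host);
  size_t suffixlen = sizeof(suffix) - 1;

  if(hostlen >= suffixlen &&
     strcmp(host + hostlen - suffixlen, suffix) == 0)
    return "";                       /* DIRECT */

  return "proxy.example.com:8080";   /* PROXY host:port */
}
```

The result could then be handed to libcurl with
curl_easy_setopt(easyhandle, CURLOPT_PROXY, proxy_for_host(host)).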
784
785
787 Re-cycling the same easy handle several times when doing multiple
788 requests is the way to go.
789
790 After each single curl_easy_perform(3) operation, libcurl will keep the
791 connection alive and open. A subsequent request using the same easy
792 handle to the same host might just be able to use the already open con‐
793 nection! This reduces network impact a lot.
794
Even if the connection is dropped, subsequent SSL connections to the
same host will benefit from libcurl's session ID cache, which
drastically reduces re-connection time.
798
FTP connections that are kept alive save a lot of time, as the
command-response round-trips are skipped, and you also don't risk
being refused when you try to log in again, as can happen on the
many FTP servers that only allow N persons to be logged in at the
same time.
803
804 libcurl caches DNS name resolving results, to make lookups of a previ‐
805 ously looked up name a lot faster.
806
807 Other interesting details that improve performance for subsequent
808 requests may also be added in the future.
809
Each easy handle will attempt to keep the last few connections alive
for a while in case they are to be used again. You can set the size of
this "cache" with the CURLOPT_MAXCONNECTS option. Default is 5. There
is very seldom any point in changing this value, and if you think of
changing this it is often just a matter of thinking again.
815
To force your upcoming request to not use an already existing
connection (it will even close one first if there happens to be one
alive to the same host you're about to operate on), set
CURLOPT_FRESH_CONNECT to TRUE. In a similar spirit, you can also
forbid the connection used by the upcoming request from "lying"
around and possibly getting re-used after the request, by setting
CURLOPT_FORBID_REUSE to TRUE.
822
823
When you use libcurl to do HTTP requests, it'll pass along a series of
headers automatically. It might be good for you to know and understand
these. You can replace or remove them by using the
CURLOPT_HTTPHEADER option.
829
830
831 Host This header is required by HTTP 1.1 and even many 1.0 servers
832 and should be the name of the server we want to talk to. This
833 includes the port number if anything but default.
834
835
836 Pragma "no-cache". Tells a possible proxy to not grab a copy from the
837 cache but to fetch a fresh one.
838
839
840 Accept "*/*".
841
842
Expect When doing POST requests, libcurl sets this header to
"100-continue" to ask the server for an "OK" message before it
proceeds with sending the data part of the post. If the POSTed
data amount is deemed "small", libcurl will not use this header.
847
848
There is an ongoing development today where more and more protocols are
built upon HTTP for transport. This has obvious benefits as HTTP is a
tested and reliable protocol that is widely deployed and has excellent
proxy support.
854
855 When you use one of these protocols, and even when doing other kinds of
856 programming you may need to change the traditional HTTP (or FTP or...)
857 manners. You may need to change words, headers or various data.
858
859 libcurl is your friend here too.
860
861
862 CUSTOMREQUEST
863 If just changing the actual HTTP request keyword is what you
864 want, like when GET, HEAD or POST is not good enough for you,
865 CURLOPT_CUSTOMREQUEST is there for you. It is very simple to
866 use:
867
curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST");
870
When using the custom request, you change the request keyword of
the actual request you are performing. Thus, by default you make
a GET request, but you can also make a POST operation (as
described before) and then replace the POST keyword if you want
to. You're the boss.
876
877
878 Modify Headers
HTTP-like protocols pass a series of headers to the server when
doing the request, and you're free to pass any number of extra
headers that you think fit. Adding headers is this easy:
882
883 struct curl_slist *headers=NULL; /* init to NULL is important */
884
885 headers = curl_slist_append(headers, "Hey-server-hey: how are you?");
886 headers = curl_slist_append(headers, "X-silly-content: yes");
887
888 /* pass our list of custom made headers */
889 curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);
890
891 curl_easy_perform(easyhandle); /* transfer http */
892
893 curl_slist_free_all(headers); /* free the header list */
894
895 ... and if you think some of the internally generated headers,
896 such as Accept: or Host: don't contain the data you want them to
897 contain, you can replace them by simply setting them too:
898
899 headers = curl_slist_append(headers, "Accept: Agent-007");
900 headers = curl_slist_append(headers, "Host: munged.host.line");
901
902
903 Delete Headers
If you replace an existing header with one with no contents, you
will prevent the header from being sent. For example, if you want
to completely prevent the "Accept:" header from being sent, you
can disable it with code similar to this:
908
909 headers = curl_slist_append(headers, "Accept:");
910
911 Both replacing and canceling internal headers should be done
912 with careful consideration and you should be aware that you may
913 violate the HTTP protocol when doing so.
914
915
916 Enforcing chunked transfer-encoding
917
By making sure a request uses the custom header
"Transfer-Encoding: chunked" when doing a non-GET HTTP operation,
libcurl will switch over to "chunked" upload, even though the size
of the data to upload might be known. By default, libcurl usually
switches over to chunked upload automatically if the upload data
size is unknown.
924
925
926 HTTP Version
927
All HTTP requests include the version number to tell the server
which version we support. libcurl speaks HTTP 1.1 by default.
Some very old servers don't like getting 1.1-requests and when
dealing with stubborn old things like that, you can tell libcurl
to use 1.0 instead by doing something like this:
933
934 curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION,
935 CURL_HTTP_VERSION_1_0);
936
937
938 FTP Custom Commands
939
Not all protocols are HTTP-like, and thus the above may not help
you when you want to make, for example, your FTP transfers
behave differently.

Sending custom commands to an FTP server means that you need to
send the commands exactly as the FTP server expects them (RFC959
is a good guide here), and you can only use commands that work
on the control-connection alone. All kinds of commands that
require data interchange and thus need a data-connection must
be left to libcurl's own judgment. Also be aware that libcurl
will do its very best to change directory to the target
directory before doing any transfer, so if you change directory
(with CWD or similar) you might confuse libcurl and then it might
not attempt to transfer the file in the correct remote directory.
954
955 A little example that deletes a given file before an operation:
956
struct curl_slist *headers = NULL; /* start with an empty list */

headers = curl_slist_append(headers, "DELE file-to-remove");
958
959 /* pass the list of custom commands to the handle */
960 curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers);
961
962 curl_easy_perform(easyhandle); /* transfer ftp data! */
963
964 curl_slist_free_all(headers); /* free the header list */
965
966 If you would instead want this operation (or chain of opera‐
967 tions) to happen _after_ the data transfer took place the option
968 to curl_easy_setopt(3) would instead be called CURLOPT_POSTQUOTE
969 and used the exact same way.
970
The custom FTP commands will be issued to the server in the same
order they are added to the list, and if a command gets an error
code returned back from the server, no more commands will be
issued and libcurl will bail out with an error code
(CURLE_FTP_QUOTE_ERROR). Note that if you use CURLOPT_QUOTE to
send commands before a transfer, no transfer will actually take
place when a quote command has failed.
978
If you set CURLOPT_HEADER to TRUE, you will tell libcurl to
get information about the target file and output "headers" about
it. The headers will be in "HTTP-style", looking like they do in
HTTP.
983
984 The option to enable headers or to run custom FTP commands may
985 be useful to combine with CURLOPT_NOBODY. If this option is set,
986 no actual file content transfer will be performed.
987
988
989 FTP Custom CUSTOMREQUEST
If you do want to list the contents of an FTP directory using
your own defined FTP command, CURLOPT_CUSTOMREQUEST will do just
that. "NLST" is the default one for listing directories but
you're free to pass in your idea of a good alternative.
994
995
997 In the HTTP sense, a cookie is a name with an associated value. A
998 server sends the name and value to the client, and expects it to get
999 sent back on every subsequent request to the server that matches the
1000 particular conditions set. The conditions include that the domain name
1001 and path match and that the cookie hasn't become too old.
1002
In real-world cases, servers send new cookies to replace existing
ones to update them. Servers use cookies to "track" users and to
keep "sessions".
1006
1007 Cookies are sent from server to clients with the header Set-Cookie: and
1008 they're sent from clients to servers with the Cookie: header.
1009
To just send whatever cookie you want to a server, you can use
CURLOPT_COOKIE to set a cookie string like this:

curl_easy_setopt(easyhandle, CURLOPT_COOKIE, "name1=var1; name2=var2;");
1015
In many cases, that is not enough. You might want to dynamically save
whatever cookies the remote server passes to you, and make sure those
cookies are then used accordingly on later requests.
1019
One way to do this is to save all headers you receive in a plain file
and, when you make a request, tell libcurl to read the previous
headers to figure out which cookies to use. Set the header file to
read cookies from with CURLOPT_COOKIEFILE.
1024
The CURLOPT_COOKIEFILE option also automatically enables the cookie
parser in libcurl. Until the cookie parser is enabled, libcurl will not
parse or understand incoming cookies and they will just be ignored.
However, when the parser is enabled the cookies will be understood,
kept in memory and used properly in subsequent requests when the same
handle is used. Many times this is enough, and you may not have to
save the cookies to disk at all. Note that the file you specify to
CURLOPT_COOKIEFILE doesn't have to exist to enable the parser, so a
common way to just enable the parser and not read any cookies is to
use a file name you know doesn't exist.
1035
If you would rather use existing cookies that you've previously
received with your Netscape or Mozilla browsers, you can make libcurl
use that cookie file as input. CURLOPT_COOKIEFILE is used for that
too, as libcurl will automatically find out what kind of file it is
and act accordingly.
1041
Perhaps the most advanced cookie operation libcurl offers is saving
the entire internal cookie state back into a Netscape/Mozilla formatted
cookie file. We call that the cookie-jar. When you set a file name with
CURLOPT_COOKIEJAR, that file will be created and all received
cookies will be stored in it when curl_easy_cleanup(3) is called. This
enables cookies to get passed on properly between multiple handles
without any information getting lost.
1049
1050
FTP transfers use a second TCP/IP connection for the data transfer.
This is usually a fact you can forget and ignore but at times this fact
will come back to haunt you. libcurl offers several different ways to
customize how the second connection is being made.
1056
1057 libcurl can either connect to the server a second time or tell the
1058 server to connect back to it. The first option is the default and it is
1059 also what works best for all the people behind firewalls, NATs or IP-
1060 masquerading setups. libcurl then tells the server to open up a new
1061 port and wait for a second connection. This is by default attempted
1062 with EPSV first, and if that doesn't work it tries PASV instead. (EPSV
1063 is an extension to the original FTP spec and does not exist nor work on
1064 all FTP servers.)
1065
1066 You can prevent libcurl from first trying the EPSV command by setting
1067 CURLOPT_FTP_USE_EPSV to FALSE.
1068
1069 In some cases, you will prefer to have the server connect back to you
1070 for the second connection. This might be when the server is perhaps
1071 behind a firewall or something and only allows connections on a single
1072 port. libcurl then informs the remote server which IP address and port
1073 number to connect to. This is made with the CURLOPT_FTPPORT option. If
1074 you set it to "-", libcurl will use your system's "default IP address".
1075 If you want to use a particular IP, you can set the full IP address, a
1076 host name to resolve to an IP address or even a local network interface
1077 name that libcurl will get the IP address from.
1078
1079 When doing the "PORT" approach, libcurl will attempt to use the EPRT
1080 and the LPRT before trying PORT, as they work with more protocols. You
1081 can disable this behavior by setting CURLOPT_FTP_USE_EPRT to FALSE.
1082
1083
Some protocols provide "headers", meta-data separated from the normal
data. These headers are by default not included in the normal data
stream, but you can make them appear in the data stream by setting
CURLOPT_HEADER to TRUE.
1089
1090 What might be even more useful, is libcurl's ability to separate the
1091 headers from the data and thus make the callbacks differ. You can for
1092 example set a different pointer to pass to the ordinary write callback
1093 by setting CURLOPT_WRITEHEADER.
1094
1095 Or, you can set an entirely separate function to receive the headers,
1096 by using CURLOPT_HEADERFUNCTION.
1097
1098 The headers are passed to the callback function one by one, and you can
1099 depend on that fact. It makes it easier for you to add custom header
1100 parsers etc.
1101
1102 "Headers" for FTP transfers equal all the FTP server responses. They
1103 aren't actually true headers, but in this case we pretend they are! ;-)
1104
1105
1107 [ curl_easy_getinfo ]
1108
1109
1111 libcurl is in itself not insecure. If used the right way, you can use
1112 libcurl to transfer data pretty safely.
1113
1114 There are of course many things to consider that may loosen up this
1115 situation:
1116
1117
1118 Command Lines
If you use a command line tool (such as curl) that uses libcurl,
and you give options to the tool on the command line, those
options can very likely get read by other users of your system
when they use 'ps' or other tools to list currently running
processes.
1124
1125 To avoid this problem, never feed sensitive things to programs
1126 using command line options.
1127
1128
.netrc .netrc is a pretty handy file/feature that allows you to log
in quickly and automatically to frequently visited sites. The file
contains passwords in clear text and is a real security risk. In
some cases, your .netrc is also stored in a home directory that
is NFS mounted or used on another network based file system, so
the clear text password will fly through your network every time
anyone reads that file!
1136
1137 To avoid this problem, don't use .netrc files and never store
1138 passwords in plain text anywhere.
1139
1140
1141 Clear Text Passwords
Many of the protocols libcurl supports send name and password
unencrypted as clear text (HTTP Basic authentication, FTP,
TELNET etc). It is very easy for anyone on your network, or a
network near yours, to just fire up a network analyzer tool and
eavesdrop on your passwords. Don't let the fact that HTTP Basic
uses base64 encoded passwords fool you. They may not look readable
at first glance, but they are very easily "deciphered" by
anyone within seconds.
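
To show how little protection base64 gives, here is a sketch of a
decoder in plain C. The function name base64_decode is invented for
this example, and the test string in the usage note is a commonly
used example credential pair, not something taken from libcurl.

```c
#include <stddef.h>
#include <string.h>

/* Decode standard base64 from `in` into `out` (of size `outsize`).
   Returns the number of decoded bytes, or (size_t)-1 on bad input or
   lack of space. The output is also zero terminated. */
static size_t base64_decode(const char *in, char *out, size_t outsize)
{
  static const char table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  unsigned long bits = 0;   /* bits collected but not yet output */
  int nbits = 0;            /* how many bits `bits` currently holds */
  size_t n = 0;

  if(outsize == 0)
    return (size_t)-1;

  for(; *in && *in != '='; in++) {
    const char *p = strchr(table, *in);
    if(!p)
      return (size_t)-1;             /* not a base64 character */
    bits = (bits << 6) | (unsigned long)(p - table);
    nbits += 6;
    if(nbits >= 8) {
      nbits -= 8;
      if(n + 1 >= outsize)
        return (size_t)-1;           /* no room for byte + terminator */
      out[n++] = (char)((bits >> nbits) & 0xff);
      bits &= (1UL << nbits) - 1;    /* keep only the leftover bits */
    }
  }
  out[n] = '\0';
  return n;
}
```

Feeding it "QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the kind of string found
after "Authorization: Basic", yields "Aladdin:open sesame".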
1150
To avoid this problem, use protocols that don't let snoopers see
your password: HTTPS, FTPS and FTP-kerberos are a few examples.
HTTP Digest authentication avoids sending the password in clear
text too, and can be selected with CURLOPT_HTTPAUTH.
1155
1156
1157 Showing What You Do
On a related issue, be aware that even in situations like when
you have problems with libcurl and ask someone for help,
everything you reveal in order to get the best possible help might
also impose certain security related risks. Host names, user names,
paths, operating system specifics etc (not to mention passwords
of course) may in fact be used by intruders to gain additional
information about a potential target.
1165
1166 To avoid this problem, you must of course use your common sense.
1167 Often, you can just edit out the sensitive data or just
1168 search/replace your true information with faked data.
1169
1170
The easy interface as described in detail in this document is a
synchronous interface that transfers one file at a time and doesn't
return until it's done.
1175
1176 The multi interface on the other hand, allows your program to transfer
1177 multiple files in both directions at the same time, without forcing you
1178 to use multiple threads.
1179
To use this interface, you are better off if you first understand the
basics of how to use the easy interface. The multi interface is simply
a way to make multiple transfers at the same time, by adding up
multiple easy handles into a "multi stack".
1184
1185 You create the easy handles you want and you set all the options just
1186 like you have been told above, and then you create a multi handle with
1187 curl_multi_init(3) and add all those easy handles to that multi handle
1188 with curl_multi_add_handle(3).
1189
When you've added the handles you have for the moment (you can still
add new ones at any time), you start the transfers by calling
curl_multi_perform(3).
1193
curl_multi_perform(3) is asynchronous. It will only execute as little
as possible and then return control to your program. It is
designed to never block. If it returns CURLM_CALL_MULTI_PERFORM you
had better call it again soon, as that is a signal that it still has
local data to send or remote data to receive.
1199
The best usage of this interface is when you do a select() on all
possible file descriptors or sockets to know when to call libcurl
again. This also makes it easy for you to wait and respond to actions
on your own application's sockets/handles. You figure out what to
select() for by using curl_multi_fdset(3), which fills in a set of
fd_set variables for you with the particular file descriptors libcurl
uses for the moment.
1207
When you then call select(), it'll return when one of the file handles
signals action and you then call curl_multi_perform(3) to allow libcurl
to do what it wants to do. Take note that libcurl also features
some time-out code, so we advise you to never use very long timeouts on
select() before you call curl_multi_perform(3), which thus should be
called unconditionally every now and then even if none of its file
descriptors have signaled ready. Another precaution you should use:
always call curl_multi_fdset(3) immediately before the select() call
since the current set of file descriptors may change when calling a
curl function.
1218
1219 If you want to stop the transfer of one of the easy handles in the
1220 stack, you can use curl_multi_remove_handle(3) to remove individual
1221 easy handles. Remember that easy handles should be
1222 curl_easy_cleanup(3)ed.
1223
1224 When a transfer within the multi stack has finished, the counter of
1225 running transfers (as filled in by curl_multi_perform(3)) will
1226 decrease. When the number reaches zero, all transfers are done.
1227
1228 curl_multi_info_read(3) can be used to get information about completed
1229 transfers. It then returns the CURLcode for each easy transfer, to
1230 allow you to figure out success on each individual transfer.
1231
1232
1234 [ seeding, passwords, keys, certificates, ENGINE, ca certs ]
1235
1236
1238 [ fill in ]
1239
1240
[1] libcurl 7.10.3 and later have the ability to switch over to
chunked Transfer-Encoding in cases where HTTP uploads are done
with data of an unknown size.
1245
1246 [2] This happens on Windows machines when libcurl is built and used
1247 as a DLL. However, you can still do this on Windows if you link
1248 with a static library.
1249
[3] The curl-config tool is generated at build-time (on unix-like
systems) and should be installed with the 'make install' or
similar instruction that installs the library, header files, man
pages etc.
1254
1255
1256
1257libcurl 27 Feb 2007 libcurl-tutorial(3)