Cache::FastMmap(3)    User Contributed Perl Documentation   Cache::FastMmap(3)

NAME
       Cache::FastMmap - Uses an mmap'ed file to act as a shared memory
       interprocess cache

SYNOPSIS
         use Cache::FastMmap;

         # Uses vaguely sane defaults
         $Cache = Cache::FastMmap->new();

         # $Value must be a reference...
         $Cache->set($Key, $Value);
         $Value = $Cache->get($Key);

         $Cache = Cache::FastMmap->new(raw_values => 1);

         # $Value can't be a reference...
         $Cache->set($Key, $Value);
         $Value = $Cache->get($Key);

ABSTRACT
       A shared memory cache through an mmap'ed file. Its core is written in
       C for performance. It uses fcntl locking to ensure multiple processes
       can safely access the cache at the same time. It uses a basic LRU
       algorithm to keep the most used entries in the cache.

DESCRIPTION
       In multi-process environments (e.g. mod_perl, forking daemons), it's
       common to want to cache information, but have that cache shared between
       processes. Many solutions already exist, and may suit your situation
       better:

       ·   MLDBM::Sync - acts as a database, data is not automatically
           expired, slow

       ·   IPC::MM - hash implementation is broken, data is not automatically
           expired, slow

       ·   Cache::FileCache - lots of features, slow

       ·   Cache::SharedMemoryCache - lots of features, VERY slow. Uses
           IPC::ShareLite, which freezes/thaws ALL data at each read/write

       ·   DBI - use your favourite RDBMS. Can perform well, but needs a DB
           server running. Very global. Socket connection latency

       ·   Cache::Mmap - similar to this module, but in pure Perl. Slows down
           with larger pages

       ·   BerkeleyDB - very fast (data ends up mostly in shared memory cache)
           but acts as a database overall, so data is not automatically
           expired

       In the case I was working on, I needed:

       ·   Automatic expiry and space management

       ·   Very fast access to lots of small items

       ·   The ability to fetch/store many items in one go

       That is why I developed this module. It tries to be quite efficient
       through a number of means:

       ·   Core code is written in C for performance

       ·   It uses multiple pages within a file, and uses fcntl to lock only
           one page at a time, reducing contention when multiple processes
           access the cache.

       ·   It uses a dual level hashing system (hash to find page, then hash
           within each page to find a slot) to make most "get()" calls O(1)
           and fast

       ·   On each "set()", if there are slots and page space available, only
           the slot has to be updated and the data written at the end of the
           used data space. If either runs out, the page is re-organised to
           create new slots/space, which is done in an efficient way

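       The two-level lookup described above can be illustrated with a short
       pure-Perl sketch. It is not the module's implementation: the djb2 hash
       and the locate() helper are hypothetical stand-ins for the C core, and
       1024 slots per page is an arbitrary figure. The point is that a single
       hash value picks both the page (and therefore the one page lock that
       has to be taken) and a starting slot within that page's hash table.

```perl
use strict;
use warnings;

# Hypothetical 32-bit string hash (djb2). Cache::FastMmap's C core uses
# its own hash function; this one only illustrates the idea.
sub djb2 {
  my ($str) = @_;
  my $h = 5381;
  $h = ($h * 33 + $_) & 0xFFFFFFFF for unpack 'C*', $str;
  return $h;
}

# Hypothetical helper: level 1 chooses the page (and so the page lock),
# level 2 chooses the starting slot in that page's own hash table.
sub locate {
  my ($key, $num_pages, $slots_per_page) = @_;
  my $h    = djb2($key);
  my $page = $h % $num_pages;
  my $slot = int($h / $num_pages) % $slots_per_page;
  return ($page, $slot);
}

# 89 is the default num_pages; 1024 slots per page is an arbitrary choice
my ($page, $slot) = locate('user:42', 89, 1024);
print "user:42 -> page $page, slot $slot\n";
```

       Because each key maps to exactly one page, a get() or set() only ever
       has to lock that page, which is what keeps contention between
       processes low.
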
       The class also supports read-through, and write-back or write-through
       callbacks to access the real data if it's not in the cache, meaning
       that code like this:

         my $Value = $Cache->get($Key);
         if (!defined $Value) {
           $Value = $RealDataSource->get($Key);
           $Cache->set($Key, $Value);
         }

       isn't required. Instead, you specify in the constructor:

         Cache::FastMmap->new(
           ...
           context => $RealDataSourceHandle,
           read_cb => sub { $_[0]->get($_[1]) },
           write_cb => sub { $_[0]->set($_[1], $_[2]) },
         );

       And then:

         my $Value = $Cache->get($Key);

         $Cache->set($Key, $NewValue);

       will just work, reading from and writing to the underlying data source
       automatically as needed.

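       A self-contained version of the constructor above might look like the
       following sketch. It assumes a plain Perl hash (%backing_store) stands
       in for the real data source, and uses raw_values => 1 so that plain
       strings can be stored. The $reads counter exists only to show when
       read_cb actually runs.

```perl
use strict;
use warnings;
use Cache::FastMmap;

# Hypothetical stand-in for a real data source: a plain hash
my %backing_store = (alice => 'admin', bob => 'user');
my $reads = 0;

my $cache = Cache::FastMmap->new(
  raw_values => 1,                 # store plain strings, no Storable
  context    => \%backing_store,   # passed as $_[0] to every callback
  read_cb    => sub {
    my ($store, $key) = @_;
    $reads++;                      # count trips to the "real" store
    return $store->{$key};
  },
  write_cb   => sub {
    my ($store, $key, $value) = @_;
    $store->{$key} = $value;
  },
);

my $first  = $cache->get('alice');   # miss: read_cb fetches and caches it
my $second = $cache->get('alice');   # hit: served from the mmap'ed file
$cache->set('carol', 'user');        # write_through (default): write_cb runs now

print "$first $second reads=$reads carol=$backing_store{carol}\n";
# prints "admin admin reads=1 carol=user"
```
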
PERFORMANCE
       If you're storing relatively large and complex structures in the
       cache, then you're limited by the speed of the Storable module. If
       you're storing simple structures, or raw data, then Cache::FastMmap
       offers noticeable performance improvements.

       See <http://cpan.robm.fastmail.fm/cache_perf.html> for some comparisons
       to other modules.

COMPATIBILITY
       Cache::FastMmap uses mmap to map a file as the shared cache space, and
       fcntl to do page locking. This means it should work on most UNIX-like
       operating systems.

       Ash Berlin has written a Win32 layer using MapViewOfFile et al. to
       provide support for the Win32 platform.

MEMORY SIZE
       Because Cache::FastMmap mmaps a shared file into your process's memory
       space, each process can look quite large, even though it's just
       mmap'd memory that's shared between all processes that use the cache,
       and may even be swapped out if the cache is getting low usage.

       However, the OS will think your process is quite large, which might
       mean you hit BSD::Resource limits or 'ulimits' you set previously that
       you thought were sane but no longer are, so be aware.

CACHE FILES AND OS ISSUES
       Because Cache::FastMmap uses an mmap'ed file, when you put values into
       the cache, you are actually "dirtying" pages in memory that belong to
       the cache file. Your OS will want to write those dirty pages back to
       the file on the actual physical disk, but the rate at which it does so
       is very OS dependent.

       In Linux, you have some control over how the OS writes those pages back
       using a number of parameters in /proc/sys/vm:

         dirty_background_ratio
         dirty_expire_centisecs
         dirty_ratio
         dirty_writeback_centisecs

       How you tune these depends heavily on your setup.

       As an interesting point, if you use a highmem Linux kernel, a change
       between 2.6.16 and 2.6.20 made the kernel flush memory a LOT more.
       There are details in this kernel mailing list thread:
       <http://www.uwsg.iu.edu/hypermail/linux/kernel/0711.3/0804.html>

       In most cases, people are not actually concerned about the persistence
       of data in the cache, and so are happy to disable writing of any cache
       data back to disk at all. Basically, what they want is an in-memory-only
       shared cache. The best way to do that is to use a "tmpfs" filesystem
       and put all cache files on there.

       For instance, all our machines have a /tmpfs mount point that we create
       in /etc/fstab as:

         none /tmpfs tmpfs defaults,noatime,size=1000M 0 0

       And we put all our cache files on there. The tmpfs filesystem is smart
       enough to only use memory as required by files actually on the tmpfs,
       so making it 1G in size doesn't actually use 1G of memory, it only uses
       as much as the cache files we put on it. In all cases, we ensure that
       we never run out of real memory, so the cache files effectively act
       just as named access points to shared memory.

       Some people have suggested using anonymous mmap'ed memory.
       Unfortunately we need a file descriptor to do the fcntl locking on, so
       we'd have to create a separate file on a filesystem somewhere anyway.
       It seems easier to just create an explicit "tmpfs" filesystem.

PAGE SIZE AND KEY/VALUE LIMITS
       To reduce lock contention, Cache::FastMmap breaks up the file into
       pages. When you get/set a value, it hashes the key to get a page, then
       locks that page, and uses a hash table within the page to get/store the
       actual key/value pair.

       One consequence of this is that you cannot store values larger than a
       page in the cache at all. Attempting to store values larger than a page
       size will fail (the set() function will return false).

       Also keep in mind that each page has its own hash table, and that we
       store the key and value data of each item. So if you are expecting to
       store large values and/or keys in the cache, you should use page sizes
       that are definitely larger than your largest key + value size + a few
       kbytes for the overhead.

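       To see the limit in practice, the sketch below creates a deliberately
       tiny cache (3 pages of 4096 bytes, sizes chosen only for illustration)
       and checks the return value of set(). A value larger than a page
       cannot be stored, so set() returns false and a later get() finds
       nothing.

```perl
use strict;
use warnings;
use Cache::FastMmap;

# Deliberately tiny layout: 3 pages of 4096 bytes each
my $cache = Cache::FastMmap->new(
  raw_values => 1,
  page_size  => 4096,
  num_pages  => 3,
);

my $small_ok = $cache->set('small', 'x' x 100);     # fits easily in a page
my $big_ok   = $cache->set('big',   'x' x 10_000);  # bigger than any page

printf "small: %s, big: %s\n",
  ($small_ok ? 'stored' : 'rejected'),
  ($big_ok   ? 'stored' : 'rejected');
```

       Checking the return value of set() like this is the only way to notice
       that an oversized value was silently not cached.
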
USAGE
       Because the cache uses shared memory through an mmap'd file, you have
       to make sure each process connects up to the file. There are two main
       ways to do this:

       ·   Create the cache in the parent process, and then when it forks,
           each child will inherit the same file descriptor, mmap'ed memory,
           etc. and just work. This is the recommended way. (BEWARE: This only
           works under UNIX, as Win32 has no concept of forking.)

       ·   Explicitly connect to the share file in each forked child. In
           this case, make sure the file already exists and the children
           connect with init_file => 0 to avoid deleting the cache contents
           and possible race corruption conditions. Also be careful that
           multiple children may race to create the file at the same time,
           each overwriting and corrupting content. If you must, use a
           separate lock file to ensure only one child creates the file. (This
           is the only possible way under Win32.)

       The first way is usually the easiest. If you're using the cache in a
       Net::Server based module, you'll want to open the cache in the
       "pre_loop_hook", because that's executed before the fork, but after the
       process ownership has changed and any chroot has been done.

       In mod_perl, just open the cache at the global level in the appropriate
       module, which is executed as the server is starting and before it
       starts forking children, but you'll probably want to chmod or chown the
       file so the Apache child processes are allowed to access it.

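       The first (recommended) approach can be sketched as follows, assuming
       a UNIX system where fork() is available. The parent creates the cache
       before forking; each child writes a key into the inherited mapping,
       and after the children exit the parent sees all of their entries. The
       key names are arbitrary.

```perl
use strict;
use warnings;
use Cache::FastMmap;

# Create (and initialise) the cache in the parent, before any fork
my $cache = Cache::FastMmap->new(raw_values => 1, init_file => 1);

my @pids;
for my $n (1 .. 3) {
  my $pid = fork() // die "fork failed: $!";
  if ($pid == 0) {
    # Child: uses the inherited file descriptor and mmap'ed memory
    $cache->set("child-$n", $$);
    exit 0;
  }
  push @pids, $pid;
}
waitpid($_, 0) for @pids;

# Parent: the children's writes are visible through the shared file
my @keys = $cache->get_keys(0);
@keys = sort @keys;
print "@keys\n";   # prints "child-1 child-2 child-3"
```

       Since unlink_on_exit only acts in the process that created the cache,
       the children exiting does not remove the share file.
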
METHODS
       new(%Opts)
           Create a new Cache::FastMmap object.

           Basic global parameters are:

           ·   share_file

               File to mmap for sharing of data. Default on Unix:
               /tmp/sharefile-$pid-$time-$random; default on Windows:
               %TEMP%\sharefile-$pid-$time-$random

           ·   init_file

               Clear any existing values and re-initialise file. Useful to do
               in a parent that forks off children to ensure that file is
               empty at the start (default: 0)

               Note: This is quite important to do in the parent to ensure a
               consistent file structure. The shared file is not perfectly
               transaction safe, and so if a child is killed at the wrong
               instant, it might leave the cache file in an inconsistent
               state.

           ·   raw_values

               Store values as raw binary data rather than using Storable to
               freeze/thaw data structures (default: 0)

           ·   compress

               Compress the value (but not the key) before storing into the
               cache. If you set this to 1, the module will attempt to require
               the Compress::Zlib module and then use the memGzip() function
               on the value data before storing into the cache, and
               memGunzip() when retrieving data from the cache. Some initial
               testing shows that uncompressing tends to be very fast, though
               compressing can be quite slow, so it's probably best to use
               this option only if you know values in the cache are long
               lived and have a high hit rate. (default: 0)

           ·   enable_stats

               Enable some basic statistics capturing. When enabled, every
               read to the cache is counted, and every read to the cache that
               finds a value in the cache is also counted. You can then
               retrieve these values via the get_statistics() call. This
               causes every read action to do a write on a page, which can
               cause some more IO, so it's disabled by default. (default: 0)

           ·   expire_time

               Maximum time to hold values in the cache in seconds. A value of
               0 means there is no explicit expiry time, and values are
               expired only based on LRU usage. Can be expressed as 1m, 1h, 1d
               for minutes/hours/days respectively. (default: 0)

           You may specify the cache size as:

           ·   cache_size

               Size of cache. Can be expressed as 1k, 1m for kilobytes or
               megabytes respectively. Automatically guesses page size/page
               count values.

           Or specify explicit page size/page count values. If none of these
           are specified, the values page_size = 64k and num_pages = 89 are
           used (a total cache size of 64k x 89 = 5696k).

           ·   page_size

               Size of each page. Must be a power of 2 between 4k and 1024k.
               If not, it is rounded to the nearest valid value.

           ·   num_pages

               Number of pages. Should be a prime number for best hashing.

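           The total cache size is simply page_size multiplied by num_pages.
           If you prefer explicit values over cache_size, a pure-Perl sketch
           like the one below can pick a prime num_pages for a target size.
           The is_prime() and pages_for() helpers are hypothetical and not
           part of Cache::FastMmap; rounding up to the next prime is just one
           reasonable policy.

```perl
use strict;
use warnings;

# Hypothetical helper: trial-division primality test
sub is_prime {
  my ($n) = @_;
  return 0 if $n < 2;
  for (my $d = 2; $d * $d <= $n; $d++) {
    return 0 if $n % $d == 0;
  }
  return 1;
}

# Hypothetical helper: smallest prime page count covering $total bytes
sub pages_for {
  my ($total, $page_size) = @_;
  my $n = int(($total + $page_size - 1) / $page_size);   # ceiling division
  $n++ until is_prime($n);
  return $n;
}

my $default_bytes = 65536 * 89;                   # default page_size x num_pages
my $pages = pages_for(16 * 1024 * 1024, 65536);   # aim for roughly 16m
print "default: $default_bytes bytes; 16m target: $pages pages of 64k\n";
# prints "default: 5832704 bytes; 16m target: 257 pages of 64k"
```

           You could then pass page_size => 65536 and num_pages => $pages to
           new() instead of cache_size.
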
           The cache allows the use of callbacks for reading/writing data to
           an underlying data store.

           ·   context

               Opaque reference passed as the first parameter to any callback
               function if specified

           ·   read_cb

               Callback to read data from the underlying data store. Called
               as:

                 $read_cb->($context, $Key)

               Should return the value to use. This value will be saved in the
               cache for future retrievals. Return undef if there is no value
               for the given key.

           ·   write_cb

               Callback to write data to the underlying data store. Called
               as:

                 $write_cb->($context, $Key, $Value, $ExpiryTime)

               In 'write_through' mode, it's always called as soon as a
               set(...) is called on the Cache::FastMmap class. In
               'write_back' mode, it's called when a value is expunged from
               the cache if it's been changed by a set(...) rather than read
               from the underlying store with the read_cb above.

               Note: Expired items do result in the write_cb being called if
               'write_back' caching is enabled and the item has been changed.
               You can check the $ExpiryTime against "time()" if you only want
               to write back values which aren't expired.

               Also remember that write_cb may be called in a different
               process from the one that placed the data in the cache in the
               first place.

           ·   delete_cb

               Callback to delete data from the underlying data store. Called
               as:

                 $delete_cb->($context, $Key)

               Called as soon as remove(...) is called on the Cache::FastMmap
               class.

           ·   cache_not_found

               If set to true, then if the read_cb is called and it returns
               undef to say nothing was found, then that information is stored
               in the cache, so that next time a get(...) is called on that
               key, undef is returned immediately rather than again calling
               the read_cb.

           ·   write_action

               Either 'write_back' or 'write_through'. (default:
               write_through)

           ·   allow_recursive

               If you're using a callback function, then normally the cache is
               not re-enterable, and attempting to call a get/set on the cache
               will cause an error. By setting this to 1, the cache will
               unlock any pages before calling the callback. During the unlock
               time, other processes may change data in the current cache
               page, causing possible unexpected effects. You shouldn't set
               this unless you know you want to be able to call back into the
               cache from within a callback. (default: 0)

           ·   empty_on_exit

               When you have 'write_back' mode enabled, then you really want
               to make sure all values from the cache are expunged when your
               program exits so any changes are written back.

               The trick is that we only want to do this in the parent
               process; we don't want any child processes to empty the cache
               when they exit. So if you set this, it takes the PID via $$,
               and only calls empty in the DESTROY method if $$ matches the
               pid we captured at the start. (default: 0)

           ·   unlink_on_exit

               Unlink the share file when the cache is destroyed.

               As with empty_on_exit, this will only unlink the file if the
               DESTROY occurs in the same PID that the cache was created in so
               that any forked children don't unlink the file.

               This value defaults to 1 if the share_file specified does not
               already exist. If the share_file specified does already exist,
               it defaults to 0.

           ·   catch_deadlocks

               Sets an alarm(10) before each page is locked via
               fcntl(F_SETLKW) to catch any deadlock. This used to be the
               default behaviour, but it's not really needed in the default
               case and could clobber sub-second Time::HiRes alarms set up by
               other code. (default: 0)

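           A constructor call combining several of these options might look
           like the sketch below. The file name under the system temp
           directory is only an example; in real code you would normally
           choose an explicit path (for instance on a tmpfs mount). The
           one-second expire_time is chosen just so that expiry is quick to
           observe.

```perl
use strict;
use warnings;
use File::Spec;
use Cache::FastMmap;

# Example path only; pick an explicit, well-known location in real code
my $file = File::Spec->catfile(File::Spec->tmpdir, "fastmmap-example-$$");

my $cache = Cache::FastMmap->new(
  share_file     => $file,
  init_file      => 1,      # start from an empty, consistent file
  cache_size     => '1m',   # page size/count are guessed from this
  expire_time    => 1,      # seconds; '1m', '1h' or '1d' also work
  unlink_on_exit => 1,      # remove the file when this process is done
);

# Default mode (raw_values => 0): the hashref is serialized with Storable
$cache->set('session', { user => 'alice', roles => ['admin'] });
my $fresh = $cache->get('session');     # a copy of the stored hashref
sleep 2;                                # wait past the 1 second expiry
my $expired = $cache->get('session');   # undef: the entry has expired

print "fresh user: $fresh->{user}\n";
print defined $expired ? "still cached\n" : "expired\n";
```
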
       get($Key, [ \%Options ])
           Search cache for given Key. Returns undef if not found. If read_cb
           is specified and the key is not found, calls the callback to try
           and find the value for the key, and if found (or 'cache_not_found'
           is set), stores it into the cache and returns the found value.

           %Options is optional, and is used by get_and_set() to control the
           locking behaviour. For now, you should probably ignore it unless
           you read the code to understand how it works.

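           The sketch below illustrates get() together with a read_cb and the
           cache_not_found option. It assumes a hypothetical lookup table
           (%users) as the data source and counts how often read_cb is
           invoked: with cache_not_found set, the missing key triggers the
           callback only once, because the "not found" result is itself
           cached.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my %users = (alice => { name => 'Alice' });   # hypothetical data source
my $lookups = 0;

my $cache = Cache::FastMmap->new(
  context         => \%users,
  read_cb         => sub { $lookups++; return $_[0]->{$_[1]}; },
  cache_not_found => 1,   # remember misses as well as hits
);

my $alice = $cache->get('alice');    # read_cb runs, result is cached
my $miss1 = $cache->get('nobody');   # read_cb runs and returns undef
my $miss2 = $cache->get('nobody');   # answered from the cache, no read_cb

print "alice=$alice->{name} lookups=$lookups\n";
```
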
       set($Key, $Value, [ \%Options ])
           Store the specified key/value pair in the cache.

           %Options is optional, and is used by get_and_set() to control the
           locking behaviour. For now, you should probably ignore it unless
           you read the code to understand how it works.

           This method returns true if the value was stored in the cache,
           false otherwise. See the PAGE SIZE AND KEY/VALUE LIMITS section for
           more details.

       get_and_set($Key, $Sub)
           Atomically retrieve and set the value of a Key.

           The page is locked while retrieving the $Key and is unlocked only
           after the value is set, thus guaranteeing the value does not change
           between the get and set operations.

           $Sub is a reference to a subroutine that is called to calculate the
           new value to store. $Sub gets $Key and the current value as
           parameters, and should return the new value to set in the cache for
           the given $Key.

           For example, to atomically increment a value in the cache, you can
           just use:

             $Cache->get_and_set($Key, sub { return ++$_[1]; });

           In scalar context, the return value from this function is the *new*
           value stored back into the cache.

           In list context, a two item array is returned; the new value stored
           back into the cache and a boolean that's true if the value was
           stored in the cache, false otherwise. See the PAGE SIZE AND
           KEY/VALUE LIMITS section for more details.

           Notes:

           ·   Do not perform any get/set operations from the callback sub, as
               these operations lock the page and you may end up with a
               deadlock!

           ·   If your sub does a die/throws an exception, the page will
               correctly be unlocked (1.15 onwards)

471           Delete the given key from the cache
472
473           %Options is optional, and is used by get_and_remove() to control
474           the locking behaviour. For now, you should probably ignore it
475           unless you read the code to understand how it works
476
477       get_and_remove($Key)
478           Atomically retrieve value of a Key while removing it from the
479           cache.
480
481           The page is locked while retrieving the $Key and is unlocked only
482           after the value is removed, thus guaranteeing the value stored by
483           someone else isn't removed by us.
484
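           One possible use for get_and_remove() is handing out a one-time
           item, such as a job or token, so that only one caller receives it.
           A minimal sketch, with an arbitrary key name and raw_values => 1
           for a plain string value:

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);
$cache->set('job:1', 'resize image 17');

# The first caller gets the value and removes it under the same page lock
my $claimed = $cache->get_and_remove('job:1');
# Any later caller finds nothing
my $again   = $cache->get_and_remove('job:1');

print defined $claimed ? "claimed: $claimed\n" : "nothing to claim\n";
print defined $again   ? "claimed twice!\n"    : "already taken\n";
```
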
       clear()
           Clear all items from the cache.

           Note: If you're using callbacks, this has no effect on items in the
           underlying data store. No delete callbacks are made.

       purge()
           Clear all expired items from the cache.

           Note: If you're using callbacks, this has no effect on items in the
           underlying data store. No delete callbacks are made, and no write
           callbacks are made for the expired data.

       empty($OnlyExpired)
           Empty all items from the cache, or if $OnlyExpired is true, only
           expired items.

           Note: If 'write_back' mode is enabled, any changed items are
           written back to the underlying store. Expired items are written
           back to the underlying store as well.

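           The interaction between write_back mode and empty() can be seen in
           a short sketch. It assumes a plain hash (%db) as the underlying
           store and records which keys write_cb receives: set() only marks
           entries as changed, and the writes happen when empty() expunges
           them.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my %db;        # hypothetical underlying store
my @written;   # keys passed to write_cb, in call order

my $cache = Cache::FastMmap->new(
  raw_values   => 1,
  context      => \%db,
  write_action => 'write_back',
  write_cb     => sub {
    my ($db, $key, $value, $expiry_time) = @_;
    $db->{$key} = $value;
    push @written, $key;
  },
);

$cache->set('a', 1);
$cache->set('b', 2);
my $before = @written;   # 0: write_back defers the writes

$cache->empty;           # expunge everything, writing back changed items
my $after = @written;    # 2: both changed entries were written

print "before=$before after=$after db: a=$db{a} b=$db{b}\n";
```

           This is also why empty_on_exit exists: without a final empty(),
           changed items still in the cache would never reach the store.
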
       get_keys($Mode)
           Get a list of keys/values held in the cache. The result may
           immediately be out of date because of the shared access nature of
           the cache.

           If $Mode == 0, a list of keys is returned.

           If $Mode == 1, then a list of hashrefs, with 'key', 'last_access',
           'expire_time' and 'flags' keys, is returned.

           If $Mode == 2, then the hashrefs also contain a 'value' key.

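           For example, get_keys(2) can be used to dump the current contents
           of a small cache while debugging. This sketch assumes raw_values =>
           1 so that the 'value' field holds the plain stored string; the
           keys are arbitrary.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);
$cache->set('colour', 'blue');
$cache->set('size',   'large');

# Mode 2: one hashref per entry ('key', 'last_access', 'expire_time',
# 'flags' and 'value'); sort by key for stable output
my @entries = sort { $a->{key} cmp $b->{key} } $cache->get_keys(2);
for my $e (@entries) {
  print "$e->{key} = $e->{value}\n";
}
```
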
       get_statistics($Clear)
           Returns a two value list of (nreads, nreadhits). This only works if
           you passed enable_stats in the constructor.

           nreads is the total number of read attempts done on the cache since
           it was created.

           nreadhits is the total number of read attempts done on the cache
           since it was created that found the key/value in the cache.

           If $Clear is true, the values are reset immediately after they are
           retrieved.

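           With enable_stats => 1, the two counters can be turned into a hit
           rate. A minimal sketch with one hit and one miss (the keys are
           arbitrary):

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1, enable_stats => 1);
$cache->set('present', 'yes');
$cache->get('present');   # counted as a read and a hit
$cache->get('absent');    # counted as a read only

# Pass a true value to reset the counters after reading them
my ($nreads, $nreadhits) = $cache->get_statistics(1);
my $hit_rate = $nreads ? $nreadhits / $nreads : 0;
printf "reads=%d hits=%d hit rate=%.0f%%\n", $nreads, $nreadhits, 100 * $hit_rate;
```
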
       multi_get($PageKey, [ $Key1, $Key2, ... ])
           The two multi_xxx routines act a bit differently from the other
           routines. With multi_get, you pass a separate PageKey value and
           then multiple keys. The PageKey value is hashed, and that page
           locked. Then that page is searched for each key. It returns a hash
           ref of Key => Value items found in that page in the cache.

           The main advantage of this is speed, if you happen to need to look
           up a lot of items on each call.

           For instance, say you have users and a bunch of pieces of separate
           information for each user. On a particular run, you need to
           retrieve a subset of that information for a user. You could do
           lots of get() calls, or you could use the 'username' as the page
           key, and just use one multi_get() and multi_set() call instead.

           A couple of things to note:

           1.  This makes multi_get()/multi_set() and get()/set()
               incompatible. Don't mix calls to the two, because you won't
               find the data you're expecting.

           2.  The write-back and callback modes of operation do not work with
               multi_get()/multi_set(). Don't attempt to use them together.

       multi_set($PageKey, { $Key1 => $Value1, $Key2 => $Value2, ... }, [
       \%Options ])
           Store the specified key/value pairs into the cache, in the page
           selected by $PageKey.

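           Following the "username as page key" idea above, here is a sketch
           with a hypothetical user 'alice' and arbitrary field names. The
           same page key must be used for both calls, and these entries
           should not be read back with plain get().

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);

# Store several fields under one page key: one page lock for all of them
$cache->multi_set('alice', {
  email => 'alice@example.com',
  theme => 'dark',
});

# Fetch a subset in one call; fields not in the cache come back missing
my $info = $cache->multi_get('alice', [qw(email theme language)]);
for my $field (qw(email theme language)) {
  my $value = $info->{$field};
  print "$field: ", (defined $value ? $value : '(not cached)'), "\n";
}
```
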
INTERNAL METHODS
       _expunge_all($Mode, $WB)
           Expunge all items from the cache.

           Expunged items (that have not expired) are written back to the
           underlying store if write_back is enabled.

       _expunge_page($Mode, $WB, $Len)
           Expunge items from the current page to make space for $Len bytes of
           key/value items.

           Expunged items (that have not expired) are written back to the
           underlying store if write_back is enabled.

       _lock_page($Page)
           Lock a given page in the cache, and return an object reference
           that, when DESTROYed, unlocks the page.

INCOMPATIBLE CHANGES
       ·   From 1.15

           ·   Default share_file name is no longer /tmp/sharefile, but
               /tmp/sharefile-$pid-$time. This ensures that different
               runs/processes don't interfere with each other, but means you
               may not connect up to the file you expect. You should be
               choosing an explicit name in most cases.

               On Unix systems, you can set the TMPDIR environment variable
               to override the default directory of /tmp.

           ·   The new option unlink_on_exit defaults to true if you pass a
               filename for the share_file which doesn't already exist. This
               means if you have one process that creates the file, and
               another that expects the file to be there, by default it won't
               be.

               Otherwise the defaults seem sensible: they clean up unneeded
               share files rather than leaving them around to accumulate.

       ·   From 1.29

           ·   Default share_file name is no longer /tmp/sharefile-$pid-$time
               but /tmp/sharefile-$pid-$time-$random.

       ·   From 1.31

           ·   Before 1.31, if you were using raw_values => 0 mode, then the
               write_cb would be called with raw frozen data, rather than the
               thawed object. From 1.31 onwards, it correctly calls write_cb
               with the thawed object value (i.e. what was passed to the
               ->set() call in the first place).

       ·   From 1.36

           ·   Before 1.36, an alarm(10) would be set before each attempt to
               lock a page. The only purpose of this was to detect deadlocks,
               which should only happen if the Cache::FastMmap code was buggy,
               or a callback function in get_and_set() made another call into
               Cache::FastMmap.

               However, this added unnecessary extra system calls for every
               lookup, and for users using Time::HiRes, it could clobber any
               existing alarms that had been set with sub-second resolution.

               So this has now been made an optional feature via the
               catch_deadlocks option passed to new.

SEE ALSO
       MLDBM::Sync, IPC::MM, Cache::FileCache, Cache::SharedMemoryCache, DBI,
       Cache::Mmap, BerkeleyDB

       Latest news/details can also be found at:

       <http://cpan.robm.fastmail.fm/cachefastmmap/>

       Available on github at:

       <https://github.com/robmueller/cache-fastmmap/>

AUTHOR
       Rob Mueller <mailto:cpan@robm.fastmail.fm>

       Copyright (C) 2003-2011 by Opera Software Australia Pty Ltd

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.12.3                      2011-07-18                Cache::FastMmap(3)