Cache::FastMmap(3) User Contributed Perl Documentation Cache::FastMmap(3)

NAME
Cache::FastMmap - Uses an mmap'ed file to act as a shared memory
interprocess cache

SYNOPSIS
    use Cache::FastMmap;

    # Uses vaguely sane defaults
    $Cache = Cache::FastMmap->new();

    # Uses Storable to serialize $Value to bytes for storage
    $Cache->set($Key, $Value);
    $Value = $Cache->get($Key);

    $Cache = Cache::FastMmap->new(serializer => '');

    # Stores stringified bytes of $Value directly
    $Cache->set($Key, $Value);
    $Value = $Cache->get($Key);

ABSTRACT
A shared memory cache through an mmap'ed file. Its core is written in
C for performance. It uses fcntl locking to ensure multiple processes
can safely access the cache at the same time. It uses a basic LRU
algorithm to keep the most used entries in the cache.

DESCRIPTION
In multi-process environments (e.g. mod_perl, forking daemons, etc.),
it's common to want to cache information, but have that cache shared
between processes. Many solutions already exist, and may suit your
situation better:

· MLDBM::Sync - acts as a database, data is not automatically
expired, slow

· IPC::MM - hash implementation is broken, data is not automatically
expired, slow

· Cache::FileCache - lots of features, slow

· Cache::SharedMemoryCache - lots of features, VERY slow. Uses
IPC::ShareLite, which freezes/thaws ALL data at each read/write

· DBI - use your favourite RDBMS. Can perform well, but needs a DB
server running. Very global. Socket connection latency

· Cache::Mmap - similar to this module, in pure perl. Slows down with
larger pages

· BerkeleyDB - very fast (data ends up mostly in shared memory cache)
but acts as a database overall, so data is not automatically
expired

In the case I was working on, I needed:

· Automatic expiry and space management

· Very fast access to lots of small items

· The ability to fetch/store many items in one go

That is why I developed this module. It tries to be quite efficient
through a number of means:

· Core code is written in C for performance

· It uses multiple pages within a file, and uses fcntl to lock only
one page at a time to reduce contention when multiple processes
access the cache.

· It uses a dual-level hashing system (hash to find page, then hash
within each page to find a slot) to make most "get()" calls O(1)
and fast

· On each "set()", if there are slots and page space available, only
the slot has to be updated and the data written at the end of the
used data space. If either runs out, the page is re-organised in an
efficient way to create new slots/space

The class also supports read-through, and write-back or write-through
callbacks to access the real data if it's not in the cache, meaning
that code like this:

    my $Value = $Cache->get($Key);
    if (!defined $Value) {
        $Value = $RealDataSource->get($Key);
        $Cache->set($Key, $Value);
    }

isn't required. Instead, you specify in the constructor:

    Cache::FastMmap->new(
        ...
        context  => $RealDataSourceHandle,
        read_cb  => sub { $_[0]->get($_[1]) },
        write_cb => sub { $_[0]->set($_[1], $_[2]) },
    );

And then:

    my $Value = $Cache->get($Key);

    $Cache->set($Key, $NewValue);

will just work, reading from and writing to the underlying data source
automatically as needed.
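
The following is a minimal, self-contained sketch of the read-through/write-through setup above, not code taken from the module itself. A plain hash (%store, a hypothetical stand-in) plays the role of the real data source, the share file lives in a File::Temp directory, and serializer => '' is used only because the values are plain strings.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

# Hypothetical "real" data source: a plain hash standing in for a database.
my %store = (alice => 'Alice Smith');

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file => "$dir/users.cache",
    init_file  => 1,
    serializer => '',          # values are plain strings
    context    => \%store,     # passed as the first argument to every callback
    read_cb    => sub { my ($ctx, $key) = @_; return $ctx->{$key} },
    write_cb   => sub { my ($ctx, $key, $value) = @_; $ctx->{$key} = $value },
    delete_cb  => sub { my ($ctx, $key) = @_; delete $ctx->{$key} },
);

# Cache miss: read_cb fetches the value from %store and it is cached.
my $name = $cache->get('alice');

# Default write_action is 'write_through', so write_cb runs immediately.
$cache->set(bob => 'Bob Jones');

# remove() also calls delete_cb on the underlying store.
$cache->remove('alice');

printf "%s / %s / alice %s\n",
    $name, $store{bob}, exists $store{alice} ? 'still stored' : 'deleted';
```

Because every callback receives the context first, one set of callback subs can serve whatever handle you pass as context.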

PERFORMANCE
If you're storing relatively large and complex structures into the
cache, then you're limited by the speed of the Storable module. If
you're storing simple structures, or raw data, then Cache::FastMmap
offers noticeable performance improvements.

See <http://cpan.robm.fastmail.fm/cache_perf.html> for some comparisons
to other modules.

COMPATIBILITY
Cache::FastMmap uses mmap to map a file as the shared cache space, and
fcntl to do page locking. This means it should work on most UNIX-like
operating systems.

Ash Berlin has written a Win32 layer using MapViewOfFile et al. to
provide support for the Win32 platform.

MEMORY SIZE
Because Cache::FastMmap mmaps a shared file into your process's memory
space, this can make each process look quite large, even though it's
just mmap'd memory that's shared between all processes that use the
cache, and may even be swapped out if the cache is getting low usage.

However, the OS will think your process is quite large, which might
mean you hit some BSD::Resource or 'ulimits' you set previously that
you thought were sane, but aren't anymore, so be aware.

CACHE FILES AND OS ISSUES
Because Cache::FastMmap uses an mmap'ed file, when you put values into
the cache, you are actually "dirtying" pages in memory that belong to
the cache file. Your OS will want to write those dirty pages back to
the file on the actual physical disk, but the rate it does that at is
very OS dependent.

On Linux, you have some control over how the OS writes those pages back
using a number of parameters in /proc/sys/vm:

    dirty_background_ratio
    dirty_expire_centisecs
    dirty_ratio
    dirty_writeback_centisecs

How you tune these depends heavily on your setup.

As an interesting point, if you use a highmem Linux kernel, a change
between 2.6.16 and 2.6.20 made the kernel flush memory a LOT more.
There are details in this kernel mailing list thread:
<http://www.uwsg.iu.edu/hypermail/linux/kernel/0711.3/0804.html>

In most cases, people are not actually concerned about the persistence
of data in the cache, and so are happy to disable writing of any cache
data back to disk at all. Basically, what they want is an in-memory-only
shared cache. The best way to do that is to use a "tmpfs" filesystem
and put all cache files on there.

For instance, all our machines have a /tmpfs mount point that we create
in /etc/fstab as:

    none /tmpfs tmpfs defaults,noatime,size=1000M 0 0

And we put all our cache files on there. The tmpfs filesystem is smart
enough to only use memory as required by files actually on the tmpfs,
so making it 1G in size doesn't actually use 1G of memory; it only uses
as much as the cache files we put on it. In all cases, we ensure that
we never run out of real memory, so the cache files effectively act
just as named access points to shared memory.

Some people have suggested using anonymous mmaped memory. Unfortunately
we need a file descriptor to do the fcntl locking on, so we'd have to
create a separate file on a filesystem somewhere anyway. It seems
easier to just create an explicit "tmpfs" filesystem.
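
A sketch of pointing share_file at a memory-backed filesystem from Perl. It assumes a Linux host where /dev/shm is already a tmpfs mount (common, but check your system); the /tmpfs mount above is the authors' own setup, and the file name myapp-$$.cache is hypothetical. On other systems it falls back to File::Spec->tmpdir, which gives no memory-only guarantee.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Spec;

# Assumption: /dev/shm is a tmpfs mount (true on most Linux systems).
# Otherwise fall back to the ordinary temp directory.
my $dir = (-d '/dev/shm' && -w _) ? '/dev/shm' : File::Spec->tmpdir;

my $cache = Cache::FastMmap->new(
    share_file     => "$dir/myapp-$$.cache",   # hypothetical file name
    init_file      => 1,
    unlink_on_exit => 1,      # remove the file when this process is done
    cache_size     => '16m',
);

$cache->set(greeting => 'hello');
print $cache->get('greeting'), "\n";
```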

PAGE SIZE AND KEY/VALUE LIMITS
To reduce lock contention, Cache::FastMmap breaks up the file into
pages. When you get/set a value, it hashes the key to get a page, then
locks that page, and uses a hash table within the page to get/store the
actual key/value pair.

One consequence of this is that you cannot store values larger than a
page in the cache at all. Attempting to store values larger than a page
size will fail (the set() function will return false).

Also keep in mind that each page has its own hash table, and that we
store the key and value data of each item. So if you are expecting to
store large values and/or keys in the cache, you should use page sizes
that are definitely larger than your largest key + value size + a few
kbytes for the overhead.
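
A small sketch of the page-size limit in action, assuming the smallest allowed page size (4096 bytes) and 11 pages (a prime) so the failure is easy to trigger; the key names are hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file => "$dir/small.cache",
    init_file  => 1,
    serializer => '',
    page_size  => 4096,   # smallest allowed page: 4 KB
    num_pages  => 11,     # a prime, as recommended for num_pages
);

my $small = 'x' x 100;
my $huge  = 'x' x 10_000;   # larger than a whole 4 KB page

my $ok_small = $cache->set(small => $small);
my $ok_huge  = $cache->set(huge  => $huge);

print "small stored: ", ($ok_small ? 'yes' : 'no'), "\n";
print "huge stored:  ", ($ok_huge  ? 'yes' : 'no'), "\n";
```

Since set() reports this case only through its return value, it is worth checking that value whenever stored sizes are unpredictable.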

USAGE
Because the cache uses shared memory through an mmap'd file, you have
to make sure each process connects up to the file. There are probably
two main ways to do this:

· Create the cache in the parent process, and then when it forks,
each child will inherit the same file descriptor, mmap'ed memory,
etc. and it will just work. This is the recommended way. (BEWARE:
This only works under UNIX as Win32 has no concept of forking)

· Explicitly connect up in each forked child to the share file. In
this case, make sure the file already exists and the children
connect with init_file => 0 to avoid deleting the cache contents
and possible race corruption conditions. Also be careful that
multiple children may race to create the file at the same time,
each overwriting and corrupting content. Use a separate lock file
if you must, to ensure only one child creates the file. (This is the
only possible way under Win32)

The first way is usually the easiest. If you're using the cache in a
Net::Server based module, you'll want to open the cache in the
"pre_loop_hook", because that's executed before the fork, but after the
process ownership has changed and any chroot has been done.

In mod_perl, just open the cache at the global level in the appropriate
module, which is executed as the server is starting and before it
starts forking children, but you'll probably want to chmod or chown the
file so that it is accessible to the user the Apache processes run as.
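
A sketch of the recommended create-then-fork pattern. The parent creates and initialises the cache before forking; each child (three here, an arbitrary number) writes one hypothetical worker-$n key, and the parent then reads all of them back through the shared mapping. File::Temp supplies a private directory for the share file.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);
use POSIX ();

my $dir = tempdir(CLEANUP => 1);

# Create (and initialise) the cache in the parent, before forking.
my $cache = Cache::FastMmap->new(
    share_file => "$dir/shared.cache",
    init_file  => 1,
    serializer => '',
);

my @kids;
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: inherits the mmap'ed file, so its writes are shared.
        $cache->set("worker-$n", "done by pid $$");
        POSIX::_exit(0);    # skip END blocks and destructors in the child
    }
    push @kids, $pid;
}
waitpid($_, 0) for @kids;

# Parent: sees every child's value through the shared mapping.
my @results = map { $cache->get("worker-$_") } 1 .. 3;
print "$_\n" for @results;
```

The children leave through POSIX::_exit so that only the parent runs END blocks and destructors; the module already limits unlink_on_exit and empty_on_exit to the creating PID, and this also keeps the temporary directory cleanup in the parent.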

METHODS
new(%Opts)
Create a new Cache::FastMmap object.

Basic global parameters are:

· share_file

File to mmap for sharing of data. Default on Unix:
/tmp/sharefile-$pid-$time-$random; default on Windows:
%TEMP%\sharefile-$pid-$time-$random

· init_file

Clear any existing values and re-initialise the file. Useful to do
in a parent that forks off children to ensure that the file is
empty at the start (default: 0)

Note: This is quite important to do in the parent to ensure a
consistent file structure. The shared file is not perfectly
transaction safe, and so if a child is killed at the wrong
instant, it might leave the cache file in an inconsistent
state.

· serializer

Use a serialization library to serialize perl data structures
before storing in the cache. If not set, the raw value in the
variable passed to set() is stored as a string. You must set
this if you want to store anything other than basic scalar
values. Supported values are:

    ''          for none
    'storable'  for 'Storable'
    'sereal'    for 'Sereal'
    'json'      for 'JSON'
    [ $s, $d ]  for custom serializer/de-serializer

If this parameter has a value, the module will attempt to load
the associated package and then use the API of that package to
serialize data before storing in the cache, and deserialize it
upon retrieval from the cache. (default: 'storable')

You can use a custom serializer/de-serializer by passing an
array-ref with two values. The first should be a subroutine
reference that takes the data to serialize as a single argument
and returns an octet stream to store. The second should be a
subroutine reference that takes the octet stream as a single
argument and returns the original data structure.

One thing to note: the data structure passed to the serializer
is always a *scalar* reference to the original data passed in
to the ->set(...) call. If your serializer doesn't support
that, you might need to dereference it first before storing,
but remember to return a reference again in the de-serializer.

(Note: Historically this module only supported a boolean value
for the `raw_values` parameter and defaulted to 0, which meant
it used Storable to serialize all values.)

· raw_values

Deprecated. Use serializer above.

· compressor

Compress the value (but not the key) before storing into the
cache, using the compression package identified by the value of
the parameter. Supported values are:

    'zlib'      for 'Compress::Zlib'
    'lz4'       for 'Compress::LZ4'
    'snappy'    for 'Compress::Snappy'
    [ $c, $d ]  for custom compressor/de-compressor

If this parameter has a value, the module will attempt to load
the associated package and then use the API of that package to
compress data before storing in the cache, and uncompress it
upon retrieval from the cache. (default: undef)

You can use a custom compressor/de-compressor by passing an
array-ref with two values. The first should be a subroutine
reference that takes the data to compress as a single octet
stream argument and returns an octet stream to store. The
second should be a subroutine reference that takes the
compressed octet stream as a single argument and returns the
original uncompressed data.

(Note: Historically this module only supported a boolean value
for the `compress` parameter and defaulted to use
Compress::Zlib. The note for the old `compress` parameter
stated: "Some initial testing shows that the uncompressing
tends to be very fast, though the compressing can be quite
slow, so it's probably best to use this option only if you know
values in the cache are long-lived and have a high hit rate."

Comparable test results for the other compression tools are not
yet available; submission of benchmarks welcome. However, the
documentation for the 'Snappy' library
(http://google.github.io/snappy/) states: "For instance,
compared to the fastest mode of zlib, Snappy is an order of
magnitude faster for most inputs, but the resulting compressed
files are anywhere from 20% to 100% bigger.")

· compress

Deprecated. Please use compressor, see above.

· enable_stats

Enable some basic statistics capturing. When enabled, every
read to the cache is counted, and every read to the cache that
finds a value in the cache is also counted. You can then
retrieve these values via the get_statistics() call. This
causes every read action to do a write on a page, which can
cause some more IO, so it's disabled by default. (default: 0)

· expire_time

Maximum time to hold values in the cache in seconds. A value of
0 means there is no explicit expiry time, and values are expired
only based on LRU usage. Can be expressed as 1m, 1h, 1d for
minutes/hours/days respectively. (default: 0)
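
A sketch of the array-ref forms of serializer and compressor, assuming JSON::PP and Compress::Zlib (both bundled with modern perls) as the underlying libraries. The wrapper subs are illustrative rather than the module's built-in 'json'/'zlib' handling; note how the serializer dereferences the scalar reference it receives and the de-serializer hands a reference back, as described above. The config key and its contents are hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use Compress::Zlib qw(compress uncompress);
use File::Temp qw(tempdir);
use JSON::PP ();

my $json = JSON::PP->new->utf8->canonical->allow_nonref;

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file  => "$dir/json.cache",
    init_file   => 1,
    # [ serializer, de-serializer ]: the serializer gets a *scalar ref*
    # to the value passed to set(), so dereference it; the de-serializer
    # must return a reference again.
    serializer  => [
        sub { $json->encode(${ $_[0] }) },
        sub { my $data = $json->decode($_[0]); return \$data },
    ],
    # [ compressor, de-compressor ] working on octet strings.
    compressor  => [
        sub { compress($_[0]) },
        sub { uncompress($_[0]) },
    ],
    expire_time => '1h',
);

$cache->set(config => { colours => [qw(red green)], retries => 3 });
my $config = $cache->get('config');
print join(',', @{ $config->{colours} }), " / $config->{retries}\n";
```

Any encoding works for the custom pair as long as the two subs are inverses and the de-serializer returns a scalar reference.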

You may specify the cache size as:

· cache_size

Size of cache. Can be expressed as 1k, 1m for kilobytes or
megabytes respectively. Automatically guesses page size/page
count values.

Or specify explicit page size/page count values. If none of these
are specified, the values page_size = 64k and num_pages = 89 are
used.

· page_size

Size of each page. Must be a power of 2 between 4k and 1024k.
If not, it is rounded to the nearest valid value.

· num_pages

Number of pages. Should be a prime number for best hashing.
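
A short sketch of the two ways to size the cache, plus the arithmetic behind the default layout: 89 pages of 64k is 5,832,704 bytes, roughly 5.6 MB. The 10m total and the 128k x 97 layout are arbitrary illustrative choices (97 is prime, following the num_pages advice); page_size is given in plain bytes here.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

# Default layout when no size options are given: 89 pages of 64 KB.
my $default_bytes = 64 * 1024 * 89;
printf "default cache: %d bytes (%.1f MB)\n",
    $default_bytes, $default_bytes / (1024 * 1024);

my $dir = tempdir(CLEANUP => 1);

# Either let the module derive page size/count from a total size ...
my $by_size = Cache::FastMmap->new(
    share_file => "$dir/by-size.cache",
    init_file  => 1,
    cache_size => '10m',
);

# ... or spell out the page layout yourself.
my $by_pages = Cache::FastMmap->new(
    share_file => "$dir/by-pages.cache",
    init_file  => 1,
    page_size  => 131072,   # 128 KB, a power of 2 within 4k..1024k
    num_pages  => 97,       # prime
);
```

The first line printed is "default cache: 5832704 bytes (5.6 MB)".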

The cache allows the use of callbacks for reading/writing data to
an underlying data store.

· context

Opaque reference passed as the first parameter to any callback
function, if specified.

· read_cb

Callback to read data from the underlying data store. Called
as:

    $read_cb->($context, $Key)

Should return the value to use. This value will be saved in the
cache for future retrievals. Return undef if there is no value
for the given key.

· write_cb

Callback to write data to the underlying data store. Called
as:

    $write_cb->($context, $Key, $Value, $ExpiryTime)

In 'write_through' mode, it's always called as soon as
set(...) is called on the Cache::FastMmap object. In
'write_back' mode, it's called when a value is expunged from
the cache if it's been changed by a set(...) rather than read
from the underlying store with the read_cb above.

Note: Expired items do result in the write_cb being called if
'write_back' caching is enabled and the item has been changed.
You can check the $ExpiryTime against "time()" if you only want
to write back values which aren't expired.

Also remember that write_cb may be called in a different
process from the one that placed the data in the cache in the
first place.

· delete_cb

Callback to delete data from the underlying data store. Called
as:

    $delete_cb->($context, $Key)

Called as soon as remove(...) is called on the Cache::FastMmap
object.

· cache_not_found

If set to true, then if the read_cb is called and it returns
undef to say nothing was found, then that information is stored
in the cache, so that next time a get(...) is called on that
key, undef is returned immediately rather than again calling
the read_cb.

· write_action

Either 'write_back' or 'write_through'. (default:
write_through)

· allow_recursive

If you're using a callback function, then normally the cache is
not re-enterable, and attempting to call a get/set on the cache
will cause an error. By setting this to 1, the cache will
unlock any pages before calling the callback. During the unlock
time, other processes may change data in the current cache page,
causing possible unexpected effects. You shouldn't set this
unless you know you want to be able to call back into the cache
from within a callback. (default: 0)

· empty_on_exit

When you have 'write_back' mode enabled, then you really want
to make sure all values from the cache are expunged when your
program exits so any changes are written back.

The trick is that we only want to do this in the parent
process; we don't want any child processes to empty the cache
when they exit. So if you set this, it takes the PID via $$,
and only calls empty in the DESTROY method if $$ matches the
pid we captured at the start. (default: 0)

· unlink_on_exit

Unlink the share file when the cache is destroyed.

As with empty_on_exit, this will only unlink the file if the
DESTROY occurs in the same PID that the cache was created in so
that any forked children don't unlink the file.

This value defaults to 1 if the share_file specified does not
already exist. If the share_file specified does already exist,
it defaults to 0.

· catch_deadlocks

Sets an alarm(10) before each page is locked via
fcntl(F_SETLKW) to catch any deadlock. This used to be the
default behaviour, but it's not really needed in the default
case and could clobber sub-second Time::HiRes alarms set up by
other code. Defaults to 0.
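
A sketch of 'write_back' mode combined with cache_not_found, again with a plain hash (%store, hypothetical) as the backing store and a counter to observe how often read_cb runs. It shows set() leaving the store untouched until the entry is expunged, here forced by calling empty().

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my %store;          # hypothetical backing store
my $reads = 0;      # counts read_cb invocations

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file      => "$dir/wb.cache",
    init_file       => 1,
    serializer      => '',
    context         => \%store,
    write_action    => 'write_back',
    cache_not_found => 1,
    read_cb  => sub { $reads++; return $_[0]{ $_[1] } },
    write_cb => sub { my ($ctx, $key, $value, $expiry) = @_; $ctx->{$key} = $value },
);

# write_back: set() only marks the entry as changed; the store is untouched.
$cache->set(colour => 'blue');
my $before = exists $store{colour} ? 'written' : 'not yet';

# empty() expunges everything, writing changed entries back via write_cb.
$cache->empty;
my $after = $store{colour};

# cache_not_found: the undef from read_cb is cached, so the second get()
# is answered from the cache without calling read_cb again.
$cache->get('missing') for 1 .. 2;

print "$before / $after / read_cb calls: $reads\n";
```

In a long-running program you would more typically set empty_on_exit so the parent flushes changed entries when it exits, rather than calling empty() by hand.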

get($Key, [ \%Options ])
Search cache for given Key. Returns undef if not found. If read_cb
is specified and the key is not found, calls the callback to try and
find the value for the key, and if found (or 'cache_not_found' is
set), stores it into the cache and returns the found value.

%Options is optional, and is used by get_and_set() to control the
locking behaviour. For now, you should probably ignore it unless
you read the code to understand how it works.

set($Key, $Value, [ \%Options ])
Store specified key/value pair into cache.

%Options is optional, and is used by get_and_set() to control the
locking behaviour. For now, you should probably ignore it unless
you read the code to understand how it works.

This method returns true if the value was stored in the cache,
false otherwise. See the PAGE SIZE AND KEY/VALUE LIMITS section for
more details.

get_and_set($Key, $Sub)
Atomically retrieve and set the value of a Key.

The page is locked while retrieving the $Key and is unlocked only
after the value is set, thus guaranteeing the value does not change
between the get and set operations.

$Sub is a reference to a subroutine that is called to calculate the
new value to store. $Sub gets $Key and the current value as
parameters, and should return the new value to set in the cache for
the given $Key.

If the subroutine returns an empty list, no value is stored back in
the cache. This avoids updating the expiry time on an entry if you
want to do a "get if in cache, store if not present" type callback.

For example, to atomically increment a value in the cache, you can
just use:

    $Cache->get_and_set($Key, sub { return ++$_[1]; });

In scalar context, the return value from this function is the *new*
value stored back into the cache.

In list context, a two item array is returned: the new value stored
back into the cache and a boolean that's true if the value was
stored in the cache, false otherwise. See the PAGE SIZE AND
KEY/VALUE LIMITS section for more details.

Notes:

· Do not perform any get/set operations from the callback sub, as
these operations lock the page and you may end up with a
deadlock!

· If your sub does a die/throws an exception, the page will
correctly be unlocked (1.15 onwards)
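
A sketch of get_and_set() used as a shared counter, building on the increment idiom above; the key name hits is hypothetical. The last call is made in list context so it also receives the stored flag.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file => "$dir/counter.cache",
    init_file  => 1,
    serializer => '',
);

# Atomic increment: the page stays locked between the read and the write.
# ++ on the initial undef yields 1.
$cache->get_and_set(hits => sub { return ++$_[1] }) for 1 .. 3;

# List context also reports whether the new value was actually stored.
my ($hits, $stored) = $cache->get_and_set(hits => sub { return ++$_[1] });

print "hits=$hits stored=", ($stored ? 1 : 0), "\n";
```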

remove($Key, [ \%Options ])
Delete the given key from the cache.

%Options is optional, and is used by get_and_remove() to control
the locking behaviour. For now, you should probably ignore it
unless you read the code to understand how it works.

get_and_remove($Key)
Atomically retrieve the value of a Key while removing it from the
cache.

The page is locked while retrieving the $Key and is unlocked only
after the value is removed, thus guaranteeing the value stored by
someone else isn't removed by us.
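
A sketch using get_and_remove() so that exactly one caller can claim an item; the job:17 key and its value are hypothetical. Because the read and the delete happen under one page lock, the second call finds nothing.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file => "$dir/jobs.cache",
    init_file  => 1,
    serializer => '',
);

$cache->set('job:17' => 'resize image.png');   # hypothetical job entry

# Scalar context: each call returns the value it removed (or undef).
my $first  = $cache->get_and_remove('job:17');
my $second = $cache->get_and_remove('job:17');

print defined $first  ? "claimed: $first\n"  : "nothing to claim\n";
print defined $second ? "claimed: $second\n" : "nothing to claim\n";
```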

clear()
Clear all items from the cache.

Note: If you're using callbacks, this has no effect on items in the
underlying data store. No delete callbacks are made.

purge()
Clear all expired items from the cache.

Note: If you're using callbacks, this has no effect on items in the
underlying data store. No delete callbacks are made, and no write
callbacks are made for the expired data.

empty($OnlyExpired)
Empty all items from the cache, or if $OnlyExpired is true, only
expired items.

Note: If 'write_back' mode is enabled, any changed items are
written back to the underlying store. Expired items are written
back to the underlying store as well.
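
A sketch of expiry followed by purge(), assuming a deliberately short expire_time of 1 second so the example finishes quickly; the session key is hypothetical. It checks that the expired entry is no longer returned and that no keys remain after purging.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file  => "$dir/expiry.cache",
    init_file   => 1,
    serializer  => '',
    expire_time => 1,      # seconds
);

$cache->set(session => 'abc');
my $fresh = $cache->get('session');

sleep 2;                   # let the entry expire

my $stale = $cache->get('session');   # expired entries are not returned
$cache->purge;                         # drop expired entries from the cache
my @left = $cache->get_keys(0);

printf "fresh=%s stale=%s keys left=%d\n",
    $fresh, defined $stale ? $stale : 'undef', scalar @left;
```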

get_keys($Mode)
Get a list of keys/values held in the cache. May immediately be out
of date because of the shared access nature of the cache.

If $Mode == 0, an array of keys is returned.

If $Mode == 1, then an array of hashrefs, with 'key',
'last_access', 'expire_time' and 'flags' keys is returned.

If $Mode == 2, then hashrefs also contain a 'value' key.
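
A sketch of the get_keys() modes; the fruit keys are hypothetical. Keys come back in no particular order, so they are sorted before use.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file => "$dir/keys.cache",
    init_file  => 1,
    serializer => '',
);
$cache->set($_ => uc $_) for qw(apple banana cherry);

# Mode 0: just the keys.
my @keys = sort { $a cmp $b } $cache->get_keys(0);

# Mode 2: hashrefs with key, last_access, expire_time, flags and value.
my %values = map { $_->{key} => $_->{value} } $cache->get_keys(2);

print "@keys\n";
print "$_ => $values{$_}\n" for sort keys %values;
```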

get_statistics($Clear)
Returns a two value list of (nreads, nreadhits). This only works if
you passed enable_stats in the constructor.

nreads is the total number of read attempts done on the cache since
it was created.

nreadhits is the total number of read attempts done on the cache
since it was created that found the key/value in the cache.

If $Clear is true, the values are reset immediately after they are
retrieved.
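
A sketch of turning get_statistics() into a hit rate. enable_stats must be on; the two hits and one miss (on hypothetical keys) are contrived so the numbers are easy to check, and passing a true $Clear resets the counters afterwards.

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file   => "$dir/stats.cache",
    init_file    => 1,
    serializer   => '',
    enable_stats => 1,     # without this the counters are not maintained
);

$cache->set(a => 1);
$cache->get('a');          # hit
$cache->get('a');          # hit
$cache->get('nope');       # miss

# A true $Clear resets the counters after reading them.
my ($nreads, $nreadhits) = $cache->get_statistics(1);
my $hit_rate = $nreads ? $nreadhits / $nreads : 0;
printf "reads=%d hits=%d hit rate=%.0f%%\n", $nreads, $nreadhits, 100 * $hit_rate;
```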

multi_get($PageKey, [ $Key1, $Key2, ... ])
The two multi_xxx routines act a bit differently from the other
routines. With multi_get, you pass a separate PageKey value and
then multiple keys. The PageKey value is hashed, and that page
locked. Then that page is searched for each key. It returns a hash
ref of Key => Value items found in that page in the cache.

The main advantage of this is speed, if you happen to need to
search for a lot of items on each call.

For instance, say you have users and a bunch of pieces of separate
information for each user. On a particular run, you need to
retrieve a sub-set of that information for a user. You could do
lots of get() calls, or you could use the 'username' as the page
key, and just use one multi_get() and multi_set() call instead.

A couple of things to note:

1. This makes multi_get()/multi_set() and get()/set()
incompatible. Don't mix calls to the two, because you won't
find the data you're expecting.

2. The writeback and callback modes of operation do not work with
multi_get()/multi_set(). Don't attempt to use them together.

multi_set($PageKey, { $Key1 => $Value1, $Key2 => $Value2, ... }, [ \%Options ])
Store the specified key/value pairs into the cache, in the page
selected by hashing $PageKey.
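
A sketch of grouping one user's fields under a single page key with multi_set()/multi_get(). The page key user:alice and the field names are hypothetical; as noted above, keys stored this way should only be read back with multi_get(), not get().

```perl
use strict;
use warnings;
use Cache::FastMmap;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $cache = Cache::FastMmap->new(
    share_file => "$dir/multi.cache",
    init_file  => 1,
    serializer => '',
);

# All of one user's fields live in the page chosen by hashing the page key.
my $page_key = 'user:alice';    # hypothetical page key
$cache->multi_set($page_key, {
    name  => 'Alice',
    email => 'alice@example.com',
    theme => 'dark',
});

# One lock, one page search; returns a hashref of Key => Value.
my $found = $cache->multi_get($page_key, [qw(name theme)]);

print "$found->{name} prefers the $found->{theme} theme\n";
```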

INTERNAL METHODS
_expunge_all($Mode, $WB)
Expunge all items from the cache.

Expunged items (that have not expired) are written back to the
underlying store if write_back is enabled.

_expunge_page($Mode, $WB, $Len)
Expunge items from the current page to make space for $Len bytes of
key/value items.

Expunged items (that have not expired) are written back to the
underlying store if write_back is enabled.

_lock_page($Page)
Lock a given page in the cache, and return an object reference that,
when DESTROYed, unlocks the page.

INCOMPATIBLE CHANGES
· From 1.15

    · Default share_file name is no longer /tmp/sharefile, but
    /tmp/sharefile-$pid-$time. This ensures that different
    runs/processes don't interfere with each other, but means you
    may not connect up to the file you expect. You should be
    choosing an explicit name in most cases.

    On Unix systems, you can set the environment variable TMPDIR
    to override the default directory of /tmp.

    · The new option unlink_on_exit defaults to true if you pass a
    filename for the share_file which doesn't already exist. This
    means if you have one process that creates the file, and
    another that expects the file to be there, by default it won't
    be.

    Otherwise, the defaults seem sensible: they clean up unneeded
    share files rather than leaving them around to accumulate.

· From 1.29

    · Default share_file name is no longer /tmp/sharefile-$pid-$time
    but /tmp/sharefile-$pid-$time-$random.

· From 1.31

    · Before 1.31, if you were using raw_values => 0 mode, then the
    write_cb would be called with raw frozen data, rather than the
    thawed object. From 1.31 onwards, it correctly calls write_cb
    with the thawed object value (i.e. what was passed to the
    ->set() call in the first place)

· From 1.36

    · Before 1.36, an alarm(10) would be set before each attempt to
    lock a page. The only purpose of this was to detect deadlocks,
    which should only happen if the Cache::FastMmap code was buggy,
    or a callback function in get_and_set() made another call into
    Cache::FastMmap.

    However, this added unnecessary extra system calls for every
    lookup, and for users using Time::HiRes, it could clobber any
    existing alarms that had been set with sub-second resolution.

    So this has now been made an optional feature via the
    catch_deadlocks option passed to new.

SEE ALSO
MLDBM::Sync, IPC::MM, Cache::FileCache, Cache::SharedMemoryCache, DBI,
Cache::Mmap, BerkeleyDB

Latest news/details can also be found at:

<http://cpan.robm.fastmail.fm/cachefastmmap/>

Available on GitHub at:

<https://github.com/robmueller/cache-fastmmap/>

AUTHOR
Rob Mueller <mailto:cpan@robm.fastmail.fm>

COPYRIGHT AND LICENSE
Copyright (C) 2003-2017 by FastMail Pty Ltd

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

perl v5.30.0 2019-07-26 Cache::FastMmap(3)