Cache::FastMmap(3)    User Contributed Perl Documentation    Cache::FastMmap(3)

NAME
Cache::FastMmap - Uses an mmap'ed file to act as a shared memory
interprocess cache

SYNOPSIS
    use Cache::FastMmap;

    # Uses vaguely sane defaults
    $Cache = Cache::FastMmap->new();

    # $Value must be a reference...
    $Cache->set($Key, $Value);
    $Value = $Cache->get($Key);

    $Cache = Cache::FastMmap->new(raw_values => 1);

    # $Value can't be a reference...
    $Cache->set($Key, $Value);
    $Value = $Cache->get($Key);

ABSTRACT
A shared memory cache through an mmap'ed file. Its core is written in
C for performance. It uses fcntl locking to ensure multiple processes
can safely access the cache at the same time. It uses a basic LRU
algorithm to keep the most recently used entries in the cache.

DESCRIPTION
In multi-process environments (e.g. mod_perl, forking daemons, etc.), it's
common to want to cache information, but have that cache shared between
processes. Many solutions already exist, and may suit your situation
better:

· MLDBM::Sync - acts as a database, data is not automatically
  expired, slow

· IPC::MM - hash implementation is broken, data is not automatically
  expired, slow

· Cache::FileCache - lots of features, slow

· Cache::SharedMemoryCache - lots of features, VERY slow. Uses
  IPC::ShareLite which freeze/thaws ALL data at each read/write

· DBI - use your favourite RDBMS. Can perform well, but needs a DB
  server running. Very global. Socket connection latency

· Cache::Mmap - similar to this module, in pure Perl. Slows down with
  larger pages

· BerkeleyDB - very fast (data ends up mostly in shared memory cache)
  but acts as a database overall, so data is not automatically
  expired

In the case I was working on, I needed:

· Automatic expiry and space management

· Very fast access to lots of small items

· The ability to fetch/store many items in one go

Which is why I developed this module. It tries to be quite efficient
through a number of means:

· Core code is written in C for performance

· It uses multiple pages within a file, and uses fcntl to lock only
  one page at a time, reducing contention when multiple processes
  access the cache

· It uses a dual level hashing system (hash to find page, then hash
  within each page to find a slot) to make most "get()" calls O(1)
  and fast

· On each "set()", if there are slots and page space available, only
  the slot has to be updated and the data written at the end of the
  used data space. If either runs out, the page is re-organised to
  create new slots/space, which is done in an efficient way

The class also supports read-through, and write-back or write-through
callbacks to access the real data if it's not in the cache, meaning
that code like this:

    my $Value = $Cache->get($Key);
    if (!defined $Value) {
        $Value = $RealDataSource->get($Key);
        $Cache->set($Key, $Value);
    }

isn't required. Instead, you specify in the constructor:

    Cache::FastMmap->new(
        ...
        context  => $RealDataSourceHandle,
        read_cb  => sub { $_[0]->get($_[1]) },
        write_cb => sub { $_[0]->set($_[1], $_[2]) },
    );

And then:

    my $Value = $Cache->get($Key);

    $Cache->set($Key, $NewValue);

will just work, and values will be read from/written to the underlying
data source automatically as needed.
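
As a minimal sketch of the read-through and write-through callbacks, the
following uses a plain hash (%store, purely hypothetical) in place of a
real data source, and counters ($reads, $writes) that exist only to make
the callback activity visible. It assumes raw_values mode so plain
strings can be stored.

```perl
use strict;
use warnings;
use Cache::FastMmap;

# Hypothetical backing store: a plain in-process hash standing in for a
# database or other real data source.
my %store = (apple => 'red');
my ($reads, $writes) = (0, 0);

my $cache = Cache::FastMmap->new(
    raw_values => 1,        # values here are plain strings
    context    => \%store,  # handed to every callback as the first argument
    read_cb    => sub { my ($ctx, $key) = @_; $reads++; return $ctx->{$key} },
    write_cb   => sub { my ($ctx, $key, $val) = @_; $writes++; $ctx->{$key} = $val },
);

my $first  = $cache->get('apple');   # not cached yet: read_cb fetches it
my $second = $cache->get('apple');   # served from the cache
$cache->set(pear => 'green');        # default write_through calls write_cb

print "apple=$first reads=$reads pear=$store{pear} writes=$writes\n";
```

The callbacks receive the context as their first argument, so the same
subs could be reused with a different store simply by changing context.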

PERFORMANCE
If you're storing relatively large and complex structures into the
cache, then you're limited by the speed of the Storable module. If
you're storing simple structures, or raw data, then Cache::FastMmap has
noticeable performance improvements.

See <http://cpan.robm.fastmail.fm/cache_perf.html> for some comparisons
to other modules.

COMPATIBILITY
Cache::FastMmap uses mmap to map a file as the shared cache space, and
fcntl to do page locking. This means it should work on most UNIX-like
operating systems.

Ash Berlin has written a Win32 layer using MapViewOfFile et al. to
provide support for the Win32 platform.

MEMORY SIZE
Because Cache::FastMmap mmaps a shared file into your process's memory
space, this can make each process look quite large, even though it's
just mmap'ed memory that's shared between all processes that use the
cache, and may even be swapped out if the cache is getting low usage.

However, the OS will think your process is quite large, which might
mean you hit some BSD::Resource limits or ulimits you set previously
that you thought were sane, but aren't anymore, so be aware.

CACHE FILES AND OS ISSUES
Because Cache::FastMmap uses an mmap'ed file, when you put values into
the cache, you are actually "dirtying" pages in memory that belong to
the cache file. Your OS will want to write those dirty pages back to
the file on the actual physical disk, but the rate at which it does
that is very OS dependent.

In Linux, you have some control over how the OS writes those pages back
using a number of parameters in /proc/sys/vm:

    dirty_background_ratio
    dirty_expire_centisecs
    dirty_ratio
    dirty_writeback_centisecs

How you tune these depends heavily on your setup.
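
Before tuning, it can help to see what the current values are. The
following is a small sketch that only reads the four files named above;
it assumes a Linux /proc layout and simply reports "unavailable" on
systems where the files do not exist.

```perl
use strict;
use warnings;

# The four writeback tunables listed above.
my @params = qw(
    dirty_background_ratio
    dirty_expire_centisecs
    dirty_ratio
    dirty_writeback_centisecs
);

my %vm;
for my $name (@params) {
    my $path = "/proc/sys/vm/$name";
    if (open my $fh, '<', $path) {
        chomp(my $value = <$fh> // '');
        $vm{$name} = $value;
        close $fh;
    }
    # Missing files (non-Linux systems) are reported rather than fatal.
    printf "%-28s %s\n", $name, $vm{$name} // 'unavailable';
}
```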

As an interesting point, if you use a highmem Linux kernel, a change
between 2.6.16 and 2.6.20 made the kernel flush memory a LOT more.
There are details in this kernel mailing list thread:
<http://www.uwsg.iu.edu/hypermail/linux/kernel/0711.3/0804.html>

In most cases, people are not actually concerned about the persistence
of data in the cache, and so are happy to disable writing of any cache
data back to disk at all. Basically what they want is an in-memory-only
shared cache. The best way to do that is to use a "tmpfs" filesystem
and put all cache files on there.

For instance, all our machines have a /tmpfs mount point that we create
in /etc/fstab as:

    none /tmpfs tmpfs defaults,noatime,size=1000M 0 0

And we put all our cache files on there. The tmpfs filesystem is smart
enough to only use memory as required by files actually on the tmpfs,
so making it 1G in size doesn't actually use 1G of memory; it only uses
as much as the cache files we put on it. In all cases, we ensure that
we never run out of real memory, so the cache files effectively act
just as named access points to shared memory.

Some people have suggested using anonymous mmap'ed memory. Unfortunately
we need a file descriptor to do the fcntl locking on, so we'd have to
create a separate file on a filesystem somewhere anyway. It seems
easier to just create an explicit "tmpfs" filesystem.
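
A sketch of pointing a cache at such a mount might look like the
following. The /tmpfs path matches the fstab line above; the file name
"example-cache-$$" is a hypothetical choice, and the fallback to the
system temporary directory is only there so the sketch still runs on
machines without that mount.

```perl
use strict;
use warnings;
use File::Spec;
use Cache::FastMmap;

# Prefer the tmpfs mount if it exists and is writable; otherwise fall
# back to the system temp directory (an assumption for portability).
my $dir = (-d '/tmpfs' && -w _) ? '/tmpfs' : File::Spec->tmpdir;
my $share_file = File::Spec->catfile($dir, "example-cache-$$");

my $cache = Cache::FastMmap->new(
    share_file     => $share_file,
    init_file      => 1,   # start from an empty, consistent file
    unlink_on_exit => 1,   # remove the file when this process is done
    raw_values     => 1,
);

$cache->set(greeting => 'hello');
print "cache file: $share_file (", -s $share_file, " bytes)\n";
```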

PAGE SIZE AND KEY/VALUE LIMITS
To reduce lock contention, Cache::FastMmap breaks up the file into
pages. When you get/set a value, it hashes the key to get a page, then
locks that page, and uses a hash table within the page to get/store the
actual key/value pair.

One consequence of this is that you cannot store values larger than a
page in the cache at all. Attempting to store a value larger than the
page size will fail (the set() function will return false).

Also keep in mind that each page has its own hash table, and that we
store the key and value data of each item. So if you are expecting to
store large values and/or keys in the cache, you should use page sizes
that are definitely larger than your largest key + value size + a few
kbytes for the overhead.
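
The limit can be seen with a deliberately small page. This sketch
assumes raw_values mode and picks page_size => 4096 and num_pages => 11
purely for illustration; the key names are hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(
    page_size  => 4096,   # smallest allowed page, so the limit is easy to hit
    num_pages  => 11,
    raw_values => 1,
);

my $small_ok = $cache->set(small => 'x' x 100);      # fits easily in a page
my $large_ok = $cache->set(large => 'x' x 10_000);   # bigger than a whole page

print "small stored: ", ($small_ok ? "yes" : "no"), "\n";
print "large stored: ", ($large_ok ? "yes" : "no"), "\n";
```

Checking the return value of set() is the only way to notice that an
oversized value was not stored.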

USAGE
Because the cache uses shared memory through an mmap'ed file, you have
to make sure each process connects up to the file. There are probably
two main ways to do this:

· Create the cache in the parent process, and then when it forks,
  each child will inherit the same file descriptor, mmap'ed memory,
  etc. and just work. This is the recommended way. (BEWARE: This only
  works under UNIX as Win32 has no concept of forking)

· Explicitly connect up in each forked child to the share file. In
  this case, make sure the file already exists and the children
  connect with init_file => 0 to avoid deleting the cache contents
  and possible corruption from race conditions. Also be careful that
  multiple children may race to create the file at the same time,
  each overwriting and corrupting content. Use a separate lock file
  if you must to ensure only one child creates the file. (This is the
  only possible way under Win32)

The first way is usually the easiest. If you're using the cache in a
Net::Server based module, you'll want to open the cache in the
"pre_loop_hook", because that's executed before the fork, but after the
process ownership has changed and any chroot has been done.

In mod_perl, just open the cache at the global level in the appropriate
module, which is executed as the server is starting and before it
starts forking children, but you'll probably want to chmod or chown the
file to the permissions of the apache process.
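
The first approach can be sketched as below: the cache is created
before fork(), the child writes through the inherited mapping, and the
parent reads the value back. This assumes a UNIX-like system and
raw_values mode; the key name "message" is hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;

# Create the cache before forking so the child inherits the mapping.
my $cache = Cache::FastMmap->new(raw_values => 1);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: write through the inherited mapping, then exit.
    $cache->set(message => "hello from child $$");
    exit 0;
}

# Parent: wait for the child, then read what it stored.
waitpid($pid, 0);
my $message = $cache->get('message');
print "parent $$ read: $message\n";
```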

METHODS
new(%Opts)
    Create a new Cache::FastMmap object.

    Basic global parameters are:

    · share_file

      File to mmap for sharing of data. Default on Unix:
      /tmp/sharefile-$pid-$time-$random. Default on Windows:
      %TEMP%\sharefile-$pid-$time-$random

    · init_file

      Clear any existing values and re-initialise file. Useful to do
      in a parent that forks off children to ensure that file is
      empty at the start (default: 0)

      Note: This is quite important to do in the parent to ensure a
      consistent file structure. The shared file is not perfectly
      transaction safe, and so if a child is killed at the wrong
      instant, it might leave the cache file in an inconsistent
      state.

    · raw_values

      Store values as raw binary data rather than using Storable to
      freeze/thaw data structures (default: 0)

    · compress

      Compress the value (but not the key) before storing into the
      cache. If you set this to 1, the module will attempt to require
      the Compress::Zlib module and then use the memGzip() function
      on the value data before storing into the cache, and
      memGunzip() when retrieving data from the cache. Some initial
      testing shows that the uncompressing tends to be very fast,
      though the compressing can be quite slow, so it's probably best
      to use this option only if you know values in the cache are
      long lived and have a high hit rate. (default: 0)

    · enable_stats

      Enable some basic statistics capturing. When enabled, every
      read to the cache is counted, and every read to the cache that
      finds a value in the cache is also counted. You can then
      retrieve these values via the get_statistics() call. This
      causes every read action to do a write on a page, which can
      cause some more IO, so it's disabled by default. (default: 0)

    · expire_time

      Maximum time to hold values in the cache in seconds. A value of
      0 means no explicit expiry time, and values are expired
      only based on LRU usage. Can be expressed as 1m, 1h, 1d for
      minutes/hours/days respectively. (default: 0)

    You may specify the cache size as:

    · cache_size

      Size of cache. Can be expressed as 1k, 1m for kilobytes or
      megabytes respectively. Automatically guesses page size/page
      count values.

    Or specify explicit page size/page count values. If none of these
    are specified, the values page_size = 64k and num_pages = 89 are
    used.

    · page_size

      Size of each page. Must be a power of 2 between 4k and 1024k.
      If not, it is rounded to the nearest value.

    · num_pages

      Number of pages. Should be a prime number for best hashing

    The cache allows the use of callbacks for reading/writing data to
    an underlying data store.

    · context

      Opaque reference passed as the first parameter to any callback
      function if specified

    · read_cb

      Callback to read data from the underlying data store. Called
      as:

          $read_cb->($context, $Key)

      Should return the value to use. This value will be saved in the
      cache for future retrievals. Return undef if there is no value
      for the given key

    · write_cb

      Callback to write data to the underlying data store. Called
      as:

          $write_cb->($context, $Key, $Value, $ExpiryTime)

      In 'write_through' mode, it's always called as soon as a
      set(...) is called on the Cache::FastMmap class. In
      'write_back' mode, it's called when a value is expunged from
      the cache if it's been changed by a set(...) rather than read
      from the underlying store with the read_cb above.

      Note: Expired items do result in the write_cb being called if
      'write_back' caching is enabled and the item has been changed.
      You can check the $ExpiryTime against "time()" if you only want
      to write back values which aren't expired.

      Also remember that write_cb may be called in a different
      process to the one that placed the data in the cache in the
      first place

    · delete_cb

      Callback to delete data from the underlying data store. Called
      as:

          $delete_cb->($context, $Key)

      Called as soon as remove(...) is called on the Cache::FastMmap
      class

    · cache_not_found

      If set to true, then if the read_cb is called and it returns
      undef to say nothing was found, then that information is stored
      in the cache, so that next time a get(...) is called on that
      key, undef is returned immediately rather than again calling
      the read_cb

    · write_action

      Either 'write_back' or 'write_through'. (default:
      write_through)

    · allow_recursive

      If you're using a callback function, then normally the cache is
      not re-entrant, and attempting to call a get/set on the cache
      will cause an error. By setting this to 1, the cache will
      unlock any pages before calling the callback. During the unlock
      time, other processes may change data in the current cache page,
      causing possible unexpected effects. You shouldn't set this
      unless you know you want to be able to call back into the cache
      from within a callback. (default: 0)

    · empty_on_exit

      When you have 'write_back' mode enabled, then you really want
      to make sure all values from the cache are expunged when your
      program exits so any changes are written back.

      The trick is that we only want to do this in the parent
      process; we don't want any child processes to empty the cache
      when they exit. So if you set this, it takes the PID via $$,
      and only calls empty in the DESTROY method if $$ matches the
      pid we captured at the start. (default: 0)

    · unlink_on_exit

      Unlink the share file when the cache is destroyed.

      As with empty_on_exit, this will only unlink the file if the
      DESTROY occurs in the same PID that the cache was created in so
      that any forked children don't unlink the file.

      This value defaults to 1 if the share_file specified does not
      already exist. If the share_file specified does already exist,
      it defaults to 0.

    · catch_deadlocks

      Sets an alarm(10) before each page is locked via
      fcntl(F_SETLKW) to catch any deadlock. This used to be the
      default behaviour, but it's not really needed in the default
      case and could clobber sub-second Time::HiRes alarms set up by
      other code. (default: 0)
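
    As an illustration of combining several of these options, the
    sketch below sizes the cache with cache_size, sets an expiry, and
    uses cache_not_found so that a miss reported by read_cb is
    remembered. The read_cb here never finds anything and the $lookups
    counter is hypothetical, added only to show how often the callback
    runs.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $lookups = 0;
my $cache = Cache::FastMmap->new(
    cache_size      => '1m',    # let the module pick page size/count
    expire_time     => '10m',   # entries expire after ten minutes
    raw_values      => 1,
    cache_not_found => 1,       # remember misses reported by read_cb
    read_cb         => sub { $lookups++; return undef },  # nothing is ever found
);

my $first  = $cache->get('missing');   # calls read_cb
my $second = $cache->get('missing');   # answered from the cached "not found"
print "read_cb calls: $lookups\n";
```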

get($Key, [ \%Options ])
    Search cache for given Key. Returns undef if not found. If read_cb
    is specified and the key is not found, calls the callback to try
    to find the value for the key, and if found (or 'cache_not_found'
    is set), stores it into the cache and returns the found value.

    %Options is optional, and is used by get_and_set() to control the
    locking behaviour. For now, you should probably ignore it unless
    you read the code to understand how it works

set($Key, $Value, [ \%Options ])
    Store specified key/value pair into cache

    %Options is optional, and is used by get_and_set() to control the
    locking behaviour. For now, you should probably ignore it unless
    you read the code to understand how it works

    This method returns true if the value was stored in the cache,
    false otherwise. See the PAGE SIZE AND KEY/VALUE LIMITS section for
    more details.

get_and_set($Key, $Sub)
    Atomically retrieve and set the value of a Key.

    The page is locked while retrieving the $Key and is unlocked only
    after the value is set, thus guaranteeing the value does not change
    between the get and set operations.

    $Sub is a reference to a subroutine that is called to calculate the
    new value to store. $Sub gets $Key and the current value as
    parameters, and should return the new value to set in the cache for
    the given $Key.

    For example, to atomically increment a value in the cache, you can
    just use:

        $Cache->get_and_set($Key, sub { return ++$_[1]; });

    In scalar context, the return value from this function is the *new*
    value stored back into the cache.

    In list context, a two item array is returned; the new value stored
    back into the cache and a boolean that's true if the value was
    stored in the cache, false otherwise. See the PAGE SIZE AND
    KEY/VALUE LIMITS section for more details.

    Notes:

    · Do not perform any get/set operations from the callback sub, as
      these operations lock the page and you may end up with a
      deadlock!

    · If your sub does a die/throws an exception, the page will
      correctly be unlocked (1.15 onwards)
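
    A slightly fuller sketch of a counter, assuming raw_values mode so
    that plain numbers can be stored. The key name "hits" and the $bump
    helper are hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);

# The callback receives ($Key, $CurrentValue); the current value is undef
# the first time the key is seen.
my $bump = sub { my ($key, $old) = @_; return ($old || 0) + 1 };

$cache->get_and_set(hits => $bump) for 1 .. 3;

# List context also reports whether the new value was stored.
my ($count, $stored) = $cache->get_and_set(hits => $bump);
print "hits=$count stored=", ($stored ? 1 : 0), "\n";
```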

remove($Key, [ \%Options ])
    Delete the given key from the cache

    %Options is optional, and is used by get_and_remove() to control
    the locking behaviour. For now, you should probably ignore it
    unless you read the code to understand how it works

get_and_remove($Key)
    Atomically retrieve value of a Key while removing it from the
    cache.

    The page is locked while retrieving the $Key and is unlocked only
    after the value is removed, thus guaranteeing the value stored by
    someone else isn't removed by us.
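
    One use for this is handing a queued item to exactly one process.
    A minimal sketch, assuming raw_values mode; the key "job" and its
    value are hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);
$cache->set(job => 'resize-image-42');

# Scalar context: the value that was in the cache, which is now removed.
my $job   = $cache->get_and_remove('job');
my $again = $cache->get('job');    # the entry is gone

print "took: $job; still cached: ", (defined $again ? "yes" : "no"), "\n";
```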

clear()
    Clear all items from the cache

    Note: If you're using callbacks, this has no effect on items in the
    underlying data store. No delete callbacks are made

purge()
    Clear all expired items from the cache

    Note: If you're using callbacks, this has no effect on items in the
    underlying data store. No delete callbacks are made, and no write
    callbacks are made for the expired data

empty($OnlyExpired)
    Empty all items from the cache, or if $OnlyExpired is true, only
    expired items.

    Note: If 'write_back' mode is enabled, any changed items are
    written back to the underlying store. Expired items are written
    back to the underlying store as well.
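
    The write-back behaviour of empty() can be sketched as follows,
    with a plain hash (%store, hypothetical) standing in for the
    underlying store and raw_values mode assumed.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my %store;   # hypothetical backing store
my $cache = Cache::FastMmap->new(
    raw_values   => 1,
    write_action => 'write_back',
    context      => \%store,
    write_cb     => sub { my ($ctx, $key, $val) = @_; $ctx->{$key} = $val },
);

$cache->set(colour => 'blue');
my $before = exists $store{colour} ? 1 : 0;   # still only in the cache

$cache->empty();                              # changed items are written back
my $after = exists $store{colour} ? 1 : 0;

print "in store before empty: $before, after empty: $after\n";
```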

get_keys($Mode)
    Get a list of keys/values held in the cache. May immediately be out
    of date because of the shared access nature of the cache

    If $Mode == 0, an array of keys is returned

    If $Mode == 1, then an array of hashrefs, with 'key',
    'last_access', 'expire_time' and 'flags' keys is returned

    If $Mode == 2, then hashrefs also contain 'value' key
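
    A short sketch of modes 0 and 2, assuming raw_values mode; the keys
    and values are hypothetical. The results are sorted here only to
    make the output stable, since no ordering is promised.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);
$cache->set($_ => uc $_) for qw(alpha beta gamma);

my @keys    = sort { $a cmp $b } $cache->get_keys(0);   # plain key names
my @entries = $cache->get_keys(2);                      # hashrefs with 'value'

print "keys: @keys\n";
for my $entry (sort { $a->{key} cmp $b->{key} } @entries) {
    print "$entry->{key} => $entry->{value}\n";
}
```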

get_statistics($Clear)
    Returns a two value list of (nreads, nreadhits). This only works if
    you passed enable_stats in the constructor

    nreads is the total number of read attempts done on the cache since
    it was created

    nreadhits is the total number of read attempts done on the cache
    since it was created that found the key/value in the cache

    If $Clear is true, the values are reset immediately after they are
    retrieved
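
    For example, a hit ratio could be derived along these lines. This
    is a sketch assuming raw_values mode, with hypothetical key names.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1, enable_stats => 1);
$cache->set(known => 'value');

$cache->get('known') for 1 .. 2;   # two reads that find the key
$cache->get('unknown');            # one read that does not

my ($nreads, $nreadhits) = $cache->get_statistics();
printf "reads=%d hits=%d ratio=%.2f\n",
    $nreads, $nreadhits, $nreads ? $nreadhits / $nreads : 0;
```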

multi_get($PageKey, [ $Key1, $Key2, ... ])
    The two multi_xxx routines act a bit differently to the other
    routines. With the multi_get, you pass a separate PageKey value and
    then multiple keys. The PageKey value is hashed, and that page
    locked. Then that page is searched for each key. It returns a hash
    ref of Key => Value items found in that page in the cache.

    The main advantage of this is just a speed one, if you happen to
    need to search for a lot of items on each call.

    For instance, say you have users and a bunch of pieces of separate
    information for each user. On a particular run, you need to
    retrieve a sub-set of that information for a user. You could do
    lots of get() calls, or you could use the 'username' as the page
    key, and just use one multi_get() and multi_set() call instead.

    A couple of things to note:

    1. This makes multi_get()/multi_set() and get()/set()
       incompatible. Don't mix calls to the two, because you won't
       find the data you're expecting

    2. The writeback and callback modes of operation do not work with
       multi_get()/multi_set(). Don't attempt to use them together.

multi_set($PageKey, { $Key1 => $Value1, $Key2 => $Value2, ... }, [ \%Options ])
    Store the specified key/value pairs into the cache, in the page
    selected by $PageKey
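
    The per-user scenario described under multi_get() might be sketched
    like this, assuming raw_values mode. The page key 'alice' and the
    item names are hypothetical.

```perl
use strict;
use warnings;
use Cache::FastMmap;

my $cache = Cache::FastMmap->new(raw_values => 1);

# 'alice' is the page key; every item below lives in the page it hashes to.
$cache->multi_set('alice', { email => 'alice@example.com', plan => 'pro' });

# Ask for three keys under the same page key; only found items come back.
my $found = $cache->multi_get('alice', [qw(email plan missing)]);
for my $key (sort keys %$found) {
    print "$key => $found->{$key}\n";
}
```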

INTERNAL METHODS
_expunge_all($Mode, $WB)
    Expunge all items from the cache

    Expunged items (that have not expired) are written back to the
    underlying store if write_back is enabled

_expunge_page($Mode, $WB, $Len)
    Expunge items from the current page to make space for $Len bytes
    key/value items

    Expunged items (that have not expired) are written back to the
    underlying store if write_back is enabled

_lock_page($Page)
    Lock a given page in the cache, and return an object reference that
    when DESTROYed, unlocks the page

INCOMPATIBLE CHANGES
· From 1.15

  · Default share_file name is no longer /tmp/sharefile, but
    /tmp/sharefile-$pid-$time. This ensures that different
    runs/processes don't interfere with each other, but means you
    may not connect up to the file you expect. You should be
    choosing an explicit name in most cases.

    On Unix systems, you can pass in the environment variable
    TMPDIR to override the default directory of /tmp

  · The new option unlink_on_exit defaults to true if you pass a
    filename for the share_file which doesn't already exist. This
    means if you have one process that creates the file, and
    another that expects the file to be there, by default it won't
    be.

    Otherwise the defaults seem sensible to clean up unneeded share
    files rather than leaving them around to accumulate.

· From 1.29

  · Default share_file name is no longer /tmp/sharefile-$pid-$time
    but /tmp/sharefile-$pid-$time-$random.

· From 1.31

  · Before 1.31, if you were using raw_values => 0 mode, then the
    write_cb would be called with raw frozen data, rather than the
    thawed object. From 1.31 onwards, it correctly calls write_cb
    with the thawed object value (i.e. what was passed to the ->set()
    call in the first place)

· From 1.36

  · Before 1.36, an alarm(10) would be set before each attempt to
    lock a page. The only purpose of this was to detect deadlocks,
    which should only happen if the Cache::FastMmap code was buggy,
    or a callback function in get_and_set() made another call into
    Cache::FastMmap.

    However this added unnecessary extra system calls for every
    lookup, and for users using Time::HiRes, it could clobber any
    existing alarms that had been set with sub-second resolution.

    So this has now been made an optional feature via the
    catch_deadlocks option passed to new.

SEE ALSO
MLDBM::Sync, IPC::MM, Cache::FileCache, Cache::SharedMemoryCache, DBI,
Cache::Mmap, BerkeleyDB

Latest news/details can also be found at:

<http://cpan.robm.fastmail.fm/cachefastmmap/>

Available on github at:

<https://github.com/robmueller/cache-fastmmap/>

AUTHOR
Rob Mueller <mailto:cpan@robm.fastmail.fm>

COPYRIGHT AND LICENSE
Copyright (C) 2003-2011 by Opera Software Australia Pty Ltd

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

perl v5.12.3                      2011-07-18                Cache::FastMmap(3)