« Xtremely Quick Development SystemComplexity: the mismatch between needs, does, and wants »

Memcached set invalidation


Permalink 09:05:32 pm, by fumanchu Email , 1036 words   English (US)
Categories: IT

Memcached set invalidation

One of the saving graces of Memcached is its use of a stable, static address space; that is, client X and client Y can each manipulate datum D as long as they both know its key. But because the space of addresses is static and flat (not hierarchical), it also tends to be huge, and sparse. This can make it difficult to perform set operations, such as invalidation of a class of entries.

For example, let's design a Data Access component which sits in front of a database, and accepts requests using an RPC-style interface. It fetches results from the cache where possible; otherwise, it reads from a database and writes the result to the cache before returning it to the caller. Assume we have multiple Data Access (DAL) servers, multiple Memcached servers, and one or more database servers.

A common set-invalidation scenario involves the caching of lists of items. Let's suppose a web server requests get_items_in_cart(cart_id=13, limit=10, offset=30). The DAL server might translate this into a cache key such as get_items_in_cart::13:10:30. So far so good. But that's just a read; when we add writes to the picture, cache coherency starts to become a problem.

When a webserver asks a DAL component to add an item to cart 13, we need to invalidate (or recalculate) get_items_in_cart::13:10:30. However, it should be readily apparent that we need to invalidate not just a single key, but potentially a whole class of keys, get_items_in_cart::id:limit:offset, and that that set of keys could be very large. Let's conservatively guess:

>>> max_items = 1000
>>> sum([(max_items / n) for n in range(5, 21)])

That is, if we restrict the 'limit' argument to the interval [5, 20], and assume a maximum number of items per cart of 1024, we end up with over 1500 potential cache keys per cart for a single function! And if we have a million carts...? There are several ways we could attack this problem:

  1. Restrict the interface. Don't parameterize functions unless absolutely necessary. This would mean, rather than expose get_items_in_cart(cart_id, limit, offset), which has 3 arguments, we might instead always assume a page size of 10 and expose get_paged_items_in_cart(cart_id, page), which would result in only 100 potential cache keys per cart, not 1500. We might even go further and expose get_items_in_cart(cart_id) (1 cache key per cart) and let the caller do its own paging. This makes cache invalidation more performant (fewer keys to manage) at the cost of API simplicity and extensibility (because the API is less flexible). So your programmers will scream, possibly silently, when the product team decides they want to change the number of items on a page.
  2. Use an address registry. This approach introduces a separate directory service; when the result of get_items_in_cart(cart_id, limit, offset) is cached, the address is registered in a central list. When a class of keys needs to be invalidated, that central registry can then iterate over all seen keys for each class quickly and stably (or return the list so the caller can do it). This can reduce performance with the extra network traffic, is not very scalable, and tacks on a new reliability issue (the directory is a single point of failure, which costs even more performance if made redundant). Also, your system architects will probably leave you threatening notes.
  3. Broadcast address hits. This is like #2, but broadcasts address registration to all DAL servers instead of using a central registry. Network traffic is much increased, and reliably updating all participating nodes can be quite a chore. Your network admins will most likely start randomly unplugging your office's LAN and phones, and then surreptitiously plug them back in when you knock on their door.
  4. Hash the addresses. If the consumers of our API tend to call only a few combinations of arguments for our cart function, then although we have a large potential space of cache keys, the set of actual keys might be very small, maybe only 2 or 3 keys! The classic means of reducing a large, sparse array to a small, denser one is hashing. That is, rather than using the key get_items_in_cart::13:10:30, we might hash it to get_items_in_cart::13/x, where 'x' is a number from 1 to 4. If we could come up with a reliable hash, we could then invalidate 4 keys, not 1500. This approach can become very complex, and results in cache keys which aren't very transparent (so getting other components to interact with the system is harder). But it doesn't require any extra network calls or new components; each DAL server does its own hash calculations. Your VP of Engineering may stick you with the bill for all those new engineers with Math Ph.D.'s, though.
  5. Rely on timeouts. If you don't mind web servers serving stale data for get_items_in_cart::13:10:30 for, say, 2 minutes, then tell memcached to expire that entry after 2 minutes, and don't bother trying to invalidate it on write. This is by far the simplest solution when you can swing it. But your system operators or QA team might call you first when they have to wait for one too many timeouts during a 3 A.M. debugging session.
  6. Add a version column to the carts table, and use it when forming cache keys. When you update the row, update the version. This can increase the number of database hits if not done properly, but in this example, your callers most likely have already fetched the entire relevant carts row, and might be able to pass the version in the DAL call. However, don't be surprised when your DBA's tell you the HR DB crashed and they lost your timesheets for the past year.

That's all I can think of at the moment. I think I've got some political goodwill among our other programmers at the moment (in that they don't wish me any specific harm, yet), so I may go with solution #1 for now on my current project. But I've also recently been playing architect, so maybe I'll pick solution #2 and just throw away any threatening notes I leave for myself.


Comment from: Dan McKinley [Visitor] · http://mcfunley.com

Cromulent Simpsons reference.

12/31/07 @ 07:28
Comment from: David Avraamides [Visitor] · http://davidavraamides.net

Why not use a variant of #2 where the address registry is memcache? Just have a key 'get_items_in_cart' whose value is a pickled tuple of the specific cached instances of the function:

(get_items_in_cart::13:10:30, get_items_in_cart::5:5:20, ...)

Then when you get a cache miss, you just need to append the new key to the key list as well as cache the actual data. A cache reset just iterates through the keylist and invalidates each key, and then truncates the keylist.

12/31/07 @ 08:43
Comment from: Simon Willison [Visitor] · http://simonwillison.net/

Option 6 looks pretty good to me - I don't understand your argument against using it.

12/31/07 @ 09:43
Comment from: fumanchu [Member] Email


Thanks for mentioning that. Storing indices in memcached can work in some situations, but I rejected it as a general solution because you don't have absolute control over when it might be LRU'd out of the cache (even with no timeout). Instead, you have to trust that you've thrown enough resources into Memcached to never approach that scenario. That's why my #2 opted for a more durable store for the address registry.

12/31/07 @ 10:03
Comment from: fumanchu [Member] Email


The last line is of course just a running joke that someone somewhere in the company isn't going to like whatever decision you make. My argument should have been more fully stated. #6 makes caching more performant at the cost of making database access less so. If you cache the version number itself you might as well not use it, so you need to hit the DB every time, but you're usually in the situation of adding caching because your database is already overworked. It also means more coordination between all layers of the stack, since you have to pass the version around; this is especially true if you've physically separated the app and DAL layers, as we have.

So, in some situations I might try it, like I might try any one of them depending on the details of the operations involved.

12/31/07 @ 10:17
Comment from: fumanchu [Member] Email
  1. Put a shim in front of each memcached server that allows you to set tags for each key, and then delete all seen keys which share a given tag.
12/31/07 @ 17:31
Comment from: Alex Rickabaugh [Visitor]

the way we handle something like this is to have a group key which contains a randomly generated prefix. A lookup becomes:

prefix = cache_get('cart_' + cart_id);
fullkey = prefix + '' + limit + '' + offset;
value = cache_get(fullkey);

of course, if the group key is invalid, all of the subkeys that depend on the group key lookup are implicitly invalid, since they are effectively lost. so invalidating the entire cached cart is as simple as:

cache_del('cart_' + cart_id);

the lost subkeys will eventually expire, but in the meantime, the next time we query the cart (and find that the group key is invalid), we'll generate a new prefix (something like md5(rand()) works well) and save it to the group key.

02/22/08 @ 10:32
Comment from: fumanchu [Member] Email

Alex, that sounds like a very nice solution indeed! The only thing I might change would be, instead of calling:

cache_del('cart' + cart_id)
....time passes
prefix = cache_get('cart' + cart_id)
if prefix is None:
cache_set('cart' + cart_id, newprefix())

I would probably just call cache_set with a new prefix right away. That would cut down on stampeding and out-of-sync prefixes if multiple consumers are using the same cache farm.

02/28/08 @ 16:28
Comment from: James Arthur [Visitor] · http://www.thruflo.com

Sometimes reading to the bottom of the comment thread really pays off.

Alex's solution: so simple, so clever. Nice!

07/15/09 @ 10:18

Leave a comment

Your email address will not be revealed on this site.

Your URL will be displayed.

Please enter the phrase "I am a real human." in the textbox above.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
August 2020
Sun Mon Tue Wed Thu Fri Sat
 << <   > >>
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31          


The requested Blog doesn't exist any more!

XML Feeds

blog engine