One of the saving graces of Memcached is its use of a stable, static address space; that is, client X and client Y can each manipulate datum D as long as they both know its key. But because the space of addresses is static and flat (not hierarchical), it also tends to be huge, and sparse. This can make it difficult to perform set operations, such as invalidation of a class of entries.
For example, let's design a Data Access Layer (DAL) component which sits in front of a database and accepts requests using an RPC-style interface. It fetches results from the cache where possible; otherwise, it reads from the database and writes the result to the cache before returning it to the caller. Assume we have multiple DAL servers, multiple Memcached servers, and one or more database servers.
A common set-invalidation scenario involves the caching of lists of items. Let's suppose a web server requests get_items_in_cart(cart_id=13, limit=10, offset=30). The DAL server might translate this into a cache key such as get_items_in_cart::13:10:30. So far so good. But that's just a read; when we add writes to the picture, cache coherency starts to become a problem.
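The read path described above might be sketched like this (an illustrative sketch only: a plain dict stands in for the Memcached client, and db_fetch_items is a placeholder for the real database query):

```python
# Read-through caching sketch; names and the dict-as-cache are illustrative.
cache = {}

def db_fetch_items(cart_id, limit, offset):
    # Placeholder for the real database query.
    return ["item-%d" % i for i in range(offset, offset + limit)]

def get_items_in_cart(cart_id, limit, offset):
    key = "get_items_in_cart::%d:%d:%d" % (cart_id, limit, offset)
    if key in cache:            # cache hit: skip the database entirely
        return cache[key]
    result = db_fetch_items(cart_id, limit, offset)
    cache[key] = result         # populate the cache on a miss
    return result
```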
When a web server asks a DAL component to add an item to cart 13, we need to invalidate (or recalculate) get_items_in_cart::13:10:30. However, it should be readily apparent that we need to invalidate not just a single key, but potentially a whole class of keys, get_items_in_cart::id:limit:offset, and that this set of keys could be very large. Let's conservatively guess:
>>> max_items = 1000
>>> sum([(max_items / n) for n in range(5, 21)])
1507
That is, if we restrict the 'limit' argument to the interval [5, 20], and assume a maximum of 1000 items per cart, we end up with over 1500 potential cache keys per cart for a single function! And if we have a million carts...? There are several ways we could attack this problem:
1. Shrink the address space. Instead of exposing get_items_in_cart(cart_id, limit, offset), which has 3 arguments, we might instead always assume a page size of 10 and expose get_paged_items_in_cart(cart_id, page), which would result in only 100 potential cache keys per cart, not 1500. We might even go further and expose get_items_in_cart(cart_id) (1 cache key per cart) and let the caller do its own paging. This makes cache invalidation more performant (fewer keys to manage) at the cost of API flexibility and extensibility. So your programmers will scream, possibly silently, when the product team decides they want to change the number of items on a page.
2. Keep a central registry of keys. Whenever a result of get_items_in_cart(cart_id, limit, offset) is cached, the address is registered in a central list. When a class of keys needs to be invalidated, that central registry can then iterate over all seen keys for the class quickly and stably (or return the list so the caller can do it). This costs performance through the extra network traffic, is not very scalable, and tacks on a new reliability issue (the registry is a single point of failure, and making it redundant costs even more performance). Also, your system architects will probably leave you threatening notes.
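The registry bookkeeping might look like the following (a sketch under stated assumptions: the registry is an in-process dict keyed by function name here, whereas in practice it would live in a separate, durable service):

```python
cache = {}
registry = {}  # class name -> set of cache keys seen for that class

def cache_put(cls, key, value):
    cache[key] = value
    registry.setdefault(cls, set()).add(key)   # register on every cache write

def invalidate_class(cls):
    for key in registry.pop(cls, set()):       # iterate over all seen keys
        cache.pop(key, None)
```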
3. Hash keys into a small number of buckets. Instead of get_items_in_cart::13:10:30, we might hash it to get_items_in_cart::13/x, where 'x' is a number from 1 to 4. If we could come up with a reliable hash, we could then invalidate 4 keys, not 1500. This approach can become very complex, and results in cache keys which aren't very transparent (so getting other components to interact with the system is harder). But it doesn't require any extra network calls or new components; each DAL server does its own hash calculations. Your VP of Engineering may stick you with the bill for all those new engineers with Math Ph.D.'s, though.
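One way the bucketing could be done (a sketch; the bucket count of 4 comes from the example above, but hashing on (limit, offset) and storing a small per-bucket dict are my assumptions, since the post leaves the hash function open):

```python
NUM_BUCKETS = 4
cache = {}

def bucket_key(cart_id, limit, offset):
    # Python's hash() of an int tuple is deterministic, so every DAL
    # server computes the same bucket with no extra network calls.
    return "get_items_in_cart::%d/%d" % (cart_id, hash((limit, offset)) % NUM_BUCKETS)

def cache_set(cart_id, limit, offset, value):
    bucket = cache.setdefault(bucket_key(cart_id, limit, offset), {})
    bucket[(limit, offset)] = value

def cache_get(cart_id, limit, offset):
    bucket = cache.get(bucket_key(cart_id, limit, offset), {})
    return bucket.get((limit, offset))

def invalidate_cart(cart_id):
    for b in range(NUM_BUCKETS):   # 4 deletes per cart, not 1500
        cache.pop("get_items_in_cart::%d/%d" % (cart_id, b), None)
```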
4. Rely on timeouts. Cache get_items_in_cart::13:10:30 for, say, 2 minutes: tell Memcached to expire the entry after 2 minutes, and don't bother trying to invalidate it on write. This is by far the simplest solution when you can swing it. But your system operators or QA team might call you first when they have to wait for one too many timeouts during a 3 A.M. debugging session.
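Memcached handles expiry server-side (via the expiration time passed with each set), but the semantics are roughly these (an illustrative sketch, not a real client):

```python
import time

cache = {}  # key -> (value, expires_at)

def cache_set(key, value, ttl=120):
    cache[key] = (value, time.time() + ttl)

def cache_get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:   # a stale entry behaves like a miss
        del cache[key]
        return None
    return value
```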
6. Use version numbers. Add a version column to the carts table, and use it when forming cache keys. When you update the row, update the version. This can increase the number of database hits if not done properly, but in this example, your callers most likely have already fetched the entire relevant carts row, and might be able to pass the version in the DAL call. However, don't be surprised when your DBA's tell you the HR DB crashed and they lost your timesheets for the past year.
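Folding the version into the key might look like this (a sketch; the carts dict stands in for the database table, and passing the version in from the caller is assumed):

```python
carts = {13: {"version": 7, "items": ["a", "b"]}}  # stand-in for the carts table

def cache_key(cart_id, version, limit, offset):
    return "get_items_in_cart::%d:v%d:%d:%d" % (cart_id, version, limit, offset)

def add_item_to_cart(cart_id, item):
    row = carts[cart_id]
    row["items"].append(item)
    row["version"] += 1   # old keys now simply never match; no explicit deletes
```

Invalidation is implicit: stale entries are never read again and age out of the cache on their own.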
That's all I can think of at the moment. I think I've got some political goodwill among our other programmers (in that they don't wish me any specific harm, yet), so I may go with solution #1 for now on my current project. But I've also recently been playing architect, so maybe I'll pick solution #2 and just throw away any threatening notes I leave for myself.
Cromulent Simpsons reference.
Why not use a variant of #2 where the address registry is memcache? Just have a key 'get_items_in_cart' whose value is a pickled tuple of the specific cached instances of the function:
(get_items_in_cart::13:10:30, get_items_in_cart::5:5:20, ...)
Then when you get a cache miss, you just need to append the new key to the key list as well as cache the actual data. A cache reset just iterates through the keylist and invalidates each key, and then truncates the keylist.
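Under the commenter's assumptions (here a dict stands in for memcached, and a plain list in place of the pickled tuple), the bookkeeping might look like:

```python
cache = {}
KEYLIST = "get_items_in_cart"   # the registry key proposed above

def cache_result(key, value):
    cache[key] = value
    keylist = cache.get(KEYLIST, [])
    if key not in keylist:
        keylist.append(key)      # append the new key on a cache miss
    cache[KEYLIST] = keylist

def reset_class():
    for key in cache.get(KEYLIST, []):
        cache.pop(key, None)     # invalidate each registered key
    cache[KEYLIST] = []          # then truncate the keylist
```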
Option 6 looks pretty good to me - I don't understand your argument against using it.
Thanks for mentioning that. Storing indices in memcached can work in some situations, but I rejected it as a general solution because you don't have absolute control over when it might be LRU'd out of the cache (even with no timeout). Instead, you have to trust that you've thrown enough resources into Memcached to never approach that scenario. That's why my #2 opted for a more durable store for the address registry.
The last line is of course just a running joke that someone somewhere in the company isn't going to like whatever decision you make. My argument should have been more fully stated. #6 makes caching more performant at the cost of making database access less so. If you cache the version number itself you might as well not use it, so you need to hit the DB every time, but you're usually in the situation of adding caching because your database is already overworked. It also means more coordination between all layers of the stack, since you have to pass the version around; this is especially true if you've physically separated the app and DAL layers, as we have.
So, in some situations I might try it, like I might try any one of them depending on the details of the operations involved.
the way we handle something like this is to have a group key which contains a randomly generated prefix. A lookup becomes:
prefix = cache_get('cart_' + cart_id);
fullkey = prefix + '_' + limit + '_' + offset;
value = cache_get(fullkey);
of course, if the group key is invalid, all of the subkeys that depend on the group key lookup are implicitly invalid, since they are effectively lost. so invalidating the entire cached cart is as simple as:
cache_del('cart_' + cart_id);
the lost subkeys will eventually expire, but in the meantime, the next time we query the cart (and find that the group key is invalid), we'll generate a new prefix (something like md5(rand()) works well) and save it to the group key.
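Put together, the whole scheme is only a few lines (a runnable sketch of the approach above; a dict stands in for memcached, and uuid4 plays the role of md5(rand())):

```python
import uuid

cache = {}

def group_prefix(cart_id):
    group_key = 'cart_' + str(cart_id)
    prefix = cache.get(group_key)
    if prefix is None:                  # group key missing or invalidated:
        prefix = uuid.uuid4().hex       # mint a fresh namespace
        cache[group_key] = prefix
    return prefix

def fullkey(cart_id, limit, offset):
    return group_prefix(cart_id) + '_' + str(limit) + '_' + str(offset)

def cache_store(cart_id, limit, offset, value):
    cache[fullkey(cart_id, limit, offset)] = value

def cache_lookup(cart_id, limit, offset):
    return cache.get(fullkey(cart_id, limit, offset))

def invalidate_cart(cart_id):
    cache.pop('cart_' + str(cart_id), None)   # orphans every subkey at once
```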
Alex, that sounds like a very nice solution indeed! The only thing I might change would be, instead of calling:
cache_del('cart_' + cart_id)
and then, on the next lookup,
prefix = cache_get('cart_' + cart_id)
if prefix is None:
    cache_set('cart_' + cart_id, newprefix())
I would probably just call cache_set with a new prefix right away. That would cut down on stampeding and out-of-sync prefixes if multiple consumers are using the same cache farm.
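That variant is a one-liner (same assumptions as before: a dict in place of memcached, uuid4 for the random prefix):

```python
import uuid

cache = {'cart_13': 'oldprefix'}

def invalidate_cart(cart_id):
    # Overwrite with a fresh prefix immediately, so concurrent readers
    # never race to regenerate it (no stampede, no out-of-sync prefixes).
    cache['cart_' + str(cart_id)] = uuid.uuid4().hex
```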
Sometimes reading to the bottom of the comment thread really pays off.
Alex's solution: so simple, so clever. Nice!