Ian Bicking: the old part of his blog

Referential identity (appropriate abstraction)



Artima has an article Inappropriate Abstractions, which is interesting, if perhaps a little brief for such a large topic (the appropriateness of various abstractions).

I noticed this:

On a middle-tier, for example, you quite often don't care about caching, because you're just serving some incoming HTTP request that's immediately going to go away thereafter. So why cache?

[...] Part of the problem with most O/R mappings has been that they immediately took on the problem of caching and referential identity. If you ask for a particular customer and get back a Customer object, the next time you ask for that customer you get back exactly the same object. Well that's a tough problem. It requires a gigantic hash table that contains everything you've ever seen.

Of course, I noticed this because SQLObject does this sort of caching, and ensures referential identity. It wasn't very hard to implement, though there were some details that weren't fully worked out until recently (expiring cached values, for instance -- which still probably confuses people because there are several levels of cache).

Caching is important, and is one of the great things about an ORM, but there's also a lot of diversity in what caching needs people have. Just documenting the options to satisfy everyone is a pain, not to mention implementation. But referential identity was, to me, just a useful way of handling the implementation.

It costs very little, and it doesn't even require caching. Here's how you can do it yourself (untested):

import threading
import weakref

class Cacher:
    def __init__(self):
        self.cache = {} # dict of weakrefs
        self.lock = threading.Lock()

    def get(self, id):
        "Return the object, or None (leaving the lock held)"
        val = None
        try:
            # Fast path: a successful hit needs no lock at all
            val = self.cache[id]()
            if val is not None: return val
        except KeyError:
            pass
        # Miss (or expired weakref): retry with the lock held
        self.lock.acquire()
        try:
            val = self.cache[id]()
        except KeyError:
            # Not there; the caller must create it, then call
            # put() and finishPut() to release the lock
            return None
        if val is None:
            # expired weakref
            del self.cache[id]
            return None
        # otherwise we found one:
        self.lock.release()
        return val

    def put(self, id, obj):
        self.cache[id] = weakref.ref(obj)

    def finishPut(self):
        self.lock.release()

to be used like:

cache = Cacher()
obj = cache.get(1)
if obj is None:
    try:
        obj = NewObject(1)
        cache.put(1, obj)
    finally:
        cache.finishPut()

Like I said, it's untested, but something more complete is in SQLObject.Cache.

This is thread-safe code, which adds some peculiarities. First, we try to get the object without acquiring the lock -- this way most successful cache hits don't require locks at all. Then we retry with the lock held, just in case someone added something to the cache during that instant between val = self.cache[id]() and self.lock.acquire().

Also, we have a pretty important contract with the caller: if they get None from .get(), they must call .finishPut() promptly. try:finally: is essential with locks, and here particularly. I do it this way because it may not be feasible for the cache to create objects itself. We have to hold the lock during object creation, because someone else might be asking for the very object we're creating at the same instant (pretty good chance of that, actually). If we were cleverer, maybe we could allow cache retrieval for other IDs to continue in spite of this lock (well, we do, so long as the cache hits are successful).

This way, for any ID, only a single object will exist, which in the right application context (typically a single-process, multi-threaded context) can be very useful. Using weak references we aren't actually caching anything -- we are only keeping track of what already exists. The SQLObject implementation allows for caching as well, by keeping both a normal dictionary and a dictionary of weak references.
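To see what "only keeping track of what already exists" means in practice, here's a minimal sketch of the weakref behavior on its own (the Customer class and registry dict are just illustrative stand-ins, and immediate expiry assumes CPython, where reference counting reclaims objects as soon as the last strong reference goes away):

```python
import weakref

class Customer:
    """Illustrative stand-in for an ORM row object."""
    def __init__(self, id):
        self.id = id

registry = {}  # id -> weakref, like the Cacher's internal dict

obj = Customer(1)
registry[1] = weakref.ref(obj)

# While someone holds a strong reference, dereferencing the
# weakref always yields that exact same object -- referential
# identity for free, no real caching involved:
assert registry[1]() is obj

# Drop the last strong reference and the weakref goes dead;
# the "cache" never kept the object alive:
del obj
assert registry[1]() is None
```

So the dictionary only answers "does this object already exist somewhere?" -- it never extends an object's lifetime, which is exactly why SQLObject pairs it with a normal dictionary when actual caching is wanted.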

Created 15 Dec '03
Modified 25 Jan '05