I've been reading up on distributed systems again. For quite a while, my monthly copy of CACM has been my only connection to computer science topics. This time, I followed a few references and came across interesting concepts (most of them familiar from back in university).
Amazon doesn't just sell books over the internet, they're building fascinating systems, too. And sometimes they write articles and papers about it, like in January's edition of CACM (see Eventually consistent). If you're familiar with Eric Brewer's CAP theorem, you know that in a distributed system, you can decide for only two design goals out of consistency, availability and partition resilience. In traditional systems, consistency is highly valued, but if you're willing to trade in consistency for availability, you can build high-performance systems.
One of these systems is Dynamo, a highly available, distributed key-value store that is part of Amazon's internal infrastructure. Dynamo is write-optimized, writes are almost always possible, even if some replicas aren't updated. Inconsistencies are resolved during read, typically by business logic inside the application itself.
Dynamo uses Consistent Hashing, which is used in distributed hash tables (DHTs) like Chord. This way, nodes can join and leave the network without having to remap the whole key space. As a friend succinctly put it "Yes, in a non-hostile environment, DHTs really work. They're not only useful for distributing cam-rips on the Internet". If you're ever going to build a cache consisting of more than one caching server, Consistent Hashing is definitely something to check out.
Damn, I miss building stuff like this! It's a shame that only large operations really need this kind of infrastructure.