Streamy Development Blog

CAP Theorem


If you’re talking or thinking about distributed data systems these days, you are almost certain to come across some discussion of the CAP theorem.  This is one of those beautifully simple ideas that helps explain something extraordinarily complex.

So, what does it mean?

CAP stands for Consistency, Availability, and Partition tolerance.  The theorem states that a shared-data system can guarantee at most two of these three properties at the same time.

Consistency

Consistency describes how and whether a system is left in a consistent state after an operation. In a distributed data system, this usually means that once a writer has written, all readers will see that write.

A distributed data system is either strongly consistent or has some form of weak consistency.  The best-known example of strong consistency in databases is ACID (Atomicity, Consistency, Isolation, Durability), used by most relational databases.  At the other end of the spectrum is BASE (Basically Available, Soft state, Eventual consistency).

Most often, weak consistency comes in the form of eventual consistency, which means that, in the absence of new updates, the database eventually reaches a consistent state.  Weakly consistent systems are usually ones where data is replicated: the latest version of an item sits on some node in the cluster while older versions are still out there on other nodes, but eventually all nodes will see the latest version.
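To make “eventually” concrete, here is a minimal sketch (hypothetical code, not any particular system) of two replicas converging through last-write-wins anti-entropy.  Real systems typically use vector clocks or Merkle trees rather than wall-clock timestamps, but the shape of the guarantee is the same:

    import time

    class Replica:
        """A toy replica that stores a (timestamp, value) pair per key."""
        def __init__(self, name):
            self.name = name
            self.store = {}  # key -> (timestamp, value)

        def write(self, key, value):
            # Wall-clock last-write-wins is fragile in practice; it keeps the sketch short.
            self.store[key] = (time.time(), value)

        def read(self, key):
            entry = self.store.get(key)
            return entry[1] if entry else None

        def sync_from(self, other):
            # Anti-entropy: adopt any version the peer has that is newer than ours.
            for key, (ts, value) in other.store.items():
                if key not in self.store or self.store[key][0] < ts:
                    self.store[key] = (ts, value)

    a, b = Replica("a"), Replica("b")
    a.write("user:1", "new-email@example.com")
    print(b.read("user:1"))  # None: b has not seen the write yet
    b.sync_from(a)           # replication eventually propagates it
    print(b.read("user:1"))  # new-email@example.com: replicas have converged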

If you are interested in learning more, Werner Vogels, CTO of Amazon, has two good posts on eventual consistency on his blog, All Things Distributed.

Availability

High availability refers to designing and implementing a system so that it is guaranteed to remain operational over some period of time.

In this context, it generally means a system that tolerates node failures and remains available during software and hardware upgrades.  This is perhaps the simplest property to understand and the most commonly desired, yet it can be quite difficult to achieve with any certainty.
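At the client, availability usually reduces to having more than one node able to serve any given request.  A toy failover loop (purely illustrative, not how any particular system does it):

    import random

    class Node:
        """Toy storage node that can be down for a fault or a rolling upgrade."""
        def __init__(self, data):
            self.data, self.up = data, True

        def get(self, key):
            if not self.up:
                raise ConnectionError("node unreachable")
            return self.data[key]

    def read_with_failover(nodes, key):
        # The service stays available as long as at least one replica is up.
        for node in random.sample(nodes, len(nodes)):
            try:
                return node.get(key)
            except ConnectionError:
                continue
        raise RuntimeError("no replica available")

    nodes = [Node({"k": "v"}) for _ in range(3)]
    nodes[0].up = False                    # take a node down for an upgrade
    print(read_with_failover(nodes, "k"))  # still answers: v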

Partition Tolerance

Partition tolerance refers to the ability of a system to continue operating in the presence of network partitions.  For example, if I have a database running on 80 nodes across 2 racks and the interconnect between the racks is lost, my database is now partitioned.  If the system is tolerant of this, the database will still be able to perform read and write operations while partitioned.  If not, the cluster often becomes completely unusable or read-only.
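A majority-quorum write illustrates the trade-off directly (a toy sketch, not any particular system’s protocol): once the racks are partitioned, only the side holding a majority of the replicas can keep accepting writes; the minority side must refuse them to preserve consistency:

    def quorum_write(reachable, total, key, value):
        # A write succeeds only if it reaches a majority of all replicas,
        # which guarantees any later majority read overlaps it.
        quorum = total // 2 + 1
        if len(reachable) < quorum:
            raise RuntimeError("partitioned: cannot reach a write quorum")
        for replica in reachable:
            replica[key] = value

    replicas = [dict() for _ in range(5)]
    rack_a, rack_b = replicas[:3], replicas[3:]  # interconnect between racks is lost

    quorum_write(rack_a, 5, "k", "v1")           # 3 of 5 reachable: succeeds
    try:
        quorum_write(rack_b, 5, "k", "v2")       # 2 of 5 reachable: refused
    except RuntimeError as e:
        print(e)                                 # partitioned: cannot reach a write quorum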

Who came up with it?

In July 2000, Dr. Eric Brewer of UC Berkeley gave a keynote talk, Towards Robust Distributed Systems, at the ACM Symposium on Principles of Distributed Computing (PODC).

In it, he talks about the many trade-offs between ACID and BASE systems.  He explains that the choice should be thought of not as one or the other, but rather as a continuous spectrum.  As a useful principle, he then introduced the CAP Theorem, at the time still a conjecture; it was formally proven by Seth Gilbert and Nancy Lynch in 2002.

Why now?

Distributed data systems are increasingly becoming a hot area of research and development.  Before the Internet and the web, there were not many companies dealing with terabyte or petabyte datasets.  With the explosion of content and information from websites, blogs, and social networks, more and more businesses are now trying to store, analyze, and serve massive amounts of data.  And they need to be able to perform massive batch operations on it while also serving it up to clients in near real-time.

These companies each have their own requirements: performance, reliability, durability; ACID, BASE, or somewhere in between.

Real World Example

In November 2006, Google released a paper, BigTable: A Distributed Storage System for Structured Data, describing a distributed, column-oriented database that sits on top of the distributed Google File System.

In October 2007, Amazon released their own paper, Dynamo: Amazon’s Highly Available Key-value Store, describing a distributed key-value database designed and in use at Amazon.

What makes these two systems a great example is that both are modern designs and implementations of distributed, shared-data systems, but with two different philosophies regarding CAP.

BigTable is a CA system; it is strongly consistent and highly available, but can become unavailable under network partitions.  BigTable has no replication at the database level; rather, replication is handled underneath it by GFS.

Dynamo is an AP system; it is highly available, even under network partitions, but only eventually consistent.  Data is replicated within a single cluster, so even under partitions most data remains available; however, one node’s latest version might not match another’s, so a reader is only guaranteed to see a given write eventually.
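The Dynamo paper describes this trade-off as tunable: each key is stored on N replicas, a write waits for W acknowledgements, and a read consults R copies.  A quick check of the arithmetic (toy code, not Dynamo’s):

    # With N replicas, a read of R copies is guaranteed to overlap a write
    # acknowledged by W copies if and only if R + W > N.
    def read_overlaps_write(n, r, w):
        return r + w > n

    print(read_overlaps_write(3, 2, 2))  # True: quorums overlap, reads see the write
    print(read_overlaps_write(3, 1, 1))  # False: fast, but reads may return stale data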

CAP at Streamy

First and foremost, Streamy is backed by HBase, an open-source implementation of Google’s BigTable.  As such, the core of our database system is strongly consistent and highly available.  We are not overly concerned with network partitions, as our clusters all sit within a single data center and are connected via local gigabit switches.  Once we expand to additional data centers, we plan to employ inter-cluster replication, with each cluster located in a single DC.  Remote replication will introduce some eventual consistency into the system, but each cluster will remain strongly consistent.

In addition to HBase, we run a number of other data systems responsible for indexing, sorting, merging, aggregating, and joining our data.  Some of these could be considered distributed or replicated, meaning there are multiple instances on multiple nodes that talk to each other.  Since none of them is actually persistent (we do not rely on their state or their ability to save data under faults; that is what HBase is for), we focus most heavily on high availability and on ensuring read-your-writes consistency (a special form of eventual consistency), as sketched below.
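Read-your-writes can be enforced at the client: a session remembers the version of its own last write and refuses any reply from a replica that has not caught up to it.  A hypothetical sketch (names and structure ours, not our production code):

    class Replica:
        """Toy replica: key -> (version, value)."""
        def __init__(self):
            self.data = {}

        def get(self, key):
            return self.data.get(key, (0, None))

    class Primary(Replica):
        """Assigns monotonically increasing versions to writes."""
        def __init__(self):
            super().__init__()
            self.clock = 0

        def write(self, key, value):
            self.clock += 1
            self.data[key] = (self.clock, value)
            return self.clock

    class Session:
        """Remembers this client's own writes and rejects stale reads."""
        def __init__(self, primary, replicas):
            self.primary, self.replicas = primary, replicas
            self.last_written = {}  # key -> version this session wrote

        def write(self, key, value):
            self.last_written[key] = self.primary.write(key, value)

        def read(self, key):
            floor = self.last_written.get(key, 0)
            for node in self.replicas + [self.primary]:
                version, value = node.get(key)
                if version >= floor:  # fresh enough for this session
                    return value

    primary, replica = Primary(), Replica()
    session = Session(primary, [replica])
    session.write("status", "posted")
    print(session.read("status"))  # "posted": the stale replica is skipped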

Without going into more detail, it can be said that we employ both CA and AP systems here at Streamy.  The focus is not on fundamentally which is “better” but rather what the requirements are for that particular application and what the expectation is for our users.  The most important thing to avoid is a user who performs an action but then is unable to see that action immediately, which is why we often enforce a read-your-write consistency when we do need to relax our constraints.