Eventually Consistent Databases: State of the Art

Introduction

Eventual consistency [1] is a consistency model, which is used in many large distributed databases. Such databases require that all changes to a replicated piece of data eventually reach all affected replicas. Furthermore, the conflict resolution is not handled in these databases, and the responsibility is pushed up to the application authors in the event of conflicting updates. Eventual consistency is a specific form of weak consistency: the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value [1]. If no failures occur, the maximum size of the inconsistency window can be determined based on the factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme. We earlier in https://blog.mariadb.org/mariadb-eventually-consistent/ studied is the MariaDB eventually consistent and found out that in the most of the configurations it can be.

DEFINITION: Eventual consistency.

Eventual delivery: An update executed at one node eventually executes at all nodes.
Termination: All update executions terminate.
Convergence: Nodes that have executed the same updates eventually reach an equivalent state (and stay).

EXAMPLE: Consider a case where data item R=0 on all three nodes. Assume that we have the following sequence of writes and commits: W(R=3) C W(R=5) C W(R=7) C in node 0. Now read on node 1 could return R=5 and read from node 2 could return R=7. This is eventually consistent as long as eventually read from all nodes return the same value. Note that this final value could be R=5. Eventual consistency does not restrict the order in which the writes must be executed.

In this blog we shortly introduce database management systems using eventual consistency and we evaluate reviewed databases based on popularity, maturity, consistency and use cases. Based on this we present advantages and disadvantages of eventual consistency. A longer and more thorough version of this research has been published on [13].

Databases using eventual consistency

MongoDB [2] is a cross-platform document-oriented NoSQL database system, and uses BSON to store data as its data model. MongoDB is free and open source software, and has official drivers for a variety of popular programming languages and development environments. Web programming language Opa also has built- in support for MongoDB, and offers a type-safety layer on top of MongoDB. There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.

CouchDB [3] is an open source NoSQL database, and uses JSON as its data mdoel, JavaScript as its query language and HTTP as API. CouchDB was first released in 2005 and later became an Apache project in 2008. One of CouchDB’s distinguished features is multi-master replication.

Amazon SimpleDB [4] is a distributed database written in Erlang by Amazon.com. It is used as a web service with Amazon Elastic Compute Cloud (EC2) and Amazon S3, and is part of Amazon Web Services. It was announced on December 13, 2007.

Amazon DynamoDB [5,6] is a fully managed proprietary NoSQL database service that is offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB uses a similar data model as Dynamo, and derives its name also from Dynamo, but has a different underlying implementation: DynamoDB has a single master design. DynamoDB was announced by Amazon CTO Werner Vogels on January 18, 2012.

Riak [7] is an open-source, fault-tolerant key-value NoSQL database, and implements the principles from Amazon’s Dynamo. Riak uses the consistent hashing to distribute data across nodes, and buckets to store data.

DeeDS [8] is a prototype of a distributed, active real-time database system. It aims to provide a data storage for real-time applications, which may have hard or firm real-time requirements. As database, DeeDS uses OBST (Object Management system of STONE) and TDBM (DBM with transactions), which replaces the OBST storage manager. One main reason for introducing TDBM is to add support of nested transaction into DeeDS. TDBM is a transaction processing data store with a layered architecture, and provides DeeDS with:

Nested Transactions
Volatile and persistent databases
Support for very large data items

ZATARA database [9] is a distributed database engine that features an abstract query interface and plug-in-able internal data structures. ZATARA is designed for the framework, where it is flexible enough to be used by any software application, and guarantees data integrity and achieves high performance and scalability.

Both DeeDS and ZATARA are the result from research projects and not yet mature enough for production usage.

We use the following criteria to evaluate the database systems that support the eventual consistency:

Popularity
Maturity
Consistency
Use cases

Popularity

We evaluate the popularity of the presented database systems based on DB Engines ranking (http://db-engines.com/en/ranking). The DB Engines Ranking ranks database management systems according to their popularity. At the beginning of 2014, MongoDB was ranked 7th with a score of 96.1. In June 2014, it is ranked 5th with the score 231.44. At the beginning of 2014, CouchDB was ranked 16th. In June 2014,CouchDB is ranked 21st with the score 22.78. At the beginning of 2014, Riak was ranked 27th. In June 2014, Riak is ranked 30 th with the score 10.82. At the beginning of 2014, DynamoDB was ranked 35th with a score of 7.20. In June 2014 DynamoDB is ranked 32nd with the score 9.58. At the beginning of 2014, SimpleDB was ranked 46th. In June 2014 SimpleDB, is ranked 52th with the score 2.94.

According to this ranking, MongoDB is clearly the most popular and widely known database system supporting the eventual consistency. As a reference MySQL is ranked 2nd on June 2014 and MariaDB is ranked 28th.

Maturity

Based on the authors’ research, MongoDB is clearly the most mature database system using eventual consistency. It has a large user and customer base and is actively developed. MongoDB has official drivers for several popular programming languages and development environments. There are also a huge number of unofficial or community-supported drivers for other programming languages and frameworks.

Riak is available for free under the Apache 2 License. In addition, Riak uses Basho Technologies to offer commercial licenses with subscription support and the ability for MDC (Multi Data Center) Replication. Riak has official drivers for Ruby, Java, Erlang, Python, PHP, and C/C++. There are also many community-supported drivers for other programming languages and frameworks.

CouchDB is a NoSQL database. CouchDB uses JSON to store data, supports MapReduce query functions in JavaScript and Erlang. CouchDB was first released in 2005 and became an Apache project in 2008. The replication and synchronization features of CouchDB make it ideal for mobile devices, where network connection is not guaranteed but the application must keep on working offline. CouchDB is also suited for applications with accumulating, occasionally changing data, on which pre-defined queries are to be run and where versioning is important (CRM, CMS systems, for example). The master-master replication is an especially interesting feature of CouchDB, which allows easy multi-site deployments. CouchDB is clearly a mature system and used in production environments.

DynamoDB is a commercially managed NoSQL database service, offered by Amazon.com as part of the Amazon Web Services portfolio. There is also a local development version of DynamoDB, with which developers can test DynamoDB-backed applications locally. The programming languages with DynamoDB binding include Java, Node.js, .NET, Perl, PHP, Python, Ruby, and Erlang. Therefore, DynamoDB is a mature and production-quality service.

Amazon SimpleDB is on the Beta phase and thus we do not suggest its use in production. ZATARA and DeeDS are in the research phase and there are no publicly available systems for testing. Therefore, they are at most in the Alpha phase and wedo not recommend their use in production as well.

Consistency

From earlier research, we know that Amazon SimpleDB’s inconsistency window for eventually consistent reads was almost always less than 500ms [10], while another study found that Amazon S3’s inconsistency window lasted up to 12 seconds [10]. However, to the author’s knowledge, there is not a widely known and accepted workload for the databases using eventual consistency. Therefore, the comparison of consistency or inconsistency must be based solely on system features. From a point of view of consistency, Riak offers the most configurable consistency feature, which allows selecting the consistency level.

MongoDB, SimpleDB and DynamoDB offer the possibility to read the latest version of the data time, thus providing strong consistency as well as eventual consistency. All other systems reviewed offer only eventual consistency, and may return an old version of the data when performing read operations.

Use Cases

MongoDB has been successfully used on operational intelligence, especially on storing log data, creating pre-aggregated reports and in hierarchical aggregation. Furthermore, MongoDB has been used on product management system to store product catalogs, manage inventory and category hierarchy. In content management systems, MongoDB is used to store metadata, asset management and store user comments on content, like blog posts and media.

Riak has been successfully used on simple high read-write applications for session storage, serving advertisements, storing log data and sensor data. Furthermore, Riak has been used in content management and social applications for storing user accounts, user settings and preferences, user events and timelines, and articles and blog posts.

The replication and synchronization capabilities of CouchDB are well suited in mobile environment, where network connection is not guaranteed, but the application must keep on working offline. CouchDB is also ideal for the applications with accumulating, occasionally changing data, on which pre-defined queries are to be run, and where versioning is important. CRM, CMS systems are the examples of such applications. CouchDB has an especially interesting feature: master-master replication, which allows easy multi-site deployments.

SimpleDB is well suited for logging, online games, and metadata indexing. However, one cannot use SimpleDB for aggregate reporting: there are no aggregate functions such as SUM, AVERAGE, MIN, etc. in SimpleDB. Metadata indexing is a very good use case for SimpleDB. One can also have data stored in S3 and use SimpleDB domains to store pointers to S3 objects with more information about them. Another class of applications, for which SimpleDB is ideal, is sharing information between isolated components of an application. SimpleDB also provides a way to share indexed information, i.e., the information that can be searched. A SimpleDB item is limited in size, but one can use S3 for storing bigger objects, such as images and videos, and point to them from SimpleDB. This could be called the metadata indexing.

Advantages and Disadvantages

Advantages

Eventual consistency is easy to achieve and provides some consistency for the clients[11]. Building an eventually consistent database has two advantages over building a strongly-consistent database: (1) It is much easier to build a system with weaker guarantees, and (2) database servers separated from the larger database cluster by a network partition can still accept writes from applications. Unsurprisingly, the second justification is the one given by the creators of the first generation NoSQL systems that adopted eventual consistency.

Eventual consistency is often strongly consistent. Several recent projects have verified the consistency of real-world eventually consistent stores [10].

Disadvantages

While eventual consistency is easy to achieve, the current definition is not precise [11]. Firstly, from the current definition, it is not clear what the state of eventually consistent databases is. A database always returning the value 42 is eventually consistent, even if 42 were never written. One possible definition would be that eventually all accesses return the last updated value, and thus the database cannot converge to an arbitrary value[1]. Even this new definition has another problem: what values can be returned before the eventual state of the database is reached? If replicas have not yet converged, what guarantees can be made on the data returned? In this case, the only possible solution would be to return the last known consistent value. The problem here is how to know what version of data item was converged to the same state on all replicas[1].

Eventual consistency requires that writes to one replica will eventually appear at other replicas, and that if all replicas have received the same set of writes, they will have the same values for all data. This weak form of consistency does not restrict the ordering of operations on different keys in any way, thus forcing programmers to reason about all possible orderings and exposing many inconsistencies to users. For example, under eventual consistency, after Alice updates her profile, she might not see that update after a refresh. Or, if Alice and Bob are commenting back-and-forth on a blog post, Carol might see a random non-contiguous subset of that conversation. When an engineer builds an application on an eventually consistent database, the engineer needs to answer several tough questions every time when data is accessed from the database:

What is the effect on the application if a database read returns an arbitrarily old value?
What is the effect on the application if the database sees modification happen in the wrong order?
What is the effect on the application if a client is modifying the database as another tries to read it?
And what is the effect that my database updates have on other clients, which are trying to read the data?

That is a hard list,and developers must work very hard in order to answer these questions. Essentially, an engineer needs to manually do the work to make sure that multiple clients do not introduce inconsistency between nodes.

One way to address these questions at least partly is to use a stronger version of eventual consistency.

DEFINITION: Strong Eventual consistency.

Eventual delivery: An update executed at one node eventually executes at all nodes.
Termination: All update executions terminate.
Strong Convergence: Nodes that have executed the same updates have equivalent state.

To the authors’ knowledge, there is currently no database system that uses strong eventual consistency. This could be because it is harder to implement. Eventual consistency represents a clear weakening of the guarantees that traditional databases provide, and places a requirement for software developers. Designing applications, which maintain correct behavior even if the accuracy of the database cannot be relied on, is hard.

In fact, Google addressed the pain points of eventual consistency in a recent paper on its F1 database [12] and noted: “We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.”

Conclusions

Clearly, there are several very mature and popular database systems using eventual consistency. Most of these are actively developed and there is a strong community behind them. We believe that we will see more database systems in the future using eventual consistency or strong eventual consistency.

References

[1] Vogels, W.: Scalable Web services: Eventually Consistent, ACM Queue, vol. 6, no. 6, pp. 14-16, October 2009.

[2] MongoDB: Agile and Scalable. http://www.mongodb.org/

[3] Anderson, C. J., Lehnardt, J., and Slater, N.: CouchDB: The Definitive Guide. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. January 2010, First Edition

[4] Amazon SimpleDB, http://aws.amazon.com/simpledb/

[5] DynamoDB, http://aws.amazon.com/dynamodb/

[6] Sivasubramanian, S.: Amazon DynamoDB: a seamlessly scalable non-relational database service. In proceeding SIGMOD International Conference on Management of Data, ACM New York, NY, USA, 2012, pp. 729-730.

[7] Riak, http://basho.com/riak/

[8] F. Andler, J. Hansson, J. Mellin, J. Eriksson, and B. Eftring: An overview of the DeeDS real-time database architecture. In Proc. of 6th International Workshop on Parallel and Distributed Real-Time Systems, 1998

[9] Bogdan Carstoiu and Dorin Carstoiu: Zatara, the Plug-in-able Eventually Consistent Distributed Database. AISS, 2(3), 2010.

[10] Bermbach, D. and Tai S: Eventual Consistency: How soon is eventual? Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, pp1-5, 2011.

[11] Bailis, P., and Ghodsi, A: Eventual consistency today: limitations, extensions, and beyond. In communications of the ACM vol. 56, no. 5, pp.55-63, May 2013.

[12] Shute Jeff, Vingralek Radek, Samwel Bart, Handhy Ben, Whipkey Chad, Rollins Eric, Oancea Mircea, Littlefield Kyle, Menestina David, Ellner Stephqan, Cieslewicz John, Rae Ian, Stancescu Traian, Apte Himani: F1: A Distributed SQL Database That Scales, VLDB, 2013.

[13] Mawahib Musa Elbushra, Jan Lindström: Eventual Consistent Databases: State of the Art, OJDB 2014, 1(1), Pages 26-41, 2014, https://www.ronpub.com/journals/ojdb/2014-vol1/issue1/OJDB-v1i1n03_Elbushra.pdf.