Serverless Cassandra: AWS Keyspaces and Azure Cosmos DB in Comparison
Apache Cassandra is a fascinating NoSQL database technology for building web-scale and big data applications. Cassandra is an open source, distributed, wide-column database system with no single point of failure. Commercial support for Apache Cassandra is offered by DataStax, and there are now Cassandra-compatible alternatives such as Scylla, which reimplements Cassandra in C++ rather than Java.
With the advent of serverless in the cloud, the Cassandra landscape is becoming more fractured and also more competitive. Serverless “Cassandra” arguably started with AWS DynamoDB, which draws on the principles of Amazon’s earlier Dynamo system; Avinash Lakshman, one of the authors of the Dynamo paper, went on to co-create Cassandra. While DynamoDB is quite different from Cassandra in design and purpose, there is significant feature overlap, and it’s a truly serverless database with on-demand, pay-as-you-go pricing.
This article examines three new “serverless” Cassandra services at a high level, with a strong focus on costs: AWS Keyspaces, the Azure Cosmos DB Cassandra API, and DataStax Astra. While I would not call Astra serverless, it is a no-ops, managed Cassandra service that competes with Keyspaces and Cosmos DB, especially with their provisioned pricing models. So, for comparison’s sake, I include Astra in this article to a very limited extent.
AWS Keyspaces On-Demand
Keyspaces is AWS’s new serverless Cassandra service. This Twitter thread by Rafal Wilinski offers a good comparison between AWS Keyspaces and DynamoDB. This ZDNet article also offers a good overview of Keyspaces after it became generally available in April 2020.
Although AWS does not say so explicitly, Keyspaces appears to be a compatibility layer built on top of DynamoDB, as this Keyspaces doc implies. Additionally, Keyspaces has significant compatibility gaps with Cassandra, as documented in these Keyspaces docs. So you cannot expect the full Cassandra experience from Keyspaces, and its performance characteristics will be closer to DynamoDB’s than to Cassandra’s. Keyspaces may close some of the feature gaps with Cassandra in the future, but its underlying architecture is DynamoDB.
Cassandra is typically used for balanced or write-heavy applications, such as IoT applications with a constant stream of data flowing into the database. Because Keyspaces charges five times as much per write request unit as per read request unit, porting an existing write-heavy Cassandra application to Keyspaces could cost more than you expect.
Let’s start with a basic calculation comparing Keyspaces pricing with DynamoDB’s:
On-Demand Pricing Estimate (DynamoDB vs. Keyspaces)
Because Keyspaces is built on DynamoDB, the pricing model is the same; you are essentially paying a 16% premium for the Cassandra compatibility layer.
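To make the shared pricing model concrete, here is a minimal Python sketch of an on-demand monthly estimate. The DynamoDB rates are assumptions: approximate on-demand list prices when this was written, so check the current AWS pricing pages. The Keyspaces request rates are simply DynamoDB’s multiplied by the 16% premium described above. For both services, storage uses the $0.25 per GB-month figure cited later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDemandPricing:
    """On-demand prices in USD. All values used below are assumptions."""
    write_per_million: float  # per million write request units
    read_per_million: float   # per million read request units
    storage_per_gb_month: float


# Assumed DynamoDB on-demand rates; verify against current AWS pricing.
DYNAMODB = OnDemandPricing(write_per_million=1.25,
                           read_per_million=0.25,
                           storage_per_gb_month=0.25)

# Keyspaces modeled as DynamoDB request pricing plus the ~16% premium.
KEYSPACES_PREMIUM = 1.16
KEYSPACES = OnDemandPricing(
    write_per_million=DYNAMODB.write_per_million * KEYSPACES_PREMIUM,
    read_per_million=DYNAMODB.read_per_million * KEYSPACES_PREMIUM,
    storage_per_gb_month=DYNAMODB.storage_per_gb_month,
)


def monthly_cost(pricing: OnDemandPricing, write_units: float,
                 read_units: float, storage_gb: float) -> float:
    """Monthly bill from request-unit counts and average stored GB."""
    return (write_units / 1e6 * pricing.write_per_million
            + read_units / 1e6 * pricing.read_per_million
            + storage_gb * pricing.storage_per_gb_month)


if __name__ == "__main__":
    # Hypothetical workload: 100M writes, 500M reads, 100 GB stored.
    for name, pricing in (("DynamoDB", DYNAMODB), ("Keyspaces", KEYSPACES)):
        print(name, round(monthly_cost(pricing, 100e6, 500e6, 100), 2))
```

With these assumed rates, the hypothetical workload prints 275.0 for DynamoDB and 315.0 for Keyspaces. Both services keep the same 5:1 write-to-read price ratio, and that ratio is what drives the write-heavy results.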
With these pricing models in place for DynamoDB, Keyspaces, and, for that matter, Cosmos DB, it makes sense to compare two very different types of applications: write-heavy and read-heavy. Each gives a different perspective on whether these services are a good value and whether they make sense for that use case.
Example Pricing Estimate for a Write-Heavy Application (DynamoDB vs. Keyspaces)
For a write-heavy application, such as an IoT application, you might expect these sample numbers: writing 1.6 TB per month and reading 160 GB per month from a 10 TB database.
Example Pricing Estimate for a Very Read-Heavy Application (DynamoDB vs. Keyspaces)
For a read-heavy application, such as a RESTful API, you might expect these sample numbers: reading 1.6 TB per month and writing 16 GB per month to a 1 TB database.
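Converting these monthly volumes into request units is the step that makes write-heavy workloads expensive. The sketch below rests on several assumptions:

- DynamoDB-style unit sizes: one write request unit per row of up to 1 KB, and one strongly consistent read request unit per row of up to 4 KB.
- Decimal units (1 GB = 1,000,000 KB).
- A hypothetical uniform 1 KB row size.
- Assumed Keyspaces rates of $1.45 and $0.29 per million units.

Storage is left out.

```python
import math

WRITE_UNIT_KB = 1  # assumed: one write request unit per row of up to 1 KB
READ_UNIT_KB = 4   # assumed: one read request unit per row of up to 4 KB
WRITE_PRICE_PER_MILLION = 1.45  # assumed Keyspaces on-demand rate, USD
READ_PRICE_PER_MILLION = 0.29   # assumed Keyspaces on-demand rate, USD


def request_units(volume_gb: float, row_kb: float, unit_kb: float) -> float:
    """Units consumed to move volume_gb of data as rows of row_kb each."""
    rows = volume_gb * 1_000_000 / row_kb        # decimal GB -> KB -> rows
    units_per_row = math.ceil(row_kb / unit_kb)  # partial units round up per row
    return rows * units_per_row


def monthly_request_cost(write_gb: float, read_gb: float,
                         row_kb: float = 1.0) -> float:
    """Request charges only; storage is billed separately."""
    wru = request_units(write_gb, row_kb, WRITE_UNIT_KB)
    rru = request_units(read_gb, row_kb, READ_UNIT_KB)
    return (wru / 1e6 * WRITE_PRICE_PER_MILLION
            + rru / 1e6 * READ_PRICE_PER_MILLION)


if __name__ == "__main__":
    # Write-heavy: 1.6 TB written, 160 GB read per month.
    print("write-heavy", round(monthly_request_cost(1_600, 160), 2))
    # Read-heavy: 16 GB written, 1.6 TB read per month.
    print("read-heavy", round(monthly_request_cost(16, 1_600), 2))
```

Under these assumptions, the write-heavy case comes to about $2,366 in request charges and the read-heavy case to about $487, even though each moves 1.6 TB in its dominant direction. With 1 KB rows, each read still consumes a whole read unit, so reads are only 5 times cheaper here. Larger rows would widen the gap in favor of reads.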
When to use Keyspaces On-Demand
With its very low cost for read-heavy applications, Keyspaces opens up new use cases for applications that may prefer Cassandra/Keyspaces as their database. Here is when you may want to use Keyspaces:
- Keep the door open for moving to real Cassandra (avoid vendor lock-in)
- You want to use the Cassandra API, Cassandra Clients, or CQL
- You prefer CQL to PartiQL
- Application benefits from the low-latency performance of DynamoDB
- Application doesn’t require features of Cassandra unsupported by Keyspaces
- Workload is variable and/or unpredictable
At first glance, Keyspaces appears to be more of an alternative to DynamoDB than a true competitor to a full Cassandra service. Yet DynamoDB also offers many advantages over Keyspaces, such as easy cross-region replication, that are out of scope for this article. For now, let’s hold our thoughts on Keyspaces and look at the alternatives.
Azure Cosmos DB Serverless Cassandra API
Azure Cosmos DB is a fully managed NoSQL database service that offers a Cassandra-compatible API. Compared with DynamoDB, Cosmos DB places more emphasis on multi-region replication as a primary feature for building globally scalable systems.
The Cosmos DB Cassandra API offers a similar level of Cassandra compatibility to Keyspaces, with some minor differences between the two.
On-Demand Pricing Estimate (Cosmos DB)
Cosmos DB costs are calculated in request units (RUs), expressed as RU/s for provisioned throughput and billed per RU consumed in serverless mode. In Cosmos DB, a write consumes roughly 5 times as many RUs as a read, the same ratio as DynamoDB/Keyspaces. This immediately implies that Cosmos DB can also be expensive for write-heavy applications, just like Keyspaces. Storage costs $0.25 per GB per month, which also aligns with DynamoDB/Keyspaces.
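The sketch below shows the write penalty in RU terms. It assumes about 1 RU per 1 KB read and, following the 5:1 ratio above, about 5 RUs per 1 KB write. Actual RU charges vary with item size, indexing policy, and consistency level, so treat these constants as placeholders.

```python
READ_RU = 1.0   # assumed RU charge for reading a ~1 KB row
WRITE_RU = 5.0  # assumed RU charge for writing a ~1 KB row (5x a read)


def ru_per_second(reads_per_s: float, writes_per_s: float) -> float:
    """Throughput needed, in RU/s, for a steady mix of reads and writes."""
    return reads_per_s * READ_RU + writes_per_s * WRITE_RU


def write_share(reads_per_s: float, writes_per_s: float) -> float:
    """Fraction of the RU budget spent on writes."""
    total = ru_per_second(reads_per_s, writes_per_s)
    return writes_per_s * WRITE_RU / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical balanced workload: equal numbers of reads and writes.
    print(ru_per_second(100, 100))          # 600.0
    print(round(write_share(100, 100), 3))  # 0.833
```

Even with an even split of operations, writes account for roughly 83% of the RUs in this model.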
Conversion Table for Cosmos DB RU/s
To use the capacity calculator for Cosmos DB, we first need to calculate reads and writes per second. That metric, together with storage requirements, lets you use the estimator to calculate RU/s. The following table, combined with the calculator’s RU/s output, then makes it possible to estimate the cost of Cosmos DB serverless. The capacity calculator doesn’t account for serverless, so that cost has to be calculated separately in the next section of this article.
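This section relies on two conversions, sketched below:

- Turning monthly operation counts into the per-second rates the capacity calculator asks for.
- Turning an average RU/s back into a monthly serverless bill.

The 730-hour month is a common cloud billing convention. The $0.25 per million RUs serverless rate is an assumption to check against Azure’s current pricing. The storage rate is the $0.25 per GB-month cited above.

```python
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600  # 2,628,000 seconds
PRICE_PER_MILLION_RU = 0.25   # assumed serverless rate, USD
STORAGE_PER_GB_MONTH = 0.25   # storage rate cited in the article, USD


def per_second(monthly_ops: float) -> float:
    """Average operations per second for a given monthly total."""
    return monthly_ops / SECONDS_PER_MONTH


def serverless_monthly_cost(avg_ru_per_s: float, storage_gb: float) -> float:
    """Serverless bills RUs consumed, so average RU/s times seconds in a month."""
    consumed_ru = avg_ru_per_s * SECONDS_PER_MONTH
    return (consumed_ru / 1e6 * PRICE_PER_MILLION_RU
            + storage_gb * STORAGE_PER_GB_MONTH)


if __name__ == "__main__":
    # Hypothetical: 1.6 billion writes per month is about 609 writes/s.
    print(round(per_second(1.6e9)))
    # Hypothetical: an average of 1,000 RU/s with 100 GB stored.
    print(round(serverless_monthly_cost(1_000, 100), 2))
```

In this hypothetical, 1,000 RU/s sustained for a month is 2.628 billion RUs. At the assumed rate that comes to $657, plus $25 of storage.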
Example Pricing Estimate for a Write-Heavy Application (Cosmos DB Cassandra Serverless)
Using the RU/s metric, it’s now possible to calculate the cost of serverless Cosmos DB, as shown in the next table. Once again, this corresponds to the write-heavy application, such as IoT, described earlier in this article.
As you can see from this calculation, Cosmos DB may be significantly less expensive than Keyspaces for write-heavy applications, but your mileage will vary depending on the number of reads, writes, and required storage.
Example Pricing Estimate for a Very Read-Heavy Application (Cosmos DB Cassandra Serverless)
When it comes to a very read-heavy application, Keyspaces and Cosmos DB are neck and neck on pricing. Please refer back to the previous section to review the Keyspaces pricing calculation, if necessary.
When to use Cosmos DB Serverless
Here is when you may want to use Cosmos DB Serverless:
- Keep the door open for moving to real Cassandra (avoid vendor lock-in)
- You want to use the Cassandra API, Cassandra Clients, or CQL
- Application benefits from the low-latency performance and/or global scalability offered by Cosmos DB
- Application doesn’t require features of Cassandra unsupported by Cosmos DB
- Workload is variable and/or unpredictable
Cosmos DB Serverless is very competitive with Keyspaces on features and price. They are definitely head-to-head competitors as serverless Cassandra services.
DataStax Astra (Managed Cassandra, Not Serverless)
With Keyspaces and Cosmos DB Serverless, we have established that serverless Cassandra services are much cheaper for read-heavy than for write-heavy applications, because writing data costs 5 times as much as reading it. With provisioned resources for Keyspaces or Cosmos DB, the cost of writes can be reduced, assuming that loads are fairly constant.
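Whether provisioned capacity actually reduces write costs depends on how fully the provisioned throughput is used. The sketch below compares the cost of one million writes under each model, using DynamoDB-style pricing as an assumption to verify:

- $1.25 per million on-demand write units.
- $0.00065 per provisioned write-capacity-unit (WCU) hour, where one WCU sustains one 1 KB write per second.

The sketch ignores reserved capacity and autoscaling behavior.

```python
ON_DEMAND_PER_MILLION_WRITES = 1.25  # assumed on-demand rate, USD
PROVISIONED_PER_WCU_HOUR = 0.00065   # assumed provisioned rate, USD


def provisioned_cost_per_million(utilization: float) -> float:
    """Cost of one million writes on provisioned capacity at a given utilization.

    A fully used WCU performs 3,600 one-KB writes per hour; at lower
    utilization you also pay for the idle capacity.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    wcu_hours = 1e6 / (3600 * utilization)
    return wcu_hours * PROVISIONED_PER_WCU_HOUR


def break_even_utilization() -> float:
    """Average utilization above which provisioned writes are cheaper."""
    return provisioned_cost_per_million(1.0) / ON_DEMAND_PER_MILLION_WRITES


if __name__ == "__main__":
    for u in (0.1, 0.5, 1.0):
        print(u, round(provisioned_cost_per_million(u), 3))
    print("break-even", round(break_even_utilization(), 3))
```

Under these assumptions, provisioned writes become cheaper once the provisioned capacity is more than about 14% utilized on average. This is why fairly constant loads favor provisioned mode.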
Given these facts, might a real managed Cassandra service be a more cost-effective approach for write-heavy applications? DataStax, as a primary contributor to the Cassandra community, offers its Astra managed service for Cassandra. Astra is fully managed and multi-cloud, including AWS and Azure. Additionally, Astra has a 5 GB free tier to help you get started.
According to the DataStax Astra pricing page, a D1 high-density production cluster costs $3,902 per month. It comes with 1.5 TB of storage, so it falls short of the 10 TB write-heavy example above. Nevertheless, Astra would likely handle write-heavy applications with the compute and memory resources available. I am unsure whether the storage sizes can be increased for Astra deployments and would need to reach out to DataStax to find out. Unfortunately, because of the storage constraints of the Astra offering, I cannot give you a true price comparison in this article.
If anything, the relatively rigid specs and pricing structure of Astra show the benefits of a truly serverless Cassandra service, which autoscales and adapts to whatever size of workload you need to run. That being said, Astra really is Cassandra, and it comes with many benefits that Keyspaces and Cosmos DB cannot offer.
Update: A Serverless Astra service (in beta) was just announced. No pricing information is available at this time.
Price Check: Google BigTable
For write-heavy applications, it’s hard to ascertain what’s a good value for a high performance serverless or managed NoSQL database. Looking beyond Cassandra, what if we were to consider an alternative such as BigTable from Google Cloud? Here is the estimate for a 10 TB BigTable database:
The calculation is simple: supporting 10 TB of data on SSD storage requires a minimum of 4 BigTable nodes, for a total estimated cost of $3,638.80 per month. Given that, Keyspaces and Cosmos DB pricing ($5,398.40 and $2,953, respectively) seems to be in the ballpark.
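The BigTable arithmetic can be reproduced with the sketch below. The rates are assumptions rather than quoted prices:

- $0.65 per node-hour.
- $0.17 per GB-month of SSD storage.
- A 730-hour month, with 1 TB = 1,024 GB.
- A per-node SSD limit of 2.5 TB, which yields the four nodes mentioned above.

With those inputs the sketch matches the $3,638.80 figure. That suggests, but does not confirm, that these are the rates behind the original estimate.

```python
import math

NODE_PER_HOUR = 0.65     # assumed node price, USD
SSD_PER_GB_MONTH = 0.17  # assumed SSD storage price, USD
SSD_TB_PER_NODE = 2.5    # assumed per-node SSD storage limit
HOURS_PER_MONTH = 730


def bigtable_estimate(storage_tb: float) -> tuple[int, float]:
    """Return (nodes, monthly USD) when node count is driven by storage alone."""
    nodes = max(1, math.ceil(storage_tb / SSD_TB_PER_NODE))
    compute = nodes * NODE_PER_HOUR * HOURS_PER_MONTH
    storage = storage_tb * 1024 * SSD_PER_GB_MONTH
    return nodes, compute + storage


if __name__ == "__main__":
    nodes, cost = bigtable_estimate(10)
    print(nodes, round(cost, 2))  # 4 3638.8
```

This sizes the cluster for storage only. A write-heavy workload could need more nodes for throughput, which would raise the estimate.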
Write-Heavy Application Options, In Review
After looking at AWS DynamoDB/Keyspaces, Azure Cosmos DB, Astra, and BigTable, it’s apparent that write-heavy applications with large, distributed databases in the cloud are not cheap. Both AWS and Azure charge 5 times more for writes than for reads, which is understandable given the significant overhead of replicating data after a write.
Given that the point of this article is to evaluate serverless Cassandra services, both Keyspaces and Cosmos DB seem to be reasonably priced for write-heavy applications at scale. That being said, Cosmos DB seems to be the better value, assuming my calculations are correct. Yet, if you’re already in AWS, I would recommend the Keyspaces “Provisioned capacity mode” for steady write-heavy loads, unless you want to stay completely serverless and accept the additional expense.
Conclusion
Keyspaces and Cosmos DB both seem to be reasonable options for serverless Cassandra, assuming you do not need the complete Cassandra API, as each service has significant gaps. Both offer a compatibility layer for Cassandra on top of a completely different backend implementation.
Also, both Keyspaces and Cosmos DB open the door to using Cassandra for read-heavy workloads such as RESTful microservices. The latencies are very low, and the costs are also low. Yet, there are alternatives that also fit this use case. You need to answer the question, “Why Cassandra?” and weigh all of the options.
One use case you may consider for Keyspaces/Cosmos DB is merging your application database and your data analytics database into one serverless Cassandra database. Data is very often copied from one to the other; with Keyspaces or Cosmos DB, that copy may not be necessary. Because these are serverless services, scalability concerns are almost eliminated, but keep an eye on the costs!
With that, I’d be interested to hear your take on Keyspaces and Cosmos DB as serverless Cassandra databases. What are the use cases that make sense for these services? Please comment below!