500MB of Memory Saved Us ~60% in Our DynamoDB Bill

Boris Cherkasky
Published in Riskified Tech · Mar 26, 2020


The main goal of the development department in a fast-growing startup is to deliver business value, and deliver it fast. In many cases this speed comes with a price tag: sometimes we choose the easiest solution to implement rather than the optimal one; other times we use a SaaS/managed platform to avoid the additional effort of maintaining infrastructure in-house.

In the “cloud era”, where every action is audited, labeled, and then billed, every inefficiency can easily add up to a substantial amount at the end of the month.
This is exactly what happened to one of our high-throughput data pipelines that leverages DynamoDB: it grew expensive, fast!

But with just a few days of work and a simple in-memory cache, we were able to reduce its price tag by around 60%.

In this blog post, I’ll go over our solution and its performance.

Our data pipeline in a nutshell

A few years ago, we at Riskified needed a high-throughput, low-latency data pipeline that could collect a huge number of signals and aggregate them.

We chose to build our pipeline like so:

High-level data pipeline

We needed a simple and super-scalable solution, and we needed it fast. So we turned to DynamoDB, which gave us amazing performance and helped us deliver a truly awesome product.

That being said, as our throughput grew, our DynamoDB bill grew with it, until we reached the point where we had difficulty justifying that cost against the value of the product we were offering.

To understand how that problem came to be and what we did to solve it, let’s dive into our data modeling.

Data modeling in NoSQL datastores

Our business domain required us to query our data by a small set of different keys while maintaining low-latency reads and high-throughput writes, processing large amounts of data.
Considering all of these requirements, we chose to use a NoSQL datastore, and since we were a small, fast-growing startup, we went with AWS DynamoDB.

In most NoSQL data stores, including DynamoDB, each record has a primary key called the partition key, and fast queries are available only by this key (in practice there are a few other ways to query the tables, but querying by the partition key is the main one).

With this in mind, we divided our data into 3 sets: raw data, aggregated data, and indexes.

  • Raw data tables hold all of the data we receive, as-is, with the data’s partition key as the primary key.
  • Aggregated data tables hold our data ready for querying and processing; this is the main business value we provide.
    We used the data’s partition key as the primary key here too.
  • Index tables are used to translate different query keys into the primary key of our data.

The index tables give us the ability to query our main data by different keys. In practice, those tables map different keys to our primary data key.

Our data modeling
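To make the index-table idea more concrete, here is a rough sketch of what a read through an index could look like. This is an illustration only: the table names, attribute names, and the lookupByAlternateKey helper are made up rather than our actual schema, and it uses the AWS SDK v2 DynamoDbClient directly.

```scala
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.{AttributeValue, GetItemRequest}

import scala.jdk.CollectionConverters._

val dynamo = DynamoDbClient.create()

// Hypothetical helper: resolve an alternate key to our primary partition key
// via an index table, then fetch the aggregated record by that partition key.
def lookupByAlternateKey(alternateKey: String): Option[Map[String, AttributeValue]] = {
  val indexRequest = GetItemRequest.builder()
    .tableName("index-by-external-id") // illustrative table name
    .key(Map("external_id" -> AttributeValue.builder().s(alternateKey).build()).asJava)
    .build()

  Option(dynamo.getItem(indexRequest).item())
    .filter(item => item.containsKey("partition_key"))
    .map(_.get("partition_key").s()) // the primary key this index entry points to
    .map { partitionKey =>
      val mainRequest = GetItemRequest.builder()
        .tableName("aggregated-data") // illustrative table name
        .key(Map("partition_key" -> AttributeValue.builder().s(partitionKey).build()).asJava)
        .build()
      dynamo.getItem(mainRequest).item().asScala.toMap
    }
}
```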

Spotting the most expensive link in the chain

Thankfully, AWS provides quite good monitoring metrics and dashboards for its services, and the ones for DynamoDB are no exception: you can see each table’s read/write throughput, latency, and so on.

Using these monitoring tools, we identified that the write throughput of the index tables was very high! This made sense, since our service is totally stateless and therefore each request was processed separately and written to the datastore. We also suspected that the data we wrote to the index tables was fairly similar, i.e., we were writing the same data over and over again.

Combining this knowledge with the AWS cost calculator, we assumed those writes were the main reason our DynamoDB bill had skyrocketed, and we were determined to reduce this cost.

Easy win with an in-memory cache

We decided to add an in-memory write-through cache in front of each index table. We didn’t need much: 250MB of memory should do the trick.

With this solution, each write to the index tables in DynamoDB is first looked up in the cache; on a cache hit, we skip the DynamoDB write.
On a cache miss, we write the data both to DynamoDB and to the cache.

If this works as intended, only the first write is sent to DynamoDB; all subsequent records arriving with the same data are found in the cache and never written.

Data model with cache

Our service is written in Scala, so we used Scaffeine, a Scala wrapper for Java’s Caffeine cache, which gave us great performance, easy usage, and cache performance monitoring. A code sample can be found here.
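To give a flavor of how this fits together, here is a minimal sketch of what the write-through deduplication could look like with Scaffeine. The key and value types, the TTL, the maximum size, and the putIndexItemToDynamo helper are all assumptions for illustration, not our actual implementation:

```scala
import com.github.blemale.scaffeine.{Cache, Scaffeine}

import scala.concurrent.duration._

// Illustrative sketch only: entry types, TTL, and sizing are assumptions.
val indexCache: Cache[String, String] =
  Scaffeine()
    .recordStats()              // exposes hit/miss counters for monitoring
    .expireAfterWrite(12.hours) // assumed TTL
    .maximumSize(1000000)       // rough stand-in for the ~250MB budget
    .build[String, String]()

// Placeholder for the actual PutItem call against the index table.
def putIndexItemToDynamo(alternateKey: String, partitionKey: String): Unit = ()

def writeIndexEntry(alternateKey: String, partitionKey: String): Unit =
  indexCache.getIfPresent(alternateKey) match {
    case Some(`partitionKey`) =>
      () // cache hit with the same mapping: skip the DynamoDB write entirely
    case _ =>
      putIndexItemToDynamo(alternateKey, partitionKey) // cache miss: write to DynamoDB...
      indexCache.put(alternateKey, partitionKey)       // ...and to the cache
  }
```

With recordStats() enabled, indexCache.stats().hitRate() can be reported to a metrics system, which is the kind of cache performance monitoring mentioned above.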

Performance and cost reduction

With just 250MB given to the cache, we reached an amazing cache hit rate of around 75%, and over time it climbed to 80%.

Cache hit rates

Those amazing hit rates are easily explained by our use of Kinesis: records with the same partition key get mapped to the same consumer, where the value is already cached.
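For illustration (the stream name and the publishSignal helper are made up, and this uses the AWS SDK v2 Kinesis client rather than whatever producer code we actually run), the affinity comes from the producer side: records published with the same partition key hash to the same shard, so a single consumer sees all of them.

```scala
import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.services.kinesis.KinesisClient
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest

val kinesis = KinesisClient.create()

// Records sharing a partition key land on the same shard, so the consumer
// owning that shard sees every occurrence of the key, and its cache entry
// is already warm after the first one.
def publishSignal(partitionKey: String, payload: String): Unit =
  kinesis.putRecord(
    PutRecordRequest.builder()
      .streamName("signals-stream") // illustrative stream name
      .partitionKey(partitionKey)
      .data(SdkBytes.fromUtf8String(payload))
      .build()
  )
```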

Our DynamoDB write throughput dropped by more than 80%:

Index table write throughput

And the crown jewel: after two years of climbing unstoppably, our DynamoDB bill finally started to shrink:

Our DynamoDB bill after introducing caching

Other solutions and concerns

The in-memory cache was developed as an easy-win solution. We had no guarantee it would reach the amazing performance it eventually did.

Other solutions we considered were:

  • DAX:
    DAX is a caching layer for DynamoDB, used mostly as a performance boost. It might also reduce some usage, but since it’s a managed service it comes with its own price tag, so we didn’t look into it thoroughly.
  • Redis (or another, cheaper key-value store):
    Since our index tables are, in a way, a key-value store, we considered migrating them to a “cheaper” solution such as Redis (or its AWS-managed version, ElastiCache for Redis).
    This alternative was dropped since it requires adding another infrastructure layer to our stack, which contradicts the “keep it simple” principle. We planned to keep it as a “plan B” in case the caching approach didn’t give us the value we expected.

One more thing worth discussing is the cache size:

To be honest, we picked our cache size on a hunch. We thought about what we could afford without affecting the memory footprint of our process too much. With that in mind, and with some rough calculations based on the available memory on our machines and the average item size we save to DynamoDB, we decided to “start small” with 250MB.
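For a sense of the back-of-the-envelope math (the average entry size here is an assumption, not our measured number):

```scala
// Illustrative sizing only: the per-entry size is assumed, not measured.
val cacheBudgetBytes = 250L * 1024 * 1024 // the 250MB cache budget
val avgEntryBytes    = 256L               // assumed key + value + bookkeeping overhead
val approxEntries    = cacheBudgetBytes / avgEntryBytes
// roughly 1,000,000 cached index entries, comfortably covering the hot key set
```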

We assume that adding more memory would help, but since our hit rates are already high, the extra work isn’t worth it at the moment.

Closing thoughts

Caching is usually the go-to solution for reducing latency and the load on external resources, or for memoizing resource-intensive computations.
Apparently, it can also be used to save some money ;)

I encourage you to take a close look at your services and systems and see where you can save a few coins by caching expensive data or optimizing your usage of third-party services.

As always, if you like what you’ve read, come follow me @cherkaskyb
