Graph DB in Serverless Flavor: Hacking DynamoDB

Renato Byrro - Oct 7 '19 - - Dev Community

In this article we'll cover:

  • 3️⃣ 3 steps to grasp DynamoDB working as a Graph database
  • 📺 Work with examples and illustrative data
  • 💪 Increasing visibility of the serverless stack

DynamoDB is the NoSQL, managed database by AWS. It is designed more as a key-value store. But it can be hacked to work as a graph database.

Which is pretty cool 😉

How so?

There is a data access pattern called Adjacency List. It's a way to describe nodes and edges. We can use it in DynamoDB to leverage graph relationships.

Building the concept in three steps:

Step 1: Basic Keys

Each entry in DynamoDB has two basic keys: primary and sort.

The first (primary-key) allows efficient access to the entries in the DB. Think of it as the entries' IDs.

The second (sort-key) allows... well, for sorting results, obviously.

As an example, we could model e-commerce purchases as:

  • primary-key: order-ID
  • sort-key: timestamp
Primary-Key Sort-Key Products
order-1 1570147200 ['Tesla Roadster', 'Sunglasses']
order-2 1569976212 ['Chocolate', 'Gummies']
. . . . . . . . .

This allows us to query by Order ID, as well as sort orders by their timestamp.

Step 2: Many-to-Many Relationships

One could say a graph (node-edge model) is a form of a many-to-many relationship. As such, it can be represented by an adjacency list. The primary-key represents the top-level item, while the sort-key is used for associations. Consider:

  • A blog has two posts: "Post-1" and "Post-2"
  • There are three possible tags: "cool", "awesome", "neat"
Primary Key Sort Key Title Text
Post-1 Post-1 Hello World! I'm a Post inside Dynamo!
Post-2 Post-2 Foo Bar Can't wait for the graph
Tag-1 Tag-1 cool null
Tag-1 Tag-1 awesome null
Tag-1 Tag-1 neat null

Access Patterns:

  1. Retrieve Post-1:
    primary-key == 'Post-1' && sort-key == 'Post-1'

  2. Retrieve Tag-2:
    primary-key == 'Tag-2' && sort-key == 'Tag-2'

Now, say Post-1 has one tag: cool.

Primary Key Sort Key Primary Title Sort Title
Post-1 Tag-1 Hello World! cool
Tag-1 Post-1 cool Hello World!

The connection is represented twice, which is important from an access pattern standpoint, as you can see below:

Access patterns:

  1. Get tags assigned to Post-1:
    primary-key == 'Post-1' & sort-key BEGINS_WITH 'Tag'

  2. Get posts tagged with Tag-1:
    primary-key == 'Tag-1' & sort-key BEGINS_WITH 'Post'

The BEGINS_WITH operator allows matching multiple Tags associated with a Post, or vice versa.

By having two entries in the DB for each connection, we can reach the other node regardless of our starting point. If we have a tag, we can find all posts associated. If we have a post, we can find all tags assigned.

Step 3: The Graph

We'll take the tripestore model for this example. It consists of subject-predicate-object, such as:

Subject Predicate Object
John Likes Music
Ross Friends With Chandler

Now, how do we model this data structure in DynamoDB? We leverage the adjacency pattern!

Say we have a Friendly social media site. Here's how we could model it in DynamoDB.

3.1 Nodes

We start by describing the nodes. In our example, we only have users. These will be the subjects and objects of the triplestore model. A name property is also added to each user:

Primary Key Sort Key Name
User-1 User-1 Ross
User-2 User-2 Rachel
User-3 User-3 Monica

3.2 Predicates

Primary Key Sort Key
Pred-Sibling Pred-Sibling
Pred-Friend Pred-Friend
Pred-Married Pred-Married

In the case of predicates, there are no properties associated.

3.3 Connections

Finally, we model the relationships between nodes using predicates to connect them. We also add a property indicating when the relationship started:

Primary Key Sort Key Relationship Start
User-1 Pred-Sibling-User-3 1994-09-24
User-3 Pred-Sibling-User-1 1994-09-24
User-2 Pred-Friend-User-3 1994-09-24
User-3 Pred-Friend-User-2 1994-09-24
User-1 Pred-Married-User-2 1999-05-20
User-2 Pred-Married-User-1 1999-05-20

Again, we have two entries for each connection, also for access patterns reasons, as explained above in the post <> tag example.

Here's how we read the node connections:

  • Ross and Monica are siblings
  • Monica and Rachel are friends
  • Rachel and Ross are married

Access patterns to retrieve our data:

  1. Who is married to Ross?
    primary-key == 'User-1' & sort-key BEGINS_WITH 'Pred-Married-User'

  2. Who are Monica's siblings?
    primary-key == 'User-3' & sort-key BEGINS_WITH 'Pred-Sibling-User'

  3. Who are Rachels's friends?
    primary-key == 'User-2' & sort-key BEGINS_WITH 'Pred-Friend-User'

Going deeper

The example above is very simplified, but it illustrates the idea.

It's possible to add other types of nodes to the graph. For instance:

Primary Key Sort Key Title
Location-1 Location-1 Cafeteria

One advantage is that links can have their own properties as well. For instance: in the Ross-Married-Rachel link, we could have properties such as where the couple met:

Primary Key Sort Key Meeting Place
User-1 Pred-Married-User-2 Location-1

Or we could model the "Meeting Place" as a node in the graph:

Primary Key Sort Key
Pred-MeetingPlace Pred-MeetingPlace
User-1 Pred-MeetingPlace-Location-1-User-2
User-1 Pred-MeetingPlace-User-2-Location-1

Access patterns:

  1. Get everyone Ross met at the Cafeteria:
    primary-key == 'User-1' & sort-key BEGINS_WITH 'Pred-MeetingPlace-Location-1'

  2. Where did Ross meet Rachel?
    primary-key == 'User-1' & sort-key BEGINS_WITH 'Pred-MeetingPlace-User-2'

This is actually kind of a hack of the triplestore model. The predicate Pred-MeetingPlace-Location-1 is actually a combination of a real predicate Pred-MeetingPlace and a node Location-1. This allows for flexible queries. Think of it like the node attached modifying, or characterizing the predicate.

The full table combined is available down at the end of the article.

Heads up

It is important to think about your access patterns before modeling your data. It may be difficult to add support for different access patterns down the road.

Check the DynamoDB documentation to learn more.

Why to Graph On DynamoDB?

DynamoDB is easy to get started with and keeps infrastructure hurdles to a minimum. Especially for startups and small teams that can't afford a DevOps team to maintain healthy DB servers/instances, it can be a great ally.

In summary, Dynamo will give us:

  • Fully managed & serverless: minimal infrastructure overhead
  • Hyper scalability: in both IO and storage, Dynamo can scale to tens of thousands of concurrent requests and exabytes of data
  • Integration with Lambda
  • Microsecond latency
  • Built-in global replication

Will I lose control of my DB by using serverless?

That's a common concern about serverless.

We really need to ask ourselves: what and why do I want to control?

Usually, it boils down to monitoring, making sure everything is running smoothly. Serverless indeed requires a different approach to observability.

There are professional services, that can take care of that. Dashbird, especially, was created from the ground up with the goal to provide full visibility over entire serverless stacks, so you might want to check them out.


Full Dynamo Tables

Post + Tag Example

Primary Key Sort Key Primary Title Sort Title Text
Post-1 Post-1 Hello World null I'm a Post in Dynamo!
Post-2 Post-2 Foo Bar null Graph is cool!
Tag-1 Tag-1 cool null null
Tag-1 Tag-1 awesome null null
Tag-1 Tag-1 neat null null
Post-1 Tag-1 Hello World cool null
Tag-1 Post-1 cool Hello World null

Friends Social Media

Primary Key Sort Key Title Rel. Start
User-1 User-1 Ross null
User-2 User-2 Rachel null
User-3 User-3 Monica null
Location-1 Location-1 Cafeteria null
Pred-Sibling Pred-Sibling null null
Pred-Friend Pred-Friend null null
Pred-Married Pred-Married null null
Pred-MeetingPlace Pred-MeetingPlace null null
User-1 Pred-Sibling-User-3 null 1994-09-24
User-3 Pred-Sibling-User-1 null 1994-09-24
User-2 Pred-Friend-User-3 null 1994-09-24
User-3 Pred-Friend-User-2 null 1994-09-24
User-1 Pred-Married-User-2 null 1999-05-20
User-2 Pred-Married-User-1 null 1999-05-20
User-1 Pred-MeetingPlace-Location-1-User-2 null null
User-1 Pred-MeetingPlace-User-2-Location-1 null null
User-2 Pred-MeetingPlace-Location-1-User-1 null null
User-2 Pred-MeetingPlace-User-1-Location-1 null null

Photo Credits:

Full Disclosure

I work as a Developer Advocate at Dashbird.


Additional resources:


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player