How to use DynamoDB single-table design with AppSync

Learn how to use the benefits of single-table design with your GraphQL API

How to use DynamoDB single-table design with AppSync

AppSync is a fully managed service from AWS that lets you build and deploy scalable and secure GraphQL APIs in the cloud. DynamoDB is a NoSQL fully managed and scalable database. Both being serverless services, they are very often used together.

If you are interested in DynamoDB, you have also probably already heard about single-table design. However, that way of designing databases is often considered to be unhelpful with GraphQL.

In this blog post, I will share some ideas on how we can still use single-table design and its benefits with GraphQL, and show some techniques that I use with AWS AppSync.

Let's begin.

It's all about access pattern

If you have watched Rick Houlihan's talks at AWS re:invent (In 2017, 2018 and 2019), or read Alex DeBrie's book (I totally recommend it if you haven't), you probably know it by now: The key to success with single table design is that you should know your access patterns in advance.

It is likely that your access patterns will reflect what the different pages or views of your application will show. Something like

For a given user, show user details and his/her last 10 orders

Then as the user drills down to a particular order, you will show the order and all its order items (That is a second access pattern).

However, one of the key features of GraphQL is that you can fetch nested children as deep as you need. In our example, you could ask "Give me that user's details with his/her last 10 orders and all the items of those orders".

Under the hood, GraphQL uses resolvers that fetch data from the persistence storage and return it in the query response. Most of the time, each child is a different resolver that receives the parent (or source) data, that you can use to fetch related data. This is how GraphQL "joins" data, and this is probably the way you have been using DynamoDB with GraphQL so far. DynamoDB doesn't have JOINs, GraphQL fills that gap for you!

With all that in mind, it does not look like single-table design DynamoDB has many benefits to bring to GraphQL.

But wait a minute...

One of the spirits of GraphQL is no under- and over fetching. GraphQL lets you choose what fields and children you need in your client application and lets you fetch those fields specifically. In our previous example, even though GraphQL allows you to, will you ever do a query that fetches all the users, and all the orders, and all the items in your client application? Unless you are building a public API with unpredictable access patterns, chances are that the answer is No (except for debugging or exploring your data, maybe).

It's all about access patterns! Just as I explained earlier, you will probably show a list of orders first, and when the user clicks on an order you will show the details. That's two different queries. They might look like this:

Query 1: Fetch a User and related order:

  user(id: "123") {
    id
    email
    name
    orders {
      id
      orderDate
      shippedDate
    }
  }

Query 2: Fetch an Order and related items:

order(id: "456") {
    id
    orderDate
    shippedDate
    items {
      productId
      name
      quantity
      price
    }
}

We still have the same 2 access patterns from the beginning.

By now you might think

But I still need one resolver for each entity

No, you don't! We'll see how in the next section.

Build your resolvers with your access patterns in mind

Maybe one of the most common misconceptions with GraphQL is that each nested entity is a different resolver. However, this does not have to be the case. You can easily return child elements from the parent resolver. If you do so, you don't even need a child resolver at all. This can even often be the case with DynamoDB if you denormalise some relations in the parent item (ie: in a Map or a List attribute). But it can also work when the entities are decoupled but you are able to fetch them all in one query, for example with a JOIN (with RDBMS) or if your items live under the same partition (with DynamoDB).

Let's see how this can work in our example and what kind of resolvers we need.

Our user resolver can probably get the user and the orders from DynamoDB in a single query, then return them all in one resolver; while the order resolver can do the same with a particular order and its items.

Here is what the order resolver might look like (request template):

## Fetch the Order and OrderItems (they are under the same partition)
{
    "version": "2017-02-28",
    "operation": "Query",
    "query": {
      "expression": "#PK = :PK",
      "expressionNames": {
        "#PK": "PK"
      },
      "expressionValues": {
        ":PK": $util.dynamodb.toDynamoDBJson("ORDER#${context.args.id}")
      }
    }
}

and the response template

## re-organize the data
#if($context.result.items.size() == 0)
  $utils.error("NotFound", "NotFound");
#else
  #set ($order = {})
  #foreach($item in $context.result.items)
    #if($item.SK.startsWith("ORDER#"))
      #set ($order = $item)
      $util.qr($order.put("items", []))
    #else
      #if($item.SK.startsWith("ORDERITEM#"))
        $util.qr($order.items.add($item))
      #end
    #end
  #end
  $utils.toJson($order)
#end

In the response template, we receive all the items in a single array. All we have to do is to re-organize them a little (we embed the items inside the order itself). We then return the whole thing.

We're sending only one query to DynamoDB! 🎉

The drawbacks

This way of doing things comes with a couple of issues though:

  1. If you only need fields from the order (your query does not include the items field), you will be over-fetching.

  2. It only works with that single access pattern. If you need to access the order items from let's say the updateOrder endpoint, it won't work because they won't come in that DynamoDB access pattern.

Let's tackle these issues one at a time

Avoid over-fetching children entities if they are not explicitly included in the query

That one is easy. Every AppSync query comes with a Context object that contains information about the query. Things like args, source, identity, request and info. The last one is the one we're interested in. It gives us information about the Query and more specifically, the selectionSetList attribute tells us what fields were required in the query. We can use that to change our request to DynamoDB and include the item orders, or not, depending on its value. Let's adjust our request template to use it.

#set($expression="#PK = :PK")
#set($expressionNames={"#PK": "PK"})
#set($expressionValues={":PK": $util.dynamodb.toDynamoDB("ORDER#$context.args.id")})
## if the selectionSetList does not contain the items, we fetch only the order
#if(!$ctx.info.selectionSetList.contains("items"))
  #set($expression=$expression + " and #SK = :SK")
  $util.qr($expressionNames.put("#SK", "SK"))
  $util.qr($expressionValues.put(":SK", $util.dynamodb.toDynamoDB("ORDER#${context.args.id}")))
#end
{
    "version" : "2017-02-28",
    "operation" : "Query",
    "query" : {
      "expression": "$expression",
      "expressionNames": $util.toJson($expressionNames),
      "expressionValues": $util.toJson($expressionValues)
    }
}

When the selectionSetList does not include "items", we limit the Query to only the Order item itself by adding an SK condition. Now we only fetch what we need when we need it.

Make other access patterns to work

This one is a little more tricky. If we want the updateOrder query to return order items as well, we will need to do it in 2 steps. This means that we will need a resolver for the order items. Unfortunately, here we have no choice. Let's write our order.items resolver.

{
    "version" : "2017-02-28",
    "operation" : "Query",
    "query" : {
      "expression": "#PK = :PK and begins_with(#SK, :SK)",
      "expressionNames" : {
        "#PK" : "PK",
        "#SK" : "SK"
      },
      "expressionValues" : {
        ":PK": $util.dynamodb.toDynamoDBJson("ORDER#${context.source.id}"),
        ":SK": $util.dynamodb.toDynamoDBJson("ORDERITEM#")
      }
    }
}

Now we can see related order items for an Order in any query. And this is probably what you want! Otherwise, your GraphQL API would be sort of inconsistent where data is returned in some request and not in others.

But wait! We just broke our previous access pattern! Since resolvers are associated with a Type (in our case: Order) and the type is the same, that new resolver will be used by the order endpoint, too. This means, that the extra query will also be executed in that case, making all the efforts we have done so far useless. Worse, we would be fetching the order items twice! Is there a way we can avoid that?

Remember the Context object? It also comes with the source attribute. We even just used it to get the id of the Order and fetch the related order items. That object actually comes with the full result from the previous (parent) resolver, including the order items, if any. We can use that in our order.items resolver and avoid the extra query if the order items come pre-populated from the source. For that, we can use the #return directive.

#if($ctx.source.items)
  #return($ctx.source.items)
#else
{
    "version" : "2017-02-28",
    "operation" : "Query",
    "query" : {
      "expression": "#PK = :PK and begins_with(#SK, :SK)",
      "expressionNames" : {
        "#PK" : "PK",
        "#SK" : "SK"
      },
      "expressionValues" : {
        ":PK": $util.dynamodb.toDynamoDBJson("ORDER#${context.source.id}"),
        ":SK": $util.dynamodb.toDynamoDBJson("ORDERITEM#")
      }
    }
}
#end

By returning early in the request template, the DynamoDB query will not be executed at all and the data from the previous resolver will just pass-through.

Limitations

The techniques I just showed you have some other limitations to keep in mind.

Deep nesting

It can only work well when you have 2 levels of nesting. In DynamoDB, with single-table design, you will almost never group more than 2 levels of hierarchy under the same partition key. You will probably not store orders and order items under the user PK. Instead, you will store the items under a GSI with the order id as the PK. You won't be able to fetch all these items in one single query and you will need at least two. Usually, you will be able to group entities two by two (4 levels of hierarchy = 2 grouped queries to DynamoDB).

That said, it still really depends on your access pattern. If your client API almost always fetches 3 levels, or more, of hierarchy in a single query, you might still group them under the same PK and filter/re-order the items in your top resolver. It might just become more complicated to maintain and you might be reading more than you need in some cases. You might also hit other limitations like the 1MB DynamoDB limit a lot faster.

Pagination

In our example, our users will have an unbounded number of Orders and you probably don't want to return them all in one single query. You will for example get the last 10 in one query, and then paginate. You might want to have a query like this one:

  user(id: "123") {
    id
    email
    name
    orders(nextToken: "ey........") {
      id
      orderDate
      shippedDate
    }
  }

Our design will simply not work in this case because when you pass nextToken, you simply won't get the Order item at all in the DyamoDB response. In fact, you won't even have access to the nextToken argument from your user resolver.

That said, it would also probably be a bad design of your GraphQL API. Do you still want to bring back the user for every orders page? Probably not. If you need to paginate, you should probably have another endpoint in your API. Something like ordersByUserId(userId: ID!): [Order], and use that instead.

Sorting

Sorting is also limited. You might for instance sometimes want to get the last 10 orders, and sometimes the first 10, for a given user. Now you are limited to a direction depending on where you placed the User item in the partition. If it's at the beginning (and your orders are sorted by date), you will get the user's first orders, and if it's at the end, the last ones. Sorting is basically limited to how you designed your table in the first place.

If you need 2 ways of sorting orders (ASC and DESC), you have in fact 2 access patterns. What you would normally do is to add a GSI to your table for the second access pattern. You could then use an index or the other depending on the direction requested by your query. That's another level of complexity to take into consideration.

If you use a sub-resolver (one for the User and one for the Orders), all you would have to do is change the ScanIndexForward param in your items query.

Conclusions

We just saw how single-table design can work well with GraphQL and how we can use its benefits to reduce DynamoDB calls. We found out that it comes with a few challenges and how to deal with them. We also learned about the limitations and things to take into account before using this method. If your question is:

Is is worth it?

Well, it probably depends on your use case. If you know your access patterns well in advance, it can give you a little performance boost. If what you need is flexibility or you have unpredictable access patterns, you probably should stick to keeping your resolvers de-coupled.

If you have comments, suggestions or questions, let me know!

Did you find this article valuable?

Support Benoît Bouré by becoming a sponsor. Any amount is appreciated!