<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Benoit Boure]]></title><description><![CDATA[I am a software engineer with a focus on serverless technologies. I blog about Serverless
[Follow me on Twitter](https://twitter.com/Benoit_Boure) - [Need help?]]></description><link>https://benoitboure.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 12 Apr 2026 09:04:44 GMT</lastBuildDate><atom:link href="https://benoitboure.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Private API Gateway as EventBridge API Destination]]></title><description><![CDATA[In a previous post, I explained how to connect AWS Step Functions to a private API Gateway endpoint thanks to the new integration with AWS PrivateLink and Amazon VPC Lattice. In this issue, I’ll show you how to use the same integration to use a priva...]]></description><link>https://benoitboure.com/private-api-gateway-as-eventbridge-api-destination</link><guid isPermaLink="true">https://benoitboure.com/private-api-gateway-as-eventbridge-api-destination</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS EventBridge]]></category><category><![CDATA[API Gateway]]></category><category><![CDATA[vpc]]></category><category><![CDATA[VPC Endpoints]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 21 Jan 2025 08:00:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737112643009/3f8e278d-7380-4a73-9a26-896c9820852e.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a previous post, I explained <a target="_blank" href="https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions">how to connect AWS Step Functions to a private API Gateway endpoint</a> thanks to the new integration with AWS PrivateLink and Amazon VPC Lattice. In this issue, I’ll show you how to use the same integration to use a private API Gateway API as an EventBridge target using the CDK, removing the need for an intermediary Lambda function.</p>
<h2 id="heading-overview">Overview</h2>
<p><img src="https://cdn-0.plantuml.com/plantuml/png/dL71RXCn4BtxAzmSq5R2CTnG3r5BY4gb2aLKWhDZJpOZTctB7cTL8VwTIUA4LM4FzB3hDyzlPj-ylSra4fM-4rSEjkX1tdr_MdCjTqGntsYTp31laNPbKp8a6po1fxaDlJP3ximc7qw5V97LDYGLE-CF0_N-_OVvE-qman1Nw6rNt6MwvdCP-ZxuUUJod_TFsCSEjmXkGdEVGebPVril9mJyXLW8zAFfDyvCYBu03I7zGDykJxjzWWxta9uFWrVUnO2UyfJDo1Qj8Gp-WPlRT8JwRlrmRmW6y_n_VQizUFgOqBNm6hUFXWXjRHLYDAs18oxvhPojAfmndbqBmOt791iF0sDc-JsxbZ-5bEC8cdsqv-8aakUoZcBznKIJ88UIBDGWMF6rCh9Ibwu_SJKc8hDC_2Kw_SHcPxph837x-OIgu9SGvnrLmdP7Ql72mOqS1I8vFW_saBfy8o_EcDrYArvqgiLeTJ72Qi5-1JzgKNs9M_2Eu_yD" alt="EventBridge Private API Gateway Integration" /></p>
<p><a target="_blank" href="https://www.plantuml.com/plantuml/uml/dL71RXCn4BtxAzmSq5R2CTnG3r5BY4gb2aLKWhDZJpOZTctB7cTL8VwTIUA4LM4FzB3hDyzlPj-ylSra4fM-4rSEjkX1tdr_MdCjTqGntsYTp31laNPbKp8a6po1fxaDlJP3ximc7qw5V97LDYGLE-CF0_N-_OVvE-qman1Nw6rNt6MwvdCP-ZxuUUJod_TFsCSEjmXkGdEVGebPVril9mJyXLW8zAFfDyvCYBu03I7zGDykJxjzWWxta9uFWrVUnO2UyfJDo1Qj8Gp-WPlRT8JwRlrmRmW6y_n_VQizUFgOqBNm6hUFXWXjRHLYDAs18oxvhPojAfmndbqBmOt791iF0sDc-JsxbZ-5bEC8cdsqv-8aakUoZcBznKIJ88UIBDGWMF6rCh9Ibwu_SJKc8hDC_2Kw_SHcPxph837x-OIgu9SGvnrLmdP7Ql72mOqS1I8vFW_saBfy8o_EcDrYArvqgiLeTJ72Qi5-1JzgKNq9SGlUSVw_0G00">Source</a></p>
<p>The setup is similar to the one for Step Functions. A Resource Gateway is used as the entry point into the VPC. It is associated with a Resource Configuration, which defines the Private API Gateway resource, and the EventBridge Connection is configured to use the Resource Config as the final destination.</p>
<p>For more details about this setup, see my previous post about the <a target="_blank" href="https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions#heading-overview">Step Functions Integration</a>.</p>
<h2 id="heading-cdk-stack-definition">CDK Stack Definition</h2>
<p>We need to define the Resource Gateway and the Resource Configuration.</p>
<pre><code class="lang-typescript">    <span class="hljs-comment">// Security Group for the Resource Gateway</span>
    <span class="hljs-keyword">const</span> rgSecurityGroup = <span class="hljs-keyword">new</span> SecurityGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGatewaySG'</span>, {
      vpc: vpc,
      allowAllOutbound: <span class="hljs-literal">false</span>,
    });

    rgSecurityGroup.addEgressRule(
      Peer.ipv4(vpc.vpcCidrBlock),
      Port.tcp(<span class="hljs-number">443</span>),
      <span class="hljs-string">'Allow HTTPS traffic from Resource Gateway'</span>,
    );

    <span class="hljs-comment">// Resource Gateway</span>
    <span class="hljs-keyword">const</span> resourceGateway = <span class="hljs-keyword">new</span> CfnResourceGateway(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGateway'</span>, {
      name: <span class="hljs-string">'private-api-access'</span>,
      ipAddressType: <span class="hljs-string">'IPV4'</span>,
      vpcIdentifier: vpc.vpcId,
      subnetIds: vpc.isolatedSubnets.map(<span class="hljs-function">(<span class="hljs-params">subnet</span>) =&gt;</span> subnet.subnetId),
      securityGroupIds: [rgSecurityGroup.securityGroupId],
    });

    <span class="hljs-comment">// Resource Configuration</span>
    <span class="hljs-keyword">const</span> resourceConfig = <span class="hljs-keyword">new</span> CfnResourceConfiguration(
      <span class="hljs-built_in">this</span>,
      <span class="hljs-string">'ResourceConfig'</span>,
      {
        name: <span class="hljs-string">'sf-private-api'</span>,
        portRanges: [<span class="hljs-string">'443'</span>],
        resourceGatewayId: resourceGateway.ref,
        resourceConfigurationType: <span class="hljs-string">'SINGLE'</span>,
      },
    );

    <span class="hljs-comment">// Use the regional DNS name of the API Gateway's VPC endpoint</span>
    <span class="hljs-comment">// in the Resource Configuration</span>
    resourceConfig.addPropertyOverride(
      <span class="hljs-string">'ResourceConfigurationDefinition.DnsResource'</span>,
      {
        DomainName: Fn.select(
          <span class="hljs-number">1</span>,
          Fn.split(<span class="hljs-string">':'</span>, Fn.select(<span class="hljs-number">0</span>, api.vpcEndpoint.vpcEndpointDnsEntries)),
        ),
        IpAddressType: <span class="hljs-string">'IPV4'</span>,
      },
    );

    <span class="hljs-comment">// Event Bus</span>
    <span class="hljs-keyword">const</span> eventBus = <span class="hljs-keyword">new</span> EventBus(<span class="hljs-built_in">this</span>, <span class="hljs-string">'EventBus'</span>, {});

    <span class="hljs-comment">// Connection to the API</span>
    <span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ApiConnection'</span>, {
      authorization: Authorization.apiKey(
        <span class="hljs-string">'x-api-key'</span>,
        SecretValue.unsafePlainText(<span class="hljs-string">'demo'</span>),
      ),
    });

    <span class="hljs-comment">// Set up the Connection with the Resource Config</span>
    (connection.node.children[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> CfnConnection).addPropertyOverride(
      <span class="hljs-string">'InvocationConnectivityParameters'</span>,
      {
        ResourceParameters: {
          ResourceConfigurationArn: resourceConfig.attrArn,
        },
      },
    );
</code></pre>
<p>EventBridge can now connect to the private API Gateway. The next step is to create a rule and set the API as its target.</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> rule = <span class="hljs-keyword">new</span> Rule(<span class="hljs-built_in">this</span>, <span class="hljs-string">'RequestAccountRule'</span>, {
      eventBus,
      eventPattern: {
        source: [<span class="hljs-string">'my-source'</span>],
      },
    });

    <span class="hljs-keyword">const</span> apiDestination = <span class="hljs-keyword">new</span> ApiDestination(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ApiDestination'</span>, {
      endpoint: <span class="hljs-string">`<span class="hljs-subst">${api.api.url}</span>/hello`</span>,
      httpMethod: HttpMethod.POST,
      connection: connection,
    });

    rule.addTarget(
      <span class="hljs-keyword">new</span> targets.ApiDestination(apiDestination, {
        event: RuleTargetInput.fromEventPath(<span class="hljs-string">'$.detail'</span>),
      }),
    );
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🔗</div>
<div data-node-type="callout-text">Find the full code on <a target="_self" href="https://github.com/bboure/cdk-event-bridge-private-api-gateway">GitHub</a>.</div>
</div>

<h2 id="heading-testing-the-integration">Testing the Integration</h2>
<p>Let’s put the following event on the bus:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"DetailType"</span>: <span class="hljs-string">"somethingHappened"</span>,
  <span class="hljs-attr">"Source"</span>: <span class="hljs-string">"my-source"</span>,
  <span class="hljs-attr">"EventBusName"</span>: <span class="hljs-string">"EventBusVendingMachine308DEFEB"</span>,
  <span class="hljs-attr">"Detail"</span>: {
    <span class="hljs-attr">"foo"</span>: <span class="hljs-string">"bar"</span>
  }
}
</code></pre>
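<p>The same event can also be sent programmatically. Here is a minimal, hypothetical sketch of building the equivalent <code>PutEvents</code> entry shape (the bus name is the one from the example above; note that when calling the API, <code>Detail</code> must be a JSON-serialized string):</p>

```typescript
// Shape of a single entry for the EventBridge PutEvents API.
type PutEventsEntry = {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // must be a JSON-serialized string, not an object
};

// Hypothetical helper that serializes the detail payload for us.
function buildEntry(
  busName: string,
  source: string,
  detailType: string,
  detail: object,
): PutEventsEntry {
  return {
    EventBusName: busName,
    Source: source,
    DetailType: detailType,
    Detail: JSON.stringify(detail),
  };
}

const entry = buildEntry(
  'EventBusVendingMachine308DEFEB',
  'my-source',
  'somethingHappened',
  { foo: 'bar' },
);
console.log(entry.Detail); // {"foo":"bar"}
```

<p>The entry could then be passed to the AWS SDK’s EventBridge client (or the <code>aws events put-events</code> CLI command) to publish the event.</p>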
<p>I can see that the Lambda function used as the handler of the endpoint is invoked with the following event.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"resource"</span>: <span class="hljs-string">"/hello"</span>,
    <span class="hljs-attr">"path"</span>: <span class="hljs-string">"/hello"</span>,
    <span class="hljs-attr">"httpMethod"</span>: <span class="hljs-string">"POST"</span>,
    <span class="hljs-attr">"headers"</span>: {
        <span class="hljs-attr">"Accept-Encoding"</span>: <span class="hljs-string">"gzip, x-gzip, deflate, br"</span>,
        <span class="hljs-attr">"Content-Type"</span>: <span class="hljs-string">"application/json; charset=utf-8"</span>,
        <span class="hljs-attr">"Host"</span>: <span class="hljs-string">"899aggxh3a.execute-api.us-east-1.amazonaws.com"</span>,
        <span class="hljs-attr">"Range"</span>: <span class="hljs-string">"bytes=0-1048575"</span>,
        <span class="hljs-attr">"User-Agent"</span>: <span class="hljs-string">"Amazon/EventBridge/ApiDestinations"</span>,
        <span class="hljs-attr">"x-amzn-cipher-suite"</span>: <span class="hljs-string">"ECDHE-RSA-AES128-GCM-SHA256"</span>,
        <span class="hljs-attr">"x-amzn-tls-version"</span>: <span class="hljs-string">"TLSv1.2"</span>,
        <span class="hljs-attr">"x-amzn-vpc-id"</span>: <span class="hljs-string">"vpc-0a1db1c1701e137ca"</span>,
        <span class="hljs-attr">"x-amzn-vpce-config"</span>: <span class="hljs-string">"1"</span>,
        <span class="hljs-attr">"x-amzn-vpce-id"</span>: <span class="hljs-string">"vpce-09fc3c0c5173d919b"</span>,
        <span class="hljs-attr">"x-api-key"</span>: <span class="hljs-string">"demo"</span>,
        <span class="hljs-attr">"X-Forwarded-For"</span>: <span class="hljs-string">"10.0.195.243"</span>
    },
    <span class="hljs-attr">"multiValueHeaders"</span>: {
        <span class="hljs-attr">"Accept-Encoding"</span>: [
            <span class="hljs-string">"gzip, x-gzip, deflate, br"</span>
        ],
        <span class="hljs-attr">"Content-Type"</span>: [
            <span class="hljs-string">"application/json; charset=utf-8"</span>
        ],
        <span class="hljs-attr">"Host"</span>: [
            <span class="hljs-string">"899aggxh3a.execute-api.us-east-1.amazonaws.com"</span>
        ],
        <span class="hljs-attr">"Range"</span>: [
            <span class="hljs-string">"bytes=0-1048575"</span>
        ],
        <span class="hljs-attr">"User-Agent"</span>: [
            <span class="hljs-string">"Amazon/EventBridge/ApiDestinations"</span>
        ],
        <span class="hljs-attr">"x-amzn-cipher-suite"</span>: [
            <span class="hljs-string">"ECDHE-RSA-AES128-GCM-SHA256"</span>
        ],
        <span class="hljs-attr">"x-amzn-tls-version"</span>: [
            <span class="hljs-string">"TLSv1.2"</span>
        ],
        <span class="hljs-attr">"x-amzn-vpc-id"</span>: [
            <span class="hljs-string">"vpc-0a1db1c1701e137ca"</span>
        ],
        <span class="hljs-attr">"x-amzn-vpce-config"</span>: [
            <span class="hljs-string">"1"</span>
        ],
        <span class="hljs-attr">"x-amzn-vpce-id"</span>: [
            <span class="hljs-string">"vpce-09fc3c0c5173d919b"</span>
        ],
        <span class="hljs-attr">"x-api-key"</span>: [
            <span class="hljs-string">"demo"</span>
        ],
        <span class="hljs-attr">"X-Forwarded-For"</span>: [
            <span class="hljs-string">"10.0.195.243"</span>
        ]
    },
    <span class="hljs-attr">"queryStringParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"multiValueQueryStringParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"pathParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"stageVariables"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"requestContext"</span>: {
        <span class="hljs-attr">"resourceId"</span>: <span class="hljs-string">"yjzggg"</span>,
        <span class="hljs-attr">"resourcePath"</span>: <span class="hljs-string">"/hello"</span>,
        <span class="hljs-attr">"httpMethod"</span>: <span class="hljs-string">"POST"</span>,
        <span class="hljs-attr">"extendedRequestId"</span>: <span class="hljs-string">"Efl1aFY5oAMFoYA="</span>,
        <span class="hljs-attr">"requestTime"</span>: <span class="hljs-string">"16/Jan/2025:18:29:54 +0000"</span>,
        <span class="hljs-attr">"path"</span>: <span class="hljs-string">"/prod/hello"</span>,
        <span class="hljs-attr">"accountId"</span>: <span class="hljs-string">"438465158289"</span>,
        <span class="hljs-attr">"protocol"</span>: <span class="hljs-string">"HTTP/1.1"</span>,
        <span class="hljs-attr">"stage"</span>: <span class="hljs-string">"prod"</span>,
        <span class="hljs-attr">"domainPrefix"</span>: <span class="hljs-string">"899aggxh3a"</span>,
        <span class="hljs-attr">"requestTimeEpoch"</span>: <span class="hljs-number">1737052194204</span>,
        <span class="hljs-attr">"requestId"</span>: <span class="hljs-string">"3a6754ae-5488-429c-9ec6-1837a4c21727"</span>,
        <span class="hljs-attr">"identity"</span>: {
            <span class="hljs-attr">"cognitoIdentityPoolId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"cognitoIdentityId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"vpceId"</span>: <span class="hljs-string">"vpce-09fc3c0c5173d919b"</span>,
            <span class="hljs-attr">"apiKey"</span>: <span class="hljs-string">"demo"</span>,
            <span class="hljs-attr">"principalOrgId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"cognitoAuthenticationType"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"userArn"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"userAgent"</span>: <span class="hljs-string">"Amazon/EventBridge/ApiDestinations"</span>,
            <span class="hljs-attr">"accountId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"caller"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"sourceIp"</span>: <span class="hljs-string">"10.0.195.243"</span>,
            <span class="hljs-attr">"accessKey"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"vpcId"</span>: <span class="hljs-string">"vpc-0a1db1c1701e137ca"</span>,
            <span class="hljs-attr">"cognitoAuthenticationProvider"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"user"</span>: <span class="hljs-literal">null</span>
        },
        <span class="hljs-attr">"domainName"</span>: <span class="hljs-string">"899aggxh3a.execute-api.us-east-1.amazonaws.com"</span>,
        <span class="hljs-attr">"deploymentId"</span>: <span class="hljs-string">"74v610"</span>,
        <span class="hljs-attr">"apiId"</span>: <span class="hljs-string">"899aggxh3a"</span>
    },
    <span class="hljs-attr">"body"</span>: <span class="hljs-string">"{\"foo\":\"bar\"}"</span>,
    <span class="hljs-attr">"isBase64Encoded"</span>: <span class="hljs-literal">false</span>
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The new VPC Lattice and AWS PrivateLink integration allows developers to invoke private APIs directly, without an intermediary Lambda function. This reduces code, maintenance, and latency.</p>
]]></content:encoded></item><item><title><![CDATA[Invoking Private API Gateway Endpoints From Step Functions]]></title><description><![CDATA[At Re:Invent 2024, AWS announced EventBridge and Step Functions integration with private APIs. Thanks to this new feature, customers can now directly invoke APIs that are inside a private VPC from EventBridge (with API destinations), or Step Function...]]></description><link>https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions</link><guid isPermaLink="true">https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Step Functions]]></category><category><![CDATA[API Gateway]]></category><category><![CDATA[CDK]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Serverless Architecture]]></category><category><![CDATA[AWS Private Link]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 14 Jan 2025 07:54:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736839599937/50b215ea-437e-405c-b8a1-f4b4375d6878.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At Re:Invent 2024, AWS announced <a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-eventbridge-step-functions-integration-private-apis/">EventBridge and Step Functions integration with private APIs</a>. Thanks to this new feature, customers can now directly invoke APIs that are inside a private VPC from EventBridge (with <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html">API destinations</a>), or Step Functions (<a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/call-https-apis.html">HTTP Tasks</a>). Before, users had to use Lambda functions inside the VPC as a proxy to their private APIs.</p>
<p>In a previous post, I explained how to <a target="_blank" href="https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk">invoke HTTP endpoints from Step Functions with the CDK</a>. In this issue, I will cover calling a private API Gateway endpoint using the new integration.</p>
<h2 id="heading-overview">Overview</h2>
<p>First, let’s examine how this new integration works and how the different components interact with each other. At the end of this post, I’ll show you how to deploy this setup with the CDK.</p>
<p>Here is a diagram that describes the architecture.</p>
<p><img src="https://cdn-0.plantuml.com/plantuml/png/dL7DRXCn4BxFKxWveAt4nd13FKGjePHA5Og816TdFRknyDgMFSwgGhmxirddXq8ES4WyCzzFlfdS9bAHSc_XIcDh78gxR-iLzs9B5DADb54DyyxGDczomjXuH-XetlXUgY5PjKdZMni6KjtwM0Uht6WeTs_VpTz8RH81N1dNsAoFxfBVfUzxx-Q1sx_YQzC7Qrg3-WBd8VeSalowMbuWy2-4J2YVLB_HwWBfCzBWutVZkkMqsmUqPeVnUJI-TpfuuoXTYXauOgF8UFV8uYxkItctUdnGX8Dw_ZVTcZ1ypAuPc_G_UPyKaMbmaWByvbUDijRwuRMOZO0u8ZEUpAu1s61_qyhXm3LF-NjsBNu0275oho8cdsE3PSU99meglXHK5BYu2t5-pseNcaDJz8Vso3zTiLB1y9G7VvXE_ssrLKvRZ3pzD5M5y1FWi7QzU97xvdw7Zjv7epiKV4o7-tE8LwSLUDgQ3bu8wyLPUZYhwmK71VxKYn8806xHwTpRNm00" alt="PlantUML diagram" /></p>
<p><a target="_blank" href="https://www.plantuml.com/plantuml/uml/dL71ZjCm4BtxAxmzeAn4QhYXFLIxb6LPQOKgAi7PZIVf2CUsx76Z5UBVcJHfA8KSs4FYcJTlNfvVRXFfIBcruif0ZGxatRVjXdkv9mhfHgceksM3jC-xd21MtX4uMbQ-LRfBLkzIVvR8WrJMFfR1QjSBgiFRTyitoc0Y8QxGLJQRILtnkVPjwzqoSFlF-HRROB56C3ESX-XpIEhhPZr3u2-4JA2UTBipUeRq6QZpyJkwPZtSxGDOF41yxeNldGaU7QKvcu4jLfhGkqTURkAnL7URnmTDqEdd_zlR4eIFsLLzarxYzqaJOGN3gX1_w1NzMcrzzrek-e6S9Wj65jT2iC0nqy91npMZ_5vSonz2olCmYaEeJir0agTsb6B-PAQ8a7oE5OoHCEFBYCWHchP-1rVeW8moy1Tf-9t5NZjZ8JBwQQX6mayXJZSj8pPxAbSN3cxa_G4SlOze6f0SeuDZ4FALd9mnMcCZBZRBrTdLnLbThjYluATSZRw4k0LdScj_0G00">Source</a></p>
<p>The new private API integration is powered by <a target="_blank" href="https://aws.amazon.com/vpc/lattice/">VPC Lattice</a> and <a target="_blank" href="https://aws.amazon.com/privatelink/">AWS PrivateLink</a>. VPC Lattice has two new features that make connecting Step Functions to a VPC possible: <em>Resource Gateways</em> and <em>Resource Configurations</em>.</p>
<h3 id="heading-resource-gateway"><strong>Resource Gateway</strong></h3>
<p>A <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/privatelink/resource-gateway.html">resource gateway</a> is a point of entry into the VPC where your resources reside. It can span one or more availability zones through the VPC subnets.</p>
<p>To access a private API Gateway from Step Functions, we need a Resource Gateway that lives in the same VPC and subnets as the VPC endpoint that is attached to the API.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736791560083/19e3bba4-c459-4a59-87d9-2ca87d48aa28.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-resource-configuration">Resource Configuration</h3>
<p>Once we have a Resource Gateway for our VPC, we can create and attach <a target="_blank" href="https://docs.aws.amazon.com/vpc-lattice/latest/ug/resource-configuration.html">Resource Configurations</a> to it. Resource Configurations represent resources that are accessible through the gateway, and how they are accessed.</p>
<p>In the case of API Gateway, a resource configuration consists of the VPC endpoint’s regional DNS name. We can also specify a port or range of ports that are accessible, which in our case is just <code>443</code> (for HTTPS).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736791485500/abe42ff7-f635-4182-bfe1-7f32afe14617.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-eventbridge-connection">EventBridge Connection</h3>
<p>To call HTTP endpoints, Step Functions uses <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-target-connection.html">EventBridge Connections</a>. The connection defines the authorization method, and the credentials to access the endpoint. Connections now have a new capability that allows integration with private APIs through a VPC Lattice Resource Configuration.</p>
<p>For API Gateway, we point the connection at the Resource Configuration that defines the API’s VPC endpoint.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736791696972/19cb1b19-3442-4ab4-8aa0-9d3112e6cdcc.png" alt class="image--center mx-auto" /></p>
<p>And that’s all we need. Everything else works the same as <a target="_blank" href="https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk">calling a public HTTP endpoint</a>.</p>
<h2 id="heading-definition-with-the-cdk">Definition With the CDK</h2>
<p>As explained earlier, we need a Resource Gateway that serves as the point of ingress into our VPC. We also create a security group that only allows egress to port 443, which is all we need for this use case:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> rgSecurityGroup = <span class="hljs-keyword">new</span> SecurityGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGatewaySG'</span>, {
  vpc: vpc,
  allowAllOutbound: <span class="hljs-literal">false</span>,
});

rgSecurityGroup.addEgressRule(
  Peer.ipv4(vpc.vpcCidrBlock),
  Port.tcp(<span class="hljs-number">443</span>),
  <span class="hljs-string">'Allow HTTPS traffic from Resource Gateway'</span>,
);

<span class="hljs-comment">// Resource Gateway</span>
<span class="hljs-keyword">const</span> resourceGateway = <span class="hljs-keyword">new</span> CfnResourceGateway(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGateway'</span>, {
  name: <span class="hljs-string">'private-api-access'</span>,
  ipAddressType: <span class="hljs-string">'IPV4'</span>,
  vpcIdentifier: vpc.vpcId, 
  subnetIds: vpc.isolatedSubnets.map(<span class="hljs-function">(<span class="hljs-params">subnet</span>) =&gt;</span> subnet.subnetId), <span class="hljs-comment">// all isolated subnets</span>
  securityGroupIds: [rgSecurityGroup.securityGroupId],
});
</code></pre>
<p>We also need a Resource Config that describes the API Gateway’s VPC endpoint.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Resource Configuration</span>
<span class="hljs-keyword">const</span> resourceConfig = <span class="hljs-keyword">new</span> CfnResourceConfiguration(
  <span class="hljs-built_in">this</span>,
  <span class="hljs-string">'ResourceConfig'</span>,
  {
    name: <span class="hljs-string">'sf-private-api'</span>,
    portRanges: [<span class="hljs-string">'443'</span>],
    resourceGatewayId: resourceGateway.ref,
    resourceConfigurationType: <span class="hljs-string">'SINGLE'</span>,
  },
);

resourceConfig.addPropertyOverride(
  <span class="hljs-string">'ResourceConfigurationDefinition.DnsResource'</span>,
  {
    DomainName: Fn.select(
      <span class="hljs-number">1</span>,
      Fn.split(<span class="hljs-string">':'</span>, Fn.select(<span class="hljs-number">0</span>, api.vpcEndpoint.vpcEndpointDnsEntries)),
    ),
    IpAddressType: <span class="hljs-string">'IPV4'</span>,
  },
);
</code></pre>
<p>At the time of writing, the <code>CfnResourceConfiguration</code> L1 construct does not support <code>DnsResource</code> for <code>ResourceConfigurationDefinition</code>, so I’m using an override. For <code>DomainName</code>, we need the regional public DNS name of the <a target="_blank" href="https://github.com/bboure/cdk-step-functions-private-api-gateway/blob/main/lib/constructs/PrivateApi.ts#L38-L45">VPC endpoint of the API Gateway</a>, which is the first item of the <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpcendpoint.html#aws-resource-ec2-vpcendpoint-return-values">DnsEntries</a> CloudFormation returned value. It’s prefixed with the hosted zone id, so I’m using intrinsic functions to extract the value.</p>
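<p>To make the intrinsic functions concrete, here is a sketch of the equivalent string manipulation they perform at deploy time, using a hypothetical <code>DnsEntries</code> value:</p>

```typescript
// Each entry returned by CloudFormation's DnsEntries attribute has the
// form "<hosted zone id>:<dns name>". The value below is hypothetical.
const dnsEntries: string[] = [
  'Z7HUB22UULQXV:vpce-0abc123-xyz.execute-api.us-east-1.vpce.amazonaws.com',
];

// Fn.select(0, ...) picks the first entry; Fn.split(':') + Fn.select(1)
// drop the hosted zone id prefix, keeping only the DNS name.
const domainName = dnsEntries[0].split(':')[1];
console.log(domainName); // vpce-0abc123-xyz.execute-api.us-east-1.vpce.amazonaws.com
```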
<p>We can now use the configuration in our Step Functions definition:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ApiConnection'</span>, {
  authorization: Authorization.apiKey(
    <span class="hljs-string">'x-api-key'</span>,
    SecretValue.unsafePlainText(<span class="hljs-string">'demo'</span>),
  ),
});

(connection.node.children[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> CfnConnection).addPropertyOverride(
  <span class="hljs-string">'InvocationConnectivityParameters'</span>,
  {
    ResourceParameters: {
      ResourceConfigurationArn: resourceConfig.attrArn,
    },
  },
);

<span class="hljs-keyword">const</span> http = <span class="hljs-keyword">new</span> HttpInvoke(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Http'</span>, {
  apiRoot: api.url, <span class="hljs-comment">// url of the API Gateway</span>
  apiEndpoint: TaskInput.fromText(<span class="hljs-string">`hello`</span>),
  method: TaskInput.fromText(<span class="hljs-string">'GET'</span>),
  connection: connection,
});
</code></pre>
<p>The <code>Connection</code> construct does not support <code>InvocationConnectivityParameters</code> yet, so I’m using an override here as well.</p>
<p>And here you have it! You can find this code in full on <a target="_blank" href="https://github.com/bboure/cdk-step-functions-private-api-gateway/blob/main/lib/step-functions-private-api-stack.ts">GitHub</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>With the new Amazon VPC Lattice and AWS PrivateLink integration, teams can now invoke private API Gateway endpoints directly from AWS Step Functions or Amazon EventBridge. This eliminates the need for a proxy Lambda function, which in turn reduces the amount of code required and decreases overhead. It's a significant step forward for teams looking to optimize and simplify their cloud infrastructure.</p>
]]></content:encoded></item><item><title><![CDATA[I Built A Serverless Ephemeral AWS Account Vending Machine]]></title><description><![CDATA[Last November 2024, I attended an AWS user group meetup in Barcelona. I found Joan García's sessions particularly interesting. He explained how they addressed some recurring challenges at Ocado, such as safely conducting proofs of concept or running ...]]></description><link>https://benoitboure.com/i-built-a-serverless-ephemeral-aws-account-vending-machine</link><guid isPermaLink="true">https://benoitboure.com/i-built-a-serverless-ephemeral-aws-account-vending-machine</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[aws-cdk]]></category><category><![CDATA[aws learning ]]></category><category><![CDATA[AWS Management]]></category><category><![CDATA[AWS Cost Optimization]]></category><category><![CDATA[AWS Account Management]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 07 Jan 2025 07:59:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735999703717/198deb0e-d15a-48ff-bd6e-00d2b99df11b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last November 2024, I attended an <a target="_blank" href="https://www.meetup.com/barcelona-amazon-web-services-meetup/events/304525815/">AWS user group meetup in Barcelona</a>. I found <a target="_blank" href="https://www.linkedin.com/in/jggtic/">Joan García</a>'s sessions particularly interesting. He explained how they addressed some recurring challenges at <a target="_blank" href="https://www.linkedin.com/company/ocado-technology/posts/?feedView=all">Ocado</a>, such as safely conducting proofs of concept or running hackathons without disrupting production environments while keeping costs under control.</p>
<p>Unfortunately, this session was not recorded. I won’t get into the details, but in short, they implemented a way for their teams to request ephemeral, self-destructing AWS accounts. Users request an account to run a PoC, a hackathon, a workshop, etc. After a set time, or when a specific budget is reached, all the resources in the account are automatically deleted, and the account is closed. In other words: an AWS account vending machine.</p>
<p>I thought it was a great idea! We all love to play with new AWS services, run quick PoCs, etc. but I often forget or am too lazy to clean up after myself, which clutters my AWS accounts with unnecessary resources. In some cases, it can even incur unnecessary costs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736019458647/b6b5a412-d375-49a5-ab86-12e4933c927a.png" alt class="image--center mx-auto" /></p>
<p>Although infrastructure as code helps with the tear-down process, I don’t always do it. Sometimes, I don’t even use IaC, especially if I just want to test or try something quickly. The idea of getting an ephemeral AWS account that I can mess with, knowing that everything in it will automatically get destroyed later, sounded very attractive. The bad news: Ocado’s solution is not open source... So I rebuilt it myself.</p>
<h2 id="heading-requirements">Requirements</h2>
<p>Here are the requirements I had in mind before starting this project:</p>
<ul>
<li><p><strong>Security</strong></p>
<ul>
<li><p>All sandbox accounts should stay secure within the same AWS Organization.</p>
</li>
<li><p>Users can only access the accounts they are supposed to; e.g., they can’t access other users’ accounts.</p>
</li>
<li><p>Users should access the vending machine and sandbox accounts using their SSO credentials.</p>
</li>
</ul>
</li>
<li><p><strong>Low Cost:</strong> The solution should be cheap to run (What’s the point of building this to save on costs if the solution itself ends up costing more than the savings?). For that reason, I wanted the solution to be 100% serverless.</p>
</li>
<li><p><strong>Simple</strong>:</p>
<ul>
<li><p>Users should be able to easily request new accounts through a simple Web App.</p>
</li>
<li><p>Users should access sandbox accounts from the AWS access portal, or the AWS CLI, using SSO, just like they do for any other long-lived account (e.g. dev, prod).</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-solution-overview">Solution Overview</h2>
<p>This solution uses IAM Identity Center. Users sign in using SSO to access AWS accounts under the same organization. They <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-applications.html">access a web application</a> where they can request a new sandbox account. When a user submits a request, an account is randomly picked from a pool of AWS accounts specially created for this purpose. It is assigned to the user with a pre-determined <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/userguide/permissionsetsconcept.html">Permission Set</a> in IAM Identity Center. The user can then sign in to the account using the AWS console, or the AWS CLI with SSO. When the sandbox expires, or when the user requests it, the account assignment is removed. The user loses access to the account and all the resources are deleted using <a target="_blank" href="https://github.com/ekristen/aws-nuke">aws-nuke</a> (<code>aws-nuke</code> is an open-source utility that deletes all the resources in an AWS account). Finally, the account is recycled and put back into the account pool for future assignments. Administrators can control who can access the web application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736015186656/8763dd4e-a755-4603-994f-d122c68faf2e.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736004642617/9dd20563-bd9a-43fa-932e-e10614c1c69e.png" alt="Request an AWS Sandbox Account" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736004724880/33691fbb-345b-490a-a601-13972c13531b.png" alt="Sandbox Account Ready" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736004764782/8455ae43-beed-40dc-9a10-20dd78b21ad0.png" alt="List of Sandbox accounts" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736015374201/969c5b3a-5358-478c-bf0e-c07e29ee29b7.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-how-it-works">How it works</h2>
<p>Here is an overview of the architecture.</p>
<p><img src="https://img.plantuml.biz/plantuml/png/dHPDJzimz9vVyPRBWaIR8IvJXH2b7GXf1KJ6E4mxk8wRMYHsP3ljkgd_VS_v3cbWur3vVN-_brveGvJ9ajZv4B8L5mocJy4zuh0s9jKJtrTaEuwuMMVBJ3D5fJ1Cc36LYK-sEYPBRTyHHuOUhQGQfJ4Hrg2_EVay_kI7N1ld0nSqpiBQk8_lJ2Q95ECqzts07_0aZVcAit189aK-9OPBSOD1HIe7BJdO2Vf_Ie5XwLKcg4NqWEgrabcgRMXJIcM6HJWi5p2QAMvsDo4M2bzC57qIGPbVaT00qtf118bOWgG76RDtM9ikQYe-J0sOQFnSomrJ8bU-Kn4H_7UUlduzeRVrsmY97mKVCZKdXYo9Plvy9qWYvvS3SZCSCuBJgBH_HT2u6IhFG3-_R33QIyN3Y0LqpS8i7gpEBJDRgPwvY6R5RCzPp37DdJ-BPHPUAJdSQICLLiRFHvMLMC3KXjDtj7Cc8ooSB1GTZH6bH944Ogo3sQKCKXdlBE8upWeP3Do0Y70fVq7PF-q2qQ1BuXy7uBM1yt1lRxDdPk5ZS352Yu55NSJT8zG_D2KUATxZddyugHLRjc5q3gNA1BuneY2KMsVlE8HYnS2rPoKFt0AEq-nNld1UKWSzhVqsYLIktQEtyyMP2B7D2qBNMCKoxUyTOZVxte9vMaja8hts10K7l22uEoerz_qikhuRlr0vxkPXPCJC6irQ1A2yQHv9kUrKojtoEShPX-RFqwUsGGPRH-69BJKtJM80pMnxj0QHtT1XZfTRyMLcUz_MBCRKeyLhTGe87h5_S2zbN4llyTVFfdDiFQ8rZqJJsscEr_NeKuem3csufoi8jLe2K4kq8dihBiLYXfUg2UoX8BGZUqagGYilLhLVnMw11RlkgnjOBPIsajKc5v8ePRS2HZ5R6ToZfbSLvwnc5Lr0UYDKw-bJfEjJ6E7g2RkFARTBclAQqastOYeUhQr-fqaLyyUQXxW5Fpk-orau6wDJBISGJ1TVRHeVeMYZtKPhe6qGnYZTtyw9bShmpYb4NkgKSLUbwx7FwrcmcLVmwFn5ddbaSuQJBdS8Txd_65TNgH_OONun7CsfX76MehlMuPpOajL-KB8VYhsjUPcl5VxhyX_OHgBDlNE1alWP-Gi0" alt /></p>
<p><a target="_blank" href="https://editor.plantuml.com/uml/dHPDJzimz9vVyPRBWaIR8IvJXH2b7GXf1KJ6E4mxk8wRMYHsP3ljkgd_VS_v3cbWur3vVN-_brveGvJ9ajZv4B8L5mocJy4zuh0s9jKJtrTaEuwuMMVBJ3D5fJ1Cc36LYK-sEYPBRTyHHuOUhQGQfJ4Hrg2_EVay_kI7N1ld0nSqpiBQk8_lJ2Q95ECqzts07_0aZVcAit189aK-9OPBSOD1HIe7BJdO2Vf_Ie5XwLKcg4NqWEgrabcgRMXJIcM6HJWi5p2QAMvsDo4M2bzC57qIGPbVaT00qtf118bOWgG76RDtM9ikQYe-J0sOQFnSomrJ8bU-Kn4H_7UUlduzeRVrsmY97mKVCZKdXYo9Plvy9qWYvvS3SZCSCuBJgBH_HT2u6IhFG3-_R33QIyN3Y0LqpS8i7gpEBJDRgPwvY6R5RCzPp37DdJ-BPHPUAJdSQICLLiRFHvMLMC3KXjDtj7Cc8ooSB1GTZH6bH944Ogo3sQKCKXdlBE8upWeP3Do0Y70fVq7PF-q2qQ1BuXy7uBM1yt1lRxDdPk5ZS352Yu55NSJT8zG_D2KUATxZddyugHLRjc5q3gNA1BuneY2KMsVlE8HYnS2rPoKFt0AEq-nNld1UKWSzhVqsYLIktQEtyyMP2B7D2qBNMCKoxUyTOZVxte9vMaja8hts10K7l22uEoerz_qikhuRlr0vxkPXPCJC6irQ1A2yQHv9kUrKojtoEShPX-RFqwUsGGPRH-69BJKtJM80pMnxj0QHtT1XZfTRyMLcUz_MBCRKeyLhTGe87h5_S2zbN4llyTVFfdDiFQ8rZqJJsscEr_NeKuem3csufoi8jLe2K4kq8dihBiLYXfUg2UoX8BGZUqagGYilLhLVnMw11RlkgnjOBPIsajKc5v8ePRS2HZ5R6ToZfbSLvwnc5Lr0UYDKw-bJfEjJ6E7g2RkFARTBclAQqastOYeUhQr-fqaLyyUQXxW5Fpk-orau6wDJBISGJ1TVRHeVeMYZtKPhe6qGnYZTtyw9bShmpYb4NkgKSLUbwx7FwrcmcLVmwFn5ddbaSuQJBdS8Txd_65TNgH_OONun7CsfX76MehlMuPpOajL-KB8VYhsjUPcl5VxhyX_OHgBDlNE1alWP-Gi0">Source</a></p>
<h3 id="heading-sandbox-accounts">Sandbox Accounts</h3>
<p>Sandbox accounts are organized in an Organizational Unit (OU). These are the accounts that are assigned to a user when they request one. They don’t hold any resources except for SSO roles and a special role, <code>AWSNuke</code>, which we’ll come back to later.</p>
<h3 id="heading-management-account">Management Account</h3>
<p>This is the management account for the AWS organization structure. Users live in this account (in IAM Identity Center). This account also holds a special cross-account role: <code>VendingMachine</code>. This role is assumed by the Vending Machine service to assign or revoke access to an account for a specific user.</p>
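<p>For illustration, the trust relationship and permissions of such a cross-account role could look like the following sketch (the account ID is a placeholder and the exact policy is my assumption, not taken from the project):</p>
<pre><code class="lang-typescript">// Hypothetical sketch of the VendingMachine cross-account role policies.
// The account id is a placeholder; the SSO actions are the ones needed
// for the CreateAccountAssignment / DeleteAccountAssignment calls
// described below.
const trustPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      // Trust the Vending Machine account so that its workflows can
      // assume this role in the management account.
      Principal: { AWS: "arn:aws:iam::111122223333:root" },
      Action: "sts:AssumeRole",
    },
  ],
};

const permissionsPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["sso:CreateAccountAssignment", "sso:DeleteAccountAssignment"],
      Resource: "*",
    },
  ],
};
</code></pre>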
<h3 id="heading-vending-machine-account">Vending Machine Account</h3>
<p>This is the Vending Machine service. The service is composed of the following components:</p>
<h4 id="heading-a-static-website">A static website</h4>
<p>This is a React SPA that users use to request a new sandbox. It is stored on S3 and served through an Amazon CloudFront distribution.</p>
<h4 id="heading-appsync-api">AppSync API</h4>
<p>An AppSync GraphQL API is the gateway between the users and the service. It triggers the <code>Assign Account</code> and <code>Release Account</code> Step Function workflows (see below).</p>
<h4 id="heading-amazon-cognito">Amazon Cognito</h4>
<p>A Cognito user pool controls user authentication to AWS AppSync. <a target="_blank" href="https://repost.aws/knowledge-center/cognito-user-pool-iam-integration">IAM Identity Center is used as an Identity provider using SAML</a>.</p>
<h4 id="heading-accounts-dynamodb-table">Accounts DynamoDB Table</h4>
<p>This DynamoDB table contains information about the sandbox accounts such as their id and status (e.g. <code>USED</code> or <code>FREE</code>). When in use, it also stores who is using the account (user id), when it expires, etc.</p>
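<p>To make this concrete, here is a hypothetical sketch of what an item in this table could look like (the attribute names are my assumption, not taken from the actual project):</p>
<pre><code class="lang-typescript">// Hypothetical shape of an item in the accounts table.
interface SandboxAccount {
  accountId: string; // AWS account id (partition key)
  status: "FREE" | "USED";
  userId?: string; // IAM Identity Center user id, present while USED
  expiresAt?: string; // ISO date of the scheduled release
}

// Example of an account currently assigned to a user for 14 days.
const assignedAccount: SandboxAccount = {
  accountId: "123456789012",
  status: "USED",
  userId: "9267abcd-1234",
  expiresAt: new Date(Date.now() + 14 * 24 * 3600 * 1000).toISOString(),
};
</code></pre>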
<h4 id="heading-eventbridge-scheduler">EventBridge Scheduler</h4>
<p>When a user requests an account, a schedule is created that destroys the account at its expiration date and returns it to the account pool.</p>
<h4 id="heading-assign-account-step-functions-workflow">Assign Account Step Functions Workflow</h4>
<p>When a user requests a new account, the AppSync API starts this Step Functions workflow, which orchestrates the assignment of an account to the user.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736003288838/770adbcf-b439-4213-80d6-3f01de05864a.png" alt class="image--center mx-auto" /></p>
<p>First, it tries to find an available account, i.e. one that is not already used by another user (<code>status = FREE</code>), and immediately locks it in DynamoDB (<code>status = USED</code>) so that no other user can have the same account assigned to them.</p>
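<p>Such a lock can be implemented with a DynamoDB conditional write. Here is a hedged sketch of the update parameters (the table and attribute names are assumptions, not taken from the project):</p>
<pre><code class="lang-typescript">// Sketch of a conditional UpdateItem input that locks a FREE account.
// The condition makes the write fail if another execution already
// flipped the status, so two users can never get the same account.
const lockAccountInput = {
  TableName: "accounts",
  Key: { accountId: { S: "123456789012" } },
  UpdateExpression: "SET #status = :used, userId = :userId",
  ConditionExpression: "#status = :free",
  ExpressionAttributeNames: { "#status": "status" },
  ExpressionAttributeValues: {
    ":used": { S: "USED" },
    ":free": { S: "FREE" },
    ":userId": { S: "9267abcd-1234" },
  },
};
</code></pre>
<p>With the AWS SDK, such an object would be passed to an <code>UpdateItemCommand</code>; a <code>ConditionalCheckFailedException</code> would tell the workflow to try another account.</p>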
<p>After that, the workflow invokes the <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/APIReference/API_CreateAccountAssignment.html">CreateAccountAssignment</a> command to assign the account to the requesting user. It does so by assuming the <code>VendingMachine</code> role in the management account.</p>
<p>Finally, we schedule the execution of the <code>Release Account</code> workflow at the expiration date of the account. By default, it’s 14 days after the request time, but the user can request a shorter or longer period. We also put an <code>accountAssigned</code> event into an Event Bridge bus to let other services know about it. This event is used by the <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/target-appsync.html">EventBridge AppSync integration</a> to notify the user that the account is ready, in real-time.</p>
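<p>As a sketch, the one-time schedule could be created with an input like the following (the names and ARNs are placeholders; one-time <code>at()</code> expressions and <code>ActionAfterCompletion</code> are part of the EventBridge Scheduler API):</p>
<pre><code class="lang-typescript">// Sketch of a one-time EventBridge Scheduler schedule that triggers
// the Release Account workflow at the expiration date.
const expiresAt = new Date(Date.now() + 14 * 24 * 3600 * 1000);

const createScheduleInput = {
  Name: "release-account-123456789012",
  // One-time schedules use an at() expression: at(yyyy-mm-ddThh:mm:ss)
  ScheduleExpression: "at(" + expiresAt.toISOString().slice(0, 19) + ")",
  FlexibleTimeWindow: { Mode: "OFF" },
  // Let the scheduler delete the schedule itself once it has fired.
  ActionAfterCompletion: "DELETE",
  Target: {
    Arn: "arn:aws:states:eu-west-1:111122223333:stateMachine:ReleaseAccount",
    RoleArn: "arn:aws:iam::111122223333:role/SchedulerRole",
    Input: JSON.stringify({ accountId: "123456789012" }),
  },
};
</code></pre>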
<h4 id="heading-release-account-step-functions-workflow">Release Account Step Functions Workflow</h4>
<p>This other Step Functions workflow orchestrates the destruction of an account, either when it expires (triggered by the EventBridge scheduler) or when the user releases it through the web app because it’s no longer needed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736003850594/4fdd1708-8dd8-4803-9703-39204618e803.png" alt class="image--center mx-auto" /></p>
<p>First, we check whether the user initiated the destruction. When that is the case, we delete the schedule that we created when setting up the account. When the workflow is triggered by the EventBridge scheduler instead, the schedule is deleted automatically (we set <code>ActionAfterCompletion: 'DELETE'</code>).</p>
<p>Then, we remove access to the account from the user using the <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/APIReference/API_DeleteAccountAssignment.html">DeleteAccountAssignment</a> command. We also remove the user association in the <code>accounts</code> DynamoDB table.</p>
<p>Then, we invoke <code>aws-nuke</code>. Since it’s a long-running process, it is executed within an ECS task running on Fargate. <code>aws-nuke</code> assumes the <code>AWSNuke</code> role within the targeted account and uses that role to delete all resources.</p>
<p>Finally, we release the account in DynamoDB so that it can be re-assigned to another user later.</p>
<h2 id="heading-cost-estimations">Cost Estimations</h2>
<p>Assuming a moderate usage of this service, the operational cost should be close to zero.</p>
<p>With an average of 10 account requests per month, here are the cost estimations of the main components:</p>
<p><strong>AppSync API</strong></p>
<ul>
<li><p>API requests: &lt; 1000 requests(1) at $4/million requests = ~$0.004</p>
</li>
<li><p>~10 real-time updates at $2/million = ~$0.00002</p>
</li>
<li><p>&lt; 10 connection minutes at $0.08 per million minutes = &lt;$0.0000008</p>
</li>
</ul>
<p>(1) Assuming that users consult the web app more often to check expiration times, etc.</p>
<p><strong>DynamoDB</strong></p>
<p>Usage should stay under the free tier. Outside the free tier:</p>
<ul>
<li><p>~50 write requests = $0.00003125</p>
</li>
<li><p>~1000 read requests = $0.000125</p>
</li>
<li><p>&lt;1MB storage = $0.00025</p>
</li>
</ul>
<p><strong>Cognito</strong></p>
<ul>
<li><p>The first 50 MAUs (Monthly Active Users) are free (with a SAML identity provider).</p>
</li>
<li><p>$0.015 per MAU after that.</p>
</li>
</ul>
<p><strong>Step Functions</strong></p>
<p>~150 state transitions, well within the always-free tier of 4,000 state transitions per month.</p>
<p>Outside the free tier, at $0.025 / 1,000 transitions: $0.025 × 150 / 1000 ~= $0.00375</p>
<p><strong>EventBridge (Event bus and scheduler)</strong></p>
<ul>
<li><p>~10 events per month: ~$0.00001 ($1 / million events)</p>
</li>
<li><p>~10 schedules per month: Well under the 14M schedules free tier; or ~$0.00001 ($1 / million schedule triggers)</p>
</li>
</ul>
<p><strong>Static Website</strong></p>
<ul>
<li><p>S3: ~1.4 MB stored at $0.023/GB: $0.023 × 0.0014 GB = $0.0000322</p>
</li>
<li><p>CloudFront: it has an always-free tier of 1 TB of data transfer and 10M HTTPS requests, which is probably more than enough for this use case.</p>
</li>
</ul>
<p><strong>ECS Fargate</strong></p>
<p>This is probably the highest cost. The solution uses the lowest configuration possible (0.25 vCPU and 512 MB of memory), assuming an average execution time of ~15 minutes per run.</p>
<ul>
<li><p>CPU: 0.25 vCPU × 0.25 hours × 10 × $0.03238 = $0.0202375 (ARM architecture)</p>
</li>
<li><p>Memory: 0.5 GB × 0.25 hours × 10 × $0.00356 = $0.00445</p>
</li>
</ul>
<p><strong>Total cost</strong></p>
<p>Even excluding the free tier, the total cost of operation should not go over a few cents per month. Of course, your mileage may vary, depending on the size of your organization, the number of users, and how many times the service is used.</p>
<h2 id="heading-its-open-source">It’s Open Source!</h2>
<p>This project is open source; you can find it on <a target="_blank" href="https://github.com/bboure/aws-account-vending-machine-demo">GitHub</a>. Feel free to fork it and deploy it into your account, share it, and send me your feedback!</p>
<h2 id="heading-whats-next">What’s Next?</h2>
<p>This solution is basic. I built it both as a PoC and for the challenge. It’s also good enough for my personal usage and as an MVP. However, I can see a few improvements that could be added:</p>
<ul>
<li><p><strong>User notifications</strong>: Before an account is destroyed, users might want to get warned a few days before it happens.</p>
</li>
<li><p><strong>Budgets</strong>: I didn't include budget limits in this MVP, but automatically removing costly resources before they increase your AWS bill would be a useful feature.</p>
</li>
<li><p><strong>Time/Budget Extension:</strong> Need more time/budget to work on your project? Request an extension.</p>
</li>
<li><p><strong>Multiple Permission Sets</strong>: The current implementation grants the <code>AdministratorAccess</code> permission set by default. One might want to support more than one permission set, depending on the use case the account is created for, or who requests it.</p>
</li>
<li><p><strong>Team accounts</strong>: When working with teams, you might want to assign an account to a team, instead of single users, so that several people can work on it at the same time.</p>
</li>
<li><p><strong>Manager approval</strong>: Companies might want to require approval by a manager before an account is granted to a user or team. Managers could also be able to control expiration times, budgets, permission sets, etc.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Trying new AWS services, building proofs of concept, or attending workshops can easily clutter your AWS accounts and incur costs if you forget to clean up after yourself. Ephemeral, self-destructing AWS accounts can help eliminate or mitigate those problems. With a self-service vending machine, users can request sandbox accounts to play with and focus on their projects while the cleanup process is automated.</p>
]]></content:encoded></item><item><title><![CDATA[Unmarshalling DynamoDB Items from Step Functions]]></title><description><![CDATA[AWS Step Functions introduced two new features: variables and support for JSONata. JSONata is a lightweight query and transformation language for JSON data. Whoever has worked with Step Functions knows that this is a real game-changer! When I heard t...]]></description><link>https://benoitboure.com/unmarshalling-dynamodb-items-from-step-functions</link><guid isPermaLink="true">https://benoitboure.com/unmarshalling-dynamodb-items-from-step-functions</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Step Functions]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[serverless]]></category><category><![CDATA[CDK]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 16 Dec 2024 08:00:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734187737566/2216cab7-1b92-46e2-b31f-2a2ffaec2063.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS Step Functions introduced two new features: variables and support for <a target="_blank" href="https://docs.jsonata.org/overview.html">JSONata</a>. JSONata is a lightweight query and transformation language for JSON data. Whoever has worked with Step Functions knows that this is a real game-changer! When I heard the news, I immediately saw the potential for many things that would previously require a Lambda function but would now be achievable “natively” in Step Functions.</p>
<p>One common task many Step Functions workflows perform is fetching items from DynamoDB. However, the <code>getItem</code> or <code>query</code> tasks return the raw “marshalled” data from DynamoDB, i.e. items in DynamoDB’s typed attribute format.</p>
<p>Example:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"id"</span>: {
    <span class="hljs-attr">"S"</span>: <span class="hljs-string">"1"</span>
  },
  <span class="hljs-attr">"name"</span>: {
    <span class="hljs-attr">"S"</span>: <span class="hljs-string">"John"</span>
  },
  <span class="hljs-attr">"address"</span>: {
    <span class="hljs-attr">"M"</span>: {
      <span class="hljs-attr">"street"</span>: {
        <span class="hljs-attr">"S"</span>: <span class="hljs-string">"123, 5th Avenue"</span>
      },
      <span class="hljs-attr">"postCode"</span>: {
        <span class="hljs-attr">"S"</span>: <span class="hljs-string">"5555"</span>
      },
      <span class="hljs-attr">"city"</span>: {
        <span class="hljs-attr">"S"</span>: <span class="hljs-string">"New York"</span>
      }
    }
  },
  <span class="hljs-attr">"age"</span>: {
    <span class="hljs-attr">"N"</span>: <span class="hljs-string">"32"</span>
  }
}
</code></pre>
<p>This format is not very practical to work with because you need to know and include the field types in the paths (e.g., <code>$.item.name.S</code>). Additionally, certain values, such as numbers, are encoded as strings (like <code>age</code> above), which makes it harder to perform simple operations like math and comparisons.</p>
<p>With the arrival of JSONata, I started wondering if we could use the <a target="_blank" href="https://docs.jsonata.org/string-functions">Function Library</a> to "visit" and decode DynamoDB objects (a.k.a unmarshall them).</p>
<h2 id="heading-step-one-a-simple-proof-of-concept">Step One: A Simple Proof of Concept</h2>
<p>Before getting into the nitty-gritty of Step Functions, I first wanted to build a quick proof of concept to see if JSONata would give us that possibility. Luckily, JSONata has a practical <a target="_blank" href="https://try.jsonata.org/">playground</a> to try it out. After some time, I came up with this simple solution:</p>
<pre><code class="lang-javascript">(
  $unmarshall := <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$object</span>) </span>{(
      $type($object) = <span class="hljs-string">'array'</span>
        ? [$map($object, $unmarshall)]
        : $merge($each($object, <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$val, $key</span>) </span>{
            { <span class="hljs-attr">$key</span>: $convertValue($val) }
        })
      );
  )};

  $convertValue := <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$object</span>) </span>{(
    $type := $keys($object)[<span class="hljs-number">0</span>];
    $value := $lookup($object, $type);

    $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'S'</span>, <span class="hljs-string">'SS'</span>, <span class="hljs-string">'Ss'</span>, <span class="hljs-string">'B'</span>, <span class="hljs-string">'BS'</span>, <span class="hljs-string">'Bs'</span>] ?  $value
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'N'</span>] ? $number($value)
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'M'</span>] ? $unmarshall($value)
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'BOOL'</span>, <span class="hljs-string">'Bool'</span>] ? $value = <span class="hljs-string">'true'</span> or $value = <span class="hljs-literal">true</span>
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'L'</span>] ? [$map($value, $convertValue)]
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'NS'</span>, <span class="hljs-string">'Ns'</span>] ? [$value.$number()]
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'NULL'</span>, <span class="hljs-string">'Null'</span>, <span class="hljs-string">'Nul'</span>] ? <span class="hljs-literal">null</span>
        : $error(<span class="hljs-string">'Unsupported type: '</span> &amp; $type);
  )};

  $unmarshall($);
)
</code></pre>
<p><strong>What’s going on in there?</strong></p>
<p><code>$unmarshall</code> is a function that takes an object or an array as input. It visits the value and, for each attribute, converts the nested type objects into native values with <code>$convertValue</code>. It does so recursively for nested arrays and maps.</p>
<p>The final result is very similar to the <a target="_blank" href="https://github.com/aws/aws-sdk-js-v3/blob/main/packages/util-dynamodb/src/convertToNative.ts">JavaScript version</a>.</p>
<p><a target="_blank" href="https://try.jsonata.org/yG-465u5K">Try it yourself!</a></p>
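<p>For reference, the same logic can also be sketched in plain TypeScript. This is a simplified version that handles fewer types than the SDK’s <code>util-dynamodb</code> helpers and is only meant to mirror the JSONata expression above:</p>
<pre><code class="lang-typescript">// Minimal TypeScript sketch mirroring the JSONata unmarshaller above.
// It only covers the most common DynamoDB attribute types.
type AttributeValue = { [type: string]: any };

function convertValue(attr: AttributeValue): any {
  const type = Object.keys(attr)[0];
  const value = attr[type];
  switch (type) {
    case "S":
    case "SS":
    case "B":
    case "BS":
      return value;
    case "N":
      return Number(value);
    case "NS":
      return value.map(Number);
    case "BOOL":
      return value === true || value === "true";
    case "NULL":
      return null;
    case "L":
      return value.map(convertValue);
    case "M":
      return unmarshall(value);
    default:
      throw new Error("Unsupported type: " + type);
  }
}

function unmarshall(item: { [key: string]: AttributeValue }): { [key: string]: any } {
  const result: { [key: string]: any } = {};
  for (const key of Object.keys(item)) {
    result[key] = convertValue(item[key]);
  }
  return result;
}

const unmarshalled = unmarshall({
  id: { S: "1" },
  age: { N: "32" },
  address: { M: { city: { S: "New York" } } },
});
// unmarshalled: { id: "1", age: 32, address: { city: "New York" } }
</code></pre>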
<h2 id="heading-step-two-use-it-with-step-functions">Step Two: Use it With Step Functions</h2>
<p>After proving it’s doable, the second step was to make it work within Step Functions.</p>
<p>My first attempt was to use the new <code>Assign</code> property and store the <code>$unmarshall</code> and <code>$convertValue</code> functions into variables of the same name. Then I tried to call them from the <code>Output</code> property.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Assign"</span>: {
    <span class="hljs-attr">"unmarshall"</span>: <span class="hljs-string">"{% function ($object) { ... } %}"</span>,
    <span class="hljs-attr">"convertValue"</span>: <span class="hljs-string">"{% function ($object) { ... } %}"</span>
  },
  <span class="hljs-attr">"Output"</span>: {
    <span class="hljs-attr">"result"</span>: <span class="hljs-string">"{% $unmarshall($states.input.dynamoDbItem) %}"</span>
  }
}
</code></pre>
<p>But this did not work. For two reasons:</p>
<ol>
<li>As mentioned in <a target="_blank" href="https://arc.net/l/quote/lvbezixj">this article</a> and <a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/transforming-data.html#querylanguage-field">the doc</a>, you can’t use the variables in the same state you assigned them.</li>
</ol>
<pre><code class="lang-json">{
    <span class="hljs-attr">"error"</span>: <span class="hljs-string">"States.QueryEvaluationError"</span>,
    <span class="hljs-attr">"cause"</span>: <span class="hljs-string">"The JSONata expression '$unmarshall($states.input.dynamoDbItem)' specified for the field 'Output/result' threw an error during evaluation. T1006: Attempted to invoke a non-function"</span>
}
</code></pre>
<p>This is because the <code>Assign</code> and <code>Output</code> steps are evaluated in parallel.</p>
<ol start="2">
<li>Assigning functions in variables is not supported.</li>
</ol>
<p>This is not mentioned anywhere in the doc, but you can’t assign a function to a variable (it must be a “real” value). I learned this when I tried to move the <code>Assign</code> part into a previous state to work around the first limitation.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"error"</span>: <span class="hljs-string">"States.QueryEvaluationError"</span>,
    <span class="hljs-attr">"cause"</span>: <span class="hljs-string">"The JSONata expression 'function ($object) { ... }' specified for the field 'Assign/unmarshall' returned an unsupported result type."</span>
}
</code></pre>
<p>After some thought and research, I figured that nothing prevents me from putting everything into a single expression. This expression can define the functions <strong>and</strong> return the final result (just like in the JSONata playground).</p>
<p>And because that expression evaluates to a value, the result would end up in that variable, ready to be used later.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Assign"</span>: {
    <span class="hljs-attr">"unmarshalledItem"</span>: <span class="hljs-string">"{% (
  $unmarshall := function ($object) {(
    $type($object) = 'array' ?
      [$map($object, $unmarshall)]
      : $merge($each($object, function ($val, $key) {
            { $key: $convertValue($val) }
        })
    );
  )};

  $convertValue := function ($object) {(
    $type := $keys($object)[0];
    $value := $lookup($object, $type);

    $type in ['S', 'SS', 'Ss', 'B', 'BS', 'Bs'] ?  $value
      : $type in ['N'] ? $number($value)
      : $type in ['M'] ? $unmarshall($value)
      : $type in ['BOOL', 'Bool'] ? $value = 'true' or $value = true
      : $type in ['L'] ? [$map($value, $convertValue)]
      : $type in ['NS', 'Ns'] ? [$value.$number()]
      : $type in ['NULL', 'Null', 'Nul'] ? null
      : $error('Unsupported type: ' &amp; $type);
  )};

  $unmarshall($states.input.dynamoDbItem);
) %}"</span>
  }
}
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🗒</div>
<div data-node-type="callout-text">Note: I kept the new lines inside <code>Assign</code> for readability, but for it to be a valid JSON/ASL, it must all go into a single line when deployed to Step Functions.</div>
</div>

<p>Testing it out:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734188827533/9fb02611-c03a-436b-bc58-d6c202e88470.png" alt class="image--center mx-auto" /></p>
<p>Now I can use the <code>$unmarshalledItem</code> variable, which contains the result, anywhere in a later state.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Output"</span>: {
    <span class="hljs-attr">"unmarshalledItem"</span>: <span class="hljs-string">"{% $unmarshalledItem %}"</span>
  }
}
</code></pre>
<p>Alternatively, I could also return the result directly in the <code>Output</code>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Output"</span>: {
    <span class="hljs-attr">"unmarshalledItem"</span>: <span class="hljs-string">"{% (
  $unmarshall := function ($object) {(
    $type($object) = 'array' ?
      [$map($object, $unmarshall)]
      : $merge($each($object, function ($val, $key) {
            { $key: $convertValue($val) }
        })
    );
  )};

  $convertValue := function ($object) {(
    $type := $keys($object)[0];
    $value := $lookup($object, $type);

    $type in ['S', 'SS', 'Ss', 'B', 'BS', 'Bs'] ?  $value
      : $type in ['N'] ? $number($value)
      : $type in ['M'] ? $unmarshall($value)
      : $type in ['BOOL', 'Bool'] ? $value = 'true' or $value = true
      : $type in ['L'] ? [$map($value, $convertValue)]
      : $type in ['NS', 'Ns'] ? [$value.$number()]
      : $type in ['NULL', 'Null', 'Nul'] ? null
      : $error('Unsupported type: ' &amp; $type);
  )};

  $unmarshall($states.input.dynamoDbItem);
) %}"</span>
  }
}
</code></pre>
<h2 id="heading-step-3-create-a-cdk-construct">Step 3: Create a CDK Construct</h2>
<p>After having a working proof of concept, I wanted to put it all into a simple, reusable CDK construct:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { CustomState } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-stepfunctions"</span>;
<span class="hljs-keyword">import</span> { Construct } <span class="hljs-keyword">from</span> <span class="hljs-string">"constructs"</span>;

<span class="hljs-keyword">interface</span> DynamoUnmarshallProps {
  path: <span class="hljs-built_in">string</span>;
  variableName: <span class="hljs-built_in">string</span>;
}

<span class="hljs-keyword">const</span> generateUnmarshall = <span class="hljs-function">(<span class="hljs-params">path: <span class="hljs-built_in">string</span></span>) =&gt;</span> <span class="hljs-string">`{% (
  $unmarshall := function ($object) {(
    $type($object) = 'array' ?
      [$map($object, $unmarshall)]
      : $merge($each($object, function ($val, $key) {
            { $key: $convertValue($val) }
        })
    );
  )};

  $convertValue := function ($object) {(
    $type := $keys($object)[0];
    $value := $lookup($object, $type);

    $type in ['S', 'SS', 'Ss', 'B', 'BS', 'Bs'] ?  $value
      : $type in ['N'] ? $number($value)
      : $type in ['M'] ? $unmarshall($value)
      : $type in ['BOOL', 'Bool'] ? $value = 'true' or $value = true
      : $type in ['L'] ? [$map($value, $convertValue)]
      : $type in ['NS', 'Ns'] ? [$value.$number()]
      : $type in ['NULL', 'Null', 'Nul'] ? null
      : $error('Unsupported type: ' &amp; $type);
  )};

  $unmarshall(<span class="hljs-subst">${path}</span>);
) %}`</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> DynamoUnmarshall <span class="hljs-keyword">extends</span> CustomState {
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">scope: Construct, id: <span class="hljs-built_in">string</span>, props: DynamoUnmarshallProps</span>) {
    <span class="hljs-keyword">const</span> { path, variableName } = props;

    <span class="hljs-built_in">super</span>(scope, id, {
      stateJson: {
        Type: <span class="hljs-string">"Pass"</span>,
        QueryLanguage: <span class="hljs-string">"JSONata"</span>,
        Assign: {
          [variableName]: generateUnmarshall(path),
        },
      },
    });
  }
}
</code></pre>
<p>The construct takes two input parameters:</p>
<ul>
<li><p><code>path</code>: The JSONata path of the raw DynamoDB item to unmarshall</p>
</li>
<li><p><code>variableName</code>: The name of the variable in which to store the result</p>
</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> unmarshall = <span class="hljs-keyword">new</span> DynamoUnmarshall(<span class="hljs-built_in">this</span>, <span class="hljs-string">"Unmarshall"</span>, {
  path: <span class="hljs-string">"$states.input.dynamoDbItem"</span>,
  variableName: <span class="hljs-string">"unmarshalledItem"</span>,
});
</code></pre>
<p>You can find a <a target="_blank" href="https://github.com/bboure/stepfunction-unmarshall-dynamodb/tree/main">fully working example on GitHub</a>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text">Please note that this was for experimental and learning purposes only. I do not guarantee that this code will work in all cases. I did not test this thoroughly and it should <strong>not</strong> be considered production-ready.</div>
</div>

<h2 id="heading-conclusion">Conclusion</h2>
<p>Unmarshalling DynamoDB items within Step Functions lets developers access their data more easily. Previously, developers needed to remember to include field types in their paths or use a Lambda function. Embedding the logic in a reusable CDK construct hides that complexity and simplifies the process.</p>
]]></content:encoded></item><item><title><![CDATA[Calling External Endpoints With Step Functions and the CDK]]></title><description><![CDATA[At re:Invent 2023, AWS announced a new feature for Step Functions that allows you to call third-party HTTPS API endpoints directly from your workflow without the need to write a Lambda function. It's a simple way to allow you to securely call externa...]]></description><link>https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk</link><guid isPermaLink="true">https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[CDK]]></category><category><![CDATA[aws-cdk]]></category><category><![CDATA[stepfunction]]></category><category><![CDATA[state-machines]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Wed, 17 Jan 2024 20:21:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705521462494/3d9206b6-871e-43e9-80f6-38d033b3c71d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At re:Invent 2023, AWS announced a new feature for <a target="_blank" href="https://aws.amazon.com/step-functions">Step Functions</a> that allows you to call third-party HTTPS API endpoints directly from your workflow without the need to write a Lambda function. It's a simple way to allow you to securely call external providers such as Stripe, Github, etc.</p>
<p>AWS Step Functions is a serverless orchestration service that allows developers to easily create orchestrated processes (state machines) without having to manage servers. It integrates with over 200 services. With Step Functions, you only pay for the number of <a target="_blank" href="https://aws.amazon.com/step-functions/pricing/">state transitions</a> that your state machines execute.</p>
<p>In this article, I will explain this new feature, and illustrate it with a practical example using the CDK (Cloud Development Kit).</p>
<h2 id="heading-how-does-it-work">How does it work?</h2>
<p>The HTTP endpoint Task state allows you to send an HTTPS request to the endpoint of your choice. It can be a <code>GET</code>, <code>POST</code>, <code>PUT</code>, <code>DELETE</code>, <code>PATCH</code>, <code>OPTIONS</code>, or <code>HEAD</code>, and you can also pass a request body.</p>
<p>You will also need to specify a connection ARN for authentication. Step Functions HTTP endpoints use EventBridge connections, the same as <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html#eb-api-destination-connection">EventBridge API destinations</a>. This keeps your credentials secure, preventing them from being hard-coded in the ASL (Amazon States Language) definition.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705515299911/824a8665-6978-41a8-ae5e-206b62b2043c.png" alt class="image--center mx-auto" /></p>
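<p>In raw ASL, such a task state might look like the following sketch (the endpoint, connection ARN, and request body fields are placeholders, not a real API):</p>
<pre><code class="lang-json">{
  "CallExternalApi": {
    "Type": "Task",
    "Resource": "arn:aws:states:::http:invoke",
    "Parameters": {
      "ApiEndpoint": "https://api.example.com/v1/orders",
      "Method": "POST",
      "Authentication": {
        "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/my-connection/abc123"
      },
      "RequestBody": {
        "orderId.$": "$.orderId"
      }
    },
    "End": true
  }
}
</code></pre>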
<h2 id="heading-a-practical-example">A Practical Example</h2>
<p>Let's take a practical example for this new Task state. Imagine that we are selling licenses for an app (like <a target="_blank" href="https://graphbolt.dev/">GraphBolt</a>). We accept payments on our website. Once a payment has been confirmed, we want to generate a license and send it to the user via email.</p>
<p>We are using the following services:</p>
<p><strong>Paddle</strong></p>
<p><a target="_blank" href="https://www.paddle.com/">Paddle</a> is a merchant of record that provides a payment gateway. It can send <a target="_blank" href="https://developer.paddle.com/webhooks/notification-destinations">notifications</a> to your backend via webhooks when a purchase is confirmed, and it exposes an API that lets us fetch information about payments, customers, and more.</p>
<p><strong>Keygen</strong></p>
<p><a target="_blank" href="https://keygen.sh/">Keygen.sh</a> is an open-source licensing API. It provides everything you need to generate, manage, and validate software licenses.</p>
<p>Our goal is to create a back-end system with an API that receives events from Paddle, validates them, and then starts a Step Functions state machine that processes the event to generate and send the license key to the user.</p>
<p>Here is the overview of what it looks like.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705519748141/712f0ceb-6cc3-4fc4-93c9-a4647cec5207.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Here, I will only focus on the Step Functions state machine, and more specifically the HTTP task definition. I won't go into detail about how Paddle and Keygen work.</div>
</div>

<p>Here is what we want our state machine to accomplish:</p>
<ol>
<li><p>Receive a <a target="_blank" href="https://developer.paddle.com/webhooks/transactions/transaction-completed"><code>transaction.complete</code></a> Paddle event as input.</p>
</li>
<li><p>Generate a new License in Keygen through the API.</p>
</li>
<li><p>Fetch the customer information from Paddle (name, email, etc) using the <code>customer_id</code> included in the event.</p>
</li>
<li><p>Send the license key to the user via SES.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705516840838/9b74a819-90b6-4b5c-bcdb-22f714787fe8.png" alt class="image--center mx-auto" /></p>
<p>At the time of writing this article, the CDK does not (yet) have a dedicated Construct for the HTTP endpoint Task (watch <a target="_blank" href="https://github.com/aws/aws-cdk/issues/28278">this GitHub issue</a>). However, we can use the <code>CustomState</code> construct and define the task in plain ASL.</p>
<p>This is how I defined the <em>CreateLicense</em> task.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> keygenConnection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, <span class="hljs-string">'KeygenConnection'</span>, {
  authorization: Authorization.apiKey(
    <span class="hljs-string">'Authorization'</span>,
    SecretValue.secretsManager(<span class="hljs-string">'KeygenSecret'</span>),
  ),
});

<span class="hljs-keyword">const</span> keygenEndpoint =
  <span class="hljs-string">'https://api.keygen.sh/v1/accounts/2d4fdf58-9507-4e0b-a7e2-5520e1f1cbdb'</span>;

<span class="hljs-keyword">const</span> createLicense = <span class="hljs-keyword">new</span> CustomState(<span class="hljs-built_in">this</span>, <span class="hljs-string">'CreateLicense'</span>, {
  stateJson: {
    Type: <span class="hljs-string">'Task'</span>,
    Resource: <span class="hljs-string">'arn:aws:states:::http:invoke'</span>,
    Parameters: {
      ApiEndpoint: <span class="hljs-string">`<span class="hljs-subst">${keygenEndpoint}</span>/licenses`</span>,
      Method: <span class="hljs-string">'POST'</span>,
      Authentication: {
        ConnectionArn: keygenConnection.connectionArn,
      },
      RequestBody: {
        data: {
          <span class="hljs-keyword">type</span>: <span class="hljs-string">'licenses'</span>,
          attributes: {
            metadata: {
              <span class="hljs-string">'transactionId.$'</span>: <span class="hljs-string">'$.data.id'</span>,
              <span class="hljs-string">'customerId.$'</span>: <span class="hljs-string">'$.data.customer_id'</span>,
            },
          },
          relationships: {
            policy: {
              data: {
                <span class="hljs-keyword">type</span>: <span class="hljs-string">'policies'</span>,
                id: <span class="hljs-string">'8c2294b0-dbbe-4028-b561-6aa246d60951'</span>,
              },
            },
          },
        },
      },
    },
    ResultSelector: {
      <span class="hljs-string">'body.$'</span>: <span class="hljs-string">'States.StringToJson($.ResponseBody)'</span>,
    },
    OutputPath: <span class="hljs-string">'$.body'</span>,
  },
});
</code></pre>
<p>First, we create a <code>Connection</code> for our HTTP task. This is an <a target="_blank" href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_events.Connection.html">EventBridge Connection</a>. As I explained earlier, the role of the connection is to store the credentials securely and not leak them in the Step Functions definition. However, we also don't want to hard-code them in the CDK definition. To avoid that, I manually created a secret in Secrets Manager named <code>KeygenSecret</code> which contains the API key, and referenced it in the connection.</p>
<p>Then, I create a Task with the <code>Resource</code> type of <code>arn:aws:states:::http:invoke</code>, attach the connection to it, and define all the other attributes (method, body, etc).</p>
<p>This defines our HTTP task state, but to be able to execute it, Step Functions also needs the necessary permissions.</p>
<p>We need three things:</p>
<ul>
<li><p>Permission to execute HTTP requests</p>
</li>
<li><p>Permission to use the EventBridge connection</p>
</li>
<li><p>Permission to fetch the connection's secret</p>
</li>
</ul>
<p>For that, I manually attach the following IAM policy to the state machine's role.</p>
<pre><code class="lang-typescript">sm.role.attachInlinePolicy(
  <span class="hljs-keyword">new</span> Policy(<span class="hljs-built_in">this</span>, <span class="hljs-string">'HttpInvoke'</span>, {
    statements: [
      <span class="hljs-keyword">new</span> PolicyStatement({
        actions: [<span class="hljs-string">'states:InvokeHTTPEndpoint'</span>],
        resources: [sm.stateMachineArn],
        conditions: {
          StringEquals: {
            <span class="hljs-string">'states:HTTPMethod'</span>: <span class="hljs-string">'POST'</span>,
          },
          StringLike: {
          <span class="hljs-string">'states:HTTPEndpoint'</span>: <span class="hljs-string">`<span class="hljs-subst">${keygenEndpoint}</span>/*`</span>,
          },
        },
      }),
      <span class="hljs-keyword">new</span> PolicyStatement({
        actions: [<span class="hljs-string">'events:RetrieveConnectionCredentials'</span>],
        resources: [
          keygenConnection.connectionArn,
        ],
      }),
      <span class="hljs-keyword">new</span> PolicyStatement({
        actions: [
          <span class="hljs-string">'secretsmanager:GetSecretValue'</span>,
          <span class="hljs-string">'secretsmanager:DescribeSecret'</span>,
        ],
        resources: [
          <span class="hljs-string">'arn:aws:secretsmanager:*:*:secret:events!connection/*'</span>,
        ],
      }),
    ],
  }),
);
</code></pre>
<p>Finally, I did the same thing for the Paddle HTTP task. I also added the SES <code>sendEmail</code> task and put everything together.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705518882773/d35d5341-c595-4f58-9962-9f2db546e4c8.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🧑‍💻</div>
<div data-node-type="callout-text">You can find the full code on <a target="_blank" href="https://github.com/bboure/step-functions-http-task-cdk">GitHub</a>.</div>
</div>

<h1 id="heading-conclusion">Conclusion</h1>
<p>The support for direct calls to HTTP endpoints opens a lot of possibilities for integrating with third parties. Previously, achieving the same result required a Lambda function. This is one more step forward towards zero-code Step Functions!</p>
]]></content:encoded></item><item><title><![CDATA[Securely Access Your AWS Resources From Github Actions]]></title><description><![CDATA[Security is a very important topic for all cloud engineers. Making sure that your infrastructure and data are kept out of reach of malicious people is one of the most serious things to get right. In AWS, we are used to dealing with IAM roles and perm...]]></description><link>https://benoitboure.com/securely-access-your-aws-resources-from-github-actions</link><guid isPermaLink="true">https://benoitboure.com/securely-access-your-aws-resources-from-github-actions</guid><category><![CDATA[AWS]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[github-actions]]></category><category><![CDATA[Security]]></category><category><![CDATA[ci-cd]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 27 Dec 2021 15:13:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1640545141412/JHfiV9GBd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Security is a very important topic for all cloud engineers. Making sure that your infrastructure and data are kept out of reach of malicious people is one of the most serious things to get right. In AWS, we are used to dealing with IAM roles and permissions that make our resources accessible to users or to other resources. However, sometimes you need to grant access from outside your organization.</p>
<p>One example is when you want to deploy your infrastructure from a CI/CD pipeline, like GitHub Actions. How do you allow your workflow to gain access to your AWS account?</p>
<p>One approach is to create a dedicated IAM user, store its credentials in the <a target="_blank" href="https://docs.github.com/en/actions/security-guides/encrypted-secrets">GitHub secrets store</a>, and allow the workflow to use them. Easy enough! Secrets are encrypted by GitHub, so it is secure, right?</p>
<p>Not really... The problem is that those credentials are meant to be long-lived. If anyone gets hold of them for whatever reason (e.g. a leak in workflow logs, or someone gaining access to a GitHub Actions runner), they will be able to access all your resources (at least those that the credentials are allowed to control). Sure, you could rotate them from time to time, but you'd have to do that manually. This is probably not something you want to spend time doing and, let's face it, you probably won't!</p>
<p>Luckily, there is a better solution. If you are using GitHub Actions, you can allow GitHub to obtain temporary, short-lived credentials that it can use during the execution of the workflow. After that, the credentials expire and no one will ever be able to use them again.</p>
<p>In this post, I will guide you through the steps to set this up. Don't worry, it's actually easier than you think!</p>
<p>Here is a diagram representing what we are going to accomplish.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640544136378/7f72xtMx3.png" alt="Github OIDC assume role.png" /></p>
<h1 id="heading-setting-up-your-aws-account">Setting up your AWS account</h1>
<blockquote>
<p>💡 TL;DR: I created a CloudFormation quick-create link that you can use to automate the following steps. See the bottom of this article. If you want to know how it works and what CloudFormation is going to do, keep reading this section.</p>
</blockquote>
<h2 id="heading-create-an-openid-connect-identity-provider">Create an OpenID Connect Identity provider</h2>
<p>The first step is to create an OpenID Connect (OIDC) identity provider in your AWS account. This will allow GitHub to identify itself.</p>
<ul>
<li>Go to the <a target="_blank" href="https://console.aws.amazon.com/iamv2/home?#/identity_providers">IAM console -&gt; Identity providers</a></li>
<li>Click <em>Add new provider</em></li>
<li>Select <em>OpenID Connect</em></li>
<li>Provider Url: <code>https://token.actions.githubusercontent.com</code> (Don't forget to click <code>Get Thumbprint</code>)</li>
<li>Audience: <code>sts.amazonaws.com</code></li>
<li>Add tags if you want to and click <em>Add Provider</em></li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640547423605/DPAcFf-vK.png" alt="image.png" /></p>
<blockquote>
<p>💡 You will need to do this step only once per AWS account.</p>
</blockquote>
<p><strong>Edit Jan 13 2022</strong></p>
<p>On Jan 12, GitHub Actions changed its certificate chain. The new thumbprint is <code>6938fd4d98bab03faadb97b34396831e3780aea1</code>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/Benoit_Boure/status/1481537078869565440">https://twitter.com/Benoit_Boure/status/1481537078869565440</a></div>
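<p>If you prefer the CLI, an equivalent provider can be created with the following command (a sketch; it uses the thumbprint above and requires valid AWS credentials):</p>
<pre><code class="lang-bash">aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
</code></pre>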
<h2 id="heading-create-a-role">Create a role</h2>
<p>You now need to create a role that GitHub will be able to assume in order to access the resources it needs to control.</p>
<ul>
<li>Go back to IAM and select <a target="_blank" href="https://console.aws.amazon.com/iamv2/home?#/roles">Roles</a></li>
<li>Create a new Role</li>
<li>Choose <em>Web Identity</em>, select the Identity provider you created in the previous step, and its audience.</li>
<li>Click <em>Next:Permissions</em></li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640267179031/LAbzvscMB.png" alt="image.png" /></p>
<p>You now need to give the role the appropriate permissions (Policies). These are the ones that GitHub needs in order to do whatever it has to do. This will vary based on your use case, so I will leave that up to you. Keep in mind that you should stick to the <a target="_blank" href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege">principle of least privilege</a>.</p>
<p>When that is done, give your role a name and click <em>Create Role</em>.</p>
<p>There is now an additional step to do. You need to edit the trust policy of the role to reduce its scope to your repository only. Make sure you don't skip this part, it is <strong>very important</strong>. Without that, <strong>any repository on GitHub will be able to assume your role and access your resources</strong>. (Unfortunately, there does not seem to be a way to do that at creation time).</p>
<p>Go back to IAM Roles and select the created Role. Choose <em>Trust Relationships</em> and <em>Edit Trust Relationship</em>.</p>
<p>Under <code>Condition</code>, add the following segment:</p>
<pre><code class="lang-json"><span class="hljs-string">"StringLike"</span>: {
  <span class="hljs-attr">"token.actions.githubusercontent.com:sub"</span>: <span class="hljs-string">"repo:[your-org]/[your-repo]:*"</span>
}
</code></pre>
<p>Replace the organization and repo names to match yours, and click <code>Update Trust Policy</code>.</p>
<blockquote>
<p>✍️ Note: You can reduce the scope even further by using <a target="_blank" href="https://git-scm.com/book/en/v2/Git-Internals-Git-References">git references</a> to restrict access to a specific branch or tag.
e.g. <code>repo:[your-org]/[your-repo]:ref:refs/heads/master</code></p>
</blockquote>
<p>The final result will look like this:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Principal"</span>: {
        <span class="hljs-attr">"Federated"</span>: <span class="hljs-string">"arn:aws:iam::1234567890:oidc-provider/token.actions.githubusercontent.com"</span>
      },
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"sts:AssumeRoleWithWebIdentity"</span>,
      <span class="hljs-attr">"Condition"</span>: {
        <span class="hljs-attr">"StringEquals"</span>: {
          <span class="hljs-attr">"token.actions.githubusercontent.com:aud"</span>: <span class="hljs-string">"sts.amazonaws.com"</span>
        },
        <span class="hljs-attr">"StringLike"</span>: {
          <span class="hljs-attr">"token.actions.githubusercontent.com:sub"</span>: <span class="hljs-string">"repo:[your-org]/[your-repo]:*"</span>
        }
      }
    }
  ]
}
</code></pre>
<p>This concludes the required configurations on your AWS account. Take note of the role ARN, you'll need it later.</p>
<blockquote>
<p>💡 You can create different roles per account and use a different one for each use case.  For example, one per application, per usage (configurations, deployment, integration tests), etc. You can play with that to reduce the scope of each session even more.</p>
</blockquote>
<h1 id="heading-configure-github-action-workflow">Configure Github action workflow</h1>
<p>Your GitHub workflow requires additional permissions in order to be able to use OIDC. Add the following at the top of your workflow's YAML file. You can also add it at the job level to reduce the scope if needed.</p>
<pre><code class="lang-yml"><span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span> <span class="hljs-comment"># required to use OIDC authentication</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span> <span class="hljs-comment"># required to checkout the code from the repo</span>
</code></pre>
<p>You can now use the <a target="_blank" href="https://github.com/aws-actions/configure-aws-credentials">configure-aws-credentials</a> GitHub Action in the job that needs to assume the role. Add this step to generate credentials before making any calls to AWS:</p>
<pre><code class="lang-yml"><span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">configure</span> <span class="hljs-string">aws</span> <span class="hljs-string">credentials</span>
  <span class="hljs-attr">uses:</span> <span class="hljs-string">aws-actions/configure-aws-credentials@v1</span>
  <span class="hljs-attr">with:</span>
    <span class="hljs-attr">role-to-assume:</span> <span class="hljs-string">arn:aws:iam::1234567890:role/your-role-arn</span>
    <span class="hljs-attr">role-duration-seconds:</span> <span class="hljs-number">900</span> <span class="hljs-comment"># the ttl of the session, in seconds.</span>
    <span class="hljs-attr">aws-region:</span> <span class="hljs-string">us-east-1</span> <span class="hljs-comment"># use your region here.</span>
<span class="hljs-comment"># You can now execute commands that use the credentials👇</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Serverless</span> <span class="hljs-string">deploy</span>
  <span class="hljs-attr">run:</span> <span class="hljs-string">sls</span> <span class="hljs-string">deploy</span> <span class="hljs-string">--stage</span> <span class="hljs-string">dev</span>
</code></pre>
<p>The <code>configure AWS credentials</code> step will use the OIDC integration to assume the given role, generate <strong>short-lived</strong> credentials, and make them available to the current job.</p>
<blockquote>
<p>💡 If you want to take security even further, you can also keep your role's ARN used in <code>role-to-assume</code> in a Github secret.</p>
</blockquote>
<h1 id="heading-automate">Automate</h1>
<p>The maintainers of <code>configure-aws-credentials</code> shared a <a target="_blank" href="https://github.com/aws-actions/configure-aws-credentials#sample-iam-role-cloudformation-template">CloudFormation template</a> that you can use to automate the AWS configuration steps.</p>
<p>I took it one step further; I <a target="_blank" href="http://githubactions-oidc-cfn.s3.amazonaws.com/template.yml">hosted that template</a> and created a deployment link for you.</p>
<p><a target="_blank" href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/quickcreate?templateURL=http://githubactions-oidc-cfn.s3.amazonaws.com/template.yml&amp;stackName=GithubActionsOIDC">Click here</a> to deploy it into your account.</p>
<p>Fill in the parameters:</p>
<ul>
<li><code>GitHubOrg</code>: your organization name, or your GitHub username</li>
<li><code>RepositoryName</code>: the repository that needs access to your AWS account</li>
<li><code>OIDCProviderArn</code>: your existing OIDC provider's ARN, if you have one already. If you don't, leave it empty and one will be created for you. (Remember that you only need one per account).</li>
</ul>
<blockquote>
<p>✍️ Note: The created role will not have any Policy attached to it. You will still need to attach the ones that your workflow needs to it after that.</p>
</blockquote>
<h1 id="heading-conclusion">Conclusion</h1>
<p>As you can see, securing your account doesn't have to be hard. The part that might require a little more effort is defining the right Policies if you want to follow the principle of least privilege (which you should!).</p>
<p>For more content like this, follow me here on Hashnode, on Twitter <a target="_blank" href="https://twitter.com/Benoit_Boure">@Benoit_Boure</a>, and don’t forget to subscribe to my newsletter.</p>
]]></content:encoded></item><item><title><![CDATA[How to Observe EventBridge Events with AppSync Subscriptions]]></title><description><![CDATA[I recently came across David Boyne's blog post: How to Observe EventBridge Events with Postman and WebSockets. What a great idea! But, then I thought:

I can do the same with AppSync Subscriptions!

I had to try! Here is what I achieved:
Building the...]]></description><link>https://benoitboure.com/how-to-observe-eventbridge-events-with-appsync-subscriptions</link><guid isPermaLink="true">https://benoitboure.com/how-to-observe-eventbridge-events-with-appsync-subscriptions</guid><category><![CDATA[GraphQL]]></category><category><![CDATA[AWS]]></category><category><![CDATA[debugging]]></category><category><![CDATA[debug]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Sat, 02 Oct 2021 21:09:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1632118721654/M3EirmG7D.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently came across David Boyne's blog post: <a target="_blank" href="https://www.boyney.io/blog/2021-09-06-debug-eventbridge-with-postman">How to Observe EventBridge Events with Postman and WebSockets</a>. What a great idea! But, then I thought:</p>
<blockquote>
<p>I can do the same with AppSync Subscriptions!</p>
</blockquote>
<p>I had to try! Here is what I achieved:</p>
<h1 id="heading-building-the-basic-appsync-api">Building the basic AppSync API</h1>
<p>The idea was simple. I needed the following components:</p>
<ul>
<li><p>An AppSync API</p>
</li>
<li><p>A <code>Mutation</code> that receives events from EventBridge</p>
</li>
<li><p>A <code>Subscription</code> that is attached to the aforementioned Mutation</p>
</li>
<li><p>An EventBridge rule that sends events to the AppSync Mutation (target)</p>
</li>
</ul>
<p>I also wanted to be able to filter events I was interested in. Here, I thought about two options:</p>
<ol>
<li><p>Filter the events in the EventBridge rule.</p>
</li>
<li><p>Send <strong>all</strong> events to AppSync and use AppSync to filter them, thanks to <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/aws-appsync-real-time-data.html#using-subscription-arguments">subscription arguments</a></p>
</li>
</ol>
<p>I went with the second approach. It would give me more flexibility to filter the events at query time instead of having to re-deploy each time I wanted a new filter.</p>
<p>Here is the GraphQL Schema I created:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">type</span> Mutation {
  sendEvent(<span class="hljs-symbol">event:</span> EventBridgeMessageInput!): EventBridgeMessage
}

<span class="hljs-keyword">type</span> Subscription {
  subscribe(
    <span class="hljs-symbol">source:</span> String
    <span class="hljs-symbol">detailType:</span> String
    <span class="hljs-symbol">account:</span> String
    <span class="hljs-symbol">resources:</span> [String!]
  ): EventBridgeMessage
    <span class="hljs-meta">@aws_subscribe</span>(<span class="hljs-symbol">mutations:</span> [<span class="hljs-string">"sendEvent"</span>])
}

<span class="hljs-keyword">type</span> EventBridgeMessage {
  <span class="hljs-symbol">id:</span> ID!
  <span class="hljs-symbol">version:</span> String!
  <span class="hljs-symbol">detailType:</span> String!
  <span class="hljs-symbol">source:</span> String!
  <span class="hljs-symbol">account:</span> String!
  <span class="hljs-symbol">time:</span> AWSDateTime!
  <span class="hljs-symbol">region:</span> String!
  <span class="hljs-symbol">resources:</span> [String!]
  <span class="hljs-symbol">detail:</span> AWSJSON!
}

<span class="hljs-keyword">input</span> EventBridgeMessageInput {
  <span class="hljs-symbol">id:</span> ID!
  <span class="hljs-symbol">version:</span> String!
  <span class="hljs-symbol">detailType:</span> String!
  <span class="hljs-symbol">source:</span> String!
  <span class="hljs-symbol">account:</span> String!
  <span class="hljs-symbol">time:</span> AWSDateTime!
  <span class="hljs-symbol">region:</span> String!
  <span class="hljs-symbol">resources:</span> [String!]
  <span class="hljs-symbol">detail:</span> AWSJSON!
}
</code></pre>
<p>I also needed to set up the Mutation. I used a <code>NONE</code> data source for that, with a simple mapping template that just returns the received payload.</p>
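<p>For reference, a pass-through setup for a <code>NONE</code> data source can be as simple as the following request and response mapping templates (a sketch of what "just returns the received payload" means here):</p>
<pre><code class="lang-json">## Request mapping template: forward the event argument as the payload
{
  "version": "2017-02-28",
  "payload": $util.toJson($context.arguments.event)
}
</code></pre>
<pre><code class="lang-json">## Response mapping template: return the payload untouched
$util.toJson($context.result)
</code></pre>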
<p>All done! Now, by executing the <code>sendEvent</code> Mutation, it gets delivered to the subscription! 🙌</p>
<p>All that was left to do was to configure EventBridge and set the Mutation as a target.</p>
<h1 id="heading-first-attempt-api-destinations">First attempt: API Destinations</h1>
<p>My first attempt was to use API Destinations. I followed this <a target="_blank" href="https://aws.amazon.com/blogs/mobile/appsync-eventbridge/">awesome tutorial</a> and defined my <em>Input Path</em> and <em>Input Transformer</em> rules, which looked like this:</p>
<pre><code class="lang-typescript">InputPathsMap:
  version: $.version
  id: $.id
  detailType: $.detail-<span class="hljs-keyword">type</span>
  source: $.source
  account: $.account
  time: $.time
  region: $.region
  resources: $.resources
  detail: $.detail
InputTemplate: |
  {
    <span class="hljs-string">"query"</span>: <span class="hljs-string">"mutation SendEvent($event: EventInput!) { sendEvent(event: $event) { version id detailType source account time region resources detail } }"</span>,
    <span class="hljs-string">"operationName"</span>: <span class="hljs-string">"SendEvent"</span>,
    <span class="hljs-string">"variables"</span>: {
      <span class="hljs-string">"event"</span>: {
        <span class="hljs-string">"version"</span>: <span class="hljs-string">"&lt;version&gt;"</span>,
        <span class="hljs-string">"id"</span>: <span class="hljs-string">"&lt;id&gt;"</span>,
        <span class="hljs-string">"detailType"</span>: <span class="hljs-string">"&lt;detailType&gt;"</span>,
        <span class="hljs-string">"source"</span>: <span class="hljs-string">"&lt;source&gt;"</span>,
        <span class="hljs-string">"account"</span>: <span class="hljs-string">"&lt;account&gt;"</span>,
        <span class="hljs-string">"time"</span>: <span class="hljs-string">"&lt;time&gt;"</span>,
        <span class="hljs-string">"region"</span>: <span class="hljs-string">"&lt;region&gt;"</span>,
        <span class="hljs-string">"resources"</span>: <span class="hljs-string">"&lt;resources&gt;"</span>,
        <span class="hljs-string">"detail"</span>: &lt;detail&gt;
      }
    }
  }
</code></pre>
<p>Unfortunately, that didn't work! 😞</p>
<p>The problem is that in EventBridge, the <code>detail</code> attribute is an arbitrary JSON object that could have any shape. This is why I used an <code>AWSJSON</code> type in my GraphQL schema (I wanted to receive any event). However, AppSync expects that JSON to be <strong>stringified</strong>!</p>
<p>After some investigation, I could not find any way to make EventBridge stringify a JSON object. So, that was a dead end.</p>
<h1 id="heading-aws-lambda-to-the-rescue">AWS Lambda to the rescue!</h1>
<p>If EventBridge cannot do it, Lambda surely can! So, I wrote a simple Lambda function that receives the event, reformats it, and calls the AppSync endpoint. I then configured the Lambda as an EventBridge target. (<a target="_blank" href="https://github.com/bboure/appsync-eventbridge-subscriber/blob/master/src/processEvent.ts">See the code here</a>).</p>
<blockquote>
<p>✍️ Note: I also added an IAM authentication method to the AppSync API that Lambda can use to call the Mutation (in addition to the API key used by the subscription).</p>
</blockquote>
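<p>The essential part of that Lambda is reshaping the event before sending the Mutation. Here is a minimal, hypothetical sketch of that transformation (the real code is linked above; the function name and types here are my own):</p>
<pre><code class="lang-typescript">// Hypothetical sketch: turn an EventBridge event into AppSync mutation
// variables. The key step is JSON.stringify on `detail`, because the
// AWSJSON scalar expects a stringified JSON value.
interface EventBridgeEvent {
  version: string;
  id: string;
  "detail-type": string;
  source: string;
  account: string;
  time: string;
  region: string;
  resources: string[];
  detail: unknown;
}

export function toMutationVariables(event: EventBridgeEvent) {
  return {
    event: {
      version: event.version,
      id: event.id,
      detailType: event["detail-type"],
      source: event.source,
      account: event.account,
      time: event.time,
      region: event.region,
      resources: event.resources,
      // AppSync's AWSJSON type wants a string, so stringify here
      detail: JSON.stringify(event.detail),
    },
  };
}
</code></pre>
<p>The handler then only has to post this payload, together with the Mutation document, to the AppSync endpoint.</p>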
<p>All set! Now, running the following subscription:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">subscription</span> MySubscription {
  subscribe {
    resources
    region
    source
    version
    detailType
    detail
  }
}
</code></pre>
<p>And sending an event into EventBridge:</p>
<pre><code class="lang-bash">aws events put-events --entries <span class="hljs-string">'[{"DetailType": "my.detail.type", "Source": "my.source", "Detail": "{\"foo\": \"bar\"}"}]'</span>
</code></pre>
<p>Response:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"data"</span>: {
    <span class="hljs-attr">"subscribe"</span>: {
      <span class="hljs-attr">"resources"</span>: [],
      <span class="hljs-attr">"region"</span>: <span class="hljs-string">"us-east-1"</span>,
      <span class="hljs-attr">"source"</span>: <span class="hljs-string">"my.source"</span>,
      <span class="hljs-attr">"version"</span>: <span class="hljs-string">"0"</span>,
      <span class="hljs-attr">"detailType"</span>: <span class="hljs-string">"my.detail.type"</span>,
      <span class="hljs-attr">"detail"</span>: <span class="hljs-string">"{\"foo\":\"bar\"}"</span>
    }
  }
}
</code></pre>
<p>It works! 🎉</p>
<h1 id="heading-the-power-of-appsync-subscriptions">The power of AppSync subscriptions</h1>
<p>One of the great features of AppSync subscriptions is that you can specify which changes you are interested in at query time. You can do that by adding arguments to the subscription endpoint. Whatever values you pass in the input, you will only receive changes that <a target="_blank" href="https://blog.purple-technology.com/lessons-learned-aws-appsync-subscriptions/">match the Mutation's response field values</a>.</p>
<p>So, I can now do queries such as</p>
<pre><code class="lang-graphql"><span class="hljs-comment">## Will match events with detail-type = "my.detail" only</span>
<span class="hljs-keyword">subscription</span> {
  subscribe(<span class="hljs-symbol">detailType:</span> <span class="hljs-string">"my.detail"</span>) {
    id
    detailType
    detail
  }
}

<span class="hljs-comment">## Will match events with source = "my.source" only</span>
<span class="hljs-keyword">subscription</span> {
  subscribe(<span class="hljs-symbol">source:</span> <span class="hljs-string">"my.source"</span>) {
    id
    detailType
    detail
  }
}

<span class="hljs-comment">## Will match events with detail-type = "my.detail" AND source = "my.source"</span>
<span class="hljs-keyword">subscription</span> {
  subscribe(<span class="hljs-symbol">detailType:</span> <span class="hljs-string">"my.detail"</span>, <span class="hljs-symbol">source:</span> <span class="hljs-string">"my.source"</span>) {
    id
    detailType
    detail
  }
}
</code></pre>
<p>Isn't that great? I can now listen to exactly the events I am interested in 🔥</p>
<h1 id="heading-limitations-andamp-gotchas">Limitations &amp; gotchas</h1>
<p>Unfortunately, this technique has some limitations. It <strong>cannot</strong> filter events based on the content of the <code>detail</code> field, because that data arrives stringified.</p>
<p>Also, filters only work when the values match <strong>exactly</strong>. You cannot use advanced filters such as <code>prefix</code>, <code>anything-but</code>, etc. These filters are supported by EventBridge, <strong>not</strong> by AppSync.</p>
<blockquote>
<p>Note that any advanced filter can still be achieved through filters at the EventBridge rule level, of course!</p>
</blockquote>
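<p>For instance, a rule's event pattern can apply a prefix filter before the event ever reaches AppSync. A minimal, illustrative pattern (the values are made up for this example):</p>
<pre><code class="lang-json">{
  "source": [{ "prefix": "my." }],
  "detail-type": ["my.detail.type"]
}
</code></pre>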
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this post, I showed you how we can observe EventBridge events through AppSync subscriptions and how we can even filter them at query time. Although its usage is somewhat limited, it can probably still be very helpful when you only need to filter on the <code>detailType</code> or <code>source</code> values, for example. You can easily use it to debug/test your application.</p>
<p>Find the full code of this implementation <a target="_blank" href="https://github.com/bboure/appsync-eventbridge-subscriber/">on Github</a></p>
<p>A big thanks to <a target="_blank" href="https://twitter.com/boyney123">David Boyne</a> for the inspiration!</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How to Avoid Composite IDs in GraphQL with DynamoDB (feat. AppSync)]]></title><description><![CDATA[In this article, I will discuss a few tricks on how to optimize your GraphQL API for items that use composite keys in DynamoDB. It will work no matter the GraphQL server, but if you're using AppSync, you're in luck because I'll share a few (VTL) code...]]></description><link>https://benoitboure.com/how-to-avoid-composite-ids-in-graphql-with-dynamodb-feat-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-to-avoid-composite-ids-in-graphql-with-dynamodb-feat-appsync</guid><category><![CDATA[serverless]]></category><category><![CDATA[GraphQL]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 14 Sep 2021 20:29:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1631555987441/5vUGjj_w8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, I will discuss a few tricks on how to optimize your GraphQL API for items that use composite keys in DynamoDB. It will work no matter the GraphQL server, but if you're using AppSync, you're in luck because I'll share a few (VTL) code snippets too 🙂</p>
<p>In DynamoDB, it is very common to use composite keys (ie: Partition Key and Sort Key). This allows us to group related items together. Moreover, the combination of the partition key (PK) and sort key (SK) is what uniquely identifies the Item.</p>
<p>To illustrate this, let's take the following simple example. Imagine we have a DynamoDB <code>states</code> table that contains states from different countries. We might structure our data like so:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1631435725822/O5sjcK87O.png" alt="image.png" /></p>
<p>Here, the PK identifies the country code, and the SK the state code. Together, they uniquely identify one Item in the database (ie: a state in a given country) and enforce its uniqueness at the same time. Additionally, this gives us some free access patterns (eg: <em>Find all states for a given country</em>).</p>
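<p>For example, the <em>all states for a given country</em> access pattern is a single Query on the partition key. A sketch with the AWS CLI (assuming the table is named <code>states</code>):</p>
<pre><code class="lang-bash">aws dynamodb query \
  --table-name states \
  --key-condition-expression "countryCode = :c" \
  --expression-attribute-values '{":c": {"S": "US"}}'
</code></pre>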
<p>Now, imagine that we want to serve the Items from a GraphQL endpoint. The query might look like this:</p>
<pre><code class="lang-graphql">getState(<span class="hljs-symbol">countryCode:</span> <span class="hljs-string">"US"</span>, <span class="hljs-symbol">stateCode:</span> <span class="hljs-string">"TX"</span>) {
  name
}
</code></pre>
<p>This works well, but it has several drawbacks:</p>
<p><strong>This is not practical</strong></p>
<p>The client has to pass two arguments in order to identify which item it wants to query. Understanding which fields must be used (eg: from other queries) might not be as straightforward as it seems. Also, the frontend often needs a unique key to distinguish items/components from each other (<a target="_blank" href="https://reactjs.org/docs/lists-and-keys.html#keys">think "key" attribute in React</a>), forcing it to compute it every time.</p>
<p><strong>The client should not have to worry about the underlying data structure</strong></p>
<p>In an ideal world, the client should not have to worry about how the data is being stored. By exposing a composite id in our API, we reveal how the data is organized in the data layer and make the client depend on it.</p>
<p><strong>In Front end applications, the client cache functionality might not work out of the box</strong></p>
<p>Most GraphQL clients, like Apollo, offer a solid and powerful <a target="_blank" href="https://www.apollographql.com/docs/react/caching/overview/">cache functionality</a>. However, by default, the <code>id</code> field (with an <code>ID</code> type) is what they usually use to uniquely identify the Item in the (cache) datastore. In the above example, there isn't one (neither in the request nor in the response). The client does not know that the <code>countryCode</code>/<code>stateCode</code> combination is what uniquely identifies a State. As a result, the item would never be cached.</p>
<p>Sure, we can always <a target="_blank" href="https://www.apollographql.com/docs/react/caching/cache-configuration/#customizing-cache-ids">customize the cache ids</a>, but we would have to do it for every Item type and in every client (ie: web, mobile, etc).</p>
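<p>For reference, here is what that per-type customization looks like with Apollo Client 3 (a sketch of the configuration the linked docs describe, not code from this project):</p>
<pre><code class="lang-typescript">import { InMemoryCache } from "@apollo/client";

// Tell the cache that a State is uniquely identified by the
// countryCode/stateCode pair instead of a default id field.
const cache = new InMemoryCache({
  typePolicies: {
    State: {
      keyFields: ["countryCode", "stateCode"],
    },
  },
});
</code></pre>
<p>And this would have to be repeated in every client application.</p>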
<h2 id="heading-the-solution-denormalizing-a-unique-id">The solution: Denormalizing a unique id</h2>
<p>Wouldn't it be nice if we could have a unique <code>id</code> field for our <code>State</code> items? As mentioned earlier, every State is a unique combination of the country code and the state code. In this case, we could even use the <a target="_blank" href="https://www.iso.org/obp/ui/#iso:code:3166:US">iso code</a> of each state for that. For example, Texas' id can be <code>US-TX</code>.</p>
<p>Let's add an <code>id</code> attribute to our data model.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1631433772445/rGtR-1MeYQ.png" alt="image.png" /></p>
<p>Now, all we have to do is to denormalize the <code>id</code> by concatenating the country and state codes. Doing so at creation time will avoid us having to generate it on the fly in every query (Plus, it's always nice to receive a pre-computed <code>id</code> field everywhere, even in the backend, for future uses). We can easily do that when saving the item in DynamoDB.</p>
<p>Example using AppSync VTL</p>
<pre><code class="lang-typescript">#set($countryCode=$ctx.args.input.countryCode)
#set($stateCode=$ctx.args.input.stateCode)
#set($attributeValues={})
$util.qr($attributeValues.put(<span class="hljs-string">"id"</span>, $util.dynamodb.toDynamoDB(<span class="hljs-string">"${countryCode}-${stateCode}"</span>)))
#foreach($item <span class="hljs-keyword">in</span> $ctx.args.input.entrySet())
  $util.qr($attributeValues.put(<span class="hljs-string">"${item.key}"</span>, $util.dynamodb.toDynamoDB($item.value)))
#end
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"PutItem"</span>,
  <span class="hljs-string">"key"</span>: {
    <span class="hljs-string">"countryCode"</span>: $util.dynamodb.toDynamoDBJson($countryCode),
    <span class="hljs-string">"stateCode"</span>: $util.dynamodb.toDynamoDBJson($stateCode)
  },
  <span class="hljs-string">"attributeValues"</span>: $util.toJson($attributeValues)
}
</code></pre>
<p>Awesome! But now, how do we fetch data from GraphQL? Let's update the query and use a unique <code>id</code> parameter with an <code>ID!</code> type.</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">type</span> Query {
  getState(<span class="hljs-symbol">id:</span> ID!): State!
}

<span class="hljs-keyword">type</span> State {
  <span class="hljs-symbol">id:</span> ID!
  <span class="hljs-symbol">countryCode:</span> String!
  <span class="hljs-symbol">stateCode:</span> String!
  <span class="hljs-symbol">name:</span> String!
}
</code></pre>
<p>Great! Now, the backend receives a unique argument. However, DynamoDB still requires us to pass a <code>countryCode</code> (PK) and <code>stateCode</code> (SK) composite key. This will require some additional gymnastics at the resolver level. This is pretty straightforward, though. All we have to do is to split the <code>id</code> argument by '<code>-</code>'. You can do that in your favourite language depending on your use case. If you are using AppSync, here is how you can easily do that in VTL.</p>
<pre><code class="lang-typescript">#<span class="hljs-keyword">if</span>(!$ctx.args.id.contains(<span class="hljs-string">"-"</span>))
  ## Invalid iso code
  $util.error(<span class="hljs-string">"Invalid Id"</span>, <span class="hljs-string">"InputError"</span>)
#end
#set($parts=$ctx.args.id.split(<span class="hljs-string">"-"</span>))
#set($countryCode=$parts.get(<span class="hljs-number">0</span>))
#set($stateCode=$parts.get(<span class="hljs-number">1</span>))
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"GetItem"</span>,
  <span class="hljs-string">"key"</span>: {
    <span class="hljs-string">"countryCode"</span>: $util.dynamodb.toStringJson($countryCode),
    <span class="hljs-string">"stateCode"</span>: $util.dynamodb.toStringJson($stateCode)
  }
}
</code></pre>
<p>As you can see, this requires very little logic to implement and it solves all our issues. And it's <strong>completely transparent to the client</strong>. 🙌</p>
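<p>If you are not using AppSync, the same logic is trivial in any language. A TypeScript sketch (the helper name is mine):</p>
<pre><code class="lang-typescript">// Derive the DynamoDB composite key from the public id, e.g. "US-TX".
export function keyFromId(id: string) {
  const separatorIndex = id.indexOf("-");
  if (separatorIndex === -1) {
    // Mirrors the "Invalid Id" error raised in the VTL resolver above
    throw new Error("Invalid Id");
  }
  return {
    countryCode: id.slice(0, separatorIndex),
    stateCode: id.slice(separatorIndex + 1),
  };
}
</code></pre>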
<p>Here is what the new query looks like:</p>
<pre><code class="lang-graphql">getState(<span class="hljs-symbol">id:</span> <span class="hljs-string">"US-TX"</span>) {
  id
  name
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this post, I showed you how to handle composite DynamoDB keys with GraphQL by hiding them from the client behind a unique attribute. By denormalizing this attribute in DynamoDB and implementing some simple logic in the resolvers, you can save yourself from the more annoying issues we identified earlier.</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How to Handle Many to Many relations in AppSync]]></title><description><![CDATA[In this post, I will teach you how you can handle many-to-many relations with AWS AppSync, how to avoid denormalization and still avoid the n+1 problem.

TL;DR; Use a Pipeline resolver to first fetch the relations followed by a BatchGetItem operation...]]></description><link>https://benoitboure.com/how-to-handle-many-to-many-relations-in-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-to-handle-many-to-many-relations-in-appsync</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><category><![CDATA[GraphQL]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Sun, 16 May 2021 20:07:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1621195818735/8fL3XyM0_.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, I will teach you how you can handle many-to-many relations with AWS AppSync, how to avoid denormalization and still avoid the n+1 problem.</p>
<blockquote>
<p>TL;DR: Use a pipeline resolver to first fetch the relations, followed by a BatchGetItem operation to retrieve all related items in a single query.</p>
<p>Find the full solution <a target="_blank" href="https://github.com/bboure/appsync-n-plus-one">on GitHub</a></p>
</blockquote>
<h1 id="heading-the-problem">The problem</h1>
<p>One of the most common problems developers face when designing DynamoDB databases is many-to-many relationships. Usually, the recommended way is to <a target="_blank" href="https://aws.amazon.com/blogs/database/should-your-dynamodb-table-be-normalized-or-denormalized/">denormalize your data</a>. You duplicate all the fields required by your access pattern in the relation Item so that they are returned along with it. This avoids extra queries for the related items, as NoSQL databases cannot perform <em>JOIN</em> operations.</p>
<p>Let's take an example. Imagine you are building an application that has users and groups. Users can be in several groups and groups may have multiple users.</p>
<p>Your data model might look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1620843791843/vrnalEcCN.png" alt="image.png" /></p>
<p>With GraphQL APIs, this design has two problems:</p>
<p><strong>1) The client might ask for fields that are not denormalized in the relation.</strong></p>
<p>Since GraphQL is agnostic of the underlying data source, and the types defined in the schema declare all the fields (not just those that are denormalized), a query might request any of them. In our example, it might be the user's bio or profile picture. If these fields are not denormalized in the relation, they will be missing from the GraphQL response.</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">query</span> GetGroupUsers($id: ID!) {
  getGroupUsers(<span class="hljs-symbol">id:</span> $id) {
    id
    name
    <span class="hljs-comment"># bio and picture are not denormalized in the relation</span>
    bio
    picture
  }
}
</code></pre>
<p>One approach to fix this would be to create a different type which is a subset of <code>User</code>. However, this defeats the purpose of GraphQL and might also not be what you want.</p>
<p><strong>2) It is hard to keep the data up to date when it changes.</strong></p>
<p>What if the user changes his username (think Twitter)? You will have to go through all the relation Items and update them. If the number of items is small, it can be manageable, but imagine a group that has thousands or millions of users! This can become a hassle to maintain and data can easily become out of sync.</p>
<p>Also, as explained before, with GraphQL in mind, you might end up having to denormalize the whole user item. This would not be a viable solution.</p>
<h1 id="heading-resolvers-to-the-rescue">Resolvers to the rescue</h1>
<p>One of the characteristics of GraphQL is <em>resolvers</em>. Resolvers are used to resolve child entities using data from the previously resolved ones (the <em>source</em> in AppSync).</p>
<p>One of the common approaches to solve the above problems would be to use a different resolver for the child entity (in our case: <code>user</code>). The implementation is pretty straightforward: first, resolve the relations, and then use them to resolve the underlying users (using the user id they contain).</p>
<p>For that to work, you would need to nest the user entity under the relation entity. This might not be a bad thing anyway, because you might want to return some metadata related to the relationship as well, such as a <code>joinedAt</code> attribute.</p>
<p>Example:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">query</span> GetGroupUsers($id: ID!) {
  getGroupUsers(<span class="hljs-symbol">id:</span> $id) {
    joinedAt
    user {
      id
      name
      bio
      picture
    }  
  }
}
</code></pre>
<blockquote>
<p>user is attached to a resolver that receives the user id from the group-user relation.</p>
</blockquote>
<p>There is one problem with this approach, though: it introduces an n+1 problem, i.e. every child entity triggers one extra query to DynamoDB. If a group has 10 users, you end up executing 11 queries (one for all the relations, plus one for each individual user).</p>
<h1 id="heading-a-better-approach-pipeline-andamp-batch-resolvers">A better approach: Pipeline &amp; Batch resolvers</h1>
<p>Pipelines allow you to compose a resolver out of different steps or <em>functions</em>. If you are not familiar with pipelines yet, I suggest you read <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/pipeline-resolvers.html">the documentation</a></p>
<p>AppSync also supports <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-batch.html">DynamoDB Batch resolvers</a> which you can use to act on several items in one single DynamoDB round-trip. There are three supported operations: <code>BatchGetItem</code>, <code>BatchPutItem</code> , and <code>BatchDeleteItem</code>.</p>
<p>The one we are interested in here is <code>BatchGetItem</code>. It can be used in order to retrieve up to <strong>100 DynamoDB items</strong> in one single DynamoDB request.</p>
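<p>Outside of AppSync, the same operation is available directly from the AWS SDK. A sketch with the JavaScript SDK v3 (table and key attribute names here are assumptions matching the single-table layout above):</p>
<pre><code class="lang-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { BatchGetCommand, DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export async function batchGetUsers(tableName: string, userSortKeys: string[]) {
  // BatchGetItem accepts at most 100 keys per request
  const keys = userSortKeys.slice(0, 100).map(function (sk) {
    return { PK: sk, SK: sk };
  });
  const result = await client.send(
    new BatchGetCommand({
      RequestItems: { [tableName]: { Keys: keys } },
    })
  );
  return result.Responses ? result.Responses[tableName] : [];
}
</code></pre>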
<p>With all these elements in hand, we can implement a pipeline resolver with two functions:</p>
<ol>
<li><p>fetch the group-user relation items</p>
</li>
<li><p>fetch all the underlying user entities in one single query</p>
</li>
</ol>
<p>Let's see how that works and build the <code>getGroupUser</code> endpoint.</p>
<blockquote>
<p>The full solution is available <a target="_blank" href="https://github.com/bboure/appsync-n-plus-one">on GitHub</a></p>
</blockquote>
<p>In the <code>getGroupUsers</code> function (the first function of the pipeline), we first fetch the relation items between the group and the users. We also make sure not to go over the limit of 100 items imposed by <code>BatchGetItem</code>; beyond that, we'll need to paginate (more on that later).</p>
<pre><code class="lang-typescript">## getGroupUsers - request mapping
#set($limit=$util.defaultIfNull($ctx.args.limit, <span class="hljs-number">10</span>))
#<span class="hljs-keyword">if</span>($limit&gt;<span class="hljs-number">100</span>)
  #set($limit=<span class="hljs-number">100</span>)
#end
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"Query"</span>,
  <span class="hljs-string">"limit"</span>: $util.toJson($limit),
  <span class="hljs-string">"nextToken"</span>: $util.toJson($ctx.args.nextToken),
  <span class="hljs-string">"query"</span> : {
    <span class="hljs-string">"expression"</span>: <span class="hljs-string">"#PK = :PK and begins_with(#SK, :SK)"</span>,
    <span class="hljs-string">"expressionNames"</span> : {
      <span class="hljs-string">"#PK"</span>: <span class="hljs-string">"PK"</span>,
      <span class="hljs-string">"#SK"</span>: <span class="hljs-string">"SK"</span>
    },
    <span class="hljs-string">"expressionValues"</span> : {
      <span class="hljs-string">":PK"</span>: $util.dynamodb.toStringJson(<span class="hljs-string">"GROUP#${ctx.args.id}"</span>),
      <span class="hljs-string">":SK"</span>: $util.dynamodb.toStringJson(<span class="hljs-string">"USER#"</span>)
    }
  }
}
</code></pre>
<p>The response mapping just forwards the items to the next function. We also keep <code>nextToken</code> into the stash in order to return it later to the client for pagination.</p>
<pre><code class="lang-typescript">## getGroupUsers - response mapping
$util.qr($ctx.stash.put(<span class="hljs-string">"nextToken"</span>, $ctx.result.nextToken))
$util.toJson($ctx.result.items)
</code></pre>
<p>The <code>getBatchUsers</code> function is where the magic happens. We build the Primary Key pairs (PK and SK) of our user items and pass them to the <code>BatchGetItem</code> operation.</p>
<p>Before that, if the previous request returned no results, we return an empty array straight away, bypassing the extra query to DynamoDB.</p>
<pre><code class="lang-typescript">## getBatchUsers - request mapping
#if($ctx.prev.result.size() == 0)
    #return([])
#end
#set($keys=[])
#foreach($item in $ctx.prev.result)
  ## the user's PK and SK are both the SK from the item returned by the previous function
  $util.qr($keys.add({
    "PK": $util.dynamodb.toDynamoDB(${item.SK}),
    "SK": $util.dynamodb.toDynamoDB(${item.SK})
  }))
#end
{
    "version": "2018-05-29",
    "operation": "BatchGetItem",
    "tables" : {
        ## replace this with your table's name
        "table-name": {
            "keys": $util.toJson($keys)
        }
    }
}
</code></pre>
<p>Once we get to our response mapping, we have to restructure our data a bit and inject the user entities into the relation items returned by the previous pipeline function.</p>
<pre><code class="lang-typescript">## getBatchUsers - response mapping
#set($items=[])
#foreach($item <span class="hljs-keyword">in</span> $ctx.result.data.get(<span class="hljs-string">"table-name"</span>))
  #set($groupUser=$ctx.prev.result.get($foreach.index))
  $util.qr($groupUser.put(<span class="hljs-string">"user"</span>, $item))
  $util.qr($items.add($groupUser))
#end
$util.toJson($items)
</code></pre>
<p>Finally, in our <em>after mapping</em>, we return the data we previously aggregated and we also send the <code>nextToken</code> back to the client to allow for pagination.</p>
<pre><code class="lang-typescript">## getUsers - after mapping
$util.toJson({
  <span class="hljs-string">"nextToken"</span>: $ctx.stash.nextToken,
  <span class="hljs-string">"items"</span>: $ctx.result
})
</code></pre>
<p>Here you have it! Now, no matter how many users the group has, you will only send two requests to DynamoDB!</p>
<blockquote>
<p>💡 Did you know?</p>
<p>In DynamoDB, <code>BatchGetItem</code> <a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html">does not guarantee to return the items in any particular order</a>. However, AppSync does the heavy lifting for you and returns them in the same order as the keys. You, therefore, don't need to worry about it. 🙌</p>
</blockquote>
<p>There is one important caveat, though:</p>
<p><code>BatchGetItem</code> has <strong>zero impact</strong> on your AWS bill. Fetching 100 items in a batch consumes exactly the same RCUs as 100 individual <code>GetItem</code> requests. The only difference is that it reduces the HTTP overhead and slightly improves latency.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this post, we learned how to reduce the number of DynamoDB requests in many-to-many relationships with AppSync using pipeline resolvers and fetching items in batch from DynamoDB.</p>
<p>If you are interested in AppSync, I regularly share content related to it on <a target="_blank" href="https://twitter.com/Benoit_Boure">Twitter</a> and on this blog, so make sure to follow me and subscribe to my newsletter.</p>
<p>If you have any question, feel free to drop them in the comment section, and if you would like to receive advice or coaching from me about AppSync or Serverless, you can <a target="_blank" href="https://www.hiretheauthor.com/bboure">book a 1:1 conference or chat with me</a></p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[Understanding the DynamoDB Sort Key Order]]></title><description><![CDATA[If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type Number the ...]]></description><link>https://benoitboure.com/understanding-the-dynamodb-sort-key-order</link><guid isPermaLink="true">https://benoitboure.com/understanding-the-dynamodb-sort-key-order</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Sun, 02 May 2021 16:33:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1619970961289/EfDzM3pZ8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type <code>Number</code> the items will be sorted in numeric order (1, 3, 10, 50, 400), while if it's of type <code>String</code> they are sorted in "order of UTF-8 bytes". But what does that even mean and how does it affect the order of the items? Let's find out.</p>
<h3 id="what-is-is">What it is</h3>
<p>As per Wikipedia,</p>
<blockquote>
<p>UTF-8 is a variable-width character encoding used for electronic communication [...] UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. </p>
</blockquote>
<p>What this means is that each character is assigned a specific numerical value, or <a target="_blank" href="https://en.wikipedia.org/wiki/Code_point">code point</a>, which is encoded as one or more bytes.</p>
<p>It's that numerical value that DynamoDB uses to determine the order of your Sort Key.</p>
<p>If you want to know what character comes after which, a good start is to remember the order of the most common characters:</p>
<ol>
<li>numbers</li>
<li>uppercase letters</li>
<li>lowercase letters</li>
</ol>
<p>Notice that letters are "grouped" by case (first the capital letters, then the lowercase ones), which means that <code>Zoey</code> will come <strong>before</strong> <code>alligator</code>. This is really important to know if you want to avoid surprises later.</p>
<p>For a complete list of UTF-8 characters, including special ones, sorted by their bytes order, refer to <a target="_blank" href="https://www.fileformat.info/info/charset/UTF-8/list.htm">this page</a>.</p>
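<p>You can see this ordering directly in JavaScript, whose default string sort compares code units the same way for ASCII input:</p>
<pre><code class="lang-typescript">const words = ["alligator", "Zoey", "42"];
// Default sort: digits first, then uppercase, then lowercase
const sorted = [...words].sort();
// sorted is ["42", "Zoey", "alligator"]
</code></pre>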
<h3 id="how-to-use-it-to-your-advantage">How to use it to your advantage</h3>
<p>Once you know how strings are being sorted, you can use that knowledge to your advantage. A very common practice with DynamoDB and single table design is to pre-join data by placing them into the same partition.</p>
<p>Depending on your access pattern, you might either want your parent item to be at the beginning, or at the end of the partition. </p>
<p>For example, if you have Orders (SK prefixed with <code>ORDER#</code>) and Order Items (SK prefixed with <code>ITEM#</code>), you will perhaps want the <code>ORDER#</code> item at the beginning of the partition, and the <code>ITEM#</code> items sorted by number, in ascending order after it. However, <code>O</code> comes after <code>I</code>, which means that <code>ORDER#</code> will be placed at the end of the partition.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619969551474/cGt0CDmam.png" alt="image.png" /></p>
<p>You could scan the index backwards, sure, but then your items would be sorted in reversed order, breaking the access pattern.</p>
<p>How to fix that? </p>
<p>Use the UTF-8 sorting mechanism to your advantage! You need <code>ORDER#</code> to start with a character that comes before <code>I</code> so that it sorts before <code>ITEM#</code>. Any letter from A to H would work, but it might not be user-friendly (for debugging and inspecting the data later) and could change the meaning of your prefix. Instead, it is most common to use special characters that come before <code>A</code>: for example <code>#</code>, <code>$</code> or <code>%</code>. Let's use <code>$</code> and rename <code>ORDER#</code> to <code>$ORDER#</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619969944675/gvPbTZBRs.png" alt="image.png" /></p>
<p>Now the <code>Order</code> item is at the top of the partition and all items are sorted as expected. 🎉</p>
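<p>This behaviour is easy to reproduce locally. The sketch below (plain JavaScript, with hypothetical sort keys) relies on the fact that comparing ASCII-only strings in JavaScript matches the UTF-8 byte order DynamoDB uses:</p>

```javascript
// Hypothetical sort keys: '$' (0x24) sorts before 'I' (0x49),
// so the parent item lands at the top of the partition.
const sortKeys = ['ITEM#02', '$ORDER#', 'ITEM#01'];

// For ASCII-only strings, JavaScript's default string comparison
// matches the UTF-8 byte order used by DynamoDB.
sortKeys.sort();

console.log(sortKeys); // [ '$ORDER#', 'ITEM#01', 'ITEM#02' ]
```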
<p>Note that you can use the same trick to sort items in reverse order if necessary.</p>
<p>Example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619971880594/zceefIJYY.png" alt="image.png" /></p>
<p>In the above example, you might want to scan the index in reverse order and get the latest vouchers at the top, in descending order. You can force the <code>USER#</code> item to be at the end of the partition by prefixing it with a character that is higher in the UTF-8 ranking. <code>|</code> or <code>~</code> are good examples.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619971918219/pwJWoWk4i.png" alt="image.png" /></p>
<p>Now, <code>~USER#</code> is at the end of the partition and you can scan the index backwards.</p>
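<p>The same comparison works in the other direction. In this sketch (hypothetical keys again), <code>~</code> (0x7E) comes after every uppercase letter, which pushes the parent item to the bottom of the partition:</p>

```javascript
// '~' (0x7E) sorts after 'V' (0x56), so the parent item ends up
// last. Scanning the index backwards returns it first, followed
// by the vouchers in descending order.
const sortKeys = ['VOUCHER#2021-01', '~USER#', 'VOUCHER#2020-12'];
sortKeys.sort();

console.log(sortKeys); // [ 'VOUCHER#2020-12', 'VOUCHER#2021-01', '~USER#' ]
```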
<h1 id="conclusion">Conclusion</h1>
<p>In this article, you learned what the UTF-8 byte order is, and how you can use it to your advantage to force the order of items in your DynamoDB partitions.</p>
<p>If you would like to read more content like this, <a target="_blank" href="https://twitter.com/Benoit_Boure">follow me on Twitter</a> and subscribe to my newsletter on Hashnode.</p>
<hr />
<p>Photo credits: Markus Spiske on <a target="_blank" href="https://unsplash.com/photos/iar-afB0QQw">unsplash</a></p>
]]></content:encoded></item><item><title><![CDATA[5 Ways to Prevent Accidentally Deleting Your CloudFormation Resources]]></title><description><![CDATA[CloudFormation is an AWS service that allows you to maintain Infrastructure as Code (IaC). Whether you are using it natively (with JSON or YML) or through a third-party service such as the Serverless Framework, AWS CDK or SAM, it is a great way to ma...]]></description><link>https://benoitboure.com/5-ways-to-prevent-accidentally-deleting-your-cloudformation-resources</link><guid isPermaLink="true">https://benoitboure.com/5-ways-to-prevent-accidentally-deleting-your-cloudformation-resources</guid><category><![CDATA[AWS]]></category><category><![CDATA[Security]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[deployment automation]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Fri, 19 Mar 2021 19:05:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1616359152613/bDOALjvss.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>CloudFormation is an AWS service that allows you to maintain Infrastructure as Code (IaC). Whether you are using it natively (with JSON or YML) or through a third-party service such as the <a target="_blank" href="https://www.serverless.com/">Serverless Framework</a>, <a target="_blank" href="https://docs.aws.amazon.com/cdk/latest/guide/home.html">AWS CDK</a> or <a target="_blank" href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html">SAM</a>, it is a great way to make your infrastructure reproducible across various stages. It also makes the deployment process easily automatable through CI/CD pipelines. In other words, it makes managing your infrastructure less prone to human errors.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616182974779/rrsobEdIu.png" alt="Kill all humans" /></p>
<p>Although automating things sounds like a good idea, one of the downsides of CloudFormation is that it is hard to understand what is going on under the hood and what exactly is going to happen to your stack during the process, turning every single deployment into a potential <a target="_blank" href="https://www.youtube.com/watch?v=Ki_Af_o9Q9s">7 minutes of terror</a> story. Imagine that an entire resource gets deleted and all its data with it. Make just one mistake and you will only find out when it's too late. What if it is a production database? <a target="_blank" href="https://www.google.com/search?q=accidentally+deleted+production+database">It happens more than you think</a>. 😱😱😱</p>
<p>To avoid this kind of disaster, I will show you <strong>5 ways to protect your resources from deletion with CloudFormation</strong>.</p>
<hr />
<h1 id="1-review-the-changeset">1. Review the Changeset</h1>
<p>The first technique is to understand which actions will effectively be executed during the update <strong>before</strong> they happen. CloudFormation offers a tool that lets you preview all the modifications that a change in your template would apply.</p>
<p>To use it, follow these simple steps:</p>
<ol>
<li>Go to your CloudFormation console and select the stack that you want to update.</li>
<li>Click the <em>Stack actions</em> button and then select <em>Create change set for current stack</em>.</li>
<li>Choose <em>Replace current template</em> and upload your new template, or enter an S3 path to the file.</li>
<li>From there, just follow the guide in order to create the changeset.</li>
</ol>
<p>It might take a few seconds for the changeset to be generated. Once it is done, the console will show you a detailed summary of what actions would be executed if you decided to proceed with the update.</p>
<p>Example: </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616267715004/Hqz2LaPyY.png" alt="image.png" /></p>
<p>As you can see, you can easily spot which resources will be Modified or Removed, and whether they require replacement. Once you are confident that this is what you intend to do, you can hit the <em>Execute</em> button with a certain peace of mind 🧘</p>
<p>This method is useful when you want to visually confirm a change that you are unsure about. However, it is not always convenient. Let's explore other solutions.</p>
<h1 id="2-retain-specific-resources">2. Retain Specific Resources</h1>
<p>With the <code>DeletionPolicy</code> attribute, you can control what CloudFormation should do with a resource in the event of it being removed from the template, or if the stack is deleted altogether. The default value is <code>Delete</code> which is probably not what you want in some cases. By changing the value to <code>Retain</code>, you are telling CloudFormation to keep the resource instead.</p>
<p>Example:</p>
<pre><code class="lang-yml"><span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">MyTable:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::DynamoDB::Table</span>
    <span class="hljs-attr">DeletionPolicy:</span> <span class="hljs-string">Retain</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">TableName:</span> <span class="hljs-string">mytable</span>
</code></pre>
<p>One thing to notice here is that this method will <strong>not</strong> make your deployment fail. CloudFormation will execute all your changes. The difference is that any instruction to delete a resource with a <code>Retain</code> policy will be ignored and the resource will be "detached" from the stack instead. This also means that if you try to add the resource back to the stack, any subsequent deployment might fail because CloudFormation will try to re-create a resource that already exists (e.g. the DynamoDB table already exists with that name). If that happens, you can check this guide for <a target="_blank" href="https://aws.amazon.com/blogs/aws/new-import-existing-resources-into-a-cloudformation-stack/">Importing Existing Resources into a CloudFormation Stack</a>.</p>
<p>⚠️ Attention!</p>
<blockquote>
<p>This capability doesn't apply to resources whose physical instance is replaced during stack update operations. For example, if you edit a resource's properties such that CloudFormation replaces that resource during a stack update.</p>
</blockquote>
<p>This extract from the official documentation is <strong>very</strong> important. What it means is that if you change a property of a resource that <em>requires replacement</em> (e.g.: changing a DynamoDB table's name), the deletion policy <strong>will not apply</strong>, and it would still be <strong>deleted</strong> and re-created. Before you change a property, you should pay attention to the <em>Update requires</em> section of the CloudFormation documentation for that resource's attribute.</p>
<p>With certain types of resources, like EC2 volumes or RDS instances, you can also use <code>Snapshot</code>. In that case, the resource would still be deleted, but a backup would be taken first.
You can read more about this strategy in <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-deletionpolicy.html">the official documentation</a>.</p>
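<p>With those resource types, you simply swap the policy value. Here is a sketch for a hypothetical RDS instance (the resource name and properties are placeholders):</p>

```yml
Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    # A snapshot is taken before the instance is deleted
    DeletionPolicy: Snapshot
    Properties:
      DBInstanceClass: db.t3.micro
      # ... other instance properties ...
```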
<h1 id="3-define-a-stack-policy">3. Define a Stack Policy</h1>
<p>A more advanced way of protecting your resources is through Stack Policies. With Stack Policies, you can constrain which actions are allowed to be executed according to specific rules that you define. When you add a policy, <em>all</em> resources are protected by default. You need to explicitly <code>Allow</code> the changes on the resources that you want to update. You can think of it as an IAM policy, but the difference here is that it only applies during stack updates.</p>
<p>Example:</p>
<p>The following policy allows any change on all resources, except for the resource whose id is <code>MyDynamoDBTable</code>. By explicitly denying <code>Update:Delete</code> and <code>Update:Replace</code>, the resource is protected against deletion <strong>and</strong> replacement. On the other hand, modifications are still allowed (e.g.: Add a Global Secondary Index).</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: [<span class="hljs-string">"Update:*"</span>],
      <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"*"</span>
    },
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Deny"</span>,
      <span class="hljs-attr">"Action"</span>: [<span class="hljs-string">"Update:Delete"</span>, <span class="hljs-string">"Update:Replace"</span>],
      <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Resource"</span>: [<span class="hljs-string">"LogicalResourceId/MyDynamoDBTable"</span>]
    }
  ]
}
</code></pre>
<p>To learn how to write custom stack policies, refer to the <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/protect-stack-resources.html">documentation</a></p>
<h1 id="4-enable-stack-termination-protection">4. Enable Stack Termination Protection</h1>
<p>If all you worry about is someone (or a process) tearing down a whole stack by mistake, what you need is <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-protect-stacks.html">Stack termination protection</a>. When enabled, CloudFormation will reject any attempt to delete the stack.</p>
<p>To enable termination protection:</p>
<ol>
<li>Go to CloudFormation and select the stack that you want to protect. </li>
<li>Choose <em>Stack actions</em> followed by <em>Edit termination protection</em>.</li>
<li>Choose <em>Enabled</em> and hit <em>Save</em>.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616327767341/hcXuj_c-U.png" alt="Stack Termination Protection" /></p>
<h1 id="5-place-sensitive-resources-in-different-stacks">5. Place Sensitive Resources in Different Stacks</h1>
<p>Last but not least, if you are too paranoid about deleting precious resources and all the data they contain, the best thing you can do is isolate them into their own stack. Place each one of them in a dedicated template and touch them only if and when you need to. By doing so, you will not risk destroying them while deploying other stacks that change more often.</p>
<hr />
<h1 id="which-one-should-you-use">Which One Should You Use?</h1>
<p>Each solution has its own pros and cons. They also behave differently in different situations. To help you better understand the differences, I created a simple cheat sheet.</p>
<table>
<thead>
<tr>
<td>Will my resource be protected if</td><td>It is removed from the stack</td><td>It requires replacement</td><td>The stack is deleted</td></tr>
</thead>
<tbody>
<tr>
<td>Changeset review</td><td>Manual (1)</td><td>Manual (1)</td><td>No</td></tr>
<tr>
<td>DeletionPolicy</td><td>Yes</td><td>No</td><td>Yes</td></tr>
<tr>
<td>Stack Policy</td><td>Yes (2)</td><td>Yes (2)</td><td>No</td></tr>
<tr>
<td>Stack Termination Protection</td><td>No</td><td>No</td><td>Yes</td></tr>
<tr>
<td>Resource Isolation</td><td>No (3)</td><td>No (3)</td><td>No (3)</td></tr>
</tbody>
</table>
<p>(1) You will need to manually review and approve the changes.</p>
<p>(2) Provided you configure the policy properly</p>
<p>(3) On its own, resource isolation will not protect any resource. You'll need to combine it with other solutions</p>
<p>As you can see, there is no one-fits-all solution (none of the rows has all <em>Yeses</em>). You will need to use more than one if you want full protection.</p>
<hr />
<h1 id="conclusion">Conclusion</h1>
<p>I just showed you 5 ways to avoid accidental deletion of CloudFormation resources:</p>
<ul>
<li><strong>Review the changeset</strong> is good if you want to sporadically review changes manually before applying some important changes.</li>
<li>The <strong>DeletionPolicy</strong> attribute will save your data in the event of a resource removal or stack deletion, but it won't help against resource replacement.</li>
<li><strong>Stack Policies</strong> will save you from accidentally removing a resource from the stack and changes that force a replacement. On the other hand, it won't be of any help if the stack is deleted altogether.</li>
<li><strong>Stack termination protection</strong> will only prevent accidental deletion of the stack.</li>
<li><strong>Placing sensitive resources in isolation</strong> will help against some human mistakes, but on its own, it will not protect your data.</li>
</ul>
<p>Use the one that best fits your needs and your particular use-cases. If you need complete protection, you can combine them together and benefit from several safety nets at the same time.</p>
<p>Hopefully, these measures will help you and your team sleep better at night 😴.</p>
<hr />
<p>If you would like to read more content like this, <a target="_blank" href="https://twitter.com/Benoit_Boure">follow me on Twitter</a> and subscribe to my brand new newsletter on Hashnode.</p>
]]></content:encoded></item><item><title><![CDATA[How to Store Large Attribute Values in DynamoDB]]></title><description><![CDATA[DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale. In order to keep up with its promises, there are a couple of constraints and good practices that you need to follow. One of them is to keep yo...]]></description><link>https://benoitboure.com/how-to-store-large-attribute-values-in-dynamodb</link><guid isPermaLink="true">https://benoitboure.com/how-to-store-large-attribute-values-in-dynamodb</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Amazon Web Services]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 15 Mar 2021 06:54:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1615737867834/Ax6664thb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale. In order to keep up with its promises, there are a couple of constraints and good practices that you need to follow. One of them is to keep your items as small as possible. This is true not only for performance but also for cost. With DynamoDB, you pay per amount of data that you read or write as well as for storage. Reducing your data size is important if you want to reduce your monthly bill.</p>
<p>On top of that, DynamoDB also comes with some hard limits, including:</p>
<ul>
<li>No single item can exceed 400 KB in size.</li>
<li>Query and Scan operations are limited to 1 MB of data <strong>scanned</strong> (after that, you will be forced to paginate).</li>
</ul>
<p>If you handle large amounts of data, you can hit those limitations very quickly.</p>
<p>For example, imagine that you are building a blog application (like Hashnode). You might store posts and comments in a DynamoDB table. These kinds of items contain free text that can be quite long and grow very fast. A blog post can easily reach 10 to 20 KB or more. When you know that half an RCU allows you to read 4 KB of data (provided that you are doing eventually consistent reads), we are talking about 2 to 3 RCUs for every read, if not more once you count your other attributes!</p>
<p>When dealing with such large data, <a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-use-s3-too.html">AWS recommends compressing them and storing them as Binary attributes</a>. In this blog post, I will show you how to compress long text strings with gzip and how to store them into DynamoDB. We will then inspect the read and write units consumed and compare them with the corresponding uncompressed version.</p>
<h1 id="1-writes">1. Writes</h1>
<blockquote>
<p>In this demo, I'll be using node.js but you should easily be able to apply these techniques to your favourite programming language.</p>
</blockquote>
<p>For the purpose of this test, we'll write a simple script that will first generate a dummy blog post using <a target="_blank" href="https://www.npmjs.com/package/lorem-ipsum">lorem-ipsum</a>. We will then save it to DynamoDB twice: once as raw text (uncompressed) and once compressed with gzip. We will also make use of the <code>ReturnConsumedCapacity</code> property of DynamoDB so that it returns the consumed capacity (WCU) for both operations.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">//write.js</span>
<span class="hljs-keyword">const</span> AWS = <span class="hljs-built_in">require</span>(<span class="hljs-string">'aws-sdk'</span>);
<span class="hljs-keyword">const</span> { loremIpsum } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'lorem-ipsum'</span>);
<span class="hljs-keyword">const</span> { gzipSync } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'zlib'</span>);

<span class="hljs-keyword">const</span> content = loremIpsum({
    <span class="hljs-attr">count</span>: <span class="hljs-number">20</span>,
    <span class="hljs-attr">units</span>: <span class="hljs-string">"paragraph"</span>,
    <span class="hljs-attr">format</span>: <span class="hljs-string">"plain"</span>,
    <span class="hljs-attr">paragraphLowerBound</span>: <span class="hljs-number">5</span>,
    <span class="hljs-attr">paragraphUpperBound</span>: <span class="hljs-number">15</span>,
    <span class="hljs-attr">sentenceLowerBound</span>: <span class="hljs-number">5</span>,
    <span class="hljs-attr">sentenceUpperBound</span>: <span class="hljs-number">15</span>,
    <span class="hljs-attr">suffix</span>: <span class="hljs-string">"\n\n\n"</span>,
});

<span class="hljs-comment">// output some stats about the text's length.</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Generated a text with <span class="hljs-subst">${content.length}</span> characters and <span class="hljs-subst">${content.split(<span class="hljs-string">' '</span>).length}</span> words`</span>);

<span class="hljs-comment">// compress the content</span>
<span class="hljs-keyword">const</span> compressed = gzipSync(content);

<span class="hljs-comment">// more stats about the content</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`total size (uncompressed): ~<span class="hljs-subst">${<span class="hljs-built_in">Math</span>.round(content.length/<span class="hljs-number">1024</span>)}</span> KB`</span>);
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`total size (compressed): ~<span class="hljs-subst">${<span class="hljs-built_in">Math</span>.round(compressed.length/<span class="hljs-number">1024</span>)}</span> KB`</span>);

<span class="hljs-comment">// config DynamoDB</span>
AWS.config.update({ <span class="hljs-attr">region</span>: <span class="hljs-string">'eu-west-1'</span> });
<span class="hljs-keyword">const</span> dynamoDbClient = <span class="hljs-keyword">new</span> AWS.DynamoDB();

dynamoDbClient.putItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Item"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"raw-blog-post"</span>
        },
        <span class="hljs-string">"title"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"My blog post"</span>
        },
        <span class="hljs-string">"content"</span>: {
            <span class="hljs-string">"S"</span>: content,
        }
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Write capacity for raw post'</span>, result.ConsumedCapacity );
});


dynamoDbClient.putItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Item"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"compressed-blog-post"</span>
        },
        <span class="hljs-string">"title"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"My blog post"</span>
        },
        <span class="hljs-string">"content"</span>: {
            <span class="hljs-string">"B"</span>: compressed,
        }
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Write capacity for compressed post'</span>, result.ConsumedCapacity );
});
</code></pre>
<p>In the script above, we generate a text of 20 paragraphs. Each paragraph has between 5 and 15 sentences, and each sentence is 5 to 15 words long. That is enough to generate a text of around 2,000 words. Then, we compress the text and save both versions into DynamoDB.</p>
<p>Let's run the script:</p>
<pre><code class="lang-bash">$ node write.js
Generated a text with 12973 characters and 1943 words
total size (uncompressed): ~13 KB
total size (compressed): ~4 KB
Write capacity <span class="hljs-keyword">for</span> compressed post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 4 }
Write capacity <span class="hljs-keyword">for</span> raw post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 14 }
</code></pre>
<p>As you can see, the raw text was around 13 KB and consumed 14 WCUs, while the compressed one was only 4 KB and consumed 4 WCUs. That looks right, since 1 WCU accounts for 1 KB of data. </p>
<p>By compressing the data we just saved ourselves 10 WCUs. That's a 70% gain! Not only that, but we also reduced the item size by 70%!  Since DynamoDB also charges us for storage, that can make a <strong>huge</strong> difference on our AWS bill! 🎉</p>
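<p>You can sanity-check those numbers: one WCU covers up to 1 KB, so a write costs roughly the item size rounded up to the next kilobyte. A quick sketch (note that the real item size also includes attribute names and the other attributes, which is why the raw post actually costs 14 WCUs rather than 13):</p>

```javascript
// Rough WCU estimate for a standard write: 1 WCU per started KB.
const estimateWcus = (sizeBytes) => Math.ceil(sizeBytes / 1024);

console.log(estimateWcus(4 * 1024));  // 4 (compressed post content)
console.log(estimateWcus(13 * 1024)); // 13 (raw post content alone)
```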
<h1 id="2-reads">2. Reads</h1>
<p>Now that we saved our blog post in DynamoDB we want to read it back. Let's create a new script that will read the items back and see how many RCUs they are consuming.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">//read.js</span>
<span class="hljs-keyword">const</span> AWS = <span class="hljs-built_in">require</span>(<span class="hljs-string">'aws-sdk'</span>);
<span class="hljs-keyword">const</span> { gunzipSync } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'zlib'</span>);

AWS.config.update({ <span class="hljs-attr">region</span>: <span class="hljs-string">'eu-west-1'</span> });
<span class="hljs-keyword">const</span> dynamoDbClient = <span class="hljs-keyword">new</span> AWS.DynamoDB();

dynamoDbClient.getItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Key"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"raw-blog-post"</span>
        },
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Read capacity for raw post'</span>, result.ConsumedCapacity );
});


dynamoDbClient.getItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Key"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"compressed-blog-post"</span>
        },
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Read capacity for compressed post'</span>, result.ConsumedCapacity );
    <span class="hljs-comment">// uncompress post content</span>
    <span class="hljs-keyword">const</span> content = gunzipSync(result.Item.content.B).toString();
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Original text with <span class="hljs-subst">${content.length}</span> characters and <span class="hljs-subst">${content.split(<span class="hljs-string">' '</span>).length}</span> words`</span>);
});
</code></pre>
<p>Let's run it:</p>
<pre><code class="lang-bash">$ node read.js
Read capacity <span class="hljs-keyword">for</span> compressed post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 0.5 }
Original text with 12973 characters and 1943 words
Read capacity <span class="hljs-keyword">for</span> raw post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 2 }
</code></pre>
<p>At read time, we only consumed 0.5 RCUs against 2 for the uncompressed version. That's 4 times less! And as you can see, it is just as easy to uncompress the data back into its original form.</p>
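<p>The read-side arithmetic works the same way: one RCU covers up to 4 KB, and eventually consistent reads cost half. A quick sketch (item sizes approximated from the output above):</p>

```javascript
// RCU estimate for an eventually consistent GetItem:
// 0.5 RCU per started 4 KB block.
const estimateRcus = (sizeBytes) => 0.5 * Math.ceil(sizeBytes / 4096);

console.log(estimateRcus(4 * 1024));  // 0.5 (compressed post)
console.log(estimateRcus(13 * 1024)); // 2 (raw post)
```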
<h1 id="3-secondary-indexes">3. Secondary indexes</h1>
<p>Before we call it a day, there is one last test I'd like to make. Sometimes, you want to add secondary indexes to your table. In our blog example, we could add a GSI that will index blog posts by author and sort them by timestamp. One could argue that you should probably avoid projecting the entire blog content in all your indexes (and I would definitely agree with that), but sometimes, you might not have a choice; and for the sake of completeness, we'll try it out.</p>
<p>Let's create another script that will test just that. I'm not going to copy the full script again here. Instead, just know that I added a <code>timestamp</code> attribute and I created a GSI index that projects all the attributes (Index name: <code>timestamp</code>, PK: <code>author</code>, SK: <code>timestamp</code>).</p>
<pre><code class="lang-bash">$ node write.js
Generated a text with 13673 characters and 1986 words
total size (uncompressed): ~13 KB
total size (compressed): ~4 KB
Write capacity <span class="hljs-keyword">for</span> compressed post {
  TableName: <span class="hljs-string">'blog'</span>,
  CapacityUnits: 12,
  Table: { CapacityUnits: 4 },
  GlobalSecondaryIndexes: { timestamp: { CapacityUnits: 8 } }
}
Write capacity <span class="hljs-keyword">for</span> raw post {
  TableName: <span class="hljs-string">'blog'</span>,
  CapacityUnits: 42,
  Table: { CapacityUnits: 14 },
  GlobalSecondaryIndexes: { timestamp: { CapacityUnits: 28 } }
}
</code></pre>
<p>As you can see, GSIs can be greedy in capacity units. That is because every write you make must be replicated to all your indexes. Our secondary index alone consumed 8 WCUs for the compressed post and a whopping 28 WCUs for the uncompressed version! Add the 4 and 14 WCUs that correspond to the table to that and we are at 12 vs 42 WCUs!</p>
<blockquote>
<p>Note: To be honest, I was expecting the GSI to consume the same amount of WCU as the table index (i.e.: 4 and 14). For some reason that I still don't understand, that amount is doubled. I could not find any information about why that happens. If you happen to know, please don't hesitate to drop a comment below. 🙏</p>
</blockquote>
<p>Here, even though the saving in terms of percentage is about the same (70%), the difference of capacity units starts to increase. We consumed 30 WCUs less with compressed content! Over time, this can quickly make a difference.</p>
<p>Note that here, there would be no difference in terms of RCUs when reading the data back. DynamoDB will read from the index that you provide in the query, and that index only.</p>
<h1 id="conclusion">Conclusion</h1>
<p>We just learned that by compressing large contents before saving them in DynamoDB, you can expect to save up to 70% in WCUs, RCUs and storage cost. This is significant enough to justify the extra effort of compressing/decompressing the data as you read/write it.</p>
<p>If you'd like to read more content like this, <a target="_blank" href="https://twitter.com/Benoit_Boure">follow me on Twitter</a></p>
]]></content:encoded></item><item><title><![CDATA[How to use TypeScript with AppSync Lambda Resolvers]]></title><description><![CDATA[✏️ Edit - 2023-04-11: If you're interested in creating JavaScript resolvers with TypeScript rather than utilizing Lambda functions, check out this alternative article.

One of the great benefits of GraphQL is typing! Define your schema, and GraphQL e...]]></description><link>https://benoitboure.com/how-to-use-typescript-with-appsync-lambda-resolvers</link><guid isPermaLink="true">https://benoitboure.com/how-to-use-typescript-with-appsync-lambda-resolvers</guid><category><![CDATA[aws lambda]]></category><category><![CDATA[lambda]]></category><category><![CDATA[AWS]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[GraphQL]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Wed, 03 Mar 2021 21:29:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1614692037497/pr_Jgcv1q.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>✏️ Edit - 2023-04-11: If you're interested in creating JavaScript resolvers with TypeScript rather than utilizing Lambda functions, check out <a target="_blank" href="https://blog.graphbolt.dev/improving-developer-experience-with-typescript-how-to-write-strongly-typed-appsync-resolvers">this alternative article</a>.</p>
</blockquote>
<p>One of the great benefits of GraphQL is typing! Define your schema, and GraphQL enforces the input/output "shape" of your endpoints' data.</p>
<p>If you are using Lambda as your AppSync resolvers with the <em>node.js</em> runtime, you might be using TypeScript, too. If you do, you might also be defining TS types that correspond to your schema. Doing this manually can be tedious, is prone to error, and is basically doing the same job twice! 🙁 Wouldn't it be great if you could import your GraphQL types into your code automatically?</p>
<p>In this article, I'll show you how to generate TypeScript types directly from your GraphQL schema, just by running a simple command line. Then, I'll teach you how to use those types in your Lambda resolvers.</p>
<p>Let's begin.</p>
<h1 id="heading-pre-requisites">Pre-requisites</h1>
<p>You should already have a basic AppSync project set up with a defined GraphQL schema (if you don't have one, you can use the example below).</p>
<p>For the purpose of this tutorial, I will take this simple schema as an example:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">type</span> Query {
    post(<span class="hljs-symbol">id:</span> ID!): Post
}

<span class="hljs-keyword">type</span> Mutation {
    createPost(<span class="hljs-symbol">post:</span> PostInput!): Post!
}

<span class="hljs-keyword">type</span> Post {
    <span class="hljs-symbol">id:</span> ID!
    <span class="hljs-symbol">title:</span> String!
    <span class="hljs-symbol">content:</span> String!
    <span class="hljs-symbol">publishedAt:</span> AWSDateTime
}

<span class="hljs-keyword">input</span> PostInput {
    <span class="hljs-symbol">title:</span> String!
    <span class="hljs-symbol">content:</span> String!
}
</code></pre>
<h1 id="heading-setting-up-the-project">Setting up the project</h1>
<h2 id="heading-install-the-dependencies">Install the dependencies</h2>
<p>We will need to install three packages:</p>
<pre><code class="lang-bash">npm i @graphql-codegen/cli @graphql-codegen/typescript @types/aws-lambda  -D
</code></pre>
<p>The first two packages belong to the <a target="_blank" href="https://github.com/dotansimha/graphql-code-generator">graphql-code-generator</a> suite. The first one is the base CLI, while the second one is the plugin that generates TypeScript code from a GraphQL schema.</p>
<p><a target="_blank" href="https://www.npmjs.com/package/@types/aws-lambda">@types/aws-lambda</a> is a collection of TypeScript types for AWS Lambda. It includes all sorts of Lambda event type definitions (API gateway, S3, SNS, etc.), including one for AppSync resolvers (<code>AppSyncResolverHandler</code>). We'll use that last one later when we build our resolvers.</p>
<h2 id="heading-create-the-configuration-file">Create the configuration file</h2>
<p>It's time to configure <code>graphql-codegen</code> and tell it how to generate our TS types. For that, we'll create a <code>codegen.yml</code> file:</p>
<pre><code class="lang-yml"><span class="hljs-attr">overwrite:</span> <span class="hljs-literal">true</span>
<span class="hljs-attr">schema:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">schema.graphql</span> <span class="hljs-comment">#your schema file</span>

<span class="hljs-attr">generates:</span>
  <span class="hljs-attr">appsync.d.ts:</span>
    <span class="hljs-attr">plugins:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">typescript</span>
</code></pre>
<p>This tells <em>codegen</em> which schema file(s) to use (here: <code>schema.graphql</code>), which plugin to apply (<code>typescript</code>), and where the output should go (<code>appsync.d.ts</code>). Feel free to change these parameters to match your needs.</p>
<h2 id="heading-support-for-aws-scalars">Support for AWS Scalars</h2>
<p>If you are using special <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/scalars.html">AWS AppSync Scalars</a>, you will also need to tell <code>graphql-codegen</code> how to handle them.</p>
<blockquote>
<p>💡 You need to declare, at a minimum, the scalars that you use, but it might be a good idea to just declare them all and forget about it.</p>
</blockquote>
<p>Let's create a new <code>appsync.graphql</code> file with the following content:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">scalar</span> AWSDate
<span class="hljs-keyword">scalar</span> AWSTime
<span class="hljs-keyword">scalar</span> AWSDateTime
<span class="hljs-keyword">scalar</span> AWSTimestamp
<span class="hljs-keyword">scalar</span> AWSEmail
<span class="hljs-keyword">scalar</span> AWSJSON
<span class="hljs-keyword">scalar</span> AWSURL
<span class="hljs-keyword">scalar</span> AWSPhone
<span class="hljs-keyword">scalar</span> AWSIPAddress
</code></pre>
<blockquote>
<p>⚠️ Don't place these types in the same file as your main schema. You only need them for code generation and they should not get into your deployment package to AWS AppSync.</p>
</blockquote>
<p>We also need to tell codegen how to map these scalars to TypeScript. For that, we will modify the <code>codegen.yml</code> file. Add/edit the following sections:</p>
<pre><code class="lang-yml"><span class="hljs-attr">schema:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">schema.graphql</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">appsync.graphql</span> <span class="hljs-comment"># 👈 add this</span>

<span class="hljs-comment"># and this 👇</span>
<span class="hljs-attr">config:</span>
  <span class="hljs-attr">scalars:</span>
    <span class="hljs-attr">AWSJSON:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSDate:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSTime:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSDateTime:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSTimestamp:</span> <span class="hljs-string">number</span>
    <span class="hljs-attr">AWSEmail:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSURL:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSPhone:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSIPAddress:</span> <span class="hljs-string">string</span>
</code></pre>
<h1 id="heading-generate-the-code">Generate the code</h1>
<p>We are all set with the configuration. Time to generate some code! Run the following command:</p>
<pre><code class="lang-bash">graphql-codegen
</code></pre>
<blockquote>
<p>💡 You can also add <code>"codegen": "graphql-codegen"</code> to your package.json under the "scripts" section, and use <code>npm run codegen</code>.</p>
</blockquote>
<p>If you look in your working directory, you should now see an <code>appsync.d.ts</code> file that contains your generated types.</p>
<pre><code class="lang-ts"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Maybe&lt;T&gt; = T | <span class="hljs-literal">null</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Exact&lt;T <span class="hljs-keyword">extends</span> { [key: <span class="hljs-built_in">string</span>]: unknown }&gt; = { [K <span class="hljs-keyword">in</span> keyof T]: T[K] };
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MakeOptional&lt;T, K <span class="hljs-keyword">extends</span> keyof T&gt; = Omit&lt;T, K&gt; &amp; { [SubKey <span class="hljs-keyword">in</span> K]?: Maybe&lt;T[SubKey]&gt; };
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MakeMaybe&lt;T, K <span class="hljs-keyword">extends</span> keyof T&gt; = Omit&lt;T, K&gt; &amp; { [SubKey <span class="hljs-keyword">in</span> K]: Maybe&lt;T[SubKey]&gt; };
<span class="hljs-comment">/** All built-in and custom scalars, mapped to their actual values */</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Scalars = {
  ID: <span class="hljs-built_in">string</span>;
  <span class="hljs-built_in">String</span>: <span class="hljs-built_in">string</span>;
  <span class="hljs-built_in">Boolean</span>: <span class="hljs-built_in">boolean</span>;
  Int: <span class="hljs-built_in">number</span>;
  Float: <span class="hljs-built_in">number</span>;
  AWSDate: <span class="hljs-built_in">string</span>;
  AWSTime: <span class="hljs-built_in">string</span>;
  AWSDateTime: <span class="hljs-built_in">string</span>;
  AWSTimestamp: <span class="hljs-built_in">number</span>;
  AWSEmail: <span class="hljs-built_in">string</span>;
  AWSJSON: <span class="hljs-built_in">string</span>;
  AWSURL: <span class="hljs-built_in">string</span>;
  AWSPhone: <span class="hljs-built_in">string</span>;
  AWSIPAddress: <span class="hljs-built_in">string</span>;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Query = {
  __typename?: <span class="hljs-string">'Query'</span>;
  post?: Maybe&lt;Post&gt;;
};


<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> QueryPostArgs = {
  id: Scalars[<span class="hljs-string">'ID'</span>];
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Mutation = {
  __typename?: <span class="hljs-string">'Mutation'</span>;
  createPost: Post;
};


<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MutationCreatePostArgs = {
  post: PostInput;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Post = {
  __typename?: <span class="hljs-string">'Post'</span>;
  id: Scalars[<span class="hljs-string">'ID'</span>];
  title: Scalars[<span class="hljs-string">'String'</span>];
  content: Scalars[<span class="hljs-string">'String'</span>];
  publishedAt?: Maybe&lt;Scalars[<span class="hljs-string">'AWSDateTime'</span>]&gt;;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> PostInput = {
  title: Scalars[<span class="hljs-string">'String'</span>];
  content: Scalars[<span class="hljs-string">'String'</span>];
};
</code></pre>
<p>Notice that, apart from some helper types at the top, several different types are generated:</p>
<ul>
<li><code>Scalars</code></li>
</ul>
<p>Contains all the basic scalars (ID, String, etc.) and the AWS custom Scalars.</p>
<ul>
<li><code>Query</code> and <code>Mutation</code></li>
</ul>
<p>These two types describe the full Query and Mutation types.</p>
<ul>
<li><code>Post</code></li>
</ul>
<p>This is our <em>Post</em> type from our schema translated into TypeScript. It is also the <em>return</em> value of the <code>post</code> query and the <code>createPost</code> mutation.</p>
<ul>
<li><code>QueryPostArgs</code> and <code>MutationCreatePostArgs</code></li>
</ul>
<p>These types describe the input <em>arguments</em> of the <code>post</code> Query and the <code>createPost</code> mutation, respectively.</p>
<blockquote>
<p>💡 Did you notice the name pattern here? Argument types are always named <code>Query[NameOfTheEndpoint]Args</code> and <code>Mutation[NameOfTheEndpoint]Args</code> in PascalCase. This is useful to know when you want to auto-complete types in your IDE.</p>
</blockquote>
<h1 id="heading-use-the-generated-types">Use the generated types</h1>
<p>Now that we have generated our types, it's time to use them!</p>
<p>Let's implement the <code>Query.post</code> resolver as an example.</p>
<p>Lambda handlers always receive 3 arguments:</p>
<ul>
<li><p><code>event</code>: contains information about the input query (arguments, identity, etc.)</p>
</li>
<li><p><code>context</code>: contains information about the executed Lambda function</p>
</li>
<li><p><code>callback</code>: a function you can call when your handler finishes (if you are not using async/promises)</p>
</li>
</ul>
<p>The shape of an AppSync handler is almost always the same. It turns out that there is a <a target="_blank" href="https://www.npmjs.com/package/@types/aws-lambda">DefinitelyTyped package</a> that already defines it. We installed it at the beginning of this tutorial. Let's use it!</p>
<p>The <code>AppSyncResolverHandler</code> type takes two arguments. The first one is the type for the <code>event.arguments</code> object, and the second one is the return value of the resolver.</p>
<p>In our case that will be: <code>QueryPostArgs</code> and <code>Post</code>, respectively.</p>
<p>Here is how to use it:</p>
<pre><code class="lang-ts"><span class="hljs-keyword">import</span> db <span class="hljs-keyword">from</span> <span class="hljs-string">'./db'</span>;
<span class="hljs-keyword">import</span> { AppSyncResolverHandler } <span class="hljs-keyword">from</span> <span class="hljs-string">'aws-lambda'</span>;
<span class="hljs-keyword">import</span> {Post, QueryPostArgs} <span class="hljs-keyword">from</span> <span class="hljs-string">'./appsync'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> handler: AppSyncResolverHandler&lt;QueryPostArgs, Post&gt; = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> post = <span class="hljs-keyword">await</span> db.getPost(event.arguments.id);

    <span class="hljs-keyword">if</span> (post) {
        <span class="hljs-keyword">return</span> post;
    }

    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Not Found'</span>);
};
</code></pre>
<p>Now, our Lambda handler benefits from type-checking in 2 ways:</p>
<ul>
<li><p><code>event.arguments</code> will be of type <code>QueryPostArgs</code> (with the benefits of auto-complete!)</p>
</li>
<li><p>the <em>return</em> value, or the second argument of the <code>callback</code>, is expected to have the same shape as <code>Post</code> (with an id, title, etc.); otherwise, TypeScript will show you an error.</p>
</li>
</ul>
<h1 id="heading-advanced-usage">Advanced usage</h1>
<p>There are lots of options that let you customize your generated types. Check out <a target="_blank" href="https://graphql-code-generator.com/docs/plugins/typescript">the documentation</a> for more details!</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>By auto-generating types, you will not only improve your development speed and experience but will also ensure that your resolvers do what your API is expecting. You also ensure that your code types and your schema types are always in perfect sync, avoiding mismatches that could lead to bugs.</p>
<p>Don't forget to re-run the <code>graphql-codegen</code> command each time you edit your schema! It might be a good idea to automate the process or validate your types in your CI/CD pipeline.</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How to use DynamoDB single-table design with AppSync]]></title><description><![CDATA[AppSync is a fully managed service from AWS that lets you build and deploy scalable and secure GraphQL APIs in the cloud. DynamoDB is a NoSQL fully managed and scalable database. Both being serverless services, they are very often used together.
If y...]]></description><link>https://benoitboure.com/how-to-use-dynamodb-single-table-design-with-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-to-use-dynamodb-single-table-design-with-appsync</guid><category><![CDATA[GraphQL]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[APIs]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Thu, 25 Feb 2021 19:57:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1614262750324/87GtUFg2G.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AppSync is a fully managed service from AWS that lets you build and deploy scalable and secure GraphQL APIs in the cloud. DynamoDB is a NoSQL fully managed and scalable database. Both being serverless services, they are very often used together.</p>
<p>If you are interested in DynamoDB, you have also probably already heard about single-table design. However, that way of designing databases is <a target="_blank" href="https://www.alexdebrie.com/posts/dynamodb-single-table/#graphql--single-table-design">often considered to be unhelpful with GraphQL</a>.</p>
<p>In this blog post, I will share some ideas on how we can still use single-table design and its benefits with GraphQL, and show some techniques that I use with AWS AppSync.</p>
<p>Let's begin.</p>
<h1 id="heading-its-all-about-access-pattern">It's all about access pattern</h1>
<p>If you have watched Rick Houlihan's talks at AWS re:invent (In <a target="_blank" href="https://www.youtube.com/watch?v=jzeKPKpucS0">2017</a>, <a target="_blank" href="https://www.youtube.com/watch?v=HaEPXoXVf2k">2018</a> and <a target="_blank" href="https://www.youtube.com/watch?v=6yqfmXiZTlM">2019</a>), or read <a target="_blank" href="https://www.dynamodbbook.com/">Alex DeBrie's book</a> (I totally recommend it if you haven't), you probably know it by now: The key to success with single table design is that you should <strong>know your access patterns in advance</strong>.</p>
<p>It is likely that your access patterns will reflect what the different pages or views of your application will show. Something like:</p>
<blockquote>
<p>For a given user, show user details and his/her last 10 orders</p>
</blockquote>
<p>Then as the user drills down to a particular order, you will show the order and all its order items (That is a second access pattern).</p>
<p>However, one of the key features of GraphQL is that you can fetch nested children as deep as you need. In our example, you could ask "Give me that user's details with his/her last 10 orders <strong>and</strong> all the items of those orders".</p>
<p>Under the hood, GraphQL uses <em>resolvers</em> that fetch data from the persistence storage and return it in the query response. Most of the time, each child is a different resolver that receives the parent (or source) data, that you can use to fetch related data. This is how GraphQL "joins" data, and this is probably the way you have been using DynamoDB with GraphQL so far. DynamoDB doesn't have JOINs, GraphQL fills that gap for you!</p>
<p>With all that in mind, it does not look like DynamoDB single-table design has many benefits to bring to GraphQL.</p>
<p>But wait a minute...</p>
<p>One of the core principles of GraphQL is <em>no under- or over-fetching</em>. GraphQL lets you choose which fields and children you need in your client application and fetch those fields specifically. In our previous example, even though GraphQL allows you to, will you ever run a query that fetches all the users, and all the orders, and all the items in your client application? Unless you are building a public API with unpredictable access patterns, chances are that the answer is <em>No</em> (except for debugging or exploring your data, maybe).</p>
<p>It's all about access patterns! Just as I explained earlier, you will probably show a list of orders first, and when the user clicks on an order you will show the details. That's <strong>two</strong> different queries. They might look like this:</p>
<p>Query 1: Fetch a User and related orders:</p>
<pre><code class="lang-graphql">  user(<span class="hljs-symbol">id:</span> <span class="hljs-string">"123"</span>) {
    id
    email
    name
    orders {
      id
      orderDate
      shippedDate
    }
  }
</code></pre>
<p>Query 2: Fetch an Order and related items:</p>
<pre><code class="lang-graphql">order(<span class="hljs-symbol">id:</span> <span class="hljs-string">"456"</span>) {
    id
    orderDate
    shippedDate
    items {
      productId
      name
      quantity
      price
    }
}
</code></pre>
<p>We still have the same 2 access patterns from the beginning.</p>
<p>By now, you might be thinking:</p>
<blockquote>
<p>But I still need one resolver for each entity</p>
</blockquote>
<p>No, you don't! We'll see how in the next section.</p>
<h1 id="heading-build-your-resolvers-with-your-access-patterns-in-mind">Build your resolvers with your access patterns in mind</h1>
<p>Maybe one of the most common misconceptions with GraphQL is that each nested entity needs its own resolver. However, this does not have to be the case. You can easily return child elements from the parent resolver; if you do so, you don't need a child resolver at all. This is often possible with DynamoDB if you denormalise some relations into the parent item (i.e., in a Map or a List attribute). But it can also work when the entities are decoupled but you are able to fetch them all in one query, for example with a JOIN (with an RDBMS) or when your items live under the same partition (with DynamoDB).</p>
<p>Let's see how this can work in our example and what kind of resolvers we need.</p>
<p>Our <em>user</em> resolver can probably get the user <strong>and</strong> the orders from DynamoDB in a single query, then return them all in one resolver; while the <em>order</em> resolver can do the same with a particular order and its items.</p>
<p>Here is what the <em>order</em> resolver might look like (<em>request</em> template):</p>
<pre><code class="lang-json">## Fetch the Order and OrderItems (they are under the same partition)
{
    <span class="hljs-attr">"version"</span>: <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-attr">"operation"</span>: <span class="hljs-string">"Query"</span>,
    <span class="hljs-attr">"query"</span>: {
      <span class="hljs-attr">"expression"</span>: <span class="hljs-string">"#PK = :PK"</span>,
      <span class="hljs-attr">"expressionNames"</span>: {
        <span class="hljs-attr">"#PK"</span>: <span class="hljs-string">"PK"</span>
      },
      <span class="hljs-attr">"expressionValues"</span>: {
        <span class="hljs-attr">":PK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDER#${context.args.id}"</span>)
      }
    }
}
</code></pre>
<p>and the <em>response</em> template</p>
<pre><code class="lang-typescript">## re-organize the data
#<span class="hljs-keyword">if</span>($context.result.items.size() == <span class="hljs-number">0</span>)
  $utils.error(<span class="hljs-string">"NotFound"</span>, <span class="hljs-string">"NotFound"</span>);
#<span class="hljs-keyword">else</span>
  #set ($order = {})
  #foreach($item <span class="hljs-keyword">in</span> $context.result.items)
    #<span class="hljs-keyword">if</span>($item.SK.startsWith(<span class="hljs-string">"ORDER#"</span>))
      #set ($order = $item)
      $util.qr($order.put(<span class="hljs-string">"items"</span>, []))
    #<span class="hljs-keyword">else</span>
      #<span class="hljs-keyword">if</span>($item.SK.startsWith(<span class="hljs-string">"ORDERITEM#"</span>))
        $util.qr($order.items.add($item))
      #end
    #end
  #end
  $utils.toJson($order)
#end
</code></pre>
<p>In the response template, we receive all the items in a single array. All we have to do is re-organize them a little (we embed the items inside the order itself). We then return the whole thing.</p>
<p>We're sending only one query to DynamoDB! 🎉</p>
<h2 id="heading-the-drawbacks">The drawbacks</h2>
<p>This way of doing things comes with a couple of issues though:</p>
<ol>
<li><p>If you only need fields from the order (your query does not include the <em>items</em> field), you will be over-fetching.</p>
</li>
<li><p>It only works with that single access pattern. If you need to access the order items from, say, the <em>updateOrder</em> endpoint, it won't work because they won't come with that DynamoDB access pattern.</p>
</li>
</ol>
<p>Let's tackle these issues one at a time.</p>
<h3 id="heading-avoid-over-fetching-children-entities-if-they-are-not-explicitly-included-in-the-query">Avoid over-fetching children entities if they are not explicitly included in the query</h3>
<p>That one is easy. Every AppSync query comes with a <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/resolver-context-reference.html">Context object</a> that contains information about the query: things like <code>args</code>, <code>source</code>, <code>identity</code>, <code>request</code> and <code>info</code>. The last one is the one we're interested in. It gives us information about the query; more specifically, its <code>selectionSetList</code> attribute tells us which <strong>fields</strong> were requested in the query. We can use that to change our request to DynamoDB and include the order items, or not, depending on its value. Let's adjust our request template to use it.</p>
<pre><code class="lang-typescript">#set($expression=<span class="hljs-string">"#PK = :PK"</span>)
#set($expressionNames={<span class="hljs-string">"#PK"</span>: <span class="hljs-string">"PK"</span>})
#set($expressionValues={<span class="hljs-string">":PK"</span>: $util.dynamodb.toDynamoDB(<span class="hljs-string">"ORDER#$context.args.id"</span>)})
## <span class="hljs-keyword">if</span> the selectionSetList does not contain the items, we fetch only the order
#<span class="hljs-keyword">if</span>(!$ctx.info.selectionSetList.contains(<span class="hljs-string">"items"</span>))
  #set($expression=$expression + <span class="hljs-string">" and #SK = :SK"</span>)
  $util.qr($expressionNames.put(<span class="hljs-string">"#SK"</span>, <span class="hljs-string">"SK"</span>))
  $util.qr($expressionValues.put(<span class="hljs-string">":SK"</span>, $util.dynamodb.toDynamoDB(<span class="hljs-string">"ORDER#${context.args.id}"</span>)))
#end
{
    <span class="hljs-string">"version"</span> : <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-string">"operation"</span> : <span class="hljs-string">"Query"</span>,
    <span class="hljs-string">"query"</span> : {
      <span class="hljs-string">"expression"</span>: <span class="hljs-string">"$expression"</span>,
      <span class="hljs-string">"expressionNames"</span>: $util.toJson($expressionNames),
      <span class="hljs-string">"expressionValues"</span>: $util.toJson($expressionValues)
    }
}
</code></pre>
<p>When the <code>selectionSetList</code> does not include "items", we limit the Query to only the Order item itself by adding an SK condition. Now we only fetch what we need when we need it.</p>
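<p>The decision itself can be sketched as a plain TypeScript function (illustrative only; the values are shown unmarshalled, without the <code>$util.dynamodb</code> wrappers used in the template above):</p>

```typescript
// Sketch: build the DynamoDB key condition depending on whether the
// GraphQL selection set asks for the order's "items" field.
type KeyCondition = {
  expression: string;
  expressionNames: Record<string, string>;
  expressionValues: Record<string, string>;
};

export function orderKeyCondition(
  orderId: string,
  selectionSetList: string[]
): KeyCondition {
  const condition: KeyCondition = {
    expression: "#PK = :PK",
    expressionNames: { "#PK": "PK" },
    expressionValues: { ":PK": `ORDER#${orderId}` },
  };
  // Without "items" in the selection set, restrict the query to the
  // order record itself by also pinning the sort key.
  if (!selectionSetList.includes("items")) {
    condition.expression += " and #SK = :SK";
    condition.expressionNames["#SK"] = "SK";
    condition.expressionValues[":SK"] = `ORDER#${orderId}`;
  }
  return condition;
}
```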
<h3 id="heading-make-other-access-patterns-to-work">Make other access patterns work</h3>
<p>This one is a little more tricky. If we want the <em>updateOrder</em> query to return order items as well, we will need to do it in 2 steps. This means that we will need a resolver for the order items. Unfortunately, here we have no choice. Let's write our <em>order.items</em> resolver.</p>
<pre><code class="lang-typescript">{
    <span class="hljs-string">"version"</span> : <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-string">"operation"</span> : <span class="hljs-string">"Query"</span>,
    <span class="hljs-string">"query"</span> : {
      <span class="hljs-string">"expression"</span>: <span class="hljs-string">"#PK = :PK and begins_with(#SK, :SK)"</span>,
      <span class="hljs-string">"expressionNames"</span> : {
        <span class="hljs-string">"#PK"</span> : <span class="hljs-string">"PK"</span>,
        <span class="hljs-string">"#SK"</span> : <span class="hljs-string">"SK"</span>
      },
      <span class="hljs-string">"expressionValues"</span> : {
        <span class="hljs-string">":PK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDER#${context.source.id}"</span>),
        <span class="hljs-string">":SK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDERITEM#"</span>)
      }
    }
}
</code></pre>
<p>Now we can see the related order items for an Order in any query. And this is probably what you want! Otherwise, your GraphQL API would be somewhat inconsistent, with data returned in some requests and not in others.</p>
<p>But wait! We just broke our previous access pattern! Since resolvers are associated with a Type (in our case: Order), and the type is the same, that new resolver will be used by the <em>order</em> endpoint, too. This means that the extra query will also be executed in that case, making all the effort we have made so far useless. Worse, we would be fetching the order items <strong>twice</strong>! Is there a way we can avoid that?</p>
<p>Remember the Context object? It also comes with the <code>source</code> attribute. We even just used it to get the id of the Order and fetch the related order items. That object actually comes with the <strong>full</strong> result from the previous (parent) resolver, including the order <em>items</em>, if any. We can use that in our <em>order.items</em> resolver and avoid the extra query <em>if</em> the order items come pre-populated from the source. For that, we can use the <em>#return</em> directive.</p>
<pre><code class="lang-typescript">#<span class="hljs-keyword">if</span>($ctx.source.items)
  #<span class="hljs-keyword">return</span>($ctx.source.items)
#<span class="hljs-keyword">else</span>
{
    <span class="hljs-string">"version"</span> : <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-string">"operation"</span> : <span class="hljs-string">"Query"</span>,
    <span class="hljs-string">"query"</span> : {
      <span class="hljs-string">"expression"</span>: <span class="hljs-string">"#PK = :PK and begins_with(#SK, :SK)"</span>,
      <span class="hljs-string">"expressionNames"</span> : {
        <span class="hljs-string">"#PK"</span> : <span class="hljs-string">"PK"</span>,
        <span class="hljs-string">"#SK"</span> : <span class="hljs-string">"SK"</span>
      },
      <span class="hljs-string">"expressionValues"</span> : {
        <span class="hljs-string">":PK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDER#${context.source.id}"</span>),
        <span class="hljs-string">":SK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDERITEM#"</span>)
      }
    }
}
#end
</code></pre>
<p>By returning early in the request template, the DynamoDB query will not be executed at all, and the data from the previous resolver will simply pass through.</p>
<h2 id="heading-limitations">Limitations</h2>
<p>The techniques I just showed you have some other limitations to keep in mind.</p>
<h3 id="heading-deep-nesting">Deep nesting</h3>
<p>This technique only works well when you have 2 levels of nesting. In DynamoDB, with single-table design, you will almost never group more than 2 levels of hierarchy under the same partition key. You will probably not store orders and order items under the user PK. Instead, you will store the items under a GSI with the order id as the PK. You won't be able to fetch all these items in one single query, and you will need at least two. Usually, you will be able to group entities two by two (4 levels of hierarchy = 2 grouped queries to DynamoDB).</p>
<p>That said, it still really depends on your access pattern. If your client API almost always fetches 3 levels, or more, of hierarchy in a single query, you might still group them under the same PK and filter/re-order the items in your top resolver. It might just become more complicated to maintain and you might be reading more than you need in some cases. You might also hit other limitations like the 1MB DynamoDB limit a lot faster.</p>
<h3 id="heading-pagination">Pagination</h3>
<p>In our example, our users will have an unbounded number of Orders and you probably don't want to return them all in one single query. You will for example get the last 10 in one query, and then paginate. You might want to have a query like this one:</p>
<pre><code class="lang-graphql">  user(<span class="hljs-symbol">id:</span> <span class="hljs-string">"123"</span>) {
    id
    email
    name
    orders(<span class="hljs-symbol">nextToken:</span> <span class="hljs-string">"ey........"</span>) {
      id
      orderDate
      shippedDate
    }
  }
</code></pre>
<p>Our design simply will not work in this case, because when you pass <code>nextToken</code>, you won't get the Order items at all in the DynamoDB response. In fact, you won't even have access to the <code>nextToken</code> argument from your <em>user</em> resolver.</p>
<p>That said, it would probably also be a bad design for your GraphQL API. Do you really want to bring back the user with every page of orders? Probably not. If you need to paginate, you should probably have another endpoint in your API, something like <code>ordersByUserId(userId: ID!): [Order]</code>, and use that instead.</p>
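<p>For reference, such a paginated resolver would typically round-trip DynamoDB's <code>LastEvaluatedKey</code> through the <code>nextToken</code>. A sketch of one possible encoding (hypothetical; AppSync's own tokens are opaque):</p>
<pre><code class="lang-python">import base64
import json

def encode_next_token(last_evaluated_key):
    # No more pages: no token to hand back to the client.
    if last_evaluated_key is None:
        return None
    raw = json.dumps(last_evaluated_key).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_next_token(token):
    # Turn the client's token back into an ExclusiveStartKey.
    if token is None:
        return None
    return json.loads(base64.b64decode(token))
</code></pre>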
<h3 id="heading-sorting">Sorting</h3>
<p>Sorting is also limited. You might, for instance, sometimes want the last 10 orders and sometimes the first 10 for a given user. But you are limited to one direction, depending on where you placed the User item in the partition. If it is at the beginning (and your orders are sorted by date), you will get the user's first orders; if it is at the end, the last ones. Sorting is basically limited to how you designed your table in the first place.</p>
<p>If you need two ways of sorting orders (ASC and DESC), you in fact have two access patterns. What you would normally do is add a GSI to your table for the second access pattern. You could then use one index or the other depending on the direction requested by the query. That's another level of complexity to take into consideration.</p>
<p>If you use a sub-resolver (one for the User and one for the Orders), all you would have to do is change the <code>ScanIndexForward</code> param in your <em>items</em> query.</p>
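<p>As a sketch (in Python, building the query as a plain request dictionary; the table and key names are hypothetical), flipping the direction is indeed a one-parameter change:</p>
<pre><code class="lang-python">def orders_query(user_id, newest_first=True, limit=10):
    # Only ScanIndexForward changes between "first 10" and "last 10",
    # assuming the sort key orders the items by date.
    return {
        "TableName": "my-table",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": "USER#" + user_id},
            ":sk": {"S": "ORDER#"},
        },
        "Limit": limit,
        "ScanIndexForward": not newest_first,
    }
</code></pre>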
<h1 id="heading-conclusions">Conclusions</h1>
<p>We just saw how single-table design can work well with GraphQL and how we can use its benefits to reduce DynamoDB calls. We found out that it comes with a few challenges and how to deal with them. We also learned about the limitations and things to take into account before using this method. If your question is:</p>
<blockquote>
<p>Is it worth it?</p>
</blockquote>
<p>Well, it probably depends on your use case. If you know your access patterns well in advance, it can give you a little performance boost. If what you need is flexibility, or you have unpredictable access patterns, you should probably stick to keeping your resolvers decoupled.</p>
<p>If you have comments, suggestions or questions, let me know!</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How I used DynamoDB as a long-term cache layer for AppSync]]></title><description><![CDATA[Originally published on Medium

Updated on 2020–12–26: After posting this, I realized that it could be improved even more by using DynamoDB TTL. I updated this article accordingly.
While I was working on a GraphQL API, I needed a couple of res...]]></description><link>https://benoitboure.com/how-i-used-dynamodb-as-a-long-term-cache-layer-for-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-i-used-dynamodb-as-a-long-term-cache-layer-for-appsync</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[caching]]></category><category><![CDATA[cache]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 21 Dec 2020 17:38:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1614526375184/vWmljGB1i.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Originally published <a target="_blank" href="https://bboure.medium.com/how-i-used-dynamodb-as-a-long-term-cache-layer-for-appsync-3d45391f431e">on Medium</a></p>
</blockquote>
<p><strong>Updated on 2020–12–26</strong>: After posting this, I realized that it could be improved even more by using DynamoDB TTL. I updated this article accordingly.</p>
<p>While I was working on a GraphQL API, I needed a couple of resolvers hitting remote HTTP endpoints. This is rather straightforward with AppSync HTTP data sources. However, I didn't want it hitting the remote APIs at every single request. There were several reasons for that:</p>
<ul>
<li><p><strong>Latency</strong>: Due to several factors, like the location of the remote endpoint, it could add a noticeable overhead and add extra time to the request execution.</p>
</li>
<li><p><strong>Throttling</strong>: I did not want to spam the remote endpoints and suffer possible throttling, or even worse, get banned.</p>
</li>
<li><p><strong>API quotas</strong>: Some of the remote endpoints also had quotas and I did not want to reach the limits too fast.</p>
</li>
</ul>
<p>Because most of the data was not going to change over time anyway, the natural choice, in this case, was to use a cache layer.</p>
<p>My first instinct was to turn towards the <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/enabling-caching.html">AppSync caching</a> capabilities. AppSync comes out-of-the-box with a built-in server-side cache. It offers per-request and per-resolver caching. Unfortunately, it had two drawbacks for me:</p>
<ul>
<li><p><strong>Cost</strong>: Starting at $0.044 up to $6.775 per hour, it can quickly become expensive if the workload increases.</p>
</li>
<li><p><strong>TTL</strong>: Limited to <strong>3600 seconds</strong>, after which cached data expires.</p>
</li>
</ul>
<p>The main issue for me here was the time limit. With a 1-hour cache TTL, data would be flushed every hour and the remote endpoints would have to be hit again. In my case, this was still too often, especially because I was totally OK with day-old data, or even <strong>month-old</strong> data in some cases. So, I started looking for alternatives.</p>
<p>The data I had to store was plain JSON objects. So I thought: how about DynamoDB? I could store them as a document in a table. Then, I could have a resolver that looks into the table for a given cache key. If the record is found (and hasn't expired), return it; otherwise, fetch fresh data from the source, store it into the table for later and return the data. Because DynamoDB is fast, it sounded like a good idea.</p>
<p>Now, a naive approach would have been to use a Lambda resolver that would do just that. It would have worked for sure, but there is a better alternative that is <strong>faster and cheaper</strong>. AppSync supports <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/pipeline-resolvers.html">pipeline resolvers</a>. Pipeline resolvers let you execute multiple operations, or "functions", to resolve one single field. This was just what I needed. My resolver would be composed of 3 functions:</p>
<ol>
<li><p>Try and fetch data from a DynamoDB table. If there is a hit, <strong>skip the following functions</strong> and return the data.</p>
</li>
<li><p>If there was no hit, go fetch the remote data from the source.</p>
</li>
<li><p>Save the data into the DynamoDB table and return it.</p>
</li>
</ol>
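<p>In plain Python, those three steps boil down to a classic read-through cache. Here is a quick sketch (the callables are hypothetical stand-ins for the three pipeline functions; actual expiry is delegated to DynamoDB TTL):</p>
<pre><code class="lang-python">import time

def resolve(title, cache_get, fetch_remote, cache_put,
            ttl_seconds=3600 * 24 * 30):
    cached = cache_get(title)                  # 1. fetchFromCache
    if cached is not None:
        return cached                          #    hit: skip the rest
    content = fetch_remote(title)              # 2. fetchWikipedia
    expires_at = int(time.time()) + ttl_seconds
    cache_put(title, content, expires_at)      # 3. saveToCache
    return content
</code></pre>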
<p>Let me show you how I implemented this with a simplified example. In this demo, we will build a GraphQL API that fetches Wikipedia articles. Because we don't want to spam Wikipedia's servers and because articles don't change that much very often, they should be cached for a month before we have to hit Wikipedia again and get the updated versions.</p>
<p>To build that, we will use the <a target="_blank" href="https://www.serverless.com/">Serverless Framework</a> and the <a target="_blank" href="https://github.com/sid88in/serverless-appsync-plugin">AppSync plugin</a>. I will not go into details on how the plugin works. For more information, please refer to the documentation on the repository or <a target="_blank" href="https://www.serverless.com/blog/running-scalable-reliable-graphql-endpoint-with-serverless">this series of articles</a>.</p>
<p>I will explain the most important parts only, but you can find the full code of this example <a target="_blank" href="https://github.com/bboure/appsync-long-cache-demo">on GitHub</a>.</p>
<p>Let's start with the serverless.yml.</p>
<pre><code class="lang-yaml">    <span class="hljs-attr">mappingTemplates:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Query</span>
        <span class="hljs-attr">field:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">kind:</span> <span class="hljs-string">PIPELINE</span>
        <span class="hljs-attr">functions:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">fetchFromCache</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">fetchWikipedia</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">saveToCache</span>

    <span class="hljs-attr">dataSources:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">HTTP</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">description:</span> <span class="hljs-string">'Wikipedia api'</span>
        <span class="hljs-attr">config:</span>
          <span class="hljs-attr">endpoint:</span> <span class="hljs-string">https://en.wikipedia.org</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">AMAZON_DYNAMODB</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">wikicache</span>
        <span class="hljs-attr">description:</span> <span class="hljs-string">'Wikipedia cached titles'</span>
        <span class="hljs-attr">config:</span>
          <span class="hljs-attr">tableName:</span>
            <span class="hljs-attr">Ref:</span> <span class="hljs-string">WikipediaTable</span>

    <span class="hljs-attr">functionConfigurations:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dataSource:</span> <span class="hljs-string">wikicache</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">fetchFromCache</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dataSource:</span> <span class="hljs-string">wikicache</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">saveToCache</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dataSource:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">fetchWikipedia</span>

<span class="hljs-attr">resources:</span>
  <span class="hljs-attr">Resources:</span>
    <span class="hljs-attr">WikipediaTable:</span>
      <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::DynamoDB::Table</span>
      <span class="hljs-attr">Properties:</span>
        <span class="hljs-attr">TableName:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">BillingMode:</span> <span class="hljs-string">PAY_PER_REQUEST</span>
        <span class="hljs-attr">TimeToLiveSpecification:</span>
          <span class="hljs-attr">AttributeName:</span> <span class="hljs-string">expires_at</span>
          <span class="hljs-attr">Enabled:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">AttributeDefinitions:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">AttributeName:</span> <span class="hljs-string">title</span>
            <span class="hljs-attr">AttributeType:</span> <span class="hljs-string">S</span>
        <span class="hljs-attr">KeySchema:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">AttributeName:</span> <span class="hljs-string">title</span>
            <span class="hljs-attr">KeyType:</span> <span class="hljs-string">HASH</span>
</code></pre>
<p><strong>mappingTemplates</strong></p>
<p>This is where we define our resolver. As I explained earlier this is going to be a <strong>PIPELINE</strong> resolver with three consecutive functions: <em>fetchFromCache</em>, <em>fetchWikipedia</em> and <em>saveToCache.</em></p>
<p><strong>dataSources</strong></p>
<p>Here we define our two data sources:</p>
<ul>
<li><p>an HTTP endpoint which points to the Wikipedia API in English</p>
</li>
<li><p>a DynamoDB table</p>
</li>
</ul>
<p><strong>functionConfigurations</strong></p>
<p>And here, we declare the three pipeline functions, each attached to one of the data sources we created earlier.</p>
<p><strong>WikipediaTable resource</strong></p>
<p>Finally, we declare our DynamoDB table. It will have a HASH key, which will be the <em>title</em> of the article. We also set a <em>TimeToLiveSpecification</em> on the <em>expires_at</em> attribute.</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html">DynamoDB Time to Live (TTL)</a> is a feature that lets us define, per record, a timestamp after which the record is no longer needed. When the timestamp is reached, DynamoDB deletes the record. We will use that to auto-expire the cache.</p>
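<p>One caveat worth knowing: DynamoDB does not delete expired items the instant their TTL timestamp passes, so a defensive reader can also filter them out itself. A quick sketch of that check (a hypothetical helper, not part of this demo's templates):</p>
<pre><code class="lang-python">import time

def is_expired(item, now=None):
    # Items with no expires_at attribute never expire.
    now = time.time() if now is None else now
    return now >= item.get("expires_at", float("inf"))
</code></pre>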
<p>Now, we need to define our mapping templates. There are a few of them. Let's go through them in the order they will be executed.</p>
<p>Let's begin with the "before" pipeline request mapping template.</p>
<pre><code class="lang-typescript">## Query.wikipedia.request.vtl
$util.qr($ctx.stash.put(<span class="hljs-string">"title"</span>, $ctx.args.title))
{}
</code></pre>
<p>Here, we simply put the title argument coming from the request into the stash (<a target="_blank" href="https://github.com/bboure/appsync-long-cache-demo/blob/master/schema.graphql">see the schema definition</a>). We will use it later in the pipeline. We also return an empty Map (because mapping templates cannot be empty).</p>
<p>At this point, the first function in the pipeline will be called: <em>fetchFromCache</em></p>
<pre><code class="lang-typescript">## fetchFromCache.request.vtl
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"GetItem"</span>,
  <span class="hljs-string">"key"</span>: {
    <span class="hljs-string">"title"</span>: $util.dynamodb.toStringJson(<span class="hljs-string">"${ctx.stash.title}"</span>)
  }
}
</code></pre>
<p>Here, we execute a <em>GetItem</em> operation on our DynamoDB table using the title of the article as the key. Let's see what is in the response template:</p>
<pre><code class="lang-typescript">#<span class="hljs-keyword">if</span>($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end

#<span class="hljs-keyword">if</span>($ctx.result)
  $util.qr($ctx.stash.put(<span class="hljs-string">"result"</span>, $ctx.result.content))
#end
{}
</code></pre>
<p>First, check for any error, and stop the process if we find any. Then, if we have a result (it means that we have a hit!), we stick it into the stash. You will find out why later.</p>
<p>Now, at this point, if we have a hit, we want to stop the execution and return the value to the user. It turns out that AppSync has a neat solution for that: the <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/resolver-util-reference.html#aws-appsync-directives">return directive</a>.</p>
<blockquote>
<p>The <code>#return</code> directive comes in handy if you need to return prematurely from any mapping template. <code>#return</code> is analogous to the <em>return</em> keyword in programming languages, as it will return from the closest scoped block of logic. What this means is using <code>#return</code> inside a resolver mapping template will return from the resolver. Additionally, using <code>#return</code> from a function mapping template will return from the function and will continue the execution to either the next function in the pipeline or the resolver response mapping template.</p>
</blockquote>
<p>The important part to notice here is that, when used in a pipeline function, <code>#return</code> <strong><em>will continue to the next function in the pipeline</em></strong>. But we don't want the next function to be executed, right? Well, it turns out that if you call <code>#return</code> in a request mapping template, it will skip to the next function <strong>without executing the current one</strong>.</p>
<p>This is why we previously kept the result into the stash. We will use it in the following two functions' request mappings to determine if there was a hit, and skip to the next one directly, in cascade. See our <em>fetchWikipedia</em> request template:</p>
<pre><code class="lang-typescript">## fetchWikipedia.request.vtl
## Bypass <span class="hljs-built_in">this</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">if</span> <span class="hljs-title">result</span> <span class="hljs-title">is</span> <span class="hljs-title">present</span> <span class="hljs-title">in</span> <span class="hljs-title">the</span> <span class="hljs-title">stash</span>
#<span class="hljs-title">if</span>(<span class="hljs-params">$ctx.stash.result</span>)
  #<span class="hljs-title">return</span>(<span class="hljs-params">$ctx.stash.result</span>)
#<span class="hljs-title">end</span>
</span>{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"method"</span>: <span class="hljs-string">"GET"</span>,
  <span class="hljs-string">"params"</span>: {
    <span class="hljs-string">"query"</span>: {
        <span class="hljs-string">"action"</span>: <span class="hljs-string">"query"</span>,
        <span class="hljs-string">"format"</span>: <span class="hljs-string">"json"</span>,
        <span class="hljs-string">"prop"</span>: <span class="hljs-string">"extracts"</span>,
        <span class="hljs-string">"exintro"</span>: <span class="hljs-string">"true"</span>,
        <span class="hljs-string">"titles"</span>: <span class="hljs-string">"${ctx.stash.title}"</span>,
        <span class="hljs-string">"explaintext"</span>: <span class="hljs-string">"true"</span>,
        <span class="hljs-string">"exsentences"</span>: <span class="hljs-number">10</span>
    }
  },
  <span class="hljs-string">"resourcePath"</span>: <span class="hljs-string">"/w/api.php"</span>
}
</code></pre>
<p>If we have a result in the stash, we call <code>#return</code> prematurely and continue to the next function. Otherwise, the function is executed: the API endpoint is called, followed by the response template, where we extract the data we need. Here it is:</p>
<pre><code class="lang-typescript">## fetchWikipedia.response.vtl
#<span class="hljs-keyword">if</span>($ctx.result.statusCode == <span class="hljs-number">200</span>)
    #set($body = $utils.parseJson($ctx.result.body))
    #foreach ($page <span class="hljs-keyword">in</span> $body.query.pages.entrySet())
        #<span class="hljs-keyword">if</span> ($page.value.title == $ctx.args.title)
            #<span class="hljs-keyword">return</span>($page.value.extract)
        #end
    #end
    $utils.error(<span class="hljs-string">"Article not found"</span>, <span class="hljs-string">"NotFound"</span>)
#<span class="hljs-keyword">else</span>
    $utils.error($ctx.result.statusCode, <span class="hljs-string">"Error"</span>)
#end
</code></pre>
<p>All right, we are almost there. There is just one last function to define.</p>
<p>Remember, if we return early within any function, the return directive will skip to the next function. So, here again, we need to check if we have a result in the stash and return early one more time. Otherwise, this is where we save the result from the previous function into DynamoDB. We also set an expiry timestamp for 30 days in the future:</p>
<pre><code class="lang-typescript">## saveToCache.request.vtl
## Bypass <span class="hljs-built_in">this</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">if</span> <span class="hljs-title">result</span> <span class="hljs-title">is</span> <span class="hljs-title">present</span> <span class="hljs-title">in</span> <span class="hljs-title">the</span> <span class="hljs-title">stash</span>
#<span class="hljs-title">if</span>(<span class="hljs-params">$ctx.stash.result</span>)
  #<span class="hljs-title">return</span>(<span class="hljs-params">$ctx.stash.result</span>)
#<span class="hljs-title">end</span>
#<span class="hljs-title">set</span>(<span class="hljs-params">$expires_at = $util.time.nowEpochSeconds() + 3600 * 24 * 30</span>)
</span>{
  <span class="hljs-string">"version"</span> : <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span> : <span class="hljs-string">"PutItem"</span>,
  <span class="hljs-string">"key"</span> : {
    <span class="hljs-string">"title"</span> : $util.dynamodb.toStringJson(<span class="hljs-string">"${ctx.stash.title}"</span>)
  },
  <span class="hljs-string">"attributeValues"</span>: {
    <span class="hljs-string">"expires_at"</span>: $util.dynamodb.toNumberJson($expires_at),
    <span class="hljs-string">"content"</span>: $util.dynamodb.toStringJson($ctx.prev.result)
  }
}
</code></pre>
<p>And we return the result.</p>
<pre><code class="lang-typescript">## saveToCache.response.vtl
#<span class="hljs-keyword">if</span>($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end
$utils.toJson($ctx.prev.result)
</code></pre>
<p>Finally, our "after" pipeline mapping template just forwards the result to the resolver:</p>
<pre><code class="lang-typescript">## Query.wikipedia.response.vtl
$util.toJson($ctx.result)
</code></pre>
<p>And we are done!</p>
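<p>To recap the mechanics before deploying, here is a plain-Python emulation of that early-return cascade (all names are hypothetical stand-ins for the VTL above; a non-None value from a request template plays the role of <code>#return</code>):</p>
<pre><code class="lang-python">def run_pipeline(stash, functions):
    result = None
    for request_template, body in functions:
        early = request_template(stash)
        if early is not None:
            result = early     # '#return': the data source is never called
            continue
        result = body(stash)
    return result

def make_pipeline(cache, fetch_remote):
    def fetch_from_cache(stash):
        hit = cache.get(stash["title"])
        if hit is not None:
            stash["result"] = hit   # remembered so later steps can bail out
        return hit

    def skip_if_hit(stash):
        return stash.get("result")

    def fetch_wikipedia(stash):
        stash["fresh"] = fetch_remote(stash["title"])
        return stash["fresh"]

    def save_to_cache(stash):
        cache[stash["title"]] = stash["fresh"]
        return stash["fresh"]

    return [
        (lambda stash: None, fetch_from_cache),  # step 1 always runs
        (skip_if_hit, fetch_wikipedia),          # skipped on a cache hit
        (skip_if_hit, save_to_cache),            # skipped on a cache hit
    ]
</code></pre>
<p>On a hit, steps 2 and 3 fall straight through with the cached value; on a miss, the remote fetch runs once and its result is written back to the cache.</p>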
<p>Let's deploy, run some queries and look at the X-Ray traces to confirm what we just built works as expected.</p>
<p>We will use the following query and execute it, <strong>twice</strong>:</p>
<pre><code class="lang-typescript">query {
  wikipedia(title: <span class="hljs-string">"Cat"</span>)
}
</code></pre>
<p>which generates the following traces.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1614525779551/LjRb65hZz.png" alt="Traces of the first execution" /></p>
<p><em>Traces of the first execution</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1614525781496/YTPcCV662.png" alt="Traces of the second execution" /></p>
<p><em>Traces of the second execution</em></p>
<p>As you can see, the first time, our resolver executed the three steps sequentially. The second time though, only the <em>GetItem</em> operation was executed. The HTTP request was not executed at all and our resolver execution time even went down from 165ms to 47ms. Isn't that nice?</p>
<p>If you look carefully, you will also notice two warning signs. These are the request mapping templates where we do an early return. I am not sure why X-Ray shows them as warnings, but no errors show up in the details or the CloudWatch logs, and everything works as expected.</p>
<p>Here you have it, a long-term cache layer for AppSync, using only out-of-the-box functionalities.</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item></channel></rss>