<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Benoit Boure]]></title><description><![CDATA[I am a software engineer with a focus on serverless technologies. I blog about Serverless
[Follow me on Twitter](https://twitter.com/Benoit_Boure) - [Need help?]]></description><link>https://benoitboure.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 12 Apr 2026 09:04:44 GMT</lastBuildDate><atom:link href="https://benoitboure.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Private API Gateway as EventBridge API Destination]]></title><description><![CDATA[In a previous post, I explained how to connect AWS Step Functions to a private API Gateway endpoint thanks to the new integration with AWS PrivateLink and Amazon VPC Lattice. In this issue, I’ll show you how to use the same integration to use a priva...]]></description><link>https://benoitboure.com/private-api-gateway-as-eventbridge-api-destination</link><guid isPermaLink="true">https://benoitboure.com/private-api-gateway-as-eventbridge-api-destination</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS EventBridge]]></category><category><![CDATA[API Gateway]]></category><category><![CDATA[vpc]]></category><category><![CDATA[VPC Endpoints]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 21 Jan 2025 08:00:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737112643009/3f8e278d-7380-4a73-9a26-896c9820852e.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a previous post, I explained <a target="_blank" href="https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions">how to connect AWS Step Functions to a private API Gateway endpoint</a> thanks to the new integration with AWS PrivateLink and Amazon VPC Lattice. In this issue, I’ll show you how to use the same integration to use a private API Gateway API as an EventBridge target using the CDK, removing the need for an intermediary Lambda function.</p>
<h2 id="heading-overview">Overview</h2>
<p><img src="https://cdn-0.plantuml.com/plantuml/png/dL71RXCn4BtxAzmSq5R2CTnG3r5BY4gb2aLKWhDZJpOZTctB7cTL8VwTIUA4LM4FzB3hDyzlPj-ylSra4fM-4rSEjkX1tdr_MdCjTqGntsYTp31laNPbKp8a6po1fxaDlJP3ximc7qw5V97LDYGLE-CF0_N-_OVvE-qman1Nw6rNt6MwvdCP-ZxuUUJod_TFsCSEjmXkGdEVGebPVril9mJyXLW8zAFfDyvCYBu03I7zGDykJxjzWWxta9uFWrVUnO2UyfJDo1Qj8Gp-WPlRT8JwRlrmRmW6y_n_VQizUFgOqBNm6hUFXWXjRHLYDAs18oxvhPojAfmndbqBmOt791iF0sDc-JsxbZ-5bEC8cdsqv-8aakUoZcBznKIJ88UIBDGWMF6rCh9Ibwu_SJKc8hDC_2Kw_SHcPxph837x-OIgu9SGvnrLmdP7Ql72mOqS1I8vFW_saBfy8o_EcDrYArvqgiLeTJ72Qi5-1JzgKNs9M_2Eu_yD" alt="EventBridge Private API Gateway Integration" /></p>
<p><a target="_blank" href="https://www.plantuml.com/plantuml/uml/dL71RXCn4BtxAzmSq5R2CTnG3r5BY4gb2aLKWhDZJpOZTctB7cTL8VwTIUA4LM4FzB3hDyzlPj-ylSra4fM-4rSEjkX1tdr_MdCjTqGntsYTp31laNPbKp8a6po1fxaDlJP3ximc7qw5V97LDYGLE-CF0_N-_OVvE-qman1Nw6rNt6MwvdCP-ZxuUUJod_TFsCSEjmXkGdEVGebPVril9mJyXLW8zAFfDyvCYBu03I7zGDykJxjzWWxta9uFWrVUnO2UyfJDo1Qj8Gp-WPlRT8JwRlrmRmW6y_n_VQizUFgOqBNm6hUFXWXjRHLYDAs18oxvhPojAfmndbqBmOt791iF0sDc-JsxbZ-5bEC8cdsqv-8aakUoZcBznKIJ88UIBDGWMF6rCh9Ibwu_SJKc8hDC_2Kw_SHcPxph837x-OIgu9SGvnrLmdP7Ql72mOqS1I8vFW_saBfy8o_EcDrYArvqgiLeTJ72Qi5-1JzgKNq9SGlUSVw_0G00">Source</a></p>
<p>The setup is similar to the one for Step Functions. A Resource Gateway is used as the entry point into the VPC. It is associated with a Resource Configuration, which defines the Private API Gateway resource, and the EventBridge Connection is configured to use the Resource Config as the final destination.</p>
<p>For more details about this setup, see my previous post about the <a target="_blank" href="https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions#heading-overview">Step Functions Integration</a>.</p>
<h2 id="heading-cdk-stack-definition">CDK Stack Definition</h2>
<p>We need to define the Resource Gateway and the Resource Configuration.</p>
<pre><code class="lang-typescript">    <span class="hljs-comment">// Security Group for the Resource Gateway</span>
    <span class="hljs-keyword">const</span> rgSecurityGroup = <span class="hljs-keyword">new</span> SecurityGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGatewaySG'</span>, {
      vpc: vpc,
      allowAllOutbound: <span class="hljs-literal">false</span>,
    });

    rgSecurityGroup.addEgressRule(
      Peer.ipv4(vpc.vpcCidrBlock),
      Port.tcp(<span class="hljs-number">443</span>),
      <span class="hljs-string">'Allow HTTPS traffic from Resource Gateway'</span>,
    );

    <span class="hljs-comment">// Resource Gateway</span>
    <span class="hljs-keyword">const</span> resourceGateway = <span class="hljs-keyword">new</span> CfnResourceGateway(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGateway'</span>, {
      name: <span class="hljs-string">'private-api-access'</span>,
      ipAddressType: <span class="hljs-string">'IPV4'</span>,
      vpcIdentifier: vpc.vpcId,
      subnetIds: vpc.isolatedSubnets.map(<span class="hljs-function">(<span class="hljs-params">subnet</span>) =&gt;</span> subnet.subnetId),
      securityGroupIds: [rgSecurityGroup.securityGroupId],
    });

    <span class="hljs-comment">// Resource Configuration</span>
    <span class="hljs-keyword">const</span> resourceConfig = <span class="hljs-keyword">new</span> CfnResourceConfiguration(
      <span class="hljs-built_in">this</span>,
      <span class="hljs-string">'ResourceConfig'</span>,
      {
        name: <span class="hljs-string">'sf-private-api'</span>,
        portRanges: [<span class="hljs-string">'443'</span>],
        resourceGatewayId: resourceGateway.ref,
        resourceConfigurationType: <span class="hljs-string">'SINGLE'</span>,
      },
    );

    <span class="hljs-comment">// Use the regional DNS name of the API Gateway's VPC endpoint</span>
    <span class="hljs-comment">// in the Resource Configuration</span>
    resourceConfig.addPropertyOverride(
      <span class="hljs-string">'ResourceConfigurationDefinition.DnsResource'</span>,
      {
        DomainName: Fn.select(
          <span class="hljs-number">1</span>,
          Fn.split(<span class="hljs-string">':'</span>, Fn.select(<span class="hljs-number">0</span>, api.vpcEndpoint.vpcEndpointDnsEntries)),
        ),
        IpAddressType: <span class="hljs-string">'IPV4'</span>,
      },
    );

    <span class="hljs-comment">// Event Bus</span>
    <span class="hljs-keyword">const</span> eventBus = <span class="hljs-keyword">new</span> EventBus(<span class="hljs-built_in">this</span>, <span class="hljs-string">'EventBus'</span>, {});

    <span class="hljs-comment">// Connection to the API</span>
    <span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ApiConnection'</span>, {
      authorization: Authorization.apiKey(
        <span class="hljs-string">'x-api-key'</span>,
        SecretValue.unsafePlainText(<span class="hljs-string">'demo'</span>),
      ),
    });

    <span class="hljs-comment">// Set up the Connection with the Resource Config</span>
    (connection.node.children[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> CfnConnection).addPropertyOverride(
      <span class="hljs-string">'InvocationConnectivityParameters'</span>,
      {
        ResourceParameters: {
          ResourceConfigurationArn: resourceConfig.attrArn,
        },
      },
    );
</code></pre>
<p>EventBridge can now connect to the private API Gateway. The next step is to create a rule and set the API as its target.</p>
<pre><code class="lang-typescript">    <span class="hljs-keyword">const</span> rule = <span class="hljs-keyword">new</span> Rule(<span class="hljs-built_in">this</span>, <span class="hljs-string">'RequestAccountRule'</span>, {
      eventBus,
      eventPattern: {
        source: [<span class="hljs-string">'my-source'</span>],
      },
    });

    <span class="hljs-keyword">const</span> apiDestination = <span class="hljs-keyword">new</span> ApiDestination(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ApiDestination'</span>, {
      endpoint: <span class="hljs-string">`<span class="hljs-subst">${api.api.url}</span>/hello`</span>,
      httpMethod: HttpMethod.POST,
      connection: connection,
    });

    rule.addTarget(
      <span class="hljs-keyword">new</span> targets.ApiDestination(apiDestination, {
        event: RuleTargetInput.fromEventPath(<span class="hljs-string">'$.detail'</span>),
      }),
    );
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🔗</div>
<div data-node-type="callout-text">Find the full code on <a target="_self" href="https://github.com/bboure/cdk-event-bridge-private-api-gateway">GitHub</a>.</div>
</div>

<h2 id="heading-testing-the-integration">Testing the Integration</h2>
<p>Let’s put the following event on the bus:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"DetailType"</span>: <span class="hljs-string">"somethingHappened"</span>,
  <span class="hljs-attr">"Source"</span>: <span class="hljs-string">"my-source"</span>,
  <span class="hljs-attr">"EventBusName"</span>: <span class="hljs-string">"EventBusVendingMachine308DEFEB"</span>,
  <span class="hljs-attr">"Detail"</span>: {
    <span class="hljs-attr">"foo"</span>: <span class="hljs-string">"bar"</span>
  }
}
</code></pre>
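<p>The same event can also be sent programmatically. Here is a minimal, hypothetical sketch of building the equivalent <code>PutEvents</code> entry shape (the bus name is the one from the example above; note that when calling the API, <code>Detail</code> must be a JSON-serialized string):</p>

```typescript
// Shape of a single entry for the EventBridge PutEvents API.
type PutEventsEntry = {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // must be a JSON-serialized string, not an object
};

// Hypothetical helper that serializes the detail payload for us.
function buildEntry(
  busName: string,
  source: string,
  detailType: string,
  detail: object,
): PutEventsEntry {
  return {
    EventBusName: busName,
    Source: source,
    DetailType: detailType,
    Detail: JSON.stringify(detail),
  };
}

const entry = buildEntry(
  'EventBusVendingMachine308DEFEB',
  'my-source',
  'somethingHappened',
  { foo: 'bar' },
);
console.log(entry.Detail); // {"foo":"bar"}
```

<p>The entry could then be passed to the AWS SDK’s EventBridge client (or the <code>aws events put-events</code> CLI command) to publish the event.</p>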
<p>I can see that the Lambda function used as the handler of the endpoint is invoked with the following event.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"resource"</span>: <span class="hljs-string">"/hello"</span>,
    <span class="hljs-attr">"path"</span>: <span class="hljs-string">"/hello"</span>,
    <span class="hljs-attr">"httpMethod"</span>: <span class="hljs-string">"POST"</span>,
    <span class="hljs-attr">"headers"</span>: {
        <span class="hljs-attr">"Accept-Encoding"</span>: <span class="hljs-string">"gzip, x-gzip, deflate, br"</span>,
        <span class="hljs-attr">"Content-Type"</span>: <span class="hljs-string">"application/json; charset=utf-8"</span>,
        <span class="hljs-attr">"Host"</span>: <span class="hljs-string">"899aggxh3a.execute-api.us-east-1.amazonaws.com"</span>,
        <span class="hljs-attr">"Range"</span>: <span class="hljs-string">"bytes=0-1048575"</span>,
        <span class="hljs-attr">"User-Agent"</span>: <span class="hljs-string">"Amazon/EventBridge/ApiDestinations"</span>,
        <span class="hljs-attr">"x-amzn-cipher-suite"</span>: <span class="hljs-string">"ECDHE-RSA-AES128-GCM-SHA256"</span>,
        <span class="hljs-attr">"x-amzn-tls-version"</span>: <span class="hljs-string">"TLSv1.2"</span>,
        <span class="hljs-attr">"x-amzn-vpc-id"</span>: <span class="hljs-string">"vpc-0a1db1c1701e137ca"</span>,
        <span class="hljs-attr">"x-amzn-vpce-config"</span>: <span class="hljs-string">"1"</span>,
        <span class="hljs-attr">"x-amzn-vpce-id"</span>: <span class="hljs-string">"vpce-09fc3c0c5173d919b"</span>,
        <span class="hljs-attr">"x-api-key"</span>: <span class="hljs-string">"demo"</span>,
        <span class="hljs-attr">"X-Forwarded-For"</span>: <span class="hljs-string">"10.0.195.243"</span>
    },
    <span class="hljs-attr">"multiValueHeaders"</span>: {
        <span class="hljs-attr">"Accept-Encoding"</span>: [
            <span class="hljs-string">"gzip, x-gzip, deflate, br"</span>
        ],
        <span class="hljs-attr">"Content-Type"</span>: [
            <span class="hljs-string">"application/json; charset=utf-8"</span>
        ],
        <span class="hljs-attr">"Host"</span>: [
            <span class="hljs-string">"899aggxh3a.execute-api.us-east-1.amazonaws.com"</span>
        ],
        <span class="hljs-attr">"Range"</span>: [
            <span class="hljs-string">"bytes=0-1048575"</span>
        ],
        <span class="hljs-attr">"User-Agent"</span>: [
            <span class="hljs-string">"Amazon/EventBridge/ApiDestinations"</span>
        ],
        <span class="hljs-attr">"x-amzn-cipher-suite"</span>: [
            <span class="hljs-string">"ECDHE-RSA-AES128-GCM-SHA256"</span>
        ],
        <span class="hljs-attr">"x-amzn-tls-version"</span>: [
            <span class="hljs-string">"TLSv1.2"</span>
        ],
        <span class="hljs-attr">"x-amzn-vpc-id"</span>: [
            <span class="hljs-string">"vpc-0a1db1c1701e137ca"</span>
        ],
        <span class="hljs-attr">"x-amzn-vpce-config"</span>: [
            <span class="hljs-string">"1"</span>
        ],
        <span class="hljs-attr">"x-amzn-vpce-id"</span>: [
            <span class="hljs-string">"vpce-09fc3c0c5173d919b"</span>
        ],
        <span class="hljs-attr">"x-api-key"</span>: [
            <span class="hljs-string">"demo"</span>
        ],
        <span class="hljs-attr">"X-Forwarded-For"</span>: [
            <span class="hljs-string">"10.0.195.243"</span>
        ]
    },
    <span class="hljs-attr">"queryStringParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"multiValueQueryStringParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"pathParameters"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"stageVariables"</span>: <span class="hljs-literal">null</span>,
    <span class="hljs-attr">"requestContext"</span>: {
        <span class="hljs-attr">"resourceId"</span>: <span class="hljs-string">"yjzggg"</span>,
        <span class="hljs-attr">"resourcePath"</span>: <span class="hljs-string">"/hello"</span>,
        <span class="hljs-attr">"httpMethod"</span>: <span class="hljs-string">"POST"</span>,
        <span class="hljs-attr">"extendedRequestId"</span>: <span class="hljs-string">"Efl1aFY5oAMFoYA="</span>,
        <span class="hljs-attr">"requestTime"</span>: <span class="hljs-string">"16/Jan/2025:18:29:54 +0000"</span>,
        <span class="hljs-attr">"path"</span>: <span class="hljs-string">"/prod/hello"</span>,
        <span class="hljs-attr">"accountId"</span>: <span class="hljs-string">"438465158289"</span>,
        <span class="hljs-attr">"protocol"</span>: <span class="hljs-string">"HTTP/1.1"</span>,
        <span class="hljs-attr">"stage"</span>: <span class="hljs-string">"prod"</span>,
        <span class="hljs-attr">"domainPrefix"</span>: <span class="hljs-string">"899aggxh3a"</span>,
        <span class="hljs-attr">"requestTimeEpoch"</span>: <span class="hljs-number">1737052194204</span>,
        <span class="hljs-attr">"requestId"</span>: <span class="hljs-string">"3a6754ae-5488-429c-9ec6-1837a4c21727"</span>,
        <span class="hljs-attr">"identity"</span>: {
            <span class="hljs-attr">"cognitoIdentityPoolId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"cognitoIdentityId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"vpceId"</span>: <span class="hljs-string">"vpce-09fc3c0c5173d919b"</span>,
            <span class="hljs-attr">"apiKey"</span>: <span class="hljs-string">"demo"</span>,
            <span class="hljs-attr">"principalOrgId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"cognitoAuthenticationType"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"userArn"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"userAgent"</span>: <span class="hljs-string">"Amazon/EventBridge/ApiDestinations"</span>,
            <span class="hljs-attr">"accountId"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"caller"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"sourceIp"</span>: <span class="hljs-string">"10.0.195.243"</span>,
            <span class="hljs-attr">"accessKey"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"vpcId"</span>: <span class="hljs-string">"vpc-0a1db1c1701e137ca"</span>,
            <span class="hljs-attr">"cognitoAuthenticationProvider"</span>: <span class="hljs-literal">null</span>,
            <span class="hljs-attr">"user"</span>: <span class="hljs-literal">null</span>
        },
        <span class="hljs-attr">"domainName"</span>: <span class="hljs-string">"899aggxh3a.execute-api.us-east-1.amazonaws.com"</span>,
        <span class="hljs-attr">"deploymentId"</span>: <span class="hljs-string">"74v610"</span>,
        <span class="hljs-attr">"apiId"</span>: <span class="hljs-string">"899aggxh3a"</span>
    },
    <span class="hljs-attr">"body"</span>: <span class="hljs-string">"{\"foo\":\"bar\"}"</span>,
    <span class="hljs-attr">"isBase64Encoded"</span>: <span class="hljs-literal">false</span>
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The new VPC Lattice and AWS PrivateLink integration allows developers to invoke private APIs directly, without an intermediary Lambda function. This reduces code, maintenance, and latency.</p>
]]></content:encoded></item><item><title><![CDATA[Invoking Private API Gateway Endpoints From Step Functions]]></title><description><![CDATA[At Re:Invent 2024, AWS announced EventBridge and Step Functions integration with private APIs. Thanks to this new feature, customers can now directly invoke APIs that are inside a private VPC from EventBridge (with API destinations), or Step Function...]]></description><link>https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions</link><guid isPermaLink="true">https://benoitboure.com/invoking-private-api-gateway-endpoints-from-step-functions</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Step Functions]]></category><category><![CDATA[API Gateway]]></category><category><![CDATA[CDK]]></category><category><![CDATA[serverless]]></category><category><![CDATA[Serverless Architecture]]></category><category><![CDATA[AWS Private Link]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 14 Jan 2025 07:54:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736839599937/50b215ea-437e-405c-b8a1-f4b4375d6878.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At Re:Invent 2024, AWS announced <a target="_blank" href="https://aws.amazon.com/about-aws/whats-new/2024/12/amazon-eventbridge-step-functions-integration-private-apis/">EventBridge and Step Functions integration with private APIs</a>. Thanks to this new feature, customers can now directly invoke APIs that are inside a private VPC from EventBridge (with <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html">API destinations</a>), or Step Functions (<a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/call-https-apis.html">HTTP Tasks</a>). Before, users had to use Lambda functions inside the VPC as a proxy to their private APIs.</p>
<p>In a previous post, I explained how to <a target="_blank" href="https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk">invoke HTTP endpoints from Step Functions with the CDK</a>. In this issue, I will cover calling a private API Gateway endpoint using the new integration.</p>
<h2 id="heading-overview">Overview</h2>
<p>First, let’s examine how this new integration works and how the different components interact with each other. At the end of this post, I’ll show you how to deploy this setup with the CDK.</p>
<p>Here is a diagram that describes the architecture.</p>
<p><img src="https://cdn-0.plantuml.com/plantuml/png/dL7DRXCn4BxFKxWveAt4nd13FKGjePHA5Og816TdFRknyDgMFSwgGhmxirddXq8ES4WyCzzFlfdS9bAHSc_XIcDh78gxR-iLzs9B5DADb54DyyxGDczomjXuH-XetlXUgY5PjKdZMni6KjtwM0Uht6WeTs_VpTz8RH81N1dNsAoFxfBVfUzxx-Q1sx_YQzC7Qrg3-WBd8VeSalowMbuWy2-4J2YVLB_HwWBfCzBWutVZkkMqsmUqPeVnUJI-TpfuuoXTYXauOgF8UFV8uYxkItctUdnGX8Dw_ZVTcZ1ypAuPc_G_UPyKaMbmaWByvbUDijRwuRMOZO0u8ZEUpAu1s61_qyhXm3LF-NjsBNu0275oho8cdsE3PSU99meglXHK5BYu2t5-pseNcaDJz8Vso3zTiLB1y9G7VvXE_ssrLKvRZ3pzD5M5y1FWi7QzU97xvdw7Zjv7epiKV4o7-tE8LwSLUDgQ3bu8wyLPUZYhwmK71VxKYn8806xHwTpRNm00" alt="PlantUML diagram" /></p>
<p><a target="_blank" href="https://www.plantuml.com/plantuml/uml/dL71ZjCm4BtxAxmzeAn4QhYXFLIxb6LPQOKgAi7PZIVf2CUsx76Z5UBVcJHfA8KSs4FYcJTlNfvVRXFfIBcruif0ZGxatRVjXdkv9mhfHgceksM3jC-xd21MtX4uMbQ-LRfBLkzIVvR8WrJMFfR1QjSBgiFRTyitoc0Y8QxGLJQRILtnkVPjwzqoSFlF-HRROB56C3ESX-XpIEhhPZr3u2-4JA2UTBipUeRq6QZpyJkwPZtSxGDOF41yxeNldGaU7QKvcu4jLfhGkqTURkAnL7URnmTDqEdd_zlR4eIFsLLzarxYzqaJOGN3gX1_w1NzMcrzzrek-e6S9Wj65jT2iC0nqy91npMZ_5vSonz2olCmYaEeJir0agTsb6B-PAQ8a7oE5OoHCEFBYCWHchP-1rVeW8moy1Tf-9t5NZjZ8JBwQQX6mayXJZSj8pPxAbSN3cxa_G4SlOze6f0SeuDZ4FALd9mnMcCZBZRBrTdLnLbThjYluATSZRw4k0LdScj_0G00">Source</a></p>
<p>The new private API integration is powered by <a target="_blank" href="https://aws.amazon.com/vpc/lattice/">VPC Lattice</a> and <a target="_blank" href="https://aws.amazon.com/privatelink/">AWS PrivateLink</a>. VPC Lattice has two new features that make connecting Step Functions to a VPC possible: <em>Resource Gateways</em> and <em>Resource Configurations</em>.</p>
<h3 id="heading-resource-gateway"><strong>Resource Gateway</strong></h3>
<p>A <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/privatelink/resource-gateway.html">resource gateway</a> is a point of entry into the VPC where your resources reside. It can span one or more availability zones through the VPC subnets.</p>
<p>To access a private API Gateway from Step Functions, we need a Resource Gateway that lives in the same VPC and subnets as the VPC endpoint that is attached to the API.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736791560083/19e3bba4-c459-4a59-87d9-2ca87d48aa28.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-resource-configuration">Resource Configuration</h3>
<p>Once we have a Resource Gateway for our VPC, we can create and attach <a target="_blank" href="https://docs.aws.amazon.com/vpc-lattice/latest/ug/resource-configuration.html">Resource Configurations</a> to it. Resource Configurations represent resources that are accessible through the gateway, and how they are accessed.</p>
<p>In the case of API Gateway, a resource configuration consists of the VPC endpoint’s regional DNS name. We can also specify a port or range of ports that are accessible, which in our case is just <code>443</code> (for HTTPS).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736791485500/abe42ff7-f635-4182-bfe1-7f32afe14617.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-eventbridge-connection">EventBridge Connection</h3>
<p>To call HTTP endpoints, Step Functions uses <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-target-connection.html">EventBridge Connections</a>. The connection defines the authorization method, and the credentials to access the endpoint. Connections now have a new capability that allows integration with private APIs through a VPC Lattice Resource Configuration.</p>
<p>For API Gateway, we point the connection at the Resource Configuration that defines the API’s VPC endpoint.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736791696972/19cb1b19-3442-4ab4-8aa0-9d3112e6cdcc.png" alt class="image--center mx-auto" /></p>
<p>And that’s all we need. Everything else works the same as <a target="_blank" href="https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk">calling a public HTTP endpoint</a>.</p>
<h2 id="heading-definition-with-the-cdk">Definition With the CDK</h2>
<p>As explained earlier, we need a Resource Gateway that serves as the point of ingress into our VPC. We also create a security group that only allows egress to port 443, which is all we need for this use case:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> rgSecurityGroup = <span class="hljs-keyword">new</span> SecurityGroup(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGatewaySG'</span>, {
  vpc: vpc,
  allowAllOutbound: <span class="hljs-literal">false</span>,
});

rgSecurityGroup.addEgressRule(
  Peer.ipv4(vpc.vpcCidrBlock),
  Port.tcp(<span class="hljs-number">443</span>),
  <span class="hljs-string">'Allow HTTPS traffic from Resource Gateway'</span>,
);

<span class="hljs-comment">// Resource Gateway</span>
<span class="hljs-keyword">const</span> resourceGateway = <span class="hljs-keyword">new</span> CfnResourceGateway(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ResourceGateway'</span>, {
  name: <span class="hljs-string">'private-api-access'</span>,
  ipAddressType: <span class="hljs-string">'IPV4'</span>,
  vpcIdentifier: vpc.vpcId, 
  subnetIds: vpc.isolatedSubnets.map(<span class="hljs-function">(<span class="hljs-params">subnet</span>) =&gt;</span> subnet.subnetId), <span class="hljs-comment">// all isolated subnets</span>
  securityGroupIds: [rgSecurityGroup.securityGroupId],
});
</code></pre>
<p>We also need a Resource Config that describes the API Gateway’s VPC endpoint.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Resource Configuration</span>
<span class="hljs-keyword">const</span> resourceConfig = <span class="hljs-keyword">new</span> CfnResourceConfiguration(
  <span class="hljs-built_in">this</span>,
  <span class="hljs-string">'ResourceConfig'</span>,
  {
    name: <span class="hljs-string">'sf-private-api'</span>,
    portRanges: [<span class="hljs-string">'443'</span>],
    resourceGatewayId: resourceGateway.ref,
    resourceConfigurationType: <span class="hljs-string">'SINGLE'</span>,
  },
);

resourceConfig.addPropertyOverride(
  <span class="hljs-string">'ResourceConfigurationDefinition.DnsResource'</span>,
  {
    DomainName: Fn.select(
      <span class="hljs-number">1</span>,
      Fn.split(<span class="hljs-string">':'</span>, Fn.select(<span class="hljs-number">0</span>, api.vpcEndpoint.vpcEndpointDnsEntries)),
    ),
    IpAddressType: <span class="hljs-string">'IPV4'</span>,
  },
);
</code></pre>
<p>At the time of writing, the <code>CfnResourceConfiguration</code> L1 construct does not support <code>DnsResource</code> for <code>ResourceConfigurationDefinition</code>, so I’m using an override. For <code>DomainName</code>, we need the regional public DNS name of the <a target="_blank" href="https://github.com/bboure/cdk-step-functions-private-api-gateway/blob/main/lib/constructs/PrivateApi.ts#L38-L45">VPC endpoint of the API Gateway</a>, which is the first item of the <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpcendpoint.html#aws-resource-ec2-vpcendpoint-return-values">DnsEntries</a> CloudFormation returned value. It’s prefixed with the hosted zone id, so I’m using intrinsic functions to extract the value.</p>
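<p>To make the intrinsic functions concrete, here is a sketch of the equivalent string manipulation they perform at deploy time, using a hypothetical <code>DnsEntries</code> value:</p>

```typescript
// Each entry returned by CloudFormation's DnsEntries attribute has the
// form "<hosted zone id>:<dns name>". The value below is hypothetical.
const dnsEntries: string[] = [
  'Z7HUB22UULQXV:vpce-0abc123-xyz.execute-api.us-east-1.vpce.amazonaws.com',
];

// Fn.select(0, ...) picks the first entry; Fn.split(':') + Fn.select(1)
// drop the hosted zone id prefix, keeping only the DNS name.
const domainName = dnsEntries[0].split(':')[1];
console.log(domainName); // vpce-0abc123-xyz.execute-api.us-east-1.vpce.amazonaws.com
```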
<p>We can now use the configuration in our Step Functions definition:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, <span class="hljs-string">'ApiConnection'</span>, {
  authorization: Authorization.apiKey(
    <span class="hljs-string">'x-api-key'</span>,
    SecretValue.unsafePlainText(<span class="hljs-string">'demo'</span>),
  ),
});

(connection.node.children[<span class="hljs-number">0</span>] <span class="hljs-keyword">as</span> CfnConnection).addPropertyOverride(
  <span class="hljs-string">'InvocationConnectivityParameters'</span>,
  {
    ResourceParameters: {
      ResourceConfigurationArn: resourceConfig.attrArn,
    },
  },
);

<span class="hljs-keyword">const</span> http = <span class="hljs-keyword">new</span> HttpInvoke(<span class="hljs-built_in">this</span>, <span class="hljs-string">'Http'</span>, {
  apiRoot: api.url, <span class="hljs-comment">// url of the API Gateway</span>
  apiEndpoint: TaskInput.fromText(<span class="hljs-string">`hello`</span>),
  method: TaskInput.fromText(<span class="hljs-string">'GET'</span>),
  connection: connection,
});
</code></pre>
<p>The <code>Connection</code> construct does not support <code>InvocationConnectivityParameters</code> yet, so I’m using an override here as well.</p>
<p>And here you have it! You can find this code in full on <a target="_blank" href="https://github.com/bboure/cdk-step-functions-private-api-gateway/blob/main/lib/step-functions-private-api-stack.ts">GitHub</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>With the new Amazon VPC Lattice and AWS PrivateLink integration, teams can now invoke private API Gateway endpoints directly from AWS Step Functions or Amazon EventBridge. This eliminates the need for a proxy Lambda function, which in turn reduces the amount of code required and decreases overhead. It's a significant step forward for teams looking to optimize and simplify their cloud infrastructure.</p>
]]></content:encoded></item><item><title><![CDATA[I Built A Serverless Ephemeral AWS Account Vending Machine]]></title><description><![CDATA[Last November 2024, I attended an AWS user group meetup in Barcelona. I found Joan García's sessions particularly interesting. He explained how they addressed some recurring challenges at Ocado, such as safely conducting proofs of concept or running ...]]></description><link>https://benoitboure.com/i-built-a-serverless-ephemeral-aws-account-vending-machine</link><guid isPermaLink="true">https://benoitboure.com/i-built-a-serverless-ephemeral-aws-account-vending-machine</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[aws-cdk]]></category><category><![CDATA[aws learning ]]></category><category><![CDATA[AWS Management]]></category><category><![CDATA[AWS Cost Optimization]]></category><category><![CDATA[AWS Account Management]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 07 Jan 2025 07:59:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735999703717/198deb0e-d15a-48ff-bd6e-00d2b99df11b.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last November 2024, I attended an <a target="_blank" href="https://www.meetup.com/barcelona-amazon-web-services-meetup/events/304525815/">AWS user group meetup in Barcelona</a>. I found <a target="_blank" href="https://www.linkedin.com/in/jggtic/">Joan García</a>'s sessions particularly interesting. He explained how they addressed some recurring challenges at <a target="_blank" href="https://www.linkedin.com/company/ocado-technology/posts/?feedView=all">Ocado</a>, such as safely conducting proofs of concept or running hackathons without disrupting production environments while keeping costs under control.</p>
<p>Unfortunately, this session was not recorded. I won’t get into the details, but in short, they implemented a way for their teams to request ephemeral, self-destructing AWS accounts. Users request an account to run a PoC, a hackathon, a workshop, etc. After a set time, or when a specific budget is reached, all the resources in the account are automatically deleted, and the account is closed. In other words: an AWS account vending machine.</p>
<p>I thought it was a great idea! We all love to play with new AWS services, run quick PoCs, etc. but I often forget or am too lazy to clean up after myself, which clutters my AWS accounts with unnecessary resources. In some cases, it can even incur unnecessary costs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736019458647/b6b5a412-d375-49a5-ab86-12e4933c927a.png" alt class="image--center mx-auto" /></p>
<p>Although infrastructure as code helps with the tear-down process, I don’t always do it. Sometimes, I don’t even use IaC, especially if I just want to test or try something quickly. The idea of getting an ephemeral AWS account that I can mess with, knowing that everything in it will automatically get destroyed later, sounded very attractive. The bad news: Ocado’s solution is not open source... So I rebuilt it myself.</p>
<h2 id="heading-requirements">Requirements</h2>
<p>Here are the requirements I had in mind before starting this project:</p>
<ul>
<li><p><strong>Security</strong></p>
<ul>
<li><p>All sandbox accounts should stay secure within the same AWS Organization.</p>
</li>
<li><p>Users can only access the accounts they are supposed to; e.g., they can’t access other users’ accounts.</p>
</li>
<li><p>Users should access the vending machine and sandbox accounts using their SSO credentials.</p>
</li>
</ul>
</li>
<li><p><strong>Low Cost:</strong> The solution should be cheap to run (What’s the point of building this to save on costs if the solution itself ends up costing more than the savings?). For that reason, I wanted the solution to be 100% serverless.</p>
</li>
<li><p><strong>Simple</strong>:</p>
<ul>
<li><p>Users should be able to easily request new accounts through a simple Web App.</p>
</li>
<li><p>Users should access sandbox accounts from the AWS access portal, or the AWS CLI, using SSO, just like they do for any other long-lived account (e.g. dev, prod).</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-solution-overview">Solution Overview</h2>
<p>This solution uses IAM Identity Center. Users sign in using SSO to access AWS accounts under the same organization. They <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-applications.html">access a web application</a> where they can request a new sandbox account. When a user submits a request, an account is randomly picked from a pool of AWS accounts specially created for this purpose. It is assigned to the user with a pre-determined <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/userguide/permissionsetsconcept.html">Permission Set</a> in IAM Identity Center. The user can then sign in to the account using the AWS console, or the AWS CLI with SSO. When the sandbox expires, or when the user requests it, the account assignment is removed. The user loses access to the account and all the resources are deleted using <a target="_blank" href="https://github.com/ekristen/aws-nuke">aws-nuke</a> (<code>aws-nuke</code> is an open-source utility that deletes all the resources in an AWS account). Finally, the account is recycled and put back into the account pool for future assignments. Administrators can control who can access the web application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736015186656/8763dd4e-a755-4603-994f-d122c68faf2e.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736004642617/9dd20563-bd9a-43fa-932e-e10614c1c69e.png" alt="Request an AWS Sandbox Account" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736004724880/33691fbb-345b-490a-a601-13972c13531b.png" alt="Sandbox Account Ready" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736004764782/8455ae43-beed-40dc-9a10-20dd78b21ad0.png" alt="List of Sandbox accounts" class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736015374201/969c5b3a-5358-478c-bf0e-c07e29ee29b7.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-how-it-works">How it works</h2>
<p>Here is an overview of the architecture.</p>
<p><img src="https://img.plantuml.biz/plantuml/png/dHPDJzimz9vVyPRBWaIR8IvJXH2b7GXf1KJ6E4mxk8wRMYHsP3ljkgd_VS_v3cbWur3vVN-_brveGvJ9ajZv4B8L5mocJy4zuh0s9jKJtrTaEuwuMMVBJ3D5fJ1Cc36LYK-sEYPBRTyHHuOUhQGQfJ4Hrg2_EVay_kI7N1ld0nSqpiBQk8_lJ2Q95ECqzts07_0aZVcAit189aK-9OPBSOD1HIe7BJdO2Vf_Ie5XwLKcg4NqWEgrabcgRMXJIcM6HJWi5p2QAMvsDo4M2bzC57qIGPbVaT00qtf118bOWgG76RDtM9ikQYe-J0sOQFnSomrJ8bU-Kn4H_7UUlduzeRVrsmY97mKVCZKdXYo9Plvy9qWYvvS3SZCSCuBJgBH_HT2u6IhFG3-_R33QIyN3Y0LqpS8i7gpEBJDRgPwvY6R5RCzPp37DdJ-BPHPUAJdSQICLLiRFHvMLMC3KXjDtj7Cc8ooSB1GTZH6bH944Ogo3sQKCKXdlBE8upWeP3Do0Y70fVq7PF-q2qQ1BuXy7uBM1yt1lRxDdPk5ZS352Yu55NSJT8zG_D2KUATxZddyugHLRjc5q3gNA1BuneY2KMsVlE8HYnS2rPoKFt0AEq-nNld1UKWSzhVqsYLIktQEtyyMP2B7D2qBNMCKoxUyTOZVxte9vMaja8hts10K7l22uEoerz_qikhuRlr0vxkPXPCJC6irQ1A2yQHv9kUrKojtoEShPX-RFqwUsGGPRH-69BJKtJM80pMnxj0QHtT1XZfTRyMLcUz_MBCRKeyLhTGe87h5_S2zbN4llyTVFfdDiFQ8rZqJJsscEr_NeKuem3csufoi8jLe2K4kq8dihBiLYXfUg2UoX8BGZUqagGYilLhLVnMw11RlkgnjOBPIsajKc5v8ePRS2HZ5R6ToZfbSLvwnc5Lr0UYDKw-bJfEjJ6E7g2RkFARTBclAQqastOYeUhQr-fqaLyyUQXxW5Fpk-orau6wDJBISGJ1TVRHeVeMYZtKPhe6qGnYZTtyw9bShmpYb4NkgKSLUbwx7FwrcmcLVmwFn5ddbaSuQJBdS8Txd_65TNgH_OONun7CsfX76MehlMuPpOajL-KB8VYhsjUPcl5VxhyX_OHgBDlNE1alWP-Gi0" alt /></p>
<p><a target="_blank" href="https://editor.plantuml.com/uml/dHPDJzimz9vVyPRBWaIR8IvJXH2b7GXf1KJ6E4mxk8wRMYHsP3ljkgd_VS_v3cbWur3vVN-_brveGvJ9ajZv4B8L5mocJy4zuh0s9jKJtrTaEuwuMMVBJ3D5fJ1Cc36LYK-sEYPBRTyHHuOUhQGQfJ4Hrg2_EVay_kI7N1ld0nSqpiBQk8_lJ2Q95ECqzts07_0aZVcAit189aK-9OPBSOD1HIe7BJdO2Vf_Ie5XwLKcg4NqWEgrabcgRMXJIcM6HJWi5p2QAMvsDo4M2bzC57qIGPbVaT00qtf118bOWgG76RDtM9ikQYe-J0sOQFnSomrJ8bU-Kn4H_7UUlduzeRVrsmY97mKVCZKdXYo9Plvy9qWYvvS3SZCSCuBJgBH_HT2u6IhFG3-_R33QIyN3Y0LqpS8i7gpEBJDRgPwvY6R5RCzPp37DdJ-BPHPUAJdSQICLLiRFHvMLMC3KXjDtj7Cc8ooSB1GTZH6bH944Ogo3sQKCKXdlBE8upWeP3Do0Y70fVq7PF-q2qQ1BuXy7uBM1yt1lRxDdPk5ZS352Yu55NSJT8zG_D2KUATxZddyugHLRjc5q3gNA1BuneY2KMsVlE8HYnS2rPoKFt0AEq-nNld1UKWSzhVqsYLIktQEtyyMP2B7D2qBNMCKoxUyTOZVxte9vMaja8hts10K7l22uEoerz_qikhuRlr0vxkPXPCJC6irQ1A2yQHv9kUrKojtoEShPX-RFqwUsGGPRH-69BJKtJM80pMnxj0QHtT1XZfTRyMLcUz_MBCRKeyLhTGe87h5_S2zbN4llyTVFfdDiFQ8rZqJJsscEr_NeKuem3csufoi8jLe2K4kq8dihBiLYXfUg2UoX8BGZUqagGYilLhLVnMw11RlkgnjOBPIsajKc5v8ePRS2HZ5R6ToZfbSLvwnc5Lr0UYDKw-bJfEjJ6E7g2RkFARTBclAQqastOYeUhQr-fqaLyyUQXxW5Fpk-orau6wDJBISGJ1TVRHeVeMYZtKPhe6qGnYZTtyw9bShmpYb4NkgKSLUbwx7FwrcmcLVmwFn5ddbaSuQJBdS8Txd_65TNgH_OONun7CsfX76MehlMuPpOajL-KB8VYhsjUPcl5VxhyX_OHgBDlNE1alWP-Gi0">Source</a></p>
<h3 id="heading-sandbox-accounts">Sandbox Accounts</h3>
<p>Sandbox accounts are organized in an Organizational Unit (OU). These are the accounts that are assigned to a user when they request one. They don’t hold any resources except for SSO roles and a special role, <code>AWSNuke</code>, which we’ll come back to later.</p>
<h3 id="heading-management-account">Management Account</h3>
<p>This is the management account for the AWS organization structure. Users live in this account (in IAM Identity Center). This account also holds a special cross-account role: <code>VendingMachine</code>. This role is assumed by the Vending Machine service to assign or revoke access to an account for a specific user.</p>
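<p>For illustration, the trust relationship and permissions of such a cross-account role could look like the following sketch (the account ID is a placeholder and the exact policy is my assumption, not taken from the project):</p>
<pre><code class="lang-typescript">// Hypothetical sketch of the VendingMachine cross-account role policies.
// The account id is a placeholder; the SSO actions are the ones needed
// for the CreateAccountAssignment / DeleteAccountAssignment calls
// described below.
const trustPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      // Trust the Vending Machine account so that its workflows can
      // assume this role in the management account.
      Principal: { AWS: "arn:aws:iam::111122223333:root" },
      Action: "sts:AssumeRole",
    },
  ],
};

const permissionsPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["sso:CreateAccountAssignment", "sso:DeleteAccountAssignment"],
      Resource: "*",
    },
  ],
};
</code></pre>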
<h3 id="heading-vending-machine-account">Vending Machine Account</h3>
<p>This is the Vending Machine service. The service is composed of the following components:</p>
<h4 id="heading-a-static-website">A static website</h4>
<p>This is a React SPA that users use to request a new sandbox. It is stored on S3 and served through an Amazon CloudFront distribution.</p>
<h4 id="heading-appsync-api">AppSync API</h4>
<p>An AppSync GraphQL API is the gateway between the users and the service. It triggers the <code>Assign Account</code> and <code>Release Account</code> Step Function workflows (see below).</p>
<h4 id="heading-amazon-cognito">Amazon Cognito</h4>
<p>A Cognito user pool controls user authentication to AWS AppSync. <a target="_blank" href="https://repost.aws/knowledge-center/cognito-user-pool-iam-integration">IAM Identity Center is used as an Identity provider using SAML</a>.</p>
<h4 id="heading-accounts-dynamodb-table">Accounts DynamoDB Table</h4>
<p>This DynamoDB table contains information about the sandbox accounts such as their id and status (e.g. <code>USED</code> or <code>FREE</code>). When in use, it also stores who is using the account (user id), when it expires, etc.</p>
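<p>To make this concrete, here is a hypothetical sketch of what an item in this table could look like (the attribute names are my assumption, not taken from the actual project):</p>
<pre><code class="lang-typescript">// Hypothetical shape of an item in the accounts table.
interface SandboxAccount {
  accountId: string; // AWS account id (partition key)
  status: "FREE" | "USED";
  userId?: string; // IAM Identity Center user id, present while USED
  expiresAt?: string; // ISO date of the scheduled release
}

// Example of an account currently assigned to a user for 14 days.
const assignedAccount: SandboxAccount = {
  accountId: "123456789012",
  status: "USED",
  userId: "9267abcd-1234",
  expiresAt: new Date(Date.now() + 14 * 24 * 3600 * 1000).toISOString(),
};
</code></pre>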
<h4 id="heading-eventbridge-scheduler">EventBridge Scheduler</h4>
<p>When a user requests an account, a schedule is created that destroys the account at its expiration date and returns it to the account pool.</p>
<h4 id="heading-assign-account-step-functions-workflow">Assign Account Step Functions Workflow</h4>
<p>When a user requests a new account, the AppSync API starts this Step Functions workflow, which orchestrates the assignment of an account to the user.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736003288838/770adbcf-b439-4213-80d6-3f01de05864a.png" alt class="image--center mx-auto" /></p>
<p>First, it tries to find an available account, i.e. one that is not already used by another user (<code>status = FREE</code>), and immediately locks it in DynamoDB (<code>status = USED</code>) so that no other user can have the same account assigned to them.</p>
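<p>Such a lock can be implemented with a DynamoDB conditional write. Here is a hedged sketch of the update parameters (the table and attribute names are assumptions, not taken from the project):</p>
<pre><code class="lang-typescript">// Sketch of a conditional UpdateItem input that locks a FREE account.
// The condition makes the write fail if another execution already
// flipped the status, so two users can never get the same account.
const lockAccountInput = {
  TableName: "accounts",
  Key: { accountId: { S: "123456789012" } },
  UpdateExpression: "SET #status = :used, userId = :userId",
  ConditionExpression: "#status = :free",
  ExpressionAttributeNames: { "#status": "status" },
  ExpressionAttributeValues: {
    ":used": { S: "USED" },
    ":free": { S: "FREE" },
    ":userId": { S: "9267abcd-1234" },
  },
};
</code></pre>
<p>With the AWS SDK, such an object would be passed to an <code>UpdateItemCommand</code>; a <code>ConditionalCheckFailedException</code> would tell the workflow to try another account.</p>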
<p>After that, the workflow invokes the <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/APIReference/API_CreateAccountAssignment.html">CreateAccountAssignment</a> command to assign the account to the requesting user. It does so by assuming the <code>VendingMachine</code> role in the management account.</p>
<p>Finally, we schedule the execution of the <code>Release Account</code> workflow at the expiration date of the account. By default, it’s 14 days after the request time, but the user can request a shorter or longer period. We also put an <code>accountAssigned</code> event into an Event Bridge bus to let other services know about it. This event is used by the <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/target-appsync.html">EventBridge AppSync integration</a> to notify the user that the account is ready, in real-time.</p>
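<p>As a sketch, the one-time schedule could be created with an input like the following (the names and ARNs are placeholders; one-time <code>at()</code> expressions and <code>ActionAfterCompletion</code> are part of the EventBridge Scheduler API):</p>
<pre><code class="lang-typescript">// Sketch of a one-time EventBridge Scheduler schedule that triggers
// the Release Account workflow at the expiration date.
const expiresAt = new Date(Date.now() + 14 * 24 * 3600 * 1000);

const createScheduleInput = {
  Name: "release-account-123456789012",
  // One-time schedules use an at() expression: at(yyyy-mm-ddThh:mm:ss)
  ScheduleExpression: "at(" + expiresAt.toISOString().slice(0, 19) + ")",
  FlexibleTimeWindow: { Mode: "OFF" },
  // Let the scheduler delete the schedule itself once it has fired.
  ActionAfterCompletion: "DELETE",
  Target: {
    Arn: "arn:aws:states:eu-west-1:111122223333:stateMachine:ReleaseAccount",
    RoleArn: "arn:aws:iam::111122223333:role/SchedulerRole",
    Input: JSON.stringify({ accountId: "123456789012" }),
  },
};
</code></pre>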
<h4 id="heading-release-account-step-functions-workflow">Release Account Step Functions Workflow</h4>
<p>This other Step Functions workflow orchestrates the destruction of an account, either when it expires (triggered by the EventBridge scheduler) or when the user releases it through the web app because it’s no longer needed.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736003850594/4fdd1708-8dd8-4803-9703-39204618e803.png" alt class="image--center mx-auto" /></p>
<p>First, we check whether the user initiated the destruction. When that is the case, we delete the schedule that we created when setting up the account. When the workflow is triggered by the EventBridge scheduler instead, the schedule is deleted automatically (we set <code>ActionAfterCompletion: 'DELETE'</code>).</p>
<p>Then, we remove access to the account from the user using the <a target="_blank" href="https://docs.aws.amazon.com/singlesignon/latest/APIReference/API_DeleteAccountAssignment.html">DeleteAccountAssignment</a> command. We also remove the user association in the <code>accounts</code> DynamoDB table.</p>
<p>Then, we invoke <code>aws-nuke</code>. Since it’s a long-running process, it is executed within an ECS task running on Fargate. <code>aws-nuke</code> assumes the <code>AWSNuke</code> role within the targeted account and uses that role to delete all resources.</p>
<p>Finally, we release the account in DynamoDB so that it can be re-assigned to another user later.</p>
<h2 id="heading-cost-estimations">Cost Estimations</h2>
<p>Assuming a moderate usage of this service, the operational cost should be close to zero.</p>
<p>With an average of 10 account requests per month, here are the cost estimations of the main components:</p>
<p><strong>AppSync API</strong></p>
<ul>
<li><p>API requests: &lt; 1000 requests(1) at $4/million requests = ~$0.004</p>
</li>
<li><p>~10 real-time updates at $2/million = ~$0.00002</p>
</li>
<li><p>&lt; 10 connection minutes at $0.08 per million minutes = &lt;$0.0000008</p>
</li>
</ul>
<p>(1) Assuming that users consult the web app more often to check expiration times, etc.</p>
<p><strong>DynamoDB</strong></p>
<p>Usage should stay under the free tier. Outside the free tier:</p>
<ul>
<li><p>~50 write requests = $0.00003125</p>
</li>
<li><p>~1000 read requests = $0.000125</p>
</li>
<li><p>&lt;1MB storage = $0.00025</p>
</li>
</ul>
<p><strong>Cognito</strong></p>
<ul>
<li><p>The first 50 MAUs (Monthly Active Users) are free (with a SAML identity provider).</p>
</li>
<li><p>$0.015 per MAU after that.</p>
</li>
</ul>
<p><strong>Step Functions</strong></p>
<p>~150 state transitions, well within the always-free tier of 4,000 state transitions per month.</p>
<p>Outside the free tier, at $0.025 / 1,000 transitions: $0.025 × 150 / 1000 ~= $0.00375</p>
<p><strong>EventBridge (Event bus and scheduler)</strong></p>
<ul>
<li><p>~10 events per month: ~$0.00001 ($1 / million events)</p>
</li>
<li><p>~10 schedules per month: Well under the 14M schedules free tier; or ~$0.00001 ($1 / million schedule triggers)</p>
</li>
</ul>
<p><strong>Static Website</strong></p>
<ul>
<li><p>S3: ~1.4 MB stored at $0.023/GB: $0.023 × 0.0014 GB = $0.0000322</p>
</li>
<li><p>CloudFront: it has an always-free tier of 1 TB of data transfer and 10M HTTPS requests, which is probably more than enough for this use case.</p>
</li>
</ul>
<p><strong>ECS Fargate</strong></p>
<p>This is probably the highest cost. The solution uses the lowest configuration possible (0.25 vCPU and 512 MB of memory), assuming an average execution time of ~15 minutes per run.</p>
<ul>
<li><p>CPU: 0.25 vCPU × 0.25 hours × 10 × $0.03238 = $0.0202375 (ARM architecture)</p>
</li>
<li><p>Memory: 0.5 GB × 0.25 hours × 10 × $0.00356 = $0.00445</p>
</li>
</ul>
<p><strong>Total cost</strong></p>
<p>Even excluding the free tier, the total cost of operation should not go over a few cents per month. Of course, your mileage may vary, depending on the size of your organization, the number of users, and how many times the service is used.</p>
<h2 id="heading-its-open-source">It’s Open Source!</h2>
<p>This project is open source; you can find it on <a target="_blank" href="https://github.com/bboure/aws-account-vending-machine-demo">GitHub</a>. Feel free to fork it and deploy it into your account, share it, and send me your feedback!</p>
<h2 id="heading-whats-next">What’s Next?</h2>
<p>This solution is basic. I built it both as a PoC and for the challenge. It’s also good enough for my personal usage and as an MVP. However, I can see a few improvements that could be added:</p>
<ul>
<li><p><strong>User notifications</strong>: Before an account is destroyed, users might want to get warned a few days before it happens.</p>
</li>
<li><p><strong>Budgets</strong>: I didn't include budget limits in this MVP, but automatically removing costly resources before they increase your AWS bill would be a useful feature.</p>
</li>
<li><p><strong>Time/Budget Extension:</strong> Need more time/budget to work on your project? Request an extension.</p>
</li>
<li><p><strong>Multiple Permission Sets</strong>: The current implementation grants the <code>AdministratorAccess</code> permission set by default. One might want to support more than one permission set, depending on the use case the account is created for, or who requests it.</p>
</li>
<li><p><strong>Team accounts</strong>: When working with teams, you might want to assign an account to a team, instead of single users, so that several people can work on it at the same time.</p>
</li>
<li><p><strong>Manager approval</strong>: Companies might want to require approval by a manager before an account is granted to a user or team. Managers could also be able to control expiration times, budgets, permission sets, etc.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Trying new AWS services, building proofs of concept, or attending workshops can easily clutter your AWS accounts and incur costs if you forget to clean up after yourself. Ephemeral, self-destructing AWS accounts can help eliminate or mitigate those problems. With a self-service vending machine, users can request sandbox accounts to play with and focus on their projects while the cleanup process is automated.</p>
]]></content:encoded></item><item><title><![CDATA[Unmarshalling DynamoDB Items from Step Functions]]></title><description><![CDATA[AWS Step Functions introduced two new features: variables and support for JSONata. JSONata is a lightweight query and transformation language for JSON data. Whoever has worked with Step Functions knows that this is a real game-changer! When I heard t...]]></description><link>https://benoitboure.com/unmarshalling-dynamodb-items-from-step-functions</link><guid isPermaLink="true">https://benoitboure.com/unmarshalling-dynamodb-items-from-step-functions</guid><category><![CDATA[AWS]]></category><category><![CDATA[AWS Step Functions]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[serverless]]></category><category><![CDATA[CDK]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 16 Dec 2024 08:00:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734187737566/2216cab7-1b92-46e2-b31f-2a2ffaec2063.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS Step Functions introduced two new features: variables and support for <a target="_blank" href="https://docs.jsonata.org/overview.html">JSONata</a>. JSONata is a lightweight query and transformation language for JSON data. Whoever has worked with Step Functions knows that this is a real game-changer! When I heard the news, I immediately saw the potential for many things that would previously require a Lambda function but would now be achievable “natively” in Step Functions.</p>
<p>One common task many Step Functions workflows perform is fetching items from DynamoDB. However, the <code>getItem</code> or <code>query</code> tasks return the raw “marshalled” data from DynamoDB, i.e. items in DynamoDB’s typed attribute format.</p>
<p>Example:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"id"</span>: {
    <span class="hljs-attr">"S"</span>: <span class="hljs-string">"1"</span>
  },
  <span class="hljs-attr">"name"</span>: {
    <span class="hljs-attr">"S"</span>: <span class="hljs-string">"John"</span>
  },
  <span class="hljs-attr">"address"</span>: {
    <span class="hljs-attr">"M"</span>: {
      <span class="hljs-attr">"street"</span>: {
        <span class="hljs-attr">"S"</span>: <span class="hljs-string">"123, 5th Avenue"</span>
      },
      <span class="hljs-attr">"postCode"</span>: {
        <span class="hljs-attr">"S"</span>: <span class="hljs-string">"5555"</span>
      },
      <span class="hljs-attr">"city"</span>: {
        <span class="hljs-attr">"S"</span>: <span class="hljs-string">"New York"</span>
      }
    }
  },
  <span class="hljs-attr">"age"</span>: {
    <span class="hljs-attr">"N"</span>: <span class="hljs-string">"32"</span>
  }
}
</code></pre>
<p>This format is not very practical to work with because you need to know and include the field types in the paths (e.g., <code>$.item.name.S</code>). Additionally, certain values, such as numbers, are encoded as strings (like <code>age</code> above), which makes it harder to perform simple operations like math and comparisons.</p>
<p>With the arrival of JSONata, I started wondering if we could use the <a target="_blank" href="https://docs.jsonata.org/string-functions">Function Library</a> to "visit" and decode DynamoDB objects (a.k.a unmarshall them).</p>
<h2 id="heading-step-one-a-simple-proof-of-concept">Step One: A Simple Proof of Concept</h2>
<p>Before getting into the nitty-gritty of Step Functions, I first wanted to build a quick proof of concept to see if JSONata would give us that possibility. Luckily, JSONata has a practical <a target="_blank" href="https://try.jsonata.org/">playground</a> to try it out. After some time, I came up with this simple solution:</p>
<pre><code class="lang-javascript">(
  $unmarshall := <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$object</span>) </span>{(
      $type($object) = <span class="hljs-string">'array'</span>
        ? [$map($object, $unmarshall)]
        : $merge($each($object, <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$val, $key</span>) </span>{
            { <span class="hljs-attr">$key</span>: $convertValue($val) }
        })
      );
  )};

  $convertValue := <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$object</span>) </span>{(
    $type := $keys($object)[<span class="hljs-number">0</span>];
    $value := $lookup($object, $type);

    $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'S'</span>, <span class="hljs-string">'SS'</span>, <span class="hljs-string">'Ss'</span>, <span class="hljs-string">'B'</span>, <span class="hljs-string">'BS'</span>, <span class="hljs-string">'Bs'</span>] ?  $value
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'N'</span>] ? $number($value)
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'M'</span>] ? $unmarshall($value)
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'BOOL'</span>, <span class="hljs-string">'Bool'</span>] ? $value = <span class="hljs-string">'true'</span> or $value = <span class="hljs-literal">true</span>
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'L'</span>] ? [$map($value, $convertValue)]
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'NS'</span>, <span class="hljs-string">'Ns'</span>] ? [$value.$number()]
        : $type <span class="hljs-keyword">in</span> [<span class="hljs-string">'NULL'</span>, <span class="hljs-string">'Null'</span>, <span class="hljs-string">'Nul'</span>] ? <span class="hljs-literal">null</span>
        : $error(<span class="hljs-string">'Unsupported type: '</span> &amp; $type);
  )};

  $unmarshall($);
)
</code></pre>
<p><strong>What’s going on in there?</strong></p>
<p><code>$unmarshall</code> is a function that takes an object or an array as input. It visits the value and, for each attribute, converts the nested type objects into native values with <code>$convertValue</code>. It does so recursively for nested arrays and maps.</p>
<p>The final result is very similar to the <a target="_blank" href="https://github.com/aws/aws-sdk-js-v3/blob/main/packages/util-dynamodb/src/convertToNative.ts">JavaScript version</a>.</p>
<p><a target="_blank" href="https://try.jsonata.org/yG-465u5K">Try it yourself!</a></p>
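<p>For reference, the same logic can also be sketched in plain TypeScript. This is a simplified version that handles fewer types than the SDK’s <code>util-dynamodb</code> helpers and is only meant to mirror the JSONata expression above:</p>
<pre><code class="lang-typescript">// Minimal TypeScript sketch mirroring the JSONata unmarshaller above.
// It only covers the most common DynamoDB attribute types.
type AttributeValue = { [type: string]: any };

function convertValue(attr: AttributeValue): any {
  const type = Object.keys(attr)[0];
  const value = attr[type];
  switch (type) {
    case "S":
    case "SS":
    case "B":
    case "BS":
      return value;
    case "N":
      return Number(value);
    case "NS":
      return value.map(Number);
    case "BOOL":
      return value === true || value === "true";
    case "NULL":
      return null;
    case "L":
      return value.map(convertValue);
    case "M":
      return unmarshall(value);
    default:
      throw new Error("Unsupported type: " + type);
  }
}

function unmarshall(item: { [key: string]: AttributeValue }): { [key: string]: any } {
  const result: { [key: string]: any } = {};
  for (const key of Object.keys(item)) {
    result[key] = convertValue(item[key]);
  }
  return result;
}

const unmarshalled = unmarshall({
  id: { S: "1" },
  age: { N: "32" },
  address: { M: { city: { S: "New York" } } },
});
// unmarshalled: { id: "1", age: 32, address: { city: "New York" } }
</code></pre>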
<h2 id="heading-step-two-use-it-with-step-functions">Step Two: Use it With Step Functions</h2>
<p>After proving it’s doable, the second step was to make it work within Step Functions.</p>
<p>My first attempt was to use the new <code>Assign</code> property and store the <code>$unmarshall</code> and <code>$convertValue</code> functions into variables of the same name. Then I tried to call them from the <code>Output</code> property.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Assign"</span>: {
    <span class="hljs-attr">"unmarshall"</span>: <span class="hljs-string">"{% function ($object) { ... } %}"</span>,
    <span class="hljs-attr">"convertValue"</span>: <span class="hljs-string">"{% function ($object) { ... } %}"</span>
  },
  <span class="hljs-attr">"Output"</span>: {
    <span class="hljs-attr">"result"</span>: <span class="hljs-string">"{% $unmarshall($states.input.dynamoDbItem) %}"</span>
  }
}
</code></pre>
<p>But this did not work. For two reasons:</p>
<ol>
<li>As mentioned in <a target="_blank" href="https://arc.net/l/quote/lvbezixj">this article</a> and <a target="_blank" href="https://docs.aws.amazon.com/step-functions/latest/dg/transforming-data.html#querylanguage-field">the doc</a>, you can’t use the variables in the same state you assigned them.</li>
</ol>
<pre><code class="lang-json">{
    <span class="hljs-attr">"error"</span>: <span class="hljs-string">"States.QueryEvaluationError"</span>,
    <span class="hljs-attr">"cause"</span>: <span class="hljs-string">"The JSONata expression '$unmarshall($states.input.dynamoDbItem)' specified for the field 'Output/result' threw an error during evaluation. T1006: Attempted to invoke a non-function"</span>
}
</code></pre>
<p>This is because the <code>Assign</code> and <code>Output</code> steps are evaluated in parallel.</p>
<ol start="2">
<li>Assigning functions in variables is not supported.</li>
</ol>
<p>This is not mentioned anywhere in the doc, but you can’t assign a function to a variable (it must be a “real” value). I learned this when I tried to move the <code>Assign</code> part into a previous state to work around the first limitation.</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"error"</span>: <span class="hljs-string">"States.QueryEvaluationError"</span>,
    <span class="hljs-attr">"cause"</span>: <span class="hljs-string">"The JSONata expression 'function ($object) { ... }' specified for the field 'Assign/unmarshall' returned an unsupported result type."</span>
}
</code></pre>
<p>After some thought and research, I figured that nothing prevents me from putting everything into a single expression. This expression can define the functions <strong>and</strong> return the final result (just like in the JSONata playground).</p>
<p>And because that expression evaluates to a value, the result would end up in that variable, ready to be used later.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Assign"</span>: {
    <span class="hljs-attr">"unmarshalledItem"</span>: <span class="hljs-string">"{% (
  $unmarshall := function ($object) {(
    $type($object) = 'array' ?
      [$map($object, $unmarshall)]
      : $merge($each($object, function ($val, $key) {
            { $key: $convertValue($val) }
        })
    );
  )};

  $convertValue := function ($object) {(
    $type := $keys($object)[0];
    $value := $lookup($object, $type);

    $type in ['S', 'SS', 'Ss', 'B', 'BS', 'Bs'] ?  $value
      : $type in ['N'] ? $number($value)
      : $type in ['M'] ? $unmarshall($value)
      : $type in ['BOOL', 'Bool'] ? $value = 'true' or $value = true
      : $type in ['L'] ? [$map($value, $convertValue)]
      : $type in ['NS', 'Ns'] ? [$value.$number()]
      : $type in ['NULL', 'Null', 'Nul'] ? null
      : $error('Unsupported type: ' &amp; $type);
  )};

  $unmarshall($states.input.dynamoDbItem);
) %}"</span>
  }
}
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🗒</div>
<div data-node-type="callout-text">Note: I kept the new lines inside <code>Assign</code> for readability, but for it to be a valid JSON/ASL, it must all go into a single line when deployed to Step Functions.</div>
</div>

<p>Testing it out:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734188827533/9fb02611-c03a-436b-bc58-d6c202e88470.png" alt class="image--center mx-auto" /></p>
<p>Now I can use the <code>$unmarshalledItem</code> variable, which contains the result, anywhere in a later state.</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Output"</span>: {
    <span class="hljs-attr">"unmarshalledItem"</span>: <span class="hljs-string">"{% $unmarshalledItem %}"</span>
  }
}
</code></pre>
<p>Alternatively, I could also return the result directly in the <code>Output</code>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Type"</span>: <span class="hljs-string">"Pass"</span>,
  <span class="hljs-attr">"QueryLanguage"</span>: <span class="hljs-string">"JSONata"</span>,
  <span class="hljs-attr">"Output"</span>: {
    <span class="hljs-attr">"unmarshalledItem"</span>: <span class="hljs-string">"{% (
  $unmarshall := function ($object) {(
    $type($object) = 'array' ?
      [$map($object, $unmarshall)]
      : $merge($each($object, function ($val, $key) {
            { $key: $convertValue($val) }
        })
    );
  )};

  $convertValue := function ($object) {(
    $type := $keys($object)[0];
    $value := $lookup($object, $type);

    $type in ['S', 'SS', 'Ss', 'B', 'BS', 'Bs'] ?  $value
      : $type in ['N'] ? $number($value)
      : $type in ['M'] ? $unmarshall($value)
      : $type in ['BOOL', 'Bool'] ? $value = 'true' or $value = true
      : $type in ['L'] ? [$map($value, $convertValue)]
      : $type in ['NS', 'Ns'] ? [$value.$number()]
      : $type in ['NULL', 'Null', 'Nul'] ? null
      : $error('Unsupported type: ' &amp; $type);
  )};

  $unmarshall($states.input.dynamoDbItem);
) %}"</span>
  }
}
</code></pre>
<h2 id="heading-step-3-create-a-cdk-construct">Step 3: Create a CDK Construct</h2>
<p>After having a working proof of concept, I wanted to put it all into a simple, reusable CDK construct:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { CustomState } <span class="hljs-keyword">from</span> <span class="hljs-string">"aws-cdk-lib/aws-stepfunctions"</span>;
<span class="hljs-keyword">import</span> { Construct } <span class="hljs-keyword">from</span> <span class="hljs-string">"constructs"</span>;

<span class="hljs-keyword">interface</span> DynamoUnmarshallProps {
  path: <span class="hljs-built_in">string</span>;
  variableName: <span class="hljs-built_in">string</span>;
}

<span class="hljs-keyword">const</span> generateUnmarshall = <span class="hljs-function">(<span class="hljs-params">path: <span class="hljs-built_in">string</span></span>) =&gt;</span> <span class="hljs-string">`{% (
  $unmarshall := function ($object) {(
    $type($object) = 'array' ?
      [$map($object, $unmarshall)]
      : $merge($each($object, function ($val, $key) {
            { $key: $convertValue($val) }
        })
    );
  )};

  $convertValue := function ($object) {(
    $type := $keys($object)[0];
    $value := $lookup($object, $type);

    $type in ['S', 'SS', 'Ss', 'B', 'BS', 'Bs'] ?  $value
      : $type in ['N'] ? $number($value)
      : $type in ['M'] ? $unmarshall($value)
      : $type in ['BOOL', 'Bool'] ? $value = 'true' or $value = true
      : $type in ['L'] ? [$map($value, $convertValue)]
      : $type in ['NS', 'Ns'] ? [$value.$number()]
      : $type in ['NULL', 'Null', 'Nul'] ? null
      : $error('Unsupported type: ' &amp; $type);
  )};

  $unmarshall(<span class="hljs-subst">${path}</span>);
) %}`</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> DynamoUnmarshall <span class="hljs-keyword">extends</span> CustomState {
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">scope: Construct, id: <span class="hljs-built_in">string</span>, props: DynamoUnmarshallProps</span>) {
    <span class="hljs-keyword">const</span> { path, variableName } = props;

    <span class="hljs-built_in">super</span>(scope, id, {
      stateJson: {
        Type: <span class="hljs-string">"Pass"</span>,
        QueryLanguage: <span class="hljs-string">"JSONata"</span>,
        Assign: {
          [variableName]: generateUnmarshall(path),
        },
      },
    });
  }
}
</code></pre>
<p>The construct takes two input parameters:</p>
<ul>
<li><p><code>path</code>: The JSONata path of the raw DynamoDB item to unmarshall</p>
</li>
<li><p><code>variableName</code>: The name of the variable in which to store the result</p>
</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> unmarshall = <span class="hljs-keyword">new</span> DynamoUnmarshall(<span class="hljs-built_in">this</span>, <span class="hljs-string">"Unmarshall"</span>, {
  path: <span class="hljs-string">"$states.input.dynamoDbItem"</span>,
  variableName: <span class="hljs-string">"unmarshalledItem"</span>,
});
</code></pre>
<p>You can find a <a target="_blank" href="https://github.com/bboure/stepfunction-unmarshall-dynamodb/tree/main">fully working example on GitHub</a>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">⚠</div>
<div data-node-type="callout-text">Please note that this was for experimental and learning purposes only. I do not guarantee that this code will work in all cases. I did not test this thoroughly and it should <strong>not</strong> be considered production-ready.</div>
</div>

<h2 id="heading-conclusion">Conclusion</h2>
<p>Unmarshalling DynamoDB items within Step Functions lets developers access their data more easily. Previously, developers needed to remember to include field types in their paths or use a Lambda function. Embedding the logic in a reusable CDK construct hides that complexity and simplifies the process.</p>
]]></content:encoded></item><item><title><![CDATA[Calling External Endpoints With Step Functions and the CDK]]></title><description><![CDATA[At re:Invent 2023, AWS announced a new feature for Step Functions that allows you to call third-party HTTPS API endpoints directly from your workflow without the need to write a Lambda function. It's a simple way to allow you to securely call externa...]]></description><link>https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk</link><guid isPermaLink="true">https://benoitboure.com/calling-external-endpoints-with-step-functions-and-the-cdk</guid><category><![CDATA[AWS]]></category><category><![CDATA[serverless]]></category><category><![CDATA[CDK]]></category><category><![CDATA[aws-cdk]]></category><category><![CDATA[stepfunction]]></category><category><![CDATA[state-machines]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Wed, 17 Jan 2024 20:21:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705521462494/3d9206b6-871e-43e9-80f6-38d033b3c71d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At re:Invent 2023, AWS announced a new feature for <a target="_blank" href="https://aws.amazon.com/step-functions">Step Functions</a> that allows you to call third-party HTTPS API endpoints directly from your workflow without the need to write a Lambda function. It's a simple way to allow you to securely call external providers such as Stripe, Github, etc.</p>
<p>AWS Step Functions is a serverless orchestration service that allows developers to easily create orchestrated processes (state machines) without having to manage servers. It integrates with over 200 services. With Step Functions, you only pay for the number of <a target="_blank" href="https://aws.amazon.com/step-functions/pricing/">state transitions</a> that your state machines execute.</p>
<p>In this article, I will explain this new feature, and illustrate it with a practical example using the CDK (Cloud Development Kit).</p>
<h2 id="heading-how-does-it-work">How does it work?</h2>
<p>The HTTP endpoint Task state allows you to send an HTTPS request to the endpoint of your choice. It can be a <code>GET</code>, <code>POST</code>, <code>PUT</code>, <code>DELETE</code>, <code>PATCH</code>, <code>OPTIONS</code>, or <code>HEAD</code>, and you can also pass a request body.</p>
<p>You will also need to specify a connection ARN for authentication. Step Functions HTTP endpoints use EventBridge connections, the same as <a target="_blank" href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html#eb-api-destination-connection">EventBridge API destinations</a>. This keeps your credentials secure, preventing them from being hard-coded in the ASL (Amazon States Language) definition.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705515299911/824a8665-6978-41a8-ae5e-206b62b2043c.png" alt class="image--center mx-auto" /></p>
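<p>In raw ASL, such a task state might look like the following sketch (the endpoint, connection ARN, and request body fields are placeholders, not a real API):</p>
<pre><code class="lang-json">{
  "CallExternalApi": {
    "Type": "Task",
    "Resource": "arn:aws:states:::http:invoke",
    "Parameters": {
      "ApiEndpoint": "https://api.example.com/v1/orders",
      "Method": "POST",
      "Authentication": {
        "ConnectionArn": "arn:aws:events:us-east-1:123456789012:connection/my-connection/abc123"
      },
      "RequestBody": {
        "orderId.$": "$.orderId"
      }
    },
    "End": true
  }
}
</code></pre>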
<h2 id="heading-a-practical-example">A Practical Example</h2>
<p>Let's take a practical example for this new Task state. Imagine that we are selling licenses for an app (like <a target="_blank" href="https://graphbolt.dev/">GraphBolt</a>). We accept payments on our website. Once a payment has been confirmed, we want to generate a license and send it to the user via email.</p>
<p>We are using the following services:</p>
<p><strong>Paddle</strong></p>
<p><a target="_blank" href="https://www.paddle.com/">Paddle</a> is a merchant of record that provides a payment gateway. It can send <a target="_blank" href="https://developer.paddle.com/webhooks/notification-destinations">notifications</a> to your backend via webhooks when a purchase is confirmed, and it exposes an API that lets us fetch information about payments, customers, and more.</p>
<p><strong>Keygen</strong></p>
<p><a target="_blank" href="https://keygen.sh/">Keygen.sh</a> is an open-source licensing API. It provides everything you need to generate, manage, and validate software licenses.</p>
<p>Our goal is to create a back-end system with an API that receives events from Paddle, validates them, and then starts a Step Functions state machine that processes the event to generate and send the license key to the user.</p>
<p>Here is the overview of what it looks like.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705519748141/712f0ceb-6cc3-4fc4-93c9-a4647cec5207.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Here, I will only focus on the Step Functions state machine, and more specifically the HTTP task definition. I won't go into detail about how Paddle and Keygen work.</div>
</div>

<p>Here is what we want our state machine to accomplish:</p>
<ol>
<li><p>Receive a <a target="_blank" href="https://developer.paddle.com/webhooks/transactions/transaction-completed"><code>transaction.complete</code></a> Paddle event as input.</p>
</li>
<li><p>Generate a new License in Keygen through the API.</p>
</li>
<li><p>Fetch the customer information from Paddle (name, email, etc) using the <code>customer_id</code> included in the event.</p>
</li>
<li><p>Send the license key to the user via SES.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705516840838/9b74a819-90b6-4b5c-bcdb-22f714787fe8.png" alt class="image--center mx-auto" /></p>
<p>At the time of writing this article, the CDK does not (yet) have a dedicated Construct for the HTTP endpoint Task (watch <a target="_blank" href="https://github.com/aws/aws-cdk/issues/28278">this GitHub issue</a>). However, we can use the <code>CustomState</code> construct and define the task in plain ASL.</p>
<p>This is how I defined the <em>CreateLicense</em> task.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> keygenConnection = <span class="hljs-keyword">new</span> Connection(<span class="hljs-built_in">this</span>, <span class="hljs-string">'KeygenConnection'</span>, {
  authorization: Authorization.apiKey(
    <span class="hljs-string">'Authorization'</span>,
    SecretValue.secretsManager(<span class="hljs-string">'KeygenSecret'</span>),
  ),
});

<span class="hljs-keyword">const</span> keygenEndpoint =
  <span class="hljs-string">'https://api.keygen.sh/v1/accounts/2d4fdf58-9507-4e0b-a7e2-5520e1f1cbdb'</span>;

<span class="hljs-keyword">const</span> createLicense = <span class="hljs-keyword">new</span> CustomState(<span class="hljs-built_in">this</span>, <span class="hljs-string">'CreateLicense'</span>, {
  stateJson: {
    Type: <span class="hljs-string">'Task'</span>,
    Resource: <span class="hljs-string">'arn:aws:states:::http:invoke'</span>,
    Parameters: {
      ApiEndpoint: <span class="hljs-string">`<span class="hljs-subst">${keygenEndpoint}</span>/licenses`</span>,
      Method: <span class="hljs-string">'POST'</span>,
      Authentication: {
        ConnectionArn: keygenConnection.connectionArn,
      },
      RequestBody: {
        data: {
          <span class="hljs-keyword">type</span>: <span class="hljs-string">'licenses'</span>,
          attributes: {
            metadata: {
              <span class="hljs-string">'transactionId.$'</span>: <span class="hljs-string">'$.data.id'</span>,
              <span class="hljs-string">'customerId.$'</span>: <span class="hljs-string">'$.data.customer_id'</span>,
            },
          },
          relationships: {
            policy: {
              data: {
                <span class="hljs-keyword">type</span>: <span class="hljs-string">'policies'</span>,
                id: <span class="hljs-string">'8c2294b0-dbbe-4028-b561-6aa246d60951'</span>,
              },
            },
          },
        },
      },
    },
    ResultSelector: {
      <span class="hljs-string">'body.$'</span>: <span class="hljs-string">'States.StringToJson($.ResponseBody)'</span>,
    },
    OutputPath: <span class="hljs-string">'$.body'</span>,
  },
});
</code></pre>
<p>First, we create a <code>Connection</code> for our HTTP task. This is an <a target="_blank" href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_events.Connection.html">EventBridge Connection</a>. As I explained earlier, the role of the connection is to store the credentials securely and not leak them in the Step Functions definition. However, we also don't want to hard-code them in the CDK definition. To avoid that, I manually created a secret in Secrets Manager named <code>KeygenSecret</code> which contains the API key, and referenced it in the connection.</p>
<p>Then, I create a Task with the <code>Resource</code> type of <code>arn:aws:states:::http:invoke</code>, attach the connection to it, and define all the other attributes (method, body, etc).</p>
<p>This defines our HTTP task state, but to be able to execute it, Step Functions also needs the necessary permissions.</p>
<p>We need three things:</p>
<ul>
<li><p>Permission to execute HTTP requests</p>
</li>
<li><p>Permission to use the EventBridge connection</p>
</li>
<li><p>Permission to fetch the connection's secret</p>
</li>
</ul>
<p>For that, I manually attach the following IAM policy to the state machine's role.</p>
<pre><code class="lang-typescript">sm.role.attachInlinePolicy(
  <span class="hljs-keyword">new</span> Policy(<span class="hljs-built_in">this</span>, <span class="hljs-string">'HttpInvoke'</span>, {
    statements: [
      <span class="hljs-keyword">new</span> PolicyStatement({
        actions: [<span class="hljs-string">'states:InvokeHTTPEndpoint'</span>],
        resources: [sm.stateMachineArn],
        conditions: {
          StringEquals: {
            <span class="hljs-string">'states:HTTPMethod'</span>: <span class="hljs-string">'POST'</span>,
          },
          StringLike: {
          <span class="hljs-string">'states:HTTPEndpoint'</span>: <span class="hljs-string">`<span class="hljs-subst">${keygenEndpoint}</span>/*`</span>,
          },
        },
      }),
      <span class="hljs-keyword">new</span> PolicyStatement({
        actions: [<span class="hljs-string">'events:RetrieveConnectionCredentials'</span>],
        resources: [
          keygenConnection.connectionArn,
        ],
      }),
      <span class="hljs-keyword">new</span> PolicyStatement({
        actions: [
          <span class="hljs-string">'secretsmanager:GetSecretValue'</span>,
          <span class="hljs-string">'secretsmanager:DescribeSecret'</span>,
        ],
        resources: [
          <span class="hljs-string">'arn:aws:secretsmanager:*:*:secret:events!connection/*'</span>,
        ],
      }),
    ],
  }),
);
</code></pre>
<p>Finally, I did the same thing for the Paddle HTTP task. I also added the SES <code>sendEmail</code> task and put everything together.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705518882773/d35d5341-c595-4f58-9962-9f2db546e4c8.png" alt class="image--center mx-auto" /></p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🧑‍💻</div>
<div data-node-type="callout-text">You can find the full code on <a target="_blank" href="https://github.com/bboure/step-functions-http-task-cdk">GitHub</a>.</div>
</div>

<h1 id="heading-conclusion">Conclusion</h1>
<p>The support for direct calls to HTTP endpoints opens a lot of possibilities for integrating with third parties. Previously, achieving the same result required a Lambda function. This is one more step forward towards zero-code Step Functions!</p>
]]></content:encoded></item><item><title><![CDATA[Securely Access Your AWS Resources From Github Actions]]></title><description><![CDATA[Security is a very important topic for all cloud engineers. Making sure that your infrastructure and data are kept out of reach of malicious people is one of the most serious things to get right. In AWS, we are used to dealing with IAM roles and perm...]]></description><link>https://benoitboure.com/securely-access-your-aws-resources-from-github-actions</link><guid isPermaLink="true">https://benoitboure.com/securely-access-your-aws-resources-from-github-actions</guid><category><![CDATA[AWS]]></category><category><![CDATA[GitHub]]></category><category><![CDATA[github-actions]]></category><category><![CDATA[Security]]></category><category><![CDATA[ci-cd]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 27 Dec 2021 15:13:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1640545141412/JHfiV9GBd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Security is a very important topic for all cloud engineers. Making sure that your infrastructure and data are kept out of reach of malicious people is one of the most serious things to get right. In AWS, we are used to dealing with IAM roles and permissions that make our resources accessible to users or to other resources. However, sometimes you need to grant access from outside your organization.</p>
<p>One example is when you want to deploy your infrastructure from a CI/CD pipeline, like GitHub Actions. How do you allow your workflow to gain access to your AWS account?</p>
<p>One approach is to create a dedicated IAM user, store its credentials in the <a target="_blank" href="https://docs.github.com/en/actions/security-guides/encrypted-secrets">GitHub secrets store</a>, and allow the workflow to use them. Easy enough! Secrets are encrypted by GitHub, so it is secure, right?</p>
<p>Not really... The problem is that those credentials are meant to be long-lived. If anyone gets hold of them for whatever reason (e.g. a leak in workflow logs, or someone gaining access to a GitHub Actions runner), they will be able to access all your resources (at least those that the credentials are allowed to control). Sure, you could rotate them from time to time, but you'd have to do that manually. This is probably not something you want to spend time doing and, let's face it, you probably won't!</p>
<p>Luckily, there is a better solution. If you are using GitHub Actions, you can allow GitHub to obtain temporary, short-lived credentials that it can use during the execution of the workflow. After that, the credentials expire and no one will ever be able to use them again.</p>
<p>In this post, I will guide you through the steps to set this up. Don't worry, it's actually easier than you think!</p>
<p>Here is a diagram representing what we are going to accomplish.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640544136378/7f72xtMx3.png" alt="Github OIDC assume role.png" /></p>
<h1 id="heading-setting-up-your-aws-account">Setting up your AWS account</h1>
<blockquote>
<p>💡 TL;DR: I created a CloudFormation quick-create link that you can use to automate the following steps. See the bottom of this article. If you want to know how it works and what CloudFormation is going to do, keep reading this section.</p>
</blockquote>
<h2 id="heading-create-an-openid-connect-identity-provider">Create an OpenID Connect Identity provider</h2>
<p>The first step is to create an OpenID Connect (OIDC) identity provider in your AWS account. This will allow GitHub to identify itself.</p>
<ul>
<li>Go to the <a target="_blank" href="https://console.aws.amazon.com/iamv2/home?#/identity_providers">IAM console -&gt; Identity providers</a></li>
<li>Click <em>Add new provider</em></li>
<li>Select <em>OpenID Connect</em></li>
<li>Provider Url: <code>https://token.actions.githubusercontent.com</code> (Don't forget to click <code>Get Thumbprint</code>)</li>
<li>Audience: <code>sts.amazonaws.com</code></li>
<li>Add tags if you want to and click <em>Add Provider</em></li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640547423605/DPAcFf-vK.png" alt="image.png" /></p>
<blockquote>
<p>💡 You will need to do this step only once per AWS account.</p>
</blockquote>
<p><strong>Edit Jan 13 2022</strong></p>
<p>On Jan 12, GitHub Actions changed its certificate chain. The new thumbprint is <code>6938fd4d98bab03faadb97b34396831e3780aea1</code>.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://twitter.com/Benoit_Boure/status/1481537078869565440">https://twitter.com/Benoit_Boure/status/1481537078869565440</a></div>
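<p>If you prefer the CLI, an equivalent provider can be created with the following command (a sketch; it uses the thumbprint above and requires valid AWS credentials):</p>
<pre><code class="lang-bash">aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list 6938fd4d98bab03faadb97b34396831e3780aea1
</code></pre>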
<h2 id="heading-create-a-role">Create a role</h2>
<p>You now need to create a role that GitHub will be able to assume in order to access the resources it needs to control.</p>
<ul>
<li>Go back to IAM and select <a target="_blank" href="https://console.aws.amazon.com/iamv2/home?#/roles">Roles</a></li>
<li>Create a new Role</li>
<li>Choose <em>Web Identity</em>, select the Identity provider you created in the previous step, and its audience.</li>
<li>Click <em>Next:Permissions</em></li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640267179031/LAbzvscMB.png" alt="image.png" /></p>
<p>You now need to give the role the appropriate permissions (Policies). These are the ones that GitHub needs in order to do whatever it has to do. This will vary based on your use case, so I will leave that up to you. Keep in mind that you should stick to the <a target="_blank" href="https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege">principle of least privilege</a>.</p>
<p>When that is done, give your role a name and click <em>Create Role</em>.</p>
<p>There is now an additional step to do. You need to edit the trust policy of the role to reduce its scope to your repository only. Make sure you don't skip this part, it is <strong>very important</strong>. Without that, <strong>any repository on GitHub will be able to assume your role and access your resources</strong>. (Unfortunately, there does not seem to be a way to do that at creation time).</p>
<p>Go back to IAM Roles and select the created Role. Choose <em>Trust Relationships</em> and <em>Edit Trust Relationship</em>.</p>
<p>Under <code>Condition</code>, add the following segment:</p>
<pre><code class="lang-json"><span class="hljs-string">"StringLike"</span>: {
  <span class="hljs-attr">"token.actions.githubusercontent.com:sub"</span>: <span class="hljs-string">"repo:[your-org]/[your-repo]:*"</span>
}
</code></pre>
<p>Replace the organization and repo names to match yours, and click <code>Update Trust Policy</code>.</p>
<blockquote>
<p>✍️ Note: You can reduce the scope even further by using <a target="_blank" href="https://git-scm.com/book/en/v2/Git-Internals-Git-References">git references</a> to restrict access to a specific branch or tag.
e.g. <code>repo:[your-org]/[your-repo]:ref:refs/heads/master</code></p>
</blockquote>
<p>The final result will look like this:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Principal"</span>: {
        <span class="hljs-attr">"Federated"</span>: <span class="hljs-string">"arn:aws:iam::1234567890:oidc-provider/token.actions.githubusercontent.com"</span>
      },
      <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"sts:AssumeRoleWithWebIdentity"</span>,
      <span class="hljs-attr">"Condition"</span>: {
        <span class="hljs-attr">"StringEquals"</span>: {
          <span class="hljs-attr">"token.actions.githubusercontent.com:aud"</span>: <span class="hljs-string">"sts.amazonaws.com"</span>
        },
        <span class="hljs-attr">"StringLike"</span>: {
          <span class="hljs-attr">"token.actions.githubusercontent.com:sub"</span>: <span class="hljs-string">"repo:[your-org]/[your-repo]:*"</span>
        }
      }
    }
  ]
}
</code></pre>
<p>This concludes the required configurations on your AWS account. Take note of the role ARN, you'll need it later.</p>
<blockquote>
<p>💡 You can create different roles per account and use a different one for each use case.  For example, one per application, per usage (configurations, deployment, integration tests), etc. You can play with that to reduce the scope of each session even more.</p>
</blockquote>
<h1 id="heading-configure-github-action-workflow">Configure Github action workflow</h1>
<p>Your GitHub workflow requires additional permissions in order to be able to use OIDC. Add the following at the top of your workflow's YAML file. You can also add it at the job level to reduce the scope if needed.</p>
<pre><code class="lang-yml"><span class="hljs-attr">permissions:</span>
  <span class="hljs-attr">id-token:</span> <span class="hljs-string">write</span> <span class="hljs-comment"># required to use OIDC authentication</span>
  <span class="hljs-attr">contents:</span> <span class="hljs-string">read</span> <span class="hljs-comment"># required to checkout the code from the repo</span>
</code></pre>
<p>You can now use the <a target="_blank" href="https://github.com/aws-actions/configure-aws-credentials">configure-aws-credentials</a> GitHub Action in the job that needs to assume the role. Add this step to generate credentials before making any calls to AWS:</p>
<pre><code class="lang-yml"><span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">configure</span> <span class="hljs-string">aws</span> <span class="hljs-string">credentials</span>
  <span class="hljs-attr">uses:</span> <span class="hljs-string">aws-actions/configure-aws-credentials@v1</span>
  <span class="hljs-attr">with:</span>
    <span class="hljs-attr">role-to-assume:</span> <span class="hljs-string">arn:aws:iam::1234567890:role/your-role-arn</span>
    <span class="hljs-attr">role-duration-seconds:</span> <span class="hljs-number">900</span> <span class="hljs-comment"># the ttl of the session, in seconds.</span>
    <span class="hljs-attr">aws-region:</span> <span class="hljs-string">us-east-1</span> <span class="hljs-comment"># use your region here.</span>
<span class="hljs-comment"># You can now execute commands that use the credentials👇</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Serverless</span> <span class="hljs-string">deploy</span>
  <span class="hljs-attr">run:</span> <span class="hljs-string">sls</span> <span class="hljs-string">deploy</span> <span class="hljs-string">--stage</span> <span class="hljs-string">dev</span>
</code></pre>
<p>The <code>configure AWS credentials</code> step will use the OIDC integration to assume the given role, generate <strong>short-lived</strong> credentials, and make them available to the current job.</p>
<blockquote>
<p>💡 If you want to take security even further, you can also keep your role's ARN used in <code>role-to-assume</code> in a Github secret.</p>
</blockquote>
<h1 id="heading-automate">Automate</h1>
<p>The maintainers of <code>configure-aws-credentials</code> shared a <a target="_blank" href="https://github.com/aws-actions/configure-aws-credentials#sample-iam-role-cloudformation-template">CloudFormation template</a> that you can use to automate the AWS configuration steps.</p>
<p>I took it one step further; I <a target="_blank" href="http://githubactions-oidc-cfn.s3.amazonaws.com/template.yml">hosted that template</a> and created a deployment link for you.</p>
<p><a target="_blank" href="https://us-east-1.console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/quickcreate?templateURL=http://githubactions-oidc-cfn.s3.amazonaws.com/template.yml&amp;stackName=GithubActionsOIDC">Click here</a> to deploy it into your account.</p>
<p>Fill in the parameters:</p>
<ul>
<li><code>GitHubOrg</code>: your organization name, or your GitHub username</li>
<li><code>RepositoryName</code>: the repository that needs access to your AWS account</li>
<li><code>OIDCProviderArn</code>: your existing OIDC provider's ARN, if you have one already. If you don't, leave it empty and one will be created for you. (Remember that you only need one per account).</li>
</ul>
<blockquote>
<p>✍️ Note: The created role will not have any Policy attached to it. You will still need to attach the ones that your workflow needs to it after that.</p>
</blockquote>
<h1 id="heading-conclusion">Conclusion</h1>
<p>As you can see, securing your account doesn't have to be hard. The part that might require a little more effort is defining the right Policies if you want to follow the principle of least privilege (which you should!).</p>
<p>For more content like this, follow me here on Hashnode, on Twitter <a target="_blank" href="https://twitter.com/Benoit_Boure">@Benoit_Boure</a>, and don’t forget to subscribe to my newsletter.</p>
]]></content:encoded></item><item><title><![CDATA[How to Observe EventBridge Events with AppSync Subscriptions]]></title><description><![CDATA[I recently came across David Boyne's blog post: How to Observe EventBridge Events with Postman and WebSockets. What a great idea! But, then I thought:

I can do the same with AppSync Subscriptions!

I had to try! Here is what I achieved:
Building the...]]></description><link>https://benoitboure.com/how-to-observe-eventbridge-events-with-appsync-subscriptions</link><guid isPermaLink="true">https://benoitboure.com/how-to-observe-eventbridge-events-with-appsync-subscriptions</guid><category><![CDATA[GraphQL]]></category><category><![CDATA[AWS]]></category><category><![CDATA[debugging]]></category><category><![CDATA[debug]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Sat, 02 Oct 2021 21:09:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1632118721654/M3EirmG7D.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently came across David Boyne's blog post: <a target="_blank" href="https://www.boyney.io/blog/2021-09-06-debug-eventbridge-with-postman">How to Observe EventBridge Events with Postman and WebSockets</a>. What a great idea! But, then I thought:</p>
<blockquote>
<p>I can do the same with AppSync Subscriptions!</p>
</blockquote>
<p>I had to try! Here is what I achieved:</p>
<h1 id="heading-building-the-basic-appsync-api">Building the basic AppSync API</h1>
<p>The idea was simple. I needed the following components:</p>
<ul>
<li><p>An AppSync API</p>
</li>
<li><p>A <code>Mutation</code> that receives events from EventBridge</p>
</li>
<li><p>A <code>Subscription</code> that is attached to the aforementioned Mutation</p>
</li>
<li><p>An EventBridge rule that sends events to the AppSync Mutation (target)</p>
</li>
</ul>
<p>I also wanted to be able to filter events I was interested in. Here, I thought about two options:</p>
<ol>
<li><p>Filter the events in the EventBridge rule.</p>
</li>
<li><p>Send <strong>all</strong> events to AppSync and use AppSync to filter them, thanks to <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/aws-appsync-real-time-data.html#using-subscription-arguments">subscription arguments</a></p>
</li>
</ol>
<p>I went with the second approach. It would give me more flexibility to filter the events at query time instead of having to re-deploy each time I wanted a new filter.</p>
<p>Here is the GraphQL Schema I created:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">type</span> Mutation {
  sendEvent(<span class="hljs-symbol">event:</span> EventBridgeMessageInput!): EventBridgeMessage
}

<span class="hljs-keyword">type</span> Subscription {
  subscribe(
    <span class="hljs-symbol">source:</span> String
    <span class="hljs-symbol">detailType:</span> String
    <span class="hljs-symbol">account:</span> String
    <span class="hljs-symbol">resources:</span> [String!]
  ): EventBridgeMessage
    <span class="hljs-meta">@aws_subscribe</span>(<span class="hljs-symbol">mutations:</span> [<span class="hljs-string">"sendEvent"</span>])
}

<span class="hljs-keyword">type</span> EventBridgeMessage {
  <span class="hljs-symbol">id:</span> ID!
  <span class="hljs-symbol">version:</span> String!
  <span class="hljs-symbol">detailType:</span> String!
  <span class="hljs-symbol">source:</span> String!
  <span class="hljs-symbol">account:</span> String!
  <span class="hljs-symbol">time:</span> AWSDateTime!
  <span class="hljs-symbol">region:</span> String!
  <span class="hljs-symbol">resources:</span> [String!]
  <span class="hljs-symbol">detail:</span> AWSJSON!
}

<span class="hljs-keyword">input</span> EventBridgeMessageInput {
  <span class="hljs-symbol">id:</span> ID!
  <span class="hljs-symbol">version:</span> String!
  <span class="hljs-symbol">detailType:</span> String!
  <span class="hljs-symbol">source:</span> String!
  <span class="hljs-symbol">account:</span> String!
  <span class="hljs-symbol">time:</span> AWSDateTime!
  <span class="hljs-symbol">region:</span> String!
  <span class="hljs-symbol">resources:</span> [String!]
  <span class="hljs-symbol">detail:</span> AWSJSON!
}
</code></pre>
<p>I also needed to set up the Mutation. I used a <code>NONE</code> data source for that, with a simple mapping template that just returns the received payload.</p>
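<p>For reference, a pass-through setup for a <code>NONE</code> data source can be as simple as the following request and response mapping templates (a sketch of what "just returns the received payload" means here):</p>
<pre><code class="lang-json">## Request mapping template: forward the event argument as the payload
{
  "version": "2017-02-28",
  "payload": $util.toJson($context.arguments.event)
}
</code></pre>
<pre><code class="lang-json">## Response mapping template: return the payload untouched
$util.toJson($context.result)
</code></pre>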
<p>All done! Now, by executing the <code>sendEvent</code> Mutation, it gets delivered to the subscription! 🙌</p>
<p>All that was left to do was to configure EventBridge and set the Mutation as a target.</p>
<h1 id="heading-first-attempt-api-destinations">First attempt: API Destinations</h1>
<p>My first attempt was to use API Destinations. I followed this <a target="_blank" href="https://aws.amazon.com/blogs/mobile/appsync-eventbridge/">awesome tutorial</a> and defined my <em>Input Path</em> and <em>Input Transformer</em> rules, which looked like this:</p>
<pre><code class="lang-typescript">InputPathsMap:
  version: $.version
  id: $.id
  detailType: $.detail-<span class="hljs-keyword">type</span>
  source: $.source
  account: $.account
  time: $.time
  region: $.region
  resources: $.resources
  detail: $.detail
InputTemplate: |
  {
    <span class="hljs-string">"query"</span>: <span class="hljs-string">"mutation SendEvent($event: EventInput!) { sendEvent(event: $event) { version id detailType source account time region resources detail } }"</span>,
    <span class="hljs-string">"operationName"</span>: <span class="hljs-string">"SendEvent"</span>,
    <span class="hljs-string">"variables"</span>: {
      <span class="hljs-string">"event"</span>: {
        <span class="hljs-string">"version"</span>: <span class="hljs-string">"&lt;version&gt;"</span>,
        <span class="hljs-string">"id"</span>: <span class="hljs-string">"&lt;id&gt;"</span>,
        <span class="hljs-string">"detailType"</span>: <span class="hljs-string">"&lt;detailType&gt;"</span>,
        <span class="hljs-string">"source"</span>: <span class="hljs-string">"&lt;source&gt;"</span>,
        <span class="hljs-string">"account"</span>: <span class="hljs-string">"&lt;account&gt;"</span>,
        <span class="hljs-string">"time"</span>: <span class="hljs-string">"&lt;time&gt;"</span>,
        <span class="hljs-string">"region"</span>: <span class="hljs-string">"&lt;region&gt;"</span>,
        <span class="hljs-string">"resources"</span>: <span class="hljs-string">"&lt;resources&gt;"</span>,
        <span class="hljs-string">"detail"</span>: &lt;detail&gt;
      }
    }
  }
</code></pre>
<p>Unfortunately, that didn't work! 😞</p>
<p>The problem is that in EventBridge, the <code>detail</code> attribute is an arbitrary JSON object that could have any shape. This is why I used an <code>AWSJSON</code> type in my GraphQL schema (I wanted to receive any event). However, AppSync expects that JSON to be <strong>stringified</strong>!</p>
<p>After some investigation, I could not find any way to make EventBridge stringify a JSON object. So, that was a dead end.</p>
<h1 id="heading-aws-lambda-to-the-rescue">AWS Lambda to the rescue!</h1>
<p>If EventBridge cannot do it, Lambda surely can! So, I wrote a simple Lambda function that receives the event, reformats it, and calls the AppSync endpoint. I then configured the Lambda as an EventBridge target. (<a target="_blank" href="https://github.com/bboure/appsync-eventbridge-subscriber/blob/master/src/processEvent.ts">See the code here</a>).</p>
<blockquote>
<p>✍️ Note: I also added an IAM authentication method to the AppSync API that Lambda can use to call the Mutation (in addition to the API key used by the subscription).</p>
</blockquote>
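<p>The essential part of that Lambda is reshaping the event before sending the Mutation. Here is a minimal, hypothetical sketch of that transformation (the real code is linked above; the function name and types here are my own):</p>
<pre><code class="lang-typescript">// Hypothetical sketch: turn an EventBridge event into AppSync mutation
// variables. The key step is JSON.stringify on `detail`, because the
// AWSJSON scalar expects a stringified JSON value.
interface EventBridgeEvent {
  version: string;
  id: string;
  "detail-type": string;
  source: string;
  account: string;
  time: string;
  region: string;
  resources: string[];
  detail: unknown;
}

export function toMutationVariables(event: EventBridgeEvent) {
  return {
    event: {
      version: event.version,
      id: event.id,
      detailType: event["detail-type"],
      source: event.source,
      account: event.account,
      time: event.time,
      region: event.region,
      resources: event.resources,
      // AppSync's AWSJSON type wants a string, so stringify here
      detail: JSON.stringify(event.detail),
    },
  };
}
</code></pre>
<p>The handler then only has to post this payload, together with the Mutation document, to the AppSync endpoint.</p>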
<p>All set! Now, running the following subscription:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">subscription</span> MySubscription {
  subscribe {
    resources
    region
    source
    version
    detailType
    detail
  }
}
</code></pre>
<p>And sending an event into EventBridge:</p>
<pre><code class="lang-bash">aws events put-events --entries <span class="hljs-string">'[{"DetailType": "my.detail.type", "Source": "my.source", "Detail": "{\"foo\": \"bar\"}"}]'</span>
</code></pre>
<p>Response:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"data"</span>: {
    <span class="hljs-attr">"subscribe"</span>: {
      <span class="hljs-attr">"resources"</span>: [],
      <span class="hljs-attr">"region"</span>: <span class="hljs-string">"us-east-1"</span>,
      <span class="hljs-attr">"source"</span>: <span class="hljs-string">"my.source"</span>,
      <span class="hljs-attr">"version"</span>: <span class="hljs-string">"0"</span>,
      <span class="hljs-attr">"detailType"</span>: <span class="hljs-string">"my.detail.type"</span>,
      <span class="hljs-attr">"detail"</span>: <span class="hljs-string">"{\"foo\":\"bar\"}"</span>
    }
  }
}
</code></pre>
<p>It works! 🎉</p>
<h1 id="heading-the-power-of-appsync-subscriptions">The power of AppSync subscriptions</h1>
<p>One of the great features of AppSync subscriptions is that you can specify which changes you are interested in at query time. You can do that by adding arguments to the subscription endpoint. Whatever values you pass in the input, you will only receive changes that <a target="_blank" href="https://blog.purple-technology.com/lessons-learned-aws-appsync-subscriptions/">match the Mutation's response field values</a>.</p>
<p>So, I can now do queries such as</p>
<pre><code class="lang-graphql"><span class="hljs-comment">## Will match events with detail-type = "my.detail" only</span>
<span class="hljs-keyword">subscription</span> {
  subscribe(<span class="hljs-symbol">detailType:</span> <span class="hljs-string">"my.detail"</span>) {
    id
    detailType
    detail
  }
}

<span class="hljs-comment">## Will match events with source = "my.source" only</span>
<span class="hljs-keyword">subscription</span> {
  subscribe(<span class="hljs-symbol">source:</span> <span class="hljs-string">"my.source"</span>) {
    id
    detailType
    detail
  }
}

<span class="hljs-comment">## Will match events with detail-type = "my.detail" AND source = "my.source"</span>
<span class="hljs-keyword">subscription</span> {
  subscribe(<span class="hljs-symbol">detailType:</span> <span class="hljs-string">"my.detail"</span>, <span class="hljs-symbol">source:</span> <span class="hljs-string">"my.source"</span>) {
    id
    detailType
    detail
  }
}
</code></pre>
<p>Isn't that great? I can now listen to exactly the events I am interested in 🔥</p>
<h1 id="heading-limitations-andamp-gotchas">Limitations &amp; gotchas</h1>
<p>Unfortunately, this technique has some limitations. It <strong>cannot</strong> filter events based on the content of the <code>detail</code> field, because that data arrives stringified.</p>
<p>Also, filters only work when the values match <strong>exactly</strong>. You cannot use advanced filters such as <code>prefix</code>, <code>anything-but</code>, etc. These filters are supported by EventBridge, <strong>not</strong> by AppSync.</p>
<blockquote>
<p>Note that any advanced filter can still be achieved through filters at the EventBridge rule level, of course!</p>
</blockquote>
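<p>For instance, a rule's event pattern can apply a prefix filter before the event ever reaches AppSync. A minimal, illustrative pattern (the values are made up for this example):</p>
<pre><code class="lang-json">{
  "source": [{ "prefix": "my." }],
  "detail-type": ["my.detail.type"]
}
</code></pre>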
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this post, I showed you how we can observe EventBridge events through AppSync subscriptions and how we can even filter them at query time. Although its usage is somewhat limited, it can probably still be very helpful when you only need to filter on the <code>detailType</code> or <code>source</code> values, for example. You can easily use it to debug/test your application.</p>
<p>Find the full code of this implementation <a target="_blank" href="https://github.com/bboure/appsync-eventbridge-subscriber/">on Github</a></p>
<p>A big thanks to <a target="_blank" href="https://twitter.com/boyney123">David Boyne</a> for the inspiration!</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How to Avoid Composite IDs in GraphQL with DynamoDB (feat. AppSync)]]></title><description><![CDATA[In this article, I will discuss a few tricks on how to optimize your GraphQL API for items that use composite keys in DynamoDB. It will work no matter the GraphQL server, but if you're using AppSync, you're in luck because I'll share a few (VTL) code...]]></description><link>https://benoitboure.com/how-to-avoid-composite-ids-in-graphql-with-dynamodb-feat-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-to-avoid-composite-ids-in-graphql-with-dynamodb-feat-appsync</guid><category><![CDATA[serverless]]></category><category><![CDATA[GraphQL]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Tue, 14 Sep 2021 20:29:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1631555987441/5vUGjj_w8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, I will discuss a few tricks on how to optimize your GraphQL API for items that use composite keys in DynamoDB. It will work no matter the GraphQL server, but if you're using AppSync, you're in luck because I'll share a few (VTL) code snippets too 🙂</p>
<p>In DynamoDB, it is very common to use composite keys (ie: Partition Key and Sort Key). This allows us to group related items together. Moreover, the combination of the partition key (PK) and sort key (SK) is what uniquely identifies the Item.</p>
<p>To illustrate this, let's take the following simple example. Imagine we have a DynamoDB <code>states</code> table that contains states from different countries. We might structure our data like so:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1631435725822/O5sjcK87O.png" alt="image.png" /></p>
<p>Here, the PK identifies the country code, and the SK the state code. Together, they uniquely identify one Item in the database (ie: a state in a given country) and enforce its uniqueness at the same time. Additionally, this gives us some free access patterns (eg: <em>Find all states for a given country</em>).</p>
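<p>For example, the <em>all states for a given country</em> access pattern is a single Query on the partition key. A sketch with the AWS CLI (assuming the table is named <code>states</code>):</p>
<pre><code class="lang-bash">aws dynamodb query \
  --table-name states \
  --key-condition-expression "countryCode = :c" \
  --expression-attribute-values '{":c": {"S": "US"}}'
</code></pre>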
<p>Now, imagine that we want to serve the Items from a GraphQL endpoint. The query might look like this:</p>
<pre><code class="lang-graphql">getState(<span class="hljs-symbol">countryCode:</span> <span class="hljs-string">"US"</span>, <span class="hljs-symbol">stateCode:</span> <span class="hljs-string">"TX"</span>) {
  name
}
</code></pre>
<p>This works well, but it has several drawbacks:</p>
<p><strong>This is not practical</strong></p>
<p>The client has to pass two arguments in order to identify which item it wants to query. Understanding which fields must be used (eg: from other queries) might not be as straightforward as it seems. Also, the frontend often needs a unique key to distinguish items/components from each other (<a target="_blank" href="https://reactjs.org/docs/lists-and-keys.html#keys">think "key" attribute in React</a>), forcing it to compute it every time.</p>
<p><strong>The client should not have to worry about the underlying data structure</strong></p>
<p>In an ideal world, the client should not have to worry about how the data is being stored. By exposing a composite id in our API, we reveal how the data is organized in the data layer and make the client depend on it.</p>
<p><strong>In Front end applications, the client cache functionality might not work out of the box</strong></p>
<p>Most GraphQL clients, like Apollo, offer a solid and powerful <a target="_blank" href="https://www.apollographql.com/docs/react/caching/overview/">cache functionality</a>. However, by default, the <code>id</code> field (with an <code>ID</code> type) is what they usually use to uniquely identify the Item in the (cache) datastore. In the above example, there isn't one (neither in the request nor in the response). The client does not know that the <code>countryCode</code>/<code>stateCode</code> combination is what uniquely identifies a State. As a result, the item would never be cached.</p>
<p>Sure, we can always <a target="_blank" href="https://www.apollographql.com/docs/react/caching/cache-configuration/#customizing-cache-ids">customize the cache ids</a>, but we would have to do it for every Item type and in every client (ie: web, mobile, etc).</p>
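<p>For reference, here is what that per-type customization looks like with Apollo Client 3 (a sketch of the configuration the linked docs describe, not code from this project):</p>
<pre><code class="lang-typescript">import { InMemoryCache } from "@apollo/client";

// Tell the cache that a State is uniquely identified by the
// countryCode/stateCode pair instead of a default id field.
const cache = new InMemoryCache({
  typePolicies: {
    State: {
      keyFields: ["countryCode", "stateCode"],
    },
  },
});
</code></pre>
<p>And this would have to be repeated in every client application.</p>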
<h2 id="heading-the-solution-denormalizing-a-unique-id">The solution: Denormalizing a unique id</h2>
<p>Wouldn't it be nice if we could have a unique <code>id</code> field for our <code>State</code> items? As mentioned earlier, every State is a unique combination of the country code and the state code. In this case, we could even use the <a target="_blank" href="https://www.iso.org/obp/ui/#iso:code:3166:US">iso code</a> of each state for that. For example, Texas' id can be <code>US-TX</code>.</p>
<p>Let's add an <code>id</code> attribute to our data model.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1631433772445/rGtR-1MeYQ.png" alt="image.png" /></p>
<p>Now, all we have to do is to denormalize the <code>id</code> by concatenating the country and state codes. Doing so at creation time will avoid us having to generate it on the fly in every query (Plus, it's always nice to receive a pre-computed <code>id</code> field everywhere, even in the backend, for future uses). We can easily do that when saving the item in DynamoDB.</p>
<p>Example using AppSync VTL</p>
<pre><code class="lang-typescript">#set($countryCode=$ctx.args.input.countryCode)
#set($stateCode=$ctx.args.input.stateCode)
#set($attributeValues={})
$util.qr($attributeValues.put(<span class="hljs-string">"id"</span>, $util.dynamodb.toDynamoDB(<span class="hljs-string">"${countryCode}-${stateCode}"</span>)))
#foreach($item <span class="hljs-keyword">in</span> $ctx.args.input.entrySet())
  $util.qr($attributeValues.put(<span class="hljs-string">"${item.key}"</span>, $util.dynamodb.toDynamoDB($item.value)))
#end
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"PutItem"</span>,
  <span class="hljs-string">"key"</span>: {
    <span class="hljs-string">"countryCode"</span>: $util.dynamodb.toDynamoDBJson($countryCode),
    <span class="hljs-string">"stateCode"</span>: $util.dynamodb.toDynamoDBJson($stateCode)
  },
  <span class="hljs-string">"attributeValues"</span>: $util.toJson($attributeValues)
}
</code></pre>
<p>Awesome! But now, how do we fetch data from GraphQL? Let's update the query and use a unique <code>id</code> parameter with an <code>ID!</code> type.</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">type</span> Query {
  getState(<span class="hljs-symbol">id:</span> ID!): State!
}

<span class="hljs-keyword">type</span> State {
  <span class="hljs-symbol">id:</span> ID!
  <span class="hljs-symbol">countryCode:</span> String!
  <span class="hljs-symbol">stateCode:</span> String!
  <span class="hljs-symbol">name:</span> String!
}
</code></pre>
<p>Great! Now, the backend receives a unique argument. However, DynamoDB still requires us to pass a <code>countryCode</code> (PK) and <code>stateCode</code> (SK) composite key. This will require some additional gymnastics at the resolver level. This is pretty straightforward, though. All we have to do is to split the <code>id</code> argument by '<code>-</code>'. You can do that in your favourite language depending on your use case. If you are using AppSync, here is how you can easily do that in VTL.</p>
<pre><code class="lang-typescript">#<span class="hljs-keyword">if</span>(!$ctx.args.id.contains(<span class="hljs-string">"-"</span>))
  ## Invalid iso code
  $util.error(<span class="hljs-string">"Invalid Id"</span>, <span class="hljs-string">"InputError"</span>)
#end
#set($parts=$ctx.args.id.split(<span class="hljs-string">"-"</span>))
#set($countryCode=$parts.get(<span class="hljs-number">0</span>))
#set($stateCode=$parts.get(<span class="hljs-number">1</span>))
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"GetItem"</span>,
  <span class="hljs-string">"key"</span>: {
    <span class="hljs-string">"countryCode"</span>: $util.dynamodb.toStringJson($countryCode),
    <span class="hljs-string">"stateCode"</span>: $util.dynamodb.toStringJson($stateCode)
  }
}
</code></pre>
<p>As you can see, this requires very little logic to implement and it solves all our issues. And it's <strong>completely transparent to the client</strong>. 🙌</p>
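<p>If you are not using AppSync, the same logic is trivial in any language. A TypeScript sketch (the helper name is mine):</p>
<pre><code class="lang-typescript">// Derive the DynamoDB composite key from the public id, e.g. "US-TX".
export function keyFromId(id: string) {
  const separatorIndex = id.indexOf("-");
  if (separatorIndex === -1) {
    // Mirrors the "Invalid Id" error raised in the VTL resolver above
    throw new Error("Invalid Id");
  }
  return {
    countryCode: id.slice(0, separatorIndex),
    stateCode: id.slice(separatorIndex + 1),
  };
}
</code></pre>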
<p>Here is what the new query looks like:</p>
<pre><code class="lang-graphql">getState(<span class="hljs-symbol">id:</span> <span class="hljs-string">"US-TX"</span>) {
  id
  name
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this post, I showed you how to handle composite DynamoDB keys with GraphQL by hiding them from the client behind a unique attribute. By denormalizing this attribute in DynamoDB and implementing some simple logic in the resolvers, you can save yourself from the more annoying issues we identified earlier.</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How to Handle Many to Many relations in AppSync]]></title><description><![CDATA[In this post, I will teach you how you can handle many-to-many relations with AWS AppSync, how to avoid denormalization and still avoid the n+1 problem.

TL;DR; Use a Pipeline resolver to first fetch the relations followed by a BatchGetItem operation...]]></description><link>https://benoitboure.com/how-to-handle-many-to-many-relations-in-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-to-handle-many-to-many-relations-in-appsync</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><category><![CDATA[GraphQL]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Sun, 16 May 2021 20:07:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1621195818735/8fL3XyM0_.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, I will teach you how you can handle many-to-many relations with AWS AppSync, how to avoid denormalization and still avoid the n+1 problem.</p>
<blockquote>
<p>TL;DR: Use a pipeline resolver to first fetch the relations, followed by a BatchGetItem operation to retrieve all related items in a single query.</p>
<p>Find the full solution <a target="_blank" href="https://github.com/bboure/appsync-n-plus-one">on GitHub</a></p>
</blockquote>
<h1 id="heading-the-problem">The problem</h1>
<p>One of the most common problems developers face when designing DynamoDB databases is many-to-many relationships. Usually, the recommended way is to <a target="_blank" href="https://aws.amazon.com/blogs/database/should-your-dynamodb-table-be-normalized-or-denormalized/">denormalize your data</a>. You duplicate all the fields required by your access pattern in the relation Item so that they are returned along with it. This avoids extra queries for the related items, as NoSQL databases cannot perform <em>JOIN</em> operations.</p>
<p>Let's take an example. Imagine you are building an application that has users and groups. Users can be in several groups and groups may have multiple users.</p>
<p>Your data model might look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1620843791843/vrnalEcCN.png" alt="image.png" /></p>
<p>With GraphQL APIs, this design has two problems:</p>
<p><strong>1) The client might ask for fields that are not denormalized in the relation.</strong></p>
<p>Since GraphQL is agnostic of the underlying data source, and the types defined in the schema declare all the fields (not just those that are denormalized), a query might request any of them. In our example, it might be the user's bio or profile picture. If these fields are not denormalized in the relation, they will be missing from the GraphQL response.</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">query</span> GetGroupUsers($id: ID!) {
  getGroupUsers(<span class="hljs-symbol">id:</span> $id) {
    id
    name
    <span class="hljs-comment"># bio and picture are not denormalized in the relation</span>
    bio
    picture
  }
}
</code></pre>
<p>One approach to fix this would be to create a different type which is a subset of <code>User</code>. However, this defeats the purpose of GraphQL and might also not be what you want.</p>
<p><strong>2) It is hard to keep the data up to date when it changes.</strong></p>
<p>What if the user changes his username (think Twitter)? You will have to go through all the relation Items and update them. If the number of items is small, it can be manageable, but imagine a group that has thousands or millions of users! This can become a hassle to maintain and data can easily become out of sync.</p>
<p>Also, as explained before, with GraphQL in mind, you might end up having to denormalize the whole user item. This would not be a viable solution.</p>
<h1 id="heading-resolvers-to-the-rescue">Resolvers to the rescue</h1>
<p>One of the characteristics of GraphQL is <em>resolvers</em>. Resolvers are used to resolve child entities using data from the previously resolved ones (the <em>source</em> in AppSync).</p>
<p>One of the common approaches to solve the above problems would be to use a different resolver for the child entity (in our case: <code>user</code>). The implementation is pretty straightforward: first, resolve the relations, and then use them to resolve the underlying users (using the user id they contain).</p>
<p>For that to work, you would need to nest the user entity under the relation entity. This might not be a bad thing anyway, because you might want to return some metadata related to the relationship as well, such as a <code>joinedAt</code> attribute.</p>
<p>Example:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">query</span> GetGroupUsers($id: ID!) {
  getGroupUsers(<span class="hljs-symbol">id:</span> $id) {
    joinedAt
    user {
      id
      name
      bio
      picture
    }  
  }
}
</code></pre>
<blockquote>
<p>user is attached to a resolver that receives the user id from the group-user relation.</p>
</blockquote>
<p>There is one problem with this approach, though: it introduces an n+1 problem, i.e. every child entity triggers one extra query to DynamoDB. If a group has 10 users, you end up executing 11 queries (one for all the relations, plus one for each individual user).</p>
<h1 id="heading-a-better-approach-pipeline-andamp-batch-resolvers">A better approach: Pipeline &amp; Batch resolvers</h1>
<p>Pipelines allow you to compose a resolver out of different steps or <em>functions</em>. If you are not familiar with pipelines yet, I suggest you read <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/pipeline-resolvers.html">the documentation</a></p>
<p>AppSync also supports <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-dynamodb-batch.html">DynamoDB Batch resolvers</a> which you can use to act on several items in one single DynamoDB round-trip. There are three supported operations: <code>BatchGetItem</code>, <code>BatchPutItem</code> , and <code>BatchDeleteItem</code>.</p>
<p>The one we are interested in here is <code>BatchGetItem</code>. It can be used in order to retrieve up to <strong>100 DynamoDB items</strong> in one single DynamoDB request.</p>
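<p>Outside of AppSync, the same operation is available directly from the AWS SDK. A sketch with the JavaScript SDK v3 (table and key attribute names here are assumptions matching the single-table layout above):</p>
<pre><code class="lang-typescript">import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { BatchGetCommand, DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export async function batchGetUsers(tableName: string, userSortKeys: string[]) {
  // BatchGetItem accepts at most 100 keys per request
  const keys = userSortKeys.slice(0, 100).map(function (sk) {
    return { PK: sk, SK: sk };
  });
  const result = await client.send(
    new BatchGetCommand({
      RequestItems: { [tableName]: { Keys: keys } },
    })
  );
  return result.Responses ? result.Responses[tableName] : [];
}
</code></pre>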
<p>With all these elements in hand, we can implement a pipeline resolver with two functions:</p>
<ol>
<li><p>fetch the group-user relation items</p>
</li>
<li><p>fetch all the underlying user entities in one single query</p>
</li>
</ol>
<p>Let's see how that works and build the <code>getGroupUser</code> endpoint.</p>
<blockquote>
<p>The full solution is available <a target="_blank" href="https://github.com/bboure/appsync-n-plus-one">on GitHub</a></p>
</blockquote>
<p>In the <code>getGroupUsers</code> function (the first function of the pipeline), we first fetch the relation items between the group and the users. We also make sure not to go over the limit of 100 items imposed by <code>BatchGetItem</code>; beyond that, we'll need to paginate (more on that later).</p>
<pre><code class="lang-typescript">## getGroupUsers - request mapping
#set($limit=$util.defaultIfNull($ctx.args.limit, <span class="hljs-number">10</span>))
#<span class="hljs-keyword">if</span>($limit&gt;<span class="hljs-number">100</span>)
  #set($limit=<span class="hljs-number">100</span>)
#end
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"Query"</span>,
  <span class="hljs-string">"limit"</span>: $util.toJson($limit),
  <span class="hljs-string">"nextToken"</span>: $util.toJson($ctx.args.nextToken),
  <span class="hljs-string">"query"</span> : {
    <span class="hljs-string">"expression"</span>: <span class="hljs-string">"#PK = :PK and begins_with(#SK, :SK)"</span>,
    <span class="hljs-string">"expressionNames"</span> : {
      <span class="hljs-string">"#PK"</span>: <span class="hljs-string">"PK"</span>,
      <span class="hljs-string">"#SK"</span>: <span class="hljs-string">"SK"</span>
    },
    <span class="hljs-string">"expressionValues"</span> : {
      <span class="hljs-string">":PK"</span>: $util.dynamodb.toStringJson(<span class="hljs-string">"GROUP#${ctx.args.id}"</span>),
      <span class="hljs-string">":SK"</span>: $util.dynamodb.toStringJson(<span class="hljs-string">"USER#"</span>)
    }
  }
}
</code></pre>
<p>The response mapping just forwards the items to the next function. We also keep <code>nextToken</code> into the stash in order to return it later to the client for pagination.</p>
<pre><code class="lang-typescript">## getGroupUsers - response mapping
$util.qr($ctx.stash.put(<span class="hljs-string">"nextToken"</span>, $ctx.result.nextToken))
$util.toJson($ctx.result.items)
</code></pre>
<p>The <code>getBatchUsers</code> function is where the magic happens. We build the Primary Key pairs (PK and SK) of our user items and pass them to the <code>BatchGetItem</code> operation.</p>
<p>Before that, if the previous request returned no results, we return an empty array straight away, bypassing the extra query to DynamoDB.</p>
<pre><code class="lang-typescript">## getBatchUsers - request mapping
#if($ctx.prev.result.size() == 0)
    #return([])
#end
#set($keys=[])
#foreach($item in $ctx.prev.result)
  ## the user's PK and SK are both the SK from the item returned by the previous function
  $util.qr($keys.add({
    "PK": $util.dynamodb.toDynamoDB(${item.SK}),
    "SK": $util.dynamodb.toDynamoDB(${item.SK})
  }))
#end
{
    "version": "2018-05-29",
    "operation": "BatchGetItem",
    "tables" : {
        ## replace this with your table's name
        "table-name": {
            "keys": $util.toJson($keys)
        }
    }
}
</code></pre>
<p>Once we get to our response mapping, we have to restructure our data a bit and inject the user entities into the relation items returned by the previous pipeline function.</p>
<pre><code class="lang-typescript">## getBatchUsers - response mapping
#set($items=[])
#foreach($item <span class="hljs-keyword">in</span> $ctx.result.data.get(<span class="hljs-string">"table-name"</span>))
  #set($groupUser=$ctx.prev.result.get($foreach.index))
  $util.qr($groupUser.put(<span class="hljs-string">"user"</span>, $item))
  $util.qr($items.add($groupUser))
#end
$util.toJson($items)
</code></pre>
<p>Finally, in our <em>after mapping</em>, we return the data we previously aggregated and we also send the <code>nextToken</code> back to the client to allow for pagination.</p>
<pre><code class="lang-typescript">## getUsers - after mapping
$util.toJson({
  <span class="hljs-string">"nextToken"</span>: $ctx.stash.nextToken,
  <span class="hljs-string">"items"</span>: $ctx.result
})
</code></pre>
<p>Here you have it! Now, no matter how many users the group has, you will only send two requests to DynamoDB!</p>
<blockquote>
<p>💡 Did you know?</p>
<p>In DynamoDB, <code>BatchGetItem</code> <a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html">does not guarantee to return the items in any particular order</a>. However, AppSync does the heavy lifting for you and returns them in the same order as the keys. You, therefore, don't need to worry about it. 🙌</p>
</blockquote>
<p>There is one important caveat, though:</p>
<p><code>BatchGetItem</code> has <strong>zero impact</strong> on your AWS bill. Fetching 100 items in a batch consumes exactly the same RCUs as 100 individual <code>GetItem</code> requests. The only difference is that it reduces the HTTP overhead and slightly improves latency.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In this post, we learned how to reduce the number of DynamoDB requests in many-to-many relationships with AppSync using pipeline resolvers and fetching items in batch from DynamoDB.</p>
<p>If you are interested in AppSync, I regularly share content related to it on <a target="_blank" href="https://twitter.com/Benoit_Boure">Twitter</a> and on this blog, so make sure to follow me and subscribe to my newsletter.</p>
<p>If you have any question, feel free to drop them in the comment section, and if you would like to receive advice or coaching from me about AppSync or Serverless, you can <a target="_blank" href="https://www.hiretheauthor.com/bboure">book a 1:1 conference or chat with me</a></p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[Understanding the DynamoDB Sort Key Order]]></title><description><![CDATA[If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type Number the ...]]></description><link>https://benoitboure.com/understanding-the-dynamodb-sort-key-order</link><guid isPermaLink="true">https://benoitboure.com/understanding-the-dynamodb-sort-key-order</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Sun, 02 May 2021 16:33:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1619970961289/EfDzM3pZ8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type <code>Number</code> the items will be sorted in numeric order (1, 3, 10, 50, 400), while if it's of type <code>String</code> they are sorted in "order of UTF-8 bytes". But what does that even mean and how does it affect the order of the items? Let's find out.</p>
<h3 id="what-is-is">What it is</h3>
<p>As per Wikipedia,</p>
<blockquote>
<p>UTF-8 is a variable-width character encoding used for electronic communication [...] UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. </p>
</blockquote>
<p>What this means is that each character is assigned a specific numerical value, or <a target="_blank" href="https://en.wikipedia.org/wiki/Code_point">code point</a>, which is encoded as one or more bytes.</p>
<p>It's that numerical value that DynamoDB uses to determine the order of your Sort Key.</p>
<p>If you want to know what character comes after which, a good start is to remember the order of the most common characters:</p>
<ol>
<li>numbers</li>
<li>uppercase letters</li>
<li>lowercase letters</li>
</ol>
<p>Notice that letters are "grouped" by case (first the capital letters, then the lowercase ones), which means that <code>Zoey</code> will come <strong>before</strong> <code>alligator</code>. This is really important to know if you want to avoid surprises later.</p>
<p>For a complete list of UTF-8 characters, including special ones, sorted by their bytes order, refer to <a target="_blank" href="https://www.fileformat.info/info/charset/UTF-8/list.htm">this page</a>.</p>
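<p>You can see this ordering directly in JavaScript, whose default string sort compares code units the same way for ASCII input:</p>
<pre><code class="lang-typescript">const words = ["alligator", "Zoey", "42"];
// Default sort: digits first, then uppercase, then lowercase
const sorted = [...words].sort();
// sorted is ["42", "Zoey", "alligator"]
</code></pre>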
<h3 id="how-to-use-it-to-your-advantage">How to use it to your advantage</h3>
<p>Once you know how strings are being sorted, you can use that knowledge to your advantage. A very common practice with DynamoDB and single table design is to pre-join data by placing them into the same partition.</p>
<p>Depending on your access pattern, you might either want your parent item to be at the beginning, or at the end of the partition. </p>
<p>For example, if you have Orders (SK prefixed with <code>ORDER#</code>) and Order Items (SK prefixed with <code>ITEM#</code>), you will perhaps want the <code>ORDER#</code> item at the beginning of the partition, and the <code>ITEM#</code> items sorted by number, in ascending order after it. However, <code>O</code> comes after <code>I</code>, which means that <code>ORDER#</code> will be placed at the end of the partition.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619969551474/cGt0CDmam.png" alt="image.png" /></p>
<p>You could scan the index backwards, sure, but then your items would be sorted in reversed order, breaking the access pattern.</p>
<p>How to fix that? </p>
<p>Use the UTF-8 sorting mechanism to your advantage! You need <code>ORDER#</code> to start with a character that comes before <code>I</code> so that it sorts before <code>ITEM#</code>. Any letter from A to H would work, but it might not be user-friendly (for debugging and inspecting the data later) and could change the meaning of your prefix. Instead, it is most common to use special characters that come before <code>A</code>: for example <code>#</code>, <code>$</code> or <code>%</code>. Let's use <code>$</code> and rename <code>ORDER#</code> to <code>$ORDER#</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619969944675/gvPbTZBRs.png" alt="image.png" /></p>
<p>Now the <code>Order</code> item is at the top of the partition and all items are sorted as expected. 🎉</p>
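<p>This behaviour is easy to reproduce locally. The sketch below (plain JavaScript, with hypothetical sort keys) relies on the fact that comparing ASCII-only strings in JavaScript matches the UTF-8 byte order DynamoDB uses:</p>

```javascript
// Hypothetical sort keys: '$' (0x24) sorts before 'I' (0x49),
// so the parent item lands at the top of the partition.
const sortKeys = ['ITEM#02', '$ORDER#', 'ITEM#01'];

// For ASCII-only strings, JavaScript's default string comparison
// matches the UTF-8 byte order used by DynamoDB.
sortKeys.sort();

console.log(sortKeys); // [ '$ORDER#', 'ITEM#01', 'ITEM#02' ]
```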
<p>Note that you can use the same trick to sort items in reverse order if necessary.</p>
<p>Example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619971880594/zceefIJYY.png" alt="image.png" /></p>
<p>In the above example, you might want to scan the index in reverse order and get the latest vouchers at the top, in descending order. You can force the <code>USER#</code> item to be at the end of the partition by prefixing it with a character that is higher in the UTF-8 ranking. <code>|</code> or <code>~</code> are good examples.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1619971918219/pwJWoWk4i.png" alt="image.png" /></p>
<p>Now, <code>~USER#</code> is at the end of the partition and you can scan the index backwards.</p>
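<p>The same comparison works in the other direction. In this sketch (hypothetical keys again), <code>~</code> (0x7E) comes after every uppercase letter, which pushes the parent item to the bottom of the partition:</p>

```javascript
// '~' (0x7E) sorts after 'V' (0x56), so the parent item ends up
// last. Scanning the index backwards returns it first, followed
// by the vouchers in descending order.
const sortKeys = ['VOUCHER#2021-01', '~USER#', 'VOUCHER#2020-12'];
sortKeys.sort();

console.log(sortKeys); // [ 'VOUCHER#2020-12', 'VOUCHER#2021-01', '~USER#' ]
```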
<h1 id="conclusion">Conclusion</h1>
<p>In this article, you learned what the UTF-8 byte order is, and how you can use it to your advantage to force the order of items in your DynamoDB partitions.</p>
<p>If you would like to read more content like this, <a target="_blank" href="https://twitter.com/Benoit_Boure">follow me on Twitter</a> and subscribe to my newsletter on Hashnode.</p>
<hr />
<p>Photo credits: Markus Spiske on <a target="_blank" href="https://unsplash.com/photos/iar-afB0QQw">unsplash</a></p>
]]></content:encoded></item><item><title><![CDATA[5 Ways to Prevent Accidentally Deleting Your CloudFormation Resources]]></title><description><![CDATA[CloudFormation is an AWS service that allows you to maintain Infrastructure as Code (IaC). Whether you are using it natively (with JSON or YML) or through a third-party service such as the Serverless Framework, AWS CDK or SAM, it is a great way to ma...]]></description><link>https://benoitboure.com/5-ways-to-prevent-accidentally-deleting-your-cloudformation-resources</link><guid isPermaLink="true">https://benoitboure.com/5-ways-to-prevent-accidentally-deleting-your-cloudformation-resources</guid><category><![CDATA[AWS]]></category><category><![CDATA[Security]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[deployment automation]]></category><category><![CDATA[automation]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Fri, 19 Mar 2021 19:05:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1616359152613/bDOALjvss.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>CloudFormation is an AWS service that allows you to maintain Infrastructure as Code (IaC). Whether you are using it natively (with JSON or YML) or through a third-party service such as the <a target="_blank" href="https://www.serverless.com/">Serverless Framework</a>, <a target="_blank" href="https://docs.aws.amazon.com/cdk/latest/guide/home.html">AWS CDK</a> or <a target="_blank" href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html">SAM</a>, it is a great way to make your infrastructure reproducible across various stages. It also makes the deployment process easily automatable through CI/CD pipelines. In other words, it makes managing your infrastructure less prone to human errors.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616182974779/rrsobEdIu.png" alt="Kill all humans" /></p>
<p>Although automating things sounds like a good idea, one of the downsides of CloudFormation is that it is hard to understand what is going on under the hood and what exactly is going to happen to your stack during the process, turning every single deployment into a potential <a target="_blank" href="https://www.youtube.com/watch?v=Ki_Af_o9Q9s">7 minutes of terror</a> story. Imagine that an entire resource gets deleted and all its data with it. Make just one mistake and you will only find out when it's too late. What if it is a production database? <a target="_blank" href="https://www.google.com/search?q=accidentally+deleted+production+database">It happens more than you think</a>. 😱😱😱</p>
<p>To avoid this kind of disaster, I will show you <strong>5 ways to protect your resources from deletion with CloudFormation</strong>.</p>
<hr />
<h1 id="1-review-the-changeset">1. Review the Changeset</h1>
<p>The first technique is to understand which actions will effectively be executed during the update <strong>before</strong> they happen. CloudFormation offers a tool that lets you preview all the modifications that a change in your template would apply.</p>
<p>To use it, follow these simple steps:</p>
<ol>
<li>Go to your CloudFormation console and select the stack that you want to update.</li>
<li>Click the <em>Stack actions</em> button and then select <em>Create change set for current stack</em>.</li>
<li>Choose <em>Replace current template</em> and upload your new template, or enter an S3 path to the file.</li>
<li>From there, just follow the guide in order to create the changeset.</li>
</ol>
<p>It might take a few seconds for the changeset to be generated. Once it is done, the console will show you a detailed summary of what actions would be executed if you decided to proceed with the update.</p>
<p>Example: </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616267715004/Hqz2LaPyY.png" alt="image.png" /></p>
<p>As you can see, you can easily spot which resources will be Modified or Removed, and whether they require replacement. Once you are confident that this is what you intend to do, you can hit the <em>Execute</em> button with a certain peace of mind 🧘</p>
<p>This method is useful when you want to visually confirm a change that you are unsure about. However, it is not always convenient. Let's explore other solutions.</p>
<h1 id="2-retain-specific-resources">2. Retain Specific Resources</h1>
<p>With the <code>DeletionPolicy</code> attribute, you can control what CloudFormation should do with a resource in the event of it being removed from the template, or if the stack is deleted altogether. The default value is <code>Delete</code> which is probably not what you want in some cases. By changing the value to <code>Retain</code>, you are telling CloudFormation to keep the resource instead.</p>
<p>Example:</p>
<pre><code class="lang-yml"><span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">MyTable:</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::DynamoDB::Table</span>
    <span class="hljs-attr">DeletionPolicy:</span> <span class="hljs-string">Retain</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">TableName:</span> <span class="hljs-string">mytable</span>
</code></pre>
<p>One thing to notice here is that this method will <strong>not</strong> make your deployment fail. CloudFormation will execute all your changes. The difference is that any instruction to delete a resource with a <code>Retain</code> policy will be ignored and the resource will be "detached" from the stack instead. This also means that if you try to add the resource back to the stack, any subsequent deployment might fail because CloudFormation will try to re-create a resource that already exists (e.g. the DynamoDB table already exists with that name). If that happens, you can check this guide for <a target="_blank" href="https://aws.amazon.com/blogs/aws/new-import-existing-resources-into-a-cloudformation-stack/">Importing Existing Resources into a CloudFormation Stack</a>.</p>
<p>⚠️ Attention!</p>
<blockquote>
<p>This capability doesn't apply to resources whose physical instance is replaced during stack update operations. For example, if you edit a resource's properties such that CloudFormation replaces that resource during a stack update.</p>
</blockquote>
<p>This extract from the official documentation is <strong>very</strong> important. What it means is that if you change a property of a resource that <em>requires replacement</em> (e.g.: changing a DynamoDB table's name), the deletion policy <strong>will not apply</strong>, and it would still be <strong>deleted</strong> and re-created. Before you change a property, you should pay attention to the <em>Update requires</em> section of the CloudFormation documentation for that resource's attribute.</p>
<p>With certain types of resources, like EC2 volumes or RDS instances, you can also use <code>Snapshot</code>. In that case, the resource would still be deleted, but a backup would be taken first.
You can read more about this strategy in <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-deletionpolicy.html">the official documentation</a>.</p>
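<p>With those resource types, you simply swap the policy value. Here is a sketch for a hypothetical RDS instance (the resource name and properties are placeholders):</p>

```yml
Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    # A snapshot is taken before the instance is deleted
    DeletionPolicy: Snapshot
    Properties:
      DBInstanceClass: db.t3.micro
      # ... other instance properties ...
```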
<h1 id="3-define-a-stack-policy">3. Define a Stack Policy</h1>
<p>A more advanced way of protecting your resources is through Stack Policies. With Stack Policies, you can constrain which actions are allowed to be executed according to specific rules that you define. When you add a policy, <em>all</em> resources are protected by default. You need to explicitly <code>Allow</code> the changes on the resources that you want to update. You can think of it as an IAM policy, but the difference here is that it only applies during stack updates.</p>
<p>Example:</p>
<p>The following policy allows any change on all resources, except for the resource whose id is <code>MyDynamoDBTable</code>. By explicitly denying <code>Update:Delete</code> and <code>Update:Replace</code>, the resource is protected against deletion <strong>and</strong> replacement. On the other hand, modifications are still allowed (e.g.: Add a Global Secondary Index).</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"Statement"</span>: [
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
      <span class="hljs-attr">"Action"</span>: [<span class="hljs-string">"Update:*"</span>],
      <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"*"</span>
    },
    {
      <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Deny"</span>,
      <span class="hljs-attr">"Action"</span>: [<span class="hljs-string">"Update:Delete"</span>, <span class="hljs-string">"Update:Replace"</span>],
      <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
      <span class="hljs-attr">"Resource"</span>: [<span class="hljs-string">"LogicalResourceId/MyDynamoDBTable"</span>]
    }
  ]
}
</code></pre>
<p>To learn how to write custom stack policies, refer to the <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/protect-stack-resources.html">documentation</a></p>
<h1 id="4-enable-stack-termination-protection">4. Enable Stack Termination Protection</h1>
<p>If all you worry about is someone (or a process) tearing down a whole stack by mistake, what you need is <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-protect-stacks.html">Stack termination protection</a>. When enabled, CloudFormation will reject any attempt to delete the stack.</p>
<p>To enable termination protection:</p>
<ol>
<li>Go to CloudFormation and select the stack that you want to protect. </li>
<li>Choose <em>Stack actions</em> followed by <em>Edit termination protection</em>.</li>
<li>Choose <em>Enabled</em> and hit <em>Save</em>.</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1616327767341/hcXuj_c-U.png" alt="Stack Termination Protection" /></p>
<h1 id="5-place-sensitive-resources-in-different-stacks">5. Place Sensitive Resources in Different Stacks</h1>
<p>Last but not least, if you are too paranoid about deleting precious resources and all the data they contain, the best thing you can do is isolate them into their own stack. Place each one of them in a dedicated template and touch them only if and when you need to. By doing so, you will not risk destroying them while deploying other stacks that change more often.</p>
<hr />
<h1 id="which-one-should-you-use">Which One Should You Use?</h1>
<p>Each solution has its own pros and cons. They also behave differently in different situations. To help you better understand the differences, I created a simple cheat sheet.</p>
<table>
<thead>
<tr>
<td>Will my resource be protected if</td><td>It is removed from the stack</td><td>It requires replacement</td><td>The stack is deleted</td></tr>
</thead>
<tbody>
<tr>
<td>Changeset review</td><td>Manual (1)</td><td>Manual (1)</td><td>No</td></tr>
<tr>
<td>DeletionPolicy</td><td>Yes</td><td>No</td><td>Yes</td></tr>
<tr>
<td>Stack Policy</td><td>Yes (2)</td><td>Yes (2)</td><td>No</td></tr>
<tr>
<td>Stack Termination Protection</td><td>No</td><td>No</td><td>Yes</td></tr>
<tr>
<td>Resource Isolation</td><td>No (3)</td><td>No (3)</td><td>No (3)</td></tr>
</tbody>
</table>
<p>(1) You will need to manually review and approve the changes.</p>
<p>(2) Provided you configure the policy properly</p>
<p>(3) On its own, resource isolation will not protect any resource. You'll need to combine it with other solutions</p>
<p>As you can see, there is no one-fits-all solution (none of the rows has all <em>Yeses</em>). You will need to use more than one if you want full protection.</p>
<hr />
<h1 id="conclusion">Conclusion</h1>
<p>I just showed you 5 ways to avoid accidental deletion of CloudFormation resources:</p>
<ul>
<li><strong>Review the changeset</strong> is good if you want to sporadically review changes manually before applying some important changes.</li>
<li>The <strong>DeletionPolicy</strong> attribute will save your data in the event of a resource removal or stack deletion, but it won't help against resource replacement.</li>
<li><strong>Stack Policies</strong> will save you from accidentally removing a resource from the stack and changes that force a replacement. On the other hand, it won't be of any help if the stack is deleted altogether.</li>
<li><strong>Stack termination protection</strong> will only prevent accidental deletion of the stack.</li>
<li><strong>Placing sensitive resources in isolation</strong> will help against some human mistakes, but on its own, it will not protect your data.</li>
</ul>
<p>Use the one that best fits your needs and your particular use-cases. If you need complete protection, you can combine them together and benefit from several safety nets at the same time.</p>
<p>Hopefully, these measures will help you and your team sleep better at night 😴.</p>
<hr />
<p>If you would like to read more content like this, <a target="_blank" href="https://twitter.com/Benoit_Boure">follow me on Twitter</a> and subscribe to my brand new newsletter on Hashnode.</p>
]]></content:encoded></item><item><title><![CDATA[How to Store Large Attribute Values in DynamoDB]]></title><description><![CDATA[DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale. In order to keep up with its promises, there are a couple of constraints and good practices that you need to follow. One of them is to keep yo...]]></description><link>https://benoitboure.com/how-to-store-large-attribute-values-in-dynamodb</link><guid isPermaLink="true">https://benoitboure.com/how-to-store-large-attribute-values-in-dynamodb</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Amazon Web Services]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 15 Mar 2021 06:54:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1615737867834/Ax6664thb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale. In order to keep up with its promises, there are a couple of constraints and good practices that you need to follow. One of them is to keep your items as small as possible. This is true not only for performance but also for cost. With DynamoDB, you pay per amount of data that you read or write as well as for storage. Reducing your data size is important if you want to reduce your monthly bill.</p>
<p>On top of that, DynamoDB also comes with some hard limits, including:</p>
<ul>
<li>No single item can exceed 400 KB in size.</li>
<li>Query and Scan operations are limited to 1 MB of data <strong>scanned</strong> (after that, you will be forced to paginate).</li>
</ul>
<p>If you handle large amounts of data, you can hit those limitations very quickly.</p>
<p>For example, imagine that you are building a blog application (like Hashnode). You might store posts and comments in a DynamoDB table. These kinds of items contain free text that can be quite long and grow very fast. A blog post can easily reach 10 to 20 KB or more. When you know that half an RCU allows you to read 4 KB of data (provided that you are doing eventually consistent reads), we are talking about 2 to 3 RCUs for every read, if not more once you count your other attributes!</p>
<p>When dealing with such large data, <a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-use-s3-too.html">AWS recommends compressing them and storing them as Binary attributes</a>. In this blog post, I will show you how to compress long text strings with gzip and how to store them into DynamoDB. We will then inspect the read and write units consumed and compare them with the corresponding uncompressed version.</p>
<h1 id="1-writes">1. Writes</h1>
<blockquote>
<p>In this demo, I'll be using node.js but you should easily be able to apply these techniques to your favourite programming language.</p>
</blockquote>
<p>For the purpose of this test, we'll write a simple script that will first generate a dummy blog post using <a target="_blank" href="https://www.npmjs.com/package/lorem-ipsum">lorem-ipsum</a>. We will then save it to DynamoDB twice: once as raw text (uncompressed) and once compressed with gzip. We will also make use of the <code>ReturnConsumedCapacity</code> property of DynamoDB so that it returns the consumed capacity (WCU) for both operations.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">//write.js</span>
<span class="hljs-keyword">const</span> AWS = <span class="hljs-built_in">require</span>(<span class="hljs-string">'aws-sdk'</span>);
<span class="hljs-keyword">const</span> { loremIpsum } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'lorem-ipsum'</span>);
<span class="hljs-keyword">const</span> { gzipSync } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'zlib'</span>);

<span class="hljs-keyword">const</span> content = loremIpsum({
    <span class="hljs-attr">count</span>: <span class="hljs-number">20</span>,
    <span class="hljs-attr">units</span>: <span class="hljs-string">"paragraph"</span>,
    <span class="hljs-attr">format</span>: <span class="hljs-string">"plain"</span>,
    <span class="hljs-attr">paragraphLowerBound</span>: <span class="hljs-number">5</span>,
    <span class="hljs-attr">paragraphUpperBound</span>: <span class="hljs-number">15</span>,
    <span class="hljs-attr">sentenceLowerBound</span>: <span class="hljs-number">5</span>,
    <span class="hljs-attr">sentenceUpperBound</span>: <span class="hljs-number">15</span>,
    <span class="hljs-attr">suffix</span>: <span class="hljs-string">"\n\n\n"</span>,
});

<span class="hljs-comment">// output some stats about the text's length.</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Generated a text with <span class="hljs-subst">${content.length}</span> characters and <span class="hljs-subst">${content.split(<span class="hljs-string">' '</span>).length}</span> words`</span>);

<span class="hljs-comment">// compress the content</span>
<span class="hljs-keyword">const</span> compressed = gzipSync(content);

<span class="hljs-comment">// more stats about the content</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`total size (uncompressed): ~<span class="hljs-subst">${<span class="hljs-built_in">Math</span>.round(content.length/<span class="hljs-number">1024</span>)}</span> KB`</span>);
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">`total size (compressed): ~<span class="hljs-subst">${<span class="hljs-built_in">Math</span>.round(compressed.length/<span class="hljs-number">1024</span>)}</span> KB`</span>);

<span class="hljs-comment">// config DynamoDB</span>
AWS.config.update({ <span class="hljs-attr">region</span>: <span class="hljs-string">'eu-west-1'</span> });
<span class="hljs-keyword">const</span> dynamoDbClient = <span class="hljs-keyword">new</span> AWS.DynamoDB();

dynamoDbClient.putItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Item"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"raw-blog-post"</span>
        },
        <span class="hljs-string">"title"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"My blog post"</span>
        },
        <span class="hljs-string">"content"</span>: {
            <span class="hljs-string">"S"</span>: content,
        }
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Write capacity for raw post'</span>, result.ConsumedCapacity );
});


dynamoDbClient.putItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Item"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"compressed-blog-post"</span>
        },
        <span class="hljs-string">"title"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"My blog post"</span>
        },
        <span class="hljs-string">"content"</span>: {
            <span class="hljs-string">"B"</span>: compressed,
        }
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Write capacity for compressed post'</span>, result.ConsumedCapacity );
});
</code></pre>
<p>In the script above, we generate a text of 20 paragraphs. Each paragraph has between 5 and 15 sentences, and each sentence is 5 to 15 words long. That is enough to generate a text of around 2,000 words. Then, we compress the text and save both versions into DynamoDB.</p>
<p>Let's run the script:</p>
<pre><code class="lang-bash">$ node write.js
Generated a text with 12973 characters and 1943 words
total size (uncompressed): ~13 KB
total size (compressed): ~4 KB
Write capacity <span class="hljs-keyword">for</span> compressed post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 4 }
Write capacity <span class="hljs-keyword">for</span> raw post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 14 }
</code></pre>
<p>As you can see, the raw text was around 13 KB and consumed 14 WCUs, while the compressed one was only 4 KB and consumed 4 WCUs. That looks right, since 1 WCU accounts for 1 KB of data. </p>
<p>By compressing the data we just saved ourselves 10 WCUs. That's a 70% gain! Not only that, but we also reduced the item size by 70%!  Since DynamoDB also charges us for storage, that can make a <strong>huge</strong> difference on our AWS bill! 🎉</p>
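<p>You can sanity-check those numbers: one WCU covers up to 1 KB, so a write costs roughly the item size rounded up to the next kilobyte. A quick sketch (note that the real item size also includes attribute names and the other attributes, which is why the raw post actually costs 14 WCUs rather than 13):</p>

```javascript
// Rough WCU estimate for a standard write: 1 WCU per started KB.
const estimateWcus = (sizeBytes) => Math.ceil(sizeBytes / 1024);

console.log(estimateWcus(4 * 1024));  // 4 (compressed post content)
console.log(estimateWcus(13 * 1024)); // 13 (raw post content alone)
```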
<h1 id="2-reads">2. Reads</h1>
<p>Now that we saved our blog post in DynamoDB we want to read it back. Let's create a new script that will read the items back and see how many RCUs they are consuming.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">//read.js</span>
<span class="hljs-keyword">const</span> AWS = <span class="hljs-built_in">require</span>(<span class="hljs-string">'aws-sdk'</span>);
<span class="hljs-keyword">const</span> { gunzipSync } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'zlib'</span>);

AWS.config.update({ <span class="hljs-attr">region</span>: <span class="hljs-string">'eu-west-1'</span> });
<span class="hljs-keyword">const</span> dynamoDbClient = <span class="hljs-keyword">new</span> AWS.DynamoDB();

dynamoDbClient.getItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Key"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"raw-blog-post"</span>
        },
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Read capacity for raw post'</span>, result.ConsumedCapacity );
});


dynamoDbClient.getItem({
    <span class="hljs-string">"TableName"</span>: <span class="hljs-string">"blog"</span>,
    <span class="hljs-string">"ReturnConsumedCapacity"</span>: <span class="hljs-string">"TOTAL"</span>,
    <span class="hljs-string">"Key"</span>: {
        <span class="hljs-string">"author"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"bboure"</span>
        },
        <span class="hljs-string">"slug"</span>: {
            <span class="hljs-string">"S"</span>: <span class="hljs-string">"compressed-blog-post"</span>
        },
    }
}).promise().then(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Read capacity for compressed post'</span>, result.ConsumedCapacity );
    <span class="hljs-comment">// uncompress post content</span>
    <span class="hljs-keyword">const</span> content = gunzipSync(result.Item.content.B).toString();
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Original text with <span class="hljs-subst">${content.length}</span> characters and <span class="hljs-subst">${content.split(<span class="hljs-string">' '</span>).length}</span> words`</span>);
});
</code></pre>
<p>Let's run it:</p>
<pre><code class="lang-bash">$ node read.js
Read capacity <span class="hljs-keyword">for</span> compressed post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 0.5 }
Original text with 12973 characters and 1943 words
Read capacity <span class="hljs-keyword">for</span> raw post { TableName: <span class="hljs-string">'blog'</span>, CapacityUnits: 2 }
</code></pre>
<p>At read time, we only consumed 0.5 RCUs against 2 for the uncompressed version. That's 4 times less! And as you can see, it is just as easy to uncompress the data back into its original form.</p>
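<p>The read-side arithmetic works the same way: one RCU covers up to 4 KB, and eventually consistent reads cost half. A quick sketch (item sizes approximated from the output above):</p>

```javascript
// RCU estimate for an eventually consistent GetItem:
// 0.5 RCU per started 4 KB block.
const estimateRcus = (sizeBytes) => 0.5 * Math.ceil(sizeBytes / 4096);

console.log(estimateRcus(4 * 1024));  // 0.5 (compressed post)
console.log(estimateRcus(13 * 1024)); // 2 (raw post)
```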
<h1 id="3-secondary-indexes">3. Secondary indexes</h1>
<p>Before we call it a day, there is one last test I'd like to make. Sometimes, you want to add secondary indexes to your table. In our blog example, we could add a GSI that will index blog posts by author and sort them by timestamp. One could argue that you should probably avoid projecting the entire blog content in all your indexes (and I would definitely agree with that), but sometimes, you might not have a choice; and for the sake of completeness, we'll try it out.</p>
<p>Let's create another script that will test just that. I'm not going to copy the full script again here. Instead, just know that I added a <code>timestamp</code> attribute and I created a GSI index that projects all the attributes (Index name: <code>timestamp</code>, PK: <code>author</code>, SK: <code>timestamp</code>).</p>
<pre><code class="lang-bash">$ node write.js
Generated a text with 13673 characters and 1986 words
total size (uncompressed): ~13 KB
total size (compressed): ~4 KB
Write capacity <span class="hljs-keyword">for</span> compressed post {
  TableName: <span class="hljs-string">'blog'</span>,
  CapacityUnits: 12,
  Table: { CapacityUnits: 4 },
  GlobalSecondaryIndexes: { timestamp: { CapacityUnits: 8 } }
}
Write capacity <span class="hljs-keyword">for</span> raw post {
  TableName: <span class="hljs-string">'blog'</span>,
  CapacityUnits: 42,
  Table: { CapacityUnits: 14 },
  GlobalSecondaryIndexes: { timestamp: { CapacityUnits: 28 } }
}
</code></pre>
<p>As you can see, GSIs can be greedy in capacity units. That is because every write you make must be replicated to all your indexes. Our secondary index alone consumed 8 WCUs for the compressed post and a whopping 28 WCUs for the uncompressed version! Add the 4 and 14 WCUs that correspond to the table to that and we are at 12 vs 42 WCUs!</p>
<blockquote>
<p>Note: To be honest, I was expecting the GSI to consume the same amount of WCU as the table index (i.e.: 4 and 14). For some reason that I still don't understand, that amount is doubled. I could not find any information about why that happens. If you happen to know, please don't hesitate to drop a comment below. 🙏</p>
</blockquote>
<p>Here, even though the saving in terms of percentage is about the same (70%), the difference of capacity units starts to increase. We consumed 30 WCUs less with compressed content! Over time, this can quickly make a difference.</p>
<p>Note that here, there would be no difference in terms of RCUs when reading the data back. DynamoDB will read from the index that you provide in the query, and that index only.</p>
<h1 id="conclusion">Conclusion</h1>
<p>We just learned that by compressing large contents before saving them in DynamoDB, you can expect to save up to 70% in WCUs, RCUs and storage cost. This is significant enough to justify the extra effort of compressing/decompressing the data as you read/write it.</p>
<p>If you'd like to read more content like this, <a target="_blank" href="https://twitter.com/Benoit_Boure">follow me on Twitter</a></p>
]]></content:encoded></item><item><title><![CDATA[How to use TypeScript with AppSync Lambda Resolvers]]></title><description><![CDATA[✏️ Edit - 2023-04-11: If you're interested in creating JavaScript resolvers with TypeScript rather than utilizing Lambda functions, check out this alternative article.

One of the great benefits of GraphQL is typing! Define your schema, and GraphQL e...]]></description><link>https://benoitboure.com/how-to-use-typescript-with-appsync-lambda-resolvers</link><guid isPermaLink="true">https://benoitboure.com/how-to-use-typescript-with-appsync-lambda-resolvers</guid><category><![CDATA[aws lambda]]></category><category><![CDATA[lambda]]></category><category><![CDATA[AWS]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[GraphQL]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Wed, 03 Mar 2021 21:29:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1614692037497/pr_Jgcv1q.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>✏️ Edit - 2023-04-11: If you're interested in creating JavaScript resolvers with TypeScript rather than utilizing Lambda functions, check out <a target="_blank" href="https://blog.graphbolt.dev/improving-developer-experience-with-typescript-how-to-write-strongly-typed-appsync-resolvers">this alternative article</a>.</p>
</blockquote>
<p>One of the great benefits of GraphQL is typing! Define your schema, and GraphQL enforces the input/output "shape" of your endpoints' data.</p>
<p>If you are using Lambda as your AppSync resolvers with the <em>node.js</em> runtime, you might be using TypeScript, too. If you do, you might also be defining TS types that correspond to your schema. Doing this manually can be tedious, is prone to error, and is basically doing the same job twice! 🙁 Wouldn't it be great if you could import your GraphQL types into your code automatically?</p>
<p>In this article, I'll show you how to generate TypeScript types directly from your GraphQL schema, just by running a simple command line. Then, I'll teach you how to use those types in your Lambda resolvers.</p>
<p>Let's begin.</p>
<h1 id="heading-pre-requisites">Pre-requisites</h1>
<p>You should already have a basic AppSync project set up with a defined GraphQL schema (if you don't have one, you can use the example below).</p>
<p>For the purpose of this tutorial, I will take this simple schema as an example:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">type</span> Query {
    post(<span class="hljs-symbol">id:</span> ID!): Post
}

<span class="hljs-keyword">type</span> Mutation {
    createPost(<span class="hljs-symbol">post:</span> PostInput!): Post!
}

<span class="hljs-keyword">type</span> Post {
    <span class="hljs-symbol">id:</span> ID!
    <span class="hljs-symbol">title:</span> String!
    <span class="hljs-symbol">content:</span> String!
    <span class="hljs-symbol">publishedAt:</span> AWSDateTime
}

<span class="hljs-keyword">input</span> PostInput {
    <span class="hljs-symbol">title:</span> String!
    <span class="hljs-symbol">content:</span> String!
}
</code></pre>
<h1 id="heading-setting-up-the-project">Setting up the project</h1>
<h2 id="heading-install-the-dependencies">Install the dependencies</h2>
<p>We will need to install three packages:</p>
<pre><code class="lang-bash">npm i @graphql-codegen/cli @graphql-codegen/typescript @types/aws-lambda  -D
</code></pre>
<p>The first two packages belong to the <a target="_blank" href="https://github.com/dotansimha/graphql-code-generator">graphql-code-generator</a> suite. The first one is the base CLI, while the second one is the plugin that generates TypeScript code from a GraphQL schema.</p>
<p><a target="_blank" href="https://www.npmjs.com/package/@types/aws-lambda">@types/aws-lambda</a> is a collection of TypeScript types for AWS Lambda. It includes all sorts of Lambda event type definitions (API gateway, S3, SNS, etc.), including one for AppSync resolvers (<code>AppSyncResolverHandler</code>). We'll use that last one later when we build our resolvers.</p>
<h2 id="heading-create-the-configuration-file">Create the configuration file</h2>
<p>It's time to configure <code>graphql-codegen</code> and tell it how to generate our TS types. For that, we'll create a <code>codegen.yml</code> file:</p>
<pre><code class="lang-yml"><span class="hljs-attr">overwrite:</span> <span class="hljs-literal">true</span>
<span class="hljs-attr">schema:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">schema.graphql</span> <span class="hljs-comment">#your schema file</span>

<span class="hljs-attr">generates:</span>
  <span class="hljs-attr">appsync.d.ts:</span>
    <span class="hljs-attr">plugins:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">typescript</span>
</code></pre>
<p>This tells <em>codegen</em> which schema file(s) to use (here: <code>schema.graphql</code>), which plugin to apply (<code>typescript</code>), and where the output should go (<code>appsync.d.ts</code>). Feel free to change these parameters to match your needs.</p>
<h2 id="heading-support-for-aws-scalars">Support for AWS Scalars</h2>
<p>If you are using special <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/scalars.html">AWS AppSync Scalars</a>, you will also need to tell <code>graphql-codegen</code> how to handle them.</p>
<blockquote>
<p>💡 You need to declare, at a minimum, the scalars that you use, but it might be a good idea to just declare them all and forget about it.</p>
</blockquote>
<p>Let's create a new <code>appsync.graphql</code> file with the following content:</p>
<pre><code class="lang-graphql"><span class="hljs-keyword">scalar</span> AWSDate
<span class="hljs-keyword">scalar</span> AWSTime
<span class="hljs-keyword">scalar</span> AWSDateTime
<span class="hljs-keyword">scalar</span> AWSTimestamp
<span class="hljs-keyword">scalar</span> AWSEmail
<span class="hljs-keyword">scalar</span> AWSJSON
<span class="hljs-keyword">scalar</span> AWSURL
<span class="hljs-keyword">scalar</span> AWSPhone
<span class="hljs-keyword">scalar</span> AWSIPAddress
</code></pre>
<blockquote>
<p>⚠️ Don't place these types in the same file as your main schema. You only need them for code generation and they should not get into your deployment package to AWS AppSync.</p>
</blockquote>
<p>We also need to tell codegen how to map these scalars to TypeScript. For that, we will modify the <code>codegen.yml</code> file. Add/edit the following sections:</p>
<pre><code class="lang-yml"><span class="hljs-attr">schema:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">schema.graphql</span>
  <span class="hljs-bullet">-</span> <span class="hljs-string">appsync.graphql</span> <span class="hljs-comment"># 👈 add this</span>

<span class="hljs-comment"># and this 👇</span>
<span class="hljs-attr">config:</span>
  <span class="hljs-attr">scalars:</span>
    <span class="hljs-attr">AWSJSON:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSDate:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSTime:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSDateTime:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSTimestamp:</span> <span class="hljs-string">number</span>
    <span class="hljs-attr">AWSEmail:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSURL:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSPhone:</span> <span class="hljs-string">string</span>
    <span class="hljs-attr">AWSIPAddress:</span> <span class="hljs-string">string</span>
</code></pre>
<h1 id="heading-generate-the-code">Generate the code</h1>
<p>We are all set with the configuration. Time to generate some code! Run the following command:</p>
<pre><code class="lang-bash">graphql-codegen
</code></pre>
<blockquote>
<p>💡 You can also add <code>"codegen": "graphql-codegen"</code> to your package.json under the "scripts" section, and use <code>npm run codegen</code>.</p>
</blockquote>
<p>If you look in your working directory, you should now see an <code>appsync.d.ts</code> file that contains your generated types.</p>
<pre><code class="lang-ts"><span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Maybe&lt;T&gt; = T | <span class="hljs-literal">null</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Exact&lt;T <span class="hljs-keyword">extends</span> { [key: <span class="hljs-built_in">string</span>]: unknown }&gt; = { [K <span class="hljs-keyword">in</span> keyof T]: T[K] };
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MakeOptional&lt;T, K <span class="hljs-keyword">extends</span> keyof T&gt; = Omit&lt;T, K&gt; &amp; { [SubKey <span class="hljs-keyword">in</span> K]?: Maybe&lt;T[SubKey]&gt; };
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MakeMaybe&lt;T, K <span class="hljs-keyword">extends</span> keyof T&gt; = Omit&lt;T, K&gt; &amp; { [SubKey <span class="hljs-keyword">in</span> K]: Maybe&lt;T[SubKey]&gt; };
<span class="hljs-comment">/** All built-in and custom scalars, mapped to their actual values */</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Scalars = {
  ID: <span class="hljs-built_in">string</span>;
  <span class="hljs-built_in">String</span>: <span class="hljs-built_in">string</span>;
  <span class="hljs-built_in">Boolean</span>: <span class="hljs-built_in">boolean</span>;
  Int: <span class="hljs-built_in">number</span>;
  Float: <span class="hljs-built_in">number</span>;
  AWSDate: <span class="hljs-built_in">string</span>;
  AWSTime: <span class="hljs-built_in">string</span>;
  AWSDateTime: <span class="hljs-built_in">string</span>;
  AWSTimestamp: <span class="hljs-built_in">number</span>;
  AWSEmail: <span class="hljs-built_in">string</span>;
  AWSJSON: <span class="hljs-built_in">string</span>;
  AWSURL: <span class="hljs-built_in">string</span>;
  AWSPhone: <span class="hljs-built_in">string</span>;
  AWSIPAddress: <span class="hljs-built_in">string</span>;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Query = {
  __typename?: <span class="hljs-string">'Query'</span>;
  post?: Maybe&lt;Post&gt;;
};


<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> QueryPostArgs = {
  id: Scalars[<span class="hljs-string">'ID'</span>];
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Mutation = {
  __typename?: <span class="hljs-string">'Mutation'</span>;
  createPost: Post;
};


<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> MutationCreatePostArgs = {
  post: PostInput;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> Post = {
  __typename?: <span class="hljs-string">'Post'</span>;
  id: Scalars[<span class="hljs-string">'ID'</span>];
  title: Scalars[<span class="hljs-string">'String'</span>];
  content: Scalars[<span class="hljs-string">'String'</span>];
  publishedAt?: Maybe&lt;Scalars[<span class="hljs-string">'AWSDateTime'</span>]&gt;;
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">type</span> PostInput = {
  title: Scalars[<span class="hljs-string">'String'</span>];
  content: Scalars[<span class="hljs-string">'String'</span>];
};
</code></pre>
<p>Notice that, apart from some helper types at the top, several different types are generated:</p>
<ul>
<li><code>Scalars</code></li>
</ul>
<p>Contains all the basic scalars (ID, String, etc.) and the AWS custom Scalars.</p>
<ul>
<li><code>Query</code> and <code>Mutation</code></li>
</ul>
<p>These two types describe the full Query and Mutation types.</p>
<ul>
<li><code>Post</code></li>
</ul>
<p>This is our <em>Post</em> type from our schema translated into TypeScript. It is also the <em>return</em> value of the <code>post</code> query and the <code>createPost</code> mutation.</p>
<ul>
<li><code>QueryPostArgs</code> and <code>MutationCreatePostArgs</code></li>
</ul>
<p>These types describe the input <em>arguments</em> of the <code>post</code> Query and the <code>createPost</code> mutation, respectively.</p>
<blockquote>
<p>💡 Did you notice the name pattern here? Argument types are always named <code>Query[NameOfTheEndpoint]Args</code> and <code>Mutation[NameOfTheEndpoint]Args</code> in PascalCase. This is useful to know when you want to auto-complete types in your IDE.</p>
</blockquote>
<h1 id="heading-use-the-generated-types">Use the generated types</h1>
<p>Now that we have generated our types, it's time to use them!</p>
<p>Let's implement the <code>Query.post</code> resolver as an example.</p>
<p>Lambda handlers always receive 3 arguments:</p>
<ul>
<li><p><code>event</code>: contains information about the input query (arguments, identity, etc.)</p>
</li>
<li><p><code>context</code>: contains information about the executed Lambda function</p>
</li>
<li><p><code>callback</code>: a function you can call when your handler finishes (if you are not using async/promises)</p>
</li>
</ul>
<p>The shape of an AppSync handler is almost always the same. It turns out that there is a <a target="_blank" href="https://www.npmjs.com/package/@types/aws-lambda">DefinitelyTyped package</a> that already defines it. We installed it at the beginning of this tutorial. Let's use it!</p>
<p>The <code>AppSyncResolverHandler</code> type takes two arguments. The first one is the type for the <code>event.arguments</code> object, and the second one is the return value of the resolver.</p>
<p>In our case that will be: <code>QueryPostArgs</code> and <code>Post</code>, respectively.</p>
<p>Here is how to use it:</p>
<pre><code class="lang-ts"><span class="hljs-keyword">import</span> db <span class="hljs-keyword">from</span> <span class="hljs-string">'./db'</span>;
<span class="hljs-keyword">import</span> { AppSyncResolverHandler } <span class="hljs-keyword">from</span> <span class="hljs-string">'aws-lambda'</span>;
<span class="hljs-keyword">import</span> {Post, QueryPostArgs} <span class="hljs-keyword">from</span> <span class="hljs-string">'./appsync'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> handler: AppSyncResolverHandler&lt;QueryPostArgs, Post&gt; = <span class="hljs-keyword">async</span> (event) =&gt; {
    <span class="hljs-keyword">const</span> post = <span class="hljs-keyword">await</span> db.getPost(event.arguments.id);

    <span class="hljs-keyword">if</span> (post) {
        <span class="hljs-keyword">return</span> post;
    }

    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Not Found'</span>);
};
</code></pre>
<p>Now, our Lambda handler benefits from type-checking in 2 ways:</p>
<ul>
<li><p><code>event.arguments</code> will be of type <code>QueryPostArgs</code> (with the benefits of auto-complete!)</p>
</li>
<li><p>the <em>return</em> value, or the second argument of the <code>callback</code>, is expected to have the same shape as <code>Post</code> (with an id, title, etc.); otherwise, TypeScript will show you an error.</p>
</li>
</ul>
<h1 id="heading-advanced-usage">Advanced usage</h1>
<p>There are lots of options that let you customize your generated types. Check out <a target="_blank" href="https://graphql-code-generator.com/docs/plugins/typescript">the documentation</a> for more details!</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>By auto-generating types, you will not only improve your development speed and experience but will also ensure that your resolvers do what your API is expecting. You also ensure that your code types and your schema types are always in perfect sync, avoiding mismatches that could lead to bugs.</p>
<p>Don't forget to re-run the <code>graphql-codegen</code> command each time you edit your schema! It might be a good idea to automate the process or validate your types in your CI/CD pipeline.</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How to use DynamoDB single-table design with AppSync]]></title><description><![CDATA[AppSync is a fully managed service from AWS that lets you build and deploy scalable and secure GraphQL APIs in the cloud. DynamoDB is a NoSQL fully managed and scalable database. Both being serverless services, they are very often used together.
If y...]]></description><link>https://benoitboure.com/how-to-use-dynamodb-single-table-design-with-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-to-use-dynamodb-single-table-design-with-appsync</guid><category><![CDATA[GraphQL]]></category><category><![CDATA[DynamoDB]]></category><category><![CDATA[APIs]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Thu, 25 Feb 2021 19:57:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1614262750324/87GtUFg2G.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AppSync is a fully managed service from AWS that lets you build and deploy scalable and secure GraphQL APIs in the cloud. DynamoDB is a NoSQL fully managed and scalable database. Both being serverless services, they are very often used together.</p>
<p>If you are interested in DynamoDB, you have also probably already heard about single-table design. However, that way of designing databases is <a target="_blank" href="https://www.alexdebrie.com/posts/dynamodb-single-table/#graphql--single-table-design">often considered to be unhelpful with GraphQL</a>.</p>
<p>In this blog post, I will share some ideas on how we can still use single-table design and its benefits with GraphQL, and show some techniques that I use with AWS AppSync.</p>
<p>Let's begin.</p>
<h1 id="heading-its-all-about-access-pattern">It's all about access pattern</h1>
<p>If you have watched Rick Houlihan's talks at AWS re:invent (In <a target="_blank" href="https://www.youtube.com/watch?v=jzeKPKpucS0">2017</a>, <a target="_blank" href="https://www.youtube.com/watch?v=HaEPXoXVf2k">2018</a> and <a target="_blank" href="https://www.youtube.com/watch?v=6yqfmXiZTlM">2019</a>), or read <a target="_blank" href="https://www.dynamodbbook.com/">Alex DeBrie's book</a> (I totally recommend it if you haven't), you probably know it by now: The key to success with single table design is that you should <strong>know your access patterns in advance</strong>.</p>
<p>It is likely that your access patterns will reflect what the different pages or views of your application will show. Something like:</p>
<blockquote>
<p>For a given user, show user details and his/her last 10 orders</p>
</blockquote>
<p>Then as the user drills down to a particular order, you will show the order and all its order items (That is a second access pattern).</p>
<p>However, one of the key features of GraphQL is that you can fetch nested children as deep as you need. In our example, you could ask "Give me that user's details with his/her last 10 orders <strong>and</strong> all the items of those orders".</p>
<p>Under the hood, GraphQL uses <em>resolvers</em> that fetch data from the persistence storage and return it in the query response. Most of the time, each child is a different resolver that receives the parent (or source) data, that you can use to fetch related data. This is how GraphQL "joins" data, and this is probably the way you have been using DynamoDB with GraphQL so far. DynamoDB doesn't have JOINs, GraphQL fills that gap for you!</p>
<p>With all that in mind, it does not look like DynamoDB single-table design has many benefits to bring to GraphQL.</p>
<p>But wait a minute...</p>
<p>One of the core principles of GraphQL is <em>no under- or over-fetching</em>. GraphQL lets you choose which fields and children you need in your client application and fetch those fields specifically. In our previous example, even though GraphQL allows you to, will you ever run a query that fetches all the users, and all the orders, and all the items in your client application? Unless you are building a public API with unpredictable access patterns, chances are that the answer is <em>No</em> (except for debugging or exploring your data, maybe).</p>
<p>It's all about access patterns! Just as I explained earlier, you will probably show a list of orders first, and when the user clicks on an order you will show the details. That's <strong>two</strong> different queries. They might look like this:</p>
<p>Query 1: Fetch a User and related orders:</p>
<pre><code class="lang-graphql">  user(<span class="hljs-symbol">id:</span> <span class="hljs-string">"123"</span>) {
    id
    email
    name
    orders {
      id
      orderDate
      shippedDate
    }
  }
</code></pre>
<p>Query 2: Fetch an Order and related items:</p>
<pre><code class="lang-graphql">order(<span class="hljs-symbol">id:</span> <span class="hljs-string">"456"</span>) {
    id
    orderDate
    shippedDate
    items {
      productId
      name
      quantity
      price
    }
}
</code></pre>
<p>We still have the same 2 access patterns from the beginning.</p>
<p>By now, you might be thinking:</p>
<blockquote>
<p>But I still need one resolver for each entity</p>
</blockquote>
<p>No, you don't! We'll see how in the next section.</p>
<h1 id="heading-build-your-resolvers-with-your-access-patterns-in-mind">Build your resolvers with your access patterns in mind</h1>
<p>Maybe one of the most common misconceptions with GraphQL is that each nested entity needs its own resolver. However, this does not have to be the case. You can easily return child elements from the parent resolver; if you do so, you don't need a child resolver at all. This is often possible with DynamoDB if you denormalise some relations into the parent item (i.e., in a Map or a List attribute). But it can also work when the entities are decoupled but you are able to fetch them all in one query, for example with a JOIN (with an RDBMS) or when your items live under the same partition (with DynamoDB).</p>
<p>Let's see how this can work in our example and what kind of resolvers we need.</p>
<p>Our <em>user</em> resolver can probably get the user <strong>and</strong> the orders from DynamoDB in a single query, then return them all in one resolver; while the <em>order</em> resolver can do the same with a particular order and its items.</p>
<p>Here is what the <em>order</em> resolver might look like (<em>request</em> template):</p>
<pre><code class="lang-json">## Fetch the Order and OrderItems (they are under the same partition)
{
    <span class="hljs-attr">"version"</span>: <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-attr">"operation"</span>: <span class="hljs-string">"Query"</span>,
    <span class="hljs-attr">"query"</span>: {
      <span class="hljs-attr">"expression"</span>: <span class="hljs-string">"#PK = :PK"</span>,
      <span class="hljs-attr">"expressionNames"</span>: {
        <span class="hljs-attr">"#PK"</span>: <span class="hljs-string">"PK"</span>
      },
      <span class="hljs-attr">"expressionValues"</span>: {
        <span class="hljs-attr">":PK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDER#${context.args.id}"</span>)
      }
    }
}
</code></pre>
<p>and the <em>response</em> template</p>
<pre><code class="lang-typescript">## re-organize the data
#<span class="hljs-keyword">if</span>($context.result.items.size() == <span class="hljs-number">0</span>)
  $utils.error(<span class="hljs-string">"NotFound"</span>, <span class="hljs-string">"NotFound"</span>);
#<span class="hljs-keyword">else</span>
  #set ($order = {})
  #foreach($item <span class="hljs-keyword">in</span> $context.result.items)
    #<span class="hljs-keyword">if</span>($item.SK.startsWith(<span class="hljs-string">"ORDER#"</span>))
      #set ($order = $item)
      $util.qr($order.put(<span class="hljs-string">"items"</span>, []))
    #<span class="hljs-keyword">else</span>
      #<span class="hljs-keyword">if</span>($item.SK.startsWith(<span class="hljs-string">"ORDERITEM#"</span>))
        $util.qr($order.items.add($item))
      #end
    #end
  #end
  $utils.toJson($order)
#end
</code></pre>
<p>In the response template, we receive all the items in a single array. All we have to do is re-organize them a little (we embed the items inside the order itself). We then return the whole thing.</p>
<p>We're sending only one query to DynamoDB! 🎉</p>
<h2 id="heading-the-drawbacks">The drawbacks</h2>
<p>This way of doing things comes with a couple of issues though:</p>
<ol>
<li><p>If you only need fields from the order (your query does not include the <em>items</em> field), you will be over-fetching.</p>
</li>
<li><p>It only works with that single access pattern. If you need to access the order items from, say, the <em>updateOrder</em> endpoint, it won't work because they won't come with that DynamoDB access pattern.</p>
</li>
</ol>
<p>Let's tackle these issues one at a time.</p>
<h3 id="heading-avoid-over-fetching-children-entities-if-they-are-not-explicitly-included-in-the-query">Avoid over-fetching children entities if they are not explicitly included in the query</h3>
<p>That one is easy. Every AppSync query comes with a <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/resolver-context-reference.html">Context object</a> that contains information about the query: things like <code>args</code>, <code>source</code>, <code>identity</code>, <code>request</code> and <code>info</code>. The last one is the one we're interested in. It gives us information about the query; more specifically, its <code>selectionSetList</code> attribute tells us which <strong>fields</strong> were requested in the query. We can use that to change our request to DynamoDB and include the order items, or not, depending on its value. Let's adjust our request template to use it.</p>
<pre><code class="lang-typescript">#set($expression=<span class="hljs-string">"#PK = :PK"</span>)
#set($expressionNames={<span class="hljs-string">"#PK"</span>: <span class="hljs-string">"PK"</span>})
#set($expressionValues={<span class="hljs-string">":PK"</span>: $util.dynamodb.toDynamoDB(<span class="hljs-string">"ORDER#$context.args.id"</span>)})
## <span class="hljs-keyword">if</span> the selectionSetList does not contain the items, we fetch only the order
#<span class="hljs-keyword">if</span>(!$ctx.info.selectionSetList.contains(<span class="hljs-string">"items"</span>))
  #set($expression=$expression + <span class="hljs-string">" and #SK = :SK"</span>)
  $util.qr($expressionNames.put(<span class="hljs-string">"#SK"</span>, <span class="hljs-string">"SK"</span>))
  $util.qr($expressionValues.put(<span class="hljs-string">":SK"</span>, $util.dynamodb.toDynamoDB(<span class="hljs-string">"ORDER#${context.args.id}"</span>)))
#end
{
    <span class="hljs-string">"version"</span> : <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-string">"operation"</span> : <span class="hljs-string">"Query"</span>,
    <span class="hljs-string">"query"</span> : {
      <span class="hljs-string">"expression"</span>: <span class="hljs-string">"$expression"</span>,
      <span class="hljs-string">"expressionNames"</span>: $util.toJson($expressionNames),
      <span class="hljs-string">"expressionValues"</span>: $util.toJson($expressionValues)
    }
}
</code></pre>
<p>When the <code>selectionSetList</code> does not include "items", we limit the Query to only the Order item itself by adding an SK condition. Now we only fetch what we need when we need it.</p>
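<p>The decision itself can be sketched as a plain TypeScript function (illustrative only; the values are shown unmarshalled, without the <code>$util.dynamodb</code> wrappers used in the template above):</p>

```typescript
// Sketch: build the DynamoDB key condition depending on whether the
// GraphQL selection set asks for the order's "items" field.
type KeyCondition = {
  expression: string;
  expressionNames: Record<string, string>;
  expressionValues: Record<string, string>;
};

export function orderKeyCondition(
  orderId: string,
  selectionSetList: string[]
): KeyCondition {
  const condition: KeyCondition = {
    expression: "#PK = :PK",
    expressionNames: { "#PK": "PK" },
    expressionValues: { ":PK": `ORDER#${orderId}` },
  };
  // Without "items" in the selection set, restrict the query to the
  // order record itself by also pinning the sort key.
  if (!selectionSetList.includes("items")) {
    condition.expression += " and #SK = :SK";
    condition.expressionNames["#SK"] = "SK";
    condition.expressionValues[":SK"] = `ORDER#${orderId}`;
  }
  return condition;
}
```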
<h3 id="heading-make-other-access-patterns-to-work">Make other access patterns work</h3>
<p>This one is a little more tricky. If we want the <em>updateOrder</em> query to return order items as well, we will need to do it in 2 steps. This means that we will need a resolver for the order items. Unfortunately, here we have no choice. Let's write our <em>order.items</em> resolver.</p>
<pre><code class="lang-typescript">{
    <span class="hljs-string">"version"</span> : <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-string">"operation"</span> : <span class="hljs-string">"Query"</span>,
    <span class="hljs-string">"query"</span> : {
      <span class="hljs-string">"expression"</span>: <span class="hljs-string">"#PK = :PK and begins_with(#SK, :SK)"</span>,
      <span class="hljs-string">"expressionNames"</span> : {
        <span class="hljs-string">"#PK"</span> : <span class="hljs-string">"PK"</span>,
        <span class="hljs-string">"#SK"</span> : <span class="hljs-string">"SK"</span>
      },
      <span class="hljs-string">"expressionValues"</span> : {
        <span class="hljs-string">":PK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDER#${context.source.id}"</span>),
        <span class="hljs-string">":SK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDERITEM#"</span>)
      }
    }
}
</code></pre>
<p>Now we can see the related order items for an Order in any query. And this is probably what you want! Otherwise, your GraphQL API would be somewhat inconsistent, with data returned in some requests and not in others.</p>
<p>But wait! We just broke our previous access pattern! Since resolvers are associated with a Type (in our case: Order), and the type is the same, that new resolver will be used by the <em>order</em> endpoint, too. This means that the extra query will also be executed in that case, making all the effort we have made so far useless. Worse, we would be fetching the order items <strong>twice</strong>! Is there a way we can avoid that?</p>
<p>Remember the Context object? It also comes with the <code>source</code> attribute. We even just used it to get the id of the Order and fetch the related order items. That object actually comes with the <strong>full</strong> result from the previous (parent) resolver, including the order <em>items</em>, if any. We can use that in our <em>order.items</em> resolver and avoid the extra query <em>if</em> the order items come pre-populated from the source. For that, we can use the <em>#return</em> directive.</p>
<pre><code class="lang-typescript">#<span class="hljs-keyword">if</span>($ctx.source.items)
  #<span class="hljs-keyword">return</span>($ctx.source.items)
#<span class="hljs-keyword">else</span>
{
    <span class="hljs-string">"version"</span> : <span class="hljs-string">"2017-02-28"</span>,
    <span class="hljs-string">"operation"</span> : <span class="hljs-string">"Query"</span>,
    <span class="hljs-string">"query"</span> : {
      <span class="hljs-string">"expression"</span>: <span class="hljs-string">"#PK = :PK and begins_with(#SK, :SK)"</span>,
      <span class="hljs-string">"expressionNames"</span> : {
        <span class="hljs-string">"#PK"</span> : <span class="hljs-string">"PK"</span>,
        <span class="hljs-string">"#SK"</span> : <span class="hljs-string">"SK"</span>
      },
      <span class="hljs-string">"expressionValues"</span> : {
        <span class="hljs-string">":PK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDER#${context.source.id}"</span>),
        <span class="hljs-string">":SK"</span>: $util.dynamodb.toDynamoDBJson(<span class="hljs-string">"ORDERITEM#"</span>)
      }
    }
}
#end
</code></pre>
<p>By returning early in the request template, the DynamoDB query will not be executed at all, and the data from the previous resolver will simply pass through.</p>
<h2 id="heading-limitations">Limitations</h2>
<p>The techniques I just showed you have some other limitations to keep in mind.</p>
<h3 id="heading-deep-nesting">Deep nesting</h3>
<p>This technique only works well when you have 2 levels of nesting. In DynamoDB, with single-table design, you will almost never group more than 2 levels of hierarchy under the same partition key. You will probably not store orders and order items under the user PK. Instead, you will store the items under a GSI with the order id as the PK. You won't be able to fetch all these items in one single query, and you will need at least two. Usually, you will be able to group entities two by two (4 levels of hierarchy = 2 grouped queries to DynamoDB).</p>
<p>That said, it still really depends on your access pattern. If your client API almost always fetches 3 levels, or more, of hierarchy in a single query, you might still group them under the same PK and filter/re-order the items in your top resolver. It might just become more complicated to maintain and you might be reading more than you need in some cases. You might also hit other limitations like the 1MB DynamoDB limit a lot faster.</p>
<h3 id="heading-pagination">Pagination</h3>
<p>In our example, our users will have an unbounded number of Orders and you probably don't want to return them all in one single query. You will for example get the last 10 in one query, and then paginate. You might want to have a query like this one:</p>
<pre><code class="lang-graphql">  user(<span class="hljs-symbol">id:</span> <span class="hljs-string">"123"</span>) {
    id
    email
    name
    orders(<span class="hljs-symbol">nextToken:</span> <span class="hljs-string">"ey........"</span>) {
      id
      orderDate
      shippedDate
    }
  }
</code></pre>
<p>Our design simply will not work in this case, because when you pass <code>nextToken</code>, you won't get the Order items at all in the DynamoDB response. In fact, you won't even have access to the <code>nextToken</code> argument from your <em>user</em> resolver.</p>
<p>That said, it would probably also be a bad design for your GraphQL API. Do you really want to bring back the user with every page of orders? Probably not. If you need to paginate, you should probably have another endpoint in your API, something like <code>ordersByUserId(userId: ID!): [Order]</code>, and use that instead.</p>
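<p>For reference, such a paginated resolver would typically round-trip DynamoDB's <code>LastEvaluatedKey</code> through the <code>nextToken</code>. A sketch of one possible encoding (hypothetical; AppSync's own tokens are opaque):</p>
<pre><code class="lang-python">import base64
import json

def encode_next_token(last_evaluated_key):
    # No more pages: no token to hand back to the client.
    if last_evaluated_key is None:
        return None
    raw = json.dumps(last_evaluated_key).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_next_token(token):
    # Turn the client's token back into an ExclusiveStartKey.
    if token is None:
        return None
    return json.loads(base64.b64decode(token))
</code></pre>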
<h3 id="heading-sorting">Sorting</h3>
<p>Sorting is also limited. You might, for instance, sometimes want the last 10 orders and sometimes the first 10 for a given user. But you are limited to one direction, depending on where you placed the User item in the partition. If it is at the beginning (and your orders are sorted by date), you will get the user's first orders; if it is at the end, the last ones. Sorting is basically limited to how you designed your table in the first place.</p>
<p>If you need two ways of sorting orders (ASC and DESC), you in fact have two access patterns. What you would normally do is add a GSI to your table for the second access pattern. You could then use one index or the other depending on the direction requested by the query. That's another level of complexity to take into consideration.</p>
<p>If you use a sub-resolver (one for the User and one for the Orders), all you would have to do is change the <code>ScanIndexForward</code> param in your <em>items</em> query.</p>
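<p>As a sketch (in Python, building the query as a plain request dictionary; the table and key names are hypothetical), flipping the direction is indeed a one-parameter change:</p>
<pre><code class="lang-python">def orders_query(user_id, newest_first=True, limit=10):
    # Only ScanIndexForward changes between "first 10" and "last 10",
    # assuming the sort key orders the items by date.
    return {
        "TableName": "my-table",
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": "USER#" + user_id},
            ":sk": {"S": "ORDER#"},
        },
        "Limit": limit,
        "ScanIndexForward": not newest_first,
    }
</code></pre>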
<h1 id="heading-conclusions">Conclusions</h1>
<p>We just saw how single-table design can work well with GraphQL and how we can use its benefits to reduce DynamoDB calls. We found out that it comes with a few challenges and how to deal with them. We also learned about the limitations and things to take into account before using this method. If your question is:</p>
<blockquote>
<p>Is it worth it?</p>
</blockquote>
<p>Well, it probably depends on your use case. If you know your access patterns well in advance, it can give you a little performance boost. If what you need is flexibility, or you have unpredictable access patterns, you should probably stick to keeping your resolvers decoupled.</p>
<p>If you have comments, suggestions or questions, let me know!</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item><item><title><![CDATA[How I used DynamoDB as a long-term cache layer for AppSync]]></title><description><![CDATA[Originally published on Medium

Updated on 2020–12–26: After posting this, I realized that it could be improved even more by using DynamoDB TTL. I updated this article accordingly.
While I was working on a GraphQL API, I needed a couple of res...]]></description><link>https://benoitboure.com/how-i-used-dynamodb-as-a-long-term-cache-layer-for-appsync</link><guid isPermaLink="true">https://benoitboure.com/how-i-used-dynamodb-as-a-long-term-cache-layer-for-appsync</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[caching]]></category><category><![CDATA[cache]]></category><dc:creator><![CDATA[Benoît Bouré]]></dc:creator><pubDate>Mon, 21 Dec 2020 17:38:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1614526375184/vWmljGB1i.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Originally published <a target="_blank" href="https://bboure.medium.com/how-i-used-dynamodb-as-a-long-term-cache-layer-for-appsync-3d45391f431e">on Medium</a></p>
</blockquote>
<p><strong>Updated on 2020–12–26</strong>: After posting this, I realized that it could be improved even more by using DynamoDB TTL. I updated this article accordingly.</p>
<p>While I was working on a GraphQL API, I needed a couple of resolvers hitting remote HTTP endpoints. This is rather straightforward with AppSync HTTP data sources. However, I didn't want it hitting the remote APIs at every single request. There were several reasons for that:</p>
<ul>
<li><p><strong>Latency</strong>: Due to several factors, like the location of the remote endpoint, it could add a noticeable overhead and add extra time to the request execution.</p>
</li>
<li><p><strong>Throttling</strong>: I did not want to spam the remote endpoints and suffer possible throttling, or even worse, get banned.</p>
</li>
<li><p><strong>API quotas</strong>: Some of the remote endpoints also had quotas and I did not want to reach the limits too fast.</p>
</li>
</ul>
<p>Because most of the data was not going to change over time anyway, the natural choice, in this case, was to use a cache layer.</p>
<p>My first instinct was to turn towards the <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/enabling-caching.html">AppSync caching</a> capabilities. AppSync comes out-of-the-box with a built-in server-side cache. It offers per-request and per-resolver caching. Unfortunately, it had two drawbacks for me:</p>
<ul>
<li><p><strong>Cost</strong>: Starting at $0.044 up to $6.775 per hour, it can quickly become expensive if the workload increases.</p>
</li>
<li><p><strong>TTL</strong>: Limited to <strong>3600 seconds</strong>, after which cached data expires.</p>
</li>
</ul>
<p>The main issue for me here was the time limit. With a 1-hour cache TTL, data would be flushed every hour and the remote endpoints would have to be hit again. In my case, this was still too often, especially because I was totally OK with day-old data, or even <strong>month-old</strong> data in some cases. So, I started looking for alternatives.</p>
<p>The data I had to store was plain JSON objects. So I thought: how about DynamoDB? I could store them as a document in a table. Then, I could have a resolver that looks into the table for a given cache key. If the record is found (and hasn't expired), return it; otherwise, fetch fresh data from the source, store it into the table for later and return the data. Because DynamoDB is fast, it sounded like a good idea.</p>
<p>Now, a naive approach would have been to use a Lambda resolver that would do just that. It would have worked for sure, but there is a better alternative that is <strong>faster and cheaper</strong>. AppSync supports <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/pipeline-resolvers.html">pipeline resolvers</a>. Pipeline resolvers let you execute multiple operations, or "functions", to resolve one single field. This was just what I needed. My resolver would be composed of 3 functions:</p>
<ol>
<li><p>Try and fetch data from a DynamoDB table. If there is a hit, <strong>skip the following functions</strong> and return the data.</p>
</li>
<li><p>If there was no hit, go fetch the remote data from the source.</p>
</li>
<li><p>Save the data into the DynamoDB table and return it.</p>
</li>
</ol>
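<p>In plain Python, those three steps boil down to a classic read-through cache. Here is a quick sketch (the callables are hypothetical stand-ins for the three pipeline functions; actual expiry is delegated to DynamoDB TTL):</p>
<pre><code class="lang-python">import time

def resolve(title, cache_get, fetch_remote, cache_put,
            ttl_seconds=3600 * 24 * 30):
    cached = cache_get(title)                  # 1. fetchFromCache
    if cached is not None:
        return cached                          #    hit: skip the rest
    content = fetch_remote(title)              # 2. fetchWikipedia
    expires_at = int(time.time()) + ttl_seconds
    cache_put(title, content, expires_at)      # 3. saveToCache
    return content
</code></pre>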
<p>Let me show you how I implemented this with a simplified example. In this demo, we will build a GraphQL API that fetches Wikipedia articles. Because we don't want to spam Wikipedia's servers and because articles don't change that much very often, they should be cached for a month before we have to hit Wikipedia again and get the updated versions.</p>
<p>To build that, we will use the <a target="_blank" href="https://www.serverless.com/">Serverless Framework</a> and the <a target="_blank" href="https://github.com/sid88in/serverless-appsync-plugin">AppSync plugin</a>. I will not go into details on how the plugin works. For more information, please refer to the documentation on the repository or <a target="_blank" href="https://www.serverless.com/blog/running-scalable-reliable-graphql-endpoint-with-serverless">this series of articles</a>.</p>
<p>I will explain the most important parts only, but you can find the full code of this example <a target="_blank" href="https://github.com/bboure/appsync-long-cache-demo">on GitHub</a>.</p>
<p>Let's start with the serverless.yml.</p>
<pre><code class="lang-yaml">    <span class="hljs-attr">mappingTemplates:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Query</span>
        <span class="hljs-attr">field:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">kind:</span> <span class="hljs-string">PIPELINE</span>
        <span class="hljs-attr">functions:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">fetchFromCache</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">fetchWikipedia</span>
          <span class="hljs-bullet">-</span> <span class="hljs-string">saveToCache</span>

    <span class="hljs-attr">dataSources:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">HTTP</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">description:</span> <span class="hljs-string">'Wikipedia api'</span>
        <span class="hljs-attr">config:</span>
          <span class="hljs-attr">endpoint:</span> <span class="hljs-string">https://en.wikipedia.org</span>

      <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">AMAZON_DYNAMODB</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">wikicache</span>
        <span class="hljs-attr">description:</span> <span class="hljs-string">'Wikipedia cached titles'</span>
        <span class="hljs-attr">config:</span>
          <span class="hljs-attr">tableName:</span>
            <span class="hljs-attr">Ref:</span> <span class="hljs-string">WikipediaTable</span>

    <span class="hljs-attr">functionConfigurations:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dataSource:</span> <span class="hljs-string">wikicache</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">fetchFromCache</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dataSource:</span> <span class="hljs-string">wikicache</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">saveToCache</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">dataSource:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">fetchWikipedia</span>

<span class="hljs-attr">resources:</span>
  <span class="hljs-attr">Resources:</span>
    <span class="hljs-attr">WikipediaTable:</span>
      <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::DynamoDB::Table</span>
      <span class="hljs-attr">Properties:</span>
        <span class="hljs-attr">TableName:</span> <span class="hljs-string">wikipedia</span>
        <span class="hljs-attr">BillingMode:</span> <span class="hljs-string">PAY_PER_REQUEST</span>
        <span class="hljs-attr">TimeToLiveSpecification:</span>
          <span class="hljs-attr">AttributeName:</span> <span class="hljs-string">expires_at</span>
          <span class="hljs-attr">Enabled:</span> <span class="hljs-literal">true</span>
        <span class="hljs-attr">AttributeDefinitions:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">AttributeName:</span> <span class="hljs-string">title</span>
            <span class="hljs-attr">AttributeType:</span> <span class="hljs-string">S</span>
        <span class="hljs-attr">KeySchema:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">AttributeName:</span> <span class="hljs-string">title</span>
            <span class="hljs-attr">KeyType:</span> <span class="hljs-string">HASH</span>
</code></pre>
<p><strong>mappingTemplates</strong></p>
<p>This is where we define our resolver. As I explained earlier this is going to be a <strong>PIPELINE</strong> resolver with three consecutive functions: <em>fetchFromCache</em>, <em>fetchWikipedia</em> and <em>saveToCache.</em></p>
<p><strong>dataSources</strong></p>
<p>Here we define our two data sources:</p>
<ul>
<li><p>an HTTP endpoint which points to the Wikipedia API in English</p>
</li>
<li><p>a DynamoDB table</p>
</li>
</ul>
<p><strong>functionConfigurations</strong></p>
<p>And here, we declare the three pipeline functions, each attached to one of the data sources we created earlier.</p>
<p><strong>WikipediaTable resource</strong></p>
<p>Finally, we declare our DynamoDB table. It will have a HASH key, which will be the <em>title</em> of the article. We also set a <em>TimeToLiveSpecification</em> on the <em>expires_at</em> attribute.</p>
<p><a target="_blank" href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html">DynamoDB Time to Live (TTL)</a> is a feature that lets us define, per record, a timestamp after which the record is no longer needed. When the timestamp is reached, DynamoDB deletes the record. We will use that to auto-expire the cache.</p>
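<p>One caveat worth knowing: DynamoDB does not delete expired items the instant their TTL timestamp passes, so a defensive reader can also filter them out itself. A quick sketch of that check (a hypothetical helper, not part of this demo's templates):</p>
<pre><code class="lang-python">import time

def is_expired(item, now=None):
    # Items with no expires_at attribute never expire.
    now = time.time() if now is None else now
    return now >= item.get("expires_at", float("inf"))
</code></pre>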
<p>Now, we need to define our mapping templates. There are a few of them. Let's go through them in the order they will be executed.</p>
<p>Let's begin with the "before" pipeline request mapping template.</p>
<pre><code class="lang-typescript">## Query.wikipedia.request.vtl
$util.qr($ctx.stash.put(<span class="hljs-string">"title"</span>, $ctx.args.title))
{}
</code></pre>
<p>Here, we simply put the title argument coming from the request into the stash (<a target="_blank" href="https://github.com/bboure/appsync-long-cache-demo/blob/master/schema.graphql">see the schema definition</a>). We will use it later in the pipeline. We also return an empty Map (because mapping templates cannot be empty).</p>
<p>At this point, the first function in the pipeline will be called: <em>fetchFromCache</em></p>
<pre><code class="lang-typescript">## fetchFromCache.request.vtl
{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span>: <span class="hljs-string">"GetItem"</span>,
  <span class="hljs-string">"key"</span>: {
    <span class="hljs-string">"title"</span>: $util.dynamodb.toStringJson(<span class="hljs-string">"${ctx.stash.title}"</span>)
  }
}
</code></pre>
<p>Here, we execute a <em>GetItem</em> operation on our DynamoDB table using the title of the article as the key. Let's see what is in the response template:</p>
<pre><code class="lang-typescript">#<span class="hljs-keyword">if</span>($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end

#<span class="hljs-keyword">if</span>($ctx.result)
  $util.qr($ctx.stash.put(<span class="hljs-string">"result"</span>, $ctx.result.content))
#end
{}
</code></pre>
<p>First, check for any error, and stop the process if we find any. Then, if we have a result (it means that we have a hit!), we stick it into the stash. You will find out why later.</p>
<p>Now, at this point, if we have a hit, we want to stop the execution and return the value to the user. It turns out that AppSync has a neat solution for that: the <a target="_blank" href="https://docs.aws.amazon.com/appsync/latest/devguide/resolver-util-reference.html#aws-appsync-directives">return directive</a>.</p>
<blockquote>
<p>The <code>#return</code> directive comes in handy if you need to return prematurely from any mapping template. <code>#return</code> is analogous to the <em>return</em> keyword in programming languages, as it will return from the closest scoped block of logic. What this means is using <code>#return</code> inside a resolver mapping template will return from the resolver. Additionally, using <code>#return</code> from a function mapping template will return from the function and will continue the execution to either the next function in the pipeline or the resolver response mapping template.</p>
</blockquote>
<p>The important part to notice here is that, when used in a pipeline function, <code>#return</code> <strong><em>will continue to the next function in the pipeline</em></strong>. But we don't want the next function to be executed, right? Well, it turns out that if you call <code>#return</code> in a request mapping template, it will skip to the next function <strong>without executing the current one</strong>.</p>
<p>This is why we previously kept the result into the stash. We will use it in the following two functions' request mappings to determine if there was a hit, and skip to the next one directly, in cascade. See our <em>fetchWikipedia</em> request template:</p>
<pre><code class="lang-typescript">## fetchWikipedia.request.vtl
## Bypass <span class="hljs-built_in">this</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">if</span> <span class="hljs-title">result</span> <span class="hljs-title">is</span> <span class="hljs-title">present</span> <span class="hljs-title">in</span> <span class="hljs-title">the</span> <span class="hljs-title">stash</span>
#<span class="hljs-title">if</span>(<span class="hljs-params">$ctx.stash.result</span>)
  #<span class="hljs-title">return</span>(<span class="hljs-params">$ctx.stash.result</span>)
#<span class="hljs-title">end</span>
</span>{
  <span class="hljs-string">"version"</span>: <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"method"</span>: <span class="hljs-string">"GET"</span>,
  <span class="hljs-string">"params"</span>: {
    <span class="hljs-string">"query"</span>: {
        <span class="hljs-string">"action"</span>: <span class="hljs-string">"query"</span>,
        <span class="hljs-string">"format"</span>: <span class="hljs-string">"json"</span>,
        <span class="hljs-string">"prop"</span>: <span class="hljs-string">"extracts"</span>,
        <span class="hljs-string">"exintro"</span>: <span class="hljs-string">"true"</span>,
        <span class="hljs-string">"titles"</span>: <span class="hljs-string">"${ctx.stash.title}"</span>,
        <span class="hljs-string">"explaintext"</span>: <span class="hljs-string">"true"</span>,
        <span class="hljs-string">"exsentences"</span>: <span class="hljs-number">10</span>
    }
  },
  <span class="hljs-string">"resourcePath"</span>: <span class="hljs-string">"/w/api.php"</span>
}
</code></pre>
<p>If we have a result in the stash, we call <code>#return</code> prematurely and continue to the next function. Otherwise, the function is executed: the API endpoint is called, followed by the response template, where we extract the data we need. Here it is:</p>
<pre><code class="lang-typescript">## fetchWikipedia.response.vtl
#<span class="hljs-keyword">if</span>($ctx.result.statusCode == <span class="hljs-number">200</span>)
    #set($body = $utils.parseJson($ctx.result.body))
    #foreach ($page <span class="hljs-keyword">in</span> $body.query.pages.entrySet())
        #<span class="hljs-keyword">if</span> ($page.value.title == $ctx.args.title)
            #<span class="hljs-keyword">return</span>($page.value.extract)
        #end
    #end
    $utils.error(<span class="hljs-string">"Article not found"</span>, <span class="hljs-string">"NotFound"</span>)
#<span class="hljs-keyword">else</span>
    $utils.error($ctx.result.statusCode, <span class="hljs-string">"Error"</span>)
#end
</code></pre>
<p>All right, we are almost there. There is just one last function to define.</p>
<p>Remember, if we return early within any function, the return directive will skip to the next function. So, here again, we need to check if we have a result in the stash and return early one more time. Otherwise, this is where we save the result from the previous function into DynamoDB. We also set an expiry timestamp for 30 days in the future:</p>
<pre><code class="lang-typescript">## saveToCache.request.vtl
## Bypass <span class="hljs-built_in">this</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">if</span> <span class="hljs-title">result</span> <span class="hljs-title">is</span> <span class="hljs-title">present</span> <span class="hljs-title">in</span> <span class="hljs-title">the</span> <span class="hljs-title">stash</span>
#<span class="hljs-title">if</span>(<span class="hljs-params">$ctx.stash.result</span>)
  #<span class="hljs-title">return</span>(<span class="hljs-params">$ctx.stash.result</span>)
#<span class="hljs-title">end</span>
#<span class="hljs-title">set</span>(<span class="hljs-params">$expires_at = $util.time.nowEpochSeconds() + 3600 * 24 * 30</span>)
</span>{
  <span class="hljs-string">"version"</span> : <span class="hljs-string">"2018-05-29"</span>,
  <span class="hljs-string">"operation"</span> : <span class="hljs-string">"PutItem"</span>,
  <span class="hljs-string">"key"</span> : {
    <span class="hljs-string">"title"</span> : $util.dynamodb.toStringJson(<span class="hljs-string">"${ctx.stash.title}"</span>)
  },
  <span class="hljs-string">"attributeValues"</span>: {
    <span class="hljs-string">"expires_at"</span>: $util.dynamodb.toNumberJson($expires_at),
    <span class="hljs-string">"content"</span>: $util.dynamodb.toStringJson($ctx.prev.result)
  }
}
</code></pre>
<p>And we return the result.</p>
<pre><code class="lang-typescript">## saveToCache.response.vtl
#<span class="hljs-keyword">if</span>($ctx.error)
  $util.error($ctx.error.message, $ctx.error.type)
#end
$utils.toJson($ctx.prev.result)
</code></pre>
<p>Finally, our "after" pipeline mapping template just forwards the result to the resolver:</p>
<pre><code class="lang-typescript">## Query.wikipedia.response.vtl
$util.toJson($ctx.result)
</code></pre>
<p>And we are done!</p>
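<p>To recap the mechanics before deploying, here is a plain-Python emulation of that early-return cascade (all names are hypothetical stand-ins for the VTL above; a non-None value from a request template plays the role of <code>#return</code>):</p>
<pre><code class="lang-python">def run_pipeline(stash, functions):
    result = None
    for request_template, body in functions:
        early = request_template(stash)
        if early is not None:
            result = early     # '#return': the data source is never called
            continue
        result = body(stash)
    return result

def make_pipeline(cache, fetch_remote):
    def fetch_from_cache(stash):
        hit = cache.get(stash["title"])
        if hit is not None:
            stash["result"] = hit   # remembered so later steps can bail out
        return hit

    def skip_if_hit(stash):
        return stash.get("result")

    def fetch_wikipedia(stash):
        stash["fresh"] = fetch_remote(stash["title"])
        return stash["fresh"]

    def save_to_cache(stash):
        cache[stash["title"]] = stash["fresh"]
        return stash["fresh"]

    return [
        (lambda stash: None, fetch_from_cache),  # step 1 always runs
        (skip_if_hit, fetch_wikipedia),          # skipped on a cache hit
        (skip_if_hit, save_to_cache),            # skipped on a cache hit
    ]
</code></pre>
<p>On a hit, steps 2 and 3 fall straight through with the cached value; on a miss, the remote fetch runs once and its result is written back to the cache.</p>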
<p>Let's deploy, run some queries and look at the X-Ray traces to confirm what we just built works as expected.</p>
<p>We will use the following query and execute it, <strong>twice</strong>:</p>
<pre><code class="lang-typescript">query {
  wikipedia(title: <span class="hljs-string">"Cat"</span>)
}
</code></pre>
<p>which generates the following traces.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1614525779551/LjRb65hZz.png" alt="Traces of the first execution" /></p>
<p><em>Traces of the first execution</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1614525781496/YTPcCV662.png" alt="Traces of the second execution" /></p>
<p><em>Traces of the second execution</em></p>
<p>As you can see, the first time, our resolver executed the three steps sequentially. The second time though, only the <em>GetItem</em> operation was executed. The HTTP request was not executed at all and our resolver execution time even went down from 165ms to 47ms. Isn't that nice?</p>
<p>If you look carefully, you will also notice two warning signs. These are the request mapping templates where we do an early return. I am not sure why X-Ray shows them as warnings, but no errors show up in the details or the CloudWatch logs, and everything works as expected.</p>
<p>Here you have it, a long-term cache layer for AppSync, using only out-of-the-box functionalities.</p>
<div class="hn-embed-widget" id="graphbolt"></div>]]></content:encoded></item></channel></rss>