Understanding the DynamoDB Sort Key Order

Learn what the UTF-8 order is and how to use it to your advantage

If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type Number the items will be sorted in numeric order (1, 3, 10, 50, 400), while if it's of type String they are sorted in "order of UTF-8 bytes". But what does that even mean and how does it affect the order of the items? Let's find out.
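To make the difference between the two key types concrete, here is a minimal sketch in plain Python (no DynamoDB involved). It assumes that sorting strings by their encoded bytes is a fair stand-in for what DynamoDB does with String sort keys; the sample values are the ones from the paragraph above.

```python
# The same values, once as numbers and once as strings.
numbers = [400, 3, 50, 1, 10]

# Number sort keys: plain numeric order.
as_numbers = sorted(numbers)

# String sort keys: compared by their UTF-8 bytes, left to right.
as_strings = sorted((str(n) for n in numbers), key=lambda s: s.encode("utf-8"))

print(as_numbers)  # [1, 3, 10, 50, 400]
print(as_strings)  # ['1', '10', '3', '400', '50']
```

Note how "10" lands before "3" once the values are strings: the first byte decides, and "1" is lower than "3".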

What it is

As per Wikipedia,

UTF-8 is a variable-width character encoding used for electronic communication [...] UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units.

What this means is that every character is assigned a code point (a numerical value), which UTF-8 encodes as a sequence of one to four bytes.

DynamoDB compares those byte values, one byte at a time from left to right, to determine the order of your Sort Key.
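If you want to look at those values yourself, the following sketch prints the code point and the UTF-8 bytes of a few characters. The helper name `utf8_bytes` is made up for this illustration.

```python
def utf8_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text` as a list of integers."""
    return list(text.encode("utf-8"))


# "\u00e9" is the letter e with an acute accent; it needs two bytes in UTF-8.
for ch in ["#", "$", "0", "9", "A", "Z", "a", "z", "~", "\u00e9"]:
    print(repr(ch), "code point:", ord(ch), "bytes:", utf8_bytes(ch))

# Byte strings compare byte by byte, which is the comparison described above.
print(b"Z" < b"a")  # True
```

For plain ASCII characters the code point and the single byte are the same number, so in practice you can often reason with an ASCII table.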

If you want to know what character comes after which, a good start is to remember the order of the most common characters:

  1. digits (0 to 9)
  2. uppercase letters
  3. lowercase letters

Notice that letters are "grouped" by case (first the capital letters, and after that lowercase), which means that Zoey will come before alligator. This is really important to know if you want to avoid surprises later.
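Here is a quick way to check this grouping locally. It is only a sketch: the names are sample values, and sorting by encoded bytes in Python is used as an approximation of DynamoDB's String ordering.

```python
names = ["alligator", "Zoey", "42nd Street", "apple", "Apple"]

# Sort by UTF-8 bytes: digits first, then uppercase, then lowercase.
ordered = sorted(names, key=lambda s: s.encode("utf-8"))

print(ordered)
# ['42nd Street', 'Apple', 'Zoey', 'alligator', 'apple']
```

If you need a case-insensitive order, one common workaround is to store a lowercased copy of the value in the Sort Key.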

For a complete list of UTF-8 characters, including special ones, sorted by their bytes order, refer to this page.

How to use it to your advantage

Once you know how strings are being sorted, you can use that knowledge to your advantage. A very common practice with DynamoDB and single table design is to pre-join data by placing them into the same partition.

Depending on your access pattern, you might either want your parent item to be at the beginning, or at the end of the partition.

For example, if you have Orders (SK prefixed with ORDER#) and Order Items (SK prefixed with ITEM#), you will perhaps want the ORDER# item at the beginning of the partition, and the ITEM# items sorted by number, in ascending order after it. However, O comes after I, which means that ORDER# will be placed at the end of the partition.
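The problem can be reproduced with a few sample sort keys. The order ID and the zero-padded item numbers below are hypothetical values chosen for this sketch, not taken from a real table.

```python
# Sort keys of one partition, in arbitrary insertion order.
sort_keys = ["ORDER#1001", "ITEM#003", "ITEM#001", "ITEM#002"]

# Approximate DynamoDB's String ordering by comparing UTF-8 bytes.
partition = sorted(sort_keys, key=lambda sk: sk.encode("utf-8"))

print(partition)
# ['ITEM#001', 'ITEM#002', 'ITEM#003', 'ORDER#1001']
```

The parent ORDER# item ends up last, because the byte for I (73) is lower than the byte for O (79).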

image.png

You could scan the index backwards (a Query with ScanIndexForward set to false), sure, but then all your items would come back in reverse order, breaking the access pattern.
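To see why reading backwards does not help, this sketch reverses the simulated partition. The keys are the same kind of hypothetical sample values (made-up order ID, zero-padded item numbers), and reversing a sorted list stands in for a backwards read.

```python
sort_keys = ["ORDER#1001", "ITEM#003", "ITEM#001", "ITEM#002"]

# Ascending byte order, then read from the end, like a backwards scan.
ascending = sorted(sort_keys, key=lambda sk: sk.encode("utf-8"))
backwards = list(reversed(ascending))

print(backwards)
# ['ORDER#1001', 'ITEM#003', 'ITEM#002', 'ITEM#001']
```

The order item is now first, but the items come back from highest to lowest, which is not what the access pattern asks for.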

How to fix that?

Use the UTF-8 sorting mechanism to your advantage! You need ORDER# to start with a character that comes before I to make sure it comes before ITEM#. Any letter from A to H would work, but it might not be user friendly (for debugging and inspecting the data later) and could change the meaning of your prefix. Instead, it is more common to use a special character that comes before A: for example #, $ or %. Let's use $ and rename ORDER# to $ORDER#.

image.png

Now the Order item is at the top of the partition and all items are sorted as expected. 🎉
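The effect of the prefix can be checked with the same kind of simulation. Again, this is a sketch with hypothetical key values, and byte-wise sorting in Python is assumed to mirror DynamoDB's String ordering.

```python
# "#", "$" and "%" all have lower byte values than "A" (and than the digits).
candidates = [c for c in "#$%" if c.encode("utf-8") < b"0" < b"A"]
print(candidates)  # ['#', '$', '%']

sort_keys = ["$ORDER#1001", "ITEM#003", "ITEM#001", "ITEM#002"]
partition = sorted(sort_keys, key=lambda sk: sk.encode("utf-8"))

print(partition)
# ['$ORDER#1001', 'ITEM#001', 'ITEM#002', 'ITEM#003']
```

Because these characters also sort before the digits, the prefixed item stays first even if other sort keys in the partition start with a number.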

Note that you can use the same trick to sort items in reverse order if necessary.

Example:

image.png

In the above example, you might want to scan the index in reverse order and get the latest vouchers at the top, in descending order. You can force the USER# item to be at the end of the partition by prefixing it with a character that sorts after all letters in UTF-8 byte order. | and ~ are good examples.

image.png

Now, ~USER# is at the end of the partition and you can scan the index backwards.
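As a rough sketch, here is the same idea in Python. The voucher keys, the user ID, the table name and the `PK` attribute name are all assumptions made for this example. The `build_reverse_query` helper is hypothetical too: it only builds the parameters you could pass to the boto3 client's `query` call, where `ScanIndexForward=False` asks DynamoDB to read the partition backwards.

```python
sort_keys = [
    "VOUCHER#2023-02-10",
    "~USER#42",
    "VOUCHER#2023-03-05",
    "VOUCHER#2023-01-15",
]

# Ascending byte order puts ~USER# last, because "~" (126) is above "V" (86).
ascending = sorted(sort_keys, key=lambda sk: sk.encode("utf-8"))

# Reading backwards: the user first, then the vouchers from newest to oldest.
backwards = list(reversed(ascending))
print(backwards)
# ['~USER#42', 'VOUCHER#2023-03-05', 'VOUCHER#2023-02-10', 'VOUCHER#2023-01-15']


def build_reverse_query(table_name: str, pk: str) -> dict:
    """Build Query parameters that read one partition in descending order."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "PK"},  # assumed attribute name
        "ExpressionAttributeValues": {":pk": {"S": pk}},
        "ScanIndexForward": False,  # descending sort key order
    }


params = build_reverse_query("my-table", "USER#42")
# With boto3 installed and credentials configured, these could be used as:
# boto3.client("dynamodb").query(**params)
```

Combining such a query with a `Limit` would return the user item plus the most recent vouchers in a single request.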

Conclusion

In this article, you learned what the UTF-8 byte order is, and how you can use it to your advantage to control the order of items in your DynamoDB partitions.

If you would like to read more content like this, follow me on Twitter and subscribe to my newsletter on Hashnode.


Photo credits: Markus Spiske on unsplash

Support Benoît Bouré by becoming a sponsor. Any amount is appreciated!