If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type Number, the items will be sorted in numeric order (1, 3, 10, 50, 400), while if it's of type String, they are sorted in "order of UTF-8 bytes". But what does that even mean and how does it affect the order of the items? Let's find out.
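To make the difference concrete, here is a minimal sketch in plain Python (no DynamoDB involved) that imitates both behaviours. It assumes that comparing the UTF-8 encoded bytes of each string is a fair stand-in for how String sort keys are ordered; the sample values are only for illustration.

```python
# Number sort keys compare by numeric value.
numeric_keys = [50, 1, 400, 10, 3]
sorted_numbers = sorted(numeric_keys)
print(sorted_numbers)  # [1, 3, 10, 50, 400]

# String sort keys compare byte by byte, so "10" sorts before "3".
string_keys = ["50", "1", "400", "10", "3"]
sorted_strings = sorted(string_keys, key=lambda s: s.encode("utf-8"))
print(sorted_strings)  # ['1', '10', '3', '400', '50']
```

The same digits end up in a different order as soon as they are stored as strings.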
What it is
As per Wikipedia,
UTF-8 is a variable-width character encoding used for electronic communication [...] UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units.
What it means is that each and every character is assigned a code point, which is a numerical value, and UTF-8 encodes that code point as a sequence of one to four bytes.
It's the numerical value of those bytes that DynamoDB uses to determine the order of your Sort Key.
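If you want to see those values for yourself, the following sketch prints the code point and the UTF-8 bytes of a few characters. The helper name `describe` is made up for this example; `ord` and `str.encode` are standard Python.

```python
def describe(ch: str) -> tuple:
    """Return (code point, list of UTF-8 byte values) for a single character."""
    return ord(ch), list(ch.encode("utf-8"))

print(describe("A"))  # (65, [65])
print(describe("a"))  # (97, [97])
print(describe("é"))  # (233, [195, 169])

# Characters outside the ASCII range need more than one byte.
for ch in ["€", "🎉"]:
    code_point, utf8_bytes = describe(ch)
    print(ch, code_point, utf8_bytes)
```

ASCII characters fit in a single byte whose value equals the code point, which is why the ordering of plain letters, digits and punctuation is easy to reason about.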
If you want to know what character comes after which, a good start is to remember the order of the most common characters:
- uppercase letters
- lowercase letters
Notice that letters are "grouped" by case (first the capital letters, and after that lowercase), which means that Zoey will come before alligator. This is really important to know if you want to avoid surprises later.
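A quick way to check this without touching a table is to sort a few strings by their UTF-8 bytes in Python. This is only a local approximation of the ordering described above, and the helper `utf8_sort` and the sample names are invented for the example.

```python
def utf8_sort(keys):
    """Sort strings by their UTF-8 byte representation, in ascending order."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))

names = ["alligator", "Zoey", "42", "zebra", "Alice"]
sorted_names = utf8_sort(names)
print(sorted_names)  # ['42', 'Alice', 'Zoey', 'alligator', 'zebra']
```

Digits come first, then every uppercase letter, then every lowercase letter.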
For a complete list of UTF-8 characters, including special ones, sorted by their bytes order, refer to this page.
How to use it to your advantage
Once you know how strings are being sorted, you can use that knowledge to your advantage. A very common practice with DynamoDB and single-table design is to pre-join data by placing related items in the same partition.
Depending on your access pattern, you might want your parent item to be either at the beginning or at the end of the partition.
For example, if you have Orders (SK prefixed with ORDER#) and Order Items (SK prefixed with ITEM#), you will probably want the ORDER# item at the beginning of the partition, and the ITEM# items sorted by number, in ascending order, after it. However, O comes after I, which means that the ORDER# item will be placed at the end of the partition.
You could scan the index backwards, sure, but then your items would be sorted in reversed order, breaking the access pattern.
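The problem can be reproduced locally with a small simulation. The sort key values below (an order number and zero-padded item numbers) are hypothetical, and sorting by UTF-8 bytes in Python is used as a stand-in for the order in which a partition returns its items.

```python
def utf8_sort(keys):
    """Sort strings by their UTF-8 bytes, ascending."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))

# Hypothetical sort keys living in one partition.
sort_keys = ["ITEM#002", "ORDER#1001", "ITEM#001", "ITEM#003"]

forward = utf8_sort(sort_keys)
print(forward)   # ['ITEM#001', 'ITEM#002', 'ITEM#003', 'ORDER#1001']

# Reading the partition backwards puts the order first,
# but the items are now in descending order.
backward = list(reversed(forward))
print(backward)  # ['ORDER#1001', 'ITEM#003', 'ITEM#002', 'ITEM#001']
```

Neither direction gives "order first, then items in ascending order".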
How to fix that?
Use the UTF-8 sorting mechanism to your advantage! You need the ORDER# sort key to start with a character that comes before I to make sure it comes before ITEM#. Any letter from A to H would work, but it might not be user friendly (for debugging and inspecting the data later) and could change the meaning of your prefix. Instead, it is more common to use special characters that come before A: for example %. Let's use $ and rename ORDER# to $ORDER#.
The Order item is now at the top of the partition and all items are sorted as expected. 🎉
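Here is the same kind of local simulation with the order's sort key prefixed with $. As before, the key values are hypothetical and the byte-wise sort in Python only approximates what the partition would return.

```python
def utf8_sort(keys):
    """Sort strings by their UTF-8 bytes, ascending."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))

# "$" is byte 0x24, which is lower than "I" (0x49).
assert "$".encode("utf-8") < "I".encode("utf-8")

# Hypothetical sort keys, with the order now prefixed with "$".
sort_keys = ["ITEM#002", "$ORDER#1001", "ITEM#001", "ITEM#003"]

fixed = utf8_sort(sort_keys)
print(fixed)  # ['$ORDER#1001', 'ITEM#001', 'ITEM#002', 'ITEM#003']
```

A single forward read now yields the order followed by its items in ascending order.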
Note that you can use the same trick in order to sort items in reversed order if necessary.
For example, in a partition that holds a USER# item and its vouchers, you might want to scan the index in reverse order and get the latest vouchers at the top, in descending order. You can force the USER# item to be at the end of the partition by prefixing it with a character that is higher in the UTF-8 ranking. ~ is a good example.
~USER# is at the end of the partition and you can scan the index backwards.
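The sketch below simulates that layout and also shows what the request parameters for a backwards read could look like. Several things here are assumptions made for illustration: the VOUCHER# prefix with an ISO date, the attribute name PK, the table name and the partition key value. The parameter names follow the DynamoDB Query operation, where ScanIndexForward set to false returns items in descending sort key order; the dictionary is only built, not sent.

```python
def utf8_sort(keys):
    """Sort strings by their UTF-8 bytes, ascending."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))

# Hypothetical sort keys: one user item and three vouchers.
sort_keys = [
    "VOUCHER#2023-02-20",
    "~USER#",
    "VOUCHER#2023-01-15",
    "VOUCHER#2023-03-02",
]

# "~" (0x7E) sorts after every letter, so the user item is last...
forward_scan = utf8_sort(sort_keys)
# ...and therefore first when the partition is read backwards.
reverse_scan = list(reversed(forward_scan))
print(reverse_scan)
# ['~USER#', 'VOUCHER#2023-03-02', 'VOUCHER#2023-02-20', 'VOUCHER#2023-01-15']

def build_reverse_query(table_name: str, pk_value: str) -> dict:
    """Build Query parameters that read one partition in descending order.

    Assumes the partition key attribute is named "PK" (hypothetical).
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        "ScanIndexForward": False,  # descending sort key order
    }

params = build_reverse_query("my-table", "USER#123")
```

With boto3, a dictionary like `params` could be passed to the low-level client as `client.query(**params)`.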
In this article, you learned what the UTF-8 byte order is, and how you can use it to your advantage to force the order of items in your DynamoDB partitions.
Photo credits: Markus Spiske on unsplash