Understanding the DynamoDB Sort Key Order
Learn what the UTF-8 order is and how to use it to your advantage
If you have been working with DynamoDB, you are probably quite familiar with the notion of Partition Keys and Sort Keys (aka PK and SK). You also know that Sort Keys are... well, sorted in ascending order by default. If your SK is of type Number, the items will be sorted in numeric order (1, 3, 10, 50, 400), while if it's of type String, they are sorted in "order of UTF-8 bytes". But what does that even mean and how does it affect the order of the items? Let's find out.
What it is
As per Wikipedia,
UTF-8 is a variable-width character encoding used for electronic communication [...] UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units.
What it means is that every character is assigned a code point, which UTF-8 encodes as a sequence of one to four bytes, each with a numerical value.
DynamoDB compares those byte values, one byte at a time, to determine the order of your Sort Key.
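To make this concrete, here is a minimal local sketch in Python that sorts strings by their UTF-8 bytes. It only simulates the ordering and does not talk to DynamoDB. The helper name `utf8_sort` and the sample values are made up for illustration.

```python
def utf8_sort(values):
    """Sort strings by their UTF-8 encoded bytes (smaller byte values first)."""
    return sorted(values, key=lambda s: s.encode("utf-8"))

# A character is encoded as one to four bytes.
print("A".encode("utf-8"))  # b'A' (one byte, value 65)
print("é".encode("utf-8"))  # b'\xc3\xa9' (two bytes)

# Hypothetical sort key values, compared byte by byte.
keys = ["zebra", "éclair", "Zebra", "10", "9"]
print(utf8_sort(keys))
# ['10', '9', 'Zebra', 'zebra', 'éclair']
```

Note that the string "10" sorts before "9" because the comparison looks at the first byte ("1" is smaller than "9"). Only a Sort Key of type Number gets numeric ordering.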
If you want to know what character comes after which, a good start is to remember the order of the most common characters:
- numbers
- uppercase letters
- lowercase letters
Notice that letters are "grouped" by case (first the capital letters, and after that lowercase), which means that Zoey will come before alligator. This is really important to know if you want to avoid surprises later.
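The grouping above is easy to check locally. The following sketch prints the byte values at the boundaries of each group and sorts a few made-up names the same way a String Sort Key would be ordered. The sample names are illustrative only.

```python
# Byte values at the edges of the three common groups.
for ch in ["0", "9", "A", "Z", "a", "z"]:
    print(ch, ord(ch))
# 0 48, 9 57, A 65, Z 90, a 97, z 122

# Hypothetical values: digits first, then uppercase, then lowercase.
names = ["alligator", "Zoey", "42", "Bob", "zebra", "7up"]
ordered = sorted(names, key=lambda s: s.encode("utf-8"))
print(ordered)
# ['42', '7up', 'Bob', 'Zoey', 'alligator', 'zebra']
```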
For a complete list of UTF-8 characters, including special ones, sorted by byte order, refer to this page.
How to use it to your advantage
Once you know how strings are being sorted, you can use that knowledge to your advantage. A very common practice with DynamoDB and single table design is to pre-join data by placing them into the same partition.
Depending on your access pattern, you might either want your parent item to be at the beginning, or at the end of the partition.
For example, if you have Orders (SK prefixed with ORDER#) and Order Items (SK prefixed with ITEM#), you will perhaps want the ORDER# item at the beginning of the partition, and the ITEM# items sorted by number, in ascending order after it. However, O comes after I, which means that ORDER# will be placed at the end of the partition.
You could read the partition backwards (by setting ScanIndexForward to false on the query), sure, but then your items would be sorted in reverse order, breaking the access pattern.
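The problem can be reproduced with a small local simulation. The function `partition_order` below is a hypothetical helper that mimics the order in which a query would return sort keys. The key values (order number and item numbers) are invented for the example.

```python
def partition_order(sort_keys, scan_index_forward=True):
    """Return sort keys in the order a query would read them (simulated locally)."""
    ordered = sorted(sort_keys, key=lambda sk: sk.encode("utf-8"))
    return ordered if scan_index_forward else ordered[::-1]

# Hypothetical sort keys living in the same partition.
sort_keys = ["ORDER#1001", "ITEM#001", "ITEM#002", "ITEM#003"]

print(partition_order(sort_keys))
# ['ITEM#001', 'ITEM#002', 'ITEM#003', 'ORDER#1001']

print(partition_order(sort_keys, scan_index_forward=False))
# ['ORDER#1001', 'ITEM#003', 'ITEM#002', 'ITEM#001']
```

Reading forward puts the order last, and reading backward puts it first but reverses the items, so neither direction gives the desired layout.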
How to fix that?
Use the UTF-8 sorting mechanism to your advantage! You need ORDER# to start with a character that comes before I to make sure it comes before ITEM#. For that, you can use any character that comes before I. Any letter from A to H would work, but it might not be user friendly (for debugging and inspecting the data later) and could change the meaning of your prefix. Instead, it is more common to use special characters that come before A: for example #, $ or %. Let's use $ and rename ORDER# to $ORDER#.
Now the Order item is at the top of the partition and all items are sorted as expected. 🎉
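Here is a quick local check of that fix, again as a simulation rather than a real DynamoDB call. It verifies that #, $ and % have smaller byte values than A, then sorts a set of hypothetical sort keys using the $ORDER# prefix.

```python
def utf8_key(value):
    """Sort key function: compare strings by their UTF-8 bytes."""
    return value.encode("utf-8")

# The candidate prefix characters all sort before "A" (65).
for ch in ["#", "$", "%"]:
    print(ch, ord(ch), ord(ch) < ord("A"))
# # 35 True
# $ 36 True
# % 37 True

# Hypothetical sort keys after renaming ORDER# to $ORDER#.
sort_keys = ["ITEM#002", "$ORDER#1001", "ITEM#001", "ITEM#003"]
ordered = sorted(sort_keys, key=utf8_key)
print(ordered)
# ['$ORDER#1001', 'ITEM#001', 'ITEM#002', 'ITEM#003']
```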
Note that you can use the same trick to sort items in reverse order if necessary.
Example:
In the above example, you might want to scan the index in reverse order and get the latest vouchers at the top, in descending order. You can force the USER# item to be at the end of the partition by prefixing it with a character that comes later in the UTF-8 order. | or ~ are good examples.
Now, ~USER# is at the end of the partition and you can scan the index backwards.
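As a rough sketch of the reverse read, the snippet below simulates the partition locally and builds the parameters for a descending query. Several names here are assumptions and not from the example above: the VOUCHER# prefix with a date suffix, the user id, the table name, and the partition key attribute being called PK. The parameter names follow the low-level DynamoDB Query API, but nothing is sent to AWS.

```python
def build_reverse_query(table_name, pk_value, limit=None):
    """Build Query parameters that read a partition in descending sort key order."""
    params = {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :pk",  # assumes the attribute is named PK
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        "ScanIndexForward": False,  # read from the end of the partition
    }
    if limit is not None:
        params["Limit"] = limit
    return params

# Hypothetical sort keys: vouchers by date, plus the ~USER# item.
sort_keys = ["VOUCHER#2023-01-05", "~USER#42", "VOUCHER#2023-03-17", "VOUCHER#2023-02-11"]
descending = sorted(sort_keys, key=lambda sk: sk.encode("utf-8"), reverse=True)
print(descending)
# ['~USER#42', 'VOUCHER#2023-03-17', 'VOUCHER#2023-02-11', 'VOUCHER#2023-01-05']

print(build_reverse_query("my-table", "USER#42", limit=10))
```

With boto3 installed and credentials configured, the resulting dictionary could be passed to a DynamoDB client as `client.query(**params)`.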
Conclusion
In this article, you learned what the UTF-8 byte order is, and how you can use it to your advantage to force the order of items in your DynamoDB partitions.
If you would like to read more content like this, follow me on Twitter and subscribe to my newsletter on Hashnode.
Photo credits: Markus Spiske on unsplash