The set operations from the Redis API are pretty simple to implement on DynamoDB, because having a partition and range key means we can just have the partition key as the set key, and the member itself as the range key. This gives us all the behaviour that we’d expect with a set. Repeatedly adding members will not duplicate them. The DynamoDB range is sorted, but we don’t care about the sorting behaviour for the set operations anyway.
We’re going to be using the operations and concepts referred to in the DynamoDB Foundation, so please read that first.
SADD key member
Remember that we’ve created our DynamoDB with a string partition key
pk and a string sort key
sk. The first operation is
SADD which a member into a set. We can use a
PutItem for this, with the set key on
pk and the member value itself on
sk — this will give us set behaviour where we don’t duplicate members and just overwrite them if they already exist.
# PutItem Item: pk: S: key sk: S: member ReturnValues: ALL_OLD
We don’t send a value — we don’t actually need or use one here. We also use the
ReturnValues: ALL_OLD to check if the member was already present before the
PutItem call. Redis counts the number of items actually inserted into the set in the call, so this allows us to do that.
SREM key member
Removing a member from a set is the similar to adding it. We call the
DeleteItem operation with the same
sk we used for the
# DeleteItem Item: pk: S: key sk: S: member ReturnValues: ALL_OLD
ReturnValues: ALL_OLD helps again with knowing if the member was actually present in the set before we deleted it.
Counting a set, also known as checking its cardinality, in our case is counting the number of items inside a particular partition key
pk. We do this with a modified form of the
Query API operation.
# Query KeyConditionExpression: pk = :pk ExpressionAttributeValues: pk: S: key Select: COUNT
This query runs over all the items inside partition
pk, which represents all the members of our set, but instead of returning them it returns the count of the items instead.
If there are too many items to count in a single query, the response will contain a
LastEvaluatedKey, which can be passed back into the
Query operation again as
ExclusiveStartKey. This allows a continuation of the count until a
LastEvaluatedKey is no longer being returned, at which point we know the entire set has been counted.
The simplest way to finish all query operations that return a cursor like the
LastEvaluatedKey would be a
do...while loop with the presence of the
LastEvaluatedKey as the condition for the
do portion of the loop ensures that it runs at least once, and then paginates until there are no more pages available.
SISMEMBER key member
Checking whether a member is part of a set is as simple as trying to get it and seeing what we get. We can call
GetItem with the same arguments as passed into the
PutItem when adding the member.
# GetItem Item: pk: S: key sk: S: member
If the member isn’t present in the set, this will return an empty result, which allows us to return false for the operation.
# Query KeyConditionExpression: pk = :pk ExpressionAttributeValues: pk: S: key
This is will return the first batch of members, along with a
LastEvaluatedKey if there are any more. If this key is present, we can make another call to query and pass it in as
ExclusiveStartKey. We keep doing this until we no longer see the
LastEvaluatedKey, which tells us that all the members have been fetched. It makes sense to run
Query in a
do...while loop similar to
These are really just application level computation, from the DynamoDB point of view. We’ll need to load each key we want to work with into application memory using
SMEMBERS and perform the intersection, union or diff operations there.
SDIFF, these are also application level executions of those operations followed by
SADD for the resulting members. It is possible to optimise this by using BatchWriteItem but this won’t tell you if the members you’re inserting already exist or not, so it may or may not work for you.
SRANDMEMBER and SPOP
SRANDMEMBER fetches a random member of the set — there’s no comparable operation in DynamoDB, so depending on your application you might want to just fetch the first or last member using a
Query. It is possible to use a random number as well, stored in a secondary index. Or to add an offset to the query that skips the first random
N members. This all depends on your application, though.
SPOP is pretty much the same as
SRANDMEMBER, except that it also deletes the member from the set, for which we use
SMOVE source destination member
This is an interesting and mildly complex operation. This is because the operation is transactional — the member must be removed from the source set and put into the destination set in an single atomic operation, because we don’t want a situation where our first
SREM-like operation on the source set succeeds but the second
SADD-like operation on the destination set fails. We also need to do this operation only if the members actually exists at the source set — if it doesn’t we need to very deliberately do nothing.
To pull this off, we’ll use the
TransactWriteItems operation. This operation takes multiple write operations to perform atomically, along with a condition check. We can then express our operation as
- check if member exists at source — this is a
- remove from source — this is a
- add to destination — this is a
Here’s the API structure we can use:
# TransactWriteItems TransactItems: Delete: ConditionExpression: attribute_exists(pk) Key: pk: S: sourceKey sk: S: member Put: Item: pk: S: destinationKey sk: S: member
ConditionExpression: attribute_exists(pk) is the check to make sure the member exists on the source set. In this case it doesn’t actually matter which attribute we check for existence on, so the
pk works fine.