Implementing Redis Sets on DynamoDB

How to implement the Redis set operations on DynamoDB.

REDIMO

The set operations from the Redis API are pretty simple to implement on DynamoDB, because having a partition and range key means we can just have the partition key as the set key, and the member itself as the range key. This gives us all the behaviour that we’d expect with a set. Repeatedly adding members will not duplicate them. The DynamoDB range is sorted, but we don’t care about the sorting behaviour for the set operations anyway.

We’re going to be using the operations and concepts referred to in the DynamoDB Foundation, so please read that first.

SADD key member

Remember that we’ve created our DynamoDB with a string partition key pk and a string sort key sk. The first operation is SADD which a member into a set. We can use a PutItem for this, with the set key on pk and the member value itself on sk — this will give us set behaviour where we don’t duplicate members and just overwrite them if they already exist.

# PutItem
Item:
  pk:
    S: key
  sk:
    S: member
ReturnValues: ALL_OLD

We don’t send a value — we don’t actually need or use one here. We also use the ReturnValues: ALL_OLD to check if the member was already present before the PutItem call. Redis counts the number of items actually inserted into the set in the call, so this allows us to do that.

SREM key member

Removing a member from a set is the similar to adding it. We call the DeleteItem operation with the same pk and sk we used for the PutItem.

# DeleteItem
Item:
  pk:
    S: key
  sk:
    S: member
ReturnValues: ALL_OLD

The ReturnValues: ALL_OLD helps again with knowing if the member was actually present in the set before we deleted it.

SCARD key

Counting a set, also known as checking its cardinality, in our case is counting the number of items inside a particular partition key pk. We do this with a modified form of the Query API operation.

# Query
KeyConditionExpression: pk = :pk
ExpressionAttributeValues:
  pk:
    S: key
Select: COUNT

This query runs over all the items inside partition pk, which represents all the members of our set, but instead of returning them it returns the count of the items instead.

If there are too many items to count in a single query, the response will contain a LastEvaluatedKey, which can be passed back into the Query operation again as ExclusiveStartKey. This allows a continuation of the count until a LastEvaluatedKey is no longer being returned, at which point we know the entire set has been counted.

The simplest way to finish all query operations that return a cursor like the LastEvaluatedKey would be a do...while loop with the presence of the LastEvaluatedKey as the condition for the while. The do portion of the loop ensures that it runs at least once, and then paginates until there are no more pages available.

SISMEMBER key member

Checking whether a member is part of a set is as simple as trying to get it and seeing what we get. We can call GetItem with the same arguments as passed into the PutItem when adding the member.

# GetItem
Item:
  pk:
    S: key
  sk:
    S: member

If the member isn’t present in the set, this will return an empty result, which allows us to return false for the operation.

SMEMBERS key

Fetching all the members of a set is very similar to counting them, which is what we don in SCARD. We’ll use the Query API operation:

# Query
KeyConditionExpression: pk = :pk
ExpressionAttributeValues:
  pk:
    S: key

This is will return the first batch of members, along with a LastEvaluatedKey if there are any more. If this key is present, we can make another call to query and pass it in as ExclusiveStartKey. We keep doing this until we no longer see the LastEvaluatedKey, which tells us that all the members have been fetched. It makes sense to run Query in a do...while loop similar to SCARD.

SINTER, SUNION, SDIFF

These are really just application level computation, from the DynamoDB point of view. We’ll need to load each key we want to work with into application memory using SMEMBERS and perform the intersection, union or diff operations there.

SINTERSTORE, SUNIONSTORE, SDIFFSTORE

Similar to SINTER, SUNION and SDIFF, these are also application level executions of those operations followed by SADD for the resulting members. It is possible to optimise this by using BatchWriteItem but this won’t tell you if the members you’re inserting already exist or not, so it may or may not work for you.

SRANDMEMBER and SPOP

The SRANDMEMBER fetches a random member of the set — there’s no comparable operation in DynamoDB, so depending on your application you might want to just fetch the first or last member using a Query. It is possible to use a random number as well, stored in a secondary index. Or to add an offset to the query that skips the first random N members. This all depends on your application, though.

SPOP is pretty much the same as SRANDMEMBER, except that it also deletes the member from the set, for which we use SREM.

SMOVE source destination member

This is an interesting and mildly complex operation. This is because the operation is transactional — the member must be removed from the source set and put into the destination set in an single atomic operation, because we don’t want a situation where our first SREM-like operation on the source set succeeds but the second SADD-like operation on the destination set fails. We also need to do this operation only if the members actually exists at the source set — if it doesn’t we need to very deliberately do nothing.

To pull this off, we’ll use the TransactWriteItems operation. This operation takes multiple write operations to perform atomically, along with a condition check. We can then express our operation as

  • check if member exists at source — this is a Condition
  • remove from source — this is a Delete
  • add to destination — this is a Put

Here’s the API structure we can use:

# TransactWriteItems
TransactItems:
  Delete:
    ConditionExpression: attribute_exists(pk)
    Key:
      pk:
        S: sourceKey
      sk:
        S: member
  Put:
    Item:
      pk:
        S: destinationKey
      sk:
        S: member

The ConditionExpression: attribute_exists(pk) is the check to make sure the member exists on the source set. In this case it doesn’t actually matter which attribute we check for existence on, so the pk works fine.