
Complete example of implementing microservice Transactional Messaging patterns - Idempotent Consumer, Transactional Outbox, Transaction Log Tailing, and Unit of Work - using AWS DynamoDB Streams, SNS, SQS, and Lambda.


Transactional Messaging Patterns - with AWS DynamoDB Streams, SNS, SQS, and Lambda

Overview

The project consists of multiple modules.

ℹ️

See the README files inside each module for more detailed documentation.

Motivation

In a software architecture that uses asynchronous messaging for inter-service communication, additional measures need to be taken to ensure that messages are published reliably.

Asynchronous messaging has several inherent properties that need to be taken into account when designing such a system. These properties require additional countermeasures to ensure that the system behaves as expected.

Property: At-least-once message delivery

When a message is published to a message broker, it’s guaranteed to be delivered to a consumer, but the same message can be delivered more than once.

At-least-once delivery occurs due to how a message broker replicates the published message internally to ensure high availability, fault tolerance, and durability. Read more about the internals of why at-least-once delivery occurs in AWS Simple Queue Service.

Consequences

For applications that use asynchronous messaging, this means that an action can be performed more than once, which can lead to data inconsistency and a bad user experience. For example, when a user creates an order, the Order Service publishes an OrderCreated message. The message can be delivered more than once to the Notifications Service that sends emails to the user, which can result in the user receiving multiple identical emails.

Countermeasures

To compensate for at-least-once delivery, message consumers need to be able to recognize and handle duplicate messages. This can be achieved by implementing the Idempotent Consumer pattern. A consumer is idempotent when the same call produces the same result, or in other words, when the same message is processed more than once, the result is the same as if the message was processed only once.

The responsibility of handling duplicate messages always lies on the consumer side, because the publisher has no way of ensuring that a message is published and delivered exactly once.

Implementation example

There are many ways of implementing the Idempotent Consumer pattern. A common approach is to use an Inbox or ProcessedMessages table in a database to store the unique MessageId of already processed messages. When a message is received, the consumer checks if its MessageId is already in the Inbox table. If it is, the message is ignored as a duplicate.

However, some message consumers might already be idempotent by design and not require any additional measures.
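For illustration, a minimal sketch of the Inbox check with DynamoDB and boto3 is shown below. The table and attribute names ("inbox", MessageId) are assumptions for the example, not the project’s actual schema, and in the full implementation the Inbox write is committed atomically with the rest of the business transaction via the Unit of Work described later.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")


def process_message_once(message_id: str, handler, message: dict) -> None:
    try:
        # The conditional put fails if this MessageId was already recorded,
        # so the write doubles as the duplicate check.
        dynamodb.put_item(
            TableName="inbox",  # assumed table name
            Item={"MessageId": {"S": message_id}},
            ConditionExpression="attribute_not_exists(MessageId)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # Duplicate delivery - ignore the message.
        raise
    handler(message)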

Idempotent Consumer pattern with DynamoDB

Property: Domain inconsistency in event-driven architecture

When actions in a system are driven by asynchronous messages - events or commands - a message usually triggers a consumer to change its internal state with a database update and then to publish a new message to notify other consumers about the change.

For example, when a user places an order, the Order Service creates the order entry in the database and then publishes an OrderCreated message to notify surrounding services about the new order, so they can proceed with subsequent actions like shipping, credit checks, invoicing, and notifications.

Consequences

In this process, a business transaction spans two independent infrastructure components - a database and a message broker. This introduces a risk of partial failure, where the database update succeeds but the message publishing fails. For example, when a user places an order, the Order Service creates an order entry in the database, but the OrderCreated message is not published due to an error in the message broker. This can lead to a situation where the order is created in the Order Service database, but surrounding services are not aware of it, and the order is not processed further.

Even though message brokers are highly available, message publishing can still fail for various reasons, for example, due to a network failure or the message broker being temporarily overloaded. In this example, the business transaction lacks atomicity - the property that ensures that either all operations - the database update and the message publishing - succeed, or none of them do.

Countermeasures

To ensure atomicity of the business transaction, implement transactional messaging - publish messages as part of a database transaction.

Messages that need to be published are saved to the database as part of the database transaction, and a separate process - a message relay - reads the messages from the database and publishes them to the message broker. This way, the database transaction ensures that the state change and the outgoing message are persisted atomically, and the message relay guarantees that the message is eventually published.

Implementation example

There are multiple ways of implementing transactional messaging - with a combination of Transactional Outbox, Polling Publisher, and Transaction Log Tailing patterns.

This project demonstrates how to implement the message publisher using the Transactional Outbox and Transaction Log Tailing patterns with AWS DynamoDB Streams.

In the Transactional Outbox pattern, messages are saved to a DynamoDB Outbox table as part of a DynamoDB transaction. Then, a separate process - the message relay - reads or receives the messages from the Outbox table and publishes them to a message broker. The message relay can be implemented with the Transaction Log Tailing pattern by using DynamoDB Streams. DynamoDB Streams capture data changes to the Outbox table and publish events to an AWS Lambda function. When a new message is saved to the Outbox table, DynamoDB Streams emit an INSERT event to the Lambda function. The Lambda function receives the inserted message and reliably publishes it to the message broker.
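The sketch below outlines what such a message relay Lambda could look like with boto3. The Outbox item attributes (MessageId, Topic, Payload) and the SNS topic configuration are assumptions for the example and don’t reflect the project’s exact schema.

import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # assumed configuration, e.g. set by the deployment


def lambda_handler(event: dict, context) -> None:
    for record in event["Records"]:
        # Only newly inserted Outbox items represent messages to publish.
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=new_image["Payload"]["S"],
            MessageAttributes={
                "MessageId": {
                    "DataType": "String",
                    "StringValue": new_image["MessageId"]["S"],
                },
            },
        )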

Transaction Log Tailing pattern with DynamoDB

DynamoDB Streams also guarantee only at-least-once delivery of data capture events, so duplicate messages can still be published, and the message consumer must be idempotent by implementing the Idempotent Consumer pattern described in the section above.

Unit of Work - ensuring atomicity of a business transaction

The Idempotent Consumer and Transactional Outbox patterns use a database layer to persist already processed message identifiers and outgoing messages. For the pattern implementations to work reliably, it must be ensured that all database operations within a business transaction are committed atomically. If an application performs multiple independent database writes within a single business transaction, it’s possible that some of the writes succeed and some fail, which can lead to data inconsistency. Therefore, we need to introduce another puzzle piece that keeps track of all database operations and ensures that they’re committed atomically - the Unit of Work pattern.

The Unit of Work pattern is described by Martin Fowler in the P of EAA book. The pattern is broader than what’s required for transactional messaging - it’s also responsible for figuring out what exactly needs to be saved to the database and how to resolve concurrency conflicts - but for the purposes of transactional messaging, the atomicity aspect is the most relevant.

For our purposes, the Unit of Work keeps a list of all database operations that need to happen within a particular business transaction and performs an atomic commit at the end of the transaction. The operations include saving a processed message identifier to the Inbox table, saving outgoing messages to the Outbox table, and saving domain objects to the aggregate’s own table - in the case of the Order Service, the Orders table.

The database commit needs to be atomic, and its implementation depends on the particular database technology. This project uses AWS DynamoDB, which supports transactional operations - exactly what we need for our purposes. If we were using a relational database, we’d use a familiar ACID transaction.
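A minimal sketch of such a Unit of Work on top of DynamoDB transactions is shown below. The table names and item shapes are assumptions for the example; the project’s implementation keeps repository abstractions on top of the same idea.

import boto3


class DynamoDBUnitOfWork:
    def __init__(self) -> None:
        self._client = boto3.client("dynamodb")
        self._items: list[dict] = []

    def register_put(self, table_name: str, item: dict) -> None:
        # Collect writes instead of executing them immediately.
        self._items.append({"Put": {"TableName": table_name, "Item": item}})

    def commit(self) -> None:
        # All collected writes succeed or fail together
        # (subject to DynamoDB's per-transaction item limit).
        self._client.transact_write_items(TransactItems=self._items)
        self._items.clear()


# Usage within a single business transaction:
uow = DynamoDBUnitOfWork()
uow.register_put("inbox", {"MessageId": {"S": "incoming-message-id"}})
uow.register_put("orders", {"OrderId": {"S": "order-id"}, "State": {"S": "CREATED"}})
uow.register_put("outbox", {"MessageId": {"S": "outgoing-message-id"}, "Topic": {"S": "OrderCreated"}})
uow.commit()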

If you’re familiar with the SQLAlchemy ORM, the sqlalchemy.orm.Session object already implements the Unit of Work pattern. Java JPA has a similar concept in the @Transactional annotation. If your database doesn’t support transactions, there are other ways of ensuring atomicity, like writing all changes to a single aggregate document in one operation.

💡

Patterns are not static and set in stone. They can be adapted to fit the needs of a particular use case, or only a required part of a pattern can be used. The most useful thing about patterns is that they provide a common language to describe a solution to a problem.

Unit of Work pattern with DynamoDB

ℹ️

See a complete AWS reference architecture diagram - Domain consistency in event-driven architectures

Working Example - Customers and Orders application

The application example is inspired by eventuate-tram/eventuate-tram-examples-customers-and-orders.

The application demonstrates the transactional messaging patterns described above.

The application consists of two services:

  • Order Service - manages orders

  • Customer Service - manages customers

Both services are implemented using the tomodachi framework, which provides HTTP and AWS SNS SQS transport layers.

The service design is similar to eventuate-tram/eventuate-tram-examples-customers-and-orders; the main difference is in the implementation technologies. Instead of MySQL, DynamoDB is used as the database. The binlog/transaction log is replaced with DynamoDB Streams. The Eventuate Tram CDC service is implemented with an AWS Lambda function that listens to DynamoDB Streams. Instead of Apache Kafka, Amazon SNS and SQS are used as the message broker. The Order History Service is not implemented.

Run locally with Docker Compose

  • Start applications

docker compose up

  • Get outbox lambda logs

awslocal --region=us-east-1 logs describe-log-groups

awslocal --region=us-east-1 logs tail /aws/lambda/lambda-dynamodb-streams--customers-outbox
awslocal --region=us-east-1 logs tail /aws/lambda/lambda-dynamodb-streams--orders-outbox

Sample requests

  • Create customer

curl -X POST --header "Content-Type: application/json" -d '{
  "name": "Jane Doe",
  "credit_limit": 250
}' http://localhost:9701/customers

  • Create order

curl -X POST --header "Content-Type: application/json" -d '{
  "customer_id": "97c05e79-5902-451f-b96e-f06c8fc3ed68",
  "order_total": 100
}' http://localhost:9702/orders

  • Get customer

curl http://localhost:9701/customer/15883bbb-dbf9-4ea0-afec-b2fab1a0ab2f

  • Get order

curl http://localhost:9702/order/a5ecbfba-32cd-4c94-bfcf-f6a4a8f8a91c

  • Cancel order

curl -X POST http://localhost:9702/order/a5ecbfba-32cd-4c94-bfcf-f6a4a8f8a91c/cancel

Further improvements for production use

Monitoring

  • Active monitoring of the Outbox table is required to notice if the Lambda message relay has issues dispatching messages. OutboxRepository has a get_not_dispatched_messages method that returns a list of not dispatched messages. The monitoring can be implemented with a custom Lambda function that is triggered by EventBridge, for example, every 5 minutes, and checks the age of the oldest not dispatched message. The Lambda function can publish a custom CloudWatch metric with the age of the oldest not dispatched message, and an alarm can be configured to trigger if the latest metric value is greater than, for example, 30 minutes (see the sketch after this list).

  • Implement a CloudWatch alarm that monitors whether there are messages in the Lambda’s dead-letter queue.

  • When working with DynamoDB Streams, set monitoring on Lambda IteratorAge metric - see working with streams section.

  • Implement and test tools/scripts to manually dispatch stuck messages in the Outbox table.
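A rough sketch of the monitoring Lambda from the first bullet is shown below. The metric and namespace names are illustrative, and get_not_dispatched_messages stands in for a query backed by OutboxRepository.get_not_dispatched_messages.

from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")


def get_not_dispatched_messages() -> list[dict]:
    # Placeholder for a query backed by OutboxRepository.get_not_dispatched_messages;
    # assumed to return items with a timezone-aware "created_at" datetime.
    return []


def lambda_handler(event: dict, context) -> None:
    messages = get_not_dispatched_messages()
    oldest_age_seconds = 0.0
    if messages:
        oldest = min(message["created_at"] for message in messages)
        oldest_age_seconds = (datetime.now(timezone.utc) - oldest).total_seconds()
    # A CloudWatch alarm on this metric can then trigger if the value exceeds
    # a threshold, for example 30 minutes.
    cloudwatch.put_metric_data(
        Namespace="Outbox",  # assumed namespace
        MetricData=[
            {
                "MetricName": "OldestNotDispatchedMessageAge",
                "Value": oldest_age_seconds,
                "Unit": "Seconds",
            }
        ],
    )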

Storage

  • To save on storage costs, the Inbox and Outbox repositories should use DynamoDB time-to-live to clean up old items, for example, after one year (see the sketch below).
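The sketch below shows one way to apply DynamoDB time-to-live, assuming a "ttl" attribute on Outbox items and a one-year retention; the attribute and table names are illustrative.

import time

import boto3

dynamodb = boto3.client("dynamodb")

# One-time table configuration: tell DynamoDB which attribute holds the expiry timestamp.
dynamodb.update_time_to_live(
    TableName="outbox",  # assumed table name
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "ttl"},
)

# When writing an item, set the expiry as a Unix epoch timestamp one year from now.
ONE_YEAR_IN_SECONDS = 365 * 24 * 60 * 60
dynamodb.put_item(
    TableName="outbox",
    Item={
        "MessageId": {"S": "outgoing-message-id"},
        "ttl": {"N": str(int(time.time()) + ONE_YEAR_IN_SECONDS)},
    },
)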

Large message payloads

Notes on running Outbox Lambda in a local environment or autotests

Moto: arm64 vs amd64 Lambda architecture

Moto seems to ignore the specified Lambda architecture name and uses whichever mlupin/docker-lambda Docker image is already pulled to the local Docker daemon. If you get errors that outbox messages are not dispatched, try deleting the locally cached mlupin/docker-lambda images and pulling them again for your machine’s architecture.

No such problems observed with LocalStack.

Moto: marking messages as dispatched doesn’t work

When the Outbox Lambda tries to mark a message as dispatched, the Lambda goes into an infinite loop. That’s why marking messages as dispatched is disabled when running the Outbox Lambda in a local environment or in autotests.

It seems to happen because Moto publishes the DynamoDB stream record before it returns a successful response to the DynamoDB put_item operation, but further debugging in Moto is required.

No such problems observed with LocalStack.

Resources and acknowledgements
