Blog post
CQRS and Saga Pattern in .NET: Building Reliable Distributed Workflows
CQRS and Saga solve different but related problems: CQRS separates reads from writes inside a service boundary, while Saga coordinates long-running business workflows across service boundaries.
- Category
- architecture
- Published
Project
GitHub placeholder: cqrs-saga-dotnet
The project is intended to show a realistic order workflow:
OrderServiceaccepts commands and owns the order aggregate.InventoryServicereserves and releases stock.PaymentServicecharges and refunds payment.ShippingServicecreates shipments.SagaOrchestratorcoordinates the cross-service workflow.- An outbox table guarantees that committed state changes eventually publish integration events.
The Problem CQRS and Saga Solve
Most systems begin with one model:
public class Order
{
public Guid Id { get; set; }
public string CustomerId { get; set; } = "";
public decimal Total { get; set; }
public string Status { get; set; } = "Draft";
}
The same model is used to create orders, validate checkout, update status, render lists, calculate dashboards, and expose reports. That works until two pressures appear.
First, write behavior becomes richer. Creating an order is no longer a simple insert. It validates customer limits, checks product availability, applies discounts, emits events, and enforces invariants. The write model becomes behavior-heavy.
Second, read behavior becomes performance-heavy. The product page wants denormalized rows. The admin dashboard wants filters. The customer mobile app wants a small projection. The finance report wants historical totals. The read model wants to be optimized for queries, not business rules.
CQRS addresses this by separating the write side from the read side. Saga addresses a different problem: what happens when one business operation spans multiple services and no single database transaction can cover all of them?
CQRS in One Sentence
Command Query Responsibility Segregation separates operations that change state from operations that read state.
- Command: expresses an intent to change the system, such as
PlaceOrder,CancelOrder, orApproveRefund. - Query: asks for data without changing state, such as
GetOrderDetails,SearchOrders, orGetCustomerOrderHistory.
CQRS does not require event sourcing, microservices, Kafka, or multiple databases. At its simplest, it is a code organization pattern inside one .NET application.
A Practical .NET CQRS Shape
A common .NET implementation uses MediatR to route commands and queries to handlers.
public sealed record PlaceOrderCommand(
Guid CustomerId,
IReadOnlyList<OrderLineRequest> Lines
) : IRequest<Guid>;
public sealed record OrderLineRequest(Guid ProductId, int Quantity, decimal UnitPrice);
The handler owns write-side behavior:
public sealed class PlaceOrderHandler : IRequestHandler<PlaceOrderCommand, Guid>
{
private readonly OrdersDbContext _db;
private readonly ISystemClock _clock;
public PlaceOrderHandler(OrdersDbContext db, ISystemClock clock)
{
_db = db;
_clock = clock;
}
public async Task<Guid> Handle(PlaceOrderCommand command, CancellationToken cancellationToken)
{
var order = Order.Place(
customerId: command.CustomerId,
lines: command.Lines.Select(x => new OrderLine(x.ProductId, x.Quantity, x.UnitPrice)),
placedAt: _clock.UtcNow
);
_db.Orders.Add(order);
await _db.SaveChangesAsync(cancellationToken);
return order.Id;
}
}
The query side is intentionally simpler. It does not load aggregates unless it must. It can project directly into response models.
public sealed record GetOrderDetailsQuery(Guid OrderId) : IRequest<OrderDetailsDto?>;
public sealed record OrderDetailsDto(
Guid Id,
string Status,
decimal Total,
IReadOnlyList<OrderLineDto> Lines
);
public sealed class GetOrderDetailsHandler
: IRequestHandler<GetOrderDetailsQuery, OrderDetailsDto?>
{
private readonly OrdersDbContext _db;
public GetOrderDetailsHandler(OrdersDbContext db)
{
_db = db;
}
public Task<OrderDetailsDto?> Handle(
GetOrderDetailsQuery query,
CancellationToken cancellationToken)
{
return _db.Orders
.AsNoTracking()
.Where(order => order.Id == query.OrderId)
.Select(order => new OrderDetailsDto(
order.Id,
order.Status,
order.Total,
order.Lines.Select(line => new OrderLineDto(
line.ProductId,
line.Quantity,
line.UnitPrice
)).ToList()
))
.SingleOrDefaultAsync(cancellationToken);
}
}
The important part is not MediatR. The important part is that write logic and read logic evolve independently.
The Write Model Should Protect Invariants
Commands should not simply update properties. They should call behavior on aggregates.
public sealed class Order
{
private readonly List<OrderLine> _lines = new();
public Guid Id { get; private set; }
public Guid CustomerId { get; private set; }
public OrderStatus Status { get; private set; }
public DateTimeOffset PlacedAt { get; private set; }
public IReadOnlyCollection<OrderLine> Lines => _lines;
public decimal Total => _lines.Sum(line => line.Quantity * line.UnitPrice);
private Order() { }
public static Order Place(
Guid customerId,
IEnumerable<OrderLine> lines,
DateTimeOffset placedAt)
{
var orderLines = lines.ToList();
if (orderLines.Count == 0)
{
throw new DomainException("An order must contain at least one line.");
}
if (orderLines.Any(line => line.Quantity <= 0))
{
throw new DomainException("Order line quantity must be greater than zero.");
}
return new Order
{
Id = Guid.NewGuid(),
CustomerId = customerId,
Status = OrderStatus.PendingInventory,
PlacedAt = placedAt,
_lines = { orderLines }
};
}
public void MarkInventoryReserved()
{
if (Status != OrderStatus.PendingInventory)
{
throw new DomainException("Inventory can only be reserved for pending orders.");
}
Status = OrderStatus.PendingPayment;
}
public void MarkPaid()
{
if (Status != OrderStatus.PendingPayment)
{
throw new DomainException("Payment can only be accepted after inventory is reserved.");
}
Status = OrderStatus.ReadyToShip;
}
public void Cancel(string reason)
{
if (Status == OrderStatus.Shipped)
{
throw new DomainException("A shipped order cannot be cancelled.");
}
Status = OrderStatus.Cancelled;
CancellationReason = reason;
}
}
This is where CQRS pays off. Query DTOs can be flat and fast, while the write model remains strict and behavior-oriented.
Read Models Can Be Denormalized
When reads need to scale or serve different UI shapes, create projections.
public sealed class OrderSummaryProjection
{
public Guid OrderId { get; set; }
public Guid CustomerId { get; set; }
public string CustomerName { get; set; } = "";
public string Status { get; set; } = "";
public decimal Total { get; set; }
public DateTimeOffset PlacedAt { get; set; }
public int TotalItems { get; set; }
}
The projection can live in the same database, a separate read database, Elasticsearch, Azure Cosmos DB, or Redis depending on query needs. Do not split databases just because CQRS is present. Split when independent scaling, storage shape, or operational ownership justifies it.
Domain Events vs Integration Events
CQRS implementations often introduce events, but not all events have the same audience.
Domain events are internal to a bounded context. They capture something meaningful that happened inside the domain.
public sealed record OrderPlacedDomainEvent(
Guid OrderId,
Guid CustomerId,
decimal Total
) : IDomainEvent;
Integration events are contracts between services.
public sealed record OrderPlacedIntegrationEvent(
Guid EventId,
Guid OrderId,
Guid CustomerId,
decimal Total,
DateTimeOffset OccurredAt
);
Do not publish domain objects directly to other services. Integration events should be stable, versioned contracts. They should contain what other services need, not your internal aggregate structure.
The Outbox Pattern
The classic distributed systems bug is simple:
- Save the order to the database.
- Publish
OrderPlacedto the message broker. - The service crashes between step 1 and step 2.
The database says the order exists, but no other service knows about it.
The outbox pattern fixes this by storing the outgoing message in the same transaction as the business change.
public sealed class OutboxMessage
{
public Guid Id { get; set; }
public string Type { get; set; } = "";
public string Payload { get; set; } = "";
public DateTimeOffset OccurredAt { get; set; }
public DateTimeOffset? ProcessedAt { get; set; }
public string? Error { get; set; }
}
During SaveChanges, convert domain events to outbox messages:
public override async Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
{
var domainEvents = ChangeTracker
.Entries<IHasDomainEvents>()
.SelectMany(entry => entry.Entity.DequeueDomainEvents())
.ToList();
foreach (var domainEvent in domainEvents)
{
OutboxMessages.Add(new OutboxMessage
{
Id = Guid.NewGuid(),
Type = domainEvent.GetType().Name,
Payload = JsonSerializer.Serialize(domainEvent, domainEvent.GetType()),
OccurredAt = DateTimeOffset.UtcNow
});
}
return await base.SaveChangesAsync(cancellationToken);
}
A background worker reads unpublished messages and publishes them to the broker. This gives you at-least-once delivery. Consumers must be idempotent.
Where Saga Enters
CQRS organizes a service. Saga coordinates multiple services.
Consider checkout:
Place order
-> reserve inventory
-> charge payment
-> create shipment
-> confirm order
If payment fails, inventory must be released. If shipment creation fails, payment may need to be refunded. This cannot be handled with one database transaction if each service owns its own database.
A saga is a sequence of local transactions with compensating actions.
OrderPlaced
-> ReserveInventory
-> InventoryReserved
-> ChargePayment
-> PaymentCharged
-> CreateShipment
-> ShipmentCreated
-> OrderConfirmed
Failure path:
OrderPlaced
-> ReserveInventory
-> InventoryReserved
-> ChargePayment
-> PaymentFailed
-> ReleaseInventory
-> OrderCancelled
Choreography Saga
In choreography, each service reacts to events and emits the next event.
OrderService publishes OrderPlaced
InventoryService consumes OrderPlaced and publishes InventoryReserved
PaymentService consumes InventoryReserved and publishes PaymentCharged
ShippingService consumes PaymentCharged and publishes ShipmentCreated
OrderService consumes ShipmentCreated and confirms the order
Choreography is simple when the workflow is short. There is no central coordinator. Each service knows only the events it consumes and emits.
The downside is visibility. The workflow is scattered across services. Timeouts, retries, and compensation become harder to reason about as the number of steps grows.
Orchestration Saga
In orchestration, a saga coordinator owns the workflow state and sends commands to services.
public sealed class OrderCheckoutSaga :
MassTransitStateMachine<OrderCheckoutState>
{
public State InventoryPending { get; private set; } = null!;
public State PaymentPending { get; private set; } = null!;
public State ShippingPending { get; private set; } = null!;
public Event<OrderPlacedIntegrationEvent> OrderPlaced { get; private set; } = null!;
public Event<InventoryReservedIntegrationEvent> InventoryReserved { get; private set; } = null!;
public Event<InventoryReservationFailedIntegrationEvent> InventoryFailed { get; private set; } = null!;
public Event<PaymentChargedIntegrationEvent> PaymentCharged { get; private set; } = null!;
public Event<PaymentFailedIntegrationEvent> PaymentFailed { get; private set; } = null!;
public OrderCheckoutSaga()
{
InstanceState(x => x.CurrentState);
Initially(
When(OrderPlaced)
.Then(ctx =>
{
ctx.Saga.OrderId = ctx.Message.OrderId;
ctx.Saga.CustomerId = ctx.Message.CustomerId;
})
.Send(new Uri("queue:inventory-reserve"), ctx => new ReserveInventoryCommand(
ctx.Message.OrderId,
ctx.Message.Items
))
.TransitionTo(InventoryPending)
);
During(InventoryPending,
When(InventoryReserved)
.Send(new Uri("queue:payment-charge"), ctx => new ChargePaymentCommand(
ctx.Saga.OrderId,
ctx.Message.Amount
))
.TransitionTo(PaymentPending),
When(InventoryFailed)
.Send(new Uri("queue:order-cancel"), ctx => new CancelOrderCommand(
ctx.Saga.OrderId,
"Inventory reservation failed"
))
.Finalize()
);
During(PaymentPending,
When(PaymentCharged)
.Send(new Uri("queue:shipping-create"), ctx => new CreateShipmentCommand(
ctx.Saga.OrderId
))
.TransitionTo(ShippingPending),
When(PaymentFailed)
.Send(new Uri("queue:inventory-release"), ctx => new ReleaseInventoryCommand(
ctx.Saga.OrderId
))
.Send(new Uri("queue:order-cancel"), ctx => new CancelOrderCommand(
ctx.Saga.OrderId,
"Payment failed"
))
.Finalize()
);
}
}
The orchestrator makes the business process explicit. It is easier to monitor, test, and operate. For complex workflows, orchestration is usually easier to maintain than choreography.
Saga State
Saga state should capture workflow progress, not duplicate every service's data.
public sealed class OrderCheckoutState : SagaStateMachineInstance
{
public Guid CorrelationId { get; set; }
public string CurrentState { get; set; } = "";
public Guid OrderId { get; set; }
public Guid CustomerId { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public DateTimeOffset? CompletedAt { get; set; }
public int RetryCount { get; set; }
}
The correlation ID is critical. Every message in the workflow should carry it so logs, traces, and saga state can be joined.
Idempotency Is Mandatory
Outbox publishing and message broker retries usually provide at-least-once delivery. That means the same event can be processed more than once.
Consumers need an inbox or processed-message table.
public sealed class ProcessedMessage
{
public Guid MessageId { get; set; }
public string Consumer { get; set; } = "";
public DateTimeOffset ProcessedAt { get; set; }
}
The consumer checks whether the message was processed before doing business work.
public async Task Consume(ConsumeContext<InventoryReservedIntegrationEvent> context)
{
var alreadyProcessed = await _db.ProcessedMessages.AnyAsync(
x => x.MessageId == context.Message.EventId &&
x.Consumer == nameof(InventoryReservedConsumer),
context.CancellationToken
);
if (alreadyProcessed)
{
return;
}
await _orders.MarkInventoryReserved(
context.Message.OrderId,
context.CancellationToken
);
_db.ProcessedMessages.Add(new ProcessedMessage
{
MessageId = context.Message.EventId,
Consumer = nameof(InventoryReservedConsumer),
ProcessedAt = DateTimeOffset.UtcNow
});
await _db.SaveChangesAsync(context.CancellationToken);
}
Use a unique index on (MessageId, Consumer) so concurrent duplicate delivery cannot race.
Timeouts and Compensation
Long-running workflows need timeouts. Inventory reservation might never respond because the inventory service is down. Payment might stay pending because the provider callback was delayed.
In a saga, timeout is a business event:
Inventory reservation did not complete in 5 minutes
-> cancel order
-> notify customer
-> release any partial reservation if necessary
Compensation is not rollback. It is a new business action that semantically reverses a previous step.
ReleaseInventorycompensatesReserveInventory.RefundPaymentcompensatesChargePayment.CancelShipmentcompensatesCreateShipment.
Design compensation commands up front. If a step has no realistic compensation, the workflow may require human review instead of automatic reversal.
Observability Checklist
CQRS + Saga systems fail in distributed ways. Build observability before production.
- Propagate
traceparentand correlation IDs through every command and event. - Log command names, event names, aggregate IDs, saga IDs, and message IDs.
- Track outbox backlog and oldest unpublished message age.
- Track consumer retry count and dead-letter queue size.
- Expose saga state counts by workflow step.
- Alert on sagas stuck in one state beyond the expected SLA.
OpenTelemetry should be part of the template, not an afterthought.
Testing Strategy
Test the write model without infrastructure:
[Fact]
public void Place_rejects_empty_orders()
{
var exception = Assert.Throws<DomainException>(() =>
Order.Place(Guid.NewGuid(), Array.Empty<OrderLine>(), DateTimeOffset.UtcNow)
);
Assert.Equal("An order must contain at least one line.", exception.Message);
}
Test handlers with a real database provider when EF Core behavior matters. Avoid relying only on mocks for transaction and query behavior.
Test sagas as state machines:
- Given
OrderPlaced, assert the saga sendsReserveInventory. - Given
InventoryReserved, assert the saga sendsChargePayment. - Given
PaymentFailed, assert the saga sendsReleaseInventoryandCancelOrder. - Given timeout, assert the saga moves to the correct compensation path.
When Not to Use CQRS
Do not use CQRS if your service is a simple CRUD application with low query complexity and limited business behavior. A well-structured vertical slice architecture can be enough.
Avoid splitting read and write databases too early. It introduces synchronization lag, extra deployment surfaces, and debugging complexity.
Use CQRS when the read and write sides have different reasons to change.
When Not to Use Saga
Do not use Saga for operations that can fit cleanly inside one database transaction. Local transactions are simpler, faster, and easier to reason about.
Do not use Saga to hide unclear service boundaries. If every workflow requires every service, the boundaries may be wrong.
Use Saga when a business process crosses independently owned transactional boundaries and needs explicit failure handling.
Architecture Summary
Client
-> API
-> MediatR Command
-> Command Handler
-> Aggregate
-> Write Database
-> Outbox
-> Message Broker
-> Saga Orchestrator
-> Service Commands
-> Integration Events
-> Read Model Projections
CQRS gives each service a clean internal shape. Saga gives the business process a reliable distributed shape. The two patterns fit together because commands change local state, events describe committed facts, and the saga coordinates the next step only after those facts exist.
The key is discipline: keep invariants in the write model, keep read models optimized for queries, publish through an outbox, make consumers idempotent, and treat the saga as a first-class workflow with state, timeouts, compensation, and observability.