Message Queue Systems

9 min read · Updated 2026-04-25

Message brokers are how loosely coupled services exchange data without holding each other's network connections. They're the substrate that makes asynchronous architectures possible, and the source of half the operational complexity in distributed systems.

This lesson is the working vocabulary: queue vs. stream, push vs. pull, the major brokers, and when to use which.

Two Paradigms

Most message systems fall into one of two categories:

Queue (Job Queue)
Each message goes to one consumer
Producer pushes work; one of N workers picks it up and processes it. Once consumed, the message is gone. Used for task queues, background jobs, work distribution.
Stream (Log)
Multiple consumers see all messages
Producer appends to a log; many consumers each track their own position. Multiple independent reads of the same data. Used for event sourcing, analytics, fan-out.
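The two paradigms can be sketched in a few lines. This is a minimal in-memory illustration, not real broker code; the class and method names are made up for the example:

```python
from collections import deque

class JobQueue:
    """Queue semantics: each message is delivered to exactly one consumer."""
    def __init__(self):
        self._q = deque()

    def publish(self, msg):
        self._q.append(msg)

    def consume(self):
        # Once a worker takes a message, it is gone from the queue.
        return self._q.popleft() if self._q else None

class Log:
    """Stream semantics: append-only log; each consumer tracks its own offset."""
    def __init__(self):
        self._entries = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, msg):
        self._entries.append(msg)

    def read(self, consumer):
        offset = self._offsets.get(consumer, 0)
        batch = self._entries[offset:]
        self._offsets[consumer] = len(self._entries)
        return batch

q = JobQueue()
q.publish("job-1")
assert q.consume() == "job-1"
assert q.consume() is None               # consumed once, then gone

log = Log()
log.append("event-1")
assert log.read("analytics") == ["event-1"]
assert log.read("billing") == ["event-1"]  # every consumer sees every message
```

The key difference is visible in the asserts: the queue destroys a message on consumption, while the log lets any number of consumers read the same data independently.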

The Major Brokers

RabbitMQ – The Classic Queue

AMQP-based
Mature, feature-rich. Advanced routing (exchanges, bindings, topic routing), priorities, TTLs, dead-letter queues.
Best fit
Job queues, work distribution, RPC patterns, complex routing logic. When the consumer count and message count are both modest.
Scale ceiling
Tens of thousands of msg/sec per node. Beyond that, you're sharding across brokers by hand, and it gets operationally painful.
Trade-off
Rich features at cost of operational complexity. Cluster failover semantics are subtle.

Apache Kafka – The Streaming Standard

Append-only log
Topic = partitioned, replicated, durable log. Consumers read at their own pace; can rewind and replay.
High throughput
Millions of messages per second per cluster. Used by LinkedIn, Netflix, Uber, and much of the modern SaaS industry.
Best fit
Event streaming, analytics pipelines, audit logs, fan-out to many consumers, anything where you might want to replay.
Operational tax
Real cluster ops. Managed offerings (Confluent Cloud, AWS MSK) reduce but don't eliminate this.
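The rewind-and-replay property is what sets the log apart. Here is a toy single-partition sketch (illustrative names only, not the Kafka API) showing why replay is trivial when consumers own their offsets:

```python
class PartitionLog:
    """Toy model of one Kafka-style partition: durable, append-only, replayable."""
    def __init__(self):
        self._entries = []

    def append(self, msg):
        self._entries.append(msg)
        return len(self._entries) - 1    # offset of the new record

    def read_from(self, offset):
        return self._entries[offset:]

log = PartitionLog()
for event in ["signup", "upgrade", "cancel"]:
    log.append(event)

# Normal consumption: read, then remember (commit) where you stopped...
first_pass = log.read_from(0)
committed = len(first_pass)

# ...and later rewind to offset 0 to replay everything, e.g. to rebuild a projection.
assert log.read_from(0) == ["signup", "upgrade", "cancel"]
assert log.read_from(committed) == []    # nothing new since the last commit
```

Because the broker never deletes on read, "replay" is just a consumer choosing a smaller offset.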

We have an entire deep-dive lesson coming on Kafka.

Amazon SQS / Google Pub/Sub / Azure Service Bus

Fully managed
No clusters to operate. Pay per request. Scale handled by the cloud provider.
SQS
AWS's queue service. Standard (high throughput, at-least-once) or FIFO (ordered, exactly-once). Tightly integrated with Lambda.
Pub/Sub
Google's globally distributed pub/sub service. At-least-once delivery by default; exactly-once delivery can be enabled per subscription.
Best fit
When you want messaging without operational overhead. Sufficient for most SaaS use cases.

Apache Pulsar – The Modern Hybrid

Both queue and stream
Different subscription types (Exclusive, Shared, Failover, Key_Shared) give you queue semantics or stream semantics on the same broker.
Geo-replication built in
First-class multi-region. Each cluster replicates to others without external tooling.
Multi-tenancy
Native multi-tenancy: tenant and namespace are first-class concepts. Strong fit for SaaS platforms.
Best fit
When you want Kafka-like throughput, RabbitMQ-like flexibility, native geo and multi-tenancy. The newer alternative for greenfield deployments.

Push vs. Pull

How do consumers receive messages?

Push
Broker delivers to consumer
RabbitMQ's default delivery mode; Google Pub/Sub's push subscriptions. The broker tracks consumer state. Simple consumer code. Risk: the broker can overwhelm a slow consumer (flow control needed). Note that SQS long polling is still pull: the consumer initiates the fetch.
Pull
Consumer fetches from broker
Kafka, Pulsar. Consumer controls pace. Easy back-pressure. Stateless broker for that consumer. Slight latency overhead from polling.

Modern systems mostly use pull-based models for the back-pressure benefits.
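A pull loop makes the back-pressure argument concrete. This sketch uses a plain deque as a stand-in for the broker; the shape of the loop, not the API, is the point:

```python
from collections import deque

broker = deque(f"msg-{i}" for i in range(10))   # stand-in for the broker

def fetch(max_messages):
    """Consumer asks for at most max_messages; the broker never pushes more."""
    batch = []
    while broker and len(batch) < max_messages:
        batch.append(broker.popleft())
    return batch

processed = []
while True:
    batch = fetch(max_messages=4)   # pace is controlled entirely by the consumer
    if not batch:
        break
    for msg in batch:
        processed.append(msg)       # slow processing just means fetching later
    # a real consumer might sleep or shrink max_messages when overloaded

assert len(processed) == 10
```

Back-pressure falls out for free: a consumer that is busy simply doesn't call `fetch`, and nothing piles up in its memory.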

Delivery Semantics

What guarantees does the broker provide about message delivery?

At most once
Messages may be lost on failure. Simple and cheap, but acceptable only when occasional loss is tolerable (e.g. metrics), so it's rare in production.
At least once
Default for most systems. Messages can be duplicated. Consumers must be idempotent.
Exactly once
Theoretically perfect. Hard to implement; often unnecessary. We have a whole lesson on this.

Ordering

When does message order matter?

No ordering
Maximum throughput
Messages can be processed in parallel. Workers pull from a queue freely. The right choice when each message is independent.
Ordered by partition
Practical compromise
Kafka partitions: strict order within a key (e.g., user ID), parallelism across keys. The right answer for most workloads where order matters.

Global ordering is expensive and usually unnecessary. Most "we need ordering" requirements are actually "we need ordering per entity", which partition keys give you.
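The per-entity trick is just a stable hash of the key. A sketch (the hash choice is illustrative; real clients use their own partitioners, e.g. murmur2 in Kafka's Java client):

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    """Stable hash of the key -> partition; the same key always lands on the same partition."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All events for one user hash to one partition, so they stay ordered relative
# to each other, while different users spread across partitions for parallelism.
assert partition_for("user-123") == partition_for("user-123")
partitions = {partition_for(f"user-{i}") for i in range(1000)}
assert len(partitions) > 1   # many keys spread across partitions
```

Since a partition is consumed in order by a single consumer at a time, "same key, same partition" is exactly "ordering per entity".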

Common Patterns

Work queue
Many workers pull from one queue. Each message processed once. Used for background jobs, image processing, email sending.
Pub/sub fan-out
One publish, many subscribers. Each subscriber sees all messages. Used for notifications, cache invalidation, analytics.
Request/reply
Async RPC. Producer sends a message with a reply queue; consumer responds to that queue. RabbitMQ is good for this; Kafka is awkward.
Routing
Messages routed to different queues based on content. RabbitMQ exchange + binding rules. Useful for complex business workflows.
Dead-letter queue
Messages that fail processing repeatedly go here. Don't block the main queue. Required for production-grade handling.
Delay queues
"Process this message in five minutes." SQS supports delays natively; Kafka needs application-level scheduling or external tooling.
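The dead-letter pattern above is worth seeing end to end. A toy sketch of retry-then-park, with a deque standing in for the broker (`MAX_ATTEMPTS` and the message names are made up for the example):

```python
from collections import deque

MAX_ATTEMPTS = 3
main_queue = deque([("good", 0), ("poison", 0)])   # (message, attempt count)
dead_letter_queue = []

def process(message):
    if message == "poison":
        raise ValueError("unprocessable payload")

while main_queue:
    message, attempts = main_queue.popleft()
    try:
        process(message)
    except ValueError:
        if attempts + 1 >= MAX_ATTEMPTS:
            dead_letter_queue.append(message)           # park it for inspection
        else:
            main_queue.append((message, attempts + 1))  # requeue for another try

assert dead_letter_queue == ["poison"]   # poison message parked, queue unblocked
```

Real brokers do the bookkeeping for you (RabbitMQ via dead-letter exchanges, SQS via redrive policies), but the logic is the same: bounded retries, then quarantine.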

Choosing for SaaS

Default choices by use case:

Background jobs / task queue: RabbitMQ or SQS
Event streaming, analytics, audit log: Kafka or Pulsar
Cross-region pub/sub: Google Pub/Sub or Pulsar
AWS-native, simple: SQS + SNS
GCP-native, simple: Pub/Sub
Both queue and stream needed: Pulsar (or Kafka with workarounds)
Hosted, low operational overhead: SQS / Pub/Sub / Confluent Cloud

For most SaaS, the practical answer is: start with your cloud provider's managed service (SQS or Pub/Sub) and adopt Kafka or Pulsar only once you genuinely need streaming, replay, or large-scale fan-out.

Recap