Quick Notes — Event Hubs
Aug 6, 2019
General Points
- Kafka-like big data ingestion service with event retention.
- 1 reader per partition
- Consumer Group is a view with a client-side cursor, offset, of where that CG is in its processing. This allows for many “applications” to pull and process events in their own way at their own pace.
- More like a tape recorder, where the left and right channels are like partitions.
- Partitions are for consumer-side parallelism; 1:1 with readers and 1:1 with Throughput Units.
- 2–32 partitions max, and the count cannot be changed after creation.
- Events are out of order between partitions.
- 256 KB max message size, or 1 MB on the Dedicated tier.
- Event Hubs Capture is a configurable way to archive events to Azure Storage; files are written in Apache Avro format.
- HA is done by writing to a Geo-DR alias, which routes to the active primary namespace and pairs it with a secondary (the pairing replicates metadata/configuration, not the event data itself).
- Failover is a manual operation; the secondary becomes the primary.
- Partitions cap at ~1 MB/s ingress each (1 TU ≈ 1 MB/s or 1,000 events/s), which is slow! 32 MB/s max sustained with 32 partitions.
- Max 5 concurrent readers per partition, per Consumer Group, but it's recommended to have only one active receiver on a partition per Consumer Group.
- If you have multiple readers on the same partition, you will process duplicate messages; you need to handle this in your code.
- Checkpointing is client-side responsibility and done per Reader.
- Readers lease their partition so that there's only one active reader at a time.
- Events that land in different partitions may be read out of order; ordering is only guaranteed within a partition.
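The points above can be sketched with a toy in-memory model (plain Python, not the Azure SDK; all names here are hypothetical): events with the same partition key hash to the same partition and stay ordered there, while each Consumer Group keeps its own client-side cursor per partition, so two groups can read at their own pace.

```python
import hashlib

class MiniHub:
    """Toy in-memory Event Hub: a fixed set of append-only partition logs."""
    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]

    def send(self, partition_key, event):
        # A stable hash of the key picks the partition, so the same key
        # always lands in the same partition (preserving its order).
        idx = int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % len(self.partitions)
        self.partitions[idx].append(event)
        return idx

class ConsumerGroup:
    """Each group tracks its own offset per partition (client-side cursor)."""
    def __init__(self, hub):
        self.hub = hub
        self.offsets = [0] * len(hub.partitions)

    def receive(self, partition, max_batch):
        start = self.offsets[partition]
        batch = self.hub.partitions[partition][start:start + max_batch]
        self.offsets[partition] = start + len(batch)  # advance this group's cursor
        return batch

hub = MiniHub(partition_count=4)
for i in range(6):
    hub.send("device-A", f"a{i}")    # same key -> same partition, in order
p = hub.send("device-A", "a6")       # returns device-A's partition index

fast = ConsumerGroup(hub)
slow = ConsumerGroup(hub)
print(fast.receive(p, max_batch=10))  # this group reads all 7 events, in order
print(slow.receive(p, max_batch=2))   # independent cursor: only the first 2
```

The key design point mirrored here is that the hub itself stores no per-consumer state; the cursor lives entirely with the Consumer Group.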
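The lease/single-active-reader behaviour can be sketched too. This is a simplified, hypothetical model of epoch-based ownership (the real service uses epoch receivers, where a newer epoch disconnects the older owner):

```python
class PartitionLease:
    """Toy epoch-based lease: the highest epoch owns the partition,
    and claims with a stale (lower) epoch are rejected."""
    def __init__(self):
        self.owner = None
        self.epoch = -1

    def claim(self, reader, epoch):
        if epoch < self.epoch:
            return False          # stale claim rejected
        self.owner, self.epoch = reader, epoch
        return True

lease = PartitionLease()
assert lease.claim("reader-1", epoch=1)       # first owner
assert lease.claim("reader-2", epoch=2)       # higher epoch steals the lease
assert not lease.claim("reader-1", epoch=1)   # old reader cannot reclaim
```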
Availability and Disaster Recovery
- Geo-DR is only available on the Standard SKU, which means limits like 256 KB messages apply. Dedicated is >£4K/mo!
- Availability Zones are supported on new hub resources in regions that offer them.
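A minimal sketch of the alias-based failover described above (toy model, hypothetical names; the real operation is done against the namespace pairing in Azure): clients always resolve the alias, and a manual failover promotes the secondary.

```python
class GeoDrAlias:
    """Toy Geo-DR alias: clients connect via the alias, and a manual
    failover promotes the secondary namespace to primary."""
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def resolve(self):
        return self.primary       # clients never hard-code the namespace

    def failover(self):
        # Manual operation: secondary becomes primary; the old primary is
        # abandoned and a new secondary must be paired afterwards.
        self.primary, self.secondary = self.secondary, None

alias = GeoDrAlias("ns-uksouth", "ns-ukwest")
assert alias.resolve() == "ns-uksouth"
alias.failover()
assert alias.resolve() == "ns-ukwest"
```

The point of the alias is that connection strings don't change at failover time; only where the alias resolves to changes.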
Assumptions
- A Consumer Group is a view into all events and thus spans all partitions; within the group, a number of Readers each keep track of their offset on their partition.
- Reader checkpointing for other PaaS services that feed from a Hub is probably done for you, presumably by supplying a Storage connection string.
- Not sure what happens if the processing is done but the offset cannot be written.
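On that last note: if processing completes but the checkpoint write fails, the same events are re-delivered after a restart, which is at-least-once delivery. A hedged sketch (toy code, hypothetical names) of making the replay harmless by de-duplicating on a stable event ID:

```python
class IdempotentProcessor:
    """If a checkpoint write fails after processing, the same events come
    around again on restart (at-least-once delivery). Tracking a stable
    event ID makes the replayed delivery a no-op."""
    def __init__(self):
        self.seen = set()
        self.results = []

    def process(self, event_id, payload):
        if event_id in self.seen:
            return False          # replayed event: already handled, skip
        self.results.append(payload.upper())
        self.seen.add(event_id)
        return True

proc = IdempotentProcessor()
assert proc.process("evt-1", "hello")
# ...checkpoint write fails here; after restart the same event is replayed:
assert not proc.process("evt-1", "hello")
assert proc.results == ["HELLO"]
```

In practice the `seen` set would need to be bounded or persisted, but the shape of the answer is the same: handle duplicates in your code, as noted earlier.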
Side Notes
- Service Bus Queue Premium with 4 messaging units reached 16,782 KB/s (https://blogs.msdn.microsoft.com/servicebus/2016/07/18/premium-messaging-how-fast-is-it/ )
- Only a Dedicated SKU Hub can go beyond this, and that’s expensive at its smallest size (https://docs.microsoft.com/en-gb/azure/event-hubs/event-hubs-faq#dedicated-clusters )
- Migration to a new Hub would, I assume, be done by:
- Setting up a new Hub.
- Flowing the events from the old Hub into the new one.
- Stopping the Stream Analytics job (or whatever consumer) reading the original Hub.
- Setting it up again against the new Hub; downtime/delay is experienced during this time.