Quick Notes — Event Hubs

Luke Puplett
Aug 6, 2019

General Points

  • Kafka-like big data ingestion service with event retention.
  • 1 reader per partition
  • A Consumer Group is a view of the stream with a client-side cursor (the offset) tracking where that group is in its processing. This allows many “applications” to pull and process events in their own way, at their own pace.
  • It’s more like a tape recorder, where the left and right channels are like partitions.
  • Partitions are for consumer-side parallelism: one reader per partition, and a single partition can use at most roughly one Throughput Unit’s worth of throughput.
  • 2–32 partitions max, and the count cannot be changed after creation.
  • Events are ordered within a partition but not across partitions (the producer sketch after this list shows how a partition key keeps related events on one partition).
  • 256 KB max message size, or 1 MB on the Dedicated plan.
  • Event Hubs Capture is a configurable way to archive events to Azure Storage, but the data is stored in Apache Avro format.
  • HA is done by writing to an alias which routes to the active primary namespace; only configuration/metadata is replicated to the secondary, not the event data.
  • Failover is a manual operation; the secondary becomes the primary.
  • A single partition supports about 1 TU ~ 1 MB/s ingress (or 1,000 events/s), which is slow! That caps sustained ingress at ~32 MB/s for 32 partitions.
  • 5 concurrent readers per partition per Consumer Group is a hard limit, but the recommendation is that there is only one active receiver on a partition per Consumer Group.
  • If you have multiple readers on the same partition, then you process duplicate messages. You need to handle this in your code.
  • Checkpointing is a client-side responsibility and is done per reader, i.e. per partition per Consumer Group (see the consumer sketch after this list).
  • Readers lease their partition (the processor libraries coordinate ownership via Azure Storage) so that there’s only one active at a time.
  • Events dropping into different partitions will be read out-of-order.
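
To make the partition-key point concrete, here is a minimal producer sketch. It assumes the azure-eventhub Python SDK (v5) and placeholder connection strings and hub names; treat the package choice as mine, not something from these notes. Events sent with the same partition key land on the same partition and keep their relative order; events with different keys may interleave.

```python
# pip install azure-eventhub
from azure.eventhub import EventHubProducerClient, EventData

CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
HUB_NAME = "<hub-name>"                                 # placeholder

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONN_STR, eventhub_name=HUB_NAME)

with producer:
    # Events sharing a partition key go to the same partition, so their
    # relative order is preserved; events with different keys may land on
    # different partitions and be read out of order relative to each other.
    batch = producer.create_batch(partition_key="device-42")
    batch.add(EventData("reading 1"))
    batch.add(EventData("reading 2"))
    producer.send_batch(batch)
```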
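
And the consumer side: a sketch of a reader that joins a Consumer Group, checkpoints its offset per partition to Blob storage, and relies on the checkpoint store to coordinate ownership so only one reader per partition is active. This assumes the azure-eventhub v5 SDK plus its blob checkpoint store extension; all connection strings and container names are placeholders.

```python
# pip install azure-eventhub azure-eventhub-checkpointstoreblob
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

EH_CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
HUB_NAME = "<hub-name>"                                    # placeholder
STORAGE_CONN_STR = "<storage-account-connection-string>"   # placeholder
CONTAINER = "checkpoints"                                  # placeholder

# The checkpoint store persists each partition's offset (and ownership
# records) in Blob storage, per Consumer Group.
checkpoint_store = BlobCheckpointStore.from_connection_string(
    STORAGE_CONN_STR, CONTAINER)

client = EventHubConsumerClient.from_connection_string(
    EH_CONN_STR,
    consumer_group="$Default",        # each "application" gets its own group
    eventhub_name=HUB_NAME,
    checkpoint_store=checkpoint_store)

def on_event(partition_context, event):
    # Process, then checkpoint. If the checkpoint write fails after
    # processing, the event is delivered again on restart (at-least-once).
    print(partition_context.partition_id, event.body_as_str())
    partition_context.update_checkpoint(event)

with client:
    # starting_position "-1" means the beginning of the retained stream;
    # it only applies when no checkpoint exists yet for a partition.
    client.receive(on_event=on_event, starting_position="-1")
```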

Availability and Disaster Recovery

  • Geo-DR needs at least the Standard SKU, which means being limited to 256 KB messages and other Standard-tier constraints; Dedicated is >£4K/mo..! (A rough failover sketch follows this list.)
  • Availability Zones are supported on new namespaces in regions that offer them.
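
For the alias/failover flow, something like the following management-plane sketch should work. It assumes the azure-mgmt-eventhub package and its disaster_recovery_configs operations; the exact parameter shape varies by SDK version, and every name and ID below is a placeholder. Remember that only metadata/configuration is replicated, not the event data.

```python
# pip install azure-identity azure-mgmt-eventhub
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient

SUBSCRIPTION_ID = "<subscription-id>"        # placeholder
RESOURCE_GROUP = "<resource-group>"          # placeholder
PRIMARY_NAMESPACE = "<primary-namespace>"    # placeholder
SECONDARY_NAMESPACE_ID = (                   # full ARM resource id, placeholder
    "/subscriptions/<sub>/resourceGroups/<rg>/providers/"
    "Microsoft.EventHub/namespaces/<secondary-namespace>")
ALIAS = "<dr-alias>"                         # clients connect via this alias

client = EventHubManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Pair the namespaces: the alias routes to the primary, and configuration
# (not event data) is replicated to the secondary. The parameter shape here
# is an assumption; some SDK versions want an ArmDisasterRecovery model.
client.disaster_recovery_configs.create_or_update(
    RESOURCE_GROUP, PRIMARY_NAMESPACE, ALIAS,
    {"partner_namespace": SECONDARY_NAMESPACE_ID})

# Failover is a manual, one-way operation run against the *secondary*:
# the secondary becomes the new primary and the pairing is broken.
client.disaster_recovery_configs.fail_over(
    RESOURCE_GROUP, "<secondary-namespace>", ALIAS)
```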

Assumptions

  • A Consumer Group is a view into all events and thus spans all partitions; within the group, a number of readers each keep track of their offset on their partition.
  • Reader checkpointing for other PaaS services that feed from a Hub is probably done for you, presumably by supplying a Storage connection string.
  • Not sure what happens if the processing is done but the offset cannot be written; presumably the events are simply redelivered and reprocessed (at-least-once), so handlers need to be idempotent (see the sketch below).
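
On that last point: if the checkpoint write fails after processing, the events come around again on the next start, so delivery is effectively at-least-once and the handler has to cope with repeats. A minimal sketch of that idea, meant to slot in as the on_event callback from the consumer sketch above; the in-memory dict and the process() function are purely illustrative, not an SDK API.

```python
# Minimal idempotency sketch: track the last sequence number processed per
# partition so that redelivered events (e.g. after a failed checkpoint write)
# are skipped rather than double-processed. The store here is an in-memory
# dict purely for illustration; a real handler would persist this alongside
# its output (e.g. in the same database transaction).
last_seen = {}  # partition_id -> highest sequence number already processed

def handle(partition_context, event):
    pid = partition_context.partition_id
    seq = event.sequence_number        # monotonically increasing per partition
    if last_seen.get(pid, -1) >= seq:
        return                         # duplicate delivery, already handled
    process(event)                     # your business logic (hypothetical)
    last_seen[pid] = seq
    partition_context.update_checkpoint(event)  # may fail; we then re-enter here

def process(event):
    print(event.body_as_str())
```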
