December 13, 2024

Writing Data to Amazon Kinesis Data Streams

  • A producer is an application that writes data to Amazon Kinesis Data Streams by putting user data records into a Kinesis data stream
  • The Kinesis Producer Library simplifies producer application development
    • allows developers to achieve high write throughput to a Kinesis data stream

Benefits of using Kinesis Data Streams

  • Common use is the real-time aggregation of data, followed by loading the data into a data warehouse or map-reduce cluster
  • Data put on a Kinesis data stream is stored durably, and the stream scales to meet throughput demands
  • Data can be retrieved less than 1 second after it's put on the stream
  • Multiple Kinesis Data Streams applications can consume data from a stream
    • archiving and processing can take place concurrently/independently
  • Kinesis Client Library allows fault-tolerant consumption of stream data
    • provides scaling support for Kinesis Data Streams applications

Creating and Updating Data Streams

  • Amazon Kinesis Data Streams ingests data, stores it, and makes it available for consumption
  • Unit of data stored by Kinesis data streams is a data record
  • A group of data records is a data stream
    • the data records in a data stream are distributed into shards
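Records are distributed into shards by hash: Kinesis takes the MD5 hash of each record's partition key (a 128-bit integer) and routes the record to the shard whose hash key range contains it. A pure-Python sketch, assuming the even hash-key split a freshly created stream gets:

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # the Kinesis hash key space is 128 bits wide


def hash_key(partition_key: str) -> int:
    """Kinesis hashes a record's partition key with MD5 to pick a shard."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_shard_ranges(num_shards: int) -> list[tuple[int, int]]:
    """Hash key ranges for a freshly created stream (evenly split)."""
    step = (MAX_HASH_KEY + 1) // num_shards
    return [
        (i * step, MAX_HASH_KEY if i == num_shards - 1 else (i + 1) * step - 1)
        for i in range(num_shards)
    ]


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Index of the shard whose hash key range contains the record's hash key."""
    hk = hash_key(partition_key)
    for i, (start, end) in enumerate(ranges):
        if start <= hk <= end:
            return i
    raise ValueError("hash key outside all shard ranges")
```

Because the mapping is deterministic, all records sharing a partition key land in the same shard, which preserves their ordering.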

Data Shards

  • A shard is a uniquely identified sequence of data records in a stream
  • When you create a stream, you specify the number of shards for the stream
  • The total capacity of a stream is the sum of the capacities of its shards
    • the number of shards in the stream can be increased or decreased
      • you are charged on a per-shard basis
  • A producer puts data records into shards
  • A consumer gets data records from shards
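The "sum of shard capacities" point can be made concrete with the per-shard throughput limits for provisioned-mode streams (1 MB/s and 1,000 records/s for writes, 2 MB/s for reads per shard):

```python
# Per-shard throughput limits for provisioned-mode Kinesis Data Streams
WRITE_MB_PER_SEC = 1
WRITE_RECORDS_PER_SEC = 1_000
READ_MB_PER_SEC = 2


def stream_capacity(num_shards: int) -> dict:
    """Total stream capacity is the sum of its shards' capacities."""
    return {
        "write_mb_per_sec": num_shards * WRITE_MB_PER_SEC,
        "write_records_per_sec": num_shards * WRITE_RECORDS_PER_SEC,
        "read_mb_per_sec": num_shards * READ_MB_PER_SEC,
    }
```

So a 4-shard stream can ingest 4 MB/s and 4,000 records/s, and serve 8 MB/s of reads — and since billing is per shard, doubling the shard count doubles both the capacity and the cost.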

Re-sharding Errors with Kinesis Data Streams

  • When re-sharding a stream, a small extra shard can be left open after the operation finishes
    • for example, an even number of shards was requested, but the number of open shards ends up odd
  • This occurs when a shard's hash key range is very small relative to the other shards in the stream
    • i.e., the difference between the shard's StartingHashKey and EndingHashKey is tiny (such as 1), where normally that difference is large
  • Resolve this by finding the ShardId of the shard with the adjacent hash key range and merging the small shard into that shard
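Locating the merge partner is a matter of comparing hash key ranges: two shards are adjacent when their ranges touch. A pure-Python sketch with hypothetical shard IDs and ranges; the merge itself is then a single MergeShards API call:

```python
def adjacent_shard(shards: dict[str, tuple[int, int]], small_id: str) -> str:
    """Return the ShardId whose hash key range borders the small shard's range.

    `shards` maps ShardId -> (StartingHashKey, EndingHashKey) for open shards.
    The actual merge is then a MergeShards call, e.g. with boto3:
        kinesis.merge_shards(StreamName=..., ShardToMerge=small_id,
                             AdjacentShardToMerge=neighbor_id)
    """
    start, end = shards[small_id]
    for shard_id, (s, e) in shards.items():
        if shard_id == small_id:
            continue
        if e == start - 1 or s == end + 1:  # ranges touch, so shards are adjacent
            return shard_id
    raise LookupError("no adjacent open shard found")


# A stream where re-sharding left a width-1 shard between two normal ones
# (shard IDs and ranges are illustrative):
open_shards = {
    "shardId-000000000007": (0, 2**127 - 2),
    "shardId-000000000008": (2**127 - 1, 2**127 - 1),  # width 1: the stray shard
    "shardId-000000000005": (2**127, 2**128 - 1),
}
```

Merging the stray shard into either neighbor restores a contiguous range and brings the open-shard count back to the intended number.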