December 14, 2024

Kinesis Data Streams Terminology

Kinesis Data Stream

  • A Kinesis data stream is a set of shards. Each shard is a sequence of data records.
    • Each data record in a shard has a sequence number, assigned by Kinesis Data Streams.

Data Record

  • A data record is the unit of data stored in a Kinesis data stream. Data records are composed of a sequence number, a partition key, and a data blob.
    • a data blob is an immutable sequence of bytes
    • Kinesis Data Streams does not inspect, interpret or change the data in the blob in any way
    • a blob can be up to 1 MB

Retention Period

  • The retention period is the length of time that data records are accessible after they are added to the stream.
  • A stream's retention period defaults to 24 hours when the stream is created.
  • You can increase the retention period up to 8,760 hours (365 days); the earlier limit was 168 hours (7 days)
    • use IncreaseStreamRetentionPeriod to increase it
    • use DecreaseStreamRetentionPeriod to decrease it, down to a minimum of 24 hours
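
The choice between the two calls can be sketched as follows. This is a minimal illustration, not a definitive implementation: `set_retention` is a hypothetical helper, and `client` is assumed to be a boto3 Kinesis client (for example `boto3.client("kinesis")`), whose `increase_stream_retention_period` and `decrease_stream_retention_period` methods correspond to the two API operations named above. Only the 24-hour minimum is checked locally; the upper bound is left to the service.

```python
MIN_RETENTION_HOURS = 24  # default and minimum retention, per the notes above


def set_retention(client, stream_name, current_hours, target_hours):
    """Move a stream's retention period from current_hours to target_hours.

    `client` is assumed to be a boto3 Kinesis client. Returns "increase",
    "decrease", or None when no call was needed.
    """
    if target_hours < MIN_RETENTION_HOURS:
        raise ValueError("retention period cannot go below 24 hours")
    if target_hours > current_hours:
        client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "increase"
    if target_hours < current_hours:
        client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours
        )
        return "decrease"
    return None  # already at the requested value
```

The current value would normally come from a describe call on the stream; it is passed in here to keep the sketch small.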

Producer

  • Producers put records into Amazon Kinesis Data Streams. For example, a web server sending log data to a stream is a producer.
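
As a rough sketch of the web-server example, a producer could look like the code below. `send_log_line` is a hypothetical helper, `client` is assumed to be a boto3 Kinesis client, and using the host name as the partition key is an illustrative choice, not something these notes prescribe. The 1 MB blob limit is taken as 1 MiB here, which is also an assumption.

```python
MAX_BLOB_BYTES = 1024 * 1024  # the 1 MB blob limit, taken here as 1 MiB (assumption)


def send_log_line(client, stream_name, host, line):
    """Put one log line into the stream and return (shard_id, sequence_number).

    `client` is assumed to be a boto3 Kinesis client.
    """
    data = line.encode("utf-8")  # the data blob is just bytes to Kinesis
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("data blob exceeds the 1 MB limit")
    response = client.put_record(
        StreamName=stream_name,
        Data=data,
        PartitionKey=host,  # hypothetical choice: one key per web server
    )
    # Kinesis assigns the sequence number and reports the shard it chose.
    return response["ShardId"], response["SequenceNumber"]
```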

Consumer

  • Consumers get records from Amazon Kinesis Data Streams and process them
  • These consumers are known as Amazon Kinesis Data Streams Applications.
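
A bare-bones consumer, without the Kinesis Client Library, might read one batch from a shard as sketched below. `read_batch` is a hypothetical helper and `client` is assumed to be a boto3 Kinesis client; a real application would keep looping on the returned iterator and track its position.

```python
def read_batch(client, stream_name, shard_id, limit=100):
    """Read one batch of records, starting at the oldest retained record.

    `client` is assumed to be a boto3 Kinesis client. Returns a list of
    (sequence_number, partition_key, data) tuples and the next iterator.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # oldest record still in retention
    )["ShardIterator"]
    response = client.get_records(ShardIterator=iterator, Limit=limit)
    # Each record exposes the three parts of a data record.
    records = [
        (r["SequenceNumber"], r["PartitionKey"], r["Data"])
        for r in response["Records"]
    ]
    return records, response.get("NextShardIterator")
```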

Amazon Kinesis Data Streams Application

  • An Amazon Kinesis Data Streams application is a consumer of a stream that commonly runs on a fleet of EC2 instances.

Shard

  • A shard is a uniquely identified sequence of data records in a stream.
  • A stream is composed of one or more shards
  • Each shard provides a fixed unit of capacity
  • The data capacity of your stream is a function of the number of shards that you specify in your stream
  • The total capacity of the stream is the sum of the capacity of its shards.
  • If your data rate changes, you can increase or decrease the number of shards allocated to your stream
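
The "sum of its shards" arithmetic can be sketched in a few lines. The per-shard figures below are not stated in these notes; they are the commonly published provisioned-mode limits (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads) and should be treated as assumptions to check against current quotas. The function names are hypothetical.

```python
import math

# Assumed per-shard limits (provisioned mode); verify against current quotas.
WRITE_MB_PER_SHARD = 1
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2


def stream_capacity(shard_count):
    """Total stream capacity is the per-shard capacity times the shard count."""
    return {
        "write_mb_per_s": shard_count * WRITE_MB_PER_SHARD,
        "write_records_per_s": shard_count * WRITE_RECORDS_PER_SHARD,
        "read_mb_per_s": shard_count * READ_MB_PER_SHARD,
    }


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Smallest shard count that covers all three rates (at least one shard)."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )
```

For example, a workload writing 0.5 MB/s as 2,500 records/s is bound by the record rate rather than the byte rate, so `shards_needed(0.5, 2500, 1)` comes out at three shards.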

Partition Key

  • A partition key is used to group data by shard within a stream.
  • Kinesis Data Streams segregates the data records belonging to a stream into multiple shards.
  • It uses the partition key that is associated with each data record to determine which shard a given data record belongs to.
  • When an application puts data into a stream, it must specify a partition key.
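
To make the key-to-shard mapping concrete: Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a contiguous range of those hash values. The sketch below models that locally. The helper names are hypothetical, and splitting the hash space into equal ranges is an assumption for illustration; real ranges are reported per shard by the service and change after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def hash_key(partition_key):
    """Map a partition key to its 128-bit integer hash key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_ranges(shard_count):
    """Split the hash space into contiguous, roughly equal (start, end) ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any range")
```

Because the mapping depends only on the key, every record with the same partition key lands in the same shard for as long as the shard ranges stay unchanged.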

Sequence Number

  • Each data record has a sequence number that is unique per partition-key within its shard
  • Kinesis Data Streams assigns the sequence number after you write to the stream with client.putRecords or client.putRecord. Sequence numbers for the same partition key generally increase over time. The longer the time period between requests, the larger the sequence numbers become.

Kinesis Client Library

  • The Kinesis Client Library is compiled into your application to enable fault-tolerant consumption of data from the stream.
  • The Kinesis Client Library ensures that every shard has a record processor running and processing that shard.
  • The library also simplifies reading data from the stream.
  • The Kinesis Client Library uses an Amazon DynamoDB table to store control data.
  • It creates one table per application that is processing data.

Application Name

  • The name of the Amazon Kinesis Data Streams application identifies the application.
  • Each of your applications must have a unique name that is scoped to the AWS account and Region used by the application
  • This name is used as a name for the control table in Amazon DynamoDB and the namespace for Amazon CloudWatch metrics.

Server Side Encryption

  • Amazon Kinesis Data Streams can automatically encrypt sensitive data as a producer enters it into the stream
  • Kinesis Data Streams uses AWS KMS master keys for encryption
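
Turning encryption on for an existing stream could look roughly like this. `enable_encryption` is a hypothetical helper and `client` is assumed to be a boto3 Kinesis client; the default key shown, `alias/aws/kinesis`, is the AWS managed key for Kinesis, and a customer managed key ID or ARN can be passed instead.

```python
def enable_encryption(client, stream_name, key_id="alias/aws/kinesis"):
    """Start server-side encryption on a stream using a KMS key.

    `client` is assumed to be a boto3 Kinesis client. Records written after
    encryption becomes active are encrypted by the service on the way in.
    """
    client.start_stream_encryption(
        StreamName=stream_name,
        EncryptionType="KMS",
        KeyId=key_id,
    )
```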