
Create more partitions and retain data for longer in your MSK Serverless clusters

In April 2022, Amazon Managed Streaming for Apache Kafka (Amazon MSK) launched an exciting new capability, Amazon MSK Serverless. Amazon MSK is a fully managed service for Apache Kafka that makes it easier for developers to build and run highly available, secure, and scalable applications based on Apache Kafka. With MSK Serverless, developers can run their applications without having to provision, configure, or optimize their Apache Kafka clusters. MSK Serverless automatically provisions and scales compute and storage resources, so developers have access to on-demand streaming capacity and storage.

Over the rest of 2022, the team collected customer feedback and worked backward from customer requirements to add new capabilities that made MSK Serverless even better. In this post, we discuss several of these improvements in detail and provide an example use case.

Higher default quota for partitions in a cluster

Data in Apache Kafka is written to topics, which can be partitioned into multiple log files called partitions. When a producer application writes data to a topic, it’s appended to one of these partitions. MSK Serverless launched with a maximum quota of 120 partitions per cluster. However, our customers told us that they needed more partitions per cluster for a variety of use cases, ranging from change data capture (CDC) to faster real-time data processing.

In December 2022, we increased the default quota for partitions for MSK Serverless clusters. With the increased quota, you can create up to 2,400 partitions per cluster. The 20-fold increase in the number of partitions you can have per cluster lets you create more topics per cluster and have more applications consume data in parallel. You can also implement better isolation of data with fine-grained access control. More partitions are particularly useful for CDC use cases where each table in the database has hundreds of unique keys, which are each mapped to a unique partition. With more partitions, you can use MSK Serverless for capturing changes in larger databases with a large number of tables and hundreds of keys. Note that the 2,400 limit only applies to leader partitions. MSK Serverless creates two replicas of each partition by default at no additional cost, and these don’t count toward this limit.

Unlimited data retention duration

The data you produce to your topics can be retained in Apache Kafka for a configurable duration, depending on how long you need to access data using Apache Kafka client APIs. Typically, customers retain data for short periods of time, ranging from a few hours to a few days. Previously, MSK Serverless limited data retention to a maximum of 24 hours (1 day), which is sufficient for most popular Apache Kafka use cases. However, some use cases require customers to retain data for longer, such as retaining data for audit purposes or maintaining application recovery SLAs.

Now, with the increase in the data retention duration quota, you can retain data for as long as you need in your MSK Serverless clusters. Longer data retention is particularly useful for use cases where your client applications need quick access to older data. For instance, in the case of a failure, the application may need to access data from the start of the topic to reconstruct its state. Because you can now retain data in your topics for longer durations, you can restore your application’s state by accessing older data using Kafka’s client API, making it easier to recover from such failures. After the application recovers, you can configure your application to start consuming the data from the earliest timestamp you need to reestablish your application’s state. Note that you can only retain up to 250 GB of data per partition. As long as your partition doesn’t reach 250 GB in size, you can retain it for as long as you wish. You can create more partitions if you need more storage for a given topic.
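
To make the per-partition limit concrete, the following is a rough sizing sketch rather than an official formula. The helper name min_partitions_for_gb is hypothetical, and the calculation assumes that data is spread evenly across the partitions of a topic, which depends on how your producers assign keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of any AWS or Apache Kafka tooling.
# Estimates the minimum number of partitions a topic needs to retain a
# given volume of data, assuming writes are spread evenly across
# partitions and each partition can hold up to 250 GB.
PER_PARTITION_LIMIT_GB=250

min_partitions_for_gb() {
  local total_gb=$1
  # Integer ceiling division: round up so no partition exceeds the limit.
  echo $(( (total_gb + PER_PARTITION_LIMIT_GB - 1) / PER_PARTITION_LIMIT_GB ))
}

# Example: retaining about 1,000 GB in a single topic
min_partitions_for_gb 1000   # prints 4
```

In practice you would likely provision more partitions than this minimum, because skewed keys can fill some partitions faster than others.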

These new quotas are available in all Regions where MSK Serverless is available. For more information, navigate to the MSK Serverless tab on the Amazon MSK pricing page and choose the Region drop-down menu.

You can also request an increase to the maximum number of partitions quota by contacting AWS Support if you need more than 2,400 partitions in a cluster. The quotas for more partitions and longer retention are applied to both existing and new clusters.

Getting started: Create a topic with 1,000 partitions and 7-day retention

In this section, we demonstrate how to create a topic in MSK Serverless, specify the number of partitions, and set its retention duration.

As a prerequisite, you must have an MSK Serverless cluster and an Apache Kafka client. Refer to Getting started using MSK Serverless clusters for step-by-step instructions.

  1. On your client machine, go to kafka_2.12-2.8.1/bin and run the following export command (replace my-endpoint with the bootstrap server string of your MSK Serverless cluster):

    export BS=my-endpoint

  2. Run the following command to create a topic called msk-sample-topic with 1,000 partitions and 7-day data retention (604,800,000 milliseconds):

    ./kafka-topics.sh \
    --bootstrap-server "$BS" \
    --command-config client.properties \
    --create \
    --topic msk-sample-topic \
    --partitions 1000 \
    --config retention.ms=604800000

  3. (Optional) Run the following command to view the details of the topic you created in step 2:

    ./kafka-topics.sh \
    --bootstrap-server "$BS" \
    --command-config client.properties \
    --topic msk-sample-topic \
    --describe | head -n1

    You will see the following result:

    Topic: msk-sample-topic TopicId: Ze76LY9EQuiH0xOIenx_HA PartitionCount: 1000 ReplicationFactor: 3    Configs: min.insync.replicas=2,segment.bytes=134217728,retention.ms=604800000,message.format.version=2.8-IV2,unclean.leader.election.enable=false,retention.bytes=268435456000
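
The two large numbers in this walkthrough can be checked with simple shell arithmetic. The sketch below is illustrative only: the function names days_to_ms and gib_to_bytes are hypothetical helpers. It shows that 7 days corresponds to the retention.ms value used above, and that the retention.bytes value in the describe output equals 250 × 1024³ bytes, which lines up with the 250 GB per-partition limit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for deriving topic configuration values.

# Convert a retention period in days to milliseconds (for retention.ms).
days_to_ms() {
  local days=$1
  echo $(( days * 24 * 60 * 60 * 1000 ))
}

# Convert a size in binary gigabytes (1024^3 bytes) to bytes.
gib_to_bytes() {
  local gib=$1
  echo $(( gib * 1024 * 1024 * 1024 ))
}

days_to_ms 7       # prints 604800000
gib_to_bytes 250   # prints 268435456000
```

To choose a different retention period, you could pass the output of days_to_ms to the --config retention.ms option when creating the topic.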

Clean up

To avoid incurring charges on the AWS resources created in this post, delete the MSK Serverless cluster and the Amazon Elastic Compute Cloud (Amazon EC2) instance for your client machine.

  1. On the Amazon MSK console, select the MSK Serverless cluster you used for this solution.
  2. Choose Actions, then choose Delete.
  3. On the Amazon EC2 console, select the instance that you created for your Apache Kafka client machine.
  4. Choose Instance state, then choose Terminate instance.


This post demonstrated how to create a topic in an MSK Serverless cluster with 1,000 partitions and 7-day retention. With the new quota increases, you can create up to 2,400 partitions per cluster and retain data for as long as you need. If you have comments or feedback, please feel free to leave them in the comments.

About the author

Usama Naseem is a Senior Product Manager for Amazon MSK and focuses on MSK Serverless. Previously, he held product management roles for AWS Lambda and Amazon Fresh. He’s passionate about giving customers the tools to build real-time applications in the cloud. Outside of work, he is still under the delusion that he will be the best squash player in the world one day.

