S3 sink connector by Confluent naming and data formats
======================================================

The **Apache Kafka Connect® S3 sink connector** enables you to move data from an **Aiven for Apache Kafka®** cluster to **Amazon S3** for long-term storage. This document describes the advanced parameters that define the naming and data formats.

.. Warning::

    Aiven provides two versions of the S3 sink connector: one developed by Aiven, another developed by Confluent.

    This article is about the **Confluent** version. Documentation for the Aiven version is available in the :doc:`dedicated page `.

S3 naming format
----------------

The Apache Kafka Connect® S3 sink connector by Confluent stores a series of files as objects in the specified S3 bucket. By default, each object is named using the pattern:

.. code::

    topics/<TOPIC_NAME>/partition=<PARTITION_NUMBER>/<TOPIC_NAME>+<PARTITION_NUMBER>+<START_OFFSET>.<FILE_EXTENSION>

The placeholders are the following:

* ``TOPIC_NAME``: Name of the topic to be pushed to S3
* ``PARTITION_NUMBER``: Number of the topic partition
* ``START_OFFSET``: Offset of the first message stored in the file
* ``FILE_EXTENSION``: File extension, which depends on the serialization format defined. The ``bin`` extension is generated when serializing messages in binary format.

For example, a topic with 3 partitions initially generates the following files in the destination S3 bucket:

.. code::

    topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000000.bin
    topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000000.bin
    topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000000.bin

S3 data format
--------------

By default, data is stored in binary format, one line per message. The connector creates a file every ``X`` messages, where ``X`` is defined by the ``flush.size`` parameter. Setting the ``flush.size`` parameter to ``1`` generates a file for each message in a topic.

Continuing the example above, for a topic with 3 partitions and 10 messages, setting the ``flush.size`` parameter to ``1`` generates the following files (one per message) in the destination S3 bucket:

.. code::

    topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000000.bin
    topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000001.bin
    topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000002.bin
    topics/<TOPIC_NAME>/partition=0/<TOPIC_NAME>+0+0000000003.bin
    topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000000.bin
    topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000001.bin
    topics/<TOPIC_NAME>/partition=1/<TOPIC_NAME>+1+0000000002.bin
    topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000000.bin
    topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000001.bin
    topics/<TOPIC_NAME>/partition=2/<TOPIC_NAME>+2+0000000002.bin

You can find additional documentation at the `dedicated page `_.
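
To check which objects to expect in the bucket, the naming pattern and the effect of ``flush.size`` can be sketched in a few lines of Python. This is an illustration only, not part of the connector: the helper names ``object_key`` and ``keys_for_partition`` and the topic name ``orders`` are hypothetical. The sketch assumes that offsets in a partition start at ``0`` and are contiguous, that the start offset is zero-padded to 10 digits as in the examples on this page, and that only complete batches of ``flush.size`` messages are written.

.. code:: python

    def object_key(topic, partition, start_offset, extension="bin"):
        """Build an object key following the default naming pattern."""
        # Zero-pad the start offset to 10 digits, as in the examples on this page.
        return (
            f"topics/{topic}/partition={partition}/"
            f"{topic}+{partition}+{start_offset:010d}.{extension}"
        )


    def keys_for_partition(topic, partition, message_count, flush_size):
        """List the keys expected for one partition.

        Assumes contiguous offsets starting at 0 and counts only complete
        batches of flush_size messages.
        """
        if flush_size < 1:
            raise ValueError("flush_size must be a positive integer")
        full_files = message_count // flush_size
        return [
            object_key(topic, partition, index * flush_size)
            for index in range(full_files)
        ]


    # Partition 0 holding 4 messages with flush.size set to 1: one key per message.
    for key in keys_for_partition("orders", 0, 4, 1):
        print(key)

With a larger ``flush.size``, the same helper returns fewer keys and their start offsets advance in steps of ``flush.size``.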