Google Cloud Storage sink connector naming and data formats

The Apache Kafka Connect® GCS sink connector by Aiven enables you to move data from an Aiven for Apache Kafka® cluster to a Google Cloud Storage bucket for long-term storage. The full connector documentation is available in the dedicated GitHub repository.

File name format

The connector uses the following format for output files (blobs):

<prefix><topic>-<partition>-<start-offset>[.gz]

The file name format has the following building blocks:

  • <prefix>: the file name prefix, useful, for example, to define subdirectories in the storage bucket

  • <topic>: the source Apache Kafka topic name

  • <partition>: the source Apache Kafka topic’s partition number

  • <start-offset>: the offset of the first record in the file

  • [.gz]: the optional file suffix, added only when compression is enabled; the exact suffix depends on the compression type
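The building blocks above can be recovered from a blob name with a small parser. The following is a minimal sketch, not part of the connector: the names `parse_blob_name` and `BlobName` are hypothetical, and the sketch assumes that the prefix is known in advance (the format puts no separator between `<prefix>` and `<topic>`) and that the only suffix in use is `.gz`. Because topic names can themselves contain hyphens, the partition and offset are matched from the right-hand end of the name.

```python
import re
from typing import NamedTuple, Optional

# <topic>-<partition>-<start-offset>[.gz], matched after the prefix is removed.
# The greedy "head" group keeps any hyphens that belong to the topic name.
_BLOB_NAME = re.compile(
    r"^(?P<head>.+)-(?P<partition>\d+)-(?P<offset>\d+)(?P<gz>\.gz)?$"
)


class BlobName(NamedTuple):
    """Hypothetical container for the parts of an output file name."""
    prefix: str
    topic: str
    partition: int
    start_offset: int
    compressed: bool


def parse_blob_name(name: str, prefix: str = "") -> Optional[BlobName]:
    """Split a blob name into its parts, or return None if it does not match."""
    if not name.startswith(prefix):
        return None
    match = _BLOB_NAME.match(name[len(prefix):])
    if match is None:
        return None
    return BlobName(
        prefix=prefix,
        topic=match["head"],
        partition=int(match["partition"]),
        start_offset=int(match["offset"]),
        compressed=match["gz"] is not None,
    )
```

For example, with the illustrative prefix `backups/`, calling `parse_blob_name("backups/orders-v2-3-1500.gz", "backups/")` yields the topic `orders-v2`, partition `3`, start offset `1500`, and `compressed=True`.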

Data format

The connector output files are text files that contain one record per line (separated by \n).

There are two types of data format available:

  • Flat structure: the default data format, where field values are separated by commas (CSV).

    You can use the CSV format by setting format.output.type to csv.

  • Complex structure: the file stores messages in the JSON Lines format (jsonl), with one record per line, where each line is a valid JSON object.

    You can use the JSON format by setting format.output.type to jsonl.
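To illustrate how a consumer might read either format back, here is a minimal Python sketch. The function name `iter_records` is hypothetical, and the sketch assumes the file content has already been downloaded and, if compressed, decompressed into a string. It relies only on what is described above: one record per line, comma-separated values for csv, and one JSON object per line for jsonl.

```python
import csv
import io
import json
from typing import Any, Iterator, List, Union


def iter_records(text: str, output_type: str) -> Iterator[Union[List[str], Any]]:
    """Yield one decoded record per line of a connector output file.

    output_type mirrors the format.output.type setting: "csv" or "jsonl".
    """
    if output_type == "csv":
        # Each line becomes a list of field values.
        yield from csv.reader(io.StringIO(text))
    elif output_type == "jsonl":
        for line in text.split("\n"):
            if line:  # skip the empty string left by a trailing newline
                yield json.loads(line)
    else:
        raise ValueError(f"unsupported format.output.type: {output_type!r}")
```

With made-up sample data, `list(iter_records("k1,v1\nk2,v2\n", "csv"))` returns two lists of strings, while `list(iter_records('{"id": 1}\n{"id": 2}\n', "jsonl"))` returns two dictionaries.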