Google Cloud Storage sink connector naming and data formats
The Apache Kafka Connect® GCS sink connector by Aiven lets you move data from an Aiven for Apache Kafka® cluster to a Google Cloud Storage bucket for long-term storage. The full connector documentation is available in the dedicated GitHub repository.
File name format
The connector uses the following format for output files (blobs):
<prefix><topic>-<partition>-<start-offset>[.gz]
The file name format has the following building blocks:
<prefix>: the file name prefix, useful, for example, for defining subdirectories in the storage bucket
<topic>: the source Apache Kafka topic name
<partition>: the source Apache Kafka topic's partition number
<start-offset>: the offset of the first record in the file
[.gz]: the file suffix, added only when compression is enabled; the exact suffix depends on the compression type
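As an illustration of the naming scheme, the following minimal Python sketch splits a blob name back into its building blocks. The function name `parse_blob_name` and the `BlobName` type are hypothetical helpers, not part of the connector. The sketch assumes the caller already knows the configured prefix and handles only the `.gz` suffix shown above; other compression types would need their own suffix.

```python
import re
from typing import NamedTuple


class BlobName(NamedTuple):
    """Building blocks of a blob name (hypothetical helper type)."""
    topic: str
    partition: int
    start_offset: int
    compressed: bool


# The pattern is matched against the whole name, and the last two numeric
# groups are taken as partition and start offset, so topic names that
# contain hyphens still parse correctly.
_NAME_RE = re.compile(
    r"(?P<topic>.+)-(?P<partition>\d+)-(?P<offset>\d+)(?P<gz>\.gz)?"
)


def parse_blob_name(name: str, prefix: str = "") -> BlobName:
    """Split '<prefix><topic>-<partition>-<start-offset>[.gz]' into parts."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with prefix {prefix!r}")
    match = _NAME_RE.fullmatch(name[len(prefix):])
    if match is None:
        raise ValueError(f"{name!r} does not follow the expected format")
    return BlobName(
        topic=match.group("topic"),
        partition=int(match.group("partition")),
        start_offset=int(match.group("offset")),
        compressed=match.group("gz") is not None,
    )
```

For example, `parse_blob_name("logs/my-topic-3-1500.gz", prefix="logs/")` yields the topic `my-topic`, partition `3`, start offset `1500`, and `compressed=True`.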
Data format
The connector output files are text files that contain one record per line (separated by \n).
Two data formats are available:
Flat structure: the default data format, in which field values are separated by commas (CSV). To use the CSV format, set format.output.type to csv.
Complex structure: the file stores messages in JSON Lines format (jsonl), with one record per line, where each line is a valid JSON object. To use the JSON format, set format.output.type to jsonl.
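To show what consuming a jsonl output file might look like, here is a minimal Python sketch that reads the contents of a blob already downloaded into memory. The function name `iter_jsonl_records` is hypothetical, and the sketch assumes UTF-8 encoded content and, when compression is enabled, gzip compression. It relies only on the properties described above: one record per line, each line a valid JSON object.

```python
import gzip
import json
from typing import Any, Iterator


def iter_jsonl_records(data: bytes, compressed: bool = False) -> Iterator[Any]:
    """Yield one parsed JSON object per line of a jsonl blob."""
    if compressed:
        # Assumption: the blob was written with gzip compression (.gz suffix).
        data = gzip.decompress(data)
    # Records are separated by "\n"; skip the empty string left after a
    # trailing newline.
    for line in data.decode("utf-8").split("\n"):
        if line.strip():
            yield json.loads(line)
```

The fields present in each object depend on the connector configuration, so the calling code should not assume a fixed set of keys.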