Configure data retention thresholds in Aiven for ClickHouse®’s tiered storage#

Important

Aiven for ClickHouse® tiered storage is a limited availability feature. If you’re interested in trying out this feature, contact the sales team at sales@Aiven.io.

Learn to control how your data is distributed between storage devices in the tiered storage of an Aiven for ClickHouse service. Check out how to configure tables so that your data is automatically written either to SSD or object storage as needed.

About data retention control#

If you have the tiered storage feature enabled on your Aiven for ClickHouse service, your data is distributed between two storage devices (tiers). The data is stored either on SSD or in object storage, depending on whether and how you configure this behavior. By default, data is moved from SSD to object storage when SSD reaches 80% of its capacity (default size-based data retention policy).

You may want to change this default data distribution behavior by configuring your table’s schema by adding a TTL (time-to-live) clause. Such a configuration allows ignoring the SSD-capacity threshold and moving the data from SSD to object storage based on how long the data is there on your SSD.

To enable this time-based data distribution mechanism, you can set up a retention policy (threshold) on a table level by using the TTL clause. For data retention control purposes, the TTL clause uses the following:

  • Data item of the Date or DateTime type as a reference point in time

  • INTERVAL clause as a time period to elapse between the reference point and the data transfer to object storage

Prerequisites#

  • Aiven organization

  • Tiered storage feature enabled on the project level and on the table level

  • Command line tool (ClickHouse client)

Configure time-based data retention#

  1. Connect to your Aiven for ClickHouse service using, for example, the ClickHouse client (CLI).

  2. Select a database for operations you intend to perform.

    USE database-name
    

Add TTL to a new table#

Create a new table with the storage_policy setting set to tiered (to enable the feature) and TTL (time-to-live) configured to add a time-based data retention threshold on the table.

CREATE TABLE example_table (
    SearchDate Date,
    SearchID UInt64,
    SearchPhrase String
)
ENGINE = MergeTree
ORDER BY (SearchDate, SearchID)
PARTITION BY toYYYYMM(SearchDate)
TTL SearchDate + INTERVAL 1 WEEK TO VOLUME 'remote'
SETTINGS storage_policy = 'tiered';

Add TTL to an existing table#

Use the MODIFY TTL clause:

ALTER TABLE database_name.table_name MODIFY TTL ttl_expression;

Update TTL to an existing table#

Change an already configured TTL in an existing table by using the ALTER TABLE MODIFY TTL clause:

ALTER TABLE database_name.table_name MODIFY TTL ttl_expression;

Result

You have your time-based data retention policy set up. From now on, when data is on your SSD longer than a specified time period, it’s moved to object storage, regardless of how much of SSD capacity is still available.

What’s next#