Apache Druid has always had built-in data lifecycle management by way of retention rules. Using fixed time intervals or relative periods, you can tell Druid, for example, to retain only data segments that are no older than a given number of days.
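As a rough illustration of what such a rule set looks like, here is a minimal sketch that builds a Druid retention rule chain in Python. The rule types loadByPeriod and dropForever and the _default_tier name come from Druid's rule configuration; the helper name build_retention_rules and the replica count of 2 are assumptions made for this example, not anything prescribed by Druid or Polaris.

```python
import json


def build_retention_rules(period: str, replicas: int = 2) -> list:
    """Build a Druid-style rule chain: keep recent data, drop the rest.

    `period` is an ISO 8601 duration such as "P7D".
    `replicas` is an illustrative default, not a recommendation.
    """
    return [
        # Rules are evaluated top to bottom; the first match wins.
        {
            "type": "loadByPeriod",
            "period": period,
            "tieredReplicants": {"_default_tier": replicas},
        },
        # Anything not matched by the rule above is dropped from the cluster.
        {"type": "dropForever"},
    ]


if __name__ == "__main__":
    # Print the JSON document that would be submitted as the rule set.
    print(json.dumps(build_retention_rules("P7D"), indent=2))
```

The order matters: because the first matching rule applies, the load rule has to come before the catch-all drop rule.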

The mid-August release brings retention management to Imply Polaris, the fully managed analytics service powered by Druid. You can set the retention policy per table. Here is how it's done:

In the Tables view, select the ... menu for the table that you want to set the retention policy for.

Tables view with context menu

In the Edit table screen, find the barrel icon with Data retention next to it. Select Specific, and enter the desired period. The format is an ISO 8601 duration: for instance, P7D means 7 days (counted back from the current date). Any data older than that (by primary timestamp) is dropped and scheduled for permanent deletion after 30 days.

Table editor with retention menu

Then hit Update to apply the changes.
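To make the retention period concrete, the following sketch shows how a duration such as P7D translates into a cutoff timestamp. It only illustrates the semantics and is not how Polaris implements retention. The function names are hypothetical, and the parser deliberately handles only week and day components (for example P7D or P2W), since Python's standard library has no ISO 8601 duration parser.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepts only week and/or day components, e.g. "P7D" or "P2W".
_DURATION = re.compile(r"^P(?:(\d+)W)?(?:(\d+)D)?$")


def parse_simple_duration(text: str) -> timedelta:
    """Parse a day/week-only ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    weeks, days = (int(g) if g else 0 for g in match.groups())
    return timedelta(weeks=weeks, days=days)


def retention_cutoff(period: str, now: datetime) -> datetime:
    """Return the oldest timestamp that is still inside the retention period."""
    return now - parse_simple_duration(period)


def is_outside_retention(timestamp: datetime, period: str, now: datetime) -> bool:
    """True if a row's primary timestamp is older than the retention cutoff."""
    return timestamp < retention_cutoff(period, now)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("Data older than", retention_cutoff("P7D", now), "falls outside P7D")
```

With a retention period of P7D, a row whose primary timestamp is eight days old falls outside the period, while a row from six days ago is retained.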

Conclusion

  • Data retention management is now available in Polaris.
  • Unlike the Druid default (which retains data in deep storage indefinitely), data dropped from Polaris will be deleted permanently after 30 days.