Multi-Value Dimensions in Apache Druid (Part 4)
Let’s do some customer segmentation in our little restaurant!
This is part 4 of the miniseries about multi-value dimensions in Apache Druid. The previous posts can be found here:
Last time we were able to learn a lot about orders in a fictional Italian restaurant. We managed to isolate groups of customer visits with the same items, the same number of the same items, and even the exact same sequence of orders.
Marketeers are very interested in this kind of information. They map it to customer segments, in order to create tailored offerings for different segments. Ideally, this breaks down to very small groups of customers with similar preferences.
Let’s see if Druid could help a marketeer!
We are using the
ristorante datasource from part 3, and start with the analysis how many customers bought each item:
Let’s find out who the customers were that ordered each basket. I hinted at it last time: this is where array aggregation functions come in. The
ARRAY_AGG function creates a SQL array from all the values in a group by bucket:
And just like that, we got the list of customers for each dish!
SQL arrays can be a bit unwieldy, though. For client software that cannot handle those, there is the new
STRING_AGG function in Druid that concatenates the values into a string, using a configurable delimiter.
And this solves the mystery of the
- Multi value functions and array aggregation functions do different things but complement each other.
- Together, they are powerful for order and basket analysis and for customer segmentation.