Data Cookbook Kitchen

Refreshable Materialized Views (RMV) are a way to define scheduled data transformations in ClickHouse. A materialized view definition becomes an RMV by adding a REFRESH clause to the DDL like so:

CREATE TABLE ukp_bytown (
    town LowCardinality(String),
    total UInt32
)
ENGINE = MergeTree
ORDER BY (town);

CREATE MATERIALIZED VIEW rmv_ukp
REFRESH EVERY 5 SECOND TO ukp_bytown AS
SELECT
  town,
  sum(price) AS total
FROM uk_price_paid
GROUP BY town;

This would run every 5 seconds and compute an aggregate over the whole uk_price_paid table every time. What happens if the refresh query runs longer than 5 seconds? At most one refresh may be running at a time, for a given view. That means, subsequent invocations of the schedule will be skipped until the current run is finished. You can also use the REFRESH AFTER clause, in which case the next invocation will always start a specified time interval after the previous one has finished, but then you lose the fixed schedule. In either case, you can’t guarantee that there will be a fresh copy of your data every 5 seconds.

A customer was asking how to actually measure the execution time for a given RMV. This way, they want to be able to commit to a data freshness SLO. But how to monitor execution time? Let’s take a glance using ClickHouse Cloud.

Checking out system.view_refreshes

The first stopping point is system.view_refreshes. Unfortunately, this one gives only the last execution of the RMV. I can do something like:

SELECT * FROM system.view_refreshes
WHERE view = 'rmv_ukp'
FORMAT Vertical;
Row 1:
──────
database:                 default
view:                     rmv_ukp
uuid:                     e8efe4fb-e5f1-410a-bf3c-9ad96735be8c
status:                   Scheduled
last_success_time:        2026-03-17 11:31:40
last_success_duration_ms: 483
last_refresh_time:        2026-03-17 11:31:40
last_refresh_replica:     c-auburn-ea-63-server-vej09as-0
next_refresh_time:        2026-03-17 11:31:45
exception:                
retry:                    0
progress:                 ᴺᵁᴸᴸ
read_rows:                ᴺᵁᴸᴸ
read_bytes:               ᴺᵁᴸᴸ
total_rows:               ᴺᵁᴸᴸ
written_rows:             ᴺᵁᴸᴸ
written_bytes:            ᴺᵁᴸᴸ

What I would really like is a statistical analysis and query time percentiles over a given time period. So I need to dig deeper, and this means looking at system.query_log. But how to identify the query that Clickhouse actually runs?

Performance percentiles from query_log

Let’s first find out how the RMV queries show up in the query log. In order to create a suitable filter, I extract the DDL as it is stored in system.tables:

SELECT create_table_query
FROM system.tables
WHERE name = 'rmv_ukp';

ClickHouse has normalized the query and added some bookkeeping information:

CREATE MATERIALIZED VIEW default.rmv_ukp REFRESH EVERY 5 SECOND TO default.ukp_bytown (`town` LowCardinality(String), `total` UInt64) DEFINER = `sql-console:hellmar.becker@clickhouse.com` SQL SECURITY DEFINER AS SELECT town, sum(price) AS total FROM default.uk_price_paid GROUP BY town

but if I take all the stuff after the first AS, this gives me the SELECT part of the query:

SELECT regexpExtract(create_table_query, '\s+AS\s+(.*)$')
FROM system.tables
WHERE name = 'rmv_ukp';
SELECT town, sum(price) AS total FROM default.uk_price_paid GROUP BY town

Now I can surround this result with ‘%’ characters and drop it into a LIKE filter and I get to see what is really executed:

SELECT query
FROM clusterAllReplicas(default, merge(system, '^query_log*'))
WHERE query LIKE (
    SELECT concat('%', regexpExtract(create_table_query, '\s+AS\s+(.*)$'), '%')
    FROM system.tables
    WHERE name = 'rmv_ukp'
    LIMIT 1
  )
LIMIT 1;
INSERT INTO default.`.tmp.inner_id.e8efe4fb-e5f1-410a-bf3c-9ad96735be8c` (town, total) SELECT town, sum(price) AS total FROM default.uk_price_paid GROUP BY town

The RMV INSERTs into an intermediate table with a name generated by ClickHouse. When the INSERT finishes successfully, the intermediate table is swapped out with the target table in one atomic step.

There are a few more things to consider:

  • In a cluster like we have in ClickHouse Cloud, each node has its own version of query_log. To get a comprehensive view, I need to collect them all using clusterAllReplicas. I sneaked that in above already.
  • query_log logs two entries for each query - the beginning and the end. Only the end records the query duration, so let’s filter by type = 'QueryFinish'.
  • The quantiles() function gives me multiple percentile values with only one pass through the data.
  • Finally, my analytical query itself will also match the filter and I don’t want that to be included. So I group by normalized_query_hash, which is a value that represents the query shape (minus concrete values for filters and parameters.) Then I keep only the rows for the hash that occurs most frequently. This way, I achieve an effect similar to the ... | grep -v grep pattern in Linux.

Here’s the final query, only looking at invocations during the current day:

SELECT
  count(),
  quantiles(0.5, 0.9, 0.95, 0.99)(query_duration_ms),
  normalized_query_hash,
  any(query)
FROM clusterAllReplicas(default, merge(system, '^query_log*'))
WHERE query LIKE (
    SELECT concat('%', regexpExtract(create_table_query, '\s+AS\s+(.*)$'), '%')
    FROM system.tables
    WHERE name = 'rmv_ukp'
    LIMIT 1
  )
  AND type = 'QueryFinish'
  AND event_date = today()
GROUP BY normalized_query_hash
ORDER BY count() DESC
LIMIT 1;

The result for the quantiles is [196,901,983,1084]: The p95 execution time is below 1 second on my cluster, and even p99 is only slightly above it. So I am perfectly good running this every 5 seconds!

Conclusion

  • The last invocation of an RMV is logged in view_refreshes but for a comprehensive overview one has to look at query_log.
  • When grepping for the RMV query in the logs, be aware that the INSERT statement goes to an intermediate staging table.
  • Use quantiles to calculate multiple percentiles in one fell swoop.

This image is taken from Page 377 of Praktisches Kochbuch für die gewöhnliche und feinere Küche” by Medical Heritage Library, Inc. is licensed under CC BY-NC-SA 2.0 .