Data Cookbook Kitchen

Did you know you can send files, like log files, directly to an endpoint in ClickHouse? For simple use cases this means you might not need to use a collector/agent like Open Telemetry Collector.

Let’s look at a simple example. I am using a 100 line sample from this example, taken from the ClickStack documentation.

The data is structured and represents a web server log and looks like this:

{"remote_addr":"54.36.149.41","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:14.000","request_type":"GET","request_path":"\/filter\/27|13 ,27|  5 ,p53","request_protocol":"HTTP\/1.1","status":"200","size":"30577","referer":"-","user_agent":"Mozilla\/5.0 (compatible; AhrefsBot\/6.1; +http:\/\/ahrefs.com\/robot\/)"}
{"remote_addr":"31.56.96.51","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:16.000","request_type":"GET","request_path":"\/image\/60844\/productModel\/200x200","request_protocol":"HTTP\/1.1","status":"200","size":"5667","referer":"https:\/\/www.zanbil.ir\/m\/filter\/b113","user_agent":"Mozilla\/5.0 (Linux; Android 6.0; ALE-L21 Build\/HuaweiALE-L21) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/66.0.3359.158 Mobile Safari\/537.36"}
{"remote_addr":"31.56.96.51","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:16.000","request_type":"GET","request_path":"\/image\/61474\/productModel\/200x200","request_protocol":"HTTP\/1.1","status":"200","size":"5379","referer":"https:\/\/www.zanbil.ir\/m\/filter\/b113","user_agent":"Mozilla\/5.0 (Linux; Android 6.0; ALE-L21 Build\/HuaweiALE-L21) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/66.0.3359.158 Mobile Safari\/537.36"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:17.000","request_type":"GET","request_path":"\/image\/14925\/productModel\/100x100","request_protocol":"HTTP\/1.1","status":"200","size":"1696","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"91.99.72.15","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:17.000","request_type":"GET","request_path":"\/product\/31893\/62100\/----PR257AT","request_protocol":"HTTP\/1.1","status":"200","size":"41483","referer":"-","user_agent":"Mozilla\/5.0 (Windows NT 6.2; Win64; x64; rv:16.0)Gecko\/16.0 Firefox\/16.0"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:17.000","request_type":"GET","request_path":"\/image\/23488\/productModel\/150x150","request_protocol":"HTTP\/1.1","status":"200","size":"2654","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/image\/45437\/productModel\/150x150","request_protocol":"HTTP\/1.1","status":"200","size":"3688","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/image\/576\/article\/100x100","request_protocol":"HTTP\/1.1","status":"200","size":"14776","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"66.249.66.194","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/filter\/b41,b665,c150|,p56","request_protocol":"HTTP\/1.1","status":"200","size":"34277","referer":"-","user_agent":"Mozilla\/5.0 (compatible; Googlebot\/2.1; +http:\/\/www.google.com\/bot.html)"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/image\/57710\/productModel\/100x100","request_protocol":"HTTP\/1.1","status":"200","size":"1695","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}

In ClickHouse, infer the schema like so (you can run this with clickhouse local):

DESCRIBE TABLE file('access100.log')
FORMAT TSV
SETTINGS schema_inference_make_columns_nullable = 0;
remote_addr	String					
remote_user	String					
run_time	String					
time_local	DateTime64(9)					
request_type	String					
request_path	String					
request_protocol	String					
status	String					
size	String					
referer	String					
user_agent	String					 

Create a table from this:

CREATE TABLE access100 (
    remote_addr	String,					
    remote_user	String,				
    run_time	String,				
    time_local	DateTime64(9),					
    request_type	String,					
    request_path	String,					
    request_protocol	String,					
    status	String,					
    size	String,					
    referer	String,					
    user_agent	String
)
ENGINE = MergeTree
ORDER BY ( request_path );

and then you can just send files by POSTing to the ClickHouse endpoint, as described here:

curl -u 'default:<password>' 'https://<node name>.eu-central-1.aws.clickhouse.cloud:8443/?query=INSERT%20INTO%20access100%20FORMAT%20JSONEachRow' --data-binary @./access100.log

The query is encoded in a URL parameter and the file is in the request body.

Encapsulating the query in a ClickHouse Cloud query endpoint

In ClickHouse Cloud, you can even hide the query from outside users entirely (and stop them from tampering with it.) You do this by defining a query endpoint. Write this query in an SQL Console window: define the query in the cloud console:

INSERT INTO access100 FORMAT JSONEachRow;

Follow the instructions to share the query.

Then you can stream the file to the endpoint without having the query explicitly in the URL:

curl -H "Content-Type: application/octet-stream" -s --user '<API key>:<API secret>' 'https://queries.clickhouse.cloud/run/<endpoint ID>' --data-binary @./access100.log

Notes:

  • This approach uses cloud console authentication so you need to create an API key
  • Make sure to set the Content-Type header correctly. application/json, which is the default suggested by the console help, does not work in this case!

This image is taken from Page 377 of Praktisches Kochbuch für die gewöhnliche und feinere Küche” by Medical Heritage Library, Inc. is licensed under CC BY-NC-SA 2.0 .