Poor Person's Telemetry Collector with ClickHouse

Did you know you can send files, like log files, directly to an endpoint in ClickHouse? For simple use cases this means you might not need to use a collector/agent like Open Telemetry Collector.
Let’s look at a simple example. I am using a 100 line sample from this example, taken from the ClickStack documentation.
The data is structured and represents a web server log and looks like this:
{"remote_addr":"54.36.149.41","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:14.000","request_type":"GET","request_path":"\/filter\/27|13 ,27| 5 ,p53","request_protocol":"HTTP\/1.1","status":"200","size":"30577","referer":"-","user_agent":"Mozilla\/5.0 (compatible; AhrefsBot\/6.1; +http:\/\/ahrefs.com\/robot\/)"}
{"remote_addr":"31.56.96.51","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:16.000","request_type":"GET","request_path":"\/image\/60844\/productModel\/200x200","request_protocol":"HTTP\/1.1","status":"200","size":"5667","referer":"https:\/\/www.zanbil.ir\/m\/filter\/b113","user_agent":"Mozilla\/5.0 (Linux; Android 6.0; ALE-L21 Build\/HuaweiALE-L21) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/66.0.3359.158 Mobile Safari\/537.36"}
{"remote_addr":"31.56.96.51","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:16.000","request_type":"GET","request_path":"\/image\/61474\/productModel\/200x200","request_protocol":"HTTP\/1.1","status":"200","size":"5379","referer":"https:\/\/www.zanbil.ir\/m\/filter\/b113","user_agent":"Mozilla\/5.0 (Linux; Android 6.0; ALE-L21 Build\/HuaweiALE-L21) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/66.0.3359.158 Mobile Safari\/537.36"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:17.000","request_type":"GET","request_path":"\/image\/14925\/productModel\/100x100","request_protocol":"HTTP\/1.1","status":"200","size":"1696","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"91.99.72.15","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:17.000","request_type":"GET","request_path":"\/product\/31893\/62100\/----PR257AT","request_protocol":"HTTP\/1.1","status":"200","size":"41483","referer":"-","user_agent":"Mozilla\/5.0 (Windows NT 6.2; Win64; x64; rv:16.0)Gecko\/16.0 Firefox\/16.0"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:17.000","request_type":"GET","request_path":"\/image\/23488\/productModel\/150x150","request_protocol":"HTTP\/1.1","status":"200","size":"2654","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/image\/45437\/productModel\/150x150","request_protocol":"HTTP\/1.1","status":"200","size":"3688","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/image\/576\/article\/100x100","request_protocol":"HTTP\/1.1","status":"200","size":"14776","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
{"remote_addr":"66.249.66.194","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/filter\/b41,b665,c150|,p56","request_protocol":"HTTP\/1.1","status":"200","size":"34277","referer":"-","user_agent":"Mozilla\/5.0 (compatible; Googlebot\/2.1; +http:\/\/www.google.com\/bot.html)"}
{"remote_addr":"40.77.167.129","remote_user":"-","run_time":"0","time_local":"2019-01-22 00:26:18.000","request_type":"GET","request_path":"\/image\/57710\/productModel\/100x100","request_protocol":"HTTP\/1.1","status":"200","size":"1695","referer":"-","user_agent":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)"}
In ClickHouse, infer the schema like so (you can run this with clickhouse local):
DESCRIBE TABLE file('access100.log')
FORMAT TSV
SETTINGS schema_inference_make_columns_nullable = 0;
remote_addr String
remote_user String
run_time String
time_local DateTime64(9)
request_type String
request_path String
request_protocol String
status String
size String
referer String
user_agent String
Create a table from this:
CREATE TABLE access100 (
remote_addr String,
remote_user String,
run_time String,
time_local DateTime64(9),
request_type String,
request_path String,
request_protocol String,
status String,
size String,
referer String,
user_agent String
)
ENGINE = MergeTree
ORDER BY ( request_path );
and then you can just send files by POSTing to the ClickHouse endpoint, as described here:
curl -u 'default:<password>' 'https://<node name>.eu-central-1.aws.clickhouse.cloud:8443/?query=INSERT%20INTO%20access100%20FORMAT%20JSONEachRow' --data-binary @./access100.log
The query is encoded in a URL parameter and the file is in the request body.
Encapsulating the query in a ClickHouse Cloud query endpoint
In ClickHouse Cloud, you can even hide the query from outside users entirely (and stop them from tampering with it.) You do this by defining a query endpoint. Write this query in an SQL Console window: define the query in the cloud console:
INSERT INTO access100 FORMAT JSONEachRow;
Follow the instructions to share the query.
Then you can stream the file to the endpoint without having the query explicitly in the URL:
curl -H "Content-Type: application/octet-stream" -s --user '<API key>:<API secret>' 'https://queries.clickhouse.cloud/run/<endpoint ID>' --data-binary @./access100.log
Notes:
- This approach uses cloud console authentication so you need to create an API key
- Make sure to set the
Content-Typeheader correctly.application/json, which is the default suggested by the console help, does not work in this case!
“This image is taken from Page 377 of Praktisches Kochbuch für die gewöhnliche und feinere Küche” by Medical Heritage Library, Inc. is licensed under CC BY-NC-SA 2.0 .