Loading data files from Google Drive into ClickHouse
This is a question I encountered recently. A customer wanted to get started with ClickHouse but they had their historical data in CSV files in Google Drive.
Here’s how to get the job done. Main source is this StackOverflow answer.
Get the file ID:
- Go to your Google Drive in your browser.
- Right-click the file you want to download and click
Get shareable link
. The link looks like this:https://drive.google.com/file/d/XXX/view?usp=sharing
. Make note of the file IDXXX
; you will be needing it below.
Get an OAuth token:
- Go to OAuth 2.0 Playground
- In the
Select & authorize APIs
box, scroll down, expandDrive API v3
, and selecthttps://www.googleapis.com/auth/drive.readonly
. - Click
Authorize APIs
and thenExchange authorization code for tokens
. Copy the Access tokenYYY
; you will be needing it below.
Find the table schema:
Using the values obtained in the previous steps, create the DESCRIBE
command to read the table definition using the url
table function. Use the setting schema_inference_make_columns_nullable = 0
to make all columns non nullable.
DESCRIBE url(
'https://www.googleapis.com/drive/v3/files/XXX?alt=media',
'CSVWithNames',
headers('Authorization' = 'Bearer YYY')
)
SETTINGS schema_inference_make_columns_nullable = 0;
Choose the column(s) for your primary key from the displayed column list. I’ll assume you are going to use a column named ZZZ
as the primary key.
Create and populate the table:
CREATE OR REPLACE TABLE t
ENGINE = MergeTree
ORDER BY `ZZZ`
EMPTY AS
SELECT * FROM url(
'https://www.googleapis.com/drive/v3/files/XXX?alt=media',
'CSVWithNames',
headers('Authorization' = 'Bearer YYY')
)
SETTINGS schema_inference_make_columns_nullable = 0;
INSERT INTO t
SELECT * FROM url(
'https://www.googleapis.com/drive/v3/files/XXX?alt=media',
'CSVWithNames',
headers('Authorization' = 'Bearer YYY')
)
SETTINGS schema_inference_make_columns_nullable = 0;
Conclusion
- Files in Google Drive can be accessed programmatically through the GDrive API.
- Use the
url
table function to ingest such files into ClickHouse.
“This image is taken from Page 377 of Praktisches Kochbuch für die gewöhnliche und feinere Küche” by Medical Heritage Library, Inc. is licensed under CC BY-NC-SA 2.0 .