- July 28, 2023
Google Sheets Data to GCS & BigQuery – Automated
Problem Statement
Currently, the client stores its data in Google Sheets and builds reports in Looker. Because the sheets change continuously, the business cannot retain historical data.
To overcome this, the client wants to store the data in a GCS data lake and analyze it by performing various transformations in BigQuery. The complete process needs to be automated using an orchestration tool.
Methodology
The data is extracted from Google Sheets and ingested into BigQuery using Cloud Composer (Airflow).
- Creating a data lake in GCS to store the Google Sheets data
- Querying the Google Sheets data in BigQuery for analysis
- Automating the data pipeline from Google Sheets to BigQuery
Technology
- Google Cloud Platform Components / Tools
- Google Cloud Storage
- Cloud Composer
- BigQuery
- Google Sheets
Project Architecture
Our Approach
- Created a service account with the Viewer role in the project.
- Shared the Google Sheet with the service account's email address, using the Share option, to grant it access.
- Extracted the data from Google Sheets to Google Cloud Storage using Cloud Composer (Airflow).
- Used the Google transfer operator GoogleSheetsToGCSOperator in the Airflow DAG.
- Created the dataset and table in BigQuery using Airflow operators, and loaded the data into BigQuery using a Bash operator.
- Specified the source sheet ID and bucket name in the Airflow DAG.
- Ex:
  upload_sheet_to_gcs = GoogleSheetsToGCSOperator(
      task_id="upload_sheet_to_gcs",
      destination_bucket=BUCKET_NAME,
      spreadsheet_id=SPREADSHEET_ID,
  )
- Checked the BigQuery dataset for the loaded data.
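The approach above mentions loading the data into BigQuery with a Bash operator but does not show the command. The following is a minimal sketch of how such a `bq load` command string might be assembled. The function name `build_bq_load_command`, its parameters, and the dataset, table, bucket and file names are all hypothetical. The flags used (`--source_format`, `--skip_leading_rows`, `--autodetect`) are standard `bq load` options, but whether the original pipeline used them is an assumption.

```python
import shlex


def build_bq_load_command(
    dataset: str,
    table: str,
    bucket: str,
    object_name: str,
    skip_leading_rows: int = 1,
    autodetect: bool = True,
) -> str:
    """Return a `bq load` shell command that loads one CSV object from GCS.

    Assumptions (not from the original write-up): the exported sheet is a CSV
    file with a single header row, and schema autodetection is acceptable.
    """
    gcs_uri = f"gs://{bucket}/{object_name}"
    args = [
        "bq",
        "load",
        "--source_format=CSV",
        f"--skip_leading_rows={skip_leading_rows}",
    ]
    if autodetect:
        # Let BigQuery infer column names and types from the CSV.
        args.append("--autodetect")
    # Destination table in dataset.table form, followed by the source URI.
    args.extend([f"{dataset}.{table}", gcs_uri])
    # Quote each argument so that spaces or shell metacharacters in names
    # cannot break the command when it is run by a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_bq_load_command("sheets_ds", "sheet_data", "my-bucket", "Report_Sheet1.csv"))
```

The returned string could then be passed as the `bash_command` argument of an Airflow `BashOperator` task that runs after the `upload_sheet_to_gcs` task. Building the command in a small function keeps the quoting in one place and makes the load step easy to check outside Airflow.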