Objective and Background

  • In BigQuery, we often save historical data by appending suffixes like _YYYYMMDD to table names. Over time, this can lead to an accumulation of many versions of data that need to be deleted.
  • However, the GCP console only allows deletion of one table at a time, and the bq command does not support wildcard deletions. To address this, we use the following approach.

One-liner

  • Replace <Project_name>, <Dataset_Name>, and <Table_Name_Pattern_To_Delete> in the command below.
bq query --project_id=<Project_name> --use_legacy_sql=false --format=csv --max_rows=1000 "SELECT table_id FROM \`<Product_Name>.<Dataset_Name>.__TABLES__\` WHERE table_id LIKE '%<Table_Name_Pattern_To_Delete>%'" | tail -n +2 | xargs -I {} bq rm -f -t <Product_Name>:<Dataset_Name>.{}

Additional Notes

‐ It is advisable to verify the target tables before executing the deletion. Use the following command to check:

bq query --project_id=<Project_name> --use_legacy_sql=false --format=csv --max_rows=1000 "SELECT table_id FROM \`<Product_Name>.<Dataset_Name>.__TABLES__\` WHERE table_id LIKE '%<Table_Name_Pattern_To_Delete>%'" | tail -n +2 | xargs -I {} echo "Table ID: {}"

‐ Setting a default expiration for datasets upon creation is considered best practice. However, this setting does not apply to already created tables, which must be manually deleted.

https://cloud.google.com/bigquery/docs/best-practices-storage#use-expiration-settings