Synchronisation between Cervinodata and GBQ
Once the data has been collected and stored in the Cervinodata database, it is synced to Google BigQuery (GBQ). To make sure that end users always receive a complete new data set at once, rather than a partially updated one, Cervinodata uses three data sets in a GBQ project.
Cervinodata creates the following three data sets automatically during the first synchronisation:
- synchronisation
- production
- backup
Step 1: The synchronisation data set
First, the new data is synced to the synchronisation tables. This works as follows:
Existing tables are replaced by new tables. Records are added in batches of 500,000 records.
A log is kept in the description of the data set, recording the time the sync completed and how many records were added to the tables. Errors are also logged here.
The duration of this process depends on the number of tables and the number of records, but it normally takes about 15 minutes to complete.
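The batching described above can be illustrated with a minimal Python sketch. It is not Cervinodata's actual implementation: the function names `batched` and `batch_sizes` are hypothetical, and only the batch size of 500,000 comes from the text.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

BATCH_SIZE = 500_000  # batch size stated in the text


def batched(records: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, partially filled batch
        yield batch


def batch_sizes(total_records: int, size: int = BATCH_SIZE) -> List[int]:
    """Return the size of each batch needed to load `total_records` rows."""
    return [len(b) for b in batched(range(total_records), size)]
```

Under these assumptions, a table of 1,200,000 records would be written in three batches: two full batches of 500,000 records and one final batch of 200,000 records.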
Step 2: The backup data set
If there are no errors in the synchronisation process, step 2 is to back up the production data set to the backup data set. Tables are copied from production one by one: each backup table is deleted and re-created from the corresponding table in the production data set.
A log is kept of the time the backup process finished. Errors are also logged. The duration of this backup process depends on the number of tables and total number of records but normally completes within 10 minutes.
Step 3: The production data set
When the backup procedure is successful, the tables are copied from the synchronisation data set to the production data set. A log is kept of the time the process completed. Errors are also logged. The duration of this process depends on the number of tables and the total number of records, but it normally completes within 10 minutes.
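The order of the three steps, and the fact that each step only runs when the previous one succeeded, can be sketched in Python as follows. This is an in-memory simulation for illustration only, not Cervinodata's code and not the BigQuery API: data sets are modelled as plain dictionaries, and the names `copy_tables` and `run_sync` are hypothetical.

```python
from typing import Dict, List

Table = List[dict]            # a table is modelled as a list of row dicts
Dataset = Dict[str, Table]    # a data set maps table names to tables


def copy_tables(source: Dataset, target: Dataset) -> None:
    """Replace each table in `target` with a copy of the same table in `source`."""
    for name, rows in source.items():
        # dict(row) raises TypeError for a malformed (non-mapping) row
        target[name] = [dict(row) for row in rows]


def run_sync(new_data: Dataset, project: Dict[str, Dataset]) -> List[str]:
    """Run the three steps in order and stop at the first error."""
    log: List[str] = []
    try:
        # Step 1: load the new data into the synchronisation data set
        copy_tables(new_data, project["synchronisation"])
        log.append("synchronisation: ok")
        # Step 2: back up the current production tables
        copy_tables(project["production"], project["backup"])
        log.append("backup: ok")
        # Step 3: publish the synchronised tables to production
        copy_tables(project["synchronisation"], project["production"])
        log.append("production: ok")
    except Exception as exc:  # later steps are skipped after an error
        log.append(f"error: {exc}")
    return log


project = {
    "synchronisation": {},
    "production": {"ads": [{"clicks": 1}]},
    "backup": {},
}
result_log = run_sync({"ads": [{"clicks": 2}]}, project)
```

In this sketch, production is only touched in the last step, so a failure in step 1 or step 2 leaves the tables that end users query unchanged, and the backup data set always holds the previous production state.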
Timing & duration
- Every morning at 01:00 a.m. GMT+1 (Amsterdam), the data collection process starts.
- At 03:00 a.m., the synchronisation starts as described in the steps above. This process completes between 03:30 a.m. and 04:00 a.m.
- The entire syncing process is shown in the schedule on the following page.
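As a rough check of the timing above, the following Python sketch adds the typical step durations mentioned earlier (about 15, 10 and 10 minutes) to the 03:00 a.m. start time. The durations are the "normal" figures from the text, not guarantees; the fixed GMT+1 offset follows the text as written, and the function name `estimated_schedule` and the example date are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GMT_PLUS_1 = timezone(timedelta(hours=1))  # fixed offset, as stated in the text

# Typical durations in minutes, taken from the step descriptions
STEP_MINUTES = {"synchronisation": 15, "backup": 10, "production": 10}


def estimated_schedule(day: datetime) -> dict:
    """Return the estimated completion time of each step for the given day."""
    current = day.replace(hour=3, minute=0, second=0, microsecond=0,
                          tzinfo=GMT_PLUS_1)
    schedule = {}
    for step, minutes in STEP_MINUTES.items():
        current += timedelta(minutes=minutes)
        schedule[step] = current
    return schedule


schedule = estimated_schedule(datetime(2024, 1, 15))  # arbitrary example date
finished = schedule["production"]
print(finished.strftime("%H:%M"))                           # 03:35
print(finished.astimezone(timezone.utc).strftime("%H:%M"))  # 02:35
```

With these typical durations the whole process would finish at about 03:35 a.m., which falls inside the 03:30 a.m. to 04:00 a.m. window given above. Larger accounts with more tables or records may take longer.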
Google BigQuery Costs
We have created a support page (here) to give you an idea of the costs of Google BigQuery when used with Cervinodata. During the syncing process we try to minimise the load on GBQ to help keep costs down. So far, most of our users stay well within the limits of the free Google BigQuery plan.