Synchronization between Cervinodata and GBQ
When the data has been collected and stored in the Cervinodata database, it is synced to Google BigQuery (GBQ). To make sure that the end user receives a complete new data set at once, Cervinodata uses three data sets in a GBQ project.
The following three data sets are created automatically by Cervinodata upon the first synchronization:
Step 1: The synchronization data set
First, the new data is synced to the synchronization tables. This works as follows:
Each table is checked, and if it does not exist in GBQ, it is created. Next, for each table, the maximum id in GBQ is compared to the maximum id in Cervinodata. Records that exist in Cervinodata but not in GBQ are added, in batches of 10,000 records.
Next, Cervinodata checks which records have changed. These records are deleted from GBQ and replaced with the newer data from Cervinodata, also in batches of 10,000 records.
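The incremental logic described above can be sketched in Python. This is a minimal sketch under assumptions the text does not state (ids are contiguous integers; the function names are hypothetical); the actual delete/insert calls against GBQ are left as comments.

```python
# Sketch of Cervinodata's incremental sync, as described in the docs.
# BATCH_SIZE and the two helpers below are illustrative assumptions.

BATCH_SIZE = 10_000

def new_record_batches(gbq_max_id, cervino_max_id, batch_size=BATCH_SIZE):
    """Yield (first_id, last_id) ranges covering records that exist in
    Cervinodata but not yet in GBQ, in batches of `batch_size`."""
    start = gbq_max_id + 1
    while start <= cervino_max_id:
        end = min(start + batch_size - 1, cervino_max_id)
        yield (start, end)
        start = end + 1

def changed_record_batches(changed_ids, batch_size=BATCH_SIZE):
    """Yield batches of changed-record ids; for each batch the sync
    deletes the rows from GBQ and re-inserts the newer Cervinodata data."""
    ids = sorted(changed_ids)
    for i in range(0, len(ids), batch_size):
        # delete_from_gbq(batch); insert_from_cervinodata(batch)
        yield ids[i:i + batch_size]
```

For example, if GBQ holds ids up to 0 and Cervinodata up to 25,000, the new-record pass runs three batches: 1–10,000, 10,001–20,000, and 20,001–25,000.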
A log is kept in the description of the data set, recording the time the sync completed and which tables received new or updated records (including how many). Errors are also logged there.
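A log entry of the kind described above could be rendered as follows. Both the format and the `sync_log_entry` helper are illustrative assumptions, not Cervinodata's actual log format.

```python
# Hypothetical rendering of one sync-log entry for the data set
# description: completion time, per-table counts, and any errors.

def sync_log_entry(completed_at_utc, table_stats, errors=()):
    """Build one log line.

    completed_at_utc: 'YYYY-MM-DD HH:MM' string in UTC
    table_stats: {table_name: (new_count, updated_count)}
    errors: iterable of error messages
    """
    parts = [f"{completed_at_utc} UTC"]
    for table, (new, updated) in sorted(table_stats.items()):
        parts.append(f"{table}: {new} new, {updated} updated")
    parts.extend(f"ERROR: {e}" for e in errors)
    return " | ".join(parts)
```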
The duration of this process depends on the number of tables and the number of new or changed records, but it normally takes about 15 minutes.
Step 2: The backup data set
If the synchronization process completes without errors, step 2 creates a backup of the production data set in the backup data set. Tables are copied from production one by one: each backup table is deleted and re-created using the table and data from the production data set.
A log is kept of the time the backup process finished. Errors are also logged. The duration of this process depends on the number of tables and the total number of records, but it normally completes within 10 minutes.
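The delete-and-re-create copy per table can be expressed with BigQuery's standard `CREATE OR REPLACE TABLE` DDL, which drops and rebuilds the destination in one statement. A small sketch that generates one statement per table (the data set names are placeholders, and this is one possible way to implement the copy, not necessarily Cervinodata's):

```python
# Generate one BigQuery DDL statement per production table; running each
# statement replaces the backup copy with the current production table.

def backup_statements(tables, src="production", dst="backup"):
    """Return a CREATE OR REPLACE TABLE statement for every table,
    copying `src.table` into `dst.table`."""
    return [
        f"CREATE OR REPLACE TABLE `{dst}.{t}` AS SELECT * FROM `{src}.{t}`"
        for t in tables
    ]
```

Each generated statement would then be submitted as a query job, one table at a time, matching the one-by-one copy described above.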
Step 3: The production data set
When the backup procedure is successful, the synchronization data set is copied to the production data set. A log is kept of the time the process completed. Errors are also logged. The duration of this process depends on the number of tables and the total number of records, but it normally completes within 10 minutes.
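Taken together, the three steps form a gated pipeline: each step runs only if the previous one succeeded, so production is never overwritten after a failed sync or backup. A minimal sketch of that gating (the `run_pipeline` orchestration is an assumption, not Cervinodata's code):

```python
# Gated three-step pipeline: sync -> backup -> promote.
# A failure in any step stops the run; per the docs, errors are
# recorded in the data set description log.

def run_pipeline(sync, backup, promote):
    """Run the steps in order; return True only if all succeed."""
    for step in (sync, backup, promote):
        try:
            step()
        except Exception:
            return False  # later steps are skipped
    return True
```

For example, if the backup step raises, the promote step never runs and production keeps its previous contents.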
Timing & duration
- Every morning at 01:00 a.m. GMT+1 (Amsterdam) the data collection process is started. Because one or more platform APIs may be unavailable, the collection process is started again at 02:00 a.m.
- At 04:00 a.m. the synchronization starts as described above. This process normally completes between 04:30 a.m. and 05:00 a.m.
- The entire syncing process is shown in the schedule on the following page.
Google BigQuery Costs
We have created a support page (here) to give you an idea of the costs of Google BigQuery when used with Cervinodata. During the syncing process we try to minimize the load on GBQ to help keep costs down. So far, most of our users stay well within the limits of the free Google BigQuery plan.