Google BigQuery
Overview

This page describes how to configure a Google BigQuery data source. With Matillion Data Loader, you can replicate and load your source data into your target destination.

Schema Drift Support: Yes. Read Schema Drift to learn more.

Return to any page of this wizard by clicking Previous.

Click X in the upper-right of the UI and then click Yes, discard to close the pipeline creation wizard.


Prerequisites

  • Refer to the Whitelisted IP Addresses topic before you begin. You may not be able to connect to certain data sources without first whitelisting the Batch IP addresses; in that case, connection tests will always fail and you won't be able to complete the pipeline.
  • You need permissions to create and manage resources in your Google Cloud Platform (GCP) account. Please contact your administrator where applicable.
  • You need access to a Google Cloud project that has one or more datasets containing at least one table.

Create pipeline

  1. In Matillion Data Loader, click Add pipeline.
  2. Choose Google BigQuery from the grid of data sources.
  3. Choose Batch Loading.

Connect to Google BigQuery

Configure the Google BigQuery database connection settings, specifying the following:

  • Google BigQuery OAuth: Select an OAuth entry from the dropdown menu, or click Add OAuth if one doesn't exist.
  • OAuth label: Provide a descriptive name for the OAuth entry and then click Authorize. A new browser tab will open, where Google will ask you to confirm authorization using valid credentials.
  • Project ID: The Google Cloud project to use. For more information, read Creating and managing projects.
  • Dataset ID: The Google BigQuery dataset to load data into. For more information, read Introduction to datasets.
  • Advanced settings: Additional JDBC parameters or connection settings. Click Advanced settings, choose a parameter from the dropdown menu, and enter a value. Click Add parameter for each extra parameter you want to add.
Error

If you receive the error "Response too large to return", consider specifying a destination table in your job configuration. Read Error messages for more information.

You may need to add two connection options:

  1. AllowLargeResultSets: true
  2. DestinationTable: <BigQuery-staging-project-name>.<BigQuery-staging-dataset-name>.${table}

These connection settings enable AllowLargeResultSets when staging the data. Retain the ${table} suffix, as it is resolved at runtime.
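To illustrate how the DestinationTable value behaves, the sketch below resolves the ${table} placeholder against a table name. The staging project and dataset names here are hypothetical, and the substitution logic is an assumption for illustration only; Matillion performs this resolution internally at runtime.

```python
# Illustrative sketch: how a DestinationTable template with a ${table}
# placeholder could resolve for each table being staged. The substitution
# behavior shown is an assumption; Matillion handles it internally.

def resolve_destination_table(template: str, table: str) -> str:
    """Replace the ${table} placeholder with the table currently staged."""
    return template.replace("${table}", table)

# Hypothetical staging project/dataset names, for illustration only.
connection_options = {
    "AllowLargeResultSets": "true",
    "DestinationTable": "my-staging-project.my_staging_dataset.${table}",
}

resolved = resolve_destination_table(
    connection_options["DestinationTable"], "orders"
)
print(resolved)  # my-staging-project.my_staging_dataset.orders
```

Because the placeholder is substituted once per table, a single DestinationTable setting serves every table in the pipeline.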

Click Continue.


Choose tables

Choose a BigQuery schema from the dropdown list.

Choose any tables you wish to include in the pipeline. Use the arrow buttons to move tables to the Tables to extract and load listbox and then reorder any tables with click-and-drag. Additionally, select multiple tables using the SHIFT key.

Click Continue with X tables to move forward.


Review your data set

Choose the columns from each table to include in the pipeline. By default, Matillion Data Loader selects all columns from a table.

Click Configure on a table to open Select columns. Use the arrow buttons to move columns out of the Columns to extract and load listbox. Order columns with click-and-drag. Select multiple columns using SHIFT.

Click Done to continue.

Click Continue once you have configured each table.


Choose destination

  1. Choose an existing destination or click Add a new destination.
  2. Select a destination from Snowflake, Amazon Redshift, or Google BigQuery.

Set frequency

  • Pipeline name: A descriptive label for your pipeline. This is how the pipeline appears on the pipeline dashboard and how Matillion Data Loader refers to it.
  • Sync every: The frequency at which the pipeline syncs. Day values range from 1 to 7, hour values from 1 to 23, and minute values from 5 to 59. This interval is also the delay before the first sync.

Currently, you can't specify a start time.
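Since the sync interval also acts as the delay before the first run, the schedule can be sketched as below. The creation timestamp and helper function are illustrative assumptions, not part of the product's API.

```python
# Sketch of the scheduling behavior described above: the sync interval
# doubles as the initial delay, so the first run lands one full interval
# after the pipeline is created.
from datetime import datetime, timedelta

def next_sync_times(created_at: datetime, interval: timedelta, count: int = 3):
    """Project the next `count` sync times from the creation timestamp."""
    return [created_at + interval * i for i in range(1, count + 1)]

# Hypothetical example: pipeline created at 09:00, syncing every 6 hours.
created = datetime(2023, 1, 1, 9, 0)
runs = next_sync_times(created, timedelta(hours=6))
print(runs[0])  # 2023-01-01 15:00:00 -- first sync, one interval later
```

With no configurable start time, shifting when syncs occur means recreating the pipeline at the desired offset.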

Once you are happy with your pipeline configuration, click Create pipeline to complete the process and add the pipeline to your dashboard.