This page describes how to configure a Google Sheets data source. With Matillion Data Loader, you can replicate and load your source data into your target destination.
Schema Drift Support: Yes. Read Schema Drift to learn more.
Return to any page of this wizard by clicking Previous.
Click X in the upper-right of the UI and then click Yes, discard to close the pipeline creation wizard.
- Please refer to the Whitelisted IP Addresses topic before you begin. You may not be able to connect to certain data sources without first whitelisting the Batch IP addresses. In these circumstances, connection tests will always fail and you will not be able to complete the pipeline.
- The Google Sheets and Google Drive APIs should be enabled for the service account, if connecting via it.
- Google Sheets target tables are dropped and recreated each run. This handles any metadata changes at the data source.
_tempversion of the target table is created, which represents the staging table. This is then copied into the target table, defined by the UI. This makes this a packaged load process where multiple sheets from a spreadsheet can be loaded in the same Pipeline—it's not an incremental load.
- If a range specified is less than the number of columns in the sheet, the unspecified columns will be created, and the target will be empty. For example, if a sheet has columns
C, and the range
A1:B5is specified, column
Cwill be empty.
- If a header and range are specified, the range will be ignored.
- In Matillion Data Loader, click Add pipeline.
- Choose Google Sheets from the grid of data sources.
- Choose Batch Loading.
Connect to Google Sheets
Configure the Google Sheets database connection settings, specifying the following:
|Google Sheets OAuth||Select an OAuth entry from the dropdown menu or click Add OAuth if one doesn't exist.|
|OAuth label||Provide a descriptive name for the OAuth entry and then click Authorize. A new browser tab will open, where Google will ask you to confirm authorization using valid credentials.|
|Spreadsheet||The Google sheet you wish to use. Please note, the pipeline will fail if the Google sheet name has a leading or trailing space.|
|Advanced settings||Additional JDBC parameters or connection settings. Click Advanced settings and then choose a parameter from the dropdown menu and enter a value for the parameter. Click Add parameter for each extra parameter you want to add.|
- All sheets of a Google Sheets file are available. Define a range for each sheet by entering a value in the Range column.
- You can indicate which sheets use a header row by toggling Contains header.
- You can remove a sheet from the pipeline by selecting the - symbol alongside the sheet.
- You can add or replace a sheet by clicking +Add another sheet.
Review your data set
- Here you can review all of the sheets included in the pipeline, and the columns of each sheet. If you don't need to make any changes, click Continue.
- Make changes to columns in each sheet by clicking Configure alongside the appropriate sheet. Use the arrow buttons to move columns to the Columns to extract and load listbox and then reorder any columns with click-and-drag. Additionally, select multiple columns using the
SHIFTkey. Use the Filter field to locate columns with a text string. Click Done to finish.
- Choose an existing destination or click Add a new destination.
- Select Snowflake or Amazon Redshift as the destination.
|Pipeline name||A descriptive label for your pipeline. This is how the pipeline appears on the pipeline dashboard and how Matillion Data Loader refers to the pipeline.|
|Sync every||The frequency at which the pipeline should sync. Day values include 1—7. Hour values include 1—23. Minute values include 5—59. The input is also the length of delay before the first sync.|
Currently, you can't specify a start time.
Once you are happy with your pipeline configuration, click Create pipeline to complete the process and add the pipeline to your dashboard.