Google Drive Table Component
Google Drive Table
Create a table that references data stored in Google Drive.
Note that the Matillion ETL instance must have access to this data; that is, the GCP credentials used in Matillion must be from an account that has the right to at least view the data. Publically viewable sheets can be accessed and used by Matillion regardless of credentials.
Should a sheet be specified that Matillion does not have proper access to, this component may still successfully run as it is merely referencing the data in a table. However, when the table itself is used (e.g. sampled), an error may then occur when the referenced data cannot be accessed.
Note: It is essential that your GCP credentials have the Google Drive API enabled. See the GCP OAuth documentation for details. Due to limitations of how GBQ communicates with Google Drive, it may be necessary to grant additional API scope - even beyond 'Allow full access to all Cloud APIs'. To do this, use the following command (completed with your own information in the <brackets>) in the GBQ console:
gcloud beta compute instances set-scopes <your instance> --service-account <firstname.lastname@example.org> --scopes https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive
|Name||Text||The descriptive name for the component.
This is automatically determined from the table name when the Table Name property is first set.
|Project||Text||Enter the name of the Google Cloud Platform Project that the table belongs to.|
|Dataset||Text||Enter the name of the Google Cloud Platform Dataset that the table belongs to.|
|New Table Name||Text||Select an existing table to load data into.|
|Table Metadata||Column Name||The name of the new column|
|Data Type||For more information on available BigQuery data types please refer to the GCP documentation.
String: this type can hold any kind of data, subject to a maximum size.
Integer: this type is suitable for whole-number types (no decimals).
Float: this type is suitable for numeric types, with or without decimals.
Numeric: this data type is suitable for data of an exact numeric value, allowing 38 digits of precision and 9 decimal digits of scale.
Boolean: this type is suitable for data that is either 'true' or 'false'.
Date: a formatted date object without time. See the GCP documentation.
Time: a formatted time object without date. See the GCP documentation.
DateTime: a formatted timestamp containing both date and time that is easily readable by the user. See the GCP documentation.
Timestamp: this type is a timestamp left unformatted (exists as Unix/Epoch Time).
|Define Nested Metadata||When the Define Nested Metadata checkbox is ticked inside the 'Table Metadata' property, a tree-structure can be defined for metadata.|
|Mode||The field mode. Default is 'NULLABLE'.
NULLABLE: Field allows null values
REQUIRED: Field does not accept null values
REPEATED: Field can accept multiple values
|Create/Replace||Select||Create: The default option, creates a new table. This will generate an error if a table with the same name already
exists, but will never destroy existing data.
Create if not exists: This will only create a new table if a table of the same name does not already exist. It will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct it, which could lead to errors later in the job if you did not expect an existing table to exist, or to have a different schema to the one defined in this component.
Replace This drops any existing table of the same name, and then creates a new table. This guarantees that after the component succeeds the table matches the schema defined in this component, however any existing data in an existing table will be lost.Note: Since other database objects may depends upon this table,
drop ... cascadeis used which may actually remove many other database objects.
|Google Drive URL Location||Text||The URL of the Google Drive file. This follows the format https://drive.google.com/open?id=(fileid).|
|Compression||Select||Whether the input file is compressed in GZIP format or not compressed at all.|
|File Format||Select||Google Sheets
JSON (New line delimited): this requires an additional "JSON Format".
|Number of Errors Allowed||Text||The maximum number of individual parsing errors that cause the whole load to fail. Values up to this will be substituted as null values. This value defaults to 0.|
|Ignore Unknown Values||Select||Yes: Accept rows that contain values that do not match the schema. Unknown values are ignored. Will ignore extra
values at the end of a line for CSV files.
No: Omit any rows with invalid values.
|Delimiter||Select||The delimiter that separates columns. The default is a Comma. A [TAB] character can be specified as "\t".|
|CSV Quoter||Text||Specifies the character to be used as the quote character when using the CSV option.|
|Encoding||Select||The encoding the data is in. This defaults to UTF-8.|
|Header Rows To Skip||Text||The number of rows at the top of the file to ignore - defaults to 0.|
|Allow quoted newlines||Select||Yes: Allow a CSV value to contain a newline character when the value is encased in quotation marks.
No: A new line character, regardless of quotations, is always considered a new row.
|Allow Jagged Rows||Select||Yes: Missing values are treated as 'null' but accepted.
No: Rows with missing data are treated as bad records. Note: A bad record will count toward the 'Maximum Errors' count.
In this example, we have a small list of x and y coordinates in a google sheet and we want to transfer that to a table. The google sheet doc is shown below. Note that the document has no headers (such as x and y) and contains only integer values.
To tackle this problem we use the Google Drive component in an orchestration job.
The Google Drive component Selecting 'replace' in the 'Create/Replace' property allows us to run this multiple times (if desired) without deleting or manually replacing the table each time. Since the component is creating a new table, it requires knowledge of how to create the columns correctly. Since we are expecting x and y coordinates, we make 2 columns (x and y) and ensure they are both integers. Finally, we set the component to accept Google Sheets and give it the Google Drive URL for our sheet.
Where does this Google Drive URL come from? By browsing to the Google Drive account and right-clicking the relevant file, the 'Get Shareable Link' option will generate the Google Drive link for this document.
Note: that this is generally different to the Google Docs link that is generated through the 'Share...' link. Only a Google Drive link will allow this component to correctly access the file.
Running the job will take the sheet data and load it into the specified table. Finally, we set up a second job (A Transformation job) with a single 'Table Input' component to read the new data. This component can link to many others in order to transform the data but for now, we simply use the 'Sample' tag to ensure the data has been transfered correctly.