Cloud Storage Unload
The Cloud Storage Unload component writes files from a table into a specified Google Cloud Storage (GCS) bucket.
For Google BigQuery users: this component cannot unload views in Google BigQuery. To unload a view, first create a table with the view's metadata using a Create Table component. Next, use a Table Input component to select the view, and connect it to a Table Output component to copy the data into the new table. Finally, use the new table in the Cloud Storage Unload component.
For Snowflake users: this component can unload views from your Snowflake data warehouse.
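The net effect of the three-component BigQuery workaround above (materialize the view as a real table, then unload the table) can be expressed as a single CREATE TABLE ... AS SELECT statement. A minimal sketch that builds that statement, with all dataset, view, and table names purely hypothetical:

```python
def materialize_view_sql(dataset: str, view: str, table: str) -> str:
    """Build a CTAS statement that copies a view's rows into a real table,
    mirroring the Create Table -> Table Input -> Table Output workaround."""
    return (
        f"CREATE TABLE `{dataset}.{table}` AS "
        f"SELECT * FROM `{dataset}.{view}`;"
    )

# Dataset, view, and table names here are hypothetical examples.
sql = materialize_view_sql("analytics", "orders_view", "orders_copy")
print(sql)
```

Once the table exists, it can be selected in the Cloud Storage Unload component like any other table.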
Properties
Snowflake Properties

Property | Setting | Description
---|---|---
Name | String | A human-readable name for the component. |
Stage | Select | Select your stage. To learn more about stages in Matillion ETL, read Manage Stages. |
Google Storage URL Location | GCS Bucket | Input, or select via the file path tree, the URL of the Google Cloud Storage bucket to which files will be unloaded. This follows a format such as gs://<bucket-name>/<path>. When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button. |
File Prefix | String | A prefix added to files during the unload. The default prefix is data. |
Storage Integration | Select | Select the storage integration. Storage integrations are required to permit Snowflake to read data from and write to a cloud storage location. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read Storage Integration Setup Guide. Note: Storage integrations can be configured to support Amazon S3, Google Cloud Storage, or Microsoft Azure cloud storage regardless of the cloud provider that hosts your Snowflake account. |
Warehouse | Select | A Snowflake warehouse to perform the data load. The special value, [Environment Default], uses the warehouse specified in the Matillion ETL environment. For more information about warehouses, please refer to the Snowflake documentation. |
Database | Select | A Snowflake database to create the new table in. The special value, [Environment Default], uses the database specified in the Matillion ETL environment. For more information about databases, please refer to the Snowflake documentation. |
Schema | Select | A Snowflake schema. A Snowflake schema is a logical grouping of database objects such as tables or views. The special value, [Environment Default], uses the schema specified in the Matillion ETL environment. |
Target Table | Select | Choose a target table from which to unload data into a GCS bucket. Warning: This table will be recreated, dropping any existing table of the same name, upon each run. |
Format | Select | A named file format that describes a set of staged data to access or load into Snowflake tables. The default is [Custom]. |
File Type | Select | The file type for the file format. Available file types are: CSV, JSON, and PARQUET. Note: Component properties beneath this property will change depending on the chosen file type. Matillion ETL provides settings based on the specific file type. For additional information on file type options, please refer to the Snowflake documentation. |
Compression | Select | Select the compression method if you wish to compress your data. If you do not wish to compress at all, select NONE. The default setting is AUTO. For more information, please refer to the relevant Snowflake documentation for your chosen file type. |
Nest Columns | Boolean | (JSON only) When "True", columns are nested. Default setting is "False". |
Record Delimiter | String | (CSV only) Input a delimiter for records. This can be one or more singlebyte or multibyte characters that separate records in a file. |
Field Delimiter | String | (CSV only) Input a delimiter for fields. This can be one or more singlebyte or multibyte characters that separate fields in a file. |
Date Format | String | (CSV only) Specify the date format. Snowflake accepts dates in the most common forms, including YYYY-MM-DD and DD-MON-YYYY among others. Additionally, all accepted timestamps are valid inputs for dates. Please refer to the Snowflake documentation for more information about date and time formats. |
Time Format | String | (CSV only) Specify the time format. Snowflake supports a single TIME data type, for storing times in the form of HH:MI:SS. TIME supports an optional precision parameter for fractional seconds, e.g. TIME(3). Time precision ranges between 0 (seconds) to 9 (nanoseconds). The default precision is 9. Please refer to the Snowflake documentation for more information about time and date formats. |
Timestamp Format | String | (CSV only) Specify the timestamp format. Snowflake supports three variations of timestamp: TIMESTAMP_LTZ, TIMESTAMP_NTZ, and TIMESTAMP_TZ. To learn more, please consult the Snowflake documentation. |
Escape | Select | (CSV only) Specify a single character to be used as the escape character for field values that are enclosed. Default is NONE. |
Escape Unenclosed Field | String | (CSV only) Specify a single character to be used as the escape character for unenclosed field values only. Accepts common escape sequences, octal values, or hex values, as well as NONE. The default is \\. If a character is specified in the "Escape" field, it overrides this field. If you have set a value in the Field Optionally Enclosed property, all fields will become enclosed, rendering this property redundant, in which case it is ignored. |
Field Optionally Enclosed | String | (CSV only) Specify a character used to enclose strings. The value can be NONE, single quote character ('), or double quote character ("). To use the single quote character, use the octal or hex representation (0x27) or the double single-quoted escape (''). Default is NONE. Note: When a field contains one of these characters, escape the field using the same character. For example, to escape a string like this: 1 "2" 3, use double quotation to escape, like this: 1 ""2"" 3. |
Null If | String | Specify a string used to replace SQL NULL values when unloading data to the chosen file. When an SQL NULL value is found, it is replaced with the first string listed in this property; for unloads, only the first string specified is used. |
Trim Space | Boolean | When "True", removes white space from fields. Default setting is "False". |
Overwrite | Boolean | When "True", overwrites existing files with matching names in the location where files are stored. The option does not remove any existing files that do not match the names of the files that the COPY command unloads. Default setting is "False". |
Single File | Boolean | When True, the unload will work in serial rather than parallel. This results in a slower unload but a single, complete file. The default setting is False. When True, no file extension is used in the output filename (regardless of the file type, and regardless of whether or not the file is compressed). When False, a filename prefix must be included in the path. |
Max File Size | Number | A number, greater than 0, that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Note: the actual file size and number of files unloaded are determined by the total amount of data and number of nodes available for parallel processing. |
Include Headers | Boolean | When "True" header rows are included. Default setting is "False". |
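The quoting behavior described under Field Optionally Enclosed can be reproduced with Python's csv module: when fields are enclosed in double quotes, an embedded quote character is escaped by doubling it, matching the 1 "2" 3 to 1 ""2"" 3 example above. A sketch for illustration only, not Matillion or Snowflake output:

```python
import csv
import io

# Enclose every field in double quotes; the csv writer doubles any embedded
# quote character by default (doublequote=True), mirroring the escape rule
# described for the Field Optionally Enclosed property.
buf = io.StringIO()
writer = csv.writer(buf, quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerow(['1 "2" 3', "plain"])
print(buf.getvalue())  # "1 ""2"" 3","plain"
```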
BigQuery Properties

Property | Setting | Description
---|---|---
Name | String | A human-readable name for the component. |
Project | Select | Select the BigQuery project containing the data to unload. [Environment Default] is set as the default and uses the project defined in the environment. |
Dataset | Select | Select the BigQuery dataset containing the data to unload. [Environment Default] is set as the default and uses the dataset defined in the environment. For more information on Google Cloud datasets, visit the official documentation. |
Table | Select | Select the table from which data will be unloaded to the GCS bucket. |
Google Storage URL Location | Select | Select the Google Cloud Storage bucket. Users can click through the file tree, or use the URL template gs://<bucket>/<path>. When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button. |
Output Object Name | String | Specify a name for the output object (the object that will be created in the chosen GCS bucket). |
Format | Select | Select the format of the data. Users can select one of: AVRO, CSV, JSON (New line delimited). |
Include Header | Yes/No | (CSV format only) Select "Yes" to add a header line containing the column names to the top of each file. The default setting is "Yes". |
Compression | Select | (AVRO format only) Select the AVRO file format compression type. Options include: Deflate, Snappy, or no compression (None). (CSV, JSON formats only) Select whether or not output files are to be compressed via gzip compression. |
Delimiter | Delimiting Character | (CSV format only) Specify a delimiter character to separate columns. The default value is a comma (,). A [TAB] character can be specified as "\t". |