Azure Blob Storage Unload
This component creates files on a specified Azure Blob Storage account and loads them with data from a table or view.
By default, your data will be unloaded in parallel.
| Property | Type | Description |
|---|---|---|
| Name | Text | The descriptive name for the component. |
| Stage | Select | Choose a predefined stage for your data. Stages must be created from your Snowflake account console. Alternatively, choose "Custom" to base the staging on this component's properties. |
| Azure Storage Location | Select | Browse to the Azure Blob Storage container that files will be unloaded into. |
| File Prefix | Text | Filename prefix for the files unloaded to the blob container. Each file is named as the prefix followed by a number denoting which node it was unloaded from. All unloads are parallel and use the maximum number of nodes available at the time. |
| Warehouse | Select | Choose a Snowflake warehouse that will run the unload. |
| Database | Select | Choose the database containing the table or view to unload. |
| Schema | Select | Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, see this article. |
| Target Table | Text | The table or view whose data will be unloaded to Blob Storage. |
| Format | Select | Choose from preset file formats available in your Snowflake database. Additional file formats can be created using the Create File Format component. Selecting the [Custom] file format will use the component properties to define the file format. |
| File Type | Select | Choose whether Matillion ETL should unload the data in a CSV, JSON, or PARQUET file format. |
| Compression | Select | Whether the output files are compressed using GZIP, BROTLI, BZ2, DEFLATE, RAW_DEFLATE, ZSTD, or not compressed at all. |
| Nest Columns | Select | Note: this parameter is only available when the File Type parameter is set to "JSON". Specify whether or not ("True" or "False") the table columns should be nested into a single JSON object so that the file can be configured correctly. A table with a single variant column will not require this setting to be "True". Default is "False". |
| Record Delimiter | Text | The delimiter used to separate records (rows) in the file. Defaults to newline. "\n" can also signify a newline, and "\r" a carriage return. |
| Field Delimiter | Text | The delimiter that separates columns. Defaults to a comma. A [TAB] character can be specified as "\t". |
| Date Format | Text | Defaults to "auto"; can be used to manually specify a date format. |
| Time Format | Text | Defaults to "auto"; can be used to manually specify a time format. |
| Timestamp Format | Text | Defaults to "auto"; can be used to manually specify a timestamp format. |
| Escape | Select | Whether or not to insert backslashes to escape special characters. This is often a good idea if you intend to reload the data into a table later, since COPY also supports this option. |
| Escape Unenclosed Field | Text | Single-character string used as the escape character for unenclosed field values. Accepts common escape sequences, octal values, or hex values. Also accepts a value of NONE (default). If a character is specified in the "Escape" field, it will override this field. |
| Field Optionally Enclosed | Text | A character used to enclose strings. Can be a single quote (') or double quote (") or NONE (default). Note that the chosen character can be escaped by that same character. |
| Null If | Text | Replaces the specified string with null in the output files. Use this if your data has a particular representation of missing data. |
| Overwrite | Select | If "True", overwrite the target file if it already exists instead of generating an error. Default is "False". |
| Single File | Select | If "True", the unload runs in serial rather than parallel. This is slower but produces a single, complete file. Default is "False". |
| Max File Size | Text | The maximum size (in bytes) of each file generated, per thread. The maximum blob size is 4.77 TB (50,000 × 100 MB blob blocks). |
| Include Headers | Select | If "True", write column names as headers at the top of the unloaded files. Default is "False". |
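Behind the scenes, these properties map onto a Snowflake COPY INTO &lt;location&gt; statement. The sketch below is illustrative only: the account, container, table, and option values are assumptions, and Matillion ETL assembles the actual statement (including the credentials) from the component's properties.

```sql
-- Illustrative sketch only: the container URL, table name, and option values
-- are assumptions. Matillion ETL supplies credentials (e.g. via a SAS token
-- or a named stage) when it builds the real statement.
COPY INTO 'azure://myaccount.blob.core.windows.net/mycontainer/JIRA_TABLE'
  FROM doc_tbl
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP
                 FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n')
  HEADER = TRUE               -- Include Headers
  OVERWRITE = TRUE            -- Overwrite
  SINGLE = FALSE              -- Single File ("False" unloads in parallel)
  MAX_FILE_SIZE = 104857600;  -- Max File Size, in bytes (here 100 MB)
```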
In this example, we are using a JIRA Query component to load some data into a table named "doc_tbl". After some transformation of the data, we wish to unload the data to an Azure Blob Storage Container for long-term storage. To this end, we use the Azure Blob Storage Unload component in a job as shown below.
The Azure Blob Storage Unload component is configured to take data from "doc_tbl" (as specified in the "Target Table" property) and unload it into the Blob container specified in the "Azure Storage Location" property. Each file it creates will have the prefix "JIRA_TABLE" in its name.
By browsing the Azure Portal, we can check the contents of the Blob Storage container and confirm that our data has landed there.
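The unloaded files can also be verified from Snowflake itself. Assuming a named external stage (the hypothetical my_azure_stage below) points at the same container, the LIST command shows the file names and sizes:

```sql
-- my_azure_stage is a hypothetical named stage pointing at the same container.
LIST @my_azure_stage PATTERN = 'JIRA_TABLE.*';
```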
Copying Files to an Azure Premium Storage blob
When copying files to an Azure Premium Storage blob, Matillion ETL may return the following error:
Self-suppression not permitted.
This is because, unlike standard Azure Storage, Azure Premium Storage does not support block blobs, append blobs, files, tables, or queues; it supports only page blobs, which are incrementally sized.
A page blob is a collection of 512-byte pages optimised for random read and write operations. All writes must therefore be 512-byte aligned, so any file whose size is not a multiple of 512 bytes will fail to write. For example, a 1,000-byte file fails (1000 mod 512 = 488), while a 1,024-byte file (exactly two pages) can be written.
For additional information about Azure Storage blobs, we recommend consulting the Microsoft Azure documentation.