Azure Blob Load Generator

This article is specific to the following platforms: Snowflake and Synapse.

Overview

The Azure Blob Load Generator component helps users load delimited data from public objects in an Azure Storage Blob.

The load wizard lets users load and view files on the fly, altering load component properties and observing their effects without the need for a separate Transformation Job. The generator can also guess the schema of a table, saving the user much of the setup work.

This component requires working Azure credentials with "read" access to the Azure Blob containing the source data's file(s).


Getting Started

Search for "Azure Blob Load Generator" in the Components panel.

Drag the Azure Blob Load Generator component onto the canvas. This action will trigger the setup wizard.


1. Get Sample

The first page of the wizard requires users to configure the following settings:

Azure Blob Storage

Specify the storage location. Users can either:

  1. Manually paste or type the Azure storage location URL following the template: azure://<account>/<container>/<path>

  2. Click the browse button and navigate through the file structure to find the intended storage location.

The selected file must be delimited (including .csv) and the user must have permission to access the file. If the file is compressed, the correct compression method must be selected.

The supported compression methods are GZip and BZip2.



Compression

Use this dropdown menu to select the compression method. For Synapse, select either None or GZip. For Snowflake, select None, GZip, or BZip2.


Row Limit

Set a row limit for the sample. The default value is 50.


Click Get Sample to instruct Matillion ETL to return a sample of data from the chosen object, as in the image below. Raw data is displayed in the panel.


2. Guess Schema

Page two of the wizard will automatically guess the schema, provided that you have clicked Get Sample on the previous page.

If you wish to alter the configuration manually, the tables below describe the setup properties for each platform.

Properties (Synapse)

| Property | Setting | Description |
| --- | --- | --- |
| Table Name | String | A human-readable name for the table. |
| Field Delimiter | Character | Specify a delimiter to separate fields. The default is a comma [,]. A [TAB] character can be specified as "\t". |
| Field Optionally Enclosed By | Character | A character used to enclose strings. Can be single quote ('), double quote ("), or NONE (default). Note that the chosen character can be escaped by that same character. |
| First Row | Integer | Typically the default will be 2, because row 1 may be reserved for column headers. |
| Date Format | Select | Select the date format for the file. |
| Max Errors | Integer | Specify the maximum number of rejected rows allowed in the load before the COPY operation is cancelled. Each row that the COPY operation cannot import is ignored and counted as one error. The default value is 0. |
Properties (Snowflake)

For more information on all the settings in this component, see the Snowflake COPY syntax.

| Property | Setting | Description |
| --- | --- | --- |
| Table Name | String | A human-readable name for the table. |
| Field Delimiter | Character | Specify a delimiter to separate fields. The default is a comma [,]. A [TAB] character can be specified as "\t". |
| Field Optionally Enclosed By | Character | A character used to enclose strings. Can be single quote ('), double quote ("), or NONE (default). Note that the chosen character can be escaped by that same character. |
| Ignore Header Rows | Integer | The number of rows at the top of the file to ignore. Default is 0. |
| Date Format | Date | Specify a date format. Defaults to auto. |
| Time Format | Time | Specify a time format. Defaults to auto. |
| Time Stamp Format | Timestamp | Specify the format of timestamp values in the data files to be loaded. Defaults to auto. |
| Escape Unenclosed | Character | Single character string used as the escape character for unenclosed field values only. Default is backslash (\\). |
| Escape | String | Single character used as the escape character for any field values. |
| Null As | String | Replaces the specified string with NULL in the output table. Use this if your data has a particular representation of missing data. |
| Trim Blanks | Checkbox | Tick to remove trailing and leading white space from the input data. |
| Error on Column Count | Checkbox | If enabled, a parsing error is generated when the number of delimited columns (fields) in the input data file does not match the number of columns in the corresponding table. |
| Empty Field As Null | Checkbox | If ticked, empty columns in the input file will become NULL. |
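To see how the Snowflake properties above translate into a load, here is a hypothetical sketch that assembles a COPY file-format options string from wizard-style settings. The option names (FIELD_DELIMITER, SKIP_HEADER, TRIM_SPACE, NULL_IF) are real Snowflake file-format options, but the mapping and function are illustrative; consult the Snowflake COPY syntax for the authoritative list.

```python
def file_format_options(
    field_delimiter: str = ",",
    skip_header: int = 0,
    date_format: str = "auto",
    null_as: str = "",
    trim_blanks: bool = False,
) -> str:
    """Assemble a Snowflake CSV file-format options string from settings
    resembling the wizard's properties (illustrative mapping only)."""
    options = [
        "TYPE = CSV",
        f"FIELD_DELIMITER = '{field_delimiter}'",
        f"SKIP_HEADER = {skip_header}",        # maps to Ignore Header Rows
        f"DATE_FORMAT = '{date_format}'",
        f"TRIM_SPACE = {str(trim_blanks).upper()}",  # maps to Trim Blanks
    ]
    if null_as:
        options.append(f"NULL_IF = ('{null_as}')")   # maps to Null As
    return " ".join(options)
```

The resulting string would appear inside the FILE_FORMAT clause of a COPY INTO statement.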

Users can add columns for their table in the right-hand pane. Click the add button to create a new column, and specify its Name, Type, Size, and Decimal Places values. To remove a column, click the remove button beside it.
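The Guess Schema step can be approximated as follows: for each sampled column, try progressively stricter types against every value and keep the strictest one that always parses. This is a simplified sketch under that assumption, not Matillion's actual algorithm.

```python
from datetime import datetime

def guess_type(values):
    """Return the strictest of INTEGER, FLOAT, DATE, or VARCHAR
    that fits every sampled value in a column."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INTEGER"
    if all_parse(float):
        return "FLOAT"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"
    return "VARCHAR"  # fall back to text when nothing stricter fits
```

Running this over each column of the sample yields a first-pass schema that the user can then refine in the right-hand pane.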


3. Validation

Click Test to ensure that the table loads as expected. If you need to reconfigure the setup, click Back. Otherwise, click Create & Run.

The Orchestration Job will then be deployed to the canvas, ready to run.