Azure Load DL

This article applies to the following platform: Delta Lake.

Azure Blob Storage Load

The Azure Blob Storage Load component lets users load data into an existing table from objects stored in Azure Blob Storage.

Azure Blob Storage is used to store large amounts of unstructured object data, such as text or binary data.

To learn more, read Blob storage.
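The Pattern, Path Glob Filter, and Force Load properties described under Properties together decide which blobs a run picks up. The Python sketch below models that selection step. It is not Matillion's implementation, and several parts are assumptions: Pattern is treated as a regular expression that may match anywhere in the blob name (as the default .* suggests), Path Glob Filter is approximated with Python's fnmatch applied to the file name only, and files loaded by earlier runs are represented by a hypothetical already_loaded set. The function select_blobs and the sample blob names are hypothetical.

```python
import fnmatch
import posixpath
import re


def select_blobs(blob_names, pattern=".*", path_glob_filter=None,
                 already_loaded=frozenset(), force_load=False):
    """Return the blob names a load would pick up (simplified model).

    Assumptions (hypothetical, not the component's actual logic):
    - pattern is a regular expression matched anywhere in the blob name;
    - path_glob_filter is a shell-style glob matched against the file name,
      approximated with fnmatch;
    - already_loaded holds the names of files loaded by earlier runs.
    """
    regex = re.compile(pattern)
    selected = []
    for name in blob_names:
        # Pattern: a partial match anywhere in the name is enough.
        if not regex.search(name):
            continue
        # Path Glob Filter: optional glob applied to the file name only.
        if path_glob_filter and not fnmatch.fnmatchcase(
                posixpath.basename(name), path_glob_filter):
            continue
        # Force Load: unless forced, previously loaded files are skipped.
        if not force_load and name in already_loaded:
            continue
        selected.append(name)
    return selected


blobs = [
    "sales/2024/01/orders.csv",
    "sales/2024/02/orders.csv",
    "sales/2024/02/orders.json",
    "archive/orders_old.csv",
]

print(select_blobs(blobs, pattern="sales/", path_glob_filter="*.csv"))
# ['sales/2024/01/orders.csv', 'sales/2024/02/orders.csv']

print(select_blobs(blobs, pattern="sales/", path_glob_filter="*.csv",
                   already_loaded={"sales/2024/01/orders.csv"}))
# ['sales/2024/02/orders.csv']
```

Hadoop-style globs also accept syntax such as {a,b} alternation, which fnmatch does not, so treat the glob handling here as an approximation.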

Properties

Delta Lake Properties

Property Setting Description
Name String A human-readable name for the component.
Storage Account Select Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview.
Blob Container Select Select a Blob Storage container. The available containers depend on the selected storage account.
Pattern String A string that partially matches the names of the files to include in the load. Defaults to .*, which matches all files in the Azure Storage location.
Catalog Select Select a Databricks Unity Catalog. The special value, [Environment Default], uses the catalog specified in the Matillion ETL environment setup. The selected catalog determines which databases are available in the Database property.
Database Select Select the Delta Lake database. The special value, [Environment Default], uses the database specified in the Matillion ETL environment setup.
Target Table Select Select the table into which data will be loaded from Azure Blob Storage.
Load Columns Column Select Select which of the target table's columns to load. Move columns to the right using the arrow buttons to include them in the load. Columns on the left will be excluded from the load.
Recursive File Lookup Boolean When True, files in subdirectories are also loaded and partition inference is disabled. To control which files are loaded, use the Pattern property instead.
File Type Select Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. The properties below change to reflect the selected file type.
Skip Header Boolean (CSV only) When True, uses the first line as column names. Default is False.
Field Delimiter Delimiting Character (CSV only) Specify a delimiter to separate columns. The default is a comma (,).
A TAB character can be specified as "\t".
Date Format String (CSV & JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd.
Timestamp Format String (CSV & JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].
Encoding Type String (CSV & JSON only) Decodes the files using the given encoding type. If none is set, the default is UTF-8.
Mode Select Select the mode for handling corrupted records during parsing.
DROPMALFORMED: ignores corrupted records.
FAILFAST: throws an exception when it encounters corrupted records.
PERMISSIVE: when a corrupted record is encountered, the malformed string is placed into a field configured by columnNameOfCorruptRecord, and the malformed fields are set to null. This is the default setting.
Ignore Leading White Space Boolean (CSV only) When True, skips any leading whitespace. Default is False.
Ignore Trailing White Space Boolean (CSV only) When True, skips any trailing whitespace. Default is False.
Infer Schema Boolean (CSV only) When True, infers the input schema automatically from the data. Default is False.
Multi Line Boolean When True, parses records that may span multiple lines. Default is False.
Null Value String (CSV only) Sets the string representation of a null value. The default value is an empty string.
Empty Value String (CSV only) Sets the string representation of an empty value. The default value is an empty string.
Primitive as String Boolean (JSON only) When True, primitive data types are inferred as strings. Default is False.
Prefers Decimal Boolean (JSON only) When True, infers all floating-point values as a decimal type. If the values do not fit in decimal, then they are inferred as doubles. Default is False.
Allow Comments Boolean (JSON only) When True, allows Java/C++-style comments in JSON records. Default is False.
Allow Unquoted Field Names Boolean (JSON only) When True, allows unquoted JSON field names. Default is False.
Allow Single Quotes Boolean (JSON only) When True, allows single quotes in addition to double quotes. Default is True.
Allow Numeric Leading Zeros Boolean (JSON only) When True, allows leading zeros in numbers, e.g. 00019. Default is False.
Allow Backslash Escaping Any Character Boolean (JSON only) When True, allows any character to be escaped with a backslash (\). Default is False.
Allow Unquoted Control Chars Boolean (JSON only) When True, allows JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters). Default is False.
Drop Field If All Null Boolean (JSON only) When True, ignores columns that contain only null values or empty arrays/structs during schema inference. Default is False.
Merge Schema Boolean (AVRO, PARQUET only) When True, infers the schema across all files being loaded and merges the schema of each file. Default is False.
Path Glob Filter String An optional glob pattern. When set, only files whose paths match the pattern are included in the load.
Force Load Boolean When True, idempotency is disabled and files are loaded regardless of whether they have been loaded before. Default is False.
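The three Mode settings differ only in what happens to a record that fails to parse. The Python sketch below models them for a hypothetical two-column CSV table. It assumes a record is corrupted when it has the wrong number of fields or a value cannot be converted to its column type, and it uses _corrupt_record as a hypothetical name for the column configured by columnNameOfCorruptRecord. These rules, and the names SCHEMA, parse_line, and load, are illustrative assumptions; the real parser's handling of malformed records is more detailed.

```python
import csv

# Hypothetical target table: column names and the types their values must parse to.
SCHEMA = [("id", int), ("name", str)]
# Hypothetical name for the column configured by columnNameOfCorruptRecord.
CORRUPT_COLUMN = "_corrupt_record"
MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")


def parse_line(line):
    """Parse one CSV line against SCHEMA.

    Returns (row, ok): row maps each column to its parsed value, or None
    where the field was missing or could not be converted; ok is False
    if anything about the record was malformed.
    """
    fields = next(csv.reader([line]), [])
    ok = len(fields) == len(SCHEMA)
    row = {}
    for i, (column, to_type) in enumerate(SCHEMA):
        try:
            row[column] = to_type(fields[i])
        except (IndexError, ValueError):
            row[column] = None  # malformed field is set to null
            ok = False
    return row, ok


def load(lines, mode="PERMISSIVE"):
    """Apply one of the three Mode settings to a sequence of CSV lines."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for line in lines:
        row, ok = parse_line(line)
        if not ok:
            if mode == "DROPMALFORMED":
                continue  # ignore the corrupted record
            if mode == "FAILFAST":
                raise ValueError(f"malformed record: {line!r}")
        # Reached for valid rows in every mode, and for corrupted rows in
        # PERMISSIVE mode, where the raw line is kept in the corrupt column.
        row[CORRUPT_COLUMN] = None if ok else line
        rows.append(row)
    return rows


data = ["1,alice", "two,bob", "3,carol"]
for mode in ("PERMISSIVE", "DROPMALFORMED"):
    print(mode, load(data, mode))
```

With this sample data, PERMISSIVE keeps all three rows and stores "two,bob" in the corrupt-record column with a null id, DROPMALFORMED returns only the first and third rows, and FAILFAST raises on the second line.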

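When Merge Schema is True, the schemas of all AVRO or PARQUET files being loaded are combined. The Python sketch below models each file's schema as a hypothetical column-to-type mapping and merges them by taking the union of columns in first-seen order. It is a simplification under stated assumptions: the function merge_schemas and the sample schemas are hypothetical, and any type conflict is treated as an error, whereas a real engine may widen compatible types.

```python
def merge_schemas(schemas):
    """Union a list of {column: type} mappings, keeping first-seen column order.

    Simplification (assumption): a column whose type differs between files
    is rejected instead of being widened to a compatible type.
    """
    merged = {}
    for schema in schemas:
        for column, dtype in schema.items():
            existing = merged.setdefault(column, dtype)
            if existing != dtype:
                raise TypeError(
                    f"column {column!r} has conflicting types: {existing} vs {dtype}")
    return merged


# Hypothetical schemas of two Parquet part-files, written before and after
# a "discount" column was added upstream.
part_1 = {"order_id": "bigint", "amount": "double"}
part_2 = {"order_id": "bigint", "amount": "double", "discount": "double"}

print(merge_schemas([part_1, part_2]))
# {'order_id': 'bigint', 'amount': 'double', 'discount': 'double'}
```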