Azure SQL Query



This article is specific to the following platforms: Snowflake and Synapse.


Run an SQL Query on an Azure SQL Database and copy the result to a table, via Azure Blob Storage.

This component is designed for data staging: loading data into a table in order to perform further processing and transformations on it. The target table should be considered temporary, as it will be either truncated or recreated each time the component runs.



Snowflake Properties

Property Setting Description
Name String Input the descriptive name for the component.
Basic/Advanced Mode Select Basic: This mode will build an Azure SQL query using settings from the Data Source, Data Selection, and Data Source Filter parameters. In most cases, this mode will be sufficient.
Advanced: This mode will require users to write an SQL-like query, which is translated into one or more Azure SQL API calls.
Azure SQL Server Name Select The SQL Server to connect to. This list should be autopopulated by Matillion, provided that the instance credentials include access to the corresponding servers.
Database Name String This is the name of the database on the selected Azure SQL Server.
Username String This is your Azure SQL Server connection username.
Password String This is your Azure SQL Server connection password. The password is masked so it can be set, but not read.
SQL Query String This is an SQL query, written in the dialect of the SQL database. This should be a simple SELECT query (see the example following this table).
JDBC Options Parameter A Microsoft SQL Server parameter.
These are usually not required, since sensible defaults are assumed.
Value A value for the given Parameter.
Please contact support if you think you require an advanced JDBC option.
Data Schema Select Select the data schema.
Data Source Select Select a data source.
Data Selection Select Select one or more columns from the chosen data source to return from the query.
Data Source Filter Input Column Select an input column for your filter. The available input columns vary depending upon the data source.
Qualifier Is: Compares the column to the value using the comparator.
Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
Comparator Select the comparator. Note: Not all comparators will work with all possible data sources.
Choose one of "Equal to", "Greater than", "Less than", "Greater than or equal to", "Less than or equal to", or "Like".
"Equal to" can match exact strings and numeric values, while other comparators such as "Great than" and "Less than" will work only with numerics. The "Like" comparator allows the wildcard character % to be used at the start and end of a string value to match a column. The "Null" comparator matches only null values, ignoring whatever the value is set to.
Note: Not all data sources support all comparators, meaning that, often, only a subset of the above comparators will be available for selection.
Value Specify the value to be compared against.
Combine Filters Select Use the defined filters in combination with one another according to either And or Or.
Limit Integer Set a numeric value to limit the number of rows that can be loaded.
Warehouse Select Choose a Snowflake warehouse that will run the load.
Database Select Select the database that the newly-created table will belong to.
Schema Select Select the table schema. The special value, [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Table Text Provide a name for the new table to load data into.
Warning: This table will be recreated and will drop any existing table of the same name.
Storage Account Select (Azure Only) Select a Storage Account with your desired Blob Container to be used for staging the data.
Blob Container Select (Azure Only) Select a Blob Container to be used for staging the data.
Load Options Multiple Select Clean Staged Files: Destroy staged files after loading data. Default is On.
String Null is Null: Converts any strings equal to "null" into a null value. This is case-sensitive and only works with entirely lower-case strings. Default is Off.
Recreate Target Table: Choose whether the component recreates its target table before the data load. If Off, the component will use an existing table or create one if it does not exist. Default is On.
File Prefix: Give staged file names a prefix of your choice. The default setting is an empty field.
Trim String Columns: Remove leading and trailing characters from a string column. Default is On.
Use Grid Variable: Check this checkbox to use a grid variable. This box is unchecked by default.
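
In Basic mode, the component assembles the query from the Data Schema, Data Source, Data Selection, Data Source Filter, and Limit properties. As a purely illustrative sketch (the table and column names below are hypothetical, not from the component), the generated statement might resemble the following SQL Server dialect query:

    SELECT TOP 100                      -- Limit: 100
           order_id,                    -- Data Selection
           customer_name,
           order_total
    FROM   dbo.orders                   -- Data Schema: dbo, Data Source: orders
    WHERE  order_total >= 50            -- Comparator: "Greater than or equal to"
      AND  customer_name LIKE 'A%'      -- Comparator: "Like", using the % wildcard
                                        -- Combine Filters: And

In Advanced mode, a query of the same simple SELECT form is written by hand instead.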

Synapse Properties

Property Setting Description
Name String Input the descriptive name for the component.
Basic/Advanced Mode Select Basic: This mode will build an Azure SQL query using settings from the Data Source, Data Selection, and Data Source Filter parameters. In most cases, this mode will be sufficient.
Advanced: This mode will require users to write an SQL-like query, which is translated into one or more Azure SQL API calls.
Azure SQL Server Name Select The SQL Server to connect to. This list should be autopopulated by Matillion, provided that the instance credentials include access to the corresponding servers.
Database Name String This is the name of the database on the selected Azure SQL Server.
Username String This is your Azure SQL Server connection username.
Password String This is your Azure SQL Server connection password. The password is masked so it can be set, but not read.
SQL Query String This is an SQL query, written in the dialect of the SQL database. This should be a simple SELECT query.
JDBC Options Parameter A Microsoft SQL Server parameter.
These are usually not required, since sensible defaults are assumed.
Value A value for the given Parameter.
Please contact support if you think you require an advanced JDBC option.
Data Schema Select Select the data schema.
Data Source Select Select a data source.
Data Selection Select Select one or more columns from the chosen data source to return from the query.
Data Source Filter Input Column Select an input column for your filter. The available input columns vary depending upon the data source.
Qualifier Is: Compares the column to the value using the comparator.
Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
Comparator Select the comparator. Note: Not all comparators will work with all possible data sources.
Choose one of "Equal to", "Greater than", "Less than", "Greater than or equal to", "Less than or equal to", or "Like".
"Equal to" can match exact strings and numeric values, while other comparators such as "Great than" and "Less than" will work only with numerics. The "Like" comparator allows the wildcard character % to be used at the start and end of a string value to match a column. The "Null" comparator matches only null values, ignoring whatever the value is set to.
Note: Not all data sources support all comparators, meaning that, often, only a subset of the above comparators will be available for selection.
Value Specify the value to be compared against.
Combine Filters Select Use the defined filters in combination with one another according to either And or Or.
Limit Integer Set a numeric value to limit the number of rows that can be loaded.
Schema Select Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on schemas, please see the Azure Synapse documentation.
Table String Provide a new table name.
Warning: This table will be recreated on each run of the job and will drop any existing table of the same name.
Storage Account Select Select an Azure storage account with your desired blob container to be used for staging the data.
Please visit the Azure documentation for help creating an Azure Storage Account.
Blob Container Select Select a blob container to be used for staging the data. The blob containers available for selection depend on the chosen storage account.
Load Options Multiple Select Configure this Orchestration Job's load options. These load options will apply each time the job runs. Sensible defaults are assumed.
Clean Staged Files: Destroy staged files after loading data. Default is On.
String Null is Null: Converts any strings equal to "null" into a null value. This load option is case-sensitive and only works with entirely lower-case strings. Default is Off.
Recreate Target Table: Choose whether the component recreates its target table before the data load. If set to Off, the existing table will be used instead. Default is On.
File Prefix: Give staged file names a prefix of your choice. The default setting is an empty field.
Use Grid Variable: Check this checkbox to use a grid variable. This box is unchecked by default.
Distribution Style Select Select the distribution style (a DDL sketch follows this table):
Hash: This setting assigns each row to one distribution by hashing the value stored in the distribution_column_name. The algorithm is deterministic, meaning it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL, because all rows that have NULL are assigned to the same distribution.
Replicate: This setting stores one copy of the table on each Compute node. For SQL Data Warehouse, the table is stored on a distribution database on each Compute node. For Parallel Data Warehouse, the table is stored in an SQL Server filegroup that spans the Compute node. This behavior is the default for Parallel Data Warehouse.
Round Robin: Distributes the rows evenly in a round-robin fashion. This is the default behavior.
For more information, please read this article.
Distribution Column Select Select the column to act as the distribution column. This property is only available when the Distribution Style property is set to "Hash".
Index Type Select Select the table indexing type. Options include:
Clustered: A clustered index may outperform a clustered columnstore table when a single row needs to be retrieved quickly. The disadvantage to using a clustered index is that the only queries that benefit are those that use a highly selective filter on the clustered index column. Choosing this option prompts the Index Column Grid property.
Clustered Column Store: This is the default setting. Clustered columnstore tables offer both the highest level of data compression and the best overall query performance, especially for large tables. Choosing this option prompts the Index Column Order property.
Heap: Users may find that using a heap table is faster for temporarily landing data in Synapse SQL pool. This is because loads to heaps are faster than to index tables, and in some cases, the subsequent read can be done from cache. When a user is loading data only to stage it before running additional transformations, loading the table to a heap table is much faster than loading the data to a clustered columnstore table.
For more information, please consult the Azure Synapse documentation.
Index Column Grid Name The name of each column.
Sort Assign a sort orientation of either ascending (Asc) or descending (Desc).
Index Column Order Multiple Select Select the columns in the order to be indexed.
Partition Key Select Select the table's partition key. Table partitions determine how rows are grouped and stored within a distribution.
For more information on table partitions, please refer to this article.
This property is empty by default.
Range Direction Select Select the direction for the values. The default is "Left".
Boundary Values Select Configure the boundary values.
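
To make the Distribution Style, Index Type, Partition Key, Range Direction, and Boundary Values properties concrete, the following sketch shows the kind of Synapse T-SQL DDL they correspond to. This is not the component's literal generated statement; the table, columns, and boundary values are hypothetical:

    CREATE TABLE dbo.stg_orders
    (
        order_id     INT NOT NULL,      -- distribution column, defined NOT NULL
        customer_id  INT,
        order_total  DECIMAL(18,2)
    )
    WITH
    (
        DISTRIBUTION = HASH(order_id),  -- Distribution Style: Hash
        CLUSTERED COLUMNSTORE INDEX,    -- Index Type: Clustered Column Store
        PARTITION (order_id RANGE LEFT  -- Partition Key + Range Direction: Left
                   FOR VALUES (1000, 2000, 3000))  -- Boundary Values
    );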

Variable Exports

This component makes the following values available to export into variables:

Source Description
Time Taken To Stage The amount of time (in seconds) taken to fetch the data from the data source and upload it to storage.
Time Taken To Load The amount of time (in seconds) taken to execute the COPY statement to load the data into the target table from storage.
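
As an illustration of how these exports might be used: once mapped to job variables (the variable names below are hypothetical), the timings can be referenced in a later component via Matillion's ${variable} interpolation, for example to record an audit row:

    -- Assumes "Time Taken To Stage" and "Time Taken To Load" have been
    -- exported to the hypothetical job variables stage_seconds and load_seconds.
    INSERT INTO etl_audit (job_name, staged_seconds, loaded_seconds)
    VALUES ('azure_sql_query_load', ${stage_seconds}, ${load_seconds});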

Strategy

Connect to the Azure SQL Database and issue the query. Stream the results into objects on Azure Blob Storage. Then create or truncate the target table and issue a COPY command to load the Azure Blob Storage objects into the table. Finally, clean up the temporary Azure Blob Storage objects.
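
For a Snowflake target, the load step broadly corresponds to a COPY INTO statement of the following shape. This is a minimal sketch only; the stage URL, credentials, and file format are illustrative assumptions rather than the component's literal output:

    -- Load the staged Azure Blob Storage objects into the (re)created table.
    COPY INTO my_schema.my_target_table
    FROM 'azure://mystorageaccount.blob.core.windows.net/my-container/staged-prefix/'
    CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
    -- Staged files are then removed if "Clean Staged Files" is On.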