RDS Query Component

This article is specific to the following platforms: Redshift and Snowflake.

RDS Query

The RDS Query component enables users to run an SQL query on an RDS database and then copy the results to a table via Amazon S3 (Amazon Simple Storage Service).

This component is for staging data: it retrieves data and loads it into a table, from which users can perform data transformations to enrich and manage the data in permanent tables.

Warning: This component is potentially destructive. If the target table undergoes a change in structure, it will be recreated.

Note: This component is only available for Snowflake instances on AWS.


Redshift Properties

Property Setting Description
Name Text The descriptive name of the component.
Basic/Advanced Mode Select Basic: This mode will build a query using settings determined by the user in the Data Source, Data Selection, and Data Source Filter properties. In most use cases, this mode will be sufficient.
Advanced: (The default setting for this component.) This mode will require the user to write an SQL-like query to call data from the RDS database.
Database Type Select aurora: Amazon Aurora (see the AWS documentation).
mariadb: Amazon RDS for MariaDB (see the AWS documentation).
mssql: Microsoft SQL Server (see the AWS documentation).
mysql: MySQL (see the AWS documentation).
oracle: Oracle (see the AWS documentation).
postgresql: (The default setting for this component.) PostgreSQL (see the AWS documentation).
Note: For Oracle, users must provide an Oracle JDBC driver, because this is not distributed with Matillion ETL.
RDS Endpoint Select/Text Please select an RDS Database Endpoint from the dropdown menu. By default, this property offers the user a list of all of the RDS instances available within the user's current region that are the same type as the selected Database Type.
If the desired endpoint is located in a different region, or the user is not running on Amazon EC2 and therefore does not have a region, they can provide values manually. To acquire your database endpoint and provide it manually, follow these steps:
  1. Log in to the AWS Console.
  2. In the "Find Services" search bar, search for "RDS".
  3. In the "Amazon RDS" navigation column on the left side of your screen, click "Databases".
  4. Select a database.
  5. Locate the endpoint for that database in the "Connectivity & security" section.
Note: Users must include the port number when manually typing the endpoint, for example mydatabase.abc123xyz0.eu-west-1.rds.amazonaws.com:5432 (an illustrative hostname; 5432 is the default PostgreSQL port).
Database Name Text Please provide the name of the database within your RDS instance. In the AWS Console, this is the "DB identifier".
Username Text Please provide the RDS connection username.
Password Text Please provide the RDS connection password. Users have the option to store their password inside the component, but it is highly recommended to use the Password Manager option.
SQL Query Text (Advanced Mode only.) Use this setting to write an SQL query in the dialect of the RDS database. The query can be as simple as
select * from tablename
Where possible, it should be a simple "select" query; a fuller example is given after this table.
JDBC Options Parameter A JDBC parameter supported by the Database Driver. The available parameters are determined automatically from the driver and may change from version to version. Manual input is not normally required, since sensible defaults are assumed.
Value A value for the given parameter.
Data Schema Select Please select a valid schema. Whether this property is available depends on the selected database type.
Data Source Select Please select a data source.
Data Selection Multiple Select Please select one or more columns to return from the query. The columns available depend on the selected Data Source.
Data Source Filter Input Column The available input columns vary depending upon the Data Source.
Qualifier Is: Compares the column to the value using the comparator.
Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
Comparator Choose a method of comparing the column to the value. Possible comparators include: "Equal To", "Greater than", "Less than", "Greater than or equal to", "Less than or equal to", "Like", "Null".

"Equal To" can match exact strings and numeric values while other comparators, including "Greater than", "Less than", "Greater than or equal to", and "Less than or equal to", will work only with numerics. The "Like" operator allows the wildcard character (%) to be used at the start and end of a string value to match a column. The Null operator matches only Null values, ignoring whatever the value is set to.

Note: Not all data sources support all comparators, meaning that it is likely that only a subset of the above comparators will be available to choose from.
Value The value to be compared.
Combine Filters Select Use the defined filters in combination with one another according to either "and" or "or".
Limit Number Set a limit for the number of rows that are loaded.
Type Select Choose between using a standard table or an external table.
Standard: The data will be staged on an S3 bucket before being loaded into a table.
External: The data will be put into an S3 bucket and referenced by an external table.
Schema Select Select the table schema. The special value [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Note: An external schema is required if the Type property is set to External.
Target Table Text Provide a new table name.
Warning: This table will be recreated and will drop any existing table of the same name.
S3 Staging Area Text The name of an S3 bucket for temporary storage. Ensure your access credentials have S3 access, as well as permission to write to the bucket. See this document for details on managing credentials in Matillion ETL. Visit the Snowflake documentation for further information on configuring secure access to S3.
Note: The temporary objects created in this bucket will be removed after the load completes; they are not kept.
Distribution Style Select All: Copy rows to all nodes in the Redshift cluster.
Auto: (Default) Allow Redshift to manage your distribution style.
Even: Distribute rows around the Redshift cluster evenly.
Key: Distribute rows around the Redshift cluster according to the value of a key column.
Table distribution is critical to good performance. See the Amazon Redshift documentation for more information.
Sort Key Multiple Select Please specify the columns from the input that should be set as the table's sort key.
Sort keys are critical to good performance - see the Amazon Redshift documentation for more information.
Sort Key Options Select Decide whether the sort key is of a compound or interleaved variety - see the Amazon Redshift documentation for more information.
This property becomes available once a sort key has been selected.
Primary Key Select Please select one or more columns to be designated as the table's primary key.
Load Options Multiple Select Columns Comp Update: Apply automatic compression to the target table. Default is On.
Stat Update: Automatically update statistics when filling a table. Default is On. In this case, it is updating the statistics of the target table.
Clean S3 Objects: Automatically remove UUID-based objects on the S3 bucket. Default is On. Effectively, users decide here whether to keep the staged data in the S3 bucket or not.
String Null is Null: Converts any strings equal to "null" into a null value. This is case sensitive and only works with entirely lower-case strings. Default is On.
Recreate Target Table: Choose whether the component recreates its target table before the data load. If Off, the component will use an existing table or create one if it does not exist. Default is On.
File Prefix: Give staged file names a prefix of your choice. When this Load Option is selected, users should set their preferred prefix in the text field.
Use Grid Variable: Check this checkbox to use a grid variable. This box is unchecked by default.
Concurrency Number Please set the number of S3 files to create. This helps when loading into Amazon Redshift, since files are loaded in parallel. Matillion ETL can upload parts of these files concurrently. Default is 2.
Note: The maximum concurrency is 8 times the number of processors on the user's cloud instance. For example, an instance with 2 processors has a maximum concurrency of 16.
Encryption Select Decide how the files are encrypted inside the S3 bucket.
None: No encryption.
SSE KMS: Encrypt the data according to a key stored on KMS.
SSE S3: Encrypt the data according to a key stored on an S3 bucket.
KMS Key ID Select The ID of the KMS encryption key you have chosen to use in the Encryption property.
This property only becomes available if a KMS key has been set.
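
Example: Advanced Mode Query

As an illustration of Advanced Mode, the following is a minimal sketch of a query against a hypothetical PostgreSQL source table named customers (the table and column names are illustrative, not part of the component):

-- a simple "select" query in the dialect of the RDS database
select id,
       first_name,
       last_name,
       created_at
from customers
where created_at >= '2020-01-01'

Because the query runs in the dialect of the RDS database, any syntax used must be valid for the selected Database Type. Keeping the query to a simple projection and filter, as here, leaves heavier transformation work for the target warehouse after staging.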
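
Example: Basic Mode Filter

For comparison, a Basic Mode filter with Input Column status, Qualifier "Is", Comparator "Like", and Value %active%, together with a Limit of 100, would produce SQL broadly equivalent to the sketch below. The table and column names are illustrative, and the exact SQL the component emits may differ:

-- generated from the Data Selection, Data Source Filter, and Limit properties
select order_id, status
from orders
where status like '%active%'
limit 100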

Snowflake Properties

Property Setting Description
Name Text The descriptive name of the component.
Basic/Advanced Mode Select Basic: This mode will build a query using settings determined by the user in the Data Source, Data Selection, and Data Source Filter properties. In most use cases, this mode will be sufficient.
Advanced: (The default setting for this component.) This mode will require the user to write an SQL-like query to call data from the RDS database.
Database Type Select aurora: Amazon Aurora (see the AWS documentation).
mariadb: Amazon RDS for MariaDB (see the AWS documentation).
mssql: Microsoft SQL Server (see the AWS documentation).
mysql: MySQL (see the AWS documentation).
oracle: Oracle (see the AWS documentation).
postgresql: (The default setting for this component.) PostgreSQL (see the AWS documentation).
Note: For Oracle, users must provide an Oracle JDBC driver, because this is not distributed with Matillion ETL.
RDS Endpoint Select/Text Please select an RDS Database Endpoint from the dropdown menu. By default, this property offers the user a list of all of the RDS instances available within the user's current region that are the same type as the selected Database Type.
If the desired endpoint is located in a different region, or the user is not running on Amazon EC2 and therefore does not have a region, they can provide values manually. To acquire your database endpoint and provide it manually, follow these steps:
  1. Log in to the AWS Console.
  2. In the "Find Services" search bar, search for "RDS".
  3. In the "Amazon RDS" navigation column on the left side of your screen, click "Databases".
  4. Select a database.
  5. Locate the endpoint for that database in the "Connectivity & security" section.
Note: Users must include the port number when manually typing the endpoint, for example mydatabase.abc123xyz0.eu-west-1.rds.amazonaws.com:5432 (an illustrative hostname; 5432 is the default PostgreSQL port).
Database Name Text Please provide the name of the database within your RDS instance. In the AWS Console, this is the "DB identifier".
Username Text Please provide the RDS connection username.
Password Text Please provide the RDS connection password. Users have the option to store their password inside the component, but it is highly recommended to use the Password Manager option.
SQL Query Text (Advanced Mode only.) Use this setting to write an SQL query in the dialect of the RDS database. The query can be as simple as
select * from tablename
Where possible, it should be a simple "select" query (an example query is given after the Redshift Properties table above).
JDBC Options Parameter A JDBC parameter supported by the Database Driver. The available parameters are determined automatically from the driver and may change from version to version. Manual input is not normally required, since sensible defaults are assumed.
Value A value for the given parameter.
Data Source Select Please select a data source.
Data Selection Multiple Select Please select one or more columns to return from the query. The columns available depend on the selected Data Source.
Data Source Filter Input Column The available input columns vary depending upon the Data Source.
Qualifier Is: Compares the column to the value using the comparator.
Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
Comparator Choose a method of comparing the column to the value. Possible comparators include: "Equal To", "Greater than", "Less than", "Greater than or equal to", "Less than or equal to", "Like", "Null".

"Equal To" can match exact strings and numeric values while other comparators, including "Greater than", "Less than", "Greater than or equal to", and "Less than or equal to", will work only with numerics. The "Like" operator allows the wildcard character (%) to be used at the start and end of a string value to match a column. The Null operator matches only Null values, ignoring whatever the value is set to.

Note: Not all data sources support all comparators, meaning that it is likely that only a subset of the above comparators will be available to choose from.
Value The value to be compared.
Combine Filters Select Use the defined filters in combination with one another according to either "and" or "or".
Limit Number Set a limit for the number of rows that are loaded.
Primary Keys Multiple Select (Default state for this setting is empty.) Please select one or more columns to be designated as Primary Keys for the table.
Warehouse Select Choose a Snowflake warehouse that will run the load.
Database Select Choose a database to create the new table in.
Schema Select Select the table schema. The special value [Environment Default] will use the schema defined in the environment. For more information on using multiple schemas, see this article.
Target Table Text Provide a new table name.
Warning: This table will be recreated and will drop any existing table of the same name.
Staging Select Select a staging setting.
Snowflake Managed: Allow Matillion ETL to create and use a temporary internal stage on Snowflake for staging the data. This stage, along with the staged data, will cease to exist after loading is complete.
(AWS only) Existing Amazon S3 Location: Selecting this will offer the S3 Staging Area property, allowing users to specify a custom staging area on Amazon S3 (see the load sketch after this table).
(Azure only) Existing Azure Blob Storage Location: Selecting this will offer the Storage Account and Blob Container properties, allowing users to specify a custom staging location on Azure.
(GCP only) Existing Google Cloud Storage Location: Selecting this will offer the GCS Staging Area property, allowing users to specify a custom staging area within Google Cloud Storage.
GCS Staging Area Select (GCP only) The URL and path of the target Google Cloud Storage bucket to be used for staging the queried data.
Integration Select Choose your Google Cloud Storage Integration. Integrations are required to permit Snowflake to read data from and write to a Google Cloud Storage bucket. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read our Storage Integration Setup Guide.
S3 Staging Area Text The name of an S3 bucket for temporary storage. Ensure your access credentials have S3 access, as well as permission to write to the bucket. See this document for details on managing credentials in Matillion ETL. Visit the Snowflake documentation for further information on configuring secure access to S3.
Note: The temporary objects created in this bucket will be removed after the load completes; they are not kept.
This property is available when using an Existing Amazon S3 Location for Staging.
Encryption Select Decide how the files are encrypted inside the S3 bucket. This property is available when using an Existing Amazon S3 Location for Staging.
None: No encryption.
SSE KMS: Encrypt the data according to a key stored on KMS.
SSE S3: Encrypt the data according to a key stored on an S3 bucket.
KMS Key ID Select (AWS Only) The ID of the KMS encryption key you have chosen to use in the Encryption property.
This property only becomes available if a KMS key has been set.
Load Options Multiple Select Clean Staged Files: Destroy staged files after loading data. Default is On.
String Null is Null: Converts any strings equal to "null" into a null value. This is case sensitive and only works with entirely lower-case strings. Default is Off.
Recreate Target Table: Choose whether the component recreates its target table before the data load. If Off, the component will use an existing table or create one if it does not exist. Default is On.
File Prefix: Give staged file names a prefix of your choice. The default setting is an empty field.
Trim String Columns: Remove leading and trailing characters from a string column. Default is On.
Use Grid Variable: Check this checkbox to use a grid variable. This box is unchecked by default.
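
Example: Staged Load on Snowflake

As a rough illustration of what the staged load involves when using an Existing Amazon S3 Location, the statements below sketch the effect of the component's load step. The table, column, bucket, and prefix names are illustrative; the component generates the actual statements, file format options, and credentials itself:

-- the target table is recreated first (when Recreate Target Table is On)
create or replace table staged_customers (
    id number,
    first_name varchar,
    last_name varchar
);

-- staged objects are then loaded from the S3 Staging Area;
-- in practice the component supplies the necessary credentials or storage integration
copy into staged_customers
from 's3://my-staging-bucket/rds-query-staging/'
file_format = (type = csv);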

Strategy

Connect to the RDS database and issue the query. Stream the results into objects on S3. Then create or truncate the target table and issue a COPY command to load the S3 objects into the table. Finally, clean up the temporary S3 objects.
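
On Redshift, this sequence is broadly equivalent to the sketch below. The table, bucket, and IAM role names are illustrative, and the component generates the actual statements and credentials:

-- create (or recreate) the target table
create table staged_customers (
    id integer,
    first_name varchar(256),
    last_name varchar(256)
);

-- load the staged S3 objects; Redshift loads multiple files in parallel,
-- which is why the Concurrency property can improve load times
copy staged_customers
from 's3://my-staging-bucket/rds-query-staging/'
iam_role 'arn:aws:iam::123456789012:role/my-redshift-role'
format as csv;

-- the temporary S3 objects are then deleted (see the Clean S3 Objects load option)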
