JDBC Incremental Load

This article is part of the series on Incremental Load Tools and Shared Jobs.

Overview

The JDBC Incremental Load component lets users easily set up a Shared Job that incrementally loads from JDBC-compliant databases, rather than manually creating such a job, which would require significantly more expertise.

Users should schedule their incremental load jobs to run periodically for the job to continually update the created tables. To learn more about scheduling, read Manage Schedules.

In the Components panel, type "JDBC incremental load" to locate the component, and drag it onto the canvas. The wizard will open once you drop the component onto the canvas.


JDBC incremental load setup (Snowflake)

Complete the following six pages in the wizard.

1. Database Selection

Page 1 of the wizard explains the setup process and collects your database connection details.

  • Database Type: Select the type of database to connect to. The available database types are:

    • IBM DB2
    • Microsoft SQL Server
    • MySQL
    • Oracle
    • PostgreSQL
  • Connection URL: Enter your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL provides this template: jdbc:postgresql://<host>/<database>.

  • Username: Provide the username for the database.

  • Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.
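As a sketch of how the template URLs work, the placeholders are replaced with real values before the connection is made. The host and database names below are hypothetical:

```python
# Filling in the wizard's PostgreSQL template URL.
# "db.example.com" and "sales" are hypothetical placeholder values.
template = "jdbc:postgresql://<host>/<database>"
url = template.replace("<host>", "db.example.com").replace("<database>", "sales")
```

The other database types follow the same pattern with their own JDBC URL prefixes.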

Click Next.

2. Connection Options

Page 2 of the wizard is for setting connection options. These are defined in parameter=value pairs. To add a new connection option, click +.
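To illustrate the parameter=value format, each option is a single string split at the first equals sign. This is a minimal sketch, not Matillion code, and the option names below are hypothetical examples; the options your driver accepts are database-specific:

```python
# Parse "parameter=value" strings into a lookup of connection options.
def parse_connection_options(pairs):
    """Split each 'parameter=value' string at the first '=' sign."""
    return dict(pair.split("=", 1) for pair in pairs)

# Hypothetical example options; consult your driver's documentation.
options = parse_connection_options(["loginTimeout=30", "ssl=true"])
```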

Separate documentation pages list the connection options for each supported database type.

Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.

Click Next.

3. Data Sources

Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 will show a "Success" message at the top; otherwise, it will show "No Database Type specified".

Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.

Click Next.

4. Data Selection

Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.

Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can set any columns as Incremental as well as define the Primary key for the table.

Click Next.

5. Staging Configuration

On page 5 of the wizard you will specify data staging details, as follows:

Property | Setting | Description
Staging Table Prefix | String | Specify a prefix to be added to all tables that are staged.
Staging Warehouse | Select | Select the staging warehouse.
Staging Database | Select | Select the staging database.
Staging Schema | Select | Select the staging schema.

Click Next.

6. Target Configuration

On page 6 of the wizard you will specify target data warehouse details, as follows:

Property | Setting | Description
Target Table Prefix | String | Specify a prefix to be added to all tables in the load.
Target Warehouse | Select | Select the target warehouse.
Target Database | Select | Select the target database.
Target Schema | Select | Select the target schema.
Concurrency | Select | Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup, or click Back to return to earlier wizard pages.


JDBC incremental load setup (Redshift)

Complete the following five pages in the wizard.

1. Database Selection

Page 1 of the wizard explains the setup process and collects your database connection details.

  • Database Type: Select the type of database to connect to. The available database types are:

    • IBM DB2
    • Microsoft SQL Server
    • MySQL
    • Oracle
    • PostgreSQL
  • Connection URL: Enter your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL provides this template: jdbc:postgresql://<host>/<database>.

  • Username: Provide the username for the database.

  • Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.

Click Next.

2. Connection Options

Page 2 of the wizard is for setting connection options. These are defined in parameter=value pairs. To add a new connection option, click +.

Separate documentation pages list the connection options for each supported database type.

Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.

Click Next.

3. Data Sources

Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 will show a "Success" message at the top; otherwise, it will show "No Database Type specified".

Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.

Click Next.

4. Data Selection

Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.

Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can set any columns as Incremental as well as define the Primary key for the table.

Click Next to move to the final page of the wizard.

5. Configuration

On page 5 of the wizard you will specify data warehouse details, as follows:

Property | Setting | Description
Staging Bucket | Select | Select the S3 bucket for data staging from the dropdown list. The available buckets depend on the selected Redshift cluster.
Staging Table Prefix | String | Specify a prefix to be added to all tables that are staged.
Stage Schema | Select | Select the Redshift schema via which tables will be staged.
Target Table Prefix | String | Specify a prefix to be added to all tables in the load.
Target Schema | Select | Select the Redshift schema into which tables will be loaded.
Target Distribution Style | Select | Select the distribution style: All copies rows to all nodes in the Redshift cluster; Even distributes rows evenly around the cluster. The default is Even.
Concurrency | Select | Select whether to load data in a Concurrent or Sequential method.

Click Create & Run to finish the setup, or click Back to return to earlier wizard pages.


JDBC incremental load setup (BigQuery)

Complete the following six pages in the wizard.

1. Database Selection

Page 1 of the wizard explains the setup process and collects your database connection details.

  • Database Type: Select the type of database to connect to. The available database types are:

    • IBM DB2
    • Microsoft SQL Server
    • MySQL
    • Oracle
    • PostgreSQL
  • Connection URL: Enter your database's connection URL. Once you have selected a database type, Matillion ETL automatically loads a template URL for that database type. For example, selecting PostgreSQL provides this template: jdbc:postgresql://<host>/<database>.

  • Username: Provide the username for the database.

  • Password name: Select a configured password entry from the dropdown list. To add a new password entry, or edit or remove existing entries, click Manage. Read Manage Passwords for more information.

Click Next.

2. Connection Options

Page 2 of the wizard is for setting connection options. These are defined in parameter=value pairs. To add a new connection option, click +.

Separate documentation pages list the connection options for each supported database type.

Schema: Select the source schema. Depending on the selected database type and the selected user, this property may be hidden.

Click Next.

3. Data Sources

Page 3 of the wizard focuses on the data sources (tables) to load. If the database setup and connection options were applied successfully, page 3 will show a "Success" message at the top; otherwise, it will show "No Database Type specified".

Use the arrow buttons to select which data sources to add to the incremental load. Move data sources from the left column to the right column to include them. You can type a string into the text field at the top of a column to filter the sources shown, which helps when searching an extensive list of sources.

Click Next.

4. Data Selection

Page 4 of the wizard requires you to confirm the columns to be loaded from each selected data source.

Click the settings icon against each table to open the Select Columns dialog. In the dialog, you can set any columns as Incremental as well as define the Primary key for the table.

Click Next.

5. Staging Configuration

On page 5 of the wizard you will specify data staging details, as follows:

Property | Setting | Description
Cloud Storage Area | Select | Select the cloud storage bucket to stage the data.
Staging Table Prefix | String | Specify a prefix to be added to all tables that are staged.
Staging Project | Select | Select the staging project.
Staging Dataset | Select | Select the staging dataset.

Click Next.

6. Target Configuration

On page 6 of the wizard you will specify target data warehouse details, as follows:

Property | Setting | Description
Target Table Prefix | String | Specify a prefix to be added to all tables in the load.
Target Project | Select | Select the target BigQuery project.
Target Dataset | Select | Select the target dataset.
Concurrency | Select | Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup, or click Back to return to earlier wizard pages.


JDBC incremental load setup (Delta Lake on Databricks)

Matillion ETL supports JDBC incremental loading on Delta Lake, but doesn't provide a component for this. Instead, we have created a shared job for use with Delta Lake. This shared job's properties must be configured when you add it to an orchestration job.

To use the shared job:

  1. In Matillion ETL, with an orchestration job open on the canvas, open the Shared Jobs panel in the bottom left.
  2. In the Shared Jobs panel, expand the Read-Only, then Matillion, then Incremental folders.
  3. Locate the JDBC Incremental Load job and drag it onto the canvas.
  4. Select the icon on the canvas to open the Properties panel, and configure the following properties.

Name = string

A human-readable name for the component.

Database Type = string

Enter the type of database to connect to. The available database types are:

  • Microsoft SQL Server
  • Oracle
  • PostgreSQL
  • MySQL
  • IBM DB2
Note

For some databases, you must first provide a JDBC driver as not all drivers can be distributed with Matillion ETL. See Manage Database Drivers for instructions.

Connection URL = string

The string used to define the connection URL of the source database. This will be passed in from the calling routine.

Username = string

The username to be supplied to connect to the source. This applies to the generic JDBC routine. Different authentication options may be required for the other routines.

Password Manager Entry = string

The name in the Matillion ETL Password Manager for the password relating to the authentication method.

Source Schema = string

The name of the schema in the source database where the tables are located. This may not be required for some databases.

Load Type = string

Enter one of the following load types:

  • Sequential: Iterations are run in sequence, waiting for each to complete before starting the next. This is the default.
  • Concurrent: Iterations are run concurrently. This requires all "Variables to Iterate" to be defined as copied variables so that each iteration gets its own copy of the variable isolated from the same variable being used by other concurrent executions. The maximum concurrency is limited by the number of available threads (2x the number of virtual CPUs on your cloud instance).
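As a rough illustration of the difference between the two load types, the sketch below (plain Python, not Matillion code; the table names are hypothetical) runs the same per-table work first in sequence and then in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source tables to load incrementally.
tables = ["orders", "customers", "products"]

def load_table(table):
    # Placeholder for the per-table incremental load work.
    return f"loaded {table}"

# Sequential: one table at a time, each waiting for the previous to finish.
sequential_results = [load_table(t) for t in tables]

# Concurrent: iterations run in parallel, bounded by the available threads.
# In Matillion ETL, each iteration's variables must be *copied* so that
# concurrent iterations do not share state.
with ThreadPoolExecutor(max_workers=2) as pool:
    concurrent_results = list(pool.map(load_table, tables))
```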

Table and Columns = grid variable

In the Table and Columns dialog, define a grid variable containing the following details for each requested table:

  • Table Name
  • Column Name
  • Incremental Column: Enter 1 if the column is incremental, 0 if it is not.
  • Key Column: Enter 1 if the column is a key column, 0 if it is not.
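For illustration, the grid variable can be pictured as rows of four values per column. The table and column names below are hypothetical:

```python
# Hypothetical grid-variable rows for one source table. Each row is
# (Table Name, Column Name, Incremental Column flag, Key Column flag).
table_and_columns = [
    ("orders", "order_id",   0, 1),  # key column
    ("orders", "updated_at", 1, 0),  # incremental column
    ("orders", "amount",     0, 0),  # ordinary column
]

# Which columns drive the incremental logic, and which form the key.
key_columns = [col for (_, col, _, key) in table_and_columns if key == 1]
incremental_columns = [col for (_, col, inc, _) in table_and_columns if inc == 1]
```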

Connection Options = columns editor

A list of parameters as name=value pairs. Parameters and their permitted values are database and driver specific; refer to the data model for guidance on what can be provided here. Connection options are usually not required, as sensible defaults are assumed.

S3 Bucket Name = string

(AWS only) The name of an S3 bucket for temporary storage. Ensure your access credentials have S3 access and permission to write to the bucket. Read Manage Credentials for details on setting up access. The temporary objects created in this bucket will be removed after the load completes.

Stage Prefix = string

A prefix value that will be added to the start of the stage table names. For example, if a stage prefix of "stage_" is specified and the table being processed is named "test_data", then the stage table will be named "stage_test_data".

Stage Database = string

The database name where the staging data will be stored.

Target Prefix = string

A prefix value that will be added to the start of the target table names. For example, if a target prefix of "target_" is specified and the table being processed is named "test_data", then the target table will be named "target_test_data".
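The prefix behaviour described for Stage Prefix and Target Prefix amounts to simple string concatenation, sketched here with the article's own example names:

```python
def prefixed(prefix, table_name):
    """Prepend the configured prefix to a table name."""
    return f"{prefix}{table_name}"

stage_table = prefixed("stage_", "test_data")    # stage table name
target_table = prefixed("target_", "test_data")  # target table name
```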

Target Database = string

The database name where the target data will be stored.

Encryption = string

(AWS only) Decide how the files are encrypted inside the S3 bucket. Enter one of the following:

  • None: No encryption.
  • SSE KMS: Encrypt the data according to a key stored on KMS.
  • SSE S3: Encrypt the data according to a key stored on an S3 bucket.

KMS Key ID = string

The ID of the KMS encryption key you have chosen to use.

Storage Account = string

(Azure only) An Azure Storage Account to allow Matillion ETL to specify a blob container object for staging data.

Blob Container = string

(Azure only) Azure Blob Container to allow Matillion ETL to use the blob storage container object for staging data.

Log Metrics = string

Whether to add useful information to the metrics store: enter TRUE (the default) or FALSE.

Automatically Update Target Metadata = string

For incremental loads, this property determines whether the job automatically updates the target table metadata when it detects metadata changes between the source and target tables. Enter Yes (the default) or No.


Enable schema drift

Schema drift support accommodates changes made to the source data, such as:

  • Missing columns as a result of changes in the source schema. Missing columns are loaded as NULL.
  • Data type changes for specified columns in the shared job configuration. For further information, see below.
  • Tables no longer present in the source schema. Missing tables are no longer loaded, and the shared job will fail, although all other tables specified in the shared job configuration will still be loaded. If this occurs, edit the shared job's table and columns grid variable to remove the missing table.
  • Manual addition of new tables or columns via the table and columns grid variable within the existing shared job configuration. If a new table or column is added to your source, it is not added to the shared job configuration by default; however, new tables or columns can be added manually.

Data type changes will also be accommodated, but if a change is not compatible with the target cloud platform, the current column is renamed as <column_name>_<datetime> and the column is re-purposed with the new data type. The datetime suffix has the format _yyyymmddhhmmss (for example, _20210113110334) and is the same for all columns in the same table in the same shared job configuration. The new column will be NULL up to the date of the change, which must be considered for downstream dependencies such as views and reports.
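A sketch of this rename rule, using the suffix format described above (the column name is hypothetical):

```python
from datetime import datetime

# When a column's data type changes incompatibly, the old column is renamed
# with a _yyyymmddhhmmss suffix recording when the change was detected.
def drifted_name(column_name, changed_at):
    return f"{column_name}_{changed_at.strftime('%Y%m%d%H%M%S')}"

# Using the timestamp from the article's example suffix, _20210113110334.
old_name = drifted_name("amount", datetime(2021, 1, 13, 11, 3, 34))
```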

When the wizard completes, view the list of the component's properties and click ... next to Automatically Update Target Metadata. Delete "No", type "Yes" into the text field provided, and then click OK to save the change and enable schema drift support.

