Salesforce Incremental Load
  • Dark
    Light

Salesforce Incremental Load

  • Dark
    Light

This article is part of the series target="blank">Incremental Load Tools and Shared Jobs.

Overview

The Salesforce Incremental Load component lets users configure a pre-existing (read-only) Shared Job that will perform an incremental load from Salesforce. When the component is added to the Matillion ETL canvas, the Salesforce Incremental Load wizard is activated. Once this wizard is finalized, a unique component is added to the canvas, with a custom configuration provided by the user's setup choices.

Users should schedule their incremental load jobs to run periodically for the job to continually update the created tables. To learn more about scheduling, read Manage Schedules.

In the Components panel, type "salesforce incremental load" to locate the component and add it to the canvas.


Salesforce Incremental Load Setup (Snowflake)

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

Salesforce OAuth

Here, users can select a configured Salesforce OAuth entry from the dropdown list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.

In this example. an OAuth entry named "Salesforce" has been selected.

Parameter | Value

Next, specify any connection options. These exist as parameter=value pairs. To add a parameter, click . Users can consult the Salesforce Data Model for more information about connection options.

Sync Deleted Records

Check this box if you wish to synchronize record deletions from source data to the target table. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Data Sources

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

By default, the right column is empty, and users must manually add the columns they wish to load.

In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

Use the text fields above the columns to refine your searches.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Select Columns

Click to navigate into a similar multiple-select dialog that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Staging Configuration

Page 4 of the wizard requires users to specify details for data staging.

Property Setting Description
Staging Table Prefix String Specify a prefix to be added to all tables that are staged.
Staging Warehouse Select Select the staging warehouse.
Staging Database Select Select the staging database.
Staging Schema Select Select the staging schema.

Click Next.

5. Target Configuration

Page 5 of the wizard requires users to specify target data warehouse details.

Property Setting Description
Target Table Prefix String Specify a prefix to be added to all tables in the load.
Target Warehouse Select Select the target warehouse.
Target Database Select Select the target database.
Target Schema Select Select the target schema.
Concurrency Select Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup, or else click Back to cycle back through the wizard and make any desired changes.


Salesforce Incremental Load Setup (Redshift)

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load generator, and requires basic connection information.

Salesforce OAuth

Here, users can select a configured Salesforce OAuth entry from the dropdown list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.

In this example an OAuth entry named "Salesforce" has been selected.

Parameter | Value

Next, specify any connection options. These exist as parameter=value pairs. To add a parameter, click . Users can consult the Salesforce Data Model for more information about connection options.

Sync Deleted Records

Check this box if you wish to synchronize record deletions from source data to the target table. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Data Sources

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

By default, the right column is empty, and users must manually add the columns they wish to load.

In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

Use the text fields above the columns to refine your searches.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Select Columns

Click to navigate into a similar multiple select dialog that, but that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Configuration

Page 4 of the wizard requires users to specify data warehouse details.

Property Setting Description
Staging Bucket Select Select the S3 bucket from the dropdown list for data staging. The available buckets depend on the selected Redshift cluster.
Staging Table Prefix String Specify a prefix to be added to all tables that are staged.
Stage Schema Select Select the Redshift schema via which tables will be staged.
Target Table Prefix String Specify a prefix to be added to all tables in the load.
Target Schema Select Select the Redshift schema into which tables will be loaded.
Target Distribution Style Select Select the distribution style.
  1. All Copy rows to all nodes in the Redshift Cluster.
  2. Even Distribute rows around the Redshift cluster evenly. This is the default setting.
Concurrency Select Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup; or else click Back to cycle back through the wizard.


Salesforce Incremental Load Setup (BigQuery)

1. Connection Details

Page 1 of the wizard gives users a brief explanation of the Salesforce Incremental Load Generator, and requires basic connection information.

Salesforce OAuth

Here, users can select a configured Salesforce OAuth entry from the dropdown list. To add a new OAuth entry, or edit or remove existing entries, click the Manage button and refer to Salesforce Query Authentication Guide to learn how to set up and authorize a Salesforce OAuth entry.

In this example an OAuth entry named "Salesforce" has been selected.

Parameter | Value

Next, specify any connection options. These exist as parameter=value pairs. To add a parameter, click . Users can consult the Salesforce Data Model for more information about connection options.

Sync Deleted Records

Check this box if you wish to synchronize record deletions from source data to the target table. By default, this box is not checked.

Click Next.

2. Data Sources

Page 2 of the wizard focuses on the data sources (tables) to load. If the OAuth entry and connection options were applied successfully, page 2 will show the success message.

Data Sources

Use the arrow buttons to select which data sources to add to the incremental load. Data sources in the left column are not set to be included; data sources in the right column will be included in the incremental load.

By default, the right column is empty, and users must manually add the columns they wish to load.

In this example, the data sources "Account" and "AcceptedEventRelation" have been selected for the incremental load.

Use the text fields above the columns to refine your searches.

Click Next.

3. Columns

Page 3 of the wizard requires users to confirm the columns to be loaded from each selected data source.

Select Columns

Click to navigate into a similar multiple select dialog that, but that is unique to the data source and its columns. From here, use the arrow buttons to add or remove any columns. By default, all columns are set to be loaded.

Click Next.

4. Staging Configuration

Page 4 of the wizard requires users to specify details for data staging.

Property Setting Description
Cloud Storage Area Select Select the Cloud Storage bucket to stage the data.
Staging Table Prefix String Specify a prefix to be added to all tables that are staged.
Staging Project Select Select the staging project.
Staging Dataset Select Select the staging dataset.

Click Next.

5. Target Configuration

Page 5 of the wizard requires users to specify target data warehouse details.

Property Setting Description
Target Table Prefix String Specify a prefix to be added to all tables in the load.
Target Project Select Select the target BigQuery Project.
Target Dataset Select Select the target dataset.
Concurrency Select Select whether to load data in a concurrent or sequential method.

Click Create & Run to finish the setup; or else click Back to cycle back through the wizard.



Completion

Upon completion of the wizard, a Salesforce Incremental Load component will be present on the canvas. This component is not identical to the Salesforce Query component, but it does utilise the same ideas to perform an incremental load from Salesforce.

To make changes to the component, simply click on it, and click into any of the component's properties—exactly as you would with any other Matillion ETL component. Users should avoid making changes to the component during a running workflow.



Enable schema drift

Schema drift support accommodates changes made to the source data, such as:

  • Missing columns as a result of changes in the Salesforce schema. Missing columns are loaded as NULL.
  • Data type changes for specified columns in the shared job configuration. For further information, see below.
  • Tables no longer present in the Salesforce schema. Any missing tables will no longer be loaded. However, your shared job will fail. All other tables specified as part of the configuration in the shared job will be loaded. If this scenario occurs, edit your shared job Table and Columns grid variable to remove the missing table.
  • Manual addition of new tables or columns via the Table and Columns grid variable within the existing shared job configuration. If a new table or column is added to your source, it is not added to the shared job configuration as a default behavior. However, any new tables or columns can be added manually.

Data Type changes will also be accommodated, but if these are not compatible changes for the target cloud platform, the current column will be renamed as <column_name>_datetime and the column re-purposed as the new data type. The format of the datetime extension is _yyyymmddhhmmss, e.g._20210113110334 and will be the same for all columns in the same table in the same shared job configuration. The new column will be NULL up to the date of change—this should be considered for downstream dependencies such as views and reports.

Upon completion of the wizard, view the list of the component's properties, and click next to Automatically Update Target Metadata. Delete "No" and replace it by typing "Yes" into the text field provided. Then, click OK to save the change, and enable schema drift support.



Blacklisted Tables

The following tables have been blacklisted:

  • AccountHistory
  • Announcement
  • ContactHistory
  • ContentDocumentLink
  • ContentFolderItem
  • ContentFolderMember
  • ContentHubItem
  • DataStatistics
  • DatacloudAddress
  • EntityParticle
  • FeedAttachment
  • FeedItem
  • FeedRevision
  • FieldDefinition
  • FlexQueueItem
  • LeadHistory
  • LeadShare
  • ListViewChartInstance"
  • Note
  • OutgoingEmail
  • OutgoingEmailRelation
  • OwnerChangeOptionInfo
  • PicklistValueInfo
  • PlatformAction
  • RelationshipDomain
  • RelationshipInfo
  • SearchLayout
  • SiteDetail
  • TaskFeed
  • TaskRelation
  • UserEntityAccess
  • UserFieldAccess
  • Vote


Video