Data Staging Components

Data Staging Components


Matillion ETL comes with many components that can retrieve data from other services and load that data into a table. These components can be called Data Stagers, Connectors, Query Components or Integrations, quite interchangeably.

Each Data Stager retrieves data via API calls using SQL queries that can filter the data before it reaches RedshiftSnowflakeBigQuery. The data is loaded into a table of your choosing and by default will destroy and recreate that table before loading the data. To avoid this, set Recreate Target Table to Off in the Load Options property

Query components in Matillion ETL comes with a Data Model that describes connection options, tables and views associated with the collection of data from that particular service which are linked alongside the components below. If there is no Data Model associated with the component, it likely uses the Matillion ETL API Query model and/or a custom API Profile.

Most Data Stagers tend to work in the same way; you enter relevant credentials to access the service and opt to use either a Basic or Advanced mode for your queries.
Note: For some Data Staging components, an Incremental Load Tool is available which can be considerably easier to use than it would be to create your own incremental loads. 

Basic Query

In Basic Mode, the Data Staging component will come with many properties that the user can set (usually from a predefined list) and will build an SQL query from those choices. The user does not write any SQL themselves but instead can select a data source and columns which can then be filtered.

Advanced Query

Using the Advanced Mode will allow users to submit their own SQL queries to the service via the SQL Editor shown below.

For most Query Components, available data sources and their columns are listed in the Metadata Explorer. Environment Variables are listed in the Variables panel. Both columns and variables can be brought into the SQL by double-clicking them.

The editor allows users to test their SQL on the fly by hitting the Sample button that will return their query results up to the entered limit.

STL Load Errors

When an error occurs loading data through these components, you will be notified of the error on the Tasks tab for that run.

Clicking the STL Load Errors will bring up a new dialogue with verbose error output.

Each error in the output is given as a collapsible box listing its position in the data, what the offending row holds and the reason for the error. These results are paginated and can be explored using the Next and Back buttons.


Debug Mode

If you are experiencing issues with any given Data Staging component, we encourage you to Contact Support. In order to get the best possible help in the quickest time, it is highly recommended that you attach a Debug Log that can be readily generated from most Data Staging components.

To generate a Debug Log, find the Auto Debug property at the bottom of your component and turn it On. A debug level can then be selected. A description of these properties is given below:


Property Setting Description
Auto Debug Select Choose whether to automatically log debug information about your load. These logs can be found in the Task History and should be included in support requests concerning the component. Turning this on will override any debugging Connection Options.
Debug Level Select The level of verbosity with which your debug information is logged. Levels above 1 can log huge amounts of data and result in slower execution.
1: Will log the query, the number of rows returned by it, the start of execution and the time taken, and any errors.
2: Will log everything included in Level 1, cache queries, and additional information about the request, if applicable.
3: Will additionally log the body of the request and the response.
4: Will additionally log transport-level communication with the data source. This includes SSL negotiation.
5: Will additionally log communication with the data source and additional details that may be helpful in troubleshooting problems. This includes interface commands.

WARNING: Do not leave the Auto Debug as On when not directly in need of it. It can potentially consume large amounts of disk space!


Refresh Source Schema

Sometimes you might effect change of a schema that Matillion ETL is using outside of the client. An example of this might be creating a new table or view via console commands and then attempting to find that table in a Data Staging component, only to find it does not appear. This is due to Matillion ETL using a cache of table data for many components.

To sync the Matillion ETL client with your platform and resolve this issue, right-click the component in question and select Refresh Source Schema.

Sampling Data

Every Data Stager Component has a Sample tab that is available when clicking the component on the canvas. Data can be sampled from the source using the Data button according to the component's current configuration; this is analogous to running the component and sampling the resulting table in a Transformation job.

Each sample is limited in its number of rows using the Limit field; however, the total number of rows that would be loaded if the component were to be run can be found by clicking the Row Count field button.