Data Staging Components
  • Dark
    Light
  • PDF

Data Staging Components

  • Dark
    Light
  • PDF

Overview

Matillion ETL comes with many components that can retrieve data from other services and load that data into a table. These components can be called data staging component, data stagers, connectors, query components or integrations, interchangeably.

Each data stager retrieves data via API calls using SQL queries that can filter the data before it reaches the cloud data warehouse. The data is loaded into a table of your choosing and by default will destroy and recreate that table before loading the data.

Note

You can avoid the target table being destroyed and recreated by setting Recreate Target Table to Off in the Load Options property.

Query components in Matillion ETL come with a data model that describes connection options, tables, and views associated with the collection of data from that particular service. If there is no data model associated with the component, it likely uses the Matillion ETL API Query model and/or a custom API query profile.

Most data stagers tend to work in the same way: you enter relevant credentials to access the service ,and opt to use either a Basic or Advanced mode for your queries.

Note

For some data staging components, an Incremental Load Generator is available which can be considerably easier to use than creating your own incremental loads.



Basic Query

When a Query component is set to Basic Mode, the component will come with many properties that the user can set (usually from a predefined list) and will build an SQL query from those choices. The user does not write any SQL themselves but instead can select a data source and columns that can then be filtered.



Advanced Query

When a Query component is set to Advanced Mode, a property named SQL Query is available, and users can write their own SQL queries to the service via the SQL editor, shown below.

Advanced query SQL editor

For most Query components, available data sources and their columns are listed in the Metadata Explorer panel. Environment variables are listed in the Variables panel. Columns and variables can be brought into the SQL by double-clicking them, or you can type their names directly into the SQL.

Test your SQL by clicking Sample. This will return a sample of the queried data, up to a specified row count Limit (or 10 rows by default).



STL Load Errors

When an error occurs loading data through these components, you will be notified of the error within the Tasks tab for that run.

Tasks tab

Click STL Load Errors to open a dialog with verbose error output.

Verbose STL load error message

Each error in the output is given as a collapsible box. Listed is its position in the data, what the offending row holds, and the reason for the error. These results are paginated and can be explored using the Next and Back buttons.



Debug Mode

If you are experiencing issues with a data staging component, we encourage you to contact Matillion support. Please provide us with a Debug Log when you contact support. This can be generated from most data staging components.

To generate a Debug Log, find the Auto Debug property at the bottom of a component and turn it On. Turning on Auto Debug activates the Debug Level property. A description of these properties is given below:

Property Setting Description
Auto Debug Boolean When On, debug information about your query is automatically logged. These logs can be found in the Task History. Please include these logs in support requests concerning the component. Turning on this property will override any debugging connection options.
Debug Level Select The level of verbosity with which your debug information is logged. Levels above 1 can log huge amounts of data and result in slower execution. The levels are:
  1. Logs the query, the number of rows returned, the start time of execution, the amount of time taken, and any errors.
  2. Logs everything included in Level 1, and also cache queries and additional information about the request, if applicable.
  3. Additionally logs the body of the request and the response.
  4. Additionally logs transport-level communication with the data source. This includes SSL negotiation.
  5. Additionally logs communication with the data source and additional details that may be helpful in troubleshooting problems. This includes interface commands.

Warning

Do not leave Auto Debug On when not directly in need of it, as it can consume large amounts of disk space.



Refresh Source Schema

Sometimes a source schema that Matillion ETL is using might be changed externally from Matillion ETL. An example of this is creating a new table via console commands and then attempting to find that table in a data staging component in Matillion ETL, only to find that the table does not appear. This is due to Matillion ETL using a cache of table data for many components.

To sync the Matillion ETL client with your platform and resolve this issue, right-click the component in question and select Refresh Source Schema.



Sampling Data

Data staging components in Matillion ETL have a sample option, which allows you to display a sample of the data the component produces before committing to running your full orchestration or transformation job.

On the component's Sample tab, click Data to load the sample data into Matillion ETL. You can limit the number of rows retrieved from the dataset, if required.

Sampling data