API v1 - Metadata

Important Information

  • This endpoint can only be accessed on Matillion ETL version 1.54.7 and later.
  • This endpoint is an Enterprise Mode feature.
  • This endpoint is currently only available for Matillion ETL for Snowflake and Matillion ETL for Redshift.
  • Credentials on COPY SQL statements will be redacted before being logged.

Overview

The Matillion ETL Metadata API lets you determine where data in Matillion ETL jobs has originated from. SQL queries that update data—insert, update, delete—or that alter database structures (for example, when creating a table) are logged in a single database table, along with metadata pertinent to the Orchestration or Transformation Job, as well as task information for the component.

This endpoint allows you to retrieve the SQL queries executed by a specific project against a specific environment. SQL queries that execute DDL or DML operations as part of a job run will be logged, as well as SELECT queries run on source systems as part of data ingestion.

This information can be used to determine the lineage of data staged by Matillion ETL and how it is transformed into new target tables in your cloud data warehouse (CDW).

Please Note

Entries older than seven days are removed daily from the table.



Record Field Explanations

Below is an example JSON block returned from the server when the endpoint is called. Beneath the JSON block is an explanation of each field.

{
  "taskType": "Run",
  "loggedTimestamp": 1619017102402,
  "startTimestamp": 1619017086243,
  "endTimestamp": 1619017102383,
  "urlPath": "/#test/test/default/lin_orch",
  "componentName": "Database Query",
  "componentCanvasName": "Stage - MySQL - Orders",
  "sourceConnectionUrl": "jdbc:mysql://mysql.eu-west-1.rds.amazonaws.com/tpcc",
  "sourceUsername": "admin",
  "environment": {
                  "name": "test",
                  "url": "matillion",
                  "schema": "PUBLIC",
                  "database": "PRODUCT",
                  "user": "username",
                  "connectionOptions": {}
                  },
  "job": {
          "name": "lin_orch",
          "type": "ORCHESTRATION"
          },
  "sql": [
            {
                "query": "SELECT \n\t`o_id`, \n\t`o_w_id`, \n\t`o_d_id`, \n\t`o_c_id`, \n\t`o_carrier_id`, \n\t`o_ol_cnt`, \n\t`o_all_local`, \n\t`o_entry_d`\nFROM `orders`",
                "dialect": "MySQL"
            },
            {
                "query": "truncate \"PRODUCT\".\"PUBLIC\".\"jc_orders\"",
                "dialect": "Snowflake"
            },
            {
                "query": "COPY INTO \"PRODUCT\".\"PUBLIC\".\"jc_orders\" (\"o_id\", \"o_w_id\", \"o_d_id\", \"o_c_id\", \"o_carrier_id\", \"o_ol_cnt\", \"o_all_local\", \"o_entry_d\")\n FROM 's3://bucket/XXXXXX' CREDENTIALS = (XXXXXX) FILE_FORMAT=(TYPE='CSV'FIELD_DELIMITER='\\t' COMPRESSION='GZIP' TIMESTAMP_FORMAT='auto' DATE_FORMAT='auto' TIME_FORMAT='auto' TRIM_SPACE=TRUE NULL_IF=('\\\\N') EMPTY_FIELD_AS_NULL=FALSE)\nON_ERROR=ABORT_STATEMENT TRUNCATECOLUMNS=TRUE",
                "dialect": "Snowflake"
            }
        ]
    }
Field                 Description

taskType              Describes how the component was run, using the display name in the DatabaseTaskBatchType enumeration. For example, "Run", "Schedule", "API".
loggedTimestamp       The time, in milliseconds (ms), at which the record was logged.
startTimestamp        The time, in milliseconds (ms), at which the component run started.
endTimestamp          The time, in milliseconds (ms), at which the component run completed.
urlPath               The path appended to the host, used for accessing the Orchestration or Transformation Job containing this component.
componentName         The name of the underlying Matillion ETL component. For example, Truncate Table.
componentCanvasName   The name of the component as defined by the user in the component's Name property.
sourceConnectionUrl   The connection URL configured on certain components. This field is only populated for a select few components, and has the value "Not Applicable" elsewhere.
sourceUsername        The username for the data connection configured on certain components. This field is only populated for a select few components, and has the value "Not Applicable" elsewhere.
environment           Details of the Matillion ETL environment used to execute the job, such as the default cloud data warehouse (CDW) database and schema.
job                   A block of JSON data containing information about the Matillion ETL job.
sql                   All of the SQL related to the running of the component, presented as an array of objects that pair each query with its SQL dialect.
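The three timestamp fields are Unix epoch milliseconds (UTC). As a quick illustration, here is a sketch of decoding them in Python, using the values from the example record above:

```python
from datetime import datetime, timezone

# Timestamp values taken from the example record above (epoch milliseconds)
record = {
    "startTimestamp": 1619017086243,
    "endTimestamp": 1619017102383,
    "loggedTimestamp": 1619017102402,
}

def to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

started = to_utc(record["startTimestamp"])
ended = to_utc(record["endTimestamp"])
duration_ms = record["endTimestamp"] - record["startTimestamp"]

print(started.isoformat())  # an April 2021 UTC timestamp
print(duration_ms)          # 16140 (the component ran for ~16 seconds)
```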



Lineage Log API Endpoints


API Base URI

http(s)://<host>/rest/v1/group/name/<projectGroupName>/project/name/<projectName>/environment/name/<environmentName>

API Endpoint and Function

The Lineage API endpoint is a standard RESTful API that uses HTTP or HTTPS GET requests to retrieve data. The service is accessed via its Uniform Resource Identifier (URI). All references below assume the API Base URI has been specified.

The available API endpoint is listed below.

Method   Path       URI                                                    Function
GET      /lineage   /lineage?startTimestamp=<value>&endTimestamp=<value>   Get a lineage log of each component in a Matillion ETL job.

Full endpoint example:

http(s)://<host>/rest/v1/group/name/<projectGroupName>/project/name/<projectName>/environment/name/<environmentName>/lineage?startTimestamp=<value>&endTimestamp=<value>
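As a sketch, the full endpoint URL can be assembled programmatically from the base-URI placeholders. The host, group, project, and environment names below are hypothetical values for illustration only; names are percent-encoded in case they contain spaces or other reserved characters:

```python
from urllib.parse import quote

BASE_TEMPLATE = (
    "https://{host}/rest/v1/group/name/{group}/project/name/{project}"
    "/environment/name/{environment}/lineage"
)

def lineage_url(host, group, project, environment, start_ms, end_ms=None):
    """Build the full lineage endpoint URL.

    startTimestamp is required; endTimestamp is optional and only
    appended when provided.
    """
    url = BASE_TEMPLATE.format(
        host=host,
        group=quote(group, safe=""),
        project=quote(project, safe=""),
        environment=quote(environment, safe=""),
    )
    query = f"startTimestamp={start_ms}"
    if end_ms is not None:
        query += f"&endTimestamp={end_ms}"
    return f"{url}?{query}"

# Hypothetical names, not real values:
print(lineage_url("metl.example.com", "Group A", "Project", "test", 1619017086243))
# https://metl.example.com/rest/v1/group/name/Group%20A/project/name/Project/environment/name/test/lineage?startTimestamp=1619017086243
```

The URL could then be passed to any HTTP client (for example, `requests.get` with your Matillion ETL credentials) to perform the GET request.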



API Parameters and Description

Below you will find the endpoint parameters and a description of each.

Parameter        Description                                                                       Condition
startTimestamp   The earliest logged time for records to be returned. UTC time in milliseconds.    Required
endTimestamp     The latest logged time for records to be returned. UTC time in milliseconds.      Optional

When run, the endpoint returns a list of records, in ascending order of logged time, covering everything logged at or after the startTimestamp parameter and at or before the endTimestamp parameter (both bounds inclusive).
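Because both parameters are UTC epoch milliseconds, a typical call computes them from wall-clock time. A minimal sketch, assuming you want records from the last 24 hours (the window size is an arbitrary choice for illustration):

```python
from datetime import datetime, timedelta, timezone

def epoch_ms(dt: datetime) -> int:
    """Convert an aware datetime to Unix epoch milliseconds."""
    return int(dt.timestamp() * 1000)

now = datetime.now(timezone.utc)
start_ts = epoch_ms(now - timedelta(hours=24))  # earliest record to return (required)
end_ts = epoch_ms(now)                          # latest record to return (optional)

# These values are then passed as the query string:
#   ?startTimestamp={start_ts}&endTimestamp={end_ts}
```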

Please Note

By default, no more than 30,000 records can be returned per call. However, this limit can be modified if required. Please contact support if this applies to you.

