API Profiles - RSDs
  • Dark
    Light

API Profiles - RSDs

  • Dark
    Light

Overview

Matillion can extract and load data from most JSON and XML-based REST APIs, and also from some SOAP APIs (those which behave RESTfully).

Some positive indicators are:

  • You can get data from the API using a cURL or Postman.
  • The returned data contains an array (or repeating element) at the granularity you require.
  • The documents are regular i.e. the structure does not differ record by record.
  • If the API uses Paging, it provides a mechanism that you can use to navigate from page to page (e.g. providing an URL that points to the next page).

Important Information

Matillion's API Query CANNOT load data from:

  • Data other than valid JSON or XML.
  • Binary or unstructured data, such as PDF or Word documents, or images.
  • HTML
  • APIs where the data contains embedded CSVs.
  • APIs which return XML embedded inside JSON, or vice versa.
  • APIs in which the returned content is encoded or encrypted.

Matillion's API integration is known as an "API Query" and is managed through the ProjectManage API ProfilesManage Query Profiles.

Manage Query Profile

Manage Query Profiles

Integrating with a new API includes following process:

  1. Load and Save the JSON response of the API
  2. Creating API Query Profile
  3. Creating the RSD file
  4. Adding Parameters
  5. Pagination

Load and Save the JSON response of the API

We will be using an Open API Endpoint Finance API as an example to generate RSD file in this guide. This API is simple, universally available, and requires no authentication.The API returns a finance chart with specific details and is returned as a JSON document with a nested array. This can be accessed from a web browser by navigating to the URL: https://query1.finance.yahoo.com/v8/finance/chart/SNOW.

Before we start creating a new profile, please make sure you download and save the response in JSON format, as example "finance.json".

  • Use Postman (or cURL) to get a response back from the API endpoint you are trying to interface with, and save that data as a JSON or XML file.
  • Use the file to help build the first RSD of your API Query Profile.

Make a GET request into the postman and save the response into JSON format.

  1. Select the method (the action type) from the dropdown. The default method is GET.
  2. Input the API endpoint URL https://query1.finance.yahoo.com/v8/finance/chart/SNOW in the "Enter request URL" field.
  3. Save the server response in JSON or XML format. Click the Body tab, and perform the following tasks:
    • Click the raw button.
    • Select JSON from the dropdown.
  4. Save the response into the JSON format using Save Response.

Postman Response of API

Postman Response of API

Verify that the file contains an array or repeating element at the desired granularity. Take a note of the logical path to this repeating group.

Please Note

If you are able to access the API from your workstation, it does not mean that your Matillion server will also be able to access the API. Check with your network administrator to ensure that you have given your Matillion ETL instance appropriate outbound network access.


Creating an API Query profile

Once you have your data saved into a local file you can create your new API Query Profile from the Matillion Query Profiles interface using Project Manage API Profiles Manage Query Profiles .

  1. Click Project Manage API Profiles Manage Query Profiles to open the Manage Query Profiles interface window. Click to add a new Query profile.

    Manage Query Profiles

    Manage Query Profiles

    Please Note

    Matillion ETL already has several "out of the box" API profiles, including one for accessing its own REST API.

  2. In the Add Query Profile pop-up window, enter "FinanceAPI" (as example here) into the Profile Name field and click OK.

    Add API Profile

    Add API Profile

  3. On returning to the Manage Query Profiles pop-up window, find "FinanceAPI" in the API Profiles list, and click. This should open Configure API Profile pop-up window. Click New File to create a new Query file in the interface window. New File

    New File

  4. The New File button takes you to the Select File Type window. Here you need to choose the type of file you want to create. Select Create Sample File if you want to upload the saved JSON file.Then, click Next. Select File Type

    Select File Type

  5. Please Note

    • You are allowed to add files in various formats such as JSON, XML, RSD or RSB.
    • Select "Create Config File" if you want to upload/write an RSD or RSB file.
    • Select "Create Sample File" if you want to upload a JSON or XML sample file.
  6. Next to the Create File window, browse the JSON file you saved before. Create File

    Create File

  7. Review the file using the Validate button. Once validated successfully, click Finish. Validate File

    Validate File

  8. The "finance.json" Query profile will be created in Configure Query Profile window. JSON File created

    JSON File created


Creating the RSD file

Once a new Query profile is created, generate a new RSD file using the created finance.json Query profile.

  1. On the Configure Query Profiles window, click New Endpoint, to generate a new "RSD" script.

    New Endpoint

    New Endpoint

  2. On the next window, provide the name of the table in the Source details. You can also provide a description. To use JSON filefinance.json to generate the RSD file, click on the checkbox Use Sample File and select the file from the dropdown. Click Next. Source details

    Source details

  3. Please Note

    • If you click the box "Use Sample File" and select JSON file from the dropdown, the JSON response will be generated automatically based on the "finance.json" Query profile which was created before.
    • If you uncheck the box "Use Sample File", then at the Endpoint Configuration window, you need to provide Endpoint URI to fetch the data. You can preconfigure your RSD script file and add "Params" and "Auth".
  4. An API will typically return a JSON or XML document in "response" tab automatically, since you have selected the created JSON file in the previous step. Click Next. Endpoint Configuration

    Endpoint Configuration

  5. Please Note

    The Endpoint Configuration window allows you to add "Authentication" and "Parameter" attribute values using Auth and Params button in the wizard. The detail description on adding Parameters and Authentication will be discussed in the Adding Parameters section of this guide.

  6. On the Response Configuration window, you may only want specific items such as those nested in an object of the JSON. The repeat element allows you to specify that object, so that you can define your "xpath" from there, rather than the start of the document each time.

    If you want your "xpath" to start from the beginning of your object, just click Set Repeating Element using right click on the tree view fields (/charts/result) into the "Repeat Element" field. Enable the Paging feature and select the required strategy from the dropdown. Select the field from the tree view (using right click on the field) to populate the xpath, which points to where the data is in the JSON or XML document returned by the API.

    Response Configuration

    Response Configuration

  7. Please Note

    Enable "Paging" and select your strategy in the dropdown. The paging feature will make each returned data set have a single, top-level element "NextPageURI", which points to the URL the client should use to fetch the next page.

    For detailed desription of paging and its strategies, please visit the Paging section of this guide.

  8. If all the previous steps have been correctly configured, you should see a sample of the data in Data Preview window. You can also review your configurations by clicking "Config Review" button. When you click Finish, the details you have entered will be generated into an .rsd script, which you can then edit manually. Data Preview

    Data Preview

  9. Returning to the Configure Query Profile window, the file "finance.rsd" will now be open in the tabular mode. Click Advanced Mode at the top of the screen, and it will convert into the RSD file, which can be modified manually.

    Paste RSD file content

    Paste RSD file content

  10. Next, click Test to check the RSD file is working correctly. This will invoke the API and add details to the Tables panel. Clicking on the table name will then display its contents at the bottom of the Configure API Profile window.

    Test API Profile

    Test API Profile

Important Information

  • View the sample data at the bottom of the dialog and make sure it is retrieving the data at the expected granularity.
  • You may wish to adjust the fields returned, which you can do by editing the <attr> elements near the top of the RSD (lines 6 to 10 in the example above).

Please Note

Once a working example is available, it is usually best to create a second RSD file in the same profile to experiment with. Choose a different name for this, copy and paste the content from the working file, and use Test to verify that any changes made are working as expected.

Testing Response after change in RSD file

For testing, in the above example , we have removed the column "timestamp", and "timezone". Then click Test to check if the RSD API profile still gives the expected result.

Test Again after changes-07

Test Again after changes in RSD


Switching to Real API

Once you have your API Query configured and successfully tested against your static first page file, you can switch to using the real API.

Replace the "uri" value with the API endpoint

This process involves updating the "uri" attribute to the one you used in the original Postman command:

Before:

JSON file URI

Test Again after changes

After

Real Endpoint URI

Real Endpoint URI

After you change the URI in the RSD file, test the response. It will fetch the data from the real API. Matillion will attempt to contact the real API over the network. Data should appear at bottom right.

If no data appears and it takes a long time (e.g. exactly 60 or 120 seconds), or if you receive timeout errors, review your network connectivity using the Network Connectivity Test shared job as described earlier.

The displayed data might be different from that displayed previously, since it's coming from the real API this time. However, it should be at the same granularity.


Adding Parameters

Matillion Query Profile allows you to add parameters in the RSD file by editing it manually. You can also add parameters at the Endpoint Configuration window in the process of generating RSD file at step3.

Add Parameters in RSD file

This section will discuss about adding parameter and values by editing the RSD file manually as shown below.

Real Endpoint URI-After adding Param

Real Endpoint URI-After adding Param

Once you finish adding Parameters in the RSD file, you need to add parameter attributes value in the "connection " settings in the Configure Query Profile window using icon.

Configure Parameter attributes value

Configure Parameter attributes value

Add Parameters in the Wizard

Matillion Query profile wizard allows you to add parameters in the process of generating RSD file. You can do that at step3, Endpoint Configuration window by providing real API Endpoint URI.

Add Parameter in Wizard

Add Parameter in Wizard

Once, you finish adding parameter and values in the wizard, the generated RSD file as shown below:

RSD File - After Adding Parameter in Wizard

RSD File - After Adding Parameter in Wizard

For more information on configuring or adding paramters, you can refer Using Parameters in the API Profile at the Matillion documentation.

Add Authentication in RSD file

Many different authentication methods are available to authenticate RSD files. Often you can setup authentication in the API endpoint at the Endpoint Configuration window, as shown in the image below:

Adding Auth

Adding Auth

For detail description about the authentication you can refer Authentication with API Profile at the Matillion documentation.


Pagination

Most endpoints that returns a list of entities will need to have some sort of pagination. Without pagination, a simple search could return millions or even billions of hits causing extraneous network traffic. Paging requires an implied ordering. By default this may be the item's unique identifier, but can be other ordered fields such as a created date.

API paging can be enabled at Response Configuration window, step 4 of generating RSD file by clicking Paging button.

Paging Enable

Paging Enable

Please follow the description of the each paging strategy available below:

Full Path Pagination

This pagination feature will add the "pageURLPath" to the field you want to be next. You can select the field from the tree view on the left and Set Repeating Paging Element to populate the xpath.

Full Path Paging

Full Path Paging

If you set repeating paging element as "/chart/result", every time it will see "/chart/result" it will move to the next page. Example shown below:

Paging Example

Paging Example

Link Header Pagination

Insert a parameter into the document; the example here is "page size" set as the parameter used to indicate the page size. Enter the Page Size number.

Link Header

Link Header

Below is the field description for Link Header Pagination.

  • Page Size Parameter: This is the parameter used to indicate the page size.
  • Page Size: The number of the records return per page.

Below is the example of Link Header Pagination in the endpoint response.

Link Header Example

Link Header Example

Offset-based pagination

This is the simplest form of paging. Offset became popular with apps using SQL databases which already have LIMIT and OFFSET as part of the SQL SELECT Syntax. Offset pagination will add a record count in the doument by selecting the field from the tree view on the left and Set Record Count to populate the xpath on the total number of records.

Below is the field description for Offset-based Pagination.

  • Limit: This number of records to return per page.
  • Record Count: Select a field from the tree to count the total number of records.
Offset based pagination

Offset based pagination

Below is the example of offset based pagination in the endpoint response.

Offset based pagination

Offset based pagination

Cursor-based Pagination

This lets you insert a cursor into the document. You can select the cursor from the tree view to the left by right clicking on the field and then select Set Cursor Paging Element.

Below is the field description for cursor based pagination.

  • Cursor: Select a field from the tree to set as cursor.
  • Cursor parameter: The name of the cursor query parameter.
  • Page Size: The number of records returns per page.
  • Page Size parameter: The name of the page size query parameter.
Cursor Based Pagination

Cursor-based Pagination

Below is the example of cursor-based pagination in the endpoint response .

Cursor Based Example

Cursor-based Example

Page-based pagination

Page based pagination allows you to sort the API call response by page number, how many pages you want and the last page element, to be set by right clicking to the field from the tree view and selecting Set Last Page.

Below is the field description for page based pagination.

  • Page Size Parameter: The name of the page size query parameter.
  • Page Size: The number of records returns per page.
  • Page Number Parameter: The name of the page number parameter.
  • Last Page: Select how the last page will be handled and the field from the tree containing the last page.
  • Out of Range: If you select "Out of Range", there are further two options to select from the dropdown "Empty Page" and "Error Code"

Please Note

You can either specify the last page you want to call (last page), or just carry on until the records run out (out of range).

Page Based Pagination

Page-based Pagination

Below is the example of Page based pagination in the endpoint response.

Page Based Example

Page-based Example

Important Information

When ever a parameter needs to be specified, it is because the name of the element (for example, CURSOR) used in the RSD may be different to the query used in the response call URI. This applies to all paging options.


RSD File Atonomy

Each RSD "file" in an API Profile describes how to handle one of the endpoints in the API. The main features are:

Column Definitions to specify column behavior and use XPaths
 <api:info title="finance" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">

Column Name, type and size (editable)

 
<!-- You can modify the name, type, and column size here. -->
<attr name="meta.chartPreviousClose"   xs:type="double"  readonly="false"              other:xPath="/json/chart/result/meta/chartPreviousClose"    />
<attr name="meta.currency"  xs:type="string"  readonly="false"       other:xPath="/json/chart/result/meta/currency"    />
<attr name="meta.symbol"  xs:type="string"  readonly="false"             other:xPath="/json/chart/result/meta/symbol"  />
<attr name="meta.timezone"   xs:type="string"  readonly="false"              other:xPath="/json/chart/result/meta/timezone"   />
<attr name="meta.validRanges"   xs:type="string"  readonly="false"              other:xPath="/json/chart/result/meta/validRanges"   />
<attr name="timestamp"  xs:type="string"  readonly="false"             other:xPath="/json/chart/result/timestamp"  />
<input name="rows@next" desc="Identifier for the next page of results. Do not set this value manually." />
</api:info>

Column XPaths are relative to a RepeatElement that splits the JSON into rows

  <api:set attr="RepeatElement" value="/prefixes" />

URI details

 <api:set attr="uri" value="https://query1.finance.yahoo.com/v8/finance/chart/SNOW" />

Paging is enabled and defined

<api:set attr="pageurlpath" value="/next_page" />
<api:set attr="JSONPath" value="/chart/result" />

URI Parameters Setup

  <api:set attr="paramname#1" value="range" />
<api:set attr="paramvalue#1" value="[_connection.range]" />
<api:set attr="paramname#2" value="includePrePost" />
<api:set attr="paramvalue#2" value="[_connection.includePrePost]" />
<api:set attr="paramname#3" value="interval" />
<api:set attr="paramvalue#3" value="[_connection.interval]" />

These param values setup automatically as we have configured the parameters in the "Endpoint Configuration" window in the wizard. You're allowed to add these parameter and parameter values manually or modify them if required in the RSD file.

Describe the REST operation (an HTTP "GET" in this example)
<api:script method="GET">
<api:set attr="method" value="GET"/> 

Inform Matillion what kind of API it is dealing with

  <api:call op="jsonproviderGet">

In this example, a "jsonproviderGet" is used, but there is also an equivalent "xmlproviderGet".



Further Support

It is sometimes useful to be able to send API Query Profile data from one place to another without having to copy-and-paste.

Exporting

For example, if you wish to export all your work on FinanceAPIrun, run the following:

curl -k -X GET -u "myUsername:myPassword" -o finance.json
https://www.the-matillion-etl-server-for-mycompany.com/rest/v1/apiprofile/name/FinanceAPI/export

You should find a JSON file named FinanceAPI.json in your local filesystem. It can be shared with another Matillion user who will be able to import it as shown below.

Importing

To import a new API Query Profile into your Matillion instance, run the following:

curl -k -X POST -u "myUsername:myPassword" "https://www.the-matillion-etl-server-for-mycompany.com/rest/v1/apiprofile/import" -H "Content-Type: application/json" --data-binary @AnotherAPIProfile.json

A user with API privilege will get a username and password, and the file "AnotherAPIProfile.json" exists in your local filesystem. You'll receive a message like:

{ "name" : "API Profiles", "statusList" : [ { "success" : true, "name" : "AnotherAPIProfile" } ], "success" : true }

Next time you enter the Manage API Profiles screen, you should find a new entry for the newly imported API Profile.

Please Note

This method of importing a new API Profile will silently and completely overwrite any existing API Profile of that name.