Connecting To Any REST API

Connecting To Any REST API


Overview

Matillion can extract and load data from most JSON and XML-based REST APIs, and also from some SOAP APIs (those which behave RESTfully).

Some good positive indicators are:

  • You can get data from the API using a cURL or Postman command.

  • The returned data contains an array (or repeating element) at the granularity you require.

  • The documents are regular - i.e. the structure does not differ record by record.

  • If the API uses paging, it provides a mechanism that the client can use to navigate from page to page (for example providing a URL which points to the next page).

Important Information

    Matillion’s API Query component can not load data from:

    • Data other than valid JSON or XML.

    • Binary or unstructured data, such as PDF or Word documents, or images.

    • HTML

    • APIs where the data contains embedded CSVs.

    • APIs which return XML embedded inside JSON, or vice versa.

    • APIs in which the returned content is encoded or encrypted.

Matillion’s API integration is known as an "API Profile" and is managed through the Project Manage API Profiles menu.

Manage API Profile

Manage API Profile

Matillion can contain multiple API Profiles, and each API Profile can contain one or more "files". A "file" is either:

  • a sample data file (JSON or XML), or

  • an “RSD” file (identified by a .rsd suffix).

RSD files contain all the logic needed for Matillion to interact with an API. They are expressed in a declarative XML language.


Integrating with a new API is a four-step process:

  1. Find and understand the API documentation

  2. Load the first page

  3. Establish Paging

  4. Productionize

Important Information-2020 Updates

  • Matillion’s RSD syntax has changed significantly.

  • This document refers to the 1.41 (and later) versions of the Matillion.

  • You can force Matillion to use the latest version by adding the parameter BackwardsCompatibilityMode set to false.

  • Change Parameter

Find and understand the API documentation

This section will guide you on how the API works, the data formats and the endpoint URLs.

Your Matillion instance must be able to access those API endpoint(s) over the network. You can test the connectivity using the "Network Connectivity Test" shared job which you can download from the Data Stagers Support page. If Matillion does not have network connectivity to the API endpoint(s) you must get this resolved before continuing.

The API documentation should help you answer the following questions:

  • How do you authenticate?

  • How do you supply parameters? (is it in the URL or as a POST body).

  • Does the API provide data at the necessary granularity (for example as a JSON array)?

  • Does the API use paging, and if so how does it provide the information needed to page through the result set?


Load the first page

Having checked the API documentation, the next goal is to get Matillion to read the first page of data. This is achieved in several steps:

  • Use cURL (or Postman) to get a response back from the API endpoint you are trying to interface with, and save that data as a JSON or XML file.

  • Use the file to help build the first RSD of your API Profile.

To save the data returned by a cURL command into a file, use the "-o" option. Consult the API documentation to find the full cURL command required.

For example:

curl -o page1-raw.json http://jsonplaceholder.typicode.com/users

It’s a good idea to run the resulting file through a prettifier. This verifies that it’s valid, and also makes it much easier to understand.

For JSON, you could use "jq":

jq "." < page1-raw.json > page1.json

For XML, you could use "xmlstarlet"

xmlstarlet fo < page1-raw.xml > page1.xml

Verify that the file contains an array or repeating element at the desired granularity. Take a note of the logical path to this repeating group.

Please Note

If you are able to access the API from your workstation, it does not mean that your Matillion server will also be able to access the API. Check with your network administrator to ensure that you have given your Matillion instance appropriate outbound network access.

New API Profile

Once you have your first page of data saved into a local file you can create your new API Profile from the Matillion user interface.

Create a New API profile:

  1. In Matillion ETL, on the top left corner of the screen, click ProjectManage API Profile. Then, click + to add API profile.

  2. Manage and Add API Profile

    Manage and Add API Profile

  3. On the Add API Profile dialog window, enter the profile name and then, click OK.

  4. Add API Profile

    Add API Profile

    Please Note

    To avoid a variety of annoying problems later on, it is strongly recommended that you don’t use any spaces in the API Profile name. Use "CamelCase" instead.

  5. Now, click to edit the API profile you created,

  6. Edit API Profile

    Edit API Profile

    Add a New File:
  7. Next, on Configure API Profile window, click +

  8. Configure API Profile

    Configure API Profile

  9. On the Add File window, please fill in the required fields.

    Please Note

    • The fields will change as you select the File Type. For example: rsd,json,rsb or xml.

    • The image below is showing an example for the file type "JSON".

    • Since, we have choosen to upload a JSON file here, some of the fields disappear and are not relevant for discussion.

    Below is the description of all the available fields:
    • Filename – name the file without a .json or .xml suffix. Example here:page1

    • File Type – select the file type (rsd, json, rsb, xml) from the dropdown, accordingly fill the appropriate fields.

    • JSON File – browse the json file. You can copy-and-paste but it’s highly recommended to use the file upload button instead, to make sure you upload the full contents of the "page1.json" (or xml) file.

    • URI – enter the URI of the API call you want to make.

    • Return Type – select the return type from the dropdown.An API will typically return a JSON or XML document.

    • Repeat Element– the repeat element allows you to specify that object so that you can define your xpath from there, rather than the start of the document each time.If you want your xpath to start from the beginning of your object, just enter / into the Repeat Element field.

    • Column Detail – define the columns for the table you want to build from this JSON data. The minimum required to do this is a name for the column, the type of data it will hold, and an xpath, which points to where the data is in the JSON or XML document returned by the API.

    Add File

    Add File

  10. Lastly, details you have entered will be generated into a script, which you can then edit manually. It will upload the file onto your Matillion server. If it’s large you might need to wait a few seconds before receiving the "API Profile File was uploaded successfully" message. Then, click OK and leave the API Profile editor temporarily.

  11. API Profile File Uploaded Successfully

    API Profile File Uploaded Successfully

  12. Now the file exists on your Matillion Instance you’ll need to find its full path. You can do this by creating a new Orchestration Job and the, click and drag the Bash Script component onto the Job canvas from the component panel.

  13. Orchestration Job-Bash Script component

    Orchestration Job-Bash Script component

  14. Click the component icon on the job canvas to open the Properties window at the bottom of the screen. Next, click Script property and Update Component by the commmand below. Then, click OK

  15. find / -name "page1.json" -print 2>/dev/null
    Update Component

    Update component

  16. Next, check the Task Message and copy the full path from there as shown below.

  17. /usr/share/tomcat8/api_profiles/TheExampleAPI/page1.json
    Task Manager

    Task Manager

  18. Now you can use the file created to generate the RSD file. For that, go back to Matillion API Profile editor and then, click Generate.

  19. Generate RSD

    Generate RSD

  20. In the pop-up dialogue window, provide the details as explained below:

    • Table Name – use a valid database table name such as "stg_users".
    • Description – a short description about the file.
    • XPath to Repeat Element – in the form /path/subpath.
    • Data format – set to JSON or XML.
    • URI – set to the full path of the newly created file as retrieved earlier. Space characters are not permitted in this dialog
    • Properties – ensure that BackwardsCompatibilityMode is present and false

    Please Note

    There is an intermittent bug in some web browsers which causes Matillion to reject a valid XPath if you just set it to a single forward-slash. Switch to a different browser if this happens

    Generate RSD details

    Generate RSD details

    Once you have done with all settings, click OK

  21. Matillion will create a new RSD file, named after your chosen Table Name with a .rsd suffix. Click Test, and your Table Name should appear under Tables. Click on the table name, and Matillion will run the newly created RSD logic over the static file.

  22. Test RSD File

    Test RSD File

Important Information

  • View the sample data at the bottom of the dialog and make sure it is retrieving the data at the expected granularity.

  • You may wish to adjust the fields returned, which you can do by editing the <attr> elements near the top of the RSD (lines 6 to 20 in the example above).

  • You should also remove the three api:script sections associated with INSERTS, UPDATES and DELETES, as these cannot be used by Matillion.

API Script

API Script

Once you have your API Profile configured and successfully tested against your static first page file, you can switch to using the real API.

  • Replace the "uri" value with the actual API endpoint.

  • Add any authentication parameters that are necessary.

The above two steps will be covered in the next section.

Switching to the real API

URLs and authentication parameters will need to change in a real Production environment, but in order to get the connectivity working, you will need to initially use hardcoded values. Once the functionality has been verified, the hardcoded values will be removed. In detail will be dicussed in the later section Productionize.

Replace the "uri" value with the actual API endpoint

This process involves updating the "uri" attribute to the one you used in the original cURL or Postman command:

Before:

API Profile URI - Before

API Profile URI-Before

After:

API Profile URI - After

API Profile URI-After

Add the necessary authentication parameters

Many different authentication methods are available to authenticate RSD files. For detail description about the Authentication you can refer Authentication with API Profile.

Often the requirement is to add a custom HTTP header, which you can do by adding two lines near the top of the RSD. It’s usually good to put these just before the place where the "uri" attribute is set.

Once you have saved the changes to the RSD file, press the Test button again. A new record for "stg_users" should appear again at bottom left. Left-click the "stg_users" listing and Matillion will attempt to contact the real API over the network. Data should appear at bottom right.

If no data appears and it takes a long time (e.g. exactly 60 or 120 seconds), or if you receive timeout errors, review your network connectivity using the Network Connectivity Test shared job as described earlier.

The displayed data might be different from that displayed previously, since it’s coming from the real API this time. Nevertheless it should be at the same granularity.

You can use the API Profile in an API Query component now, and it should successfully return data. However paging has not yet been enabled, so it will only return records from the first page.

Paging will be discussed in the next section.


Establish Paging

Reaching this step, you should already have a working API Profile which will retrieve one page of data.

Most APIs don’t return all the data in a single response. This would be very inefficient and error-prone. Instead, most APIs use paging, which can help return a large number of records in smaller batches.

Several additional RSD features need to be added to enable paging.

There are many variations, but all require that you add a new input named rows@next to the list of columns inside the existing "api:info" block.

<input name="rows@next" desc="Identifier used for paging"/>

Now you’ll need to know how the API implements paging. In the simple example used here, the output document includes an attribute named nextURL which provides the full URL of the next page of data.

You must also add the following to the "api:script" section. A good position is just add after the authentication settings.

<api:set attr="EnablePaging" value="TRUE" />

<api:set attr="pageurlpath" value="/nextURL" />

Add to api:script

Add to api:script

For this simplest of paging mechanisms, no other changes are required. With the rows@next mechanism in place, the driver will automatically loop until either.

  • No records were returned.

  • rows@next is missing.

  • rows@next did not change since the last iteration.

Paging can be tricky to get right because RSD is a declarative language rather than an explicit loop, and so can be difficult to debug. It’s usually impossible to tell just from the Test screen whether or not paging has been invoked, because the Test screen only shows the first few records.

The best way to be certain that paging is working is to use the API Profile in an API Query component, and check that it successfully returns all the data that matches the query.



Productionize

By this step of the process you should have a fully working RSD, but most likely containing some hardcoded authentication parameters.

The RSD is fully functional as-is, but it’s bad practice to leave hardcoded authentication parameters in place, for two reasons:

  • It’s insecure: anyone logged into Matillion can read the credentials in plain text, so they need to be properly protected .

  • Authentication parameters usually expire at some point, so the existing API Profile has a limited shelf life.

To remedy these things, the RSD needs to be parameterized, and the authentication parameters need to be obtained at runtime and passed into the RSD.

Useful Links

  • Please refer this support article for how to use parameters with API profiles.

  • Please refer this support article for how to obtain an API token at runtime and pass it into an RSD script.

  • Please refer this support article for how to implement password management functionality, to protect your private API credentials.



Further Support

It is sometimes useful to be able to send API Profile data from place to place without having to copy-and-paste.

Exporting

For example, if you wish to export all your work on TheExampleAPI you could do as follows:

curl -k -X GET -u "myUsername:myPassword" -o TheExampleAPI.json
https://www.the-matillion-etl-server-for-mycompany.com/rest/v1/apiprofile/name/TheExampleAPI/export

You should then find a JSON file named TheExampleAPI.json in your local filesystem. It can be shared with another Matillion user who will be able to import it as shown below.

Importing

Similarly, if you wish to import a new API Profile into your Matillion instance you could do as follows:

curl -k -X POST -u "myUsername:myPassword" "https://www.the-matillion-etl-server-for-mycompany.com/rest/v1/apiprofile/import" -H "Content-Type: application/json" --data-binary @AnotherAPIProfile.json

Provided the address is correct, the username and password are for a user with API privilege, and the file "AnotherAPIProfile.json" exists in your local filesystem, you should receive a message like this:

{
"name" : "API Profiles",
"statusList" : [ {
"success" : true,
"name" : "AnotherAPIProfile"
} ],
"success" : true
}

Next time you enter the Manage API Profiles screen, you should find a new entry for the newly-imported API Profile.

Please Note

This method of importing a new API Profile will silently and completely overwrite any existing API Profile of that name.