Manage API Profiles

Manage API Profiles


Overview

The Matillion ETL API Query component provides a method of querying a JSON- or XML-based API, and loading the resulting data into a table. It is a powerful and flexible component, which can deal with APIs that contain nested data structures, and those which require authentication and paging. When working with tables, rows and columns, the main step in handling the hierarchical data from APIs is to relationalise it. This is done with a Matillion “API Profile” specification, which will need to be created.

Every API Profile contains one or more RSD “files”. Each file declares how to map an API response into rows and columns. In general, one RSD file will be needed for every different endpoint in the API. This document describes how to author an API Profile, and use it with a API Query component.

Warning

  • Due to a recent update to the REST API driver, the syntax of example files and RSD files on Matillion ETL may differ slightly from current API standards—for example, <rsb: is now <api:, .
  • Other updates include the need for users to setup input parameters as inputs rather than dummy columns, and only returning the date on Date data types as opposed to the datetime as previously.

Example: The AWS IP Ranges API

This is a useful first example, because it is quite simple, universally available, and requires no authentication or paging.

The API returns a list of the IP address ranges which AWS use for services, and is returned as a JSON document with a nested array. This can be accessed from a web browser by just navigating to the URL: https://ip-ranges.amazonaws.com/ip-ranges.json

AWS IP Ranges API in Postman

AWS IP Ranges API in Postman


Creating a Matillion API Profile

  1. Click Project Manage API Profiles to open the Manage API Profiles pop-up window. It should contain one API Profile for every API that has been configured. Click + to add a new API profile.

    Please Note

    Matillion ETL already has several "out of the box" API profiles, including one for accessing its own REST API.

    Manage API Profiles

    Manage API Profiles

  2. In the Add API Profile pop-up window, enter "AWS" into the Profile Name field and click OK.

    Add API Profile

    Add API Profile

  3. On returning to the Manage API Profiles pop-up window

    , find "AWS" in the API Profiles list, and click . This should open the Configure API Profile pop-up window. Click + to add a new "file".

    Configure API Profile

    Configure API Profile

  4. In the Add File pop-up window, provide details for the following fields:

    • Filename: enter "ip-ranges" (but do not give it any suffix at this stage)
    • File Type: select RSD
    • URI: enter "https://ip-ranges.amazonaws.com/ip-ranges.json" (this will pre-populate the RSD file with data)
    • Return Type: select JSON
    • Repeat Element: enter "/prefixes", then click OK

    Add File

    Add File

  5. Returning to the Configure API Profile window, the file "ip-ranges.rsd" will now be open in editable mode. Now, copy and paste the contents of the attached file "IP-Ranges.txt" over the content in the script window on the right. This will add additional attributes to the file.

    Paste RSD file content

    Paste RSD file content

  6. Next, click Test to check the RSD file is working correctly. This will invoke the API and add ip-ranges to the Tables panel. Clicking on the table name will then display its contents at the bottom of the Configure API Profile window.

    Test API Profile

    Test API Profile


RSD file anatomy

<rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2/">

<rsb:info title="AWS API" desc="Many rows, two columns, no authentication, no paging"> <attr name="ipv4_cidr" xs:type="string" other:xPath="ip_prefix" /> <attr name="aws_region" xs:type="string" other:xPath="region" /> </rsb:info>
<rsb:set attr="uri" value="https://ip-ranges.amazonaws.com/ip-ranges.json" /> <rsb:set attr="RepeatElement" value="/prefixes" />
<rsb:script method="GET" > <rsb:call op="jsonproviderGet"> <rsb:push/> </rsb:call> </rsb:script>
</rsb:script>

Each RSD "file" in an API Profile describes how to handle one of the endpoints in the API. The main features are:

  • Line 4 and 5: which columns will be accessed from every row
  • Line 8: name of the URI
  • Line 10: how rows are defined
  • Line 12: describe the REST operation (an HTTP "GET" in this example)
  • Line 13: inform Matillion what kind of API it is dealing with (in this example, a "jsonproviderGet" is used, but there is also an equivalent "xmlproviderGet")

Please Note

Once a working example is available, it is usually best to create a second RSD file in the same profile to experiment with. Choose a different name for this, copy and paste the content from the working file, and use Test to verify that any changes made are working as expected.


Using an API Profile in Matillion

API Query component

API Query component

Once an API Profile has been tested and confirmed to be working correctly and returning data, it is ready to use in the API Query component. This can be done as follows:

  1. Create a new Orchestration Job.
  2. Drag an API Query component onto the job canvas.
  3. Click on the component icon to open the Properties panel.
  4. Click ... to edit a property. First, select "AWS" as the Profile.
  5. Next, select "ip-ranges" as the Data Source.
  6. Then, in the Data Source pop-up window, add the columns to be used by moving them from the left panel to the right using .
  7. Lastly, provide a name for the Target Table and select a Staging Area.
Paste RSD file content

Paste RSD file content

When running this component, it should query the API and create a named table with the columns described in the RSD file. This completes the setup of a Matillion component—invoking an external API, translating the returned JSON into relational rows and columns, and loading the data into a new table. This table can then be used in Transformation Jobs to combine with data from other sources. This enables the creation of joined-up information from data sets that would otherwise have remained siloed.


Paging

API paging can be enabled by adding the following to the <rsb:script> section of the RSD file:

<rsb:set attr="EnablePaging" value="TRUE" />

Retrieving the page number from the response header

Retrieving page number

Retrieving page number

In some APIs, the next page indicator is returned through a header attribute. An attribute value can be extracted through the follow method:

  1. Set the pushresponseheader#1 attribute value to the header attribute to be extracted—in this example, that is X-Next-Page.

  2. Set the value of the rows@next input to the value from the header attribute. The header value can be extracted with the following: [_out1.header:X-Next-Page] (where X-Nest_Page is the header attribute from which the value is extracted)

<rsb:script method="GET" >
  <rsb:call op="jsonproviderGet">

<rsb:check attr="_out1.header:X-Next-Page"> <rsb:set attr="rows@next" value="[_out.header:X-Next-Page]" /> <rsb:else> <rsb:set attr="rows@next" value="" /> </rsb:else> </rsb:check>
<rsb:push/>
</rsb:call>
</rsb:script>



Incremental paging without an indicator attribute

Once in a while, the API may facilitate an incremental paging structure, but does not provide a page indicator through any of the attributes in the response. If this is the case, then the following solution can be implemented:

<rsb:set attr="EnablePaging" value="TRUE" />

<rsb:check attr="_input.rows@next"> <rsb:set attr="uri" value="https://uri.com/users?page=[_input.rows@next]" /> <rsb:else> <rsb:set attr="uri" value="https://uri.com/api/users?page=1" /> </rsb:else> </rsb:check>
<rsb:set attr="RepeatElement" value="/data" />
<rsb:script method="GET"> <rsb:call op="jsonproviderGet"> <rsb:set attr="rows@next" value="[_input.rows@next | add(1)]" /> <rsb:push/> </rsb:call>
</rsb:script>

Please Note

The add(1) added to the rows@next input will increment the rows@next input on each request.