Manage API Profiles

Manage API Profiles


Overview

Matillion’s API Query component provides a way to query a JSON- or XML-based API, and load the resulting data into a table. It’s a powerful and flexible component, which can deal with APIs that contain nested data structures, and those which require authentication and paging. When working with tables, rows and columns, the main step in handling the hierarchical data from APIs is to relationalise it. This is done with a Matillion “API Profile” specification, which will need to be created.

Every API Profile contains one or more RSD “files”. Each file declares how to map an API response into rows and columns. In general, one RSD file will be needed for every different endpoint in the API. This document describes how to author an API Profile, and use it with a API Query component.

Warning

  • Due to a recent update to the REST API driver, the syntax of example files and RSD files on Matillion ETL may differ slightly from current API standards—for example, <api: is now <api:, .
  • Other updates include the need for users to setup input parameters as inputs rather than dummy columns, and only returning the date on Date data types as opposed to the datetime as previously.

The AWS IP Ranges API

This is a useful first example, because it’s quite simple, is universally available, and requires no authentication or paging.

The API returns a list of the IP address ranges which AWS use for services, and is returned as a JSON document with a nested array. This can be accessed from a web browser by just navigating to the URL: https://ip-ranges.amazonaws.com/ip-ranges.json

AWS IP Ranges API

AWS IP Ranges API


Creating a Matillion API Profile

  1. Click Project ▼Manage API Profiles to open the Manage API Profiles pop-up window. It should contain one API Profile for every API that has been configured.

    Please Note

    Matillion ETL does ship with some example API profiles, including one for accessing its own REST API.

  2. Click to add a new API profile, and name it "AWS".

    Manage API Profiles

    Manage API Profiles

  3. Find "AWS" in the list, and click .

  4. This should open the Configure API Profile pop-up window. Click to add a new "file", and name it "ip-ranges"—don't give it any suffix at this stage.

    Configure API Profile

    Configure API Profile

  5. Returning to the Configure API Profile window, with the file "ip-ranges.rsd" open in editable mode. Paste the contents of the attached file "ip-ranges.rsd.txt" into the script window on the right.

    Paste RSD file content

    Paste RSD file content

  6. After clicking OK, the AWS API Profile may be edited again by clicking . Click Test, and click on the AWS IP Ranges "table" at bottom left. Matillion should invoke the API and hopefully return some rows and columns to the screen.


RSD file anatomy

>Each RSD "file" in an API Profile describes how to handle one of the endpoints in the API. The main features are:

  • Line 8: name the URI
  • Line 10: decide how “rows” are defined
  • Line 4 and 5: access the columns from every row
  • Line 13: inform Matillion what kind of API it is dealing with (In this case, it is a "jsonproviderGet", but there is also an equivalent "xmlproviderGet")
  • Line 12: describe the REST operation (which is an HTTP "GET" in this case)

Once a working example is available, it is usually best to create a second RSD file in the same profile to experiment with. Choose a different name for this, copy and paste the content from the working file, and use Test to verify that any changes made are working as expected.


Using an API Profile in Matillion

  • Once some data has been successfully returned in the API Profile test window, it is ready to use in the API Query component.
  • Create a new Orchestration Job
  • Drag on an API Query component
  • Set it to Basic mode
  • The "Profile" is "AWS"
  • The "Data Source" is "ip-ranges" (which should appear in the dropdown list)
  • Choose an S3 Staging Area and a Target Table name

When running this component, it should query the API and create a named table with the columns described in the RSD script. This completes the setup of a Matillion component which invokes an external API, translates the returned JSON into relational rows and columns, and loaded the data into a new table. This table can now be used in Transformation Jobs to combine with data from other sources. This enables the creation of joined-up information from data sets that would otherwise have remained siloed.

Please Note

Please find attached an example job which uses the newly created API Profile. Import the file "aws_ip_ranges_api.json" into the Matillion instance using the Project / Import menu.

API Query component  

API Query component


Paging

Enable API paging by adding the following to the <api:script> section of the RSD:

<api:set attr="EnablePaging" value="TRUE" />

Retrieving the page number from the response header

In some APIs, the next page indicator is returned through a header attribute. An attribute value can be extracted through the follow method:

Retrieving page number

Retrieving page number

  1. Set the pushresponseheader#1 attribute value to the header attribute to be extracted. In the example above it is the “X-Next-Page” attribute.

    Push Responder Header

    Push Responder Header

  2. Make sure to set the value of the rows@next input to the value from the header attribute. The header value can be extracted by doing the following:

    [_out1.header:X-Next-Page]
  3. X-Next-Page is the header attribute that we’re extracting the value from.


Incremental paging without an indicator attribute

Once in a while, the API may facilitate an incremental paging structure, but does not provide a page indicator through any of the attributes in the response. If this is the case, then the following solution can be implemented:

<api:set attr="EnablePaging" value="TRUE" />

<api:check attr="_input.rows@next">
<api:set attr="uri" value="https://uri.com/users?page=[_input.rows@next]" />
<api:else>
<api:set attr="uri" value="https://uri.com/api/users?page=1" />
</api:else>
</api:check>

<api:set attr="RepeatElement" value="/data" />

<api:script method="GET">
<api:call op="jsonproviderGet">
<api:set attr="rows@next" value="[_input.rows@next | add(1)]" />
<api:push/>
</api:call>
</api:script>

The add(1) to the rows@next input will increment the rows@next input on each request.