Manage API Profiles
Matillion’s API Query component provides a way to query a JSON- or XML-based API, and load the resulting data into a table. It’s a powerful and flexible component, which can deal with APIs that contain nested data structures, and those which require authentication and paging. When working with tables, rows and columns, the main step in handling the hierarchical data from APIs is to relationalise it. This is done with a Matillion “API Profile” specification, which will need to be created.
Every API Profile contains one or more RSD “files”. Each file declares how to map an API response into rows and columns. In general, one RSD file will be needed for every different endpoint in the API. This document describes how to author an API Profile, and use it with a API Query component.
- Due to a recent update to the REST API driver, the syntax of example files and RSD files on Matillion ETL may differ slightly from current API standards—for example,
- Other updates include the need for users to setup input parameters as inputs rather than dummy columns, and only returning the date on
Datedata types as opposed to the datetime as previously.
The AWS IP Ranges API
This is a useful first example, because it’s quite simple, is universally available, and requires no authentication or paging.
The API returns a list of the IP address ranges which AWS use for services, and is returned as a JSON document with a nested array. This can be accessed from a web browser by just navigating to the URL: https://ip-ranges.amazonaws.com/ip-ranges.json
Creating a Matillion API Profile
Click Project ▼ → Manage API Profiles to open the Manage API Profiles pop-up window. It should contain one API Profile for every API that has been configured.
Matillion ETL does ship with some example API profiles, including one for accessing its own REST API.
Click ✚ to add a new API profile, and name it "AWS".
Find "AWS" in the list, and click ⚙.
This should open the Configure API Profile pop-up window. Click ✚ to add a new "file", and name it "ip-ranges"—don't give it any suffix at this stage.
Returning to the Configure API Profile window, with the file "ip-ranges.rsd" open in editable mode. Paste the contents of the attached file "ip-ranges.rsd.txt" into the script window on the right.
After clicking OK, the AWS API Profile may be edited again by clicking ⚙. Click Test, and click on the AWS IP Ranges "table" at bottom left. Matillion should invoke the API and hopefully return some rows and columns to the screen.
RSD file anatomy
>Each RSD "file" in an API Profile describes how to handle one of the endpoints in the API. The main features are:
- Line 8: name the URI
- Line 10: decide how “rows” are defined
- Line 4 and 5: access the columns from every row
- Line 13: inform Matillion what kind of API it is dealing with (In this case, it is a "jsonproviderGet", but there is also an equivalent "xmlproviderGet")
- Line 12: describe the REST operation (which is an HTTP "GET" in this case)
Once a working example is available, it is usually best to create a second RSD file in the same profile to experiment with. Choose a different name for this, copy and paste the content from the working file, and use Test to verify that any changes made are working as expected.
Using an API Profile in Matillion
- Once some data has been successfully returned in the API Profile test window, it is ready to use in the API Query component.
- Create a new Orchestration Job
- Drag on an API Query component
- Set it to Basic mode
- The "Profile" is "AWS"
- The "Data Source" is "ip-ranges" (which should appear in the dropdown list)
- Choose an S3 Staging Area and a Target Table name
When running this component, it should query the API and create a named table with the columns described in the RSD script. This completes the setup of a Matillion component which invokes an external API, translates the returned JSON into relational rows and columns, and loaded the data into a new table. This table can now be used in Transformation Jobs to combine with data from other sources. This enables the creation of joined-up information from data sets that would otherwise have remained siloed.
Please find attached an example job which uses the newly created API Profile. Import the file "aws_ip_ranges_api.json" into the Matillion instance using the Project / Import menu.
Enable API paging by adding the following to the <api:script> section of the RSD:
<api:set attr="EnablePaging" value="TRUE" />
Retrieving the page number from the response header
In some APIs, the next page indicator is returned through a header attribute. An attribute value can be extracted through the follow method:
Set the pushresponseheader#1 attribute value to the header attribute to be extracted. In the example above it is the “X-Next-Page” attribute.
Make sure to set the value of the
rows@nextinput to the value from the header attribute. The header value can be extracted by doing the following:
X-Next-Page is the header attribute that we’re extracting the value from.
Incremental paging without an indicator attribute
Once in a while, the API may facilitate an incremental paging structure, but does not provide a page indicator through any of the attributes in the response. If this is the case, then the following solution can be implemented:
<api:set attr="EnablePaging" value="TRUE" />
<api:set attr="uri" value="https://uri.com/users?page=[_input.rows@next]" />
<api:set attr="uri" value="https://uri.com/api/users?page=1" />
<api:set attr="RepeatElement" value="/data" />
<api:set attr="rows@next" value="[_input.rows@next | add(1)]" />
add(1) to the
rows@next input will increment the
rows@next input on each request.