Manage API Profiles
The Matillion ETL API Query component provides a method of querying a JSON- or XML-based API, and loading the resulting data into a table. It is a powerful and flexible component, which can deal with APIs that contain nested data structures, and those which require authentication and paging. When working with tables, rows and columns, the main step in handling the hierarchical data from APIs is to relationalise it. This is done with a Matillion “API Profile” specification, which will need to be created.
Every API Profile contains one or more RSD “files”. Each file declares how to map an API response into rows and columns. In general, one RSD file will be needed for every different endpoint in the API. This document describes how to author an API Profile, and use it with a API Query component.
- Due to a recent update to the REST API driver, the syntax of example files and RSD files on Matillion ETL may differ slightly from current API standards—for example,
- Other updates include the need for users to setup input parameters as inputs rather than dummy columns, and only returning the date on
Datedata types as opposed to the datetime as previously.
Example: The AWS IP Ranges API
This is a useful first example, because it is quite simple, universally available, and requires no authentication or paging.
The API returns a list of the IP address ranges which AWS use for services, and is returned as a JSON document with a nested array. This can be accessed from a web browser by just navigating to the URL: https://ip-ranges.amazonaws.com/ip-ranges.json
Creating a Matillion API Profile
Click Project → Manage API Profiles to open the Manage API Profiles pop-up window. It should contain one API Profile for every API that has been configured. Click + to add a new API profile.
Matillion ETL already has several "out of the box" API profiles, including one for accessing its own REST API.
In the Add API Profile pop-up window, enter "AWS" into the Profile Name field and click OK.
On returning to the Manage API Profiles pop-up window
, find "AWS" in the API Profiles list, and click . This should open the Configure API Profile pop-up window. Click + to add a new "file".
In the Add File pop-up window, provide details for the following fields:
- Filename: enter "ip-ranges" (but do not give it any suffix at this stage)
- File Type: select RSD
- URI: enter "https://ip-ranges.amazonaws.com/ip-ranges.json" (this will pre-populate the RSD file with data)
- Return Type: select JSON
- Repeat Element: enter "/prefixes", then click OK
Returning to the Configure API Profile window, the file "ip-ranges.rsd" will now be open in editable mode. Now, copy and paste the contents of the attached file "IP-Ranges.txt" over the content in the script window on the right. This will add additional attributes to the file.
Next, click Test to check the RSD file is working correctly. This will invoke the API and add ip-ranges to the Tables panel. Clicking on the table name will then display its contents at the bottom of the Configure API Profile window.
RSD file anatomy
<rsb:info title="AWS API" desc="Many rows, two columns, no authentication, no paging"> <attr name="ipv4_cidr" xs:type="string" other:xPath="ip_prefix" /> <attr name="aws_region" xs:type="string" other:xPath="region" /> </rsb:info>
<rsb:set attr="uri" value="https://ip-ranges.amazonaws.com/ip-ranges.json" /> <rsb:set attr="RepeatElement" value="/prefixes" />
<rsb:script method="GET" > <rsb:call op="jsonproviderGet"> <rsb:push/> </rsb:call> </rsb:script>
Each RSD "file" in an API Profile describes how to handle one of the endpoints in the API. The main features are:
- Line 4 and 5: which columns will be accessed from every row
- Line 8: name of the URI
- Line 10: how rows are defined
- Line 12: describe the REST operation (an HTTP "GET" in this example)
- Line 13: inform Matillion what kind of API it is dealing with (in this example, a "jsonproviderGet" is used, but there is also an equivalent "xmlproviderGet")
Once a working example is available, it is usually best to create a second RSD file in the same profile to experiment with. Choose a different name for this, copy and paste the content from the working file, and use Test to verify that any changes made are working as expected.
Using an API Profile in Matillion
- Create a new Orchestration Job.
- Drag an API Query component onto the job canvas.
- Click on the component icon to open the Properties panel.
- Click ... to edit a property. First, select "AWS" as the Profile.
- Next, select "ip-ranges" as the Data Source.
- Then, in the Data Source pop-up window, add the columns to be used by moving them from the left panel to the right using .
- Lastly, provide a name for the Target Table and select a Staging Area.
When running this component, it should query the API and create a named table with the columns described in the RSD file. This completes the setup of a Matillion component—invoking an external API, translating the returned JSON into relational rows and columns, and loading the data into a new table. This table can then be used in Transformation Jobs to combine with data from other sources. This enables the creation of joined-up information from data sets that would otherwise have remained siloed.
API paging can be enabled by adding the following to the
<rsb:script> section of the RSD file:
<rsb:set attr="EnablePaging" value="TRUE" />
Retrieving the page number from the response header
In some APIs, the next page indicator is returned through a header attribute. An attribute value can be extracted through the follow method:
pushresponseheader#1attribute value to the header attribute to be extracted—in this example, that is
Set the value of the
rows@nextinput to the value from the header attribute. The header value can be extracted with the following:
X-Nest_Pageis the header attribute from which the value is extracted)
<rsb:script method="GET" > <rsb:call op="jsonproviderGet">
<rsb:check attr="_out1.header:X-Next-Page"> <rsb:set attr="rows@next" value="[_out.header:X-Next-Page]" /> <rsb:else> <rsb:set attr="rows@next" value="" /> </rsb:else> </rsb:check>
Incremental paging without an indicator attribute
Once in a while, the API may facilitate an incremental paging structure, but does not provide a page indicator through any of the attributes in the response. If this is the case, then the following solution can be implemented:
<rsb:set attr="EnablePaging" value="TRUE" />
<rsb:check attr="_input.rows@next"> <rsb:set attr="uri" value="https://uri.com/users?page=[_input.rows@next]" /> <rsb:else> <rsb:set attr="uri" value="https://uri.com/api/users?page=1" /> </rsb:else> </rsb:check>
<rsb:set attr="RepeatElement" value="/data" />
<rsb:script method="GET"> <rsb:call op="jsonproviderGet"> <rsb:set attr="rows@next" value="[_input.rows@next | add(1)]" /> <rsb:push/> </rsb:call>
add(1) added to the
rows@next input will increment the
rows@next input on each request.