API Extract

This article is specific to the following platforms: Redshift, Snowflake, and BigQuery.


The API Extract component lets users create their own custom Matillion ETL connector, extracting and loading data from an API of their choice so that the data can then be transformed.
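
As an illustration of what the component does conceptually, the sketch below pages through a hypothetical REST endpoint and stages each page of JSON for later loading. The endpoint URL, the paging parameter, and the response shape are all assumptions here; in practice, a real profile defines these in the endpoint wizard.

```python
# A minimal sketch of an API extract: page through a REST endpoint and stage
# each page of JSON for later flattening and loading. The endpoint URL, the
# "page" query parameter, and the response shape are all hypothetical.
import json
import requests

BASE_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
PAGE_LIMIT = 5  # mirrors the component's Page Limit property

def extract_pages(page_limit: int) -> None:
    for page in range(1, page_limit + 1):
        resp = requests.get(BASE_URL, params={"page": page}, timeout=30)
        resp.raise_for_status()
        records = resp.json()
        if not records:  # stop early once the API runs out of data
            break
        # Stage each page as a file, much as the component stages pages
        # to cloud storage before the warehouse load.
        with open(f"staged_page_{page}.json", "w") as f:
            json.dump(records, f)

extract_pages(PAGE_LIMIT)
```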


Important Information

Using the API Extract component requires at least one configured Extract Profile. Read our Manage Extract Profiles guide for more information on setting up an Extract Profile, including adding a new endpoint. When adding a new endpoint within the wizard, users can opt to authenticate with either a username and password or an API token; enabling authentication makes the corresponding component properties available (Username and Password, or API Token).
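
The two authentication styles map onto standard HTTP patterns. The sketch below, against a hypothetical endpoint, shows username/password as HTTP Basic authentication and an API token sent as a bearer header; note that the exact header or parameter a token goes in varies between APIs.

```python
# Sketch of the two authentication options, against a hypothetical endpoint.
import requests

URL = "https://api.example.com/v1/orders"  # hypothetical

# Username and Password -> HTTP Basic authentication.
resp = requests.get(URL, auth=("my_user", "my_password"), timeout=30)

# API Token -> commonly sent as a bearer header, though some APIs expect
# a custom header or a query parameter instead.
resp = requests.get(URL, headers={"Authorization": "Bearer MY_API_TOKEN"}, timeout=30)
```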

Using this component on Matillion ETL for Redshift may return structured data that requires flattening. For help flattening such data, please read our Nested Data Load Component documentation.

Using this component on Matillion ETL for Snowflake may return structured data that requires flattening. For help flattening such data, please read our Extract Nested Data Component documentation.

Using this component on Matillion ETL for BigQuery may return structured data that requires flattening. For help flattening such data, please read our Extract Nested Data Component documentation.
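
Whichever platform you use, the flattening step follows the same principle: each element of a nested array becomes its own output row, with the parent fields repeated alongside it. The linked components perform this inside the warehouse; the pandas sketch below, using a made-up record, is purely illustrative.

```python
# Illustrative only: flatten a nested record so each element of the "items"
# array becomes its own row, carrying the parent fields alongside it.
import pandas as pd

nested = [  # hypothetical API response
    {"id": 1, "customer": {"name": "Ada"}, "items": [{"sku": "A1"}, {"sku": "B2"}]},
]

flat = pd.json_normalize(nested, record_path="items", meta=["id", ["customer", "name"]])
print(flat)
#   sku  id customer.name
# 0  A1   1           Ada
# 1  B2   1           Ada
```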



Redshift Properties

| Property | Setting | Description |
|---|---|---|
| Name | String | Input a descriptive name for the component. |
| API | Select | Select the API extract profile. To manage extract profiles, click Project → Manage Extract Profiles. Read our Manage Extract Profiles guide for more information. |
| Data Source | Select | Select the data source. This property is only displayed when more than one data source (endpoint) is configured for the selected API profile. If only one endpoint is configured, the property is set automatically and hidden. |
| Username | String | The username used to authenticate with the endpoint. |
| Password | String | The password used to authenticate with the endpoint. |
| API Token | String | The API token used to authenticate with the endpoint. |
| URI Params | Parameter Name | Any parameters configured for this endpoint in the wizard are displayed here. URI parameters cannot be set as constants, and constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Query Params | Parameter Name | Specify any query parameters. Any parameters configured for this endpoint in the wizard are displayed here. Constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Header Params | Parameter Name | Specify any header parameters. Any parameters configured for this endpoint in the wizard are displayed here. Constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Page Limit | Integer | Specify the maximum number of pages to stage. |
| Location | File Structure \| String | Specify the S3 bucket. Click through the tree structure to locate the preferred S3 bucket, or enter its URL in the URL field, following the template `s3://<bucket>/<path>`. |
| External Schema | Select | Select the table's external schema. To learn more about external schemas, consult the Configuring The Matillion ETL Client section of the Getting Started With Amazon Redshift Spectrum documentation. For more information on using multiple schemas, see Schema Support. |
| Target Table | String | Specify the external table to be used. Warning: this table will be recreated, and any existing table of the same name will be dropped. |

Snowflake Properties

| Property | Setting | Description |
|---|---|---|
| Name | String | Input a descriptive name for the component. |
| API | Select | Select the API extract profile. To manage extract profiles, click Project → Manage Extract Profiles. Read our Manage Extract Profiles guide for more information. |
| Data Source | Select | Select the data source. |
| Username | String | The username used to authenticate with the endpoint. |
| Password | String | The password used to authenticate with the endpoint. |
| API Token | String | The API token used to authenticate with the endpoint. |
| URI Params | Parameter Name | Any parameters configured for this endpoint in the wizard are displayed here. URI parameters cannot be set as constants, and constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Query Params | Parameter Name | Specify any query parameters. Any parameters configured for this endpoint in the wizard are displayed here. Constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Header Params | Parameter Name | Specify any header parameters. Any parameters configured for this endpoint in the wizard are displayed here. Constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Page Limit | Integer | Specify the maximum number of pages to stage. |
| Location | Storage Location | Provide an S3 bucket path (AWS only), GCS bucket path (GCP only), or Azure Blob Storage path (Azure only) that will be used to store the data. Once in the bucket or blob store, the data can be referenced by an external table (see the sketch below this table). A folder will be created at this location with the same name as the Target Table. |
| Integration | Select | (GCP only) Choose your Google Cloud Storage integration. Integrations are required to permit Snowflake to read data from, and write to, a Google Cloud Storage bucket, and must be set up in advance of being selected in Matillion ETL. To learn more about setting up a storage integration, read our Storage Integration Setup Guide. |
| Warehouse | Select | Choose a Snowflake warehouse that will run the load. |
| Database | Select | Select a database. A database is a logical grouping of schemas. Each database belongs to a single Snowflake account. |
| Schema | Select | Select the schema. A schema is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database. The special value [Environment Default] uses the schema defined in the environment. For more information on using multiple schemas, see this article. |
| Target Table | String | Specify the external table to be used. Warning: this table will be recreated, and any existing table of the same name will be dropped. |
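
Once staged, the files can be referenced by an external table, as noted in the Location row above. The snippet below is a minimal sketch using snowflake-connector-python; the connection details, stage name, bucket path, and table name are all hypothetical, and in practice the component issues the equivalent DDL for you.

```python
# Sketch of exposing staged JSON files through a Snowflake external table.
# All names and credentials here are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="my_warehouse", database="my_database", schema="my_schema",
)
cur = conn.cursor()

# An external stage pointing at the bucket path given as the component's Location.
cur.execute("""
    CREATE OR REPLACE STAGE api_extract_stage
    URL = 's3://my-bucket/my-path/'  -- hypothetical bucket path
""")

# External table over the staged files; each file's JSON is exposed through
# the table's VALUE column and can be queried or flattened from there.
cur.execute("""
    CREATE OR REPLACE EXTERNAL TABLE my_target_table
    LOCATION = @api_extract_stage
    FILE_FORMAT = (TYPE = JSON)
    AUTO_REFRESH = FALSE
""")
cur.close()
conn.close()
```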

BigQuery Properties

| Property | Setting | Description |
|---|---|---|
| Name | String | Input a descriptive name for the component. |
| API | Select | Select the API extract profile. To manage extract profiles, click Project → Manage Extract Profiles. Read our Manage Extract Profiles guide for more information. |
| Data Source | Select | Select the data source. |
| Username | String | The username used to authenticate with the endpoint. |
| Password | String | The password used to authenticate with the endpoint. |
| API Token | String | The API token used to authenticate with the endpoint. |
| URI Params | Parameter Name | Any parameters configured for this endpoint in the wizard are displayed here. URI parameters cannot be set as constants, and constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Query Params | Parameter Name | Specify any query parameters. Any parameters configured for this endpoint in the wizard are displayed here. Constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Header Params | Parameter Name | Specify any header parameters. Any parameters configured for this endpoint in the wizard are displayed here. Constants will not appear here. Toggle the Text Mode checkbox to switch between grid mode and text mode. |
| | Parameter Value | Specify the value for each added parameter. |
| Page Limit | Integer | Specify the maximum number of pages to stage. |
| Project | Select | Select the Google BigQuery project. The special value [Environment Default] uses the project defined in the environment. For more information, refer to the BigQuery documentation. |
| Dataset | Select | Select the Google BigQuery dataset to load data into. The special value [Environment Default] uses the dataset defined in the environment. For more information, refer to the BigQuery documentation. |
| Target Table | String | Specify the external table to be used. Warning: this table will be recreated, and any existing table of the same name will be dropped. |
| Cloud Storage Location | String \| File Structure | Specify the target Google Cloud Storage bucket to be used for staging the queried data. Either input the URL string of the Cloud Storage bucket, following the template `gs://<bucket>/<path>`, or navigate through the file structure to select the target bucket. |
| Load Options | Multiple Select | Clean Cloud Storage Files: destroy staged files on Cloud Storage after loading data (default: On). Cloud Storage File Prefix: give staged file names a prefix of your choice (default: an empty field). Recreate Target Table: choose whether the component recreates its target table before the data load; when Off, the component uses the existing table, or creates one if it does not exist (default: On; see the sketch below this table). Use Grid Variable: check this checkbox to use a grid variable (unchecked by default). |
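
For context on the Recreate Target Table option referenced above, the sketch below shows roughly how the On/Off behaviour maps onto write dispositions in the google-cloud-bigquery client; the table ID and staged-file URI are hypothetical, and the component performs the equivalent work internally.

```python
# Rough sketch of "Recreate Target Table" semantics with google-cloud-bigquery.
# The table ID and the staged-file URI are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_target_table"  # hypothetical
uri = "gs://my-bucket/my-path/staged-*.json"        # hypothetical staged files

recreate_target_table = True  # the load option; defaults to On

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    # On: replace the table's contents and schema. Off: append to the existing
    # table, creating it first if it does not exist.
    write_disposition="WRITE_TRUNCATE" if recreate_target_table else "WRITE_APPEND",
)
client.load_table_from_uri(uri, table_id, job_config=job_config).result()
```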

Video Example