Ingesting AWS ElasticSearch Data via the API Query Component

Elasticsearch is a distributed, scalable, multi-tenant search engine, often used to index data (for example, for a website) and to provide fuzzy matching and suggested text.
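As an aside, the fuzzy matching mentioned above is exposed through Elasticsearch's query DSL. The sketch below builds a fuzzy query body that would tolerate a misspelled search term; the `title` field and the search term are illustrative, not taken from the tutorial data.

```python
import json

# Illustrative Elasticsearch fuzzy-match query body; the field name
# ("title") and the misspelled search term are assumptions.
fuzzy_query = {
    "query": {
        "fuzzy": {
            "title": {
                "value": "twsiter",   # misspelled search term
                "fuzziness": "AUTO"   # tolerate small edit distances
            }
        }
    }
}

# This JSON would be POSTed to the index's _search endpoint.
print(json.dumps(fuzzy_query, indent=2))
```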

Data can be streamed into Elasticsearch through a variety of methods, but one question that has come up repeatedly is how to export data from AWS Elasticsearch into Redshift.

Taking the example data uploaded in this tutorial, we will use Matillion to load this into Redshift. The data contains movie information and can be searched on different criteria.

In this article, we will use Matillion ETL to call the AWS Elasticsearch REST API and load the results using the API Query component.

Create an API profile to call the Elasticsearch endpoint with the search criteria. This profile returns all movies:

The URI on line 11 can be changed to narrow down the search. For example, the following will return all movies from 1996:
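To make the URI change concrete, the sketch below builds the Elasticsearch `_search` URI with and without a Lucene query-string filter (the `?q=` parameter). The domain endpoint and the `movies` index name are placeholders, not the tutorial's actual values.

```python
from urllib.parse import urlencode

# Hypothetical AWS Elasticsearch domain endpoint (replace with your own).
ES_ENDPOINT = "https://my-domain.us-east-1.es.amazonaws.com"

def build_search_uri(index, query=None):
    """Build an Elasticsearch _search URI, optionally narrowed by a
    Lucene query-string filter passed via the ?q= parameter."""
    uri = f"{ES_ENDPOINT}/{index}/_search"
    if query:
        uri += "?" + urlencode({"q": query})
    return uri

# All movies (the profile's default URI):
print(build_search_uri("movies"))
# Narrowed to movies from 1996, as described above:
print(build_search_uri("movies", "year:1996"))
```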

Lines 4 to 8 specify which fields to return from the API, while line 13 specifies the Repeat Element of the API, i.e. where the records are located in the JSON response from Elasticsearch. Further details on creating API Profiles are available here.
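To illustrate what the Repeat Element points at: a standard Elasticsearch search response nests the matching documents in a `hits.hits` array. The abbreviated sample response below is illustrative (real responses also include metadata such as `took` and `_shards`).

```python
import json

# Abbreviated, illustrative Elasticsearch search response.
sample_response = """
{
  "hits": {
    "total": 2,
    "hits": [
      {"_source": {"title": "Twister", "year": 1996}},
      {"_source": {"title": "Independence Day", "year": 1996}}
    ]
  }
}
"""

doc = json.loads(sample_response)
# The Repeat Element identifies the array of records -- here, hits.hits.
records = doc["hits"]["hits"]
for record in records:
    print(record["_source"]["title"], record["_source"]["year"])
```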

After the API profile has been created, it can be used in a Matillion Orchestration job in conjunction with the API Query component.

After bringing the component onto the canvas, select the API profile just created:

Then select the Data Source which will form the new table. This list is drawn from the RSD files created in the API profile.

Next select the fields in the Data Selection to form the columns in the table:

After specifying an S3 Staging bucket and Target Table name, run the API query component to query Elasticsearch and copy the data to a new Redshift table:

This data can now be used in a Transformation job to split out the genre and actor columns:

The below transformation job will split the comma-separated list held in genre into separate columns:
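The transformation job itself is built visually in Matillion, but the splitting logic it performs can be sketched as follows. The fixed number of output columns and the sample row are assumptions for illustration; in Redshift this is typically done with the `SPLIT_PART` function.

```python
def split_genres(genre, max_genres=3):
    """Split a comma-separated genre string into a fixed number of
    columns, padding with None -- mirroring what the transformation
    job's split components produce."""
    parts = [g.strip() for g in genre.split(",")] if genre else []
    parts = parts[:max_genres]
    return parts + [None] * (max_genres - len(parts))

# Hypothetical row from the loaded movies table:
row = {"title": "Twister", "genre": "Action,Adventure,Thriller"}
print(split_genres(row["genre"]))
# → ['Action', 'Adventure', 'Thriller']
```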

This article looked at how to use the AWS Elasticsearch API to bring data into Redshift and transform it into a more user-friendly format.

Example jobs (JSON format) for a range of platforms are available for download. RSD files featured in this article are also available.