Launching Matillion ETL for Delta Lake on AWS
New customers must go through the Matillion Hub to select their preferred cloud provider and data warehouse to begin their Matillion ETL journey.
This guide explains how to create a Matillion ETL instance using the Matillion Hub, launching on AWS with Delta Lake on Databricks, via Amazon Machine Image (AMI).
- This document is part of the Matillion ETL Instance Creation process.
- Matillion ETL uses Databricks Partner Connect to simplify the process of connecting to an existing SQL endpoint or cluster in your Databricks workspace. For full details of using Partner Connect, read Connect to Matillion by using Partner Connect in the Databricks documentation.
Prior to launching a Matillion ETL instance you will need to register for a Matillion Hub account. You will also require:
- Adequate knowledge about the cloud service account (AWS, Azure, GCP) and Cloud Data Warehouse (Snowflake, Redshift, Google BigQuery and Delta Lake on Databricks) you want to launch.
- A user with admin permissions who can access the intended cloud service account.
- Access to a cloud storage bucket (S3, Azure Blob Storage, or Google Cloud Storage) to house the transient staging files Matillion uses to load data to the cloud.
- A network path to access the intended data sources. This may involve working with your network team to enable access to on-premises databases.
Launching Matillion ETL using Amazon Machine Image (AMI)
To launch and configure Matillion ETL for Delta Lake on AWS using Partner Connect, use the following steps.
Log in to Matillion Hub and choose your account. Click Add Matillion ETL instance on the Select your service page to begin creating an instance.
Click Continue in AWS and the Amazon EC2 Console opens. The system will have pre-populated an Amazon Machine Image (AMI) with all pre-configured details required to launch the instance. Select the AMI from the list presented, then click Launch instance from AMI at the top-right of the page.
If choosing from a list of AMIs, use only AMIs with "billing" in the instance name and not "byol". A "billing" filter has been pre-set on the list to assist you here. Also note that images aren't necessarily listed in order of recency and care should be taken when selecting the desired version.
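The selection rule in the note above (keep "billing" images, skip "byol" images, and do not rely on list order for recency) can be sketched in a few lines of Python. This is only an illustration: the function name `pick_billing_ami` and the sample image names and IDs are hypothetical, and the dictionaries are assumed to carry the `ImageId`, `Name`, and `CreationDate` fields that the EC2 DescribeImages API returns.

```python
def pick_billing_ami(images):
    """Return the newest image whose name contains 'billing' but not 'byol'.

    `images` is assumed to be a list of dicts with 'ImageId', 'Name' and
    'CreationDate' keys, as found in an EC2 DescribeImages response.
    """
    candidates = [
        img for img in images
        if "billing" in img["Name"].lower() and "byol" not in img["Name"].lower()
    ]
    if not candidates:
        return None
    # CreationDate is an ISO 8601 UTC string, so comparing the strings
    # orders the images chronologically regardless of list position.
    return max(candidates, key=lambda img: img["CreationDate"])


# Hypothetical sample data; these are not real Matillion AMI names or IDs.
sample_images = [
    {"ImageId": "ami-aaa", "Name": "example-etl-billing-old",
     "CreationDate": "2022-03-01T10:00:00.000Z"},
    {"ImageId": "ami-bbb", "Name": "example-etl-byol-new",
     "CreationDate": "2023-02-01T10:00:00.000Z"},
    {"ImageId": "ami-ccc", "Name": "example-etl-billing-new",
     "CreationDate": "2023-01-15T10:00:00.000Z"},
]

chosen = pick_billing_ami(sample_images)
print(chosen["ImageId"])  # prints "ami-ccc"
```

The "byol" image is newer in this sample but is excluded by name, which mirrors the guidance to choose by both billing type and version rather than by position in the list.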
- The Launch an instance page opens, pre-populated with default settings for the instance. The only item you must select here is a valid Key pair; you can accept the defaults for everything else. If you wish to change any default settings before launching, take the following points into account.
- Number of instances: Leave as default (1), unless you want to launch multiple instances.
- Name and Tags: Click Add additional tags to add any instance tags you require.
- Application and OS Images (Amazon Machine Image): This has been pre-selected, so don't change anything here.
- Instance Type: You can leave this as default, or click the drop-down arrow to choose another supported instance type from the list.
- Key pair (login): Select a valid key pair from the list.
- Network settings: VPC: The AWS region in which this VPC is located should ideally be the same as that of your Databricks account (European Databricks accounts should be paired with an EU-based AWS region, for example).
- Network settings: Auto assign Public IP: This depends on your setup. By default, a new VPC has no VPN connections or NAT gateways, so this normally needs to be set to Enable for Matillion ETL to reach the internet and for you to access Matillion ETL.
- Configure storage: Accept the default root volume size.
- Advanced details: Request spot instances: Do not select this option.
- Advanced details: Shutdown behavior: Select Stop.
- Advanced details: Termination protection: Select Enable.
- Once you have selected a Key pair (the only mandatory item on this page), click Launch instance.
- It may take a few minutes until your launched instances are in a running state.
- Click View Instances to monitor the instance status. Once your instances are in a running state, you can connect to them from the Instances page.
- Log in to Matillion ETL with the username ec2-user and, as the password, the instance ID i-xxxxxxxx (for example, i-88ed92c6).
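For readers who script their launches, the console settings described above map onto parameters of the EC2 RunInstances API. The following is a minimal sketch, not Matillion's documented method: it assumes the third-party boto3 library, the helper names `build_launch_params` and `launch_and_wait` are hypothetical, and the AMI ID, key pair, subnet, and instance type in the example call are placeholders you would replace with your own values. Security groups are omitted, so the VPC default would apply.

```python
def build_launch_params(image_id, key_name, subnet_id, instance_type, tags=None):
    """Build RunInstances parameters mirroring the console settings above."""
    params = {
        "ImageId": image_id,            # the pre-selected "billing" AMI
        "InstanceType": instance_type,  # any supported instance type
        "KeyName": key_name,            # the one mandatory choice: a valid key pair
        "MinCount": 1,                  # Number of instances: 1
        "MaxCount": 1,
        # Shutdown behavior: Stop
        "InstanceInitiatedShutdownBehavior": "stop",
        # Termination protection: Enable
        "DisableApiTermination": True,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            # Auto assign Public IP: Enable
            "AssociatePublicIpAddress": True,
        }],
        # No InstanceMarketOptions key, so no spot instance is requested.
    }
    if tags:
        params["TagSpecifications"] = [{
            "ResourceType": "instance",
            "Tags": [{"Key": k, "Value": v} for k, v in tags.items()],
        }]
    return params


def launch_and_wait(params, region_name):
    """Launch the instance and block until it reaches the running state."""
    import boto3  # third-party; imported here so the builder runs without it

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.run_instances(**params)
    instance_id = response["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id


# Placeholder values for illustration only.
params = build_launch_params(
    image_id="ami-ccc",
    key_name="my-key-pair",
    subnet_id="subnet-12345678",
    instance_type="m5.large",
    tags={"Name": "matillion-etl"},
)
print(params["InstanceInitiatedShutdownBehavior"])  # prints "stop"
```

Calling `launch_and_wait(params, region_name=...)` with valid AWS credentials would return the new instance ID, which is the value used alongside ec2-user at first login.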