Setting Up Matillion ETL In A Private VPC
This guide looks at how to configure Matillion ETL in a private Virtual Private Cloud (VPC) with no internet access. We assume you already have a private subnet set up and that you have (or can launch) an EC2 instance with Matillion ETL installed inside this subnet.
Matillion ETL for Redshift works best when it has access to the internet, either via a publicly addressable IP address and an internet gateway or via an Elastic Load Balancer and a NAT gateway. It is, however, also possible to deploy Matillion ETL to a VPC without any internet access, or to an isolated subnet with no further routing configured. There are some caveats: access to S3 goes through a VPC endpoint for S3 and is only available for buckets within the same region. While Redshift can be supported in this configuration, access to the wider range of Matillion ETL components will be limited.
For Matillion ETL to access the Redshift cluster, the cluster must be created in the same VPC as Matillion ETL, or VPC peering must be set up. To create a Redshift cluster in a private VPC, a cluster subnet group associated with that VPC needs to be set up.
- Follow the AWS Console services menu to 'Redshift' then click "Launch cluster".
- On the first page, enter identifying details as appropriate for your new cluster.
- On the second page, select the desired size of the Redshift cluster.
- On the third page, you can configure your subnet.
- In Choose a VPC, select the VPC in which your private subnet resides.
- In Cluster subnet group, select the private subnet.
- Ensure Publicly Accessible is set to No.
- Ensure Enhanced VPC Routing is set to No.
- Availability Zone and Security Groups can also be selected here.
- Click Continue, then click Launch Cluster. (Note: Charges will apply from this point on.)
- Your new Redshift cluster is now set up.
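The console steps above could also be scripted. The following is a minimal sketch, assuming the boto3 Redshift client and its `create_cluster_subnet_group` and `create_cluster` calls. Every identifier in it (group name, subnet ID, cluster name, node type, credentials) is a hypothetical placeholder, and the helper functions are illustrative rather than part of Matillion ETL. The helpers only build the request parameters, so they can be inspected before anything is sent to AWS.

```python
def subnet_group_params(group_name, subnet_ids):
    """Build parameters for redshift.create_cluster_subnet_group."""
    return {
        "ClusterSubnetGroupName": group_name,
        "Description": "Private subnet group for Redshift",
        "SubnetIds": list(subnet_ids),
    }


def private_cluster_params(cluster_id, group_name, node_type,
                           master_user, master_password,
                           security_group_ids=()):
    """Build parameters for redshift.create_cluster in a private subnet."""
    params = {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "single-node",
        "NodeType": node_type,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "ClusterSubnetGroupName": group_name,
        # Mirrors the console settings in the steps above.
        "PubliclyAccessible": False,
        "EnhancedVpcRouting": False,
    }
    if security_group_ids:
        params["VpcSecurityGroupIds"] = list(security_group_ids)
    return params


# Hypothetical usage (requires boto3 and AWS credentials; charges apply):
#   import boto3
#   redshift = boto3.client("redshift", region_name="eu-west-2")
#   redshift.create_cluster_subnet_group(
#       **subnet_group_params("private-group", ["subnet-0123example"]))
#   redshift.create_cluster(
#       **private_cluster_params("my-cluster", "private-group", "dc2.large",
#                                "admin", "ChangeMe12345"))
```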
Matillion ETL requires access to S3 to load data into Redshift. To grant your private VPC access to your S3 buckets, a VPC endpoint must be created.
- From the AWS Console services menu, browse to VPC.
- On the VPC page, select Endpoints from the left-hand menu.
- From the Endpoints screen select Create Endpoint.
- Ensure AWS services is selected.
- Select the service that points to S3 in the same region as your private subnet (for example, com.amazonaws.eu-west-2.s3 for eu-west-2).
- Below, select the VPC in which your private subnet resides.
This will add an entry to the VPC route table that routes traffic for the S3 buckets through the endpoint.
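The endpoint steps above can be sketched in Python as well, assuming the boto3 EC2 client's `create_vpc_endpoint` call. The VPC and route table IDs are hypothetical placeholders, and the service name pattern assumes a standard commercial AWS region. No policy document is passed here, in which case AWS attaches its default full-access endpoint policy.

```python
def s3_service_name(region):
    """S3 endpoint service name for a standard commercial AWS region."""
    return f"com.amazonaws.{region}.s3"


def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Build parameters for ec2.create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": s3_service_name(region),
        # Route tables of the private subnet; AWS adds the S3 route to each.
        "RouteTableIds": list(route_table_ids),
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-2")
#   ec2.create_vpc_endpoint(
#       **s3_gateway_endpoint_params("vpc-0123example", "eu-west-2",
#                                    ["rtb-0123example"]))
```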
Access to Matillion ETL should now work, and data can be loaded and transformed in your Redshift cluster. However, please note that there are some gotchas to be aware of. These include:
- The S3 bucket used for loading in Matillion ETL must be in the same region as the Redshift cluster.
- Matillion ETL relies on internet access to connect to many services. Without a route to the internet, many components, such as the Facebook, Google Analytics and YouTube Query components, may not be able to connect to their target APIs. Several key functions of Matillion ETL, such as the Updating and Migration services, may also fail.
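The same-region requirement for the S3 bucket can be checked before a load is attempted. The sketch below assumes the bucket's location is read with boto3's `get_bucket_location`, whose `LocationConstraint` value is empty for us-east-1 and may be the legacy value `EU` for eu-west-1. The function names are illustrative and not part of Matillion ETL.

```python
def bucket_region(location_constraint):
    """Normalise an S3 LocationConstraint value to a region name."""
    if not location_constraint:
        # S3 reports no constraint for buckets in us-east-1.
        return "us-east-1"
    if location_constraint == "EU":
        # Legacy value for buckets in eu-west-1.
        return "eu-west-1"
    return location_constraint


def same_region(location_constraint, cluster_region):
    """True if the bucket and the Redshift cluster share a region."""
    return bucket_region(location_constraint) == cluster_region


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   constraint = s3.get_bucket_location(Bucket="my-bucket")["LocationConstraint"]
#   if not same_region(constraint, "eu-west-2"):
#       raise SystemExit("Bucket is not in the cluster's region")
```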
Example CloudFormation Template
Important: The template is for educational purposes only, demonstrating various types of VPC connecting to many different services. It is highly unlikely to be practically useful as-is and should not be used as such.
The CloudFormation template attached at the bottom of the page will create a VPC in eu-west-2 with the following:
- 1 Private Subnet (Matillion would usually live here)
- Route table via the NAT gateway (NGW)
- 2 Public Subnets (the Internet Gateway/NGW and ELB are here)
- Route table via the IGW
- Public IP on by default
- 1 Isolated Subnet (This can't route out)
- 1 Instance in the private subnet
- 1 Instance in the isolated subnet
- 1 ELB in the public subnet routing to port 22 of the private subnet
- 1 Security group for the isolated subnet, all traffic allowed
- 1 Security group for the ELB, all traffic allowed
- 1 Security group for the private subnet, all traffic allowed
- VPC endpoint for S3 in eu-west-2 with an all-allow policy
- IAM Profile/Role with full privileges on all services
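The three subnet types in the list above differ only in where their default route points. As a rough illustration, and not something taken from the attached template, the sketch below classifies a route table using the list's terminology. It assumes routes shaped like the `Routes` entries returned by boto3's `describe_route_tables`; the IDs in the usage example are hypothetical.

```python
DEFAULT_ROUTE = "0.0.0.0/0"


def classify_route_table(routes):
    """Return 'public', 'private' or 'isolated' based on the default route."""
    for route in routes:
        if route.get("DestinationCidrBlock") != DEFAULT_ROUTE:
            continue
        if route.get("NatGatewayId"):
            return "private"   # outbound traffic leaves via the NAT gateway
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "public"    # traffic goes straight to the internet gateway
    return "isolated"          # no default route, so nothing can route out


# Hypothetical route tables for the three subnet types.
local = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}
public_routes = [local, {"DestinationCidrBlock": DEFAULT_ROUTE,
                         "GatewayId": "igw-0123example"}]
private_routes = [local, {"DestinationCidrBlock": DEFAULT_ROUTE,
                          "NatGatewayId": "nat-0123example"}]
# An S3 gateway endpoint route uses a prefix list, not a CIDR block.
isolated_routes = [local, {"DestinationPrefixListId": "pl-0123example",
                           "GatewayId": "vpce-0123example"}]

for name, table in [("public", public_routes), ("private", private_routes),
                    ("isolated", isolated_routes)]:
    print(name, "->", classify_route_table(table))
```

An isolated subnet with an S3 endpoint still classifies as isolated here, which matches the setup this guide describes: S3 is reachable, but nothing else outside the VPC is.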