Configuring an AWS VPC
Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the Amazon Web Services (AWS) Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, the creation of subnets, and configuration of route tables and network gateways.
We recommend running Matillion ETL in a VPC for production environments. Because setups vary, this page walks through a minimal VPC as an example and discusses some of the available options.
Internet Access in a VPC
By default, an EC2 instance launched into a VPC does not have internet access unless an Elastic IP is associated with that instance. It is possible, but not recommended, to run Matillion ETL without an internet connection. If you do not allow internet access, the following features and components, which rely on the AWS API, will not work:
- Cluster discovery when setting up the environment: Redshift environment details will need to be entered manually.
- SQS Discovery: This feature uses the AWS API to list existing queues and listen for messages.
- SNS Message Component: This feature uses the AWS API to create an SNS endpoint and send messages.
- SQS Message Component: This feature uses the AWS API to send a message to an SQS queue.
- RDS Query Component: This feature uses the AWS API to upload the data to Amazon S3.
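As an illustration of the Elastic IP requirement described above, the following is a minimal sketch of allocating an Elastic IP and associating it with an instance. The function name associate_elastic_ip and the instance ID passed to it are hypothetical. The sketch assumes the VPC already has an internet gateway and a default route, as set up later on this page.

```shell
# Sketch only: allocate an Elastic IP and attach it to an existing instance.
# associate_elastic_ip is a hypothetical helper name, not part of the AWS CLI.
associate_elastic_ip() {
    eip_instance_id="$1"

    # Allocate a new Elastic IP for use in a VPC and capture its allocation ID.
    eip_allocation_id=$(aws ec2 allocate-address --domain vpc \
        --query 'AllocationId' --output text) || return 1

    # Associate the new address with the instance.
    aws ec2 associate-address --instance-id "$eip_instance_id" \
        --allocation-id "$eip_allocation_id" >/dev/null || return 1

    # Print the allocation ID so the address can be released later.
    echo "$eip_allocation_id"
}
```

Calling associate_elastic_ip with the ID of the Matillion ETL instance prints the allocation ID of the new address.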
Setting up the VPC
These instructions set up a VPC using the AWS Command Line Interface (AWS CLI). The AWS CLI can be downloaded from AWS; these instructions assume it is already installed and configured.
For instructions on launching a Redshift cluster, refer to Create Redshift Cluster.
The following commands can be used to set up the VPC:
- First, create a new VPC with the 10.0.*.* private address block. Note the VPC ID when created:
aws ec2 create-vpc --cidr-block 10.0.0.0/16
- Then, add a matching subnet, and note the subnet ID when created:
aws ec2 create-subnet --vpc-id <VPC ID> --cidr-block 10.0.0.0/16
- Next, create an internet gateway to allow connection to the internet. Note the gateway ID.
aws ec2 create-internet-gateway
- Attach the internet gateway to the VPC, like so:
aws ec2 attach-internet-gateway --internet-gateway-id <gateway ID> --vpc-id <VPC ID>
- Add the default route to the route table so that the VPC can route traffic to the internet. First, find the route table using the command below, which lists all route tables; look for the entry whose VpcId matches your VPC and note its route table ID.
aws ec2 describe-route-tables
- Create the route in the route table.
aws ec2 create-route --route-table-id <route table ID> --destination-cidr-block 0.0.0.0/0 --gateway-id <gateway ID>
- Find the security group that was created with the VPC.
aws ec2 describe-security-groups --filters Name=vpc-id,Values=<VPC ID>
- Add a rule so that Matillion ETL can connect to the cluster on port 5439, the default Redshift port.
aws ec2 authorize-security-group-ingress --group-id <Security Group Id> --protocol tcp --port 5439 --cidr 10.0.0.0/16
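The steps above require noting each ID by hand. As a rough sketch, the same sequence can be wrapped in a shell function that captures each ID using the AWS CLI's --query and --output text options. The function name create_minimal_vpc and its variable names are hypothetical. Filtering on the default security group is an assumption about which group you want to modify.

```shell
# Sketch only: the VPC setup steps with each ID captured automatically.
# create_minimal_vpc is a hypothetical helper name.
create_minimal_vpc() {
    cidr="${1:-10.0.0.0/16}"

    # Create the VPC and a matching subnet.
    vpc_id=$(aws ec2 create-vpc --cidr-block "$cidr" \
        --query 'Vpc.VpcId' --output text) || return 1
    subnet_id=$(aws ec2 create-subnet --vpc-id "$vpc_id" --cidr-block "$cidr" \
        --query 'Subnet.SubnetId' --output text) || return 1

    # Create the internet gateway and attach it to the VPC.
    igw_id=$(aws ec2 create-internet-gateway \
        --query 'InternetGateway.InternetGatewayId' --output text) || return 1
    aws ec2 attach-internet-gateway --internet-gateway-id "$igw_id" \
        --vpc-id "$vpc_id" || return 1

    # Find the route table created with the VPC and add the default route.
    rtb_id=$(aws ec2 describe-route-tables \
        --filters "Name=vpc-id,Values=$vpc_id" \
        --query 'RouteTables[0].RouteTableId' --output text) || return 1
    aws ec2 create-route --route-table-id "$rtb_id" \
        --destination-cidr-block 0.0.0.0/0 --gateway-id "$igw_id" \
        >/dev/null || return 1

    # Find the VPC's default security group and open the Redshift port.
    sg_id=$(aws ec2 describe-security-groups \
        --filters "Name=vpc-id,Values=$vpc_id" "Name=group-name,Values=default" \
        --query 'SecurityGroups[0].GroupId' --output text) || return 1
    aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
        --protocol tcp --port 5439 --cidr "$cidr" >/dev/null || return 1

    # Report the IDs needed for the later steps.
    echo "vpc=$vpc_id subnet=$subnet_id igw=$igw_id rtb=$rtb_id sg=$sg_id"
}
```

Running create_minimal_vpc with no arguments uses the 10.0.0.0/16 block from the steps above. The function stops at the first failing command, but it does not clean up resources that were already created.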
Create Redshift Cluster
In this example, we create a Redshift cluster in the VPC called My-Redshift-Cluster. However, before we do that, we need to create a cluster subnet group for the cluster to live in.
aws redshift create-cluster-subnet-group --cluster-subnet-group-name mysubnetgroup --description "My subnet group" --subnet-ids <subnet ID>
Now launch the cluster. This example creates a single-node cluster. For more nodes, remove --cluster-type single-node and add --number-of-nodes x, where x is the number of nodes.
aws redshift create-cluster --node-type dc1.large --master-username admin --master-user-password Password1 --cluster-type single-node --cluster-identifier My-Redshift-Cluster --db-name redshift --cluster-subnet-group-name mysubnetgroup
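To illustrate the note above about node count, here is a small sketch of a helper that prints the node-related arguments for aws redshift create-cluster. The helper name cluster_size_args is hypothetical. It passes --cluster-type multi-node explicitly, which is an alternative to simply omitting --cluster-type as described above.

```shell
# Sketch only: build the node-related arguments for create-cluster.
# cluster_size_args is a hypothetical helper name.
cluster_size_args() {
    nodes="$1"
    if [ "$nodes" -le 1 ]; then
        # One node: same arguments as the single-node example.
        echo "--cluster-type single-node"
    else
        # Two or more nodes: multi-node requires an explicit node count.
        echo "--cluster-type multi-node --number-of-nodes $nodes"
    fi
}
```

The output is meant to be spliced, unquoted, into the create-cluster command in place of --cluster-type single-node, for example as $(cluster_size_args 2).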
Next, update the default security group for the cluster, so that the Redshift cluster accepts connections from addresses within the VPC's 10.0.0.0/16 range.
aws redshift authorize-cluster-security-group-ingress --cluster-security-group-name default --cidrip 10.0.0.0/16
You should now be able to launch Matillion ETL into the VPC.
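Before connecting Matillion ETL, it can help to confirm that the cluster has finished launching and to look up its endpoint. The following is a minimal sketch that uses the AWS CLI's wait and describe-clusters subcommands. The function name cluster_endpoint is hypothetical.

```shell
# Sketch only: wait for a cluster to become available, then print its endpoint.
# cluster_endpoint is a hypothetical helper name.
cluster_endpoint() {
    cluster_id="$1"

    # Block until the cluster reports the "available" state.
    aws redshift wait cluster-available \
        --cluster-identifier "$cluster_id" || return 1

    # Print the endpoint address and port for use in the Matillion ETL environment.
    aws redshift describe-clusters --cluster-identifier "$cluster_id" \
        --query 'Clusters[0].[Endpoint.Address,Endpoint.Port]' --output text
}
```

Pass the cluster identifier used when creating the cluster. The address and port it prints are the values to enter manually if cluster discovery is unavailable.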