Quick Guide - Deploying CDC Agent in GCP
Overview
Use this guide to add a CDC agent in Matillion Data Loader and then deploy that agent in Google Cloud Platform (GCP). Creating and deploying an agent are required steps to set up a CDC pipeline in Matillion Data Loader.
For best performance, your GCP region should be geographically close to your Matillion Hub account region.
Prerequisites
- Install Terraform.
- Export cloud credentials with access to:
- Google Project and Service Account. Read Project and Service Account for more information.
- Google Service account with administrative permissions:
- roles/container.admin
- roles/iam.serviceAccountAdmin
- Google Cloud Storage bucket. Read GCP Storage Bucket for more information.
- Google Secret Manager. Read Google Secret Manager for more information.
- Platform Key (Secret Manager) Secret value. The Platform Key is generated when you create your first agent, and all subsequent agents use the same key.
- Database password (Secret Manager) Secret name.
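As a minimal sketch of the credential export, assuming you authenticate with a service account key file: the Google provider for Terraform reads the GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_PROJECT environment variables. The key file path and project ID below are hypothetical placeholders, not values from this guide.

```shell
# Hypothetical key file path and project ID; replace both with your own values.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/matillion-cdc-sa.json"
export GOOGLE_PROJECT="my-gcp-project"

# Confirm the Terraform CLI is on the PATH before going further.
if command -v terraform >/dev/null 2>&1; then
  TERRAFORM_STATUS="found"
else
  TERRAFORM_STATUS="missing"
fi
echo "terraform: $TERRAFORM_STATUS"
```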
Create a CDC agent in Matillion Data Loader
- Log in to the Matillion Hub.
- The My Accounts page lists any accounts you have already created or joined. At the bottom of this list, click Add new account. Read Create an Account to learn more about this topic.
Each Matillion Hub account can generate its own unique platform key that your CDC agent will use to communicate with Matillion Data Loader. With this in mind, create the CDC agent in the account that matches the platform key you will be using.
- Choose Matillion Data Loader as a service on the Select your service page.
- On the Matillion Data Loader dashboard, scroll to the lower-right of the UI and choose your region.
- Select Agents in the left sidebar and click Add agent.
- Give your agent a sensible Agent name and Description. Click Continue.
- Since this guide is for GCP, select GCP as your cloud provider.
- Choose Terraform as the service that provisions and deploys the cloud resources for the CDC agent installation.
- In the Prerequisites for agent setup, note the following values:
- ID_ORGANIZATION: This value is used when deploying the CDC agent in GCP. The value is unique per agent.
- ID_AGENT: Also used when deploying the CDC agent. The value is unique per agent.
- PLATFORM_WEBSOCKET_ENDPOINT: Also used when deploying the agent. The value is unique for the Matillion Data Loader region (US or EU).
- Public/Private key pair: This is a generated value. If you haven't generated a platform secret for your account yet, Matillion Data Loader will prompt you to do so when creating a CDC pipeline. You need to store this value in GCP Secret Manager where your CDC agent can access it. For security reasons, this key pair can only be generated and shown once per account, so make sure to copy and save it for future use.
You can revisit this page if required.
- Check the I have saved the private key in AWS Secrets Manager and made a note of the secret name checkbox.
- Click Submit key pair.
GCP Secret Manager
- Navigate to the Secret Manager in the Google Cloud console.
- On the Secret Manager page, click Create secret.
- On the Create secret page, enter the name of your secret.
- For database passwords, the secret name can be any name you choose. You refer to this name later in Matillion Data Loader.
- In the Secret Value section, either upload the value or enter the secret value in a JSON format.
- The secret value is the Platform Key that is generated on the first agent creation. All the subsequent agents will use the same key.
- In Region, select the regions manually if you want to store your secret in specific regions. Otherwise, leave this blank.
- Click the Create secret button.
Once created, you can view your secret by clicking View secret value.
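The same steps can also be performed from a terminal. The sketch below assumes the gcloud CLI is installed and authenticated; the function names are hypothetical helpers introduced here for illustration, and automatic replication corresponds to leaving the region blank in the console.

```shell
# Hypothetical helper: create a secret whose value is read from a file.
# Arguments: secret_name file_path
create_secret_from_file() {
  gcloud secrets create "$1" --replication-policy="automatic" --data-file="$2"
}

# Hypothetical helper: print the latest version of a secret to the terminal,
# the command-line counterpart of View secret value.
# Arguments: secret_name
view_secret() {
  gcloud secrets versions access latest --secret="$1"
}
```

For example, `create_secret_from_file matillion-cdc-platform-key platform-key.pem` would store the saved platform key, where both the secret name and the file name are placeholders.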
IAM Policies and permissions
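Certain permissions are required to use the Google Cloud console and to grant access through Cloud Identity and Access Management (IAM). The steps below create a custom role with the required permissions and grant it to the service account.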
- Navigate to IAM & Admin → Roles.
- Select + CREATE ROLE.
- Enter a title, description, and ID.
- Select + ADD PERMISSIONS and add the following permissions:
orgpolicy.policy.get
resourcemanager.projects.get
secretmanager.versions.access
storage.buckets.get
storage.multipartUploads.abort
storage.multipartUploads.create
storage.multipartUploads.list
storage.multipartUploads.listParts
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
storage.objects.update
- Select CREATE to create the role with these permissions.
- Select IAM & Admin → IAM.
- Select ADD.
- Search for or paste the service account email in New principals.
- Select the newly created role in the drop-down menu and select SAVE.
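The role creation and grant above could be scripted roughly as follows, assuming the gcloud CLI is installed and authenticated. The function names and the role title are hypothetical; the permission list is copied from the steps above.

```shell
# Permissions from the list above, joined with commas for gcloud.
PERMISSIONS=$(paste -s -d ',' - <<'EOF'
orgpolicy.policy.get
resourcemanager.projects.get
secretmanager.versions.access
storage.buckets.get
storage.multipartUploads.abort
storage.multipartUploads.create
storage.multipartUploads.list
storage.multipartUploads.listParts
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
storage.objects.update
EOF
)

# Hypothetical helper: create the custom role in a project.
# Arguments: project_id role_id
create_cdc_role() {
  gcloud iam roles create "$2" --project="$1" \
    --title="Matillion CDC Agent" --permissions="$PERMISSIONS"
}

# Hypothetical helper: grant the custom role to the service account.
# Arguments: project_id role_id service_account_email
grant_cdc_role() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="serviceAccount:$3" --role="projects/$1/roles/$2"
}
```

A call might look like `create_cdc_role my-gcp-project matillionCdcAgent` followed by `grant_cdc_role my-gcp-project matillionCdcAgent <service account email>`, with the project and role IDs as placeholders.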
Deploy the CDC Agent in GCP
There are multiple deployment methods for a Matillion Data Loader CDC agent. The following steps use the process detailed in the Terraform Compute Engine Advanced template.
- Download the Terraform template, linked here.
- Update the template file matillion-cdc-agent.tfvars with the following details:
- Project Id: This is the Google Cloud project ID.
- Region: Google Cloud region where the Compute Engine instance will be deployed.
- Zone: Google Cloud zone where the Compute Engine instance will be deployed.
- Network_name: Google Cloud network to attach to the Compute Engine instance.
- Instance_name: Name of the Compute Engine instance e.g. matillion-cdc-agent.
- Storage_bucket_name: Name of the Google Cloud Storage bucket where the agent will land the data.
- Organization_id: This is provided to you by the Matillion Data Loader client when setting up a new agent.
- Agent_id: This is provided to you by the Matillion Data Loader client when setting up a new agent.
- Platform_websocket_endpoint: This value must be set to wss://ws-<region>.matillion-cdc-prod.matillion.com:443/ws, where <region> is either eu or us depending on the Matillion Data Loader region you are building the pipeline in.
- Platform_key_secret_name: Name of the Platform Key Secret stored in the Google Secret Manager.
- Database_password_secret_name: Name of the source Database Password Secret stored in the Google Secret Manager.
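To illustrate how these details fit together, the sketch below builds the websocket endpoint from the region and writes a sample .tfvars file. The lowercase variable names are an assumption based on the list above and may differ from the names in the downloaded template, so check the template before using them. Every value is a hypothetical placeholder.

```shell
# Matillion Data Loader region: must be "eu" or "us".
MDL_REGION="eu"
case "$MDL_REGION" in
  eu|us) ;;
  *) echo "MDL_REGION must be eu or us" >&2; exit 1 ;;
esac
PLATFORM_WEBSOCKET_ENDPOINT="wss://ws-${MDL_REGION}.matillion-cdc-prod.matillion.com:443/ws"

# Write the variables file. Variable names are assumed, values are placeholders.
cat > matillion-cdc-agent.tfvars <<EOF
project_id                    = "my-gcp-project"
region                        = "europe-west1"
zone                          = "europe-west1-b"
network_name                  = "default"
instance_name                 = "matillion-cdc-agent"
storage_bucket_name           = "my-cdc-landing-bucket"
organization_id               = "replace-with-ID_ORGANIZATION"
agent_id                      = "replace-with-ID_AGENT"
platform_websocket_endpoint   = "${PLATFORM_WEBSOCKET_ENDPOINT}"
platform_key_secret_name      = "matillion-cdc-platform-key"
database_password_secret_name = "source-db-password"
EOF
```

Only the secret names go into this file. The secret values stay in Secret Manager.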
Open a terminal session in the directory where you copied the downloaded template files. You apply the Terraform .tfvars file from this terminal.
- Begin the Terraform deployment process. An example is provided below, but the exact steps may differ depending on how your company runs Terraform.
- Initialise Terraform.
terraform init -var-file=remote-state.tfvars
- Create and select the CDC workspace.
terraform workspace new cdc
terraform workspace select cdc
- Create CDC infrastructure.
terraform plan -var-file=matillion-cdc-agent.tfvars
terraform apply -var-file=matillion-cdc-agent.tfvars
- Once you run terraform apply with the required information in the .tfvars file, Terraform creates the resources in Google Cloud that the CDC agent needs.
- Once this process is finished, you can visit the Google Cloud console (VM instances). Refresh the page and you can see an instance has been created.
In Matillion Data Loader, your created CDC agent's status should display as Connected and offer the Add Pipeline button.
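The deployment and instance check described above could be wrapped in a small script along these lines. This is a sketch, not part of the template: the function names and the cdc.tfplan file name are hypothetical, the init line mirrors the command given in this guide, and the instance check assumes the gcloud CLI is installed.

```shell
# Hypothetical helper: run the Terraform steps in order, stopping at the first failure.
# Arguments: path to the .tfvars file
deploy_cdc_agent() {
  terraform init -var-file=remote-state.tfvars &&
  # Reuse the cdc workspace if it already exists, otherwise create it.
  { terraform workspace select cdc || terraform workspace new cdc; } &&
  # Save the plan, then apply exactly that plan.
  # Applying a saved plan does not ask for confirmation.
  terraform plan -var-file="$1" -out=cdc.tfplan &&
  terraform apply cdc.tfplan
}

# Hypothetical helper: print the name and status of the agent's VM instance.
# Arguments: instance_name
check_cdc_instance() {
  gcloud compute instances list --filter="name=$1" --format="value(name,status)"
}
```

Running `deploy_cdc_agent matillion-cdc-agent.tfvars` and then `check_cdc_instance matillion-cdc-agent` would deploy the agent and list the instance, assuming the instance name from the example above.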