CDC Pipeline Overview
Overview
A pipeline in Matillion CDC is a complete description of your CDC workflow. It consists of four main parts:
- CDC Agent: The Matillion CDC agent runs within your cloud provider and orchestrates the CDC tasks.
- Source: Connection to your source database or service.
- Destination: Connection to your destination service, usually Amazon S3 or Azure Blob Storage.
- Pipeline Settings: Configuration of the overall pipeline such as the frequency at which it runs.
Create the pipeline in this order. You can mix and match agents, sources, and destinations to build a bespoke pipeline, so the specifics of configuring each part are covered in separate parts of the documentation.
- Read Create a CDC pipeline for details of how to create and configure a pipeline.
- Read MDL Pipeline UI for details of how to manage your pipelines in the Matillion Data Loader pipeline dashboard.
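As a rough illustration of the four parts and the order in which they are created, the following Python sketch tracks which part still needs to be configured. It is not part of Matillion CDC: the class `PipelineDraft`, its field names, and the example values are all hypothetical and exist only to make the ordering concrete.

```python
from dataclasses import dataclass
from typing import Optional

# Creation order described above: agent, source, destination, then settings.
CREATION_ORDER = ("agent", "source", "destination", "settings")


@dataclass
class PipelineDraft:
    """Hypothetical container for the four parts of a CDC pipeline."""

    agent: Optional[str] = None        # name of an installed agent
    source: Optional[str] = None       # label for the source connection
    destination: Optional[str] = None  # label for the storage destination
    settings: Optional[dict] = None    # e.g. pipeline name and snapshot choice

    def next_step(self) -> Optional[str]:
        """Return the first part that is still unset, or None when complete."""
        for part in CREATION_ORDER:
            if getattr(self, part) is None:
                return part
        return None


# Example values are placeholders, not real resources.
draft = PipelineDraft(agent="my-agent", source="orders-db")
print(draft.next_step())  # destination
```

Because agents, sources, and destinations are independent fields, the same agent name could appear in several drafts, which mirrors the way a defined agent can be reused across pipelines.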
Agent
Every CDC pipeline requires a Matillion CDC Agent to orchestrate the data loading tasks.
- Currently defined agents can be viewed from the Agents dashboard.
- Agents must be installed within the cloud service provider of your choosing before a pipeline can be created.
- Defined agents can be reused and selected by other pipelines you create.
- Agent status must be "Connected" for a pipeline to succeed.
- Agent installation can be technically involved. Read the agent installation documentation first, and consult your cloud administrator for help and permissions where required.
Source
Your source database is where you plan to pull data from.
- You require permission to access your database and configure it for CDC. Contact your database administrator as required.
- Matillion CDC requires connection details including host name, database name and login details to connect to the source database.
- Login passwords must be stored in a secrets manager (AWS or Azure), which should be set up in advance.
- Source databases must be configured for CDC before inclusion in a pipeline. Please consult the Sources category for more help.
- Some data sources allow you to specify JDBC connection parameters. The documentation for your source data model will provide information about supported JDBC connection settings.
- Your choice of source tables may limit your transformation options when moving data from storage to a data warehouse.
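The connection details listed above can be gathered ahead of time. The Python sketch below is one way to record them and check that nothing is missing; note that it stores the name of a secret rather than the password itself, in line with the secrets manager requirement. The class `SourceConnection`, its field names, and the example values are assumptions made for illustration and do not correspond to any Matillion API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceConnection:
    """Hypothetical record of the details needed to connect to a source database."""

    host: str
    database: str
    username: str
    password_secret_name: str  # name of the secret in your secrets manager, never the password
    jdbc_params: dict = field(default_factory=dict)  # optional, only if your source supports them

    def problems(self) -> list:
        """Return human-readable problems; an empty list means no required detail is blank."""
        issues = []
        for name in ("host", "database", "username", "password_secret_name"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        return issues


# Placeholder values only.
conn = SourceConnection(
    host="db.example.internal",
    database="sales",
    username="cdc_user",
    password_secret_name="cdc-source-password",
)
print(conn.problems())  # []
```

The check only confirms that each required detail has been filled in. It does not verify that the database is reachable or that it has been configured for CDC.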
Destination
The destinations for CDC pipelines are cloud storage containers that will receive the data.
- You will need to know details of your storage container including name, account name and password.
- Login passwords must be stored in a secrets manager (AWS or Azure), which should be set up in advance.
- Make sure you have access to your storage container.
- Please consult the Destination category for more help.
Settings
- Pipeline Name: Provide a unique name for the pipeline.
- Snapshotting: Turn full snapshotting off or on. For more detailed information on snapshotting options, see the Advanced Settings article for your chosen CDC source.
Managing pipelines
Once created, pipelines are managed through the Matillion Data Loader pipeline dashboard. From here, you can view the status of all pipelines, and you can start, stop, or delete any pipeline.
When deleting and recreating a CDC pipeline, you must clear out the files that the pipeline places in your cloud storage. If you don't, the new pipeline will recognise the existing offset.dat file and will therefore skip the snapshot phase.
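For an Amazon S3 destination, the clean-up could be scripted along the following lines. This is a minimal sketch, not a Matillion tool: it assumes that all of the pipeline's files, including offset.dat, sit under a single key prefix that you already know, and that `s3_client` behaves like a boto3 S3 client. The function name, its parameters, and the bucket and prefix shown in the usage note are hypothetical. It defaults to a dry run so that you can review the keys before anything is deleted.

```python
def clear_pipeline_files(s3_client, bucket, prefix, dry_run=True):
    """List every object under prefix and, unless dry_run is set, delete them.

    Returns the list of keys that were found.
    """
    if not prefix:
        # Refuse an empty prefix, which would match every object in the bucket.
        raise ValueError("prefix must not be empty")

    # Collect all keys first so that deletion does not interleave with listing.
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    if not dry_run:
        # delete_objects accepts at most 1000 keys per request.
        for start in range(0, len(keys), 1000):
            batch = keys[start:start + 1000]
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": key} for key in batch]},
            )
    return keys
```

With boto3 installed and credentials configured, a call such as `clear_pipeline_files(boto3.client("s3"), "my-bucket", "my-pipeline/")` would list the keys, and adding `dry_run=False` would delete them. Deleted objects cannot be recovered unless bucket versioning is enabled, so check the listed keys carefully first. Azure Blob Storage destinations would need an equivalent approach using Azure tooling.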