PostgreSQL Connector
Overview
The CDC agent's PostgreSQL connector monitors and captures row-level changes within PostgreSQL schemas in a non-intrusive, performant manner. It does this by reading the raw output stream of the pgoutput logical decoding plugin. The connector produces a change event record for every insert, update, and delete that occurs in any table monitored by the CDC agent.
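The connector consumes changes through a logical replication slot that uses pgoutput. You typically won't need to create a slot yourself, but as an illustration of the mechanism, a pgoutput-based slot can be created and inspected by hand (the slot name cdc_example_slot is a placeholder):

```sql
-- Illustrative only: create a logical replication slot that decodes
-- changes with pgoutput (requires the REPLICATION privilege and
-- wal_level = logical).
SELECT * FROM pg_create_logical_replication_slot('cdc_example_slot', 'pgoutput');

-- List existing slots and the decoding plugin each one uses.
SELECT slot_name, plugin, active FROM pg_replication_slots;

-- Drop the example slot when done; unused slots retain WAL indefinitely.
SELECT pg_drop_replication_slot('cdc_example_slot');
```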
Versions
Matillion Data Loader CDC supports PostgreSQL versions 10, 11, 12, 13, and 14 (including minor versions). PostgreSQL versions 10 and above include the pgoutput plugin by default, allowing change logs to be captured natively.
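To confirm that your server falls within the supported range, query its version from any SQL client:

```sql
-- Full version string, e.g. "PostgreSQL 13.7 on x86_64-pc-linux-gnu ...".
SELECT version();

-- Just the version number.
SHOW server_version;
```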
Database rules
The PostgreSQL database must be configured for CDC. For details, see Configuring PostgreSQL database.
The connection must be made to the primary server, and this server must be active.
Aurora Serverless DB clusters do not support CDC. Only provisioned Aurora DB clusters support CDC.
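As a quick sanity check against these rules (the full setup is covered in Configuring PostgreSQL database), you can verify from a SQL client that logical decoding is enabled and that you are connected to an active primary:

```sql
-- Must return 'logical' for CDC; set wal_level = logical in postgresql.conf
-- and restart the server if it does not.
SHOW wal_level;

-- Returns false on an active primary, true on a standby/replica.
SELECT pg_is_in_recovery();
```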
Connect to PostgreSQL
When selecting PostgreSQL as a source during pipeline creation, you will be required to provide the following information:
| Property | Description |
|---|---|
| Pipeline Name | The name of your new pipeline. Must be unique. |
| Server address | The server address of your PostgreSQL database. |
| Port | The port number used to access your PostgreSQL database. |
| Database Name | The name of your PostgreSQL database. |
| Username | The username used to log in to the specified database. A sketch of creating a suitable user follows this table. |
| Secret Provider | Choose the provider for your secrets manager that contains your database password. |
| Secret Name | Enter the name of the secret that corresponds to your database password. |
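The user you connect with must be able to use logical replication and read the monitored tables. A minimal sketch is shown below; the role name, password, and schema are placeholders, and the exact privileges your environment requires may differ:

```sql
-- Placeholder role name and password; adjust to your environment.
CREATE USER cdc_user WITH REPLICATION PASSWORD 'change-me';

-- Allow the user to read the tables you intend to capture.
GRANT USAGE ON SCHEMA public TO cdc_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO cdc_user;
```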
Tables
| Property | Description |
|---|---|
| All Tables | Tables available from your defined PostgreSQL source are displayed here. These tables are defined and created by the user in their PostgreSQL database, so they can't be listed in this documentation; a query to preview them follows this table. |
| Tables to extract and load | Selecting and moving a table to this column includes it in the CDC pipeline. |
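To preview which user tables exist in your source database, and should therefore appear under All Tables, you can run a query along these lines:

```sql
-- User-defined base tables, excluding system catalogs and views.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;
```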
How it works
PostgreSQL normally purges write-ahead log (WAL) segments after a period of time, which means a complete history of changes is not available to the connector. To overcome this, the connector initially performs a consistent snapshot of all schemas and tables being monitored, establishing a base state from which streaming can begin.
Depending on the number of rows within your schema, this may take a while to complete.
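Because snapshot duration scales with table size, you can gauge what to expect by checking approximate row counts for the tables you plan to monitor (n_live_tup is an estimate maintained by the statistics collector, not an exact count):

```sql
-- Approximate live row counts per user table, largest first.
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```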
If required, the snapshot stage can be skipped if you wish to omit historic data from your CDC pipeline. This involves setting the advanced property snapshot.mode to the appropriate value. See the PostgreSQL Advanced Settings page for further details.
Once the snapshot stage has completed, the connector moves to the streaming stage. All changes that have occurred since the snapshot started are captured during streaming, so no changes are missed.
The streaming stage continues to monitor and consume changes as they occur within the database, and any change events produced are exported to your selected data lake in a consistent, predictable pattern. The connector maintains an acceptable time lag behind the source database; this lag can be monitored through the Matillion Data Loader UI.
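While the Data Loader UI is the primary place to monitor lag, you can also approximate it on the database side by comparing the server's current WAL position with the position the connector's replication slot has confirmed (PostgreSQL 10+ function names shown):

```sql
-- Bytes of WAL between the current write position and what each slot
-- has confirmed as flushed; a rough proxy for connector lag.
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
```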
The connector is tolerant of failures. As it reads changes and produces events, it records the WAL position of each event. If the connector stops for any reason (including communication failures, network problems, or crashes), it resumes reading the WAL from where it last stopped once it restarts.
If the connector stopped at any point during the snapshot stage, a new snapshot will be taken on restart of the pipeline.
Schema changes are not supported at this time by the PostgreSQL connector.
Next steps
You will now need to set up and configure your PostgreSQL database. For detailed information on how to do this, see the Configuring PostgreSQL database guide.