PostgreSQL Connector

Overview

The CDC agent's PostgreSQL connector monitors and captures row-level changes within PostgreSQL schemas in a non-intrusive, performant manner. It does this by reading the raw output stream of the pgoutput logical decoding plugin. The connector produces a change event record for every insert, update, and delete that occurs in any table monitored by the CDC agent.
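
As an illustration of the raw stream the connector consumes, PostgreSQL's built-in functions can peek at pgoutput messages directly. The following is a minimal exploratory sketch, assuming a throwaway slot demo_slot and a publication demo_pub that you have created yourself; it is not how the connector itself is configured:

    -- Create a temporary logical replication slot that uses the pgoutput plugin
    SELECT pg_create_logical_replication_slot('demo_slot', 'pgoutput');

    -- Peek at the raw binary change messages for tables in publication 'demo_pub'
    SELECT lsn, xid, data
    FROM pg_logical_slot_peek_binary_changes(
           'demo_slot', NULL, NULL,
           'proto_version', '1',
           'publication_names', 'demo_pub');

    -- Drop the slot again so the server does not retain WAL on its behalf
    SELECT pg_drop_replication_slot('demo_slot');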


Versions

Matillion Data Loader CDC supports PostgreSQL versions 10, 11, 12, 13, and 14 (including minor versions). PostgreSQL versions 10 and above include the pgoutput plugin by default, so logical change capture is available natively.
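
To confirm that a server falls within this range, you can query its version directly:

    SHOW server_version;  -- expect 10.x through 14.x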


Database rules

The PostgreSQL database must be configured for CDC. For details, see Configuring PostgreSQL database.

The connection must be made to the primary server, and this server must be active.

Aurora Serverless DB clusters do not support CDC. Only provisioned Aurora DB clusters support CDC.
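
As a quick sanity check against these rules, the following queries verify that logical decoding (which pgoutput requires) is enabled and that you are connected to the active primary; the full configuration steps are in the guide linked above:

    -- Logical decoding requires wal_level to be set to 'logical'
    SHOW wal_level;

    -- Returns false on the primary server and true on a read replica
    SELECT pg_is_in_recovery();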


Connect to PostgreSQL

When selecting PostgreSQL as a source during pipeline creation, you will be required to provide the following information:

  • Pipeline Name: The name of your new pipeline. Must be unique.
  • Server address: The server address of your PostgreSQL database.
  • Port: The port number used to access your PostgreSQL database.
  • Database Name: The name of your PostgreSQL database.
  • Username: The username used to log in to the specified database.
  • Secret Provider: The provider of the secrets manager that contains your database password.
  • Secret Name: The name of the secret that corresponds to your database password.
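
Once you can connect with these details, a short check such as the following (an illustrative sketch, not part of the pipeline setup itself) confirms that you have reached the intended database as the intended user, and that the user carries the REPLICATION attribute that logical decoding consumers generally require:

    -- Confirm which database and login role the connection resolved to
    SELECT current_database(), current_user;

    -- Logical replication consumers generally need the REPLICATION attribute
    SELECT rolreplication FROM pg_roles WHERE rolname = current_user;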

Tables

  • All Tables: Tables available from your defined PostgreSQL source are displayed in Matillion CDC. These tables are defined and created by you in your PostgreSQL database, so they cannot be enumerated here.
  • Tables to extract and load: Selecting a table and moving it to this column includes it in the CDC pipeline.

How it works

PostgreSQL normally purges write-ahead log (WAL) segments after a period of time, which means a complete history of changes is not available to the connector. To overcome this, the connector initially performs a consistent snapshot of all schemas and tables being monitored, establishing a base state from which streaming can begin.

Note

Depending on the number of rows within your schema, this may take a while to complete.
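
Conceptually, the base state comes from reading every monitored table inside a single transaction with a fixed snapshot, so that all reads reflect one point in time. A simplified sketch of the idea, where my_schema.my_table stands in for a hypothetical monitored table (the connector's internal mechanics differ):

    BEGIN ISOLATION LEVEL REPEATABLE READ;
    -- Every query inside this transaction sees the same point-in-time view
    SELECT * FROM my_schema.my_table;
    COMMIT;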

If you wish to omit historic data from your CDC pipeline, the snapshot stage can be skipped by setting the advanced property snapshot.mode appropriately. See the PostgreSQL Advanced Settings page for further details.
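
As an illustration only, the property follows the convention of the underlying Debezium PostgreSQL connector, where a value such as never skips the initial snapshot; check the PostgreSQL Advanced Settings page for the values supported here:

    # Hypothetical advanced-property entry; verify the supported values first
    snapshot.mode = never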

Once the snapshot stage has completed, the connector will move to the streaming stage. All changes that have occurred since the snapshot started will be captured during the streaming process - no changes will be missed.

The streaming stage continues to monitor and consume changes as they occur within the database, and any change events produced are exported to your selected data lake in a consistent and predictable pattern. The connector maintains an acceptable time lag behind the source database; this lag can be monitored through the Matillion Data Loader UI.

The connector is tolerant of failures. As the connector reads changes and produces events, it records the WAL position for each event. If the connector stops for any reason (including communication failures, network problems, or crashes), upon restart, the connector continues reading the WAL where it last stopped.
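
This bookkeeping is also visible on the server side: each replication slot records how much WAL the server must retain for its consumer and the last position that consumer has acknowledged:

    -- restart_lsn: the earliest WAL the server keeps for this slot
    -- confirmed_flush_lsn: the last position the consumer has acknowledged
    SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots;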

If the connector stopped at any point during the snapshot stage, a new snapshot will be taken on restart of the pipeline.

Note

Schema changes are not supported at this time by the PostgreSQL connector.


Next steps

Next, you will need to configure and set up your PostgreSQL database. For detailed information on how to do this, check out the Configuring PostgreSQL database guide.