The CDC agent's Oracle connector monitors and records row-level changes in your Oracle database. The connector ingests change events using Oracle's native LogMiner database package. Oracle LogMiner is part of the Oracle Database utilities and provides a well-defined, easy-to-use, and comprehensive interface for querying online and archived redo log files.
- Matillion CDC supports Oracle versions 12c EE and above (including minor versions).
- Your Oracle database must be configured for CDC. For details, see Configuring Oracle database.
Connect to Oracle
When selecting Oracle as a source during pipeline creation, you must provide the following information:

|Property|Description|
|---|---|
|Pipeline Name|The name of your new pipeline. Must be unique.|
|Server address|The server address of your Oracle database.|
|Port|The port number used to access your Oracle database.|
|Container Database Name|The name of your Oracle container database (CDB) installation.|
|Pluggable Database Name|The name of the pluggable database (PDB) contained within the CDB named above. Leave blank if your CDB does not contain a PDB.|
|Username|The username used to log in to the specified database.|
|Secret Provider|The provider for the secrets manager that contains your database password.|
|Secret Name|The name of the secret that corresponds to your database password.|
|All Tables|The tables available from your defined Oracle source, as displayed in Matillion CDC. These tables are defined and created by the user in their Oracle database and so cannot be described here.|
|Tables to extract and load|Selecting and moving a table to this column includes it in the CDC pipeline.|
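To illustrate how the connection properties above fit together, the sketch below builds an EZConnect-style connection descriptor from a server address, port, CDB name, and optional PDB name. All values shown are invented for illustration; they are not defaults of the product:

```python
def oracle_dsn(host, port, cdb, pdb=None):
    """Build an EZConnect-style descriptor: connect to the PDB's service
    if one is given, otherwise to the root container (CDB)."""
    service = pdb if pdb else cdb
    return f"{host}:{port}/{service}"

# Hypothetical values for illustration only.
print(oracle_dsn("db.example.com", 1521, "ORCLCDB", "ORCLPDB1"))
# db.example.com:1521/ORCLPDB1
print(oracle_dsn("db.example.com", 1521, "ORCLCDB"))
# db.example.com:1521/ORCLCDB
```

This mirrors the form's behavior: when the Pluggable Database Name is left blank, the connector targets the CDB directly.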
How it works
Most Oracle servers are not configured to retain the complete history of changes in their redo logs. The connector therefore takes a consistent snapshot of the database before moving on to the streaming stage, ensuring that the history of your schema and tables is captured before streaming begins.
Depending on the number of rows within your schema, this may take a while to complete.
If required, the snapshot stage can be skipped if you wish to omit historic data from your CDC pipeline. This involves setting an advanced property, `snapshot.mode`, to the appropriate value. See the Oracle Advanced Settings page for further details.
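For example, a pipeline that should capture only new changes might set the property as shown below. The property name comes from this page; the value shown is a common setting for Debezium-based connectors and should be confirmed against the Oracle Advanced Settings page:

```
snapshot.mode=schema_only
```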
The snapshotting process performs the following actions against your CDC-monitored tables:
- Obtain a `ROW SHARE MODE` lock on each of the monitored tables to ensure that no structural changes can occur.
- Read the current system change number (SCN) position in the server's redo log.
- Capture the structure of all relevant tables.
- Release the locks obtained earlier (these locks are only held for a short period of time).
- Generate change events for each row (as valid at the SCN captured earlier).
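The ordering of the steps above can be sketched as follows. This is an illustrative simulation, not the agent's real code; every function name is invented. The key point it shows is that the locks are held only long enough to read the SCN and table structure, while row events are generated afterwards as of that SCN:

```python
def run_snapshot(tables, read_scn, read_structure, emit_row_event, rows_at):
    """Illustrative ordering of the snapshot stage (hypothetical API)."""
    locks = [f"ROW SHARE MODE lock on {t}" for t in tables]  # 1. lock each monitored table
    scn = read_scn()                                         # 2. read current SCN position
    schema = {t: read_structure(t) for t in tables}          # 3. capture table structure
    locks.clear()                                            # 4. release locks early
    for t in tables:                                         # 5. emit row events as of the SCN
        for row in rows_at(t, scn):
            emit_row_event(t, row, scn)
    return scn, schema

# Simulated run with stand-in callbacks.
events = []
scn, schema = run_snapshot(
    tables=["customers"],
    read_scn=lambda: 42,
    read_structure=lambda t: ["id"],
    emit_row_event=lambda t, row, s: events.append((t, row, s)),
    rows_at=lambda t, s: [{"id": 1}],
)
print(scn, events)
# 42 [('customers', {'id': 1}, 42)]
```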
Once the snapshot stage has completed, the connector moves to the streaming stage. All changes that have occurred since the snapshot started are captured during the streaming process; no changes are missed.
The streaming stage continues to monitor and consume changes as they occur in the database, and any change events produced are exported to your selected data lake in a consistent and predictable pattern. The connector maintains an acceptable time lag behind the source database; this lag can be monitored through the Matillion Data Loader UI.
- The connector is tolerant of failures. As the connector reads changes and produces events, it records the SCN position for each successful event captured.
- If the connector stops for any reason (including communication failures, network problems, or crashes), on restart it continues streaming from where it left off.
- If the connector stopped at any point during the snapshot stage, a new snapshot will be taken on restart of the pipeline.
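The fault tolerance described above relies on durably recording the SCN of each successfully delivered event, so a restart can resume from the last committed position. The minimal sketch below simulates this with invented names and an in-memory offset store (the real agent persists offsets durably):

```python
class StreamOffsets:
    """Illustrative offset store: remembers the SCN of the last event
    that was successfully delivered, so streaming can resume after a crash."""
    def __init__(self):
        self.last_scn = None

    def record(self, scn):
        self.last_scn = scn  # persisted durably in the real agent


def stream(changes, offsets, deliver):
    """Consume (scn, event) pairs, skipping anything already delivered."""
    for scn, event in changes:
        if offsets.last_scn is not None and scn <= offsets.last_scn:
            continue  # delivered before the restart; do not duplicate
        deliver(event)
        offsets.record(scn)  # commit position only after successful delivery


# Simulate a crash after two events, then a restart over the full change log.
delivered = []
offsets = StreamOffsets()
changes = [(10, "insert"), (11, "update"), (12, "delete")]
stream(changes[:2], offsets, delivered.append)  # "crash" mid-stream
stream(changes, offsets, delivered.append)      # restart replays the log
print(delivered)
# ['insert', 'update', 'delete']
```

No event is missed and none is duplicated, which is the behavior the bullet points above describe.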