-
DarkLight
Configuring MS SQL Database
-
DarkLight
Overview
For a Matillion CDC agent to capture change events from Microsoft SQL Server tables, a Microsoft SQL Server administrator with the necessary privileges must first run a query to enable CDC on the database. The administrator must then enable CDC for each table that you want the CDC agent to capture. These steps are based on advice that can be found in the Microsoft documentation, and we recommend reading that documentation thoroughly. This article is intended to put that advice into the context of Matillion Data Loader's CDC feature.
CDC captures all INSERT
, UPDATE
, and DELETE
operations committed to the tables for which CDC is enabled once it is deployed. The connector can then record and broadcast these events.
Prerequisites
Before the Matillion CDC agent can be used to load data from Microsoft SQL Server, the database must be configured. To perform these steps, the following is required:
- A
sysadmin
user must enable change data capture for the Microsoft SQL Server database (or Azure SQL Managed Instance). - In Azure SQL Database, the
db_owner
role is required to enable change data capture. - The Microsoft SQL Server Agent must be running.
If the Agent isn't running already, start Microsoft SQL Server Agent. Please read Start, Stop, or Pause the SQL Server Agent Service for more information. If it's not running, you will see an error similar to the following in server logs:
2017-01-08 15:40:24,596 @ -ERROR cached5
com.webaction.source.tm.MSSqlTransactionManager.getStartPosition
(MSSqlTransactionManager.java:389) 2522 :
Could not position at EOF, its equivalent LSN is NULL
Enabling CDC on the SQL Server database
As noted in Prerequisites, a privileged user must first activate the database for change data capture before a capture instance for specific tables can be established.
To determine if a database already has CDC enabled, run the following command:
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = '<DATABASE_NAME>'
If the value for is_cdc_enabled
is 1
, then CDC is enabled for the database, otherwise it is not enabled.
To enable CDC on your database, run the following stored procedure:
USE <database name>
EXEC sys.sp_cdc_enable_db
Enabling the database for CDC will trigger the creation of several database-specific resources: a schema with the name cdc
, a cdc user, metadata tables, and other system objects.
Creating a CDC user role
-
Create a SQL Server user to be used by the Microsoft SQL Server Reader. The user should have
sysadmin
fixed server role for the SQL Server and must use the SQL Server authentication mode, which must be enabled in SQL Server. -
Grant the Microsoft SQL Server Reader user the
db_owner
role. This must be done for each database that is to be read. Use the following commands, substituting your own details:
USE <database name>
EXEC sp_addrolemember @rolename=db_owner, @membername=<user name>
For example, to enable CDC on the database mydb
, create a user cdcuser
, and give that user the db_owner
role on mydb
:
USE mydb
EXEC sys.sp_cdc_enable_db
CREATE LOGIN cdcuser WITH PASSWORD = 'passwd'
CREATE USER cdcuser FOR LOGIN cdc
EXEC sp_addrolemember @rolename=db_owner, @membername=cdcuser
Enabling CDC on a SQL Server table
Once your database has been enabled for CDC, you must then enable CDC for all tables that you wish to capture data from. A capture instance for each source table must be created. To do so, the creating member must have the db_owner
fixed database role. The stored procedure sys.sp_cdc_enable_table
can then be used as described below.
- Use the following command to retrieve all the source tables that have been enabled for CDC:
SELECT name, is_tracked_by_cdc, SCHEMA_NAME(schema_id) AS 'schema_name'
FROM sys.tables
WHERE is_tracked_by_cdc = 1
- Use the following command to enable CDC on source tables you wish to capture:
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@filegroup_name = N'MyDB_CT',
@supports_net_changes = 1
GO
The following fields are used in the above script:
Field name | Description |
---|---|
@source_name | Specifies the name of the table that you want to capture. |
@role_name | Specifies a role to which you can add users to whom you want to grant SELECT permission on the captured columns of the source table. sysadmin or db_owner roles also have access to the specified change tables. Set the value of @role_name to NULL to allow only members in the sysadmin or db_owner roles to have full access to captured information. |
@filegroup_name | Specifies the filegroup where SQL Server places the change table for the captured table. The named filegroup must already exist. It's best not to locate change tables in the same filegroup that you use for source tables. |
Verify that the user has access to the CDC table
To get CDC configuration information from a database or table, an SQL Server administrator can use a system stored procedure.
SQL Server administrator with the necessary privileges must do the following:
- All the captured columns in the capture instance have SELECT permission. Members of the db_owner database role have access to all the defined capture instances' information.
- You have membership in any gating roles that are defined for the table information that the query includes.
- Run the
sys.sp_cdc_help_change_data_capture
stored procedure to query the table.
Queries shouldn't return empty results.
The following example runs the stored procedure sys.sp_cdc_help_change_data_capture
on the database MyDB
to query a table for CDC configuration information:
USE MyDB;
EXEC sys.sp_cdc_help_change_data_capture
The query returns configuration information for each CDC-enabled table in the database that includes change data that the caller is authorized to see. If the result is empty, make sure the user has access to the capture instance as well as the CDC tables.
SQL Server Always On
The SQL Server connector can capture changes from an Always On read-only replica.
Prerequisites:
- Change data capture is configured and enabled on the primary node. SQL Server doesn't support CDC directly on replicas.
- The read-only replica must be configured with Readable Secondary as either
Read-intent only
orYes
. Read Microsoft's documentation for more information, including detailed steps. - Ensure the Read Replica toggle is on when creating your Pipeline.
Schema drift
Schema drift occurs when your data source changes its metadata, by adding, removing, or changing fields, columns, and types. When not properly handled, schema drift will cause a difference between the data source and your CDC pipeline configuration, resulting in the pipeline failing or processing incorrect information.
Due to SQL Server limitations, Matillion Data Loader is not able to automatically handle schema drift on source tables in Microsoft SQL Server databases. A SQL Server connector with CDC enabled propagates column type changes to your destination, but it does not propagate any other schema changes, such as renamed columns, new tables, or new columns, meaning you may have to reconfigure your pipeline in response to changes in the source metadata.
Limitations
Change Data Capture has the following limitations:
- To build a change capture instance in SQL Server, the base object must be a table. As a result, SQL Server doesn't enable collecting changes from indexed views (also known as materialized views).
- If the Microsoft SQL Server Agent service isn't running, Microsoft SQL Server CDC (Change Data Capture) capture jobs won't execute.
- CDC doesn't support the values for computed columns even if the computed column is defined as persisted. Computed columns that are included in a capture instance always have a value of NULL.
Read the Microsoft documentation for additional limitations.