CDC Feature in Microsoft SQL Server

Writing my first BLOG ever, couldn’t think of anything better then the technical stuffs which i do with my Employer. Today would pin down my thoughts for design approach i took for one of the projects where we wanted to capture DELTA loads using the newbie(not so new now) in SQL SERVER 2008 R2. Let me introduce first to some concepts

What is CDC: CDC aka CHANGE DATA CAPTURE is the feature introduced via sql server 2008, CDC is available only on the Enterprise, Developer, and Evaluation editions of SQL Server 2008

What does CDC do: CDC records insert, update, and delete activity that is applied to a SQL Server table making them available as easily consumable relational format tables. Any changes that are made to the specified tables is captured, and the modified rows are stored in change tables that mirror the column structure of the tracked source tables. Post that CDC/ SQL server provides Table-valued functions for systematic access to the change data by consumers.

Explaining CDC: If we try to correlate how Microsoft brought all these things together, and try to correlate, remember the good old concepts of Triggers and writing of log files, MS brought both of them together to give us something which is called as CDC, you might think how are these related to understand its better i would try to correlate and explain things such that they could be understood better, Warning: This is just for better understanding and SQL server might not exactly do all the steps)

As we know any changes made to the Database are written into a forward only .ldf file by the SQL server lazy writer think of it as a page in a book where someone continues writing and cant go back, so whatever changes or things he wants to write he will continue writing, to recognize one will place page numbers or row numbers in between so that things can be tracked, something called as LSN(Logical Sequence Number) for SQL server.

Now once the CDC is implemented the changes records are captured via the triggers magic table(Inserted, deleted) and rows are captured with the LSN from the ldf files and inserted as data to the new tables, this is at high level how CDC works.

ENOUGH OF THEORY SOME CODE NOW: First COde to enable CDC

USE <DBNAME>
GO

–Create FileGroup to Store the CDC TABLES
ALTER DATABASE ADD FILEGROUP fgCDC

–Associate File .ndf for the FileGroup
ALTER DATABASE
ADD FILE (NAME = ‘CDC.ndf’,
FILENAME = ‘C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\NDF\CDC.ndf’,
SIZE = 1024,
MAXSIZE = UNLIMITED ,
FILEGROWTH = 10244)
TO FILEGROUP fgCDC

DECLARE @Error VARCHAR(MAX)
BEGIN TRY
BEGIN TRAN
select name, database_id,is_cdc_enabled
from sys.databases

– Enable CDC
EXEC sys.sp_cdc_enable_db
–EXEC sys.sp_cdc_disable_db

–Check Enabled or Not on DB
select name, database_id,is_cdc_enabled
from sys.databases

– Enable a Table Specifying Filegroup Option
– ===================================================
EXEC sys.sp_cdc_enable_table
@source_schema = N’dbo’,
@source_name = N’Test’,
@role_name = NULL,
@filegroup_name = N’fgCDC’,
@supports_net_changes = 1

END TRY

BEGIN CATCH
SELECT @Error = ERROR_MESSAGE()+ ERROR_LINE()
RAISERROR(16,@Error,1)
ROLLBACK TRAN
RETURN
END CATCH

COMMIT TRAN

Steps Involved

a) Create a file group to store the CDC enabled tables, this file group would store the secondary files defined to store the CDC tables.

b) Associate a secondary file(.ndf) to the filegroup, note that the .ndf should reside on a separate directory other then the primary .mdf file.

c) Enable CDC at database instance level : This is done by using the stored procedure sys.sp_cdc_enable_db.

d) Enable CDC for Table level instance : Source tables can be identified as tracked tables by using the stored procedure sys.sp_cdc_enable_table. When a table is enabled for change data capture, an associated capture instance is created to support the dissemination of the change data in the source table. The capture instance consists of a change table and up to two query functions

All objects that are associated with a capture instance are created in the change data capture schema of the enabled database. By default, the name is <schema name_table name> of the source table. Its associated change table is named by appending _CT to the capture instance name. The function that is used to query for all changes is named by prepending fn_cdc_get_all_changes_ to the capture instance name. If the capture instance is configured to support net changes, the net_changes query function is also created and named by prepending fn_cdc_get_net_changes_ to the capture instance name.

CDC has to be enabled for each Database and then tables individually on the server. once enabled SQL server creates several tables/ functions which are used to help CDC work, most important of them are, assuming we enabled CDC for a TABLEA

a) Table:

CDC.lsn_time_Mapping: this table is database specific and captures all the start times and end times of trassactions with the respective LSN(Logical Sequence Numbers)

CDC.dbo_TABLEA:THis is the table with the same structure as the table you want to capture, with some additional columns like: StartLSN, EndLSN, $UpdateMask(this stores values in binary format and help identifies which column has changed), $Operation(Specifies the type of operation row corrsoponds to, 1: Delete, 2: Inserts, 3: Updated values before, 4: Updated value after)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: