SSIS Performance Enhancement- Remove Sort Column Additional Steps

Writing again after a long time, today while working on a SSIS package i came across a situation where i have a Type2 Dimension and i was getting duplicates rows to the table the only difference was BUsiness Key column was behaving as a case sensitive column i.e. if any row on source was found to change in case the code was treating this business Key as a new row and inserting whereas i needed it to ignore case.

Now here is the reason why it was happening and the resolution,

Cause: I was using the best practices for my Merge Join on the package wherein i have removed the SORT column and to achieve the same i was ordering it on my SOurce Trasnform and marking the ISSorted property and sorting the dataset from source. For details on what i mean read this

https://technet.microsoft.com/en-us/library/ms141031%28v=sql.105%29.aspx

My Data Flow

DataFlow

My SQL server is case insensitive, and SSIS takes by Default windows Sorting, so i was getting the case sensitive row as a new row as the key column changed from ABC to abc,

So to get this resolved what we have to do is set the, comparison Flag property to IGnore case and this would lead to ignore of case for the Key column, as per my DFT above this would have to done on both the source transform, such that setting is identical on both of them

Comparison_Flag

The link helped me resolve it:

http://microsoft-ssis.blogspot.com/2011/03/sorting-in-sql-vs-sorting-in-ssis.html

Advertisements

Consume Webservice via SSIS Script Component

I have always believed SSIS Script component is a brilliant task and it provides for accessing data from any type of source. I have used it as a SOURCE Component very recently to consume data from Webservice, and it required a bit of a tweak to overcome the limitations on this Script component. viz:

Script Component doesn’t support reading data from a Config file, you can have a look at the problem i had published here

http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/75111c36-3d92-4440-a7b3-5cfe8a648403

The scenario i will explain below, would consume a WebService which is a WCF service targeted at Framework 3.5, and use HTTPBINDING.

So how to we proceed step wise as:

1. Create a DFT and drop a DFT task.

2. Inside the DFT place the Script Component, select the SOURCE

Image

3) Now open the Script component to edit the Script, look for Add Service Reference on the Reference section if you dont see it change the project to target Framework 3.5, once done you would be able to see the AddSerivceRefrence

Image

4. Adding the Service Reference would lead to adding a file a App.config on the project, which would have the bindings and Contract settings for the web service, Genrally you would use this and from code call the settings stored on this file to access the Webservices.

Image

THIS IS THE PROBLEM WE CANT CALL THE WEBSERVICE NORMAL WAY AS SSIS COMPONENT TASK DOESNT RECOGNISE THE APP.CONFIG file,

Resolution is to define the bindings at runtime so use a code like below, wherein you define the BINDINGS via code rather then depending on the App.config file.

You would need to use the code below for BASICHTTPBINDING

BasicHttpBinding binding = new BasicHttpBinding(BasicHttpSecurityMode.TransportWithMessageCredential);
binding.Security.Message.ClientCredentialType = BasicHttpMessageCredentialType.UserName;
binding.MaxBufferSize = 2147483647;
binding.MaxBufferPoolSize = 2147483647;
binding.MaxReceivedMessageSize = 2147483647;

// Replace <machinename> with the name of the development server.
EndpointAddress remoteAddress = new EndpointAddress(“<service Link>”);

ReportPortClient client = new ReportPortClient(binding, remoteAddress);
client.ClientCredentials.UserName.UserName = “USER NAME”;
client.ClientCredentials.UserName.Password = “Password”;

BindingElementCollection elements = client.Endpoint.Binding.CreateBindingElements();
elements.Find<SecurityBindingElement>().IncludeTimestamp = false;
client.Endpoint.Binding = new CustomBinding(elements);

// THis portion is specific to my call of web service

Execute_ReportType request = new Execute_ReportType();
response = client.Execute_Report(request);

For NETTCPBINDING and defining the bindings from code you could have a look at:

http://www.bibits.co/post/2012/08/15/Consuming-Web-Services-in-SSIS-Script-Component.aspx

Reach out to me at my Email if you some clarifications and require assistance in consuming a Webservice via the Script Component task,

 

PS: Ensure for the SSIS Script component Source to work properly the SSIS package executing account must have read/ write permissions on the TEMP(generally- C:\Windows\Temp) directory, on the box the package would execute

 

SSIS Delta Load – CDC Approach SQL Server 2008 R2

I have spoken about CDC in my last post will speak about how to impliment CDC in conjuction with SSIS. This approrach might not be the best approach and you would need to evaluate properly in case you decide to implimenting the same

MS describes 2 ways to impliment the same,

Specified Interval Sample: http://msftisprodsamples.codeplex.com/wikipage?title=SS2008!Change%20Data%20Capture%20for%20Specified%20Interval%20Package%20Sample&ProjectName=msftisprodsamples

Change Data Capture since Last Request Package Sample

http://msftisprodsamples.codeplex.com/wikipage?title=SS2008%21Change%20Data%20Capture%20since%20Last%20Request%20Package%20Sample

i have used  the change data capture since last request package appraoch and cusomised its use, will try to define and simplify the same here

The package at high level looks like

The basic approach is:

  1. There are 4 containers, high level as defined below:
    1. Initialization Container: This step initializes all the connection Managers on the package and logs the start of the package to SQL server via the OnPrexecute Event Handler
    2. LSN assign Container: This step calculates if there are any data changes to be processed for the present run.
    3. Child package Container: This step calls all the child packages required for loading the data.
    4. End Package Container: This logs the end of package and sends Email, this will replaced with the SEND MAIL task provided in SSIS, and based on requirements the email will be sent to ADMIN(Need to figure out whom should Emails be sent to? )

Points 1 and 4 above are specific to your own implimentation, I would shed some light on the 2 and 3 as part of this post

THe basic approach taken requires us to introduce a new table called as LOAD_LOG onto the Destination DB, this table will capture the Start LSN, END LSN, Start Time, End TIme , Load Status.i.e. it defines values S- Sucess F: Failure N:No records Found

The approach above is a incremental data load approach it requires the initial data load to be completed in advance, and ETL_ Load_Log table to be populated manually once with the appropriate LSN values which can be sourced from cdc.lsn_time_mappingtable on your DB

Obtain the Start LSN from the Load_LOG table,  this would be the same as the latest row on this table which has load run= S and picked from EndLsn Column sample query may look like

The query tries to get the lastLSN which was succefully loaded which can then act as a start point for the next load

The next load LSN (both start and END) can be odtained via using the fucntion

StartLSN: sys.fn_Cdc_Incriment: this function will incriment the last sucssfull loaded LSN to the next LSN

EndLSN: sys.fn_cdc_get_max_Lsn(), this will give the END LSN for which load needs to run,

Keep a note the LSN values are binary values and for passing in the package we need to conver them to string and then viceversa for this we need to use functions like sys.fn_varbintoHex() and sys.fn_cdc_hexstrtoBin

An example of how to find out the new points once we know the last Succefull LSN load would look like something in the image below:

The step above sets the varaible dataready which can be used to figure out if the data is ready to be loaded or not, as above compare the LSN’s obtained as stored on the cdec.lsn_time_mapping table if the lsn <= startLSN and >= EndLSN exists that means data is ready to be loaded and we can go forward with the load else the no data for loading exisits

If the data is ready the next step to insert a record onto the load_log table with the status F, which would be updated post all the data load completion with S, if at all the things run fine

Once inserte the next conatiner would be used to call all the child packages whcih would have the code for all the tables required during load

Hope this articale makes sense the next article i would deal with the drawbacks of the appraoch i ahve taken and precautions which needs to be taken for this approach to work correctly

I would like to know the feedback and  questions for the above post, do write in what is good what is bad which has pointers for improvements in future posts.