Pages

Showing posts with label SSIS. Show all posts
Showing posts with label SSIS. Show all posts

Monday 5 May 2014

SSIS Fast Load Oracle Destination Step by Step :

SSIS Fast Load Oracle Destination Step by Step : 

To fast load the data into Oracle destination I am using here Oracle Attunity. Attunity help us to fast load the data. Here I am showing you step by step to load the data fastly for this I am using DB2 source.

Step 1: Take Dataflow Task for perform ETL operation.




Step 2 : Take OLEDB Source and create a connection which is your source database and choose SQL Command to write the script for fetch only those field which required.


Step 3 : Click on Columns tab.


Step 4 : Take Data Conversion Transformation if there is need in my scenario I need Data Conversion transformation to change some data types and their size.


Step 5: Here I have changed some columns data types according to destination columns data types.


Step 6: Take Oracle Attunity Destination and configure the connection string.
TNS Service Name: Is your TNS for Destination server
User Name: Server User name
Password: Server Password


Then Click on Test Connection if Test connection succeeded then press OK and again OK.

Step 7: Click on Name Of The Table Or The View Dropdown to select the correct table.


Step 8: Click on Mapping tab to map the columns and Click on OK.


Step 9: Click on RUN to Execute the Package.






It has taken only 0.22 Seconds to fetch 87,840 records.

Saturday 9 March 2013

Difference Between Script Task and Script Component

Script Task VS Script Component :

While both the objects provide scripting environments using the VSTA engine, yet there are inherent differences in their purpose and functions. This is due to their design architecture and the placement in development environment.

Ø  The main difference of Script Task and Script Component is the way the script is executed by both the objects. The Script Task executes the script only once for each execution instance of the task, while the Script Component generally executes the script for each row, as it applies the custom transformation on each of the data row reads.

Ø  The Script Task is available in the Control Flow task tab, while the Script Component works in the pipeline or the Data Flow Task.

Ø  You can choose Script Component as a Source, a transformation, or a destination. The choice of function and the metadata and properties you configure in the editor results in creation of different members of the base classes in the auto-generated code. The Script Task does not provide any such facility, as the code generated is always the same.

Ø  The Dts property works with only the Script Task and it is not available in the Script Component.

Ø  The Script Task supports breakpoints while Script Component does not support breakpoints. 

Monday 4 March 2013

Nonblocking, Partial Blocking and Blocking Transformations In SSIS


Classifying Data Flow Transformations on performance considerations:

The three classifications of Data Flow transformations that are based mainly on
performance considerations.

Nonblocking Synchronous Row-Based Transformations:

These transformations work on a row-by-row basis and modify the data by changing
the data in columns, adding new columns, or removing columns from the data rows,
but these components do not add new rows to the data flow. The rows that arrive at the
input are those that leave at the output of these transformations. These transformations
have synchronous outputs and make data rows available to the downstream components
straightaway. In fact, these transformations do not cause data to move from one buffer
to another; instead, they traverse over the data buffer and make the changes to the data
columns. So these transformations are efficient and lightweight from the process point
of view. The transformations that readily pass the data to the downstream components are classified as Row Transformations.

The following Row Transformations are examples of Nonblocking synchronous row-based transformations:

Ø  Audit transformation
Ø  Character Map transformation
Ø  Conditional Split transformation
Ø  Copy Column transformation
Ø  Data Conversion transformation
Ø  Derived Column transformation
Ø  Export Column transformation
Ø  Import Column transformation
Ø  Lookup transformation
Ø  Multicast transformation
Ø  OLE DB Command transformation
Ø  Percentage Sampling transformation
Ø  Row Count transformation
Ø  Row Sampling transformation
Ø  Script Component transformation configured as Nonblocking Synchronous
Row–based transformation
Ø  Slowly Changing Dimension transformation

Partially Blocking Asynchronous Row-Set-Based Transformations:

Integration Services provides some transformations that essentially add new rows in the
data flow and hold on to the data buffers before they can perform the transformation.
The nature of such transformations is defined as Partially Blocking because these
transformations hold on to data for a while before they start releasing buffers. As
these transformations add new rows in the data flow, they are asynchronous in
nature. For example, a Merge transformation combines two sorted data sets that may
be the same data, but it essentially adds new rows in the data flow. The time taken
by this transformation before starting to release the buffers depends on when these
components receive matching data from both inputs.

Following is a list of Partially Blocking asynchronous row set-based transformations:

Ø  Data Mining Query transformation
Ø  Merge Join transformation
Ø  Merge transformation
Ø  Pivot transformation
Ø  Term Lookup transformation
Ø  UnPivot transformation
Ø  Union All transformation
Ø  Script Component transformation configured as Partially Blocking Asynchronous
Row-set-based transformation

Blocking Asynchronous Full Row-Set-Based Transformations:

These transformations require all the rows to be assembled before they can perform
their operation. These transformations can also change the number of rows in the
data flow and have asynchronous outputs. For example, the Aggregate transformation
needs to see each and every row to perform aggregations such as summation or
finding an average. Similarly, the Sort transformation needs to see all the rows to
sort them in proper order. This requirement of collecting all rows before performing
a transformation operation puts a heavy load on both processor and memory of the
server. You can understand from the nature of the function they perform that these
transformations block data until they have seen all the rows and do not pass the data to
the downstream component until they have finished their operation.

The following are Blocking asynchronous full row-set-based transformations:

Ø  Aggregate transformation
Ø  Fuzzy Grouping transformation
Ø  Fuzzy Lookup transformation
Ø  Sort transformation
Ø  Term Extraction transformation



Resources:
For more detailed information regarding this study, please see
Ø  Microsoft SQL Server 2008 Integration Services by Ashwani Nanda
Ø  Microsoft Books Online




Saturday 9 February 2013

Redirect Duplicate Rows Using SSIS Step By Step


Redirect Duplicate Rows Using SSIS

There are many ways to redirect the duplicate into another direction. You can use Script Component, Lookup Transformation etc. I am showing you the simplest way to redirect the duplicate. For this, here I am using Fuzzy Group Transformation to redirect the duplicate into another direction.
Here, I am taking Repeat_Data table from my Employee Database. In this table I have created some duplicate records for better understand.


Now Go to BIDS, Start New projects, Name the Project.
Take Data flow Task and double click on it.
Take OLEDB Data Source, double click on it and configure it.


Choose Table Repeat_Data.


Go to Columns and take all columns.


Take Fuzzy Group Transformation and double Click on it. A Fuzzy Group Transformation editor will open. Leave the first tab that is Connection Manager . Click the second tab i.e. Columns and configure it as directed below.


 Make the Minimum Similarity of name column to 0.25.
Go to Advanced tab and set the Threshold to 0.50 and press OK button.


Take Conditional Split Transformation and double Click on it and write Condition.


Condition: _key_in == _key_out
Make the Default output name as Duplicate Rows.

Your all configuration has completed now you can take any destination where you want to keep your unique records and where you want to keep your duplicate records. Here I am taking Multicast Transformation for just showing you the output of this configuration with the help of Data Viewer.


It is the Preview of your Package.



(Note: It is very easy way to redirect the duplicate into another direction but it has a drawback. It is a blocking transformation so if you use it then may be  you have some performance issue.)

















Wednesday 3 October 2012

Remove Duplicate Records Using SSIS


 Remove Duplicate Records Using SSIS Step by Step



It is very easy task to remove the duplicate rows from the table using SSIS package. To remove the duplicate records in SSIS there are multiple ways and I am using SORT transformation. Below I am showing you a simple example for removing the duplicates.

Go to the BIDS >> choose new project >> choose Integration Service Project >> give your package name and press OK button.
Now inside the Toolbox from Control flow item select the Data Flow Task and drag and drop into design area.
Rename the Data Flow Task.


Double click on Data Flow Task OR Right Click on Data Flow Task and choose edit.


Inside the Data Flow Task choose OLEDB Data Source and Drag and drop it into design area.
Rename it.
Double click on OLEDB Data Source.



 OLEDB Data Source Editor will open. Now click on NEW button.

 Again a new Editor will open here we again click on NEW button to configure the connection from the Database.



Here we create  connection to Database. For this I am using AdventureWorks Database So I use our local Server and Database AdventureWorks. After choosing the Database just check that everything is ok or not by clicking Test Connection button. If everything is ok then press OK button.


Click on Name of the Table or View and choose the table then click on column to select the column.


Now choose the columns which you want to display and click on OK.



Drag the SORT Transformation and double click on it for configure.


Here enable the check box i.e. at the bottom of the SORT  Transformation to remove the duplicate row and press OK.
Now you can choose any destination to load the data. I am using derived column to show you the result.
Here Data Viewer Shows the output rows.








Monday 16 April 2012

How many types Control Flow elements provides SSIS


Q: How many types Control Flow elements provides SSIS ?

SSIS provides three different types of Control Flow elements
i)                    Containers that provides structure in packages
ii)                   Tasks that provides functionality
iii)                 Precedence Constraints that connects the executables, containers and tasks into an ordered control flow.

Saturday 14 April 2012

SSIS Package, Tasks, Containers, Variables, Data Pipeline, Event Handler


SSIS Package, Tasks, Containers, Variables, Data Pipeline, Event Handler

SSIS Package :  A discrete executable unit of work composed of a collection of control flow and other objects, including data sources, transformations, process sequence, and rules, error and event handling and data destinations.

Containers :  Package objects that provide structure to packages and special services to tasks. Containers are used to support repeating control flows in packages and to group tasks. Containers can include other containers in addition to tasks.

Tasks : Package elements that define activities and processes, including data sources, destinations, transformations, and others.

Precedence Constraints :  Constraints that link executables, containers, and tasks within the package control flow and specify conditions that determine the sequence and conditions for determining whether executables run.

Variables :  Storage for values that an SSIS package and its containers , tasks, and event handlers can use at run time. The scripts in the scripts task and the script component can also use variables.

Control Flow :  An SSIS package process control component used to control flow elements that containers that provide structure in packages and services to tasks, tasks that provide functionality in packages, and precedence constraints that connect containers and tasks.

Data Flow :   An SSIS package data process  control component defined from within package control flow  that loads data from sources, transforms  and routes it through transformations, and saves it to destinations.

Event Handler :  An SSIS package process control component used to define the process activities to be performed at the time of a specific event state for the package or for any of its tasks or containers.

Data Pipeline :  The memory – based, multithreaded, buffered transformation process flow of data through an SSIS data flow task during package execution.

BIDS :   SQL Server Business Intelligence Development Studio. Provides the integration services project in which you create packages, their data sources, and data sources views. Read more ...

import column, export column, Copy Column transformation


Q: What is import column transformation ?
This transformation would be useful for loading a database with images, documents of media files.

Q: What is export column ?
It exports Binary data which means things like images, documents and other media which have been stored in a relational database. It exports them out to the file system.

Q: What is Copy Column ?
It takes an input column and creates a duplicate of it with a new name.

Wednesday 11 April 2012

When we use For Loop, For Each Loop and Sequence Container task


Q: When we use For Loop, For Each Loop and Sequence Container task ?

For Loop Container Task : When we Know the exact how many times we have repeat given data flow for execution.
For Each Loop Container Task : When there is dependency of number of time task should execute on data such as files, variables we will use this.
Sequence Container Task : When we the exact Sequence and sometimes we need to perform parallel operation based on our client requirement.

Tuesday 10 April 2012

Command line utility in SSIS


Q: Command line utility in SSIS ?
SQL Server Provides a Command Line Utility (DTEXEC.EXE) that helps the developers to execute the SSIS package. It can be directly used from the command prompt by moving around to the folder where the package is available and run the utility using this EXE.
DTEXEC  /? Provides the list of available options to execute the package from the command prompt.



DTEXECUI : This approach is a straight forward user interface option to execute a package.




How to deploy SSIS package


Q: How to deploy SSIS package ?
We have 3(Three) options are available for deploying the SSIS package. These are
·         Deployment Utility
·         Command Line Executable
·         SQL Server Management Studio.

What is SSIS


Q: What is SSIS ?
SSIS is an ETL tool (Extract , Transform , Load) Which is very much needed for Data warehousing applications.  SSIS  is also used to perform the operation like loading data based on the need , performing different transformations on the data like doing calculations (Sum, Average, etc.) and to define workflow of the process flow and perform some tasks on the day to day activities.