Problem: How to transform
data in SSIS and what are available data transformations in SSIS?
Transformations are part of ETL activities, as we get the source
files from different regions and different file formats. To combine all this
sources into a single destination table we need to set the standards and data
types and cleanse the values for certain extent to fit the need.
If you open SSIS Toolbox in data flow you will notice the area with this view.
All the Transformations that SSIS provides are inside in the
data flow area of the package, some of them are used to aggregate, merge, distribute, and modify data. Transformations can also
perform lookup operations and generate sample data sets.
Below is the list of transformations and the
Transformation
|
Description
|
configures the updating of a slowly changing
dimension.
|
|
standardizes values in column data.
|
|
looks up values in a reference table using a fuzzy
match.
|
|
extracts terms from text.
|
|
looks up terms in a reference table and counts
terms extracted from text.
|
|
runs data mining prediction queries.
|
|
corrects data from a connected data source by
applying rules that were created for the data source.
|
|
applies string functions to character data.
|
|
adds copies of input columns to the transformation
output.
|
|
converts the data type of a column to a different
data type.
|
|
populates columns with the results of expressions.
|
|
inserts data from a data flow into a file.
|
|
reads data from a file and adds it to a data flow.
|
|
uses script to extract, transform, or load data.
|
|
runs SQL commands for each row in a data flow.
|
|
performs aggregations such as AVERAGE, SUM, and
COUNT.
|
|
sorts data.
|
|
creates a sample data set using a percentage to
specify the sample size.
|
|
creates a sample data set by specifying the number
of rows in the sample.
|
|
creates a less normalized version of a normalized
table.
|
|
creates a more normalized version of a
nonnormalized table.
|
|
routes data rows to different outputs.
|
|
distributes data sets to multiple outputs.
|
|
merges multiple data sets.
|
|
merges two sorted data sets.
|
|
joins two data sets using a FULL, LEFT, or INNER
join.
|
|
looks up values in a reference table using an exact
match.
|
|
writes data from a connected data source in the
data flow to a Cache connection manager that saves the data to a cache file.
The Lookup transformation performs lookups on the data in the cache file.
|
|
The transformation distributes buffers of incoming
rows uniformly across outputs on separate threads to improve performance of
SSIS packages running on multi-core and multi-processor servers.
|
|
makes information about the environment available
to the data flow in a package.
|
|
counts rows as they move through it and stores the
final count in a variable.
|
Example:
In the example, I would like to show we can use derived
column to add the date of load as today’s date from a variable.
Step 0: Create a variable date as string and set the scope
to package. Create an expression for setting the default values as today’s
date.
Step 1: From the tool box drag a data flow task and use OLE
DB Source and Destination and also in between the source and destination I want
to use a transformation to inject todays date as load date.
Step 2: map the column from derived column output
Step 3: Execute the task.
Output:
As you can see the the date stamp of input date is today. This becomes handy when there are lot of files and batch loads.
0 comments:
Post a Comment