Prior to SSIS 2012 it was not possible to re-use or share the same cache across packages. What that meant is if you created a cache you will be able to consume it only in the very package where the cache was instantiated. Therefore, a developer could take only two routes from here – either make as much heavy ETL processing as possible inside the same package where the cache resides, or populate yet another cache object in every other package where it is needed. The latter was especially harmful leading to unnecessary heavy extra database calls and an extended development time. This limitation has been overcome in the SSIS 2012 release.
Let’s take a look at how to implement a shared cache and what are the new opportunities this improvement brings to us!
Will begin with creating a new SSIS 2012 package of course.
Fire up the SQL Server Data Tools (a Visual Studio 2010 Shell will open if you have no Visual Studio 2010 of any kind installed, or a new instance of Visual Studio 2010 whatever edition you have will pop up).
1. Building the Cache and Warming it Up
Create a new Business Intelligence – Integration Services project, then drag and drop a new Data Flow Task. Name it. At this stage you should have something like:
Go into DFT design tab in which add a new source connection. In my case I built a query to extract some personal data from the AdventureWorks2008R2 database using a join as follows:
1: -- Select a limited number of records for performances sake
2: SELECT TOP 1000
8: FROM [AdventureWorks2008R2].[Person].[Person] P
9: JOIN [AdventureWorks2008R2].[Person].[PersonPhone] PP
10: ON PP.[BusinessEntityID] = P.[BusinessEntityID]
I guess the use of such a query could be in providing data to other packages on all the individuals a given company is aware of.
At the next step we will add a cache connection manager:
and set to dump the data to a CAW file:
And let’s configure it like this:
The Index Position column if set to something greater than 0 simply denotes an index, let’s say a key or Primary Key if you wish.
Here is the zest – if you chose the file then any package in the project, not necessarily even a child package (run from the parent package) will have the access to the cache!
But wait, one more step is necessary before then: right-click on the Cache Connection Manager and choose the option “Convert to Project Connection” from the context menu items:
The next component to add to this DFT is the Cache Transform which we will configure as depicted below:
As you have probably already guessed, the BusinessEntityID we had in your SQL query is now the key column for lookups.
At this point, if we would want we could run this package already and the cache would be built and ready to be consumed. Right, we are almost done!
2. Consuming the cache in an external package
Sure we would be able to consume the cache inside the same package, this is the pre-SSIS 2012 trick, but our objective is to use it the SSIS 2012 way, in a package that is only part of the same project.
To do so right click on the SSIS Packages folder and choose the “New SSIS Package” option. Name the package. In my case, because I wanted to demonstrate how external data can be matched against the an existing cache and I came across the sick leave hours value in the Employee table I thought it could be a good candidate, so here it goes: a new package called collect sick leave encompassing a single DFT in which we have again a connection to the AdventureWorks2008R2 database bringing the contents of the Employee table with only a few columns chosen:
Remember when building your own packages the identity column has to exist, otherwise re-using the cache would not be possible.
So now drag and drop the cache transformation component and begin configuring it as follows:
An interesting discovery: the connection property page of the Lookup Transformation discovers and accepts the cache connection made in the package in step 1:
The columns are mapped like follows:
Two steps, and basically this is it to consuming the cache.
The output now can travel to one destination or another:
But what is important – there is no need to warm the cache up again anywhere in this project to be consumed by any other packages.