An Avanade Blogging Community

Welcome to An Avanade Blogging Community Sign in | Join | Help
in Search

Old School Database Guy

No School Like Old School

I always hesitate to write about anything involving someone else’s code, lest the favor get returned.  In this case however, one must have some sympathy for the developer, as the source system for a particular ETL job was not yet in production.  He had to write his packages against highly idealized, hypothetical test data.  Surprise! Surprise! The real data causes some issues. 

My predecessor followed a standard set of steps for any SSIS work for loading a data mart:

1.       Get source data and load it to a staging table

2.       Select data from the staging table relevant to the fact table to be populated

3.       "Look up" relevant foreign keys using data from the staging table - this creates the correct referential integrity between the fact and dimension tables in the data mart.  This was done with a look up task in SSIS.

4.       Load the fact table with the results of the prior two steps.

Once we got into “real” data, the issues associated with look up tasks became apparent.  We repeatedly saw errors in the look ups.  The error, however was extremely vague and provided no direction or ability to track down the issue.  It was “Row yielded no match during lookup.” 

The first source of these errors were the differences between the datatypes and collations used in the actual source system, in this case Oracle and the target system, SQL Server 2005.  When we used a simple SQL join, the joins also failed with collation errors.  It was not until we used a join with the appropriate collation logic that we were able to actually “look up” the data as follows:

SELECT DISTINCT <Columns>

FROM My.STAGING_Table S left outer join dbo.dimension D on S.Unique = D.Unique COLLATE Latin1_General_CI_AS  

Other look up errors, frankly make no sense to me, as the data types and collations were the same and SQL “look ups” did not require data conversions or collation hints.  The simple “lookup” works fine
SELECT * 
from STAGING.myTable
where uniqueString in (select uniqueString  FROM dbo.DimensionTable;

(Yes I know this is sloppy syntax.  It was a down and dirty test!)

In the end we replaced data transfer tasks with multiple look ups with a SQL selects and inserts using OLEDB sources and destinations.  Every lookup encountered could be replaced with a join or a common table element.  This provided several advantages.  First, all of the data required was obtained with a single set operation, instead of multiple trips to the database to lookup values.  Second, we had much more insight into exactly where potentially “bad” rows of data existed, so we could handle those errors far more elegantly.  Third, the operations actually ran slightly faster. 

Another reason to use those old school set operations!

 

Published Thursday, July 24, 2008 7:45 AM by Old School Database Guy
Filed Under: , , , , , ,

Comments

No Comments
Anonymous comments are disabled

About Old School Database Guy

After spending 10 years as a lawyer, Mike discovered his true avocation as a database guy. He has over 10 years experience working with major corporations on database design and integration issues. Mike lives in Colorado where he skis, rides motorcycles and listens to Verdi and Puccini operas in his free time. His personal musings may be found at http://tamethemonkeymind.blogspot.com/.

This Blog

Post Calendar

<July 2008>
SuMoTuWeThFrSa
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

Syndication