This article is a continuation of http://blog.avanadeadvisor.com/blogs/old_school_database_guy/archive/2009/03/23/13221.aspx. That article dealt with error logging for system failures or other unrecoverable errors for a SSIS package. In this post, I will deal with data quality errors that prevent individual rows from loading.
SSIS Logging Best Practices
WTF Logging
Every ETL project invariably comes across data in unexpected formats or is otherwise a knuckleball (apologies to Europeans for the baseball analogy). For example, you are expecting integers and get “two” instead of “2” – Where’s That From?
None of these issues will rise to the level where an application can or should fail. Instead, these represent business rule or data quality issues that should be reported back to users so the issues might be addressed.
Many of which can be quite amusing in hindsight. I worked on one application where with and HL7 (HL7 is a health care industry standard for data transfer) data source having a patient’s blood type in a field that was supposed to hold the social security number. In another application, the “cumulative” production of an oil well would reset to 0 from time to time by the source system because of a source system bug. So, when calculating average production of a well, we tried to use a simple formula of
(Cumulative Volume Now – Cumulative Volume Then)/elapsed time
with a “cumulative” volume reset to zero we would get a negative number as a result – implying the company was putting oil back in the ground. At the price at the time, about $140/barrel, this seemed unlikely.
These three examples illustrate different types of data quality issues each of which might be handled in different ways using SSIS. Those issues can broadly be classified as
1. Data from the source system fails some constraint or business rule on the target system
2. The source system consistently violates some standard, either industry wide or a corporate standard
3. A bug in the source system produces unexpected or illogical data.
There are numerous ways to handle these issues as they arise. The first, and my preference, is to work with the folks running the source systems to resolve the issues. This usually involves a dope slap, accompanied with some variation of “What ‘cha stew-ped?! You ain’t supposed ta put dat data in dat field!”
Unfortunately this method, while effective, violates most HR policies. Your fallback here should be more politic negotiations around field mappings, particularly where you wish to keep your data mappings within some reasonable semblance of logic and industry standards.
The next thing that should be done is to enforce business rules and logical rules on the target database itself. While this is a standard best practice for database design, it is remarkable how often developers fail to follow this practice.
SSIS provides easy mechanisms for logging violations of these rules or constraints. So, for example, consider the case where the “cumulative” volume of oil pumped from a well is reset to zero. A simple check constraint on the target field enforces the rule that the cumulative volume for today must be greater than or equal to the cumulative volume for yesterday. Should this test fail, the insert of today’s data will fail. SSIS can then catch and log this failure for this one row of data, while continuing to load data that complies with the business rule. So we see, below, an extremely simple example of how this can be handled. Data is extracted from an OLE DB source to be loaded directly to an OLE DB target. Should any insert fail, the data will be written to a comma delimited file along with the error code and the error column.

Most frequently, the business process breaks down at this point. You have this data in some form of log… and then?
Best practice is to provide an easy user interface that will allow a business user to address these issues. At a minimum, this interface would provide:
1. Identifying information that will allow a user to track the offending data to its source;
2. A plain English (or French, or Mandarin, or whatever) reason that the insert or update failed. This will require that you translate the error code. The codes themselves can be found at http://msdn.microsoft.com/en-us/library/ms345164.aspx for SQL Server 2008 and at http://msdn.microsoft.com/en-us/library/ms345164(SQL.90).aspx for SQL Server 2005. Adding an error description requires a single line of script, as described in the SSIS help topic Enhancing an Error Output with the Script Component
3. Additional information that will help the user to identify the error. For example, simply telling the user “The data value violated the schema constraint.” is not helpful. You must tell the user why the constraint was violated so they can do their job to fix the issue.
Audit and Event Driven Logging
There will be times when the business requires auditing of data transfers. Most of these requirements can be met with the event driven logging internal to SSIS. The basic logging screen is relatively simple. You just select which events to log.

On the advanced screen, you have more control over what gets logged, as shown below

You should be cautious with SSIS logging. It carries a large I/O overhead and can slow application performance dramatically. Unless absolutely required I would limit SSIS logging to OnError and OnTaskFail and OnWarning events.
The How
How one should log will vary by your specific situation. Here, I have limited myself to the simplest text file logging. You can also log to any OLE DB compliant application you choose by selecting the correct tool from the SSIS toolbox, or creating your own custom script or class. This is limited only by your imagination.
Conclusion
SSIS logging is a powerful tool the use of which is limited only by your imagination.