An Avanade Blogging Community

Welcome to An Avanade Blogging Community Sign in | Join | Help
in Search

Old School Database Guy

SSIS Logging Best Practices: Part 1

SSIS Logging Best Practices

One of the things we often forget as software developers is that, after we are finished writing our brilliant code, someone will need to actually operate the system we have created.  Frankly, no matter how spectacular our code may be, there is no way we can anticipate everything that may change in a system, or the myriad ways other developers and the user community might change requirements or otherwise play with our heads.

Issues concerning system changes, data changes and errors are particularly acute in any ETL work, as ETL code, by definition, must bridge two or more systems, and those systems will certainly change over time.  Even without new development, there will certainly be upgrades, patches and other changes that can cause issues.  In addition, the user community will continually add data to source systems.  Data might well be “allowable” on the source system but violate a business rule set on the target system, causing your elegantly designed ETL code to throw up like a frat boy on his 21st birthday. 

We therefore need to design our systems so that we can not only recover from these errors, if possible, but also provide sufficient information to the appropriate folks so they can address these issues on an ongoing, operational basis.  This raises the issues of WHAT should be logged and WHERE it should be logged, versus HOW you should log it.

The WHAT and WHERE….

Broadly speaking there are three main areas for logging,

·         Things that go bump in the night

·         WTF (as in”Where’s This From?” …. Get your minds out of the gutter)

·         Event driven logging or Audit logs

Things that go bump in the night

To be precise, these things can go bump at any time, but the typical ETL job is run late in the evening or during early morning hours, when data is extracted from a transactional system and loaded into a data mart or warehouse for analysis, or when different systems are synchronized during a maintenance window. 

Specifically here, I am referring to unrecoverable errors.  As good developers we need to provide the operations staff with enough information for them to find and fix the issue.  As self-centered megalomaniacs, we need to provide this data so we are not getting phone calls in the early morning hours or on weekends asking us to come in and fix a broken system. 

These issues should be (but often are not) rare.  For example, it is difficult to imagine how one can programmatically recover from someone kicking the power cord out of the source systems server.  We simply need to build our code with a nod to the fact that it runs on a machine and machines occasionally break.

When these events occur, the error needs to be raised in such a way that it is visible to the folks who can begin to address the issue.    So, the first thing required here is some interface for the operations staff to view status of each job as it runs.  Let me say the following in biog bold letters, so we are very clear…

NEVER, EVER USE EMAIL TO REPORT ERRORS!!!!!!!!!!!!!!            

Why? The inexperienced may well ask.  First, email lists for notification purposes are rarely, if ever maintained or kept up to date. 

Second, everyone, without exception, puts rules on their email accounts to route such emails to some folder.  No matter what this folder is named the name means “Ignore Until Management Calls.”  The result is the error is ignored until you have annoyed users.  

Third, bad things come in threes.  So invariably, if the first thing bad is that the application died, the second thing bad is that the person who gets the email about it is out sick, on vacation or otherwise unavailable, and is probably in some remote location with no cell phone coverage.  The third bad thing will be the 2 AM call you get to get out of bed and come in to fix the problem.

Fourth, many times these emails are sent to groups.  It is an ironclad law of corporate behavior that when everyone is responsible no one is responsible, so nothing will happen until someone with a “C” (as in “CIO”) in their title gets a call asking why there is no data.  This is not where you want to be.

Ideally, there will be a simple interface with an easy to understand “red light, green light” approach to these errors.  That application will direct the operations staff to detailed – emphasis on detailed – messages that will help them.

Unfortunately, we often get rather cryptic error messages back from our applications.  Often we have stumbled across these issues while in the development cycle or handing off to operations.  We should be making it a practice to decipher these messages for the operations staff. 

For example, I recently completed a project that used a web service as a source for a data mart.  Towards the end of development, I would often get an exception in my code “An unknown error has occurred. Please contact the application administrator for further information.”  This is hardly the most informative error message ever created.  However, we noticed that, invariably this error would arise because the web service server was down.  It seemed to me to be a matter of common courtesy, if nothing else, to pass this knowledge to the poor soul who would be getting it at 2 AM, when this data load was scheduled to run.  In this case, some simple code wrote the additional details out to a text file, and the path the text file was then put in the message logged in the Windows Event logs, as follows:

 

public static void RaiseTheAlarm(string errPath, string LoadName, Boolean writeToEventLog, Exception ex)

{

//irrelevant code removed

StreamWriter ErrorWrite = new StreamWriter(logPath.ToString()  );

            ErrorWrite.WriteLine("There has been an error in the load of " + LoadName);

            ErrorWrite.WriteLine("Exception Message");

            ErrorWrite.WriteLine(ex.Message);

            string errMess = ex.Message;

 

            if (ex.InnerException == null)

            {

                if (errMess.Contains("An unknown error has occurred. Please contact the application administrator for further information."))

                {

                    ErrorWrite.WriteLine("This error typically appears when there is a failure with a call to a web service.  Please chack the appropriate WCF service for errors.");

                }

            }

            else

            {

                ErrorWrite.WriteLine("Inner exception: ");

                ErrorWrite.WriteLine(ex.InnerException.Message);

                if ( ex.InnerException.InnerException != null)

                {

                    ErrorWrite.WriteLine("Inner exception: ");

                    ErrorWrite.WriteLine(ex.InnerException.InnerException.Message);

                }

            }

            ErrorWrite.Flush();

            ErrorWrite.Close();

          //irrelevant code removed

}

 

 

It should also be noted that, in this application the inner exception messages contained a good deal of information that could help solve the issue, so we always logged these when possible.

Where you raise such errors will depend on where the monitoring system looks for application status.  I have yet to find such a monitoring tool that did not look at the Windows Event Logs for errors, so that would be a good place to start.  Again, such code is quite simple using the EventLog class

 

if(! System.Diagnostics.EventLog.SourceExists(LoadName))

                {                    System.Diagnostics.EventLog.CreateEventSource(LoadName,"Application");

                }

StringBuilder PleaseSee = new StringBuilder();

PleaseSee.Append( "Please see ");

PleaseSee.Append(logPath.ToString());

PleaseSee.Append(" for further details");

System.Diagnostics.EventLog.WriteEntry(LoadName, ex.Message,System.Diagnostics.EventLogEntryType.Error);

System.Diagnostics.EventLog.WriteEntry(LoadName, PleaseSee.ToString(), System.Diagnostics.EventLogEntryType.Error);

              

 

 

In this case, we simply note that an error has occurred, raise it to the Application event log and direct the user to a text file where they can find the details they need to diagnose the issues.  In short, this need not be complex, but it does need to provide the operations staff with the data they need to do their jobs. 

Best practice is, therefore

1.       Operational staff needs a simple clear user interface to monitor ETL status (NOT EMAIL!).

2.       Provide details someplace.  It is easy to simply write these details to a text file and provide the path to this file via the Windows Logs or the monitoring tool.

3.       Have some courtesy.  If you can provide more details then the exception thrown by the application, do so. 

 

Published Monday, March 23, 2009 12:23 PM by Old School Database Guy
Filed Under: , , , , , ,

Comments

No Comments
Anonymous comments are disabled

About Old School Database Guy

After spending 10 years as a lawyer, Mike discovered his true avocation as a database guy. He has over 10 years experience working with major corporations on database design and integration issues. Mike lives in Colorado where he skis, rides motorcycles and listens to Verdi and Puccini operas in his free time. His personal musings may be found at http://tamethemonkeymind.blogspot.com/.

This Blog

Post Calendar

<March 2009>
SuMoTuWeThFrSa
22232425262728
1234567
891011121314
15161718192021
22232425262728
2930311234

Syndication