An Avanade Blogging Community

Welcome to An Avanade Blogging Community Sign in | Join | Help
in Search

Old School Database Guy

  • SQL Azure First Looks

    I had a look at the latest on SQL Data Services, also called SQL Azure at http://www.msteched.com/online/view.aspx?tid=01053bed-425d-4152-a293-e96fef6240fc and I would recommend all you database folk have a look as well.

    I see some really good things here, but also some troubling issues. The bad news:

    1. There is no integration with Windows security, therefore no real way to audit data access. This is a big issue for anyone storing sensitive data or with a regulatory requirement for a data access audit trail (e.g. medical applications and anyone holding my financial data).

    2. There is a 10 GB limit on database size – read small applications only.

    3. This is a hosted environment; therefore people outside your organization have physical access to the data. This is a problem in certain regulated enterprises, again like medical providers.

    The good news:

    1. Rapid deployment. No more need to wait to get hardware installed.

    2. The hosting company is Microsoft, so the physical security is probably much better here than it might be for the typical small or medium size business.

    3. This is ideal for smaller applications with shorter lifecycles. For example, dealing with contractor to sub contractor relations on a short term project, or quick applications to take advantage of a trend or seasonal retail opportunity (ski season is almost here, for example).

    4. There would appear to be a great opportunity for “application in a can” scenarios, where you want to have a shell of an application you can modify rapidly and deploy, like an on line store or catalog that requires a small amount of customization.

  • Towards a Database Professional’s Code of Ethics

    I have been giving a great deal of consideration to the ethics of data storage and manipulation lately. I have documented in the past certain extreme lapses in database security and additional lapses are routinely reported. The recent indictment of Albert Gonzalez (http://www.usdoj.gov/opa/pr/2009/August/09-crm-810.html ) is but one example.


    In my own life, I have been the victim of data theft at least twice. In both cases, a credit card number was stolen. Once, by a waiter who swiped the magnet strip of my card twice – once for the restaurant and once to sell the data. The second time resulted from a hack of a vendor I have done some business with, though I do not know the exact nature of this hack. Just this morning, we were informed that one of my wife’s credit card numbers had been used for purchases outside of the USA. We do not know how this particular theft occurred yet.

    Start with what should be a self evident proposition: when you are entrusted with sensitive information, you have a duty to prevent the misuse of that information. In Buddhism one has a duty to reduce the suffering in the world. In Islam, there is an affirmative duty to prevent crime. While no Bible or Torah quote springs to mind, I find it difficult to believe that a Rabbi or Christian minister would reject this proposition.

    This duty arises from the basic, though unspoken, properties of the commercial transaction. I pay for goods or services via a credit card, thus providing the merchant with access to my line of credit for the limited purpose of obtaining the agreed upon funds. The merchant does not have permission to use or distribute that access to anyone else for any other purpose. I trust the merchant to limit access to this information for the limited purposes only. The merchant breaks that trust when he allows access by others to this information, whether intentionally or through their failure to exercise reasonable care.

    Again and again I am amazed at the failure of organizations to take even the most basic steps to secure data, ensuring that people (myself included) will become the victims of identity theft. In the case of my wife and I, we were lucky that our credit card company caught the issue early (and bravo to whoever wrote the data mining programs at Citibank to catch these crimes early).

    Others are not so lucky. Mr. Gonzalez and his associates are alleged to have stolen 130 million credit and debit card accounts from several companies. Radisson hotels recently revealed a security breach that resulted in theft of credit card data in the United States and Canada.

    Yet, even with these reports in the news and with an ever increasing amount of cyber crime, I routinely find the cyber-security equivalent of a bank leaving piles of cash deposits in the lobby. I would like to propose that we, as database professionals, take onto ourselves a code of professional responsibility. I would love to get input on this, but I think the first step would be this commitment:

    Recognizing that people entrust their identity, personal and financial data to me, I have taken on a sacred duty to protect that data and, thereby, protect the people who have placed their trust in me. I will take all reasonable steps to ensure the safety and security of that data in its storage and transmission.


    Reasonable steps would include:


    • Ensuring network protections are available to safeguard against unauthorized access

    • Ensuring software is written to prevent SQL injection attacks

    • Logging and auditing all data access

    • Ensuring that access to system administration accounts and DBO accounts is limited to appropriate staff

    • Strictly limiting data access to the data one needs to do one’s job, and no more

    • Encrypting all sensitive data in storage

    • Encrypting all sensitive data in transit

    • Ensuring that all security patches, anti-virus and anti-intrusion software is working and up to date

    • Prevent the storage of sensitive data on local systems, laptops and thumb drives.

    • All users should use strong passwords.

    Got any more?

  • Interesting news item

    The Wall Street Journal is reporting serious compromises in computer security around key infrastructure in the United States.  I would not be surprised if the same issues are present in the other countries of the G-20.  The article can be found at http://online.wsj.com/article/SB123914805204099085.html.

     

    While it is not surprising that the attempted hacks occurred, it is surprising that the companies involved did not catch them.  Instead various government intelligence and law enforcement agencies discovered the intrusions. 

     

    For me, the surprise came in just how much infrastructure is vulnerable.  It does extend to items I had not considered in the past.  I expect nuclear power plants, the NY Stock exchange and NASDAQ to be targets, and thus have very sophisticated defenses.  I did not think of sewage treatment plants, for example, as targets, but when one considers the damage that could be done – as well as the pure gross out factor (if one can apply the term “pure” in this situation) – then they are a target. 

     

    Living in Colorado, where one is upstream from pretty much everyone west of the Mississippi, one realizes the damage this could do.  A hack on sewage treatment facilities on the treatment plants for just the cities on the front range (Pueblo, Colorado Springs, Denver and Fort Collins, plus associated smaller communities)  would be devastating not just to those cities, but to Eastern Colorado, Kansas, Nebraska, Missouri, Oklahoma, Texas, Arkansas and Louisiana as well.      

  • SSIS Logging Best Practices Part 2

    This article is a continuation of http://blog.avanadeadvisor.com/blogs/old_school_database_guy/archive/2009/03/23/13221.aspx.  That article dealt with error logging for system failures or other unrecoverable errors for a SSIS package.  In this post, I will deal with data quality errors that prevent individual rows from loading.

    SSIS Logging Best Practices

    WTF Logging

    Every ETL project invariably comes across data in unexpected formats or is otherwise a knuckleball (apologies to Europeans for the baseball analogy).   For example, you are expecting integers and get “two” instead of “2” – Where’s That From?

    None of these issues will rise to the level where an application can or should fail.  Instead, these represent business rule or data quality issues that should be reported back to users so the issues might be addressed.

    Many of which can be quite amusing in hindsight.  I worked on one application where with and HL7 (HL7 is a health care industry standard for data transfer) data source having a patient’s blood type in a field that was supposed to hold the social security number.   In another application, the “cumulative” production of an oil well would reset to 0 from time to time by the source system because of a source system bug.  So, when calculating average production of a well, we tried to use a simple formula of

    (Cumulative Volume Now – Cumulative Volume Then)/elapsed time

    with a “cumulative” volume reset to zero we would get a negative number as a result – implying the company was putting oil back in the ground.  At the price at the time, about $140/barrel, this seemed unlikely.

    These three examples illustrate different types of data quality issues each of which might be handled in different ways using SSIS.  Those issues can broadly be classified as

    1.       Data from the source system fails some constraint or business rule on the target system

    2.       The source system consistently violates some standard, either industry wide or a corporate standard

    3.       A bug in the source system produces unexpected or illogical data.

    There are numerous ways to handle these issues as they arise.  The first, and my preference, is to work with the folks running the source systems to resolve the issues.  This usually involves a dope slap, accompanied with some variation of “What ‘cha stew-ped?! You ain’t supposed ta put dat data in dat field!” 

    Unfortunately this method, while effective, violates most HR policies.  Your fallback here should be more politic negotiations around field mappings, particularly where you wish to keep your data mappings within some reasonable semblance of logic and industry standards.

    The next thing that should be done is to enforce business rules and logical rules on the target database itself.  While this is a standard best practice for database design, it is remarkable how often developers fail to follow this practice. 

    SSIS provides easy mechanisms for logging violations of these rules or constraints.  So, for example, consider the case where the “cumulative” volume of oil pumped from a well is reset to zero.  A simple check constraint on the target field enforces the rule that the cumulative volume for today must be greater than or equal to the cumulative volume for yesterday.  Should this test fail, the insert of today’s data will fail.  SSIS can then catch and log this failure for this one row of data, while continuing to load data that complies with the business rule.    So we see, below, an extremely simple example of how this can be handled.  Data is extracted from an OLE DB source to be loaded directly to an OLE DB target.  Should any insert fail, the data will be written to a comma delimited file along with the error code and the error column.

    Most frequently, the business process breaks down at this point.   You have this data in some form of log… and then? 

    Best practice is to provide an easy user interface that will allow a business user to address these issues.  At a minimum, this interface would provide:

    1.       Identifying information that will allow a user to track the offending data to its source;

    2.       A plain English (or French, or Mandarin, or whatever) reason that the insert or update failed.  This will require that you translate the error code.  The codes themselves can be found at http://msdn.microsoft.com/en-us/library/ms345164.aspx for SQL Server 2008 and at http://msdn.microsoft.com/en-us/library/ms345164(SQL.90).aspx for SQL Server 2005.  Adding an error description requires a single line of script, as described in the SSIS help topic Enhancing an Error Output with the Script Component

    3.       Additional information that will help the user to identify the error.   For example, simply telling the user “The data value violated the schema constraint.” is not helpful.  You must tell the user why the constraint was violated so they can do their job to fix the issue.

     

    Audit and Event Driven Logging

    There will be times when the business requires auditing of data transfers.  Most of these requirements can be met with the event driven logging internal to SSIS.   The basic logging screen is relatively simple.  You just select which events to log.

     

     

     

    On the advanced screen, you have more control over what gets logged, as shown below

     

    You should be cautious with SSIS logging.  It carries a large I/O overhead and can slow application performance dramatically.  Unless absolutely required I would limit SSIS logging to OnError and OnTaskFail and OnWarning events. 

    The How

    How one should log will vary by your specific situation.  Here, I have limited myself to the simplest text file logging.  You can also log to any OLE DB compliant application you choose by selecting the correct tool from the SSIS toolbox, or creating your own custom script or class.  This is limited only by your imagination.

    Conclusion

    SSIS logging is a powerful tool the use of which is limited only by your imagination.    

     

  • SSIS Logging Best Practices: Part 1

    SSIS Logging Best Practices

    One of the things we often forget as software developers is that, after we are finished writing our brilliant code, someone will need to actually operate the system we have created.  Frankly, no matter how spectacular our code may be, there is no way we can anticipate everything that may change in a system, or the myriad ways other developers and the user community might change requirements or otherwise play with our heads.

    Issues concerning system changes, data changes and errors are particularly acute in any ETL work, as ETL code, by definition, must bridge two or more systems, and those systems will certainly change over time.  Even without new development, there will certainly be upgrades, patches and other changes that can cause issues.  In addition, the user community will continually add data to source systems.  Data might well be “allowable” on the source system but violate a business rule set on the target system, causing your elegantly designed ETL code to throw up like a frat boy on his 21st birthday. 

    We therefore need to design our systems so that we can not only recover from these errors, if possible, but also provide sufficient information to the appropriate folks so they can address these issues on an ongoing, operational basis.  This raises the issues of WHAT should be logged and WHERE it should be logged, versus HOW you should log it.

    The WHAT and WHERE….

    Broadly speaking there are three main areas for logging,

    ·         Things that go bump in the night

    ·         WTF (as in”Where’s This From?” …. Get your minds out of the gutter)

    ·         Event driven logging or Audit logs

    Things that go bump in the night

    To be precise, these things can go bump at any time, but the typical ETL job is run late in the evening or during early morning hours, when data is extracted from a transactional system and loaded into a data mart or warehouse for analysis, or when different systems are synchronized during a maintenance window. 

    Specifically here, I am referring to unrecoverable errors.  As good developers we need to provide the operations staff with enough information for them to find and fix the issue.  As self-centered megalomaniacs, we need to provide this data so we are not getting phone calls in the early morning hours or on weekends asking us to come in and fix a broken system. 

    These issues should be (but often are not) rare.  For example, it is difficult to imagine how one can programmatically recover from someone kicking the power cord out of the source systems server.  We simply need to build our code with a nod to the fact that it runs on a machine and machines occasionally break.

    When these events occur, the error needs to be raised in such a way that it is visible to the folks who can begin to address the issue.    So, the first thing required here is some interface for the operations staff to view status of each job as it runs.  Let me say the following in biog bold letters, so we are very clear…

    NEVER, EVER USE EMAIL TO REPORT ERRORS!!!!!!!!!!!!!!            

    Why? The inexperienced may well ask.  First, email lists for notification purposes are rarely, if ever maintained or kept up to date. 

    Second, everyone, without exception, puts rules on their email accounts to route such emails to some folder.  No matter what this folder is named the name means “Ignore Until Management Calls.”  The result is the error is ignored until you have annoyed users.  

    Third, bad things come in threes.  So invariably, if the first thing bad is that the application died, the second thing bad is that the person who gets the email about it is out sick, on vacation or otherwise unavailable, and is probably in some remote location with no cell phone coverage.  The third bad thing will be the 2 AM call you get to get out of bed and come in to fix the problem.

    Fourth, many times these emails are sent to groups.  It is an ironclad law of corporate behavior that when everyone is responsible no one is responsible, so nothing will happen until someone with a “C” (as in “CIO”) in their title gets a call asking why there is no data.  This is not where you want to be.

    Ideally, there will be a simple interface with an easy to understand “red light, green light” approach to these errors.  That application will direct the operations staff to detailed – emphasis on detailed – messages that will help them.

    Unfortunately, we often get rather cryptic error messages back from our applications.  Often we have stumbled across these issues while in the development cycle or handing off to operations.  We should be making it a practice to decipher these messages for the operations staff. 

    For example, I recently completed a project that used a web service as a source for a data mart.  Towards the end of development, I would often get an exception in my code “An unknown error has occurred. Please contact the application administrator for further information.”  This is hardly the most informative error message ever created.  However, we noticed that, invariably this error would arise because the web service server was down.  It seemed to me to be a matter of common courtesy, if nothing else, to pass this knowledge to the poor soul who would be getting it at 2 AM, when this data load was scheduled to run.  In this case, some simple code wrote the additional details out to a text file, and the path the text file was then put in the message logged in the Windows Event logs, as follows:

     

    public static void RaiseTheAlarm(string errPath, string LoadName, Boolean writeToEventLog, Exception ex)

    {

    //irrelevant code removed

    StreamWriter ErrorWrite = new StreamWriter(logPath.ToString()  );

                ErrorWrite.WriteLine("There has been an error in the load of " + LoadName);

                ErrorWrite.WriteLine("Exception Message");

                ErrorWrite.WriteLine(ex.Message);

                string errMess = ex.Message;

     

                if (ex.InnerException == null)

                {

                    if (errMess.Contains("An unknown error has occurred. Please contact the application administrator for further information."))

                    {

                        ErrorWrite.WriteLine("This error typically appears when there is a failure with a call to a web service.  Please chack the appropriate WCF service for errors.");

                    }

                }

                else

                {

                    ErrorWrite.WriteLine("Inner exception: ");

                    ErrorWrite.WriteLine(ex.InnerException.Message);

                    if ( ex.InnerException.InnerException != null)

                    {

                        ErrorWrite.WriteLine("Inner exception: ");

                        ErrorWrite.WriteLine(ex.InnerException.InnerException.Message);

                    }

                }

                ErrorWrite.Flush();

                ErrorWrite.Close();

              //irrelevant code removed

    }

     

     

    It should also be noted that, in this application the inner exception messages contained a good deal of information that could help solve the issue, so we always logged these when possible.

    Where you raise such errors will depend on where the monitoring system looks for application status.  I have yet to find such a monitoring tool that did not look at the Windows Event Logs for errors, so that would be a good place to start.  Again, such code is quite simple using the EventLog class

     

    if(! System.Diagnostics.EventLog.SourceExists(LoadName))

                    {                    System.Diagnostics.EventLog.CreateEventSource(LoadName,"Application");

                    }

    StringBuilder PleaseSee = new StringBuilder();

    PleaseSee.Append( "Please see ");

    PleaseSee.Append(logPath.ToString());

    PleaseSee.Append(" for further details");

    System.Diagnostics.EventLog.WriteEntry(LoadName, ex.Message,System.Diagnostics.EventLogEntryType.Error);

    System.Diagnostics.EventLog.WriteEntry(LoadName, PleaseSee.ToString(), System.Diagnostics.EventLogEntryType.Error);

                  

     

     

    In this case, we simply note that an error has occurred, raise it to the Application event log and direct the user to a text file where they can find the details they need to diagnose the issues.  In short, this need not be complex, but it does need to provide the operations staff with the data they need to do their jobs. 

    Best practice is, therefore

    1.       Operational staff needs a simple clear user interface to monitor ETL status (NOT EMAIL!).

    2.       Provide details someplace.  It is easy to simply write these details to a text file and provide the path to this file via the Windows Logs or the monitoring tool.

    3.       Have some courtesy.  If you can provide more details then the exception thrown by the application, do so. 

     

  • Today’s SSIS 2005 Gotcha’s

    I spent an hour today quietly cursing while I attempted to create a new configuration file for a SSIS package.  The error message was a classic non-helpful message…

     

    “Could not generate the configuration file. (Microsoft.Data.TransformationServices.Wizard)

    Technical Details:
    ===================================

    Could not complete wizard actions. (Microsoft Visual Studio)

    ===================================

    Could not generate the configuration file. (Microsoft.DataTransformationServices.Wizards)”

     

    I began with the usual suspects.  The path was not read only, the file had been checked out for editing, no other path worked.  Finally I left off two of the variables I was attempting to put in the configuration file, et voila, it worked.

     

    The variables in question were object type variables.  Had I thought about it for a few seconds, I would have realized that configuration files are not .NET aware.  Moreover the generic system.object type could be almost anything, and configuration files really are limited to the more specific data types (e.g. strings, integers and the like).

     

    It would be nice if the error message was a tad more informative however.

  • No School Like Old School

    I always hesitate to write about anything involving someone else’s code, lest the favor get returned.  In this case however, one must have some sympathy for the developer, as the source system for a particular ETL job was not yet in production.  He had to write his packages against highly idealized, hypothetical test data.  Surprise! Surprise! The real data causes some issues. 

    My predecessor followed a standard set of steps for any SSIS work for loading a data mart:

    1.       Get source data and load it to a staging table

    2.       Select data from the staging table relevant to the fact table to be populated

    3.       "Look up" relevant foreign keys using data from the staging table - this creates the correct referential integrity between the fact and dimension tables in the data mart.  This was done with a look up task in SSIS.

    4.       Load the fact table with the results of the prior two steps.

    Once we got into “real” data, the issues associated with look up tasks became apparent.  We repeatedly saw errors in the look ups.  The error, however was extremely vague and provided no direction or ability to track down the issue.  It was “Row yielded no match during lookup.” 

    The first source of these errors were the differences between the datatypes and collations used in the actual source system, in this case Oracle and the target system, SQL Server 2005.  When we used a simple SQL join, the joins also failed with collation errors.  It was not until we used a join with the appropriate collation logic that we were able to actually “look up” the data as follows:

    SELECT DISTINCT <Columns>

    FROM My.STAGING_Table S left outer join dbo.dimension D on S.Unique = D.Unique COLLATE Latin1_General_CI_AS  

    Other look up errors, frankly make no sense to me, as the data types and collations were the same and SQL “look ups” did not require data conversions or collation hints.  The simple “lookup” works fine
    SELECT * 
    from STAGING.myTable
    where uniqueString in (select uniqueString  FROM dbo.DimensionTable;

    (Yes I know this is sloppy syntax.  It was a down and dirty test!)

    In the end we replaced data transfer tasks with multiple look ups with a SQL selects and inserts using OLEDB sources and destinations.  Every lookup encountered could be replaced with a join or a common table element.  This provided several advantages.  First, all of the data required was obtained with a single set operation, instead of multiple trips to the database to lookup values.  Second, we had much more insight into exactly where potentially “bad” rows of data existed, so we could handle those errors far more elegantly.  Third, the operations actually ran slightly faster. 

    Another reason to use those old school set operations!

     

  • Security Report

    Verizon has just issued a new set of reports on security that make for interesting reading.  One striking statistic, from my perspective as a developer, is that approximately 80% of security breaches arise from errors of omission by developers or administrators. 

     

    A cure?  All systems should go through a security test before launch and be subject to periodic mock attacks or hack attempts.

     

    The report is available at http://www.verizonbusiness.com/resources/security/databreachreport.pdf.  An executive summary is available at http://www.verizonbusiness.com/about/news/displaynews.xml?newsid=25135&mode=vzlong&lang=en&width=530

  • SSIS Error Codes

    If you have any experience with SSIS, you have come to realize that the error codes for SSIS are not the same as the error codes for SQL Server.  Personally, I would have preferred the same error code for the same error.  For example, a violation of a unique constraint in SSIS is -1071607685 while the same violation in T-SQL is 2627.

     

    SSIS error codes are typically hard to track down, but you can find them at http://msdn.microsoft.com/en-us/library/ms345164.aspx.  

     

    Is my curmudgeonly nature showing?

  • Green IT

    Jason Hiner, the executive editor of TechRepublic, recently published a story on Green IT (http://blogs.techrepublic.com.com/hiner/?p=653&tag=nl.e138 ).  In the article he claimed that Green IT is viewed by some as a fad or a myth. 

     

    Closer to the truth, I think, would be to say that the critics of “greenwashing” are pointing out that there is a great deal of marketing hype around something we as IT professionals should be doing anyway. 

     

    Part of our goal in any project is to deliver a solution with a low cost of operations.  We are not going a client any favors by creating a solution that requires more servers, with their attendant capital, licensing and operational costs.  Our code and solutions need to be operationally efficient, to minimize these costs.  If that has the happy consequence of conserving energy and preserving the planet, I am one happy man.  The bottom line for this, however, is the bottom line, not a lower carbon footprint. 

     

  • Homeland Security Department Hacked! (Are you feeling safe?)

    If you needed yet another reason to automate your security patches so they apply automatically, here it is:  http://blog.wired.com/monkeybites/2008/04/microsoft-datab.html.  

     

    Apparently this SQL injection attack compromised several hundred thousand servers including the US Department of Homeland Security (feel safe yet?). 

     

    I wonder if the folks who write code that is so open to attack lock their doors when they leave for work. 

     

    My wife has a habit of checking to make sure the door is locked when we leave the house.  She checks not once, not twice, but three times.  Perhaps she could bottle that habit and sell it to developers who do not check their inputs for malicious attacks.

     

      

  • Extracting Complex XML for ETL

    The XML data source does not handle complexity within XML very well.  For the simplest XML it is fine, but for anything that is complex – in other words, real world data – one needs to resort to Script tasks or executables to actually get to data.  Whenever you see “<xsd:complexType name=”Whatever”>” you can be sure you will have issues with the XML data source.

     

    I have been doing some work with the PROML standard (http://www.prodml.org/prodml/Default.asp).  For a glimpse at the complexity of some of these data models, have a look at http://www.witsml.org/ or at http://www.witsml.org/schemas/131/addendum/combo/doc/schema/obj_productVolume.html.  For now, I am convinced that the easiest way to handle this is within a Script task or an executable.  The steps would be:

    1. Create a proxy DLL (see http://msdn2.microsoft.com/en-us/library/aa528855.aspx for details)
    2. Add the proxy DLL as a reference to your script task
    3. Create a binding object
    4. Create an endpoint object
    5. Create a proxy object
    6. Load the proxy with data
    7. Loop through the proxy hierarchy to get the data you need.

    Binding:

     

    We found in our tests that basic HTTP binding is very slow when used for ETL.  We were able to improve performance over 75% by using TCP binding instead of HTTP binding.  This will require some coordination between the SSIS development teams and the service development teams.

     

    Endpoint and Proxy

     

    The endpoint is a one of the building blocks for the proxy object.  It simple serves to provide the service URL to the proxy.   

     

    How you use the proxy will vary by your particular situation, but typically you will need to build a request object.  In our case we used a StreamReader to read an XML file, deserialized it and converted it to the object type the service expects. 

     

    Loading the data is a simple matter of calling the right method of the proxy object.

     

    Looping

     

    Your script will loop down through the hierarchy of the complex XML until you get to the data you need.  If you have to obtain data from multiple levels then this is easily handled through each step in the loop. 

     

    Code Sample

     

    Here we fetch XML and write it to a simpler file in a SSIS script task.  You can easily write this to a database as well.

     

    Dim fileRequests As String = Dts.Variables("fileRequests").Value.ToString

            Dim serviceURI As String = Dts.Variables("myServiceURI").Value.ToString

            Dim XMLOutputFile As String = Dts.Variables("strFileName").Value.ToString + "_" + Dts.Variables("XMLOutputFile").Value.ToString

            Dim XmlOutputFolder As String = Dts.Variables("XmlOutputFolder").Value.ToString

            Dim XMLOutputPath As String = XmlOutputFolder + XMLOutputFile

            Dim binding As System.ServiceModel.NetTcpBinding  

            binding = New System.ServiceModel.NetTcpBinding

            binding.Name = "ServiceEndpoint"

            binding.MaxReceivedMessageSize = Int32.MaxValue

            binding.SendTimeout = System.TimeSpan.MaxValue

     

            Dim endpoint As System.ServiceModel.EndpointAddress = New System.ServiceModel.EndpointAddress(serviceURI)

            Dim proxy As New MyService.Client.PROD_ProductVolumeSoapClient(binding, endpoint)

     

            Try

     

                'deserialize the XML file

                Dim SrRequest As New StreamReader(fileRequests)

                Dim xmlQuerySet As New XmlSerializer(GetType(query_Object))

                Dim objRequest As New MyService.Client.query_Object

     

     

                objRequest = CType(xmlQuerySet.Deserialize(SrRequest), MyService.Client.query_Object)

     

                'get the data

                Dim objData As New MyService.Client.obj_Results

                objData = proxy.GetDataSync(objRequest, Nothing)

     

                If objData Is Nothing Then

                    ‘do something to handle the error

     

                    Exit Sub

                End If

     

                'write the data to a file

     

                Dim doc As XmlDocument = New XmlDocument()

                Dim rootNode As XmlNode

                Dim recordElement As XmlElement

                doc.LoadXml("<myResults/>")

                rootNode = doc.SelectSingleNode("//getMyResults")

     

     

                For Each resultLevel1 As obj_resultLevel1 In objData.Result

                    For Each obj_resultLevel2 As resultLevel2 In resultLevel1.obj_resultLevel2

    'do something useful with this data

                        For Each resultSet As ResultSet In obj_resultLevel2.resultSet

                            If resultSet.property Is Nothing Then

    ‘do something to handle the error

     

                            Else

                                Try

                                        If resultSet.property.uid.ToString Is Nothing Then

    'do nothing.  We have an empty data set

     

                                        Else

                                    

                                            MeasuredValue = resultSet.property.stringValue.Value.ToString

                                            recordElement = doc.CreateElement("myData")

                                            recordElement.SetAttribute("MeasuredValue", MeasuredValue)

                                            rootNode.AppendChild(recordElement)

                                        End If

                                    Catch ex As Exception

                                        'we'll need to log this bad data

                                    End Try

     

                                Next

                            End If

     

                        Next

                    Next

                Next

                doc.Save(XMLOutputPath)

              

                Dts.Variables("XMLOutputPath").Value = XMLOutputPath

     

     

            Catch ex As Exception

                ‘handle the error

                errPath = "C:\Program Files\ErrorFiles\ERROR_" + Format(Now(), "MMMddyyyyhhmmssf") + ".TXT"

                Dim swError As StreamWriter = New StreamWriter(errPath)

                swError.WriteLine("Exception Message: ")

                swError.WriteLine(ex.Message)

     

                If ex.InnerException Is Nothing Then

     

                Else

                    swError.WriteLine("Inner Exception")

                    swError.WriteLine(ex.InnerException.Message)

                    If ex.InnerException.InnerException Is Nothing Then

     

                    Else

                        swError.WriteLine("Inner Exception")

                        swError.WriteLine(ex.InnerException.InnerException.Message)

                        If ex.InnerException.InnerException.InnerException Is Nothing Then

                        Else

                            swError.WriteLine("Inner Exception")

                            swError.WriteLine(ex.InnerException.InnerException.InnerException.Message)

                        End If

                    End If

                End If

                swError.WriteLine("End Time" + CStr(Now()))

                swError.Close()

                Dts.TaskResult = Dts.Results.Failure

                Exit Sub

     

     

            End Try

     

            Dts.TaskResult = Dts.Results.Success

        End Sub

     

     

     

     

     

  • Cheaper by the Dozen?

    As I have written a few time, I think technical people have an obligation to help our non-technical fellows, particularly around issues of security.


    An interesting article appeared on Slate.com (http://www.slate.com/id/2189902/ ) concerning how much an identity thief can get for your credit card.  It turns our – not much.  Credit card numbers, by themselves, are being offered for sale over IRC for as little as $0.40 per number.

     

    The only conclusion I can draw is that the supply for these numbers far out strips the demand.  It also follows that it must be remarkably easy to compromise some systems. 


    Of course, it may also be that the people who rip off credit card numbers are also trying to rip off the people who buy those numbers, but I would pause before ascribing such nefarious motives to such simple business people just trying to make a living.

     

     

  • A SSIS challenge

    One of the more interesting blogs out there (at least for those of us with no life) is Geek Trivia.  This week’s entry, at http://blogs.techrepublic.com.com/geekend/?p=1260, describes an interesting problem in data integration.  The problem is posed by the new Large Hadron Collider (LHC) — the largest scientific instrument ever built.  The LHC will be colliding sub atomic particles to attempt to replicate conditions immediately after the big bang. 

     

    How do you handle a feed that is coming at a rate of 1.8 GB per second, store up to 15 petabytes (as in 10^15 bytes – as in more then a few) per year?  To top it off, the network can only handle 1 GB per second of data, so 0.8GB must be cached every second.  Since the cache will not be cleared before new data hits, the cache will grow as a multiple of the length of time each test will take. 

     

    An interesting challenge for a team of folks, it would seem.

     

     

     

  • Dawn Breaks Over Marblehead

    I finally had a chance to catch up on some back issues of MSDN magazine today.  It is a task I generally put off as MSDN is targeted towards developers and, with all due respect, developers know nothing about databases, right?

    Like most database folks, particularly those who do lots of Oracle work, I became accustomed to a less then rigorous approach to source code control.  You write a script, make sure it runs and put it into whatever passes for source control – even if it is only a directory on a Linux server.  With database people moving in and out of projects, this led inevitably to spaghetti coding practices that would make an old COBOL programmer blush.  It would also lead to an inability to restructure databases because you never knew what was important and what was not important or what would break if you changed anything.

    Team Foundation (TFS) allows developers to use a test driven development methodology that has numerous advantages.

    ·         It automates regression testing;

    ·         Simplifies some integration testing

    ·         Allows for easier incremental design changes

    I read Jamie Laflen’s article in MSDN with great interest.  I do not think I need to repeat his description of how to use TFS – it is pretty straightforward and can be found at http://msdn2.microsoft.com/en-us/magazine/cc164243.aspx.  He did shake me out of an old style of development.  As is said in some neighborhoods in Boston “Dawn Breaks Over Marblehead!”

    It also goes to show that old school does not mean NO SCHOOL.

     

     

More Posts Next page »

This Blog

Post Calendar

<March 2010>
SuMoTuWeThFrSa
28123456
78910111213
14151617181920
21222324252627
28293031123
45678910

Syndication