The next task in this project was to import the text files into a SQL Server 2008 DB.
To do this, I created an SSIS package. I will cover the package creation in another post.
The next step is really the point of this series: validating the titles and authors has become the crux of the whole thing (as data validation usually is).
The basic problem looks like this:
Column_1           | Column_2
Mark Twain         | Huck Finn
Tom Sawyer         | Mark Twain
Call Of The Wild   | Jack London
Arthur Conan Doyle | Sherlock Holmes
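In the case of Mark Twain and Tom Sawyer, both could be valid author names: there are no special characters or other giveaways, so nothing in the text itself says which column holds the author and which holds the title.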
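To make the problem concrete, here is a minimal sketch in Python (not part of the SSIS package) that runs a naive "does this look like a name?" check over the sample rows above. The pattern and the function names `looks_like_name` and `classify` are my own assumptions for illustration, not anything SSIS or SQL Server provides.

```python
import re

# Hypothetical heuristic: a "plausible name" is two to four capitalized,
# letters-only words. This is an assumption made up for the illustration.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+){1,3}$")


def looks_like_name(value: str) -> bool:
    """Return True if the value passes the naive name pattern."""
    return bool(NAME_PATTERN.match(value.strip()))


# The sample rows from the post, as (Column_1, Column_2) pairs.
rows = [
    ("Mark Twain", "Huck Finn"),
    ("Tom Sawyer", "Mark Twain"),
    ("Call Of The Wild", "Jack London"),
    ("Arthur Conan Doyle", "Sherlock Holmes"),
]


def classify(pairs):
    """For each row, report whether each column passes the name check."""
    return [(looks_like_name(c1), looks_like_name(c2)) for c1, c2 in pairs]


for (col1, col2), (ok1, ok2) in zip(rows, classify(rows)):
    print(f"{col1!r}: {ok1}  |  {col2!r}: {ok2}")
```

Every value in both columns passes the check, titles included, so a pattern-based test cannot tell which column is the author. That is why some outside source of truth is needed.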
So, I went looking for a validation method. My first thought was to use Amazon. That should be pretty simple, right? Just grab an API, plug it in, and away we go!
As it turns out? Not so simple to use Amazon from SSIS.