Nearly everyone uses the flat file source adapter, and nearly everyone that uses it runs into a common set of challenges. In Denali, we’ve made a couple of key enhancements to the Flat File Source to support the most commonly asked-for options:
A common way NULL data is represented at the end of rows in delimited files is without the column delimiter. Here’s an example:
When represented as a delimited file, it’s fairly common to see this kind of representation
Note the lack of a trailing comma after the “6” in the second row. Customarily, most flat file parsers treat the lack of data on this row to mean there are NULLs in the remaining columns on the row.
Prior to Denali, SSIS took a different approach, where it would ignore any row delimiter until it believed it was parsing the last column of the row. So for example the flat file above might be parsed as:
Of course, since “HANAR,4,2” isn’t a valid number, if ShipVia was an integer column, the parse would fail. And worse yet, there was little you could do to work around the limitation; your choices were to write your own parser, convert the data before parsing it, use the Flat File Source just to parse rows, or some such.
In Denali, this kind of file is parsed differently – by default, we’ll always look for a row delimiter in unquoted data, and start a new row if it’s seen. So the table above is parsed as one would expect:
This new behavior is on by default, but can be disabled at any time using the AlwaysCheckForRowDelimiters property on the connection manager.
Another custom in delimited files is the use of a qualifier character to “escape” or embed a qualifier character into a qualified string, for example:
Can’t Buy a Thrill
Echoes, Silence, Patience & Grace
To allow the literal commas to be part of the string, the use of qualifiers is typical in a flat file representing this data. However if an apostrophe(') is used for qualifying, the apostrophe in the first row must be escaped, and a typical way to accomplish this is by doubling the qualifier character:
1148,'Can''t Buy a Thrill'
1149,'Echoes, Silence, Patience & Grace'
Prior to Denali, such files would always fail to parse: though SSIS supported qualifiers, SSIS had no support for embedded qualifiers, and would treat the first qualifier character (in this case the first apostrophe after the “n” in “Can’t Buy a Thrill”) as the end of that field, and throw an error due to the lack of a delimiter immediately after the apostrophe.
In Denali, this is fixed, and the data above is properly parsed as:
Can't wait to use the feature with embedded qualifiers. That was always an Achilles Heel for us when parsing flat files.
For the second improvement, what would happen if the embedded qualifier is not doubled, i.e.
1148,'Can't Buy a Thrill'
It should still be possible to distinguish that from a real qualifier by the lack of column delimiter next to it - will the Flat File Source do that?
I just tried it out - it doesn't look like we support that functionality. The embedded qualifier needs to be doubled.
Will there be an update to fix this issue in the older versions of SQL Server? This to me is a bug, not a feature.
No - as this new functionality is a significant change, I don't think we'd be back porting it to older versions of SQL Server.
Hmm. That's a shame. The weird thing is that Embedded Qualifiers works in the preview tab, but not in production in the 2005/2008 versions.
So does this mean you are finally going to support all of RFC 4180?
I don't get the improvement for embedded qualifiers. If it cannot parse when a single qualifier is embedded then I'd say you missed the point. The following example (all double quotes) was working in 2008 and is broken in 2012. So now I have to change all my packages? Do you think we have the ability to escape someone elses text files? Even MS Access can handle this. If we specifiy a qualifier then why not parse and return everything inside?
100, "21 E "S" STREET","LOVE SSIS, BUT THIS BLOWS"
Am I missing something here?