[Update: As Paul Owen notes in the comments to this post, there were issues with the markup as presented in the post. I removed the formatting and replaced the XSLT to clarify the stylesheet.]
In the SQLXML group on Yahoo! Groups, Jonathan Smith asks:
I'm trying to find a way to convert a .csv file to xml using stylesheets and I can't seem to find anything. I did read about something called fxml or something like that for flat files but I don't think it's a standard. Can anyone help me in this regard?
This is certainly doable with XSLT, provided that the data contains only characters allowable in XML. If not, you will have some data scrubbing and character escaping to do to represent the same data as XML. In the simple case of "<" and "&" characters, this is easily done with escapes like &lt; and &amp;. For values outside the character ranges that XML allows (most control characters, for example), this is going to require some other scrubbing and character replacement, because such characters are not well-formed XML even when written as numeric character references, so the parser rejects the document before XSLT ever sees it.
The first order of business is to put a root tag on the data. This will allow us to load the data into a DOM and ensure that the data can be represented as XML.
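As an illustration, a small wrapped file might look like the following. The element name `root` and the sample data are assumptions made for this sketch, not something the approach requires. The text starts immediately after the opening tag and ends immediately before the closing tag, so that no blank first or last line is introduced.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample: raw CSV text wrapped in a single root element. -->
<!-- Any "<" or "&" in the data would need to be escaped before this step. -->
<root>name,city,age
Alice,London,30
Bob,Paris,25</root>
```

If this loads into a DOM without a parse error, the data can be represented as XML and is ready for the transformation.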
Once we have an XML document where all of the data is well-formed, we next need to start the transformation process. Logically, we can break the problem into two steps. We need to get the rows of data, and then break each row into its individual fields.
The XPath string function substring-before() can be used to grab all of the data before the first line break. The function substring-after() can then be used to grab all of the data after the first line break. (An XML parser normalizes carriage returns and carriage return/line feed pairs to a single line feed, so the character to search for is the line feed, &#10;.) We just call the same template recursively until the string we are processing no longer has any line breaks, at which point we add the remaining string to the result tree.
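A minimal sketch of this row-splitting step is shown below. It is not the stylesheet from the original post. The template name `split-rows`, the parameter name `text`, and the element names `root`, `rows`, and `row` are all assumptions chosen for illustration. It also assumes the text has no trailing line break.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: pass the text of the wrapper element (assumed to be
       named "root") to the recursive row splitter. -->
  <xsl:template match="/root">
    <rows>
      <xsl:call-template name="split-rows">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </rows>
  </xsl:template>

  <!-- Hypothetical named template: emits one <row> per line of text. -->
  <xsl:template name="split-rows">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A line feed remains: emit the first line, recurse on the rest. -->
      <xsl:when test="contains($text, '&#10;')">
        <row><xsl:value-of select="substring-before($text, '&#10;')"/></row>
        <xsl:call-template name="split-rows">
          <xsl:with-param name="text" select="substring-after($text, '&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No line feed left: the remaining string is the last row. -->
      <xsl:otherwise>
        <row><xsl:value-of select="$text"/></row>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Because the recursion goes one level deeper per line, very large files may exhaust the stack in processors that do not optimize tail calls.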
To break up the columns in each row, we go through the same process. Instead of looking for carriage returns, we look for the first comma. We then call ourselves recursively with the rest of the string until the string contains no more commas.
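The same pattern, applied to commas, might be sketched as follows. Again, this is an illustration and not the original listing. The names `split-columns`, `text`, `root`, `row`, and `column` are assumptions. To stay self-contained, this sketch treats the whole content of the wrapper element as a single row, for example `<root>Alice,London,30</root>`. It does not handle quoted fields that contain commas.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: treat the text of the wrapper element (assumed to be
       named "root") as one comma-separated row. -->
  <xsl:template match="/root">
    <row>
      <xsl:call-template name="split-columns">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </row>
  </xsl:template>

  <!-- Hypothetical named template: emits one <column> per field. -->
  <xsl:template name="split-columns">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A comma remains: emit the first field, recurse on the rest. -->
      <xsl:when test="contains($text, ',')">
        <column><xsl:value-of select="substring-before($text, ',')"/></column>
        <xsl:call-template name="split-columns">
          <xsl:with-param name="text" select="substring-after($text, ',')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No comma left: the remaining string is the last field. -->
      <xsl:otherwise>
        <column><xsl:value-of select="$text"/></column>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```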
Now that we have the two templates in place, we can assemble the entire stylesheet.
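One way the complete stylesheet could look is sketched below. This is a reconstruction under stated assumptions and not the original listing: the wrapper element is assumed to be named `root`, and the template names (`split-rows`, `split-columns`), the parameter name `text`, and the output element names (`rows`, `row`, `column`) are hypothetical. The first line of the file is treated as ordinary data, and quoted fields containing commas are not handled.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: hand the text of the wrapper element to the row splitter. -->
  <xsl:template match="/root">
    <rows>
      <xsl:call-template name="split-rows">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </rows>
  </xsl:template>

  <!-- Step 1: one <row> per line; each line is passed to the column splitter. -->
  <xsl:template name="split-rows">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A line feed remains: process the first line, recurse on the rest. -->
      <xsl:when test="contains($text, '&#10;')">
        <row>
          <xsl:call-template name="split-columns">
            <xsl:with-param name="text" select="substring-before($text, '&#10;')"/>
          </xsl:call-template>
        </row>
        <xsl:call-template name="split-rows">
          <xsl:with-param name="text" select="substring-after($text, '&#10;')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Last line; skipped when empty, e.g. after a trailing line break. -->
      <xsl:when test="string-length($text) &gt; 0">
        <row>
          <xsl:call-template name="split-columns">
            <xsl:with-param name="text" select="$text"/>
          </xsl:call-template>
        </row>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Step 2: one <column> per comma-separated field. -->
  <xsl:template name="split-columns">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A comma remains: emit the first field, recurse on the rest. -->
      <xsl:when test="contains($text, ',')">
        <column><xsl:value-of select="substring-before($text, ',')"/></column>
        <xsl:call-template name="split-columns">
          <xsl:with-param name="text" select="substring-after($text, ',')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No comma left: the remaining string is the last field. -->
      <xsl:otherwise>
        <column><xsl:value-of select="$text"/></column>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

The row template does the outer recursion and delegates each line to the column template, which mirrors the two logical steps described earlier.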
The result of the transformation is:
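The original output listing is not reproduced here. As an illustration only, assume a hypothetical three-line input (name,city,age / Alice,London,30 / Bob,Paris,25) and a stylesheet that applies the two recursive steps described above, emitting hypothetical `rows`, `row`, and `column` elements. The result would then take roughly the following shape. Exact indentation, and whether an XML declaration is written, depend on the processor and its output settings.

```xml
<rows>
  <row>
    <column>name</column>
    <column>city</column>
    <column>age</column>
  </row>
  <row>
    <column>Alice</column>
    <column>London</column>
    <column>30</column>
  </row>
  <row>
    <column>Bob</column>
    <column>Paris</column>
    <column>25</column>
  </row>
</rows>
```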