Welcome to MSDN Blogs Sign in | Join | Help

Regular Expression Flat File Source

The RegEx flat file source is one of the new community samples for SQL Server 2008 we’ve published to Codeplex. It uses regular expressions to extract values from a text file. It works similar to the flat file source, except that it’s not limited to CSV-type files (I saw a demo where the developer who created the sample used it to extract class names from a source files).

Note, when I ran the installer for the sample, it didn’t put the RegExFlatFileSource.dll file under 100\DTS\PipelineComponents. It only put the source down. We’ll either update the installer, or update the docs on the site. To get the component, I opened the project in Visual Studio, and ran a build. The project has a post build step which places the DLL under the PipelineComponents directory, and runs gacutil.exe to place it in the GAC.

Once the sample is installed, you can add it to your toolbar in Visual Studio.

image 

We’ll start off with a simple flat file example. My data looks like this:

value 1,1,2001-01-01
value 2,2,2002-02-02
value 3,3,2003-03-03
value 4,4,2004-04-04
value 5,5,2005-05-05
value 6,6,2006-06-06
value 7,7,2007-07-07
value 8,8,2008-08-08

Columns are defined using groups. If I want the component to behave like a regular CSV parser, I can use a regular expression like this:

(\w+),(\w+),(\w+)

image

If I click on the Column Mappings tab, I can see four columns have been defined.

image

Like with regular expression matches, the first match (column 0) is the entire pattern (which is the whole line in this case). The next three columns are the groups I defined in my regex.

I can also provide default names for my columns by naming the groups. This regex is a little more specific, and adds names to the groups using the ?<name> syntax supported by .NET.

 

 

 

 

 

 

 

 

 

(?<text>.*?),(?<number>\d),(?<date>\d{4}-\d{2}-\d{2})

If I look back at the Column Mappings tab, I can see the columns now have names.

image 

You’ll notice that the component provides two outputs – one for rows that match the pattern, and one for rows that don’t match. The non-matching row output will always have a single string column which contains the entire line.

image

Adding a data viewer to the Match output, I can see all of the column matches it made:

image

I really like this sample because it opens up a lot of data sources, like log files, that used to require custom parsing using a script component. I think it has a lot of potential!

Published Wednesday, July 23, 2008 8:52 PM by mmasson
Filed under:

Comments

# re: Regular Expression Flat File Source

Hi, How do i download this sample for sample test?

Thanks,

-Amol

Thursday, July 02, 2009 1:36 AM by AmolRaj

# re: Regular Expression Flat File Source

I got the sample code

Thanks.

Thursday, July 02, 2009 1:59 AM by AmolRaj
Anonymous comments are disabled
 
Page view tracker