
Use Regular Expressions to get hyperlinks in blogs

At the Southwest Fox conference I presented a sample that calls a VB.NET server to do regular expression matching.

Here’s the sample I used. It gets some HTML from my blog, finds all the hyperlinks (it looks for the HREF attributes) and puts them into a VFP table:

 

First create a VB server, as described in "A Visual Basic COM object is simple to create, call and debug from Excel".

 

The VFP code gets the blog page as HTML, then passes that HTML to the VB server, which does the regular expression matching and returns the results as XML:

 

LOCAL oVB as vbcom.ComClass1
oVB=CREATEOBJECT("VBCom.ComClass1")
LOCAL oHTTP as "winhttp.winhttprequest.5.1"
oHTTP=NEWOBJECT("winhttp.winhttprequest.5.1")
oHTTP.Open("GET","http://blogs.msdn.com/calvin_hsia",.f.)   && .f. = synchronous request
oHTTP.Send()
cHTML=oHTTP.ResponseText
cXML=oVB.RegEx(cHTML)   && the VB server returns the matches as XML
XMLTOCURSOR(cXML)
BROWSE LAST NOWAIT

 

 

 

Add a method to the VB class (this version works in VS 2003):

 

Imports System.Text.RegularExpressions
Imports System.Xml

    Public Function RegEx(ByVal cHtml As String) As String
        ' Match href=... with the target either quoted or unquoted; group 1 captures the target
        Dim cregex As Regex = New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
             RegexOptions.IgnoreCase Or RegexOptions.Compiled)
        Dim MatchCollection As MatchCollection = cregex.Matches(cHtml)
        Dim sb As New System.Text.StringBuilder
        Dim xw As XmlTextWriter = New XmlTextWriter(New System.IO.StringWriter(sb))

        xw.WriteStartElement("VFPData")
        For Each m As Match In MatchCollection
            xw.WriteStartElement("Row") ' for each Row
            xw.WriteStartElement("RegEx") ' field name
            xw.WriteString(m.Value) ' the whole match, including the href= prefix
            xw.WriteEndElement() ' RegEx
            xw.WriteEndElement() ' Row
        Next
        xw.WriteEndElement() ' VFPData
        xw.Flush() ' make sure everything has reached sb
        Return sb.ToString
    End Function
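
The method above writes m.Value, which is the whole match including the href= prefix. If only the link target is wanted, the capture group named 1 in the same pattern holds it. Here is a minimal sketch of that idea, assuming VB 2005 or later (it uses a generic List); the module name, the LinkTargets helper and the sample HTML are made up for illustration and are not part of the original server:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HrefGroupSketch
    ' Same pattern as the RegEx method; the group named "1" captures the link target.
    Private ReadOnly HrefRegex As New Regex( _
        "href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Hypothetical helper: returns only the captured targets, without the href= prefix.
    Public Function LinkTargets(ByVal cHtml As String) As List(Of String)
        Dim targets As New List(Of String)
        For Each m As Match In HrefRegex.Matches(cHtml)
            targets.Add(m.Groups(1).Value)
        Next
        Return targets
    End Function

    Sub Main()
        ' Made-up input, not taken from the blog page.
        Dim sample As String = "<a href=""http://example.com/a"">A</a> <a HREF=b.htm >B</a>"
        For Each target As String In LinkTargets(sample)
            Console.WriteLine(target)
        Next
    End Sub
End Module
```

Note that the unquoted alternative (\S+) runs to the next whitespace character, so an unquoted target that is immediately followed by > will pick up the trailing markup as well.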

 

 

Of course, when I did the demo, I used a newer version of VB and ran a SQL-style Select over the regular expression results. I also used XLINQ, the new XML features of LINQ:

 

        Dim aList As New List(Of Match)
        For Each m In MatchCollection
            aList.Add(m)
        Next

        Dim res = Select p From p In aList Order By p.ToString()
        Dim xmlMain = <VFPData/>
        For Each item In res
            Dim xRow = <Row/>
            xRow.Add(<RegEx><%= item %></RegEx>)
            xmlMain.Add(xRow)
        Next
        Return xmlMain.ToString
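
The query above uses the syntax of the LINQ preview available at the time. In the released versions of VB with LINQ (VB 9 in Visual Studio 2008 and later), a query starts with From and the Select clause comes last. Here is a rough sketch of the same sort-and-build-XML step in that syntax; the module name and the RegExSorted function name are placeholders, and the input in Main is made up:

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Xml.Linq

Module LinqSketch
    ' Hypothetical variant of the RegEx method that sorts the matches before returning XML.
    Public Function RegExSorted(ByVal cHtml As String) As String
        Dim cregex As New Regex("href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", _
            RegexOptions.IgnoreCase Or RegexOptions.Compiled)

        ' Cast(Of Match) turns the match collection into a typed sequence that LINQ can query.
        Dim res = From p In cregex.Matches(cHtml).Cast(Of Match)() _
                  Order By p.ToString() _
                  Select p

        ' Build <VFPData><Row><RegEx>...</RegEx></Row>...</VFPData> with XML literals.
        Dim xmlMain = <VFPData/>
        For Each item In res
            xmlMain.Add(<Row><RegEx><%= item.Value %></RegEx></Row>)
        Next
        Return xmlMain.ToString()
    End Function

    Sub Main()
        ' Made-up input, not taken from the blog page.
        Console.WriteLine(RegExSorted("<a href=""b.htm"">B</a> <a href=""a.htm"">A</a>"))
    End Sub
End Module
```

The intermediate List(Of Match) from the demo code is not needed here, because Cast(Of Match) makes the collection queryable directly.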

 

 

 

 

Published Tuesday, November 22, 2005 6:53 PM by Calvin_Hsia

Comments

# re: Use Regular Expressions to get hyperlinks in blogs

Wednesday, November 23, 2005 5:48 PM by davidfung
This and the previous VBCOM Debugging post show that VFP and VB.NET are quite interoperable. VB.NET COM objects sound like a good way to allow VFP to make use of .NET features down the road...

# The VB version of the Blog Crawler

Monday, June 12, 2006 8:26 AM by Calvin Hsia's WebLog
This is the VB.Net 2005 version of the Blog Crawler. It’s based on the Foxpro version, but it uses SQL...

# Create a .Net UserControl that calls a web service that acts as an ActiveX control to use in Excel, VB6, Foxpro

Friday, July 14, 2006 12:48 PM by Calvin Hsia's WebLog
Here’s how you can use Visual Studio to create a .Net User Control that will act as an ActiveX control...
