<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Media And Microcode : Resolve-Link</title><link>http://blogs.msdn.com/mediaandmicrocode/archive/tags/Resolve-Link/default.aspx</link><description>Tags: Resolve-Link</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Microcode: PowerShell Scripting Tricks: Scripting the Web (Part 3) (Resolve-Link, Get-WebPageLink)</title><link>http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/12/microcode-powershell-scripting-tricks-scripting-the-web-part-3-resolve-link-get-webpagelink.aspx</link><pubDate>Fri, 12 Dec 2008 12:59:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9201602</guid><dc:creator>JamesBrundage</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/mediaandmicrocode/comments/9201602.aspx</comments><wfw:commentRss>http://blogs.msdn.com/mediaandmicrocode/commentrss.aspx?PostID=9201602</wfw:commentRss><description>&lt;P&gt;The first post in this series was learning to crawl.&amp;nbsp; I introduced &lt;A href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/01/microcode-powershell-scripting-tricks-scripting-the-web-part-1-get-web.aspx" mce_href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/01/microcode-powershell-scripting-tricks-scripting-the-web-part-1-get-web.aspx"&gt;Get-Web&lt;/A&gt;, which allows you to use System.Net.WebClient to download web sites in a variety of ways.&amp;nbsp; The next post was learning to walk.&amp;nbsp; I showed &lt;A href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/08/microcode-powershell-scripting-tricks-scripting-the-web-part-2-get-markuptag.aspx" 
mce_href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/08/microcode-powershell-scripting-tricks-scripting-the-web-part-2-get-markuptag.aspx"&gt;Get-MarkupTag&lt;/A&gt;, which helps coerce parts of the web into XML.&amp;nbsp; Now we can start to really have some fun with the data and run wild.&lt;/P&gt;
&lt;P&gt;Pulling out semi-structured data is one thing, but it’s important to be able to pull out more complex information as well.&amp;nbsp; One interesting case is pulling out all of the links from a webpage.&amp;nbsp;&amp;nbsp; This task breaks down into four smaller tasks:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Downloading the page (done with Get-Web) 
&lt;LI&gt;Getting the &amp;lt;a&amp;gt; tags in a meaningful way (done with &lt;A href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/08/microcode-powershell-scripting-tricks-scripting-the-web-part-2-get-markuptag.aspx" mce_href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/12/08/microcode-powershell-scripting-tricks-scripting-the-web-part-2-get-markuptag.aspx"&gt;Get-MarkupTag&lt;/A&gt;) 
&lt;LI&gt;Extracting the href attribute 
&lt;LI&gt;Determining if the link is relative or absolute, and resolving it to an absolute url &lt;/LI&gt;&lt;/OL&gt;
&lt;P&gt;To determine if the link is relative or absolute, I made a Resolve-Link function.&amp;nbsp; It takes a base url (e.g., &lt;A href="http://www.foo.com/blah/blah.asp" mce_href="http://www.foo.com/blah/blah.asp"&gt;http://www.foo.com/blah/blah.asp&lt;/A&gt;) and a link found on that page, and returns the absolute url the link resolves to.&amp;nbsp; With the -returnLinkType switch, it instead returns a property bag with the type of link and the resolved link.&lt;/P&gt;
&lt;P&gt;Here’s Resolve-Link:&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE class=CmdletDefinition&gt;function Resolve-Link([Uri]$uri,
    [string]$link,
    [switch]$returnLinkType) {
    #.Synopsis
    #   Resolves a relative or absolute link to an absolute url
    #.Description
    #   Takes a uri and a link to a page and returns the absolute url, or
    #   optionally returns a property bag with the link type
    #   (absolute, relative, or host relative) and the link
    #.Parameter uri
    #   The uri the link is located on
    #.Parameter link
    #   The original link text
    #.Parameter returnLinkType
    #   If set, returns a property bag with the link type and the resolved link
    #.Example
    #   Resolve-Link http://www.microsoft.com/ /technet/scriptcenter
    if ($link.StartsWith("/")) {
        # Relative to Host site
        if ($returnLinkType) {
            return New-Object Object |
                Add-Member NoteProperty Type "Host Relative" -PassThru |
                Add-Member NoteProperty Link ([uri]"$($uri.Scheme)://$($uri.Authority)$($link)") -PassThru
        }
        return "$($uri.Scheme)://$($uri.Authority)$($link)"
    } else {
        if ($link -match "^[A-Za-z][A-Za-z0-9+.-]*://") {
            # Absolute link (any scheme, so an https link on an http page also counts)
            if ($returnLinkType) {
                return New-Object Object |
                    Add-Member NoteProperty Type "Absolute" -PassThru |
                    Add-Member NoteProperty Link ([uri]$link) -PassThru
            }            
            return $link
        } else {
            # Relative link
            $realLink = $uri.AbsoluteUri.Substring(0,
                $uri.AbsoluteUri.LastIndexOf("/")) + "/$link"    
            if ($returnLinkType) {
                return New-Object Object |
                    Add-Member NoteProperty Type "Relative" -PassThru |
                    Add-Member NoteProperty Link ([uri]$realLink) -PassThru
            }
            return $realLink            
        }
    }    
}&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;
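&lt;P&gt;As a quick sanity check, here is a small sketch of how Resolve-Link might be called with each of the three kinds of link.&amp;nbsp; The foo.com and bar.com addresses are placeholders rather than real pages, and the results noted in the comments are what the function above should produce for them.&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE&gt;$page = "http://www.foo.com/blah/blah.asp"   # placeholder base url
# Host relative: expect http://www.foo.com/technet/scriptcenter
Resolve-Link $page "/technet/scriptcenter"
# Relative: expect http://www.foo.com/blah/other.asp
Resolve-Link $page "other.asp"
# Absolute: returned unchanged
Resolve-Link $page "http://www.bar.com/"
# With the switch, you get a property bag with Type and Link instead
Resolve-Link $page "other.asp" -returnLinkType | Format-List Type, Link&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;
&lt;P&gt;For comparison only (this is not what Resolve-Link does internally), the System.Uri class in .NET can also combine a base address with a link, which can be handy for double-checking a result:&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE&gt;$base = [Uri]"http://www.foo.com/blah/blah.asp"
# Uses the Uri(Uri, String) constructor to resolve the link against the base
(New-Object Uri $base, "other.asp").AbsoluteUri&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;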
&lt;P&gt;With Resolve-Link written, making Get-WebPageLink is a snap.&amp;nbsp; It’s below: only 3 lines do the real work, and the other 10 lines explain the work and give an example.&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE class=CmdletDefinition&gt;function Get-WebPageLink($url) {
    #.Synopsis
    #   Returns all of the links within a webpage
    #.Description
    #   Resolves all &amp;lt;A&amp;gt; references and returns a property bag with
    #   the text contained in the link, the resolved link,
    #   and the type of link returned (absolute, host relative, or relative)
    #.Parameter url
    #   The page to get links from
    #.Example
    #   Get-WebPageLink http://blogs.msdn.com/
    Get-MarkupTag a (Get-Web $url) | Foreach-Object {
        Resolve-Link $url $_.Xml.Href -returnLinkType |
            Add-Member NoteProperty Text $_.Xml."#text" -PassThru 
    }
}&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;
&lt;P&gt;Go ahead and give Get-WebPageLink a whirl:&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE&gt;Get-WebPageLink http://blogs.msdn.com&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;
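&lt;P&gt;Since each result is a property bag with Type, Link, and Text properties, the usual pipeline cmdlets apply.&amp;nbsp; Here are a couple of things you might try; this is only a sketch, and it assumes Get-Web, Get-MarkupTag, Resolve-Link, and Get-WebPageLink are already loaded in your session:&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE&gt;$links = Get-WebPageLink http://blogs.msdn.com
# How many links of each type are on the page?
$links | Group-Object Type | Select-Object Name, Count
# Only the links that were already absolute, with their text
$links | Where-Object { $_.Type -eq "Absolute" } | Select-Object Text, Link&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;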
&lt;P&gt;Ready for some real fun? Remember way back when I did a post about getting RSS feeds in PowerShell with Microsoft.FeedsManager (&lt;A href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/11/11/microcode-scripting-rss-feeds-with-powershell-and-microsoft-feedsmanager.aspx" mce_href="http://blogs.msdn.com/mediaandmicrocode/archive/2008/11/11/microcode-scripting-rss-feeds-with-powershell-and-microsoft-feedsmanager.aspx"&gt;Get-Feed&lt;/A&gt;)?&amp;nbsp; If you have that script handy, go ahead and check out this one-liner, which will refresh every RSS item you’ve got and extract all of the links from it.&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE&gt;    
Get-Feed -recurse -articles | Foreach-Object { Get-WebPageLink $_.Link }&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;
&lt;P&gt;That particular command line can take a while, depending on how many blogs you subscribe to, but it gives you a brand new view on blogs (as a simmering stew of scripts, rather than just text to be read and comprehended).&lt;/P&gt;
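&lt;P&gt;One way to explore the results, sketched under the assumption that Get-Feed and the functions from this series are loaded: since the Link property is a [Uri], you can group by host to see which sites your feeds link to most.&lt;/P&gt;&lt;I&gt;
&lt;BLOCKQUOTE&gt;&lt;PRE&gt;Get-Feed -recurse -articles |
    Foreach-Object { Get-WebPageLink $_.Link } |
    Group-Object { $_.Link.Host } |
    Sort-Object Count -Descending |
    Select-Object Name, Count -First 10&lt;/PRE&gt;&lt;/BLOCKQUOTE&gt;&lt;/I&gt;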
&lt;P&gt;There’s more fun to come in unlocking the web, but these two scripts should get you started extracting a little more from the wild world of the web.&lt;/P&gt;
&lt;P&gt;Hope this Helps,&lt;/P&gt;
&lt;P&gt;James Brundage [MSFT]&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9201602" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/PowerShell/default.aspx">PowerShell</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Microcode/default.aspx">Microcode</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Scripting+Tricks/default.aspx">Scripting Tricks</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Get-Feed/default.aspx">Get-Feed</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Get-Web/default.aspx">Get-Web</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Get-MarkupTag/default.aspx">Get-MarkupTag</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Get-WebPageLink/default.aspx">Get-WebPageLink</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Resolve-Link/default.aspx">Resolve-Link</category><category domain="http://blogs.msdn.com/mediaandmicrocode/archive/tags/Scripting+The+Web/default.aspx">Scripting The Web</category></item></channel></rss>