Hartmut Maennel's Blog

A LINQ provider for Web queries

To start a series of "LINQ provider" posts, today I am uploading a provider sample that, in some sense, treats the Internet as a database. For a SQL Server database, you make tables accessible to LINQ by writing classes with attributes that define how objects of these classes are retrieved from rows in the tables; LINQ can then use these classes to issue queries against the database. Similarly, this provider lets you add attributes to classes to specify how their objects are retrieved from Web pages, and you can then issue LINQ queries against them.

The project "WebLinq" in the attached solution contains this provider. It is not very sophisticated and consists of just three files:
- WebLinqAttributes.cs contains the attributes that are recognized
- WebContext.cs contains the class your WebLinq-enabled classes inherit from
- Utils.cs contains helper functions to GET / POST to a web site and to find substrings in a text.
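
The substring helper mentioned for Utils.cs could look roughly like the following. This is only a minimal sketch: the names `TextUtil` and `Between` are hypothetical and not taken from the attached solution, and the GET / POST part is left out so the example runs without network access.

```csharp
using System;

public static class TextUtil
{
    // Hypothetical helper: returns the text between the first occurrence of
    // `start` (searching from index `from`) and the next occurrence of `end`.
    // Returns null if either marker is not found.
    public static string Between(string text, string start, string end, int from = 0)
    {
        int s = text.IndexOf(start, from, StringComparison.Ordinal);
        if (s < 0) return null;
        s += start.Length;                       // skip past the start marker
        int e = text.IndexOf(end, s, StringComparison.Ordinal);
        if (e < 0) return null;
        return text.Substring(s, e - s);
    }
}

public static class Program
{
    public static void Main()
    {
        string page = "<title>Support Vector Machines</title>";
        // Prints: Support Vector Machines
        Console.WriteLine(TextUtil.Between(page, "<title>", "</title>"));
    }
}
```

Returning null for a missing marker (instead of throwing) keeps the caller simple when a Web page does not contain the expected part.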

The project "WebSources" defines classes for:
- Searching for articles in the CiteSeer web sites (see below)
- Searching for articles in the MSDN web sites
- Translating words / sentences
- Integrating functions of one variable
- Looking up the current values of stocks from the company symbol

The project "SimpleDemos" uses these two DLLs to demonstrate the last three classes.

The project "TestWebLinq" demonstrates the access to the CiteSeer web sites.

CiteSeer is a database of computer science articles. You can search for articles by keyword, obtain information about them, and often even retrieve them directly from the Web site.
To use the CiteSeer demo, enter for example "Support Vector Machines" in the text box labeled "Search terms" and click the "Retrieve" button. It takes a while to visit the web pages that list the available articles, visit the web page for each article, retrieve the information about the article, and access another web page for details. After that you should see a list of paragraphs, each of which contains
- Author's name(s)
- Title and year
- Some three lines of introduction
- URL for this article
- URL for downloading the article as pdf file
- Information about the rights for this article

If you are only interested in newer articles, enter 2002 in the "Publication year >=" text field and click "Retrieve" again (currently I get 3 results back).

Here is how the corresponding query looks in the code:

var doc = new GoogleCiteSeer(searchTerms, 0);
var query = from art in doc.Articles
            where art.details.Document != null
               && art.details.Document.bibtex != null
               && art.details.Document.bibtex.year >= minYear
            select art.details;
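
To try the shape of this query without visiting any web pages, here is a small self-contained sketch that runs the same `where` / `select` over in-memory objects. The classes `Article`, `Details`, `Doc` and `BibTex` and the method `NewerThan` are hypothetical stand-ins that only mirror the member names used in the query; they are not the classes from the WebSources project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins that mirror only the member names used in the query.
public class BibTex  { public int year; }
public class Doc     { public BibTex bibtex; }
public class Details { public Doc Document; public string Url; }
public class Article { public Details details; }

public static class QueryDemo
{
    // Same filter as in the post: skip articles without a document or without
    // BibTeX data, then keep those published in or after minYear.
    public static List<Details> NewerThan(IEnumerable<Article> articles, int minYear)
    {
        var query = from art in articles
                    where art.details.Document != null
                       && art.details.Document.bibtex != null
                       && art.details.Document.bibtex.year >= minYear
                    select art.details;
        return query.ToList();
    }

    public static void Main()
    {
        var articles = new List<Article>
        {
            new Article { details = new Details { Url = "a", Document = new Doc { bibtex = new BibTex { year = 1998 } } } },
            new Article { details = new Details { Url = "b", Document = new Doc { bibtex = new BibTex { year = 2003 } } } },
            new Article { details = new Details { Url = "c", Document = null } },
            new Article { details = new Details { Url = "d", Document = new Doc { bibtex = null } } },
        };

        foreach (var d in NewerThan(articles, 2002))
            Console.WriteLine(d.Url);   // only "b" passes all three conditions
    }
}
```

The two null checks come first because `&&` short-circuits, so the `year` comparison is only evaluated when the BibTeX part was actually found.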

Here is an example of a class that defines how to read the "BibTeX" part of the Web page with the details for an article:

public class CsBibTex {
  [StartPart("author = \"")] [EndPart("\"")] public string author;
  [StartPart("title = \"")]  [EndPart("\"")] public string title;
  [StartPart("year = ")]     [EndPart(",")]  public int year;
}
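
To illustrate how attributes like these can drive the extraction, here is a minimal sketch that fills the fields of such a class by reflection. It is an assumption about one possible approach, not the actual WebLinq implementation: the attribute definitions, the `Marker` property and the `Scraper.Parse` method are hypothetical, and the input is a hard-coded string rather than a downloaded page.

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Hypothetical attribute definitions; the real ones live in WebLinqAttributes.cs.
[AttributeUsage(AttributeTargets.Field)]
public sealed class StartPartAttribute : Attribute
{
    public string Marker { get; }
    public StartPartAttribute(string marker) { Marker = marker; }
}

[AttributeUsage(AttributeTargets.Field)]
public sealed class EndPartAttribute : Attribute
{
    public string Marker { get; }
    public EndPartAttribute(string marker) { Marker = marker; }
}

public class CsBibTex
{
    [StartPart("author = \"")] [EndPart("\"")] public string author;
    [StartPart("title = \"")]  [EndPart("\"")] public string title;
    [StartPart("year = ")]     [EndPart(",")]  public int year;
}

public static class Scraper
{
    // For each public field carrying both attributes, take the text between
    // the start marker and the following end marker and convert it to the
    // field's type. Fields whose markers are not found keep their defaults.
    public static T Parse<T>(string text) where T : new()
    {
        object result = new T();
        foreach (FieldInfo field in typeof(T).GetFields())
        {
            var start = field.GetCustomAttribute<StartPartAttribute>();
            var end = field.GetCustomAttribute<EndPartAttribute>();
            if (start == null || end == null) continue;

            int s = text.IndexOf(start.Marker, StringComparison.Ordinal);
            if (s < 0) continue;
            s += start.Marker.Length;
            int e = text.IndexOf(end.Marker, s, StringComparison.Ordinal);
            if (e < 0) continue;

            string raw = text.Substring(s, e - s).Trim();
            field.SetValue(result,
                Convert.ChangeType(raw, field.FieldType, CultureInfo.InvariantCulture));
        }
        return (T)result;
    }

    public static void Main()
    {
        string page = "author = \"A. Author and B. Author\",\n" +
                      "title = \"An Example Title\",\n" +
                      "year = 1999,\n";
        CsBibTex bib = Parse<CsBibTex>(page);
        Console.WriteLine(bib.author + ": " + bib.title + " (" + bib.year + ")");
    }
}
```

The `int` field works because the extracted substring is converted to the declared field type, which is why the `year` entry uses `,` rather than a quote as its end marker.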

This sample code is provided as-is and does not come with any warranty.
You can modify and use the code for commercial and non-commercial purposes.

Published Monday, June 12, 2006 5:14 PM by Hartmut Maennel

Attachment(s): WebLinq.zip

Comments

 

Charlie Calvert's Community Blog said:

This week I heard a number of people on the C# team talking about the hot new gaming technology for Windows and the Xbox called XNA. The big news is that a beta of the XNA Game Studio Express is available as a free download.
September 8, 2006 9:15 PM
 

Charlie Calvert's Community Blog said:

Here are some useful links to LINQ information. Use the comments or write me if you want to add to this

February 28, 2008 2:27 PM
 

Charlie Calvert's Community Blog said:

I've recently updated the list of LINQ Providers found on my Links to LINQ page, accessible from the

February 28, 2008 2:47 PM
 

TerryLee said:

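Microsoft introduced LINQ in .NET 3.5, and now all kinds of LINQ providers are flying around everywhere. I just saw a list of LINQ providers on a foreign site, nearly 30 or more: LINQ to Amazon, LINQ to...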

March 1, 2008 7:38 AM
 

Hecgo.com » Linq to ... everything: A List of LINQ Providers said:

March 3, 2008 2:35 AM
 

Hilton Giesenow's Jumbled Mind said:

I mentioned in a post a little while ago about the various LINQ To projects I had seen, but Charlie Calvert

March 19, 2008 3:06 AM
 

Carlos Fernando Paleo da Rocha
SBS MVP in Brazil
said:

LINQ Providers LINQ to Amazon LINQ to Active Directory LINQ over C# project LINQ to CRM LINQ To Geo

March 22, 2008 10:47 AM
 

Tecnologias said:

LINQ Providers LINQ to Amazon LINQ to Active Directory LINQ over C# project LINQ to CRM LINQ To Geo

March 22, 2008 11:17 AM
 

Jacques Snyman » LINQ To … said:

March 27, 2008 9:20 AM
 

LINQ to [AnyWhere] said:

April 10, 2008 6:34 AM
 

LINQ to [AnyWhere] said:

April 22, 2008 5:09 PM
 

Charlie Calvert's Community Blog said:

Here are some useful links to LINQ information. Use the comments or write me if you want to add to this

September 19, 2008 8:40 PM
 

Alex Krakovetskiy's blog said:

Official: LINQ to SQL (DLINQ) LINQ to XML (XLINQ) LINQ to XSD LINQ to Entities BLINQ PLINQ Unofficial

November 11, 2008 1:57 PM
 

Oleksandr Krakovetskyi - personal blog said:

Official: LINQ to SQL (DLINQ) LINQ to XML (XLINQ) LINQ to XSD LINQ to Entities BLINQ PLINQ Unofficial

November 18, 2008 6:24 AM
 

A List of LINQ Providers « vincenthome’s Tech Clips said:

November 29, 2008 12:28 PM
 

knom's developer corner said:

This weekend I’ve built a small application, which queries the “Simpsons” seasons guide data and updates

April 27, 2009 5:24 AM
 
