This is a continuation of An Advanced Look at Web Services and DataSources.  The original entry is located here:   http://windowsliveagents.spaces.live.com/blog/cns!5BCD45E519E07634!711.entry
 

Now let’s take a look at the datasource itself.  The datasource is essentially a function, with arguments to pass in and variables to return. In the preprocess section, the XML SOAP request string is built and stored in the POST_DATA variable, which is then sent in the postdata section.  The actual web service URL is stated in the http section.

In the preprocess section, there are also two built-in variables that can be used, LIMIT and OFFSET.  These two variables are used to ‘page’ results in a cursor.  In the datasource below, we look at LIMIT to populate a variable called MAXRESULTS. The MAXRESULTS variable is then used in the COUNT element of the request (in this case 10) to bring back 10 results per request.  If the user needs more, the datasource starts at the next row and retrieves 10 more results.
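The paging rule can be sketched outside Buddyscript as well. The Python below is only an illustration of the logic, not Buddyscript and not part of the agent runtime; the function names clamp_max_results and page_offsets are hypothetical.

```python
def clamp_max_results(limit, page_size=10):
    """Mirror the preprocess rule: use page_size when LIMIT is out of range."""
    if limit > page_size or limit <= 0:
        return page_size
    return limit


def page_offsets(total, max_results):
    """Yield the OFFSET to send with each successive request."""
    offset = 0
    while offset < total:
        yield offset
        offset += max_results
```

With 25 total results and a page size of 10, the requests would start at offsets 0, 10 and 20.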

The simple xml section is a hierarchical representation of the XML response from the SOAP API, which is flattened into a two-dimensional layout when the data is retrieved.  Indentation signifies a parent-child relationship. The {loop=content} statement acts as a loop, iterating through the repeated elements of the XML.  The end nodes (such as Title, Url, or Year) are the fields that are used to capture information and pass it back to the calling routine.  Note that fields can be skipped in the simple xml section if the user does not need them.
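To make the flattening idea concrete, here is a rough Python analogy using the standard xml.etree.ElementTree module. It is a sketch under assumptions: the sample response is made up, it omits the XML namespaces a real SOAP response would carry, and the function name flatten is hypothetical. It is not how the agent platform implements simple xml.

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free sample shaped like the hierarchy in the datasource.
SAMPLE = """
<Envelope><Body><SearchResponse><Response><Responses><SourceResponse>
  <Offset>0</Offset>
  <Total>2</Total>
  <Results>
    <Result>
      <Title>First headline</Title>
      <Source>Example Times</Source>
      <DateTime><Year>2008</Year><Month>7</Month></DateTime>
    </Result>
    <Result>
      <Title>Second headline</Title>
      <Source>Example Post</Source>
      <DateTime><Year>2008</Year><Month>7</Month></DateTime>
    </Result>
  </Results>
</SourceResponse></Responses></Response></SearchResponse></Body></Envelope>
"""


def flatten(xml_text):
    """Turn the nested response into flat rows plus paging information."""
    root = ET.fromstring(xml_text.strip())
    source = root.find("Body/SearchResponse/Response/Responses/SourceResponse")
    rows = []
    # Equivalent in spirit to {loop=content}: one row per Result element.
    for result in source.find("Results").findall("Result"):
        rows.append({
            "Title": result.findtext("Title"),
            "Source": result.findtext("Source"),
            "NewsYear": result.findtext("DateTime/Year"),
            "NewsMonth": result.findtext("DateTime/Month"),
        })
    info = {
        "Offset": int(source.findtext("Offset")),
        "MaxCount": int(source.findtext("Total")),
    }
    return rows, info
```

Each Result becomes one flat row, and the nested DateTime children become ordinary columns, which is the two-dimensional view described above.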

Note that simple xml makes no provision for a “dynamic” representation of the XML response.  In essence, you would potentially have to write a different datasource function for each search type.  Generating an advanced datasource whose output differs depending on the search type would require writing the datasource in Buddyscript. We’ll cover this in a different blog.

datasource LiveSearchAPI(SEARCH, CULTURE_INFO) => Title, Description, Url, Source, NewsYear, NewsMonth, NewsDay, NewsHour, NewsMinute, NewsSecond {expire="in 1 hour" continue_on_error="true" timeout="15" }

  preprocess

    if LIMIT>10 || LIMIT<=0

      MAXRESULTS = 10

    else

      MAXRESULTS = LIMIT

    FIELDLIST = "Title Description Url Source DateTime"

    POST_DATA = BuildSearchAPIPostData(SEARCH, "News", OFFSET, MAXRESULTS, CULTURE_INFO, FIELDLIST)

  http

    http://soap.search.msn.com:80/webservices.asmx

    header

      Accept: application/soap+xml

    postdata {encode=no}

      POST_DATA

  simple xml

    Envelope

      Body

        SearchResponse

          Response

            Responses

              SourceResponse

                Offset => RESULTOFFSET    // Where we're starting from.

                Total => TOTAL     // Total number of results.  

                Results

                  Result {loop=content}

                    Title

                    Description

                    Url

                    Source

                    DateTime

                      Year

                      Month

                      Day

                      Hour

                      Minute

                      Second

  postprocess

    INFO.Offset = RESULTOFFSET

    INFO.MaxCount = TOTAL

    return INFO

There are other datasource properties to consider, either to improve performance or to deal with potential errors in retrieving information from the datasource.  The first is the Timeout property, which lengthens or shortens the time the datasource waits before it gives up on the web service.  The default value is 10 seconds; in our case, we set it to 15 seconds.  The next is the continue_on_error property.  When this property is set to ‘yes’, execution continues after an error and the datasource caller can retrieve the error message from the SYS.Data.Error variable.  This applies only to those sources that call the ABErrorProc.  The final property, Expire, is very important.  It determines how long retrieved data remains valid, i.e. kept in cached memory.  Caching retrieved data in memory improves performance on repeated requests to the datasource.  When choosing a value, you should consider these factors:

1) how often does the data change?

2) how often will the same retrieved data be asked again?

3) how large is the retrieved data set?

4) server memory cache size (N/A on hosted applications)

5) how fast does the web service perform?

All of these are considerations.  In our case, since news items change frequently, we’ll set it for a relatively short time period, say 1 hour. 

Examples of the Expire property:

Expire="never" /* this is the default expiration for most non-Buddyscript datasources */

Expire="in 1 hour"

Expire="now"  /* no caching at all, same as “never” */

Expire="tomorrow at 5am" /* Note that this time is the server time, not the client time.  In hosted applications, this is in GMT time */
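For readers who think in general-purpose code, the effect of a relative expiration such as "in 1 hour" can be approximated by a small time-to-live cache. This Python sketch is an analogy only; the class ExpiringCache and its clock parameter are hypothetical and say nothing about how the agent server actually stores cached results.

```python
import time


class ExpiringCache:
    """Keep fetched values until their time-to-live runs out."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the behavior can be tested
        self.store = {}         # key -> (expiry time, value)

    def get(self, key, fetch):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]     # still valid: no call to the web service
        value = fetch(key)      # missing or expired: fetch again
        self.store[key] = (now + self.ttl, value)
        return value
```

A time-to-live of 3600 seconds corresponds to the one-hour setting used for the news datasource, while a time-to-live of 0 forces a fetch on every call.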

The postprocess section is important for returning a range of information.  For datasources that do not themselves handle offsets and limits (i.e. simple xml), if the postprocess section is missing, the QueryServer will process the data coming back from the datasource in its entirety. When the output is a single entity or a single row, or when the amount of data to be processed is small, the postprocess section is not needed.

  postprocess

    INFO.Offset = RESULTOFFSET

    INFO.MaxCount = TOTAL

    return INFO

The postprocess section sets the offset and the total count of rows in a variable.  Buddyscript then uses this variable to control the display of output.

In this case, INFO is the name of an object variable. The names of the variables inside the object are Offset and MaxCount, and their values are populated from these lines in the simple xml section of the datasource:

                Offset => RESULTOFFSET    // Where we're starting from.

                Total => TOTAL     // Total number of results.  

Finally, here is a crude routine to pass a request to the Live Search API, access the web service and display the contents of the data, using Buddyscript code to control the amount of data coming in.

? Tell me some news about STRING=Anything

  LOCALE="en-us"

  TITLE, DESCRIPTION, LINK, SOURCE, YEAR, MONTH, DAY, HOUR, MINUTE, SECOND = LiveSearchAPI(STRING, LOCALE) show 10

    * Here are the results:

    - TITLE, SOURCE

    * <blank/>

      <ifmore>Type "more" for more news.</ifmore>

  else

    - Sorry, no news sites were found for your input.  

In the input, you could ask a question such as “Tell me some news about Baron Davis” and get back results that look like this (notice that the output shows only 2 of the 10 returned fields, Title and Source):

Here are the results:

Baron Davis Going South, San Francisco Gate

The Baron Davis, Gilbert Arenas Switch-a-roo?, San Francisco Gate

Clippers set sights on Baron Davis, Los Angeles Times

Baron Davis on verge of signing with Clippers, Washington Post

NBA: Warriors trying to woo Brand, Newsday

Baron Davis becomes free agent, Chicago Sun-Times

Report: Davis to ditch Warriors for Clippers, FOXSports.com

Logo? Colors? History? Don't mean a thing if you ain't got that team, CBS Sportsline

Davis on verge of joining Clippers, CNN Sports Illustrated

Davis on verge of signing with Clippers, Salon

Type “more” for more news.

.

.

.

.

Notice that in the pattern routine, there is an option called SHOW 10.  This means to output 10 rows at a time.  Buddyscript goes to the datasource to retrieve the information, which in this case happens to be exactly 10 rows, since the request was to buffer 10 rows per datasource request.  If the user types “more”, another 10 rows are retrieved from the datasource and displayed, and so on.  (Note that there is a Buddyscript variable named SYS.Presentation.Maxlength that controls the number of characters that can be displayed on an IM client.  Depending on its setting, this variable also limits the number of rows displayed.)

The SHOW command gives the user a quick way of emulating a forward, read-only cursor, i.e. displaying x rows of output at a time.  The other option would be to put the data into an object and loop through the object, displaying each row, which involves more coding. It’s very possible that for control purposes, the latter method is the right way to go, but for quick coding and display, SHOW is very powerful.
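The forward, read-only cursor idea can be illustrated in Python with a generator that requests one page at a time and never goes backwards. This is a sketch under assumptions, not Buddyscript: fetch_page is a hypothetical stand-in for one datasource request, and its 25 headlines are made-up data.

```python
def fetch_page(offset, limit):
    """Hypothetical stand-in for one datasource request (made-up data)."""
    data = [f"headline {i}" for i in range(1, 26)]
    return data[offset:offset + limit]


def show(page_size):
    """Yield one batch of rows at a time, moving forward only."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return              # nothing left: the cursor is exhausted
        yield page
        offset += len(page)     # the next "more" starts after this batch


cursor = show(10)
first_batch = next(cursor)      # what the user sees before typing "more"
```

Each call to next(cursor) plays the role of the user typing “more”: it triggers one further request and returns the next batch.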

Hopefully you have gotten a chance to absorb the intricacies of using datasources by accessing a really powerful web service.  Thanks for your attention!