As an Agents developer, you need to handle and present data accurately and efficiently.  BuddyScript has many functions for parsing, manipulating, and examining strings.  Most are standard and work the same way as their counterparts in other high-level programming languages.  The most common ones are listed below (for more details, see the String Handling section under Procedures and Functions in the online help):

 

Common String Parsing Functions:

 

  StringSub - the most common substring function that gets a part of a string.  Don't confuse this with the StringSubstitute function, which does string replacing.

  StringGrabPart - this function splits out a string into different parts based on a given separator.  A number is provided as one of the arguments in the function, indicating which part to return.  Compare this with StringSplit.

  StringSplit - given a string, splits the string into parts based on a separator string.  These parts are put into an object list.
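
To make the difference between these three concrete, here is a small sketch in Python rather than BuddyScript. It only mirrors the ideas described above; the helper name grab_part and its 0-based part numbering are my own assumptions, not BuddyScript's signatures.

```python
# Python analogues only; this is not BuddyScript syntax.
text = "red,green,blue"


def grab_part(s, separator, index):
    """Return just one part of s (index is 0-based here, an assumption)."""
    return s.split(separator)[index]


# Like a substring function: take characters 4 up to (not including) 9.
substring = text[4:9]

# Like StringGrabPart: split on the separator, hand back a single part.
one_part = grab_part(text, ",", 1)

# Like StringSplit: split on the separator, keep every part in a list.
all_parts = text.split(",")

print(substring, one_part, all_parts)
```

The point is that a grab-part style call returns one string, while a split style call returns the whole collection.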

 

Common String Manipulation Functions:

 

  StringUppercase - uppercases a string

  StringUppercaseFirst - uppercases only the first letter in the string

  StringUppercaseInitials - uppercases the first letter in each word of the string

  StringLowercase - lowercases a string

  StringConcat - concatenates two or more strings.  Each string to be concatenated is a parameter.

  StringTrim - trims off leading and trailing spaces from a string

  StringLTrim - trims off leading spaces from a string

  StringRTrim - trims off trailing spaces from a string

  StringSubstitute - given a target string, replaces all occurrences of one string with another string.  Don't confuse this with StringReplace (see below).

  StringReplace - given a target string, replaces a string with another string at a particular position in the string.

  StringFormat - similar to the printf function in the standard C library, where a format is given to define the string.
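
Since StringSubstitute and StringReplace are easy to mix up, here is a rough Python sketch of the two ideas. The function names, and the position and length arguments of replace_at, are hypothetical; the real StringReplace arguments are in the online help.

```python
# Python analogues only; names and arguments are illustrative assumptions.


def substitute_all(target, old, new):
    """Replace every occurrence of old with new (the StringSubstitute idea)."""
    return target.replace(old, new)


def replace_at(target, position, length, new):
    """Replace `length` characters starting at a 0-based position
    (one possible reading of the StringReplace idea)."""
    return target[:position] + new + target[position + length:]


print(substitute_all("a-b-c", "-", "+"))
print(replace_at("a-b-c", 1, 1, "+"))
```

The first call changes every dash, while the second changes only the one at the position you name.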

 

Common String Examination functions:

 

  StringLength - returns the length of a string

  StringSearch - returns the position of a target string within a string.  Positions in BuddyScript strings are offsets that start at 0.

  StringStartsWith - used as a conditional (true or false); checks whether a string starts with a given start-with string

  StringEndsWith - used as a conditional (true or false); checks whether a string ends with a given end-with string

  StringAscii - returns the ASCII value of a character

  StringChar - the opposite of StringAscii - returns the character value of an ASCII code
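
If zero-based positions are new to you, this short Python sketch shows the same kinds of checks. These are Python's own built-ins standing in for the BuddyScript functions, so treat the details (such as what a failed search returns) as Python behavior, not BuddyScript behavior.

```python
# Python analogues only; this is not BuddyScript syntax.
text = "Hello, agent"

length = len(text)                     # like StringLength
position = text.find("agent")          # like StringSearch, counted from 0
starts = text.startswith("Hello")      # like StringStartsWith
ends = text.endswith("agent")          # like StringEndsWith
code = ord("A")                        # like StringAscii
char = chr(65)                         # like StringChar

print(length, position, starts, ends, code, char)
```

Because counting starts at 0, the "a" in "agent" sits at position 7 even though it is the eighth character.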

 

BuddyScript also provides a few string parsing and manipulation functions that are less common in other programming languages, but are very powerful.  I'm going to go into a little more depth on some of them.

 

 

The concept of a thawed string - Using StringClean

 

Let's say you need to capture data that can arrive in different formats, such as a telephone number.  A telephone number could come in as 650-555-1212, (650) 555-1212, 650.555.1212, and so on.  One solid approach is to use the StringClean function to normalize the data.  StringClean converts a string into its thawed state.  The term 'thawed' is specific to BuddyScript.  A thawed string is one in which special characters have been stripped out, leaving only letters, numbers, and the spaces between them.

 

Here is a little test code snippet:

 

+ _testme TESTSTRING=Anything

  STRINGTHAWED = StringClean(TESTSTRING)

  STRINGNORMALIZED = StringSubstitute(STRINGTHAWED, " ", "")

  - STRINGNORMALIZED

 

Let's run this:  _testme (650) 555-1212

 

The STRINGTHAWED variable will look like:   650 555 1212

 

By then replacing all spaces with nothing using the StringSubstitute function, the STRINGNORMALIZED variable will contain 6505551212.
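
For comparison, here is how the same two-step normalization might look in Python. The thaw function is only my approximation of what StringClean appears to do in the example above (each run of special characters becomes a single space); the real function may treat some characters differently.

```python
import re


def thaw(raw):
    """Approximate a 'thawed' string.

    Assumption: every run of characters other than ASCII letters and
    digits is replaced by one space, and outer spaces are trimmed.
    """
    return re.sub(r"[^A-Za-z0-9]+", " ", raw).strip()


def normalize_phone(raw):
    """Thaw the input, then remove the remaining spaces."""
    return thaw(raw).replace(" ", "")


for sample in ("650-555-1212", "(650) 555-1212", "650.555.1212"):
    print(sample, "->", thaw(sample), "->", normalize_phone(sample))
```

All three input formats end up as the same ten digits, which is exactly what you want before storing or comparing the number.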

 

Hashing in BuddyScript - Using StringHash

 

A hash is usually used as a numerical representation of a longer string.  In many programming languages, it can be used as a way to better index a set of data, i.e. indexing on the hash rather than on a long string.  A hash can also be used to check that a string or computer file has not been altered, by comparing an expected hash value with the hash value of the string or file.  The syntax for StringHash is as follows:

 

HASH = StringHash(URL, 2147483647)

 

The second argument is the maximum value allowed for the hash.  You can specify any number up to 2147483647, which is 2^31 - 1, the absolute maximum.  Any greater number will return zeroes.  For example, the hash generated from the statement StringHash("http://www.microsoft.com", 2147483647) would be 251890749.
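
To illustrate what a hash with a maximum value means, here is a Python sketch. It uses CRC-32 as a stand-in, which is purely my assumption; the algorithm behind StringHash is not documented here, so this will not reproduce 251890749. It also raises an error for an out-of-range maximum instead of returning zeroes.

```python
import zlib

MAX_HASH = 2**31 - 1  # 2147483647, the largest maximum mentioned above


def bounded_hash(text, maximum=MAX_HASH):
    """Map text to an integer between 0 and maximum, inclusive.

    CRC-32 is a stand-in here, not the algorithm StringHash uses.
    """
    if not 0 < maximum <= MAX_HASH:
        raise ValueError("maximum must be between 1 and 2147483647")
    return zlib.crc32(text.encode("utf-8")) % (maximum + 1)


url = "http://www.microsoft.com"
print(bounded_hash(url))       # a number no larger than 2147483647
print(bounded_hash(url, 99))   # a number no larger than 99
```

A smaller maximum gives you shorter numbers to index on, at the price of more strings sharing the same hash value.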

 

Also note that there are two other built-in hashing functions that are undocumented, GetMD5 and GetSHA1.  These use the MD5 and SHA-1 cryptographic hash algorithms.  Both return a hexadecimal string.

 

HEXSTRING = GetMD5(String)

HEXSTRING = GetSHA1(String)

 

 

Using the regex functions

 

If you're not familiar with regular expressions, fasten your seat belt.  Regular expressions are patterns that describe the text you want to find.  Think of them as string processing on steroids: with a small set of pattern symbols, you can do a lot of very flexible and powerful processing of text data.

 

BuddyScript has a slew of string functions that accept regular expressions as a means of searching and manipulating data.  Let's look at a real-world example.

 

You are reading my blog post right now.  So let's say, just as a crude example, that you wanted to find the last posting date in this blog and report it in your agent.  How would you do this?

 

This is known as (web)site scraping.  In general, this is not a good practice, since HTML can change all the time at the whim of a web page designer.  It is far better to use a web service that has an RSS feed or an XML feed that contains this data.  (Also, hitting against a website too often can lead to administrators thinking that there is a Denial of Service attack going on, so don't try this often).  Having said this, let's pretend that no feeds exist, and you really have to grab the date from the website.

 

First of all, you'd set up a datasource to access the data, and then put the data into a variable.  Here's an example:

 

datasource dsBlog() => PAGE

  http

    http://blogs.msdn.com/windowsliveagents/

 

DATA = dsBlog()

 

The variable DATA now contains all the HTML, most of which is fairly unreadable unless you are an HTML guru.  We know that the most recent posting date is between the word "Posted" and the word "by" (as I said before, this is a very unstable technique, but bear with me).  We can now use the StringMatch function to grab this specific data.  The StringMatch function takes three arguments: the string to search, the regular expression, and the object that receives the pieces of text captured by the regular expression.

 

The actual HTML snippet might look something like this:

 

"Posted <a id="postlist" href="http://blogs.msdn.com/feedback.aspx">Saturday, September 13, 2008 12:32 AM</a> by"

 

 

The statement to extract this would look something like this:

 

  EXTRACT = ()

  MATCHES = StringMatch(DATA, "Posted(.*?)\.aspx\">(.*?)</a> by", EXTRACT)

  - EXTRACT[1]

 

We allocate a list called EXTRACT, which will receive the text captured by each parenthesized group, (.*?), in the regular expression.  So what does .*? mean?  The '.' (dot) matches any single character.  The '*' is a quantifier that means zero or more of the preceding item, so '.*' matches any run of characters.  The '?' after the quantifier makes the match lazy: it takes as few characters as possible, so each group stops at the first occurrence of the text that follows it.  There is a lot more to regular expressions than this blog post can cover.

 

EXTRACT[0] will thus contain <a id="postlist" href="http://blogs.msdn.com/feedback

EXTRACT[1] will contain Saturday, September 13, 2008 12:32 AM

MATCHES will contain the number 2, since there are 2 matches found.
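
 

If you want to experiment with this pattern outside of an agent, the same regular expression can be tried in Python. This is only a sketch using Python's re module, not StringMatch itself, so details such as the return value may differ. It also shows why the lazy '?' matters.

```python
import re

# The HTML snippet from above, used as test input.
html = ('Posted <a id="postlist" href="http://blogs.msdn.com/feedback.aspx">'
        'Saturday, September 13, 2008 12:32 AM</a> by')

# Same pattern as in the StringMatch call, with two lazy groups.
pattern = r'Posted(.*?)\.aspx">(.*?)</a> by'

match = re.search(pattern, html)
groups = match.groups() if match else ()
posted_date = groups[1] if groups else None
print(posted_date)

# Lazy versus greedy on a smaller string.
sample = "<b>one</b> and <b>two</b>"
lazy = re.search(r"<b>(.*?)</b>", sample).group(1)    # stops at first </b>
greedy = re.search(r"<b>(.*)</b>", sample).group(1)   # runs to last </b>
print(lazy)
print(greedy)
```

Without the '?', a group swallows as much text as it can, which on a full web page would usually run far past the date you are after.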

 

There are other string functions that use regular expressions as input as well.  Once again, look at the online documentation for more details.

 

  StringSplitRegex - like StringSplit, but the separator is a regular expression rather than a fixed string.

  StringSubstituteRegex - like StringSubstitute, but the text to be replaced is described by a regular expression rather than a fixed string.
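
 

To give a feel for what these two buy you, here is the same idea in Python, using re.split and re.sub as stand-ins. The BuddyScript argument order is not shown here and may differ, so check the online documentation before using the real functions.

```python
import re

# Split on a comma or a semicolon, with optional spaces around it.
colors = re.split(r"\s*[,;]\s*", "red, green;blue ; white")
print(colors)

# Remove every character that is not a digit, in a single step.
digits = re.sub(r"[^0-9]", "", "(650) 555-1212")
print(digits)
```

The second call does the phone number cleanup from earlier in one pass, because the pattern can describe "anything that is not a digit" instead of one fixed string.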