Welcome to MSDN Blogs Sign in | Join | Help

F# Language Details (Gotchas)

The ‘F’ in F# stands for fun. However, there are some details in F# that might lead to bugs, surprises, and/or un-fun. This post is to highlight a couple of random ‘gotchas’ when exploring some corners of the F# language.

Unicorn Image #13260

 

Overriding Equals and Not Equals

F# allows you to overload operators so you can better map mathematical concepts to your code. For example, consider this simple vector type. Notice how it overrides the Equals operator to compare just the vector’s magnitudes, not their actual values.

type Vector =
    | Vector of float * float * float

    member this.Length =
        // Extract values via ninja pattern match
        // hidden in a let binding
        let (Vector(x,y,z)) = this
        sqrt <| x * x + y * y + z * z

    static member (+) (lhs, rhs) =
        let (Vector(x1,y1,z1)) = lhs
        let (Vector(x2,y2,z2)) = rhs

        Vector(x1 + x2, y1 + y2, z1 + z2)
        
    // Equals
    static member (==) (lhs : Vector, rhs : Vector) =
        lhs.Length = rhs.Length

    // Not equals
    static member (!=) (lhs : Vector, rhs) =
        not (lhs == rhs)

When you run this code in FSI, it doesn’t work.

> // Bug?
let vec1 = Vector(10.0, 0.0, 0.0)
let vec2 = Vector(0.0, 0.0, 10.0);;

val vec1 : Vector = Vector (10.0,0.0,0.0)
val vec2 : Vector = Vector (0.0,0.0,10.0)

> vec1 = vec2;;
val it : bool = false
> vec1.Length = vec2.Length;;
val it : bool = true

The key here is to understanding a bit more about how operator overloading in F# and .NET works. When you override an operator such as (+) the compiler generates a method named op_Addition. So when a .NET compiler sees “x + y” it looks for a method with that name. Similarly it loops for op_Subtraction, op_BitwiseOr, etc.

Since F# allows you to define symbolic operators (functions with symbols for names) you can overload operators that aren’t valid in C# or VB.NET, for example (==>):

// Rotate the vector about the X axis
static member (==>) (vec : Vector, rads : float) =
    let (Vector(x,y,z)) = vec
    
    let x' = x * cos rads - y * sin rads
    let y' = x * sin rads + y * cos rads
    let z' = z

    Vector(x', y', z)
> let v = Vector(10.0, 0.0, 0.0);;

val v : Vector = Vector (10.0,0.0,0.0)

> v ==> System.Math.PI;;
val it : Vector = Vector (-10.0,0.0,0.0)
> v ==> (System.Math.PI / 2.0);;
val it : Vector = Vector (0.0,10.0,0.0)

While F# recognizes these symbolic operators, to call them from C# you need to use the ‘long name’ which in the previous example would be op_EqualsEqualsGreater.

To override the equals operator in C# requires that you define (==), which in turn generate a method called op_Equality. However, defining (==) in F# generates a method called op_EqualsEquals.

To properly overload the equals and not equals operators in F# you must use the F# operators for equality (=) and (<>).

// Equals
static member (=) (lhs, rhs) =
    lhs.Length = rhs.Length

// Not equals
static member (<>) (lhs : Vector, rhs) =
    not (lhs == rhs)

Improper use of Range Expressions

Consider the following two loops:

for i in [1 .. 10000000] do
    ...

for i in 1 .. 10000000 do
    ...

They may not look that different, but they have dramatically different performance profiles. You can find these differences detailed in the recently updated F# Language Specification. Both are examples of range expression, which is a way to create a sequence between to indexes. However, the subtlety here is that the first generates a full list, while the second is just the raw sequence.

So while one requires ten million memory allocations to generate the list, the second loop doesn’t. More importantly, the compiler can optimize the second loop into a standard for loop over a single integer. (And not need to do any memory allocation at all!)

Posted by ChrSmith | 5 Comments

Awesome F# - Decision Trees – Part II

In my previous post I went over the theory behind the ID3 algorithm. Now that we got all that painful math out of the way, let’s write some code! Here is an implementation of the algorithm in F#. (It is also attached to this blog post, download it via the link at the bottom.)

open System

type Record = 
    {
        Outlook     : string
        Temperature : string
        Humidity    : string
        Wind        : string
        PlayTennis  : bool 
    }

    /// Given an attribute name return its value
    member this.GetAttributeValue(attrName) =
        match attrName with
        | "Outlook"     -> this.Outlook
        | "Temperature" -> this.Temperature
        | "Humidity"    -> this.Humidity
        | "Wind"        -> this.Wind
        | _ -> failwithf "Invalid attribute name '%s'" attrName

    /// Make the %o format specifier look all pretty like
    override this.ToString() =
        sprintf
            "{Outlook = %s, Temp = %s, Humidity = %s, Wind = %s, PlayTennis = %b}" 
            this.Outlook
            this.Temperature 
            this.Humidity
            this.Wind
            this.PlayTennis

type DecisionTreeNode =
    // Attribute name and value / child node list    
    | DecisionNode of string * (string * DecisionTreeNode) seq
    // Decision and corresponding evidence
    | Leaf         of bool * Record seq

// ----------------------------------------------------------------------------

/// Return the total true, total false, and total count for a set of Records
let countClassifications data = 
    Seq.fold 
        (fun (t,f,c) item -> 
            match item.PlayTennis with
            | true  -> (t + 1, f, c + 1)
            | false -> (t, f + 1, c + 1))
        (0, 0, 0)
        data

// ----------------------------------------------------------------------------

/// Return the theoretical number of bits required to classify the information.
/// If a 50/50 mix, returns 1, if 100% true or false returns 0.
let entropy data = 
    let (trueValues, falseValues, totalCount) = countClassifications data        

    let probTrue  = (float trueValues)  / (float totalCount)
    let probFalse = (float falseValues) / (float totalCount)

    // Log2(1.0) = infinity, short circuiting this part
    if trueValues = totalCount || falseValues = totalCount then
        0.0
    else
        -probTrue * Math.Log(probTrue, 2.0) + -probFalse * Math.Log(probFalse, 2.0)

/// Given a set of data, how many bits do you save if you know the provided attribute.
let informationGain (data : Record seq) attr =
    
    // Partition the data into new sets based on each unique value of the given attribute
    // e.g. [ where Outlook = rainy ], [ where Outlook = overcast], [ ... ]
    let divisionsByAttribute = 
        data 
        |> Seq.groupBy(fun item -> item.GetAttributeValue(attr))

    let totalEntropy = entropy data
    let entropyBasedOnSplit =
        divisionsByAttribute
        |> Seq.map(fun (attributeValue, rowsWithThatValue) -> 
                        let ent = entropy rowsWithThatValue
                        let percentageOfTotalRows = (float <| Seq.length rowsWithThatValue) / (float <| Seq.length data)
                        -1.0 * percentageOfTotalRows * ent)
        |> Seq.sum

    totalEntropy + entropyBasedOnSplit
    
// ----------------------------------------------------------------------------

/// Give a list of attributes left to branch on and training data,
/// construct a decision tree node.
let rec createTreeNode data attributesLeft =
    
    let (totalTrue, totalFalse, totalCount) = countClassifications data

    // If we have tested all attributes, then label this node with the 
    // most often occuring instance; likewise if everything has the same value.
    if List.length attributesLeft = 0 || totalTrue = 0 || totalFalse = 0 then
        let mostOftenOccuring = 
            if totalTrue > totalFalse then true
            else                           false
        Leaf(mostOftenOccuring, data)
    
    // Otherwise, create a proper decision tree node and branch accordingly
    else
        let attributeWithMostInformationGain =
            attributesLeft 
            |> List.map(fun attrName -> attrName, (informationGain data attrName))
            |> List.maxBy(fun (attrName, infoGain) -> infoGain)
            |> fst
        
        let remainingAttributes =
            attributesLeft |> List.filter ((<>) attributeWithMostInformationGain)

        // Partition that data base on the attribute's values
        let partitionedData = 
            Seq.groupBy
                (fun (r : Record) -> r.GetAttributeValue(attributeWithMostInformationGain))
                data

        // Create child nodes
        let childNodes =
            partitionedData
            |> Seq.map (fun (attrValue, subData) -> attrValue, (createTreeNode subData remainingAttributes))

        DecisionNode(attributeWithMostInformationGain, childNodes)

The entropy and informationGain functions were covered in my last post, so let’s walk through how the actual decision tree gets constructed. There’s a little work to calculating the optimal decision tree split, but with F# you can express it quite beautifully.

let attributeWithMostInformationGain =
    attributesLeft 
    |> List.map(fun attrName -> attrName, (informationGain data attrName))
    |> List.maxBy(fun (attrName, infoGain) -> infoGain)
    |> fst

First, it takes all the potential attributes left to split on…

attributesLeft 

… and then maps that attribute name to a new attribute name / information gain tuple …

|> List.map(fun attrName -> attrName, (informationGain data attrName))

… then from the newly generated list, pick out the tuple with the highest information gain …

|> List.maxBy(fun (attrName, infoGain) -> infoGain)

…finally returning the first element of that tuple, which is the attribute with the highest information gain.

|> fst

Once you can construct a decision tree in memory, how do get it out? The simplest way is to print it to the console.

image

The code is very straight forward. Note the use of ‘padding parameter’, so that recursive calls get indented more and more. This is a very helpful technique when printing tree-like data structures to the console.

/// Print the decision tree to the console
let rec printID3Result indent node =
    let padding = new System.String(' ', indent)

    match node with
    | Leaf(classification, data) ->
        printfn "\tClassification = %b" classification
        // data |> Seq.iter (fun item -> printfn "%s->%s" padding <| item.ToString())

    | DecisionNode(attribute, childNodes) ->
        printfn "" // Finish previous line
        printfn "%sBranching on attribute [%s]" padding attribute
        
        childNodes
        |> Seq.iter (fun (attrValue, childNode) ->
                        printf "%s->With value [%s]..." padding attrValue
                        printID3Result (indent + 4) childNode)

However, it’s almost the year 2010. So in lieu of flying cars perhaps we can at least do better than printing data to the console. Ideally, we want to generate some sexy image like this:

image

You could painstakingly construct the decision tree using Microsoft Visio but fortunately there are tools to do this for you. AT&T Research has produced a great tool called GraphViz. While the end result doesn’t quite have sizzle, it’s very easy enough to get going.

The following function dumps the decision tree into a format that GraphViz can plot. (Just copy the printed text into the tool and plot it using the default settings.)

/// Prints the tree in a format amenable to GraphViz
/// See http://www.graphviz.org/ for more format
let printInGraphVizFormat node =

    let rec printNode parentName name node = 
        match node with
        | DecisionNode(attribute, childNodes) ->

            // Print the decision node
            printfn "\"%s\" [ label = \"%s\" ];" (parentName + name) attribute

            // Print link from parent to this node (unless it's the root)
            if parentName <> "" then
                printfn "\"%s\" -> \"%s\" [ label = \"%s\" ];" parentName (parentName + name) name

            childNodes 
            |> Seq.iter(fun (attrValue, childNode) -> 
                    printNode (parentName + name) attrValue childNode)

        | Leaf(classification, _) ->
            let label =
                match classification with
                | true  -> "Yes"
                | false -> "No"
            
            // Print the decision node
            printfn "\"%s\" [ label = \"%s\" ];" (parentName + name) label

            // Print link from parent to this node
            printfn "\"%s\" -> \"%s\" [ label = \"%s\" ];" parentName (parentName + name) name

    printfn "digraph g {"
    printNode "" "root" node
    printfn "}"

So there you have it, ID3 in F#. With a little bit of mathematics and some clever output you can construct decision trees for all your machine learning needs. Think of the ID3 algorithm in the future the next time you want to mine customer transactions, analyze server logs, or program your killer robot to find Sarah Conner.

<TotallyShamelessPlug> If you would like to learn more about F#, check out Programming F# by O’Reilly. Available on Amazon and at other fine retailers. </TotallyShamelessPlug>

Posted by ChrSmith | 2 Comments
Filed under: , ,

Attachment(s): ID3.fsx

Awesome F# - Decision Trees – Part I

Programming F# is out! Meaning you can, and should, go to the store and pick up a copy today.

With Programming F# serving as a solid guide for the F# Language, I’d like to start posting less about language features and more about applications. That is, what can you do with this awesome language.

This is the first blog post in a series titled Awesome F#. These posts will provide advanced, real-world applications of the F# language to do awesome things. This post is about Decision Trees and the ID3 algorithm, motivated by this question on StackOverflow.com.

If you want to learn more about data mining and machine learning THE BOOK you need to get is Machine Learning by Tom Mitchell. I’ll give a warning however, it is the most academically rigorous and dense book I’ve ever read. Seriously, not for the faint of heart!

cover

Defining the Problem

So what does the ID3 algorithm do? It is a technique for creating decision trees much like the following used for determining whether or not to play tennis. Starting at the topmost node, if the outlook is overcast then you should play tennis. If it is rainy however, you should play tennis only if the wind is weak.

image

In addition to helping plan your sports activities, decision trees are an easy way to gather business intelligence. Imagine you run mid-sized paper company in Scranton, Pennsylvania. You have a database filled with IP addresses of people who have visited your site, special deals you’ve offered them on your products, and whether or not they ended up purchasing some paper.

This is a situation where you want to apply data mining to extract meaning from the data. In this case, how to optimize your sales.

There are many different data mining algorithms based on the type of data you have and what you are looking for. In this blog post I will be covering decision trees, which is a machine learning technique for predicting a concept based on data. (For example, given a browser session predict if the user will purchase something.)

More formally: given a set of instances represented as name/value pairs - where the values are discrete - create a decision tree to most accurately classify any new instances. In the example above the name value pairs were Outlook : {Sunny, Rainy, Overcast} and the classification was ‘should you plan tennis’.

The Approach

The simplest way to do this is to generate tree branches greedily. Given your training data, find the best branch first and then moving on with more specific subsets of rules on reduced data sets. For example, if trying to determine if someone will purchase a product from your website whether or not they are a returning customer is probably the most significant piece of information.

To find the ‘best split’ we will use an approach known as information again. But first, let’s look at a concept from information theory called entropy. Imagine you want to create a decision tree to determine if someone is an evil genius. Each datum may contain information such as gender and whether or not they had a Ph.D.

image

Entropy and information gain are metrics for calculating how much you learned from taking a given branch. If exactly 50% of evil geniuses are men, then you have learned nothing by branching on gender. (Since the same number of evil geniuses will be on the left side of the tree as the right side of the tree.) However, for determining evil geniuses education is likely very important – since we can assume the majority of evil geniuses have a Ph.D. in malevolence. In that case the decision tree branch is very effective at whittling down the data into a homogenous group: evil genius or not.

Enter the Mathematics

While information gain and entropy are simple concepts, the math is a bit intimidating. Let’s carry on with the ‘is an evil genius’ predictor using the following data:

Gender Education Is Evil Genius?
Female No Ph.D. No
Male No Ph.D. No
Male No Ph.D. No
Female No Ph.D. No
Female No Ph.D. No
Male No Ph.D. No
Male No Ph.D. No
Female No Ph.D. No
Female No Ph.D. No
Male No Ph.D. Yes
Male No Ph.D. Yes
Female No Ph.D. No
Female No Ph.D. No
Male No Ph.D. No
Male No Ph.D. No
Female No Ph.D. No
Female No Ph.D. No
Male No Ph.D. No
Male No Ph.D. No
Female No Ph.D. No
Female No Ph.D. No
Male No Ph.D. No
Male No Ph.D. No
Female No Ph.D. No
Female No Ph.D. No
Male Ph.D. Yes
Male Ph.D. Yes
Female Ph.D. Yes
Female Ph.D. Yes
Male Ph.D. No

Entropy

Given a set of data, S, which can be divided c ways with the probability of each division being p the entropy of that set is given by:

clip_image002

For example, if out of a set of six evil geniuses only four are men what is the entropy of that set?

clip_image002[16]

Or 0.9182. Now, if out of that set of five people with Ph.D.s, four were evil geniuses. What is the entropy there?

clip_image002[18]

Or 0.6883. That lower number means that it requires less information to encode that the evil geniuses of the log have a Ph.D. If you run the math for entropy, it will be zero when everything in a set has the same type and exactly 1 when its a 50/50 split. Looking something like this:

(Image generated from Wolfram Alpha.)

So the entropy for a homogenous set is lower than that of a mixed set. This is important because in our decision tree we want to narrow the data into more specific values – so we can make predictions more accurately.

Information Gain

Now that we know how to measure how homogenous a set of data is, how do we go about deciding the best attribute to branch on? That is, do we branch on gender, has a Ph.D., or some other attribute? To calculate this we can use the information gain function defined as:

clip_image002[12]

Given a set of values, S, and an attribute of each datum in that set, A, determine how much information has been gained by branching on that attribute. Note that Sv is the subset of values S where each element has attribute value v.

So what is the information gained by putting our root branch at ‘has a Ph.D'.’ or not?

clip_image002[1]

clip_image002[3]

Or 0.2002. If we calculated the information gain from branching on gender, the result is –0.0513. So the math backs up our assertion that Ph.D. is a more significant factor in determining evil genius than gender. (Which makes sense since 4/5ths of the Ph.D.s are evil, whereas only 2/25ths of the non Ph.D.s are evil.)

Putting it Together

In Part II we’ll implement the ID3 algorithm in F#, which generally goes through the following process:

  1. Find the attribute that leads to the most information gain.
  2. Add a branch for that attribute.
  3. For each child node, repeat until all the data have the same classification.
Posted by ChrSmith | 5 Comments

Upcoming F# Talks

Recently legendary Cambridge research Don Syme presented an Introduction to F# at JAOO Aarhus in Denmark. From what I’ve heard the talk went really well, and the slide deck is a great way to pick up the F# language (or give an intro-level presentation yourself). I figured I’d point out some other upcoming F# talks I’ll be giving as well:

Nov 17-19, Platform Developers Conference (PDC), Los Angeles

I’ll be at PDC this year, but unfortunately I won’t be giving any talks. You’ll find me manning the Visual Studio 2010 booth talking about what’s going on in the languages space: talking about new features in C#, Visual Basic.NET, and what to expect in F#, in addition to plugging Iron Ruby and Iron Python.

Nov 21-22, SoCal Code Camp, Los Angeles

The weekend after PDC I’ll be giving the same F# for Architects talk, though with some updated slides and 50% more sizzle. If you are in the LA area I’d encourage you to stop by.

Jan 13-15, CodeMash, Sandusky Ohio

My talk Being an Evil Genius with F# and .NET was accepted and as you can tell from the abstract, I am super excited about giving the presentation. CodeMash is an awesome conference, and indoors at the Kalahari resort in Sandusky is absolutely where you want to be on a cold week in January.

In other news, the F# team is still plugging away trying to finish up Visual Studio 2010. We’ll have some announcements coming soon, but in the mean time the latest release of F# can is May 2009 CTP or Visual Studio 2010 Beta1. And, perhaps more importantly ;) Programming F# is finally out! It’s so out that I have a copy of the book on my desk in front of me right now. I’d love to start a book tour, so if any of you know how to get me 10 minutes on the Today Show let me know.

Cheers!

Posted by ChrSmith | 2 Comments

Grotesque F# Code - I

Recently a friend came to me in a mild panic about some massive refactoring he needed to do to an F# code base. The code he had was very complicated and maintenance was a pain. After only a few seconds scanning through the code I certainly could see that the code was more complicated than it needed to be, in fact I would even go so far as to say it was grotesque.

Odd or unnatural in shape, appearance, or character; fantastically ugly or absurd; bizarre.

Note that I didn’t say terrible or wrong. The code he showed me did exactly what it was supposed to do. But it was non-idomatic; so any seasoned F# developer would probably cringe when looking at it. The point of this blog post isn’t to criticize any F# code out in the wild, but rather just show how using functional programming can simplify your code.

Here is one of the most grievous sections of the code he showed me. Essentially it takes a list of data and produces a new list after ‘processing’ each element. (Obviously with the method names and parameters renamed.)

// The original code
let computeStuff x y z = (* ... *) ()

let rec analyzeStuff paramX paramY listOfData =
    match listOfData with
    | [] 
        -> []
    | hd :: tl 
        -> 
            [ 
                computeStuff paramX paramY hd 
            ] @ analyzeStuff paramX paramY tl

 

Eliminating Unnecessary Parameters

The first thing to point out is that the function’s parameters paramX and paramY are being passed directly to the computeStuff function and otherwise aren’t even used by this function. So why keep them around? Having additional function parameters only makes it a little more confusing. Using function currying you can eliminate those parameters by simply passing in a function that only accepts one argument. (And currying the rest as necessary.)

You can rewrite the analyzeStuff method like follows. The analyzeStuff2 function only takes two parameters now, f – the function for processing – and the list of data.

// Eliminating unnecessary parameters
let rec analyzeStuff2 f listOfData =
    match listOfData with
    | [] 
        -> []
    | hd :: tl 
        -> 
            [ 
                f hd 
            ] @ analyzeStuff2 f tl

And then, simply pass in a curried version of computeStuff. This allows you to reuse the analyzeStuff method and perhaps operate on different functions (rather than hard coding it to call computeStuff).

// Pass in a curried function
analyzeStuff2 (computeStuff paramX paramY)

The fact that there were unnecessary parameters to the function isn’t what, in my mind, made the code so bad. There were a couple of other idiomatic F# things that stuck out…

Prefer Cons instead of Append

As I’ve mentioned in a previous post, cons (::) is fast and append (@) is slow. I mean, if Dustin Campbell has posted about it is common knowledge, right? ;)

Think for a moment about what the following code actually does:

[ f hd ] @ analyzeStuff2 f tl

If your answer was ‘more than it needs to’ you would be correct.

  1. Create a new, one element list via ‘[ f hd ]’
  2. Make a copy of that entire list via ‘ @ ‘
  3. Set the last element’s next pointer to the result of ‘analyzeStuff2 f tl’ (via the rest of the append)

The problem lies in bullet number 2. You create a list only to create a second, identical list so you can use append on it. If you want to put exactly one element in the front of the rest of the list use cons. Which brings us to the following code snippet, which is slightly more performant than analyzeStuff2.

// Eliminating unnecessary append
let rec analyzeStuff3 f listOfData =
    match listOfData with
    | [] 
        -> []
    | hd :: tl 
        -> (f hd) :: analyzeStuff3 f tl

So while we have trimmed off a few characters and eliminated a few unnecessary memory allocations, there is still one final, significant problem with this code.

Using the F# Library Where Possible

Remember the whole purpose of the analyzeStuff method? It was to take a list of data and produce a new list of data after ‘processing’ each element. In other words, mapping a function to each element of the input list. This function may sound familiar – it is already built into the F# library and is called List.map. So we can remove the analyzeStuff function entirely, in lieu of just using what is already baked into the F# library.

// Final code
let results = List.map (computeStuff paramX paramY) listOfData

The moral of the story is that if your code seems more complicated than it needs to be, then it probably is. However, once my book comes out there will be no excuse for writing grotesque F# code in the future ;)

Posted by ChrSmith | 8 Comments
Filed under: , ,

F# for Architects: Hitting the sweet spot

When I was at DevLink last week I gave a talk designed to specifically identify why and when you should use F#. I was going to post the slides, but then I realized that they are in the form of a ‘presentation deck’ rather than a ‘reading deck’. So rather than having a few vague slogans and images in a .pptx file, I’ve transcribed my talking points. So imagine a deep and sexy voice narrating this blog post aloud. I’ve indicated where I ‘break character’ in the transcription.

---------------------------------------------

Slide1

Hello, my name is Chris Smith and I work at Microsoft on the F# team.

Slide2

[Ed. I wish every talk started with a slide like this.]

Of all the talks I’m giving at DevLink, this is the one I am looking forward to the most. This talk is really the one people should be giving at conferences. There certainly is a lot of excitement surrounding F#, but the question on most people’s minds is does it matter to me? I think we can assume that F# has to be good at something, because it if it wasn’t why would Microsoft be spending so much time and energy putting it into Visual Studio 2010?

So this talk is about removing some of the excitement associated with your typical F# talk and just deal in facts. My goal isn’t to convince you to start programming in F#, but instead articulate what the language is good at so you can decide for yourself if F# is worth looking into.

If you’re super-happy writing C# or VB.NET code then you really have no reason to learn F#. Just keep on being happy ;) But if you find yourself struggling to write your .NET code, then perhaps F# is what you’ve been waiting for.

Slide3

Let’s begin with some F# Facts:

  • False. F# isn’t going to replace C# or VB, ever. F# is just different, so it makes some things easier and some things harder. But the point is that with F# you have a choice – and this talk is about highlighting some of the situations where F# may make more sense than C#.
  • False. While F# (and other functional languages) make it easier to parallelize your code, it doesn’t do it for free. In fact, you can write the same deadlocks and race conditions in F# than you can in C#. Just in F# is that you have to point the gun at your foot first.
  • False. Functional programming is enjoying a lot of attention these days with new languages like Erlang, Scala, Nemerle, Haskell, etc. However the core ideas of functional programming have been around for quite some time without ever ‘making it big’. I don’t see functional programming becoming the one true paradigm that everyone will switch to. Rather, I see it as a great way for solving a specific class of problems.  F# is a hybrid language, so it allows you to write code in the paradigm that makes the most sense for the problem at hand.
  • True. Absolutely.
  • False. In the keynote at DevLink Josh Holmes specifically called this out as being a bad idea. Not to say that F# + ASP.NET isn’t a good idea, but that using new technologies for the sake of using new technologies makes things harder than they have to be.

Bottom line is this: you should learn what F# is about, and if you’ll be more productive using it then do so. Otherwise, go with what you know.

Slide4

[Ed. This slide’s pretty self-explanatory.]

Slide5

Language

  • As I mentioned before, F# is a multi-paradigm language. You can write functional, imperative, and object-oriented code in it. This allows you to be pragmatic instead of trying to constrain every problem you see into classes and interfaces.
  • F# is a type inferred language, leading to succinct programs. Think of how much time you spend adding type signatures, semicolons, and curly braces to your favorite programming language. All that information is just a crutch for making it easier to write the compiler. Why not write just the code you want and let the compiler figure out the rest? You’ll see later why I think F#’s succinctness is it’s greatest strength.
  • F# is a .NET programming language. It compiles down to IL and the same runtime as C# and VB. So F# will ‘just work’ with your existing .NET code and have the same performance profile as C#.

Tools

  • F# will be ‘in the box’ for Visual Studio 2010,but is also available now as an add-in for Visual Studio 2008.
  • It features a REPL like Ruby and Python, called the F# Interactive window -  you’ll see that in action shortly.

It’s easy to get overwhelmed with all the details of a new programming language. The one thing I want you to remember about F# as a whole is that it is fun. F# eliminates a lot of the hassles associated with programming allows you to focus on just writing the code you care about.

Slide6

[Ed. This is the part where I give a quick demo of F# in Visual Studio. The main features are:

  • Creating new projects
  • The F# Intellisense experience
    • Hovering over a local value shows its type information
  • The F# Interactive window

]

Slide7

It is important to note that F# supports nearly every feature that C# does. So when you use F# it isn’t an all-or-nothing type of proposition. You don’t have to throw away your existing code and move everything over to F#. In fact, we expect F# code to primarily be used as a class library integrated into a larger software package. Perhaps some backend server component is written in F# while the rest of your code remains in C#/PHP/Smoke Signals/whatever.

Slide8

Being able to seamlessly interoperate with .NET opens up a lot of possibilities for F#. Not only can you take advantage of existing tools like the Visual Studio debugger, but also other .NET platforms such as using XNA to run F# code on the X-Box or Silverlight to run F# code on the web. This synergy is perhaps the main reason why F# get’s much more attention than earlier functional programming languages like OCaml. OCaml’s great, but does it come with a world-class debugger, visualization libraries, and 3rd party ecosystem?

Slide9

So I’ve been talking up F# a bit – bla bla .NET interop – bla bla F# makes programming fun. But why should bother learning a new language?

Slide10

Let’s start by looking at some of the buzzwords associated with F#.

  • Statically typed. F# is a statically typed language, in contrast with Ruby which is dynamically typed. This means that type information is known at compile time. So while you can’t do things like monkey patching in F#, code executes much faster and you’ll know about most errors at compile time.
  • Succinct. In addition to being type inferred, F# is also whitespace significant. So if you tab a few lines of code under an if-statement, the compiler will treat that as the body of the if-statement. This saves you the burden of littering your code with curly braces as well as saves you vertical whitespace. Less code on the screen means you can see more of your program at a time, meaning you do less context switching while you look through code. It may not sound like much, but this will be a large boon for your productivity. 
  • Scalable. This isn’t talking about data centers or web servers, but just that if you apply solid software engineering principles to F# code you can write enterprise level applications. (Which is a fancy way to say that F# is not a ‘write-once’ language.)
  • Explorative. In F# you don’t need to commit to a design in order to see how things work. You can rapidly prototype solutions in F# as well as experiment with algorithms and approaches in the F# Interactive window. Building on F#’s succinct nature, you don’t need to make a lot of code investment to experiment with your programs.
  • Libraries. F# comes with its own library for writing functional code and more.

Slide11

Functional programming has been around since the 1950’s (LISP) – so why is functional programming so important now? A lot of people will say things like how immutability helps in writing parallel programs, which IMHO isn’t that big of a deal. Because if you wanted to get serious about parallel programming there are programming languages designed to do that and only that exceptionally well.

The benefit of functional programming lies in what is referred to as programming in the small.

Think about the current state of the art when it comes to writing software: object-oriented programming. The main way we approach the ask of converting an algorithm into code is by breaking it into objects, effectively modeling the domain through classes and behaviors. That’s programming in the large. This process of abstraction works great for modeling large systems and breaking it down into smaller and smaller layers, but eventually you run into a wall.

What about the method bodies? The actual code that goes inside those classes? That’s programming in the small. The problem with object-oriented programming is that when you are writing algorithmic type code, it doesn’t make sense to create classes and interfaces. When you are programming in the small all you really care about is data transformation and raw computation.

Think of object-oriented programming as a baseball mitt, it’s great for a catching pop fly but certainly isn’t that useful for catching a football. To do that you need a little more finger dexterity. Functional programming is all about programming in the small. It emphasizes the manipulating data that needs to be preformed without all the overhead of classes and an arbitrary type hierarchy.

[Ed. For more information about this concept check out this and this.]

Slide12

[Ed. This animated slide doesn’t look nearly as awesome when you print it out :-/]

The first area that F# really excels in is technical and quantitative computing. AKA number crunching.

Simulation

Simulation is one area that F# would we well suited for – imagine you are writing some sort of physics simulator, or trying to model some real-world situation. In F# you can write the functions you want cleanly without needing to force a code-based abstraction onto the real-world processes.

Another aspect of simulations which makes F# a good fit is a feature called units of measure, which allow you to encode units such as ‘feet’ or ‘meters per second’ into the actual F# type system. So you’ll get a compiler warning if you try to add acceleration to a velocity. This allows you to eliminate an entire class of bugs.

Computational Finance

Some F# early adopters are in the computational finance business. These firms have large codebases for automated stock trading, calculating risk, and so on. (Exactly the sort of computerized finance that caused the current global economic PITA ;) F# allows these investment banks not only a way to express their financial models in a simpler and more declarative way, but also integrate with the rest of their software stack through .NET interop. (For example, farming out intensive computations to multiple machines through a COM-service or the like.)

Large Scale Data Processing

Finally, F# does a good job at data-exploration type problems. For example, imagine you were trying to mine some customer data to identify trends. In F# you can simply play around with your data in the F# Interactive window, no need to litter your machine with dozens of throw away projects. Also, using F#’s succinct programming model the cost of failure – refactoring or rethinking your approach – is much lower.

Slide13

Next up is language oriented programming. Which generally means blurring the line between a programming language and natural language.

Writing Parsers

If you wanted to be all hardcore about creating domain specific languages, why not go ahead and write your own parser? Built into the F# PowerPack library are FSLex and FSYacc, two great tools for creating custom parsers. In fact, the F# compiler’s parser is built using FSLex and FSYacc.

Augmenting F# Code

Another great example of language oriented programming is the use of F# computation expressions. This is a high-powered language feature that allows you to write normal F# code and customize how it gets executed. For example, let’s say you wanted to write some F# code to interact with a robot on Mars. Rather than writing a ton of boilerplate code to retry every command you send to the rover – in the case of cosmic interference or Martian uprising - why not write an F# computation expression (AKA F# workflow) to automatically retry any F# code several times until it times out or succeeds.

[Ed. This is a pretty advanced topic that I’ll need to devote an entire series of blog posts to later.]

Creating New Languages

Another area F# excels in is creating internal domain specific languages (mini programming languages embedded in your F# code). F#’s discriminated union types makes declaring ASTs (or any other tree-like data structure) very simple.

Quotations (Expression Trees)

Finally, F# has the same Expression Trees feature found in C# and VB.NET. This allows you to get ‘compiler generated’ form of F# code, which with the right libraries, can allow you to defer the execution of that F# code to other platforms. For example, on a SQL server or on your video card’s GPU.

Slide14

F# is also good at parallel programming, which to be pedantic is referring to the: parallel, asynchronous, concurrent, and reactive programming domains. F# doesn’t automatically parallelize your code, nor have any built-in support for parallel primitives. Instead, it has a couple of features that make parallel programming easier – but none of them are a silver bullet.

Immutable by Default

In F#, like most other functional programming language, data is immutable by default. So if you program in the purely functional paradigm you don’t have to worry about race conditions when executing multiple threads because there is no shared, mutable state for two threads to fight over.

Parallel Extensions to .NET

This isn’t an F#-specific feature per-say but will go a long way towards making it easier to write parallel programs and that is the Parallel Extensions (PFX) to .NET found in CLR 4.0. While the PFX adds a lot of great capabilities to the .NET platform, there are two parts in particular I am super-excited about.

The Task Parallel Library (TPL) is a set of classes for abstracting the process of spawning ‘tasks’ of work. With the TPL there is no longer any need to spawn or manage threads manually. Instead, you can trust the TPL ‘goo’ to deal with that for you as well as modulate how many threads are currently asking for CPU time to make sure you aren’t overwhelming your system with parallel tasks.

The other, and perhaps even more exciting, part of the PFX is the set of new Concurrent Data Structures. These are a new set of collection types that are built specifically with high-performance parallel applications in mind. So rather than sweating all the details of sharing mutable state, you can just have two threads writing to one of these concurrent data structures and all the messy locking and unlocking of data is taken care of for you.

Asynchronous Workflowws

You may have heard somewhere before that F# makes asynchronous programming super easy, that is 100% true. 

Slide15

The way to write asynchronous programs on .NET is through the use of the Asynchronous Programming Model or APM. This is a pattern where you begin an asynchronous operation – such as reading a file – by calling BeginOp (e.g.,  BeginRead) and passing in a callback. When the asynchronous operation completes, the callback is executed in which you must call EndOp (e.g., EndRead).

While this approach has served .NET well for nearly a decade it’s a huge pain in the ass to deal with.

  • Passing state across callbacks is tedious and error prone
  • You cannot easily handle exceptions if your asynchronous code is spread across myriad callbacks
  • etc.

The code in the slide is from the Microsoft Patterns and Practices group and is the recommended way for processing a series of images in parallel. If anything, I want to point out that there is a lot of code which doesn’t look that easy to write.

Slide16

This is how to write that same code in F#, using the F# Asynchronous Workflows library. Like I’ve said many times, F# wasn’t created specifically to write parallel programs. F# Asynchronous Workflows is simply some functionality built into the F# library that makes asynchronous programming very easy. This is enabled through a feature called F# Computation Expressions.

In the code you see the ‘let!’ and ‘do!’. The F# Asynchronous Workflows library completes these operations asynchronously, doing all that jazz with thread hopping and callbacks behind the scenes.

Slide17

So I have told you all about how awesome F# is, but let’s try to bring the discussion back down to reality. Certainly F# isn’t all unicorns and rainbows, right?

Slide18

[Ed. I should probably get a better slide title from the marketing department ;) ]

F# isn’t great for everything, here are a few such areas.

Presentation Layer

In F# v1.0 we won’t support any code generators in the box. So you won’t get any of the WYSIWYG editors for WinForms or WPF in F#. So if you want to write presentation layer code in F# you’ll need to unfortunately do it all by hand. However, this isn’t the end of the world because of F#’s .NET interop story. You could just as easily create your UI in C# or VB and simply call into your F# code through a class library.

Object Oriented Hierarchies

Even though F# can write object-oriented code, that isn’t the language’s main paradigm so when writing the heavily OO code of can look a little strange. Also F# supports many functional programming constructs that C# and VB don’t, such as discriminated unions and function currying. So if you created an F# library that exposed ‘functional’ code, it certainly wouldn’t look right if you tried to bring that into a non-functional language.

The key here is to properly encapsulate your F# code – programming in the small – when you want to integrate it with a larger component – programming in the large.

Slide19

So what happens if F# does make sense for part of your application? Are you going to need to hire a bunch of rocket scientists? No. Actually, I’ll even go on record and bust out a hearty hell no. F# isn’t just for eggheads. In fact, this summer we’ve had a highschool intern working on some F# samples. (That’s right, someone who just graduated highschool writing F# code for their internship.)

There is nothing inherently difficult with functional programming, it’s just a different way to view problems. But pay attention to what your team says while they are learning F#:

  • “It’s just like C#”. This means they are doing it wrong. The whole point of F# is that it provides you a different way to think about programming, and in that different approach be more effective. If you write C# code in F# you won’t be gaining anything.
  • “F# is too simple for real-world problems”. While F# is a simple programming language to learn, you can accomplish quite a lot through just writing simple functions.
  • “Functional programming hurts my brains”. This means you are on the right track. Learning F# will expose you to different approaches to solving problems. (functional v imperative, functional v object-oriented, etc.)

Slide20

Even if F# will help you be more productive writing code today, what is the long-term cost of that F# code?

Well one great benefit of F# is that it is terse. I mentioned this before, but when you are trying to debug an issue being able to see all the relevant code on the screen at once you don’t have to constantly be context switching between different code files.

It can be said that mind-bending functional code is harder to understand than more normal imperative code. That’s totally true – but the reality is that you shouldn’t write mind-bending functional code in the first place. (Just like you shouldn’t write mind-bending code in any programming language.) Idiomatic F# code is clear and readable. You can certainly use features like function composition and function currying without making the code ‘write once’.

Slide21

The bottom line is this: if you are happy with C# or VB, then just keep using them. In fact, ignore F# all together. You can benefit from learning F# like any programming language to make you a better programmer, but don’t switch to F# if there isn’t a specific reason.

But if you find that you are struggling against your programming language to express your ideas. That you find too much syntactic clutter gets in the way, then check out F#. In domains where there is a lot of computation or data transformation to perform – technical computing, language oriented programming, parallel / async – you might be more productive.

Writing code in F# doesn’t magically make it faster or take up less memory. All it does is provide you with a different way to view the problem space, and in that different view be more effective in encoding algorithms to solve your problems by giving you more ways of expressing your ideas. (That is, in an imperative, object-oriented, or functional style.)

Slide22

Unlike a lot of ‘awesome new technologies’ coming out from Microsoft, you can actually use F# today. In fact to my knowledge there are a growing number of developers who are paid to write F# code every day. You can find the latest F# CTP, which is an add-in for Visual Studio 2008, from the F# Dev Center at http://fsharp.net. I you want to try even fancier new technologies, you can also check out Visual Studio 2010 Beta1 from http://msdn.com.

For non-windows folks you can F# via Mono at http://mono-project.com

Slide23

For learning F# I recommend the three B’s: Books, Blogs, and Bugging other people.

Books

For being a language that hasn’t officially come out here there are already several great books on the subject. The most recent general book on F# was Expert F# by Don Syme, Adam Granicz, and Antonio Cisternino. It’s a good book, but it’s main drawback these days is just when it was published: 2007. Since then the language has undergone some significant revision, especially in the F# libraries.

I – in my completely unbiased opinion – recommend you wait until early November and buy my book, Programming F# from O’Reilly. :)  It is a comprehensive guide to the F# language. In it you won’t find a lot of theory, but instead a clear introduction to what F# has that your favorite language doesn’t, and how you can take advantage of those features.

Blogs

There is a great F# community online with a ton of high-quality blogs. In fact nearly everyone on the F# team has their own blog. Check out the F# Development Center where you’ll find an aggregated feed for F# blogs.

Bugging other people

Finally, there are two great places to go for learning more about F#. http://hubfs.net is the Hub FS website, a great forum dedicated to exclusively F#. In addition, there has been growing interest in F# on http://stackoverflow.com. I recommend you ask questions there so that I may answer you and receive glorious reputation. ;)

Slide24l

I hope you enjoyed the talk, if it sounds like F# doesn’t apply to your line of work and never will – no hard feelings. If it does, then I encourage you to download the latest F# CTP and experiment.

[Ed. Hopefully I’ve been able to capture the essence of my presentation. While you can’t summarize everything about F# in a single blog post, hopefully this helps clarify some domains where we expect F# to be the most heavily used. If you have any questions please feel free to use the contact link on this blog.]

Posted by ChrSmith | 4 Comments
Filed under: ,

Attachment(s): F# for Architects - Released.pptx

Back in Action!

I know it’s cliché to blog about blogging, but I’d like to take this opportunity to explain why there has been such a lull. In short, I’ve been busy.

Finished Programming F#

As you might have heard, I was working on a book. Well no more! The book is done and only undergoing minor technical edits (read fixing bugs due to breaking changes). Check for it in stores late October.

Unfortunately, the sad news is that the final book will have a different cover than the previously discussed Jelly Fish. Instead, it’ll be a fat (but cute) songbird… Perhaps singing in the key of F#? (Sorry to disappoint you Greg.)

lrg[1]

Finished my Masters Degree

I’ve been attending the University of Washington’s Professional Masters Program for the past few years, and I finally got around to graduating. If you live in the Seattle area and are interested in taking your education to the next level I would highly recommend it. I certainly wouldn’t be able to excel at my job working on the F# team if it weren’t for taking a few graduate-level classes in Programming Languages and Compilers.

Went to Japan

After working nights and weekends for over a year with the book and grad school, I figured I would take a little vacation. So I spent a week in Japan bumming around with a buddy of mine. I’ll spare you the boring details, but let’s just say a LOT of karaoke was involved.

Japan 097

Spoke at DevLink

Last week I was in Nashville speaking at DevLink. It was a wonderful opportunity to meet up with some old friends and find plenty of new ones. There I gave four talks on F# and functional programming, totaling six hours of material. Some talks went better than others, but I’m pleased to note that the talk I really wanted to knock out of the park – F# for Architects: Hitting the sweet spot – was very well received.

Edit 8/20: I'll post the slides for the F# for Architects talk here later this week.

devLink Technical Conference

So What’s Next?

So where does that leave me? Now that I’ve got my new degree, the book coming out, and some painful personal life drama behind me – what’s next? Well I’ve been thinking about that myself for a while. And the answer is more of the same. Specifically, I want to keep writing, blogging, and getting an awesome v1.0 of F# out and in the hands of developers. So check back soon!

Posted by ChrSmith | 9 Comments

Lightweight PGN parser in F# – Part I

If you’ve ever spent a lot of time around the game of Chess you’ve probably seen or interacted with the PGN file format in one way or another. PGN stands for Portable Game Notation and is a concise way to represent Chess games.

Using regular expressions I’ve written a very lightweight parser to handle the file format. I wouldn’t be surprised if the regular expressions don’t capture every legal .pgn file, but it shows how effective regular expressions can be for semi-structured data.

Not being a Chess player myself, I’m not really sure what to do with:

"Nxf7 Rxe1+"

but perhaps I’ll look at that in a future blog post.

Edit: On second thought, there is all sorts of wrong with this parser. For example, notice how move ‘3’ is listed twice. The ‘3…’ continues on after the comment.

Alas, once the raw list of moves is parsed another step to coalesce them is required. In addition, a commenter pointed out that the 1/2-1/2 is the result of the game (a draw). I’ll spend a little more time on this before I can call it an official ‘parser’ but it’s a nice start :D. 

open System.Text.RegularExpressions

let pgnGameText = "
[Event \"F/S Return Match\"]
[Site \"Belgrade, Serbia Yugoslavia|JUG\"]
[Date \"1992.11.04\"]
[Round \"29\"]
[White \"Fischer, Robert J.\"]
[Black \"Spassky, Boris V.\"]
[Result \"1/2-1/2\"]
 
1. e4 e5 2. Nf3 Nc6 3. Bb5 {This opening is called the Ruy Lopez.} 3... a6
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8  10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2
"

// Remove comments and markup from the PGN file
let removeMarkup text = 
    let tagPairs = new Regex(@"\[.*\]")
    let noTagPairs = tagPairs.Replace(text, "")

    let comments = new Regex(@"\{.*\}")
    let noComments = comments.Replace(noTagPairs, "")
    
    // Trim any leading whitespace and convert to a single-line
    noComments.Trim().Replace("\r", "").Replace("\n", " ")
        
// Get the list of moves, each prefixed with a number and one or three dots
let getMoves text =
    let factRegex = new Regex(@"\d+\.+", RegexOptions.Multiline)
    factRegex.Split(text)

let normalizedText = removeMarkup pgnGameText
    
let printGameMoves() = 
    getMoves normalizedText
    |> Array.map (fun move -> move.Trim())
    |> Array.iteri (fun idx move -> printfn "Move %2d: %s" idx move)
(*
OUTPUT:

> printGameMoves();;
Move  0: 
Move  1: e4 e5
Move  2: Nf3 Nc6
Move  3: Bb5
Move  4: a6
Move  5: Ba4 Nf6
Move  6: O-O Be7
Move  7: Re1 b5
Move  8: Bb3 d6
Move  9: c3 O-O
Move 10: h3 Nb8
Move 11: d4 Nbd7
Move 12: c4 c6
Move 13: cxb5 axb5
Move 14: Nc3 Bb7
Move 15: Bg5 b4
Move 16: Nb1 h6
Move 17: Bh4 c5
Move 18: dxe5 Nxe4
Move 19: Bxe7 Qxe7
Move 20: exd6 Qf6
Move 21: Nbd2 Nxd6
Move 22: Nc4 Nxc4
Move 23: Bxc4 Nb6
Move 24: Ne5 Rae8
Move 25: Bxf7+ Rxf7
Move 26: Nxf7 Rxe1+
Move 27: Qxe1 Kxf7
Move 28: Qe3 Qg5
Move 29: Qxg5 hxg5
Move 30: b3 Ke6
Move 31: a3 Kd6
Move 32: axb4 cxb4
Move 33: Ra5 Nd5
Move 34: f3 Bc8
Move 35: Kf2 Bf5
Move 36: Ra7 g6
Move 37: Ra6+ Kc5
Move 38: Ke1 Nf4
Move 39: g3 Nxh3
Move 40: Kd2 Kb5
Move 41: Rd6 Kc5
Move 42: Ra6 Nf2
Move 43: g4 Bd3
Move 44: Re6 1/2-1/2
val it : unit = ()
*)
Posted by ChrSmith | 5 Comments

Programming F# – Official Cover

Edit 8/19: You might notice, the cover is no longer a jellyfish. While I know this is a slight disapointment for some - including myself - trust me when I say I have a plan to remedy this. Stay tuned! 

So as it turns out my petition for a kickass cover was ultimately unsuccessful. But the good news is that I’m not suck with something neither cute nor cuddly. Instead I got a Jellyfish, after spending some time thinking about it is pretty good. (Though certainly not as awesome as a hydra.)

Jellyfish Facts

  • Jellyfish can be found in every ocean. Therefore, in the time if trouble they can call on the nine pirate lords for help. Much like F# being able to use nine different programming paradigms and styles:
    • functional, imperative, object-oriented, metaprogramming, scripting, concurrent, reactive, declaritive, and awesome. (Yes, ‘awesome’ is a programming paradigm… and Haskell doesn’t support it.)
  • Jellyfish do not have a brain or central nervous system. Just like F#… umm… you don’t need a nervous system to write world-class applications?  Let me get back to you on this one…
  • The real reason I’m excited about having a Jellyfish on the cover is that they can Sting and Kill you. Like F#, Jellyfish are deadly.

So anyways, this October be on the lookout for Jellyfish Book!

Programming F#: Rough Cuts Version

Posted by ChrSmith | 22 Comments
Filed under:

F# Scripting Zen – Bulk Updating Testcases

As the F# team is busy working to finish up Visual Studio 2010, one task left to complete is to localize the compiler, so that on a Japanese machine the error messages will be in Japanese.

While I’m sure a few Ugly American Programmers might question the value of localized error messages, imagine if the F# compiler told you this every time you made a mistake:

"このタイプのパラメータ値を推定タイプ略語の安定の下では、消去されていません。このドロップ、または並べ替えのタイプのパラメータを、 のタイプは、例えば、略語の使用によるものです< ' a >= はまたは\ Ñ \ ttypeスワップ< 'は、 ' b > = ' b * '1つの。 明示的にこの値の型のパラメータを宣言、 \ Ñ例: \ tlet< 'は、 ' b > x y )に:スワップ< ' b ' a >を参照) :スワップ< 'は、 ' b > = (イは、 X "

Anyways, my job is to update our F# compiler test automation to be localization agonistic. Meaning that tests will work just fine on both English, Japanese, and any of the other dozen or so languages Visual Studio gets localized to.

The majority of our compiler automation works by taking a code file, compiling and running it, and if the process returns with a non-zero exit code report a failure. Negative tests - where we expect the code file not to compile – work by checking the compiler output for specific messages.

The following is a sample F# compiler testcase. Note that the ‘expected message’ is an arbitrary regular expression, which is why the ‘(‘ and ‘)’ need to be escaped with a backslash.

// FSB 1488, Implement redundancy checking for dynamic type test patterns

 

//<Expects status=warning>\(9,7-9,16\): warning FS0026: This rule will never be matched</Expects>

//<Expects status=warning>\(15,7-15,16\): warning FS0026: This rule will never be matched</Expects>

 

let _ =

    match box "3" with

    | :? string  -> 1

    | :? string  -> 1  // check this rule is marked as 'never be matched'

    | _ -> 2

   

let _ =

    match box "3" with

    | :? System.IComparable -> 1

    | :? string  -> 1  // check this rule is marked as 'never be matched'

    | _ -> 2

   

exit 0

As you can probably guess, matching against English compiler output is problematic when testing a localized compiler. The F# QA team (consisting of 3 people) discussed a few options, but the design we settled on was simply encode more metadata into the ‘Expects’ tag, and simply not validate specific error text on non-English builds.

So we want to rewrite all of our negative tests to look something like this:

// FSB 1488, Implement redundancy checking for dynamic type test patterns

 

//<Expects id="FS0026" span=(9,7-9,16) status="warning">This rule will never be matched</Expects>

//<Expects id="FS0026" span=(15,7-15,16) status="warning">This rule will never be matched</Expects>

 

let _ =

    match box "3" with

    | :? string  -> 1

    | :? string  -> 1  // check this rule is marked as 'never be matched'

    | _ -> 2

   

let _ =

    match box "3" with

    | :? System.IComparable -> 1

    | :? string  -> 1  // check this rule is marked as 'never be matched'

    | _ -> 2

   

exit 0

When the testcase is in this format, then rather than just matching the ‘message text’ we can build up the regular expression which we will match against the compiler output. So we will look for something like: <span> + <message id> + <text>.

Then, on localized installs we simply won’t match the text. (But still validate the line/column span of the error, and the error ID from the compiler.) This, combined with a little loc-specific testing, should be enough.

So how do you fix the testbed? We have ~1,600 automated testcases which use the <Expects> technique, of those ~600 which need to be updated. As much as I love manually going through and tweaking formatting of testcases, hopefully I can put my F# knowledge to good use and whip up an F# script do to this for me.

Defining Types

Before solving a problem in F#, the first step is usually defining new types. In this case, we want to create types to describe the metadata encoded in testcases.

I’d like to draw your attention to how concise F#’s syntax is for declaring types. If this were C# you would need to sprinkle in property getters and setters, curly braces and semicolons. Instead, creating discriminated unions and records is straight forward. In addition, I’ve added a static Parse methods which we will use later.

type MessageType =

    | Warning

    | Error

    | NotIn

    | Success

   

    override this.ToString() =

        match this with

        | Warning -> "warning"

        | Error   -> "error"

        | NotIn   -> "notin"

        | Success -> "success"

   

    static member Parse(txt : string) =

        match txt.ToUpper() with

        | "WARNING" | "\"WARNING\"" -> Warning

        | "ERROR"   | "\"ERROR\""   -> Error

        | "NOTIN"   | "\"NOTIN\""   -> NotIn

        | "SUCCESS" | "\"SUCCESS\"" -> Success

        | _ -> failwithf "Unknown message type: [%s]" txt

 

type MessageRange =

    | NoRange                            // No error range specified

    | OnePoint of int * int              // (a,b)

    | TwoPoints of int * int * int * int // (a,b-c,d)

 

    static member Parse(txt : string) =

        let onePtRegex = @".*\\\((\d+),(\d+)\\\).*"

        let twoPtRegex = @".*\\\((\d+),(\d+)-(\d+),(\d+)\\\).*"

       

        let onePtMatch = Regex.Match(txt, onePtRegex)

        let twoPtMatch = Regex.Match(txt, twoPtRegex)

        if   onePtMatch.Success then

            OnePoint(int onePtMatch.Groups.[1].Value, int onePtMatch.Groups.[2].Value)

        elif twoPtMatch.Success then

            TwoPoints(int twoPtMatch.Groups.[1].Value, int twoPtMatch.Groups.[2].Value,

                      int twoPtMatch.Groups.[3].Value, int twoPtMatch.Groups.[4].Value)

        else

            NoRange

   

type ExpectedMessage =

    {

        MsgType  : MessageType

        MsgID    : string

        MsgRange : MessageRange

        MsgText  : string

    }

   

Parsing (AKA Regular Expression voodoo)

Here comes the fun part: greping through each tescase file and trying to parse it using regular expressions.

I’ll be honest when I say that we haven’t been super-consistent in how to encode error messages. So some tests have error spans, some don’t. Some match the error ID, some don’t.

Admittedly this code could be better commented, but it should be pretty straight forward. Note that the removeTokens function could easily be rewritten to be a bunch of functions composed together (rather than using piplining). Having the separate ‘fixLineForm'X’ methods makes debugging a tad easier.

/// Given the match string for a testcase strip out error spans and compiler error numbers.

/// E.G.     "\(9,18-9,21\): warning FS0035: This form of object expression is deprecated."

/// becomes: "This form of object expression is deprecated"

let removeTokens line =

   

    let fixLine regexPattern (groupIdx : int) line =

        let m =

            Regex.Match(

                line,

                regexPattern)

 

        if m.Success then

            m.Groups.[groupIdx].Value

        else

            line

 

    // Specify error range (a,b-c,d): error FS????:

    let fixLineForm1 = fixLine @"\\\(\d+,\d+-\d+,\d+\\\)..(error|warning) FS\d\d\d\d..([^<]*)" 2

    // Specify error range (a,b): error FS????:

    let fixLineForm2 = fixLine "\\(\d+,\d+\\): (error|warning) FS\d\d\d\d..([^<]*)" 2

    // Start with error FS????:

    let fixLineForm3 = fixLine @"(error|warning) FS\d\d\d\d..([^<]*)" 2

    // Start with FS????:

    let fixLineForm4 = fixLine @"FS\d\d\d\d..([^<]*)" 1

    // Just error range (Note this must come before line form1)

    let fixLineForm5 = fixLine @"\\\(\d+,\d+-\d+,\d+\\\)..([^<]*)" 1

    // Just error FS????

    let fixLineForm6 = fixLine @"(error|warning) FS\d\d\d\d([^<]*)" 2

 

    line |> fixLineForm1 |> fixLineForm2

         |> fixLineForm3 |> fixLineForm4

         |> fixLineForm5

 

/// If the line contains an <Expects> block, parse out the error ID, span, and text
/// and rewrite it making them explicit.

let fixExpectedMessageLine line =

   

    let isMessageLineRegex = "//.*<Expect[^>]*status=([^>]+)>([^<]*)</Expect.+"

   

    let m = Regex.Match(line, isMessageLineRegex)

   

    if m.Success then

        // If we've identified it as a line containing testcase metadata,

        // parse it and rewrite it.

        let message =

                {

                    MsgType  = MessageType.Parse(m.Groups.[1].Value)

                    MsgID    = (let id = Regex.Match(m.Groups.[2].Value, ".*(FS\d\d\d\d).*")

                                if id.Success then id.Groups.[1].Value

                                else               "FS0191")

                    MsgRange = MessageRange.Parse(m.Groups.[2].Value)

                    MsgText  = removeTokens (m.Groups.[2].Value)

                }

               

        let spanPart =

            match message.MsgRange with

            | NoRange            -> ""

            | OnePoint(x,y)      -> sprintf "span=\"(%d,%d)\" " x y

            | TwoPoints(a,b,c,d) -> sprintf "span=\"(%d,%d,%d,%d)\" " a b c d

           

        if message.MsgType = Error || message.MsgType = Warning then

           

            sprintf

                "//<Expects id=\"%s\" %sstatus=\"%s\">%s</Expects>"

                message.MsgID spanPart (message.MsgType.ToString()) message.MsgText

           

        else

            // Only rewrite negative tests, skip SUCCESS or NOTIN tests...

            line

    else

        // This line didn't contain any metadata information

        line

Fixing Testcases

The work to actually fix testcases requires a few utility functions. I’ll cover each one in order.

First is my favorite four-line sequence expression: return all files under a given folder. Using yield! inside of a sequence expression is quite sexy.

let rec filesUnder basePath =

    seq {

        yield! Directory.GetFiles(basePath)

        for subDir in Directory.GetDirectories(basePath) do

            yield! filesUnder subDir }

Next is creating a type called FixedFile, which keeps track of any changes we made to the source code. (That is, if we need to rewrite any lines containing testcase metadata.) The code simply tries to modify each line, and if the updated array of strings is different then we know the file has been modified.

You cannot compare two arrays to see if they are equal in C#, because it will use referential equality (comparing pointers.) The F# code on the other hand uses structural equality (comparing array elements).

type FixedFile = { FilePath : string; FileFixed : bool; UpdatedLines : string[] }

           

let fixFile filePath =

    let lines = File.ReadAllLines(filePath)

    let fixedLines = lines |> Array.map fixExpectedMessageLine

    // Note use of structural equality

    {

        FilePath = filePath

        FileFixed = (lines <> fixedLines)

        UpdatedLines = fixedLines

    }

Next is a couple of functions to checkout a file for edit using using Source Depot. (Shelling out to tf.exe to use Visual Studio Team Foundation Server would work just as well.)

/// Spawns a new process. Returns (exit code, stdout)

let shellExecute executablePath workingDir args =

 

        let startInfo = new ProcessStartInfo()

        startInfo.FileName         <- executablePath

        startInfo.WorkingDirectory <- workingDir

        startInfo.Arguments        <- args

       

        startInfo.UseShellExecute        <- false

        startInfo.CreateNoWindow         <- true

        startInfo.RedirectStandardOutput <- true

 

        let proc = Process.Start(startInfo)

        proc.WaitForExit()

        (proc.ExitCode, proc.StandardOutput.ReadToEnd())

 

/// Checks out a given file for edit

let checkoutFile filePath =

 

    printfn "Checking out: %s" filePath

 

    let pathToSD = @"d:\Tools\SourceDepot\sd.exe"

    let enlistmentPath = @"D:\dd\cambridge_staging_3\src\"

   

    // Add working directory so SD knows which client to use...

    let (ec, txt) = shellExecute pathToSD enlistmentPath ("edit " + filePath)

   

    printfn "SD exited with code %d\n%s" ec txt

    ()

Putting it Together

Unfortunately our script is somewhat anti-climactic. To fix all 600 compiler testcases the actual work is done in six lines of code:

@"D:\dd\cambridge_staging_3\src\tests\fsharpqa\Source"

|> filesUnder

|> Seq.map fixFile

|> Seq.filter (fun fixedFile -> fixedFile.FileFixed)

|> Seq.iter (fun ff -> checkoutFile ff.FilePath

                       File.WriteAllLines(ff.FilePath, ff.UpdatedLines))

So about 167 lines later, I have a script file that automates a task that would have easily taken a couple work days to complete. This F# script leveraged the existing IO and Regular Expressions libraries built into .NET, and can easily be packaged into a class library and shared with other developers.

So when you next encounter a problem you want to automate, consider looking to see if an F# script is the right solution for you.

Posted by ChrSmith | 2 Comments
Filed under: , , ,

F# Community Roundup

There’s plenty going on in the F# community these days, I figured I’d provide a shameless plug for few blog posts or videos I especially liked.

Toronto F# Study Group

If you live in Soviet Canuckistan and are in the Toronto area, Justin Lee is hosting an F# Study Group at the Dark Horse Cafe in Spadina on Thursday, May 7th.

IL analysis using F#

Ian Voyce looks (briefly) at some minimal F# code for picking apart the IL of a managed assembly. This is certainly something that I’d love to see more of in the future.

To learn a bit more about IL, a great resource is Haibo Luo’s (now defunct) blog.

Jason Olson, the Tom Cruise of F#?

Jason Olson F# lover and all around nice guy has been busy the past few weeks:

Imperative Computation in F#

Thomas Petricek has a great blog post up showing how you can create imperative constructs for F# code when using computation expressions. In short, you can add support for break and continue within while loops (just like C#).

// Counts '1, 3, 5, 7, 9'
imperative { 
    for x in 1 .. 10 do 
        
        if x % 2 = 0 then 
            do! continue // <---
            
        printfn "number = %d" x 
}
 
// Counts '1, 2, 3' and then stops   
imperative { 
    for x in 1 .. 10 do

      if x >= 4 then 
        do! break // <---

      printfn "number = %d" !x
}

If you’d like to keep a better pulse on what the blogosphere has to say about F#, checkout the Planet F# RSS feed.

Posted by ChrSmith | 1 Comments

Idiomatic F# – Functional vs. Imperative

Our story begins with this guy, Stuart Bowers, sipping functional programming cool aid out of a kickass mug from Cafe Press.

IMAGE_154

Stuart is a developer on Amalga and full-time F# developer at Microsoft. The world is at a loss because this man does not blog...

Anyways, he has been reviewing chapters some chapters of my book and we recently had a long conversation about what should be idiomatic F# style which I’d like to share.

The Problem

Question: What is the ‘correct F# way’ to write the following code:

Create a shared, mutable value and spawn two thread pool threads to sum elements in a list. (Intentionally creating a race condition, which was the point of the code example.)

The Imperative (?) Way

Let’s start with how to do this imperatively.

open System.Threading

let sumArray (arr : int[]) =
    let total = ref 0

    // Add the first half
    let thread1Finished = ref false
    ThreadPool.QueueUserWorkItem(
        fun _ -> for i = 0 to arr.Length / 2 - 1 do
                    total := arr.[i] + !total
                 thread1Finished := true
        ) |> ignore

    // Add the second half
    let thread2Finished = ref false
    ThreadPool.QueueUserWorkItem(
        fun _ -> for i = arr.Length / 2 to arr.Length - 1 do
                    total := arr.[i] + !total
                 thread2Finished := true
        ) |> ignore

    // Wait while the two threads finish their work
    while !thread1Finished = false ||
          !thread2Finished = false do

          Thread.Sleep(0)

    !total

The imperative code is very straight forward and offers a few benefits:

  • Simplicity. Not a whole lot is actually going on, just two calls to QueueUserWorkItem and a couple of boolean flags to keep track of when the threads complete.

However, there are some drawbacks too:

  • Redundancy. Both lambdas passed to QueueUserWorkItem are essentially the same, so the code is duplicated. If you want to spice things up, such as adding logging you will either forget to update the code in both locations or simply add it twice.
  • Hidden data. The key difference between the two threads is the range of elements in the array that they sum. The first array counts 0 to len / 2 – 1, the second thread to len / 2 to len. However, this crucial fact is buried in the code in two separate places.

The Functional (?) Way

Now let’s examine how to do this in a more functional style.

open System.Threading

let sumArray (arr : int[]) =
    // Define a location in shared memory for counting
    let total = ref 0

    // Define two flags in shared memory
    let thread1Finished, thread2Finished = ref false, ref false
    
    // Generate a lambda to sum a section of the array. 
    let sumElements (startIndex,endIndex) flag = 
          fun (_:obj) -> 
              for i = startIndex to endIndex do
                total := arr.[i] + !total
              flag := true  
    
    // Divy up the array
    let firstHalf  = 0,(arr.Length / 2 - 1)  
    let secondHalf = (snd firstHalf)+1,(arr.Length - 1)
    
    // Generate Delegates to sum portions of the array
    let thread1Delegate = sumElements  firstHalf  thread1Finished
    let thread2Delegate = sumElements  secondHalf thread2Finished
    
    [ thread1Delegate; thread2Delegate ]       // Queue up the delegates     
    |> List.map ThreadPool.QueueUserWorkItem   // Queue the use work items
    |> ignore                                  // Ignore the results 
        
    // Wait while the two threads finish their work
    while !thread1Finished = false ||
          !thread2Finished = false do

          Thread.Sleep(0)

    !total

Let me start with the drawbacks of this functional style first:

  • Too many values. There are seven let bindings in the function – presumably they are all important.
  • Concept count. This more functional way of doing thing certainly requires you to understand a bit about F# coding. (Mapping the thread code to a curried QueueUserWorkItem is pretty hot though.)

While arguably obfuscated, the functional style offers some clear benefits:

  • Important data is called out. Rather than hard coding the ‘0 to len / 2’ and ‘len / 2 to len’ values, the bounds for which to sum array elements is called out explicitly and on consecutive lines. This makes declaring an off-by-one error much easier to spot and fix.
  • Code reuse. Rather than hard coding the two thread pool functions, a sumElements function is introduced which takes explicit bounds to sum.

Personally I’m partial to the simplicity of the first example, but the value in calling out the range values in the second example shouldn’t be discounted. One of the best things I like about F# is that being a multi-paradigm language there is usually more than one approach to solve a problem, meaning you have options.

Thanks goes to Stuart for letting me post his code. I’m curious to hear what you think with regard to the two styles, please leave a comment and let me know what you think.

Posted by ChrSmith | 11 Comments
Filed under: ,

F# and the PFX Round 1

I’m currently working on a chapter for Programming F# titled Asynchronous and Parallel Programming which should turn out pretty well. Anyways, I have an example that was too awesome not to blog.

In this post I’ll show how to use the Parallel Extensions to the .NET Framework for finding the shortest path between two nodes on a graph.

Problem Description

Let’s say you are planning a road trip this summer, you will start in city X and travel to city Y. Since your car cannot fly, you cannot head straight to city Y. Instead, you need to find the shortest path from X to Y. (With bonus points for site seeing, like the World’s Largest Rocking Chair in Iowa)

CS Gangstas have studied this problem and gave it the banal name ‘Shortest path problem’. If you do your research you can find plenty of clever and efficient algorithms; in this blog post I’ll just be solving this problem using a simple approach.

The Algorithm

image

The simplest way to solve this problem is create a dictionary of ‘shortest known path to city X’. Then, keep track of the cities you would like to visit and the distance traveled so far on that route.

If you are at a city and the distance you have traveled so far exceeds the shortest known path, then return because you already know a shorter route to that city.

Otherwise, update your ‘shortest known path’ data structure, and visit all of the city’s neighbors and repeat the process. Once you have exhaustively tried all paths the resulting ‘shortest known path’ data structure will contain the shortest path from your starting city to your final destination.

Lame Solution in C#

Let’s try to quickly write this up in C#.

    enum City
    {
        LosAngeles,
        Seattle, 
        // [List cities here]
        Boston,
        NewYork
    }

    /// <summary>
    /// Initialize the routes between cities and the distances. 
    /// Example: int distTweenBostonAndChicago = x[Boston][Chicago];
    /// </summary>
    static void InitializeDistances(Dictionary<City, Dictionary<City, int>> dist)
    {
        // [Load your data here]
        return new Dictionary<City, Dictionary<City, int>>();
    }

    /// <summary>
    /// Object to represent a route to a given city.
    /// </summary>
    class Route
    {
        public Route()
        {
            DistanceTraveled = 0;
            Route = new List<City>();
        }
        /// <summary>
        /// Add a new leg to an existing route.
        /// </summary>
        public Route(Route r, City nextCity, int distance)
        {
            DistanceTraveled = r.DistanceTraveled + distance;
            Route = new List<City>(r.Route);
            Route.Add(nextCity);
        }
        public int DistanceTraveled { get; set; }
        public List<City> Route { get; set; }
    }

    /// <summary>
    /// This makes up for C# not having Tuples...
    /// </summary>
    class CityToVisit
    {
        public CityToVisit(City c, Route r)
        {
            this.City = c;
            this.Route = r;
        }
        public City City { get; set; }
        public Route Route { get; set; }
    }

    static void Main(string[] args)
    {
        var distTweenCities = new Dictionary<City, Dictionary<City, int>>();
        InitializeDistances(distTweenCities);

        var StartingCity = City.LosAngeles;
        var FinalDestination = City.NewYork;

        // Keep track of the shorest paths to get from the starting city to
        // any other reachable city.
        var shortestPaths = new Dictionary<City, Route>();

        // Initialize our 'cities to visit' list, begining at our start city having
        // traveled zero miles.
        var citiesToVisit = new Stack<CityToVisit>();
        citiesToVisit.Push(new CityToVisit(City.LosAngeles, new Route());


        while (citiesToVisit.Count > 0)
        {
            var curLoc = citiesToVisit.Pop();

            // Have we visited this city before? If not, then add it...
            if (!shortestPaths.ContainsKey(curLoc.City))
                shortestPaths.Add(curLoc.City, nextCity.Route);

            // Is the route we used to get here shorter than the best known?
            if (shortestPaths[curLoc.City] < curLoc.Route.DistanceTraveled)
            {
                // Update our shortest paths...
                shortestPaths[curLoc.City] = nextCity.Route;

                // ... and visit all of its neighbors using this shortest path
                foreach (var neighboringCity in distTweenCities[curLoc.City].Keys)
                {
                    var updatedRoute = new Route(curLoc.Route, 
neighboringCity,
distTweenCities[curLoc.City][neighboringCity]; var cityToVisit = new CityToVisit(neighboringCity, updatedRoute); citiesToVisit.Push(cityToVisit); } } } Console.WriteLine("Shorest path from {0} to {1} is {2} miles.", StartingCity, FinalDestination, shortestPaths[StartingCity][FinalDestination]); } }

Awesome Solution in F#

So while the above code solves the problem, it suffers two major setbacks. First, the algorithm behaves on order O(N^2), which means it is slow. This wouldn’t be as much of a problem if it weren’t for problem two: the implementation is single threaded.

Being either slow or being single threaded is OK, but both at the same time is bad news.

Fortunately in CLR 4.0 (and available on 3.5 in CTP form) the Parallel Extensions to the .NET Framework (PFX) make parallelizing code pretty simple.

It is pretty clear how you to execute our algorithm in parallel, simple parallelize the process of visiting each city. The problem however comes when you are trying to read or write from the shorestPaths dictionary. Having multiple threads read and write from the same data structure at the same time is a recipe for disaster.

However, the PFX contain some built-in collection types that are designed exactly for this sort of application. The PFX ConcurrentDictionary type acts and behaves just like a normal dictionary, except that you can safely use it in a concurrent environment. (I know that sounds really vague, I’ll write more about the benefits of the ConcurrentDictionary type in a later post.)

Pedant’s Note: The solution locks on STDOUT so output messages don’t spew nonsense to the output; however this effectively makes the program run synchronously.

open System
open System.Threading.Tasks
open System.Collections.Generic
open System.Collections.Concurrent
    
type City = 
    | Boise     | LosAngeles    | NewYork   | Seattle
    | StLouis   | Phoenix       | Boston    | Chicago   
    | Denver
    
// Known disances between US cities in miles. If there is a path from A to B,
// there is also a path from B to A. If no path is listed, then you cannot travel
// between the two cities.
let distTweenCities = 
    let distances = new Dictionary<City, Dictionary<City, int>>()
    
    [
        (Boise,   Seattle, 496);      (Boise,   Denver,  830);    
        (Boise,   Chicago, 1702);     (Seattle, LosAngeles, 1141); 
        (Seattle, Denver,  1321);     (LosAngeles, Denver,  1022);  
        (LosAngeles, Phoenix, 371);   (Phoenix, Denver,  809);      
        (Phoenix, StLouis, 1504);     (Denver,  StLouis, 8588);     
        (Denver,  Chicago, 1009);     (Chicago, NewYork, 811);     
        (Chicago, Boston,  986);      (StLouis, Chicago, 300);
        (Boston, StLouis,  986);      (NewYork, Boston,  211)
    ]
    // Convert the list of links between cities into a dictionary
    |> List.iter (fun (cityA, cityB, dist) -> 
        
        if not <| distances.ContainsKey(cityA) then
            distances.Add(cityA, new Dictionary<City, int>())
        if not <| distances.ContainsKey(cityB) then
            distances.Add(cityB, new Dictionary<City, int>())
            
        distances.[cityA].Add(cityB, dist)
        distances.[cityB].Add(cityA, dist))
        
    // Return our dictionary of dictionaries of distances
    distances

let stdout = Console.Out

let shortestPathBetween startingCity finalDestination =
    
    // Keep track of the shorest path from 'startCity' to a given city, and
    // the path used to get to that city.
    let shortestPaths = new ConcurrentDictionary<City, int * City list>()
    
    let rec searchForShortestPath curCity distanceSoFar citiesVisitedSoFar = 

        // Visit all available cities from the current city
        let visitAvailableDestinations() =
            
            let availableDestinations = distTweenCities.[curCity]
            
            // Loop through destinations and spawn new tasks
            for dest in availableDestinations.Keys do
                Task.Factory.StartNew(
                    new Action(fun () -> 
                        searchForShortestPath 
                            dest 
                            (distTweenCities.[curCity].[dest] + distanceSoFar) 
                            (citiesVisitedSoFar @ [dest])
                )) |> ignore

        // Have I already found a way to travel to this city?
        let visitedBefore = shortestPaths.ContainsKey(curCity)
        
        if not visitedBefore then
            // First time visiting this city, add to our 'visited cities'
            shortestPaths.TryAdd(curCity, (distanceSoFar, citiesVisitedSoFar))
            
            lock stdout 
                 (fun () -> printfn 
                                "Origional route to %A (%d) via: %A" 
                                curCity distanceSoFar citiesVisitedSoFar)

            
            visitAvailableDestinations()
        else // We have visited this city before, let's see if this route is faster
            let shortestKnownPath, cities = shortestPaths.[curCity]
            
            if distanceSoFar < shortestKnownPath then
                // Update shortest path, revisit neighboring cities
                shortestPaths.[curCity] <- (distanceSoFar, citiesVisitedSoFar)
                
                lock stdout 
                     (fun () -> printfn 
                                    "Found shorter route to %A (%d) via:  %A" 
                                    curCity distanceSoFar citiesVisitedSoFar)
                
                visitAvailableDestinations()
            else // Ignore, we have already found a faster way to get here
                ()

    // Create the master task to find the shortest path between the two cities
    let t = 
        Task.Factory.StartNew(
            new Action(fun () -> searchForShortestPath startingCity 0 [])
        )
    t.Wait()
    
    let dist, path = shortestPaths.[finalDestination]
    
    printfn 
        "The shortest distance from %A to %A is %d miles, with route:\n%A" 
        startingCity finalDestination 
        dist path
Posted by ChrSmith | 11 Comments
Filed under: , ,

Book Update – NEW Coauthor!

You might be concerned about my lack of blogging, perhaps that I was lost in the woods and perhaps eaten by a bear. Well, unfortunately, all of my time has been devoted to three tasks:

I would like to give you an update on item #1 my book, as there have been some rather interesting developments. There are plenty of great books on F# being written, and book competition is a great thing. However, one F#-related book in particular has caught my attention:

 

Real World Functional Programming

By Tomas Petricek

clip_image001

Tomas currently works as an intern at Microsoft Research in Cambridge on F# and is crazy-smart. (Seriously, he’s a level-9 ninja master at F#. Have you checked out F# Web Tools?)

But I’m not too worried about my book competing with his for two reasons:

  1. His book is about Functional Programming and mine is specifically about F#. So while there is some overlap, his book is about applications of functional programming while mine is about concepts and syntax.
  2. His has some creepy looking dude* and mine will (hopefully) have a fire-breathing Hydra on the cover.

For a time I wasn’t worried about Tomas’ book. Until now…

It was recently announced that Manning was adding a co-author to Tomas’ book; none other than Jon Skeet himself. Make no mistake, Jon is not some mortal programmer. Rather, his brain contains 63.8% of all knowledge on StackOverflow.

So Real World Functional Programming not only has one great authoring. But now it has two. So really, any other book on F# or functional programming is simply hosed. No point in publishing it – Tomas Petricek and Jon Skeet are teaming up to write a book on functional programming. Save your time and go home.

I don’t mind the challenge of competing with this book, but it certainly raises the stakes.

I’ve been making a few phone calls to Tim O’Reilly and Laurel Ruma about what we can do about this. It took a while to get layers to figure things out, but now I have a new coauthor of my own for Programming F#:

image

That’s right. CHUCK NORRIS has agreed to not only write the forward, but help coauthor the book to!

Chuck and I are still working out who will own which chapters, but I’ve been able to get an advanced copy of his forward to the book.

Dear Reader,

The fact you are sitting here reading this forward means that you weren’t already killed by the Russians. You have me and me alone to thank for that.

In this book Chris Smith and I will teach you how to roundhouse kick concurrency issues, use an M16 to mow down mutation-related bugs, and how to divide by zero. (I’ve been doing it for years, and trust me – it’s worth it.)

While I am very excited to have Chuck onboard, reading the page on Jon Skeet Facts has me worried.

  • Jon Skeet is immutable. If something's going to change, it's going to have to be the rest of the universe.
  • The Dining Philosophers wait while Jon Skeet eats.
  • Jon Skeet can recite π. Backwards.
  • When Jon gives a method an argument, the method loses
  • When Jon pushes a value onto a stack, it stays pushed
  • When invoking one of Jon's callbacks, the runtime adds "please"

It may be a tough call as for which book turns out better, but I can’t wait to see the results! In related news, Chapters 1-4 of Programming F# are available on Rough Cuts with another couple on the way. So stay tuned!

*Rumor has it that, the “creepy looking dude” on the cover of Real World Functional Programming is Danish philosopher Søren Kierkegaard [pic]… but does he breath fire?

Posted by ChrSmith | 4 Comments

Petition for Programming F#’s Book Cover!

You might have noticed my blogging has slowed down to a lull; never fear, this is just because I have been working  it is been because I have been hard at work writing I am working full steam on finishing up Programming F#. But I cannot finish this book without your help, read on.

The good news

The good news is that the first three chapters of Programming F# should be available online at O’Reilly’s Rough Cuts site very soon. (Maybe this week…) Signing up for Safari will give you access to pre-release chapters of the book as well as give me feedback on how to make the book better. (Something I would greatly appreciate.)

The bad news

But to be honest, as much as I’d like your feedback on early chapters of the book what I really need your help on is convincing ‘the man’ that Programming F# deserves a better cover than merely a fish, a rat, or a rabbit. Seriously, a rabbit?!? The linked books are all great, but having some cute and cuddly creature on the front doesn’t really inspire me to take the technology seriously.

That is why when I first announced the book I mocked up a cover with a Squig.

The Squigg - old and busted 

But ‘the man’ shot me down. You see, if O’Reilly is going to start having creatures more awesome than a frog on the cover they need to enforce a few other rules:

  • The creature should be easily identifiable. The awesomeness of the Squig is self-evident, but few can name what it is exactly. The cover on the book should be something people could recognize.
  • The creature can’t be copyright. Most of the Squig’s awesomeness was due to the fact it was created by Games Workshop, currently licensed to Mythic Entertainment. Licensing stuff like this makes lawyers cry at night.

So I literally went back to the drawing board to think about what animal best represents F#. Something that conveys its multi-paradigm nature. Specifically, it supports imperative, object-oriented, and functional programming

Then it hit me. What creature has three heads and totally kicks ass?

 

Cerberus

800px-Cerberus-Blake

Cerberus (Greek: Κέρβερος, Kérberos) is the name given to the entity which, in Greek and Roman mythology, is a multi-headed dog which guards the gates of Hades, to prevent those who have crossed the river Styx from ever escaping.

Now we are onto something, but to be honest having the gatekeeper for hell on the cover of my book doesn’t sit too well with me. While certainly keeping the denizens of Hades in check is a noble task, it carries along a lot of spirtual baggage. Is Cerberus evil? Or just doing his job?  Also, it isn’t like there are many cerberi – rather there is just the one (that I know of).

So, unfortunately I need to try again. But the more I think about it I like my second choice much better. Instead of Cerberus I went with a G-D-M-F-fire-breathing-HYDRA!

 

Hydra

You don’t F with those, you know why? Because it’s got multiple heads, and since the specifics on hydras are pretty unclear I’m going to go ahead and say that the hydra on the cover of my book is both undead and breathes fire.

Yes people, a lich hydra that breaths hellfire from its multi-paradigm heads of destruction.

So, I present to you, the cover of my book hopefully awesome enough to make the powers that be rethink their ‘cute and cuddly’ colophons.

Hydra Half

Use Multi-Paradigm Programming to Devour Your Foes!

“The only thing better than one fire-breathing paradigm is three!”

Yeah, NOW we are talking.

Call to Action!

However, to be clear this isn’t my official book cover, but it would certainly be awesome if it was. So I need your help to convince the good folks at O’Reilly the same.

So please politely send a tell to @LaurelAtOReilly on Twitter and let her know that a Hydra would make a much better cover than whatever the hell is on the cover of Unix in a Nutshell.

I also have a Cafe Press site up for T-Shirts. I haven’t received mine in the mail yet so I can’t comment on how well they will turn out.

Thanks for your help and support. And remember: the ability to breath fire is always better than being furry. Always.

Posted by ChrSmith | 21 Comments
More Posts Next page »
 
Page view tracker