Welcome to MSDN Blogs Sign in | Join | Help

Correct by Construction in F#

Correct by Construction in F#

Jomo Fisher—a theme in the design of the F# language is that problems in the code are revealed as compilation errors. Consider this C# code which is used to compose a courteous letter to a customer:

enum Courtesy { Mr, Ms, Dr }

class Customer {

    public Courtesy Courtesy { get; set; }

    public string Name { get; set; }

}

class Letter {

    public string Introduction { get; set; }

    public string Body { get; set; }

}

static Letter Correspond(Customer customer, string message) {

    switch (customer.Courtesy)

    {

        case Courtesy.Mr:

            return new Letter { Introduction = "Mr. " + customer.Name,

                                Body = message };

        case Courtesy.Ms:

            return new Letter { Introduction = "Ms. " + customer.Name,

                                Body = message };

    }

    return null;

}

 

This compiles without errors despite having some problems (can you spot them?)

I want to stop now and claim that I’m not picking on C# which is a wonderful language. It’s just that I need to compare F# to something in order to illuminate my point and I happen to know C# pretty well.

Now, transliterate the code above into F#:

#light

type Courtesy = Mr | Ms | Dr

type Customer = {

    Courtesy:Courtesy

    Name:string

}

type Letter = {

    Introduction:string

    Body:string

}

let Correspond(customer,message) =

    match customer.Courtesy with

    | Mr -> {Introduction = "Mr. " + customer.Name; Body=message}

    | Ms -> {Introduction = "Ms. " + customer.Name; Body=message}

    | _ -> null

 

Compiling this gives a warning on the last line:

error FS0043: The type 'Letter' does not have 'null' as a proper value.

In C#, objects are allowed to be null by default and your code is responsible for handling the possibility of nullity everywhere. F# objects are not allowed to be null and, except in a few special cases such as interoperating with non-F# code, you can assume that object instances aren’t null.

So the caller of the Correspond function need not worry about receiving a null Letter instance and I can just delete the last line:

let Correspond(customer,message) =

    match customer.Courtesy with

    | Mr -> {Introduction = "Mr. " + customer.Name; Body=message}

    | Ms -> {Introduction = "Ms. " + customer.Name; Body=message}

 

Now I get a warning:

warning FS0025: Incomplete pattern match on expression. The value ‘Dr’ will not be matched

This is an easy mistake to make: I forgot one of the cases in the Courtesy enumeration.

An aside on pattern matching: If you code in F# you’ll find yourself using pattern matching for a significant amount of the logic in your program. It’s far more than just a synonym for switch. It can do complex matching against just about any type of object. This, combined with completeness checking, helps eliminate a large class of bug.

Here’s my final corrected code:

let Correspond(customer,message) =

    match customer.Courtesy with

    | Mr -> {Introduction = "Mr. " + customer.Name; Body=message}

    | Ms -> {Introduction = "Ms. " + customer.Name; Body=message}

    | Dr -> {Introduction = "Dr. " + customer.Name; Body=message}

 

This idea that F# should lead you to write code that is correct by construction is not just about the features I mentioned above. Its fundamentally weaved through the language. Take a look at how casting works for another example.

Honestly, it took me some time to get used to. I particularly struggled with the non-nullity of object instances. I was used to being able to smuggle the special “null” value around without changing my code to accommodate it. I realize now I was just sweeping a potential problem under the rug.

This posting is provided "AS IS" with no warranties, and confers no rights.

 

Posted by Jomo Fisher | 7 Comments
Filed under: ,

Strange Confluence: An Immutable Queue in F#

Jomo Fisher--Reading one of my favorite blogs this morning--Eric Lippert's Fabulous Adventures in Coding--I came across his article on implementing an immutable queue in C#. The funny thing is that just yesterday I wrote exactly the same structure in F#. Here it is for your reading pleasure:

    type Fifo<'a> =

        new()={xs=[];rxs=[]}

        new(xs,rxs)={xs=xs;rxs=rxs}

       

        val xs: 'a list;

        val rxs: 'a list;

   

        static member Empty() = new Fifo<'a>()

        member q.IsEmpty = (q.xs = []) && (q.rxs = [])

        member q.Enqueue(x) = Fifo(q.xs,x::q.rxs)

        member q.Take() =

            if q.IsEmpty then failwith "fifo.Take: empty queue"

            else match q.xs with

                    | [] -> (Fifo(List.rev q.rxs,[])).Take()

                    | y::ys -> (Fifo(ys, q.rxs)),y

This is code modified from a mutable structure that I found in the F# distribution. The queues are pretty similar except for a few minor details:

  • F# version doesn't implement IEnumerable<T>.
  • F# version relies on the built-in list class which is semantically identical to an immutable stack.
  • C# version carries the "popped" instance of T along with the queue itself whereas the F# version returns this as part of a tuple from Take(). I think I prefer my way on this point--it seems like information leakage for the queue have a memory of the last thing popped.

That's all for now, and as always...

 

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 1 Comments
Filed under: , , ,

Tight Code--A Puzzle in F#

Jomo Fisher--Luke Hoban wrote something in a blog entry that resonated with me:

One of the most striking features of F# code is that it is very terse - ideas can typically be expressed with a small amount of code. 

Don Syme once mentioned (I'm paraphrasing) that an aspirational goal for F# in the beginning was to make a compiler that could compile itself in less than 10,000 lines of code. Though that line count has since been passed I often get the sense that, with enough thought, I could boil whatever code I'm working on down to almost nothing.

Consider this coding problem (which I have since used as an interview question). Take a singly-linked-list of singly-linked-lists and pivot the inner values. For example say I have this list:

let have = [['a';'b';'1'];['c';'d';'2'];['e';'f';'3']]

Then the list I want is this:

let want = [['a';'c';'e'];['b';'d';'f'];['1';'2';'3']]

So the new list's first list has the first item from each of the original inner lists and so on. This problem actually came up in real-world piece of code I was working on. My first cut was procedural and about twenty lines of code. I don't have it here to show you, but after a few iterations and refactorings I got it down to a respectable nine lines:

let flatteninto source target =

    List.zip target source |> List.map(fun (head, tail)->head@[tail])

let pivot list =

    let rec work target = function

          head::tail-> work (flatteninto head target) tail

        | [] -> target

    let empties = list|>List.map(fun l->[])

    work empties list

At this point, I was stuck. It worked, but it seemed like there should be a better solution. In particular, the second to last line where I build up a list of empty lists didn't seem quite right. Being stuck, I asked my teammates whether there was a better solution out there. As it turns out, James Margetson had seen a beautiful four line solution to the problem.

I'll post the solution James showed me in a few days. In the meantime, I'd like to invite you to give the problem a try in the programming language of your choice. Can you cleanly beat my nine-line solution above? Can you get it down to four lines? I only ask that you don't change the input structure--singly-linked-list of singly-linked-list. Also, in my problem, the input was guaranteed to be well formed so I didn't need to check for jagged inner lists or other malformed inputs. Finally, notice that while I showed a 3x3 input in the example the dimensions don't have to be the same.

Update 11/20/2007 - Spoiler Alert!

As promised, here's the F# that James showed me

let rec transpose haves =

    match haves with

    | (_::_)::_ -> map hd haves :: transpose (map tl haves)

    | _         -> []

This is essentially the same as the Haskell algorithm that has been posted in the comments.

I want to thank the folks who posted solutions in various different languages--Scheme, Haskell, Ruby, C#, Python. Also, there was an OCaml solution which is indeed compatible with F#.

Several folks pointed out the analogy with matrix transpose. The code does get a lot easier when your input is an array of arrays, but you don't always get to pick your input representation.

One commenter opined that functional code may be easier to read than it is to write. This is true for me in one sense: there seems to always be a way to write the code a little better. Figuring out when I'm done is a challenge.

 

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 36 Comments
Filed under: ,

The Least a C# Programmer Needs to Know about F# Part II--Modules

Jomo Fisher--Many languages, especially those in the OO vein, require an outermost class to put code in. Usually, good practice requires an enclosing namespace as well.  F# allows functions and even function calls in the outermost scope. Here is the minimal F# program:

 

printf "Hello!"

 

Really, that's the whole thing. My first concern upon seeing this was that the all these functions in the global namespace would eventually collide in larger projects making things unmanageable. It seemed a lot like returning to the nasty global state that you sometimes see in older C codebases. It turns out, however, that F# puts each file's functions, function calls and types inside its own construct called a "Module". Consider this slightly modified version:

 

// File Hello.fs

#light

let Say s = printf s

Say "Hello!"

 

Which is saved in file called Hello.fs. This defines a function called 'Say' and then calls it. Using Reflector, I can see the Say method decompiled into C#:

 

[CompilationMapping(SourceLevelConstruct.Module)]

public class Hello

{

    public static T Say<T>(Format<T, TextWriter, Unit, Unit> s)   {

        return Pervasives.printf<T>(s);

    }

}

 

The F# compiler has put the method into a public class called Hello . This is F#'s representation of the Module. The class name Hello is derived from the file name Hello.fs.  You can change the Module name if you like:

 

module MyModule =

    let Say s = printf s

    Say "Hello!"

 

For larger projects, I think its best to explicitly name modules. For smaller, one-off projects its convenient not to have to think about it.

 

In most .NET projects I've worked on, there usually ends up being a few utility classes that just have a bunch of related static methods in them. These are functions that just didn't cleanly fit the OO paradigm. Its tempting to think of these classes as code smells. After all, surely there's a clean OO representation for them that I haven't thought of yet, right?

 

I think the answer to this is: no, not everything fits well into object-orientation.

 

My proof by existence is to consider .NET's Stream classes that were released in .NET 1.0. These classes were extremely well factored. They beautifully hid the underlying media or communication protocol behind an abstract stream of information.  They were also damn hard to use for simple cases. In a later .NET release the File, Path and Directory classes were introduced. These were just classes with nice, simple static methods for doing the common things you want to do with files, paths and directories.

 

F#  makes this style of API a first-class citizen: the Module.

 

So am I saying good-bye to OO? No way. But I consider it a hammer and not every problem is a nail.

 

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 3 Comments
Filed under:

WideFinder--Naive F# Implementation

Jomo Fisher--Here's an interesting problem that some people are having fun with. Don Box posted a naive implementation in C# so I thought I'd post the equivalent in F#: 

#light

open System.Text.RegularExpressions

open System.IO

open System.Text

 

let regex = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)", RegexOptions.Compiled)

 

let seqRead fileName =

    seq { use reader = new StreamReader(File.OpenRead(fileName))

          while not reader.EndOfStream do

              yield reader.ReadLine() }

             

let query fileName =

    seqRead fileName

    |> Seq.map (fun line -> regex.Match(line))

    |> Seq.filter (fun regMatch -> regMatch.Success)

    |> Seq.map (fun regMatch -> regMatch.Value)

    |> Seq.countBy (fun url -> url)

And here's the code to call it:   

for result in query @"file.txt" do

    let url, count = result

One nice thing is that F#'s interactive window has a #time;; option which shows you wall-clock time and CPU time. Here is the result from running the code above on a 256meg file I concatenated together (I couldn't find the one Don was using):

Real: 00:00:06.899, CPU: 00:00:04.165, GC gen0: 416, gen1: 1, gen2: 0

It looks like the majority of the time is in CPU so there should be ample opportunity to parallelize. One thing to note: I think the interactive window is unoptimized--when I just compile and run the code, I get times in the sub 5-seconds range. My machine is a 4-way 2.4 GHz Core Duo.

This posting is provided "AS IS" with no warranties, and confers no rights.


                    
				            
Posted by Jomo Fisher | 4 Comments
Filed under:

The Least a C# Programmer Needs to Know about F# Part I--Implicit Types

Jomo Fisher--A few weeks ago, a fellow C# programmer asked me what the biggest differences between programming in C# and programming in F# are. Since then, I've been building a list of differences. My plan was to write a single article that discussed everything. This morning the list got too long to reasonably put in a single article. Also, I'm not sure when I'll be done discovering the differences. So here's my first in a series of articles. (This is based on the 3.0 version of the C# language. If you want to see what's new in C#, it might be helpful to start here.)

 

You rarely mention types in F# code

In version 3.0, C# added the var keyword. This keyword allows you to not specify a type and instead rely on the compiler to figure out the type.  This works for locals only so most types are still explicitly specified in the code. In F#, this balance is reversed. You can explicitly specify types but you're rarely required to. F# has a powerful system for deducing the types. Consider this F# function:

 

let Add a b = a + b

 

This defines a function which adds two of something to each other. So what are the types of the variables a and b? The F# compiler has chosen int because of the addition that's happening in the function. A second function:

 

let AddPlusOne a b = a + b + 1.1

 

looks similar, but now the a, b and the return of the function are types as float. In this case, the compiler decides float is the right type because we're adding constant float value.

 

Most of the time in F# you don't specify types. I find myself relying heavily on the tooltips in Visual Studio for type information:

image

I think its legitimate to be uncomfortable with this level of implicitness. Especially coming from the C++ diaspora there is a tradition of being explicit when possible. Indeed, in F# you are free to explicitly specify types everywhere if you choose.

 

That said, I really like C#'s type inference and have used it in production code (LINQ to SQL). It tends to make the code more readable because you spend more time looking at the algorithm and less time looking at concrete type names.  I haven't shipped or maintained F# code yet, but my feeling is that more type inference is better.

 

 

 

This posting is provided "AS IS" with no warranties, and confers no rights.

 

Posted by Jomo Fisher | 6 Comments
Filed under: ,

Soma on F#

Soma announced some exciting news this morning. Developer Division--the people at Microsoft who make Visual Studio--is partnering with Microsoft Research on F#. We're going to fully integrate F# into Visual Studio:

“This is one of the best things that has happened at Microsoft ever since we created Microsoft Research over 15 years ago.  Here is another great example of technology transfer at work.  We will be partnering with Don Syme and others in Microsoft Research to fully integrate the F# language into Visual Studio and continue innovating and evolving F#.”

Don Syme and his team work out of Cambridge. Over the past few weeks we've been busily spinning up a team in Redmond to work with them. As its stands today, the team looks like this:

Cambridge: Don Syme (does everything), James Margetson (development) and Ralf Herbrich (development)

Redmond: Luke Hoban (PM), Me (development)

Of course we'll add more people over the next weeks and months. Critically, we still need to build the test organization.

The C# Product Unit is sponsoring the partnership and for the foreseeable future Luke and I will continue to work in the C# organization. I think this is a strong match because the C# org, as you probably expect, has a deep relationship with the other parts of Microsoft that will be important going forward: Parallel Fx (PFx), the CLR team, and so forth. We also have Anders Hejlsberg.

F# is a language that makes it easy to make beautiful, concise code. I'm excited to be part of getting it into people's hands.

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 6 Comments

Adventures in F#--Sweet Test-First Kung Fu

Jomo Fisher--Up until now, I've been avoiding using F# with the VS IDE. I've been using notepad.exe and fsc.exe because I wanted to build my own expectation for what the experience should be before I experienced what it actually was. I can tell you that I didn't expect the sweet experience of using the F# interactive window.

I've written in the past about Test Driven Development. Regardless of language, I really like the process of iteratively writing code and unittests tests for that code. For very small projects--simple algorithms and classes--I've found the process of getting going very tedious compared to the amount of actual code I want to write. In order to get started, you need a new project and solution for your code, a separate project for your tests and you need to reference your code project from your test project. All this means friction and tends to discourage firing up VS and freely inventing for very small projects.

Enter the F# Interactive window. Once you've installed F# and checked the Add-in box under Tools\Add-in Manager you'll see the F# interactive window. You can type code into this window and it will execute immediately. Importantly, you can select some text from your current source file and press alt-enter to execute it.

So here's my new process for these tiny one-off projects:

1) Ctrl-N to create a new text file (rename to mycode.fs)

2) Type in the skeleton of the function

3) Select function and press alt-enter

4) Write test, select it and press alt-enter (see failure)

5) Fix function to make test pass and goto 3

Notice that there's not a project or even a solution involved here--its just developer writing code. It's a really sweet experience. I only wish F# would install over C# Express 2008 since that's what I tend to use at home.

I like to show code in my posts when possible and I want to give you a visual idea of what I'm talking about. Here's a simple function I wrote--with tests--that takes a string in the form of A.B.C.D or vA.B.C.D and returns a tuple of 16-bit ints with values (A,B,C,D). The tests are at the end.

#light
let ParseVersion s = 
    // Build a list with each element of the version in reverse order.
    let rec parse(s:string,pos,count,result) = 
        let versionSize = 4
        if pos >= 0 then 
            if s.[pos] = 'v' && pos = 0 then parse(s,1,0,[])
            else 
                let dotPos = s.IndexOf(".", pos)
                if dotPos = -1 then parse(s, -1, count+1, int_of_string(s.Substring(pos))::result)
                else parse(s, dotPos+1, count+1, int_of_string(s.Substring(pos, dotPos-pos))::result)
        elif count < versionSize then parse(s, pos, count+1, 0::result) 
        else result
        
    let l = parse(s,0,0,[])
    (uint16(List.nth l 3), uint16(List.nth l 2), uint16(List.nth l 1), uint16(List.nth l 0))
    
ParseVersion "2.0.1"
ParseVersion "2.0"
ParseVersion "v2.0"
ParseVersion "v1.2.3.4"
ParseVersion "v1"
ParseVersion "1.2.3.4"

(Notice how I made it tail-recursive to take advantage of F#'s tail-call optimization?)

I wrote this using more-or-less test-first principles and I was very happy with the flow of the process.

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 3 Comments
Filed under:

Adventures in F#--Lazy Like a Fox

Jomo Fisher--This is the next part of the open-notebook series documenting my exploration of F#. Today, I'm looking at the lazy keyword. When an F# function is invoked its value is immediately evaluated. F# does not do lazy evaluation by default. However, laziness can be useful and F# provides an easy way to make a lazy function:

let f n = lazy (n+1)

This represents a function named 'f' that takes a parameter named 'n'. It returns a placeholder for the result of adding 1 to n. If the actual value is needed later during program evaluation then it is computed then. If the value is never needed then its never computed.

Turning to Lutz Roeder's Reflector, we can see the equivalent C#:

    public static Lazy<int> f(int n) {
        return new Lazy<int>(new LazyStatus.LazyStatus<int>._Delayed(new f@2(n)));
    }

    // Nested Types
    [Serializable]
    internal class f@2 : FastFunc<Unit, int> {
        // Fields
        public int n;

        // Methods
        public f@2(int n) {
            this.n = n;
        }

        public override int Invoke(Unit _delay1) {
            return (this.n + 1);
        }
    }

That first static function is the 'f' function I wrote in F#. It returns a Lazy<int> which is the placeholder I mentioned earlier.

Next, there's a class called f@2 that inherits from FastFunc<Unit, int>. If f@2 looks like a fishy identifier name to you the reason is that its not a legal C# or F# identifier. It is, however, a legal identifier for the CLR. I think the compiler is choosing this name to avoid conflicts with other identifiers in the project. I came across FastFunc before here--it's a class that represents a function call. This class holds an int called 'n'--this is the value captured from the call to function f.

The first type parameter to FastFunc<,> is Unit (my eyes have confused this with Uint several times now). Functions in F# aren't allowed to take or return no parameters. In the case of our function above, the returned lazy placeholder has had its parameter specified already so the resulting FastFunc needs to accept no parameter. The solution is Unit which is a type that has exactly one possible value (I assume this value is '1' because of the name of the type).

Let's take a peek at the Lazy class in fslib.dll:

    [Serializable, DefaultAugmentation(false), CompilationMapping(SourceLevelConstruct.RecordType)]
    public sealed class Lazy<A> : IStructuralHash, IComparable {
        // Fields
        public LazyStatus.LazyStatus<A> _status;

        // Methods
        public Lazy(LazyStatus.LazyStatus<A> _status);
        public sealed override int CompareTo(Lazy<A> that);
        public override int CompareTo(object that);
        public static Lazy<A> Create(FastFunc<Unit, A> f);
        public A Force();
        public sealed override int GetStructuralHashCode(ref int nRemainingNodes);
        public A SynchronizedForce();

        // Properties
        public bool IsDelayed { get; }
        public bool IsException { get; }
        public bool IsForced { get; }
        [CompilationMapping(SourceLevelConstruct.Field, 0)]
        public LazyStatus.LazyStatus<A> status { get; set; }
        public A Value { get; }
    }

The property Value is what evaluates and returns the computed value. That _status is essentially an enumeration that describes the current state: Delayed, Value, Exception. The 'Delayed' state means the function is currently uncomputed, the 'Value' state means the value has already been computed, finally 'Exception' means that computation was attempted and an exception resulted.

From this, it seems clear that the computation is only performed once and a cached value is returned thereafter. If there was an exception, then this exception is cached and rethrown each time the result is accessed.

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 3 Comments
Filed under:

Adventures in F#--Corecursion

Jomo Fisher--In a prior post I touched on recursion in F#. One of the comments was about mutually recursive functions. The example given was,

let f1 a
do print a
f2 a

let f2 a
do print a
f1 a

f1 1

It turns out that this F# doesn't compile because F# scoping rules are different than what you might expect coming from C# or VB. At least it wasn't what I expected. The function f1 cannot call f2 because f2 is not in scope yet because it's lower in the file.

Mutual recursion is a useful feature and I was sure F# must have a way to support it. I searched around quite a bit, but I didn't know the right question to ask the search engines. Eventually, I got some help from my friend  Luke Hoban.

The way to create mutually recursive functions in F# is with the and keyword:

let rec f1 n =
    f2 (n+1)
and f2 n =
    f1 (n+1)
   
f1 1

Will this program overflow the stack? Let's look at the corresponding C# decompiled by Lutz Roeder's Reflector:

    public static T f1<T>(int n) {
        return f2<T>(n + 1);
    }

    public static T f2<T>(int n) {
        return f1<T>(n + 1);
    }

At first glance, it looks like this will overflow the stack. However, if you try to verify this experimentally you'll see that it doesn't. You have to look directly at the IL to understand why:

.method public static !!T f1<T>(int32 n) cil managed
{
    .maxstack 4
    L_0000: ldarg.0 
    L_0001: ldc.i4.1 
    L_0002: add 
    L_0003: tail 
    L_0005: call !!0 Test::f2<!!T>(int32)
    L_000a: ret 
}

The secret is the tail opcode. This instructs the following call to behave like a jump rather than a function call. Since no stack is consumed, the stack cannot overflow.

 

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 4 Comments
Filed under: ,

Adventures in F#--Tail Recursion in Three Languages

Jomo Fisher—Here’s the F# I'm looking at today:

#light
let rec f n =
    do printf "%d\n" n
    f (n+1)

f 1

This defines a recursive function 'f' that takes a value 'n' as a parameter. This function prints the value of n to the console and then calls itself with n+1. So we've defined a function that will count up for a long time.

The question is, in what manner does this program end?

There are plenty of possibilities. The value of n must have a type and types always have physical limitations. A 32-bit number will overflow eventually, for example. If it overflows and arithmetic is checked, then it will throw an exception and that would end the program. If arithmetic isn't checked then the value will wrap around to the lowest value of the type, forming a never-ending loop. In this case, the program's ultimate fate is CTRL+C.

If the type of n is something with an arbitrary number of digits then perhaps the program could count up until memory on the machine is exhausted with holding information about the number. Though before this happened the machine would disintegrate along with the heat-death of the universe in a few trillion years. Probably way before that happened I would press CTRL+C.

Programmers accustomed to recursion may propose a more down-to-earth possibility. That the stack will overflow quickly and the subsequent exception will end the program.

Tail Recursion in C#

To see what I mean about stack overflow. Here's a hand-written transliteration of the F# above into C#:

    class Program {
        static void f(int n) {
            Console.WriteLine(n);
            f(n + 1);
        }

        static void Main(string[] args) {
            f(1);
        }
    }

On my 32-bit Vista machine, this counts up to 119922 and then throws a StackOverFlowException. On the other hand, if I run exactly the same executable on a 64-bit machine then it will run indefinitely, eventually looping back around to Int32.Min and continuing on from there until I press CTRL+C . There's a reason for this, and its not because there's more stack space on the 64-bit machine (though there is). I'll come back to this point later.

Tail Recursion in F#

If you run the F# I originally posed, it will run indefinitely on both 32-bit and 64-bit machines. To understand why, I'll disassemble that F# into C# using Lutz Roeder's reflector and render it into equivalent C#:

    public static T f<T>(int n) {
        while (true) {
            var fmt = new Format<FastFunc<int, Unit>, TextWriter, Unit, Unit>("%d\n");
            Printf.fprintf<FastFunc<int, Unit>>(Console.Out, fmt).Invoke(n);
            n++;
        }
    }

First, that Format business is just the printf in the loop. I'm a little surprised that the allocation of 'fmt' isn't lifted out of the loop. Note to self to investigate this later.

Setting that aside, the function which was written as recursive as been tranformed by the F# compiler into one with no recursion--its just a while loop. This is why there's no stack overflow in F#. This translation by the compiler is called the Tail Call Optimization.

Tail Recursion in C++\CLI

To get one more data point, let's look at the same function transliterated into C++\CLI:

    void f(int n) {
        printf("%d\n", n);
        f(n+1);
    }

    void main(){
        f(1);
    }

This will give you a nice warning:

warning C4717: 'f' : recursive on all control paths, function will cause runtime stack overflow

True to its word, if you execute the resulting code on either 32 or 64-bit machines you will get a stack overflow.

If you turn on full optmization in C++ (/Ox) you will still get the warning, but inspection under reflector shows that you will never actually get a stack overflow:

public private: static void __gc* modopt(CallConvCdecl __gc*) f(Int32 __gc* n)
{
    while (true)
    {
        <Module>::printf(&<Module>::?A0xdb2783a0.unnamed-global-0, n);
        n++;
    }
} 

As you can see, the C++ compiler has done the tail call optimization in this case.

Why the Difference Between 32 and 64 Bit?

There are some nagging questions left. Why does the C# version not overflow the stack under 64-bit when no tail recursion optimization is done by the compiler?

The missing piece is that, for managed code, there is a second chance to optimize. This is at JIT time or NGEN time. It turns out that the 64-bit jitter will do the tail call optimization on the raw IL and the 32-bit version doesn't (at least right now).

This raises a second question. Why did the unoptimized C++\CLI version overflow the stack on 64-bit machines? The reason is more mundane here: the C# project system targets AnyCPU by default while the C++ project system targets x86. If you run an x86-targetted .exe on a 64-bit machine it will still use the 32-bit jitter. If you change the C++ project to target 64-bit then you will get the tail optimized result.

Compiler Design Choices

So which compiler is doing it the right way? I think they all are. For C# applications tail recursion is relatively rare. Leaving the optimizing to the jitter to do or not makes sense--a better jitter benefits all managed languages. This tradeoff frees up the C# team to focus on things like LINQ. On the other hand, C++ already had a world-class optimizer available. It's a clear win to make as much of this optimizer work for C++\CLI as possible. Finally, tail recursion is very common when programming under the functional paradigm which is where you'd like to spend much of your time in F#. Without this optimization in the compiler itself, F# would be significantly crippled on 32-bit machines.

 

This posting is provided "AS IS" with no warranties, and confers no rights.

Posted by Jomo Fisher | 9 Comments
Filed under: , , ,

Adventures in F#--Discriminated Unions

Jomo Fisher-- Easily my favorite feature of F# so far is the combination of discriminated union and pattern matching. Together, these allow you to concisely represent a complex language-like data structure and operations over that structure. We used this pattern extensively in LINQ to SQL (though in C#) and it can be seen in some form or another in most compilers.

As in my past F# bloglets--Lay of the Land, Probing Type Inference, Function Type Inference, Poking Tuples with a Stick--I'm going to compile some simple F# and then use Lutz Roeder's Reflector to disassemble into C#. For this article I'm going to look at discriminated unions and save pattern matching for later.

Here's a simple discriminated union:

type MyUnion =
      Int of int
    | String of string
    | TwoStrings of string * string

This represents a type which may hold an integer, a string or a pair of strings. Here's the C#:

[Serializable, CompilationMapping(SourceLevelConstruct.SumType)]

public class MyUnion : IStructuralHash, IComparable

{

    public const int tag_Int = 0;

    public const int tag_String = 1;

    public const int tag_TwoStrings = 2;

 

    public MyUnion();

    public override int CompareTo(object that);

    public sealed override int CompareTo(Test.MyUnion that);

    public override bool Equals(object that);

    public override int GetHashCode();

    public sealed override int GetStructuralHashCode(ref int nRemainingNodes);

    [CompilationMapping(SourceLevelConstruct.Alternative, 0)]