Is C# becoming a functional language?

Article
01/23/2007

As many of you will be aware, C#3.0 is adding a significant number of new language features. While the overall driving force behind putting these features in is the support of LINQ (Language INtegrated Query), the way we do it is strongly inspired by functional programming techniques. Moreover, we strive to make the new features as general-purpose as possible (with the exception of query expressions – they are undeniably tied to the querying domain!), so they do come out as features that will enable more of a functional style of programming.

This begs the question: Are we making C# a functional language?

This of course is a vague question. First of all, what exactly is a functional programming language? What does it take to qualify? Rather than trying to define the canonical list of language features that you have to have to be functional, I think it makes more sense to define it as “a language that supports a functional style of programming.”

Given that, my just as vague answer would be “to some degree”: If you’re a dedicated functional programmer, is this the time to throw out your Haskell, Scheme, F# or OCaml compiler and join the party? Probably not. C# doesn’t go the whole way. In this post I’ll explore in more detail the goodie bag of functional programming, in a kind of willy-nilly fashion, not focusing on a particular functional language, and then examine to what extent C# is supporting each of the features or styles of programming.

Please be aware that Visual Basic is adding most of the same language features as C# in the Orcas release.

Expression-based programming

Wes Dyer has a great post about expression-based versus statement-based programming. When you construct things the expression-based way, you build them bottom-up from their constituents, as opposed to the statements based way, where you create things top down by modifying them.

Instead of

Point p1 = new Point();

P1.X = 3; P1.Y = 5;

Point p2 = new Point();

p2.X = -5; p2.Y = 4;

Point p3 = new Point();

p3.X = -1; p3.Y = -6;

Polygon triangle = new Polygon();

Triangle.Add(p1);

Triangle.Add(p2);

Triangle.Add(p3);

Given the right constructors you could do it the expression-oriented way:

Polygon triangle =

new Polygon(

new Point(3,5),

new Point(-5,4),

new Point(-1,-6));

Much nicer, and composes really well. Trouble is you have to have the right constructors, but sometimes you don’t, or sometimes it’s impractical or annoying to write all the constructors that correspond to reasonable initialization patterns of your type. Furthermore they may get big, and the caller then has to remember the order of all the arguments.

To this end, C#3.0 introduces object initializers and constructor initializers. Assuming that Polygon is a collection type (see my previous post for a definition of that term), you can now write this:

Polygon triangle =

new Polygon {

new Point { X = 3, Y = 5 },

new Point { X = -5, Y = 4 },

new Point { X = -1, Y = -6 }};

In LINQ, queries are all about expressions. In order to produce the right output of a query, you often need to construct new objects out of old ones as in

from m in members

group m by m.Address.PostalCode into group

select new Entry { PostalCode = group.Key, Number = group.Count() };

The subexpressions of the query expressions are, well, expressions, and it is really important that you can create and initialize new objects in that context, or we don’t have projection in our query story.

Value based programming

Expression based programming limits the need for mutation operations (assignment) to such a degree that some functional languages (the “pure” ones) don’t even have them. Instead all data, whether simple or complex, is considered immutable values.

Even if they do allow mutation, many functional languages rely more on value types. The lack of mutability is really quite a feature. No-one can mess with the stuff you pass to them, and concurrent access is really easy to synchronize! You can have sound value-based equality and hashing. Implementations can optimize to their heart’s contend by copying, parallelizing, etc.

In C# you can certainly create user defined types that are immutable, but it is far from as easy as in a functional language. Making properties immutable is quite easy – don’t write a setter - but the associated value based semantics for equality and hashcode computation are a pain to write.

C# 3.0 doesn’t change that a lot, but this is something we may want to look into in the future, as the need for parallelism increases with the number of cores in our CPUs, and the number of people wanting to use value-based techniques increases.

Types on the fly

Many functional programming languages rely to a larger degree on structural types that do not have to be declared but are just there when you need them: You have type constructors that allow you to build types from other types, and corresponding value constructors to built values of those types. In a fully structurally typed world you never declare a new type: any type declaration would just introduce an alias for an existing type. (And you need that, because these types can get large!)

We don’t quite go there in C#3.0, but we do take you further in the direction of not having to declare so many trivial little types. With anonymous types, for instance, the above query can be written as

from m in members

group m by m.Address.PostalCode into group

select new { PostalCode = group.Key, Number = group.Count() };

eliminating the need to author a trivial type Entry to hold your results. The limitation here is that you cannot designate the type in any way, so it cannot be used

Another thing that takes you in the direction of structural types is generics: You still have to declare a generic type, but once declared you can construct new types from it over and over. We utilize that in the LINQ libraries to make a set of fully generic delegate types called Func, one for each number of arguments, e.g.:

public delegate TResult Func<TArg0, TResult>(TArg0 arg0);

From this you can create a delegate for almost any one argument function type, e.g. Func<int,bool> for functions from int to bool. Because we use the Func types consistently within LINQ, in that context they essentially act as structural function types. You never need to write your own delegate type again. Essentially, Func is a type constructor for delegate types.

Tuples

One kind of on-the-fly types heavily used in functional programming is tuples: strongly typed lists of values; just like the set of arguments to a method, only as a value in itself that you can pass around.

In popular syntax (1,”one”) is an example of a tuple value of the type (int,string). Some functional languages don’t even allow you to specify functions of more than one value; instead that value can be a tuple. Where tuples would be really interesting would be for multiple results of a function or method. Imagine

(int,int) NextFib(int curr, int prev) { return (curr+prev,curr); }

Tuples is one really useful kind of type constructor that it might be useful to add in the future. Like the Func types above, you could imagine .NET having centrally defined generic Tuple<…> types predeclared up to a certain arity, e.g.:

public class Tuple<TFirst,TSecond> {

readonly TFirst first;

readonly TSecond second;

public TFirst First { get { return first; } }

public TSecond Second { get { return second; } }

public Tuple(TFirst first, TSecond second) { … }

}

You get the idea. It would then be a matter of syntactic desugaring to translate (5,3) into new Tuple<int,int>(5,3).

There are currently no plans of shared tuple types on .NET, but languages such as F# on top of .NET roll their own.

Pattern matching

To really make use of constructed values, you need a good way of deconstructing them. For the functional programmer, pattern matching is the tool of choice. With tuples for instance, rather than “dotting your way” into the individual values, as in:

(int,int) result = NextFib(5,3);

int current = result.First;

int previous = result.Second;

you can create and assign in one fell swoop the variables to hold the individual component value of the tuple, by “matching” its structure:

(int current, int previous) = NextFib(5,3);

Pattern matching can be used to declare a function by cases – as in:

FibHelp(0) = (1,0)

FibHelp(n) = NextFib(FibHelp(n–1))

This defines one function, but saves the if-then-else logic by matching on the arguments.

A final common use of type constructors and pattern matching is with built-in immutable list types. Let’s say that [] is the empty list. You can the “cons up” a list as e.g. 1 :: 2 :: 3 :: []. Each occurrence of :: creates a new list value with the left hand side value prepended to the right hand side list. The cool thing is that you can pattern match the elements back off when you see them:

AddAll([]) = 0;

AddAll(n :: l) = n + AddAll(l);

Pattern matching is a whole different approach to plucking values apart, and we are nowhere near adding such features to C#.

Higher-order programming

When you want to have a piece of code executed conditionally, or repeatedly, C# has a nice set of built-in control structures. This is, however, a very closed regime: No one gets to add their own control structures as new language features! In order to write your own control structures you need the ability to be parameterized with functionality, with chunks of code.

Any object oriented programming language actually facilitates that. You can write a class

class ChunkOfCode {

public abstract void DoIt();

}

which people can subclass to override the DoIt method with the right functionality. ChunkOfCode objects can then be passed to “control structure methods” that execute the DoIt methods if and where they want.

On .NET we make this a whole lot easier with delegate types. You can grab a delegate to any old method and pass it around.

In C#2.0 we made it even easier by allowing you to construct a delegate “on the fly” with anonymous methods. The major improvement here is not so much avoiding to declare a method, but the fact that anonymous methods are “closures” that allow you to refer to arguments and local variables in the enclosing method body. This is crucial when the functionality you want to pass depends on your local state, as is so often the case.

At the core of LINQ you find the standard query operators, such as Where, Select, GroupBy, Join, etc. These are really user defined control structures, parameterized by little bit of functionality represented as delegate types. Thus to make the passing of these delegates really neat we’ve given anonymous methods a face lift in the form of lambda expressions:

members

.GroupBy(m => m.Address.PostalCode)

.Select(group => new { PostalCode=group.Key, Number=group.Count() });

Lambda expressions in C# are not entirely like anonymous functions in a functional language because they don’t have a type in and off themselves. Instead they can be converted to delegate types, and with the Func types sufficiently standardized, the effect is pretty much the same.

Currying

We talk above about tuples as a way of making input and output more symmetric on methods or functions. However, whereas a C# programmer today only has the option of writing multiple parameters, not multiple results, a functional programmer will often choose to do the opposite – use tuples only for compound result types. The reason is that functions of multiple arguments are often written using currying, where the original function only takes one argument, and then returns a function that takes the rest. Example: Instead of

Add (x,y) = x + y

You write

Add (x) (y) = x + y

which is a shorthand for

Add x = y => x + y

So instead of calling Add(3,2) you can call Add(3) to get a function that will take an integer and return the result of adding 3 to it! Hardcore functional programmers will use this technique extensively and take great care to arrange their curried arguments in such an order that they will lead to the most useful “derived functions” when only some of the arguments are passed.

You have to question how useful this is in practice, but enough functional programmers swear by it that I guess I’m willing to believe it will save me some day.

Strictly speaking, with anonymous methods or lambda expressions C# methods can also be curried, but it ain’t pretty:

Func<int,int> Add(int x) { return y => x + y; }

However currying of lambda expressions themselves gets pretty agreeable:

x => y => x + y;

Type inference

One convenient feature of statically typed functional languages is that you can have most or all of your type annotations inferred for you so that typing is really the compiler’s problem. In an object-oriented world we don’t have that option in the large, and in statically typed object-oriented languages such as C# we sort of take pride in the fact that our APIs are in fact annotated with types that users can see. At the architectural level, having explicit contracts is important.

However, in the small there are often cases where types are just in the way.

Already in C# 2.0 we introduced type inference for generic method calls, so that most of the time you do not have to be explicit about the type arguments to a method – they are inferred from the types of the method arguments.

In C# 3.0 we take this idea further, by being really clever about the way we use lambda expressions for type inference when they are used as arguments to a call of a generic method. Thus the types “flow” through the method call in and out of the lambdas and out in the return type.

Furthermore we’re adding type inference for local variables with the var keyword. Thus in

var query = customers

.Where(c => c.City == “London”)

.SelectMany(c => c.Orders)

.GroupBy(o => o.OrderDate.Year);

Without writing a simple type the compiler can figure out that query is of type IGrouping<int,Order>.

In C# we have deliberately kept the type inferencing algorithm simple, in the sense that it never infers a type that is not already there in the constituent values: We don’t synthesize types. This keeps the types under control, so the user has a chance of understanding what is going on if they get an error message etc.

Comprehensions

Oftentimes the passing around of lambdas gets too messy even for a functional programmer, and so many functional languages introduce syntactic sugar for common patterns involving the passing of functions. Such sugar is called a comprehension (presumably because it makes the code comprehensible) and we snatched that idea right in with query expressions in C# and VB – indeed in the early design days we did call them query comprehensions. In C# these are really just syntactic sugar which enables people to write most queries without lambda expressions. The above query (or one very much like it), for instance is generated by the query expression

var query =

from c in customers

where c.City == “London”

from o in c.Orders

group o by o.OrderDate.Year;

The extreme syntacticness of comprehensions is central in LINQ; it means that we just generate the method calls to the query operators syntactically, but anyone can implement their own methods with the right names on a given type and have it be the source of query expressions.

Conclusions

C# 3.0 and LINQ owe a lot to the functional programming languages in terms of the mechanisms we adopt. No doubt this will rub off to some degree on non-LINQ scenarios where a more expression-oriented style of programming becomes possible. People who are functional programmers by night and have to write C# by day will find that the daily headache sets in a little later in the afternoon.

However, as you can see from the above, there is some way to go before full-blown functional programming becomes totally natural in C#. There is probably a limit as to how far you can straddle before the pants rip, and we may never get to a point where C# is a true multiparadigm language with equal treatment of object-oriented and functional programming. Still I think there is room for more of this kind.

We have this thing about listening to customers, so I guess it is also up to you!

Is C# becoming a functional language?

Additional resources