This article will discuss effective use of types to catch some common problems at compile time not normally found by the typechecker.
The typechecker is the most important tool in the programmer's arsenal. Types enable you to structure code in a more maintainable way and prevent entire classes of errors from ever occurring at runtime, many of which can be insidiously difficult to debug. If you've written anything extensive in assembly language, you can understand the frustration of dealing with misinterpretations of data leading to corruption of program state. Researchers like to say "well-typed programs never go wrong".
But what does "go wrong" mean? One answer is: "more than you think". By making effective use of the type system to encode invariants about your program, you can use the compiler to catch more errors at compile time that you currently catch at run time.
A simple example is SQL injection, where insertion of user-supplied data into a SQL statement string can enable attacks where the user modifies the nature of the statement with carefully chosen data. At first the solution seems simple: double your apostrophes to escape them, and then the value cannot be misinterpreted. This can be trickier than sounds, though, since in a complex program you might not be able to identify which values come from an untrusted source, directly or indirectly, which are safely produced from trusted sources, and which are untrusted data that has already been properly escaped. These three types of data travel from method to method, following a winding path that obscures their source. The compiler is of no help, since it sees them all as just strings.
The solution, called "tainting" in languages like Perl and Ruby, is to make them not "just strings", but to create two types of data: tainted data, which is unsanitized data derived from an untrusted source, and untainted data, which is either derived from a trusted source or already sanitized. For our example, suppose we're performing our queries with SqlDataReader. We create a wrapper class for it which has an identical interface except that wherever SqlDataReader takes a string, our wrapper takes a SafeSqlString. This new type is simply a wrapper for string that can only be generated in a few strictly limited ways:
What does this buy us? Now, as long as we use our new SqlDataReader wrapper for all database access, we are essentially immune to SQL injection: the only way it could sneak in is through AssumeSafe(), which is obviously dangerous and makes the code easy to review. As an added bonus, double escaping can never happen: the data only gets escaped once as it changes from string to SafeSqlString, and Escape() cannot be applied to SafeSqlString.
Because these types flow through the program, we can specify that a particular method always returns a SafeSqlString, or assumes that its arguments are SafeSqlStrings, or assumes that some private variable is always safe, and enforce this statically instead of by convention. This is important, because it allows us to use AssumeSafe() at a point where it's obvious that the string is safe, as opposed to five calls up the call stack where we really have no idea who else might be calling us.
What else can we encode in types? All sorts of things! Here are some more examples:
In short, wherever you find yourself wanting to keep track of a static property as it flows through a program, types are a great way of doing so - use them to catch errors sooner.
In programming languages with more sophisticated type systems, you can encode even more information in types. Taking this to the extreme are dependent types, a relatively new system of types based on values that are extremely difficult to typecheck but so powerful that they can express many correctness properties of real programs. A number of research languages such as Dependent ML and Epigram already implement dependent types. With the power of systems like these, maybe in a few years the primitive C-based type systems of C# and C++ will look a lot like assembly language does now. But before we can take advantage of new technology, let's start making better use of what we have - encode your program's properties using types.