I'm just sayin'

  • A Tale of Two Compilers

    In previous posts, I have hinted at the fact that there is more than one C# compiler on a machine with Visual Studio and .NET Framework installed. Sometimes there are several.

    Simply put, when we release Visual Studio we release a compiler referred to as the in-process compiler, or in-proc compiler. We generally also release a new version of the .NET Framework at the same time. In the .NET Framework we also ship a separate compiler: the framework compiler, CSC.EXE. The in-proc compiler is tucked away in a Visual Studio DLL containing a bunch of other code as well. The presence of multiple compilers can result in an awkward servicing story and general confusion.

    Why two compilers? Well, that wasn't the original plan, but late in the Visual Studio 2005 cycle, the plan changed. Visual Studio doesn't just use the framework compiler for performance reasons. Using the in-proc compiler avoids the cost of spinning up another process, and it also reuses a database of interned strings. These issues may not seem significant today, but at one time they had a real performance impact.

    The downside of using the in-proc compiler is that it limits scalability. The Visual Studio address space is pretty crowded, and compiling large projects takes up a lot of address space. This wouldn't be a concern if Visual Studio spawned CSC.EXE for the build. So if you run into build scalability issues with a Visual Studio full build, you can often work around them by invoking MSBUILD.EXE from the command line and passing it the .SLN file.

    I hear you saying, "But my Visual Studio is spawning CSC.EXE. I see a message saying so in the output window." When the output window of Visual Studio tells you the command line it is invoking CSC.EXE with, don't believe it. Visual Studio is calling the in-proc compiler with the equivalent switches, not the framework compiler. This may change for future versions of Visual Studio, but in Orcas, you're getting the in-proc compiler.

    The presence of two compilers per release can pose a problem for servicing. Visual Studio and .NET Framework generally have two different servicing schedules. This means that sometimes users may see a fix in Visual Studio but not in CSC.EXE or vice versa. For service packs to Visual Studio 2005 and .NET Framework 2.0 this is definitely the case. Several more bugs were fixed in CSC.EXE in .NET Framework 2.0 SP1 (and SP2) than were in Visual Studio 2005 SP1. Thankfully, servicing of .NET Framework 3.5 and Visual Studio 2008 is happening at almost the same time. Right now we're working on .NET Framework 3.5 SP1 and Visual Studio 2008 SP1 and all fixes made so far have been made to both compilers.

    Speaking of servicing, we really appreciate all of the defect reports we get through Connect. I've heard a lot of grumbling about Connect, but do know that the C# compiler team places very high value on reports coming through this forum.

  • Problems Upgrading from .NET Framework 3.5 Beta 2

    I've heard reports of problems upgrading the C# command-line compiler from the Beta 2 version to the final release, and indeed there is a slight problem. The problem rests with the version number of the compiler's resource DLL.

    If you've upgraded from Beta 2 of Visual Studio 2008 to the final release version, or if you had previously installed the Beta 2 version of .NET Framework 3.5 and upgraded, your resources may not have been upgraded. Run csc.exe from the 3.5 directory and check the reported version.

    C:\WINDOWS\Microsoft.NET\Framework\v3.5>csc
    Microsoft (R) Visual C# 2008 Compiler Beta 2 version 3.5.30206
    for Microsoft (R) .NET Framework version 3.5
    Copyright (C) Microsoft Corporation. All rights reserved.

    If the banner still reports Beta 2, as in the output above, your resources were not upgraded correctly, and you may be getting error messages that are incorrect or worse than those of the released product. The reason for this failure to upgrade is that the resource DLL, cscompui.dll, which resides in a subdirectory of "v3.5" such as "1033," has a higher version number than that of the final released product, so setup did not upgrade it.

    To get the correct resources, manually rename cscompui.dll to something else and then re-run either VS 2008 or the .NET Framework install, repairing the installation. Afterwards confirm that you have a new cscompui.dll.

  • Breaking Change in Linq Queries Using Explicitly-Typed Range Variables

    There's a change coming in .NET Framework 3.5 Service Pack 1 that will affect some programs containing queries that explicitly specify the type of the range variable. The affected queries are those whose range variable type differs from the element type of the sequence being queried, and whose element type cannot be converted to the range variable type via a reference conversion or a boxing/unboxing conversion. Whew, that was a bunch of spec-speak. To help understand, consider the following query.

    var floats = new ArrayList { 2.5f, 3.5f, 4.5f };
    var ints = from int i in floats
               select i;

    Iterating over this query yields some surprising results, {2, 4, 4}. Why not {2, 3, 4} as one might expect? To see why this happens, let's start with the compiler's translation of this query into a series of method calls. The above query expression is rewritten into the following.

    var ints = floats.Cast<int>().Select<int,int>(i => i);

    Follow the flow of type information through this query. The source sequence "floats" is an ArrayList implementing IEnumerable. Cast<int>() takes this sequence as IEnumerable and returns a sequence implementing IEnumerable<int>. Select<int,int>() acts upon that sequence and returns another sequence of IEnumerable<int>. Now look at the signature of Cast<T>.

    public static IEnumerable<T> Cast<T>(this IEnumerable source)

    This method's purpose is to convert a non-generic IEnumerable sequence of some type T (or boxed T as the case may be) to IEnumerable<T> for use as an argument to the subsequent sequence operators, which must know the compile-time type of the sequence elements. It sounds simple enough, and it should be, but due to a late-game foul-up in development, it's not.

    The body of Cast<T> should effectively have these semantics: roll through the sequence converting each element to the target type T, iterator style. Something like this.

    foreach (object obj in sourceSequence) yield return (T)obj;

    Now, looking back at the original query, if Cast<T> were implemented with these semantics, a runtime exception would occur at (T)obj, the cast from boxed float to int. Can't do that: a boxed float can only be unboxed to float. You have to convert from boxed float to float. Then you can convert to int.

    But these aren't the shipping semantics of Cast<T>, and "magically" you can convert the sequence of boxed floats to a sequence of ints. You just get, uh, banker's rounding as opposed to truncation. Banker's rounding (round half to even) is not the C# user's expected behavior when converting float to int. I'm not sure it's anyone's expected semantics, but, sadly, it is what we shipped.
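
    To make the difference concrete, here is a minimal sketch. CastIntended and ThrowsOnEnumerate are hypothetical helpers written for this illustration, not the framework's code. Convert.ToInt32 is used only because it is a well-known API that rounds half to even; I'm not claiming it is what the shipped Cast<T> does internally.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CastSketch
{
    // Hypothetical stand-in for the intended semantics: one plain cast per element.
    public static IEnumerable<T> CastIntended<T>(IEnumerable source)
    {
        foreach (object obj in source) yield return (T)obj;
    }

    // Returns true if enumerating CastIntended<int> over the source throws.
    public static bool ThrowsOnEnumerate(IEnumerable source)
    {
        try
        {
            foreach (int i in CastIntended<int>(source)) { }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var floats = new ArrayList { 2.5f, 3.5f, 4.5f };

        // A boxed float cannot be unboxed straight to int, so this prints True.
        Console.WriteLine(ThrowsOnEnumerate(floats));

        foreach (object o in floats)
        {
            int truncated = (int)(float)o;     // unbox to float, then C# conversion: 2, 3, 4
            int rounded = Convert.ToInt32(o);  // rounds half to even: 2, 4, 4
            Console.WriteLine("{0} truncates to {1}, rounds to {2}", o, truncated, rounded);
        }
    }
}
```

    The truncated column is what a C# programmer expects from a float-to-int cast; the rounded column matches the surprising {2, 4, 4} from the original query.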

    The fix

    In .NET Framework 3.5 Service Pack 1 (SP1) we are going to return Cast<T> to its intended semantics described above. Not only is the current behavior not intuitive, it's slow as Christmas. But fixing this is obviously a breaking change. Once you get SP1 you may find that queries which once worked now throw exceptions. That's not great, but it's something that can easily be dealt with by developers: change the type of the range variable to match the collection element type and then, as necessary, add casts where the range variable is used.

    But one important thing to understand about this change is that the breaking change is in the .NET Framework libraries, not the compiler. Cast<T> is a framework method. This means that if your application contains a problematic query and has been distributed to users, it will begin to throw when your user gets SP1.
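
    Applied to the earlier query, the workaround might look something like the sketch below. The class and method names are made up for the example. The range variable is typed as float to match what is actually in the ArrayList, and the conversion to int moves into the select clause.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class RangeVariableFix
{
    public static List<int> Truncate(ArrayList floats)
    {
        // The range variable type matches the element type, so Cast<float>()
        // only has to unbox. The float-to-int conversion is an ordinary C# cast.
        var ints = from float f in floats
                   select (int)f;
        return ints.ToList();
    }

    public static void Main()
    {
        var floats = new ArrayList { 2.5f, 3.5f, 4.5f };
        foreach (int i in Truncate(floats))
        {
            Console.WriteLine(i); // prints 2, 3, 4 on separate lines
        }
    }
}
```

    Written this way, the query does not depend on the Cast<T> behavior that is changing, so it should give the same answer before and after SP1.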

    Avoid the problem altogether by omitting the range variable type

    The call to Cast<T> in the above query expression was introduced by the rewriter in response to the presence of an explicitly-typed range variable. That's how the syntactic rewrite rules are specified. But if you omit the "int" in "from int i" no call to Cast<T> is generated.

    Specification of the type of the range variable is optional in the query syntax, but if you're using a collection that only implements IEnumerable, you've got to specify it. On the other hand, when using a collection that implements IEnumerable<T> you can, and should, omit the range variable type. Not only does it avoid this entire can of worms, but it has the performance benefit of omitting an unneeded iterator in the chain of iterators mentioned before.
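
    For comparison, here is a sketch of the same query over a List<float>, which implements IEnumerable<float>. The names are again invented for the example. With no type on the range variable, the compiler infers float and the rewrite is just a call to Select, with no Cast<T> in the chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImplicitRangeVariable
{
    public static int[] Truncate(List<float> floats)
    {
        // No explicit type on f, so no Cast<T>() call is generated.
        // This is rewritten to floats.Select(f => (int)f).
        var ints = from f in floats
                   select (int)f;
        return ints.ToArray();
    }

    public static void Main()
    {
        var floats = new List<float> { 2.5f, 3.5f, 4.5f };
        foreach (int i in Truncate(floats))
        {
            Console.WriteLine(i); // prints 2, 3, 4 on separate lines
        }
    }
}
```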

  • The Win32Manifest Switch

    In order for managed applications to play nice with Vista, specifically to avoid virtualization when writing to special areas of the registry and filesystem, the VS2008 and .NET 3.5 C# and VB compilers write a manifest in the native resource section of EXEs. The C# and VB compilers now support a pair of switches, "Win32Manifest" and "NoWin32Manifest," to manage this embedding.

    The MSDN documentation on these switches is pretty good. Read it carefully and you'll notice a comment saying "the compiler inserts a standard application name 'MyApplication.app' into the xml" to work around a problem on Win2K3. This is not exactly true. Yes, the <assemblyIdentity> element is present in the manifest to work around a defect on Win2K3, but the compiler doesn't insert this element into the manifest. Instead the manifest is treated as opaque data by the compiler and simply placed in the correct location in the PE. If you don't specify one and you don't tell the compiler not to put the default in, then the compiler blindly writes in a default, static manifest.

    If you need to write your own custom manifest, maybe to specify a different execution privilege level, you gotta get it right (duh). Combine the absence of compile-time verification of the manifest with the fact that the manifest section of a PE is interpreted by the Windows loader at runtime, and you can produce spectacular failures from seemingly innocuous typos. But there is one failure that is more subtle and easily goes unnoticed without thorough testing.

    On Win2K3, if your managed application has an embedded manifest, and the manifest does not contain the optional <assemblyIdentity> element, then the CLR will not be able to locate your exe.config file and any assembly redirects (or anything else) you may have specified in there will not be honored. Interestingly, the "name" and "version" attributes of the <assemblyIdentity> element need not be meaningful to work around the problem. The element just has to be present, contain those attributes, and be well-formed.
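
    Since the compiler won't check any of this for you, a small sanity check in your own build or test tooling may be worth having. The sketch below is one way to do it, not something the compiler or SDK provides: ManifestCheck and HasAssemblyIdentity are hypothetical names, the sample manifest text is my own illustration, and its version value is an assumption. It only verifies that the XML is well-formed and that an <assemblyIdentity> element with "name" and "version" attributes is present.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class ManifestCheck
{
    // Namespace used by the root <assembly> element of an application manifest.
    static readonly XNamespace Asm1 = "urn:schemas-microsoft-com:asm.v1";

    // True if the text is well-formed XML whose root has an <assemblyIdentity>
    // child carrying both a name and a version attribute.
    public static bool HasAssemblyIdentity(string manifestXml)
    {
        XDocument doc;
        try
        {
            doc = XDocument.Parse(manifestXml);
        }
        catch (XmlException)
        {
            return false; // not well-formed
        }

        XElement identity = doc.Root.Element(Asm1 + "assemblyIdentity");
        return identity != null
            && identity.Attribute("name") != null
            && identity.Attribute("version") != null;
    }

    public static void Main()
    {
        // Illustrative manifest only; the version value is an assumption.
        string sample =
            "<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>" +
            "  <assemblyIdentity version='1.0.0.0' name='MyApplication.app'/>" +
            "  <trustInfo xmlns='urn:schemas-microsoft-com:asm.v2'>" +
            "    <security>" +
            "      <requestedPrivileges xmlns='urn:schemas-microsoft-com:asm.v3'>" +
            "        <requestedExecutionLevel level='asInvoker' uiAccess='false'/>" +
            "      </requestedPrivileges>" +
            "    </security>" +
            "  </trustInfo>" +
            "</assembly>";

        Console.WriteLine(HasAssemblyIdentity(sample)); // True
    }
}
```

    A check like this won't catch every typo the Windows loader might trip over, but it does catch the quiet Win2K3 failure described above.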

    That's the important part of this post, but perhaps you're wondering how we arrived at the decision to label every app produced by the managed compilers as "MyApplication.app" in the manifest...

    The original design of this feature, writing a manifest into the output, called for embedding a static file. There was not a strong motivation to add validation or comprehension of its contents. Our original default manifest didn't contain the <assemblyIdentity> element. No need. We were just trying to address Vista's UAC needs. But during testing we uncovered the defect on Win2K3 and then the workaround for it. At that point we had a choice: begin understanding the manifests and place the correct application name in the "name" attribute, or continue to use a static manifest with a meaningless application name. We chose the latter. There's no good reason or discoverable way to look at this name programmatically as far as I know. And expanding the scope of the feature and resetting the testing was just not worth the gain.

  • NXCOMPAT and the C# compiler

    The C# compiler in Visual Studio 2008 and the .NET 3.5 Framework (csc.exe) is now generating PE files with the NXCOMPAT bit set. What is that bit and who cares, you ask? You may very well care if your application interops with native binaries or exposes a plugin model to 3rd parties. First, some background...

    DEP is short for Data Execution Prevention. It is a technology that exists in Microsoft operating systems which prevents execution of code from memory pages which are not marked as executable. DEP exists to reduce the attack surface available to malicious software that is trying to hijack a process, and it has been acknowledged to be very helpful in that regard. In Windows Vista, the set of processes and applications to which DEP is applied is configurable by administrators, but there is also a role for application developers.

    In the header of a PE file there is a flag called IMAGE_DLLCHARACTERISTICS_NX_COMPAT. This flag affects whether or not the OS enables DEP for a process. Setting this flag tells the OS that the image is compatible with DEP. For executable images, if this flag is set, the process is run with DEP enabled unless the machine is configured with the DEP policy set to AlwaysOff. If the image is a DLL and the flag is set, the OS skips checking the DLL against a compatibility database which results in a small performance improvement. All of this applies to x86, 32-bit processes only. On a 64-bit OS, DEP is always enabled for 64-bit processes, but 32-bit processes are configured by the PE flag and system policy as described above. So how does one control the flag in the PE header?

    Since the C# compiler emits PE files which are MSIL only and therefore compatible with DEP, the output binaries from the VS 2008 and .NET 3.5 C# compilers have this flag set. Our expectation is that the vast majority of C# executables produced by these compilers will be part of a DEP-compatible application. For that reason we did not surface a compiler switch to configure the NXCOMPAT setting. Of course you can write a C# application that uses a native or mixed binary which is not compatible with DEP. Some ATL types in 7.1 and earlier used to do simple code generation into data pages which is a DEP no-no. If your application is generating IP_ON_HEAP exceptions, then you may need to clear the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag for your executable. To do this you can use EDITBIN.EXE from the VC toolset like so:

    editbin.exe /NXCOMPAT:NO <your binary>
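
    If you want to confirm what a given binary has before or after running EDITBIN, the flag can be read straight out of the PE header. The following is a rough sketch rather than a robust PE parser; NxCompatCheck and IsNxCompat are names I made up. It relies on the documented PE layout: the offset of the PE signature is stored at 0x3C, the 20-byte COFF header follows the 4-byte signature, and DllCharacteristics sits 70 bytes into the optional header for both PE32 and PE32+ images.

```csharp
using System;
using System.IO;

public static class NxCompatCheck
{
    const ushort IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100;

    // Reads the DllCharacteristics field and tests the NX_COMPAT bit.
    public static bool IsNxCompat(Stream pe)
    {
        var reader = new BinaryReader(pe);

        pe.Seek(0, SeekOrigin.Begin);
        if (reader.ReadUInt16() != 0x5A4D) // "MZ"
            throw new InvalidDataException("Missing MZ signature.");

        pe.Seek(0x3C, SeekOrigin.Begin);   // e_lfanew: file offset of the PE signature
        int peOffset = reader.ReadInt32();

        pe.Seek(peOffset, SeekOrigin.Begin);
        if (reader.ReadUInt32() != 0x00004550) // "PE\0\0"
            throw new InvalidDataException("Missing PE signature.");

        // Skip the signature (4), the COFF header (20), and 70 bytes of optional header.
        pe.Seek(peOffset + 4 + 20 + 70, SeekOrigin.Begin);
        ushort dllCharacteristics = reader.ReadUInt16();
        return (dllCharacteristics & IMAGE_DLLCHARACTERISTICS_NX_COMPAT) != 0;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: NxCompatCheck <path to exe or dll>");
            return;
        }

        using (FileStream fs = File.OpenRead(args[0]))
        {
            Console.WriteLine("NXCOMPAT: {0}", IsNxCompat(fs));
        }
    }
}
```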

    If you're using Visual Studio, you can add a post build step to your executable's project. You'll need to set up the environment so that EDITBIN's dependencies can be resolved. Since the post build steps you author in Visual Studio's properties page are written into a batch file that is launched by the build process, you can use Visual Studio's VSVARS32.BAT to establish the right environment. My post build step looks like this:

    call "$(DevEnvDir)..\tools\vsvars32.bat"
    editbin.exe /NXCOMPAT:NO "$(TargetPath)"

    If you sign the binary in Visual Studio, flipping the IMAGE_DLLCHARACTERISTICS_NX_COMPAT flag in the post build step after the binary has been signed will result in an assembly that will fail strong name validation. To work around this you'll need to begin signing your binary as part of the post build steps. To do this, use SN.EXE from the Windows SDK.

    The .NET 2.0 and VS 2005 compilers are also affected

    We like DEP so much that when you install .NET Framework 2.0 SP1 your C# compilers in VS 2005 and .NET Framework 2.0 will also begin to emit binaries with the IMAGE_DLLCHARACTERISTICS_NX_COMPAT bit set. This will undoubtedly surprise a few developers...download a framework service pack, recompile, run your app, and you're now getting IP_ON_HEAP exceptions. Obviously this is not ideal, but aggressively building a computing ecosystem filled with DEP-enabled applications and their accompanying security benefits is very beneficial to Windows users. If you begin to encounter IP_ON_HEAP exceptions after installing .NET Framework 2.0 SP1, you can use the same technique described above to clear the IMAGE_DLLCHARACTERISTICS_NX_COMPAT bit in VS 2005. The only difference I'm aware of is that the SDK (and therefore the location of SN.EXE) has moved.

  • Please allow me to introduce myself

    I'm the Development Lead for the Visual Studio and .NET Framework C# compilers. In addition to the compiler, my team owns the Visual Studio debugger expression evaluator for C#, the debugger visualizer framework, the Linq to objects APIs as well as the Linq expression tree APIs and compiler. Whew, no wonder I'm tired at the end of a work day.

    I've worked on the compiler team for 3 years and acted as Development Lead for most of the VS 2008 product cycle. I'm very proud of what my team and the other Visual Studio teams have been able to accomplish in VS 2008, particularly the language features needed for the Linq experience as well as the plumbing for a more accurate and complete C# IntelliSense experience in Visual Studio.

    With this blog I hope to be able to provide some nuts and bolts information about the compiler and other technologies my team is responsible for.


© 2009 Microsoft Corporation. All rights reserved.