Working with Signed Non-Decimal and Bitwise Values [Ron Petrusha]

Recently, a number of questions have surfaced about the accuracy of the .NET Framework when working with the binary representation of numbers. (For example, see https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=295117.) The issue surfaces most clearly when we convert the hexadecimal or octal string representation of a numeric value that should be out of range of its target data type to that data type. For example, in the following code we would expect that an OverflowException would be thrown when we increment the upper range of a signed integer value by one, call the Convert.ToString method to convert this integer value to its hexadecimal string representation, and then call the Convert.ToInt32 method to convert the string back to an integer. Here is the C# code:

const int HEXADECIMAL = 16;
// Increment a number so that it is out of range of the Integer type.
long number = (long)int.MaxValue + 1;
// Convert the number to its hexadecimal string equivalent.
string numericString = Convert.ToString(number, HEXADECIMAL);
// Convert the number back to an integer.
// We expect that this will throw an OverflowException, but it doesn't.
try {
    int targetNumber = Convert.ToInt32(numericString, HEXADECIMAL);
    Console.WriteLine("0x{0} is equivalent to {1}.",
                      numericString, targetNumber);
}
catch (OverflowException) {
    Console.WriteLine("0x{0} is out of the range of the Int32 data type.",
                      numericString);
}  

And here is the equivalent Visual Basic code:

Const HEXADECIMAL As Integer = 16

' Increment a number so that it is out of range of the Integer type.
Dim number As Long = CLng(Integer.MaxValue) + 1
' Convert the number to its hexadecimal string equivalent.
Dim numericString As String = Convert.ToString(number, HEXADECIMAL)
' Convert the number back to an integer.
' We expect that this will throw an OverflowException, but it doesn't.
Try
    Dim targetNumber As Integer = Convert.ToInt32(numericString, HEXADECIMAL)
    Console.WriteLine("0x{0} is equivalent to {1}.", _
                      numericString, targetNumber)
Catch e As OverflowException
    Console.WriteLine("0x{0} is out of the range of the Int32 data type.", _
                      numericString)
End Try

Instead of the expected OverflowException, this code produces what is apparently an erroneous result:

0x80000000 is equivalent to -2147483648.

If we look at the binary rather than the decimal and hexadecimal representations of this numeric operation, the source of the problem becomes readily apparent. We began with Int32.MaxValue:

Bit #:  3         2         1
       10987654321098765432109876543210

       01111111111111111111111111111111

For Int32.MaxValue, each bit except the highest order bit of the 32-bit value is set. This represents the maximum value of a signed integer because the single unset bit is the sign bit in position 31. Because this bit is unset, it indicates that the value is positive. We then increment Int32.MaxValue by 1. Note that the variable to which we assign the new value is an Int64; we cannot assign the value to an Int32 without exceeding the bounds of the Int32 data type and causing an OverflowException to be thrown. The new bit pattern of the resulting value is:

Bit #:    6         5         4         3         2         1
       3210987654321098765432109876543210987654321098765432109876543210

       0000000000000000000000000000000010000000000000000000000000000000

So incrementing Int32.MaxValue by one sets bit 31 and clears bits 0 through 30. Bits 32 through 62 remain unset and the sign bit in position 63 is set to 0, which indicates that the resulting value is positive.

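Because leading zeroes are always dropped from the non-decimal string representations of numeric values, the call to Convert.ToString(value, toBase) returns only the 32 significant bits of this 64-bit value (the eight hexadecimal digits 80000000 in our example). Expressed in binary, the resulting string has a length of 32: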

Bit #:  3         2         1
       10987654321098765432109876543210

       10000000000000000000000000000000
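
The dropped leading zeroes can be checked directly. The following is a minimal sketch rather than part of the original example; the LeadingZeroDemo class and its method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class LeadingZeroDemo
{
    // Returns the base-2 string for Int32.MaxValue + 1, held in a 64-bit integer.
    public static string BinaryOfMaxPlusOne()
    {
        long number = (long)int.MaxValue + 1;
        return Convert.ToString(number, 2);
    }

    // Restores the leading zeroes by padding the digits to the width
    // (in bits) of the type the string was created from.
    public static string PadToWidth(string digits, int width)
    {
        return digits.PadLeft(width, '0');
    }
}
```

BinaryOfMaxPlusOne returns a 32-character string even though the source value is a 64-bit integer, and PadToWidth(digits, 64) restores the bit pattern shown earlier for the Int64 value.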

This suggests that the unexpected output produced by our code is the result of two different programming errors. First, we’ve inadvertently allowed the string representation of a 64-bit signed integer value to be interpreted as the string representation of a 32-bit signed integer value.  Second, by ignoring how signed and unsigned integers are represented, we’ve allowed a positive integer to be misinterpreted as a signed negative integer. Let’s look at each of these issues in some detail.

Accidental Change of Type

Ordinarily, the C# compiler enforces type safety by prohibiting implicit narrowing conversions, and the Visual Basic compiler can be configured to prohibit implicit narrowing conversions by setting Option Strict on. This constraint means that, in order to successfully compile code that performs a narrowing conversion, the developer must explicitly use a C# casting operator or a Visual Basic conversion function. This, of course, requires that the developer be aware of the narrowing conversion. In other words, handling a narrowing conversion is the responsibility of the developer.

For example, if the previous code is rewritten so that it does not have to parse the string representation of a numeric value, we must deal with the fact that an Int64 cannot be safely converted to an Int32. The resulting C# code is:

// Increment a number so that it is out of range of the Integer type.
long number = (long)int.MaxValue + 1;
// Convert the number back to an integer.
// This will throw an OverflowException if the code is compiled 
// with the /checked switch.
try {
    int targetNumber = (int)number;
    Console.WriteLine("Converted {0} to a 32-bit integer.", targetNumber);
}
catch (OverflowException) {
    Console.WriteLine("{0} is out of the range of the Int32 data type.",
                      number);
}  

If Option Strict is set on, the resulting Visual Basic code is:

' Increment a number so that it is out of range of the Integer type.
Dim number As Long = CLng(Integer.MaxValue) + 1
' Convert the number back to an integer.
' This will throw an OverflowException.
Try
    Dim targetNumber As Integer = CInt(number)
    Console.WriteLine("Converted {0} to a 32-bit integer.", targetNumber)
Catch e As OverflowException
    Console.WriteLine("{0} is out of the range of the Int32 data type.", _
                      number)
End Try
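
In C#, the dependence on the /checked compiler switch can be avoided by stating the intent in code with the checked and unchecked keywords. The following is a minimal sketch; the NarrowingDemo class and its method names are hypothetical.

```csharp
using System;

public static class NarrowingDemo
{
    // Throws an OverflowException when value is outside the range of Int32,
    // regardless of how the code is compiled.
    public static int NarrowChecked(long value)
    {
        return checked((int)value);
    }

    // Never throws. Only the low-order 32 bits of value are kept.
    public static int NarrowUnchecked(long value)
    {
        return unchecked((int)value);
    }
}
```

Passing (long)int.MaxValue + 1 to NarrowChecked throws an OverflowException, while NarrowUnchecked silently returns Int32.MinValue, the same misinterpreted value that the string round trip produced.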

Conversions can still produce overflows at run time, but at least the compiler alerts the developer that an overflow is possible and should be handled. However, because our original example converted a numeric value to its string representation and then converted it back to a numeric value, we’ve bypassed the safeguards that the compiler implements to alert us to the possibility of data loss in a narrowing conversion. To put it another way, the developer is solely responsible for ensuring type safety and for handling conversions when converting between numbers and their string representations. Had our code enforced type safety, it would have converted the string representation of Int32.MaxValue + 1 to an Int64 value rather than an Int32 value, as the following C# code shows:

const int HEXADECIMAL = 16;

// Increment a number so that it is out of range of the Integer type.
long number = (long)int.MaxValue + 1;
// Convert the number to its hexadecimal string equivalent.
string numericString = Convert.ToString(number, HEXADECIMAL);
// Convert the number back to a long integer.
long targetNumber = Convert.ToInt64(numericString, HEXADECIMAL);
Console.WriteLine("0x{0} is equivalent to {1}.",
                  numericString, targetNumber);

The equivalent Visual Basic code is:

Const HEXADECIMAL As Integer = 16

' Increment a number so that it is out of range of the Integer type.
Dim number As Long = CLng(Integer.MaxValue) + 1
' Convert the number to its hexadecimal string equivalent.
Dim numericString As String = Convert.ToString(number, HEXADECIMAL)
' Convert the number back to a long integer.
Dim targetNumber As Long = Convert.ToInt64(numericString, HEXADECIMAL)
Console.WriteLine("0x{0} is equivalent to {1}.", _
                     numericString, targetNumber)
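
If an Int32 is still required at the end, one option is to parse the string at the width of the type that produced it and only then narrow it with an explicit range check. The sketch below assumes that the hexadecimal string was created from an Int64 by Convert.ToString(value, 16); the SafeHexParser class and its method name are hypothetical.

```csharp
using System;

public static class SafeHexParser
{
    // Parses a hexadecimal string that was created from an Int64 value and
    // narrows the result to Int32 only if it fits.
    public static int ParseInt64HexAsInt32(string hex)
    {
        // Parse at the width of the source type so the sign is read correctly.
        long wide = Convert.ToInt64(hex, 16);

        if (wide < int.MinValue || wide > int.MaxValue)
            throw new OverflowException(
                String.Format("0x{0} is out of the range of the Int32 data type.", hex));

        return (int)wide;
    }
}
```

With this helper, the string 80000000 produces the OverflowException that the original example expected. The check is only meaningful under the stated assumption: a string created from a negative Int32 (for example, fffffff0) would be read as a large positive Int64 and rejected.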

Working with Numeric Representations

A second serious source of error in our initial example is that we’ve failed to consider numeric representations and their effect on our conversion operation. This is a common source of errors in programs. However, while the compiler provides some safeguards against data loss in narrowing conversions, it provides no safeguards when the developer chooses to work with binary data directly. In these cases, ensuring that the representation of a number is appropriate for the operation being performed is always the responsibility of the developer. This is true whenever the developer works with binary (or octal or hexadecimal) data directly, whether as a sequence of bits (for example, when performing bitwise operations on two values or working with a byte array) or as the non-decimal string representation of a numeric value. Moreover, this is true of any platform and is not limited to Microsoft Windows or the .NET Framework. In particular:

  • When performing bitwise operations, such as a bitwise And, the developer must make sure that both operands share the same binary representation. If they do not, the result of the bitwise operation is invalid.
  • When converting the string representation of a number to its numeric equivalent, the developer must make sure that the numeric string representation is of the type expected by the conversion method or operator.

Our initial example produced unexpected results because we passed the string representation of what turned out to be an unsigned 32-bit integer to a conversion method, Convert.ToInt32(value, fromBase), that expected the value parameter to be the string representation of a signed 32-bit integer. Note that the actual result of this conversion depends on the particular magnitude of the 32-bit unsigned integer, as the following table illustrates.

Unsigned Integer Range                               Result
0 - 2,147,483,647 (or Int32.MaxValue)                Successful conversion (no loss of data).
2,147,483,648 - 4,294,967,295 (or UInt32.MaxValue)   Value misinterpreted as a negative number.
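
The two ranges can be observed by parsing the same hexadecimal string with both the signed and the unsigned conversion method. This is a minimal sketch; the SignedUnsignedDemo class and its members are hypothetical names.

```csharp
using System;

public static class SignedUnsignedDemo
{
    // Interprets the digits as an unsigned 32-bit integer.
    public static uint AsUnsigned(string hex)
    {
        return Convert.ToUInt32(hex, 16);
    }

    // Interprets the same digits as a signed 32-bit integer.
    public static int AsSigned(string hex)
    {
        return Convert.ToInt32(hex, 16);
    }

    // True when the unsigned value lies above Int32.MaxValue, which is
    // the range in which the signed interpretation turns negative.
    public static bool IsMisreadAsNegative(string hex)
    {
        return AsUnsigned(hex) > int.MaxValue;
    }
}
```

For 7fffffff both methods return 2,147,483,647. For 80000000 the unsigned result is 2,147,483,648 and the signed result is -2,147,483,648; for ffffffff the results are 4,294,967,295 and -1.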

A clearer illustration of the problems that result from working with binary values that have different numeric representations arises when we perform a bitwise operation on integers with different signs. For example, the Visual Basic code

Console.WriteLine(16 And -3)

produces a rather unexpected result of 16 when run under the common language runtime. This result reflects the fact that the runtime uses two’s complement representation for negative integers and absolute magnitude representation for positive integers. The following example illustrates why the result of this bitwise And operation is 16:

    00000000000000000000000000010000
And 11111111111111111111111111111101

    00000000000000000000000000010000
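
The bit patterns in this illustration can be reproduced with Convert.ToString(value, 2). The sketch below is for illustration only; BitPatternDemo and ToBits are hypothetical names.

```csharp
using System;

public static class BitPatternDemo
{
    // Returns the 32-character base-2 string for a signed 32-bit integer.
    // Negative values are rendered in two's complement; positive values
    // are padded on the left because leading zeroes are dropped.
    public static string ToBits(int value)
    {
        return Convert.ToString(value, 2).PadLeft(32, '0');
    }
}
```

ToBits(16), ToBits(-3), and ToBits(16 & -3) return the three rows shown above. Bit 4 is set in both operands, so it survives the And operation.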

Although the .NET Framework uses two’s complement representation for signed integers, one’s complement representation is also in use on some platforms. We can determine the method of representation with the two utility functions shown in the following C# and Visual Basic code:

// C#
public class BinaryUtil
{
   public static bool IsTwosComplement()
   {
      return Convert.ToSByte("FF", 16) == -1;
   }

   public static bool IsOnesComplement()
   {
      return Convert.ToSByte("FE", 16) == -1;
   }
}
' Visual Basic
Public Class BinaryUtil
    Public Shared Function IsTwosComplement() As Boolean
        Return Convert.ToSByte("FF", 16) = -1
    End Function

    Public Shared Function IsOnesComplement() As Boolean
        Return Convert.ToSByte("FE", 16) = -1
    End Function
End Class

Performing the And operation with integers that have different signs then requires that we use a common method to represent their values. The most common method is a sign and magnitude representation, which uses a variable to store a number’s absolute value and a separate Boolean variable to store its sign. Using this method of representation, we can define the And operation as follows:

// C#
public static int PerformBitwiseAnd(int operand1, int operand2)
{
    // Set flag if a parameter is negative.
    bool sign1 = Math.Sign(operand1) == -1;
    bool sign2 = Math.Sign(operand2) == -1;

    // Convert two's complement to its absolute magnitude.
    if (sign1)
        operand1 = ~operand1 + 1;
    if (sign2)
        operand2 = ~operand2 + 1; 

    if (sign1 && sign2)
        return -1 * (operand1 & operand2);
    else
        return operand1 & operand2;
}
' Visual Basic
Public Function PerformBitwiseAnd(ByVal operand1 As Integer, ByVal operand2 As Integer) As Integer
    ' Set flag if a parameter is negative.
    Dim sign1 As Boolean = (Math.Sign(operand1) = -1)
    Dim sign2 As Boolean = (Math.Sign(operand2) = -1)

    ' Convert two's complement to its absolute magnitude.
    If sign1 Then operand1 = (Not operand1) + 1
    If sign2 Then operand2 = (Not operand2) + 1

    If sign1 AndAlso sign2 Then
        Return -1 * (operand1 And operand2)
    Else
        Return operand1 And operand2
    End If
End Function
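
One edge case is worth noting: Int32.MinValue has no positive counterpart in the Int32 range, so negating it in 32 bits either wraps back to Int32.MinValue or, in a checked context, throws an OverflowException. The following variation is a sketch that widens both operands to Int64 before taking their magnitudes; the SignMagnitude class and its And method are hypothetical names, and the sign rule is the same as in PerformBitwiseAnd.

```csharp
using System;

public static class SignMagnitude
{
    public static int And(int operand1, int operand2)
    {
        // Widen first so that Int32.MinValue has a representable magnitude.
        long magnitude1 = Math.Abs((long)operand1);
        long magnitude2 = Math.Abs((long)operand2);

        long result = magnitude1 & magnitude2;

        // The result is negative only when both operands are negative.
        if (operand1 < 0 && operand2 < 0)
            result = -result;

        // The result always fits: its magnitude is at most 2^31, and it
        // reaches 2^31 only when both operands are Int32.MinValue.
        return (int)result;
    }
}
```

Under this definition, And(16, -3) returns 0 (the magnitudes 16 and 3 share no bits), in contrast to the 16 returned by the built-in operator.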

String Representations, Conversions, and Signs

While converting binary values to sign and magnitude representation solves the problem of performing binary operations on non-decimal numbers, it does not address either of the issues raised when converting the string representation of a non-decimal number to a numeric value. When performing such string-to-numeric conversions, the root of the problem lies in the fact that at the time it is created, the string representation of a number is effectively disassociated from its underlying numeric value. This can make it impossible to determine the sign of that numeric string representation when it is converted back to a number.

However, we can solve the problem of restoring a non-decimal value from its string representation by defining a structure that includes a field to indicate the sign of the decimal value. For example, the following structure includes a Boolean field, Negative, that is set to true when the numeric value from which a non-decimal string representation is derived is negative. It also includes a Value field that stores the non-decimal string representation of a number.

// C# 
struct NumericString {
   public bool Negative;
   public string Value;
}
' Visual Basic
Public Structure NumericString
    Public Negative As Boolean
    Public Value As String
End Structure
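
Because the two fields are only useful when they agree, it can help to populate them in a single step. The following sketch shows one possible way to do that; the SignedNumericString type and its FromInt64 method are hypothetical and are not used by the examples that follow.

```csharp
using System;

public struct SignedNumericString
{
    public bool Negative;
    public string Value;

    // Captures the sign and the non-decimal string representation
    // of a number at the same time. toBase must be 2, 8, 10, or 16.
    public static SignedNumericString FromInt64(long number, int toBase)
    {
        SignedNumericString result;
        result.Negative = number < 0;
        result.Value = Convert.ToString(number, toBase);
        return result;
    }
}
```

For example, FromInt64((long)int.MaxValue + 1, 16) yields a Value of 80000000 with Negative set to false.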

Storing a sign flag together with the string representation of a non-decimal number preserves the tight coupling between the string representation of a number and its sign. This in turn allows us to examine its sign field and to make sure that the appropriate conversion or action is taken when the string is converted back to a numeric value. For example, the following code defines a static (or Shared in Visual Basic) method named ConvertToSignedInteger that takes a single parameter (an instance of the NumericString structure defined previously) and returns an integer. The method throws an OverflowException if the string’s numeric value overflows the range of the Int32 data type. It also throws an OverflowException if the NumericString.Negative field is False, indicating that the numeric value is positive, but the sign bit is set in the numeric value represented by the NumericString.Value field. This combination indicates that the numeric value is positive but that it lies in the range from Int32.MaxValue + 1 to UInt32.MaxValue, which is entirely outside the range of the Int32 data type.

// C#
class ConversionLibrary
{
   public static int ConvertToSignedInteger(NumericString stringValue)
   {
      // Convert the string to an Int32.
      try
      {
         int number = Convert.ToInt32(stringValue.Value, 16);
         // Throw if sign flag is positive but number is interpreted as negative.
         if ((! stringValue.Negative) && ((number & 0x80000000) == 0x80000000))
            throw new OverflowException(String.Format("0x{0} cannot be converted to an Int32.", 
                                        stringValue.Value));
         else
            return number;
      }
      // Handle legitimate overflow exceptions.
      catch (OverflowException e)
      {    
         throw new OverflowException(String.Format("0x{0} cannot be converted to an Int32.", 
                                     stringValue.Value), e);
      }
   }
}
' Visual Basic
Public Class ConversionLibrary
    Public Shared Function ConvertToSignedInteger(ByVal stringValue As NumericString) As Integer
        ' Convert the string to an Int32.
        Try
            Dim number As Integer = Convert.ToInt32(stringValue.Value, 16)
            ' Throw if sign flag is positive but number is interpreted as negative.
            If (Not stringValue.Negative) And ((number And &H80000000) = &H80000000) Then
                Throw New OverflowException(String.Format("0x{0} cannot be converted to an Int32.", _
                                            stringValue.Value))
            Else
                Return number
            End If
        ' Handle legitimate overflow exceptions.
        Catch e As OverflowException
            Throw New OverflowException(String.Format("0x{0} cannot be converted to an Int32.", _
                                        stringValue.Value), e)
        End Try
    End Function
End Class

Our initial code example returned an erroneous result when we incremented Int32.MaxValue by 1, converted it to a hexadecimal string, and then converted the string back to an integer value. When we perform the same basic set of operations using the NumericString structure and the ConvertToSignedInteger method, the result is an OverflowException. This is shown in the following code:

// C#
public class Executable
{
   public static void Main()
   {
      // Define a number.
      Int64 number = (long)Int32.MaxValue + 1;
      // Define its hexadecimal string representation.
      NumericString stringValue;
      stringValue.Value = Convert.ToString(number, 16);
      stringValue.Negative = (Math.Sign(number) < 0);
      ShowConversionResult(stringValue);
      
      NumericString stringValue2;
      stringValue2.Value = Convert.ToString(Int32.MaxValue, 16);
      stringValue2.Negative = Math.Sign(Int32.MaxValue) < 0;
      ShowConversionResult(stringValue2);
      
      NumericString stringValue3; 
      stringValue3.Value = Convert.ToString(-16, 16);
      stringValue3.Negative = Math.Sign(-16) < 0;
      ShowConversionResult(stringValue3);
   }
   
   private static void ShowConversionResult(NumericString stringValue)
   {   
      try {
         Console.WriteLine(ConversionLibrary.ConvertToSignedInteger(stringValue).ToString("N0"));
      }
      catch (OverflowException e) {
         Console.WriteLine("{0}: {1}", e.GetType().Name, e.Message);
      }
   }
}
' Visual Basic
Module Executable
    Public Sub Main()
        ' Define a number.
        Dim number As Int64 = CLng(Int32.MaxValue) + 1
        ' Define its hexadecimal string representation.
        Dim stringValue As NumericString
        stringValue.Value = Convert.ToString(number, 16)
        stringValue.Negative = (Math.Sign(number) < 0)
        ShowConversionResult(stringValue)

        Dim stringValue2 As NumericString
        stringValue2.Value = Convert.ToString(Int32.MaxValue, 16)
        stringValue2.Negative = Math.Sign(Int32.MaxValue) < 0
        ShowConversionResult(stringValue2)

        Dim stringValue3 As NumericString
        stringValue3.Value = Convert.ToString(-16, 16)
        stringValue3.Negative = Math.Sign(-16) < 0
        ShowConversionResult(stringValue3)
    End Sub

    Private Sub ShowConversionResult(ByVal stringValue As NumericString)
        Try
            Console.WriteLine(ConversionLibrary.ConvertToSignedInteger(stringValue).ToString("N0"))
        Catch e As OverflowException
            Console.WriteLine("{0}: {1}", e.GetType().Name, e.Message)
        End Try
    End Sub
End Module

When this code is executed, it displays the following output to the console:

OverflowException: 0x80000000 cannot be converted to an Int32.
2,147,483,647
-16
Posted by BCLTeam | 6 Comments

Where did BigInteger go? [Melitta Andersen]

This has been the subject of several recent feedback e-mails we’ve received.  Moreover, a few recent correspondents were kind enough to point out that not only did we remove it, but we didn’t say anything about it.  I apologize for that.  We weren’t trying to be sneaky about it.  When I made my introductory post there were several comments about BigInteger and I’d mentioned that it would not be included in the release in a reply to one of them.  It was also brought up in a comment on Inbar’s post about the .NET 3.5 Beta release.  With all of the comment activity about it on the blog, I forgot that it hadn’t made it up to a regular post.

So why was BigInteger cut?  The basic rationale behind making BigInteger internal was that it just wasn't ready to ship.  We thought our implementation met the needs for a BigInteger type.  But then we had some other teams take a look at it and they pointed out some performance and compatibility issues that we just didn't have time to fix before we shipped.

It was a really tough call, but we decided that rather than have people write a bunch of code dependent on a BigInteger class that we wanted to revamp or replace, we would pull it from 3.5 and make sure we resolved the issues before we made it available.

There’s not too much I can say about the current status of BigInteger.  We do know that you want it.  Since we didn’t get it into 3.5, we’re looking into how we can get it out there but we don’t yet know when we’ll be able to.  We’re also looking into other possible investments in numerics.

That’s the story.  If you have some scenarios or applications you definitely need a BigInteger for, I'd appreciate getting that feedback.  If you’ve already posted comments to this effect on another entry or given me feedback in some other way, there’s no need to repost here.  I’ve looked at what you said.  But I want to make sure the type does what it needs to when we release it, so if you’ve got something new, feel free to share.

Thanks,
Melitta

Posted by BCLTeam | 9 Comments

Parallel Extensions CTP and the Parallel Computing Developer Center [Judd Hall]

The CLR Team has been working with the Parallel Computing Platform Team for the past year on some innovative ideas in parallel computing.  Yesterday, the Parallel Computing Platform Team announced the Parallel Computing Developer Center along with their first Community Technology Preview (CTP) of Parallel Extensions to the .NET Framework.  We encourage you to download this early release CTP and provide feedback so that we can grow this technology together. 

Parallel Extensions is a managed programming model for data parallelism, task parallelism, and coordination on parallel hardware unified by a common work scheduler.  As such, it makes it easier for you to write programs that scale to take advantage of parallel hardware—providing improved performance as the numbers of cores and processors increase—without having to deal with many of the complexities of today’s concurrent programming models.  It does so via library-based support for introducing concurrency into applications written with any .NET language, including but not limited to C# and Visual Basic.

Two major components in Parallel Extensions are the Task Parallel Library (TPL), and Parallel LINQ (PLINQ), a technology extending the Language Integrated Query (LINQ) technology introduced in .NET 3.5.  As such, the CTP requires .NET Framework 3.5.

With TPL, you get the concept of Tasks, Futures, and Parallel loops, for starters.  So you can take the following:

for (int i = 0; i < 100; i++) {
    a[i] = a[i]*a[i];
}

And make it scalable across all the processors available:

Parallel.For(0, 100, delegate(int i) {
    a[i] = a[i]*a[i];
});
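
For a version that compiles on its own, the sketch below wraps the loop in a method. It assumes the API shape that later shipped in the .NET Framework 4, where the Parallel class lives in the System.Threading.Tasks namespace; the namespace and details may differ in the CTP. ParallelSquares and SquareAll are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSquares
{
    // Returns a new array in which every element of source is squared.
    public static int[] SquareAll(int[] source)
    {
        int[] a = (int[])source.Clone();

        // Each iteration touches only its own element, so the iterations
        // are independent and can safely run in any order.
        Parallel.For(0, a.Length, i =>
        {
            a[i] = a[i] * a[i];
        });

        return a;
    }
}
```

The loop body is safe to run in parallel only because no iteration reads or writes an element that another iteration modifies.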

Similarly, with PLINQ, you get a query execution engine that accepts any LINQ-to-Objects or LINQ-to-XML query and automatically utilizes multiple processors or cores for execution when they are available.  As such, you can take a simple LINQ query:

IEnumerable<T> data = ...;

var q = data.Where(x => p(x)).OrderBy(x => k(x)).Select(x => f(x));

foreach (var e in q) a(e);

And scale it:

IEnumerable<T> data = ...;

var q = data.AsParallel().Where(x => p(x)).OrderBy(x => k(x)).Select(x => f(x));

foreach (var e in q) a(e);
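
Here is the same query shape with concrete stand-ins for p, k, and f. This is a sketch that assumes the PLINQ operators as they later shipped in the System.Linq namespace of the .NET Framework 4, which may differ from the CTP; PlinqDemo and its method name are hypothetical.

```csharp
using System;
using System.Linq;

public static class PlinqDemo
{
    // Filters the even numbers, sorts them in ascending order,
    // and projects each one to its square.
    public static int[] SquaresOfEvensAscending(int[] data)
    {
        return data.AsParallel()
                   .Where(x => x % 2 == 0)   // stands in for p(x)
                   .OrderBy(x => x)          // stands in for k(x)
                   .Select(x => x * x)       // stands in for f(x)
                   .ToArray();
    }
}
```

Because the query contains an OrderBy, the results come back in sorted order even though the filtering and projection may run on several threads.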

And behind it all is a work-stealing task scheduler to reduce thread starvation — and this scheduler interleaves TPL tasks with PLINQ queries on-the-fly.

There are limitations of course, mostly related to making sure your parallel operations are independent and the like.  There are also known correctness bugs, so it’s worth checking out the extensive documentation posted on the Parallel Computing Developer Center.  There you will find links to articles and videos, and tons of samples.

Posted by BCLTeam | 5 Comments

December 2007 Cumulative Time Zone Update is Now Available [Josh Free]

In case you have not heard, the December 2007 cumulative time zone update for Microsoft Windows operating systems is now available.  The update can be downloaded right now from http://support.microsoft.com/kb/942763; the software update will also be available via Windows Update on an upcoming date.  This update includes everything previously released in the August 2007 cumulative time zone update plus additional time zone changes that were signed into law after the August 2007 update was created.  This includes updates to existing standard time zones such as Arabic, Australia (Central, Eastern, and Tasmania), Egypt, Israel, and South America (E. South America, Central Brazilian).

The most notable change is the inclusion of a new time zone for the capital of the Bolivarian Republic of Venezuela, Caracas:

Id:             Venezuela Standard Time
Display Name:   (GMT-04:30) Caracas
Standard Name:  Venezuela Standard Time
Daylight Name:  Venezuela Daylight Time
DST start:      12/31/2007 at 24:00
DST end:        Not applicable

The TimeZoneInfo class can use the new time zone by referencing its identification string “Venezuela Standard Time”.
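
The lookup can be sketched as follows. The identifier is only available if the operating system knows about the zone, so the sketch returns null when the lookup fails. The fixed-offset fallback is purely an assumption for illustration: it models UTC -4:30 with no adjustment rules and is not what the update installs. CaracasTime and its members are hypothetical names.

```csharp
using System;

public static class CaracasTime
{
    // Looks up the zone by its identifier. Returns null if the
    // operating system does not have (or cannot load) the zone.
    public static TimeZoneInfo TryFindSystemZone()
    {
        try
        {
            return TimeZoneInfo.FindSystemTimeZoneById("Venezuela Standard Time");
        }
        catch (TimeZoneNotFoundException)
        {
            return null;
        }
        catch (InvalidTimeZoneException)
        {
            return null;
        }
    }

    // Illustration only: a custom zone fixed at UTC -4:30.
    public static TimeZoneInfo CreateFixedOffsetZone()
    {
        return TimeZoneInfo.CreateCustomTimeZone(
            "Custom Caracas Time",
            new TimeSpan(-4, -30, 0),
            "(GMT-04:30) Caracas",
            "Venezuela Standard Time");
    }

    // Converts a UTC date and time to the local time of the given zone.
    public static DateTime FromUtc(DateTime utc, TimeZoneInfo zone)
    {
        return TimeZoneInfo.ConvertTimeFromUtc(utc, zone);
    }
}
```

With the fixed-offset zone, noon UTC converts to 7:30 in the morning. Results from the system zone depend on the time zone data installed on the machine.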

To date Venezuela has typically observed South America Western Standard Time but is in the process of migrating to the new time zone — adjusting clocks backwards 30 minutes, from UTC -4:00 to UTC -4:30.  According to the latest news reports on the official Venezuelan government news site (see translated page) the start date may or may not be changed again between now and the end of calendar year 2007.

Additional Information

For the latest information on time zone changes please refer to the Microsoft Daylight Saving Time & Time Zone FAQs Blog, Hot Topics for Daylight Saving Time changes in 2007, and http://www.microsoft.com/time/.

Update: Fixed typo in the table.

Posted by BCLTeam | 5 Comments

.NET Framework 3.5 Now Available! [Justin Van Patten]

.NET Framework 3.5 and Visual Studio 2008 have officially shipped!  Soma has the announcement on his blog and the downloads are available here.

There are over 250 new features in .NET 3.5 and Visual Studio 2008.  Here's a list of new BCL features available in .NET 3.5:

  • System.DateTimeOffset
    A new date time data structure that can specify an exact point in time relative to the UTC time zone.  DateTimeOffset is made up of a DateTime and offset. It includes most of the functionality of the current DateTime and allows seamless conversion to DateTime.  In addition, SQL Server 2008 adds full support for DateTimeOffset as a new column data type.  DateTimeOffset is the new preferred type to use for most date time scenarios.  For more guidance on when to use DateTime vs. DateTimeOffset refer to my blog post introducing DateTimeOffset.

  • System.TimeZoneInfo
    Comprehensive time zone support.  Previously the .NET Framework only supported conversions between Local and UTC times.  TimeZoneInfo enables enumeration of all the time zones supported by the operating system, conversion of date times from one time zone to another, and serialization of time zones across machines. This support includes cases where the Daylight Saving Time (DST) rules can change from one year to the next, such as the recent 2007 DST change in North America, and will do historically accurate conversions across these changes. Support is also provided for detecting invalid or ambiguous times caused by Daylight Saving Time, and creating custom time zones.  See Kathy's TimeZoneOffset Starter Guide (Note: TimeZoneInfo was previously called TimeZone2).  Also check out Josh Free's blog post on Exploring Windows Time Zones with TimeZoneInfo and Working with Ambiguous and Invalid Points in Time.

  • System.Collections.Generic.HashSet<T>
    A high-performance set collection.  HashSet is an unordered collection that contains unique elements. In addition to the standard collection operations, HashSet provides standard set operations such as union, intersection, and symmetric difference.  See Kim Hamilton's original blog post introducing HashSet.

  • System.IO.Pipes
    Support for anonymous and named pipes.  Pipes can be used to achieve inter-process communication (IPC) between processes running on the same machine, or on any other Windows machine within a network.  Anyone familiar with streams should be comfortable using these new APIs to achieve IPC.  See my original blog post introducing Pipes.

  • System.Diagnostics.EventSchemaTraceListener
    EventSchemaTraceListener is highly tuned for logging performance. Similar to the XmlWriterTraceListener, this trace listener logs XML to disk. In particular, this type logs in the event schema, which is shared by some other new technologies. This trace listener performs drastically better than previous logging trace listeners, especially on machines with multiple processors. Additionally, this is the first trace listener that allows many different disk logging options, such as circular logging across multiple files.  See Inbar Gazit's blog post on EventSchemaTraceListener for more info.

  • System.Diagnostics.Eventing
    Better integration with Event Tracing for Windows (ETW), including an ETW provider and ETW trace listener APIs.

  • System.Threading.ReaderWriterLockSlim
    A lightweight reader/writer lock class.  ReaderWriterLockSlim supports basic read and write locks, allowing for better scalability for read-only concurrent worker scenarios. As its name implies, this lock performs anywhere from 2x to 5x better than the existing ReaderWriterLock class, and scales better on multi-processor and multi-core machines. This type also supports upgradeable reads: if code needs to inspect some state before deciding to acquire the write lock, an upgradeable read allows concurrency-safe reading with an optional deadlock-free upgrade to write. Recursion is also disabled by default, which helps in writing correct code; an optional recursive mode can be turned on at lock instantiation time.

  • System.Security.Cryptography
    Support for the "Suite B" set of cryptographic algorithms, as specified by the National Security Agency (NSA).  Cryptography developers can now use the FIPS-certified implementations of advanced SHA hashing algorithms and AES encryption algorithm in managed code. These classes follow the same familiar patterns as the existing cryptography algorithms, making it easy for developers to use the new classes right away.  Check out Shawn Farkas' blog post on the New Crypto Algorithms in .NET 3.5.

  • System.AddIn
    A new add-in hosting model that makes it easy for managed applications to host add-ins (i.e. plug-ins, add-ons, extensions, etc.) with support for Discovery, Activation, Isolation, Unloadability, Sandboxing, and Lifetime management of add-ins.  This is an exciting new feature that makes it easy to create add-ins for your managed applications without having to deal with all the associated isolation and security plumbing yourself.  The CLR Add-In Team Blog has a lot more information on the new add-in model.
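
As a brief illustration of two of the types in this list, the sketch below shows a HashSet<T> set operation and a DateTimeOffset comparison. The Bcl35Samples class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class Bcl35Samples
{
    // Returns the elements that appear in exactly one of the two sequences.
    // HashSet<T> set operations modify the set they are called on, so the
    // work is done on a copy of the left-hand sequence.
    public static HashSet<int> SymmetricDifference(IEnumerable<int> left, IEnumerable<int> right)
    {
        HashSet<int> result = new HashSet<int>(left);
        result.SymmetricExceptWith(right);
        return result;
    }

    // A DateTimeOffset pairs a clock time with its offset from UTC, so two
    // values that name the same instant compare equal even when their
    // offsets differ.
    public static bool SameInstant(DateTimeOffset first, DateTimeOffset second)
    {
        return first == second;
    }
}
```

For example, 12:00 at offset +00:00 and 07:30 at offset -04:30 on the same date are the same instant, and the symmetric difference of {1, 2, 3} and {3, 4} is {1, 2, 4}.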

Also, be sure to check out Jack Gudenkauf's blog on What's new in the .Net Framework 3.5.  He mentions some additional new CLR features (GC, Security, and ThreadPool) that you may find interesting.

Posted by BCLTeam | 10 Comments

Change in System.ServiceProcess shutdown is coming in 3.5 RTM [Inbar Gazit]

In all current versions of the Framework we do not close the actual service when we get a shutdown request from the OS. Instead we just call OnShutdown and hope that the user has overridden this method and called Stop() themselves. We found out that many developers didn’t know they had to do that. This is a major issue when shutting down Vista laptops, as each managed service can add 10-20 seconds to the time you have to wait before the machine is turned off.

To remedy the situation we decided to make a late-game change to the behavior of System.ServiceProcess.ServiceBase.  After we call OnShutdown we will check whether the service is in the stopped state, and if it’s not (meaning the developer did not call Stop() themselves) we will call Stop() on your behalf. This guarantees that the service is stopped quickly when a shutdown operation is taking place, which reduces the delay and improves the experience of the customer. The code below shows what was previously the recommended way to override OnShutdown. If you already have this code, you don’t have to make any changes. If you don’t, you can pick up 3.5 RTM when it’s available to have this issue fixed.

class MyService : System.ServiceProcess.ServiceBase {
    protected override void OnShutdown() {
        // any shutdown-specific code not included in your OnStop method
        this.Stop();
    }
}

We will update the documentation to reflect this.

Please let us know if this change affects you in any unintended way.

Posted by BCLTeam | 8 Comments

Dispose Pattern and Object Lifetime [Brian Grunkemeyer]

The Dispose pattern is the way to think of object lifetime in the .NET Framework.  Admittedly, it can be a little subtle.  A customer asked a question on our MSDN documentation for implementing the Dispose pattern.  I’ll get to this question, but let’s review some basics.

Basics of Disposing, Finalizing, & Resurrection

The Dispose pattern exists to help impose order on the concept of object lifetimes.  You would naively think that object lifetime is relatively trivial, but there are some rather daunting subtleties.  Fortunately, the Dispose pattern will help lead the way.  The basics here are assumptions that need to be agreed upon by library authors, developers using libraries, and language designers, so it’s important that everyone is on the same page.  Perhaps in another world this could have been designed differently, but we don’t live in that world.

First, here’s a restatement of the Dispose pattern (though you can find more in the Framework Design Guidelines, which were excerpted in this blog post from Joe Duffy).  A disposable type needs to implement IDisposable & provide a public Dispose(void) method that ends the object’s lifetime.  If the type is not sealed, it should provide a protected Dispose(bool disposing) method where the actual cleanup logic lives.  Dispose(void) then calls Dispose(true) followed by GC.SuppressFinalize(this).  If your object needs a finalizer, then the finalizer calls Dispose(false).  The cleanup logic in Dispose(bool) needs to be written to run correctly when called explicitly from Dispose(void), as well as from a finalizer thread.  Dispose(void) and Dispose(bool) should be safely runnable multiple times, with no ill effects. 
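
As a rough illustration of that restatement, here is a minimal sketch of an unsealed type that owns a native resource directly. NativeBuffer and its IsDisposed property are hypothetical names invented for this example, and the raw IntPtr is used only to keep the sketch short; a real type would usually be better served by SafeHandle.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type that owns a block of native memory directly.
public class NativeBuffer : IDisposable
{
    private IntPtr _memory;   // native resource; IntPtr.Zero once released
    private bool _disposed;

    public NativeBuffer(int size)
    {
        _memory = Marshal.AllocHGlobal(size);
    }

    // Finalizer: the backstop for callers that never call Dispose().
    ~NativeBuffer()
    {
        Dispose(false);
    }

    // Public, non-virtual Dispose(void).
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Hypothetical helper so callers can observe the state.
    public bool IsDisposed { get { return _disposed; } }

    // The actual cleanup logic lives here.
    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
            return;   // Safe to call more than once.

        // If disposing is true, managed disposable fields would also
        // be disposed here. This type has none.

        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
        }
        _disposed = true;
    }
}
```

The native memory is freed on both paths, because nothing else will release it; only the managed cleanup is conditional on the disposing flag.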

This pattern is part of the platform, and languages like managed C++ have assumed that library writers follow this pattern (mostly) correctly.

Next, let’s review the basics of how objects live & die, so we can avoid some unfortunate confusion that comes up later.  If you don’t know what finalization is, read this finalization intro on Maoni’s blog.  There are two distinct concepts that often overlap: the lifetime of the object (i.e., when it is in a usable state), and the duration of time that the GC commits memory for an object.  In a normal finalizable object’s lifetime, the GC commits memory for an object, then the CLR runs the object’s constructor, passing in the newly-committed memory as the “this” pointer for the object.  Note that the usable lifetime of the object is a subset of the lifetime for the committed memory in the GC heap.  Usually, developers think of committing this memory & running the constructor as a single operation.  The two are easily merged if you’re coming from C#, Visual Basic, or Java, because in those languages, there is no way of disentangling them.  C++ is more interesting, allowing you to reserve some memory on the stack, then run a constructor on that block of memory using the placement new operator.  (Also, the managed String class uniquely uses a different calling convention: we run a constructor which computes the length of the String instance necessary to hold the data, allocates the memory, then returns the new instance as the “this” pointer.)  Merging these two concepts is a perfectly acceptable simplification for constructing object instances in most languages, but the same doesn’t hold true when you free objects.

The end of the object’s usable lifetime, according to our Dispose pattern, is when the user calls the Dispose(void) method.  Then, at a later point in time, the garbage collector will detect that there are no outstanding references to an object, and it will try freeing the memory.  But first, the GC provides the object with an opportunity to clean up resources, called finalization.  This is a backstop to ensure that resources are freed if someone did not explicitly call Dispose(void), to ensure that the object’s lifetime is correctly terminated before we release memory.  This isn’t necessarily where a programmer intended to end the lifetime of an object.  But the awkwardness runs deeper.

Finalization is fundamentally different from ending an object’s lifetime.  From a correctness point of view, there is no ordering between finalizers (outside of a special case for critical finalizers), so if you have two objects that the GC thinks are dead at the same time, you cannot predict which finalizer will complete first.  This means you can’t have a finalizer that interacts with any finalizable objects stored in instance variables.  Also, finalization happens on a completely different thread, sometimes at a different priority level.  In future versions, perhaps the GC will require multiple finalizer threads, running your finalizers in parallel with themselves.  Some managed hosts (like SQL Server) do not allow users to define finalizers on their types.  Chris Brumme included a more complete list of restrictions, limits & surprises in his finalization blog post.  Reading through this might help you understand an obscure stress bug.

Additionally, both normal process exit & appdomain unloading complicate the picture for finalizers.  As you know, an application domain is essentially a process within a process, and each appdomain gets a separate copy of static variables.  When we unload appdomains, at some point, finalizable objects stored in static variables must be garbage collected.  At this phase during appdomain unloading, your finalizer cannot take a dependency on other finalizable objects, because all the finalizable objects reachable by static variables might be finalized.  Every method call might throw an ObjectDisposedException, or in a pathologically poorly written set of classes, stuff just doesn’t work right in weird ways.  Process exit should conceptually be similar to unloading all appdomains (but is subtly different — there’s no appdomain unload event) and runs into the same issue with statics being finalized.  Keep reading below for a solution.

There’s a complication to when the GC can release memory.  It’s possible that an object’s finalizer might store a reference to an object somewhere else in the GC heap, potentially even in a live object.  If so, during the next GC, the committed memory is still reachable from live GC roots, so the GC cannot release the memory.  This is called “resurrection”, where an object instance is raised from the dead to haunt the living with potentially inconsistent state.  Additionally, it’s possible that the finalizer might run again on the same instance, if someone called GC.ReRegisterForFinalize(). 
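
Here is a small sketch of what resurrection looks like in code. Zombie and all of its members are hypothetical names made up for this illustration, and exactly when the finalizer runs depends on the runtime and the build settings, so treat the comments as what typically happens rather than a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type whose finalizer resurrects the instance once.
public class Zombie
{
    public static Zombie Resurrected;   // a static field acts as a live GC root
    public static int FinalizerRuns;

    private bool _resurrectedOnce;

    ~Zombie()
    {
        FinalizerRuns++;
        if (!_resurrectedOnce)
        {
            _resurrectedOnce = true;
            // Storing "this" somewhere reachable resurrects the object.
            Resurrected = this;
            // Ask the GC to run this finalizer again once the object
            // becomes unreachable a second time.
            GC.ReRegisterForFinalize(this);
        }
    }

    // Allocate in a separate, non-inlined method so that no local
    // variable in the caller keeps the instance alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateAndDrop()
    {
        new Zombie();
    }

    public static void Demo()
    {
        AllocateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Typically Resurrected is non-null here: the object was finalized
        // once, yet it is reachable again.

        Resurrected = null;   // drop the last reference a second time
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Typically the finalizer has now run a second time.
    }
}
```

Between the two collections, any code that reads Zombie.Resurrected is holding an object whose finalizer has already run, which is exactly the inconsistent state described above.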

It should be obvious now that the Dispose(bool) method has two unrelated functions — ending an object’s lifetime, and a last-ditch attempt at ending an object’s lifetime in a more constrained environment.  Since the finalization logic is supposed to live in the Dispose(bool) method on the code path where the parameter is false, then it may be convenient to talk about finalization code to encompass both code in finalizers as well as in the Dispose(false) path. 

Developer Knowledge Gaps

Now, let’s talk about where the above really hurts people.  One example I’ve seen somewhat commonly in the .NET Framework is the assumption that the constructor for an object always completes successfully.  This is not true for two reasons.  The first is the obvious case where a constructor checks some precondition (such as whether a parameter is null) then throws an exception.  Most people write their finalization code to check one variable to see if it’s initialized, and if so, then clean up all the state in their object.  They happen to luck out usually in this first case, but not always (I’ve seen code that dereferences pointers that can be null without checking first).  The second reason is more subtle — asynchronous exceptions can occur basically between any two machine instructions in a managed method body, including constructors.  So it’s possible to initialize 2 of the 5 variables in your type, get a ThreadAbortException, then your finalizer runs.  This obviously doesn’t work.  Your finalization code needs to be more defensive than this.
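
To show what “more defensive” might look like, here is a minimal sketch of a type whose constructor can fail part-way through. TwoBuffers is a hypothetical type, and the argument check placed between the two allocations is deliberately contrived so that the failure is easy to trigger.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type whose constructor can fail part-way through.
public class TwoBuffers : IDisposable
{
    private IntPtr _first;    // stays IntPtr.Zero if never allocated
    private IntPtr _second;

    public TwoBuffers(int secondSize)
    {
        _first = Marshal.AllocHGlobal(16);
        // Contrived failure point: _first is set, _second is not.
        if (secondSize <= 0)
            throw new ArgumentOutOfRangeException("secondSize");
        _second = Marshal.AllocHGlobal(secondSize);
    }

    // The finalizer still runs when the constructor threw.
    ~TwoBuffers()
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        // Check each field on its own instead of assuming that one
        // initialized field means all of them were initialized.
        Free(ref _first);
        Free(ref _second);
    }

    private static void Free(ref IntPtr memory)
    {
        if (memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(memory);
            memory = IntPtr.Zero;
        }
    }
}
```

Because every field is tested individually, the same cleanup code works for a fully constructed instance, a half-constructed one, and one that has already been disposed.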

Speaking of other reasons to be defensive, I mentioned above that appdomain unloading & finalizers don’t interact particularly well, producing the restriction that finalizers shouldn’t rely on static variables that may use finalizable objects.  This may not be practical in all scenarios, or you may want to make an attempt at something anyway, such as writing to a log file.  There is a predicate exposed in the BCL to help you: Environment.HasShutdownStarted.  It exists solely to allow finalization code to figure out whether it can depend on static variables.
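
A minimal sketch of that check might look like the following. TrackedResource and its static Log writer are hypothetical; the assumption is that a host could point Log at something finalizable, such as a StreamWriter over a log file.

```csharp
using System;
using System.IO;

// Hypothetical type that reports instances nobody disposed.
public class TrackedResource : IDisposable
{
    // Static state. A host might assign a StreamWriter over a file here,
    // which is finalizable and may already be finalized during shutdown.
    public static TextWriter Log = TextWriter.Null;
    private static readonly object s_logLock = new object();

    private readonly string _name;
    private bool _disposed;

    public TrackedResource(string name)
    {
        _name = name;
    }

    ~TrackedResource()
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed)
            return;

        // On the finalization path, only touch the static writer when
        // the process or appdomain is not shutting down.
        if (!disposing && !Environment.HasShutdownStarted)
        {
            lock (s_logLock)
            {
                Log.WriteLine("Leaked without Dispose: " + _name);
            }
        }
        _disposed = true;
    }
}
```

The explicit Dispose path never writes to the log, so well-behaved callers pay nothing; the check only matters for instances that reach the finalizer.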

Resurrection is not something most developers plan for.  Resurrection can cause some extremely wacky behavior, and thinking about it will hurt your head.  Trust me, I know.  Resurrection is the best reason to defensively add checks along every public entry point to a type that ensure an object is not disposed.  Thread safety is a close second reason for explicitly checking for a disposed state. 

In case people have missed this, SafeHandle is an excellent tool, giving you correctness benefits in addition to better reliability.  Please use it when accessing native resources. 

In a future version, we might consider adding a public IsDisposed predicate on disposable objects, to serve as a publicly consumable state flag usable in preconditions, as part of a much larger effort.

What Did Our Customer Want to Know?

Now that I threw all of that information at you, let’s get back to the original motivation for this post.  A customer read our MSDN docs for the Dispose pattern and was confused by the sample code.  That customer asked if we could clarify the example, and it requires three pieces:

  1. A simple wrapper class that exposes an unmanaged resource
  2. A subclass of a disposable type
  3. A class that wraps a disposable type

What’s in our MSDN Documentation Today?

I’m happy to report that we have a pretty good example of the first, though perhaps not quite in the place you’d like to see.  My “How to use SafeHandle” blog entry contains a good example in the middle — look for the type named SafeHandleDemoV2.  Our MSDN documentation includes a sample using IntPtr to represent native resources like handles & memory.  That works, but we’d prefer that you use SafeHandle for this purpose.  Use SafeHandle to ensure your libraries don’t leak resources, both to ensure long-running servers stay up & running, as well as to fix some relatively obscure security concerns. 

For the second item, the MSDN documentation has a sufficient example showing how to derive from a disposable type — see the MyResourceWrapper class.  The key part of the example is showing how to override Dispose(bool), and importantly, to call the base class’s Dispose(bool) method at the end. 

What Should We Add?

So, what did we miss from our MSDN documentation?  Finalization is not free, and it’s a feature that we don’t want spread throughout libraries for no good reason.  One curious part of our sample on MSDN is the base class defines a finalizer, which of course calls Dispose(false).  But does the finalizer need to exist on the base type?  The sample code is admittedly contrived, so the base type actually allocates native resources & requires a finalizer to serve as a backstop for people that didn’t call Dispose(void).  This is realistic sometimes, but suffers from two problems that should mean this isn’t a common case.  First, the code isn’t reliable — if it did use SafeHandle, then SafeHandle’s critical finalizer would be sufficient to free the underlying resource, and you could remove the finalizer from the BaseResource class.

The second reason why a disposable base type often doesn’t include a finalizer is that a finalizer sometimes isn’t necessary at a certain layer of abstraction.  One common pattern is using an abstract base class, like Stream.  In that case, the base class makes no policy decisions about how data is represented, and as such, isn’t capable of determining whether a finalizer is needed.  Instead, individual subclasses should figure out whether a finalizer is needed, and if so, add one that calls Dispose(false).  For example, MemoryStream uses a managed byte[] internally, so it doesn’t need a finalizer to release any resources.  If Stream defined a finalizer, then MemoryStream would be finalizable, meaning you’d pay a little unnecessary perf penalty every time a MemoryStream object went dead (unless it was explicitly disposed). 
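
The layering described here can be sketched roughly as follows. DataSource, ArrayDataSource, and NativeDataSource are hypothetical stand-ins (loosely playing the roles of Stream, MemoryStream, and a stream over a native resource), and the raw IntPtr is used only for brevity where SafeHandle would normally be preferred.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical abstract base: it owns no resources, so it has no finalizer.
public abstract class DataSource : IDisposable
{
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // matters only if a subclass adds a finalizer
    }

    protected virtual void Dispose(bool disposing)
    {
    }
}

// Managed-only subclass: no finalizer, so it is never put on the
// finalization queue.
public class ArrayDataSource : DataSource
{
    private byte[] _data = new byte[256];
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    protected override void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            _data = null;
            _disposed = true;
        }
        base.Dispose(disposing);   // call the base class last
    }
}

// Subclass that holds native memory: this is the layer that knows a
// finalizer is needed, so this is where it is added.
public class NativeDataSource : DataSource
{
    private IntPtr _memory = Marshal.AllocHGlobal(256);

    ~NativeDataSource()
    {
        Dispose(false);
    }

    public bool IsDisposed { get { return _memory == IntPtr.Zero; } }

    protected override void Dispose(bool disposing)
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            _memory = IntPtr.Zero;
        }
        base.Dispose(disposing);   // call the base class last
    }
}
```

Both subclasses override Dispose(bool) and call the base implementation at the end, but only the one that actually holds a native resource pays for finalization.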

In terms of other information we should include in our MSDN documentation, pointing out the restrictions on finalizers is important when writing your Dispose(bool) method.  The lack of ordering among finalizers really hurts finalization code’s usefulness, while the shutdown issues mean you need to be at least cognizant of when you must use Environment.HasShutdownStarted. 

For resurrection, the best thing we can mention is to ensure public entry points enforce the precondition that the current object instance has not been disposed.  Again, we should also remind people about SafeHandle.

Additionally, the Dispose pattern needs to be followed consistently.  We have some types in the .NET Framework that do not properly follow the Dispose pattern.  While we’d like to fix them, we didn’t fix some of them in our previous release, often for schedule reasons.  The first & foremost reason we need consistency is that languages (like managed C++) can and do take a dependency on the Dispose pattern.  Implementing the pattern correctly is critical for anyone subclassing your type, and the C++ compiler will convert the normal C++ idiom of a destructor into Dispose(true) code.  The compiler emits a lot of plumbing for developers, and this must be done in a sensible way. 

The second motivator for consistency is the consequences of failing to consistently expose object cleanup to subclasses.  If a base class uses Close (or a virtual Dispose(void) method) for its cleanup code and then one subclass uses Dispose(bool), getting the wiring right between the methods is a little tricky.  With three subclasses each on different plans, you can either skip cleanup logic or run it multiple times, and it isn’t possible to disentangle the web.  Just follow the pattern: you’ll be better off for it later, and you won’t have to write graphs of various hypothetical subclass chaining rules between multiple conflicting rules.  I fixed up Stream to not use Close before .NET 2.0 shipped, and I had to review 60 subclasses of Stream within the Developer Division’s code base.  It was not pleasant.

So, where was your third item on the list above?

Our user wanted to see an example of a class that wraps a disposable resource.  So here’s how I would write that type.  I’ve included a syntax error here, so you don’t forget to put in your own resource type for the instance field below.

// General-purpose skeleton of how a class should use a disposable resource.
// Real-world examples include StreamWriter, which wraps a disposable Stream.
// If you cut & paste this sample, replace the second “IDisposable” below with
// a real type that implements IDisposable. Also, consider whether you need to
// lazily initialize the resource, or whether allocating it in the constructor
// is sufficient.
// For more info on how to use SafeHandle in an IDisposable wrapper type like
// this, look at the SafeHandleDemoV2 on the CLR Base Class Library team’s blog:
// http://blogs.msdn.com/bclteam/archive/2006/06/23/644343.aspx
public class UsesADisposableResource : IDisposable
{
    private “IDisposable” _resource; // The field’s type should be some useful
    // type that implements IDisposable, such as Stream, TextReader, Process, EventLog, etc.
    private bool _disposed;
    // Invariant: This instance of UsesADisposableResource is disposed iff
    // _disposed is true. _resource may be lazily initialized or allocated in
    // the constructor, but its lifetime does not exceed the lifetime of
    // UsesADisposableResource.

    public UsesADisposableResource() {
        _resource = ...; // Often people initialize a resource here, or wrap
        // some disposable resource passed as a parameter to the constructor.
        _disposed = false;
    }

    public void Dispose() // Note that Dispose(void) is public and non-virtual
    {
        Dispose(true);
        GC.SuppressFinalize(this); // In case a subclass adds in a finalizer
    }

    protected virtual void Dispose(bool disposing) {
        // Note: If you need thread safety, use a lock around these operations,
        // as well as in your methods that use the resource.
        if (!_disposed) {
            // Dispose of the underlying resource, but only if we’re eagerly
            // disposing. If we’re finalizing this instance, then the underlying
            // type might get finalized before this instance (due to a lack of
            // ordering among finalizable objects). The only exception to this
            // rule would be critical finalizable objects like SafeHandle, where
            // there is a very weak ordering: critical finalizable objects are
            // finalized after normal finalizable objects that the GC detects
            // are unreachable during the same collection.
            if (disposing) {
                if (_resource != null)
                    _resource.Dispose();
            }
            // Additionally, if we’re finalizing, ensure that we don’t rely on
            // static variables, or during appdomain unloading, we might find
            // that our static variables point to finalized instances of objects!
            // We can protect ourselves from that possibility by never touching
            // static variables in our cleanup logic if Environment.HasShutdownStarted
            // returns true. Most people never have the need to touch disposable
            // static variables during finalization, so it is easy to overlook this restriction.
            _resource = null;
            _disposed = true; // Indicates this instance has now been disposed.
        }
    }

    // Precondition: This instance must not be disposed.
    public void DoSomethingWithResource() {
        if (_disposed)
            // ObjectDisposedException has no parameterless constructor, so
            // pass the name of the disposed object.
            throw new ObjectDisposedException(GetType().Name);
        // If you are lazily initializing _resource, then ensure that _resource
        // has been initialized at this point, by calling a helper initialization
        // method.

 

        // Do something interesting here, like read or write to the underlying re