
Simulating "Extension Interfaces" with Structs and Generics

While many still debate the value and appropriateness of Extension Methods in C# 3.0, I have always felt that there was still something missing. Extension methods allow me to add individual methods to classes; however, there are several cases where I have always wanted to add a new interface to a class that is sealed and outside my control. For lack of a better term, I would consider these an "Extension Interface."

Take for example the TextWriter and StringBuilder classes. Both of these classes serve the function of "sending text" to another object. In fact, these classes have similar methods (.Append() and .Write()) that support a wide range of data types. I've always wished that they supported a common interface as follows. (For the purposes of this example, I only show two signatures, but you could imagine support for other common data types and options.)

public interface ITextSender
{
    void Write(char c);
 
    void Write(string s);
}

If I had such an interface, then I would be able to write algorithms that generate text output and efficiently write it to strings, files, etc.

One option, of course, would be to write a wrapper class (such as TextSender) that could be subclassed for various output scenarios. The downside is that this is an extra memory allocation on the heap. While .NET is optimized for small, short-lived objects, the reality is that there is still a very real cost to instantiating and destroying objects. Particularly if you are creating methods in libraries that are going to be called often, it's worth the effort to make them efficient both in terms of speed and memory.

Creating Wrappers Using Structs

The trick to simulating "Extension Interfaces" while maximizing performance is to use structs in conjunction with generics.

For example, using the ITextSender interface that I defined above, I can create two structs that implement the behaviors for both StringBuilder and TextWriter.

public struct StringBuilderTextSender : ITextSender
{
    private readonly StringBuilder output;
 
    public StringBuilderTextSender(StringBuilder output)
    {
        this.output = output;
    }
 
    public void Write(char c)
    {
        output.Append(c);
    }
 
    public void Write(string s)
    {
        output.Append(s);
    }
}
 
 
public struct TextWriterTextSender: ITextSender
{
    private readonly TextWriter output;
 
    public TextWriterTextSender(TextWriter output)
    {
        this.output = output;
    }
 
    public void Write(char c)
    {
        output.Write(c);
    }
 
    public void Write(string s)
    {
        output.Write(s);
    }
}

Example: Using the "Extension Interface"

Let's imagine that I want to create a simple encoder that converts new lines within a string to a "<br />". (To be flexible, let's consider a new line to be "\r", "\n" or "\r\n".)

We'll start by creating the base method that will take a generic type parameter "T" that implements the interface ITextSender.

public static void ConvertLineFeeds<T>(string source, T output)
    where T : ITextSender
{
    char currChar;
    char lastChar = '\0';
 
    for (int i = 0; i < source.Length; i++)
    {
        currChar = source[i];
 
        if (currChar == '\r' || (currChar == '\n' && lastChar != '\r'))
            output.Write("<br />");
        else if(currChar != '\n')
            output.Write(currChar);
 
        lastChar = currChar;
    }
}

It is extremely important that the base method take a generic type parameter as opposed to a parameter of type ITextSender! If our parameter were an interface, then our struct would be boxed into an object, which would defeat the performance objective.
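To make the difference concrete, here is a minimal sketch of the two possible signatures side by side. The BoxingSketch class and both WriteGreeting methods are hypothetical names used only for illustration; the interface and struct are repeated from above so the sample compiles on its own. Both methods produce the same text; the difference is only in how the struct reaches the method.

```csharp
using System.Text;

public interface ITextSender
{
    void Write(char c);
    void Write(string s);
}

public struct StringBuilderTextSender : ITextSender
{
    private readonly StringBuilder output;

    public StringBuilderTextSender(StringBuilder output)
    {
        this.output = output;
    }

    public void Write(char c) { output.Append(c); }

    public void Write(string s) { output.Append(s); }
}

// Hypothetical helper class, for illustration only.
public static class BoxingSketch
{
    // Interface-typed parameter: converting the struct to ITextSender
    // boxes it (a heap allocation), and each call goes through
    // interface dispatch.
    public static void WriteGreetingBoxed(ITextSender output)
    {
        output.Write("Hello");
        output.Write('!');
    }

    // Constrained generic parameter: the struct is passed by value,
    // no box is created, and the calls are bound to the struct's own
    // methods for each value type T.
    public static void WriteGreeting<T>(T output)
        where T : ITextSender
    {
        output.Write("Hello");
        output.Write('!');
    }
}
```

Calling either method with new StringBuilderTextSender(sb) appends "Hello!" to sb; only the first one allocates a box for the struct along the way.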

Now that we have the base method, we can then quickly create three overloads to support output to String, StringBuilder and TextWriter.

public static string ConvertLineFeeds(string source)
{
    StringBuilder output = new StringBuilder(source.Length);
 
    ConvertLineFeeds(source, output);
 
    return output.ToString();
}
 
public static void ConvertLineFeeds(string source, StringBuilder output)
{
    ConvertLineFeeds<StringBuilderTextSender>(source, new StringBuilderTextSender(output));
}
 
public static void ConvertLineFeeds(string source, TextWriter output)
{
    ConvertLineFeeds<TextWriterTextSender>(source, new TextWriterTextSender(output));
}

We can test our methods using a simple bit of test code. Each of the following variations will return "Line 1<br />Line 2".

public void Test()
{
    //
    //  Test as String
    //
 
    Console.WriteLine(ConvertLineFeeds("Line 1\r\nLine 2"));
 
    //
    //  Test as StringBuilder
    //
 
    StringBuilder sb = new StringBuilder();
    ConvertLineFeeds("Line 1\r\nLine 2", sb);
    Console.WriteLine(sb.ToString());
 
    //
    //  Test as TextWriter
    //
 
    ConvertLineFeeds("Line 1\r\nLine 2", Console.Out);
    Console.WriteLine();
 
}

Performance

There are several benefits to this approach:

  • There is no object instantiation required to implement the interface.
  • Generic methods are compiled separately for each value type passed as a type argument. This provides a unique opportunity for the method to be optimized for each type. Members of the type parameter could potentially be inlined in ways that may not be possible if you did not use generics.
  • structs can never be null and member calls are very efficient.

In .NET Framework 3.5 and earlier, there are a number of optimizations that do not apply to structs; however, we are told that we can expect improvements in this area in the next Framework release.

If you found this useful, drop me a line!

Happy Coding!

Posted by wifry | 2 Comments

Creating Text Summaries from HTML

On several projects, I have had the need to convert large HTML blobs into short text summaries that can be displayed in a list. For example, in SharePoint I often need to display lists of Publishing Page content and I want to summarize some of the HTML columns.

This blog post provides code and describes the process for converting HTML to a text summary.

The Process

There are three basic steps to the process:

  • Strip out all HTML tags (tags, comments and CDATA)
  • Normalize the whitespace (reduce multiple spaces, tabs and line feeds into single spaces)
  • Truncate and clean up the result and append an ellipsis (if necessary)

Regular Expressions

To remove the HTML content, I use one large regular expression.  (To keep my sanity, I store several smaller regular expressions in separate strings and concatenate them.)

It is obvious that we will want to remove normal HTML tags and comments, but it is less obvious that we want to remove CDATA sections.  CDATA sections are not that common and, if we were to include their content, we would need to HTML-encode it; it is much easier to simply remove them.

The first three patterns below represent the "contents" of tags (the stuff in between the "<" and ">").  The fourth pattern concatenates the results inside the opening/closing brackets.

string TagContentsRegexPattern = @"(?:[^\>\""\']*(?:\""[^\""]*\""|\'[^\']*\')?)*";
string CommentContentsRegexPattern = @"\!\-\-.*?\-\-";
string CDataContentsRegexPattern = @"\!\[CDATA\[.*?\]\]";
 
string HtmlTagCommentOrCDataRegexPattern = @"\<(?:" + CommentContentsRegexPattern
                                           + "|" + CDataContentsRegexPattern
                                           + "|" + TagContentsRegexPattern + @")\>";

The final combined regular expression for identifying HTML tags is below:

\<(?:\!\-\-.*?\-\-|\!\[CDATA\[.*?\]\]|(?:[^\>\"\']*(?:\"[^\"]*\"|\'[^\']*\')?)*)\>

In the final code, you will find a method called StripTags that replaces these tags with an empty string.

It was difficult to choose whether to replace tags with a space or a zero-length string.  I ultimately chose to use a zero-length string, which introduces the risk of incorrectly concatenating two words (for example, if two <p> tags had no whitespace between them).  In this case, I felt it was a better choice to incorrectly combine two words than to introduce extra whitespace.  An improvement to the code might be to detect certain tags such as <p> and <td> and always convert those to spaces.
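The improvement mentioned above could look roughly like the following. This is only a sketch, not part of the final code: the TagSpacingSketch class, the StripTagsWithSpacing method and the particular list of "block" tags are my own assumptions, and the tag pattern is the same one used in this post, written without the redundant escapes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical variation on StripTags, for illustration only.
public static class TagSpacingSketch
{
    // Matches a comment, a CDATA section or an ordinary tag.
    private static readonly Regex AnyTag = new Regex(
        @"<(?:!--.*?--|!\[CDATA\[.*?\]\]|(?:[^>""']*(?:""[^""]*""|'[^']*')?)*)>",
        RegexOptions.Singleline | RegexOptions.ExplicitCapture);

    // Assumed list of tags that normally separate words when rendered.
    private static readonly Regex BlockTag = new Regex(
        @"^</?(?:p|div|br|li|tr|td|th|h[1-6])\b",
        RegexOptions.IgnoreCase);

    public static string StripTagsWithSpacing(string html)
    {
        if (html == null)
            return null;

        // Block-level tags become a single space; everything else
        // (inline tags, comments, CDATA) becomes an empty string.
        return AnyTag.Replace(html,
            match => BlockTag.IsMatch(match.Value) ? " " : string.Empty);
    }
}
```

With this version, "<p>one</p><p>two</p>" becomes " one  two " (which NormalizeWhitespace then reduces to "one two"), while "<b>bo</b>ld" still becomes "bold".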

Normalizing Whitespace

The NormalizeWhitespace method is responsible for converting sequences of whitespace (including space, tabs and linefeeds) into a single space. The string is also effectively trimmed so all whitespace at the start or end of the string is removed.

Truncate and Cleanup

Once we have removed the tags and normalized the whitespace, it's time to truncate the results.

If life were simple, we would simply truncate the string at a particular length; unfortunately, it's a bit more complicated. To do a "great job", we perform the following steps:

  • If the result is longer than our maximum length, we truncate.
  • Next, we need to determine if we accidentally broke an HTML entity.  For example, imagine if we accidentally truncated "&amp;" in the middle and ended up with "&am", this would effectively corrupt the output.  To fix this problem, we look for the last "&" and the last ";" and fix the problem if it exists.
  • Next, we look for the last space and truncate there.  That way, we can avoid splitting a word in the middle.
  • Finally, we append the ellipsis if needed.

But wait... we didn't decode the HTML entities!?!?!?!

That's correct. The assumption for this method is that we ultimately want to rewrite the result into an HTML stream; therefore, we can leave the entities as they are. Do not run the results of these methods through a function that HTML-encodes; otherwise, your output will be double-encoded!
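Here is a tiny illustration of the double-encoding problem. It assumes System.Net.WebUtility (available from .NET Framework 4.0 onward; on older frameworks HttpUtility.HtmlEncode plays the same role), and the DoubleEncodingSketch class is a hypothetical name used only for this example.

```csharp
using System.Net;

// Hypothetical helper, for illustration only.
public static class DoubleEncodingSketch
{
    // Encodes text that already contains HTML entities.
    // The "&" of every existing entity is encoded a second time.
    public static string EncodeAgain(string alreadyEncoded)
    {
        return WebUtility.HtmlEncode(alreadyEncoded);
    }
}
```

For example, EncodeAgain("Fish &amp; Chips") returns "Fish &amp;amp; Chips", which a browser would display as "Fish &amp; Chips" instead of "Fish & Chips".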

The Final Code

Our final code is listed below:

using System;
using System.Text;
using System.Text.RegularExpressions;
 
namespace Core.Web
{
    public class HtmlToText
    {
        //
        //  Html Tag Regex Patterns
        //
 
        public static readonly string TagContentsRegexPattern = @"(?:[^\>\""\']*(?:\""[^\""]*\""|\'[^\']*\')?)*";
        public static readonly string CommentContentsRegexPattern = @"\!\-\-.*?\-\-";
        public static readonly string CDataContentsRegexPattern = @"\!\[CDATA\[.*?\]\]";
 
        public static readonly string HtmlTagCommentOrCDataRegexPattern = @"\<(?:" + CommentContentsRegexPattern
                                                                        + "|" + CDataContentsRegexPattern
                                                                        + "|" + TagContentsRegexPattern + @")\>";
 
        public static Regex FindTagRegex = new Regex(HtmlTagCommentOrCDataRegexPattern,
                                                    RegexOptions.Multiline | RegexOptions.Singleline
                                                    | RegexOptions.Compiled | RegexOptions.ExplicitCapture);
 
        public static string CreateHtmlSummary(string s, int maximumLength, bool appendEllipse)
        {
            string result;
 
            if (s == null)
                result = null;
            else if (s.Length == 0 || maximumLength <= 0)
                result = "";
            else
            {
                //  Remove Tags...
                result = StripTags(s);
 
                //  Normalize Whitespace...
                result = NormalizeWhitespace(result);
 
                if (result.Length > maximumLength)
                {
                    int truncateLen = maximumLength;
 
                    //
                    //  Find the last position of the "&" and ";".
                    //  If the last ";" is not after the last "&"
                    //  then we have split an Entity and need to truncate
                    //  before the "&"...
                    //
 
                    int lastAmpersandPosition = result.LastIndexOf('&', truncateLen - 1);
 
                    if (lastAmpersandPosition != -1)
                    {
                        int lastSemicolonPosition = result.LastIndexOf(';', truncateLen - 1);
 
                        if (lastSemicolonPosition < lastAmpersandPosition)
                            truncateLen = lastAmpersandPosition;
                    }
 
                    //  Locate the last space and truncate there so we don't
                    //  split words...
                    if (truncateLen > 0 && result[truncateLen] != ' ')
                    {
                        int spacePosition = result.LastIndexOf(' ', truncateLen);
 
                        if (spacePosition > 0)
                            truncateLen = spacePosition;
                    }
 
                    result = result.Substring(0, truncateLen);
 
                    //  Append ellipse, if needed...
                    if (appendEllipse)
                        result += "...";
                }
            }
 
            return result;
        }
 
        public static string NormalizeWhitespace(string s)
        {
            string result;
 
            if (s == null)
                result = null;
            else if (s.Length == 0)
                result = "";
            else
            {
                int startPos = 0;
 
                //  Trim initial whitespace
 
                while (startPos < s.Length && char.IsWhiteSpace(s[startPos]))
                {
                    startPos++;
                }
 
                if (startPos == s.Length)
                    result = "";
                else
                {
                    int firstNonWhitespaceCharacter = startPos;
 
                    while (startPos < s.Length && !char.IsWhiteSpace(s[startPos]))
                    {
                        startPos++;
                    }
 
                    if (startPos == s.Length)
                    {
                        if (firstNonWhitespaceCharacter == 0)
                            result = s;
                        else
                            result = s.Substring(firstNonWhitespaceCharacter);
                    }
                    else
                    {
                        bool haveSeenWhitespace = true;
                        char c;
                        StringBuilder sb = new StringBuilder(s.Length - startPos);
 
                        sb.Append(s, firstNonWhitespaceCharacter,
                                  startPos - firstNonWhitespaceCharacter);
 
                        for (int i = startPos + 1; i < s.Length; i++)
                        {
                            c = s[i];
 
                            //  Any whitespace character only marks a pending space,
                            //  so a run of whitespace collapses to a single space.
                            if (char.IsWhiteSpace(c))
                            {
                                haveSeenWhitespace = true;
                            }
                            else
                            {
                                if (haveSeenWhitespace)
                                {
                                    sb.Append(' ');
                                    haveSeenWhitespace = false;
                                }
 
                                sb.Append(c);
                            }
                        }
 
                        result = sb.ToString();
                    }
                }
            }
 
            return result;
        }
 
        public static string StripTags(string s)
        {
            if (s == null)
                return null;
            else
                return FindTagRegex.Replace(s, string.Empty);
        }
 
        public static string StripTagsAndNormalize(string s)
        {
            return NormalizeWhitespace(StripTags(s));
        }
    }
}
 

Possible Enhancements

This code could be enhanced by:

  • Allowing the consumer to provide the text that should be appended during truncation (instead of "...").
  • Allowing the consumer to provide a parameter to indicate whether the results should be HTML-encoded or pure text.
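As a rough idea of the first enhancement, the sketch below truncates at a word boundary and appends whatever suffix the caller supplies. The SummarySuffixSketch class and its Truncate method are hypothetical, and to stay short the sketch assumes the text has already been stripped and normalized and leaves out the HTML entity check described earlier.

```csharp
// Hypothetical helper, for illustration only.
public static class SummarySuffixSketch
{
    // Truncates text at the last space at or before maximumLength
    // and appends a caller-supplied suffix instead of a fixed "...".
    public static string Truncate(string text, int maximumLength, string suffix)
    {
        if (text == null)
            return null;

        if (maximumLength <= 0)
            return "";

        if (text.Length <= maximumLength)
            return text;

        int truncateLen = maximumLength;

        // Back up to the previous space so we don't split a word.
        if (text[truncateLen] != ' ')
        {
            int spacePosition = text.LastIndexOf(' ', truncateLen);

            if (spacePosition > 0)
                truncateLen = spacePosition;
        }

        return text.Substring(0, truncateLen) + (suffix ?? "");
    }
}
```

For example, Truncate("The quick brown fox", 12, " [more]") returns "The quick [more]".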

Drop me a line if you find this code helpful!

Posted by wifry | 0 Comments

SharePoint 2007 and TokenAndPermUserStore Issues

In 2007, I was working on a large SharePoint 2007 project and we discovered that after the system ran for 3-4 days (or 24 hours of simulated stress), performance would suddenly degrade significantly (i.e. minutes for pages to load, if they would load at all). However, when we looked at all of the traditional performance indicators (memory, CPU, network throughput) on the web front-ends, application servers and database server, everything appeared healthy.

We ultimately tracked the problem down to SQL Server's TokenAndPermUserStore cache, which needed to be purged regularly; SQL Server Trace Flag 4618 takes care of that.

This post discusses the symptoms and the ultimately simple solution to the problem.

Configuration

Hardware

  • 4 web front-ends
    Windows Server 2003 SP1 (32-bit)
    2 process web garden
  • 2 clustered database servers
    SQL Server 2005 SP2 servers (64-bit)
    "lots of memory"
  • Search was handled outside of SharePoint so there were no dedicated application servers

Usage Pattern

  • 3 Site Collections
  • 2,000 Publishing Sites
  • ~18,000 Publishing Pages

Symptoms

  • Significant performance degradation after 3-4 days of normal use or 24 hours of simulated stress
  • Pages would take minutes to load or would simply time-out
  • Web front-ends
    • Low CPU usage
    • Normal memory usage
    • Normal network throughput
  • Database servers
    • Low CPU usage
    • Normal memory usage
    • Normal network throughput

In short, there were no obvious indications of a problem other than the slow response time...

Diagnostic Approach

On this project, I had the pleasure of working with a Microsoft Architect named Jay Gore. Jay executed the following query against the SQL Server:

SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

The query returned results such as:

scheduler_id   current_tasks_count   runnable_tasks_count
0              19                    16
1              35                    31
2              36                    31
3              33                    29
4              33                    29
5              35                    31
6              33                    30
7              35                    31

A runnable task count of greater than zero for any substantial length of time is a problem. The observed behavior was that the database was only at 25% CPU usage but had 10 to 25 tasks in the runnable state per scheduler, and it was spending all its time waiting on SOS_SCHEDULER_YIELD with greater than 90% signal waits vs. resource waits.

On a hunch, Jay cleared the caches (dbcc freeproccache) and suddenly performance returned to normal.

After further diagnosis, Jay traced the problem to the TokenAndPermUserStore and he was able to consistently resolve the problem by issuing the following command:

dbcc freesystemcache ('TokenAndPermUserStore')

Root Cause

By design, if SQL Server 2005 has "plenty of memory", the TokenAndPermUserStore cache (a security cache that grows as dynamically compiled queries are executed) will continue to grow and "old items" will not be automatically purged. Eventually, the cache can grow so large that it takes a substantial amount of time to locate the appropriate entry.

This behavior is documented in the following Microsoft Knowledge Base articles: KB933564 and KB927396.

Resolution

To resolve this issue, we applied the SQL Server Trace Flag 4618 to the startup parameters for SQL Server 2005.  This flag forces "old items" to be purged from the cache immediately.

Posted by wifry | 0 Comments

Prepare for Internet Explorer 8.0 Today! Yes, Today.

For twelve years, I was an architect of enterprise, commercial software. One of the most important lessons that I learned is that the software that you write today will be in use much longer than you imagine... MUCH LONGER! If you take a shortcut in your work, it will cost you much more to fix it later. I often sat in meetings and heard the counter-argument "But we can fix it in the next Service Pack." Unfortunately, it costs customers real money to install patches and service packs even if they are free. In some regulated industries, the customer must perform a complete re-evaluation of the system after installing a patch.

My advice: if you know that there is a problem heading your way, do what you can today to help prevent a crisis tomorrow.

Internet Explorer 8.0 Beta 2 will be available to the public in August. Some web sites may experience problems due to changes that were made to improve standards compliance. Symptoms may include changes in the size or position of objects and possible JavaScript errors.

The good news is that Microsoft has provided an excellent new feature in Internet Explorer that allows you to explicitly state what version of Internet Explorer you designed your web application to support. This means that there is something very simple that you can do today to help prevent problems tomorrow, next year and ten years from now!

At minimum, add the following meta tag to the head of your pages. This tag will tell Internet Explorer that your pages were designed for IE7.

    <meta http-equiv="X-UA-Compatible" content="IE=7" />

When future versions of Internet Explorer see this tag, they will know that they should render the page exactly as it was rendered in IE7.

"But wait," you say, "I work on an internal application so I don't need to worry about Service Packs. I can go in and make changes anytime I want. It will be a year or more before IE8 is a standard at my company." As a consultant, I work with many companies. Even with corporate standards, there are always users who are running new or different software. (Perhaps they are using VPN from home or they received a new portable that came preloaded with non-standard software.) Ultimately, internal customers will still blame you if your application does not work as expected.

What can you do today to prevent a crisis tomorrow?

  1. Add the single meta tag listed above to your pages today. If you do nothing else, that will help ensure that your pages do not suddenly break.
  2. If you wrote code that attempts to detect the browser version, ensure that you did not hard code it to look for exactly "7".  (Every time a new browser is released, developers make the same mistake.)  Microsoft provides future-safe code at http://www.microsoft.com/windows/products/winfamily/ie/ie8/readiness/DevelopersExisting.htm
  3. Install Internet Explorer 8.0 on a test machine and ensure that your applications are working correctly.
  4. Read about what's new or different about 8.0 at http://www.microsoft.com/windows/products/winfamily/ie/ie8/readiness/
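For the second item, the idea is to compare the version numerically ("7 or later") rather than test for one exact value. The sketch below is my own server-side illustration, not the code from the Microsoft page: the BrowserVersionSketch class and its methods are hypothetical, and it assumes the User-Agent string carries an "MSIE <version>" token, as IE7 and IE8 send.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BrowserVersionSketch
{
    // Assumes the browser identifies itself with "MSIE <major>.<minor>".
    private static readonly Regex MsieToken = new Regex(@"MSIE (\d+)");

    // Returns the IE major version, or -1 if there is no MSIE token.
    public static int GetIEMajorVersion(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
            return -1;

        Match match = MsieToken.Match(userAgent);

        return match.Success
            ? int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture)
            : -1;
    }

    // Future-safe: true for IE7, IE8 and anything newer that still
    // sends the MSIE token, instead of matching exactly "7".
    public static bool IsIE7OrLater(string userAgent)
    {
        return GetIEMajorVersion(userAgent) >= 7;
    }
}
```

Given "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)", GetIEMajorVersion returns 8 and IsIE7OrLater returns true, whereas a check for exactly "7" would have treated IE8 as an unknown browser.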

Do it today! Yes, today.

Posted by wifry | 0 Comments

Internet Explorer 8.0 and Accessibility

I'm a strong believer in creating accessible applications... not only because it's the right thing to do, but also because I believe that it results in interfaces that are more obvious.

While studying the Internet Explorer 8 Readiness Toolkit, I learned about IE8's partial support for the W3C's Accessible Rich Internet Application (ARIA) standard. This standard will help to make dynamic web applications and custom user experiences more accessible.  This has always been a concern to me as I watch Web 2.0 applications become the norm.

IE8 will support the "role", "state", and "property" attributes from the ARIA standard.

To learn more about ARIA, read the W3C's ARIA Support white paper from Microsoft.

Additional resources are available on the W3C web site:

Posted by wifry | 0 Comments

PowerPoint-like buttons in XAML/WPF

Someone posted a question to a discussion group asking for a way to create buttons like those in PowerPoint 2007 using XAML.

Below is my first attempt.

Button

<UserControl x:Class="TestApp.TestButton"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    Height="100" Width="300">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="40" />
            <ColumnDefinition Width="*" />
            <ColumnDefinition Width="40" />
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="40" />
            <RowDefinition Height="*" />
            <RowDefinition Height="40" />
        </Grid.RowDefinitions>
        <Rectangle Grid.RowSpan="3" Grid.ColumnSpan="3" RadiusX="20" RadiusY="20">
            <Rectangle.Fill>
                <LinearGradientBrush StartPoint="0,0" EndPoint="0,1">
                    <GradientStop Color="LightGreen" Offset="0" />
                    <GradientStop Color="Green" Offset="1" />
                </LinearGradientBrush>
            </Rectangle.Fill>
        </Rectangle>
        <Rectangle Grid.Row="2" Grid.RowSpan="1" Grid.ColumnSpan="3" RadiusX="20" RadiusY="20">
            <Rectangle.Fill>
                <LinearGradientBrush StartPoint="0,0" EndPoint="0,1">
                    <GradientStop Color="#0000" Offset=".5" />
                    <GradientStop Color="#3000" Offset="1" />
                </LinearGradientBrush>
            </Rectangle.Fill>
        </Rectangle>
        <Rectangle Grid.Row="0" Grid.RowSpan="3" Grid.ColumnSpan="1" RadiusX="20" RadiusY="20">
            <Rectangle.Fill>
                <LinearGradientBrush StartPoint="0,0" EndPoint="1,0">
                    <GradientStop Color="#0000" Offset=".5" />
                    <GradientStop Color="#3000" Offset="0" />
                </LinearGradientBrush>
            </Rectangle.Fill>
        </Rectangle>
        <Rectangle Grid.Column="2" Grid.RowSpan="3" Grid.ColumnSpan="1" RadiusX="20" RadiusY="20">
            <Rectangle.Fill>
                <LinearGradientBrush StartPoint="0,0" EndPoint="1,0">
                    <GradientStop Color="#0000" Offset=".5" />
                    <GradientStop Color="#3000" Offset="1" />
                </LinearGradientBrush>
            </Rectangle.Fill>
        </Rectangle>
        <Rectangle Grid.RowSpan="1" Grid.ColumnSpan="3" RadiusX="20" RadiusY="20" >
            <Rectangle.Fill>
                <LinearGradientBrush StartPoint="0,0" EndPoint="0,1">
                    <GradientStop Color="PaleGreen" Offset="0" />
                    <GradientStop Color="#0fff" Offset=".5" />
                </LinearGradientBrush>
            </Rectangle.Fill>
        </Rectangle>
        <Rectangle Grid.RowSpan="3" Grid.ColumnSpan="3" RadiusX="20" RadiusY="20" StrokeThickness="2">
            <Rectangle.Stroke>
                <LinearGradientBrush StartPoint="0,0" EndPoint="0,1">
                    <GradientStop Color="#33006600" Offset="0" />
                    <GradientStop Color="#33003300" Offset="1" />
                </LinearGradientBrush>
            </Rectangle.Stroke>
        </Rectangle>
    </Grid>
</UserControl>

Posted by wifry | 0 Comments

WCF: BodyWriter and Raw XML Problems

I've been recently writing code with Windows Communication Foundation (WCF) using raw Message contracts. The purpose of raw Message contracts is to allow the developer to be in complete control of the format of the message that is received and sent by a service. This is a good choice for the project that I'm working on because I am dealing with legacy code that is not strongly typed and I need the flexibility to process messages with structures that may vary between calls. (This is of course less than ideal, but it allows legacy code to function in WCF while we refactor the code.)

WCF provides several options for writing XML into your message. The Message.CreateMessage() method accepts Object (which will automatically be serialized using the default DataContractSerializer), XmlReader, XmlDictionaryReader or BodyWriter. The BodyWriter class provides an abstract method, OnWriteBodyContents, that you can override to implement your own serialization.

In the project that I was creating, the data for my Message was already in XML, so I thought my most efficient solution would be to implement a BodyWriter that would simply write the raw XML into the stream. So I created the following class:

    public class SimpleMessageBody : BodyWriter
    {
        string xmlContent;

        public SimpleMessageBody(string content)
            : base(true)
        {
            this.xmlContent = content;
        }

        protected override void OnWriteBodyContents(XmlDictionaryWriter writer)
        {
            writer.WriteRaw(xmlContent);
        }
    }

I then used this class to be the input to the CreateMessage method as follows:

    message = Message.CreateMessage(MessageVersion.Default, 
                "http://tempuri.org/SomeMethod", new SimpleMessageBody(xml));

Much to my surprise, my XML content was automatically encoded in my message! However, if I opened an XmlReader on my XML content and passed the reader into the method, everything worked fine.

Eventually, I used the .NET Reflector utility to look deep inside the BodyWriter class. It turns out that the BodyWriter uses its own special implementation of XmlWriter. In this implementation, the .WriteRaw() method actually calls the .WriteText() method under the hood, which ultimately encodes your raw XML!

It's possible that Microsoft wanted to prevent developers from writing raw XML into the message in order to prevent corruption, namespace conflicts or potential security violations. Unfortunately, this behavior does not appear to be clearly documented.

The only option appears to be to open an XmlReader and use the .WriteNode() method.

    protected override void OnWriteBodyContents(XmlDictionaryWriter writer)
    {
        using (StringReader stringReader = new StringReader(xmlContent))
        {
            using (XmlReader xmlReader = XmlReader.Create(stringReader))
            {
                writer.WriteNode(xmlReader, true);
            }
        }
    }
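
To see the difference between encoded text and copied nodes outside of WCF, here is a small self-contained sketch. It uses an ordinary XmlWriter over a StringBuilder, not the internal writer that BodyWriter receives, and the WriteNodeSketch class and its method names are hypothetical. It also assumes the XML string has no XML declaration, since a declaration cannot be written inside an element.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class WriteNodeSketch
{
    private static XmlWriter CreateWriter(StringBuilder sb)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        return XmlWriter.Create(sb, settings);
    }

    // Writes the XML string as text: the markup is entity-encoded,
    // which is the effect I saw inside the WCF message.
    public static string WrapWithWriteString(string xmlContent)
    {
        var sb = new StringBuilder();

        using (XmlWriter writer = CreateWriter(sb))
        {
            writer.WriteStartElement("Body");
            writer.WriteString(xmlContent);
            writer.WriteEndElement();
        }

        return sb.ToString();
    }

    // Parses the XML string and copies its nodes: the markup survives.
    public static string WrapWithWriteNode(string xmlContent)
    {
        var sb = new StringBuilder();

        using (XmlWriter writer = CreateWriter(sb))
        using (var stringReader = new StringReader(xmlContent))
        using (XmlReader xmlReader = XmlReader.Create(stringReader))
        {
            writer.WriteStartElement("Body");
            writer.WriteNode(xmlReader, true);
            writer.WriteEndElement();
        }

        return sb.ToString();
    }
}
```

With the input "<item>x</item>", WrapWithWriteNode returns "<Body><item>x</item></Body>", while WrapWithWriteString returns a Body element whose content starts with "&lt;item".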
Posted by wifry | 0 Comments

Visual Studio and Arithmetic Overflow

Microsoft has made an important change between Visual Studio 2003 and Visual Studio 2005/2008:

By default, Visual Studio 2005/2008 projects have the “Check for arithmetic overflow/underflow” compilation switch turned off. In Visual Studio 2003, this switch defaulted to true.

What Does This Mean?

This means that if code attempts to multiply or add numbers that are very large (too large for the particular data type), the code will not throw an exception, but rather it will produce an unexpected result! This can lead to data corruption and potentially even infinite loops. (see “How I Discovered The Problem” below…)

Correcting the Problem

Every time you create a Visual Studio 2005 project:

  • Right-click on the project and bring up Properties.
  • Click on the Build tab.
  • Click on the Advanced… button. (You may need to scroll down to find it.)
  • Check the Check for arithmetic overflow/underflow checkbox.

Why Did Microsoft Do That?

Performance.

Checking for overflow requires a few extra CPU cycles. Microsoft obviously traded safety for speed. (Although, this is a bit like saying “I can make a car faster if I don’t include brakes”.)

How I Discovered The Problem

I had written a little function that operated on a byte. I wanted to write a Unit Test Case to ensure that the function worked as expected for all possible byte values, so I wrote the following code:

for(byte b=0; b <= 255; b++)
{
    myFunction(b);
}

However, there is a serious bug in the code above. Once b == 255, this function will try to increment b one more time which should cause an overflow. Rather than overflowing, b actually became 0 and it became an infinite loop!

I was shocked that I did not get an overflow exception, so then I tried…

byte b = byte.MaxValue;
b++;
Console.WriteLine(b);

And I received 0!

So I tried…

int i = int.MaxValue;
i++;
Console.WriteLine(i);

And I received a large negative number.

And finally, I tried…

int i = int.MaxValue;
i *= 10;
Console.WriteLine(i);

And I received -10!!!

At this point, I knew something was terribly wrong…

What If I Know That My Code Can’t Overflow?

In general, it’s better to be safe than sorry and always check for overflow. HOWEVER, if you are 100% certain that you have a function that could never overflow, C# provides the unchecked { } block that allows you to mark a block of code as not requiring overflow checking.

I strongly advise against any developer actually using the unchecked statement unless you really, really, really know what you are doing!
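
The following sketch pulls these pieces together. The OverflowSketch class and its method names are hypothetical; the checked { } and unchecked { } blocks are standard C# and behave the same way regardless of the project-level switch. The first method also shows one way to visit every byte value without the infinite loop: count with an int so the loop variable can actually pass 255.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class OverflowSketch
{
    // Calls the action once for each of the 256 byte values.
    // The counter is an int, so "i <= byte.MaxValue" eventually fails.
    public static void ForEachByte(Action<byte> action)
    {
        for (int i = byte.MinValue; i <= byte.MaxValue; i++)
        {
            action((byte)i);
        }
    }

    // Inside a checked block, incrementing past the maximum value
    // throws an OverflowException.
    public static bool IncrementThrows(byte value)
    {
        try
        {
            checked { value++; }
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // Inside an unchecked block, the value silently wraps around.
    public static byte IncrementWrapping(byte value)
    {
        unchecked { value++; }
        return value;
    }
}
```

IncrementThrows(byte.MaxValue) returns true, IncrementWrapping(byte.MaxValue) returns 0, and ForEachByte invokes its action exactly 256 times.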

Posted by wifry | 0 Comments

Default Namespaces and XML: Who Knew!?!?

I work a lot with XML and XML will obviously play an important role in future development. Over the last couple of months, I've been observing some incredibly strange behaviors with namespaces. I was certain that I must be observing a bug in Microsoft's XML Parsers, so today I decided I was going to finally figure out what was happening.

Take the following example:

<html xmlns="http://www.w3.org/1999/xhtml"> 
    …     
    <a href="http:someurl" >     
    … 
</html> 

Question #1: What is the namespace for the "a" element?

Answer #1: It's "http://www.w3.org/1999/xhtml".

Question #2: What is the namespace for the "href" attribute?

Answer #2: Did you say "http://www.w3.org/1999/xhtml"? Wrong! It's actually null!

What? That didn't make any sense, so then I tried using an explicit prefix on the elements…

<h:html xmlns:h="http://www.w3.org/1999/xhtml"> 
     … 
     <h:a href="http:someurl" > 
     … 
</h:html> 

When you use an explicit prefix on elements, the namespace for "a" is still "http://www.w3.org/1999/xhtml", and "href" still has no namespace. An unprefixed attribute never picks up a namespace from its element, whether the element uses the default namespace or a prefix; only an attribute that carries its own prefix (h:href, for example) is in a namespace.

This certainly seemed like a bug, so I went straight to the source. . . the W3C.

http://www.w3.org/TR/2006/REC-xml-names-20060816/#scoping

Amazingly enough, it's not a bug! If you use a default namespace, then elements with no prefix belong to the default namespace; however, default namespace declarations do not apply to attributes, so an attribute with no prefix always has a null namespace.

I have absolutely no idea why the W3C chose this behavior, but it's incredibly important if you are trying to write XPath expressions or are using the Microsoft XML classes.
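
You can check the namespaces yourself with a few lines of code. This is a minimal sketch using LINQ to XML (System.Xml.Linq); the NamespaceSketch class and DescribeAnchor method are hypothetical names, and the sample documents are simplified, well-formed versions of the examples above.

```csharp
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class NamespaceSketch
{
    // Returns { namespace of the "a" element, namespace of its "href" attribute }.
    public static string[] DescribeAnchor(string xml)
    {
        XElement root = XElement.Parse(xml);
        XNamespace xhtml = "http://www.w3.org/1999/xhtml";

        // The element is looked up by its namespace-qualified name...
        XElement anchor = root.Element(xhtml + "a");

        // ...but the unprefixed attribute is looked up with no namespace.
        XAttribute href = anchor.Attribute("href");

        return new[] { anchor.Name.NamespaceName, href.Name.NamespaceName };
    }
}
```

Passing either <html xmlns="http://www.w3.org/1999/xhtml"><a href="http://example.com/" /></html> or the prefixed form <h:html xmlns:h="http://www.w3.org/1999/xhtml"><h:a href="http://example.com/" /></h:html> gives the same answer: the element is in "http://www.w3.org/1999/xhtml" and the attribute's namespace is the empty string.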

Understanding this behavior should save you from a lot of frustration!

Happy Coding!

Posted by wifry | 0 Comments