Peter has been giving me a hard time about getting the WordML transform integrated into WordBlogX for a while now. Well, right now I'm writing this blog entry with Microsoft Office Word 2003 under the debugger with a debug build of the WordBlogX assembly. So far so good.

 

There were two bugs that were driving me crazy.  First, if you were watching Mike Howard's blog closely when he first started blogging you would have noticed some funky looking "A" characters spread about a couple of his entries (to be truthful, my blog entries sometimes had the same problem).  Well, I tracked those annoying "A" characters down to Word changing the first of the two spaces after a period into a non-breaking space, the character with a Unicode value of 160 (in CLR: \u00A0).  I don't think this would be a problem if http://blogs.gotdotnet.com put the UTF-8 markers at the top of our web pages (we don't control that part of the blog entries, BlogX does).  Unfortunately, it doesn't.  So I added the following code to WordBlogX to convert all the text here to UTF-7:

 

    // convert the entry into UTF7 (since BlogX seems to prefer that)
    byte [] entryBytes = Encoding.Unicode.GetBytes(entry); // get the entry as a bunch of unicode bytes
    byte [] entryBytesUtf7 = Encoding.Convert(Encoding.Unicode, Encoding.UTF7, entryBytes);
    // UTF-7 output is plain ASCII, so read the bytes back as ASCII to keep the encoded form;
    // decoding them with Encoding.UTF7 would just round-trip to the original string
    string entryUtf7 = Encoding.ASCII.GetString(entryBytesUtf7); // get the bytes as a utf7 string
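
For anyone curious how a single character 160 can turn into a visible letter, here is a minimal sketch of one plausible mechanism.  It assumes the page bytes were UTF-8 and the browser, lacking a charset declaration, read them as ISO-8859-1; I haven't confirmed that is exactly what the browsers did with our pages.  The class and method names are made up for this illustration and are not part of WordBlogX.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only (not WordBlogX code).
public static class NbspDemo
{
    // U+00A0 (decimal 160) is the non-breaking space.
    public const char NoBreakSpace = '\u00A0';

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1.
    // Assumption: this mimics a browser guessing Latin-1 for an undeclared UTF-8 page.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);   // U+00A0 becomes the two bytes C2 A0
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }
}
```

Under that assumption, each non-breaking space comes back as two characters: an accented capital A (U+00C2, from the byte C2) followed by the non-breaking space itself (from the byte A0), which would explain the stray letter sitting right after a period.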

 

Now, if I did my job correctly, that text came out coloured (ahh, the dreaded "u" still exists in WordBlogX v2!) and indented and everything.  Honestly, getting that text indented turned out to be the most painful part of this whole feature.  This is the slightly hacked code that I'm currently using to get white-space preserved:

 

    // convert all double spaces into a space + non-breaking space to preserve "indentedness" (like for code)
    StringBuilder entryWithSpaces = new StringBuilder();
    int iEntryStart = 0;
    int iEntryEnd = 0;
    // look for the spans so the double spaces in their bodies can be replaced
    // (regex and mc are declared earlier in the method)
    regex = new Regex(@"<span.*?>(.*)</span>");
    mc = regex.Matches(entry);
    if (0 < mc.Count)
    {
        foreach (Match m in mc)
        {
            // copy everything between the previous span body and this span untouched
            iEntryEnd = m.Groups[0].Index;
            if (iEntryStart < iEntryEnd)
            {
                entryWithSpaces.Append(entry.Substring(iEntryStart, iEntryEnd - iEntryStart));
            }

            // copy the opening <span ...> tag, then the body with its double spaces replaced
            entryWithSpaces.Append(m.Groups[0].Value, 0, m.Groups[1].Index - m.Groups[0].Index);
            entryWithSpaces.Append(m.Groups[1].Value.Replace("  ", " &nbsp;"));

            iEntryStart = m.Groups[1].Index + m.Groups[1].Length; // move beyond all the matched stuff
        }

        // if anything is left over (at least the last </span>), tack it on to the end
        if (iEntryStart < entry.Length)
        {
            entryWithSpaces.Append(entry.Substring(iEntryStart));
        }

        // get the entry back as a string
        entry = entryWithSpaces.ToString();
    }
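
The same idea can probably be written more compactly with Regex.Replace and a match evaluator, which takes care of copying the text between matches.  This is only a sketch and not what WordBlogX currently does: the class and method names are hypothetical, and it assumes the spans are not nested and each one sits on a single line.  Note that it uses a lazy body match, so two spans on one line are treated separately, which differs from the greedy pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only (not WordBlogX code).
public static class SpanWhitespace
{
    // Group 1: opening tag, group 2: body, group 3: closing tag.
    // Assumption: spans are not nested and do not contain line breaks.
    private static readonly Regex SpanBody =
        new Regex(@"(<span[^>]*>)(.*?)(</span>)");

    // Replace each double space inside a span body with space + &nbsp;
    // and leave everything outside the spans untouched.
    public static string PreserveDoubleSpaces(string entry)
    {
        return SpanBody.Replace(entry, m =>
            m.Groups[1].Value
            + m.Groups[2].Value.Replace("  ", " &nbsp;")
            + m.Groups[3].Value);
    }
}
```

Calling SpanWhitespace.PreserveDoubleSpaces(entry) would stand in for the whole loop above.  Whether the lazy match is the right choice depends on the markup the WordML transform actually emits, which I have not checked here.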

 

I'm not proud of that code, but it seems to be getting the job done for now.  Assuming everything posts correctly, finally getting it working tonight will be totally worth it.

 

I still haven't written the deferred CustomAction to take care of the path manipulations necessary for WordBlogX but that will be cake compared to the hoops I've been trying to jump through getting WordBlogX to look good.

 

Well, that's enough for tonight.  I have a lot I'd like to blog about since I've kinda' been distracted lately.  More later...