Peter Taylor's WebLog

  • Ororo Monroe

    I know I just posted my latest blog entry, but further to my deep-seated love of the X-Men, I just wanted to pass along this link. While I'm sure Li Ruqing is a very nice man, I regret to inform him that he's no STORM (a.k.a. Ororo Monroe).
  • Uncanny Dynamic Portals

    Not many people know this, but I am a published comic book author. In 1997, I co-wrote a series for Caliber Comics titled “Technopolis” with my good friend Adrie Van Viersen, who is – to say the least – an exceptional artist. Adrie has since gone on to work on some movies you’ve seen, for example, a little movie called “X-Men 2”.

     

    I’m obsessed with the X-Men. Not in an unhealthy-I’ve-got-Wolverine-tattooed-on-my-neck sort of way, but back when I was an impressionable young lad in the late 70’s I’m proud to say that I purchased, read, re-read, lost, found, re-read, and then lost again some comic books that would be very expensive today. I’m not bitter though. Let’s just say that when I’m browsing certain sections on eBay, my intestines spontaneously arrange themselves into a monkey knot, and leave it at that.

     

    Where am I going with this? In the X-Men comics, the X-Men travel through space and time by means of a mutant named Gateway. Gateway lives in the Australian desert and he generates a wormhole portal for the X-Men by swinging his bullroarer to rip apart the very threads that make up the fabric of the universe as we know it. Therefore, portals are in X-Men comics so they MUST be cool. Right?

     

    If you’ve visited a major web site in the recent past, you’ve seen what are commonly referred to as portals. I won’t dwell on the subject of what a portal is, but go to Amazon.com or Yahoo.com and take a look. The basic idea is the “portal” page contains links to pages with related content – sort of like a table of contents.

     

    We construct portals in the Longhorn SDK to organize the massive amounts of information stored within. There are two types of portal pages: authored portal pages and dynamically generated portal pages. The authored pages are created by, you guessed it, authors – here’s an example of one. The dynamically generated pages are conjured up during our documentation build with no authoring interaction.

     

    The process of creating the dynamic portal pages is interesting, and I’ll go into that later. For now, I’ll go over how the authored portal pages work. Let’s assume we’re building a website to showcase our favorite breakfast foods, and it consists of five xml pages (assume these pages will be transformed to HTML using XSLT).

     

    <topic id="capncrunch">
      <title>Cap’n Crunch</title>
      <abstract>The Cap’n is the king of breakfast cereals. It’s crunchy and sweet. The best of both worlds.</abstract>
      <metadata>
        <data name="texture" value="crunchy"/>
        <data name="sweetness" value="high"/>
      </metadata>
      <content><!-- witty stuff goes here --></content>
    </topic>

    <topic id="grapenuts">
      <title>Grape Nuts</title>
      <abstract>It’s crunchy and tasty, and best of all – it’s good for you!</abstract>
      <metadata>
        <data name="texture" value="crunchy"/>
        <data name="sweetness" value="low"/>
        <data name="healthy" value="yes"/>
      </metadata>
      <content><!-- witty stuff goes here --></content>
    </topic>

    <topic id="oatmeal">
      <title>Oatmeal</title>
      <abstract>It’s like your mouth is on vacation – in Scotland!</abstract>
      <metadata>
        <data name="texture" value="soft"/>
        <data name="sweetness" value="low"/>
        <data name="healthy" value="yes"/>
      </metadata>
      <content><!-- witty stuff goes here --></content>
    </topic>

    <topic id="poptarts">
      <title>Pop Tarts</title>
      <abstract>It’s a cornucopia of tasty fillings surrounded by wonderful sweet pastry. It’s nature’s perfect food.</abstract>
      <metadata>
        <data name="texture" value="soft"/>
        <data name="sweetness" value="high"/>
      </metadata>
      <content><!-- witty stuff goes here --></content>
    </topic>

    <topic id="granola">
      <title>Granola</title>
      <abstract>A synonym for ‘hippie’ no more! High in fiber, crunchy, and sweet! It’s good for you, in theory at least.</abstract>
      <metadata>
        <data name="texture" value="crunchy"/>
        <data name="sweetness" value="high"/>
        <data name="healthy" value="yes"/>
      </metadata>
      <content><!-- witty stuff goes here --></content>
    </topic>

     

    Now let’s move on to creating the main entry page for our portal. This will be the first page visitors to our web site will see.

     

    <topic id="main">
      <title>Peter Taylor’s Cavalcade of Breakfast Treats</title>
      <metadata/>
      <abstract>Peter Taylor eats breakfast every day, so he should know what’s good.</abstract>
      <content>
        <p>Like breakfast? Me too!</p>
        <group>
          <p>Want something crunchy? How about these?</p>
          <query reference_id="crunchy"/>
        </group>
        <group>
          <p>You like the sweet stuff, don’t you? Here it is.</p>
          <query reference_id="sweet"/>
        </group>
        <group>
          <p>I want to be healthy, but I have no teeth.</p>
          <query reference_id="healthy_and_soft"/>
        </group>
      </content>
    </topic>

     

    In our “build tool”, let’s assume we have a database that contains all of our documents in such a way that we can run queries against them. I’ve created my very own query language below, but in our tools we actually use XPath expressions. To write an XPath, though, you need to know something about the structure of the xml you’re querying, so let’s pretend that my query language is the BEST query language EVER and you don't actually need to know anything about the structure of the database. Let’s also assume that during our transform process our parser knows what to do when it encounters the <query> tag: go off, perform the query named by its reference_id, return the results, and copy some data. And hey! Why not use XML to store our queries?

     

    <queries>
      <query id="crunchy">
        <select>name='texture' value='crunchy'</select>
        <copy>/abstract</copy>
      </query>

      <query id="sweet">
        <select>name='sweetness' value='high'</select>
        <copy>/abstract</copy>
      </query>

      <query id="healthy_and_soft">
        <select>(name='texture' value='soft') AND (name='healthy' value='yes')</select>
        <copy>/abstract</copy>
      </query>
    </queries>

     

    Note that there are two parts to the query. First is the query itself, and it specifies the criteria for selecting a topic. Second is an operation, and for the sake of clarity I’m just throwing in a tag called <copy>. In reality the system is more robust and configurable than this, but I can talk more about that later. For now, all we need to care about is the fact that two things happen – the selecting of a topic based on its metadata and an operation to copy over information relevant for the portal page.
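
    For the curious, here is a minimal sketch of those two steps in Python rather than in our actual tools. Everything in it is an assumption made for illustration: the <topics> wrapper element, the run_query function, and the idea of passing the selection criteria as a dictionary instead of parsing my made-up query language.

```python
import xml.etree.ElementTree as ET

# Three of the breakfast topics, trimmed to the parts a query needs.
# The <topics> wrapper element is an assumption made for this sketch.
TOPICS_XML = """
<topics>
  <topic id="capncrunch">
    <abstract>The Cap'n is the king of breakfast cereals.</abstract>
    <metadata>
      <data name="texture" value="crunchy"/>
      <data name="sweetness" value="high"/>
    </metadata>
  </topic>
  <topic id="oatmeal">
    <abstract>It's like your mouth is on vacation - in Scotland!</abstract>
    <metadata>
      <data name="texture" value="soft"/>
      <data name="sweetness" value="low"/>
      <data name="healthy" value="yes"/>
    </metadata>
  </topic>
  <topic id="poptarts">
    <abstract>It's nature's perfect food.</abstract>
    <metadata>
      <data name="texture" value="soft"/>
      <data name="sweetness" value="high"/>
    </metadata>
  </topic>
</topics>
"""


def run_query(root, criteria, copy="abstract"):
    """Select topics whose metadata matches every criterion, then copy one field."""
    results = []
    for topic in root.iter("topic"):
        # Flatten <data name="..." value="..."/> into a plain dictionary.
        metadata = {d.get("name"): d.get("value") for d in topic.iter("data")}
        if all(metadata.get(name) == value for name, value in criteria.items()):
            # The "operation" half of the query: copy the requested element's text.
            results.append((topic.get("id"), topic.findtext(copy)))
    return results


root = ET.fromstring(TOPICS_XML)
healthy_and_soft = run_query(root, {"texture": "soft", "healthy": "yes"})
sweet = run_query(root, {"sweetness": "high"})
```

    Each result pairs a topic id with its copied abstract, which is all the portal page needs in order to build a link and a blurb.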

     

    If we could grab a snapshot of this document as it was being parsed by the build process, we’d see it looked something like this:

     

    <topic id="main">
      <title>Peter Taylor’s Cavalcade of Breakfast Treats</title>
      <metadata/>
      <abstract>Peter Taylor eats breakfast every day, so he should know what’s good.</abstract>
      <content>
        <p>Like breakfast? Me too!</p>
        <group>
          <p>Want something crunchy? How about these?</p>
          <crosslink ref_id="capncrunch"/>
          <crosslink ref_id="grapenuts"/>
          <crosslink ref_id="granola"/>
        </group>
        <group>
          <p>You like the sweet stuff, don’t you? Here it is.</p>
          <crosslink ref_id="capncrunch"/>
          <crosslink ref_id="poptarts"/>
          <crosslink ref_id="granola"/>
        </group>
        <group>
          <p>I want to be healthy, but I have no teeth.</p>
          <crosslink ref_id="oatmeal"/>
        </group>
      </content>
    </topic>

     

    The next step is to actually look up the targets referenced in the crosslink tag in the database. In the queries it was specified that the abstracts should also be copied over into the final document. Add a dash of fancy presentation logic in the xslt and voila! The result is a page that looks something like this:
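
    If you want to picture that lookup step without any XSLT, here is a rough Python sketch. The DATABASE dictionary, the .htm paths in it, and the expand_crosslinks function are all hypothetical stand-ins; our real tools look targets up in the document database and leave the presentation to the transform.

```python
from html import escape
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document database: id -> (title, abstract, href).
DATABASE = {
    "oatmeal": ("Oatmeal", "It's like your mouth is on vacation - in Scotland!", "oatmeal.htm"),
    "poptarts": ("Pop Tarts", "It's nature's perfect food.", "poptarts.htm"),
}


def expand_crosslinks(group_xml):
    """Turn each <crosslink ref_id="..."/> in a group into an HTML list item."""
    group = ET.fromstring(group_xml)
    items = []
    for link in group.iter("crosslink"):
        title, abstract, href = DATABASE[link.get("ref_id")]
        items.append(
            '<li><a href="%s">%s</a>: %s</li>'
            % (escape(href), escape(title, quote=False), escape(abstract, quote=False))
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


html = expand_crosslinks('<group><crosslink ref_id="oatmeal"/></group>')
```

    An unknown ref_id raises a KeyError here, which is about the bluntest possible way of telling an author that a link is broken.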

     

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

    <html>
    <head>
      <title>Peter Taylor's Cavalcade of Breakfast Treats</title>
    </head>

    <body>
    <p>Like breakfast? Me too!</p>

    <p>Want something crunchy? How about these?</p>
    <ul>
      <li><A href="...link to capncrunch">Cap'n Crunch</A>: The Cap'n is the king of breakfast cereals. It's crunchy and sweet. The best of both worlds.</li>
      <li><A href="...link to grapenuts">Grape Nuts</A>: It's crunchy and tasty, and best of all - it's good for you!</li>
      <li><A href="...link to granola">Granola</A>: A synonym for 'hippie' no more! High in fiber, crunchy, and sweet! It's good for you, in theory at least.</li>
    </ul>

    <p>You like the sweet stuff, don't you? Here it is.</p>
    <ul>
      <li><A href="...link to capncrunch">Cap'n Crunch</A>: The Cap'n is the king of breakfast cereals. It's crunchy and sweet. The best of both worlds.</li>
      <li><A href="...link to poptarts">Pop Tarts</A>: It's a cornucopia of tasty fillings surrounded by wonderful sweet pastry. It's nature's perfect food.</li>
      <li><A href="...link to granola">Granola</A>: A synonym for 'hippie' no more! High in fiber, crunchy, and sweet! It's good for you, in theory at least.</li>
    </ul>

    <p>I want to be healthy, but I have no teeth.</p>
    <ul>
      <li><A href="...link to oatmeal">Oatmeal</A>: It's like your mouth is on vacation - in Scotland!</li>
    </ul>
    </body>
    </html>

     

    So, no, building portal pages is not as complicated as building, say, CEREBRO. Only the mutant with the most powerful mind in the WHOLE WORLD is capable of that. However, hopefully you can see the utility of this. The actual content of the portal page has to be authored but the build tool worries about collecting up the links and inserting updated abstracts. Thus, all the authors need to concern themselves with is authoring their documents because they know the build process will pick up their changes and make sure they are represented on the main portal page.

     

    This posting is provided "AS IS" with no warranties, and confers no rights.  

  • Separated at birth

    As per Vanya's request, hockey content! Today's entry - “Tampa Bay Lightning's Separated at Birth”.

    I couldn't find a picture of Nikolai Khabibulin in his Tampa Bay Lightning uniform that was suitable. My apologies.

  • Dereferencing a dynamic crosslink in an xml-based markup language

    So, let's say you have created your own xml-based markup language. You have designed the schema, you've got some xsd's to describe it, some xslt to transform it to html so your readers can view it in a browser. Just for fun, let's say your directory structure looks something like this:

    /root
     /a.xml
     /b.xml
     /c.xml

    In a.xml, you want to create an active link to b.xml with your <crosslink> tag. No problem, right? You just need to hardcode that path into the crosslink tag.

    <crosslink target="b.xml"/>

    It's easy to transform that.

    <xsl:template match="crosslink">
     <A>
      <xsl:attribute name="href"><xsl:value-of select="@target"/></xsl:attribute>
      <!-- to make it easy, throw the name of the link in -->
      <xsl:value-of select="@target"/>
     </A>
    </xsl:template>

    Hang on. Just for kicks let's say suddenly we need to move b.xml to a different directory.

    /root
     /a.xml
     /all_files_starting_with_b
      /b.xml
     /c.xml

    What now? Okay, we need to go into a.xml and change the target attribute in our <crosslink> tag. Easy. But what happens when our documentation set becomes very large and there are tens or hundreds of <crosslink> tags with a target attribute pointing to b.xml? It suddenly becomes a very large universal search and replace job.

    Wouldn't it be easier to create a lookup table, give b.xml an ID, and adjust the path in one place instead of many?

    <root>
     <document id="the_b_document">
      <path>/root/all_files_starting_with_b/b.xml</path>
     </document>
    </root>

    Now the <crosslink> tag looks something more like this: <crosslink ref_id="the_b_document"/>. The xslt to transform it looks much the same, but with one big difference.

    <xsl:template match="crosslink">
     <A>
      <xsl:attribute name="href">
      <!-- grab the @ref_id, pass it to lookup code and put path returned here -->
      </xsl:attribute>
     </A>
    </xsl:template>

    Yes, some magic to get access to the lookup table needs to happen. It can be done in xslt if you write some inline jscript functions to load up the xml file and perform lookups. If you've got an xml build system as I mentioned earlier, you can write the code to perform the lookups there.
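
    To make the build-system flavor of that lookup concrete, here is a small Python sketch. It is only an illustration under my own assumptions: load_lookup and dereference are made-up names, and the <page> wrapper and the "typo" id exist only for the demo.

```python
import xml.etree.ElementTree as ET

# The lookup table from above.
LOOKUP_XML = """
<root>
  <document id="the_b_document">
    <path>/root/all_files_starting_with_b/b.xml</path>
  </document>
</root>
"""


def load_lookup(xml_text):
    """Build an id -> path dictionary from the lookup table."""
    return {
        doc.get("id"): doc.findtext("path").strip()
        for doc in ET.fromstring(xml_text).iter("document")
    }


def dereference(source_xml, table):
    """Rewrite resolvable <crosslink> tags as <a href="..."> and report the rest."""
    root = ET.fromstring(source_xml)
    unresolved = []
    for link in list(root.iter("crosslink")):
        ref_id = link.get("ref_id")
        path = table.get(ref_id)
        if path is None:
            unresolved.append(ref_id)  # leave the tag alone so the author can find it
            continue
        del link.attrib["ref_id"]
        link.tag = "a"
        link.set("href", path)
        link.text = path  # as in the first transform, reuse the target as the link text
    return ET.tostring(root, encoding="unicode"), unresolved


table = load_lookup(LOOKUP_XML)
html, unresolved = dereference(
    '<page><crosslink ref_id="the_b_document"/><crosslink ref_id="typo"/></page>',
    table,
)
```

    Moving b.xml again now means editing one <path> element, and the unresolved list gives authors a way to catch mistyped IDs.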

    However, there are a few gotchas here. If you go the xslt/jscript route, the lookup table will need to be loaded every single time you transform the document. If you've got users and they want to be able to run the transform regularly to preview what the final output will look like, they won't be happy when the lookup table gets very large and takes several minutes to spool off the hard drive and into memory each time. Here you could write a nice ActiveX component that caches the lookup table for them, but then they'll have to install and configure it correctly and they'll never be quite sure which version of the lookup table is loaded unless you provide them with a user interface to inspect it.
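
    Here is one way the caching idea might look, sketched in Python instead of ActiveX. It assumes a long-running preview process that can hold the table in memory, and the cached_lookup name and the module-level _cache dictionary are mine. The table is reparsed only when the file's modification time changes, and that timestamp is returned with it so a user interface could show which version is loaded.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

_cache = {}  # file path -> (modification time, id -> path table)


def cached_lookup(path):
    """Return (mtime, table), reparsing the file only when it has changed on disk."""
    mtime = os.path.getmtime(path)
    entry = _cache.get(path)
    if entry is None or entry[0] != mtime:
        documents = ET.parse(path).getroot().iter("document")
        table = {d.get("id"): d.findtext("path") for d in documents}
        entry = (mtime, table)
        _cache[path] = entry
    return entry


# Demo: write a tiny lookup table to a temporary file and load it twice.
with tempfile.TemporaryDirectory() as tmp:
    lookup_path = os.path.join(tmp, "lookup.xml")
    with open(lookup_path, "w", encoding="utf-8") as f:
        f.write(
            '<root><document id="the_b_document">'
            "<path>/root/all_files_starting_with_b/b.xml</path>"
            "</document></root>"
        )
    loaded_at, first = cached_lookup(lookup_path)
    _, second = cached_lookup(lookup_path)
    reused = first is second  # the second call is served from memory
```

    This does nothing for the case where every transform starts a fresh process, which is exactly the xslt/jscript situation described above.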

    If you've got a build system, they'll have to run that build system to see what the final output will look like. You could have a simple preview xslt that doesn't try to perform lookups, but then your users won't be able to check that they've entered ID's correctly and that they resolve to the intended target.

    Anyone else have any other approaches?


    This posting is provided "AS IS" with no warranties, and confers no rights.

     

  • Anyone else using xml as a markup language?

    I'd like to do an informal survey - how many of you out there are using xml as markup? There is a great deal of discussion of using xml to represent data but very little discussion of using xml as markup. I guess that's because HTML and XHTML exist already.

    So, anyone else out there design an xml schema to be used as markup?

  • Back to the Schema Part I

    Today I'd like to talk a bit more about the authoring xml schema that is used to create documentation for the Longhorn SDK. In reality there are several xml schemas that are used to author content for the Longhorn SDK but there is only one 'native' xml schema that the Red October system can 'build'. Red October itself is schema agnostic in that it provides an 'interface' to the build system for any xml schema, but at this point we have only provided a build implementation for one schema. The other schemas I mentioned are boiled down to Red October's schema via a few transform pre-processes.

    I talked a bit in an earlier post about authoring content in raw xml versus a wysiwyg forms-based editor but that subject requires a whole other discussion. In the Red October schema, authors and editors work directly in the raw xml.

    You can think of the xml documentation set as a structured database stored in a distributed manner in the file system. The schema is not normalized, in that we have a different 'schema' for each reference page type. That isn't absolutely correct, however, because every reference page type shares the same core schema elements. The only difference between the individual reference page types is in the xml that represents the syntax - the API's signature. This isn't *entirely* true, but for the purposes of this discussion, IT IS.

    A 'page' is broken down into three major sections: metadata, content, and production information. The metadata section contains information about the page and the API's signature. The content section contains, you guessed it, authored content. The production information section contains more metadata: the document owner, the editor, and the status (ready for publishing, not ready for publishing, etc.).

    Now, I'm certain there are those out there who will want to know why we don't keep all the document metadata and production data in SQL. We do. Sort of. On the metadata side, our goal was to make the task of writing queries against the database easy. We collect up all the individual xml documents into a gigantic single DOM so we can write xpaths against it. On the production data side, the data gets scrubbed and thrown into a set of query tools. As far as I can tell (and any writers/editors using the system can correct me if I'm wrong) they find it easier to work with the data directly in the document itself since they are working in the raw xml.
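
    As a toy version of the "gigantic single DOM" idea, here is a Python sketch. The <documents> wrapper, the <page> element, and the status metadata are hypothetical (the real schema is richer), and Python's ElementTree only supports a subset of XPath, so treat this as the shape of the approach rather than our implementation.

```python
import xml.etree.ElementTree as ET


def collect(documents):
    """Append every individual document under one root so a single query sees them all."""
    root = ET.Element("documents")  # hypothetical wrapper element
    for xml_text in documents:
        root.append(ET.fromstring(xml_text))
    return root


# Two hypothetical pages, each normally stored as its own file.
pages = [
    '<page id="a"><metadata><data name="status" value="ready"/></metadata></page>',
    '<page id="b"><metadata><data name="status" value="draft"/></metadata></page>',
]

dom = collect(pages)

# One path expression, run against every document at once.
ready = [
    page.get("id")
    for page in dom.findall("page")
    if page.find("metadata/data[@name='status'][@value='ready']") is not None
]
```

    The payoff is that a query author writes one expression against one tree instead of opening each file in turn.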

  • An XML "build" system.

    I've received some mail requesting more information on what exactly comprises an 'xml build system'. Some have asked why we don't just deliver xml and have xslt transform the document on the server before it's sent down to the browser. Thus, I'm going to backtrack even more and explain a little more about the requirements of our xml publishing tools.

    My team generates electronic documentation which is published to multiple target platforms: MS Help 1.0 and 2.0, the MSDN web site, even MS Press. One of the requirements of our build system is that it be able to create files of various types - asp, asp.net, regular old html, others - from authored xml source.

    The process looks like this:

    authored xml file -> build application -> asp, htm, aspx, xml.

    At its most basic, the 'build application' mentioned above does a couple of things:

    1. Loads the authored xml file
    2. Applies xslt to the xml file and caches the results
    3. Saves the cached results of the xslt transform to a file
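
    The three steps above can be sketched in a few lines of Python. This is not our build application: Python's standard library has no XSLT processor, so the to_htm and to_aspx functions below are hypothetical stand-ins for real transforms, and the build function and file names are made up for the example.

```python
from html import escape
from pathlib import Path
import tempfile


def build(source_path, transforms, out_dir):
    """Load one authored XML file, run each transform, and save one file per target."""
    source = Path(source_path)
    xml_text = source.read_text(encoding="utf-8")                     # 1. load
    cached = {ext: fn(xml_text) for ext, fn in transforms.items()}    # 2. transform and cache
    written = []
    for ext, output in cached.items():                                # 3. save
        target = Path(out_dir) / (source.stem + "." + ext)
        target.write_text(output, encoding="utf-8")
        written.append(target.name)
    return written


def to_htm(xml_text):
    """Stand-in for an XSLT transform that targets plain HTML."""
    return "<html><body><pre>%s</pre></body></html>" % escape(xml_text)


def to_aspx(xml_text):
    """Stand-in for an XSLT transform that targets asp.net."""
    return '<%@ Page Language="C#" %>\n' + to_htm(xml_text)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "capncrunch.xml"
    src.write_text('<topic id="capncrunch"/>', encoding="utf-8")
    out_dir = Path(tmp) / "out"
    out_dir.mkdir()
    outputs = build(src, {"htm": to_htm, "aspx": to_aspx}, out_dir)
```

    One authored source file goes in and one output file per target comes out, which is the whole point of building instead of transforming on the server.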

    By the way, I'd like to give a shout out to my biggest fan, Josh.

  • Adventures in schema design

    One of the practical considerations in designing an xml-based documentation system is creating an appropriate schema.

    I'm going to digress here a little, so please bear with me. In theory at least, the users of the system (programmer/writers, editors, other contributors) should never really have to deal with the raw xml - there should be some nice wysiwyg authoring tool they can use to edit documents. There are plenty of off the shelf xml editing tools out there that can be employed by users of the system, but none of them are truly wysiwyg. They can approximate what the output will look like, but the only way for someone using a very complex and data-driven transform system such as ours to accurately 'preview' what a document will look like when built is to actually pump that xml document through the transform tool. Again, it's not really practical for individual users to set up an extremely proprietary and fairly complex build system on their desktops just to see what a document will look like. There is also the problem of staying “in sync” with the binaries and XSLT.

    My team solved this problem by hosting the Red October system behind an asp.net web service. Each of the xml documents contains a processing instruction that points to an xslt file kept on the user's local machine. The xslt is very simple - it uses jscript to instantiate the xmlhttp object in msxml4 and passes the xml in the document as a string to the web service using HTTP POST. The response string is what's handed back to the parser and displayed.
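
    The client half of that round trip is small enough to sketch in Python instead of jscript. The PREVIEW_URL endpoint, the function names, and the text/xml content type are all assumptions for illustration; I'm not describing the real service's address or contract here.

```python
import urllib.request

# Hypothetical endpoint; the real preview service's address is not shown here.
PREVIEW_URL = "http://localhost/preview"


def build_preview_request(xml_text, url=PREVIEW_URL):
    """Package the document's XML as the body of an HTTP POST."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def preview(xml_text, url=PREVIEW_URL):
    """Send the XML to the service and return the response body as a string."""
    with urllib.request.urlopen(build_preview_request(xml_text, url)) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; calling preview() would.
request = build_preview_request('<topic id="capncrunch"/>')
```

    Keeping the client this thin means the binaries and XSLT only have to stay in sync in one place: the server.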

    While the programmer/writers that use the system might disagree, I think this is a good way to do it. The problem the users of the 'preview web service' have with it is that I haven't been able to spend as much time as is necessary to maintain it properly so sometimes it goes down unexpectedly or potentially has BUGS in it. Shock, I know. I decided a while ago that the best way for me to spend my time efficiently on it was for it to only run 'release' versions of the Red October bits. It really is a novel use of the .Net Frameworks and asp.net and speaks volumes of the power of said technologies.

    I still haven't talked about the actual xml schema yet, have I?

  • The Longhorn SDK

    The easiest way to have a look at the Longhorn SDK is to check it out here on MSDN. The entire site you see there is built using the tools developed internally by the Longhorn SDK team.

    The 'big picture' process is pretty simple - a tool called “Olympia” uses .Net reflection to query the metadata in all the Longhorn assemblies then generates a set of individual XML documents representing the entire Longhorn API set. These documents are skeleton XML documents essentially devoid of authored content. At this point, each document contains some metadata about the API syntax and the document itself, a <content> section where authored content can be added by a programmer/writer, and a section that contains data about the status of the document used by writing managers to track the progress of the documentation set. At this point the status of the document is 'created by Olympia'. Olympia does contain code to integrate content from external XML sources such as the developer XML comment files generated by Visual Studio using the /doc compiler switch, but I'll discuss more about this later.

    So, how does a programmer/writer add content to these newly generated XML documents and have that content persist from day to day? We maintain a source control server that contains a static set of XML documents. Every day after Olympia runs, a process called “OlympiaDiff” follows behind - this process examines each document in source control and compares it on a signature-by-signature basis with the newly generated document produced by Olympia from the latest Longhorn build. OlympiaDiff queries the relevant sections in each document to determine if there's been a change in the API in Longhorn. If it discovers a difference, a log entry is generated. Each day the programmer/writers check out the OlympiaDiff log to determine if there's been a change in the API's they maintain, and there's some automation to hopefully make it easier for them to move their authored content to the correct document. The assumption here is that the authored XML content in source control is always the most accurate representation of the Longhorn API.
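
    To give a feel for the signature-by-signature comparison, here is a stripped-down Python sketch. It is not OlympiaDiff: the dictionaries of document id to signature, the log line format, and the Cereal.* signatures are all invented for the example.

```python
def diff_signatures(source_control, generated):
    """Compare signatures by document id and return one log line per difference."""
    log = []
    for doc_id in sorted(set(source_control) | set(generated)):
        old = source_control.get(doc_id)
        new = generated.get(doc_id)
        if old is None:
            log.append("ADDED   %s: %s" % (doc_id, new))
        elif new is None:
            log.append("REMOVED %s" % doc_id)
        elif old != new:
            log.append("CHANGED %s: %s -> %s" % (doc_id, old, new))
    return log


# Hypothetical signatures: what is checked in versus what today's build produced.
in_source_control = {
    "pour": "void Cereal.Pour(int grams)",
    "crunch": "bool Cereal.Crunch()",
}
from_latest_build = {
    "pour": "void Cereal.Pour(int grams, bool withMilk)",
    "soak": "void Cereal.Soak()",
}

log = diff_signatures(in_source_control, from_latest_build)
```

    A writer scanning this log sees right away that one API changed shape, one disappeared, and one is new and needs a home for its content.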

    Each day the latest set of XML documents is copied down from source control, then the build and transform tool named “Red October” does some slicing and dicing on the XML data and performs an XSLT transform to build the asp.net documents you see on the Longhorn SDK site. There's a lot of data manipulation going on at build time - dereferencing links, building tables of information, building the dynamically generated content on the main portal pages, etc. etc.

    The process is obviously a little more complicated than this in practice, but that's pretty much it.

  • My first post

    Here's my first blog post of any sort - EVER. I have to admit that I'm not a blog aficionado, so if I should ever stray from blog protocol you'll know it's because I'm ignorant.

    So, my name is Peter Taylor and I work here at Microsoft on the Longhorn Software Development Kit team. I work on the tools that build all the documents that go into the Longhorn SDK. The tools are written using the various versions of the .Net Frameworks and rely heavily on reflection, xml/xslt, asp.net, and a little bit of ado.net. I'll explain more about the tools later.

    Since I'm in internal tools development my experience using these technologies is more congruent with that of a customer - I run into all the same problems everyone else runs into trying to build real applications in Windows using the .Net Frameworks. I decided that writing a blog would be a good way to share some of my experiences working with these technologies.


© 2009 Microsoft Corporation. All rights reserved.