Today I'd like to talk a bit more about the authoring XML schema used to create documentation for the Longhorn SDK. In reality, several XML schemas are used to author content for the Longhorn SDK, but there is only one 'native' XML schema that the Red October system can 'build'. Red October itself is schema-agnostic in that it provides an 'interface' to the build system for any XML schema, but at this point we have only provided a build implementation for one schema. The other schemas I mentioned are boiled down to Red October's schema via a few transform pre-processes.
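To picture what a "boil down" pre-process does, here is a minimal sketch in Python. The element names are invented for illustration (the post doesn't describe the foreign schemas), and a real pipeline would more likely use XSLT; the point is just that a foreign schema gets mechanically mapped onto the native one before the build sees it.

```python
# Sketch of a pre-process that "boils down" a foreign authoring
# schema into the native one. Element names are hypothetical; a
# production pipeline would more likely use XSLT.
import xml.etree.ElementTree as ET

# A page in a hypothetical non-native authoring schema.
foreign = ET.fromstring("<topic><title>CreateWindow</title></topic>")

# Map foreign element names onto their (hypothetical) native equivalents.
RENAME = {"topic": "page", "title": "pageTitle"}

def to_native(element):
    """Recursively rebuild the tree using the native element names."""
    native = ET.Element(RENAME.get(element.tag, element.tag))
    native.text = element.text
    for child in element:
        native.append(to_native(child))
    return native

page = to_native(foreign)
print(ET.tostring(page, encoding="unicode"))
```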

I talked a bit in an earlier post about authoring content in raw XML versus a WYSIWYG forms-based editor, but that subject requires a whole other discussion. In the Red October schema, authors and editors work directly in the raw XML.

You can think of the XML documentation set as a structured database stored in a distributed manner in the file system. The schema is not normalized in that we have a different 'schema' for each reference page type. That isn't absolutely correct, though, because the reference page types share common core schema elements; the only difference between the individual reference page types is in the XML that represents the syntax, the API's signature. This isn't *entirely* true, but for the purposes of this discussion, IT IS.

A 'page' is broken down into three major sections: metadata, content, and production information. The metadata section contains information about the page and the API's signature. The content section contains, you guessed it, authored content. The production information section contains more metadata about the document: owner, editor, status (ready for publishing, not ready for publishing, etc.).
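Put together, a page might look roughly like the sketch below. The element names here are invented for illustration and are not the actual Red October schema; the shape of the three sections is the point.

```xml
<!-- Hypothetical page skeleton; element names are invented for
     illustration and are not the actual Red October schema. -->
<page>
  <metadata>
    <pageType>method</pageType>
    <!-- The API's signature lives here; this is the one part
         that varies by reference page type. -->
    <syntax>HWND CreateWindow(...);</syntax>
  </metadata>
  <content>
    <remarks>Authored content goes here.</remarks>
  </content>
  <production>
    <owner>somewriter</owner>
    <editor>someeditor</editor>
    <status>not-ready</status>
  </production>
</page>
```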

Now, I'm certain there are those out there who will want to know why we don't keep all the document metadata and production data in SQL. We do. Sort of. On the metadata side, our goal was to make the task of writing queries against the database easy: we collect all the individual XML documents into a single gigantic DOM so we can write XPath queries against it. On the production data side, the data gets scrubbed and loaded into a set of query tools. As far as I can tell (and any writers/editors using the system can correct me if I'm wrong), writers find it easier to work with the data directly in the document itself, since they are already working in the raw XML.
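The "one gigantic DOM" trick can be sketched in a few lines of Python. The element names and the query are hypothetical stand-ins (and the stdlib's XPath support is deliberately limited), but the idea is the same: graft every page under a single synthetic root so one query spans the whole documentation set.

```python
# Sketch: merging per-page XML documents into one DOM and querying
# it with XPath. Element names are hypothetical, not the actual
# Red October schema.
import xml.etree.ElementTree as ET

# Two stand-in "pages", as they might sit in the file system.
pages = [
    "<page><metadata><type>method</type></metadata></page>",
    "<page><metadata><type>class</type></metadata></page>",
]

# Collect every page under a single synthetic root...
docset = ET.Element("docset")
for source in pages:
    docset.append(ET.fromstring(source))

# ...so one XPath query spans the whole documentation set.
methods = docset.findall(".//metadata[type='method']")
print(len(methods))  # number of method pages in the set
```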