Updated to clarify that the PDF version is provided by the publishers.
One interesting aspect of the National Library of Medicine's XML format is its use in PubMed Central (http://www.pubmedcentral.nih.gov/), the largest archive of biomedical articles. After articles are published in their respective journals (in print and online), they are submitted to PubMed Central for archiving, and as part of that process they are converted to the NLM XML format. The NLM format is light on presentation elements. It includes elements such as bold and italic, which affect how text is displayed, but it offers no control over things like background color or the border styles and colors of tables. It does, however, encompass a lot of metadata (more on that in a future post).
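To make the contrast between metadata and presentation a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical, heavily trimmed article that borrows element names from the NLM tag set (`article-meta`, `article-title`, `contrib`, `bold`, `italic`); the helpers `extract_metadata` and `presentation_tags` are illustrative names only, and the sketch does not validate against the actual DTD. Metadata such as the title and authors lives in its own structured elements, while the only presentation hints in the body are inline runs like bold and italic.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed article using NLM-style element names.
# Real PubMed Central files carry far more metadata than this.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Text with <bold>bold</bold> and <italic>italic</italic> runs.</p>
    </sec>
  </body>
</article>"""


def extract_metadata(xml_text):
    """Pull the title and author names out of the front matter."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    authors = [
        f"{c.findtext('name/given-names')} {c.findtext('name/surname')}"
        for c in meta.findall("contrib-group/contrib[@contrib-type='author']")
    ]
    return {"title": meta.findtext("title-group/article-title"), "authors": authors}


def presentation_tags(xml_text):
    """List the inline presentation elements (bold/italic) used in the body."""
    body = ET.fromstring(xml_text).find("body")
    return sorted({el.tag for el in body.iter() if el.tag in ("bold", "italic")})


print(extract_metadata(SAMPLE))   # {'title': 'An example article', 'authors': ['Jane Doe']}
print(presentation_tags(SAMPLE))  # ['bold', 'italic']
```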
The archival format focuses on the content itself rather than on how that content is presented. When someone reads an article on PubMed Central (for example, http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2253543&rendertype=abstract), the article is displayed either as HTML or PDF generated from the XML content, or as a PDF provided by the publisher. When HTML or PDF is generated from the XML, presentation attributes are applied to the content during the conversion. Storing content in XML therefore offers a lot of flexibility in how it is presented, while preserving the relevant information. I think this approach is likely to become more common in the future, across journals and other repositories, as more journals become electronic-only and their content is read on devices with different form factors.
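The conversion step can be sketched in the same way. This is a minimal illustration in Python, not how PubMed Central actually renders articles: it assumes a hypothetical mapping (`TAG_MAP`) from NLM-style body elements to HTML tags and a stylesheet supplied at render time. The point it illustrates is that presentation (which HTML tags, which CSS) is decided by the renderer, so the same stored XML can be rendered differently for different devices.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from NLM-style body elements to HTML tags.
# Elements not listed here are rendered as their contents only.
TAG_MAP = {"sec": "section", "title": "h2", "p": "p", "bold": "strong", "italic": "em"}

SAMPLE = (
    "<article><body><sec><title>Introduction</title>"
    "<p>Text with <bold>bold</bold> and <italic>italic</italic> runs.</p>"
    "</sec></body></article>"
)


def to_html(el):
    """Recursively render an element, its children, and their tail text as HTML."""
    parts = [html.escape(el.text or "")]
    for child in el:
        parts.append(to_html(child))
        parts.append(html.escape(child.tail or ""))  # text that follows the child
    inner = "".join(parts)
    tag = TAG_MAP.get(el.tag)
    return f"<{tag}>{inner}</{tag}>" if tag else inner


def render_page(xml_text, css):
    """Wrap the rendered body in a page whose styling is chosen by the caller."""
    body = ET.fromstring(xml_text).find("body")
    content = "".join(to_html(child) for child in body)
    return f"<html><head><style>{css}</style></head><body>{content}</body></html>"


# The same stored XML, rendered with two different (hypothetical) stylesheets.
desktop = render_page(SAMPLE, "body { font-size: 16px; }")
phone = render_page(SAMPLE, "body { font-size: 14px; margin: 0; }")
print(desktop)
```

Because the stored XML is never modified, supporting a new output format or device only means adding another renderer or stylesheet, not re-marking the content.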
The collaboration with the staff at the National Center for Biotechnology Information has been critical to the development of the add-in and to ensuring that the output of its final version will meet the needs of journals and conform to the requirements for archiving. Many thanks to them for their input and support.
“We are delighted that Microsoft has chosen to support the NLM DTD, building on Office 2007, and has produced this technology preview as a proof of concept. Having a tool that will automatically transfer an author’s work from Microsoft Word into the NLM DTD will benefit authors as well as publishers. We are eager to see Microsoft make the release version of this tool available to the community.”
- David Lipman, M.D., Director of the National Library of Medicine’s National Center for Biotechnology Information