Ontology Add-in for Word 2007 - Technology Preview

We just posted a Technology Preview build of an add-in that lets authors add semantic knowledge to documents by associating words in the document with ontology terms.  The add-in can be installed from CodePlex, and the source is available under the Ms-PL license.

Add-in Basics

The add-in, developed in collaboration with the University of California San Diego and Science Commons, serves as a solution accelerator for those working in the ontology field.  The add-in works in two ways:

- background scanning

- direct tagging

By default, the add-in scans the document in the background and uses SmartTags to suggest the terms it recognizes.  Through the SmartTag menu, authors can associate recognized words with the appropriate ontology terms.
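As a rough illustration of what background scanning involves, the sketch below matches a small list of term names against a piece of text and reports where they occur. It is not the add-in's implementation: the `TERMS` table, the `EX:0000001` identifier, and the function names are made up for the example, and a simple case-insensitive whole-word match stands in for whatever recognition logic the add-in actually uses.

```python
import re

# Hypothetical lookup table: term name -> ontology ID.
# "EX:0000001" is a made-up identifier used only for this example.
TERMS = {
    "protein tag": "GO:0031386",
    "example term": "EX:0000001",
}

def build_pattern(names):
    """Compile one regex that matches any of the given term names."""
    # Longest names first, so a multi-word term wins over a shorter one
    # that happens to be its prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def suggest_terms(text, terms=TERMS):
    """Return (start, end, matched_text, ontology_id) for each recognized term."""
    pattern = build_pattern(terms)
    lookup = {name.lower(): term_id for name, term_id in terms.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Each tuple carries the character offsets of the match, which is the kind of information a highlighting layer such as SmartTags would need in order to mark the word and offer a menu for it.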

SmartTag highlighting and menu

Ontology SmartTag menu

Authors can also tag words directly, by highlighting the words and selecting the ontology term that they want to associate with them.

A large number of ontologies in the OBO format are available online.  Through the configuration dialog, additional ontologies can be downloaded to the computer for use by the add-in.
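For readers who have not seen an OBO file, the sketch below reads the `[Term]` stanzas of a small OBO-style sample into a dictionary keyed by term ID. It is a minimal reader written for illustration, not the parser used by the add-in: it ignores escaping, trailing `!` comments on a line, and every stanza type other than `[Term]`. The sample content, including the `EX:0000001` term, is hypothetical.

```python
# Hypothetical OBO-style sample; only the stanza layout matters here.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0031386
name: protein tag

[Term]
id: EX:0000001
name: example term
synonym: "sample term" EXACT []

[Typedef]
id: part_of
name: part of
"""

def parse_obo_terms(text):
    """Return {term_id: {tag: [values, ...]}} for every [Term] stanza."""
    stanzas = []
    current = None  # dict for the stanza being read, or None outside a [Term]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or full-line comment
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header line, or a line inside a skipped stanza
        tag, _, value = line.partition(":")
        # Tags may repeat (for example several synonyms), so keep a list.
        current.setdefault(tag.strip(), []).append(value.strip())
    return {s["id"][0]: s for s in stanzas if "id" in s}
```

Calling `parse_obo_terms(SAMPLE_OBO)` yields one entry per term, so the names and synonyms can be turned into the word list that a scanner looks for.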

Target Audience

Looking at the developer stack from higher to lower levels of abstraction, the add-in will be useful in at least three key areas:

- Development of new ontologies

- Investigation of new author interaction paradigms

- Integration into publishing and semantic workflows

For those developing new ontologies, the add-in provides a very easy way to test those ontologies with their target audience.  In many scientific disciplines, Microsoft Word is a very popular tool for authoring papers and articles, so authors are already familiar with its usage and features.  The add-in builds on this familiarity to expose new functionality.  Additional ontologies can be downloaded through a REST interface.

For researchers who are focused on new ways of analyzing content and detecting terms automatically, or on extending the user interaction with authors, access to the source code provides a foundation to build on, without having to start from scratch.  Community members can also add incremental value to the add-in and share it back with others in the community.  For this purpose, CodePlex provides a good forum to host discussions, report bugs, and publish documents related to the project.

Developers who work in the publishing industry, or at libraries and repositories, can customize or extend the add-in to present a user interface specific to their organization, or to add information to the XML content when terms are recognized and tagged by the add-in.  Enhancing the information in the XML tag that captures the semantic information would also be useful to those doing semantic analysis or search, or storing information in databases.

Tagging

The add-in relies on custom XML tags to associate the semantic information with the matched words.  The semantic information is stored as part of the document content.  Utilities and applications that read or process docx files can retrieve and use this information (or transform it to other formats).

<w:customXml w:uri="http://biolit.ucsd.edu/biolitschema" w:element="biolit-term">
  <w:customXmlPr>
    <w:attr w:name="id" w:val="GO:0031386" />
    <w:attr w:name="type" w:val="Biological process" />
    <w:attr w:name="status" w:val="true" />
    <w:attr w:name="OntName" w:val="Biological process" />
    <w:attr w:name="url" w:val="http://purl.org/obo/owl/GO#GO_0031386" />
  </w:customXmlPr>
  <w:smartTag w:uri="BioLitTags" w:element="tag1">
    <w:r>
      <w:t>protein tag</w:t>
    </w:r>
  </w:smartTag>
</w:customXml>
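A utility that wants to read these tags back out of a docx file could look roughly like the sketch below, which uses only the Python standard library. It assumes the tags live in the main document part (`word/document.xml`) and use the standard WordprocessingML namespace; the function names and the trimmed `SAMPLE` document are illustrative and not part of the add-in.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace, bound to the "w" prefix in docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def q(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return "{%s}%s" % (W, name)

def extract_terms(document_xml, element="biolit-term"):
    """Return the tagged text and attribute map of each matching customXml node."""
    root = ET.fromstring(document_xml)
    results = []
    for node in root.iter(q("customXml")):
        if node.get(q("element")) != element:
            continue  # some other custom XML markup
        attrs = {
            a.get(q("name")): a.get(q("val"))
            for a in node.iterfind(q("customXmlPr") + "/" + q("attr"))
        }
        # The tagged words are the text runs nested inside the element.
        text = "".join(t.text or "" for t in node.iter(q("t")))
        results.append({"text": text, "attrs": attrs})
    return results

def extract_terms_from_docx(path):
    """Open a docx package and scan its main document part."""
    with zipfile.ZipFile(path) as package:
        return extract_terms(package.read("word/document.xml"))

# Trimmed-down document wrapping the fragment shown above, for illustration.
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p>
    <w:customXml w:uri="http://biolit.ucsd.edu/biolitschema" w:element="biolit-term">
      <w:customXmlPr>
        <w:attr w:name="id" w:val="GO:0031386" />
        <w:attr w:name="url" w:val="http://purl.org/obo/owl/GO#GO_0031386" />
      </w:customXmlPr>
      <w:smartTag w:uri="BioLitTags" w:element="tag1">
        <w:r><w:t>protein tag</w:t></w:r>
      </w:smartTag>
    </w:customXml>
  </w:p></w:body>
</w:document>"""
```

`extract_terms(SAMPLE)` returns a single entry whose `text` is the tagged phrase and whose `attrs` hold the ontology ID and URL, which is enough to feed a search index or a database loader.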

Additional Functionality

The add-in also enables authors to search for terms in ontologies, look up their definitions, and browse the ontologies to examine the terms and understand how they are organized.  The add-in can also highlight tagged terms, which makes it simple to review the document and identify all tagged words.

Ontology browser

A final useful piece of functionality is the recognition of protein ID patterns from the National Center for Biotechnology Information (NCBI) and Protein Data Bank (PDB) databanks.

Protein ID recognition
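The post does not say which patterns the add-in recognizes, so the sketch below is only an illustration of pattern-based ID detection under stated assumptions. It assumes PDB IDs are four characters (a digit from 1 to 9 followed by three letters or digits) and accepts them only after a "PDB" label to avoid matching ordinary numbers such as years. For NCBI it assumes RefSeq-style protein accessions (a prefix such as `NP_` followed by six or nine digits and an optional version). The regular expressions, function name, and sample identifiers are all choices made for this example.

```python
import re

# Assumed PDB form: digit 1-9 plus three alphanumerics, accepted only after
# a "PDB", "PDB ID" or "PDB:" label so that plain numbers are not matched.
PDB_ID = re.compile(r"\bPDB(?:\s+ID)?(?:\s*:\s*|\s+)([1-9][A-Za-z0-9]{3})\b")

# Assumed NCBI form: RefSeq-style protein accession with optional version.
NCBI_PROTEIN = re.compile(r"\b([ANXYW]P_(?:\d{9}|\d{6})(?:\.\d+)?)\b")

def find_protein_ids(text):
    """Return (database, identifier, offset) tuples in document order."""
    hits = [("PDB", m.group(1).upper(), m.start(1)) for m in PDB_ID.finditer(text)]
    hits += [("NCBI", m.group(1), m.start(1)) for m in NCBI_PROTEIN.finditer(text)]
    return sorted(hits, key=lambda hit: hit[2])
```

Requiring the label trades recall for precision: an unlabeled four-character ID is missed, but a sentence such as "published in 2009" produces no false match.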

 

Published Wednesday, March 11, 2009 9:17 AM by pablofe

Comments

Wednesday, March 11, 2009 1:51 PM by eScience @ Microsoft

# MSR Open Tools to Enhance Scientific Research Efforts Building on Science Commons Ontologies

At ETech 2009 today, the announcement went out that Science Commons in conjunction with MSR External

Thursday, March 12, 2009 7:52 AM by Weblogul lui Zoli

# Creative Commons Add-in has been published under Ms-PL

Microsoft has published the source code of the CC add-in for Office 2007. The add-in had been made available for download
