Download Research Tools
Could a semantic, chemical authoring tool be developed for Microsoft Word? The paper and PDF formats that are the standard vehicles for scholarly communication are great at presenting natural language for people to read, but are not as good at carrying the machine-interpretable semantic data that is becoming an increasingly important aspect of making sense of today's "data deluge." Tony Hey, Savas Parastatidis, and Lee Dirks from Microsoft Research initially discussed this possibility with Dr. Peter Murray-Rust of the Unilever Centre at Cambridge University back in 2007. Peter is considered the "father" of Chemical Markup Language (CML), a semantic XML representation of chemical entities, and explained that a large percentage of chemists use Microsoft Word to write their research papers. He hoped that by incorporating CML into Word, he could realize his idea of a semantic, chemical authoring tool. We wondered if Peter would partner with us in this endeavor. Suffice it to say that when he agreed to sign on, a project was soon underway.
I first met Peter and Joe Townsend in early 2008, just after I joined the Microsoft External Research team. Peter, Joe, and Jim Downing were all visiting Redmond to discuss making this idea a reality through a joint development project between our team in Redmond and Peter and his team in Cambridge. I was asked to serve as the program manager for this adventure.
We all had a common vision for what we wanted to achieve, but we faced many obstacles. Multiple time zones, varying degrees of programming language familiarity, different project management styles, and a total lack of chemistry knowledge on my part made the first six months a little slow going.
Still, we made progress: Embedding the CML files for each molecule referenced in the document was fairly straightforward. One of the nice features of the new DOCX format is that each file is basically a ZIP file—a container into which we could park each bit of chemistry as its own XML file. And we could also anchor these to bits of text in the document itself (in other words, the document.xml file).
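Because a DOCX package is an ordinary ZIP archive, the container aspect can be sketched with nothing but the standard library. This is a minimal illustration, not Chem4Word's actual code: the part name `customXml/molecule1.cml` and the sample molecule are made up for the example.

```python
import zipfile

# A trivial CML fragment (water); in practice this would come from a
# chemistry editor or a database lookup.
CML = """<?xml version="1.0" encoding="utf-8"?>
<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
</molecule>
"""

def embed_cml(docx_path: str, cml: str, part_name: str) -> None:
    """Append a CML part to an existing .docx (ZIP) package."""
    with zipfile.ZipFile(docx_path, "a") as pkg:
        pkg.writestr(part_name, cml)

def list_parts(docx_path: str) -> list[str]:
    """List every part stored in the package, document.xml included."""
    with zipfile.ZipFile(docx_path) as pkg:
        return pkg.namelist()
```

A real add-in would also register the new part in the package's content-types and relationships files so that Word recognizes it; the sketch shows only the "ZIP as container" idea.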
So far, so good—as long as the thing in the document was text or an image. In fact, we got by for a while by keeping a handy PNG graphic for each of the molecules that we had in CML, so that when we imported a CML file we could also slip the pre-fabricated graphic into the document. It couldn't be edited, but it made the point: a casual human observer could see a two-dimensional (2-D) representation of the molecule, or its label, in the document. More importantly, it demonstrated that a machine could understand the underlying semantics of the chemistry by reading the CML representation.
But what about all the fancy subscripts and superscripts, and pre-sub, and pre-super, and sub-super, or super-super scripts required of charges, electrons, isotopes, hydrogen dots, labels, and so forth? For this, we looked to Murray Sargent, the guru of all things mathematical and the driving force behind the great equation editing features in recent releases of Word. After reviewing our options, we decided to build upon Word's math zone features. This would allow us to take advantage of the work already done to support the complex and flexible layout required of mathematical equations.
Meanwhile, our team was spending a good deal of time reviewing options for our 2-D chemical editor. We ended up launching a separate Windows Presentation Foundation (WPF) pane from within Word, which reads in the CML, renders it, and allows the user to perform various editing functions, all while preserving a certain amount of "chemical intelligence."
This was not just characters and lines on a drawing board. When you select a particular atom, the options that you get for editing are dependent on the sorts of things that are chemically viable in that particular structure. And when you save an edited structure, the Chemistry Add-in for Word (Chem4Word) writes the modified CML back into the DOCX package, creates a PNG (for viewing in the document), updates the chemical formula, and prompts the user to update any of the other labels from the CML file.
Once this initial work was established, we brought the chemical intelligence developers and the WPF developers closer together—in the U.K.—so they could meet in person more frequently. This helped move the project along at a good pace, culminating in our beta release at the American Chemical Society Annual Meeting in March 2010.
Since the beta release, most of the work has been on Joe's shoulders. He has done significant clean-up, fixing bugs and incorporating a lot of usability feedback (especially from his students). Most importantly, he has added the ability to look up existing molecular structures via web services such as NCBI's PubChem and the Unilever Centre's OPSIN; these lookups are available directly from the Chem4Word version 1.0 ribbon.
I am extremely proud of this project and I am thrilled to finally see version 1.0 released to the world. We have so much more to do, however. A colleague in the U.K. recently helped explain the potential directions:
"The future of research will be powered not only by ever more rapid dissemination of ever larger quantities of data, but also by software tools that 'understand' something about science. These tools will behave intelligently with respect to the information they process, and will free their human users to spend more time doing the things that humans do best: generating ideas, designing experiments, and making discoveries," said Timo Hannay, Managing Director for Digital Science at Macmillan Publishers Ltd. "Chem4Word is one of the best examples so far of this important new development at the interface between science and technology."
The Chem4Word project was one of our team's first open source releases. Just after the beta release last March, we launched the source code project on CodePlex under an Apache 2.0 license. And today, we are announcing that the project has joined the Research Accelerators gallery as a part of the Outercurve Foundation.
Here's to a long and happy future for the Chem4Word project—we hope it will offer the community a method for better facilitating and enabling semantic chemistry.
—Alex Wade, Director of Scholarly Communication, Microsoft Research
For more information, check out the Chemistry Add-in for Word press release.
One of our responsibilities as researchers is to have the courage to challenge accepted "truths" and to seek out new insights. Richard Feynman was a physicist who not only epitomized both of these qualities in his research but also took enormous pleasure in communicating the ideas of physics to students. Feynman won the Nobel Prize for the work behind the computational toolkit that we now call Feynman Diagrams. The techniques he developed helped the physics community make sense of Quantum Electrodynamics (QED) after the war, when the entire community was in a state of confusion about how to handle the infinities that appeared everywhere when one tried to make a perturbative expansion in the coupling.
Feynman was the subject of a recent TEDxCaltech conference, fittingly called, "Feynman's Vision: The Next 50 Years." The event was organized in recognition of the 50-year anniversary of Feynman's visionary talk, "There's Plenty of Room at the Bottom," in which he set out a vision for nanoscience that is only now beginning to be realized. It is also 50 years since he gave his revolutionary "Feynman Lectures on Physics," which educated generations of physicists.
I had the honor of speaking about Feynman's contributions to computing, from his days at Los Alamos during the war to his Nobel Prize-winning computational toolkit (Feynman Diagrams) and his invention of quantum computing. By striving to think differently, he truly changed the world. The following are some highlights from my presentation.
Parallel Computing Without Computers
Feynman worked on the Manhattan Project at Los Alamos in the 1940s with Robert Oppenheimer, Hans Bethe, and Edward Teller. In order to make an atom bomb from the newly discovered transuranic element plutonium, it was necessary to generate a spherical compression wave to compress the plutonium to the critical mass at which the chain reaction starts. It was, therefore, necessary to calculate how to position explosive charges in a cavity to generate such a compression wave; these calculations were sufficiently complex that they had to be done numerically. The team assigned to perform these calculations was known as the "IBM team," but it should be stressed that this was in the days before computers: the team operated on decks of cards with adding machines, tabulators, sorters, collators, and so on. The calculations were taking too long, so Feynman was put in charge of the IBM team.
Feynman immediately discovered that because of the obsession with secrecy at Los Alamos, the team members had no idea of the significance of their calculations or why they were important for the war effort. He went straight to Oppenheimer and asked for permission to brief the team about the importance of their implosion calculations. He also discovered a way to speed up the calculations. By assigning each problem to a different colored deck of cards, the team could work on more than one problem at once. While one deck was using one of the machines for one stage of the calculation, another deck could be using a different machine for a different stage of its calculation. In essence, this is a now-familiar technique of parallel computing—the pipeline parallelism familiar from the Cray vector supercomputers, for example.
The result was a total transformation. Instead of completing only three problems in nine months, the team was able to complete nine problems in three months! Of course, this led to a different problem when management reasoned that it should be possible to complete the last calculation needed for the Trinity test in less than a month. To meet this deadline, Feynman and his team had to address the more difficult problem of breaking up a single calculation into pieces that could be performed in parallel.
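The gain from the colored-deck scheme can be sketched with a toy timing model. The three machine stages and the one-step timings below are illustrative, not historical; the point is simply that overlapping independent problems turns the total time from problems × stages into stages + problems − 1.

```python
# Toy model of pipeline parallelism with punched-card machines.
# Each "problem" (a colored deck of cards) must pass through every
# stage in order; assume each stage takes one time unit per deck.
STAGES = ["adder", "sorter", "tabulator"]

def sequential_time(n_problems: int) -> int:
    """One deck at a time: no machine works while another is busy."""
    return n_problems * len(STAGES)

def pipelined_time(n_problems: int) -> int:
    """Overlapped decks: once the pipeline is full, one problem
    finishes per time unit."""
    return len(STAGES) + (n_problems - 1)
```

With three stages, nine sequential problems take 27 time units, versus 11 for the pipelined schedule: the same qualitative jump the IBM team saw.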
My next story starts in 1948 at the Pocono Conference where all the great figures of physics—Niels Bohr, Paul Dirac, Robert Oppenheimer, Edward Teller, and so on—had assembled to try to understand how to make sense of the infinities in QED. Feynman and Schwinger were the star speakers, but Feynman was unable to make his audience understand how he did his calculations. His interpretation of positrons as negative energy electrons moving backwards in time was just too hard for them to accept. After the conference, Feynman was in despair and later said, "My machines came from too far away."
Less than a year later, Feynman had his triumph. At an American Physical Society meeting in New York, Murray Slotnick talked about some calculations he had done with two different meson-nucleon couplings. He had shown that these two couplings indeed gave different answers. After Slotnick's talk, Oppenheimer got up from the audience and said that Slotnick's calculations must be wrong since they violated Case's Theorem. Poor Slotnick had to confess that he had never heard of Case's Theorem and Oppenheimer informed him that he could remedy his ignorance by listening to Professor Case present his theorem the following day.
That night, Feynman couldn't sleep so he decided to re-do Slotnick's calculations by using his diagram techniques. The next day at the conference, Feynman sought out Slotnick, told him what he had done, and suggested they compare results. "What do you mean you worked it out last night?" Slotnick responded. "It took me six months!" As the two compared answers, Slotnick asked, "What is that Q in there, that variable Q?" Feynman replied that the Q was the momentum transfer as the electron was deflected by different angles. "Oh," Slotnick replied. "I only have the limiting value as Q approaches zero. For forward scattering." Feynman said, "No problem, we can just set Q equal to zero in my formulas!" Feynman found that he had obtained the same answer as Slotnick.
After Case had presented his theorem, Feynman stood up at the back of the audience and said, "Professor Case, I checked Slotnick's calculations last night and I agree with him, so your theorem must be wrong." And then he sat down. That was a thrilling moment for Feynman, like winning the Nobel Prize—which he did much later—because he was now sure that he had achieved something significant. It had taken Slotnick six months to do the case of zero momentum transfer, while Feynman had been able to complete the calculation for arbitrary momentum transfer in one evening. The computational toolkit that we now call Feynman Diagrams has penetrated almost all areas of physics, and his diagrams appear on the blackboards of physicists all around the world. This toolkit is undoubtedly Feynman's greatest gift to physics, and the story perfectly illustrates his preference for concrete, detailed calculation over reliance on more abstract theorems.
The Physics of Computation
At the invitation of his friend Ed Fredkin, Feynman delivered a keynote lecture at "The Physics of Computation" Conference at MIT in 1981. Feynman considered the problem of whether it was possible to perform an accurate simulation of Nature on a classical computer. As Nature ultimately obeys the laws of quantum mechanics, the problem reduces to simulating a quantum mechanical system on a classical computer. Because of the nature of quantum objects like electrons, truly quantum mechanical calculations on a classical computer rapidly become impractical for more than a few tens of electrons.
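The blow-up is easy to quantify if we use qubits as stand-ins for the quantum objects Feynman had in mind: a system of n two-level quantum components requires 2^n complex amplitudes to describe its state exactly. The sketch below only counts memory; the arithmetic, not any particular API, is the point.

```python
def statevector_bytes(n: int, bytes_per_amplitude: int = 16) -> int:
    """Memory needed to store the full quantum state of n two-level
    systems, with each complex amplitude held as two 8-byte floats."""
    return (2 ** n) * bytes_per_amplitude

# A few tens of quantum objects already exceed any classical machine:
# 50 two-level systems need 2**50 amplitudes, roughly 1.8e16 bytes.
```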
Feynman then proceeded to consider a new type of computer based on quantum mechanics: a quantum computer. As he put it, this was "not a Turing machine, but a machine of a different kind." Interestingly, Feynman did not go on to explore the capabilities of quantum computers but simply demonstrated how they could be used to simulate true quantum systems.
By his presence at the conference, Feynman stimulated interest both in the physics of computation and in quantum computing. Thirty years later, at the TEDxCaltech conference, we heard several talks summarizing progress toward actually building a quantum computer. In the last five years of his life, Feynman gave lectures on computation at Caltech, initially with colleagues Carver Mead and John Hopfield, and for the last three years by himself.
I was fortunate enough to be asked by Feynman to write up his "Lectures on Computation." The lectures were a veritable tour de force and were probably a decade ahead of their time. Feynman considered the limits to computation due to mathematics, thermodynamics, noise, silicon engineering, and quantum mechanics. In the lectures, he also gave his view about the field of computer science: He regarded science as the study of natural systems and classified computer science as engineering since it studied man-made systems.
Inspiring Later Generations
Feynman said that he started out very focused on physics and only broadened his studies later in life. There are several fascinating biographies of Feynman, but the one I like best is No Ordinary Genius by Christopher Sykes. It is a wonderful collection of anecdotes, interviews, and articles about Feynman and his wide range of interests—from physics, to painting, to bongo drums and the Challenger inquiry. Feynman was a wonderful inspiration to the entire scientific community, and his enjoyment of and enthusiasm for physics is beautifully captured in the TV interview "The Pleasure of Finding Things Out," produced by Christopher Sykes for the BBC. Feynman is forever a reminder that we must try to think differently in order to innovate and succeed.
—Tony Hey, corporate vice president of the External Research Division of Microsoft Research
Human memory is all too fallible. We all misplace items or forget to run an errand occasionally; our memories of specific events can fade with time as well. But severe memory issues can have a devastating impact on quality of life for individuals with clinically diagnosed memory disorders that are related to acquired brain injury (for example, an accident) or neurodegenerative diseases (for example, Alzheimer's disease).
There is no cure for memory loss. In the past, neuropsychologists had to rely on fairly primitive devices (such as photo albums, diaries, and electronic reminders) to help patients cope with memory conditions. Technology is rapidly evolving, however, and providing new opportunities to help patients.
A notable development in the field is the SenseCam, a memory-enhancing camera developed by Microsoft researchers at the Cambridge campus and subsequently licensed to Vicon. Vicon sells the SenseCam as a medical device, the Vicon Revue, which was named one of the 100 best innovations of 2010 by Popular Science. The SenseCam uses a wide-angle lens to document the patient's day—including places visited and people seen—creating visual "memories" through pictures. The camera, which is worn around the neck, takes photographs automatically throughout the day.
At the end of the day, the patient downloads the images to a computer. These images create visual reminders of events from throughout the day—essentially, they are digital memories. These SenseCam images appear to stimulate the episodic memory of patients who view them. Unlike staged (or posed) photographs, which tend to change the nature of the very moment being captured, SenseCam images are recorded passively, with no conscious effort or intervention. Combined with the relatively large number of images, this seems to have a powerful effect on recall. Numerous patients have benefitted from true autobiographical recall through this technology; typically, a handful of images stimulates the same feelings and emotions the wearer had when the events occurred.
Ultimately, we hope that SenseCam has the potential to delay the onset of Alzheimer's disease in at-risk patients. Multiple studies around the globe, funded by Microsoft External Research, have helped us understand how SenseCam can help patients with a variety of memory-loss conditions.
The SenseCam was recently featured in TIME magazine and is currently on display at the Science Museum in London. For more information, see the Introduction to SenseCam.
—Steve Hodges, Principal Hardware Engineer, Microsoft Research, and Kristin Tolle, Director, Natural User Interfaces Team, External Research division of Microsoft Research