Jonathan Fay and I recently visited the California Academy of Sciences, the world-class natural history museum and research center located in San Francisco’s Golden Gate Park. Our goals were to connect in person with Academy researchers and to help prepare the Academy’s Morrison Planetarium for an Academy NightLife event. This particular NightLife featured Microsoft technology (including Windows Mobile) and a planetarium show that demonstrated the astronomy capabilities of Microsoft Research’s WorldWide Telescope. I was particularly interested in talking with Academy researchers about my work with the new and more local phase of WorldWide Telescope, which uses recently developed tools and capabilities to create earth-science data visualizations and tell stories, supported by the high-resolution imagery of Bing Maps.
Hippocampus reidi, the Atlantic seahorse whose phylogenetic tree we are re-growing in WWT
I spoke at some length with several of the Academy’s researchers about the potential of WorldWide Telescope (WWT) as an earth-science research tool, and about how to visualize interesting and often complex datasets. As one consequence, I am now working on a proof-of-concept example for them that uses WWT to render the phylogenetic tree for a particular genus of seahorse as a space-time diagram in relation to the Earth. The idea is to illustrate the genetic distance between different species in relation to spatial separations between their native habitats. By extension, such diagrams could span orders, families, and even classes of living organisms.
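To make the idea concrete, here is a minimal Python sketch of the two quantities such a diagram relates: a genetic distance between two taxa and the spatial separation between their habitats. It is not the WWT proof of concept itself. The taxon names, coordinates, and aligned sequences are placeholders invented for illustration, and the simple p-distance (the fraction of aligned sites that differ) is an assumed stand-in for whatever distance measure a real phylogenetic analysis would use.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def p_distance(seq_a, seq_b):
    """Fraction of sites that differ between two aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)


# Placeholder data only: hypothetical taxa, made-up habitat centroids,
# and toy aligned sequences. None of this is real seahorse data.
TAXA = {
    "taxon_a": {"lat": 0.0, "lon": 0.0, "seq": "ACGTACGTAC"},
    "taxon_b": {"lat": 0.0, "lon": 90.0, "seq": "ACGTACGTAA"},
    "taxon_c": {"lat": 90.0, "lon": 0.0, "seq": "ACGAACGTTA"},
}


def pairwise_table(taxa):
    """Return (name_a, name_b, spatial_km, genetic_distance) for every pair of taxa."""
    rows = []
    for (name_a, a), (name_b, b) in combinations(sorted(taxa.items()), 2):
        spatial = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
        genetic = p_distance(a["seq"], b["seq"])
        rows.append((name_a, name_b, spatial, genetic))
    return rows


if __name__ == "__main__":
    for name_a, name_b, spatial, genetic in pairwise_table(TAXA):
        print(f"{name_a} vs {name_b}: {spatial:8.1f} km apart, p-distance {genetic:.2f}")
```

Each row pairs one spatial separation with one genetic distance, which is the kind of relationship a space-time rendering of the tree would let a viewer explore visually.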
We also spoke with the Academy researchers about broader data problems: specifically, the potential for using what we call the Environmental Information Framework—developed within our Earth, Energy, and Environment group—to address information-management challenges. We discussed different approaches to managing and publishing data and how Microsoft technologies (research tools, products, and services) might be applied to help.
Another fun development from this visit: by using GeoSynth, the standalone version of Microsoft Photosynth, I generated a digital representation of the museum’s iconic Tyrannosaurus rex (T. rex) from a set of photographs I took. I then created a WorldWide Telescope narrative tour that relocated the T. rex outside to a nearby baseball park. (I have her playing center field.) The tour features several of the source photographs as well, so we are exploring a combination of different types of data to tell the story.
To further expand on this new idea of employing WWT for more than exploring stars and galaxies: I use WWT to create visualizations of robotic submarine missions, clouds of organic molecules culled from Arctic rivers, twisted and knotted magnetic fields around the sun, distributions of soil carbon across the United States, biodiversity of sharks throughout the world’s oceans, and lots more. The complexity that all these datasets have in common makes it difficult to place them on a chart; to get them into WWT, all I need is some sort of coordinate basis—be it geographical, geometrical or parametric—and off we go!
Digital representation of Tyrannosaurus rex with photo inset of actual skeleton
To me, the common denominator between seahorses, dinosaurs, stars, planets, and carbon molecules is our fascination with learning about the natural world and the enjoyment of sharing our understanding with one another. Scientists spend years learning the details of their specialization, learning which questions to ask next. With the emergence of new data-generating tools to answer these questions, we see the corresponding emergence of the “drowning in data” syndrome, a malady we find rampant across all specializations and domains. Well, computers got us into this fix, and so we are working on ways to use computers to get us out.
As a member of the Microsoft Research Connections team, I feel extremely fortunate to have the opportunity to talk to research scientists and say, essentially, “How can we help?” Given a few minutes to show them what we have in the works, the ensuing conversations are enjoyable, and often lead to productive collaborations. We hope that those collaborations, in turn, lead to solutions that will be usable by others in the earth- and life-sciences. But, as a T. rex playing center field might suggest: one step at a time…so I’m looking forward to our ongoing conversations with the Academy.
—Rob Fatland, Senior Program Manager, Microsoft Research Connections
The challenge of DNA sequencing is central to all genomics research, and while the technology has existed since the 1970s, today’s massively parallel sequencing instruments are capable of producing gigabytes of raw genomic data quickly and increasingly cheaply. Reconstruction of a DNA sequence from this data (for example, through de novo assembly) is a compute-intensive task, and experimentation has shown that data quantity is no substitute for quality when it comes to the accurate reconstruction of a DNA sequence. Unfortunately, not all sequencing technologies produce reliable and accurate results, and experimental data will always contain varying rates of error. Therefore, a preliminary quality control (QC) step is regularly employed to detect and counteract such sequencing errors.
The QC of sequencing results may range from simple manual filtering procedures to comprehensive automated solutions. To contribute to this area of QC tools development, we present Sequence Quality Control Studio (SeQCoS), a Microsoft .NET software suite that is designed to perform an array of QC evaluations and post-QC manipulation of sequencing data. SeQCoS generates a series of standard plots that illustrate the quality of the input data. These plots (saved in JPEG file format) provide information on commonly observed measurements, such as GC content (the proportion of guanine and cytosine nucleotide bases in a DNA sequence), and distribution of quality scores at position-specific and sequence-specific levels. In order to filter out poorly performing sequences, SeQCoS also conducts basic trimming and discarding functions to manipulate sequence files.
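The measurements described above are simple to state in code. The following Python sketch shows GC content, per-base quality scores, and a basic trim of a low-quality tail. It is an illustration only and not SeQCoS code (SeQCoS is written in C# on .NET Bio). The function names and the default threshold of 20 are assumptions chosen for the example, and the quality string is assumed to use the common Phred+33 FASTQ encoding.

```python
def gc_content(seq):
    """Proportion of G and C bases in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Mean Phred score of a read (0.0 for an empty quality string)."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def trim_low_quality_tail(seq, quality, min_score=20, offset=33):
    """Drop trailing bases whose Phred score is below min_score.

    The threshold of 20 is an illustrative default, not a SeQCoS setting.
    """
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_score:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # Toy read: four high-quality bases followed by two low-quality ones.
    read_seq, read_qual = "ACGTAC", "IIII##"
    print("GC content:", gc_content(read_seq))
    print("Mean quality:", mean_quality(read_qual))
    print("Trimmed read:", trim_low_quality_tail(read_seq, read_qual))
```

A discard step would follow the same pattern: after trimming, drop any read whose remaining length or mean quality falls below a chosen cutoff.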
At Microsoft Research, the Microsoft Biology Initiative team is collaborating with academic research groups in the sequencing of various organisms. To ensure that the sequenced sample is not contaminated by other strains or sequencing vectors, SeQCoS optionally integrates NCBI BLAST for PCs running the Windows operating system to search against a BLAST-formatted database. We provide a pre-formatted database of NCBI UniVec, a repository of vector sequences, adapters, linkers and PCR (polymerase chain reaction) primers that are used in DNA sequencing; however, researchers can use a different database if they prefer.
About the Tools
SeQCoS was written in C#, using the .NET Bio (formerly the Microsoft Biology Foundation [MBF]) bioinformatics toolkit and Sho, a data analysis and visualization application. It is freely available as open-source code under the Apache 2.0 license. Further details and software downloads are available from Sequence Quality Control Studio.
.NET Bio is a library of common bioinformatics functions (file parsers, algorithms, and web service connectors) that simplify the creation of bioinformatics applications on the .NET platform and is an open-source project that is freely available for academic and commercial use under the Apache 2.0 license. While this project was initiated by Microsoft Research, it is owned by the Outercurve Foundation, a non-profit organization, and is governed by a growing community of users and contributors.
—Kevin Ha, Microsoft Research Intern
From November 9 to 12, 2011, Portland, Oregon, the City of Roses, becomes the City of Hoppers, as technology-minded women from across the United States flock to the Grace Hopper Celebration (GHC) of Women in Computing, an annual conference that brings the research and career interests of women in computing to the forefront. The conference is named for the legendary computer scientist and U.S. Navy Rear Admiral Grace Hopper, and past GHCs have drawn 1,500 or more participants and dozens of corporate sponsors. This year, a record number of attendees (more than 2,000) are expected.
As in the past, leading researchers will present their current work, and special sessions will focus on the role of women in computer science, information technology, research, and engineering—as well as trends in these fields. And as always, a large contingent of corporate recruiters will be on hand—including many from Microsoft—looking to snag the top talent that GHC attracts and to help researchers and technical professionals expand their computer science knowledge and networks.
It’s exciting to see the lineup of amazing speakers from academic institutions, governments, nonprofits, and industry—including more than a dozen from Microsoft. All in all, more than 100 Microsoft researchers and technical employees will be attending, and the company is involved in more than 16 plenaries and sessions (see the line-up of Microsoft speakers). We also will be actively involved in the career night, the poster session, and the Sponsor Night Party. Fact is, Microsoft is a Platinum Sponsor of the Grace Hopper Celebration, for the fifth year in a row. We are proud to support the GHC and the contributions of the Anita Borg Institute and the Association for Computing Machinery, which are critical in attracting and retaining the women who will create the new technologies and drive new innovations for our global future. Be sure to come visit our booth (Exhibit Hall B 417), learn about natural user interfaces, and try out Kinect for Xbox at our Kinect Lounge in Hall C next to CyberCenter.
Now, let me plug my hometown for just a minute. As the United States’ top green city, Portland derives half its power from renewable sources; a quarter of the workforce commutes by bike, carpool, or public transportation; and it has more than 35 buildings certified by the U.S. Green Building Council. Microsoft shares Portland’s focus on harnessing green technology and was recently named one of the Top Green IT organizations by ComputerWorld. In line with our efforts to reduce carbon emissions by 30 percent per unit of revenue by 2012, Microsoft will be going collateral free at this year’s GHC, so we encourage all attendees to visit our Grace Hopper event site to find the information that would typically have been available as booth handouts.
That said, we still want every Hopper to stop by the Microsoft booth to pick out a photosynthetic “research partner” from our Project Epiphyte nursery. You and your air plant will collaboratively recycle carbon dioxide and oxygen as you symbiotically photosynthesize and respire, and you will join the Project Epiphyte community of dedicated plant-human partners. What’s more, you might even beautify your workspace. The epiphyte is more than just a highly evolved organism that has transcended the limitations of its soil-bound ancestors. It symbolizes our desire to nurture a lasting relationship with GHC attendees and is a metaphor for the collaborative process of research, where knowledge is built on previous efforts and leads to entirely new fields of study. The first 1,500 attendees who visit our booth will receive an epiphyte and our renowned Microsoft Grace Hopper chocolate.
Stop by the Microsoft booth to participate in Project Epiphyte and learn what these items are all about.
Also, visit our recruiting booth (Exhibit Hall A566). In addition to full-time positions, we offer a number of internships, scholarships, and fellowships. We think Microsoft is a great place for technological women (and men) to realize their ambitions, and we aren’t alone. Just last month, Great Place to Work, a global research, consulting, and training firm, named Microsoft the world’s best global company at which to work. I have been telling all my friends for the last 10 years that I work at the best company in the world; now they don’t have to take only my word for it! So while you stop to smell the roses in the City of Roses, set aside some time to sniff out the possibilities of becoming a “Softie.”
“What If” is this year’s theme of the GHC, and it aligns nicely with our theme across Microsoft this year: “Be What’s Next.” Everyone at the conference can “Be What’s Next” by answering and investigating all the possible “What Ifs.” And if that didn’t make sense, I’ll be glad to rephrase it in person at the GHC. See you in Portland.
—Rane Johnson-Stempson, Director of Women in Research, Science, and Engineering, Microsoft Research Connections