Download Research Tools
The challenge of DNA sequencing is central to all genomics research, and while the technology has existed since the 1970s, today’s massively-parallel sequencing instruments are capable of producing gigabytes of raw genomic data quickly and increasingly cheaply. Reconstruction of a DNA sequence from this data (for example, through de novo assembly) is a compute-intensive task, and experimentation has shown that data quantity is no substitute for quality when it comes to the accurate reconstruction of a DNA sequence. Unfortunately, not all sequencing technologies produce reliable and accurate results, and experimental data will always contain varying rates of error. Therefore, a preliminary quality control (QC) step is regularly employed to detect and counteract such sequencing errors.
The QC of sequencing results may range from simple manual filtering procedures to comprehensive automated solutions. To contribute to this area of QC tools development, we present Sequence Quality Control Studio (SeQCoS), a Microsoft .NET software suite that is designed to perform an array of QC evaluations and post-QC manipulation of sequencing data. SeQCoS generates a series of standard plots that illustrate the quality of the input data. These plots (saved in JPEG file format) provide information on commonly observed measurements, such as GC content (the proportion of guanine and cytosine nucleotide bases in a DNA sequence), and distribution of quality scores at position-specific and sequence-specific levels. In order to filter out poorly performing sequences, SeQCoS also conducts basic trimming and discarding functions to manipulate sequence files.
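To make these measurements concrete, here is a minimal Python sketch of GC content, per-base quality decoding, and simple trim-and-discard filtering. It is an illustration only and not SeQCoS code (SeQCoS itself is written in C#). The function names, the Phred+33 quality encoding, and the threshold parameters are all assumptions made for this example.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def decode_phred33(quality: str) -> List[int]:
    """Convert a quality string to integer scores, ASSUMING Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, scores: List[int], min_q: int) -> Tuple[str, List[int]]:
    """Drop bases from the 3' end while their quality score is below min_q."""
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], scores[:end]


def keep_read(scores: List[int], min_len: int, min_mean_q: float) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough.

    Both thresholds are hypothetical; suitable values depend on the data.
    """
    if not scores or len(scores) < min_len:
        return False
    return sum(scores) / len(scores) >= min_mean_q
```

For example, a read GATTACAGGC with the quality string IIIIIIII#! decodes to eight scores of 40 followed by 2 and 0. Trimming with a threshold of 20 removes the last two bases, and the GC content of the remaining GATTACAG is 0.375.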
At Microsoft Research, the Microsoft Biology Initiative team is collaborating with academic research groups in the sequencing of various organisms. To ensure that the sequenced sample is not contaminated by other strains or sequencing vectors, SeQCoS optionally integrates NCBI BLAST for PCs running the Windows operating system to search against a BLAST-formatted database. We provide a pre-formatted database of NCBI UniVec, a repository of vector sequences, adapters, linkers and PCR (polymerase chain reaction) primers that are used in DNA sequencing; however, researchers can use a different database if they prefer.
About the Tools
SeQCoS was written in C#, using the .NET Bio (formerly the Microsoft Biology Foundation [MBF]) bioinformatics toolkit and Sho, a data analysis and visualization application. It is freely available as open-source code under the Apache 2.0 license. Further details and software downloads are available from Sequence Quality Control Studio.
.NET Bio is a library of common bioinformatics functions (file parsers, algorithms, and web service connectors) that simplifies the creation of bioinformatics applications on the .NET platform. It is an open-source project that is freely available for academic and commercial use under the Apache 2.0 license. Although the project was initiated by Microsoft Research, it is owned by the Outercurve Foundation, a non-profit organization, and is governed by a growing community of users and contributors.
—Kevin Ha, Microsoft Research Intern
From November 9 to 12, 2011, Portland, Oregon, the City of Roses, becomes the City of Hoppers, as technology-minded women from across the United States flock to the Grace Hopper Celebration (GHC) of Women in Computing, an annual conference that brings the research and career interests of women in computing to the forefront. The conference is named for the legendary computer scientist U.S. Navy Rear Admiral Grace Hopper, and past GHCs have drawn 1,500 or more participants and dozens of corporate sponsors. This year, a record number of attendees (more than 2,000) are expected.
As in the past, leading researchers will present their current work, and special sessions will focus on the role of women in computer science, information technology, research, and engineering—as well as trends in these fields. And as always, a large contingent of corporate recruiters will be on hand—including many from Microsoft—looking to snag the top talent that GHC attracts and to help researchers and technical professionals expand their computer science knowledge and networks.
It’s exciting to see the lineup of amazing speakers from academic institutions, governments, nonprofits, and industry—including more than a dozen from Microsoft. All in all, more than 100 Microsoft researchers and technical employees will be attending, and the company is involved in more than 16 plenaries and sessions (see the line-up of Microsoft speakers). We also will be actively involved in the career night, the poster session, and the Sponsor Night Party. Fact is, Microsoft is a Platinum Sponsor of the Grace Hopper Celebration, for the fifth year in a row. We are proud to support the GHC and the contributions of the Anita Borg Institute and the Association for Computing Machinery, which are critical in attracting and retaining the women who will create the new technologies and drive new innovations for our global future. Be sure to come visit our booth (Exhibit Hall B 417), learn about natural user interfaces, and try out Kinect for Xbox at our Kinect Lounge in Hall C next to CyberCenter.
Now, let me plug my hometown for just a minute. As the United States’ top green city, Portland derives half its power from renewable sources; a quarter of the workforce commutes by bike, carpool, or public transportation; and it has more than 35 buildings certified by the U.S. Green Building Council. Microsoft shares Portland’s focus on harnessing green technology and was recently named one of the Top Green IT organizations by ComputerWorld. In line with our efforts to reduce carbon emissions by 30 percent per unit of revenue by 2012, Microsoft will be going collateral free at this year’s GHC, so we encourage all attendees to visit our Grace Hopper event site to find the information that would typically have been available as booth handouts.
That said, we still want every Hopper to stop by the Microsoft booth to pick out a photosynthetic “research partner” from our Project Epiphyte nursery. You and your air plant will collaboratively recycle carbon dioxide and oxygen as you symbiotically photosynthesize and respire, and you will join the Project Epiphyte community of dedicated plant-human partners. What’s more, you might even beautify your workspace. The epiphyte is more than just a highly-evolved organism that has transcended the limitations of its soil-bound ancestors. It symbolizes our desire to nurture a lasting relationship with GHC attendees and is a metaphor for the collaborative process of research, where knowledge is built on previous efforts and leads to entirely new fields of study. The first 1,500 attendees who visit our booth will receive an epiphyte and our renowned Microsoft Grace Hopper chocolate.
Stop by the Microsoft booth to participate in Project Epiphyte and learn what these items are all about.
Also, visit our recruiting booth (Exhibit Hall A566). In addition to full-time positions, we offer a number of internships, scholarships, and fellowships. We think Microsoft is a great place for technological women (and men) to realize their ambitions, and we aren’t alone. Just last month, Great Place to Work, a global research, consulting, and training firm, named Microsoft the world’s best global company at which to work. I have been telling all my friends for the last 10 years that I work at the best company in the world; now they don’t have to take only my word for it! So while you stop to smell the roses in the City of Roses, set aside some time to sniff out the possibilities of becoming a “Softie.”
“What If” is this year’s theme of the GHC, and it aligns nicely with our theme across Microsoft this year: “Be What’s Next.” Everyone at the conference can “Be What’s Next” by answering and investigating all the possible “What Ifs.” And if that didn’t make sense, I’ll be glad to rephrase it in person at the GHC. See you in Portland.
—Rane Johnson-Stempson, Director of Women in Research, Science, and Engineering, Microsoft Research Connections
On October 25, 2011, Microsoft Research Connections released an update to Zentity, a repository platform designed to manage research objects—such as journal articles, reports, datasets, projects, and people—as well as the relationships among them. Zentity supports arbitrary data models, and provides semantically rich functionality that enables users to find and visually explore interesting relationships among elements by using the Microsoft Silverlight PivotViewer control and Microsoft Research Visual Explorer.
With the 2.1 release, Zentity now includes the Resource Manager web user interface that provides better content management capabilities via easier ways to query the database, review and update records, and create and edit relationships among items. The Resource Manager will work with custom data models and even enables users to save searches for later use. Zentity 2.1 also offers the option to install a localized Spanish-language version of the software.
I would like to highlight and thank a few of our partners who have been working with a variety of institutions to customize their Zentity deployments.
Building Blocks has partnered with the UK Economic and Social Research Council (ESRC) to expose the ESRC’s catalog of research projects and their outputs. The ESRC catalog contains more than 100,000 research objects, including books and journal articles as well as research outcomes and impact reports. The PivotViewer control integrated into Zentity 2.1 provides a visually compelling yet simple way for end-users to browse, filter, and explore decades’ worth of ESRC grant data and to find relevant research reports.
In a case study on this project, Building Blocks wrote:
Zentity was seen as the ideal research repository solution as it can handle the complex data models, whilst also providing data access in many open formats. In addition the team designed a more intuitive and robust backend system to enable ESRC support teams to manage the submission of research outputs, reducing management overhead. The quality and consistency of the data was also improved by ensuring the internal workflows were more efficient and allowing integration with other academic data sources such as SHERPA/RoMEO.
Meanwhile, in Scotland, Company Net partnered with Queen Margaret University to create an online experience for the digital archive of content from the Homecoming Scotland 2009 events. A Scottish government initiative, Homecoming Scotland 2009 was a year-long celebration of Scottish culture and achievements. The archive site also uses the PivotViewer control to make it easy to pivot among the people, places, and events associated with the Homecoming Scotland 2009 celebrations.
And finally, working with a collection of researcher data and electronic theses and dissertations at the Jorge Tadeo Lozano University (UJTL) in Bogotá, Colombia, Microsoft Partner Softtek delivered a solution that is localized in Spanish, customized to the needs of the researchers, and integrated into UJTL’s environment. In his Softtek blog, Antonio Macias writes:
Having partnered with Microsoft Research in the deployment of Zentity 2.0 has definitely been an enriching experience for us since, on one hand, we have demonstrated Softtek’s continuous commitment to deliver high-quality services while working jointly with a highly respected high-tech company like Microsoft. We have been exposed to emergent technologies that will shape our world in the next 5 or 10 years. Indeed this exposure will help us add a fresh perspective to the set of solutions that we already provide to our large base of customers.
Zentity 2.1 is freely available via download from Microsoft Research. I hope that you’ll give it a try, and if you are looking for partners to help on a deployment project, that you’ll use the Microsoft Partner Network.
—Alex Wade, Director for Scholarly Communication, Microsoft Research Connections