The following blog is from guest contributor Paul Greenfield of CSIRO, Australia’s national science agency. He and his colleagues have developed a new correction tool to address the problem of DNA sequencing errors in biological and ecological research, and they have just released it to the research community worldwide.
—Simon Mercer, Director, Microsoft Research
The rapid development of next-generation DNA sequencing has revolutionized biological and ecological research in the last few years. The cost of DNA sequencing has fallen dramatically, and sequencing machines are becoming a standard piece of lab equipment. Low-cost sequencing is enabling researchers to uncover the gene differences that make some people more susceptible to diseases; to explore the genetic makeup of microbial communities from the human gut or the bottom of the ocean; and to rapidly identify the organism responsible for a life-threatening infection.
But while the costs of sequencing have plummeted, the accuracy of the data produced has improved only slowly: about 1 percent of the bases generated are still called incorrectly. The bioinformatics community has responded to this problem by building specialized error correction tools that use the inherent redundancy in sequence data to find and repair miscalls and other sequencing errors. Tests have shown that incorporating the best of these error-correction tools into standard bioinformatics analytical pipelines can result in much better quality genomes and more accurately called gene variants.
However, accurately correcting errors turns out to be a difficult problem, largely because of the repetitive and ambiguous nature of genomes. It is easy to correct simple substitution errors, such as when 50 sequence reads say that a given base is an A, and only the read being corrected says it’s a G. Such simple errors are well handled by downstream tools such as assemblers and aligners. The challenge is making the right correction when there are multiple plausible corrections (such as when 50 reads say A, 49 say G, and the read being corrected says T), as happens whenever reads fall across the end of a repeated region within a genome. Just to make things more challenging, this correction has to be done without any knowledge of the genomes being sequenced, and the only clues about which corrections are “right” come from the sequence data itself.
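To make the two cases above concrete, here is a minimal Python sketch of a simple majority-vote check at a single base position. It is not how Blue works; the function name `classify_base` and the parameters `min_support` and `ambiguity_ratio` are hypothetical, chosen only to illustrate why the 50-to-1 case is easy and the 50-to-49 case is not.

```python
from collections import Counter


def classify_base(other_read_bases, read_base, min_support=5, ambiguity_ratio=0.5):
    """Illustrative only: decide what to do with one base of the read being corrected.

    other_read_bases: bases that the other overlapping reads report at this position.
    read_base: the base reported by the read being corrected.
    min_support and ambiguity_ratio are assumed thresholds, not values from Blue.
    """
    counts = Counter(other_read_bases)
    # Candidates seen often enough to be trusted, most frequent first.
    supported = [(base, n) for base, n in counts.most_common() if n >= min_support]

    if not supported:
        return ("keep", read_base)  # too little evidence to change anything
    if any(base == read_base for base, _ in supported):
        return ("keep", read_base)  # the read agrees with a well-supported base

    top_base, top_count = supported[0]
    # A strong runner-up means a simple vote cannot pick the right correction.
    if len(supported) > 1 and supported[1][1] >= ambiguity_ratio * top_count:
        return ("ambiguous", read_base)
    return ("correct", top_base)


# Easy case: 50 reads say A, the read being corrected says G.
easy = classify_base(["A"] * 50, "G")
# Hard case: 50 reads say A, 49 say G, the read being corrected says T.
hard = classify_base(["A"] * 50 + ["G"] * 49, "T")
print(easy, hard)
```

In the hard case the vote alone gives up, which is why a tool needs more context from the rest of the read to choose between the plausible corrections.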
My colleagues and I at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) have just released a new error correction tool we’ve developed for use by the research community. We call it “Blue.” Blue is a high-performance C# application that runs natively on Windows systems, and under Mono on Linux and OS X. As we reported in a paper published in Bioinformatics, test results show that Blue is significantly faster than other available tools—especially on Windows—and is also more accurate as it recursively evaluates possible alternative corrections in the context of the read being corrected.
Another uncommon feature of Blue is that it can correct all three types of possible errors (substitutions, deletions, and insertions), making it suitable for use with data produced by the Roche 454 and Life Technologies Ion Torrent systems. Blue also allows for the correction of one set of reads with a consensus derived from another set of reads, and this capability has been used to correct small numbers of long (and expensive) Roche 454 reads with a consensus derived from a large file of cheaper (but shorter) Illumina reads. This “cross-correction” method has been used very effectively to improve the quality of several reference assemblies, ranging in size from bacteria to moths and grasses.
Blue and its associated tools can be downloaded from CSIRO Bioinformatics.
—Paul Greenfield, Research Group Leader, CSIRO, Division Computational Informatics
Summer Bridge students and their hosts at Microsoft
Experts agree that the next wave of innovation in computing requires diversity in the research and development teams who will create it. I believe that means expanding the pipeline of students entering computing. In particular, we need to get more girls into the pipeline, which is why I am so pleased to have had two amazing young women working with me as interns this summer: Veronica Catete, a third-year doctoral student at North Carolina State University, and Alka Pai, a senior at Tesla STEM High School in Redmond, Washington.
Veronica and Alka are enthusiastic about encouraging more young women to study and work in the computer sciences. To that end, they are developing a free, online computer science toolkit for middle-school girls as well as a course that teaches principles of computer science through game design. When they aren’t busy developing amazing tools, this dynamic duo is participating in events and activities that are designed to excite young people about the future of computer science.
I’d like to hand it over to Veronica and Alka, to discuss an event they hosted in July at Microsoft’s Redmond (Washington) campus. As you read their account, I encourage you to ask yourself how you, too, might help foster more diversity in computing. We all have an interest in promoting innovation in technology and computer science. Perhaps Veronica and Alka’s blog will inspire some ideas; if so, I’d love to hear from you!
—Rane Johnson-Stempson, Principal Research Director for Education and Scholarly Communication, Microsoft Research
On July 17, 13 students (10 girls and 3 boys) from the greater Seattle area came to Microsoft to explore the possibilities offered by careers in computing. These students are part of the Summer Bridge Program, an academic enrichment and college readiness project offered through the University of Washington Women’s Center. This program is designed for promising eighth-grade students who are interested in exploring science, technology, engineering, and mathematics—the so-called STEM fields.
We gave the students a tour of the Microsoft campus, highlighting several of the amazing projects underway here. The students started their day by exploring modeling and graphics by designing 3D models of Seattle’s iconic Space Needle, which they were able to print in the Microsoft Research hardware lab.
Working together, students build a model of the Seattle Space Needle.
During lunch, our visitors enjoyed a panel discussion from three of our high-school interns, Alisha Meherally, Arjun Narayan, and me (Alka). We discussed how we got started in computer science and what it’s like to work at Microsoft. We also offered our tips for finding opportunities to work in and learn about computer science outside the classroom. I think we surprised the students by admitting that all three of us entered computer science studies reluctantly—kicking and screaming, so to speak. But we hastened to add that now, having experienced the thrill of resolving software bugs and seeing computing’s potential for creative disruption, we are avid enthusiasts, deeply passionate about our work in computer science.
The Summer Bridge students then participated in a TouchDevelop workshop, where they used Windows 8 phones to write actual software code. Then we headed off to tour Microsoft’s state-of-the-art Cybercrime Center, where the students got up close and personal with the forensics lab and experienced, firsthand, the tools and techniques used to spot cyber crimes. For example, students Waltana Dewit, Yohannes Seghane, and Sarina Tran examined several supposed Microsoft products, working together to determine which were legitimate and which were counterfeit. “You have to look really hard to notice the differences,” said Yohannes. “If someone were to buy one of these from Amazon, I don’t think they would be able to tell.”
Looking for cyber crimes: students try to identify counterfeit software products.
Our visitors finished the day by touring Microsoft Research’s hardware lab. There they got to see the cool gadgets that the researchers use to prototype their ideas or fix a broken part.
The students were excited to see the potential of computer science careers to change the world, and they came away with a deeper understanding of why they should study STEM. They left with smiles on their faces, souvenirs in their pockets, and a world of opportunity ahead. “This place is amazing,” observed Ngocmi Ngo. “I’ve already decided that I want to work here, now I just have to wait until I’m a junior.”
That’s the spirit, Ngocmi.
—Veronica Catete and Alka Pai, Microsoft Research Interns
About 10 months ago, China’s first planetarium driven by the WorldWide Telescope (WWT) was launched at the Shixinlu primary school. Powered by six high-resolution projectors, the 8-meter dome installation has enabled students not only to see and study the stars and the universe in an immersive planetarium setting but also to create their own tours of the heavens and have them displayed on the dome.
That installation marked the beginning of the WWT Digital Dome project in China, a project that aims to add WWT-driven planetariums to schools at every level—from primary through university. Currently, three primary schools and three universities are constructing or are committed to building a WWT Digital Dome, and three additional universities and the Beijing Planetarium have expressed strong interest in hosting a Digital Dome installation.
The WWT Digital Dome installation at Shixinlu primary school
Recognizing the teaching potential of this growing network of WWT Digital Dome installations, the National Astronomical Observatories of the Chinese Academy of Sciences (NAOC), Central China Normal University (CCNU), and Chongqing Wutai Technology Co. Ltd are working to form an alliance among the WWT planetariums. This alliance will enable the various schools to share their experiences with the WWT Digital Dome—including tricks and tips for using the hardware and software.
More importantly, the alliance will allow participating schools to exchange key takeaways about developing curriculum and tours based on Digital Dome content. This pedagogical cross-fertilization is already taking shape with the design of WWT curricula for primary and secondary schools. There are now 16 WWT courses for primary and secondary schools, including 22 modules for primary schools, 33 modules for middle schools, and 26 modules for high schools. More than 2,000 students have been taught by using these courses.
In addition, dozens of guided tours have been completed at the primary school and secondary school level, covering such basic astronomical concepts as “Exploring our Family—the Earth,” “Understanding the Galaxy and the Universe,” and “Viewing the Seasonal Stars.” Meanwhile, advanced tours, which take advantage of Layerscape (a WWT add-in for Excel that enables users to visualize spatial data), are being prepared for high schools and colleges. And a community of WWT users in Beijing has begun integrating traditional Chinese constellation images into the WWT astronomical data sets, providing a unique cultural link between the past and present.
Seeking to build on these educational efforts, from July 29 to 30, Microsoft Research, NAOC, and CCNU sponsored a WWT training workshop in Chongqing, our fourth such collaborative workshop. The event, which took place at the Shixinlu primary school, drew more than 30 faculty members from every level of schooling—primary, secondary, and higher education.
Dr. Cuilan Qiao's talk covered basic and advanced features of the WorldWide Telescope.
During the first day of this intense, tightly focused event, I gave a talk on the history of WWT in China and introduced one of Microsoft Research’s latest teaching tools: Office Mix, a tool for creating compelling online lessons. Then Dr. Chenzhou Cui of NAOC gave an overview of the Chinese Virtual Observatory—a project that aims to create a data-intensive, online astronomical research and education platform—and WWT’s potential role in this national effort. This was followed by a presentation from Dr. Cuilan Qiao and her team from CCNU, who introduced the attendees to WWT’s basic and advanced features.
On day two, we focused on the WWT Digital Dome, showing how it can be an invaluable teaching tool. The day’s events included a WWT video designed by students and faculty from the Shixinlu school, as well as talks by the WWT engineering team on the construction and operation of the Digital Dome.
Faculty from every level (primary, secondary, and higher education) eagerly awaited a tour highlighting WWT's teaching potential.
This workshop was just the latest example of our continued collaborative efforts to develop a WWT curriculum and construct Digital Dome planetariums in China. With the planned addition of WWT Digital Dome installations described earlier, the body of WWT teaching materials will undoubtedly grow even faster. Microsoft Research is pleased to be part of this educational program, which is increasing scientific literacy and sparking intellectual curiosity among Chinese students.
—Guobin Wu, Research Program Manager, Microsoft Research