You know, there are times when superlatives don't do justice, so I'll just leave it at this: David and Carl are doing great things. Take a look at the app and the code...and the cool part is that the PhyloD code will run on the Digipede Network.
Microsoft Research Releases Tools to Help Science Progress Toward an AIDS Vaccine
After two years of pioneering work and collaboration in AIDS research, Microsoft openly shares its software source code with the greater scientific community to help expedite global research.
Related Links
Microsoft Resources:
Microsoft Computational Biology Web Tools
Microsoft Computational Biology Web Tools Source Code
Epitope Prediction Tool white paper
Science Web Site
AIDS/HIV: Finding Footprints Among the Trees (paid subscription required)
Founder Effects in the Assessment of HIV Polymorphisms and HLA Allele Associations (paid subscription required)
REDMOND, Wash., June 13, 2007 -- The source code for a set of software tools developed by Microsoft Research to advance AIDS vaccine research and development is available for download starting today from Microsoft’s CodePlex Web site. By sharing the code openly and at no charge with the worldwide AIDS research community, Microsoft hopes to spur other scientists and researchers to take up the tools and even build on them, thereby speeding the way toward a vaccine.
The code for four software tools is available now at no charge via CodePlex, an online portal created in 2006 to foster collaborative software development projects and host shared source code as part of Microsoft’s Shared Source Initiative. The tools and source code are an initial piece of Microsoft’s technical computing effort – a company-wide initiative to collaborate with the worldwide scientific community by reducing the time to new scientific insights and breakthroughs by furthering the state of information technology in scientific research.
Source: Microsoft Research Releases Tools to Help Science Progress Toward an AIDS Vaccine: After two years of pioneering work and collaboration in AIDS research, Microsoft openly shares its software source code with the greater scientific community to help expedite global research.
Just saw that the Business Intelligence folks have set up BI Labs – it will be interesting to see all the prototypes and concepts as they come out and how they can be used for scientific research/exploration. I’m currently looking at the Fuzzy Lookup Add-in to see how it performs on environmental datasets.
The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel. It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data. For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match. While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages.
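To get a feel for what a similarity score over messy names looks like, here is a minimal sketch in Python. It is not the add-in's algorithm (that isn't described here); it just normalizes the names and compares them with the standard library's `difflib.SequenceMatcher`. The `NICKNAMES` table, the list of titles, and the function names are all made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical nickname table; a real matcher would need a much larger one.
NICKNAMES = {"andy": "andrew"}
# Titles to ignore when comparing names (also an assumption).
TITLES = {"mr", "mrs", "ms", "dr"}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, titles and one-letter initials, then sort tokens."""
    tokens = re.findall(r"[a-z]+", name.lower())
    kept = [NICKNAMES.get(t, t) for t in tokens if t not in TITLES and len(t) > 1]
    return " ".join(sorted(kept))

def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the normalized names are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

if __name__ == "__main__":
    for other in ["Hill, Andrew R.", "Andy Hill", "Andrew Hil", "Jane Smith"]:
        print(f"Mr. Andrew Hill vs {other!r}: {similarity('Mr. Andrew Hill', other):.2f}")
```

Sorting the tokens is what makes “Hill, Andrew R.” line up with “Mr. Andrew Hill”; the character-level ratio is what tolerates a spelling mistake such as “Andrew Hil”.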
BI Labs BI Labs is a collection of experimental business intelligence projects and useful applications made available from internal sources across Microsoft. These projects are prototypes and concepts, and there are no current plans to include them in Microsoft products. New ideas can pop up at any time, so please check back often to see what's new. We look forward to your feedback. Enjoy!
BI Labs
If you've ever been interested in using data mining tools but don't have the time to figure out how it all works, take a look at these add-ins for Excel 2007. With the data in a spreadsheet, you can kick off an analysis.
I can see this as a great way to clean and analyze data...especially scientific data.
Microsoft SQL Server 2005 Data Mining Add-ins for Office 2007 (Data Mining Add-ins) allow you to take advantage of SQL Server 2005 predictive analytics in Office Excel 2007 and Office Visio 2007. The download includes the following components:
Table Analysis Tools for Excel: This add-in provides you with easy-to-use tasks that leverage SQL Server 2005 Data Mining under the covers to perform powerful analytics on your spreadsheet data.
Data Mining Client for Excel: This add-in allows you to go through the full data mining model development lifecycle within Excel 2007 using either your spreadsheet data or external data accessible through your SQL Server 2005 Analysis Services instance.
Data Mining Templates for Visio: This add-in allows you to render and share your mining models as annotatable Visio 2007 drawings.
Source: Download details: SQL Server Data Mining Add-ins for Office 2007 CTP
I meant to do an entry on this paper by Stuart Ozer (MSR) and David Kim & David Baker (Rosetta@Home) months ago...it's a great way to integrate SQL Reporting services w/ something like Rosetta@Home, and provide really great service for not only the community users - but also for the researchers using the system. Below is the architecture diagram...
Reporting@Home: Delivering Dynamic Graphical Feedback to Participants and Researchers in Community Computing Projects
Stuart Ozer; David Kim; David Baker
February 2007
Available Documents: Word 638 KB, PDF 482 KB
A new generation of computationally intensive scientific research projects relies on volunteers from around the world contributing idle computer time to calculate mathematical models. Many of these projects utilize a common architecture to manage the scheduling and distribution of calculations and collection of results from participants. User engagement is critical to the success of these projects, and feedback to participants illustrating their role in the project’s progress is known to increase interest and strengthen the community. This article describes how one project -- University of Washington’s Rosetta@Home, which predicts and designs the folded conformations of proteins and protein complexes -- created a web-based, on-demand reporting system that graphically illustrates a user or team’s contributions to the project. The reporting service is also useful to the project scientists in assessing the utility of alternative models and computational techniques. The system relies on a comprehensive database platform that includes tools for data integration, data management, querying and web-based reporting. The reporting components integrate seamlessly with the rest of the project’s data and web infrastructure, and the report pages have proven to be popular among both participants and lab members.
Source: Reporting@Home: Delivering Dynamic Graphical Feedback to Participants and Researchers in Community Computing Projects
While I’ve been pushing the idea of using OLAP data cubes to evaluate scientific data for a while, I thought it might be a good time to pull together some relevant papers and links. I believe OLAP is ideal for analyzing large quantities of data, including time series information, because it makes it easier for the scientist/researcher to explore the data in real time and from tools they know, like Excel. For example, the data served up on the FluxData site is delivered by creating OLAP cubes using SQL Server Analysis Services.
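If the cube idea is new to you, the core operation is just rolling a measure up along whichever dimensions you care about. Here is a toy sketch in plain Python; it is not how SQL Server Analysis Services works internally, and the sites, dates and flux values are made up for illustration.

```python
from collections import defaultdict

# Made-up fact rows: (site, year, month, measured value).
FACTS = [
    ("SiteA", 2006, 1, 2.0),
    ("SiteA", 2006, 2, 4.0),
    ("SiteA", 2007, 1, 6.0),
    ("SiteB", 2006, 1, 1.0),
    ("SiteB", 2007, 1, 3.0),
]
DIMENSIONS = ("site", "year", "month")

def roll_up(facts, keep):
    """Average the measure over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, row count]
    for row in facts:
        key = tuple(row[i] for i in idx)
        sums[key][0] += row[3]
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

if __name__ == "__main__":
    print(roll_up(FACTS, ("site",)))         # one average per site
    print(roll_up(FACTS, ("site", "year")))  # drill down to site and year
    print(roll_up(FACTS, ()))                # grand average
```

A real cube precomputes and stores many of these aggregates, which is what makes slicing from Excel feel interactive.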
A couple of tools/links that might be of interest as well:
Here are a couple of papers that reference the use of OLAP for different types of scientific data.
Now this is fun science – Microsoft Research and Disney•Pixar have teamed up to offer guided tours of the universe with WorldWide Telescope. How better to get our children interested in science and the universe? For most of us it was the Apollo missions that got us interested in science and space; now WALL•E is a good ambassador.
WALL•E's Universe
Explore the Universe with WALL•E and Andrew Stanton. Zoom, pan, spin and learn about planets, constellations, stars and galaxies.
© Disney/Pixar
WorldWide Telescope
Graywulf is the natural evolution of Beowulf clusters – it brings together HPC clusters and databases to do efficient processing and data management. Its name and design also pay homage to Jim Gray, who helped champion the use of relational databases in scientific projects.
In its simplest form, Graywulf means having a database installed on each of the HPC compute nodes. This brings the computation to the data – one of the points Jim made quite often – and utilizes the power of databases (queries, stored procedures, etc.). Since it’s a generic architecture, Graywulf clusters can be built using any OS and any database. The ones in the case study below were implemented using Windows HPC Server and SQL Server, and the motivation was to be more efficient in doing the science – it’s always great to have innovative folks using technologies to do good work.
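To make the "database on every node" idea concrete, here is a small sketch in Python. It is only an illustration under loud assumptions: in-memory SQLite databases stand in for the per-node database servers, the `obs` table and its columns are invented, and nothing here reflects the actual Graywulf implementations in the case study.

```python
import sqlite3

def make_node(rows):
    """Create one 'compute node' with its own local database (in-memory SQLite here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE obs (object_id INTEGER, brightness REAL)")
    db.executemany("INSERT INTO obs VALUES (?, ?)", rows)
    return db

def local_summary(db, threshold):
    """Run the query next to the data; only a (count, sum) pair leaves the node."""
    return db.execute(
        "SELECT COUNT(*), COALESCE(SUM(brightness), 0.0) "
        "FROM obs WHERE brightness > ?",
        (threshold,),
    ).fetchone()

def cluster_mean(nodes, threshold):
    """Combine the small per-node summaries into one cluster-wide mean."""
    parts = [local_summary(db, threshold) for db in nodes]
    count = sum(c for c, _ in parts)
    total = sum(s for _, s in parts)
    return total / count if count else None

# Two hypothetical nodes, each holding its own slice of the observations.
nodes = [
    make_node([(1, 10.0), (2, 30.0)]),
    make_node([(3, 20.0), (4, 5.0)]),
]

if __name__ == "__main__":
    print(cluster_mean(nodes, 8.0))
```

The point of the design is in `local_summary`: the filtering and aggregation happen where the rows live, and only two numbers per node cross the network.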
“To put it simply, a scientist needs to be able to live within the data,” says Alexander Szalay, a cosmologist-turned-computer-scientist at The Johns Hopkins University (JHU) in Baltimore, Maryland. The power of information, Szalay says, is determined not by its quantity so much as how easy it is to access, manipulate and analyze. “It’s not just about doing the numerical calculations,” adds Andrew Simms, a biomedical health informatics graduate student working on protein structure analysis in Valerie Daggett’s bioengineering laboratory at the University of Washington (UW) in Seattle. “It’s also about assembling the data so we can run calculations while performing analyses and ad hoc explorations and then feed it all back into the data warehouse.”
Graywulf Takes Byte Out of Data Overload
Astronomers at The Johns Hopkins University and protein scientists at the University of Washington are using inexpensive computer hardware combined with powerful computing and database software to help manage and analyze a growing volume of scientific data.
For details, read the Graywulf case study.
Project Principals:
Alexander Szalay, Alumni Centennial Professor, Department of Physics and Astronomy, The Johns Hopkins University
Valerie Daggett, Professor of Bioengineering, University of Washington
Graywulf Takes Byte Out of Data Overload - Microsoft Research
With this ADK, users can convert their own astronomical images/data to a format that can be read by WWT and share it with other WWT users. Can’t wait to see more images/datasets made available.
WorldWide Telescope Academic Development Kit, January 2009 Release
The WorldWide Telescope (WWT) Academic Development Kit, January 2009 release contains two utilities that enable people to convert their astronomical images, panoramas, sky surveys, and planetary textures to a format that can be read by WWT and shared with other WWT users. It produces image pyramids of the photographs, thumbnails, and WTML files. WTML files are XML files in the WWT format that point to the images on the Internet and store details of how they are to be displayed in WWT and metadata such as image title and credits. The WWT SphereToaster Tool enables users to provide images in an equirectangular format that covers all or part of the inside or outside of a sphere. This includes, for example, cylindrical projections of panoramas and all-sky surveys. SphereToaster converts these to a different projection system—the TOAST system, currently unique to WWT—and then stores an image pyramid of the resulting TOAST-projected image. The tool also produces thumbnails and WTML files. The WWT StudyChopper Tool enables users to provide photographs of small parts of the sky, such as a high-resolution image of the Crab Nebula, and enter appropriate coordinate information and metadata. It creates image pyramids of the photographs, thumbnails, and WTML files. Once the output image pyramids and thumbnails are hosted by the user's servers and the WTML files are made available to others, anyone with access to the WTML files will be able to browse the images in WWT.
WorldWide Telescope Academic Development Kit, January 2009 Release - Microsoft Research
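If "image pyramid" is an unfamiliar term: the image is stored at several resolutions, each cut into fixed-size tiles, so a viewer only fetches the tiles it needs at the current zoom. Here is a rough sketch in Python of how the levels work out. It shows the generic pyramid idea only, not the TOAST projection or the ADK tools themselves, and the 256-pixel tile size is my assumption.

```python
import math

TILE = 256  # assumed tile edge in pixels; check the ADK documentation for the real value

def pyramid_levels(width, height, tile=TILE):
    """List (level, width, height, tiles_x, tiles_y) from coarsest to full resolution."""
    # Halve the image until its longest side fits inside a single tile.
    depth = 0
    while max(width, height) > tile * 2 ** depth:
        depth += 1
    levels = []
    for level in range(depth + 1):
        scale = 2 ** (depth - level)
        w, h = math.ceil(width / scale), math.ceil(height / scale)
        levels.append((level, w, h, math.ceil(w / tile), math.ceil(h / tile)))
    return levels

if __name__ == "__main__":
    for row in pyramid_levels(1024, 512):
        print(row)
```

Each level doubles the resolution of the one before it, so the tile count grows roughly fourfold per level while any single view still needs only a handful of tiles.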
Since I use SharePoint Designer a lot for editing internal SharePoint sites, this is a really good thing to see.
Customize SharePoint with SharePoint Designer
We are implementing a number of changes to promote and facilitate even more customization efforts on top of the SharePoint platform including making SharePoint Designer 2007 available as a free download.
Download SharePoint Designer 2007 for free
SharePoint Designer Home Page - Microsoft Office Online
It’s been a rainy Friday – so for fun I wanted to see how my WorldMap is doing. I have hits from all 50 states, except North Dakota – so my challenge is how do I get one of the just over 640K residents to hit my blog….if you have any thoughts on it or have relatives in ND send them my link…
btw – I found the state facts of North Dakota quite interesting – Nicknames: Peace Garden State, Flickertail State, and Roughrider State – Motto: Liberty and Union Now and Forever, One and Inseparable – Beverage: Milk – Fossil: Teredo Petrified Wood (Teredo was a worm-shaped mollusk) – and the State Fruit: Chokecherry (I never would have guessed it is part of the rose family).
Once I get that North Dakotan – the next goal – how to get hits from all the Canadian provinces – looks like I still need Yukon, Northwest Territories and Saskatchewan. :-)
Thanks to Office OFFline I was able to create my own Shoe Circus Clown Club card as featured in the new ads. Just make sure you right click on the link and save as.
Today at the MSR Faculty Summit I was able to play with the Sphere Project, and it’s a really unique way to interact with information/data. It makes you think about what data could be placed in a system like this and how to interact with it. We’ll be playing with it more and are interested in ideas on how to enable environmental projects with this system…
[Cnet]Academics to get a glimpse of Microsoft's Sphere
Video submitted to UIST '08 conference (wmv 30MB)
Sphere: A Multi-Touch Interactive Spherical Display
Sphere is an interactive spherical display prototype that uses custom optics hardware as well as computer vision and graphics software to enable interaction on a spherical surface.
Sphere Project
The Interactive Visual Media Group from Microsoft Research has released Image Composite Editor (ICE), which lets you stitch together images and output to multi-resolution tiled formats – great for creating HD View and Silverlight Deep Zoom images…
More details on it at Matt’s HDView Blog
Image Composite Editor (ICE)
What is ICE?
ICE is an advanced panoramic image stitcher. You shoot a set of overlapping photographs of a scene from a single location, and ICE creates a high-resolution panorama incorporating all your images at full resolution. Then save your stitched panorama in a wide variety of formats, from common formats like JPEG and TIFF to multi-resolution tiled formats like HD View and Silverlight Deep Zoom.
Microsoft Research Image Composite Editor (ICE)
Savas and I had talked about this idea of using Silverlight-based cycle stealing and wondered how well it would work. It's good to see this article on CodeProject about Legion: Build your own virtual super computer with Silverlight by Daniel Vaughan.
Legion is a Grid Computing framework that uses the Silverlight CLR to execute user definable tasks. Legion uses an ASP.NET application and web services to download tasks, upload result data, and provide grid-wide thread-safe operations for web clients or agents. Multiple tasks can be hosted at once, with Legion managing the delegation of tasks to agents. Client performance metrics, such as bandwidth and processor speed, may be used to tailor jobs for clients. Legion provides a management service and WPF application that is used to monitor the Legion grid. I have deployed Legion to a demonstration server here so you can see it in action.
I wonder if Daniel is aware of the previous Legion grid system by Andrew Grimshaw that turned into Avaki (now part of Sybase)
Scientists should find this Word 2007 add-in very useful, especially when submitting to PubMed Central. It also shows how this approach could be used with other systems that need to capture metadata at the time of authoring.
This Technology Preview release of the Article Authoring Add-in for Microsoft Word 2007 provides authors of scientific articles with the ability to read and write files from Word 2007 into the XML format used by the National Library of Medicine for archiving articles in the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature, PubMed Central.
Download details: Article Authoring Add-in
More details on the add-in at Savas' blog
An MSR Tech Report is now available on Statistical Resolution of Ambiguous HLA Typing Data - Here's the non-technical summary of the paper:
At the core of the human adaptive immune response is the train-to-kill mechanism in which specialized immune cells are sensitized to recognize small peptides from foreign sources (e.g., from HIV virus or bacteria). Following this sensitization, these immune cells are then activated to kill other cells which display this same peptide (and which contain this same foreign peptide). However, in order for sensitization and killing to occur, the foreign peptide must be ‘paired up’ with one of the infected person’s other specialized immune molecules—an HLA molecule. The way in which peptides interact with these HLA molecules defines if and how an immune response will be generated. There is a huge repertoire of such HLA molecules, with almost no two people having the same set. Furthermore, a person’s HLA type can determine their susceptibility to disease, or the success of a transplant, for example. However, obtaining high quality HLA data for patients is often difficult because of the great cost and specialized laboratories required, or because the data are historical and cannot be retyped with modern methods. Therefore, we introduce a statistical model which can make use of existing high-quality HLA data, to infer higher-quality HLA data from lower-quality data.
Statistical Resolution of Ambiguous HLA Typing Data
Here's a really good article from the Berkeley Lab View on the work that Catharine Van Ingen and Stuart Ozer from MSR have been involved in w/ LBL and the Berkeley Water Center. The use of SQL Server Analysis Services and Reporting Services has made a real difference in how scientists can explore AmeriFlux and water sensor data. You can see the datasets for AmeriFlux and the Russian River at http://bwc.berkeley.edu/. Also, Deb Agarwal and team have done a really good job w/ their User Manual outlining how to access the data via the web and Excel.
Lab Team Helping Smooth Flow of Water Data
By Jon Bashor
A collaboration among Microsoft, Berkeley Lab and UC Berkeley is underway to make it easier for researchers to access and analyze collected data on water, with the goal of accelerating research in the increasingly important areas of water supply and climate change.
Called Microsoft e-Science, the project is part of the Berkeley Water Center’s effort to marshal expertise from public institutions and the private sector to enable researchers to easily access and work with water data. The year-old center is the brainchild of Berkeley Lab’s Computational Research Division (CRD), UC Berkeley’s College of Engineering and UC Berkeley’s College of Natural Resources.
Source: [Read More] Berkeley Lab View -- March 16, 2007
Very cool - a lightweight way to share applications...brings me back to the NetMeeting days.
There is even integration with Word - could this be the way for academic papers to be written, so that they aren't being emailed back and forth all the time?
If a Microsoft Office Word document is being edited during a SharedView session, the Track Changes feature in Word is automatically enabled, and each change is highlighted with a text identifier indicating which user made the change.
Hold more effective meetings and conference calls: Connect with up to 15 people in different locations and get your point across by showing them what's on your screen.
Work together in real time: Share, review, and update documents with multiple people in real time.
Use when and where you want: SharedView is easy to use, from anywhere, at a moment's notice.
Source: Microsoft SharedView Beta
Sat through a presentation on Paint.NET – really cool paint app built on .NET from students from WSU. They are coming out with a 2.0 version on Dec 17th.
One of the best-hidden features in Word 2007 is the equation editor – it allows you to input equations using the linear format, and the equations that are generated are truly visually appealing.
There are some videos showing the use of the equation editor, but I just saw that Murray Sargent is the “star” of a new video walking through some complex equations and showing some of the other formatting/alignment features that are included.
Silverlight version of Video
There is a new release of the Python Tools for Visual Studio and it includes Pyvot: a connector to Excel that allows data transfer and manipulation – check out the tutorial. It also has PyKinect, to leverage Kinect for new natural user interactions (NUIs)…
An integrated environment for developing Python in VS2010
PTVS 1.1 Alpha is Live!
Supports CPython and IronPython
Python editor with advanced member and signature intellisense
Code navigation: “Find all refs”, goto definition, and object browser
Local and remote debugging
Profiling with multiple views
Integrated REPL window with inline matplotlib graphics
Support for HPC clusters and MPI, including debugging & profiling
Interactive parallel computing via integrated IPython REPL
Python Tools for Visual Studio
Last week I found that the files in one of my picture directories had lost their file extensions – so I thought I’d spin up PowerShell in Win8 and see if I could remember how to fix it.
After a little trial and error I ended up with the following PowerShell script – it reminded me how powerful and easy PowerShell is for scripting all of Windows.
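The PowerShell script itself isn't reproduced here, so what follows is not that script. As a rough stand-in for the same idea, here is a sketch in Python that guesses each picture's type from its first few bytes and appends the matching extension. The function names, the short signature list, and the choice to sniff file contents at all are my assumptions.

```python
from pathlib import Path

# Leading "magic" bytes for a few common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
]

def guess_extension(path):
    """Return an extension based on the file's first bytes, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return None

def restore_extensions(folder, dry_run=True):
    """Add extensions to extensionless files in `folder`; return (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix:
            continue  # skip subfolders and files that already have an extension
        ext = guess_extension(path)
        if ext is None:
            continue  # unknown content: leave the file alone
        target = path.with_name(path.name + ext)
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Running it with the default `dry_run=True` only reports what would be renamed, which is a cheap safety net before touching a folder of photos.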
While the Data Deluge is upon the scientific communities, how to manage and share scientific data is still a challenge. To be truly useful to scientists and general consumers, data needs to be Discoverable, Accessible, and Consumable.
Discoverable – How do you find the data? Searching for data via search engines is not the right way to find the information. Sites like data.gov are a good start for getting to scientific data, but how do you find the smaller pots of data?
Accessible – To do anything useful with the information/data, it needs to be made available. That means the data needs to be easily downloaded, not hidden behind many web pages or locked up behind passwords.
Consumable – It needs to be straightforward (one-click) to bring the data into applications for analysis (e.g., Excel, MATLAB, etc.). Put it into the hands of the users.
Today is the 2nd anniversary of the launch of WWT – congrats to Jonathan, Curtis, and the rest of the small team. Besides the initial Windows client (that led to Scoble’s post - Microsoft researchers make me cry) there is the web client (Silverlight), a web control, and the Bing Maps WWT add-in.
There is still more to come…
Great to see someone having fun and getting into SOAP
Love the message campaign
I understand the need to think about legacy applications and that companies must keep their customers happy. So if customers want to use Web Services technologies in order to build their object-oriented systems, why not? Let’s encourage them (e.g., CORBA binding to WSDL). I don’t want to restart the old argument on why WSDL is not yet another object IDL (post 1, post 2, post 3) but it seems that we are treating it as such. I think we are forgetting the importance of SOAP, the importance of the message. The Indigo people get it.
So... I would like to start a campaign for the promotion of SOAP, the “love the message campaign” or “love SOAP campaign”. Here’s a ribbon to go with it. What do you think? Can we make this happen? Can we make people believers? Spread the word!!! :-)