<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Dan on eScience &amp; Technical Computing @ Microsoft : Data Analysis</title><link>http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx</link><description>Tags: Data Analysis</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Project Gemini – Videos available</title><link>http://blogs.msdn.com/dan_fay/archive/2009/08/26/project-gemini-videos-available.aspx</link><pubDate>Wed, 26 Aug 2009 17:35:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9885395</guid><dc:creator>Dan Fay</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/9885395.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=9885395</wfw:commentRss><description>&lt;p&gt;I previously posted about using Project Gemini as a tool for &lt;a href="http://blogs.msdn.com/dan_fay/archive/2009/08/21/science-analytics-look-to-use-project-gemini.aspx" target="_blank"&gt;scientific analysis&lt;/a&gt; – as a quick way to learn about them and see them in action, take a look at the &lt;a href="http://www.youtube.com/geminute" target="_blank"&gt;one minute Gemini videos&lt;/a&gt; by &lt;a href="http://twitter.com/donalddotfarmer"&gt;Donald Farmer&lt;/a&gt;.&amp;#160; One of the latest videos is about using &lt;a href="http://www.youtube.com/watch?v=uSt5UqQmk7A" target="_blank"&gt;Reports as data sources&lt;/a&gt; and highlights the orange button for getting data sets.&amp;#160; This is a perfect way for scientists to publish data and easily make the feeds available for others to consume.&amp;#160; It could even be included in papers to easily 
enable research reproducibility.&amp;#160; &lt;/p&gt;  &lt;div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:9d124fe6-e3b5-4a59-9ec3-9e458116abc1" class="wlWriterEditableSmartContent"&gt;&lt;div id="3df43d5e-15cf-4c80-aa76-59c775592286" style="margin: 0px; padding: 0px; display: inline;"&gt;&lt;div&gt;&lt;a href="http://www.youtube.com/watch?v=uSt5UqQmk7A" target="_new"&gt;&lt;img src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/ProjectGeminiVideosavailable_6AA3/video52a995bec9b5.jpg" style="border-style: none" galleryimg="no" onload="var downlevelDiv = document.getElementById('3df43d5e-15cf-4c80-aa76-59c775592286'); downlevelDiv.innerHTML = &amp;quot;&amp;lt;div&amp;gt;&amp;lt;object width=\&amp;quot;425\&amp;quot; height=\&amp;quot;355\&amp;quot;&amp;gt;&amp;lt;param name=\&amp;quot;movie\&amp;quot; value=\&amp;quot;http://www.youtube.com/v/uSt5UqQmk7A&amp;amp;hl=en\&amp;quot;&amp;gt;&amp;lt;\/param&amp;gt;&amp;lt;embed src=\&amp;quot;http://www.youtube.com/v/uSt5UqQmk7A&amp;amp;hl=en\&amp;quot; type=\&amp;quot;application/x-shockwave-flash\&amp;quot; width=\&amp;quot;425\&amp;quot; height=\&amp;quot;355\&amp;quot;&amp;gt;&amp;lt;\/embed&amp;gt;&amp;lt;\/object&amp;gt;&amp;lt;\/div&amp;gt;&amp;quot;;" alt=""&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;  &lt;p&gt;Thanks to &lt;a href="http://blogs.msdn.com/robertbruckner/archive/2009/08/25/reports-as-data-feeds-for-gemini.aspx"&gt;Robert Bruckner's Advanced Reporting Services Blog : Reports As Data Feeds for Gemini&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9885395" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/eScience/default.aspx">eScience</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category><category 
domain="http://blogs.msdn.com/dan_fay/archive/tags/Gemini/default.aspx">Gemini</category></item><item><title>Graywulf Takes Byte Out of Data Overload</title><link>http://blogs.msdn.com/dan_fay/archive/2009/05/08/graywulf-takes-byte-out-of-data-overload.aspx</link><pubDate>Fri, 08 May 2009 20:46:17 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9597386</guid><dc:creator>Dan Fay</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/9597386.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=9597386</wfw:commentRss><description>&lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/GraywulfTakesByteOutofDataOverload_F3E8/jimgray_2.gif"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="jimgray" border="0" alt="jimgray" align="right" src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/GraywulfTakesByteOutofDataOverload_F3E8/jimgray_thumb.gif" width="147" height="244" /&gt;&lt;/a&gt;Graywulf is the natural evolution of &lt;a href="http://en.wikipedia.org/wiki/Beowulf_cluster" target="_blank"&gt;Beowulf Clusters&lt;/a&gt; – it brings together HPC clusters and databases to do &lt;a href="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/GraywulfTakesByteOutofDataOverload_F3E8/graywulf-full-color_2.jpg"&gt;&lt;img style="border-right-width: 0px; margin: 5px 5px 5px 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="graywulf-full-color" border="0" alt="graywulf-full-color" align="left" src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/GraywulfTakesByteOutofDataOverload_F3E8/graywulf-full-color_thumb.jpg" width="244" height="148" /&gt;&lt;/a&gt;efficient processing and data management.&amp;#160; Its name and design also pay homage to &lt;a 
href="http://research.microsoft.com/en-us/um/people/gray/" target="_blank"&gt;Jim Gray&lt;/a&gt; – who helped champion the use of relational databases in scientific projects.&lt;/p&gt;  &lt;p&gt;In its simplest form, Graywulf means having a database installed on each of the HPC compute nodes – this brings the data to the computation (one of the points Jim made quite often) and utilizes the power of databases (queries, stored procedures, etc.).&amp;#160; Since it’s a generic architecture, Graywulf clusters can be built using any OS and any database… the ones in the case study below were implemented using &lt;a href="http://www.microsoft.com/hpc"&gt;Windows HPC Server&lt;/a&gt; and &lt;a href="http://www.microsoft.com/sql"&gt;SQL Server&lt;/a&gt;, and the motivation was to be more efficient in doing the science – it’s always great to have innovative folks using technologies to do good work.&amp;#160; &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;“To put it simply, a scientist needs to be able to live within the data,” says Alexander Szalay, a cosmologist-turned-computer-scientist at The Johns Hopkins University (JHU) in Baltimore, Maryland. The power of information, Szalay says, is determined not by its quantity so much as how easy it is to access, manipulate and analyze.     &lt;br /&gt;“It’s not just about doing the numerical calculations,” adds Andrew Simms, a biomedical health informatics graduate student working on protein structure analysis in Valerie Daggett’s bioengineering laboratory at the University of Washington (UW) in Seattle. 
“It’s also about assembling the data so we can run calculations while performing analyses and ad hoc explorations and then feed it all back into the data warehouse.”&lt;/p&gt; &lt;/blockquote&gt;  &lt;blockquote&gt;   &lt;h4&gt;&lt;a title="Graywulf Takes Byte Out of Data Overload" href="http://research.microsoft.com/en-us/collaboration/focus/e3/graywulf.aspx"&gt;Graywulf Takes Byte Out of Data Overload&lt;/a&gt;&lt;/h4&gt;    &lt;p&gt;&lt;img style="margin: 0px 0px 0px 5px; display: inline" title="Graywulf takes byte out of data overload" alt="Graywulf takes byte out of data overload" align="right" src="http://research.microsoft.com/en-us/collaboration/focus/e3/graywulf1.jpg" /&gt;Astronomers at The Johns Hopkins University and protein scientists at the University of Washington are using inexpensive computer hardware combined with powerful computing and database software to help manage and analyze a growing volume of scientific data. &lt;/p&gt;    &lt;p&gt;For details, read the &lt;a href="http://research.microsoft.com/en-us/collaboration/focus/e3/graywulf.pdf"&gt;Graywulf case study&lt;/a&gt;. 
&lt;/p&gt;    &lt;h5&gt;Project Principals&lt;/h5&gt;    &lt;ul&gt;     &lt;li&gt;&lt;a href="http://physics-astronomy.jhu.edu/people/faculty/szalay.html"&gt;Alexander Szalay&lt;/a&gt;, Alumni Centennial Professor, Department of Physics and Astronomy, The Johns Hopkins University &lt;/li&gt;      &lt;li&gt;&lt;a href="http://depts.washington.edu/daglab/valerie.html"&gt;Valerie Daggett&lt;/a&gt;, Professor of Bioengineering, University of Washington &lt;/li&gt;   &lt;/ul&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://research.microsoft.com/en-us/collaboration/focus/e3/graywulf.aspx"&gt;Graywulf Takes Byte Out of Data Overload - Microsoft Research&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9597386" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/WinHPC/default.aspx">WinHPC</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Research/default.aspx">Research</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Science/default.aspx">Science</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Graywulf/default.aspx">Graywulf</category></item><item><title>new CloudApp(): Azure™ Developer Challenge</title><link>http://blogs.msdn.com/dan_fay/archive/2009/05/05/new-cloudapp-azure-developer-challenge.aspx</link><pubDate>Tue, 05 May 2009 19:53:51 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9589221</guid><dc:creator>Dan Fay</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/9589221.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=9589221</wfw:commentRss><description>&lt;p&gt;The Azure folks have setup the &lt;a 
href="http://www.newcloudapp.com/" target="_blank"&gt;new CloudApp&lt;a href="http://www.azure.com/" target="_blank"&gt;&lt;img style="margin: 0px 0px 0px 5px; display: inline" title="Windows Azure Logo" border="0" alt="Windows Azure Logo" align="right" src="http://www.microsoft.com/azure/images/Windowsazuresmall.gif" /&gt;&lt;/a&gt;()&lt;/a&gt; challenge – I’d like to see some Science apps created on Azure and see how it changes the way to work with different datasets.&amp;#160; &lt;/p&gt;  &lt;blockquote&gt;   &lt;h3&gt;&lt;a title="new CloudApp() - The Azure™ Services Platform Developer Challenge" href="http://www.newcloudapp.com/"&gt;new CloudApp() - The Azure™ Services Platform Developer Challenge&lt;/a&gt;&lt;/h3&gt;    &lt;h4&gt;What Is It?&lt;/h4&gt;    &lt;p&gt;new CloudApp() is a US-based developer challenge for .NET &amp;amp; PHP developers creating cloud applications or services (hereafter &amp;quot;application&amp;quot;) on the Azure™ Services Platform. Have your application judged by industry leaders &lt;strong&gt;Om Malik&lt;/strong&gt; and &lt;strong&gt;Michael Cote&lt;/strong&gt; and share your cloud coding skills with other developers. 
Grand Prize Winners will be announced on stage at &lt;a href="http://events.gigaom.com/structure/09/"&gt;Structure 09&lt;/a&gt; and featured on &lt;a href="http://azure.com/"&gt;azure.com&lt;/a&gt;.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://www.newcloudapp.com/"&gt;new CloudApp(): The Azure™ Services Platform Developer Challenge - Home&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9589221" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Science/default.aspx">Science</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Azure/default.aspx">Azure</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Cloud+Computing/default.aspx">Cloud Computing</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Competition/default.aspx">Competition</category></item><item><title>Using Flickr for Astronomy – and viewing in WWT</title><link>http://blogs.msdn.com/dan_fay/archive/2009/02/20/using-flickr-for-astronomy-and-viewing-in-wwt.aspx</link><pubDate>Sat, 21 Feb 2009 01:55:17 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9436658</guid><dc:creator>Dan Fay</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/9436658.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=9436658</wfw:commentRss><description>&lt;p&gt;The use of online services such as Flickr to help scientists is in its infancy and applications utilizing commodity based solutions will continue to pick up momentum.&amp;#160; I especially like the integration and the ease of use – science should be about discovery and exploration – not about the technology.&amp;#160;&amp;#160; Of course the ability to view those analyzed images in WorldWide Telescope completes the 
circle and allows you to view the image in context.&amp;#160; &lt;/p&gt;  &lt;p&gt;Check out the &lt;a href="http://www.flickr.com/photos/flxzr/3053801145/in/pool-astrometry/"&gt;Orion Nebula&lt;/a&gt;. &lt;a href="http://www.worldwidetelescope.com/wwtweb/ShowImage.aspx?scale=2.74&amp;amp;name=Orion+Nebula&amp;amp;imageurl=http://farm4.static.flickr.com/3150/3053801145_c41d557253_o.jpg&amp;amp;credits=Alan+Third+(All+Rights+Reserved)&amp;amp;creditsUrl=&amp;amp;ra=83.8540026266&amp;amp;y=1007&amp;amp;x=1519&amp;amp;rotation=156.40&amp;amp;dec=-5.03028217595&amp;amp;thumb=http://farm4.static.flickr.com/3150/3053801145_7b07fb1495_t.jpg" target="_blank"&gt;&lt;img style="border-right-width: 0px; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" border="0" src="http://sharepoint/sites/erwkgrp/Earth%20Energy%20%20Environment/WWT%20Academic%20Program/viewInWWT.jpg" /&gt;&lt;/a&gt;&amp;#160; &lt;br /&gt;After it opens up – click on the thumbnail at the top. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/UsingFlickrforAstronomyandviewinginWWT_CDC9/web_corona_rot_6A767906%5B1%5D_2.jpg"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="web_corona_rot_6A767906[1]" border="0" alt="web_corona_rot_6A767906[1]" align="right" src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/UsingFlickrforAstronomyandviewinginWWT_CDC9/web_corona_rot_6A767906%5B1%5D_thumb.jpg" width="79" height="69" /&gt;&lt;/a&gt;You can also add your own – check out Dinoj’s post on the WWT Data Blog - &lt;a href="http://community.research.microsoft.com/blogs/wwt_data_blog/archive/2008/11/27/sticking-images-on-the-sky-with-wwt.aspx"&gt;Sticking images on the sky with WWT&lt;/a&gt;.&amp;#160; For fun you can see the crown for the Corona Borealis overlaid on the sky &lt;a 
href="http://www.worldwidetelescope.org/wwtweb/ShowImage.aspx?name=Crown " target="_blank"&gt;&lt;img style="border-right-width: 0px; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" border="0" src="http://sharepoint/sites/erwkgrp/Earth%20Energy%20%20Environment/WWT%20Academic%20Program/viewInWWT.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;See the article written by &lt;a href="http://www.readwriteweb.com/about_Frederic.php"&gt;Frederic Lardinois&lt;/a&gt; on &lt;a title="ReadWriteWeb" href="http://www.readwriteweb.com/archives/using_flickr_for_astronomy.php"&gt;ReadWriteWeb&lt;/a&gt;.&lt;/p&gt;  &lt;blockquote&gt;   &lt;h3&gt;&lt;a title="The Great Gig in the Sky: Using Flickr for Astronomy" href="http://www.readwriteweb.com/archives/using_flickr_for_astronomy.php"&gt;The Great Gig in the Sky: Using Flickr for Astronomy&lt;/a&gt; &lt;/h3&gt;    &lt;p&gt;&lt;img style="display: inline; margin-left: 0px; margin-right: 0px" alt="flickr_astronomy_logo.jpg" align="left" src="http://www.readwriteweb.com/images/flickr_astronomy_logo.jpg" /&gt;&lt;a href="http://flickr.com"&gt;Flickr &lt;/a&gt;hosts a wide range of beautiful images, but a new project built on top of Flickr's API only focuses on photos of the night sky from amateur astronomers. The &lt;a href="http://astrometry.net/"&gt;Astrometry.net project&lt;/a&gt; constantly scans the &lt;a href="http://www.flickr.com/groups/astrometry/"&gt;Astrometry Flickr group&lt;/a&gt; for new images to catalog and to add to its &lt;a href="http://astrometry.net/summary.html"&gt;open-source sky survey&lt;/a&gt;. 
At the same time, this project also provides a more direct service to the amateur astronomers, as it also analyzes each image and returns a high-quality description of the photo's contents.&lt;/p&gt;    &lt;p&gt;The Astrometry group currently has over 400 members, and as &lt;a href="http://skydrive.live.com/"&gt;Christoper Stumm&lt;/a&gt;, a member of the Astrometry.net team, told the &lt;a href="http://code.flickr.com/blog/2009/02/18/found-in-space/"&gt;Flickr Code&lt;/a&gt; blog, the back-end software uses &lt;a href="http://en.wikipedia.org/wiki/Geometric_hashing"&gt;geometric hashing&lt;/a&gt; to exactly pinpoint and describe the objects in the images. When you submit an image to the Flickr pool, the robot will not just respond with a comment that contains an exact description of what you see in the image, but it will also annotate the image automatically.&lt;/p&gt;    &lt;p&gt;&lt;img alt="astrometry_flickr_feb09.png" align="right" src="http://www.readwriteweb.com/images/astrometry_flickr_feb09.png" /&gt;While a lot of members of the Astrometry group use &lt;a href="http://www.pbase.com/david_r_astrophotography/equipment"&gt;high-end telescopes and cameras&lt;/a&gt;, the Astrometry.net solver can also analyze &lt;a href="http://www.flickr.com/photos/prawnwarp/3173311602/in/pool-astrometry"&gt;images&lt;/a&gt; from consumer-level digital cameras.&lt;/p&gt;    &lt;p&gt;While just being able to automatically analyze and catalog these images is pretty cool already, every description also contains a link that displays the image in Microsoft's &lt;a href="http://www.worldwidetelescope.org/Home.aspx"&gt;WordWide Telescope&lt;/a&gt;. 
&lt;/p&gt;    &lt;p&gt;Astronomy is one of those few scientific disciplines where dedicated amateurs can still make &lt;a href="http://www.washingtonpost.com/wp-dyn/articles/A28301-2004Mar3.html"&gt;major discoveries&lt;/a&gt; and this is definitely one of the cooler applications of Flickr's API that we have seen in a long time.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;a href="http://www.readwriteweb.com/archives/using_flickr_for_astronomy.php"&gt;The Great Gig in the Sky: Using Flickr for Astronomy - ReadWriteWeb&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9436658" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Research/default.aspx">Research</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Cool+Software/default.aspx">Cool Software</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Science/default.aspx">Science</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/WWT/default.aspx">WWT</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category></item><item><title>New Tools Mobilize Local Data to Study Global Environmental Issues from Berkeley Lab</title><link>http://blogs.msdn.com/dan_fay/archive/2009/02/04/new-tools-mobilize-local-data-to-study-global-environmental-issues-from-berkeley-lab.aspx</link><pubDate>Thu, 05 Feb 2009 07:06:12 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9397495</guid><dc:creator>Dan Fay</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/9397495.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=9397495</wfw:commentRss><description>&lt;p&gt;Here’s a really good article from the folks at Lawrence Berkeley National Laboratory on the collaboration MSR has ongoing between LBL and the Berkeley Water Center.&amp;#160; It highlights the 
use of databases for scientific information as Catharine mentions… &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;“One of the greatest challenges of the next century will be developing cyber-architectures that allow scientists to easily navigate their digital assets. Today, the internet has given environmental researchers instant access to a wealth of field data. Now, they need a scientific ‘safety deposit box’ system that will not only store this information, but also organize it so it is searchable and ready for analysis,” says van Ingen.&lt;/p&gt; &lt;/blockquote&gt;  &lt;blockquote&gt;   &lt;h4&gt;&lt;a href="http://newscenter.lbl.gov/feature-stories/2009/02/04/local-data-environmental-issues/"&gt;New Tools Mobilize Local Data to Study Global Environmental Issues&lt;/a&gt;&lt;/h4&gt;    &lt;h5&gt;&lt;i&gt;&lt;/i&gt;&lt;/h5&gt;    &lt;p&gt;Guarding water supplies, protecting endangered species and curbing greenhouse gases is going high-tech. Environmental scientists are turning to innovative cyber-infrastructures and data-mining tools.&lt;/p&gt;    &lt;p&gt;&lt;a href="http://newscenter.lbl.gov/wp-content/uploads/fkux-tower-at-tonzi.jpg"&gt;&lt;img style="display: inline; margin-left: 0px; margin-right: 0px" title="fkux-tower-at-tonzi" alt="" align="right" src="http://newscenter.lbl.gov/wp-content/uploads/fkux-tower-at-tonzi-300x225.jpg" width="297" height="222" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;As they strive to develop effective strategies for guarding water supplies, protecting endangered species and curbing greenhouse gases, environmental scientists are turning to innovative cyber-infrastructures and data-mining tools developed by an ongoing collaboration between researchers at Lawrence Berkeley National Laboratory, Microsoft Research, and the University of California, Berkeley.&lt;/p&gt;    &lt;p&gt;The Microsoft eScience program is the primary funder of this project, which is one of numerous ventures cultivated by the Berkeley Water Center (BWC). 
Launched approximately three years ago by researchers from the Berkeley Lab and UC Berkeley’s Colleges of Engineering and Natural Resources, the BWC marshals expertise from public institutions and the private sector in support of projects that enable science and public policy researchers to more easily access and work with water and environmental datasets.&lt;/p&gt;    &lt;p&gt;“The most cost-efficient way to impact issues like global climate change and water management is to develop cyber-architectures that organize data and foster scientific collaboration,” says Susan Hubbard, staff scientist in the Berkeley Lab’s Earth Sciences Division and associate director of the BWC.&lt;/p&gt;    &lt;p&gt;Environmental scientists typically collect data on a project-by-project basis, in campaigns targeted at very specific topics. One study may use NASA satellites to track annual rainfall of deserts around the globe, while another project sponsored by the National Science Foundation (NSF) might measure the annual water tables of the Sahara desert with commercial sensors. The data are then typically stored in local archive systems and accessed by researchers associated with that particular project. 
These sites are scattered across the country, tend to be aligned with specific campaigns, and are funded by a variety of organizations.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Rest of the article at: &lt;a href="http://newscenter.lbl.gov/feature-stories/2009/02/04/local-data-environmental-issues/"&gt;New Tools Mobilize Local Data to Study Global Environmental Issues « Berkeley Lab News Center&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9397495" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/eScience/default.aspx">eScience</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Research/default.aspx">Research</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Science/default.aspx">Science</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Article/default.aspx">Article</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category></item><item><title>Data Mining Services in the Cloud – Mine your Data, Any Place, Any Time</title><link>http://blogs.msdn.com/dan_fay/archive/2008/09/04/data-mining-services-in-the-cloud-mine-your-data-any-place-any-time.aspx</link><pubDate>Thu, 04 Sep 2008 22:24:38 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8925029</guid><dc:creator>Dan Fay</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/8925029.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=8925029</wfw:commentRss><description>&lt;p&gt;This is great news - &lt;a href="http://www.microsoft.com/softwareplusservices/" target="_blank"&gt;Software-plus-Services&lt;/a&gt; that any scientist/researcher could use.&amp;#160; The SQL Server Data Mining folks have a &lt;a title="Data Mining Service " href="http://www.sqlserverdatamining.com/cloud/"&gt;Data Mining Service&lt;/a&gt; in the 
cloud they are testing out…I posted previously [&lt;a href="http://blogs.msdn.com/dan_fay/archive/2008/08/20/olap-and-scientific-data.aspx"&gt;OLAP and Scientific Data&lt;/a&gt; &amp;amp; &lt;a href="http://blogs.msdn.com/dan_fay/archive/2007/02/28/data-mining-addins-for-office-2007-excel-visio.aspx"&gt;Data Mining Addins for Office 2007 (Excel &amp;amp; Visio)&lt;/a&gt;] about the Excel addins that allow anyone with Excel to do Data Mining on Excel tables.&amp;#160; Now&lt;a href="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/DataMiningServicesintheCloudMineyourData_AE80/image_2.png"&gt;&lt;img title="image" style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="162" alt="image" src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/DataMiningServicesintheCloudMineyourData_AE80/image_thumb.png" width="244" align="right" border="0" /&gt;&lt;/a&gt; the team is testing out SQL Server Data Mining Services – from which you can do the data mining directly from Excel 2007 or even upload a csv file.&amp;#160; &lt;/p&gt;  &lt;p&gt;So for fun – I downloaded a csv file of a stream gauge near Redmond into Excel and ran the “Highlight Exceptions” tool to find outliers in the dataset – it read the table, uploaded it to the service and in seconds returned the results - which included the number of outliers - in this case air temperature and it also highlighted in the table the &lt;a href="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/DataMiningServicesintheCloudMineyourData_AE80/image_4.png"&gt;&lt;img title="image" style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="151" alt="image" src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/DataMiningServicesintheCloudMineyourData_AE80/image_thumb_1.png" width="244" align="right" border="0" /&gt;&lt;/a&gt;rows.&amp;#160; It was so easy.&amp;#160; I can see it being used for many scientific datasets - even to clean them 
before doing other analysis, charting, graphs, uploads, etc. &lt;/p&gt;  &lt;p&gt;The Table Analysis Tools included are:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Analyze Key Influencers&lt;/li&gt;    &lt;li&gt;Detect Categories&lt;/li&gt;    &lt;li&gt;Fill from Example&lt;/li&gt;    &lt;li&gt;Forecast&lt;/li&gt;    &lt;li&gt;Highlight Exceptions&lt;/li&gt;    &lt;li&gt;Scenario Analysis&lt;/li&gt;    &lt;li&gt;Prediction Calculator&lt;/li&gt;    &lt;li&gt;Shopping Basket Analysis&lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;h4&gt;&lt;a href="http://www.sqlserverdatamining.com/cloud" target="_blank"&gt;Microsoft SQL Server Data Mining Services&lt;/a&gt; &lt;/h4&gt;    &lt;p&gt;Mine your Data, Any Place, Any Time&lt;/p&gt;    &lt;p&gt;The SQL Server Data Mining team is working to extend the power and ease of use of SQL Server Data Mining to the Cloud. Our goal is provide services that allow you to build rich, predictive applications without worrying about server infrastructure, and showcase these services with cool applications that give you a glimpse of what’s possible. We bring you a technology preview of our work below. Enjoy! &lt;/p&gt;    &lt;p&gt;Current Projects&lt;/p&gt;    &lt;p&gt;Table Analysis Tools for the Cloud&lt;/p&gt;    &lt;p&gt;Build powerful predictive reports on &lt;i&gt;your&lt;/i&gt; data with just a few clicks!      
&lt;br /&gt;- No data mining expertise required      &lt;br /&gt;- No server installation required      &lt;br /&gt;- All you need is your Internet connection&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;   &lt;p&gt;&lt;a href="http://131.107.181.99/CloudDM/Default.aspx"&gt;&lt;/a&gt;&lt;/p&gt;   &lt;a href="http://www.sqlserverdatamining.com/cloud/"&gt;http://www.sqlserverdatamining.com/cloud/&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8925029" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/eScience/default.aspx">eScience</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Research/default.aspx">Research</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Tech+Interop/default.aspx">Tech Interop</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Software/default.aspx">Software</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Environment/default.aspx">Environment</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Beta/default.aspx">Beta</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category></item><item><title>F# – Sept CTP available and units of measure checking/inference</title><link>http://blogs.msdn.com/dan_fay/archive/2008/08/30/f-sept-ctp-available-and-units-of-measure-checking-inference.aspx</link><pubDate>Sat, 30 Aug 2008 20:07:22 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8910307</guid><dc:creator>Dan Fay</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/8910307.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=8910307</wfw:commentRss><description>&lt;p&gt;&lt;img style="border-bottom: 0px; border-left: 0px; border-top: 0px; border-right: 0px" title="clip_image001" border="0" hspace="12" alt="clip_image001" 
align="right" src="http://blogs.msdn.com/blogfiles/dan_fay/WindowsLiveWriter/FSeptCTPavailableandunitsofmeasurechecki_8E55/clip_image001_3.jpg" width="196" height="141" /&gt;The &lt;a target="_blank" href="http://www.microsoft.com/downloads/details.aspx?FamilyID=61ad6924-93ad-48dc-8c67-60f7e7803d3c"&gt;September 2008 CTP of F#&lt;/a&gt; is now available for download.&amp;#160; F# is a functional programming language for the .NET Framework and really should be looked at by scientists/researchers.&amp;#160; Also check out the &lt;a target="_blank" href="http://msdn.microsoft.com/en-gb/fsharp/default.aspx"&gt;F# Developer Center&lt;/a&gt; on MSDN for more info and resources.&amp;#160; &lt;/p&gt;  &lt;p&gt;There are a lot of &lt;a target="_blank" href="http://blogs.msdn.com/dsyme/archive/2008/08/29/the-f-september-2008-ctp-is-now-available.aspx"&gt;new features&lt;/a&gt; in this release – here’s a sampling:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Broadly improved &lt;b&gt;Visual Studio 2008 integration&lt;/b&gt;, which allows F# users to scale from scripting and explorative development, up to large-scale component and application design, all within Visual Studio.&lt;/li&gt;    &lt;li&gt;Improvements to the &lt;b&gt;F# language and libraries&lt;/b&gt; to make them simpler and more regular.&lt;/li&gt;    &lt;li&gt;An exciting new language feature, &lt;b&gt;&lt;a target="_blank" href="http://blogs.msdn.com/andrewkennedy/archive/2008/08/20/units-of-measure-in-f-part-one-introducing-units.aspx"&gt;Units of Measure&lt;/a&gt;&lt;/b&gt;, which extends F#’s inference and strong typing to floating point data.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The &lt;a target="_blank" href="http://blogs.msdn.com/andrewkennedy/archive/2008/08/20/units-of-measure-in-f-part-one-introducing-units.aspx"&gt;Units of Measure checking and inference&lt;/a&gt; feature is very exciting and &lt;em&gt;&lt;strong&gt;potentially one of the most scientifically revolutionary programming language features 
around&lt;/strong&gt;&lt;/em&gt; - scientists and engineers to check out.&amp;#160; This is because the F# compiler knows the &lt;em&gt;&lt;strong&gt;rules of units&lt;/strong&gt;&lt;/em&gt; &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;When values of floating-point type are multiplied, the units are multiplied too; when they are divided, the units are divided too, and when taking square roots, the same is done to the units. So by the rule for multiplication, the expression inside sqrt above must have units m^2/s^2, and therefore the units of speedOfImpact must be m/s.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Take a look at the &lt;strong&gt;&lt;a target="_blank" href="http://code.msdn.microsoft.com/fsharpsamples"&gt;SolarSystem sample&lt;/a&gt;&lt;/strong&gt; -&amp;#160; A Solar System simulation application, taking advantage of Units of Measure in F# to do physics simulation.&amp;#160; Andrew Kennedy, who researched, architected and implemented this feature has all the &lt;a target="_blank" href="http://blogs.msdn.com/andrewkennedy/archive/2008/08/20/units-of-measure-in-f-part-one-introducing-units.aspx"&gt;details&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Other F# resources:&lt;a href="http://www.amazon.com/gp/product/images/0470242116/ref=dp_image_0/103-8847971-3664603?ie=UTF8&amp;amp;n=283155&amp;amp;s=books"&gt;&lt;img border="0" alt="F# for Scientists" align="right" src="http://ecx.images-amazon.com/images/I/41D7MuHHniL._SL500_AA240_.jpg" width="187" height="187" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a target="_blank" href="http://blogs.msdn.com/dsyme"&gt;Don Syme’s Blog&lt;/a&gt; – all the F# details&lt;/li&gt;    &lt;li&gt;&lt;a href="http://msdn.microsoft.com/en-gb/fsharp/cc835246.aspx"&gt;Learn F#&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a target="_blank" href="http://msdn.microsoft.com/en-gb/fsharp/default.aspx"&gt;F# Developer Center&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a title="The F# Website" href="http://research.microsoft.com/projects/fsharp/"&gt;The 
F# Research Website&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://code.msdn.microsoft.com/fsharpsamples"&gt;F# Samples&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a target="_blank" href="http://www.amazon.com/F-Scientists-Jon-Harrop/dp/0470242116"&gt;F# for Scientists&lt;/a&gt; Book &lt;/li&gt;    &lt;li&gt;&lt;a href="http://cs.hubfs.net/forums/default.aspx"&gt;hubFS: THE place for F#&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a target="_blank" href="http://www.langnetsymposium.com/talks/3-02%20-%20FSharp%20-%20Luke%20Hoban.html"&gt;Introduction to F# Video&lt;/a&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;a href="http://msdn.microsoft.com/en-gb/fsharp/default.aspx"&gt;Microsoft F# Developer Center&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8910307" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/eScience/default.aspx">eScience</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Research/default.aspx">Research</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Cool+Software/default.aspx">Cool Software</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Software/default.aspx">Software</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Science/default.aspx">Science</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Search/default.aspx">Search</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Beta/default.aspx">Beta</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category></item><item><title>OLAP and Scientific Data</title><link>http://blogs.msdn.com/dan_fay/archive/2008/08/20/olap-and-scientific-data.aspx</link><pubDate>Thu, 21 Aug 2008 01:57:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8882605</guid><dc:creator>Dan 
Fay</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/dan_fay/comments/8882605.aspx</comments><wfw:commentRss>http://blogs.msdn.com/dan_fay/commentrss.aspx?PostID=8882605</wfw:commentRss><description>&lt;p&gt;While I’ve been pushing the idea of using OLAP data cubes to evaluate scientific data for a while, I thought it might be a good time to pull together some relevant papers and links. I believe OLAP is ideal for analyzing large quantities of data, including time series information, because it makes it easier for scientists and researchers to explore the data in real time from tools they already know, such as Excel.&amp;#160; For example, the data served on the &lt;a href="http://www.fluxdata.org" target="_blank"&gt;FluxData&lt;/a&gt; site comes from OLAP cubes built with &lt;a href="http://www.microsoft.com/sql/technologies/analysis/default.mspx" target="_blank"&gt;SQL Server Analysis Services&lt;/a&gt;. &lt;/p&gt;  &lt;p&gt;A few tools and links that might be of interest as well:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The Data Mining Add-ins for Office 2007 are very useful, since you can do much of the data mining directly from Excel:&amp;#160; &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=896a493a-2502-4795-94ae-e00632ba6de7&amp;amp;DisplayLang=en" target="_blank"&gt;Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://www.sqlserverdatamining.com"&gt;http://www.sqlserverdatamining.com&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://www.microsoft.com/sqlserver/2008/en/us/analysis-services.aspx" target="_blank"&gt;SQL Server 2008 Analysis Services&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://atom.research.microsoft.com/bio/Default.aspx" target="_blank"&gt;Microsoft Computational Biology Web Tools&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://blogs.msdn.com/jamiemac/default.aspx" target="_blank"&gt;Jamie’s blog&lt;/a&gt;&lt;/li&gt;    
&lt;/ul&gt;  &lt;p&gt;Here are several papers that reference the use of OLAP for different types of scientific data.&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;a href="http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&amp;amp;id=1488&amp;amp;0sr=p" target="_blank"&gt;MSR-TR-2008-71 - Enabling Eco-Science Analysis with MatLab and DataCubes in the Cloud&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://research.microsoft.com/research/pubs/view.aspx?tr_id=1180&amp;amp;0sr=p" target="_blank"&gt;MSR-TR-2006-134 - Using Data-Cubes in Science: an Example from Environmental Monitoring of the Soil Ecosystem&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://research.microsoft.com/research/pubs/view.aspx?tr_id=1180&amp;amp;0sr=p" target="_blank"&gt;Dynameomics: a multi-dimensional analysis-optimized database for dynamic protein data. Protein Engineering Design &amp;amp; Selection, 21: 379-386, 2008&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&amp;amp;id=1259&amp;amp;0sr=p" target="_blank"&gt;MSR-TR-2007-17 - Reporting@Home: Delivering Dynamic Graphical Feedback to Participants and Researchers in Community Computing Projects&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2006-90&amp;amp;0sr=p" target="_blank"&gt;MSR-TR-2006-90 - Life Under Your Feet: An End-to-End Soil Ecology Sensor Network, Database, Web Server, and Analysis Service&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;&lt;a href="http://bwc.berkeley.edu/Presentations/list.htm" target="_blank"&gt;Berkeley Water Center Data Server Publications&lt;/a&gt; &lt;/li&gt; &lt;/ul&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8882605" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/dan_fay/archive/tags/eScience/default.aspx">eScience</category><category 
domain="http://blogs.msdn.com/dan_fay/archive/tags/SQL+Server/default.aspx">SQL Server</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Paper/default.aspx">Paper</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Science/default.aspx">Science</category><category domain="http://blogs.msdn.com/dan_fay/archive/tags/Data+Analysis/default.aspx">Data Analysis</category></item></channel></rss>