How To Host Your Site and Content On Azure Quickly and Easily

This entry gives you a quick way to get up to speed on Azure by deploying your own personal website as an MVC application into the cloud. Consider it a “Hello World”. I will do the following:

  • Demonstrate how to write and deploy a simple Azure hosted website
  • Demonstrate how to create your own image and content server using Azure Storage and expose your content publicly through URLs
  • Demonstrate how to use new tools like Azure Storage Explorer to access your cloud storage

Introduction

Now that Azure has been released (well, in January 2010) a lot of people are busy coding a lot of awesome applications. I’m proud of you. I’m not one of them. I just have a personal website that I’ve hosted through a collection of GoDaddy, Amazon S3 (for images and PowerPoint slides, etc.) and some custom JavaScript.

So over the Thanksgiving week I decided to move all my stuff over to Azure for fun. This includes hosting my website, moving my RoR code over to ASP.net MVC (don’t freak, ASP.net MVC is set up much like RoR and PHP as far as directories and deployment, so it’s easy), and moving all my images and other media over to Azure Storage so that I can just reference images and CSS using URLs without needing to redeploy my website (much like I did with Amazon’s S3).

SIDEBAR: If ASP doesn’t interest you, we now have PHP, Java, Eclipse, Tomcat and MySQL on Azure. Check it out here.

Now, since I’m a fan of bullets, this is what I’ll walk you through:

  • Setting up your Azure Compute and Azure Storage instances
  • Grabbing some tools to make it easy to upload images/documents/code/zips to your Azure Storage cloud 
  • Creating an ASP MVC application in Visual Studio
  • Making a simple MVC website
  • Getting the URLs for your images and content from your Azure Storage and plugging them in (Optional)
  • Deploying your application in Azure
  • Changing your DNS settings to point your domain to your cloud application
  • Enjoy

If you have access to Azure, all the other tools in this post are *free* – and you can probably do a simple site in an evening.

SIDEBAR: I’m most familiar with Amazon’s cloud storage solutions, as are you most likely. So, you probably want tools and functionality that matches that experience. I’ll do my best to set you up in the same way. After all, that’s what I wanted too.

 

First: Setting up Azure Instances

I really wanted to launch into the tools first since they are so cool and easy to use, but I wouldn’t be a good citizen if I didn’t tell you how to get the tokens and keys those tools need. So, first you’re going to set up a few things in the Azure portal. I assume you are either in the CTP *or* you’re reading this in January and purchased Azure through the Microsoft Online portal. Either way the actual configuration portal is: https://windows.azure.com

You will need at least two things created here:

  • Azure Hosted Service Instance (for your code)
  • Azure Storage Account Instance (for your images) (optional)

On Introducing a Model using SQL Azure

You *may* want a SQL Azure instance if you want to really make your Model in your MVC fancy. I’m doing an easy slope today, but once you read through this adding SQL to your Model won’t be hard at all. In fact, the new Azure portal actually gives you connection strings you can just cut/paste in to your application, so hand holding isn’t really necessary. You also can technically use “Tables” in your Azure Storage account and use ADO.net to do simple things. However, these tables aren’t true SQL but more MySQL circa 2000 - flat and dumb but good enough for simple things.

Both of these instances are available from the Azure portal, just click on the “New Service” link and set up one of each:

 

Windows Azure Portal - Create New Instance

As you can see from above, I’ve already used my one Hosted Services account the Azure Gods give you for CTP, but you get two Storage Accounts. Once you go through the configuration, you’ll be given a bunch of information on how to access your instances (particularly for your Storage Account). It’s fine just to “Next” past it all, you can get back to it easily.

Important Things To Consider When Selecting URLs and Names in Azure Instances

Keep in mind that whatever you choose here, especially URLs, will be publicly facing. It’s best to name them something that you can easily type and remember, and that won’t cause suspicion when others access your data. Sure, you may be redirecting to your website’s domain name, but anyone is going to think twice when they see turnyellowteethwhite.cloudapp.net somewhere in your CSS.

On Affinity Names

One of the options you’ll see when creating both instances is setting your “Affinity”. You can choose the geographic region that your storage and application are hosted in. This is interesting, since Google AppEngine and Amazon S3 abstract this away. However, a lot of customers like to know exactly where their instances are to make sure they are physically as close together as possible for latency reasons. You wouldn’t want your Storage on one coast and your Compute instance on another. With Google or Amazon you don’t get that assurance. You can just leave it as “Anywhere, US”, or, if you want, select a region, name the affinity group, and then select it any time you create other instances to make sure it’s all in one place.

Once finished, you should see something like this:

image

Here I named all my instances “Personal” so I know it hosts my personal site, and I simply made my URLs brandonwerner.* for each.

We will be going in to each of these to either deploy our application or get the tokens we need to connect to our storage, but for right now just keep the tab open and ready to go.

 

Second: Grab Some Tools

Azure Storage Explorer

I imagine right about now there are tons of developers doing for Windows Azure what CloudBerry S3 and DropBox did for Amazon S3: making the Storage API easy to use and abstracted away. As both of these companies proved, lots of money can be made by developers offering this piece of functionality alone to customers. As of this writing, while we are still in CTP, those tools have yet to emerge (but they are coming). However, one application contributed by a great development team on CodePlex is well on its way to bridging this gap. Called “Azure Storage Explorer”, it provides the same basic functionality as CloudBerry S3, though without as many consumer features. For quickly getting connected to your Azure Storage and adding images and files in the cloud, this app is awesome. It’s also free.

image

TODO: Grab and Install Azure Storage Explorer From CodePlex here: http://azurestorageexplorer.codeplex.com/

Visual Studio 2010 (Paid or Free)

There are now many types of Visual Studio as we try to offer more ways for developers to use the Windows and Azure platform without having to pay for it. As of this writing, Visual Studio 2010 is in beta 2 and free to download. Do so. I love it, and find it hard to go back to anything Visual Studio < 2010. It’s nice, and not nearly as heavy as past Visual Studios. However, if you want something you know we won’t come hat in hand asking for money later, you can also download one of the Visual Studio 2010 Express editions when they become available. These are also extremely nice and free. I don’t know why they are not more widely known, the delta between them and the full version isn’t too much for general development tasks. Azure SDK (which we’ll get next) supports all of these. It does support Visual Studio 2008 and Microsoft Visual Web Developer 2008 Express with SP1, but not the cool new MVC stuff we’ll be doing in this example. For now, just grab Visual Studio 2010 Beta, and then move on to an Express 2010 edition later if you need. Before you ask, VS2008 and VS2010 can sit side by side. Azure is better with VS2010.

TODO: Grab and Install Visual Studio 2010 Beta here: http://www.microsoft.com/visualstudio/en-us/products/2010/default.mspx

Grab the Azure SDK

The Azure SDK is amazingly lightweight for all the stuff it does for you. It will provide some nice templates for all your cloud work and provide a little virtualized Azure right inside your computer for testing and debugging. You’ll need to flip some settings on your Windows computer first though, as you’ll need IIS and ASP.net running on your machine.

To do this, let’s start doing it the Windows 7 / Vista smart way and make this three steps:

  1. Click on your Windows Orb, type “Turn Windows Features on or off” and hit Enter. (Isn’t that easier than 7 clicks?)
  2. In the dialog box that appears, select the following:
       • Under Microsoft .NET Framework 3.0, select Windows Communication Foundation HTTP Activation.
       • Under Internet Information Services, expand World Wide Web Services, then Application Development Features, then select ASP.NET and CGI.

     image

  3. Now click “OK” and those features will install.

    Now, assuming you have installed Visual Studio 2010 and selected these features above and installed them in your OS, all you have to do is download the Azure SDK and go through the prompts.

    TODO: Grab and Install the Azure SDK here: http://www.microsoft.com/downloads/details.aspx?FamilyID=6967ff37-813e-47c7-b987-889124b43abd&displaylang=en

     

    Third: Launch Visual Studio 2010 and Create a MVC WebRole for Azure

    So, by now you should have:

    • Setup your Azure Hosted and Storage Instances
    • Downloaded the cool tools to get to work

    This part is so easy it’ll make you want to go back in time and punch your RoR self in the face for being so smug. We’re going to make a MVC website and deploy it in the cloud in just a few clicks. No rails needed, but batteries are very much included (as I’ll show below)

    1. Launch Visual Studio 2010 (NOTE: You need to run Visual Studio as Administrator to deploy Azure applications)
    2. Select “New Project” from the left of the application
    3. Under Visual C#, select “Cloud Service”, name it what you want below and click “OK”

    image

    4. Next, you will see what I can only call the *real* Azure Cloud Service selection dialog, which offers a bunch of different Web Roles in a bunch of different languages, now supported by the new Windows Azure SDK and Visual Studio 2010 (I told you it was worth downloading the beta). You’ll see that C# and Visual Basic give you the most functionality, allowing you to create four different types of web roles.

    SIDEBAR: What are Web Roles? Consider them little runtime bundles that Azure understands. Each has a purpose and is supported by Azure when being deployed (but you’ll have issues if you try to upgrade from one role to another, as I’ll explain later) They are pretty self explanatory for anyone familiar with development. The best thing to say here is that a *real* Azure application will have many roles – separated primarily by backend (Worker) roles and frontend (Web) roles.

    For our purposes, we will only need one role, the ASP.NET MVC 2 Web Role. Select it, and move it over to your solution. You’ll see it automatically names it “MvcWebRole1”:

    image

    When you are finished, click “OK” and you’ll be prompted if you want to add test cases to your application. Yes, testing is good… but for our purposes click “No” and OK.

    Your application is created. Technically, you could hit F5 and run this right now if you wanted, but that would be too easy. Next, we’ll do some quick MVC hacking to get a website up and running.

    Fourth: Create Your Website

    I won’t go into the whole MVC design pattern here, except to say that it stands for Model, View, Controller. The View represents what people see, the Model is the data your website uses and where it stores state, and the Controller is where the work happens: it takes the data in the Model and hands it to the View for display. It’s a great way to separate your code into different areas of concern. For our simple demo, we won’t be making much use of the Model, but we will be using the Controller and View (in fact, we will be storing data inside of the Controller since it is static text). It also helps to know that MVC in ASP.net works much like Servlets in Java: the URLs shown in the web browser do not point at real files on disk. Instead, pages are called up using REST-esque URL endpoints that map to a Controller and one of its methods.
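To make the URL-to-code mapping concrete, here is a minimal sketch in plain C# of how a path such as /Home/About could be split into a controller name and an action name under the default {controller}/{action} route pattern, with Home and Index as the fallbacks. The RouteSketch class is hypothetical and written only for illustration; in a real project the ASP.NET routing engine does this work and handles many more cases (the optional id segment, constraints, and so on).

```csharp
using System;

// Hypothetical illustration only: the real mapping is done by ASP.NET routing.
public static class RouteSketch
{
    // Splits a path like "/Home/About" into the controller class name and the
    // action method name, falling back to Home/Index when segments are missing.
    public static string[] Resolve(string path)
    {
        string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        string controller = segments.Length > 0 ? segments[0] : "Home";
        string action = segments.Length > 1 ? segments[1] : "Index";

        return new[] { controller + "Controller", action };
    }
}
```

Under these assumptions, RouteSketch.Resolve("/Home/About") yields HomeController and About, while a bare "/" falls back to HomeController and Index.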

    Visual Studio sets this up very nicely for you by providing a simple folder structure that separates the three components as shown below:

    image

    As you see above, Visual Studio has a high level group called “PersonalSite”. This is where all your WebRoles get bundled up for deployment. Some may be tempted to make a comparison to J2EE with Roles being EJBs, but don’t. Configuration and deployment are miles easier in this environment than in J2EE environments, and WebRoles do not match 1:1 in regards to containment or design patterns. EJBs are meant to be very self-contained things, whereas WebRoles expect to be able to communicate with each other and do not suffer from the RMI hell of choosing between local and remote calls.

    Creating Our Template Page

    In the process of including all the batteries you’d need to get up and running fast, the MVCWebRole template put a lot of code in your environment that you’ll need to customize. First on this list are the Site.Master template and the CSS.

    ASP.net uses templates the same way Dreamweaver and some Java IDEs do: to provide a basic framing for a site and then allow all derived pages to inherit that template. It looks like this:

    image

    Looking at the Site.Master, it’s pretty easy to see you’ll need to do the lion’s share of the work here to get your website to look the way you want. The rest is just content.

    You’ll probably want at least the following to be static in the Site.Master:

    • A Menu that shows the pages in your website (this will be code that references your Views, more on that later)
    • A footer that says what rights viewers have to your site’s content, if any
    • A header and a site banner.

    What you’ll probably want to define with <asp:ContentPlaceHolder> inside of some <div>s:

    • Your title, which you will want to change for each page your user visits (so many people don’t do this)
    • Your main content, such as your text in your About page or Publication list in your Publications page

    Here is what I’ve done for my site. Notice the callouts for helpful pointers on what you should do on your Site.Master:

    image

    Brief View and Controllers Overview

    It is helpful to discuss what the Menu code is doing here. It ties deeply into our MVC pattern, and you must have the View and Controller code in your site in order for your menu to work. Essentially, I can boil this down visually for you in this manner:

     

    image

    As you see, each  menu item references both the Controller for that link and the View which will use the information defined inside that Controller.

    It’s helpful to illustrate here that one Controller can be used for many Views (in fact, that’s an important part of the pattern.)

    In my website, which is simple, I just use the Controller to provide some static content for each View. In particular, I set three things for each View():

    Name      Name of the specific page (will be used in the page title)
    Title     The title of the first heading (I customize each <h3> heading this way)
    TagLine   The tagline to use on the site (unused at the moment)

    The code looks like this (notice how each method links to a separate view page):

    HomeController.cs

    using System.Web.Mvc;

    // Each action sets the static text its View needs, then returns the View with the same name.
    public class HomeController : Controller
    {
        // Home page
        public ActionResult Index()
        {
            ViewData["Name"] = "Brandon Werner";
            ViewData["Title"] = "About Brandon";
            ViewData["TagLine"] = "I Love Software";

            return View();
        }

        // About page
        public ActionResult About()
        {
            ViewData["Name"] = "About Brandon Werner";
            ViewData["Title"] = "Who Am I?";
            ViewData["TagLine"] = "I Love Software";

            return View();
        }

        // Publications page
        public ActionResult Publications()
        {
            ViewData["Name"] = "Brandon Werner";
            ViewData["Title"] = "Publications";
            ViewData["TagLine"] = "I Love Software";

            return View();
        }
    }

    You will obviously wish to add more to use in your own View, including handing off much of the data to a Model and performing CRUD operations on that data from the Controller. However, we are building a simple website here, so it should do for now.
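As a first step toward a Model, the three strings each action sets could live in one small class instead of being repeated in every method. The sketch below is plain C# and does not need the MVC assemblies to compile. PageInfo and SitePages are hypothetical names made up for this example; they are not part of the project template.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model class holding the three values each view needs.
public class PageInfo
{
    public string Name { get; set; }
    public string Title { get; set; }
    public string TagLine { get; set; }
}

// Hypothetical lookup keyed by action name. A real Model might load this
// from Azure Tables or SQL Azure instead of a hard-coded dictionary.
public static class SitePages
{
    private static readonly Dictionary<string, PageInfo> Pages =
        new Dictionary<string, PageInfo>
        {
            { "Index", new PageInfo { Name = "Brandon Werner", Title = "About Brandon", TagLine = "I Love Software" } },
            { "About", new PageInfo { Name = "About Brandon Werner", Title = "Who Am I?", TagLine = "I Love Software" } },
            { "Publications", new PageInfo { Name = "Brandon Werner", Title = "Publications", TagLine = "I Love Software" } }
        };

    // Returns the page data for an action, or throws if the action is unknown.
    public static PageInfo For(string action)
    {
        PageInfo page;
        if (!Pages.TryGetValue(action, out page))
        {
            throw new ArgumentException("Unknown page: " + action, "action");
        }
        return page;
    }
}
```

An action could then end with return View(SitePages.For("About")); and the view would read the values from its Model rather than from ViewData.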

    Creating Our Index.aspx and Other Site Pages

    As you can see from above, we now have a HomeController.cs filled with all the stuff we may want to use in our webpages (Views). Visual Studio 2010 makes this part relatively painless, since we already have the template defined. All we need to do is call in our template file, pull in content from our Controller wherever the page needs to be dynamic, and fill our Views with content.

    Here is what it looks like at a high level:

    image

    If you notice above, we call things out of the Controller by using the code:

    <h3><%= Html.Encode(ViewData["Title"]) %></h3>

    Obviously, ASP.net provides a lot more features than simply pulling out Strings through this call, but for our simple website this is enough to experiment with. Be sure to check out the ASP.net tutorials for more on this and the code available in the Controller, as there is deep functionality I am glossing over here. This code should get you pretty far, however.
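If you are wondering why the value is wrapped in Html.Encode at all, the short answer is that encoding turns characters like < and & into harmless text so that nothing in your data is interpreted as markup. The sketch below shows the same kind of escaping in plain C# using System.Net.WebUtility.HtmlEncode from .NET 4. EncodeSketch is a hypothetical helper for illustration, not the MVC helper itself.

```csharp
using System;
using System.Net;

// Hypothetical helper that mirrors what encoding a ViewData value achieves.
public static class EncodeSketch
{
    // Escapes characters that have special meaning in HTML so the value is
    // shown as text instead of being interpreted as markup.
    public static string Safe(object value)
    {
        // Convert.ToString returns an empty string for a null object.
        return WebUtility.HtmlEncode(Convert.ToString(value));
    }
}
```

For example, EncodeSketch.Safe("<b>Tom & Jerry</b>") returns &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;, which the browser displays literally rather than rendering bold text.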

    Using the Design View for Views with Templates

    You can leverage the Design view in Visual Studio 2010 to add content quickly without needing to play in the code very much. For instance, this is what my editor for index.aspx looks like after I’ve applied the Site.Master template:

    image

    As you see, it has hidden and locked all the content except the content I can edit in this page (the code that is inside the <asp:Content> tags). For really elaborate content, this view may be helpful to see if the content areas you’ve called out in Site.Master actually look appropriate to your website with the template applied. Although you can also type and add content in this open area, I wouldn’t. Although Visual Studio 2010 strives to be standards compliant, nothing will replace lovingly crafted <div>s with logical classes and ids. That’s just a personal opinion though; you can do whatever you wish.

    Go To Town With Your Own Site

    These are the basics you’ll need to start creating a great site. From here you can add pages to your heart’s content. You can even start injecting ADO.net into your controller for a Model, or just define some global variables that all your site will use. Either way, you’ll have a ton of fun making this dynamic.

    When You Are Done: Run Your Website Locally!

    Visual Studio 2010 and the Azure SDK make this very easy to do. Just press F5. You will see your “Azure Development Fabric” start up and become an icon in your notification bar. If this is your first time running the Azure Development Fabric locally, you will be prompted to set up your SQL instance to store some data. This usually requires no input from you (unless you did something fancy to your SQL Server Express install) and you’ll be up and running in no time.

    image

    SIDEBAR: The Azure Development Fabric tab has some great functionality, including being able to show you log files for *each* instance of our Compute instance you are running, and if you have more than one role, you can see the interaction of the various roles inside the consoles as you test your application. All of this is pretty impressive, but boring for us since we only have one webrole running in one instance.

     

    image

    Fifth: Create Your Own Image and Content Server On Azure Storage (Optional)

    Whew. I know that you probably spent a good evening getting your site just the way you want it. Maybe you even played with some deeper and richer content from your Controller. Excellent. Maybe you’ll be writing some awesome Azure apps in your future? If not, you probably just want to crank open that awesome Azure Storage Explorer I talked about above and start putting your content up there to reference in your website or CSS file. Awesome, you’re in the right place.

    SIDEBAR: Why is this optional? Because you can just as easily add content inside of the Content folder of your WebRole in Visual Studio 2010 and reference those images locally. This works quite well. You’ll only want to go this route if you like the idea of having unique URL access to your images and also wish to store other content in the cloud under your own URL.

    Grab Your Azure Storage Token From the Azure Portal

    Just like Amazon S3 and other cloud storage solutions, Azure Storage uses Access Keys to unlock your storage. You’ll need to pull these keys from the Azure Portal to use Azure Storage Explorer and start loading in content. All you need to do is go back to your Overview page, click on your Azure Storage account, and retrieve the Primary Access Key from the portal. It will look like the screenshot below (obviously I will regenerate my keys after this is published).

    image

    Keep this tab open, and launch Azure Storage Explorer. (NOTE: You will also need to run this as Administrator.)

    image

    Azure Storage Explorer will start with some strange placeholders that will fail upon load. What will not fail, however, will be your own local Azure Development Fabric instance (that you created in Visual Studio 2010 above) that you’ll see in the list to the left. You can use this tool to insert in Tables, Blobs and Queues in your local instance for testing purposes as well (how cool is that?). However, we will need to use our Primary Key to set up our external storage “in the cloud”.

    To set this up, you need to click:

    Tools –> Storage Settings

    Here, you will be prompted to add additional Storage instances.

    You will need:

    • Account Name: The unique id you gave Windows Azure when setting up your Azure Storage. Remember I named this brandonwerner (as seen above)
    • Account Key: The Primary Access Key you retrieved from the Azure Portal

    Enter this information in to one of the empty fields for as many instances as you want to access in the tool.

    image

    After you have done this, Azure Storage Explorer will connect to your instance and in a few moments it will appear to the left (as shown above)

    Create Your Blobs, Start Adding Things

    The only thing left to do is create a new Blob instance inside your storage and start adding content. We’re almost home!

    But first, a conversation on the URL mapping before you start calling your Blob containers something funky:

    How External URLs on Public Blobs Map to URLs

    The URLs you will use to access content (or allow others to access files and content) are built from the unique Azure Storage name you selected at setup and the name of the Blob container which you will soon create:

    image

    This means that you’ll want to choose container names inside your Blob instance that make sense in regards to the content you will be putting in the Blob instance. So, if you are going to host your awesome XNA game installers on your Blob, you’ll want to call it something like “Apps” or “PublicApps”. That will make it clearer for both you and your customers.

    For us, we’ll want to create a Blob container called images since that will be what we are putting in it.
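To see how the pieces fit together, here is a minimal sketch in C# that builds the public URL for a blob from the storage account name, the container name and the file name, following the http://<account>.blob.core.windows.net/<container>/<blob> pattern. BlobUrlSketch is a hypothetical helper and logo.png is a made-up file name. The sketch assumes the names are already valid and URL-safe, and it does no escaping or validation.

```csharp
using System;

// Hypothetical helper for illustration; it does not call Azure at all.
public static class BlobUrlSketch
{
    // Builds the public URL for a blob: the storage account name becomes the
    // host prefix, followed by the container name and then the blob name.
    public static string Build(string account, string container, string blobName)
    {
        // Assumes all three values are already valid and URL-safe.
        return string.Format(
            "http://{0}.blob.core.windows.net/{1}/{2}",
            account,
            container,
            blobName);
    }
}
```

With my account name, BlobUrlSketch.Build("brandonwerner", "images", "logo.png") gives http://brandonwerner.blob.core.windows.net/images/logo.png, which is the kind of URL you would paste into your pages or CSS once the container is public.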

    To do this in Azure Storage Explorer, simply select the “Blob Containers” folder and go to Storage Account –> New Blob Container and create a container called “images” (container names must be all lowercase).

    image

    image 

     

    NOTE: Make sure to make your Blob Container “Public” so others can access the content.

    Adding images and other content in Azure Storage Explorer

    After you’ve created your Blob Container all that’s left is the simple matter of uploading whatever you want to put in your blob space. It’s fairly simple with the Azure Storage Explorer, as shown here:

    image image

    Once you’ve added all your images, it’s very handy to get the URL for the content you’ve uploaded to your Blob container. Simply right-click any file and select “View in Browser”.

    Pretty cool huh?

    Going Crazy And Loading My Important Stuff In Private Containers

    You may have been thinking this is awesome and you’d love to use this as your backup site for important files and documents you don’t want to lose. You can easily do that, and I recommend you do. But before you get too comfortable with using Azure Storage Explorer for all those tasks, I’d like to recommend you wait for other tools to emerge. For instance, right now Blobs do not support hierarchical folders. Amazon S3 doesn’t support folders either, but the third party tools built on the Amazon S3 APIs emulate this functionality.

    Using the Azure Storage Blob URLs inside of your CSS, other content

    Now that you’ve added everything you want into the security of the cloud, you can go back into your code and add the URL references anywhere you would like, including your CSS files and any links to external content you may wish to host in the cloud.

    Sixth Step: Deploy To Azure!

    Now that you have everything ready to go, it’s time to hit F5 one more time, click around, and admire your handiwork! It should be quite a thrill to see everything that used to run in a dedicated hosting environment executing smoothly on your machine and ready to go to the cloud. After one last look, click on your Cloud Project and select “Publish”. The Windows Azure SDK will neatly package everything up for you and helpfully launch both the Azure Portal and the folder containing your deployment payload.

    image

    Your deployment directory will contain two files:

    • A Service Package file that will have all your roles and configuration neatly in one package
    • A ServiceConfiguration file which will have the configuration data for your application (number of Azure instances, any external config strings you may have defined for runtime access, etc.)

    This is what their beautiful icons look like in Windows Explorer (just because I love that we finally have hi-resolution icons in Windows)

    image

    Beautiful aren’t they?

    image How you’ll probably see it in your boring “just the facts” view

    Now that we have our package and configuration file, on to the Azure website for Deployment.

    The Azure Deployment Portal

    The Azure Deployment Portal is the place where you will deploy your bundled application. This is for the compute instance, which for our purposes includes the MVCWebRole we just compiled and packaged above.

    You can access this by going to the Azure Portal and clicking on the name of the Hosted Services instance you created at the top of this blog entry. Once there, you will see an empty section under “Hosted Services” with options to deploy a new instance.

    Production vs. Staging

    The Azure Portal has two deployment environments, Staging and Production. You will want to use Staging first to verify your application works correctly before deploying it in to Production. You will get your own unique URL that is internet accessible to validate your application’s functionality. HINT: This would be a good time to test your website’s standards compliance as the W3C tool will be able to access the staging URL.

    When you are ready to deploy in to Production, you can move or swap your Staging code by clicking the “arrow” icon in between the two environments. You’ll see that in a moment.

    Below, I already have a Production instance, but I’ve deleted my Staging instance to better show you what you should see when first visiting the portal:

    image

    The Azure team has done a great job since CTP providing a lot of visual and textual cues on deployment experiences as you publish your application. The example above around operation time is a good example of this. Each step along the way the portal will indicate to you when something is happening and provide good feedback on the status.

    Click deploy to begin deploying your code to Azure. You will see the web page below. Here you simply upload your package and your config file where prompted. Pretty simple. Yes, you could also have uploaded these to an Azure Storage Blob and deployed from there. However, considering this is a simple site there is no reason to do so.

    image

    Once that has been accomplished, you’ll see the “Deploying Package” progress bar as illustrated below. During this time, it is also assigning you a Deployment ID for tracking and that temporary staging URL I mentioned above. Once that has completed, you’ll want to click “Run” to actually start up your Azure deployment. You’ll see an “Enabling Deployment” progress bar while it does so.

     

    image image image image

    Once that is completed, you’ll see an initializing status indicator in your WebRole status. This will turn to Green and “Ready” once your application is fully deployed. While we’re waiting… it may be helpful to go through the different lifecycles and examine the deployment cube in more detail

     image

    Once you see that your webrole is green and ready, click on the URL and check your application. This is the second best time to catch application breaks (the first was on your local machine, before all this deployment work). If you are satisfied, you should see your portal look much like the portal below, except that your Production instance will be empty.

    image

    Now, simply click the arrows button and it’ll do what it suggests: either move the Staging environment over to Production (you’ll get an empty Staging environment for your next use) or swap what is in Production with the new Staging code. You can then continue to modify and deploy your application in the Staging environment, and when it’s ready for prime time, click the button to push it to Production!

    image

    We’re deployed!

    The Seventh and Final Step: Point Your URL to Azure with CNAME

    It’s been a long journey, but you are finally ready to cut over completely to Windows Azure for your personal site. All that is left to do is log in to whatever domain registrar you used for your own website and change the CNAME to point to your Azure application. This is pretty straightforward, as all you’ll need to do is route “www” to point to your Azure Web Site URL.

    NOTE ABOUT CNAME: This is not ideal, as you will have to make sure your users visit www.yoursite.com and do not go directly to yoursite.com (which will fail or go to your old hosting account if you still have it). However, Azure currently does not provide static IP addresses for its cloud compute instances. Indeed, if you think about this for a moment it’s hard to imagine it ever would, considering how few IPv4 addresses are left and the number of applications users will deploy on Azure. For the moment, CNAME is the best way to get this redirection to occur. If you choose to use your domain for your Azure Storage URL, you will use CNAME for that as well.

    Here is an example of what this menu looks like in my registrar, GoDaddy.com:

    You’ll see that I’ve already pointed my “www” to my cloud application, brandonwerner.cloudapp.net. However, chances are good that you will have an “@” symbol in there instead (you can see I have this on my “ftp” host name). This is simply the registrar’s way of indicating that all traffic should go to the default IP associated with this domain, most likely set up by your old hosting account when you bought the domain and the hosting package. In your settings, edit “www” to point to your cloud application instead. Set the TTL to 1 hour and click submit.

    Image: The configuration screen for CNAME in GoDaddy.com, as an example

    Once you’ve done this step, the change should take effect fairly quickly, depending on which DNS servers you use. Within a few hours, your domain will go directly to Azure and operate from that URL in all ways (URLs and pages).

    You’re Done…. Or Are You?

    I hope this brief introduction to Azure got you interested in the whole platform, and especially what you can do around Roles in Visual Studio 2010. Now that you have the very basic “Hello World” of a Simple MVC application under your belt, branch out and try new things with the platform. Read up on the MSDN website or explore the Azure forums. Also, keep an eye out for new applications leveraging the Azure Storage instances you have, or if you’re feeling like being a superstar, write your own (I’d really like a Firefox plugin, if you are taking requests.)

    Thanks!

    Software Transactional Memory: Debunked?

    If I go into my excellent academic article organizer, Papers, and search for "software transaction memory" or "stm", I get at least 30 results: papers both high level and detailed regarding this next big thing that will allow us to finally, without any effort, take advantage of our multi-core CPUs and handle all the nasty locking and synchronization issues for us with nothing more than a language keyword. So much publicity has been given to this idea that no fewer than three presenters at the Google Scalability Conference mentioned it, with one presentation being nothing but a glimpse into the STM future.

    As I wrote then:

    This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I’m suspicious you can even do STM well in an imperative language with state - as I discussed before) but beyond suggesting the keyword “atomic” to replace “synchronized” in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn’t even mentioned.

    It turns out that, according to a paper published in the Communications of the ACM, Software transactional memory: why is it only a research toy?, Software Transactional Memory may not work at all. The article presents research from IBM, who built the IBM XL C/C++ for Transactional Memory for AIX, known as IBM STM, and also takes benchmarks from the Intel STM and the Sun TL2 STM. In the paper, they put the STM implementations through the wringer using a B+ tree and the Delaunay Mesh Refinement algorithm. It's well worth a read.

    Their final analysis puts a deep nail in the coffin of STM:

    Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overheads of STM to a point where it is generally appealing is a difficult task and significantly better results have to be demonstrated. If we could stress a single direction for further research, it is the elimination of dynamically unnecessary read and write barriers—possibly the single most powerful lever toward further reduction of STM overheads. However, given the difficulty of similar problems explored by the research community such as alias analysis, escape analysis, and so on, this may be an uphill battle. And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.

    Many academics take the approach that most developers don't need to be aware of, much less optimize for, atomic transactions in their code. Much like pointers and Aunt May's apple pie, it's best to leave those things to the professionals and their compilers. This is the approach argued by Bryan Cantrill and Jeff Bonwick from Sun Microsystems in their article Real-world concurrency. Seeing as these are the same people who brought us DTrace and lockstat in Solaris, it's probably best to take their word on it.

    From the article:

    The most important conclusion from our foray into the history of concurrency is that concurrency has always been employed for one purpose: to improve the performance of the system. This seems almost too obvious to make explicit. Why else would we want concurrency if not to improve performance? And yet for all its obviousness, concurrency's raison d'être is seemingly forgotten, as if the proliferation of concurrent hardware has awakened an anxiety that all software must use all available physical resources. Just as no programmer felt a moral obligation to eliminate pipeline stalls on a superscalar microprocessor, no software engineer should feel responsible for using concurrency simply because the hardware supports it. Rather, concurrency should be considered and used for one reason only: because it is needed to yield an acceptably performing system...

    ...To make this concrete, in a typical Model/View/Controller application, the View (typically implemented in environments like JavaScript, PHP, or Flash) and the Controller (typically implemented in environments like J2EE or Ruby on Rails) can consist purely of sequential logic and still achieve high levels of concurrency provided that the Model (typically implemented in terms of a database) allows for parallelism. Given that most don't write their own database (and virtually no one writes their own operating system), it is possible to build (and indeed, many have built) highly concurrent, highly scalable MVC systems without explicitly creating a single thread or acquiring a single lock; it is concurrency by architecture instead of by implementation.

    However, I think that some easy atomic transactional wrappers would be helpful for developers, and I hope that the research into implementing atomic transactions in an easy and accessible way continues. Of course, I am still skeptical that any imperative language with living objects that have state can easily make its transactions atomic, and it would appear this paper agrees with this skepticism. Anyone for Concurrent Haskell?

    What I've Been Working On: Microsoft Online Launched

    Image: Microsoft Online Homepage

    One of the things that is hard to get my head around is what to be secret about and what I am free to talk about. Therefore, I have decided to not talk about what I'm working on very much. Some choose not to talk at all. I considered that, and can see the merit. However, I also like participating in the community quite a bit. This awkward compromise will have to do.

    There have been a lot of changes working for Microsoft - but it is also the most challenging and rewarding job I've ever had. Being part of a product that seeks to take an entire company, and its millions upon millions of customers, in a new direction is a humbling experience and demands you give your best. Twenty-four-hour email and high-stress ship decisions are part of life.

    It's with that enthusiasm that I'd like to talk about a huge milestone our team accomplished - shipping Microsoft Online. More specifically, the Business Productivity Online Suite, known by us acronym lovers as "BPOS". It is our first offering that combines Exchange, SharePoint and Live Meeting together "in the cloud". Your company can either exist completely in the cloud - or you can sync your Active Directory at various times throughout the day from your own corporate datacenter to the cloud so that your infrastructure is always in sync across the enterprise. Already, large customers such as Eddie Bauer, Energizer, and Blockbuster have made the switch. Saying those names in a blog would have had me escorted out of the building a few weeks ago - but now that the launch event has occurred and they talked about their experience using our product, it's out in the open that we are launching with some great companies evaluating and using our products in the cloud.

    This is just the beginning - and something I'm very excited to participate in. Microsoft is a great place to work - I can't believe I get to do this stuff every day.

    Congratulations to the team and co-workers for a great release - and even better to come.

    By the way, if you'd like to see our work on building a great community for MS Online users, check out the TechNet forums for Microsoft Online. Feel free to participate.

    Un-PC Reality

    One thing that all the better leadership training seminars I've been to in my career have shared is an insistence on, and dedication to, reality. Usually the best strike hard by announcing a string of facts about what is changing that is causing organizational problems and asking everyone to confront them.

    There are some that have taken this advertising campaign as a validation that their world has not changed: some that continue to believe we can maintain the way things have existed - the way our revenue and ecosystem have worked in the past - and confuse that desire with the goals of the PC Campaign. Certainly, Windows as a platform is not going anywhere - the market share numbers still show remarkable power. The problem is that Google and others have proven that not only is that not a hindrance to their efforts - they can use it against us by layering a platform on top of it. That is the reality we find ourselves in. We risk becoming a pretty TCP/IP stack for the larger world.

    That is why it is dangerous to assume "I'm a PC" means an old client PC with a tower/flat panel screen in the corner of the Den running Windows 95. It's tempting to want that old model back, but the reality is Windows has moved away from the PC. What Windows is the air we breathe? Are you talking just of the client? What about Windows in the cloud? If I spend 90% of my time on my MacBook Pro inside mesh.com using Silverlight - am I then - in your opinion - still running Windows?

    To borrow from Emerson: there is no wall where I, the device, ends and you, the operating system, begins. Windows now comes to see us without bell. The walls are taken away.

    I'm still trying to figure out if we have seriously - each of us - looked that reality dead in the face. There is a sense of urgency that needs to spread through everyone - an urgency that I've seen so I'm optimistic we can achieve a lot more in the future than we have in the past. What we have to guard against, however, is assuming that this future looks like the past - a PC on every desktop running Microsoft software.

    Perhaps we should second the new mission statement with a vision "the network, through every device, running Microsoft software"


    I think in some there is real fear of this new reality - fear that needs to be addressed - a way forward clearly communicated for them. That I believe would help Microsoft regain its spirit - which never relied on the current products - but always the future. We should aspire to make our customers fans - fans of the brand and the innovation, not of just the PC.

    Tech Trends For Fall Reading: Software Transactional Memory, Cloud Computing Storage, and more

    Now that the summer is over – the tech industry is back to work – and the new products and service announcements are coming quick, why not do some good reading to prepare for the fall when everyone returns from vacation and you get back to the serious business of deadlines, programming and of course geeky arguments about the topics of the day. Here is a good reading list to bookmark.

    Get Up to Speed on Generic Programming, or Programming In General

    My first recommendation is the collected papers of Alexander Stepanov, which you can get from his website entitled... Collected Papers of Alexander Stepanov. For those who don't know, Stepanov is the key person behind the C++ Standard Template Library, which he started to develop around 1993. He had earlier been working for Bell Labs close to Andrew Koenig and tried to convince Bjarne Stroustrup to introduce something like Ada Generics in C++. His papers are a treasure of thought on generic programming, logic, robotics and anything else that made you turn to the Computer Science page in your university's catalog. Best of all, he also provides slides for his book in progress, written with Paul McJones, called Programming Elements. This is a great book for refreshing your knowledge of abstract and concrete concepts in quick and easy PowerPoint format. Just take a look at the table of contents and I dare you not to click on at least one of the Chapter links. Don't worry, I won't tell.

    The "Core" Debate of the Community: Concurrent Programming and Software Transactional Memory

    Yes, the pun was bad. It does however illustrate one of the facets of the problem that is burning up academic and commercial researchers alike, and is responsible for a large number of papers flooding the ACM portal: Software Transactional Memory (STM). Well, actually, that's a possible answer to the problem - not the problem itself. The two are often confused now. The problem is that since Intel and AMD have decided to start putting more cores onto a single chip, we have to deal with the big problem that comes along with that: managing the threads of multiple cores trying to do the same work on behalf of the system they're working for. It also scales into bigger problems of any type of work you may want to farm off to "locales" that may need to cross boundaries and work on the same data within a transaction (for more information on some of this, see my post from the Google Scalability Conference regarding Cray's work to replace MPI with a new concurrent language Chapel and the GIGA+ filesystem below).

    I think Simon Peyton Jones from Microsoft Research in Cambridge illustrates it best in his paper Composable Memory Transactions (PPoPP '05):

    The dominant programming technique is based on locks, an approach that is simple and direct, but that simply does not scale with program size and complexity. To ensure correctness, programmers must identify which operations conflict; to ensure liveness, they must avoid introducing deadlock; to ensure good performance, they must balance the granularity at which locking is performed against the costs of fine-grain locking. Perhaps the most fundamental objection, though, is that lock-based programs do not compose: correct fragments may fail when combined. For example, consider a hash table with thread-safe insert and delete operations. Now suppose that we want to delete one item A from table t1, and insert it into table t2; but the intermediate state (in which neither table contains the item) must not be visible to other threads. Unless the implementor of the hash table anticipates this need, there is simply no way to satisfy this requirement. Even if she does, all she can do is expose methods such as LockTable and UnlockTable – but as well as breaking the hash-table abstraction, they invite lock-induced deadlock, depending on the order in which the client takes the locks, or race conditions if the client forgets. Yet more complexity is required if the client wants to await the presence of A in t1, but this blocking behaviour must not lock the table (else A cannot be inserted). In short, operations that are individually correct (insert, delete) cannot be composed into larger correct operations.
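
    The hash table scenario in that quote can be sketched in a few lines of Python. This is only an illustration under my own assumptions: SafeTable, move and the between_steps hook are hypothetical names, and the hook simply stands in for "another thread gets to run here" so that the result is repeatable.

```python
import threading


class SafeTable:
    """A dict whose individual operations are thread-safe."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def insert(self, key, value):
        with self._lock:
            self._items[key] = value

    def delete(self, key):
        with self._lock:
            return self._items.pop(key)

    def contains(self, key):
        with self._lock:
            return key in self._items


def move(key, src, dst, between_steps=lambda: None):
    """Compose delete + insert. Each step is safe; the pair is not atomic."""
    value = src.delete(key)
    between_steps()  # stands in for another thread running at this point
    dst.insert(key, value)


t1, t2 = SafeTable(), SafeTable()
t1.insert("A", 1)

seen = []  # what an observer sees in the middle of the move
move("A", t1, t2,
     between_steps=lambda: seen.append((t1.contains("A"), t2.contains("A"))))
print(seen)  # [(False, False)]
```

    Both operations are individually correct, yet the observer catches the moment where neither table holds A. Closing that gap with locks means exposing each table's lock to the caller, which is exactly the abstraction leak the quote complains about.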

    The most that has come out of this is that we know it's a problem and we'd love to use the keyword "atomic" to wrap our transactional code in our languages. Beyond that, it's a lot of hand waving and PowerPoint slides. Some people though are actually trying to work it out. The best starting point here is the papers from the aforementioned researcher Simon Peyton Jones. His collection of papers on STM offers a good introduction to the problem and to what some possible solutions are. In his papers he uses Haskell, and his work has led to Concurrent Haskell. Haskell lends itself to STM for reasons I won't go into here, and it will be quite a bit more of a challenge to get the same functionality in Java and C#. That said, there is already an API for C# Software Transactional Memory from Microsoft Research you may want to explore.

    If you don't care about this, just don't go naming classes atomic and you should be fine.

    Storing The Cloud: How Do We Scale?

    Solid State (read: Flash) drives aren't the only thing showing the age of our old file system technologies. As we expose software as services and begin taking on large numbers of tenants for our software, cloud computing needs clusters with thousands of nodes that, with the multi-core technology mentioned above, will pose a challenge for storage systems. We will need the ability to scale to handle data generated by applications executing in parallel in tens of thousands of threads. There have been some solutions proposed, such as IBM General Parallel File System (GPFS) and Microsoft Research's Boxwood technology.

    I was lucky enough to watch a presentation on GIGA+, another solution that is being researched by Swapnil V. Patil at Carnegie Mellon University. One of its neatest ideas is leaving the header-table behind, using a bitmap instead. I got to sit down with him afterward and talk about the challenges we face in this space. It was a great time. His primary concern about GPFS and Boxwood is the use of hashing and B-trees, which causes the possibility of bottlenecks and synchronization issues. By using a bitmap, and keeping it small so that it can be shared across nodes easily, GIGA+ eliminates a need for "metanodes" or other controllers on the HPC storage architecture.

    His paper, GIGA+ : Scalable Directories for Shared File Systems, is a great read for those interested in both high-performance computing and storage. Their work seeks to maintain the UNIX file structure however, so those who care about scaling Microsoft infrastructure may find less to enjoy, but the overall architecture and problems outlined in the paper are applicable to any massively large storage cluster technology.

    Enough Already

    That should be enough to get you through August. When your boss comes back from his Alaskan cruise, nothing will ensure he leaves you alone more than talking about Concurrent Haskell or how much you enjoyed Chapter 9 of Programming Elements: Algorithms on increasing ranges. Enjoy the air conditioning you lucky bums.

    On The New Communications of the ACM Redesign

    Image: Communications of the ACM July 08

    A while ago ACM embarked on an ambitious mission: to change their flagship publication, Communications of the ACM, for Association for Computing Machinery members, into the JAMA of Computer Science. If this new issue of the re-designed CACM is any indication, they will succeed. In the first few pages we have quantum computing, modeling to eliminate errors in software, an analysis of cloud computing, a debate about the future of the computer science curriculum and what it means for career paths as programming becomes offshored, and the history of the IT industry in India.

    .. and I'm only on page 33.

    There are 112 pages.

    It used to be that way - from the inception of CACM on through the 1970s the magazine was a collection of computer science research for the academic professional. However, as the 1980s and 1990s moved computers into people's homes and the IT field changed from PhDs toying with large Turing machines to undergrads who used Visual Basic and Java for basic business purposes, the magazine changed. These new practitioners didn't come from the academic field, didn't really understand the basic underpinnings of a computer, and usually didn't care. The funding of the ACM dried up as well, even as the number of people in the field boomed. The CACM changed to grab these people by becoming more of a mainstream magazine geared towards those new entrants - maybe to attract these people to the ACM membership. It didn't seem to work. The magazine lost its way.

    Now we are once again approaching a change in the computer science field. Much like the way Cloud Computing is taking us back to the large machines in the back rooms and thin clients at the edge, software engineering is changing back from large numbers of engineers with basic knowledge to a smaller number with more specialized knowledge. The Googles of this world are not as worried about basic applications written across millions of detached machines - things that usually create reusable patterns and easy software construction from a weekend's reading of O'Reilly books. Instead, they are worried about problems of concurrency, massively scalable storage systems and parallel processing while sharing the same memory space. The choice of language has become an implementation detail for expressing these ideas, and languages can be interchangeable. These problems require knowledge of tuples and binary trees and graph theory, to name a few.

    At the same time, programming jobs that boomed in the 90s and 00s are being outsourced to cheaper and cheaper labor overseas, with the harder proofs being demonstrated once on the internet and then communicated across the world for others to incorporate. Pre-packaged software for businesses is becoming more configurable to existing systems and removing the need for custom software from programmers in non-software companies. This means that those who are serious about the profession are diving deeper into the roles of architect, designer and academic - while those who aren't as interested are moving on to other careers. These two changes are providing an entrance for a journal like CACM to come alive again and publish the best research available needed to solve these hard problems.

    The new CACM couldn't come at a better time.

    Posted by brandon_werner | 0 Comments

    Goodbye Map Reduce - Hello Cascading

    An interesting post from Nathan Marz regarding an abstraction layer from Chris Wensel called Cascading:

    We have been doing a lot of batch processing with Hadoop MapReduce lately, and we quickly realized how painful it can be to write MapReduce jobs by hand. Some parts of our workflow require up to TEN MapReduce jobs to execute in sequence, requiring a lot of hand-coordination of intermediate data and execution order. Additionally, anyone who has done really complex MapReduce workflows knows how hard it is to keep “thinking” in MapReduce. Luckily, we discovered a great new open source product called Cascading which has alleviated a ton of our pain. Cascading is the brainchild and work of Chris Wensel, and he’s done a great job developing an API which solves many of our problems. Cascading abstracts away MapReduce into a more natural logical model and provides a workflow management layer to handle things like intermediate data and data staleness.

    It is a very good walkthrough of how they take a tuple problem set and use Cascading to simplify the management of pipes, particularly forking and merging pipes together.
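
    To get a feel for why hand-sequencing jobs becomes painful, here is a minimal in-memory sketch in Python of two MapReduce-style jobs chained by a small pipeline helper. It is not Cascading's API and not Hadoop; map_reduce, pipeline and the two example jobs are hypothetical names I made up for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one job in memory: map, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]


def pipeline(*stages):
    """Chain jobs so each one's output is the next one's input."""
    def run(records):
        for mapper, reducer in stages:
            records = map_reduce(records, mapper, reducer)
        return records
    return run


# Job 1: count how often each word appears.
def split_words(line):
    for word in line.split():
        yield word, 1


def sum_counts(word, ones):
    return word, sum(ones)


# Job 2: group words by how often they appeared.
def key_by_count(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    return count, sorted(words)


words_by_count = pipeline((split_words, sum_counts),
                          (key_by_count, collect_words))
result = words_by_count(["a b a", "b c"])
print(result)  # [(2, ['a', 'b']), (1, ['c'])]
```

    With two jobs the helper hardly matters. With ten, somebody has to own the ordering and the intermediate data between stages, which is the layer the quote says Cascading provides.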

    You may also want to see Yahoo Research's Pig as another example of an abstraction layer over MapReduce. Such layers seem to be all the rage now, as we need a way to query, join and generally work with these large datasets in an easy way. Yahoo's Pig seems to rely heavily on SQL-like syntax - an approach I'm not as fond of as the approach Cascading takes.

    Microsoft Live Mesh on Apple Mac OS X

    This is a screenshot of Mesh running on the Silverlight platform on Mac OS X. Pretty neat example of the future.

    By the way, if you're interested in developing for Live Mesh, there are some new videos posted on Microsoft Videos that provide an impressive amount of content. The RESTful services and the Pub/Sub model are of particular interest to me, since I think they will unleash a host of service aggregation possibilities in the future. The ability to ask the cloud to give you JSON or RSS without mapping or conversion is amazing.

    Microsoft Mesh Running In Safari Browser on OS X

    The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp

    Over at Lambda The Ultimate, the best academic programming blog on earth, there is a large debate going on regarding what the future of languages will be for 2008. The most important thing to emerge from the discussion is the larger role functional programming will play. It seems like a safe bet. This year has seen an explosion of interest in, and creation of, functional languages such as Nu on Apple's OS X, Scala on the JVM and Microsoft Research's .NET language F#.

    I am ecstatic at this change.

    The Failure Of Lisp

    It's hard to understand where it came from. Certainly one can argue the broader academic community had nothing to do with it; the old guard Common Lisp hackers are still as fickle and as judgmental to newcomers as ever. Also, the old standards in Lisp languages, Franz and LispWorks, have not lowered their prices to anything approachable to the casual developer. There are open source ANSI Lisp implementations, such as SBCL, but they come without all the supporting engines and functionality. In fact, the most linked thing I've ever written in my career is the walk-through I did for installing SBCL and Allegro, which includes adding your repository and packages for CLOS and automatically compiling the FASL files, especially dealing with the asdf differences between the implementations. The complexity of this in itself points to problems with portability and configuration in Lisp. However, even that project, which targeted Lisp's bread and butter, the parsing of semantic ontologies for the Semantic Web, was met in the message boards with worries about whether there would be enough developer participation using such an odd language, and recommendations on moving it to Java.

    In reality, Common Lisp showed its failure as a community by sitting out this enthusiasm that has been generated around functional programming languages. It didn't have to be that way. I recall my first awareness of functional programming's growth was the awesome work of Lemonodor's blog and Sriram Krishnan posting "Lisp Is Sin". I was happy at the time that Lisp was getting such attention, as well as functional language architectures in general. I imagined that as OO languages had grown so verbose and feature dense that even the IDEs to develop your applications run into the tens of gigabytes, a new evolution "Back To The Future" was inevitable. Even more, I believe long-suffering Lisp deserves to be back in favor again; it's certainly spent its time in purgatory. Yet, it didn't happen. You can blame the 50-year-old men sitting on IRC channels for that. It was the most thorny and un-inspiring community I've ever participated in, despite my extreme interest in the language. It's jaw dropping that a language with such promise has sat out the resurgence, and it speaks to what an un-friendly and un-inviting community can do to a technology platform. I would be the first to march it off to the grave.

    The Rise Of Functional Languages

    The interest in functional programming actually grew up around more academic but pure languages like Scheme and Haskell. Although these languages sit within their own island and lack many of the "dirty" aspects of Lisp's CLOS environment that make it easy to access OS and hardware resources, they are still strikingly useful in learning things that are the staple of functional languages, such as closures and lambdas. Indeed, one could argue that the movement to bring closures into OO languages (first C#, now Java) was in part due to the rise of awareness of functional languages.

    Further, it seems to me that functional programming languages answered two prayers of those more ambitious engineers who don't seem to want to stick with the script and Java worlds they were taught in college. Those two large wins, far more important than the semantic features of functional languages that have gotten all the attention, are architectural foundations of functional languages:

    • Referential Transparency / Side Effects
    • Concurrency

    Referential Transparency

    To those coming from a pure OO world, Referential Transparency and the restriction of side-effects can be something hard to get their heads around. The best way I describe this concept is by hitting at the root of their assumptions: everything they deal with is dead. The objects are dead, the variables are dead, the entire atmosphere is dead, as if something had come along and killed everything in your stack and you have to assemble your program by only what's been given to you, nothing more. There are no instances; objects do not "come alive" and have state, a state that you have to poke into and a state that can change at any time. A function will always do what you expect, and nothing can come along and change that behavior.
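
    A tiny Python sketch makes the "dead" versus "living" distinction concrete. The names here (add_tax, Register) are hypothetical, invented purely for illustration.

```python
def add_tax(price_cents, rate_percent):
    """Pure: the result depends only on the arguments."""
    return price_cents * (100 + rate_percent) // 100


class Register:
    """Stateful: the answer depends on when you ask."""

    def __init__(self, rate_percent):
        self.rate_percent = rate_percent

    def add_tax(self, price_cents):
        return price_cents * (100 + self.rate_percent) // 100


# The pure call can be replaced by its value anywhere, at any time.
first = add_tax(1000, 10)
second = add_tax(1000, 10)

register = Register(10)
before = register.add_tax(1000)
register.rate_percent = 20      # some other code mutates the object
after = register.add_tax(1000)  # same call, different answer

print(first, second, before, after)  # 1100 1100 1100 1200
```

    The pure function can be swapped for its result without anyone noticing. The method call cannot, because the object's state may have moved underneath you between calls.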

    One of the things that seems to appeal to developers most about the promise of SOA architectures happening in enterprise environments, if you're smart enough to pry it out of them, is that they get the same referential transparency in services. No one can override a service (besides versioning, which is explicit to the developer) and a service will only return what it did earlier in your code and earlier in the year. This forces developers to design services that have the same relationship to the world that functional programmers give their functions. This is perhaps the trickiest part of migrating enterprise teams to a services based model: their expectations about the mutability of the services they are accessing and their inability to anticipate what working in that world will be like. Especially for those who use tools or libraries to convert service interaction into an object, the interaction can be jarring.

    However, they soon find the predictability and the safety of such an environment liberating. OO programmers were used to making their objects or variables immutable to maintain their contracts and relationships with other objects, often sacrificing many of the benefits that OO programming promised their stack. Now they have immutability and transparency in an environment where functional paradigms are key, and they do not expect to be able to "embrace and extend" services. They are what they are. This tends to cascade out to the living instantiated code a developer writes as well, as there is no point in entering the world of the living if what you have to return to is a dead function.

    This was hinted at in an article in the ACM Queue magazine by Terry Coatta, entitled "From Here to There, The SOA Way". He states,


    Objects are still a very good way to model systems and they function reasonably efficiently in the local context. But they don't distribute well, particularly if one tries to use them in a naive way. A service-oriented architecture solves this problem by dealing with the latency issues up front. It does this by looking at the patterns of data access in a system and designing the service-layer interfaces to aggregate data in such a way as to optimize bandwidth, usage, and latency.

    SOA's limitations are not the only thing affecting the consciousness of software engineers. The other issue is the sharp rise in the complexity of managing a large enterprise library written in an OO language. One of the biggest pain points of any large application is the management of graph upon graph of live objects and the living data within them. When software engineers experience the lack of side effects in functional languages, it's a breath of fresh air.

    Concurrency

    A funny thing happened on the way to those multi-core processors. People loaded their applications on them and noticed nothing got much faster, particularly when it came to transaction-intensive tasks. Turns out Intel and AMD left out an important fact about their Moore's Law-cheating multi-core environment: you can't wring as much performance out of it without changing the way you manage concurrency and threads. Sequential programs could always rely on going faster as single-processor speeds increased, but as multiple cores come into play that isn't always the case. You want to farm transactions off to separate processors, and in the living world of mutable objects and variables, breaking out two transactions that operate concurrently on the same living data is a bad idea. Add structural programming's solution to this problem, optimistic and pessimistic locking, and you have deadlocks in short order.

    Functional programming has been a natural place to explore parallel processing and new ways of doing atomic transactions for the reasons above. More importantly, these atomic structures are composable, a property that is lost when using locks in structural programming. A lot of the buzz has been generated around the idea of software transactional memory (STM), where execution blocks can be flagged, managed, and built upon. The best introduction to this topic is the paper by Tim Harris entitled Concurrent Programming Without Locks. Although this used to be expressed only in the confines of Concurrent Haskell, others have shown how the same techniques can be used in other functional languages, such as F# using nothing more than PowerList.
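
    The composability point is easier to see in code. What follows is a toy optimistic-transaction sketch in Python, written only for illustration: it is not how GHC implements STM and not the algorithm from the Harris paper, and every name in it (`TVar`, `Transaction`, `atomically`, `transfer`) is my own. It also skips things a real STM gives you, such as blocking retry and protection against a doomed transaction seeing inconsistent state before it is aborted.

```python
import threading

_commit_lock = threading.Lock()  # serializes commits, not the transaction bodies


class TVar:
    """A transactional variable: a value plus a version bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> pending value, invisible until commit

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:  # take a consistent (value, version) snapshot
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # Validate: abort if anything we read has changed since we read it.
            if any(tvar.version != seen for tvar, seen in self.reads.items()):
                return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body):
    """Run body(tx) until it commits; body must touch shared state only via tx."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


def withdraw(tx, account, amount):
    tx.write(account, tx.read(account) - amount)


def deposit(tx, account, amount):
    tx.write(account, tx.read(account) + amount)


def transfer(tx, source, target, amount):
    # Composition: two smaller transactional actions become one atomic action.
    withdraw(tx, source, amount)
    deposit(tx, target, amount)


checking, savings = TVar(1000), TVar(0)


def worker():
    for _ in range(200):
        atomically(lambda tx: transfer(tx, checking, savings, 1))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_checking = atomically(lambda tx: tx.read(checking))  # 200
final_savings = atomically(lambda tx: tx.read(savings))    # 800
```

    Notice that `transfer` is built from `withdraw` and `deposit` without either of them knowing about the other or about any lock ordering. With two lock-protected accounts you would have to expose the locks and agree on an acquisition order to get the same guarantee.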

    This experimentation is one of the main reasons why functional languages have become more important as software engineers wrestle with the problems and promise of multi-core processors in transaction processing. Although not every engineer will be interested in the deeper details of STM or other strategies in concurrent programming, the fact that these libraries will emerge and only be available in the functional realm will force software engineers to learn the core concepts and bring even more visibility to the functional programming space.

    Functional Hybrids: Functional Programming Is Now Approachable

    The other driver for adoption of functional programming languages, besides the architectural benefits they bring to current problems, is that languages such as F# and Scala have adopted a more hybrid model in their design, where a developer isn't forced completely outside her comfort zone. Scala combines functional programming with deeper OO methodologies (as in Smalltalk) and has access to the entire Java library, significantly reducing the learning curve. The same can be said for F# and .NET, and for Nu and Objective-C. This does have drawbacks, however: neither F# nor Scala has been able to use as many of the STM strategies as Concurrent Haskell allows, because the underlying thread architecture of the VMs they run on is built for structural programming languages. It is easy to see how this can be fixed, however, giving those who use hybrid functional languages the same power as those who express their ideas in Haskell or even Lisp.

    As I said, I am excited about this resurgence in functional programming languages, and I am optimistic that 2008 will have even more to offer those who are just getting their feet wet. I personally know some college freshmen who started out using Nu as their first language and are already contributing to the community. The future of software engineering is bright.

    Thoughts On Google's Conference on Scalability In Seattle

    If you are looking for a good collection of notes regarding the topics covered at the Seattle Conference on Scalability, you can do no better than what James Hamilton put together. Instead, I'll write a quick commentary on what I experienced.

    Scalability Is Your Problem Too

    The goals of the conference are laudable. Scalability is an issue that almost all practitioners of software engineering face, especially as we move towards offering services both inside and outside the enterprise. Many are caught off guard by the issues that suddenly confront them after wiring up a large-scale services-based environment, especially around distributing load, distributing the data, and writing the data quickly. Sadly, I didn't see too many people from large enterprises there - most attendees were from software companies like Microsoft, Google, MySpace and Amazon.com. The attendance may be a consequence of the subject matter. This was some intense stuff: MPI at Cray and its hopeful successor, Wikipedia redone with DHT and Erlang, a B-tree vs. hash map debate, and scalable storage issues when dealing with billions of files. A more fun-loving person would have done better going over to Adobe and hanging out at BarCampSeattle, which was going on at the same time.

    Despite the intimidating material, there are real architectural and design issues that these discussions present that should be in the mind of anyone dealing with large datacenters that scale globally or even nationally. The approach of GIGA+ file storage, maidsafe's new computer architecture, and NetWorkSpaces for the R language was uniform: off-loading responsibility for management of data (meta or otherwise) to all vertices in the deployment graph instead of a central repository. NetWorkSpaces in R and maidsafe even discussed computational scalability - while Cray's new Chapel language and the discussion around Software Transactional Memory focused on scalability across processing cores as well as machines.

    [Figure: GIGA+ bitmap example]

    GIGA+'s approach of maintaining a small bitmap file on each node and passing that around - while anticipating and accepting stale data on a few edge nodes - was brilliant in the patterns it hinted at, including that perhaps being right all the time isn't as important as being fast. You can be right most of the time and accept the performance hit of not being right some of the time. There are many people who would cringe at this, but at this point we're going to have to play loose and leave a few balls up in the air as we juggle - doing the math of how often one may fall while keeping the rest going as fast as we can.
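
    Here is how I picture the pattern, as a toy Python model. To be clear, this is my own simplification and not GIGA+'s actual protocol or code: a single `Directory` object stands in for all the servers, partitions split the way extendible hashing does, and the names (`locate`, `Directory`, `Client`, `MAX_DEPTH`) are made up. The point is only that a client with a stale bitmap still gets the right answer; it just pays an extra hop and goes home with a fresher bitmap.

```python
import zlib

MAX_DEPTH = 8  # assumed cap on how deep a chain of splits may go


def key_hash(name: str) -> int:
    return zlib.crc32(name.encode("utf-8"))


def locate(bitmap: set, name: str) -> int:
    """Deepest partition present in `bitmap` that could hold `name`."""
    h = key_hash(name)
    for depth in range(MAX_DEPTH, -1, -1):
        candidate = h % (1 << depth)
        if candidate in bitmap:
            return candidate
    raise LookupError("partition 0 must always exist")


class Directory:
    """Authoritative state, standing in for the whole set of servers."""

    def __init__(self):
        self.bitmap = {0}
        self.depth = {0: 0}      # partition id -> current depth
        self.files = {0: set()}  # partition id -> file names

    def insert(self, name: str) -> None:
        self.files[locate(self.bitmap, name)].add(name)

    def split(self, pid: int) -> int:
        d = self.depth[pid]
        new_pid = pid + (1 << d)
        self.depth[pid] = self.depth[new_pid] = d + 1
        # Files whose next hash bit is set move to the new partition.
        moved = {n for n in self.files[pid] if key_hash(n) % (1 << (d + 1)) == new_pid}
        self.files[pid] -= moved
        self.files[new_pid] = moved
        self.bitmap.add(new_pid)
        return new_pid

    def lookup(self, pid_guess: int, name: str):
        """Answer directly if the guess was right, otherwise forward (one more hop)."""
        actual = locate(self.bitmap, name)
        hops = 1 if actual == pid_guess else 2
        return name in self.files[actual], hops, set(self.bitmap)


class Client:
    def __init__(self, directory: Directory):
        self.directory = directory
        self.bitmap = set(directory.bitmap)  # cached copy; allowed to go stale

    def lookup(self, name: str):
        guess = locate(self.bitmap, name)
        found, hops, fresh = self.directory.lookup(guess, name)
        self.bitmap = fresh  # every reply piggybacks an updated bitmap
        return found, hops


directory = Directory()
names = [f"file{i}" for i in range(64)]
for n in names:
    directory.insert(n)

stale_client = Client(directory)  # caches the bitmap {0}
for pid in (0, 0, 1):             # the directory grows behind the client's back
    directory.split(pid)

moved_name = next(n for n in names if locate(directory.bitmap, n) != 0)
first = stale_client.lookup(moved_name)   # stale guess, corrected by the server
second = stale_client.lookup(moved_name)  # refreshed bitmap, direct hit
```

    Nobody had to broadcast the splits to every client. The only cost of being wrong is the second hop on `first`, which is exactly the trade the talk was arguing for.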

    Pay No Attention To The Man Behind The Curtain

    Yet if I had to sum up the content of the conference I would say it was big on strategy and architecture but short on implementation. There were a lot of things hinted at "behind the curtain" but nothing assured hand raising from the compsci geeks in the room more than hand waving when you got to the distributed piece of your solution. For instance, one of the big benefits of Chapel - the MPI successor that Bradford Chamberlain of Cray presented - was that you could have distributed arrays and graphs that would be automatically sliced up and distributed to parallel cores or even other "locales" if desired. How the language determines where to split these large arrays and graphs and farm them out was not discussed. One of the more interesting slides showed dashed lines drawn across various nodes and vertices of a graph, symbolizing how it would be chopped and distributed. Someone in the audience raised their hand at this - but he moved on and the hand went back down. To be fair, Chapel was called a "multi-resolution" language where one could start fairly abstract and then add more detail and control to get the desired result - something I assume you have to do to get good or intelligent chopping and distribution of the data. Given that one of his slides was a comparison of code lines between Fortran using MPI and Chapel, seeing a working code snippet of Chapel would have been helpful. It may turn out to be the same amount of work after you get past the "global view".
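
    Since the talk didn't show how the chopping works, here is the simplest scheme I can imagine, sketched in Python rather than Chapel: contiguous blocks, one per locale. This is my guess at a baseline, not a description of what Chapel's compiler or runtime does, and `block_bounds` and `distributed_sum` are names I made up. Threads stand in for locales.

```python
from concurrent.futures import ThreadPoolExecutor


def block_bounds(n_items: int, n_locales: int):
    """Split range(n_items) into contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_locales)
    bounds, start = [], 0
    for locale in range(n_locales):
        size = base + (1 if locale < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def distributed_sum(data, n_locales: int):
    """The 'global view': the caller asks for a sum and never sees the slicing."""

    def local_sum(bound):
        lo, hi = bound
        return sum(data[lo:hi])  # each locale reduces only its own block

    with ThreadPoolExecutor(max_workers=n_locales) as pool:
        return sum(pool.map(local_sum, block_bounds(len(data), n_locales)))


bounds = block_bounds(10, 4)                     # [(0, 3), (3, 6), (6, 8), (8, 10)]
total = distributed_sum(list(range(1, 101)), 4)  # 5050
```

    Arrays are the easy case. The question from the audience was really about graphs, where a cut that ignores the edges can leave every locale chasing pointers on every other locale, and a block split like this one tells you nothing about that.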

    This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk from Vijay Menon of Google - whose work at Intel I was familiar with - on Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I'm suspicious you can even do STM well in an imperative language with state - as I discussed before) but beyond suggesting the keyword "atomic" to replace "synchronized" in the Java language there was very little real content for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn't even mentioned. A better introduction and discussion is to be had by watching the O'Reilly OSCON video from Simon Peyton Jones (the writer of GHC and now at Microsoft Research) on the subject. After that, if you're still hungry, the collection of papers on his Microsoft Research site is a delight.

    Of course the point of these conferences is the discussions that occur during the breaks and in the networking event afterwards - something that I treasure having newly moved to the Seattle area from Cincinnati. Instead of just observing and blogging from afar - I get to be at the same table as Vijay Menon, Thorsten Schuett, Swapnil Patil, Paul Watson and others.

    Summary of the Architectural Patterns I Saw

    If I had to summarize what I took away from the conference from a high-level architectural standpoint, here it is:

    • Every node must be aware of the state of every other node without a centralized controller.
    • To do this, a mechanism should be in place to share state quickly but peer-to-peer.
    • It's ok to let some nodes go stale.
    • Client/Server is now one thing. Pub/Sub with computation. Every node on the graph should do work.
    • As much as possible, each node should maintain its own security and state. You should be able to have anonymous resources appear in your data center and be put to use without much configuration.
    • As much as possible, abstract the distribution of processing away from programmers.
    • Key/value stores with hashing are best for scalability and distribution (this approach seems to have won out in all the solutions presented here). Blame MapReduce.
    • Ants can be used to demonstrate anything.
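
    The key/value bullet, together with the ones about nodes appearing without much configuration, can be sketched with a consistent-hash ring. This is one common technique and my own choice for illustration; I am not claiming any of the talks used exactly this. The `HashRing` class and the node names are hypothetical.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its SHA-1)."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, replicas: int = 64):
        self.replicas = replicas  # virtual nodes per real node, to even out load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def node_for(self, key: str) -> str:
        """Owner is the first virtual node clockwise from the key's position."""
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")  # an anonymous resource shows up in the data center
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
only_to_new_node = all(after[k] == "node-d" for k in moved)
```

    Any node that knows the membership list can compute a key's owner locally, with no central lookup, and adding `node-d` only moves the keys that `node-d` takes over. Everything else stays where it was, which is what makes the "put it to use without much configuration" bullet plausible.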

    I hope everyone had as good a time as I did.

     