Larry Franks and Brian Swan on Open Source and Device Development in the Cloud
I've been researching how to troubleshoot Node.js applications hosted on Windows Azure recently, and while I don't have a complete story yet, I wanted to share what I do have. So here's a brain dump of diagnostic information that Node.js developers might find useful.
Also, don't be depressed/upset about some of the problems I call out with things like logging on Cloud Services. We know that we need to improve this story. The Windows Azure developers are working on it, but if you have thoughts on how we can improve troubleshooting for Node.js applications on Windows Azure, let me know.
Unfortunately the troubleshooting story is different between Web Sites and Cloud Services. Even within Cloud Services, there's a big difference between what's available for applications hosted in a Web Role vs. a Worker Role. In general, everything I mention in this post about Cloud Services is for Cloud Services implemented using a Web Role. The reason for this is that Node.js applications hosted as a Web Site or a Cloud Service in a Web Role use the IISNode module, which provides some nice troubleshooting functionality. Applications hosted in a Cloud Service Worker Role are just raw Node.exe talking to the wire.
This doesn't mean that you should never use Worker Roles, just that if you do use one and need to troubleshoot your application, you'll have to fall back on standard works-for-every-Node-application troubleshooting steps.
Here's the breakdown of what is and isn't available for troubleshooting Node.js applications hosted as a Web Site or Cloud Service (Web Role only):
Since most of the troubleshooting functionality is implemented through IISNode, it's important to understand how it works. Basically, IISNode is a layer that sits between IIS and Node.exe, and routes requests that arrive at IIS to Node.exe. This lets IIS handle things like spinning up multiple node instances (1 per core by default) and lifecycle management tasks.
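Because IISNode sits in front of Node.exe, your application doesn't bind a public TCP port itself. As far as I know, IISNode hands the process the name of a Windows named pipe through process.env.PORT, so the app should pass that value straight to listen() rather than parsing it as a number. Here's a minimal sketch; the fallback port 1337 for local runs and the function names are my own choices, not anything IISNode requires.

```javascript
const http = require("http");

// Under IISNode, env.PORT holds a named pipe path rather than a TCP port
// number, so it is passed to listen() unchanged. 1337 is an arbitrary
// fallback for running the app locally without IISNode.
function resolveListenTarget(env) {
  return env.PORT || 1337;
}

// A trivial request handler. Including the process ID makes it easy to see
// which Node.exe process IISNode routed the request to.
function handleRequest(req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello from " + process.pid + "\n");
}

// Creates the server and starts listening on whatever target applies.
function start(env = process.env) {
  const server = http.createServer(handleRequest);
  server.listen(resolveListenTarget(env));
  return server;
}

module.exports = { resolveListenTarget, handleRequest, start };
```

To actually run it, call start() at the end of your application's entry file.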
IISNode has a bunch of configuration options, which can be controlled by adding an iisnode.yml file to the root of your application. A full listing of switches can be found at https://github.com/tjanczuk/iisnode/blob/master/src/samples/configuration/iisnode.yml.
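If you want a quick way to check what an iisnode.yml file actually turns on or off (say, as a pre-deployment sanity check that loggingEnabled and devErrorsEnabled have the values you expect), the flat key: value lines it uses are easy to read. This is a rough sketch that handles only that simple subset, not real YAML; parseIisnodeYml is a hypothetical helper name.

```javascript
// Parses the flat "key: value" subset of YAML used by iisnode.yml files.
// This is NOT a general YAML parser: nesting, quoting rules, multi-line
// values and "#" characters inside values are not handled.
function parseIisnodeYml(text) {
  const settings = {};
  for (const rawLine of text.split(/\r?\n/)) {
    // Strip comments ("# ...") and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    if (line === "") continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue; // not a key: value line, so ignore it
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    settings[key] = coerce(value);
  }
  return settings;
}

// Converts "true"/"false" and numeric strings to booleans and numbers;
// anything else is returned as a string.
function coerce(value) {
  if (value === "true") return true;
  if (value === "false") return false;
  if (value !== "" && !isNaN(Number(value))) return Number(value);
  return value;
}
```

For example, parseIisnodeYml(fs.readFileSync("iisnode.yml", "utf8")).loggingEnabled is false if logging has been disabled. A setting that's missing from the file simply won't appear in the result, in which case IISNode's default applies.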
So here's one of the gotchas with IISNode: Windows Azure doesn't use the full version. Instead it uses a 'core' version that implements only some of the functionality of the full version. I don't have a complete list of what is and isn't in the core version, but the things I mention in this article should work (unless I specifically call out why they don't).
IISNode can capture the stdout/stderr streams and save them to log files. This is currently enabled by default for Windows Azure Web Sites and Cloud Service Web Roles. By default, the logs are stored in a subdirectory of your application named iisnode. If you want to disable logging, create an iisnode.yml file and add loggingEnabled: false to it.
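Since IISNode picks up anything written to stdout and stderr, you don't need a logging library to get output into those files. That said, a timestamp and a level on each line make the files much easier to read later. A minimal sketch; formatLogLine and log are hypothetical helpers, and the line format is just one choice:

```javascript
// Builds a log line such as "2013-01-02T03:04:05.000Z [INFO] started".
// The date parameter exists mainly so the output can be tested.
function formatLogLine(level, message, date = new Date()) {
  return `${date.toISOString()} [${level.toUpperCase()}] ${message}`;
}

// Writes errors and warnings to stderr and everything else to stdout.
// IISNode captures both streams, so no extra log plumbing is needed.
function log(level, message) {
  const line = formatLogLine(level, message);
  if (level === "error" || level === "warn") {
    process.stderr.write(line + "\n");
  } else {
    process.stdout.write(line + "\n");
  }
}
```

Then call log("info", "listening") or log("error", err.stack) from anywhere in the app.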
For Windows Azure Web Sites, the log files can be pulled down by visiting the FTP link for your web site, or by running the azure site log download command from the Windows Azure command-line tools. The download command pulls down a zip archive containing not only the IISNode logs, but any other logging you may have enabled in the portal.
azure site log download
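Once you've extracted the downloaded archive, the IISNode output should be in a directory named iisnode (I'm assuming the archive preserves that name). Rather than assume the exact layout of the archive, this sketch walks the extracted folder and collects every file under any directory with that name. findIisnodeLogs is a hypothetical helper, and the extraction step is up to you, since the Node standard library doesn't unzip archives.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively walks root and returns the sorted paths of all files that sit
// anywhere below a directory named "iisnode" (case-insensitive).
function findIisnodeLogs(root) {
  const results = [];
  function walk(dir, insideIisnode) {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full, insideIisnode || entry.name.toLowerCase() === "iisnode");
      } else if (insideIisnode) {
        results.push(full);
      }
    }
  }
  walk(root, false);
  return results.sort();
}
```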
Windows Azure Web Sites have one other nice feature: the logs from all instances of your web site are aggregated, so there's just one directory containing the stdout/stderr logs for all instances of the application.
The story for Windows Azure Cloud Services isn't as good currently. The log files are stored in the iisnode subdirectory, but there's no centralized aggregation of the log files and no easy way to get to them. The logs are stored per role instance, and you need to either remotely connect to each instance to retrieve them or set up a process that periodically copies them off the instance.
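If you go the "periodically copy them off the instance" route, the copying part can be as simple as the sketch below. It assumes the destination is a folder the instance can write to (a mounted share, or a staging folder that another process uploads to storage); copyNewLogs and both directory paths are placeholders of my own, not part of any Windows Azure API. If several instances copy into the same destination, give each one its own subfolder, for example named after os.hostname(), so file names can't collide.

```javascript
const fs = require("fs");
const path = require("path");

// Copies regular files from sourceDir (e.g. the app's iisnode subdirectory)
// into destDir, skipping files whose copy already exists with the same size.
// Returns the names of the files copied on this pass.
function copyNewLogs(sourceDir, destDir) {
  fs.mkdirSync(destDir, { recursive: true });
  const copied = [];
  for (const name of fs.readdirSync(sourceDir)) {
    const src = path.join(sourceDir, name);
    const srcStat = fs.statSync(src);
    if (!srcStat.isFile()) continue; // ignore subdirectories
    const dest = path.join(destDir, name);
    // Unchanged since the last pass (same size), so nothing to do.
    if (fs.existsSync(dest) && fs.statSync(dest).size === srcStat.size) continue;
    fs.copyFileSync(src, dest);
    copied.push(name);
  }
  return copied;
}
```

To run it periodically, something like setInterval(() => copyNewLogs(logDir, destDir), 60 * 1000) would pick up new or grown files once a minute. A file that's still being written may be copied partway through, but because its size keeps changing, a later pass copies it again.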
If you read through the iisnode.yml example linked above, you may be thinking "But I can just access the log files via HTTP, why do I have to remote into the instances or download the logs?" Well, this functionality is disabled by default for Web Sites and Cloud Services. You can make it work by modifying the IIS URL Rewrite rules in the web.config for your application, but consider this: how useful is HTTP access to the logs when you can't easily determine which hosting instance you'll connect to?
For example, if you're hosting the application on a Cloud Service you might have it scaled out to 2 instances. Those are load balanced behind the virtual IP assigned to the service, so hitting that URL is going to take you to one of the instances, but you can't really direct it to one or the other.
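You can't steer the load balancer, but you can at least find out which instance answered. One option (my own suggestion, not an IISNode feature) is to stamp every response with the host name and process ID. On a Cloud Service each role instance runs on its own virtual machine, so the host name tells the instances apart. The header name X-Served-By and the withInstanceHeader helper are hypothetical.

```javascript
const os = require("os");

// Hypothetical header name; pick whatever name suits you.
const INSTANCE_HEADER = "X-Served-By";

// Wraps a Node request handler so every response records which machine and
// process produced it, e.g. "RD00155D...:1234".
function withInstanceHeader(handler) {
  const instanceId = `${os.hostname()}:${process.pid}`;
  return function (req, res) {
    res.setHeader(INSTANCE_HEADER, instanceId);
    return handler(req, res);
  };
}
```

Wrap your existing handler, e.g. http.createServer(withInstanceHeader(handleRequest)), and check the header in your browser's developer tools. Host names are internal details, so you may want to enable this only while troubleshooting.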
So for a production, scaled-out application, HTTP access to the logs isn't all that useful. It's far more useful to have some sort of centralized aggregation of the logs, like the one Windows Azure Web Sites provides.
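If you do collect logs from several Cloud Service instances yourself, you'll probably want to see them as a single timeline. The sketch below merges per-instance lines chronologically and tags each one with the instance it came from. It assumes every line starts with an ISO-8601 UTC timestamp in one consistent format (like the output of Date.prototype.toISOString()), because those sort correctly as plain strings; mergeInstanceLogs and the instance names are hypothetical.

```javascript
// logsByInstance maps an instance name to an array of its log lines.
// Returns one array of "<instance> <line>" strings in timestamp order.
function mergeInstanceLogs(logsByInstance) {
  const tagged = [];
  for (const [instance, lines] of Object.entries(logsByInstance)) {
    for (const line of lines) {
      if (line.trim() === "") continue; // skip blank lines
      // The timestamp is everything before the first space.
      tagged.push({ instance, line, time: line.split(" ", 1)[0] });
    }
  }
  // Sort on the timestamp only. Array.prototype.sort is stable (required
  // since ES2019), so lines with identical timestamps keep their input order.
  tagged.sort((a, b) => (a.time < b.time ? -1 : a.time > b.time ? 1 : 0));
  return tagged.map(({ instance, line }) => `${instance} ${line}`);
}
```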
Ever browsed your application after deploying it and received an HTTP 500 response? Not very helpful. When development errors are enabled, you'll get an HTTP 200 response instead, along with the last few lines of information sent to stderr.
Development errors can be controlled by adding devErrorsEnabled: true or false to your iisnode.yml file. Currently the default seems to be true. For a production application, I'd recommend disabling this setting, as most end users don't want to see your code's error spew and would appreciate a custom friendly error instead.
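Turning off devErrorsEnabled controls what IISNode returns when things go wrong, but for errors your own code catches you can also decide what the user sees. Here's a minimal sketch of the custom friendly error idea: the full error always goes to stderr (so it still ends up in the IISNode logs when logging is enabled), while the response body depends on whether you're in production. Using NODE_ENV === "production" as the switch is a common Node convention and an assumption here; renderError and sendError are hypothetical helpers.

```javascript
// Logs the full error to stderr and decides what the client should see:
// a short friendly message in production, the stack trace otherwise.
function renderError(err, isProduction) {
  const detail = (err && err.stack) || String(err);
  process.stderr.write(detail + "\n");
  if (isProduction) {
    return { status: 500, body: "Sorry, something went wrong. Please try again later." };
  }
  return { status: 500, body: detail };
}

// Example use from inside an http request handler's error path.
function sendError(res, err) {
  const isProduction = process.env.NODE_ENV === "production";
  const { status, body } = renderError(err, isProduction);
  res.writeHead(status, { "Content-Type": "text/plain" });
  res.end(body);
}
```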
IISNode isn't the only thing available in Windows Azure that can help diagnose problems; since IISNode runs under IIS, and on Windows, there are additional logs for those pieces of the infrastructure that can be enabled in the diagnostics section of the CONFIGURATION settings for your Web Site or Cloud Service. For more information on working with these settings and the information logged by them, see How to Monitor Cloud Services and How to Monitor Web Sites.
I hope the above information is useful to those of you hosting Node.js applications on Windows Azure, and if you have any additional diagnostic or troubleshooting tips you'd like to share, feel free to send them my way.