I recently sat down and thought a little about the typical user experience when troubleshooting IIS6, assuming s/he had little/no IIS context that long-time users have... and the picture did not look so good.
Now, I know that IIS7 will make huge improvements in this area (and will unfortunately obsolete some of this information... but not the general concepts! :-) ), but it is not available yet. I am also certain that users will continue to push us to literally start to self-diagnose the issue instead of providing useful information for users to make corrective actions (and certainly not merely dump trace data, like we do in WS03SP1...), so there has to be a good balance.
But in the meantime, I wanted to gather together some of the basic troubleshooting steps that I go through to diagnose issues which appear to involve IIS, as well as explain some of the rationale of the steps, so that the reader can better understand the process.
First and foremost, when you are trying to troubleshoot why something is not working, you want to preserve the system state for examination. I cannot emphasize this enough... especially when you want someone else to help you. I can tell you that developers will not even look at your issues unless you give them the exact state that triggers the issue, and their rationale is simple - they want to diagnose the actual issue... not some psuedo or related issue. Only the Real Thing.
Thus, you want to treat the system more like a "crime scene" where you take measured steps around the chalk figures, take non-destructive samples, record the location of everything, etc... If you must make changes, make note of the order in which you do them as well as any consequential effects. I prefer to keep a little temporary "journal" in notpad.exe of my actions - so that I remember what I did, why I did them, and in what order... so that subtle patterns may emerge in later analysis without needing to gather data again.
For example, RESIST the urge to try to uninstall/reinstall IIS, change filesystem ACLs to wide-open, make user accounts to be administrator, or lower security settings on the server or in the browser just to see if things work. These sorts of actions merely destroy system state such that you may never know *why* something started working (and since you destroyed system state, no one else can help re-interpret for you)... so you never know *how* to deal with it in the future. In other words, you fail to learn and grow from the experience, and it can only cost you more time/effort in the future. Resist the urge to snap up short-term gains and focus on the long-term benefit. Simply finding solutions is ok; learning how to solve a class of problems is even better. Gee... these same ideas apply to life equally well... ;-)
This is why on newsgroups/Q&A, when someone tells me that they reinstalled, wiped ACLs, changed Administrators, etc during their "troubleshooting", I simply have to turn off and stop helping them... not because I do not care... but because at that point I am no longer dealing with the original issue but some new meta-issue.
Now, I know that some of you enjoy "troubleshooting" by tinkering around with settings in the UI to see if any combination works and if it does, great. But realize that this simply means that you condemn yourself to be limited by whatever is accessible via the UI, and in the case of an open platform like IIS, understanding fundamentals of how things plug together is really important because the UI simply cannot and does not represent all useful interactions. Thus, the astute reader should notice that I never write a blog entry from the perspective of the UI but always from the perspective of the IIS Core Server and what is going on... and the UI is merely a means to configure the necessary settings as appropriate.
In general, I prefer to the classic warfare approach of first gathering and analyzing my "recon data" before formulating any strategy and making any intrusive deployments and movements...
A common pitfall that I see users fall into is to claim:
I did not see any useful errors in X, so I decided to do something more drastic.
I did not see any useful errors in X, so I decided to do something more drastic.
While I understand that users are trained to look for errors in a variety of places (Event Log seems to be a common place where people expect everything to be logged), and IIS does not log everything everywhere (and certainly does not flood the event log), it pays to be patient. Even when I am dealing with some system I do not fully understand (like the Windows Firewall, Exchange, SQL...), I still first search the web for any diagnostic aid, search the filesystem/registry for any log file clues, etc... before trying anything else... simply because I value preserving system state.
The most valuable log files for IIS6 (and their locations) are:
** I'm not going to complicate things with Centralized Binary Logging and such... because the point of the logfile remains the same; just location differs. Materially it does not change the discussion nor rationale.
One of the more non-intuitive things when dealing with IIS log files is that there is no single location to correlate all of them. Worse, it often appears that information is haphazardly split amongst them, so it is never clear where to look for what. Barring unintentional bugs, here is how I logically classify the log files:
One common source of "debugging information" that I dissuade you from relying upon for troubleshooting is the HTTP response displayed by your web browser. Quite frankly, do not trust the browser for troubleshooting unless you have nothing better. Use a tool like WFetch from IIS Resource Kit Tools. Browsers simply have too many "usability" features that limit their usefulness, including:
In short, first look at the aforementioned Log files on the server before doing anything else...
When the log files do not tell the whole story, I resort to pragmatic, non-invasive monitors like FileMon, RegMon, and NetMon to observe and record what happens on the situation in question.
Suppose the log files and pragmatic monitors fail to tell the whole story... I then attach debuggers like NTSD or WINDBG from the Microsoft Debugging Toolkit (do NOT use Visual Studio because installing/using that tool changes too much machine state which may be relevant. Visual Studio is more a development platform than a debugging tool), set up symbols as directed, and then investigate the process responsible for handling the request in question.
A couple of useful breakpoints to use include:
In conjunction with Log files indicating the error response/codes, Network Monitor indicating what is wrong with the response, and trapping what ISAPI Filters/Extensions do on the server, you can usually track down whether the problematic response came from IIS or some particular ISAPI... which is a huge step forward in troubleshooting.
Ok, I am going to stop at this point because I do not want this blog entry to be some all-encompassing novel that takes me forever to write, edit, and perfect and hence never publish. ;-) I hope that this information provides a useful scaffold for any user of IIS to effectively gather data to troubleshoot a variety of IIS-related issues by employing various associated diagnosis techniques (which I intentionally did not mention... though they would be nice subjects for future blog entries).
Please note that my troubleshooting steps do not involve changing the system's or even IIS's configuration other than replaying the request or action that triggers the issue under investigation... because to me, preserving system state and recording my actions/observations is most important. Why? Well... suppose I cannot actually resolve the issue... then I definitely do not want to prevent anyone else from helping me resolve the issue and want to provide them with the best environment and all the information I had already independently gathered to save them time.
Yes, I realize that this approach does not make you feel empowered because you do not actively change anything... but please realize that troubleshooting is about making correct changes quickly; not quickly making correcting changes... ;-)
[2006-07-14] Hmm... minds thinking alike. I recommend this URL: