Over the last few years I’ve found that one question rarely gets a good answer: “What’s the best approach to debugging a network issue?” After spending long hours in front of nasty networking issues, I think the following steps are the best way to go about it.

1. Understand the System
This is probably the most important step to debugging an application. You should find out all the entities you are interacting with and how the communication takes place. Create a picture of what the application is trying to do. For networking problems, I find timeline diagrams to be very useful.

In addition, network sniffing tools help you check whether your model of how the system works matches how it actually works. In my mind, there are 3 parts to understanding the system: understand the client, understand the network, and understand the server. These are 3 active entities performing operations, and unless all 3 of them work together, we have a non-functional system.

2. Check the Configuration Options
One of the most common reasons network applications fail is incorrect configuration. Incorrect configuration can happen at the client, the network, or the server. The following lists possible mistakes in the configuration that networking applications depend upon:

The client:
• If you are using a Pocket PC or a Smartphone, make sure you have the connection manager settings set up correctly.
• Make sure the client clock is correct and matches the server clock.
• Make sure you have the required certificates installed on the device.
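
As a rough illustration of the clock check above, the sketch below compares a local timestamp against the `Date` header that an HTTP server returns. It is only one possible approach and assumes the server sends a standard `Date` header; the function name `clock_skew_seconds` and the sample values are made up for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    date_header is the value of an HTTP Date header;
    local_now must be a timezone-aware datetime.
    """
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()


# Fixed sample values so the example does not need a network connection.
header = "Tue, 15 Nov 1994 08:12:31 GMT"
local = datetime(1994, 11, 15, 8, 12, 41, tzinfo=timezone.utc)
skew = clock_skew_seconds(header, local)
print(f"client is {skew:+.0f}s relative to server")  # client is +10s relative to server
```

In a real check you would pass `datetime.now(timezone.utc)` and the header from an actual response. A skew of a few seconds is usually harmless, but a large one can plausibly break certificate validation or time-based authentication.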

The Network:
• Firewall Issues
• IPSec Issues
• Errors in routing tables (presumably this never happens).
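
One way to get a first hint about network-level problems such as a firewall is to probe a TCP port and look at how the connection attempt fails. The sketch below is an assumption-laden illustration, not a definitive test: the helper name `probe_tcp` and its return strings are invented here, and the interpretation (a timeout often means packets are being dropped, a refusal means the host answered but nothing is listening) is a rule of thumb rather than a guarantee.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no service accepted the connection.
        return "refused"
    except socket.timeout:
        # No reply at all; packets may be dropped somewhere on the path.
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and so on.
        return f"error: {exc}"


# Demonstration on the loopback interface only, so no external host is needed.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

print(probe_tcp("127.0.0.1", port))  # a listener exists, so this reports "open"
listener.close()
print(probe_tcp("127.0.0.1", port))  # listener gone; typically "refused"
```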

The Server:
• Incorrectly configured redirection
• Server application problems
• Incorrectly configured authentication settings

3. Perform Sanity Checks & Isolate the Subsystem Causing the Problem
Sanity checks are typically simple programs that check the validity of your setup and configuration. This kind of debugging approaches the problem from the opposite direction to the one described in the next section. In this section, we start out with a bunch of small programs, each of which tests the sanity of (ideally) exactly one subsystem. We start small, verifying the functionality of small subsystems, and then combine those programs into larger ones that verify larger and larger subsystems. In the next section, we go in the other direction: we take a large application and remove every component that is not needed to reproduce the problem. The sanity-check step typically works better for failures that are 100% reproducible. Such failures are usually caused by some incorrect configuration, which this step identifies quickly.

As an example of how this would be used, say the test that exercises SSL fails while all the other HTTP tests pass. The options for rectifying the problem would then be:
• Check that the DLL that provides the SSL services is present on the device. (This is typically not a problem for standard devices, and more of a problem for custom devices.)
• Check that the entire chain of CA certificates, up to the root CA of the server you are connecting to, is installed on the device.
• Check that the system clock shows the correct time.
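
To show why the clock and the certificate chain both matter for SSL, here is a minimal sketch using Python's standard `ssl` module on a desktop machine. It is not how the device itself validates certificates; the helper names `fetch_peer_cert` and `cert_is_current` and the sample certificate dates are assumptions made for the illustration.

```python
import socket
import ssl


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect with chain and hostname verification and return the server certificate.

    If the chain cannot be verified against the local trust store, the
    handshake raises an ssl.SSLError subclass, which already tells you
    the problem is in the certificate chain.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()


def cert_is_current(cert: dict, now: float) -> bool:
    """Check whether 'now' (seconds since the epoch) is inside the validity window."""
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_before <= now <= not_after


# Sample certificate fields in the format getpeercert() returns (made-up dates).
sample_cert = {
    "notBefore": "Jan  1 00:00:00 2020 GMT",
    "notAfter": "Jan  1 00:00:00 2021 GMT",
}
good_clock = ssl.cert_time_to_seconds("Jun  1 00:00:00 2020 GMT")
wrong_clock = ssl.cert_time_to_seconds("Jun  1 00:00:00 2022 GMT")

print(cert_is_current(sample_cert, good_clock))   # True
print(cert_is_current(sample_cert, wrong_clock))  # False: this clock is past the certificate's expiry
```

Against a real server you would call `cert_is_current(fetch_peer_cert(host), time.time())` after importing `time`; a perfectly valid certificate fails this check if the client clock is far enough off.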

Examples of checking the underlying components on which your program is built include verifying the following:
Does DNS work?
Does a simple socket application work?
Does a simple HTTP application using a GET request work?
Does a simple HTTP application using a POST request work?
Does the authentication subsystem work?
Does SSL work?
Am I going through a proxy and does that work?
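
The questions above can be turned into a small set of independent checks that are run in order from the lowest layer upward. The sketch below shows the idea for two of them (DNS and an HTTP GET) plus a tiny runner; the names `check_dns`, `check_http_get`, `run_checks`, and `first_failure` are hypothetical, and the runner is demonstrated with stand-in checks so that it does not depend on any real host.

```python
import socket
import urllib.request
from typing import Callable, List, Optional, Tuple

Result = Tuple[str, bool, str]  # (check name, passed?, error detail)


def check_dns(host: str) -> None:
    """Raise if the host name cannot be resolved."""
    socket.getaddrinfo(host, None)


def check_http_get(url: str, timeout: float = 5.0) -> None:
    """Raise if a plain GET fails (urlopen raises on 4xx/5xx responses)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read(1)


def run_checks(checks: List[Tuple[str, Callable[[], None]]]) -> List[Result]:
    """Run every check, recording success or the exception it raised."""
    results: List[Result] = []
    for name, check in checks:
        try:
            check()
            results.append((name, True, ""))
        except Exception as exc:
            results.append((name, False, f"{type(exc).__name__}: {exc}"))
    return results


def first_failure(results: List[Result]) -> Optional[str]:
    """Return the name of the lowest-level check that failed, if any."""
    for name, ok, _ in results:
        if not ok:
            return name
    return None


# Stand-in checks so the runner can be shown without network access.
def always_ok() -> None:
    pass


def simulated_failure() -> None:
    raise OSError("no route")


demo = run_checks([("dns", always_ok), ("socket", simulated_failure), ("http get", always_ok)])
for name, ok, detail in demo:
    print(f"{name}: {'ok' if ok else 'FAILED ' + detail}")
print("start debugging at:", first_failure(demo))
```

In real use the list would hold entries such as `("dns", lambda: check_dns("myserver"))`, where `myserver` is a placeholder for your own host. Ordering the checks by layer means the first failure points at the subsystem to investigate.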

4. Create a Repro Case
In order to debug any application, the very first step is to isolate the problem to the smallest portion of code that could be causing it. Networking problems are no different. Isolating the smallest piece of code that reproduces the problem might involve stripping your application down to the minimum code that still has the problem. In some cases this might not be possible, and you might feel that it is not worth the effort. But if the problem is not apparent, this might be the only way to figure out what the problem is.

5. Check the RFCs / MSDN Documentation / KB Articles / Whitepapers / Blogs / Search Online
These are useful in general, but they help most after you hit a problem and have narrowed down its scope using the information in the previous two sections. At that point, this step can really help you understand how to solve the problem if it’s a known issue. MSDN docs and RFCs might have some caveats hidden in the documentation for some of the APIs or protocols. If the documentation falls short of describing what *should* happen in the situation you have encountered, fall back to a frame of reference: see how a native application behaves on the same device, or how an application built on .NET running on a desktop machine behaves. This will help you identify the correct behavior, or at least verify that more than one piece of software behaves the same way.

6. Ask for Help

There are multiple avenues available for asking for help.
• If you have friends who are knowledgeable about the technology, it is probably easiest and quickest to ask them. Part of this is posting on the newsgroups for the technology area of interest. (For instance, the .NET Compact Framework’s newsgroup can be found at http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=33)
• As a last resort, contact the product support people in the organization.

When asking for help, make sure you include all or most of the information below. While developers experienced in the technology can spot known problems immediately, any additional information helps in debugging a problem. The output of network packet capture tools stands out as being of immediate benefit to a third party debugging your issue. Useful information includes the following:
• Network traces at the client, server and proxy.
• The set of experiments conducted and the information about whether each of them succeeded or failed.
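
Keeping the list of experiments in a structured form makes it easy to paste into a help request. The sketch below is just one possible way to do that; the `Experiment` class, the `format_report` function, and the sample entries are all invented for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experiment:
    description: str   # what was tried
    succeeded: bool    # whether it worked
    notes: str = ""    # error text or other observations


def format_report(experiments: List[Experiment]) -> str:
    """Render the experiments as a numbered plain-text list."""
    lines = []
    for i, exp in enumerate(experiments, start=1):
        status = "PASS" if exp.succeeded else "FAIL"
        line = f"{i}. [{status}] {exp.description}"
        if exp.notes:
            line += f" ({exp.notes})"
        lines.append(line)
    return "\n".join(lines)


# Made-up sample entries.
log = [
    Experiment("DNS lookup", True),
    Experiment("HTTPS GET", False, "certificate verify failed"),
]
print(format_report(log))
```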
