
Tripping over Missing Servers

A common complaint is that the first call on a client object takes a disproportionately long time, usually ten seconds or more, while successive calls are instantaneous. There are many reasons why this might happen, so there's no generic resolution for this problem. Sometimes it is caused by a truly legitimate need to do a great deal more work than normal. For example, if the service you're talking with has been shut down, hibernated, and put away, restoring things to sufficient operation for processing your request may actually take a noticeable amount of time.

On the other hand, this is sometimes caused by indiscernible factors that vary from machine to machine. Frequently, these hard-to-diagnose slowdowns are caused by a characteristic of distributed systems that Leslie Lamport noted a long time ago:

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

Several cases spring to mind, arising from differing machine configurations and sometimes just the vagaries of different pieces of software interacting. Three common ones go like this:

  1. If your primary name server is down or responding very slowly, almost every connection attempt will end up failing or taking much longer than expected while waiting for a lookup response.
  2. If you move a hibernating laptop between a corporate domain and a home network, it will from time to time try to chat with domain controllers that are not in your living room.
  3. We use the default proxy settings for your web browser to make HTTP requests unless you tell us otherwise. This is probably what you want unless your web browser is configured to sit there trying to automatically detect the proxy server that you don't actually have.
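
The first and third cases can be checked from code. What follows is a minimal sketch, not a definitive diagnostic: it times a name lookup with `Dns.GetHostEntry` and creates an HTTP binding that skips the browser's default proxy through the binding's `UseDefaultWebProxy` property. The class name `FirstCallDiagnostics`, its method names, and the host name `localhost` are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;
using System.ServiceModel;

// Hypothetical helper class; the names are illustrative only.
public static class FirstCallDiagnostics
{
    // Measures how long a DNS lookup takes, whether it succeeds or fails.
    // A slow or unreachable name server shows up as a long elapsed time.
    public static TimeSpan TimeLookup(string hostName)
    {
        Stopwatch watch = Stopwatch.StartNew();
        try
        {
            Dns.GetHostEntry(hostName);
        }
        catch (SocketException)
        {
            // The lookup failed; the elapsed time is still what the client waited.
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Creates an HTTP binding that does not use the default (browser) proxy settings.
    public static BasicHttpBinding CreateBindingWithoutDefaultProxy()
    {
        BasicHttpBinding binding = new BasicHttpBinding();
        binding.UseDefaultWebProxy = false;
        return binding;
    }

    public static void Main()
    {
        // "localhost" is a placeholder; substitute the host your client connects to.
        Console.WriteLine("Lookup took {0}", TimeLookup("localhost"));

        BasicHttpBinding binding = CreateBindingWithoutDefaultProxy();
        Console.WriteLine("UseDefaultWebProxy = {0}", binding.UseDefaultWebProxy);
    }
}
```

If the lookup alone accounts for most of the first-call delay, the name server is the place to look; if the delay disappears with the proxy-free binding, proxy detection is the likelier culprit.
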
Published Friday, May 22, 2009 5:00 AM by Nicholas Allen

Comments

Friday, May 22, 2009 2:16 PM by Tobias

# re: Tripping over Missing Servers

The problem with servers can be solved with a central UDDI and dynamic WCF clients. After some tweaking we got a reliable platform. A bunch of "unreliable" services grouped by a UDDI can make a pretty reliable service. See http://www.codeproject.com/KB/WCF/uddiservicefactory.aspx for code and details.

This "everything is static" approach of the standard WCF implementation is its biggest conceptual weakness and, imo, not suited to serious enterprise apps.

As for the "first call is slow" problem: before the first call, the client is in the "Created" state. The first call establishes the connection, which can involve several round trips between service and client (for example, to set up encryption), and so can be very time consuming.

This can be partially solved by calling the proxy's Open() method before the first method call.
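
A rough sketch of opening the channel explicitly, so that connection setup is paid for before the first real call. The `IEcho` contract, the `EchoService` class, and the named-pipe address are hypothetical, and the service is hosted in the same process only to keep the example self-contained (this assumes Windows and the .NET Framework WCF assemblies).

```csharp
using System;
using System.Diagnostics;
using System.ServiceModel;

// Hypothetical contract and service used only for this sketch.
[ServiceContract]
public interface IEcho
{
    [OperationContract]
    string Echo(string text);
}

public class EchoService : IEcho
{
    public string Echo(string text) { return text; }
}

public static class Program
{
    // Hosts the service in-process, opens the client channel explicitly,
    // then makes the first call and returns the reply.
    public static string RunDemo()
    {
        Uri address = new Uri("net.pipe://localhost/first-call-demo");
        NetNamedPipeBinding binding = new NetNamedPipeBinding();

        using (ServiceHost host = new ServiceHost(typeof(EchoService)))
        {
            host.AddServiceEndpoint(typeof(IEcho), binding, address);
            host.Open();

            ChannelFactory<IEcho> factory =
                new ChannelFactory<IEcho>(binding, new EndpointAddress(address));
            IEcho proxy = factory.CreateChannel();
            IClientChannel channel = (IClientChannel)proxy;

            // The channel starts out in the Created state.
            Console.WriteLine("Before Open: {0}", channel.State);

            // Pay the connection cost here instead of inside the first call.
            Stopwatch openTimer = Stopwatch.StartNew();
            channel.Open();
            openTimer.Stop();
            Console.WriteLine("After Open: {0} ({1} ms)",
                channel.State, openTimer.ElapsedMilliseconds);

            Stopwatch callTimer = Stopwatch.StartNew();
            string reply = proxy.Echo("hello");
            callTimer.Stop();
            Console.WriteLine("First call took {0} ms", callTimer.ElapsedMilliseconds);

            channel.Close();
            factory.Close();
            return reply;
        }
    }

    public static void Main()
    {
        RunDemo();
    }
}
```

With a generated proxy class derived from `ClientBase<T>`, the equivalent step is calling `Open()` on the client object itself. How much of the delay moves out of the first call depends on the binding and its security settings.
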

But be careful. There is a nasty caching bug waiting. Check http://social.msdn.microsoft.com/Forums/en-US/wcf/thread/49c6683d-043b-4358-ae3c-3f75bfb34cb0 for a workaround.

Friday, May 22, 2009 5:13 PM by Tobias Manthey

# re: Tripping over Missing Servers

I would go even one step further...

If, for you, a distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable, you have a major design flaw in your distributed system.

Distributed systems are by definition systems in which you do not control every node, and hence redundancy and failover mechanisms are essential components.

But then a distributed system can be a source of impressive quality of service. Suppose you have a crappy service with an availability of 90%. Running four instances of it behind a failover mechanism reduces the probability of a total service failure to 0.1^4 = 0.0001 (assuming the instances fail independently), which raises the service availability to 99.99%. This makes the difference between crap and high availability.
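
Written out, the arithmetic looks like this. It is a sketch under two assumptions that the comment does not state: the replicas fail independently of one another, and the failover mechanism itself never fails. The symbols a (availability of one replica), n (number of replicas), and A_n (combined availability) are introduced here for illustration.

```latex
% Probability that all n independent replicas are down at once,
% and the resulting combined availability.
\[
P(\text{all } n \text{ replicas down}) = (1 - a)^n
\qquad\Longrightarrow\qquad
A_n = 1 - (1 - a)^n
\]

% The example from the text: a = 0.9, n = 4.
\[
A_4 = 1 - (1 - 0.9)^4 = 1 - 0.1^4 = 1 - 0.0001 = 0.9999
\]
```

If failures are correlated, for instance because all replicas share a name server or a network link, the real availability will be lower than this figure.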

But regarding the proxy I definitely agree. Especially in enterprise environments, using a proxy should be very common. Proxies and WCF do not go very well together, and by default all .NET HTTP requests use your Internet Explorer settings. Not only do proxies slow things down significantly, but errors of the "1 out of 100 service calls fails" kind can also be caused by proxies.

Either turn off the proxy usage in code by setting

WebRequest.DefaultWebProxy = null;

or switch it off in configuration...

<system.net>
  <defaultProxy useDefaultCredentials="True">
     <proxy usesystemdefault="False" autoDetect="False"/>
  </defaultProxy>
</system.net>

Use Fiddler on your client computer to confirm that you were successful.

Saturday, May 23, 2009 7:56 AM by Udi Dahan

# re: Tripping over Missing Servers

If you were doing queued messaging between your services, these technical problems would just disappear.

Of course, you'd have to rethink your service contracts to move to a more one-way model (possibly with callback contracts), but that's actually a good thing - technical decoupling leading to greater logical decoupling.

NServiceBus (http://www.nservicebus.com) is a messaging framework based on these two principles, technical and logical decoupling. It is designed to make building robust and scalable distributed systems easier by preventing you from making decisions that can get you in trouble later.

Tuesday, May 26, 2009 4:43 PM by Nicholas Allen

# re: Tripping over Missing Servers

Hi Tobias,

Replication is indeed a common way of improving reliability beyond that of the individual components in the system.  There are some restrictions to this approach (for example, replicated systems frequently need to trade off consistency of write visibility to achieve adequate performance) but there's a very large number of applications for which these restrictions are acceptable.  A client might still have single points of failure even in a replicated system though, including some of the examples in the article.  DNS is a highly replicated distributed system but a client might not have a mechanism for retrying after the name servers it has been configured with fail to respond.

Tuesday, May 26, 2009 4:53 PM by Nicholas Allen

# re: Tripping over Missing Servers

Hi Udi,

I agree with you that more people should be thinking about queued topologies that provide decoupled messaging than do so today.  I avoid calling queues a generic resolution to the problem though because replacing connected messaging with queued messaging is a semantic change to the application.  Adding a queue to the system can change the boundaries for transactions and acknowledgments, as well as some of the properties of message delivery sessions.  Application changes or additional protocols may be needed, which you might not always have the design freedom to introduce.  That's why more people need to think about these problems up front because it is often hard to switch to a better approach after the problem is seen in deployment.
