Recently, I have seen a bunch of questions asking: "IIS6 runs fine for X amount of time [where X varies from days to weeks] and then all of a sudden, it stops accepting all connections. If I restart/reboot the server, then it is fine again... until X amount of time later, when I have to repeat the same procedure."
Sometimes, the user even notices that HTTP Error logs in %SYSTEMROOT%\System32\LogFiles\HTTPERR\*.log mention Connections_Refused for all those requests, but what the heck does that mean?
Ok, ignoring the jokers in the back that are snickering "hmm, this is Microsoft software; what do you expect?" , this issue is actually very diagnosable and should be treatable, as I will shortly show...
Every four or five days the web site hoste on my machine will stop responding to HTTP requests on port 80.
IIS doesn't crash so I cannot get an IIS State log, just this:
Excerpt from the HTTPErr.Log:
(... working fine up until this ...)2005-09-02 18:45:12 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:46:07 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:46:27 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:46:42 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:47:02 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:48:22 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:48:57 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:49:02 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:49:42 - - - - - - - - - 7_Connections_Refused -2005-09-02 18:50:02 - - - - - - - - - 3_Connections_Refused -2005-09-02 18:50:37 - - - - - - - - - 6_Connections_Refused -2005-09-02 18:50:42 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:51:02 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:51:47 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:52:02 - - - - - - - - - 3_Connections_Refused -2005-09-02 18:52:27 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:53:17 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:53:22 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:54:02 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:54:22 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:54:32 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:55:22 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:56:02 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:56:42 - - - - - - - - - 1_Connections_Refused -2005-09-02 18:58:17 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:58:42 - - - - - - - - - 2_Connections_Refused -2005-09-02 18:58:57 - - - - - - - - - 1_Connections_Refused -
I have to reboot the machine to get it to work again.
Does anyone have any ideas?
Connections_Refused is actually a very diagnosable failure pattern, and I am going to show you how to diagnose it and what is going on.
According to KB 820729, Connections_Refused means that the "Kernel Non Paged Pool memory has dropped below 20MB and HTTP.SYS has stopped receiving new connections". What all this means in English is this:
Hopefully, it is now clear what Connections_Refused means. HTTP.SYS is basically telling you "someone is using up a lot of NPP memory, and for protective reasons, I am going to stop accepting requests". We need to figure out what driver is using up all the NPP memory and address it, and the Connections_Refused should naturally go away because NPP memory will not be under pressure.
One of the nice properties of NPP memory is that a unique tag is associated with every piece of this memory, so we can always track down who is using what and how much of NPP memory. The user-mode tool to do this is called POOLMON.EXE (if you have a Kernel Debugger attached, you can use the !poolused command to get this info within the Kernel Debugger). Here are some additional bits of related information:
I am now going to step through the POOLMON output of a user who had Connections_Refused and comment on what is going on. Here is a sample snippet of running POOLMON.EXE -b :
Memory: 1048016K Avail: 438396K PageFlts: 219 InRam Krnl: 3604K P:42372K Commit: 538908K Limit:11245092K Peak: 609576K Pool N:109056K P:43740K System pool information Tag Type Allocs Frees Diff Bytes Per Alloc ISil Nonp 1408366 ( 56) 1327427 ( 58) 80939 36678632 ( -872) 453 I100 Nonp 11048877 ( 217) 10967968 ( 219) 80909 14886928 ( -368) 183 PcNw Nonp 292427 ( 0) 88967 ( 0) 203460 12203296 ( 0) 59 SavE Nonp 1596066 ( 0) 1595659 ( 0) 407 11717016 ( 0) 28788 IAM Nonp 1946164 ( 49) 1904725 ( 50) 41439 8519576 ( -208) 205 tdLL Nonp 74748 ( 0) 69830 ( 4) 4918 2672832 ( -1776) 543 LSwi Nonp 1 ( 0) 0 ( 0) 1 2576384 ( 0) 2576384 R200 Nonp 22 ( 0) 1 ( 0) 21 2297816 ( 0) 109419 MmCm Nonp 302 ( 0) 48 ( 0) 254 2236960 ( 0) 8806 TCPt Nonp 190486 ( 4) 190462 ( 4) 24 1392264 ( 0) 58011 ULHP Nonp 3127 ( 0) 41 ( 0) 3086 1047216 ( 0) 339 PTrk Nonp 999750 ( 0) 996561 ( 0) 3189 956824 ( 0) 300 File Nonp 9628160 ( 65) 9623252 ( 65) 4908 750688 ( 0) 152 rg81 Nonp 6910 ( 0) 5294 ( 0) 1616 603864 ( 0) 373 Pool Nonp 6 ( 0) 3 ( 0) 3 602112 ( 0) 200704 Devi Nonp 680 ( 0) 262 ( 0) 418 558552 ( 0) 1336 Thre Nonp 417221 ( 6) 416457 ( 6) 764 476736 ( 0) 624 LSwr Nonp 128 ( 0) 0 ( 0) 128 416768 ( 0) 3256 Mm Nonp 26 ( 0) 2 ( 0) 24 379880 ( 0) 15828 AfdC Nonp 21989 ( 0) 19872 ( 1) 2117 338720 ( -160) 160 ....
Some observations:
Note: Don't shoot the messenger (HTTP.SYS)! Always strive to figure out the root of your problem and address that. The rest of the problems should naturally fall away.
//David