MaxConcurrentAPI.. what is it?
It is referenced a few places:
Netlogon reg entries
IAS Best Practices
How to configure your ISA Server for a very large number of authentication requests
Edit 11/21/2011 NOTE: Please see http://support.microsoft.com/kb/975363 for updated informaton
Background:ISA Servers were very very slow when processing web requests only when domain authentication was turned on.
First, enable the netlogon log ( can you tell from previous posts, that I really like his log? ) on the ISA server and examine the log during the time the ISA response was slow.
Second - parse this log for interesting data during the time of failure(s)
Y:\data>findstr /i NlAllocateClientApi Server02Netlogon.log > AllocateAPI.txt
12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258
We see a repeated pattern here:
12/13 16:53:58 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User7161 from ClientMachine16151 Returns 0xC000005E
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User233 from ClientMachine16333 Entered
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User3433 from ClientMachine16123 Returns 0x0
12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Entered
12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User546456 from ClientMachine453 Returns 0x0
12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User54642 from ClientMachine565 Entered
12/13 16:53:59 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.
12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Returns 0xC000005E
err.exe shows us ( http://www.microsoft.com/downloads/details.aspx?FamilyId=BE596899-7BB8-4208-B7FC-09E02A13696C&displaylang=en )
# for hex 0xc000005e / decimal -1073741730 :
# There are currently no logon servers available to service
# the logon request.
# 1 matches found for "0xC000005E"
A review of this area in netlogon shows us we have a standing timeout period of 45000 ms before we will post the first error ( and subsequent error for NlpUserValidateHigher and the return error 0xC000005E)
The number of outstanding RPC calls to the DC is throttled via the MaxConcurrentApi setting.
We use a semaphore to manage this queue, so when the netlogon log shows us that the API timed out – it was waiting longer than 45 sec for a slot to open up, and finally gave up and quit.
They had already increased the setting for MaxConcurrentAPI per q326040, to 10.
The theory here, was that the DC simply was not responding fast enough to satisfy the queued requests, even with this value increased.
We took a userdump of LSASS on the ISA server and the DC which the ISA server held a secure channel to ( found via Nltest) .
This was important because we need to see what ISA is doing at this time, as well as what the DC was doing at this time.
The ISA LSA dump confirms that we are very busy with 283 threads in LSA - about 115 of them waiting for one of these slots to open.
What's the DC doing? It had 118 threads .
Some are processing the logon requests which takes in the data from netlogon on the DC, sends the data to the NTLM SSP (msv1_0.dll) and eventually calls into the Authentication directory .
But, most ( if not all ) of the busy threads were waiting.
So what are all the DC threads waiting on? A critical section which is needed when we allocate memory from the heap. Normally this would not be an issue, but someone had set gflags on LSASS on this machine.
Current NtGlobalFlag contents: 0x00001040
hpc - Enable heap parameter checking
ust - Create user mode stack trace DB
UST should not cause a huge perf hit, but HPC will.
Disabled via running: "gflags -i lsass.exe -hpc -ust"
After disabling and rebooting the DC – things ran smoothly.
PS: The netlogon log is really really cool...