Spat's WebLog (Steve Patrick)

When things go wrong...

MaxConcurrentAPI - or, how fast can you authenticate users?

MaxConcurrentAPI - or, how fast can you authenticate users?

  • Comments 8

 MaxConcurrentAPI.. what is it?

 

It is referenced a few places:

 

Netlogon reg entries

IAS Best Practices

How to configure your ISA Server for a very large number of authentication requests

 

 

Edit 11/21/2011 NOTE: Please see http://support.microsoft.com/kb/975363 for updated informaton

Background:
ISA Servers were very very slow when processing web requests only when domain authentication was turned on.

 

 

 

First, enable the netlogon log  ( can you tell from previous posts, that I really like his log? )  on the ISA server and examine the log during the time the ISA response was slow.

 

Second - parse this log for interesting data during the time of failure(s)

 

Y:\data>findstr /i NlAllocateClientApi   Server02Netlogon.log  >  AllocateAPI.txt

 

AllocateAPI.txt shows:

 

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

 

 

We see a repeated pattern here:

 

12/13 16:53:58 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:58 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User7161 from ClientMachine16151 Returns 0xC000005E

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User233  from ClientMachine16333 Entered

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\ User3433 from ClientMachine16123 Returns 0x0

 

 

12/13 16:53:58 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Entered

12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User546456 from ClientMachine453 Returns 0x0

12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User54642 from ClientMachine565  Entered

12/13 16:53:59 [CRITICAL] SomeCompany: NlAllocateClientApi timed out: 0 258

12/13 16:53:59 [CRITICAL] SomeCompany: NlpUserValidateHigher: Can't allocate Client API slot.

12/13 16:53:59 [LOGON] SamLogon: Network logon of SomeCompany\User23423 from ClientMachine435 Returns 0xC000005E

 

 

 

err.exe shows us ( http://www.microsoft.com/downloads/details.aspx?FamilyId=BE596899-7BB8-4208-B7FC-09E02A13696C&displaylang=en  )

 

F:\tools\err.exe\i386>err.exe 0xC000005E

# for hex 0xc000005e / decimal -1073741730 :

  STATUS_NO_LOGON_SERVERS                                   ntstatus.h

# There are currently no logon servers available to service

# the logon request.

# 1 matches found for "0xC000005E"

 

 

A review of this area in netlogon shows us we have a standing timeout period of 45000 ms   before we will post the first error ( and subsequent error for NlpUserValidateHigher  and the return error 0xC000005E)

 

The number of outstanding RPC calls to the DC is throttled via the MaxConcurrentApi setting.

We use a semaphore to manage this queue, so when the netlogon log shows us that the API timed out – it was waiting longer than 45 sec for a slot to open up, and finally gave up and quit.

 

They had already increased the setting for MaxConcurrentAPI per q326040, to 10.

The theory here, was that the DC simply was not responding fast enough to satisfy the queued requests,  even with this value increased.

 

 

We took a userdump of  LSASS on the ISA server and the DC which the ISA server held a secure channel to ( found via Nltest) .

This was important because we need to see what ISA is doing at this time, as well as what the DC was doing at this time.

 

 

The ISA LSA dump confirms that we are very busy with 283 threads in LSA - about 115 of them waiting for one of these slots to open.

 

What's the DC doing? It had 118 threads . 

 

Some are processing the logon requests which takes in the data from netlogon on the DC, sends the data to the NTLM SSP (msv1_0.dll) and eventually calls into  the Authentication directory .

 

But, most ( if not all ) of the busy threads were waiting.

 

So what are all the DC threads waiting on? A critical section which is needed when we allocate memory from the heap.  Normally this would not be an issue, but someone had  set gflags on LSASS on this machine.

 

 

0:037> !gflag

Current NtGlobalFlag contents: 0x00001040

    hpc - Enable heap parameter checking

    ust - Create user mode stack trace DB

 

UST  should not cause a huge perf hit, but HPC will.

 

Disabled  via running: "gflags -i lsass.exe -hpc -ust"

After disabling and rebooting the DC – things ran smoothly.

 

PS: The netlogon log is really really cool...

 

 

Spat

Leave a Comment
  • Please add 7 and 8 and type the answer here:
  • Post
  • One other angle on this is the DC is trying to contact trusted domains over slow links or the like.

    If there are a lot of trusts and the user managed to send their logon name as NULL\user

    - OR -

    There is a trusted domain and pass through has to really bad link your DC will take forever to time out.

    Look at the NEVERPING netlogon parameter that added a while back.
  • Excellent point Slight has - if there are trusted domains you basically follow the paths back - and in these cases MaxConcurrentApi applies to the trusting DC ( as opposed to a member server as in the case I posted about )

    The neverping refers to http://support.microsoft.com/?kbid=825107
  • In my last post I laid out the basic flow of NTLM authentication with SharePoint when all the accounts...
  • Is that value 4500ms or 45000ms? Is this value in the registry?
  • Matheesha

    Thanks! Sorry - its 45000 ( 45 seconds )
    And no its not configurable ( that I know of - I need to go back and review it to be 100% )

    PS
    I corrected the value in the post thanks to you
    spat
  • Kerberos is well… Kerberos. And NTLM is NTLM right? Right. Most of the time. This post is from a recent

  • Note - you can set the reg keys for maxconcurrent and then cycle the netlogon svc

  • In the truted domains scenario, how is possible to verify the effect of this Key? Perfomon "NDTS\NTLM Authentications" item could be useful?

    It could be nice to have a performance baseline before regkey optimization and after.

Page 1 of 1 (8 items)