If you’ve spent any time with SharePoint, you might have seen this error. It appears as a critical error in the Central Administration health status because, depending on your farm configuration and topology, it can be a service-killing error. This is a quick post about how to diagnose the root cause more effectively, plus some possible workarounds.
The impact is that service-application calls can start to fail, because the security token needed to run the operation can’t be generated. Search, for example, will fail without the Security Token Service (STS): the server running the query can’t get a security context for the search. Other service applications fail for similar reasons.
The message seen in Central Administration (CA) amounts to “we pinged the service and got something back other than HTTP 200”. That’s helpful for telling us something is wrong, but not for telling us what is wrong.
To find out more, the quickest way is to go to each server in turn, open the Security Token Service (STS) in IIS and browse it.
If the STS service is healthy you should see this:
If STS is not healthy, you should see an ASP.NET exception instead, like the one below.
If you get a generic error instead, you might need to disable custom errors to see the full exception.
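If you’d rather script the per-server check than click through IIS on every box, here is a minimal sketch in Python that does roughly what the health rule does: request the STS page and treat anything other than HTTP 200 as unhealthy. The URL is an assumption on my part (the usual SharePoint Web Services port and path), so confirm it in IIS on your own farm. The helper names `classify` and `probe` are made up for this example, and the sketch makes no attempt at Windows authentication, so a 401 only tells you the endpoint answered.

```python
import urllib.error
import urllib.request

# ASSUMPTION: typical STS address when run locally on a SharePoint server.
# Check the real binding and path in IIS before relying on it.
STS_URL = "http://localhost:32843/SecurityTokenServiceApplication/securitytoken.svc"


def classify(status):
    """Same idea as the health rule: HTTP 200 is healthy, anything else is not."""
    return "healthy" if status == 200 else "unhealthy"


def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Return (status, verdict); status is None when no HTTP response came back.

    `opener` is a hypothetical hook so the function can be exercised
    without a real server.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a success code (e.g. 500 from ASP.NET).
        status = exc.code
    except (urllib.error.URLError, OSError):
        # Nothing answered at all: site stopped, wrong port, firewall, etc.
        return None, "unreachable"
    return status, classify(status)
```

Run on each server, `probe(STS_URL)` returns a `(status, verdict)` pair. It won’t show you the exception text, so you’ll still want to browse the page on any server that comes back unhealthy.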
Fixing it is easier said than done, and the resolution depends a lot on the root cause. For example, if your farm has any transient farm trusts for consuming service applications from other SharePoint farms, make sure the other farm is visible to the failing server, as an unreachable farm can be the cause.
It’s also possible that the STS or the local environment is simply misconfigured, which a psconfig command can often fix: “psconfig -cmd upgrade -inplace b2b”.
That command would fix the example ASP.NET error above. No upgrade is actually performed if none is needed, but it should sort out any IIS configuration that is required. Once it’s done, try opening the STS page again; if it was a simple misconfiguration issue, this should have sorted it out.
Failing that, it’s very difficult to say what the issue could be; you’ll probably want to get in touch with our support teams for help. It’s worth checking whether it’s just one server or all of them, and whether the behaviour is the same on a new configuration database, but either way we’re in “too complex for a blog post” territory if none of the above worked.
Hopefully this helped someone anyway.
// Sam Betts