This is the story of a case which took me some time to diagnose, and which I considered worth sharing, not because it has the potential to become a common problem in many customer scenarios, but because of how interesting the troubleshooting steps turned out to be.

Problem Description:

Analysis Manager failed with the error “Unable to connect to the registry on the Server (MyOLAPServer) or you are not a member of OLAP Administrators group” when connecting to a clustered instance of Analysis Services, but only while the cluster resource was owned by one particular node. When the other node owned the resource, Analysis Manager connected to the clustered instance without problems.


Troubleshooting:

  1. Asked the customer to collect Time Travel Debugging traces (a tool developed by the Center for Software Excellence team at Microsoft) of the MMC.exe process hosting the Analysis Services add-in while she reproduced the error.
  2. After spending a considerable amount of time understanding how the client-side part of the OLAP code worked, I found that when the server was running on the faulty node, the client wasn’t receiving the Repository Connection String and RemoteRepositoryConnectionString properties, but some “garbage” instead.
  3. Then I went to the server side, had a look at the "HKLM\SOFTWARE\Microsoft\OLAP Server\Server Connection Info" registry key, and noticed that the values Repository Connection String and RemoteRepositoryConnectionString weren’t stored as plain text (REG_SZ), as in a test instance I had freshly installed locally, but as a binary stream (REG_BINARY) that, at first sight, appeared to be encrypted.
  4. With that, I went to the source code and found that SP4 had introduced a security improvement: it checks whether those two registry values are stored in plain text and, if so, encrypts them. A symmetric encryption key is generated dynamically (CryptGenKey) in the default cryptographic provider (CryptAcquireContext). A second, hardcoded symmetric key is imported into the provider (CryptImportKey) and then exported after being encrypted with the dynamically generated key (CryptExportKey). So, basically, we have a hardcoded key which we encrypt with a dynamically generated one. Using that key, we then encrypt the contents of the original registry values "Repository Connection String" and "RemoteRepositoryConnectionString" (CryptEncrypt) and save the resulting BLOB back to the Registry, this time as REG_BINARY data. Finally, we persist the dynamically generated, encrypted key we used to encrypt the two registry values in a binary value we will call SavedEncryptionKey, whose location I prefer not to reveal. :-)
  5. So, what could make this decryption fail on one of the nodes? I could think of only two possibilities: 1) SavedEncryptionKey differed between the two nodes, or 2) msmdsrv.exe was SP4 on one node but not on the other (before SP4, msmdsrv.exe had no notion of this encryption mechanism, and a later build might even have changed the hardcoded key).
  6. We compared the SavedEncryptionKey values on the two nodes and found them identical. This was somewhat expected, since the Registry key where it is stored is replicated across the possible owner nodes.
  7. We checked C:\Program Files\Microsoft Analysis Services\Bin on both nodes and found build 8.00.2039 of msmdsrv.exe on both. So, now what?
  8. Asked the customer to capture another Time Travel Debugging trace, this time of msmdsrv.exe while the client (MMC.exe) reproduced the error. While analyzing the trace, I noticed that the running msmdsrv.exe wasn’t build 2039 (SP4) but 194 (RTM), and that it had been loaded not from C:\Program Files\Microsoft Analysis Services\Bin but from E:\Program Files\Microsoft Analysis Services\Bin.
  9. With that, we went back to the faulty node and built a theory of what had actually happened on that system. And this is it…
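The chain of findings above can be reduced to a toy model of why the client saw garbage. This is a minimal sketch under loud assumptions, not the actual msmdsrv.exe code: a repeating-key XOR stands in for CryptoAPI, the two-key scheme (CryptGenKey, CryptImportKey, CryptExportKey) is collapsed into a single hypothetical HARDCODED_KEY, and the RTM behaviour (handing the REG_BINARY bytes on undecrypted) is inferred from the client receiving garbage rather than taken from the source. All helper names are hypothetical.

```python
from itertools import cycle
from typing import Tuple, Union

# Hypothetical stand-in for the key material baked into the SP4 binaries.
HARDCODED_KEY = b"hypothetical-fixed-key"


def toy_crypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: applying it twice with the same key restores the input.
    NOT secure and NOT CryptEncrypt; it only makes the sketch runnable."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def sp4_protect(value: str) -> Tuple[str, bytes]:
    """SP4 as described: a plain-text value is encrypted and stored as REG_BINARY."""
    return "REG_BINARY", toy_crypt(value.encode("utf-8"), HARDCODED_KEY)


def sp4_read(reg_type: str, data: Union[str, bytes]) -> str:
    """An SP4 server knows the scheme and decrypts REG_BINARY values."""
    if reg_type == "REG_BINARY":
        return toy_crypt(data, HARDCODED_KEY).decode("utf-8")
    return data


def rtm_read(reg_type: str, data: Union[str, bytes]) -> str:
    """Assumption: an RTM server, unaware of the scheme, passes the stored
    bytes on as if they were text, i.e. the "garbage" the client received."""
    if reg_type == "REG_BINARY":
        return data.decode("latin-1")  # every byte maps to some character
    return data


plain = "Provider=HypotheticalProvider;Data Source=HypotheticalRepository"
reg_type, blob = sp4_protect(plain)
print(sp4_read(reg_type, blob) == plain)   # True: the SP4 server works
print(rtm_read(reg_type, blob) == plain)   # False: the RTM server returns garbage
print(rtm_read("REG_SZ", plain) == plain)  # True: RTM was fine before SP4 encrypted the value
```

The last line is the reason the problem only surfaced after SP4: as long as the values stayed REG_SZ, an RTM server on either node could read them.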

Somebody installed Analysis Services on one node, pointing the Data Folder to the shared disk resource (E:\...) and putting the binaries on the local disk, in the default location (C:\Program Files\Microsoft Analysis Services\Bin). Then Analysis Services was installed on the secondary node with the Data Folder again pointing to the shared disk (E:), which is correct, BUT this time the location for the binaries was set to E:\Program Files\Microsoft Analysis Services\Bin. As a result, the value RootDir under HKLM\SOFTWARE\Microsoft\OLAP Server\CurrentVersion was inconsistent across the two nodes: the primary node pointed to the C:\ drive while the secondary pointed to E:\. But it was inconsistent only for a short period of time. Why? Once you bring online a clustered resource (Analysis Services) that you have configured to synchronize a certain registry key (HKLM\SOFTWARE\Microsoft\OLAP Server\Server Connection Info), the cluster actually keeps it synchronized across the nodes.


So, at failover time from the primary to the secondary node, replication overwrote the secondary's RootDir value E:\Program Files\Microsoft Analysis Services\Bin with C:\Program Files\Microsoft Analysis Services\Bin, as it was on the primary node.

If you don’t apply Service Pack 4, you don’t experience this issue. But when you apply SP4, setup reads the registry to find where it has to deploy the binaries (value RootDir under HKLM\SOFTWARE\Microsoft\OLAP Server\CurrentVersion), and that value is now wrongly set to C:\Program Files\Microsoft Analysis Services\Bin on both nodes. So, the first node is patched correctly, but the second ends up with a new directory containing the SP4 version of the binaries, which is not used by anyone, while its service keeps running the RTM binaries from E:\.
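The sequence of events can be replayed in a few lines. This is a hypothetical model, not a cluster simulator: each node's disks are a dictionary mapping a directory to the msmdsrv.exe build found there, replication is assumed to copy RootDir on failover, and the directory the service is launched from is assumed to live outside the replicated key. The RTM build 194 is written in the same 8.00.x form as the SP4 build. The final check captures the lesson of steps 7 and 8: compare the build at RootDir with the build that is actually running.

```python
from dataclasses import dataclass, field
from typing import Dict, List

C_BIN = r"C:\Program Files\Microsoft Analysis Services\Bin"
E_BIN = r"E:\Program Files\Microsoft Analysis Services\Bin"
RTM, SP4 = "8.00.194", "8.00.2039"


@dataclass
class Node:
    name: str
    root_dir: str    # RootDir under ...\OLAP Server\CurrentVersion
    launch_dir: str  # where msmdsrv.exe is really started from (assumed not replicated)
    builds: Dict[str, str] = field(default_factory=dict)  # directory -> msmdsrv.exe build


def failover(previous_owner: Node, new_owner: Node) -> None:
    """Assumption: registry replication copies RootDir from the previous owner."""
    new_owner.root_dir = previous_owner.root_dir


def apply_sp4(node: Node) -> None:
    """SP4 setup deploys its binaries wherever RootDir currently points."""
    node.builds[node.root_dir] = SP4


def running_build(node: Node) -> str:
    return node.builds.get(node.launch_dir, "missing")


def inconsistent_nodes(nodes: List[Node]) -> List[str]:
    """Nodes where the build at RootDir differs from the build actually running."""
    return [n.name for n in nodes if n.builds.get(n.root_dir) != running_build(n)]


primary = Node("primary", C_BIN, C_BIN, {C_BIN: RTM})
secondary = Node("secondary", E_BIN, E_BIN, {E_BIN: RTM})

failover(primary, secondary)  # the secondary's RootDir now says C:\...
for node in (primary, secondary):
    apply_sp4(node)

for node in (primary, secondary):
    print(node.name, "RootDir build:", node.builds.get(node.root_dir),
          "running:", running_build(node))
# primary RootDir build: 8.00.2039 running: 8.00.2039
# secondary RootDir build: 8.00.2039 running: 8.00.194
print(inconsistent_nodes([primary, secondary]))  # ['secondary']
```

Checking only the folder RootDir points to reports SP4 on both nodes, which is exactly what step 7 found; only the trace of the running process revealed the RTM binary.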

Solution:

  1. Uninstall everything from the secondary node, reinstall it properly (deploying the binaries to the same directory as on the primary node), and patch it to SP4, or…
  2. With the services offline, go through the registry, locating any entries that point to E:\Program Files\Microsoft Analysis Services\Bin and replacing them with C:\Program Files\Microsoft Analysis Services\Bin. After all registry entries have been changed, copy C:\Program Files\Microsoft Analysis Services\ on top of E:\Program Files\Microsoft Analysis Services\, and then move E:\Program Files\Microsoft Analysis Services\ on top of C:\Program Files\Microsoft Analysis Services\.
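The registry part of option 2 can be sketched as a search-and-replace over string values. To keep it portable, the registry is modelled as a plain dictionary; on a real node the data would have to come from Python's winreg module (EnumKey, EnumValue, SetValueEx), which this sketch deliberately does not call. The key and value names other than CurrentVersion\RootDir are hypothetical, and the function returns what it changed so the list can be reviewed before touching a production cluster.

```python
import re
from typing import Dict, List, Tuple, Union

RegistryModel = Dict[str, Dict[str, Union[str, int]]]

C_BIN = r"C:\Program Files\Microsoft Analysis Services\Bin"
E_BIN = r"E:\Program Files\Microsoft Analysis Services\Bin"


def retarget_paths(registry: RegistryModel, old: str, new: str) -> List[Tuple[str, str, str, str]]:
    """Replace every occurrence of `old` with `new` in string values.

    Windows paths are case-insensitive, so the match ignores case. A function
    is used as the replacement so the backslashes in `new` are not treated as
    regex escapes. Returns (key, value name, before, after) for each change."""
    pattern = re.compile(re.escape(old), re.IGNORECASE)
    changes = []
    for key, values in registry.items():
        for name, data in list(values.items()):
            if isinstance(data, str) and pattern.search(data):
                updated = pattern.sub(lambda _m: new, data)
                values[name] = updated
                changes.append((key, name, data, updated))
    return changes


# Illustrative snapshot of the secondary node after the failover.
registry: RegistryModel = {
    r"HKLM\SOFTWARE\Microsoft\OLAP Server\CurrentVersion": {
        "RootDir": C_BIN,  # already overwritten by replication
    },
    r"HKLM\SOFTWARE\Hypothetical\ServiceSettings": {
        "BinaryPath": E_BIN + r"\msmdsrv.exe",  # hypothetical entry
    },
    r"HKLM\SOFTWARE\Hypothetical\ComponentSettings": {
        "DllPath": E_BIN.upper() + r"\hypothetical.dll",  # hypothetical entry
        "SomeFlag": 1,  # non-string values are left alone
    },
}

changes = retarget_paths(registry, E_BIN, C_BIN)
for key, name, before, after in changes:
    print(f"{key}\\{name}: {before} -> {after}")
```

RootDir is untouched here because replication had already set it to C:\ on the secondary; the entries that still point to E:\ are the ones this step is meant to find.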

Of course, all this was done after having ruled out the known, documented causes, such as http://support.microsoft.com/kb/812601, http://support.microsoft.com/kb/231951, etc.


End of story!


Hope you enjoyed it.