Microsoft Azure Cloud Integration Engineering

(Compute, Cache, Storage, ACS, Service Bus, WebSites, VMs, SQL Azure, Data Sync, Import Export)

Cache retry fails... what next?



When using In-Role Cache or the Cache Service, applications may receive retryable errors such as the following:

ErrorCode<ERRCA0017>:SubStatus<ES0002>:There is a temporary failure. Please retry later. (The request did not find the primary.). Additional Information : The client was trying to communicate with the server: net.tcp://<IP>:20004/. ---> Microsoft.ApplicationServer.Caching.DataCacheException:  ……………

.

.

ErrorCode<ERRCA0016>:SubStatus<ES0001>:The connection was terminated, possibly due to server or network problems or serialized Object size is greater than MaxBufferSize on server. Result of the request is unknown.. Additional Information : The client was trying to communicate with the server: net.tcp://<IP>:20003 ………….
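When deciding whether to refresh, or simply when logging, it can help to pull the error code and substatus out of messages like the two above. The following is a minimal sketch that assumes only the message format shown here. The class and method names are hypothetical, and matching on message text is more fragile than using any structured properties your version of the cache client exposes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the "ErrorCode<...>" and "SubStatus<...>"
// tokens from a cache error message in the format quoted in this post.
public static class CacheErrorText
{
    private static readonly Regex Pattern =
        new Regex(@"ErrorCode<(?<code>[^>]+)>:SubStatus<(?<sub>[^>]+)>");

    // Returns true and fills code/subStatus when the message matches the format.
    public static bool TryParse(string message, out string code, out string subStatus)
    {
        code = null;
        subStatus = null;
        if (string.IsNullOrEmpty(message)) return false;

        Match m = Pattern.Match(message);
        if (!m.Success) return false;

        code = m.Groups["code"].Value;
        subStatus = m.Groups["sub"].Value;
        return true;
    }

    // Treats only the two codes quoted above as retryable (an assumption;
    // adjust the list to the errors you actually observe).
    public static bool IsRetryable(string message)
    {
        string code, sub;
        return TryParse(message, out code, out sub)
            && (code == "ERRCA0016" || code == "ERRCA0017");
    }
}
```

Treating ERRCA0016 as retryable deserves care: its message says the result of the request is unknown, so blindly repeating a non-idempotent write could apply it twice.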

In general, these errors have one of two causes. With High Availability enabled, the underlying cache service may be rebalancing its partitions: a secondary node is being promoted to primary while the client is still sending requests to the old primary. Alternatively, the cache service may have been moved to a different VM as part of the service healing process, while the cache client still holds the old IP address of the cache service VM.

Although it is good practice to have a retry policy in place, in extreme cases where retrying does not help you can use the approach below in your application to mitigate the errors by refreshing the cache client when an exception is thrown.

Sample code:

Application Code

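try
{
    DataCacheHelper.DataCache.Get("key");
}
catch (DataCacheException)
{
    // Discard the cached client objects; the next access through
    // DataCacheHelper creates a new factory and cache.
    DataCacheHelper.Refresh();
}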

DataCacheHelper.cs

using System;
using System.Reflection;
using Microsoft.ApplicationServer.Caching;

namespace DataCacheHelpers
{
    // Holds one shared DataCacheFactory/DataCache pair and lets the application
    // discard and rebuild them after persistent cache errors.
    public static class DataCacheHelper
    {
        // Guards lazy creation and Refresh so concurrent requests do not
        // create duplicate factories or use one that is being disposed.
        private static readonly object _sync = new object();
        private static DataCacheFactory _factory;
        private static DataCache _cache;

        public static DataCacheFactory DataCacheFactory
        {
            get
            {
                lock (_sync)
                {
                    if (_factory == null)
                    {
                        _factory = new DataCacheFactory();
                    }
                    return _factory;
                }
            }
        }

        public static DataCache DataCache
        {
            get
            {
                lock (_sync)
                {
                    if (_cache == null)
                    {
                        _cache = DataCacheFactory.GetDefaultCache();
                    }
                    return _cache;
                }
            }
        }

        public static void Refresh()
        {
            lock (_sync)
            {
                if (_factory != null)
                {
                    _factory.Dispose();
                    _factory = null;
                }
                _cache = null;

                // Clear DataCacheFactory._connectionPool by replacing the private static
                // field with a fresh pool, so existing connections are not reused.
                // This relies on internal names and may break with other client versions.
                var coreAssembly = typeof(DataCacheItem).Assembly;
                var simpleSendReceiveModulePoolType = coreAssembly.GetType(
                    "Microsoft.ApplicationServer.Caching.SimpleSendReceiveModulePool",
                    throwOnError: true);
                var connectionPoolField = typeof(DataCacheFactory).GetField(
                    "_connectionPool", BindingFlags.Static | BindingFlags.NonPublic);
                connectionPoolField.SetValue(null, Activator.CreateInstance(simpleSendReceiveModulePoolType));

                // Clear DistributedCacheSessionStateStoreProvider._staticInternalProvider.
                // This targets the cache-based session state provider and needs a
                // reference to the Microsoft.Web.DistributedCache assembly.
                var providerType = typeof(Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider);
                var providerField = providerType.GetField(
                    "_staticInternalProvider", BindingFlags.Static | BindingFlags.NonPublic);
                providerField.SetValue(null, null);
            }
        }
    }
}
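The sample above refreshes the client but does not repeat the failed call. As one possible way to combine it with a retry policy, here is a minimal sketch (the CacheRetry class and its parameters are hypothetical, not part of any Azure SDK) that retries transient failures with exponential backoff and falls back to a refresh only when the retries have been exhausted:

```csharp
using System;
using System.Threading;

// Hypothetical helper: bounded retry with exponential backoff, then one
// refresh of the client and a single final attempt.
public static class CacheRetry
{
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        Action refresh,
        int maxAttempts,
        TimeSpan initialDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan delay = initialDelay;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                // Non-transient errors are rethrown immediately.
                if (!isTransient(ex)) throw;

                // Wait before the next attempt, doubling the delay each time.
                if (attempt < maxAttempts)
                {
                    Thread.Sleep(delay);
                    delay = TimeSpan.FromTicks(delay.Ticks * 2);
                }
            }
        }

        // Retrying did not help: rebuild the client and try once more.
        // Any exception from this last attempt propagates to the caller.
        refresh();
        return operation();
    }
}
```

For example, CacheRetry.Execute(() => DataCacheHelper.DataCache.Get("key"), ex => ex is DataCacheException, DataCacheHelper.Refresh, 3, TimeSpan.FromMilliseconds(200)) would make three attempts, pausing 200 ms and then 400 ms between them, before refreshing the client and trying one last time.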
Comments
  • I'm using the azure cache for handling session state.  Does this work for session state as well?  I'm not using a DataCacheFactory at all.  Thanks!

    I've been getting these errors as well, and the only thing that fixes it is restarting the app.

    Would it be easier just to restart the role?  RoleEnvironment.RequestRecycle()

  • Tom Wilson's unanswered question is almost exactly mine.  I am also using the cache only for session state, and no factory is involved.  I wonder whether I should be using the code only after the comment "Clear DistributedCacheSessionStateStoreProvider._staticInternalProvider".  This is difficult to test though, and some further clarification would be welcome.

  • I would use an Application_Error handler in Global.asax for the session state refresh. That is where you'll catch the DataCacheException. My customers have been using that successfully. I would also check whether it is specifically the ERRCA0017 error. This also applies to substatus 6. Microsoft knows about this issue and there is an internal bug on this. Hopefully, we'll see a fix in future NuGet packages.

    thanks!

    mike

  • This is what I put in my Application_Error in global.asax.  I couldn't catch the error when updating the session.  Does this look right?

    protected void Application_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();
        // Log the exception and notify system operators
        if (ex.GetType() == typeof(DataCacheException))
        {
            DataCacheHelper.Refresh();
        }
        else
        {
            LoggingUtility.TraceError(ex);
        }
    }
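One detail worth checking in a handler like this: ASP.NET can pass Application_Error a wrapper exception, such as HttpUnhandledException, with the original error in InnerException, and a GetType() == typeof(...) comparison also ignores derived types. A minimal sketch of a hypothetical helper that searches the whole InnerException chain instead:

```csharp
using System;

// Hypothetical helper: returns the first exception of type T found in the
// InnerException chain (starting with ex itself), or null if there is none.
public static class ExceptionChain
{
    public static T Find<T>(Exception ex) where T : Exception
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            // "as" also matches types derived from T.
            T match = current as T;
            if (match != null) return match;
        }
        return null;
    }
}
```

In the handler above, the condition would then become ExceptionChain.Find<DataCacheException>(ex) != null.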

  • This code assumes only a single DataCacheFactory will exist; however, we have a scenario where many caches may be accessed from within a single process and therefore use multiple instances of DataCacheFactory, which are created in code and have separate lifetime management.

    I notice that the Refresh logic here targets a static field for the connection pool.  I presume therefore that prior to this code being used, ALL active factories should be disposed and then re-initialised following the connection pool reset?

    Also, I assume that the code after the session state comment can safely be ignored when not using cache for session state?
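The thread does not confirm this presumption. If it does hold, one way to manage many independently created factories is to register each one in a single place, so that all of them can be disposed before the shared static state is reset. A minimal sketch with a hypothetical DisposableRegistry class; the reset action would be the reflection-based part of Refresh above:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: tracks every instance it creates so that all of
// them can be disposed together before shared static state is reset.
public sealed class DisposableRegistry<T> where T : class, IDisposable
{
    private readonly object _sync = new object();
    private readonly List<T> _items = new List<T>();
    private readonly Func<T> _create;

    public DisposableRegistry(Func<T> create)
    {
        _create = create;
    }

    // Creates and registers under the lock, so no instance can slip in
    // while DisposeAllAndReset is running.
    public T Create()
    {
        lock (_sync)
        {
            T item = _create();
            _items.Add(item);
            return item;
        }
    }

    // Disposes every tracked instance, then runs the reset action.
    public void DisposeAllAndReset(Action reset)
    {
        lock (_sync)
        {
            foreach (T item in _items) item.Dispose();
            _items.Clear();
            reset();
        }
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }
}
```

Callers would then ask the registry for new instances after DisposeAllAndReset returns, rather than keeping references to the disposed ones.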
