MSDN Blogs

Unfortunately it is not possible to update or patch SharePoint without incurring some amount of downtime, so the only option available to us is to minimize that downtime.

I think of downtime in two ways: 'not available' and 'reduced functionality'. The second is obviously more closely aligned with zero downtime; however, for a large farm it is logistically difficult to achieve. Reduced functionality means providing your users with a read-only farm while the primary farm is being patched. This requires swinging all traffic from the primary farm to another farm whose content DBs are set to read-only mode. Once patching on the primary farm is complete, traffic is redirected back to the primary farm.

Since we always want to keep our downtime window as small as possible, we should follow a good, well-tested upgrade practice to ensure there are no surprises along the way that will extend our downtime. There are two phases of upgrade: laying down the updated binaries and running psconfig. The second phase, running psconfig, takes the majority of the time, and the time taken is directly proportional to the number of site collections within the farm. Psconfig upgrades each content DB schema, and in many installations this can take many hours to complete. We have found in our testing, in many real-world deployments, that detaching all content DBs other than the CA content DB, running psconfig, and then attaching the content DBs back can help reduce the amount of downtime needed when patching SharePoint. One myth needs to be dispelled now: the reduction in downtime does not come from psconfig running faster or getting better throughput when upgrading a Content DB.

As with most things in SharePoint there are rules around the DB Attach process:
1. Only one Content DB can be attached to a farm at any one time.
2. Once a Content DB has been attached to a farm all of its content is marked as updated and therefore will incur what is effectively a full crawl the first time search crawls this DB. More on this later.

So, taking rule #1 into account, there are two ways we can optimize this process to reduce downtime: 1) Prioritize: attach the content DBs that belong to Web Applications that are highly sensitive to downtime first and make them available; and 2) Use surrogates: build out additional worker farms that act as surrogates to host attaching content DBs.

The process of using surrogate farms involves building one or more single-server, throwaway farms that run the same patch level you are upgrading your primary farm to. This process has been well documented here. This approach has downsides, however: the need for additional hardware, the additional time and effort to build out these farms, and the need for your SQL server to handle the additional load.

 Let's take a look at the steps of a typical upgrade with DB prioritization:

0. Announce and coordinate downtime with IT, users, etc.
1. Take the farm offline. Typically you pull the WFEs out of the load balancer or, in the reduced functionality case, swing DNS settings to a read-only farm replica.
2. Detach all content DBs from the farm.
3. Run the WSS and, if applicable, MOSS upgrade patches on each server, choosing not to run psconfig. You are only going to run psconfig once, not once for each upgrade package.
4. After all servers have been patched, run psconfig on the CA machine. Psconfig will not take nearly as long because all the content DBs are detached.
5. Run psconfig on each additional machine in the farm, one at a time.
6. At this point your farm, without any content DBs, is upgraded and only the content DBs require upgrade.
7. Starting with the web application that is most critical to get back into production, attach its content DB(s). Once complete, put this Web Application back in the load balancer and notify everyone it is back online.
8. Continue running through each additional Content DB until each is attached back into the farm.
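
Steps 7 and 8 can be sketched as a small script that orders the content DBs by priority and prints one attach command per DB. This is only an illustration: the `ContentDb` type, the priority numbers, the database names, the URLs and the `SQL01` server name are all made up for the example, and it assumes the attach is done with the `stsadm -o addcontentdb` operation. It prints the commands rather than running them.

```python
from dataclasses import dataclass


@dataclass
class ContentDb:
    name: str         # content database name (hypothetical values below)
    web_app_url: str  # URL of the Web Application that owns the DB
    priority: int     # lower number = attach sooner (made-up ranking)


def attach_commands(dbs, sql_server):
    """Return one stsadm attach command per DB, most critical DB first."""
    # Sort by priority, then by name so ties come out in a stable order.
    ordered = sorted(dbs, key=lambda db: (db.priority, db.name))
    return [
        f"stsadm -o addcontentdb -url {db.web_app_url} "
        f"-databasename {db.name} -databaseserver {sql_server}"
        for db in ordered
    ]


if __name__ == "__main__":
    dbs = [
        ContentDb("WSS_Content_Archive", "http://archive", 3),
        ContentDb("WSS_Content_Portal", "http://portal", 1),
        ContentDb("WSS_Content_Teams", "http://teams", 2),
    ]
    # Only one content DB is attached at a time, so the commands
    # are meant to be run one after another in this order.
    for cmd in attach_commands(dbs, "SQL01"):
        print(cmd)
```

Because rule #1 limits us to one attach at a time, the order of this list is effectively the order in which Web Applications come back online.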

I have a tool that I will be releasing soon named CDBManager that will help with the DB prioritization method of upgrade, specifically it allows you to:

  • Mass DB detach all the content DBs in a farm
  • Reorder content DBs by priority
  • Automate the attaching of content DBs and get an ETA of when each DB will be complete (important because there is no progress indicator otherwise)
  • Manage the upgrade.log file by creating a separate upgrade.log file for each content DB attached. As you may know, each time a Content DB is attached to a farm and upgraded, a new upgrade log is created or, if one already exists, it is appended to. The problem with this approach is that all your upgrade logging ends up in a single file and is not split out by Content DB. After each Content DB has finished attaching, CDBManager renames the upgrade.log file with the name of that Content DB. This makes it much easier to go back through each log and analyze what might have gone wrong within a certain Content DB on upgrade.
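
The log-splitting idea in the last bullet can be sketched in a few lines. This is not CDBManager's code, just a minimal illustration of the renaming step; the function name, the `upgrade_<db>.log` naming pattern and the directory layout are assumptions made for the example.

```python
import os
import tempfile


def archive_upgrade_log(log_dir, db_name):
    """Rename upgrade.log to upgrade_<db_name>.log; return the new path.

    Returns None when there is no upgrade.log to rename. The target
    naming pattern is a made-up convention for this sketch.
    """
    src = os.path.join(log_dir, "upgrade.log")
    if not os.path.exists(src):
        return None
    dst = os.path.join(log_dir, f"upgrade_{db_name}.log")
    # os.replace overwrites an existing file with the same name.
    os.replace(src, dst)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        # Stand-in for the log written while one content DB attaches.
        with open(os.path.join(log_dir, "upgrade.log"), "w") as f:
            f.write("upgrade output for one content DB\n")
        print(archive_upgrade_log(log_dir, "WSS_Content_Portal"))
```

Calling this after each attach completes means the next attach starts with a fresh upgrade.log, so each file covers exactly one Content DB.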

I have a couple of large enterprise customers that are testing the tool now. Once we get past any breaking issues I will publish the bits.

One additional point about the prioritization upgrade approach: should a content DB fail to attach along the way, for whatever reason, you should continue to DB Attach the remaining content DBs. While the upgrade progresses you have the opportunity to investigate and mitigate the issues with the failed content DB in parallel and, once ready, retry the DB attach operation.

So, time to revisit DB Attach rule #2. When a content DB is attached into a farm its ID is changed. This has the side effect of effectively marking each object within the content DB as changed. This means that when the crawler service hits this content DB, whether doing an incremental or a full crawl, it will effectively do a full crawl because it believes all the content has changed. The Infrastructure Update (IU) changes all of this and effectively takes this rule out of play. After installing the IU, the Content DB ID is no longer changed. This means an incremental crawl after re-attaching a Content DB really is an incremental crawl. No more full crawls after detaching and attaching content DBs, yay!

The Infrastructure Update (IU) KB article also has this blurb:

 Improvements to the time that is required to update and upgrade Windows SharePoint Services sites.

So what does this mean? With any fix before the IU, psconfig updates each site collection in the farm by setting its build version to the most recent value. After installing the IU, we only touch a site collection if one of its schema objects needs to be updated, such as an update to the template schema. This type of update is far less frequent, since hotfixes rarely do site collection schema updates (the IU itself does, however, to support the new search features). The end result is that if a fix does not require a schema update, psconfig does not go through each site collection and update the build number, which drastically reduces the amount of time necessary to perform the upgrade.

So there you have it, while we cannot upgrade a live farm we do have processes and available fixes that will move us closer to the nirvana of a zero downtime upgrade.

Happy upgrading!


I am a big fan of anything that gives me more insight as to what is happening on my system. One such IIS 6 tweak which I find is greatly overlooked is the additional AppPool logging you can get out of IIS 6. It blows me away that this was not "on" by default -- but that is another blog entry.

The tweak involves modifying the metabase per http://support.microsoft.com/kb/332088 to turn on this additional logging at the AppPool. This KB has a command:

cscript adsutil.vbs Set w3svc/AppPools/DefaultAppPool/LogEventOnRecycle 255

This turns the logging on for the DefaultAppPool only. A more global approach, and one that is especially useful when you have more than one AppPool like us SharePoint guys, is:

cscript adsutil.vbs Set w3svc/AppPools/LogEventOnRecycle 255

It goes without saying, but after you set this you want to do some kind of monitoring for these events to get a sense of what is going on with your AppPools throughout the day.
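
The value 255 in the commands above is a bitmask with every recycle reason switched on. As a rough sketch, here is how the mask breaks down. The bit-to-reason mapping below is my reading of the KB article and should be checked against it before you rely on a narrower mask; the function name is made up for the example.

```python
# Bit values for LogEventOnRecycle as I understand them from KB 332088.
# Treat this table as an assumption and verify it against the article.
RECYCLE_FLAGS = {
    1: "elapsed time",
    2: "request count",
    4: "scheduled time",
    8: "virtual memory limit",
    16: "ISAPI reported unhealthy",
    32: "on-demand (manual) recycle",
    64: "configuration change",
    128: "private memory limit",
}


def decode_recycle_mask(mask):
    """Return the recycle reasons enabled by a LogEventOnRecycle value."""
    return [name for bit, name in sorted(RECYCLE_FLAGS.items()) if mask & bit]


if __name__ == "__main__":
    for reason in decode_recycle_mask(255):
        print(reason)
```

All eight bits sum to 255, which is why that value logs every kind of recycle.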

I have spent a good deal of time lately working with one of my MOSS 2007 customers on a database disconnect issue that has been plaguing them since SPS 2003. This blog entry describes the steps I went through to track down the issue. The journey would not have been successful without the help of some of my co-workers. I also need to mention that the customer made an incredible contribution of time and effort as well. Most customers would have bailed and resigned themselves to the fact that this issue would not be resolved, but this one was different. They stuck it out and in the end we nailed the issue.

The diagnostic tools used during the investigation included:
  • ULS Logs
  • Netmon
  • ADO.Net Tracing
  • Netstat
  • DbgView
  • TDIMon
  • NT Event Log
  • WinDBG (Debugger)
  • LogParser
  • netdiag


This issue occurred in every single environment they had (Production, QA, Dev, POC) and was prevalent in that it occurred several dozen times a day on each server. The issue presented as multiple entries in the Application Event Log; specifically, there were many event 7888 and 6483 entries littered about. Looking at the ULS logs we could confirm the same error being logged there.

What was most interesting about these events was that each was logged as a timeout expired operation, and the operation being performed was an initial connection. You can tell this by looking at the method names: TdsParser.Connect() (on an already existing connection we would be running TdsParser.Parse), AttemptOneLogin, CreateConnection(), etc. all indicate we were not using a pooled connection. The timeout message really bothered me too; I knew the connection strings being used were managed by SharePoint, and typically the TIMEOUT parameter is not set explicitly in the connection string, so this was probably not the result of a low TIMEOUT value being set by a rogue configuration setting.

Event Type: Error
Event Source: Office SharePoint Server
Event Category: Office Server General
Event ID: 7888
Date: 9/6/2007
Time: 7:24:10 AM
User: N/A
Computer: XXX-XXXXX
Description: A runtime exception was detected. Details follow.
Message: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Techinal Details:
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParserStateObject.ReadPacket(Int32 bytesExpected)
at System.Data.SqlClient.TdsParser.ConsumePreLoginHandshake(Boolean encrypt, Boolean trustServerCert, Boolean& marsCapable)
at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, SqlConnection owningObject)
at System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, Boolean ignoreSniOpenTimeout, Int64 timerExpire, SqlConnection owningObject)
at System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(String host, String newPassword, Boolean redirectedUserInstance, SqlConnection owningObject, SqlConnectionString connectionOptions, Int64 timerStart)
at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(SqlConnection owningObject, SqlConnectionString connectionOptions, String newPassword, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, Object providerInfo, String newPassword, SqlConnection owningObject, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at Microsoft.Office.Server.Data.SqlSession.OpenConnection()
at Microsoft.Office.Server.Data.SqlSession.ExecuteNonQuery(SqlCommand command)
at Microsoft.Office.Server.Data.SqlDatabaseManager.HasAccess(String user)
at Microsoft.Office.Server.Administration.SharedResourceProvider.SynchronizeConfigurationDatabaseAccess(SharedComponentSecurity security)
at Microsoft.Office.Server.Administration.SharedResourceProvider.SynchronizeAccessControl(SharedComponentSecurity sharedApplicationSecurity)
at Microsoft.Office.Server.Administration.SharedResourceProvider.Microsoft.Office.Server.Administration.ISharedComponent.Synchronize()
 


Event Type: Error
Event Source: Office SharePoint Server
Event Category: Office Server Shared Services
Event ID: 6483
Date: 9/26/2007
Time: 6:55:10 AM
User: N/A
Computer: XXX-XXXXX
Description:
Application synchronization failed for Microsoft.Office.Server.Search.Administration.SearchService.

Reason: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Techinal Support Details:
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParserStateObject.ReadPacket(Int32 bytesExpected)
at System.Data.SqlClient.TdsParser.ConsumePreLoginHandshake(Boolean encrypt, Boolean trustServerCert, Boolean& marsCapable)
at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, SqlConnection owningObject)
at System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, Boolean ignoreSniOpenTimeout, Int64 timerExpire, SqlConnection owningObject)
at System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(String host, String newPassword, Boolean redirectedUserInstance, SqlConnection owningObject, SqlConnectionString connectionOptions, Int64 timerStart)
at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(SqlConnection owningObject, SqlConnectionString connectionOptions, String newPassword, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, Object providerInfo, String newPassword, SqlConnection owningObject, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at Microsoft.Office.Server.Data.SqlSession.OpenConnection()
at Microsoft.Office.Server.Data.SqlSession.ExecuteNonQuery(SqlCommand command)
at Microsoft.Office.Server.Data.SqlDatabaseManager.HasAccess(String user)
at Microsoft.Office.Server.Administration.SharedDatabase.Microsoft.Office.Server.Administration.ISharedAccessControl.SetAccessControl(SharedComponentSecurity security)
at Microsoft.Office.Server.Search.Administration.SearchSharedApplication.SynchronizeDatabase()
at Microsoft.Office.Server.Search.Administration.SearchSharedApplication.Synchronize()
at Microsoft.Office.Server.Administration.SharedResourceProvider.SynchronizeApplications(SharedComponentSecurity sharedApplicationSecurity)
 


Looking down at the bottom of the stack I could see that the method that started this entire mess was one that runs within the OWSTimer. This was important because it told us the error was not the result of a user making a connection to the DB in the process of doing some kind of SharePoint operation on the Web Front End (WFE). That meant the error was not impacting the user experience, and when a job in the OWSTimer fails the operation is retried at the job's next interval.

As is typical with issues that present as a networking problem, we captured several network sniffs at the time of the events. We could see that the server was responding fine and there appeared to be no errors on the network, with very fast responses basically all the time. In fact, during the period in which the events were being thrown, every attempted SQL connection from the time of the event back 30 seconds succeeded.

Next I started looking into the ADO.NET source code, and soon I developed a theory. The theory was that inside the ADO.NET ReadSync() method we were being passed a very small timeout value (possibly zero), which caused us not to do a proper WaitForSingleObject() on the socket: passing 0 for the TIMEOUT does not allow enough time for the response from the server. A TIMEOUT of zero basically causes the thread to release the remainder of its quantum and then immediately go back into a ready state to be scheduled back on the processor. So what would cause such a condition? ADO.NET takes the timeout passed in from the user code and calculates a time in the future at which the connection attempt expires. As the code progresses it compares the current time with this expiry time to get the number of milliseconds remaining, and uses that as the TIMEOUT for the WaitFor… methods. The problem with this approach is that it is susceptible to clock skew and corrections. Assume a 15-second timeout and a connection attempt that started at 12:00:00; the timeout would be at 12:00:15. Now the time service figures out that the system clock is a bit slow and adjusts it to 12:00:20: we just blew our timeout period. At this point you are probably saying to yourself this guy must be on crack if he really thinks this is what is going on. Well, I am not on crack, and you are correct: this is not what was going on. If you think about it, there is a really small window of opportunity for this to occur, and for it to occur at the frequency we were seeing was very unlikely. What made it even more unlikely was that we did not have the issue on just one server but on dozens, in various isolated environments.
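
Even though this theory turned out not to be the cause, the mechanism is worth seeing in miniature. The sketch below is not ADO.NET code; it is a simulation with made-up names (`FakeClock`, `remaining_ms`, `simulate`) showing how a deadline computed from the wall clock collapses to zero when the clock is corrected forward, while the same deadline computed from a monotonic clock is unaffected.

```python
class FakeClock:
    """A clock we can move by hand, standing in for a real time source."""

    def __init__(self, now=0.0):
        self.now = now

    def __call__(self):
        return self.now


def remaining_ms(deadline, clock):
    """Milliseconds left until the deadline, never below zero."""
    return max(0, int((deadline - clock()) * 1000))


def simulate(jump_seconds, timeout=15.0, elapsed=1.0):
    """Return (wall_remaining_ms, monotonic_remaining_ms) after a clock jump."""
    wall = FakeClock()
    mono = FakeClock()
    wall_deadline = wall() + timeout
    mono_deadline = mono() + timeout

    # Real time passes on both clocks.
    wall.now += elapsed
    mono.now += elapsed

    # The time service corrects the wall clock forward. A monotonic
    # clock, by definition, does not see this adjustment.
    wall.now += jump_seconds

    return remaining_ms(wall_deadline, wall), remaining_ms(mono_deadline, mono)


if __name__ == "__main__":
    wall_left, mono_left = simulate(jump_seconds=20.0)
    print("wall clock deadline:", wall_left, "ms left")
    print("monotonic deadline: ", mono_left, "ms left")
```

With a 15-second timeout and a 20-second forward correction, the wall-clock version has no time left for the next wait even though only one second has really elapsed.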

The next step was to enable ADO.NET tracing to get a peek into what ADO.NET was really doing and try to ascertain why we were getting this timeout. Setting up ADO.NET tracing is trivial once you know how to do it. Below are the files that I used.

  • Setup.cmd -- is what you run to install the necessary settings to perform the tracing.
  • Setup_trace.reg -- is the registry file that is imported.
  • ADO_NET.mof -- is the mof file that is compiled to enable the tracing.
  • _startTrace.cmd -- is the file you run when you are ready to start the tracing
  • Ctrl.guid.todd -- is the file that contains the IDs of the tracing handlers
  • _stopTrace.cmd -- is the file you run when you want to stop the tracing.

      Setup.cmd

      Regedit /s setup_trace.reg
      Mofcomp ADO_NET.mof

      Setup_trace.reg

      Windows Registry Editor Version 5.00

      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\BidInterface\Loader]
      ":Path"="c:\\windows\\Microsoft.NET\\Framework\\v2.0.50727\\ADONETDiag.dll"

      ADO_NET.mof

      #pragma classflags("forceupdate")
      #pragma namespace ("\\\\.\\Root\\WMI")

      /////////////////////////////////////////////////////////////////////////////
      //
      // ADONETDIAG.ETW

      [
      dynamic: ToInstance,
      Description("ADONETDIAG.ETW"),
      Guid("{7ACDCAC8-8947-F88A-E51A-24018F5129EF}"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_ADONETDIAG_ETW : EventTrace
      {
      };

      [
      dynamic: ToInstance,
      Description("ADONETDIAG.ETW"),
      Guid("{7ACDCAC9-8947-F88A-E51A-24018F5129EF}"),
      DisplayName("AdoNetDiag"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_ADONETDIAG_ETW_Trace : Bid2Etw_ADONETDIAG_ETW
      {
      };

      [
      dynamic: ToInstance,
      Description("ADONETDIAG.ETW formatted output (A)"),
      EventType(17),
      EventTypeName("TextA"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_ADONETDIAG_ETW_Trace_TextA : Bid2Etw_ADONETDIAG_ETW_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringA"),
      extension("RString"),
      read
      ]
      object msgStr;
      };

      [
      dynamic: ToInstance,
      Description("ADONETDIAG.ETW formatted output (W)"),
      EventType(18),
      EventTypeName("TextW"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_ADONETDIAG_ETW_Trace_TextW : Bid2Etw_ADONETDIAG_ETW_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringW"),
      extension("RWString"),
      read
      ]
      object msgStr;
      };

      /////////////////////////////////////////////////////////////////////////////
      //
      // System.Data.1

      [
      dynamic: ToInstance,
      Description("System.Data.1"),
      Guid("{914ABDE2-171E-C600-3348-C514171DE148}"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_1 : EventTrace
      {
      };

      [
      dynamic: ToInstance,
      Description("System.Data.1"),
      Guid("{914ABDE3-171E-C600-3348-C514171DE148}"),
      DisplayName("System.Data"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_1_Trace : Bid2Etw_System_Data_1
      {
      };

      [
      dynamic: ToInstance,
      Description("System.Data.1 formatted output (A)"),
      EventType(17),
      EventTypeName("TextA"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_1_Trace_TextA : Bid2Etw_System_Data_1_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringA"),
      extension("RString"),
      read
      ]
      object msgStr;
      };

      [
      dynamic: ToInstance,
      Description("System.Data.1 formatted output (W)"),
      EventType(18),
      EventTypeName("TextW"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_1_Trace_TextW : Bid2Etw_System_Data_1_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringW"),
      extension("RWString"),
      read
      ]
      object msgStr;
      };

      /////////////////////////////////////////////////////////////////////////////
      //
      // System.Data.SNI.1

      [
      dynamic: ToInstance,
      Description("System.Data.SNI.1"),
      Guid("{C9996FA5-C06F-F20C-8A20-69B3BA392315}"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_SNI_1 : EventTrace
      {
      };

      [
      dynamic: ToInstance,
      Description("System.Data.SNI.1"),
      Guid("{C9996FA6-C06F-F20C-8A20-69B3BA392315}"),
      DisplayName("System.Data.SNI"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_SNI_1_Trace : Bid2Etw_System_Data_SNI_1
      {
      };

      [
      dynamic: ToInstance,
      Description("System.Data.SNI.1 formatted output (A)"),
      EventType(17),
      EventTypeName("TextA"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_SNI_1_Trace_TextA : Bid2Etw_System_Data_SNI_1_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringA"),
      extension("RString"),
      read
      ]
      object msgStr;
      };

      [
      dynamic: ToInstance,
      Description("System.Data.SNI.1 formatted output (W)"),
      EventType(18),
      EventTypeName("TextW"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_SNI_1_Trace_TextW : Bid2Etw_System_Data_SNI_1_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringW"),
      extension("RWString"),
      read
      ]
      object msgStr;
      };

      /////////////////////////////////////////////////////////////////////////////
      //
      // System.Data.OracleClient.1

      [
      dynamic: ToInstance,
      Description("System.Data.OracleClient.1"),
      Guid("{DCD90923-4953-20C2-8708-01976FB15287}"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_OracleClient_1 : EventTrace
      {
      };

      [
      dynamic: ToInstance,
      Description("System.Data.OracleClient.1"),
      Guid("{DCD90924-4953-20C2-8708-01976FB15287}"),
      DisplayName("System.Data.OracleClient"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_OracleClient_1_Trace : Bid2Etw_System_Data_OracleClient_1
      {
      };

      [
      dynamic: ToInstance,
      Description("System.Data.OracleClient.1 formatted output (A)"),
      EventType(17),
      EventTypeName("TextA"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_OracleClient_1_Trace_TextA : Bid2Etw_System_Data_OracleClient_1_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringA"),
      extension("RString"),
      read
      ]
      object msgStr;
      };

      [
      dynamic: ToInstance,
      Description("System.Data.OracleClient.1 formatted output (W)"),
      EventType(18),
      EventTypeName("TextW"),
      locale("MS\\0x409")
      ]
      class Bid2Etw_System_Data_OracleClient_1_Trace_TextW : Bid2Etw_System_Data_OracleClient_1_Trace
      {
      [
      WmiDataId(1),
      Description("Module ID"),
      read
      ]
      uint32 ModID;

      [
      WmiDataId(2),
      Description("Text StringW"),
      extension("RWString"),
      read
      ]
      object msgStr;
      };

      _startTrace.cmd

      @Logman start MyTrace -pf ctrl.guid.todd -ct perf -o Output.etl -ets

      Ctrl.guid.todd

      {7ACDCAC8-8947-F88A-E51A-24018F5129EF} 0x00000000 ADONETDIAG.ETW
      {C9996FA5-C06F-F20C-8A20-69B3BA392315} 0xFFFFFFFF System.Data.SNI.1

      _stopTrace.cmd

      @Logman stop MyTrace -ets


      Once the trace is complete you end up with a potentially huge Output.etl file. Once I opened the file in my favorite text editor (UltraEdit) I needed to find the error, so I began hunting around for the string “{WINERR}”, which is preceded by the actual error number. The only errors I could find were 258 (WAIT_TIMEOUT) and 10065 (WSAEHOSTUNREACH). Both seemed relevant and both occurred on the same thread. Using LogParser I was able to break the huge file into something more manageable. One of the columns you can use to do this is the ThreadID. My errors occurred on thread 3136, so I used this command to break out the file:
      LogParser.exe "SELECT EventNumber,Timestamp,EventDescription,ThreadID,msgStr INTO bid_3136.csv FROM Output.etl WHERE ThreadID=3136" -oTsFormat:"yyyy-MM-dd:hh:mm:ss.llnn" -fMode Full


      I narrowed my search even further and finally I was able to figure out what was going on. Using EventNumbers we can see that at 62463 we attempted to connect a socket, and at 62515 (the next trace item on this thread), 21 seconds later, the attempt returned our destination unreachable error. We then retry the connection, but remember that our connection timeout of 15 seconds has already passed, so in 62647 we have a timeout of 0. We send on the socket, then attempt to wait on it, and get the error in 62651: our WAIT_TIMEOUT, which is the result of calling WaitForSingleObject on the socket handle with a TIMEOUT of 0. This is the error that is logged in the event log; however, it is not the real reason we failed to connect. Rather, it was the first error we needed to focus on.
      EventNumber Timestamp EventDescription ThreadID msgStr
      62463 2007-11-16:14:33:28.367774700 System.Data.SNI.1 3136 enter_06 <Tcp::SocketOpenSync|API|SNI> AI: 000000001D52EE00{ADDRINFO*}, ai_family: 2, event: FFFFFFFFFFFFFFFF{HANDLE}, timeout: -1
      62515 2007-11-16:14:33:49.344609000 System.Data.SNI.1 3136 <Tcp::SocketOpenSync|RET|SNI> 10065{WINERR}
      62517 2007-11-16:14:33:49.344618700 System.Data.SNI.1 3136 <Tcp::Open|ERR|SNI> ProviderNum: 7{ProviderNum}, SNIError: 0{SNIError}, NativeError: 10065{WINERR}
      62519 2007-11-16:14:33:49.344631800 System.Data.SNI.1 3136 <Tcp::Open|RET|SNI> 10065{WINERR}
      62646 2007-11-16:14:33:49.348822900 System.Data.SNI.1 3136 enter_01 <SNIReadSync|API|SNI> 311#{SNI_Conn}, pConn: 00000000008BF2F0{SNI_Conn*}, ppNewPacket: 000000002380DB18{SNI_Packet**}, timeout: 0
      62647 2007-11-16:14:33:49.348827800 System.Data.SNI.1 3136 enter_02 <Tcp::ReadSync|API|SNI> 312#, ppNewPacket: 000000002380DB18{SNI_Packet**}, timeout: 0
      62648 2007-11-16:14:33:49.348832000 System.Data.SNI.1 3136 <SNI_Packet::SNIPacketAllocate|API|SNI> pConn: 00000000008BF2F0{SNI_Conn*}, IOType: 0
      62649 2007-11-16:14:33:49.348836000 System.Data.SNI.1 3136 <SNI_Packet::SNIPacketAllocate|SNI> 4#{SNI_Packet} from pool for 311#{SNI_Conn}
      62650 2007-11-16:14:33:49.348839400 System.Data.SNI.1 3136 <SNI_Packet::SNIPacketAllocate|RET|SNI> 0000000000875740{SNI_Packet*}
      62651 2007-11-16:14:33:49.348893400 System.Data.SNI.1 3136 <Tcp::ReadSync|ERR|SNI> ProviderNum: 7{ProviderNum}, SNIError: 11{SNIError}, NativeError: 258{WINERR}
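
The "21 seconds later" figure can be checked directly from the two timestamps in the trace (events 62463 and 62515). A small sketch; the helper name `parse_ts` is made up, and the fractional seconds are truncated to microseconds because that is the most precision Python's `%f` accepts.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a trace timestamp such as 2007-11-16:14:33:28.367774700."""
    head, frac = ts.split(".")
    # %f accepts at most six digits, so drop the extra precision.
    return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%d:%H:%M:%S.%f")


CONNECT_TIMEOUT_S = 15  # the connection timeout quoted in the post

opened = parse_ts("2007-11-16:14:33:28.367774700")  # event 62463
failed = parse_ts("2007-11-16:14:33:49.344609000")  # event 62515
elapsed = (failed - opened).total_seconds()

if __name__ == "__main__":
    print(f"SocketOpenSync blocked for {elapsed:.3f} s")
    print("past the connect timeout:", elapsed > CONNECT_TIMEOUT_S)
```

The socket open blocked for longer than the whole 15-second budget, which is why the retry that follows is handed a timeout of 0.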


      We then went through a battery of config changes to see if we could knock this one out. We checked the IP settings and routing tables to confirm everything was correct. We checked GPO IPSec policies and none were being applied. We disabled or removed additional NICs that were not being used. We removed load balancers, AV software, etc., but nothing helped.

      We also performed a TDIMon trace and this confirmed exactly what we found in the ADO.NET logs.

      1469       4:18:34 PM          w3wp.exe:3820                88F44518             TDI_CONNECT   TCP:0.0.0.0:2283               40.1.221.120:80                HOST_UNREACHABLE-1792


      Next I attempted to use a debugger; however, the logistics of making this a reality were very difficult, and once attached I had limited time to try to reproduce the issue. We seemed to run clean with the debugger attached, which typically means we have some sort of timing issue or race condition (not the case here, however).

      Next I attempted to use iDNA, a really cool technology developed by MSResearch that can trace every operation the machine performs. You can then load the log into the debugger and step forward or back in time, aka time traveling. Unfortunately there are some limitations: it does not work on x64 and it does not work within a VMWare VM. Our test environment was both.

      The customer was on SP1 of Windows 2003 and we upgraded to SP2. While we did see some improvement, it was only marginal.

      Knowing this was something within the customer’s environment, it had to be something that applied to all machines. I received a copy of their build documents and attempted to build an identical machine using their process; however, this did not prove helpful.

      Finally we decided to use a special build of tcp.sys that dumped a message to a DebugDiag window when a Destination Unreachable error occurred. Fortunately there are only a few places within tcp.sys where this error is returned. Once we received the log we could see where the message was being returned, and looking at the code we saw that the code path necessary to get to that place meant IPSec was enabled. But we checked IPSec, right? Wrong: we did not check what policies were being applied from the local registry. To determine whether IPSec policies were being applied from either the local registry or through a Group Policy object (GPO) we followed these steps:
      1. Installed Netdiag.exe from the Windows Server 2003 CD by running Suptools.msi from the Support\Tools folder.
      2. Opened a command prompt and changed directories to %ProgramFiles%\Support Tools (assuming the default install path).
      3. Ran the following command to check whether an IPSec policy was assigned to the computer: netdiag /test:ipsec

      If there had been no policy we would have expected to see a message similar to “IP Security test…….:Passed IPSec policy service is active, but no policy is assigned”. What we found instead was that although no Group Policy IPSec policy was being applied, someone (namely the security team) had gone in and applied an IPSec policy directly to the machine. In my experience this is typically not done, because the practice is an administrative headache and is riddled with problems. What happens when you update your policy? What if you miss applying a policy to one machine? What if you apply an outdated policy to a machine? You get the point…

      Here is one of the entries from the IPSec log; note that we are blocking TCP source port 2283.
      Filter name: 2
      Connection Type: ALL    
      Weight                  : 34603266
      Source Address          : xx.x.xxx.xx        (255.255.255.255)
      Destination Address     : <Any IP Address>   (0.0.0.0         )
      Protocol                : TCP      Src Port: 2283    Dest Port: 0    
      Mirrored                : no
      Outbound Action         : Blocking


      There were some 30+ blacklisted ports (counting TCP and UDP separately); the next table gives the full list.
      Protocol        UDP     Src     Port:   1080
      Protocol        TCP     Src     Port:   1080
      Protocol        UDP     Src     Port:   2283
      Protocol        TCP     Src     Port:   2283
      Protocol        UDP     Src     Port:   2535
      Protocol        TCP     Src     Port:   2535
      Protocol        UDP     Src     Port:   2745
      Protocol        TCP     Src     Port:   2745
      Protocol        UDP     Src     Port:   3127
      Protocol        TCP     Src     Port:   3127
      Protocol        UDP     Src     Port:   3128
      Protocol        TCP     Src     Port:   3128
      Protocol        UDP     Src     Port:   3410
      Protocol        TCP     Src     Port:   3410
      Protocol        UDP     Src     Port:   5554
      Protocol        TCP     Src     Port:   5554
      Protocol        UDP     Src     Port:   8866
      Protocol        TCP     Src     Port:   8866
      Protocol        UDP     Src     Port:   9898
      Protocol        TCP     Src     Port:   9898
      Protocol        UDP     Src     Port:   10000
      Protocol        TCP     Src     Port:   10000
      Protocol        UDP     Src     Port:   10080
      Protocol        TCP     Src     Port:   10080
      Protocol        UDP     Src     Port:   12345
      Protocol        TCP     Src     Port:   12345
      Protocol        UDP     Src     Port:   17300
      Protocol        TCP     Src     Port:   17300
      Protocol        UDP     Src     Port:   27374
      Protocol        TCP     Src     Port:   27374
      Protocol        UDP     Src     Port:   65506
      Protocol        TCP     Src     Port:   65506


      So what was it about this IPSec policy that was a deal breaker? Several of these blocked ports fell within the ephemeral port range, and whenever we attempted to use one of them as a source port the connection would fail.
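
To see which of the blocked ports can actually collide with outbound connections, here is a small Python sketch that intersects the list above with the ephemeral range. It assumes the Windows Server 2003 default range of 1025 through 5000 (that is, MaxUserPort has not been changed); the function name is just something I made up for illustration.

```python
# Ports listed in the IPSec blocking filters above (each blocked for TCP and UDP).
BLOCKED_PORTS = [1080, 2283, 2535, 2745, 3127, 3128, 3410, 5554, 8866, 9898,
                 10000, 10080, 12345, 17300, 27374, 65506]

# Assumption: default Windows Server 2003 ephemeral range (MaxUserPort unchanged).
EPHEMERAL_FIRST = 1025
EPHEMERAL_LAST = 5000


def blocked_ephemeral_ports(blocked, first=EPHEMERAL_FIRST, last=EPHEMERAL_LAST):
    """Return the blocked ports that fall inside the ephemeral range, sorted."""
    return sorted(port for port in blocked if first <= port <= last)


at_risk = blocked_ephemeral_ports(BLOCKED_PORTS)
print(at_risk)
```

Under that assumption seven of the sixteen ports (1080, 2283, 2535, 2745, 3127, 3128 and 3410) sit inside the range TCP picks source ports from, including the 2283 seen in the TDIMon trace.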

      So how do you fix this issue? According to all the documentation, the fix is to set the registry value HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ReservedPorts to exclude a range of ports so that TCP does not attempt to use them. This makes sense: knowing they are blocked, remove them from the ephemeral pool of possible ports. The KB article that documents this value, however, does not mention that multiple ranges can be included. By adding ranges such as 1080-1080 and 2282-2283 one can exclude these ports from the ephemeral pool. There is, however, an open question as to whether this actually works. The customer chose another workaround, namely the built-in firewall in Windows 2003 Server.

      So why are these ports being blocked? It seems the security folks connect to each server and make IPSec changes to block ports that have known vulnerabilities. However, they did not tell anyone that they do this, which is likely why it did not make it into the build documents.
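
If you want to try the ReservedPorts route, the value takes one "low-high" string per range. The following Python sketch only builds those strings from a list of ports; it does not touch the registry, the helper name is hypothetical, and as noted above it is an open question whether the setting really keeps TCP away from these ports. The port list assumes the default 1025 through 5000 ephemeral range.

```python
# Blocked ports that fall inside the assumed default ephemeral range (1025-5000).
AT_RISK_PORTS = [1080, 2283, 2535, 2745, 3127, 3128, 3410]


def to_reserved_ranges(ports):
    """Collapse ports into 'low-high' strings, merging consecutive ports."""
    ranges = []
    for port in sorted(set(ports)):
        if ranges and port == ranges[-1][1] + 1:
            # Extends the previous range by one port.
            ranges[-1][1] = port
        else:
            ranges.append([port, port])
    return [f"{low}-{high}" for low, high in ranges]


reserved = to_reserved_ranges(AT_RISK_PORTS)
for entry in reserved:
    print(entry)
```

Each printed line would be one string in the multi-string ReservedPorts value; 3127 and 3128 collapse into a single 3127-3128 entry.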


      I am frequently asked about the various .Net framework versions floating around. As the .Net framework continues to go through its various revs the waters will only become muddier. Here is a quick overview of each version we have to date…

      .Net 1.0 – This was the initial release of the framework. As of today only .Net 1.0 SP3 is supported by PSS. It shipped only as x86.

      .Net 1.1 – This is a side-by-side release of the framework in which no dependencies on 1.0 exist. Hotfixes and service packs for this version do not affect other framework versions. It shipped only as x86 and the latest service pack is SP1. This version also shipped inside the OS on Windows 2003 RTM, but only in the x86 SKU. For x64 and IA64 versions of Windows 2003 there is no .Net framework installed by default. The x86 SP1 for Windows 2003 updates the 1.1 framework; however, it is SP1+ in that it does not have the same binaries as, say, a Windows 2000 machine with .Net 1.1 SP1 installed.

      .Net 2.0 – This is a side-by-side release just as .Net 1.1. This version ships as x86, x64, and IA64. Windows 2003 R2 ships this version but does not install it by default. Using add/remove programs on any SKU will allow for installation. There currently is no service pack for this version. The x86 and x64 redist packages can each be installed side-by-side on a single x64 system.

      .Net 3.0 – This is the first add-on framework. It adds functionality in the form of WPF, WCF, and WF, plus upgrades the 2.0 binaries by adding a few necessary fixes to make the new components function properly. This version shipped with all SKUs of Vista and has a redistributable package as well. The 3.0 redist package will install the 2.0 framework should it not be present or will upgrade an existing 2.0 installation. The x86 and x64 redist packages can each be installed side-by-side on a single x64 system. Currently there is no Service pack available.

      .Net 3.5 – This is another side-by-side for 2.0 and 3.0 which adds functionality and updates the binaries of the previous two frameworks. This version has not been released and not much about it is public at this point but you can assume an x86, x64 and an IA64 version will continue to be available. You can probably also assume this will debut with Windows 2008.

      Probably one of the worst KB articles I have seen lately introduces what has the potential to be a really exciting new feature of WSS 3.0/4.0: Remote Blob Storage (RBS). This interface allows the storage of "blobs" outside of SQL Server in a kind of BlobBank (I call dibs on this name. :)). Blobs are the streams of content, a byte[] for those dev heads, which represent files within SharePoint. They are stored alongside their metadata in SQL. Having the ability to move these blobs to external storage systems is a huge win for those who have terabytes of SQL storage and who want to reduce the dependency on SQL for this type of storage.

      Simon Skaria plans on blogging about the interface so I will not go into that here however what I did want to cover were some of the pros and cons or as some would say "considerations" when evaluating the Remote Blob Storage API.

      Pros

      1. Ability to use storage other than SQL Server for blobs.
      2. Ability to leverage a lower TCO storage solution for blobs. This point should be scrutinized for each environment, especially when you consider some of the Cons pointed out below.

      Cons

      1. Without ISV involvement there is no mechanism to ensure Backup and Restore is consistent between SQL and the external storage solution. Since part of the blob, the ID, is still stored in SQL and the blob itself is stored elsewhere, these items must remain in a consistent state. The bottom line is that you must be careful when planning your backup and disaster recovery strategies.
      2. Lack of documentation.
      3. Depending on the external solution chosen it may be slower than SQL which would have obvious impact to the performance.
      4. The provider works at the farm level only.
      5. Removing the provider once deployed is going to be a huge problem should one need to back out.
      6. This is custom code that runs all the time within SharePoint; if there is a performance problem, memory leak, etc it will impact SharePoint stability greatly.
      7. The O12 implementation is not fully baked and will very likely change for O14, so it is very probable that an O12 provider will need to be rewritten for O14.
      8. There is no migration path for the provider in an existing SharePoint deployment. New blobs will be stored with your provider while existing blobs will remain in SQL. You may touch the blobs and cause them to be migrated; however, that has a whole set of other problems.
      9. To date there is very little experience rolling out this API. The real-world knowledge is very tribal, held deep within MSFT, and not likely to make it into the public domain quickly.
      10. While the API will be supported, it is highly unlikely the support teams will have training or knowledge about its proper use.
      11. It is very unclear what the experience will look like while upgrading an O12 using this solution to O14.

      BTW if anyone writes a provider I would love to hear about it!

      Thanks
      Todd


      Up to this point Microsoft has always required customers to make a phone call into Microsoft Product Support to receive a hotfix. Today we have made available a web application that allows customers to fill out a simple web form and receive the hotfix by email.

      Here is how it works: The form on the site has field entries for the following information:

      Country/Region              (Pulldown menu)

      KB article number          (Text entry)

      Platform                        (Pulldown menu)

      Product Language          (Pulldown menu)

      E-mail address              (Text entry)

      You enter the data listed above and select the "Submit" button. You are then taken to a submission confirmation webpage, which lets you know that you should receive a response from a Microsoft Professional within 8 business hours. The e-mail response that you receive looks like the automated hotfix e-mail that you may have seen if you have ever used the previous call-in method.

      Happy Hotfixin'.

      Todd 

      Recently I have been looking into MOSS related certifications and ran across the MCTS certifications. I have narrowed the huge list of certifications and exams down to just those that are MOSS 2007 and WSS V3 specific and have provided links below.

      MCTS Certification

      Configuration

      Windows SharePoint Services 3.0, Configuration

      Exam:    70-631: Configuring Windows SharePoint Services 3.0

      Related Course(s) : 5060, 5244, 5245, 5246, 5247, 5248, 5249, 5403, 5942, 5943

      E-Learning: Collection 5403: Implementing Microsoft Windows SharePoint Services 3.0

      Microsoft Office SharePoint Server 2007, Configuration

      Exam:     70-630: TS: Configuring Microsoft Office SharePoint Server 2007

      Related Course(s): 5061, 5250, 5251, 5252, 5253, 5254, 5255, 5404

      E-Learning: Collection 5404: Implementing Microsoft Office SharePoint Server 2007

      Application Development

      Microsoft Windows SharePoint Services 3.0, Application Development

      Exam:    70-541: TS: Microsoft Windows SharePoint Services 3.0 - Application Development

      Related Course(s): 999, 5385, 5386, 5387, 5388, 5389, 5390, 5391, 5392, 5393, 5394, 5395, 5396, 5397, 5398

      E-Learning: Collection 5385: Developing Solutions with Microsoft Windows SharePoint Services 3.0 and Visual Studio 2005

      Technology Specialist: Microsoft Office SharePoint Server 2007, Application Development

      Exam:     70-542: TS: Microsoft Office SharePoint Server 2007, Application Development

      Related Course(s): None

      E-Learning: None    


      Since coming to Developer Support in 2003 I have been working with ASP.NET and IIS. 4 years in any group at Microsoft is a good stint and it was time that I moved on. Because I love working with .Net and web technologies SharePoint seemed like a likely choice. Given SharePoint's huge popularity I felt this provided me with an opportunity to learn a technology that was here to stay and which would only get larger. As a platform there is just a ton that you can do with SharePoint and since it is built on top of ASP.NET I am not totally in the woods when it comes to learning this new product.

      So going forward my blogs will be more SharePointy in focus and hopefully (fingers crossed) more frequent and (again fingers crossed) as technically accurate and spelling and grammar error free as possible.


      Question:

      So what happens when you have an x64 development box running Windows 2003 R2 x64 Enterprise with IIS6 in WOW64 mode and you want to install the 32-bit version of WSS 3.0?

      Answer:

      ---------------------------

      Setup Errors

      ---------------------------

      Setup is unable to proceed due to the following error(s):

      - This 32-bit product must be run on a 32-bit Operating System. For 64-bit support, please install the 64-bit version of this product.

      - Internet Information Services is running in 32-bit emulation mode.

      Correct the issue(s) listed above and re-run setup.

      ---------------------------

      OK

      ---------------------------


      Recently an issue came up where an admin was upgrading their Windows 2003 Server OS from 32-bit to 64-bit but wanted to keep their ASP.NET application as is, running under WOW64. The issue they hit was with monitoring performance counters. While they could see the counter data in perfmon, they could not get the counters to log to a logfile. The solution is to change the perfmon logging service to run under WOW64 (x86) if you want to log counters from a WOW64 process.

      The following command sets this all up (note that sc expects a space after binPath=):

      sc \\servername config sysmonlog binPath= %systemroot%\syswow64\smlogsvc.exe

      Credit for the solution goes to Chris St.Amand of the Microsoft.com Debugging team.

      I just completed building my first MSS07 application and I thought I would jot down my thoughts and findings.

      My Application

      My application is very simple: I take only one piece of information from the user, query a database via a web service, and play back a prompt based upon the results of the query. Simple, right? Well, one requirement was to support both English and Spanish, and not speaking Spanish made this a little difficult. I decided to go with a managed workflow application, as I am very green to Speech development but very familiar with managed code, so it just seemed like a good choice. The application currently handles only inbound traffic, but I hope to extend it to do outbound too. The entire development process took a couple of months of working only at night.

      Lessons learned

      1. Speech is hard. I have written many command prompt, windows, controls, services, and web applications in my 10 years at Microsoft, but developing a good Speech application was not a trivial task. Part of the problem is captured in my #2 lesson learned, but speech is just inherently hard to do correctly. I grew to appreciate just how difficult Speech and IVR development can be and was really glad that the Speech team went with Windows Workflow Foundation (WF) as the primary interface. Developing speech applications with WF really works well; a very nice partnership of technologies. Some of the challenges I had were dealing with multiple languages, switching languages, handling silence, handling barge-ins, developing a custom grammar, and handling no recognition. Don't get me wrong, MSS helps tremendously here, but it cannot do everything.
      2. Read the docs. Yea, I hear this one all the time too, but for Speech Server, especially while in Beta, it pays off. The dividends are saved time and frustration. Fortunately for me the Speech Server team is passionate not only about developing a solid Speech product but also about helping out folks such as myself who tend to get stuck from time to time. Their quick and detailed responses to my questions not only resolved my issues but offered insight into how the product works.
      3. The samples are your friend. I typically learn by example, and the sample applications that ship with MSS07 Beta are very helpful when you need to solve a particular task such as playing a thinking sound, changing languages, handling silence and non-recognition, etc.
      4. A really well thought out and user-friendly UI directly impacts your user's experience. For speech applications, prompts and grammars are your UI. Do not underestimate the value of professional voice talent when developing a speech application. My first attempt at prompts was sending a woman who works for a friend of mine to a studio to record them. Yuk, that was a $400 lesson. I then used Digital Base Productions; they were easy to work with and very reasonably priced. It is best not to engage a voice talent producer until you are locked in on your prompts, because going back and rerecording prompts costs time and money (yea, I did this too). I created a spreadsheet (Excel of course) for my prompts with two columns, one for English and the other for Spanish. I had a friend of mine do the translation to Spanish and found that not everything translates very well, for example the # key: 'pound' or 'hash'? I used both. Digital Base was able to record my prompts and deliver them in only a couple of days. When you have prompts recorded in several languages they will most likely be recorded by different individuals, so make sure the voices are well matched, that no one voice overpowers the others, and that the volumes of the wav files are leveled.
      5. Unless you speak the language (and I do not) the telecoms will get really confused when trying to match a solution to what you think your needs are. I went to both AT&T and Verizon asking for a VoIP solution where I would get SIP from them which could then be sent directly to my MSS07 server. I got the feeling this is really new for them, and AT&T really had a hard time with the request. Verizon finally got it and then offered a solution that required my purchasing Cisco's Call Manager. This was a deal breaker when I priced out Call Manager, WOW$$. I received some good advice from one of the guys on the Speech Server team: keep it simple. That being said, I went with a standard PRI T1 from AT&T. I found that Verizon and AT&T are really close on price; however, I already had AT&T and their long distance ('LD' in telecom speak) costs were a bit less than Verizon's, so I stayed with AT&T. A T1 has 23 channels (phone lines), which is way more than I need.
      6. I purchased a Mediant 2000 by Audiocodes with a single T1 card. I live in Dallas and there just happened to be an ISV here locally that sold me the Mediant and 4 hours of setup where the tech guys connected to the box and set it up. We probably only used about 30 min of that time, however, so if you decide to go with a Mediant you may want to try to set it up yourself. It has a web server interface much like many of the home routers. Of course, one change we did have to make, buried pretty deep in the UI, was configuring it to send SIP over TCP, as its default is UDP. After that it worked on the 10th test phone call (see next lesson)!
      7. My Mediant did not work on the first call, but I used Netmon3 to sniff the traffic and found that it has a really nice SIP parser built right in. My problem, of course, was that MSS supports SIP over TCP and the Mediant defaults to UDP. With Netmon it was a breeze to figure this out and get it changed on the Mediant.
      8. I used a ton of Debug.WriteLine() statements to learn the flow my speech application took through the various turns and to learn when various events were raised. This was critical to learning where to place code within the application.
      9. AT&T provided me with 100 phone numbers, and to keep things simple I just mapped them all to my application. As you can imagine, I get plenty of wrong numbers. The occasional wrong number is not such a bother; however, I noticed I was getting repeated calls from the same numbers. To keep my LD costs down and keep from tying up my channels, I wrote a small piece of code that takes the calling number, checks it against a denied/black list, and drops the call if it is on the list. I monitor my logs for phone spam and fax machines hitting my application and add their calling numbers to my black list. I wish MSS would provide this, as I am sure I am not the first person to have to write a kind of firewall for speech.
      10. To decide on how I wanted my application to flow I called a ton of different IVR applications to see how they solved various problems and how they handled someone trying to trip them up. I found some I knew I did not want my application to resemble and others that I really liked. One in particular that I liked was American Airlines.
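
The black list from lesson 9 is easy to picture in code. This is a minimal Python sketch of the idea only, not the code I wrote for MSS and not an MSS API; the class and method names are hypothetical, the sample numbers are fictional, and the handling of a leading 1 assumes US numbers.

```python
import re


def normalize(number):
    """Keep digits only so '(214) 555-0100' and '2145550100' compare equal."""
    digits = re.sub(r"\D", "", number)
    # Assumption: US numbers, so drop a leading country code of 1.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


class CallerDenyList:
    """Hypothetical deny list checked before a call is answered."""

    def __init__(self, numbers=()):
        self._denied = {normalize(n) for n in numbers}

    def add(self, number):
        """Add a calling number spotted in the logs (phone spam, fax machines)."""
        self._denied.add(normalize(number))

    def should_drop(self, caller_number):
        """Return True when the call should be dropped without answering."""
        return normalize(caller_number) in self._denied


deny_list = CallerDenyList(["(214) 555-0100"])
deny_list.add("1-972-555-0123")

for caller in ["214-555-0100", "9725550123", "469-555-0199"]:
    action = "drop" if deny_list.should_drop(caller) else "answer"
    print(caller, action)
```

Normalizing before comparing matters because the same caller can show up formatted differently from one log entry to the next.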

      The hardware

      Dell Precision 1850 with x64 Xeon processors and 4 GB of memory. 1Gb teaming NIC, runs Speech server like a champ.

      Audiocodes Mediant 2000 with a single T1 card.

      I developed my application on a Toshiba laptop. Although it has a built-in microphone, I decided to purchase a Plantronics headphone and microphone since I mainly worked on this at night and I kept waking up my wife "talking to my computer". Money well spent.

      Thanks

      Todd


      I just got involved in a case where a customer of mine was hitting the following exception on .Net Framework 2.0.

      Exception information:

          Exception type: InvalidOperationException

          Exception message: Hashtable insert failed. Load factor too high.

         

      The callstack for the faulting thread is:

      System.Collections.Hashtable.set_Item()
      System.Runtime.Serialization.SerializationEventsCache.GetSerializationEventsForType()
      System.Runtime.Serialization.ObjectManager.RaiseOnDeserializingEvent()
      System.Runtime.Serialization.Formatters.Binary.ObjectReader.ParseObject()
      System.Runtime.Serialization.Formatters.Binary.ObjectReader.Parse()
      System.Runtime.Serialization.Formatters.Binary.__BinaryParser.ReadObjectWithMapTyped()
      System.Runtime.Serialization.Formatters.Binary.__BinaryParser.ReadObjectWithMapTyped()
      System.Runtime.Serialization.Formatters.Binary.__BinaryParser.Run()
      System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize()
      System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize()

      <<snip>>

      Searching around on the Internet I found others who were hitting this issue too. This appears similar to 831730, which is a 1.1 fix; however, that fix was checked into the RTM build of 2.0, so we have another issue here. The good news is we have a fix for this issue; however, we DO NOT have a KB article for it just yet (I am working to correct that now). A race condition is causing this issue and the fix will correct it. The hotfix package is 927579. Yea, this KB article and hotfix package do not look like a fix for the above issue; however, what we need to understand here is that fixes are checked into our source control in a cumulative manner. So as long as the fix is checked in and you get a binary that was built from source after the check-in date, you inherit the fix.

      So if you are hitting this issue, which seems to be most prevalent in ASP.Net environments, you should call into Microsoft Support and request the hotfix for 927579.

      Recently I was asked how one could share session state between two ASP.NET 2.0 applications. Well, I had to be totally honest; I had never even looked into this and really did not know why one would want to do this. When I was queried about a solution for this problem a couple of times by different folks within a week, I decided to take a look at the problem and not worry about why anyone would want to share state across applications. Besides, who am I to say what people will use or not use? I thought EBAY was a horrible idea when I first heard about it years ago; I thought, damn, everyone will just take your money and not send you the products....

      My first thought was that we should write a custom SQL provider that addresses the problem however writing a custom session state provider is not trivial and I like trivial solutions. Out of the box SQL session state can handle storing session from multiple web applications and it prevents session from being shared. The various session state items are stored in a single table 'ASPStateTempSessions' using a SessionId as a primary key. The SessionId is actually the string representation of the SessionId used to identify the user's session plus the ApplicationId. The ApplicationId is created when the session state provider calls the 'TempGetAppID' stored procedure. This proc either creates the id by hashing the application name or returns the id stored for the name in the ASPStateTempApplications table. For each web application that is using a database for session state you will have a row in ASPStateTempApplications that represents that application.

      So to work around this segmentation I modified the TempGetAppID stored procedure to always return the same Application ID (in my case '1') and I use a new application name of 'Global Session State Application'. Now any ASP.NET 2.0 web application that you point to this state database will be able to share session.

      After the modification to the stored proc I verified the changes had taken effect by looking at the SessionID column in the ASPStateTempSessions table. I now see values like '13clw2vlrjio0d45opi0qg4500000001' and 'gz3wc5nqeq2es2ezzxfzpbyb00000001'. Note the last 8 characters of these strings: that is my new global application ID.
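
To illustrate how the key is put together, here is a small Python sketch that builds and splits values like the ones above. The function names are hypothetical, and formatting the application ID as 8 hexadecimal digits is my assumption; all the table shows is that the last 8 characters carry the application ID.

```python
def session_key(session_id, app_id):
    """Build the key stored in ASPStateTempSessions: session id + app id.

    Assumption: the app id is appended as 8 lowercase hex digits, with a
    negative 32-bit id shown in two's complement form.
    """
    return f"{session_id}{app_id & 0xFFFFFFFF:08x}"


def split_session_key(key):
    """Split a stored key into (session id, app id as an unsigned integer)."""
    return key[:-8], int(key[-8:], 16)


session_id = "13clw2vlrjio0d45opi0qg45"

# Two applications with different (made up) IDs get different rows...
print(session_key(session_id, 0x1A2B3C4D))
print(session_key(session_id, 0x5E6F7081))

# ...but once TempGetAppID always returns 1, both resolve to the same row.
print(session_key(session_id, 1))
print(split_session_key("gz3wc5nqeq2es2ezzxfzpbyb00000001"))
```

This is why the stock behavior keeps applications apart even when the browser presents the same session cookie, and why forcing a single application ID is enough to let them share.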

      Here are the changes that I made to the stored procedure. Note that before you run this on your SQL box you may want to empty your ASPStateTempApplications and ASPStateTempSessions tables for general housecleaning. Also you will want to bounce your ASPNet Applications since they only call TempGetAppId stored procedure when the session state module is loaded.

      set ANSI_NULLS ON
      set QUOTED_IDENTIFIER OFF
      GO

      ALTER PROCEDURE [dbo].[TempGetAppID]
          @appName tAppName,
          @appId int OUTPUT
      AS
          -- Ignore the caller's application name so that every application
          -- pointed at this database resolves to the same application ID.
          SET @appName = LOWER('Global Session State Application')
          SET @appId = 1

          -- Register the shared application row once, if it does not exist yet.
          IF NOT EXISTS (SELECT 1
                         FROM [ASPNET Session State].dbo.ASPStateTempApplications
                         WHERE AppId = @appId)
          BEGIN
              INSERT [ASPNET Session State].dbo.ASPStateTempApplications
              VALUES (@appId, @appName)
          END

          RETURN 0

      A note about testing: I have done minimal testing with this solution, so please do your own testing and let me know your results. I look forward to your feedback.

      Carlo was kind enough to include me in his recent "tagging" activity.

      So here are 5 things you probably don't know about me:

      1. I recently graduated from SMU with a masters in software engineering.
      2. I own an XBOX but I cannot play it for more than 15 min at a time without getting violently ill.
      3. My family Veronica, Garrett & Grace are the most important things in my life.
      4. I totally love my job. Yea, that is corny but true. When I was in high school and working on my undergraduate degree I always wanted to find something I could do with my life that was rewarding, paid a good salary, and that I could work at for hours, days, and years and never get bored. Found IT!
      5. I live in really hot Texas. I am totally jazzed though about this summer because we decided to finally build a pool.


      Looking around at how I got tagged I find that most of my virtual friends have already been tagged...what a bummer.

      The other day a friend of mine approached me about a possible bug he had found in the CLR memory performance counters. He pointed me to a performance log his customer had captured, which showed that Bytes in All Heaps exceeded Private Bytes. Since he knew that Bytes in All Heaps represents the memory in the managed heap, and the managed heap is part of Private Bytes, how could this be possible? Well, the answer is quite simple: it's not. What he was looking at was not a performance counter bug either, but a side effect of how the GC performance counters operate. The GC only updates its performance counters after a garbage collection, as that is the time at which the values that feed into the counters are the most stable and available to publish to the performance block. It just so happens that in this performance log all GC activity had stopped for a while just before the end of the log (probably at the end of a test run), so the counters were not being updated. The Private Bytes counter continues to be updated because it is managed by the OS.