I received a question via my blog (thanks Alex!) about SQL deadlocks and error messages like the following on a busy server:
System.Data.SqlClient.SqlError: Transaction (Process ID 84) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
You will also see event ID 7747 in the Application event log.
This can be an issue on heavily stressed systems, and in every case I have seen it relates to the process that selects queue jobs for processing. It doesn't break anything as such, and no data is lost, but processing of queue jobs is delayed (though on a system that busy they probably wouldn't have been processed quickly anyway!).
Deadlocks occur when two transactions interact in such a way that each requires a resource the other has locked. Because neither task can continue until a resource is available, and neither resource can be released until a task continues, a deadlock state exists. SQL Server resolves it by choosing one of the transactions as the victim and ending it, which produces the error above. See SQL Server Books Online for more details.
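The error's advice to "Rerun the transaction" is what a client is expected to do when it is chosen as the victim. The SQL retry counters discussed below suggest the Project Server queue does something along these lines. Here is a minimal Python sketch of the pattern, not Project Server's actual code. `DatabaseError` is a hypothetical stand-in for whatever exception your data-access library raises. The only SQL Server fact it relies on is that deadlock victims are reported as error number 1205. Because the victim's work is rolled back, the callable you pass in should be the whole transaction, not just the statement that failed.

```python
import random
import time

# SQL Server error number for "... has been chosen as the deadlock victim".
DEADLOCK_ERROR_NUMBER = 1205


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQL Server error number."""

    def __init__(self, number, message=""):
        super().__init__(message)
        self.number = number


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run `transaction`, rerunning it if SQL Server picks it as the deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            # Only deadlock victims are retried; other errors, or a deadlock on the
            # final attempt, propagate to the caller unchanged.
            if err.number != DEADLOCK_ERROR_NUMBER or attempt == max_attempts:
                raise
            # Exponential backoff with jitter, so the competing transactions are
            # less likely to collide again at the same moment.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))


# Usage: a fake transaction that is the deadlock victim twice, then commits.
calls = {"n": 0}


def flaky_transaction():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DatabaseError(DEADLOCK_ERROR_NUMBER, "chosen as the deadlock victim")
    return "committed"


print(run_with_deadlock_retry(flaky_transaction, sleep=lambda seconds: None))  # committed
```

The backoff matters as much as the retry. Rerunning immediately on a busy server tends to recreate the same collision.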
In Project Server 2007 you can monitor activity using perfmon; the counters include SQL retries per minute for both the Project and Timesheet queues. You can also modify the queue settings, which can reduce how often the deadlocks occur or change how they behave. We don't have any prescriptive guidance yet on suggested changes, but reducing the number of threads, increasing the polling interval, or increasing the SQL retry interval would likely reduce the number of deadlocks you see. However, these changes will also reduce the throughput of your queue, particularly when processing lightweight jobs. If you see the deadlocks only at a specific time of day and want to change the queue settings to suit the workload, you could even use the QueueSystem web service to change them (using the SetQueueConfiguration method).
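If you do go down the time-of-day route, the decision itself can be as simple as choosing between two sets of values by clock time. The Python sketch here is purely illustrative. The field names, the values and the 08:30 to 10:00 peak window are all hypothetical, since there is no prescriptive guidance on actual values. It also deliberately stops short of calling `SetQueueConfiguration`, whose parameters are not covered here.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class QueueProfile:
    # Hypothetical names for the three settings discussed above; not the QueueSystem schema.
    max_threads: int
    polling_interval_ms: int
    sql_retry_interval_ms: int


# Illustrative values only: fewer threads and longer intervals at peak.
NORMAL = QueueProfile(max_threads=4, polling_interval_ms=1000, sql_retry_interval_ms=1000)
PEAK = QueueProfile(max_threads=2, polling_interval_ms=2000, sql_retry_interval_ms=2000)


def profile_for(now, peak_start=time(8, 30), peak_end=time(10, 0)):
    """Choose the gentler profile inside the (hypothetical) window where deadlocks appear."""
    if peak_start <= now < peak_end:
        return PEAK
    return NORMAL


# A scheduled job could call this and then apply the result through
# QueueSystem.SetQueueConfiguration (call not shown).
print(profile_for(time(9, 0)).max_threads)   # 2
print(profile_for(time(14, 0)).max_threads)  # 4
```

The same trade-off applies here as to any manual change. The peak profile gives up throughput in exchange for fewer deadlocks, so it only makes sense if the deadlocks really are confined to a predictable window.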
I’m not sure if anyone will really want to micro-manage their queue in this way – or what the overall throughput benefits would be – but the option is there.
Vince Rothwell posted a comment suggesting a way to troubleshoot the type of errors described in my last posting. I thought it was worth a full posting rather than getting lost in the comments. Vince pointed me to his blog at http://blog.thekid.me.uk/archive/2007/02/15/a-solution-to-quot-an-unexpected-error-has-occurred-quot-in-wss-v3.aspx, where he describes changing the web.config file for the application giving the error so that you get a less generic error message. I tried this on one of my servers where I could "force" some typical errors of this type. Instead of the "unknown error" you get a stack trace, which may give good clues to the true problem. In one of my cases, the stack trace clearly pointed to unexpected NULL values coming from the database; the next step would be tracing where that bad data came from. In another case I could see a "GeneralUnhandledException", so I still needed to dig deeper. The final case gave me a specific GUID identifying the problem data element. All in all, a good way to troubleshoot these types of errors.
A couple of words of warning, though. You probably don't want to do this on a public-facing site, and even exposing your own users to these types of errors is best avoided. One way to limit who sees these errors is to set customErrors to "RemoteOnly" rather than "Off". The full error details are then shown only to requests made on the server itself (assuming you can reproduce the problem there). The stack trace could tell a potential attacker more about your back-end systems than you would want them to know!
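Vince's blog has the details of the web.config edits. The two settings usually cited for this on WSS 3.0 are the `CallStack` attribute of the `SafeMode` element (under `SharePoint`) and the `mode` attribute of `customErrors` (under `system.web`). Here is a minimal Python sketch, assuming that layout, which applies the change to a trimmed-down sample file. The function name is hypothetical. In practice you would back up the real web.config, edit it by hand, and set both values back once you have the stack trace you need.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample; a real WSS 3.0 web.config has many more elements and attributes.
SAMPLE_WEB_CONFIG = """<configuration>
  <SharePoint>
    <SafeMode MaxControls="200" CallStack="false" />
  </SharePoint>
  <system.web>
    <customErrors mode="On" />
  </system.web>
</configuration>"""


def enable_detailed_errors(config_xml, custom_errors_mode="RemoteOnly"):
    """Return a copy of the config with call stacks shown and custom errors relaxed."""
    root = ET.fromstring(config_xml)
    # iter() searches the whole tree, so the exact nesting does not matter here.
    for safe_mode in root.iter("SafeMode"):
        safe_mode.set("CallStack", "true")
    for custom_errors in root.iter("customErrors"):
        # "RemoteOnly" shows full details only to requests made on the server itself;
        # "Off" would show them to every user.
        custom_errors.set("mode", custom_errors_mode)
    return ET.tostring(root, encoding="unicode")


print(enable_detailed_errors(SAMPLE_WEB_CONFIG))
```

Defaulting to "RemoteOnly" rather than "Off" follows the warning above. You can still pass "Off" explicitly if you cannot reproduce the problem on the server.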
Thanks again to Vince - and please do visit his blog - a great source of excellent information on SharePoint!
Because Project Server 2007 is a WSS 3.0 application, you may come across either "An unexpected error has occurred" or "Unknown Error". The unexpected error can appear on web part pages, and the unknown error is more likely on pages without web parts. Both errors are really telling you the same thing: this page (or web part) needed some data, but the data doesn't fit what it was expecting. If you get this, you probably need to raise a support call (see http://support.microsoft.com for the options available to you). Alternatively, search the KBs to see if there is a fix for the specific problem you are seeing (depending on when you read this, there may or may not be fixes up there yet).
For some potential immediate relief, though, here are a few tips that may at least get the page displaying again. The problem could occur on a number of different pages, but to give some context let's talk about the "My Tasks" page. This displays the "My Assignments" view by default. However, as the web part loads, the SQL stored procedure behind the scenes brings in data for every view the user has access to in the views drop-down. So even if "My Assignments" is the default, a custom view in the list that includes some custom fields will also have had its data retrieved (and could be the culprit!). Identifying the bad data will help us with your support call and hopefully get the page working again. Think in rows and columns: any customization changes the columns, and the rows depend on things such as date ranges, tasks and assignments. So, things you can try:
None of these are meant as long-term fixes; they really just avoid the issue until a fix can be found, which either the support call or the KBs will help you achieve. But hopefully they will give you an understanding of what might have led to the issue, and could get your resources back up and running while together we solve the deeper problem.
We have been working on different SharePoint farm configurations recently, particularly those that add Microsoft Office SharePoint Server to a farm that already has Project Server loaded. The penultimate step (8 of 9) was really slow, and we had reports of users canceling and starting again to try to solve the problem. Then one of my colleagues said "That step is taking about an hour". Hmm. I had read Chris Boyd's recent posting on event registration not happening immediately because of the recent DST changes, so I suspected this process might also be affected by the issue documented in KB 932563. My colleague quickly tested this by setting the time zone to GMT and confirmed the whole process was back to 10 minutes or so! So until next week, be patient, or change your time zone as described in the KB.
I'm just waiting for the Autumn (Fall) as the clocks go back the other way to see if it then finishes 10 minutes before you start!
Sometimes on a support call the customer will say "I have a great repro: just open this mpp file and things will crash!". Unfortunately this is a little like seeing an automobile wreck and trying to work out how the cars got where they did, and which direction they came from. The mpp is already bad, but how did it get that way? The perfect repro starts from a new project, follows a set of steps, and then something bad happens. Even better is when following the same steps on our machines gives the same problem!
Having a good repro also helps when working on Project Server 2007 issues, and the more information you can give us, the better we can help you. That may mean getting to a resolution more quickly, or understanding the root cause of something that can be avoided until we have a good fix. Help us to help you! It isn't unusual, when working a call, for us to suspect a specific problem but be unable to confirm it directly, even though the clues are all over the logs. You know what the issue is; you know that we know that you know what the issue is...
This leads me on to one of my favorite current issues, but it will have to wait for my next post: "An unexpected error has occurred". What's the deal with this message, and how can you recover?