Project Support blog - author Brian Smith

My Queue is Stuck! - How to manage your queue service in Project Server 2007

The new world of Project Server 2007 and its architectural changes are catching a few of our customers out, so I thought I'd share a few tips and tricks for keeping the queue flowing, plus some tips for getting things moving again if they appear to have stopped.

First, I will point to a great TechNet article on the queue; once you have read it, my explanations will make more sense :).

Under Server Settings in Project Web Access, the Manage Queue option lets you see what is happening in the project and timesheet queues. If you don't have admin access, Personal Settings will give you a glimpse of your own queue jobs. That view may not, however, give you the complete picture or let you see what might be ahead of you. It is like being stuck on the highway and not being able to see around the corner to where the flashing lights are...

So let's start with some definitions:

Waiting to be processed - means exactly what it says. Once I get to the front of the queue, I am ready to go. But there may be other active jobs that will stop my job from starting even if I am first in line: the queue is clever enough to hold jobs back if processing them would interfere with other running jobs. An example might be a publish job that needs to wait for a cube build to finish.
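To make the "first in line but still waiting" idea concrete, here is a minimal Python sketch of the kind of check described above. It is a conceptual model only: the `Job` class, the `CONFLICTS` table and the `next_startable` function are hypothetical names, and the single publish/cube build conflict is taken from the example in the text rather than from Project Server's real rules.

```python
from dataclasses import dataclass

# Hypothetical conflict table: each job type maps to the job types it must
# not run alongside. Illustrative only; not Project Server's actual rules.
CONFLICTS = {
    "publish": {"cube_build"},
    "cube_build": {"publish"},
}


@dataclass
class Job:
    job_id: int
    job_type: str


def next_startable(waiting, running):
    """Return the job at the front of the queue if it may start, else None.

    A front job that conflicts with any running job is held back, so it
    stays "Waiting to be processed" even though it is first in line.
    """
    if not waiting:
        return None
    front = waiting[0]
    running_types = {job.job_type for job in running}
    if CONFLICTS.get(front.job_type, set()) & running_types:
        return None  # hold back until the conflicting job finishes
    return front


# Usage: a publish waits while a cube build is running.
queue = [Job(1, "publish")]
print(next_startable(queue, [Job(0, "cube_build")]))  # None
print(next_startable(queue, []))  # Job(job_id=1, job_type='publish')
```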

Processing - means that I made it to the front of the queue, was allocated a thread and am working away! One thing I have noticed is that the % complete indicator doesn't always suggest that processing is happening - but generally it is. If you still doubt that processing is moving along, look in the ULS logs and event logs, or at general server activity (particularly the Microsoft.Office.Project.Server.Queuing.exe process).

Skipped for optimization - is the queue's way of telling you that it is not going to do the same thing twice. Some queue jobs carry a payload (such as saving a project) and others are merely instructions (such as publishing a project). If several identical instructions are in the queue, only one needs to be actioned. An example might be working on a project and publishing it several times over a period of time. If the queue were busy, all of these publish jobs might sit waiting for a while - and then, rather than doing each in turn, the queue needs to do just one, since each is only an instruction to publish the content of the saved project. This would not happen with queue jobs that carry a payload, as each of these contains real data that needs to be applied, rather than just an instruction to do something with data held elsewhere.
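The skipping behavior can be modeled roughly as follows. This is a hypothetical sketch, not Project Server code: it assumes identical instructions are matched on project and action, and that the most recent instruction is the one kept (so a publish always runs after the latest save). Payload jobs are never skipped, as described above.

```python
from dataclasses import dataclass


@dataclass
class QueueJob:
    job_id: int
    project: str
    action: str         # e.g. "publish" (instruction) or "save" (payload)
    has_payload: bool   # True if the job carries data that must be applied
    state: str = "Waiting to be processed"


def skip_duplicate_instructions(waiting):
    """Mark redundant instruction-only jobs as skipped (assumed rule).

    For each (project, action) pair, only the last waiting instruction job
    is kept; earlier identical instructions are marked "Skipped for
    optimization". Jobs with a payload are always left alone.
    """
    latest = {}
    for job in waiting:
        if not job.has_payload:
            latest[(job.project, job.action)] = job.job_id
    for job in waiting:
        if not job.has_payload and latest[(job.project, job.action)] != job.job_id:
            job.state = "Skipped for optimization"
    return waiting
```

For example, a queue holding publish, save, save, publish for one project keeps both saves and the final publish, and skips only the first publish.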

Getting Queued - appears to be one of the more confusing messages. I mentioned above that some jobs, such as saving a project from Project Professional, have a payload. This payload goes into the queue as a group of related messages, which are then processed once they reach the front of the queue. Getting Queued means that these messages are still going into the queue. The Getting Queued message may appear for some time simply because a very large project is coming in across a very slow link. Things can also break if this flow of messages does not complete: perhaps the Project Manager saving the project shuts down Project before the save completes, or goes out of wireless range midway through the process. Either way, the Getting Queued job could sit there for some time. To fix this, find the person who has this project mid-save and get them to reconnect and complete the job. As a last resort you can cancel the Getting Queued job - but YOU WILL LOSE DATA! Any changes the Project Manager made will not be saved. To protect you from inadvertently canceling one of these jobs, there is a check box under Advanced options labeled "Cancel jobs getting enqueued", which must be checked before these jobs can be canceled.
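The following hypothetical Python model shows why a partially received payload leaves a job sitting in Getting Queued, and why canceling it discards data. The class name, the fixed message count and the `cancel_jobs_getting_enqueued` parameter (standing in for the Advanced options check box) are illustrative assumptions, not the real queue internals.

```python
class EnqueuingJob:
    """Hypothetical model of a payload job that arrives as several messages."""

    def __init__(self, job_id, expected_messages):
        self.job_id = job_id
        self.expected = expected_messages
        self.received = []
        self.state = "Getting Queued"

    def receive(self, message):
        if self.state != "Getting Queued":
            raise RuntimeError(f"job {self.job_id} is not accepting messages")
        self.received.append(message)
        if len(self.received) == self.expected:
            # Only a complete group of messages becomes a runnable job.
            self.state = "Waiting to be processed"

    def cancel(self, cancel_jobs_getting_enqueued=False):
        if self.state == "Getting Queued" and not cancel_jobs_getting_enqueued:
            raise PermissionError("check 'Cancel jobs getting enqueued' first")
        # Any partial payload is discarded: those changes are lost.
        self.received.clear()
        self.state = "Canceled"
```

A save that stops after two of three messages (the user closes Project, or loses the network) stays in Getting Queued until the remaining messages arrive or someone explicitly opts in to canceling it.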

Failed but Not Blocking correlation - is a failure that is isolated and is not stopping any other jobs from processing. The term correlation is used to group related queue jobs together. There should be an associated error message, and entries in the log, to help explain the problem.

Failed and Blocking correlation - means that something bad happened that is also blocking other jobs in the related group. One example: if a save fails, then a publish that depends on it cannot continue.
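A rough sketch of how a blocking failure holds up the rest of its correlation while a non-blocking failure does not. The tuple layout and the short state labels (`failed_blocking`, `failed_not_blocking`, `waiting`) are assumptions made for illustration; they are not the PSI's actual job state names.

```python
def blocked_jobs(jobs):
    """Return IDs of jobs held up by a blocking failure earlier in their correlation.

    `jobs` is a list of (job_id, correlation_id, state) tuples in queue
    order. A "failed_blocking" job blocks every later job that shares its
    correlation_id; a "failed_not_blocking" job affects nothing else.
    """
    blocked = set()
    blocking_correlations = set()
    for job_id, correlation_id, state in jobs:
        if correlation_id in blocking_correlations:
            blocked.add(job_id)
        elif state == "failed_blocking":
            blocking_correlations.add(correlation_id)
    return blocked


# Usage: a failed save (job 1) blocks the publish (job 2) in correlation "A".
jobs = [
    (1, "A", "failed_blocking"),
    (2, "A", "waiting"),
    (3, "B", "failed_not_blocking"),
    (4, "B", "waiting"),
]
print(blocked_jobs(jobs))  # {2}
```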

Success - is the one message we like to see! It can also be useful to show the Success messages (by default they are not shown in the Manage Queue display), as this is a way of checking whether the queue is working at all. To do this, add the Success completion state through the options on the Manage Queue page.

Canceled - means what it says. A job could have been canceled by a user, but it is also possible for jobs to be canceled by the server. One example would be a failure early on in a save from Project Professional. A job would have been added to the queue for the save, but reconnecting may lead to this job being canceled and another save job being added - it really depends how far the save got before the problem. I simulate bad things like this by pulling my network cable out just after hitting save - just to see what happens!

I will follow up with another posting on the queue with some further troubleshooting tips, but my parting gift is a guide to what the messages at the bottom of Project Professional 2007 mean during a save:

    • Blue progress bar - saving to local cache
    • Synchronizing data to server... - The data is going from the local cache to the PSI and being passed into the queue (Getting Queued)
    • Save job xx% complete.  Expected Wait Time 20s - The job is either Waiting to be processed or more likely Processing.  Once you see this then it is safe to close Project Professional - your saves are safely in the queue!
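The three save indicators can be summarized as a small lookup, sketched in Python below. Prefix matching on the message text, and passing `None` while only the blue progress bar is visible, are assumptions for illustration; the point is that only the "Save job" stage is safe to close on.

```python
def classify_save_status(status_text):
    """Map a Project Professional 2007 save indicator to (stage, safe_to_close).

    Pass None while only the blue progress bar is showing. Matching on the
    message prefix is an illustrative assumption.
    """
    if status_text is None:
        # Blue progress bar: still writing to the local cache.
        return ("saving to local cache", False)
    if status_text.startswith("Synchronizing data to server"):
        # Cache -> PSI -> queue: the job is still Getting Queued.
        return ("Getting Queued", False)
    if status_text.startswith("Save job"):
        # The whole payload is in the queue; the server does the rest.
        return ("Waiting to be processed or Processing", True)
    raise ValueError(f"unrecognized status: {status_text!r}")
```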


  • Hi Rob,

    If they just open Pro and connect to the same server with the same cache, then any incomplete jobs should continue - assuming the cache hasn't been cleared and the queue job is still sitting there. Be aware that the cache used when opening the project directly from Pro is different from the one used when the project is opened indirectly from PWA.

    Whether it continues depends on exactly what point it reached - you may see a canceled job and then a new save begin.

    Best regards,

    Brian.

  • Hi Brian:

    We have an issue that I haven't seen before. The Project Server app is on one box and SQL Server 2005 is on another, virtualized, server. Whenever we try to publish, save or do anything, the queue always says 'Waiting to be processed (Sleeping).' This is during UAT, so there are only about 2 people on the system. We have applied SP1 and the recent Infrastructure Update. They are not using MOSS, only WSS. Any ideas why every job in the queue is 'sleeping'?

    Thanks, Brian!

    Michelle

  • Hi Michelle,

    Usually you will find clues to "sleeping" jobs in the ULS logs, which may explain why the job went into a sleep state. A job usually sleeps if something else is happening that it needs to wait for, and it will wake up every 5 minutes or so to see if the condition has changed. It is possible it thinks something such as a reporting database refresh is happening when in fact it isn't (or it was, but terminated unexpectedly, which could be due to a number of reasons). As there could be a number of different causes, you may need to open a support incident so that one of our engineers can work through it with you to find the root cause.

    Best regards,

    Brian.
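Brian's description of a sleeping job can be sketched as a simple polling loop. This is a conceptual model under stated assumptions: `condition_active` is a hypothetical callable standing in for whatever the queue is waiting on (such as a reporting database refresh), the 300-second default reflects the "every 5 minutes or so" above, and `max_checks` is an illustrative guard, not a Project Server setting. It also shows the failure mode described: if the condition is never cleared, the job sleeps indefinitely.

```python
import time


def wait_while_blocked(condition_active, poll_seconds=300, max_checks=None,
                       sleep=time.sleep):
    """Keep a job 'sleeping' until condition_active() returns False.

    Returns the number of checks made. If max_checks is set and the
    condition is still active after that many checks, raises TimeoutError
    (a hypothetical guard; without it the loop would wait forever).
    """
    checks = 0
    while True:
        checks += 1
        if not condition_active():
            return checks  # condition cleared: the job wakes and proceeds
        if max_checks is not None and checks >= max_checks:
            raise TimeoutError("condition still active; check the ULS logs")
        sleep(poll_seconds)  # go back to sleep and re-check later
```

Injecting `sleep` keeps the sketch testable without actually waiting five minutes per check.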

  • Hi, Brian,

    This is a great site. I have had several 'Failed But Not Blocking Correlation' jobs in the queue for days, and I am not able to cancel them.

    Should I delete them, or can I just leave them there? If they should be canceled, how can I do it?

    Thanks,

    JJw

  • Hi JJw,

    As these have already failed, there is nothing to cancel. They will be cleaned up based on the settings on the Queue Settings page, so there is no need to do anything. In some cases these jobs can be retried, but without knowing what jobs they are I can't say whether that makes sense for you.

    Best regards,

    Brian.

  • Hi Brian,

    This is a duplicate post but relevant to stuck Queues.

    If you could answer this you would make a whole bunch of very frustrated people very very happy.....  ;-)

    (1) I take a full farm backup using STSADM of my prod box which has PS2007 running on WSS 3.0

    (2) Then I attempt to create a copy of the farm on a different server (the DEV server). I create an installation of PS2007, up to and including PWA.

    (3) I delete all databases except the sharepoint management dbs.

    (4) I create a new SSP called SSP2, and move the management and PWA to it. It has a new database name. I delete the original SSP so that I can restore onto the box.

    (5) I then restore the prod farm from backup using:

    STSADM -o restore -directory \\DEV_SERVER\Share -backupid <GUID_of_the_prod_farm_backup> -restoremethod new

    (6) Then I make sure I move PWA and admin into the new SSP and delete the old SSP2, but without deleting the SSP2 database (attempting to delete the database causes it to fail).

    (7) I do an IISRESET.

    (8) I then run RelinkAllWSSSites http://<dev_server> http://<dev_server>/PWA (note: including the port number seems to make it fail, so I leave it out). This links all the original PWA project sites into the new system.

    Now - it all works BUT the QUEUE seems to ignore the new system.

    Brian, please do you have any ideas on why the queue might do this? I tried delaying the start to let SQL catch up but no joy.

    Is there a better way to duplicate a farm and have a different server name on the destination server? In effect I am migrating prod to dev.

    Many thanks - a good solution will solve I think a whole lot of pain for a lot of people.

    Thanks in advance,

    SteveW

  • Duplicate response to that posted to other thread...

    Hi SteveW,

    Not sure where you got the steps from, but I haven't seen this process used before and can imagine that it leaves some disconnect in the relationship between the PWA site and the Project application. If you just install your dev box far enough to get Central Admin working, then do a full farm restore and change the name of the server, all should be well. See my postings on moving production to development: http://blogs.msdn.com/brismith/archive/2008/09/26/project-server-2007-moving-a-copy-of-production-to-test-part-2.aspx and http://blogs.msdn.com/brismith/archive/2008/09/20/project-server-2007-moving-a-copy-of-production-to-test-part-1.aspx.

    I don't know of a supported way to get you working the way you have migrated.

    Best regards,

    Brian

  • Hi Brian,

    I did as you suggested and it worked - but - the Issues and Risks for each project in its site i.e. /PWA/<project_name> seem to be broken, but Project Change Requests is fine.

    Do you have any suggestions, please? All the rest of the functionality, including being able to open, save and publish projects from a client PC using PP2007 against this new server, seems fine.

    I did try to use RelinkAllWSSSites.exe to fix the issue, but it keeps giving me attitude, saying:

    "ERROR: Unable to find server http://<server_restored_to> . Check the input parameters and try again."

    Now I know PWA and the individual project sites are there, so do you have any suggestions, please? Is there a need to alter the RelinkAllWssSites.exe.config file to have the new server name in it?

    I tried this and got the same error.

    Also, is it necessary to run the relinkallwsssites at all?

    Is there a tool to check and correct any broken linkages within the whole system?

    Solving this problem will enable us to push forward in using Project Server, and it's frustratingly close to being robust.

    Any thoughts welcome.......

    Cheers,

    SteveW.

  • Hi Steve,

    I haven't seen problems with full farm backup - and you shouldn't need to run the WSS relink tool in this scenario. You could try synchronizing permissions on the Project Workspaces page. Does your dev environment recognize the same user credentials (domain) as the original?

    Best regards,

    Brian.

  • Hi Brian,

    Yes the dev environment will recognise the same user credentials.

    I haven't tried synchronising permissions - could you suggest the best way to do it, please, or is it self-explanatory?

    The error I get is "page not found" when I try looking at the Issues and Risks by the way. In cruising some of the other news groups, losing Issues and Risks and documents seems to be a recurring theme, if that helps at all.

    Thanks

    Cheers

    SteveW

  • Updated 10/13/2008

    --------------------------

    Hi Brian,

    I tried to do the synchronisation using this method:

    PWA\Server Settings\Operational Policies\Project workspaces

    I then highlighted a random project by clicking on its row. While it was highlighted, I went up to the "synchronise" button at the top of the page and clicked on it.

    I left it running for 5 minutes, tried to look at an issue or risk, and got exactly the same error as before, "file not found".

    What would you suggest from here, please? On the site page for this project I can see all the issues and risks listed, but when I click on any issue or risk I get the "page not found" error.

    I checked the queue and it says "Failed but not blocking correlation" which may be a whole new issue........

    By the way, I was performing these tests as a Project Server administrator.

    Any suggestions most welcome..    :-)

    Cheers

    Steve.

  • Hi Brian,

    I sent you a comment a couple of days ago about the constant event log entries I'm getting, i.e. Errors 7758, 7761 and 7754 related to the project queue, and I still frustratingly have no clue about the problem. Do you have any idea how these errors come about?

    Any hints most appreciated.

    Thanks,

    KH

  • Hi KH,

    Sorry - the earlier comment did not appear to get to me.  I checked my server and also found some of these IDs, and without the description I cannot be certain they are caused by the same issue for you - but it appears my SQL Server went off line for a short time early yesterday morning - and these messages all relate to lost connections to the server from the queue process.  Are you seeing these during normal production hours?  Could you have had some temporary outage?

    Best regards,

    Brian.

  • Hi Brian,

    You have given me a good hint, though our problem is not exactly the same. Your comment led us to look at the SQL Server again, and we finally found the cause of the problem.

    Although SQL Server is installed on the same machine as Project Server, the DB files are located physically on a SAN disk. It is the SAN disk that is causing the problem, as it takes a long time over every transaction - we verified this by moving the DB files to the local drive, and the problem was resolved.

    Thank you. You have been very helpful.

    KH
