Cascade Skyline - with Microsoft Logo and Project Support header - author Brian Smith

Patience please - async processes at work

One thing that I have touched on before is the asynchronous behavior of some of the processes in Project Server and Windows SharePoint Services 3.0.  This is still catching some people out.  For instance, if you provision a new PWA site and then decide you want to change something - so you delete it - it looks like it has gone right away.  However, the site collection that was created as part of the process (usually /PWA) takes a while to be removed.  This means you may see an error if you try to re-create the PWA site immediately.  Another point of confusion is the "Waiting to be processed" message in the queue service of PWA.  A job can sit there and look like it is doing nothing, when it is actually waiting for some other background task to complete.  Canceling and retrying may make no difference - it will go when it is ready.  Of course there are times when something has gone wrong and it will never process - such as the queue service not running.  Perhaps a password has changed - or a connectivity or permission issue is stopping the account running the queue service from being able to get to the jobs (the queue jobs are held in the Project Server database tables).  Normally a look in the ULS logs or event logs will give clues to both these types of failures and also sometimes the reasons why a job may just be sleeping.

 For some internal details of the queue service take a look at the TechNet article at http://technet2.microsoft.com/Office/en-us/library/0845d622-95ab-4c20-b419-0dbd5aab33a51033.mspx?mfr=true

The most common time for patience is if you have saved and closed a large project to the server and you are working on a low bandwidth or high latency connection.  This scenario was virtually impossible with 2003 but with 2007 it can be done.  However, the save and check-in of the project may still take a while from the point you click the button on the client - and you may well find that the project is still checked out if you try to open it up too soon.  And remember - force check-in will not speed anything up - this just puts another job on the queue!
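
Both the site deletion and the check-in cases above come down to the same thing: check whether the background work has finished, and wait a bit longer if it has not, rather than retrying straight away. Here is a minimal sketch of that idea in Python. It is not a Project Server API; `wait_until` and its parameters are names made up for illustration, and the condition you pass in is whatever check you have available.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 600.0,
               initial_delay: float = 5.0,
               max_delay: float = 60.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds of waiting elapse.

    The pause between checks doubles each time, capped at `max_delay`.
    Returns True if the condition was met, False if we gave up.
    """
    waited = 0.0          # total time spent sleeping so far
    delay = initial_delay
    while True:
        if condition():
            return True
        if waited >= timeout:
            return False
        # Never sleep past the overall timeout.
        pause = min(delay, timeout - waited)
        sleep(pause)
        waited += pause
        delay = min(delay * 2, max_delay)
```

You might call it as `wait_until(lambda: not site_collection_exists("/PWA"))` before re-creating a PWA site, where `site_collection_exists` is a hypothetical helper you would have to write yourself. The `sleep` parameter is only there so the waiting can be faked in a test.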

  • Hi Brian,

    I have a somewhat related error to MSPS and MOSS and the Project Server Queue.  Wondering if you happened upon something similar...

    In cleaning up an unused web app that was used to host a PWA site, I have gotten a ton of application error messages to the effect of

    Event Type: Error

    Event Source: Office SharePoint Server

    Event Category: Project Server Queue

    Event ID: 7626

    Description:

    Cannot start queue. SSP: 8cf16df8-2999-441d-8106-c3536492cdbf  SiteUID: 5f3b7e24-5f8b-4c2e-bde8-848d957c7a61 Url:  Queue: TimesheetQ

    The information in the logs on the server basically state the same information as what's in the event log.  I deleted the PWA site first, then the SharePoint web application sometime last week, however I am still getting the messages.  The SiteUID from the app log entry above is for the deleted PWA web app.

    Thanks, in advance,

    Alyssa

  • Hi Alyssa,

    It sounds like the PWA site was deleted through Central Admin before the PWA instance was removed through Manage PWA Sites in Shared Service Providers (or at least before this process had asynchronously completed). If it still shows in the SSP then try deleting it from there.

    Best regards,

    Brian

  • We have a problem with the queue of Project Server 2007.

    When the operating system is restarted, the queue service "Microsoft Office Project Server Queue Service" starts correctly, but it does not seem to work. The jobs in the queue stay in the state "waiting to be processed".

    If I manually restart the service, the queue starts to work correctly and processes the jobs.

    This behavior happens every time the system is restarted.

    Has anyone else met the same problem? Where can we find some information about it?

    Thanks

  • Hi msczr,

    This looks like a timing issue.  When the queue is working you will notice that there are two (or more if you have multiple SSPs) instances each of the eventing and queueing services.  The first appears to start for you OK, but it is the subsequent ones that do the work - which for you are not starting.  I would expect you to see errors in the ULS logs indicating why these are not starting correctly.  It may be that SQL Server was not available (a typical error if everything is on the same server and SQL doesn't start quickly) so setting the service to have a pre-requisite of SQL may allow it to start correctly first time.

    Best regards,

    Brian.

  • We have an issue with project server. Every night at 1AM our application server (running project services) logs a DB connection error (event id 5586). It doesn't log this error at any other time. Other web front ends don't have DB connection issues. There were cube build related errors (we have around 50 PWA sites). I stopped the cube build job, but the error still gets logged daily at 1AM. The other error thrown just before it is "Failed to execute timer job 'ApplyResourceCapacityTimeRangeJob'. Error: This operation returned because the timeout period expired. (Exception from HRESULT: 0x800705B4)"

    These problems started appearing after we tried some stsadm restore operations for a few SharePoint sites. Sometimes some of the project queue jobs do get stuck and fail. Our setup is the RTM version of MOSS & Project Server 2007 with SQL 2005.

  • Brian, excellent response!

    This is exactly what happens when all the services are on the same machine.

    I have resolved this problem by making the project queue and event services depend on the SQL Server Agent service (which starts only when all the databases are up and running):

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ProjectQueueService]
    "DependOnService"=hex(7):53,00,51,00,4c,00,53,00,45,00,52,00,56,00,45,00,52,00,\
      41,00,47,00,45,00,4e,00,54,00,00,00,00,00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ProjectEventService]
    "DependOnService"=hex(7):53,00,51,00,4c,00,53,00,45,00,52,00,56,00,45,00,52,00,\
      41,00,47,00,45,00,4e,00,54,00,00,00,00,00

    Thank you again.

    Luigi
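
The `hex(7)` values in the registry export above are not very readable. A `hex(7)` entry is a REG_MULTI_SZ value: a list of NUL-terminated UTF-16LE strings with an extra NUL at the end. The following sketch decodes the bytes from the comment so you can see which service is being named as the dependency. `decode_multi_sz` is a helper written for this illustration, not part of any Windows tooling.

```python
from typing import List


def decode_multi_sz(hex_text: str) -> List[str]:
    """Decode the comma-separated bytes of a .reg hex(7) value into strings."""
    # Drop line-continuation backslashes and whitespace, then parse each byte.
    cleaned = hex_text.replace("\\", "").replace("\n", "").replace(" ", "")
    raw = bytes(int(b, 16) for b in cleaned.split(",") if b)
    # REG_MULTI_SZ is UTF-16LE text: NUL-terminated strings plus a final NUL.
    text = raw.decode("utf-16-le")
    return [s for s in text.split("\x00") if s]


# The value copied from the comment above (same for both services).
value = """53,00,51,00,4c,00,53,00,45,00,52,00,56,00,45,00,52,00,\\
 41,00,47,00,45,00,4e,00,54,00,00,00,00,00"""

print(decode_multi_sz(value))  # ['SQLSERVERAGENT']
```

So both services are set to depend on `SQLSERVERAGENT`, which is the service name SQL Server Agent uses for a default instance. A named instance would use a different service name, so check yours before importing anything like this.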

  • Hi Dinesh,

    You should be able to see either from the description of the job, or by identifying the Guid, which ApplyResourceCapacityTimeRangeJob is failing.  Possibly it is something left over from a deleted site.  Easiest option may be to disable it from Central Administration, Operations, Timer Job Definitions.

    Best regards,

    Brian.

  • Thanks for your response Brian.

    I figured out that there is an authentication issue with the domain server. The domain server doesn't respond for a minute around 1AM, hence the DB connection errors are logged and the jobs fail. Is it possible to shift those jobs ('ApplyResourceCapacityTimeRangeJob') to some time other than 1AM so that they can succeed? (There are around 50 jobs, one for each PWA.)

  • Hi Dinesh,

    I am not aware of a way of doing this (short of hacking the SharePoint config db and changing the job - which of course I cannot condone) but I will ask around.

    Best regards,

    Brian.

  • Hi Brian,

    If you could answer this you would make a whole bunch of very frustrated people happy.....  ;-)

    (1) I take a full farm backup using STSADM of my prod box which has PS2007 running on WSS 3.0.

    (2) Then I attempt to create a copy of the farm on a different server ( the DEV server ). I create an installation of PS2007 including up to PWA.

    (3) I delete all databases except the sharepoint management dbs.

    (4) I create a new SSP called SSP2, and move the management and PWA to it. It has a new database name. I delete the original SSP so that I can restore onto the box.

    (5) I then restore the prod farm from backup using:

    STSADM -o restore <\\DEV_SERVER\Share> <Prod_GUID_from_the prod_farm_backup> -restoremethod new

    (6) Then I make sure I move the PWA and admin into the new SSP and delete the old SSP2, but without deleting the SSP2 database ( attempting to delete the db causes it to fail ).

    (7) I do an IISRESET.

    (8) I then run RelinkAllWSSSites http://<dev_server> http://<dev_server>/PWA ( note that using the port ID seems to make it fail, so I ignore the port ID ). This links all the original PWA project sites into the new system.

    Now - it all works BUT the QUEUE seems to ignore the new system.

    Brian, please do you have any ideas on why the queue might do this? I tried delaying the start to let SQL catch up but no joy.

    Is there a better way to duplicate a farm and have a different server name on the destination server? In effect I am migrating prod to dev.

    Many thanks - a good solution will solve I think a whole lot of pain for a lot of people.

    Thanks in advance,

    SteveW

  • Hi SteveW,

    Not sure where you got the steps from but I haven't seen this process used before and can imagine that it leaves some disconnection in the relationship between the PWA site and the project application.  If you just install your dev box to get Central Admin working then do a full farm restore - change the name of the server - all should be well.  See my postings on moving production to development http://blogs.msdn.com/brismith/archive/2008/09/26/project-server-2007-moving-a-copy-of-production-to-test-part-2.aspx and http://blogs.msdn.com/brismith/archive/2008/09/20/project-server-2007-moving-a-copy-of-production-to-test-part-1.aspx.

    I don't know of a supported way to get you working the way you have migrated.

    Best regards,

    Brian

  • Greetings Brian,

    I'm developing an application which uses the PSI to programmatically create and update projects.  Sometimes they include hundreds of tasks and thousands of associated custom fields, assignments and dependencies.  

    Smaller updates seem to work, but larger ones appear to be getting stuck in the queue.  I have three jobs which have been pegged at 50% done for several days, one since last week.  All say they are in the processing state of an update from the PSI.

    The 3 projects are visible from the ProjTool, but not the PWA Project Center.  They occupy positions 1,2 & 3 in the queue.  From the  sequence of jobs started by my app, I think it's the final one, an update which includes a flag telling the PSI to publish the project that's getting stuck.  I cannot cancel any of the jobs.  Doing so has no apparent effect in the Queue Manager job listing.  Of course stopping and restarting the queue service or rebooting the server have no effect on this.

    At the time that the 1st job entered the queue, I find this error in the Windows app event log:

    Faulting application microsoft.office.project.server.queuing.exe, version 12.0.6211.1000, stamp 46ce70f1, faulting module kernel32.dll, version 5.2.3790.4062, stamp 46264680, debug? 0, fault address 0x0000bee7.

    Following this, I get repeated warnings such as this:

    Standard Information:PSI Entry Point:

    Project User: NT AUTHORITY\NETWORK SERVICE

    Correlation Id: 2e832c71-775b-4e71-a9b5-02c618ebdc00

    PWA Site URL: http://prjsvr01/pwa2

    SSP Name: SharedServices1

    PSError: Success (0)

    Jobs locked by machine 'PRJSVR01' were unlocked and made available to other running queue services. The queue type (project queue, timesheet queue etc) which was non-responsive is: 'ProjectQ'. Queue instance UID of the non-responsive queue is: 'f5211a00-08be-4ea4-92f8-be45683f9e7b'

    Have I somehow mangled the queue beyond repair?  I'm thinking it's time to re-install Project Server since it's only a dev host at this point and doesn't contain any actual user data.  Still that would be a hassle if avoidable since it would mean recreating a number of enterprise custom field defs.

    Any ideas?

    Thanks in advance, Albert

  • Hi Albert,

    I assume you are breaking the dataset up into chunks of less than 1000 total rows in the dataset (see the SDK).  It would be helpful to have the event ID, or information from the ULS logs relating to these errors with the specific ID of the error, as then I can search for other occurrences.  The messages themselves are fairly generic.

    Best regards,

    Brian.
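
As a rough illustration of the chunking mentioned above, here is a language-neutral sketch in Python that splits a flat list of rows into batches of fewer than 1000. It does not call the PSI (which is a .NET web service API); `chunk_rows` is a hypothetical helper, and a real PSI dataset spreads its rows across several tables, so you would need to count the total across all of them.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_rows(rows: Sequence[T], max_rows: int = 999) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `max_rows` rows each."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    for start in range(0, len(rows), max_rows):
        yield list(rows[start:start + max_rows])


# 2500 placeholder rows end up in three batches, each under 1000 rows.
print([len(batch) for batch in chunk_rows(list(range(2500)))])  # [999, 999, 502]
```

Each batch would then go to the server as its own update, and since each update is queued, it is worth waiting for one job to finish before sending the next.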

  • Hello Brian,

    Thanks for your quick reply.  First the good news.  

    After doing some research yesterday, I discovered that we hadn't applied the Infrastructure Update from 2007.  After doing so, the queue managed to clean itself up, with the stuck jobs going from 50% done to simply failed.  The same PSI project update that I ran last week which initiated the problems, today ran successfully!  

    Yes, I have to break up my datasets into rows of less than 1000.  That was a fun little gotcha to come across.

    While IT applied the update, I looked into the ULS, as you suggested.  What a bear.  I have no idea how to find items in there that might be relevant to the errors I see either in the event log or in the Queue Manager.  I did notice that there are some utils which claim to make reviewing the trace logs easier.  Do you recommend any in particular?

    Also, despite my update via the PSI now running to completion without apparent error, as far as the app's concerned, there is still a failed job in the queue that I don't understand:

    Error summary/areas:

    The web that you are trying to create already exists.

    WSSWebAlreadyExists

    Queue

    GeneralQueueJobFailed

    From the trace log, I see this perhaps related record (how would I know?):

    "No SP web associated with project"

    Any clues what it's talking about, or if they're related?

    Your feedback's much appreciated!

    Cheers, Albert

  • Hi Albert,

    My recent screencast shows how to use Excel to make ULS logs more manageable - and also has some links to other tools http://blogs.msdn.com/brismith/archive/2009/01/04/project-server-2007-reading-sharepoint-uls-logs-with-excel.aspx. I think the error mentioned above can be ignored as it looks like the code when publishing tries to create a workspace even when one already exists.  There is a lot of work going on to ensure categories and errors are valid in the logs - but much of this will not help with the current release - but I hope the tools mentioned can help you find your way through the logs.

    Best regards,

    Brian.
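
If you would rather script the search than use Excel, the same idea can be sketched in a few lines of Python. This assumes the log is tab-delimited text with a header row; the column names and the sample rows below are made up for the illustration and are not real log output, apart from the message text quoted in the comment above. `filter_uls` is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def filter_uls(lines: Iterable[str], needle: str) -> Iterator[Dict[str, str]]:
    """Yield rows of tab-delimited log text where any field contains `needle`."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # First line is treated as the header; names may be padded with spaces.
    header = [name.strip() for name in next(reader, [])]
    needle = needle.lower()
    for fields in reader:
        fields = [f.strip() for f in fields]
        # Case-insensitive substring match against every column.
        if any(needle in f.lower() for f in fields):
            yield dict(zip(header, fields))


# Invented sample rows, for illustration only.
sample = io.StringIO(
    "Timestamp\tArea\tCategory\tMessage\n"
    "row-1\tarea-a\tcat-a\tNo SP web associated with project\n"
    "row-2\tarea-b\tcat-b\tsomething unrelated\n"
)
for row in filter_uls(sample, "sp web"):
    print(row["Timestamp"], row["Message"])
```

For a real file you would pass an open file object instead of the `StringIO`, and you should check the file's encoding and actual column names first, as both are assumptions here. Searching on a correlation ID or a project GUID is usually more productive than searching on message text.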
