Scaling and Queuing PowerShell Background Jobs

A couple of months ago I had asked the PowerShell MVPs for suggestions on blog topics. Karl Prosser, one of our awesome MVPs, brought up the topic of scaling and queuing background jobs.

The scenario is familiar: You have a file containing a bunch of input that you want to process and you don’t want to overburden your computer by starting up hundreds of instances of PowerShell at once to process them.

After playing around for about an hour on Friday afternoon, here is what I came up with… This example assumes you have a text file containing the names of many event logs and you want to get the content of each log.
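
If you want input to try this with, a file like that is easy to generate. A minimal sketch (the path .\input.txt is simply what the script below reads; it is not required by anything else):

# A minimal sketch for producing sample input: write the name of every
# event log on this machine to .\input.txt, the file the script below reads.
Get-WinEvent -ListLog * -ErrorAction SilentlyContinue |
    Select-Object -ExpandProperty LogName |
    Set-Content .\input.txt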

# How many jobs we should run simultaneously
$maxConcurrentJobs = 3

# Read the input and queue it up
$jobInput = Get-Content .\input.txt
$queue = [System.Collections.Queue]::Synchronized( (New-Object System.Collections.Queue) )
foreach($item in $jobInput)
{
    $queue.Enqueue($item)
}

# Function that pops input off the queue and starts a job with it
function RunJobFromQueue
{
    if( $queue.Count -gt 0 )
    {
        $j = Start-Job -ScriptBlock {param($x); Get-WinEvent -LogName $x} -ArgumentList $queue.Dequeue()
        Register-ObjectEvent -InputObject $j -EventName StateChanged -Action {
            RunJobFromQueue
            Unregister-Event $eventsubscriber.SourceIdentifier
            Remove-Job $eventsubscriber.SourceIdentifier
        } | Out-Null
    }
}

# Start up to the max number of concurrent jobs
# Each job will take care of running the rest
for( $i = 0; $i -lt $maxConcurrentJobs; $i++ )
{
    RunJobFromQueue
}

The English version of this script is:

  • Given a file input.txt containing the names of many event logs, queue up each line of input.
  • Kick off a small number of jobs to process one line of input each. Each job just gets the content of a particular log.
  • When a job finishes (determined by the StateChanged event), start a new job with the next piece of input from the queue.
  • Clean up the jobs corresponding to the event subscriptions, so at the end we only have jobs containing event data (a sketch of collecting that data follows this list).
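
Once the queue has drained and the last job has finished, the leftover jobs hold the event data, and the standard job cmdlets can collect it. A minimal sketch, assuming everything above has completed and the only jobs left in the session are the ones started by RunJobFromQueue:

# A minimal sketch: wait for any jobs still running, gather their output,
# then remove them. Assumes only RunJobFromQueue's jobs remain in the session.
$results = Get-Job | Wait-Job | Receive-Job
Get-Job | Remove-Job
$results | Group-Object -Property LogName | Sort-Object Count -Descending   # quick summary by log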

The “Synchronized” code you see when defining the queue is just for good measure to make sure that only one job can access it at a time.

Have something you want to see on the PowerShell blog? Leave a comment… Can’t promise we’ll get to everything but it’s nice to see what everyone is interested in.

 

Travis Jones
Windows PowerShell PM
Microsoft Corporation

Comments
  • Hi Stephen,

    I took a quick look at the script you posted, and I like the idea of using hosted runspaces for asynchronous processing. The manage-jobs function I wrote is working great for me, but I will definitely dig a bit deeper into the script you posted when I get some free time. I have to admit, I wish you could have multiple event handlers running at the same time. I have written a couple of PowerShell scripts with WPF GUIs, and it becomes really evident when working on that type of script.

    Thanks!

    Scriptabit
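
For readers curious about the hosted-runspaces idea mentioned above, here is a minimal sketch of in-process parallelism with a runspace pool. The pool size of 3 and the script block are placeholders, not anything from the scripts discussed in this thread:

# A minimal sketch of parallel work on a runspace pool, as an in-process
# alternative to Start-Job's one-process-per-job model.
$pool = [System.Management.Automation.Runspaces.RunspaceFactory]::CreateRunspacePool(1, 3)   # at most 3 concurrent runspaces
$pool.Open()
$work = foreach($item in Get-Content .\input.txt)
{
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript('param($x) Get-WinEvent -LogName $x').AddArgument($item)
    @{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}
$results = foreach($w in $work)
{
    $w.PowerShell.EndInvoke($w.Handle)
    $w.PowerShell.Dispose()
}
$pool.Close()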

  • Hi Stephen,

    I like the idea of Split-Job very much. After playing with it for a while, I can see that it covers one class of tasks perfectly, call it "slow processing". There is, though, another class that is not yet covered well, call it "slow/huge input and [slow] processing".

    Example (hypothetical):

    >: Get-ChildItem c:\ -Recurse -ea 0 | Split-Job { % { ... } }

    Split-Job starts to work only when Get-ChildItem has completed. That takes a lot of time and memory, too. Ideally, this could be avoided if Split-Job had an option to split the input as well and invoke jobs with chunks of input data (a progress percentage would not be possible, of course, because the total number of input objects is unknown). What do you think? Can we hope that you will continue work on Split-Job in that direction?
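
For illustration of the chunking idea in the comment above, here is a rough sketch using the built-in job cmdlets. The chunk size of 100 and the trivial per-item script block are placeholders; this is not part of Split-Job, and it does not limit how many jobs run at once:

# A rough sketch of chunked processing: buffer pipeline input into batches
# and start one job per batch instead of waiting for all input to arrive.
$chunkSize = 100
$buffer = New-Object System.Collections.ArrayList
Get-ChildItem c:\ -Recurse -ErrorAction SilentlyContinue | ForEach-Object {
    [void]$buffer.Add($_.FullName)
    if($buffer.Count -ge $chunkSize)
    {
        Start-Job { param($paths) $paths | ForEach-Object { $_ } } -ArgumentList (,$buffer.ToArray()) | Out-Null
        $buffer.Clear()
    }
}
if($buffer.Count -gt 0)
{
    Start-Job { param($paths) $paths | ForEach-Object { $_ } } -ArgumentList (,$buffer.ToArray()) | Out-Null
}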

  • Yet another scenario not yet covered by Split-Job is "infinite input" (subscribing/listening to some events/messages/requests and processing them as soon as they arrive "infinitely"). In this scenario Split-Job never starts to work :)

  • Hi Roman,

    Good point, it does need to have the input finished before it starts processing it.  I've been toying with ideas on it and hope eventually to have a version that doesn't wait for the input to finish before processing.  I'll need to track the state of the pipeline and have that synchronized to the runspaces so they know when they are done.  I just haven't had the time to dig into it yet.  If I get something like that I'll post it to poshcode and it should address both your scenarios.

  • Finally, I created a tool that resolves most of these shortcomings. The cmdlet Split-Pipeline splits pipeline input and processes input parts by parallel pipelines. The algorithm is online: it works without having the entire input available. Input can be huge or even infinite.

    github.com/.../SplitPipeline
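
For reference, a hedged sketch of the usage pattern (based on the SplitPipeline project linked above; the module name and the -Count parameter are assumptions here, and exact syntax may differ between versions):

# A hedged sketch of the Split-Pipeline pattern: input is consumed as it
# streams in and processed by parallel pipelines. See the project page
# above for the authoritative syntax.
Import-Module SplitPipeline
Get-ChildItem c:\ -Recurse -ErrorAction SilentlyContinue |
    Split-Pipeline -Count 4 { process { $_.FullName } }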

  • Hi,

    Is this supposed to work with external programs or just PowerShell functions? I made one modification: instead of "Get-WinEvent -LogName $x", I have "notepad $x". My input.txt contains 5 lines:

    hello

    world

    foo

    bar

    blah

    When I start it, it does open three Notepad windows with hello.txt, world.txt, and foo.txt. However, when I close any of these, I would expect a new Notepad window with bar.txt to pop up. It doesn't... Nothing new pops up.

    Any explanation?

    Thanks,

    J
