Once in a while on our forum, somebody asks why CPU usage isn't at 100% during offline encoding. I recently took the time to write a thorough answer, which I think is worth posting here as well.
So first, let's explore what happens when encoding a media file with Expression Encoder. The encoding process involves 5 phases:

1. Reading the source file from storage
2. Decoding the source into uncompressed frames
3. Pre-processing the frames (cropping, resizing, deinterlacing)
4. Encoding the frames with the target codec
5. Writing the output file to storage
If any one of those phases is slower than the others, the whole encoding pipeline suffers, because that phase bottlenecks the flow of frames through it.
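This "slowest stage wins" behavior can be illustrated with a toy model: the sustained frame rate of a serial pipeline is capped by its slowest phase. A minimal sketch (the per-phase rates below are made-up numbers, purely for illustration):

```python
# Toy model: each pipeline phase can process at most `fps` frames per second.
# In a serial pipeline, sustained throughput is the minimum of the stage rates.

def pipeline_fps(stage_fps):
    """Return the sustained frame rate of a serial pipeline."""
    return min(stage_fps.values())

# Hypothetical per-phase rates for an HD encode (illustrative numbers only)
stages = {
    "1: read source": 300.0,
    "2: decode":       90.0,   # e.g., a slow single-threaded codec
    "3: pre-process": 200.0,
    "4: encode":      120.0,
    "5: write output": 400.0,
}

print(pipeline_fps(stages))          # the decode phase caps throughput
print(min(stages, key=stages.get))   # which phase to investigate first
```

In this example, speeding up the encoder itself would change nothing; the decode phase has to be fixed first.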
#1 and #5 are usually the bottleneck when source files are not local (e.g., on a slow network share) or are very high-bitrate and sitting on slow storage. Possible solutions: copy the source files to your fastest available local storage, and write the output to a different local disk; both reduce the bottleneck in those two areas.
#2 usually becomes a bottleneck because of the type of sources and the codecs used to decode them. Obviously, complex HD sources use significantly more resources than simple lower-resolution sources. Some 3rd-party codecs are extremely slow, running on only one core. Trying the other codecs available on the PC, by disabling some of the codecs listed in the "Tools -> Options -> Compatibility" dialog, may help reduce the bottleneck. It's also worth noting that because many codecs are single-threaded, faster cores help enormously here. This is the main reason why a 3.4 GHz 4-core PC is faster than a 2.6 GHz 8-core PC in many cases.
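A back-of-the-envelope illustration of why clock speed beats core count for a single-threaded decoder (the cycles-per-frame figure is an assumption chosen only to make the arithmetic concrete):

```python
# A single-threaded decoder can only ever use one core, so its throughput
# scales with per-core clock speed, not with core count.

def single_thread_fps(clock_ghz, gcycles_per_frame=0.05):
    """Frames/sec for a decoder pinned to one core.
    gcycles_per_frame: billions of CPU cycles per frame (assumed value)."""
    return clock_ghz / gcycles_per_frame

quad_34 = single_thread_fps(3.4)   # 3.4 GHz quad-core
octo_26 = single_thread_fps(2.6)   # 2.6 GHz 8-core
print(quad_34, octo_26)            # roughly 68 vs 52 frames/sec
assert quad_34 > octo_26           # extra idle cores don't help at all
```

The 8-core machine's extra cores sit idle during decode, so the higher-clocked quad-core wins this phase.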
#3 can become the bottleneck when unneeded cropping/rescaling is applied, or when "SuperSampling" resizing and/or "Auto Pixel Adaptive" deinterlacing are used. The last two are our defaults; they were the best choice for "high quality" encodes when we shipped v4 RTM over 18 months ago, and unfortunately we can't change the defaults until our next major version, since that would be a breaking change in the SDK. For a better balance between performance and quality, we highly recommend "Bicubic" resizing and "Auto selective blend" deinterlacing. Both perform 2-3x faster than the defaults and reduce the chances of the pre-processing phase being the bottleneck. Of course, if at all possible, removing the need for a resize (i.e., using the same frame size and pixel aspect ratio as the source) will also reduce the load on this phase while preserving better output quality.
This leaves phase #4, where most of the CPU cycles should be spent if there are no bottlenecks elsewhere in the pipeline. Ideally, this phase should be your bottleneck, which would very likely max out CPU usage on an 8-core PC for a single-stream encode. Depending on the encode settings, 100% may still not be achieved, but usage should be pretty high. Here are a few ideas to make this phase faster:
If nothing seems to help maximize CPU usage, you could also consider running 2-4 jobs in parallel in separate encoding processes. Using the UI, simply start 2-4 Encoder instances and run the encodes in parallel, bearing in mind that enough system memory must be available. This won't help with a storage bottleneck (phase #1 and/or #5), but it should greatly speed up encoding multiple jobs.
Finally, if encoding to H.264, GPU encoding can drastically speed up the encoding phase. CUDA is supported in Encoder 4 Pro SP1, and Intel QSV (Sandy Bridge) support is coming in our next SP release. With the proper hardware, either (or even both) options can cut encoding time by more than half, and in some cases significantly more.
Hopefully this gives you some insight into how to isolate and resolve encoding performance issues.