I've been meaning to write about real-time applications for ages, so when I was asked to give a talk at MS Gamefest (http://microsoftgamefest.com) I jumped at the opportunity to give myself a hard reason to do the homework. Last Tuesday I gave that talk, and below are the slide contents plus my speaker notes. The audio was recorded and will probably be available soonish; when that happens I'll post a link.
I hope you find this somewhat interesting :)
Managed Code for Real Time?
Good for What Ails You
Garbage-collected memory models tend not to degrade over the long term. Many of the most deadly problems simply can’t happen in this kind of model. When people talk about a “GC leak” what they mean is that they are holding on to a pointer that they should have nulled out. This is much easier to track down than memory that was “lost” in the classic leak model, where nothing points to the memory any more and it was never freed. Wild pointers are totally impossible.
All of this is great news for game developers.
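To make the “GC leak” idea concrete, here is a minimal sketch. The `Particle` and `ParticleRegistry` names are hypothetical, invented for illustration and not from the talk. The point is only that a long-lived collection keeps everything it references reachable until you remove the reference yourself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only.
sealed class Particle
{
    public float X, Y;
}

static class ParticleRegistry
{
    // A static list lives as long as the program does, so every object
    // it references stays reachable and cannot be collected.
    static readonly List<Particle> live = new List<Particle>();

    public static int Count { get { return live.Count; } }

    public static Particle Spawn()
    {
        Particle p = new Particle();
        live.Add(p);   // the registry now holds a reference to p
        return p;
    }

    // Forgetting to call this is the "GC leak": the game is done with the
    // particle, but the list still points at it.
    public static bool Retire(Particle p)
    {
        return live.Remove(p);
    }
}
```

Finding this kind of leak is a matter of asking "who still points at it?", which a heap profiler can usually answer directly.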
Great, I’ll take a dozen!
People learned best practices for classic memory allocators. In a word, they have awful performance, so you have to wrap them. And that’s exactly what everyone does. You get assorted custom allocators for different purposes: ones that allocate and free a big chunk, ones that carve out lots of little objects, or ones that meet some other important specialized requirement. The main thing is that you are careful to get to know your allocator(s) and use them accordingly.
Likewise, you have to get to know the GC. It’s usable directly, without wrapping, in a variety of cases, which is a great step forward. But you might shoot yourself in the foot if you do unwise things.
Things you need to know
(You can find the picture that I used in this article here, this discussion is taken directly from that article)
Collectors come in many flavors, and today we’ll be talking about a couple of different collectors that you might run into. The one in the .NET CF is quite a bit different from the one on the desktop. There are different rules for each, although if you follow the .NET CF rules you should get excellent performance out of the desktop collector as well.
“Compacting” refers to the fact that our collectors will squeeze out the free memory, kind of like a disk defrag, when they think it is wise to do so.
“Generational” refers to the fact that the desktop collector can collect just some of the objects rather than all of them; notably it can collect “just the new stuff.”
Simplified GC Model
In the simplified model (only):
Why do you care?
You need to know these things so that you get good locality of reference in your data structures. Better locality means fewer cache misses which means fewer clocks per instruction and therefore more frames per second.
You need to know how to prevent very expensive collection costs: why do these costs arise, and what can you do to limit them?
Some good things to keep in mind
Stay in Control
The classic thing you have to worry about in the desktop CLR is if you start leaking a lot of objects into generation 2 – the oldest generation. So basically object-lifetime patterns drive performance – the allocation rate and the death rate in each generation.
This is true because in the desktop world the GC heap could be very large (e.g. 1GB+) and full collects could therefore be very costly. So the trick in this world is to make sure you’re only doing partial collections.
There are many real-time applications in the desktop/server world that are successful, e.g. assorted financial companies have stock streaming services that do things with quotes and then facilitate order processing. These are all done on a deadline. How do they do it? Strict control of allocation volume and promotions to the elder generations.
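One way to keep an eye on this on the desktop CLR is to count how many collections of each generation happen while a piece of work runs. The sketch below uses `GC.CollectionCount` and `GC.MaxGeneration`; the `GcWatch` helper itself is a hypothetical name of mine, not something from the talk.

```csharp
using System;

// Hypothetical helper for illustration only.
static class GcWatch
{
    // Runs the work and returns how many collections of each generation
    // occurred while it ran (array index = generation number).
    public static int[] CountCollections(Action work)
    {
        int generations = GC.MaxGeneration + 1;

        int[] before = new int[generations];
        for (int g = 0; g < generations; g++)
            before[g] = GC.CollectionCount(g);

        work();

        int[] delta = new int[generations];
        for (int g = 0; g < generations; g++)
            delta[g] = GC.CollectionCount(g) - before[g];
        return delta;
    }
}
```

If the count for the oldest generation climbs during steady-state frames, objects are probably being promoted that should have died young.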
.NET CF Considerations
If you want to get the best performance, your total heap size in .NET CF should be roughly what your generation 0 size would have been in a generational collector.
If you keep accumulating old objects thereby letting your heap grow, the fixed cost of marking those objects during collections will start to hurt you.
If you have large numbers of objects that you need to pre-create and that are going to survive, you can minimize the cost of managing these objects by keeping them devoid of pointers. If no tracing needs to happen, huge swaths of memory (e.g. arrays) can be marked as in-use without even having to look at them.
So a good tactic is to keep your very long lived data low in pointers (e.g. use handles) and things you churn rich in pointers so that they are easy to manage. Controlling this blend is up to you.
Remember that when you start using handles you’re back to managing your own object lifetimes and free lists and so forth but that doesn’t matter at all if you’re talking about objects that basically stay around for a whole level. Use this wisely.
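As a rough sketch of the handle approach, assume a hypothetical `Enemy` struct stored in one pre-created array, with integer handles standing in for object references. All names here are invented for illustration. The array holds only value-type fields, so there is nothing inside it for the collector to trace, and the free list is threaded through the array using indices.

```csharp
using System;

// Pointer-free record: only value-type fields, so an array of these
// contains no references for the collector to follow.
struct Enemy
{
    public float X, Y;
    public int Health;
    public int NextFree;   // free-list link: an index, not a reference
}

// Hypothetical pool that hands out int handles instead of references.
sealed class EnemyPool
{
    readonly Enemy[] slots;
    int firstFree;         // head of the free list, -1 when exhausted

    public EnemyPool(int capacity)
    {
        slots = new Enemy[capacity];
        for (int i = 0; i < capacity; i++)
            slots[i].NextFree = i + 1;
        if (capacity > 0) slots[capacity - 1].NextFree = -1;
        firstFree = capacity > 0 ? 0 : -1;
    }

    // Returns a handle, or -1 if the pool is exhausted.
    public int Allocate()
    {
        int handle = firstFree;
        if (handle >= 0) firstFree = slots[handle].NextFree;
        return handle;
    }

    // Lifetime is manual again: release each handle exactly once.
    public void Release(int handle)
    {
        slots[handle].NextFree = firstFree;
        firstFree = handle;
    }

    public void SetHealth(int handle, int health) { slots[handle].Health = health; }
    public int GetHealth(int handle) { return slots[handle].Health; }
}
```

The trade-off is the one described above: a stale or double-released handle is now your bug to find, so this is best kept for data that lives for a whole level.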
The total heap should be about the size of the CPU cache or a small multiple of it so that you are going to get lots of hits when collecting and processing generally.
Also, .NET CF needs no write barriers as it’s not generational, so there’s one less cost right there.
Allocation Rate
Volume can kill you as well. Zeroing out all the memory can be expensive – and of course volume drives collections.
Collections are typically triggered when a certain number of bytes has been allocated – at that point the GC deems it wise to do a collection because there is likely to be enough junk that it’s worth bothering with.
The GC gets its efficiency by being lazy – collect too aggressively and all those savings go away. On the flip side, collect too rarely and memory usage skyrockets and locality is destroyed. The GC keeps these two competing things in balance with allocation budgets.
At the suggested volume of allocations and heap size you should be able to achieve garbage collection overheads in the low to mid single digits of percent. That’s a great result for general memory management.
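One common way to keep allocation volume down is to allocate scratch storage once and reuse it every frame, rather than creating a fresh array each time. A minimal sketch, assuming a hypothetical `FrameScratch` class (the name and the distance calculation are mine, purely for illustration):

```csharp
using System;

// Hypothetical per-frame scratch buffer, allocated once and reused.
sealed class FrameScratch
{
    readonly float[] distances;

    public FrameScratch(int maxEntities)
    {
        distances = new float[maxEntities];   // the only allocation
    }

    // Fills the reused buffer with each point's distance from the origin
    // and returns that same buffer; only the first 'count' entries are valid.
    public float[] ComputeDistances(float[] xs, float[] ys, int count)
    {
        for (int i = 0; i < count; i++)
            distances[i] = (float)Math.Sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
        return distances;
    }
}
```

Because the same array comes back on every call, steady-state frames add nothing to the allocation budget from this code path. The caller must not hold on to results across frames, since the next call overwrites them.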
What else?
Jitting isn’t going to be your death… it’s like a fixed overhead.
The idea is that you can more than win this back with simplified logic. Lack of destructors. No cleanup code to run (and who wants to write code to (e.g.) visit partly created trees and release them?)
Less code to write means more time to focus on algorithmic gains in your code – that’s where the real money is. But, full disclosure, you should know you are starting at a modest penalty.
Use the coding simplifications to your advantage, build great algorithms that would have been too hard or too expensive in man-hours to code otherwise and win.
Don’t try to write a Phong Shader in managed code for your game. That’s what the GPU is for.
Interop with Native
Just a few words here: again, you can shoot yourself in the foot.
Not much to say here other than, if you must, on the desktop, then keep it as simple as possible.
On 360 it isn’t even an option, therefore you can’t get it wrong :)