
ShaderX2

Many years ago, in my former life as a UK game developer, I wrote a couple of chapters for a book called ShaderX2. That book is now available for download as a PDF. How cool is that? It would be great if more publishers released their old books for free.

I had a lot of fun writing my articles, and learned two important things:

  • Writing is hard and time consuming
  • Technical books are not a good source of income :-)

I love fixing bugs

There is something deeply satisfying about prolonged periods of bug fixing.

Of course writing original code can be satisfying too, but at least for me, fixing bugs tickles an itch that normal programming simply cannot reach.

I think partly this is because working on bugs makes me feel more productive. When I am working on a new feature it may be weeks or even months before I am ready to check in, but I can sometimes fix dozens of bugs in a single day. Think, implement, test, check in, mark as resolved. Such visible signs of progress make me feel good :-)

It is interesting how most bugs take me along a similar path:

  • What is this crazy issue someone filed on Connect?
  • Huh? That can't possibly be the case!
  • Stare at the code for a while
  • Nope, no bug here: code looks fine to me
  • Oohhhh...
  • Yeah, there's a problem sure 'nuff
  • Dang, I don't see how I can fix this without breaking backward compatibility
  • Think hard
  • I wonder if...
  • Type furiously for a minute or two
  • Yeah!
  • Check in
  • Satisfaction

GameFest 2008

Details of the XNA Game Studio track for the 2008 GameFest conference are up. I'm doing two talks, about the Content Pipeline and networking.

Our officially scheduled topic (XML and the Content Pipeline) will resume just as soon as I finish preparing my GameFest slides and implementing my 3.0 framework features. Things are a little hectic right now :-)

/me is famous

Nazeeh continues his series of XNA team member interviews by talking not only to yours truly, but also to my cat Rhys.

Indexing my old blog posts

While replying to questions in the forums, I often find myself wanting to link to one of my old blog posts. The only problem is finding the post in question! Google works if I can remember the title, but otherwise I'm stuck looking through my blog history for something I can vaguely remember writing about last year, or was it the year before?

So I made this index.

Thanks to the power of automated scripts, I should be able to keep this up to date as I add new posts, too.

XML and the Content Pipeline

I decided to write a few posts about the role played by XML in the XNA Framework Content Pipeline, because this isn't well documented and people seem to find it confusing.

The first thing to get straight is the distinction between things that are fundamental parts of the pipeline architecture and things that are just specific implementations for one particular type of data. Let's start with a recap of the basic pipeline architecture:

  1. You have a file containing game data, which can be in any format you like
  2. The ContentImporter reads this file from disk, returning a managed object
    1. It might return one of our standard Microsoft.Xna.Framework.Content.Pipeline.Graphics types, but could also load any custom type of your own
  3. The ContentProcessor converts the managed object into a different format
    1. Sometimes it returns the same type, but massages the contents of the data (for instance adding mipmaps to a texture)
    2. Other times it may return an entirely different type (for instance converting a FontDescription into a SpriteFontContent)
    3. The processor may also be a no-op
  4. The ContentTypeWriter writes the processor output object into a binary .xnb file
  5. The .xnb file is deployed to Xbox
  6. Your game calls ContentManager.Load
  7. The ContentTypeReader loads the .xnb data into memory

Note that there is no mention of XML in any of these steps. So at a fundamental level, XML is not part of the underlying Content Pipeline architecture.
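To make the shape of these stages concrete, here is a toy model in plain C#. It is only a sketch: ToyPipeline and its delegate parameters are names invented for illustration, standing in for the real ContentImporter, ContentProcessor, ContentTypeWriter, and ContentTypeReader classes, and a byte array stands in for the .xnb file.

```csharp
using System;
using System.IO;

// Toy stand-in for the pipeline stages (hypothetical, not the XNA API).
public static class ToyPipeline
{
    // Build time: import the source file, process the result, write binary output.
    public static byte[] Build<TImported, TProcessed>(
        string sourceFile,
        Func<string, TImported> import,
        Func<TImported, TProcessed> process,
        Action<BinaryWriter, TProcessed> write)
    {
        TImported imported = import(sourceFile);      // stage 2: importer
        TProcessed processed = process(imported);     // stage 3: processor

        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            write(writer, processed);                 // stage 4: type writer
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Run time: read the binary data back into a managed object (stage 7).
    public static T Load<T>(byte[] data, Func<BinaryReader, T> read)
    {
        using (BinaryReader reader = new BinaryReader(new MemoryStream(data)))
        {
            return read(reader);
        }
    }
}
```

Nothing in this model cares what format the source file is in. Only the import delegate ever looks at the file, so that is the only place XML could appear.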

XML enters the picture in two places:

In stage 1 of the steps described above, your file might well happen to be in XML format. If that is the case, you would want the importer in stage 2 to read XML data. There are many ways this can be achieved:

  • You could use our built in XmlImporter, which is a trivial wrapper around the IntermediateSerializer class

  • Or you could write a custom importer using any of the following:
    • The standard .NET XmlSerializer
    • Or XmlDocument
    • Or XPath
    • Or XmlReader
    • Or the serializer formerly known as Indigo
    • Or the WPF XAML serializer
    • Or any of the various third party XML solutions
    • Notice a trend here? .NET offers a lot of different ways to read XML data!
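As a rough illustration of the custom importer route, here is what the XML-reading part might look like with the standard .NET XmlSerializer. The LevelData type and LevelXml helper are hypothetical names made up for this sketch. In a real importer this logic would sit inside the Import method of your importer class and read from the file on disk rather than from a string.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical game data type, invented for this sketch.
public class LevelData
{
    public string Name;
    public int EnemyCount;
}

public static class LevelXml
{
    // Parses XML text into a LevelData object using the standard XmlSerializer.
    public static LevelData Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(LevelData));

        using (StringReader reader = new StringReader(xml))
        {
            return (LevelData)serializer.Deserialize(reader);
        }
    }
}
```

With the default XmlSerializer conventions, this expects a root element named after the type, with one child element per public field, for instance `<LevelData><Name>Swamp</Name><EnemyCount>12</EnemyCount></LevelData>`.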

Why, given all these options, did we bother to create our own IntermediateSerializer? Weren't there enough different serializers already?

Because of the second place where the pipeline uses XML.

After the importer runs, but before the processor, we have an optional stage 2.5, where we write the data that was just loaded to an XML file in the obj directory. We do this for two reasons:

  • When you are debugging a problem with your data, it can be useful to examine it in a human readable XML format. This makes it easy to see exactly what has been read by the importer, and what is going into the processor.

  • For performance. Because we have cached the data in this XML file, if a later part of the build requests that same data again, we can just deserialize it rather than having to re-import the original file from scratch. This was originally designed to speed up the case where you change your processor code, requiring the processor to run again, but the importer and original file have not changed. We never had time to implement that level of smarts in the pipeline (I still hope we'll get around to it someday!) but this cache file is part of our planning to eventually make that possible.

Caching is optional: importers can turn it off by setting an attribute. We disable it for our texture importer (textures are big, fast to import, and not very interesting to debug, so caching them would be a waste of time) but we do cache the outputs from our X and FBX importers.

We originally designed our IntermediateSerializer for the purpose of managing these cache files (I will explain why XmlSerializer was unsuitable in my next post). Once we had a serializer of our very own, we decided it would be useful to expose this as a public API so people could use it for other things as well. For instance:

  • When you are debugging a complex processor, it can be useful to manually call IntermediateSerializer.Serialize at interesting points, dumping out copies of your data for later analysis.

  • Since this serializer can efficiently transfer model data between a human readable XML format and the pipeline object model, perhaps this might be useful for people writing tools such as level editors? For instance they could use a technique like this sample to import models from X and FBX formats, then do all their editing directly on the NodeContent data, using the IntermediateSerializer to load and save it.

  • Once we had the IntermediateSerializer, we found ourselves wanting to use it to import XML files into the pipeline. For instance it was trivial for us to load .spritefont files by calling into this existing serializer code. We decided it would be useful if we wrapped it up to create the generic XmlImporter, so people could easily use it to load their own XML data.

The important thing to take away from all this is that there is nothing special or magic about our XmlImporter. This happens to be the default importer which we select when you add an XML file to your content project, but if you don't like how the IntermediateSerializer works, or want to load XML data in some other way, you can write your own importer using any of the other XML choices provided by .NET.

Also, you should note that using the XmlImporter only affects the importer stage of the pipeline. Once the data has been imported, it is just a regular managed object like any other. XML is not involved in the processor, ContentTypeWriter, .xnb, or ContentTypeReader stages.

My next couple of posts will talk about why we decided to create this new serializer, and go into more detail about how it works.

Lock contention during load screen animations

If your game has a lot of content, your LoadContent call might take a while.

If loading takes a long time, you might want to display a "please wait" message.

If you value polish, you might even decide this message should be animated in some kind of awesomely cool way.

It is pretty easy to do that by firing up a background thread at the start of LoadContent, and having it do something like:

    while (!finishedLoading)
    {
        DrawFunkyLoadingAnimation();
        GraphicsDevice.Present();
    }

Here be dragons! The above code will work, but is liable to make your loading hundreds of times slower.

The reason is that every time you touch the graphics device from a different thread, the framework takes out a critical section, in order to avoid heisenbadness. If the animation is constantly redrawing, the animation thread will pretty much always own this lock, so any time your load function wants to create a graphics resource such as a texture, vertex buffer, or shader, it must wait until the animation thread releases it. This might take a while if the animation thread is just constantly looping over the same piece of draw code!

The solution is to slow down your animation thread by inserting a sleep call, which stops it hogging the graphics device.

The animated LoadingScreen class in our Network State Management sample shows one way to implement this.
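For a rough idea of the shape of the fix, here is a minimal sketch that does not depend on XNA. LoadingAnimator is a hypothetical helper invented for this example (it is not the class from the sample), and the drawFrame callback stands in for whatever drawing and Present calls your animation makes.

```csharp
using System;
using System.Threading;

// Runs a draw callback on a background thread, sleeping between frames
// so the loading thread gets plenty of chances to use the graphics device.
public class LoadingAnimator
{
    readonly Action drawFrame;
    readonly int sleepMilliseconds;
    readonly Thread thread;
    volatile bool finishedLoading;

    public LoadingAnimator(Action drawFrame, int sleepMilliseconds)
    {
        this.drawFrame = drawFrame;
        this.sleepMilliseconds = sleepMilliseconds;

        thread = new Thread(Run);
        thread.IsBackground = true;
    }

    // Call at the start of loading.
    public void Start()
    {
        thread.Start();
    }

    // Call when loading has finished: signals the loop to exit and waits for it.
    public void Stop()
    {
        finishedLoading = true;
        thread.Join();
    }

    void Run()
    {
        while (!finishedLoading)
        {
            drawFrame();

            // Without this sleep, the animation would own the device lock almost constantly.
            Thread.Sleep(sleepMilliseconds);
        }
    }
}
```

A sleep of a few tens of milliseconds is probably a sensible starting point: long enough for the loader to get plenty of work done between frames, short enough that the animation still looks smooth.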

A tale of many haggis

Once upon a time there lived a bored young aristocrat named Stanley. Growing tired of his indolent lifestyle, Stanley decided to go into the manufacturing business, so he purchased a haggis factory, which was going cheap as its previous owner had died in a tragic golfing accident.

On his first day as the new owner, Stanley arrived in front of the factory gates bright and early, eager to find out what his new investment was capable of. Here is what he saw:

  • The foreman arrived at 7:15, and unlocked the building
  • The workers arrived at 7:30 on the dot
  • The boiler was fired up at 7:50, while the rest of the staff were cleaning the machinery
  • The first sheep was delivered at 8:23
  • The boiler reached operating temperature at 8:48, and the first haggis was added to the pot
  • This haggis finished cooking at 11:55, and was moved to the cooling rack
  • It was packaged for distribution at 1:20 in the afternoon

"Yikes!", thought Stanley. "It took six hours to prepare a single haggis. Assuming I can sell this for $12, that gives an income of $2 per hour; nowhere near enough to cover payroll for my 20 staff. I fear this investment may have been a mistake."

Stanley has made the same error as many beginning graphics programmers, who render a single model (or sometimes just the default CornflowerBlue template) and then post on Internet forums complaining about the resulting framerate.

It is obviously ridiculous to judge the throughput of a factory by examining just one haggis. Sure, it takes a while to clean the equipment and heat up the cooker, but you only have to do that once in the morning. If you were to make 100 haggis, these could all cook at the same time in the same pot, so would take no longer than a single one. If you wanted 200, they might not all fit in the pot at the same time, but you could reuse the existing hot water, and the second batch of 100 haggis could be cooking at the same time as the first batch was cooling.

Graphics cards work the same way. When you see something like this:

  • CornflowerBlue runs at 800 frames per second
  • Adding a 100 triangle model gives 500 frames per second

it is easy to worry that your framerate will decrease by 300 each time you add 100 triangles. If this were true, drawing more graphics would result in:

  • 200 triangles = 200 fps
  • 300 triangles = -100 fps

Huh? A negative framerate is obviously impossible. This proves there must be something wrong with my logic.

My first mistake was to assume that framerate is a linear scale, when in fact the framerate is equal to one divided by the amount of time spent drawing each frame. To convert into linear millisecond units, we must divide 1000 by the framerate:

  • CornflowerBlue = 800 fps = 1.25 ms
  • 100 triangles = 500 fps = 2 ms

Looking at the difference between these frame times, it took 0.75 milliseconds to draw 100 triangles. Time is a linear scale, so we can predict how performance will change as we add more triangles:

  • 200 triangles = 2.75 ms = 364 fps
  • 300 triangles = 3.5 ms = 286 fps
  • 400 triangles = 4.25 ms = 235 fps
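The arithmetic behind these numbers can be written out as a small helper. This is just a sketch of the linear extrapolation described above, and FrameTimeEstimate and its method names are invented for the example.

```csharp
public static class FrameTimeEstimate
{
    // Converts frames per second into milliseconds per frame.
    public static double ToMilliseconds(double framesPerSecond)
    {
        return 1000.0 / framesPerSecond;
    }

    // Extrapolates linearly in the time domain from two measurements:
    // an empty scene, and a scene containing measuredTriangles triangles.
    public static double PredictFps(double emptyFps, double measuredFps,
                                    int measuredTriangles, int triangles)
    {
        double emptyMs = ToMilliseconds(emptyFps);
        double msPerTriangle = (ToMilliseconds(measuredFps) - emptyMs) / measuredTriangles;

        return 1000.0 / (emptyMs + msPerTriangle * triangles);
    }
}
```

For example, PredictFps(800, 500, 100, 200) comes out at roughly 364, matching the first line of the list.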

But this estimate is still too pessimistic, because graphics drawing time is not linear with regard to how many things are being drawn. In some cases, adding more triangles might be free, if the hardware is able to boil them up in the same pot it is already using to cook your previous graphics. In most cases, adding more triangles will slow you down, but by less than you would expect from measuring just a few in isolation.

It is appealing to think we might be able to predict the performance of a full game by measuring something smaller and simpler, but this is not possible, because we have no way to know how much of that small measurement represents real work, versus how much is just warming up the boiler and cleaning our equipment ready to start cooking.

In fact, measuring the framerate of a game that does only a small amount of work tells you pretty much nothing. If you want to know how long it will take to make a large number of haggis, the only accurate way to find out is to crank up the production line and actually make that many haggis!

Stephen scares me

Not content to spend all day working on the Visual Studio side of XNA Game Studio, my colleague Stephen then chooses to spend his spare time writing more Visual Studio extensions - for fun!

If only I had a more powerful blog post editor, I would choose a much bigger font to say that this is REALLY COOL.

Automatic .xnb serialization

John Doe has released a library that uses reflection to automatically serialize data to and from .xnb files, removing the need to manually implement a ContentTypeWriter / ContentTypeReader pair.

Two things I want to say about this:

  • It is really cool! The original Content Pipeline design actually included something similar to this, but we had to cut it in order to meet our ship date, and since then we've always had other higher priority things to work on. I would love it if we someday have time to build something along these lines directly into the framework.

  • Do you think his name really is John Doe? That would be soo cool :-)

Unloading projects from a VS solution

Here's a nifty little VS feature that I keep forgetting about, then rediscovering and being amazed by how useful it is...

If you have a lot of projects in the same solution, but are only working on one at a time and getting irritated by how long it takes to build the others, you don't need to bother changing your solution configuration to disable them. Just right-click on a project and choose "Unload Project". You can later do the same thing to reload it.

For complex solutions, temporarily getting rid of the things you don't need can dramatically speed up Visual Studio and improve your build times.

I use this when I'm working on games that have projects for Windows and Xbox in the same solution, to filter out whichever platform I'm not currently testing on. I also use it when working on the XNA Framework itself. The solution for our main Windows assembly contains 14 different projects (a mixture of C#, C++, and unit test code), but with half of them unloaded I can still rebuild and test changes in just a few seconds.

(Charles - n) + (George + n) != Charles + George

Or to rephrase the title, in the land of parallel processing you can rob Peter, pay Paul, and have everybody end up richer.

I once did some consulting for a game that was having performance problems. It used a sophisticated visibility system which split the environment up into many small sectors, then tested each piece against the view frustum and used some clever error tolerance heuristics to choose a dynamic level of detail per sector. This resulted in the minimum possible number of triangles being sent to the GPU each frame.

Trouble is, this game was CPU bound!

I recommended they remove the visibility system, merge the entire environment into a single huge mesh, and always just draw the whole thing. This improved the framerate.

These were smart programmers. They had profiled their game, and seen most of their time spent in the environment drawing code, so they concluded it was a good thing they'd already optimized that code, and wondered how much worse things could have been if they didn't have such a good visibility system.

Their mistake was not understanding parallelism. The visibility system was saving GPU cycles (by drawing fewer triangles) at the cost of increased CPU cycles (first to compute the visibility, and then to draw many small sectors as opposed to one big mesh, which caused lots of driver translation work). They had optimized for the wrong processor, and made their game slower as a result.

(ok, I admit it: this was my game and my mistake. I did figure it out in the end though :-)

Important suggestion from the MVP Summit

Once a year, a group of MVPs ("Most Valuable Professionals") fly out to Seattle to meet with Microsoft product teams and discuss what's good, what's bad, and where things ought to go in the future.

This can sometimes degenerate into a couple of days of us being told off by the ZMan, but this year I was pleased to come away with one particularly useful suggestion:

[image]

Stalls part two: beware of SetData

ajmiles got it in one.

This harmless looking code:

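    // Overwrite the buffer contents with this frame's particle positions
    vertexBuffer.SetData(particlePositions);
    // Bind the buffer and draw from it (vertexStride is the size of one vertex in bytes)
    graphicsDevice.Vertices[0].SetSource(vertexBuffer, 0, vertexStride);
    graphicsDevice.DrawPrimitives(...);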

will cause a pipeline stall if the CPU reaches the SetData call for frame #2 before the GPU finishes processing the DrawPrimitives call from frame #1.

This occurs because resources such as vertex buffers and textures are passed by reference, not by value. You can think of each resource as a separate piece of paper filled with data:

  • During LoadContent(), Charles writes 1 million numbers on a piece of paper
  • He labels this paper #42, then sets it to one side
  • During Draw(), Charles encounters a "render terrain" instruction
  • He writes "draw 1 million triangles, using position data from paper #42" on his "instructions for George" list

This is efficient, because Charles does not have to bother touching the actual vertex data while processing the draw instruction.

But what if the game later calls SetData on this same vertex buffer?

  • Charles encounters an instruction that says "erase paper #42, then write these new numbers onto it"
  • Before he can proceed, Charles must check if there are any references to the previous data still waiting in his "instructions for George" list
  • Even if George has already read the instructions, Charles must interrupt him and say "hey, are you finished with paper #42 yet? Can I have it back please?"

If the GPU is still using the resource, the second SetData call must stall until it has finished.

There are several ways to avoid such a stall:

  • Treat vertex buffers and textures as read-only: do not call SetData outside your load methods. Beginners often think they need dynamic SetData in places where it would be more efficient to leave the source data alone and apply dynamic effects via a vertex or pixel shader.

  • If you are generating new vertex data every frame, use DrawUserPrimitives instead of DrawPrimitives. This does not use a vertex buffer: instead, the vertex data is copied directly into the main "instructions for George" list, avoiding any possibility of a stall. Think of it as pass-by-value instead of pass-by-reference.

  • Double-buffer. Create a pair of identical resources, and alternate which one you use per frame. This gives the GPU more time to finish with each resource before the next time you SetData on it. Good for dynamic texture scenarios such as video playback.

  • Use SetDataOptions.NoOverwrite. This tells Charles "I know you would normally have to wait for George here, but please just carry on regardless. I promise I won't overwrite any part of the vertex buffer that he might still be using". You must then keep track of which specific vertices have recently been drawn, and only modify parts of the buffer once you are 100% sure the GPU has finished with them. We used this technique in our GPU particle system: check out the monster comment in ParticleSystem.cs for an explanation.
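The double-buffering option is mostly bookkeeping, so here is a minimal sketch of it. DoubleBuffered is a hypothetical helper invented for this example. In a game, T would be something like a vertex buffer or texture, and you would create the two instances with identical sizes and formats.

```csharp
// Alternates between two identical resources, so the one being written
// this frame is never the one that was submitted to the GPU last frame.
public class DoubleBuffered<T>
{
    readonly T[] resources;
    int current;

    public DoubleBuffered(T first, T second)
    {
        resources = new T[] { first, second };
    }

    // The resource to write to and draw with this frame.
    public T Current
    {
        get { return resources[current]; }
    }

    // Call once per frame, after drawing, to switch to the other resource.
    public void Swap()
    {
        current = 1 - current;
    }
}
```

Each frame you would call SetData on Current, draw with it, then Swap. This does not guarantee there will never be a stall: it just gives the GPU a whole extra frame to finish with each resource before you touch it again.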

Stalling the pipeline

Normally, the CPU and GPU run in parallel. Frame time = max(CPU time, GPU time).

If your code causes a pipeline stall, however, the processors must take turns to run while the other one sits idle. Yikes! Now frame time = CPU time + GPU time. In other words, programs that stall can be both CPU and GPU bound at the same time.
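To put some numbers on that, here is the same pair of formulas as code, expressed in milliseconds per frame. FrameCost is a name made up for this sketch, and it models the worst case where a stall completely serializes the two processors.

```csharp
using System;

public static class FrameCost
{
    // CPU and GPU overlap: the frame takes as long as the slower of the two.
    public static double Parallel(double cpuMs, double gpuMs)
    {
        return Math.Max(cpuMs, gpuMs);
    }

    // A stall forces them to take turns: the frame takes both times added together.
    public static double Stalled(double cpuMs, double gpuMs)
    {
        return cpuMs + gpuMs;
    }
}
```

With 10 ms of CPU work and 12 ms of GPU work per frame, Parallel gives 12 ms (about 83 frames per second), while Stalled gives 22 ms (about 45 frames per second).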

The easiest way to cause a stall is to draw some graphics into a rendertarget, then GetData on the rendertarget texture to read the results back to the CPU. Think about what happens if you do this:

  • Charles (the CPU) is processing your drawing calls.
  • He has filled a piece of paper with instructions for his brother George (the GPU).
  • Charles reaches an instruction that says "copy data from George back into this array".
  • But the drawing instructions haven't actually been processed by George yet!
  • Charles cannot just note down the GetData call on his piece of paper. The next instruction might use values from the array, so he needs that data right away.
  • Charles has no option but to immediately hand the incomplete list of drawing instructions over to George, then wait around twiddling his thumbs in boredom until George has finished drawing everything, at which point Charles can resume processing the GetData instruction while George becomes idle.

One of the great successes of the Direct3D API is how it hides the asynchronous nature of GPU hardware. Many graphics programmers are writing parallel code without even realizing it! But as soon as you try to read data back from GPU to CPU, all this parallelism is lost (one reason it is hard to accelerate things like physics or AI on the GPU).

A similar problem occurs with occlusion queries. To avoid a stall, the query returns immediately, but with the IsComplete property set to false. The query completes at whatever later time George gets around to processing the relevant drawing instructions. Games must deal with this data not being available straight away. For instance our Lens Flare sample falls back on occlusion data from the previous frame if the latest information is not yet available.
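The fallback logic is simple enough to sketch without any graphics code. Everything named here is hypothetical: IAsyncQuery stands in for a real occlusion query, mirroring its IsComplete and PixelCount properties, FakeQuery exists only so the sketch can run without a GPU, and this is not the code from the Lens Flare sample.

```csharp
// Hypothetical stand-in for an asynchronous GPU query, invented for this sketch.
public interface IAsyncQuery
{
    bool IsComplete { get; }
    int PixelCount { get; }
}

// Trivial implementation used to exercise the sketch without a GPU.
public class FakeQuery : IAsyncQuery
{
    public bool IsComplete { get; set; }
    public int PixelCount { get; set; }
}

// Remembers the most recent completed result, so the game never has to wait.
public class LatestQueryResult
{
    int lastPixelCount;

    // Returns fresh data if the query has completed, otherwise the previous value.
    public int Poll(IAsyncQuery query)
    {
        if (query.IsComplete)
            lastPixelCount = query.PixelCount;

        return lastPixelCount;
    }
}
```

The game calls Poll once per frame and uses whatever it gets back. The result may be a frame or two out of date, which is usually an acceptable price for never stalling.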

There is one situation where you can cause pipeline stalls purely by writing data to the GPU, rather than reading back from it. Can anyone figure out what that is?
