Welcome to MSDN Blogs Sign in | Join | Help

The price of complexity

My house was haunted. One of the lights would randomly go on or off and random times without anybody fiddling the switch.

The previous owner of our house had installed fancy dimmer light switches. On a whim, we replaced one of the fancy switches with a simple on/off switch.  As we took the old fancy switch out, we noticed that it had capacitors, a circuit board, resistors, even a microchip! That's a lot of different things to break down.

The new simple switch worked great, and the associated light no longer randomly toggles. So I conclude the fancy light switch was the problem.

Furthermore, the dimmer switches were more complicated to use because they had some fancy pressure touch thing: a quick fast press would toggle the light; a soft press would dim the light. It took us a while to get used to them.

When I was at Home Depot, the fancy switches were $40; whereas the simple on/off switches were ~$3.  The old simple switches throughout our house are still working; and we never really needed the dimming features for the fancy switches.

So the fancy switches were more expensive, harder to use, and haunted. Overall, not an ideal tradeoff for our needs. We'll stick with the simple on/off switches.

 

I think the same things applies to software design. You can build complicated software with lots of features, but that also means more places where it could break down. If a simple solution does the trick, it may not only be cheaper to build, but easier to use and cheaper to maintain in the long run.

Posted by jmstall | 1 Comments

Codegen for On Error Resume Next

VB has a "On Error Resume Next", which tells each line to swallow exceptions and just keep executing to the next line. It's kind of like a try-catch around every single line.

It may be tempting for C++ developers to make fun of VB for this, but this is a lot like programming with HRESULT (or any error handling strategy based on return codes) when you forget to check the return value. And as we look at the codegen below, VB's code could turn out to be even more efficient than the unmanaged error handling equivalent.

 

So in this VB snippet, the exception is is swallowed and "Next" is printed.

Module Module1

    Sub Main()
        On Error Resume Next
        Console.WriteLine("Hello!")
        Throw New Exception("Bong!")

        Console.WriteLine("Next")
    End Sub

End Module

 

If you're wondering what the codegen is like, you can always compile and then view the IL in ildasm. Or you can view it in Reflector as C#  instead of IL, which is easier to comprehend.

 

[STAThread]
public static void Main()
{
    // This item is obfuscated and can not be translated.
    int VB$ResumeTarget;
    try
    {
        int VB$CurrentStatement;
    Label_0001:
        ProjectData.ClearProjectError();
        int VB$ActiveHandler = -2;
    Label_0009:
        VB$CurrentStatement = 2;
        Console.WriteLine("Hello!");
    Label_0016:
        VB$CurrentStatement = 3;
        throw new Exception("Bong!");
    Label_0023:
        VB$CurrentStatement = 4;
        Console.WriteLine("Next");
        goto Label_009E;
    Label_0035:
        VB$ResumeTarget = 0;
        switch ((VB$ResumeTarget + 1))
        {
            case 1:
                goto Label_0001;

            case 2:
                goto Label_0009;

            case 3:
                goto Label_0016;

            case 4:
                goto Label_0023;

            case 5:
                goto Label_009E;

            default:
                goto Label_0093;
        }
    Label_0059:
        VB$ResumeTarget = VB$CurrentStatement;
        switch (((VB$ActiveHandler > -2) ? VB$ActiveHandler : 1))
        {
            case 0:
                goto Label_0093;

            case 1:
                goto Label_0035;
        }
    }
    catch (object obj1) when (?)
    {
        ProjectData.SetProjectError((Exception) obj1);
        goto Label_0059;
    }
Label_0093:
    throw ProjectData.CreateProjectError(-2146828237);
Label_009E:
    if (VB$ResumeTarget != 0)
    {
        ProjectData.ClearProjectError();
    }
}

 

 

So you can see it's just putting the entire region in a try/catch block, and then using a switch table to jump back to the appropriate line.  It has a "Current Statement" variable to track the last successful line to execute before an exception may have been thrown, and then switches to the next line on the exception path.

The switch table may seem evil at first, but remember that in native code, all those IfFailGoto() checks to propagate return results also add up to a lot of switching code. In this case, the branches are at least optimized into a single switch table as opposed to scattered branch code.

Posted by jmstall | 3 Comments

The waiting game

Punting on a problem can be good or bad, depending on the situation. Punting is not always retreating or surrendering.

Punting is good when the problem will be easier to solve later.

  1. For example, maybe you suspect something may happen that will render the problem moot (eg, a product would be discontinued). 
  2. Maybe you'll have more relevant information later that lets you inject some simplifying assumptions. 
  3. Maybe you'll have new technologies / options available later than make it easier to solve.
  4. Sometimes, some problems just work themselves out over time because the decisions to an ambiguous problem are answered by the decisions to a clear problem (see e^pi  = -1). 

 

Punting is bad when the problem gets harder to solver later.

  1. Maybe you're accumulating debt until you solve it. I find this is often the case with FxCop - I like to get FxCop enabled sooner rather than later because it's easier to stay at 0 errors once you get to 0 errors. 
  2. Debt accumulation also happens if future components will be broken until you fix some problem.
  3. Or maybe not having the problem solved is holding you hostage on solving other things.
  4. Maybe punting screws up your dependencies.
Posted by jmstall | 0 Comments

Understand the end-to-end scenarios

If you don't understand the end-to-end scenario, it's easy to do something that is ultimately self-defeating.

For example, my 3yr old daughter recently learned to play hide-and-seek. The goal of the game is to hide and avoid being found while the "it" player searches for you.

She's got the part about hiding down. But then once hidden, she shouts out "Daddy, come and find me!!". Given that I'm not deaf, it's kind of self-defeating.

 

But seriously...

This comes up often in performance scenarios. We've all heard some story where somebody tries a premature optimization and it ends up slowing things down. For example, somebody goes and adds caching to something, and the overhead of maintaining the cache drowns out any perf wins.

Another example is adding some cool feature that's an innate security hole.

A more general example is adding an feature X, but you need feature Y in order to use X.  For example, to create a new ICorDebug object in CLR V2, you needed to pass a version string of the debuggee. But V1 didn't have a way to get a version string from a running process, so we also had to add a 2nd new API, mscoree!GetVersionFromProcess in order to enable the attach scenarios.

Posted by jmstall | 1 Comments
Filed under:

Sometimes it's the obvious answer

Sometimes the answer to a question is so obvious that we skip over it looking for a fancier answer.

Example: A chair at my house had a bunch of little indentations on the seat - kind of like what you'd expect if somebody took a math compass and poked the chair a bunch of times.

I had some friends over and asked them what they thought caused the indentations. They guessed things like:

  • did you set something on top of it such as another chair?
  • did the buttons on somebody's pants scratch it?
  • did it get accidentally bumped a bunch of times?
  • did you drop a box of nails on it?

No. My 3 year old took a math compass and poked the chair a bunch of times.

 

But seriously ...
I noticed this with threading bugs. Sometimes I'd look at a crash dump of a race condition. I'd have no live repro and would be pouring over the sources trying to infer what the race could be. The search space for race conditions is huge, and I'd often find some crazy 15 step race ("the user presses 3 buttons within 10 ms while 4 threads happen to be at certain spots"). But while that was technically a bug, it turned out that was never the real problem. The real problem would always be much simpler.

Posted by jmstall | 1 Comments
Filed under:

Arguing by-example vs. by-principle

You can argue by providing examples supporting your case.  Alternatively, you can argue by appealing to more general principles.

For example, in arguing that "exposing public fields is bad," you could say:
By-principle: "It breaks abstraction and encapsulation."
By-example: "This untrusted plugin could set field m_foo to value 4 and cause a null-reference exception on line 16 of file widget.cs".

 

  By example By principle
When it's good Sometimes an example / counter-example can clearly illustrate a problem in an undeniable way. (example) Can rapidly prune the decision tree and avoid wasting time in overly specific discussions that would ultimately lead to a dead-end. 
Abstraction level low - uses specific data points high - uses generalizations
When it's bad This puts a burden on coming up with the "killer example", which you may not be able to do until it's too late. You can waste a lot of time trying to come up with specific examples. The inability to come up with a specific killer example can lead to a false sense of security. (Just because I can't find  a bug in your code does not mean it's bug-free.) Principles may be more subjective and may be more complicated to process.
Principles can be easily misapplied. (eg, "My principle is that our code should never crash; therefore I'm going to explicitly put null checks before every pointer access").
Myers-Briggs of target audience S (sensory) N (intuitive)
Posted by jmstall | 1 Comments
Filed under:

Things in Metadata that are missing in Reflection

System.Reflection is a high-level way of describing Types in .NET targetted at managed code consumers. The API is easy to use, but does not expose all the information that's actually present and affecting decisions.

For example, Reflection does not expose TypeRef, MemberRef, AssemblyRef, or other Ref tokens.  These tokens are references to things in other assemblies. Reflection just resolves them for you (potentially invoking an event to get help from your app) and hands back the resolved object.

Similarly, reflection is also missing TypeSpecs. TypeSpecs are just binary signature blobs that describe compound types (arrays, generics, etc). Reflection will parse  the blob and resolve it to a real System.Type.

This entry is not a complete list of all things missing in reflection; nor am I going to get into the other problems in reflection here. For now, I'll just look at TypeRefs...

Imagine you have a class that inherits from a type in another assembly. At the metadata level, the base type is described with a TypeRef token.

1. Practically, that means you could use reflection to inspect what a base type actually bound to,  but not what it was supposed to bind to (as described in the original assemblyRef).

2. Another issue is that when you have a high-level API (Reflection) that loses information over a low level API (IMetadataImport), you risk that the high-level API won't be able to talk to the low level API because it may not be able to provide it with the information the low level API requests.

3. In related trivia, ILDasm can print TypeRef, TypeSpec tokens:

//000010:         Console.WriteLine("Hi!" + arg);
  IL_0001:  ldstr      "Hi!" /* 70000001 */
  IL_0006:  ldarg.0
  IL_0007:  call       string [mscorlib/*23000001*/]System.String/*01000012*/::Concat(string,
                                                                                      string) /* 0A000010 */
  IL_000c:  call       void [mscorlib/*23000001*/]System.Console/*01000013*/::WriteLine(string) /* 0A000011 */
  IL_0011:  nop

So the inability to get the raw unresolved tokens from Reflection would prevent you from writing ILDasm on reflection that could print the above snippet.

Posted by jmstall | 1 Comments

Binary vs. Source compatibility

Binary Compatibility means that when something is updated, you continue to work without needing to even recompile. 

Source Compatibility means that you need to recompile to keep things working, but you don't have to actually change the sources.

One is not a superset of the other. Here are some examples in each combo.

Compatibility generally doesn't just mean that the change is discoverable, but that the change has some significant breaking implication that would cause a reasonably client to need to adjust their behavior. (Of course, you always find afterwards that's some important client did something unexpected and you need to compensate. Hence the heroics of the AppCompat folks). For example, with C# + reflection, any change is discoverable, so any change could technically break a client; but if a client breaks because they're relying on the names of private methods that you changed, they're hard-pressed to complain.

 

Yes Binary + Yes Source:

Renaming private methods. Changing a method body in a way that continues to behave the same.

 

Yes Binary,  Not Source:

Adding new method overloads.  Since overload resolution is determined at compile time, adding new methods won't affect already-compiled binaries. But if you recompile, it's possible that you may bind to the new overloads.  (For example, see float.Equals)

 

Not Binary, Yes Source:

In this case, you just need to recompile your sources to keep working. The compiler will respond to the change in a corrective way. For example, consider removing a method overload. At a binary level, the method you're bound to is removed and so things fail. But if you recompile, the compiler may bind to another overload that's semantically equivalent, and so things keep working without you having to change any source.

 

Not Binary, Not Source:

A real breaking change. This requires clients to update their sources and recompile. For example, removing a method.

Posted by jmstall | 2 Comments

Do you compile XML to IL?

We need some customer feedback to determine if we fix a regression that was added in VS2008.

Any language can target the CLR by compiling the language to IL, and then you immediately leverage the .NET platform, including access to the libraries and debugging tools.

Do you write a compiler that takes an XML source file in and then compiles it to IL, produces a managed PDB, and then expect to be able to debug the XML source file using the source-line information you put in the PDB?  For example, if MSBuild compiled to IL (instead of being interpreted), it would fall under this category.

Compilation techniques could mean:

  1. using Reflection.Emit and MarkSequencePoint
  2. compiling to C# source code with #line directives that refer back to the XML.
  3. emitting IL text files and using ilasm to produce
  4. using the unmanaged emit APIs directly.

 

What's regressed?

In VS2005, you can set breakpoints on source lines in the XML file (that map to the ranges specified in the PDB you emitted alongside the IL) and hit them. You can also do set-next-statement and do stepping.


In vS2008, setting code breakpoints in the XML file may not be hit. Instead, the IDE will inspect the source file contents to recognize it's XML and so ignore the managed PDB code ranges and attempt to set an xml data-breakpoint on the entire XML element. The data-breakpoint is designed to cooperate with the XML libraries, but not the managed PDBs. Thus the code breakpoint is not hit and you won't stop in the xml file. Does this impact you?

 

Example

Here's a very simple way to see the impact of this using case #2 above:

using System;
using System.Collections.Generic;
using System.Text;

namespace xml_debug
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hi!");
#line 4 "xmlfile1.xml"
            Console.WriteLine("ABC!");
            Console.WriteLine("DEF!");
            Console.WriteLine("GHI!");
#line default
            Console.WriteLine("Hi!");
        }
    }
}

File "xmlFile1.xml":

<?xml version="1.0" encoding="utf-8" ?>
<doc>
  <test>
    abc <-- line 4
    def
    ghi
  </test>
  <test>
    other
  </test>
</doc>

 

The #line directive in the C# file would cause the next lines (up to #line default) to be associated with lines in the XML file, thus having the PDB associate the xml file with the IL. You can try this out in both VS2005 and VS2008 as a default C# console application to get a feel for the differences and extent of the issue.

Posted by jmstall | 16 Comments
Filed under:

Why are you caching data?

There are multiple reasons to cache data. For example, are you caching because of a performance issue of because of a correctness issue?  Know which, and comment it at the spot doing the cache.

 

If it's performance, the idea is that you have some expensive operation, and you save the results so that you only have to perform the operation once.  Sometimes caching can be more expensive than just recreating the object.

If it's correctness, the idea is that you care about object identity. You only want 1 instance of the object floating around.  Override operator==, Equals  GetHashCode(), etc can mitigate this, but you still have issues. Object.ReferenceEquals will know the difference. And not all .NET languages necessarily map == to operator==. Or maybe you only want the ctor to be called once per object, such if the object owns some exclusive native resource (write access to a file).

 

There's a maintenance angle from this too.

Two developers may look at the same cache and conclude it was needed for 2 different reasons. This can lead to problems:

  1. It can also be easy to start taking an unrealized dependency on object identity.
  2. Somebody may start caching an object without thinking about identity, and so may not go back and add the appropriate Equals() overrides.
  3. If you assume the cache is for correctness when it's really a performance issue, you may end up doing additional work to maintain the identity that's not needed and impedes performance.

 

See also:
Clearly Document Object Lifespans.
Object Identity in Managed Debugging

Posted by jmstall | 0 Comments
Filed under:

Why threading is hard

Anybody who says "I can write correct multi threaded code" probably should be saying "I don't test my multi-threaded code".

It is very difficult to write correct multi-threaded code.

One way to appreciate this is various "find-the-bug" pop quizzes that occasionally come up.

Another approach that I'll go after here is to look at the state space.

If single-threaded apps have O[n] execution paths where n = number of atomic operations,  just two threads have O[n!] paths or worse. (It's not O[n*n], or even polynomial - see below). And n is usually a very large number.

Each of those execution paths is an angle to hit your program and find a bug. Maybe you can test against O[n], but you're delusional if you think you can protect against O[n!] or worse.

15 is a small number, and 15 factorial is 1,307,674,368,000 = ~1.3 trillion. Testing a million cases is hard enough. "Trillion" may sound like an unrealistic number, but listen to some of the ship-stopper stress bugs that show up in product's end games followed by the "what are the odds that could ever have actually happened" comments, and it's not so unrealistic.

Practically, that means multi-threaded apps have so many states that you can't possibly keep track of them all in your head. And even if you do, the next guy that goes and changes a line of code just ruined your model.

 

Counting the execution paths

Suppose you have a function that executes sequence of atomic operations:  A0, A1, A2, A3.

I want to just compare single-threaded and multi-threaded cases, so for now, let's ignore loops, conditionals, etc. because that's common across both and so it just complicates the comparison. (This simplification actually downplays the complexity of adding additional threads). In this model, there's only 1 code-path through this sequence of N states: A0, A1, A2, A3. 

Now suppose you have a second function B0, B1, B2, B3 executing on another thread.

The threads can interleave these operations in any order. The convenient case would be (A0,A1,A2, A3, ... B0,B1,B2,B3). But maybe it's something goofy like: (A0,A1,B0,A2,B1,B2,A3,B3). If the bug only occurs when the pattern (A1,B0,A2, B1) is executed, good luck finding it.

It turns out for 2 sequences of lengths N and M, there are choose(N+M, N) ways to interleave them. That's (N+M)! / ( M! N! ) execution paths.   (see quiz here ). Hence O[n!] for 2 threads.

You can see how much more complex just simple execution gets with threads. Now imagine how much more complicated threads make loops, conditionals, etc.

 

How many states?

Locks don't make your code safe. They just reduce the number of states and hedge your bets.  At one extreme, 1 giant lock around everything can reduce you to 1 state per thread. If you put 4 states under 1 big lock (eg, A0...A3), and you put that same lock around every other state (eg, B0...B3), then you've reduced those 4 states to 1 state, thus eliminating 3 states.

You likely don't control the number of states. An operation that appears atomic (like a memory read) may actually be multiple states. Maybe you're looking at the source-line and the compiler split it out into multiple states. Or maybe there are windows where you communicate with the file-system, or other processes - and so they can be effectively injecting states into your state space. Or maybe the hardware has multiple states (usually around memory models).  That's why there's InterlockedIncrement(&i) instead of just doing i++.


Threading vs. Reentrancy
:

Now threading models aren't the only way to rack up states. Single-threaded models (like UIs) that queue messages can create a ton of states too, and those tend to create nasty reentrancy bugs. And there will be times when another thread would actually produce fewer overall states then a single-threaded model. Just be wary.

 

What if the states independent from each other?

Granted many of the states may appear to be independent from each other. So one may claim that O[N!] is way to pessimistic. But unless you have a very thorough isolation model, it tough to prove the states can't impact each other.

For example, states A0 and B1 may not directly interact, but maybe A0 impacts A1 which impacts B0 which impacts B1.  Maybe A0 impacts how much memory A1 allocates, and that impacts whether an allocation in B0 fails (especially if there's a process wide memory allocator), and then the bug is in B1 handling the out-of-memory failure case.

And even if most of your states actually are isolated, factorial growth is so fast that you only need a small number before you're doomed.

See also: Measuring how difficult a threading bug is.

Posted by jmstall | 8 Comments
Filed under:

Quiz: can you count how many combinations ...

Here's a combinatorics quiz:

If you have 2 ordered lists (lengths N, M), how many ways can they be interleaved into a single list while still preserving the partial ordering from the original lists?

So if the lists were:
List 1: A,B
List 2: X,Y

The following would be valid:

  1. A,B,X,Y
  2. A,X,Y,B
  3. X,Y,A,B
  4. A,X,B,Y

But ' yxab' would be invalid since it lists y before x, and the original list 'xy' says y must come after x.

For the full quiz effect, stop reading now and go figure it out! I'll leave the comments moderated so that folks don't give things away.  ([Update Jan 29] I published the answers) : Read on for hints.

-------------

 

 

My wife and I just worked it through and came up with an answer. Our approaches highlighted our different personalities. She started listing out examples while I started by looking for isomorphisms to more easily countable patterns. I ended up seeing an isomorphism in her examples and we found the solution together. 

 

To verify our answer, I wrote some Python code (run with IronPython) to go through brute-force and count. So the function t(na,nb) enumerates all orderings between 2 lists of sizes na and nb.  In this case, the lists are (a0,a1,a2) and (b0,b1), and there are 10 ways of interleaving them.

>>> t(3,2)
valid= a0 a1 a2 b0 b1
valid= a0 a1 b0 a2 b1
valid= a0 a1 b0 b1 a2
valid= a0 b0 a1 a2 b1
valid= a0 b0 a1 b1 a2
valid= a0 b0 b1 a1 a2
valid= b0 a0 a1 a2 b1
valid= b0 a0 a1 b1 a2
valid= b0 a0 b1 a1 a2
valid= b0 b1 a0 a1 a2
10

 

It does a dumb brute-force approach: enumerate all possible interleavings (which is huge) and then just take the ones that preserve the partial ordering (which is much smaller). While it's dumb and slow, it's more convincingly correct than some complex proof. (It optimizes for simplicity).  Here's the code:

# return a generator listing all orderings of l
# assumes all elements in l are unique
def combo(l):
    #print 'combo:', l
    if (len(l) == 1): yield l
    for head in l:
      rest =  list(l)
      rest.remove(head)
      for x in combo(rest):
        #print 'x=', x
        yield ([head] + x)
        

def verifyHelper(letter, all):
  last = -1
  for x in all:
    if (x[0] == letter):
      if x[1] <= last:
        return False
      last = x[1]
  return True 
  
# verify that the sub sequences are in order
def verify(all):
  return verifyHelper('a', all) and verifyHelper('b', all)

# convert a schedule list to a string for pretty printing,
# eg: a1 b0 a2 b0
def s(l):
  d = ""
  for x  in l:
    d += ('%s%d ' % (x[0], x[1]))
  return d

# Actual counter
def t(na, nb):
  a = [('a', i) for i in range(0, na)]
  b = [('b', i) for i in range(0, nb)]
  all = combo(a + b)
  total = 0
  for x in all:
    if (verify(x)):
      print 'valid=', s(x)
      total += 1
  return total
Posted by jmstall | 10 Comments
Filed under: , ,

Lang.Net 2008 is coming

Lang.Net 2008 is coming up this Monday through wed (Jan 28th -Jan 30th). This is targeted at compiler and language implementers. The agenda is here and includes a lot of great Microsoft and non-Microsoft folks. In my former debugging life, my angle for these sort of conferences was from the debugger perspective. (Here were my takes on the Compiler Labs last May '07 and Feb '05) This will be the first time I'll be at a language conference as a non-debugger guy. I suspect it will feel strange.

Jimmi Schementi has got more to say about it here. (He also wrote a good DLR article on Msdn)

Posted by jmstall | 0 Comments

Battle Simulation: size vs. smarts (part 3)

How much stupidity does it take to prevail over intelligence?

I previously explored simulating Real-time-strategy battles with IronPython. (Part 1, Part 2). We saw that even with very simple rules, different strategies are better than others.  If two armies of equal size attack each other, a good strategy clobbers a bad one.

In this entry, I use Python to look at the flip question: how much larger does the dumb army need to be to prevail over the smart one?

This answer leads to another practical tidbit, listed in the conclusion section.

Background:

See Part 1 for the code snippets and the rules.

The only choice that the armies have is for each unit within an army to decide which enemy unit it's going to attack each turn. The 3 strategies we've seen so far are:

  1. Attack-weakest: attack the weakest target each turn
  2. Sticky-random: pick a random target and attack it until you kill it, and then pick a new random target
  3. Attack-random: attack a random target each turn.

Part 2 showed that attack-weakest appears to be hands down the best over the other two, and attack-random is the worst. Of course, this is just 3 strategies out of a possible infinite, and there's likely a better strategy even than attack-weakest. Here's a graph of victory margin for each of the 3 matchups for army sizes up to 20 units per side. I'll refer to 'attack-weakest' as the 'smart' strategy, and 'attack-random' as the 'dumb' one.

Recall the code snippets from before:

  1. Make(n) makes an army of n units that all use the attack-weakest opponent strategy. MakeR(n) makes an army of n units that all use the attack-random strategy. These are just convenience functions to simplifying writing queries.
  2. Battle(x,y) fights the two armies x and y to the death, and then returns a statistics structure. The 'victory' property is the victory margin between -100% if x wins perfectly and 100% if y wins perfectly, and 0% if a tie.
  3. Battle(MakeR(n), Make(n)).victory fights a battle between a dumb army of size n and a smart army of size n, and then returns the victory margin. We saw previously that the smart army generally wins.

Here's the Python snippet to answer the question. favg() runs the lambda function a number of times to get an average. You can see that it runs the battle in a loop, gradually increasing MakeR's size until it finally wins. It records results: for each initial size between min and max, record the size advantage MakeR needed to finally win.

The code:

def ComputeSize(size = 10, nTimes = 20):
  advantage = 0
  l=[]
  while True:
    print '*******************************'
    print 'ad', advantage, 'size', size
    v = favg(nTimes, lambda: Battle(MakeR(size+advantage), Make(size)).victory)
    l.append((advantage, v))
    if v < 0:
      break
    advantage += 1
  return l

def Compute(min, max, nTimes = 20):
  r = { }
  for size in range(min, max):
    r[size] = ComputeSize(size)
  return r

 

Note this code has a simplifying assumption that the smart army will always win. Our previous results show that to be a safe and cogent assumption. Fortunately, we'll know if the dumb army wins (we'll see a size advantages of just 1), in which case we can go back and add the additional complexity to deal with the situation.

The key thing here is that our queries are now beyond the simple Sum/Avg/Select/Where and have become full-fledged algorithms. 

 

The results:

Here's the raw dump of some of the results of running Compute() with nTimes=3 (average across 3 runs). Each row is for a given base size (starting at 5 and shown up to 19), and the row is a list of tuples (size advantage, victory margin) where advantage ranges from 0 (tied) to however large it takes for the dumb army to beat the smart army. 

>>> for size in r: print size, '|', r[size]
...
5 | [(0, 40.0), (1, 15.3), (2, -36.45)]
6 | [(0, 37.0), (1, 15.45), (2, -31.1)]
7 | [(0, 36.0), (1, 16.75), (2, -24.85)]
8 | [(0, 37.0), (1, 20.95), (2, -8.4)]
9 | [(0, 40.0), (1, 26.0), (2, 9.85), (3, -28.6)]
10 | [(0, 45.0), (1, 34.0), (2, 21.65), (3, 0.05), (4, -33.0)]
11 | [(0, 40.0), (1, 29.0), (2, 16.55), (3, -8.7)]
12 | [(0, 35.0), (1, 23.95), (2, 11.9), (3, -19.5)]
13 | [(0, 44.0), (1, 35.0), (2, 25.95), (3, 14.65), (4, -9.95)]
14 | [(0, 40.0), (1, 31.0), (2, 21.9), (3, 10.9), (4, -18.95)]
15 | [(0, 36.0), (1, 27.0), (2, 17.95), (3, 6.9), (4, -23.35)]
16 | [(0, 32.0), (1, 23.0), (2, 14.0), (3, -1.5)]
17 | [(0, 46.0), (1, 39.0), (2, 31.95), (3, 24.5), (4, 16.85), (5, 4.0), (6, -24.65)]
18 | [(0, 43.0), (1, 36.0), (2, 30.0), (3, 22.95), (4, 14.45), (5, 0.75), (6, -25.55)]
19 | [(0, 40.0), (1, 33.0), (2, 27.0), (3, 20.0), (4, 11.35), (5, -9.3)]
...

So when a smart army of size 5 fights a dumb army of size 5, the smart one wins with a 40% victory margin. When it's 5-on-6, the smart one only wins by 15%. When it's 5-on-7, the dumb one finally prevails. 

 

To see how much advantage the dumb army needs per size:

   for size in r: print size, len(r[size])

Here's a graph of the results, showing size vs. len(r[size]).  The data there looks somewhat random.

image

The advantage line gradually trends up, suggesting that with larger battles, the dumb army needs a larger advantage to win.

Random Noise or Numerical Conspiracy?

Why is the graph above so random, as opposed to a more monotonic changing curve? Is that just random noise. Afterall, we only averaged it over 3 runs. Some of the previous graphs (like victory margin in attack-random vs. attack-weakest) that appeared to be random were actually very deterministic. (This was Quiz #2 from Part 1)

I ran the above with an average over 30 runs and still got nearly exactly the same data. So despite having results built on a random number generator, all that apparently random chaos is actually very deterministic.  (Quiz #2 again ... why?)

 

Playing with the data

To put it in more context, we can show the sizes as a ratio. So here's (Dumb-size / Smart Size) vs smart-size, for each of the cases that the dumb army wins. Since the dumb-army is larger, the ratio is always greater than 1.

I did the ratio in Excel. I already had the data in Excel for the graphing, and Excel is very easy for this sort of primitive manipulation. I see that as a case of using the best tool for the job. This would have been easy to do with Python too:

for x in r:
  a=len(r[x])
  print x, float(a+x)/x

And I'd probably do more complex manipulation with Python instead of Excel.

image

 

The results here are still pretty random without a clear trend line. This kind of looks like it's approaching an asymptote. If that's true, then it would mean that the smart army loses its advantage in larger scale battles. However, I think that would be a premature conclusion. Afterall, it we cut the size window at 10, 13, or 17 instead of 25, we'd jump to very different conclusions.

To make a good conclusion here, we'd need enough data to see the trend line stabilize.

 

Conclusion:

  1. We can see that the dumb army can prevail over the smart one if it's 50% larger. That's actually a useful piece of data, because when the armies are the same size, the smart one wins with a victory margin around 40% and all the smart units survive. 
  2. The smart strategy is good at maintaining low death rates against a similarly sized dumb army, but it is not going to beat overwhelming numbers (even a dumb army 2x its size). 
  3. Practical tidbit #3: Having an extra 50% units would let you just throw an army into battle without having to micromanage it and likely win. This can be useful in an arcade situation where you don't have the time to ensure your units are behaving optimally intelligent. (contrast this to Pracitcal tidbit #1: micromanaging gets better results)
  4. We're left with some strange outstanding questions. Why does a random attack strategy produce such a deterministic curve? Is there a trendline?

The good news is that this was is still pretty easy to model and query in Python. Next up, I'll try using Python to explore more scenarios in search of a trendline and answers to the outstanding questions. Stay tuned...

Posted by jmstall | 3 Comments
Filed under:

Battle Simulations with Iron Python (part 2)

I previously wrote about modeling RTS battles with IronPython. In this entry I'll explore a new policy for attacking that was suggested on the last thread.

Previously, I compared 2 policies for picking which opponent to attack:

1. Attack the weakest enemy.
2. Attack a random enemy.

Each turn (eg, after each round of shooting each other), units reapplied their policy to pick a new target.

Timothy Fries asked what would happen if units kept their targets until they killed the target, rather than picking a new target each turn. I'll call that a "sticky" policy, which can serve as a modifier to another policy.  So a Sticky-random (SR) policy is a policy that randomly picks a target and attacks it until it destroys it, and then randomly picks the next target.

 

Code changes

It was very simple to adjust the python scripts for the new policy:

Our Unit class's PickTarget(self, army)  function gets changed from:

  def PickTarget(self, army):
    guy = self.fpAttackPolicy(army)
    assert(guy.IsAlive())
    return guy

To:

  def PickTarget(self, army):
    if (self.stickyTarget and self.target and self.target.IsAlive()):
      return self.target
    # use policy to pick a target.
    self.target = self.fpAttackPolicy(army)
    assert(self.target.IsAlive())
    return self.target

So now we remember the target in a member field and only pick a new target one the old target is dead. The sticky policy is only applied if stickyTarget is true.

And then we add some new convenient builder functions (in addition to Make(), and MakeR()) to easily create armies with the Sticky-Random policy:

# set the army policy to "sticky-attack" (don't pick a new target until it destroys the current one)
def Sticky(army):
  for x in army:
    x.stickyTarget = True
  return army

def MakeSR(size):
  return Sticky(MakeR(size))

 

Show it's fair

Let's look at two equally sized armies applying a stick-random policy fighting each other.  As with random, we'd expect the battles to average out as a tie.

This gets a vector of victory margins from 20 battles of 5-on-5. l1 is using an attack random policy (MakeR), l2 is using a sticky-random policy (MakeSR).

l1 = frun(20, lambda: Battle(MakeR(5), MakeR(5)).victory)
l2 = frun(20, lambda: Battle(MakeSR(5), MakeSR(5)).victory)

Recall that the victory margin in Battle(x,y) is between -100% (army x wins with no damage) to +100% (army y wins with no damage).

Last time, we saw that the random policy is basically a tie with an average victory margin of 0% and a standard deviation under 5.

Here's a python function to compute standard deviation:

# compute standard deviation against 0
def sd(l):
  import math
  return math.sqrt(sum([x*x for x in l])/len(l))

We can look at the average and standard deviation of each run:

>>> avg(l1)
-1.2
>>> avg(l2)
3.75
>>> sd(l1)
4.69041575982
>>> sd(l2)
16.7630546142

In both cases, the averages come close to 0. However, sticky-random has a significantly larger standard deviation than random (16 vs. 4).  In other words, overall stick-random averages out to a tie, but whoever does win, wins by a bigger margin than in an attack-random policy.

Quiz #1: Explain why sticky-random has a wide victory-margin spread than just random.

 

Random vs. Sticky-Random

We can see how an army using Random policy fares vs. an army using the Stick-Random policy.


def RvsSR(nMax, nTimes=1):
  return [  favg(nTimes, lambda : Battle(MakeR(i),MakeSR(i)).victory) for i in range(1,nMax+1)]

This is very similar to the matchups used last time, but we use MakeSR() instead of Make() or MakeR() to get an army of the appropriate policy.

Here's a chart showing the victory margin of all 3 matchups (R vs SR, R vs. W, SR vs w). In all cases of X vs Y, Y is the winning army and the graph shows Y's victory margin over X:

image

We can see that Attack-Weakest is the best policy. It beats the other two, and it even beats Random by a larger margin than Sticky-random does (the red line is consistently higher than the blue line).

So Sticky-random is an intermediate policy: it's better than just attack-random, but not as good as ganging up on the weakest.

 

Conclusion

A key takeaway was that with Python, it was very easy to both a) adjust the object model and b) run a wide variety of queries. (In fact, I found it harder to make the charts with Excel 2007 then I did to generate the data with Python)

 

We can objectively rank the attack policies from best to worst:

  1. Attack-weakest: attack the weakest target each turn
  2. Sticky-random: pick a random target and attack it until you kill it, and then pick a new random target
  3. Attack-random: attack a random target each turn.
Posted by jmstall | 2 Comments
Filed under:
More Posts Next page »
 
Page view tracker