Mr. Endian Bytes Back Part 2
For the reader below who asked for an ASCII diagram, here you go.
Let's say we have the following...
typedef struct
{
    SHORT x;
    SHORT y;
} FOO;
void BadCode( FOO foo)
{
    foo.x = 0x0102;
    foo.y = 0x0304;
    InterlockedIncrement( (PLONG)&foo.x );
}
The foo struct consumes 4 bytes since each SHORT is a signed 16-bit value. On a little-endian system, the byte values are laid out in memory as follows, lowest address first. (Remember: least (little) significant byte first.)
0x02 (LSB of FOO.x)
0x01 (MSB of FOO.x)
0x04 (LSB of FOO.y)
0x03 (MSB of FOO.y)
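If you want to check the layout on your own machine, a minimal sketch along these lines copies the struct's bytes out in address order. It assumes int16_t as a stand-in for SHORT, and foo_bytes and host_is_little_endian are hypothetical helper names, not part of any Windows API.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the FOO struct above; int16_t plays the role of SHORT. */
typedef struct {
    int16_t x;
    int16_t y;
} FOO;

/* Copy the raw bytes of a FOO into out[0..3], lowest address first. */
void foo_bytes(const FOO *foo, unsigned char out[4])
{
    memcpy(out, foo, sizeof *foo);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* byte at the lowest address */
    return first == 0x02;        /* LSB first means little endian */
}
```

On a little-endian host, foo_bytes should report the bytes in the order shown above. What you see depends on the machine you run it on.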
InterlockedIncrement actually works on 32-bit values. On little-endian systems, the function expects the least significant byte of the 32-bit value to be at the address passed to it. That byte is also the LSB of foo.x, so in this case the increment actually works as expected, and the result would be:
0x03 (LSB of FOO.x)
0x01 (MSB of FOO.x)
0x04 (LSB of FOO.y)
0x03 (MSB of FOO.y)
where foo.x == 0x0103 and foo.y == 0x0304.
(Also note that if the increment pushes the low 16 bits past 0xFFFF, it carries out of the 2-byte SHORT and starts incrementing the foo.y value. Since SHORT is signed, similar madness would occur if we decrement foo.x below 0: the 32-bit decrement borrows from foo.y.)
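The little-endian case, including the carry into foo.y, can be modeled on any host with a byte array and explicit shifts. This is only a sketch of the arithmetic. It is not atomic, and load_le32, store_le32, increment_le32 and decrement_le32 are hypothetical helpers, not Windows APIs.

```c
#include <stdint.h>

/* Read 4 bytes as a little-endian 32-bit value: b[0] is least significant. */
uint32_t load_le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Write a 32-bit value back as 4 little-endian bytes. */
void store_le32(unsigned char b[4], uint32_t v)
{
    b[0] = (unsigned char)(v & 0xFF);
    b[1] = (unsigned char)((v >> 8) & 0xFF);
    b[2] = (unsigned char)((v >> 16) & 0xFF);
    b[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Model of a 32-bit increment over FOO's 4 bytes (no atomicity). */
void increment_le32(unsigned char b[4])
{
    store_le32(b, load_le32(b) + 1u);
}

/* Model of a 32-bit decrement over FOO's 4 bytes (no atomicity). */
void decrement_le32(unsigned char b[4])
{
    store_le32(b, load_le32(b) - 1u);
}
```

Starting from 02 01 04 03, increment_le32 gives 03 01 04 03, matching the diagram. Starting from FF FF 04 03, it gives 00 00 05 03, so the carry lands in foo.y. Starting from 00 00 04 03, decrement_le32 gives FF FF 03 03, so the borrow comes out of foo.y.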
Now, on a big-endian system, memory is laid out as follows.
0x01 (MSB of FOO.x)
0x02 (LSB of FOO.x)
0x03 (MSB of FOO.y)
0x04 (LSB of FOO.y)
InterlockedIncrement still operates on a 32-bit value, but on a big-endian system the least significant byte is the fourth byte from an INT32 pointer, not the first. In this case, 0x04 would actually be the least significant byte and would get incremented. After the increment, memory would look like the following.
0x01 (MSB of FOO.x)
0x02 (LSB of FOO.x)
0x03 (MSB of FOO.y)
0x05 (LSB of FOO.y)
where foo.x == 0x0102 and foo.y == 0x0305.
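The big-endian result can be modeled the same way on any host. Again, this is a non-atomic sketch of the arithmetic only, and load_be32, store_be32 and increment_be32 are hypothetical helper names.

```c
#include <stdint.h>

/* Read 4 bytes as a big-endian 32-bit value: b[3] is least significant. */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Write a 32-bit value back as 4 big-endian bytes. */
void store_be32(unsigned char b[4], uint32_t v)
{
    b[0] = (unsigned char)((v >> 24) & 0xFF);
    b[1] = (unsigned char)((v >> 16) & 0xFF);
    b[2] = (unsigned char)((v >> 8) & 0xFF);
    b[3] = (unsigned char)(v & 0xFF);
}

/* Model of a 32-bit increment at &foo.x on big endian (no atomicity). */
void increment_be32(unsigned char b[4])
{
    store_be32(b, load_be32(b) + 1u);
}
```

Starting from 01 02 03 04, increment_be32 gives 01 02 03 05: the bytes of foo.x are untouched and the LSB of foo.y takes the increment.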
So, the "InterlockedIncrement((PLONG)&foo.x)" is actually wrong on big-endian systems. The cast is a bad idea to begin with, but we have it in there for perf and data compaction reasons.
A naive fix would be to use the address of foo.x - 2 bytes, so that foo.x lands in the low 16 bits of the 32-bit value. But that would result in an unaligned pointer in this case, causing the increment to hit an alignment exception.
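The alignment problem is easy to see with plain address arithmetic. The sketch below works on addresses as integers rather than on real pointers, and it assumes foo itself sits on a 4-byte boundary. is_aligned4 and naive_fix_address are hypothetical helper names.

```c
#include <stdint.h>

/* Nonzero when addr is a multiple of 4, as a 32-bit interlocked op requires. */
int is_aligned4(uintptr_t addr)
{
    return (addr & 3u) == 0;
}

/* Address the naive fix would pass: 2 bytes before foo.x. */
uintptr_t naive_fix_address(uintptr_t foo_x_addr)
{
    return foo_x_addr - 2;
}
```

If foo.x lives at a 4-byte-aligned address such as 0x1000, the naive fix yields 0x0FFE, which is only 2-byte aligned. It also points 2 bytes before the start of the struct.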
I fixed the problem by adding an #ifdef to the structure definition to swap the order of the fields.
typedef struct
{
#ifdef BIG_ENDIAN
    SHORT y;
    SHORT x;
#else
    SHORT x;
    SHORT y;
#endif
} FOO;
And changed the interlocked code to use the address of the FOO struct instead of the FOO.x field.
InterlockedIncrement( (PLONG)&foo );
And, of course, I was a good coding citizen and stuck a comment in explaining the madness and referring to the spot in code where we actually do the interlocked increment and decrement.
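For anyone who wants to try the swapped-field layout outside of Windows, here is a minimal portable sketch. It assumes int16_t in place of SHORT and uses a plain, non-atomic 32-bit add via memcpy in place of InterlockedIncrement. SKETCH_BIG_ENDIAN and increment_foo are hypothetical names. The macro is derived from the GCC/Clang predefined __BYTE_ORDER__, whereas the code above relies on a BIG_ENDIAN macro that is presumably supplied by the build.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang-style predefined byte-order macros. On other
   compilers this falls through to the little-endian layout. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SKETCH_BIG_ENDIAN 1
#endif

typedef struct {
#ifdef SKETCH_BIG_ENDIAN
    int16_t y;   /* high half of the 32-bit word */
    int16_t x;   /* low half: takes the increment */
#else
    int16_t x;   /* low half: takes the increment */
    int16_t y;   /* high half of the 32-bit word */
#endif
} FOO;

/* The trick only works if FOO is exactly one 32-bit word. */
_Static_assert(sizeof(FOO) == sizeof(uint32_t), "FOO must be 4 bytes");

/* Non-atomic stand-in for InterlockedIncrement((PLONG)&foo). */
void increment_foo(FOO *foo)
{
    uint32_t word;
    memcpy(&word, foo, sizeof word);   /* view the struct as one word */
    word += 1u;
    memcpy(foo, &word, sizeof word);
}
```

With foo.x == 0x0102 and foo.y == 0x0304, one call to increment_foo leaves foo.x == 0x0103 and foo.y == 0x0304 on either byte order, because x always occupies the low 16 bits of the word. The memcpy round trip sidesteps the aliasing and alignment questions the real cast raises. It does nothing for atomicity, which is the whole reason the real code calls InterlockedIncrement.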