Larry Osterman's WebLog

Confessions of an Old Fogey
Alignment (part 1)


I got an email the other day from someone (who will remain nameless) complaining about the fact that some of the NT structures had to be declared with #pragma pack:

Consider file WinBase.h from the Platform SDK, containing the following
declaration:
> typedef struct _WIN32_FIND_DATAA {
> DWORD dwFileAttributes;
> FILETIME ftCreationTime;
[...] 

By this person's logic, since the ftCreationTime field in the WIN32_FIND_DATAA structure was an 8 byte FILETIME structure, the ftCreationTime should be at offset 8 from the start of the structure, but he discovered that it was at offset 4 from the start.

And he was convinced that either there was something incorrect in the documentation for structure packing in Windows or that there had to be a hidden #pragma pack directive (which changes the default structure packing) in the Windows headers.

I realized after reading this article that many people have forgotten the lessons learned from the early days of MS-DOS.  This kind of stuff was known to every developer who worked in C and coded to MS-DOS.  You see, back in the MS-DOS days, memory was king.  So DOS packed all of its data structures as tightly as possible to save memory.

And if you attempted to make MS-DOS system calls from C, you invariably ran into packing issues.  For example, the MS-DOS get country data (which returned internationalization information) returned a structure that contained:

Offset  Length  Description
0x00    2       Date format
0x02    5       Currency Symbol (ASCIZ string)
0x07    2       Thousands separator (ASCIZ string)
0x09    2       Decimal separator (ASCIZ string)
0x0b    2       Date separator (ASCIZ string)
0x0d    2       Time separator (ASCIZ string)
0x0f    1       Bit Field
0x10    1       Currency places
0x11    1       Time format
0x12    4       Case-map call address (DWORD)
0x16    2       Data-list separator (ASCIZ string)
0x18    10      Reserved

If I were to represent this as a C structure, the naive representation would be:

struct INTL_DATA
{
  WORD _DateFormat;
  CHAR _CurrencySymbol[5];
  CHAR _ThousandsSeparator[2];
  CHAR _DecimalSeparator[2];
  CHAR _DateSeparator[2];
  CHAR _TimeSeparator[2];
  BYTE _Padding;
  BYTE _CurrencyPlaces;
  BYTE _TimeFormat;
  LPVOID _CaseMapCallAddress;
  BYTE _DataListSeparator[2];
  BYTE _Reserved[10];
};

The problem is that this structure definition wouldn't work.

You see, the compiler has a fairly straightforward set of rules defining how structures are aligned, and this structure violates them.

The compiler's rules are:  In general, data is aligned on its "natural" boundary.

What's that mean?  "Natural" boundary?  What on earth is that thing?

Typically, the "natural" alignment of data is based on its size.  A 1 byte field can be located at any address in memory.  A 2 byte field (a short) should only appear at even addresses in memory.  A 4 byte field (a long) should only appear at multiples of 4 bytes in memory.  And an 8 byte field (a longlong) should only appear at multiples of 8 bytes in memory.
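The rule above boils down to a little arithmetic on addresses. Here is a minimal sketch of it; the helper names `is_naturally_aligned` and `align_up` are made up for illustration and are not part of any compiler or SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when `address` is a multiple of `size`, meaning a field of
   that size stored there sits on its natural boundary. */
int is_naturally_aligned(uintptr_t address, size_t size)
{
    assert(size != 0);
    return address % size == 0;
}

/* Rounds `offset` up to the next multiple of `alignment`: the first
   place at or after `offset` where such a field may start. */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return (offset + alignment - 1) / alignment * alignment;
}
```

For instance, `align_up(5, 2)` is 6 and `align_up(9, 4)` is 12, while an offset that is already a multiple of the alignment is returned unchanged.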

A simple example goes a long way towards explaining this:

struct A
{
   int _FieldA1;
   char _FieldA2;
   short _FieldA3;
   char _FieldA4;
   long _FieldA5;
   void *_FieldA6;
};

Consider what the compiler's going to do with a variable of type struct A.  The first thing it does is to lay the structure out in "memory":

Field Index  Field Name  Field Size
0            _FieldA1    4
1            _FieldA2    1
2            _FieldA3    2
3            _FieldA4    1
4            _FieldA5    4
5            _FieldA6    4 (8 on 64 bit)

The next thing it does is to assign an offset for each field:

Field Index  Field Name  Field Size       Field Offset
0            _FieldA1    4                0
1            _FieldA2    1                4
2            _FieldA3    2                6
3            _FieldA4    1                8
4            _FieldA5    4                12
5            _FieldA6    4 (8 on 64 bit)  16

This is where the "natural" alignment comes into play - the natural alignment for _FieldA3 is 2 bytes (it's a word), so it gets put at offset 6.  That leaves some empty space in the structure between _FieldA2 and _FieldA3 (and between _FieldA4 and _FieldA5).  The overall "sizeof" struct A is 20 bytes (24 on 64 bit platforms).
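You don't have to work the offsets out by hand: `offsetof` from `<stddef.h>` reports what the compiler actually chose. The sketch below declares the same fields as struct A, minus the leading underscores (names of that shape are reserved in standard C), and `print_layout_of_A` is a hypothetical helper. The later offsets depend on the platform: the numbers in the text assume a 4-byte long, which is true on Windows but not on most 64 bit Unix systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same fields as struct A in the text, without the leading underscores. */
struct A
{
    int   FieldA1;
    char  FieldA2;
    short FieldA3;
    char  FieldA4;
    long  FieldA5;
    void *FieldA6;
};

/* Prints each field's offset and the total size as this compiler sees them. */
void print_layout_of_A(void)
{
    /* Every field must start on a multiple of its own alignment. */
    assert(offsetof(struct A, FieldA3) % _Alignof(short) == 0);
    assert(offsetof(struct A, FieldA5) % _Alignof(long) == 0);

    printf("FieldA1 at %zu\n", offsetof(struct A, FieldA1));
    printf("FieldA2 at %zu\n", offsetof(struct A, FieldA2));
    printf("FieldA3 at %zu\n", offsetof(struct A, FieldA3));
    printf("FieldA4 at %zu\n", offsetof(struct A, FieldA4));
    printf("FieldA5 at %zu\n", offsetof(struct A, FieldA5));
    printf("FieldA6 at %zu\n", offsetof(struct A, FieldA6));
    printf("sizeof(struct A) = %zu\n", sizeof(struct A));
}
```

Calling `print_layout_of_A()` from `main` lists the offsets for whatever platform the code was built for.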

For a given structure, the alignment of the structure is determined by the worst case field within the structure, that is, the field with the strictest alignment.  So in struct A's case, the alignment of the structure is either 4 bytes (on 32 bit platforms) or 8 bytes (on 64 bit platforms).
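C11's `_Alignof` lets you check the "worst case field" rule directly. The three structures below are invented for illustration, as are the helpers `stricter` and `expected_alignment_of_WithDword`; the fixed-width types from `<stdint.h>` are used so the sizes don't vary between platforms. The values in the comments are what mainstream compilers give when no packing pragma is in effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct OnlyBytes { uint8_t a; uint8_t b; uint8_t c; };    /* strictest field: 1 byte  */
struct WithWord  { uint8_t a; uint16_t b; };              /* strictest field: 2 bytes */
struct WithDword { uint8_t a; uint16_t b; uint32_t c; };  /* strictest field: 4 bytes */

/* Returns the larger (stricter) of two alignments. */
size_t stricter(size_t x, size_t y)
{
    return x > y ? x : y;
}

/* Computes the alignment struct WithDword should have under the rule:
   the strictest alignment among its fields. */
size_t expected_alignment_of_WithDword(void)
{
    size_t worst = stricter(_Alignof(uint8_t), _Alignof(uint16_t));
    worst = stricter(worst, _Alignof(uint32_t));
    assert(worst == _Alignof(struct WithDword));
    return worst;
}
```

Note that sizeof is always a whole multiple of the structure's alignment, which is why struct WithDword occupies 8 bytes even though its fields only add up to 7.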

When the compiler lays out nested structures, it just follows its rules recursively.  Consider this structure:

struct B
{
   short _FieldB1;
   struct A _FieldB2;
   int _FieldB3;
   struct A _FieldB4;
};

Field Index  Field Name         Field Size
0            _FieldB1           2
1            _FieldB2._FieldA1  4
2            _FieldB2._FieldA2  1
3            _FieldB2._FieldA3  2
4            _FieldB2._FieldA4  1
5            _FieldB2._FieldA5  4
6            _FieldB2._FieldA6  4 (8 on 64 bit)
7            _FieldB3           4
8            _FieldB4._FieldA1  4
9            _FieldB4._FieldA2  1
10           _FieldB4._FieldA3  2
11           _FieldB4._FieldA4  1
12           _FieldB4._FieldA5  4
13           _FieldB4._FieldA6  4 (8 on 64 bit)

And again, the compiler then assigns offsets to the fields:

Field Index  Field Name         Field Size       Field Offset
0            _FieldB1           2                0
1            _FieldB2._FieldA1  4                4+0=4 (8+0=8 on 64 bit)
2            _FieldB2._FieldA2  1                4+4=8 (8+4=12 on 64 bit)
3            _FieldB2._FieldA3  2                4+6=10 (8+6=14 on 64 bit)
4            _FieldB2._FieldA4  1                4+8=12 (8+8=16 on 64 bit)
5            _FieldB2._FieldA5  4                4+12=16 (8+12=20 on 64 bit)
6            _FieldB2._FieldA6  4 (8 on 64 bit)  4+16=20 (8+16=24 on 64 bit)
7            _FieldB3           4                24 (32 on 64 bit)
8            _FieldB4._FieldA1  4                28 (40 on 64 bit)
9            _FieldB4._FieldA2  1                28+4=32 (40+4=44 on 64 bit)
10           _FieldB4._FieldA3  2                28+6=34 (40+6=46 on 64 bit)
11           _FieldB4._FieldA4  1                28+8=36 (40+8=48 on 64 bit)
12           _FieldB4._FieldA5  4                28+12=40 (40+12=52 on 64 bit)
13           _FieldB4._FieldA6  4 (8 on 64 bit)  28+16=44 (40+16=56 on 64 bit)

Note that _FieldB2._FieldA1 has different offsets depending on whether it's 32bit or 64bit - this is because of the rule I mentioned earlier - _FieldB2 is aligned to the natural alignment of struct A, which is 4 bytes on 32 bit platforms and 8 bytes on 64 bit platforms.
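The "base offset plus inner offset" sums in the table can be reproduced with `offsetof`. In this sketch the structures are renamed `Inner` and `Outer` (stand-ins for struct A and struct B, again without the reserved leading underscores), and `offset_of_B2_A3` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct A. */
struct Inner
{
    int   FieldA1;
    char  FieldA2;
    short FieldA3;
    char  FieldA4;
    long  FieldA5;
    void *FieldA6;
};

/* Stand-in for struct B: two copies of Inner with small fields between. */
struct Outer
{
    short        FieldB1;
    struct Inner FieldB2;
    int          FieldB3;
    struct Inner FieldB4;
};

/* Offset of a nested field = offset of the nested structure within Outer
   plus the offset of the field within Inner. */
size_t offset_of_B2_A3(void)
{
    /* The nested structure starts on a multiple of its own alignment. */
    assert(offsetof(struct Outer, FieldB2) % _Alignof(struct Inner) == 0);
    return offsetof(struct Outer, FieldB2) + offsetof(struct Inner, FieldA3);
}
```

Because FieldB1 is only 2 bytes, FieldB2 lands at the first multiple of the inner structure's alignment, which is the 4 versus 8 difference described above.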

It's also interesting to see what happens with a slight rearrangement of the fields in struct A:

struct A
{
   int _FieldA1;
   char _FieldA2;
   char _FieldA4;
   short _FieldA3;
   long _FieldA5;
   void *_FieldA6;
};

All I did was to move _FieldA4 up next to _FieldA2.  But this change dramatically changed what happens when the structure is laid out in memory:

Field Index  Field Name  Field Size       Field Offset
0            _FieldA1    4                0
1            _FieldA2    1                4
3            _FieldA4    1                5
2            _FieldA3    2                6
4            _FieldA5    4                8
5            _FieldA6    4 (8 on 64 bit)  12 (16 on 64 bit)

So on 32 bit platforms, just moving one field shrank the structure's memory footprint by 4 bytes!  Note that the size of the data contained in the structure didn't change at all - all that changed was the packing of the data in the structure.  Also note that the total size of the structure didn't change on 64 bit platforms - this is because _FieldA6 is an 8 byte pointer there and has to start at a multiple of 8, so the 4 bytes that were saved reappear as padding between _FieldA5 (which ends at offset 12) and _FieldA6 (at offset 16).
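Here is a small sketch that measures the effect of the reordering. `Original` and `Reordered` are stand-in names for the two versions of struct A, `int32_t` and `int16_t` replace int, long and short so the field sizes match the table on any platform, and `bytes_saved_by_reordering` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order from the first version of struct A. */
struct Original
{
    int32_t FieldA1;
    char    FieldA2;
    int16_t FieldA3;
    char    FieldA4;
    int32_t FieldA5;
    void   *FieldA6;
};

/* Same fields, with FieldA4 moved up next to FieldA2. */
struct Reordered
{
    int32_t FieldA1;
    char    FieldA2;
    char    FieldA4;
    int16_t FieldA3;
    int32_t FieldA5;
    void   *FieldA6;
};

/* How many bytes the reordering saves on the platform being compiled for. */
size_t bytes_saved_by_reordering(void)
{
    assert(sizeof(struct Reordered) <= sizeof(struct Original));
    return sizeof(struct Original) - sizeof(struct Reordered);
}
```

With 4-byte pointers this should report 4 bytes saved; with 8-byte pointers it reports 0, because the pointer's alignment puts the padding back.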

So let's go back to the original question about WIN32_FIND_DATAA.  The structure in winbase.h is:

typedef struct _WIN32_FIND_DATAA {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
   :

How does this get laid out in memory?

Let's run through the exercise above.

A FILETIME is:

typedef struct _FILETIME {
    DWORD dwLowDateTime;
    DWORD dwHighDateTime;
} FILETIME, *PFILETIME, *LPFILETIME;

So, continuing as before:

Field Index  Field Name                     Field Size  Field Offset
0            dwFileAttributes               4           0
1            ftCreationTime.dwLowDateTime   4           4
2            ftCreationTime.dwHighDateTime  4           8
:            :                              :           :

And the mystery of my reader is solved - ftCreationTime is at offset 4 because a FILETIME contains only 4 byte members, so its natural alignment is 4 bytes, not 8.
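The same thing can be shown without the Windows headers. In the sketch below, `FileTimeLike` and `FindDataLike` are stand-ins for FILETIME and the first two fields of WIN32_FIND_DATAA, with `uint32_t` in place of DWORD; `FindData64Like` is a hypothetical variant, included only for contrast, that stores the timestamp as a single 64 bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for FILETIME: two 4-byte halves, so 8 bytes in size
   but only 4-byte aligned. */
struct FileTimeLike
{
    uint32_t dwLowDateTime;
    uint32_t dwHighDateTime;
};

/* Stand-in for the start of WIN32_FIND_DATAA. */
struct FindDataLike
{
    uint32_t            dwFileAttributes;
    struct FileTimeLike ftCreationTime;
};

/* Hypothetical variant: one 8-byte integer, which is typically
   8-byte aligned and so gets padding in front of it. */
struct FindData64Like
{
    uint32_t dwFileAttributes;
    uint64_t ftCreationTime;
};

/* Offset of the timestamp in the FILETIME-style layout. */
size_t creation_time_offset(void)
{
    assert(_Alignof(struct FileTimeLike) == _Alignof(uint32_t));
    return offsetof(struct FindDataLike, ftCreationTime);
}
```

The size of a field and its alignment are separate properties: both timestamps are 8 bytes long, but only the single 64 bit integer asks for 8 byte alignment.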

And finally, let's consider the INTL_DATA structure mentioned above - why did I say that it didn't work?

Field Index  Field Name           Field Size  Field Offset
0            _DateFormat          2           0
1            _CurrencySymbol      5           2
2            _ThousandsSeparator  2           7
3            _DecimalSeparator    2           9
4            _DateSeparator       2           11
5            _TimeSeparator       2           13
6            _Padding             1           15
7            _CurrencyPlaces      1           16
8            _TimeFormat          1           17
9            _CaseMapCallAddress  4           20
10           _DataListSeparator   2           24
11           _Reserved            10          26

We're just fine up until we get to the _CaseMapCallAddress field.  That one's supposed to be at offset 18 (0x12) according to the MS-DOS documentation, but it's at offset 20 in the C structure!  This is because the natural alignment of a far pointer to a function was 32 bits, even on Win16.
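The mismatch is easy to reproduce on a modern compiler. This sketch declares the naive structure with `uint16_t` for WORD, `uint8_t` for BYTE, and a `uint32_t` standing in for the 4-byte far pointer (an assumption, since today's pointers aren't far pointers); the names `IntlDataNaive`, `DOS_CASE_MAP_OFFSET`, `DOS_INTL_DATA_SIZE` and `layout_matches_dos` are invented for illustration. The DOS numbers come straight from the table at the top of the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The naive declaration from the text, with default compiler packing. */
struct IntlDataNaive
{
    uint16_t DateFormat;
    char     CurrencySymbol[5];
    char     ThousandsSeparator[2];
    char     DecimalSeparator[2];
    char     DateSeparator[2];
    char     TimeSeparator[2];
    uint8_t  Padding;
    uint8_t  CurrencyPlaces;
    uint8_t  TimeFormat;
    uint32_t CaseMapCallAddress;  /* stand-in for the 4-byte far pointer */
    uint8_t  DataListSeparator[2];
    uint8_t  Reserved[10];
};

/* What the MS-DOS table says: case-map address at 0x12, and the last
   field (10 bytes at 0x18) ends at 0x22. */
enum { DOS_CASE_MAP_OFFSET = 0x12, DOS_INTL_DATA_SIZE = 0x22 };

/* Returns 1 only if the compiler's layout agrees with the DOS layout. */
int layout_matches_dos(void)
{
    /* The byte-sized fields before the pointer need no padding. */
    assert(offsetof(struct IntlDataNaive, TimeFormat) == 0x11);
    return offsetof(struct IntlDataNaive, CaseMapCallAddress) == DOS_CASE_MAP_OFFSET
        && sizeof(struct IntlDataNaive) == DOS_INTL_DATA_SIZE;
}
```

With default packing `layout_matches_dos()` returns 0: the 4-byte field is pushed from offset 18 to offset 20, and every field after it is off by two bytes as well.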

Tomorrow, I'll write about how this was resolved for MS-DOS clients (and refine the packing algorithm mentioned above).

Most of the content in this post had already been posted in GrantRi's blog post Alignment from last summer; it's an excellent reference on the topic.  In addition, Raymond Chen wrote about the FILETIME issue here.

  • I did not complain about the need for #pragma pack, I complained that:

    (1) The MSDN pages misstate the rules and need editing, and

    (2) In the on-line version of MSDN, at least one page contains a broken link which was supposed to go to the #pragma pack page.

    Thank you for writing the correct rules. Now if you read those MSDN pages again you will see that they state incorrect rules, and you know exactly what they should be changed to.

    > I realized after reading this article that
    > many people have forgotten the lessons
    > learned from the early days of MS-DOS.

    Either that or you think some of us forgot lessons from VMS (a predecessor of XP) or Unix (another predecessor of XP). But no, it's not that. If DOS rules were supposed to take precedence over MSDN then MSDN should not state any rules at all, MSDN should tell readers to experiment under DOS in order to figure out what Visual Studio's rules are.

    If Raymond Chen reads this then I ask Mr. Chen to please observe that maybe there are reasons why developers have to rely on undocumented characteristics instead of believing documentation. Sigh.
  • Sorry, I'm a little confused on what the offsets in the structure are used for..

    I mean when you ask for data within a structure the compiler's going to point that at the final offset, not where you would expect it to be, right?
  • Norman,
    I don't believe that VMS used packed structures, did it? I'm surprised if it did, since it was designed for high level languages (which MS-DOS wasn't). The problem occurs when you use structures designed for assembly language with higher level languages (which are designed around different constraints). Tomorrow I'm planning on writing about why the compiler specifies the alignment rules that it does.

    Manip: The "offset" in the tables above (sorry about the look of them - it's a community server bug and I'm trying to figure out how to work around it - the tables looked fine in the source) is the offset from the start of the structure that will be used by the compiler when it lays the structure out in memory. The thing is that structures don't always show up at the offset you expect them to, because of the hidden padding fields that get added.
  • Whenever you mix two languages (e.g. VB and VC++) or even when you mix two compilers for the same language etc., you have to check quite a lot of things that they do. Alignment is one of many things that need checking. I did have to check what several Microsoft compilers are doing, and also checked MSDN with the expectation of seeing what they're supposed to be doing (which of course led to this discussion).

    'Fraid I don't recall if or what options there were in VMS C to specify alignment. Whether or not there were options, there were surely rules, and anyone who had to link together program components in multiple languages still had to investigate everything that was going on.

    Even in a single compiler environment, if one reads a .bmp file from disk or reads a TCP/IP packet from an ethernet chip, one has to know if the compiler's alignment rules will be compatible or if one has to do memcpy() to get stuff. A few years ago I had to do exactly that fix to colleague's program which read .bmp files from compact flash under Windows CE.
  • Larry - VMS (or DEC , at least) *does* generate packed structures. I've just compiled and linked the following code on VC++7.1 and DEC C 5.6

    #include <stdio.h>

    typedef struct
    {
        char a;
        int b;
    } A;

    int main(int argc, char** argv)
    {
        A x;
        /* Offsets are computed by address arithmetic and cast to match %lu. */
        printf("Offset of a in x = %lu\n", (unsigned long)((char *)&x.a - (char *)&x));
        printf("Offset of b in x = %lu\n", (unsigned long)((char *)&x.b - (char *)&x));
        printf("sizeof(x) = %lu\n", (unsigned long)sizeof(x));
        return 0;
    }

    On the PC, this prints out:

    Offset of a in x = 0
    Offset of b in x = 4
    sizeof(x) = 8

    On VAX/VMS, it prints out:

    Offset of a in x = 0
    Offset of b in x = 1
    sizeof(x) = 5

    i.e. the struct is packed.
  • Larry,

    I think you're right about VMS. I certainly remember running into alignment issues in one enormous structure my boss had created, and being utterly baffled by what was going on. I'd copied a block of memory from this structure and was using an offset to access one of the members, but for some reason the value wasn't what I expected. I'd only been programming for a few months and I was convinced that there was a compiler bug somewhere! Ahh the arrogance of youth!

    IIRC we ended up putting a bunch of packing bytes in places to make things more obvious.
  • A little off-topic maybe, but I think the mastermind behind #pragma pack should be taken out back and be given a good round of beating (a friend of mine suggested he/she should be shot, but I believe in giving everyone a second chance).
    Why on earth would you have a #pragma for packing?!? Packing should be specified separately for each struct not for a whole compilation unit!!!
    Uhm, sorry for the rant, but this post made me recollect some not very fond memories of strange, difficult to find bugs.
  • You'll have to admit that this is one of the things that Pascal (at least what's used in Delphi) got right. If you're using a record in a file, you can just write:

    type
      TMyRecord = packed record
        B: Byte;
        D: LongInt;
      end;

    SizeOf(TMyRecord) will now return 5. That's much better than fiddling with #pragmas.
  • VMS avoided alignment issues by ordering structures. Sure, VAXC would pack structures, but you would get performance hits with alignment problems on the Alpha processor. The common (and quick) thing was to place your largest members first and then descend in size.

    struct X
    {
    INT32 bob;
    INT32 frank;
    INT16 george;
    INT8 tom;
    INT8 harry;
    };

    Of course, this can break down over time with persistent structures.
  • From the Compaq C user documentation

    4.7 Structure Alignment

    The alignment and size of a structure is affected by the alignment requirements and sizes of the structure components for each Compaq C platform. A structure can begin on any byte boundary and occupy any integral number of bytes. However, individual architectures or operating systems can specify particular alignment and padding requirements.

    Compaq C on VAX processors does not require that structures or structure members be aligned on any particular boundaries.

    The components of a structure are laid out in memory in the order they are declared. The first component has the same address as the entire structure. On VAX processors, each additional component follows its predecessor in the immediately following byte.

    For example, the following type is aligned as shown in Figure 4-1:


    struct {char c1;
    short s1;
    float f;
    char c2;
    }

    Figure 4-1 OpenVMS VAX Structure Alignment

    The alignment of the entire structure can occur on any byte boundary, and no padding is introduced. The float variable f may span longwords, and the short variable s1 may span words.

    The following pragma can be used to force specific alignments:


    #pragma member_alignment

    Structure alignment for Compaq C for OpenVMS Systems on VAX processors is achieved by the default, #pragma nomember_alignment , which causes data structure members to be byte-aligned (with the exception of bit-field members).

    Structure alignment for Compaq C for OpenVMS Systems on Alpha processors is achieved by the default, #pragma member_alignment , which causes data structure members to be naturally aligned. This means that data structure members are aligned on the next boundary appropriate to the type of the member, rather than on the next byte.

    For more information on the #pragma member_alignment preprocessor directive, see Section 5.4.11.

    So there you go.
  • Yesterday, I wrote a bit about how the C compiler determines the alignment of...
  • I hadn't dealt with packing in quite a while (probably not since the MS-DOS days when I used to calculate how big my data files would be), and then one bit me recently - I included somebody's header file that had data structures and there was a #pragma pack(1) in there without a #pragma pack(push) / #pragma pack(pop) around it! Took a while to chase that one down.
  • Very good! This is the best article that explains the problem of data alignment so clearly.