The Interlocked­Push­Entry­SList function stipulates that all list items must be aligned on a MEMORY_ALLOCATION_ALIGNMENT boundary. For 32-bit Windows, MEMORY_ALLOCATION_ALIGNMENT is 8, but the SLIST_ENTRY structure itself does not have a DECLSPEC_ALIGN(8) attribute. Even more confusingly, the documentation for SLIST_ENTRY says that the 64-bit structure needs to be 16-byte aligned but says nothing about the 32-bit structure. So what are the memory alignment requirements for a 32-bit SLIST_ENTRY, 8 or 4?

It's 8. No, 4. No wait, it's both.

Officially, the alignment requirement is 8. Earlier versions of the header file did not stipulate 8-byte alignment, and changing the declaration would have resulted in existing structures which (inadvertently) misaligned the field changing size and layout when the new requirement was imposed. So the 32-bit structure was sort-of grandfathered in. You should still align it on 8-byte boundaries, but the header file doesn't enforce it to avoid breaking existing code.

Fortunately, when the 64-bit version was introduced, the proper alignment directive was introduced right off the bat. How about that: sometimes Microsoft learns from its mistakes after all.

Why are the alignment requirements greater than the natural word size? To avoid the ABA problem. A standard workaround for the ABA problem is to append additional information (a "tag") to the pointer so that when the value changes from B back to A, the tag ensures that the second A still looks different from the first one. Many CPU architectures have a "double-pointer-sized atomic compare-and-swap" instruction, and some of them have the additional requirement that the double-pointer needs to be on a double-pointer boundary (8 bytes for 32-bit pointers and 16 bytes for 64-bit pointers).

"But wait, the double-pointer compare-and-swap is used on the SLIST_HEADER, not on the SLIST_ENTRY. Why does the SLIST_ENTRY need to be double-pointer aligned, too?"

While it's true that many CPU architectures have a "double-pointer-sized atomic compare-and-swap" instruction, some support only a "pointer-sized atomic compare-and-swap". For example, the original AMD64 architecture did not have a CMPXCHG16B instruction; the largest data size for an atomic compare-and-swap was 8 bytes. As a result, the Slist functions need to pack a 64-bit pointer, a list depth, and tag information into a single 64-bit value. One of the tricks they used was imposing a memory alignment of 16 bytes. This freed up four bits in the pointer for use as a tag.