Thoughts on bit fields.

Published 11 January 05 09:09 AM

In C there is a long tradition of using bit fields to store a collection of boolean values:

 

    enum

    {

        TF_KEYWORD          = 0x0001,

        TF_MEMBER           = 0x0002,

        TF_IDENTIFIER       = 0x0004,

        TF_STRINGLITERAL    = 0x0008,

        TF_CHARACTERLITERAL = 0x0010,

    } TOKENFLAGS;

 

    DWORD dwTokenFlags = TF_KEYWORD | TF_MEMBER;

 

This tradition has been carried over to C#, which is my area of interest.  The support for bit fields in C# is a little different than C++, but my opinion of them hasn’t changed.

 

Upsides:

 

Potetential performance improvements due to reduced memory consumption.  If 'bool' takes 8 bits, and a bit flag takes one, the 8:1 memory saves could concievably have a significant effect on performance. Of course, without real measurements you don't really know.

 

Downsides:

 

They're hard to define correctly.  For each value, I have to write a definition that gives it a single, unique bit.  If I forget that I'm writing in hex, I get a difficult to diagnose bug.

 

    [Flags]

    enum TokenFlags

    {

        Keyword               = 0x0000,

        Member                = 0x0001,

        Identifier            = 0x0002,

        StringLiteral         = 0x0004,

        CharacterLiteral      = 0x0008,

        VerbatimStringLiteral = 0x0016,

    }

 

Did you notice that two of these items are broken?

 

It's hard to add new ones.  If I want to insert a new value in the middle, I have to renumber all the remaining items.

 

In the debugger it's hard to know what the values are.

 

 

The code to manipulate them is hard to write correctly and hard to read.

 

    // this means "if either one is set"

    if (dwTokenFlags & (TF_CHARACTERLITERAL | TF_IDENTIFIER))                                            

    // this means "if both are set"

    if (dwTokenFlags & (TF_CHARACTERLITERAL | TF_IDENTIFIER) == (TF_CHARACTERLITERAL | TF_IDENTIFIER))   

    // unset this flag

    dwTokenFlags &= ~TF_CHARACTERLITERAL;  

 

 

If there are rules about which flag combinations are legal, it's hard to enforce.

 

You can't add a constructor to initalize the flags to a default.

 

You can't add useful methods for checking aggregate combinations of flags.   

 

It's tempting to add unrelated flags in the same enum, and then hard to split them out later.

 

It's tempting to use a single bit for multiple meanings in different contexts.

 

    enum NODEFLAGS

    {

        // Only valid when the node is a MODIFIERNODE

        NF_MOD_ABSTRACT         = 0x0001,

        NF_MOD_NEW              = 0x0002,

        NF_MOD_OVERRIDE         = 0x0004,

        NF_MOD_PRIVATE          = 0x0008,

        NF_MOD_PROTECTED        = 0x0010,

        NF_MOD_INTERNAL         = 0x0020,

        NF_MOD_PUBLIC           = 0x0040,

 

        // Only valid when the node is a STATEMENTNODE

        NF_UNCHECKED            = 0x0001,

        NF_GOTO_CASE            = 0x0002,

        NF_TRY_CATCH            = 0x0004,

        NF_TRY_FINALLY          = 0x0008,

        NF_CONST_DECL           = 0x0010,

    };

 

If you get more than 32 of them, you’re out of luck.  We have a case in our code today with 31 distinct values.  We’re worried that we may need to add 2 soon…. Yikes!

 

It’s easy to mix up two types of flags:

 

    // Type flags

    DWORD dwFlags = TF_CLASS | TF_PUBLIC;

 

    // Member flags

    DWORD dwMember = MF_METHOD | MF_PRIVATE;

 

    if (dwMember | TF_PUBLIC)   // OOPS!

 

Potential for improved performance due to better memory alignment.  Every time you access a bit flag, you have to execute multiple CPU instructions. 'bool' takes fewer instructions. This also means your code is smaller.  Of course, without real measurements you don't really know.

 

 

A simple alternative: a class containing public 'bool' fields.

 

        class TokenFlags

        {

            public bool Keyword;

            public bool Member;

            public bool Identifier;

            public bool StringLiteral;

            public bool Characterliteral;

        }

 

        TokenFlags tf = ...;

        if (tf.Characterliteral || tf.Identifier)

 

This is easy to write and easy to understand.  It solves most of the problems I listed above.

 

 

A hybrid approach: Encapsulated bit fields.

 

Hey, this is 2005, I should think about encapsulation.  If I do some performance analysis & find that my application is too slow because of memory consumed by bools in Flags types that look like the one I showed above, it’s easy to convert:

 

       class TokenFlags

        {

            enum Indexes

            {

                Keyword,

                Member,

                Identifier,

                StringLiteral,

                Characterliteral,

            }

 

            BitArray _bits = new BitArray(Enum.GetValues(Indexes).Length);

 

            public bool Keyword

            {

                get { return this._bits[Indexes.Keyword]; }

                set { this._bits[Indexes.Keyword] = value; }

            }

 

            public bool Member

            {

                get { return this._bits[Indexes.Member]; }

                set { this._bits[Indexes.Member] = value; }

            }

 

            public bool Identifier

            {

                get { return this._bits[Indexes.Identifier]; }

                set { this._bits[Indexes.Identifier] = value; }

            }

 

            public bool StringLiteral

            {

                get { return this._bits[Indexes.StringLiteral]; }

                set { this._bits[Indexes.StringLiteral] = value; }

            }

 

            public bool Characterliteral

            {

                get { return this._bits[Indexes.Characterliteral]; }

                set { this._bits[Indexes.Characterliteral] = value; }

            }

 

Notice that this is source compatible with the ‘public bool’ approach, except if you pass one of the members with ‘ref’.  If you’re worried about that, or binary compatibility, then use properties instead of public fields.  But you know how I am about properties.

 

 

 

What about OO?

 

Good question.  It’s taken me this far to consider OO. 

 

It depends on the nature of the problem domain that you’re trying to work in, but there is surely an OO way, right?

 

For example, suppose we did:

 

        interface IToken { }

        interface KeywordToken : IToken { }

        interface MemberToken : IToken { }

        interface IdentifierToken : IToken { }

        interface StringLiteralToken : IToken { }

        interface CharacterLiteralToken : IToken { }

 

Well, I can write:

 

        IToken token = ...;

        if (token is KeywordToken)

           F();

 

But now I can start taking code that was on the clients of the flags, and move it on to the tokens.  

 

Before:

 

            if (token is KeywordToken)

            {

                FooForKeyword()

            }

            else

            {

                FooForOthers()

            }

 

After:

 

        interface IToken

        {

            void Foo();

        }

 

        token.Foo();

 

 

Comments

# Jerry Pisk said on January 11, 2005 10:07 AM:
You can have more than 32 flags, just define the enum with long (or UInt64) as its base type.
# Jeremy said on January 11, 2005 11:22 AM:
I always used to number my bit field constants as
1<<0
1<<1
1<<2
etc. so I could avoid as much thinking as possible. :)
# RichB said on January 11, 2005 11:33 PM:
> In the debugger it's hard to know what the values are.

That's the debugger's fault. The debugger should have a built in visualizer for a Flags Enum.

> If you get more than 32 of them, you’re out of luck


[Flags]
enum x : long {a = 0, b = 1099511627776}

compiles fine for me.

> A simple alternative...

One of the major benefits of an enum is the ability to reflect over it. For example, I have a "privs" system which can have any number of privs from A-Z. The set of privs is XmlSerialized to disk like:
<privs>A C X Y Z</privs>
and is easily XmlDeserialized back again. New privilege assignments can be checked against the allowable set of privs (which are defined in an flags enum).
Obviously, it wouldn't be too much pain to hand code some of this validation and serialization based on a custom class, but I get all this for free. And free means fewer bugs in my code.

However, I agree that a Flags enum can be error-prone in trivial ways. I wish there was more compiler support for them - like
public flagsenum level {None, morning, afternoon }
level myLevel=level.morning+level.afternoon;
myLevel-=level.afternoon;
# Anon said on January 12, 2005 3:35 PM:
>> Did you notice that two of these items are broken?

Can you explain that more? Since I *didn't* notice it...
# jaybaz [MS] said on January 12, 2005 4:11 PM:
Anon: The first item was 0 (should start at 1) and the last item was 0x16 (should go to 0x10, then 0x20).
# Kent Boogaart said on January 16, 2005 6:38 PM:
I often wondered why the compiler couldn't give more special treatment to enumerations marked with FlagsAttribute. Why can't it automatically assign 2^n to each enumeration entry's value?

And why can't the Enum class have special methods to deal with flag enumerations? For example, Enum.IsSet(typeof(TokenFlags), TokenFlags.Keyword)
New Comments to this post are disabled

This Blog

Syndication

Page view tracker