One can really never get enough of puns about the BOM (Byte Order Mark) and TSA.

And when I say one, I mean I. :-)

Just think back to blogs like "Don't sneak a BOM in on someone who promises to ignore free space" or "Everyone seems averse to the BOM these days; Should we blame TSA? :-)" or "How to get yourself imprisoned [by/for talking about Unicode]".

See what I mean?

I was reminded of this when Pritam asked:

Is there any tool or code available to verify Byte Order Mark signature in XML files?

Of course sniffing out a few bytes is easy enough. Abhinaba provided the full chart of valid BOM values:

Bytes          Encoding Form
00 00 FE FF    UTF-32, big-endian
FF FE 00 00    UTF-32, little-endian
FE FF          UTF-16, big-endian
FF FE          UTF-16, little-endian
EF BB BF       UTF-8

Easy, right?

Okay, anyone want to take a shot at writing a minimal BOM detector?

Think of it as a way to play your part in airport security!

Points awarded for clearest, or for most concise, or for briefest, or for most clever, or for the sake of maintainability, most smart.

If you can write something able to handle other, non-standard byte orderings of data, then you probably went to Cal Tech! :-)
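
To get the ball rolling, here is one possible entry: a minimal sketch in Python, not a definitive answer. The names BOMS, detect_bom, and detect_bom_in_file are made up for this illustration. The one trap in the chart is that the UTF-32 little-endian signature (FF FE 00 00) starts with the UTF-16 little-endian one (FF FE), so the four-byte signatures have to be checked first.

```python
# Signatures from the chart above, longest first, so that
# FF FE 00 00 (UTF-32 LE) is tested before FF FE (UTF-16 LE).
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32, big-endian"),
    (b"\xff\xfe\x00\x00", "UTF-32, little-endian"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16, big-endian"),
    (b"\xff\xfe", "UTF-16, little-endian"),
]


def detect_bom(data):
    """Return the encoding form named by a leading BOM, or None if there is none."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name
    return None


def detect_bom_in_file(path):
    """Sniff the first four bytes of a file (hypothetical helper for illustration)."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Calling detect_bom_in_file on an XML file returns one of the names from the chart, or None when no BOM is present. One caveat: a UTF-16 little-endian file whose first character after the BOM is U+0000 would be reported as UTF-32 little-endian. Since XML does not allow U+0000, that should not bite for Pritam's case, but it is an assumption worth knowing about for other kinds of files.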


This post brought to you by U+FEFF (aka ZERO WIDTH NO-BREAK SPACE)