Wednesday, January 07, 2009 7:01 AM
Michael S. Kaplan
Someone please detect if there's a BOM before the plane takes off!
One can really never get enough of puns about the BOM (Byte Order Mark) and TSA.
And when I say one, I mean I. :-)
Just think back to blogs like "Don't sneak a BOM in on someone who promises to ignore free space" or "Everyone seems averse to the BOM these days; Should we blame TSA?" :-) or "How to get yourself imprisoned [by/for talking about Unicode]".
See what I mean?
I was reminded of this when Pritam asked:
Is there any tool or code available to verify Byte Order Mark signature in XML files?
Of course sniffing out a few bytes is easy enough. Abhinaba provided the full chart of valid BOM values:
| Bytes | Encoding Form |
|---|---|
| 00 00 FE FF | UTF-32, big-endian |
| FF FE 00 00 | UTF-32, little-endian |
| FE FF | UTF-16, big-endian |
| FF FE | UTF-16, little-endian |
| EF BB BF | UTF-8 |
Easy, right?
Okay, anyone want to make a try at writing the minimal code BOM detector?
Think of it as a way to play your part in airport security!
Points awarded for clearest, or for most concise, or for briefest, or for most clever, or for the sake of maintainability, most smart.
If you can write something able to handle other, non-standard byte orderings of data, then you probably went to Cal Tech! :-)
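For what it's worth, here is one possible entry in Python: a minimal sketch, not a definitive answer. The names `BOMS`, `detect_bom`, and `detect_bom_in_file` are made up for illustration; the byte values come from the constants in Python's standard `codecs` module, which match the chart above. The only subtle part is the ordering: the UTF-32 little-endian BOM (FF FE 00 00) starts with the UTF-16 little-endian BOM (FF FE), so the four-byte signatures have to be checked first.

```python
import codecs

# Longest signatures first: FF FE 00 00 (UTF-32 LE) begins with FF FE (UTF-16 LE),
# so checking UTF-16 first would misreport every UTF-32 LE file.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),     # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),                      # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),     # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),  # FF FE
]


def detect_bom(data: bytes):
    """Return the encoding form named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_bom_in_file(path):
    """Hypothetical helper: sniff only the first four bytes of a file."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

One caveat this sketch does not resolve: a UTF-16 little-endian file whose first character after the BOM is U+0000 also starts with FF FE 00 00, so it would be reported as UTF-32. That is rare in practice, and for XML in particular it should not come up, but a careful detector might want more context than four bytes.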
This post brought to you by U+feff (aka ZERO WIDTH NO-BREAK SPACE)