Holy cow, I wrote a book!
A customer needed to generate an 8-byte unique value,
and their initial idea was to generate a GUID
and throw away the second half, keeping the first eight bytes.
They wanted to know if this was a good idea.
No, it's not a good idea.
The GUID generation algorithm
relies on the fact that it has all 16 bytes to use to establish
and if you throw away half of it,
you lose the uniqueness.
There are multiple GUID generation algorithms,
but I'll pick one of them for concreteness,
specifically the version described in
this Internet draft.
The first 60 bits of the GUID encode a timestamp,
the precise format of which is not important.
The next four bits are always 0001, which identify
that this GUID was generated by "algorithm 1".
The version field is necessary to ensure that two GUID generation
algorithms do not accidentally generate the same GUID.
The algorithms are designed so that a particular algorithm
doesn't generate the same GUID twice, but without a version field,
there would be no way to ensure that some other algorithm wouldn't
generate the same GUID by some systematic collision.
The next 14 bits are "emergency uniquifier bits"; we'll look
at them later, because they are the ones that fine tune the
The next two bits are reserved and fixed at 01.
The last 48 bits are the unique address of the computer's network card.
If the computer does not have a network card,
set the top bit and use a random number generator for the other 47.
No valid network card will have the top bit set in its address,
so there is no possibility that a GUID generated from a computer
without a network card will accidentally collide with a GUID generated
from a computer with a network card.
Once you take it apart, the bits of the GUID break down like this:
for a total of 128 bits.
The goal of this algorithm is to use the combination of time and location
("space-time coordinates" for the relativity geeks out there)
as the uniqueness key.
However, timekeeping is not perfect,
so there's a possibility that, for example,
two GUIDs are generated in rapid succession from the same machine,
so close to each other in time that the timestamp would be the same.
That's where the uniquifier comes in.
When time appears to have stood still (if two requests for a GUID are
made in rapid succession)
or gone backward
(if the system clock is set to a new time earlier than what it was),
the uniquifier is incremented so that GUIDs generated from the
"second time it was five o'clock" don't collide with those generated
"the first time it was five o'clock".
Once you see how it all works, it's clear that you can't just
throw away part of the GUID since all the parts (well, except
for the fixed parts) work together to establish the uniqueness.
If you take any of the three parts away, the algorithm falls apart.
In particular, keeping just the first eight bytes (64 bits)
gives you the timestamp and four constant bits;
in other words, all you have is a timestamp, not a GUID.
Since it's just a timestamp, you can have collisions.
If two computers generate one of these "truncated GUIDs"
at the same time, they will generate the same result.
Or if the system clock goes backward in time due to a clock reset,
you'll start regenerating GUIDs that you had generated
the first time it was that time.
Upon further investigation, the customer really didn't need
The value merely had to be unique among a cluster of a half
Once you understand why the GUID generation algorithm works,
you can reimplement it on a smaller scale:
We can reduce the number of bits to make the computer unique
since the number of computers in the cluster is bounded,
and we can reduce the number of bits in the timestamp by
assuming that the program won't be in service 200 years from now,
or that if it is, the items that were using these unique values
are no longer relevant.
At 100 nanoseconds per tick, 2^56 ticks will take 228 years to elapse.
(Extending the range beyond 228 years is left as an exercise,
but it's wasted effort, because you're going to hit the
16-computer limit first!)
You can get away with a four-bit uniquifier by assuming that
the clock won't drift more than an hour out of skew (say)
and that the clock won't reset more than sixteen times per hour.
Since you're running under a controlled environment,
you can make this one of the rules for running your computing cluster.