UID (unique identifier) generation is a hot topic. It can be very simple or very complex. Before going deep into the subject, let’s look at a couple of simple examples.

To identify a person (a citizen) you usually use the SSN (Social Security Number). Isn’t that true? Not at all: if I ignore the domain, that assertion is completely false. The SSN is used only in the US, so it is a valid ID for US citizens but not for Chinese citizens, Indian citizens or anyone else.

If I want to identify a record in a table (relational database) I usually use a primary key, which can be simple or composite. But that key is valid only within the domain of that single table. It is practically guaranteed that the same key value exists in some other table or database.
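To make the table example concrete, here is a minimal sketch. The two dictionaries stand in for tables, and `DomainScopeDemo`, `Customers`, `Orders` and `Qualify` are hypothetical names invented for this illustration. It only shows that the same integer key can live in two tables, and that prefixing the key with its domain removes the ambiguity.

```csharp
using System;
using System.Collections.Generic;

public static class DomainScopeDemo
{
    // Hypothetical "tables", each keyed by its own integer primary key.
    public static readonly Dictionary<int, string> Customers =
        new Dictionary<int, string> { { 1, "Alice" }, { 2, "Bob" } };

    public static readonly Dictionary<int, string> Orders =
        new Dictionary<int, string> { { 1, "Order for 3 widgets" } };

    // Prefix the key with the table name so it stays unambiguous outside the table.
    public static string Qualify(string table, int key)
    {
        return table + "/" + key;
    }

    public static void Main()
    {
        // The bare key 1 is valid in both tables, so on its own it identifies nothing.
        Console.WriteLine(Customers.ContainsKey(1) && Orders.ContainsKey(1)); // prints True

        // The qualified keys differ even though the numeric part is the same.
        Console.WriteLine(Qualify("customers", 1) == Qualify("orders", 1));   // prints False
    }
}
```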

So, to generate a UID it is really important to try to follow some principles:

  1. The UID must be system wide unique (pretty obvious)
  2. The UID should be human readable
  3. The UID generation must be fast
  4. The UID must be durable
  5. The UID must be strongly typed

The first point is not really important if we are talking about a private system or application. Consider the primary key used in a database for optimization purposes. As long as the key is private to the database or application, integer IDs work very well. But as soon as they are exposed to the public we start having problems.

Point 2 is really important for debugging, troubleshooting and human interaction. Think about IP addresses: people prefer to use DNS names, which are easier and clearer than the IP address.

Point 3 is important for scalability. Some applications need to generate thousands of IDs per second, and we cannot have an algorithm that takes seconds to generate an ID. Think about the IDs generated by a lottery transaction system.

Point 4 is important for history. When I lived in Italy I discovered that Telecom used to reassign a phone number six months after it had been discontinued. The result was that I received phone calls about scheduling dentist appointments. Reusing an ID is not a good thing.

Point 5 is important for developers, to guarantee that the ID is correctly implemented and used. If you treat an ID as a string, who can guarantee that one side is not generating a numeric ID (converted to a string) while the other side expects a GUID? You can discover such problems only at run time.
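One way to picture point 5 is to give each kind of ID its own type, so that mixing them up fails at compile time instead of at run time. The sketch below is only an illustration: `CustomerId`, `OrderNumber`, `TypedIdDemo` and `DescribeCustomer` are hypothetical names, not part of any library.

```csharp
using System;

// Hypothetical wrapper: a customer is identified by a GUID.
public struct CustomerId : IEquatable<CustomerId>
{
    private readonly Guid value;
    public CustomerId(Guid value) { this.value = value; }
    public bool Equals(CustomerId other) { return value.Equals(other.value); }
    public override bool Equals(object obj) { return obj is CustomerId && Equals((CustomerId)obj); }
    public override int GetHashCode() { return value.GetHashCode(); }
    public override string ToString() { return value.ToString(); }
}

// Hypothetical wrapper: an order is identified by an integer.
public struct OrderNumber : IEquatable<OrderNumber>
{
    private readonly int value;
    public OrderNumber(int value) { this.value = value; }
    public bool Equals(OrderNumber other) { return value == other.value; }
    public override bool Equals(object obj) { return obj is OrderNumber && Equals((OrderNumber)obj); }
    public override int GetHashCode() { return value; }
    public override string ToString() { return value.ToString(); }
}

public static class TypedIdDemo
{
    // Accepts only a CustomerId; an OrderNumber or a raw string does not compile here.
    public static string DescribeCustomer(CustomerId id)
    {
        return "customer " + id;
    }

    public static void Main()
    {
        CustomerId customer = new CustomerId(Guid.NewGuid());
        OrderNumber order = new OrderNumber(42);

        Console.WriteLine(DescribeCustomer(customer));
        // DescribeCustomer(order);  // compile-time error: OrderNumber is not a CustomerId
        // DescribeCustomer("42");   // compile-time error: string is not a CustomerId
        Console.WriteLine(order);    // prints 42
    }
}
```

With plain strings both commented-out calls would compile, and the mismatch would surface only when the program runs.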

There are good examples of IDs that match most of the above points, but it is less common to find an ID generation scheme that satisfies all of them. For example, GUIDs (Globally Unique Identifiers) satisfy at least points 1, 3, 4 and 5. But they are not human readable (at least to me).

The XML Schema ID can quite easily satisfy points 2 and 3, but we need to provide some wrapper facility to make it compatible with points 1, 4 and 5. In fact, the XML Schema specification requires the ID to be unique only at the document level (the domain is the document).

Integers, which are widely used in databases, satisfy only point 3.

Strings are widely used in account databases to identify people (e.g. by their SSN), companies, etc. In fact, a string can satisfy all points except 5.

Personally, I really like the idea of a composite ID, whose components are the domain and the domain-based ID. In C# we can use generics to manage IDs, as I did in the following (untested) class:

using System;

public class Id<TId> : IEquatable<Id<TId>>
{
    public Id(string domain, TId id)
        : this(domain, id, "/")
    {
    }

    public Id(string domain, TId id, string separator)
    {
        if (string.IsNullOrEmpty(domain))
            throw new ArgumentNullException("domain");
        if (id == null)
            throw new ArgumentNullException("id");
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentNullException("separator");

        this.domain = domain;
        this.id = id;
        this.separator = separator;
    }

    public string Domain
    {
        get { return domain; }
    }

    public TId DomainId
    {
        get { return id; }
    }

    public string FullId
    {
        get { return this.ToString(); }
    }

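    public override int GetHashCode()
    {
        // Hash the domain case-insensitively so it agrees with Equals below.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(domain) ^ id.GetHashCode();
    }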

    public override bool Equals(object obj)
    {
        // Equals must not throw: any other type (or null) is simply not equal.
        return Equals(obj as Id<TId>);
    }

    public bool Equals(Id<TId> other)
    {
        // ReferenceEquals avoids going through the overloaded == operator.
        if (ReferenceEquals(other, null))
            return false;

        return other.domain.Equals(this.domain, StringComparison.OrdinalIgnoreCase) &&
            other.id.Equals(this.id);
    }

    public override string ToString()
    {
        return string.Concat(domain, separator, id);
    }

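    public static bool operator ==(Id<TId> x, Id<TId> y)
    {
        // Handle nulls explicitly so comparing against null never throws.
        if (ReferenceEquals(x, y))
            return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
            return false;

        return x.Equals(y);
    }

    public static bool operator !=(Id<TId> x, Id<TId> y)
    {
        return !(x == y);
    }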

    private readonly string domain;
    private readonly TId id;
    private readonly string separator;
}

In this way we can quite easily manage simple IDs (GUID, int, string) as well as composite IDs (all you need to do is override ToString, Equals and GetHashCode in the composite type).
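As a rough sketch of what such a composite type might look like, the struct below combines an order number and a line number and overrides the three members mentioned above. `OrderLineKey` and `CompositeKeyDemo` are hypothetical names chosen for the illustration, and the "order-line" text format is an assumption.

```csharp
using System;

// Hypothetical composite key: an order number plus a line number within that order.
public struct OrderLineKey : IEquatable<OrderLineKey>
{
    private readonly int orderNumber;
    private readonly int lineNumber;

    public OrderLineKey(int orderNumber, int lineNumber)
    {
        this.orderNumber = orderNumber;
        this.lineNumber = lineNumber;
    }

    public bool Equals(OrderLineKey other)
    {
        return orderNumber == other.orderNumber && lineNumber == other.lineNumber;
    }

    public override bool Equals(object obj)
    {
        return obj is OrderLineKey && Equals((OrderLineKey)obj);
    }

    public override int GetHashCode()
    {
        // Combine both parts; unchecked lets the multiplication wrap instead of throwing.
        unchecked { return (orderNumber * 397) ^ lineNumber; }
    }

    public override string ToString()
    {
        return orderNumber + "-" + lineNumber;
    }
}

public static class CompositeKeyDemo
{
    public static void Main()
    {
        OrderLineKey a = new OrderLineKey(1001, 2);
        OrderLineKey b = new OrderLineKey(1001, 2);

        Console.WriteLine(a.Equals(b));                         // prints True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // prints True
        Console.WriteLine(a);                                   // prints 1001-2
    }
}
```

Used as the type argument of the class above, for example `new Id<OrderLineKey>("orders", new OrderLineKey(1001, 2))`, the full ID should then read `orders/1001-2`.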