Part 1 – Part 2
The first order of business is what URL shortening approach should be used to take some very long URL, which in IE7 is limited to 2,083 characters (KB208427) and provide a nice compact link.
The first part of the link (protocol + server + port) is generally controlled by what domain name you can get – for me, my little demo is http://MyMiniUrl.net. The rest of the URL – the path is something in your control.
Doing a search I came across a few approaches but settled on a Base 62 approach that uses a sequence generation maintained in the persistence tier. The idea is to generate a unique sequential number, then convert that numeric to a Base 62 representation. For the sequence generation, I just relied on the DB layer (MySql or MS SQL) to generate this from an identity column.
Once that identity value is generated, it gets run through a Base 62 conversion to a string representation. That string along with the identity (from the DB), the full Long URL, and a cryptographic hash of the Long URL is stored in the DB. The basic DB table schema is as follows (for MySQL):
Shortened “path” of the URL
Crypto hash of Full URL using MD5
Full URL provided
The indices are as follows:
So, when the URL service is asked to create a short URL, it first checks to see if the URL was already generated. To do that the UrlService uses the basic pattern:
Now, for the Base 62 algorithm, I looked around at a few approaches, and Chris had a good post on various approaches as well – Friendly Unique Id generation.
Starting with his code, I also found another approach located here, then finalized on the following:
static int baseNum = 62;
private static readonly String baseDigits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
public static string Base62ToString(long fromValue)
string toValue = fromValue == 0 ? "0" : "";
int mod = 0;
while (fromValue != 0)
mod = (int)(fromValue % baseNum); //should be safe
toValue = baseDigits.Substring(mod, 1) + toValue;
fromValue = fromValue / baseNum;
The basic approach is to loop through the source value, grab the remainder, convert that remainder to Base 62 and append it to a string return value, until there’s nothing left.
So, in terms of “string” vs. StringBuilder performance, I also tried using StringBuilder in place of string concatenation, but performance, in a loop of a billion iterations was far better (about 60%) just with simple string concatenation. Now, I’m not too concerned with garbage collection at this point, I just wanted something quick and efficient – and in the end correct.