Larry Osterman's WebLog

Confessions of an Old Fogey
Blog - Title

What's wrong with this code, part 8 - Email Address Validation

What's wrong with this code, part 8 - Email Address Validation

  • Comments 23

It's time for another "What's wrong with this code".

Today's example is really simple, and hopefully easy.  It's a snippet of code I picked up from the net that's intended to validate an email address (useful for helping to avoid SQL injection attacks, for example).

 

    /// <summary>
    /// Validate an email address provided by the caller.
    ///
    /// Taken from http://www.codeproject.com/aspnet/Valid_Email_Addresses.asp
    /// </summary>
    public static bool ValidateEmailAddress(string emailAddress)
    {
        string strRegex = @"^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}" +
                          @"\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\" +
                          @".)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";
        System.Text.RegularExpressions.Regex re = new System.Text.RegularExpressions.Regex(strRegex);
        if (re.IsMatch(emailAddress))
            return (true);
        else
        return (false);
    }

As always, my next post (Monday) will include the answers and kudos to all those who got it right (and who f

  • fast read: will it work with foo@bar.bin.domain.org

    looks like it only handles
    foo@domain.org format.

    but thats just reading the regex quickly, did I miss it or get it??
  • I've read in the O'Reilly regular expression book (the one with the owl) that it is impossible to do e-mail address validation and if you want to do anyway, they printed a full-one-page regular expression.
  • This routine considers foo@bar.bin.domain.org correct (I just checked with the test program I wrote).

  • You're using a Hungarian-style coding standard (strRegex); the if...else is redundant - just return the result of IsMatch; perhaps the regex should be a constant (debatable).
  • The first thing I noticed is that the regex doesn't handle apostrophes (maybe some other characters) in the part that comes before the @ (this is valid per the RFC 2822).

    The following bugs me (I see it often):
    if (re.IsMatch(emailAddress))
    return true;
    else
    return false;

    Why not just:
    return re.IsMatch(emailAddress);
  • Bingo! Uwe nailed it.

    Adam, the code in question is literally cut&pasted from where I found it. And using hungarian isn't inherently a programming error, it's a style thing :)
  • It doesn't check to see if emailAddress is null. I imagine that'd be a problem.
  • The above example is attempting to do some IP address validation, but failing miserably for things like:

    foo@456.456.456.456

    It would also have problems with its poor assumptions about the TLD length:

    info@about.museum
  • Sometimes close enough is good enough.

    I can't believe philoserf the idealist just said that. I must be getting worn down.
  • Aaron,
    In this case, I think the up-to-3 character part of the regex isn't an attempt to catch IP addresses, but instead to catch foo@bar.co.uk.

    But you're right - the .museum and .info domains will have huge issues here.
  • .info is actually OK with this RE, as is .name. .museum wouldn't be, however. Also, apparently, noc@to, for example, is a valid address for people at the management of certian cTLDs. Also, + is a valid and fairly common character for the before-the-at-sign portion.

    Oh, and that split regex is amazingly ugly -- it's much better to split on something closer to atom boundries (for some reasonable level of "atom"), even though not all chunks are the same length.
  • Hmmm, I thought this was 'whats wrong with the *code*'.

    And it is possible to do 100% accurate email address validation, you send an email to the address and require the customer to click a link. Just about all message boards do this nowadays.

  • Larry,

    <snip>@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|<snip>$

    The {1,3} portions following the [0-9] is searching for IP address patterns. The {2,4} after the [a-zA-Z] pattern is looking for 2, 3, or 4 character TLDs. There are three copies of this [0-9] junk, and then the trailing pattern allows for the final quad.

    It's the ([a-zA-Z0-9\-]+\.)+ pattern that allows for any number (one or more) for domain and subdomain purposes.

    Great resource for testing .NET and client-side RegEx at http://www.regxlib.com. They also list 30 user-submitted variations on email validation patterns.
  • Tad had my comment. A RegEx is good for part of an e-mail validation, but isn't sufficient on its own.
  • It was a pain in the ass to do, but I actually wrote an e-mail address format verifier in JavaScript once. I essentially did a line by line translation of the BNF in RFC822 to JavaScript strings containing equivalent regular expressions. I then matched whatever I wanted to check against this regular expression. What is a pane is that I have to use the backslash character to escape a character in the regular expression and because the regular expression is in a string, I have to escape every backslash. There is definitely a lot more to a correct e-mail address than what that example script uses. You can have extended characters for example.
Page 1 of 2 (23 items) 12