Re: [OxLUG] Email address validation

Author: Chris Wareham
Date:  
To: Oxfordshire Linux User Group Discussion List
Subject: Re: [OxLUG] Email address validation
Tom Walker wrote:
> -------- Original Message --------
>> There's only one way to do email validation properly:
>>
>> 1. Read the RFC
>> 2. Implement it.
>>
>
> There is another way...
>
> 1. Be aware of the RFC
> 2. Find someone else who has already done it ;-)
> 3. Judge its quality.
>
> http://commons.apache.org/validator/apidocs/org/apache/commons/validator/routines/package-summary.html#other.email
>
> ( http://svn.apache.org/viewvc/commons/proper/validator/trunk/src/main/java/org/apache/commons/validator/EmailValidator.java?revision=658832&view=markup )
>
>
> I would guess that the apache commons implementation is fairly widely used, so has hopefully had lots of testing and bug fixes. Always better than maintaining your own IMO.
>
> Regards,
> Tom


Thanks for the links, although the example you pointed to is actually
superseded by the following one in the routines package:

http://svn.apache.org/viewvc/commons/proper/validator/trunk/src/main/java/org/apache/commons/validator/routines/EmailValidator.java?view=markup

However, it incorrectly rejects user@tld, doesn't cope with folding
whitespace in usernames or parenthesised comments in non-IP-address
domains, uses superfluous "^" and "$" anchors (the matches() method
always has to match the whole string) and doesn't pre-compile all of
its patterns. Meanwhile, DomainValidator hard-codes a list of TLDs that
would be better off in a properties file, and looks them up with the
contains() method of List, which is O(n) for the usual List
implementations; a HashSet would be O(1) on average.
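
To illustrate the point about the anchors and pre-compilation, here is a
minimal sketch. The class name AnchorDemo and the sample strings are made
up for this example and are not taken from Commons Validator; the regex
is the same TLD expression used in the attached file.

```java
import java.util.regex.Pattern;

public final class AnchorDemo {
    // Compiled once and reused: Pattern instances are immutable and thread-safe.
    private static final Pattern UNANCHORED = Pattern.compile("\\p{Alpha}{2,6}");
    private static final Pattern ANCHORED = Pattern.compile("^\\p{Alpha}{2,6}$");

    private AnchorDemo() {
        // utility class
    }

    // matches() succeeds only if the whole input matches the pattern.
    public static boolean matchesWhole(final String input) {
        return UNANCHORED.matcher(input).matches();
    }

    // Same test with explicit anchors; the anchors add nothing here.
    public static boolean matchesWholeAnchored(final String input) {
        return ANCHORED.matcher(input).matches();
    }

    // find() looks for a match anywhere in the input, so anchors matter for it.
    public static boolean findsAnywhere(final String input) {
        return UNANCHORED.matcher(input).find();
    }

    public static void main(final String[] args) {
        for (String input : new String[] {"com", "com1", "c"}) {
            System.out.println(input
                    + " matches=" + matchesWhole(input)
                    + " anchoredMatches=" + matchesWholeAnchored(input)
                    + " find=" + findsAnywhere(input));
        }
    }
}
```

For "com1" both matches() variants return false while find() returns
true, which is why the anchors only earn their keep with find().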

I've attached a "condensed" version that copes with user@tld, but
removes the TLD validation. I'll also submit a patch or two to the
Commons project, as overall it's a neat little library.
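
For anyone who wants to put TLD checking back, this is roughly what I
mean by a properties file plus a HashSet. It is only a sketch: the class
name TldSet, the property key "tlds" and the sample list are assumptions
made for illustration, not anything that exists in Commons Validator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;

public final class TldSet {
    // Hash-based lookup: contains() is O(1) on average, unlike List.contains().
    private final Set<String> tlds = new HashSet<String>();

    // Reads a comma-separated list from the (hypothetical) "tlds" property.
    public TldSet(final Properties properties) {
        String list = properties.getProperty("tlds", "");
        for (String tld : list.split(",")) {
            String normalised = tld.trim().toLowerCase(Locale.ROOT);
            if (!normalised.isEmpty()) {
                tlds.add(normalised);
            }
        }
    }

    // Convenience factory so the example needs no file on disk.
    public static TldSet fromString(final String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException exception) {
            throw new IllegalStateException("could not read TLD list", exception);
        }
        return new TldSet(properties);
    }

    // Case-insensitive membership test.
    public boolean isTld(final String candidate) {
        return candidate != null && tlds.contains(candidate.toLowerCase(Locale.ROOT));
    }

    public static void main(final String[] args) {
        TldSet set = fromString("tlds = com, org, uk");
        System.out.println("COM: " + set.isTld("COM"));
        System.out.println("bar: " + set.isTld("bar"));
    }
}
```

In real use the Properties object would be loaded from a file on the
classpath, so the list can be updated without recompiling.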

Chris

Chris Wareham
Senior Software Engineer
Visit London Ltd
6th floor,
2 More London Riverside, London SE1 2RR

Tel: +44 (0)20 7234 5848
Fax: +44 (0)20 7234 5753


www.visitlondon.com








/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.visitlondon.util;

import java.util.Map;
import java.util.LinkedHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Utility methods for validating email addresses.
 */
public final class EmailAddresses2 {
    private static final String ASCII_REGEX = "\\p{ASCII}+";
    // Splits "user@domain", tolerating surrounding whitespace.
    private static final String EMAIL_REGEX = "\\s*?(.+)@(.+?)\\s*";

    // Domain literal: four groups of 1-3 digits in square brackets; the
    // 0-255 range is checked separately in isValidDomain().
    private static final String DOMAIN_IP_REGEX = "\\[(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\]";
    private static final String DOMAIN_SUB_REGEX = "\\p{Alnum}(?>[\\p{Alnum}-]*\\p{Alnum})*";
    private static final String DOMAIN_TLD_REGEX = "\\p{Alpha}{2,6}";
    // Zero or more dot-terminated labels followed by a TLD, so "user@tld" is accepted.
    private static final String DOMAIN_NAME_REGEX = "(?:" + DOMAIN_SUB_REGEX + "\\.)*" + "(" + DOMAIN_TLD_REGEX + ")";

    private static final String SPECIAL_CHARS = "\\p{Cntrl}\\(\\)<>@,;:'\\\\\\\"\\.\\[\\]";
    private static final String VALID_CHARS = "[^\\s" + SPECIAL_CHARS + "]";
    private static final String QUOTED_USER = "(\"[^\"]*\")";
    private static final String WORD = "((" + VALID_CHARS + "|')+|" + QUOTED_USER + ")";
    private static final String USER_REGEX = "\\s*" + WORD + "(\\." + WORD + ")*";

    // All patterns are compiled once, at class initialisation.
    private static final Pattern ASCII_PATTERN = Pattern.compile(ASCII_REGEX);
    private static final Pattern EMAIL_PATTERN = Pattern.compile(EMAIL_REGEX);
    private static final Pattern DOMAIN_IP_PATTERN = Pattern.compile(DOMAIN_IP_REGEX);
    private static final Pattern DOMAIN_NAME_PATTERN = Pattern.compile(DOMAIN_NAME_REGEX);
    private static final Pattern USER_PATTERN = Pattern.compile(USER_REGEX);

    /**
     * Utility class - no public constructor.
     */
    private EmailAddresses2() {
        // empty
    }

    /**
     * Return whether a string is a valid email address. A valid email address
     * must conform to the rules for remote addresses as specified in RFC 2822.
     *
     * @param emailAddress the string to validate
     * @return whether the string is a valid email address
     */
    public static boolean isEmailAddress(final String emailAddress) {
        // A null reference is never a valid address.
        if (emailAddress == null) {
            return false;
        }

        if (emailAddress.endsWith(".")) {
            return false;
        }

        if (!ASCII_PATTERN.matcher(emailAddress).matches()) {
            return false;
        }

        Matcher matcher = EMAIL_PATTERN.matcher(emailAddress);

        if (!matcher.matches()) {
            return false;
        }

        // Group 1 is the user part, group 2 the domain part.
        if (!isValidUser(matcher.group(1))) {
            return false;
        }

        return isValidDomain(matcher.group(2));
    }

    /**
     * Return whether the domain component of an email address is valid.
     *
     * @param domain the domain to validate
     * @return whether the domain is valid
     */
    private static boolean isValidDomain(final String domain) {
        Matcher matcher = DOMAIN_IP_PATTERN.matcher(domain);

        if (!matcher.matches()) {
            return DOMAIN_NAME_PATTERN.matcher(domain).matches();
        }

        for (int i = 1; i < 5; ++i) {
            String ipSegment = matcher.group(i);

            int iIpSegment = 0;

            try {
                iIpSegment = Integer.parseInt(ipSegment);
            } catch (NumberFormatException exception) {
                return false;
            }

            if (iIpSegment > 255) {
                return false;
            }
        }

        return true;
    }

    /**
     * Return whether the user component of an email address is valid.
     *
     * @param user the user to validate
     * @return whether the user is valid
     */
    private static boolean isValidUser(final String user) {
        return USER_PATTERN.matcher(user).matches();
    }

    // Test addresses mapped to the expected validation result.
    private static final Map<String, Boolean> TESTS = new LinkedHashMap<String, Boolean>();

    static {
        TESTS.put("name.lastname@???", true);
        TESTS.put(".@", false);
        TESTS.put("a@b", false);
        TESTS.put("@bar.com", false);
        TESTS.put("@@bar.com", false);
        TESTS.put("a@???", true);
        TESTS.put("aaa.com", false);
        TESTS.put("aaa@.com", false);
        TESTS.put("aaa@.123", false);
        TESTS.put("aaa@[123.123.123.123]", true);
        TESTS.put("aaa@[123.123.123.123]a", false); // extra data outside IP address
        TESTS.put("aaa@[123.123.123.333]", false); // not a valid IP address
        TESTS.put("a@???.", false);
        TESTS.put("a@bar", true); // true as long as bar is a TLD
        TESTS.put("a-b@???", true);
        TESTS.put(".@b.com", true); // it does work, honest...
        TESTS.put("+@b.c", false); // min 2 char TLD
        TESTS.put("+@b.com", true);
        TESTS.put("a@-b.com", false);
        TESTS.put("a@b-.com", false);
        TESTS.put("-@..com", false);
        TESTS.put("-@a..com", false);
        TESTS.put("a@???", true);
        TESTS.put(" a@??? ", true);
        TESTS.put("\"hello my name is\"@stutter.com", true);
        TESTS.put("\"Test \\\"Fail\\\" Ing\"@foo.com", true);
        TESTS.put("valid@???", true);
        TESTS.put("invalid@???-", false);
        TESTS.put("shaitan@???", false); // tld way too long
        TESTS.put("test@...........com", false); // ......
        TESTS.put("foobar@???", false); // ip need to be []
        TESTS.put("\"Abc\\@def\"@foo.com", true);
        TESTS.put("\"Fred Bloggs\"@foo.com", true);
        TESTS.put("\"Joe\\\\Blow\"@foo.com", true);
        TESTS.put("\"Abc@def\"@foo.com", true);
        TESTS.put("customer/department=shipping@???", true);
        TESTS.put("$A12345@???", true);
        TESTS.put("!def!xyz%abc@???", true);
        TESTS.put("_somename@???", true);
        TESTS.put("Test \\\n Folding \\\n Whitespace@???", true);
        TESTS.put("HM2Kinsists@(that comments are)allowed.ok", true);
        TESTS.put("user%uucp!path@???", true);
    }

    public static void main(final String[] args) {
        // Print the actual result for each address, flagging any mismatch
        // with the expected result as FAILED.
        for (Map.Entry<String, Boolean> test : TESTS.entrySet()) {
            String address = test.getKey();
            boolean expected = test.getValue();
            boolean actual = isEmailAddress(address.trim());
            System.out.println("Email address '" + address + "' tested " + actual + (expected == actual ? "" : " FAILED"));
        }
    }
}