[wp-hackers] is_email

Dougal Campbell dougal at gunters.org
Fri Mar 25 17:00:08 GMT 2005


Nikolay Bachiyski wrote:
> Hello,
> 
> Here is the regular expression found in the is_email() function:
> 
> $chars = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";
> 
> Some questions arose when I was looking at it:
> - why is it possible to have a '+' in the username
> - why is it possible to have '-' in both the username and the host

Because it's possible to have those characters in username and hostname 
portions of email addresses. In particular, it's long been a convention 
that many mail servers allow addresses like "user+whatever@example.com", 
and automagically alias it to "user@example.com". This allows you to 
generate your own dynamic aliases. It's useful for tracking who's 
sharing your address. I often use that trick when supplying registration 
information. For example, if I registered my email address with the New 
York Times as "dougal+nytimes@gunters.org", then if I get spam to that 
address later, I know that the Times shared my address (and I can 
blackhole further email to that address if I want).

And '-' has always been a valid character for domains/hosts and in 
usernames for most systems.
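
To make the point concrete, here is a minimal sketch that runs the pattern quoted above against a few addresses. The pattern is copied as-is from the quoted is_email() line; the sample addresses and the $samples / $results names are made up for illustration only.

```php
<?php
// The pattern quoted above from is_email(), unchanged.
$chars = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";

// Hypothetical sample addresses, chosen only for illustration.
$samples = array(
    'user@example.com',
    'user+whatever@example.com',       // '+' in the username part
    'first-last@my-host.example.com',  // '-' in both username and host
    'no-at-sign.example.com',          // not an address at all
);

// preg_match() returns 1 on a match, 0 on no match.
$results = array();
foreach ($samples as $address) {
    $results[$address] = (preg_match($chars, $address) === 1);
}

foreach ($results as $address => $ok) {
    echo $address, ' => ', ($ok ? 'accepted' : 'rejected'), "\n";
}
```

The first three samples should be accepted and the last one rejected, which is the behaviour described above for '+' and '-'.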

> - is there any difference between ([a-z0-9_]|\\-|\\.)+ and [a-z0-9_\-.]+
> - isn't it better to put it into single quotes and save some backslash 
> escaping
> 
> Here is a suggestion:
> 
> $email_regex = '/^[a-z0-9_\-.]+\@([a-z0-9\-]{1,255}\.)+[a-z]{2,6}$/i';

I'm not sure why '.' was separated out, but I've seen people put '-' 
outside of a character class due to bugs in some regex implementations. 
I can't remember if PHP had that bug or not. If not, then I think your 
suggestion would be a good one.
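
One thing worth checking before swapping the patterns: as quoted, the suggested regex has no '+' in the username class and no '_' in the host class, so it is stricter than the original. The sketch below compares the two quoted patterns with a third one, $variant, which is only my assumption of what "same behaviour, single character class, single quotes" could look like; it is not from the WordPress source. The matches() helper and the sample addresses are likewise hypothetical.

```php
<?php
// Original pattern from is_email(), as quoted (double-quoted string).
$original = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";

// The suggested pattern, as quoted (single-quoted string).
$suggested = '/^[a-z0-9_\-.]+\@([a-z0-9\-]{1,255}\.)+[a-z]{2,6}$/i';

// Hypothetical variant: character classes and single quotes, but keeping
// '+' in the username and '_' in the host, as the original does.
$variant = '/^[a-z0-9+_.\-]+@([a-z0-9_\-]+\.)+[a-z]{2,6}$/i';

// Small helper (not part of WordPress) returning a plain boolean.
function matches($pattern, $address) {
    return preg_match($pattern, $address) === 1;
}

$samples = array(
    'user@example.com',
    'user+whatever@example.com',
    'first-last@my-host.example.com',
    'user@my_host.example.com',
    'user@@example.com',
);

foreach ($samples as $address) {
    printf(
        "%-32s original=%d suggested=%d variant=%d\n",
        $address,
        matches($original, $address),
        matches($suggested, $address),
        matches($variant, $address)
    );
}
```

In single quotes PHP passes \- and \. through to the regex engine untouched, so the doubled backslashes are no longer needed. If the variant and the original agree on every sample while the suggested pattern rejects the '+' and '_' cases, that would confirm the difference is in the character sets, not in the alternation-versus-class rewrite.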

-- 
Dougal Campbell <dougal at gunters.org>
http://dougal.gunters.org/


