Python正则表达式电子邮件地址

3

我有一组电子邮件地址,想确认它们是否是适用于GMail的有效电子邮件地址。

可能的电子邮件地址:

"admin@gmail.com"
"john.smith@googlemail.com"
"john5.a.smith@gmail.com"
"jane_doe@googlemail.com"
"patrick.o'reilly@gmail.com" 

然而,以下内容将不是有效的电子邮件地址。
".admin@gmail.com"
"postmaster.@gmail.com"

我所拥有的是以a-z或0-9开头的字符串,后跟零个或多个特殊字符。

re.search("^[a-z0-9]+[\.'\-]*[a-z0-9]+@(gmail|googlemail)\.com$", s)

但是它在以下位置出现故障:
"john5.a.smith@gmail.com"

http://emailregex.com/ - tzaman
你不想让第一部分成为[a-z0-9].*[a-z0-9]吗(也要考虑大小写)?请注意,使用正则表达式验证电子邮件地址存在许多问题:https://dev59.com/uHVC5IYBdhLWcg3wtzut - jonrsharpe
目前大小写不重要。[a-z0-9].*[a-z0-9] 在“.admin@gmail.com”上失败。Gmail链接在“admin@gmail.com”上失败。 - Eamonn
使用 ((^[a-z0-9])+([a-z0-9-.][a-z0-9]))+@(gmail|googlemail).com$ 已经可以正常工作。 - Eamonn
2个回答

3
这是一件棘手的事情,使用正则表达式很难或不可能做到正确,因为它很快就会失控。在设计过滤器时,您必须权衡关于误报和漏报的担忧,并根据自己的喜好做出任何决定。认为这种类型的过滤器会100%工作是不正确的。 基于您的要求,您应该做出以下决策:
  1. 积极过滤,并接受一些人无法收到您的电子邮件,或
  2. 不进行过滤,但从邮件列表中删除反弹的地址。
再次强调,这取决于您的需求,但我建议不要过滤。即使在电子邮件声誉是一个问题的情况下,除非您向好地址和坏地址发送相等数量的电子邮件,否则这是更好的选择。

以下几点证明了这个事实

与您发布的内容不同:

  1. admin@gmail.com 是非法地址
  2. postmaster.@gmail.com 将会收到邮件。
这表明很难做到像这样的事情。并且(在我看来)您不应该尝试。即使是“简单”和“显而易见”的事情,在电子邮件的疯狂世界中通常也不是这样。
  1. It's important to note that dots don't matter in gmail addresses.

    Gmail doesn't recognize dots as characters within usernames, you can add or remove the dots from a Gmail address without changing the actual destination address; they'll all go to your inbox, and only yours. In short:

    homerjsimpson@gmail.com = hom.er.j.sim.ps.on@gmail.com
    homerjsimpson@gmail.com = HOMERJSIMPSON@gmail.com
    homerjsimpson@gmail.com = Homer.J.Simpson@gmail.com
    

    A quick test on my personal email has confirmed that emails with leading or trailing dots respect this principle:

    homerjsimpson@gmail.com = .homerjsimpson@gmail.com
    homerjsimpson@gmail.com = homerjsimpson.@gmail.com
    homerjsimpson@gmail.com = homerjsimpson.....@gmail.com
    

    work, and are delivered.

  2. You must distinguish between valid Gmail username, and valid Gmail address. They are not the same thing. Just because you cannot register with certain string for a username does not mean that putting that same string in front of @gmail.com won't deliver an email.

    Some other points:

    • Usernames must be at least 6 characters. This means admin@gmail.com is, in fact, an illegal address. bob@gmail.com, etc. are also illegal according to this guideline, although "obviously legal".
    • Usernames can contain letters (a-z), numbers (0-9), dashes (-), underscores (_), apostrophes ('), and periods (.) You should allow any combination of these in the username if you decide on a regex filter. And also the plus ('+'), and probably some other characters we haven't considered.
    • There are also max-length of username, total length of address constraints, and other constraints on emails in general.
    • Plus signs are not legal parts of Gmail usernames, but can be included in gmail addresses. homerjsimpson+stackoverflow@gmail.com will happily be delivered to homerjsimpson@gmail.com.

我其实之前不知道 Gmail 的这个特点,但是它很有趣。不过我只是以 Gmail/Google Mail 作为例子,实际上我是在写另一个网站的内容。 - Eamonn

1
请使用这个替代方案:
^[a-z0-9]+[\.'\-a-z0-9_]*[a-z0-9]+@(gmail|googlemail)\.com$

在Regex101.com上进行了测试:

enter image description here


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接