Using a regular expression to find the domain name in an email address

Question


26

I know I'm being an idiot, but I can't pull the domain out of this email address:

'blahblah@gmail.com'

My desired output:

'@gmail.com'

My current output:

It is just a period character.

Here's my code:

import re
test_string = 'blahblah@gmail.com'
domain = re.search('@*?\.', test_string)
print domain.group()

Here's what I think my regular expression ('@*?\.', test_string) does:

 ' # begin to define the pattern I'm looking for (also tell python this is a string)

  @ # find all patterns beginning with the at symbol ("@")

  * # find all characters after ampersand

  ? # find the last character before the period

  \ # breakout (don't use the next character as a wild card, use it as a string character)

  . # find the "." character

  ' # end definition of the pattern I'm looking for (also tell python this is a string)

  , test string # run the preceding search on the variable "test_string," i.e., 'blahblah@gmail.com'

I based this on the definitions here: http://docs.activestate.com/komodo/4.4/regex-intro.html. I have also searched, but the other answers were a bit too difficult for me to follow.

Help is appreciated, as always. Thanks.

My setup, in case it matters:

Windows 7 Pro (64-bit), Python 2.6 (64-bit)

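P.S. A Stack Overflow question: my posts don't include line breaks unless I hit "return" twice between them. For example (these were all on separate lines when I posted):

@ - find all patterns beginning with the at symbol ("@") * - find all characters after ampersand ? - find the last character before the period \ - breakout (don't use the next character as a wild card, use it as a string character) . - find the "." character , test string - run the preceding search on the variable "test_string," i.e., 'blahblah@gmail.com'

That's why I put a blank line between each of the lines above. What am I doing wrong? Thanks.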

- PatentDeathSquad

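To answer your P.S. (which belongs on Meta): Stack Overflow uses Markdown. The formatting help says: "To get a line break, end the line with 2 spaces." - chrisaycock

It will also accept HTML, such as <br />. - user684934

A simple solution is "@.*", though that may be too greedy. - Vetsin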

7 Answers

18

OK, so why not use split (or partition)?

"@"+'blahblah@gmail.com'.split("@")[-1]

Or you can use other string methods, such as find.

>>> s="bal@gmail.com"
>>> s[ s.find("@") : ]
'@gmail.com'
>>>

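If you are going to extract email addresses from some other text:

with open("file") as f:
    for line in f:
        # split the line on whitespace and inspect each word separately
        for word in line.split():
            if "@" in word:
                print("@" + word.split("@")[-1])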

- kurumi

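Thanks for the response. Why regex rather than the regular string methods? I have 40 megabytes of strings containing email addresses intertwined with junk text, and I am trying to extract them. I'm an amateur coder; I was trying to keep it simple and understand it with regex, so I didn't go into that here. Sorry if that caused confusion. - PatentDeathSquad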
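
The partition method mentioned in the answer above is not actually shown there. A minimal sketch, assuming the input is a single address string (the helper name domain_with_at is hypothetical), might look like this. Note that it splits on the first "@" rather than the last, and that it returns an empty string instead of raising when no "@" is present.

```python
def domain_with_at(address):
    """Return '@' plus everything after the first '@', or '' if there is no '@'."""
    # str.partition always returns a 3-tuple: (head, separator, tail).
    # When the separator is missing, separator and tail are empty strings.
    head, sep, tail = address.partition("@")
    return sep + tail


print(domain_with_at("blahblah@gmail.com"))     # @gmail.com
print(repr(domain_with_at("no-at-sign-here")))  # ''
```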

8

Using regex:

>>> re.search('@.*', test_string).group()
'@gmail.com'

Another way:

>>> '@' + test_string.split('@')[1]
'@gmail.com'

- chrisaycock

Ah, I see, I needed another ".". Thanks! (Not sure why, though.) - PatentDeathSquad

1

@AquaT33nFan: "@*" means zero or more "@". "@.*" means one "@" followed by zero or more of any character (except newline). In other words, the * here is the Kleene star, not a wildcard. - Rachel Shallit
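
To see the difference described in the comment above, here is a small sketch that runs the question's pattern and the corrected one against the same string. The variable names are illustrative only.

```python
import re

test_string = 'blahblah@gmail.com'

# '@*' allows zero "@" characters, so it matches the empty string
# at the very start of the input.
empty_match = re.search(r'@*', test_string).group()

# The question's pattern: lazily zero or more "@", then a literal period.
# Scanning left to right, the first position where this succeeds
# is the period itself.
question_match = re.search(r'@*?\.', test_string).group()

# One "@" followed by any run of characters except a newline.
fixed_match = re.search(r'@.*', test_string).group()

print(repr(empty_match))     # ''
print(repr(question_match))  # '.'
print(repr(fixed_match))     # '@gmail.com'
```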

3

You can try using urllib.

from urllib import parse
email = 'myemail@mydomain.com'
domain = parse.splituser(email)[1]

The output is:

'mydomain.com'

- gs202

1

The splituser function is deprecated. https://bugs.python.org/issue35891 - Dan Yishai
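
Given the deprecation noted above, one possible alternative that stays within the standard library is email.utils.parseaddr combined with str.rpartition. This is a sketch of a swapped-in technique, not what the answer proposes, and it assumes the input is a single address.

```python
from email.utils import parseaddr

email = 'myemail@mydomain.com'

# parseaddr returns a (display name, address) pair
_, address = parseaddr(email)

# rpartition splits on the last "@"; the third item is what follows it
domain = address.rpartition('@')[2]

print(domain)  # mydomain.com
```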

2

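I'd like to point out that chrisaycock's method will match malformed email addresses of the form

herp@

To correctly match a possibly valid email with a domain, you need to alter it slightly.

Using regex: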

>>> re.search('@.+', test_string).group()
'@gmail.com'

- Josh_at_Savings_Champion
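
A short sketch of the difference between the two quantifiers, using the malformed address from the answer above. The helper name find_domain is hypothetical. It also guards against re.search returning None, since calling .group() directly on a failed search raises AttributeError.

```python
import re


def find_domain(text):
    """Return '@' plus at least one following character, or None."""
    match = re.search(r'@.+', text)
    # re.search returns None when nothing matches, so check before .group()
    return match.group() if match else None


print(find_domain('blahblah@gmail.com'))    # @gmail.com
print(find_domain('herp@'))                 # None
print(re.search(r'@.*', 'herp@').group())   # @
```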

2

With the regular expression below you can extract domains such as .com or .in.

import re
s = 'my first email is user1@gmail.com second email is user2@yahoo.in and third email is user3@outlook.com'
print(re.findall(r'@+\S+[.in|.com|]',s))

Output

['@gmail.com', '@yahoo.in', '@outlook.com']

- Alok Choudhary

This restricts the possible domains, since it only considers two top-level domains. - Aarav Prasad
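
One detail worth checking in the pattern above: square brackets define a character class, so [.in|.com|] matches a single character out of ., i, n, |, c, o, m rather than the strings ".in" or ".com". The sketch below compares it with a grouped alternation. The sample string and the alternation pattern are assumptions made for illustration.

```python
import re

s = 'user1@gmail.com user2@yahoo.in user3@foo.org'

# Character class: the final [...] matches ONE character from the set,
# so "@foo.org" is cut short instead of being rejected.
class_matches = re.findall(r'@+\S+[.in|.com|]', s)

# Alternation: a literal dot, then either "com" or "in", then a word boundary.
alt_matches = re.findall(r'@\S+\.(?:com|in)\b', s)

print(class_matches)  # ['@gmail.com', '@yahoo.in', '@foo.o']
print(alt_matches)    # ['@gmail.com', '@yahoo.in']
```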

0

Here is another way, using the index function:

email_addr = 'blahblah@gmail.com'

# Find the location of @ sign
index = email_addr.index("@")

# extract the domain portion starting from the index
email_domain = email_addr[index:]

print(email_domain)
#------------------
# Output:
@gmail.com

- Stryker


- Conrad.Dean · Accepted Answer

Here's something I think might help

import re
s = 'My name is Conrad, and blahblah@gmail.com is my email.'
# raw string so that \w reaches the regex engine unchanged
domain = re.search(r"@[\w.]+", s)
print(domain.group())

Output

@gmail.com

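How the regex works:

@ - scan until this character is found

[\w.] - a set of characters that may match: \w is any alphanumeric character (plus underscore), and the trailing period . adds a literal period to that set.

+ - one or more of the characters in the preceding set.

Because this regex matches the period character and every alphanumeric character after an @, it will match email domains even in the middle of a sentence.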
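
Since the question mentions pulling addresses out of a large body of junk text, the same pattern can be passed to re.findall to collect every domain at once. This is a sketch under assumptions: the sample text is made up, and trimming trailing periods is one simple way to handle a domain that ends a sentence. Hyphenated domains would need "-" added to the character set.

```python
import re

text = 'Mail blahblah@gmail.com or admin@example.org. Thanks!'

# findall returns every non-overlapping match, left to right
raw = re.findall(r'@[\w.]+', text)

# A sentence-ending period is in the character set too, so trim it
domains = [d.rstrip('.') for d in raw]

print(raw)      # ['@gmail.com', '@example.org.']
print(domains)  # ['@gmail.com', '@example.org']
```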