Getting Gmail inbox subject titles and sender names with Python's imaplib

30
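I'm using Python's imaplib to connect to my Gmail account. I want to retrieve the top 15 messages (read or unread, it doesn't matter) and display just the subject and the sender's name (or address), but I don't know how to display the contents of the inbox.
Here is my code so far (it connects successfully):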
import imaplib

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mygmail@gmail.com', 'somecrazypassword')
mail.list()
mail.select('inbox')

#need to add some stuff in here

mail.logout()

I believe this should be simple enough; I'm just not familiar enough with the imaplib commands. Any help would be appreciated...

UPDATE Thanks to Julian, I can iterate through each message and retrieve its entire contents with:

typ, data = mail.search(None, 'ALL')
for num in data[0].split():
   typ, data = mail.fetch(num, '(RFC822)')
   print 'Message %s\n%s\n' % (num, data[0][1])
mail.close()

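But I only want the subject and the sender. Is there an imaplib command for these items, or will I have to parse the entire contents of data[0][1] to find the Subject and From text?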

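UPDATE OK, I have the subject and sender part working, but the iteration over (1, 15) walks the mailbox oldest-first, so I see my oldest messages rather than the newest. How can I change this? I tried the following: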

for i in range( len(data[0])-15, len(data[0]) ):
     print data

But that just gives me None for all 15 iterations. Any ideas? I also tried mail.sort('REVERSE DATE', 'UTF-8', 'ALL'), but Gmail does not support the .sort() function.

UPDATE Figured out a way to do it:

#....^other code is the same as above except need to import email module
mail.select('inbox')
typ, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
#get the most recent email id
latest_email_id = int( id_list[-1] )

#iterate through 15 messages in descending order starting with latest_email_id
#the '-1' dictates reverse looping order
for i in range( latest_email_id, latest_email_id-15, -1 ):
   typ, data = mail.fetch( i, '(RFC822)' )

   for response_part in data:
      if isinstance(response_part, tuple):
          msg = email.message_from_string(response_part[1])
          varSubject = msg['subject']
          varFrom = msg['from']

   #remove the brackets around the sender email address
   varFrom = varFrom.replace('<', '')
   varFrom = varFrom.replace('>', '')

   #add ellipsis (...) if subject length is greater than 35 characters
   if len( varSubject ) > 35:
      varSubject = varSubject[0:32] + '...'

   print '[' + varFrom.split()[-1] + '] ' + varSubject

This gives me the 15 most recent message subjects and sender addresses, newest first, as requested! Thanks to everyone who helped!
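A note on the earlier attempt with range(len(data[0])-15, len(data[0])): data[0] is a single space-separated string of ids, so len() counts characters rather than messages, which is one likely reason that loop never reached real messages. The sketch below slices the split id list instead, assuming the ids come back oldest-first as observed in the question. The helper name latest_ids is made up for illustration, and the sample data is invented. Slicing also copes with a mailbox that holds fewer than 15 messages, where range(latest_email_id, latest_email_id-15, -1) would run down to 0 or below.

```python
def latest_ids(search_data, count=15):
    """Return up to `count` message ids from a SEARCH reply, newest first."""
    # imaplib returns the ids as one space-separated bytes string in data[0].
    id_list = search_data[0].split()
    # Slicing never runs past the start of the list, so a mailbox holding
    # fewer than `count` messages needs no special handling.
    return id_list[-count:][::-1]


# Invented sample shaped like the second value returned by mail.search(None, 'ALL').
sample = [b'1 2 3 5 8 13']
print(latest_ids(sample, count=4))  # [b'13', b'8', b'5', b'3']
```

Each returned id can be passed to mail.fetch() in place of the loop variable i.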


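The example in the Python docs works fine for me: http://docs.python.org/library/imaplib#imap4-example - Julian
Yes, you're right, that works well for retrieving the full contents of every message. But I only need the subject and the sender address. Then I can make the for loop iterate over just 1 to 15. - sadmicrowave
Another Python docs link: http://docs.python.org/library/email.html ;) - Julian
6 Answers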

19
    c.select('INBOX', readonly=True)

    for i in range(1, 30):
        typ, msg_data = c.fetch(str(i), '(RFC822)')
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_string(response_part[1])
                for header in [ 'subject', 'to', 'from' ]:
                    print '%-8s: %s' % (header.upper(), msg[header])
This should give you an idea of how to retrieve the subject and the sender.
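For readers on Python 3, here is a minimal offline sketch of the parsing step only. It assumes the fetched payload arrives as bytes, so email.message_from_bytes is used instead of message_from_string. The sample message and the helper name summarize are invented for illustration; in real use raw would be response_part[1] from the loop above.

```python
import email

# A made-up message standing in for response_part[1] from a real fetch.
raw = (b"From: Alice Example <alice@example.com>\r\n"
       b"To: bob@example.com\r\n"
       b"Subject: Lunch on Friday?\r\n"
       b"\r\n"
       b"Body text.\r\n")


def summarize(raw_bytes, headers=('subject', 'to', 'from')):
    """Parse raw message bytes and return the requested headers as a dict."""
    msg = email.message_from_bytes(raw_bytes)
    # Header lookup is case-insensitive; a missing header yields None.
    return {name: msg[name] for name in headers}


for name, value in summarize(raw).items():
    print('%-8s: %s' % (name.upper(), value))
```

Note that email here is the standard-library module, which has to be imported alongside imaplib.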

2
What is email? Are you referring to my "mail" variable? And what is message_from_string(), is it made up? I get this error: AttributeError("Unknown IMAP4 command: '%s'" % attr) AttributeError: Unknown IMAP4 command: 'message_from_string' - sadmicrowave
3
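Never mind, I figured it out: I forgot to import the email module. Thanks. - sadmicrowave
Won't the code raise an exception if there are fewer than 30 messages? In that case, if the message ID (here "str(i)") does not exist, c.fetch() will raise an exception. - chutsu
I tested the code, and it does raise an exception if the message ID (the first argument to c.fetch()) reaches 0. A simple fix is to break out of the loop once the message ID reaches 0. (Message IDs do not appear to start from 0.) - chutsu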
1
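@chutsu Good to hear, and good to know. From experience, though, RFCs are more of a guideline than a description of actual implementations. They contain a lot of "SHOULD..." and "MAY...", and developers tend to think "this isn't that important" when building things. Either way, the OP should definitely try to follow the RFC, and just keep in mind that others might not :) - Torxed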

9
Here is my solution for extracting the useful bits of information from emails:
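import datetime
import email
import email.header  # used below for decode_header / make_header
import email.utils   # used below for parsedate_tz / mktime_tz
import imaplib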


EMAIL_ACCOUNT = "your@gmail.com"
PASSWORD = "your password"

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(EMAIL_ACCOUNT, PASSWORD)
mail.list()
mail.select('inbox')
result, data = mail.uid('search', None, "UNSEEN") # (ALL/UNSEEN)
i = len(data[0].split())

for x in range(i):
    latest_email_uid = data[0].split()[x]
    result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
    # result, email_data = conn.store(num,'-FLAGS','\\Seen') 
    # this might work to set flag to seen, if it doesn't already
    raw_email = email_data[0][1]
    raw_email_string = raw_email.decode('utf-8')
    email_message = email.message_from_string(raw_email_string)

    # Header Details
    local_message_date = ""  # fallback when the Date header is missing or unparseable
    date_tuple = email.utils.parsedate_tz(email_message['Date'])
    if date_tuple:
        local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
        local_message_date = "%s" %(str(local_date.strftime("%a, %d %b %Y %H:%M:%S")))
    email_from = str(email.header.make_header(email.header.decode_header(email_message['From'])))
    email_to = str(email.header.make_header(email.header.decode_header(email_message['To'])))
    subject = str(email.header.make_header(email.header.decode_header(email_message['Subject'])))

    # Body details
    for part in email_message.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_payload(decode=True)
            file_name = "email_" + str(x) + ".txt"
            output_file = open(file_name, 'w')
            output_file.write("From: %s\nTo: %s\nDate: %s\nSubject: %s\n\nBody: \n\n%s" %(email_from, email_to,local_message_date, subject, body.decode('utf-8')))
            output_file.close()
        else:
            continue
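The date handling above can be sketched on its own. This is an alternative under the assumption that Python 3.3 or later is available, where email.utils.parsedate_to_datetime exists; it is not what the answer uses. The helper name message_date and the sample Date value are invented for illustration. Returning None for a missing or malformed header avoids relying on a variable that was only assigned inside the if branch.

```python
from email.utils import parsedate_to_datetime


def message_date(date_header):
    """Return a datetime for a Date header, or None if it is missing or malformed."""
    if not date_header:
        return None
    try:
        return parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Different Python versions raise different errors for unparseable input.
        return None


parsed = message_date("Mon, 23 Jul 2012 10:15:00 +0200")
print(parsed.isoformat())  # 2012-07-23T10:15:00+02:00
```

The result keeps the sender's UTC offset; calling parsed.astimezone() would convert it to local time if that is what you want to display.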

7

If you want to check your mail and parse the headers, this is what I use:

def parse_header(str_after, checkli_name, mailbox) :
    #typ, data = m.search(None,'SENTON', str_after)
    print mailbox
    m.SELECT(mailbox)
    date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")
    #date = (datetime.date.today().strftime("%d-%b-%Y"))
    #date = "23-Jul-2012"

    print date
    result, data = m.uid('search', None, '(SENTON %s)' % date)
    print data

    doneli = []
    for latest_email_uid in data[0].split():
        print latest_email_uid
        result, data = m.uid('fetch', latest_email_uid, '(RFC822)')
        raw_email = data[0][1]

        import email
        email_message = email.message_from_string(raw_email)
        print email_message['To']
        print email_message['Subject']
        print email.utils.parseaddr(email_message['From'])
        print email_message.items() # print all headers
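Since the question asks for the sender's name or address, here is a small sketch built on email.utils.parseaddr, which the code above already calls. It splits a From header into a display name and an address, so there is no need to strip the angle brackets by hand. The helper name sender_label and the sample values are invented for illustration.

```python
from email.utils import parseaddr


def sender_label(from_header):
    """Return the display name if there is one, otherwise the bare address."""
    # parseaddr returns a (name, address) pair; either part may be empty.
    name, address = parseaddr(from_header or '')
    return name or address


print(sender_label('Alice Example <alice@example.com>'))  # Alice Example
print(sender_label('bob@example.com'))                    # bob@example.com
```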

AttributeError: 'module' object has no attribute 'message_from_string'. I am importing email, promise. - Chase Roberts
1
@ChaseRoberts You need to use from email import email. I'm guessing you used import email, which means you are trying to access message_from_string at the wrong level - blockloop

3

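I was looking for a ready-made simple script to list the latest inbox messages via IMAP without going through every message. The information here is useful, though it is DIY and misses some aspects. First, IMAP4.select returns the message count. Second, decoding the subject header is not straightforward.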

#! /usr/bin/env python
# -*- coding: utf-8 -*-


import imaplib
import email
from email.header import decode_header
import HTMLParser


# to unescape xml entities
_parser = HTMLParser.HTMLParser()

def decodeHeader(value):
  if value.startswith('"=?'):
    value = value.replace('"', '')

  value, encoding = decode_header(value)[0]
  if encoding:
    value = value.decode(encoding)

  return _parser.unescape(value)

def listLastInbox(top = 4):
  mailbox = imaplib.IMAP4_SSL('imap.gmail.com')
  mailbox.login('mygmail@gmail.com', 'somecrazypassword')

  selected = mailbox.select('INBOX')
  assert selected[0] == 'OK'
  messageCount = int(selected[1][0])

  for i in range(messageCount, messageCount - top, -1):
    reponse = mailbox.fetch(str(i), '(RFC822)')[1]
    for part in reponse:
      if isinstance(part, tuple):
        message = email.message_from_string(part[1])
        yield {h: decodeHeader(message[h]) for h in ('subject', 'from', 'date')}

  mailbox.logout()


if __name__ == '__main__':
  for message in listLastInbox():
    print '-' * 40
    for h, v in message.items():
      print u'{0:8s}: {1}'.format(h.upper(), v)
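On Python 3, the header decoding step can be sketched with decode_header and make_header from email.header, which together handle RFC 2047 encoded words. This is an alternative to the hand-written decodeHeader above, not a port of it; in particular it does not unescape XML entities. The helper name decode_mime_header and the sample values are invented for illustration.

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Decode an RFC 2047 encoded header value into a plain str."""
    if value is None:
        return ''
    # decode_header splits the value into (bytes_or_str, charset) chunks;
    # make_header joins them back into a Header that str() can render.
    return str(make_header(decode_header(value)))


print(decode_mime_header('=?utf-8?q?Caf=C3=A9_menu?='))
print(decode_mime_header('Plain subject'))
```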

3

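BODY gets pretty much everything and marks the message as read. BODY[<parts>] gets just those parts. BODY.PEEK[<parts>] gets the same parts but does not mark the message as read. <parts> can be HEADER, TEXT, HEADER.FIELDS (<list of fields>), or HEADER.FIELDS.NOT (<list of fields>).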

This is what I used: typ, data = connection.fetch(message_num_s, b'(BODY.PEEK[HEADER.FIELDS (SUBJECT FROM)])')


def safe_encode(seq):
    if not isinstance(seq, (list, tuple)):  # wrap a single value so it can be iterated
        seq = [seq]
    for i in seq:
        if isinstance(i, (int,float)):
            yield str(i).encode()
        elif isinstance(i, str):
            yield i.encode()
        elif isinstance(i, bytes):
            yield i
        else:
            raise ValueError

def fetch_fields(connection, message_num, field_s):
    """Fetch just the fields we care about. Parse them into a dict"""
    if isinstance(field_s, (list,tuple)):
        field_s = b' '.join(safe_encode(field_s))
    else:
        field_s = tuple(safe_encode(field_s))[0]

    message_num = tuple(safe_encode(message_num))[0]

    typ, data = connection.fetch(message_num, b'(BODY.PEEK[HEADER.FIELDS (%s)])'%(field_s.upper()))
    if typ != 'OK':
        return typ, data  #change this to an exception if you'd rather

    items={}
    lastkey = None
    for line in data[0][1].splitlines():
        if b':' in line:
            lastkey, value = line.strip().split(b':', 1)
            lastkey = lastkey.capitalize()
            #not all servers capitalize the same, and some just leave it
            #as however it arrived from some other mail server.

            items[lastkey]=value
        else:
            #subject was so long it ran onto the next line, luckily it didn't have a ':' in it so its easy to recognize.
            items[lastkey]+=line
            #print(items[lastkey])
    return typ, items

You can drop this into the code example above by replacing the call to "mail.fetch()" with fetch_fields(mail, i, 'SUBJECT FROM') or fetch_fields(mail, i, ('SUBJECT', 'FROM')).
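As a variation on the manual line splitting, the header block returned by BODY.PEEK[HEADER.FIELDS ...] can also be handed to the standard email parser. The sketch below assumes Python 3.6 or later and uses email.policy.default, under which header values are unfolded and RFC 2047 encoded words are decoded on access. The names fetch_subject_and_sender and FakeConnection are invented for illustration; FakeConnection only imitates the shape of an imaplib reply so the example runs offline, and a real imaplib.IMAP4_SSL object would take its place.

```python
import email
from email import policy


def fetch_subject_and_sender(connection, message_num):
    """Fetch only Subject and From for one message without marking it read."""
    typ, data = connection.fetch(
        str(message_num), '(BODY.PEEK[HEADER.FIELDS (SUBJECT FROM)])')
    if typ != 'OK':
        raise RuntimeError('fetch failed for message %s' % message_num)
    # data[0] is an (envelope, header_bytes) pair; with policy.default the
    # parser unfolds continuation lines and decodes encoded words.
    headers = email.message_from_bytes(data[0][1], policy=policy.default)

    def text(name):
        value = headers[name]
        return '' if value is None else str(value)

    return {'subject': text('Subject'), 'from': text('From')}


class FakeConnection:
    """Hypothetical stand-in for imaplib.IMAP4_SSL, for offline demonstration only."""

    def fetch(self, message_set, message_parts):
        header_block = (b"Subject: A rather long subject line that has been\r\n"
                        b" folded onto a second line\r\n"
                        b"From: Alice Example <alice@example.com>\r\n\r\n")
        envelope = b'1 (BODY[HEADER.FIELDS (SUBJECT FROM)] {%d}' % len(header_block)
        return 'OK', [(envelope, header_block), b')']


print(fetch_subject_and_sender(FakeConnection(), 1))
```

Because only two header fields travel over the wire, this stays cheap even for large messages with attachments.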


1
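All the other answers fetch the entire message, e.g. mail.fetch( i, '(RFC822)' ), which is expensive and slow. I believe this is the only answer that actually applies the IMAPv4rev1 design. - Stephen Bosch
I can't believe this is the lowest-scored answer. decode_header and make_header from the email.header module will assemble and decode the headers for you. - viilpe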

2
In addition to all the answers above, here is one more.
import imaplib
import base64
import os
import email

if __name__ == '__main__':
    email_user = "email@domain.com"
    email_pass = "********"
    mail = imaplib.IMAP4_SSL("hostname", 993)
    mail.login(email_user, email_pass)
    mail.select()
    type, data = mail.search(None, 'ALL')
    mail_ids = data[0].decode('utf-8')
    id_list = mail_ids.split()
    mail.select('INBOX', readonly=True)
    for i in id_list:
        typ, msg_data = mail.fetch(str(i), '(RFC822)')
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_bytes(response_part[1])
                print(msg['from']+"\t"+msg['subject'])

This will give you the sender and the subject of each email.
