Downloading multiple attachments using imaplib


How can I download multiple attachments from a single email using imaplib?

Say an email has 4 attachments; how can I download all of them? The code below downloads only one attachment from an email.

import email
import imaplib
import os

detach_dir = 'c:/downloads'
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login('hello@gmail.com','3323434')
m.select("[Gmail]/All Mail")

resp, items = m.search(None, "(UNSEEN)")
items = items[0].split()

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") 
    email_body = data[0][1] 
    mail = email.message_from_string(email_body) 
    temp = m.store(emailid,'+FLAGS', '\\Seen')
    m.expunge()

    if mail.get_content_maintype() != 'multipart':
        continue

    print "["+mail["From"]+"] :" + mail["Subject"]

    for part in mail.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        att_path = os.path.join(detach_dir, filename)

        if not os.path.isfile(att_path) :
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()
            return HttpResponse('check folder')

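Your reliance on an explicit Content-Disposition: header being present is wrong in several ways. Unfortunately, several answers here blindly inherit that flaw. Perhaps see https://dev59.com/bqjka4cB1Zd3GeqPCrPg#48563281 - tripleee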
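A minimal sketch of a less brittle filter, assuming Python 3: rather than requiring a Content-Disposition header, it treats a non-multipart part as an attachment if it has a filename (get_filename() reads the filename parameter of Content-Disposition and falls back to the name parameter of Content-Type) or is explicitly marked "attachment". This is one heuristic, not necessarily the one in the linked answer; iter_attachment_parts and the sample message are made up for illustration.

```python
import email
from email.message import Message
from typing import Iterator


def iter_attachment_parts(msg: Message) -> Iterator[Message]:
    """Yield leaf parts that look like attachments (a heuristic, not a standard)."""
    for part in msg.walk():
        # multipart/* containers only hold other parts, never file data
        if part.is_multipart():
            continue
        # get_filename() checks Content-Disposition filename=, then Content-Type name=
        has_name = part.get_filename() is not None
        marked = part.get_content_disposition() == "attachment"
        if has_name or marked:
            yield part


# Made-up message: the PDF part has a name but no Content-Disposition header
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"body text\r\n"
    b"--XX\r\n"
    b'Content-Type: application/pdf; name="report.pdf"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGVsbG8=\r\n"
    b"--XX--\r\n"
)
msg = email.message_from_bytes(raw)
found = [(p.get_filename(), p.get_payload(decode=True)) for p in iter_attachment_parts(msg)]
print(found)  # [('report.pdf', b'hello')]
```

The PDF part in this sample has no Content-Disposition header at all, so the filter used in the question would have skipped it.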
7 answers


For future Python travelers: here is a class that downloads every attachment in an email and saves it to a specified location.

import email
import email.utils
import imaplib
import os

class FetchEmail():

    connection = None
    error = None

    def __init__(self, mail_server, username, password):
        self.connection = imaplib.IMAP4_SSL(mail_server)
        self.connection.login(username, password)
        self.connection.select(readonly=False) # so we can mark mails as read

    def close_connection(self):
        """
        Close the selected mailbox and log out of the IMAP server
        """
        self.connection.close()
        self.connection.logout()

    def save_attachment(self, msg, download_folder="/tmp"):
        """
        Given a message, save its attachments to the specified
        download folder (default is /tmp)

        return: file path to attachment
        """
        att_path = "No attachment found."
        for part in msg.walk():
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue

            filename = part.get_filename()
            if not filename:
                # A part can carry Content-Disposition without a filename; skip it
                continue
            att_path = os.path.join(download_folder, filename)

            if not os.path.isfile(att_path):
                with open(att_path, 'wb') as fp:
                    fp.write(part.get_payload(decode=True))
        return att_path

    def fetch_unread_messages(self):
        """
        Retrieve unread messages
        """
        emails = []
        (result, messages) = self.connection.search(None, 'UnSeen')
        if result == "OK":
            # messages[0] is bytes in Python 3, so split on whitespace rather than ' '
            for message in messages[0].split():
                try:
                    ret, data = self.connection.fetch(message, '(RFC822)')
                except imaplib.IMAP4.error:
                    print("No new emails to read.")
                    self.close_connection()
                    exit()

                msg = email.message_from_bytes(data[0][1])
                if not isinstance(msg, str):
                    emails.append(msg)
                response, data = self.connection.store(message, '+FLAGS', '\\Seen')

            return emails

        self.error = "Failed to retrieve emails."
        return emails

    def parse_email_address(self, email_address):
        """
        Helper function to parse out the email address from the message

        return: tuple (name, address). Eg. ('John Doe', 'jdoe@example.com')
        """
        return email.utils.parseaddr(email_address)

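For Python 3, use msg = email.message_from_bytes(data[0][1]) instead of msg = email.message_from_string(data[0][1]). Otherwise for part in msg.walk() will not work as expected. - a.Dippel
I wonder what this "UnSeen" search means? In my case it returns no messages. - sequence
Where do we pass the location where we want the attachments saved? - BrianBeing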
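I suggest doing it as follows; it is more direct and also handles octet-stream attachments: filename = part.get_filename(); if there is a filename, save it to the download folder with att_path = os.path.join(download_folder, filename); then open the file and write the content: fp = open(att_path, 'wb'); fp.write(part.get_payload(decode=True)); fp.close() - BenSabo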
This class is very useful. Thanks! - Kira
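To make the Python 3 point concrete: imaplib hands back the message as bytes inside the fetch response, typically shaped like [(b'1 (RFC822 {size}', raw_bytes), b')'], which is why the code indexes data[0][1]. A minimal sketch assuming that response shape; message_from_fetch is a hypothetical helper, and fake_data is invented for the example rather than captured from a server.

```python
import email
from email.message import Message


def message_from_fetch(data: list) -> Message:
    """Return the parsed message from an imaplib fetch() result (hypothetical helper)."""
    for item in data:
        # The message literal arrives as a (prefix, raw_bytes) tuple; the
        # trailing b')' entry is plain bytes and is skipped.
        if isinstance(item, tuple):
            # Python 3 imaplib yields bytes, so parse with message_from_bytes;
            # message_from_string(raw_bytes) raises TypeError instead.
            return email.message_from_bytes(item[1])
    raise ValueError("no message literal in fetch response")


# Invented example of the data list returned by m.fetch(b"1", "(RFC822)")
raw = b"Subject: hi\r\nContent-Type: text/plain\r\n\r\nbody\r\n"
fake_data = [(b"1 (RFC822 {47}", raw), b")"]
msg = message_from_fetch(fake_data)
print(msg["Subject"])  # hi
```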


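I refactored the code into functions, and I use PEEK so that fetching does not change the emails' unread status.

I'm posting my solution, which is similar to @John's but uses only functions instead of a class: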

import imaplib
import email

# Connect to an IMAP server
def connect(server, user, password):
    m = imaplib.IMAP4_SSL(server)
    m.login(user, password)
    m.select()
    return m

# Download all attachment files for a given email
def downloaAttachmentsInEmail(m, emailid, outputdir):
    resp, data = m.fetch(emailid, "(BODY.PEEK[])")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)
    if mail.get_content_maintype() != 'multipart':
        return
    for part in mail.walk():
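        if part.get_content_maintype() != 'multipart' and part.get('Content-Disposition') is not None:
            filename = part.get_filename()
            if filename:  # skip parts without a filename instead of crashing on None
                with open(outputdir + '/' + filename, 'wb') as f:
                    f.write(part.get_payload(decode=True))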

# Download all the attachment files for all emails in the inbox.
def downloadAllAttachmentsInInbox(server, user, password, outputdir):
    m = connect(server, user, password)
    resp, items = m.search(None, "(ALL)")
    items = items[0].split()
    for emailid in items:
        downloaAttachmentsInEmail(m, emailid, outputdir)

What should we pass for the emailid and outputdir parameters? - BrianBeing
For emailid, look at how downloadAllAttachmentsInInbox() calls downloaAttachmentsInEmail(). outputdir is the directory the attachments are downloaded to. - sashoalm
OK. So I only need to fill in values for server, user, password and outputdir? - BrianBeing
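This really works. In my case I had to change message_from_string to message_from_bytes, and then it worked perfectly. - Mujeeb Ishaque
Hmm, it downloads .dat files containing the message itself rather than the attachments. - Okloks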
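On BrianBeing's question: emailid is one of the message sequence numbers that m.search() returns, and outputdir is any existing local directory (os.makedirs(outputdir, exist_ok=True) creates it if needed). A small sketch of how the ids come out of the search response, assuming Python 3; ids_from_search and the sample data are hypothetical. The ids arrive as one space-separated bytes string, and bytes.split() with no argument handles both the normal and the empty case, whereas split(' ') raises TypeError on bytes.

```python
def ids_from_search(data: list) -> list:
    """Turn the data part of imaplib's search() result into message ids (hypothetical)."""
    # data looks like [b'3 7 12']: one space-separated bytes string.
    # bytes.split() with no argument also turns an empty result [b''] into [].
    return data[0].split()


# Invented search result; in real code: typ, data = m.search(None, "ALL")
typ, data = "OK", [b"3 7 12"]
ids = ids_from_search(data)
for emailid in ids:
    print(emailid)  # b'3', then b'7', then b'12'
```

Each of these values can be passed as emailid to downloaAttachmentsInEmail(m, emailid, outputdir).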

Your code looks fine, except that there is a return after fp.close() (a typo, perhaps?):

...
fp.write(part.get_payload(decode=True))
fp.close()
return HttpResponse('check folder')

The function returns right after the first attachment is saved. Comment that line out and see whether it fixes your problem.
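A sketch of the question's loop with the return moved out of it, assuming Python 3 and keeping the question's own Content-Disposition filter; save_all_attachments is a hypothetical name, and the caller (for example the Django view) can build its HttpResponse after it returns. The demo message is built in memory with email.message.EmailMessage instead of being fetched over IMAP.

```python
import email
import os
import tempfile
from email.message import EmailMessage, Message


def save_all_attachments(mail: Message, detach_dir: str) -> list:
    """Save every attachment of mail into detach_dir and return the paths written."""
    saved = []
    for part in mail.walk():
        if part.get_content_maintype() == "multipart":
            continue
        if part.get("Content-Disposition") is None:
            continue
        filename = part.get_filename()
        if not filename:
            continue  # a disposition header without a filename: nothing to name the file
        att_path = os.path.join(detach_dir, filename)
        if not os.path.isfile(att_path):
            with open(att_path, "wb") as fp:
                fp.write(part.get_payload(decode=True))
            saved.append(att_path)
    # The return sits after the loop, so every part is visited
    return saved


# Demo: a made-up message with two attachments, built in memory instead of fetched
demo = EmailMessage()
demo["Subject"] = "report"
demo.set_content("see attached")
demo.add_attachment(b"one", maintype="application", subtype="octet-stream", filename="a.bin")
demo.add_attachment(b"two", maintype="application", subtype="octet-stream", filename="b.bin")
mail = email.message_from_bytes(demo.as_bytes())

with tempfile.TemporaryDirectory() as tmp:
    written = [os.path.basename(p) for p in save_all_attachments(mail, tmp)]
    with open(os.path.join(tmp, "b.bin"), "rb") as fh:
        second = fh.read()
print(written)  # ['a.bin', 'b.bin']
```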


You can use the imap_tools package: https://pypi.org/project/imap-tools/
from imap_tools import MailBox
with MailBox('imap.mail.com').login('test@mail.com', 'password', 'INBOX') as mailbox:
    for message in mailbox.fetch():
        for att in message.attachments:  # list: [Attachment objects]
            att.filename         # str: 'cat.jpg'
            att.content_type     # str: 'image/jpeg'
            att.payload          # bytes: b'\xff\xd8\xff\xe0\'
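The snippet above only reads the attachment attributes. A sketch of actually writing them to disk, assuming only the filename and payload attributes shown there; save_imap_tools_attachments is hypothetical, and the demo uses SimpleNamespace stand-ins rather than a live mailbox.

```python
import os
import tempfile
from types import SimpleNamespace


def save_imap_tools_attachments(attachments, folder: str) -> list:
    """Write each attachment's bytes into folder and return the paths (hypothetical).

    Only .filename and .payload are used, so imap_tools attachment objects
    or any stand-in with those two attributes will do.
    """
    os.makedirs(folder, exist_ok=True)
    paths = []
    for att in attachments:
        if not att.filename:
            continue  # skip attachments that came without a usable name
        path = os.path.join(folder, att.filename)
        with open(path, "wb") as fh:
            fh.write(att.payload)  # payload is already decoded bytes
        paths.append(path)
    return paths


# Real use would be: save_imap_tools_attachments(message.attachments, "downloads")
# inside the `for message in mailbox.fetch():` loop shown above.
fake = [SimpleNamespace(filename="cat.jpg", payload=b"\xff\xd8"),
        SimpleNamespace(filename="notes.txt", payload=b"hi")]
with tempfile.TemporaryDirectory() as tmp:
    saved = [os.path.basename(p) for p in save_imap_tools_attachments(fake, tmp)]
print(saved)  # ['cat.jpg', 'notes.txt']
```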

You can try the following function to get mail attachments:

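import base64

def create_message_attachment(self, msg_str):
    """Collect (file_name, content_id, base64 body) for each attachment-like part."""
    attachments = []
    count = 1
    for part in msg_str.walk():
        content_id = ''
        mptype = part.get_content_maintype()
        file_name_gl = part.get_filename()
        if mptype == "multipart":
            continue
        elif mptype == "text":
            # Text parts without a filename are the message body, not attachments
            if not file_name_gl:
                continue
        elif mptype == "image":
            content_id = part.get('Content-ID')
            if not file_name_gl:
                # Inline images often have no filename, so generate one
                file_name_gl = 'image_' + str(count) + '.' + part.get_content_subtype()
                count = count + 1

        body = part.get_payload(decode=True)
        # Do not strip() binary payloads: trailing whitespace bytes can be real data
        if body:
            body = base64.b64encode(body)
        attachments.append((file_name_gl, content_id, body))
    return attachments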

import re

def get_valid_filename(s):
    s = str(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)

# Inside the loop over message parts:
fileName = get_valid_filename(part.get_filename())

This sanitizes the file name if it contains invalid characters, for example a colon on Windows.
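Sanitizing is only part of it: part.get_filename() can return None, and two attachments can share a name, which makes the earlier answers either crash (os.path.join with None raises TypeError) or skip or overwrite a file. A sketch that reuses get_valid_filename from this answer and adds a fallback name plus a numeric suffix; unique_attachment_path and the fallback name "attachment" are hypothetical choices.

```python
import os
import re
import tempfile


def get_valid_filename(s):
    # Same sanitizer as in the answer above
    s = str(s).strip().replace(' ', '_')
    return re.sub(r'(?u)[^-\w.]', '', s)


def unique_attachment_path(folder, raw_name, fallback="attachment"):
    """Sanitize raw_name, fall back when it is missing, and avoid overwriting files."""
    name = get_valid_filename(raw_name) if raw_name else ""
    # "", "." or ".." would point at the folder itself or its parent
    if not name.strip("."):
        name = fallback
    stem, ext = os.path.splitext(name)
    path = os.path.join(folder, name)
    n = 1
    while os.path.exists(path):
        # report.pdf taken -> try report_1.pdf, report_2.pdf, ...
        path = os.path.join(folder, f"{stem}_{n}{ext}")
        n += 1
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = unique_attachment_path(tmp, "report: Q1.pdf")
    open(first, "wb").close()  # pretend the first attachment was saved
    second = unique_attachment_path(tmp, "report: Q1.pdf")
    nameless = unique_attachment_path(tmp, None)
print(os.path.basename(first), os.path.basename(second), os.path.basename(nameless))
# report_Q1.pdf report_Q1_1.pdf attachment
```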


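@sashoalm's code worked for me with one small change:

In downloaAttachmentsInEmail, change mail = email.message_from_string(email_body) to mail = email.message_from_bytes(email_body).

I got an error when the bytes (the attachment) were read as a string; with this change it works perfectly for me.

Here is a complete example of the code: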

import email
import imaplib

server = 'outlook.office365.com'
user = 'YOUR USERNAME'
password = 'YOUR PASSWORD'
outputdir = 'DIRECTORY THAT YOU WANT FILES DOWNLOADED TO'
subject = 'Data Exports' #subject line of the emails you want to download attachments from

def connect(server, user, password):
    m = imaplib.IMAP4_SSL(server)
    m.login(user, password)
    m.select()
    return m

def downloaAttachmentsInEmail(m, emailid, outputdir):
    resp, data = m.fetch(emailid, "(BODY.PEEK[])")
    email_body = data[0][1]
    mail = email.message_from_bytes(email_body)
    if mail.get_content_maintype() != 'multipart':
        return
    for part in mail.walk():
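        if part.get_content_maintype() != 'multipart' and part.get('Content-Disposition') is not None:
            filename = part.get_filename()
            if filename:  # skip parts without a filename instead of crashing on None
                with open(outputdir + '/' + filename, 'wb') as f:
                    f.write(part.get_payload(decode=True))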

#download attachments from all emails with a specified subject line
def downloadAttachments(subject):
    m = connect(server, user, password)
    m.select("Inbox")
    typ, msgs = m.search(None, '(SUBJECT "' + subject + '")')
    msgs = msgs[0].split()
    for emailid in msgs:
        downloaAttachmentsInEmail(m, emailid, outputdir)

downloadAttachments(subject)
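Building the SUBJECT criterion by plain string concatenation breaks if the subject itself contains a double quote. A sketch of quoting it as an IMAP quoted string (RFC 3501 escapes backslash and double quote with a backslash), assuming an ASCII subject; non-ASCII subjects need a charset argument to search(), which is not shown here. imap_quote is a hypothetical helper.

```python
def imap_quote(value: str) -> str:
    """Return value as an IMAP quoted string (hypothetical helper, ASCII only).

    RFC 3501 quoted strings escape backslash and double quote with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


subject = 'Q1 "final" Data Exports'  # made-up subject containing quotes
criteria = "(SUBJECT " + imap_quote(subject) + ")"
print(criteria)  # (SUBJECT "Q1 \"final\" Data Exports")
# In downloadAttachments this would become: typ, msgs = m.search(None, criteria)
```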

Page content provided by Stack Overflow; see the original link for the English original.