How can I download all emails with attachments from Gmail?


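How do I connect to Gmail and determine which messages have attachments? I then want to download each attachment, printing out the Subject and From for each message as I process it.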


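This site is about getting well-defined answers to well-defined questions. Is my question not well-defined? Now I'm looking for a well-defined answer in one of the three languages I commonly use. - anon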
13 Answers


Hard one :-)

import email, getpass, imaplib, os

detach_dir = '.' # directory where to save attachments (default: current)
user = raw_input("Enter your GMail username:")
pwd = getpass.getpass("Enter your password: ")

# connecting to the gmail imap server
m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.select("[Gmail]/All Mail") # here you can choose a mailbox like INBOX instead
# use m.list() to get all the mailboxes

resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

counter = 1 # numbers attachments that have no filename; set once so the generated names stay unique

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "(RFC822)" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object

    # Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue

    # From or Subject can be missing (None), so fall back to an empty string
    print "[" + (mail["From"] or "") + "] :" + (mail["Subject"] or "")

    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headache
    for part in mail.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, '.bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        # Check if it's already there
        if not os.path.isfile(att_path):
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()
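The print line builds its output straight from the From and Subject headers. Those can be missing (the `TypeError` reported in the comments below) or RFC 2047 encoded, e.g. `=?utf-8?q?...?=`. A minimal Python 3 sketch for turning a header into readable text with the standard `email.header` module; `header_text` and its `fallback` parameter are hypothetical names introduced here:

```python
from email.header import decode_header, make_header


def header_text(value, fallback=''):
    """Decode an RFC 2047 header value (e.g. a Subject) into a plain str.

    Returns `fallback` when the header is missing, so callers can safely
    concatenate the result. `header_text` is a hypothetical helper name.
    """
    if value is None:
        return fallback
    # decode_header splits the value into (chunk, charset) pairs;
    # make_header reassembles them and str() yields the decoded text.
    return str(make_header(decode_header(value)))
```

For example, `header_text('=?utf-8?q?caf=C3=A9?=')` gives `'café'`, and `header_text(None)` gives `''`.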

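Wow! That was something. ;-) But try the same in Java, just for fun!

By the way, I tested that in a shell, so some errors likely remain.

Enjoy!

EDIT:

Because mailbox names can differ from one country to another, I recommend doing m.list() and picking an item from it before m.select("the mailbox name"), to avoid this error: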

imaplib.error: command SEARCH illegal in state AUTH, only allowed in states SELECTED
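Instead of guessing the localized name, the mailbox can be picked from `m.list()` by its role: Gmail marks its special folders with RFC 6154 special-use flags (`\All`, `\Sent`, `\Trash`, ...) in the LIST response. A sketch, assuming the server sends those flags and that each LIST line has the usual `(flags) "delimiter" name` shape; `find_special_mailbox` is a hypothetical helper, not part of `imaplib`:

```python
import re

# One LIST response line: (flags) "delimiter" name
_LIST_RE = re.compile(r'\((?P<flags>[^)]*)\) (?:"[^"]*"|NIL) (?P<name>.+)')


def find_special_mailbox(list_lines, flag='\\All'):
    """Return the raw mailbox name carrying `flag`, or None if none does.

    Assumes the server advertises RFC 6154 special-use flags in LIST, as
    Gmail does. `list_lines` is the data part of m.list() (bytes on Py3).
    """
    for line in list_lines:
        if isinstance(line, bytes):
            # mailbox names are sent as modified UTF-7, which is plain ASCII
            line = line.decode('ascii', 'replace')
        match = _LIST_RE.match(line)
        # IMAP flags are case-insensitive
        if match and flag.lower() in match.group('flags').lower().split():
            name = match.group('name')
            if name.startswith('"') and name.endswith('"'):
                name = name[1:-1]  # drop the surrounding quotes
            return name
    return None
```

Typical use: `typ, data = m.list()`, then, after checking the result is not None, `m.select('"%s"' % find_special_mailbox(data))`. The quotes matter on Python 3, where `imaplib` generally sends a mailbox name such as `[Gmail]/All Mail` without quoting it.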


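If you needed to run this on a highly active system, would it be better to process each email individually, or all of them periodically in one go? Both solutions would require a queue, but I'm wondering which one would be easier to scale. - kari.patila
Usually you'd have to measure to be sure, but given the overhead of SSL connections I'd go with batches first: connect, process 100 mails, then do it again. You get the benefits of batching, and if something fails you can restart from the last batch. You can also queue the batches. - Bite code
The script breaks when a mail has no subject: print "["+mail["From"]+"] :" + mail["Subject"] TypeError: cannot concatenate 'str' and 'NoneType' objects. I personally added the following to fix it: if mail["Subject"] is not None: print "["+mail["From"]+"] :" + mail["Subject"] else: print "["+mail["From"]+"]" - franzlorenzon
Skip the [Gmail]/ part, it's "standard". Doing that gets rid of the "command ... in state ..." error. - cox
Not all the mails are being fetched, and secondly Gmail security doesn't allow fetching all users' mails. Can anyone help? - hussain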
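Bite code's batching suggestion can be sketched like this: one connection, and one FETCH per batch of message ids (IMAP accepts a comma-separated message set such as `1,2,3`). The batch size of 100 comes from that comment; `batches` and `fetch_in_batches` are hypothetical helper names, and `conn` is assumed to be a logged-in `imaplib` connection with a mailbox selected:

```python
def batches(ids, size=100):
    """Yield successive chunks of at most `size` message ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(conn, ids, size=100):
    """Fetch full messages `size` at a time over a single connection.

    Yields (msg_id, raw_bytes). `ids` are the bytes or str ids returned
    by conn.search(); a failed batch raises so the caller can resume there.
    """
    for chunk in batches(ids, size):
        msg_set = ','.join(i.decode() if isinstance(i, bytes) else i
                           for i in chunk)
        typ, data = conn.fetch(msg_set, '(RFC822)')
        if typ != 'OK':
            raise RuntimeError('FETCH failed for batch starting at %r' % (chunk[0],))
        for item in data:
            # Each message arrives as a (b'N (RFC822 {size}', raw_bytes)
            # tuple; the bare b')' separators between them are skipped.
            if isinstance(item, tuple):
                yield item[0].split()[0], item[1]
```

Because each batch is a separate FETCH, a failure surfaces as an exception naming the batch, and the caller can reconnect and resume from there.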

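I'm no Perl expert, but I do know that Gmail supports IMAP and POP3, two completely standard protocols that let you do this. Maybe that helps you get started.

I'd say IMAP is more reliable for backup purposes. - Kris Kumler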
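For the POP3 route, Python's standard `poplib` is enough. A sketch assuming POP is enabled in the Gmail settings and that `pop.gmail.com` on port 995 (POP3 over SSL) is the endpoint; which messages POP exposes, and what happens to them after download, is controlled by those settings rather than by the client. `pop3_messages` and `message_from_pop_lines` are hypothetical names:

```python
import email
import poplib


def pop3_messages(user, password, host='pop.gmail.com', port=995):
    """Yield parsed messages from a POP3-over-SSL mailbox, oldest first."""
    conn = poplib.POP3_SSL(host, port)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()  # number of messages, total size
        for i in range(1, count + 1):
            _resp, lines, _octets = conn.retr(i)
            yield message_from_pop_lines(lines)
    finally:
        conn.quit()  # runs when the generator is exhausted or closed


def message_from_pop_lines(lines):
    """Rebuild an email.message.Message from the line list RETR returns."""
    return email.message_from_bytes(b'\r\n'.join(lines))
```

Each yielded message can then go through the same `walk()` loop as the IMAP answers.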

#!/usr/bin/env python
"""Save all attachments for given gmail account."""
import os, sys
from libgmail import GmailAccount

ga = GmailAccount("your.account@gmail.com", "pA$$w0Rd_")
ga.login()

# folders: inbox, starred, all, drafts, sent, spam
for thread in ga.getMessagesByFolder('all', allPages=True):
    for msg in thread:
        sys.stdout.write('.')
        if msg.attachments:
            print "\n", msg.id, msg.number, msg.subject, msg.sender
            for att in msg.attachments:
                if att.filename and att.content:
                    attdir = os.path.join(thread.id, msg.id)
                    if not os.path.isdir(attdir):
                        os.makedirs(attdir)
                    with open(os.path.join(attdir, att.filename), 'wb') as f:
                        f.write(att.content)

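Untested

  1. Make sure the Terms of Service allow such scripts, otherwise your account could be suspended.
  2. There might be better options: Gmail offline mode, Thunderbird + ExtractExtensions, GmailFS, Gmail Drive, etc.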
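The libgmail script passes `att.filename` straight to `os.path.join`, and the IMAP scripts in this thread do the same with `part.get_filename()`. That name is chosen by whoever sent the mail, so a value like `../../something` could write outside the target directory. A defensive sketch; the exact rules (which characters get replaced, the `-N` suffix on collisions, the `attachment.bin` fallback) are assumptions of this example, and `safe_attachment_path` is a hypothetical helper:

```python
import os
import re


def safe_attachment_path(directory, filename, fallback='attachment.bin'):
    """Return a path inside `directory` for an attachment, never overwriting.

    Directory components from the sender-supplied name are dropped (so no
    '../' escapes), awkward characters become '_', and a numeric suffix is
    added when the file already exists.
    """
    name = os.path.basename((filename or '').replace('\\', '/'))
    name = re.sub(r'[\x00-\x1f<>:"|?*]', '_', name).strip(' .') or fallback
    stem, ext = os.path.splitext(name)
    path = os.path.join(directory, name)
    n = 1
    while os.path.exists(path):
        path = os.path.join(directory, '%s-%d%s' % (stem, n, ext))
        n += 1
    return path
```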

http://libgmail.cvs.sourceforge.net/viewvc/libgmail/libgmail/libgmail.py?view=markup - jfs

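Take a look at Mail::Webmail::Gmail, GETTING ATTACHMENTS. There are two ways to get an attachment:
1 -> By sending a reference to a specific attachment returned by get_indv_email
For example, this collects every attachment in your account: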
# Creates an array of references to every attachment in your account
my $messages = $gmail->get_messages();
my @attachments;

foreach ( @{ $messages } ) {
    my $email = $gmail->get_indv_email( msg => $_ );
    if ( defined( $email->{ $_->{ 'id' } }->{ 'attachments' } ) ) {
        foreach ( @{ $email->{ $_->{ 'id' } }->{ 'attachments' } } ) {
            push( @attachments, $gmail->get_attachment( attachment => $_ ) );
            if ( $gmail->error() ) {
                print $gmail->error_msg();
            }
        }
    }
}

2 -> Or by sending an attachment ID and a message ID

#retrieve specific attachment
my $msgid = 'F000000000';
my $attachid = '0.1';
my $attach_ref = $gmail->get_attachment( attid => $attachid, msgid => $msgid );

Returns a reference to a scalar that holds the data from the attachment.


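In Gmail you can filter on "has:attachment" and use that to identify the messages you should be getting when testing. Note that this appears to return both messages with attached files (paperclip icon shown) and messages with inline images (no paperclip shown).
Since there is no Gmail API, IMAP or POP are your only options. The JavaMail API may help, as may this very succinct article on downloading attachments from IMAP using Perl. Some previous questions here on SO may also help.
This PHP example may help too. Unfortunately, as far as I can see, imap_header doesn't include attachment info, so the body has to be downloaded to see the X-Attachment-Id field. (Someone please prove me wrong.)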
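On filtering before downloading: Gmail exposes its own search syntax over IMAP through the Gmail-specific `X-GM-RAW` search extension, so the `has:attachment` filter mentioned above can run on the server. A sketch assuming a logged-in `imaplib` connection with a mailbox selected; like the web filter, it may also match messages that only have inline images. `search_with_attachments` is a hypothetical name:

```python
import imaplib


def search_with_attachments(conn):
    """Return the sequence numbers Gmail's has:attachment filter matches.

    X-GM-RAW is a Gmail-only IMAP search extension that takes a Gmail
    search query as its argument; other servers will reject it.
    """
    typ, data = conn.search(None, 'X-GM-RAW', '"has:attachment"')
    if typ != 'OK':
        raise imaplib.IMAP4.error('search failed: %r' % (data,))
    return data[0].split()
```

The returned sequence numbers can replace the `search(None, "ALL")` result in the scripts here, so only candidate messages are fetched.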


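This question is quite old, and the Gmail API did not exist back then. Google now offers the Gmail API, a REST alternative to IMAP for accessing Gmail. See Google's Gmail API documentation for more information, and check out google-api-python-client on PyPI.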
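A rough outline of the same task with the Gmail API and `google-api-python-client`: list message ids matching a Gmail query, fetch each message, and download the parts that carry an `attachmentId`. This is a sketch under several assumptions: `service` has already been built with `googleapiclient.discovery.build('gmail', 'v1', credentials=...)` using a read scope such as `https://www.googleapis.com/auth/gmail.readonly`, only the first page of results is handled (no `nextPageToken` loop), and the helper names are hypothetical:

```python
import base64


def iter_attachment_parts(payload):
    """Yield (filename, body) for every part that has a filename, recursively.

    `payload` is the 'payload' dict of a users.messages.get response;
    nested multiparts keep their children under 'parts'.
    """
    if payload.get('filename'):
        yield payload['filename'], payload.get('body', {})
    for child in payload.get('parts', []):
        yield from iter_attachment_parts(child)


def decode_body_data(data):
    """Decode the base64url 'data' field, restoring any stripped padding."""
    return base64.urlsafe_b64decode(data + '=' * (-len(data) % 4))


def download_attachments(service, query='has:attachment'):
    """Yield (message_id, filename, content_bytes) for matching messages."""
    users = service.users()
    listing = users.messages().list(userId='me', q=query).execute()
    for ref in listing.get('messages', []):
        msg = users.messages().get(userId='me', id=ref['id']).execute()
        for filename, body in iter_attachment_parts(msg['payload']):
            if 'attachmentId' in body:
                att = users.messages().attachments().get(
                    userId='me', messageId=ref['id'],
                    id=body['attachmentId']).execute()
                data = att['data']
            else:
                data = body.get('data', '')  # small parts may be inline
            yield ref['id'], filename, decode_body_data(data)
```

Small parts can carry their content inline in `body['data']` instead of an `attachmentId`, which is why both cases are handled.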



In case any of you have moved to Python 3.3: I took the 2.7 script from here and updated it to 3.3, also fixing some issues with the way Gmail returns the data.

# Something along the lines of https://dev59.com/aXRC5IYBdhLWcg3wSu97
# Make sure you have IMAP enabled in your gmail settings.
# Right now it won't download same file name twice even if their contents are different.
# Gmail as of now returns in bytes but just in case they go back to string this line is left here.

import email
import getpass, imaplib
import os
import sys
import time

detach_dir = '.'
if 'attachments' not in os.listdir(detach_dir):
    os.mkdir(os.path.join(detach_dir, 'attachments'))

userName = input('Enter your GMail username:\n')
passwd = getpass.getpass('Enter your password:\n')


try:
    imapSession = imaplib.IMAP4_SSL('imap.gmail.com',993)
    typ, accountDetails = imapSession.login(userName, passwd)
    if typ != 'OK':
        # a bare "raise" outside an except block has no exception to re-raise
        raise Exception('Not able to sign in!')

    imapSession.select('Inbox')
    typ, data = imapSession.search(None, 'ALL')
    if typ != 'OK':
        raise Exception('Error searching Inbox.')

    # Iterating over all emails
    for msgId in data[0].split():
        typ, messageParts = imapSession.fetch(msgId, '(RFC822)')

        if typ != 'OK':
            raise Exception('Error fetching mail.')

        #print(type(emailBody))
        emailBody = messageParts[0][1]
        #mail = email.message_from_string(emailBody)
        mail = email.message_from_bytes(emailBody)

        for part in mail.walk():
            #print (part)
            if part.get_content_maintype() == 'multipart':
                # print part.as_string()
                continue
            if part.get('Content-Disposition') is None:
                # print part.as_string()
                continue

            fileName = part.get_filename()

            if bool(fileName):
                filePath = os.path.join(detach_dir, 'attachments', fileName)
                if not os.path.isfile(filePath) :
                    print (fileName)
                    with open(filePath, 'wb') as fp:
                        fp.write(part.get_payload(decode=True))

    imapSession.close()
    imapSession.logout()

except Exception as e:
    print ('Not able to download all attachments:', e)
    time.sleep(3)
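On Python 3.5 and later, the `email` package can do more of the work: parsing with `policy=email.policy.default` returns `EmailMessage` objects whose headers come back already decoded, and `get_content_disposition()` reports `'attachment'` directly. A sketch of the per-message step only (connecting and saving stay as in the script above); `extract_attachments` is a hypothetical name, and treating only `Content-Disposition: attachment` parts as attachments is a choice of this example, so inline images are skipped:

```python
import email
import email.policy


def extract_attachments(raw_bytes):
    """Return (subject, sender, [(filename, payload_bytes), ...]) for one message.

    A filename of None means the part had none; the caller picks a fallback.
    """
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only, their children are visited too
        if part.get_content_disposition() != 'attachment':
            continue  # body text and inline parts
        found.append((part.get_filename(), part.get_payload(decode=True)))
    return msg['Subject'], msg['From'], found
```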


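Gmail supports the standard protocols POP and IMAP, so any platform, tool, application, component, or API that provides a client for either protocol should work.

I suggest doing a Google search for your favorite language/platform (e.g., "python"), plus "pop", plus "imap", plus perhaps "open source", plus perhaps "download" or "review", and seeing what options come up.

There are numerous free applications and components; pick a few that look trustworthy, check their reviews, then download and enjoy.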


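Something you should be aware of: connecting to Gmail requires SSL (for both POP3 and IMAP; this is of course also true for their SMTP servers apart from port 25, but that's another story).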
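A sketch of opening both connections with TLS from the first byte (IMAP on port 993, POP3 on port 995) and an explicit certificate-verifying `ssl` context. The explicit context is there because, on many Python versions, `imaplib` and `poplib` do not verify the server certificate when no context is passed. Note also that Google has since stopped accepting the normal account password from most IMAP/POP clients; an app password or OAuth 2.0 is generally required, so `password` below is assumed to be an app password. The function names are hypothetical:

```python
import imaplib
import poplib
import ssl


def make_tls_context():
    """Context that verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def open_gmail_imap(user, password):
    """IMAP over implicit TLS on port 993."""
    conn = imaplib.IMAP4_SSL('imap.gmail.com', 993,
                             ssl_context=make_tls_context())
    conn.login(user, password)
    return conn


def open_gmail_pop3(user, password):
    """POP3 over implicit TLS on port 995."""
    conn = poplib.POP3_SSL('pop.gmail.com', 995, context=make_tls_context())
    conn.user(user)
    conn.pass_(password)
    return conn
```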


Here is the code I wrote in Groovy (a dynamic language for the Java platform) to download my bank statements.

import javax.mail.*
import java.util.Properties

String  gmailServer
int gmailPort
def user, password, LIMIT
def inboxFolder, root, StartDate, EndDate


//    Downloads all attachments from a gmail mail box as per some criteria
//    to a specific folder
//    Based on code from
//    http://agileice.blogspot.com/2008/10/using-groovy-to-connect-to-gmail.html
//    https://dev59.com/QUXRa4cB1Zd3GeqPrF5A
//
//    Requires: 
//        java mail jars in the class path (mail.jar and activation.jar)
//        openssl, with gmail certificate added to java keystore (see agileice blog)
//        
//    further improvement: maybe findAll could be used to filter messages
//    subject could be added as another criteria
////////////////////// <CONFIGURATION> //////////////////////
// Maximum number of emails to access in case parameter range is too high
LIMIT = 10000

// gmail credentials
gmailServer = "imap.gmail.com"
gmailPort = 993

user = "gmailuser@gmail.com"
password = "gmailpassword"

// gmail label, or "INBOX" for inbox
inboxFolder = "finance"

// local file system where the attachment files need to be stored
root = "D:\\AttachmentStore" 

// date range dd-mm-yyyy
StartDate= "31-12-2009"
EndDate = "1-6-2010" 
////////////////////// </CONFIGURATION> //////////////////////

StartDate = Date.parse("dd-MM-yyyy", StartDate)
EndDate = Date.parse("dd-MM-yyyy", EndDate)

Properties props = new Properties();
props.setProperty("mail.store.protocol", "imaps");
props.setProperty("mail.imaps.host", gmailServer);
props.setProperty("mail.imaps.port", gmailPort.toString());
props.setProperty("mail.imaps.partialfetch", "false");

def session = javax.mail.Session.getDefaultInstance(props,null)
def store = session.getStore("imaps")

store.connect(gmailServer, user, password)

int i = 0;
def folder = store.getFolder(inboxFolder)

folder.open(Folder.READ_ONLY)

for(def msg : folder.messages) {

     //if (msg.subject?.contains("bank Statement"))
     println "[$i] From: ${msg.from} Subject: ${msg.subject} -- Received: ${msg.receivedDate}"

     if (msg.receivedDate <  StartDate || msg.receivedDate > EndDate) {
         println "Ignoring due to date range"
         continue
     }


     if (msg.content instanceof Multipart) {
         Multipart mp = (Multipart)msg.content;

         for (int j=0; j < mp.count; j++) {

             Part part = mp.getBodyPart(j);

             println " ---- ${part.fileName} ---- ${part.disposition}"

             if (part.disposition?.equalsIgnoreCase(Part.ATTACHMENT)) {

                 if (part.content) {

                     def name = msg.receivedDate.format("yyyy_MM_dd") + " " + part.fileName
                     println "Saving file to $name"

                     def f = new File(root, name)

                     //f << part.content
                     try {
                         if (!f.exists())
                             f << part.content
                     }
                     catch (Exception e) {
                         println "*** Error *** $e" 
                     }
                 }
                 else {
                    println "NO Content Found!!"
                 }
             }
         }
     }

     if (i++ > LIMIT)
         break;

}
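The Groovy script downloads every message in the folder and then compares `receivedDate` with the range on the client. IMAP can do that filtering on the server with the SINCE and BEFORE search keys, which compare the message's internal date while ignoring the time of day; SINCE is inclusive and BEFORE is exclusive. A Python sketch that builds the criteria for a range that includes the end day (the Groovy code instead cuts off at midnight at the start of `EndDate`); the helper names are hypothetical:

```python
import datetime

_MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def imap_date(d):
    """Format a date as IMAP's D-Mon-YYYY, independent of the current locale."""
    return '%d-%s-%d' % (d.day, _MONTHS[d.month - 1], d.year)


def date_range_criteria(start, end):
    """SEARCH criteria for messages dated start..end, both days included.

    BEFORE is exclusive, so it is given the day after `end`.
    """
    return '(SINCE %s BEFORE %s)' % (
        imap_date(start), imap_date(end + datetime.timedelta(days=1)))
```

For the range in the Groovy configuration, `m.search(None, date_range_criteria(datetime.date(2009, 12, 31), datetime.date(2010, 6, 1)))` sends `(SINCE 31-Dec-2009 BEFORE 2-Jun-2010)`. The month names are hard-coded because `strftime('%b')` follows the current locale.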

Content provided by Stack Overflow.