Python: Google API - Getting the MIME type from a message


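My goal is to use the Google API to extract data from a specific email. So far I can find the message, fetch its data, and decode it into a readable format. After that, I need to find the right part of the message (the one of type text/html) and then scan it for my link with Beautiful Soup. Unfortunately, I don't understand the structure of email messages or the Google API well enough to pick out a specific part of the message.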

        try:
            message = gmail_service.users().messages().get(userId='me', id=thread['id'], format='raw').execute()
            print 'Message snippet: %s' % message['snippet']
            msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
            mime_msg = email.message_from_string(msg_str)

            print mime_msg        #this line gives the output I quoted
            for parts in mime_msg['payload']:    #this line produces error quoted
                if parts['text/html']:
                    mylink = base64.urlsafe_b64decode(part[0]['body']['data'].encode('UTF-8'))
                    print mylink

This code gives me the following error:

Traceback (most recent call last):
  File "gmailAPI.py", line 55, in <module>
    for parts in mime_msg['payload']:
TypeError: 'NoneType' object is not iterable

The output of my code also shows the different parts of the message. This is the part I want:
----boundary_1_81681de2-2c9a-4827-802a-91544e5e6e28
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64

PCFET0NUWVBFIGh0bWwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMDEgVHJhbnNpdGlvbmFsLy9FTiINCiAgICJodHRwOi8vd3d3LnczLm9yZy9UUi9odG1sNC9sb29zZS5kdGQiPg0KDQo8aHRtbCBsYW5nPSJlbiI+DQo8aGVhZD4NCgk8bWV0YSBodHRwLWVxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9InRleHQvaHRtbDsgY2hhcnNldD11dGYtOCI+DQoJPHRpdGxlPlNpZ251cDwvdGl0bGU+DQo8L2hlYWQ+DQoNCjxib2R5IGJnY29sb3I9IiNmZmZmZmYiIHRvcG1hcmdpbj0iMCIgbGVmdG1hcmdpbj0iMCIgbWFyZ2luaGVpZ2h0PSIwIiBtYXJnaW53aWR0aD0iMCIgc3R5bGU9Ii13ZWJraXQtZm9udC1zbW9vdGhpbmc6IGFudGlhbGlhc2VkO3dpZHRoOjEwMCUgIWltcG9ydGFudDtiYWNrZ3JvdW5kOiNmZmZmZmY7LXdlYmtpdC10ZXh0LXNpemUtYWRqdXN0Om5vbmU7Ij4NCg0KPHRhYmxlIHdpZHRoPSIxMDAlIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIGJvcmRlcj0iMCIgYmdjb2xvcj0iI2ZmZmZmZiI+DQoJPHRyPg0KCQk8dGQgYmdjb2xvcj0iI2ZmZmZmZiIgd2lkdGg9IjEwMCUiPg0KCQkJPHRhYmxlIHdpZHRoPSI2MDAiIGNlbGxwYWRkaW5nPSIwIiBjZWxsc3BhY2luZz0iMCIgYm9yZGVyPSIwIiBhbGlnbj0iY2VudGVyIiBjbGFzcz0i
dGFibGUiPg0KCQkJCTx0cj4NCgkJCQkJPHRkIHdpZHRoPSI2MDAiIGNsYXNzPSJjZWxsIj4NCgkgICAJCQkJCTx0YWJsZSB3aWR0aD0iNjAwIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIGNsYXNzPSJtYXN0Ij4NCgkJCQkJCQk8dHI+DQoJCQkJCQkJCTx0ZCB3aWR0aD0iMjUwIiBiZ2NvbG9yPSIjZmZmZmZmIj4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDxpbWcgc3JjPSJjaWQ6QlRfTG9nby5qcGciIGFsdD0iQm90dG9tbGluZSBsb2dvIiBzdHlsZT0iLW1zLWludGVycG9sYXRpb24tbW9kZTpiaWN1YmljOyI+PGJyLz48YnIgLz4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPC90ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgIAk8L3RyPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgIDx0cj4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPHRkIGFsaWduPSJsZWZ0IiB3aWR0aD0iMzUwIiBzdHlsZT0icGFkZGluZy1ib3R0b206IDE1cHg7IiB2YWxpZ249InRvcCIgYmdjb2xvcj0iI2ZmZmZmZiIgY2xhc3M9InN1YkxvZ28iPjxpbWcgc3JjPSJjaWQ6QlRfTGluZS5qcGciIGFsdD0ibGluZSI+PC90ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3RyPg0KCQkJCQkJPC90YWJsZT4JDQogICAgICAgICAgICAgICAgICAgICAgICA8dGFibGUgd2lkdGg9IjEwMCUiIGNlbGxwYWRkaW5nPSIwIiBjZWxsc3BhY2luZz0iMCIgYm9yZGVyPSIwIj4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICA8dHI+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDx0ZCBiZ2NvbG9yPSIjZmZmZmZmIiBzdHlsZT0icGFkZGluZzogMjBweDsiIGNsYXNzPSJlbnRyeSIgdmFsaWduPSJ0b3AiPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8c3BhbiBzdHlsZT0iY29sb3I6IzMzMzMzMztmb250LXNpemU6MTRweDtsaW5lLWhlaWdodDoxLjI7Zm9udC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjttYXJnaW4tYm90dG9tOjA7cGFkZGluZy10b3A6MDtwYWRkaW5nLWJvdHRvbTowO2ZvbnQtd2VpZ2h0Om5vcm1hbDsiPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPGJyLz5UaGFuayB5b3UgZm9yIGNob29zaW5nIGVDb25uZWN0IE9ubGluZSBmcm9tIEJvdHRvbWxpbmUgVGVjaG5vbG9naWVzOyB5b3VyIHNlY3VyZSBjbG91ZCBkb2N1bWVudCBkZWxpdmVyeSBzZXJ2aWNlLiBUbyBjb21wbGV0ZSB0aGUgIHNldHVwIG9mIHlvdXIgYWNjb3VudCwgcGxlYXNlIGZvbGxvdyB0aGUgbGluayBiZWxvdy48YnIgLz4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDwvc3Bhbj48YnIgLz48YnIgLz48YnIgLz4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPGEgc3R5bGU9ImJhY2tncm91bmQtY29sb3I6IzNlN2Q2NTt0ZXh0LWRlY29yYXRpb246bm9uZTsgZm9u
dC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjsgY29sb3I6I2ZmZmZmZjsgcGFkZGluZy10b3A6OHB4OyBwYWRkaW5nLWJvdHRvbTo4cHg7IHBhZGRpbmctbGVmdDo4cHg7IHBhZGRpbmctcmlnaHQ6OHB4OyBmb250LXNpemU6MThweDsgbWFyZ2luOiA4cHg7IiBocmVmPSJodHRwOi8vZWNvbm5lY3QuZW1lYS1ib3R0b21saW5lLnJvb3QuYm90dG9tbGluZS5jb20vYXBpL2FjY291bnQvc2lnbnVwY29tcGxldGUvMjBhNTE4YjktZGIzZS00OTkzLWFjN2UtYjE0YzZjMGVkMzMzIj48c3BhbiBzdHlsZT0iY29sb3I6I2ZmZmZmZiI+Q29tcGxldGUgQWNjb3VudCBTZXR1cCAmcmFxdW87PC9zcGFuPjwvYT48YnIgLz48YnIgLz48YnIgLz4NCg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPHNwYW4gc3R5bGU9ImNvbG9yOiMzMzMzMzM7Zm9udC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjtmb250LXNpemU6MTRweDtsaW5lLWhlaWdodDoxLjI7Zm9u
dC1mYW1pbHk6J0hlbHZldGljYSBOZXVlJyxIZWx2ZXRpY2EsQXJpYWwsc2Fucy1zZXJpZjttYXJnaW4tYm90dG9tOjA7cGFkZGluZy10b3A6MDtwYWRkaW5nLWJvdHRvbTowO2ZvbnQtd2VpZ2h0Om5vcm1hbDsiPg0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPGJyLz5LaW5kIFJlZ2FyZHMsPGJyIC8+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgVGhlIGVDb25uZWN0IE9ubGluZSBUZWFtDQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3NwYW4+PGJyIC8+PGJyIC8+DQoNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIDxzcGFuIHN0eWxlPSJjb2xvcjojMzMzMzMzO2ZvbnQtc2l6ZToxNHB4O2xpbmUtaGVpZ2h0OjEuMjtmb250LWZhbWlseTonSGVsdmV0aWNhIE5ldWUnLEhlbHZldGljYSxBcmlhbCxzYW5zLXNlcmlmO21hcmdpbi1ib3R0b206MDtwYWRkaW5nLXRvcDowO3BhZGRpbmctYm90dG9tOjA7Zm9udC13ZWlnaHQ6bm9ybWFsOyI+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8YnIvPkZvciBzdXBwb3J0IHBsZWFzZSBjb250YWN0OiBlbWVhLXN1cHBvcnRAYm90dG9tbGluZS5jb20gPGJyIC8+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgVGVsOiAwODcwIDA4MSA4MjUwPGJyIC8+DQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3NwYW4+DQoNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgPC90ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgICAgICA8L3RyPg0KICAgICAgICAgICAgICAgICAgICAgICAgPC90YWJsZT4NCiAgICAgICAgICAgICAgICAgICAgICAgIDxici8+DQoJCQkJCQkJCQkNCiAgICAgICAgICAgICAgICAgICAgPC90ZD4NCiAgICAgICAgICAgICAgICA8L3RyPg0KICAgICAgICAgICAgPC90YWJsZT4NCgkJPC90ZD4NCgk8L3RyPg0KPC90YWJsZT4NCgkJCQkNCjx0YWJsZSB3aWR0aD0iNjAwIiBjZWxscGFkZGluZz0iMCIgY2VsbHNwYWNpbmc9IjAiIGJvcmRlcj0iMCIgYWxpZ249ImNlbnRlciIgY2xhc3M9ImZvb3RlciI+DQogICAgPHRyPg0KICAgICAgICA8dGQ+DQogICAgICAgICAgICA8dGFibGUgd2lkdGg9IjYwMCIgY2VsbHBhZGRpbmc9IjAiIGNlbGxzcGFjaW5nPSIwIiBib3JkZXI9IjAiIGFsaWduPSJjZW50ZXIiIGNsYXNzPSJ0YWJsZSIgc3R5bGU9ImJvcmRlci10b3A6MXB4IHNvbGlkICNjY2NjY2M7Ij4NCiAgICAgICAgICAgICAgICA8dHI+DQogICAgICAgICAgICAgICAgICAgIDx0ZD4NCiAgICAgICAgICAgICAgICAgICAgICAgIDwvdGQ+DQogICAgICAgICAgICAgICAgPC90cj4NCiAgICAgICAgICAgICAgICA8dHI+DQogICAgICAgICAgICAgICAgICAgIDx0ZD48cCBzdHlsZT0iZm9udC1mYW1pbHk6dmVyZGFuYTsgY29sb3I6IzQ0NDQ0NDsgZm9udC1zaXplOjEwcHg7Ij4NCiAgICAgICAgICAgICAgICAgICAgICAg
ICAgICZjb3B5OyAyMDE0IEJvdHRvbWxpbmUgVGVjaG5vbG9naWVzLCBJbmMuIEFsbCBSaWdodHMgUmVzZXJ2ZWQ8L3A+PC90ZD4NCiAgICAgICAgICAgICAgICA8L3RyPgkNCiAgICAgICAgICAgIDwvdGFibGU+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICANCiAgICAgICAgPC90ZD4NCiAgICA8L3RyPg0KPC90YWJsZT4NCgkNCjwvYm9keT4NCjwvaHRtbD4NCg==

Link to a full backup of my code

Edit: my fixed code

    try:
        message = gmail_service.users().messages().get(userId='me', id=thread['id'], format='raw').execute()
        # print 'Message snippet: %s' % message['snippet']
        msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        msg = email.message_from_string(msg_str)

        for part in msg.walk():
            msg.get_payload()
            if part.get_content_type() == 'text/html':
                mytext = base64.urlsafe_b64decode(part.get_payload().encode('UTF-8'))
                # print part.get_payload()
                print mytext

The information in the documentation linked from the answer I accepted was invaluable in solving my problem.
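For readers on Python 3, the same approach might look roughly like the sketch below. It is not the code from the question: `links_from_raw` and `LinkCollector` are hypothetical names, the standard library's `html.parser` is swapped in for Beautiful Soup so the example has no third-party dependency, and `get_payload(decode=True)` is used so that the `email` package undoes whatever Content-Transfer-Encoding the part declares. The sample message is made up for illustration.

```python
import base64
from email import message_from_bytes
from email.mime.text import MIMEText
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def links_from_raw(raw):
    """Return the links found in the text/html parts of a message.

    `raw` is assumed to be the URL-safe base64 string found in the
    'raw' field of a messages.get(format='raw') response.
    """
    msg = message_from_bytes(base64.urlsafe_b64decode(raw))
    links = []
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            # decode=True reverses the part's Content-Transfer-Encoding.
            html_bytes = part.get_payload(decode=True)
            charset = part.get_content_charset() or 'utf-8'
            collector = LinkCollector()
            collector.feed(html_bytes.decode(charset, errors='replace'))
            links.extend(collector.links)
    return links


# Hypothetical message standing in for the API response.
sample = MIMEText('<a href="http://example.com/signup">Complete</a>',
                  'html', 'utf-8')
raw = base64.urlsafe_b64encode(sample.as_bytes()).decode('ascii')
sample_links = links_from_raw(raw)
print(sample_links)
```

With Beautiful Soup installed, the `LinkCollector` step could be replaced by parsing the decoded HTML and reading the `href` attribute of each `a` tag.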


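Can you provide a full dump of the message object returned from the API call? - Brandon Jewett-Hall
Sure, I'll put it on an external site. The message is long, so I'll edit my question shortly. - LeonH
@Stormie But the message body still isn't in a human-readable form, is it? I can also print message['snippet'], but I still can't get a human-readable version of the email body. - littletiger
1 Answer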

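To iterate over the parts of a multipart message in Python, you should use the get_payload() method: https://docs.python.org/2/library/email.message.html#email.message.Message.get_payload In your example, calling mime_msg['payload'] looks up a message header named "payload", which doesn't exist and isn't what you want anyway.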
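A minimal sketch of that difference, written for Python 3 (the question uses Python 2). The two-part message below is hypothetical and exists only to show what each call returns:

```python
from email import message_from_string

# Hypothetical two-part message, built only for illustration.
RAW = (
    "MIME-Version: 1.0\n"
    "Subject: demo\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "plain body\n"
    "--b1\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>html body</p>\n"
    "--b1--\n"
)

msg = message_from_string(RAW)

# Indexing a Message looks up a header. A missing header yields None,
# which is why iterating over msg['payload'] raises TypeError.
missing_header = msg['payload']
subject = msg['Subject']

# For a multipart message, get_payload() returns the list of sub-parts.
parts = msg.get_payload()
part_types = [part.get_content_type() for part in parts]
print(missing_header, subject, part_types)
```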
Once you have a part, you can check its type by reading the Content-Type header with part['Content-Type'].
In general, a MIME message is a tree of parts, so you may need to recurse.
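The recursion could be sketched as follows, in Python 3. `find_parts` is a hypothetical helper, not part of the `email` package, and the nested message is invented for the example. The sketch compares against `get_content_type()` rather than `part['Content-Type']`, because the raw header value also carries parameters such as the charset.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_parts(message, content_type):
    """Recursively collect the decoded payloads of matching leaf parts."""
    if message.is_multipart():
        found = []
        for child in message.get_payload():
            found.extend(find_parts(child, content_type))
        return found
    if message.get_content_type() == content_type:
        # decode=True reverses the Content-Transfer-Encoding (e.g. base64).
        return [message.get_payload(decode=True)]
    return []


# Hypothetical nested message: mixed -> alternative -> (plain, html).
alternative = MIMEMultipart('alternative')
alternative.attach(MIMEText('hi', 'plain', 'utf-8'))
alternative.attach(MIMEText('<p>hi</p>', 'html', 'utf-8'))
outer = MIMEMultipart('mixed')
outer.attach(alternative)

html_parts = find_parts(outer, 'text/html')
print(html_parts)
```

`Message.walk()` performs the same depth-first traversal, so a plain loop over `walk()` is an alternative to writing the recursion by hand.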

I'm just editing my question to include the "fixed" code, since your explanation wasn't as helpful as the documentation you linked. - LeonH
